Re: [Xen-devel] [PATCH v11 6/9] xen: Add ring 3 vmware_port support
On 12.06.15 at 00:10, dsl...@verizon.com wrote: On 06/05/15 06:54, Ian Campbell wrote: It would be really useful to see a comprehensive list of exactly what guest ring3 access to the vmware port actually enables i.e. a list of specific features which require it. Ok, I have done some testing. Here is what I know: Without ring3 support: 1) VMware tools will not install on linux and windows. 2) open-vm-tools (https://github.com/vmware/open-vm-tools) will not install (how ever it is not hard to change it to do so, you need to add a call to iopl(3) need to be added in a few places) on linux However if VMware tools did get installed on the window disk bits somehow, the VMware mouse support works. Linux gets this because Xorg detects and uses the VMware mouse under IOPL(3). Now that tells us that the tools may not work, but not what implications that has on the usability of the VM once migrated to Xen. Them not installing is a non-issue afaict, since after having moved the VM to Xen there shouldn't be a need to install them anymore - either they've been there, or I don't see why they would be needed _after_ the move. And you say that the mouse works in both cases if the tools happen to be there. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 06/16] hvmloader: get guest memory map into memory_map[]
On 2015/6/11 17:38, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM Now we get this map layout by call XENMEM_memory_map then save them into one global variable memory_map[]. It should include lowmem range, rdm range and highmem range. Note rdm range and highmem range may not exist in some cases. And here we need to check if any reserved memory conflicts with [RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END]. This range is used to allocate memory in hvmloder level, and we would lead hvmloader failed in case of conflict since its another rare possibility in real world. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- tools/firmware/hvmloader/e820.h | 7 +++ tools/firmware/hvmloader/hvmloader.c | 37 tools/firmware/hvmloader/util.c | 26 + tools/firmware/hvmloader/util.h | 11 +++ 4 files changed, 81 insertions(+) diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h index b2ead7f..8b5a9e0 100644 --- a/tools/firmware/hvmloader/e820.h +++ b/tools/firmware/hvmloader/e820.h @@ -15,6 +15,13 @@ struct e820entry { uint32_t type; } __attribute__((packed)); +#define E820MAX128 + +struct e820map { +unsigned int nr_map; +struct e820entry map[E820MAX]; +}; + #endif /* __HVMLOADER_E820_H__ */ /* diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c index 25b7f08..c9f170e 100644 --- a/tools/firmware/hvmloader/hvmloader.c +++ b/tools/firmware/hvmloader/hvmloader.c @@ -107,6 +107,8 @@ asm ( .text \n ); +struct e820map memory_map; + unsigned long scratch_start = SCRATCH_PHYSICAL_ADDRESS; static void init_hypercalls(void) @@ -199,6 +201,39 @@ static void apic_setup(void) ioapic_write(0x11, SET_APIC_ID(LAPIC_ID(0))); } +void memory_map_setup(void) +{ +unsigned int nr_entries = E820MAX, i; +int rc; +uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1; +uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr; + +rc = get_mem_mapping_layout(memory_map.map, nr_entries); + +if ( rc ) +{ +printf(Failed to get guest memory map.\n); +BUG(); +} + +BUG_ON(!nr_entries); +memory_map.nr_map = nr_entries; + +for ( i = 0; i nr_entries; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED ) +{ +if ( check_overlap(alloc_addr, alloc_size, + memory_map.map[i].addr, + memory_map.map[i].size) ) +{ +printf(RDM conflicts Memory allocation.\n); hvmloader has no concept of RDM here. It's just E820_RESERVED type. Please make the error message clear, e.g. Fail to setup memory map due to conflict on dynamic reserved memory range. Okay, +{ +printf(Fail to setup memory map due to conflict); +printf( on dynamic reserved memory range.\n); +BUG(); +} Otherwise: Reviewed-by: Kevin Tian kevin.t...@intel.com Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 3/7] libxl: add pvusb API
On 6/12/2015 at 03:39 PM, in message 557afd2f0266000d4...@relay2.provo.novell.com, Chun Yan Liu cy...@suse.com wrote: On 6/12/2015 at 12:42 AM, in message 21881.47707.526863.158...@mariner.uk.xensource.com, Ian Jackson ian.jack...@eu.citrix.com wrote: Chunyan Liu writes ([Xen-devel] [PATCH V4 3/7] libxl: add pvusb API): Add pvusb APIs, including: ... Thanks for your contribution. I'm afraid I haven't had time to completely finish my review this, but here's what I have: --- /dev/null +++ b/docs/misc/pvusb.txt @@ -0,0 +1,243 @@ +1. Overview It's good that you have provided documentation. But I think this document is a bit confused about its audience. Information about design choices should be removed from this user- and application-facing document, and put in comments in the code, or commit messages, I think. Thanks. Will update. +int libxl_device_usbctrl_getinfo(libxl_ctx *ctx, uint32_t domid, +libxl_device_usbctrl *usbctrl, +libxl_usbctrlinfo *usbctrlinfo) +LIBXL_EXTERNAL_CALLERS_ONLY; Why is this function marked LIBXL_EXTERNAL_CALLERS_ONLY ? Currently libxl itself won't call it. Exposed for toolstack usage. Will remove that if it's not proper. diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index ef7aa1d..89a9f07 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2439,6 +2439,18 @@ _hidden void libxl__device_vtpm_add(libxl__egc *egc, uint32_t domid, ... +/* AO operation to connect a PVUSB controller. + * Adding a PVUSB controller will add a 'vusb' device entry in xenstore, + * and will wait for device connection. In this context I think will wait for device connection is misleading. What you mean is that the vusb is available for adding devices to, but won't have any yet. Nothing in libxl is waiting. Here I mean libxl_wait_for_device_connection. Since it adds a new device entry to xenstore, it needs to wait for a moment for frontend/backend connection. diff --git a/tools/libxl/libxl_pvusb.c b/tools/libxl/libxl_pvusb.c new file mode 100644 index 000..a6e1aa1 --- /dev/null +++ b/tools/libxl/libxl_pvusb.c @@ -0,0 +1,1249 @@ ... +void libxl__device_usbctrl_add(libxl__egc *egc, uint32_t domid, + libxl_device_usbctrl *usbctrl, + libxl__ao_device *aodev) +{ +STATE_AO_GC(aodev-ao); ... +libxl_domain_config_init(d_config); +libxl_device_usbctrl_init(usbctrl_saved); +libxl_device_usbctrl_copy(CTX, usbctrl_saved, usbctrl); Wei will need to review the saved/live saved device info handling, and the json, etc. +static int +libxl_device_usbctrl_remove_common(libxl_ctx *ctx, uint32_t domid, + libxl_device_usbctrl *usbctrl, + const libxl_asyncop_how *ao_how, + int force) As discussed, you mustn't call this within libxl. I don't quite understand why. I guess it's the same as usb_add problem, something related to embedded ao? As in usb_add: libxl_device_usb_add() AO_CREATE(ctx, domid, ao_how) libxl__device_usb_add() libxl__device_usb_setdefault() libxl_device_usbctrl_add_common() AO_CREATE(ctx, domid, NULL) if this is not allowed, what is the correct way? Saw the discussion thread and got it. Will update the codes. If you need to, you need to break it out into an internal function (libxl__initiate_device_usbctrl_remove, maybe) which makes a callback when done. +libxl_device_usbctrl * +libxl_device_usbctrl_list(libxl_ctx *ctx, uint32_t domid, int *num) +{ +GC_INIT(ctx); +libxl_device_usbctrl *usbctrls = NULL; +char *fe_path = NULL; +char **dir = NULL; +unsigned int ndirs = 0; + +*num = 0; + +fe_path = GCSPRINTF(%s/device/vusb, +libxl__xs_get_dompath(gc, domid)); +dir = libxl__xs_directory(gc, XBT_NULL, fe_path, ndirs); + +if (dir ndirs) { +usbctrls = malloc(sizeof(*usbctrls) * ndirs); Please use libxl__calloc with NOGC. Thanks. Missing this one. +libxl_device_usbctrl* usbctrl; +libxl_device_usbctrl* end = usbctrls + ndirs; +for (usbctrl = usbctrls; usbctrl end; usbctrl++, dir++, (*num)++) { +char *tmp; +const char *be_path = libxl__xs_read(gc, XBT_NULL, +
Re: [Xen-devel] [v3][PATCH 08/16] hvmloader/e820: construct guest e820 table
On 2015/6/11 17:59, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM Now we can use that memory map to build our final e820 table but it may need to reorder all e820 entries. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- tools/firmware/hvmloader/e820.c | 62 +++-- 1 file changed, 48 insertions(+), 14 deletions(-) diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c index 2e05e93..c39b0aa 100644 --- a/tools/firmware/hvmloader/e820.c +++ b/tools/firmware/hvmloader/e820.c @@ -73,7 +73,8 @@ int build_e820_table(struct e820entry *e820, unsigned int lowmem_reserved_base, unsigned int bios_image_base) { -unsigned int nr = 0; +unsigned int nr = 0, i, j; +uint64_t low_mem_pgend = hvm_info-low_mem_pgend PAGE_SHIFT; You may call it low_mem_end to differentiate from original low_mem_pgend since one means actual address while the other means pfn. Okay. if ( !lowmem_reserved_base ) lowmem_reserved_base = 0xA; @@ -117,13 +118,6 @@ int build_e820_table(struct e820entry *e820, e820[nr].type = E820_RESERVED; nr++; -/* Low RAM goes here. Reserve space for special pages. */ -BUG_ON((hvm_info-low_mem_pgend PAGE_SHIFT) (2u 20)); -e820[nr].addr = 0x10; -e820[nr].size = (hvm_info-low_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; -nr++; - /* * Explicitly reserve space for special pages. * This space starts at RESERVED_MEMBASE an extends to cover various @@ -159,16 +153,56 @@ int build_e820_table(struct e820entry *e820, nr++; } - -if ( hvm_info-high_mem_pgend ) +/* + * Construct the remaining according memory_map. Construct E820 table according to recorded memory map Fixed. + * + * Note memory_map includes, The memory map created by toolstack may include: Fixed. + * + * #1. Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * + * #2. RDM region if it exists Reserved regions if they exist Fixed. + * + * #3. High memory region if it exists + */ +for ( i = 0; i memory_map.nr_map; i++ ) { -e820[nr].addr = ((uint64_t)1 32); -e820[nr].size = -((uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; +e820[nr] = memory_map.map[i]; nr++; } +/* Low RAM goes here. Reserve space for special pages. */ +BUG_ON(low_mem_pgend (2u 20)); +/* + * We may need to adjust real lowmem end since we may + * populate RAM to get enough MMIO previously. + */ +for ( i = 0; i memory_map.nr_map; i++ ) since you already translate memory map into e820 earlier, here you should use 'nr' instead of memory_map.nr_map. As we're saying in the code comment above, we're just handling the lowmem entry, so I think memory_map.nr_map is enough. +{ +uint64_t end = e820[i].addr + e820[i].size; +if ( e820[i].type == E820_RAM + low_mem_pgend e820[i].addr low_mem_pgend end ) +e820[i].size = low_mem_pgend - e820[i].addr; +} Sorry I may miss the code but could you elaborate where the low_mem_pgend is changed after memory map is created? If it happens within hvmloader, suppose the amount of reduced memory from original E820_RAM entry should be added to another E820_RAM entry for highmem, right? You're right so I really should compensate this in highmem entry, add_high_mem = end - low_mem_end; /* * And then we also need to adjust highmem. */ if ( add_high_mem ) { for ( i = 0; i memory_map.nr_map; i++ ) { if ( e820[i].type == E820_RAM e820[i].addr (1ull 32)) e820[i].size += add_high_mem; } } Thanks Tiejun + +/* Finally we need to reorder all e820 entries. */ +for ( j = 0; j nr-1; j++ ) +{ +for ( i = j+1; i nr; i++ ) +{ +if ( e820[j].addr e820[i].addr ) +{ +struct e820entry tmp; +tmp = e820[j]; +e820[j] = e820[i]; +e820[i] = tmp; +} +} +} + return nr; } -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 00/16] Fix RMRR
On 2015/6/12 16:04, Jan Beulich wrote: On 12.06.15 at 04:10, tiejun.c...@intel.com wrote: On 2015/6/11 20:52, Tim Deegan wrote: which would be better handeld explicitly: if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm ) ... So if I'm correct, we should do this check explicitly, if ( p2mt == p2m_invalid || (p2mt == p2m_mmio_dm !mfn_valid(mfn) ) Note this is equivalent to Jan's comment. I think the !mfn_valid() part is really redundant - p2m_mmio_dm should never be put on a page translating to a valid MFN. I'm not sure immediately but I believe you're definitely right at this point, so lets goes there :) Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 2/8] xen/arm: Add functions of mapping between vCPUID and virtual affinity
From: Chen Baozi baoz...@gmail.com GICv3 restricts that the maximum number of CPUs in affinity 0 (one cluster) is 16. (See the note of 'Bits[15:0]' in '5.7.29 ICC_SGI0R_EL1 ICC_SGI1R_EL1 and ICC_ASGI1R_EL1, GICv3 Architecture Specification') That is to say the upper 4 bits of affinity 0 is unused. Current implementation considers that AFF0 is equal to vCPUID, which makes all vCPUs in one cluster, limiting its number to 16. If we would like to support more than 16 number of vCPU in one guest, we need to make use of AFF1. Considering the unused upper 4 bits, we need to create a pair of functions mapping the vCPUID and virtual affinity. Signed-off-by: Chen Baozi baoz...@gmail.com Reviewed-by: Julien Grall julien.gr...@citrix.com --- xen/include/asm-arm/domain.h | 38 ++ 1 file changed, 38 insertions(+) diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 75b17af..35b9a6d 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -266,6 +266,44 @@ static inline unsigned int domain_max_vcpus(const struct domain *d) return MAX_VIRT_CPUS; } +/* + * Due to the restriction of GICv3, the number of vCPUs in AFF0 is + * limited to 16, thus only the first 4 bits of AFF0 are legal. We will + * use the first 2 affinity levels here, expanding the number of vCPU up + * to 4096 (16*256), which is more than the PEs that GIC-500 supports. + * + * Since we don't save information of vCPU's topology (affinity) in + * vMPIDR at the moment, we map the vcpuid to the vMPIDR linearly. + */ +static inline unsigned int vaffinity_to_vcpuid(register_t vaff) +{ +unsigned int vcpuid; + +vaff = MPIDR_HWID_MASK; + +vcpuid = MPIDR_AFFINITY_LEVEL(vaff, 0); +vcpuid |= MPIDR_AFFINITY_LEVEL(vaff, 1) 4; + +return vcpuid; +} + +static inline register_t vcpuid_to_vaffinity(unsigned int vcpuid) +{ +register_t vaff; + +/* + * Right now only AFF0 and AFF1 are supported in virtual affinity. + * Since only the first 4 bits in AFF0 are used in GICv3, the + * available bits are 12 (4+8). + */ +BUILD_BUG_ON(!(MAX_VIRT_CPUS ((1 12; + +vaff = (vcpuid 0x0f) MPIDR_LEVEL_SHIFT(0); +vaff |= ((vcpuid 4) MPIDR_LEVEL_MASK) MPIDR_LEVEL_SHIFT(1); + +return vaff; +} + #endif /* __ASM_DOMAIN_H__ */ /* -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 5/8] tools/libxl: Set 'reg' of cpu node equal to MPIDR affinity for domU
From: Chen Baozi baoz...@gmail.com According to ARM CPUs bindings, the reg field should match the MPIDR's affinity bits. We will use AFF0 and AFF1 when constructing the reg value of the guest at the moment, for it is enough for the current max vcpu number. Signed-off-by: Chen Baozi baoz...@gmail.com Reviewed-by: Julien Grall julien.gr...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com --- tools/libxl/libxl_arm.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c index c5088c4..5f3c434 100644 --- a/tools/libxl/libxl_arm.c +++ b/tools/libxl/libxl_arm.c @@ -272,6 +272,7 @@ static int make_cpus_node(libxl__gc *gc, void *fdt, int nr_cpus, const struct arch_info *ainfo) { int res, i; +uint64_t mpidr_aff; res = fdt_begin_node(fdt, cpus); if (res) return res; @@ -283,7 +284,16 @@ static int make_cpus_node(libxl__gc *gc, void *fdt, int nr_cpus, if (res) return res; for (i = 0; i nr_cpus; i++) { -const char *name = GCSPRINTF(cpu@%d, i); +const char *name; + +/* + * According to ARM CPUs bindings, the reg field should match + * the MPIDR's affinity bits. We will use AFF0 and AFF1 when + * constructing the reg value of the guest at the moment, for it + * is enough for the current max vcpu number. + */ +mpidr_aff = (i 0x0f) | (((i 4) 0xff) 8); +name = GCSPRINTF(cpu@%PRIx64, mpidr_aff); res = fdt_begin_node(fdt, name); if (res) return res; @@ -297,7 +307,7 @@ static int make_cpus_node(libxl__gc *gc, void *fdt, int nr_cpus, res = fdt_property_string(fdt, enable-method, psci); if (res) return res; -res = fdt_property_regs(gc, fdt, 1, 0, 1, (uint64_t)i); +res = fdt_property_regs(gc, fdt, 1, 0, 1, mpidr_aff); if (res) return res; res = fdt_end_node(fdt); -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 4/8] xen/arm: Use AFF1 when translating ICC_SGI1R_EL1 to cpumask
From: Chen Baozi baoz...@gmail.com The old unsigned long type of vcpu_mask can only express 64 cpus at the most, which might not be enough for the guest which used vGICv3. We introduce a new struct sgi_target for the target cpu list of SGI, which holds the affinity path information (only level 1 at the moment). For GICv2 that has no affinity level, we can just set the corresponding fields to be 0. Signed-off-by: Chen Baozi baoz...@gmail.com --- xen/arch/arm/vgic-v2.c| 7 +++--- xen/arch/arm/vgic-v3.c| 10 + xen/arch/arm/vgic.c | 45 +-- xen/include/asm-arm/gic_v3_defs.h | 3 +++ xen/include/asm-arm/vgic.h| 7 +- 5 files changed, 38 insertions(+), 34 deletions(-) diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c index 3be1a51..5949cf1 100644 --- a/xen/arch/arm/vgic-v2.c +++ b/xen/arch/arm/vgic-v2.c @@ -201,16 +201,17 @@ static int vgic_v2_to_sgi(struct vcpu *v, register_t sgir) int virq; int irqmode; enum gic_sgi_mode sgi_mode; -unsigned long vcpu_mask = 0; +struct sgi_target target; +memset(target, 0, sizeof(struct sgi_target)); irqmode = (sgir GICD_SGI_TARGET_LIST_MASK) GICD_SGI_TARGET_LIST_SHIFT; virq = (sgir GICD_SGI_INTID_MASK); -vcpu_mask = (sgir GICD_SGI_TARGET_MASK) GICD_SGI_TARGET_SHIFT; /* Map GIC sgi value to enum value */ switch ( irqmode ) { case GICD_SGI_TARGET_LIST_VAL: +target.list = (sgir GICD_SGI_TARGET_MASK) GICD_SGI_TARGET_SHIFT; sgi_mode = SGI_TARGET_LIST; break; case GICD_SGI_TARGET_OTHERS_VAL: @@ -226,7 +227,7 @@ static int vgic_v2_to_sgi(struct vcpu *v, register_t sgir) return 0; } -return vgic_to_sgi(v, sgir, sgi_mode, virq, vcpu_mask); +return vgic_to_sgi(v, sgir, sgi_mode, virq, target); } static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info) diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c index ef9a71a..93610d0 100644 --- a/xen/arch/arm/vgic-v3.c +++ b/xen/arch/arm/vgic-v3.c @@ -977,17 +977,19 @@ static int vgic_v3_to_sgi(struct vcpu *v, register_t sgir) int virq; int irqmode; enum gic_sgi_mode sgi_mode; -unsigned long vcpu_mask = 0; +struct sgi_target target; +memset(target, 0, sizeof(struct sgi_target)); irqmode = (sgir ICH_SGI_IRQMODE_SHIFT) ICH_SGI_IRQMODE_MASK; virq = (sgir ICH_SGI_IRQ_SHIFT ) ICH_SGI_IRQ_MASK; -/* SGI's are injected at Rdist level 0. ignoring affinity 1, 2, 3 */ -vcpu_mask = sgir ICH_SGI_TARGETLIST_MASK; /* Map GIC sgi value to enum value */ switch ( irqmode ) { case ICH_SGI_TARGET_LIST: +/* We assume that only AFF1 is used in ICC_SGI1R_EL1. */ +target.aff1 = (sgir ICH_SGI_AFFINITY_LEVEL(1)) ICH_SGI_AFFx_MASK; +target.list = sgir ICH_SGI_TARGETLIST_MASK; sgi_mode = SGI_TARGET_LIST; break; case ICH_SGI_TARGET_OTHERS: @@ -998,7 +1000,7 @@ static int vgic_v3_to_sgi(struct vcpu *v, register_t sgir) return 0; } -return vgic_to_sgi(v, sgir, sgi_mode, virq, vcpu_mask); +return vgic_to_sgi(v, sgir, sgi_mode, virq, target); } static int vgic_v3_emulate_sysreg(struct cpu_user_regs *regs, union hsr hsr) diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index 7b387b7..59bd98a 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -318,15 +318,14 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n) } } -/* TODO: unsigned long is used to fit vcpu_mask.*/ int vgic_to_sgi(struct vcpu *v, register_t sgir, enum gic_sgi_mode irqmode, int virq, -unsigned long vcpu_mask) +const struct sgi_target *target) { struct domain *d = v-domain; int vcpuid; int i; - -ASSERT(d-max_vcpus 8*sizeof(vcpu_mask)); +unsigned int base; +unsigned long int bitmap; ASSERT( virq 16 ); @@ -334,29 +333,33 @@ int vgic_to_sgi(struct vcpu *v, register_t sgir, enum gic_sgi_mode irqmode, int { case SGI_TARGET_LIST: perfc_incr(vgic_sgi_list); +base = target-aff1 4; +bitmap = target-list; +for_each_set_bit( i, bitmap, sizeof(target-list) * 8 ) +{ +vcpuid = base + i; +if ( d-vcpu[vcpuid] != NULL !is_vcpu_online(d-vcpu[vcpuid]) ) +{ +gprintk(XENLOG_WARNING, VGIC: write r=%PRIregister \ +target-list=%hx, wrong CPUTargetList \n, +sgir, target-list); +continue; +} +vgic_vcpu_inject_irq(d-vcpu[vcpuid], virq); +} break; case SGI_TARGET_OTHERS: -/* - * We expect vcpu_mask to be 0 for SGI_TARGET_OTHERS and - * SGI_TARGET_SELF mode. So Force vcpu_mask to 0 - */ perfc_incr(vgic_sgi_others); -vcpu_mask = 0; for ( i = 0; i d-max_vcpus;
[Xen-devel] [PATCH v8 6/8] xen/arm: Set 'reg' of cpu node for dom0 to match MPIDR's affinity
From: Chen Baozi baoz...@gmail.com According to ARM CPUs bindings, the reg field should match the MPIDR's affinity bits. We will use AFF0 and AFF1 when constructing the reg value of the guest at the moment, for it is enough for the current max vcpu number. Signed-off-by: Chen Baozi baoz...@gmail.com Acked-by: Ian Campbell ian.campb...@citrix.com Reviewed-by: Julien Grall julien.gr...@citrix.com --- xen/arch/arm/domain_build.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c index a156de9..12b46ca 100644 --- a/xen/arch/arm/domain_build.c +++ b/xen/arch/arm/domain_build.c @@ -712,6 +712,7 @@ static int make_cpus_node(const struct domain *d, void *fdt, char buf[15]; u32 clock_frequency; bool_t clock_valid; +uint64_t mpidr_aff; DPRINT(Create cpus node\n); @@ -761,9 +762,16 @@ static int make_cpus_node(const struct domain *d, void *fdt, for ( cpu = 0; cpu d-max_vcpus; cpu++ ) { -DPRINT(Create cpu@%u node\n, cpu); +/* + * According to ARM CPUs bindings, the reg field should match + * the MPIDR's affinity bits. We will use AFF0 and AFF1 when + * constructing the reg value of the guest at the moment, for it + * is enough for the current max vcpu number. + */ +mpidr_aff = vcpuid_to_vaffinity(cpu); +DPRINT(Create cpu@%PRIx64 (logical CPUID: %d) node\n, mpidr_aff, cpu); -snprintf(buf, sizeof(buf), cpu@%u, cpu); +snprintf(buf, sizeof(buf), cpu@%lx, mpidr_aff); res = fdt_begin_node(fdt, buf); if ( res ) return res; @@ -776,7 +784,7 @@ static int make_cpus_node(const struct domain *d, void *fdt, if ( res ) return res; -res = fdt_property_cell(fdt, reg, cpu); +res = fdt_property_cell(fdt, reg, mpidr_aff); if ( res ) return res; -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 7/8] xen/arm: make domain_max_vcpus return value from vgic_ops
From: Chen Baozi baoz...@gmail.com Each vGIC driver supports different maximum numbers of vCPU. For example, GICv2 is limited to 8 vCPUs, while GICv3 can support up to 4096 vCPUs if we use both AFF0 and AFF1. Thus, domain_max_vcpus should depend on not only MAX_VIRT_CPUS but also the version of vGIC that the guest uses. Since evtchn_init would call domain_max_vcpus to allocate poll_mask when the vgic_ops haven't been initialised yet, we make it return MAX_VIRT_CPUS at that time. On ARM32, event channel doesn't need to allocate the poll_mask because MAX_VIRT_CPUS BITS_PER_LONG, while allocating more memory (2 unsigned long rather than 1) only for poll_mask on arm64 with GICv2 looks not so expensive. We didn't keep it as the old static inline form because it will break compilation when access the member of struct domain: In file included from xen/include/xen/domain.h:6:0, from xen/include/xen/sched.h:10, from arm64/asm-offsets.c:10: xen/include/asm/domain.h: In function ‘domain_max_vcpus’: xen/include/asm/domain.h:266:10: error: dereferencing pointer to incomplete type if (d-arch.vgic.version == GIC_V2) ^ Signed-off-by: Chen Baozi baoz...@gmail.com --- xen/arch/arm/domain.c| 14 ++ xen/arch/arm/vgic-v2.c | 1 + xen/arch/arm/vgic-v3.c | 5 + xen/include/asm-arm/domain.h | 5 + xen/include/asm-arm/vgic.h | 2 ++ 5 files changed, 23 insertions(+), 4 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 0cf147c..01d8ca89 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -890,6 +890,20 @@ void vcpu_block_unless_event_pending(struct vcpu *v) vcpu_unblock(current); } +unsigned int domain_max_vcpus(const struct domain *d) +{ +/* + * Since evtchn_init would call domain_max_vcpus for poll_mask + * allocation when the vgic_ops haven't been initialised yet, + * we return MAX_VIRT_CPUS if d-arch.vgic.handler is null. + */ +if ( !d-arch.vgic.handler ) +return MAX_VIRT_CPUS; +else +return min_t(unsigned int, MAX_VIRT_CPUS, + d-arch.vgic.handler-max_vcpus); +} + /* * Local variables: * mode: C diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c index 5949cf1..bbeb740 100644 --- a/xen/arch/arm/vgic-v2.c +++ b/xen/arch/arm/vgic-v2.c @@ -585,6 +585,7 @@ const struct vgic_ops vgic_v2_ops = { .domain_init = vgic_v2_domain_init, .get_irq_priority = vgic_v2_get_irq_priority, .get_target_vcpu = vgic_v2_get_target_vcpu, +.max_vcpus = 8, }; /* diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c index 93610d0..93af6c8 100644 --- a/xen/arch/arm/vgic-v3.c +++ b/xen/arch/arm/vgic-v3.c @@ -1204,6 +1204,11 @@ const struct vgic_ops vgic_v3_ops = { .get_irq_priority = vgic_v3_get_irq_priority, .get_target_vcpu = vgic_v3_get_target_vcpu, .emulate_sysreg = vgic_v3_emulate_sysreg, +/* + * We use both AFF1 and AFF0 in (v)MPIDR. Thus, the max number of CPU + * that can be supported is up to 4096(256*16) in theory. + */ +.max_vcpus = 4096, }; /* diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 35b9a6d..23598dd 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -261,10 +261,7 @@ struct arch_vcpu void vcpu_show_execution_state(struct vcpu *); void vcpu_show_registers(const struct vcpu *); -static inline unsigned int domain_max_vcpus(const struct domain *d) -{ -return MAX_VIRT_CPUS; -} +unsigned int domain_max_vcpus(const struct domain *); /* * Due to the restriction of GICv3, the number of vCPUs in AFF0 is diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h index d68539a..4477962 100644 --- a/xen/include/asm-arm/vgic.h +++ b/xen/include/asm-arm/vgic.h @@ -115,6 +115,8 @@ struct vgic_ops { struct vcpu *(*get_target_vcpu)(struct vcpu *v, unsigned int irq); /* vGIC sysreg emulation */ int (*emulate_sysreg)(struct cpu_user_regs *regs, union hsr hsr); +/* Maximum number of vCPU supported */ +const unsigned int max_vcpus; }; /* Number of ranks of interrupt registers for a domain */ -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 8/8] xen/arm64: increase MAX_VIRT_CPUS to 128 on arm64
From: Chen Baozi baoz...@gmail.com After we have increased the size of GICR in address space for guest and made use of both AFF0 and AFF1 in (v)MPIDR, we are now able to support up to 4096 vCPUs in theory. However, it will cost 512M address space for GICR region, which is unnecessary big at the moment. Considering the max CPU number that GIC-500 can support and the old value of MAX_VIRT_CPUS before commit aa25a61, we increase its value to 128. Signed-off-by: Chen Baozi baoz...@gmail.com --- Since the domain_max_vcpus has been changed to depends on vgic_ops, we could have done more work in order to drop the definition of MAX_VIRT_CPUS. However, because it is still used for some conditional compilation in common code, I think that would be better done in a separate cleanup patch series. xen/arch/arm/vgic-v3.c | 1 - xen/include/asm-arm/config.h | 4 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c index 93af6c8..2548b62 100644 --- a/xen/arch/arm/vgic-v3.c +++ b/xen/arch/arm/vgic-v3.c @@ -889,7 +889,6 @@ static int vgic_v3_distr_mmio_write(struct vcpu *v, mmio_info_t *info) rank = vgic_rank_offset(v, 64, gicd_reg - GICD_IROUTER, DABT_DOUBLE_WORD); if ( rank == NULL ) goto write_ignore; -BUG_ON(v-domain-max_vcpus 8); new_irouter = *r; vgic_lock_rank(v, rank, flags); diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h index 3b23e05..817c216 100644 --- a/xen/include/asm-arm/config.h +++ b/xen/include/asm-arm/config.h @@ -47,7 +47,11 @@ #define NR_CPUS 128 #endif +#ifdef CONFIG_ARM_64 +#define MAX_VIRT_CPUS 128 +#else #define MAX_VIRT_CPUS 8 +#endif #define asmlinkage /* Nothing needed */ -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2] x86: Avoid tripping watchdog when constructing dom0
Constructing dom0 may take a few seconds, particularly if the slow VESA graphics terminal is used. Process pending softirqs a few times to avoid tripping a watchdog with a short timeout. Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com --- Changes in v2: Process pending softirqs inside ELF code. xen/arch/x86/domain_build.c| 4 xen/common/libelf/libelf-dominfo.c | 8 2 files changed, 12 insertions(+) diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c index 03e4bfe..d76707f 100644 --- a/xen/arch/x86/domain_build.c +++ b/xen/arch/x86/domain_build.c @@ -930,6 +930,8 @@ int __init construct_dom0( BUG_ON(d-vcpu[0] == NULL); BUG_ON(v-is_initialised); +process_pending_softirqs(); + printk(*** LOADING DOMAIN 0 ***\n); d-max_pages = ~0U; @@ -1167,6 +1169,8 @@ int __init construct_dom0( _p(v_start), _p(v_end)); printk( ENTRY ADDRESS: %p\n, _p(parms.virt_entry)); +process_pending_softirqs(); + mpt_alloc = (vpt_start - v_start) + pfn_to_paddr(alloc_spfn); if ( vinitrd_start ) mpt_alloc -= PAGE_ALIGN(initrd_len); diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c index 6120dd4..4882c52 100644 --- a/xen/common/libelf/libelf-dominfo.c +++ b/xen/common/libelf/libelf-dominfo.c @@ -16,6 +16,10 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ +#ifdef __XEN__ +#include xen/softirq.h +#endif + #include libelf-private.h /* */ @@ -235,6 +239,10 @@ static unsigned elf_xen_parse_notes(struct elf_binary *elf, ELF_HANDLE_PTRVAL(note) parms-elf_note_end; note = elf_note_next(elf, note) ) { +#ifdef __XEN__ +process_pending_softirqs(); +#endif + if ( *total_note_count = ELF_MAX_TOTAL_NOTE_COUNT ) { elf_mark_broken(elf, too many ELF notes); -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 16/16] xen/vtd: prevent from assign the device with shared rmrr
On 2015/6/11 18:25, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM Currently we're intending to cover this kind of devices we're - we're not? I mean currently we want to handle this shared case *simply* so I think its still we're, right? with shared RMRR simply since the case of shared RMRR is a rare case according to our previous experiences. But late we can group these devices which shared rmrr, and then allow all devices within a group to be assigned to same domain. Signed-off-by: Tiejun Chen tiejun.c...@intel.com Acked-by: Kevin Tian kevin.t...@intel.com except one text comment. --- xen/drivers/passthrough/vtd/iommu.c | 30 +++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index d3233b8..f220081 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -2277,13 +2277,37 @@ static int intel_iommu_assign_device( if ( list_empty(acpi_drhd_units) ) return -ENODEV; +seg = pdev-seg; +bus = pdev-bus; +/* + * In rare cases one given rmrr is shared by multiple devices but + * obviously this would put the security of a system at risk. So + * we should prevent from this sort of device assignment. + * + * TODO: actually we can group these devices which shared rmrr, and + * then allow all devices within a group to be assigned to same domain. TODO: in the future we can introduce group device assignment interface to make sure devices sharing RMRR are assigned to the same domain together. Thank you to rephrase this. Tiejun + */ +for_each_rmrr_device( rmrr, bdf, i ) +{ +if ( rmrr-segment == seg + PCI_BUS(bdf) == bus + PCI_DEVFN2(bdf) == devfn ) +{ +if ( rmrr-scope.devices_cnt 1 ) +{ +ret = -EPERM; +printk(XENLOG_G_ERR VTDPREFIX +cannot assign this device with shared RMRR for Dom%d (%d)\n, + d-domain_id, ret); +return ret; +} +} +} + ret = reassign_device_ownership(hardware_domain, d, devfn, pdev); if ( ret ) return ret; -seg = pdev-seg; -bus = pdev-bus; - /* Setup rmrr identity mapping */ for_each_rmrr_device( rmrr, bdf, i ) { -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] libxl backports for 4.4 and 4.5
Ian, Jan, you seem to be disagreeing, or more likely Ian missed Jan's reply, so I wanted to double check. Right, but unless really urgent/important I think this is too late for 4.5.1 now and needs to be put on the list of backports for 4.5.2. Jan, just to clarify, are you saying back porting for 4.4.3 is OK, but 4.5.1 is too tight? Looking at the 4 months cadence, 4.4.3 would be sometimes in July: so I can live with this as a compromise. Even though having a released version of Xen which works out of the box with OpenStack is pretty important. @Stefano, @Anthony: please keep on top of necessary back ports for OpenStack in future. I thought these have been requested and only found out when I talked to Anthony yesterday that this wasn't the case. Regards Lars ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
On 2015/6/11 17:28, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM This patch extends the existing hypercall to support rdm reservation policy. We return error or just throw out a warning message depending on whether the policy is strict or relaxed when reserving RDM regions in pfn space. Note in some special cases, e.g. add a device to hwdomain, and remove a device from user domain, 'relaxed' is fine enough since this is always safe to hwdomain. could you elaborate add a device to hwdomain, and remove a device from user domain ? move a device from user domain to hwdomain or completely irrelevant? Yes, they're not relevant. And I think we shouldn't care our policy, #1. When add a device to hwdomain I think RMRR is always reserved on e820 so either of flag is fine. #2. remove a device from domain remove action also can ignore that since the original mechanism is enough. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- xen/arch/x86/mm/p2m.c | 8 +++- xen/drivers/passthrough/amd/pci_amd_iommu.c | 3 ++- xen/drivers/passthrough/arm/smmu.c | 2 +- xen/drivers/passthrough/device_tree.c | 11 ++- xen/drivers/passthrough/pci.c | 10 ++ xen/drivers/passthrough/vtd/iommu.c | 20 xen/include/asm-x86/p2m.h | 2 +- xen/include/public/domctl.h | 5 + xen/include/xen/iommu.h | 2 +- 9 files changed, 45 insertions(+), 18 deletions(-) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index c7198a5..3fcdcac 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -899,7 +899,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn, } int set_identity_p2m_entry(struct domain *d, unsigned long gfn, - p2m_access_t p2ma) + p2m_access_t p2ma, u32 flag) { p2m_type_t p2mt; p2m_access_t a; @@ -924,6 +924,12 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn, printk(XENLOG_G_WARNING Cannot identity map d%d:%lx, already mapped to %lx.\n, d-domain_id, gfn, mfn_x(mfn)); + +if ( flag == XEN_DOMCTL_DEV_RDM_RELAXED ) +{ +ret = 0; +printk(XENLOG_G_WARNING Some devices may work failed.\n); Do you need this extra printk? The warning message is already given several lines above and here you just need to change return value for relaxed policy. Okay. +} } gfn_unlock(p2m, gfn, 0); diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c index e83bb35..920b35a 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct domain *target, } static int amd_iommu_assign_device(struct domain *d, u8 devfn, - struct pci_dev *pdev) + struct pci_dev *pdev, + u32 flag) { struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev-seg); int bdf = PCI_BDF2(pdev-bus, devfn); diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c index 6cc4394..9a667e9 100644 --- a/xen/drivers/passthrough/arm/smmu.c +++ b/xen/drivers/passthrough/arm/smmu.c @@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct iommu_domain *domain) } static int arm_smmu_assign_dev(struct domain *d, u8 devfn, - struct device *dev) + struct device *dev, u32 flag) { struct iommu_domain *domain; struct arm_smmu_xen_domain *xen_domain; diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c index 5d3842a..ea85645 100644 --- a/xen/drivers/passthrough/device_tree.c +++ b/xen/drivers/passthrough/device_tree.c @@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev) goto fail; } -rc = hd-platform_ops-assign_device(d, 0, dt_to_dev(dev)); +rc = hd-platform_ops-assign_device(d, 0, dt_to_dev(dev), + XEN_DOMCTL_DEV_NO_RDM); if ( rc ) goto fail; @@ -148,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct domain *d, if ( domctl-u.assign_device.dev != XEN_DOMCTL_DEV_DT ) break; +if ( domctl-u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM ) +{ +printk(XENLOG_G_ERR XEN_DOMCTL_assign_device: assign \%s\ +to dom%u failed (%d) since we don't support RDM.\n, + dt_node_full_name(dev), d-domain_id, ret); +break; +} + if (
Re: [Xen-devel] [RFC v2] misc/xenmicrocode: Upload /lib/firmware/some blob to the hypervisor
On 12/06/15 03:30, Luis R. Rodriguez wrote: On Fri, Jan 30, 2015 at 1:57 PM, Luis R. Rodriguez mcg...@suse.com wrote: On Fri, Jan 30, 2015 at 08:37:33PM +, Andrew Cooper wrote: The right thing to do is to fix the hypercall implementation, at which point the utility below is fine and xc_microcode_update() can be a thing wrapper around the hypercall. The actions Xen needs to take are: - Copy the buffer into Xen. - Scan the buffer for the correct patch - Rendezvous all online cpus in an IPI to apply the patch, and keep the processors in until all have completed the patch. The IPI itself probably wants common rendezvous code, and a system specific call for application. The system specific call will need to adhere to the requirements in the relevant manual. Care will have to be taken to avoid deadlocking with the time calibration rendezvous, and facilities such as the watchdog might temporally need pausing. If you feel up to all that, then please go ahead. If not, I will attempt to find some copious free time. You can have a crack at that. Let me know when the above is ready and I'll respin. I'd try it but it seems you would spend considerbly less time than me doing the above. Andrew, wanted to check to see if you have had time to work on the above. Sadly no - I have not had time. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 03/16] xen/vtd: create RMRR mapping
On 2015/6/12 13:59, Tian, Kevin wrote: From: Chen, Tiejun Sent: Friday, June 12, 2015 1:58 PM On 2015/6/12 10:43, Chen, Tiejun wrote: On 2015/6/11 22:07, Tim Deegan wrote: At 17:31 +0800 on 11 Jun (1434043916), Chen, Tiejun wrote: while ( base_pfn end_pfn ) { -int err = intel_iommu_map_page(d, base_pfn, base_pfn, - IOMMUF_readable|IOMMUF_writable); +int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw); if ( err ) return err; Tim has another comment to replace earlier unmap with Yes, I knew this. guest_physmap_remove_page() which will call iommu unmap internally. Please include this change too. But, guest_physmap_remove_page() | + p2m_remove_page() | + iommu_unmap_page() | + p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), xxx) I think this already remove these pages both on ept/vt-d sides, right? Yes; this is about this code further up in the same function: while ( base_pfn end_pfn ) { if ( intel_iommu_unmap_page(d, base_pfn) ) ret = -ENXIO; base_pfn++; } which ought to be calling guest_physmap_remove_page() or similar, to make sure that both iommu and EPT mappings get removed. I still just think current implementation might be fine at this point. We have two scenarios here, the case of shared ept and the case of non-shared ept. But no matter what case we're tracking, shouldn't guest_physmap_remove_page() always call p2m-set_entry() to clear *all* *valid* mfn which is owned by a given VM? And p2m-set_entry() also calls iommu_unmap_page() internally. So nothing special should further consider. If I'm wrong or misunderstanding, please correct me :) Sorry for my misunderstanding to this. Right now Kevin help me understand what you mean. Sounds like we should address something specific to unmap rmrr here. So I will do this as follows: #1. Provide a clear helper +int clear_identity_p2m_entry(struct domain *d, unsigned long gfn, + unsigned int page_order) +{ +struct p2m_domain *p2m = p2m_get_hostp2m(d); +int ret; +gfn_lock(p2m, gfn, page_order); +ret = p2m_remove_page(p2m, gfn, gfn, page_order); +gfn_unlock(p2m, gfn, page_order); +return ret; +} + #2. Call such a helper @@ -1840,7 +1840,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map, while ( base_pfn end_pfn ) { -if ( intel_iommu_unmap_page(d, base_pfn) ) +if ( clear_identity_p2m_entry(d, base_pfn, 0) ) ret = -ENXIO; base_pfn++; } Is this right? Thanks Tiejun could you explain why existing guest_physmap_remove_page can't serve the purpose so you need invent a new identity mapping specific one? For unmapping suppose it should be common regardless of whether it's identity-mapped or not. :-) I have some concerns here: #1. guest_physmap_remove_page() is a void function without a returning value, so you still need a little change. #2. guest_physmap_remove_page() doesn't make readable in such a code context; rmrr_identity_mapping() { ... guest_physmap_remove_page() ... } #3. a new helper is good to further extend if necessary. Of course, I'd like to modify-to-use guest_physmap_remove_page() if you guys aren't in agreement with me :) Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 5/7] xl: add pvusb commands
On 6/12/2015 at 03:37 PM, in message 557a8c57.1040...@suse.com, Juergen Gross jgr...@suse.com wrote: On 06/10/2015 05:20 AM, Chunyan Liu wrote: Add pvusb commands: usb-ctrl-attach, usb-ctrl-detach, usb-list, usb-attach and usb-detach. To attach a usb device to guest through pvusb, one could follow following example: #xl usb-ctrl-attach test_vm version=1 num_ports=8 #xl usb-list test_vm will show the usb controllers and port usage under the domain. #xl usb-assignable-list will list assignable USB devices xl usb-assignable-list is not part of this patch. Either merge this patch and the following one, or describe the command in the next patch. Oh, yes, I forget to split. #xl usb-attach test_vm 1.6 will find the first usable controller:port, and attach usb device whose bus address is 1.6 (busnum is 1, devnum is 6) to it. One could also specify which controller and which port. #xl usb-detach test_vm 1.6 #xl usb-ctrl-detach test_vm dev_id will destroy the controller with specified dev_id. Dev_id can be traced in usb-list info. Signed-off-by: Chunyan Liu cy...@suse.com Signed-off-by: Simon Cao caobosi...@gmail.com --- docs/man/xl.pod.1 | 38 +++ tools/libxl/xl.h | 5 + tools/libxl/xl_cmdimpl.c | 251 ++ tools/libxl/xl_cmdtable.c | 25 + 4 files changed, 319 insertions(+) ... diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index c858068..b29d0fc 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -3219,6 +3219,257 @@ int main_cd_insert(int argc, char **argv) return 0; } +static void usbinfo_print(libxl_device_usb *usbs, int num) { +int i; Blank line missing. +if (usbs == NULL) + return; +for (i = 0; i num; i++) { +libxl_usbinfo usbinfo; Blank line missing. +libxl_usbinfo_init(usbinfo); + +if (usbs[i].port) +printf( Port %d:, usbs[i].port); +if (!libxl_device_usb_getinfo(ctx, usbs[i], usbinfo)) { +printf( Bus %03x Device %03x: ID %04x:%04x %s %s\n, +usbinfo.busnum, usbinfo.devnum, +usbinfo.idVendor, usbinfo.idProduct, +usbinfo.manuf ?: , usbinfo.prod ?: ); Is it really possible for a device to be assigned but without a port number? For assigned usb device, it's not possible. But this function will be called in two places, one is to list assigned usb devices in 'xl usb-list'; another is to list assignable usb devices in 'xl usb-assignable-list'. If 'usb-assignable-list' is not taken, this could be improved. I'd rather combine the two if's and printf statements. This would avoid the case where Port 1: Port 2: ... is printed due to a failing libxl_device_usb_getinfo() for port 1. +} +libxl_usbinfo_dispose(usbinfo); +} +} + +int main_usbctrl_attach(int argc, char **argv) +{ +uint32_t domid; +int opt; +char *oparg; +libxl_device_usbctrl usbctrl; + +SWITCH_FOREACH_OPT(opt, , NULL, usb-ctrl-attach, 1) { +/* No options */ +} + +domid = find_domain(argv[optind++]); + +libxl_device_usbctrl_init(usbctrl); + +while (argc optind) { +if (MATCH_OPTION(type, argv[optind], oparg)) { +if (!strcmp(oparg, pv)) { + usbctrl.protocol = LIBXL_USB_PROTOCOL_PV; +} else { + fprintf(stderr, unsupported type `%s'\n, oparg); + exit(-1); +} +} else if (MATCH_OPTION(version, argv[optind], oparg)) { +usbctrl.version = atoi(oparg); Shouldn't you check for valid versions? OK. Will check. +} else if (MATCH_OPTION(ports, argv[optind], oparg)) { +usbctrl.ports = atoi(oparg); Same here for number of ports. Otherwise you could blow up xenstore by e.g. specifying 2 billion ports here. Will add check. What's the supported max ports? +} else { +fprintf(stderr, unrecognized argument `%s'\n, argv[optind]); +exit(-1); +} +optind++; +} + +if (usbctrl.protocol == LIBXL_USB_PROTOCOL_AUTO) +usbctrl.protocol = LIBXL_USB_PROTOCOL_PV; Is this really necessary? You do it in libxl, too. Yes, seems not necessary. Anyway (from config file or hotplug), it will call libxl_device_usbctrl_setdefault and do that. Thanks, Chunyan Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list
[Xen-devel] [PATCH] pvusb: don't rely on linux kernel macros for the interface
The interface description of pvUSB lacks some access macros as using linux kernel macros is assumed to work well. This solution is rather unfriendly for pvusb implementations being outside the linux kernel. Additionally things will break quite unpleasent in case the linux kernel implementation is changed. To avoid these problems define own macros for accessing bitfields of the interface and for values of several structure members. While working on the file add some more comments, especially for the xenstore interface. Signed-off-by: Juergen Gross jgr...@suse.com --- xen/include/public/io/usbif.h | 140 -- 1 file changed, 121 insertions(+), 19 deletions(-) diff --git a/xen/include/public/io/usbif.h b/xen/include/public/io/usbif.h index 0af2a38..9ef0cdc 100644 --- a/xen/include/public/io/usbif.h +++ b/xen/include/public/io/usbif.h @@ -31,6 +31,76 @@ #include ring.h #include ../grant_table.h +/* + * Feature and Parameter Negotiation + * = + * The two halves of a Xen pvUSB driver utilize nodes within the XenStore to + * communicate capabilities and to negotiate operating parameters. This + * section enumerates these nodes which reside in the respective front and + * backend portions of the XenStore, following the XenBus convention. + * + * Any specified default value is in effect if the corresponding XenBus node + * is not present in the XenStore. + * + * XenStore nodes in sections marked PRIVATE are solely for use by the + * driver side whose XenBus tree contains them. + * + * + *Backend XenBus Nodes + * + * + *-- Backend Device Identification (PRIVATE) -- + * + * num-ports + * Values: unsigned [1...31] + * + * Number of ports for this (virtual) USB host connector. + * + * usb-ver + * Values: unsigned [1...2] + * + * USB version of this host connector: 1 = USB 1.1, 2 = USB 2.0. + * + * port/[1...31] + * Values: string + * + * Physical USB device connected to the given port, e.g. 3-1.5. + * + * + *Frontend XenBus Nodes + * + * + *--- Request Transport Parameters --- + * + * event-channel + * Values: unsigned + * + * The identifier of the Xen event channel used to signal activity + * in the ring buffer. + * + * urb-ring-ref + * Values: unsigned + * + * The Xen grant reference granting permission for the backend to map + * the sole page in a single page sized ring buffer. This is the ring + * buffer for urb requests. + * + * conn-ring-ref + * Values: unsigned + * + * The Xen grant reference granting permission for the backend to map + * the sole page in a single page sized ring buffer. This is the ring + * buffer for connection/disconnection requests. + * + * protocol + * Values: string (XEN_IO_PROTO_ABI_*) + * Default Value: XEN_IO_PROTO_ABI_NATIVE + * + * The machine ABI rules governing the format of all ring request and + * response structures. + * + */ + enum usb_spec_version { USB_VER_UNKNOWN = 0, USB_VER_USB11, @@ -41,37 +111,65 @@ enum usb_spec_version { /* * USB pipe in usbif_request * - * bits 0-5 are specific bits for virtual USB driver. - * bits 7-31 are standard urb pipe. - * - * - port number(NEW):bits 0-4 - * (USB_MAXCHILDREN is 31) + * - port number: bits 0-4 + * (USB_MAXCHILDREN is 31) * - * - operation flag(NEW): bit 5 - * (0 = submit urb, - * 1 = unlink urb) + * - operation flag: bit 5 + * (0 = submit urb, + * 1 = unlink urb) * * - direction: bit 7 - * (0 = Host-to-Device [Out] - * 1 = Device-to-Host [In]) + * (0 = Host-to-Device [Out] + * 1 = Device-to-Host [In]) * * - device address: bits 8-14 * * - endpoint:bits 15-18 * - * - pipe type: bits 30-31 - * (00 = isochronous, 01 = interrupt, - * 10 = control, 11 = bulk) + * - pipe type: bits 30-31 + * (00 = isochronous, 01 = interrupt, + * 10 = control, 11 = bulk) */ -#define usbif_pipeportnum(pipe) ((pipe) 0x1f) -#define usbif_setportnum_pipe(pipe, portnum) \ -
Re: [Xen-devel] [OSSTEST Nested PATCH v11 6/7] Compose the main recipe of nested test job
On Fri, 2015-06-12 at 11:42 +0800, Robert Hu wrote: Hi Ian J., because nested Xen isn't that matured at present; it is currently 'tech preview' phase. We now just add some sanity test case to defend current work fruits. We can add more complicated test cases later as nested Xen development moves forward. I don't think it needs doing as part of this series, but I do think it would be worth adding the standard suite of tests steps to L2 guests soon after. The way osstest deals with test failures i.e. classifying them into has always failed vs. regression means that it is fine to add tests for features which aren't mature yet, since they will fall into the has always failed set and not block pushes. In fact it is in some sense good to do this since it gives a concrete set of (necessary but not necessarily sufficient) things which need to be fixed for the feature to reach maturity. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
On 12.06.15 at 08:31, tiejun.c...@intel.com wrote: On 2015/6/11 17:28, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev) PCI_DEVFN2(bdf) != devfn ) continue; -rmrr_identity_mapping(pdev-domain, 0, rmrr); +rmrr_identity_mapping(pdev-domain, 0, rmrr, + XEN_DOMCTL_DEV_RDM_RELAXED); ditto It doesn't matter when we're trying to remove a device since we don't care this flag. In such a case it helps to add a brief comment saying that the precise value passed is irrelevant. Or maybe this could be expressed by folding this and the map parameters of the function (in which case it might become self-documenting)? Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] libxl backports for 4.4 and 4.5
On 12.06.15 at 11:32, lars.kurth@gmail.com wrote: Ian, Jan, you seem to be disagreeing, or more likely Ian missed Jan's reply, so I wanted to double check. Right, but unless really urgent/important I think this is too late for 4.5.1 now and needs to be put on the list of backports for 4.5.2. Jan, just to clarify, are you saying back porting for 4.4.3 is OK, but 4.5.1 is too tight? Looking at the 4 months cadence, 4.4.3 would be sometimes in July: so I can live with this as a compromise. Even though having a released version of Xen which works out of the box with OpenStack is pretty important. The release being stuck anyway, perhaps we can put them in with the understanding that this time round we'll need at least another RC. Ian - you know how risky the changes are, please use your judgment. (Btw., as to 4.4.3, I don't think I want to kick off that process until 4.5.1 gets out of its stuck state, i.e. as all stable trees are affected by the underlying problem, I don't want both trees to be hold up half way through the stable release process.) Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 5/7] xl: add pvusb commands
On 06/10/2015 05:20 AM, Chunyan Liu wrote: Add pvusb commands: usb-ctrl-attach, usb-ctrl-detach, usb-list, usb-attach and usb-detach. To attach a usb device to guest through pvusb, one could follow following example: #xl usb-ctrl-attach test_vm version=1 num_ports=8 #xl usb-list test_vm will show the usb controllers and port usage under the domain. #xl usb-assignable-list will list assignable USB devices xl usb-assignable-list is not part of this patch. Either merge this patch and the following one, or describe the command in the next patch. #xl usb-attach test_vm 1.6 will find the first usable controller:port, and attach usb device whose bus address is 1.6 (busnum is 1, devnum is 6) to it. One could also specify which controller and which port. #xl usb-detach test_vm 1.6 #xl usb-ctrl-detach test_vm dev_id will destroy the controller with specified dev_id. Dev_id can be traced in usb-list info. Signed-off-by: Chunyan Liu cy...@suse.com Signed-off-by: Simon Cao caobosi...@gmail.com --- docs/man/xl.pod.1 | 38 +++ tools/libxl/xl.h | 5 + tools/libxl/xl_cmdimpl.c | 251 ++ tools/libxl/xl_cmdtable.c | 25 + 4 files changed, 319 insertions(+) ... diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index c858068..b29d0fc 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -3219,6 +3219,257 @@ int main_cd_insert(int argc, char **argv) return 0; } +static void usbinfo_print(libxl_device_usb *usbs, int num) { +int i; Blank line missing. +if (usbs == NULL) + return; +for (i = 0; i num; i++) { +libxl_usbinfo usbinfo; Blank line missing. +libxl_usbinfo_init(usbinfo); + +if (usbs[i].port) +printf( Port %d:, usbs[i].port); +if (!libxl_device_usb_getinfo(ctx, usbs[i], usbinfo)) { +printf( Bus %03x Device %03x: ID %04x:%04x %s %s\n, +usbinfo.busnum, usbinfo.devnum, +usbinfo.idVendor, usbinfo.idProduct, +usbinfo.manuf ?: , usbinfo.prod ?: ); Is it really possible for a device to be assigned but without a port number? I'd rather combine the two if's and printf statements. This would avoid the case where Port 1: Port 2: ... is printed due to a failing libxl_device_usb_getinfo() for port 1. +} +libxl_usbinfo_dispose(usbinfo); +} +} + +int main_usbctrl_attach(int argc, char **argv) +{ +uint32_t domid; +int opt; +char *oparg; +libxl_device_usbctrl usbctrl; + +SWITCH_FOREACH_OPT(opt, , NULL, usb-ctrl-attach, 1) { +/* No options */ +} + +domid = find_domain(argv[optind++]); + +libxl_device_usbctrl_init(usbctrl); + +while (argc optind) { +if (MATCH_OPTION(type, argv[optind], oparg)) { +if (!strcmp(oparg, pv)) { + usbctrl.protocol = LIBXL_USB_PROTOCOL_PV; +} else { + fprintf(stderr, unsupported type `%s'\n, oparg); + exit(-1); +} +} else if (MATCH_OPTION(version, argv[optind], oparg)) { +usbctrl.version = atoi(oparg); Shouldn't you check for valid versions? +} else if (MATCH_OPTION(ports, argv[optind], oparg)) { +usbctrl.ports = atoi(oparg); Same here for number of ports. Otherwise you could blow up xenstore by e.g. specifying 2 billion ports here. +} else { +fprintf(stderr, unrecognized argument `%s'\n, argv[optind]); +exit(-1); +} +optind++; +} + +if (usbctrl.protocol == LIBXL_USB_PROTOCOL_AUTO) +usbctrl.protocol = LIBXL_USB_PROTOCOL_PV; Is this really necessary? You do it in libxl, too. Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 10/11] x86/intel_pstate: support the use of intel_pstate in pmstat.c
On 12.06.15 at 05:03, wei.w.w...@intel.com wrote: On 11/06/2015 22:02, Julien Grall wrote: On 11/06/2015 04:31, Wei Wang wrote: +else +{ +list_for_each(pos, cpufreq_governor_list) gov_num++; The indentation looks wrong to me. It has four +$, should be correct. The gov_num++ is supposedly the body of the list_for_each(), and hence needs to be indented by four more spaces than the loop header (i.e. a total of 12). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST Nested PATCH v11 6/7] Compose the main recipe of nested test job
On Fri, 2015-06-12 at 17:00 +0800, Robert Hu wrote: On Fri, 2015-06-12 at 09:44 +0100, Ian Campbell wrote: On Fri, 2015-06-12 at 11:42 +0800, Robert Hu wrote: Hi Ian J., because nested Xen isn't that matured at present; it is currently 'tech preview' phase. We now just add some sanity test case to defend current work fruits. We can add more complicated test cases later as nested Xen development moves forward. I don't think it needs doing as part of this series, but I do think it would be worth adding the standard suite of tests steps to L2 guests soon after. Agree. We can separately add those part in a later patch series. The way osstest deals with test failures i.e. classifying them into has always failed vs. regression means that it is fine to add tests for features which aren't mature yet, since they will fall into the has always failed set and not block pushes. In fact it is in some sense good to do this since it gives a concrete set of (necessary but not necessarily sufficient) things which need to be fixed for the feature to reach maturity. We would like this test job's failure to be marked 'regression', which shall block related breaking code's commitment. This is all done automatically, by comparison with previous flights. In essence if it passed last time then it will be considered a regression if it fails and for as long as it keeps failing (unless it is manually overridden via a forced push). So there is nothing to be marked. We can consider to contribute more code later, including proved/success test cases (which will fall into regression as this case), and fail test cases (which shall soon be fixed/developed). You should just add the cases, await the first results and investigate if it is not a pass as expected. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
-Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 12 June 2015 04:22 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/11/2015 09:25 PM, Paul Durrant wrote: -Original Message- From: Yang Hongyang [mailto:yan...@cn.fujitsu.com] Sent: 11 June 2015 13:59 To: Paul Durrant; Wen Congyang; Andrew Cooper; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/11/2015 06:20 PM, Paul Durrant wrote: -Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 11 June 2015 09:48 To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; [...] In our implementation, we don't start a new emulator. The codes can work, but some bugs may be not triggered. How do you reconcile the incoming QEMU save record with the running emulator state? We introduce a qmp command xen-load-devices- state(libxl__qmp_restore) which can restore the emulator state. The step of resotre emulator state at a checkpoint is: 1. libxl__qmp_stop- vm_stop() in qemu 2. libxl__qmp_restore - load_vmstate() in qemu 3. libxl__qmp_resume - vm_start() in qemu Ok, that sounds like the ideal time to hook back into Xen by creating a new ioreq server. I have some questions about ioreq server: 1. If we use old version xen and newest version qemu, is it OK? Is default ioreq server created when the guest is created. xen_create_ioreq_server() does nothing, and xen_get_ioreq_server_info() will get the default ioreq server information. Is it right? No. It's not compatible in that direction. A new Xen will work with an old QEMU but not the other way round. 2. Why we create a default ioreq server when getting the hvm param if there is already a not default ioreq server? If something reads the 'legacy' HVM params then that is Xen's trigger to create the default server. Any 'new' emulator should be using the ioreq server hypercalls so the default server will not be needed. 3. In the far end, we will clear the ioreq page, and this ioreq page is used for default ioreq server, is it right? Yes, AFAIK it's only the 'magic' pages that get cleared at the far end - and that includes the default server pages. Other ioreq servers will have their pages cleared on re-insertion to the P2M at the source end when the server is disabled. Paul Thanks Wen Congyang Paul Paul Thanks Wen Congyang Paul Thanks Wen Congyang Paul We will set to the guest to a new state, the old state should be dropped. Thanks Wen Congyang Paul Thanks Wen Congyang Paul Thanks Wen Congyang ~Andrew . . . ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel . . ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel . -- Thanks, Yang. . ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 3/7] libxl: add pvusb API
On 6/12/2015 at 12:42 AM, in message 21881.47707.526863.158...@mariner.uk.xensource.com, Ian Jackson ian.jack...@eu.citrix.com wrote: Chunyan Liu writes ([Xen-devel] [PATCH V4 3/7] libxl: add pvusb API): Add pvusb APIs, including: ... Thanks for your contribution. I'm afraid I haven't had time to completely finish my review this, but here's what I have: --- /dev/null +++ b/docs/misc/pvusb.txt @@ -0,0 +1,243 @@ +1. Overview It's good that you have provided documentation. But I think this document is a bit confused about its audience. Information about design choices should be removed from this user- and application-facing document, and put in comments in the code, or commit messages, I think. Thanks. Will update. +int libxl_device_usbctrl_getinfo(libxl_ctx *ctx, uint32_t domid, +libxl_device_usbctrl *usbctrl, +libxl_usbctrlinfo *usbctrlinfo) +LIBXL_EXTERNAL_CALLERS_ONLY; Why is this function marked LIBXL_EXTERNAL_CALLERS_ONLY ? Currently libxl itself won't call it. Exposed for toolstack usage. Will remove that if it's not proper. diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index ef7aa1d..89a9f07 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2439,6 +2439,18 @@ _hidden void libxl__device_vtpm_add(libxl__egc *egc, uint32_t domid, ... +/* AO operation to connect a PVUSB controller. + * Adding a PVUSB controller will add a 'vusb' device entry in xenstore, + * and will wait for device connection. In this context I think will wait for device connection is misleading. What you mean is that the vusb is available for adding devices to, but won't have any yet. Nothing in libxl is waiting. Here I mean libxl_wait_for_device_connection. Since it adds a new device entry to xenstore, it needs to wait for a moment for frontend/backend connection. diff --git a/tools/libxl/libxl_pvusb.c b/tools/libxl/libxl_pvusb.c new file mode 100644 index 000..a6e1aa1 --- /dev/null +++ b/tools/libxl/libxl_pvusb.c @@ -0,0 +1,1249 @@ ... +void libxl__device_usbctrl_add(libxl__egc *egc, uint32_t domid, + libxl_device_usbctrl *usbctrl, + libxl__ao_device *aodev) +{ +STATE_AO_GC(aodev-ao); ... +libxl_domain_config_init(d_config); +libxl_device_usbctrl_init(usbctrl_saved); +libxl_device_usbctrl_copy(CTX, usbctrl_saved, usbctrl); Wei will need to review the saved/live saved device info handling, and the json, etc. +static int +libxl_device_usbctrl_remove_common(libxl_ctx *ctx, uint32_t domid, + libxl_device_usbctrl *usbctrl, + const libxl_asyncop_how *ao_how, + int force) As discussed, you mustn't call this within libxl. I don't quite understand why. I guess it's the same as usb_add problem, something related to embedded ao? As in usb_add: libxl_device_usb_add() AO_CREATE(ctx, domid, ao_how) libxl__device_usb_add() libxl__device_usb_setdefault() libxl_device_usbctrl_add_common() AO_CREATE(ctx, domid, NULL) if this is not allowed, what is the correct way? If you need to, you need to break it out into an internal function (libxl__initiate_device_usbctrl_remove, maybe) which makes a callback when done. +libxl_device_usbctrl * +libxl_device_usbctrl_list(libxl_ctx *ctx, uint32_t domid, int *num) +{ +GC_INIT(ctx); +libxl_device_usbctrl *usbctrls = NULL; +char *fe_path = NULL; +char **dir = NULL; +unsigned int ndirs = 0; + +*num = 0; + +fe_path = GCSPRINTF(%s/device/vusb, +libxl__xs_get_dompath(gc, domid)); +dir = libxl__xs_directory(gc, XBT_NULL, fe_path, ndirs); + +if (dir ndirs) { +usbctrls = malloc(sizeof(*usbctrls) * ndirs); Please use libxl__calloc with NOGC. Thanks. Missing this one. +libxl_device_usbctrl* usbctrl; +libxl_device_usbctrl* end = usbctrls + ndirs; +for (usbctrl = usbctrls; usbctrl end; usbctrl++, dir++, (*num)++) { +char *tmp; +const char *be_path = libxl__xs_read(gc, XBT_NULL, +GCSPRINTF(%s/%s/backend, fe_path, *dir)); + +libxl_device_usbctrl_init(usbctrl); +usbctrl-devid = atoi(*dir); This function (and the corresponding writing code) is quite formulaic. Perhaps a macro or something could be used. I would make a similar criticism of libxl_device_usbctrl_getinfo. +/* check if USB device is already assigned to a domain */ +static bool is_usb_assigned(libxl__gc *gc,
Re: [Xen-devel] [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges
On 2015/6/11 17:51, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM When allocating mmio address for PCI bars, we need to make sure they don't overlap with reserved regions. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- tools/firmware/hvmloader/pci.c | 36 ++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c index 5ff87a7..98af568 100644 --- a/tools/firmware/hvmloader/pci.c +++ b/tools/firmware/hvmloader/pci.c @@ -59,8 +59,8 @@ void pci_setup(void) uint32_t bar_reg; uint64_t bar_sz; } *bars = (struct bars *)scratch_start; -unsigned int i, nr_bars = 0; -uint64_t mmio_hole_size = 0; +unsigned int i, j, nr_bars = 0; +uint64_t mmio_hole_size = 0, reserved_end, max_bar_sz = 0; const char *s; /* @@ -226,6 +226,8 @@ void pci_setup(void) bars[i].devfn = devfn; bars[i].bar_reg = bar_reg; bars[i].bar_sz = bar_sz; +if ( bar_sz max_bar_sz ) +max_bar_sz = bar_sz; if ( ((bar_data PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_MEMORY) || @@ -301,6 +303,21 @@ void pci_setup(void) pci_mem_start = 1; } +/* Relocate PCI memory that overlaps reserved space, like RDM. */ +for ( j = 0; j memory_map.nr_map ; j++ ) +{ +if ( memory_map.map[j].type != E820_RAM ) +{ +reserved_end = memory_map.map[j].addr + memory_map.map[j].size; +if ( check_overlap(pci_mem_start, pci_mem_end, + memory_map.map[j].addr, + memory_map.map[j].size) ) +pci_mem_start -= memory_map.map[j].size PAGE_SHIFT; what's the point of subtracting reserved size here? I think you want to move pci_mem_start higher instead of lower to avoid conflict, right? No. +pci_mem_start = (pci_mem_start + max_bar_sz - 1) +~(uint64_t)(max_bar_sz - 1); better have some comment to explain what exactly you're trying to achieve here. Actually I didn't have this code fragment. Here I add this chunk of codes to address one concern that Jan raised. Please see the below. +} +} + if ( mmio_total (pci_mem_end - pci_mem_start) ) { printf(Low MMIO hole not large enough for all devices, @@ -407,8 +424,23 @@ void pci_setup(void) } base = (resource-base + bar_sz - 1) ~(uint64_t)(bar_sz - 1); + reallocate_mmio: In earlier comment you said: +/* Relocate PCI memory that overlaps reserved space, like RDM. */ If pci_mem_start has been relocated to avoid overlapping, how will actual allocation here will conflict again? Sorry I may miss the two relocations here... bar_data |= (uint32_t)base; bar_data_upper = (uint32_t)(base 32); +for ( j = 0; j memory_map.nr_map ; j++ ) +{ +if ( memory_map.map[j].type != E820_RAM ) +{ +reserved_end = memory_map.map[j].addr + memory_map.map[j].size; +if ( check_overlap(base, bar_sz, + memory_map.map[j].addr, + memory_map.map[j].size) ) +{ +base = (reserved_end + bar_sz - 1) ~(uint64_t)(bar_sz - 1); +goto reallocate_mmio; That is because our previous implementation is just skipping that conflict region, But you do nothing to make sure the MMIO regions all fit in the available window (see the code ahead of this relocating RAM if necessary). and ...it simply skips assigning resources. Your changes potentially growing the space needed to fit all MMIO BARs therefore also needs to adjust the up front calculation, such that if necessary more RAM can be relocated to make the hole large enough. And then I replied as follows, You're right. Just think about we're always trying to check pci_mem_start to populate more RAM to obtain enough PCI mempry, /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */ while ( (pci_mem_start PAGE_SHIFT) hvm_info-low_mem_pgend ) { struct xen_add_to_physmap xatp; unsigned int nr_pages = min_t( unsigned int, hvm_info-low_mem_pgend - (pci_mem_start PAGE_SHIFT), (1u 16) - 1); if ( hvm_info-high_mem_pgend == 0 ) hvm_info-high_mem_pgend = 1ull (32 - PAGE_SHIFT); hvm_info-low_mem_pgend -= nr_pages; printf(Relocating 0x%x pages from PRIllx to PRIllx\ for lowmem MMIO hole\n, nr_pages, PRIllx_arg(((uint64_t)hvm_info-low_mem_pgend)PAGE_SHIFT), PRIllx_arg(((uint64_t)hvm_info-high_mem_pgend)PAGE_SHIFT)); xatp.domid = DOMID_SELF; xatp.space =
Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling
On Thu, Jun 11, 2015 at 3:10 PM, Ian Campbell ian.campb...@citrix.com wrote: Draft F follows. Also at: http://xenbits.xen.org/people/ianc/vits/draftF.{pdf,html} Here's a quick update based on feedback prior to meeting on #xenarm at 12:00AM BST / 7:00AM EDT / 4:30PM IST (which is ~1:20 from now) Ian. % Xen on ARM vITS Handling % Ian Campbell ian.campb...@citrix.com % Draft F # Changelog ## Since Draft E * Discussion of `struct pending_irq` * Fix handling of enable/disable, requiring switching back to trapping the virtual cfg table again. get_vlpi_cfg is no longer needed. * Fix p2m_lookup to also use get_page_from_gfn. ## Since Draft D * Fixed assumptions about vLPI-pLPI mapping, which is not possible. This lead to changes to the model for enabling and disabling pLPI and vLPI and the handling of the virtual LPI configuration table, resolving _Unresolved Issue 1_. * Made the pLPI and vLPI interrupt priorities explicit. * Attempted to clarify the trust issues regarding in-guest data structures. * Mandate a particular cacheability for tables in guest memory. ## Since Draft C * _Major_ rework, in an attempt to simplify everything into something more likely to be achievable for 4.6. * Made some simplifying assumptions. * Reduced the scope of some support. * Command emulation is now mostly trivial. * Expanded detail on host setup, allowing other assumptions to be made during emulation. * Many other things lost in the noise of the above. ## Since Draft B * Details of command translation (thanks to Julien and Vijay) * Added background on LPI Translation and Pending tables * Added background on Collections * Settled on `N:N` scheme for vITS:pat's mapping. * Rejigged section nesting a bit. * Since we now thing translation should be cheap, settle on translation at scheduling time. * Lazy `INVALL` and `SYNC` ## Since Draft A * Added discussion of when/where command translation occurs. * Contention on scheduler lock, suggestion to use SOFTIRQ. * Handling of domain shutdown. * More detailed discussion of multiple vs single vits pros/cons. # Introduction ARM systems containing a GIC version 3 or later may contain one or more ITS logical blocks. An ITS is used to route Message Signalled interrupts from devices into an LPI injection on the processor. The following summarises the ITS hardware design and serves as a set of assumptions for the vITS software design. For full details of the ITS see the GIC Architecture Specification. ## Locality-specific Peripheral Interrupts (`LPI`) This is a new class of message signalled interrupts introduced in GICv3. They occupy the interrupt ID space from `8192..(2^32)-1`. The number of LPIs support by an ITS is exposed via `GITS_TYPER.IDbits` (as number of bits - 1), it may be up to 2^32. _Note_: This field also contains the number of Event IDs supported by the ITS. ### LPI Configuration Table Each LPI has an associated configuration byte in the LPI Configuration Table (managed via the GIC Redistributor and placed at `GICR_PROPBASER` or `GICR_VPROPBASER`). This byte configures: * The LPI's priority; * Whether the LPI is enabled or disabled. Software updates the Configuration Table directly but must then issue an invalidate command (per-device `INV` ITS command, global `INVALL` ITS command or write `GICR_INVLPIR`) for the affect to be guaranteed to become visible (possibly requiring an ITS `SYNC` command to ensure completion of the `INV` or `INVALL`). Note that it is valid for an implementation to reread the configuration table at any time (IOW it is _not_ guaranteed that a change to the LPI Configuration Table won't be visible until an invalidate is issued). ### LPI Pending Table Each LPI also has an associated bit in the LPI Pending Table (managed by the GIC redistributor). This bit signals whether the LPI is pending or not. This region may contain out of date information and the mechanism to synchronise is `IMPLEMENTATION DEFINED`. ## Interrupt Translation Service (`ITS`) ### Device Identifiers Each device using the ITS is associated with a unique Device Identifier. The device IDs are properties of the implementation and are typically described via system firmware, e.g. the ACPI IORT table or via device tree. The number of device ids in a system depends on the implementation and can be discovered via `GITS_TYPER.Devbits`. This field allows an ITS to have up to 2^32 devices. ### Events Each device can generate Events (called `ID` in the spec) these correspond to possible interrupt sources in the device (e.g. MSI offset). The maximum number of interrupt sources is device specific. It is usually discovered either from firmware tables (e.g. DT or ACPI) or from bus specific mechanisms (e.g. PCI config space). The maximum number of events ids support by an ITS is exposed via `GITS_TYPER.IDbits` (as number of bits - 1), it
Re: [Xen-devel] [PATCH] libxl: libxl_internal.h: Clarify ao rule against internal callers
On Thu, 2015-06-11 at 17:57 +0100, Ian Jackson wrote: Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com CC: Juergen Gross jgr...@suse.com Acked-by: Ian Campbell ian.campb...@citrix.com AO_INITIATOR_ENTRY was just a stray remainder of some wip name? --- tools/libxl/libxl_internal.h |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 465..bfc0729 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -1911,8 +1911,9 @@ _hidden void libxl__egc_cleanup(libxl__egc *egc); * All slow functions (see below for the exact definition) need to * use the asynchronous operation (ao) machinery. The function * should take a parameter const libxl_asyncop_how *ao_how and must - * start with a call to AO_INITIATOR_ENTRY. These functions MAY NOT - * be called from inside libxl, because they can cause reentrancy + * start with a call to AO_CREATE or equivalent. These functions MAY + * NOT be called from inside libxl (regardless of what is passed for + * ao_how), because they can cause reentrancy hazards due to * callbacks. * * For the same reason functions taking an ao_how may make themselves ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] Modified RTDS scheduler to use an event-driven model instead of polling.
On Wed, 2015-06-10 at 22:50 -0700, Meng Xu wrote: Hi Dario, First I think I got most of the points you raised/explained! They are very very clear and thanks for the detailed explanation of your idea! Glad you got it and (sort of :-) ) agree. 2015-06-09 5:53 GMT-07:00 Dario Faggioli dario.faggi...@citrix.com: 2) Using cyclictest as Dario mentioned before to test the real-time performance at end user. Dagaen, I can provide you the commands to run it, which is actually quite simple to run. So, yes, seeing some results would be great, independently from the specific work done in this patch. Right! I have some ideas about this test but won't want to mess up the focus of this thread. :-) I will raise this test again when we come to it. Ok. Looking forward to see this happening. Yeah, and what I can't figure out is why you decided to do so. The reason I don't like it is that things become a lot (more) complex to understand, maintain and modify. Now I get why you think it is harder to maintain. Right. reusing the timer will just make the rt_schedule complex and make the hot path longer. Exactly. runq_tickle(snext)...WAIT, WHAT?!?! :-O I mean, and I'm noticing this now, if the replenishments done during a particular call to rt_schedule() are not enough to change the situation on that particular pcpu, and hence the task which was running (and that you are deliberately disturbing with _a_full_execution_ of the scheduler's core function!) should continue to do so, you're calling runq_tickle() right on it. So, AFAICT, you're tickling scurr, not a newly replenished vcpu! Am I missing or misreading something? Let's assume not, and see what happens in this case... You are right! Although this issue (i.e., tickling on scurr instead of the next high priority VCPU) can be fixed (dirtily), it can be avoided with the design option a) you said. Of course it can be fixed.. Pretty much everything can! Point is the reason why it happened, and how to make these things not happen and/or easier to figure out. That's (one of) the point(s) of keeping things simple and self contained, even within a single component (like a scheduler), instead of let's do everything in this 10k lines function! :-P Glad that you saw what I meat. Jokes apart, if the above analysis is accurate, I think this is a good enough example of what I meant when saying to Dagaen this is making things too complex. Yes. The flow chart you drew is very clear! Thanks! @Dagaen, what do you think? Please comment on Dario's reply with your opinion and raise any of your concerns. Indeed. Here's how I envision things to go. Again, I'm talking about sticking with option a), so no per-vcpu timers, just 1 timer and a global queue, which now is a replenishment queue: timer interrupt TIMER_SOFTIRQ raised process softirqs replenishment_timer_handler() [spin_lock] for_each_replenishment_event(repl_time NOW()) { replenish(vcpu) runq_tickle(vcpu) } [spin_lock] Then, on the tickled pcpus (if any): process softirqs SCHEDULE_SOFTIRQ rt_schedule() [spin_lock] snext = runq_pick(): snext == vcpu X [spin_unlock] And get rid of __repl_update(), which makes my eyes bleed every time I open sched_rt.c. :-) Oh, and note that we probably can use a new spin lock for the replenishment queue, different from the existing global scheduler spinlock, lowering contention and increasing scalabiliy even further! Here is the main question I have about your advice. If we are introducing a new spin lock for the replenishment queue, what is the relation between the new lock and the old lock of the runq? That's hard to tell in advance, you'll know it completely only when trying to implement it. But, yes, when more locks are involved, it's impotant to figure out the relationships between each other, or we risk introducing deadlock or, even if things are correct, fail to improve the performance (or do even worse!!). The idea is, since the two operations (scheduling/budget enforcement and replenishments) are logically independent, and if they're implemented in a way that stress this independence, then it make sens to try to use different spin locks. As soon as you have more than one spin lock, what you should pay the most attention to is, if they need to 'nest' (one is acquired when the other is already being held), that has to happen consistently, or deadlock will occur! :-/ Because the deadline decides the priority of VCPUs and thus decides the ordering of VCPUs in the runq, the replenish(vcpu) will operate on the runq as well. As shown in the workflow in your reply: replenish(vcpu) runq_tickle(vcpu) The runq_tickle(vcpu) will pick the desired CPU. On that CPU,
Re: [Xen-devel] [RFC v2 07/15] vt-d: Add API to update IRTE when VT-d PI is used
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, June 09, 2015 10:33 PM To: Wu, Feng Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: Re: [RFC v2 07/15] vt-d: Add API to update IRTE when VT-d PI is used On 08.05.15 at 11:07, feng...@intel.com wrote: +bool_t pi_update_irte(struct vcpu *v, struct pirq *pirq, uint8_t gvec) Without seeing the caller right away it's hard to judge, but generally I'd prefer functions to return -E... values as error indicators, i.e. +{ +struct irq_desc *desc; +struct msi_desc *msi_desc; +int remap_index; +bool_t rc = 0; +struct pci_dev *pci_dev; +struct acpi_drhd_unit *drhd; +struct iommu *iommu; +struct ir_ctrl *ir_ctrl; +struct iremap_entry *iremap_entries = NULL, *p = NULL; +struct iremap_entry new_ire; +struct pi_desc *pi_desc = v-arch.hvm_vmx.pi_desc; +unsigned long flags; + +desc = pirq_spin_lock_irq_desc(pirq, NULL); +if ( !desc ) +return 0; -ENOMEM +msi_desc = desc-msi_desc; +if ( !msi_desc ) +goto unlock_out; -EBADSLT +pci_dev = msi_desc-dev; +if ( !pci_dev ) +goto unlock_out; -ENODEV +remap_index = msi_desc-remap_index; +drhd = acpi_find_matched_drhd_unit(pci_dev); +if ( !drhd ) +{ +dprintk(XENLOG_INFO VTDPREFIX, +%pv: failed to get drhd, pci device: +%04x:%02x:%02x.%u, guest vector: %u\n, +v, pci_dev-seg, pci_dev-bus, PCI_SLOT(pci_dev-devfn), +PCI_FUNC(pci_dev-devfn), gvec); +goto unlock_out; +} + +iommu = drhd-iommu; +ir_ctrl = iommu_ir_ctrl(iommu); +if ( !ir_ctrl ) +{ +dprintk(XENLOG_INFO VTDPREFIX, +%pv: failed to get ir_ctrl, pci device: +%04x:%02x:%02x.%u, guest vector: %u\n, +v, pci_dev-seg, pci_dev-bus, PCI_SLOT(pci_dev-devfn), +PCI_FUNC(pci_dev-devfn), gvec); +goto unlock_out; +} Do you think these log messages are useful beyond your bringup purposes? So what kind of message do you think I need to show here? Thanks, Feng +spin_lock_irqsave(ir_ctrl-iremap_lock, flags); + +GET_IREMAP_ENTRY(ir_ctrl-iremap_maddr, remap_index, iremap_entries, p); + +memcpy(new_ire, p, sizeof(new_ire)); Please use structure assignment (being type safe) in preference to memcpy() (not being type safe). --- a/xen/drivers/passthrough/vtd/iommu.h +++ b/xen/drivers/passthrough/vtd/iommu.h @@ -327,6 +327,10 @@ struct iremap_entry { }; }; +#define PDA_LOW_BIT26 +#define PDA_HIGH_BIT 32 +#define PDA_MASK(XX) (~(-1UL PDA_##XX##_BIT)) To me it would look more natural if you used ~0UL. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 14/15] Suppress posting interrupts when 'SN' is set
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, June 10, 2015 2:49 PM To: Wu, Feng Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: Re: [RFC v2 14/15] Suppress posting interrupts when 'SN' is set On 08.05.15 at 11:07, feng...@intel.com wrote: --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1664,9 +1664,20 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v) static void vmx_deliver_posted_intr(struct vcpu *v, u8 vector) { +int r, sn; + if ( pi_test_and_set_pir(vector, v-arch.hvm_vmx.pi_desc) ) return; +/* + * Currently, we don't support urgent interrupt, all interrupts + * are recognized as non-urgent interrupt, so we cannot send + * posted-interrupt when 'SN' is set. + */ + +sn = v-arch.hvm_vmx.pi_desc.sn; +r = pi_test_and_set_on(v-arch.hvm_vmx.pi_desc); I'm probably misunderstanding something here, but to me this looks like a change that would need to be done quite a bit earlier in the series (i.e. at this point it looks like it's fixing a bug/oversight of an earlier patch). From hardware p.o.v, if 'SN' is set, the hardware doesn't send notification event. vmx_deliver_posted_intr() is the software way to delivery posted-interrupts, so we need to follow the HW's behavior. Hence we check 'SN' first, and not send notification event if it is set. Apart from that I'm also not understanding the synchronization aspect here: What if SN gets set after having been latched above, but before the latched value gets looked at below? Yes, that is a question. Here is the scenario your described above, right? .. sn = v-arch.hvm_vmx.pi_desc.sn; /*sn gets 0 here*/ v-arch.hvm_vmx.pi_desc.sn gets set by others else if ( !r !sn ) /*Oops, sn cannot reflect the real v-arch.hvm_vmx.pi_desc.sn here*/ .. Maybe I need think about how to handle this. Thanks, Feng Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] VT-d async invalidation for Device-TLB.
On 12.06.15 at 04:40, quan...@intel.com wrote: On 10.06.15 at 16:05, jbeul...@suse.com wrote: On 03.06.15 at 09:49, quan...@intel.com wrote: For Context Invalidation and IOTLB invalidation without Device-TLB invalidation, Invalidation Queue flushes synchronous invalidation as before(This is a tradeoff and the cost of interrupt is overhead). DMAR_OPERATION_TIMEOUT being 1s, are you saying that you're not intending to replace the current spinning for the non-ATS case? Yes, we are not intending to replace the current spinning for the non-ATS case. I'm not really happy about that. Considering that expiring these loops results in panic()s, I would expect these to become asynchronous _and_ contained to the affected VM alongside the ATS induced changed behavior. You talking of overhead - can you quantify that? I tested it by a Myri-10G Dual-Protocol NIC, which is an ATS device. for an invalidation: By sync way, it takes about 1.4 ms. By async way, it takes about 4.3 ms. What's the theory on why this is? After all, it shouldn't matter how the completion of the invalidation gets signaled. Apart from that measuring the ATS case (in which case we're set to use async mode anyway) is kind of pointless here - we'd need to know the overhead of non-ATS async compared to non-ATS sync. More details: 1. invalidation table. We define iommu _invl structure in domain. Struct iommu _invl { volatile u64 iommu _invl _poll_slot :62; domid_t dom_id; u64 iommu _invl _status_data :32; }__attribute__ ((aligned (64))); iommu _invl _poll_slot: Set it equal to the status address of wait descriptor when the invalidation queue is with Device-TLB. dom_id: Keep the id of the domain. iommu _invl _status_data: Keep the count of in-flight queue with Device-TLB invalidation. Without further explanation above/below I don't think I really understand the purpose of this structure, nor its organization: Is this something imposed by the VT-d specification? If so, a reference to the respective section in the spec would be useful. If not, I can't see why the structure is laid out the (odd) way it is. Refer to the explanation above. If it is still not clear, I will continue to explain in next email. The explanation above helped for what I asked above, but didn't make clear to me what the structure here is, how it relates to hw defined structures, and hence (as said) why it is laid out the way it is. 4. New interrupt handler for invalidation completion: - when hardware completes the invalidations with Device IOTLB, it generates an interrupt to notify hypervisor. - In interrupt handler, we will schedule a soft-irq to handle the finished invalidations. - soft-irq to handle finished invalidation: Scan the pending flush list for each entry in list check the values of iommu _invl _poll_slot and iommu _invl _status_data in each domain's invalidation table. if yes, clear iommu_pending_flush and invalidation table, then wakeup the domain. Did you put some consideration into how long this list may get, and hence how long it may take you to iterate through the entire list? Only the domain which has the ATS device assigned will be tracked in this list. So the list length shouldn't be very long. Okay, if this is a list of domains (or of devices), that would hopefully be acceptable (albeit on a huge system this could still be dozens). If this was a list of pending flush requests, it might be worse. Besides, the DEVICE-IOTLB invalidation doesn't happened frequently so the cost should be acceptable. That's not a valid consideration: At no time must any processing inside the hypervisor take arbitrarily long. This requirement is entirely independent of how frequently such cases may occur. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 00/16] Fix RMRR
On 12.06.15 at 04:10, tiejun.c...@intel.com wrote: On 2015/6/11 20:52, Tim Deegan wrote: which would be better handeld explicitly: if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm ) ... So if I'm correct, we should do this check explicitly, if ( p2mt == p2m_invalid || (p2mt == p2m_mmio_dm !mfn_valid(mfn) ) Note this is equivalent to Jan's comment. I think the !mfn_valid() part is really redundant - p2m_mmio_dm should never be put on a page translating to a valid MFN. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST Nested PATCH v11 6/7] Compose the main recipe of nested test job
On Fri, 2015-06-12 at 09:44 +0100, Ian Campbell wrote: On Fri, 2015-06-12 at 11:42 +0800, Robert Hu wrote: Hi Ian J., because nested Xen isn't that matured at present; it is currently 'tech preview' phase. We now just add some sanity test case to defend current work fruits. We can add more complicated test cases later as nested Xen development moves forward. I don't think it needs doing as part of this series, but I do think it would be worth adding the standard suite of tests steps to L2 guests soon after. Agree. We can separately add those part in a later patch series. The way osstest deals with test failures i.e. classifying them into has always failed vs. regression means that it is fine to add tests for features which aren't mature yet, since they will fall into the has always failed set and not block pushes. In fact it is in some sense good to do this since it gives a concrete set of (necessary but not necessarily sufficient) things which need to be fixed for the feature to reach maturity. We would like this test job's failure to be marked 'regression', which shall block related breaking code's commitment. We can consider to contribute more code later, including proved/success test cases (which will fall into regression as this case), and fail test cases (which shall soon be fixed/developed). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
On 2015/6/12 16:45, Jan Beulich wrote: On 12.06.15 at 08:31, tiejun.c...@intel.com wrote: On 2015/6/11 17:28, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev) PCI_DEVFN2(bdf) != devfn ) continue; -rmrr_identity_mapping(pdev-domain, 0, rmrr); +rmrr_identity_mapping(pdev-domain, 0, rmrr, + XEN_DOMCTL_DEV_RDM_RELAXED); ditto It doesn't matter when we're trying to remove a device since we don't care this flag. In such a case it helps to add a brief comment saying that the precise value passed is irrelevant. Or maybe this could be expressed by Okay. folding this and the map parameters of the function (in which case it might become self-documenting)? Sorry, I don't know exactly how to implement this idea. Have we any similar example on Xen side? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, June 09, 2015 11:09 PM To: Wu, Feng Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: Re: [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU On 08.05.15 at 11:07, feng...@intel.com wrote: This patch adds a new per-vCPU tasklet to wakeup the blocked vCPU. It can be used in the case vcpu_unblock cannot be called directly. This tasklet will be used in later patch in this series. Signed-off-by: Feng Wu feng...@intel.com --- xen/common/domain.c | 11 +++ xen/include/xen/sched.h | 3 +++ 2 files changed, 14 insertions(+) What is the rationale for doing this in common code, modifying common structures? Do you mean moving this tasklet to the architecture specific structure, such as, struct arch_vcpu ? Thanks, Feng Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 08/15] Update IRTE according to guest interrupt config changes
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, June 09, 2015 11:06 PM To: Wu, Feng Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: Re: [RFC v2 08/15] Update IRTE according to guest interrupt config changes On 08.05.15 at 11:07, feng...@intel.com wrote: +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id, +uint8_t dest_mode, uint8_t delivery_mode, +uint8_t gvec, struct vcpu **dest_vcpu) +{ +struct vcpu *v, **dest_vcpu_array; +unsigned int dest_vcpu_num = 0; +int ret; This, being given as operand to return, should match in type with the function's return type. +dest_vcpu_array = xzalloc_array(struct vcpu *, d-max_vcpus); You realize that this can be quite big an allocation? (You could at least halve it by storing vCPU IDs instead of pointers, but if I'm not mistaken this could even be a simple bitmap.) If we use vCPU IDs or bitmap, we need to iterate all the vCPUs in the domain to find the right vCPU from the vcpu_id, right? +if ( !dest_vcpu_array ) +{ +dprintk(XENLOG_G_INFO, +dom%d: failed to allocate memeory.\n, d-domain_id); Please fix the typo and remove the stop. +return 0; +} + +for_each_vcpu ( d, v ) +{ +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL, 0, +dest_id, dest_mode) ) +continue; + +dest_vcpu_array[dest_vcpu_num++] = v; +} + +if ( delivery_mode == dest_LowestPrio ) +{ +if ( dest_vcpu_num != 0 ) +{ +*dest_vcpu = dest_vcpu_array[gvec % dest_vcpu_num]; +ret = 1; +} +else +ret = 0; +} +else if ( dest_vcpu_num == 1 ) +{ +*dest_vcpu = dest_vcpu_array[0]; +ret = 1; +} +else +ret = 0; + +xfree(dest_vcpu_array); +return ret; +} Blank line before final return statement please. @@ -330,11 +398,40 @@ int pt_irq_create_bind( /* Calculate dest_vcpu_id for MSI-type pirq migration. */ dest = pirq_dpci-gmsi.gflags VMSI_DEST_ID_MASK; dest_mode = !!(pirq_dpci-gmsi.gflags VMSI_DM_MASK); +delivery_mode = (pirq_dpci-gmsi.gflags GFLAGS_SHIFT_DELIV_MODE) +VMSI_DELIV_MASK; dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode); pirq_dpci-gmsi.dest_vcpu_id = dest_vcpu_id; spin_unlock(d-event_lock); if ( dest_vcpu_id = 0 ) hvm_migrate_pirqs(d-vcpu[dest_vcpu_id]); + +/* Use interrupt posting if it is supported */ +if ( iommu_intpost ) +{ +struct vcpu *vcpu = NULL; + +if ( !pi_find_dest_vcpu(d, dest, dest_mode, delivery_mode, +pirq_dpci-gmsi.gvec, vcpu) ) Why not have the function return the vCPU instead of passing it via indirection? Good suggestion! Thanks, Feng +{ +dprintk(XENLOG_G_WARNING, +%pv: failed to find the dest vCPU for PI, guest +vector:%u use software way to deliver the + interrupts.\n, vcpu, pirq_dpci-gmsi.gvec); You shouldn't be printing the (NULL) vCPU here. And please print vectors as hex values. +break; +} + +if ( pi_update_irte( vcpu, info, pirq_dpci-gmsi.gvec ) != 0 ) +{ +dprintk(XENLOG_G_WARNING, +%pv: failed to update PI IRTE, guest vector:%u +use software way to deliver the interrupts.\n, +vcpu, pirq_dpci-gmsi.gvec); + +break; +} +} + break; By using if() / else if() you could drop _both_ break-s you add. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 07/15] vt-d: Add API to update IRTE when VT-d PI is used
-Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, June 10, 2015 2:18 PM To: Wu, Feng Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org Subject: Re: [RFC v2 07/15] vt-d: Add API to update IRTE when VT-d PI is used On 08.05.15 at 11:07, feng...@intel.com wrote: +static inline void setup_posted_irte( +struct iremap_entry *new_ire, struct pi_desc *pi_desc, uint8_t gvec) +{ +new_ire-post.urg = 0; +new_ire-post.vector = gvec; +new_ire-post.pda_l = (((u64)virt_to_maddr(pi_desc)) + (32 - PDA_LOW_BIT)) PDA_MASK(LOW); +new_ire-post.pda_h = (((u64)virt_to_maddr(pi_desc)) 32) + PDA_MASK(HIGH); Looking at this another time I can't see what the and-ing with PAD_MASK() is supposed to be good for. I cannot understand this well. Do you mean we don't need and PDA_MASK() here? Thanks, Feng Nor can I see the need for the u64 casts. The two literal 32-s aren't ideal either, but I guess tolerable. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
-Original Message- From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Jan Beulich Sent: Tuesday, June 09, 2015 11:20 PM To: Wu, Feng Cc: Tian, Kevin; k...@xen.org; george.dun...@eu.citrix.com; andrew.coop...@citrix.com; xen-devel@lists.xen.org; Zhang, Yang Z Subject: Re: [Xen-devel] [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts On 08.05.15 at 11:07, feng...@intel.com wrote: @@ -3262,6 +3271,28 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs) } /* + * Handle VT-d posted-interrupt when VCPU is blocked. + */ +void pi_wakeup_interrupt(struct cpu_user_regs *regs) static (and perhaps move it up so you don't need to forward declare it). +{ +struct vcpu *v; +unsigned int cpu = smp_processor_id(); + +spin_lock(per_cpu(blocked_vcpu_lock, cpu)); +list_for_each_entry(v, per_cpu(blocked_vcpu, cpu), +blocked_vcpu_list) { How long do you think such a list can grow? It depends on how many vCPU is current blocked on this specific pCPU. I am not sure how long it can grow. The basic idea here is that, when notification wakeup event happens, we need to find a way to find the right vCPU, and then wakeup it, do you have any better idea for this? Thanks, Feng I'm afraid you might be adding quite a bit of latency here to the system. +struct pi_desc *pi_desc = v-arch.hvm_vmx.pi_desc; + +if ( pi_desc-on == 1 ) Isn't this a single-bit (i.e. boolean) field? In which case - no reason to compare with 1. I also don't see the value of the local variable - it's being used only once, and without it the line wouldn't get overly long or unreadable. --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -148,6 +148,8 @@ struct vcpu struct vcpu *next_in_list; +struct list_head blocked_vcpu_list; Again - why here instead of in VT-d/VMX specific structures? Do you think it is okay to put it in struct arch_vmx_struct ? Thanks, Feng Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [libvirt test] 58401: tolerable FAIL - PUSHED
flight 58401 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/58401/ Failures :-/ but no regressions. Regressions which are regarded as allowable (not blocking): test-amd64-i386-libvirt 11 guest-start fail like 58334 test-amd64-amd64-libvirt 11 guest-start fail like 58334 test-amd64-i386-libvirt-xsm 11 guest-start fail like 58334 test-armhf-armhf-libvirt 11 guest-start fail like 58334 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 11 guest-start fail never pass version targeted for testing: libvirt c0ef99525d55aecbcf84dea73ba6753c243680ca baseline version: libvirt c178d38b8faabf93521b4013bc6772c0f5e6b9e0 People who touched revisions under test: Eric Blake ebl...@redhat.com Michal Privoznik mpriv...@redhat.com jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-xsm pass test-armhf-armhf-libvirt-xsm fail test-amd64-i386-libvirt-xsm fail test-amd64-amd64-libvirt fail test-armhf-armhf-libvirt fail test-amd64-i386-libvirt fail sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : + branch=libvirt + revision=c0ef99525d55aecbcf84dea73ba6753c243680ca + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getconfig Repos +++ perl -e ' use Osstest; readglobalconfig(); print $c{Repos} or die $!; ' ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x '!=' x/home/osstest/repos/lock ']' ++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock ++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push libvirt c0ef99525d55aecbcf84dea73ba6753c243680ca + branch=libvirt + revision=c0ef99525d55aecbcf84dea73ba6753c243680ca + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getconfig Repos +++ perl -e ' use Osstest; readglobalconfig(); print $c{Repos} or die $!; ' ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']' + . cri-common ++ . cri-getconfig ++ umask 002 + select_xenbranch + case $branch in + tree=libvirt + xenbranch=xen-unstable + '[' xlibvirt = xlinux ']' + linuxbranch= + '[' x = x ']' + qemuubranch=qemu-upstream-unstable + : tested/2.6.39.x + . ap-common ++ : osst...@xenbits.xen.org +++ getconfig OsstestUpstream +++ perl -e ' use Osstest; readglobalconfig(); print $c{OsstestUpstream} or die $!; ' ++ : ++ : git://xenbits.xen.org/xen.git ++ : osst...@xenbits.xen.org:/home/xen/git/xen.git ++ : git://xenbits.xen.org/staging/qemu-xen-unstable.git ++ : git://git.kernel.org ++ : git://git.kernel.org/pub/scm/linux/kernel/git ++ : git ++ : git://libvirt.org/libvirt.git ++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git ++ : git://xenbits.xen.org/libvirt.git ++ : git://xenbits.xen.org/rumpuser-xen.git ++ : git ++ : git://xenbits.xen.org/rumpuser-xen.git ++ : osst...@xenbits.xen.org:/home/xen/git/rumpuser-xen.git +++ besteffort_repo https://github.com/rumpkernel/rumpkernel-netbsd-src +++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src +++ cached_repo https://github.com/rumpkernel/rumpkernel-netbsd-src '[fetch=try]' +++ local
Re: [Xen-devel] [OSSTEST Nested PATCH v11 5/7] Add new script to customize nested test configuration
On Fri, 2015-06-12 at 03:46 +, Pang, LongtaoX wrote: -Original Message- From: Ian Jackson [mailto:ian.jack...@eu.citrix.com] Sent: Thursday, June 11, 2015 11:15 PM To: Pang, LongtaoX Cc: xen-devel@lists.xen.org; ian.campb...@citrix.com; wei.l...@citrix.com; Hu, Robert Subject: RE: [OSSTEST Nested PATCH v11 5/7] Add new script to customize nested test configuration Pang, LongtaoX writes (RE: [OSSTEST Nested PATCH v11 5/7] Add new script to customize nested test configuration): [Ian Jackson [mailto:ian.jack...@eu.citrix.com]] We would avoid having to mention /dev/xvdb if we created the VG in the host, before doing block-attach. I'm not sure whether that's an improvement. I am sorry, I cannot get your point. Could you please make it more clear? Thanks. As you explain in the comment, you have to mention /dev/xvdb here even though after rebooting into Xen this will be /dev/sdb. This wrinkle would become invisible if you did pvcreate and vgcreate in the L0 (before attach), rather than the L1 (after attach). Because, then none of your operations on L1 would not need to name the physical device. Thanks for your explanation. For nested job, we will install L2 guest inside L1 reuse 'ts-debian-hvm-install' script. But according to recent code design, inside L1 HVM guest, there is no vg group, so that we need to create a vg group for L2 installation. So, inside L0, we create storage lv(lvcreate -L ${guest_storage_lv_size}M -n $guest_storage_lv_name $vgname), attach this lv to L1. Then inside L1, using the attached disk to create a pv(pvcreate /dev/xvdb), then create a vg(vgcreate ${l1_gn}-disk /dev/xvdb) base on the pv. Then, using this vg for installing L2. I think '/dev/xvdb' is necessary when create vg inside L1, using command 'vgcreate ${l1_gn}-disk /dev/xvdb' . Please correct me if I am wrong. I think Ian J's suggestion is that instead of creating the ${l1_gn}-disk VG via commands on /dev/xvdb in the L1 you can create it via commands on ${guest_storage_lvdev} in L0. e.g. instead of +target_cmd_root($l1, pvcreate /dev/xvdb vgcreate ${l1_gn}-disk /dev/xvdb); you can instead do: +target_cmd_root($l0, pvcreate ${guest_storage_lvdev} vgcreate ${l1_gn}-disk /dev/xvdb); before you do the block-attach. That way the VG is already present on the device when it appears in L1 and L1 will just find it. This avoids any confusion about sdb vs xvdb naming. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [linux-linus test] 58399: regressions - FAIL
flight 58399 linux-linus real [real] http://logs.test-lab.xenproject.org/osstest/logs/58399/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-rumpuserxen-amd64 15 rumpuserxen-demo-xenstorels/xenstorels.repeat fail REGR. vs. 58128 Regressions which are regarded as allowable (not blocking): test-amd64-i386-libvirt-xsm 11 guest-start fail REGR. vs. 58128 test-amd64-i386-libvirt 11 guest-start fail like 58128 test-amd64-amd64-libvirt 11 guest-start fail like 58128 test-amd64-i386-freebsd10-amd64 9 freebsd-install fail like 58128 test-amd64-i386-freebsd10-i386 9 freebsd-install fail like 58128 test-armhf-armhf-libvirt-xsm 11 guest-start fail like 58128 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 58128 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail like 58128 test-armhf-armhf-libvirt 11 guest-start fail like 58128 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvh-intel 13 guest-saverestorefail never pass test-amd64-amd64-xl-pvh-amd 11 guest-start fail never pass test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 12 migrate-support-checkfail never pass test-armhf-armhf-xl 12 migrate-support-checkfail never pass test-armhf-armhf-xl-xsm 12 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf 12 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail never pass test-armhf-armhf-xl-sedf-pin 12 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop fail never pass test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 12 migrate-support-checkfail never pass version targeted for testing: linux2bfc60dd2869745f5bbed9583c517cc76e3db924 baseline version: linux37ef1647b7f73d4ff4c7993984599b6c4f26443a People who touched revisions under test: Eric W. Biederman ebied...@xmission.com Tom Herbert t...@herbertland.com Aaro Koskinen aaro.koski...@nokia.com Alban Bedel al...@free.fr Alexei Starovoitov a...@plumgrid.com Andrew Morton a...@linux-foundation.org Anjali Singhai Jain anjali.sing...@intel.com Arnd Bergmann a...@arndb.de David S. Miller da...@davemloft.net David Woodhouse david.woodho...@intel.com Dmitry Torokhov dmitry.torok...@gmail.com Felipe Balbi ba...@ti.com Flavio Leitner f...@redhat.com Florian Fainelli f.faine...@gmail.com Gregory CLEMENT gregory.clem...@free-electrons.com Gu Zheng guz.f...@cn.fujitsu.com Guenter Roeck li...@roeck-us.net Hannes Frederic Sowa han...@stressinduktion.org Hauke Mehrtens ha...@hauke-m.de Heiko Carstens heiko.carst...@de.ibm.com Huacai Chen che...@lemote.com Imre Kaloz ka...@openwrt.org James Hogan james.ho...@imgtec.com Jeff Kirsher jeffrey.t.kirs...@intel.com Jeroen Hofstee jer...@myspectrum.nl Jesse Brandeburg jesse.brandeb...@intel.com Jiang Liu jiang@linux.intel.com Jim Young james.m.yo...@intel.com Jiri Benc jb...@redhat.com Joe Perches j...@perches.com Joerg Roedel jroe...@suse.de Johannes Weiner han...@cmpxchg.org Jonas Gorski j...@openwrt.org Josh Hunt joh...@akamai.com Joshua Kinard ku...@gentoo.org Kevin Hilman khil...@linaro.org Krzysztof Kozlowski k.kozlow...@samsung.com Lendacky, Thomas thomas.lenda...@amd.com Lennox Wu lennox...@gmail.com Linus Torvalds torva...@linux-foundation.org Maciej W. Rozycki ma...@linux-mips.org Marc Zyngier marc.zyng...@arm.com Markos Chandras markos.chand...@imgtec.com Matthias Brugger matthias@gmail.com Matthijs van Duin matthijsvand...@gmail.com Mel Gorman mgor...@suse.de Michael Holzheu holz...@linux.vnet.ibm.com Michael S. Tsirkin m...@redhat.com Michal Hocko mho...@suse.cz Minchan Kim minc...@kernel.org Nadav Haklai nad...@marvell.com Nicholas Mc Guire hof...@osadl.org Nicolas Schichan nschic...@freebox.fr Nikolay Aleksandrov niko...@cumulusnetworks.com Pavel Machek pa...@ucw.cz Peter Hutterer peter.hutte...@who-t.net Pravin B Shelar pshe...@nicira.com Ralf Baechle r...@linux-mips.org Robert Nelson robertcnel...@gmail.com Robert Shearman rshea...@brocade.com Sascha Hauer s.ha...@pengutronix.de Sergey Senozhatsky sergey.senozhat...@gmail.com Shawn Bohrer sboh...@rgmadvisors.com Sriharsha Basavapatna sriharsha.basavapa...@avagotech.com Tejun Heo t...@kernel.org Tero Kristo t-kri...@ti.com Thomas Petazzoni
Re: [Xen-devel] [v3][PATCH 03/16] xen/vtd: create RMRR mapping
From: Chen, Tiejun Sent: Friday, June 12, 2015 1:58 PM On 2015/6/12 10:43, Chen, Tiejun wrote: On 2015/6/11 22:07, Tim Deegan wrote: At 17:31 +0800 on 11 Jun (1434043916), Chen, Tiejun wrote: while ( base_pfn end_pfn ) { -int err = intel_iommu_map_page(d, base_pfn, base_pfn, - IOMMUF_readable|IOMMUF_writable); +int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw); if ( err ) return err; Tim has another comment to replace earlier unmap with Yes, I knew this. guest_physmap_remove_page() which will call iommu unmap internally. Please include this change too. But, guest_physmap_remove_page() | + p2m_remove_page() | + iommu_unmap_page() | + p2m_set_entry(p2m, gfn, _mfn(INVALID_MFN), xxx) I think this already remove these pages both on ept/vt-d sides, right? Yes; this is about this code further up in the same function: while ( base_pfn end_pfn ) { if ( intel_iommu_unmap_page(d, base_pfn) ) ret = -ENXIO; base_pfn++; } which ought to be calling guest_physmap_remove_page() or similar, to make sure that both iommu and EPT mappings get removed. I still just think current implementation might be fine at this point. We have two scenarios here, the case of shared ept and the case of non-shared ept. But no matter what case we're tracking, shouldn't guest_physmap_remove_page() always call p2m-set_entry() to clear *all* *valid* mfn which is owned by a given VM? And p2m-set_entry() also calls iommu_unmap_page() internally. So nothing special should further consider. If I'm wrong or misunderstanding, please correct me :) Sorry for my misunderstanding to this. Right now Kevin help me understand what you mean. Sounds like we should address something specific to unmap rmrr here. So I will do this as follows: #1. Provide a clear helper +int clear_identity_p2m_entry(struct domain *d, unsigned long gfn, + unsigned int page_order) +{ +struct p2m_domain *p2m = p2m_get_hostp2m(d); +int ret; +gfn_lock(p2m, gfn, page_order); +ret = p2m_remove_page(p2m, gfn, gfn, page_order); +gfn_unlock(p2m, gfn, page_order); +return ret; +} + #2. Call such a helper @@ -1840,7 +1840,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map, while ( base_pfn end_pfn ) { -if ( intel_iommu_unmap_page(d, base_pfn) ) +if ( clear_identity_p2m_entry(d, base_pfn, 0) ) ret = -ENXIO; base_pfn++; } Is this right? Thanks Tiejun could you explain why existing guest_physmap_remove_page can't serve the purpose so you need invent a new identity mapping specific one? For unmapping suppose it should be common regardless of whether it's identity-mapped or not. :-) Thanks Kevin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [ovmf test] 58398: tolerable FAIL - PUSHED
flight 58398 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/58398/ Failures :-/ but no regressions. Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stopfail REGR. vs. 56492 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail REGR. vs. 56492 version targeted for testing: ovmf 13af4ab06516eefb40fb985467141e09efe9c58b baseline version: ovmf f1f0f0deb6e64df6b7c04ead7330afecf5537e46 People who touched revisions under test: Ma, Maurice maurice...@intel.com Mudusuru, Giri P giri.p.mudus...@intel.com Yao, Jiewen jiewen@intel.com Zachary Bobroff zacha...@ami.com Ard Biesheuvel ard.biesheu...@linaro.org Chao Zhang chao.b.zh...@intel.com Dandan Bi dandan...@intel.com David Wei david@intel.com David Wei david@intel.com Eric Dong eric.d...@intel.com Eric Jin eric@intel.com Feng Tian feng.t...@intel.com Guo Dong guo.d...@intel.com Hao Wu hao.a...@intel.com Heyi Guo heyi@linaro.org Jaben Carsey jaben.car...@intel.com Jeff Fan jeff@intel.com jiaxinwu jiaxin...@intel.com Jordan Justen jordan.l.jus...@intel.com Laszlo Ersek ler...@redhat.com lhauch larry.ha...@intel.com Liming Gao liming@intel.com Long Qin qin.l...@intel.com Ludovic Rousseau ludovic.rouss...@gmail.com Ma, Maurice maurice...@intel.com Maurice Ma maurice...@intel.com Mudusuru, Giri P giri.p.mudus...@intel.com Olivier Martin olivier.mar...@arm.com Paulo Alcantara pca...@zytor.com Qiu Shumin shumin@intel.com Ruiyu Ni ruiyu...@intel.com Shifei Lu shifeix.a...@intel.com Star Zeng star.z...@intel.com Tapan Shah tapands...@hp.com Tian Feng feng.t...@intel.com Yao, Jiewen jiewen@intel.com Yingke Liu yingke.d@intel.com Zachary Bobroff zacha...@ami.com jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmpass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass test-amd64-amd64-xl-qemuu-win7-amd64 fail test-amd64-i386-xl-qemuu-win7-amd64 fail test-amd64-i386-qemuu-rhel6hvm-intel pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 pass test-amd64-amd64-xl-qemuu-winxpsp3 pass test-amd64-i386-xl-qemuu-winxpsp3pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : + branch=ovmf + revision=13af4ab06516eefb40fb985467141e09efe9c58b + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getconfig Repos +++ perl -e ' use Osstest; readglobalconfig(); print $c{Repos} or die $!; ' ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x '!=' x/home/osstest/repos/lock ']' ++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock ++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf 13af4ab06516eefb40fb985467141e09efe9c58b + branch=ovmf + revision=13af4ab06516eefb40fb985467141e09efe9c58b + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getconfig Repos +++ perl -e ' use Osstest; readglobalconfig(); print $c{Repos} or die $!; ' ++ repos=/home/osstest/repos ++ repos_lock=/home/osstest/repos/lock ++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']' + . cri-common ++ . cri-getconfig ++ umask 002 + select_xenbranch + case $branch in + tree=ovmf + xenbranch=xen-unstable + '[' xovmf = xlinux ']' +
Re: [Xen-devel] RIP MTRR - status update for upcoming v4.2
On 12.06.15 at 01:23, toshi.k...@hp.com wrote: There are two usages on MTRRs: 1) MTRR entries set by firmware 2) MTRR entries set by OS drivers We can obsolete 2), but we have no control over 1). As UEFI firmwares also set this up, this usage will continue to stay. So, we should not get rid of the MTRR code that looks up the MTRR entries, while we have no need to modify them. Such MTRR entries provide safe guard to /dev/mem, which allows privileged user to access a range that may require UC mapping while the /dev/mem driver blindly maps it with WB. MTRRs converts WB to UC in such a case. But it wouldn't be impossible to simply read the MTRRs upon boot, store the information, disable MTRRs, and correctly use PAT to achieve the same effect (i.e. the blindly maps part of course would need fixing). UEFI memory table has memory attribute, which describes cache types supported in physical memory ranges. However, this information gets lost when it it is converted to e820 table. I'm afraid you rather don't want to trust that information, as firmware vendors frequently screw it up. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 10/16] tools: extend xc_assign_device() to support rdm reservation policy
On 2015/6/11 18:02, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM This patch passes rdm reservation policy to xc_assign_device() so the policy is checked when assigning devices to a VM. Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- tools/libxc/include/xenctrl.h | 3 ++- tools/libxc/xc_domain.c | 6 +- tools/libxl/libxl_pci.c | 3 ++- tools/ocaml/libs/xc/xenctrl_stubs.c | 18 ++ tools/python/xen/lowlevel/xc/xc.c | 29 +++-- 5 files changed, 42 insertions(+), 17 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 6c01362..7fd60d5 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2078,7 +2078,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch, /* HVM guest pass-through */ int xc_assign_device(xc_interface *xch, uint32_t domid, - uint32_t machine_sbdf); + uint32_t machine_sbdf, + uint32_t flag); int xc_get_device_group(xc_interface *xch, uint32_t domid, diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 4f96e1b..19127ec 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch, int xc_assign_device( xc_interface *xch, uint32_t domid, -uint32_t machine_sbdf) +uint32_t machine_sbdf, +uint32_t flag) { DECLARE_DOMCTL; @@ -1705,6 +1706,7 @@ int xc_assign_device( domctl.domain = domid; domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI; domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf; +domctl.u.assign_device.flag = flag; return do_domctl(xch, domctl); } @@ -1792,6 +1794,8 @@ int xc_assign_dt_device( domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT; domctl.u.assign_device.u.dt.size = size; +/* DT doesn't own any RDM. */ +domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM; still not clear about this NO_RDM flag. If a device-tree device doesn't own RDM, the hypervisor will know it. Why do we require toolstack to tell hypervisor not use it? I think an explicit flag can make this sort of case identified, right? And other flags brings easily some confusions, or even a potential risk in the future. Thanks Tiejun set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path); rc = do_domctl(xch, domctl); diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index e0743f8..632c15e 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i FILE *f; unsigned long long start, end, flags, size; int irq, i, rc, hvm = 0; +uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED; if (type == LIBXL_DOMAIN_TYPE_INVALID) return ERROR_FAIL; @@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i out: if (!libxl_is_stubdom(ctx, domid, NULL)) { -rc = xc_assign_device(ctx-xch, domid, pcidev_encode_bdf(pcidev)); +rc = xc_assign_device(ctx-xch, domid, pcidev_encode_bdf(pcidev), flag); if (rc 0 (hvm || errno != ENOSYS)) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, xc_assign_device failed); return ERROR_FAIL; diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c index 64f1137..317bf75 100644 --- a/tools/ocaml/libs/xc/xenctrl_stubs.c +++ b/tools/ocaml/libs/xc/xenctrl_stubs.c @@ -1172,12 +1172,19 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d CAMLreturn(Val_bool(ret == 0)); } -CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc) +static int domain_assign_device_rdm_flag_table[] = { +XEN_DOMCTL_DEV_NO_RDM, +XEN_DOMCTL_DEV_RDM_RELAXED, +XEN_DOMCTL_DEV_RDM_STRICT, +}; + +CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc, +value rflag) { - CAMLparam3(xch, domid, desc); + CAMLparam4(xch, domid, desc, rflag); int ret; int domain, bus, dev, func; - uint32_t sbdf; + uint32_t sbdf, flag; domain = Int_val(Field(desc, 0)); bus = Int_val(Field(desc, 1)); @@ -1185,7 +1192,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc) func = Int_val(Field(desc, 3)); sbdf = encode_sbdf(domain, bus, dev, func); - ret = xc_assign_device(_H(xch), _D(domid), sbdf); + ret = Int_val(Field(rflag, 0)); + flag = domain_assign_device_rdm_flag_table[ret]; + + ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag); if (ret 0) failwith_xc(_H(xch)); diff --git
Re: [Xen-devel] [v3][PATCH 15/16] xen/vtd: enable USB device assignment
On 2015/6/11 18:22, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM Before we refine RMRR mechanism, USB RMRR may conflict with guest bios region so we always ignore USB RMRR. If USB RMRR conflicts with guest bios, the conflict is always there before and after your refinement. :-) Yeah :) Now this can be gone when we enable pci_force to check/reserve RMRR. So what about this? USB RMRR may conflict with guest bios region so we always ignore USB RMRR. But now this can be checked to handle after we introduce our policy mechanism. Signed-off-by: Tiejun Chen tiejun.c...@intel.com Acked-by: Kevin Tian kevin.t...@intel.com except one small comment below --- xen/drivers/passthrough/vtd/dmar.h | 1 - xen/drivers/passthrough/vtd/iommu.c | 11 ++- xen/drivers/passthrough/vtd/utils.c | 7 --- 3 files changed, 2 insertions(+), 17 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.h b/xen/drivers/passthrough/vtd/dmar.h index af1feef..af205f5 100644 --- a/xen/drivers/passthrough/vtd/dmar.h +++ b/xen/drivers/passthrough/vtd/dmar.h @@ -129,7 +129,6 @@ do {\ int vtd_hw_check(void); void disable_pmr(struct iommu *iommu); -int is_usb_device(u16 seg, u8 bus, u8 devfn); int is_igd_drhd(struct acpi_drhd_unit *drhd); #endif /* _DMAR_H_ */ diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index d7c9e1c..d3233b8 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -2229,11 +2229,9 @@ static int reassign_device_ownership( /* * If the device belongs to the hardware domain, and it has RMRR, don't * remove it from the hardware domain, because BIOS may use RMRR at - * booting time. Also account for the special casing of USB below (in - * intel_iommu_assign_device()). + * booting time. this code is run-time right? According to one associated commit, commit 8b99f4400b695535153dcd5d949b3f63602ca8bf Author: Jan Beulich jbeul...@suse.com Date: Fri Oct 10 10:54:21 2014 +0200 VT-d: fix RMRR related error handling - reassign_device_ownership() now tears down RMRR mappings (for other than Dom0) - to facilitate that, rmrr_identity_mapping() now deals with both establishing and tearing down of these mappings (the open coded equivalent in intel_iommu_remove_device() is being replaced at once) - intel_iommu_assign_device() now unrolls the assignment upon RMRR mapping errors - intel_iommu_add_device() now returns consistent values upon RMRR mapping failures (was: failure when last iteration ran into a problem, success otherwise) - intel_iommu_remove_device() no longer special cases Dom0 (it only ever gets called for devices removed from the _system_, not a domain) - rmrr_identity_mapping() now returns a proper error indicator instead of -1 when intel_iommu_map_page() failed Signed-off-by: Jan Beulich jbeul...@suse.com Acked-by: Kevin Tian kevin.t...@intel.com This chunk of codes resides inside intel_iommu_remove_device() so I think this shouldn't be for a running domain. Thanks Tiejun */ -if ( !is_hardware_domain(source) - !is_usb_device(pdev-seg, pdev-bus, pdev-devfn) ) +if ( !is_hardware_domain(source) ) { const struct acpi_rmrr_unit *rmrr; u16 bdf; @@ -2283,13 +2281,8 @@ static int intel_iommu_assign_device( if ( ret ) return ret; -/* FIXME: Because USB RMRR conflicts with guest bios region, - * ignore USB RMRR temporarily. - */ seg = pdev-seg; bus = pdev-bus; -if ( is_usb_device(seg, bus, pdev-devfn) ) -return 0; /* Setup rmrr identity mapping */ for_each_rmrr_device( rmrr, bdf, i ) diff --git a/xen/drivers/passthrough/vtd/utils.c b/xen/drivers/passthrough/vtd/utils.c index bd14c02..b8a077f 100644 --- a/xen/drivers/passthrough/vtd/utils.c +++ b/xen/drivers/passthrough/vtd/utils.c @@ -29,13 +29,6 @@ #include extern.h #include asm/io_apic.h -int is_usb_device(u16 seg, u8 bus, u8 devfn) -{ -u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), -PCI_CLASS_DEVICE); -return (class == 0xc03); -} - /* Disable vt-d protected memory registers. */ void disable_pmr(struct iommu *iommu) { -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 2/7] libxl_read_file_contents: add new entry to read sysfs file
On 6/12/2015 at 12:16 AM, in message 21881.46148.466883.923...@mariner.uk.xensource.com, Ian Jackson ian.jack...@eu.citrix.com wrote: Chunyan Liu writes ([Xen-devel] [PATCH V4 2/7] libxl_read_file_contents: add new entry to read sysfs file): Sysfs file has size=4096 but actual file content is less than that. Current libxl_read_file_contents will treat it as error when file size and actual file content differs, so reading sysfs file content with this function always fails. Add a new entry libxl_read_sysfs_file_contents to handle sysfs file specially. It would be used in later pvusb work. I think this patch is roughly right, but: -int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, - void **data_r, int *datalen_r) { +static int libxl_read_file_contents_core(libxl_ctx *ctx, const char *filename, + void **data_r, int *datalen_r, + bool is_sysfs_file) I would prefer a functional rather than contextual name for the variabvle is_sysfs_file. How about `tolerate_shrinking_file' ? OK. @@ -360,15 +362,16 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char *filename, if (stab.st_size data_r) { data = malloc(datalen); +memset(data, 0, datalen); What is this for ? I found sometimes reading sysfs file contents, at the end, there is some random character. With this line, there is no problem then. if (!data) goto xe; rs = fread(data, 1, datalen, f); -if (rs != datalen) { +if (rs != datalen !(feof(f) is_sysfs_file)) { if (ferror(f)) LOGE(ERROR, failed to read %s, filename); else if (feof(f)) I think it would be better to handle the special case here, with something like if (!tolerate_shrinking_file) error stuff else { assert(datalen_r); and to move the `goto xe' into the individual branches. OK. Will update. Is there any risk that the file is actually bigger than advertised, rather than smaller ? For sysfs file, couldn't be bigger. diff --git a/tools/libxl/libxl_utils.h b/tools/libxl/libxl_utils.h index 1c1761d..7639662 100644 --- a/tools/libxl/libxl_utils.h +++ b/tools/libxl/libxl_utils.h ... +int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename, + void **data_r, int *datalen_r); I think this is a function with sufficiently odd semantics, and a sufficiently internal purpose, that it should probably not be exposed in the API. So move to libxl_internal.h? Or not hacking here but just adding an internal function in libxl_pvusb.c with repeated codes? Ian ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCHv11 4/4] gnttab: use per-VCPU maptrack free lists
On 11.06.15 at 17:28, t...@xen.org wrote: At 15:51 +0100 on 05 Jun (1433519478), Jan Beulich wrote: Iirc before both of these changes, and the v10 ones imo should have invalidated it. Tim, I'm particularly trying to understand whether you're okay with the original's (potentially even heavier) resource use and/or this version's (risking to run out of maptrack entries _much_ earlier than currently). The concern with the earlier version being that the maximum maptrack limit gets a lot higher with many vcpus? I was OK with that. There are a lot of things that scale with #vcpus, and xenheap pages are not particularly scarce any more. So let's say I don't find one 128-vcpu guest much different from 128 1-vcpu guests for this purpose. I certainly can see that resource consumption may scale with the number of vCPU-s a guest has. But whether that ought to apply to everything, i.e. including all per-domain (rather than per-vCPU) resources (which the maptrack table is only an example of) I'm not sure about. In the end, if we were to talk about v9 and the default of 256 frames, a 1024-vCPU DomU could consume 1Gb of memory for its maptrack even if (quite likely) it doesn't serve any other guest. IOW to me it would seem that a necessary prereq to this change would be to make the limit a per-domain setting, with only Dom0 getting its limit bumped by default (and even then the at-least-one-page-per-vCPU requirement undermines that, i.e. a slow path to avoid going over the set limit would still seem necessary). Additionally, with the maptrack pages no longer shared across vCPU-s, running into an allocation failure namely on ballooned setups (where Xen typically doesn't have much memory available, since the ballooning ordinarily only accounts for the immediate domain needs) is going to become quite a bit more likely. Yes, I already hear David saying I don't care about the ballooning case, but no, I don't think we can simply ignore this case (unless we want to declare ballooning a deprecated, no longer supported feature). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V4 5/7] xl: add pvusb commands
On 06/12/2015 10:03 AM, Chun Yan Liu wrote: On 6/12/2015 at 03:37 PM, in message 557a8c57.1040...@suse.com, Juergen Gross jgr...@suse.com wrote: On 06/10/2015 05:20 AM, Chunyan Liu wrote: ... +} else if (MATCH_OPTION(ports, argv[optind], oparg)) { +usbctrl.ports = atoi(oparg); Same here for number of ports. Otherwise you could blow up xenstore by e.g. specifying 2 billion ports here. Will add check. What's the supported max ports? 31. I'm about to send a patch updating the pvusb interface description which will then contain a define for that (USBIF_MAX_PORTNR). Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 13/16] tools/libxl: detect and avoid conflicts with RDM
On 2015/6/11 18:19, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM While building a VM, HVM domain builder provides struct hvm_info_table{} to help hvmloader. Currently it includes two fields to construct guest e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should check them to fix any conflict with RAM. RMRR can reside in address space beyond 4G theoretically, but we never see this in real world. So in order to avoid breaking highmem layout we don't solve highmem conflict. Note this means highmem rmrr could still be supported if no conflict. But in the case of lowmem, RMRR probably scatter the whole RAM space. Especially multiple RMRR entries would worsen this to lead a complicated memory layout. And then its hard to extend hvm_info_table{} to work hvmloader out. So here we're trying to figure out a simple solution to avoid breaking existing layout. So when a conflict occurs, #1. Above a predefined boundary (default 2G) - move lowmem_end below reserved region to solve conflict; #2. Below a predefined boundary (default 2G) - Check strict/relaxed policy. strict policy leads to fail libxl. Note when both policies are specified on a given region, 'strict' is always preferred. relaxed policy issue a warning message and also mask this entry INVALID to indicate we shouldn't expose this entry to hvmloader. Note this predefined boundary can be changes with the parameter rdm_mem_boundary in .cfg file. Signed-off-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: Kevin Tian kevint.t...@intel.com One comment though. could you be consistent to use RDM in the code? RMRR is just an example of RDM... Sure. Thanks Tiejun --- docs/man/xl.cfg.pod.5 | 21 tools/libxc/xc_hvm_build_x86.c | 5 +- tools/libxl/libxl.h| 6 + tools/libxl/libxl_create.c | 6 +- tools/libxl/libxl_dm.c | 255 + tools/libxl/libxl_dom.c| 11 +- tools/libxl/libxl_internal.h | 11 +- tools/libxl/libxl_types.idl| 8 ++ tools/libxl/xl_cmdimpl.c | 3 + 9 files changed, 322 insertions(+), 4 deletions(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 638b350..6fd2370 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -767,6 +767,27 @@ to a given device, and strict is default here. Note this would override global Brdm option. +=item Brdm_mem_boundary=MBYTES + +Number of megabytes to set a boundary for checking rdm conflict. + +When RDM conflicts with RAM, RDM probably scatter the whole RAM space. +Especially multiple RMRR entries would worsen this to lead a complicated +memory layout. So here we're trying to figure out a simple solution to +avoid breaking existing layout. So when a conflict occurs, + +#1. Above a predefined boundary +- move lowmem_end below reserved region to solve conflict; + +#2. Below a predefined boundary +- Check strict/relaxed policy. +strict policy leads to fail libxl. Note when both policies +are specified on a given region, 'strict' is always preferred. +relaxed policy issue a warning message and also mask this entry INVALID +to indicate we shouldn't expose this entry to hvmloader. + +Here the default is 2G. + =back =back diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c index 0e98c84..5142578 100644 --- a/tools/libxc/xc_hvm_build_x86.c +++ b/tools/libxc/xc_hvm_build_x86.c @@ -21,6 +21,7 @@ #include stdlib.h #include unistd.h #include zlib.h +#include assert.h #include xg_private.h #include xc_private.h @@ -270,7 +271,7 @@ static int setup_guest(xc_interface *xch, elf_parse_binary(elf); v_start = 0; -v_end = args-mem_size; +v_end = args-lowmem_end; if ( nr_pages target_pages ) memflags |= XENMEMF_populate_on_demand; @@ -754,6 +755,8 @@ int xc_hvm_build_target_mem(xc_interface *xch, args.mem_size = (uint64_t)memsize 20; args.mem_target = (uint64_t)target 20; args.image_file_name = image_name; +if ( args.mmio_size == 0 ) +args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH; return xc_hvm_build(xch, domid, args); } diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index 0a7913b..a6212fb 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool b); #define LIBXL_TIMER_MODE_DEFAULT -1 #define LIBXL_MEMKB_DEFAULT ~0ULL +/* + * We'd like to set a memory boundary to determine if we need to check + * any overlap with reserved device memory. + */ +#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024) + #define LIBXL_MS_VM_GENID_LEN 16 typedef struct { uint8_t bytes[LIBXL_MS_VM_GENID_LEN]; diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 6c8ec63..0438731 100644 ---
[Xen-devel] [PATCH v8 0/8] Support more than 8 vcpus on arm64 with GICv3
From: Chen Baozi baoz...@gmail.com Currently the number of vcpus on arm64 with GICv3 is limited up to 8 due to the fixed size of redistributor mmio region. Increasing the size makes the number expand to 16 because of AFF0 restriction on GICv3. To create a guest up to 128 vCPUs, which is the maxium number that GIC-500 can support, this patchset uses the AFF1 information to create a mapping relation between vCPUID and vMPIDR and deals with the related issues. These patches are written based upon Julien's GICv2 on GICv3 series and the IROUTER emulation cleanup patch. Changes from V7: * Updates some commit logs * Drop aff2 and aff3 in struct sgi_target for it is unused. * Minor changes according to previous reviews. Changes from V6: * Use the new 'struct sgi_target' instead of cpumask_t in vgic_to_sgi. * Make the domain_max_vcpus to return MAX_VIRT_CPUS and avoid to split arch_domain_init. Changes form V5: * Rework gicv3_sgir_to_cpumask in #5 * Rework #8 to split arch_domain_create into two parts: - arch_domain_preinit to initialise vgic_ops before evtchn_init is called - the rest of logic remains in arch_domain_create * Use a field value in struct vgic_ops instead of the function point for max_vcpus. * Minor changes according to previous reviews. Changes from V4: * Split the patch 4/8 of V3 into two part: - Use cpumask_t type for vcpu_mask in vgic_to_sgi. - Use AFF1 when translating ICC_SGI1R_EL1 to cpumask. * Use a more efficient algorithm when calculate cpumask. * Add a patch to call arch_domain_create before evtchn_init, because evtchn_init needs vgic info which is initialised during acrh_domain_create. * Get the max vcpu info from vgic_ops. * Minor changes according to previous reviews. Changes from V3: * Drop the wrong patch that altering domain_max_vcpus to a macro. * Change the domain_max_vcpus to return value accodring to the version of the vGIC in used. Changes from V2: * Reorder the patch which increases MAX_VIRT_CPUS to the last to make this series bisectable. * Drop the dynamic re-distributor region allocation patch in tools. * Use cpumask_t type instead of unsigned long in vgic_to_sgi and do the translation from GICD_SGIR to vcpu_mask in both vGICv2 and vGICv3. * Make domain_max_vcpus be alias of max_vcpus in struct domain Changes from V1: * Use the way that expanding the GICR address space to support up to 128 redistributor in guest memory layout rather than use the dynamic allocation. * Add support to include AFF1 information in vMPIDR/logical CPUID. Chen Baozi (8): xen/arm: gic-v3: Increase the size of GICR in address space for guest xen/arm: Add functions of mapping between vCPUID and virtual affinity xen/arm: Use the new functions for vCPUID/vaffinity transformation xen/arm: Use AFF1 when translating ICC_SGI1R_EL1 to cpumask tools/libxl: Set 'reg' of cpu node equal to MPIDR affinity for domU xen/arm: Set 'reg' of cpu node for dom0 to match MPIDR's affinity xen/arm: make domain_max_vcpus return value from vgic_ops xen/arm64: increase MAX_VIRT_CPUS to 128 on arm64 tools/libxl/libxl_arm.c | 14 ++-- xen/arch/arm/domain.c | 20 - xen/arch/arm/domain_build.c | 14 +--- xen/arch/arm/vgic-v2.c| 8 --- xen/arch/arm/vgic-v3.c| 18 ++-- xen/arch/arm/vgic.c | 45 +-- xen/arch/arm/vpsci.c | 5 + xen/include/asm-arm/config.h | 4 xen/include/asm-arm/domain.h | 39 +++-- xen/include/asm-arm/gic_v3_defs.h | 3 +++ xen/include/asm-arm/vgic.h| 9 +++- xen/include/public/arch-arm.h | 4 ++-- 12 files changed, 129 insertions(+), 54 deletions(-) -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 1/8] xen/arm: gic-v3: Increase the size of GICR in address space for guest
From: Chen Baozi baoz...@gmail.com Currently it only supports up to 8 vCPUs. Increase the region to hold up to 128 vCPUs, which is the maximum number that GIC-500 supports. Signed-off-by: Chen Baozi baoz...@gmail.com Reviewed-by: Julien Grall julien.gr...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com --- xen/include/public/arch-arm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h index c029e0f..ec0c261 100644 --- a/xen/include/public/arch-arm.h +++ b/xen/include/public/arch-arm.h @@ -388,8 +388,8 @@ struct xen_arch_domainconfig { #define GUEST_GICV3_RDIST_STRIDE 0x2ULL #define GUEST_GICV3_RDIST_REGIONS 1 -#define GUEST_GICV3_GICR0_BASE 0x0302ULL/* vCPU0 - vCPU7 */ -#define GUEST_GICV3_GICR0_SIZE 0x0010ULL +#define GUEST_GICV3_GICR0_BASE 0x0302ULL/* vCPU0 - vCPU127 */ +#define GUEST_GICV3_GICR0_SIZE 0x0100ULL /* * 16MB == 4096 pages reserved for guest to use as a region to map its -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PCI Passthrough ARM Design : Draft1
On Thu, 2015-06-11 at 14:38 -0700, Manish Jaggi wrote: On Wednesday 10 June 2015 12:21 PM, Julien Grall wrote: Hi, On 10/06/2015 08:45, Ian Campbell wrote: 4. DomU access / assignment PCI device -- When a device is attached to a domU, provision has to be made such that it can access the MMIO space of the device and xen is able to identify the mapping between guest bdf and system bdf. Two hypercalls are introduced I don't think we want/need new hypercalls here, the same existing hypercalls which are used on x86 should be suitable. I think both the hypercalls are necessary a) the mapping of guest bdf to actual sbdf is required as domU accesses for GIC are trapped and not handled by pciback. A device say 1:0:0.3 is assigned in domU at 0:0:0.3. This is the bestway I could find that works. b) map_mmio call is issued just after the device is added on the pcu bus (in case of domU) The function register_xen_pci_notifier (drivers/xen/pci.c) is modified such that notification is received in domU and dom0. In which please please add to the document a discussion of the current interfaces and why they are not suitable. Beware that the 1:1 mapping doesn't fit with the current guest memory layout which is pre-defined at Xen build time. So you would also have to make it dynamically or decide to use the same memory layout as the host. If same layout as host used, would there be any issue? I'm not sure that a 1:1 mapping is any different to the host layout. But in any case, the host layout also doesn't match the guest layout, so it has the same issues. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v8 3/8] xen/arm: Use the new functions for vCPUID/vaffinity transformation
From: Chen Baozi baoz...@gmail.com There are 3 places to change: * Initialise vMPIDR value in vcpu_initialise() * Find the vCPU from vMPIDR affinity information when accessing GICD registers in vGIC * Find the vCPU from vMPIDR affinity information when booting with vPSCI in vGIC - Both PSCI 0.1 and PSCI 0.2 are modified to respect the MPIDR like. Signed-off-by: Chen Baozi baoz...@gmail.com Reviewed-by: Julien Grall julien.gr...@citrix.com --- PSCI 0.1 Section 6.3 (ARM DEN 0022A): Ideally platform discovery mechanism such as firmware tables would be used by secure firmware to describe the set of valid CPUIDs to the hypervisor or Rich OS, if the former is not present. The hypervisor in turn can create and supply virtual discovery mechanisms to its guests. Thus, we consider CPUID is equal to the 'reg' register in DT (which is an MPIDR-like value). xen/arch/arm/domain.c | 6 +- xen/arch/arm/vgic-v3.c | 2 +- xen/arch/arm/vpsci.c | 5 + 3 files changed, 3 insertions(+), 10 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 2bde26e..0cf147c 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -501,11 +501,7 @@ int vcpu_initialise(struct vcpu *v) v-arch.sctlr = SCTLR_GUEST_INIT; -/* - * By default exposes an SMP system with AFF0 set to the VCPU ID - * TODO: Handle multi-threading processor and cluster - */ -v-arch.vmpidr = MPIDR_SMP | (v-vcpu_id MPIDR_AFF0_SHIFT); +v-arch.vmpidr = MPIDR_SMP | vcpuid_to_vaffinity(v-vcpu_id); v-arch.actlr = READ_SYSREG32(ACTLR_EL1); diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c index 540f85f..ef9a71a 100644 --- a/xen/arch/arm/vgic-v3.c +++ b/xen/arch/arm/vgic-v3.c @@ -61,7 +61,7 @@ static struct vcpu *vgic_v3_irouter_to_vcpu(struct domain *d, uint64_t irouter) if ( irouter GICD_IROUTER_SPI_MODE_ANY ) return d-vcpu[0]; -vcpu_id = irouter MPIDR_AFF0_MASK; +vcpu_id = vaffinity_to_vcpuid(irouter); if ( vcpu_id = d-max_vcpus ) return NULL; diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c index 5d899be..aebe1e2 100644 --- a/xen/arch/arm/vpsci.c +++ b/xen/arch/arm/vpsci.c @@ -32,10 +32,7 @@ static int do_common_cpu_on(register_t target_cpu, register_t entry_point, int is_thumb = entry_point 1; register_t vcpuid; -if( ver == XEN_PSCI_V_0_2 ) -vcpuid = (target_cpu MPIDR_HWID_MASK); -else -vcpuid = target_cpu; +vcpuid = vaffinity_to_vcpuid(target_cpu); if ( vcpuid = d-max_vcpus || (v = d-vcpu[vcpuid]) == NULL ) return PSCI_INVALID_PARAMETERS; -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling
On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote: Please could you trim your quotes to only include the bit you are referring to. Otherwise there is a high chance I will miss a one line comment in the middle of the thousand lines of quoted matter. The `GITS_BASER0` will be setup to request sufficient memory for a device table consisting of entries of: struct vdevice_table { uint64_t vitt_ipa; uint32_t vitt_size; uint32_t padding; }; How about adding valid bit to know if the entry is valid or not? I suggest to use vitt_ipa == INVALID_PADDR to signal this rather than using another bit. ## Handling of unrouted/spurious LPIs Since there is no 1:1 link between a `vLPI` and `pLPI` enabling and disabling of phyiscal LPIs cannot be driven from the state of an associated vLPI. Each `pLPI` is routed and enabled during device assignment, therefore it is possible to receive a physical LPI which has yet to be routed (via a `vITS`) to a `vLPI`. Why do we need to enable LPIs during device assignment? Can't we do it only on LPI configuration update, which is trapped in Xen as mentioned in 7.8? ( ## Enabling and disabling LPIs) Quoting the first sentence/paragraph of this section: Since there is no 1:1 link between a `vLPI` and `pLPI` enabling and disabling of phyiscal LPIs cannot be driven from the state of an associated vLPI. To expand on that: The vITT can map multiple (vDevice,vEvent) pairs to the same LPI, and each of those (vDevice,vEvent) pairs is related to a different (pDevice,pEvent) which in turn has a unique pLPI associated with it. Thus a vLPI can be associated with more than one pLPI. Enumerating all pLPIs associated with a given vLPI would be expensive (a complete walk of the vITT). In addition if it were possible to do so we would also need to manage enabling/disabling the pLPI in several other places that in vPLI cfg traps, specifically MAPC and MAPD at least. So pLPIs must be routed at device assignment time because in the vLPI configuration table trap there is no mapping back to a single pLPI. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 04/16] xen/passthrough: extend hypercall to support rdm reservation policy
On 12.06.15 at 11:20, tiejun.c...@intel.com wrote: On 2015/6/12 16:45, Jan Beulich wrote: On 12.06.15 at 08:31, tiejun.c...@intel.com wrote: On 2015/6/11 17:28, Tian, Kevin wrote: From: Chen, Tiejun Sent: Thursday, June 11, 2015 9:15 AM @@ -1940,7 +1942,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev) PCI_DEVFN2(bdf) != devfn ) continue; -rmrr_identity_mapping(pdev-domain, 0, rmrr); +rmrr_identity_mapping(pdev-domain, 0, rmrr, + XEN_DOMCTL_DEV_RDM_RELAXED); ditto It doesn't matter when we're trying to remove a device since we don't care this flag. In such a case it helps to add a brief comment saying that the precise value passed is irrelevant. Or maybe this could be expressed by Okay. folding this and the map parameters of the function (in which case it might become self-documenting)? Sorry, I don't know exactly how to implement this idea. Have we any similar example on Xen side? No idea what you're after. What we have with your change are tuples like (map, relaxed) (map, strict) (unmap, ignored) Clearly these can be represented with three distinct numbers. I.e. along with using XEN_DOMCTL_DEV_RDM_* without the map boolean, (unmap, ignored) could e.g. be expressed by passing -1. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
On 06/12/2015 03:41 PM, Paul Durrant wrote: -Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 12 June 2015 04:22 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/11/2015 09:25 PM, Paul Durrant wrote: -Original Message- From: Yang Hongyang [mailto:yan...@cn.fujitsu.com] Sent: 11 June 2015 13:59 To: Paul Durrant; Wen Congyang; Andrew Cooper; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/11/2015 06:20 PM, Paul Durrant wrote: -Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 11 June 2015 09:48 To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; [...] In our implementation, we don't start a new emulator. The codes can work, but some bugs may be not triggered. How do you reconcile the incoming QEMU save record with the running emulator state? We introduce a qmp command xen-load-devices- state(libxl__qmp_restore) which can restore the emulator state. The step of resotre emulator state at a checkpoint is: 1. libxl__qmp_stop- vm_stop() in qemu 2. libxl__qmp_restore - load_vmstate() in qemu 3. libxl__qmp_resume - vm_start() in qemu Ok, that sounds like the ideal time to hook back into Xen by creating a new ioreq server. I have some questions about ioreq server: 1. If we use old version xen and newest version qemu, is it OK? Is default ioreq server created when the guest is created. xen_create_ioreq_server() does nothing, and xen_get_ioreq_server_info() will get the default ioreq server information. Is it right? No. It's not compatible in that direction. A new Xen will work with an old QEMU but not the other way round. 2. Why we create a default ioreq server when getting the hvm param if there is already a not default ioreq server? If something reads the 'legacy' HVM params then that is Xen's trigger to create the default server. Any 'new' emulator should be using the ioreq server hypercalls so the default server will not be needed. If there are two ioreq servers: default ioreq server, and a ioreq server created by emulator. The guest can work it correctly in this case? Is there any application(not emulator) that uses the libxenctrl directly? Thanks Wen Congyang 3. In the far end, we will clear the ioreq page, and this ioreq page is used for default ioreq server, is it right? Yes, AFAIK it's only the 'magic' pages that get cleared at the far end - and that includes the default server pages. Other ioreq servers will have their pages cleared on re-insertion to the P2M at the source end when the server is disabled. Paul Thanks Wen Congyang Paul Paul Thanks Wen Congyang Paul Thanks Wen Congyang Paul We will set to the guest to a new state, the old state should be dropped. Thanks Wen Congyang Paul Thanks Wen Congyang Paul Thanks Wen Congyang ~Andrew . . . ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel . . ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel . -- Thanks, Yang. . . ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH OSSTEST] Arrange to upgrade microcode on x86 test hosts.
Ian Campbell writes (Re: [PATCH OSSTEST] Arrange to upgrade microcode on x86 test hosts.): On Thu, 2015-06-11 at 16:39 +0100, Ian Jackson wrote: If $c{prop} is undef, the logm will produce a warning. Maybe you want //'' somewhere. I think that was a stray wip debug thing, I could fix, remove or move it below the check of $c{$prop}. Any of those would be fine by me. This use of caps lock for unexported shell variables is rather unidiomatic, don't you think ? I'm not really in the habit of caring for short scripts, but I'll down case them. I find it confusing, because I want to know what other process is going to be using them. Thanks. mg-debian-installer-update is idempotent. Perhaps this script should be too ? After all you can only overwrite something generated today. True. I'll arrange that. I'll see if I can also make it deterministic like with the installer stuff, which will make spotting differences easier. Great. +for BIN in $AMD_BINS ; do +curl -s $LINUX_FW/plain/$BIN $BIN +done We're really fetching these via the gitweb/cgit/... ? Isn't that a bit fragile ? A bit, yes. The alternative is to clone the whole of linux-firmware.git. I suppose with --depth=1 that might not be so bad. Would you prefer that? Yes. Will go on to read your v2 just now... Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()
Ian Campbell writes (Re: [Xen-devel] [PATCH v2 COLOPre 06/13] tools/libxl: Introduce a new internal API libxl__domain_unpause()): On Thu, 2015-06-11 at 17:09 +0800, Wen Congyang wrote: If the public API creates a new AO, it is safe to call it directly? A public function which takes an ao_how is, I believe, an exception to this rule and should be annotated with LIBXL_EXTERNAL_CALLERS_ONLY to prevent accidents. Yes. I don't think libxl_domain_unpause is such a function though. Indeed. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] libxl backports for 4.4 and 4.5
On 12 Jun 2015, at 13:01, Ian Jackson ian.jack...@eu.citrix.com wrote: Ian Jackson writes (Re: [Xen-devel] libxl backports for 4.4 and 4.5): I propose to push all of these today, unless you object. I will double-check with you on irc. Now done. Thanks Lars ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling
On 12/06/2015 08:55, Ian Campbell wrote: On Thu, 2015-06-11 at 10:40 +0100, Ian Campbell wrote: ## Command Queue Virtualisation The command translation/emulation in this design has been arranged to be as cheap as possible (e.g. in many cases the actions are NOPs), avoiding previous concerns about the length of time which an emulated write to a `CWRITER` register may block the vcpu. The vits will simply track its reader and writer pointers. On write to `CWRITER` it will immediately and synchronously process all commands in the queue and update its state accordingly. It might be possible to implement a rudimentary form of preemption by periodically (as determined by `hypercall_preempt_check()`) returning to the guest without incrementing PC but with updated internal `CREADR` state, meaning it will reexecute the write to `CWRITER` and we can pickup where we left off for another iteration. This at least lets us schedule other vcpus etc and prevents a monopoly. In the presence of multiple VCPUs writing to GITS_CWRITER preemption actually gets pretty subtle. I suggest leaving it out for now. Would it be possible to do it with re-doing the write to the GITS_CWRITER? Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 09/15] Add a new per-vCPU tasklet to wakeup the blocked vCPU
On 12.06.15 at 11:40, feng...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, June 09, 2015 11:09 PM On 08.05.15 at 11:07, feng...@intel.com wrote: This patch adds a new per-vCPU tasklet to wakeup the blocked vCPU. It can be used in the case vcpu_unblock cannot be called directly. This tasklet will be used in later patch in this series. Signed-off-by: Feng Wu feng...@intel.com --- xen/common/domain.c | 11 +++ xen/include/xen/sched.h | 3 +++ 2 files changed, 14 insertions(+) What is the rationale for doing this in common code, modifying common structures? Do you mean moving this tasklet to the architecture specific structure, such as, struct arch_vcpu ? Exactly. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 06/11] x86/intel_pstate: the main boby of the intel_pstate driver
On 11/06/2015 21:41, Wang, Wei W wrote: On 11/06/2015 22:02, Julien Grall wrote: On 11/06/2015 04:27, Wei Wang wrote: diff --git a/xen/include/acpi/cpufreq/cpufreq.h b/xen/include/acpi/cpufreq/cpufreq.h index d10e4c7..71bb45c 100644 --- a/xen/include/acpi/cpufreq/cpufreq.h +++ b/xen/include/acpi/cpufreq/cpufreq.h @@ -34,6 +34,12 @@ struct acpi_cpufreq_data { extern struct acpi_cpufreq_data *cpufreq_drv_data[NR_CPUS]; +/* + * Maximum transition latency is in nanoseconds - if it's unknown, + * CPUFREQ_ETERNAL shall be used. + */ +#define CPUFREQ_ETERNAL(-1) + struct cpufreq_cpuinfo { unsigned intmax_freq; unsigned intsecond_max_freq;/* P1 if Turbo Mode is on */ @@ -77,6 +83,8 @@ struct cpufreq_policy { }; DECLARE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_policy); +extern int intel_pstate_init(void); + As said on a previous version [1], intel_pstate_init is x86 specific. Although xen/include/acpi contains common headers. Please see our latest discussion here (the bottom of the link): http://lists.xen.org/archives/html/xen-devel/2015-06/msg00047.html Well we are planning to move cpufreq.h out of acpi in order to use for device tree based platform. Most of these declaration is common. Although any x86 specific function would have to go out in a separate header. Please avoid to add new one when it's possible. I don't see why a new asm-x86/cpufreq.h can't be added... extern int __cpufreq_set_policy(struct cpufreq_policy *data, struct cpufreq_policy *policy); @@ -101,6 +109,12 @@ struct cpufreq_freqs { * CPUFREQ GOVERNORS* ** ***/ +/* The four internal governors used in intel_pstate */ +#define CPUFREQ_POLICY_POWERSAVE(1) +#define CPUFREQ_POLICY_PERFORMANCE (2) +#define CPUFREQ_POLICY_USERSPACE(3) +#define CPUFREQ_POLICY_ONDEMAND (4) + From the comment, this looks like x86 specific. Maybe even intel_pstate? Yes. It's currently only used by the intel_pstate driver. After looking to this series, this statement looks wrong to me... You are using all these defines in the common cpufreq code (parameters, pmstat,...). The cpufreq framework should be agnostic to any cpufreq driver implementation. So it looks like to me that we want CPUFREQ_* to be exposed for anyone. And specifying the behavior when policy = 0 would be great too rather than relying on a future developer to not define 0. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] libxl backports for 4.4 and 4.5
Jan Beulich writes (Re: [Xen-devel] libxl backports for 4.4 and 4.5): On 12.06.15 at 11:32, lars.kurth@gmail.com wrote: you seem to be disagreeing, or more likely Ian missed Jan's reply, so I wanted to double check. Thanks for pointing this out, Lars. I had just spotted it myself. The release being stuck anyway, perhaps we can put them in with the understanding that this time round we'll need at least another RC. Ian - you know how risky the changes are, please use your judgment. These changes are old, well-tested, and they are important bugfixes. I think they should go in. (Btw., as to 4.4.3, I don't think I want to kick off that process until 4.5.1 gets out of its stuck state, i.e. as all stable trees are affected by the underlying problem, I don't want both trees to be hold up half way through the stable release process.) Right. So I take you to mean that I should apply backports to 4.4 freely. Jan, things that I have queued for 4.5 are therefore: d83bf9d2 libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap 12e817e2 Fix build of 32bit toolstacks on CentOS 5.x following XSA-125 9369988 libxl: event handling: Break out ao_work_outstanding f1335f0 libxl: event handling: ao_inprogress does waits while reports outstanding Of these, all but 12e817e2 have been in master for a long time. 12e817e2 was pushed this morning. I propose to push all of these today, unless you object. I will double-check with you on irc. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix
-Original Message- From: Fabio Fantoni [mailto:fabio.fant...@m2r.biz] Sent: 12 June 2015 11:44 To: Paul Durrant; xen-de...@lists.xenproject.org Subject: Re: [Xen-devel] [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix Il 11/06/2015 17:42, Paul Durrant ha scritto: This patch series re-works much of the code involved in emulation of port and memory mapped I/O for HVM guests. The code has become very convoluted and, at least by inspection, certain emulations will apparently malfunction. The series is broken down into 17 patches (which are also available in my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git on the emulation22 branch) as follows: 0001-x86-hvm-simplify-hvmemul_do_io.patch 0002-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch 0003-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch 0004-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch 0005-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch 0006-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch 0007-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch 0008-x86-hvm-split-I-O-completion-handling-from-state-mod.patch 0009-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch 0010-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch 0011-x86-hvm-remove-hvm_io_state-enumeration.patch 0012-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch 0013-x86-hvm-only-acquire-RAM-pages-for-emulation-when-we.patch 0014-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch 0015-x86-hvm-make-sure-translated-MMIO-reads-or-writes-fa.patch 0016-x86-hvm-remove-multiple-open-coded-chunking-loops.patch 0017-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch v2: - Removed bogus assertion from patch 16 - Re-worked patch 17 after basic testing of back-port onto XenServer Thanks, I confirm that this new version solves the critical regression that causes dom0 insta-reboot. I tested them on windows and linux hvm domUs, I had strange results... On windows 7 64 bit sp1 domU (with new winpv drivers) seems there are a small performance regression (or probably only latency increased) but I not found warning/error or something userful in logs, I'm not sure is real regression of your patches, if needed I'll retry without. On linux domU (fedora 21) qxl that had sse2 instructions problem still not working, same of latest 2 years not always qemu crash with a gdb showing sse2 instruction problem but xorg crash with EQ overflowing and/or domU remain with 100% cpu used by xorg qemu process at 100% cpu. I posted the backtrace similar time ago and a qemu/spice developer told that was a conseguence of other problem if I remember good. How can I be sure that sse2 or similar istructions (with videoram) are now working correctly? Once I have debugged through XenServer automated testing I plan to add a targeted SSE test somewhere, possibly as a smoke test in hvmloader. Paul If you need more informations/tests tell me and I'll post them. Thanks for any reply and sorry for my bad english. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling
On Fri, 2015-06-12 at 09:09 -0400, Julien Grall wrote: Hi Ian, On 12/06/2015 04:52, Ian Campbell wrote: On Fri, 2015-06-12 at 14:07 +0530, Vijay Kilari wrote: So pLPIs must be routed at device assignment time because in the vLPI configuration table trap there is no mapping back to a single pLPI. I just remembered the exact reason that made use to differ SPI enabling. I can't parse this sentence, differ how? When the device is assigned, the domain VCPUs are still down (even VCPU0). If we receive an interrupt before the VCPU0 is unpaused, the interrupt will be lost. Same if the interrupt is not yet configured (i.e before the vITS setup correctly the table) with your proposal. Is this any different to booting with the ITT not setup? (SPIs are a slightly different case because they don't need h/w routing) This could happen when the device is not quiescent. We had this issue on the vexpress at boot time when the network card was trying to send an interrupt before DOM0 is setup. I don't fully understand the issue you are trying to describe, but do you want to propose a change to the spec? Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 07/15] vt-d: Add API to update IRTE when VT-d PI is used
On 12.06.15 at 11:40, feng...@intel.com wrote: -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] On 08.05.15 at 11:07, feng...@intel.com wrote: +static inline void setup_posted_irte( +struct iremap_entry *new_ire, struct pi_desc *pi_desc, uint8_t gvec) +{ +new_ire-post.urg = 0; +new_ire-post.vector = gvec; +new_ire-post.pda_l = (((u64)virt_to_maddr(pi_desc)) + (32 - PDA_LOW_BIT)) PDA_MASK(LOW); +new_ire-post.pda_h = (((u64)virt_to_maddr(pi_desc)) 32) + PDA_MASK(HIGH); Looking at this another time I can't see what the and-ing with PAD_MASK() is supposed to be good for. I cannot understand this well. Do you mean we don't need and PDA_MASK() here? Correct - the bitfield width (where the data gets stored into) already enforces the intended truncation afaict. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Relocatable Xen early boot code
On 12.06.15 at 13:14, daniel.ki...@oracle.com wrote: Here I do not want to discuss GRUB2 and multiboot2 protocol support details for relocatable images. It is not needed. It is sufficient to know that it is able to put loaded image anywhere in available memory below 4 GiB. Loaded image is informed about its base address according to multiboot2 protocol via special tag. This is new feature not available in upstream GRUB2. I work on upstreaming it in parallel. Relevant patches will be posted together with Xen patches. Before going into any detail on what you write later on - if this isn't in upstream grub2, why can't you do what you want to do first without another change needed in the boot loader? In which case, if I'm reading this correctly, you wouldn't need our boot code to become relocatable either. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PCI Passthrough ARM Design : Draft1
On Fri, 2015-06-12 at 07:41 -0400, Julien Grall wrote: I was suggesting to expose the host layout to the guest layout (similar to e820). We do this on x86 only as a workaround for broken hardware (essentially magic system devices which bypass the IOMMU). We shouldn't do this on ARM by default, but only as and when a similar requirement arises. Although, this is not my preferred way and a non 1:1 mapping would be the best. Yes, a non-1:1 mapping is the right answer. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 5/6] gnttab: fix/adjust gnttab_transfer()
- don't update shared entry's frame number for translated domains (as MFNs shouldn't be exposed to such guests) - for v1 grant table format, force copying of the page also when the intended MFN doesn't fit in 32 bits (and the domain isn't translated) - fix an apparent off-by-one error (it's unclear to me why commit 5cc77f9098 (32-on-64: Fix domain address-size clamping, implement) uses BITS_PER_LONG-1 here, while using BITS_PER_LONG in the two other invocations of domain_clamp_alloc_bitsize()) - adjust comments accompanying the shared entry's frame field Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -1639,7 +1639,8 @@ gnttab_transfer( } max_bitsize = domain_clamp_alloc_bitsize( -e, BITS_PER_LONG+PAGE_SHIFT-1); +e, e-grant_table-gt_version 1 || paging_mode_translate(e) + ? BITS_PER_LONG + PAGE_SHIFT : 32 + PAGE_SHIFT); if ( (1UL (max_bitsize - PAGE_SHIFT)) = mfn ) { struct page_info *new_page; @@ -1736,14 +1737,18 @@ gnttab_transfer( if ( e-grant_table-gt_version == 1 ) { grant_entry_v1_t *sha = shared_entry_v1(e-grant_table, gop.ref); + guest_physmap_add_page(e, sha-frame, mfn, 0); -sha-frame = mfn; +if ( !paging_mode_translate(e) ) +sha-frame = mfn; } else { grant_entry_v2_t *sha = shared_entry_v2(e-grant_table, gop.ref); + guest_physmap_add_page(e, sha-full_page.frame, mfn, 0); -sha-full_page.frame = mfn; +if ( !paging_mode_translate(e) ) +sha-full_page.frame = mfn; } smp_wmb(); shared_entry_header(e-grant_table, gop.ref)-flags |= --- a/xen/include/public/grant_table.h +++ b/xen/include/public/grant_table.h @@ -134,8 +134,10 @@ struct grant_entry_v1 { /* The domain being granted foreign privileges. [GST] */ domid_t domid; /* - * GTF_permit_access: Frame that @domid is allowed to map and access. [GST] - * GTF_accept_transfer: Frame whose ownership transferred by @domid. [XEN] + * GTF_permit_access: GFN that @domid is allowed to map and access. [GST] + * GTF_accept_transfer: GFN that @domid is allowed to transfer into. [GST] + * GTF_transfer_completed: MFN whose ownership transferred by @domid + * (non-translated guests only). [XEN] */ uint32_t frame; }; gnttab: fix/adjust gnttab_transfer() - don't update shared entry's frame number for translated domains (as MFNs shouldn't be exposed to such guests) - for v1 grant table format, force copying of the page also when the intended MFN doesn't fit in 32 bits (and the domain isn't translated) - fix an apparent off-by-one error (it's unclear to me why commit 5cc77f9098 (32-on-64: Fix domain address-size clamping, implement) uses BITS_PER_LONG-1 here, while using BITS_PER_LONG in the two other invocations of domain_clamp_alloc_bitsize()) - adjust comments accompanying the shared entry's frame field Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -1639,7 +1639,8 @@ gnttab_transfer( } max_bitsize = domain_clamp_alloc_bitsize( -e, BITS_PER_LONG+PAGE_SHIFT-1); +e, e-grant_table-gt_version 1 || paging_mode_translate(e) + ? BITS_PER_LONG + PAGE_SHIFT : 32 + PAGE_SHIFT); if ( (1UL (max_bitsize - PAGE_SHIFT)) = mfn ) { struct page_info *new_page; @@ -1736,14 +1737,18 @@ gnttab_transfer( if ( e-grant_table-gt_version == 1 ) { grant_entry_v1_t *sha = shared_entry_v1(e-grant_table, gop.ref); + guest_physmap_add_page(e, sha-frame, mfn, 0); -sha-frame = mfn; +if ( !paging_mode_translate(e) ) +sha-frame = mfn; } else { grant_entry_v2_t *sha = shared_entry_v2(e-grant_table, gop.ref); + guest_physmap_add_page(e, sha-full_page.frame, mfn, 0); -sha-full_page.frame = mfn; +if ( !paging_mode_translate(e) ) +sha-full_page.frame = mfn; } smp_wmb(); shared_entry_header(e-grant_table, gop.ref)-flags |= --- a/xen/include/public/grant_table.h +++ b/xen/include/public/grant_table.h @@ -134,8 +134,10 @@ struct grant_entry_v1 { /* The domain being granted foreign privileges. [GST] */ domid_t domid; /* - * GTF_permit_access: Frame that @domid is allowed to map and access. [GST] - * GTF_accept_transfer: Frame whose ownership transferred by @domid. [XEN] + * GTF_permit_access: GFN that @domid is allowed to map and access. [GST] + * GTF_accept_transfer: GFN that @domid is allowed to transfer into. [GST] + * GTF_transfer_completed: MFN whose ownership
[Xen-devel] [PATCH 6/6] gnttab: make struct grant_mapping private
This documents that no entity outside of gnttab.c actually accesses objects of that type. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -113,6 +113,16 @@ struct gnttab_unmap_common { goto _lbl; \ } while ( 0 ) +/* + * Tracks a mapping of another domain's grant reference. Each domain has a + * table of these, indexes into which are returned as a 'mapping handle'. + */ +struct grant_mapping { +u32 ref; /* grant ref */ +u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ +domid_t domid; /* granting domain */ +}; + #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) #define maptrack_entry(t, e) \ ((t)-maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) --- a/xen/include/xen/grant_table.h +++ b/xen/include/xen/grant_table.h @@ -52,16 +52,6 @@ /* The maximum size of a grant table. */ extern unsigned int max_grant_frames; -/* - * Tracks a mapping of another domain's grant reference. Each domain has a - * table of these, indexes into which are returned as a 'mapping handle'. - */ -struct grant_mapping { -u32 ref; /* grant ref */ -u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ -domid_t domid; /* granting domain */ -}; - /* Per-domain grant information. */ struct grant_table { /* Table size. Number of frames shared with guest */ gnttab: make struct grant_mapping private This documents that no entity outside of gnttab.c actually accesses objects of that type. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -113,6 +113,16 @@ struct gnttab_unmap_common { goto _lbl; \ } while ( 0 ) +/* + * Tracks a mapping of another domain's grant reference. Each domain has a + * table of these, indexes into which are returned as a 'mapping handle'. + */ +struct grant_mapping { +u32 ref; /* grant ref */ +u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ +domid_t domid; /* granting domain */ +}; + #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) #define maptrack_entry(t, e) \ ((t)-maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) --- a/xen/include/xen/grant_table.h +++ b/xen/include/xen/grant_table.h @@ -52,16 +52,6 @@ /* The maximum size of a grant table. */ extern unsigned int max_grant_frames; -/* - * Tracks a mapping of another domain's grant reference. Each domain has a - * table of these, indexes into which are returned as a 'mapping handle'. - */ -struct grant_mapping { -u32 ref; /* grant ref */ -u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */ -domid_t domid; /* granting domain */ -}; - /* Per-domain grant information. */ struct grant_table { /* Table size. Number of frames shared with guest */ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 00/17] x86/hvm: I/O emulation cleanup and fix
Il 11/06/2015 17:42, Paul Durrant ha scritto: This patch series re-works much of the code involved in emulation of port and memory mapped I/O for HVM guests. The code has become very convoluted and, at least by inspection, certain emulations will apparently malfunction. The series is broken down into 17 patches (which are also available in my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git on the emulation22 branch) as follows: 0001-x86-hvm-simplify-hvmemul_do_io.patch 0002-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch 0003-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch 0004-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch 0005-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch 0006-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch 0007-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch 0008-x86-hvm-split-I-O-completion-handling-from-state-mod.patch 0009-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch 0010-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch 0011-x86-hvm-remove-hvm_io_state-enumeration.patch 0012-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch 0013-x86-hvm-only-acquire-RAM-pages-for-emulation-when-we.patch 0014-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch 0015-x86-hvm-make-sure-translated-MMIO-reads-or-writes-fa.patch 0016-x86-hvm-remove-multiple-open-coded-chunking-loops.patch 0017-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch v2: - Removed bogus assertion from patch 16 - Re-worked patch 17 after basic testing of back-port onto XenServer Thanks, I confirm that this new version solves the critical regression that causes dom0 insta-reboot. I tested them on windows and linux hvm domUs, I had strange results... On windows 7 64 bit sp1 domU (with new winpv drivers) seems there are a small performance regression (or probably only latency increased) but I not found warning/error or something userful in logs, I'm not sure is real regression of your patches, if needed I'll retry without. On linux domU (fedora 21) qxl that had sse2 instructions problem still not working, same of latest 2 years not always qemu crash with a gdb showing sse2 instruction problem but xorg crash with EQ overflowing and/or domU remain with 100% cpu used by xorg qemu process at 100% cpu. I posted the backtrace similar time ago and a qemu/spice developer told that was a conseguence of other problem if I remember good. How can I be sure that sse2 or similar istructions (with videoram) are now working correctly? If you need more informations/tests tell me and I'll post them. Thanks for any reply and sorry for my bad english. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 11/15] vmx: Add a global wake-up vector for VT-d Posted-Interrupts
On 12.06.15 at 11:40, feng...@intel.com wrote: From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Jan Beulich Sent: Tuesday, June 09, 2015 11:20 PM On 08.05.15 at 11:07, feng...@intel.com wrote: +{ +struct vcpu *v; +unsigned int cpu = smp_processor_id(); + +spin_lock(per_cpu(blocked_vcpu_lock, cpu)); +list_for_each_entry(v, per_cpu(blocked_vcpu, cpu), +blocked_vcpu_list) { How long do you think such a list can grow? It depends on how many vCPU is current blocked on this specific pCPU. I am not sure how long it can grow. The basic idea here is that, when notification wakeup event happens, we need to find a way to find the right vCPU, and then wakeup it, do you have any better idea for this? Not immediately, or else I would have spelled it out. But the list perhaps growing to hundreds entries is something that needs to be taken into consideration. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
-Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 12 June 2015 11:26 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/12/2015 03:41 PM, Paul Durrant wrote: -Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 12 June 2015 04:22 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/11/2015 09:25 PM, Paul Durrant wrote: -Original Message- From: Yang Hongyang [mailto:yan...@cn.fujitsu.com] Sent: 11 June 2015 13:59 To: Paul Durrant; Wen Congyang; Andrew Cooper; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/11/2015 06:20 PM, Paul Durrant wrote: -Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 11 June 2015 09:48 To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; [...] In our implementation, we don't start a new emulator. The codes can work, but some bugs may be not triggered. How do you reconcile the incoming QEMU save record with the running emulator state? We introduce a qmp command xen-load-devices- state(libxl__qmp_restore) which can restore the emulator state. The step of resotre emulator state at a checkpoint is: 1. libxl__qmp_stop- vm_stop() in qemu 2. libxl__qmp_restore - load_vmstate() in qemu 3. libxl__qmp_resume - vm_start() in qemu Ok, that sounds like the ideal time to hook back into Xen by creating a new ioreq server. I have some questions about ioreq server: 1. If we use old version xen and newest version qemu, is it OK? Is default ioreq server created when the guest is created. xen_create_ioreq_server() does nothing, and xen_get_ioreq_server_info() will get the default ioreq server information. Is it right? No. It's not compatible in that direction. A new Xen will work with an old QEMU but not the other way round. 2. Why we create a default ioreq server when getting the hvm param if there is already a not default ioreq server? If something reads the 'legacy' HVM params then that is Xen's trigger to create the default server. Any 'new' emulator should be using the ioreq server hypercalls so the default server will not be needed. If there are two ioreq servers: default ioreq server, and a ioreq server created by emulator. The guest can work it correctly in this case? You mean a secondary emulator? Yes, that's why there is the notion of default ioreq server... to allow a secondary emulator to be used even when an old QEMU is in use. Is there any application(not emulator) that uses the libxenctrl directly? What do you mean by application? Toolstacks may use libxenctrl. Paul Thanks Wen Congyang 3. In the far end, we will clear the ioreq page, and this ioreq page is used for default ioreq server, is it right? Yes, AFAIK it's only the 'magic' pages that get cleared at the far end - and that includes the default server pages. Other ioreq servers will have their pages cleared on re-insertion to the P2M at the source end when the server is disabled. Paul Thanks Wen Congyang Paul Paul Thanks Wen Congyang Paul Thanks Wen Congyang Paul We will set to the guest to a new state, the old state should be dropped. Thanks Wen Congyang Paul Thanks Wen Congyang Paul Thanks Wen Congyang ~Andrew . . . ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel . . ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel . -- Thanks, Yang. . . ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [rumpuserxen test] 58422: regressions - FAIL
flight 58422 rumpuserxen real [real] http://logs.test-lab.xenproject.org/osstest/logs/58422/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-rumpuserxen 5 rumpuserxen-build fail REGR. vs. 33866 build-i386-rumpuserxen5 rumpuserxen-build fail REGR. vs. 33866 Tests which did not succeed, but are not blocking: test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a version targeted for testing: rumpuserxen 3b91e44996ea6ae1276bce1cc44f38701c53ee6f baseline version: rumpuserxen 30d72f3fc5e35cd53afd82c8179cc0e0b11146ad People who touched revisions under test: Antti Kantee po...@iki.fi Ian Jackson ian.jack...@eu.citrix.com Martin Lucina mar...@lucina.net Wei Liu l...@liuw.name jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-pvopspass build-i386-pvops pass build-amd64-rumpuserxen fail build-i386-rumpuserxen fail test-amd64-amd64-rumpuserxen-amd64 blocked test-amd64-i386-rumpuserxen-i386 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 535 lines long.) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 10/11] x86/intel_pstate: support the use of intel_pstate in pmstat.c
Hi, On 11/06/2015 23:03, Wang, Wei W wrote: On 11/06/2015 22:02, Julien Grall wrote: On 11/06/2015 04:31, Wei Wang wrote: -list_for_each(pos, cpufreq_governor_list) +if (policy-policy) What if another cpufreq decides to use policy-policy? What is another cpufreq? The policy is per-CPU struct. I mean another cpufreq driver. Correct me if I'm wrong but from the name policy is not intel pstate specific. That means that a new cpufreq driver can decide to use the field his own purpose.. +gov_num = INTEL_PSTATE_INTERNAL_GOV_NUM; Why not using cpufreq_governor_list? That's used by the old driver. We are not going through that old governor layer. This is common code, it's used by different cpufreq driver for both x86 and ARM (not yet supported). We should not relying on any other cpufreq driver won't use the field policy. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Relocatable Xen early boot code
Hey, During work on multiboot2 protocol support for Xen on EFI platform I discovered that we need relocatable Xen early boot code (which is mostly 32-bit code). More you can find here: http://lists.xen.org/archives/html/xen-devel/2015-02/msg01257.html I would like to focus on solution #1 described above. I have working PoC which implements it. We also discussed various solutions for this issue during Xen Hackhaton in Shanghai. As Andrew and Jan asked I tried to implement solution based on segment registers. However, after revisiting this issue and further investigation I still have some doubts. You can read about my findings below. Here I do not want to discuss GRUB2 and multiboot2 protocol support details for relocatable images. It is not needed. It is sufficient to know that it is able to put loaded image anywhere in available memory below 4 GiB. Loaded image is informed about its base address according to multiboot2 protocol via special tag. This is new feature not available in upstream GRUB2. I work on upstreaming it in parallel. Relevant patches will be posted together with Xen patches. 1) My PoC uses %ebp (last unused register available globally in Xen early boot code; there is assumption that %ds == %es == %ss) as a storage for Xen image base address. If boot protocol do not support relocatable images it is filled with static value calculated during build. Otherwise it is taken from special multiboot2 tag. We need Xen image base address to access memory in Xen image area in at least three different cases: a) direct memory access, e.g.: mov %eax,sym_offset(boot_tsc_stamp)(%ebp) b) memory address calculation, e.g.: lea sym_offset(__page_tables_start)(%ebp),%edx, c) update static addresses calculated during build, e.g. prebuild page tables. If we have Xen image base address in a register then all mentioned operations are quite simple. This idea is based on x86_64 addressing mode which uses %rip as a reference. 2) Andrew and Jan suggested that we can use segment descriptor to store Xen image base address. This way all references to variables in Xen image can be easily calculated during build and they will be static. However, relevant segment descriptor must be updated during Xen start. At first sight, it looks that %cs, %ds, %es, %ss should be initialized as is (start: 0, size: 4 GiB, ...). This way we will not break references to trampoline and all other variables living outside of Xen image. Additionally, we are sure that all references to variables from C code (xen/arch/x86/boot/reloc.c) are pointing always to the same place regardless of x86 instruction generated by compiler and segment register used by this instruction for addressing. So, it looks that potentially we can use %fs or %gs as segment register to access variables living in Xen image. Looks good... Case a and b from solution #1 seams easy to resolve by prefixing by chosen segment register. Case c requires an extra register to store temporarily Xen image base address. Looks quite easy... However, looking at xen/arch/x86/boot/cmdline.S (at first sight this is the worst thing) I am not so happy with segment register solution. Unfortunately all functions get as argument just displacement in segment. This means that we must rework all stuff there and pass segment(s) as additional argument(s) because most of code checks memory in and out of Xen image in parallel. Of course we can use solution similar to case c described above but I think that then whole stuff move closer to idea #1. This way we will have something which uses things from #1 and #2. I do not think that this is solution which we want. 3) There is a third solution which is a mixture of #1 and #2. We can use e.g. %fs:0 (e.g. located somewhere in Xen image) to store Xen image base address. If this value is needed then we can access it directly, e.g. add %fs:0,%esi, or copy to temporary register and use as required. So, I am still in favor of #1. It is clean and easy. It does not require a lot of work. #2 has a potential but requires a lot of changes in fragile cmdline.S (maybe others difficult places). Is it worth? I think that #3 is a backup solution in case we choose #1 and later it will appear that we need a globally unused register. Daniel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2] xSplice design
On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote: [...] ## Hypercalls We will employ the sub operations of the system management hypercall (sysctl). There are to be four sub-operations: * upload the payloads. * listing of payloads summary uploaded and their state. * getting an particular payload summary and its state. * command to apply, delete, or revert the payload. The patching is asynchronous therefore the caller is responsible to verify that it has been applied properly by retrieving the summary of it and verifying that there are no error codes associated with the payload. We **MUST** make it asynchronous due to the nature of patching: it requires every physical CPU to be lock-step with each other. The patching mechanism while an implementation detail, is not an short operation and as such the design **MUST** assume it will be an long-running operation. I am not convinced yet, that you need an asynchronous approach here. The experience from our prototype suggests that hotpatching itself is not an expensive operation. It can usually be completed well below 1ms with the most expensive part being getting the hypervisor to a quiet state. If we go for a barrier at hypervisor exit, combined with forcing all other CPUs through the hypervisor with IPIs, the typical case is very quick. The only reason why that would take some time is, if another CPU is executing a lengthy operation in the hypervisor already. In that case, you probably don't want to block the whole machine waiting for the joining of that single CPU anyway and instead re-try later, for example, using a timeout on the barrier. That could be signaled to the user-land process (EAGAIN) so that he could re-attempt hotpatching after some seconds. Martin ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 07/11] x86/intel_pstate: changes in cpufreq_del_cpu for CPU offline
On 11/06/2015 22:01, Wang, Wei W wrote: On 11/06/2015 22:06, Julien Grall wrote: On 11/06/2015 04:28, Wei Wang wrote: cpufreq_cpu_policy is used in intel_pstate_set_pstate(), so we change to NULL it after the call of cpufreq_driver-exit. Otherwise, a calltrace will show up on your screen due to the reference of a NULL pointer when you power down the system. Signed-off-by: Wei Wang wei.w.w...@intel.com --- xen/drivers/cpufreq/cpufreq.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/xen/drivers/cpufreq/cpufreq.c b/xen/drivers/cpufreq/cpufreq.c index 6003a8c..a8772e8 100644 --- a/xen/drivers/cpufreq/cpufreq.c +++ b/xen/drivers/cpufreq/cpufreq.c @@ -335,12 +335,11 @@ int cpufreq_del_cpu(unsigned int cpu) /* for HW_ALL, stop gov for each core of the _PSD domain */ /* for SW_ALL SW_ANY, stop gov for the 1st core of the _PSD domain */ -if (hw_all || (cpumask_weight(cpufreq_dom-map) == - perf-domain_info.num_processors)) +if (!policy-policy (hw_all || (cpumask_weight(cpufreq_dom-map) == + perf-domain_info.num_processors))) Based on your patch #6, the field policy contains value which is defined per- cpufreq driver (because you defined internal value). How can you be sure that a driver will never use 0 as a valid value? Hi Julien, what do you mean by per-cpufreq driver? We currently have two P-state drivers. This filed is currently only used by the intel_pstate driver, and the four usable values are: #define CPUFREQ_POLICY_POWERSAVE (1) #define CPUFREQ_POLICY_PERFORMANCE (2) #define CPUFREQ_POLICY_USERSPACE (3) #define CPUFREQ_POLICY_ONDEMAND(4) The intel_pstate won't use 0 as a valid value, and the default value is CPUFREQ_POLICY_ONDEMAND. If it's 0, it basically means the old acpi-cpufreq driver is being used. You seem to rely on nobody else with use the cpufreq framework... which is wrong. intel_pstate won't be the only possible cpufreq driver. Some ARM developper are working on adding a cpufreq for ARM power management. You said that CPUFREQ_POLICY_* is specific to the intel driver. But use them in the common code. If any cpufreq driver can use the value, then make it common. Otherwise please move this code outside of the framework. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
-Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 12 June 2015 12:10 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen-devel@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/12/2015 06:54 PM, Paul Durrant wrote: -Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 12 June 2015 11:26 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/12/2015 03:41 PM, Paul Durrant wrote: -Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 12 June 2015 04:22 To: Paul Durrant; Yang Hongyang; Andrew Cooper; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/11/2015 09:25 PM, Paul Durrant wrote: -Original Message- From: Yang Hongyang [mailto:yan...@cn.fujitsu.com] Sent: 11 June 2015 13:59 To: Paul Durrant; Wen Congyang; Andrew Cooper; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; yunhong.ji...@intel.com; Eddie Dong; rshri...@cs.ubc.ca; Ian Jackson Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time On 06/11/2015 06:20 PM, Paul Durrant wrote: -Original Message- From: Wen Congyang [mailto:we...@cn.fujitsu.com] Sent: 11 June 2015 09:48 To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen- de...@lists.xen.org Cc: Wei Liu; Ian Campbell; guijianf...@cn.fujitsu.com; [...] In our implementation, we don't start a new emulator. The codes can work, but some bugs may be not triggered. How do you reconcile the incoming QEMU save record with the running emulator state? We introduce a qmp command xen-load-devices- state(libxl__qmp_restore) which can restore the emulator state. The step of resotre emulator state at a checkpoint is: 1. libxl__qmp_stop- vm_stop() in qemu 2. libxl__qmp_restore - load_vmstate() in qemu 3. libxl__qmp_resume - vm_start() in qemu Ok, that sounds like the ideal time to hook back into Xen by creating a new ioreq server. I have some questions about ioreq server: 1. If we use old version xen and newest version qemu, is it OK? Is default ioreq server created when the guest is created. xen_create_ioreq_server() does nothing, and xen_get_ioreq_server_info() will get the default ioreq server information. Is it right? No. It's not compatible in that direction. A new Xen will work with an old QEMU but not the other way round. If the xen is newest, and qemu is old, how is the default ioreq server created for the emulator? Will the old qemu call xc_get_hvm_param(), and the hypervisor then creates a default ioreq server? get is a 'readonly' operation, and creating the default ioreq server in it is very strange. Might be strange, but that's the way it's done. 2. Why we create a default ioreq server when getting the hvm param if there is already a not default ioreq server? If something reads the 'legacy' HVM params then that is Xen's trigger to create the default server. Any 'new' emulator should be using the ioreq server hypercalls so the default server will not be needed. If there are two ioreq servers: default ioreq server, and a ioreq server created by emulator. The guest can work it correctly in this case? You mean a secondary emulator? Yes, that's why there is the notion of default ioreq server... to allow a secondary emulator to be used even when an old QEMU is in use. No, only one emulator. Can we run more than one emulator for one hvm guest? How to do it? Yes, more than one emulator can run. There's nothing in libxl to do it, but we do it in XenServer using XAPI. Is there any application(not emulator) that uses the libxenctrl directly? What do you mean by application? Toolstacks may use libxenctrl. For example: libvirt. I know it uses libxl now. Is there any similar application which uses libxenctrl. Well, XAPI uses it for one. Paul Thanks Wen Congyang Paul Thanks Wen Congyang 3. In the far end, we will clear the ioreq page, and this ioreq page is used for default ioreq server, is it right? Yes, AFAIK it's only the 'magic' pages that get cleared at the far end - and that includes the default server pages.
Re: [Xen-devel] [PATCH v2 COLOPre 07/13] tools/libxl: Update libxl__domain_unpause() to support qemu-xen
On Mon, Jun 08, 2015 at 11:43:11AM +0800, Yang Hongyang wrote: Currently, libxl__domain_unpause() only supports qemu-xen-traditional. Update it to support qemu-xen. Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com This looks very similar to an existing function called libxl__domain_resume_device_model. Maybe you don't need to invent a new function. --- tools/libxl/libxl.c | 42 +- 1 file changed, 33 insertions(+), 9 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index d5691dc..5c843c2 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -933,10 +933,37 @@ out: return AO_INPROGRESS; } -int libxl__domain_unpause(libxl__gc *gc, uint32_t domid) +static int libxl__domain_unpause_device_model(libxl__gc *gc, uint32_t domid) { char *path; char *state; + +switch (libxl__device_model_version_running(gc, domid)) { +case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: { +uint32_t dm_domid = libxl_get_stubdom_id(CTX, domid); + +path = libxl__device_model_xs_path(gc, dm_domid, domid, /state); +state = libxl__xs_read(gc, XBT_NULL, path); +if (state != NULL !strcmp(state, paused)) { The only difference between your function and libxl__domain_unpause_device_model is the check for state node. I think you can just add the check to libxl__domain_resume_device_model and use that function. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v11 6/9] xen: Add ring 3 vmware_port support
On 06/12/15 02:25, Jan Beulich wrote: On 12.06.15 at 00:10, dsl...@verizon.com wrote: On 06/05/15 06:54, Ian Campbell wrote: It would be really useful to see a comprehensive list of exactly what guest ring3 access to the vmware port actually enables i.e. a list of specific features which require it. Ok, I have done some testing. Here is what I know: Without ring3 support: 1) VMware tools will not install on linux and windows. 2) open-vm-tools (https://github.com/vmware/open-vm-tools) will not install (how ever it is not hard to change it to do so, you need to add a call to iopl(3) need to be added in a few places) on linux However if VMware tools did get installed on the window disk bits somehow, the VMware mouse support works. Linux gets this because Xorg detects and uses the VMware mouse under IOPL(3). Now that tells us that the tools may not work, but not what implications that has on the usability of the VM once migrated to Xen. Them not installing is a non-issue afaict, since after having moved the VM to Xen there shouldn't be a need to install them anymore - either they've been there, or I don't see why they would be needed _after_ the move. And you say that the mouse works in both cases if the tools happen to be there. The VMware tools service will start but does not work. It adds to the event log: [warning][vmusr:vmtoolsd] The vmuser service needs to run inside a virtual machine. And the available features of VMware tools is disabled: 1) The ability to perform virtual machine power operations gracefully is missing. (code to access QEMU's from Xen to do this is missing). I.E. get windows to shutdown when requested! 2) Execution of VMware provided or user configured scripts in guests during various power operations. 3) Clock synchronization between guests and hosts or client desktops. 4) Access to VMware guest info variables (code to access QEMU's from Xen to do this is missing). This can be used to customize guest operating systems immediately after powering on virtual machines. It can also be used to monitor the health of a guest. The reason to install them after is to get the VMware mouse driver on Windows. This mouse driver works much better on Window when there is higher network latency. -Don Slutz Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH OSSTEST v2] Arrange to upgrade microcode on x86 test hosts.
Ian Campbell writes ([PATCH OSSTEST v2] Arrange to upgrade microcode on x86 test hosts.): Both Xen and Linux support extracting a microcode update from an initramfs early during boot. This requires prepending a suitable uncompressed cpio archive containing the necessary files to the initrd. ... Upon commit please run mg-cpu-microcode-update and set MicrocodeUpdateAmd64 and MicrocodeUpdateI386 to the resulting file in both production-config and production-config-cambridge (I will copy the binary to the Cambridge location) Acked-by: Ian Jackson ian.jack...@eu.citrix.com Please, as discussed on irc, feel free to run this new mg-cpu-microcode-update on the production instance. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 4/6] gnttab: simplify page copying/clearing
... by making {copy,clear}_domain_page() available also on other than x86. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/domain_page.c +++ b/xen/arch/x86/domain_page.c @@ -230,24 +230,6 @@ void unmap_domain_page(const void *ptr) local_irq_restore(flags); } -void clear_domain_page(unsigned long mfn) -{ -void *ptr = map_domain_page(mfn); - -clear_page(ptr); -unmap_domain_page(ptr); -} - -void copy_domain_page(unsigned long dmfn, unsigned long smfn) -{ -const void *src = map_domain_page(smfn); -void *dst = map_domain_page(dmfn); - -copy_page(dst, src); -unmap_domain_page(dst); -unmap_domain_page(src); -} - int mapcache_domain_init(struct domain *d) { struct mapcache_domain *dcache = d-arch.pv_domain.mapcache; --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -1643,7 +1643,6 @@ gnttab_transfer( if ( (1UL (max_bitsize - PAGE_SHIFT)) = mfn ) { struct page_info *new_page; -void *sp, *dp; new_page = alloc_domheap_page(e, MEMF_no_owner | MEMF_bits(max_bitsize)); @@ -1653,11 +1652,7 @@ gnttab_transfer( goto unlock_and_copyback; } -sp = map_domain_page(mfn); -dp = __map_domain_page(new_page); -memcpy(dp, sp, PAGE_SIZE); -unmap_domain_page(dp); -unmap_domain_page(sp); +copy_domain_page(page_to_mfn(new_page), mfn); page-count_info = ~(PGC_count_mask|PGC_allocated); free_domheap_page(page); @@ -2434,7 +2429,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA /* Make sure there's no crud left over in the table from the old version. */ for ( i = 0; i nr_grant_frames(gt); i++ ) -memset(gt-shared_raw[i], 0, PAGE_SIZE); +clear_page(gt-shared_raw[i]); /* Restore the first 8 entries (toolstack reserved grants) */ if ( gt-gt_version != 0 op.version == 1 ) --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -1170,6 +1170,26 @@ long do_memory_op(unsigned long cmd, XEN return rc; } +#ifdef CONFIG_DOMAIN_PAGE +void clear_domain_page(unsigned long mfn) +{ +void *ptr = map_domain_page(mfn); + +clear_page(ptr); +unmap_domain_page(ptr); +} + +void copy_domain_page(unsigned long dmfn, unsigned long smfn) +{ +const void *src = map_domain_page(smfn); +void *dst = map_domain_page(dmfn); + +copy_page(dst, src); +unmap_domain_page(dst); +unmap_domain_page(src); +} +#endif + void destroy_ring_for_helper( void **_va, struct page_info *page) { --- a/xen/include/asm-arm/page.h +++ b/xen/include/asm-arm/page.h @@ -264,6 +264,8 @@ static inline lpae_t mfn_to_xen_entry(un /* Actual cacheline size on the boot CPU. */ extern size_t cacheline_bytes; +#define copy_page(dp, sp) memcpy(dp, sp, PAGE_SIZE) + /* Functions for flushing medium-sized areas. * if 'range' is large enough we might want to use model-specific * full-cache flushes. */ gnttab: simplify page copying/clearing ... by making {copy,clear}_domain_page() available also on other than x86. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/domain_page.c +++ b/xen/arch/x86/domain_page.c @@ -230,24 +230,6 @@ void unmap_domain_page(const void *ptr) local_irq_restore(flags); } -void clear_domain_page(unsigned long mfn) -{ -void *ptr = map_domain_page(mfn); - -clear_page(ptr); -unmap_domain_page(ptr); -} - -void copy_domain_page(unsigned long dmfn, unsigned long smfn) -{ -const void *src = map_domain_page(smfn); -void *dst = map_domain_page(dmfn); - -copy_page(dst, src); -unmap_domain_page(dst); -unmap_domain_page(src); -} - int mapcache_domain_init(struct domain *d) { struct mapcache_domain *dcache = d-arch.pv_domain.mapcache; --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -1643,7 +1643,6 @@ gnttab_transfer( if ( (1UL (max_bitsize - PAGE_SHIFT)) = mfn ) { struct page_info *new_page; -void *sp, *dp; new_page = alloc_domheap_page(e, MEMF_no_owner | MEMF_bits(max_bitsize)); @@ -1653,11 +1652,7 @@ gnttab_transfer( goto unlock_and_copyback; } -sp = map_domain_page(mfn); -dp = __map_domain_page(new_page); -memcpy(dp, sp, PAGE_SIZE); -unmap_domain_page(dp); -unmap_domain_page(sp); +copy_domain_page(page_to_mfn(new_page), mfn); page-count_info = ~(PGC_count_mask|PGC_allocated); free_domheap_page(page); @@ -2434,7 +2429,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA /* Make sure there's no crud left over in the table from the old version. */ for ( i = 0; i nr_grant_frames(gt); i++ ) -memset(gt-shared_raw[i], 0, PAGE_SIZE); +
Re: [Xen-devel] [PATCH] libxl: libxl_internal.h: Clarify ao rule against internal callers
Ian Campbell writes (Re: [PATCH] libxl: libxl_internal.h: Clarify ao rule against internal callers): On Thu, 2015-06-11 at 17:57 +0100, Ian Jackson wrote: Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com CC: Juergen Gross jgr...@suse.com Acked-by: Ian Campbell ian.campb...@citrix.com AO_INITIATOR_ENTRY was just a stray remainder of some wip name? Yes. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxl: libxl_internal.h: Clarify ao rule against internal callers
Juergen Gross writes (Re: [Xen-devel] [PATCH] libxl: libxl_internal.h: Clarify ao rule against internal callers): Now it's 100% clear what to avoid. Acked-by: Juergen Gross jgr...@suse.com Thanks, pushed. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] libxl backports for 4.4 and 4.5
Ian Jackson writes (Re: [Xen-devel] libxl backports for 4.4 and 4.5): I propose to push all of these today, unless you object. I will double-check with you on irc. Now done. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Backport request libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap. (Was: Re: [Bug report] Security issue in xl vcpu-set)
Ian Campbell writes (Backport request libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap. (Was: Re: [Bug report] Security issue in xl vcpu-set)): commit d83bf9d224eeb5b73b93c2703f7dba4473cfa89c Author: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Fri Apr 3 16:02:29 2015 -0400 libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap. Now backported to staging-4.5. I fixed up the conflict, correctly I think. Ian. commit 0d8cbcad03764e42ff2f0d224aff883c3734d782 Author: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Fri Apr 3 16:02:29 2015 -0400 libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap. There is no sense in trying to online (or offline) CPUs when the size of cpumap is greater than the maximum number of VCPUs the guest can go to. As such fail the operation if the count of CPUs to online is greater than what the guest started with. For the offline case we do not check (as the bits are unset in the cpumap) and let it go through. We coalesce some of the underlying libxl_set_vcpuonline code together which was duplicated in QMP and XenStore codepaths. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Acked-by: Ian Campbell ian.campb...@citrix.com (cherry picked from commit d83bf9d224eeb5b73b93c2703f7dba4473cfa89c) Conflicts: tools/libxl/libxl.c Signed-off-by: Ian Jackson ian.jack...@eu.citrix.com diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 1f4dce2..489d5f8 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -5487,25 +5487,19 @@ int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, } static int libxl__set_vcpuonline_xenstore(libxl__gc *gc, uint32_t domid, - libxl_bitmap *cpumap) + libxl_bitmap *cpumap, + const libxl_dominfo *info) { -libxl_dominfo info; char *dompath; xs_transaction_t t; int i, rc = ERROR_FAIL; -libxl_dominfo_init(info); - -if (libxl_domain_info(CTX, info, domid) 0) { -LOGE(ERROR, getting domain info list); -goto out; -} if (!(dompath = libxl__xs_get_dompath(gc, domid))) goto out; retry_transaction: t = xs_transaction_start(CTX-xsh); -for (i = 0; i = info.vcpu_max_id; i++) +for (i = 0; i = info-vcpu_max_id; i++) libxl__xs_write(gc, t, libxl__sprintf(gc, %s/cpu/%u/availability, dompath, i), %s, libxl_bitmap_test(cpumap, i) ? online : offline); @@ -5515,24 +5509,16 @@ retry_transaction: } else rc = 0; out: -libxl_dominfo_dispose(info); return rc; } static int libxl__set_vcpuonline_qmp(libxl__gc *gc, uint32_t domid, - libxl_bitmap *cpumap) + libxl_bitmap *cpumap, + const libxl_dominfo *info) { -libxl_dominfo info; int i; -libxl_dominfo_init(info); - -if (libxl_domain_info(CTX, info, domid) 0) { -LOGE(ERROR, getting domain info list); -libxl_dominfo_dispose(info); -return ERROR_FAIL; -} -for (i = 0; i = info.vcpu_max_id; i++) { +for (i = 0; i = info-vcpu_max_id; i++) { if (libxl_bitmap_test(cpumap, i)) { /* Return value is ignore because it does not tell anything useful * on the completion of the command. @@ -5542,33 +5528,53 @@ static int libxl__set_vcpuonline_qmp(libxl__gc *gc, uint32_t domid, libxl__qmp_cpu_add(gc, domid, i); } } -libxl_dominfo_dispose(info); return 0; } int libxl_set_vcpuonline(libxl_ctx *ctx, uint32_t domid, libxl_bitmap *cpumap) { GC_INIT(ctx); -int rc; +int rc, maxcpus; +libxl_dominfo info; + +libxl_dominfo_init(info); + +rc = libxl_domain_info(CTX, info, domid); +if (rc 0) { +LOGE(ERROR, getting domain info list); +goto out; +} + +maxcpus = libxl_bitmap_count_set(cpumap); +if (maxcpus info.vcpu_max_id + 1) +{ +LOGE(ERROR, Requested %d VCPUs, however maxcpus is %d!, + maxcpus, info.vcpu_max_id + 1); +rc = ERROR_FAIL; +goto out; +} + switch (libxl__domain_type(gc, domid)) { case LIBXL_DOMAIN_TYPE_HVM: switch (libxl__device_model_version_running(gc, domid)) { case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: -rc = libxl__set_vcpuonline_xenstore(gc, domid, cpumap); +rc = libxl__set_vcpuonline_xenstore(gc, domid, cpumap, info); break; case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: -rc = libxl__set_vcpuonline_qmp(gc, domid, cpumap); +rc = libxl__set_vcpuonline_qmp(gc, domid, cpumap, info);
Re: [Xen-devel] [Draft F] Xen on ARM vITS Handling
On Thu, 2015-06-11 at 10:40 +0100, Ian Campbell wrote: ## Command Queue Virtualisation The command translation/emulation in this design has been arranged to be as cheap as possible (e.g. in many cases the actions are NOPs), avoiding previous concerns about the length of time which an emulated write to a `CWRITER` register may block the vcpu. The vits will simply track its reader and writer pointers. On write to `CWRITER` it will immediately and synchronously process all commands in the queue and update its state accordingly. It might be possible to implement a rudimentary form of preemption by periodically (as determined by `hypercall_preempt_check()`) returning to the guest without incrementing PC but with updated internal `CREADR` state, meaning it will reexecute the write to `CWRITER` and we can pickup where we left off for another iteration. This at least lets us schedule other vcpus etc and prevents a monopoly. In the presence of multiple VCPUs writing to GITS_CWRITER preemption actually gets pretty subtle. I suggest leaving it out for now. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 COLOPre 10/13] tools/libxl: Add back channel to allow migration target send data back
On Mon, Jun 08, 2015 at 11:43:14AM +0800, Yang Hongyang wrote: From: Wen Congyang we...@cn.fujitsu.com In colo mode, slave needs to send data to master, but the io_fd only can be written in master, and only can be read in slave. Save recv_fd in domain_suspend_state, and send_fd in domain_create_state. You failed to mention in commit message new structures are introduced in IDL. Signed-off-by: Wen Congyang we...@cn.fujitsu.com Signed-off-by: Yang Hongyang yan...@cn.fujitsu.com --- tools/libxl/libxl.c | 2 +- tools/libxl/libxl_create.c | 14 ++ tools/libxl/libxl_internal.h | 2 ++ tools/libxl/libxl_types.idl | 7 +++ tools/libxl/xl_cmdimpl.c | 7 +++ You also need to add LIBXL_HAVE in libxl.h. 5 files changed, 27 insertions(+), 5 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 5c843c2..36b97fe 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -832,7 +832,7 @@ int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info, dss-callback = remus_failover_cb; dss-domid = domid; dss-fd = send_fd; -/* TODO do something with recv_fd */ +dss-recv_fd = recv_fd; dss-type = type; dss-live = 1; dss-debug = 0; diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 86384d2..bd8149c 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -1577,8 +1577,8 @@ static void domain_create_cb(libxl__egc *egc, int rc, uint32_t domid); static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config, -uint32_t *domid, -int restore_fd, int checkpointed_stream, +uint32_t *domid, int restore_fd, +int send_fd, int checkpointed_stream, const libxl_asyncop_how *ao_how, const libxl_asyncprogress_how *aop_console_how) { @@ -1591,6 +1591,7 @@ static int do_domain_create(libxl_ctx *ctx, libxl_domain_config *d_config, libxl_domain_config_init(cdcs-dcs.guest_config_saved); libxl_domain_config_copy(ctx, cdcs-dcs.guest_config_saved, d_config); cdcs-dcs.restore_fd = restore_fd; +cdcs-dcs.send_fd = send_fd; cdcs-dcs.callback = domain_create_cb; cdcs-dcs.checkpointed_stream = checkpointed_stream; libxl__ao_progress_gethow(cdcs-dcs.aop_console_how, aop_console_how); @@ -1619,7 +1620,7 @@ int libxl_domain_create_new(libxl_ctx *ctx, libxl_domain_config *d_config, const libxl_asyncop_how *ao_how, const libxl_asyncprogress_how *aop_console_how) { -return do_domain_create(ctx, d_config, domid, -1, 0, +return do_domain_create(ctx, d_config, domid, -1, -1, 0, ao_how, aop_console_how); } @@ -1629,7 +1630,12 @@ int libxl_domain_create_restore(libxl_ctx *ctx, libxl_domain_config *d_config, const libxl_asyncop_how *ao_how, const libxl_asyncprogress_how *aop_console_how) { -return do_domain_create(ctx, d_config, domid, restore_fd, +int send_fd = -1; + +if (params-checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_COLO) +send_fd = params-send_fd; + +return do_domain_create(ctx, d_config, domid, restore_fd, send_fd, params-checkpointed_stream, ao_how, aop_console_how); } diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index fbbae93..6d214b5 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2874,6 +2874,7 @@ struct libxl__domain_save_state { uint32_t domid; int fd; +int recv_fd; libxl_domain_type type; int live; int debug; @@ -3143,6 +3144,7 @@ struct libxl__domain_create_state { libxl_domain_config *guest_config; libxl_domain_config guest_config_saved; /* vanilla config */ int restore_fd; +int send_fd; libxl__domain_create_cb *callback; libxl_asyncprogress_how aop_console_how; /* private to domain_create */ diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 23f27d4..8a3d7ba 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -198,6 +198,12 @@ libxl_viridian_enlightenment = Enumeration(viridian_enlightenment, [ (3, reference_tsc), ]) +libxl_checkpointed_stream = Enumeration(checkpointed_stream, [ +(0, NONE), +(1, REMUS), +(2, COLO), +], init_val = 0) The default init_val is 0 so you don't need to write it down. + # # Complex libxl types # @@ -346,6 +352,7 @@ libxl_domain_create_info = Struct(domain_create_info,[ libxl_domain_restore_params = Struct(domain_restore_params, [ (checkpointed_stream,
Re: [Xen-devel] [PATCH v2] tools/libxc: Batch memory allocations for PV guests
On Thu, Jun 11, 2015 at 09:52:44AM +0100, Ross Lagerwall wrote: The current code for allocating memory for PV guests batches the hypercalls to allocate memory by allocating 1024*1024 extents of order 0 at a time. To make this faster, first try allocating extents of order 9 (2 MiB) before falling back to the order 0 allocating if the order 9 allocation fails. On my test machine this reduced the time to start a 128 GiB PV guest by about 60 seconds. Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com --- Changed in v2: Batched hypercalls for order 9 allocations. tools/libxc/xc_dom_x86.c | 26 -- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index 783f749..2d461e3 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -761,7 +761,7 @@ int arch_setup_meminit(struct xc_dom_image *dom) { int rc; xen_pfn_t pfn, allocsz, mfn, total, pfn_base; -int i, j; +int i, j, k; rc = x86_compat(dom-xch, dom-guest_domid, dom-guest_type); if ( rc ) @@ -869,6 +869,8 @@ int arch_setup_meminit(struct xc_dom_image *dom) unsigned int memflags; uint64_t pages; unsigned int pnode = dom-vnode_to_pnode[dom-vmemranges[i].nid]; +int count = dom-total_pages SUPERPAGE_PFN_SHIFT; +xen_pfn_t extents[count]; Sorry, I forgot to mention not to follow this practice. We don't want unbound growth of the stack, which can lead to process getting killed. The hypervisor would preempt long running operation anyway so the overly large array doesn't necessarily reduce the number of hypercall issued. Could you define a batch size (say, 512 or 1024) and then start from there? Sorry again for not having mentioned that earlier. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC v2 08/15] Update IRTE according to guest interrupt config changes
On 12.06.15 at 11:40, feng...@intel.com wrote: From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Tuesday, June 09, 2015 11:06 PM On 08.05.15 at 11:07, feng...@intel.com wrote: +static bool_t pi_find_dest_vcpu(struct domain *d, uint8_t dest_id, +uint8_t dest_mode, uint8_t delivery_mode, +uint8_t gvec, struct vcpu **dest_vcpu) +{ +struct vcpu *v, **dest_vcpu_array; +unsigned int dest_vcpu_num = 0; +int ret; This, being given as operand to return, should match in type with the function's return type. +dest_vcpu_array = xzalloc_array(struct vcpu *, d-max_vcpus); You realize that this can be quite big an allocation? (You could at least halve it by storing vCPU IDs instead of pointers, but if I'm not mistaken this could even be a simple bitmap.) If we use vCPU IDs or bitmap, we need to iterate all the vCPUs in the domain to find the right vCPU from the vcpu_id, right? Why? You can index d-vcpu[]. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/arm: Propagate clock-frequency to DOMU if present in the DT timer node
On 11/06/2015 17:43, Chris (Christopher) Brand wrote: Hi Julien, Hi Chris, The patch does work exactly as advertised. When I used dtc to convert CONFIG_DTB_FILE from dtb to dts, I could see that it didn't in fact have a timer clock-frequency node. After re-creating the dtb and rebuilding Xen, ls /proc/device-tree/timer/ shows a clock-frequency file. When I then fire up DomU and do the same command, it too has a clock-frequency node. Can I get your Tested-by on this patch? Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PCI Passthrough ARM Design : Draft1
On 12/06/2015 04:32, Ian Campbell wrote: On Thu, 2015-06-11 at 14:38 -0700, Manish Jaggi wrote: On Wednesday 10 June 2015 12:21 PM, Julien Grall wrote: Hi, On 10/06/2015 08:45, Ian Campbell wrote: 4. DomU access / assignment PCI device -- When a device is attached to a domU, provision has to be made such that it can access the MMIO space of the device and xen is able to identify the mapping between guest bdf and system bdf. Two hypercalls are introduced I don't think we want/need new hypercalls here, the same existing hypercalls which are used on x86 should be suitable. I think both the hypercalls are necessary a) the mapping of guest bdf to actual sbdf is required as domU accesses for GIC are trapped and not handled by pciback. A device say 1:0:0.3 is assigned in domU at 0:0:0.3. This is the bestway I could find that works. b) map_mmio call is issued just after the device is added on the pcu bus (in case of domU) The function register_xen_pci_notifier (drivers/xen/pci.c) is modified such that notification is received in domU and dom0. In which please please add to the document a discussion of the current interfaces and why they are not suitable. Beware that the 1:1 mapping doesn't fit with the current guest memory layout which is pre-defined at Xen build time. So you would also have to make it dynamically or decide to use the same memory layout as the host. If same layout as host used, would there be any issue? I'm not sure that a 1:1 mapping is any different to the host layout. But in any case, the host layout also doesn't match the guest layout, so it has the same issues. I was suggesting to expose the host layout to the guest layout (similar to e820). Although, this is not my preferred way and a non 1:1 mapping would be the best. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/6] gnttab: limit mapcount() looping
The function also doesn't need to return counts; all its callers are after is whether at least one entry of a certain kind exists. With that there's no point for that loop to continue once the looked for condition was found to be met by one entry. Rename the function to match the changed behavior. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -541,24 +541,28 @@ static int grant_map_exists(const struct return -EINVAL; } -static void mapcount( -struct grant_table *lgt, struct domain *rd, unsigned long mfn, -unsigned int *wrc, unsigned int *rdc) +#define MAPKIND_READ 1 +#define MAPKIND_WRITE 2 +static unsigned int mapkind( +const struct grant_table *lgt, const struct domain *rd, unsigned long mfn) { struct grant_mapping *map; grant_handle_t handle; +unsigned int kind = 0; -*wrc = *rdc = 0; - -for ( handle = 0; handle lgt-maptrack_limit; handle++ ) +for ( handle = 0; !(kind MAPKIND_WRITE) + handle lgt-maptrack_limit; handle++ ) { map = maptrack_entry(lgt, handle); if ( !(map-flags (GNTMAP_device_map|GNTMAP_host_map)) || map-domid != rd-domain_id ) continue; if ( active_entry(rd-grant_table, map-ref).frame == mfn ) -(map-flags GNTMAP_readonly) ? (*rdc)++ : (*wrc)++; +kind |= map-flags GNTMAP_readonly ? +MAPKIND_READ : MAPKIND_WRITE; } + +return kind; } /* @@ -782,21 +786,21 @@ __gnttab_map_grant_ref( if ( gnttab_need_iommu_mapping(ld) ) { -unsigned int wrc, rdc; +unsigned int kind = mapkind(lgt, rd, frame); int err = 0; + /* We're not translated, so we know that gmfns and mfns are the same things, so the IOMMU entry is always 1-to-1. */ -mapcount(lgt, rd, frame, wrc, rdc); if ( (act_pin (GNTPIN_hstw_mask|GNTPIN_devw_mask)) !(old_pin (GNTPIN_hstw_mask|GNTPIN_devw_mask)) ) { -if ( wrc == 0 ) +if ( !(kind MAPKIND_WRITE) ) err = iommu_map_page(ld, frame, frame, IOMMUF_readable|IOMMUF_writable); } else if ( act_pin !old_pin ) { -if ( (wrc + rdc) == 0 ) +if ( !kind ) err = iommu_map_page(ld, frame, frame, IOMMUF_readable); } if ( err ) @@ -990,12 +994,12 @@ __gnttab_unmap_common( if ( gnttab_need_iommu_mapping(ld) ) { -unsigned int wrc, rdc; +unsigned int kind = mapkind(lgt, rd, op-frame); int err = 0; -mapcount(lgt, rd, op-frame, wrc, rdc); -if ( (wrc + rdc) == 0 ) + +if ( !kind ) err = iommu_unmap_page(ld, op-frame); -else if ( wrc == 0 ) +else if ( !(kind MAPKIND_WRITE) ) err = iommu_map_page(ld, op-frame, op-frame, IOMMUF_readable); if ( err ) { gnttab: limit mapcount() looping The function also doesn't need to return counts; all its callers are after is whether at least one entry of a certain kind exists. With that there's no point for that loop to continue once the looked for condition was found to be met by one entry. Rename the function to match the changed behavior. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -541,24 +541,28 @@ static int grant_map_exists(const struct return -EINVAL; } -static void mapcount( -struct grant_table *lgt, struct domain *rd, unsigned long mfn, -unsigned int *wrc, unsigned int *rdc) +#define MAPKIND_READ 1 +#define MAPKIND_WRITE 2 +static unsigned int mapkind( +const struct grant_table *lgt, const struct domain *rd, unsigned long mfn) { struct grant_mapping *map; grant_handle_t handle; +unsigned int kind = 0; -*wrc = *rdc = 0; - -for ( handle = 0; handle lgt-maptrack_limit; handle++ ) +for ( handle = 0; !(kind MAPKIND_WRITE) + handle lgt-maptrack_limit; handle++ ) { map = maptrack_entry(lgt, handle); if ( !(map-flags (GNTMAP_device_map|GNTMAP_host_map)) || map-domid != rd-domain_id ) continue; if ( active_entry(rd-grant_table, map-ref).frame == mfn ) -(map-flags GNTMAP_readonly) ? (*rdc)++ : (*wrc)++; +kind |= map-flags GNTMAP_readonly ? +MAPKIND_READ : MAPKIND_WRITE; } + +return kind; } /* @@ -782,21 +786,21 @@ __gnttab_map_grant_ref( if ( gnttab_need_iommu_mapping(ld) ) { -unsigned int wrc, rdc; +unsigned int kind = mapkind(lgt, rd, frame); int err = 0; + /* We're not translated, so we know that gmfns and mfns are the same things, so the IOMMU entry is always 1-to-1. */ -mapcount(lgt, rd, frame, wrc, rdc); if
[Xen-devel] [PATCH 3/6] gnttab: simplify shared entry v1 vs v2 handling
In a number of places both v1 and v2 pointers are being obtained when none or just one suffices. Additionally in __acquire_grant_for_copy() the flow of if/else-if can be slightly improved by re-ordering. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -588,8 +588,6 @@ __gnttab_map_grant_ref( unsigned int cache_flags; struct active_grant_entry *act = NULL; struct grant_mapping *mt; -grant_entry_v1_t *sha1; -grant_entry_v2_t *sha2; grant_entry_header_t *shah; uint16_t *status; @@ -645,15 +643,7 @@ __gnttab_map_grant_ref( act = active_entry(rgt, op-ref); shah = shared_entry_header(rgt, op-ref); -if (rgt-gt_version == 1) { -sha1 = shared_entry_v1(rgt, op-ref); -sha2 = NULL; -status = shah-flags; -} else { -sha2 = shared_entry_v2(rgt, op-ref); -sha1 = NULL; -status = status_entry(rgt, op-ref); -} +status = rgt-gt_version == 1 ? shah-flags : status_entry(rgt, op-ref); /* If already pinned, check the active domid and avoid refcnt overflow. */ if ( act-pin @@ -676,8 +666,10 @@ __gnttab_map_grant_ref( if ( !act-pin ) { unsigned long frame; +unsigned long gfn = rgt-gt_version == 1 ? +shared_entry_v1(rgt, op-ref).frame : +shared_entry_v2(rgt, op-ref).full_page.frame; -unsigned long gfn = sha1 ? sha1-frame : sha2-full_page.frame; rc = __get_paged_frame(gfn, frame, pg, !!(op-flags GNTMAP_readonly), rd); if ( rc != GNTST_okay ) @@ -1882,7 +1874,6 @@ __acquire_grant_for_copy( uint16_t *page_off, uint16_t *length, unsigned allow_transitive) { struct grant_table *rgt = rd-grant_table; -grant_entry_v1_t *sha1; grant_entry_v2_t *sha2; grant_entry_header_t *shah; struct active_grant_entry *act; @@ -1909,13 +1900,11 @@ __acquire_grant_for_copy( shah = shared_entry_header(rgt, gref); if ( rgt-gt_version == 1 ) { -sha1 = shared_entry_v1(rgt, gref); sha2 = NULL; status = shah-flags; } else { -sha1 = NULL; sha2 = shared_entry_v2(rgt, gref); status = status_entry(rgt, gref); } @@ -1937,7 +1926,19 @@ __acquire_grant_for_copy( td = rd; trans_gref = gref; -if ( sha2 (shah-flags GTF_type_mask) == GTF_transitive ) +if ( !sha2 ) +{ +unsigned long gfn = shared_entry_v1(rgt, gref).frame; + +rc = __get_paged_frame(gfn, grant_frame, page, readonly, rd); +if ( rc != GNTST_okay ) +goto unlock_out_clear; +act-gfn = gfn; +is_sub_page = 0; +trans_page_off = 0; +trans_length = PAGE_SIZE; +} +else if ( (shah-flags GTF_type_mask) == GTF_transitive ) { if ( !allow_transitive ) PIN_FAIL(unlock_out_clear, GNTST_general_error, @@ -1999,16 +2000,6 @@ __acquire_grant_for_copy( is_sub_page = 1; act-gfn = -1ul; } -else if ( sha1 ) -{ -rc = __get_paged_frame(sha1-frame, grant_frame, page, readonly, rd); -if ( rc != GNTST_okay ) -goto unlock_out_clear; -act-gfn = sha1-frame; -is_sub_page = 0; -trans_page_off = 0; -trans_length = PAGE_SIZE; -} else if ( !(sha2-hdr.flags GTF_sub_page) ) { rc = __get_paged_frame(sha2-full_page.frame, grant_frame, page, readonly, rd); @@ -3154,8 +3145,6 @@ static void gnttab_usage_print(struct do { struct active_grant_entry *act; struct grant_entry_header *sha; -grant_entry_v1_t *sha1; -grant_entry_v2_t *sha2; uint16_t status; uint64_t frame; @@ -3167,16 +3156,12 @@ static void gnttab_usage_print(struct do if ( gt-gt_version == 1 ) { -sha1 = shared_entry_v1(gt, ref); -sha2 = NULL; status = sha-flags; -frame = sha1-frame; +frame = shared_entry_v1(gt, ref).frame; } else { -sha2 = shared_entry_v2(gt, ref); -sha1 = NULL; -frame = sha2-full_page.frame; +frame = shared_entry_v2(gt, ref).full_page.frame; status = status_entry(gt, ref); } gnttab: simplify shared entry v1 vs v2 handling In a number of places both v1 and v2 pointers are being obtained when none or just one suffices. Additionally in __acquire_grant_for_copy() the flow of if/else-if can be slightly improved by re-ordering. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -588,8 +588,6 @@ __gnttab_map_grant_ref(
[Xen-devel] [PATCH 1/6] gnttab: eliminate several explicit version checks
By having nr_grant_entries() return zero when the grant table version is still unset we can reduce the number of error paths and at once fix grant_map_exists() running into the being removed ASSERT() when called for a page owned by a domain not having its grant table set up yet. Signed-off-by: Jan Beulich jbeul...@suse.com --- Said ASSERT() was pointless in the grant_map_exists() case, since the function only looks at active entries, which are version independent. Hence this wasn't a security issue. --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -319,11 +319,15 @@ get_maptrack_handle( /* Number of grant table entries. Caller must hold d's grant table lock. */ static unsigned int nr_grant_entries(struct grant_table *gt) { -ASSERT(gt-gt_version != 0); -if (gt-gt_version == 1) +switch ( gt-gt_version ) +{ +case 1: return (nr_grant_frames(gt) PAGE_SHIFT) / sizeof(grant_entry_v1_t); -else +case 2: return (nr_grant_frames(gt) PAGE_SHIFT) / sizeof(grant_entry_v2_t); +} + +return 0; } static int _set_status_v1(domid_t domid, @@ -631,10 +635,6 @@ __gnttab_map_grant_ref( rgt = rd-grant_table; spin_lock(rgt-lock); -if ( rgt-gt_version == 0 ) -PIN_FAIL(unlock_out, GNTST_general_error, - remote grant table not yet set up\n); - /* Bounds check on the grant ref */ if ( unlikely(op-ref = nr_grant_entries(rgt))) PIN_FAIL(unlock_out, GNTST_bad_gntref, Bad ref (%d).\n, op-ref); @@ -1502,14 +1502,6 @@ gnttab_prepare_for_transfer( spin_lock(rgt-lock); -if ( rgt-gt_version == 0 ) -{ -gdprintk(XENLOG_INFO, - Grant table not ready for transfer to domain(%d).\n, - rd-domain_id); -goto fail; -} - if ( unlikely(ref = nr_grant_entries(rgt)) ) { gdprintk(XENLOG_INFO, @@ -1905,10 +1897,6 @@ __acquire_grant_for_copy( spin_lock(rgt-lock); -if ( rgt-gt_version == 0 ) -PIN_FAIL(unlock_out, GNTST_general_error, - remote grant table not ready\n); - if ( unlikely(gref = nr_grant_entries(rgt)) ) PIN_FAIL(unlock_out, GNTST_bad_gntref, Bad grant reference %ld\n, gref); @@ -2395,20 +2383,16 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARA change the version number, except for the first 8 entries which are allowed to be in use (xenstore/xenconsole keeps them mapped). (You need to change the version number for e.g. kexec.) */ -if ( gt-gt_version != 0 ) +for ( i = GNTTAB_NR_RESERVED_ENTRIES; i nr_grant_entries(gt); i++ ) { -for ( i = GNTTAB_NR_RESERVED_ENTRIES; i nr_grant_entries(gt); i++ ) +act = active_entry(gt, i); +if ( act-pin != 0 ) { -act = active_entry(gt, i); -if ( act-pin != 0 ) -{ -gdprintk(XENLOG_WARNING, - tried to change grant table version from %d to %d, but some grant entries still in use\n, - gt-gt_version, - op.version); -res = -EBUSY; -goto out_unlock; -} +gdprintk(XENLOG_WARNING, + tried to change grant table version from %d to %d, but some grant entries still in use\n, + gt-gt_version, op.version); +res = -EBUSY; +goto out_unlock; } } @@ -2592,9 +2576,6 @@ __gnttab_swap_grant_ref(grant_ref_t ref_ spin_lock(gt-lock); -if ( gt-gt_version == 0 ) -PIN_FAIL(out, GNTST_general_error, grant table not yet set up\n); - /* Bounds check on the grant refs */ if ( unlikely(ref_a = nr_grant_entries(d-grant_table))) PIN_FAIL(out, GNTST_bad_gntref, Bad ref-a (%d).\n, ref_a); @@ -3165,9 +3146,6 @@ static void gnttab_usage_print(struct do spin_lock(gt-lock); -if ( gt-gt_version == 0 ) -goto out; - for ( ref = 0; ref != nr_grant_entries(gt); ref++ ) { struct active_grant_entry *act; @@ -3211,7 +3189,6 @@ static void gnttab_usage_print(struct do sha-domid, frame, status); } - out: spin_unlock(gt-lock); if ( first ) gnttab: eliminate several explicit version checks By having nr_grant_entries() return zero when the grant table version is still unset we can reduce the number of error paths and at once fix grant_map_exists() running into the being removed ASSERT() when called for a page owned by a domain not having its grant table set up yet. Signed-off-by: Jan Beulich jbeul...@suse.com --- Said ASSERT() was pointless in the grant_map_exists() case, since the function only looks at active entries, which are version independent. Hence this wasn't a security issue. --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -319,11 +319,15 @@ get_maptrack_handle( /* Number of grant table entries. Caller must hold