Re: [Xen-devel] Xen 4.6 released
On 10/7/2015 7:16 PM, Wei Liu wrote: Hi all I'm pleased to announce that Xen 4.6 is released. As release manager I would like to thank everyone who involved in the making of 4.6 release (either in the form of patch, bug report or packaging effort). This release wouldn't have happened without all your contributions. You can check out 4.6 release from xen.git with the tag "RELEASE-4.6.0". Tarball can be obtained from: http://bits.xensource.com/oss-xen/release/4.6.0/xen-4.6.0.tar.gz Wei and Ian, Could we merge these following commits into xen-4.6.1? commit 83215fba3a80c30b5191a4a1086dc510cca43069 libxl: introduce libxl__is_igd_vga_passthru commit 488508b49a65dda20cf6eb6d8f549e8d758e610f libxl: introduce gfx_passthru_kind Currently they're already in staging branch. I know we're going to sync qemu-upstream into qemu-xen in xen 4.7 to support IGD passthrough with qemu-xen. But actually only these two commits can make sure xen-4.6.x work out IGD passthrough with qemu-upstream, instead of qemu-xen. I mean we can set something like this, device_model_version="qemu-xen" device_model_override="//qemu-system-i386" Thanks Tiejun Signature for tarball: http://bits.xensource.com/oss-xen/release/4.6.0/xen-4.6.0.tar.gz.sig Note that this is xen-devel@ only announcement. Official release announcement on xen-announce@, press release, blog post and various other docs / web pages will be available next week on October 13. The official download link may also be different from the one provided above. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Qemu-devel] [PULL 0/19] xen-2015-09-08-tag
On 9/22/2015 12:03 AM, Stefano Stabellini wrote: It is going to be in QEMU 2.5 and qemu-xen 4.7. Thanks for your reply. Do we have any possibility of just merging this series into qemu-xen 4.6? We really want to support IGD passthrough on xen 4.6 if possible :) Thanks Tiejun On Mon, 21 Sep 2015, Chen, Tiejun wrote: Stefano, I have two questions, #1. Which qemu version is this igd stuff going into? 2.6? #2. Is this igd stuff going into qemu-xen inside xen? Any plan to go into xen 4.6? Thanks Tiejun On 9/9/2015 1:21 AM, Stefano Stabellini wrote: > The following changes since commit 8611280505119e296757a60711a881341603fa5a: > >target-microblaze: Use setcond for pcmp* (2015-09-08 08:49:33 +0200) > > are available in the git repository at: > >git://xenbits.xen.org/people/sstabellini/qemu-dm.git > tags/xen-2015-09-08-tag > > for you to fetch changes up to ba2250ad148997b1352aba976aac66b55410e7e4: > >xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings. > (2015-09-08 15:21:56 +) > > > Xen branch xen-2015-09-08 > > > Don Slutz (1): >xen-hvm: Add trace to ioreq > > Jan Beulich (1): >xen/HVM: atomically access pointers in bufioreq handling > > Konrad Rzeszutek Wilk (7): >xen-hvm: When using xc_domain_add_to_physmap also include errno when > reporting >xen/pt: Update comments with proper function name. >xen/pt: Make xen_pt_msi_set_enable static >xen/pt: xen_host_pci_config_read returns -errno, not -1 on failure >xen: use errno instead of rc for xc_domain_add_to_physmap >xen/pt/msi: Add the register value when printing logging and error > messages >xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings. > > Michael S. Tsirkin (1): >i440fx: make types configurable at run-time > > Tiejun Chen (9): >pc_init1: pass parameters just with types >piix: create host bridge to passthrough >hw/pci-assign: split pci-assign.c >xen, gfx passthrough: basic graphics passthrough support >xen, gfx passthrough: retrieve VGA BIOS to work >igd gfx passthrough: create a isa bridge >xen, gfx passthrough: register a isa bridge >xen, gfx passthrough: register host bridge specific to passthrough >xen, gfx passthrough: add opregion mapping > > configure | 28 + > hw/core/machine.c | 20 +++ > hw/i386/Makefile.objs |1 + > hw/i386/kvm/pci-assign.c | 82 ++--- > hw/i386/pc_piix.c | 139 - > hw/i386/pci-assign-load-rom.c | 93 ++ > hw/pci-host/piix.c| 91 +- > hw/xen/Makefile.objs |1 + > hw/xen/xen-host-pci-device.c |5 + > hw/xen/xen-host-pci-device.h |1 + > hw/xen/xen_pt.c | 42 ++- > hw/xen/xen_pt.h | 22 +++- > hw/xen/xen_pt_config_init.c | 59 - > hw/xen/xen_pt_graphics.c | 272 > + > hw/xen/xen_pt_msi.c |2 +- > include/hw/boards.h |1 + > include/hw/i386/pc.h |9 +- > include/hw/pci/pci-assign.h | 27 > include/hw/xen/xen_common.h | 34 +- > qemu-options.hx |3 + > trace-events |7 ++ > vl.c | 10 ++ > xen-hvm.c | 55 +++-- > 23 files changed, 891 insertions(+), 113 deletions(-) > create mode 100644 hw/i386/pci-assign-load-rom.c > create mode 100644 hw/xen/xen_pt_graphics.c > create mode 100644 include/hw/pci/pci-assign.h > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Regression in RMRRs identity mapping for PVH Dom0
On 9/23/2015 11:56 PM, Elena Ufimtseva wrote: Hi There is a regression in RMRR patch 5ae03990c120a7b3067a52d9784c9aa72c0705a6 in new set_identity_p2m_entry. RMRRs are not being mapped in IOMMU for PVH Dom0. This causes pages faults and some long 'hang-like' delays during boot and device assignments. During construct_dom0, in PVH path p2m is being constructed and identity mapped in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx. New code used to map RMRRs invoked from rmrr_identity_mapping checks if p2m entry exists with same type and access and if yes, skips iommu mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are not being mapped in IOMMU. This debug patch attached fixes this and Ill be glad to see if there is a more elegant fix. Based on your explanation, sounds pvh always creates this mapping beforehand, so what about this? diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index cf8485e..d026845 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -964,7 +964,7 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn, struct p2m_domain *p2m = p2m_get_hostp2m(d); int ret; -if ( !paging_mode_translate(p2m->domain) ) +if ( !paging_mode_translate(p2m->domain) || is_pvh_domain(d) ) { if ( !need_iommu(d) ) return 0; Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Qemu-devel] [v4][PATCH 0/2] libxl: try to support IGD passthrough for qemu upstream
Ping... Thanks Tiejun On 9/18/2015 4:30 PM, Tiejun Chen wrote: Ian, As we discussed previously, http://patchwork.ozlabs.org/patch/457055/ now it's time to push this into on xen/tools side since all qemu stuffs have been merged. https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg02094.html v4: Ian, Actually we had v3.5 online previously, which was reviewed by you. http://permalink.gmane.org/gmane.comp.emulators.qemu/329100 So here I just bring a little bit to refine code just for patch #2 according to out last conversation. v3: * Refine some codes based on Campbell's feedback so thanks for Campbell's kind guideline to patch #2 * Update the manpages in patch #2 v2: * Refine patch #2's head description * Improve codes quality inside patch #1 based on Wei's comments * Refill the summary inside patch #0 based on Konrad and Wei's suggestion When we're working to support IGD GFX passthrough with qemu upstream, instead of "-gfx_passthru" we'd like to make that a machine option, "-machine xxx,igd-passthru=on". https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02050.html This need to bring a change on tool side. After a discussion with Campbell, we'd like to construct a table to record all IGD devices we can support. If we hit that table, we should pass that option. And so we also introduce a new field of type, 'gfx_passthru_kind', to cooperate with 'gfx_passthru' to cover all scenarios like this, gfx_passthru = 0=> sets build_info.u.gfx_passthru to false gfx_passthru = 1=> sets build_info.u.gfx_passthru to true and build_info.u.gfx_passthru_kind to DEFAULT gfx_passthru = "igd"=> sets build_info.u.gfx_passthru to false and build_info.u.gfx_passthru_kind to IGD And note actually that option "-gfx_passthru" is just introduced to work for qemu-xen-traditional so we should get this away from libxl__build_device_model_args_new() in the case of qemu upstream. Tiejun Chen (2): libxl: introduce libxl__is_igd_vga_passthru libxl: introduce gfx_passthru_kind docs/man/xl.cfg.pod.5| 35 -- tools/libxl/libxl.h | 6 ++ tools/libxl/libxl_dm.c | 46 +++-- tools/libxl/libxl_internal.h | 2 + tools/libxl/libxl_pci.c | 124 +++ tools/libxl/libxl_types.idl | 6 ++ tools/libxl/xl_cmdimpl.c | 14 +++- 7 files changed, 223 insertions(+), 10 deletions(-) Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag
Stefano, I have two questions, #1. Which qemu version is this igd stuff going into? 2.6? #2. Is this igd stuff going into qemu-xen inside xen? Any plan to go into xen 4.6? Thanks Tiejun On 9/9/2015 1:21 AM, Stefano Stabellini wrote: The following changes since commit 8611280505119e296757a60711a881341603fa5a: target-microblaze: Use setcond for pcmp* (2015-09-08 08:49:33 +0200) are available in the git repository at: git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-2015-09-08-tag for you to fetch changes up to ba2250ad148997b1352aba976aac66b55410e7e4: xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings. (2015-09-08 15:21:56 +) Xen branch xen-2015-09-08 Don Slutz (1): xen-hvm: Add trace to ioreq Jan Beulich (1): xen/HVM: atomically access pointers in bufioreq handling Konrad Rzeszutek Wilk (7): xen-hvm: When using xc_domain_add_to_physmap also include errno when reporting xen/pt: Update comments with proper function name. xen/pt: Make xen_pt_msi_set_enable static xen/pt: xen_host_pci_config_read returns -errno, not -1 on failure xen: use errno instead of rc for xc_domain_add_to_physmap xen/pt/msi: Add the register value when printing logging and error messages xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings. Michael S. Tsirkin (1): i440fx: make types configurable at run-time Tiejun Chen (9): pc_init1: pass parameters just with types piix: create host bridge to passthrough hw/pci-assign: split pci-assign.c xen, gfx passthrough: basic graphics passthrough support xen, gfx passthrough: retrieve VGA BIOS to work igd gfx passthrough: create a isa bridge xen, gfx passthrough: register a isa bridge xen, gfx passthrough: register host bridge specific to passthrough xen, gfx passthrough: add opregion mapping configure | 28 + hw/core/machine.c | 20 +++ hw/i386/Makefile.objs |1 + hw/i386/kvm/pci-assign.c | 82 ++--- hw/i386/pc_piix.c | 139 - hw/i386/pci-assign-load-rom.c | 93 ++ hw/pci-host/piix.c| 91 +- hw/xen/Makefile.objs |1 + hw/xen/xen-host-pci-device.c |5 + hw/xen/xen-host-pci-device.h |1 + hw/xen/xen_pt.c | 42 ++- hw/xen/xen_pt.h | 22 +++- hw/xen/xen_pt_config_init.c | 59 - hw/xen/xen_pt_graphics.c | 272 + hw/xen/xen_pt_msi.c |2 +- include/hw/boards.h |1 + include/hw/i386/pc.h |9 +- include/hw/pci/pci-assign.h | 27 include/hw/xen/xen_common.h | 34 +- qemu-options.hx |3 + trace-events |7 ++ vl.c | 10 ++ xen-hvm.c | 55 +++-- 23 files changed, 891 insertions(+), 113 deletions(-) create mode 100644 hw/i386/pci-assign-load-rom.c create mode 100644 hw/xen/xen_pt_graphics.c create mode 100644 include/hw/pci/pci-assign.h ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag
On 9/15/2015 7:00 PM, Paolo Bonzini wrote: On 15/09/2015 11:55, Stefano Stabellini wrote: On Mon, 14 Sep 2015, Paolo Bonzini wrote: > On 10/09/2015 12:29, Stefano Stabellini wrote: > > +if (lseek(config_fd, pos, SEEK_SET) != pos) { > > +return -errno; > > +} > > do { > > -rc = pread(config_fd, (uint8_t *), len, pos); > > +rc = read(config_fd, (uint8_t *), len); > > } while (rc < 0 && (errno == EINTR || errno == EAGAIN)); > > This leaks config_fd. I don't follow, it leaks config_fd where? Where lseek returns -errno (and IIRC in other places in the same function). Do you mean we need this change? diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 1fb71c8..7d44228 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -775,15 +775,18 @@ static int host_pci_config_read(int pos, int len, uint32_t val) } if (lseek(config_fd, pos, SEEK_SET) != pos) { +close(config_fd); return -errno; } do { rc = read(config_fd, (uint8_t *), len); } while (rc < 0 && (errno == EINTR || errno == EAGAIN)); if (rc != len) { +close(config_fd); return -errno; } +close(config_fd); return 0; } Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode
But looks its not better, so any idea? Did you at least make an attempt to find other examples of where we dynamically determine the log level to be used for a message? I would assume that if you did, you'd have come to printk(XENLOG_GUEST "%s" VTDPREFIX I didn't know this tip on Xen side and its really good. " It's %s to assign %04x:%02x:%02x.%u" " with shared RMRR at %"PRIx64" for Dom%d.\n", relaxed ? XENLOG_WARNING : XENLOG_ERROR, relaxed ? "risky" : "disallowed", seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), rmrr->base_address, d->domain_id); pretty naturally. But I noticed my original patch is already merged into staging. So Wei, Do you think if we need a small patch to improved this? Maybe you can squash that if necessary. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode
OK, that explanation is fine to me as long as it's made clear no security guarantee once admin uses 'relax' for any domain. Tiejun could you resend patch with right warning/error type? Sure, but a little bit makes me confused when I'm trying to address this. Actually most messages are same, except for logevel, so I did this like, printk(XENLOG_G_INFO VTDPREFIX " Assign %04x:%02x:%02x.%u" " with shared RMRR at %"PRIx64" for Dom%d.", seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), rmrr->base_address, d->domain_id); if ( relaxed ) printk(XENLOG_G_WARNING VTDPREFIX " It's really risky."); else printk(XENLOG_G_ERR VTDPREFIX " So it's disallowed!"); printk(XENLOG_G_INFO VTDPREFIX "\n"); But looks its not better, so any idea? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
Sort of (the patch has the intended effect, but for its size very many rough edges). I guess we need to amend the original parameter, once_mapping_mfns, like this, /* xen_once_mapping_mfns: memory mapping mfn bumbers once. */ unsigned int xen_once_mapping_mfns; size_param("once_mapping_mfns", xen_once_mapping_mfns); static void xen_once_mapping_mfns_setup(void) { if ( once_mapping_mfns < 64 ) xen_once_mapping_mfns = 64; else if ( once_mapping_mfns > 1024 ) xen_once_mapping_mfns = 1024; else xen_once_mapping_mfns = once_mapping_mfns; } Or what is your expected rule? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag
As you see this short log, "hw/pci-assign: split pci-assign.c", so this means I just extract something from the original hw/i386/kvm/pci-assign.c, and here so I just keep those original head files residing hw/i386/kvm/pci-assign.c, and I didn't introduce anything new. hw/i386/kvm/pci-assign.c is only built if configure set CONFIG_KVM, which it won't do on Windows or OSX builds. It sounds like your patch has incorrectly moved code out of files which are compiled only if KVM is present, or only if we're doing Xen PCI passthrough, and into compiled-for-everything files. Yes, we want to share this chunk of codes between Xen and Kvm. Just to this error, could we remove #include ? As I mentioned I still can compile this file without this head file. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag
xen-host-pci-device.c is only compiled if CONFIG_XEN_PCI_PASSTHROUGH was set by configure. That won't be the case on OSX or Windows, where the Xen headers don't exist. Okay. This actually shouldn't be enabled on Windows so what about this? diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index 58a33fb..9a1fcb9 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -739,6 +739,7 @@ static const TypeInfo i440fx_info = { .class_init= i440fx_class_init, }; +#ifndef _WIN32 /* IGD Passthrough Host Bridge. */ typedef struct { uint8_t offset; @@ -819,6 +820,7 @@ static const TypeInfo igd_passthrough_i440fx_info = { .instance_size = sizeof(PCII440FXState), .class_init= igd_passthrough_i440fx_class_init, }; +#endif static const char *i440fx_pcihost_root_bus_path(PCIHostState *host_bridge, PCIBus *rootbus) @@ -861,7 +863,9 @@ static const TypeInfo i440fx_pcihost_info = { static void i440fx_register_types(void) { type_register_static(_info); +#ifndef _WIN32 type_register_static(_passthrough_i440fx_info); +#endif type_register_static(_pci_type_info); type_register_static(_info); type_register_static(_xen_info); Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
Right, that's one of the things that would need taking care of. (Whether enforcing an upper limit is actually needed I'm not sure - we generally allow the admin to shoot himself in the foot if he wants to. And whether the lower limit should be 64 instead of just ensuring the limit is not zero is another question.) 64 was semi-arbitrary - it ended up giving good latency on highly scalar machines (8 socket). Higher numbers ended up affecting the latency. But higher numbers on small socket machines were OK. (As they do not have 8 IOMMU VT-d chipsets all potentially flodding the QPI with serialized cache flushes). So we should make this range [8, ] here, but 64 by default. Right? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode
> > Need to have separate warning/error level for relax/strict. > > > > However I don't think this patch is a right fix. So far relax/strict policy > > is per-domain. what about one VM specifies relax while another VM > > specifies strict when each is assigned with a device sharing rmrr > > with the other? In that case it becomes a system-wide security hole. > > The one specifying "strict" won't gets its device assigned (due to > the code above, taking the path that was there already without > the patch), so I don't see the security issue. > Agreed. A VM can't get such device assigned in the first place, so the hypothetical scenario doesn't exist. Sorry it's a bad example. My actual concern is that we can't count on this per-VM relax/strict policy to prevent group devices assigned to different VM. In that case it's definitely a security hole since one VM may clobber shared RMRR to impact another VM. So right example for that scenario is both VMs specified with 'relax'. What if one of group devices is still owned by Dom0? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag
Thanks! I'll fold it the offending patch (http://marc.info/?l=qemu-devel=144174596628052=2) and resend. Reviewed-by: Michael S. TsirkinMichale and Stefano, Thanks a lot :) Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode
>> > > However I don't think this patch is a right fix. So far relax/strict policy >> > > is per-domain. what about one VM specifies relax while another VM >> > > specifies strict when each is assigned with a device sharing rmrr >> > > with the other? In that case it becomes a system-wide security hole. >> > >> > The one specifying "strict" won't gets its device assigned (due to >> > the code above, taking the path that was there already without >> > the patch), so I don't see the security issue. >> > >> >> Agreed. A VM can't get such device assigned in the first place, so the >> hypothetical scenario doesn't exist. >> > > Sorry it's a bad example. My actual concern is that we can't count > on this per-VM relax/strict policy to prevent group devices assigned > to different VM. In that case it's definitely a security hole since > one VM may clobber shared RMRR to impact another VM. So right > example for that scenario is both VMs specified with 'relax'. What if one of group devices is still owned by Dom0? It's also risky since other VM may attack Dom0 in such scenario. In my opinion, Dom0 should have a big impact... Anyway, this always means we have to start refactoring some codes. For example, we are probably going to introduce some new fields in struct acpi_rmrr_unit, just like, int domain_id -> Distinguish which domain owns this unit unsigned int flag -> Record state: XEN_DOMCTL_DEV_RDM_RELAXE or !XEN_DOMCTL_DEV_RDM_RELAXE This should involve several sections, such as parsing rmrr, setup hwdomain and assign/remove device. But I'm not sure if this is good to handle current problem. Actually I prefer to work on current patch just now, and then we can start discussing our final solution :) Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Question to Xen log level in the case of PT
So can Xen change log level dynamically like Linux? If yes, we might change this level temporarily while passing through IGD. If not, any suggestion? First of all you could boot without lowering the log level (non-debug builds) or raising the log level ("loglvl=warning"; debug builds). But Sorry I don't know how to build "non-debug" here. Could you give me this detail? Or where I can get this info? that would change the log level for the entire session, which may not be what you're after. I too realized that having a way to dynamically adjust the log level would be useful occasionally. For post-4.6 I have a patch (attached) ready allowing to do so in a limited way from the serial console (and hence also via "xl debug-key"). As you'll see in there I also took note of it probably being desirable to have a sysctl (and then a wrapping xl command) to full control the log level. I didn't get around to implement that yet. Good to know this. Otoh the specific messages you cite are of quite questionable use in the first place. I certainly would welcome a patch lowering their priority to XENLOG_G_DEBUG (which however would still not Done. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode
On 9/9/2015 2:54 PM, Jan Beulich wrote: On 09.09.15 at 03:59,wrote: @@ -2310,12 +2312,16 @@ static int intel_iommu_assign_device( PCI_DEVFN2(bdf) == devfn && rmrr->scope.devices_cnt > 1 ) { -printk(XENLOG_G_ERR VTDPREFIX - " cannot assign %04x:%02x:%02x.%u" +bool_t relaxed = !!(flag & XEN_DOMCTL_DEV_RDM_RELAXED); + +printk(XENLOG_G_WARNING VTDPREFIX Well, I can live with this always being a warning, but it's not what I had asked for. The VT-d maintainers will have to judge. Kevin, Any comments? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag
On 9/10/2015 12:10 AM, Stefano Stabellini wrote: On Wed, 9 Sep 2015, Stefano Stabellini wrote: On Tue, 8 Sep 2015, Peter Maydell wrote: > On 8 September 2015 at 18:21, Stefano Stabellini >wrote: > > The following changes since commit 8611280505119e296757a60711a881341603fa5a: > > > > target-microblaze: Use setcond for pcmp* (2015-09-08 08:49:33 +0200) > > > > are available in the git repository at: > > > > git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-2015-09-08-tag > > > > for you to fetch changes up to ba2250ad148997b1352aba976aac66b55410e7e4: > > > > xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings. (2015-09-08 15:21:56 +) > > > > > > Xen branch xen-2015-09-08 > > > > > > Hi. I'm afraid this fails to build on OSX (and probably Windows too, > though that build hasn't run yet): > > CCi386-softmmu/hw/i386/pci-assign-load-rom.o > /Users/pm215/src/qemu/hw/i386/pci-assign-load-rom.c:6:10: fatal error: > 'sys/io.h' file not found > #include > ^ > CCalpha-softmmu/hw/alpha/pci.o > 1 error generated. Tiejun, this is caused by 33d33242b7d802e6c994f3d56ecba96a66465dc3, "hw/pci-assign: split pci-assign.c". Could you please double-check non-Linux builds? I found another issue introduced by the gfx passthrough series on Windows: ../hw/pci-host/piix.o: In function `host_pci_config_read': /root/qemu/hw/pci-host/piix.c:778: undefined reference to `_pread' It is introduced by: commit fdb70721ba0496a767137e5505dd27627d19c4a8 Author: Tiejun Chen Date: Wed Jul 15 13:37:43 2015 +0800 piix: create host bridge to passthrough You might have to replace the pread call with lseek and read. This is also surprising to me. Just see xen_host_pci_config_() inside /hw/xen/xen-host-pci-device.c, there are so many this pread usage(). So I really don't understand what's difference between these two files. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag
On 9/9/2015 9:06 PM, Stefano Stabellini wrote: On Tue, 8 Sep 2015, Peter Maydell wrote: On 8 September 2015 at 18:21, Stefano Stabelliniwrote: > The following changes since commit 8611280505119e296757a60711a881341603fa5a: > > target-microblaze: Use setcond for pcmp* (2015-09-08 08:49:33 +0200) > > are available in the git repository at: > > git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-2015-09-08-tag > > for you to fetch changes up to ba2250ad148997b1352aba976aac66b55410e7e4: > > xen/pt: Use XEN_PT_LOG properly to guard against compiler warnings. (2015-09-08 15:21:56 +) > > > Xen branch xen-2015-09-08 > > Hi. I'm afraid this fails to build on OSX (and probably Windows too, though that build hasn't run yet): CCi386-softmmu/hw/i386/pci-assign-load-rom.o /Users/pm215/src/qemu/hw/i386/pci-assign-load-rom.c:6:10: fatal error: 'sys/io.h' file not found #include ^ CCalpha-softmmu/hw/alpha/pci.o 1 error generated. Tiejun, this is caused by 33d33242b7d802e6c994f3d56ecba96a66465dc3, "hw/pci-assign: split pci-assign.c". Could you please double-check non-Linux builds? Its interesting. As you see this short log, "hw/pci-assign: split pci-assign.c", so this means I just extract something from the original hw/i386/kvm/pci-assign.c, and here so I just keep those original head files residing hw/i386/kvm/pci-assign.c, and I didn't introduce anything new. So its very probably that you still can't compile successfully even without my commit on OSX/Windows, right? I think Peter may be right, "Will passthrough even work on Windows and OSX hosts? Consider whether we should be building this code on those hosts at all..." I prefer this isn't what we did previously. I suspect that the fix would be quite small, but I don't have an OSX or a Windows build environment to try it. I haven't a this build environment as well. But I think right now you can remove "#include " to fix this simply since looks this is redundant actually. hw/i386/pci-assign: remove one head file This is redundant actually but really break OS/Windows build. Signed-off-by: Tiejun Chen diff --git a/hw/i386/pci-assign-load-rom.c b/hw/i386/pci-assign-load-rom.c index bad53b7..1f0d4ef 100644 --- a/hw/i386/pci-assign-load-rom.c +++ b/hw/i386/pci-assign-load-rom.c @@ -3,7 +3,6 @@ */ #include #include -#include #include #include #include At least I can build this under Linux, ./configure --target-list=x86_64-softmmu && make Thanks Tiejun Speak about build environments, Peter, would you care to share your scripts and setup so that I can run similar tests in the future on my own? I have no OSX machines so I tried to do a Windows cross-compile, following http://wiki.qemu.org/Hosts/W32 on Debian 7, but I failed very early with an "ERROR: zlib check failed". ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping
If the 64 limit was arbitrary then I would suggest increasing it to at least 1024 so that at least 4M of BAR can be mapped in one go and it reduces the overhead by a factor of 16. 1024 may be a little much, but 256 is certainly a possibility, plus Konrad's suggestion to allow this limit to be controlled via command line option. Are you guys talking this way? diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c index 3946e4c..a9671bb 100644 --- a/xen/arch/x86/setup.c +++ b/xen/arch/x86/setup.c @@ -88,6 +88,10 @@ boolean_param("noapic", skip_ioapic_setup); s8 __read_mostly xen_cpuidle = -1; boolean_param("cpuidle", xen_cpuidle); +/* once_mapping_mfns: memory mapping mfn bumbers once. */ +unsigned int xen_once_mapping_mfns; +integer_param("once_mapping_mfns", xen_once_mapping_mfns); + #ifndef NDEBUG unsigned long __initdata highmem_start; size_param("highmem-start", highmem_start); diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 3bf39f1..82c85e3 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -33,6 +33,8 @@ #include #include +extern unsigned int xen_once_mapping_mfns; + static DEFINE_SPINLOCK(domctl_lock); DEFINE_SPINLOCK(vcpu_alloc_lock); @@ -1035,7 +1037,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) ret = -E2BIG; /* Must break hypercall up as this could take a while. */ -if ( nr_mfns > 64 ) +if ( nr_mfns > xen_once_mapping_mfns ) break; ret = -EPERM; Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode
Need to have separate warning/error level for relax/strict. However I don't think this patch is a right fix. So far relax/strict policy is per-domain. what about one VM specifies relax while another VM specifies strict when each is assigned with a device sharing rmrr with the other? In that case it becomes a system-wide security hole. Once we add code to track group relationship cross domains, it'd be close to the final fix to support group assignment which originally target 4.7. It might be risky to add that in 4.6. Yes. So my suggestion is to live with current limitation. But recently someone was encountering this problem. http://www.gossamer-threads.com/lists/xen/devel/391684?page=last We'd better figure out a simple way to this regression. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Question to Xen log level in the case of PT
All guys, Sorry to raise a question to you since I'm not very sure how to deal with this. When I passthrough a device like IGD, I can see so many messages: "memory_map:add:" and "memory_map:remove:" since we have to add/remove all pages map residing PCI bar. Especially as a graphic device, oftentimes this range would occupy dozens of MB, even hundreds of MB. These print messages consume a lot of time to boot a VM. For instance, it takes about 5 minutes to boot a Windows guest on my BDW. But if I remove these output simply like this, diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 7f959f3..82da9d1 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -1049,10 +1049,6 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) if ( add ) { -printk(XENLOG_G_INFO - "memory_map:add: dom%d gfn=%lx mfn=%lx nr=%lx\n", - d->domain_id, gfn, mfn, nr_mfns); - ret = map_mmio_regions(d, gfn, nr_mfns, mfn); if ( ret ) printk(XENLOG_G_WARNING @@ -1061,10 +1057,6 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) } else { -printk(XENLOG_G_INFO - "memory_map:remove: dom%d gfn=%lx mfn=%lx nr=%lx\n", - d->domain_id, gfn, mfn, nr_mfns); - ret = unmap_mmio_regions(d, gfn, nr_mfns, mfn); if ( ret && is_hardware_domain(current->domain) ) printk(XENLOG_ERR its down to a half, about 2.5 minutes. I know I can't delete this directly. But currently there are four log level on Xen side, * XENLOG_ERR: Fatal errors, either Xen, Guest or Dom0 * is about to crash. * * XENLOG_WARNING: Something bad happened, but we can recover. * * XENLOG_INFO: Interesting stuff, but not too noisy. * * XENLOG_DEBUG: Use where ever you like. Lots of noise. looks I have to change XENLOG_G_INFO to XENLOG_G_WARNING but its not appropriate here. So can Xen change log level dynamically like Linux? If yes, we might change this level temporarily while passing through IGD. If not, any suggestion? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
Sorry for this delay response because I was on vacation. On 9/5/2015 5:52 AM, Tamas K Lengyel wrote: On Fri, Sep 4, 2015 at 2:17 AM, Jan Beulichwrote: >>> On 03.09.15 at 21:39, wrote: > So this change in 4.6 prevents me from passing through devices that have > worked previously with VT-d: > > (XEN) [VT-D] cannot assign :00:1a.0 with shared RMRR at ae8a9000 for > Dom30. > (XEN) [VT-D] cannot assign :00:1d.0 with shared RMRR at ae8a9000 for > Dom31. > > The devices are the USB 2.0 devices on a DQ67SW motherboard: > > 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset > Family USB Enhanced Host Controller #2 (rev 04) > 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset > Family USB Enhanced Host Controller #1 (rev 04) Please don't top post. And I'm also puzzled by you sending this to Ian rather than the author. Hm, I've just hit reply-all to the latest message I've found in the thread. Technically - Tiejun, should this perhaps be permitted in relaxed mode, at least until group assignment gets implemented? (Or I agree. What about this? diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 836aed5..038776a 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -2310,12 +2310,16 @@ static int intel_iommu_assign_device( PCI_DEVFN2(bdf) == devfn && rmrr->scope.devices_cnt > 1 ) { +u32 relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED; + printk(XENLOG_G_ERR VTDPREFIX - " cannot assign %04x:%02x:%02x.%u" + " Currently its %s to assign %04x:%02x:%02x.%u" " with shared RMRR at %"PRIx64" for Dom%d.\n", + relaxed ? "disallowed" : "risky", seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), rmrr->base_address, d->domain_id); -return -EPERM; +if ( !relaxed ) +return -EPERM; } } Tamas, do you actually mean to assign these to _different_ guests, considering the log fragment above?) No, I actually want to assign them to the same domain. The domain creation fails with either of those devices specified for passthrough whether they are to be attached to the same domain or not. Tamas, could you try this in your case? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr
On 9/6/2015 11:19 AM, Tamas K Lengyel wrote: diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 836aed5..038776a 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -2310,12 +2310,16 @@ static int intel_iommu_assign_device( PCI_DEVFN2(bdf) == devfn && rmrr->scope.devices_cnt > 1 ) { +u32 relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED; + printk(XENLOG_G_ERR VTDPREFIX - " cannot assign %04x:%02x:%02x.%u" + " Currently its %s to assign %04x:%02x:%02x.%u" " with shared RMRR at %"PRIx64" for Dom%d.\n", + relaxed ? "disallowed" : "risky", This debug message is backwards? Yeah. Its indeed like this, relaxed ? "risky" : "disallowed" But lets wait Jan's comment to step next. seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), rmrr->base_address, d->domain_id); -return -EPERM; +if ( !relaxed ) +return -EPERM; } } Tamas, do you actually mean to assign these to _different_ guests, considering the log fragment above?) No, I actually want to assign them to the same domain. The domain creation fails with either of those devices specified for passthrough whether they are to be attached to the same domain or not. Tamas, could you try this in your case? Took me a while to find the xl config option to set this flag (pci = [ 'sbdf, rdm_policy=strict/relaxed' ]) but now it works as expected! I remember 'relaxed' is a default value so 'rdm_policy' can't be dropped here if you like this. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6
On 8/27/2015 7:03 PM, Konrad Rzeszutek Wilk wrote: On Thu, Aug 27, 2015 at 11:06:30AM +0800, Chen, Tiejun wrote: On 8/25/2015 10:43 PM, Konrad Rzeszutek Wilk wrote: On Tue, Aug 25, 2015 at 02:55:31PM +0800, Chen, Tiejun wrote: On 8/25/2015 8:19 AM, Tamas K Lengyel wrote: Hi everyone, I saw some people passingly mention this on the list before but just in case it has been missed, my serial is also being spammed with the following printouts with both Xen 4.6 RC1 and the latest staging build: ... (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 2610742000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid ... What's your platform? BDW? And how much memory is set to your guest OS? Is see this as well. But oddly enough - only when I use the AMT feature (normally I just use serial console on the machine). The platform is /DQ67SW, BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012 There is no guest OS - this is initial domain. And I boot with 2GB: Released 0 page(s) Xen: [mem 0x-0x00099fff] usable Xen: [mem 0x0009a800-0x000f] reserved Xen: [mem 0x0010-0x1fff] usable Xen: [mem 0x2000-0x201f] reserved Xen: [mem 0x2020-0x3fff] usable Xen: [mem 0x4000-0x401f] reserved Xen: [mem 0x4020-0x80465fff] usable Xen: [mem 0x80466000-0x9e855fff] unusable Xen: [mem 0x9e856000-0x9e85efff] ACPI data Xen: [mem 0x9e85f000-0x9e8a9fff] ACPI NVS Xen: [mem 0x9e8aa000-0x9e8b1fff] unusable Xen: [mem 0x9e8b2000-0x9e9a4fff] reserved Xen: [mem 0x9e9a5000-0x9e9a6fff] unusable Xen: [mem 0x9e9a7000-0x9ebc5fff] reserved Xen: [mem 0x9ebc6000-0x9ebc6fff] unusable Xen: [mem 0x9ebc7000-0x9ebd6fff] reserved Xen: [mem 0x9ebd7000-0x9ebf4fff] ACPI NVS Xen: [mem 0x9ebf5000-0x9ec18fff] reserved Xen: [mem 0x9ec19000-0x9ec5bfff] ACPI NVS Xen: [mem 0x9ec5c000-0x9ee7bfff] reserved Xen: [mem 0x9ee7c000-0x9eff] unusable Xen: [mem 0x9f80-0xbf9f] reserved Xen: [mem 0xfec0-0xfec00fff] reserved Xen: [mem 0xfed1c000-0xfed3] reserved Xen: [mem 0xfed9-0xfed91fff] reserved Xen: [mem 0xfee0-0xfeef] reserved Xen: [mem 0xff00-0x] reserved Xen: [mem 0x0001-0x00043e5f] unusable Just at first glance to fault address, this seems be issued from some As you see those fault addresses are out of the normal memory range here. known erratas on BDS and SKL. I am runnig v4.2-rc8. So I really doubt this is related to some erratas. Currently the pre-fetch unit of IOMMU unit dedicated to IGD can't work well on some platforms, so you can see these wired faults. Do you have some ideas for a solution/patch? I can't reproduce this on my BDW. And this is my assumption as well. So could anyone provide a log with iommu=debug? Thanks Tiejun Thanks! ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for 4.6] VT-d: Create IOMMU mappings for RMRR regions if shared EPT is not being used
On 8/27/2015 4:40 PM, Malcolm Crossley wrote: On 27/08/15 03:59, Chen, Tiejun wrote: This kind of issue is already gone. https://www.mail-archive.com/xen-devel@lists.xen.org/msg32464.html There is a bug in the code you refer to above which results in no IOMMU page table mappings being created if the guest domain is not sharing it's EPT page tables with the IOMMU. set_identity_p2m_entry only configures the EPT page tables and does not configure the IOMMU page tables. Okay, I got what you mean. Instead, could you insert iommu_{map,unmap_page() into {set,clear}_identity_p2m_entry()? I think this can make {set,clear}_identity_p2m_entry approachable in all circumstances. Kevin and Jan, Is this fine? Thanks Tiejun We had a real world regression (with xen 4.6-rc1) on a Intel Haswell system with integrated graphics. The patch below resolves the regression. Malcolm Thanks Tiejun On 8/26/2015 11:49 PM, Malcolm Crossley wrote: Add RMRR 1:1 IOMMU mappings to IOMMU page tables if EPT page table are not being shared with the IOMMU. This is a regression in behaviour versus Xen 4.5. Signed-off-by: Malcolm Crossley malcolm.cross...@citrix.com --- xen/drivers/passthrough/vtd/iommu.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 836aed5..89de741 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1839,8 +1839,16 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map, while ( base_pfn end_pfn ) { -if ( (d, base_pfn) ) -ret = -ENXIO; +if ( iommu_use_hap_pt(d) ) +{ +if ( clear_identity_p2m_entry(d, base_pfn) ) +ret = -ENXIO; +} +else +{ +if ( intel_iommu_unmap_page(d, base_pfn) ) +ret = -ENXIO; +} base_pfn++; } @@ -1855,7 +1863,16 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map, while ( base_pfn end_pfn ) { -int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag); +int err; +if ( iommu_use_hap_pt(d) ) +{ +err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag); +} +else +{ +err = intel_iommu_map_page(d, base_pfn, base_pfn, + IOMMUF_readable|IOMMUF_writable); +} if ( err ) return err; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6
On 8/27/2015 12:19 AM, Malcolm Crossley wrote: On 25/08/15 15:43, Konrad Rzeszutek Wilk wrote: On Tue, Aug 25, 2015 at 02:55:31PM +0800, Chen, Tiejun wrote: On 8/25/2015 8:19 AM, Tamas K Lengyel wrote: Hi everyone, I saw some people passingly mention this on the list before but just in case it has been missed, my serial is also being spammed with the following printouts with both Xen 4.6 RC1 and the latest staging build: ... (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 2610742000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid ... I think this problem is caused by missing IOMMU mappings for the RMRR regions if the domain does not have shared EPT enabled. This includes PV Dom 0. I don't think this is relevant to current problem. I have posted a patch to fix the issue: http://lists.xen.org/archives/html/xen-devel/2015-08/msg02090.html And we already fixed this on 4.6. Thanks Tiejun What's your platform? BDW? And how much memory is set to your guest OS? Is see this as well. But oddly enough - only when I use the AMT feature (normally I just use serial console on the machine). The platform is /DQ67SW, BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012 There is no guest OS - this is initial domain. And I boot with 2GB: Released 0 page(s) Xen: [mem 0x-0x00099fff] usable Xen: [mem 0x0009a800-0x000f] reserved Xen: [mem 0x0010-0x1fff] usable Xen: [mem 0x2000-0x201f] reserved Xen: [mem 0x2020-0x3fff] usable Xen: [mem 0x4000-0x401f] reserved Xen: [mem 0x4020-0x80465fff] usable Xen: [mem 0x80466000-0x9e855fff] unusable Xen: [mem 0x9e856000-0x9e85efff] ACPI data Xen: [mem 0x9e85f000-0x9e8a9fff] ACPI NVS Xen: [mem 0x9e8aa000-0x9e8b1fff] unusable Xen: [mem 0x9e8b2000-0x9e9a4fff] reserved Xen: [mem 0x9e9a5000-0x9e9a6fff] unusable Xen: [mem 0x9e9a7000-0x9ebc5fff] reserved Xen: [mem 0x9ebc6000-0x9ebc6fff] unusable Xen: [mem 0x9ebc7000-0x9ebd6fff] reserved Xen: [mem 0x9ebd7000-0x9ebf4fff] ACPI NVS Xen: [mem 0x9ebf5000-0x9ec18fff] reserved Xen: [mem 0x9ec19000-0x9ec5bfff] ACPI NVS Xen: [mem 0x9ec5c000-0x9ee7bfff] reserved Xen: [mem 0x9ee7c000-0x9eff] unusable Xen: [mem 0x9f80-0xbf9f] reserved Xen: [mem 0xfec0-0xfec00fff] reserved Xen: [mem 0xfed1c000-0xfed3] reserved Xen: [mem 0xfed9-0xfed91fff] reserved Xen: [mem 0xfee0-0xfeef] reserved Xen: [mem 0xff00-0x] reserved Xen: [mem 0x0001-0x00043e5f] unusable Just at first glance to fault address, this seems be issued from some known erratas on BDS and SKL. I am runnig v4.2-rc8. Thanks Tiejun The device in question is an integrated Intel graphics card: 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) Same device.. This is a sandy bridge device which doesn't support shared EPT for GPU device. The only way I found to stop the messages from making my serial connection useless was by assigning the device to xen-pciback. This will cause the GPU to be reset and the reset stopped the GPU accessing the RMRR region. Cheers, Tamas ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for 4.6] VT-d: Create IOMMU mappings for RMRR regions if shared EPT is not being used
This kind of issue is already gone. https://www.mail-archive.com/xen-devel@lists.xen.org/msg32464.html Thanks Tiejun On 8/26/2015 11:49 PM, Malcolm Crossley wrote: Add RMRR 1:1 IOMMU mappings to IOMMU page tables if EPT page table are not being shared with the IOMMU. This is a regression in behaviour versus Xen 4.5. Signed-off-by: Malcolm Crossley malcolm.cross...@citrix.com --- xen/drivers/passthrough/vtd/iommu.c | 23 --- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index 836aed5..89de741 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1839,8 +1839,16 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map, while ( base_pfn end_pfn ) { -if ( clear_identity_p2m_entry(d, base_pfn) ) -ret = -ENXIO; +if ( iommu_use_hap_pt(d) ) +{ +if ( clear_identity_p2m_entry(d, base_pfn) ) +ret = -ENXIO; +} +else +{ +if ( intel_iommu_unmap_page(d, base_pfn) ) +ret = -ENXIO; +} base_pfn++; } @@ -1855,7 +1863,16 @@ static int rmrr_identity_mapping(struct domain *d, bool_t map, while ( base_pfn end_pfn ) { -int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag); +int err; +if ( iommu_use_hap_pt(d) ) +{ +err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag); +} +else +{ +err = intel_iommu_map_page(d, base_pfn, base_pfn, + IOMMUF_readable|IOMMUF_writable); +} if ( err ) return err; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6
On 8/25/2015 10:43 PM, Konrad Rzeszutek Wilk wrote: On Tue, Aug 25, 2015 at 02:55:31PM +0800, Chen, Tiejun wrote: On 8/25/2015 8:19 AM, Tamas K Lengyel wrote: Hi everyone, I saw some people passingly mention this on the list before but just in case it has been missed, my serial is also being spammed with the following printouts with both Xen 4.6 RC1 and the latest staging build: ... (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 2610742000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid ... What's your platform? BDW? And how much memory is set to your guest OS? Is see this as well. But oddly enough - only when I use the AMT feature (normally I just use serial console on the machine). The platform is /DQ67SW, BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012 There is no guest OS - this is initial domain. And I boot with 2GB: Released 0 page(s) Xen: [mem 0x-0x00099fff] usable Xen: [mem 0x0009a800-0x000f] reserved Xen: [mem 0x0010-0x1fff] usable Xen: [mem 0x2000-0x201f] reserved Xen: [mem 0x2020-0x3fff] usable Xen: [mem 0x4000-0x401f] reserved Xen: [mem 0x4020-0x80465fff] usable Xen: [mem 0x80466000-0x9e855fff] unusable Xen: [mem 0x9e856000-0x9e85efff] ACPI data Xen: [mem 0x9e85f000-0x9e8a9fff] ACPI NVS Xen: [mem 0x9e8aa000-0x9e8b1fff] unusable Xen: [mem 0x9e8b2000-0x9e9a4fff] reserved Xen: [mem 0x9e9a5000-0x9e9a6fff] unusable Xen: [mem 0x9e9a7000-0x9ebc5fff] reserved Xen: [mem 0x9ebc6000-0x9ebc6fff] unusable Xen: [mem 0x9ebc7000-0x9ebd6fff] reserved Xen: [mem 0x9ebd7000-0x9ebf4fff] ACPI NVS Xen: [mem 0x9ebf5000-0x9ec18fff] reserved Xen: [mem 0x9ec19000-0x9ec5bfff] ACPI NVS Xen: [mem 0x9ec5c000-0x9ee7bfff] reserved Xen: [mem 0x9ee7c000-0x9eff] unusable Xen: [mem 0x9f80-0xbf9f] reserved Xen: [mem 0xfec0-0xfec00fff] reserved Xen: [mem 0xfed1c000-0xfed3] reserved Xen: [mem 0xfed9-0xfed91fff] reserved Xen: [mem 0xfee0-0xfeef] reserved Xen: [mem 0xff00-0x] reserved Xen: [mem 0x0001-0x00043e5f] unusable Just at first glance to fault address, this seems be issued from some As you see those fault addresses are out of the normal memory range here. known erratas on BDS and SKL. I am runnig v4.2-rc8. So I really doubt this is related to some erratas. Currently the pre-fetch unit of IOMMU unit dedicated to IGD can't work well on some platforms, so you can see these wired faults. Thanks Tiejun Thanks Tiejun The device in question is an integrated Intel graphics card: 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) Same device.. The only way I found to stop the messages from making my serial connection useless was by assigning the device to xen-pciback. Cheers, Tamas ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] VT-d faults with Integrated Intel graphics on 4.6
On 8/25/2015 8:19 AM, Tamas K Lengyel wrote: Hi everyone, I saw some people passingly mention this on the list before but just in case it has been missed, my serial is also being spammed with the following printouts with both Xen 4.6 RC1 and the latest staging build: ... (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 33487d7000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set (XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 2610742000, iommu reg = 82c000201000 (XEN) [VT-D]DMAR: reason 07 - Next page table ptr is invalid ... What's your platform? BDW? And how much memory is set to your guest OS? Just at first glance to fault address, this seems be issued from some known erratas on BDS and SKL. Thanks Tiejun The device in question is an integrated Intel graphics card: 00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) The only way I found to stop the messages from making my serial connection useless was by assigning the device to xen-pciback. Cheers, Tamas ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.6] libxl: move calling libxl__arch_domain_construct_memmap to right place
On 8/5/2015 7:25 PM, Wei Liu wrote: On Wed, Aug 05, 2015 at 12:06:22PM +0100, Ian Campbell wrote: On Wed, 2015-08-05 at 11:58 +0100, Wei Liu wrote: On Wed, Aug 05, 2015 at 11:48:55AM +0100, Ian Campbell wrote: On Wed, 2015-08-05 at 11:43 +0100, Wei Liu wrote: On Wed, Aug 05, 2015 at 11:38:54AM +0100, Ian Campbell wrote: On Wed, 2015-08-05 at 11:23 +0100, Wei Liu wrote: This function was called in the wrong place, because both libxl__vnuma_build_vmemrange_hvm and xc_hvm_build rely on its output. What is the effect of this call being in the wrong place? Presumably one or the other of those functions reaches the wrong conclusion? Originally, by the time that function got called, all guest pages were already populated. The end result is E820 map disagrees with what vNUMA says and what address ranges memory actually resides, i.e. risk of guest accessing region that doesn't have backing pages. Ouch. This should certainly be explained in the commit message. With that: Acked-by: Ian Campbell ian.campb...@citrix.com I will post v2 shortly with that information in commit message. Although perhaps we should wait for confirmation this fix doesn't regress RMRR somehow? I doubt it will. The code as-is is already broken for RMRR. But let's wait till tomorrow for Tiejun to reply. If he doesn't reply by tomorrow, I suggest we apply v2 first and fix up any subsequent issues later. Agreed. Actually I want to retract this patch. I confused hvm path with pv path and drew my conclusion when looking at both code paths. In hvm path, neither libxl__vnuma_build_vmemragen_hvm nor xc_hvm_build depends on output of libxl__arch_domain_construct_memmap (in fact it doesn't change anything). So the code is OK. In pv path, there is a path which relies on having a valid E820 map first, but that path 1) relies on host E820 map; 2) doesn't involve RMRR support. In the end, moving that function call has no effect whatsoever. Sorry I don't go into this details but seems I have nothing to do about this, ultimately. Right? Thanks Tiejun Sorry for the noise! Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Regression in OVMF + RMRR series
Jul 25 17:48:51.685107 (d1) BIOS map: Jul 25 17:48:51.685145 (d1) ffe0-: Main BIOS Jul 25 17:48:51.693030 (d1) *** HVMLoader bug at e820.c:262 Jul 25 17:48:51.693064 (d1) *** HVMLoader crashed. Git blame shows that the change that crashes hvmloader was part of the RMRR series. Tiejun, could you please fix this please? It should be easy to reproduce. This is really my responsibility and I just sent out one patch to fix this, tools/hvmloader: sync memory map[] Please take a review. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.6 v2 7/8] python/xc: reinstate original implementation of next_bdf
On 2015/7/28 1:45, Wei Liu wrote: I missed the fact that next_bdf is used to parsed user supplied strings when reviewing. The user supplied string is a NULL-terminated string separated by comma. User can supply several PCI devices in that string. There is, however, no delimiter for different devices, hence we can't change the syntax of that string. This patch reinstate the original implementation of next_bdf to preserve the original syntax. The last argument for xc_assign_device is always 0. Signed-off-by: Wei Liu wei.l...@citrix.com --- Cc: Tiejun Chen tiejun.c...@intel.com Tiejun, are you actually using this python binding? I don't think we This change is just following to RMRR series but currently we don't use this. So its fine if you think this don't break any other usages. Thanks Tiejun have in tree user. If nobody is using it, I propose we remove this binding in next release. I don't have live example of that string. My analysis is based on reverse-engineering of original code. --- tools/python/xen/lowlevel/xc/xc.c | 30 ++ 1 file changed, 10 insertions(+), 20 deletions(-) diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c index c8380d1..2c36eb2 100644 --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -592,8 +592,7 @@ static int token_value(char *token) return strtol(token, NULL, 16); } -static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func, -int *flag) +static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func) { char *token; @@ -608,17 +607,8 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func, *dev = token_value(token); token = strchr(token, ',') + 1; *func = token_value(token); -token = strchr(token, ',') + 1; -if ( token ) { -*flag = token_value(token); -*str = token + 1; -} -else -{ -/* O means we take strict as our default policy. */ -*flag = 0; -*str = NULL; -} +token = strchr(token, ','); +*str = token ? token + 1 : NULL; return 1; } @@ -630,14 +620,14 @@ static PyObject *pyxc_test_assign_device(XcObject *self, uint32_t dom; char *pci_str; int32_t sbdf = 0; -int seg, bus, dev, func, flag; +int seg, bus, dev, func; static char *kwd_list[] = { domid, pci, NULL }; if ( !PyArg_ParseTupleAndKeywords(args, kwds, is, kwd_list, dom, pci_str) ) return NULL; -while ( next_bdf(pci_str, seg, bus, dev, func, flag) ) +while ( next_bdf(pci_str, seg, bus, dev, func) ) { sbdf = seg 16; sbdf |= (bus 0xff) 8; @@ -663,21 +653,21 @@ static PyObject *pyxc_assign_device(XcObject *self, uint32_t dom; char *pci_str; int32_t sbdf = 0; -int seg, bus, dev, func, flag; +int seg, bus, dev, func; static char *kwd_list[] = { domid, pci, NULL }; if ( !PyArg_ParseTupleAndKeywords(args, kwds, is, kwd_list, dom, pci_str) ) return NULL; -while ( next_bdf(pci_str, seg, bus, dev, func, flag) ) +while ( next_bdf(pci_str, seg, bus, dev, func) ) { sbdf = seg 16; sbdf |= (bus 0xff) 8; sbdf |= (dev 0x1f) 3; sbdf |= (func 0x7); -if ( xc_assign_device(self-xc_handle, dom, sbdf, flag) != 0 ) +if ( xc_assign_device(self-xc_handle, dom, sbdf, 0) != 0 ) { if (errno == ENOSYS) sbdf = -1; @@ -696,14 +686,14 @@ static PyObject *pyxc_deassign_device(XcObject *self, uint32_t dom; char *pci_str; int32_t sbdf = 0; -int seg, bus, dev, func, flag; +int seg, bus, dev, func; static char *kwd_list[] = { domid, pci, NULL }; if ( !PyArg_ParseTupleAndKeywords(args, kwds, is, kwd_list, dom, pci_str) ) return NULL; -while ( next_bdf(pci_str, seg, bus, dev, func, flag) ) +while ( next_bdf(pci_str, seg, bus, dev, func) ) { sbdf = seg 16; sbdf |= (bus 0xff) 8; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
Tiejun, please can you send a patch to fix this up. Please send just a revised version of this patch. I think the rest of the series will rebase just fine on top of it. (If I'm wrong then we will need to do something more complex.) Sorry to this. On ARM side the flag field doesn't take any affect. Signed-off-by: Tiejun Chen tiejun.c...@intel.com diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c index 9a667e9..b62c8cf 100644 --- a/xen/drivers/passthrough/arm/smmu.c +++ b/xen/drivers/passthrough/arm/smmu.c @@ -2709,7 +2709,8 @@ static int arm_smmu_reassign_dev(struct domain *s, struct domain *t, return ret; if (t) { - ret = arm_smmu_assign_dev(t, devfn, dev); + /* The flag field doesn't matter to DT device. */ + ret = arm_smmu_assign_dev(t, devfn, dev, 0); if (ret) return ret; } Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
Can you avoid the mention of DT in the comment please, since PCI will eventually go that path. Something like No flags are defined for ARM would suffice I think. Works for me. But in some way 0 also represents one valid flag. So what about this ? No flags are defined for ARM so its always safe to set 0. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
These cosmetic changes can be fixed by a follow-up patch. If Jackson would like not to fix this directly in his tree, I can post this a small patch but we'd better squash this into the predecessor just as one commit. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
OOPS! Please refer to this version: (One miss changing flag to flags in patch #11 although we can compile successfully.) #1. To patch #8 diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 2991333..9c5ef8b 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1316,7 +1316,7 @@ int xc_get_machine_memory_map(xc_interface *xch, uint32_t max_entries); int xc_reserved_device_memory_map(xc_interface *xch, - uint32_t flag, + uint32_t flags, uint16_t seg, uint8_t bus, uint8_t devfn, diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 298b3b5..1b074b7 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -686,7 +686,7 @@ int xc_domain_set_memory_map(xc_interface *xch, } int xc_reserved_device_memory_map(xc_interface *xch, - uint32_t flag, + uint32_t flags, uint16_t seg, uint8_t bus, uint8_t devfn, @@ -695,11 +695,11 @@ int xc_reserved_device_memory_map(xc_interface *xch, { int rc; struct xen_reserved_device_memory_map xrdmmap = { -.flag = flag, -.seg = seg, -.bus = bus, -.devfn = devfn, -.nr_entries = *max_entries +.flags = flags, +.nr_entries = *max_entries, +.dev.pci.seg = seg, +.dev.pci.bus = bus, +.dev.pci.devfn = devfn, }; DECLARE_HYPERCALL_BOUNCE(entries, sizeof(struct xen_reserved_device_memory) * #2. To patch #11 diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 29476fc..8d103c3 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -94,7 +94,7 @@ const char *libxl__domain_device_model(libxl__gc *gc, static int libxl__xc_device_get_rdm(libxl__gc *gc, - uint32_t flag, + uint32_t flags, uint16_t seg, uint8_t bus, uint8_t devfn, @@ -107,7 +107,7 @@ libxl__xc_device_get_rdm(libxl__gc *gc, * We really can't presume how many entries we can get in advance. */ *nr_entries = 0; -r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, +r = xc_reserved_device_memory_map(CTX-xch, flags, seg, bus, devfn, NULL, nr_entries); assert(r = 0); /* 0 means we have no any rdm entry. */ @@ -119,7 +119,7 @@ libxl__xc_device_get_rdm(libxl__gc *gc, } GCNEW_ARRAY(*xrdm, *nr_entries); -r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, +r = xc_reserved_device_memory_map(CTX-xch, flags, seg, bus, devfn, *xrdm, nr_entries); if (r) rc = ERROR_FAIL; @@ -212,7 +212,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc, unsigned int nr_entries; /* Collect all rdm info if exist. */ -rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL, +rc = libxl__xc_device_get_rdm(gc, XENMEM_RDM_ALL, 0, 0, 0, nr_entries, xrdm); if (rc) goto out; @@ -240,7 +240,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc, devfn = PCI_DEVFN(d_config-pcidevs[i].dev, d_config-pcidevs[i].func); nr_entries = 0; -rc = libxl__xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL, +rc = libxl__xc_device_get_rdm(gc, 0, seg, bus, devfn, nr_entries, xrdm); if (rc) goto out; Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
I intend to produce a git branch RSN. On the staging? Tomorrow I can pull to check/test this. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
Thanks for your clarification to me. The solution to this is to be systematic in how you handle the email based review of a series, not to add a further to the reviewer by using IRC as well. For example, here is how I handle review of a series I am working on: I keep all the replies to a series I'm working on marked unread. When I come to rework the series I take the first reply to the first patch and start a draft reply to it quoting the entire review. I then go through the comments one by one and either: * make the _complete_ code change, including adding the Changes in vN bit to the commit log and delete that comment from the reply Are you saying this case of resending this particular patch online? or * write a reply in my draft to that particular comment which does one or more of: * Explain that I don't understand the suggestion, probably asking questions to try and clarify what was being asked for. Yes, in this case we're arguing, I was really trying to send a sample of this code fragment to ask this before I sent out the complete change. * Explain, with reasons, why I disagree with the suggestion * Explain, with reasons, why I only implemented part of the suggestion. Only then do I move on to the next comment in that mail and repeat. At the end I've either deleted all the comments from my draft (because I've fully implemented everything) so the draft can be discarded or I have an email to send explaining what I've not done and why. Only now do I mark the original review email as read. Then I move on to the next reply to that thread in my mail box and repeat until I have been through every mail in the thread and addressed _all_ of the comments. At the end of this process _every_ bit of review feedback is addressed either by making the requested change or by responding explaining the reason why the change hasn't been made. This is absolutely crucial. You should never silently ignore a piece of review, even if you don't I should double check each line but sometimes this doesn't mean I can understand everything correctly to do right thing as you expect. But this is really not what I intend to do. Thanks Tiejun understand it or disagree with it, always reply and explain why you haven't. If the review was particularly long or complex I will then do a second pass through the review comments and check that every comment is either mentioned in a Changes in vN changelog comment or I have replied to it. I do all of that before posting the next version. IMHO until a contributor has shown they are diligent about addressing review comments they should _never_ send out a series which only has review partially addressed. The presence of an IRC channel in no way changes the requirement to be systematic and thorough when dealing with email review. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v12] introduce XENMEM_reserved_device_memory_map
On 2015/7/22 21:03, Jan Beulich wrote: On 22.07.15 at 14:55, tiejun.c...@intel.com wrote: +#ifdef HAS_PASSTHROUGH +case XENMEM_reserved_device_memory_map: +{ +struct get_reserved_device_memory grdm; + +if ( unlikely(start_extent) ) +return -ENOSYS; + +if ( copy_from_guest(grdm.map, compat, 1) || + !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) ) +return -EFAULT; + +if ( grdm.map.flags ~XENMEM_RDM_ALL ) +return -EINVAL; + +grdm.used_entries = 0; +rc = iommu_get_reserved_device_memory(get_reserved_device_memory, + grdm); + Just as you asked me previously, Here if RDM doesn't exist, so grdm.map.nr_entries = grdm.used_entries = 0, and rc = 0, right? No, grdm.map.nr_entries still holds whatever the caller passed. What if the caller pass 0 like raising an inquiry? Indeed, this is what we did in patch #11. I think this is reasonable since the caller always doesn't know how much buffers should be allocated beforehand, so instead, the caller prefer to make this sort of inquiry without any buffers. +if ( !rc grdm.map.nr_entries grdm.used_entries ) +rc = -ENOBUFS; +grdm.map.nr_entries = grdm.used_entries; +if ( __copy_to_guest(compat, grdm.map, 1) ) So can we still do this copy here? We not only can, we need to. The only case where we might skip it is when the incoming grdm.map.nr_entries is unchanged. If what I'm saying above is right, __copy_to_guest() would return a error in this case, right? I don't think this make sense. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
On 2015/7/22 21:24, Ian Jackson wrote: Tiejun Chen writes ([v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM): Acked-by: Wei Liu wei.l...@citrix.com I have dropped Wei's ack on this from my git branch, as it was clearly inappropriate to retain it. (v11 was acked by me, so there is no need for a re-review by Wei unless he feels it would be useful.) Ian, Sounds you start to merge them into your tree? But now Jan is trying to update patch #1 as you see. I think something needs to be synced on tool sides. Although that is not finished, at least three changes exist: #1. PCI_DEV_RDM_ALL - XENMEM_RDM_ALL #2. new public memop interface structure #2.1 +union { +struct physdev_pci_device pci; +} dev; #2.2 flag - flags Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
On 2015/7/22 16:28, Jan Beulich wrote: On 22.07.15 at 03:30, tiejun.c...@intel.com wrote: CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Acked-by: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com Reviewed-by: Kevin Tian kevin.t...@intel.com --- v11: * Use GCNEW_ARRAY to replace libxl__malloc() * #define pfn_to_paddrk is missing safety () around x, and move this into libxl_internal.h * Rename set_rdm_entries() to add_rdm_entry() and put the increment at the end so that the assignments are to -rdms[d_config-num_rdms]. * Simply make it so that if there are any rdms specified in the domain config, they are used instead of the automatically gathered information (from strategy and devices). So just return if d_config-rmds is valid. * Shorten some code comments. I think it is not the first time that we're pointing out to you that when you make not just cosmetic changes, review and ack tags should be dropped. I don't recall this sort of requirement was mentioned. Instead, this is new to me. So where can I found this warning you said previously? Furthermore, you ask me to drop Reviewed-by/Acked-by in this revision, what's next? Just to this example, No.1 revision: Acked-by: Wei Liu wei.l...@citrix.com Reviewed-by: Kevin Tian kevin.t...@intel.com No.2 revision: I addressed some comments raised by Jackson. But you mean Reviewed-by/Acked-by should be dropped. No.3 revision: I assume Jackson Ack or Review to this so I should leave one line like this, Reviewed-by: Ian Jackson ian.jack...@eu.citrix.com without two previous Acked-by/Reviewed-by, right? So looks like the latter always override the former, right? And I also can't understand why we should drop Reviewed-by/Acked-by from other guys. And, all new comments I addressed don't conflict with our previous revision so why? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v13 00/16] Fix RMRR (avoid RDM)
Ian, This was supposed to be my responsibility to resend this series so appreciate for your extra effort. git branch here: http://xenbits.xen.org/gitweb/?p=people/iwj/xen.git;a=summary git://xenbits.xen.org/people/iwj/xen.git base.rdm-v13..wip.rdm-v13 I pick all RMRR patches from wip.rdm-v13 and apply them into staging branch in my tree. ... Tiejun, can you please test this series ? Hopefully if you say it Yes. I can compile and boot successfully with/without rdm setting. still works after this, we can commit it tomorrow. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
Ian, Thanks for your effort. A tiny change may be needed but I don't block this. +libxl__xc_device_get_rdm(libxl__gc *gc, + uint32_t flag, Since now we are sitting on xc_reserved_device_memory_map(, flags, xxx), s/flag/flags may be better. + uint16_t seg, + uint8_t bus, + uint8_t devfn, + unsigned int *nr_entries, + struct xen_reserved_device_memory **xrdm) +{ [snip] +r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, Ditto. + NULL, nr_entries); +assert(r = 0); +/* 0 means we have no any rdm entry. */ +if (!r) goto out; + +if (errno != ENOBUFS) { +rc = ERROR_FAIL; +goto out; +} + +GCNEW_ARRAY(*xrdm, *nr_entries); +r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, Ditto. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
On 2015/7/22 22:04, Ian Jackson wrote: Chen, Tiejun writes (Re: [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM): Sounds you start to merge them into your tree? But now Jan is trying to update patch #1 as you see. I think something needs to be synced on tool sides. Although that is not finished, at least three changes exist: Thanks. I am negotiating with Jan et al on IRC. We just need to sync two patches: #1. To patch #8: diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 2991333..9c5ef8b 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1316,7 +1316,7 @@ int xc_get_machine_memory_map(xc_interface *xch, uint32_t max_entries); int xc_reserved_device_memory_map(xc_interface *xch, - uint32_t flag, + uint32_t flags, uint16_t seg, uint8_t bus, uint8_t devfn, diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 298b3b5..1b074b7 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -686,7 +686,7 @@ int xc_domain_set_memory_map(xc_interface *xch, } int xc_reserved_device_memory_map(xc_interface *xch, - uint32_t flag, + uint32_t flags, uint16_t seg, uint8_t bus, uint8_t devfn, @@ -695,11 +695,11 @@ int xc_reserved_device_memory_map(xc_interface *xch, { int rc; struct xen_reserved_device_memory_map xrdmmap = { -.flag = flag, -.seg = seg, -.bus = bus, -.devfn = devfn, -.nr_entries = *max_entries +.flags = flags, +.nr_entries = *max_entries, +.dev.pci.seg = seg, +.dev.pci.bus = bus, +.dev.pci.devfn = devfn, }; DECLARE_HYPERCALL_BOUNCE(entries, sizeof(struct xen_reserved_device_memory) * #2. To patch #11: diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 29476fc..40b2bba 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -212,7 +212,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc, unsigned int nr_entries; /* Collect all rdm info if exist. */ -rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL, +rc = libxl__xc_device_get_rdm(gc, XENMEM_RDM_ALL, 0, 0, 0, nr_entries, xrdm); if (rc) goto out; @@ -240,7 +240,7 @@ int libxl__domain_device_construct_rdm(libxl__gc *gc, devfn = PCI_DEVFN(d_config-pcidevs[i].dev, d_config-pcidevs[i].func); nr_entries = 0; -rc = libxl__xc_device_get_rdm(gc, ~PCI_DEV_RDM_ALL, +rc = libxl__xc_device_get_rdm(gc, 0, seg, bus, devfn, nr_entries, xrdm); if (rc) goto out; Note I just compiled with these changes since right now I can't access any machine to test. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v12] introduce XENMEM_reserved_device_memory_map
+#ifdef HAS_PASSTHROUGH +case XENMEM_reserved_device_memory_map: +{ +struct get_reserved_device_memory grdm; + +if ( unlikely(start_extent) ) +return -ENOSYS; + +if ( copy_from_guest(grdm.map, compat, 1) || + !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) ) +return -EFAULT; + +if ( grdm.map.flags ~XENMEM_RDM_ALL ) +return -EINVAL; + +grdm.used_entries = 0; +rc = iommu_get_reserved_device_memory(get_reserved_device_memory, + grdm); + Just as you asked me previously, Here if RDM doesn't exist, so grdm.map.nr_entries = grdm.used_entries = 0, and rc = 0, right? +if ( !rc grdm.map.nr_entries grdm.used_entries ) +rc = -ENOBUFS; +grdm.map.nr_entries = grdm.used_entries; +if ( __copy_to_guest(compat, grdm.map, 1) ) So can we still do this copy here? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
Just to your example, libxl_domain_config cfg; cfg.stuff = blah; cfg.rdm.strategy = HOST; libxl_domain_create_new(cfg, domid); libxl_domain_destroy(domid); Here looks you mean d_config-rdms would be changed, right? Currently this shouldn't be allowed. But I think we need to further discussion make this case clear after feature freeze since we didn't have this kind of assumption in our previous design. libxl_domain_create_new(cfg, domid); This response of yours does not lead me to think you have understood what I am saying, but I agree that this can be dealt with later (if Indeed, I'm not a fan to Xen tools so I can't picture what this real scenario would happen. So if I'm misunderstanding what you mean, just please correct me. Or if you still think its hard to explain this to me, just tell me what I should do. I think this make your life easy. Thanks Tiejun indeed it needs to be dealt with at all). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
On 2015/7/21 23:09, Ian Jackson wrote: Chen, Tiejun writes (Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM): +static void +add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config, + uint64_t rdm_start, uint64_t rdm_size, int rdm_policy) +{ +d_config-num_rdms++; +d_config-rdms = libxl__realloc(NOGC, d_config-rdms, +d_config-num_rdms * sizeof(libxl_device_rdm)); + +d_config-rdms[d_config-num_rdms - 1].start = rdm_start; +d_config-rdms[d_config-num_rdms - 1].size = rdm_size; +d_config-rdms[d_config-num_rdms - 1].policy = rdm_policy; +} But, I wrote: Can I suggest a function void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config, uint64_t rdm_start, uint64_t rdm_size, int rdm_policy) which assumes that d_config-num_rdms is set correctly, and increments it ? (Please put the increment at the end so that the assignments are to -rdms[d_config-num_rdms], or perhaps make a convenience alias.) Note the last paragraph. This is now the third time I have posted that text. It is the fifth request or clarification I have had to make about this very small area. I have to say that I'm finding this rather frustrating. Sorry, I just ignore the line in brackets since I always think this kind of thing is often not a big deal, and next time I should pay more attention to the (). But indeed, before I post this whole patch online I also picked up this chunk of code to ask you to take a look that. This manner means I'm not very sure if I'm addressing this properly. But I didn't get a further response, so I guess that should work for you and then I posted the whole online. Now back on our problem, static void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config, uint64_t rdm_start, uint64_t rdm_size, int rdm_policy) { d_config-rdms = libxl__realloc(NOGC, d_config-rdms, (d_config-num_rdms+1) * sizeof(libxl_device_rdm)); d_config-rdms[d_config-num_rdms].start = rdm_start; d_config-rdms[d_config-num_rdms].size = rdm_size; d_config-rdms[d_config-num_rdms].policy = rdm_policy; d_config-num_rdms++; } Does this work for you? If I'm still wrong, please correct this function directly to cost you less. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
On 2015/7/21 23:57, Ian Jackson wrote: Chen, Tiejun writes (Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM): Sorry, I just ignore the line in brackets since I always think this kind of thing is often not a big deal, and next time I should pay more attention to the (). But indeed, before I post this whole patch online I also picked up this chunk of code to ask you to take a look that. This manner means I'm not very sure if I'm addressing this properly. But I didn't get a further response, so I guess that should work for you and then I posted the whole online. You are talking about 55ae2bb1.9030...@intel.com I guess. I replied to that with several comments about your prose and about the computation of the new set of rdms. It's true that I didn't comment on the frat that you had half-done one of the things I had requested. It is of course a waste of my time to be constantly re-reviewing half-done changes. Next time I should see each line of all comments carefully. Maybe its good way to use IRC to take your quick advice in advance, and I hope this make you feel better. Anyway, sorry to bring this kind of inconvenience. Now back on our problem, static void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config, uint64_t rdm_start, uint64_t rdm_size, int rdm_policy) { d_config-rdms = libxl__realloc(NOGC, d_config-rdms, (d_config-num_rdms+1) * sizeof(libxl_device_rdm)); d_config-rdms[d_config-num_rdms].start = rdm_start; d_config-rdms[d_config-num_rdms].size = rdm_size; d_config-rdms[d_config-num_rdms].policy = rdm_policy; d_config-num_rdms++; } Does this work for you? If I'm still wrong, please correct this function directly to cost you less. Yes, that is what I meant. Good to know. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
I hope the following can address all comments below: diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 2f8e590..a4bd2a1 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -407,7 +407,7 @@ int libxl__domain_build(libxl__gc *gc, switch (info-type) { case LIBXL_DOMAIN_TYPE_HVM: -ret = libxl__build_hvm(gc, domid, info, state); +ret = libxl__build_hvm(gc, domid, d_config, state); if (ret) goto out; diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 634b8d2..ba852fe 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -92,6 +92,276 @@ const char *libxl__domain_device_model(libxl__gc *gc, return dm; } +static int +libxl__xc_device_get_rdm(libxl__gc *gc, + uint32_t flag, + uint16_t seg, + uint8_t bus, + uint8_t devfn, + unsigned int *nr_entries, + struct xen_reserved_device_memory **xrdm) +{ +int rc = 0, r; + +/* + * We really can't presume how many entries we can get in advance. + */ +*nr_entries = 0; +r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, + NULL, nr_entries); +assert(r = 0); +/* 0 means we have no any rdm entry. */ +if (!r) goto out; + +if (errno != ENOBUFS) { +rc = ERROR_FAIL; +goto out; +} + +GCNEW_ARRAY(*xrdm, *nr_entries); +r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, + *xrdm, nr_entries); +if (r) +rc = ERROR_FAIL; + + out: +if (rc) { +*nr_entries = 0; +*xrdm = NULL; +LOG(ERROR, Could not get reserved device memory maps.\n); +} +return rc; +} + +/* + * Check whether there exists rdm hole in the specified memory range. + * Returns true if exists, else returns false. + */ +static bool overlaps_rdm(uint64_t start, uint64_t memsize, + uint64_t rdm_start, uint64_t rdm_size) +{ +return (start + memsize rdm_start) (start rdm_start + rdm_size); +} + +static void +add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config, + uint64_t rdm_start, uint64_t rdm_size, int rdm_policy) +{ +assert(d_config-num_rdms); + +d_config-rdms = libxl__realloc(NOGC, d_config-rdms, +d_config-num_rdms * sizeof(libxl_device_rdm)); + +d_config-rdms[d_config-num_rdms - 1].start = rdm_start; +d_config-rdms[d_config-num_rdms - 1].size = rdm_size; +d_config-rdms[d_config-num_rdms - 1].policy = rdm_policy; +} + +/* + * Check reported RDM regions and handle potential gfn conflicts according + * to user preferred policy. + * + * RDM can reside in address space beyond 4G theoretically, but we never + * see this in real world. So in order to avoid breaking highmem layout + * we don't solve highmem conflict. Note this means highmem rmrr could + * still be supported if no conflict. + * + * But in the case of lowmem, RDM probably scatter the whole RAM space. + * Especially multiple RDM entries would worsen this to lead a complicated + * memory layout. And then its hard to extend hvm_info_table{} to work + * hvmloader out. So here we're trying to figure out a simple solution to + * avoid breaking existing layout. So when a conflict occurs, + * + * #1. Above a predefined boundary (default 2G) + * - Move lowmem_end below reserved region to solve conflict; + * + * #2. Below a predefined boundary (default 2G) + * - Check strict/relaxed policy. + * strict policy leads to fail libxl. + * relaxed policy issue a warning message and also mask this entry + * INVALID to indicate we shouldn't expose this entry to hvmloader. + * Note when both policies are specified on a given region, the per-device + * policy should override the global policy. + */ +int libxl__domain_device_construct_rdm(libxl__gc *gc, + libxl_domain_config *d_config, + uint64_t rdm_mem_boundary, + struct xc_hvm_build_args *args) +{ +int i, j, conflict, rc; +struct xen_reserved_device_memory *xrdm = NULL; +uint32_t strategy = d_config-b_info.u.hvm.rdm.strategy; +uint16_t seg; +uint8_t bus, devfn; +uint64_t rdm_start, rdm_size; +uint64_t highmem_end = args-highmem_end ? args-highmem_end : (1ull32); + +/* Might not expose rdm. */ +if (strategy == LIBXL_RDM_RESERVE_STRATEGY_IGNORE +!d_config-num_pcidevs) +return 0; + +/* Query all RDM entries in this platform */ +if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) { +unsigned int nr_entries; + +/* Collect all rdm info if exist. */ +rc = libxl__xc_device_get_rdm(gc, PCI_DEV_RDM_ALL, + 0, 0, 0, nr_entries, xrdm); +
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
The domain destroy would not change cfg at all. Okay. If not, I should double check this duplication so what about a return in the case of duplicating one entry? What I am looking for is a *decision* about what the right behaviour is, backed up by a *rationale*. The most obvious answer to me would be that if an rdms array is specified, the strategy should be ignored. That is how the automatic numa placement API works. I'm not familiar with this. Do you mean this? if (d_config-num_rdms) strategy = LIBXL_RDM_RESERVE_STRATEGY_IGNORE; But except this global strategy, we also have per-device setting so maybe something like this? if (d_config-num_rdms) return 0; But another answer would be to take the union of the specified regions. That would be more complicated, because the union would have to be computed. if (d_config-rdms[i].start == rdm_start) return; That doesn't, of course, compute the union. Sorry I don't understand what the take the union of the specified regions is here. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
On 2015/7/21 19:11, Ian Jackson wrote: Chen, Tiejun writes (Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM): I would add this check at the beginning of this function: assert(!d_config-num_rdms !d_config-rdms). As Ian Campbell and I have explained, that is not OK. The caller may choose to pass both some rdms and strategy=host. The most obvious case is something like the following code libxl_domain_config cfg; cfg.stuff = blah; cfg.rdm.strategy = HOST; libxl_domain_create_new(cfg, domid); libxl_domain_destroy(domid); I'm not sure about this procedure, so do you mean this doesn't free d_config-num_rdms/d_config-rdms? If not, I should double check this duplication so what about a return in the case of duplicating one entry? static void add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config, uint64_t rdm_start, uint64_t rdm_size, int rdm_policy) { if (d_config-rdms) { unsigned int i; for (i = 0; i d_config-num_rdms; i++) { if (d_config-rdms[i].start == rdm_start) return; } } d_config-num_rdms++; d_config-rdms = libxl__realloc(NOGC, d_config-rdms, d_config-num_rdms * sizeof(libxl_device_rdm)); d_config-rdms[d_config-num_rdms - 1].start = rdm_start; d_config-rdms[d_config-num_rdms - 1].size = rdm_size; d_config-rdms[d_config-num_rdms - 1].policy = rdm_policy; } Thanks Tiejun libxl_domain_create_new(cfg, domid); which must work (and the second domain should have identical configuration to the first). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
On 2015/7/21 19:11, Ian Jackson wrote: Chen, Tiejun writes (Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM): I would add this check at the beginning of this function: assert(!d_config-num_rdms !d_config-rdms). As Ian Campbell and I have explained, that is not OK. The caller may choose to pass both some rdms and strategy=host. If you're talking to create the domain twice I think this should be reasonable, if (d_config-num_rdms) return 0; Because RDMs are always associated to the platform so we should keep the same array. I mean same configuration shouldn't change this point. Thanks Tiejun The most obvious case is something like the following code libxl_domain_config cfg; cfg.stuff = blah; cfg.rdm.strategy = HOST; libxl_domain_create_new(cfg, domid); libxl_domain_destroy(domid); libxl_domain_create_new(cfg, domid); which must work (and the second domain should have identical configuration to the first). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
But d_config is a libxl_domain_config which is supplied by libxl's caller. It might contain some rdms. I guess this line make you or other guys confused so lets delete this line directly. I don't think I am very confused. And if you still worry about something, I can add assert() at the beginning of this function like this, assert(!d_config-num_rdms !d_config-rdms). If you are sure that this assertion is correct, then that would be proper. But as I say above, I don't think it is. Based on Campbell' explanation I think you guys are raising a reasonable concern. We shouldn't clear that over there arbitrarily. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
I think the confusion here is that the d_config-rdms array (which num_rdms is the length of) is in the public API (because it is in libxl_types.idl) but is apparently only being used in this series as an internal state for the domain build process (i.e. xl doesn't ever add anything to the array rdms). Tiejun, is that an accurate summary? Yes. If the field is in the public API then the possibility of something being passed in their must be considered now, even if this particular series adds no such calls, since we cannot prevent 3rd party users of libxl adding such configuration. Is the possibility of the toolstack (i.e. the caller of libxl) supplying an array of rdm regions seems to be being left aside for future work or it not intended to ever support that? Its very possible so you're right. Thanks Tiejun Ian. And if you still worry about something, I can add assert() at the beginning of this function like this, assert(!d_config-num_rdms !d_config-rdms). If you are sure that this assertion is correct, then that would be proper. But as I say above, I don't think it is. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
On 2015/7/21 20:33, Ian Jackson wrote: Chen, Tiejun writes (Re: [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM): [Ian Jackson:] The most obvious answer to me would be that if an rdms array is specified, the strategy should be ignored. That is how the automatic numa placement API works. I'm not familiar with this. Do you mean this? if (d_config-num_rdms) strategy = LIBXL_RDM_RESERVE_STRATEGY_IGNORE; But except this global strategy, we also have per-device setting so maybe something like this? if (d_config-num_rdms) return 0; This latter was the kind of approach I had in mind. I think that will do for 4.6. Thanks for your guideline on this patch, and also I realize lots of things need to be improved. So I'm going to make a TODO list publically after this feature freeze. Now just for this patch, please take a final review (I just hope so.) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 2f8e590..a4bd2a1 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -407,7 +407,7 @@ int libxl__domain_build(libxl__gc *gc, switch (info-type) { case LIBXL_DOMAIN_TYPE_HVM: -ret = libxl__build_hvm(gc, domid, info, state); +ret = libxl__build_hvm(gc, domid, d_config, state); if (ret) goto out; diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 634b8d2..3afa735 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -92,6 +92,280 @@ const char *libxl__domain_device_model(libxl__gc *gc, return dm; } +static int +libxl__xc_device_get_rdm(libxl__gc *gc, + uint32_t flag, + uint16_t seg, + uint8_t bus, + uint8_t devfn, + unsigned int *nr_entries, + struct xen_reserved_device_memory **xrdm) +{ +int rc = 0, r; + +/* + * We really can't presume how many entries we can get in advance. + */ +*nr_entries = 0; +r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, + NULL, nr_entries); +assert(r = 0); +/* 0 means we have no any rdm entry. */ +if (!r) goto out; + +if (errno != ENOBUFS) { +rc = ERROR_FAIL; +goto out; +} + +GCNEW_ARRAY(*xrdm, *nr_entries); +r = xc_reserved_device_memory_map(CTX-xch, flag, seg, bus, devfn, + *xrdm, nr_entries); +if (r) +rc = ERROR_FAIL; + + out: +if (rc) { +*nr_entries = 0; +*xrdm = NULL; +LOG(ERROR, Could not get reserved device memory maps.\n); +} +return rc; +} + +/* + * Check whether there exists rdm hole in the specified memory range. + * Returns true if exists, else returns false. + */ +static bool overlaps_rdm(uint64_t start, uint64_t memsize, + uint64_t rdm_start, uint64_t rdm_size) +{ +return (start + memsize rdm_start) (start rdm_start + rdm_size); +} + +static void +add_rdm_entry(libxl__gc *gc, libxl_domain_config *d_config, + uint64_t rdm_start, uint64_t rdm_size, int rdm_policy) +{ +d_config-num_rdms++; +d_config-rdms = libxl__realloc(NOGC, d_config-rdms, +d_config-num_rdms * sizeof(libxl_device_rdm)); + +d_config-rdms[d_config-num_rdms - 1].start = rdm_start; +d_config-rdms[d_config-num_rdms - 1].size = rdm_size; +d_config-rdms[d_config-num_rdms - 1].policy = rdm_policy; +} + +/* + * Check reported RDM regions and handle potential gfn conflicts according + * to user preferred policy. + * + * RDM can reside in address space beyond 4G theoretically, but we never + * see this in real world. So in order to avoid breaking highmem layout + * we don't solve highmem conflict. Note this means highmem rmrr could + * still be supported if no conflict. + * + * But in the case of lowmem, RDM probably scatter the whole RAM space. + * Especially multiple RDM entries would worsen this to lead a complicated + * memory layout. And then its hard to extend hvm_info_table{} to work + * hvmloader out. So here we're trying to figure out a simple solution to + * avoid breaking existing layout. So when a conflict occurs, + * + * #1. Above a predefined boundary (default 2G) + * - Move lowmem_end below reserved region to solve conflict; + * + * #2. Below a predefined boundary (default 2G) + * - Check strict/relaxed policy. + * strict policy leads to fail libxl. + * relaxed policy issue a warning message and also mask this entry + * INVALID to indicate we shouldn't expose this entry to hvmloader. + * Note when both policies are specified on a given region, the per-device + * policy should override the global policy. + */ +int libxl__domain_device_construct_rdm(libxl__gc *gc, + libxl_domain_config *d_config
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
But another answer would be to take the union of the specified regions. That would be more complicated, because the union would have to be computed. if (d_config-rdms[i].start == rdm_start) return; That doesn't, of course, compute the union. Sorry I don't understand what the take the union of the specified regions is here. A region is a subset of the address space. I meant the set union: https://en.wikipedia.org/wiki/Union_%28set_theory%29 That is, an address would be reserved if it was reserved in any of the rdm regions implied by the config. Are you saying this point? The union of two sets A and B is the set of elements which are in A, in B, or in both A and B. The explicitly specified regions might overlap with the computed ones, without being identical. Computing the union would not be entirely trivial. Just to your example, libxl_domain_config cfg; cfg.stuff = blah; cfg.rdm.strategy = HOST; libxl_domain_create_new(cfg, domid); libxl_domain_destroy(domid); Here looks you mean d_config-rdms would be changed, right? Currently this shouldn't be allowed. But I think we need to further discussion make this case clear after feature freeze since we didn't have this kind of assumption in our previous design. libxl_domain_create_new(cfg, domid); Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
Before you add memory_map.nr_map, you should be able to iterate from 0 to (not inclusive) nr. At least as far as I recall the original patch. Sorry, I really don't understand what you want. Before we add memory_map.nr_map, e820[0, nr) don't include low/high memory, right? Why? memory_map is representing the reserved areas only, isn't it? If that's not the case, then of course everything is fine. I'm pretty sure the memory map we get here is an extension of the original PV-only get_e820 hypercall, which *does* include both the lowmem and highmem regions. In any case, it's pretty clear from the patched code that Tiejun is removing the old code which created the lowmem and highmem regions and is not replacing it. Where do you think the highmem region he's looking for was coming from? On second thoughts, I prefer to check/sync memory_map.map[] before copy them into e820 since ultimately this can make sure hvm_info, memory_map.map[] and e820 are on the same page. Anyway, I would send out v10 to address others so please further post your comments over there. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 00/16] Fix RMRR
One brief request -- if you do end up sending this series again, can you please rebase to staging? This is the second time I've had to manually fix up some patches so that they apply. Sorry for this inconvenience. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
v10: * Instead of correcting e820, I'd like to correct memory_map.map[] and then copy them into e820 directly. I think this can make sure hvm_info, memory_map.map[] and e820 are on the same page. Actually, now that you mention it -- this should probably happen instead when we update hvm_info-{low,high}_mem_pgend. I also considered this point previously but I thought just right now we only update hvm_info-low/high_mem_pgend inside pci_setup(). But you can't guarantee this would be a sole place in the future. Instead, memory_map.map[] would always be copied into e820 when we build e820 table. So I think we'd better do this update once at the last minute. Thanks Tiejun I'm happy to leave this where it is for now, so with Jan's comments addressed (in particular, incrementing nr_map): Reviewed-by: George Dunlap george.dun...@eu.citrix.com But if we have time it might be better to pull the loop in pci_setup() which moves the memory out into a function, and have that function modify the lowmem and highmem map[] entries at the same time we modify low_mem_pgend and high_mem_pgend. Alternately, you could put that on your list of clean-ups to hvmloader for 4.7. Thanks, -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
On 2015/7/20 22:16, Jan Beulich wrote: On 20.07.15 at 16:10, george.dun...@citrix.com wrote: Hmm... although I suppose that doesn't catch the possibility of a memory range crossing the 4G boundary. I think we can safely ignore that - both real and virtual hardware have special regions right below 4Gb, so neither RAM not RMRRs can be reasonably placed there. Okay, I regenerate this patch online. And I just hope its good to be acked here: hvmloader/pci: Try to avoid placing BARs in RMRRs Try to avoid placing PCI BARs over RMRRs: - If mmio_hole_size is not specified, and the existing MMIO range has RMRRs in it, and there is space to expand the hole in lowmem without moving more memory, then make the MMIO hole as large as possible. - When placing RMRRs, find the next RMRR higher than the current base in the lowmem mmio hole. If it overlaps, skip ahead of it and find the next one. This certainly won't work in all cases, but it should work in a significant number of cases. Additionally, users should be able to work around problems by setting mmio_hole_size larger in the guest config. Signed-off-by: George Dunlap george.dun...@eu.citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- tools/firmware/hvmloader/pci.c | 65 ++ 1 file changed, 65 insertions(+) diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c index 5ff87a7..74fc080 100644 --- a/tools/firmware/hvmloader/pci.c +++ b/tools/firmware/hvmloader/pci.c @@ -38,6 +38,46 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0; enum virtual_vga virtual_vga = VGA_none; unsigned long igd_opregion_pgbase = 0; +/* Check if the specified range conflicts with any reserved device memory. */ +static bool check_overlap_all(uint64_t start, uint64_t size) +{ +unsigned int i; + +for ( i = 0; i memory_map.nr_map; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED + check_overlap(start, size, + memory_map.map[i].addr, + memory_map.map[i].size) ) +return true; +} + +return false; +} + +/* Find the lowest RMRR higher than base. */ +static int find_next_rmrr(uint32_t base) +{ +unsigned int i; +int next_rmrr = -1; +uint64_t end, min_end = (1ull 32); + +for ( i = 0; i memory_map.nr_map ; i++ ) +{ +end = memory_map.map[i].addr + memory_map.map[i].size; + +if ( memory_map.map[i].type == E820_RESERVED + end base + min_end min_end ) +{ +next_rmrr = i; +min_end = end; +} +} + +return next_rmrr; +} + void pci_setup(void) { uint8_t is_64bar, using_64bar, bar64_relocate = 0; @@ -46,6 +86,7 @@ void pci_setup(void) uint32_t vga_devfn = 256; uint16_t class, vendor_id, device_id; unsigned int bar, pin, link, isa_irq; +int next_rmrr; /* Resources assignable to PCI devices via BARs. */ struct resource { @@ -299,6 +340,15 @@ void pci_setup(void) || (((pci_mem_start 1) PAGE_SHIFT) = hvm_info-low_mem_pgend)) ) pci_mem_start = 1; + +/* + * Try to accomodate RMRRs in our MMIO region on a best-effort basis. + * If we have RMRRs in the range, then make pci_mem_start just after + * hvm_info-low_mem_pgend. + */ +if ( pci_mem_start (hvm_info-low_mem_pgend PAGE_SHIFT) + check_overlap_all(pci_mem_start, pci_mem_end-pci_mem_start) ) +pci_mem_start = hvm_info-low_mem_pgend PAGE_SHIFT; } if ( mmio_total (pci_mem_end - pci_mem_start) ) @@ -352,6 +402,8 @@ void pci_setup(void) io_resource.base = 0xc000; io_resource.max = 0x1; +next_rmrr = find_next_rmrr(pci_mem_start); + /* Assign iomem and ioport resources in descending order of size. */ for ( i = 0; i nr_bars; i++ ) { @@ -407,6 +459,19 @@ void pci_setup(void) } base = (resource-base + bar_sz - 1) ~(uint64_t)(bar_sz - 1); + +/* If we're using mem_resource, check for RMRR conflicts. */ +while ( resource == mem_resource +next_rmrr = 0 +check_overlap(base, bar_sz, + memory_map.map[next_rmrr].addr, + memory_map.map[next_rmrr].size) ) +{ +base = memory_map.map[next_rmrr].addr + memory_map.map[next_rmrr].size; +base = (base + bar_sz - 1) ~(bar_sz - 1); +next_rmrr = find_next_rmrr(base); +} + bar_data |= (uint32_t)base; bar_data_upper = (uint32_t)(base 32); base += bar_sz; -- 1.9.1 Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
Looks just a little bit should be changed so I also paste this new online to try winning your Acked here, hvmloader/e820: construct guest e820 table Now use the hypervisor-supplied memory map to build our final e820 table: * Add regions for BIOS ranges and other special mappings not in the hypervisor map * Add in the hypervisor supplied regions * Adjust the lowmem and highmem regions if we've had to relocate memory (adding a highmem region if necessary) * Sort all the ranges so that they appear in memory order. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Reviewed-by: George Dunlap george.dun...@eu.citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- tools/firmware/hvmloader/e820.c | 109 +++- 1 file changed, 96 insertions(+), 13 deletions(-) diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c index 7a414ab..a6cacdf 100644 --- a/tools/firmware/hvmloader/e820.c +++ b/tools/firmware/hvmloader/e820.c @@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820, unsigned int lowmem_reserved_base, unsigned int bios_image_base) { -unsigned int nr = 0; +unsigned int nr = 0, i, j; +uint32_t low_mem_end = hvm_info-low_mem_pgend PAGE_SHIFT; +uint32_t add_high_mem = 0; +uint64_t high_mem_end = (uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT; +uint64_t map_start, map_size, map_end; if ( !lowmem_reserved_base ) lowmem_reserved_base = 0xA; @@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820, e820[nr].type = E820_RESERVED; nr++; -/* Low RAM goes here. Reserve space for special pages. */ -BUG_ON((hvm_info-low_mem_pgend PAGE_SHIFT) (2u 20)); -e820[nr].addr = 0x10; -e820[nr].size = (hvm_info-low_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; -nr++; - /* * Explicitly reserve space for special pages. * This space starts at RESERVED_MEMBASE an extends to cover various @@ -191,16 +188,102 @@ int build_e820_table(struct e820entry *e820, nr++; } +/* Low RAM goes here. Reserve space for special pages. */ +BUG_ON(low_mem_end (2u 20)); -if ( hvm_info-high_mem_pgend ) +/* + * Construct E820 table according to recorded memory map. + * + * The memory map created by toolstack may include, + * + * #1. Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * + * #2. Reserved regions if they exist + * + * #3. High memory region if it exists + * + * Note we just have one low memory entry and one high mmeory entry if + * exists. + * + * But we may have relocated RAM to allocate sufficient MMIO previously + * so low_mem_pgend would be changed over there. And here memory_map[] + * records the original low/high memory, so if low_mem_end is less than + * the original we need to revise low/high memory range firstly. + */ +for ( i = 0; i memory_map.nr_map; i++ ) { -e820[nr].addr = ((uint64_t)1 32); -e820[nr].size = -((uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; +map_start = memory_map.map[i].addr; +map_size = memory_map.map[i].size; +map_end = map_start + map_size; + +/* If we need to adjust lowmem. */ +if ( memory_map.map[i].type == E820_RAM + low_mem_end map_start low_mem_end map_end ) +{ +add_high_mem = map_end - low_mem_end; +memory_map.map[i].size = low_mem_end - map_start; +break; +} +} + +/* If we need to adjust highmem. */ +if ( add_high_mem ) +{ +/* Modify the existing highmem region if it exists. */ +for ( i = 0; i memory_map.nr_map; i++ ) +{ +map_start = memory_map.map[i].addr; +map_size = memory_map.map[i].size; +map_end = map_start + map_size; + +if ( memory_map.map[i].type == E820_RAM + map_start == ((uint64_t)1 32)) +{ +memory_map.map[i].size += add_high_mem; +break; +} +} + +/* If there was no highmem region, just create one. */ +if ( i == memory_map.nr_map ) +{ +memory_map.map[i].addr = ((uint64_t)1 32); +memory_map.map[i].size = add_high_mem; +memory_map.map[i].type = E820_RAM; +memory_map.nr_map++; +} + +/* A sanity
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
Note I need more time to address others. +int libxl__domain_device_construct_rdm(libxl__gc *gc, + libxl_domain_config *d_config, + uint64_t rdm_mem_boundary, + struct xc_hvm_build_args *args) +{ ... +/* Query all RDM entries in this platform */ +if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) { ... +} else { +d_config-num_rdms = 0; +} Does this not override the domain configuration's num_rdms ? I don't We don't have the specific num_rdms parameter in .cfg so I don't understand what you mean here. Thanks Tiejun think that is correct. If the domain configuration has rdms and num_rdms already set, then the strategy should presumably be ignored. (Passing the same domain configuration struct to libxl_domain_create_new, after destroying the domain, ought to work, even after the first call has modified it.) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table
Actually, now that you mention it -- this should probably happen instead when we update hvm_info-{low,high}_mem_pgend. I also considered this point previously but I thought just right now we only update hvm_info-low/high_mem_pgend inside pci_setup(). But you can't guarantee this would be a sole place in the future. Instead, memory_map.map[] would always be copied into e820 when we build e820 table. We can guarantee it, if nothing else by making sure that no new changes are added which change the one but not the other. This means you have to syn this again once you change hvm_info so I think this may cost a little bit. Thanks Tiejun But perhaps better would be to put a check in build_e820_map() to BUG() if hvm_info and memory_map are out of sync. On the other hand, looking at this now, I think that long-term we should probably move to have *all* information about the memory layout passed to hvmloader via the memory map, rather than via hvm_info. That way we can also get rid of the magic knowledge that hvmloader has about the memory layout (e.g., the VGA hole). -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
+/* Find the lowest RMRR higher than base. */ +static int find_next_rmrr(uint32_t base) +{ +unsigned int i; +int next_rmrr = -1; +uint64_t min_base = (1ull 32); + +for ( i = 0; i memory_map.nr_map ; i++ ) +{ +if ( memory_map.map[i].type == E820_RESERVED + memory_map.map[i].addr base + memory_map.map[i].addr min_base ) +{ +next_rmrr = i; +min_base = memory_map.map[i].addr; +} +} +return next_rmrr; +} Considering _both_ callers, I think the function should actually return the lowest RMRR higher than or equal to base. You mean instead of strictly greater than the base. Or wait - we actually need to find the lowest RMRR the _end_ of which is higher than base. Yes, you're right: there's always a risk that pci_mem_start will *start* in the middle of a range. Looking for the next *end* is more robust. Sorry this is not very clear to me so are you saying something like this? for ( i = 0; i memory_map.nr_map ; i++ ) { end = memory_map.map[i].addr + memory_map.map[i].size; if ( memory_map.map[i].type == E820_RESERVED end base memory_map.map[i].addr min_base ) { next_rmrr = i; min_base = memory_map.map[i].addr; } } Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters
For clarity: I am not acking this patch, primarily because I am not happy with the code in xlu_rdm_parse which is (a) the result of repeated clone-and-hack and (b) consists of ad-hoc string pointer fiddling. Yes, I knew you mentioned this previously but I also remember our last deal was something as follows: Really I would prefer that this parsing was done with a miniature flex parser, rather than ad-hoc pointer arithmetic and use of strtok. Sorry, could you show this explicitly? Something like what was done for disk devices. See libxlu_disk_l.l for an example. In this case your code would be a lot less complicated than what you see there. After the codefreeze I would probably have some time to write it for Sounds yourself would do this so currently I just keep the original, right? Thanks Tiejun you. (I think that would be valuable because libxlu_disk_l.l is a very complicated example, and I want be able to point future submitters at something simpler.) Ian. Then I didn't receive any response again so I thought yourself made this promise. Thanks Tiejun If I had been able to review this patch earlier in the release cycle I would be explictly nacking this patch. It is true that maybe someone will have some time to clean this up later; but in practice it often turns out that they don't - which is why we usually try not to accept patches on the basis of promises to do further cleanup. However, I am late to this party. I first made this complaint in response to v7 on the 9th of July. Under the circumstances I am going to stand aside and neither ack nor nack this patch. The rules then say that the patch may be committed given that it has Wei's ack as a tools maintainer. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM
+int libxl__domain_device_construct_rdm(libxl__gc *gc, + libxl_domain_config *d_config, + uint64_t rdm_mem_boundary, + struct xc_hvm_build_args *args) +{ ... +/* Query all RDM entries in this platform */ +if (strategy == LIBXL_RDM_RESERVE_STRATEGY_HOST) { ... +} else { +d_config-num_rdms = 0; +} Does this not override the domain configuration's num_rdms ? I don't We don't have the specific num_rdms parameter in .cfg so I don't understand what you mean here. The domain configuration specified to libxl might contain some rdms. Then num_rdms in the incoming config would be nonzero. We never set d_config-num_rdms/d_config-rdms before we goes inside libxl__domain_device_construct_rdm(). And actually libxl__domain_device_construct_rdm is only one place to set d_config-num_rdms/d_config-rdms. I guess this line make you or other guys confused so lets delete this line directly. And if you still worry about something, I can add assert() at the beginning of this function like this, assert(!d_config-num_rdms !d_config-rdms). Thanks Tiejun So I think there are two problems here: 1. If that were the case you would leak the application's rdms array. 2. Anyway, if the caller specifies such an array you should use it. (Fixing this would avoid (1) in any case.) Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs
Okay, I regenerate this patch online. And I just hope its good to be acked here: Provided it also works, Reviewed-by: Jan Beulich jbeul...@suse.com Why is this marked as Acked-by this time? The same question is raised to another hvmloader patch as well. This really makes me confused since you're the key maintainer associated to this, and I remember you also gave me Acked-by to the first hvmloader patch. I know this solution is always argued, so does this mean you still don't think this is good to go in the tree in your perspective, so you want to leave this Acked-by to other maintainers, right? And what about patch #7, hvmloader/e820: construct guest e820 table, is this also not fine to you? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v9][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
--- v9: * A little improvement to code implementation but again, its still argued about this solution. And as said in reply to George's reply to v8 - the alternative he proposed is still better than this one, and would therefore have better chances of me agreeing to take what is there instead of pushing for a proper solution. Looks I just need to pick up George' patch as our solution at least to eliminate all arguments in current cycle. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Requesting for freeze exception for RMRR
My main disagreement here continues to be that we're talking about a bug fix, and hence I don't view this as needing a freeze exception in the first place (at least not at this point in time). Yes, the bug fix involves adding code that looks like a new feature, but that happens with bug fixes. Fine then. I'm not going to argue feature vs bug fix at this stage. The final resolution is still the same. Tiejun can continue working on this next week. Wei and Jan, Really thanks for your clarification to this case. Looks two key problems should be addressed as follows: #1. mmio conflicting with RDM As Jan suggested George's patch is fine in this phrase. #2. construct e820 I need to understand what Jan comments properly than resend a patch. I'm going to finalize these things next week as early as possible. Again, thanks for all guys's help and I can't walk here without any your guides. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
On 2015/7/17 18:50, Jan Beulich wrote: On 17.07.15 at 11:09, tiejun.c...@intel.com wrote: And then of course there's the question of whether nr is really the right upper loop bound here: Just prior to this you added the hypervisor supplied entries - why would you need to iterate over them here? I.e. I'd see this better be moved ahead of that other code. Sounds you mean I should sync low/high memory in memory_map.map[] beforehand and then fulfill e820 like this, Why would you want/need to sync into memory_map.map[]? But actually I just felt this make our process clear. That's certainly not what I suggested. Do you mean I should check low/high mem before we add the hypervisor supplied entries like this? +for ( i = nr-1; i memory_map.nr_map; i-- ) +{ +uint64_t end = e820[i].addr + e820[i].size; + +if ( e820[i].type == E820_RAM + low_mem_end e820[i].addr low_mem_end end ) +{ +add_high_mem = end - low_mem_end; +e820[i].size = low_mem_end - e820[i].addr; +break; +} +} + +/* And then we also need to adjust highmem. */ +if ( add_high_mem ) +{ +/* Modify the existing highmem region if it exists. */ +for ( i = nr-1 ; i memory_map.nr_map; i-- ) +{ +if ( e820[i].type == E820_RAM + e820[i].addr == ((uint64_t)1 32)) +{ +e820[i].size += add_high_mem; +break; +} +} + +/* If there was no highmem region, just create one. */ +if ( i == memory_map.nr_map ) +{ +e820[nr].addr = ((uint64_t)1 32); +e820[nr].size = add_high_mem; +e820[nr].type = E820_RAM; +i = nr; +nr++; +} + +/* A sanity check if high memory is broken. */ +BUG_ON( high_mem_end != e820[i].addr + e820[i].size); +} Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Requesting for freeze exception for RMRR
The PCI allocation code is in a state, but it was in a similarly bad state before. I agree with Jan's point of the risk that these new changes cause a regression in booting guests, although we can mitigate that somewhat by testing. I feel at this point that we shouldn't block the RMRR bugfix on also fixing the PCI allocation algorithm (which was a pre-existing issue). Therefore, I recommend that v9 gets respun to v10 to address the current comments, and accepted. Afterwards, the PCI allocation algorithm gets worked on as a bugfix activity, to pro actively cater for the risk of regression. If you have any good solution to fix the PCI allocation algorithm completely, things couldn't be better :) Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
The way it's written I take it that you assume there to be exactly one region that the adjustment needs to be done for. Iirc this is correct with the current model, but why would you continue the loop then afterwards? Placing a break in the if()'s body would document the fact that only one such region should exist, and would eliminate questions as to whether add_high_mem shouldn't be updated (+=) instead of simply being assigned a new value. Yes, break should be added here. And then of course there's the question of whether nr is really the right upper loop bound here: Just prior to this you added the hypervisor supplied entries - why would you need to iterate over them here? I.e. I'd see this better be moved ahead of that other code. Sounds you mean I should sync low/high memory in memory_map.map[] beforehand and then fulfill e820 like this, diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c index 7a414ab..b0aa48d 100644 --- a/tools/firmware/hvmloader/e820.c +++ b/tools/firmware/hvmloader/e820.c @@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820, unsigned int lowmem_reserved_base, unsigned int bios_image_base) { -unsigned int nr = 0; +unsigned int nr = 0, i, j; +uint32_t low_mem_end = hvm_info-low_mem_pgend PAGE_SHIFT; +uint32_t add_high_mem = 0; +uint64_t high_mem_end = (uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT; +uint64_t map_start, map_size, map_end; if ( !lowmem_reserved_base ) lowmem_reserved_base = 0xA; @@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820, e820[nr].type = E820_RESERVED; nr++; -/* Low RAM goes here. Reserve space for special pages. */ -BUG_ON((hvm_info-low_mem_pgend PAGE_SHIFT) (2u 20)); -e820[nr].addr = 0x10; -e820[nr].size = (hvm_info-low_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; -nr++; - /* * Explicitly reserve space for special pages. * This space starts at RESERVED_MEMBASE an extends to cover various @@ -191,16 +188,101 @@ int build_e820_table(struct e820entry *e820, nr++; } +/* Low RAM goes here. Reserve space for special pages. */ +BUG_ON(low_mem_end (2u 20)); -if ( hvm_info-high_mem_pgend ) +/* + * Construct E820 table according to recorded memory map. + * + * The memory map created by toolstack may include, + * + * #1. Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * + * #2. Reserved regions if they exist + * + * #3. High memory region if it exists + * + * Note we just have one low memory entry and one high mmeory entry if + * exists. + * + * But we may have relocated RAM to allocate sufficient MMIO previously + * so low_mem_pgend would be changed over there. And here memory_map[] + * records the original low/high memory, so if low_mem_end is less than + * the original we need to revise low/high memory range firstly. + */ +for ( i = 0; i memory_map.nr_map; i++ ) { -e820[nr].addr = ((uint64_t)1 32); -e820[nr].size = -((uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; +map_start = memory_map.map[i].addr; +map_size = memory_map.map[i].size; +map_end = map_start + map_end; + +/* If we need to adjust lowmem. */ +if ( memory_map.map[i].type == E820_RAM + low_mem_end map_start low_mem_end map_end ) +{ +add_high_mem = map_end - low_mem_end; +memory_map.map[i].size = low_mem_end - map_start; +break; +} +} + +/* If we need to adjust highmem. */ +if ( add_high_mem ) +{ +/* Modify the existing highmem region if it exists. */ +for ( i = 0; i memory_map.nr_map; i++ ) +{ +map_start = memory_map.map[i].addr; +map_size = memory_map.map[i].size; +map_end = map_start + map_end; + +if ( memory_map.map[i].type == E820_RAM + map_start == ((uint64_t)1 32)) +{ +memory_map.map[i].size += add_high_mem; +break; +} +} + +/* If there was no highmem region, just create one. */ +if ( i == memory_map.nr_map ) +{ +memory_map.map[i].addr = ((uint64_t)1 32); +memory_map.map[i].size = add_high_mem; +memory_map.map[i].type = E820_RAM; +} + +/* A sanity check if high memory is broken. */ +BUG_ON( high_mem_end != +memory_map.map[i].addr + memory_map.map[i].size); +} + +/* Now fulfill e820. */ +for ( i = 0; i memory_map.nr_map; i++ )
Re: [Xen-devel] Requesting for freeze exception for RMRR
I think Andrew means you (or someone else) improves that algorithm later. No need to provide a perfect solution next week. Yes, I understand what he mean. But I still want to further ask if he have such a good idea right now, maybe we can try to address that directly :) Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
+for ( i = nr-1; i memory_map.nr_map; i-- ) Before you add memory_map.nr_map, you should be able to iterate from 0 to (not inclusive) nr. At least as far as I recall the original patch. Sorry, I really don't understand what you want. Before we add memory_map.nr_map, e820[0, nr) don't include low/high memory, right? So sounds you want me to for ( i = 0 i memory_map.nr_map; i++ ) { if we need to adjust low memory, we just set final low e820 entry; if we need to adjust high memory, we just set final high e820 entry; } Right? But its impossible to do this since we can't assume memory_map.map[low memory] is always prior to memory_map.map[high memory]. If I still follow your way, please don't mind to show a pseudocode help me understand what you want. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
Remind me again please - what prevents the highmem region from colliding with hypervisor supplied entries? Also, what if the resulting region exceeds the addressable range (guest's view of CPUID[8008].EAX[0:7])? Any idea to this? I think this issue also exists previously. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table
On 2015/7/18 0:06, Jan Beulich wrote: On 17.07.15 at 17:54, tiejun.c...@intel.com wrote: +for ( i = nr-1; i memory_map.nr_map; i-- ) Before you add memory_map.nr_map, you should be able to iterate from 0 to (not inclusive) nr. At least as far as I recall the original patch. Sorry, I really don't understand what you want. Before we add memory_map.nr_map, e820[0, nr) don't include low/high memory, right? Why? memory_map is representing the reserved areas only, isn't it? + * The memory map created by toolstack may include, + * + * #1. Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * + * #2. Reserved regions if they exist + * + * #3. High memory region if it exists Am I wrong with your expectation? Thanks Tiejun If that's not the case, then of course everything is fine. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/16 17:40, George Dunlap wrote: On Thu, Jul 16, 2015 at 3:05 AM, Chen, Tiejun tiejun.c...@intel.com wrote: Could you take a look at the original patch #06 ? Although Jan thought that is complicated, that is really one version that I can refine in current time slot. When you say original, which version are you talking about? You mean the one at the base of this thread (v7)? Yes, I'm pointing patch #6 in v7. And sorry to make this confused to you. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy
On 2015/7/16 15:40, Jan Beulich wrote: On 16.07.15 at 08:52, tiejun.c...@intel.com wrote: @@ -1577,9 +1578,15 @@ int iommu_do_pci_domctl( seg = machine_sbdf 16; bus = PCI_BUS(machine_sbdf); devfn = PCI_DEVFN2(machine_sbdf); +flag = domctl-u.assign_device.flag; +if ( flag XEN_DOMCTL_DEV_RDM_RELAXED ) Didn't we settle on flag ~XEN_DOMCTL_DEV_RDM_RELAXED? Sorry its my fault to miss this merge. BTW, could I resend this patch separately to get your Ack? If you don't have other objections. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 00/16] Fix RMRR
On 2015/7/16 15:55, Jan Beulich wrote: On 10.07.15 at 16:50, george.dun...@eu.citrix.com wrote: On Thu, Jul 9, 2015 at 6:33 AM, Tiejun Chen tiejun.c...@intel.com wrote: v7: It looks like most of the libxl/libxc patches have been acked. It seems to me that most of the hypervisor patches (1-3, 14-15) are either ready to go in or pretty close. Now that I looked over v8 I have to admit that if I was a tools maintainer I wouldn't want to see some of the tools patches in with just an ack, but without any review. I'm somewhat confused at this point. Acked-by: is often used by the maintainer of the affected code when that maintainer neither contributed to nor forwarded the patch. It is a record that the acker has at least reviewed the patch and has indicated acceptance. Does this imply this is already reviewed? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 00/16] Fix RMRR
It looks like most of the libxl/libxc patches have been acked. It seems to me that most of the hypervisor patches (1-3, 14-15) are either ready to go in or pretty close. Now that I looked over v8 I have to admit that if I was a tools maintainer I wouldn't want to see some of the tools patches in with just an ack, but without any review. I'm somewhat confused at this point. Acked-by: is often used by the maintainer of the affected code when that maintainer neither contributed to nor forwarded the patch. It is a record that the acker has at least reviewed the patch and has indicated acceptance. Does this imply this is already reviewed? No, that would be expressed by Reviewed-by. Acked-by merely means no objection by the maintainer for the change to go in. Sorry I'm trying to dig into this. If nobody would like to take a look at this, so isn't this the associated maintainer's responsibility to review finally? In this case isn't Acked-by fine enough? Or you still want to us add two lines explicitly, Reviewed-by: A Acked-by: A Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
+/* + * And then we also need to adjust highmem. + */ +if ( add_high_mem ) +{ +for ( i = 0; i nr; i++ ) +{ +if ( e820[i].type == E820_RAM + e820[i].addr == (1ull 32)) +{ +e820[i].size += add_high_mem; +add_high_mem = 0; +break; +} +} +} + +/* Or this is just populated by hvmloader itself. */ This should probably say something like: If there was no highmem region, we need to create one. Okay, If there was no highmem entry, we need to create one. +if ( add_high_mem ) +{ +/* + * hvmloader should always update hvm_info-high_mem_pgend + * when it relocates RAM anywhere. + */ +BUG_ON( !hvm_info-high_mem_pgend ); + e820[nr].addr = ((uint64_t)1 32); e820[nr].size = ((uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT) - e820[nr].addr; In theory add_high_mem and hvm_info-high_mem_pgend PAGE_SHIFT - 4GiB is the same, but it seems like asking for trouble to assume so No, its not true in the first if( add_high_mem ) conditional. Before we enter hvmloader, there are two cases: #1. hvm_info-high_mem_pgend == 0 So we wouldn't have a highmem entry in e820. But hvmloader may relocate RAM upward highmem (add_high_mem) to get sufficient mmio, so hvm_info-high_mem_pgend is expanded somewhere (4GiB + add_high_mem). Then we would fall into the second if( add_high_mem ) conditional. #2. hvm_info-high_mem_pgend != 0 We always walk into the first if( add_high_mem ) conditional. But here add_high_mem just represents that highmem section expanded by hvmloader, its really not the whole higmem:(hvm_info-high_mem_pgend PAGE_SHIFT - 4GiB). Thanks Tiejun without checking. Perhaps in the first if( add_high_mem ) conditional, you can BUG_ON(add_high_mem != ((hvm_info-high_mem_pgend PAGE_SHIFT) - (ull1 32))) ? Other than that, this looks good, thanks. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
Except that this isn't valid C (no statement following the label). I can accept goto-s for some error handling cases where the alternatives might be considered even more ugly than using goto. But the way this or your original proposal look, I'd rather not have goto-s used like this. What about this? +bool is_conflict = false; for ( devfn = 0; devfn 256; devfn++ ) { @@ -60,7 +61,7 @@ static void disable_conflicting_devices(void) continue; /* Check all bars */ -for ( bar = 0; bar 7; bar++ ) +for ( bar = 0; bar 7 !is_conflict; bar++ ) { bar_reg = PCI_BASE_ADDRESS_0 + 4*bar; if ( bar == 6 ) @@ -89,7 +90,7 @@ static void disable_conflicting_devices(void) bar_sz = pci_readl(devfn, bar_reg); bar_sz = PCI_BASE_ADDRESS_MEM_MASK; -for ( i = 0; i memory_map.nr_map ; i++ ) +for ( i = 0; i memory_map.nr_map !is_conflict; i++ ) { if ( memory_map.map[i].type == E820_RESERVED ) { @@ -105,13 +106,13 @@ static void disable_conflicting_devices(void) devfn3, devfn7, bar_reg, bar_data); cmd = pci_readw(devfn, PCI_COMMAND); pci_writew(devfn, PCI_COMMAND, ~cmd); -/* Jump next device. */ -goto check_next_device; +/* So need to jump next device. */ +is_conflict = true; } } } } - check_next_device: +is_conflict = false; } } Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
+for ( i = 0; i memory_map.nr_map ; i++ ) +{ +if ( memory_map.map[i].type != E820_RAM ) Here we're assuming that any region not marked as RAM is an RMRR. Is that true? In any case, it would be just as strange to have a device BAR overlap with guest RAM as with an RMRR, wouldn't it? OOPS! Actually I should take this, if ( memory_map.map[i].type == E820_RESERVED ) This is same as when I check [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END). +{ +uint64_t reserved_start, reserved_size; +reserved_start = memory_map.map[i].addr; +reserved_size = memory_map.map[i].size; +if ( check_overlap(bar_data , bar_sz, + reserved_start, reserved_size) ) +{ +is_conflict = true; +/* Now disable the memory or I/O mapping. */ +printf(pci dev %02x:%x bar %02x : 0x%08x : conflicts + reserved resource so disable this device.!\n, + devfn3, devfn7, bar_reg, bar_data); +cmd = pci_readw(devfn, PCI_COMMAND); +pci_writew(devfn, PCI_COMMAND, ~cmd); +break; +} +} + +/* Jump next device. */ +if ( is_conflict ) +{ +is_conflict = false; +break; +} This conditional is still inside the memory_map loop; you want it one loop futher out, in the bar loop, don't you? Here what I intended to do is if one of all bars specific to one given device already conflicts with RDM, its not necessary to continue check other remaining bars of this device and other RDM regions, we just disable this device simply then check next device. Also, if you declare is_conflict inside the devfn loop, rather than in the main function, then you don't need this is_conflict=false here. It might also be more sensible to use a goto instead; but this is one This can work for me so it may be as follows: for ( devfn = 0; devfn 256; devfn++ ) { check_next_device: vendor_id = pci_readw(devfn, PCI_VENDOR_ID); device_id = pci_readw(devfn, PCI_DEVICE_ID); if ( (vendor_id == 0x) (device_id == 0x) ) continue; ... if ( check_overlap(bar_data , bar_sz, reserved_start, reserved_size) ) { ... /* Jump next device. */ devfn++; goto check_next_device; } where Jan will have a better idea what standard practice will be. I can follow that again if Jan has any good implementation. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
Yes, sorry, add_high_mem will be the size of memory *relocated*, not the actual end of it (unless, as you say, the original highmem region didn't exist). What I really meant was that either way, after adjusting the highmem region in the e820, the end of that region should correspond to hvm_info-high_mem_pgend. What about something like this? --- /* * And then we also need to adjust highmem. */ if ( add_high_mem ) { /* * Modify the existing highmem region if it exists */ for ( i = 0; i nr; i++ ) { if ( e820[i].type == E820_RAM e820[i].addr == (1ull 32)) { e820[i].size += add_high_mem; break; } } /* * If we didn't find a highmem region, make one */ if ( i == nr ) { e820[nr].addr = ((uint64_t)1 32); e820[nr].size = e820[nr].addr + add_high_mem; e820[nr].type = E820_RAM; nr++; } /* * Either way, at this point i points to the entry containing * highmem. Compare it to what's in hvm_info as a sanity * check. */ BUG_ON(e820[i].addr+e820[i].size != ((uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT)); } Looks really better. I just introduce a little change based on yours, and I post this as a whole, diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c index 7a414ab..8c9b01f 100644 --- a/tools/firmware/hvmloader/e820.c +++ b/tools/firmware/hvmloader/e820.c @@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820, unsigned int lowmem_reserved_base, unsigned int bios_image_base) { -unsigned int nr = 0; +unsigned int nr = 0, i, j; +uint32_t low_mem_end = hvm_info-low_mem_pgend PAGE_SHIFT; +uint64_t high_mem_end = (uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT; +uint64_t add_high_mem = 0; if ( !lowmem_reserved_base ) lowmem_reserved_base = 0xA; @@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820, e820[nr].type = E820_RESERVED; nr++; -/* Low RAM goes here. Reserve space for special pages. */ -BUG_ON((hvm_info-low_mem_pgend PAGE_SHIFT) (2u 20)); -e820[nr].addr = 0x10; -e820[nr].size = (hvm_info-low_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; -nr++; - /* * Explicitly reserve space for special pages. * This space starts at RESERVED_MEMBASE an extends to cover various @@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820, nr++; } - -if ( hvm_info-high_mem_pgend ) +/* + * Construct E820 table according to recorded memory map. + * + * The memory map created by toolstack may include, + * + * #1. Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * + * #2. Reserved regions if they exist + * + * #3. High memory region if it exists + */ +for ( i = 0; i memory_map.nr_map; i++ ) { -e820[nr].addr = ((uint64_t)1 32); -e820[nr].size = -((uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; +e820[nr] = memory_map.map[i]; nr++; } +/* Low RAM goes here. Reserve space for special pages. */ +BUG_ON(low_mem_end (2u 20)); + +/* + * Its possible to relocate RAM to allocate sufficient MMIO previously + * so low_mem_pgend would be changed over there. And here memory_map[] + * records the original low/high memory, so if low_mem_end is less than + * the original we need to revise low/high memory range in e820. + */ +for ( i = 0; i nr; i++ ) +{ +uint64_t end = e820[i].addr + e820[i].size; +if ( e820[i].type == E820_RAM + low_mem_end e820[i].addr low_mem_end end ) +{ +add_high_mem = end - low_mem_end; +e820[i].size = low_mem_end - e820[i].addr; +} +} + +/* + * And then we also need to adjust highmem. + */ +if ( add_high_mem ) +{ +/* Modify the existing highmem region if it exists. */ +for ( i = 0; i nr; i++ ) +{ +if ( e820[i].type == E820_RAM + e820[i].addr == ((uint64_t)1 32)) +{ +e820[i].size += add_high_mem; +break; +} +} + +/* If there was no highmem region, just create one. */ +if ( i == nr ) +{ +e820[nr].addr = ((uint64_t)1 32); +e820[nr].size = high_mem_end - e820[nr].addr; +e820[nr].type = E820_RAM; +
Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
What about this? Looks reasonable (but don't forget that I continue to be unconvinced that the patch as a whole makes sense). Yes, I always keep this in my mind as I mentioned in patch #00. Any risk you're still concerning? Is it that case if guest OS force enabling these devices again? IMO, at this point there are two cases: #1. Without passing through a RMRR device Those emulated devices don't create 1:1 mapping so its safe, right? #2. With passing through a RMRR device This just probably cause these associated devices not to work well, but still don't bring any impact to other Domains, right? I mean this isn't going to worsen the preexisting situation. If I'm wrong please correct me. Thanks Tiejun Jan +bool is_conflict = false; for ( devfn = 0; devfn 256; devfn++ ) { @@ -60,7 +61,7 @@ static void disable_conflicting_devices(void) continue; /* Check all bars */ -for ( bar = 0; bar 7; bar++ ) +for ( bar = 0; bar 7 !is_conflict; bar++ ) { bar_reg = PCI_BASE_ADDRESS_0 + 4*bar; if ( bar == 6 ) @@ -89,7 +90,7 @@ static void disable_conflicting_devices(void) bar_sz = pci_readl(devfn, bar_reg); bar_sz = PCI_BASE_ADDRESS_MEM_MASK; -for ( i = 0; i memory_map.nr_map ; i++ ) +for ( i = 0; i memory_map.nr_map !is_conflict; i++ ) { if ( memory_map.map[i].type == E820_RESERVED ) { @@ -105,13 +106,13 @@ static void disable_conflicting_devices(void) devfn3, devfn7, bar_reg, bar_data); cmd = pci_readw(devfn, PCI_COMMAND); pci_writew(devfn, PCI_COMMAND, ~cmd); -/* Jump next device. */ -goto check_next_device; +/* So need to jump next device. */ +is_conflict = true; } } } } - check_next_device: +is_conflict = false; } } Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
On 2015/7/16 23:16, George Dunlap wrote: On 07/16/2015 04:04 PM, Chen, Tiejun wrote: Yes, sorry, add_high_mem will be the size of memory *relocated*, not the actual end of it (unless, as you say, the original highmem region didn't exist). What I really meant was that either way, after adjusting the highmem region in the e820, the end of that region should correspond to hvm_info-high_mem_pgend. What about something like this? --- /* * And then we also need to adjust highmem. */ if ( add_high_mem ) { /* * Modify the existing highmem region if it exists */ for ( i = 0; i nr; i++ ) { if ( e820[i].type == E820_RAM e820[i].addr == (1ull 32)) { e820[i].size += add_high_mem; break; } } /* * If we didn't find a highmem region, make one */ if ( i == nr ) { e820[nr].addr = ((uint64_t)1 32); e820[nr].size = e820[nr].addr + add_high_mem; e820[nr].type = E820_RAM; nr++; } /* * Either way, at this point i points to the entry containing * highmem. Compare it to what's in hvm_info as a sanity * check. */ BUG_ON(e820[i].addr+e820[i].size != ((uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT)); } Looks really better. I just introduce a little change based on yours, and I post this as a whole, diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c index 7a414ab..8c9b01f 100644 --- a/tools/firmware/hvmloader/e820.c +++ b/tools/firmware/hvmloader/e820.c @@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820, unsigned int lowmem_reserved_base, unsigned int bios_image_base) { -unsigned int nr = 0; +unsigned int nr = 0, i, j; +uint32_t low_mem_end = hvm_info-low_mem_pgend PAGE_SHIFT; +uint64_t high_mem_end = (uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT; +uint64_t add_high_mem = 0; if ( !lowmem_reserved_base ) lowmem_reserved_base = 0xA; @@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820, e820[nr].type = E820_RESERVED; nr++; -/* Low RAM goes here. Reserve space for special pages. */ -BUG_ON((hvm_info-low_mem_pgend PAGE_SHIFT) (2u 20)); -e820[nr].addr = 0x10; -e820[nr].size = (hvm_info-low_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; -nr++; - /* * Explicitly reserve space for special pages. * This space starts at RESERVED_MEMBASE an extends to cover various @@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820, nr++; } - -if ( hvm_info-high_mem_pgend ) +/* + * Construct E820 table according to recorded memory map. + * + * The memory map created by toolstack may include, + * + * #1. Low memory region + * + * Low RAM starts at least from 1M to make sure all standard regions + * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios, + * have enough space. + * + * #2. Reserved regions if they exist + * + * #3. High memory region if it exists + */ +for ( i = 0; i memory_map.nr_map; i++ ) { -e820[nr].addr = ((uint64_t)1 32); -e820[nr].size = -((uint64_t)hvm_info-high_mem_pgend PAGE_SHIFT) - e820[nr].addr; -e820[nr].type = E820_RAM; +e820[nr] = memory_map.map[i]; nr++; } +/* Low RAM goes here. Reserve space for special pages. */ +BUG_ON(low_mem_end (2u 20)); + +/* + * Its possible to relocate RAM to allocate sufficient MMIO previously + * so low_mem_pgend would be changed over there. And here memory_map[] + * records the original low/high memory, so if low_mem_end is less than + * the original we need to revise low/high memory range in e820. + */ +for ( i = 0; i nr; i++ ) +{ +uint64_t end = e820[i].addr + e820[i].size; +if ( e820[i].type == E820_RAM + low_mem_end e820[i].addr low_mem_end end ) +{ +add_high_mem = end - low_mem_end; +e820[i].size = low_mem_end - e820[i].addr; +} +} + +/* + * And then we also need to adjust highmem. + */ +if ( add_high_mem ) +{ +/* Modify the existing highmem region if it exists. */ +for ( i = 0; i nr; i++ ) +{ +if ( e820[i].type == E820_RAM + e820[i].addr == ((uint64_t)1 32)) +{ +e820[i].size += add_high_mem; +break; +} +} + +/* If there was no highmem region, just create one. */ +if ( i == nr ) +{ +e820[nr].addr
Re: [Xen-devel] [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table
Honestly I didn't try to change that point but maybe I'm missing something? Yes, you are missing something. :-) I told you exactly what I wanted changed and what I said could remain the same: By all means, calculate high_mem_end so it's easier to read. But then, when creating a new region, set e820[nr].size = add_high_mem, so that the BUG_ON() that follows actually checks something useful. Just to be clear, I want the second if() statement to look like this: +if ( i == nr ) +{ +e820[nr].addr = ((uint64_t)1 32); +e820[nr].size = add_high_mem; Ahh, when you're replying this, I also see this difference and realize what you meant. Sorry to this inconvenience and I'll sync this line into my tree :) Thanks Tiejun +e820[nr].type = E820_RAM; +nr++; +} Think about why and maybe that will help you understand what I'm talking about. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
Here what I intended to do is if one of all bars specific to one given device already conflicts with RDM, its not necessary to continue check other remaining bars of this device and other RDM regions, we just disable this device simply then check next device. I know what you're trying to do; what I'm saying is I don't think it does what you want it to do. You have loops nested 3 deep: 1. for each dev 2. for each bar 3. for each memory range This conditional is in loop 3; you want it to be in loop 2. (In fact, when you set is_conflict, you then break out of loop 3 back into loop 2; so this code will never actually be run.) Sorry I should make this clear last time. I mean I already knew what you were saying is right at this point so I tried to use goto to fix this bug. Also, if you declare is_conflict inside the devfn loop, rather than in the main function, then you don't need this is_conflict=false here. It might also be more sensible to use a goto instead; but this is one [snip] I'm not a fan of hard-coding the loop continuing condition like this; if I were going to do a goto, I'd want to go to the end of the loop. I guess something like this, ... pci_writew(devfn, PCI_COMMAND, ~cmd); /* Jump next device. */ goto check_next_device; } } } } check_next_device: } } Anyway, the code is OK as it is; I'd rather spend time working on something that's more of a blocker. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
On 2015/7/16 23:39, George Dunlap wrote: On 07/16/2015 04:20 PM, Chen, Tiejun wrote: What about this? Looks reasonable (but don't forget that I continue to be unconvinced that the patch as a whole makes sense). Yes, I always keep this in my mind as I mentioned in patch #00. Any risk you're still concerning? Is it that case if guest OS force enabling these devices again? IMO, at this point there are two cases: #1. Without passing through a RMRR device Those emulated devices don't create 1:1 mapping so its safe, right? #2. With passing through a RMRR device This just probably cause these associated devices not to work well, but still don't bring any impact to other Domains, right? I mean this isn't going to worsen the preexisting situation. If I'm wrong please correct me. But I think the issue is, without doing *something* about MMIO collisions, the feature as a whole is sort of pointless. You can carefully specify rdm=strategy=host,reserved=strict, but you might I got what your mean. But there's no a good approach to bridge between xl and hvmloader to follow this policy. Right now, maybe just one thing could be tried like this, Under hvmloader circumstance, strict - Still set RDM as E820_RESERVED relaxed - Set RDM as a new internal E820 flag like E820_HAZARDOUS Then in the case of MMIO collisions E820_RESERVED - BUG() - Stop VM E820_HAZARDOUS - our warning messages + disable devices I think this can make sure we always take consistent policy in each involved cycle. Thanks Tiejun still get devices whose MMIO regions conflict with RMMRs, and there's nothing you can really do about it. And although I personally think it might be possible / reasonable to check in a newly-written, partial MMIO collision avoidance patch, not everyone might agree. Even if I were to rewrite and post a patch myself, they may argue that doing such a complicated re-design after the feature freeze shouldn't be allowed. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
Jan and George, That implementation to this problem in v7 is really not accepted? Yes, that isn't a best solution to make our original mechanism very well, but in high level I just think that should be a correct solution fixing this problem. According to recent discussion seems we have not a efficient way which can be put into 4.6. So instead, could your guys help me make that better gradually to reach our current requirement as possible? Thanks Tiejun On 2015/7/17 0:40, George Dunlap wrote: On 07/16/2015 05:08 PM, Chen, Tiejun wrote: On 2015/7/16 23:39, George Dunlap wrote: On 07/16/2015 04:20 PM, Chen, Tiejun wrote: What about this? Looks reasonable (but don't forget that I continue to be unconvinced that the patch as a whole makes sense). Yes, I always keep this in my mind as I mentioned in patch #00. Any risk you're still concerning? Is it that case if guest OS force enabling these devices again? IMO, at this point there are two cases: #1. Without passing through a RMRR device Those emulated devices don't create 1:1 mapping so its safe, right? #2. With passing through a RMRR device This just probably cause these associated devices not to work well, but still don't bring any impact to other Domains, right? I mean this isn't going to worsen the preexisting situation. If I'm wrong please correct me. But I think the issue is, without doing *something* about MMIO collisions, the feature as a whole is sort of pointless. You can carefully specify rdm=strategy=host,reserved=strict, but you might I got what your mean. But there's no a good approach to bridge between xl and hvmloader to follow this policy. Right now, maybe just one thing could be tried like this, Under hvmloader circumstance, strict - Still set RDM as E820_RESERVED relaxed - Set RDM as a new internal E820 flag like E820_HAZARDOUS Then in the case of MMIO collisions E820_RESERVED - BUG() - Stop VM E820_HAZARDOUS - our warning messages + disable devices I think this can make sure we always take consistent policy in each involved cycle. A better way to communicate between xl and hvmloader is to use xenstore, as we do for allow_memory_reallocate. But I have very little hope we can hash out a suitable design for that by tomorrow. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v8][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm
base = (resource-base + bar_sz - 1) ~(uint64_t)(bar_sz - 1); + +/* If we're using mem_resource, check for RMRR conflicts */ +while ( resource == mem_resource +next_rmrr 0 +check_overlap(base, bar_sz, + memory_map.map[next_rmrr].addr, + memory_map.map[next_rmrr].size)) { +base = memory_map.map[next_rmrr].addr + memory_map.map[next_rmrr].size; +base = (resource-base + bar_sz - 1) ~(uint64_t)(bar_sz - 1); +next_rmrr=find_next_rmrr(base); +} + bar_data |= (uint32_t)base; bar_data_upper = (uint32_t)(base 32); base += bar_sz; Actually this chunk of codes are really similar as what we did in my previous revisions from RFC ~ v3. It's just trying to skip and then allocate, right? As Jan pointed out, there are two key problems: #1. All skipping action probably cause a result of no sufficient MMIO to allocate all devices as before. #2. Another is that alignment issue. When the original base change to align to rdm_end, some spaces are wasted. Especially, these spaces could be allocated to other smaller bars. This is one key reason why I had new revision started from v4 to address these two points :) Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Requesting for freeze exception for RMRR
On 2015/7/14 17:29, Wei Liu wrote: On Tue, Jul 14, 2015 at 09:27:17AM +0800, Chen, Tiejun wrote: Please work with maintainers to get those hvmloader patches acked or reviewed. I will do. Maybe I need to update current status today. I just send out v8: * All tools comments raised by Jackson and Campbell are addressed and I don't see further feedback. * On hv side, last one is reviewed by George. And looks Jan have no obvious objection to that. (Jan: If I'm wrong let me take it back. ) * On hvmloader side, there three patches now: patch #5 is reviewed by George and Acked by Jan; patch #6 is reviewed just for code implementation but that solution can't convince all maintainers. Honestly, this is a rare possibility of collision in real world and the original pci allocation is not good so its hard to reshape the original mechanism in one week. But actually we also have some other solutions but we need a little time to figure out how to step forward but I just think any solution wouldn't bring any change to other stuffs. patch #7 is pretty close to be Acked. So what's your final determination? Thanks Tiejun Note Jackson and Campbell also raised some comments to improve current codes. 2. explain why it needs to be in this release (benefits). RMRR mechanism was broken for a long time and this makes VM always face security issues. In addition, those associated devices can't be passed through to VM and even result in VM crashes. 3. explain why it doesn't break things (risks). Our policy makes sure that system will work in the original way by default as without the RMRR patches. And especially, this series just impacts those platforms which have RMRR. Your patches touch crucial path in hvmloader to build memory map. There is risk that it may break hvmloader even if it is turned off. I would need at least some positive test results from you to confirm if this feature is turned off everything works as before. Could you show what sort of test you need here? I just did boot a VM without any RDM parameters. I just think maybe someone had this good experience to check this. Yes, this is the basic test I need -- at least booting a VM with RDM off. Feel free to do more tests and report back if you have time. Wei. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/16 0:14, George Dunlap wrote: On Wed, Jul 15, 2015 at 2:56 PM, George Dunlap george.dun...@eu.citrix.com wrote: Would it be possible, on a collision, to have one last stab at allocating the BAR somewhere else, without relocating memory (or relocating any other BARs)? At very least then an administrator could work around this kind of thing by setting the mmio_hole larger in the domain config. If it's not possible to have this last-ditch relocation effort, then Could you take a look at the original patch #06 ? Although Jan thought that is complicated, that is really one version that I can refine in current time slot. yes, I'd be OK with just disabling the device for the time being. Just let me send out new patch series based this idea. We can continue discuss this over there but we also need to further review other remaining comments based on a new revision. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table
I think I would say: -- Now use the hypervisor-supplied memory map to build our final e820 table: * Add regions for BIOS ranges and other special mappings not in the hypervisor map * Add in the hypervisor regions * Adjust the lowmem and highmem regions if we've had to relocate memory (adding a highmem region if necessary) * Sort all the ranges so that they appear in memory order. -- I'll update this and thanks a lot. CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com CC: Andrew Cooper andrew.coop...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Stefano Stabellini stefano.stabell...@eu.citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Wei Liu wei.l...@citrix.com Signed-off-by: Tiejun Chen tiejun.c...@intel.com --- [snip] +/* Low RAM goes here. Reserve space for special pages. */ +BUG_ON(low_mem_end (2u 20)); Won't this BUG if the guest was actually given less than 2GiB of RAM? 2u 20 = 0x20, so this is 2M, not 2G :) + +/* + * We may need to adjust real lowmem end since we may + * populate RAM to get enough MMIO previously. + */ [snip] + +/* + * And then we also need to adjust highmem. + */ +if ( add_high_mem ) +{ +for ( i = 0; i memory_map.nr_map; i++ ) +{ +if ( e820[i].type == E820_RAM + e820[i].addr (1ull 32)) +e820[i].size += add_high_mem; +} +} What if there was originally no high memory, but resizing the pci hole meant we had to create a high memory region? You're right. We need to consider this case. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/15 16:34, Jan Beulich wrote: On 15.07.15 at 06:27, tiejun.c...@intel.com wrote: Furthermore, could we have this solution as follows? Yet more special casing code you want to add. I said no to this model, and unless you can address the issue _without_ adding a lot of special casing code, the answer will remain no (subject What about this? @@ -301,6 +301,19 @@ void pci_setup(void) pci_mem_start = 1; } +for ( i = 0; i memory_map.nr_map ; i++ ) +{ +uint64_t reserved_start, reserved_size; +reserved_start = memory_map.map[i].addr; +reserved_size = memory_map.map[i].size; +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start, + reserved_start, reserved_size) ) +{ +printf(Reserved device memory conflicts current PCI memory.\n); +BUG(); +} +} + if ( mmio_total (pci_mem_end - pci_mem_start) ) { printf(Low MMIO hole not large enough for all devices, This is very similar to our current policy to [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6 since actually this is also another rare possibility in real world. Even I can do this as well when we handle that conflict with [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6. Note its not necessary to concern high memory since we already handle this case in the hv code previously, and its also not affected by those relocated memory later since our previous policy can make sure RAM isn't overlapping with RDM. Thanks Tiejun to co-maintainers overriding me). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
This is very similar to our current policy to [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6 since actually this is also another rare possibility in real world. Even I can do this as well when we handle that conflict with [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END] in patch #6. Sorry, here is one typo, s/#6/#5 Thanks Tiejun Note its not necessary to concern high memory since we already handle this case in the hv code previously, and its also not affected by those relocated memory later since our previous policy can make sure RAM isn't overlapping with RDM. Thanks Tiejun to co-maintainers overriding me). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
Certainly appreciate your time. I didn't mean its wasting time at this point. I just want to express that its hard to implement that solution in one or two weeks to walking into 4.6 as an exception. Note I know this feature is still not accepted as an exception to 4.6 right now so I'm making an assumption. After all this is a bug fix (and would have been allowed into 4.5 had it been ready in time), so doesn't necessarily need a freeze exception (but of course the bar raises the later it gets). Rather Yes, this is not a bug fix again into 4.6. than rushing in something that's cumbersome to maintain, I'd much prefer this to be done properly. Indeed, we'd like to finalize this properly as you said. But apparently time is not sufficient to allow this happened. So I just suggest we can further seek the best solution in next phase. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
On 2015/7/15 19:05, George Dunlap wrote: On Wed, Jul 15, 2015 at 10:27 AM, Jan Beulich jbeul...@suse.com wrote: On 15.07.15 at 10:59, tiejun.c...@intel.com wrote: On 2015/7/15 16:34, Jan Beulich wrote: On 15.07.15 at 06:27, tiejun.c...@intel.com wrote: Furthermore, could we have this solution as follows? Yet more special casing code you want to add. I said no to this model, and unless you can address the issue _without_ adding a lot of special casing code, the answer will remain no (subject What about this? @@ -301,6 +301,19 @@ void pci_setup(void) pci_mem_start = 1; } +for ( i = 0; i memory_map.nr_map ; i++ ) +{ +uint64_t reserved_start, reserved_size; +reserved_start = memory_map.map[i].addr; +reserved_size = memory_map.map[i].size; +if ( check_overlap(pci_mem_start, pci_mem_end - pci_mem_start, + reserved_start, reserved_size) ) +{ +printf(Reserved device memory conflicts current PCI memory.\n); +BUG(); +} +} So what would the cure be if someone ran into this BUG() (other than removing the device associated with the conflicting RMRR)? Afaics such a guest would remain permanently unbootable, which of course is not an option. Is not booting worse than what we have now -- which is, booting successfully but (probably) having issues due to MMIO ranges overlapping RMRRs? Its really so rare possibility here since in the real world we didn't see any RMRR regions = 0xF000 (the default pci memory start.) And I already sent out a little better revision in that ensuing email so also please take a review if possible :) This patch series as a whole represents a lot of work and a lot of tangible improvements to the situation; and (unless the situation has changed) it's almost entirely acked apart from the MMIO placement part. If there is a simple way that we can change hvmloader so that most (or even many) VM/device combinations work properly for the 4.6 release, then I think it's worth considering. Current MMIO allocation mechanism is not good. So we really need to reshape that, but we'd better do this with some further discussion in next release :) Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges
Maybe I can move this chunk of codes downward those actual allocation to check if RDM conflicts with the final allocation, and then just disable those associated devices by writing PCI_COMMAND without BUG() like this draft code, And what would keep the guest from re-enabling the device? We can't but IMO, #1. We're already posting that warning messages. #2. Actually this is like this sort of case like, as you known, even in the native platform, some broken devices are also disabled by BIOS, right? So I think this is OS's responsibility or risk to force enabling such a broken device. #3. Its really rare possibility in real world. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel