Re: [Xen-devel] [PATCH] gdbsx: prefer privcmd character device
On Tue, Oct 31, 2017 at 03:25:39PM +, Wei Liu wrote:
> On Tue, Oct 31, 2017 at 10:20:11AM -0500, Doug Goldstein wrote:
> > Prefer using the character device over the proc file if the character
> > device exists.
> >
> > CC: Elena Ufimtseva <elena.ufimts...@oracle.com>
> > CC: Ian Jackson <ian.jack...@eu.citrix.com>
> > CC: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
> > CC: Wei Liu <wei.l...@citrix.com>
> > Signed-off-by: Doug Goldstein <car...@cardoe.com>
> > ---
> > So this was originally submitted with 9c89dc95201 and 7d418eab3b6 and
> > was rejected since the goal was to convert gdbsx to use libxc but that
> > hasn't happened. /dev/xen/privcmd should be preferred and this change
> > makes that happen. It would be nice if we landed this with the plan
> > to convert gdbsx happening when it happens.
>
> Oh well... I think this is fine.
>
> Elena has the final verdict.

I think this is fine.
I will look into the conversion and relevant discussions if I find them
and see what can be done.

Thanks!

Meanwhile,
Reviewed-by: Elena Ufimtseva <elena.ufimts...@oracle.com>

Elena

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
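For readers following the thread above, the "prefer the character device, fall back to the proc file" idea can be sketched roughly as below. This is not the gdbsx patch itself, which is not quoted in this message. The helper name sketch_open_first and the table sketch_privcmd_paths are made up for illustration, and the sketch falls back whenever open() fails rather than testing for existence first, which is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: try each path in order and return the first
 * descriptor that opens. On success, *chosen (if non-NULL) receives the
 * index of the path that was used. Returns -1 if no path could be opened.
 */
static int sketch_open_first(const char *const paths[], size_t n,
                             size_t *chosen)
{
    for (size_t i = 0; i < n; i++) {
        int fd = open(paths[i], O_RDWR);

        if (fd >= 0) {
            if (chosen)
                *chosen = i;
            return fd;
        }
    }

    return -1;
}

/* Candidate list: character device first, proc file as the fallback. */
static const char *const sketch_privcmd_paths[] = {
    "/dev/xen/privcmd",
    "/proc/xen/privcmd",
};

/* Hypothetical convenience wrapper around the table above. */
static int sketch_open_privcmd(void)
{
    return sketch_open_first(sketch_privcmd_paths,
                             sizeof(sketch_privcmd_paths) /
                             sizeof(sketch_privcmd_paths[0]),
                             NULL);
}
```

Keeping the candidates in an ordered table means the preference is expressed in one place; whether the real tool should also report which path it ended up using is left open here.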
Re: [Xen-devel] [PATCH v3 3/4] x86: remove has_hvm_container_{domain/vcpu}
On Fri, Mar 03, 2017 at 12:25:07PM +, Roger Pau Monne wrote: > It is now useless since PVHv1 is removed and PVHv2 is a HVM domain from Xen's > point of view. > > Signed-off-by: Roger Pau Monné <roger@citrix.com> > Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com> > Acked-by: Tim Deegan <t...@xen.org> > Reviewed-by: Kevin Tian <kevin.t...@intel.com> > Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com> > Acked-by: George Dunlap <george.dun...@citrix.com> > --- > Cc: Christoph Egger <cheg...@amazon.de> > Cc: Jan Beulich <jbeul...@suse.com> > Cc: Andrew Cooper <andrew.coop...@citrix.com> > Cc: Boris Ostrovsky <boris.ostrov...@oracle.com> > Cc: Suravee Suthikulpanit <suravee.suthikulpa...@amd.com> > Cc: Jun Nakajima <jun.nakaj...@intel.com> > Cc: Kevin Tian <kevin.t...@intel.com> > Cc: Elena Ufimtseva <elena.ufimts...@oracle.com> Hmm, I dont see the code I should ACK. But here you go! Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com> > Cc: George Dunlap <george.dun...@eu.citrix.com> > Cc: Tim Deegan <t...@xen.org> > Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> > --- > xen/arch/x86/cpu/mcheck/vmce.c | 6 +++--- > xen/arch/x86/cpu/vpmu.c | 4 ++-- > xen/arch/x86/cpu/vpmu_amd.c | 12 ++-- > xen/arch/x86/cpu/vpmu_intel.c | 31 +++ > xen/arch/x86/cpuid.c| 6 +++--- > xen/arch/x86/debug.c| 2 +- > xen/arch/x86/domain.c | 28 ++-- > xen/arch/x86/domain_build.c | 5 ++--- > xen/arch/x86/domctl.c | 2 +- > xen/arch/x86/hvm/dm.c | 2 +- > xen/arch/x86/hvm/hvm.c | 6 +++--- > xen/arch/x86/hvm/irq.c | 2 +- > xen/arch/x86/hvm/mtrr.c | 2 +- > xen/arch/x86/hvm/vmsi.c | 3 +-- > xen/arch/x86/hvm/vmx/vmcs.c | 4 ++-- > xen/arch/x86/hvm/vmx/vmx.c | 4 ++-- > xen/arch/x86/mm.c | 4 ++-- > xen/arch/x86/mm/paging.c| 2 +- > xen/arch/x86/mm/shadow/common.c | 9 - > xen/arch/x86/setup.c| 2 +- > xen/arch/x86/time.c | 11 +-- > xen/arch/x86/traps.c| 4 ++-- > xen/arch/x86/x86_64/traps.c | 4 ++-- > xen/drivers/passthrough/x86/iommu.c | 2 +- > xen/include/asm-x86/domain.h| 2 +- > 
xen/include/asm-x86/event.h | 2 +- > xen/include/asm-x86/guest_access.h | 12 ++-- > xen/include/asm-x86/hvm/hvm.h | 2 +- > xen/include/xen/sched.h | 2 -- > xen/include/xen/tmem_xen.h | 5 ++--- > 30 files changed, 87 insertions(+), 95 deletions(-) > > diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c > index 8b727b4..6fb7833 100644 > --- a/xen/arch/x86/cpu/mcheck/vmce.c > +++ b/xen/arch/x86/cpu/mcheck/vmce.c > @@ -82,7 +82,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct > hvm_vmce_vcpu *ctxt) > { > dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA capabilities" > " %#" PRIx64 " for %pv (supported: %#Lx)\n", > -has_hvm_container_vcpu(v) ? "HVM" : "PV", ctxt->caps, > +is_hvm_vcpu(v) ? "HVM" : "PV", ctxt->caps, > v, guest_mcg_cap & ~MCG_CAP_COUNT); > return -EPERM; > } > @@ -364,7 +364,7 @@ int inject_vmce(struct domain *d, int vcpu) > if ( !v->is_initialised ) > continue; > > -if ( (has_hvm_container_domain(d) || > +if ( (is_hvm_domain(d) || >guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) && > !test_and_set_bool(v->mce_pending) ) > { > @@ -444,7 +444,7 @@ int unmmap_broken_page(struct domain *d, mfn_t mfn, > unsigned long gfn) > if ( !mfn_valid(mfn) ) > return -EINVAL; > > -if ( !has_hvm_container_domain(d) || !paging_mode_hap(d) ) > +if ( !is_hvm_domain(d) || !paging_mode_hap(d) ) > return -EOPNOTSUPP; > > rc = -1; > diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c > index a1e9f00..03401fd 100644 > --- a/xen/arch/x86/cpu/vpmu.c > +++ b/xen/arch/x86/cpu/vpmu.c > @@ -237,7 +237,7 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs) > vpmu->arch_vpmu_ops->arch_vpmu_save(sampling, 1); > vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); > > -if ( has_hvm_container_vcpu(sampled) ) > +if ( is_hvm_v
Re: [Xen-devel] [PATCH v3 2/4] x86: remove PVHv1 code
On Fri, Mar 03, 2017 at 12:25:06PM +, Roger Pau Monne wrote: > This removal applies to both the hypervisor and the toolstack side of PVHv1. > > Note that on the toolstack side a new PVH domain type is introduced to libxl. > The "none" device model version is removed, together with the "pvh" field in > the create info struct (the defines announcing those features are also removed > from libxl.h). The canonical way to create a PVH guest in libxl is to add > "pvh=1" to the guest config file. > > Signed-off-by: Roger Pau Monné <roger@citrix.com> > Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com> > Acked-by: George Dunlap <george.dun...@citrix.com> > Reviewed-by: Paul Durrant <paul.durr...@citrix.com> > Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com> > Reviewed-by: Kevin Tian <kevin.t...@intel.com> > --- > Changes since v1: > - Remove dom0pvh option from the command line docs. > - Bump domctl interface version due to the removed CDF flag. > - Introduce LIBXL_DOMAIN_TYPE_PVH. > - Remove "none" from the valid device model version options. > - Update the xl.cfg(5) man page to reflect the changes. 
> For gdbsx bits: Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com> > --- > Cc: Ian Jackson <ian.jack...@eu.citrix.com> > Cc: Wei Liu <wei.l...@citrix.com> > Cc: Elena Ufimtseva <elena.ufimts...@oracle.com> > Cc: Jan Beulich <jbeul...@suse.com> > Cc: Andrew Cooper <andrew.coop...@citrix.com> > Cc: Paul Durrant <paul.durr...@citrix.com> > Cc: Jun Nakajima <jun.nakaj...@intel.com> > Cc: Kevin Tian <kevin.t...@intel.com> > Cc: George Dunlap <george.dun...@eu.citrix.com> > Cc: Razvan Cojocaru <rcojoc...@bitdefender.com> > Cc: Tamas K Lengyel <ta...@tklengyel.com> > --- > docs/man/xl.cfg.pod.5.in| 16 +- > docs/misc/pvh-readme.txt| 63 > docs/misc/xen-command-line.markdown | 7 - > tools/debugger/gdbsx/xg/xg_main.c | 4 +- > tools/libxc/include/xc_dom.h| 1 - > tools/libxc/include/xenctrl.h | 2 +- > tools/libxc/xc_cpuid_x86.c | 13 +- > tools/libxc/xc_dom_core.c | 9 -- > tools/libxc/xc_dom_x86.c| 49 +++--- > tools/libxc/xc_domain.c | 1 - > tools/libxl/libxl.h | 22 +-- > tools/libxl/libxl_console.c | 1 + > tools/libxl/libxl_create.c | 64 +++- > tools/libxl/libxl_disk.c| 10 +- > tools/libxl/libxl_dm.c | 2 + > tools/libxl/libxl_dom.c | 86 ++- > tools/libxl/libxl_dom_save.c| 7 +- > tools/libxl/libxl_dom_suspend.c | 4 +- > tools/libxl/libxl_domain.c | 18 +-- > tools/libxl/libxl_internal.h| 1 - > tools/libxl/libxl_mem.c | 1 + > tools/libxl/libxl_nic.c | 7 +- > tools/libxl/libxl_pci.c | 9 +- > tools/libxl/libxl_stream_read.c | 8 +- > tools/libxl/libxl_stream_write.c| 14 +- > tools/libxl/libxl_types.idl | 115 --- > tools/libxl/libxl_usb.c | 4 +- > tools/libxl/libxl_x86.c | 31 ++-- > tools/libxl/libxl_x86_acpi.c| 3 +- > tools/xl/xl_parse.c | 8 +- > xen/arch/x86/cpu/vpmu.c | 3 +- > xen/arch/x86/domain.c | 42 +- > xen/arch/x86/domain_build.c | 287 > +--- > xen/arch/x86/domctl.c | 7 +- > xen/arch/x86/hvm/hvm.c | 81 +- > xen/arch/x86/hvm/hypercall.c| 4 +- > xen/arch/x86/hvm/io.c | 2 - > xen/arch/x86/hvm/ioreq.c| 3 +- > xen/arch/x86/hvm/irq.c | 3 - > xen/arch/x86/hvm/vmx/vmcs.c | 35 + 
> xen/arch/x86/hvm/vmx/vmx.c | 12 +- > xen/arch/x86/mm.c | 2 +- > xen/arch/x86/mm/p2m-pt.c| 2 +- > xen/arch/x86/mm/p2m.c | 6 +- > xen/arch/x86/physdev.c | 8 - > xen/arch/x86/setup.c| 7 - > xen/arch/x86/time.c | 27 > xen/common/domain.c | 2 - > xen/common/domctl.c | 10 -- > xen/common/kernel.c | 5 - > xen/common/vm_event.c | 8 +- > xen/include/asm-x86/domain.h| 1 - > xen/include/asm-x86/hvm/hvm.h | 3 - > xen/include/public/domctl.h | 14 +- > xen/include/xen/sched.h | 9 +- > 55 files changed, 252 insertions(+), 911 deletions(-) > delete mode 100644 docs/misc/pvh-readme.txt > > diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in > index 505c111..8e4eb97 100644 > --- a/docs/man/xl.cfg.pod.5.in > +++ b/docs/man/xl.cfg.pod.5.in > @@ -1064,6 +1064,13 @@ FIFO-based event channel ABI support up to 131,071 > event channels. > Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit > x86). > > +=item B
Re: [Xen-devel] [PATCH v2 1/3] x86: remove PVHv1 code
On Tue, Feb 28, 2017 at 05:39:39PM +, Roger Pau Monne wrote: > This removal applies to both the hypervisor and the toolstack side of PVHv1. > > Note that on the toolstack side there's one hiccup: on xl the "pvh" > configuration option is translated to builder="hvm", > device_model_version="none". This is done because otherwise xl would start > parsing PV like options, and filling the PV struct at libxl_domain_build_info > (which in turn pollutes the HVM one because it's a union). > > Signed-off-by: Roger Pau Monné <roger@citrix.com> gdbsx bits: Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com> > --- > Changes since v1: > - Remove dom0pvh option from the command line docs. > - Bump domctl interface version due to the removed CDF flag. > > --- > Cc: Ian Jackson <ian.jack...@eu.citrix.com> > Cc: Wei Liu <wei.l...@citrix.com> > Cc: Elena Ufimtseva <elena.ufimts...@oracle.com> > Cc: Jan Beulich <jbeul...@suse.com> > Cc: Andrew Cooper <andrew.coop...@citrix.com> > Cc: Paul Durrant <paul.durr...@citrix.com> > Cc: Jun Nakajima <jun.nakaj...@intel.com> > Cc: Kevin Tian <kevin.t...@intel.com> > Cc: George Dunlap <george.dun...@eu.citrix.com> > Cc: Razvan Cojocaru <rcojoc...@bitdefender.com> > Cc: Tamas K Lengyel <ta...@tklengyel.com> > --- > docs/man/xl.cfg.pod.5.in| 10 +- > docs/misc/pvh-readme.txt| 63 > docs/misc/xen-command-line.markdown | 7 - > tools/debugger/gdbsx/xg/xg_main.c | 4 +- > tools/libxc/include/xc_dom.h| 1 - > tools/libxc/include/xenctrl.h | 2 +- > tools/libxc/xc_cpuid_x86.c | 13 +- > tools/libxc/xc_dom_core.c | 9 -- > tools/libxc/xc_dom_x86.c| 49 +++--- > tools/libxc/xc_domain.c | 1 - > tools/libxl/libxl_create.c | 31 ++-- > tools/libxl/libxl_dom.c | 1 - > tools/libxl/libxl_internal.h| 1 - > tools/libxl/libxl_x86.c | 7 +- > tools/xl/xl_parse.c | 10 +- > xen/arch/x86/cpu/vpmu.c | 3 +- > xen/arch/x86/domain.c | 42 +- > xen/arch/x86/domain_build.c | 287 > +--- > xen/arch/x86/domctl.c | 7 +- > xen/arch/x86/hvm/hvm.c | 81 +- > xen/arch/x86/hvm/hypercall.c| 4 
+- > xen/arch/x86/hvm/io.c | 2 - > xen/arch/x86/hvm/ioreq.c| 3 +- > xen/arch/x86/hvm/irq.c | 3 - > xen/arch/x86/hvm/vmx/vmcs.c | 35 + > xen/arch/x86/hvm/vmx/vmx.c | 12 +- > xen/arch/x86/mm.c | 2 +- > xen/arch/x86/mm/p2m-pt.c| 2 +- > xen/arch/x86/mm/p2m.c | 6 +- > xen/arch/x86/physdev.c | 8 - > xen/arch/x86/setup.c| 7 - > xen/arch/x86/time.c | 27 > xen/common/domain.c | 2 - > xen/common/domctl.c | 10 -- > xen/common/kernel.c | 5 - > xen/common/vm_event.c | 8 +- > xen/include/asm-x86/domain.h| 1 - > xen/include/asm-x86/hvm/hvm.h | 3 - > xen/include/public/domctl.h | 14 +- > xen/include/xen/sched.h | 9 +- > 40 files changed, 96 insertions(+), 696 deletions(-) > delete mode 100644 docs/misc/pvh-readme.txt > > diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in > index 505c111..da1fdd7 100644 > --- a/docs/man/xl.cfg.pod.5.in > +++ b/docs/man/xl.cfg.pod.5.in > @@ -1064,6 +1064,12 @@ FIFO-based event channel ABI support up to 131,071 > event channels. > Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit > x86). > > +=item B
Re: [Xen-devel] [PATCH v13 3/3] iommu: add rmrr Xen command line option for extra rmrrs
On Thu, Jan 19, 2017 at 01:29:15AM -0700, Jan Beulich wrote:
> >>> On 18.01.17 at 20:56,wrote:
> > I am looking at rmrr_identity_mapping where the RMRR paddr get converted
> > to pfn and then mapped with iommu.
> > If ( rmrr->end_address & ~PAGE_SHIFT_MASK_4K ) == 0, the while loop
> > while ( base_pfn < end_pfn )
> > will not map that inclusive end_address of rmrr.
> > Does it seem wrong?
>
> I don't think so, no. end_pfn is being calculated using
> PAGE_ALIGN_4K(), i.e. rounding up.

I mean to say, if the end address is already aligned, then the page won't
be mapped. For example, if the end paddr is 0x000ed000, end_pfn will be
0x000ed and that page won't be mapped in the loop
while ( base_pfn < end_pfn ).
And we will have the mapped RMRR end address saved in arch.mapped_rmrrs
as 0x000ed000.

It looks like parsed ACPI RMRR end addresses are extended to the end of
the page, though. I am not sure whether the code already has a boundary
alignment somewhere similar to what you proposed below.

> >> > +rmrr->segment = seg;
> >> > +rmrr->base_address = pfn_to_paddr(user_rmrrs[i].base_pfn);
> >> > +rmrr->end_address = pfn_to_paddr(user_rmrrs[i].end_pfn + 1);
> >>
> >> "And this seems wrong too, unless I'm mistaken with the inclusive-ness."
> >>
> > This one is the avoidance of the while loop mapping in
> > rmrr_identity_mapping.
>
> Well, that's the purpose you describe, but the comment was about
> the calculation itself, which I think is lacking a "- 1", but even better
> would be - for avoiding boundary issues -
>
> rmrr->end_address = pfn_to_paddr(user_rmrrs[i].end_pfn) | ~PAGE_MASK;

Yes, this will eliminate this problem. This will need to be accounted for
in the overlapping condition as well.

> or some such.
>
> Jan
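The boundary case discussed above can be checked with a small standalone sketch. Nothing below is Xen code: the SKETCH_ macros and sketch_ helpers are local stand-ins written for this illustration, assuming 4K pages, an end_pfn obtained by rounding the end address up to a page boundary, and a mapping loop of the form while ( base_pfn < end_pfn ). The overlap helper likewise only illustrates a check on two inclusive ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for illustration; these are not the Xen definitions. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

static uint64_t sketch_pfn_to_paddr(uint64_t pfn)
{
    return pfn << SKETCH_PAGE_SHIFT;
}

/* Round an address up to the next page boundary (no-op if aligned). */
static uint64_t sketch_page_align(uint64_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
}

/* Count the pages a "while ( base_pfn < end_pfn )" loop would map. */
static unsigned int sketch_pages_mapped(uint64_t base_addr, uint64_t end_addr)
{
    uint64_t base_pfn = base_addr >> SKETCH_PAGE_SHIFT;
    uint64_t end_pfn = sketch_page_align(end_addr) >> SKETCH_PAGE_SHIFT;
    unsigned int n = 0;

    while ( base_pfn < end_pfn )
    {
        n++;
        base_pfn++;
    }

    return n;
}

/* End address as the last byte of the page, as suggested in the review. */
static uint64_t sketch_inclusive_end(uint64_t end_pfn)
{
    return sketch_pfn_to_paddr(end_pfn) | ~SKETCH_PAGE_MASK;
}

/* Overlap test for two ranges whose bounds are both inclusive. */
static bool sketch_ranges_overlap(uint64_t a_base, uint64_t a_end,
                                  uint64_t b_base, uint64_t b_end)
{
    return a_base <= b_end && b_base <= a_end;
}
```

With a page-aligned end address of 0x000ed000 the rounding is a no-op, so the loop stops before pfn 0xed. With the end address set to the last byte of that page (0x000edfff), rounding up yields pfn 0xee and the page at 0xed is included.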
Re: [Xen-devel] [PATCH v13 3/3] iommu: add rmrr Xen command line option for extra rmrrs
On Thu, Jan 12, 2017 at 04:44:42AM -0700, Jan Beulich wrote: > >>> On 10.01.17 at 23:57,wrote: > > Changes in v13: > > - Implement feedback from Kevin Tian. > > > > https://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03169.html > > > > https://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03170.html > > > > https://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03171.html > > Any reason some of the review comments I had given were left > un-addressed? I'll reproduce them in quotes below. > Hi Jan Thanks for reminding! That was my fault that I did not tell this to Venu when transferring this patchset to him. > > --- a/xen/drivers/passthrough/vtd/dmar.c > > +++ b/xen/drivers/passthrough/vtd/dmar.c > > @@ -859,6 +859,132 @@ out: > > return ret; > > } > > > > +#define MAX_EXTRA_RMRR_PAGES 16 > > +#define MAX_EXTRA_RMRR 10 > > + > > +/* RMRR units derived from command line rmrr option. */ > > +#define MAX_EXTRA_RMRR_DEV 20 > > So you've kept "extra" in these, but ... > > > +struct user_rmrr { > > ... switched to "user" here and below. Please be consistent. 
> > > +static int __init add_user_rmrr(void) > > +{ > > +struct acpi_rmrr_unit *rmrr, *rmrru; > > +unsigned int idx, seg, i; > > +unsigned long base, end; > > +bool overlap; > > + > > +for ( i = 0; i < nr_rmrr; i++ ) > > +{ > > +base = user_rmrrs[i].base_pfn; > > +end = user_rmrrs[i].end_pfn; > > + > > +if ( base > end ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Invalid RMRR Range "ERMRRU_FMT"\n", > > + ERMRRU_ARG(user_rmrrs[i])); > > +continue; > > +} > > + > > +if ( (end - base) >= MAX_EXTRA_RMRR_PAGES ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "RMRR range "ERMRRU_FMT" exceeds "\ > > + __stringify(MAX_EXTRA_RMRR_PAGES)" pages\n", > > + ERMRRU_ARG(user_rmrrs[i])); > > +continue; > > +} > > + > > +overlap = false; > > +list_for_each_entry(rmrru, _rmrr_units, list) > > +{ > > +if ( pfn_to_paddr(base) < rmrru->end_address && > > + rmrru->base_address < pfn_to_paddr(end + 1) ) > > "Aren't both ranges inclusive? I.e. shouldn't the first one be <= (and > the second one could be <= too when dropping the +1), matching > the check acpi_parse_one_rmrr() does?" I agree. The ranges in acpu_rmrr_units and user_rmrrs are inclusive. If this is fixed, then there is another part where I am not sure what would be the better way to fix this. If fix is needed. I am looking at rmrr_identity_mapping where the RMRR paddr get converted to pfn and then mapped with iommu. If ( rmrr->end_address & ~PAGE_SHIFT_MASK_4K ) == 0, the while loop while ( base_pfn < end_pfn ) will not map that inclusive end_address of rmrr. Does it seem wrong? > > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Overlapping RMRRs: "ERMRRU_FMT" and [%lx-%lx]\n", > > + ERMRRU_ARG(user_rmrrs[i]), > > + paddr_to_pfn(rmrru->base_address), > > + paddr_to_pfn(rmrru->end_address)); > > +overlap = true; > > +break; > > +} > > +} > > +/* Don't add overlapping RMRR. 
*/ > > +if ( overlap ) > > +continue; > > + > > +do > > +{ > > +if ( !mfn_valid(base) ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Invalid pfn in RMRR range "ERMRRU_FMT"\n", > > + ERMRRU_ARG(user_rmrrs[i])); > > +break; > > +} > > +} while ( base++ < end ); > > + > > +/* Invalid pfn in range as the loop ended before end_pfn was > > reached. */ > > +if ( base <= end ) > > +continue; > > + > > +rmrr = xzalloc(struct acpi_rmrr_unit); > > +if ( !rmrr ) > > +return -ENOMEM; > > + > > +rmrr->scope.devices = xmalloc_array(u16, user_rmrrs[i].dev_count); > > +if ( !rmrr->scope.devices ) > > +{ > > +xfree(rmrr); > > +return -ENOMEM; > > +} > > + > > +seg = 0; > > +for ( idx = 0; idx < user_rmrrs[i].dev_count; idx++ ) > > +{ > > +rmrr->scope.devices[idx] = user_rmrrs[i].sbdf[idx]; > > +seg |= PCI_SEG(user_rmrrs[i].sbdf[idx]); > > +} > > +if ( seg != PCI_SEG(user_rmrrs[i].sbdf[0]) ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Segments are not equal for RMRR range "ERMRRU_FMT"\n", > > + ERMRRU_ARG(user_rmrrs[i])); > > +scope_devices_free(>scope); > > +xfree(rmrr); > > +continue; > > +
Re: [Xen-devel] [PATCH] xen/x86: Fix CONFIG_CRASH_DEBUG build following c/s 897129dea
On Fri, Jan 06, 2017 at 02:34:17PM +, Andrew Cooper wrote:
> Found by a Travis RANDCONFIG run.
>
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>

Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com>

> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Elena Ufimtseva <elena.ufimts...@oracle.com>
> ---
>  xen/arch/x86/gdbstub.c        | 8
>  xen/arch/x86/x86_64/gdbstub.c | 2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/x86/gdbstub.c b/xen/arch/x86/gdbstub.c
> index 2a39189..fe69f81 100644
> --- a/xen/arch/x86/gdbstub.c
> +++ b/xen/arch/x86/gdbstub.c
> @@ -66,16 +66,16 @@ gdb_arch_resume(struct cpu_user_regs *regs,
>                  struct gdb_context *ctx)
>  {
>      if ( addr != -1UL )
> -        regs->eip = addr;
> +        regs->rip = addr;
>
> -    regs->eflags &= ~X86_EFLAGS_TF;
> +    regs->_eflags &= ~X86_EFLAGS_TF;
>
>      /* Set eflags.RF to ensure we do not re-enter. */
> -    regs->eflags |= X86_EFLAGS_RF;
> +    regs->_eflags |= X86_EFLAGS_RF;
>
>      /* Set the trap flag if we are single stepping. */
>      if ( type == GDB_STEP )
> -        regs->eflags |= X86_EFLAGS_TF;
> +        regs->_eflags |= X86_EFLAGS_TF;
>  }
>
>  /*
> diff --git a/xen/arch/x86/x86_64/gdbstub.c b/xen/arch/x86/x86_64/gdbstub.c
> index 2626519..2c2ab15 100644
> --- a/xen/arch/x86/x86_64/gdbstub.c
> +++ b/xen/arch/x86/x86_64/gdbstub.c
> @@ -44,7 +44,7 @@ gdb_arch_read_reg_array(struct cpu_user_regs *regs, struct gdb_context *ctx)
>      GDB_REG64(regs->r15);
>
>      GDB_REG64(regs->rip);
> -    GDB_REG32(regs->eflags);
> +    GDB_REG32(regs->_eflags);
>
>      GDB_REG32(regs->cs);
>      GDB_REG32(regs->ss);
> --
> 2.1.4
Re: [Xen-devel] [PATCH v6 03/14] xen: Use a typesafe to define INVALID_MFN
On Fri, Jul 08, 2016 at 08:20:03PM +0100, Andrew Cooper wrote:
> On 08/07/2016 23:01, Elena Ufimtseva wrote:
> >
> >>> @@ -838,7 +838,6 @@ mfn_t oos_snapshot_lookup(struct domain *d, mfn_t gmfn)
> >>>
> >>>      SHADOW_ERROR("gmfn %lx was OOS but not in hash table\n", mfn_x(gmfn));
> >>>      BUG();
> >>> -    return _mfn(INVALID_MFN);
> > Can the compiler be unhappy about this?
>
> This was my suggestion, from a previous round of review.

Ah! Thanks for the explanation.

> A while ago, I annotated BUG() with unreachable(), as execution will
> not continue from a bugframe, but the shadow code is definitely older
> than my change.
>
> As such, compilers will have been dropping this return statement as part
> of dead-code-elimination anyway.
>
> This option is better than just replacing one bit of dead code with a
> different bit of dead code.
>
> ~Andrew
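The point about dropping the trailing return after BUG() can be illustrated outside Xen with a rough sketch. SKETCH_BUG, sketch_fatal and sketch_lookup are invented for this example, and the sketch assumes a GCC or Clang compatible compiler, since it relies on __builtin_unreachable(); it does not reproduce how Xen defines BUG() or unreachable().

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a fatal path; deliberately not declared noreturn. */
static void sketch_fatal(void)
{
    abort();
}

/*
 * Hypothetical BUG()-like macro: the unreachable annotation tells the
 * compiler that control never continues past this point.
 */
#define SKETCH_BUG() do { sketch_fatal(); __builtin_unreachable(); } while (0)

/* Return the index of key in table; a missing key is treated as a bug. */
static int sketch_lookup(const int *table, int n, int key)
{
    for ( int i = 0; i < n; i++ )
        if ( table[i] == key )
            return i;

    SKETCH_BUG();
    /* No dummy "return -1;" here: it would only be dead code. */
}
```

Because the compiler is told the end of the function cannot be reached, it has no reason to complain about a missing return value on that path.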
Re: [Xen-devel] [PATCH v6 04/14] xen: Use a typesafe to define INVALID_GFN
FN) ) > >+return gfn_x(INVALID_GFN); > > > > /* translate l2 guest gfn into l1 guest gfn */ > > rv = nestedhap_walk_L1_p2m(v, l2_gfn, _gfn, _page_order, > > _p2ma, > >@@ -2123,7 +2123,7 @@ unsigned long paging_gva_to_gfn(struct vcpu *v, > > !!(*pfec & PFEC_insn_fetch)); > > > > if ( rv != NESTEDHVM_PAGEFAULT_DONE ) > >-return INVALID_GFN; > >+return gfn_x(INVALID_GFN); > > > > /* > > * Sanity check that l1_gfn can be used properly as a 4K mapping, > > even > >@@ -2415,7 +2415,7 @@ static void p2m_init_altp2m_helper(struct domain *d, > >unsigned int i) > > struct p2m_domain *p2m = d->arch.altp2m_p2m[i]; > > struct ept_data *ept; > > > >-p2m->min_remapped_gfn = INVALID_GFN; > >+p2m->min_remapped_gfn = gfn_x(INVALID_GFN); > > p2m->max_remapped_gfn = 0; > > ept = >ept; > > ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m)); > >@@ -2551,7 +2551,7 @@ int p2m_change_altp2m_gfn(struct domain *d, unsigned > >int idx, > > > > mfn = ap2m->get_entry(ap2m, gfn_x(old_gfn), , , 0, NULL, NULL); > > > >-if ( gfn_x(new_gfn) == INVALID_GFN ) > >+if ( gfn_eq(new_gfn, INVALID_GFN) ) > > { > > if ( mfn_valid(mfn) ) > > p2m_remove_page(ap2m, gfn_x(old_gfn), mfn_x(mfn), > > PAGE_ORDER_4K); > >@@ -2613,7 +2613,7 @@ static void p2m_reset_altp2m(struct p2m_domain *p2m) > > /* Uninit and reinit ept to force TLB shootdown */ > > ept_p2m_uninit(p2m); > > ept_p2m_init(p2m); > >-p2m->min_remapped_gfn = INVALID_GFN; > >+p2m->min_remapped_gfn = gfn_x(INVALID_GFN); > > p2m->max_remapped_gfn = 0; > > } > > > >diff --git a/xen/arch/x86/mm/shadow/common.c > >b/xen/arch/x86/mm/shadow/common.c > >index 1c0b6cd..61ccddf 100644 > >--- a/xen/arch/x86/mm/shadow/common.c > >+++ b/xen/arch/x86/mm/shadow/common.c > >@@ -1707,7 +1707,7 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v, > >unsigned long vaddr, > > > > /* Translate the VA to a GFN. 
*/ > > gfn = paging_get_hostmode(v)->gva_to_gfn(v, NULL, vaddr, ); > >-if ( gfn == INVALID_GFN ) > >+if ( gfn == gfn_x(INVALID_GFN) ) > > { > > if ( is_hvm_vcpu(v) ) > > hvm_inject_page_fault(pfec, vaddr); > >diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c > >index f892e2f..e54c8b7 100644 > >--- a/xen/arch/x86/mm/shadow/multi.c > >+++ b/xen/arch/x86/mm/shadow/multi.c > >@@ -3660,7 +3660,7 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m, > > */ > > if ( is_hvm_vcpu(v) && !hvm_nx_enabled(v) && !hvm_smep_enabled(v) ) > > pfec[0] &= ~PFEC_insn_fetch; > >-return INVALID_GFN; > >+return gfn_x(INVALID_GFN); > > } > > gfn = guest_walk_to_gfn(); > > > >diff --git a/xen/arch/x86/mm/shadow/private.h > >b/xen/arch/x86/mm/shadow/private.h > >index c424ad6..824796f 100644 > >--- a/xen/arch/x86/mm/shadow/private.h > >+++ b/xen/arch/x86/mm/shadow/private.h > >@@ -796,7 +796,7 @@ static inline unsigned long vtlb_lookup(struct vcpu *v, > > unsigned long va, uint32_t pfec) > > { > > unsigned long page_number = va >> PAGE_SHIFT; > >-unsigned long frame_number = INVALID_GFN; > >+unsigned long frame_number = gfn_x(INVALID_GFN); > > int i = vtlb_hash(page_number); > > > > spin_lock(>arch.paging.vtlb_lock); > >diff --git a/xen/drivers/passthrough/amd/iommu_map.c > >b/xen/drivers/passthrough/amd/iommu_map.c > >index c758459..b8c0a48 100644 > >--- a/xen/drivers/passthrough/amd/iommu_map.c > >+++ b/xen/drivers/passthrough/amd/iommu_map.c > >@@ -555,7 +555,7 @@ static int update_paging_mode(struct domain *d, unsigned > >long gfn) > > unsigned long old_root_mfn; > > struct domain_iommu *hd = dom_iommu(d); > > > >-if ( gfn == INVALID_GFN ) > >+if ( gfn == gfn_x(INVALID_GFN) ) > > return -EADDRNOTAVAIL; > > ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH)); > > > >diff --git a/xen/drivers/passthrough/vtd/iommu.c > >b/xen/drivers/passthrough/vtd/iommu.c > >index f010612..c322b9f 100644 > >--- a/xen/drivers/passthrough/vtd/iommu.c > >+++ 
b/xen/drivers/passthrough/vtd/iommu.c > >@@ -611,7 +611,7 @@ static int __must_check iommu_flush_iotlb(struct domain > >*d, > > if ( iommu_domid == -1 ) > > continue; > > > >-if ( page_count != 1 || gfn == INVALID_GFN ) > >+if ( page_count != 1 || gfn == gfn_x(INVALID_GFN) ) > > rc = iommu_flush_iotlb_dsi(iommu, iommu_domid, > > 0, flush_dev_iotlb); > > else > >@@ -640,7 +640,7 @@ static int __must_check iommu_flush_iotlb_pages(struct > >domain *d, > > > > static int __must_check iommu_flush_iotlb_all(struct domain *d) > > { > >-return iommu_flush_iotlb(d, INVALID_GFN, 0, 0); > >+return iommu_flush_iotlb(d, gfn_x(INVALID_GFN), 0, 0); > > } > > > > /* clear one page's page table */ > >diff --git a/xen/drivers/passthrough/x86/iommu.c > >b/xen/drivers/passthrough/x86/iommu.c > >index cd435d7..69cd6c5 100644 > >--- a/xen/drivers/passthrough/x86/iommu.c > >+++ b/xen/drivers/passthrough/x86/iommu.c > >@@ -61,7 +61,7 @@ int arch_iommu_populate_page_table(struct domain *d) > > unsigned long mfn = page_to_mfn(page); > > unsigned long gfn = mfn_to_gmfn(d, mfn); > > > >-if ( gfn != INVALID_GFN ) > >+if ( gfn != gfn_x(INVALID_GFN) ) > > { > > ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH)); > > BUG_ON(SHARED_M2P(gfn)); > >diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h > >index a8d980c..79ed4ff 100644 > >--- a/xen/include/asm-x86/guest_pt.h > >+++ b/xen/include/asm-x86/guest_pt.h > >@@ -32,7 +32,7 @@ > > #error GUEST_PAGING_LEVELS not defined > > #endif > > > >-#define VALID_GFN(m) (m != INVALID_GFN) > >+#define VALID_GFN(m) (m != gfn_x(INVALID_GFN)) > > > > static inline int > > valid_gfn(gfn_t m) > >@@ -251,7 +251,7 @@ static inline gfn_t > > guest_walk_to_gfn(walk_t *gw) > > { > > if ( !(guest_l1e_get_flags(gw->l1e) & _PAGE_PRESENT) ) > >-return _gfn(INVALID_GFN); > >+return INVALID_GFN; > > return guest_l1e_get_gfn(gw->l1e); > > } > > > >diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h > >index 4ab3574..194020e 100644 > >--- 
a/xen/include/asm-x86/p2m.h > >+++ b/xen/include/asm-x86/p2m.h > >@@ -324,7 +324,7 @@ struct p2m_domain { > > #define NR_POD_MRP_ENTRIES 32 > > > > /* Encode ORDER_2M superpage in top bit of GFN */ > >-#define POD_LAST_SUPERPAGE (INVALID_GFN & ~(INVALID_GFN >> 1)) > >+#define POD_LAST_SUPERPAGE (gfn_x(INVALID_GFN) & ~(gfn_x(INVALID_GFN) >> 1)) > > > > unsigned long list[NR_POD_MRP_ENTRIES]; > > unsigned int idx; > >diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h > >index 7f207ec..58bc0b8 100644 > >--- a/xen/include/xen/mm.h > >+++ b/xen/include/xen/mm.h > >@@ -84,7 +84,7 @@ static inline bool_t mfn_eq(mfn_t x, mfn_t y) > > > > TYPE_SAFE(unsigned long, gfn); > > #define PRI_gfn "05lx" > >-#define INVALID_GFN (~0UL) > >+#define INVALID_GFN _gfn(~0UL) > > > > #ifndef gfn_t > > #define gfn_t /* Grep fodder: gfn_t, _gfn() and gfn_x() are defined above > > */ > > > > -- > Julien Grall Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com> ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
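As background for the INVALID_GFN conversion in the quoted patch, the typesafe pattern can be sketched in plain C as follows. This does not reproduce Xen's TYPE_SAFE() macro: sketch_gfn_t, sketch_mk_gfn, sketch_gfn_x, sketch_gfn_eq and SKETCH_INVALID_GFN are illustrative names, and wrapping the integer in a single-member struct is only one assumed way to get the compiler to reject accidental mixing of frame numbers with raw integers.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrapping the integer in a struct makes it a distinct type. */
typedef struct { unsigned long gfn; } sketch_gfn_t;

/* Box a raw frame number. */
static inline sketch_gfn_t sketch_mk_gfn(unsigned long n)
{
    return (sketch_gfn_t){ n };
}

/* Unbox, for callers that still work with plain unsigned long. */
static inline unsigned long sketch_gfn_x(sketch_gfn_t g)
{
    return g.gfn;
}

/* Structs cannot be compared with ==, so provide an equality helper. */
static inline bool sketch_gfn_eq(sketch_gfn_t a, sketch_gfn_t b)
{
    return a.gfn == b.gfn;
}

/* The invalid marker is itself typed, mirroring the patch's intent. */
#define SKETCH_INVALID_GFN sketch_mk_gfn(~0UL)
```

Code that already holds a typed value compares through the equality helper, while code still using unsigned long has to unbox the constant explicitly; the quoted diff shows both shapes, gfn_eq(new_gfn, INVALID_GFN) and gfn == gfn_x(INVALID_GFN).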
Re: [Xen-devel] [PATCH v6 03/14] xen: Use a typesafe to define INVALID_MFN
On Wed, Jul 06, 2016 at 02:04:17PM +0100, Julien Grall wrote: > (CC Elena). > > On 06/07/16 14:01, Julien Grall wrote: > >Also take the opportunity to convert arch/x86/debug.c to the typesafe > >mfn and use proper printf format for MFN/GFN when the code around is > >modified. > > > >Signed-off-by: Julien Grall> >Reviewed-by: Andrew Cooper > >Acked-by: Stefano Stabellini > > > >--- > >Cc: Christoph Egger > >Cc: Liu Jinsong > >Cc: Jan Beulich > >Cc: Mukesh Rathor > > I forgot to update the CC list since GDSX maintainership was take over by > Elena. Sorry for that. No problem! > > >Cc: Paul Durrant > >Cc: Jun Nakajima > >Cc: Kevin Tian > >Cc: George Dunlap > >Cc: Tim Deegan > > > > Changes in v6: > > - Add Stefano's acked-by for ARM bits > > - Use PRI_mfn and PRI_gfn > > - Remove set of brackets when it is not necessary > > - Use mfn_add when possible > > - Add Andrew's reviewed-by > > > > Changes in v5: > > - Patch added > >--- > > xen/arch/arm/p2m.c | 4 +-- > > xen/arch/x86/cpu/mcheck/mce.c | 2 +- > > xen/arch/x86/debug.c| 58 > > + > > xen/arch/x86/hvm/hvm.c | 6 ++--- > > xen/arch/x86/hvm/viridian.c | 12 - > > xen/arch/x86/hvm/vmx/vmx.c | 2 +- > > xen/arch/x86/mm/guest_walk.c| 4 +-- > > xen/arch/x86/mm/hap/hap.c | 4 +-- > > xen/arch/x86/mm/p2m-ept.c | 6 ++--- > > xen/arch/x86/mm/p2m-pod.c | 18 ++--- > > xen/arch/x86/mm/p2m-pt.c| 18 ++--- > > xen/arch/x86/mm/p2m.c | 54 +++--- > > xen/arch/x86/mm/paging.c| 12 - > > xen/arch/x86/mm/shadow/common.c | 43 +++--- > > xen/arch/x86/mm/shadow/multi.c | 36 - > > xen/common/domain.c | 6 ++--- > > xen/common/grant_table.c| 6 ++--- > > xen/include/xen/mm.h| 2 +- > > 18 files changed, 147 insertions(+), 146 deletions(-) > > > >diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > >index 34563bb..d690602 100644 > >--- a/xen/arch/arm/p2m.c > >+++ b/xen/arch/arm/p2m.c > >@@ -1461,7 +1461,7 @@ int relinquish_p2m_mapping(struct domain *d) > > return apply_p2m_changes(d, RELINQUISH, > >pfn_to_paddr(p2m->lowest_mapped_gfn), > 
>pfn_to_paddr(p2m->max_mapped_gfn), > >- pfn_to_paddr(INVALID_MFN), > >+ pfn_to_paddr(mfn_x(INVALID_MFN)), > >MATTR_MEM, 0, p2m_invalid, > >d->arch.p2m.default_access); > > } > >@@ -1476,7 +1476,7 @@ int p2m_cache_flush(struct domain *d, xen_pfn_t > >start_mfn, xen_pfn_t end_mfn) > > return apply_p2m_changes(d, CACHEFLUSH, > > pfn_to_paddr(start_mfn), > > pfn_to_paddr(end_mfn), > >- pfn_to_paddr(INVALID_MFN), > >+ pfn_to_paddr(mfn_x(INVALID_MFN)), > > MATTR_MEM, 0, p2m_invalid, > > d->arch.p2m.default_access); > > } > >diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c > >index edcbe48..2695b0c 100644 > >--- a/xen/arch/x86/cpu/mcheck/mce.c > >+++ b/xen/arch/x86/cpu/mcheck/mce.c > >@@ -1455,7 +1455,7 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc) > > gfn = PFN_DOWN(gaddr); > > mfn = mfn_x(get_gfn(d, gfn, )); > > > >-if ( mfn == INVALID_MFN ) > >+if ( mfn == mfn_x(INVALID_MFN) ) > > { > > put_gfn(d, gfn); > > put_domain(d); > >diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c > >index 58cae22..9213ea7 100644 > >--- a/xen/arch/x86/debug.c > >+++ b/xen/arch/x86/debug.c > >@@ -43,11 +43,11 @@ typedef unsigned long dbgva_t; > > typedef unsigned char dbgbyte_t; > > > > /* Returns: mfn for the given (hvm guest) vaddr */ > >-static unsigned long > >+static mfn_t > > dbg_hvm_va2mfn(dbgva_t vaddr, struct domain *dp, int toaddr, > > unsigned long *gfn) > > { > >-unsigned long mfn; > >+mfn_t mfn; > > uint32_t pfec = PFEC_page_present; > > p2m_type_t gfntype; > > > >@@ -60,16 +60,17 @@ dbg_hvm_va2mfn(dbgva_t vaddr, struct domain *dp, int > >toaddr, > > return INVALID_MFN; > > } > > > >-mfn = mfn_x(get_gfn(dp, *gfn, )); > >+mfn = get_gfn(dp, *gfn, ); > > if ( p2m_is_readonly(gfntype) && toaddr ) > > { > >
[Xen-devel] [PATCH RESEND] MAINTAINERS/gdbsx: change maintainer
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Change gdbsx maintainer to myself.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a8e0043..e91140f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -206,7 +206,7 @@ F: xen/common/event_fifo.c
 F: xen/include/xen/event_fifo.h

 GDBSX DEBUGGER
-M: Mukesh Rathor <mukesh.rat...@oracle.com>
+M: Elena Ufimtseva <elena.ufimts...@oracle.com>
 S: Supported
 F: xen/arch/x86/debug.c
 F: tools/debugger/gdbsx/
--
2.1.4
Re: [Xen-devel] GDBSX Maintainer
Hi Julien, Andrew

I was talking to Konrad some time ago about looking into this and the
possibility of maintaining the gdbsx code.
I am willing to sign up for this if there are no objections.

Elena

On Tue, Jun 28, 2016 at 9:46 AM, Andrew Cooper wrote:
> On 28/06/16 17:31, Julien Grall wrote:
>> Hi,
>>
>> I had to modify some code in arch/x86/debug.c and noticed that Mukesh
>> is still the maintainer. IIRC he left Oracle quite a while ago, so my
>> e-mail was bounced by the server.
>>
>> Do we have a new e-mail address for him? If not, does anyone plan to
>> maintain this code? Shall we mark the code as "Orphan"?
>
> If no one explicitly wishes to maintain it, then it should be subsumed
> into general x86. It's not like it's a large or complicated area of code.
>
> ~Andrew

--
Elena
Re: [Xen-devel] pcie error containment: kill domain and dm without xend
Thanks George!

On Fri, Jun 24, 2016 at 4:37 AM, George Dunlap <george.dun...@citrix.com> wrote:
> On Wed, Jun 22, 2016 at 9:16 PM, Elena Ufimtseva <ufimts...@gmail.com> wrote:
>> Hello
>>
>> I am working on PCIe error containment and a problem related to XSA-124.
>> This is only a small part of the problem, and I can provide more details
>> later if anyone is interested.
>> As a temporary solution, a guest domain with a passthrough device
>> without SR-IOV gets killed when certain AER errors are triggered by
>> the dom0 AER code.
>> In versions of Xen with xend present, xenwatch can be used and pciback can
>> write some fields to xenstore (such as "aerfail", which is already present)
>> and destroy the device model and then the domain itself.
>> What would be the best way to initiate similar behaviour when xend is
>> not used? Or maybe what is the best way to initiate destruction of the
>> device model and the domain itself without xend?
>
> xl forks a background process per VM to monitor VMs and destroy device
> models at the appropriate times -- see
> tools/xl_cmdimpl.c:create_domain() (and in particular search for
> "need_daemon"). This is the place to implement VM-watching features
> such as xend had.
>
> -George

--
Elena
[Xen-devel] pcie error containment: kill domain and dm without xend
Hello

I am working on PCIe error containment and a problem related to XSA-124.
This is only a small part of the problem, and I can provide more details
later if anyone is interested.

As a temporary solution, a guest domain with a passthrough device
without SR-IOV gets killed when certain AER errors are triggered by
the dom0 AER code.
In versions of Xen with xend present, xenwatch can be used and pciback can
write some fields to xenstore (such as "aerfail", which is already present)
and destroy the device model and then the domain itself.

What would be the best way to initiate similar behaviour when xend is
not used? Or maybe what is the best way to initiate destruction of the
device model and the domain itself without xend?

Thanks!
Elena

--
Elena
Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?
On Tue, Mar 01, 2016 at 10:51:30PM +0100, Sander Eikelenboom wrote: > > Tuesday, March 1, 2016, 9:39:25 PM, you wrote: > > > On Tue, Mar 01, 2016 at 02:52:14PM -0500, Meng Xu wrote: > >> Hi Elena, > >> > >> Thank you very much for sharing this! :-) > >> > >> On Tue, Mar 1, 2016 at 1:20 PM, Elena Ufimtseva > >> <elena.ufimts...@oracle.com> wrote: > >> > > >> > On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote: > >> > > On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk > >> > > <konrad.w...@oracle.com> wrote: > >> > > >> > Hey! > >> > > >> > > >> > > >> > CC-ing Elena. > >> > > >> > >> > > >> I think you forgot you cc.ed her.. > >> > > >> Anyway, let's cc. her now... :-) > >> > > >> > >> > > >> > > >> > > >> >> We are measuring the execution time between native machine > >> > > >> >> environment > >> > > >> >> and xen virtualization environment using PARSEC Benchmark [1]. > >> > > >> >> > >> > > >> >> In virtualiztion environment, we run a domU with three VCPUs, > >> > > >> >> each of > >> > > >> >> them pinned to a core; we pin the dom0 to another core that is > >> > > >> >> not > >> > > >> >> used by the domU. > >> > > >> >> > >> > > >> >> Inside the Linux in domU in virtualization environment and in > >> > > >> >> native > >> > > >> >> environment, We used the cpuset to isolate a core (or VCPU) for > >> > > >> >> the > >> > > >> >> system processors and to isolate a core for the benchmark > >> > > >> >> processes. > >> > > >> >> We also configured the Linux boot command line with isocpus= > >> > > >> >> option to > >> > > >> >> isolate the core for benchmark from other unnecessary processes. > >> > > >> > > >> > > >> > You may want to just offline them and also boot the machine with > >> > > >> > NUMA > >> > > >> > disabled. > >> > > >> > >> > > >> Right, the machine is booted up with NUMA disabled. > >> > > >> We will offline the unnecessary cores then. 
> >> > > >> > >> > > >> > > >> > > >> >> > >> > > >> >> We expect that execution time of benchmarks in xen virtualization > >> > > >> >> environment is larger than the execution time in native machine > >> > > >> >> environment. However, the evaluation gave us an opposite result. > >> > > >> >> > >> > > >> >> Below is the evaluation data for the canneal and streamcluster > >> > > >> >> benchmarks: > >> > > >> >> > >> > > >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial > >> > > >> >> Native: 6.387s > >> > > >> >> Virtualization: 5.890s > >> > > >> >> > >> > > >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial > >> > > >> >> Native: 5.276s > >> > > >> >> Virtualization: 5.240s > >> > > >> >> > >> > > >> >> Is there anything wrong with our evaluation that lead to the > >> > > >> >> abnormal > >> > > >> >> performance results? > >> > > >> > > >> > > >> > Nothing is wrong. Virtualization is naturally faster than > >> > > >> > baremetal! > >> > > >> > > >> > > >> > :-) > >> > > >> > > >> > > >> > No clue sadly. > >> > > >> > >> > > >> Ah-ha. This is really surprising to me Why will it speed up the > >> > > >> system by adding one more layer? Unless the virtualization disabled > >> > > >> some services that occur in native and interfe
Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?
On Tue, Mar 01, 2016 at 02:52:14PM -0500, Meng Xu wrote: > Hi Elena, > > Thank you very much for sharing this! :-) > > On Tue, Mar 1, 2016 at 1:20 PM, Elena Ufimtseva > <elena.ufimts...@oracle.com> wrote: > > > > On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote: > > > On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk > > > <konrad.w...@oracle.com> wrote: > > > >> > Hey! > > > >> > > > > >> > CC-ing Elena. > > > >> > > > >> I think you forgot you cc.ed her.. > > > >> Anyway, let's cc. her now... :-) > > > >> > > > >> > > > > >> >> We are measuring the execution time between native machine > > > >> >> environment > > > >> >> and xen virtualization environment using PARSEC Benchmark [1]. > > > >> >> > > > >> >> In virtualiztion environment, we run a domU with three VCPUs, each > > > >> >> of > > > >> >> them pinned to a core; we pin the dom0 to another core that is not > > > >> >> used by the domU. > > > >> >> > > > >> >> Inside the Linux in domU in virtualization environment and in native > > > >> >> environment, We used the cpuset to isolate a core (or VCPU) for the > > > >> >> system processors and to isolate a core for the benchmark processes. > > > >> >> We also configured the Linux boot command line with isocpus= option > > > >> >> to > > > >> >> isolate the core for benchmark from other unnecessary processes. > > > >> > > > > >> > You may want to just offline them and also boot the machine with NUMA > > > >> > disabled. > > > >> > > > >> Right, the machine is booted up with NUMA disabled. > > > >> We will offline the unnecessary cores then. > > > >> > > > >> > > > > >> >> > > > >> >> We expect that execution time of benchmarks in xen virtualization > > > >> >> environment is larger than the execution time in native machine > > > >> >> environment. However, the evaluation gave us an opposite result. 
> > > >> >> > > > >> >> Below is the evaluation data for the canneal and streamcluster > > > >> >> benchmarks: > > > >> >> > > > >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial > > > >> >> Native: 6.387s > > > >> >> Virtualization: 5.890s > > > >> >> > > > >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial > > > >> >> Native: 5.276s > > > >> >> Virtualization: 5.240s > > > >> >> > > > >> >> Is there anything wrong with our evaluation that lead to the > > > >> >> abnormal > > > >> >> performance results? > > > >> > > > > >> > Nothing is wrong. Virtualization is naturally faster than baremetal! > > > >> > > > > >> > :-) > > > >> > > > > >> > No clue sadly. > > > >> > > > >> Ah-ha. This is really surprising to me Why will it speed up the > > > >> system by adding one more layer? Unless the virtualization disabled > > > >> some services that occur in native and interfere with the benchmark. > > > >> > > > >> If virtualization is faster than baremetal by nature, why we can see > > > >> that some experiment shows that virtualization introduces overhead? > > > > > > > > Elena told me that there were some weird regression in Linux 4.1 - where > > > > CPU burning workloads were _slower_ on baremetal than as guests. > > > > > > Hi Elena, > > > Would you mind sharing with us some of your experience of how you > > > found the real reason? Did you use some tool or some methodology to > > > pin down the reason (i.e, CPU burning workloads in native is _slower_ > > > on baremetal than as guests)? > > > > > > > Hi Meng > > > > Yes, sure! > > > > While working on performance tests for smt-exposing patches from Joao > > I run CPU bound workload in HVM guest and using same kernel in baremetal > > run same
Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?
On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote: > On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk >wrote: > >> > Hey! > >> > > >> > CC-ing Elena. > >> > >> I think you forgot you cc.ed her.. > >> Anyway, let's cc. her now... :-) > >> > >> > > >> >> We are measuring the execution time between native machine environment > >> >> and xen virtualization environment using PARSEC Benchmark [1]. > >> >> > >> >> In virtualiztion environment, we run a domU with three VCPUs, each of > >> >> them pinned to a core; we pin the dom0 to another core that is not > >> >> used by the domU. > >> >> > >> >> Inside the Linux in domU in virtualization environment and in native > >> >> environment, We used the cpuset to isolate a core (or VCPU) for the > >> >> system processors and to isolate a core for the benchmark processes. > >> >> We also configured the Linux boot command line with isocpus= option to > >> >> isolate the core for benchmark from other unnecessary processes. > >> > > >> > You may want to just offline them and also boot the machine with NUMA > >> > disabled. > >> > >> Right, the machine is booted up with NUMA disabled. > >> We will offline the unnecessary cores then. > >> > >> > > >> >> > >> >> We expect that execution time of benchmarks in xen virtualization > >> >> environment is larger than the execution time in native machine > >> >> environment. However, the evaluation gave us an opposite result. > >> >> > >> >> Below is the evaluation data for the canneal and streamcluster > >> >> benchmarks: > >> >> > >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial > >> >> Native: 6.387s > >> >> Virtualization: 5.890s > >> >> > >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial > >> >> Native: 5.276s > >> >> Virtualization: 5.240s > >> >> > >> >> Is there anything wrong with our evaluation that lead to the abnormal > >> >> performance results? > >> > > >> > Nothing is wrong. Virtualization is naturally faster than baremetal! 
> >> >
> >> > :-)
> >> >
> >> > No clue sadly.
> >>
> >> Ah-ha. This is really surprising to me. Why will it speed up the
> >> system by adding one more layer? Unless the virtualization disabled
> >> some services that occur in native and interfere with the benchmark.
> >>
> >> If virtualization is faster than baremetal by nature, why we can see
> >> that some experiment shows that virtualization introduces overhead?
> >
> > Elena told me that there were some weird regression in Linux 4.1 - where
> > CPU burning workloads were _slower_ on baremetal than as guests.
>
> Hi Elena,
> Would you mind sharing with us some of your experience of how you
> found the real reason? Did you use some tool or some methodology to
> pin down the reason (i.e, CPU burning workloads in native is _slower_
> on baremetal than as guests)?
>

Hi Meng

Yes, sure!

While working on performance tests for the smt-exposing patches from Joao,
I ran a CPU-bound workload in an HVM guest and ran the same test on
baremetal using the same kernel.
While testing the CPU-bound workload on baremetal Linux (4.1.0-rc2), I found
that the time to complete the test is a few times longer than it takes
under the HVM guest.
I have tried tests with kernel threads pinned to cores and without pinning.
Most of the time the execution takes twice as long as in the HVM case,
and sometimes 4 times as long.

What is interesting is not only that it sometimes takes 3-4 times longer
than in the HVM guest, but also that the test with threads bound to cores
takes almost 3 times longer to execute than the same CPU-bound test under
HVM (in all configurations).

I ran each test 5 times and here are the execution times (seconds):

-------------------------------------------------
        baremetal           |
thread_bind | thread unbind | HVM pinned to cores
------------|---------------|--------------------
     74     |      83       |         28
     74     |      88       |         28
     74     |      38       |         28
     74     |      73       |         28
     74     |      87       |         28

Sometimes the unbound tests gave better times, but not often enough to
present them here. Some results are much worse and reach up to 120 seconds.

Each test has 8 kernel threads.
In the baremetal case I tried the following:
- numa off, on;
- all cpus are on;
- isolate cpus from the first node;
- set intel_idle.max_cstate=1;
- disable intel_pstate;

I don't think I have exhausted all the options here, but it looked like the
last two changes did improve performance, although it was still not
comparable to the HVM case.
I am trying to find where the regression happened. Performance on a newer
kernel (I tried 4.5.0-rc4+) was close to or better than HVM.

I am trying to find out whether there were some relevant regressions, to
understand the reason for this.

What kernel do you guys use?

Elena

See more description of the tests here:
http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg02874.html

Joao's patches are here:
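The five-run timings listed above can be summarised with a few lines of Python. This is only a sketch for reading the numbers: the list and function names are made up for illustration, and the values are copied from the table in the mail.

```python
from statistics import mean, median

# Execution times in seconds, copied from the five runs reported in the mail.
baremetal_bound = [74, 74, 74, 74, 74]    # baremetal, threads bound to cores
baremetal_unbound = [83, 88, 38, 73, 87]  # baremetal, threads not bound
hvm_pinned = [28, 28, 28, 28, 28]         # HVM guest, vcpus pinned to cores


def slowdown(native, guest):
    """Ratio of mean native time to mean guest time (>1: native is slower)."""
    return mean(native) / mean(guest)


if __name__ == "__main__":
    print(f"bound   vs HVM: {slowdown(baremetal_bound, hvm_pinned):.2f}x")
    print(f"unbound vs HVM: {slowdown(baremetal_unbound, hvm_pinned):.2f}x")
    print(f"unbound median: {median(baremetal_unbound)} s")
```

On these five runs both baremetal variants come out at roughly 2.6 times the HVM time on average; the single 38 s unbound run is the outlier that pulls the unbound mean down to the bound one.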
Re: [Xen-devel] schedulers and topology exposing questions
On Thu, Jan 28, 2016 at 09:46:46AM +, Dario Faggioli wrote: > On Wed, 2016-01-27 at 11:03 -0500, Elena Ufimtseva wrote: > > On Wed, Jan 27, 2016 at 10:27:01AM -0500, Konrad Rzeszutek Wilk > > wrote: > > > On Wed, Jan 27, 2016 at 03:10:01PM +, George Dunlap wrote: > > > > On 27/01/16 14:33, Konrad Rzeszutek Wilk wrote: > > > > > On Tue, Jan 26, 2016 at 11:21:36AM +, George Dunlap wrote: > > > > > > On 22/01/16 16:54, Elena Ufimtseva wrote: > > > > > > > Hello all! > > > > > > > > > > > > > > Dario, Gerorge or anyone else, your help will be > > > > > > > appreciated. > > > > > > > > > > > > > > Let me put some intro to our findings. I may forget > > > > > > > something or put something > > > > > > > not too explicit, please ask me. > > > > > > > > > > > > > > Customer filled a bug where some of the applications were > > > > > > > running slow in their HVM DomU setups. > > > > > > > These running times were compared against baremetal running > > > > > > > same kernel version as HVM DomU. > > > > > > > > > > > > > > After some investigation by different parties, the test > > > > > > > case scenario was found > > > > > > > where the problem was easily seen. The test app is a udp > > > > > > > server/client pair where > > > > > > > client passes some message n number of times. > > > > > > > The test case was executed on baremetal and Xen DomU with > > > > > > > kernel version 2.6.39. > > > > > > > Bare metal showed 2x times better result that DomU. > > > > > > > > > > > > > > Konrad came up with a workaround that was setting the flag > > > > > > > for domain scheduler in linux > > > > > > > As the guest is not aware of SMT-related topology, it has a > > > > > > > flat topology initialized. > > > > > > > Kernel has domain scheduler flags for scheduling domain CPU > > > > > > > set to 4143 for 2.6.39. 
> > > > > > > Konrad discovered that changing the flag for CPU sched > > > > > > > domain to 4655 > > > > > > > works as a workaround and makes Linux think that the > > > > > > > topology has SMT threads. > > > > > > > This workaround makes the test to complete almost in same > > > > > > > time as on baremetal (or insignificantly worse). > > > > > > > > > > > > > > This workaround is not suitable for kernels of higher > > > > > > > versions as we discovered. > > > > > > > > > > > > > > The hackish way of making domU linux think that it has SMT > > > > > > > threads (along with matching cpuid) > > > > > > > made us thinks that the problem comes from the fact that > > > > > > > cpu topology is not exposed to > > > > > > > guest and Linux scheduler cannot make intelligent decision > > > > > > > on scheduling. > > > > > > > > > > > > > > Joao Martins from Oracle developed set of patches that > > > > > > > fixed the smt/core/cashe > > > > > > > topology numbering and provided matching pinning of vcpus > > > > > > > and enabling options, > > > > > > > allows to expose to guest correct topology. > > > > > > > I guess Joao will be posting it at some point. > > > > > > > > > > > > > > With this patches we decided to test the performance impact > > > > > > > on different kernel versionand Xen versions. > > > > > > > > > > > > > > The test described above was labeled as IO-bound test. > > > > > > > > > > > > So just to clarify: The client sends a request (presumably > > > > > > not much more > > > > > > than a ping) to the server, and waits for the server to > > > > > > respond before > > > > > > sending another one; and the server does the reverse -- > > > > > > receives a > > > > > > request, responds, and then waits for the n
Re: [Xen-devel] schedulers and topology exposing questions
On Thu, Jan 28, 2016 at 09:55:45AM +, Dario Faggioli wrote:
> On Wed, 2016-01-27 at 15:53 +, George Dunlap wrote:
> > On 27/01/16 15:27, Konrad Rzeszutek Wilk wrote:
> > >
> > > So Elena started looking at the CPU bound and seeing how Xen behaves then
> > > and if we can improve the floating situation as she saw some abnormal
> > > behaviours.
> >
> > OK -- if the focus was on the two cases where the Xen credit1 scheduler
> > (apparently) co-located two cpu-burning vcpus on sibling threads, then
> > yeah, that's behavior we should probably try to get to the bottom of.
> >
> Well, let's see the trace.

Hey Dario

Please disregard the previous email with topology information.
It was incorrect; I am attaching the topology that actually results
from applying Joao's SMT patches.

Elena

>
> In any case, I'm up to trying hooking the SMT load balancer in
> runq_tickle (which would mean doing it upon every vcpus wakeup).
>
> My gut feeling is that the overhead may outweigh the benefit, and that
> it will actually prove useful only in a minority of the
> cases/workloads, but it's maybe worth a try.
>
> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.360
cache size      : 25600 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5586.72
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.360
cache size      : 25600 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5586.72
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.360
cache size      : 25600 KB
physical id     : 0
siblings        : 16
core id         : 1
cpu cores       : 8
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
bugs            :
bogomips        : 5586.72
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Genuine Intel(R) CPU @ 2.80GHz
stepping        : 2
microcode       : 0x209
cpu MHz         : 2793.360
cache size      : 25600 KB
physical id     : 0
siblings        : 16
core id         : 1
cpu cores       : 8
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
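One way to read a /proc/cpuinfo dump like the one attached above is to group logical processors by (physical id, core id): processors that share both values are SMT siblings. The following is a minimal sketch, assuming the usual one `key : value` pair per line layout; the function names are hypothetical, and SAMPLE holds only the topology fields of the four processors shown in this mail.

```python
from collections import defaultdict


def _store(groups, record):
    """Add one parsed processor record to its (package, core) group."""
    needed = ("processor", "physical id", "core id")
    if all(k in record for k in needed):
        key = (int(record["physical id"]), int(record["core id"]))
        groups[key].append(int(record["processor"]))


def smt_siblings(cpuinfo_text):
    """Map (physical id, core id) -> list of logical processor numbers."""
    groups = defaultdict(list)
    record = {}
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            continue
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name == "processor" and record:
            _store(groups, record)  # a new record starts; keep the old one
            record = {}
        record[name] = value
    _store(groups, record)          # keep the last record
    return dict(groups)


# Topology fields of the four processors from the dump in this mail.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0
apicid : 0
processor : 1
physical id : 0
core id : 0
apicid : 1
processor : 2
physical id : 0
core id : 1
apicid : 2
processor : 3
physical id : 0
core id : 1
apicid : 3
"""

if __name__ == "__main__":
    for (package, core), cpus in sorted(smt_siblings(SAMPLE).items()):
        print(f"package {package} core {core}: logical cpus {cpus}")
```

For the part of the dump that is visible here, processors 0 and 1 land on core 0 and processors 2 and 3 on core 1, i.e. two threads per core as seen by the guest.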
Re: [Xen-devel] schedulers and topology exposing questions
On Wed, Jan 27, 2016 at 02:01:35PM +, Dario Faggioli wrote: > On Fri, 2016-01-22 at 11:54 -0500, Elena Ufimtseva wrote: > > Hello all! > > > Hey, here I am again, > > > Konrad came up with a workaround that was setting the flag for domain > > scheduler in linux > > As the guest is not aware of SMT-related topology, it has a flat > > topology initialized. > > Kernel has domain scheduler flags for scheduling domain CPU set to > > 4143 for 2.6.39. > > Konrad discovered that changing the flag for CPU sched domain to 4655 > > > So, as you've seen, I also have been up to doing quite a few of > benchmarking doing soemthing similar (I used more recent kernels, and > decided to test 4131 as flags. > > In your casse, according to this: > http://lxr.oss.org.cn/source/include/linux/sched.h?v=2.6.39#L807 > > 4655 means: > SD_LOAD_BALANCE | > SD_BALANCE_EXEC | > > SD_BALANCE_WAKE | > SD_PREFER_LOCAL | [*] > > SD_SHARE_PKG_RESOURCES | > SD_SERIALIZE > > and another bit (0x4000) that I don't immediately see what it is. > > Things have changed a bit since then, it appears. However, I'm quite sure > I've tested turning on SD_SERIALIZE in 4.2.0 and 4.3.0, and results were > really pretty bad (as you also seem to say later). > > > works as a workaround and makes Linux think that the topology has SMT > > threads. > > > Well, yes and no. :-). I don't want to make this all a terminology > bunfight, something that also matters here is how many scheduling > domains you have. > > To check that (although in recent kernels) you check here: > > ls /proc/sys/kernel/sched_domain/cpu2/ (any cpu is ok) > > and see how many domain[0-9] you have. > > On baremetal, on an HT cpu, I've got this: > > $ cat /proc/sys/kernel/sched_domain/cpu2/domain*/name > SMT > MC > > So, two domains, one of which is the SMT one. If you check their flags, > they're different: > > $ cat /proc/sys/kernel/sched_domain/cpu2/domain*/flags > 4783 > 559 > > So, yes, you are right in saying that 4655 is related to SMT. 
In fact, > it is what (among other things) tells the load balancer that *all* the > cpus (well, all the scheduling groups, actually) in this domain are SMT > siblings... Which is a legitimate thing to do, but it's not what > happens on SMT baremetal. > > At least is consistent, IMO. I.e., it still creates a pretty flat > topology, like there was a big core, of which _all_ the vcpus are part > of, as SMT siblings. > > The other option (the one I'm leaning toward) was too get rid of that > one flag. I've only done preliminary experiments with it on and off, > and the ones with it off were better looking, so I did keep it off for > the big run... but we can test with it again. > > > This workaround makes the test to complete almost in same time as on > > baremetal (or insignificantly worse). > > > > This workaround is not suitable for kernels of higher versions as we > > discovered. > > > There may be more than one reason for this (as said, a lot changed!) > but it matches what I've found when SD_SERIALIZE was kept on for the > scheduling domain where all the vcpus are. > > > The hackish way of making domU linux think that it has SMT threads > > (along with matching cpuid) > > made us thinks that the problem comes from the fact that cpu topology > > is not exposed to > > guest and Linux scheduler cannot make intelligent decision on > > scheduling. > > > As said, I think it's the other way around: we expose too much of it > (and this is more of an issue for PV rather than for HVM). Basically, > either you do the pinning you're doing or, whatever you expose, will be > *wrong*... and the only way to expose not wrong data is to actually > don't expose anything! :-) > > > The test described above was labeled as IO-bound test. > > > > We have run io-bound test with and without smt-patches. The > > improvement comparing > > to base case (no smt patches, flat topology) shows 22-23% gain. 
> > > I'd be curious to see the content of the /proc/sys/kernel/sched_domain > directory and subdirectories with Joao's patches applied. > > > While we have seen improvement with io-bound tests, the same did not > > happen with cpu-bound workload. > > As cpu-bound test we use kernel module which runs requested number of > > kernel threads > > and each thread compresses and decompresses some data. > > > That is somewhat what I would have expected, although up to what > extent, it's hard to tell in advance. > > It also matches my fin
Re: [Xen-devel] schedulers and topology exposing questions
On Wed, Jan 27, 2016 at 10:27:01AM -0500, Konrad Rzeszutek Wilk wrote: > On Wed, Jan 27, 2016 at 03:10:01PM +, George Dunlap wrote: > > On 27/01/16 14:33, Konrad Rzeszutek Wilk wrote: > > > On Tue, Jan 26, 2016 at 11:21:36AM +, George Dunlap wrote: > > >> On 22/01/16 16:54, Elena Ufimtseva wrote: > > >>> Hello all! > > >>> > > >>> Dario, Gerorge or anyone else, your help will be appreciated. > > >>> > > >>> Let me put some intro to our findings. I may forget something or put > > >>> something > > >>> not too explicit, please ask me. > > >>> > > >>> Customer filled a bug where some of the applications were running slow > > >>> in their HVM DomU setups. > > >>> These running times were compared against baremetal running same kernel > > >>> version as HVM DomU. > > >>> > > >>> After some investigation by different parties, the test case scenario > > >>> was found > > >>> where the problem was easily seen. The test app is a udp server/client > > >>> pair where > > >>> client passes some message n number of times. > > >>> The test case was executed on baremetal and Xen DomU with kernel > > >>> version 2.6.39. > > >>> Bare metal showed 2x times better result that DomU. > > >>> > > >>> Konrad came up with a workaround that was setting the flag for domain > > >>> scheduler in linux > > >>> As the guest is not aware of SMT-related topology, it has a flat > > >>> topology initialized. > > >>> Kernel has domain scheduler flags for scheduling domain CPU set to 4143 > > >>> for 2.6.39. > > >>> Konrad discovered that changing the flag for CPU sched domain to 4655 > > >>> works as a workaround and makes Linux think that the topology has SMT > > >>> threads. > > >>> This workaround makes the test to complete almost in same time as on > > >>> baremetal (or insignificantly worse). > > >>> > > >>> This workaround is not suitable for kernels of higher versions as we > > >>> discovered. 
> > >>> > > >>> The hackish way of making domU linux think that it has SMT threads > > >>> (along with matching cpuid) > > >>> made us thinks that the problem comes from the fact that cpu topology > > >>> is not exposed to > > >>> guest and Linux scheduler cannot make intelligent decision on > > >>> scheduling. > > >>> > > >>> Joao Martins from Oracle developed set of patches that fixed the > > >>> smt/core/cashe > > >>> topology numbering and provided matching pinning of vcpus and enabling > > >>> options, > > >>> allows to expose to guest correct topology. > > >>> I guess Joao will be posting it at some point. > > >>> > > >>> With this patches we decided to test the performance impact on > > >>> different kernel versionand Xen versions. > > >>> > > >>> The test described above was labeled as IO-bound test. > > >> > > >> So just to clarify: The client sends a request (presumably not much more > > >> than a ping) to the server, and waits for the server to respond before > > >> sending another one; and the server does the reverse -- receives a > > >> request, responds, and then waits for the next request. Is that right? > > > > > > Yes. > > >> > > >> How much data is transferred? > > > > > > 1 packet, UDP > > >> > > >> If the amount of data transferred is tiny, then the bottleneck for the > > >> test is probably the IPI time, and I'd call this a "ping-pong" > > >> benchmark[1]. I would only call this "io-bound" if you're actually > > >> copying large amounts of data. > > > > > > What we found is that on baremetal the scheduler would put both apps > > > on the same CPU and schedule them right after each other. This would > > > have a high IPI as the scheduler would poke itself. > > > On Xen it would put the two applications on seperate CPUs - and there > > > would be hardly any IPI. > > > > Sorry -- why would the scheduler send itself an IPI if it's on the same > > logical cpu (which seems pretty pointless),
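The request/response pattern discussed above (send one small UDP message, wait for the reply, repeat n times) can be sketched as follows. This is not the customer's test program, which is not shown in the thread; the function names, the loopback address, the payload and the timeouts are all assumptions made for illustration.

```python
import socket
import threading
import time


def serve(sock, n):
    """Echo n datagrams back to whoever sent them."""
    for _ in range(n):
        data, peer = sock.recvfrom(64)
        sock.sendto(data, peer)


def ping_pong(n=1000, host="127.0.0.1"):
    """Return the mean round-trip time in seconds over n exchanges."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((host, 0))            # port 0: let the OS pick a free port
    server.settimeout(5.0)
    addr = server.getsockname()
    worker = threading.Thread(target=serve, args=(server, n), daemon=True)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5.0)
    start = time.perf_counter()
    for _ in range(n):
        client.sendto(b"ping", addr)  # one packet out ...
        client.recvfrom(64)           # ... then block until the reply arrives
    elapsed = time.perf_counter() - start

    worker.join()
    client.close()
    server.close()
    return elapsed / n


if __name__ == "__main__":
    print(f"mean round trip: {ping_pong() * 1e6:.1f} us")
```

Because each side spends most of its time blocked waiting for the other, a test of this shape mostly measures wakeup and scheduling latency rather than data throughput, which is why it is sensitive to where the scheduler places the two tasks.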
Re: [Xen-devel] schedulers and topology exposing questions
On Fri, Jan 22, 2016 at 06:29:19PM +0100, Dario Faggioli wrote: > On Fri, 2016-01-22 at 11:54 -0500, Elena Ufimtseva wrote: > > Hello all! > > > Hello, > > > Let me put some intro to our findings. I may forget something or put > > something > > not too explicit, please ask me. > > > > Customer filled a bug where some of the applications were running > > slow in their HVM DomU setups. > > These running times were compared against baremetal running same > > kernel version as HVM DomU. > > > > After some investigation by different parties, the test case scenario > > was found > > where the problem was easily seen. The test app is a udp > > server/client pair where > > client passes some message n number of times. > > The test case was executed on baremetal and Xen DomU with kernel > > version 2.6.39. > > Bare metal showed 2x times better result that DomU. > > > > Konrad came up with a workaround that was setting the flag for domain > > scheduler in linux > > As the guest is not aware of SMT-related topology, it has a flat > > topology initialized. > > Kernel has domain scheduler flags for scheduling domain CPU set to > > 4143 for 2.6.39. > > Konrad discovered that changing the flag for CPU sched domain to 4655 > > works as a workaround and makes Linux think that the topology has SMT > > threads. > > This workaround makes the test to complete almost in same time as on > > baremetal (or insignificantly worse). > > > > This workaround is not suitable for kernels of higher versions as we > > discovered. > > > > The hackish way of making domU linux think that it has SMT threads > > (along with matching cpuid) > > made us thinks that the problem comes from the fact that cpu topology > > is not exposed to > > guest and Linux scheduler cannot make intelligent decision on > > scheduling. > > > So, me an Juergen (from SuSE) have been working on this for a while > too. 
> > As far as my experiments go, there are at least two different issues, > both traceable to Linux's scheduler behavior. One has to do with what > you just say, i.e., topology. > > Juergen has developed a set of patches, and I'm running benchmarks with > them applied to both Dom0 and DomU, to see how they work. > > I'm not far from finishing running a set of 324 different test cases > (each one run both without and with Juergen's patches). I am running > different benchmarks, such as: > - iperf, > - a Xen build, > - sysbench --oltp, > - sysbench --cpu, > - unixbench > > and I'm also varying how loaded the host is, how big the VMs are, and > how loaded the VMs are.

That's pretty cool. I also tried oversubscribed tests in my runs.

> > 324 is the result of various combinations of the above... It's quite an > extensive set! :-P

It is! Even with my few tests it's a lot of work.

> > As soon as everything finishes running, I'll data mine the results, and > let you know what they look like. > > > The other issue that I've observed is that tweaking some _non_ topology > related scheduling domains' flags also impact performance, sometimes in > a quite sensible way. > > I have got the results from the 324 test cases described above of > running with flags set to 4131 inside all the DomUs. That value was > chosen after quite a bit of preliminary benchmarking and investigation > as well. > > I'll share the results of that data set as well as soon as I manage to > extract them from the raw output. > > > Joao Martins from Oracle developed set of patches that fixed the > > smt/core/cache > > topology numbering and provided matching pinning of vcpus and > > enabling options, > > allows to expose to guest correct topology. > > I guess Joao will be posting it at some point. > > > That is one way of approaching the topology issue.
The other, which is > what me and Juergen are pursuing, is the opposite one, i.e., make the > DomU (and Dom0, actually) think that the topology is always completely > flat. > > I think, ideally, we want both: flat topology as the default, if no > pinning is specified. Matching topology if it is. > > > With these patches we decided to test the performance impact on > > different kernel versions and Xen versions. > > > That is really interesting, and thanks a lot for sharing it with us. > > I'm in the middle of something here, so I just wanted to quickly let > you know that we're also working on something related... I'll have a > look at the rest of the email and at the graphs ASAP.

Great! I am attaching the io and cpu-bound tests that were used to get the data. Thanks Dario!

> > Thanks again and Regards, > Dario > -- > <> (Raistlin Majere) > - > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) >

perf_tests.tar.gz Description: application/gzip

___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
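For readers following the scheduling-domain flag values quoted in this thread (4143, 4655 and 4131), here is a minimal sketch that only does the arithmetic: it shows which bits differ between two flag words. The function names are made up for this illustration. Which SD_* flag each bit corresponds to depends on the kernel version's scheduler headers, so no flag names are assumed here.

```c
#include <assert.h>
#include <stdio.h>

/* Return the bits that are set in exactly one of the two flag words. */
unsigned int flag_diff(unsigned int a, unsigned int b)
{
    return a ^ b;
}

/* Print one line per differing bit, saying whether 'to' sets or clears it. */
void print_flag_diff(unsigned int from, unsigned int to)
{
    unsigned int diff = flag_diff(from, to);
    unsigned int bit;

    printf("%u (0x%x) -> %u (0x%x)\n", from, from, to, to);
    /* Unsigned shift wraps to 0 after the top bit, which ends the loop. */
    for ( bit = 1; bit != 0; bit <<= 1 )
        if ( diff & bit )
            printf("  bit 0x%04x %s\n", bit, (to & bit) ? "set" : "cleared");
}

/* The three values quoted in the thread, compared against the 2.6.39 default. */
void check_thread_values(void)
{
    assert(flag_diff(4143, 4655) == 0x200); /* the workaround sets one extra bit */
    assert(flag_diff(4143, 4131) == 0x00c); /* 4131 clears two low bits */
}
```

Calling print_flag_diff(4143, 4655) lists the single bit that the workaround adds; looking that bit up in the headers of the kernel under test tells which scheduler behaviour it toggles.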
Re: [Xen-devel] how to enable kdb for xen
On Fri, Dec 18, 2015 at 11:24 PM, quizyjones wrote: > Is there any progress?

Hey, I did look into this and I could not find any trace of what I had done before. So I decided to try and port it to the current version from this Mukesh patch: http://lists.xen.org/archives/html/xen-devel/2014-04/msg3.html It looks like it applied without major issues. I have not tested it yet, but it's in my plan for next week.

Elena

> >> Date: Wed, 16 Dec 2015 09:42:47 -0500 >> From: ufimts...@gmail.com >> To: konrad.w...@oracle.com >> CC: elena.ufimts...@oracle.com; quizy_jo...@outlook.com; t...@xen.org; >> xen-devel@lists.xen.org >> Subject: Re: [Xen-devel] how to enable kdb for xen > >> >> On Wed, Dec 16, 2015 at 9:30 AM, Konrad Rzeszutek Wilk >> wrote: >> > On December 16, 2015 3:08:04 AM EST, quizyjones >> > wrote: >> >>The version embedded with kdb only updates to 4.1.0. How can I use it >> >>with xen 4.6? Or is there any other debuggers which can step in Xen? >> > >> > CCing Elena who poked at it some point. Not sure if she got it ported >> > over though. >> >> >> >>From: quizy_jo...@outlook.com >> >>To: xen-devel@lists.xen.org >> >>Date: Wed, 16 Dec 2015 06:57:02 + >> >>Subject: [Xen-devel] how to enable kdb for xen >> >> >> >> >> >> >> >> >> >>I tried to debug xen use kdb. After compiling xen with debug=y, is >> >>there any further steps I should take? I can get console outputs start >> >>with: Xen 4.4.1 (XEN) Xen version 4.4.1 (root@) (gcc >> >>(Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4) debug=y Wed Dec 16 11:01:14 >> >>.But I can't step into the boot procedure. The kdb seems not built >> >>in and there is no kdb folder in /tools/debugger. How can I build >> >>xen-4.4.1/xen-4.4.6 with kdb? >> >> Hey! >> If I recall correctly, I did try to port kdb. Let me find out what >> happened there. 
>> >> Elena

-- Elena ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] how to enable kdb for xen
On Wed, Dec 16, 2015 at 9:30 AM, Konrad Rzeszutek Wilk wrote: > On December 16, 2015 3:08:04 AM EST, quizyjones > wrote: >>The version embedded with kdb only updates to 4.1.0. How can I use it >>with xen 4.6? Or is there any other debuggers which can step in Xen? > > CCing Elena who poked at it some point. Not sure if she got it ported over > though. >> >>From: quizy_jo...@outlook.com >>To: xen-devel@lists.xen.org >>Date: Wed, 16 Dec 2015 06:57:02 + >>Subject: [Xen-devel] how to enable kdb for xen >> >> >> >> >>I tried to debug xen use kdb. After compiling xen with debug=y, is >>there any further steps I should take? I can get console outputs start >>with:Xen 4.4.1(XEN) Xen version 4.4.1 (root@) (gcc >>(Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4) debug=y Wed Dec 16 11:01:14 >>.But I can't step into the boot procedure. The kdb seems not built >>in and there is no kdb folder in /tools/debugger. How can I build >>xen-4.4.1/xen-4.4.6 with kdb?

Hey! If I recall correctly, I did try to port kdb. Let me find out what happened there.

Elena

-- Elena ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v12 3/3] iommu: add rmrr Xen command line option for extra rmrrs
On Fri, Nov 06, 2015 at 04:05:25AM -0700, Jan Beulich wrote: > >>> On 06.11.15 at 05:22,wrote: > > On Wed, Oct 28, 2015 at 10:05:31AM -0600, Jan Beulich wrote: > >> >>> On 27.10.15 at 21:36, wrote: > >> > +static void __init add_extra_rmrr(void) > >> > +{ > >> > +struct acpi_rmrr_unit *acpi_rmrr; > >> > +struct acpi_rmrr_unit *rmrru; > >> > +unsigned int dev, seg, i; > >> > +unsigned long pfn; > >> > +bool_t overlap; > >> > + > >> > +for ( i = 0; i < nr_rmrr; i++ ) > >> > +{ > >> > +if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn > >> > ) > >> > +{ > >> > +printk(XENLOG_ERR VTDPREFIX > >> > + "Invalid RMRR Range "ERMRRU_FMT"\n", > >> > + ERMRRU_ARG(extra_rmrr_units[i])); > >> > +continue; > >> > +} > >> > + > >> > +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn > >> > >= > >> > + MAX_EXTRA_RMRR_PAGES ) > >> > +{ > >> > +printk(XENLOG_ERR VTDPREFIX > >> > + "RMRR range "ERMRRU_FMT" exceeds > >> > "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n", > >> > + ERMRRU_ARG(extra_rmrr_units[i])); > >> > +continue; > >> > +} > >> > + > >> > +overlap = 0; > >> > +list_for_each_entry(rmrru, _rmrr_units, list) > >> > +{ > >> > +if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn) < > >> > rmrru->end_address && > >> > + rmrru->base_address < > >> > pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1) ) > >> > >> Aren't both ranges inclusive? I.e. shouldn't the first one be <= (and > >> the second one could be <= too when dropping the +1), matching > >> the check acpi_parse_one_rmrr() does? > > > > The end_address is not inclusive, while the start_address is. > > These to from rmrr_identity_mapping() > > ... > > ASSERT(rmrr->base_address < rmrr->end_address); > > > > These are byte-granular addresses. > > > and: > > ... > > while ( base_pfn < end_pfn ) > > { > > int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag); > > > > > > > > if ( err ) > > > > > > return err; > > > > > > base_pfn++; > > > > > > } > > ... 
> > > > I think this condition should not be a problem. But yes, it's not uniform > > with acpi_parse_one_rmrr. > > Did you actually pay attention to how end_pfn gets calculated? > > > I guess I should send another version then? > > Yes of course.

Ok, I see your point.

> > >> > +} > >> > +if ( seg != PCI_SEG(extra_rmrr_units[i].sbdf[0]) ) > >> > +{ > >> > +printk(XENLOG_ERR VTDPREFIX > >> > + "Segments are not equal for RMRR range > >> > "ERMRRU_FMT"\n", > >> > + ERMRRU_ARG(extra_rmrr_units[i])); > >> > +scope_devices_free(&acpi_rmrr->scope); > >> > +xfree(acpi_rmrr); > >> > +continue; > >> > +} > >> > + > >> > +acpi_rmrr->segment = seg; > >> > +acpi_rmrr->base_address = > > pfn_to_paddr(extra_rmrr_units[i].base_pfn); > >> > +acpi_rmrr->end_address = > >> > pfn_to_paddr(extra_rmrr_units[i].end_pfn + > > 1); > >> > >> And this seems wrong too, unless I'm mistaken with the inclusive-ness. > >> > > The end_address is exclusive, see above. > No - see above.

You are right, I actually meant to say end_pfn for extra rmrr is not inclusive. And this case is only valid when base_pfn == end_pfn, as the parser does not take care of the case where there is only one pfn specified. The assumption in this case is that the user meant [base_pfn, base_pfn + 1]. I think it will be safe to add the condition when incrementing.

> > Jan >

___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
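The inclusive versus exclusive question debated above is easier to see in isolation. The following is a minimal standalone sketch, not Xen code: the types, the function names and the 4 KiB page size are assumptions made for illustration. It converts an inclusive pfn range to an inclusive byte range and compares two inclusive byte ranges with <= on both sides. If one side instead stored an exclusive end, the matching comparison against that end would be a strict <.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Inclusive page-frame range, as it would be given on the command line. */
struct pfn_range {
    uint64_t base_pfn, end_pfn;
};

/* Inclusive byte range: 'last' is the final byte that belongs to it. */
struct byte_range {
    uint64_t first, last;
};

/* Convert an inclusive pfn range to an inclusive byte range. */
struct byte_range pfn_to_byte_range(struct pfn_range r)
{
    struct byte_range b;

    assert(r.base_pfn <= r.end_pfn);
    b.first = r.base_pfn << SKETCH_PAGE_SHIFT;
    /* First byte of the page after end_pfn, minus one. */
    b.last = ((r.end_pfn + 1) << SKETCH_PAGE_SHIFT) - 1;
    return b;
}

/* Two inclusive ranges overlap when each starts no later than the other ends. */
bool byte_ranges_overlap(struct byte_range a, struct byte_range b)
{
    return a.first <= b.last && b.first <= a.last;
}
```

With these definitions a single-page range (base_pfn == end_pfn) covers exactly one page, and two ranges that share only their boundary byte are reported as overlapping, which a strict < on inclusive ends would miss.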
Re: [Xen-devel] [PATCH v12 3/3] iommu: add rmrr Xen command line option for extra rmrrs
On Wed, Oct 28, 2015 at 10:05:31AM -0600, Jan Beulich wrote: > >>> On 27.10.15 at 21:36,wrote: > > +static void __init add_extra_rmrr(void) > > +{ > > +struct acpi_rmrr_unit *acpi_rmrr; > > +struct acpi_rmrr_unit *rmrru; > > +unsigned int dev, seg, i; > > +unsigned long pfn; > > +bool_t overlap; > > + > > +for ( i = 0; i < nr_rmrr; i++ ) > > +{ > > +if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Invalid RMRR Range "ERMRRU_FMT"\n", > > + ERMRRU_ARG(extra_rmrr_units[i])); > > +continue; > > +} > > + > > +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >= > > + MAX_EXTRA_RMRR_PAGES ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "RMRR range "ERMRRU_FMT" exceeds > > "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n", > > + ERMRRU_ARG(extra_rmrr_units[i])); > > +continue; > > +} > > + > > +overlap = 0; > > +list_for_each_entry(rmrru, _rmrr_units, list) > > +{ > > +if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn) < > > rmrru->end_address && > > + rmrru->base_address < > > pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1) ) > > Aren't both ranges inclusive? I.e. shouldn't the first one be <= (and > the second one could be <= too when dropping the +1), matching > the check acpi_parse_one_rmrr() does? The end_address is not inclusive, while the start_address is. These to from rmrr_identity_mapping() ... ASSERT(rmrr->base_address < rmrr->end_address); and: ... while ( base_pfn < end_pfn ) { int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag); if ( err ) return err; base_pfn++; } ... I think this condition should not be a problem. But yes, its not uniform with acpi_parse_one_rmrr. I guess I should send another version then? 
> > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Overlapping RMRRs: "ERMRRU_FMT" and [%lx-%lx]\n", > > + ERMRRU_ARG(extra_rmrr_units[i]), > > + paddr_to_pfn(rmrru->base_address), > > + paddr_to_pfn(rmrru->end_address)); > > +overlap = 1; > > +break; > > +} > > +} > > +/* Don't add overlapping RMRR. */ > > +if ( overlap ) > > +continue; > > + > > +pfn = extra_rmrr_units[i].base_pfn; > > +do > > +{ > > +if ( !mfn_valid(pfn) ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Invalid pfn in RMRR range "ERMRRU_FMT"\n", > > + ERMRRU_ARG(extra_rmrr_units[i])); > > +break; > > +} > > +} while ( pfn++ < extra_rmrr_units[i].end_pfn ); > > + > > +/* Invalid pfn in range as the loop ended before end_pfn was > > reached. */ > > +if ( pfn <= extra_rmrr_units[i].end_pfn ) > > +continue; > > + > > +acpi_rmrr = xzalloc(struct acpi_rmrr_unit); > > +if ( !acpi_rmrr ) > > +return; > > + > > +acpi_rmrr->scope.devices = xmalloc_array(u16, > > + > > extra_rmrr_units[i].dev_count); > > +if ( !acpi_rmrr->scope.devices ) > > +{ > > +xfree(acpi_rmrr); > > +return; > > +} > > + > > +seg = 0; > > +for ( dev = 0; dev < extra_rmrr_units[i].dev_count; dev++ ) > > +{ > > +acpi_rmrr->scope.devices[dev] = extra_rmrr_units[i].sbdf[dev]; > > +seg = seg | PCI_SEG(extra_rmrr_units[i].sbdf[dev]); > > Once again - |= please. > Missed this one. > > +} > > +if ( seg != PCI_SEG(extra_rmrr_units[i].sbdf[0]) ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Segments are not equal for RMRR range "ERMRRU_FMT"\n", > > + ERMRRU_ARG(extra_rmrr_units[i])); > > +scope_devices_free(_rmrr->scope); > > +xfree(acpi_rmrr); > > +continue; > > +} > > + > > +acpi_rmrr->segment = seg; > > +acpi_rmrr->base_address = > > pfn_to_paddr(extra_rmrr_units[i].base_pfn); > > +acpi_rmrr->end_address = pfn_to_paddr(extra_rmrr_units[i].end_pfn > > + 1); > > And this seems wrong too, unless I'm mistaken with the inclusive-ness. > The end_address is exclusive, see
Re: [Xen-devel] [PATCH v11 3/3] iommu: add rmrr Xen command line option for extra rmrrs
On Mon, Oct 26, 2015 at 07:38:06AM -0600, Jan Beulich wrote: > >>> On 22.10.15 at 19:13, <elena.ufimts...@oracle.com> wrote: > > From: Elena Ufimtseva <elena.ufimts...@oracle.com> > > > > On some platforms RMRR regions may be not specified in ACPI and thus will > > not > > be mapped 1:1 in dom0. > Thanks Jan for review. > I think this may be misleading to readers: It sounds as if there was > the option for RMRRs to not be specified in ACPI tables, while in > fact this is a firmware bug. How about "On some platforms firmware > fails to specify RMRR regions may in ACPI tables, and thus those > regions will not be mapped in dom0 or guests the respective device(s) > get passed through to"? > Agree, makes more sense. > > +static void __init add_extra_rmrr(void) > > +{ > > +struct acpi_rmrr_unit *acpi_rmrr; > > +struct acpi_rmrr_unit *rmrru; > > +unsigned int dev, seg, i; > > +unsigned long pfn; > > +bool_t overlap; > > + > > +for ( i = 0; i < nr_rmrr; i++ ) > > +{ > > +if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Invalid RMRR Range "ERMRRU_FMT"\n", > > + ERMRRU_ARG(extra_rmrr_units[i])); > > +continue; > > +} > > + > > +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >= > > + MAX_EXTRA_RMRR_PAGES ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "RMRR range "ERMRRU_FMT" exceeds > > "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n", > > + ERMRRU_ARG(extra_rmrr_units[i])); > > +continue; > > +} > > + > > +overlap = 0; > > +list_for_each_entry(rmrru, _rmrr_units, list) > > +{ > > +if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn ) < > > rmrru->end_address && > > Stray blank inside the inner parentheses. 
> > > + rmrru->base_address < > > pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1) ) > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Overlapping RMRRs: "ERMRRU_FMT" and [%lx - %lx]\n", > > ERMRRU_FMT doesn't have any blanks inside the square brackets, > so I'd suggest the other format to nt have them either. > > > + ERMRRU_ARG(extra_rmrr_units[i]), > > + paddr_to_pfn(rmrru->base_address), > > + paddr_to_pfn(rmrru->end_address)); > > +overlap = 1; > > +break; > > +} > > +} > > +/* Dont add overlapping RMRR */ > > "Don't" and missing full stop. > > > +if ( overlap ) > > +continue; > > + > > +pfn = extra_rmrr_units[i].base_pfn; > > +do > > +{ > > +if ( !mfn_valid(pfn) || (pfn >> (paddr_bits - PAGE_SHIFT)) ) > > Actually I think the right side is redundant with the max_pfn check > mfn_valid() does. > > > +{ > > +printk(XENLOG_ERR VTDPREFIX > > + "Invalid pfn in RMRR range "ERMRRU_FMT"\n", > > + ERMRRU_ARG(extra_rmrr_units[i])); > > +break; > > Wrong indentation. > > > +} > > + > > +} while ( pfn++ < extra_rmrr_units[i].end_pfn ); > > Stray blank line before the end of the do/while body. > > > + > > +/* Invalid pfn in range as the loop ended before end_pfn was > > reached. */ > > +if ( pfn <= extra_rmrr_units[i].end_pfn ) > > +continue; > > + > > +acpi_rmrr = xzalloc(struct acpi_rmrr_unit); > > +if ( !acpi_rmrr ) > > +return; > > + > > +acpi_rmrr->scope.devices = xmalloc_array(u16, > > + > > extra_rmrr_units[i].dev_count); > > +if ( !acpi_rmrr->scope.devices ) > > +{ > > +xfree(acpi_rmrr); > > +return; > > +} > > + > > +seg = 0; > > +for ( dev = 0; dev < extra_rmrr_units[i].dev_count; dev++ ) > > +{ > > +acpi_rmrr->scope.devices[dev] = extra_rmrr_
[Xen-devel] [PATCH v12 0/3] iommu: add rmrr Xen command line option
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Sending v12 with mostly cosmetic fixes from Jan's review on v11.

Add Xen command line option rmrr to specify RMRR regions that are not
defined in ACPI thus causing IO Page Fault while booting dom0 in PVH mode.
These additional regions will be added to the list of RMRR regions parsed
from ACPI.

Changes in v11:
- changed macro to print extra RMRR ranges and added argument macro;
- fixed the overlapping check if condition error;
- fixed the loop exit condition when checking pfn in RMRR region;

Elena Ufimtseva (3):
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown | 13 ++
 xen/drivers/passthrough/vtd/dmar.c | 320 +---
 xen/drivers/pci/pci.c | 11 ++
 xen/include/xen/pci.h | 3 +
 4 files changed, 285 insertions(+), 62 deletions(-)

-- 
2.1.4

___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v12 1/3] iommu VT-d: separate rmrr addition function
From: Elena Ufimtseva <elena.ufimts...@oracle.com> In preparation for auxiliary RMRR data provided on Xen command line, make RMRR adding a separate function. Also free memory for rmrr device scope in error path. Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> --- xen/drivers/passthrough/vtd/dmar.c | 126 +++-- 1 file changed, 65 insertions(+), 61 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 7cad593..2f315aa 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -583,6 +583,68 @@ out: return ret; } +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru) +{ +bool_t ignore = 0; +unsigned int i = 0; +int ret = 0; + +/* Skip checking if segment is not accessible yet. */ +if ( !pci_known_segment(rmrru->segment) ) +i = UINT_MAX; + +for ( ; i < rmrru->scope.devices_cnt; i++ ) +{ +u8 b = PCI_BUS(rmrru->scope.devices[i]); +u8 d = PCI_SLOT(rmrru->scope.devices[i]); +u8 f = PCI_FUNC(rmrru->scope.devices[i]); + +if ( pci_device_detect(rmrru->segment, b, d, f) == 0 ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, +" Non-existent device (%04x:%02x:%02x.%u) is reported" +" in RMRR (%"PRIx64", %"PRIx64")'s scope!\n", +rmrru->segment, b, d, f, +rmrru->base_address, rmrru->end_address); +ignore = 1; +} +else +{ +ignore = 0; +break; +} +} + +if ( ignore ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, +" Ignore the RMRR (%"PRIx64", %"PRIx64") due to " +"devices under its scope are not PCI discoverable!\n", +rmrru->base_address, rmrru->end_address); +scope_devices_free(&rmrru->scope); +xfree(rmrru); +} +else if ( rmrru->base_address > rmrru->end_address ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, +" The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n", +rmrru->base_address, rmrru->end_address); +scope_devices_free(&rmrru->scope); +xfree(rmrru); +ret = -EFAULT; +} +else +{ +if ( iommu_verbose ) +dprintk(VTDPREFIX, +" RMRR region: base_addr %"PRIx64" 
end_address %"PRIx64"\n", +rmrru->base_address, rmrru->end_address); +acpi_register_rmrr_unit(rmrru); +} + +return ret; +} + static int __init acpi_parse_one_rmrr(struct acpi_dmar_header *header) { @@ -633,68 +695,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end, >scope, RMRR_TYPE, rmrr->segment); -if ( ret || (rmrru->scope.devices_cnt == 0) ) -xfree(rmrru); +if ( !ret && (rmrru->scope.devices_cnt != 0) ) +register_one_rmrr(rmrru); else -{ -u8 b, d, f; -bool_t ignore = 0; -unsigned int i = 0; - -/* Skip checking if segment is not accessible yet. */ -if ( !pci_known_segment(rmrr->segment) ) -i = UINT_MAX; - -for ( ; i < rmrru->scope.devices_cnt; i++ ) -{ -b = PCI_BUS(rmrru->scope.devices[i]); -d = PCI_SLOT(rmrru->scope.devices[i]); -f = PCI_FUNC(rmrru->scope.devices[i]); - -if ( !pci_device_detect(rmrr->segment, b, d, f) ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, -" Non-existent device (%04x:%02x:%02x.%u) is reported" -" in RMRR (%"PRIx64", %"PRIx64")'s scope!\n", -rmrr->segment, b, d, f, -rmrru->base_address, rmrru->end_address); -ignore = 1; -} -else -{ -ignore = 0; -break; -} -} - -if ( ignore ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, -" Ignore the RMRR (%"PRIx64", %"PRIx64") due to " -"devices under its scope are not PCI discoverable!\n", -rmrru->base_address, rmrru->end_address); -scope_devices_free(>scope); -xfree(rmrru); -} -else if ( base_addr > end_addr ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, -" The RMRR (%"PRIx64", %"PRI
[Xen-devel] [PATCH v12 3/3] iommu: add rmrr Xen command line option for extra rmrrs
From: Elena Ufimtseva <elena.ufimts...@oracle.com> On some platforms firmware fails to specify RMRR regions in ACPI tables and thus those regions will not be mapped in dom0 or guests and may cause IO Page Faults and prevent dom0 from booting in PVH mode. New Xen command line option rmrr allows to specify such devices and memory regions. These regions are added to the list of RMRR defined in ACPI if the device is present in system. As a result, additional RMRRs will be mapped 1:1 in dom0 with correct permissions. Mentioned above problems were discovered during PVH work with ThinkCentre M and Dell 5600T. No official documentation was found so far in regards to what devices and why cause this. Experiments show that ThinkCentre M USB devices with enabled debug port generate DMA read transactions to the regions of memory marked reserved in host e820 map. For Dell 5600T the device and faulting addresses are not found yet. For detailed history of the discussion please check following threads: http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html Format for rmrr Xen command line option: rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]] If grub2 used and multiple ranges are specified, ';' should be quoted/escaped, refer to grub2 manual for more information. Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com> --- docs/misc/xen-command-line.markdown | 13 +++ xen/drivers/passthrough/vtd/dmar.c | 194 +++- 2 files changed, 206 insertions(+), 1 deletion(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 416e559..92c69ea 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1240,6 +1240,19 @@ Specify the host reboot method. 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by default it will use that method first). 
+### rmrr +> '= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]] + +Define RMRR units that are missing from ACPI table along with device they +belong to and use them for 1:1 mapping. End addresses can be omitted and one +page will be mapped. The ranges are inclusive when start and end are specified. +If segment of the first device is not specified, segment zero will be used. +If other segments are not specified, first device segment will be used. +If a segment is specified for other than the first device and it does not match +the one specified for the first one, an error will be reported. +Note: grub2 requires to escape or use quotations if special characters are used, +namely ';', refer to the grub2 documentation if multiple ranges are specified. + ### ro-hpet > `= ` diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 2f315aa..a9c555e 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -867,6 +867,131 @@ out: return ret; } +#define MAX_EXTRA_RMRR_PAGES 16 +#define MAX_EXTRA_RMRR 10 + +/* RMRR units derived from command line rmrr option. */ +#define MAX_EXTRA_RMRR_DEV 20 +struct extra_rmrr_unit { +struct list_head list; +unsigned long base_pfn, end_pfn; +unsigned int dev_count; +u32 sbdf[MAX_EXTRA_RMRR_DEV]; +}; + +static __initdata unsigned int nr_rmrr; +static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR]; + +/* Macro for RMRR inclusive range formatting. 
*/ +#define ERMRRU_FMT "[%lx-%lx]" +#define ERMRRU_ARG(eru) eru.base_pfn, eru.end_pfn + +static void __init add_extra_rmrr(void) +{ +struct acpi_rmrr_unit *acpi_rmrr; +struct acpi_rmrr_unit *rmrru; +unsigned int dev, seg, i; +unsigned long pfn; +bool_t overlap; + +for ( i = 0; i < nr_rmrr; i++ ) +{ +if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn ) +{ +printk(XENLOG_ERR VTDPREFIX + "Invalid RMRR Range "ERMRRU_FMT"\n", + ERMRRU_ARG(extra_rmrr_units[i])); +continue; +} + +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >= + MAX_EXTRA_RMRR_PAGES ) +{ +printk(XENLOG_ERR VTDPREFIX + "RMRR range "ERMRRU_FMT" exceeds "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n", + ERMRRU_ARG(extra_rmrr_units[i])); +continue; +} + +overlap = 0; +list_for_each_entry(rmrru, _rmrr_units, list) +{ +if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn) < rmrru->end_address && + rmrru->base_address < pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1) ) +{ +printk(XENLOG_ERR VTDPREFIX + "Overlapping R
[Xen-devel] [PATCH v12 2/3] pci: add wrapper for parse_pci
From: Elena Ufimtseva <elena.ufimts...@oracle.com> For sbdf's parsing in RMRR command line add __parse_pci with additional parameter def_seg. __parse_pci will help to identify if segment was found in string being parsed or default segment was used. Make a wrapper parse_pci so the rest of the callers are not affected. Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com> Acked-by: Jan Beulich <jbeul...@suse.com> --- xen/drivers/pci/pci.c | 11 +++ xen/include/xen/pci.h | 3 +++ 2 files changed, 14 insertions(+) diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c index ca07ed0..788a356 100644 --- a/xen/drivers/pci/pci.c +++ b/xen/drivers/pci/pci.c @@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p, unsigned int *bus_p, unsigned int *dev_p, unsigned int *func_p) { +bool_t def_seg; + +return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg); +} + +const char *__init __parse_pci(const char *s, unsigned int *seg_p, + unsigned int *bus_p, unsigned int *dev_p, + unsigned int *func_p, bool_t *def_seg) +{ unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func; if ( *s != ':' ) return NULL; bus = simple_strtoul(s + 1, &s, 16); +*def_seg = 0; if ( *s == ':' ) dev = simple_strtoul(s + 1, &s, 16); else @@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p, dev = bus; bus = seg; seg = 0; +*def_seg = 1; } if ( func_p ) { diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index a5aef55..a7b62a4 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -151,6 +151,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap); int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap); const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus, unsigned int *dev, unsigned int *func); +const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus, + unsigned int *dev, unsigned int *func, bool_t *def_seg); + bool_t 
pcie_aer_get_firmware_first(const struct pci_dev *); -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
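The idea behind the def_seg parameter in the patch above can be tried outside the hypervisor. The following is a minimal standalone sketch using the C library's strtoul rather than Xen's simple_strtoul. The struct and the function name sketch_parse_sbdf are hypothetical and exist only for this illustration, and the handling of the function field is an assumption, since that part of the parser is not shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result type for this sketch; not a Xen structure. */
struct sbdf {
    unsigned long seg, bus, dev, func;
    bool def_seg; /* true when no segment was present in the string */
};

/*
 * Parse "[seg:]bus:dev.func" with hexadecimal fields. Returns a pointer just
 * past the parsed text, or NULL if the string does not match. Note that
 * strtoul also accepts leading whitespace, a sign and an 0x prefix; a
 * stricter parser would reject those.
 */
const char *sketch_parse_sbdf(const char *s, struct sbdf *out)
{
    char *end;
    unsigned long first, second;

    assert(s != NULL && out != NULL);

    first = strtoul(s, &end, 16);
    if ( end == s || *end != ':' )
        return NULL;
    s = end + 1;
    second = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    if ( *end == ':' )
    {
        /* Three colon-separated fields: the first one was the segment. */
        out->seg = first;
        out->bus = second;
        out->def_seg = false;
        s = end + 1;
        out->dev = strtoul(s, &end, 16);
        if ( end == s )
            return NULL;
    }
    else
    {
        /* Only bus:dev was given, so use segment 0 and record that fact. */
        out->seg = 0;
        out->bus = first;
        out->dev = second;
        out->def_seg = true;
    }

    if ( *end != '.' )
        return NULL;
    s = end + 1;
    out->func = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    return end;
}
```

A caller that needs to know whether the user typed a segment, as the rmrr option parsing in this series does, reads def_seg; a caller that does not care can ignore it, which is the role the parse_pci wrapper plays in the patch.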
[Xen-devel] [PATCH v11 2/3] pci: add wrapper for parse_pci
From: Elena Ufimtseva <elena.ufimts...@oracle.com> For sbdf's parsing in RMRR command line add __parse_pci with additional parameter def_seg. __parse_pci will help to identify if segment was found in string being parsed or default segment was used. Make a wrapper parse_pci so the rest of the callers are not affected. Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com> Acked-by: Jan Beulich <jbeul...@suse.com> --- xen/drivers/pci/pci.c | 11 +++ xen/include/xen/pci.h | 3 +++ 2 files changed, 14 insertions(+) diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c index ca07ed0..788a356 100644 --- a/xen/drivers/pci/pci.c +++ b/xen/drivers/pci/pci.c @@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p, unsigned int *bus_p, unsigned int *dev_p, unsigned int *func_p) { +bool_t def_seg; + +return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg); +} + +const char *__init __parse_pci(const char *s, unsigned int *seg_p, + unsigned int *bus_p, unsigned int *dev_p, + unsigned int *func_p, bool_t *def_seg) +{ unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func; if ( *s != ':' ) return NULL; bus = simple_strtoul(s + 1, &s, 16); +*def_seg = 0; if ( *s == ':' ) dev = simple_strtoul(s + 1, &s, 16); else @@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p, dev = bus; bus = seg; seg = 0; +*def_seg = 1; } if ( func_p ) { diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index a5aef55..a7b62a4 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -151,6 +151,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap); int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap); const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus, unsigned int *dev, unsigned int *func); +const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus, + unsigned int *dev, unsigned int *func, bool_t *def_seg); + bool_t 
pcie_aer_get_firmware_first(const struct pci_dev *); -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v11 3/3] iommu: add rmrr Xen command line option for extra rmrrs
From: Elena Ufimtseva <elena.ufimts...@oracle.com> On some platforms RMRR regions may be not specified in ACPI and thus will not be mapped 1:1 in dom0. This causes IO Page Faults and prevents dom0 from booting in PVH mode. New Xen command line option rmrr allows to specify such devices and memory regions. These regions are added to the list of RMRR defined in ACPI if the device is present in system. As a result, additional RMRRs will be mapped 1:1 in dom0 with correct permissions. Mentioned above problems were discovered during PVH work with ThinkCentre M and Dell 5600T. No official documentation was found so far in regards to what devices and why cause this. Experiments show that ThinkCentre M USB devices with enabled debug port generate DMA read transactions to the regions of memory marked reserved in host e820 map. For Dell 5600T the device and faulting addresses are not found yet. For detailed history of the discussion please check following threads: http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html Format for rmrr Xen command line option: rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]] If grub2 used and multiple ranges are specified, ';' should be quoted/escaped, refer to grub2 manual for more information. Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com> --- docs/misc/xen-command-line.markdown | 13 +++ xen/drivers/passthrough/vtd/dmar.c | 196 +++- 2 files changed, 208 insertions(+), 1 deletion(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 416e559..92c69ea 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1240,6 +1240,19 @@ Specify the host reboot method. 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by default it will use that method first). 
+### rmrr +> '= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]] + +Define RMRR units that are missing from ACPI table along with device they +belong to and use them for 1:1 mapping. End addresses can be omitted and one +page will be mapped. The ranges are inclusive when start and end are specified. +If segment of the first device is not specified, segment zero will be used. +If other segments are not specified, first device segment will be used. +If a segment is specified for other than the first device and it does not match +the one specified for the first one, an error will be reported. +Note: grub2 requires to escape or use quotations if special characters are used, +namely ';', refer to the grub2 documentation if multiple ranges are specified. + ### ro-hpet > `= ` diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index ced3239..8cbed88 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -867,6 +867,132 @@ out: return ret; } +#define MAX_EXTRA_RMRR_PAGES 16 +#define MAX_EXTRA_RMRR 10 + +/* RMRR units derived from command line rmrr option. */ +#define MAX_EXTRA_RMRR_DEV 20 +struct extra_rmrr_unit { +struct list_head list; +unsigned long base_pfn, end_pfn; +unsigned int dev_count; +u32 sbdf[MAX_EXTRA_RMRR_DEV]; +}; + +static __initdata unsigned int nr_rmrr; +static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR]; + +/* Macro for RMRR inclusive range formatting. 
*/ +#define ERMRRU_FMT "[%lx-%lx]" +#define ERMRRU_ARG(eru) eru.base_pfn, eru.end_pfn + +static void __init add_extra_rmrr(void) +{ +struct acpi_rmrr_unit *acpi_rmrr; +struct acpi_rmrr_unit *rmrru; +unsigned int dev, seg, i; +unsigned long pfn; +bool_t overlap; + +for ( i = 0; i < nr_rmrr; i++ ) +{ +if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn ) +{ +printk(XENLOG_ERR VTDPREFIX + "Invalid RMRR Range "ERMRRU_FMT"\n", + ERMRRU_ARG(extra_rmrr_units[i])); +continue; +} + +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >= + MAX_EXTRA_RMRR_PAGES ) +{ +printk(XENLOG_ERR VTDPREFIX + "RMRR range "ERMRRU_FMT" exceeds "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n", + ERMRRU_ARG(extra_rmrr_units[i])); +continue; +} + +overlap = 0; +list_for_each_entry(rmrru, _rmrr_units, list) +{ +if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn ) < rmrru->end_address && + rmrru->base_address < pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1) ) +{ +printk(XENLOG_ERR VTDPREFIX + "Overlapping RMRRs: "ERMR
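The two range checks that add_extra_rmrr() performs before accepting a command-line range can be sketched in isolation. This is a minimal standalone model, not the Xen code: struct pfn_range, struct addr_range, range_is_valid() and ranges_overlap() are hypothetical names, PAGE_SHIFT is assumed to be 12, and the ACPI end_address is assumed to be the inclusive last byte of the region. The conditions themselves mirror the ones visible in the hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12              /* assumption: 4K pages */
#define MAX_EXTRA_RMRR_PAGES 16    /* same limit as in the patch */

/* Hypothetical stand-ins for the patch's structures, not Xen definitions. */
struct pfn_range  { unsigned long base_pfn, end_pfn; };        /* inclusive PFNs */
struct addr_range { uint64_t base_address, end_address; };     /* inclusive bytes */

static uint64_t pfn_to_paddr(unsigned long pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}

/* Reject inverted ranges and ranges spanning more than the page limit. */
static bool range_is_valid(const struct pfn_range *r)
{
    if (r->base_pfn > r->end_pfn)
        return false;
    return r->end_pfn - r->base_pfn < MAX_EXTRA_RMRR_PAGES;
}

/* Same comparison as the patch: the extra range is [base, end + 1) in bytes. */
static bool ranges_overlap(const struct pfn_range *extra,
                           const struct addr_range *acpi)
{
    return pfn_to_paddr(extra->base_pfn) < acpi->end_address &&
           acpi->base_address < pfn_to_paddr(extra->end_pfn + 1);
}
```

Because the end PFN is inclusive, a range of exactly MAX_EXTRA_RMRR_PAGES pages (end - base == 15) is still accepted, while one more page is rejected.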
[Xen-devel] [PATCH v11 0/3] iommu: add rmrr Xen command line option
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

It has been a while since v10. There are subtle changes and fewer patches in
the series, and it would be nice to move it out of my way. Please review and
comment.

Add Xen command line option rmrr to specify RMRR regions for devices that are
not defined in ACPI, which causes IO Page Faults while booting dom0 in PVH
mode. These additional regions will be added to the list of RMRR regions
parsed from ACPI.

Changes in v11:
 - changed macro to print extra RMRR ranges and added argument macro;
 - fixed the overlapping check if condition error;
 - fixed the loop exit condition when checking pfn in RMRR region;

Changes in v10:
 - incorporate patch 'dmar: device scope mem leak fix' as series requires it;
 - move patch 'pci: add PCI_SBDF and PCI_SEG macros' close to the last patch
   which uses it;

Changes in v9:
 - skip to next RMRR region if current overlaps with any in acpi_rmrr_units;
 - fix typos in commit messages;
 - remove cleanup changes introduced by mistake in v8;

Elena Ufimtseva (3):
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

Changes in v8:
 - removed bogus debug in patch 1 with non-functional changes;
 - changed PRI_RMRRL macro for formatting to reflect the fact that two
   arguments are used, so make it PRI_RMRR(s,e) for formatting inclusive RMRR
   range; 'L' is also removed from the macro name, where it was meant to
   indicate the type of the arguments (%lx);
 - added overlapping check with RMRRs from ACPI;
 - added check based on paddr_bits for pfns in extra RMRR range (not sure if
   it's redundant with mfn_valid);
 - addressed while loop exit condition in extra RMRRs parser;

 docs/misc/xen-command-line.markdown | 13 ++
 xen/drivers/passthrough/vtd/dmar.c | 322 +---
 xen/drivers/pci/pci.c | 11 ++
 xen/include/xen/pci.h | 3 +
 4 files changed, 287 insertions(+), 62 deletions(-)

-- 
2.1.4

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v11 1/3] iommu VT-d: separate rmrr addition function
From: Elena Ufimtseva <elena.ufimts...@oracle.com> In preparation for auxiliary RMRR data provided on Xen command line, make RMRR adding a separate function. Also free memery for rmrr device scope in error path. Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> --- xen/drivers/passthrough/vtd/dmar.c | 126 +++-- 1 file changed, 65 insertions(+), 61 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 7cad593..ced3239 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -583,6 +583,68 @@ out: return ret; } +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru) +{ +bool_t ignore = 0; +unsigned int i = 0; +int ret = 0; + +/* Skip checking if segment is not accessible yet. */ +if ( !pci_known_segment(rmrru->segment) ) +i = UINT_MAX; + +for ( ; i < rmrru->scope.devices_cnt; i++ ) +{ +u8 b = PCI_BUS(rmrru->scope.devices[i]); +u8 d = PCI_SLOT(rmrru->scope.devices[i]); +u8 f = PCI_FUNC(rmrru->scope.devices[i]); + +if ( pci_device_detect(rmrru->segment, b, d, f) == 0 ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, +" Non-existent device (%04x:%02x:%02x.%u) is reported" +" in RMRR (%"PRIx64", %"PRIx64")'s scope!\n", +rmrru->segment, b, d, f, +rmrru->base_address, rmrru->end_address); +ignore = 1; +} +else +{ +ignore = 0; +break; +} +} + +if ( ignore ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, +" Ignore the RMRR (%"PRIx64", %"PRIx64") due to " +"devices under its scope are not PCI discoverable!\n", +rmrru->base_address, rmrru->end_address); +scope_devices_free(>scope); +xfree(rmrru); +} +else if ( rmrru->base_address > rmrru->end_address ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, +" The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n", +rmrru->base_address, rmrru->end_address); +scope_devices_free(>scope); +xfree(rmrru); +ret = -EFAULT; +} +else +{ +if ( iommu_verbose ) +dprintk(VTDPREFIX, +" RMRR region: base_addr %"PRIx64" 
end_address %"PRIx64"\n", +rmrru->base_address, rmrru->end_address); +acpi_register_rmrr_unit(rmrru); +} + +return ret; +} + static int __init acpi_parse_one_rmrr(struct acpi_dmar_header *header) { @@ -633,68 +695,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end, >scope, RMRR_TYPE, rmrr->segment); -if ( ret || (rmrru->scope.devices_cnt == 0) ) -xfree(rmrru); +if ( !ret && (rmrru->scope.devices_cnt != 0) ) +register_one_rmrr(rmrru); else -{ -u8 b, d, f; -bool_t ignore = 0; -unsigned int i = 0; - -/* Skip checking if segment is not accessible yet. */ -if ( !pci_known_segment(rmrr->segment) ) -i = UINT_MAX; - -for ( ; i < rmrru->scope.devices_cnt; i++ ) -{ -b = PCI_BUS(rmrru->scope.devices[i]); -d = PCI_SLOT(rmrru->scope.devices[i]); -f = PCI_FUNC(rmrru->scope.devices[i]); - -if ( !pci_device_detect(rmrr->segment, b, d, f) ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, -" Non-existent device (%04x:%02x:%02x.%u) is reported" -" in RMRR (%"PRIx64", %"PRIx64")'s scope!\n", -rmrr->segment, b, d, f, -rmrru->base_address, rmrru->end_address); -ignore = 1; -} -else -{ -ignore = 0; -break; -} -} - -if ( ignore ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, -" Ignore the RMRR (%"PRIx64", %"PRIx64") due to " -"devices under its scope are not PCI discoverable!\n", -rmrru->base_address, rmrru->end_address); -scope_devices_free(>scope); -xfree(rmrru); -} -else if ( base_addr > end_addr ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, -" Th
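The decision that register_one_rmrr() makes can be modelled outside of Xen. The sketch below is an illustration under assumptions: classify_rmrr(), enum rmrr_verdict, device_present_fn and the probe_* callbacks are hypothetical names, and the callback stands in for pci_device_detect(). The order of checks follows the hunk above: the scope is scanned until one device is found, an RMRR with no discoverable device is ignored, and only then is base > end treated as an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of the checks; the patch frees or registers the unit. */
enum rmrr_verdict { RMRR_REGISTER, RMRR_IGNORE, RMRR_INVALID };

/* Hypothetical probe callback standing in for pci_device_detect(). */
typedef bool (*device_present_fn)(uint16_t seg, uint16_t bdf);

static enum rmrr_verdict classify_rmrr(uint16_t seg,
                                       const uint16_t *devices, size_t count,
                                       uint64_t base, uint64_t end,
                                       bool segment_known,
                                       device_present_fn present)
{
    bool ignore = false;
    size_t i;

    /* As in the patch: if the segment cannot be probed yet, skip the scan. */
    for (i = segment_known ? 0 : count; i < count; i++) {
        if (present(seg, devices[i])) {
            ignore = false;      /* one existing device is enough */
            break;
        }
        ignore = true;           /* nothing found so far */
    }

    if (ignore)
        return RMRR_IGNORE;      /* no device in scope is discoverable */
    if (base > end)
        return RMRR_INVALID;     /* the patch returns -EFAULT here */
    return RMRR_REGISTER;
}

/* Example probes used only for illustration. */
static bool probe_only_0x10(uint16_t seg, uint16_t bdf)
{
    return seg == 0 && bdf == 0x10;
}

static bool probe_nothing(uint16_t seg, uint16_t bdf)
{
    (void)seg;
    (void)bdf;
    return false;
}
```

Note that an unknown segment skips the scan entirely, so such an RMRR is never ignored for lack of devices; it can still be rejected by the base > end check.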
Re: [Xen-devel] [PATCH v2] PVH Dom0 RMRR IOMMU mapping regression fix
On Mon, Sep 28, 2015 at 01:04:48AM -0600, Jan Beulich wrote:
> >>> On 25.09.15 at 22:59, <elena.ufimts...@oracle.com> wrote:
> > From: Elena Ufimtseva <elena.ufimts...@oracle.com>
> >
> > This patch addresses a regression introduced by commit
> > 5ae03990c120a7b3067a52d9784c9aa72c0705a6 in new set_identity_p2m_entry.
> > RMRRs are not being mapped in IOMMU for PVH Dom0. This causes pages faults and
> > some long 'hang-like' delays during Dom0 PVH boot and device assignments.
> >
> > During construct_dom0, in PVH path p2m is being constructed and identity mapped
> > in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
> > New code used to map RMRRs invoked from rmrr_identity_mapping
> > checks if p2m entry exists with same type and access and if yes, skips iommu
> > mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are not being
> > mapped in IOMMU.
> >
> > As was mentioned in the earlier discussion, the PVH Dom0 construction code
> > should be modified to properly map RMRR regions in IOMMU. Since change will be
> > too invasive, this solution is a temporary fix at this time before better
> > solution is in. Also as Jan mentioned, there is no need in having 'x' permissions
> > for p2m entry of a mmio region, thus changed here.
>
> Well, now that I look at this again I think there could be reasons for
> execute permission to be needed: Code placed in ROM may require
> this. But then again Dom0 shouldn't on its own (i.e. without
> involving the hypervisor) invoke such code, which usually would be
> expecting to be run in root mode ring 0 anyway. So I think not
> defaulting to include X is the right thing. Hence ...
>
> > Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
>
> Reviewed-by: Jan Beulich <jbeul...@suse.com>

Thanks Jan!
Re: [Xen-devel] [RFC PATCH resend] PVH Dom0 RMRR IOMMU mapping regression fix
On Fri, Sep 25, 2015 at 12:36:09AM -0600, Jan Beulich wrote:
> >>> On 25.09.15 at 01:53,wrote:
> > Permissions for p2m entry of read-only
> > mmio regions are left unchanged as leaving only 'r' cause page faults. I am
> > not sure what the reason of it yet, will try to dig it further.
>
> Yes please - imo this absolutely should be changed to just r along
> with the rwx -> rw conversion. Since you saw page faults, could
> you at least point out which address(es) they occurred for? After
> all the set of r/o MMIO pages should be relatively small...

I verified this with a clean build and I cannot reproduce it anymore.
But this is the page fault I saw:

(XEN) [VT-D]iommu.c:873: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:875: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:1f.2] fault addr 1b56000, iommu reg = 82c000203000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) print_vtd_entries: iommu 830412a4b9c0 dev :00:1f.2 gmfn 1b56
(XEN) root_entry = 830412a48000
(XEN) root_entry[0] = 291cbd001
(XEN) context = 830291cbd000
(XEN) context[fa] = 2_2920c7001
(XEN) l4 = 8302920c7000
(XEN) l4_index = 0
(XEN) l4[0] = 2920c6003
(XEN) l3 = 8302920c6000
(XEN) l3_index = 0
(XEN) l3[0] = 2920c5003
(XEN) l2 = 8302920c5000
(XEN) l2_index = d
(XEN) l2[d] = 2920b5003
(XEN) l1 = 8302920b5000
(XEN) l1_index = 156
(XEN) l1[156] = 0
(XEN) l1[156] not present

Device is not reported in DMAR, the gfn mapped with p2m_ram_rw type...

lspci:
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05) (prog-if 01 [AHCI 1.0])
        Subsystem: Lenovo Device 3097
        Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 84
        I/O ports at f0d0 [size=8]
        I/O ports at f0c0 [size=4]
        I/O ports at f0b0 [size=8]
        I/O ports at f0a0 [size=4]
        I/O ports at f060 [size=32]
        Memory at f7c36000 (32-bit, non-prefetchable) [size=2K]
        Capabilities:
        Kernel driver in use: ahci

But as I said, I cannot reproduce it; I will run a few more tests.

> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -971,7 +971,17 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> >          ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
> >                              p2m_mmio_direct, p2ma);
> >      else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
> > -        ret = 0;
> > +    {
> > +        /*
> > +         * PVH fixme: during Dom0 PVH construction, p2m entries are being set
> > +         * but iomem regions are not mapped with IOMMU. This makes sure that
> > +         * RMRRs are correctly mapped with IOMMU.
> > +         */
> > +        if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
> > +            ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
>
> This should use p2m_get_iommu_flags() (which eventually needs to
> also honor the passed in p2m_access_t, i.e. its use here for now
> only serves documentation purposes as well as a means to spot the
> location when making said adjustment).

Here is the problem: for the p2m_mmio_direct type, p2m_get_iommu_flags() will
return 0. That is essentially why the 1:1 iomem mapping for Dom0 PVH does set
p2m entries but does not create the identity mapping in construct_dom0.
When you say 'honoring p2m_access_t', do you mean that p2m_get_iommu_flags()
should be more like ept_p2m_type_to_flags(), where permissions are verified?
Right now, even if rw permissions are requested, p2m_get_iommu_flags() always
returns zero IOMMU flags for the p2m_mmio_direct type.

> Jan
[Xen-devel] [PATCH v2] PVH Dom0 RMRR IOMMU mapping regression fix
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

This patch addresses a regression introduced by commit
5ae03990c120a7b3067a52d9784c9aa72c0705a6 in the new set_identity_p2m_entry.
RMRRs are not being mapped in IOMMU for PVH Dom0. This causes page faults and
some long 'hang-like' delays during Dom0 PVH boot and device assignments.

During construct_dom0, in PVH path p2m is being constructed and identity mapped
in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
The new code used to map RMRRs, invoked from rmrr_identity_mapping,
checks if a p2m entry exists with the same type and access and, if yes, skips
the iommu mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are
not being mapped in IOMMU.

As was mentioned in the earlier discussion, the PVH Dom0 construction code
should be modified to properly map RMRR regions in IOMMU. Since that change
would be too invasive, this is a temporary fix until a better solution is in.
Also, as Jan mentioned, there is no need for 'x' permission on the p2m entry
of an mmio region, so it is removed here.

Your comments and suggestions are welcome!
Thank you.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
Changes in v2:
 - removed 'x' permission from p2m entries of read-only mmio regions as well;
   the tests I ran did not show the IOMMU page faults that I mentioned in
   v1 (RFC) of this patch;

 xen/arch/x86/domain_build.c | 4 ++--
 xen/arch/x86/mm/p2m.c | 12 +++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 18cf6aa..bca6fe7 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -432,9 +432,9 @@ static __init void pvh_add_mem_mapping(struct domain *d, unsigned long gfn,
         }
 
         if ( rangeset_contains_singleton(mmio_ro_ranges, mfn + i) )
-            a = p2m_access_rx;
+            a = p2m_access_r;
         else
-            a = p2m_access_rwx;
+            a = p2m_access_rw;
 
         if ( (rc = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i), a)) )
             panic("pvh_add_mem_mapping: gfn:%lx mfn:%lx i:%ld rc:%d\n",
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index e1d930a..7ba7832 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -972,7 +972,17 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
         ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
                             p2m_mmio_direct, p2ma);
     else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
-        ret = 0;
+    {
+        /*
+         * PVH fixme: during Dom0 PVH construction, p2m entries are being set
+         * but iomem regions are not mapped with IOMMU. This makes sure that
+         * RMRRs are correctly mapped with IOMMU.
+         */
+        if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
+            ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
+        else
+            ret = 0;
+    }
     else
     {
         if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
-- 
2.1.4
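The branch structure that this patch gives set_identity_p2m_entry() can be summarised as a small decision function. This is a standalone model under assumptions, not hypervisor code: the enums, struct entry_state and choose_action() are hypothetical names, and the final branch (which in Xen goes on to look at XEN_DOMCTL_DEV_RDM_RELAXED) is collapsed into a single "conflict" result because the rest of that branch is not shown in the diff.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a p2m entry. */
enum p2m_type { P2M_INVALID, P2M_MMIO_DM, P2M_MMIO_DIRECT, P2M_RAM_RW };
enum p2m_access { ACCESS_R, ACCESS_RW, ACCESS_RX, ACCESS_RWX };

struct entry_state {
    enum p2m_type type;
    unsigned long mfn;
    enum p2m_access access;
};

enum identity_action {
    ACT_SET_P2M_ENTRY,   /* no usable entry yet: create the 1:1 entry */
    ACT_MAP_IOMMU_ONLY,  /* entry already matches, IOMMU mapping is missing */
    ACT_NOTHING,         /* entry already matches, nothing left to do */
    ACT_CONFLICT         /* existing entry differs; handling not modelled */
};

static enum identity_action choose_action(const struct entry_state *cur,
                                          unsigned long gfn,
                                          enum p2m_access wanted,
                                          bool is_hardware_domain,
                                          bool iommu_shares_hap_pt)
{
    if (cur->type == P2M_INVALID || cur->type == P2M_MMIO_DM)
        return ACT_SET_P2M_ENTRY;

    if (cur->mfn == gfn && cur->type == P2M_MMIO_DIRECT &&
        cur->access == wanted) {
        /* The fix: Dom0 entries exist already, but may lack an IOMMU mapping. */
        if (is_hardware_domain && !iommu_shares_hap_pt)
            return ACT_MAP_IOMMU_ONLY;
        return ACT_NOTHING;
    }

    return ACT_CONFLICT;
}
```

Read this way, the domain_build.c hunk matters for the second branch as well: the access recorded at Dom0 construction has to equal the access requested for the RMRR, otherwise the lookup falls through to the conflict branch.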
Re: [Xen-devel] Regression in RMRRs identity mapping for PVH Dom0
On Thu, Sep 24, 2015 at 11:29:54AM +0100, Wei Liu wrote:
> Hi Elena
>
> On Wed, Sep 23, 2015 at 11:56:12AM -0400, Elena Ufimtseva wrote:
> > Hi
> >
> > There is a regression in RMRR patch 5ae03990c120a7b3067a52d9784c9aa72c0705a6 in
> > new set_identity_p2m_entry. RMRRs are not being mapped in IOMMU for PVH Dom0.
> > This causes pages faults and some long 'hang-like' delays during boot and
> > device assignments.
> >
> > During construct_dom0, in PVH path p2m is being constructed and identity mapped
> > in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
> > New code used to map RMRRs invoked from rmrr_identity_mapping
> > checks if p2m entry exists with same type and access and if yes, skips iommu
> > mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are not being
> > mapped in IOMMU.
> >
> > This debug patch attached fixes this and Ill be glad to see if there is a
> > more elegant fix.
>
> From a release point of view, PVH Dom0 is not officially supported so I
> don't consider this issue a blocker.

Understood.

> We can backport the proper fix to 4.6.1 if necessary, but I doubt this
> is the only fix we need to make PVH Dom0 work on 4.6. Am I right?

Dom0 PVH boots with some glitches on Intel platforms and with some others on
AMD, and it will certainly see more patches. But this problem will make Dom0
on some Intel platforms hang, throw page faults, or fail to boot at all (as I
have seen happen for some devices while working on extra RMRRs).

> Wei.
Re: [Xen-devel] Regression in RMRRs identity mapping for PVH Dom0
On Thu, Sep 24, 2015 at 04:31:09AM -0600, Jan Beulich wrote:
> >>> On 24.09.15 at 11:18,wrote:
> > AIUI the problem is that before the call to set_identity_p2m_entry(),
> > PVH dom0 has a p2m entry covering this range but no IOMMU entry. Is
> > that right? So the fix will be to make PVH dom0 construction set up
> > the IOMMU correctly when it sets up the p2m.
>
> Right, but with the current way of setting up PVH Dom0 I'm afraid
> this will be rather intrusive to implement. Hence, however much I
> dislike it, I wonder whether a variant of Elena's change (suitably
> annotated with a phv fixme) wouldn't be a reasonable thing for 4.6.
> With the switch to HVMlite the Dom0 setup will need to be re-done
> anyway afaics.

I agree here, Jan. The way PVH Dom0 sets up page tables is a special case of
its own. Andrew Cooper, Konrad and I talked about changing it; I have not
started working on it yet, but it is in my plan.

> Elena, as to the actual patch:
>
> >--- a/xen/arch/x86/mm/p2m.c
> >+++ b/xen/arch/x86/mm/p2m.c
> >@@ -970,8 +970,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> >     if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
> >         ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
> >                             p2m_mmio_direct, p2ma);
> >-    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
> >-        ret = 0;
> >+    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct )
> >+        if ( a == p2ma && !is_pvh_domain(d) )
> >+            ret = 0;
> >+        else ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
>
> Besides this wanting figure braces, why do you pull the a == p2ma
> check into the inner if()? If this is because of the P2M getting
> populated with p2m_rwx, I think _that_ should be changed rather
> than breaking the logic here (or, if done properly, complicating it).
> There's no reason I can see to map MMIO regions rwx.

Yes, that is why I did it, because of rwx. I will modify it.

> Also I think this wants to cover just hwdom and !iommu_use_hap_pt.

Yes, I forgot about this one.

> Jan
>
> >     else
> >     {
> >         if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
Re: [Xen-devel] Regression in RMRRs identity mapping for PVH Dom0
On Thu, Sep 24, 2015 at 10:18:54AM +0100, Tim Deegan wrote:
> At 15:17 +0800 on 24 Sep (1443107852), Chen, Tiejun wrote:
> > On 9/23/2015 11:56 PM, Elena Ufimtseva wrote:
> > > Hi
> > >
> > > There is a regression in RMRR patch 5ae03990c120a7b3067a52d9784c9aa72c0705a6 in
> > > new set_identity_p2m_entry. RMRRs are not being mapped in IOMMU for PVH Dom0.
> > > This causes pages faults and some long 'hang-like' delays during boot and
> > > device assignments.
> > >
> > > During construct_dom0, in PVH path p2m is being constructed and identity mapped
> > > in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
> > > New code used to map RMRRs invoked from rmrr_identity_mapping
> > > checks if p2m entry exists with same type and access and if yes, skips iommu
> > > mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are not being
> > > mapped in IOMMU.
> > >
> > > This debug patch attached fixes this and Ill be glad to see if there is a
> > > more elegant fix.
> >
> > Based on your explanation, sounds pvh always creates this mapping
> > beforehand, so what about this?
> >
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index cf8485e..d026845 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -964,7 +964,7 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> >      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >      int ret;
> >
> > -    if ( !paging_mode_translate(p2m->domain) )
> > +    if ( !paging_mode_translate(p2m->domain) || is_pvh_domain(d) )
>
> Sorry, but that wouldn't be safe. :( PVH domains need the same
> protection as any other paging_mode_translate ones.
>
> AIUI the problem is that before the call to set_identity_p2m_entry(),
> PVH dom0 has a p2m entry covering this range but no IOMMU entry. Is
> that right? So the fix will be to make PVH dom0 construction set up
> the IOMMU correctly when it sets up the p2m.

Yes, that's right. A rework of construct_dom0 and its PVH part should help.

> Cheers,
>
> Tim.
[Xen-devel] [RFC PATCH resend] PVH Dom0 RMRR IOMMU mapping regression fix
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

This patch addresses a regression introduced by commit
5ae03990c120a7b3067a52d9784c9aa72c0705a6 in the new set_identity_p2m_entry.
RMRRs are not being mapped in IOMMU for PVH Dom0. This causes page faults and
some long 'hang-like' delays during Dom0 PVH boot and device assignments.

During construct_dom0, in PVH path p2m is being constructed and identity mapped
in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
The new code used to map RMRRs, invoked from rmrr_identity_mapping,
checks if a p2m entry exists with the same type and access and, if yes, skips
the iommu mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are
not being mapped in IOMMU.

As was mentioned in the earlier discussion, the PVH Dom0 construction code
should be modified to properly map RMRR regions in IOMMU. Since that change
would be too invasive, this is a temporary fix until a better solution is in.
Also, as Jan mentioned, there is no need for 'x' permission on the p2m entry
of an mmio region, so it is removed here. Permissions for p2m entries of
read-only mmio regions are left unchanged, as leaving only 'r' causes page
faults. I am not sure of the reason yet and will try to dig further.

Your comments and suggestions are welcome!
Thank you.

Elena

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/arch/x86/domain_build.c | 2 +-
 xen/arch/x86/mm/p2m.c | 12 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 18cf6aa..259dfd4 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -434,7 +434,7 @@ static __init void pvh_add_mem_mapping(struct domain *d, unsigned long gfn,
         if ( rangeset_contains_singleton(mmio_ro_ranges, mfn + i) )
             a = p2m_access_rx;
         else
-            a = p2m_access_rwx;
+            a = p2m_access_rw;
 
         if ( (rc = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i), a)) )
             panic("pvh_add_mem_mapping: gfn:%lx mfn:%lx i:%ld rc:%d\n",
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b2726bd..97a0986 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -971,7 +971,17 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
         ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
                             p2m_mmio_direct, p2ma);
     else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
-        ret = 0;
+    {
+        /*
+         * PVH fixme: during Dom0 PVH construction, p2m entries are being set
+         * but iomem regions are not mapped with IOMMU. This makes sure that
+         * RMRRs are correctly mapped with IOMMU.
+         */
+        if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
+            ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
+        else
+            ret = 0;
+    }
     else
     {
         if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
-- 
2.1.4
[Xen-devel] Regression in RMRRs identity mapping for PVH Dom0
Hi

There is a regression in RMRR patch 5ae03990c120a7b3067a52d9784c9aa72c0705a6 in
the new set_identity_p2m_entry. RMRRs are not being mapped in IOMMU for PVH Dom0.
This causes page faults and some long 'hang-like' delays during boot and
device assignments.

During construct_dom0, in PVH path p2m is being constructed and identity mapped
in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
The new code used to map RMRRs, invoked from rmrr_identity_mapping,
checks if a p2m entry exists with the same type and access and, if yes, skips
the iommu mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are
not being mapped in IOMMU.

The attached debug patch fixes this, and I'll be glad to see if there is a
more elegant fix.

Thanks!
Elena

>From fb25216760a0c17447faa1f416cc59341600dc1b Mon Sep 17 00:00:00 2001
From: Elena Ufimtseva <elena.ufimts...@oracle.com>
Date: Wed, 23 Sep 2015 11:47:49 -0400
Subject: [PATCH] RMRR regression debug for PVH Dom0

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/arch/x86/mm/p2m.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b2726bd..16c8938 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -970,8 +970,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
     if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
         ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
                             p2m_mmio_direct, p2ma);
-    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
-        ret = 0;
+    else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct )
+        if ( a == p2ma && !is_pvh_domain(d) )
+            ret = 0;
+        else ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
     else
     {
         if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
-- 
2.1.4
Re: [Xen-devel] page faults on machines with 4TB memory
On Thu, Jul 23, 2015 at 06:01:45PM +0100, Andrew Cooper wrote: On 23/07/15 17:35, Elena Ufimtseva wrote: Hi While working on bugs during boot time on large oracle server x4-8, There is a problem with booting Xen on large machines with 4TB memory, such as Oracle x4-8. The page fault occured initially while loading xen pm info into hypervisor (you can see it in serial log attahced named 4.4.2_no_mem_override). Tracing down an issue shows that page fault occures in timer.c code while getting heap size. Here is the original call trace: rocessor: Uploading Xen processor PM info @ (XEN) [ Xen-4.4.3-preOVM x86_64 debug=n Tainted:C ] @ (XEN) CPU:0 @ (XEN) RIP:e008:[82d08022e747] add_entry+0x27/0x120 @ (XEN) RFLAGS: 00010082 CONTEXT: hypervisor @ (XEN) rax: 8a2d080513a20 rbx: 83808e802300 rcx: 00e8 @ (XEN) rdx: 00e8 rsi: 00e8 rdi: 83808e802300 @ (XEN) rbp: 82d080513a20 rsp: 82d0804d7c70 r8: 8840ffdb5010 @ (XEN) r9: 0017 r10: 83808e802180 r11: 0200200200200200 @ (XEN) r12: 82d080533080 r13: 0296 r14: 0100100100100100 @ (XEN) r15: 00e8 cr0: 80050033 cr4: 001526f0 @ (XEN) cr3: 0100818b2000 cr2: 8840ffdb5010 @ (XEN) ds: es: fs: gs: ss: e010 cs: e008 @ (XEN) Xen stack trace from rsp=82d0804d7c70: @ (XEN)83808e802300 82d080513a20 82d08022f59b 82d080533080 @ (XEN)82d080532f50 00e8 83808e802328 @ (XEN)82d080513a20 83808e8022c0 82d080533200 00e8 @ (XEN)00f0 82d0805331c0 82d0802458e2 @ (XEN)00e8 83808e802334 8384be7979b0 82d0804d7d78 @ (XEN) 8384be77c700 82d0804d7d78 82d080513a20 @ (XEN)82d080246207 00e8 00e8 8384be7979b0 @ (XEN)82d08024518a 82d080533080 0070 82d080533da8 @ (XEN)000100e8 8384be797a00 00e80001 002ab980002abd68 @ (XEN)271000124f80 002abd6800124f80 002ab980 82d0803753e0 @ (XEN)00010101 0001 82d0804d7e18 881fb4afbc88 @ (XEN)82d0804d 881fb28a4400 82d0804fca80 819b7080 @ (XEN)82d080266c16 83808fb46ba8 82d080208a82 83006bddd190 @ (XEN)0292 03010036 000100f6 000f @ (XEN)007f000c0082 007f000c0082 @ (XEN)000a 881fb28a4400 0005 @ (XEN) 00fe 0001 0001 @ (XEN) 82d08031f521 @ 
(XEN)0246 810010ea 810010ea @ (XEN)e030 0246 83006bddd000 881fb4afbd48 @ (XEN) Xen call trace: @ (XEN)[82d08022e747] add_entry+0x27/0x120 @ (XEN)[82d08022f59b] set_timer+0x10b/0x220 @ (XEN)[82d0802458e2] cpufreq_governor_dbs+0x1e2/0x2f0 @ (XEN)[82d080246207] __cpufreq_set_policy+0x87/0x120 @ (XEN)[82d08024518a] cpufreq_add_cpu+0x24a/0x4f0 @ (XEN)[82d080266c16] do_platform_op+0x9c6/0x1650 @ (XEN)[82d080208a82] evtchn_check_pollers+0x22/0xb0 @ (XEN)[82d08031f521] do_iret+0xc1/0x1a0 @ (XEN)[82d0803243a9] syscall_enter+0xa9/0xae @ (XEN) @ (XEN) Pagetable walk from 8840ffdb5010: @ (XEN) L4[0x110] = 0100818b3067 18b3 @ (XEN) L3[0x103] = @ (XEN) @ (XEN) 0x82d08022e720 add_entry: movzwl 0x28(%rdi),%edx 0x82d08022e724 add_entry+4:push %rbp 0x82d08022e725 add_entry+5: lea0x2e52f4(%rip),%rax# 0x82d080513a20 __per_cpu_offset 0x82d08022e72c add_entry+12: lea0x30494d(%rip),%r10# 0x82d080533080 per_cpu__timers 0x82d08022e733 add_entry+19: push %rbx 0x82d08022e734 add_entry+20: add(%rax,%rdx,8),%r10 0x82d08022e738 add_entry+24: movl $0x0,0x8(%rdi) 0x82d08022e73f add_entry+31: movb $0x3,0x2a(%rdi) 0x82d08022e743 add_entry+35: mov0x8(%r10),%r8 0x82d08022e747 add_entry+39: movzwl (%r8),%ecx And this points to int sz = GET_HEAP_SIZE(heap); in add_entry of timer.c. static int add_entry(struct timer *t) { 82d08022cad3: 53
Re: [Xen-devel] [PATCH v10 5/5] iommu: add rmrr Xen command line option for extra rmrrs
- jbeul...@suse.com wrote:

On 15.07.15 at 17:27, elena.ufimts...@oracle.com wrote:
On Wed, Jul 15, 2015 at 08:25:06AM +0100, Jan Beulich wrote:
On 14.07.15 at 12:43, jbeul...@suse.com wrote:
On 13.07.15 at 20:18, elena.ufimts...@oracle.com wrote:

+/* Macro for RMRR inclusive range formatting. */
+#define PRI_RMRR(s,e) "[%lx-%lx]"

Just PRI_RMRR (i.e. no parens or parameters) please. And I'm still missing a
macro to pair the respective arguments - as said before, as single format
specifier should be accompanied by a single argument (as visible to the
reader at the use sites).

Answering your IRC question here:

#define ERU_FMT "[%lx-%lx]"
#define ERU_ARG(eru) eru.base_pfn, eru.end_pfn

(with the acronym eru open for improvement).

Great! Thanks Jan. Can ERU be RMRRU?

ERMRRU maybe - I'd like the extra to somehow be expressed in the name.

Does this imply that it can be used for formatting ACPI RMRRs? Or perhaps
with some modification?

Jan
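The point of pairing a format macro with an argument macro is that each use site shows one format specifier next to one argument. A minimal sketch in plain C follows, using the ERMRRU name from this discussion; format_extra_rmrr() is a hypothetical helper, snprintf stands in for printk, and the parentheses around the macro parameter are an addition of this sketch rather than part of the suggestion quoted above.

```c
#include <stdio.h>

/* Simplified stand-in for the patch's struct extra_rmrr_unit. */
struct extra_rmrr_unit {
    unsigned long base_pfn, end_pfn;
};

/* One format macro paired with one argument macro for inclusive ranges. */
#define ERMRRU_FMT "[%lx-%lx]"
#define ERMRRU_ARG(eru) (eru).base_pfn, (eru).end_pfn

/* Hypothetical helper: formats a message the way a printk call site would. */
static int format_extra_rmrr(char *buf, size_t len, struct extra_rmrr_unit eru)
{
    return snprintf(buf, len, "RMRR range " ERMRRU_FMT, ERMRRU_ARG(eru));
}
```

Adjacent string literals are concatenated by the compiler, so ERMRRU_FMT can be dropped into the middle of a longer format string, which is how the v11 patch uses it in its printk calls.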
Re: [Xen-devel] [PATCH v10 5/5] iommu: add rmrr Xen command line option for extra rmrrs
On Wed, Jul 15, 2015 at 08:25:06AM +0100, Jan Beulich wrote: On 14.07.15 at 12:43, jbeul...@suse.com wrote: On 13.07.15 at 20:18, elena.ufimts...@oracle.com wrote: +/* Macro for RMRR inclusive range formatting. */ +#define PRI_RMRR(s,e) "[%lx-%lx]" Just PRI_RMRR (i.e. no parens or parameters) please. And I'm still missing a macro to pair the respective arguments - as said before, a single format specifier should be accompanied by a single argument (as visible to the reader at the use sites). Answering your IRC question here: #define ERU_FMT "[%lx-%lx]" #define ERU_ARG(eru) eru.base_pfn, eru.end_pfn (with the acronym eru open for improvement). Great! Thanks Jan. Can ERU be RMRRU? Jan
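The pairing Jan asks for (one format macro, one argument macro) can be sketched outside Xen as follows. This is only an illustration: the struct is a cut-down stand-in for the patch's struct extra_rmrr_unit, format_eru() is a hypothetical helper, and the extra parentheses around the macro parameter are an addition not present in the mail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Cut-down, hypothetical stand-in for the patch's struct extra_rmrr_unit. */
struct extra_rmrr_unit {
    unsigned long base_pfn, end_pfn;
};

/* One format macro for the inclusive range ... */
#define ERU_FMT "[%lx-%lx]"
/* ... paired with one argument macro that expands to both values. */
#define ERU_ARG(eru) (eru).base_pfn, (eru).end_pfn

/* Hypothetical helper: format one unit into buf using the paired macros. */
static int format_eru(char *buf, size_t len, const struct extra_rmrr_unit *u)
{
    return snprintf(buf, len, "RMRR range " ERU_FMT, ERU_ARG(*u));
}
```

At a use site the reader then sees one specifier and one argument, e.g. printk("Invalid RMRR range " ERU_FMT "\n", ERU_ARG(unit)), which is the property the review comment is after.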
Re: [Xen-devel] Ping: [PATCH v6] dmar: device scope mem leak fix
- Original Message - From: jbeul...@suse.com To: kevin.t...@intel.com, yang.z.zh...@intel.com Cc: xen-devel@lists.xen.org, boris.ostrov...@oracle.com, elena.ufimts...@oracle.com, konrad.w...@oracle.com, t...@xen.org Sent: Monday, July 13, 2015 12:18:33 PM GMT -05:00 US/Canada Eastern Subject: Ping: [PATCH v6] dmar: device scope mem leak fix On 07.07.15 at 17:17, elena.ufimts...@oracle.com wrote: From: Elena Ufimtseva elena.ufimts...@oracle.com Release memory allocated for scope.devices dmar units on various failure paths and when disabling dmar. Set device count after successful memory allocation, not before, in device scope parsing function. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- Changes in v6: - eliminated unrelated code move; - fix introduces in v5 memory leak; Changes in v5; - make scope_devices_free actually safe; Changes in v4: - make scope_devices_free safe to call with NULL scope pointer; - since scope_devices_free is safe to call, use it in failure path in acpi_parse_one_drhd; Changes in v3: - make freeing memory for scope devices and zeroing device counter as a function; - make sure parse_one_rmrr has memory leak fix in this patch; - make sure ret values are not lost acpi_parse_one_drhd; Changes in v2: - release memory for devices scope on error paths in acpi_parse_one_drhd and acpi_parse_one_atsr and set the count to zero; xen/drivers/passthrough/vtd/dmar.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 2b07be9..8ed1e24 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -81,6 +81,15 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr) return 0; } +static void scope_devices_free(struct dmar_scope *scope) +{ +if ( !scope ) +return; + +scope-devices_cnt = 0; +xfree(scope-devices); +} + static void __init disable_all_dmar_units(void) { struct acpi_drhd_unit *drhd, *_drhd; 
@@ -90,16 +99,19 @@ static void __init disable_all_dmar_units(void) list_for_each_entry_safe ( drhd, _drhd, acpi_drhd_units, list ) { list_del(drhd-list); +scope_devices_free(drhd-scope); xfree(drhd); } list_for_each_entry_safe ( rmrr, _rmrr, acpi_rmrr_units, list ) { list_del(rmrr-list); +scope_devices_free(rmrr-scope); xfree(rmrr); } list_for_each_entry_safe ( atsr, _atsr, acpi_atsr_units, list ) { list_del(atsr-list); +scope_devices_free(atsr-scope); xfree(atsr); } } @@ -318,13 +330,13 @@ static int __init acpi_parse_dev_scope( if ( (cnt = scope_device_count(start, end)) 0 ) return cnt; -scope-devices_cnt = cnt; if ( cnt 0 ) { scope-devices = xzalloc_array(u16, cnt); if ( !scope-devices ) return -ENOMEM; } +scope-devices_cnt = cnt; while ( start end ) { @@ -427,7 +439,7 @@ static int __init acpi_parse_dev_scope( out: if ( ret ) -xfree(scope-devices); +scope_devices_free(scope); return ret; } @@ -542,6 +554,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) Workaround BIOS bug: ignore the DRHD due to all devices under its scope are not PCI discoverable!\n); +scope_devices_free(dmaru-scope); iommu_free(dmaru); xfree(dmaru); } @@ -562,9 +575,11 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) out: if ( ret ) { +scope_devices_free(dmaru-scope); iommu_free(dmaru); xfree(dmaru); } + return ret; } @@ -658,6 +673,7
[Xen-devel] [PATCH v10 2/5] iommu VT-d: separate rmrr addition function
From: Elena Ufimtseva elena.ufimts...@oracle.com In preparation for auxiliary RMRR data provided on Xen command line, make RMRR adding a separate function. Also free memory for rmrr device scope in error path. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/drivers/passthrough/vtd/dmar.c | 126 +++-- 1 file changed, 65 insertions(+), 61 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 8ed1e24..93f10fd 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -583,6 +583,68 @@ out: return ret; } +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru) +{ +bool_t ignore = 0; +unsigned int i = 0; +int ret = 0; + +/* Skip checking if segment is not accessible yet. */ +if ( !pci_known_segment(rmrru-segment) ) +i = UINT_MAX; + +for ( ; i rmrru-scope.devices_cnt; i++ ) +{ +u8 b = PCI_BUS(rmrru-scope.devices[i]); +u8 d = PCI_SLOT(rmrru-scope.devices[i]); +u8 f = PCI_FUNC(rmrru-scope.devices[i]); + +if ( pci_device_detect(rmrru-segment, b, d, f) == 0 ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, + Non-existent device (%04x:%02x:%02x.%u) is reported + in RMRR (%PRIx64, %PRIx64)'s scope!\n, +rmrru-segment, b, d, f, +rmrru-base_address, rmrru-end_address); +ignore = 1; +} +else +{ +ignore = 0; +break; +} +} + +if ( ignore ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, + Ignore the RMRR (%PRIx64, %PRIx64) due to +devices under its scope are not PCI discoverable!\n, +rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); +xfree(rmrru); +} +else if ( rmrru-base_address rmrru-end_address ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, + The RMRR (%PRIx64, %PRIx64) is incorrect!\n, +rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); +xfree(rmrru); +ret = -EFAULT; +} +else +{ +if ( iommu_verbose ) +dprintk(VTDPREFIX, + RMRR region: base_addr %PRIx64 end_address %PRIx64\n, +rmrru-base_address, 
rmrru-end_address); +acpi_register_rmrr_unit(rmrru); +} + +return ret; +} + static int __init acpi_parse_one_rmrr(struct acpi_dmar_header *header) { @@ -633,68 +695,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end, rmrru-scope, RMRR_TYPE, rmrr-segment); -if ( ret || (rmrru-scope.devices_cnt == 0) ) -xfree(rmrru); +if ( !ret (rmrru-scope.devices_cnt != 0) ) +register_one_rmrr(rmrru); else -{ -u8 b, d, f; -bool_t ignore = 0; -unsigned int i = 0; - -/* Skip checking if segment is not accessible yet. */ -if ( !pci_known_segment(rmrr-segment) ) -i = UINT_MAX; - -for ( ; i rmrru-scope.devices_cnt; i++ ) -{ -b = PCI_BUS(rmrru-scope.devices[i]); -d = PCI_SLOT(rmrru-scope.devices[i]); -f = PCI_FUNC(rmrru-scope.devices[i]); - -if ( !pci_device_detect(rmrr-segment, b, d, f) ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, - Non-existent device (%04x:%02x:%02x.%u) is reported - in RMRR (%PRIx64, %PRIx64)'s scope!\n, -rmrr-segment, b, d, f, -rmrru-base_address, rmrru-end_address); -ignore = 1; -} -else -{ -ignore = 0; -break; -} -} - -if ( ignore ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, - Ignore the RMRR (%PRIx64, %PRIx64) due to -devices under its scope are not PCI discoverable!\n, -rmrru-base_address, rmrru-end_address); -scope_devices_free(rmrru-scope); -xfree(rmrru); -} -else if ( base_addr end_addr ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, - The RMRR (%PRIx64, %PRIx64) is incorrect!\n, -rmrru-base_address, rmrru-end_address); -scope_devices_free(rmrru-scope); -xfree(rmrru); -ret = -EFAULT; -} -else -{ -if ( iommu_verbose ) -dprintk(VTDPREFIX, - RMRR region: base_addr %PRIx64 - end_address %PRIx64\n, -rmrru-base_address, rmrru-end_address
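The device scan in register_one_rmrr() drops an RMRR only when none of the devices in its scope can be detected, and skips the scan entirely when the segment is not known yet. A minimal stand-alone sketch of that decision, assuming a plain array of "device present" flags in place of pci_device_detect() (the function name and parameters here are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when the RMRR should be ignored: the segment is known,
 * the scope is non-empty, and no device in the scope is present.
 * present[] stands in for per-device pci_device_detect() results.
 */
static bool rmrr_should_ignore(const bool *present, size_t cnt,
                               bool segment_known)
{
    bool ignore = false;
    /* Unknown segment: start past the end so the loop body never runs. */
    size_t i = segment_known ? 0 : cnt;

    for ( ; i < cnt; i++ )
    {
        if ( !present[i] )
            ignore = true;      /* missing device, keep looking */
        else
        {
            ignore = false;     /* one real device is enough to keep the RMRR */
            break;
        }
    }

    return ignore;
}
```

The patch itself expresses the "skip the scan" case by setting i to UINT_MAX; starting at cnt has the same effect in this sketch.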
[Xen-devel] [PATCH v10 1/5] dmar: device scope mem leak fix
From: Elena Ufimtseva elena.ufimts...@oracle.com Release memory allocated for scope.devices dmar units on various failure paths and when disabling dmar. Set device count after successful memory allocation, not before, in device scope parsing function. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- Changes in v10: - mark patch v6 as v10 and include into the series of patches which add RMRR command line option for Xen; Changes in v6: - eliminated unrelated code move; - fix memory leak introduced in v5; Changes in v5: - make scope_devices_free actually safe; Changes in v4: - make scope_devices_free safe to call with NULL scope pointer; - since scope_devices_free is safe to call, use it in failure path in acpi_parse_one_drhd; Changes in v3: - make freeing memory for scope devices and zeroing device counter as a function; - make sure parse_one_rmrr has memory leak fix in this patch; - make sure ret values are not lost acpi_parse_one_drhd; Changes in v2: - release memory for devices scope on error paths in acpi_parse_one_drhd and acpi_parse_one_atsr and set the count to zero; xen/drivers/passthrough/vtd/dmar.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 2b07be9..8ed1e24 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -81,6 +81,15 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr) return 0; } +static void scope_devices_free(struct dmar_scope *scope) +{ +if ( !scope ) +return; + +scope-devices_cnt = 0; +xfree(scope-devices); +} + static void __init disable_all_dmar_units(void) { struct acpi_drhd_unit *drhd, *_drhd; @@ -90,16 +99,19 @@ static void __init disable_all_dmar_units(void) list_for_each_entry_safe ( drhd, _drhd, acpi_drhd_units, list ) { list_del(drhd-list); +scope_devices_free(drhd-scope); xfree(drhd); } list_for_each_entry_safe ( rmrr, _rmrr, acpi_rmrr_units, list ) { 
list_del(rmrr-list); +scope_devices_free(rmrr-scope); xfree(rmrr); } list_for_each_entry_safe ( atsr, _atsr, acpi_atsr_units, list ) { list_del(atsr-list); +scope_devices_free(atsr-scope); xfree(atsr); } } @@ -318,13 +330,13 @@ static int __init acpi_parse_dev_scope( if ( (cnt = scope_device_count(start, end)) 0 ) return cnt; -scope-devices_cnt = cnt; if ( cnt 0 ) { scope-devices = xzalloc_array(u16, cnt); if ( !scope-devices ) return -ENOMEM; } +scope-devices_cnt = cnt; while ( start end ) { @@ -427,7 +439,7 @@ static int __init acpi_parse_dev_scope( out: if ( ret ) -xfree(scope-devices); +scope_devices_free(scope); return ret; } @@ -542,6 +554,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) Workaround BIOS bug: ignore the DRHD due to all devices under its scope are not PCI discoverable!\n); +scope_devices_free(dmaru-scope); iommu_free(dmaru); xfree(dmaru); } @@ -562,9 +575,11 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) out: if ( ret ) { +scope_devices_free(dmaru-scope); iommu_free(dmaru); xfree(dmaru); } + return ret; } @@ -658,6 +673,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) Ignore the RMRR (%PRIx64, %PRIx64) due to devices under its scope are not PCI discoverable!\n, rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); xfree(rmrru); } else if ( base_addr end_addr ) @@ -665,6 +681,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) dprintk(XENLOG_WARNING VTDPREFIX, The RMRR (%PRIx64, %PRIx64) is incorrect!\n, rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); xfree(rmrru); ret = -EFAULT; } @@ -727,7 +744,10 @@ acpi_parse_one_atsr(struct acpi_dmar_header *header) } if ( ret ) +{ +scope_devices_free(atsru-scope); xfree(atsru); +} else acpi_register_atsr_unit(atsru); return ret; -- 2.1.3 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
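The two ideas in this patch (a free helper that tolerates NULL, and setting the count only after the allocation succeeded) can be sketched in plain C. Assumptions: struct dmar_scope is reduced to the two fields involved, calloc/free stand in for xzalloc_array/xfree, scope_devices_alloc() is a hypothetical name, and resetting the pointer to NULL is an extra safety step that the posted patch does not contain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced, hypothetical version of the structure touched by the patch. */
struct dmar_scope {
    uint16_t *devices;
    int devices_cnt;
};

/* Safe to call with a NULL scope and safe to call twice on the same scope. */
static void scope_devices_free(struct dmar_scope *scope)
{
    if ( !scope )
        return;

    scope->devices_cnt = 0;
    free(scope->devices);
    scope->devices = NULL;   /* addition for this sketch, not in the patch */
}

/* The count is only published once the array really exists. */
static int scope_devices_alloc(struct dmar_scope *scope, int cnt)
{
    if ( cnt < 0 )
        return -1;

    if ( cnt > 0 )
    {
        scope->devices = calloc((size_t)cnt, sizeof(*scope->devices));
        if ( !scope->devices )
            return -1;       /* devices_cnt is still the old value */
    }
    scope->devices_cnt = cnt;

    return 0;
}
```

With the old ordering a failed allocation left devices_cnt non-zero with a NULL array behind it, so any later loop bounded by devices_cnt would dereference NULL.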
[Xen-devel] [PATCH v10 4/5] pci: add PCI_SBDF and PCI_SEG macros
From: Elena Ufimtseva elena.ufimts...@oracle.com

In preparation for patch "iommu: add rmrr Xen command line option for extra rmrrs", which will use it.

Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com
---
 xen/include/xen/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 36e8cd3..d66ecab 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
+#define PCI_SEG(sbdf) (((sbdf) >> 16) & 0xffff)
 
 struct pci_dev_info {
     bool_t is_extfn;
-- 
2.1.3
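For readers unfamiliar with the encoding: the new macros pack a 16-bit segment above the usual 16-bit bus/device/function value. The sketch below restates the macros with the operators the archive stripped (reconstructed from the conventional layout, so treat them as an assumption) and wraps them in a hypothetical make_sbdf() helper that uses unsigned arithmetic throughout.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional PCI encoding: 8-bit bus, 5-bit device, 3-bit function. */
#define PCI_BUS(bdf)      (((bdf) >> 8) & 0xff)
#define PCI_SLOT(bdf)     (((bdf) >> 3) & 0x1f)
#define PCI_FUNC(bdf)     ((bdf) & 0x07)
#define PCI_DEVFN(d,f)    ((((d) & 0x1f) << 3) | ((f) & 0x07))
#define PCI_BDF(b,d,f)    ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
/* Macros added by the patch: segment in bits 31..16. */
#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
#define PCI_SEG(sbdf)     (((sbdf) >> 16) & 0xffff)

/* Hypothetical helper; unsigned parameters keep the 16-bit shift well defined. */
static uint32_t make_sbdf(unsigned int seg, unsigned int bus,
                          unsigned int dev, unsigned int func)
{
    return PCI_SBDF(seg, bus, dev, func);
}
```

For example, segment 0x0001, bus 0x00, device 0x1d, function 0 packs to 0x000100e8, and PCI_SEG() recovers the segment from it.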
[Xen-devel] [PATCH v10 0/5] iommu: add rmrr Xen command line option
From: Elena Ufimtseva elena.ufimts...@oracle.com

Add Xen command line option rmrr to specify RMRR regions for devices when those
regions are not defined in ACPI, which causes IO Page Faults while booting dom0
in PVH mode. These additional regions will be added to the list of RMRR regions
parsed from ACPI.

Changes in v10:
- incorporate patch 'dmar: device scope mem leak fix' as the series requires it;
- move patch 'pci: add PCI_SBDF and PCI_SEG macros' close to the last patch,
  which uses it;

Changes in v9:
- skip to next RMRR region if current overlaps with any in acpi_rmrr_units;
- fix typos in commit messages;
- remove clean up changes introduced by mistake in v8;

Changes in v8:
- removed bogus debug in patch 1 with non-functional changes;
- changed PRI_RMRRL macro for formatting to reflect the fact that two
  arguments are used, so make it PRI_RMRR(s,e) for formatting inclusive RMRR
  range; 'L' is also removed from the macro name, where it was meant to
  indicate the type of the arguments (%lx);
- added overlapping check with RMRRs from ACPI;
- added check based on paddr_bits for pfn's in extra RMRR range (not sure if
  it's redundant with mfn_valid);
- addressed while loop exit condition in extra RMRRs parser;

Elena Ufimtseva (5):
  dmar: device scope mem leak fix
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  pci: add PCI_SBDF and PCI_SEG macros
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown | 13 ++
 xen/drivers/passthrough/vtd/dmar.c | 355 +---
 xen/drivers/pci/pci.c | 11 ++
 xen/include/xen/pci.h | 5 +
 4 files changed, 322 insertions(+), 62 deletions(-)

-- 
2.1.3
[Xen-devel] [PATCH v10 3/5] pci: add wrapper for parse_pci
From: Elena Ufimtseva elena.ufimts...@oracle.com

For sbdf's parsing in RMRR command line add __parse_pci with additional
parameter def_seg. __parse_pci will help to identify if segment was found
in string being parsed or default segment was used.
Make a wrapper parse_pci so the rest of the callers are not affected.

Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com
Acked-by: Jan Beulich jbeul...@suse.com
---
 xen/drivers/pci/pci.c | 11 +++
 xen/include/xen/pci.h | 3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index ca07ed0..788a356 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p)
 {
+    bool_t def_seg;
+
+    return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg);
+}
+
+const char *__init __parse_pci(const char *s, unsigned int *seg_p,
+                               unsigned int *bus_p, unsigned int *dev_p,
+                               unsigned int *func_p, bool_t *def_seg)
+{
     unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func;
 
     if ( *s != ':' )
         return NULL;
     bus = simple_strtoul(s + 1, &s, 16);
+    *def_seg = 0;
     if ( *s == ':' )
         dev = simple_strtoul(s + 1, &s, 16);
     else
@@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
         dev = bus;
         bus = seg;
         seg = 0;
+        *def_seg = 1;
     }
     if ( func_p )
     {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..36e8cd3 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -148,6 +148,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
                       unsigned int *dev, unsigned int *func);
+const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus,
+                        unsigned int *dev, unsigned int *func, bool_t *def_seg);
+
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
 
-- 
2.1.3
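The def_seg idea can be tried outside the hypervisor with a small user-space parser. This is a simplified sketch, not Xen's parse_pci: it uses the C library's strtoul instead of simple_strtoul, requires the function number to be present, adds range checks, and the name parse_sbdf is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse "[seg:]bus:dev.func" (hex fields). On success, returns a pointer
 * to the first unparsed character and sets *def_seg to true when the
 * string carried no segment and segment 0 was assumed. Returns NULL on
 * malformed input.
 */
static const char *parse_sbdf(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p, bool *def_seg)
{
    char *end;
    unsigned long seg, bus, dev, func;

    seg = strtoul(s, &end, 16);
    if ( end == s || *end != ':' )
        return NULL;

    s = end + 1;
    bus = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    *def_seg = false;
    if ( *end == ':' )
    {
        /* Three fields before the dot: seg:bus:dev. */
        s = end + 1;
        dev = strtoul(s, &end, 16);
        if ( end == s )
            return NULL;
    }
    else
    {
        /* Only two fields: what was read as seg:bus is really bus:dev. */
        dev = bus;
        bus = seg;
        seg = 0;
        *def_seg = true;
    }

    if ( *end != '.' )
        return NULL;
    s = end + 1;
    func = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    if ( seg > 0xffff || bus > 0xff || dev > 0x1f || func > 0x7 )
        return NULL;

    *seg_p = (unsigned int)seg;
    *bus_p = (unsigned int)bus;
    *dev_p = (unsigned int)dev;
    *func_p = (unsigned int)func;

    return end;
}
```

A caller parsing a device list for one RMRR range could use def_seg to tell "no segment given, inherit the first device's segment" apart from "segment 0 given explicitly", which is the distinction the rmrr option's documentation relies on.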
[Xen-devel] [PATCH v10 5/5] iommu: add rmrr Xen command line option for extra rmrrs
From: Elena Ufimtseva elena.ufimts...@oracle.com On some platforms RMRR regions may be not specified in ACPI and thus will not be mapped 1:1 in dom0. This causes IO Page Faults and prevents dom0 from booting in PVH mode. New Xen command line option rmrr allows to specify such devices and memory regions. These regions are added to the list of RMRR defined in ACPI if the device is present in system. As a result, additional RMRRs will be mapped 1:1 in dom0 with correct permissions. Mentioned above problems were discovered during PVH work with ThinkCentre M and Dell 5600T. No official documentation was found so far in regards to what devices and why cause this. Experiments show that ThinkCentre M USB devices with enabled debug port generate DMA read transactions to the regions of memory marked reserved in host e820 map. For Dell 5600T the device and faulting addresses are not found yet. For detailed history of the discussion please check following threads: http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html Format for rmrr Xen command line option: rmrr=start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]] If grub2 used and multiple ranges are specified, ';' should be quoted/escaped, refer to grub2 manual for more information. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- docs/misc/xen-command-line.markdown | 13 +++ xen/drivers/passthrough/vtd/dmar.c | 209 +++- 2 files changed, 221 insertions(+), 1 deletion(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index aa684c0..f307f3d 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1197,6 +1197,19 @@ Specify the host reboot method. 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by default it will use that method first). 
+### rmrr + '= start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]] + +Define RMRR units that are missing from ACPI table along with device they +belong to and use them for 1:1 mapping. End addresses can be omitted and one +page will be mapped. The ranges are inclusive when start and end are specified. +If segment of the first device is not specified, segment zero will be used. +If other segments are not specified, first device segment will be used. +If a segment is specified for other than the first device and it does not match +the one specified for the first one, an error will be reported. +Note: grub2 requires to escape or use quotations if special characters are used, +namely ';', refer to the grub2 documentation if multiple ranges are specified. + ### ro-hpet `= boolean` diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 93f10fd..61e8f28 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -867,6 +867,145 @@ out: return ret; } +#define MAX_EXTRA_RMRR_PAGES 16 +#define MAX_EXTRA_RMRR 10 + +/* RMRR units derived from command line rmrr option. */ +#define MAX_EXTRA_RMRR_DEV 20 +struct extra_rmrr_unit { +struct list_head list; +unsigned long base_pfn, end_pfn; +unsigned int dev_count; +u32sbdf[MAX_EXTRA_RMRR_DEV]; +}; +static __initdata unsigned int nr_rmrr; +static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR]; + +/* Macro for RMRR inclusive range formatting. 
*/ +#define PRI_RMRR(s,e) [%lx-%lx] + +static void __init add_extra_rmrr(void) +{ +struct acpi_rmrr_unit *acpi_rmrr; +struct acpi_rmrr_unit *rmrru; +unsigned int dev, seg, i, j; +unsigned long pfn; +bool_t overlap; + +for ( i = 0; i nr_rmrr; i++ ) +{ +if ( extra_rmrr_units[i].base_pfn extra_rmrr_units[i].end_pfn ) +{ +printk(XENLOG_ERR VTDPREFIX + Invalid RMRR Range PRI_RMRR(s,e)\n, + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn); +continue; +} + +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn = + MAX_EXTRA_RMRR_PAGES ) +{ +printk(XENLOG_ERR VTDPREFIX + RMRR range PRI_RMRR(s,e) exceeds __stringify(MAX_EXTRA_RMRR_PAGES) pages\n, + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn); +continue; +} + +for ( j = 0; j nr_rmrr; j++ ) +{ +if ( i != j + extra_rmrr_units[i].base_pfn = extra_rmrr_units[j].end_pfn + extra_rmrr_units[j].base_pfn = extra_rmrr_units[i].end_pfn ) +{ +printk(XENLOG_ERR VTDPREFIX + Overlapping RMRRs PRI_RMRR(s,e) and PRI_RMRR(s,e)\n, + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn
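The three sanity checks at the top of add_extra_rmrr() (base not above end, size limit, overlap between inclusive ranges) are easier to read once the comparison operators the archive stripped are put back. The sketch below is a reconstruction under that assumption; struct pfn_range and the function names are hypothetical, only MAX_EXTRA_RMRR_PAGES comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EXTRA_RMRR_PAGES 16

/* Hypothetical reduced form of struct extra_rmrr_unit: inclusive PFN range. */
struct pfn_range {
    unsigned long base_pfn, end_pfn;
};

/* A range is well formed when its start does not lie above its end. */
static bool range_valid(const struct pfn_range *r)
{
    return r->base_pfn <= r->end_pfn;
}

/* Inclusive range, so end - base == 15 is exactly 16 pages and still allowed. */
static bool range_too_large(const struct pfn_range *r)
{
    return r->end_pfn - r->base_pfn >= MAX_EXTRA_RMRR_PAGES;
}

/* Two inclusive ranges overlap when each one starts no later than the other ends. */
static bool ranges_overlap(const struct pfn_range *a, const struct pfn_range *b)
{
    return a->base_pfn <= b->end_pfn && b->base_pfn <= a->end_pfn;
}
```

Because the ranges are inclusive, two ranges that merely share one boundary page, such as [10-1f] and [1f-25], count as overlapping.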
Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros
- jbeul...@suse.com wrote: On 09.07.15 at 14:07, elena.ufimts...@oracle.com wrote: You are right, it needs to be rebased. I can post a version rebased on the memory leak fix later, if you think it's the way to go. I didn't look at v9 yet, and can't predict when I will be able to. Jan Jan, would you like me to post v10 with the memory leak patch included in the patchset before you start looking at v9? Elena
Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros
- wei.l...@citrix.com wrote: On Thu, Jul 09, 2015 at 05:00:45PM +0100, Jan Beulich wrote: On 09.07.15 at 17:53, elena.ufimts...@oracle.com wrote: - jbeul...@suse.com wrote: On 09.07.15 at 14:07, elena.ufimts...@oracle.com wrote: You are right, it needs to be rebased. I can post a version rebased on the memory leak fix later, if you think it's the way to go. I didn't look at v9 yet, and can't predict when I will be able to. Would you like me to post v10 with the memory leak patch included in the patchset before you start looking at v9? If there is a dependency on the changes in the leak fix v6, then this would be a good idea. If not, you can keep things as they are now. I view the entire set more as a bug fix than a feature anyway, and hence see no reason not to get this in after the freeze. But I'm adding Wei just in case... Thanks Jan. There is a dependency on the memory leak patch, so I will add it to this series and squash the first patch from v9. I just looked at v9. The first three patches are quite mechanical. The fourth patch is relatively bigger but it's also quite straightforward (mostly parsing input). All in all, this series itself is self-contained. I don't think OSSTest is able to test that, so it would not cause visible regression on our side. I also agree it's a bug fix. Preferably this series should be applied before first RC. Wei. Thank you Wei. Jan
Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros
On Thu, Jul 09, 2015 at 09:10:06AM +0100, Jan Beulich wrote: On 08.07.15 at 19:27, konrad.w...@oracle.com wrote: On Tue, Jun 30, 2015 at 07:33:59PM -0400, elena.ufimts...@oracle.com wrote: From: Elena Ufimtseva elena.ufimts...@oracle.com You usually say why you need this patch. Something as simple as: In preparation for patch which will use it is OK. Or, even better, add such macros when the first user appears. Iirc I said so before... Yes, I realized this late. Will move over in the next version if needed. Jan
Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros
- jbeul...@suse.com wrote: On 09.07.15 at 13:13, elena.ufimts...@oracle.com wrote: On Thu, Jul 09, 2015 at 09:10:06AM +0100, Jan Beulich wrote: On 08.07.15 at 19:27, konrad.w...@oracle.com wrote: On Tue, Jun 30, 2015 at 07:33:59PM -0400, elena.ufimts...@oracle.com wrote: From: Elena Ufimtseva elena.ufimts...@oracle.com You usually say why you need this patch. Something as simple as: In preparation for patch which will use it is OK. Or, even better, add such macros when the first user appears. Iirc I said so before... Yes, I realized this late. Will move over in the next version if needed. Don't you need to rebase on top of v6 of dmar: device scope mem leak fix anyway? Or does the series not conflict with those changes? You are right, it needs to be rebased. I can post a version rebased on the memory leak fix later, if you think it's the way to go. Jan
[Xen-devel] [PATCH v9 3/4] pci: add wrapper for parse_pci
From: Elena Ufimtseva elena.ufimts...@oracle.com

For sbdf's parsing in RMRR command line add __parse_pci with additional
parameter def_seg. __parse_pci will help to identify if segment was found
in string being parsed or default segment was used.
Make a wrapper parse_pci so the rest of the callers are not affected.

Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com
Acked-by: Jan Beulich jbeul...@suse.com
---
 xen/drivers/pci/pci.c | 11 +++
 xen/include/xen/pci.h | 3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index ca07ed0..788a356 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p)
 {
+    bool_t def_seg;
+
+    return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg);
+}
+
+const char *__init __parse_pci(const char *s, unsigned int *seg_p,
+                               unsigned int *bus_p, unsigned int *dev_p,
+                               unsigned int *func_p, bool_t *def_seg)
+{
     unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func;
 
     if ( *s != ':' )
         return NULL;
     bus = simple_strtoul(s + 1, &s, 16);
+    *def_seg = 0;
     if ( *s == ':' )
         dev = simple_strtoul(s + 1, &s, 16);
     else
@@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
         dev = bus;
         bus = seg;
         seg = 0;
+        *def_seg = 1;
     }
     if ( func_p )
     {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 414106a..d66ecab 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -150,6 +150,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
                       unsigned int *dev, unsigned int *func);
+const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus,
+                        unsigned int *dev, unsigned int *func, bool_t *def_seg);
+
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
 
-- 
2.1.3
[Xen-devel] [PATCH v9 2/4] iommu VT-d: separate rmrr addition function
From: Elena Ufimtseva elena.ufimts...@oracle.com In preparation for auxiliary RMRR data provided on Xen command line, make RMRR adding a separate function. Also free memory for rmrr device scope in error path. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- xen/drivers/passthrough/vtd/dmar.c | 126 +++-- 1 file changed, 65 insertions(+), 61 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 77ef708..a8e1e5d 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -585,6 +585,68 @@ out: return ret; } +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru) +{ +bool_t ignore = 0; +unsigned int i = 0; +int ret = 0; + +/* Skip checking if segment is not accessible yet. */ +if ( !pci_known_segment(rmrru-segment) ) +i = UINT_MAX; + +for ( ; i rmrru-scope.devices_cnt; i++ ) +{ +u8 b = PCI_BUS(rmrru-scope.devices[i]); +u8 d = PCI_SLOT(rmrru-scope.devices[i]); +u8 f = PCI_FUNC(rmrru-scope.devices[i]); + +if ( pci_device_detect(rmrru-segment, b, d, f) == 0 ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, + Non-existent device (%04x:%02x:%02x.%u) is reported + in RMRR (%PRIx64, %PRIx64)'s scope!\n, +rmrru-segment, b, d, f, +rmrru-base_address, rmrru-end_address); +ignore = 1; +} +else +{ +ignore = 0; +break; +} +} + +if ( ignore ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, + Ignore the RMRR (%PRIx64, %PRIx64) due to +devices under its scope are not PCI discoverable!\n, +rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); +xfree(rmrru); +} +else if ( rmrru-base_address rmrru-end_address ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, + The RMRR (%PRIx64, %PRIx64) is incorrect!\n, +rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); +xfree(rmrru); +ret = -EFAULT; +} +else +{ +if ( iommu_verbose ) +dprintk(VTDPREFIX, + RMRR region: base_addr %PRIx64 end_address %PRIx64\n, +rmrru-base_address, rmrru-end_address); +acpi_register_rmrr_unit(rmrru); +} + 
+return ret; +} + static int __init acpi_parse_one_rmrr(struct acpi_dmar_header *header) { @@ -635,68 +697,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end, rmrru-scope, RMRR_TYPE, rmrr-segment); -if ( ret || (rmrru-scope.devices_cnt == 0) ) -xfree(rmrru); +if ( !ret (rmrru-scope.devices_cnt != 0) ) +register_one_rmrr(rmrru); else -{ -u8 b, d, f; -bool_t ignore = 0; -unsigned int i = 0; - -/* Skip checking if segment is not accessible yet. */ -if ( !pci_known_segment(rmrr-segment) ) -i = UINT_MAX; - -for ( ; i rmrru-scope.devices_cnt; i++ ) -{ -b = PCI_BUS(rmrru-scope.devices[i]); -d = PCI_SLOT(rmrru-scope.devices[i]); -f = PCI_FUNC(rmrru-scope.devices[i]); - -if ( !pci_device_detect(rmrr-segment, b, d, f) ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, - Non-existent device (%04x:%02x:%02x.%u) is reported - in RMRR (%PRIx64, %PRIx64)'s scope!\n, -rmrr-segment, b, d, f, -rmrru-base_address, rmrru-end_address); -ignore = 1; -} -else -{ -ignore = 0; -break; -} -} - -if ( ignore ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, - Ignore the RMRR (%PRIx64, %PRIx64) due to -devices under its scope are not PCI discoverable!\n, -rmrru-base_address, rmrru-end_address); -scope_devices_free(rmrru-scope); -xfree(rmrru); -} -else if ( base_addr end_addr ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, - The RMRR (%PRIx64, %PRIx64) is incorrect!\n, -rmrru-base_address, rmrru-end_address); -scope_devices_free(rmrru-scope); -xfree(rmrru); -ret = -EFAULT; -} -else -{ -if ( iommu_verbose ) -dprintk(VTDPREFIX, - RMRR region: base_addr %PRIx64 - end_address %PRIx64\n, -rmrru-base_address, rmrru-end_address); -acpi_register_rmrr_unit(rmrru
[Xen-devel] [PATCH v9 4/4] iommu: add rmrr Xen command line option for extra rmrrs
From: Elena Ufimtseva elena.ufimts...@oracle.com On some platforms RMRR regions may be not specified in ACPI and thus will not be mapped 1:1 in dom0. This causes IO Page Faults and prevents dom0 from booting in PVH mode. New Xen command line option rmrr allows to specify such devices and memory regions. These regions are added to the list of RMRR defined in ACPI if the device is present in system. As a result, additional RMRRs will be mapped 1:1 in dom0 with correct permissions. Mentioned above problems were discovered during PVH work with ThinkCentre M and Dell 5600T. No official documentation was found so far in regards to what devices and why cause this. Experiments show that ThinkCentre M USB devices with enabled debug port generate DMA read transactions to the regions of memory marked reserved in host e820 map. For Dell 5600T the device and faulting addresses are not found yet. For detailed history of the discussion please check following threads: http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html Format for rmrr Xen command line option: rmrr=start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]] If grub2 used and multiple ranges are specified, ';' should be quoted/escaped, refer to grub2 manual for more information. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- docs/misc/xen-command-line.markdown | 13 +++ xen/drivers/passthrough/vtd/dmar.c | 209 +++- 2 files changed, 221 insertions(+), 1 deletion(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index aa684c0..f307f3d 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1197,6 +1197,19 @@ Specify the host reboot method. 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by default it will use that method first). 
+### rmrr + '= start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]] + +Define RMRR units that are missing from ACPI table along with device they +belong to and use them for 1:1 mapping. End addresses can be omitted and one +page will be mapped. The ranges are inclusive when start and end are specified. +If segment of the first device is not specified, segment zero will be used. +If other segments are not specified, first device segment will be used. +If a segment is specified for other than the first device and it does not match +the one specified for the first one, an error will be reported. +Note: grub2 requires to escape or use quotations if special characters are used, +namely ';', refer to the grub2 documentation if multiple ranges are specified. + ### ro-hpet `= boolean` diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index a8e1e5d..f62fb02 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -869,6 +869,145 @@ out: return ret; } +#define MAX_EXTRA_RMRR_PAGES 16 +#define MAX_EXTRA_RMRR 10 + +/* RMRR units derived from command line rmrr option. */ +#define MAX_EXTRA_RMRR_DEV 20 +struct extra_rmrr_unit { +struct list_head list; +unsigned long base_pfn, end_pfn; +unsigned int dev_count; +u32sbdf[MAX_EXTRA_RMRR_DEV]; +}; +static __initdata unsigned int nr_rmrr; +static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR]; + +/* Macro for RMRR inclusive range formatting. 
*/ +#define PRI_RMRR(s,e) [%lx-%lx] + +static void __init add_extra_rmrr(void) +{ +struct acpi_rmrr_unit *acpi_rmrr; +struct acpi_rmrr_unit *rmrru; +unsigned int dev, seg, i, j; +unsigned long pfn; +bool_t overlap; + +for ( i = 0; i nr_rmrr; i++ ) +{ +if ( extra_rmrr_units[i].base_pfn extra_rmrr_units[i].end_pfn ) +{ +printk(XENLOG_ERR VTDPREFIX + Invalid RMRR Range PRI_RMRR(s,e)\n, + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn); +continue; +} + +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn = + MAX_EXTRA_RMRR_PAGES ) +{ +printk(XENLOG_ERR VTDPREFIX + RMRR range PRI_RMRR(s,e) exceeds __stringify(MAX_EXTRA_RMRR_PAGES) pages\n, + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn); +continue; +} + +for ( j = 0; j nr_rmrr; j++ ) +{ +if ( i != j + extra_rmrr_units[i].base_pfn = extra_rmrr_units[j].end_pfn + extra_rmrr_units[j].base_pfn = extra_rmrr_units[i].end_pfn ) +{ +printk(XENLOG_ERR VTDPREFIX + Overlapping RMRRs PRI_RMRR(s,e) and PRI_RMRR(s,e)\n, + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn, + extra_rmrr_units[j].base_pfn, extra_rmrr_units[j].end_pfn
[Xen-devel] [PATCH v9 0/4] iommu: add rmrr Xen command line option
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

v9 of rmrr command line patches.

Add Xen command line option rmrr to specify RMRR regions for devices whose
regions are not defined in ACPI, which causes IO Page Faults while booting
dom0 in PVH mode. These additional regions will be added to the list of
RMRR regions parsed from ACPI.

Changes in v9:
 - skip to next RMRR region if current overlaps with any in acpi_rmrr_units;
 - fix typos in commit messages;
 - remove clean up changes introduced by mistake in v8;

Changes in v8:
 - removed bogus debug in patch 1 with non-functional changes;
 - changed PRI_RMRRL macro for formatting to reflect the fact that two
   arguments are used, so make it PRI_RMRR(s,e) for formatting inclusive
   RMRR range; 'L' is also removed from the macro name, where it was meant
   to serve as the type of the arguments (%lx);
 - added overlapping check with RMRRs from ACPI;
 - added check based on paddr_bits for pfns in extra RMRR range (not sure
   if it's redundant with mfn_valid);
 - addressed while loop exit condition in extra RMRRs parser;

Changes in v7:
 - make sure RMRR ranges are being checked correctly;
 - don't interrupt RMRR checking if some of the checks fail, instead
   continue to next RMRR;
 - make rmrr variable names more obvious;
 - fix debug output formatting to match type of rmrr range;
 - fix typos in rmrr command line document and in comments;

Elena Ufimtseva (4):
  pci: add PCI_SBDF and PCI_SEG macros
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown |  13 ++
 xen/drivers/passthrough/vtd/dmar.c  | 334 +---
 xen/drivers/pci/pci.c               |  11 ++
 xen/include/xen/pci.h               |   5 +
 4 files changed, 301 insertions(+), 62 deletions(-)

--
2.1.3
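The range checks the changelog refers to (ordered bounds, a cap on pages per range, overlap rejection) can be sketched in isolation roughly as follows. This is only an illustration under assumptions: `struct pfn_range`, `range_valid`, `ranges_overlap` and `range_example` are hypothetical names, and the 16-page cap mirrors MAX_EXTRA_RMRR_PAGES from patch 4/4.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EXTRA_RMRR_PAGES 16   /* same value as in patch 4/4 */

/* Hypothetical reduced form of one extra RMRR range. */
struct pfn_range {
    unsigned long base_pfn, end_pfn;   /* inclusive on both ends */
};

/* Usable if the bounds are ordered and the span stays under the cap. */
static bool range_valid(const struct pfn_range *r)
{
    if ( r->base_pfn > r->end_pfn )
        return false;
    return r->end_pfn - r->base_pfn < MAX_EXTRA_RMRR_PAGES;
}

/* Inclusive ranges overlap when each starts no later than the other ends. */
static bool ranges_overlap(const struct pfn_range *a,
                           const struct pfn_range *b)
{
    return a->base_pfn <= b->end_pfn && b->base_pfn <= a->end_pfn;
}

/* Usage: a 16-page range passes, a shared boundary page counts as overlap. */
static void range_example(void)
{
    struct pfn_range a = { 0x100, 0x10f }, b = { 0x10f, 0x110 };

    assert(range_valid(&a));
    assert(ranges_overlap(&a, &b));
}
```

Because both ends are inclusive, two ranges that share a single boundary page are treated as overlapping, which is why the comparisons use `<=` rather than `<`.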
[Xen-devel] [PATCH v9 1/4] pci: add PCI_SBDF and PCI_SEG macros
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

In preparation for patch "iommu: add rmrr Xen command line option for
extra rmrrs" which will use it.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/include/xen/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..414106a 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
+#define PCI_SEG(sbdf) (((sbdf) >> 16) & 0xffff)
 
 struct pci_dev_info {
     bool_t is_extfn;
--
2.1.3
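As a rough illustration of what the two new macros do, the standalone sketch below packs a segment/bus/device/function tuple into one 32-bit value and reads the segment back. PCI_DEVFN is reproduced under the assumption that it uses the usual 5-bit slot, 3-bit function layout; `sbdf_pack`, `sbdf_segment` and `sbdf_example` are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the existing helper: 5-bit slot, 3-bit function. */
#define PCI_DEVFN(d,f)    ((((d) & 0x1f) << 3) | ((f) & 0x07))
#define PCI_BDF(b,d,f)    ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
/* The two macros the patch adds. */
#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
#define PCI_SEG(sbdf)     (((sbdf) >> 16) & 0xffff)

/* Hypothetical typed wrappers around the macros. */
static uint32_t sbdf_pack(unsigned int seg, unsigned int bus,
                          unsigned int dev, unsigned int func)
{
    return PCI_SBDF(seg, bus, dev, func);
}

static unsigned int sbdf_segment(uint32_t sbdf)
{
    return PCI_SEG(sbdf);
}

/* Usage: pack 0001:02:03.4 and recover the segment. */
static void sbdf_example(void)
{
    uint32_t sbdf = sbdf_pack(0x0001, 0x02, 0x03, 0x4);

    assert(sbdf == 0x0001021cu);      /* seg 0001, bus 02, devfn 0x1c */
    assert(sbdf_segment(sbdf) == 1);
}
```

Keeping the segment in the upper 16 bits lets one `u32` carry the full device address.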
[Xen-devel] [PATCH v6] dmar: device scope mem leak fix
From: Elena Ufimtseva elena.ufimts...@oracle.com Release memory allocated for scope.devices dmar units on various failure paths and when disabling dmar. Set device count after successful memory allocation, not before, in device scope parsing function. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- Changes in v6: - eliminated unrelated code move; - fix introduces in v5 memory leak; Changes in v5; - make scope_devices_free actually safe; Changes in v4: - make scope_devices_free safe to call with NULL scope pointer; - since scope_devices_free is safe to call, use it in failure path in acpi_parse_one_drhd; Changes in v3: - make freeing memory for scope devices and zeroing device counter as a function; - make sure parse_one_rmrr has memory leak fix in this patch; - make sure ret values are not lost acpi_parse_one_drhd; Changes in v2: - release memory for devices scope on error paths in acpi_parse_one_drhd and acpi_parse_one_atsr and set the count to zero; xen/drivers/passthrough/vtd/dmar.c | 24 ++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 2b07be9..8ed1e24 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -81,6 +81,15 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr) return 0; } +static void scope_devices_free(struct dmar_scope *scope) +{ +if ( !scope ) +return; + +scope-devices_cnt = 0; +xfree(scope-devices); +} + static void __init disable_all_dmar_units(void) { struct acpi_drhd_unit *drhd, *_drhd; @@ -90,16 +99,19 @@ static void __init disable_all_dmar_units(void) list_for_each_entry_safe ( drhd, _drhd, acpi_drhd_units, list ) { list_del(drhd-list); +scope_devices_free(drhd-scope); xfree(drhd); } list_for_each_entry_safe ( rmrr, _rmrr, acpi_rmrr_units, list ) { list_del(rmrr-list); +scope_devices_free(rmrr-scope); xfree(rmrr); } list_for_each_entry_safe ( atsr, _atsr, acpi_atsr_units, list 
) { list_del(atsr-list); +scope_devices_free(atsr-scope); xfree(atsr); } } @@ -318,13 +330,13 @@ static int __init acpi_parse_dev_scope( if ( (cnt = scope_device_count(start, end)) 0 ) return cnt; -scope-devices_cnt = cnt; if ( cnt 0 ) { scope-devices = xzalloc_array(u16, cnt); if ( !scope-devices ) return -ENOMEM; } +scope-devices_cnt = cnt; while ( start end ) { @@ -427,7 +439,7 @@ static int __init acpi_parse_dev_scope( out: if ( ret ) -xfree(scope-devices); +scope_devices_free(scope); return ret; } @@ -542,6 +554,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) Workaround BIOS bug: ignore the DRHD due to all devices under its scope are not PCI discoverable!\n); +scope_devices_free(dmaru-scope); iommu_free(dmaru); xfree(dmaru); } @@ -562,9 +575,11 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) out: if ( ret ) { +scope_devices_free(dmaru-scope); iommu_free(dmaru); xfree(dmaru); } + return ret; } @@ -658,6 +673,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) Ignore the RMRR (%PRIx64, %PRIx64) due to devices under its scope are not PCI discoverable!\n, rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); xfree(rmrru); } else if ( base_addr end_addr ) @@ -665,6 +681,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) dprintk(XENLOG_WARNING VTDPREFIX, The RMRR (%PRIx64, %PRIx64) is incorrect!\n, rmrru
Re: [Xen-devel] [PATCH v5] dmar: device scope mem leak fix
On Tue, Jul 07, 2015 at 10:54:25AM +0100, Jan Beulich wrote:
> On 01.07.15 at 20:30, <elena.ufimts...@oracle.com> wrote:
> > Release memory allocated for scope.devices when disabling
> > dmar units. Also set device count after memory allocation when
> > device scope parsing.
> >
> > This is explanation of why the code should be moved imho and
> > answers Jan question about why I needed to do this.
> > In acpi_parse_one_drhr move call to acpi_parse_dev_scope after
> > include_all check so the return value does not get overwritten
> > by calling acpi_parse_dev_scope.
>
> I can't really connect the middle paragraph to the first or last
> one, and in any event this doesn't seem to belong in a commit
> message in that shape. Nor can I see the reason for the movement,
> even with the last sentence above trying to explain it. What return
> value do you see being overwritten? And how does that relate to
> the intention of this patch?

Well, you are right that this part is unrelated to this patch.
I will later post this change as a separate clean up.
But to explain myself, the value of ret after acpi_parse_dev_scope is
overwritten if drhd->segment == 0 && include_all.
I assumed that it's important to preserve the ret code.
I see the problem though with moving code, as include_all will not be set
if I exit right after acpi_parse_dev_scope.

Thus I am dropping this part from this patch.

> > @@ -474,12 +486,10 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
> >      ret = iommu_alloc(dmaru);
> >      if ( ret )
> > -        goto out;
> > -
> > -    dev_scope_start = (void *)(drhd + 1);
> > -    dev_scope_end = ((void *)drhd) + header->length;
> > -    ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
> > -                               &dmaru->scope, DMAR_TYPE, drhd->segment);
> > +    {
> > +        xfree(dmaru);
> > +        return ret;
> > +    }
>
> Why is this being changed from goto out? You're now possibly
> leaking memory as well as a mapping if iommu_alloc() failed on
> any of its actions after having set drhd->iommu.

Right, I see it. Will fix.

Thank you!
Elena

> Jan
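Jan's objection is the usual argument for a single exit label: once a helper may have partially succeeded, every later failure has to run the same teardown. A standalone sketch of the pattern follows. It is not the Xen code: `unit_alloc`, `unit_free`, `parse_one`, `parse_one_example` and the `live_allocs` counter are hypothetical stand-ins, and `malloc`/`free` replace Xen's allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct unit {
    void *iommu;          /* set by unit_alloc(), possibly before it fails */
};

static int live_allocs;   /* demo counter used to observe leaks */

/* Hypothetical: allocates state, then reports a late failure on request. */
static int unit_alloc(struct unit *u, int fail_late)
{
    u->iommu = malloc(16);
    if ( !u->iommu )
        return -ENOMEM;
    live_allocs++;
    return fail_late ? -EIO : 0;
}

static void unit_free(struct unit *u)
{
    if ( u->iommu )
    {
        free(u->iommu);
        u->iommu = NULL;
        live_allocs--;
    }
}

static int parse_one(int fail_late)
{
    struct unit *u = calloc(1, sizeof(*u));
    int ret;

    if ( !u )
        return -ENOMEM;

    ret = unit_alloc(u, fail_late);
    if ( ret )
        goto out;   /* "free(u); return ret;" here would leak u->iommu */

    /* ... further parsing steps would also "goto out" on error ... */

 out:
    /* Sketch only: torn down unconditionally; real code keeps u on success. */
    unit_free(u);
    free(u);
    return ret;
}

/* Usage: a late failure still leaves nothing allocated. */
static void parse_one_example(void)
{
    assert(parse_one(1) == -EIO);
    assert(live_allocs == 0);
}
```

The early `free(u); return ret;` shortcut is only safe while the failing helper is guaranteed to have left nothing behind, which is exactly what the review questions for iommu_alloc().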
Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)
- ian.campb...@citrix.com wrote: On Fri, 2015-07-03 at 13:55 +0100, Ian Campbell wrote: On Thu, 2015-07-02 at 16:16 +0100, Ian Jackson wrote: Ian Campbell writes (Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)): On Thu, 2015-07-02 at 09:45 +0100, Ian Campbell wrote: Shall I put up a poll of some sort to gather preferred timeslot options out of that set? Please can everyone who is interested in this topic indicate their date preference/availability at: http://doodle.com/cy88dhwzybg7hh7p I've gone with the usual 5pm BST slow for simplicity. That's 1200 Noon EDT, 9am PDT and 6pm CEST. I'm never available at 1700 BST on a Wednesday, I'm afraid. I can make that time any other day of the week. I've added the Tuesday and Thursday either side of each date to the mix as well. David, Roger, Stefano, Konrad, Boris, Elena: Sorry, would you mind adding your availability for the new dates. Konrad, Elena: Ping. I'll close the poll on Tuesday. At the moment (once I adjust for the missing responses assuming they are yes) the front runner appears like it is going to be Thursday 23rd. Hi Ian, I am fine with this date. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] can't create a vNUMA enabled PV guest
On Wed, Jul 1, 2015 at 10:42 AM, Dario Faggioli <dario.faggi...@citrix.com> wrote:
> Hey,
>
> I know Wei is away, so I'll try to find the time to look at this myself,
> but I figured I'll let know about it, in case someone has obvious (or
> not :-D) ideas.
>
> I think I'm facing a bug that prevents creating PV guests with a vNUMA
> topology. I'm pretty sure I tested this before, while reviewing Wei's
> patches, so it must be something introduced between then and now (yes,
> we need a vNUMA OSSTest test case... I'll see about putting one
> together).
>
> So, here we are. With this as base config:
>
> name    = 'test'
>
> # Kernel, params and imags
> kernel  = '/root/3.19.0+/vmlinuz-3.19.0+'
> ramdisk = '/root/3.19.0+/initrd.img-3.19.0+'
>
> # CPUs and Memory and related
> vcpus   = '4'
> memory  = '1024'
> vnuma   = [ [ pnode=0,size=512,vcpus=0-1,vdistances=10,20 ],
>             [ pnode=1,size=512,vcpus=2-3,vdistances=20,10 ] ]
>
> # Disks
> root    = '/dev/xvda1 ro'
> disk    = [ 'phy:/dev/vms/test-pv-disk,xvda1,w', ]
>
> # Networking
> dhcp    = 'dhcp'
> vif     = [ 'mac=00:16:3E:FA:A7:9B,bridge=xenbr0' ]
>
> If I build a HVM guest, everything works:
>
> (XEN) Memory location of each domain:
> (XEN) Domain 0 (total: 129874):
> (XEN)     Node 0: 113466
> (XEN)     Node 1: 16408
> (XEN) Domain 14 (total: 262251):
> (XEN)     Node 0: 131029
> (XEN)     Node 1: 131222
> (XEN)     2 vnodes, 4 vcpus, guest physical layout:
> (XEN)         0: pnode 0, vcpus 0-1
> (XEN)            - 1f80
> (XEN)         1: pnode 1, vcpus 2-3
> (XEN)            1f80 - 3f80
>
> root@test:~# numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 1
> node 0 size: 411 MB
> node 0 free: 311 MB
> node 1 cpus: 2 3
> node 1 size: 442 MB
> node 1 free: 406 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
>
> If I build a PV guest, it breaks:
>
> root@Zhaman:~# xl create -c /etc/xen/test.cfg
> Parsing config from /etc/xen/test.cfg
> xc: error: panic: xc_dom_x86.c:940: arch_setup_meminit: failed to allocate 0x2 pages (v=1, p=1) : Internal error
> xc: error: panic: xc_dom_boot.c:155: xc_dom_boot_mem_init: can't allocate low memory for domain: Out of memory
> libxl: error: libxl_dom.c:731:libxl__build_pv: xc_dom_boot_mem_init failed: Device or resource busy
> libxl: error: libxl_create.c:1174:domcreate_rebuild_done: cannot (re-)build domain: -3
> libxl: error: libxl.c:1586:libxl__destroy_domid: non-existant domain 15
> libxl: error: libxl.c:1544:domain_destroy_callback: unable to destroy guest with domid 15
> libxl: error: libxl.c:1471:domain_destroy_cb: destruction of domain 15 failed
>
> (XEN) d0v1 Over-allocation for domain 15: 262656 > 262400
> (XEN) memory.c:155:d0v1 Could not allocate order=9 extent: id=15 memflags=210 (0 of 512)
> (XEN) d0v1 Over-allocation for domain 15: 262401 > 262400
> (XEN) memory.c:155:d0v1 Could not allocate order=0 extent: id=15 memflags=210 (256 of 131072)
>
> As said, I'll be looking into it in the next days. If, in the meanwhile,
> someone has any ideas, that would be much appreciated. :-)

Hi Dario!

The kernel you are running may be missing the vNUMA patch.
Konrad asked me if the patch was upstream. Well, it is not; I think I
abandoned it :). I will address the latest comments and other changes in
the v6 review and post it.

Elena

> Regards,
> Dario
> --
> This happens because I choose it to happen! (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

--
Elena
Re: [Xen-devel] [PATCH v4] dmar: device scope mem leak fix
On Wed, Jul 01, 2015 at 11:00:45AM +0100, Andrew Cooper wrote:
> On 01/07/15 00:20, elena.ufimts...@oracle.com wrote:
> > --- a/xen/drivers/passthrough/vtd/dmar.c
> > +++ b/xen/drivers/passthrough/vtd/dmar.c
> > @@ -81,6 +81,13 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr)
> >      return 0;
> >  }
> >
> > +static void scope_devices_free(struct dmar_scope *scope)
> > +{
> > +    if ( scope )
> > +        scope->devices_cnt = 0;
> > +    xfree(scope->devices);
>
> This is very liable to suffer a NULL pointer dereference.

Thanks Andrew, reposting.

> ~Andrew
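The problem Andrew points out is that only the first statement is guarded by the `if`, so `scope->devices` is still dereferenced when `scope` is NULL. A minimal standalone sketch of a NULL-safe variant follows. It rests on assumptions made so it compiles outside the hypervisor: `struct dmar_scope` is reduced to the two fields used here, `free()` stands in for Xen's `xfree()`, and `scope_free_example` is a hypothetical usage helper. Clearing the pointer after freeing is an extra precaution of this sketch, not something the thread asks for.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reduced stand-in for Xen's struct dmar_scope (assumption). */
struct dmar_scope {
    uint16_t *devices;
    int devices_cnt;
};

/* Safe to call with a NULL scope or with scope->devices == NULL. */
static void scope_devices_free(struct dmar_scope *scope)
{
    if ( !scope )
        return;

    scope->devices_cnt = 0;
    free(scope->devices);    /* free(NULL) is a no-op */
    scope->devices = NULL;   /* sketch-only: makes a repeated call harmless */
}

/* Usage: NULL, populated and already-freed scopes are all accepted. */
static void scope_free_example(void)
{
    struct dmar_scope s = { calloc(4, sizeof(uint16_t)), 4 };

    scope_devices_free(NULL);
    scope_devices_free(&s);
    scope_devices_free(&s);
    assert(s.devices == NULL && s.devices_cnt == 0);
}
```

Returning early on a NULL scope keeps both the counter reset and the free behind the same guard.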
[Xen-devel] [PATCH v4] dmar: device scope mem leak fix
From: Elena Ufimtseva elena.ufimts...@oracle.com Release memory allocated for scope.devices when disabling dmar units. Also set device count after memory allocation when device scope parsing. This is explanation of why the code should be moved imho and answers Jan question about why I needed to do this. In acpi_parse_one_drhr move call to acpi_parse_dev_scope after include_all check so the return value does not get overwritten by calling acpi_parse_dev_scope. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- Changes in v4: - make scope_devices_free safe to call with NULL scope pointer; - since scope_devices_free is safe to call, use it in failure path in acpi_parse_one_drhd; Changes in v3: - make freeing memory for scope devices and zeroing device counter as a function; - make sure parse_one_rmrr has memory leak fix in this patch; - make sure ret values are not lost acpi_parse_one_drhd; Changes in v2: - release memory for devices scope on error paths in acpi_parse_one_drhd and acpi_parse_one_atsr and set the count to zero; --- xen/drivers/passthrough/vtd/dmar.c | 38 ++ 1 file changed, 30 insertions(+), 8 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 2b07be9..77ef708 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -81,6 +81,13 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr) return 0; } +static void scope_devices_free(struct dmar_scope *scope) +{ +if ( scope ) +scope-devices_cnt = 0; +xfree(scope-devices); +} + static void __init disable_all_dmar_units(void) { struct acpi_drhd_unit *drhd, *_drhd; @@ -90,16 +97,19 @@ static void __init disable_all_dmar_units(void) list_for_each_entry_safe ( drhd, _drhd, acpi_drhd_units, list ) { list_del(drhd-list); +scope_devices_free(drhd-scope); xfree(drhd); } list_for_each_entry_safe ( rmrr, _rmrr, acpi_rmrr_units, list ) { list_del(rmrr-list); +scope_devices_free(rmrr-scope); xfree(rmrr); } 
list_for_each_entry_safe ( atsr, _atsr, acpi_atsr_units, list ) { list_del(atsr-list); +scope_devices_free(atsr-scope); xfree(atsr); } } @@ -318,13 +328,13 @@ static int __init acpi_parse_dev_scope( if ( (cnt = scope_device_count(start, end)) 0 ) return cnt; -scope-devices_cnt = cnt; if ( cnt 0 ) { scope-devices = xzalloc_array(u16, cnt); if ( !scope-devices ) return -ENOMEM; } +scope-devices_cnt = cnt; while ( start end ) { @@ -427,7 +437,7 @@ static int __init acpi_parse_dev_scope( out: if ( ret ) -xfree(scope-devices); +scope_devices_free(scope); return ret; } @@ -474,12 +484,10 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) ret = iommu_alloc(dmaru); if ( ret ) -goto out; - -dev_scope_start = (void *)(drhd + 1); -dev_scope_end = ((void *)drhd) + header-length; -ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end, - dmaru-scope, DMAR_TYPE, drhd-segment); +{ +xfree(dmaru); +return ret; +} if ( dmaru-include_all ) { @@ -495,7 +503,13 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) if ( drhd-segment == 0 ) include_all = 1; } +if ( ret ) +goto out; +dev_scope_start = (void *)(drhd + 1); +dev_scope_end = ((void *)drhd) + header-length; +ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end, + dmaru-scope, DMAR_TYPE, drhd-segment); if ( ret ) goto out; else if ( force_iommu || dmaru-include_all ) @@ -542,6 +556,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) Workaround BIOS bug: ignore the DRHD due to all devices under its scope are not PCI discoverable!\n); +scope_devices_free(dmaru-scope); iommu_free(dmaru); xfree(dmaru); } @@ -562,9 +577,11 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) out: if ( ret ) { +scope_devices_free(dmaru-scope); iommu_free(dmaru); xfree(dmaru); } + return ret; } @@ -658,6 +675,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) Ignore the RMRR (%PRIx64, %PRIx64) due to devices under its scope are not PCI discoverable!\n, rmrru-base_address, rmrru-end_address); 
+scope_devices_free(rmrru-scope); xfree(rmrru); } else if ( base_addr end_addr ) @@ -665,6 +683,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) dprintk(XENLOG_WARNING VTDPREFIX, The RMRR (%PRIx64
[Xen-devel] [PATCH v8 3/4] pci: add wrapper for parse_pci
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

For sbdf parsing in the rmrr command line, add __parse_pci with an
additional parameter def_seg. __parse_pci will help to identify whether
a segment was found in the string being parsed or the default segment
was used. Make parse_pci a wrapper so the rest of the callers are not
affected.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
Acked-by: Jan Beulich <jbeul...@suse.com>
---
 xen/drivers/pci/pci.c | 11 +++
 xen/include/xen/pci.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index ca07ed0..788a356 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p)
 {
+    bool_t def_seg;
+
+    return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg);
+}
+
+const char *__init __parse_pci(const char *s, unsigned int *seg_p,
+                               unsigned int *bus_p, unsigned int *dev_p,
+                               unsigned int *func_p, bool_t *def_seg)
+{
     unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func;
 
     if ( *s != ':' )
         return NULL;
     bus = simple_strtoul(s + 1, &s, 16);
+    *def_seg = 0;
     if ( *s == ':' )
         dev = simple_strtoul(s + 1, &s, 16);
     else
@@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
         dev = bus;
         bus = seg;
         seg = 0;
+        *def_seg = 1;
     }
     if ( func_p )
     {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 414106a..d66ecab 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -150,6 +150,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
                       unsigned int *dev, unsigned int *func);
+const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus,
+                        unsigned int *dev, unsigned int *func, bool_t *def_seg);
+
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
--
2.1.3
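The wrapper arrangement can be tried outside Xen with a simplified parser. The sketch below is an assumption-laden approximation rather than the hypervisor code: it uses libc `strtoul` instead of `simple_strtoul`, always requires the `.func` part, parses every field as hex, and the names `parse_sbdf_ext`, `parse_sbdf` and `parse_sbdf_example` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse "[seg:]bus:dev.func" (hex fields). *def_seg is set when the
 * segment was omitted and defaulted to 0. Returns a pointer just past
 * the parsed text, or NULL on malformed input.
 */
static const char *parse_sbdf_ext(const char *s, unsigned int *seg,
                                  unsigned int *bus, unsigned int *dev,
                                  unsigned int *func, bool *def_seg)
{
    char *end;
    unsigned long a, b, c, f;

    a = strtoul(s, &end, 16);
    if ( end == s || *end != ':' )
        return NULL;
    s = end + 1;
    b = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    *def_seg = false;
    if ( *end == ':' )
    {
        s = end + 1;
        c = strtoul(s, &end, 16);
        if ( end == s )
            return NULL;
    }
    else
    {
        /* Only two fields before the '.', so they were bus:dev. */
        c = b;
        b = a;
        a = 0;
        *def_seg = true;
    }

    if ( *end != '.' )
        return NULL;
    s = end + 1;
    f = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    if ( a > 0xffff || b > 0xff || c > 0x1f || f > 0x7 )
        return NULL;

    *seg = a;
    *bus = b;
    *dev = c;
    *func = f;
    return end;
}

/* Wrapper keeping the five-argument shape for existing callers. */
static const char *parse_sbdf(const char *s, unsigned int *seg,
                              unsigned int *bus, unsigned int *dev,
                              unsigned int *func)
{
    bool def_seg;

    return parse_sbdf_ext(s, seg, bus, dev, func, &def_seg);
}

/* Usage: an omitted segment is reported through def_seg. */
static void parse_sbdf_example(void)
{
    unsigned int seg, bus, dev, func;
    bool def;

    assert(parse_sbdf_ext("02:1d.3", &seg, &bus, &dev, &func, &def));
    assert(seg == 0 && bus == 2 && dev == 0x1d && func == 3 && def);
    assert(parse_sbdf("zz", &seg, &bus, &dev, &func) == NULL);
}
```

The extra out-parameter lets a caller tell an explicit segment 0 apart from an omitted one, while callers of the old entry point are left unchanged.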
[Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/include/xen/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..414106a 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
+#define PCI_SEG(sbdf) (((sbdf) >> 16) & 0xffff)
 
 struct pci_dev_info {
     bool_t is_extfn;
--
2.1.3
[Xen-devel] [PATCH v8 4/4] iommu: add rmrr Xen command line option for extra rmrrs
From: Elena Ufimtseva elena.ufimts...@oracle.com On some platforms RMRR regions may be not specified in ACPI and thus will not be mapped 1:1 in dom0. This causes IO Page Faults and prevents dom0 from booting in PVH mode. New Xen command line option rmrr allows to specify such devices and memory regions. These regions are added to the list of RMRR defined in ACPI if the device is present in system. As a result, additional RMRRs will be mapped 1:1 in dom0 with correct permissions. Mentioned above problems were discovered during PVH work with ThinkCentre M and Dell 5600T. No official documentation was found so far in regards to what devices and why cause this. Experiments show that ThinkCentre M USB devices with enabled debug port generate DMA read transactions to the regions of memory marked reserved in host e820 map. For Dell 5600T the device and faulting addresses are not found yet. For detailed history of the discussion please check following threads: http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html Format for rmrr Xen command line option: rmrr=start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]] If grub2 used and multiple ranges are specified, ';' should be quoted/escaped, refer to grub2 manual for more information. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- docs/misc/xen-command-line.markdown | 13 ++ xen/drivers/passthrough/vtd/dmar.c | 246 2 files changed, 236 insertions(+), 23 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index aa684c0..f307f3d 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1197,6 +1197,19 @@ Specify the host reboot method. 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by default it will use that method first). 
+### rmrr + '= start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]] + +Define RMRR units that are missing from ACPI table along with device they +belong to and use them for 1:1 mapping. End addresses can be omitted and one +page will be mapped. The ranges are inclusive when start and end are specified. +If segment of the first device is not specified, segment zero will be used. +If other segments are not specified, first device segment will be used. +If a segment is specified for other than the first device and it does not match +the one specified for the first one, an error will be reported. +Note: grub2 requires to escape or use quotations if special characters are used, +namely ';', refer to the grub2 documentation if multiple ranges are specified. + ### ro-hpet `= boolean` diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index a8e1e5d..fa659a9 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -42,6 +42,8 @@ #define MIN_SCOPE_LEN (sizeof(struct acpi_dmar_device_scope) + \ sizeof(struct acpi_dmar_pci_path)) +#define PRI_RMRR(s,e) [%lx-%lx] + LIST_HEAD_READ_MOSTLY(acpi_drhd_units); LIST_HEAD_READ_MOSTLY(acpi_rmrr_units); static LIST_HEAD_READ_MOSTLY(acpi_atsr_units); @@ -425,7 +427,7 @@ static int __init acpi_parse_dev_scope( default: if ( iommu_verbose ) printk(XENLOG_WARNING VTDPREFIX Unknown scope type %#x\n, - acpi_scope-entry_type); +acpi_scope-entry_type); start += acpi_scope-length; continue; } @@ -479,8 +481,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) INIT_LIST_HEAD(dmaru-ioapic_list); INIT_LIST_HEAD(dmaru-hpet_list); if ( iommu_verbose ) -dprintk(VTDPREFIX, dmaru-address = %PRIx64\n, -dmaru-address); +dprintk(VTDPREFIX, dmaru-address = %PRIx64\n, dmaru-address); ret = iommu_alloc(dmaru); if ( ret ) @@ -541,8 +542,8 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) if ( !pci_device_detect(drhd-segment, b, d, f) ) { dprintk(XENLOG_WARNING 
VTDPREFIX, - Non-existent device (%04x:%02x:%02x.%u) is reported - in this DRHD's scope!\n, drhd-segment, b, d, f); + Non-existent device (%04x:%02x:%02x.%u) is reported in this DRHD's scope!\n, +drhd-segment, b, d, f); invalid_cnt++; } } @@ -553,8 +554,8 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) invalid_cnt == dmaru-scope.devices_cnt ) { dprintk(XENLOG_WARNING VTDPREFIX, - Workaround BIOS bug: ignore the DRHD due to all -devices under its scope are not PCI discoverable!\n); + Workaround BIOS bug: ignore the DRHD
[Xen-devel] [PATCH v8 0/4] iommu: add rmrr Xen command line option
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

v8 of rmrr command line patches.

Add Xen command line option rmrr to specify RMRR regions for devices whose
regions are not defined in ACPI, which causes IO Page Faults while booting
dom0 in PVH mode. These additional regions will be added to the list of
RMRR regions parsed from ACPI.

Changes in v8:
 - removed bogus debug in patch 1 with non-functional changes;
 - changed PRI_RMRRL macro for formatting to reflect the fact that two
   arguments are used, so make it PRI_RMRR(s,e) for formatting inclusive
   RMRR range; 'L' is also removed from the macro name, where it was meant
   to serve as the type of the arguments (%lx);
 - added overlapping check with RMRRs from ACPI;
 - added check based on paddr_bits for pfns in extra RMRR range (not sure
   if it's redundant with mfn_valid);
 - addressed while loop exit condition in extra RMRRs parser;

Changes in v7:
 - make sure RMRR ranges are being checked correctly;
 - don't interrupt RMRR checking if some of the checks fail, instead
   continue to next RMRR;
 - make rmrr variable names more obvious;
 - fix debug output formatting to match type of rmrr range;
 - fix typos in rmrr command line document and in comments;

Changes in v6:
 - make __parse_pci return correct result and error codes;
 - move add_extra_rmrr
 - previous patch was missing RMRR addresses in range check, add it here;
 - add overlap check and range boundaries check;
 - moved extra rmrr structure definition to dmar.c;
 - change def_seg in __parse_pci type from int to bool_t;
 - change name for extra rmrr range to reflect they hold now pfns;

Changes in v5:
 - make parse_pci a wrapper and add __parse_pci with additional def_seg
   param to identify if segment was specified;
 - make it possible not to define the segment for each device within the
   same rmrr;
 - limit number of pages for one RMRR to 16;
 - run mfn_valid check for every address in RMRR range;
 - add PCI_SBDF macro;
 - remove list for extra rmrrs as they are kept in static array;

Elena Ufimtseva (4):
  pci: add PCI_SBDF and PCI_SEG macros
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown |  13 ++
 xen/drivers/passthrough/vtd/dmar.c  | 360
 xen/drivers/pci/pci.c               |  11 ++
 xen/include/xen/pci.h               |   5 +
 4 files changed, 311 insertions(+), 78 deletions(-)

--
2.1.3
Re: [Xen-devel] [PATCH 6/6] AMD-PVH: enable pvh if requirements met
On Wed, Jun 24, 2015 at 02:41:54PM -0700, Mukesh Rathor wrote:
> On Wed, 24 Jun 2015 16:26:44 -0400
> Elena Ufimtseva <elena.ufimts...@oracle.com> wrote:
>
> > On Wed, Jun 24, 2015 at 07:24:18PM +0100, Andrew Cooper wrote:
> > > On 24/06/15 08:49, Jan Beulich wrote:
> > > > On 24.06.15 at 04:34, <boris.ostrov...@oracle.com> wrote:
> > > > > On 06/23/2015 08:30 AM, Jan Beulich wrote:
> > > > > > On 22.06.15 at 18:37, <elena.ufimts...@oracle.com> wrote:
> > > > > > > --- a/xen/arch/x86/hvm/svm/svm.c
> > > > > > > +++ b/xen/arch/x86/hvm/svm/svm.c
> > > > > > > @@ -1444,6 +1444,9 @@ const struct hvm_function_table * __init start_svm(void)
> > > > > > >      svm_function_table.hap_capabilities = HVM_HAP_SUPERPAGE_2MB |
> > > > > > >          ((cpuid_edx(0x80000001) & 0x04000000) ? HVM_HAP_SUPERPAGE_1GB : 0);
> > > > > > >
> > > > > > > +    if ( cpu_has_svm_npt && cpu_has_svm_decode )
> > > > > > > +        svm_function_table.pvh_supported = 1;
> > > > > >
> > > > > > If svm_decode indeed is a prereq, then the earlier patch dealing
> > > > > > with the handle_mmio() invocations doesn't need to fiddle with
> > > > > > VMEXIT_INVLPG other than to maybe add a documenting ASSERT().
> > > > >
> > > > > I am not sure we should require decode feature to be required for
> > > > > PVH support. I can't remember exactly but I think this feature was
> > > > > first introduced in family 15h so requiring it will leave at least
> > > > > family 10h processors as not supporting PVH.
> > > >
> > > > The question was why the dependency was added in the first place.
> > > > Indeed only fam 12, 15, and 16 have the field documented. Otoh PVH
> > > > isn't being supported universally on all VMX variants either...
> > >
> > > Right, but this is a bug (feature?) of the current implementation and
> > > need fixing.
> > >
> > > There are no technical reasons to prevent PVH guests running in any
> > > case where an HVM guest currently runs. The only technical restriction
> > > I can think of is that a PVH hardware domain needs IOMMU support, but
> > > that is it.
> >
> > CCing Mukesh, maybe he will reply to as why that restriction is here.
>
> Hi Elena,
>
> Basically, the restriction was to allow AMD to come on par with intel
> and get phase I working on it. Then, I could just focus on handle_mmio
> for INS/OUTS for both intel and amd, and if supporting !svm_decode
> family of CPUs was important, then extend handle_mmio further...
>
> http://xen-devel.narkive.com/liQjEoV2/rfh-amd-cr-intercept-for-lmsw-clts
>
> [In the absence of svm_decode, mov cr would need to go thru handle_mmio..]

Thanks Mukesh!

> thanks,
> Mukesh
Re: [Xen-devel] [PATCH 6/6] AMD-PVH: enable pvh if requirements met
On Wed, Jun 24, 2015 at 07:24:18PM +0100, Andrew Cooper wrote:
> On 24/06/15 08:49, Jan Beulich wrote:
> > On 24.06.15 at 04:34, <boris.ostrov...@oracle.com> wrote:
> > > On 06/23/2015 08:30 AM, Jan Beulich wrote:
> > > > On 22.06.15 at 18:37, <elena.ufimts...@oracle.com> wrote:
> > > > > --- a/xen/arch/x86/hvm/svm/svm.c
> > > > > +++ b/xen/arch/x86/hvm/svm/svm.c
> > > > > @@ -1444,6 +1444,9 @@ const struct hvm_function_table * __init start_svm(void)
> > > > >      svm_function_table.hap_capabilities = HVM_HAP_SUPERPAGE_2MB |
> > > > >          ((cpuid_edx(0x80000001) & 0x04000000) ? HVM_HAP_SUPERPAGE_1GB : 0);
> > > > >
> > > > > +    if ( cpu_has_svm_npt && cpu_has_svm_decode )
> > > > > +        svm_function_table.pvh_supported = 1;
> > > >
> > > > If svm_decode indeed is a prereq, then the earlier patch dealing
> > > > with the handle_mmio() invocations doesn't need to fiddle with
> > > > VMEXIT_INVLPG other than to maybe add a documenting ASSERT().
> > >
> > > I am not sure we should require decode feature to be required for
> > > PVH support. I can't remember exactly but I think this feature was
> > > first introduced in family 15h so requiring it will leave at least
> > > family 10h processors as not supporting PVH.
> >
> > The question was why the dependency was added in the first place.
> > Indeed only fam 12, 15, and 16 have the field documented. Otoh PVH
> > isn't being supported universally on all VMX variants either...
>
> Right, but this is a bug (feature?) of the current implementation and
> needs fixing.
>
> There are no technical reasons to prevent PVH guests running in any case
> where an HVM guest currently runs. The only technical restriction I can
> think of is that a PVH hardware domain needs IOMMU support, but that is it.

CCing Mukesh, maybe he will reply as to why that restriction is here.

> ~Andrew
Re: [Xen-devel] [PATCH 1/6] pvh: domu construct vmcb 64 bit mode start
On Tue, Jun 23, 2015 at 01:02:49PM +0100, Jan Beulich wrote:
> >>> On 22.06.15 at 18:37, <elena.ufimts...@oracle.com> wrote:
> > From: Elena Ufimtseva <elena.ufimts...@oracle.com>
> >
> > Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
>
> As long as this patch originally came from Mukesh, From: should reflect
> that imo. Once you made changes, your S-o-b would be required alongside
> his.

And once the changes are made, From will not be Mukesh's email anymore?

> > --- a/xen/arch/x86/hvm/svm/vmcb.c
> > +++ b/xen/arch/x86/hvm/svm/vmcb.c
> > @@ -162,7 +162,12 @@ static int construct_vmcb(struct vcpu *v)
> >      vmcb->ds.attr.bytes = 0xc93;
> >      vmcb->fs.attr.bytes = 0xc93;
> >      vmcb->gs.attr.bytes = 0xc93;
> > -    vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
> > +
> > +    if ( is_pvh_vcpu(v) )
> > +        /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme. */
> > +        vmcb->cs.attr.bytes = 0xa9b;
> > +    else
> > +        vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
>
> With 32-bit support now actively being worked on, I don't think we want
> to see any new 32bitfixme-s proposed to go in. Plus it needs settling
> on whether the boot mode is to change for PVH.
>
> Jan
Re: [Xen-devel] [PATCH 1/6] pvh: domu construct vmcb 64 bit mode start
On Tue, Jun 23, 2015 at 01:02:49PM +0100, Jan Beulich wrote:
> >>> On 22.06.15 at 18:37, <elena.ufimts...@oracle.com> wrote:
> > From: Elena Ufimtseva <elena.ufimts...@oracle.com>
> >
> > Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
>
> As long as this patch originally came from Mukesh, From: should reflect
> that imo. Once you made changes, your S-o-b would be required alongside
> his.
>
> > --- a/xen/arch/x86/hvm/svm/vmcb.c
> > +++ b/xen/arch/x86/hvm/svm/vmcb.c
> > @@ -162,7 +162,12 @@ static int construct_vmcb(struct vcpu *v)
> >      vmcb->ds.attr.bytes = 0xc93;
> >      vmcb->fs.attr.bytes = 0xc93;
> >      vmcb->gs.attr.bytes = 0xc93;
> > -    vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
> > +
> > +    if ( is_pvh_vcpu(v) )
> > +        /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme. */
> > +        vmcb->cs.attr.bytes = 0xa9b;
> > +    else
> > +        vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
>
> With 32-bit support now actively being worked on, I don't think we want
> to see any new 32bitfixme-s proposed to go in. Plus it needs settling
> on whether the boot mode is to change for PVH.

Yep, I will work on this with Boris' patches in mind.

> Jan
[Xen-devel] [PATCH 3/6] AMD-PVH: call hvm_emulate_one instead of handle_mmio
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Certain IOIO instructions and CR access instructions like
lmsw/clts etc need to be emulated. handle_mmio is incorrectly called to
accomplish this. Create svm_emulate() to call hvm_emulate_one which is more
appropriate, and works for pvh as well. handle_mmio call is
forbidden for pvh.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 28792fe..e7262c9 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2289,6 +2289,23 @@ static struct hvm_function_table __initdata svm_function_table = {
     .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m,
 };
 
+static void svm_emulate(struct cpu_user_regs *regs)
+{
+    int rc;
+    struct hvm_emulate_ctxt ctxt;
+
+    hvm_emulate_prepare(&ctxt, regs);
+    rc = hvm_emulate_one(&ctxt);
+
+    if ( rc != X86EMUL_OKAY )
+    {
+        if ( ctxt.exn_pending )
+            hvm_inject_trap(&ctxt.trap);
+        else
+            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+    }
+}
+
 void svm_vmexit_handler(struct cpu_user_regs *regs)
 {
     uint64_t exit_reason;
@@ -2555,16 +2572,16 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
             if ( handle_pio(port, bytes, dir) )
                 __update_guest_eip(regs, vmcb->exitinfo2 - vmcb->rip);
         }
-        else if ( !handle_mmio() )
-            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        else
+            svm_emulate(regs);
         break;
 
     case VMEXIT_CR0_READ ... VMEXIT_CR15_READ:
     case VMEXIT_CR0_WRITE ... VMEXIT_CR15_WRITE:
         if ( cpu_has_svm_decode && (vmcb->exitinfo1 & (1ULL << 63)) )
             svm_vmexit_do_cr_access(vmcb, regs);
-        else if ( !handle_mmio() )
-            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        else
+            svm_emulate(regs);
         break;
 
     case VMEXIT_INVLPG:
@@ -2575,6 +2592,8 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
         }
         else if ( !handle_mmio() )
             hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        else
+            svm_emulate(regs);
         break;
 
     case VMEXIT_INVLPGA:
-- 
1.9.3
[Xen-devel] [PATCH 4/6] AMD-PVH: Do not get/set vlapic TPR
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

PVH doesn't use apic emulation hence vlapic->regs ptr is not set for it.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index e7262c9..64d22fe 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1059,7 +1059,7 @@ static void noreturn svm_do_resume(struct vcpu *v)
         hvm_asid_flush_vcpu(v);
     }
 
-    if ( !vcpu_guestmode )
+    if ( !vcpu_guestmode && !is_pvh_domain(v->domain) )
     {
         vintr_t intr;
 
@@ -2332,7 +2332,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
      * NB. We need to preserve the low bits of the TPR to make checked builds
      * of Windows work, even though they don't actually do anything.
      */
-    if ( !vcpu_guestmode ) {
+    if ( !vcpu_guestmode && !is_pvh_domain(v->domain) ) {
         intr = vmcb_get_vintr(vmcb);
         vlapic_set_reg(vcpu_vlapic(v), APIC_TASKPRI,
                    ((intr.fields.tpr & 0x0F) << 4) |
@@ -2720,15 +2720,18 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
     }
 
   out:
-    if ( vcpu_guestmode )
-        /* Don't clobber TPR of the nested guest. */
-        return;
-
-    /* The exit may have updated the TPR: reflect this in the hardware vtpr */
-    intr = vmcb_get_vintr(vmcb);
-    intr.fields.tpr =
-        (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0xFF) >> 4;
-    vmcb_set_vintr(vmcb, intr);
+    /* Don't clobber TPR of the nested guest. */
+    if ( vcpu_guestmode && !is_pvh_domain(v->domain) )
+    {
+        /*
+         * The exit may have updated the TPR: reflect this in the hardware
+         * vtpr.
+         */
+        intr = vmcb_get_vintr(vmcb);
+        intr.fields.tpr =
+            (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0xFF) >> 4;
+        vmcb_set_vintr(vmcb, intr);
+    }
 }
 
 void svm_trace_vmentry(void)
-- 
1.9.3
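For readers following the TPR handling touched by this patch, here is a minimal userspace sketch of the conversion between the 4-bit virtual TPR kept in the VMCB and the 8-bit APIC task priority register. The helper names are hypothetical and not part of Xen. Preserving the low nibble of the old APIC value is an assumption taken from the "preserve the low bits of the TPR" comment in the hunk, since the second operand of the `|` is cut off in the quoted context.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the 8-bit APIC TPR from the 4-bit VTPR.
 * The VTPR supplies bits 7:4; the low nibble of the previous APIC TPR
 * is kept (assumption based on the comment in the patch context).
 */
static inline uint32_t apic_tpr_from_vtpr(uint32_t vtpr, uint32_t old_apic_tpr)
{
    return ((vtpr & 0x0F) << 4) | (old_apic_tpr & 0x0F);
}

/*
 * Hypothetical helper: derive the 4-bit VTPR from the APIC TPR, as the
 * "out:" path of the patch does before writing it back to the VMCB.
 */
static inline uint32_t vtpr_from_apic_tpr(uint32_t apic_tpr)
{
    return (apic_tpr & 0xFF) >> 4;
}
```

Both directions only make sense when the vlapic register page exists, which appears to be why the patch skips them for PVH domains.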
[Xen-devel] [PATCH 0/6] AMD-PVH: DomU support
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

This is a re-spin of patches for AMD PVH DomU from Mukesh Rathor.
As I am diving into more details of AMD PVH, I am reposting his series
with minor changes that reviewers (Jan and Boris) posted in comments.

The issue with handle_mmio is not yet addressed and I would like to
continue the discussion Mukesh and Jan previously had in this thread:
http://lists.xen.org/archives/html/xen-devel/2014-08/msg01760.html
The latest proposed solution was to create an additional x86_emulate_ops
structure that will handle pvh mmio correctly. Should I consider this
approach as the one I should be working on?

In the vmcb construction patch comments Roger suggested adding an
additional parameter to vcpu_initialise as 32 bit work is in. Since Boris
has posted 32-bit pvh domU support, that would be changed and I wanted to
see if this is what everyone agrees on.

Any other ideas/comments are also appreciated.

Thank you.

Changes made in this re-post:
 - left out setting LMA bit in construct_vmcb as it's done in
   hvm_vcpu_initialise;
 - instead of checking if regs ptr is set in vcpu_vlapic, check if it's
   pvh_domain;

Elena Ufimtseva (6):
  pvh: domu construct vmcb 64 bit mode start
  AMD-PVH: cpuid intercept
  AMD-PVH: call hvm_emulate_one instead of handle_mmio
  AMD-PVH: Do not get/set vlapic TPR
  AMD-PVH: Support TSC_MODE_NEVER_EMULATE for PVH
  AMD-PVH: enable pvh if requirements met

 xen/arch/x86/hvm/svm/svm.c  | 80 ++---
 xen/arch/x86/hvm/svm/vmcb.c | 16 +++--
 xen/arch/x86/time.c         |  1 +
 3 files changed, 68 insertions(+), 29 deletions(-)

-- 
1.9.3
[Xen-devel] [PATCH 5/6] AMD-PVH: Support TSC_MODE_NEVER_EMULATE for PVH
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

On AMD, MSR_AMD64_TSC_RATIO must be set for rdtsc instruction in guest
to properly read the cpu tsc. To that end, set tsc_khz in struct domain.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/time.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index bbb7e6c..d9709ce 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1923,6 +1923,7 @@ void tsc_set_info(struct domain *d,
      * but always_emulate does not for some reason. Figure out
      * why.
      */
+    d->arch.tsc_khz = cpu_khz;
     switch ( tsc_mode )
     {
     case TSC_MODE_NEVER_EMULATE:
-- 
1.9.3
[Xen-devel] [PATCH 6/6] AMD-PVH: enable pvh if requirements met
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Finally, enable pvh if the cpu supports NPT and svm decode.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 64d22fe..9945550 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1444,6 +1444,9 @@ const struct hvm_function_table * __init start_svm(void)
     svm_function_table.hap_capabilities = HVM_HAP_SUPERPAGE_2MB |
         ((cpuid_edx(0x80000001) & 0x04000000) ? HVM_HAP_SUPERPAGE_1GB : 0);
 
+    if ( cpu_has_svm_npt && cpu_has_svm_decode )
+        svm_function_table.pvh_supported = 1;
+
     return &svm_function_table;
 }
 
-- 
1.9.3
[Xen-devel] [PATCH 2/6] AMD-PVH: cpuid intercept
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Call pv_cpuid for pvh cpuid intercept. Note, we modify
svm_vmexit_do_cpuid instead of the intercept switch because the guest
eip needs to be adjusted for pvh also.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 6734fb6..28792fe 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1584,19 +1584,22 @@ static void svm_vmexit_do_cpuid(struct cpu_user_regs *regs)
     if ( (inst_len = __get_instruction_length(current, INSTR_CPUID)) == 0 )
         return;
 
+    if ( is_pvh_vcpu(current) )
+        pv_cpuid(regs);
+    else
+    {
+        eax = regs->eax;
+        ebx = regs->ebx;
+        ecx = regs->ecx;
+        edx = regs->edx;
 
-    eax = regs->eax;
-    ebx = regs->ebx;
-    ecx = regs->ecx;
-    edx = regs->edx;
-
-    svm_cpuid_intercept(&eax, &ebx, &ecx, &edx);
-
-    regs->eax = eax;
-    regs->ebx = ebx;
-    regs->ecx = ecx;
-    regs->edx = edx;
+        svm_cpuid_intercept(&eax, &ebx, &ecx, &edx);
 
+        regs->eax = eax;
+        regs->ebx = ebx;
+        regs->ecx = ecx;
+        regs->edx = edx;
+    }
     __update_guest_eip(regs, inst_len);
 }
 
-- 
1.9.3
[Xen-devel] [PATCH 1/6] pvh: domu construct vmcb 64 bit mode start
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/hvm/svm/vmcb.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 6339d2a..70a6588 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -162,7 +162,12 @@ static int construct_vmcb(struct vcpu *v)
     vmcb->ds.attr.bytes = 0xc93;
     vmcb->fs.attr.bytes = 0xc93;
     vmcb->gs.attr.bytes = 0xc93;
-    vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
+
+    if ( is_pvh_vcpu(v) )
+        /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme. */
+        vmcb->cs.attr.bytes = 0xa9b;
+    else
+        vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
 
     /* Guest IDT. */
     vmcb->idtr.base = 0;
@@ -184,12 +189,17 @@ static int construct_vmcb(struct vcpu *v)
     vmcb->tr.limit = 0xff;
 
     v->arch.hvm_vcpu.guest_cr[0] = X86_CR0_PE | X86_CR0_ET;
+    /* PVH domains start in paging mode */
+    if ( is_pvh_vcpu(v) )
+        v->arch.hvm_vcpu.guest_cr[0] |= X86_CR0_PG;
     hvm_update_guest_cr(v, 0);
 
-    v->arch.hvm_vcpu.guest_cr[4] = 0;
+    v->arch.hvm_vcpu.guest_cr[4] = is_pvh_vcpu(v) ? X86_CR4_PAE : 0;
     hvm_update_guest_cr(v, 4);
 
-    paging_update_paging_modes(v);
+    /* For pvh, paging mode is updated by arch_set_info_guest(). */
+    if ( is_hvm_vcpu(v) )
+        paging_update_paging_modes(v);
 
     vmcb->_exception_intercepts =
         HVM_TRAP_MASK
-- 
1.9.3
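To make the two CS attribute constants in this patch easier to compare, here is a small standalone decoder. It is only a sketch: the bit layout (type in bits 3:0, then S, DPL, P, AVL, L, D/B, G) is assumed to match the compressed 12-bit segment attribute format used in the VMCB, and `struct seg_attr` and `decode_attr()` are hypothetical names, not Xen code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the 12-bit VMCB segment attribute field. */
struct seg_attr {
    unsigned type; /* bits 3:0, descriptor type */
    unsigned s;    /* bit 4, code/data (1) vs system (0) */
    unsigned dpl;  /* bits 6:5, descriptor privilege level */
    unsigned p;    /* bit 7, present */
    unsigned avl;  /* bit 8, available to software */
    unsigned l;    /* bit 9, 64-bit code segment */
    unsigned db;   /* bit 10, default operand size */
    unsigned g;    /* bit 11, granularity */
};

/* Hypothetical helper: split an attribute value into its fields. */
static struct seg_attr decode_attr(uint16_t bytes)
{
    struct seg_attr a;

    a.type = bytes & 0xf;
    a.s    = (bytes >> 4) & 1;
    a.dpl  = (bytes >> 5) & 3;
    a.p    = (bytes >> 7) & 1;
    a.avl  = (bytes >> 8) & 1;
    a.l    = (bytes >> 9) & 1;
    a.db   = (bytes >> 10) & 1;
    a.g    = (bytes >> 11) & 1;
    return a;
}
```

Under that assumed layout, 0xc9b and 0xa9b share the same type (0xb), S, DPL, P and G bits; they differ only in that 0xa9b sets L and clears D/B, which is what a 64-bit code segment requires.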
Re: [Xen-devel] RFC: making the PVH 64bit ABI as stable
On Fri, Jun 05, 2015 at 05:29:21PM +0100, Ian Campbell wrote:
> On Wed, 2015-06-03 at 09:35 -0400, Boris Ostrovsky wrote:
> > > > > What I'm hearing from the x86 maintainers is that this is actually
> > > > > a high priority and not a nice to have cleanup.
> > > >
> > > > I picked 32-bit support, Elena is looking into AMD
> > >
> > > With the TODOs + these 2 being the things which the x86 maintainers
> > > have highlighted in this thread as being most critical for marking the
> > > ABI as stable (or at least moving experimental->tech preview) let me
> > > ask explicitly: What are the current time frames on these two items?
> >
> > For 32-bit support, just to get it to work within the current
> > framework I think can be done for 4.7 release (which is late this year
> > IIRC). I can't tell you how much it will take to make it a part of a
> > unified 32/64-bit guest launching as I haven't looked at this at all yet.
>
> Thanks. What about AMD support then? Elena?

Hi Ian.

I am working on debugging PVH AMD and it looks like it's moving, slowly.
Not sure how many other issues will be on the way, but hopefully a similar
timeframe, i.e. late this year.

> Ian.
[Xen-devel] [PATCH v7 1/4] pci: add PCI_SBDF and PCI_SEG macros
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/include/xen/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..414106a 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
+#define PCI_SEG(sbdf) (((sbdf) >> 16) & 0xffff)
 
 struct pci_dev_info {
     bool_t is_extfn;
-- 
2.1.3
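As an illustration of what the two new macros compute, the following standalone sketch packs a segment/bus/device/function tuple into one 32-bit value and pulls the segment back out. PCI_SBDF, PCI_SEG and PCI_BDF follow the hunk above; PCI_DEVFN and PCI_BUS are assumed to have their usual Xen definitions, and `pack_sbdf()` is a hypothetical wrapper added only so the arguments are unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the existing definitions in xen/include/xen/pci.h. */
#define PCI_DEVFN(d,f)    ((((d) & 0x1f) << 3) | ((f) & 0x07))
#define PCI_BUS(bdf)      (((bdf) >> 8) & 0xff)
#define PCI_BDF(b,d,f)    ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))

/* The two macros added by this patch. */
#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
#define PCI_SEG(sbdf)     (((sbdf) >> 16) & 0xffff)

/*
 * Hypothetical wrapper: segment in bits 31:16, bus in 15:8,
 * device in 7:3, function in 2:0.
 */
static inline uint32_t pack_sbdf(unsigned int seg, unsigned int bus,
                                 unsigned int dev, unsigned int fn)
{
    return PCI_SBDF(seg, bus, dev, fn);
}
```

For example, segment 1, bus 2, device 3, function 4 packs to 0x1021c, and PCI_SEG() of that value gives back 1.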
[Xen-devel] [PATCH v7 0/4] iommu: add rmrr Xen command line option
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

v7 of rmrr command line patches. Thank you for comments on v6.

Add Xen command line option rmrr to specify RMRR regions for devices that
are not defined in ACPI, thus causing IO Page Faults while booting dom0 in
PVH mode. These additional regions will be added to the list of RMRR
regions parsed from ACPI.

Changes in v7:
 - make sure RMRRs ranges are being checked correctly;
 - don't interrupt RMRRs checking if some of the checks fail, instead
   continue to next RMRR;
 - make rmrr variable names more obvious;
 - fix debug output formatting to match type of rmrr range;
 - fix typos in rmrr command line document and in comments;

Changes in v6:
 - make __parse_pci return correct result and error codes;
 - move add_extra_rmrr - previous patch was missing RMRR addresses in
   range check, add it here;
 - add overlap check and range boundaries check;
 - moved extra rmrr structure definition to dmar.c;
 - change def_seg in __parse_pci type from int to bool_t;
 - change name for extra rmrr range to reflect they hold now pfns;

Changes in v5:
 - make parse_pci a wrapper and add __parse_pci with additional def_seg
   param to identify if segment was specified;
 - make possible not to define segment for each device within same rmrr;
 - limit number of pages for one RMRR by 16;
 - run mfn_valid check for every address in RMRR range;
 - add PCI_SBDF macro;
 - remove list for extra rmrrs as they are kept in static array;

Changes in v4 after comments by Jan Beulich:
 - keep sbdf per device instead of bdf and one segment per RMRR when
   parsing and compare later;
 - add check for segment values and make sure they are same for one RMRR;
 - move RMRR parameters checks and add error messages if RMRRs are
   incorrect;
 - make relevant variables and functions static;
 - mention requirement for segment values in rmrr documentation;

Changes in v3:
 - use ';' instead of '#' in command line and add proper notes for grub
   ';' special treatment;

Changes in v2:
 - move rmrr parser to dmar.c and make it custom_param;
 - change of rmrr command line option format; since adding multiple
   device per range support needs to utilize more special characters and
   offered from the previous review ';' is not supported, '[' ']' are
   reserved, ':' and used in pci format, range and devices are separated
   by '#'; Suggestions are welcome;
 - added support for multiple devices per range;
 - moved adding misc RMRRs before ACPI RMRR parsing;
 - make parser fail if pci device is specified incorrectly;

Elena Ufimtseva (4):
  pci: add PCI_SBDF and PCI_SEG macros
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown |  12 ++
 xen/drivers/passthrough/vtd/dmar.c  | 313 +---
 xen/drivers/pci/pci.c               |  11 ++
 xen/include/xen/pci.h               |   5 +
 4 files changed, 279 insertions(+), 62 deletions(-)

-- 
2.1.3
[Xen-devel] [PATCH v3] dmar: device scope mem leak fix
From: Elena Ufimtseva elena.ufimts...@oracle.com Third attempt to incorporate memory leak fix. Thanks for comment on v2. Release memory allocated for scope.devices when disabling dmar units. Also set device count after memory allocation when device scope parsing. Changes in v3: - make freeing memory for scope devices and zeroing device counter a function and use it; - make sure parse_one_rmrr has memory leak fix in this patch; - make sure ret values are not lost acpi_parse_one_drhd; Changes in v2: - release memory for devices scope on error paths in acpi_parse_one_drhd and acpi_parse_one_atsr and set the count to zero; Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- xen/drivers/passthrough/vtd/dmar.c | 32 +--- 1 file changed, 25 insertions(+), 7 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 2b07be9..a675bf7 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -81,6 +81,12 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr) return 0; } +static void scope_devices_free(struct dmar_scope *scope) +{ +scope-devices_cnt = 0; +xfree(scope-devices); +} + static void __init disable_all_dmar_units(void) { struct acpi_drhd_unit *drhd, *_drhd; @@ -90,16 +96,19 @@ static void __init disable_all_dmar_units(void) list_for_each_entry_safe ( drhd, _drhd, acpi_drhd_units, list ) { list_del(drhd-list); +scope_devices_free(drhd-scope); xfree(drhd); } list_for_each_entry_safe ( rmrr, _rmrr, acpi_rmrr_units, list ) { list_del(rmrr-list); +scope_devices_free(rmrr-scope); xfree(rmrr); } list_for_each_entry_safe ( atsr, _atsr, acpi_atsr_units, list ) { list_del(atsr-list); +scope_devices_free(atsr-scope); xfree(atsr); } } @@ -318,13 +327,13 @@ static int __init acpi_parse_dev_scope( if ( (cnt = scope_device_count(start, end)) 0 ) return cnt; -scope-devices_cnt = cnt; if ( cnt 0 ) { scope-devices = xzalloc_array(u16, cnt); if ( !scope-devices ) return -ENOMEM; 
} +scope-devices_cnt = cnt; while ( start end ) { @@ -427,7 +436,7 @@ static int __init acpi_parse_dev_scope( out: if ( ret ) -xfree(scope-devices); +scope_devices_free(scope); return ret; } @@ -476,11 +485,6 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) if ( ret ) goto out; -dev_scope_start = (void *)(drhd + 1); -dev_scope_end = ((void *)drhd) + header-length; -ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end, - dmaru-scope, DMAR_TYPE, drhd-segment); - if ( dmaru-include_all ) { if ( iommu_verbose ) @@ -495,7 +499,13 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) if ( drhd-segment == 0 ) include_all = 1; } +if ( ret ) +goto out; +dev_scope_start = (void *)(drhd + 1); +dev_scope_end = ((void *)drhd) + header-length; +ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end, + dmaru-scope, DMAR_TYPE, drhd-segment); if ( ret ) goto out; else if ( force_iommu || dmaru-include_all ) @@ -542,6 +552,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) Workaround BIOS bug: ignore the DRHD due to all devices under its scope are not PCI discoverable!\n); +scope_devices_free(dmaru-scope); iommu_free(dmaru); xfree(dmaru); } @@ -552,6 +563,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) its scope are not PCI discoverable! 
Pls try option iommu=force or iommu=workaround_bios_bug if you really want VT-d\n); +scope_devices_free(dmaru-scope); ret = -EINVAL; } } @@ -565,6 +577,7 @@ out: iommu_free(dmaru); xfree(dmaru); } + return ret; } @@ -658,6 +671,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) Ignore the RMRR (%PRIx64, %PRIx64) due to devices under its scope are not PCI discoverable!\n, rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); xfree(rmrru); } else if ( base_addr end_addr ) @@ -665,6 +679,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) dprintk(XENLOG_WARNING VTDPREFIX, The RMRR (%PRIx64, %PRIx64) is incorrect!\n, rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); xfree(rmrru); ret = -EFAULT; } @@ -727,7 +742,10
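The ordering this patch argues for (publish the device count only after the array has been allocated, and reset both together on teardown) can be sketched in plain userspace C. This is an analogy rather than the Xen code: `struct scope`, `scope_alloc()` and `scope_free()` are hypothetical names, calloc/free stand in for xzalloc_array/xfree, and clearing the pointer after freeing is an extra precaution that the quoted scope_devices_free() does not take.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dmar_scope. */
struct scope {
    uint16_t *devices;
    int devices_cnt;
};

/* Allocate room for cnt devices; the count is set only on success. */
static int scope_alloc(struct scope *s, int cnt)
{
    if (cnt < 0)
        return cnt;

    if (cnt > 0) {
        s->devices = calloc((size_t)cnt, sizeof(*s->devices));
        if (!s->devices)
            return -1; /* devices_cnt is still 0, nothing to unwind */
    }
    s->devices_cnt = cnt;
    return 0;
}

/* Release the array and reset the count so the pair stays consistent. */
static void scope_free(struct scope *s)
{
    s->devices_cnt = 0;
    free(s->devices);
    s->devices = NULL; /* not in the patch; avoids a dangling pointer */
}
```

With this ordering a failed allocation never leaves a non-zero count pointing at no memory, and every error path can call the same free helper.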
[Xen-devel] [PATCH v7 4/4] iommu: add rmrr Xen command line option for extra rmrrs
From: Elena Ufimtseva elena.ufimts...@oracle.com From: Elena Ufimtseva elena.ufimts...@oracle.com On some platforms RMRR regions may be not specified in ACPI and thus will not be mapped 1:1 in dom0. This causes IO Page Faults and prevents dom0 from booting in PVH mode. New Xen command line option rmrr allows to specify such devices and memory regions. These regions are added to the list of RMRR defined in ACPI if the device is present in system. As a result, additional RMRRs will be mapped 1:1 in dom0 with correct permissions. Mentioned above problems were discovered during PVH work with ThinkCentre M and Dell 5600T. No official documentation was found so far in regards to what devices and why cause this. Experiments show that ThinkCentre M USB devices with enabled debug port generate DMA read transactions to the regions of memory marked reserved in host e820 map. For Dell 5600T the device and faulting addresses are not found yet. For detailed history of the discussion please check following threads: http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html Format for rmrr Xen command line option: rmrr=start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]] If grub2 used and multiple ranges are specified, ';' should be quoted/escaped, refer to grub2 manual for more information. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- docs/misc/xen-command-line.markdown | 12 +++ xen/drivers/passthrough/vtd/dmar.c | 183 +++- 2 files changed, 194 insertions(+), 1 deletion(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 4889e27..d2f0668 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1185,6 +1185,18 @@ Specify the host reboot method. 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by default it will use that method first). 
+### rmrr + '= start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]] + +Define RMRRs units that are missing from ACPI table along with device they +belong to and use them for 1:1 mapping. End addresses can be omitted and one +page will be mapped. The ranges are inclusive when start and end are specified. +If segment of the first device is not specified, segment zero will be used. +If other segments are not specified, first device segment will be used. +If segments are specified for every device and not equal, an error will be reported. +Note: grub2 requires to escape or use quotations if special characters are used, +namely ';', refer to the grub2 documentation if multiple ranges are specified. + ### ro-hpet `= boolean` diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index 5d78a37..857373f 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -869,6 +869,120 @@ out: return ret; } +#define MAX_EXTRA_RMRR_PAGES 16 +#define MAX_EXTRA_RMRR 10 + +/* RMRR units derived from command line rmrr option */ +#define MAX_EXTRA_RMRR_DEV 20 +struct extra_rmrr_unit { +struct list_head list; +unsigned long base_pfn, end_pfn; +u16dev_count; +u32sbdf[MAX_EXTRA_RMRR_DEV]; +}; +static __initdata unsigned int nr_rmrr; +static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR]; + +#define PRI_RMRRL [%lx - %lx] +static void __init add_extra_rmrr(void) +{ +struct acpi_rmrr_unit *acpi_rmrr; +unsigned int dev, seg, i, j; +unsigned long pfn; + +for ( i = 0; i nr_rmrr; i++ ) +{ +if ( extra_rmrr_units[i].base_pfn extra_rmrr_units[i].end_pfn ) +{ +printk(XENLOG_ERR VTDPREFIX + Start pfn end pfn for RMRR range PRI_RMRRL\n, + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn); +continue; +} + +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn = MAX_EXTRA_RMRR_PAGES ) +{ +printk(XENLOG_ERR VTDPREFIX + RMRR range exceeds %s pages 
PRI_RMRRL\n,__stringify(MAX_EXTRA_RMRR_PAGES), + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn); +continue; +} + +for ( j = 0; j nr_rmrr; j++ ) +{ +if ( i != j extra_rmrr_units[i].base_pfn = extra_rmrr_units[j].end_pfn + extra_rmrr_units[j].base_pfn = extra_rmrr_units[i].end_pfn ) +{ +printk(XENLOG_ERR VTDPREFIX + Overlapping RMRRs PRI_RMRRL and PRI_RMRRL\n, + extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn, + extra_rmrr_units[j].base_pfn, extra_rmrr_units[j].end_pfn); +break; +} +} +/* Broke out of the overlap loop check, continue with next rmrr. */ +if ( j nr_rmrr
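The three checks that add_extra_rmrr() applies to each command-line range (start not above end, at most MAX_EXTRA_RMRR_PAGES pages, no overlap with another extra range) can be tried out in isolation. The sketch below is a userspace approximation under the assumption that ranges are inclusive pfn pairs as described in the documentation hunk; `struct pfn_range`, `range_valid()` and `ranges_overlap()` are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EXTRA_RMRR_PAGES 16

/* Hypothetical stand-in for the pfn pair in struct extra_rmrr_unit. */
struct pfn_range {
    unsigned long base_pfn, end_pfn; /* inclusive on both ends */
};

/* A range is usable if it is ordered and spans at most 16 pages. */
static bool range_valid(const struct pfn_range *r)
{
    return r->base_pfn <= r->end_pfn &&
           r->end_pfn - r->base_pfn < MAX_EXTRA_RMRR_PAGES;
}

/* Two inclusive ranges overlap if each starts no later than the other ends. */
static bool ranges_overlap(const struct pfn_range *a, const struct pfn_range *b)
{
    return a->base_pfn <= b->end_pfn && b->base_pfn <= a->end_pfn;
}
```

Because the ranges are inclusive, two ranges that merely share an end page (for example 0x10-0x1f and 0x1f-0x20) count as overlapping, while 0x10-0x1f and 0x20-0x21 do not.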
[Xen-devel] [PATCH v7 2/4] iommu VT-d: separate rmrr addition function
From: Elena Ufimtseva elena.ufimts...@oracle.com In preparation for auxiliary RMRR data provided on Xen command line, make RMRR adding a separate function. Also free memery for rmrr device scope in error path. Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com --- xen/drivers/passthrough/vtd/dmar.c | 130 - 1 file changed, 69 insertions(+), 61 deletions(-) diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index a675bf7..5d78a37 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -581,6 +581,72 @@ out: return ret; } +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru) +{ +bool_t ignore = 0; +unsigned int i = 0; +int ret = 0; + +/* Skip checking if segment is not accessible yet. */ +if ( !pci_known_segment(rmrru-segment) ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, UNKNOWN Prefix! %04x, rmrru-segment); +i = UINT_MAX; +} + +for ( ; i rmrru-scope.devices_cnt; i++ ) +{ +u8 b = PCI_BUS(rmrru-scope.devices[i]); +u8 d = PCI_SLOT(rmrru-scope.devices[i]); +u8 f = PCI_FUNC(rmrru-scope.devices[i]); + +if ( pci_device_detect(rmrru-segment, b, d, f) == 0 ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, + Non-existent device (%04x:%02x:%02x.%u) is reported + in RMRR (%PRIx64, %PRIx64)'s scope!\n, +rmrru-segment, b, d, f, +rmrru-base_address, rmrru-end_address); +ignore = 1; +} +else +{ +ignore = 0; +break; +} +} + +if ( ignore ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, + Ignore the RMRR (%PRIx64, %PRIx64) due to +devices under its scope are not PCI discoverable!\n, +rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); +xfree(rmrru); +} +else if ( rmrru-base_address rmrru-end_address ) +{ +dprintk(XENLOG_WARNING VTDPREFIX, + The RMRR (%PRIx64, %PRIx64) is incorrect!\n, +rmrru-base_address, rmrru-end_address); +scope_devices_free(rmrru-scope); +xfree(rmrru); +ret = -EFAULT; +} +else +{ +if ( iommu_verbose ) +dprintk(VTDPREFIX, + RMRR region: base_addr %PRIx64 + end_address %PRIx64\n, 
+rmrru-base_address, rmrru-end_address); +acpi_register_rmrr_unit(rmrru); +} + +return ret; +} + static int __init acpi_parse_one_rmrr(struct acpi_dmar_header *header) { @@ -631,68 +697,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header) ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end, rmrru-scope, RMRR_TYPE, rmrr-segment); -if ( ret || (rmrru-scope.devices_cnt == 0) ) -xfree(rmrru); +if ( !ret (rmrru-scope.devices_cnt != 0) ) +register_one_rmrr(rmrru); else -{ -u8 b, d, f; -bool_t ignore = 0; -unsigned int i = 0; - -/* Skip checking if segment is not accessible yet. */ -if ( !pci_known_segment(rmrr-segment) ) -i = UINT_MAX; - -for ( ; i rmrru-scope.devices_cnt; i++ ) -{ -b = PCI_BUS(rmrru-scope.devices[i]); -d = PCI_SLOT(rmrru-scope.devices[i]); -f = PCI_FUNC(rmrru-scope.devices[i]); - -if ( !pci_device_detect(rmrr-segment, b, d, f) ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, - Non-existent device (%04x:%02x:%02x.%u) is reported - in RMRR (%PRIx64, %PRIx64)'s scope!\n, -rmrr-segment, b, d, f, -rmrru-base_address, rmrru-end_address); -ignore = 1; -} -else -{ -ignore = 0; -break; -} -} - -if ( ignore ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, - Ignore the RMRR (%PRIx64, %PRIx64) due to -devices under its scope are not PCI discoverable!\n, -rmrru-base_address, rmrru-end_address); -scope_devices_free(rmrru-scope); -xfree(rmrru); -} -else if ( base_addr end_addr ) -{ -dprintk(XENLOG_WARNING VTDPREFIX, - The RMRR (%PRIx64, %PRIx64) is incorrect!\n, -rmrru-base_address, rmrru-end_address); -scope_devices_free(rmrru-scope); -xfree(rmrru); -ret = -EFAULT; -} -else -{ -if ( iommu_verbose ) -dprintk(VTDPREFIX, - RMRR region: base_addr %PRIx64 - end_address %PRIx64\n
Re: [Xen-devel] [PATCH v6 2/4] iommu VT-d: separate rmrr addition function
On Mon, Jun 01, 2015 at 04:51:55AM +0000, Tian, Kevin wrote:
> > From: elena.ufimts...@oracle.com [mailto:elena.ufimts...@oracle.com]
> > Sent: Saturday, May 30, 2015 5:39 AM
> >
> > From: Elena Ufimtseva <elena.ufimts...@oracle.com>
> >
> > In preparation for auxiliary RMRR data provided on Xen
> > command line, make RMRR adding a separate function.
> > Also free memory for rmrr device scope in error path.
> > No changes since v5.
> >
> > Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
> > Reviewed-by: Jan Beulich <jbeul...@suse.com>
> > ---
> >  xen/drivers/passthrough/vtd/dmar.c | 129
> >  1 file changed, 70 insertions(+), 59 deletions(-)
> >
> > diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
> > index 0985150..89a2f79 100644
> > --- a/xen/drivers/passthrough/vtd/dmar.c
> > +++ b/xen/drivers/passthrough/vtd/dmar.c
> > @@ -576,6 +576,73 @@ out:
> >      return ret;
> >  }
> >
> > +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
> > +{
> > +    bool_t ignore = 0;
> > +    unsigned int i = 0;
> > +    int ret = 0;
> > +
> > +    /* Skip checking if segment is not accessible yet. */
> > +    if ( !pci_known_segment(rmrru->segment) )
> > +    {
> > +        dprintk(XENLOG_WARNING VTDPREFIX, "UNKNOWN Prefix! %04x", rmrru->segment);
> > +        i = UINT_MAX;
> > +    }
> > +
> > +    for ( ; i < rmrru->scope.devices_cnt; i++ )
> > +    {
> > +        u8 b = PCI_BUS(rmrru->scope.devices[i]);
> > +        u8 d = PCI_SLOT(rmrru->scope.devices[i]);
> > +        u8 f = PCI_FUNC(rmrru->scope.devices[i]);
> > +
> > +        if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
> > +        {
> > +            dprintk(XENLOG_WARNING VTDPREFIX,
> > +                    "  Non-existent device (%04x:%02x:%02x.%u) is reported"
> > +                    " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
> > +                    rmrru->segment, b, d, f,
> > +                    rmrru->base_address, rmrru->end_address);
> > +            ignore = 1;
> > +        }
> > +        else
> > +        {
> > +            ignore = 0;
> > +            break;
> > +        }
> > +    }
> > +
> > +    if ( ignore )
> > +    {
> > +        dprintk(XENLOG_WARNING VTDPREFIX,
> > +                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
> > +                "devices under its scope are not PCI discoverable!\n",
> > +                rmrru->base_address, rmrru->end_address);
> > +        xfree(rmrru->scope.devices);
> > +        xfree(rmrru);
> > +        ret = -EFAULT;
> > +    }
> > +    else if ( rmrru->base_address > rmrru->end_address )
> > +    {
> > +        dprintk(XENLOG_WARNING VTDPREFIX,
> > +                "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
> > +                rmrru->base_address, rmrru->end_address);
> > +        xfree(rmrru->scope.devices);
> > +        xfree(rmrru);
> > +        ret = -EFAULT;
> > +    }
>
> above two error handling can be combined into one at the end of the func
> like in other places.
>
> Thanks
> Kevin

Hi Kevin

Thank you for the review.
I think in this case I cannot combine these two, as ret should not be
set in the first (ignore) branch. Looks like I placed it there by mistake.

Elena

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 2/4] iommu VT-d: separate rmrr addition function
On Mon, Jun 01, 2015 at 09:53:55AM +0100, Jan Beulich wrote:
> >>> On 29.05.15 at 23:38, <elena.ufimts...@oracle.com> wrote:
> > In preparation for auxiliary RMRR data provided on Xen
> > command line, make RMRR adding a separate function.
> > Also free memory for rmrr device scope in error path.
> > No changes since v5.
>
> Certainly there is. (And the statement wouldn't belong here
> anyway, but below the first --- separator.)
>
> > Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
> > Reviewed-by: Jan Beulich <jbeul...@suse.com>
>
> And certainly I didn't approve it in this shape:
>
> > +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
> > +{
> > +    bool_t ignore = 0;
> > +    unsigned int i = 0;
> > +    int ret = 0;
> > +
> > +    /* Skip checking if segment is not accessible yet. */
> > +    if ( !pci_known_segment(rmrru->segment) )
> > +    {
> > +        dprintk(XENLOG_WARNING VTDPREFIX, "UNKNOWN Prefix! %04x", rmrru->segment);
> > +        i = UINT_MAX;
> > +    }
> > +
> > +    for ( ; i < rmrru->scope.devices_cnt; i++ )
> > +    {
> > +        u8 b = PCI_BUS(rmrru->scope.devices[i]);
> > +        u8 d = PCI_SLOT(rmrru->scope.devices[i]);
> > +        u8 f = PCI_FUNC(rmrru->scope.devices[i]);
> > +
> > +        if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
> > +        {
> > +            dprintk(XENLOG_WARNING VTDPREFIX,
> > +                    "  Non-existent device (%04x:%02x:%02x.%u) is reported"
> > +                    " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
> > +                    rmrru->segment, b, d, f,
> > +                    rmrru->base_address, rmrru->end_address);
> > +            ignore = 1;
> > +        }
> > +        else
> > +        {
> > +            ignore = 0;
> > +            break;
> > +        }
> > +    }
> > +
> > +    if ( ignore )
> > +    {
> > +        dprintk(XENLOG_WARNING VTDPREFIX,
> > +                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
> > +                "devices under its scope are not PCI discoverable!\n",
> > +                rmrru->base_address, rmrru->end_address);
> > +        xfree(rmrru->scope.devices);
> > +        xfree(rmrru);
> > +        ret = -EFAULT;
>
> You _again_ made this an error, which it wasn't before. A little more
> care please.

Yes, and I agreed that it did not make sense to set ret here; wishful
typing, I guess :)

> Also you folded the leak fix into here without saying so. As said on the
> solitary leak fix patch - that change belongs there (not the least because
> we will want to backport that but not this one).

Yes, changing this.

> Jan
Re: [Xen-devel] [PATCH v2] dmar: device scope mem leak fix
On Mon, Jun 01, 2015 at 09:45:51AM +0100, Jan Beulich wrote:
> >>> On 01.06.15 at 06:47, <kevin.t...@intel.com> wrote:
> >> From: Tian, Kevin
> >> Sent: Monday, June 01, 2015 12:43 PM
> >>
> >> and looks you dropped earlier changes to acpi_parse_one_rmrr. any
> >> elaboration why it's not required in this version?
> >
> > Never mind this one. Seems you have it in RMRR patch set.
>
> No - it belongs here, not there.
>
> Jan

Yes, Jan is right: it went into the rmrr patch, but it has to be here,
in the mem leak fix.
[Xen-devel] [PATCH v2] dmar: device scope mem leak fix
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Release memory allocated for scope.devices when disabling
dmar units. Also set device count after memory allocation when
device scope parsing.

Changes in v2:
 - release memory for devices scope on error paths in acpi_parse_one_drhd
   and acpi_parse_one_atsr and set the count to zero;

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/drivers/passthrough/vtd/dmar.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..0985150 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -90,16 +90,19 @@ static void __init disable_all_dmar_units(void)
     list_for_each_entry_safe ( drhd, _drhd, &acpi_drhd_units, list )
     {
         list_del(&drhd->list);
+        xfree(drhd->scope.devices);
         xfree(drhd);
     }
     list_for_each_entry_safe ( rmrr, _rmrr, &acpi_rmrr_units, list )
     {
         list_del(&rmrr->list);
+        xfree(rmrr->scope.devices);
         xfree(rmrr);
     }
     list_for_each_entry_safe ( atsr, _atsr, &acpi_atsr_units, list )
     {
         list_del(&atsr->list);
+        xfree(atsr->scope.devices);
         xfree(atsr);
     }
 }
@@ -318,13 +321,13 @@ static int __init acpi_parse_dev_scope(
     if ( (cnt = scope_device_count(start, end)) < 0 )
         return cnt;
 
-    scope->devices_cnt = cnt;
     if ( cnt > 0 )
     {
         scope->devices = xzalloc_array(u16, cnt);
         if ( !scope->devices )
             return -ENOMEM;
     }
+    scope->devices_cnt = cnt;
 
     while ( start < end )
     {
@@ -427,7 +430,10 @@ static int __init acpi_parse_dev_scope(
 
  out:
     if ( ret )
+    {
+        scope->devices_cnt = 0;
         xfree(scope->devices);
+    }
 
     return ret;
 }
@@ -478,8 +484,6 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
 
     dev_scope_start = (void *)(drhd + 1);
     dev_scope_end = ((void *)drhd) + header->length;
-    ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
-                               &dmaru->scope, DMAR_TYPE, drhd->segment);
 
     if ( dmaru->include_all )
     {
@@ -496,6 +500,8 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
         include_all = 1;
     }
 
+    ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
+                               &dmaru->scope, DMAR_TYPE, drhd->segment);
     if ( ret )
         goto out;
     else if ( force_iommu || dmaru->include_all )
@@ -554,6 +560,8 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
                         " really want VT-d\n");
                 ret = -EINVAL;
             }
+            dmaru->scope.devices_cnt = 0;
+            xfree(dmaru->scope.devices);
         }
         else
             acpi_register_drhd_unit(dmaru);
@@ -727,7 +735,11 @@ acpi_parse_one_atsr(struct acpi_dmar_header *header)
     }
 
     if ( ret )
+    {
+        atsru->scope.devices_cnt = 0;
+        xfree(atsru->scope.devices);
         xfree(atsru);
+    }
     else
         acpi_register_atsr_unit(atsru);
     return ret;
--
2.1.3
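For readers following the leak fix outside the Xen tree, the ordering it establishes (publish the device count only after the array allocation succeeded, and reset both count and pointer when tearing down) can be sketched in stand-alone C. This is only an illustration under stated assumptions: struct demo_scope, demo_scope_init() and demo_scope_free() are hypothetical names, and calloc()/free() stand in for the hypervisor's xzalloc_array()/xfree().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device scope kept per DMAR unit. */
struct demo_scope {
    uint16_t *devices;
    int devices_cnt;
};

/*
 * Allocate room for cnt devices. The count is stored only after the
 * allocation succeeded, so a failed call never leaves a non-zero count
 * next to a NULL array.
 */
int demo_scope_init(struct demo_scope *scope, int cnt)
{
    if ( cnt < 0 )
        return cnt;                 /* pass a counting error through */

    scope->devices = NULL;
    scope->devices_cnt = 0;

    if ( cnt > 0 )
    {
        scope->devices = calloc((size_t)cnt, sizeof(*scope->devices));
        if ( !scope->devices )
            return -ENOMEM;
    }
    scope->devices_cnt = cnt;

    return 0;
}

/* Error-path and teardown helper: free the array and zero the count. */
void demo_scope_free(struct demo_scope *scope)
{
    free(scope->devices);
    scope->devices = NULL;
    scope->devices_cnt = 0;
}
```

Any error path that gives up on a unit would call demo_scope_free() before freeing the unit itself, which mirrors the xfree(...->scope.devices) lines added in the patch.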
[Xen-devel] [PATCH 4/4] iommu: add rmrr Xen command line option for extra rmrrs
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

On some platforms RMRR regions may not be specified
in ACPI and thus will not be mapped 1:1 in dom0. This
causes IO Page Faults and prevents dom0 from booting
in PVH mode.
New Xen command line option rmrr allows specifying
such devices and memory regions. These regions are added
to the list of RMRRs defined in ACPI if the device
is present in the system. As a result, additional RMRRs will
be mapped 1:1 in dom0 with correct permissions.

The problems mentioned above were discovered during PVH work with
ThinkCentre M and Dell 5600T. No official documentation
was found so far in regards to what devices cause this and why.
Experiments show that ThinkCentre M USB devices with
enabled debug port generate DMA read transactions
to the regions of memory marked reserved in host
e820 map.
For Dell 5600T the device and faulting addresses are
not found yet.

For detailed history of the discussion please check
following threads:
http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html

Format for rmrr Xen command line option:
rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
If grub2 is used and multiple ranges are specified, ';' should be
quoted/escaped; refer to the grub2 manual for more information.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 docs/misc/xen-command-line.markdown |  13 +++
 xen/drivers/passthrough/vtd/dmar.c  | 164 +++-
 2 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 4889e27..26e2a5e 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1185,6 +1185,19 @@ Specify the host reboot method.
 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
  default it will use that method first).
 
+### rmrr
+> '= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
+
+Define RMRR units that are missing from ACPI table along with device
+they belong to and use them for 1:1 mapping. End addresses can be omitted
+and one page will be mapped. The ranges are inclusive when start and end
+are specified. If segment of the first device is not specified, segment zero will be used.
+If other segments are not specified, first device segment will be used.
+If segments are specified for every device and not equal, an error will be reported.
+Note: grub2 requires to escape or use quotations if special
+characters are used, namely ';', refer to the grub2 documentation if multiple
+ranges are specified.
+
 ### ro-hpet
 > `= <boolean>`
 
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 89a2f79..d675940 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -866,6 +866,106 @@ out:
     return ret;
 }
 
+#define MAX_EXTRA_RMRR_PAGES 16
+#define MAX_EXTRA_RMRR 10
+
+/* RMRR units derived from command line rmrr option */
+#define MAX_EXTRA_RMRR_DEV 20
+struct extra_rmrr_unit {
+    struct list_head list;
+    unsigned long base_pfn, end_pfn;
+    u16    dev_count;
+    u32    sbdf[MAX_EXTRA_RMRR_DEV];
+};
+static __initdata unsigned int nr_rmrr;
+static struct __initdata extra_rmrr_unit rmrru[MAX_EXTRA_RMRR];
+
+static void __init add_extra_rmrr(void)
+{
+    struct acpi_rmrr_unit *rmrrn;
+    unsigned int dev, seg, i, j;
+    unsigned long pfn;
+
+    for ( i = 0; i < nr_rmrr; i++ )
+    {
+        if ( rmrru[i].base_pfn > rmrru[i].end_pfn )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "Start pfn > end pfn for RMRR range [%"PRIx64" - %"PRIx64"]\n",
+                   rmrru[i].base_pfn, rmrru[i].end_pfn);
+            return;
+        }
+
+        if ( rmrru[i].end_pfn - rmrru[i].base_pfn >= MAX_EXTRA_RMRR_PAGES )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "RMRR range exceeds 16 pages [%"PRIx64" - %"PRIx64"]\n",
+                   rmrru[i].base_pfn, rmrru[i].end_pfn);
+            return;
+        }
+
+        for ( j = 0; j < nr_rmrr; j++ )
+        {
+            if ( i != j && rmrru[i].base_pfn <= rmrru[j].end_pfn &&
+                 rmrru[j].base_pfn <= rmrru[i].end_pfn )
+            {
+                printk(XENLOG_ERR VTDPREFIX
+                       "Overlapping RMRRs [%"PRIx64",%"PRIx64"] and [%"PRIx64",%"PRIx64"]\n",
+                       rmrru[i].base_pfn, rmrru[i].end_pfn,
+                       rmrru[j].base_pfn, rmrru[j].end_pfn);
+                return;
+            }
+        }
+
+        for ( pfn = rmrru[i].base_pfn; pfn <= rmrru[i].end_pfn; pfn++ )
+        {
+            if ( !mfn_valid(pfn) )
+                if ( iommu_verbose )
+                    printk(XENLOG_ERR VTDPREFIX
+                           "Invalid mfn in RMRR range [%"PRIx64" - %"PRIx64"]\n",
+                           rmrru[i].base_pfn, rmrru[i].end_pfn);
+            return
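The range checks in add_extra_rmrr() (start not above end, at most 16 pages, no overlap between entries) all rely on both bounds being inclusive. A minimal stand-alone sketch of that arithmetic follows; struct demo_range and the demo_* helpers are hypothetical names introduced for illustration and are not part of the patch.

```c
#include <stdbool.h>

#define DEMO_MAX_PAGES 16   /* mirrors MAX_EXTRA_RMRR_PAGES in the patch */

/* Hypothetical inclusive page-frame range: [base_pfn, end_pfn]. */
struct demo_range {
    unsigned long base_pfn, end_pfn;
};

/* Number of pages covered; both ends are inclusive, hence the +1. */
unsigned long demo_range_pages(const struct demo_range *r)
{
    return r->end_pfn - r->base_pfn + 1;
}

/* A range is acceptable if it is ordered and covers at most 16 pages. */
bool demo_range_valid(const struct demo_range *r)
{
    return r->base_pfn <= r->end_pfn &&
           r->end_pfn - r->base_pfn < DEMO_MAX_PAGES;
}

/* Two inclusive ranges overlap when each one starts before the other ends. */
bool demo_ranges_overlap(const struct demo_range *a,
                         const struct demo_range *b)
{
    return a->base_pfn <= b->end_pfn && b->base_pfn <= a->end_pfn;
}
```

With inclusive bounds, [0x10, 0x1f] is exactly 16 pages and passes, while [0x10, 0x20] is 17 pages and is rejected; ranges that merely share one boundary page count as overlapping.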
[Xen-devel] [PATCH v6 1/4] pci: add PCI_SBDF and PCI_SEG macros
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/include/xen/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..414106a 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
+#define PCI_SEG(sbdf) (((sbdf) >> 16) & 0xffff)
 
 struct pci_dev_info {
     bool_t is_extfn;
--
2.1.3
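To make the bit layout behind PCI_SBDF and PCI_SEG concrete, here is a small stand-alone sketch: the segment sits in bits 31..16, the bus in bits 15..8, the device in bits 7..3 and the function in bits 2..0. The DEMO_* macros and demo_pack_sbdf() are illustrative copies written for this note, not the definitions from xen/include/xen/pci.h.

```c
#include <stdint.h>

/* Illustrative stand-alone copies; names are hypothetical. */
#define DEMO_PCI_DEVFN(d, f)      ((((d) & 0x1f) << 3) | ((f) & 0x07))
#define DEMO_PCI_BDF(b, d, f)     ((((b) & 0xff) << 8) | DEMO_PCI_DEVFN(d, f))
#define DEMO_PCI_SBDF(s, b, d, f) ((((s) & 0xffffu) << 16) | DEMO_PCI_BDF(b, d, f))

/* Field extraction from a packed 32-bit segment:bus:dev.func value. */
#define DEMO_PCI_SEG(sbdf)  (((sbdf) >> 16) & 0xffff)
#define DEMO_PCI_BUS(sbdf)  (((sbdf) >> 8) & 0xff)
#define DEMO_PCI_SLOT(sbdf) (((sbdf) >> 3) & 0x1f)
#define DEMO_PCI_FUNC(sbdf) ((sbdf) & 0x07)

/* Pack the four components into one 32-bit value. */
uint32_t demo_pack_sbdf(unsigned int seg, unsigned int bus,
                        unsigned int dev, unsigned int func)
{
    return DEMO_PCI_SBDF(seg, bus, dev, func);
}
```

For example, segment 0x0001, bus 0x02, device 0x1f, function 7 packs to 0x000102ff, and DEMO_PCI_SEG() on that value yields 1 again.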
[Xen-devel] [PATCH v6 3/4] pci: add wrapper for parse_pci
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

For sbdf's parsing in rmrr command line add __parse_pci with additional
parameter def_seg. __parse_pci will help to identify if segment was
found in string being parsed or default segment was used.
Make a wrapper parse_pci so the rest of the callers are not affected.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/drivers/pci/pci.c | 11 +++
 xen/include/xen/pci.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index ca07ed0..788a356 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p)
 {
+    bool_t def_seg;
+
+    return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg);
+}
+
+const char *__init __parse_pci(const char *s, unsigned int *seg_p,
+                               unsigned int *bus_p, unsigned int *dev_p,
+                               unsigned int *func_p, bool_t *def_seg)
+{
     unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func;
 
     if ( *s != ':' )
         return NULL;
     bus = simple_strtoul(s + 1, &s, 16);
+    *def_seg = 0;
     if ( *s == ':' )
         dev = simple_strtoul(s + 1, &s, 16);
     else
@@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
         dev = bus;
         bus = seg;
         seg = 0;
+        *def_seg = 1;
     }
     if ( func_p )
     {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 414106a..d66ecab 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -150,6 +150,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
                       unsigned int *dev, unsigned int *func);
+const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus,
+                        unsigned int *dev, unsigned int *func, bool_t *def_seg);
+
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
 
--
2.1.3
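The idea behind the def_seg out-parameter (telling the caller whether the segment was written out or silently defaulted to zero) can be tried outside the hypervisor with a small user-space sketch. This is an assumption-laden illustration, not the Xen code: demo_parse_pci() is a hypothetical name, it uses the C library's strtoul() instead of simple_strtoul(), and unlike the real parse_pci() it always requires the ".func" part.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse "[seg:]bus:dev.func" (all hexadecimal). On success the fields
 * are stored, *def_seg reports whether the segment was omitted, and a
 * pointer to the first unparsed character is returned. NULL on error.
 */
const char *demo_parse_pci(const char *s, unsigned int *seg_p,
                           unsigned int *bus_p, unsigned int *dev_p,
                           unsigned int *func_p, bool *def_seg)
{
    char *end;
    unsigned long seg, bus, dev, func;

    seg = strtoul(s, &end, 16);
    if ( end == s || *end != ':' )
        return NULL;

    s = end + 1;
    bus = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    if ( *end == ':' )
    {
        /* Three components before the dot: seg:bus:dev. */
        s = end + 1;
        dev = strtoul(s, &end, 16);
        if ( end == s )
            return NULL;
        *def_seg = false;
    }
    else
    {
        /* Only bus:dev was given, so shift and default the segment. */
        dev = bus;
        bus = seg;
        seg = 0;
        *def_seg = true;
    }

    if ( *end != '.' )
        return NULL;

    s = end + 1;
    func = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    if ( seg > 0xffff || bus > 0xff || dev > 0x1f || func > 0x7 )
        return NULL;

    *seg_p = (unsigned int)seg;
    *bus_p = (unsigned int)bus;
    *dev_p = (unsigned int)dev;
    *func_p = (unsigned int)func;

    return end;
}
```

A caller parsing a comma-separated device list can use the returned pointer to continue after each entry, and use def_seg to decide whether a later device should inherit the first device's segment.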
[Xen-devel] [PATCH v6 2/4] iommu VT-d: separate rmrr addition function
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

In preparation for auxiliary RMRR data provided on Xen
command line, make RMRR adding a separate function.
Also free memory for rmrr device scope in error path.
No changes since v5.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
---
 xen/drivers/passthrough/vtd/dmar.c | 129
 1 file changed, 70 insertions(+), 59 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 0985150..89a2f79 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -576,6 +576,73 @@ out:
     return ret;
 }
 
+static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
+{
+    bool_t ignore = 0;
+    unsigned int i = 0;
+    int ret = 0;
+
+    /* Skip checking if segment is not accessible yet. */
+    if ( !pci_known_segment(rmrru->segment) )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX, "UNKNOWN Prefix! %04x", rmrru->segment);
+        i = UINT_MAX;
+    }
+
+    for ( ; i < rmrru->scope.devices_cnt; i++ )
+    {
+        u8 b = PCI_BUS(rmrru->scope.devices[i]);
+        u8 d = PCI_SLOT(rmrru->scope.devices[i]);
+        u8 f = PCI_FUNC(rmrru->scope.devices[i]);
+
+        if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
+        {
+            dprintk(XENLOG_WARNING VTDPREFIX,
+                    "  Non-existent device (%04x:%02x:%02x.%u) is reported"
+                    " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
+                    rmrru->segment, b, d, f,
+                    rmrru->base_address, rmrru->end_address);
+            ignore = 1;
+        }
+        else
+        {
+            ignore = 0;
+            break;
+        }
+    }
+
+    if ( ignore )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
+                "devices under its scope are not PCI discoverable!\n",
+                rmrru->base_address, rmrru->end_address);
+        xfree(rmrru->scope.devices);
+        xfree(rmrru);
+        ret = -EFAULT;
+    }
+    else if ( rmrru->base_address > rmrru->end_address )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
+                rmrru->base_address, rmrru->end_address);
+        xfree(rmrru->scope.devices);
+        xfree(rmrru);
+        ret = -EFAULT;
+    }
+    else
+    {
+        if ( iommu_verbose )
+            dprintk(VTDPREFIX,
+                    "  RMRR region: base_addr %"PRIx64
+                    " end_address %"PRIx64"\n",
+                    rmrru->base_address, rmrru->end_address);
+        acpi_register_rmrr_unit(rmrru);
+    }
+
+    return ret;
+}
+
 static int __init
 acpi_parse_one_rmrr(struct acpi_dmar_header *header)
 {
@@ -626,66 +693,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
     ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
                                &rmrru->scope, RMRR_TYPE, rmrr->segment);
 
-    if ( ret || (rmrru->scope.devices_cnt == 0) )
-        xfree(rmrru);
+    if ( !ret && (rmrru->scope.devices_cnt != 0) )
+        return register_one_rmrr(rmrru);
     else
-    {
-        u8 b, d, f;
-        bool_t ignore = 0;
-        unsigned int i = 0;
-
-        /* Skip checking if segment is not accessible yet. */
-        if ( !pci_known_segment(rmrr->segment) )
-            i = UINT_MAX;
-
-        for ( ; i < rmrru->scope.devices_cnt; i++ )
-        {
-            b = PCI_BUS(rmrru->scope.devices[i]);
-            d = PCI_SLOT(rmrru->scope.devices[i]);
-            f = PCI_FUNC(rmrru->scope.devices[i]);
-
-            if ( !pci_device_detect(rmrr->segment, b, d, f) )
-            {
-                dprintk(XENLOG_WARNING VTDPREFIX,
-                        "  Non-existent device (%04x:%02x:%02x.%u) is reported"
-                        " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
-                        rmrr->segment, b, d, f,
-                        rmrru->base_address, rmrru->end_address);
-                ignore = 1;
-            }
-            else
-            {
-                ignore = 0;
-                break;
-            }
-        }
-
-        if ( ignore )
-        {
-            dprintk(XENLOG_WARNING VTDPREFIX,
-                    "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
-                    "devices under its scope are not PCI discoverable!\n",
-                    rmrru->base_address, rmrru->end_address);
-            xfree(rmrru);
-        }
-        else if ( base_addr > end_addr )
-        {
-            dprintk(XENLOG_WARNING VTDPREFIX,
-                    "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
-                    rmrru->base_address, rmrru->end_address);
-            xfree(rmrru);
-            ret = -EFAULT;
-        }
-        else
-        {
-            if ( iommu_verbose )
-                dprintk(VTDPREFIX,
-                        "  RMRR region: base_addr %"PRIx64
-                        " end_address %"PRIx64"\n",
-                        rmrru
[Xen-devel] [PATCH v6 0/4] iommu: add rmrr Xen command line option
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

v6 of the extra rmrr series, with comments from Jan Beulich addressed.
Any suggestions are welcome.

Add Xen command line option rmrr to specify RMRR
regions for devices that are not defined in ACPI, thus
causing IO Page Faults while booting dom0 in PVH mode.
These additional regions will be added to the list of
RMRR regions parsed from ACPI.

Changes in v6:
 - make __parse_pci return correct result and error codes;
 - move add_extra_rmrr - previous patch was missing RMRR addresses in
   range check, add it here;
 - add overlap check and range boundaries check;
 - moved extra rmrr structure definition to dmar.c;
 - change def_seg in __parse_pci type from int to bool_t;
 - change name for extra rmrr range to reflect they hold now pfns;

Changes in v5:
 - make parse_pci a wrapper and add __parse_pci with additional def_seg
   param to identify if segment was specified;
 - make possible not to define segment for each device within same rmrr;
 - limit number of pages for one RMRR by 16;
 - run mfn_valid check for every address in RMRR range;
 - add PCI_SBDF macro;
 - remove list for extra rmrrs as they are kept in static array;

Changes in v4 after comments by Jan Beulich:
 - keep sbdf per device instead of bdf and one segment per RMRR
   when parsing and compare later;
 - add check for segment values and make sure they are same for one RMRR;
 - move RMRR parameters checks and add error messages if RMRRs are incorrect;
 - make relevant variables and functions static;
 - mention requirement for segment values in rmrr documentation;

Changes in v3:
 - use ';' instead of '#' in command line and add proper notes for grub ';'
   special treatment;

Changes in v2:
 - move rmrr parser to dmar.c and make it custom_param;
 - change of rmrr command line option format;
   since adding multiple device per range support needs to utilize more
   special characters and offered from the previous review ';' is not
   supported, '[' ']' are reserved, ':' is used in the pci format, range
   and devices are separated by '#';
   Suggestions are welcome;
 - added support for multiple devices per range;
 - moved adding misc RMRRs before ACPI RMRR parsing;
 - make parser fail if pci device is specified incorrectly;

Elena Ufimtseva (4):
  pci: add PCI_SBDF and PCI_SEG macros
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown |  13 ++
 xen/drivers/passthrough/vtd/dmar.c  | 293
 xen/drivers/pci/pci.c               |  11 ++
 xen/include/xen/pci.h               |   5 +
 4 files changed, 262 insertions(+), 60 deletions(-)

--
2.1.3
Re: [Xen-devel] [PATCH] dmar: device scope mem leak fix
On Thu, May 28, 2015 at 08:57:16AM +0100, Jan Beulich wrote:
> >>> On 27.05.15 at 21:56, <elena.ufimts...@oracle.com> wrote:
> > On Tue, May 26, 2015 at 10:46:30AM +0100, Jan Beulich wrote:
> >> >>> On 23.05.15 at 03:27, <elena.ufimts...@oracle.com> wrote:
> >> > @@ -658,6 +661,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
> >> >                      "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
> >> >                      "devices under its scope are not PCI discoverable!\n",
> >> >                      rmrru->base_address, rmrru->end_address);
> >> > +            xfree(rmrru->scope.devices);
> >> >              xfree(rmrru);
> >
> > Do you think the ret should be set in this case also?
>
> Iirc in an earlier version of the other series you had added a failure
> error code setting here, and I had to specifically ask you to remove it.
> If you still think one is needed here, this would need to be a separate
> patch with a proper explanation.
>
> Jan

Hi Jan

I do remember this. I looked again, and I think it makes sense for the
ignore path not to return an error.
Re: [Xen-devel] [PATCH v5 4/4] iommu: add rmrr Xen command line option for extra rmrrs
On Tue, May 26, 2015 at 01:02:27PM +0100, Jan Beulich wrote:
> >>> On 23.05.15 at 03:33, <elena.ufimts...@oracle.com> wrote:
> > --- a/docs/misc/xen-command-line.markdown
> > +++ b/docs/misc/xen-command-line.markdown
> > @@ -1185,6 +1185,19 @@ Specify the host reboot method.
> >  'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
> >   default it will use that method first).
> >
> > +### rmrr
> > +> '= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
> > +
> > +Define RMRRs units that are missing from ACPI table along with device
> > +they belong to and use them for 1:1 mapping. End addresses can be omitted
> > +and one page will be mapped. The ranges are inclusive when start and end
> > +are specified.If segement of the first device is not specified, the default segment will be used.

Thanks for the review, Jan.

> specified. If the segment ..., segment zero will be used.
>
> > +If segments are specified for every device and not equal, error will be reported.
>
> ..., an error ...
>
> > --- a/xen/drivers/passthrough/vtd/dmar.c
> > +++ b/xen/drivers/passthrough/vtd/dmar.c
> > @@ -50,6 +50,7 @@ static LIST_HEAD_READ_MOSTLY(acpi_rhsa_units);
> >  static struct acpi_table_header *__read_mostly dmar_table;
> >  static int __read_mostly dmar_flags;
> >  static u64 __read_mostly igd_drhd_address;
> > +static void __init add_extra_rmrr(void);
>
> Why do you need this declaration? (And if you really need it - no
> segment annotations on declarations please.)
>
> > @@ -856,6 +857,78 @@ out:
> >      return ret;
> >  }
> >
> > +#define MAX_EXTRA_RMRR_PAGES 16
> > +#define MAX_EXTRA_RMRR 10
> > +static __initdata unsigned int nr_rmrr;
> > +static struct __initdata extra_rmrr_unit rmrru[MAX_EXTRA_RMRR];
> > +
> > +static void __init add_extra_rmrr(void)
> > +{
> > +    struct acpi_rmrr_unit *rmrrn;
> > +    unsigned int dev, seg, addr;
> > +
> > +    for (unsigned int i = 0; i < nr_rmrr; i++ )
>
> No C++ style constructs like this please. Instead please add the
> missing blank after the opening parenthesis.
>
> > +    {
> > +        rmrrn = xmalloc(struct acpi_rmrr_unit);
>
> acpi_parse_one_rmrr() uses xzalloc() here. For the avoidance of
> doubt, I'd be fine with you doing so provided this is correct (i.e. all
> fields end up properly initialized, just like is the case with the
> ->scope.devices allocation), if this wasn't introducing a latent bug
> (should a field get added).

Agreed, will change this.

> > +        if ( !rmrrn )
> > +            return;
> > +
> > +        rmrrn->scope.devices = xmalloc_array(typeof(*rmrrn->scope.devices),
>
> I'm afraid I may have mislead you with comments elsewhere: In
> xmalloc() invocations, considering the typeful result it produces,
> using the spelled out type is preferred over typeof() like used here.

Thanks for the explanation, I did not know that.

> > +                                             rmrru[i].dev_count);
> > +        if ( !rmrrn->scope.devices )
> > +        {
> > +            xfree(rmrrn);
> > +            return;
> > +        }
> > +
> > +        if ( rmrru[i].end_address - rmrru[i].base_address > MAX_EXTRA_RMRR_PAGES )
>
> Now this reads really odd: With the conversion to store page numbers
> in these fields, they should have got renamed from _address (and
> afaict no longer need to be of u64 type).
>
> Also note that you have an off-by-one error here: The end address
> being inclusive, you want to bail on >= max.
>
> I also fail to see end < base being rejected somewhere. Nor are
> overlaps being dealt with (see acpi_parse_one_rmrr()).

Somehow I skipped that, possibly on the wrong branch.
Will fix this and add an overlap check.

> > +        {
> > +            printk(XENLOG_ERR VTDPREFIX
> > +                   "RMRR range exceeds 16 pages [%"PRIx64" - %"PRIx64"]\n",
> > +                   rmrru[i].base_address, rmrru[i].end_address);
> > +            xfree(rmrrn->scope.devices);
> > +            xfree(rmrrn);
> > +            return;
> > +        }
> > +
> > +        for ( addr = rmrru[i].base_address; addr <= rmrru[i].end_address; addr++ )
>
> And the loop variable here shouldn't be addr then (and
> certainly not of type unsigned int).
>
> > +        {
> > +            if ( iommu_verbose )
> > +                printk(XENLOG_ERR VTDPREFIX
> > +                       "Invalid mfn in RMRR range [%"PRIx64" - %"PRIx64"]\n",
> > +                       rmrru[i].base_address, rmrru[i].end_address);
> > +            xfree(rmrrn->scope.devices);
> > +            xfree(rmrrn);
> > +            return;
> > +        }
> > +
> > +        seg = 0;
> > +        for ( dev = 0; dev < rmrru->dev_count; dev++ )
> > +        {
> > +            rmrrn->scope.devices[dev] = rmrru->sbdf[dev];
> > +            seg = seg | (rmrru->sbdf[dev] >> 16);
>
> |=
>
> Also with you having added PCI_SBDF() in patch 1, you should add
> the matched PCI_SEG() (or some such) instead of open coding it
> here.

Yes, that will also be useful.

> > +        }
> > +        if ( seg != (rmrru->sbdf[0] >> 16) )
> > +        {
> > +            printk(XENLOG_ERR VTDPREFIX
> > +                   "Segments are not equal for RMRR range [%"PRIx64" - %"PRIx64"]\n",
> > +