Re: [Xen-devel] [PATCH] gdbsx: prefer privcmd character device

2017-10-31 Thread Elena Ufimtseva
On Tue, Oct 31, 2017 at 03:25:39PM +, Wei Liu wrote:
> On Tue, Oct 31, 2017 at 10:20:11AM -0500, Doug Goldstein wrote:
> > Prefer using the character device over the proc file if the character
> > device exists.
> > 
> > CC: Elena Ufimtseva <elena.ufimts...@oracle.com>
> > CC: Ian Jackson <ian.jack...@eu.citrix.com>
> > CC: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
> > CC: Wei Liu <wei.l...@citrix.com>
> > Signed-off-by: Doug Goldstein <car...@cardoe.com>
> > ---
> > So this was originally submitted with 9c89dc95201 and 7d418eab3b6 and
> > was rejected since the goal was to convert gdbsx to use libxc but that
> > hasn't happened. /dev/xen/privcmd should be preferred and this change
> > makes that happen. It would be nice if we landed this with the plan
> > to convert gdbsx happening when it happens.
> 
> Oh well... I think this is fine.
> 
> Elena has the final verdict.

I think this is fine.
I will look into the conversion, and into the relevant discussions if I can
find them, and see what can be done.

Thanks!

Meanwhile,
Reviewed-by: Elena Ufimtseva <elena.ufimts...@oracle.com>

Elena
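
For readers following the thread, the preference order the patch describes
could be sketched roughly as below. This is only an illustration and not the
gdbsx code: the helper names open_first_available() and open_privcmd() are
hypothetical, and the sketch simply tries to open each path in turn rather
than testing for existence first.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/*
 * Try each candidate path in order and return the first descriptor that
 * opens, or -1 if none do. The paths are passed in so the helper can be
 * exercised on a machine without Xen.
 */
int open_first_available(const char *const *paths, size_t n)
{
    assert(paths != NULL);

    for (size_t i = 0; i < n; i++) {
        int fd = open(paths[i], O_RDWR);

        if (fd >= 0)
            return fd;
    }
    return -1;
}

/* Hypothetical wrapper: character device first, legacy proc file second. */
int open_privcmd(void)
{
    static const char *const paths[] = {
        "/dev/xen/privcmd",
        "/proc/xen/privcmd",
    };

    return open_first_available(paths, sizeof(paths) / sizeof(paths[0]));
}
```

Keeping the candidates in an ordered table means the proc file stays as a
fallback for older kernels while the character device wins whenever it is
present.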

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/4] x86: remove has_hvm_container_{domain/vcpu}

2017-03-03 Thread Elena Ufimtseva
On Fri, Mar 03, 2017 at 12:25:07PM +, Roger Pau Monne wrote:
> It is now useless since PVHv1 is removed and PVHv2 is a HVM domain from Xen's
> point of view.
> 
> Signed-off-by: Roger Pau Monné <roger@citrix.com>
> Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>
> Acked-by: Tim Deegan <t...@xen.org>
> Reviewed-by: Kevin Tian <kevin.t...@intel.com>
> Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Acked-by: George Dunlap <george.dun...@citrix.com>
> ---
> Cc: Christoph Egger <cheg...@amazon.de>
> Cc: Jan Beulich <jbeul...@suse.com>
> Cc: Andrew Cooper <andrew.coop...@citrix.com>
> Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Cc: Suravee Suthikulpanit <suravee.suthikulpa...@amd.com>
> Cc: Jun Nakajima <jun.nakaj...@intel.com>
> Cc: Kevin Tian <kevin.t...@intel.com>
> Cc: Elena Ufimtseva <elena.ufimts...@oracle.com>

Hmm, I don't see the code I should ACK.
But here you go!

Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com>

> Cc: George Dunlap <george.dun...@eu.citrix.com>
> Cc: Tim Deegan <t...@xen.org>
> Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> ---
>  xen/arch/x86/cpu/mcheck/vmce.c  |  6 +++---
>  xen/arch/x86/cpu/vpmu.c |  4 ++--
>  xen/arch/x86/cpu/vpmu_amd.c | 12 ++--
>  xen/arch/x86/cpu/vpmu_intel.c   | 31 +++
>  xen/arch/x86/cpuid.c|  6 +++---
>  xen/arch/x86/debug.c|  2 +-
>  xen/arch/x86/domain.c   | 28 ++--
>  xen/arch/x86/domain_build.c |  5 ++---
>  xen/arch/x86/domctl.c   |  2 +-
>  xen/arch/x86/hvm/dm.c   |  2 +-
>  xen/arch/x86/hvm/hvm.c  |  6 +++---
>  xen/arch/x86/hvm/irq.c  |  2 +-
>  xen/arch/x86/hvm/mtrr.c |  2 +-
>  xen/arch/x86/hvm/vmsi.c |  3 +--
>  xen/arch/x86/hvm/vmx/vmcs.c |  4 ++--
>  xen/arch/x86/hvm/vmx/vmx.c  |  4 ++--
>  xen/arch/x86/mm.c   |  4 ++--
>  xen/arch/x86/mm/paging.c|  2 +-
>  xen/arch/x86/mm/shadow/common.c |  9 -
>  xen/arch/x86/setup.c|  2 +-
>  xen/arch/x86/time.c | 11 +--
>  xen/arch/x86/traps.c|  4 ++--
>  xen/arch/x86/x86_64/traps.c |  4 ++--
>  xen/drivers/passthrough/x86/iommu.c |  2 +-
>  xen/include/asm-x86/domain.h|  2 +-
>  xen/include/asm-x86/event.h |  2 +-
>  xen/include/asm-x86/guest_access.h  | 12 ++--
>  xen/include/asm-x86/hvm/hvm.h   |  2 +-
>  xen/include/xen/sched.h |  2 --
>  xen/include/xen/tmem_xen.h  |  5 ++---
>  30 files changed, 87 insertions(+), 95 deletions(-)
> 
> diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
> index 8b727b4..6fb7833 100644
> --- a/xen/arch/x86/cpu/mcheck/vmce.c
> +++ b/xen/arch/x86/cpu/mcheck/vmce.c
> @@ -82,7 +82,7 @@ int vmce_restore_vcpu(struct vcpu *v, const struct 
> hvm_vmce_vcpu *ctxt)
>  {
>  dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA capabilities"
>  " %#" PRIx64 " for %pv (supported: %#Lx)\n",
> -has_hvm_container_vcpu(v) ? "HVM" : "PV", ctxt->caps,
> +is_hvm_vcpu(v) ? "HVM" : "PV", ctxt->caps,
>  v, guest_mcg_cap & ~MCG_CAP_COUNT);
>  return -EPERM;
>  }
> @@ -364,7 +364,7 @@ int inject_vmce(struct domain *d, int vcpu)
>  if ( !v->is_initialised )
>  continue;
>  
> -if ( (has_hvm_container_domain(d) ||
> +if ( (is_hvm_domain(d) ||
>guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) &&
>   !test_and_set_bool(v->mce_pending) )
>  {
> @@ -444,7 +444,7 @@ int unmmap_broken_page(struct domain *d, mfn_t mfn, 
> unsigned long gfn)
>  if ( !mfn_valid(mfn) )
>  return -EINVAL;
>  
> -if ( !has_hvm_container_domain(d) || !paging_mode_hap(d) )
> +if ( !is_hvm_domain(d) || !paging_mode_hap(d) )
>  return -EOPNOTSUPP;
>  
>  rc = -1;
> diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
> index a1e9f00..03401fd 100644
> --- a/xen/arch/x86/cpu/vpmu.c
> +++ b/xen/arch/x86/cpu/vpmu.c
> @@ -237,7 +237,7 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs)
>  vpmu->arch_vpmu_ops->arch_vpmu_save(sampling, 1);
>  vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
>  
> -if ( has_hvm_container_vcpu(sampled) )
> +if ( is_hvm_v

Re: [Xen-devel] [PATCH v3 2/4] x86: remove PVHv1 code

2017-03-03 Thread Elena Ufimtseva
On Fri, Mar 03, 2017 at 12:25:06PM +, Roger Pau Monne wrote:
> This removal applies to both the hypervisor and the toolstack side of PVHv1.
> 
> Note that on the toolstack side a new PVH domain type is introduced to libxl.
> The "none" device model version is removed, together with the "pvh" field in
> the create info struct (the defines announcing those features are also removed
> from libxl.h). The canonical way to create a PVH guest in libxl is to add
> "pvh=1" to the guest config file.
> 
> Signed-off-by: Roger Pau Monné <roger@citrix.com>
> Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>
> Acked-by: George Dunlap <george.dun...@citrix.com>
> Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
> Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
> Reviewed-by: Kevin Tian <kevin.t...@intel.com>
> ---
> Changes since v1:
>  - Remove dom0pvh option from the command line docs.
>  - Bump domctl interface version due to the removed CDF flag.
>  - Introduce LIBXL_DOMAIN_TYPE_PVH.
>  - Remove "none" from the valid device model version options.
>  - Update the xl.cfg(5) man page to reflect the changes.
> 

For gdbsx bits:
Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
> ---
> Cc: Ian Jackson <ian.jack...@eu.citrix.com>
> Cc: Wei Liu <wei.l...@citrix.com>
> Cc: Elena Ufimtseva <elena.ufimts...@oracle.com>
> Cc: Jan Beulich <jbeul...@suse.com>
> Cc: Andrew Cooper <andrew.coop...@citrix.com>
> Cc: Paul Durrant <paul.durr...@citrix.com>
> Cc: Jun Nakajima <jun.nakaj...@intel.com>
> Cc: Kevin Tian <kevin.t...@intel.com>
> Cc: George Dunlap <george.dun...@eu.citrix.com>
> Cc: Razvan Cojocaru <rcojoc...@bitdefender.com>
> Cc: Tamas K Lengyel <ta...@tklengyel.com>
> ---
>  docs/man/xl.cfg.pod.5.in|  16 +-
>  docs/misc/pvh-readme.txt|  63 
>  docs/misc/xen-command-line.markdown |   7 -
>  tools/debugger/gdbsx/xg/xg_main.c   |   4 +-
>  tools/libxc/include/xc_dom.h|   1 -
>  tools/libxc/include/xenctrl.h   |   2 +-
>  tools/libxc/xc_cpuid_x86.c  |  13 +-
>  tools/libxc/xc_dom_core.c   |   9 --
>  tools/libxc/xc_dom_x86.c|  49 +++---
>  tools/libxc/xc_domain.c |   1 -
>  tools/libxl/libxl.h |  22 +--
>  tools/libxl/libxl_console.c |   1 +
>  tools/libxl/libxl_create.c  |  64 +++-
>  tools/libxl/libxl_disk.c|  10 +-
>  tools/libxl/libxl_dm.c  |   2 +
>  tools/libxl/libxl_dom.c |  86 ++-
>  tools/libxl/libxl_dom_save.c|   7 +-
>  tools/libxl/libxl_dom_suspend.c |   4 +-
>  tools/libxl/libxl_domain.c  |  18 +--
>  tools/libxl/libxl_internal.h|   1 -
>  tools/libxl/libxl_mem.c |   1 +
>  tools/libxl/libxl_nic.c |   7 +-
>  tools/libxl/libxl_pci.c |   9 +-
>  tools/libxl/libxl_stream_read.c |   8 +-
>  tools/libxl/libxl_stream_write.c|  14 +-
>  tools/libxl/libxl_types.idl | 115 ---
>  tools/libxl/libxl_usb.c |   4 +-
>  tools/libxl/libxl_x86.c |  31 ++--
>  tools/libxl/libxl_x86_acpi.c|   3 +-
>  tools/xl/xl_parse.c |   8 +-
>  xen/arch/x86/cpu/vpmu.c |   3 +-
>  xen/arch/x86/domain.c   |  42 +-
>  xen/arch/x86/domain_build.c | 287 
> +---
>  xen/arch/x86/domctl.c   |   7 +-
>  xen/arch/x86/hvm/hvm.c  |  81 +-
>  xen/arch/x86/hvm/hypercall.c|   4 +-
>  xen/arch/x86/hvm/io.c   |   2 -
>  xen/arch/x86/hvm/ioreq.c|   3 +-
>  xen/arch/x86/hvm/irq.c  |   3 -
>  xen/arch/x86/hvm/vmx/vmcs.c |  35 +
>  xen/arch/x86/hvm/vmx/vmx.c  |  12 +-
>  xen/arch/x86/mm.c   |   2 +-
>  xen/arch/x86/mm/p2m-pt.c|   2 +-
>  xen/arch/x86/mm/p2m.c   |   6 +-
>  xen/arch/x86/physdev.c  |   8 -
>  xen/arch/x86/setup.c|   7 -
>  xen/arch/x86/time.c |  27 
>  xen/common/domain.c |   2 -
>  xen/common/domctl.c |  10 --
>  xen/common/kernel.c |   5 -
>  xen/common/vm_event.c   |   8 +-
>  xen/include/asm-x86/domain.h|   1 -
>  xen/include/asm-x86/hvm/hvm.h   |   3 -
>  xen/include/public/domctl.h |  14 +-
>  xen/include/xen/sched.h |   9 +-
>  55 files changed, 252 insertions(+), 911 deletions(-)
>  delete mode 100644 docs/misc/pvh-readme.txt
> 
> diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
> index 505c111..8e4eb97 100644
> --- a/docs/man/xl.cfg.pod.5.in
> +++ b/docs/man/xl.cfg.pod.5.in
> @@ -1064,6 +1064,13 @@ FIFO-based event channel ABI support up to 131,071 
> event channels.
>  Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
>  x86).
>  
> +=item B

Re: [Xen-devel] [PATCH v2 1/3] x86: remove PVHv1 code

2017-03-01 Thread Elena Ufimtseva
On Tue, Feb 28, 2017 at 05:39:39PM +, Roger Pau Monne wrote:
> This removal applies to both the hypervisor and the toolstack side of PVHv1.
> 
> Note that on the toolstack side there's one hiccup: on xl the "pvh"
> configuration option is translated to builder="hvm",
> device_model_version="none".  This is done because otherwise xl would start
> parsing PV like options, and filling the PV struct at libxl_domain_build_info
> (which in turn pollutes the HVM one because it's a union).
> 
> Signed-off-by: Roger Pau Monné <roger@citrix.com>

gdbsx bits:

Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com>

> ---
> Changes since v1:
>  - Remove dom0pvh option from the command line docs.
>  - Bump domctl interface version due to the removed CDF flag.
> 
> ---
> Cc: Ian Jackson <ian.jack...@eu.citrix.com>
> Cc: Wei Liu <wei.l...@citrix.com>
> Cc: Elena Ufimtseva <elena.ufimts...@oracle.com>
> Cc: Jan Beulich <jbeul...@suse.com>
> Cc: Andrew Cooper <andrew.coop...@citrix.com>
> Cc: Paul Durrant <paul.durr...@citrix.com>
> Cc: Jun Nakajima <jun.nakaj...@intel.com>
> Cc: Kevin Tian <kevin.t...@intel.com>
> Cc: George Dunlap <george.dun...@eu.citrix.com>
> Cc: Razvan Cojocaru <rcojoc...@bitdefender.com>
> Cc: Tamas K Lengyel <ta...@tklengyel.com>
> ---
>  docs/man/xl.cfg.pod.5.in|  10 +-
>  docs/misc/pvh-readme.txt|  63 
>  docs/misc/xen-command-line.markdown |   7 -
>  tools/debugger/gdbsx/xg/xg_main.c   |   4 +-
>  tools/libxc/include/xc_dom.h|   1 -
>  tools/libxc/include/xenctrl.h   |   2 +-
>  tools/libxc/xc_cpuid_x86.c  |  13 +-
>  tools/libxc/xc_dom_core.c   |   9 --
>  tools/libxc/xc_dom_x86.c|  49 +++---
>  tools/libxc/xc_domain.c |   1 -
>  tools/libxl/libxl_create.c  |  31 ++--
>  tools/libxl/libxl_dom.c |   1 -
>  tools/libxl/libxl_internal.h|   1 -
>  tools/libxl/libxl_x86.c |   7 +-
>  tools/xl/xl_parse.c |  10 +-
>  xen/arch/x86/cpu/vpmu.c |   3 +-
>  xen/arch/x86/domain.c   |  42 +-
>  xen/arch/x86/domain_build.c | 287 
> +---
>  xen/arch/x86/domctl.c   |   7 +-
>  xen/arch/x86/hvm/hvm.c  |  81 +-
>  xen/arch/x86/hvm/hypercall.c|   4 +-
>  xen/arch/x86/hvm/io.c   |   2 -
>  xen/arch/x86/hvm/ioreq.c|   3 +-
>  xen/arch/x86/hvm/irq.c  |   3 -
>  xen/arch/x86/hvm/vmx/vmcs.c |  35 +
>  xen/arch/x86/hvm/vmx/vmx.c  |  12 +-
>  xen/arch/x86/mm.c   |   2 +-
>  xen/arch/x86/mm/p2m-pt.c|   2 +-
>  xen/arch/x86/mm/p2m.c   |   6 +-
>  xen/arch/x86/physdev.c  |   8 -
>  xen/arch/x86/setup.c|   7 -
>  xen/arch/x86/time.c |  27 
>  xen/common/domain.c |   2 -
>  xen/common/domctl.c |  10 --
>  xen/common/kernel.c |   5 -
>  xen/common/vm_event.c   |   8 +-
>  xen/include/asm-x86/domain.h|   1 -
>  xen/include/asm-x86/hvm/hvm.h   |   3 -
>  xen/include/public/domctl.h |  14 +-
>  xen/include/xen/sched.h |   9 +-
>  40 files changed, 96 insertions(+), 696 deletions(-)
>  delete mode 100644 docs/misc/pvh-readme.txt
> 
> diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
> index 505c111..da1fdd7 100644
> --- a/docs/man/xl.cfg.pod.5.in
> +++ b/docs/man/xl.cfg.pod.5.in
> @@ -1064,6 +1064,12 @@ FIFO-based event channel ABI support up to 131,071 
> event channels.
>  Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
>  x86).
>  
> +=item B

Re: [Xen-devel] [PATCH v13 3/3] iommu: add rmrr Xen command line option for extra rmrrs

2017-01-19 Thread Elena Ufimtseva
On Thu, Jan 19, 2017 at 01:29:15AM -0700, Jan Beulich wrote:
> >>> On 18.01.17 at 20:56,  wrote:
> > I am looking at rmrr_identity_mapping where the RMRR paddr get converted
> > to pfn and then mapped with iommu.
> > If ( rmrr->end_address & ~PAGE_SHIFT_MASK_4K ) == 0, the while loop
> > while ( base_pfn < end_pfn )
> >  will not map that inclusive end_address of rmrr.
> > Does it seem wrong?
> 
> I don't think so, no. end_pfn is being calculated using
> PAGE_ALIGN_4K(), i.e. rounding up.

What I mean is that if the end address is already page-aligned, its page
won't be mapped.
For example, if the end paddr is 0x000ed000, end_pfn will be
0x000ed, and that page won't be mapped by the loop
while ( base_pfn < end_pfn ).
And the mapped RMRR end address saved in arch.mapped_rmrrs will still be
0x000ed000.
It looks like the end addresses of parsed ACPI RMRRs are extended to the end
of the page, though. I am not sure whether the code already does a boundary
alignment somewhere, similar to what you propose below.

> 
> >> > +rmrr->segment = seg;
> >> > +rmrr->base_address = pfn_to_paddr(user_rmrrs[i].base_pfn);
> >> > +rmrr->end_address = pfn_to_paddr(user_rmrrs[i].end_pfn + 1);
> >> 
> >> "And this seems wrong too, unless I'm mistaken with the inclusive-ness."
> >> 
> > This one is the avoidance of the while loop mapping in
> > rmrr_identity_mapping.
> 
> Well, that's the purpose you describe, but the comment was about
> the calculation itself, which I think is lacking a "- 1", but even better
> would be - for avoiding boundary issues -
> 
> rmrr->end_address = pfn_to_paddr(user_rmrrs[i].end_pfn) | ~PAGE_MASK;

Yes, this will eliminate the problem. It will need to be accounted
for in the overlap check as well.

> 
> or some such.
> 
> Jan
> 
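
The boundary case discussed in this message can be checked with a small
standalone calculation. The sketch below is an assumption-laden model, not
the Xen source: the macros are redefined locally with the usual 4K values,
and pages_mapped() and inclusive_end() are hypothetical helpers that mirror
the "while ( base_pfn < end_pfn )" loop and the
"pfn_to_paddr(end_pfn) | ~PAGE_MASK" suggestion quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the 4K page macros (assumed values). */
#define PAGE_SHIFT_4K 12
#define PAGE_SIZE_4K  (UINT64_C(1) << PAGE_SHIFT_4K)
#define PAGE_MASK_4K  (~(PAGE_SIZE_4K - 1))
#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)

/* Number of 4K pages a "while ( base_pfn < end_pfn )" loop would visit. */
uint64_t pages_mapped(uint64_t base_address, uint64_t end_address)
{
    uint64_t base_pfn, end_pfn;

    assert(base_address <= end_address);

    base_pfn = (base_address & PAGE_MASK_4K) >> PAGE_SHIFT_4K;
    end_pfn = PAGE_ALIGN_4K(end_address) >> PAGE_SHIFT_4K; /* rounds up */

    return end_pfn - base_pfn;
}

/* Inclusive end address: the last byte of the page with frame end_pfn. */
uint64_t inclusive_end(uint64_t end_pfn)
{
    return (end_pfn << PAGE_SHIFT_4K) | ~PAGE_MASK_4K;
}
```

With base 0x000eb000 and an already aligned end of 0x000ed000 the loop visits
two pages (0xeb and 0xec), so page 0xed is skipped. With the end address set
to inclusive_end(0xed), i.e. 0x000edfff, rounding up yields three pages and
the last page is covered.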


Re: [Xen-devel] [PATCH v13 3/3] iommu: add rmrr Xen command line option for extra rmrrs

2017-01-18 Thread Elena Ufimtseva
On Thu, Jan 12, 2017 at 04:44:42AM -0700, Jan Beulich wrote:
> >>> On 10.01.17 at 23:57,  wrote:
> > Changes in v13:
> >  - Implement feedback from Kevin Tian.
> >
> > https://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03169.html 
> >
> > https://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03170.html 
> >
> > https://lists.xenproject.org/archives/html/xen-devel/2015-10/msg03171.html 
> 
> Any reason some of the review comments I had given were left
> un-addressed? I'll reproduce them in quotes below.
>

Hi Jan

Thanks for the reminder!
It was my fault: I did not pass this on to Venu when handing
this patchset over to him.
 
> > --- a/xen/drivers/passthrough/vtd/dmar.c
> > +++ b/xen/drivers/passthrough/vtd/dmar.c
> > @@ -859,6 +859,132 @@ out:
> >  return ret;
> >  }
> >  
> > +#define MAX_EXTRA_RMRR_PAGES 16
> > +#define MAX_EXTRA_RMRR 10
> > +
> > +/* RMRR units derived from command line rmrr option. */
> > +#define MAX_EXTRA_RMRR_DEV 20
> 
> So you've kept "extra" in these, but ...
> 
> > +struct user_rmrr {
> 
> ... switched to "user" here and below. Please be consistent.
> 
> > +static int __init add_user_rmrr(void)
> > +{
> > +struct acpi_rmrr_unit *rmrr, *rmrru;
> > +unsigned int idx, seg, i;
> > +unsigned long base, end;
> > +bool overlap;
> > +
> > +for ( i = 0; i < nr_rmrr; i++ )
> > +{
> > +base = user_rmrrs[i].base_pfn;
> > +end = user_rmrrs[i].end_pfn;
> > +
> > +if ( base > end )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Invalid RMRR Range "ERMRRU_FMT"\n",
> > +   ERMRRU_ARG(user_rmrrs[i]));
> > +continue;
> > +}
> > +
> > +if ( (end - base) >= MAX_EXTRA_RMRR_PAGES )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "RMRR range "ERMRRU_FMT" exceeds "\
> > +   __stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
> > +   ERMRRU_ARG(user_rmrrs[i]));
> > +continue;
> > +}
> > +
> > +overlap = false;
> > +list_for_each_entry(rmrru, &acpi_rmrr_units, list)
> > +{
> > +if ( pfn_to_paddr(base) < rmrru->end_address &&
> > + rmrru->base_address < pfn_to_paddr(end + 1) )
> 
> "Aren't both ranges inclusive? I.e. shouldn't the first one be <= (and
>  the second one could be <= too when dropping the +1), matching
>  the check acpi_parse_one_rmrr() does?"

I agree. The ranges in acpi_rmrr_units and user_rmrrs are inclusive.
If this is fixed, there is another place where I am not sure what the
best fix would be, or whether a fix is needed at all.

I am looking at rmrr_identity_mapping, where the RMRR paddrs get converted
to pfns and then mapped with the IOMMU.
If ( rmrr->end_address & ~PAGE_SHIFT_MASK_4K ) == 0, the while loop
while ( base_pfn < end_pfn )
will not map the page at that inclusive end_address of the rmrr.
Does that seem wrong?
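
On the inclusive-range point above, an overlap test for two ranges whose
bounds are both inclusive could look like the sketch below. The struct and
function names are hypothetical and chosen only for illustration; this is
not the check in acpi_parse_one_rmrr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An address range where both start and end are inclusive. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Two inclusive ranges overlap when each one starts no later than the
 * other one ends. With strict "<" comparisons, ranges that share exactly
 * one address (a.end == b.start) would be reported as disjoint.
 */
bool ranges_overlap(struct range a, struct range b)
{
    assert(a.start <= a.end && b.start <= b.end);

    return a.start <= b.end && b.start <= a.end;
}
```

For example, [0x1000, 0x1fff] and [0x1fff, 0x2fff] share the address 0x1fff
and are reported as overlapping, while [0x1000, 0x1fff] and [0x2000, 0x2fff]
are adjacent and are not.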


> 
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Overlapping RMRRs: "ERMRRU_FMT" and [%lx-%lx]\n",
> > +   ERMRRU_ARG(user_rmrrs[i]),
> > +   paddr_to_pfn(rmrru->base_address),
> > +   paddr_to_pfn(rmrru->end_address));
> > +overlap = true;
> > +break;
> > +}
> > +}
> > +/* Don't add overlapping RMRR. */
> > +if ( overlap )
> > +continue;
> > +
> > +do
> > +{
> > +if ( !mfn_valid(base) )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Invalid pfn in RMRR range "ERMRRU_FMT"\n",
> > +   ERMRRU_ARG(user_rmrrs[i]));
> > +break;
> > +}
> > +} while ( base++ < end );
> > +
> > +/* Invalid pfn in range as the loop ended before end_pfn was 
> > reached. */
> > +if ( base <= end )
> > +continue;
> > +
> > +rmrr = xzalloc(struct acpi_rmrr_unit);
> > +if ( !rmrr )
> > +return -ENOMEM;
> > +
> > +rmrr->scope.devices = xmalloc_array(u16, user_rmrrs[i].dev_count);
> > +if ( !rmrr->scope.devices )
> > +{
> > +xfree(rmrr);
> > +return -ENOMEM;
> > +}
> > +
> > +seg = 0;
> > +for ( idx = 0; idx < user_rmrrs[i].dev_count; idx++ )
> > +{
> > +rmrr->scope.devices[idx] = user_rmrrs[i].sbdf[idx];
> > +seg |= PCI_SEG(user_rmrrs[i].sbdf[idx]);
> > +}
> > +if ( seg != PCI_SEG(user_rmrrs[i].sbdf[0]) )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Segments are not equal for RMRR range "ERMRRU_FMT"\n",
> > +   ERMRRU_ARG(user_rmrrs[i]));
> > +scope_devices_free(&rmrr->scope);
> > +xfree(rmrr);
> > +continue;
> > +

Re: [Xen-devel] [PATCH] xen/x86: Fix CONFIG_CRASH_DEBUG build following c/s 897129dea

2017-01-06 Thread Elena Ufimtseva
On Fri, Jan 06, 2017 at 02:34:17PM +, Andrew Cooper wrote:
> Found by a Travis RANDCONFIG run.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>

Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Elena Ufimtseva <elena.ufimts...@oracle.com>
> ---
>  xen/arch/x86/gdbstub.c| 8 
>  xen/arch/x86/x86_64/gdbstub.c | 2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/gdbstub.c b/xen/arch/x86/gdbstub.c
> index 2a39189..fe69f81 100644
> --- a/xen/arch/x86/gdbstub.c
> +++ b/xen/arch/x86/gdbstub.c
> @@ -66,16 +66,16 @@ gdb_arch_resume(struct cpu_user_regs *regs,
>  struct gdb_context *ctx)
>  {
>  if ( addr != -1UL )
> -regs->eip = addr;
> +regs->rip = addr;
>  
> -regs->eflags &= ~X86_EFLAGS_TF;
> +regs->_eflags &= ~X86_EFLAGS_TF;
>  
>  /* Set eflags.RF to ensure we do not re-enter. */
> -regs->eflags |= X86_EFLAGS_RF;
> +regs->_eflags |= X86_EFLAGS_RF;
>  
>  /* Set the trap flag if we are single stepping. */
>  if ( type == GDB_STEP )
> -regs->eflags |= X86_EFLAGS_TF;
> +regs->_eflags |= X86_EFLAGS_TF;
>  }
>  
>  /*
> diff --git a/xen/arch/x86/x86_64/gdbstub.c b/xen/arch/x86/x86_64/gdbstub.c
> index 2626519..2c2ab15 100644
> --- a/xen/arch/x86/x86_64/gdbstub.c
> +++ b/xen/arch/x86/x86_64/gdbstub.c
> @@ -44,7 +44,7 @@ gdb_arch_read_reg_array(struct cpu_user_regs *regs, struct 
> gdb_context *ctx)
>  GDB_REG64(regs->r15);
>  
>  GDB_REG64(regs->rip);
> -GDB_REG32(regs->eflags);
> +GDB_REG32(regs->_eflags);
>  
>  GDB_REG32(regs->cs);
>  GDB_REG32(regs->ss);
> -- 
> 2.1.4
> 


Re: [Xen-devel] [PATCH v6 03/14] xen: Use a typesafe to define INVALID_MFN

2016-07-08 Thread Elena Ufimtseva
On Fri, Jul 08, 2016 at 08:20:03PM +0100, Andrew Cooper wrote:
> On 08/07/2016 23:01, Elena Ufimtseva wrote:
> >
> >>> @@ -838,7 +838,6 @@ mfn_t oos_snapshot_lookup(struct domain *d, mfn_t 
> >>> gmfn)
> >>>
> >>>  SHADOW_ERROR("gmfn %lx was OOS but not in hash table\n", 
> >>> mfn_x(gmfn));
> >>>  BUG();
> >>> -return _mfn(INVALID_MFN);
> > Can compiler be unhappy about this?
> 
> This was my suggestion, from a previous round of review.

Ah! Thanks for the explanation.
> 
> A while ago, I annotated BUG() with unreachable(), as execution will
> not continue from a bugframe, but the shadow code is definitely older
> than my change.
> 
> As such, compilers will have been dropping this return statement as part
> of dead-code-elimination anyway.
> 
> This option is better than just replacing one bit of dead code with a
> different bit of dead code.
> 
> ~Andrew
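
The effect Andrew describes can be reproduced outside Xen with a small
sketch. Everything here is an assumption made for illustration: FATAL() is a
hypothetical stand-in for BUG(), abort() replaces the bug frame, and
__builtin_unreachable() is the GCC/Clang builtin playing the role of
unreachable(). In hosted C abort() is already known not to return, so the
builtin is redundant in this sketch and only mirrors the annotation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for BUG(): never returns, and tells the compiler so. */
#define FATAL() do { abort(); __builtin_unreachable(); } while (0)

/* Return the index of key in table; a missing key is treated as a bug. */
int lookup(const int *table, size_t n, int key)
{
    assert(table != NULL);

    for (size_t i = 0; i < n; i++)
        if (table[i] == key)
            return (int)i;

    FATAL();
    /* No dummy return is needed: control cannot reach this point. */
}
```

Because the compiler knows control never continues past FATAL(), a trailing
"return <invalid value>;" would be dead code, which is why dropping it is
safe.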


Re: [Xen-devel] [PATCH v6 04/14] xen: Use a typesafe to define INVALID_GFN

2016-07-08 Thread Elena Ufimtseva
FN) )
> >+return gfn_x(INVALID_GFN);
> >
> >  /* translate l2 guest gfn into l1 guest gfn */
> >  rv = nestedhap_walk_L1_p2m(v, l2_gfn, _gfn, _page_order, 
> > _p2ma,
> >@@ -2123,7 +2123,7 @@ unsigned long paging_gva_to_gfn(struct vcpu *v,
> > !!(*pfec & PFEC_insn_fetch));
> >
> >  if ( rv != NESTEDHVM_PAGEFAULT_DONE )
> >-return INVALID_GFN;
> >+return gfn_x(INVALID_GFN);
> >
> >  /*
> >   * Sanity check that l1_gfn can be used properly as a 4K mapping, 
> > even
> >@@ -2415,7 +2415,7 @@ static void p2m_init_altp2m_helper(struct domain *d, 
> >unsigned int i)
> >  struct p2m_domain *p2m = d->arch.altp2m_p2m[i];
> >  struct ept_data *ept;
> >
> >-p2m->min_remapped_gfn = INVALID_GFN;
> >+p2m->min_remapped_gfn = gfn_x(INVALID_GFN);
> >  p2m->max_remapped_gfn = 0;
> >  ept = >ept;
> >  ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
> >@@ -2551,7 +2551,7 @@ int p2m_change_altp2m_gfn(struct domain *d, unsigned 
> >int idx,
> >
> >  mfn = ap2m->get_entry(ap2m, gfn_x(old_gfn), , , 0, NULL, NULL);
> >
> >-if ( gfn_x(new_gfn) == INVALID_GFN )
> >+if ( gfn_eq(new_gfn, INVALID_GFN) )
> >  {
> >  if ( mfn_valid(mfn) )
> >  p2m_remove_page(ap2m, gfn_x(old_gfn), mfn_x(mfn), 
> > PAGE_ORDER_4K);
> >@@ -2613,7 +2613,7 @@ static void p2m_reset_altp2m(struct p2m_domain *p2m)
> >  /* Uninit and reinit ept to force TLB shootdown */
> >  ept_p2m_uninit(p2m);
> >  ept_p2m_init(p2m);
> >-p2m->min_remapped_gfn = INVALID_GFN;
> >+p2m->min_remapped_gfn = gfn_x(INVALID_GFN);
> >  p2m->max_remapped_gfn = 0;
> >  }
> >
> >diff --git a/xen/arch/x86/mm/shadow/common.c 
> >b/xen/arch/x86/mm/shadow/common.c
> >index 1c0b6cd..61ccddf 100644
> >--- a/xen/arch/x86/mm/shadow/common.c
> >+++ b/xen/arch/x86/mm/shadow/common.c
> >@@ -1707,7 +1707,7 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v, 
> >unsigned long vaddr,
> >
> >  /* Translate the VA to a GFN. */
> >  gfn = paging_get_hostmode(v)->gva_to_gfn(v, NULL, vaddr, );
> >-if ( gfn == INVALID_GFN )
> >+if ( gfn == gfn_x(INVALID_GFN) )
> >  {
> >  if ( is_hvm_vcpu(v) )
> >  hvm_inject_page_fault(pfec, vaddr);
> >diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> >index f892e2f..e54c8b7 100644
> >--- a/xen/arch/x86/mm/shadow/multi.c
> >+++ b/xen/arch/x86/mm/shadow/multi.c
> >@@ -3660,7 +3660,7 @@ sh_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
> >   */
> >  if ( is_hvm_vcpu(v) && !hvm_nx_enabled(v) && !hvm_smep_enabled(v) )
> >  pfec[0] &= ~PFEC_insn_fetch;
> >-return INVALID_GFN;
> >+return gfn_x(INVALID_GFN);
> >  }
> >  gfn = guest_walk_to_gfn();
> >
> >diff --git a/xen/arch/x86/mm/shadow/private.h 
> >b/xen/arch/x86/mm/shadow/private.h
> >index c424ad6..824796f 100644
> >--- a/xen/arch/x86/mm/shadow/private.h
> >+++ b/xen/arch/x86/mm/shadow/private.h
> >@@ -796,7 +796,7 @@ static inline unsigned long vtlb_lookup(struct vcpu *v,
> >  unsigned long va, uint32_t pfec)
> >  {
> >  unsigned long page_number = va >> PAGE_SHIFT;
> >-unsigned long frame_number = INVALID_GFN;
> >+unsigned long frame_number = gfn_x(INVALID_GFN);
> >  int i = vtlb_hash(page_number);
> >
> >  spin_lock(>arch.paging.vtlb_lock);
> >diff --git a/xen/drivers/passthrough/amd/iommu_map.c 
> >b/xen/drivers/passthrough/amd/iommu_map.c
> >index c758459..b8c0a48 100644
> >--- a/xen/drivers/passthrough/amd/iommu_map.c
> >+++ b/xen/drivers/passthrough/amd/iommu_map.c
> >@@ -555,7 +555,7 @@ static int update_paging_mode(struct domain *d, unsigned 
> >long gfn)
> >  unsigned long old_root_mfn;
> >  struct domain_iommu *hd = dom_iommu(d);
> >
> >-if ( gfn == INVALID_GFN )
> >+if ( gfn == gfn_x(INVALID_GFN) )
> >  return -EADDRNOTAVAIL;
> >  ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
> >
> >diff --git a/xen/drivers/passthrough/vtd/iommu.c 
> >b/xen/drivers/passthrough/vtd/iommu.c
> >index f010612..c322b9f 100644
> >--- a/xen/drivers/passthrough/vtd/iommu.c
> >+++ b/xen/drivers/passthrough/vtd/iommu.c
> >@@ -611,7 +611,7 @@ static int __must_check iommu_flush_iotlb(struct domain 
> >*d,
> >  if ( iommu_domid == -1 )
> >  continue;
> >
> >-if ( page_count != 1 || gfn == INVALID_GFN )
> >+if ( page_count != 1 || gfn == gfn_x(INVALID_GFN) )
> >  rc = iommu_flush_iotlb_dsi(iommu, iommu_domid,
> > 0, flush_dev_iotlb);
> >  else
> >@@ -640,7 +640,7 @@ static int __must_check iommu_flush_iotlb_pages(struct 
> >domain *d,
> >
> >  static int __must_check iommu_flush_iotlb_all(struct domain *d)
> >  {
> >-return iommu_flush_iotlb(d, INVALID_GFN, 0, 0);
> >+return iommu_flush_iotlb(d, gfn_x(INVALID_GFN), 0, 0);
> >  }
> >
> >  /* clear one page's page table */
> >diff --git a/xen/drivers/passthrough/x86/iommu.c 
> >b/xen/drivers/passthrough/x86/iommu.c
> >index cd435d7..69cd6c5 100644
> >--- a/xen/drivers/passthrough/x86/iommu.c
> >+++ b/xen/drivers/passthrough/x86/iommu.c
> >@@ -61,7 +61,7 @@ int arch_iommu_populate_page_table(struct domain *d)
> >  unsigned long mfn = page_to_mfn(page);
> >  unsigned long gfn = mfn_to_gmfn(d, mfn);
> >
> >-if ( gfn != INVALID_GFN )
> >+if ( gfn != gfn_x(INVALID_GFN) )
> >  {
> >  ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
> >  BUG_ON(SHARED_M2P(gfn));
> >diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
> >index a8d980c..79ed4ff 100644
> >--- a/xen/include/asm-x86/guest_pt.h
> >+++ b/xen/include/asm-x86/guest_pt.h
> >@@ -32,7 +32,7 @@
> >  #error GUEST_PAGING_LEVELS not defined
> >  #endif
> >
> >-#define VALID_GFN(m) (m != INVALID_GFN)
> >+#define VALID_GFN(m) (m != gfn_x(INVALID_GFN))
> >
> >  static inline int
> >  valid_gfn(gfn_t m)
> >@@ -251,7 +251,7 @@ static inline gfn_t
> >  guest_walk_to_gfn(walk_t *gw)
> >  {
> >  if ( !(guest_l1e_get_flags(gw->l1e) & _PAGE_PRESENT) )
> >-return _gfn(INVALID_GFN);
> >+return INVALID_GFN;
> >  return guest_l1e_get_gfn(gw->l1e);
> >  }
> >
> >diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> >index 4ab3574..194020e 100644
> >--- a/xen/include/asm-x86/p2m.h
> >+++ b/xen/include/asm-x86/p2m.h
> >@@ -324,7 +324,7 @@ struct p2m_domain {
> >  #define NR_POD_MRP_ENTRIES 32
> >
> >  /* Encode ORDER_2M superpage in top bit of GFN */
> >-#define POD_LAST_SUPERPAGE (INVALID_GFN & ~(INVALID_GFN >> 1))
> >+#define POD_LAST_SUPERPAGE (gfn_x(INVALID_GFN) & ~(gfn_x(INVALID_GFN) >> 1))
> >
> >  unsigned long list[NR_POD_MRP_ENTRIES];
> >  unsigned int idx;
> >diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
> >index 7f207ec..58bc0b8 100644
> >--- a/xen/include/xen/mm.h
> >+++ b/xen/include/xen/mm.h
> >@@ -84,7 +84,7 @@ static inline bool_t mfn_eq(mfn_t x, mfn_t y)
> >
> >  TYPE_SAFE(unsigned long, gfn);
> >  #define PRI_gfn  "05lx"
> >-#define INVALID_GFN  (~0UL)
> >+#define INVALID_GFN  _gfn(~0UL)
> >
> >  #ifndef gfn_t
> >  #define gfn_t /* Grep fodder: gfn_t, _gfn() and gfn_x() are defined above 
> > */
> >
> 
> -- 
> Julien Grall

Acked-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
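
For context, the pattern this patch applies can be sketched in isolation.
The code below is a simplified assumption of how such a typesafe wrapper may
be built and is not Xen's actual TYPE_SAFE definition; lookup_raw() is a
hypothetical helper standing in for the interfaces that still return a raw
unsigned long and therefore need gfn_x(INVALID_GFN).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified wrapper generator: a one-member struct plus box/unbox helpers. */
#define TYPE_SAFE(type, name)                                           \
    typedef struct { type name; } name##_t;                            \
    static inline name##_t _##name(type n) { return (name##_t){ n }; } \
    static inline type name##_x(name##_t n) { return n.name; }

TYPE_SAFE(unsigned long, gfn)

/* The sentinel is now a gfn_t, not a bare integer. */
#define INVALID_GFN _gfn(~0UL)

/* Structs cannot be compared with ==, so equality needs a helper. */
static inline bool gfn_eq(gfn_t x, gfn_t y)
{
    return gfn_x(x) == gfn_x(y);
}

/* Legacy-style interface that still trades in raw unsigned long values. */
unsigned long lookup_raw(bool present, unsigned long frame)
{
    /* A present frame must not collide with the sentinel value. */
    assert(!present || frame != ~0UL);

    return present ? frame : gfn_x(INVALID_GFN);
}
```

Once INVALID_GFN has a struct type, a stray comparison such as
"gfn == INVALID_GFN" with a raw integer no longer compiles, which is what
forces the gfn_x() and gfn_eq() conversions seen throughout the diff.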


Re: [Xen-devel] [PATCH v6 03/14] xen: Use a typesafe to define INVALID_MFN

2016-07-08 Thread Elena Ufimtseva
On Wed, Jul 06, 2016 at 02:04:17PM +0100, Julien Grall wrote:
> (CC Elena).
> 
> On 06/07/16 14:01, Julien Grall wrote:
> >Also take the opportunity to convert arch/x86/debug.c to the typesafe
> >mfn and use proper printf format for MFN/GFN when the code around is
> >modified.
> >
> >Signed-off-by: Julien Grall 
> >Reviewed-by: Andrew Cooper 
> >Acked-by: Stefano Stabellini 
> >
> >---
> >Cc: Christoph Egger 
> >Cc: Liu Jinsong 
> >Cc: Jan Beulich 
> >Cc: Mukesh Rathor 
> 
> I forgot to update the CC list since GDBSX maintainership was taken over by
> Elena. Sorry for that.

No problem!
> 
> >Cc: Paul Durrant 
> >Cc: Jun Nakajima 
> >Cc: Kevin Tian 
> >Cc: George Dunlap 
> >Cc: Tim Deegan 
> >
> > Changes in v6:
> > - Add Stefano's acked-by for ARM bits
> > - Use PRI_mfn and PRI_gfn
> > - Remove set of brackets when it is not necessary
> > - Use mfn_add when possible
> > - Add Andrew's reviewed-by
> >
> > Changes in v5:
> > - Patch added
> >---
> >  xen/arch/arm/p2m.c  |  4 +--
> >  xen/arch/x86/cpu/mcheck/mce.c   |  2 +-
> >  xen/arch/x86/debug.c| 58 
> > +
> >  xen/arch/x86/hvm/hvm.c  |  6 ++---
> >  xen/arch/x86/hvm/viridian.c | 12 -
> >  xen/arch/x86/hvm/vmx/vmx.c  |  2 +-
> >  xen/arch/x86/mm/guest_walk.c|  4 +--
> >  xen/arch/x86/mm/hap/hap.c   |  4 +--
> >  xen/arch/x86/mm/p2m-ept.c   |  6 ++---
> >  xen/arch/x86/mm/p2m-pod.c   | 18 ++---
> >  xen/arch/x86/mm/p2m-pt.c| 18 ++---
> >  xen/arch/x86/mm/p2m.c   | 54 +++---
> >  xen/arch/x86/mm/paging.c| 12 -
> >  xen/arch/x86/mm/shadow/common.c | 43 +++---
> >  xen/arch/x86/mm/shadow/multi.c  | 36 -
> >  xen/common/domain.c |  6 ++---
> >  xen/common/grant_table.c|  6 ++---
> >  xen/include/xen/mm.h|  2 +-
> >  18 files changed, 147 insertions(+), 146 deletions(-)
> >
> >diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> >index 34563bb..d690602 100644
> >--- a/xen/arch/arm/p2m.c
> >+++ b/xen/arch/arm/p2m.c
> >@@ -1461,7 +1461,7 @@ int relinquish_p2m_mapping(struct domain *d)
> >  return apply_p2m_changes(d, RELINQUISH,
> >pfn_to_paddr(p2m->lowest_mapped_gfn),
> >pfn_to_paddr(p2m->max_mapped_gfn),
> >-  pfn_to_paddr(INVALID_MFN),
> >+  pfn_to_paddr(mfn_x(INVALID_MFN)),
> >MATTR_MEM, 0, p2m_invalid,
> >d->arch.p2m.default_access);
> >  }
> >@@ -1476,7 +1476,7 @@ int p2m_cache_flush(struct domain *d, xen_pfn_t 
> >start_mfn, xen_pfn_t end_mfn)
> >  return apply_p2m_changes(d, CACHEFLUSH,
> >   pfn_to_paddr(start_mfn),
> >   pfn_to_paddr(end_mfn),
> >- pfn_to_paddr(INVALID_MFN),
> >+ pfn_to_paddr(mfn_x(INVALID_MFN)),
> >   MATTR_MEM, 0, p2m_invalid,
> >   d->arch.p2m.default_access);
> >  }
> >diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
> >index edcbe48..2695b0c 100644
> >--- a/xen/arch/x86/cpu/mcheck/mce.c
> >+++ b/xen/arch/x86/cpu/mcheck/mce.c
> >@@ -1455,7 +1455,7 @@ long do_mca(XEN_GUEST_HANDLE_PARAM(xen_mc_t) u_xen_mc)
> >  gfn = PFN_DOWN(gaddr);
> >  mfn = mfn_x(get_gfn(d, gfn, ));
> >
> >-if ( mfn == INVALID_MFN )
> >+if ( mfn == mfn_x(INVALID_MFN) )
> >  {
> >  put_gfn(d, gfn);
> >  put_domain(d);
> >diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c
> >index 58cae22..9213ea7 100644
> >--- a/xen/arch/x86/debug.c
> >+++ b/xen/arch/x86/debug.c
> >@@ -43,11 +43,11 @@ typedef unsigned long dbgva_t;
> >  typedef unsigned char dbgbyte_t;
> >
> >  /* Returns: mfn for the given (hvm guest) vaddr */
> >-static unsigned long
> >+static mfn_t
> >  dbg_hvm_va2mfn(dbgva_t vaddr, struct domain *dp, int toaddr,
> >  unsigned long *gfn)
> >  {
> >-unsigned long mfn;
> >+mfn_t mfn;
> >  uint32_t pfec = PFEC_page_present;
> >  p2m_type_t gfntype;
> >
> >@@ -60,16 +60,17 @@ dbg_hvm_va2mfn(dbgva_t vaddr, struct domain *dp, int 
> >toaddr,
> >  return INVALID_MFN;
> >  }
> >
> > >-mfn = mfn_x(get_gfn(dp, *gfn, &gfntype));
> > >+mfn = get_gfn(dp, *gfn, &gfntype);
> >  if ( p2m_is_readonly(gfntype) && toaddr )
> >  {
> >  

[Xen-devel] [PATCH RESEND] MAINTAINERS/gdbsx: change maintainer

2016-06-28 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Change gdbsx maintainer to myself.


Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a8e0043..e91140f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -206,7 +206,7 @@ F:  xen/common/event_fifo.c
 F: xen/include/xen/event_fifo.h
 
 GDBSX DEBUGGER
-M: Mukesh Rathor <mukesh.rat...@oracle.com>
+M:     Elena Ufimtseva <elena.ufimts...@oracle.com>
 S: Supported
 F: xen/arch/x86/debug.c
 F: tools/debugger/gdbsx/
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] GDBSX Maintainer

2016-06-28 Thread Elena Ufimtseva
Hi Julien, Andrew

I was talking to Konrad some time ago about looking into this and the
possibility of maintaining the gdbsx code. I am willing to sign up for this
if there are no objections.

Elena

On Tue, Jun 28, 2016 at 9:46 AM, Andrew Cooper wrote:
> On 28/06/16 17:31, Julien Grall wrote:
>> Hi,
>>
>> I had to modify some code in arch/x86/debug.c and noticed that Mukesh
>> is still the maintainer. IIRC he left Oracle quite a while ago, so my
>> e-mail was bounced by the server.
>>
>> Do we have a new e-mail address for him? If not, does anyone plan to
>> maintain this code? Shall we mark the code as "Orphan"?
>
> If no one explicitly wishes to maintain it, then it should be subsumed
> into general x86.  It's not like it's a large or complicated area of code.
>
> ~Andrew



-- 
Elena



Re: [Xen-devel] pcie error containment: kill domain and dm without xend

2016-06-24 Thread Elena Ufimtseva
Thanks George!


On Fri, Jun 24, 2016 at 4:37 AM, George Dunlap <george.dun...@citrix.com> wrote:
> On Wed, Jun 22, 2016 at 9:16 PM, Elena Ufimtseva <ufimts...@gmail.com> wrote:
>> Hello
>>
>> I am working on PCIe errors containment and XSA-124 relevant problem.
>> This is only small part of the problem and I can provide more details later
>> if that is of someone's interest.
>> As the temporary solution, guest domain with passthrough device
>> without SRIOV  gets killed when certain AER errors are triggered by
>> dom0 AER code.
>> In versions of xen with xend present, xenwatch can be used and pciback can
>> write some fields to xenstore (as "aerfail" which is already present)
>> and destroy device model and then domain itself.
>> What would be the best way to initiate similar behaviour when xend is
>> not used? Or maybe what is the best way to initiate device model
>> destruction and domain itself without xend?
>
> xl forks a background process per VM to monitor VMs and destroy device
> models at the appropriate times -- see
> tools/xl_cmdimpl.c:create_domain() (and in particular search for
> "need_daemon").  This is the place to implement VM-watching features
> such as xend had.
>
>  -George



-- 
Elena



[Xen-devel] pcie error containment: kill domain and dm without xend

2016-06-22 Thread Elena Ufimtseva
Hello

I am working on PCIe error containment and a problem related to XSA-124.
This is only a small part of the problem, and I can provide more details
later if anyone is interested.
As a temporary solution, a guest domain with a passthrough device
without SR-IOV gets killed when certain AER errors are triggered by the
dom0 AER code.
In versions of Xen where xend is present, xenwatch can be used: pciback can
write some fields to xenstore (such as "aerfail", which is already present)
and destroy the device model and then the domain itself.
What would be the best way to initiate similar behaviour when xend is
not used? Or, put differently, what is the best way to initiate the
destruction of the device model and the domain itself without xend?

Thanks!
Elena

-- 
Elena



Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?

2016-03-01 Thread Elena Ufimtseva
On Tue, Mar 01, 2016 at 10:51:30PM +0100, Sander Eikelenboom wrote:
> 
> Tuesday, March 1, 2016, 9:39:25 PM, you wrote:
> 
> > On Tue, Mar 01, 2016 at 02:52:14PM -0500, Meng Xu wrote:
> >> Hi Elena,
> >> 
> >> Thank you very much for sharing this! :-)
> >> 
> >> On Tue, Mar 1, 2016 at 1:20 PM, Elena Ufimtseva
> >> <elena.ufimts...@oracle.com> wrote:
> >> >
> >> > On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote:
> >> > > On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk
> >> > > <konrad.w...@oracle.com> wrote:
> >> > > >> > Hey!
> >> > > >> >
> >> > > >> > CC-ing Elena.
> >> > > >>
> >> > > >> I think you forgot you cc.ed her..
> >> > > >> Anyway, let's cc. her now... :-)
> >> > > >>
> >> > > >> >
> >> > > >> >> We are measuring the execution time between native machine 
> >> > > >> >> environment
> >> > > >> >> and xen virtualization environment using PARSEC Benchmark [1].
> >> > > >> >>
> >> > > >> >> In virtualiztion environment, we run a domU with three VCPUs, 
> >> > > >> >> each of
> >> > > >> >> them pinned to a core; we pin the dom0 to another core that is 
> >> > > >> >> not
> >> > > >> >> used by the domU.
> >> > > >> >>
> >> > > >> >> Inside the Linux in domU in virtualization environment and in 
> >> > > >> >> native
> >> > > >> >> environment,  We used the cpuset to isolate a core (or VCPU) for 
> >> > > >> >> the
> >> > > >> >> system processors and to isolate a core for the benchmark 
> >> > > >> >> processes.
> >> > > >> >> We also configured the Linux boot command line with isocpus= 
> >> > > >> >> option to
> >> > > >> >> isolate the core for benchmark from other unnecessary processes.
> >> > > >> >
> >> > > >> > You may want to just offline them and also boot the machine with 
> >> > > >> > NUMA
> >> > > >> > disabled.
> >> > > >>
> >> > > >> Right, the machine is booted up with NUMA disabled.
> >> > > >> We will offline the unnecessary cores then.
> >> > > >>
> >> > > >> >
> >> > > >> >>
> >> > > >> >> We expect that execution time of benchmarks in xen virtualization
> >> > > >> >> environment is larger than the execution time in native machine
> >> > > >> >> environment. However, the evaluation gave us an opposite result.
> >> > > >> >>
> >> > > >> >> Below is the evaluation data for the canneal and streamcluster 
> >> > > >> >> benchmarks:
> >> > > >> >>
> >> > > >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial
> >> > > >> >> Native: 6.387s
> >> > > >> >> Virtualization: 5.890s
> >> > > >> >>
> >> > > >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial
> >> > > >> >> Native: 5.276s
> >> > > >> >> Virtualization: 5.240s
> >> > > >> >>
> >> > > >> >> Is there anything wrong with our evaluation that lead to the 
> >> > > >> >> abnormal
> >> > > >> >> performance results?
> >> > > >> >
> >> > > >> > Nothing is wrong. Virtualization is naturally faster than 
> >> > > >> > baremetal!
> >> > > >> >
> >> > > >> > :-)
> >> > > >> >
> >> > > >> > No clue sadly.
> >> > > >>
> >> > > >> Ah-ha. This is really surprising to me Why will it speed up the
> >> > > >> system by adding one more layer? Unless the virtualization disabled
> >> > > >> some services that occur in native and interfe

Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?

2016-03-01 Thread Elena Ufimtseva
On Tue, Mar 01, 2016 at 02:52:14PM -0500, Meng Xu wrote:
> Hi Elena,
> 
> Thank you very much for sharing this! :-)
> 
> On Tue, Mar 1, 2016 at 1:20 PM, Elena Ufimtseva
> <elena.ufimts...@oracle.com> wrote:
> >
> > On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote:
> > > On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk
> > > <konrad.w...@oracle.com> wrote:
> > > >> > Hey!
> > > >> >
> > > >> > CC-ing Elena.
> > > >>
> > > >> I think you forgot you cc.ed her..
> > > >> Anyway, let's cc. her now... :-)
> > > >>
> > > >> >
> > > >> >> We are measuring the execution time between native machine 
> > > >> >> environment
> > > >> >> and xen virtualization environment using PARSEC Benchmark [1].
> > > >> >>
> > > >> >> In virtualiztion environment, we run a domU with three VCPUs, each 
> > > >> >> of
> > > >> >> them pinned to a core; we pin the dom0 to another core that is not
> > > >> >> used by the domU.
> > > >> >>
> > > >> >> Inside the Linux in domU in virtualization environment and in native
> > > >> >> environment,  We used the cpuset to isolate a core (or VCPU) for the
> > > >> >> system processors and to isolate a core for the benchmark processes.
> > > >> >> We also configured the Linux boot command line with isocpus= option 
> > > >> >> to
> > > >> >> isolate the core for benchmark from other unnecessary processes.
> > > >> >
> > > >> > You may want to just offline them and also boot the machine with NUMA
> > > >> > disabled.
> > > >>
> > > >> Right, the machine is booted up with NUMA disabled.
> > > >> We will offline the unnecessary cores then.
> > > >>
> > > >> >
> > > >> >>
> > > >> >> We expect that execution time of benchmarks in xen virtualization
> > > >> >> environment is larger than the execution time in native machine
> > > >> >> environment. However, the evaluation gave us an opposite result.
> > > >> >>
> > > >> >> Below is the evaluation data for the canneal and streamcluster 
> > > >> >> benchmarks:
> > > >> >>
> > > >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial
> > > >> >> Native: 6.387s
> > > >> >> Virtualization: 5.890s
> > > >> >>
> > > >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial
> > > >> >> Native: 5.276s
> > > >> >> Virtualization: 5.240s
> > > >> >>
> > > >> >> Is there anything wrong with our evaluation that lead to the 
> > > >> >> abnormal
> > > >> >> performance results?
> > > >> >
> > > >> > Nothing is wrong. Virtualization is naturally faster than baremetal!
> > > >> >
> > > >> > :-)
> > > >> >
> > > >> > No clue sadly.
> > > >>
> > > >> Ah-ha. This is really surprising to me Why will it speed up the
> > > >> system by adding one more layer? Unless the virtualization disabled
> > > >> some services that occur in native and interfere with the benchmark.
> > > >>
> > > >> If virtualization is faster than baremetal by nature, why we can see
> > > >> that some experiment shows that virtualization introduces overhead?
> > > >
> > > > Elena told me that there were some weird regression in Linux 4.1 - where
> > > > CPU burning workloads were _slower_ on baremetal than as guests.
> > >
> > > Hi Elena,
> > > Would you mind sharing with us some of your experience of how you
> > > found the real reason? Did you use some tool or some methodology to
> > > pin down the reason (i.e,  CPU burning workloads in native is _slower_
> > > on baremetal than as guests)?
> > >
> >
> > Hi Meng
> >
> > Yes, sure!
> >
> > While working on performance tests for smt-exposing patches from Joao
> > I run CPU bound workload in HVM guest and using same kernel in baremetal
> > run same 

Re: [Xen-devel] [Question] PARSEC benchmark has smaller execution time in VM than in native?

2016-03-01 Thread Elena Ufimtseva
On Tue, Mar 01, 2016 at 08:48:30AM -0500, Meng Xu wrote:
> On Mon, Feb 29, 2016 at 12:59 PM, Konrad Rzeszutek Wilk wrote:
> >> > Hey!
> >> >
> >> > CC-ing Elena.
> >>
> >> I think you forgot you cc.ed her..
> >> Anyway, let's cc. her now... :-)
> >>
> >> >
> >> >> We are measuring the execution time between native machine environment
> >> >> and xen virtualization environment using PARSEC Benchmark [1].
> >> >>
> >> >> In virtualiztion environment, we run a domU with three VCPUs, each of
> >> >> them pinned to a core; we pin the dom0 to another core that is not
> >> >> used by the domU.
> >> >>
> >> >> Inside the Linux in domU in virtualization environment and in native
> >> >> environment,  We used the cpuset to isolate a core (or VCPU) for the
> >> >> system processors and to isolate a core for the benchmark processes.
> >> >> We also configured the Linux boot command line with isocpus= option to
> >> >> isolate the core for benchmark from other unnecessary processes.
> >> >
> >> > You may want to just offline them and also boot the machine with NUMA
> >> > disabled.
> >>
> >> Right, the machine is booted up with NUMA disabled.
> >> We will offline the unnecessary cores then.
> >>
> >> >
> >> >>
> >> >> We expect that execution time of benchmarks in xen virtualization
> >> >> environment is larger than the execution time in native machine
> >> >> environment. However, the evaluation gave us an opposite result.
> >> >>
> >> >> Below is the evaluation data for the canneal and streamcluster 
> >> >> benchmarks:
> >> >>
> >> >> Benchmark: canneal, input=simlarge, conf=gcc-serial
> >> >> Native: 6.387s
> >> >> Virtualization: 5.890s
> >> >>
> >> >> Benchmark: streamcluster, input=simlarge, conf=gcc-serial
> >> >> Native: 5.276s
> >> >> Virtualization: 5.240s
> >> >>
> >> >> Is there anything wrong with our evaluation that lead to the abnormal
> >> >> performance results?
> >> >
> >> > Nothing is wrong. Virtualization is naturally faster than baremetal!
> >> >
> >> > :-)
> >> >
> >> > No clue sadly.
> >>
> >> Ah-ha. This is really surprising to me Why will it speed up the
> >> system by adding one more layer? Unless the virtualization disabled
> >> some services that occur in native and interfere with the benchmark.
> >>
> >> If virtualization is faster than baremetal by nature, why we can see
> >> that some experiment shows that virtualization introduces overhead?
> >
> > Elena told me that there were some weird regression in Linux 4.1 - where
> > CPU burning workloads were _slower_ on baremetal than as guests.
> 
> Hi Elena,
> Would you mind sharing with us some of your experience of how you
> found the real reason? Did you use some tool or some methodology to
> pin down the reason (i.e,  CPU burning workloads in native is _slower_
> on baremetal than as guests)?
>

Hi Meng

Yes, sure!

While working on performance tests for Joao's SMT-exposing patches,
I ran a CPU-bound workload in an HVM guest and then ran the same test on
baremetal using the same kernel.
On baremetal Linux (4.1.0-rc2) the test took several times longer to
complete than it did under the HVM guest.
I tried the test both with kernel threads pinned to cores and without
pinning.
On baremetal the execution time is usually about twice as long, and
sometimes 4 times as long, as in the HVM case.

What is interesting is not only that baremetal sometimes takes 3-4 times
longer than the HVM guest, but also that the test with threads bound to
cores takes almost 3 times longer to execute than the same CPU-bound test
under HVM (in all configurations).

I ran each test 5 times; here are the execution times (seconds):

-------------------------------------------------
         baremetal          |
thread bind | thread unbind | HVM pinned to cores
------------|---------------|--------------------
     74     |      83       |         28
     74     |      88       |         28
     74     |      38       |         28
     74     |      73       |         28
     74     |      87       |         28

Sometimes the unbound tests gave better times, but not often enough
to present them here. Some results are much worse and reach up to 120
seconds.
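
For reference, the ratios implied by the table above can be recomputed
with a few lines of Python. This is only a sketch: the numbers are copied
from the table, and the variable and function names are made up for
illustration.

```python
from statistics import mean

# Execution times in seconds, copied from the table above (5 runs each).
baremetal_bound = [74, 74, 74, 74, 74]
baremetal_unbound = [83, 88, 38, 73, 87]
hvm_pinned = [28, 28, 28, 28, 28]


def slowdown(times, baseline):
    """Ratio of the mean execution time to the mean baseline time."""
    return mean(times) / mean(baseline)


bound_ratio = slowdown(baremetal_bound, hvm_pinned)
unbound_ratio = slowdown(baremetal_unbound, hvm_pinned)

print(f"baremetal bound   / HVM: {bound_ratio:.2f}x")
print(f"baremetal unbound / HVM: {unbound_ratio:.2f}x")
```

Both ratios come out a little above 2.6, which is in line with the
"almost 3 times longer" observation.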

Each test has 8 kernel threads. In baremetal case I tried the following:
- numa off,on;
- all cpus are on;
- isolate cpus from first node;
- set intel_idle.max_cstate=1;
- disable intel_pstate;

I don't think I have exhausted all the options here, but it looked like
the last two changes did improve performance, although it was still not
comparable to the HVM case.
I am trying to find where the regression happened. Performance on a newer
kernel (I tried 4.5.0-rc4+) was close to or better than HVM.

I am trying to find out if there were some relevant regressions, to
understand the reason for this.


What kernel do you guys use?

Elena

See more description of the tests here:
http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg02874.html
Joao patches are here:

Re: [Xen-devel] schedulers and topology exposing questions

2016-01-29 Thread Elena Ufimtseva
On Thu, Jan 28, 2016 at 09:46:46AM +, Dario Faggioli wrote:
> On Wed, 2016-01-27 at 11:03 -0500, Elena Ufimtseva wrote:
> > On Wed, Jan 27, 2016 at 10:27:01AM -0500, Konrad Rzeszutek Wilk
> > wrote:
> > > On Wed, Jan 27, 2016 at 03:10:01PM +, George Dunlap wrote:
> > > > On 27/01/16 14:33, Konrad Rzeszutek Wilk wrote:
> > > > > On Tue, Jan 26, 2016 at 11:21:36AM +, George Dunlap wrote:
> > > > > > On 22/01/16 16:54, Elena Ufimtseva wrote:
> > > > > > > Hello all!
> > > > > > >
> > > > > > > > Dario, George or anyone else, your help will be
> > > > > > > > appreciated.
> > > > > > > >
> > > > > > > > Let me give some intro to our findings. I may forget
> > > > > > > > something or not be explicit enough about something;
> > > > > > > > if so, please ask me.
> > > > > > > >
> > > > > > > > A customer filed a bug where some of the applications
> > > > > > > > were running slowly in their HVM DomU setups.
> > > > > > > > These running times were compared against baremetal
> > > > > > > > running the same kernel version as the HVM DomU.
> > > > > > > >
> > > > > > > > After some investigation by different parties, a test
> > > > > > > > case scenario was found where the problem was easily
> > > > > > > > seen. The test app is a UDP server/client pair where
> > > > > > > > the client passes some message n times.
> > > > > > > > The test case was executed on baremetal and in a Xen
> > > > > > > > DomU with kernel version 2.6.39.
> > > > > > > > Baremetal showed a 2x better result than DomU.
> > > > > > > >
> > > > > > > > Konrad came up with a workaround that was setting the
> > > > > > > > flag for the domain scheduler in Linux.
> > > > > > > > As the guest is not aware of SMT-related topology, it
> > > > > > > > has a flat topology initialized.
> > > > > > > > The kernel has the scheduler flags for the CPU
> > > > > > > > scheduling domain set to 4143 for 2.6.39.
> > > > > > > > Konrad discovered that changing the flag for the CPU
> > > > > > > > sched domain to 4655 works as a workaround and makes
> > > > > > > > Linux think that the topology has SMT threads.
> > > > > > > > This workaround makes the test complete in almost the
> > > > > > > > same time as on baremetal (or insignificantly worse).
> > > > > > > >
> > > > > > > > As we discovered, this workaround is not suitable for
> > > > > > > > newer kernel versions.
> > > > > > > >
> > > > > > > > The hackish way of making DomU Linux think that it has
> > > > > > > > SMT threads (along with matching cpuid) made us think
> > > > > > > > that the problem comes from the fact that the CPU
> > > > > > > > topology is not exposed to the guest, so the Linux
> > > > > > > > scheduler cannot make intelligent scheduling decisions.
> > > > > > > >
> > > > > > > > Joao Martins from Oracle developed a set of patches
> > > > > > > > that fix the SMT/core/cache topology numbering and
> > > > > > > > provide matching pinning of vcpus and enabling options,
> > > > > > > > which allows the correct topology to be exposed to the
> > > > > > > > guest.
> > > > > > > > I guess Joao will be posting it at some point.
> > > > > > > >
> > > > > > > > With these patches we decided to test the performance
> > > > > > > > impact on different kernel versions and Xen versions.
> > > > > > > >
> > > > > > > > The test described above was labeled as an IO-bound test.
> > > > > >
> > > > > > So just to clarify: The client sends a request (presumably
> > > > > > not much more
> > > > > > than a ping) to the server, and waits for the server to
> > > > > > respond before
> > > > > > sending another one; and the server does the reverse --
> > > > > > receives a
> > > > > > request, responds, and then waits for the n

Re: [Xen-devel] schedulers and topology exposing questions

2016-01-29 Thread Elena Ufimtseva
On Thu, Jan 28, 2016 at 09:55:45AM +, Dario Faggioli wrote:
> On Wed, 2016-01-27 at 15:53 +, George Dunlap wrote:
> > On 27/01/16 15:27, Konrad Rzeszutek Wilk wrote:
> > > 
> > > So Elena started looking at the CPU bound and seeing how Xen
> > > behaves then
> > > and if we can improve the floating situation as she saw some
> > > abnormal
> > > behaviours.
> >
> > OK -- if the focus was on the two cases where the Xen credit1
> > scheduler
> > (apparently) co-located two cpu-burning vcpus on sibling threads,
> > then
> > yeah, that's behavior we should probably try to get to the bottom of.
> >
> Well, let's see the trace. 

Hey Dario

Please disregard the previous email with topology information.
It was incorrect; I am attaching the topology that actually results from
applying Joao's SMT patches.


Elena
>
> In any case, I'm up to trying hooking the SMT load balancer in
> runq_tickle (which would mean doing it upon every vcpus wakeup).
>
> My gut feeling is that the overhead may outweigh the benefit, and that
> it will actually prove useful only in a minority of the
> cases/workloads, but it's maybe worth a try.

>
> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>


processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 62
model name  : Genuine Intel(R) CPU  @ 2.80GHz
stepping: 2
microcode   : 0x209
cpu MHz : 2793.360
cache size  : 25600 KB
physical id : 0
siblings: 16
core id : 0
cpu cores   : 8
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs:
bogomips: 5586.72
clflush size: 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 62
model name  : Genuine Intel(R) CPU  @ 2.80GHz
stepping: 2
microcode   : 0x209
cpu MHz : 2793.360
cache size  : 25600 KB
physical id : 0
siblings: 16
core id : 0
cpu cores   : 8
apicid  : 1
initial apicid  : 1
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs:
bogomips: 5586.72
clflush size: 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model   : 62
model name  : Genuine Intel(R) CPU  @ 2.80GHz
stepping: 2
microcode   : 0x209
cpu MHz : 2793.360
cache size  : 25600 KB
physical id : 0
siblings: 16
core id : 1
cpu cores   : 8
apicid  : 2
initial apicid  : 2
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
bugs:
bogomips: 5586.72
clflush size: 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model   : 62
model name  : Genuine Intel(R) CPU  @ 2.80GHz
stepping: 2
microcode   : 0x209
cpu MHz : 2793.360
cache size  : 25600 KB
physical id : 0
siblings: 16
core id : 1
cpu cores   : 8
apicid  : 3
initial apicid  : 3
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm 
constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 
x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm 
fsgsbase smep erms xsaveopt
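
The listing above can be summarised mechanically. Below is a minimal
sketch, assuming the usual /proc/cpuinfo layout (entries separated by
blank lines, "name : value" fields); the function name smt_siblings and
the trimmed sample are made up for illustration. Logical processors that
share both "physical id" and "core id" are SMT siblings; in the dump,
processors 0 and 1 report core id 0, while processors 2 and 3 report
core id 1.

```python
from collections import defaultdict

# Trimmed-down sample in /proc/cpuinfo format; only the fields used below.
SAMPLE = """\
processor   : 0
physical id : 0
core id : 0

processor   : 1
physical id : 0
core id : 0

processor   : 2
physical id : 0
core id : 1

processor   : 3
physical id : 0
core id : 1
"""


def smt_siblings(cpuinfo_text):
    """Group logical processors by (physical id, core id)."""
    cores = defaultdict(list)
    entry = {}
    # The extra empty string flushes the last entry if the text does not
    # end with a blank line.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if "processor" in entry:
                key = (int(entry.get("physical id", 0)),
                       int(entry.get("core id", 0)))
                cores[key].append(int(entry["processor"]))
            entry = {}
            continue
        name, sep, value = line.partition(":")
        if sep:  # wrapped "flags" continuation lines have no colon
            entry[name.strip()] = value.strip()
    return dict(cores)


siblings = smt_siblings(SAMPLE)
for (package, core), cpus in sorted(siblings.items()):
    print(f"package {package} core {core}: cpus {cpus}")
```

On a live system the same function could be fed the contents of
/proc/cpuinfo instead of the sample string.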

Re: [Xen-devel] schedulers and topology exposing questions

2016-01-28 Thread Elena Ufimtseva
On Wed, Jan 27, 2016 at 02:01:35PM +, Dario Faggioli wrote:
> On Fri, 2016-01-22 at 11:54 -0500, Elena Ufimtseva wrote:
> > Hello all!
> > 
> Hey, here I am again,
> 
> > Konrad came up with a workaround that was setting the flag for domain
> > scheduler in linux
> > As the guest is not aware of SMT-related topology, it has a flat
> > topology initialized.
> > Kernel has domain scheduler flags for scheduling domain CPU set to
> > 4143 for 2.6.39.
> > Konrad discovered that changing the flag for CPU sched domain to 4655
> >
> So, as you've seen, I have also been doing quite a bit of
> benchmarking, doing something similar (I used more recent kernels, and
> decided to test 4131 as flags).
> 
> In your case, according to this:
>  http://lxr.oss.org.cn/source/include/linux/sched.h?v=2.6.39#L807
> 
> 4655 means:
>   SD_LOAD_BALANCE        |
>   SD_BALANCE_EXEC        |
>   SD_BALANCE_WAKE        |
>   SD_PREFER_LOCAL        | [*]
>   SD_SHARE_PKG_RESOURCES |
>   SD_SERIALIZE
> 
> and another bit (0x4000) that I don't immediately see what it is.
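
A side note on the arithmetic, offered as a sketch rather than as a
statement about the kernel: which bits are set depends on whether 4655 is
read as a decimal or as a hexadecimal number. The helper below (set_bits
is a made-up name) lists the individual bit masks for both readings. The
0x4000 bit mentioned above is only set under the hexadecimal reading;
read as decimal, 4655 is 0x122f. Mapping the masks to SD_* names requires
the sched.h of the kernel in question and is deliberately left out.

```python
def set_bits(value):
    """Return the individual bit masks that are set in value, lowest first."""
    return [1 << i for i in range(value.bit_length()) if (value >> i) & 1]


as_decimal = set_bits(4655)    # 4655 read as a decimal number (0x122f)
as_hex = set_bits(0x4655)      # the same digits read as hexadecimal

print("decimal 4655:", [hex(b) for b in as_decimal])
print("hex 0x4655:  ", [hex(b) for b in as_hex])
```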
> 
> Things have changed a bit since then, it appears. However, I'm quite sure 
> I've tested turning on SD_SERIALIZE in 4.2.0 and 4.3.0, and results were 
> really pretty bad (as you also seem to say later).
> 
> > works as a workaround and makes Linux think that the topology has SMT
> > threads.
> >
> Well, yes and no. :-). I don't want to make this all a terminology
> bunfight, something that also matters here is how many scheduling
> domains you have.
> 
> To check that (at least in recent kernels), look here:
> 
>  ls /proc/sys/kernel/sched_domain/cpu2/ (any cpu is ok)
> 
> and see how many domain[0-9] you have.
> 
> On baremetal, on an HT cpu, I've got this:
> 
> $ cat /proc/sys/kernel/sched_domain/cpu2/domain*/name 
> SMT
> MC
> 
> So, two domains, one of which is the SMT one. If you check their flags,
> they're different:
> 
> $ cat /proc/sys/kernel/sched_domain/cpu2/domain*/flags
> 4783
> 559
> 
> So, yes, you are right in saying that 4655 is related to SMT. In fact,
> it is what (among other things) tells the load balancer that *all* the
> cpus (well, all the scheduling groups, actually) in this domain are SMT
> siblings... Which is a legitimate thing to do, but it's not what
> happens on SMT baremetal.
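
The check described above can also be scripted. A minimal sketch, assuming
the directory layout shown in the message (one domainN directory per
scheduling domain, each with "name" and "flags" files); the function name
read_sched_domains is made up, and the path may not exist on every
kernel, so the script only reads it if it is there.

```python
import os
import re


def read_sched_domains(cpu_dir):
    """Return [(domain, name, flags)] for every domainN directory in cpu_dir."""
    domains = [e for e in os.listdir(cpu_dir) if re.fullmatch(r"domain\d+", e)]
    # Sort numerically so that domain10 comes after domain2.
    domains.sort(key=lambda e: int(e[len("domain"):]))
    result = []
    for entry in domains:
        values = []
        for field in ("name", "flags"):
            with open(os.path.join(cpu_dir, entry, field)) as f:
                values.append(f.read().strip())
        result.append((entry, values[0], values[1]))
    return result


if __name__ == "__main__":
    # Path taken from the message above; any cpu directory would do.
    path = "/proc/sys/kernel/sched_domain/cpu2"
    if os.path.isdir(path):
        for domain, name, flags in read_sched_domains(path):
            print(domain, name, flags)
    else:
        print(f"{path} is not available on this system")
```

The flags are kept as strings on purpose, so that the sketch makes no
assumption about how a given kernel formats them.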
> 
> At least it is consistent, IMO. I.e., it still creates a pretty flat
> topology, like there was a big core, of which _all_ the vcpus are part
> of, as SMT siblings.
> 
> The other option (the one I'm leaning toward) was to get rid of that
> one flag. I've only done preliminary experiments with it on and off,
> and the ones with it off were better looking, so I did keep it off for
> the big run... but we can test with it again.
> 
> > This workaround makes the test to complete almost in same time as on
> > baremetal (or insignificantly worse).
> > 
> > This workaround is not suitable for kernels of higher versions as we
> > discovered.
> > 
> There may be more than one reason for this (as said, a lot changed!)
> but it matches what I've found when SD_SERIALIZE was kept on for the
> scheduling domain where all the vcpus are.
> 
> > The hackish way of making domU linux think that it has SMT threads
> > (along with matching cpuid)
> > made us thinks that the problem comes from the fact that cpu topology
> > is not exposed to
> > guest and Linux scheduler cannot make intelligent decision on
> > scheduling.
> > 
> As said, I think it's the other way around: we expose too much of it
> (and this is more of an issue for PV rather than for HVM). Basically,
> either you do the pinning you're doing or, whatever you expose, will be
> *wrong*... and the only way to expose not wrong data is to actually
> don't expose anything! :-)
> 
> > The test described above was labeled as IO-bound test.
> > 
> > We have run io-bound test with and without smt-patches. The
> > improvement comparing
> > to base case (no smt patches, flat topology) shows 22-23% gain.
> > 
> I'd be curious to see the content of the /proc/sys/kernel/sched_domain
> directory and subdirectories with Joao's patches applied.
> 
> > While we have seen improvement with io-bound tests, the same did not
> > happen with cpu-bound workload.
> > As cpu-bound test we use kernel module which runs requested number of
> > kernel threads
> > and each thread compresses and decompresses some data.
> > 
> That is somewhat what I would have expected, although up to what
> extent, it's hard to tell in advance.
> 
> It also matches my fin

Re: [Xen-devel] schedulers and topology exposing questions

2016-01-27 Thread Elena Ufimtseva
On Wed, Jan 27, 2016 at 10:27:01AM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 27, 2016 at 03:10:01PM +, George Dunlap wrote:
> > On 27/01/16 14:33, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Jan 26, 2016 at 11:21:36AM +, George Dunlap wrote:
> > >> On 22/01/16 16:54, Elena Ufimtseva wrote:
> > >>> Hello all!
> > >>>
> > >>> Dario, Gerorge or anyone else,  your help will be appreciated.
> > >>>
> > >>> Let me put some intro to our findings. I may forget something or put 
> > >>> something
> > >>> not too explicit, please ask me.
> > >>>
> > >>> Customer filled a bug where some of the applications were running slow 
> > >>> in their HVM DomU setups.
> > >>> These running times were compared against baremetal running same kernel 
> > >>> version as HVM DomU.
> > >>>
> > >>> After some investigation by different parties, the test case scenario 
> > >>> was found
> > >>> where the problem was easily seen. The test app is a udp server/client 
> > >>> pair where
> > >>> client passes some message n number of times.
> > >>> The test case was executed on baremetal and Xen DomU with kernel 
> > >>> version 2.6.39.
> > >>> Bare metal showed 2x times better result that DomU.
> > >>>
> > >>> Konrad came up with a workaround that was setting the flag for domain 
> > >>> scheduler in linux
> > >>> As the guest is not aware of SMT-related topology, it has a flat 
> > >>> topology initialized.
> > >>> Kernel has domain scheduler flags for scheduling domain CPU set to 4143 
> > >>> for 2.6.39.
> > >>> Konrad discovered that changing the flag for CPU sched domain to 4655
> > >>> works as a workaround and makes Linux think that the topology has SMT 
> > >>> threads.
> > >>> This workaround makes the test to complete almost in same time as on 
> > >>> baremetal (or insignificantly worse).
> > >>>
> > >>> This workaround is not suitable for kernels of higher versions as we 
> > >>> discovered.
> > >>>
> > > >>> The hackish way of making domU linux think that it has SMT threads 
> > > >>> (along with matching cpuid)
> > > >>> made us think that the problem comes from the fact that cpu topology 
> > > >>> is not exposed to
> > > >>> guest and Linux scheduler cannot make intelligent decisions on 
> > > >>> scheduling.
> > >>>
> > > >>> Joao Martins from Oracle developed a set of patches that fixed the 
> > > >>> smt/core/cache
> > > >>> topology numbering and provided matching pinning of vcpus and enabling 
> > > >>> options,
> > > >>> which allows exposing the correct topology to the guest.
> > > >>> I guess Joao will be posting it at some point.
> > > >>>
> > > >>> With these patches we decided to test the performance impact on 
> > > >>> different kernel versions and Xen versions.
> > >>>
> > >>> The test described above was labeled as IO-bound test.
> > >>
> > >> So just to clarify: The client sends a request (presumably not much more
> > >> than a ping) to the server, and waits for the server to respond before
> > >> sending another one; and the server does the reverse -- receives a
> > >> request, responds, and then waits for the next request.  Is that right?
> > > 
> > > Yes.
> > >>
> > >> How much data is transferred?
> > > 
> > > 1 packet, UDP
> > >>
> > >> If the amount of data transferred is tiny, then the bottleneck for the
> > >> test is probably the IPI time, and I'd call this a "ping-pong"
> > >> benchmark[1].  I would only call this "io-bound" if you're actually
> > >> copying large amounts of data.
> > > 
> > > What we found is that on baremetal the scheduler would put both apps
> > > on the same CPU and schedule them right after each other. This would
> > > have a high IPI as the scheduler would poke itself.
> > > > On Xen it would put the two applications on separate CPUs - and there
> > > would be hardly any IPI.
> > 
> > Sorry -- why would the scheduler send itself an IPI if it's on the same
> > logical cpu (which seems pretty pointless),

Re: [Xen-devel] schedulers and topology exposing questions

2016-01-22 Thread Elena Ufimtseva
On Fri, Jan 22, 2016 at 06:29:19PM +0100, Dario Faggioli wrote:
> On Fri, 2016-01-22 at 11:54 -0500, Elena Ufimtseva wrote:
> > Hello all!
> >
> Hello,
>
> > Let me put some intro to our findings. I may forget something or put
> > something
> > not too explicit, please ask me.
> >
> > Customer filed a bug where some of the applications were running
> > slow in their HVM DomU setups.
> > These running times were compared against baremetal running same
> > kernel version as HVM DomU.
> >
> > After some investigation by different parties, the test case scenario
> > was found
> > where the problem was easily seen. The test app is a udp
> > server/client pair where
> > client passes some message n number of times.
> > The test case was executed on baremetal and Xen DomU with kernel
> > version 2.6.39.
> > Bare metal showed a 2x better result than DomU.
> >
> > Konrad came up with a workaround that was setting the flag for domain
> > scheduler in linux
> > As the guest is not aware of SMT-related topology, it has a flat
> > topology initialized.
> > Kernel has domain scheduler flags for scheduling domain CPU set to
> > 4143 for 2.6.39.
> > Konrad discovered that changing the flag for CPU sched domain to 4655
> > works as a workaround and makes Linux think that the topology has SMT
> > threads.
> > This workaround makes the test to complete almost in same time as on
> > baremetal (or insignificantly worse).
> >
> > This workaround is not suitable for kernels of higher versions as we
> > discovered.
> >
> > The hackish way of making domU linux think that it has SMT threads
> > (along with matching cpuid)
> > made us think that the problem comes from the fact that cpu topology
> > is not exposed to
> > guest and Linux scheduler cannot make intelligent decisions on
> > scheduling.
> >
> So, me and Juergen (from SuSE) have been working on this for a while
> too.
>
> As far as my experiments goes, there are at least two different issues,
> both traceable to Linux's scheduler behavior. One has to do with what
> you just say, i.e., topology.
>
> Juergen has developed a set of patches, and I'm running benchmarks with
> them applied to both Dom0 and DomU, to see how they work.
>
> I'm not far from finishing running a set of 324 different test cases
> (each one run both without and with Juergen's patches). I am running
> different benchmarks, such as:
>  - iperf,
>  - a Xen build,
>  - sysbench --oltp,
>  - sysbench --cpu,
>  - unixbench
>
> and I'm also varying how loaded the host is, how big the VMs are, and
> how loaded the VMs are.

That's pretty cool. I also tried oversubscribed setups in my tests.

>
> 324 is the result of various combinations of the above... It's quite an
> extensive set! :-P

It is! Even with my few tests it's a lot of work.
>
> As soon as everything finishes running, I'll data mine the results, and
> let you know how they look like.
>
>
> The other issue that I've observed is that tweaking some _non_ topology
> related scheduling domains' flags also impact performance, sometimes in
> a quite sensible way.
>
> I have got the results from the 324 test cases described above of
> running with flags set to 4131 inside all the DomUs. That value was
> chosen after quite a bit of preliminary benchmarking and investigation
> as well.
>
> I'll share the results of that data set as well as soon as I manage to
> extract them from the raw output.
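
For reference, the decimal flag values quoted in this thread are easier to compare as bit masks. The sketch below only does the arithmetic: it XORs two values and lists the bits that differ. The helper names flag_diff and print_bits are made up for this illustration, and mapping each bit to an SD_* name is deliberately left out, since that depends on include/linux/sched.h of the kernel version in question.

```c
#include <stdio.h>

/* Return the bits that differ between two sched-domain flag words. */
unsigned int flag_diff(unsigned int a, unsigned int b)
{
    return a ^ b;
}

/* Print every set bit of mask as a separate hex value. */
void print_bits(const char *label, unsigned int mask)
{
    unsigned int bit;

    printf("%s:", label);
    /* Unsigned shift wraps to 0 after the top bit, which ends the loop. */
    for ( bit = 1; bit != 0; bit <<= 1 )
        if ( mask & bit )
            printf(" 0x%04x", bit);
    printf("\n");
}
```

With the values from this thread, 4143 (0x102f) and 4655 (0x122f) differ only in bit 0x0200, while 4143 and 4131 (0x1023) differ in bits 0x0004 and 0x0008.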

>
> > Joao Martins from Oracle developed a set of patches that fixed the
> > smt/core/cache
> > topology numbering and provided matching pinning of vcpus and
> > enabling options,
> > which allows exposing the correct topology to the guest.
> > I guess Joao will be posting it at some point.
> >
> That is one way of approaching the topology issue. The other, which is
> what me and Juergen are pursuing, is the opposite one, i.e., make the
> DomU (and Dom0, actually) think that the topology is always completely
> flat.
>
> I think, ideally, we want both: flat topology as the default, if no
> pinning is specified. Matching topology if it is.
>
> > With these patches we decided to test the performance impact on
> > different kernel versions and Xen versions.
> >
> That is really interesting, and thanks a lot for sharing it with us.
>
> I'm in the middle of something here, so I just wanted to quickly let
> you know that we're also working on something related... I'll have a
> look at the rest of the email and at the graphs ASAP.

Great!
I am attaching the io and cpu-bound tests that were used to get the
data.

Thanks Dario!
>
> Thanks again and Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
>




perf_tests.tar.gz
Description: application/gzip
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] how to enable kdb for xen

2015-12-18 Thread Elena Ufimtseva
On Fri, Dec 18, 2015 at 11:24 PM, quizyjones  wrote:
> Is there any progress?

Hey

I did look into this and could not find any trace of what I had
done before. So I decided to try to port it to the current version from
this patch by Mukesh:
http://lists.xen.org/archives/html/xen-devel/2014-04/msg3.html
It looks like it applied without major issues. I have not tested
it yet, but it's in my plan for next week.

Elena

>
>> Date: Wed, 16 Dec 2015 09:42:47 -0500
>> From: ufimts...@gmail.com
>> To: konrad.w...@oracle.com
>> CC: elena.ufimts...@oracle.com; quizy_jo...@outlook.com; t...@xen.org;
>> xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] how to enable kdb for xen
>
>>
>> On Wed, Dec 16, 2015 at 9:30 AM, Konrad Rzeszutek Wilk
>>  wrote:
>> > On December 16, 2015 3:08:04 AM EST, quizyjones
>> >  wrote:
>> >>The version embedded with kdb only updates to 4.1.0. How can I use it
>> >>with xen 4.6? Or is there any other debuggers which can step in Xen?
>> >
>> > CCing Elena who poked at it some point. Not sure if she got it ported
>> > over though.
>> >>
>> >>From: quizy_jo...@outlook.com
>> >>To: xen-devel@lists.xen.org
>> >>Date: Wed, 16 Dec 2015 06:57:02 +
>> >>Subject: [Xen-devel] how to enable kdb for xen
>> >>
>> >>
>> >>
>> >>
>> >>I tried to debug xen use kdb. After compiling xen with debug=y, is
>> >>there any further steps I should take? I can get console outputs start
>> >>with: Xen 4.4.1 (XEN) Xen version 4.4.1 (root@) (gcc
>> >>(Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4) debug=y Wed Dec 16 11:01:14
>> >>.But I can't step into the boot procedure. The kdb seems not built
>> >>in and there is no kdb folder in /tools/debugger. How can I build
>> >>xen-4.4.1/xen-4.4.6 with kdb?
>>
>> Hey!
>> If I recall correctly, I did try to port kdb. Let me find out what
>> happened there.
>>
>> Elena
>>
>> --
>> Elena



-- 
Elena



Re: [Xen-devel] how to enable kdb for xen

2015-12-16 Thread Elena Ufimtseva
On Wed, Dec 16, 2015 at 9:30 AM, Konrad Rzeszutek Wilk
 wrote:
> On December 16, 2015 3:08:04 AM EST, quizyjones  
> wrote:
>>The version embedded with kdb only updates to 4.1.0. How can I use it
>>with xen 4.6?  Or is there any other debuggers which can step in Xen?
>
> CCing Elena who poked at it some point. Not sure if she got it ported over 
> though.
>>
>>From: quizy_jo...@outlook.com
>>To: xen-devel@lists.xen.org
>>Date: Wed, 16 Dec 2015 06:57:02 +
>>Subject: [Xen-devel] how to enable kdb for xen
>>
>>
>>
>>
>>I tried to debug xen use kdb. After compiling xen with debug=y, is
>>there any further steps I should take? I can get console outputs start
>>with:Xen 4.4.1(XEN) Xen version 4.4.1 (root@) (gcc
>>(Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4) debug=y Wed Dec 16 11:01:14
>>.But I can't step into the boot procedure.  The kdb seems not built
>>in and there is no kdb folder in /tools/debugger. How can I build
>>xen-4.4.1/xen-4.4.6 with kdb?

Hey!
If I recall correctly, I did try to port kdb. Let me find out what
happened there.

Elena




-- 
Elena



Re: [Xen-devel] [PATCH v12 3/3] iommu: add rmrr Xen command line option for extra rmrrs

2015-11-06 Thread Elena Ufimtseva
On Fri, Nov 06, 2015 at 04:05:25AM -0700, Jan Beulich wrote:
> >>> On 06.11.15 at 05:22,  wrote:
> > On Wed, Oct 28, 2015 at 10:05:31AM -0600, Jan Beulich wrote:
> >> >>> On 27.10.15 at 21:36,  wrote:
> >> > +static void __init add_extra_rmrr(void)
> >> > +{
> >> > +struct acpi_rmrr_unit *acpi_rmrr;
> >> > +struct acpi_rmrr_unit *rmrru;
> >> > +unsigned int dev, seg, i;
> >> > +unsigned long pfn;
> >> > +bool_t overlap;
> >> > +
> >> > +for ( i = 0; i < nr_rmrr; i++ )
> >> > +{
> >> > +if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn 
> >> > )
> >> > +{
> >> > +printk(XENLOG_ERR VTDPREFIX
> >> > +   "Invalid RMRR Range "ERMRRU_FMT"\n",
> >> > +   ERMRRU_ARG(extra_rmrr_units[i]));
> >> > +continue;
> >> > +}
> >> > +
> >> > +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn 
> >> > >=
> >> > + MAX_EXTRA_RMRR_PAGES )
> >> > +{
> >> > +printk(XENLOG_ERR VTDPREFIX
> >> > +   "RMRR range "ERMRRU_FMT" exceeds 
> >> > "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
> >> > +   ERMRRU_ARG(extra_rmrr_units[i]));
> >> > +continue;
> >> > +}
> >> > +
> >> > +overlap = 0;
> >> > +list_for_each_entry(rmrru, &acpi_rmrr_units, list)
> >> > +{
> >> > +if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn) < 
> >> > rmrru->end_address &&
> >> > + rmrru->base_address < 
> >> > pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1) )
> >> 
> >> Aren't both ranges inclusive? I.e. shouldn't the first one be <= (and
> >> the second one could be <= too when dropping the +1), matching
> >> the check acpi_parse_one_rmrr() does?
> > 
> > The end_address is not inclusive, while the start_address is.
> > These two are from rmrr_identity_mapping():
> > ...
> > ASSERT(rmrr->base_address < rmrr->end_address);
> > 
> 
> These are byte-granular addresses.
> 
> > and:
> > ...
> > while ( base_pfn < end_pfn )
> > {
> >     int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);
> >
> >     if ( err )
> >         return err;
> >     base_pfn++;
> > }
> > ...
> > 
> > I think this condition should not be a problem. But yes, it's not uniform
> > with acpi_parse_one_rmrr.
> 
> Did you actually pay attention to how end_pfn gets calculated?
> 
> > I guess I should send another version then?
> 
> Yes of course.

Ok, I see your point.
> 
> >> > +}
> >> > +if ( seg != PCI_SEG(extra_rmrr_units[i].sbdf[0]) )
> >> > +{
> >> > +printk(XENLOG_ERR VTDPREFIX
> >> > +   "Segments are not equal for RMRR range 
> >> > "ERMRRU_FMT"\n",
> >> > +   ERMRRU_ARG(extra_rmrr_units[i]));
> >> > +scope_devices_free(&acpi_rmrr->scope);
> >> > +xfree(acpi_rmrr);
> >> > +continue;
> >> > +}
> >> > +
> >> > +acpi_rmrr->segment = seg;
> >> > +acpi_rmrr->base_address = 
> > pfn_to_paddr(extra_rmrr_units[i].base_pfn);
> >> > +acpi_rmrr->end_address = 
> >> > pfn_to_paddr(extra_rmrr_units[i].end_pfn + 
> > 1);
> >> 
> >> And this seems wrong too, unless I'm mistaken with the inclusive-ness.
> >>
> > The end_address is exclusive, see above.

> No - see above.

You are right, I actually meant to say that end_pfn for the extra rmrr is not
inclusive. And this case is only valid when base_pfn == end_pfn, as the parser
does not take care of the case where only one pfn is specified. The
assumption in this case is that the user meant [base_pfn, base_pfn + 1].
I think it will be safe to add the condition when incrementing.
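
The inclusiveness question discussed above can be sketched in plain C. This is an illustration and not the code from the patch: it assumes 4 KiB pages, assumes (as Jan's comments suggest) that end_address is an inclusive byte address, and assumes that base_pfn and end_pfn from the command line are both inclusive, as the option documentation in the patch states. The names byte_range, pfn_range_to_bytes and ranges_overlap are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

struct byte_range {
    uint64_t base; /* first byte of the region */
    uint64_t end;  /* last byte of the region (inclusive) */
};

/* Convert an inclusive pfn range into an inclusive byte range. */
struct byte_range pfn_range_to_bytes(uint64_t base_pfn, uint64_t end_pfn)
{
    struct byte_range r;

    assert(base_pfn <= end_pfn);
    r.base = base_pfn << SKETCH_PAGE_SHIFT;
    /* Last byte of the last page, not the first byte of the next one. */
    r.end = ((end_pfn + 1) << SKETCH_PAGE_SHIFT) - 1;
    return r;
}

/* Two inclusive ranges overlap if each one starts no later than the other ends. */
bool ranges_overlap(struct byte_range a, struct byte_range b)
{
    return a.base <= b.end && b.base <= a.end;
}
```

With inclusive ends on both sides the comparison has to be <=. A strict < would miss a region whose last byte is exactly the first byte of the other range.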


> 
> Jan
> 



Re: [Xen-devel] [PATCH v12 3/3] iommu: add rmrr Xen command line option for extra rmrrs

2015-11-05 Thread Elena Ufimtseva
On Wed, Oct 28, 2015 at 10:05:31AM -0600, Jan Beulich wrote:
> >>> On 27.10.15 at 21:36,  wrote:
> > +static void __init add_extra_rmrr(void)
> > +{
> > +struct acpi_rmrr_unit *acpi_rmrr;
> > +struct acpi_rmrr_unit *rmrru;
> > +unsigned int dev, seg, i;
> > +unsigned long pfn;
> > +bool_t overlap;
> > +
> > +for ( i = 0; i < nr_rmrr; i++ )
> > +{
> > +if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Invalid RMRR Range "ERMRRU_FMT"\n",
> > +   ERMRRU_ARG(extra_rmrr_units[i]));
> > +continue;
> > +}
> > +
> > +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >=
> > + MAX_EXTRA_RMRR_PAGES )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "RMRR range "ERMRRU_FMT" exceeds 
> > "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
> > +   ERMRRU_ARG(extra_rmrr_units[i]));
> > +continue;
> > +}
> > +
> > +overlap = 0;
> > +list_for_each_entry(rmrru, &acpi_rmrr_units, list)
> > +{
> > +if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn) < 
> > rmrru->end_address &&
> > + rmrru->base_address < 
> > pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1) )
> 
> Aren't both ranges inclusive? I.e. shouldn't the first one be <= (and
> the second one could be <= too when dropping the +1), matching
> the check acpi_parse_one_rmrr() does?

The end_address is not inclusive, while the start_address is.
These two are from rmrr_identity_mapping():
...
ASSERT(rmrr->base_address < rmrr->end_address);
and:
...
while ( base_pfn < end_pfn )
{
    int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw, flag);

    if ( err )
        return err;
    base_pfn++;
}
...

I think this condition should not be a problem. But yes, it's not uniform with
acpi_parse_one_rmrr.
I guess I should send another version then?
> 
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Overlapping RMRRs: "ERMRRU_FMT" and [%lx-%lx]\n",
> > +   ERMRRU_ARG(extra_rmrr_units[i]),
> > +   paddr_to_pfn(rmrru->base_address),
> > +   paddr_to_pfn(rmrru->end_address));
> > +overlap = 1;
> > +break;
> > +}
> > +}
> > +/* Don't add overlapping RMRR. */
> > +if ( overlap )
> > +continue;
> > +
> > +pfn = extra_rmrr_units[i].base_pfn;
> > +do
> > +{
> > +if ( !mfn_valid(pfn) )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Invalid pfn in RMRR range "ERMRRU_FMT"\n",
> > +   ERMRRU_ARG(extra_rmrr_units[i]));
> > +break;
> > +}
> > +} while ( pfn++ < extra_rmrr_units[i].end_pfn );
> > +
> > +/* Invalid pfn in range as the loop ended before end_pfn was 
> > reached. */
> > +if ( pfn <= extra_rmrr_units[i].end_pfn )
> > +continue;
> > +
> > +acpi_rmrr = xzalloc(struct acpi_rmrr_unit);
> > +if ( !acpi_rmrr )
> > +return;
> > +
> > +acpi_rmrr->scope.devices = xmalloc_array(u16,
> > + 
> > extra_rmrr_units[i].dev_count);
> > +if ( !acpi_rmrr->scope.devices )
> > +{
> > +xfree(acpi_rmrr);
> > +return;
> > +}
> > +
> > +seg = 0;
> > +for ( dev = 0; dev < extra_rmrr_units[i].dev_count; dev++ )
> > +{
> > +acpi_rmrr->scope.devices[dev] = extra_rmrr_units[i].sbdf[dev];
> > +seg = seg | PCI_SEG(extra_rmrr_units[i].sbdf[dev]);
> 
> Once again - |= please.
> 

Missed this one.

> > +}
> > +if ( seg != PCI_SEG(extra_rmrr_units[i].sbdf[0]) )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Segments are not equal for RMRR range "ERMRRU_FMT"\n",
> > +   ERMRRU_ARG(extra_rmrr_units[i]));
> > +scope_devices_free(&acpi_rmrr->scope);
> > +xfree(acpi_rmrr);
> > +continue;
> > +}
> > +
> > +acpi_rmrr->segment = seg;
> > +acpi_rmrr->base_address = 
> > pfn_to_paddr(extra_rmrr_units[i].base_pfn);
> > +acpi_rmrr->end_address = pfn_to_paddr(extra_rmrr_units[i].end_pfn 
> > + 1);
> 
> And this seems wrong too, unless I'm mistaken with the inclusive-ness.
>
The end_address is exclusive, see 

Re: [Xen-devel] [PATCH v11 3/3] iommu: add rmrr Xen command line option for extra rmrrs

2015-10-27 Thread Elena Ufimtseva
On Mon, Oct 26, 2015 at 07:38:06AM -0600, Jan Beulich wrote:
> >>> On 22.10.15 at 19:13, <elena.ufimts...@oracle.com> wrote:
> > From: Elena Ufimtseva <elena.ufimts...@oracle.com>
> > 
> > On some platforms RMRR regions may be not specified in ACPI and thus will 
> > not
> > be mapped 1:1 in dom0.
>

Thanks Jan for review.

> I think this may be misleading to readers: It sounds as if there was
> the option for RMRRs to not be specified in ACPI tables, while in
> fact this is a firmware bug. How about "On some platforms firmware
> fails to specify RMRR regions in ACPI tables, and thus those
> regions will not be mapped in dom0 or guests the respective device(s)
> get passed through to"?
>
Agree, makes more sense.

> > +static void __init add_extra_rmrr(void)
> > +{
> > +struct acpi_rmrr_unit *acpi_rmrr;
> > +struct acpi_rmrr_unit *rmrru;
> > +unsigned int dev, seg, i;
> > +unsigned long pfn;
> > +bool_t overlap;
> > +
> > +for ( i = 0; i < nr_rmrr; i++ )
> > +{
> > +if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Invalid RMRR Range "ERMRRU_FMT"\n",
> > +   ERMRRU_ARG(extra_rmrr_units[i]));
> > +continue;
> > +}
> > +
> > +if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >=
> > + MAX_EXTRA_RMRR_PAGES )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "RMRR range "ERMRRU_FMT" exceeds 
> > "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
> > +   ERMRRU_ARG(extra_rmrr_units[i]));
> > +continue;
> > +}
> > +
> > +overlap = 0;
> > +list_for_each_entry(rmrru, &acpi_rmrr_units, list)
> > +{
> > +if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn ) < 
> > rmrru->end_address &&
> 
> Stray blank inside the inner parentheses.
> 
> > + rmrru->base_address < 
> > pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1) )
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Overlapping RMRRs: "ERMRRU_FMT" and [%lx - %lx]\n",
> 
> ERMRRU_FMT doesn't have any blanks inside the square brackets,
> so I'd suggest the other format to not have them either.
> 
> > +   ERMRRU_ARG(extra_rmrr_units[i]),
> > +   paddr_to_pfn(rmrru->base_address),
> > +   paddr_to_pfn(rmrru->end_address));
> > +overlap = 1;
> > +break;
> > +}
> > +}
> > +/* Dont add overlapping RMRR */
> 
> "Don't" and missing full stop.
> 
> > +if ( overlap )
> > +continue;
> > +
> > +pfn = extra_rmrr_units[i].base_pfn;
> > +do
> > +{
> > +if ( !mfn_valid(pfn) || (pfn >> (paddr_bits - PAGE_SHIFT)) )
> 
> Actually I think the right side is redundant with the max_pfn check
> mfn_valid() does.
> 
> > +{
> > +printk(XENLOG_ERR VTDPREFIX
> > +   "Invalid pfn in RMRR range "ERMRRU_FMT"\n",
> > +   ERMRRU_ARG(extra_rmrr_units[i]));
> > +break;
> 
> Wrong indentation.
> 
> > +}
> > +
> > +} while ( pfn++ < extra_rmrr_units[i].end_pfn );
> 
> Stray blank line before the end of the do/while body.
> 
> > +
> > +/* Invalid pfn in range as the loop ended before end_pfn was 
> > reached. */
> > +if ( pfn <= extra_rmrr_units[i].end_pfn )
> > +continue;
> > +
> > +acpi_rmrr = xzalloc(struct acpi_rmrr_unit);
> > +if ( !acpi_rmrr )
> > +return;
> > +
> > +acpi_rmrr->scope.devices = xmalloc_array(u16,
> > + 
> > extra_rmrr_units[i].dev_count);
> > +if ( !acpi_rmrr->scope.devices )
> > +{
> > +xfree(acpi_rmrr);
> > +return;
> > +}
> > +
> > +seg = 0;
> > +for ( dev = 0; dev < extra_rmrr_units[i].dev_count; dev++ )
> > +{
> > +acpi_rmrr->scope.devices[dev] = extra_rmrr_

[Xen-devel] [PATCH v12 0/3] iommu: add rmrr Xen command line option

2015-10-27 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Sending v12 with mostly cosmetic fixes from Jan's review on v11.

Add Xen command line option rmrr to specify RMRR regions that are not defined
in ACPI thus causing IO Page Fault while booting dom0 in PVH mode.
These additional regions will be added to the list of RMRR regions parsed from
ACPI.

Changes in v11:
 - changed macro to print extra RMRR ranges and added argument macro;
 - fixed the overlapping check if condition error;
 - fixed the loop exit condition when checking pfn in RMRR region;

Elena Ufimtseva (3):
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown |  13 ++
 xen/drivers/passthrough/vtd/dmar.c  | 320 +---
 xen/drivers/pci/pci.c   |  11 ++
 xen/include/xen/pci.h   |   3 +
 4 files changed, 285 insertions(+), 62 deletions(-)

-- 
2.1.4




[Xen-devel] [PATCH v12 1/3] iommu VT-d: separate rmrr addition function

2015-10-27 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

In preparation for auxiliary RMRR data provided on Xen
command line, make RMRR adding a separate function.
Also free memory for rmrr device scope in error path.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
---
 xen/drivers/passthrough/vtd/dmar.c | 126 +++--
 1 file changed, 65 insertions(+), 61 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 7cad593..2f315aa 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -583,6 +583,68 @@ out:
 return ret;
 }
 
+static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
+{
+bool_t ignore = 0;
+unsigned int i = 0;
+int ret = 0;
+
+/* Skip checking if segment is not accessible yet. */
+if ( !pci_known_segment(rmrru->segment) )
+i = UINT_MAX;
+
+for ( ; i < rmrru->scope.devices_cnt; i++ )
+{
+u8 b = PCI_BUS(rmrru->scope.devices[i]);
+u8 d = PCI_SLOT(rmrru->scope.devices[i]);
+u8 f = PCI_FUNC(rmrru->scope.devices[i]);
+
+if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
+{
+dprintk(XENLOG_WARNING VTDPREFIX,
+" Non-existent device (%04x:%02x:%02x.%u) is reported"
+" in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
+rmrru->segment, b, d, f,
+rmrru->base_address, rmrru->end_address);
+ignore = 1;
+}
+else
+{
+ignore = 0;
+break;
+}
+}
+
+if ( ignore )
+{
+dprintk(XENLOG_WARNING VTDPREFIX,
+"  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
+"devices under its scope are not PCI discoverable!\n",
+rmrru->base_address, rmrru->end_address);
+scope_devices_free(&rmrru->scope);
+xfree(rmrru);
+}
+else if ( rmrru->base_address > rmrru->end_address )
+{
+dprintk(XENLOG_WARNING VTDPREFIX,
+"  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
+rmrru->base_address, rmrru->end_address);
+scope_devices_free(&rmrru->scope);
+xfree(rmrru);
+ret = -EFAULT;
+}
+else
+{
+if ( iommu_verbose )
+dprintk(VTDPREFIX,
+"  RMRR region: base_addr %"PRIx64" end_address %"PRIx64"\n",
+rmrru->base_address, rmrru->end_address);
+acpi_register_rmrr_unit(rmrru);
+}
+
+return ret;
+}
+
 static int __init
 acpi_parse_one_rmrr(struct acpi_dmar_header *header)
 {
@@ -633,68 +695,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
 ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
 &rmrru->scope, RMRR_TYPE, rmrr->segment);
 
-if ( ret || (rmrru->scope.devices_cnt == 0) )
-xfree(rmrru);
+if ( !ret && (rmrru->scope.devices_cnt != 0) )
+register_one_rmrr(rmrru);
 else
-{
-u8 b, d, f;
-bool_t ignore = 0;
-unsigned int i = 0;
-
-/* Skip checking if segment is not accessible yet. */
-if ( !pci_known_segment(rmrr->segment) )
-i = UINT_MAX;
-
-for ( ; i < rmrru->scope.devices_cnt; i++ )
-{
-b = PCI_BUS(rmrru->scope.devices[i]);
-d = PCI_SLOT(rmrru->scope.devices[i]);
-f = PCI_FUNC(rmrru->scope.devices[i]);
-
-if ( !pci_device_detect(rmrr->segment, b, d, f) )
-{
-dprintk(XENLOG_WARNING VTDPREFIX,
-" Non-existent device (%04x:%02x:%02x.%u) is reported"
-" in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
-rmrr->segment, b, d, f,
-rmrru->base_address, rmrru->end_address);
-ignore = 1;
-}
-else
-{
-ignore = 0;
-break;
-}
-}
-
-if ( ignore )
-{
-dprintk(XENLOG_WARNING VTDPREFIX,
-"  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
-"devices under its scope are not PCI discoverable!\n",
-rmrru->base_address, rmrru->end_address);
-scope_devices_free(&rmrru->scope);
-xfree(rmrru);
-}
-else if ( base_addr > end_addr )
-{
-dprintk(XENLOG_WARNING VTDPREFIX,
-"  The RMRR (%"PRIx64", %"PRI

[Xen-devel] [PATCH v12 3/3] iommu: add rmrr Xen command line option for extra rmrrs

2015-10-27 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

On some platforms firmware fails to specify RMRR regions in ACPI tables and thus
those regions will not be mapped in dom0 or guests and may cause IO Page Faults
and prevent dom0 from booting in PVH mode.

The new Xen command line option rmrr allows specifying such devices and
memory regions. These regions are added to the list of RMRRs defined in ACPI if
the device is present in the system. As a result, the additional RMRRs will be
mapped 1:1 in dom0 with correct permissions.

The problems mentioned above were discovered during PVH work with ThinkCentre M
and Dell 5600T. No official documentation has been found so far regarding which
devices cause this and why. Experiments show that ThinkCentre M USB devices
with the debug port enabled generate DMA read transactions to regions of
memory marked reserved in the host e820 map.

For Dell 5600T the device and faulting addresses are not found yet.
For detailed history of the discussion please check following threads:
http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html

Format for rmrr Xen command line option:
rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
If grub2 used and multiple ranges are specified, ';' should be
quoted/escaped, refer to grub2 manual for more information.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 docs/misc/xen-command-line.markdown |  13 +++
 xen/drivers/passthrough/vtd/dmar.c  | 194 +++-
 2 files changed, 206 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 416e559..92c69ea 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1240,6 +1240,19 @@ Specify the host reboot method.
 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
  default it will use that method first).
 
+### rmrr
+> '= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
+
+Define RMRR units that are missing from ACPI table along with device they
+belong to and use them for 1:1 mapping. End addresses can be omitted and one
+page will be mapped. The ranges are inclusive when start and end are specified.
+If segment of the first device is not specified, segment zero will be used.
+If other segments are not specified, first device segment will be used.
+If a segment is specified for other than the first device and it does not match
+the one specified for the first one, an error will be reported.
+Note: grub2 requires to escape or use quotations if special characters are used,
+namely ';', refer to the grub2 documentation if multiple ranges are specified.
+
 ### ro-hpet
 > `= `
 
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2f315aa..a9c555e 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -867,6 +867,131 @@ out:
 return ret;
 }
 
+#define MAX_EXTRA_RMRR_PAGES 16
+#define MAX_EXTRA_RMRR 10
+
+/* RMRR units derived from command line rmrr option. */
+#define MAX_EXTRA_RMRR_DEV 20
+struct extra_rmrr_unit {
+struct list_head list;
+unsigned long base_pfn, end_pfn;
+unsigned int dev_count;
+u32 sbdf[MAX_EXTRA_RMRR_DEV];
+};
+
+static __initdata unsigned int nr_rmrr;
+static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR];
+
+/* Macro for RMRR inclusive range formatting. */
+#define ERMRRU_FMT "[%lx-%lx]"
+#define ERMRRU_ARG(eru) eru.base_pfn, eru.end_pfn
+
+static void __init add_extra_rmrr(void)
+{
+struct acpi_rmrr_unit *acpi_rmrr;
+struct acpi_rmrr_unit *rmrru;
+unsigned int dev, seg, i;
+unsigned long pfn;
+bool_t overlap;
+
+for ( i = 0; i < nr_rmrr; i++ )
+{
+if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn )
+{
+printk(XENLOG_ERR VTDPREFIX
+   "Invalid RMRR Range "ERMRRU_FMT"\n",
+   ERMRRU_ARG(extra_rmrr_units[i]));
+continue;
+}
+
+if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >=
+ MAX_EXTRA_RMRR_PAGES )
+{
+printk(XENLOG_ERR VTDPREFIX
+   "RMRR range "ERMRRU_FMT" exceeds "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
+   ERMRRU_ARG(extra_rmrr_units[i]));
+continue;
+}
+
+overlap = 0;
+list_for_each_entry(rmrru, &acpi_rmrr_units, list)
+{
+if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn) < rmrru->end_address &&
+ rmrru->base_address < pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1) )
+{
+printk(XENLOG_ERR VTDPREFIX
+   "Overlapping R

[Xen-devel] [PATCH v12 2/3] pci: add wrapper for parse_pci

2015-10-27 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

For sbdf's parsing in RMRR command line add __parse_pci with additional
parameter def_seg. __parse_pci will help to identify if segment was
found in string being parsed or default segment was used.
Make a wrapper parse_pci so the rest of the callers are not affected.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
Acked-by: Jan Beulich <jbeul...@suse.com>
---
 xen/drivers/pci/pci.c | 11 +++
 xen/include/xen/pci.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index ca07ed0..788a356 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
  unsigned int *bus_p, unsigned int *dev_p,
  unsigned int *func_p)
 {
+bool_t def_seg;
+
+return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg);
+}
+
+const char *__init __parse_pci(const char *s, unsigned int *seg_p,
+ unsigned int *bus_p, unsigned int *dev_p,
+ unsigned int *func_p, bool_t *def_seg)
+{
 unsigned long seg = simple_strtoul(s, , 16), bus, dev, func;
 
 if ( *s != ':' )
 return NULL;
 bus = simple_strtoul(s + 1, , 16);
+*def_seg = 0;
 if ( *s == ':' )
 dev = simple_strtoul(s + 1, , 16);
 else
@@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int 
*seg_p,
 dev = bus;
 bus = seg;
 seg = 0;
+*def_seg = 1;
 }
 if ( func_p )
 {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index a5aef55..a7b62a4 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -151,6 +151,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
   unsigned int *dev, unsigned int *func);
+const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus,
+  unsigned int *dev, unsigned int *func, bool_t *def_seg);
+
 
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
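
The wrapper pattern from the patch above can be sketched outside of Xen as follows. This is only an illustration under stated assumptions: parse_sbdf_ex() and parse_sbdf() are hypothetical stand-ins for __parse_pci() and parse_pci(), strtoul() replaces Xen's simple_strtoul(), and the accepted syntax is reduced to "[seg:]bus:dev.func" in hex.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for __parse_pci(): parses "[seg:]bus:dev.func"
 * and reports through def_seg whether the segment was defaulted to 0.
 * Returns a pointer past the parsed text, or NULL on a syntax error.
 */
static const char *parse_sbdf_ex(const char *s, unsigned int *seg,
                                 unsigned int *bus, unsigned int *dev,
                                 unsigned int *func, bool *def_seg)
{
    char *end;
    unsigned long a, b, c, f;

    a = strtoul(s, &end, 16);
    if ( end == s || *end != ':' )
        return NULL;
    s = end + 1;
    b = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    if ( *end == ':' )
    {
        /* Three fields before the '.', so the first one is the segment. */
        s = end + 1;
        c = strtoul(s, &end, 16);
        if ( end == s )
            return NULL;
        *def_seg = false;
    }
    else
    {
        /* Only bus:dev was given; shift the fields and use segment 0. */
        c = b;
        b = a;
        a = 0;
        *def_seg = true;
    }

    if ( *end != '.' )
        return NULL;
    s = end + 1;
    f = strtoul(s, &end, 16);
    if ( end == s || a > 0xffff || b > 0xff || c > 0x1f || f > 7 )
        return NULL;

    *seg = a;
    *bus = b;
    *dev = c;
    *func = f;
    return end;
}

/* Hypothetical stand-in for parse_pci(): existing callers ignore def_seg. */
static const char *parse_sbdf(const char *s, unsigned int *seg,
                              unsigned int *bus, unsigned int *dev,
                              unsigned int *func)
{
    bool def_seg;

    return parse_sbdf_ex(s, seg, bus, dev, func, &def_seg);
}
```

With this shape, a caller that needs to know whether the segment was given explicitly uses the extended function, while every other caller keeps the old signature.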


[Xen-devel] [PATCH v11 2/3] pci: add wrapper for parse_pci

2015-10-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

For sbdf's parsing in RMRR command line add __parse_pci with additional
parameter def_seg. __parse_pci will help to identify if segment was
found in string being parsed or default segment was used.
Make a wrapper parse_pci so the rest of the callers are not affected.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
Acked-by: Jan Beulich <jbeul...@suse.com>
---
 xen/drivers/pci/pci.c | 11 +++
 xen/include/xen/pci.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index ca07ed0..788a356 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
  unsigned int *bus_p, unsigned int *dev_p,
  unsigned int *func_p)
 {
+bool_t def_seg;
+
+return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg);
+}
+
+const char *__init __parse_pci(const char *s, unsigned int *seg_p,
+ unsigned int *bus_p, unsigned int *dev_p,
+ unsigned int *func_p, bool_t *def_seg)
+{
 unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func;
 
 if ( *s != ':' )
 return NULL;
 bus = simple_strtoul(s + 1, &s, 16);
+*def_seg = 0;
 if ( *s == ':' )
 dev = simple_strtoul(s + 1, &s, 16);
 else
@@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
 dev = bus;
 bus = seg;
 seg = 0;
+*def_seg = 1;
 }
 if ( func_p )
 {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index a5aef55..a7b62a4 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -151,6 +151,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
   unsigned int *dev, unsigned int *func);
+const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus,
+  unsigned int *dev, unsigned int *func, bool_t *def_seg);
+
 
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
 
-- 
2.1.4




[Xen-devel] [PATCH v11 3/3] iommu: add rmrr Xen command line option for extra rmrrs

2015-10-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

On some platforms RMRR regions may not be specified in ACPI and thus will not
be mapped 1:1 in dom0. This causes IO Page Faults and prevents dom0 from booting
in PVH mode. The new Xen command line option rmrr allows such devices and
memory regions to be specified. These regions are added to the list of RMRRs
defined in ACPI if the device is present in the system. As a result, the
additional RMRRs will be mapped 1:1 in dom0 with correct permissions.

The problems mentioned above were discovered during PVH work with ThinkCentre M
and Dell 5600T. No official documentation has been found so far regarding which
devices cause this and why. Experiments show that ThinkCentre M USB devices
with the debug port enabled generate DMA read transactions to regions of
memory marked reserved in the host e820 map.

For Dell 5600T the device and faulting addresses have not been found yet.
For a detailed history of the discussion please check the following threads:
http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html

Format for rmrr Xen command line option:
rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
If grub2 is used and multiple ranges are specified, ';' should be
quoted/escaped, refer to grub2 manual for more information.
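
To make the "start<-end>=" part of the format concrete, here is a minimal parsing sketch. It is not the parser from this patch: struct pfn_range and parse_range() are hypothetical names, strtoul() stands in for Xen's string helpers, and treating the numbers as hexadecimal frame numbers is an assumption of the sketch. The device list after '=' is left to a separate sbdf parser.

```c
#include <stdlib.h>

/* Hypothetical container for one inclusive range from the rmrr option. */
struct pfn_range {
    unsigned long base_pfn, end_pfn;
};

/*
 * Parses "start[-end]=" and returns a pointer to the device list that
 * follows the '=', or NULL on a syntax error. When the end is omitted,
 * the range covers a single page (end == start).
 */
static const char *parse_range(const char *s, struct pfn_range *r)
{
    char *end;

    r->base_pfn = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;
    r->end_pfn = r->base_pfn;

    if ( *end == '-' )
    {
        s = end + 1;
        r->end_pfn = strtoul(s, &end, 16);
        if ( end == s )
            return NULL;
    }

    if ( *end != '=' )
        return NULL;

    return end + 1;
}
```

For a made-up input such as "ab000-ab00f=00:1d.0" this yields the inclusive range ab000..ab00f and a pointer to "00:1d.0"; with "ab000=00:1d.0" the range is the single page ab000.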

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 docs/misc/xen-command-line.markdown |  13 +++
 xen/drivers/passthrough/vtd/dmar.c  | 196 +++-
 2 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 416e559..92c69ea 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1240,6 +1240,19 @@ Specify the host reboot method.
 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
  default it will use that method first).
 
+### rmrr
+> '= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
+
+Define RMRR units that are missing from ACPI table along with device they
+belong to and use them for 1:1 mapping. End addresses can be omitted and one
+page will be mapped. The ranges are inclusive when start and end are specified.
+If segment of the first device is not specified, segment zero will be used.
+If other segments are not specified, first device segment will be used.
+If a segment is specified for other than the first device and it does not match
+the one specified for the first one, an error will be reported.
+Note: grub2 requires to escape or use quotations if special characters are used,
+namely ';', refer to the grub2 documentation if multiple ranges are specified.
+
 ### ro-hpet
 > `= `
 
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index ced3239..8cbed88 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -867,6 +867,132 @@ out:
 return ret;
 }
 
+#define MAX_EXTRA_RMRR_PAGES 16
+#define MAX_EXTRA_RMRR 10
+
+/* RMRR units derived from command line rmrr option. */
+#define MAX_EXTRA_RMRR_DEV 20
+struct extra_rmrr_unit {
+struct list_head list;
+unsigned long base_pfn, end_pfn;
+unsigned int dev_count;
+u32 sbdf[MAX_EXTRA_RMRR_DEV];
+};
+
+static __initdata unsigned int nr_rmrr;
+static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR];
+
+/* Macro for RMRR inclusive range formatting. */
+#define ERMRRU_FMT "[%lx-%lx]"
+#define ERMRRU_ARG(eru) eru.base_pfn, eru.end_pfn
+
+static void __init add_extra_rmrr(void)
+{
+struct acpi_rmrr_unit *acpi_rmrr;
+struct acpi_rmrr_unit *rmrru;
+unsigned int dev, seg, i;
+unsigned long pfn;
+bool_t overlap;
+
+for ( i = 0; i < nr_rmrr; i++ )
+{
+if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn )
+{
+printk(XENLOG_ERR VTDPREFIX
+   "Invalid RMRR Range "ERMRRU_FMT"\n",
+   ERMRRU_ARG(extra_rmrr_units[i]));
+continue;
+}
+
+if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >=
+ MAX_EXTRA_RMRR_PAGES )
+{
+printk(XENLOG_ERR VTDPREFIX
+   "RMRR range "ERMRRU_FMT" exceeds "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
+   ERMRRU_ARG(extra_rmrr_units[i]));
+continue;
+}
+
+overlap = 0;
+list_for_each_entry(rmrru, &acpi_rmrr_units, list)
+{
+if ( pfn_to_paddr(extra_rmrr_units[i].base_pfn ) < rmrru->end_address &&
+ rmrru->base_address < pfn_to_paddr(extra_rmrr_units[i].end_pfn + 1) )
+{
+printk(XENLOG_ERR VTDPREFIX
+   "Overlapping RMRRs: "ERMR

[Xen-devel] [PATCH v11 0/3] iommu: add rmrr Xen command line option

2015-10-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

It's been a while since v10. There are subtle changes and fewer
patches in the series, and it would be nice to move it out of my way.
Please review and comment.

Add Xen command line option rmrr to specify RMRR regions for devices
whose regions are not defined in ACPI, which causes IO Page Faults
while booting dom0 in PVH mode.
These additional regions will be added to the list of
RMRR regions parsed from ACPI.

Changes in v11:
 - changed macro to print extra RMRR ranges and added argument macro;
 - fixed the overlapping check if condition error;
 - fixed the loop exit condition when checking pfn in RMRR region;

Changes in v10:
 - incorporate patch 'dmar: device scope mem leak fix' as series requires it;
 - move patch 'pci: add PCI_SBDF and PCI_SEG macros' close to the last patch
   which uses it;
 
Changes in v9:
 - skip to next RMRR region if current overlaps with any in acpi_rmrr_units;
 - fix typos in commit messages;
 - remove clean-up changes introduced by mistake in v8;

Elena Ufimtseva (3):
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

Changes in v8:  
 - removed bogus debug in patch 1 with non-functional changes;
 - changed PRI_RMRRL macro for formatting to reflect the fact that two arguments
   are used, so make it PRI_RMRR(s,e) for formatting inclusive RMRR range;
   'L' is also removed from the macro name, where it was meant to serve as the
   type of the arguments (%lx);
 - added overlapping check with RMRRs from ACPI;
 - added check based on paddr_bits for pfns in extra RMRR range (not sure if
   it's redundant with mfn_valid);
 - addressed while loop exit condition in extra RMRRs parser;

 docs/misc/xen-command-line.markdown |  13 ++
 xen/drivers/passthrough/vtd/dmar.c  | 322 +---
 xen/drivers/pci/pci.c   |  11 ++
 xen/include/xen/pci.h   |   3 +
 4 files changed, 287 insertions(+), 62 deletions(-)

-- 
2.1.4




[Xen-devel] [PATCH v11 1/3] iommu VT-d: separate rmrr addition function

2015-10-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

In preparation for auxiliary RMRR data provided on Xen
command line, make RMRR adding a separate function.
Also free memory for the RMRR device scope in the error path.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> 
---
 xen/drivers/passthrough/vtd/dmar.c | 126 +++--
 1 file changed, 65 insertions(+), 61 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 7cad593..ced3239 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -583,6 +583,68 @@ out:
 return ret;
 }
 
+static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
+{
+bool_t ignore = 0;
+unsigned int i = 0;
+int ret = 0;
+
+/* Skip checking if segment is not accessible yet. */
+if ( !pci_known_segment(rmrru->segment) )
+i = UINT_MAX;
+
+for ( ; i < rmrru->scope.devices_cnt; i++ )
+{
+u8 b = PCI_BUS(rmrru->scope.devices[i]);
+u8 d = PCI_SLOT(rmrru->scope.devices[i]);
+u8 f = PCI_FUNC(rmrru->scope.devices[i]);
+
+if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
+{
+dprintk(XENLOG_WARNING VTDPREFIX,
+" Non-existent device (%04x:%02x:%02x.%u) is reported"
+" in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
+rmrru->segment, b, d, f,
+rmrru->base_address, rmrru->end_address);
+ignore = 1;
+}
+else
+{
+ignore = 0;
+break;
+}
+}
+
+if ( ignore )
+{
+dprintk(XENLOG_WARNING VTDPREFIX,
+"  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
+"devices under its scope are not PCI discoverable!\n",
+rmrru->base_address, rmrru->end_address);
+scope_devices_free(&rmrru->scope);
+xfree(rmrru);
+}
+else if ( rmrru->base_address > rmrru->end_address )
+{
+dprintk(XENLOG_WARNING VTDPREFIX,
+"  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
+rmrru->base_address, rmrru->end_address);
+scope_devices_free(&rmrru->scope);
+xfree(rmrru);
+ret = -EFAULT;
+}
+else
+{
+if ( iommu_verbose )
+dprintk(VTDPREFIX,
+"  RMRR region: base_addr %"PRIx64" end_address %"PRIx64"\n",
+rmrru->base_address, rmrru->end_address);
+acpi_register_rmrr_unit(rmrru);
+}
+
+return ret;
+}
+
 static int __init
 acpi_parse_one_rmrr(struct acpi_dmar_header *header)
 {
@@ -633,68 +695,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
 ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
 &rmrru->scope, RMRR_TYPE, rmrr->segment);
 
-if ( ret || (rmrru->scope.devices_cnt == 0) )
-xfree(rmrru);
+if ( !ret && (rmrru->scope.devices_cnt != 0) )
+register_one_rmrr(rmrru);
 else
-{
-u8 b, d, f;
-bool_t ignore = 0;
-unsigned int i = 0;
-
-/* Skip checking if segment is not accessible yet. */
-if ( !pci_known_segment(rmrr->segment) )
-i = UINT_MAX;
-
-for ( ; i < rmrru->scope.devices_cnt; i++ )
-{
-b = PCI_BUS(rmrru->scope.devices[i]);
-d = PCI_SLOT(rmrru->scope.devices[i]);
-f = PCI_FUNC(rmrru->scope.devices[i]);
-
-if ( !pci_device_detect(rmrr->segment, b, d, f) )
-{
-dprintk(XENLOG_WARNING VTDPREFIX,
-" Non-existent device (%04x:%02x:%02x.%u) is reported"
-" in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
-rmrr->segment, b, d, f,
-rmrru->base_address, rmrru->end_address);
-ignore = 1;
-}
-else
-{
-ignore = 0;
-break;
-}
-}
-
-if ( ignore )
-{
-dprintk(XENLOG_WARNING VTDPREFIX,
-"  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
-"devices under its scope are not PCI discoverable!\n",
-rmrru->base_address, rmrru->end_address);
-scope_devices_free(&rmrru->scope);
-xfree(rmrru);
-}
-else if ( base_addr > end_addr )
-{
-dprintk(XENLOG_WARNING VTDPREFIX,
-"  Th

Re: [Xen-devel] [PATCH v2] PVH Dom0 RMRR IOMMU mapping regression fix

2015-09-28 Thread Elena Ufimtseva
On Mon, Sep 28, 2015 at 01:04:48AM -0600, Jan Beulich wrote:
> >>> On 25.09.15 at 22:59, <elena.ufimts...@oracle.com> wrote:
> > From: Elena Ufimtseva <elena.ufimts...@oracle.com>
> > 
> > This patch addresses a regression introduced by commit
> > 5ae03990c120a7b3067a52d9784c9aa72c0705a6 in new set_identity_p2m_entry.
> > RMRRs are not being mapped in IOMMU for PVH Dom0. This causes page faults
> > and some long 'hang-like' delays during Dom0 PVH boot and device
> > assignments.
> >
> > During construct_dom0, in PVH path p2m is being constructed and identity
> > mapped in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
> > New code used to map RMRRs invoked from rmrr_identity_mapping
> > checks if p2m entry exists with same type and access and if yes, skips iommu
> > mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are not being
> > mapped in IOMMU.
> >
> > As was mentioned in the earlier discussion, the PVH Dom0 construction code
> > should be modified to properly map RMRR regions in IOMMU. Since change will
> > be too invasive, this solution is a temporary fix at this time before better
> > solution is in. Also as Jan mentioned, there is no need in having 'x'
> > permissions for p2m entry of a mmio region, thus changed here.
> 
> Well, now that I look at this again I think there could be reasons for
> execute permission to be needed: Code placed in ROM may require
> this. But then again Dom0 shouldn't on its own (i.e. without
> involving the hypervisor) invoke such code, which usually would be
> expecting to be run in root mode ring 0 anyway. So I think not
> defaulting to include X is the right thing. Hence ...
>
> > Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
> 
> Reviewed-by: Jan Beulich <jbeul...@suse.com>
> 

Thanks Jan!



Re: [Xen-devel] [RFC PATCH resend] PVH Dom0 RMRR IOMMU mapping regression fix

2015-09-25 Thread Elena Ufimtseva
On Fri, Sep 25, 2015 at 12:36:09AM -0600, Jan Beulich wrote:
> >>> On 25.09.15 at 01:53,  wrote:
> > Permissions for p2m entry of read-only 
> > mmio regions are left unchanged as leaving only 'r' cause page faults. I am 
> > not sure what the reason of it yet, will try to dig it further.
> 
> Yes please - imo this absolutely should be changed to just r along
> with the rwx -> rw conversion. Since you saw page faults, could
> you at least point out which address(es) they occurred for? After
> all the set of r/o MMIO pages should be relatively small...

I verified it with a clean build and I cannot reproduce it anymore.
But this is the page fault I saw:

(XEN) [VT-D]iommu.c:873: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:875: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]DMAR:[DMA Write] Request device [:00:1f.2] fault addr 1b56000, iommu reg = 82c000203000
(XEN) [VT-D]DMAR: reason 05 - PTE Write access is not set
(XEN) print_vtd_entries: iommu 830412a4b9c0 dev :00:1f.2 gmfn 1b56
(XEN) root_entry = 830412a48000
(XEN) root_entry[0] = 291cbd001
(XEN) context = 830291cbd000
(XEN) context[fa] = 2_2920c7001
(XEN) l4 = 8302920c7000
(XEN) l4_index = 0
(XEN) l4[0] = 2920c6003
(XEN) l3 = 8302920c6000
(XEN) l3_index = 0
(XEN) l3[0] = 2920c5003
(XEN) l2 = 8302920c5000
(XEN) l2_index = d
(XEN) l2[d] = 2920b5003
(XEN) l1 = 8302920b5000
(XEN) l1_index = 156
(XEN) l1[156] = 0
(XEN) l1[156] not present
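
The index values in this dump can be reproduced from the faulting address. The sketch below assumes what the dump itself suggests: 4K pages and four table levels with 9 index bits each. pt_index() is a hypothetical helper written for this illustration, not a Xen function.

```c
#include <stdint.h>

#define PAGE_SHIFT   12                        /* assumption: 4K pages */
#define LEVEL_STRIDE 9                         /* 512 entries per table */
#define LEVEL_MASK   ((1u << LEVEL_STRIDE) - 1)

/* Index into the table at the given level (1 = leaf, 4 = top). */
static unsigned int pt_index(uint64_t gfn, unsigned int level)
{
    return (unsigned int)(gfn >> ((level - 1) * LEVEL_STRIDE)) & LEVEL_MASK;
}
```

The fault address 1b56000 shifted right by PAGE_SHIFT gives gmfn 1b56; pt_index() then gives 0 for levels 4 and 3, d for level 2 and 156 for level 1, matching l4_index, l3_index, l2_index and l1_index in the log.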

The device is not reported in DMAR, and the gfn is mapped with the p2m_ram_rw type...

lspci:

00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05) (prog-if 01 [AHCI 1.0])
Subsystem: Lenovo Device 3097
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 84
I/O ports at f0d0 [size=8]
I/O ports at f0c0 [size=4]
I/O ports at f0b0 [size=8]
I/O ports at f0a0 [size=4]
I/O ports at f060 [size=32]
Memory at f7c36000 (32-bit, non-prefetchable) [size=2K]
Capabilities: 
Kernel driver in use: ahci


But as I said, I cannot reproduce it; I will run a few more tests.

> 
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -971,7 +971,17 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> >  ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
> >  p2m_mmio_direct, p2ma);
> >  else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
> > -ret = 0;
> > +{
> > +/*
> > + * PVH fixme: during Dom0 PVH construction, p2m entries are being set
> > + * but iomem regions are not mapped with IOMMU. This makes sure that
> > + * RMRRs are correctly mapped with IOMMU.
> > + */
> > +if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
> > +ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
> 
> This should use p2m_get_iommu_flags() (which eventually needs to
> also honor the passed in p2m_access_t, i.e. its use here for now
> only serves documentation purposes as well as a means to spot the
> location when making said adjustment).

Here is the problem: for the p2m_mmio_direct type, p2m_get_iommu_flags() will
return 0. And that is essentially why the 1:1 iomem mapping for Dom0 PVH
sets p2m entries but does not create an IOMMU identity mapping in
construct_dom0.

By 'honoring p2m_access_t', do you mean that p2m_get_iommu_flags() should
be more like ept_p2m_type_to_flags(), where permissions are verified?
Right now, even if rw permissions are requested, p2m_get_iommu_flags() will
always return zero IOMMU flags for the p2m_mmio_direct type.

> 
> Jan
> 



[Xen-devel] [PATCH v2] PVH Dom0 RMRR IOMMU mapping regression fix

2015-09-25 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

This patch addresses a regression introduced by commit 
5ae03990c120a7b3067a52d9784c9aa72c0705a6 in new set_identity_p2m_entry.
RMRRs are not being mapped in IOMMU for PVH Dom0. This causes page faults and
some long 'hang-like' delays during Dom0 PVH boot and device assignments.

During construct_dom0, in PVH path p2m is being constructed and identity mapped
in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
New code used to map RMRRs invoked from rmrr_identity_mapping
checks if p2m entry exists with same type and access and if yes, skips iommu
mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are not being
mapped in IOMMU.
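
As a rough illustration of the check described above, and of what the p2m.c hunk in this patch changes, here is a toy model in C. All names in it (the enums, struct lookup, identity_action()) are hypothetical and made up for this sketch; only the shape of the decision follows the set_identity_p2m_entry hunk below, and the final "conflict" case stands in for the remaining else branch.

```c
#include <stdbool.h>

/* Simplified, hypothetical p2m types; not Xen's real definitions. */
enum p2m_type { P2M_INVALID, P2M_MMIO_DM, P2M_MMIO_DIRECT, P2M_RAM_RW };

enum action {
    ACT_SET_P2M_ENTRY,  /* create the identity p2m entry */
    ACT_NOTHING,        /* entry already matches, nothing is done */
    ACT_MAP_IOMMU,      /* entry matches, IOMMU mapping is added */
    ACT_CONFLICT        /* some other mapping already exists */
};

/* Result of looking up the gfn in the p2m. */
struct lookup {
    enum p2m_type type;
    bool is_identity;     /* mfn == gfn */
    bool access_matches;  /* existing access equals the requested one */
};

static enum action identity_action(const struct lookup *l, bool hwdom,
                                   bool shared_pt, bool with_fix)
{
    if ( l->type == P2M_INVALID || l->type == P2M_MMIO_DM )
        return ACT_SET_P2M_ENTRY;

    if ( l->is_identity && l->type == P2M_MMIO_DIRECT && l->access_matches )
    {
        /* Before the fix this branch always returned "nothing to do". */
        if ( with_fix && hwdom && !shared_pt )
            return ACT_MAP_IOMMU;
        return ACT_NOTHING;
    }

    return ACT_CONFLICT;
}
```

In this model a PVH Dom0 iomem page that already has a matching identity entry gets ACT_NOTHING without the fix, which corresponds to the RMRR never reaching the IOMMU, and ACT_MAP_IOMMU with it.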

As was mentioned in the earlier discussion, the PVH Dom0 construction code
should be modified to properly map RMRR regions in IOMMU. Since that change
would be too invasive, this is a temporary fix until a better solution is in.
Also, as Jan mentioned, there is no need for 'x' permissions on the p2m entry
of an mmio region, so they are dropped here.

Your comments and suggestions are welcome!
Thank you.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---

Changes in v2:
- removed 'x' permission from p2m entries that cover read-only mmio regions;
  the tests conducted did not demonstrate the IOMMU Page Faults that I
  mentioned in v1 (RFC) of this patch;

 xen/arch/x86/domain_build.c |  4 ++--
 xen/arch/x86/mm/p2m.c   | 12 +++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 18cf6aa..bca6fe7 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -432,9 +432,9 @@ static __init void pvh_add_mem_mapping(struct domain *d, unsigned long gfn,
 }
 
 if ( rangeset_contains_singleton(mmio_ro_ranges, mfn + i) )
-a = p2m_access_rx;
+a = p2m_access_r;
 else
-a = p2m_access_rwx;
+a = p2m_access_rw;
 
 if ( (rc = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i), a)) )
 panic("pvh_add_mem_mapping: gfn:%lx mfn:%lx i:%ld rc:%d\n",
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index e1d930a..7ba7832 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -972,7 +972,17 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
 ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
 p2m_mmio_direct, p2ma);
 else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
-ret = 0;
+{
+/*
+ * PVH fixme: during Dom0 PVH construction, p2m entries are being set
+ * but iomem regions are not mapped with IOMMU. This makes sure that
+ * RMRRs are correctly mapped with IOMMU.
+ */
+if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
+ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
+else
+ret = 0;
+}
 else
 {
 if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
-- 
2.1.4




Re: [Xen-devel] Regression in RMRRs identity mapping for PVH Dom0

2015-09-24 Thread Elena Ufimtseva
On Thu, Sep 24, 2015 at 11:29:54AM +0100, Wei Liu wrote:
> Hi Elena
> 
> On Wed, Sep 23, 2015 at 11:56:12AM -0400, Elena Ufimtseva wrote:
> > Hi
> >
> > There is a regression in RMRR patch
> > 5ae03990c120a7b3067a52d9784c9aa72c0705a6 in new set_identity_p2m_entry.
> > RMRRs are not being mapped in IOMMU for PVH Dom0. This causes page faults
> > and some long 'hang-like' delays during boot and device assignments.
> >
> > During construct_dom0, in PVH path p2m is being constructed and identity
> > mapped in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
> > New code used to map RMRRs invoked from rmrr_identity_mapping
> > checks if p2m entry exists with same type and access and if yes, skips iommu
> > mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are not being
> > mapped in IOMMU.
> >
> > This debug patch attached fixes this and I'll be glad to see if there is a
> > more elegant fix.
> >
> 
> From a release point of view, PVH Dom0 is not officially supported so I
> don't consider this issue a blocker.
> 
Understood.

> We can backport the proper fix to 4.6.1 if necessary, but I doubt this
> is the only fix we need to make PVH Dom0 work on 4.6. Am I right?

Dom0 PVH boots with some glitches on Intel platforms and with some others on
AMD, and it will for sure see more patches. But this problem will
cause Dom0 on some Intel platforms to hang, throw page faults, or fail
to boot at all (as I have seen happen for some devices when
working on extra RMRRs).
> 
> Wei.



Re: [Xen-devel] Regression in RMRRs identity mapping for PVH Dom0

2015-09-24 Thread Elena Ufimtseva
On Thu, Sep 24, 2015 at 04:31:09AM -0600, Jan Beulich wrote:
> >>> On 24.09.15 at 11:18,  wrote:
> > AIUI the problem is that before the call to set_identity_p2m_entry(),
> > PVH dom0 has a p2m entry covering this range but no IOMMU entry.  Is
> > that right?  So the fix will be to make PVH dom0 construction set up
> > the IOMMU correctly when it sets up the p2m.
> 
> Right, but with the current way of setting up PVH Dom0 I'm afraid
> this will be rather intrusive to implement. Hence, however much I
> dislike it, I wonder whether a variant of Elena's change (suitably
> annotated with a pvh fixme) wouldn't be a reasonable thing for 4.6.
> With the switch to HVMlite the Dom0 setup will need to be re-done
> anyway afaics.

I agree here, Jan. Setting up the PVH Dom0 page tables is a sort of special
case on its own. Andrew Cooper, Konrad and I talked about changing it;
I have not started working on it yet, but I think it's in my
plan.

> 
> Elena, as to the actual patch:
> 
> >--- a/xen/arch/x86/mm/p2m.c
> >+++ b/xen/arch/x86/mm/p2m.c
> >@@ -970,8 +970,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> > if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
> > ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
> > p2m_mmio_direct, p2ma);
> >-else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
> >-ret = 0;
> >+else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct )
> >+if ( a == p2ma && !is_pvh_domain(d) )
> >+ret = 0;
> >+else ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
> 
> Besides this wanting figure braces, why do you pull the a == p2ma
> check into the inner if()? If this is because of the P2M getting
> populated with p2m_rwx, I think _that_ should be changed rather
> than breaking the logic here (or, if done properly, complicating it).
> There's no reason I can see to map MMIO regions rwx.

Yes, that is why I did it, because of rwx. I will modify it. 
> 
> Also I think this wants to cover just hwdom and !iommu_use_hap_pt.

Yes, forgot about this one.
> 
> Jan
> 
> > else
> > {
> > if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
> 
> 



Re: [Xen-devel] Regression in RMRRs identity mapping for PVH Dom0

2015-09-24 Thread Elena Ufimtseva
On Thu, Sep 24, 2015 at 10:18:54AM +0100, Tim Deegan wrote:
> At 15:17 +0800 on 24 Sep (1443107852), Chen, Tiejun wrote:
> > On 9/23/2015 11:56 PM, Elena Ufimtseva wrote:
> > > Hi
> > >
> > > There is a regression in RMRR patch
> > > 5ae03990c120a7b3067a52d9784c9aa72c0705a6 in new set_identity_p2m_entry.
> > > RMRRs are not being mapped in IOMMU for PVH Dom0. This causes page faults
> > > and some long 'hang-like' delays during boot and device assignments.
> > >
> > > During construct_dom0, in PVH path p2m is being constructed and identity
> > > mapped in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
> > > New code used to map RMRRs invoked from rmrr_identity_mapping
> > > checks if p2m entry exists with same type and access and if yes, skips
> > > iommu mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are
> > > not being mapped in IOMMU.
> > >
> > > This debug patch attached fixes this and I'll be glad to see if there is a
> > > more elegant fix.
> > 
> > Based on your explanation, sounds pvh always creates this mapping 
> > beforehand, so what about this?
> > 
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index cf8485e..d026845 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -964,7 +964,7 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
> >   struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >   int ret;
> > 
> > -if ( !paging_mode_translate(p2m->domain) )
> > +if ( !paging_mode_translate(p2m->domain) || is_pvh_domain(d) )
> 
> Sorry, but that wouldn't be safe. :(  PVH domains need the same
> protection as any other paging_mode_translate ones.
> 
> AIUI the problem is that before the call to set_identity_p2m_entry(),
> PVH dom0 has a p2m entry covering this range but no IOMMU entry.  Is
> that right?  So the fix will be to make PVH dom0 construction set up
> the IOMMU correctly when it sets up the p2m.

Yes, that's right. A rework of construct_dom0 and its PVH part should help.
> 
> Cheers,
> 
> Tim.



[Xen-devel] [RFC PATCH resend] PVH Dom0 RMRR IOMMU mapping regression fix

2015-09-24 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

This patch addresses a regression introduced by commit
5ae03990c120a7b3067a52d9784c9aa72c0705a6 in new set_identity_p2m_entry.
RMRRs are not being mapped in IOMMU for PVH Dom0. This causes page faults and
some long 'hang-like' delays during Dom0 PVH boot and device assignments.

During construct_dom0, in PVH path p2m is being constructed and identity mapped
in IOMMU. The p2m type is p2m_mmio_direct and p2m access p2m_rwx.
New code used to map RMRRs invoked from rmrr_identity_mapping
checks if p2m entry exists with same type and access and if yes, skips iommu
mapping. Since there are p2m entries for pvh dom0 iomem, RMRRs are not being
mapped in IOMMU.

As was mentioned in the earlier discussion, the PVH Dom0 construction code
should be modified to properly map RMRR regions in IOMMU. Since that change
would be too invasive, this is a temporary fix until a better solution is in.
Also, as Jan mentioned, there is no need for 'x' permissions on the p2m entry
of an mmio region, so they are dropped here. Permissions for p2m entries of
read-only mmio regions are left unchanged, as leaving only 'r' causes page
faults. I am not sure of the reason yet and will try to dig into it further.

Your comments and suggestions are welcome!
Thank you.

Elena

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/arch/x86/domain_build.c |  2 +-
 xen/arch/x86/mm/p2m.c   | 12 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 18cf6aa..259dfd4 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -434,7 +434,7 @@ static __init void pvh_add_mem_mapping(struct domain *d, unsigned long gfn,
 if ( rangeset_contains_singleton(mmio_ro_ranges, mfn + i) )
 a = p2m_access_rx;
 else
-a = p2m_access_rwx;
+a = p2m_access_rw;
 
 if ( (rc = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i), a)) )
 panic("pvh_add_mem_mapping: gfn:%lx mfn:%lx i:%ld rc:%d\n",
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b2726bd..97a0986 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -971,7 +971,17 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
 ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
 p2m_mmio_direct, p2ma);
 else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
-ret = 0;
+{
+/*
+ * PVH fixme: during Dom0 PVH construction, p2m entries are being set
+ * but iomem regions are not mapped with IOMMU. This makes sure that
+ * RMRRs are correctly mapped with IOMMU.
+ */
+if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
+ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
+else
+ret = 0;
+}
 else
 {
 if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
-- 
2.1.4




[Xen-devel] Regression in RMRRs identity mapping for PVH Dom0

2015-09-23 Thread Elena Ufimtseva
Hi  

There is a regression in RMRR patch 5ae03990c120a7b3067a52d9784c9aa72c0705a6 in
the new set_identity_p2m_entry. RMRRs are not being mapped in the IOMMU for PVH Dom0.
This causes page faults and some long 'hang-like' delays during boot and
device assignment.

During construct_dom0, in the PVH path, the p2m is constructed and identity mapped
in the IOMMU. The p2m type is p2m_mmio_direct and the p2m access is p2m_access_rwx.
The new code used to map RMRRs, invoked from rmrr_identity_mapping,
checks whether a p2m entry exists with the same type and access and, if so, skips
the IOMMU mapping. Since there are p2m entries for PVH Dom0 iomem, RMRRs are not
being mapped in the IOMMU.

The attached debug patch fixes this, and I'll be glad to see if there is a more
elegant fix.

Thanks! 

Elena 
From fb25216760a0c17447faa1f416cc59341600dc1b Mon Sep 17 00:00:00 2001
From: Elena Ufimtseva <elena.ufimts...@oracle.com>
Date: Wed, 23 Sep 2015 11:47:49 -0400
Subject: [PATCH] RMRR regression debug for PVH Dom0

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/arch/x86/mm/p2m.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b2726bd..16c8938 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -970,8 +970,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
 if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
 ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
 p2m_mmio_direct, p2ma);
-else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
-ret = 0;
+else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct )
+if ( a == p2ma && !is_pvh_domain(d) )
+ret = 0;
+else ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
 else
 {
 if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
-- 
2.1.4
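
For illustration only (this is not part of the patch): a minimal C model of the branch that the debug patch changes. It assumes the only inputs that matter are whether the access of the existing entry matches and whether the domain is PVH. The enum and the function name are hypothetical; the real code calls iommu_map_page() directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the "identity entry already exists" branch. */
enum rmrr_action {
    RMRR_SKIP,      /* return 0 without touching the IOMMU */
    RMRR_IOMMU_MAP  /* call iommu_map_page() for the gfn   */
};

/*
 * Model of the patched branch, reached once mfn == gfn and the type is
 * p2m_mmio_direct. Before the patch, a matching access always meant
 * RMRR_SKIP, which is what left PVH Dom0 RMRRs unmapped in the IOMMU.
 */
static inline enum rmrr_action identity_entry_action(bool access_matches,
                                                     bool is_pvh)
{
    if ( access_matches && !is_pvh )
        return RMRR_SKIP;

    return RMRR_IOMMU_MAP;
}
```

With this model, a PVH domain always ends up with an IOMMU mapping, while a non-PVH domain with a matching entry keeps the old "nothing to do" behaviour.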



Re: [Xen-devel] page faults on machines with 4TB memory

2015-07-23 Thread Elena Ufimtseva
On Thu, Jul 23, 2015 at 06:01:45PM +0100, Andrew Cooper wrote:
 On 23/07/15 17:35, Elena Ufimtseva wrote:
  Hi
 
  While working on boot-time bugs on a large Oracle server (x4-8), I found
  a problem with booting Xen on large machines with 4TB memory,
  such as the Oracle x4-8.
  The page fault occurred initially while loading Xen PM info into the hypervisor
  (you can see it in the attached serial log named 4.4.2_no_mem_override).
  Tracing the issue down shows that the page fault occurs in the timer.c code
  while getting the heap size.
 
  Here is the original call trace:
  rocessor: Uploading Xen processor PM info 
  @ (XEN) [ Xen-4.4.3-preOVM  x86_64  debug=n  Tainted:C ] 
  @ (XEN) CPU:0 
  @ (XEN) RIP:e008:[82d08022e747] add_entry+0x27/0x120 
  @ (XEN) RFLAGS: 00010082   CONTEXT: hypervisor 
  @ (XEN) rax: 8a2d080513a20   rbx: 83808e802300   rcx:
  00e8 
  @ (XEN) rdx: 00e8   rsi: 00e8   rdi:
  83808e802300 
  @ (XEN) rbp: 82d080513a20   rsp: 82d0804d7c70   r8:
  8840ffdb5010 
  @ (XEN) r9:  0017   r10: 83808e802180   r11:
  0200200200200200 
  @ (XEN) r12: 82d080533080   r13: 0296   r14:
  0100100100100100 
  @ (XEN) r15: 00e8   cr0: 80050033   cr4:
  001526f0 
  @ (XEN) cr3: 0100818b2000   cr2: 8840ffdb5010 
  @ (XEN) ds:    es:    fs:    gs:    ss: e010   cs: e008 
  @ (XEN) Xen stack trace from rsp=82d0804d7c70: 
  @ (XEN)83808e802300 82d080513a20 82d08022f59b
  82d080533080 
  @ (XEN)82d080532f50 00e8 83808e802328
   
  @ (XEN)82d080513a20 83808e8022c0 82d080533200
  00e8 
  @ (XEN)00f0 82d0805331c0 82d0802458e2
   
  @ (XEN)00e8 83808e802334 8384be7979b0
  82d0804d7d78 
  @ (XEN) 8384be77c700 82d0804d7d78
  82d080513a20 
  @ (XEN)82d080246207 00e8 00e8
  8384be7979b0 
  @ (XEN)82d08024518a 82d080533080 0070
  82d080533da8 
  @ (XEN)000100e8 8384be797a00 00e80001
  002ab980002abd68 
  @ (XEN)271000124f80 002abd6800124f80 002ab980
  82d0803753e0 
  @ (XEN)00010101 0001 82d0804d7e18
  881fb4afbc88 
  @ (XEN)82d0804d 881fb28a4400 82d0804fca80
  819b7080 
  @ (XEN)82d080266c16 83808fb46ba8 82d080208a82
  83006bddd190 
  @ (XEN)0292 03010036 000100f6
  000f 
  @ (XEN)007f000c0082  007f000c0082
   
  @ (XEN)000a 881fb28a4400 0005
   
  @ (XEN) 00fe 0001
  0001 
  @ (XEN)  82d08031f521
   
  @ (XEN)0246 810010ea 
  810010ea 
  @ (XEN)e030 0246 83006bddd000
  881fb4afbd48 
  @ (XEN) Xen call trace: 
  @ (XEN)[82d08022e747] add_entry+0x27/0x120 
  @ (XEN)[82d08022f59b] set_timer+0x10b/0x220 
  @ (XEN)[82d0802458e2] cpufreq_governor_dbs+0x1e2/0x2f0 
  @ (XEN)[82d080246207] __cpufreq_set_policy+0x87/0x120 
  @ (XEN)[82d08024518a] cpufreq_add_cpu+0x24a/0x4f0 
  @ (XEN)[82d080266c16] do_platform_op+0x9c6/0x1650 
  @ (XEN)[82d080208a82] evtchn_check_pollers+0x22/0xb0 
  @ (XEN)[82d08031f521] do_iret+0xc1/0x1a0 
  @ (XEN)[82d0803243a9] syscall_enter+0xa9/0xae 
  @ (XEN) 
  @ (XEN) Pagetable walk from 8840ffdb5010: 
  @ (XEN)  L4[0x110] = 0100818b3067 18b3 
  @ (XEN)  L3[0x103] =   
  @ (XEN) 
  @ (XEN) 
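
As a reading aid for the pagetable walk above (not part of the original report): a minimal sketch of how the L4 and L3 indices follow from the faulting address, assuming standard x86-64 4-level paging with 9 index bits per level. The helper name pt_index is hypothetical. The address is used as printed in the log; bits above 47 do not affect the indices because of the mask.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 4-level paging: each level consumes 9 bits of the virtual address. */
#define PT_INDEX_BITS 9
#define PT_INDEX_MASK ((1u << PT_INDEX_BITS) - 1)

/* Shift per level: L1 = 12, L2 = 21, L3 = 30, L4 = 39. */
static inline unsigned int pt_index(uint64_t va, unsigned int level)
{
    assert(level >= 1 && level <= 4);

    return (unsigned int)(va >> (12 + PT_INDEX_BITS * (level - 1))) &
           PT_INDEX_MASK;
}
```

For the cr2 value 8840ffdb5010 this gives index 0x110 at L4 and 0x103 at L3, matching the two lines of the walk; the walk stops at L3 because that entry is not present.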
 
  0x82d08022e720 add_entry: movzwl 0x28(%rdi),%edx
 0x82d08022e724 add_entry+4:push   %rbp
 0x82d08022e725 add_entry+5:
  lea0x2e52f4(%rip),%rax# 0x82d080513a20 
  __per_cpu_offset
 0x82d08022e72c add_entry+12:   
  lea0x30494d(%rip),%r10# 0x82d080533080 per_cpu__timers
 0x82d08022e733 add_entry+19:   push   %rbx
 0x82d08022e734 add_entry+20:   add(%rax,%rdx,8),%r10
 0x82d08022e738 add_entry+24:   movl   $0x0,0x8(%rdi)
 0x82d08022e73f add_entry+31:   movb   $0x3,0x2a(%rdi)
 0x82d08022e743 add_entry+35:   mov0x8(%r10),%r8
 0x82d08022e747 add_entry+39:   movzwl (%r8),%ecx
 
  And this points to 
  int sz = GET_HEAP_SIZE(heap);
  in add_entry of timer.c.
 
  static int add_entry(struct timer *t)   
  
  {   
  
  82d08022cad3:   53

Re: [Xen-devel] [PATCH v10 5/5] iommu: add rmrr Xen command line option for extra rmrrs

2015-07-15 Thread Elena Ufimtseva

- jbeul...@suse.com wrote:

  On 15.07.15 at 17:27, elena.ufimts...@oracle.com wrote:
  On Wed, Jul 15, 2015 at 08:25:06AM +0100, Jan Beulich wrote:
   On 14.07.15 at 12:43, jbeul...@suse.com wrote:
   On 13.07.15 at 20:18, elena.ufimts...@oracle.com wrote:
   +/* Macro for RMRR inclusive range formatting. */
   +#define PRI_RMRR(s,e) "[%lx-%lx]"
   
   Just PRI_RMRR (i.e. no parens or parameters) please. And I'm still
   missing a macro to pair the respective arguments - as said before,
   a single format specifier should be accompanied by a single
   argument (as visible to the reader at the use sites).
  
  Answering your IRC question here:
  
  #define ERU_FMT "[%lx-%lx]"
  #define ERU_ARG(eru) eru.base_pfn, eru.end_pfn
  
  (with the acronym eru open for improvement).
  
  Great! Thanks Jan.
  Can ERU be RMRRU? 
 
 ERMRRU maybe - I'd like the extra to somehow be expressed in
 the name.

Does this imply that it can be used for formatting ACPI RMRRs? 
Or with some modification perhaps?
 
 Jan



Re: [Xen-devel] [PATCH v10 5/5] iommu: add rmrr Xen command line option for extra rmrrs

2015-07-15 Thread Elena Ufimtseva
On Wed, Jul 15, 2015 at 08:25:06AM +0100, Jan Beulich wrote:
  On 14.07.15 at 12:43, jbeul...@suse.com wrote:
  On 13.07.15 at 20:18, elena.ufimts...@oracle.com wrote:
  +/* Macro for RMRR inclusive range formatting. */
  +#define PRI_RMRR(s,e) "[%lx-%lx]"
  
  Just PRI_RMRR (i.e. no parens or parameters) please. And I'm still
  missing a macro to pair the respective arguments - as said before,
  a single format specifier should be accompanied by a single
  argument (as visible to the reader at the use sites).
 
 Answering your IRC question here:
 
 #define ERU_FMT "[%lx-%lx]"
 #define ERU_ARG(eru) eru.base_pfn, eru.end_pfn
 
 (with the acronym eru open for improvement).

Great! Thanks Jan.
Can ERU be RMRRU? 

 
 Jan
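
To make the suggestion concrete, here is a minimal userspace sketch of the paired format/argument macros discussed above. The struct is a hypothetical stand-in with only the two fields used here, format_range is an illustrative helper, and snprintf replaces Xen's printk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's struct; only the fields used here. */
struct extra_rmrr_unit {
    unsigned long base_pfn, end_pfn;
};

/* One format macro paired with one argument macro, as suggested in review. */
#define ERU_FMT "[%lx-%lx]"
#define ERU_ARG(eru) (eru).base_pfn, (eru).end_pfn

/* Formats an inclusive range into buf; returns the snprintf() result. */
static inline int format_range(char *buf, size_t len,
                               struct extra_rmrr_unit eru)
{
    return snprintf(buf, len, "RMRR range " ERU_FMT, ERU_ARG(eru));
}
```

At the use site the reader sees one format token and one argument token per range, which is the property asked for in the review.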
 



Re: [Xen-devel] Ping: [PATCH v6] dmar: device scope mem leak fix

2015-07-13 Thread Elena Ufimtseva

- Original Message -
From: jbeul...@suse.com
To: kevin.t...@intel.com, yang.z.zh...@intel.com
Cc: xen-devel@lists.xen.org, boris.ostrov...@oracle.com, 
elena.ufimts...@oracle.com, konrad.w...@oracle.com, t...@xen.org
Sent: Monday, July 13, 2015 12:18:33 PM GMT -05:00 US/Canada Eastern
Subject: Ping: [PATCH v6] dmar: device scope mem leak fix

 On 07.07.15 at 17:17, elena.ufimts...@oracle.com wrote:
 From: Elena Ufimtseva elena.ufimts...@oracle.com
 
 Release memory allocated for scope.devices dmar units on various
 failure paths and when disabling dmar. Set device count after
 successful memory allocation, not before, in device scope parsing function.
 
 Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com
 ---
 Changes in v6:
  - eliminated unrelated code move;
  - fix memory leak introduced in v5;
 
 Changes in v5:
  - make scope_devices_free actually safe;
 
 Changes in v4:
  - make scope_devices_free safe to call with NULL scope pointer;
  - since scope_devices_free is safe to call, use it in failure path
    in acpi_parse_one_drhd;
 
 Changes in v3:
  - make freeing memory for scope devices and zeroing device counter
    a function;
  - make sure parse_one_rmrr has memory leak fix in this patch;
  - make sure ret values are not lost in acpi_parse_one_drhd;
 
 Changes in v2:
  - release memory for devices scope on error paths in acpi_parse_one_drhd
    and acpi_parse_one_atsr and set the count to zero;
 
  xen/drivers/passthrough/vtd/dmar.c | 24 ++--
  1 file changed, 22 insertions(+), 2 deletions(-)
 
 diff --git a/xen/drivers/passthrough/vtd/dmar.c 
 b/xen/drivers/passthrough/vtd/dmar.c
 index 2b07be9..8ed1e24 100644
 --- a/xen/drivers/passthrough/vtd/dmar.c
 +++ b/xen/drivers/passthrough/vtd/dmar.c
 @@ -81,6 +81,15 @@ static int __init acpi_register_rmrr_unit(struct 
 acpi_rmrr_unit *rmrr)
  return 0;
  }
  
 +static void scope_devices_free(struct dmar_scope *scope)
 +{
 +if ( !scope )
 +return;
 +
 +scope-devices_cnt = 0;
 +xfree(scope-devices);
 +}
 +
  static void __init disable_all_dmar_units(void)
  {
  struct acpi_drhd_unit *drhd, *_drhd;
 @@ -90,16 +99,19 @@ static void __init disable_all_dmar_units(void)
  list_for_each_entry_safe ( drhd, _drhd, acpi_drhd_units, list )
  {
  list_del(drhd-list);
 +scope_devices_free(drhd-scope);
  xfree(drhd);
  }
  list_for_each_entry_safe ( rmrr, _rmrr, acpi_rmrr_units, list )
  {
  list_del(rmrr-list);
 +scope_devices_free(rmrr-scope);
  xfree(rmrr);
  }
  list_for_each_entry_safe ( atsr, _atsr, acpi_atsr_units, list )
  {
  list_del(atsr-list);
 +scope_devices_free(atsr-scope);
  xfree(atsr);
  }
  }
 @@ -318,13 +330,13 @@ static int __init acpi_parse_dev_scope(
  if ( (cnt = scope_device_count(start, end))  0 )
  return cnt;
  
 -scope-devices_cnt = cnt;
  if ( cnt  0 )
  {
  scope-devices = xzalloc_array(u16, cnt);
  if ( !scope-devices )
  return -ENOMEM;
  }
 +scope-devices_cnt = cnt;
  
  while ( start  end )
  {
 @@ -427,7 +439,7 @@ static int __init acpi_parse_dev_scope(
  
   out:
  if ( ret )
 -xfree(scope-devices);
 +scope_devices_free(scope);
  
  return ret;
  }
 @@ -542,6 +554,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
Workaround BIOS bug: ignore the DRHD due to all 
  devices under its scope are not PCI discoverable!\n);
  
 +scope_devices_free(dmaru-scope);
  iommu_free(dmaru);
  xfree(dmaru);
  }
 @@ -562,9 +575,11 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
  out:
  if ( ret )
  {
 +scope_devices_free(dmaru-scope);
  iommu_free(dmaru);
  xfree(dmaru);
  }
 +
  return ret;
  }
  
 @@ -658,6 +673,7

[Xen-devel] [PATCH v10 2/5] iommu VT-d: separate rmrr addition function

2015-07-13 Thread elena . ufimtseva
From: Elena Ufimtseva elena.ufimts...@oracle.com

In preparation for auxiliary RMRR data provided on Xen
command line, make RMRR adding a separate function.
Also free memory for rmrr device scope in error path.

Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/drivers/passthrough/vtd/dmar.c | 126 +++--
 1 file changed, 65 insertions(+), 61 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c 
b/xen/drivers/passthrough/vtd/dmar.c
index 8ed1e24..93f10fd 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -583,6 +583,68 @@ out:
 return ret;
 }
 
+static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
+{
+    bool_t ignore = 0;
+    unsigned int i = 0;
+    int ret = 0;
+
+    /* Skip checking if segment is not accessible yet. */
+    if ( !pci_known_segment(rmrru->segment) )
+        i = UINT_MAX;
+
+    for ( ; i < rmrru->scope.devices_cnt; i++ )
+    {
+        u8 b = PCI_BUS(rmrru->scope.devices[i]);
+        u8 d = PCI_SLOT(rmrru->scope.devices[i]);
+        u8 f = PCI_FUNC(rmrru->scope.devices[i]);
+
+        if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
+        {
+            dprintk(XENLOG_WARNING VTDPREFIX,
+                    " Non-existent device (%04x:%02x:%02x.%u) is reported"
+                    " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
+                    rmrru->segment, b, d, f,
+                    rmrru->base_address, rmrru->end_address);
+            ignore = 1;
+        }
+        else
+        {
+            ignore = 0;
+            break;
+        }
+    }
+
+    if ( ignore )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
+                "devices under its scope are not PCI discoverable!\n",
+                rmrru->base_address, rmrru->end_address);
+        scope_devices_free(&rmrru->scope);
+        xfree(rmrru);
+    }
+    else if ( rmrru->base_address > rmrru->end_address )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
+                rmrru->base_address, rmrru->end_address);
+        scope_devices_free(&rmrru->scope);
+        xfree(rmrru);
+        ret = -EFAULT;
+    }
+    else
+    {
+        if ( iommu_verbose )
+            dprintk(VTDPREFIX,
+                    "  RMRR region: base_addr %"PRIx64" end_address %"PRIx64"\n",
+                    rmrru->base_address, rmrru->end_address);
+        acpi_register_rmrr_unit(rmrru);
+    }
+
+    return ret;
+}
+
 static int __init
 acpi_parse_one_rmrr(struct acpi_dmar_header *header)
 {
@@ -633,68 +695,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
     ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
                                &rmrru->scope, RMRR_TYPE, rmrr->segment);
 
-    if ( ret || (rmrru->scope.devices_cnt == 0) )
-        xfree(rmrru);
+    if ( !ret && (rmrru->scope.devices_cnt != 0) )
+        register_one_rmrr(rmrru);
     else
-{
-u8 b, d, f;
-bool_t ignore = 0;
-unsigned int i = 0;
-
-/* Skip checking if segment is not accessible yet. */
-if ( !pci_known_segment(rmrr-segment) )
-i = UINT_MAX;
-
-for ( ; i  rmrru-scope.devices_cnt; i++ )
-{
-b = PCI_BUS(rmrru-scope.devices[i]);
-d = PCI_SLOT(rmrru-scope.devices[i]);
-f = PCI_FUNC(rmrru-scope.devices[i]);
-
-if ( !pci_device_detect(rmrr-segment, b, d, f) )
-{
-dprintk(XENLOG_WARNING VTDPREFIX,
- Non-existent device (%04x:%02x:%02x.%u) is reported
- in RMRR (%PRIx64, %PRIx64)'s scope!\n,
-rmrr-segment, b, d, f,
-rmrru-base_address, rmrru-end_address);
-ignore = 1;
-}
-else
-{
-ignore = 0;
-break;
-}
-}
-
-if ( ignore )
-{
-dprintk(XENLOG_WARNING VTDPREFIX,
-  Ignore the RMRR (%PRIx64, %PRIx64) due to 
-devices under its scope are not PCI discoverable!\n,
-rmrru-base_address, rmrru-end_address);
-scope_devices_free(rmrru-scope);
-xfree(rmrru);
-}
-else if ( base_addr  end_addr )
-{
-dprintk(XENLOG_WARNING VTDPREFIX,
-  The RMRR (%PRIx64, %PRIx64) is incorrect!\n,
-rmrru-base_address, rmrru-end_address);
-scope_devices_free(rmrru-scope);
-xfree(rmrru);
-ret = -EFAULT;
-}
-else
-{
-if ( iommu_verbose )
-dprintk(VTDPREFIX,
-  RMRR region: base_addr %PRIx64
- end_address %PRIx64\n,
-rmrru-base_address, rmrru-end_address

[Xen-devel] [PATCH v10 1/5] dmar: device scope mem leak fix

2015-07-13 Thread elena . ufimtseva
From: Elena Ufimtseva elena.ufimts...@oracle.com

Release memory allocated for scope.devices dmar units on various
failure paths and when disabling dmar. Set device count after
successful memory allocation, not before, in device scope parsing function.

Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com
---
Changes in v10:
 - mark patch v6 as v10 and include into the series of patches which add RMRR
   command line option for Xen;

Changes in v6:
 - eliminated unrelated code move;
 - fix memory leak introduced in v5;

Changes in v5:
 - make scope_devices_free actually safe;

Changes in v4:
 - make scope_devices_free safe to call with NULL scope pointer;
 - since scope_devices_free is safe to call, use it in failure path
   in acpi_parse_one_drhd;

Changes in v3:
 - make freeing memory for scope devices and zeroing device counter
 as a function;
 - make sure parse_one_rmrr has memory leak fix in this patch;
 - make sure ret values are not lost acpi_parse_one_drhd;

Changes in v2:
 - release memory for devices scope on error paths in acpi_parse_one_drhd
 and acpi_parse_one_atsr and set the count to zero;

 xen/drivers/passthrough/vtd/dmar.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c 
b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..8ed1e24 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -81,6 +81,15 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr)
     return 0;
 }
 
+static void scope_devices_free(struct dmar_scope *scope)
+{
+    if ( !scope )
+        return;
+
+    scope->devices_cnt = 0;
+    xfree(scope->devices);
+}
+
 static void __init disable_all_dmar_units(void)
 {
     struct acpi_drhd_unit *drhd, *_drhd;
@@ -90,16 +99,19 @@ static void __init disable_all_dmar_units(void)
     list_for_each_entry_safe ( drhd, _drhd, &acpi_drhd_units, list )
     {
         list_del(&drhd->list);
+        scope_devices_free(&drhd->scope);
         xfree(drhd);
     }
     list_for_each_entry_safe ( rmrr, _rmrr, &acpi_rmrr_units, list )
     {
         list_del(&rmrr->list);
+        scope_devices_free(&rmrr->scope);
         xfree(rmrr);
     }
     list_for_each_entry_safe ( atsr, _atsr, &acpi_atsr_units, list )
     {
         list_del(&atsr->list);
+        scope_devices_free(&atsr->scope);
         xfree(atsr);
     }
 }
@@ -318,13 +330,13 @@ static int __init acpi_parse_dev_scope(
     if ( (cnt = scope_device_count(start, end)) < 0 )
         return cnt;
 
-    scope->devices_cnt = cnt;
     if ( cnt > 0 )
     {
         scope->devices = xzalloc_array(u16, cnt);
         if ( !scope->devices )
             return -ENOMEM;
     }
+    scope->devices_cnt = cnt;
 
     while ( start < end )
     {
@@ -427,7 +439,7 @@ static int __init acpi_parse_dev_scope(
 
  out:
     if ( ret )
-        xfree(scope->devices);
+        scope_devices_free(scope);
 
     return ret;
 }
@@ -542,6 +554,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
                     "  Workaround BIOS bug: ignore the DRHD due to all "
                     "devices under its scope are not PCI discoverable!\n");
 
+                scope_devices_free(&dmaru->scope);
                 iommu_free(dmaru);
                 xfree(dmaru);
             }
@@ -562,9 +575,11 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
 out:
     if ( ret )
     {
+        scope_devices_free(&dmaru->scope);
         iommu_free(dmaru);
         xfree(dmaru);
     }
+
     return ret;
 }
 
@@ -658,6 +673,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
                     "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
                     "devices under its scope are not PCI discoverable!\n",
                     rmrru->base_address, rmrru->end_address);
+            scope_devices_free(&rmrru->scope);
             xfree(rmrru);
         }
         else if ( base_addr > end_addr )
@@ -665,6 +681,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
             dprintk(XENLOG_WARNING VTDPREFIX,
                     "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
                     rmrru->base_address, rmrru->end_address);
+            scope_devices_free(&rmrru->scope);
             xfree(rmrru);
             ret = -EFAULT;
         }
@@ -727,7 +744,10 @@ acpi_parse_one_atsr(struct acpi_dmar_header *header)
     }
 
     if ( ret )
+    {
+        scope_devices_free(&atsru->scope);
         xfree(atsru);
+    }
     else
         acpi_register_atsr_unit(atsru);
     return ret;
-- 
2.1.3
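
For readers who want to try the pattern outside the hypervisor, here is a minimal userspace sketch of the two ideas in this patch: a free helper that is safe to call on any path, and publishing the device count only after the allocation succeeded. The struct and function names are hypothetical stand-ins, and calloc/free replace xzalloc_array/xfree. Resetting the pointer to NULL is an extra step in this sketch and is not something the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for Xen's struct dmar_scope. */
struct scope {
    uint16_t *devices;
    int devices_cnt;
};

/* Safe to call with NULL or with a scope whose allocation failed. */
static inline void scope_free(struct scope *s)
{
    if ( !s )
        return;

    s->devices_cnt = 0;
    free(s->devices);   /* free(NULL) is a no-op */
    s->devices = NULL;  /* sketch-only addition: makes a second call harmless */
}

/* Publish the count only once the array really exists. */
static inline int scope_alloc(struct scope *s, int cnt)
{
    if ( cnt < 0 )
        return -1;

    if ( cnt > 0 )
    {
        s->devices = calloc((size_t)cnt, sizeof(*s->devices));
        if ( !s->devices )
            return -1;
    }
    s->devices_cnt = cnt;

    return 0;
}
```

Because the count is never non-zero while the array is missing, an error path can call scope_free() unconditionally without walking a dangling or NULL array.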




[Xen-devel] [PATCH v10 4/5] pci: add PCI_SBDF and PCI_SEG macros

2015-07-13 Thread elena . ufimtseva
From: Elena Ufimtseva elena.ufimts...@oracle.com

In preparation for the patch "iommu: add rmrr Xen command line option for
extra rmrrs", which will use these macros.

Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com
---
 xen/include/xen/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 36e8cd3..d66ecab 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
+#define PCI_SEG(sbdf) (((sbdf) >> 16) & 0xffff)
 
 struct pci_dev_info {
 bool_t is_extfn;
-- 
2.1.3
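
A small standalone sketch of how the new macros compose, for illustration only. It assumes the usual devfn layout (slot in bits 7:3, function in bits 2:0) for PCI_DEVFN, which is not shown in this hunk, and make_sbdf is a hypothetical helper that keeps the arithmetic unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: devfn = slot in bits 7:3, function in bits 2:0. */
#define PCI_DEVFN(d, f)      ((((d) & 0x1f) << 3) | ((f) & 0x07))
#define PCI_BDF(b, d, f)     ((((b) & 0xff) << 8) | PCI_DEVFN(d, f))
#define PCI_SBDF(s, b, d, f) ((((s) & 0xffff) << 16) | PCI_BDF(b, d, f))
#define PCI_SEG(sbdf)        (((sbdf) >> 16) & 0xffff)

/* Packs segment, bus, device and function into one 32-bit value. */
static inline uint32_t make_sbdf(unsigned int seg, unsigned int bus,
                                 unsigned int dev, unsigned int func)
{
    return PCI_SBDF(seg, bus, dev, func);
}
```

The segment lands in the upper 16 bits and the classic BDF in the lower 16, so PCI_SEG() recovers the segment from a packed value with a shift and a mask.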




[Xen-devel] [PATCH v10 0/5] iommu: add rmrr Xen command line option

2015-07-13 Thread elena . ufimtseva
From: Elena Ufimtseva elena.ufimts...@oracle.com

Add the Xen command line option rmrr to specify RMRR
regions for devices that are not defined in ACPI, which
otherwise causes IO Page Faults while booting dom0 in PVH mode.
These additional regions will be added to the list of
RMRR regions parsed from ACPI.

Changes in v10:
 - incorporate patch 'dmar: device scope mem leak fix' as series requires it.
 - move patch 'pci: add PCI_SBDF and PCI_SEG macros' close to the last patch
   which uses it;

Changes in v9:
 - skip to next RMRR region if current overlaps with any in acpi_rmrr_units;
 - fix typos in commit messages;
 - remove clean up changes introduced by mistake in v8;

Changes in v8:
 - removed bogus debug in patch 1 with non-functional changes;
 - changed PRI_RMRRL macro for formatting to reflect the fact that two arguments
   are used, so make it PRI_RMRR(s,e) for formatting inclusive RMRR range;
   'L' is also removed from macro name, which was meant to serve as the type of
   the arguments (%lx);
 - added overlapping check with RMRRs from ACPI;
 - added check based on paddr_bits for pfn's in extra RMRR range (not sure if
   it's redundant with mfn_valid);
 - addressed while loop exit condition in extra RMRRs parser;

Elena Ufimtseva (5):
  dmar: device scope mem leak fix
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  pci: add PCI_SBDF and PCI_SEG macros
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown |  13 ++
 xen/drivers/passthrough/vtd/dmar.c  | 355 +---
 xen/drivers/pci/pci.c   |  11 ++
 xen/include/xen/pci.h   |   5 +
 4 files changed, 322 insertions(+), 62 deletions(-)

-- 
2.1.3




[Xen-devel] [PATCH v10 3/5] pci: add wrapper for parse_pci

2015-07-13 Thread elena . ufimtseva
From: Elena Ufimtseva elena.ufimts...@oracle.com

For sbdf's parsing in RMRR command line add __parse_pci with additional
parameter def_seg. __parse_pci will help to identify if segment was
found in string being parsed or default segment was used.
Make a wrapper parse_pci so the rest of the callers are not affected.

Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com
Acked-by: Jan Beulich jbeul...@suse.com
---
 xen/drivers/pci/pci.c | 11 +++
 xen/include/xen/pci.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index ca07ed0..788a356 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p)
 {
+    bool_t def_seg;
+
+    return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg);
+}
+
+const char *__init __parse_pci(const char *s, unsigned int *seg_p,
+                               unsigned int *bus_p, unsigned int *dev_p,
+                               unsigned int *func_p, bool_t *def_seg)
+{
     unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func;
 
     if ( *s != ':' )
         return NULL;
     bus = simple_strtoul(s + 1, &s, 16);
+    *def_seg = 0;
     if ( *s == ':' )
         dev = simple_strtoul(s + 1, &s, 16);
     else
@@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
         dev = bus;
         bus = seg;
         seg = 0;
+        *def_seg = 1;
     }
     if ( func_p )
     {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..36e8cd3 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -148,6 +148,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
                       unsigned int *dev, unsigned int *func);
+const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus,
+                        unsigned int *dev, unsigned int *func, bool_t *def_seg);
+
 
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
 
-- 
2.1.3
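
To show what the def_seg flag is for, here is a simplified userspace sketch of parsing "[seg:]bus:dev.func" and reporting whether the segment was defaulted. This is not Xen's parser: parse_sbdf is a hypothetical name, strtoul replaces simple_strtoul, the function field is mandatory here, and wildcards are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parses "[seg:]bus:dev.func" with hexadecimal fields. Returns a pointer
 * just past the parsed text, or NULL on malformed input. *def_seg is set
 * to true when no segment was given and segment 0 was assumed.
 */
static inline const char *parse_sbdf(const char *s, unsigned int *seg,
                                     unsigned int *bus, unsigned int *dev,
                                     unsigned int *func, bool *def_seg)
{
    unsigned long a, b, c, f;
    char *end;

    a = strtoul(s, &end, 16);
    if ( end == s || *end != ':' )
        return NULL;

    s = end + 1;
    b = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    if ( *end == ':' )
    {
        /* Three fields before the '.': the first one is the segment. */
        s = end + 1;
        c = strtoul(s, &end, 16);
        if ( end == s )
            return NULL;
        *def_seg = false;
    }
    else
    {
        /* Only bus:dev was given, so shift the fields and default seg. */
        c = b;
        b = a;
        a = 0;
        *def_seg = true;
    }

    if ( *end != '.' )
        return NULL;

    s = end + 1;
    f = strtoul(s, &end, 16);
    if ( end == s || a > 0xffff || b > 0xff || c > 0x1f || f > 0x7 )
        return NULL;

    *seg = (unsigned int)a;
    *bus = (unsigned int)b;
    *dev = (unsigned int)c;
    *func = (unsigned int)f;

    return end;
}
```

A caller such as the rmrr option parser can use the flag to tell "0000:03:1f.2" (explicit segment 0) from "03:1f.2" (segment omitted), which the return value and the seg output alone cannot distinguish.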




[Xen-devel] [PATCH v10 5/5] iommu: add rmrr Xen command line option for extra rmrrs

2015-07-13 Thread elena . ufimtseva
From: Elena Ufimtseva elena.ufimts...@oracle.com

On some platforms RMRR regions may not be specified
in ACPI and thus will not be mapped 1:1 in dom0. This
causes IO Page Faults and prevents dom0 from booting
in PVH mode.
The new Xen command line option rmrr allows specifying
such devices and memory regions. These regions are added
to the list of RMRRs defined in ACPI if the device
is present in the system. As a result, the additional RMRRs will
be mapped 1:1 in dom0 with correct permissions.

The problems mentioned above were discovered during PVH work with
ThinkCentre M and Dell 5600T. No official documentation
has been found so far on which devices cause this and why.
Experiments show that ThinkCentre M USB devices with the debug port
enabled generate DMA read transactions to regions of
memory marked reserved in the host e820 map.
For the Dell 5600T the device and faulting addresses have not been found yet.

For detailed history of the discussion please check following threads:
http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html

Format for rmrr Xen command line option:
rmrr=start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]]
If grub2 is used and multiple ranges are specified, ';' should be
quoted/escaped; refer to the grub2 manual for more information.

Signed-off-by: Elena Ufimtseva elena.ufimts...@oracle.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 docs/misc/xen-command-line.markdown |  13 +++
 xen/drivers/passthrough/vtd/dmar.c  | 209 +++-
 2 files changed, 221 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index aa684c0..f307f3d 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1197,6 +1197,19 @@ Specify the host reboot method.
 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
  default it will use that method first).
 
+### rmrr
+ '= 
start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]]
+
+Define RMRR units that are missing from ACPI table along with device they
+belong to and use them for 1:1 mapping. End addresses can be omitted and one
+page will be mapped. The ranges are inclusive when start and end are specified.
+If segment of the first device is not specified, segment zero will be used.
+If other segments are not specified, first device segment will be used.
+If a segment is specified for other than the first device and it does not match
+the one specified for the first one, an error will be reported.
+Note: grub2 requires to escape or use quotations if special characters are used,
+namely ';', refer to the grub2 documentation if multiple ranges are specified.
+
 ### ro-hpet
  `= boolean`
 
diff --git a/xen/drivers/passthrough/vtd/dmar.c 
b/xen/drivers/passthrough/vtd/dmar.c
index 93f10fd..61e8f28 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -867,6 +867,145 @@ out:
 return ret;
 }
 
+#define MAX_EXTRA_RMRR_PAGES 16
+#define MAX_EXTRA_RMRR 10
+
+/* RMRR units derived from command line rmrr option. */
+#define MAX_EXTRA_RMRR_DEV 20
+struct extra_rmrr_unit {
+    struct list_head list;
+    unsigned long base_pfn, end_pfn;
+    unsigned int dev_count;
+    u32    sbdf[MAX_EXTRA_RMRR_DEV];
+};
+static __initdata unsigned int nr_rmrr;
+static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR];
+
+/* Macro for RMRR inclusive range formatting. */
+#define PRI_RMRR(s,e) "[%lx-%lx]"
+
+static void __init add_extra_rmrr(void)
+{
+    struct acpi_rmrr_unit *acpi_rmrr;
+    struct acpi_rmrr_unit *rmrru;
+    unsigned int dev, seg, i, j;
+    unsigned long pfn;
+    bool_t overlap;
+
+    for ( i = 0; i < nr_rmrr; i++ )
+    {
+        if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "Invalid RMRR Range "PRI_RMRR(s,e)"\n",
+                   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+            continue;
+        }
+
+        if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >=
+             MAX_EXTRA_RMRR_PAGES )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "RMRR range "PRI_RMRR(s,e)" exceeds "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
+                   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+            continue;
+        }
+
+        for ( j = 0; j < nr_rmrr; j++ )
+        {
+            if ( i != j &&
+                 extra_rmrr_units[i].base_pfn <= extra_rmrr_units[j].end_pfn &&
+                 extra_rmrr_units[j].base_pfn <= extra_rmrr_units[i].end_pfn )
+            {
+                printk(XENLOG_ERR VTDPREFIX
+                      "Overlapping RMRRs "PRI_RMRR(s,e)" and "PRI_RMRR(s,e)"\n",
+                      extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn
Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-07-09 Thread Elena Ufimtseva

- jbeul...@suse.com wrote:

  On 09.07.15 at 14:07, elena.ufimts...@oracle.com wrote:
  You are right, it needs to be rebased. I can post a version rebased on
  the memory leak fix later, if you think that's the way to go.
 
 I didn't look at v9 yet, and can't predict when I will be able to.
 
 Jan

Jan 

Would you like me to post v10 with memory leak patch included in the patchset 
before you start looking at v9?

Elena



Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-07-09 Thread Elena Ufimtseva

- wei.l...@citrix.com wrote:

 On Thu, Jul 09, 2015 at 05:00:45PM +0100, Jan Beulich wrote:
   On 09.07.15 at 17:53, elena.ufimts...@oracle.com wrote:
   - jbeul...@suse.com wrote:
On 09.07.15 at 14:07, elena.ufimts...@oracle.com wrote:
    You are right, it needs to be rebased. I can post a version rebased on
    the memory leak fix later, if you think that's the way to go.
   
   I didn't look at v9 yet, and can't predict when I will be able
 to.
   
   Would you like me to post v10 with memory leak patch included in
 the 
   patchset before you start looking at v9?
  
  If there is a dependency on the changes in the leak fix v6, then
  this would be a good idea. If not, you can keep things as they are
  now. I view the entire set more as a bug fix than a feature anyway,
  and hence see no reason not to get this in after the freeze. But
 I'm
  adding Wei just in case...
  
 

Thanks Jan.
There is a dependency on the memory leak patch, so I will add it to this series
and squash the first patch from v9.
 
 I just looked at v9. The first three patches are quite mechanical.
 The
 fourth patch is relatively bigger but it's also quite straightforward
 (mostly parsing input). All in all, this series itself is
 self-contained.
 
 I don't think OSSTest is able to test that, so it would not cause
 visible regression on our side.
 
 I also agree it's a bug fix. Preferably this series should be applied
 before first RC.
 
 Wei.

Thank you Wei.
 
  Jan



Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-07-09 Thread Elena Ufimtseva
On Thu, Jul 09, 2015 at 09:10:06AM +0100, Jan Beulich wrote:
  On 08.07.15 at 19:27, konrad.w...@oracle.com wrote:
  On Tue, Jun 30, 2015 at 07:33:59PM -0400, elena.ufimts...@oracle.com wrote:
  From: Elena Ufimtseva elena.ufimts...@oracle.com
  
  
  You usually say why you need this patch. Something as simple as:
  
  In preparation for patch  which will use it is OK.
 
 Or, even better, add such macros when the first user appears. Iirc
 I said so before...

Yes, I realized this late. Will move over in the next version if needed. 
 Jan
 



Re: [Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-07-09 Thread Elena Ufimtseva

----- jbeul...@suse.com wrote:

> >>> On 09.07.15 at 13:13, <elena.ufimts...@oracle.com> wrote:
> > On Thu, Jul 09, 2015 at 09:10:06AM +0100, Jan Beulich wrote:
> >> >>> On 08.07.15 at 19:27, <konrad.w...@oracle.com> wrote:
> >> > On Tue, Jun 30, 2015 at 07:33:59PM -0400, elena.ufimts...@oracle.com wrote:
> >> >> From: Elena Ufimtseva <elena.ufimts...@oracle.com>
> >> >>
> >> >
> >> > You usually say why you need this patch. Something as simple as:
> >> >
> >> > "In preparation for patch  which will use it" is OK.
> >>
> >> Or, even better, add such macros when the first user appears. Iirc
> >> I said so before...
> >
> > Yes, I realized this late. Will move over in the next version if needed.
>
> Don't you need to rebase on top of v6 of "dmar: device scope mem
> leak fix" anyway? Or does the series not conflict with those changes?

You are right, it needs to be rebased. I can post later a version rebased on the
memory leak fix, if you think it's the way to go.

> Jan



[Xen-devel] [PATCH v9 3/4] pci: add wrapper for parse_pci

2015-07-08 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

For sbdf's parsing in RMRR command line add __parse_pci with additional
parameter def_seg. __parse_pci will help to identify if segment was
found in string being parsed or default segment was used.
Make a wrapper parse_pci so the rest of the callers are not affected.
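
The wrapper idea described above can be sketched outside of Xen roughly as
follows. This is only an illustration under assumptions: parse_sbdf() and
parse_bdf() are hypothetical user-space stand-ins (using strtoul() rather
than Xen's simple_strtoul()), the function number is required here, and the
range checks are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for __parse_pci(): parses "[seg:]bus:dev.func"
 * (hex fields) and reports through *def_seg whether the segment was
 * omitted and therefore defaulted to 0.
 * Returns a pointer just past the parsed text, or NULL on malformed input.
 */
static const char *parse_sbdf(const char *s, unsigned int *seg,
                              unsigned int *bus, unsigned int *dev,
                              unsigned int *func, bool *def_seg)
{
    char *end;
    unsigned long a, b, c, f;

    assert(s && seg && bus && dev && func && def_seg);

    a = strtoul(s, &end, 16);
    if ( end == s || *end != ':' )
        return NULL;
    s = end + 1;
    b = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    if ( *end == ':' )
    {
        /* Three colon-separated fields: seg:bus:dev */
        s = end + 1;
        c = strtoul(s, &end, 16);
        if ( end == s )
            return NULL;
        *def_seg = false;
    }
    else
    {
        /* Only bus:dev was given, so the segment defaults to 0. */
        c = b;
        b = a;
        a = 0;
        *def_seg = true;
    }

    if ( *end != '.' )
        return NULL;
    s = end + 1;
    f = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    /* Field widths assumed for this sketch: 16/8/5/3 bits. */
    if ( a > 0xffff || b > 0xff || c > 0x1f || f > 0x7 )
        return NULL;

    *seg = a;
    *bus = b;
    *dev = c;
    *func = f;
    return end;
}

/* Wrapper keeping the five-argument shape, so existing callers are untouched. */
static const char *parse_bdf(const char *s, unsigned int *seg,
                             unsigned int *bus, unsigned int *dev,
                             unsigned int *func)
{
    bool def_seg;

    return parse_sbdf(s, seg, bus, dev, func, &def_seg);
}
```

A caller that needs to know whether the segment was explicit (as the rmrr
option parser does) would call the six-argument variant; everything else
keeps using the wrapper and simply ignores the flag.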

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
Acked-by: Jan Beulich <jbeul...@suse.com>
---
 xen/drivers/pci/pci.c | 11 +++
 xen/include/xen/pci.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index ca07ed0..788a356 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p)
 {
+    bool_t def_seg;
+
+    return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg);
+}
+
+const char *__init __parse_pci(const char *s, unsigned int *seg_p,
+                             unsigned int *bus_p, unsigned int *dev_p,
+                             unsigned int *func_p, bool_t *def_seg)
+{
     unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func;
 
     if ( *s != ':' )
         return NULL;
     bus = simple_strtoul(s + 1, &s, 16);
+    *def_seg = 0;
     if ( *s == ':' )
         dev = simple_strtoul(s + 1, &s, 16);
     else
@@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
         dev = bus;
         bus = seg;
         seg = 0;
+        *def_seg = 1;
     }
     if ( func_p )
     {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 414106a..d66ecab 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -150,6 +150,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
                       unsigned int *dev, unsigned int *func);
+const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus,
+                      unsigned int *dev, unsigned int *func, bool_t *def_seg);
+
 
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
 
-- 
2.1.3




[Xen-devel] [PATCH v9 2/4] iommu VT-d: separate rmrr addition function

2015-07-08 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

In preparation for auxiliary RMRR data provided on Xen
command line, make RMRR adding a separate function.
Also free memory for rmrr device scope in error path.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/drivers/passthrough/vtd/dmar.c | 126 +++--
 1 file changed, 65 insertions(+), 61 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 77ef708..a8e1e5d 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -585,6 +585,68 @@ out:
     return ret;
 }
 
+static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
+{
+    bool_t ignore = 0;
+    unsigned int i = 0;
+    int ret = 0;
+
+    /* Skip checking if segment is not accessible yet. */
+    if ( !pci_known_segment(rmrru->segment) )
+        i = UINT_MAX;
+
+    for ( ; i < rmrru->scope.devices_cnt; i++ )
+    {
+        u8 b = PCI_BUS(rmrru->scope.devices[i]);
+        u8 d = PCI_SLOT(rmrru->scope.devices[i]);
+        u8 f = PCI_FUNC(rmrru->scope.devices[i]);
+
+        if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
+        {
+            dprintk(XENLOG_WARNING VTDPREFIX,
+                    " Non-existent device (%04x:%02x:%02x.%u) is reported"
+                    " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
+                    rmrru->segment, b, d, f,
+                    rmrru->base_address, rmrru->end_address);
+            ignore = 1;
+        }
+        else
+        {
+            ignore = 0;
+            break;
+        }
+    }
+
+    if ( ignore )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
+                "devices under its scope are not PCI discoverable!\n",
+                rmrru->base_address, rmrru->end_address);
+        scope_devices_free(&rmrru->scope);
+        xfree(rmrru);
+    }
+    else if ( rmrru->base_address > rmrru->end_address )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
+                rmrru->base_address, rmrru->end_address);
+        scope_devices_free(&rmrru->scope);
+        xfree(rmrru);
+        ret = -EFAULT;
+    }
+    else
+    {
+        if ( iommu_verbose )
+            dprintk(VTDPREFIX,
+                    "  RMRR region: base_addr %"PRIx64" end_address %"PRIx64"\n",
+                    rmrru->base_address, rmrru->end_address);
+        acpi_register_rmrr_unit(rmrru);
+    }
+
+    return ret;
+}
+
 static int __init
 acpi_parse_one_rmrr(struct acpi_dmar_header *header)
 {
@@ -635,68 +697,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
     ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
                                &rmrru->scope, RMRR_TYPE, rmrr->segment);
 
-    if ( ret || (rmrru->scope.devices_cnt == 0) )
-        xfree(rmrru);
+    if ( !ret && (rmrru->scope.devices_cnt != 0) )
+        register_one_rmrr(rmrru);
     else
-    {
-        u8 b, d, f;
-        bool_t ignore = 0;
-        unsigned int i = 0;
-
-        /* Skip checking if segment is not accessible yet. */
-        if ( !pci_known_segment(rmrr->segment) )
-            i = UINT_MAX;
-
-        for ( ; i < rmrru->scope.devices_cnt; i++ )
-        {
-            b = PCI_BUS(rmrru->scope.devices[i]);
-            d = PCI_SLOT(rmrru->scope.devices[i]);
-            f = PCI_FUNC(rmrru->scope.devices[i]);
-
-            if ( !pci_device_detect(rmrr->segment, b, d, f) )
-            {
-                dprintk(XENLOG_WARNING VTDPREFIX,
-                        " Non-existent device (%04x:%02x:%02x.%u) is reported"
-                        " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
-                        rmrr->segment, b, d, f,
-                        rmrru->base_address, rmrru->end_address);
-                ignore = 1;
-            }
-            else
-            {
-                ignore = 0;
-                break;
-            }
-        }
-
-        if ( ignore )
-        {
-            dprintk(XENLOG_WARNING VTDPREFIX,
-                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
-                "devices under its scope are not PCI discoverable!\n",
-                rmrru->base_address, rmrru->end_address);
-            scope_devices_free(&rmrru->scope);
-            xfree(rmrru);
-        }
-        else if ( base_addr > end_addr )
-        {
-            dprintk(XENLOG_WARNING VTDPREFIX,
-                "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
-                rmrru->base_address, rmrru->end_address);
-            scope_devices_free(&rmrru->scope);
-            xfree(rmrru);
-            ret = -EFAULT;
-        }
-        else
-        {
-            if ( iommu_verbose )
-                dprintk(VTDPREFIX,
-                        "  RMRR region: base_addr %"PRIx64
-                        " end_address %"PRIx64"\n",
-                        rmrru->base_address, rmrru->end_address);
-            acpi_register_rmrr_unit(rmrru

[Xen-devel] [PATCH v9 4/4] iommu: add rmrr Xen command line option for extra rmrrs

2015-07-08 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

On some platforms RMRR regions may not be specified
in ACPI and thus will not be mapped 1:1 in dom0. This
causes IO Page Faults and prevents dom0 from booting
in PVH mode.
The new Xen command line option rmrr allows such devices
and memory regions to be specified. These regions are added
to the list of RMRRs defined in ACPI if the device
is present in the system. As a result, the additional RMRRs
will be mapped 1:1 in dom0 with correct permissions.

The problems mentioned above were discovered during PVH work
with ThinkCentre M and Dell 5600T. No official documentation
has been found so far on which devices cause this and why.
Experiments show that ThinkCentre M USB devices with the
debug port enabled generate DMA read transactions to regions
of memory marked reserved in the host e820 map.
For the Dell 5600T the device and faulting addresses have not
been found yet.

For detailed history of the discussion please check following threads:
http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html

Format for rmrr Xen command line option:
rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
If grub2 is used and multiple ranges are specified, ';' should be
quoted/escaped; refer to the grub2 manual for more information.
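
To make the "start<-end>" part of the format concrete, here is a minimal
user-space sketch of how one range could be parsed and sanity-checked.
It is not the parser from this patch: parse_rmrr_range() is a hypothetical
helper, the values are assumed to be hex page frame numbers, and only the
MAX_EXTRA_RMRR_PAGES limit is borrowed from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_EXTRA_RMRR_PAGES 16  /* same limit as in the patch */

/*
 * Hypothetical helper: parse the "start<-end>" part of one rmrr entry.
 * A missing end means a single page. On success *next points at the '='
 * that precedes the device list.
 */
static bool parse_rmrr_range(const char *s, unsigned long *start,
                             unsigned long *end, const char **next)
{
    char *p;

    assert(s && start && end && next);

    *start = strtoul(s, &p, 16);
    if ( p == s )
        return false;

    if ( *p == '-' )
    {
        const char *e = p + 1;

        *end = strtoul(e, &p, 16);
        if ( p == e )
            return false;
    }
    else
        *end = *start;              /* end omitted: map one page */

    if ( *p != '=' )
        return false;

    /* The range is inclusive: reject inverted or oversized ranges. */
    if ( *start > *end || *end - *start >= MAX_EXTRA_RMRR_PAGES )
        return false;

    *next = p;
    return true;
}
```

With these assumptions, "ad000-ad003=0:0:1d.0" yields the inclusive range
[ad000-ad003], while "ad000=0:0:1d.0" yields the single page [ad000-ad000].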

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 docs/misc/xen-command-line.markdown |  13 +++
 xen/drivers/passthrough/vtd/dmar.c  | 209 +++-
 2 files changed, 221 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index aa684c0..f307f3d 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1197,6 +1197,19 @@ Specify the host reboot method.
 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
  default it will use that method first).
 
+### rmrr
+> '= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
+
+Define RMRR units that are missing from ACPI table along with device they
+belong to and use them for 1:1 mapping. End addresses can be omitted and one
+page will be mapped. The ranges are inclusive when start and end are specified.
+If segment of the first device is not specified, segment zero will be used.
+If other segments are not specified, first device segment will be used.
+If a segment is specified for other than the first device and it does not match
+the one specified for the first one, an error will be reported.
+Note: grub2 requires to escape or use quotations if special characters are used,
+namely ';', refer to the grub2 documentation if multiple ranges are specified.
+
 ### ro-hpet
 > `= <boolean>`
 
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index a8e1e5d..f62fb02 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -869,6 +869,145 @@ out:
     return ret;
 }
 
+#define MAX_EXTRA_RMRR_PAGES 16
+#define MAX_EXTRA_RMRR 10
+
+/* RMRR units derived from command line rmrr option. */
+#define MAX_EXTRA_RMRR_DEV 20
+struct extra_rmrr_unit {
+    struct list_head list;
+    unsigned long base_pfn, end_pfn;
+    unsigned int dev_count;
+    u32    sbdf[MAX_EXTRA_RMRR_DEV];
+};
+static __initdata unsigned int nr_rmrr;
+static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR];
+
+/* Macro for RMRR inclusive range formatting. */
+#define PRI_RMRR(s,e) "[%lx-%lx]"
+
+static void __init add_extra_rmrr(void)
+{
+    struct acpi_rmrr_unit *acpi_rmrr;
+    struct acpi_rmrr_unit *rmrru;
+    unsigned int dev, seg, i, j;
+    unsigned long pfn;
+    bool_t overlap;
+
+    for ( i = 0; i < nr_rmrr; i++ )
+    {
+        if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "Invalid RMRR Range "PRI_RMRR(s,e)"\n",
+                   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+            continue;
+        }
+
+        if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >=
+             MAX_EXTRA_RMRR_PAGES )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "RMRR range "PRI_RMRR(s,e)" exceeds "__stringify(MAX_EXTRA_RMRR_PAGES)" pages\n",
+                   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+            continue;
+        }
+
+        for ( j = 0; j < nr_rmrr; j++ )
+        {
+            if ( i != j &&
+                 extra_rmrr_units[i].base_pfn <= extra_rmrr_units[j].end_pfn &&
+                 extra_rmrr_units[j].base_pfn <= extra_rmrr_units[i].end_pfn )
+            {
+                printk(XENLOG_ERR VTDPREFIX
+                      "Overlapping RMRRs "PRI_RMRR(s,e)" and "PRI_RMRR(s,e)"\n",
+                      extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn,
+                      extra_rmrr_units[j].base_pfn, extra_rmrr_units[j].end_pfn

[Xen-devel] [PATCH v9 0/4] iommu: add rmrr Xen command line option

2015-07-08 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

v9 of rmrr command line patches.

Add Xen command line option rmrr to specify RMRR
regions for devices that are not defined in ACPI thus
causing IO Page Fault while booting dom0 in PVH mode.
These additional regions will be added to the list of
RMRR regions parsed from ACPI.

Changes in v9:
 - skip to next RMRR region if current overlaps with any in acpi_rmrr_units;
 - fix typos in commit messages;
 - remove clean up changes introduced by mistake in v8;
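
As an illustration of the overlap handling mentioned in the v9 changes, the
check for two inclusive ranges and the "skip and continue" behaviour could
be sketched as below. The struct and function names are hypothetical and
stand in for the series' extra_rmrr_units and acpi_rmrr_units; this is not
the code from patch 4.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an RMRR described by inclusive pfn bounds. */
struct pfn_range {
    unsigned long base_pfn, end_pfn;
};

/* Two inclusive ranges overlap iff each starts no later than the other ends. */
static bool ranges_overlap(const struct pfn_range *a, const struct pfn_range *b)
{
    assert(a && b);
    return a->base_pfn <= b->end_pfn && b->base_pfn <= a->end_pfn;
}

/*
 * Count the candidate ranges that do not collide with any known range.
 * A colliding candidate is skipped rather than aborting the whole walk.
 */
static unsigned int count_usable(const struct pfn_range *extra,
                                 unsigned int nr_extra,
                                 const struct pfn_range *known,
                                 unsigned int nr_known)
{
    unsigned int i, j, usable = 0;

    for ( i = 0; i < nr_extra; i++ )
    {
        bool overlap = false;

        for ( j = 0; j < nr_known && !overlap; j++ )
            overlap = ranges_overlap(&extra[i], &known[j]);

        if ( overlap )
            continue;   /* skip this one, keep checking the rest */
        usable++;
    }

    return usable;
}
```

Because the bounds are inclusive, two ranges that merely share their
boundary page (one ends at pfn X, the other starts at X) count as
overlapping.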

Changes in v8:
 - removed bogus debug in patch 1 with non-functional changes;
 - changed PRI_RMRRL macro for formatting to reflect the fact that two arguments
   are used, so make it PRI_RMRR(s,e) for formatting inclusive RMRR range;
   'L' is also removed from macro name, which was meant to serve as the type of
   the arguments (%lx);
 - added overlapping check with RMRRs from ACPI;
 - added check based on paddr_bits for pfn's in extra RMRR range (not sure if
   it's redundant with mfn_valid);
 - addressed while loop exit condition in extra RMRRs parser;

Changes in v7:
 - make sure RMRRs ranges are being checked correctly;
 - don't interrupt RMRR checking if some of the checks fail, instead
   continue to the next RMRR;
 - make rmrr variable names more obvious;
 - fix debug output formatting to match type of rmrr range;
 - fix typos in rmrr command line document and in comments;

Elena Ufimtseva (4):
  pci: add PCI_SBDF and PCI_SEG macros
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown |  13 ++
 xen/drivers/passthrough/vtd/dmar.c  | 334 +---
 xen/drivers/pci/pci.c   |  11 ++
 xen/include/xen/pci.h   |   5 +
 4 files changed, 301 insertions(+), 62 deletions(-)

-- 
2.1.3




[Xen-devel] [PATCH v9 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-07-08 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

In preparation for patch "iommu: add rmrr Xen command line option for
extra rmrrs" which will use it.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/include/xen/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..414106a 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
+#define PCI_SEG(sbdf) (((sbdf) >> 16) & 0xffff)
 
 struct pci_dev_info {
     bool_t is_extfn;
-- 
2.1.3
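
For readers unfamiliar with the encoding, the layout implied by the two new
macros is segment in bits 31:16, bus in 15:8, device in 7:3 and function in
2:0. A small stand-alone sketch follows; the sbdf_* helpers are hypothetical
names used only for illustration, written as functions on uint32_t so that
the shifts stay in unsigned arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Pack seg[31:16] bus[15:8] dev[7:3] func[2:0] into one 32-bit value. */
static uint32_t sbdf_pack(uint32_t seg, uint32_t bus, uint32_t dev,
                          uint32_t func)
{
    assert(seg <= 0xffff && bus <= 0xff && dev <= 0x1f && func <= 0x7);
    return (seg << 16) | (bus << 8) | (dev << 3) | func;
}

/* Accessors mirroring what PCI_SEG/PCI_BUS/PCI_SLOT/PCI_FUNC extract. */
static uint32_t sbdf_seg(uint32_t sbdf)  { return (sbdf >> 16) & 0xffff; }
static uint32_t sbdf_bus(uint32_t sbdf)  { return (sbdf >> 8) & 0xff; }
static uint32_t sbdf_dev(uint32_t sbdf)  { return (sbdf >> 3) & 0x1f; }
static uint32_t sbdf_func(uint32_t sbdf) { return sbdf & 0x7; }
```

Keeping the segment in the upper half means the low 16 bits remain a plain
BDF, so existing BDF helpers can still be applied to the lower part.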




[Xen-devel] [PATCH v6] dmar: device scope mem leak fix

2015-07-07 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Release memory allocated for scope.devices dmar units on various
failure paths and when disabling dmar. Set device count after
successful memory allocation, not before, in device scope parsing function.
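
The ordering point can be shown with a small user-space sketch: the count
only becomes non-zero once the array really exists, so a failed allocation
never leaves a scope that claims devices it does not have. The struct and
function below are hypothetical stand-ins (calloc() in place of
xzalloc_array()), not the Xen code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device scope bookkeeping. */
struct scope_sketch {
    uint16_t *devices;
    int devices_cnt;
};

/* Allocate room for cnt devices; publish the count only on success. */
static int scope_alloc(struct scope_sketch *scope, int cnt)
{
    assert(scope);

    if ( cnt < 0 )
        return cnt;                 /* propagate the counting error */

    if ( cnt > 0 )
    {
        scope->devices = calloc(cnt, sizeof(*scope->devices));
        if ( !scope->devices )
            return -ENOMEM;         /* devices_cnt is left untouched */
    }
    scope->devices_cnt = cnt;       /* set after the allocation, not before */

    return 0;
}
```

Any error path that later walks devices[0..devices_cnt) is then safe,
because a non-zero count implies a valid array.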

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
Changes in v6:
 - eliminated unrelated code move;
 - fix memory leak introduced in v5;

Changes in v5:
 - make scope_devices_free actually safe;

Changes in v4:
 - make scope_devices_free safe to call with NULL scope pointer;
 - since scope_devices_free is safe to call, use it in failure path
   in acpi_parse_one_drhd;

Changes in v3:
 - make freeing memory for scope devices and zeroing device counter
   a function;
 - make sure parse_one_rmrr has memory leak fix in this patch;
 - make sure ret values are not lost in acpi_parse_one_drhd;

Changes in v2:
 - release memory for devices scope on error paths in acpi_parse_one_drhd
   and acpi_parse_one_atsr and set the count to zero;

 xen/drivers/passthrough/vtd/dmar.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..8ed1e24 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -81,6 +81,15 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr)
     return 0;
 }
 
+static void scope_devices_free(struct dmar_scope *scope)
+{
+    if ( !scope )
+        return;
+
+    scope->devices_cnt = 0;
+    xfree(scope->devices);
+}
+
 static void __init disable_all_dmar_units(void)
 {
     struct acpi_drhd_unit *drhd, *_drhd;
@@ -90,16 +99,19 @@ static void __init disable_all_dmar_units(void)
     list_for_each_entry_safe ( drhd, _drhd, &acpi_drhd_units, list )
     {
         list_del(&drhd->list);
+        scope_devices_free(&drhd->scope);
         xfree(drhd);
     }
     list_for_each_entry_safe ( rmrr, _rmrr, &acpi_rmrr_units, list )
     {
         list_del(&rmrr->list);
+        scope_devices_free(&rmrr->scope);
         xfree(rmrr);
     }
     list_for_each_entry_safe ( atsr, _atsr, &acpi_atsr_units, list )
     {
         list_del(&atsr->list);
+        scope_devices_free(&atsr->scope);
         xfree(atsr);
     }
 }
@@ -318,13 +330,13 @@ static int __init acpi_parse_dev_scope(
     if ( (cnt = scope_device_count(start, end)) < 0 )
         return cnt;
 
-    scope->devices_cnt = cnt;
     if ( cnt > 0 )
     {
         scope->devices = xzalloc_array(u16, cnt);
         if ( !scope->devices )
             return -ENOMEM;
     }
+    scope->devices_cnt = cnt;
 
     while ( start < end )
     {
@@ -427,7 +439,7 @@ static int __init acpi_parse_dev_scope(
 
  out:
     if ( ret )
-        xfree(scope->devices);
+        scope_devices_free(scope);
 
     return ret;
 }
@@ -542,6 +554,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
                     "  Workaround BIOS bug: ignore the DRHD due to all "
                     "devices under its scope are not PCI discoverable!\n");
 
+                scope_devices_free(&dmaru->scope);
                 iommu_free(dmaru);
                 xfree(dmaru);
             }
@@ -562,9 +575,11 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
 out:
     if ( ret )
     {
+        scope_devices_free(&dmaru->scope);
         iommu_free(dmaru);
         xfree(dmaru);
     }
+
     return ret;
 }
 
@@ -658,6 +673,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
                 "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
                 "devices under its scope are not PCI discoverable!\n",
                 rmrru->base_address, rmrru->end_address);
+            scope_devices_free(&rmrru->scope);
             xfree(rmrru);
         }
         else if ( base_addr > end_addr )
@@ -665,6 +681,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
             dprintk(XENLOG_WARNING VTDPREFIX,
                 "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
                 rmrru

Re: [Xen-devel] [PATCH v5] dmar: device scope mem leak fix

2015-07-07 Thread Elena Ufimtseva
On Tue, Jul 07, 2015 at 10:54:25AM +0100, Jan Beulich wrote:
> >>> On 01.07.15 at 20:30, <elena.ufimts...@oracle.com> wrote:
> > Release memory allocated for scope.devices when disabling
> > dmar units. Also set device count after memory allocation when
> > device scope parsing.
> > This is explanation of why the code should be moved imho and
> > answers Jan question about why I needed to do this.
> > In acpi_parse_one_drhr move call to acpi_parse_dev_scope after include_all
> > check so the return value does not get overwritten by calling
> > acpi_parse_dev_scope.
>
> I can't really connect the middle paragraph to the first or last one,
> and in any event this doesn't seem to belong in a commit message
> in that shape. Nor can I see the reason for the movement, even
> with the last sentence above trying to explain it. What return value
> do you see being overwritten? And how does that relate to the
> intention of this patch?

Well, you are right that this part is unrelated to this patch.
I will later post this change as a separate clean up.

But to explain myself, the value of ret after acpi_parse_dev_scope is
overwritten if drhd->segment == 0 && include_all. I assumed that it's
important to preserve the ret code. I see the problem though with
moving code as include_all will not be set if I exit right after
acpi_parse_dev_scope.
Thus I am dropping this part from this patch.

> > @@ -474,12 +486,10 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
> >  
> >      ret = iommu_alloc(dmaru);
> >      if ( ret )
> > -        goto out;
> > -
> > -    dev_scope_start = (void *)(drhd + 1);
> > -    dev_scope_end = ((void *)drhd) + header->length;
> > -    ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
> > -                               &dmaru->scope, DMAR_TYPE, drhd->segment);
> > +    {
> > +        xfree(dmaru);
> > +        return ret;
> > +    }
>
> Why is this being changed from "goto out"? You're now possibly
> leaking memory as well as a mapping if iommu_alloc() failed on any
> of its actions after having set drhd->iommu.

Right, I see it. Will fix.

Thank you!

Elena

> Jan



Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)

2015-07-06 Thread Elena Ufimtseva

----- ian.campb...@citrix.com wrote:

> On Fri, 2015-07-03 at 13:55 +0100, Ian Campbell wrote:
> > On Thu, 2015-07-02 at 16:16 +0100, Ian Jackson wrote:
> > > Ian Campbell writes ("Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)"):
> > > > On Thu, 2015-07-02 at 09:45 +0100, Ian Campbell wrote:
> > > > > Shall I put up a poll of some sort to gather preferred timeslot options
> > > > > out of that set?
> > > >
> > > > Please can everyone who is interested in this topic indicate their date
> > > > preference/availability at:
> > > >
> > > > http://doodle.com/cy88dhwzybg7hh7p
> > > >
> > > > I've gone with the usual 5pm BST slot for simplicity. That's 1200 Noon
> > > > EDT, 9am PDT and 6pm CEST.
> > >
> > > I'm never available at 1700 BST on a Wednesday, I'm afraid.  I can
> > > make that time any other day of the week.
> >
> > I've added the Tuesday and Thursday either side of each date to the mix
> > as well.
> >
> > David, Roger, Stefano, Konrad, Boris, Elena:
> > Sorry, would you mind adding your availability for the new
> > dates.
>
> Konrad, Elena: Ping.
>
> > I'll close the poll on Tuesday.
>
> At the moment (once I adjust for the missing responses assuming they are
> "yes") the front runner appears like it is going to be Thursday 23rd.

Hi Ian,
I am fine with this date.

> Ian.



Re: [Xen-devel] can't create a vNUMA enabled PV guest

2015-07-02 Thread Elena Ufimtseva
On Wed, Jul 1, 2015 at 10:42 AM, Dario Faggioli
<dario.faggi...@citrix.com> wrote:
> Hey,
>
> I know Wei is away, so I'll try to find the time to look at this myself,
> but I figured I'll let know about it, in case someone has obvious (or
> not :-D) ideas.
>
> I think I'm facing a bug that prevents creating PV guests with a vNUMA
> topology. I'm pretty sure I tested this before, while reviewing Wei's
> patches, so it must be something introduced between then and now (yes,
> we need a vNUMA OSSTest test case... I'll see about putting one
> together).
>
> So, here we are. With this as base config:
>
>   name    = 'test'
>   # Kernel, params and imags
>   kernel  = '/root/3.19.0+/vmlinuz-3.19.0+'
>   ramdisk = '/root/3.19.0+/initrd.img-3.19.0+'
>   # CPUs and Memory and related
>   vcpus   = '4'
>   memory  = '1024'
>   vnuma = [ [ "pnode=0","size=512","vcpus=0-1","vdistances=10,20"  ],
>             [ "pnode=1","size=512","vcpus=2-3","vdistances=20,10"  ] ]
>   # Disks
>   root    = '/dev/xvda1 ro'
>   disk    = [
>             'phy:/dev/vms/test-pv-disk,xvda1,w',
>             ]
>   # Networking
>   dhcp    = 'dhcp'
>   vif = [ 'mac=00:16:3E:FA:A7:9B,bridge=xenbr0' ]
>
>
>
> If I build a HVM guest, everything works:
>
> (XEN) Memory location of each domain:
> (XEN) Domain 0 (total: 129874):
> (XEN) Node 0: 113466
> (XEN) Node 1: 16408
> (XEN) Domain 14 (total: 262251):
> (XEN) Node 0: 131029
> (XEN) Node 1: 131222
> (XEN)  2 vnodes, 4 vcpus, guest physical layout:
> (XEN)  0: pnode   0, vcpus 0-1
> (XEN) - 1f80
> (XEN)  1: pnode   1, vcpus 2-3
> (XEN)1f80 - 3f80
>
> root@test:~# numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 1
> node 0 size: 411 MB
> node 0 free: 311 MB
> node 1 cpus: 2 3
> node 1 size: 442 MB
> node 1 free: 406 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
>
>
> If I build a PV guest, it breaks:
>
> root@Zhaman:~# xl create -c /etc/xen/test.cfg
> Parsing config from /etc/xen/test.cfg
> xc: error: panic: xc_dom_x86.c:940: arch_setup_meminit: failed to allocate
> 0x2 pages (v=1, p=1)
> : Internal error
> xc: error: panic: xc_dom_boot.c:155: xc_dom_boot_mem_init: can't allocate low
> memory for domain: Out of memory
> libxl: error: libxl_dom.c:731:libxl__build_pv: xc_dom_boot_mem_init failed:
> Device or resource busy
> libxl: error: libxl_create.c:1174:domcreate_rebuild_done: cannot (re-)build
> domain: -3
> libxl: error: libxl.c:1586:libxl__destroy_domid: non-existant domain 15
> libxl: error: libxl.c:1544:domain_destroy_callback: unable to destroy guest
> with domid 15
> libxl: error: libxl.c:1471:domain_destroy_cb: destruction of domain 15 failed
>
> (XEN) d0v1 Over-allocation for domain 15: 262656 > 262400
> (XEN) memory.c:155:d0v1 Could not allocate order=9 extent: id=15 memflags=210
> (0 of 512)
> (XEN) d0v1 Over-allocation for domain 15: 262401 > 262400
> (XEN) memory.c:155:d0v1 Could not allocate order=0 extent: id=15 memflags=210
> (256 of 131072)
>
>
> As said, I'll be looking into it in the next days. If, in the meanwhile,
> someone has any ideas, that would be much appreciated. :-)

Hi Dario!

The kernel you are running may be missing the vNUMA patch.
Konrad asked me if the patch was upstream. Well, it is not; I think I
abandoned it :).
I will address the latest comments and other changes from the v6 review and post it.

Elena

>
> Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




-- 
Elena



Re: [Xen-devel] [PATCH v4] dmar: device scope mem leak fix

2015-07-01 Thread Elena Ufimtseva
On Wed, Jul 01, 2015 at 11:00:45AM +0100, Andrew Cooper wrote:
> On 01/07/15 00:20, elena.ufimts...@oracle.com wrote:
> > --- a/xen/drivers/passthrough/vtd/dmar.c
> > +++ b/xen/drivers/passthrough/vtd/dmar.c
> > @@ -81,6 +81,13 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr)
> >      return 0;
> >  }
> >  
> > +static void scope_devices_free(struct dmar_scope *scope)
> > +{
> > +    if ( scope )
> > +        scope->devices_cnt = 0;
> > +    xfree(scope->devices);
>
> This is very liable to suffer a NULL pointer dereference.

Thanks Andrew, reposting.

> ~Andrew
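
The problem pointed out above is that the NULL check guards only the first
statement; the free still dereferences the pointer. A minimal user-space
sketch of a NULL-safe variant follows, in the spirit of what v6 of the patch
does with an early return. The names are hypothetical, free() stands in for
xfree(), and clearing the devices pointer is an extra precaution of this
sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dmar_scope. */
struct scope_sketch {
    uint16_t *devices;
    int devices_cnt;
};

/* Safe to call with a NULL scope and safe to call twice on the same scope. */
static void scope_devices_free_sketch(struct scope_sketch *scope)
{
    if ( !scope )
        return;                 /* nothing to dereference, nothing to free */

    scope->devices_cnt = 0;
    free(scope->devices);       /* free(NULL) is a no-op */
    scope->devices = NULL;      /* not in the patch: guards against reuse */
}
```

Returning early keeps both statements under the same guard, which is the
property the braces-less "if ( scope )" version lacked.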



[Xen-devel] [PATCH v4] dmar: device scope mem leak fix

2015-06-30 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Release memory allocated for scope.devices when disabling
dmar units. Also set device count after memory allocation when
device scope parsing.
This is explanation of why the code should be moved imho and
answers Jan question about why I needed to do this.
In acpi_parse_one_drhr move call to acpi_parse_dev_scope after include_all
check so the return value does not get overwritten by calling 
acpi_parse_dev_scope.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
Changes in v4:
 - make scope_devices_free safe to call with NULL scope pointer;
 - since scope_devices_free is safe to call, use it in failure path 
   in acpi_parse_one_drhd;

Changes in v3:
 - make freeing memory for scope devices and zeroing device counter
 as a function;
 - make sure parse_one_rmrr has memory leak fix in this patch;
 - make sure ret values are not lost acpi_parse_one_drhd;

Changes in v2:
 - release memory for devices scope on error paths in acpi_parse_one_drhd
 and acpi_parse_one_atsr and set the count to zero;
---
 xen/drivers/passthrough/vtd/dmar.c | 38 ++
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..77ef708 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -81,6 +81,13 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr)
     return 0;
 }
 
+static void scope_devices_free(struct dmar_scope *scope)
+{
+    if ( scope )
+        scope->devices_cnt = 0;
+    xfree(scope->devices);
+}
+
 static void __init disable_all_dmar_units(void)
 {
     struct acpi_drhd_unit *drhd, *_drhd;
@@ -90,16 +97,19 @@ static void __init disable_all_dmar_units(void)
     list_for_each_entry_safe ( drhd, _drhd, &acpi_drhd_units, list )
     {
         list_del(&drhd->list);
+        scope_devices_free(&drhd->scope);
         xfree(drhd);
     }
     list_for_each_entry_safe ( rmrr, _rmrr, &acpi_rmrr_units, list )
     {
         list_del(&rmrr->list);
+        scope_devices_free(&rmrr->scope);
         xfree(rmrr);
     }
     list_for_each_entry_safe ( atsr, _atsr, &acpi_atsr_units, list )
     {
         list_del(&atsr->list);
+        scope_devices_free(&atsr->scope);
         xfree(atsr);
     }
 }
@@ -318,13 +328,13 @@ static int __init acpi_parse_dev_scope(
     if ( (cnt = scope_device_count(start, end)) < 0 )
         return cnt;
 
-    scope->devices_cnt = cnt;
     if ( cnt > 0 )
     {
         scope->devices = xzalloc_array(u16, cnt);
         if ( !scope->devices )
             return -ENOMEM;
     }
+    scope->devices_cnt = cnt;
 
     while ( start < end )
     {
@@ -427,7 +437,7 @@ static int __init acpi_parse_dev_scope(
 
  out:
     if ( ret )
-        xfree(scope->devices);
+        scope_devices_free(scope);
 
     return ret;
 }
@@ -474,12 +484,10 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
 
     ret = iommu_alloc(dmaru);
     if ( ret )
-        goto out;
-
-    dev_scope_start = (void *)(drhd + 1);
-    dev_scope_end = ((void *)drhd) + header->length;
-    ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
-                               &dmaru->scope, DMAR_TYPE, drhd->segment);
+    {
+        xfree(dmaru);
+        return ret;
+    }
 
     if ( dmaru->include_all )
     {
@@ -495,7 +503,13 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
         if ( drhd->segment == 0 )
             include_all = 1;
     }
+    if ( ret )
+        goto out;
 
+    dev_scope_start = (void *)(drhd + 1);
+    dev_scope_end = ((void *)drhd) + header->length;
+    ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
+                               &dmaru->scope, DMAR_TYPE, drhd->segment);
     if ( ret )
         goto out;
     else if ( force_iommu || dmaru->include_all )
@@ -542,6 +556,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
                     "  Workaround BIOS bug: ignore the DRHD due to all "
                     "devices under its scope are not PCI discoverable!\n");
 
+                scope_devices_free(&dmaru->scope);
                 iommu_free(dmaru);
                 xfree(dmaru);
             }
@@ -562,9 +577,11 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
 out:
     if ( ret )
     {
+        scope_devices_free(&dmaru->scope);
         iommu_free(dmaru);
         xfree(dmaru);
     }
+
     return ret;
 }
 
@@ -658,6 +675,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
                 "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
                 "devices under its scope are not PCI discoverable!\n",
                 rmrru->base_address, rmrru->end_address);
+            scope_devices_free(&rmrru->scope);
             xfree(rmrru);
         }
         else if ( base_addr > end_addr )
@@ -665,6 +683,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
             dprintk(XENLOG_WARNING VTDPREFIX,
                 "  The RMRR (%"PRIx64

[Xen-devel] [PATCH v8 3/4] pci: add wrapper for parse_pci

2015-06-30 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

For sbdf parsing in the rmrr command line add __parse_pci with an additional
parameter def_seg. __parse_pci will help to identify if segment was found
in string being parsed or default segment was used.
Make a wrapper parse_pci so the rest of the callers are not affected.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
Acked-by: Jan Beulich <jbeul...@suse.com>
---
 xen/drivers/pci/pci.c | 11 +++
 xen/include/xen/pci.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index ca07ed0..788a356 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p)
 {
+    bool_t def_seg;
+
+    return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg);
+}
+
+const char *__init __parse_pci(const char *s, unsigned int *seg_p,
+                             unsigned int *bus_p, unsigned int *dev_p,
+                             unsigned int *func_p, bool_t *def_seg)
+{
     unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func;
 
     if ( *s != ':' )
         return NULL;
     bus = simple_strtoul(s + 1, &s, 16);
+    *def_seg = 0;
     if ( *s == ':' )
         dev = simple_strtoul(s + 1, &s, 16);
     else
@@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
         dev = bus;
         bus = seg;
         seg = 0;
+        *def_seg = 1;
     }
     if ( func_p )
     {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 414106a..d66ecab 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -150,6 +150,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
                       unsigned int *dev, unsigned int *func);
+const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus,
+                      unsigned int *dev, unsigned int *func, bool_t *def_seg);
+
 
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
 
-- 
2.1.3
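
A minimal user-space sketch of the idea behind __parse_pci, for readers who
want to try the parsing rule outside the hypervisor. It is not the Xen code:
parse_sbdf is a hypothetical name, strtoul stands in for simple_strtoul, the
function number is mandatory here, and the field range checks are an
assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Illustrative analogue of __parse_pci(), not the Xen implementation.
 * Parses "[seg:]bus:dev.func" with hexadecimal fields. *def_seg is set to
 * true when the segment was omitted and segment 0 was assumed.
 * Returns a pointer just past the parsed text, or NULL on a malformed string.
 */
static const char *parse_sbdf(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p, bool *def_seg)
{
    char *end;
    unsigned long seg, bus, dev, func;

    assert(s && seg_p && bus_p && dev_p && func_p && def_seg);

    seg = strtoul(s, &end, 16);
    if ( end == s || *end != ':' )
        return NULL;

    s = end + 1;
    bus = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    if ( *end == ':' )
    {
        /* Three fields before the '.': seg:bus:dev. */
        *def_seg = false;
        s = end + 1;
        dev = strtoul(s, &end, 16);
        if ( end == s )
            return NULL;
    }
    else
    {
        /* Two fields before the '.': they are bus:dev, segment defaults. */
        dev = bus;
        bus = seg;
        seg = 0;
        *def_seg = true;
    }

    if ( *end != '.' )
        return NULL;
    s = end + 1;
    func = strtoul(s, &end, 16);
    if ( end == s )
        return NULL;

    /* Range checks are an assumption of this sketch. */
    if ( seg > 0xffff || bus > 0xff || dev > 0x1f || func > 0x7 )
        return NULL;

    *seg_p = (unsigned int)seg;
    *bus_p = (unsigned int)bus;
    *dev_p = (unsigned int)dev;
    *func_p = (unsigned int)func;
    return end;
}
```

With this sketch, "0001:03:1f.2" yields segment 1 with def_seg false, while
"03:1f.2" yields segment 0 with def_seg true, which is the distinction the
rmrr parser needs.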




[Xen-devel] [PATCH v8 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-06-30 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/include/xen/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..414106a 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
+#define PCI_SEG(sbdf) (((sbdf) >> 16) & 0xffff)
 
 struct pci_dev_info {
 bool_t is_extfn;
-- 
2.1.3
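
A small stand-alone sketch of how the packing macros fit together. The macro
bodies below are written out for illustration, with the usual 5-bit device
and 3-bit function layout assumed for PCI_DEVFN; make_sbdf and same_segment
are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 16-bit segment, 8-bit bus, 5-bit device, 3-bit function. */
#define PCI_DEVFN(d,f)    ((((d) & 0x1f) << 3) | ((f) & 0x07))
#define PCI_BDF(b,d,f)    ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
#define PCI_SEG(sbdf)     (((sbdf) >> 16) & 0xffff)

/* Hypothetical helper: unsigned arguments keep the shifts well defined. */
static inline uint32_t make_sbdf(unsigned int s, unsigned int b,
                                 unsigned int d, unsigned int f)
{
    return PCI_SBDF(s, b, d, f);
}

/* Hypothetical helper: do two packed sbdf values share a segment? */
static inline bool same_segment(uint32_t a, uint32_t b)
{
    return PCI_SEG(a) == PCI_SEG(b);
}
```

Keeping one packed sbdf per device lets a caller compare segments of devices
listed under the same RMRR with a single PCI_SEG() on each value.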




[Xen-devel] [PATCH v8 4/4] iommu: add rmrr Xen command line option for extra rmrrs

2015-06-30 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

On some platforms RMRR regions may not be specified
in ACPI and thus will not be mapped 1:1 in dom0. This
causes IO Page Faults and prevents dom0 from booting
in PVH mode.
The new Xen command line option rmrr allows specifying
such devices and memory regions. These regions are added
to the list of RMRRs defined in ACPI if the device
is present in the system. As a result, the additional RMRRs
will be mapped 1:1 in dom0 with correct permissions.

The problems mentioned above were discovered during PVH work with
ThinkCentre M and Dell 5600T. No official documentation has been
found so far on which devices cause this and why.
Experiments show that ThinkCentre M USB devices with the debug
port enabled generate DMA read transactions to regions of
memory marked reserved in the host e820 map.
For Dell 5600T the device and faulting addresses have not been found yet.

For a detailed history of the discussion please check the following threads:
http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html

Format for rmrr Xen command line option:
rmrr=start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]]
If grub2 is used and multiple ranges are specified, ';' should be
quoted/escaped; refer to the grub2 manual for more information.
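
The segment rules for the devices of one range (spelled out in the
documentation hunk of this patch) can be sketched as follows. This is an
illustration only: struct dev_entry and resolve_rmrr_segment are hypothetical
names, and the real parser in dmar.c is organised differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One parsed device entry; names are illustrative, not Xen's. */
struct dev_entry {
    unsigned int seg;   /* segment as written; ignored when def_seg is set */
    bool def_seg;       /* true if no segment was given for this device */
};

/*
 * Resolve the segment shared by all devices of one RMRR range.
 * Returns 0 and stores the segment in *seg_out, or -1 on an empty list
 * or when a later device names a segment different from the first one.
 */
static int resolve_rmrr_segment(const struct dev_entry *devs, size_t n,
                                unsigned int *seg_out)
{
    unsigned int seg;
    size_t i;

    assert(devs != NULL && seg_out != NULL);
    if ( n == 0 )
        return -1;

    /* First device: an omitted segment means segment zero. */
    seg = devs[0].def_seg ? 0 : devs[0].seg;

    /* Later devices inherit the first segment unless they state their own. */
    for ( i = 1; i < n; i++ )
        if ( !devs[i].def_seg && devs[i].seg != seg )
            return -1;

    *seg_out = seg;
    return 0;
}
```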

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 docs/misc/xen-command-line.markdown |  13 ++
 xen/drivers/passthrough/vtd/dmar.c  | 246 
 2 files changed, 236 insertions(+), 23 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index aa684c0..f307f3d 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1197,6 +1197,19 @@ Specify the host reboot method.
 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
  default it will use that method first).
 
+### rmrr
+ '= 
start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]]
+
+Define RMRR units that are missing from ACPI table along with device they
+belong to and use them for 1:1 mapping. End addresses can be omitted and one
+page will be mapped. The ranges are inclusive when start and end are specified.
+If segment of the first device is not specified, segment zero will be used.
+If other segments are not specified, first device segment will be used.
+If a segment is specified for other than the first device and it does not match
+the one specified for the first one, an error will be reported.
+Note: grub2 requires to escape or use quotations if special characters are 
used,
+namely ';', refer to the grub2 documentation if multiple ranges are specified.
+
 ### ro-hpet
  `= boolean`
 
diff --git a/xen/drivers/passthrough/vtd/dmar.c 
b/xen/drivers/passthrough/vtd/dmar.c
index a8e1e5d..fa659a9 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -42,6 +42,8 @@
 #define MIN_SCOPE_LEN (sizeof(struct acpi_dmar_device_scope) + \
sizeof(struct acpi_dmar_pci_path))
 
+#define PRI_RMRR(s,e) [%lx-%lx]
+
 LIST_HEAD_READ_MOSTLY(acpi_drhd_units);
 LIST_HEAD_READ_MOSTLY(acpi_rmrr_units);
 static LIST_HEAD_READ_MOSTLY(acpi_atsr_units);
@@ -425,7 +427,7 @@ static int __init acpi_parse_dev_scope(
 default:
 if ( iommu_verbose )
 printk(XENLOG_WARNING VTDPREFIX Unknown scope type %#x\n,
-   acpi_scope-entry_type);
+acpi_scope-entry_type);
 start += acpi_scope-length;
 continue;
 }
@@ -479,8 +481,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
 INIT_LIST_HEAD(dmaru-ioapic_list);
 INIT_LIST_HEAD(dmaru-hpet_list);
 if ( iommu_verbose )
-dprintk(VTDPREFIX,   dmaru-address = %PRIx64\n,
-dmaru-address);
+dprintk(VTDPREFIX,   dmaru-address = %PRIx64\n, dmaru-address);
 
 ret = iommu_alloc(dmaru);
 if ( ret )
@@ -541,8 +542,8 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
 if ( !pci_device_detect(drhd-segment, b, d, f) )
 {
 dprintk(XENLOG_WARNING VTDPREFIX,
- Non-existent device (%04x:%02x:%02x.%u) is reported
- in this DRHD's scope!\n, drhd-segment, b, d, f);
+ Non-existent device (%04x:%02x:%02x.%u) is reported 
in this DRHD's scope!\n,
+drhd-segment, b, d, f);
 invalid_cnt++;
 }
 }
@@ -553,8 +554,8 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
  invalid_cnt == dmaru-scope.devices_cnt )
 {
 dprintk(XENLOG_WARNING VTDPREFIX,
-  Workaround BIOS bug: ignore the DRHD due to all 
-devices under its scope are not PCI discoverable!\n);
+  Workaround BIOS bug: ignore the DRHD

[Xen-devel] [PATCH v8 0/4] iommu: add rmrr Xen command line option

2015-06-30 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

v8 of the rmrr command line patches.

Add Xen command line option rmrr to specify RMRR
regions for devices that are not defined in ACPI, thus
causing IO Page Faults while booting dom0 in PVH mode.
These additional regions will be added to the list of
RMRR regions parsed from ACPI.

Changes in v8:
 - removed bogus debug in patch 1 with non-functional changes;
 - changed the PRI_RMRRL formatting macro to reflect the fact that two
   arguments are used, so it is now PRI_RMRR(s,e) for formatting an inclusive
   RMRR range; 'L' is also removed from the macro name, where it was meant to
   serve as the type of the arguments (%lx);
 - added overlapping check with RMRRs from ACPI;
 - added check based on paddr_bits for pfns in extra RMRR range (not sure if
   it's redundant with mfn_valid);
 - addressed while loop exit condition in extra RMRRs parser;
  
Changes in v7:
 - make sure RMRR ranges are being checked correctly;
 - don't interrupt RMRR checking if some of the checks fail, instead
   continue to the next RMRR;
 - make rmrr variable names more obvious;
 - fix debug output formatting to match type of rmrr range;
 - fix typos in rmrr command line document and in comments;

Changes in v6:
 - make __parse_pci return correct result and error codes;
 - move add_extra_rmrr
 - previous patch was missing RMRR addresses in range check, add it here;
 - add overlap check and range boundaries check;
 - moved extra rmrr structure definition to dmar.c;
 - change def_seg in __parse_pci type from int to bool_t;
 - change name for extra rmrr range to reflect that they now hold pfns;

Changes in v5:
 - make parse_pci a wrapper and add __parse_pci with additional def_seg param
   to identify if segment was specified;
 - make it possible not to define segment for each device within same rmrr;
 - limit number of pages for one RMRR to 16;
 - run mfn_valid check for every address in RMRR range;
 - add PCI_SBDF macro;
 - remove list for extra rmrrs as they are kept in static array;
  
Elena Ufimtseva (4):
  pci: add PCI_SBDF and PCI_SEG macros
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown |  13 ++
 xen/drivers/passthrough/vtd/dmar.c  | 360 
 xen/drivers/pci/pci.c   |  11 ++
 xen/include/xen/pci.h   |   5 +
 4 files changed, 311 insertions(+), 78 deletions(-)

-- 
2.1.3




Re: [Xen-devel] [PATCH 6/6] AMD-PVH: enable pvh if requirements met

2015-06-25 Thread Elena Ufimtseva
On Wed, Jun 24, 2015 at 02:41:54PM -0700, Mukesh Rathor wrote:
 On Wed, 24 Jun 2015 16:26:44 -0400
 Elena Ufimtseva elena.ufimts...@oracle.com wrote:
 
  On Wed, Jun 24, 2015 at 07:24:18PM +0100, Andrew Cooper wrote:
   On 24/06/15 08:49, Jan Beulich wrote:
On 24.06.15 at 04:34, boris.ostrov...@oracle.com wrote:
On 06/23/2015 08:30 AM, Jan Beulich wrote:
On 22.06.15 at 18:37, elena.ufimts...@oracle.com wrote:
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1444,6 +1444,9 @@ const struct hvm_function_table * __init
start_svm(void)
  svm_function_table.hap_capabilities =
HVM_HAP_SUPERPAGE_2MB | ((cpuid_edx(0x8001) 
0x0400) ? HVM_HAP_SUPERPAGE_1GB : 0); 
+if ( cpu_has_svm_npt   cpu_has_svm_decode )
+svm_function_table.pvh_supported = 1;
If svm_decode indeed is a prereq, then the earlier patch dealing
with the handle_mmio() invocations doesn't need to fiddle with
VMEXIT_INVLPG other than to maybe add a documenting ASSERT().
   
I am not sure we should require decode feature to be required
for PVH support. I can't remember exactly but I think this
feature was first introduced in family 15h so requiring it will
leave at least family 10h processors as not supporting PVH.
The question was why the dependency was added in the first place.
Indeed only fam 12, 15, and 16 have the field documented. Otoh
PVH isn't being supported universally on all VMX variants
either...
   
   Right, but this is a bug (feature?) of the current implementation
   and need fixing.
   
   There are no technical reasons to prevent PVH guests running in any
   case where an HVM guest currently runs.
   
   The only technical restriction I can think of is that a PVH hardware
   domain needs IOMMU support, but that is it.
   
  
  CCing Mukesh, maybe he will reply to as why that restriction is here.
 
 Hi Elena,
 
 Basically, the restriction was to allow AMD to come on par with intel and
 get phase I working on it. Then, I could just focus on handle_mmio for
 INS/OUTS for both intel and amd, and if supporting !svm_decode family
 of CPUs was important, then extend handle_mmio further...  
 
 http://xen-devel.narkive.com/liQjEoV2/rfh-amd-cr-intercept-for-lmsw-clts
 
 [In the absence of svm_decode, mov cr would need to go thru handle_mmio..]

Thanks Mukesh! 

 
 thanks,
 Mukesh
 
 
 



Re: [Xen-devel] [PATCH 6/6] AMD-PVH: enable pvh if requirements met

2015-06-24 Thread Elena Ufimtseva
On Wed, Jun 24, 2015 at 07:24:18PM +0100, Andrew Cooper wrote:
 On 24/06/15 08:49, Jan Beulich wrote:
  On 24.06.15 at 04:34, boris.ostrov...@oracle.com wrote:
  On 06/23/2015 08:30 AM, Jan Beulich wrote:
  On 22.06.15 at 18:37, elena.ufimts...@oracle.com wrote:
  --- a/xen/arch/x86/hvm/svm/svm.c
  +++ b/xen/arch/x86/hvm/svm/svm.c
  @@ -1444,6 +1444,9 @@ const struct hvm_function_table * __init
  start_svm(void)
svm_function_table.hap_capabilities = HVM_HAP_SUPERPAGE_2MB |
((cpuid_edx(0x8001)  0x0400) ? HVM_HAP_SUPERPAGE_1GB 
  : 0);

  +if ( cpu_has_svm_npt   cpu_has_svm_decode )
  +svm_function_table.pvh_supported = 1;
  If svm_decode indeed is a prereq, then the earlier patch dealing
  with the handle_mmio() invocations doesn't need to fiddle with
  VMEXIT_INVLPG other than to maybe add a documenting ASSERT().
 
  I am not sure we should require decode feature to be required for PVH 
  support. I can't remember exactly but I think this feature was first 
  introduced in family 15h so requiring it will leave at least family 10h 
  processors as not supporting PVH.
  The question was why the dependency was added in the first place.
  Indeed only fam 12, 15, and 16 have the field documented. Otoh
  PVH isn't being supported universally on all VMX variants either...
 
 Right, but this is a bug (feature?) of the current implementation and
 need fixing.
 
 There are no technical reasons to prevent PVH guests running in any case
 where an HVM guest currently runs.
 
 The only technical restriction I can think of is that a PVH hardware
 domain needs IOMMU support, but that is it.
 

CCing Mukesh; maybe he can explain why that restriction is here.

 ~Andrew



Re: [Xen-devel] [PATCH 1/6] pvh: domu construct vmcb 64 bit mode start

2015-06-24 Thread Elena Ufimtseva
On Tue, Jun 23, 2015 at 01:02:49PM +0100, Jan Beulich wrote:
> >>> On 22.06.15 at 18:37, <elena.ufimts...@oracle.com> wrote:
> > From: Elena Ufimtseva <elena.ufimts...@oracle.com>
> >
> > Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
>
> As long as this patch originally came from Mukesh, From: should
> reflect that imo. Once you made changes, your S-o-b would be
> required alongside his.

And once the changes are made, From: will not be Mukesh's email anymore?

> > --- a/xen/arch/x86/hvm/svm/vmcb.c
> > +++ b/xen/arch/x86/hvm/svm/vmcb.c
> > @@ -162,7 +162,12 @@ static int construct_vmcb(struct vcpu *v)
> >      vmcb->ds.attr.bytes = 0xc93;
> >      vmcb->fs.attr.bytes = 0xc93;
> >      vmcb->gs.attr.bytes = 0xc93;
> > -    vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
> > +
> > +    if ( is_pvh_vcpu(v) )
> > +        /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme. */
> > +        vmcb->cs.attr.bytes = 0xa9b;
> > +    else
> > +        vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
>
> With 32-bit support now actively being worked on, I don't think
> we want to see any new 32bitfixme-s proposed to go in. Plus it
> needs settling on whether the boot mode is to change for PVH.
>
> Jan
>



Re: [Xen-devel] [PATCH 1/6] pvh: domu construct vmcb 64 bit mode start

2015-06-24 Thread Elena Ufimtseva
On Tue, Jun 23, 2015 at 01:02:49PM +0100, Jan Beulich wrote:
> >>> On 22.06.15 at 18:37, <elena.ufimts...@oracle.com> wrote:
> > From: Elena Ufimtseva <elena.ufimts...@oracle.com>
> >
> > Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
>
> As long as this patch originally came from Mukesh, From: should
> reflect that imo. Once you made changes, your S-o-b would be
> required alongside his.
>
> > --- a/xen/arch/x86/hvm/svm/vmcb.c
> > +++ b/xen/arch/x86/hvm/svm/vmcb.c
> > @@ -162,7 +162,12 @@ static int construct_vmcb(struct vcpu *v)
> >      vmcb->ds.attr.bytes = 0xc93;
> >      vmcb->fs.attr.bytes = 0xc93;
> >      vmcb->gs.attr.bytes = 0xc93;
> > -    vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
> > +
> > +    if ( is_pvh_vcpu(v) )
> > +        /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme. */
> > +        vmcb->cs.attr.bytes = 0xa9b;
> > +    else
> > +        vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
>
> With 32-bit support now actively being worked on, I don't think
> we want to see any new 32bitfixme-s proposed to go in. Plus it
> needs settling on whether the boot mode is to change for PVH.

Yep, I will work on this with Boris' patches in mind.

>
> Jan
>



[Xen-devel] [PATCH 3/6] AMD-PVH: call hvm_emulate_one instead of handle_mmio

2015-06-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Certain IOIO instructions and CR access instructions like
lmsw/clts etc. need to be emulated. handle_mmio is incorrectly called to
accomplish this. Create svm_emulate() to call hvm_emulate_one, which is more
appropriate and works for pvh as well. The handle_mmio call is
forbidden for pvh.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 28792fe..e7262c9 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2289,6 +2289,23 @@ static struct hvm_function_table __initdata svm_function_table = {
     .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m,
 };
 
+static void svm_emulate(struct cpu_user_regs *regs)
+{
+    int rc;
+    struct hvm_emulate_ctxt ctxt;
+
+    hvm_emulate_prepare(&ctxt, regs);
+    rc = hvm_emulate_one(&ctxt);
+
+    if ( rc != X86EMUL_OKAY )
+    {
+       if ( ctxt.exn_pending )
+           hvm_inject_trap(&ctxt.trap);
+       else
+           hvm_inject_hw_exception(TRAP_gp_fault, 0);
+    }
+}
+
 void svm_vmexit_handler(struct cpu_user_regs *regs)
 {
     uint64_t exit_reason;
@@ -2555,16 +2572,16 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
             if ( handle_pio(port, bytes, dir) )
                 __update_guest_eip(regs, vmcb->exitinfo2 - vmcb->rip);
         }
-        else if ( !handle_mmio() )
-            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        else
+            svm_emulate(regs);
         break;
 
     case VMEXIT_CR0_READ ... VMEXIT_CR15_READ:
     case VMEXIT_CR0_WRITE ... VMEXIT_CR15_WRITE:
         if ( cpu_has_svm_decode && (vmcb->exitinfo1 & (1ULL << 63)) )
             svm_vmexit_do_cr_access(vmcb, regs);
-        else if ( !handle_mmio() )
-            hvm_inject_hw_exception(TRAP_gp_fault, 0);
+        else
+            svm_emulate(regs);
         break;
 
 case VMEXIT_INVLPG:
@@ -2575,6 +2592,8 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
 }
 else if ( !handle_mmio() )
 hvm_inject_hw_exception(TRAP_gp_fault, 0);
+   else
+svm_emulate(regs);
 break;
 
 case VMEXIT_INVLPGA:
-- 
1.9.3
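
The fallback rule in svm_emulate() can be isolated from the hypervisor for
illustration: if emulation fails, inject the exception the emulator left
pending, otherwise inject #GP. Everything below is a stand-in written for this
sketch (the enum, the struct, and vector_to_inject are hypothetical); only the
decision order follows the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the X86EMUL_* return codes; values are illustrative. */
enum emul_rc { EMUL_OKAY, EMUL_UNHANDLEABLE };

#define VECTOR_GP_FAULT 13   /* x86 general protection fault vector */
#define NO_INJECTION    (-1)

/* Outcome of one emulation attempt, reduced to what the decision needs. */
struct emul_result {
    enum emul_rc rc;
    bool exn_pending;     /* emulator queued an exception of its own */
    int pending_vector;   /* valid only when exn_pending is true */
};

/*
 * Return the exception vector to inject after an emulation attempt,
 * or NO_INJECTION when emulation succeeded.
 */
static int vector_to_inject(const struct emul_result *res)
{
    assert(res != NULL);

    if ( res->rc == EMUL_OKAY )
        return NO_INJECTION;

    /* Prefer the emulator's own pending exception over a generic #GP. */
    return res->exn_pending ? res->pending_vector : VECTOR_GP_FAULT;
}
```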




[Xen-devel] [PATCH 4/6] AMD-PVH: Do not get/set vlapic TPR

2015-06-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

PVH doesn't use apic emulation, hence the vlapic->regs ptr is not set for it.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index e7262c9..64d22fe 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1059,7 +1059,7 @@ static void noreturn svm_do_resume(struct vcpu *v)
         hvm_asid_flush_vcpu(v);
     }
 
-    if ( !vcpu_guestmode )
+    if ( !vcpu_guestmode && !is_pvh_domain(v->domain) )
     {
         vintr_t intr;
 
@@ -2332,7 +2332,7 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
      * NB. We need to preserve the low bits of the TPR to make checked builds
      * of Windows work, even though they don't actually do anything.
      */
-    if ( !vcpu_guestmode ) {
+    if ( !vcpu_guestmode && !is_pvh_domain(v->domain) ) {
         intr = vmcb_get_vintr(vmcb);
         vlapic_set_reg(vcpu_vlapic(v), APIC_TASKPRI,
                       ((intr.fields.tpr & 0x0F) << 4) |
@@ -2720,15 +2720,18 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
     }
 
   out:
-    if ( vcpu_guestmode )
-        /* Don't clobber TPR of the nested guest. */
-        return;
-
-    /* The exit may have updated the TPR: reflect this in the hardware vtpr */
-    intr = vmcb_get_vintr(vmcb);
-    intr.fields.tpr =
-        (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0xFF) >> 4;
-    vmcb_set_vintr(vmcb, intr);
+    /* Don't clobber TPR of the nested guest. */
+    if ( vcpu_guestmode && !is_pvh_domain(v->domain) )
+    {
+        /*
+         * The exit may have updated the TPR: reflect this in the hardware
+         * vtpr.
+         */
+        intr = vmcb_get_vintr(vmcb);
+        intr.fields.tpr =
+            (vlapic_get_reg(vcpu_vlapic(v), APIC_TASKPRI) & 0xFF) >> 4;
+        vmcb_set_vintr(vmcb, intr);
+    }
 }
 
 void svm_trace_vmentry(void)
-- 
1.9.3
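
For readers unfamiliar with the TPR handling touched by this patch, the two
conversions between the 8-bit APIC task priority register and the 4-bit
virtual TPR field can be sketched on their own. The helper names are
hypothetical; the bit manipulation mirrors the expressions visible in the
hunks above, including keeping the low nibble of the APIC value.

```c
#include <assert.h>
#include <stdint.h>

/* APIC TPR (8 significant bits) to the 4-bit virtual TPR field. */
static inline uint8_t tpr_to_vtpr(uint32_t apic_tpr)
{
    return (uint8_t)((apic_tpr & 0xFF) >> 4);
}

/*
 * 4-bit virtual TPR back to an APIC TPR value. The low four bits of the
 * previous APIC value are preserved, as the comment in the patch context
 * requires for checked builds of Windows.
 */
static inline uint32_t vtpr_to_tpr(uint8_t vtpr, uint32_t old_apic_tpr)
{
    return ((uint32_t)(vtpr & 0x0F) << 4) | (old_apic_tpr & 0x0F);
}
```

A round trip through both helpers returns the original 8-bit value as long as
the old low nibble is passed back in.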




[Xen-devel] [PATCH 0/6] AMD-PVH: DomU support

2015-06-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

This is a re-spin of the patches for AMD PVH DomU from Mukesh Rathor.
As I am diving into more details of AMD PVH, I am reposting his series
with the minor changes that reviewers (Jan and Boris) asked for in comments.

The issue with handle_mmio is not yet addressed and I would like to
continue the discussion Mukesh and Jan previously had in this thread:
http://lists.xen.org/archives/html/xen-devel/2014-08/msg01760.html
The latest proposed solution was to create an additional x86_emulate_ops
structure that will handle pvh mmio correctly.
Should I consider this the approach I should be working on?

In comments on the vmcb construction patch, Roger suggested adding an
additional parameter to vcpu_initialise as 32-bit work is in. Since Boris has
posted 32-bit pvh domU support, that would be changed and I wanted to
see if this is what everyone agrees on.

Any other ideas/comments are also appreciated.
Thank you.

Changes made in this re-post:
 - left out setting the LMA bit in construct_vmcb as it's done in
   hvm_vcpu_initialise;
 - instead of checking if the regs ptr is set in vcpu_vlapic, check whether
   it is a pvh domain;


Elena Ufimtseva (6):
  pvh: domu construct vmcb 64 bit mode start
  AMD-PVH: cpuid intercept
  AMD-PVH: call hvm_emulate_one instead of handle_mmio
  AMD-PVH: Do not get/set vlapic TPR
  AMD-PVH: Support TSC_MODE_NEVER_EMULATE for PVH
  AMD-PVH: enable pvh if requirements met

 xen/arch/x86/hvm/svm/svm.c  | 80 ++---
 xen/arch/x86/hvm/svm/vmcb.c | 16 +++--
 xen/arch/x86/time.c |  1 +
 3 files changed, 68 insertions(+), 29 deletions(-)

-- 
1.9.3




[Xen-devel] [PATCH 5/6] AMD-PVH: Support TSC_MODE_NEVER_EMULATE for PVH

2015-06-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

On AMD, MSR_AMD64_TSC_RATIO must be set for the rdtsc instruction in the guest
to properly read the cpu tsc. To that end, set tsc_khz in struct domain.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/time.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index bbb7e6c..d9709ce 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1923,6 +1923,7 @@ void tsc_set_info(struct domain *d,
          * but always_emulate does not for some reason.  Figure out
          * why.
          */
+        d->arch.tsc_khz = cpu_khz;
         switch ( tsc_mode )
         {
         case TSC_MODE_NEVER_EMULATE:
-- 
1.9.3




[Xen-devel] [PATCH 6/6] AMD-PVH: enable pvh if requirements met

2015-06-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Finally, enable pvh if the cpu supports NPT and svm decode.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 64d22fe..9945550 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1444,6 +1444,9 @@ const struct hvm_function_table * __init start_svm(void)
     svm_function_table.hap_capabilities = HVM_HAP_SUPERPAGE_2MB |
         ((cpuid_edx(0x80000001) & 0x04000000) ? HVM_HAP_SUPERPAGE_1GB : 0);
 
+    if ( cpu_has_svm_npt && cpu_has_svm_decode )
+        svm_function_table.pvh_supported = 1;
+
     return &svm_function_table;
 }
 
-- 
1.9.3




[Xen-devel] [PATCH 2/6] AMD-PVH: cpuid intercept

2015-06-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Call pv_cpuid for the pvh cpuid intercept. Note that we modify
svm_vmexit_do_cpuid instead of the intercept switch because the guest
eip needs to be adjusted for pvh as well.

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/hvm/svm/svm.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 6734fb6..28792fe 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1584,19 +1584,22 @@ static void svm_vmexit_do_cpuid(struct cpu_user_regs *regs)
 
     if ( (inst_len = __get_instruction_length(current, INSTR_CPUID)) == 0 )
         return;
+    if ( is_pvh_vcpu(current) )
+        pv_cpuid(regs);
+    else
+    {
+        eax = regs->eax;
+        ebx = regs->ebx;
+        ecx = regs->ecx;
+        edx = regs->edx;
 
-    eax = regs->eax;
-    ebx = regs->ebx;
-    ecx = regs->ecx;
-    edx = regs->edx;
-
-    svm_cpuid_intercept(&eax, &ebx, &ecx, &edx);
-
-    regs->eax = eax;
-    regs->ebx = ebx;
-    regs->ecx = ecx;
-    regs->edx = edx;
+        svm_cpuid_intercept(&eax, &ebx, &ecx, &edx);
 
+        regs->eax = eax;
+        regs->ebx = ebx;
+        regs->ecx = ecx;
+        regs->edx = edx;
+    }
     __update_guest_eip(regs, inst_len);
 }
 
-- 
1.9.3




[Xen-devel] [PATCH 1/6] pvh: domu construct vmcb 64 bit mode start

2015-06-22 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Signed-off-by: Mukesh Rathor <mukesh.rat...@oracle.com>
---
 xen/arch/x86/hvm/svm/vmcb.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 6339d2a..70a6588 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -162,7 +162,12 @@ static int construct_vmcb(struct vcpu *v)
     vmcb->ds.attr.bytes = 0xc93;
     vmcb->fs.attr.bytes = 0xc93;
     vmcb->gs.attr.bytes = 0xc93;
-    vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
+
+    if ( is_pvh_vcpu(v) )
+        /* CS.L == 1, exec, read/write, accessed. PVH 32bitfixme. */
+        vmcb->cs.attr.bytes = 0xa9b;
+    else
+        vmcb->cs.attr.bytes = 0xc9b; /* exec/read, accessed */
 
     /* Guest IDT. */
     vmcb->idtr.base = 0;
@@ -184,12 +189,17 @@ static int construct_vmcb(struct vcpu *v)
     vmcb->tr.limit = 0xff;
 
     v->arch.hvm_vcpu.guest_cr[0] = X86_CR0_PE | X86_CR0_ET;
+    /* PVH domains start in paging mode */
+    if ( is_pvh_vcpu(v) )
+        v->arch.hvm_vcpu.guest_cr[0] |= X86_CR0_PG;
     hvm_update_guest_cr(v, 0);
 
-    v->arch.hvm_vcpu.guest_cr[4] = 0;
+    v->arch.hvm_vcpu.guest_cr[4] = is_pvh_vcpu(v) ? X86_CR4_PAE : 0;
     hvm_update_guest_cr(v, 4);
 
-    paging_update_paging_modes(v);
+    /* For pvh, paging mode is updated by arch_set_info_guest(). */
+    if ( is_hvm_vcpu(v) )
+        paging_update_paging_modes(v);
 
     vmcb->_exception_intercepts =
         HVM_TRAP_MASK
-- 
1.9.3
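
The difference between the two CS attribute values in this patch is easier to
see when the 12-bit field is decoded. The sketch below assumes the usual VMCB
segment attribute packing (type in bits 0-3, then S, DPL, P, AVL, L, D/B, G);
struct seg_attr and decode_seg_attr are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a 12-bit segment attribute value; names are illustrative. */
struct seg_attr {
    unsigned int type;  /* bits 0-3: segment type */
    bool s;             /* bit 4: code/data (not system) segment */
    unsigned int dpl;   /* bits 5-6: descriptor privilege level */
    bool p;             /* bit 7: present */
    bool avl;           /* bit 8: available to software */
    bool l;             /* bit 9: 64-bit code segment */
    bool db;            /* bit 10: default operand size */
    bool g;             /* bit 11: granularity */
};

static struct seg_attr decode_seg_attr(uint16_t bytes)
{
    struct seg_attr a;

    a.type = bytes & 0xf;
    a.s    = (bytes >> 4) & 1;
    a.dpl  = (bytes >> 5) & 3;
    a.p    = (bytes >> 7) & 1;
    a.avl  = (bytes >> 8) & 1;
    a.l    = (bytes >> 9) & 1;
    a.db   = (bytes >> 10) & 1;
    a.g    = (bytes >> 11) & 1;
    return a;
}
```

Under that layout, 0xc9b decodes to L=0, D/B=1 (a 32-bit code segment) and
0xa9b to L=1, D/B=0 (a 64-bit code segment); all other fields are identical.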




Re: [Xen-devel] RFC: making the PVH 64bit ABI as stable

2015-06-10 Thread Elena Ufimtseva
On Fri, Jun 05, 2015 at 05:29:21PM +0100, Ian Campbell wrote:
 On Wed, 2015-06-03 at 09:35 -0400, Boris Ostrovsky wrote:
   What I'm hearing from the x86 maintainers is that this is actually a
   high priority and not a nice to have cleanup.
  
   I picked 32-bit support, Elena is looking into AMD
   With the TODOs + these 2 being the things which the x86 maintainers have
   highlighted in this thread as being most critical for marking the ABI as
   stable (or at least moving experimental-tech preview) let me ask
   explicitly:
  
What are the current time frames on these two items?
  
  For 32-bit support, just to get it to work in the within current 
  framework I think can be done for 4.7 release (which is late this year 
  IIRC).
  
  I can't tell you how much it will take to make it a part of a unified 
  32/64-bit guest launching as I haven't looked at this at all yet.
 
 Thanks. What about AMD support then? Elena?

Hi Ian.
I am working on debugging PVH on AMD and it looks like it's moving, slowly.
Not sure how many other issues will come up along the way, but hopefully
a similar timeframe, i.e. late this year.

 
 Ian.
 



[Xen-devel] [PATCH v7 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-06-02 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/include/xen/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..414106a 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
+#define PCI_SEG(sbdf) (((sbdf) >> 16) & 0xffff)
 
 struct pci_dev_info {
 bool_t is_extfn;
-- 
2.1.3




[Xen-devel] [PATCH v7 0/4] iommu: add rmrr Xen command line option

2015-06-02 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

v7 of the rmrr command line patches.
Thank you for the comments on v6.

Add Xen command line option rmrr to specify RMRR
regions for devices that are not defined in ACPI, thus
causing IO Page Faults while booting dom0 in PVH mode.
These additional regions will be added to the list of
RMRR regions parsed from ACPI.

Changes in v7:
 - make sure RMRR ranges are being checked correctly;
 - don't interrupt RMRR checking if some of the checks fail, instead
   continue to the next RMRR;
 - make rmrr variable names more obvious;
 - fix debug output formatting to match type of rmrr range;
 - fix typos in rmrr command line document and in comments;

Changes in v6:
 - make __parse_pci return correct result and error codes;
 - move add_extra_rmrr
 - previous patch was missing RMRR addresses in range check, add it here;
 - add overlap check and range boundaries check;
 - moved extra rmrr structure definition to dmar.c;
 - change def_seg in __parse_pci type from int to bool_t;
 - change name for extra rmrr range to reflect that they now hold pfns;

Changes in v5:
 - make parse_pci a wrapper and add __parse_pci with additional def_seg param
   to identify if segment was specified;
 - make it possible not to define segment for each device within same rmrr;
 - limit number of pages for one RMRR to 16;
 - run mfn_valid check for every address in RMRR range;
 - add PCI_SBDF macro;
 - remove list for extra rmrrs as they are kept in static array;

Changes in v4 after comments by Jan Beulich:
 - keep sbdf per device instead of bdf and one segment per RMRR when parsing,
   and compare later;
 - add check for segment values and make sure they are the same for one RMRR;
 - move RMRR parameter checks and add error messages if RMRRs are incorrect;
 - make relevant variables and functions static;
 - mention requirement for segment values in rmrr documentation;

Changes in v3:
 - use ';' instead of '#' in command line and add proper notes for grub's
   special treatment of ';';

Changes in v2:
 - move rmrr parser to dmar.c and make it custom_param;
 - change the rmrr command line option format; supporting multiple devices
   per range needs more special characters, and since the ';' offered in the
   previous review is not supported, '[' and ']' are reserved, and ':' is used
   in the pci format, range and devices are separated by '#'; suggestions are
   welcome;
 - added support for multiple devices per range;
 - moved adding misc RMRRs before ACPI RMRR parsing;
 - make parser fail if pci device is specified incorrectly;

Elena Ufimtseva (4):
  pci: add PCI_SBDF and PCI_SEG macros
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown |  12 ++
 xen/drivers/passthrough/vtd/dmar.c  | 313 +---
 xen/drivers/pci/pci.c   |  11 ++
 xen/include/xen/pci.h   |   5 +
 4 files changed, 279 insertions(+), 62 deletions(-)

-- 
2.1.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3] dmar: device scope mem leak fix

2015-06-02 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Third attempt to incorporate memory leak fix.
Thanks for comment on v2.

Release memory allocated for scope.devices when disabling
dmar units. Also set the device count only after the memory
allocation when parsing the device scope.
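
For illustration only (not the patch itself): a minimal user-space sketch of the pattern described above, with calloc/free standing in for xzalloc_array/xfree. The struct scope type and the scope_alloc helper are hypothetical; scope_devices_free mirrors the helper added in v3, except that resetting the pointer to NULL is an extra precaution that is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dmar_scope. */
struct scope {
    uint16_t *devices;
    int devices_cnt;
};

/* Zero the counter and release the array together, so they cannot drift. */
static void scope_devices_free(struct scope *scope)
{
    scope->devices_cnt = 0;
    free(scope->devices);
    scope->devices = NULL;  /* not in the patch; added for this sketch */
}

/* Publish the count only once the allocation is known to have succeeded. */
static int scope_alloc(struct scope *scope, int cnt)
{
    if ( cnt < 0 )
        return cnt;

    if ( cnt > 0 )
    {
        scope->devices = calloc(cnt, sizeof(*scope->devices));
        if ( !scope->devices )
            return -ENOMEM;
    }
    scope->devices_cnt = cnt;

    return 0;
}
```

Setting the count after the allocation means an -ENOMEM return never leaves a non-zero count next to a NULL array.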

Changes in v3:
 - make freeing memory for scope devices and zeroing device counter
 a function and use it;
 - make sure parse_one_rmrr has memory leak fix in this patch;
 - make sure ret values are not lost in acpi_parse_one_drhd;

Changes in v2:
 - release memory for devices scope on error paths in acpi_parse_one_drhd
 and acpi_parse_one_atsr and set the count to zero;

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/drivers/passthrough/vtd/dmar.c | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..a675bf7 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -81,6 +81,12 @@ static int __init acpi_register_rmrr_unit(struct acpi_rmrr_unit *rmrr)
     return 0;
 }
 
+static void scope_devices_free(struct dmar_scope *scope)
+{
+    scope->devices_cnt = 0;
+    xfree(scope->devices);
+}
+
 static void __init disable_all_dmar_units(void)
 {
     struct acpi_drhd_unit *drhd, *_drhd;
@@ -90,16 +96,19 @@ static void __init disable_all_dmar_units(void)
     list_for_each_entry_safe ( drhd, _drhd, &acpi_drhd_units, list )
     {
         list_del(&drhd->list);
+        scope_devices_free(&drhd->scope);
         xfree(drhd);
     }
     list_for_each_entry_safe ( rmrr, _rmrr, &acpi_rmrr_units, list )
     {
         list_del(&rmrr->list);
+        scope_devices_free(&rmrr->scope);
         xfree(rmrr);
     }
     list_for_each_entry_safe ( atsr, _atsr, &acpi_atsr_units, list )
     {
         list_del(&atsr->list);
+        scope_devices_free(&atsr->scope);
         xfree(atsr);
     }
 }
@@ -318,13 +327,13 @@ static int __init acpi_parse_dev_scope(
     if ( (cnt = scope_device_count(start, end)) < 0 )
         return cnt;
 
-    scope->devices_cnt = cnt;
     if ( cnt > 0 )
     {
         scope->devices = xzalloc_array(u16, cnt);
         if ( !scope->devices )
             return -ENOMEM;
     }
+    scope->devices_cnt = cnt;
 
     while ( start < end )
     {
@@ -427,7 +436,7 @@ static int __init acpi_parse_dev_scope(
 
  out:
     if ( ret )
-        xfree(scope->devices);
+        scope_devices_free(scope);
 
     return ret;
 }
@@ -476,11 +485,6 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
     if ( ret )
         goto out;
 
-    dev_scope_start = (void *)(drhd + 1);
-    dev_scope_end = ((void *)drhd) + header->length;
-    ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
-                               &dmaru->scope, DMAR_TYPE, drhd->segment);
-
     if ( dmaru->include_all )
     {
         if ( iommu_verbose )
@@ -495,7 +499,13 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
         if ( drhd->segment == 0 )
             include_all = 1;
     }
+    if ( ret )
+        goto out;
 
+    dev_scope_start = (void *)(drhd + 1);
+    dev_scope_end = ((void *)drhd) + header->length;
+    ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
+                               &dmaru->scope, DMAR_TYPE, drhd->segment);
     if ( ret )
         goto out;
     else if ( force_iommu || dmaru->include_all )
@@ -542,6 +552,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
                     "  Workaround BIOS bug: ignore the DRHD due to all "
                     "devices under its scope are not PCI discoverable!\n");
 
+                scope_devices_free(&dmaru->scope);
                 iommu_free(dmaru);
                 xfree(dmaru);
             }
@@ -552,6 +563,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
                     "its scope are not PCI discoverable! Pls try option "
                     "iommu=force or iommu=workaround_bios_bug if you "
                     "really want VT-d\n");
+                scope_devices_free(&dmaru->scope);
                 ret = -EINVAL;
             }
         }
@@ -565,6 +577,7 @@ out:
         iommu_free(dmaru);
         xfree(dmaru);
     }
+
     return ret;
 }
 
@@ -658,6 +671,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
                 "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
                 "devices under its scope are not PCI discoverable!\n",
                 rmrru->base_address, rmrru->end_address);
+            scope_devices_free(&rmrru->scope);
             xfree(rmrru);
         }
         else if ( base_addr > end_addr )
@@ -665,6 +679,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
             dprintk(XENLOG_WARNING VTDPREFIX,
                 "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
                 rmrru->base_address, rmrru->end_address);
+            scope_devices_free(&rmrru->scope);
             xfree(rmrru);
             ret = -EFAULT;
         }
@@ -727,7 +742,10

[Xen-devel] [PATCH v7 4/4] iommu: add rmrr Xen command line option for extra rmrrs

2015-06-02 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

On some platforms RMRR regions may not be specified
in ACPI and thus will not be mapped 1:1 in dom0. This
causes IO Page Faults and prevents dom0 from booting
in PVH mode.
The new Xen command line option rmrr allows specifying
such devices and memory regions. These regions are added
to the list of RMRRs defined in ACPI if the device
is present in the system. As a result, the additional RMRRs
will be mapped 1:1 in dom0 with correct permissions.

The problems mentioned above were discovered during PVH work with
ThinkCentre M and Dell 5600T. No official documentation
was found so far on which devices cause this and why.
Experiments show that ThinkCentre M USB devices with the debug
port enabled generate DMA read transactions to regions of
memory marked reserved in the host e820 map.
For Dell 5600T the device and faulting addresses have not been found yet.

For a detailed history of the discussion please check the following threads:
http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html

Format for rmrr Xen command line option:
rmrr=start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]]
If grub2 is used and multiple ranges are specified, ';' should be
quoted/escaped; refer to the grub2 manual for more information.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 docs/misc/xen-command-line.markdown |  12 +++
 xen/drivers/passthrough/vtd/dmar.c  | 183 +++-
 2 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 4889e27..d2f0668 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1185,6 +1185,18 @@ Specify the host reboot method.
 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
  default it will use that method first).
 
+### rmrr
+> '= start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]]
+
+Define RMRRs units that are missing from ACPI table along with device they
+belong to and use them for 1:1 mapping. End addresses can be omitted and one
+page will be mapped. The ranges are inclusive when start and end are specified.
+If segment of the first device is not specified, segment zero will be used.
+If other segments are not specified, first device segment will be used.
+If segments are specified for every device and not equal, an error will be reported.
+Note: grub2 requires to escape or use quotations if special characters are used,
+namely ';', refer to the grub2 documentation if multiple ranges are specified.
+
 ### ro-hpet
 > `= <boolean>`
 
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 5d78a37..857373f 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -869,6 +869,120 @@ out:
     return ret;
 }
 
+#define MAX_EXTRA_RMRR_PAGES 16
+#define MAX_EXTRA_RMRR 10
+
+/* RMRR units derived from command line rmrr option */
+#define MAX_EXTRA_RMRR_DEV 20
+struct extra_rmrr_unit {
+    struct list_head list;
+    unsigned long base_pfn, end_pfn;
+    u16    dev_count;
+    u32    sbdf[MAX_EXTRA_RMRR_DEV];
+};
+static __initdata unsigned int nr_rmrr;
+static struct __initdata extra_rmrr_unit extra_rmrr_units[MAX_EXTRA_RMRR];
+
+#define PRI_RMRRL "[%lx - %lx]"
+static void __init add_extra_rmrr(void)
+{
+    struct acpi_rmrr_unit *acpi_rmrr;
+    unsigned int dev, seg, i, j;
+    unsigned long pfn;
+
+    for ( i = 0; i < nr_rmrr; i++ )
+    {
+        if ( extra_rmrr_units[i].base_pfn > extra_rmrr_units[i].end_pfn )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "Start pfn > end pfn for RMRR range "PRI_RMRRL"\n",
+                   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+            continue;
+        }
+
+        if ( extra_rmrr_units[i].end_pfn - extra_rmrr_units[i].base_pfn >= MAX_EXTRA_RMRR_PAGES )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "RMRR range exceeds %s pages "PRI_RMRRL"\n",__stringify(MAX_EXTRA_RMRR_PAGES),
+                   extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn);
+            continue;
+        }
+
+        for ( j = 0; j < nr_rmrr; j++ )
+        {
+            if ( i != j && extra_rmrr_units[i].base_pfn <= extra_rmrr_units[j].end_pfn &&
+                 extra_rmrr_units[j].base_pfn <= extra_rmrr_units[i].end_pfn )
+            {
+                printk(XENLOG_ERR VTDPREFIX
+                      "Overlapping RMRRs "PRI_RMRRL" and "PRI_RMRRL"\n",
+                      extra_rmrr_units[i].base_pfn, extra_rmrr_units[i].end_pfn,
+                      extra_rmrr_units[j].base_pfn, extra_rmrr_units[j].end_pfn);
+                break;
+            }
+        }
+        /* Broke out of the overlap loop check, continue with next rmrr. */
+        if ( j < nr_rmrr

[Xen-devel] [PATCH v7 2/4] iommu VT-d: separate rmrr addition function

2015-06-02 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

In preparation for auxiliary RMRR data provided on Xen
command line, make RMRR adding a separate function.
Also free memory for rmrr device scope in error path.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/drivers/passthrough/vtd/dmar.c | 130 -
 1 file changed, 69 insertions(+), 61 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index a675bf7..5d78a37 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -581,6 +581,72 @@ out:
     return ret;
 }
 
+static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
+{
+    bool_t ignore = 0;
+    unsigned int i = 0;
+    int ret = 0;
+
+    /* Skip checking if segment is not accessible yet. */
+    if ( !pci_known_segment(rmrru->segment) )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX, "UNKNOWN Prefix! %04x", rmrru->segment);
+        i = UINT_MAX;
+    }
+
+    for ( ; i < rmrru->scope.devices_cnt; i++ )
+    {
+        u8 b = PCI_BUS(rmrru->scope.devices[i]);
+        u8 d = PCI_SLOT(rmrru->scope.devices[i]);
+        u8 f = PCI_FUNC(rmrru->scope.devices[i]);
+
+        if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
+        {
+            dprintk(XENLOG_WARNING VTDPREFIX,
+                    " Non-existent device (%04x:%02x:%02x.%u) is reported"
+                    " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
+                    rmrru->segment, b, d, f,
+                    rmrru->base_address, rmrru->end_address);
+            ignore = 1;
+        }
+        else
+        {
+            ignore = 0;
+            break;
+        }
+    }
+
+    if ( ignore )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
+                "devices under its scope are not PCI discoverable!\n",
+                rmrru->base_address, rmrru->end_address);
+        scope_devices_free(&rmrru->scope);
+        xfree(rmrru);
+    }
+    else if ( rmrru->base_address > rmrru->end_address )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
+                rmrru->base_address, rmrru->end_address);
+        scope_devices_free(&rmrru->scope);
+        xfree(rmrru);
+        ret = -EFAULT;
+    }
+    else
+    {
+        if ( iommu_verbose )
+            dprintk(VTDPREFIX,
+                    "  RMRR region: base_addr %"PRIx64
+                    " end_address %"PRIx64"\n",
+                    rmrru->base_address, rmrru->end_address);
+        acpi_register_rmrr_unit(rmrru);
+    }
+
+    return ret;
+}
+
 static int __init
 acpi_parse_one_rmrr(struct acpi_dmar_header *header)
 {
@@ -631,68 +697,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
     ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
                                &rmrru->scope, RMRR_TYPE, rmrr->segment);
 
-    if ( ret || (rmrru->scope.devices_cnt == 0) )
-        xfree(rmrru);
+    if ( !ret && (rmrru->scope.devices_cnt != 0) )
+        register_one_rmrr(rmrru);
     else
-    {
-        u8 b, d, f;
-        bool_t ignore = 0;
-        unsigned int i = 0;
-
-        /* Skip checking if segment is not accessible yet. */
-        if ( !pci_known_segment(rmrr->segment) )
-            i = UINT_MAX;
-
-        for ( ; i < rmrru->scope.devices_cnt; i++ )
-        {
-            b = PCI_BUS(rmrru->scope.devices[i]);
-            d = PCI_SLOT(rmrru->scope.devices[i]);
-            f = PCI_FUNC(rmrru->scope.devices[i]);
-
-            if ( !pci_device_detect(rmrr->segment, b, d, f) )
-            {
-                dprintk(XENLOG_WARNING VTDPREFIX,
-                        " Non-existent device (%04x:%02x:%02x.%u) is reported"
-                        " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
-                        rmrr->segment, b, d, f,
-                        rmrru->base_address, rmrru->end_address);
-                ignore = 1;
-            }
-            else
-            {
-                ignore = 0;
-                break;
-            }
-        }
-
-        if ( ignore )
-        {
-            dprintk(XENLOG_WARNING VTDPREFIX,
-                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
-                "devices under its scope are not PCI discoverable!\n",
-                rmrru->base_address, rmrru->end_address);
-            scope_devices_free(&rmrru->scope);
-            xfree(rmrru);
-        }
-        else if ( base_addr > end_addr )
-        {
-            dprintk(XENLOG_WARNING VTDPREFIX,
-                "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
-                rmrru->base_address, rmrru->end_address);
-            scope_devices_free(&rmrru->scope);
-            xfree(rmrru);
-            ret = -EFAULT;
-        }
-        else
-        {
-            if ( iommu_verbose )
-                dprintk(VTDPREFIX,
-                    "  RMRR region: base_addr %"PRIx64
-                    " end_address %"PRIx64"\n

Re: [Xen-devel] [PATCH v6 2/4] iommu VT-d: separate rmrr addition function

2015-06-01 Thread Elena Ufimtseva
On Mon, Jun 01, 2015 at 04:51:55AM +0000, Tian, Kevin wrote:
> > From: elena.ufimts...@oracle.com [mailto:elena.ufimts...@oracle.com]
> > Sent: Saturday, May 30, 2015 5:39 AM
> >
> > From: Elena Ufimtseva <elena.ufimts...@oracle.com>
> >
> > In preparation for auxiliary RMRR data provided on Xen
> > command line, make RMRR adding a separate function.
> > Also free memory for rmrr device scope in error path.
> > No changes since v5.
> >
> > Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
> > Reviewed-by: Jan Beulich <jbeul...@suse.com>
> > ---
> >  xen/drivers/passthrough/vtd/dmar.c | 129
> > -
> >  1 file changed, 70 insertions(+), 59 deletions(-)
> >
> > diff --git a/xen/drivers/passthrough/vtd/dmar.c
> > b/xen/drivers/passthrough/vtd/dmar.c
> > index 0985150..89a2f79 100644
> > --- a/xen/drivers/passthrough/vtd/dmar.c
> > +++ b/xen/drivers/passthrough/vtd/dmar.c
> > @@ -576,6 +576,73 @@ out:
> >      return ret;
> >  }
> >
> > +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
> > +{
> > +    bool_t ignore = 0;
> > +    unsigned int i = 0;
> > +    int ret = 0;
> > +
> > +    /* Skip checking if segment is not accessible yet. */
> > +    if ( !pci_known_segment(rmrru->segment) )
> > +    {
> > +        dprintk(XENLOG_WARNING VTDPREFIX, "UNKNOWN Prefix! %04x",
> > rmrru->segment);
> > +        i = UINT_MAX;
> > +    }
> > +
> > +    for ( ; i < rmrru->scope.devices_cnt; i++ )
> > +    {
> > +        u8 b = PCI_BUS(rmrru->scope.devices[i]);
> > +        u8 d = PCI_SLOT(rmrru->scope.devices[i]);
> > +        u8 f = PCI_FUNC(rmrru->scope.devices[i]);
> > +
> > +        if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
> > +        {
> > +            dprintk(XENLOG_WARNING VTDPREFIX,
> > +                    " Non-existent device (%04x:%02x:%02x.%u) is
> > reported"
> > +                    " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
> > +                    rmrru->segment, b, d, f,
> > +                    rmrru->base_address, rmrru->end_address);
> > +            ignore = 1;
> > +        }
> > +        else
> > +        {
> > +            ignore = 0;
> > +            break;
> > +        }
> > +    }
> > +
> > +    if ( ignore )
> > +    {
> > +        dprintk(XENLOG_WARNING VTDPREFIX,
> > +                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
> > +                "devices under its scope are not PCI discoverable!\n",
> > +                rmrru->base_address, rmrru->end_address);
> > +        xfree(rmrru->scope.devices);
> > +        xfree(rmrru);
> > +        ret = -EFAULT;
> > +    }
> > +    else if ( rmrru->base_address > rmrru->end_address )
> > +    {
> > +        dprintk(XENLOG_WARNING VTDPREFIX,
> > +                "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
> > +                rmrru->base_address, rmrru->end_address);
> > +        xfree(rmrru->scope.devices);
> > +        xfree(rmrru);
> > +        ret = -EFAULT;
> > +    }
>
> above two error handling can be combined into one at the end of the
> func like in other places.
>
> Thanks
> Kevin

Hi Kevin

Thank you for the review.
I think in this case I cannot combine these two, as ret should not be
set in the first (ignore) branch. Looks like I placed it there by mistake.

Elena
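
For illustration only: the return-value behaviour discussed in this reply can be reduced to the small decision function below. The name classify_rmrr and its parameters are hypothetical and do not appear in the patch; the sketch only encodes the point that an ignored RMRR is not an error, while an inverted address range is.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reduction of the three outcomes of register_one_rmrr():
 *  - no device in scope is discoverable: drop the unit, return 0;
 *  - base address above end address:     drop the unit, return -EFAULT;
 *  - otherwise:                          register the unit, return 0.
 */
static int classify_rmrr(bool any_device_present, uint64_t base, uint64_t end,
                         bool *registered)
{
    *registered = false;

    if ( !any_device_present )
        return 0;            /* ignored, but not an error */

    if ( base > end )
        return -EFAULT;      /* malformed range is an error */

    *registered = true;
    return 0;
}
```

Because the two dropping branches return different values, they cannot share a single exit path that unconditionally sets an error code.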



Re: [Xen-devel] [PATCH v6 2/4] iommu VT-d: separate rmrr addition function

2015-06-01 Thread Elena Ufimtseva
On Mon, Jun 01, 2015 at 09:53:55AM +0100, Jan Beulich wrote:
> >>> On 29.05.15 at 23:38, <elena.ufimts...@oracle.com> wrote:
> > In preparation for auxiliary RMRR data provided on Xen
> > command line, make RMRR adding a separate function.
> > Also free memory for rmrr device scope in error path.
> > No changes since v5.
>
> Certainly there is. (And the statement wouldn't belong here anyway,
> but below the first --- separator.)
>
> > Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
> > Reviewed-by: Jan Beulich <jbeul...@suse.com>
>
> And certainly I didn't approve it in this shape:
>
> > +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
> > +{
> > +    bool_t ignore = 0;
> > +    unsigned int i = 0;
> > +    int ret = 0;
> > +
> > +    /* Skip checking if segment is not accessible yet. */
> > +    if ( !pci_known_segment(rmrru->segment) )
> > +    {
> > +        dprintk(XENLOG_WARNING VTDPREFIX, "UNKNOWN Prefix! %04x",
> > rmrru->segment);
> > +        i = UINT_MAX;
> > +    }
> > +
> > +    for ( ; i < rmrru->scope.devices_cnt; i++ )
> > +    {
> > +        u8 b = PCI_BUS(rmrru->scope.devices[i]);
> > +        u8 d = PCI_SLOT(rmrru->scope.devices[i]);
> > +        u8 f = PCI_FUNC(rmrru->scope.devices[i]);
> > +
> > +        if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
> > +        {
> > +            dprintk(XENLOG_WARNING VTDPREFIX,
> > +                    " Non-existent device (%04x:%02x:%02x.%u) is reported"
> > +                    " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
> > +                    rmrru->segment, b, d, f,
> > +                    rmrru->base_address, rmrru->end_address);
> > +            ignore = 1;
> > +        }
> > +        else
> > +        {
> > +            ignore = 0;
> > +            break;
> > +        }
> > +    }
> > +
> > +    if ( ignore )
> > +    {
> > +        dprintk(XENLOG_WARNING VTDPREFIX,
> > +                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
> > +                "devices under its scope are not PCI discoverable!\n",
> > +                rmrru->base_address, rmrru->end_address);
> > +        xfree(rmrru->scope.devices);
> > +        xfree(rmrru);
> > +        ret = -EFAULT;
>
> You _again_ made this an error, which it wasn't before. A little more
> care please.

Yes, and I agreed that it did not make sense to set ret here, wishful
typing I guess )

>
> Also you folded the leak fix into here without saying so. As said on
> the solitary leak fix patch - that change belongs there (not the least
> because we will want to backport that but not this one).

yes, changing this.

>
> Jan
>



Re: [Xen-devel] [PATCH v2] dmar: device scope mem leak fix

2015-06-01 Thread Elena Ufimtseva
On Mon, Jun 01, 2015 at 09:45:51AM +0100, Jan Beulich wrote:
> >>> On 01.06.15 at 06:47, <kevin.t...@intel.com> wrote:
> >> From: Tian, Kevin
> >> Sent: Monday, June 01, 2015 12:43 PM
> >>
> >>
> >> and looks you dropped earlier changes to acpi_parse_one_rmrr. any
> >> elaboration why it's not required in this version?
> >>
> > Never mind this one. Seems you have it in RMRR patch set.
>
> No - it belongs here, not there.
>
> Jan

Yes, Jan is right, it went into the rmrr patch, but it has to be here
in the mem leak fix.
 



[Xen-devel] [PATCH v2] dmar: device scope mem leak fix

2015-05-29 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Release memory allocated for scope.devices when disabling
dmar units. Also set the device count only after the memory
allocation when parsing the device scope.

Changes in v2:
 - release memory for devices scope on error paths in acpi_parse_one_drhd
 and acpi_parse_one_atsr and set the count to zero;

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/drivers/passthrough/vtd/dmar.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 2b07be9..0985150 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -90,16 +90,19 @@ static void __init disable_all_dmar_units(void)
     list_for_each_entry_safe ( drhd, _drhd, &acpi_drhd_units, list )
     {
         list_del(&drhd->list);
+        xfree(drhd->scope.devices);
         xfree(drhd);
     }
     list_for_each_entry_safe ( rmrr, _rmrr, &acpi_rmrr_units, list )
     {
         list_del(&rmrr->list);
+        xfree(rmrr->scope.devices);
         xfree(rmrr);
     }
     list_for_each_entry_safe ( atsr, _atsr, &acpi_atsr_units, list )
     {
         list_del(&atsr->list);
+        xfree(atsr->scope.devices);
         xfree(atsr);
     }
 }
@@ -318,13 +321,13 @@ static int __init acpi_parse_dev_scope(
     if ( (cnt = scope_device_count(start, end)) < 0 )
         return cnt;
 
-    scope->devices_cnt = cnt;
     if ( cnt > 0 )
     {
         scope->devices = xzalloc_array(u16, cnt);
         if ( !scope->devices )
             return -ENOMEM;
     }
+    scope->devices_cnt = cnt;
 
     while ( start < end )
     {
@@ -427,7 +430,10 @@ static int __init acpi_parse_dev_scope(
 
  out:
     if ( ret )
+    {
+        scope->devices_cnt = 0;
         xfree(scope->devices);
+    }
 
     return ret;
 }
@@ -478,8 +484,6 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
 
     dev_scope_start = (void *)(drhd + 1);
     dev_scope_end = ((void *)drhd) + header->length;
-    ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
-                               &dmaru->scope, DMAR_TYPE, drhd->segment);
 
     if ( dmaru->include_all )
     {
@@ -496,6 +500,8 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
             include_all = 1;
     }
 
+    ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
+                               &dmaru->scope, DMAR_TYPE, drhd->segment);
     if ( ret )
         goto out;
     else if ( force_iommu || dmaru->include_all )
@@ -554,6 +560,8 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
                     "really want VT-d\n");
                 ret = -EINVAL;
             }
+            dmaru->scope.devices_cnt = 0;
+            xfree(dmaru->scope.devices);
         }
         else
             acpi_register_drhd_unit(dmaru);
@@ -727,7 +735,11 @@ acpi_parse_one_atsr(struct acpi_dmar_header *header)
     }
 
     if ( ret )
+    {
+        atsru->scope.devices_cnt = 0;
+        xfree(atsru->scope.devices);
         xfree(atsru);
+    }
     else
         acpi_register_atsr_unit(atsru);
     return ret;
-- 
2.1.3




[Xen-devel] [PATCH 4/4] iommu: add rmrr Xen command line option for extra rmrrs

2015-05-29 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

On some platforms RMRR regions may not be specified
in ACPI and thus will not be mapped 1:1 in dom0. This
causes IO Page Faults and prevents dom0 from booting
in PVH mode.
The new Xen command line option rmrr allows specifying
such devices and memory regions. These regions are added
to the list of RMRRs defined in ACPI if the device
is present in the system. As a result, the additional RMRRs
will be mapped 1:1 in dom0 with correct permissions.

The problems mentioned above were discovered during PVH work with
ThinkCentre M and Dell 5600T. No official documentation
was found so far on which devices cause this and why.
Experiments show that ThinkCentre M USB devices with the debug
port enabled generate DMA read transactions to regions of
memory marked reserved in the host e820 map.
For Dell 5600T the device and faulting addresses have not been found yet.

For a detailed history of the discussion please check the following threads:
http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html

Format for rmrr Xen command line option:
rmrr=start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]]
If grub2 is used and multiple ranges are specified, ';' should be
quoted/escaped; refer to the grub2 manual for more information.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 docs/misc/xen-command-line.markdown |  13 +++
 xen/drivers/passthrough/vtd/dmar.c  | 164 +++-
 2 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 4889e27..26e2a5e 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1185,6 +1185,19 @@ Specify the host reboot method.
 'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
  default it will use that method first).
 
+### rmrr
+> '= start-end=[s1]bdf1[,[s1]bdf2[,...]];start-end=[s2]bdf1[,[s2]bdf2[,...]]
+
+Define RMRRs units that are missing from ACPI table along with device
+they belong to and use them for 1:1 mapping. End addresses can be omitted
+and one page will be mapped. The ranges are inclusive when start and end
+are specified.If segement of the first device is not specified, segment zero will be used.
+If other segments are not specified, first device segment will be used.
+If segments are specified for every device and not equal, an error will be reported.
+Note: grub2 requires to escape or use quotations if special
+characters are used, namely ';', refer to the grub2 documentation if multiple
+ranges are specified.
+
 ### ro-hpet
 > `= <boolean>`
 
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 89a2f79..d675940 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -866,6 +866,106 @@ out:
     return ret;
 }
 
+#define MAX_EXTRA_RMRR_PAGES 16
+#define MAX_EXTRA_RMRR 10
+
+/* RMRR units derived from command line rmrr option */
+#define MAX_EXTRA_RMRR_DEV 20
+struct extra_rmrr_unit {
+    struct list_head list;
+    unsigned long base_pfn, end_pfn;
+    u16    dev_count;
+    u32    sbdf[MAX_EXTRA_RMRR_DEV];
+};
+static __initdata unsigned int nr_rmrr;
+static struct __initdata extra_rmrr_unit rmrru[MAX_EXTRA_RMRR];
+
+static void __init add_extra_rmrr(void)
+{
+    struct acpi_rmrr_unit *rmrrn;
+    unsigned int dev, seg, i, j;
+    unsigned long pfn;
+
+    for ( i = 0; i < nr_rmrr; i++ )
+    {
+        if ( rmrru[i].base_pfn > rmrru[i].end_pfn )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "Start pfn > end pfn for RMRR range [%"PRIx64" - %"PRIx64"]\n",
+                   rmrru[i].base_pfn, rmrru[i].end_pfn);
+            return;
+        }
+
+        if ( rmrru[i].end_pfn - rmrru[i].base_pfn >= MAX_EXTRA_RMRR_PAGES )
+        {
+            printk(XENLOG_ERR VTDPREFIX
+                   "RMRR range exceeds 16 pages [%"PRIx64" - %"PRIx64"]\n",
+                   rmrru[i].base_pfn, rmrru[i].end_pfn);
+            return;
+        }
+
+        for ( j = 0; j < nr_rmrr; j++ )
+        {
+            if ( i != j && rmrru[i].base_pfn <= rmrru[j].end_pfn &&
+                 rmrru[j].base_pfn <= rmrru[i].end_pfn )
+            {
+                printk(XENLOG_ERR VTDPREFIX
+                      "Overlapping RMRRs [%"PRIx64",%"PRIx64"] and [%"PRIx64",%"PRIx64"]\n",
+                      rmrru[i].base_pfn, rmrru[i].end_pfn,
+                      rmrru[j].base_pfn, rmrru[j].end_pfn);
+                return;
+            }
+        }
+
+        for ( pfn = rmrru[i].base_pfn; pfn <= rmrru[i].end_pfn; pfn++ )
+        {
+if ( !mfn_valid(pfn) )
+if ( iommu_verbose )
+printk(XENLOG_ERR VTDPREFIX
+   Invalid mfn in RMRR range [%PRIx64 - 
%PRIx64]\n,
+   rmrru[i].base_pfn, rmrru[i].end_pfn);
+return

[Xen-devel] [PATCH v6 1/4] pci: add PCI_SBDF and PCI_SEG macros

2015-05-29 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/include/xen/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 3908146..414106a 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -33,6 +33,8 @@
 #define PCI_DEVFN2(bdf) ((bdf) & 0xff)
 #define PCI_BDF(b,d,f)  ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
 #define PCI_BDF2(b,df)  ((((b) & 0xff) << 8) | ((df) & 0xff))
+#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
+#define PCI_SEG(sbdf) (((sbdf) >> 16) & 0xffff)
 
 struct pci_dev_info {
 bool_t is_extfn;
-- 
2.1.3
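
For illustration only: a self-contained sketch of how the two new macros pack and unpack a segment together with the existing bus/device/function encoding. The macro bodies are copied here so the snippet compiles outside the Xen tree; the make_sbdf wrapper is a hypothetical helper that keeps the arguments unsigned so the 16-bit left shift of the segment stays well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the Xen macros, so the sketch builds on its own. */
#define PCI_BUS(bdf)      (((bdf) >> 8) & 0xff)
#define PCI_SLOT(bdf)     (((bdf) >> 3) & 0x1f)
#define PCI_FUNC(bdf)     ((bdf) & 0x07)
#define PCI_DEVFN(d,f)    ((((d) & 0x1f) << 3) | ((f) & 0x07))
#define PCI_BDF(b,d,f)    ((((b) & 0xff) << 8) | PCI_DEVFN(d,f))
#define PCI_SBDF(s,b,d,f) ((((s) & 0xffff) << 16) | PCI_BDF(b,d,f))
#define PCI_SEG(sbdf)     (((sbdf) >> 16) & 0xffff)

/*
 * Hypothetical wrapper: layout is seg[31:16] bus[15:8] dev[7:3] func[2:0].
 */
static uint32_t make_sbdf(unsigned int seg, unsigned int bus,
                          unsigned int dev, unsigned int func)
{
    return PCI_SBDF(seg, bus, dev, func);
}
```

The low 16 bits of the packed value are an ordinary bdf, so PCI_BUS, PCI_SLOT and PCI_FUNC keep working on it unchanged.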




[Xen-devel] [PATCH v6 3/4] pci: add wrapper for parse_pci

2015-05-29 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

For parsing sbdfs in the rmrr command line option, add __parse_pci with
an additional parameter def_seg. __parse_pci helps to identify whether
a segment was found in the string being parsed or the default segment
was used.
Make parse_pci a wrapper so the rest of the callers are not affected.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
---
 xen/drivers/pci/pci.c | 11 +++
 xen/include/xen/pci.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/xen/drivers/pci/pci.c b/xen/drivers/pci/pci.c
index ca07ed0..788a356 100644
--- a/xen/drivers/pci/pci.c
+++ b/xen/drivers/pci/pci.c
@@ -119,11 +119,21 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p)
 {
+    bool_t def_seg;
+
+    return __parse_pci(s, seg_p, bus_p, dev_p, func_p, &def_seg);
+}
+
+const char *__init __parse_pci(const char *s, unsigned int *seg_p,
+                             unsigned int *bus_p, unsigned int *dev_p,
+                             unsigned int *func_p, bool_t *def_seg)
+{
     unsigned long seg = simple_strtoul(s, &s, 16), bus, dev, func;
 
     if ( *s != ':' )
         return NULL;
     bus = simple_strtoul(s + 1, &s, 16);
+    *def_seg = 0;
     if ( *s == ':' )
         dev = simple_strtoul(s + 1, &s, 16);
     else
@@ -131,6 +141,7 @@ const char *__init parse_pci(const char *s, unsigned int *seg_p,
         dev = bus;
         bus = seg;
         seg = 0;
+        *def_seg = 1;
     }
     if ( func_p )
     {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 414106a..d66ecab 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -150,6 +150,9 @@ int pci_find_ext_capability(int seg, int bus, int devfn, int cap);
 int pci_find_next_ext_capability(int seg, int bus, int devfn, int pos, int cap);
 const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
                       unsigned int *dev, unsigned int *func);
+const char *__parse_pci(const char *, unsigned int *seg, unsigned int *bus,
+                      unsigned int *dev, unsigned int *func, bool_t *def_seg);
+
 
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
 
-- 
2.1.3
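
For illustration only: a user-space sketch of the idea behind __parse_pci, i.e. parsing "[seg:]bus:dev.func" and reporting through def_seg whether the segment was omitted. The function name parse_sbdf is hypothetical, strtoul stands in for simple_strtoul, and the function number is mandatory here, which is a simplification assumed for this sketch rather than the behaviour of the real parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse "[seg:]bus:dev.func" (hex). Returns a pointer just past the parsed
 * text, or NULL on malformed input. *def_seg is set to true when no segment
 * was given and segment 0 was assumed.
 */
static const char *parse_sbdf(const char *s, unsigned int *seg_p,
                              unsigned int *bus_p, unsigned int *dev_p,
                              unsigned int *func_p, bool *def_seg)
{
    char *end;
    unsigned long seg, bus, dev, func;

    seg = strtoul(s, &end, 16);
    if ( end == s || *end != ':' )
        return NULL;
    s = end;

    bus = strtoul(s + 1, &end, 16);
    if ( end == s + 1 )
        return NULL;
    s = end;

    *def_seg = false;
    if ( *s == ':' )
    {
        dev = strtoul(s + 1, &end, 16);
        if ( end == s + 1 )
            return NULL;
        s = end;
    }
    else
    {
        /* Only two fields before the dot: they were bus and dev. */
        dev = bus;
        bus = seg;
        seg = 0;
        *def_seg = true;
    }

    if ( *s != '.' )
        return NULL;
    func = strtoul(s + 1, &end, 16);
    if ( end == s + 1 )
        return NULL;

    if ( seg > 0xffff || bus > 0xff || dev > 0x1f || func > 0x7 )
        return NULL;

    *seg_p = seg;
    *bus_p = bus;
    *dev_p = dev;
    *func_p = func;

    return end;
}
```

A caller that accepts several devices per range can use def_seg to tell "segment 0 was written explicitly" apart from "no segment was written", which is what the per-RMRR segment consistency check needs.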




[Xen-devel] [PATCH v6 2/4] iommu VT-d: separate rmrr addition function

2015-05-29 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

In preparation for auxiliary RMRR data provided on Xen
command line, make RMRR adding a separate function.
Also free memory for rmrr device scope in error path.
No changes since v5.

Signed-off-by: Elena Ufimtseva <elena.ufimts...@oracle.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
---
 xen/drivers/passthrough/vtd/dmar.c | 129 -
 1 file changed, 70 insertions(+), 59 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 0985150..89a2f79 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -576,6 +576,73 @@ out:
     return ret;
 }
 
+static int register_one_rmrr(struct acpi_rmrr_unit *rmrru)
+{
+    bool_t ignore = 0;
+    unsigned int i = 0;
+    int ret = 0;
+
+    /* Skip checking if segment is not accessible yet. */
+    if ( !pci_known_segment(rmrru->segment) )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX, "UNKNOWN Prefix! %04x", rmrru->segment);
+        i = UINT_MAX;
+    }
+
+    for ( ; i < rmrru->scope.devices_cnt; i++ )
+    {
+        u8 b = PCI_BUS(rmrru->scope.devices[i]);
+        u8 d = PCI_SLOT(rmrru->scope.devices[i]);
+        u8 f = PCI_FUNC(rmrru->scope.devices[i]);
+
+        if ( pci_device_detect(rmrru->segment, b, d, f) == 0 )
+        {
+            dprintk(XENLOG_WARNING VTDPREFIX,
+                    " Non-existent device (%04x:%02x:%02x.%u) is reported"
+                    " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
+                    rmrru->segment, b, d, f,
+                    rmrru->base_address, rmrru->end_address);
+            ignore = 1;
+        }
+        else
+        {
+            ignore = 0;
+            break;
+        }
+    }
+
+    if ( ignore )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
+                "devices under its scope are not PCI discoverable!\n",
+                rmrru->base_address, rmrru->end_address);
+        xfree(rmrru->scope.devices);
+        xfree(rmrru);
+        ret = -EFAULT;
+    }
+    else if ( rmrru->base_address > rmrru->end_address )
+    {
+        dprintk(XENLOG_WARNING VTDPREFIX,
+                "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
+                rmrru->base_address, rmrru->end_address);
+        xfree(rmrru->scope.devices);
+        xfree(rmrru);
+        ret = -EFAULT;
+    }
+    else
+    {
+        if ( iommu_verbose )
+            dprintk(VTDPREFIX,
+                    "  RMRR region: base_addr %"PRIx64
+                    " end_address %"PRIx64"\n",
+                    rmrru->base_address, rmrru->end_address);
+        acpi_register_rmrr_unit(rmrru);
+    }
+
+    return ret;
+}
+
 static int __init
 acpi_parse_one_rmrr(struct acpi_dmar_header *header)
 {
@@ -626,66 +693,10 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
     ret = acpi_parse_dev_scope(dev_scope_start, dev_scope_end,
                                &rmrru->scope, RMRR_TYPE, rmrr->segment);
 
-    if ( ret || (rmrru->scope.devices_cnt == 0) )
-        xfree(rmrru);
+    if ( !ret && (rmrru->scope.devices_cnt != 0) )
+        return register_one_rmrr(rmrru);
     else
-    {
-        u8 b, d, f;
-        bool_t ignore = 0;
-        unsigned int i = 0;
-
-        /* Skip checking if segment is not accessible yet. */
-        if ( !pci_known_segment(rmrr->segment) )
-            i = UINT_MAX;
-
-        for ( ; i < rmrru->scope.devices_cnt; i++ )
-        {
-            b = PCI_BUS(rmrru->scope.devices[i]);
-            d = PCI_SLOT(rmrru->scope.devices[i]);
-            f = PCI_FUNC(rmrru->scope.devices[i]);
-
-            if ( !pci_device_detect(rmrr->segment, b, d, f) )
-            {
-                dprintk(XENLOG_WARNING VTDPREFIX,
-                        " Non-existent device (%04x:%02x:%02x.%u) is reported"
-                        " in RMRR (%"PRIx64", %"PRIx64")'s scope!\n",
-                        rmrr->segment, b, d, f,
-                        rmrru->base_address, rmrru->end_address);
-                ignore = 1;
-            }
-            else
-            {
-                ignore = 0;
-                break;
-            }
-        }
-
-        if ( ignore )
-        {
-            dprintk(XENLOG_WARNING VTDPREFIX,
-                "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
-                "devices under its scope are not PCI discoverable!\n",
-                rmrru->base_address, rmrru->end_address);
-            xfree(rmrru);
-        }
-        else if ( base_addr > end_addr )
-        {
-            dprintk(XENLOG_WARNING VTDPREFIX,
-                "  The RMRR (%"PRIx64", %"PRIx64") is incorrect!\n",
-                rmrru->base_address, rmrru->end_address);
-            xfree(rmrru);
-            ret = -EFAULT;
-        }
-        else
-        {
-            if ( iommu_verbose )
-                dprintk(VTDPREFIX,
-                    "  RMRR region: base_addr %"PRIx64
-                    " end_address %"PRIx64"\n",
-                    rmrru

[Xen-devel] [PATCH v6 0/4] iommu: add rmrr Xen command line option

2015-05-29 Thread elena . ufimtseva
From: Elena Ufimtseva <elena.ufimts...@oracle.com>

v6 of extra rmrr series with addressed comments from Jan Beulich.
Any suggestions are welcome.
 
Add the Xen command line option rmrr to specify RMRR
regions for devices whose regions are not defined in ACPI,
which otherwise causes an IO page fault while booting dom0
in PVH mode. These additional regions will be added to the
list of RMRR regions parsed from ACPI.

Changes in v6:
 - make __parse_pci return correct result and error codes;
 - move add_extra_rmrr
 - previous patch was missing RMRR addresses in range check, add it here;
 - add overlap check and range boundaries check;
 - moved extra rmrr structure definition to dmar.c;
 - change def_seg in __parse_pci type from int to bool_t;
 - change name for extra rmrr range to reflect they hold now pfns;

Changes in v5:
 - make parse_pci a wrapper and add __parse_pci with additional def_seg param
   to identify if segment was specified;
 - make possible not to define segment for each device within same rmrr;
 - limit number of pages for one RMRR by 16;
 - run mfn_valid check for every address in RMRR range;
 - add PCI_SBDF macro;
 - remove list for extra rmrrs as they are kept in static array;

Changes in v4 after comments by Jan Beulich:
 - keep sbdf per device instead of bdf and one segment per RMRR when parsing
   and compare later;
 - add check for segment values and make sure they are same for one RMRR;
 - move RMRR parameters checks and add error messages if RMRRs are incorrect;
 - make relevant variables and functions static;
 - mention requirement for segment values in rmrr documentation;

Changes in v3:
 - use ';' instead of '#' in command line and add proper notes for grub ';'
   special treatment;

Changes in v2:
 - move rmrr parser to dmar.c and make it custom_param;
 - change of rmrr command line option format; since adding support for
   multiple devices per range needs more special characters, and from the
   previous review ';' is not supported, '[' ']' are reserved, and ':' is
   used in the pci format, range and devices are separated by '#';
   Suggestions are welcome;
 - added support for multiple devices per range;
 - moved adding misc RMRRs before ACPI RMRR parsing;
 - make parser fail if pci device is specified incorrectly;

Elena Ufimtseva (4):
  pci: add PCI_SBDF and PCI_SEG macros
  iommu VT-d: separate rmrr addition function
  pci: add wrapper for parse_pci
  iommu: add rmrr Xen command line option for extra rmrrs

 docs/misc/xen-command-line.markdown |  13 ++
 xen/drivers/passthrough/vtd/dmar.c  | 293 
 xen/drivers/pci/pci.c               |  11 ++
 xen/include/xen/pci.h               |   5 +
 4 files changed, 262 insertions(+), 60 deletions(-)

-- 
2.1.3




Re: [Xen-devel] [PATCH] dmar: device scope mem leak fix

2015-05-28 Thread Elena Ufimtseva
On Thu, May 28, 2015 at 08:57:16AM +0100, Jan Beulich wrote:
> >>> On 27.05.15 at 21:56, <elena.ufimts...@oracle.com> wrote:
> > On Tue, May 26, 2015 at 10:46:30AM +0100, Jan Beulich wrote:
> > > >>> On 23.05.15 at 03:27, <elena.ufimts...@oracle.com> wrote:
> > > @@ -658,6 +661,7 @@ acpi_parse_one_rmrr(struct acpi_dmar_header *header)
> > >                      "  Ignore the RMRR (%"PRIx64", %"PRIx64") due to "
> > >                      "devices under its scope are not PCI discoverable!\n",
> > >                      rmrru->base_address, rmrru->end_address);
> > > +            xfree(rmrru->scope.devices);
> > >              xfree(rmrru);
> > Do you think the ret should be set in this case also?
>
> Iirc in an earlier version of the other series you had added a failure
> error code setting here, and I had to specifically ask you to remove
> it. If you still think one is needed here, this would need to be a
> separate patch with a proper explanation.
>
> Jan

Hi Jan

I do remember this. I looked again and I think it makes sense not to
return an error on the ignore path.


 



Re: [Xen-devel] [PATCH v5 4/4] iommu: add rmrr Xen command line option for extra rmrrs

2015-05-27 Thread Elena Ufimtseva
On Tue, May 26, 2015 at 01:02:27PM +0100, Jan Beulich wrote:
> >>> On 23.05.15 at 03:33, <elena.ufimts...@oracle.com> wrote:
> > --- a/docs/misc/xen-command-line.markdown
> > +++ b/docs/misc/xen-command-line.markdown
> > @@ -1185,6 +1185,19 @@ Specify the host reboot method.
> >  'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
> >   default it will use that method first).
> >  
> > +### rmrr
> > +> '= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
> > +
> > +Define RMRRs units that are missing from ACPI table along with device
> > +they belong to and use them for 1:1 mapping. End addresses can be omitted
> > +and one page will be mapped. The ranges are inclusive when start and end
> > +are specified.If segement of the first device is not specified, the default segment will be used.
>

Thanks for review Jan.

> specified. If the segment ..., segment zero will be used.
>
> > +If segments are specified for every device and not equal, error will be reported.
>
> ..., an error ...
>
> > --- a/xen/drivers/passthrough/vtd/dmar.c
> > +++ b/xen/drivers/passthrough/vtd/dmar.c
> > @@ -50,6 +50,7 @@ static LIST_HEAD_READ_MOSTLY(acpi_rhsa_units);
> >  static struct acpi_table_header *__read_mostly dmar_table;
> >  static int __read_mostly dmar_flags;
> >  static u64 __read_mostly igd_drhd_address;
> > +static void __init add_extra_rmrr(void);
>
> Why do you need this declaration? (And if you really need it - no
> segment annotations on declarations please.)
>
> > @@ -856,6 +857,78 @@ out:
> >      return ret;
> >  }
> >  
> > +#define MAX_EXTRA_RMRR_PAGES 16
> > +#define MAX_EXTRA_RMRR 10
> > +static __initdata unsigned int nr_rmrr;
> > +static struct __initdata extra_rmrr_unit rmrru[MAX_EXTRA_RMRR];
> > +
> > +static void __init add_extra_rmrr(void)
> > +{
> > +    struct acpi_rmrr_unit *rmrrn;
> > +    unsigned int dev, seg, addr;
> > +
> > +    for (unsigned int i = 0; i < nr_rmrr; i++ )
>
> No C++ style constructs like this please. Instead please add the
> missing blank after the opening parenthesis.
>
> > +    {
> > +        rmrrn = xmalloc(struct acpi_rmrr_unit);
>
> acpi_parse_one_rmrr() uses xzalloc() here. For the avoidance of
> doubt, I'd be fine with you doing so provided this is correct (i.e. all
> fields end up properly initialized, just like is the case with the
> ->scope.devices allocation), if this wasn't introducing a latent bug
> (should a field get added).
Agree, will change this.
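
To illustrate the allocation point above outside of Xen, here is a minimal sketch in standard C. It assumes calloc() as a stand-in for zeroing allocators such as xzalloc(); the ex_* types and helpers are hypothetical and are not the structures from dmar.c. It also frees the device array before the unit itself, which is the ordering the separate device scope leak fix is about.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the RMRR unit and its device scope. */
struct ex_scope {
    uint16_t *devices;          /* one entry per device in the scope */
    unsigned int devices_cnt;
};

struct ex_unit {
    struct ex_scope scope;
    uint64_t base_pfn, end_pfn;
    uint16_t segment;
};

/*
 * Allocate a zeroed unit with room for dev_count devices.
 * Zeroing means a field added to the struct later starts out as 0
 * instead of holding whatever was in the heap.
 */
static struct ex_unit *ex_alloc_unit(unsigned int dev_count)
{
    struct ex_unit *u;

    if ( dev_count == 0 )
        return NULL;

    u = calloc(1, sizeof(*u));
    if ( !u )
        return NULL;

    u->scope.devices = calloc(dev_count, sizeof(*u->scope.devices));
    if ( !u->scope.devices )
    {
        free(u);
        return NULL;
    }
    u->scope.devices_cnt = dev_count;

    return u;
}

/* Free the nested device array first, then the unit. */
static void ex_free_unit(struct ex_unit *u)
{
    if ( !u )
        return;
    free(u->scope.devices);
    free(u);
}
```

Every error path that drops a unit would go through ex_free_unit() so the device array is not left behind.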
 
> > +        if ( !rmrrn )
> > +            return;
> > +
> > +        rmrrn->scope.devices = xmalloc_array(typeof(*rmrrn->scope.devices),
>
> I'm afraid I may have misled you with comments elsewhere: In
> xmalloc() invocations, considering the typeful result it produces,
> using the spelled out type is preferred over typeof() like used
> here.
Thanks for the explanation, I did not know that.
>
> > +                                             rmrru[i].dev_count);
> > +        if ( !rmrrn->scope.devices )
> > +        {
> > +            xfree(rmrrn);
> > +            return;
> > +        }
> > +
> > +        if ( rmrru[i].end_address - rmrru[i].base_address > MAX_EXTRA_RMRR_PAGES )
>
> Now this reads really odd: With the conversion to store page numbers
> in these fields, they should have got renamed from _address (and
> afaict no longer need to be of u64 type). Also note that you have an
> off-by-one error here: The end address being inclusive, you want to
> bail on >= max.
>
> I also fail to see end < base being rejected somewhere. Nor are
> overlaps being dealt with (see acpi_parse_one_rmrr()).

Somehow I skipped that, possibly wrong branch.
Will fix this and add overlap check.
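
For reference, the three checks discussed here (end before base, the inclusive page limit, and overlap with already known ranges) can be sketched in plain C as below. This is only an illustration under assumptions: the ex_* names are hypothetical, EX_MAX_PAGES mirrors MAX_EXTRA_RMRR_PAGES from the quoted patch, and ranges are taken to hold inclusive page frame numbers.

```c
#include <stdint.h>

#define EX_MAX_PAGES 16u   /* mirrors MAX_EXTRA_RMRR_PAGES in the patch */

/* Hypothetical range type: both bounds are inclusive page frame numbers. */
struct ex_range {
    uint64_t base_pfn;
    uint64_t end_pfn;
};

/* Return 0 if the range is acceptable, -1 otherwise. */
static int ex_check_range(const struct ex_range *r,
                          const struct ex_range *known,
                          unsigned int nr_known)
{
    unsigned int i;

    /* Reject end < base first, so the subtraction below cannot wrap. */
    if ( r->end_pfn < r->base_pfn )
        return -1;

    /*
     * With an inclusive end the range covers (end - base + 1) pages,
     * so "more than EX_MAX_PAGES pages" is (end - base) >= EX_MAX_PAGES.
     */
    if ( r->end_pfn - r->base_pfn >= EX_MAX_PAGES )
        return -1;

    /* Two inclusive ranges overlap when each one starts before the other ends. */
    for ( i = 0; i < nr_known; i++ )
        if ( r->base_pfn <= known[i].end_pfn &&
             known[i].base_pfn <= r->end_pfn )
            return -1;

    return 0;
}
```

With these definitions a range of 0x10-0x1f (16 pages) passes, while 0x10-0x20 (17 pages) is rejected.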
 
> > +        {
> > +            printk(XENLOG_ERR VTDPREFIX
> > +                   "RMRR range exceeds 16 pages [%"PRIx64" - %"PRIx64"]\n",
> > +                   rmrru[i].base_address, rmrru[i].end_address);
> > +            xfree(rmrrn->scope.devices);
> > +            xfree(rmrrn);
> > +            return;
> > +        }
> > +
> > +        for ( addr = rmrru[i].base_address; addr <= rmrru[i].end_address; addr++ )
>
> And the loop variable here shouldn't be addr then (and certainly not
> of type unsigned int).
>
> > +        {
> > +            if ( iommu_verbose )
> > +                printk(XENLOG_ERR VTDPREFIX
> > +                       "Invalid mfn in RMRR range [%"PRIx64" - %"PRIx64"]\n",
> > +                       rmrru[i].base_address, rmrru[i].end_address);
> > +            xfree(rmrrn->scope.devices);
> > +            xfree(rmrrn);
> > +            return;
> > +        }
> > +
> > +        seg = 0;
> > +        for ( dev = 0; dev < rmrru->dev_count; dev++ )
> > +        {
> > +            rmrrn->scope.devices[dev] = rmrru->sbdf[dev];
> > +            seg = seg | (rmrru->sbdf[dev] >> 16);
>
> |=
>
> Also with you having added PCI_SBDF() in patch 1, you should add
> the matched PCI_SEG() (or some such) instead of open coding it
> here.
Yes, will be also useful.
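
A small sketch of what such a macro pair could look like, assuming the usual layout of segment in bits 31-16, bus in bits 15-8, slot in bits 7-3 and function in bits 2-0. The EX_ names are hypothetical and are not the definitions from patch 1. The helper compares every entry against the first one directly, because an OR-accumulated value can equal the first segment even when a later one differs (for example 0x0003 and 0x0001).

```c
#include <stdint.h>

/* Hypothetical macros, prefixed EX_ to keep them apart from Xen's own. */
#define EX_PCI_BDF(b, d, f)                      \
    ((((uint32_t)(b) & 0xffu) << 8) |            \
     (((uint32_t)(d) & 0x1fu) << 3) |            \
     ((uint32_t)(f) & 0x7u))

#define EX_PCI_SBDF(s, b, d, f)                  \
    ((((uint32_t)(s) & 0xffffu) << 16) | EX_PCI_BDF(b, d, f))

#define EX_PCI_SEG(sbdf) (((uint32_t)(sbdf) >> 16) & 0xffffu)

/* Return 1 if all entries share the segment of the first entry, else 0. */
static int ex_same_segment(const uint32_t *sbdf, unsigned int count)
{
    unsigned int i;

    for ( i = 1; i < count; i++ )
        if ( EX_PCI_SEG(sbdf[i]) != EX_PCI_SEG(sbdf[0]) )
            return 0;

    return 1;
}
```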
 
> > +        }
> > +        if ( seg != (rmrru->sbdf[0] >> 16) )
> > +        {
> > +            printk(XENLOG_ERR VTDPREFIX
> > +                   "Segments are not equal for RMRR range  [%"PRIx64" - %"PRIx64"]\n",
> > +
