Re: [Xen-devel] [PATCH v3 05/17] SUPPORT.md: Toolstack core

2017-11-27 Thread Roger Pau Monné
On Mon, Nov 27, 2017 at 02:58:38PM +, George Dunlap wrote:
> On 11/27/2017 02:39 PM, Roger Pau Monné wrote:
> > On Mon, Nov 27, 2017 at 02:12:40PM +, George Dunlap wrote:
> >> On 11/27/2017 11:43 AM, Roger Pau Monné wrote:
> >>> On Wed, Nov 22, 2017 at 07:20:12PM +, George Dunlap wrote:
> >>>> For now only include xl-specific features, or interaction with the
> >>>> system.  Feature support matrix will be added when features are
> >>>> mentioned.
> >>>>
> >>>> Signed-off-by: George Dunlap <george.dun...@citrix.com>
> >>>> ---
> >>>> CC: Ian Jackson <ian.jack...@citrix.com>
> >>>> CC: Wei Liu <wei.l...@citrix.com>
> >>>> CC: Andrew Cooper <andrew.coop...@citrix.com>
> >>>> CC: Jan Beulich <jbeul...@suse.com>
> >>>> CC: Stefano Stabellini <sstabell...@kernel.org>
> >>>> CC: Konrad Wilk <konrad.w...@oracle.com>
> >>>> CC: Tim Deegan <t...@xen.org>
> >>>> ---
> >>>>  SUPPORT.md | 38 ++
> >>>>  1 file changed, 38 insertions(+)
> >>>>
> >>>> diff --git a/SUPPORT.md b/SUPPORT.md
> >>>> index 5945ab4926..df429cb3c4 100644
> >>>> --- a/SUPPORT.md
> >>>> +++ b/SUPPORT.md
> >>>> @@ -96,6 +96,44 @@ Requires hardware virtualisation support (Intel VMX / 
> >>>> AMD SVM)
> >>>>  
> >>>>  ARM only has one guest type at the moment
> >>>>  
> >>>> +## Toolstack
> >>>> +
> >>>> +### xl
> >>>> +
> >>>> +Status: Supported
> >>>> +
> >>>> +### Direct-boot kernel image format
> >>>> +
> >>>> +Supported, x86: bzImage
> >>>
> >>> ELF is missing here.
> >>>
> >>>> +Supported, ARM32: zImage
> >>>> +Supported, ARM64: Image
> >>>> +
> >>>> +Format which the toolstack accept for direct-boot kernels
> >>>> +
> >>>> +### systemd support for xl
> >>>
> >>> BSD-style init is also supported.
> >>
> >> Is that different than SysV init?
> > 
> > It seems so but I'm no expert. The FreeBSD handbook states [0]:
> > 
> > "Many Linux(c) distributions use the SysV init system, whereas FreeBSD
> > uses the traditional BSD-style init(8). Under the BSD-style init(8),
> > there are no run-levels and /etc/inittab does not exist. Instead,
> > startup is controlled by rc(8) scripts."
> > 
> > So if systemd and SysV are listed, BSD-style init should also be
> > listed.
> 
> OK, I've modified these to look like this:
> 
> ---
> ### Dom0 init support for xl
> 
> Status, SysV: Supported
> Status, systemd: Supported
> Status, BSD-style: Supported
> ---
> 
> Does that sound good?

Yes, that LGTM.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 05/17] SUPPORT.md: Toolstack core

2017-11-27 Thread Roger Pau Monné
On Wed, Nov 22, 2017 at 07:20:12PM +, George Dunlap wrote:
> For now only include xl-specific features, or interaction with the
> system.  Feature support matrix will be added when features are
> mentioned.
> 
> Signed-off-by: George Dunlap <george.dun...@citrix.com>
> ---
> CC: Ian Jackson <ian.jack...@citrix.com>
> CC: Wei Liu <wei.l...@citrix.com>
> CC: Andrew Cooper <andrew.coop...@citrix.com>
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Stefano Stabellini <sstabell...@kernel.org>
> CC: Konrad Wilk <konrad.w...@oracle.com>
> CC: Tim Deegan <t...@xen.org>
> ---
>  SUPPORT.md | 38 ++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/SUPPORT.md b/SUPPORT.md
> index 5945ab4926..df429cb3c4 100644
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -96,6 +96,44 @@ Requires hardware virtualisation support (Intel VMX / AMD 
> SVM)
>  
>  ARM only has one guest type at the moment
>  
> +## Toolstack
> +
> +### xl
> +
> +Status: Supported
> +
> +### Direct-boot kernel image format
> +
> +Supported, x86: bzImage

ELF is missing here.

> +Supported, ARM32: zImage
> +Supported, ARM64: Image
> +
> +Format which the toolstack accept for direct-boot kernels
> +
> +### systemd support for xl

BSD-style init is also supported.

The rest LGTM:

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 05/17] SUPPORT.md: Toolstack core

2017-11-27 Thread Roger Pau Monné
On Mon, Nov 27, 2017 at 02:12:40PM +, George Dunlap wrote:
> On 11/27/2017 11:43 AM, Roger Pau Monné wrote:
> > On Wed, Nov 22, 2017 at 07:20:12PM +, George Dunlap wrote:
> >> For now only include xl-specific features, or interaction with the
> >> system.  Feature support matrix will be added when features are
> >> mentioned.
> >>
> >> Signed-off-by: George Dunlap <george.dun...@citrix.com>
> >> ---
> >> CC: Ian Jackson <ian.jack...@citrix.com>
> >> CC: Wei Liu <wei.l...@citrix.com>
> >> CC: Andrew Cooper <andrew.coop...@citrix.com>
> >> CC: Jan Beulich <jbeul...@suse.com>
> >> CC: Stefano Stabellini <sstabell...@kernel.org>
> >> CC: Konrad Wilk <konrad.w...@oracle.com>
> >> CC: Tim Deegan <t...@xen.org>
> >> ---
> >>  SUPPORT.md | 38 ++
> >>  1 file changed, 38 insertions(+)
> >>
> >> diff --git a/SUPPORT.md b/SUPPORT.md
> >> index 5945ab4926..df429cb3c4 100644
> >> --- a/SUPPORT.md
> >> +++ b/SUPPORT.md
> >> @@ -96,6 +96,44 @@ Requires hardware virtualisation support (Intel VMX / 
> >> AMD SVM)
> >>  
> >>  ARM only has one guest type at the moment
> >>  
> >> +## Toolstack
> >> +
> >> +### xl
> >> +
> >> +Status: Supported
> >> +
> >> +### Direct-boot kernel image format
> >> +
> >> +Supported, x86: bzImage
> > 
> > ELF is missing here.
> > 
> >> +Supported, ARM32: zImage
> >> +Supported, ARM64: Image
> >> +
> >> +Format which the toolstack accept for direct-boot kernels
> >> +
> >> +### systemd support for xl
> > 
> > BSD-style init is also supported.
> 
> Is that different than SysV init?

It seems so but I'm no expert. The FreeBSD handbook states [0]:

"Many Linux(c) distributions use the SysV init system, whereas FreeBSD
uses the traditional BSD-style init(8). Under the BSD-style init(8),
there are no run-levels and /etc/inittab does not exist. Instead,
startup is controlled by rc(8) scripts."

So if systemd and SysV are listed, BSD-style init should also be
listed.

Thanks, Roger.

[0] 
https://www.freebsd.org/doc/en_US.ISO8859-1/articles/linux-users/startup.html

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/3] x86/xen: supply rsdp address in boot params for pvh guests

2017-11-28 Thread Roger Pau Monné
On Tue, Nov 28, 2017 at 10:44:00AM +0100, Juergen Gross wrote:
> When booted via the special PVH entry save the RSDP address set in the
> boot information block in struct boot_params. This will enable Xen to
> locate the RSDP at an arbitrary address.
> 
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/xen/enlighten_pvh.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
> index 436c4f003e17..0175194f4236 100644
> --- a/arch/x86/xen/enlighten_pvh.c
> +++ b/arch/x86/xen/enlighten_pvh.c
> @@ -71,6 +71,8 @@ static void __init init_pvh_bootparams(void)
>*/
>   pvh_bootparams.hdr.version = 0x212;

Shouldn't this be 0x20e, like it's set in patch 1?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/3] x86/acpi: take rsdp address for boot params if available

2017-11-28 Thread Roger Pau Monné
On Tue, Nov 28, 2017 at 10:43:59AM +0100, Juergen Gross wrote:
> In case the rsdp address in struct boot_params is specified don't try
> to find the table by searching, but take the address directly as set
> by the boot loader.
> 
> Signed-off-by: Juergen Gross 
> ---
>  drivers/acpi/osl.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 3bb46cb24a99..3b25e2ad7d75 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -45,6 +45,10 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_X86
> +#include 
> +#endif
> +
>  #include "internal.h"
>  
>  #define _COMPONENT   ACPI_OS_SERVICES
> @@ -195,6 +199,10 @@ acpi_physical_address __init 
> acpi_os_get_root_pointer(void)
>   if (acpi_rsdp)
>   return acpi_rsdp;
>  #endif
> +#ifdef CONFIG_X86
> + if (boot_params.hdr.acpi_rsdp_addr)
> + return boot_params.hdr.acpi_rsdp_addr;
> +#endif

I'm struggling to figure out how was PVH getting the RSDP previously,
because that should be removed now that it's in the zero-page.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point

2017-11-29 Thread Roger Pau Monné
On Wed, Nov 29, 2017 at 09:21:59AM +0100, Juergen Gross wrote:
> On 28/11/17 20:34, Maran Wilson wrote:
> > For certain applications it is desirable to rapidly boot a KVM virtual
> > machine. In cases where legacy hardware and software support within the
> > guest is not needed, Qemu should be able to boot directly into the
> > uncompressed Linux kernel binary without the need to run firmware.
> > 
> > There already exists an ABI to allow this for Xen PVH guests and the ABI is
> > supported by Linux and FreeBSD:
> > 
> >https://xenbits.xen.org/docs/unstable/misc/hvmlite.html

I would also add a link to:

http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,arch-x86,hvm,start_info.h.html#Struct_hvm_start_info

> > This PoC patch enables Qemu to use that same entry point for booting KVM
> > guests.
> > 
> > Even though the code is still PoC quality, I'm sending this as an RFC now
> > since there are a number of different ways the specific implementation
> > details can be handled. I chose a shared code path for Xen and KVM guests
> > but could just as easily create a separate code path that is advertised by
> > a different ELF note for KVM. There also seems to be some flexibility in
> > how the e820 table data is passed and how (or if) it should be identified
> > as e820 data. As a starting point, I've chosen the options that seem to
> > result in the smallest patch with minimal to no changes required of the
> > x86/HVM direct boot ABI.
> 
> I like the idea.
> 
> I'd rather split up the different hypervisor types early and use a
> common set of service functions instead of special casing xen_guest
> everywhere. This would make it much easier to support the KVM PVH
> boot without the need to configure the kernel with CONFIG_XEN.
> 
> Another option would be to use the same boot path as with grub: set
> the boot params in zeropage and start at startup_32.

I think I prefer this approach since AFAICT it should allow for
greater code share with the common boot path.

> 
> Juergen
> 
> > ---
> >  arch/x86/xen/enlighten_pvh.c | 74 
> > 
> >  1 file changed, 55 insertions(+), 19 deletions(-)
> > 
> > diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
> > index 98ab176..d93f711 100644
> > --- a/arch/x86/xen/enlighten_pvh.c
> > +++ b/arch/x86/xen/enlighten_pvh.c
> > @@ -31,21 +31,46 @@ static void xen_pvh_arch_setup(void)
> > acpi_irq_model = ACPI_IRQ_MODEL_PLATFORM;
> >  }
> >  
> > -static void __init init_pvh_bootparams(void)
> > +static void __init init_pvh_bootparams(bool xen_guest)
> >  {
> > struct xen_memory_map memmap;
> > int rc;
> >  
> > memset(_bootparams, 0, sizeof(pvh_bootparams));
> >  
> > -   memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table);
> > -   set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table);
> > -   rc = HYPERVISOR_memory_op(XENMEM_memory_map, );
> > -   if (rc) {
> > -   xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc);
> > -   BUG();
> > +   if (xen_guest) {
> > +   memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table);
> > +   set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table);
> > +   rc = HYPERVISOR_memory_op(XENMEM_memory_map, );
> > +   if (rc) {
> > +   xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc);
> > +   BUG();
> > +   }
> > +   pvh_bootparams.e820_entries = memmap.nr_entries;
> > +   } else if (pvh_start_info.nr_modules > 1) {
> > +   /* The second module should be the e820 data for KVM guests */

I don't think this is desirable. You might want to boot other OSes
using this method, and they might want to pass more than one module.

IMHO the hvm_start_info structure should be bumped to contain a
pointer to the memory map. Note that there's a 'version' field that
can be used for that. Even on Xen we might want to pass the memory map
in such a way instead of using the hypercall.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point

2017-11-29 Thread Roger Pau Monné
On Wed, Nov 29, 2017 at 03:11:12PM +0100, Juergen Gross wrote:
> On 29/11/17 15:03, Boris Ostrovsky wrote:
> > On 11/29/2017 03:50 AM, Roger Pau Monné wrote:
> >> On Wed, Nov 29, 2017 at 09:21:59AM +0100, Juergen Gross wrote:
> >>> On 28/11/17 20:34, Maran Wilson wrote:
> >>>> For certain applications it is desirable to rapidly boot a KVM virtual
> >>>> machine. In cases where legacy hardware and software support within the
> >>>> guest is not needed, Qemu should be able to boot directly into the
> >>>> uncompressed Linux kernel binary without the need to run firmware.
> >>>>
> >>>> There already exists an ABI to allow this for Xen PVH guests and the ABI 
> >>>> is
> >>>> supported by Linux and FreeBSD:
> >>>>
> >>>>https://xenbits.xen.org/docs/unstable/misc/hvmlite.html
> >> I would also add a link to:
> >>
> >> http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,arch-x86,hvm,start_info.h.html#Struct_hvm_start_info
> >>
> >>>> This PoC patch enables Qemu to use that same entry point for booting KVM
> >>>> guests.
> >>>>
> >>>> Even though the code is still PoC quality, I'm sending this as an RFC now
> >>>> since there are a number of different ways the specific implementation
> >>>> details can be handled. I chose a shared code path for Xen and KVM guests
> >>>> but could just as easily create a separate code path that is advertised 
> >>>> by
> >>>> a different ELF note for KVM. There also seems to be some flexibility in
> >>>> how the e820 table data is passed and how (or if) it should be identified
> >>>> as e820 data. As a starting point, I've chosen the options that seem to
> >>>> result in the smallest patch with minimal to no changes required of the
> >>>> x86/HVM direct boot ABI.
> >>> I like the idea.
> >>>
> >>> I'd rather split up the different hypervisor types early and use a
> >>> common set of service functions instead of special casing xen_guest
> >>> everywhere. This would make it much easier to support the KVM PVH
> >>> boot without the need to configure the kernel with CONFIG_XEN.
> >>>
> >>> Another option would be to use the same boot path as with grub: set
> >>> the boot params in zeropage and start at startup_32.
> >> I think I prefer this approach since AFAICT it should allow for
> >> greater code share with the common boot path.
> > 
> > zeropage is x86/Linux-specific so we'd need some sort of firmware (like
> > grub) between a hypervisor and Linux to convert hvm_start_info to
> > bootparams.
> 
> qemu?

But then it won't be using the PVH entry point, and would just use the
native one?

My understanding was that the PVH shim inside of Linux will prepare a
zero-page when booted using the PVH entry point, and then jump into
the native boot path.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] sched/null: skip vCPUs on the waitqueue that are blocked

2017-12-19 Thread Roger Pau Monné
On Tue, Dec 19, 2017 at 07:25:51AM -0700, Jan Beulich wrote:
> >>> On 19.12.17 at 15:16,  wrote:
> > --- a/xen/common/sched_null.c
> > +++ b/xen/common/sched_null.c
> > @@ -781,6 +781,10 @@ static struct task_slice null_schedule(const struct 
> > scheduler *ops,
> >  {
> >  list_for_each_entry( wvc, >waitq, waitq_elem )
> >  {
> > +if ( test_bit(_VPF_down, >vcpu->pause_flags) )
> > +/* Skip vCPUs that are down. */
> > +continue;
> 
> If such a custom check is indeed necessary here (looks to rather be
> something generic scheduling code should be dealing with), why
> would you take into consideration only this one VPF bit?

I know that at least when a vCPU is brought up a call to vcpu_wake
happens and the vCPU is scheduled. The same happens for unpause and
some of the other events, but I'm not sure whether it applies to all
of them.

IMHO 'down' was the clearer one since there's no chance the vCPU is
going to run in the near future if it's not brought up. Other
pause_flags like 'blocked_in_xen' or 'migrating' seem more temporary.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] libxl/pvh: force PVH guests to use the xenstore shutdown

2017-12-19 Thread Roger Pau Monné
On Tue, Dec 19, 2017 at 02:48:47PM +, Ian Jackson wrote:
> Roger Pau Monné writes ("Re: [PATCH] libxl/pvh: force PVH guests to use the 
> xenstore shutdown"):
> > Yes, that's exactly what I meant but failed to express. Feel free to
> > replace the commit message with yours.
> 
> OK, thanks :-).  I have
> 
> Acked-by: Ian Jackson <ian.jack...@eu.citrix.com>
> 
> and pushed it.
> 
> I think this is a candidate for backporting as far as 4.9 ?

Yes, 4.10 only though (that's when the PVH guest type was introduced)
in it's current form. 4.9 will require some rework (like checking qemu
pid).

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] libxl: put RSDP for PVH guest near 4GB

2017-12-19 Thread Roger Pau Monné
On Fri, Dec 01, 2017 at 03:14:07PM +0100, Juergen Gross wrote:
> Instead of locating the RSDP table below 1MB put it just below 4GB
> like the rest of the ACPI tables in case of PVH guests. This will
> avoid punching more holes than necessary into the memory map.
> 
> Signed-off-by: Juergen Gross 
> Acked-by: Wei Liu 
> ---
>  tools/libxc/xc_dom_hvmloader.c | 2 +-
>  tools/libxl/libxl_x86_acpi.c   | 5 ++---
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/libxc/xc_dom_hvmloader.c b/tools/libxc/xc_dom_hvmloader.c
> index 59f94e51e5..3f0bd65547 100644
> --- a/tools/libxc/xc_dom_hvmloader.c
> +++ b/tools/libxc/xc_dom_hvmloader.c
> @@ -136,7 +136,7 @@ static int module_init_one(struct xc_dom_image *dom,
>  struct xc_dom_seg seg;
>  void *dest;
>  
> -if ( module->length )
> +if ( module->length && !module->guest_addr_out )

Isn't that kind of a separate fix? AFAICT this just prevents
allocating memory if guest_addr_out is already set to a fixed
position.

The rest LGTM.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] libxl: put RSDP for PVH guest near 4GB

2017-12-19 Thread Roger Pau Monné
On Tue, Dec 19, 2017 at 04:46:37PM +0100, Juergen Gross wrote:
> On 19/12/17 16:38, Roger Pau Monné wrote:
> > On Fri, Dec 01, 2017 at 03:14:07PM +0100, Juergen Gross wrote:
> >> Instead of locating the RSDP table below 1MB put it just below 4GB
> >> like the rest of the ACPI tables in case of PVH guests. This will
> >> avoid punching more holes than necessary into the memory map.
> >>
> >> Signed-off-by: Juergen Gross <jgr...@suse.com>
> >> Acked-by: Wei Liu <wei.l...@citrix.com>
> >> ---
> >>  tools/libxc/xc_dom_hvmloader.c | 2 +-
> >>  tools/libxl/libxl_x86_acpi.c   | 5 ++---
> >>  2 files changed, 3 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/tools/libxc/xc_dom_hvmloader.c 
> >> b/tools/libxc/xc_dom_hvmloader.c
> >> index 59f94e51e5..3f0bd65547 100644
> >> --- a/tools/libxc/xc_dom_hvmloader.c
> >> +++ b/tools/libxc/xc_dom_hvmloader.c
> >> @@ -136,7 +136,7 @@ static int module_init_one(struct xc_dom_image *dom,
> >>  struct xc_dom_seg seg;
> >>  void *dest;
> >>  
> >> -if ( module->length )
> >> +if ( module->length && !module->guest_addr_out )
> > 
> > Isn't that kind of a separate fix? AFAICT this just prevents
> > allocating memory if guest_addr_out is already set to a fixed
> > position.
> 
> No, this is mandatory, as I have to skip the allocation for PVH, while
> HVM guests really want the allocation to take place.

Was this also a problem before? Other ACPI modules also set
guest_addr_out, and previously they would also get memory allocated.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] libxl: put RSDP for PVH guest near 4GB

2017-12-20 Thread Roger Pau Monné
On Tue, Dec 19, 2017 at 05:20:41PM +0100, Juergen Gross wrote:
> On 19/12/17 17:11, Roger Pau Monné wrote:
> > On Tue, Dec 19, 2017 at 04:46:37PM +0100, Juergen Gross wrote:
> >> On 19/12/17 16:38, Roger Pau Monné wrote:
> >>> On Fri, Dec 01, 2017 at 03:14:07PM +0100, Juergen Gross wrote:
> >>>> Instead of locating the RSDP table below 1MB put it just below 4GB
> >>>> like the rest of the ACPI tables in case of PVH guests. This will
> >>>> avoid punching more holes than necessary into the memory map.
> >>>>
> >>>> Signed-off-by: Juergen Gross <jgr...@suse.com>
> >>>> Acked-by: Wei Liu <wei.l...@citrix.com>
> >>>> ---
> >>>>  tools/libxc/xc_dom_hvmloader.c | 2 +-
> >>>>  tools/libxl/libxl_x86_acpi.c   | 5 ++---
> >>>>  2 files changed, 3 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/tools/libxc/xc_dom_hvmloader.c 
> >>>> b/tools/libxc/xc_dom_hvmloader.c
> >>>> index 59f94e51e5..3f0bd65547 100644
> >>>> --- a/tools/libxc/xc_dom_hvmloader.c
> >>>> +++ b/tools/libxc/xc_dom_hvmloader.c
> >>>> @@ -136,7 +136,7 @@ static int module_init_one(struct xc_dom_image *dom,
> >>>>  struct xc_dom_seg seg;
> >>>>  void *dest;
> >>>>  
> >>>> -if ( module->length )
> >>>> +if ( module->length && !module->guest_addr_out )
> >>>
> >>> Isn't that kind of a separate fix? AFAICT this just prevents
> >>> allocating memory if guest_addr_out is already set to a fixed
> >>> position.
> >>
> >> No, this is mandatory, as I have to skip the allocation for PVH, while
> >> HVM guests really want the allocation to take place.
> > 
> > Was this also a problem before? Other ACPI modules also set
> > guest_addr_out, and previously they would also get memory allocated.
> 
> module_init_one() would only be called for the first ACPI module, which
> happens to be the RSDP.

OK, I have to admit I find all this quite confusing. In any case, the
approach seems correct to me.

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v17 19/19] osstest: use -DWITHOUT_AUTO_OBJ with FreeBSD release targets

2017-12-05 Thread Roger Pau Monné
On Tue, Dec 05, 2017 at 03:09:47PM +, Ian Jackson wrote:
> Roger Pau Monne writes ("[PATCH v17 19/19] osstest: use -DWITHOUT_AUTO_OBJ 
> with FreeBSD release targets"):
> > Due to a recent FreeBSD change the default output directory of the release
> > targets is changed to the object directory instead of the source
> > directory. Use WITHOUT_AUTO_OBJ to restore previous behavior. This is
> > harmless if used with previous versions, it will be ignored.
> 
> So the purpose of this is to make both versions of the build do the
> same thing, so osstest doesn't have to care ?
> 
> Is that why you're not just updating osstest to expect the new
> behaviour ?

Yes, I think it's more sensible to restore the previous behavior so
that the same osstest script can be used to build FreeBSD from the
master branch or the stable branches.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [for-4.10] Re: [PATCH v2] tools/libxl: mark special pages as reserved in e820 map for PVH

2017-12-06 Thread Roger Pau Monné
On Wed, Dec 06, 2017 at 12:22:00PM +0100, Juergen Gross wrote:
> On 06/12/17 10:53, Julien Grall wrote:
> > Hi Juergen,
> > 
> > On 12/05/2017 04:19 PM, Juergen Gross wrote:
> >> On 05/12/17 16:23, Julien Grall wrote:
> >>> Hi Juergen,
> >>>
> >>> On 04/12/17 15:49, Juergen Gross wrote:
>  On 21/11/17 12:06, Juergen Gross wrote:
> > The "special pages" for PVH guests include the frames for console and
> > Xenstore ring buffers. Those have to be marked as "Reserved" in the
> > guest's E820 map, as otherwise conflicts might arise later e.g. when
> > hotplugging memory into the guest.
> >
> > Signed-off-by: Juergen Gross 
> > ---
> > This is a bugfix for PVH guests. Please consider for 4.10.
> 
>  Ping?
> >>>
> >>> I was waiting an ack from tools maintainers before looking for a release
> >>> perspective.
> >>>
> >>> I would recommend to tag your patch is 4.10 to help reviewers prioritize
> >>> review on your patch. I have done it now.
> >>>
> >>> I am looking at releasing Xen 4.10 in the next few days. Can you explain
> >>> the pros/cons of this patch?
> >>
> >> Pros: PVH guests with 4GB of memory or more will work. :-)
> > 
> > They never worked before? Or is it a regression? If it is a regression
> > when did it appear?
> 
> Hmm, seems we are lucky: Linux kernel will not try to map any memory
> there (I just tested it). So we don't need that patch in 4.10 for Linux
> running as PVH guest. Not sure about BSD, though.

I haven't yet committed any PVHv2 support to FreeBSD, but in any case
I always avoid using memory below 4GB just in case...

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] tools/libxl: mark special pages as reserved in e820 map for PVH

2017-12-04 Thread Roger Pau Monné
On Tue, Nov 21, 2017 at 12:06:06PM +0100, Juergen Gross wrote:
> The "special pages" for PVH guests include the frames for console and
> Xenstore ring buffers. Those have to be marked as "Reserved" in the
> guest's E820 map, as otherwise conflicts might arise later e.g. when
> hotplugging memory into the guest.
> 
> Signed-off-by: Juergen Gross <jgr...@suse.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Albeit I would also prefer this to not be PVH specific. Ideally I
would like both PVH and HVM to share the logic to mark the reserved
regions in the memory map. I guess this can be fixed afterwards by
moving away this logic from hvmloader and handling the creation of
the memory map for both HVM and PVH in libxl.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/5] docs/pvh: document initial MTRR state

2018-05-14 Thread Roger Pau Monné
On Mon, May 14, 2018 at 05:03:52PM +0100, Wei Liu wrote:
> On Thu, May 10, 2018 at 06:15:05PM +0100, Roger Pau Monne wrote:
> > Provided to both Dom0 and DomUs.
> > 
> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> > ---
> > Cc: Andrew Cooper <andrew.coop...@citrix.com>
> > Cc: George Dunlap <george.dun...@eu.citrix.com>
> > Cc: Ian Jackson <ian.jack...@eu.citrix.com>
> > Cc: Jan Beulich <jbeul...@suse.com>
> > Cc: Julien Grall <julien.gr...@arm.com>
> > Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> > Cc: Stefano Stabellini <sstabell...@kernel.org>
> > Cc: Tim Deegan <t...@xen.org>
> > Cc: Wei Liu <wei.l...@citrix.com>
> > ---
> >  docs/misc/pvh.markdown | 15 +++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/docs/misc/pvh.markdown b/docs/misc/pvh.markdown
> > index e85fb15374..639401a887 100644
> > --- a/docs/misc/pvh.markdown
> > +++ b/docs/misc/pvh.markdown
> > @@ -92,3 +92,18 @@ event channels. Delivery of those interrupts can be 
> > configured in the same way
> >  as HVM guests, check xen/include/public/hvm/params.h and
> >  xen/include/public/hvm/hvm\_op.h for more information about available 
> > delivery
> >  methods.
> > +
> > +## MTRR ##
> > +
> > +### Unprivileged guests ###
> > +
> > +PVH guests are booted with the default MTRR type set to write-back and MTRR
> > +enabled. This allows DomUs to start with a sane MTRR state. Note that this 
> > will
> > +have to be revisited when pci-passthrough is added to PVH in order to set 
> > MMIO
> > +regions as UC.
> 
> My reading is "revisited" implies the default type will change. In fact
> it shouldn't. We should clarify: for ram it will remain WB, for MMIO
> holes it will be UC.
>
> Please correct me if I'm wrong.

That's correct. I've used "revisited" here in the sense that Xen might
change the default type to UC and set the RAM regions as WB using
variable MTRR ranges for example.

I simply wanted to remark that the way RAM is set to WB is currently
done using the default MTRR type. RAM will always be set of WB for PVH
in MTRR, however the way to achieve it might change.

What about adding:

"Xen guarantees that RAM regions will always have the WB cache type
set in the initial MTRR state, either set by the default MTRR type or
by other means."

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] vpci/msi: split code to bind pirq

2018-05-14 Thread Roger Pau Monné
On Mon, May 14, 2018 at 08:56:16AM -0600, Jan Beulich wrote:
> >>> On 14.05.18 at 16:15,  wrote:
> > On Mon, May 14, 2018 at 06:24:37AM -0600, Jan Beulich wrote:
> >> >>> On 08.05.18 at 11:25,  wrote:
> >> > --- a/xen/arch/x86/hvm/vmsi.c
> >> > +++ b/xen/arch/x86/hvm/vmsi.c
> >> > @@ -663,6 +663,42 @@ void vpci_msi_arch_mask(struct vpci_msi *msi, const 
> > struct pci_dev *pdev,
> >> >  vpci_mask_pirq(pdev->domain, msi->arch.pirq + entry, mask);
> >> >  }
> >> >  
> >> > +static int vpci_msi_update(const struct pci_dev *pdev, uint32_t data,
> >> > +   uint64_t address, unsigned int vectors,
> >> > +   unsigned int pirq, uint32_t mask)
> >> > +{
> >> > +unsigned int i;
> >> > +
> >> > +ASSERT(pcidevs_locked());
> >> > +
> >> > +for ( i = 0; i < vectors; i++ )
> >> > +{
> >> > +uint8_t vector = MASK_EXTR(data, MSI_DATA_VECTOR_MASK);
> >> > +uint8_t vector_mask = 0xff >> (8 - fls(vectors) + 1);
> >> > +struct xen_domctl_bind_pt_irq bind = {
> >> > +.machine_irq = pirq + i,
> >> > +.irq_type = PT_IRQ_TYPE_MSI,
> >> > +.u.msi.gvec = (vector & ~vector_mask) |
> >> > +  ((vector + i) & vector_mask),
> >> > +.u.msi.gflags = msi_gflags(data, address, (mask >> i) & 1),
> >> > +};
> >> > +int rc = pt_irq_create_bind(pdev->domain, );
> >> > +
> >> > +if ( rc )
> >> > +{
> >> > +gdprintk(XENLOG_ERR,
> >> > + "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
> >> > + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> >> > + PCI_FUNC(pdev->devfn), pirq + i, rc);
> >> > +while ( bind.machine_irq-- )
> >> > +pt_irq_destroy_bind(pdev->domain, );
> >> 
> >> I realize this is just code movement, but is this while() correct? I think 
> > it
> >> can only be correct if pirq (which bind.machine_irq gets initialized from)
> >> was always zero, yet that doesn't look to be the case.
> >> 
> >> If you agree, I'd prefer fixed code to be moved (read: wants a prereq
> >> patch), or for the fix to be applied while moving the code (suitably
> >> reasoned about in the description).
> > 
> > Right, this should be:
> > 
> > while ( bind.machine_irq-- >= pirq )
> > pt_irq_destroy_bind(pdev->domain, );
> 
> ">" you presumably mean, due to the post-decrement?

Ended up doing --bind.machine_irq >= pirq, because it seemed clearer
IMO.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/5] libxc/pvh: set default MTRR type to write-back

2018-05-14 Thread Roger Pau Monné
On Mon, May 14, 2018 at 05:00:51PM +0100, Wei Liu wrote:
> On Thu, May 10, 2018 at 06:15:04PM +0100, Roger Pau Monne wrote:
> > @@ -1014,6 +1034,30 @@ static int vcpu_hvm(struct xc_dom_image *dom)
> >  if ( dom->start_info_seg.pfn )
> >  bsp_ctx.cpu.rbx = dom->start_info_seg.pfn << PAGE_SHIFT;
> >  
> > +/* Set the MTRR. */
> > +bsp_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR);
> > +bsp_ctx.mtrr_d.instance = 0;
> > +bsp_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR);
> > +
> > +mtrr_record = hvm_get_save_record(full_ctx, HVM_SAVE_CODE(MTRR), 0);
> > +if ( !mtrr_record )
> > +{
> > +xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
> > + "%s: unable to get MTRR save record", __func__);
> > +goto out;
> > +}
> 
> Will this break cross version migration when the older hypervisor
> doesn't have such record?

This migration record is already present, it's not introduced in this
patch series. I'm simply making use of it in order to set a valid
initial MTRR state.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/5] hvm/mtrr: copy hardware state for Dom0

2018-05-14 Thread Roger Pau Monné
On Mon, May 14, 2018 at 08:26:30AM -0600, Jan Beulich wrote:
> >>> On 10.05.18 at 19:15, <roger@citrix.com> wrote:
> > Copy the state found on the hardware when creating a PVH Dom0. Since
> > the memory map provided to a PVH Dom0 is based on the native one using
> > the same set of MTRR ranges should provide Dom0 with a sane MTRR state
> > without having to manually build it in Xen.
> > 
> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> > ---
> > Cc: Jan Beulich <jbeul...@suse.com>
> > Cc: Andrew Cooper <andrew.coop...@citrix.com>
> > ---
> >  xen/arch/x86/hvm/mtrr.c | 23 +++
> >  1 file changed, 23 insertions(+)
> > 
> > diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
> > index 95a3deabea..1cb000388a 100644
> > --- a/xen/arch/x86/hvm/mtrr.c
> > +++ b/xen/arch/x86/hvm/mtrr.c
> > @@ -176,6 +176,29 @@ int hvm_vcpu_cacheattr_init(struct vcpu *v)
> >  ((uint64_t)PAT_TYPE_UC_MINUS << 48) |   /* PAT6: UC- */
> >  ((uint64_t)PAT_TYPE_UNCACHABLE << 56);  /* PAT7: UC */
> >  
> > +if ( is_hardware_domain(v->domain) )
> > +{
> > +/* Copy values from the host. */
> > +struct domain *d = v->domain;
> > +unsigned int i;
> > +
> > +if ( mtrr_state.have_fixed )
> > +for ( i = 0; i < NUM_FIXED_MSR; i++ )
> > +mtrr_fix_range_msr_set(d, m, i,
> > +  ((uint64_t 
> > *)mtrr_state.fixed_ranges)[i]);
> 
> The presence/absence of fixed range MTRRs needs to be reflected in the
> capabilities MSR. Strictly speaking in their absence MSR access attempts to
> the fixed range MSRs should also cause #GP, as should any attempt to
> enable them in defType.

My intention was to always provide the fixed range MTRR capability,
regardless of whether the underlying hardware has it or not. Then of
course fixed ranges won't be enabled by default in the deftype MSR if
the underlying hardware also hasn't got them enabled.

> > +for ( i = 0; i < num_var_ranges; i++ )
> > +{
> > +mtrr_var_range_msr_set(d, m, MSR_IA32_MTRR_PHYSBASE(i),
> > +   mtrr_state.var_ranges[i].base);
> > +mtrr_var_range_msr_set(d, m, MSR_IA32_MTRR_PHYSMASK(i),
> > +   mtrr_state.var_ranges[i].mask);
> > +}
> > +
> > +mtrr_def_type_msr_set(d, m, mtrr_state.def_type |
> > +(mtrr_state.enabled << 10));
> 
> In the interest of no further proliferation of this and similar literal 
> numbers,
> could I ask you to introduce #define-s into msr-index.h?

Sure.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] vpci/msi: split code to bind pirq

2018-05-14 Thread Roger Pau Monné
On Mon, May 14, 2018 at 06:24:37AM -0600, Jan Beulich wrote:
> >>> On 08.05.18 at 11:25,  wrote:
> > --- a/xen/arch/x86/hvm/vmsi.c
> > +++ b/xen/arch/x86/hvm/vmsi.c
> > @@ -663,6 +663,42 @@ void vpci_msi_arch_mask(struct vpci_msi *msi, const 
> > struct pci_dev *pdev,
> >  vpci_mask_pirq(pdev->domain, msi->arch.pirq + entry, mask);
> >  }
> >  
> > +static int vpci_msi_update(const struct pci_dev *pdev, uint32_t data,
> > +   uint64_t address, unsigned int vectors,
> > +   unsigned int pirq, uint32_t mask)
> > +{
> > +unsigned int i;
> > +
> > +ASSERT(pcidevs_locked());
> > +
> > +for ( i = 0; i < vectors; i++ )
> > +{
> > +uint8_t vector = MASK_EXTR(data, MSI_DATA_VECTOR_MASK);
> > +uint8_t vector_mask = 0xff >> (8 - fls(vectors) + 1);
> > +struct xen_domctl_bind_pt_irq bind = {
> > +.machine_irq = pirq + i,
> > +.irq_type = PT_IRQ_TYPE_MSI,
> > +.u.msi.gvec = (vector & ~vector_mask) |
> > +  ((vector + i) & vector_mask),
> > +.u.msi.gflags = msi_gflags(data, address, (mask >> i) & 1),
> > +};
> > +int rc = pt_irq_create_bind(pdev->domain, );
> > +
> > +if ( rc )
> > +{
> > +gdprintk(XENLOG_ERR,
> > + "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
> > + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> > + PCI_FUNC(pdev->devfn), pirq + i, rc);
> > +while ( bind.machine_irq-- )
> > +pt_irq_destroy_bind(pdev->domain, );
> 
> I realize this is just code movement, but is this while() correct? I think it
> can only be correct if pirq (which bind.machine_irq gets initialized from)
> was always zero, yet that doesn't look to be the case.
> 
> If you agree, I'd prefer fixed code to be moved (read: wants a prereq
> patch), or for the fix to be applied while moving the code (suitably
> reasoned about in the description).

Right, this should be:

while ( bind.machine_irq-- >= pirq )
pt_irq_destroy_bind(pdev->domain, );

Will fix before moving the code.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] vpci/msi: fix update of bound MSI interrupts

2018-05-14 Thread Roger Pau Monné
On Mon, May 14, 2018 at 06:29:37AM -0600, Jan Beulich wrote:
> >>> On 08.05.18 at 11:25,  wrote:
> > --- a/xen/arch/x86/hvm/vmsi.c
> > +++ b/xen/arch/x86/hvm/vmsi.c
> > @@ -699,6 +699,29 @@ static int vpci_msi_update(const struct pci_dev *pdev, 
> > uint32_t data,
> >  return 0;
> >  }
> >  
> > +int vpci_msi_arch_update(struct vpci_msi *msi, const struct pci_dev *pdev)
> > +{
> > +int rc;
> > +
> > +ASSERT(msi->arch.pirq != INVALID_PIRQ);
> > +
> > +pcidevs_lock();
> > +rc = vpci_msi_update(pdev, msi->data, msi->address, msi->vectors,
> > + msi->arch.pirq, msi->mask);
> > +if ( rc )
> > +{
> > +spin_lock(>domain->event_lock);
> > +unmap_domain_pirq(pdev->domain, msi->arch.pirq);
> 
> This looks quite undesirable - a failed update should leave the interrupt in 
> its
> prior state rather than unbinding it. Is that overly difficult to achieve?

Oh, TBH I would expect that writing an invalid data or address fields
will disable MSI instead of keep using the old values. I'm not sure I
see the reason to keep using the old values, certainly that could make
something else go very wonky inside of the guest itself.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/5] libxc/pvh: set default MTRR type to write-back

2018-05-14 Thread Roger Pau Monné
On Mon, May 14, 2018 at 05:42:53PM +0100, Wei Liu wrote:
> On Mon, May 14, 2018 at 05:02:47PM +0100, Roger Pau Monné wrote:
> > On Mon, May 14, 2018 at 05:00:51PM +0100, Wei Liu wrote:
> > > On Thu, May 10, 2018 at 06:15:04PM +0100, Roger Pau Monne wrote:
> > > > @@ -1014,6 +1034,30 @@ static int vcpu_hvm(struct xc_dom_image *dom)
> > > >  if ( dom->start_info_seg.pfn )
> > > >  bsp_ctx.cpu.rbx = dom->start_info_seg.pfn << PAGE_SHIFT;
> > > >  
> > > > +/* Set the MTRR. */
> > > > +bsp_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR);
> > > > +bsp_ctx.mtrr_d.instance = 0;
> > > > +bsp_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR);
> > > > +
> > > > +mtrr_record = hvm_get_save_record(full_ctx, HVM_SAVE_CODE(MTRR), 
> > > > 0);
> > > > +if ( !mtrr_record )
> > > > +{
> > > > +xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
> > > > + "%s: unable to get MTRR save record", __func__);
> > > > +goto out;
> > > > +}
> > > 
> > > Will this break cross version migration when the older hypervisor
> > > doesn't have such record?
> > 
> > This migration record is already present, it's not introduced in this
> > patch series. I'm simply making use of it in order to set a valid
> > initial MTRR state.
> > 
> 
> Then I'm even more confused: does this mean the record is not properly
> honoured in the hypervisor? I.e. this is a hypervisor bug?

I'm kind of lost. This code I'm adding is not used for migration at
all. This is used in order to set the initial vCPU state for PVH
guests (which is done using xc_domain_hvm_setcontext). In order to
provide an initial MTRR state we need to create a MTRR record and
provide it to the hypervisor. I'm not introducing the MTRR record in
this patch, I'm just making use of it in order to set a sane initial
MTRR state for PVH guests.

libxc already does the same in order to set the initial CPU register
state for PVH/HVM guests.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] pci: treat class 0 devices as endpoints

2018-05-14 Thread Roger Pau Monné
On Mon, May 14, 2018 at 06:46:08AM -0600, Jan Beulich wrote:
> >>> On 08.05.18 at 11:33, <roger@citrix.com> wrote:
> > Class 0 devices are legacy pre PCI 2.0 devices that didn't have a
> > class code. Treat them as endpoints, so that they can be handled by
> > the IOMMU and properly passed-through to the hardware domain.
> > 
> > Such device has been seen on a Super Micro server, lspci -vv reports:
> > 
> > 00:13.0 Non-VGA unclassified device: Intel Corporation Device a135 (rev 31)
> > Subsystem: Super Micro Computer Inc Device 0931
> > Flags: bus master, fast devsel, latency 0, IRQ 11
> > Memory at df222000 (64-bit, non-prefetchable) [size=4K]
> > Capabilities: [80] Power Management version 3
> > 
> > Arguably this is not a legacy device (since this is a new server), but
> > in any case Xen needs to deal with it.
> 
> Well, it's a two fold argument: On one hand I agree we ought to be dealing
> with class 0. Otoh this particular device is an example to the contrary - we
> should try to avoid passing through broken devices: If their designers don't
> even get the class code right, what other flaws do we have to expect?
> Anyway, this is no objection to the actual code change, I'm merely
> unconvinced that the argumentation is plausible.
> 
> > Suggested-by: Andrew Cooper <andrew.coop...@citrix.com>
> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> 
> Acked-by: Jan Beulich <jbeul...@suse.com>

Thanks.

Since I also have another patch already Acked, I guess I will maintain
my for-4.12 branch until the tree opens again.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 0/2] vpci/msi: fix updating already bound MSI interrupts

2018-05-08 Thread Roger Pau Monné
On Tue, May 08, 2018 at 11:30:18AM +0200, Juergen Gross wrote:
> On 08/05/18 11:23, Roger Pau Monne wrote:
> > Hello,
> > 
> > There's a bug in current vpci code for MSI emulation when updating an
> > already bound interrupt. The code will disable and enable the interrupt
> > in order to update the binding, which calls unmap_domain_pirq that
> > disables the global MSI enable flag in the control register.
> > 
> > In order to fix this incorrect behavior introduce a new update helper
> > that should be used to update the bindings of an already enabled group
> > of MSI interrupts.
> > 
> > Thanks, Roger.
> > 
> > Roger Pau Monne (2):
> >   vpci/msi: split code to bind pirq
> >   vpci/msi: fix update of bound MSI interrupts
> > 
> >  xen/arch/x86/hvm/vmsi.c | 96 +
> >  xen/drivers/vpci/msi.c  |  3 +-
> >  xen/include/xen/vpci.h  |  2 +
> >  3 files changed, 71 insertions(+), 30 deletions(-)
> > 
> 
> I guess this is 4.11 material?

Hm, I think so given PVH Dom0 is experimental. OTOH this changes are
to code used exclusively by PVH Dom0, so there's no chance of breaking
anything else. Let's see what the maintainers think and I can Cc you
later if required.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/3] xen/pvh: enable and set default MTRR type

2018-05-09 Thread Roger Pau Monné
On Wed, May 09, 2018 at 12:30:16PM +0100, Roger Pau Monné wrote:
> On Wed, May 09, 2018 at 11:56:40AM +0100, Andrew Cooper wrote:
> > On 09/05/18 11:21, Roger Pau Monne wrote:
> > I'm not sure that setting the default MTRR type is going to be a
> > clever idea in hindsight when we come to doing PCI Passthrough support.
> 
> Setting the default type to WB is also set by hvmloader, it's just
> that hvmloader also sets some of the fixed and variable ranges to UC
> in order to cover the iomem areas.
> 
> The expectations when doing pci-passthrough is that the guest will
> always use paging and PAT in order to set the appropriate cache
> attributes, or else the guest itself will have to program the UC MTRR
> ranges, I admit that's not very nice however.
> 
> What about enabling the default MTRR type and setting it to WB in the
> toolstack for PVH? IMO doing it Xen itself would be wrong.

I have the following patch to set the default MTRR type, but I think
if we go down this road then we will also have to set UC MTRRs for
MMIO areas, which again seems fine to me.

---8<---
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index e33a28847d..3cb1a1720f 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -938,6 +938,8 @@ static int vcpu_hvm(struct xc_dom_image *dom)
 HVM_SAVE_TYPE(HEADER) header;
 struct hvm_save_descriptor cpu_d;
 HVM_SAVE_TYPE(CPU) cpu;
+struct hvm_save_descriptor mtrr_d;
+HVM_SAVE_TYPE(MTRR) mtrr;
 struct hvm_save_descriptor end_d;
 HVM_SAVE_TYPE(END) end;
 } bsp_ctx;
@@ -1014,6 +1016,15 @@ static int vcpu_hvm(struct xc_dom_image *dom)
 if ( dom->start_info_seg.pfn )
 bsp_ctx.cpu.rbx = dom->start_info_seg.pfn << PAGE_SHIFT;
 
+/* Set the MTRR. */
+bsp_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR);
+bsp_ctx.mtrr_d.instance = 0;
+bsp_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR);
+/* XXX: maybe this should be a firmware option instead? */
+if ( !dom->device_model )
+/* Enable MTRR (bit 11) and set the default type to WB (6). */
+bsp_ctx.mtrr.msr_mtrr_def_type = (1u << 11) | 6;
+
 /* Set the end descriptor. */
 bsp_ctx.end_d.typecode = HVM_SAVE_CODE(END);
 bsp_ctx.end_d.instance = 0;


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/3] xen/pvh: enable and set default MTRR type

2018-05-09 Thread Roger Pau Monné
On Wed, May 09, 2018 at 11:56:40AM +0100, Andrew Cooper wrote:
> On 09/05/18 11:21, Roger Pau Monne wrote:
> > On PVH MTRR is not initialized by the firmware (because there's no
> > firmware), so the kernel is started with MTRR disabled which means all
> > memory accesses are UC.
> >
> > So far there have been no issues (ie: slowdowns) caused by this
> > because PVH only supported DomU mode without passed-through devices,
> > so Xen was using WB as the default memory type instead of UC.
> >
> > Fix this by enabling MTRR and setting the default type to WB. Linux
> > will use PAT to set the actual memory cache attributes.
> >
> > Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> 
> I'd argue that this is a bug in PVH starting state.

Do you mean that MTRR should be setup before starting the guest?

> Do you know what mechanism is used to bodge things to WB in the first
> place?

If you mean when passthorugh is not used (ie: no IOMMU), then it's at
epte_get_entry_emt, grep for need_iommu(d) (line ~801).

> I'm not sure that setting the default MTRR type is going to be a
> clever idea in hindsight when we come to doing PCI Passthrough support.

Setting the default type to WB is also set by hvmloader, it's just
that hvmloader also sets some of the fixed and variable ranges to UC
in order to cover the iomem areas.

The expectations when doing pci-passthrough is that the guest will
always use paging and PAT in order to set the appropriate cache
attributes, or else the guest itself will have to program the UC MTRR
ranges, I admit that's not very nice however.

What about enabling the default MTRR type and setting it to WB in the
toolstack for PVH? IMO doing it Xen itself would be wrong.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC] libxl: set 1GB MMIO hole for PVH

2018-05-10 Thread Roger Pau Monné
On Wed, May 09, 2018 at 06:12:28PM +0200, Juergen Gross wrote:
> On 09/05/18 18:07, Roger Pau Monne wrote:
> > This prevents page-shattering, by being able to populate the RAM
> > regions below 4GB using 1GB pages, provided the guest memory size is
> > set to a multiple of a GB.
> > 
> > Note that there are some special and ACPI pages in the MMIO hole that
> > will be populated using smaller order pages, but those shouldn't be
> > accessed as often as RAM regions.
> 
> Would it be possible somehow to put a potential firmware into that
> 1GB region, too, if it needs any memory in high memory? Seabios e.g.
> is taking the last RAM page of the guest for its hypercall page, which
> will again shatter GB mappings.

I know this comment is related to HVM guests, but I'm not sure I see
how setting the hypercall page shatters GB mappings. Setting the
hypercall page doesn't involve changing any p2m mappings, but just
filling a guest RAM page with some data.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC] libxl: set 1GB MMIO hole for PVH

2018-05-10 Thread Roger Pau Monné
On Thu, May 10, 2018 at 10:43:26AM +0100, George Dunlap wrote:
> On Wed, May 9, 2018 at 5:07 PM, Roger Pau Monne  wrote:
> > This prevents page-shattering, by being able to populate the RAM
> > regions below 4GB using 1GB pages, provided the guest memory size is
> > set to a multiple of a GB.
> >
> > Note that there are some special and ACPI pages in the MMIO hole that
> > will be populated using smaller order pages, but those shouldn't be
> > accessed as often as RAM regions.
> 
> Is it possible to run PVH in pure 32-bit mode (as opposed to 32-bit
> PAE)?  If so, such guests would be limited to 3GiB of total memory
> (instead of 4GiB).

Yes, that's correct. PVH guests are not limited to any mode, you could
even run them in protected or real mode.

> But I suppose there's no particular reason to run PVH in pure 32-bit
> mode instead of 32-bit PAE.  (I don't *think* TLB misses are slower on
> 3-level paging than 2-level paging, because the L3 entries are
> essentially loaded on CR3 switch.)
> 
> So at the moment this seems OK to me.  If someone decides they want to
> run PVH 2-level paging with more than 3GiB of RAM, we can easily add
> an option to turn it on.

That's my opinion. HVM guests already have a mmio_hole option, it
would be almost trivial to make that option also available to PVH if
there's a need for it.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/domain: Drop the only-written smap_check_policy infrastructure

2018-05-08 Thread Roger Pau Monné
On Tue, May 08, 2018 at 02:03:04PM +0100, Andrew Cooper wrote:
> c/s 4c5d78a10d "x86/pagewalk: Re-implement the pagetable walker" dropped the
> consumer of smap_policy.  Looking at c/s 31ae587e6f which introduced the
> smap_check logic, it exists only to work around a bug in guest_walk_tables()
> was resolved by the aformentioned commit.
> 
> Remove the unused variables and associated infrastructure.
> 
> Reported-by: Jason Andryuk <jandr...@gmail.com>
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 8b66096..197f8d6 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -595,7 +583,6 @@ struct arch_vcpu
>  
>  struct guest_memory_policy
>  {
> -smap_check_policy_t smap_policy;
>  bool nested_guest_mode;

Maybe guest_memory_policy could be dropped and
update_guest_memory_policy updated to take a bool nested_mode
parameter? Or the function can be dropped altogether likely.

In any case, this can be done in a followup cleanup patch.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] libxc: Provide access to internal handles

2018-05-15 Thread Roger Pau Monné
On Mon, May 14, 2018 at 06:08:57PM +0100, Ian Jackson wrote:
> In order to support auditing of qemu depriv, my audit tool wants to
> know the fd of a privcmd handle on which it can easily make
> hypercalls.  xencall provides such a handle, but has no cooked
> facilities for making hypercalls.  So I open a libxc handle.  That
> means I need to get the privcmd fd out of the libxc handle.
> 
> ISTM that it is best to do this by providing an interface to get the
> underlying library handles for a libxc handle.  This kind of interface
> is quite common elsewhere and has not caused problems.
> 
> libxc is not a stable API so the downside risk of providing this
> access is not significant.
> 
> Signed-off-by: Ian Jackson 
> ---
>  tools/libxc/include/xenctrl.h | 10 ++
>  tools/libxc/xc_private.c  |  5 +
>  2 files changed, 15 insertions(+)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 408fa1c..d7733aa 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -183,6 +183,16 @@ enum xc_open_flags {
>   */
>  int xc_interface_close(xc_interface *xch);
>  
> +/**
> + * Return the handles which xch has opened and will use for
> + * hypercalls, foreign memory accesses and device model operations.
> + * These may be used with the corresponding libraries so long as the
> + * xch itself remains open.
> + */
> +struct xencall_handle *xc_interface_xcall_handle(xc_interface *xch);
> +struct xenforeignmemory_handle *xc_interface_fmem_handle(xc_interface *xch);
> +struct xendevicemodel_handle *xc_interface_dmod_handle(xc_interface *xch);

You introduce 3 prototypes but there's only one function being defined
below. Is this patch missing some chunks or I'm missing something
myself?

> +xencall_handle *xc_interface_xcall_handle(xc_interface *xch)
> +{
> +return xch->xcall;
> +}
> +
>  static pthread_key_t errbuf_pkey;
>  static pthread_once_t errbuf_pkey_once = PTHREAD_ONCE_INIT;

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH 06/10] Perl @INC path: fix a few more scripts to use BEGIN

2018-05-17 Thread Roger Pau Monné
On Thu, May 17, 2018 at 12:16:55PM +0100, Ian Jackson wrote:
> Three more files which missed out on
>   dea987c5ab11 "PERLLIB, @INC: Use BEGIN { }"
> 
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH 07/10] mg-anoint: Make readonly operations "work" in standalone mode

2018-05-17 Thread Roger Pau Monné
On Thu, May 17, 2018 at 12:16:56PM +0100, Ian Jackson wrote:
> This makes `mg-anoint' in standalone mode a view onto an empty set of
> anointments.  So now it becomes ok to call mg-anoint in make-*-flight.
> 
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.11] x86/cacheattr: fix mtrr_pat_not_equal

2018-05-17 Thread Roger Pau Monné
On Thu, May 17, 2018 at 04:44:04AM -0600, Jan Beulich wrote:
> >>> On 17.05.18 at 11:48, <roger@citrix.com> wrote:
> > The function is supposed to return whether the MTRR and PAT state of
> > two CPUs match. Currently this is not properly done because the test
> > for the deftype and the enable bits required both the deftype and the
> > enable bits to be different, while just one of those fields being
> > different can already cause the MTRR states on the vCPU to not match.
> > 
> > Fix this by changing the AND into an OR instead, so that either the
> > deftype or the enabled bits being different will cause the function to
> > return mismatching state.
> 
> This is by far not enough, but I didn't view the function as critical
> enough to warrant sending out the patch I have right away.

I've also realized that the logic there is wonky and would return true
in cases where the states are equal (ie: for example if fixed MTRRs
contents are different but FE is disabled).

Just wanted to do a minimal change that prevents wrongly reporting
that the state is equal when it's not (I think the other way around is
not that critical).

You change LGTM, and fixes some obvious cases where the current code
would return true even if the cache state is the same.

> Jan
> x86/HVM: correct mtrr_pat_not_equal()
> 
> The two vCPU-s differring in MTRR-enabled state means MTRR settings are
> not equal. Both vCPU-s having MTRRs disabled means only PAT needs to be
> compared. Along those lines for fixed range MTRRs. Differring variable
> range counts likewise mean settings are different overall.
> 
> Constify types and convert bool_t to bool.
> 
> Signed-off-by: Jan Beulich <jbeul...@suse.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

> --- unstable.orig/xen/arch/x86/hvm/mtrr.c
> +++ unstable/xen/arch/x86/hvm/mtrr.c
> @@ -476,35 +476,40 @@ bool_t mtrr_var_range_msr_set(
>  return 1;
>  }
>  
> -bool_t mtrr_pat_not_equal(struct vcpu *vd, struct vcpu *vs)
> +bool mtrr_pat_not_equal(const struct vcpu *vd, const struct vcpu *vs)
>  {
> -struct mtrr_state *md = >arch.hvm_vcpu.mtrr;
> -struct mtrr_state *ms = >arch.hvm_vcpu.mtrr;
> -int32_t res;
> -uint8_t num_var_ranges = (uint8_t)md->mtrr_cap;
> -
> -/* Test fixed ranges. */
> -res = memcmp(md->fixed_ranges, ms->fixed_ranges,
> -NUM_FIXED_RANGES*sizeof(mtrr_type));
> -if ( res )
> -return 1;
> -
> -/* Test var ranges. */
> -res = memcmp(md->var_ranges, ms->var_ranges,
> -num_var_ranges*sizeof(struct mtrr_var_range));
> -if ( res )
> -return 1;
> -
> -/* Test default type MSR. */
> -if ( (md->def_type != ms->def_type)
> -&& (md->enabled != ms->enabled) )
> -return 1;
> +const struct mtrr_state *md = >arch.hvm_vcpu.mtrr;
> +const struct mtrr_state *ms = >arch.hvm_vcpu.mtrr;
>  
> -/* Test PAT. */
> -if ( vd->arch.hvm_vcpu.pat_cr != vs->arch.hvm_vcpu.pat_cr )
> -return 1;
> +if ( (md->enabled ^ ms->enabled) & 2 )
> +return true;
> +
> +if ( md->enabled & 2 )
> +{
> +unsigned int num_var_ranges = (uint8_t)md->mtrr_cap;
> +
> +/* Test default type MSR. */
> +if ( md->def_type != ms->def_type )
> +return true;
> +
> +/* Test fixed ranges. */
> +if ( (md->enabled ^ ms->enabled) & 1 )
> +return true;
> +
> +if ( (md->enabled & 1) &&
> + memcmp(md->fixed_ranges, ms->fixed_ranges,
> +sizeof(md->fixed_ranges)) )
> +return true;
> +
> +/* Test variable ranges. */
> +if ( num_var_ranges != (uint8_t)ms->mtrr_cap ||

Is it really possible to have two vCPUs on the same domain with a
different number of variable ranges?

> + memcmp(md->var_ranges, ms->var_ranges,
> +num_var_ranges * sizeof(*md->var_ranges)) )
> +return true;
> +}
>  
> -return 0;
> +/* Test PAT. */
> +return vd->arch.hvm_vcpu.pat_cr != vs->arch.hvm_vcpu.pat_cr;
>  }
>  
>  struct hvm_mem_pinned_cacheattr_range {
> --- unstable.orig/xen/include/asm-x86/mtrr.h
> +++ unstable/xen/include/asm-x86/mtrr.h
> @@ -92,6 +92,6 @@ extern void memory_type_changed(struct d
>  extern bool_t pat_msr_set(uint64_t *pat, uint64_t msr);
>  
>  bool_t is_var_mtrr_overlapped(const struct mtrr_state *m);
> -bool_t mtrr_pat_not_equal(struct vcpu *vd, struct vcpu *vs);
> +bool mtrr_pat_not_equal(const struct vcpu *vd, const struct vcpu *vs);
>  
>  #endif /* __ASM_X86_MTRR_H__ */
> 
> 
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH 10/10] mfi-common: Fall back to anointed builds in Executive mode

2018-05-17 Thread Roger Pau Monné
On Thu, May 17, 2018 at 12:16:59PM +0100, Ian Jackson wrote:
> Previously, `make-*-flight' would not work unless FREEBSD_*_BUILDJOB
> was set.  Now we use the anointed values if we can find them.
> 
> If we can't, mg-anoint retrieve will print a warning.
> 
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

LGTM

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH 08/10] mg-anoint: Support mg-anoint retrieve --tolerate-unprepared

2018-05-17 Thread Roger Pau Monné
On Thu, May 17, 2018 at 12:16:57PM +0100, Ian Jackson wrote:
> make-*-flight is going to want this.
> 
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Just a style comment below.

> ---
>  mg-anoint | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/mg-anoint b/mg-anoint
> index 522cbdd..d09124b 100755
> --- a/mg-anoint
> +++ b/mg-anoint
> @@ -15,10 +15,12 @@
>  #--allow-blessed=BLESSING,...   default is from `prepare'
>  #--allow-job-status=STATUS,...  default is only `pass'
>  #
> -#  ./mg-anoint retrieve REFKEY
> +#  ./mg-anoint retrieve [--tolerate-unprepared] REFKEY
>  #  => FLIGHT JOB
>  # if nothing anointed yet, prints nothing and exits 0
>  # if anointment not prepared, fails
> +#  With --tolerate-unprepared, it is not an error if nothing is
> +#  reported because the anointment has not been prepared.
>  #
>  #  ./mg-anoint list-prepared REFKEY-GLOB
>  #  => possibly empty list of REFKEYs
> @@ -294,6 +296,11 @@ END
>  }
>  
>  sub cmd_retrieve {
> +my $tolerate_unprepared;
> +if (@ARGV && $ARGV[0] eq '--tolerate-unprepared') {
> + shift @ARGV;
> + $tolerate_unprepared = 1;
> +}

There seems to be a mix between hard and soft tabs in the chunk above
(and below).

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH 09/10] mfi-common: set_freebsd_runvars: Never set freebsd_distpath to `/amd64' etc.

2018-05-17 Thread Roger Pau Monné
On Thu, May 17, 2018 at 12:16:58PM +0100, Ian Jackson wrote:
> Logically, the final branch of the if should be qualified with a check
> for the emptiness of FreeBSDDist.  This is awkward in the current
> structure, since we really want to do the distpath lookup only if
> needed.  (This is not very important right now, but we are about to
> add another case which will do a more-likely-to-bomb-out and
> more-likely-to-block-on-the-db lookup.)  So refactor into `return'
> style.  This lets us introduce local variables in each branch.
> 
> Now gate the final branch appropriately.  The overall result is that
> if no useful FreeBSD build is found, we simply do not set the
> freebsd_* runvars, rather than setting them to wrong values (eg,
> `freebsd_distpath=/i386'.
> 
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 3/6] hvm/mtrr: use the hardware number of variable ranges for Dom0

2018-05-16 Thread Roger Pau Monné
On Wed, May 16, 2018 at 07:51:07AM -0600, Jan Beulich wrote:
> >>> On 16.05.18 at 15:41,  wrote:
> > On Wed, May 16, 2018 at 06:01:00AM -0600, Jan Beulich wrote:
> >> @@ -729,6 +729,14 @@ static int hvm_load_mtrr_msr(struct doma
> >>  if ( hvm_load_entry(MTRR, h, _mtrr) != 0 )
> >>  return -EINVAL;
> >>  
> >> +if ( (uint8_t)hw_mtrr.msr_mtrr_cap > MTRR_VCNT )
> > 
> > And here I would compare against the VCNT in mtrr_state->mtrr_cat
> > instead of MTRR_VCNT if you agree.
> 
> No, I don't agree: We're _defining_ the number of MTRRs for this vCPU
> here. There's no difference if such a record is loaded only once, or if all
> such records agree in count. But I don't see why we should refuse
> loading a record with a count higher than that provided by an earlier
> record, but no larger than MTRR_VCNT.

You are right, a previous load could have changed mtrr_state->mtrr_cap
to < MTRR_VCNT, and then subsequent intents to load MTRR_VCNT number
of MTRRs would fail.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 3/6] hvm/mtrr: use the hardware number of variable ranges for Dom0

2018-05-16 Thread Roger Pau Monné
On Wed, May 16, 2018 at 06:01:00AM -0600, Jan Beulich wrote:
> >>> On 16.05.18 at 13:53,  wrote:
> > On Wed, May 16, 2018 at 02:39:26AM -0600, Jan Beulich wrote:
> >> >>> On 15.05.18 at 16:36,  wrote:
> >> > +for ( i = 0; i < num_var_ranges; i++ )
> >> 
> >> Following your v1 I had already put together a patch to change just the
> >> save and load functions here, as the adjustments are necessary
> >> independent of the Dom0 aspect. Should num_var_ranges indeed be
> >> below MTRR_VCNT, there's an information leak here (of hypervisor stack
> >> data) without pre-initializing hw_mtrr. Here's the hunk from my patch, in
> >> case you care to re-use parts of it:
> >> 
> >> @@ -676,22 +676,22 @@ int hvm_set_mem_pinned_cacheattr(struct
> >>  
> >>  static int hvm_save_mtrr_msr(struct domain *d, hvm_domain_context_t *h)
> >>  {
> >> -int i;
> >>  struct vcpu *v;
> >> -struct hvm_hw_mtrr hw_mtrr;
> >> -struct mtrr_state *mtrr_state;
> >> +
> >>  /* save mtrr */
> >>  for_each_vcpu(d, v)
> >>  {
> >> -mtrr_state = >arch.hvm_vcpu.mtrr;
> >> +const struct mtrr_state *mtrr_state = >arch.hvm_vcpu.mtrr;
> >> +struct hvm_hw_mtrr hw_mtrr = {
> >> +.msr_mtrr_def_type = mtrr_state->def_type |
> >> + (mtrr_state->enabled << 10),
> >> +.msr_mtrr_cap  = mtrr_state->mtrr_cap,
> >> +};
> >> +unsigned int i;
> >>  
> >>  hvm_get_guest_pat(v, _mtrr.msr_pat_cr);
> >>  
> >> -hw_mtrr.msr_mtrr_def_type = mtrr_state->def_type
> >> -| (mtrr_state->enabled << 10);
> >> -hw_mtrr.msr_mtrr_cap = mtrr_state->mtrr_cap;
> >> -
> >> -for ( i = 0; i < MTRR_VCNT; i++ )
> >> +for ( i = 0; i < (uint8_t)hw_mtrr.msr_mtrr_cap; i++ )
> >>  {
> >>  /* save physbase */
> >>  hw_mtrr.msr_mtrr_var[i*2] =
> >> 
> >> (I didn't send it out yet as I'm generally of the opinion that prior to
> >> branching focus should be on the code to be released rather than
> >> the next following version.)
> > 
> > Would you be OK if I integrate this as a pre-patch to this one in my
> > series?
> 
> Sure, but then maybe better use the full one:

Thanks, I've added this to my MTRR series just after the switch to use
MASK_EXTR to get the VCNT.

> x86/HVM: improve MTRR load checks
> 
> We should not assume that the incoming set of values contains exactly
> MTRR_VCNT variable range MSRs. Permit a smaller amount and reject a
> bigger one. As a result the save path then also needs to no longer use
> a fixed upper bound, in turn requiring unused space in the save record
> to be zeroed up front.
> 
> Also slightly refine types where appropriate.
> 
> Signed-off-by: Jan Beulich 
> 
> --- unstable.orig/xen/arch/x86/hvm/mtrr.c
> +++ unstable/xen/arch/x86/hvm/mtrr.c
> @@ -676,22 +676,22 @@ int hvm_set_mem_pinned_cacheattr(struct
>  
>  static int hvm_save_mtrr_msr(struct domain *d, hvm_domain_context_t *h)
>  {
> -int i;
>  struct vcpu *v;
> -struct hvm_hw_mtrr hw_mtrr;
> -struct mtrr_state *mtrr_state;
> +
>  /* save mtrr */
>  for_each_vcpu(d, v)
>  {
> -mtrr_state = >arch.hvm_vcpu.mtrr;
> +const struct mtrr_state *mtrr_state = >arch.hvm_vcpu.mtrr;
> +struct hvm_hw_mtrr hw_mtrr = {
> +.msr_mtrr_def_type = mtrr_state->def_type |
> + (mtrr_state->enabled << 10),
> +.msr_mtrr_cap  = mtrr_state->mtrr_cap,
> +};
> +unsigned int i;
>  
>  hvm_get_guest_pat(v, _mtrr.msr_pat_cr);
>  
> -hw_mtrr.msr_mtrr_def_type = mtrr_state->def_type
> -| (mtrr_state->enabled << 10);
> -hw_mtrr.msr_mtrr_cap = mtrr_state->mtrr_cap;
> -
> -for ( i = 0; i < MTRR_VCNT; i++ )
> +for ( i = 0; i < (uint8_t)hw_mtrr.msr_mtrr_cap; i++ )

I've changed this to use MASK_EXTR instead (and switched the cases
below).

>  {
>  /* save physbase */
>  hw_mtrr.msr_mtrr_var[i*2] =
> @@ -729,6 +729,14 @@ static int hvm_load_mtrr_msr(struct doma
>  if ( hvm_load_entry(MTRR, h, _mtrr) != 0 )
>  return -EINVAL;
>  
> +if ( (uint8_t)hw_mtrr.msr_mtrr_cap > MTRR_VCNT )

And here I would compare against the VCNT in mtrr_state->mtrr_cat
instead of MTRR_VCNT if you agree.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/4] libxl: Provide better error message when qemu restrict user not found

2018-05-15 Thread Roger Pau Monné
On Mon, May 14, 2018 at 06:08:59PM +0100, Ian Jackson wrote:
> Add mention of LIBXL_QEMU_USER_RANGE_BASE, in case that is what the
> user was intending.
> 
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Draft NVDIMM proposal

2018-05-15 Thread Roger Pau Monné
Just some replies/questions to some of the points raised below.

On Fri, May 11, 2018 at 09:33:10AM -0700, Dan Williams wrote:
> [ adding linux-nvdimm ]
> 
> Great write up! Some comments below...
> 
> On Wed, May 9, 2018 at 10:35 AM, George Dunlap  
> wrote:
> >> To use a namespace, an operating system needs at a minimum two pieces
> >> of information: The UUID and/or Name of the namespace, and the SPA
> >> range where that namespace is mapped; and ideally also the Type and
> >> Abstraction Type to know how to interpret the data inside.
> 
> Not necessarily, no. Linux supports "label-less" mode where it exposes
> the raw capacity of a region in 1:1 mapped namespace without a label.
> This is how Linux supports "legacy" NVDIMMs that do not support
> labels.

In that case, how does Linux know which area of the NVDIMM it should
use to store the page structures?

> >> `fsdax` and `devdax` mode are both designed to make it possible for
> >> user processes to have direct mapping of NVRAM.  As such, both are
> >> only suitable for PMEM namespaces (?).  Both also need to have kernel
> >> page structures allocated for each page of NVRAM; this amounts to 64
> >> bytes for every 4k of NVRAM.  Memory for these page structures can
> >> either be allocated out of normal "system" memory, or inside the PMEM
> >> namespace itself.
> >>
> >> In both cases, an "info block", very similar to the BTT info block, is
> >> written to the beginning of the namespace when created.  This info
> >> block specifies whether the page structures come from system memory or
> >> from the namespace itself.  If from the namespace itself, it contains
> >> information about what parts of the namespace have been set aside for
> >> Linux to use for this purpose.
> >>
> >> Linux has also defined "Type GUIDs" for these two types of namespace
> >> to be stored in the namespace label, although these are not yet in the
> >> ACPI spec.
> 
> They never will be. One of the motivations for GUIDs is that an OS can
> define private ones without needing to go back and standardize them.
> Only GUIDs that are needed to inter-OS / pre-OS compatibility would
> need to be defined in ACPI, and there is no expectation that other
> OSes understand Linux's format for reserving page structure space.

Maybe it would be helpful to somehow mark those areas as
"non-persistent" storage, so that other OSes know they can use this
space for temporary data that doesn't need to survive across reboots?

> >> # Proposed design / roadmap
> >>
> >> Initially, dom0 accesses the NVRAM as normal, using static ACPI tables
> >> and the DSM methods; mappings are treated by Xen during this phase as
> >> MMIO.
> >>
> >> Once dom0 is ready to pass parts of a namespace through to a guest, it
> >> makes a hypercall to tell Xen about the namespace.  It includes any
> >> regions of the namespace which Xen may use for 'scratch'; it also
> >> includes a flag to indicate whether this 'scratch' space may be used
> >> for frame tables from other namespaces.
> >>
> >> Frame tables are then created for this SPA range.  They will be
> >> allocated from, in this order: 1) designated 'scratch' range from
> >> within this namespace 2) designated 'scratch' range from other
> >> namespaces which has been marked as sharable 3) system RAM.
> >>
> >> Xen will either verify that dom0 has no existing mappings, or promote
> >> the mappings to full pages (taking appropriate reference counts for
> >> mappings).  Dom0 must ensure that this namespace is not unmapped,
> >> modified, or relocated until it asks Xen to unmap it.
> >>
> >> For Xen frame tables, to begin with, set aside a partition inside a
> >> namespace to be used by Xen.  Pass this in to Xen when activating the
> >> namespace; this could be either 2a or 3a from "Page structure
> >> allocation".  After that, we could decide which of the two more
> >> streamlined approaches (2b or 3b) to pursue.
> >>
> >> At this point, dom0 can pass parts of the mapped namespace into
> >> guests.  Unfortunately, passing files on a fsdax filesystem is
> >> probably not safe; but we can pass in full dev-dax or fsdax
> >> partitions.
> >>
> >> From a guest perspective, I propose we provide static NFIT only, no
> >> access to labels to begin with.  This can be generated in hvmloader
> >> and/or the toolstack acpi code.
> 
> I'm ignorant of Xen internals, but can you not reuse the existing QEMU
> emulation for labels and NFIT?

We only use QEMU for HVM guests, which would still leave PVH guests
without NVDIMM support. Ideally we would like to use the same solution
for both HVM and PVH, which means QEMU cannot be part of that
solution.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/5] hvm/mtrr: copy hardware state for Dom0

2018-05-15 Thread Roger Pau Monné
On Tue, May 15, 2018 at 02:48:16AM -0600, Jan Beulich wrote:
> >>> On 15.05.18 at 10:35, <roger@citrix.com> wrote:
> > On Tue, May 15, 2018 at 01:52:43AM -0600, Jan Beulich wrote:
> >> >>> On 14.05.18 at 18:33, <roger@citrix.com> wrote:
> >> > On Mon, May 14, 2018 at 08:26:30AM -0600, Jan Beulich wrote:
> >> >> >>> On 10.05.18 at 19:15, <roger@citrix.com> wrote:
> >> >> > Copy the state found on the hardware when creating a PVH Dom0. Since
> >> >> > the memory map provided to a PVH Dom0 is based on the native one using
> >> >> > the same set of MTRR ranges should provide Dom0 with a sane MTRR state
> >> >> > without having to manually build it in Xen.
> >> >> > 
> >> >> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> >> >> > ---
> >> >> > Cc: Jan Beulich <jbeul...@suse.com>
> >> >> > Cc: Andrew Cooper <andrew.coop...@citrix.com>
> >> >> > ---
> >> >> >  xen/arch/x86/hvm/mtrr.c | 23 +++
> >> >> >  1 file changed, 23 insertions(+)
> >> >> > 
> >> >> > diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
> >> >> > index 95a3deabea..1cb000388a 100644
> >> >> > --- a/xen/arch/x86/hvm/mtrr.c
> >> >> > +++ b/xen/arch/x86/hvm/mtrr.c
> >> >> > @@ -176,6 +176,29 @@ int hvm_vcpu_cacheattr_init(struct vcpu *v)
> >> >> >  ((uint64_t)PAT_TYPE_UC_MINUS << 48) |   /* PAT6: UC- */
> >> >> >  ((uint64_t)PAT_TYPE_UNCACHABLE << 56);  /* PAT7: UC */
> >> >> >  
> >> >> > +if ( is_hardware_domain(v->domain) )
> >> >> > +{
> >> >> > +/* Copy values from the host. */
> >> >> > +struct domain *d = v->domain;
> >> >> > +unsigned int i;
> >> >> > +
> >> >> > +if ( mtrr_state.have_fixed )
> >> >> > +for ( i = 0; i < NUM_FIXED_MSR; i++ )
> >> >> > +mtrr_fix_range_msr_set(d, m, i,
> >> >> > +  ((uint64_t 
> >> >> > *)mtrr_state.fixed_ranges)[i]);
> >> >> 
> >> >> The presence/absence of fixed range MTRRs needs to be reflected in the
> >> >> capabilities MSR. Strictly speaking in their absence MSR access 
> >> >> attempts to
> >> >> the fixed range MSRs should also cause #GP, as should any attempt to
> >> >> enable them in defType.
> >> > 
> >> > My intention was to always provide the fixed range MTRR capability,
> >> > regardless of whether the underlying hardware has it or not. Then of
> >> > course fixed ranges won't be enabled by default in the deftype MSR if
> >> > the underlying hardware also hasn't got them enabled.
> >> 
> >> What would the result be of the OS writing to any of these MSRs, or
> >> setting the respective enable bit?
> > 
> > Likely the cache attributes for the guest will change if it sets some
> > fixed ranges and enables the FE bit. But I'm not sure why is that a
> > problem.
> 
> "The guest" being Dom0 here, don't forget. I simply don't see how you
> would properly mimic the behavior without there actually being fixed
> range MTRRs. Plus it contradicts the patch description.

Please bear with me.

The reason of this patchset is to provide PVH Dom0 with a sane initial
MTRR state, not to allow a PVH Dom0 to set the host MTRR state
directly.

So the fact that the underlying hardware doesn't have support for
fixed MTRR ranges shouldn't affect Xen's capability to provide such
feature to Dom0.

I see no reason to allow Dom0 to directly control the host MTRR
values. A PVH Dom0 has it's own physical memory map and can set
whatever cache attributes it wishes without affecting the host MTRR
types.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/6] hvm/mtrr: copy hardware state for Dom0

2018-05-16 Thread Roger Pau Monné
On Wed, May 16, 2018 at 02:47:39AM -0600, Jan Beulich wrote:
> >>> On 15.05.18 at 16:36,  wrote:
> > --- a/xen/arch/x86/hvm/mtrr.c
> > +++ b/xen/arch/x86/hvm/mtrr.c
> > @@ -185,6 +185,30 @@ int hvm_vcpu_cacheattr_init(struct vcpu *v)
> >  ((uint64_t)PAT_TYPE_UC_MINUS << 48) |   /* PAT6: UC- */
> >  ((uint64_t)PAT_TYPE_UNCACHABLE << 56);  /* PAT7: UC */
> >  
> > +if ( is_hardware_domain(v->domain) )
> > +{
> > +/* Copy values from the host. */
> > +struct domain *d = v->domain;
> > +unsigned int i;
> > +
> > +if ( mtrr_state.have_fixed )
> > +for ( i = 0; i < NUM_FIXED_MSR; i++ )
> > +mtrr_fix_range_msr_set(d, m, i,
> > +  ((uint64_t 
> > *)mtrr_state.fixed_ranges)[i]);
> > +
> > +for ( i = 0; i < num_var_ranges; i++ )
> > +{
> > +mtrr_var_range_msr_set(d, m, MSR_IA32_MTRR_PHYSBASE(i),
> > +   mtrr_state.var_ranges[i].base);
> > +mtrr_var_range_msr_set(d, m, MSR_IA32_MTRR_PHYSMASK(i),
> > +   mtrr_state.var_ranges[i].mask);
> > +}
> > +
> > +mtrr_def_type_msr_set(d, m,
> > +  mtrr_state.def_type |
> > +  (mtrr_state.enabled << 
> > MTRRdefType_FE_SHIFT));
> 
> This is all very clumsy (not your fault of course) - in particular, the 
> "enabled"
> field is a two-bit value. I'd rather not see this to continue to be that way,
> and hence I've created a patch to clean this up (see below; I'm intentionally
> retaining my own TODO notes in there for your reference, and it's also
> unlikely to apply as is because it sits on top of other changes). In that 
> light I
> think I'd prefer if you either (later) re-based this onto my patch or reverted
> back to the use of the literal 10 here.

I've also realized this but thought about fixing it later.

I don't mind adding this patch to my MTRR series in order to make
rebasing easier on my side if it's OK for you.

I would likely add the enabled and fixed_enabled masks to msr-index.h
and use MASK_* instead of shifts while there.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 3/6] hvm/mtrr: use the hardware number of variable ranges for Dom0

2018-05-16 Thread Roger Pau Monné
On Wed, May 16, 2018 at 02:39:26AM -0600, Jan Beulich wrote:
> >>> On 15.05.18 at 16:36,  wrote:
> > +for ( i = 0; i < num_var_ranges; i++ )
> 
> Following your v1 I had already put together a patch to change just the
> save and load functions here, as the adjustments are necessary
> independent of the Dom0 aspect. Should num_var_ranges indeed be
> below MTRR_VCNT, there's an information leak here (of hypervisor stack
> data) without pre-initializing hw_mtrr. Here's the hunk from my patch, in
> case you care to re-use parts of it:
> 
> @@ -676,22 +676,22 @@ int hvm_set_mem_pinned_cacheattr(struct
>  
>  static int hvm_save_mtrr_msr(struct domain *d, hvm_domain_context_t *h)
>  {
> -int i;
>  struct vcpu *v;
> -struct hvm_hw_mtrr hw_mtrr;
> -struct mtrr_state *mtrr_state;
> +
>  /* save mtrr */
>  for_each_vcpu(d, v)
>  {
> -mtrr_state = >arch.hvm_vcpu.mtrr;
> +const struct mtrr_state *mtrr_state = >arch.hvm_vcpu.mtrr;
> +struct hvm_hw_mtrr hw_mtrr = {
> +.msr_mtrr_def_type = mtrr_state->def_type |
> + (mtrr_state->enabled << 10),
> +.msr_mtrr_cap  = mtrr_state->mtrr_cap,
> +};
> +unsigned int i;
>  
>  hvm_get_guest_pat(v, _mtrr.msr_pat_cr);
>  
> -hw_mtrr.msr_mtrr_def_type = mtrr_state->def_type
> -| (mtrr_state->enabled << 10);
> -hw_mtrr.msr_mtrr_cap = mtrr_state->mtrr_cap;
> -
> -for ( i = 0; i < MTRR_VCNT; i++ )
> +for ( i = 0; i < (uint8_t)hw_mtrr.msr_mtrr_cap; i++ )
>  {
>  /* save physbase */
>  hw_mtrr.msr_mtrr_var[i*2] =
> 
> (I didn't send it out yet as I'm generally of the opinion that prior to
> branching focus should be on the code to be released rather than
> the next following version.)

Would you be OK if I integrate this as a pre-patch to this one in my
series?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen-hvm: stop faking I/O to access PCI config space

2018-05-16 Thread Roger Pau Monné
On Thu, May 03, 2018 at 12:18:40PM +0100, Paul Durrant wrote:
> This patch removes the current hackery where IOREQ_TYPE_PCI_CONFIG
> reqyests are handled by faking PIO to 0xcf8 and 0xcfc and replaces it
> with direct calls to pci_host_config_read/write_common().
> Doing so necessitates mapping BDFs to PCIDevices but maintaining a simple
> QLIST in xen_device_realize/unrealize() will suffice.
> 
> NOTE: whilst config space accesses are currently limited to
>   PCI_CONFIG_SPACE_SIZE, this patch paves the way to increasing the
>   limit to PCIE_CONFIG_SPACE_SIZE when Xen gains the ability to
>   emulate MCFG table accesses.

Thanks for doing this. I'm not a QEMU maintainer but:

Reviewed-by: Roger Pau Monné <roger@citrix.com>

> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> --
> Cc: Stefano Stabellini <sstabell...@kernel.org>
> Cc: Anthony Perard <anthony.per...@citrix.com>
> Cc: "Michael S. Tsirkin" <m...@redhat.com>
> Cc: Marcel Apfelbaum <mar...@redhat.com>
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Richard Henderson <r...@twiddle.net>
> Cc: Eduardo Habkost <ehabk...@redhat.com>
> ---
>  hw/i386/xen/trace-events |   2 +
>  hw/i386/xen/xen-hvm.c| 101 
> +--
>  2 files changed, 83 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
> index 8dab7bc..f576f1b 100644
> --- a/hw/i386/xen/trace-events
> +++ b/hw/i386/xen/trace-events
> @@ -15,6 +15,8 @@ cpu_ioreq_pio(void *req, uint32_t dir, uint32_t df, 
> uint32_t data_is_ptr, uint64
>  cpu_ioreq_pio_read_reg(void *req, uint64_t data, uint64_t addr, uint32_t 
> size) "I/O=%p pio read reg data=0x%"PRIx64" port=0x%"PRIx64" size=%d"
>  cpu_ioreq_pio_write_reg(void *req, uint64_t data, uint64_t addr, uint32_t 
> size) "I/O=%p pio write reg data=0x%"PRIx64" port=0x%"PRIx64" size=%d"
>  cpu_ioreq_move(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, 
> uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p copy 
> dir=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
> +cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, 
> uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
> +cpu_ioreq_config_write(void *req, uint32_t sbdf, uint32_t reg, uint32_t 
> size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
>  
>  # xen-mapcache.c
>  xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index caa563b..c139d29 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -12,6 +12,7 @@
>  
>  #include "cpu.h"
>  #include "hw/pci/pci.h"
> +#include "hw/pci/pci_host.h"
>  #include "hw/i386/pc.h"
>  #include "hw/i386/apic-msidef.h"
>  #include "hw/xen/xen_common.h"
> @@ -86,6 +87,12 @@ typedef struct XenPhysmap {
>  QLIST_ENTRY(XenPhysmap) list;
>  } XenPhysmap;
>  
> +typedef struct XenPciDevice {
> +PCIDevice *pci_dev;
> +uint32_t sbdf;
> +QLIST_ENTRY(XenPciDevice) entry;
> +} XenPciDevice;
> +
>  typedef struct XenIOState {
>  ioservid_t ioservid;
>  shared_iopage_t *shared_page;
> @@ -105,6 +112,7 @@ typedef struct XenIOState {
>  struct xs_handle *xenstore;
>  MemoryListener memory_listener;
>  MemoryListener io_listener;
> +QLIST_HEAD(, XenPciDevice) dev_list;
>  DeviceListener device_listener;
>  QLIST_HEAD(, XenPhysmap) physmap;
>  hwaddr free_phys_offset;
> @@ -569,6 +577,12 @@ static void xen_device_realize(DeviceListener *listener,
>  
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>  PCIDevice *pci_dev = PCI_DEVICE(dev);
> +XenPciDevice *xendev = g_new(XenPciDevice, 1);
> +
> +xendev->pci_dev = pci_dev;
> +xendev->sbdf = PCI_BUILD_BDF(pci_dev_bus_num(pci_dev),
> + pci_dev->devfn);
> +QLIST_INSERT_HEAD(>dev_list, xendev, entry);
>  
>  xen_map_pcidev(xen_domid, state->ioservid, pci_dev);
>  }
> @@ -581,8 +595,17 @@ static void xen_device_unrealize(DeviceListener 
> *listener,
>  
>  if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>  PCIDevice *pci_dev = PCI_DEVICE(dev);
> +XenPciDevice *xendev, *next;
>  
>  xen_unmap_pcidev(xen_domid, state->ioservid, pci_dev);
> +
> +QLIST_FOREACH_SAFE(xendev, >dev_list, entry, next) {
>

Re: [Xen-devel] ViryaOS: proposal for a new Xen Project sub-project

2018-05-21 Thread Roger Pau Monné
This looks interesting IMO. Just one small nit.

On Thu, May 17, 2018 at 03:31:40PM -0700, Stefano Stabellini wrote:
> ## Hardware Support
> 
> ViryaOS will support as many hardware platforms as possible, x86 and ARM

It might be good to mention "x86 (amd64)", and likewise in the bullet
list below. It won't work with some of the low-power x86 dev boards
that contain the Quark CPUs which are 32-bit only.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] xen vtd : set msi guest_masked 0 by default

2018-05-21 Thread Roger Pau Monné
On Mon, May 21, 2018 at 12:46:19PM +0100, David Woodhouse wrote:
> On Tue, 2016-01-26 at 09:34 +0800, Jianzhong,Chang wrote:
> > There are some problems when msi guest_masked is set to 1 by default.
> > When guest os is windows 2008 r2 server,
> > the device(eg X540-AT2 vf) is not initialized correctly.
> > Host will always receive message like this :"VF Reset msg received from vf".
> > Guest has network connectivity issues,
> > and can not correctly receive/send the packet.
> 
> In other words "the guest doesn't get any interrupts from the NIC".
> 
> > So, guest_masked is set to 0 by default.
> 
> This seems consistent with the PCI spec, which says that "After reset,
> the state of all implemented Mask and Pending bits is 0 (no vectors are
> masked and no messages are pending)."
> 
> That's what we *used* to have in Xen, before these commits changed it
> to assume that IRQs were guest-masked by default:
> 
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=ad28e42bd1d28d746988ed71654e8aa670629753
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84d6add5593d865736831d150da7c38588f669f6
> 
> > --- a/xen/arch/x86/msi.c
> > +++ b/xen/arch/x86/msi.c
> > @@ -512,7 +512,7 @@ void guest_mask_msi_irq(struct irq_desc *desc, bool_t 
> > mask)
> >  
> >  static unsigned int startup_msi_irq(struct irq_desc *desc)
> >  {
> > -if ( unlikely(!msi_set_mask_bit(desc, 0, !!(desc->status & 
> > IRQ_GUEST))) )
> > +if ( unlikely(!msi_set_mask_bit(desc, 0, 0) ))
> >  WARN();
> >  return 0;
> >  }
> 
> In testing, this part actually seems to make the difference in
> practice. Interrupts now work, and Windows guests have connectivity
> again.
> 
> > @@ -972,7 +972,7 @@ static int msix_capability_init(struct pci_dev *dev,
> >  entry->msi_attrib.entry_nr = msi->entry_nr;
> >  entry->msi_attrib.maskbit = 1;
> >  entry->msi_attrib.host_masked = 1;
> > -entry->msi_attrib.guest_masked = 1;
> > +entry->msi_attrib.guest_masked = 0;
> >  entry->msi_attrib.pos = pos;
> >  entry->irq = msi->irq;
> >  entry->dev = dev;
> 
> That also seems correct though, since it reflects the actual state we
> intend to emulate.

Hm, I think I might have fixed this issue, see:

https://git.qemu.org/?p=qemu.git;a=commit;h=a8036336609d2e184fc3543a4c439c0ba7d7f3a2

And the Xen side:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=69d99d1b223fc5082400374698ddd7486e5ea953

The original report of the issue is at:

https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg00915.html

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()

2018-05-24 Thread Roger Pau Monné
On Wed, May 23, 2018 at 05:55:50PM +0100, Andrew Cooper wrote:
> On 23/05/18 17:39, Roger Pau Monné wrote:
> > On Tue, May 22, 2018 at 12:20:40PM +0100, Andrew Cooper wrote:
> >> Instead of having multiple algorithms searching the MSR lists, implement a
> >> single one.  It has the semantics required by vmx_add_msr(), to identify 
> >> the
> >> position in which an MSR should live, if it isn't already present.
> >>
> >> There will be a marginal improvement for vmx_find_msr() by avoiding the
> >> function pointer calls to vmx_msr_entry_key_cmp(), and a major improvement 
> >> for
> >> vmx_add_msr() by using a binary search instead of a linear search.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> >> ---
> >> CC: Jan Beulich <jbeul...@suse.com>
> >> CC: Jun Nakajima <jun.nakaj...@intel.com>
> >> CC: Kevin Tian <kevin.t...@intel.com>
> >> CC: Wei Liu <wei.l...@citrix.com>
> >> CC: Roger Pau Monné <roger@citrix.com>
> >> ---
> >>  xen/arch/x86/hvm/vmx/vmcs.c | 42 
> >> --
> >>  1 file changed, 28 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> >> index f557857..e4acdc1 100644
> >> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> >> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> >> @@ -1276,24 +1276,36 @@ static int construct_vmcs(struct vcpu *v)
> >>  return 0;
> >>  }
> >>  
> >> -static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
> >> +/*
> >> + * Search an MSR list looking for an MSR entry, or the slot in which it 
> >> should
> >> + * live (to keep the data sorted) if an entry is not found.
> >> + *
> >> + * The return pointer is guarenteed to be bounded by start and end.  
> >> However,
> >> + * it may point at end, and may be invalid for the caller to dereference.
> >> + */
> >> +static struct vmx_msr_entry *locate_msr_entry(
> >> +struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t msr)
> >>  {
> >> -const u32 *msr = key;
> >> -const struct vmx_msr_entry *entry = elt;
> >> +while ( start < end )
> >> +{
> >> +struct vmx_msr_entry *mid = start + (end - start) / 2;
> >>  
> >> -if ( *msr > entry->index )
> >> -return 1;
> >> -if ( *msr < entry->index )
> >> -return -1;
> >> +if ( msr < mid->index )
> >> +end = mid;
> >> +else if ( msr > mid->index )
> >> +start = mid + 1;
> >> +else
> >> +return mid;
> >> +}
> > This is basically an open coded version of bsearch, isn't there anyway
> > to adapt the current bsearch so that it could be used for both
> > vmx_find_msr and vmx_add_msr?
> >
> > I know there will be a performance penalty for using a function
> > pointer for the comparator function, but this looks like code
> > duplication to me.
> 
> A third use appears in a later patch.  bsearch() doesn't have the
> described property on a miss, which is necessary to maintain the lists.

I would consider adding a flag to the list of parameters so that
bsearch returned the position where the item should be added in case
of a miss. You could then wrap it inside of locate_msr_entry, or get
rid of this helper altogether.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/5] x86: split out cpuid objects and helpers

2018-05-25 Thread Roger Pau Monné
On Fri, May 25, 2018 at 11:01:18AM +0100, Andrew Cooper wrote:
> On 25/05/18 10:49, Wei Liu wrote:
> > On Fri, May 25, 2018 at 11:41:04AM +0200, Roger Pau Monné wrote:
> >> On Thu, May 24, 2018 at 05:05:19PM +0100, Wei Liu wrote:
> >>> They are moved to a new header which is going to be consumed by both
> >>> the hypervisor and toolstack.
> >>>
> >>> Create a new directory for this kind of headers in anticipation of
> >>   ^that?
> >>> more will come.
> >>>
> >>> No functional change.
> >>>
> >>> Signed-off-by: Wei Liu <wei.l...@citrix.com>
> >>> ---
> >>> Cc: Jan Beulich <jbeul...@suse.com>
> >>> Cc: Andrew Cooper <andrew.coop...@citrix.com>
> >>>
> >>> Any suggestion on the directory name?
> >>> ---
> >>>  xen/include/asm-x86/arch-shared/cpuid.h | 213 
> >>> 
> >>>  xen/include/asm-x86/cpuid.h | 210 
> >>> +--
> >> I would have placed those inside of:
> >>
> >> xen/include/public/arch-x86/cpuid.h
> >>
> >> Protected with a #if defined(__XEN__) || defined(__XEN_TOOLS__)?
> >>
> > I envision this is going to grow into libcpuid or something, just like
> > libelf.
> 
> I was actually considering making a libx86 or equivalent.  At the
> minimum, we'll need a similar split for the MSR code.
> 
> Also, making struct {cpuid,msr}_policy available to the emulation code
> will drop the code volume massively, and has been on my TODO list for ages.
> 
> > Also this isn't some sort of interface between HV and toolstack -- the
> > interface works on serialised data. Putting the internal representation
> > in public header would be wrong.
> 
> +1
> 
> We happen to want to share some code between Xen and libxc, but it
> doesn't want to be in any kind of public interface.

Maybe xen/common/libx86?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/5] x86: move definition of struct cpuid_leaf to cpuid.h

2018-05-25 Thread Roger Pau Monné
On Thu, May 24, 2018 at 05:05:18PM +0100, Wei Liu wrote:
> This is a step towards consolidating relevant data structures and
> defines to one location.
> 
> It then requires defining cpuid_leaf in user space harness headers to
> make them continue to compile.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
>  tools/tests/x86_emulator/x86-emulate.h | 6 ++
>  xen/arch/x86/x86_emulate/x86_emulate.h | 6 +-
>  xen/include/asm-x86/cpuid.h| 4 
>  3 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/tests/x86_emulator/x86-emulate.h 
> b/tools/tests/x86_emulator/x86-emulate.h
> index c5e85de3a2..8ae31a2f9a 100644
> --- a/tools/tests/x86_emulator/x86-emulate.h
> +++ b/tools/tests/x86_emulator/x86-emulate.h
> @@ -79,6 +79,12 @@ WRAP(puts);
>  
>  #undef WRAP
>  
> +#ifndef cpuid_leaf
> +struct cpuid_leaf {
> +uint32_t a, b, c, d;
> +};
> +#endif

Do you really need the ifndef? AFAICT this header will always be
included from the test harness, which has no definition of cpuid_leaf.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/5] x86: split out cpuid objects and helpers

2018-05-25 Thread Roger Pau Monné
On Thu, May 24, 2018 at 05:05:19PM +0100, Wei Liu wrote:
> They are moved to a new header which is going to be consumed by both
> the hypervisor and toolstack.
> 
> Create a new directory for this kind of headers in anticipation of
  ^that?
> more will come.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> 
> Any suggestion on the directory name?
> ---
>  xen/include/asm-x86/arch-shared/cpuid.h | 213 
> 
>  xen/include/asm-x86/cpuid.h | 210 +--

I would have placed those inside of:

xen/include/public/arch-x86/cpuid.h

Protected with a #if defined(__XEN__) || defined(__XEN_TOOLS__)?

That's how structures are generally shared between the toolstack and
the hypervisor. But then that would also require you to prefix those
structures with 'xen_', which will add more churn to the patch...

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH] ap-common: Switch to Linux 4.14 by default on X86.

2018-05-25 Thread Roger Pau Monné
On Fri, May 25, 2018 at 03:44:53PM +0100, Ian Jackson wrote:
> Linux 4.9 is getting a bit long in the tooth.  4.14 is an LTS branch
> and the osstest-tested version seems reasonably good.  I ran a special
> report[1] to see what to expect and it reported no regressions.
> 
> Accordingly I am going to switch to using Linux 4.14 by default for
> most X86 runs in osstest.  ARM tests are not affected at this time;
> they use their own linux-arm-xen branch which is updated by the Xen
> ARM maintainers.
> 
> [1] ./sg-report-flight --that-linux=6ba89b52ba6916bc7a3d390d70951e992c0ca39e 
> --this-linux=d88700f79448fc8f03617d4f1929c39676f8d1e4 
> --branches-also=linux-4.9,linux-4.14,linux-arm-xen 122974 |less
> 
> CC: Boris Ostrovsky <boris.ostrov...@oracle.com>
> CC: Juergen Gross <jgr...@suse.com>
> CC: Stefano Stabellini <sstabell...@kernel.org>
> CC: Wei Liu <wei.l...@citrix.com>
> CC: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> CC: Roger Pau Monné <roger@citrix.com>
> CC: Julien Grall <julien.gr...@linaro.org>
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

No objections:

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/9] x86/vmx: Fix handing of MSR_DEBUGCTL on VMExit

2018-05-24 Thread Roger Pau Monné
On Tue, May 22, 2018 at 12:20:42PM +0100, Andrew Cooper wrote:
> Currently, whenever the guest writes a nonzero value to MSR_DEBUGCTL, Xen
> updates a host MSR load list entry with the current hardware value of
> MSR_DEBUGCTL.  This is wrong.
> 
> On VMExit, hardware automatically resets MSR_DEBUGCTL to 0.  The only case
> where different behaviour is needed is if Xen is debugging itself, and this
> needs setting up unconditionally for the lifetime of the VM.
> 
> The `ler` command line boolean is the only way to configure any use of
> MSR_DEBUGCTL for Xen,

Hm, there's no documentation at all for the 'ler' option.

> so tie the host load list entry to this setting in
> construct_vmcs().  Any runtime update of Xen's MSR_DEBUGCTL setting requires
> more complicated synchronisation across all the running VMs.
> 
> In the exceedingly common case, this avoids the unnecessary overhead of having
> a host load entry performing the same zeroing operation that hardware has
> already performed as part of the VMExit.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Jun Nakajima <jun.nakaj...@intel.com>
> CC: Kevin Tian <kevin.t...@intel.com>
> CC: Wei Liu <wei.l...@citrix.com>
> CC: Roger Pau Monné <roger@citrix.com>
> 
> Notes for backporting: This change probably does want backporting, but depends
> on the previous patch "Support remote access to the MSR lists", and adds an
> extra rdmsr to the vcpu construction path (resolved in a later patch).
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c | 6 ++
>  xen/arch/x86/hvm/vmx/vmx.c  | 3 +--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 8bf54c4..2035a6d 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -996,6 +996,7 @@ static int construct_vmcs(struct vcpu *v)
>  struct domain *d = v->domain;
>  u32 vmexit_ctl = vmx_vmexit_control;
>  u32 vmentry_ctl = vmx_vmentry_control;
> +int rc;
>  
>  vmx_vmcs_enter(v);
>  
> @@ -1266,6 +1267,11 @@ static int construct_vmcs(struct vcpu *v)
>  if ( cpu_has_vmx_tsc_scaling )
>  __vmwrite(TSC_MULTIPLIER, d->arch.hvm_domain.tsc_scaling_ratio);
>  
> +/* If using host debugging, restore Xen's setting on vmexit. */
> +if ( this_cpu(ler_msr) &&
> + (rc = vmx_add_host_load_msr(v, MSR_IA32_DEBUGCTLMSR))  )
> +return rc;

Isn't this missing a vmx_vmcs_exit on error?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 8/9] x86/vmx: Support removing MSRs from the host/guest load/save lists

2018-05-24 Thread Roger Pau Monné
On Tue, May 22, 2018 at 12:20:45PM +0100, Andrew Cooper wrote:
> Up until this point, the MSR load/save lists have only ever accumulated
> content.  Introduce vmx_del_msr() as a companion to vmx_add_msr().
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Jun Nakajima <jun.nakaj...@intel.com>
> CC: Kevin Tian <kevin.t...@intel.com>
> CC: Wei Liu <wei.l...@citrix.com>
> CC: Roger Pau Monné <roger@citrix.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c| 68 
> ++
>  xen/include/asm-x86/hvm/vmx/vmcs.h |  1 +
>  2 files changed, 69 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 7bf19a0..e1a8f95 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1465,6 +1465,74 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t 
> val,
>  return rc;
>  }
>  
> +int vmx_del_msr(struct vcpu *v, uint32_t msr, enum vmx_msr_list_type type)
> +{
> +struct arch_vmx_struct *arch_vmx = >arch.hvm_vmx;
> +struct vmx_msr_entry *start = NULL, *ent, *end;
> +unsigned int substart, subend, total;
> +
> +ASSERT(v == current || !vcpu_runnable(v));
> +
> +switch ( type )
> +{
> +case VMX_MSR_HOST:
> +start= arch_vmx->host_msr_area;
> +substart = 0;
> +subend   = arch_vmx->host_msr_count;
> +total= subend;
> +break;
> +
> +case VMX_MSR_GUEST:
> +start= arch_vmx->msr_area;
> +substart = 0;
> +subend   = arch_vmx->msr_save_count;
> +total= arch_vmx->msr_load_count;
> +break;
> +
> +case VMX_MSR_GUEST_LOADONLY:
> +start= arch_vmx->msr_area;
> +substart = arch_vmx->msr_save_count;
> +subend   = arch_vmx->msr_load_count;
> +total= subend;
> +break;
> +
> +default:
> +ASSERT_UNREACHABLE();
> +}

The above chunk is already in vmx_find_msr and vmx_add_msr, maybe a
static helper that sets start/substart/subend/total would be helpful
here?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context

2018-05-24 Thread Roger Pau Monné
On Tue, May 22, 2018 at 12:20:46PM +0100, Andrew Cooper wrote:
> Intel hardware only uses 4 bits in MSR_EFER.  Changes to LME and LMA are
> handled automatically via the VMENTRY_CTLS.IA32E_MODE bit.
> 
> SCE is handled by ad-hoc logic in context_switch(), vmx_restore_guest_msrs()
> and vmx_update_guest_efer(), and works by altering the host SCE value to match
> the setting the guest wants.  This works because, in HVM vcpu context, Xen
> never needs to execute a SYSCALL or SYSRET instruction.
> 
> However, NXE has never been context switched.  Unlike SCE, NXE cannot be
> context switched at vcpu boundaries because disabling NXE makes PTE.NX bits
> reserved and cause a pagefault when encountered.  This means that the guest
> always has Xen's setting in effect, irrespective of the bit it can see and
> modify in its virtualised view of MSR_EFER.
> 
> This isn't a major problem for production operating systems because they, like
> Xen, always turn the NXE on when it is available.  However, it does have an
> observable effect on which guest PTE bits are valid, and whether
> PFEC_insn_fetch is visible in a #PF error code.
> 
> Second generation VT-x hardware has host and guest EFER fields in the VMCS,
> and support for loading and saving them automatically.  First generation VT-x
> hardware needs to use MSR load/save lists to cause an atomic switch of
> MSR_EFER on vmentry/exit.
> 
> Therefore we update vmx_init_vmcs_config() to find and use guest/host EFER
> support when available (and MSR load/save lists on older hardware) and drop
> all ad-hoc alteration of SCE.
> 
> There are two complications for shadow guests.  NXE, being a paging setting
> needs to remain under host control, but that is fine as it is also Xen which
> handles the pagefaults.  Also, it turns out that without EPT enabled, hardware
> won't tolerate LME and LMA being different via either the GUEST_EFER VMCS
> setting, or via the guest load list.  This doesn't matter in practice as we
> intercept all writes to CR0 and reads from MSR_EFER, so can provide
> architecturally consistent behaviour from the guests point of view.
> 
> As a result of fixing EFER context switching, we can remove the Intel-special
> case from hvm_nx_enabled() and let guest_walk_tables() work with the real
> guest paging settings.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>

LGTM:

Reviewed-by: Roger Pau Monné <roger@citrix.com>

One question below though.

> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h 
> b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index cfd174c..6c6897c 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -306,6 +306,8 @@ extern u64 vmx_ept_vpid_cap;
>  (vmx_cpu_based_exec_control & CPU_BASED_MONITOR_TRAP_FLAG)
>  #define cpu_has_vmx_pat \
>  (vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_PAT)
> +#define cpu_has_vmx_efer \
> +(vmx_vmentry_control & VM_ENTRY_LOAD_GUEST_EFER)

Don't you also need a vmx_vmexit_control & VM_EXIT_SAVE_GUEST_EFER and
vmx_vmexit_control & VM_EXIT_LOAD_HOST_EFER?

Or can the presence of those two be inferred from
VM_ENTRY_LOAD_GUEST_EFER?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 7/9] x86/vmx: Support load-only guest MSR list entries

2018-05-24 Thread Roger Pau Monné
On Tue, May 22, 2018 at 12:20:44PM +0100, Andrew Cooper wrote:
> Currently, the VMX_MSR_GUEST type maintains completely symmetric guest load
> and save lists, by pointing VM_EXIT_MSR_STORE_ADDR and VM_ENTRY_MSR_LOAD_ADDR
> at the same page, and setting VM_EXIT_MSR_STORE_COUNT and
> VM_ENTRY_MSR_LOAD_COUNT to the same value.
> 
> However, for MSRs which we won't let the guest have direct access to, having
> hardware save the current value on VMExit is unnecessary overhead.
> 
> To avoid this overhead, we must make the load and save lists asymmetric.  By
> making the entry load count greater than the exit store count, we can maintain
> two adjacent lists of MSRs, the first of which is saved and restored, and the
> second of which is only restored on VMEntry.
> 
> For simplicity:
>  * Both adjacent lists are still sorted by MSR index.
>  * It undefined behaviour to insert the same MSR into both lists.
>  * The total size of both lists is still limited at 256 entries (one 4k page).
> 
> Split the current msr_count field into msr_{load,save}_count, and introduce a
> new VMX_MSR_GUEST_LOADONLY type, and update vmx_{add,find}_msr() to calculate
> which sublist to search, based on type.  VMX_MSR_HOST has no logical sublist,
> whereas VMX_MSR_GUEST has a sublist between 0 and the save count, while
> VMX_MSR_GUEST_LOADONLY has a sublist between the save count and the load
> count.
> 
> One subtle point is that inserting an MSR into the load-save list involves
> moving the entire load-only list, and updating both counts.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Just one nit below.

> @@ -1423,8 +1446,11 @@ int vmx_add_msr(struct vcpu *v, uint32_t msr, uint64_t 
> val,
>  break;
>  
>  case VMX_MSR_GUEST:
> -__vmwrite(VM_EXIT_MSR_STORE_COUNT, ++arch_vmx->msr_count);
> -__vmwrite(VM_ENTRY_MSR_LOAD_COUNT, arch_vmx->msr_count);
> +__vmwrite(VM_EXIT_MSR_STORE_COUNT, ++arch_vmx->msr_save_count);
> +
> +/* Fallthrough */
> +case VMX_MSR_GUEST_LOADONLY:
> +__vmwrite(VM_ENTRY_MSR_LOAD_COUNT, ++arch_vmx->msr_load_count);
>  break;
>  }

Would it make sense to add something like:

ASSERT(arch_vmx->msr_save_count <= arch_vmx->msr_load_count);

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()

2018-05-24 Thread Roger Pau Monné
On Thu, May 24, 2018 at 11:59:07AM +0100, Andrew Cooper wrote:
> On 24/05/18 11:53, Roger Pau Monné wrote:
> > On Wed, May 23, 2018 at 05:55:50PM +0100, Andrew Cooper wrote:
> >> On 23/05/18 17:39, Roger Pau Monné wrote:
> >>> On Tue, May 22, 2018 at 12:20:40PM +0100, Andrew Cooper wrote:
> >>>> Instead of having multiple algorithms searching the MSR lists, implement 
> >>>> a
> >>>> single one.  It has the semantics required by vmx_add_msr(), to identify 
> >>>> the
> >>>> position in which an MSR should live, if it isn't already present.
> >>>>
> >>>> There will be a marginal improvement for vmx_find_msr() by avoiding the
> >>>> function pointer calls to vmx_msr_entry_key_cmp(), and a major 
> >>>> improvement for
> >>>> vmx_add_msr() by using a binary search instead of a linear search.
> >>>>
> >>>> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> >>>> ---
> >>>> CC: Jan Beulich <jbeul...@suse.com>
> >>>> CC: Jun Nakajima <jun.nakaj...@intel.com>
> >>>> CC: Kevin Tian <kevin.t...@intel.com>
> >>>> CC: Wei Liu <wei.l...@citrix.com>
> >>>> CC: Roger Pau Monné <roger@citrix.com>
> >>>> ---
> >>>>  xen/arch/x86/hvm/vmx/vmcs.c | 42 
> >>>> --
> >>>>  1 file changed, 28 insertions(+), 14 deletions(-)
> >>>>
> >>>> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> >>>> index f557857..e4acdc1 100644
> >>>> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> >>>> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> >>>> @@ -1276,24 +1276,36 @@ static int construct_vmcs(struct vcpu *v)
> >>>>  return 0;
> >>>>  }
> >>>>  
> >>>> -static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
> >>>> +/*
> >>>> + * Search an MSR list looking for an MSR entry, or the slot in which it 
> >>>> should
> >>>> + * live (to keep the data sorted) if an entry is not found.
> >>>> + *
> >>>> + * The return pointer is guarenteed to be bounded by start and end.  
> >>>> However,
> >>>> + * it may point at end, and may be invalid for the caller to 
> >>>> dereference.
> >>>> + */
> >>>> +static struct vmx_msr_entry *locate_msr_entry(
> >>>> +struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t 
> >>>> msr)
> >>>>  {
> >>>> -const u32 *msr = key;
> >>>> -const struct vmx_msr_entry *entry = elt;
> >>>> +while ( start < end )
> >>>> +{
> >>>> +struct vmx_msr_entry *mid = start + (end - start) / 2;
> >>>>  
> >>>> -if ( *msr > entry->index )
> >>>> -return 1;
> >>>> -if ( *msr < entry->index )
> >>>> -return -1;
> >>>> +if ( msr < mid->index )
> >>>> +end = mid;
> >>>> +else if ( msr > mid->index )
> >>>> +start = mid + 1;
> >>>> +else
> >>>> +return mid;
> >>>> +}
> >>> This is basically an open coded version of bsearch, isn't there anyway
> >>> to adapt the current bsearch so that it could be used for both
> >>> vmx_find_msr and vmx_add_msr?
> >>>
> >>> I know there will be a performance penalty for using a function
> >>> pointer for the comparator function, but this looks like code
> >>> duplication to me.
> >> A third use appears in a later patch.  bsearch() doesn't have the
> >> described property on a miss, which is necessary to maintain the lists.
> > I would consider adding a flag to the list of parameters so that
> > bsearch returned the position where the item should be added in case
> > of a miss. You could then wrap it inside of locate_msr_entry, or get
> > rid of this helper altogether.
> 
> bsearch() is specified by POSIX, and C89/99, amongst other standards. 
> Changing its API is not something I'm going to do.

Oh, didn't know that. In which case I agree. AFAICT there's no POSIX
specification for a function that could be used to add new entries
into a sorted array, so:

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/5] docs/pvh: document initial MTRR state

2018-05-15 Thread Roger Pau Monné
On Tue, May 15, 2018 at 01:51:03AM -0600, Jan Beulich wrote:
> >>> On 14.05.18 at 18:18,  wrote:
> > On Mon, May 14, 2018 at 10:13:47AM -0600, Jan Beulich wrote:
> >> >>> On 14.05.18 at 18:03,  wrote:
> >> > On Thu, May 10, 2018 at 06:15:05PM +0100, Roger Pau Monne wrote:
> >> >> --- a/docs/misc/pvh.markdown
> >> >> +++ b/docs/misc/pvh.markdown
> >> >> @@ -92,3 +92,18 @@ event channels. Delivery of those interrupts can be 
> >> >> configured in the same way
> >> >>  as HVM guests, check xen/include/public/hvm/params.h and
> >> >>  xen/include/public/hvm/hvm\_op.h for more information about available 
> >> >> delivery
> >> >>  methods.
> >> >> +
> >> >> +## MTRR ##
> >> >> +
> >> >> +### Unprivileged guests ###
> >> >> +
> >> >> +PVH guests are booted with the default MTRR type set to write-back and 
> >> >> MTRR
> >> >> +enabled. This allows DomUs to start with a sane MTRR state. Note that 
> >> >> this will
> >> >> +have to be revisited when pci-passthrough is added to PVH in order to 
> >> >> set MMIO
> >> >> +regions as UC.
> >> > 
> >> > My reading is "revisited" implies the default type will change. In fact
> >> > it shouldn't. We should clarify: for ram it will remain WB, for MMIO
> >> > holes it will be UC.
> >> 
> >> Why would changing the default late be a problem? A firmware update on
> >> bare hardware might also have such an effect. The default type read from
> >> the MSR must not change across the lifetime of a VM, but imo may change
> >> across reboots of it.
> >> 
> > 
> > Then setting a default here doesn't really help OS developers because
> > they will always need to write code to set the correct type -- not that
> > this is a big issue, but as I understand it the point here is to avoid
> > that.
> 
> Hmm, my understanding of the purpose of the series was that it aims at
> establishing some sane (i.e. reasonable for an OS to expect) state, instead
> of a firm "this will always be this way" one. Furthermore OSes generally
> shouldn't find a need to fiddle with MTRRs, provided firmware has done a
> proper job setting them up.

Indeed that's the purpose. Most OSes don't really care about the
details of the MTRR setup, and they just expect RAM regions to be set
to WB and MMIO holes to UC AFAICT.

I don't think Xen has to provide any guarantee about the details of
the MTRR state, apart from stating that RAM will be WB and MMIO UC.

I can leave the text as-is, or add the paragraph suggested in another
email to clarify if the current writing is prone to misunderstanding.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/5] hvm/mtrr: copy hardware state for Dom0

2018-05-15 Thread Roger Pau Monné
On Tue, May 15, 2018 at 01:52:43AM -0600, Jan Beulich wrote:
> >>> On 14.05.18 at 18:33, <roger@citrix.com> wrote:
> > On Mon, May 14, 2018 at 08:26:30AM -0600, Jan Beulich wrote:
> >> >>> On 10.05.18 at 19:15, <roger@citrix.com> wrote:
> >> > Copy the state found on the hardware when creating a PVH Dom0. Since
> >> > the memory map provided to a PVH Dom0 is based on the native one using
> >> > the same set of MTRR ranges should provide Dom0 with a sane MTRR state
> >> > without having to manually build it in Xen.
> >> > 
> >> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> >> > ---
> >> > Cc: Jan Beulich <jbeul...@suse.com>
> >> > Cc: Andrew Cooper <andrew.coop...@citrix.com>
> >> > ---
> >> >  xen/arch/x86/hvm/mtrr.c | 23 +++
> >> >  1 file changed, 23 insertions(+)
> >> > 
> >> > diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
> >> > index 95a3deabea..1cb000388a 100644
> >> > --- a/xen/arch/x86/hvm/mtrr.c
> >> > +++ b/xen/arch/x86/hvm/mtrr.c
> >> > @@ -176,6 +176,29 @@ int hvm_vcpu_cacheattr_init(struct vcpu *v)
> >> >  ((uint64_t)PAT_TYPE_UC_MINUS << 48) |   /* PAT6: UC- */
> >> >  ((uint64_t)PAT_TYPE_UNCACHABLE << 56);  /* PAT7: UC */
> >> >  
> >> > +if ( is_hardware_domain(v->domain) )
> >> > +{
> >> > +/* Copy values from the host. */
> >> > +struct domain *d = v->domain;
> >> > +unsigned int i;
> >> > +
> >> > +if ( mtrr_state.have_fixed )
> >> > +for ( i = 0; i < NUM_FIXED_MSR; i++ )
> >> > +mtrr_fix_range_msr_set(d, m, i,
> >> > +  ((uint64_t 
> >> > *)mtrr_state.fixed_ranges)[i]);
> >> 
> >> The presence/absence of fixed range MTRRs needs to be reflected in the
> >> capabilities MSR. Strictly speaking in their absence MSR access attempts to
> >> the fixed range MSRs should also cause #GP, as should any attempt to
> >> enable them in defType.
> > 
> > My intention was to always provide the fixed range MTRR capability,
> > regardless of whether the underlying hardware has it or not. Then of
> > course fixed ranges won't be enabled by default in the deftype MSR if
> > the underlying hardware also hasn't got them enabled.
> 
> What would the result be of the OS writing to any of these MSRs, or
> setting the respective enable bit?

Likely the cache attributes for the guest will change if it sets some
fixed ranges and enables the FE bit. But I'm not sure why is that a
problem.

If a guest plays with the MTRR ranges it must know what it's doing
anyway, and the fact that the underlying hardware has fixed range
support or not doesn't affect the changes that the guest might be
trying to perform to it's virtual MTRR registers/state.

Likewise the guest could change the variable ranges and mess up with
the types, but that's just going to affect the guest cache attributes,
not the host (Xen) ones.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/4] libxc: Drop declarations of osdep_privcmd_open and _close

2018-05-15 Thread Roger Pau Monné
On Mon, May 14, 2018 at 06:08:56PM +0100, Ian Jackson wrote:
> These functions are no longer defined or used anywhere.  The
> declarations should have been deleted when the definitions were.
> 
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/9] x86/vmx: Internal cleanup for MSR load/save infrastructure

2018-05-23 Thread Roger Pau Monné
On Tue, May 22, 2018 at 12:20:39PM +0100, Andrew Cooper wrote:
>  * Use an arch_vmx_struct local variable to reduce later code volume.
>  * Use start/total instead of msr_area/msr_count.  This is in preparation for
>more finegrained handling with later changes.
>  * Use ent/end pointers (again for preparation), and to make the vmx_add_msr()
>logic easier to follow.
>  * Make the memory allocation block of vmx_add_msr() unlikely, and calculate
>virt_to_maddr() just once.
> 
> No practical change to functionality.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Jun Nakajima <jun.nakaj...@intel.com>
> CC: Kevin Tian <kevin.t...@intel.com>
> CC: Wei Liu <wei.l...@citrix.com>
> CC: Roger Pau Monné <roger@citrix.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c | 74 
> -
>  1 file changed, 40 insertions(+), 34 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index a5dcf5c..f557857 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1292,48 +1292,50 @@ static int vmx_msr_entry_key_cmp(const void *key, 
> const void *elt)
>  struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type type)
>  {
>  struct vcpu *curr = current;
> -unsigned int msr_count;
> -struct vmx_msr_entry *msr_area = NULL;
> +struct arch_vmx_struct *arch_vmx = >arch.hvm_vmx;

curr is used here only, so you can use current and get rid of the curr
local variable?

> +struct vmx_msr_entry *start = NULL;
> +unsigned int total;
>  
>  switch ( type )
>  {
>  case VMX_MSR_HOST:
> -msr_count = curr->arch.hvm_vmx.host_msr_count;
> -msr_area = curr->arch.hvm_vmx.host_msr_area;
> +start= arch_vmx->host_msr_area;
> +total= arch_vmx->host_msr_count;
>  break;
>  
>  case VMX_MSR_GUEST:
> -msr_count = curr->arch.hvm_vmx.msr_count;
> -msr_area = curr->arch.hvm_vmx.msr_area;
> +start= arch_vmx->msr_area;
> +total= arch_vmx->msr_count;

Not that I think is wrong, but why are you adding the extra spaces
after the variable name? Those assignments will already be aligned
because start and total names have the same length.

>  break;
>  
>  default:
>  ASSERT_UNREACHABLE();
>  }
>  
> -if ( msr_area == NULL )
> +if ( !start )
>  return NULL;
>  
> -return bsearch(, msr_area, msr_count, sizeof(struct vmx_msr_entry),
> +return bsearch(, start, total, sizeof(struct vmx_msr_entry),
> vmx_msr_entry_key_cmp);
>  }
>  
>  int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type)
>  {
>  struct vcpu *curr = current;

curr seems to be used only once below in order to get hvm_vmx? In
which case it could be removed.

> -unsigned int idx, *msr_count;
> -struct vmx_msr_entry **msr_area, *msr_area_elem;
> +struct arch_vmx_struct *arch_vmx = >arch.hvm_vmx;
> +struct vmx_msr_entry **ptr, *start = NULL, *ent, *end;

Do you really need to initialize start here? It seems like it's
unconditionally set to *ptr before any usage.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/9] x86/vmx: API improvements for MSR load/save infrastructure

2018-05-23 Thread Roger Pau Monné
On Tue, May 22, 2018 at 12:20:38PM +0100, Andrew Cooper wrote:
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h 
> b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 06c3179..c8a1f89 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -514,9 +514,6 @@ enum vmcs_field {
>  
>  #define VMCS_VPID_WIDTH 16
>  
> -#define VMX_GUEST_MSR 0
> -#define VMX_HOST_MSR  1
> -
>  /* VM Instruction error numbers */
>  enum vmx_insn_errno
>  {
> @@ -534,6 +531,54 @@ enum vmx_insn_errno
>  VMX_INSN_FAIL_INVALID  = ~0,
>  };
>  
> +/* MSR load/save list infrastructure. */
> +enum vmx_msr_list_type {
> +VMX_MSR_HOST,
> +VMX_MSR_GUEST,
> +};
> +
> +int vmx_add_msr(uint32_t msr, enum vmx_msr_list_type type);
> +
> +static inline int vmx_add_host_load_msr(uint32_t msr)
> +{
> +return vmx_add_msr(msr, VMX_MSR_HOST);
> +}
> +
> +static inline int vmx_add_guest_msr(uint32_t msr)
> +{
> +return vmx_add_msr(msr, VMX_MSR_GUEST);
> +}
> +
> +struct vmx_msr_entry *vmx_find_msr(uint32_t msr, enum vmx_msr_list_type 
> type);
> +
> +static inline int vmx_read_guest_msr(uint32_t msr, uint64_t *val)
> +{
> +struct vmx_msr_entry *ent;

const

Also I would probably do:

{
const struct vmx_msr_entry *ent = vmx_find_msr(msr, VMX_MSR_GUEST);

if ( !ent )
return -ESRCH;

*val = ent->data;
return 0;
}

With the const:

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/9] x86/vmx: Factor locate_msr_entry() out of vmx_find_msr() and vmx_add_msr()

2018-05-23 Thread Roger Pau Monné
On Tue, May 22, 2018 at 12:20:40PM +0100, Andrew Cooper wrote:
> Instead of having multiple algorithms searching the MSR lists, implement a
> single one.  It has the semantics required by vmx_add_msr(), to identify the
> position in which an MSR should live, if it isn't already present.
> 
> There will be a marginal improvement for vmx_find_msr() by avoiding the
> function pointer calls to vmx_msr_entry_key_cmp(), and a major improvement for
> vmx_add_msr() by using a binary search instead of a linear search.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Jun Nakajima <jun.nakaj...@intel.com>
> CC: Kevin Tian <kevin.t...@intel.com>
> CC: Wei Liu <wei.l...@citrix.com>
> CC: Roger Pau Monné <roger@citrix.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c | 42 --
>  1 file changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index f557857..e4acdc1 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -1276,24 +1276,36 @@ static int construct_vmcs(struct vcpu *v)
>  return 0;
>  }
>  
> -static int vmx_msr_entry_key_cmp(const void *key, const void *elt)
> +/*
> + * Search an MSR list looking for an MSR entry, or the slot in which it 
> should
> + * live (to keep the data sorted) if an entry is not found.
> + *
> + * The return pointer is guarenteed to be bounded by start and end.  However,
> + * it may point at end, and may be invalid for the caller to dereference.
> + */
> +static struct vmx_msr_entry *locate_msr_entry(
> +struct vmx_msr_entry *start, struct vmx_msr_entry *end, uint32_t msr)
>  {
> -const u32 *msr = key;
> -const struct vmx_msr_entry *entry = elt;
> +while ( start < end )
> +{
> +struct vmx_msr_entry *mid = start + (end - start) / 2;
>  
> -if ( *msr > entry->index )
> -return 1;
> -if ( *msr < entry->index )
> -return -1;
> +if ( msr < mid->index )
> +end = mid;
> +else if ( msr > mid->index )
> +start = mid + 1;
> +else
> +return mid;
> +}

This is basically an open coded version of bsearch, isn't there anyway
to adapt the current bsearch so that it could be used for both
vmx_find_msr and vmx_add_msr?

I know there will be a performance penalty for using a function
pointer for the comparator function, but this looks like code
duplication to me.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/6] x86/vmx: Defer vmx_vmcs_exit() as long as possible in construct_vmcs()

2018-06-06 Thread Roger Pau Monné
On Mon, May 28, 2018 at 03:27:57PM +0100, Andrew Cooper wrote:
> paging_update_paging_modes() and vmx_vlapic_msr_changed() both operate on the
> VMCS being constructed.  Avoid dropping and re-acquiring the reference
> multiple times.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Jun Nakajima 
> CC: Kevin Tian 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index be02be1..ce78f19 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -996,6 +996,7 @@ static int construct_vmcs(struct vcpu *v)
>  struct domain *d = v->domain;
>  u32 vmexit_ctl = vmx_vmexit_control;
>  u32 vmentry_ctl = vmx_vmentry_control;
> +int rc;
>  
>  vmx_vmcs_enter(v);
>  
> @@ -1083,8 +1084,8 @@ static int construct_vmcs(struct vcpu *v)
>  
>  if ( msr_bitmap == NULL )
>  {
> -vmx_vmcs_exit(v);
> -return -ENOMEM;
> +rc = -ENOMEM;
> +goto out;
>  }
>  
>  memset(msr_bitmap, ~0, PAGE_SIZE);
> @@ -1258,13 +1259,14 @@ static int construct_vmcs(struct vcpu *v)
>  if ( cpu_has_vmx_tsc_scaling )
>  __vmwrite(TSC_MULTIPLIER, d->arch.hvm_domain.tsc_scaling_ratio);
>  
> -vmx_vmcs_exit(v);
> -
>  /* will update HOST & GUEST_CR3 as reqd */
>  paging_update_paging_modes(v);
>  
>  vmx_vlapic_msr_changed(v);
>  
> + out:
> +vmx_vmcs_exit(v);
> +
>  return 0;

Shouldn't you return rc here? Or else you lose the error value.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 4/6] x86/vmx: Simplify PAT handling during vcpu construction

2018-06-06 Thread Roger Pau Monné
On Mon, May 28, 2018 at 03:27:56PM +0100, Andrew Cooper wrote:
> The host PAT value is a compile time constant, and doesn't need to be read out
> of hardware.  Merge this if block into the previous block, which has an
> identical condition.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 03/11] x86: Initialise debug registers correctly

2018-06-06 Thread Roger Pau Monné
On Mon, Jun 04, 2018 at 02:59:07PM +0100, Andrew Cooper wrote:
> In particular, initialising %dr6 with the value 0 is buggy, because on
> hardware supporting Transnational Memory, it will cause the sticky RTM bit to
> be asserted, even though a debug event from a transaction hasn't actually been
> observed.
> 
> Introduce X86_DR7_DEFAULT to match the existing X86_DR6_DEFAULT, and use
> correct defaults when resetting the debug registers in cpu_init().
> 
> For vcpus, %dr6/7 have never been initialised.  In practice, this means that
> toolstack get/set operations see zeros until the vcpu has first touched its
> debug registers (at which point hardware fixes up the reserved bits), and the
> RTM corner case will persist beyond that point.
> 
> Introduce initialise_registers() to set register defaults (including eflags
> while we are fixing this) and call it early in vcpu_initialise().  Make a
> similar adjustment in hvm_vcpu_reset_state().
> 
> Finally, adjust the vcpu state initialising logic in libxc.  All 3 sites zero
> memory before choosing the nonzero defaults, which propagates the RTM corner
> case.
> 
> Signed-off-by: Andrew Cooper 

LGTM:

Reviewed-by: Roger Pau Monné 

Just a couple of questions below.

> ---
> CC: Jan Beulich 
> CC: Ian Jackson 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> ---
>  tools/libxc/xc_dom_x86.c   | 12 
>  xen/arch/x86/cpu/common.c  | 12 
>  xen/arch/x86/domain.c  | 10 ++
>  xen/arch/x86/hvm/hvm.c |  6 +-
>  xen/include/asm-x86/debugreg.h |  2 ++
>  5 files changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
> index e33a288..3ab918c 100644
> --- a/tools/libxc/xc_dom_x86.c
> +++ b/tools/libxc/xc_dom_x86.c
> @@ -53,6 +53,9 @@
>  #define X86_CR0_PE 0x01
>  #define X86_CR0_ET 0x10
>  
> +#define X86_DR6_DEFAULT 0x0ff0u
> +#define X86_DR7_DEFAULT 0x0400u
> +
>  #define SPECIALPAGE_PAGING   0
>  #define SPECIALPAGE_ACCESS   1
>  #define SPECIALPAGE_SHARING  2
> @@ -860,6 +863,9 @@ static int vcpu_x86_32(struct xc_dom_image *dom)
>  dom->parms.virt_base + (dom->start_info_pfn) * PAGE_SIZE_X86;
>  ctxt->user_regs.eflags = 1 << 9; /* Interrupt Enable */
>  
> +ctxt->debugreg[6] = X86_DR6_DEFAULT;
> +ctxt->debugreg[7] = X86_DR7_DEFAULT;
> +
>  ctxt->flags = VGCF_in_kernel_X86_32 | VGCF_online_X86_32;
>  if ( dom->parms.pae == XEN_PAE_EXTCR3 ||
>   dom->parms.pae == XEN_PAE_BIMODAL )
> @@ -907,6 +913,9 @@ static int vcpu_x86_64(struct xc_dom_image *dom)
>  dom->parms.virt_base + (dom->start_info_pfn) * PAGE_SIZE_X86;
>  ctxt->user_regs.rflags = 1 << 9; /* Interrupt Enable */
>  
> +ctxt->debugreg[6] = X86_DR6_DEFAULT;
> +ctxt->debugreg[7] = X86_DR7_DEFAULT;
> +
>  ctxt->flags = VGCF_in_kernel_X86_64 | VGCF_online_X86_64;
>  cr3_pfn = xc_dom_p2m(dom, dom->pgtables_seg.pfn);
>  ctxt->ctrlreg[3] = xen_pfn_to_cr3_x86_64(cr3_pfn);
> @@ -1011,6 +1020,9 @@ static int vcpu_hvm(struct xc_dom_image *dom)
>  /* Set the IP. */
>  bsp_ctx.cpu.rip = dom->parms.phys_entry;
>  
> +bsp_ctx.cpu.dr6 = X86_DR6_DEFAULT;
> +bsp_ctx.cpu.dr7 = X86_DR7_DEFAULT;
> +
>  if ( dom->start_info_seg.pfn )
>  bsp_ctx.cpu.rbx = dom->start_info_seg.pfn << PAGE_SHIFT;
>  
> diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
> index 528aff1..0872466 100644
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -3,6 +3,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -823,10 +824,13 @@ void cpu_init(void)
>   /* Ensure FPU gets initialised for each domain. */
>   stts();
>  
> - /* Clear all 6 debug registers: */
> -#define CD(register) asm volatile ( "mov %0,%%db" #register : : "r"(0UL) );
> - CD(0); CD(1); CD(2); CD(3); /* no db4 and db5 */; CD(6); CD(7);
> -#undef CD
> + /* Reset debug registers: */
> + write_debugreg(0, 0);
> + write_debugreg(1, 0);
> + write_debugreg(2, 0);
> + write_debugreg(3, 0);
> + write_debugreg(6, X86_DR6_DEFAULT);
> + write_debugreg(7, X86_DR7_DEFAULT);

AFAICT you are writing the default init values, which should be there
already. So this is just a safebelt in case the CPU is initialized
with some bogus DR values?

>  }
>  
>  void cpu_uninit(unsigned int cpu)
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 0ca820a..7ae9789 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -32

Re: [Xen-devel] [PATCH 6/6] x86/vmx: Drop VMX signal for full real-mode

2018-06-06 Thread Roger Pau Monné
On Mon, May 28, 2018 at 03:27:58PM +0100, Andrew Cooper wrote:
> The hvmloader code which used this signal was deleted 10 years ago (c/s
> 50b12df83 "x86 vmx: Remove vmxassist").  Furthermore, the value gets discarded
> anyway because the HVM domain builder unconditionally sets %rax to 0 in the
> same action it uses to set %rip to the appropriate entrypoint.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/6] x86/pat: Simplify host PAT handling

2018-06-06 Thread Roger Pau Monné
On Mon, May 28, 2018 at 03:27:55PM +0100, Andrew Cooper wrote:
> With the removal of the 32bit hypervisor build, host_pat is a constant value.
> Drop the variable and the redundant cpu_has_pat predicate, and use a define
> instead.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

> diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
> index 9924cdf..ac1577c 100644
> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -97,6 +97,12 @@
>X86_EFLAGS_NT|X86_EFLAGS_DF|X86_EFLAGS_IF|\
>X86_EFLAGS_TF)
>  
> +/*
> + * Host IA32_CR_PAT value to cover all memory types.  This is not the default
> + * MSR_PAT value, and is an ABI with PV guests.
> + */
> +#define XEN_MSR_PAT 0x050100070406ul

Not sure whether it would make sense to use MASK_INSR and define each
page attribute field in order to create this value.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 02/11] x86/vmx: Don't clobber %dr6 while debugging state is lazy

2018-06-06 Thread Roger Pau Monné
On Mon, Jun 04, 2018 at 02:59:06PM +0100, Andrew Cooper wrote:
> c/s 4f36452b63 introduced a write to %dr6 in the #DB intercept case, but the
> guests debug registers may be lazy at this point, at which point the guests
> later attempt to read %dr6 will discard this value and use the older stale
> value.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

> ---
> CC: Jan Beulich 
> CC: Wei Liu 
> CC: Roger Pau Monné 
> CC: Jun Nakajima 
> CC: Kevin Tian 
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 33d39f6..8dbe838 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -3696,6 +3696,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>   */
>  __vmread(EXIT_QUALIFICATION, _qualification);
>  HVMTRACE_1D(TRAP_DEBUG, exit_qualification);
> +__restore_debug_registers(v);

If I understood this correctly, you only call
__restore_debug_registers in order to make sure the DR registers are
marked as dirty?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] xen: share start flags between PV and PVH

2018-06-08 Thread Roger Pau Monné
On Fri, Jun 08, 2018 at 09:12:25AM -0700, Stefano Stabellini wrote:
> On Fri, 8 Jun 2018, Roger Pau Monne wrote:
> > Use a global variable to store the start flags for both PV and PVH.
> > This allows the xen_initial_domain macro to work properly on PVH.
> > 
> > Note that ARM is also switched to use the new variable.
> > 
> > Signed-off-by: Boris Ostrovsky 
> > Signed-off-by: Roger Pau Monné 
> 
> As I already mentioned, the ARM part is OK. However, is the issue that
> xen_start_info is not available on PVH? We had the same problem on ARM
> and solved it by faking a xen_start_info page, see the top of
> arch/arm/xen/enlighten.c.
> 
> I would love to get rid of that, but to do that, we also need to remove
> the xen_start_info referece at drivers/tty/hvc/hvc_xen.c:255:
> 
>   if (!xen_start_info->console.domU.evtchn)
>   return -ENODEV;

We could likely guard xen_pv_console_init (and other PV related
console functions) with CONFIG_PV. HVM/PVH use xen_hvm_console_init.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] multiboot2: clarify usage of the address tag

2018-06-08 Thread Roger Pau Monné
On Fri, Jun 08, 2018 at 11:35:52AM +0200, Daniel Kiper wrote:
> On Thu, Jun 07, 2018 at 05:59:06PM +0200, Roger Pau Monne wrote:
> > Add a note to spell out that if the address tag is not present the
> > file should be loaded using the elf header.
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Cc: Daniel Kiper 
> > Cc: xen-devel@lists.xenproject.org
> > ---
> > Changes since v1:
> >  - s/elf/@sc{elf}/
> >  - s/Multiboot/Multiboot2/
> > ---
> >  doc/multiboot.texi | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/doc/multiboot.texi b/doc/multiboot.texi
> > index 2e2d7e74a..3c797787c 100644
> > --- a/doc/multiboot.texi
> > +++ b/doc/multiboot.texi
> > @@ -509,6 +509,12 @@ assumes that no bss segment is present.
> >
> >  @end table
> >
> > +Note: This information does not need to be provided if the kernel image
> > +is in @sc{elf} format, but it must be provided if the image is in a.out
> > +format or in some other format. Compliant boot loaders must be able to
> > +load images that are either in @sc{elf} format or contain the address
> > +tag embedded in the Multiboot2 header.
> > +
> 
> Now it is better. However, there is a lack of information that this tag
> should be preferred over the relevant data provided in the ELF header if
> both are available in the image. This have to be clear like it is in
> Multiboot spec.

Would you be OK with adding the following sentence at the end:

"When the address tag is present it must be used in order to load the
image, regardless of whether an @sc{elf} header is also present."

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] multiboot2: clarify usage of the address tag

2018-06-07 Thread Roger Pau Monné
On Wed, Jun 06, 2018 at 07:28:20PM +0200, Daniel Kiper wrote:
> On Tue, Jun 05, 2018 at 11:55:36AM +0200, Roger Pau Monne wrote:
> > Add a note to spell out that if the address tag is not present the
> > file should be loaded using the elf header.
> >
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Cc: Daniel Kiper 
> > Cc: xen-devel@lists.xenproject.org
> > ---
> >  doc/multiboot.texi | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/doc/multiboot.texi b/doc/multiboot.texi
> > index 2e2d7e74a..196f9c17a 100644
> > --- a/doc/multiboot.texi
> > +++ b/doc/multiboot.texi
> > @@ -509,6 +509,12 @@ assumes that no bss segment is present.
> >
> >  @end table
> >
> > +Note: This information does not need to be provided if the kernel
> > +image is in elf format, but it must be provided if the image is in
> 
> s/elf/@sc{elf}/
> 
> > +a.out format or in some other format. Compliant boot loaders must be
> > +able to load images that are either in elf format or contain the
> 
> Ditto.
> 
> > +address tag embedded in the Multiboot header.
> 
> s/Multiboot/Multiboot2/
> 
> I think that it is also worth mentioning that the address tag has
> preference over relevant data provided in ELF header.
> 
> Additionally, may I ask you to provide similar patch for Multiboot spec?
> You can find it in multiboot branch. Please look for "The address fields
> of Multiboot header" paragraph.

Multiboot1 already has such paragraph in the "3.1.2 The magic fields
of Multiboot header" section.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] Build issues with newer iasl (was: Re: Status of comet-4.10 branch)

2018-06-16 Thread Roger Pau Monné
On Fri, Jun 15, 2018 at 07:26:14PM +0100, Michael Young wrote:
> context (this hasn't reached Fedora yet as the build is broken, I think due
> to updates to Fedora's acpica-tools package which provides iasl).

I've recently experienced the same on FreeBSD, you might want to try:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=858dbaaeda33b05c1ac80aea0ba9a03924e09005

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation

2018-06-14 Thread Roger Pau Monné
Please try to avoid top posting.

On Wed, Jun 13, 2018 at 10:20:48PM +, Anchal Agarwal wrote:
> Hi Roger,
> To answer your question, due to the lack of mentioned commit
> (commit 12ea729645ac ("xen/blkback: unmap all persistent grants when
> frontend gets disconnected") in the older dom0 kernels(<3.2),resume from

This fix that you mention is only present in kernels >= 3.18 AFAICT,
and persistent grants where introduced in 3.8 (0a8704a51f38), so
anything < 3.8 should work fine. Not sure why you mention 3.2 here.

> hibernation can fail on guest side. In the absence of the commit,
> Persistant Grants are not unmapped immediately when frontend is 
> disconnected from backend and hence leave the block device in an 
> inconsistent state. To avoid this unstability and work with larger set 
> of kernel versions, this approach had been used. Once you don't have 
> any pending req/resp it is safer for guest to resume from hibernation.

I think the fix should be backported (if it hasn't been done yet) to
kernels between 3.8 and 3.18. I don't like to add all this code just
to work around a Linux backend kernel bug.

AFAICT if persistent grants work as expected you could use almost the
same path that's used for migration, greatly reducing the amount of
code that you need to add.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation

2018-06-13 Thread Roger Pau Monné
On Tue, Jun 12, 2018 at 08:56:13PM +, Anchal Agarwal wrote:
> From: Munehisa Kamata 
> 
> Add freeze and restore callbacks for PM suspend and hibernation support.
> The freeze handler stops a block-layer queue and disconnect the frontend
> from the backend while freeing ring_info and associated resources. The
> restore handler re-allocates ring_info and re-connect to the backedend,
> so the rest of the kernel can continue to use the block device
> transparently.Also, the handlers are used for both PM
> suspend and hibernation so that we can keep the existing suspend/resume
> callbacks for Xen suspend without modification.
> If a backend doesn't have commit 12ea729645ac ("xen/blkback: unmap all
> persistent grants when frontend gets disconnected"), the frontend may see
> massive amount of grant table warning when freeing resources.
> 
>  [   36.852659] deferring g.e. 0xf9 (pfn 0x)
>  [   36.855089] xen:grant_table: WARNING: g.e. 0x112 still in use!
> 
> In this case, persistent grants would need to be disabled.
> 
> Ensure no reqs/rsps in rings before disconnecting. When disconnecting
> the frontend from the backend in blkfront_freeze(), there still may be
> unconsumed requests or responses in the rings, especially when the
> backend is backed by network-based device. If the frontend gets
> disconnected with such reqs/rsps remaining there, it can cause
> grant warnings and/or losing reqs/rsps by freeing pages afterward.

I'm not sure why having pending requests can cause grant warnings or
lose of requests. If handled properly this shouldn't be an issue.
Linux blkfront already does live migration (which also involves a
reconnection of the frontend) with pending requests and that doesn't
seem to be an issue.

> This can lead resumed kernel into unrecoverable state like unexpected
> freeing of grant page and/or hung task due to the lost reqs or rsps.
> Therefore we have to ensure that there is no unconsumed requests or
> responses before disconnecting.

Given that we have multiqueue, plus multipage rings, I'm not sure
waiting for the requests on the rings to complete is a good idea.

Why can't you just disconnect the frontend and requeue all the
requests in flight? When the frontend connects on resume those
requests will be queued again.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] VT-d: re-phrase logic in vtd_set_hwdom_mapping() for clarity

2018-06-13 Thread Roger Pau Monné
On Tue, Jun 12, 2018 at 02:47:33PM +0100, Paul Durrant wrote:
> It is hard to reconcile the comment at the top of the loop in
> vtd_set_hwdom_mapping() with the if statement following it. This patch
> re-phrases the logic, preserving the semantics, but making it easier
> to read.
> 
> The patch also modifies the Xen command line documentation to make it
> clear that iommu_inclusive_mapping only applies to pages up to the 4GB
> boundary.
> 
> NOTE: This patch also corrects the indentation of the printk() towards
>   the end of vtd_set_hwdom_mapping().
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/2] VT-d: reconcile iommu_inclusive_mapping and iommu=dom0-strict

2018-06-13 Thread Roger Pau Monné
On Tue, Jun 12, 2018 at 02:47:34PM +0100, Paul Durrant wrote:
> The documentation for the iommu_inclusive_mapping Xen command line option
> states:
> 
> "Use this to work around firmware issues providing incorrect RMRR entries"
> 
> Unfortunately this workaround does not function correctly if the dom0-strict
> iommu option is also specified.
> 
> The documentation goes on to say:
> 
> "Rather than only mapping RAM pages for IOMMU accesses for Dom0, with this
>  option all pages up to 4GB, not marked as unusable in the E820 table, will
>  get a mapping established."
> 
> This patch modifies the VT-d hardware domain initialization code such that
> the workaround will continue to function in dom0-strict mode, by mapping
> all pages not marked as unusable *unless* they are RAM pages not assigned
> to dom0.
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [OSSTEST PATCH] cr-daily-branch, cr-publish-flight-logs: Tolerate failure to push harness

2018-06-13 Thread Roger Pau Monné
On Wed, Jun 13, 2018 at 02:51:56PM +0100, Ian Jackson wrote:
> Provide cr-publish-flight-logs --push-harness-try, which attempts the
> push but doesn't mind if it fails.
> 
> If we are not --real, tolerate failure to publish the flight logs.
> 
> Also, honour OSSTEST_PUSH_HARNESS which might contain '' or
> --push-harness or --push-harness-try.
> 
> CC: Roger Pau Monné 
> Signed-off-by: Ian Jackson 

Just tested this by manually killing a cr-daily-branch and it worked
properly, see:

http://logs.test-lab.xenproject.org/osstest/logs/124171/

So:

Tested-by: Roger Pau Monné 

Thanks!

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [osstest] add FreeBSD flight

2018-06-13 Thread Roger Pau Monné
Hello,

I've run a test flight of my FreeBSD osstest series today, the flight
shows all green:

http://logs.test-lab.xenproject.org/osstest/logs/124163/

The series can be found at:

git://xenbits.xen.org/people/royger/osstest.git freebsd_v18

AFAICT it's fully Acked. I've rebased it on top of current osstest
master branch. I think it's ready to go in unless there are
objections.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/8] libxc: Provide access to internal handles

2018-06-11 Thread Roger Pau Monné
On Mon, Jun 11, 2018 at 03:13:18PM +0100, Ian Jackson wrote:
> In order to support auditing of qemu depriv, my audit tool wants to
> know the fd of a privcmd handle on which it can easily make
> hypercalls.  xencall provides such a handle, but has no cooked
> facilities for making hypercalls.  So I open a libxc handle.  That
> means I need to get the privcmd fd out of the libxc handle.
> 
> ISTM that it is best to do this by providing an interface to get the
> underlying library handles for a libxc handle.  This kind of interface
> is quite common elsewhere and has not caused problems.
> 
> libxc is not a stable API so the downside risk of providing this
> access is not significant.
> 
> Signed-off-by: Ian Jackson 
> Acked-by: Wei Liu 

Reviewed-by: Roger Pau Monné 

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/8] tools/tests/depriv: Install depriv-fd-checker in our private libexec directory

2018-06-11 Thread Roger Pau Monné
On Mon, Jun 11, 2018 at 03:13:23PM +0100, Ian Jackson wrote:
> osstest is going to want to call it, and should not be expected to
> fish it out of the build tree.
> 
> Signed-off-by: Ian Jackson 

Reviewed-by: Roger Pau Monné 

Thanks

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 3/8] tools: xencall, xengnttab, xengntshr: Provide access to internal fds

2018-06-11 Thread Roger Pau Monné
On Mon, Jun 11, 2018 at 03:13:19PM +0100, Ian Jackson wrote:
> I want this to support my qemu depriv descriptor audit tool.
> 
> Signed-off-by: Ian Jackson 
> CC: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

> diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
> index 2f96144..c6a0181 100644
> --- a/tools/libs/call/libxencall.map
> +++ b/tools/libs/call/libxencall.map
> @@ -17,3 +17,9 @@ VERS_1.0 {
>   xencall_free_buffer_pages;
>   local: *; /* Do not expose anything by default */
>  };
> +
> +VERS_1.1 {
> + global:
> + xencall_fd;
> + 

^ Extra newline?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 6/8] tools/tests: Allow a test subdir to have `install' and `uninstall' targets

2018-06-11 Thread Roger Pau Monné
On Mon, Jun 11, 2018 at 03:13:22PM +0100, Ian Jackson wrote:
> Signed-off-by: Ian Jackson 

Reviewed-by: Roger Pau Monné 

Thanks

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] multiboot2: clarify usage of the address tag

2018-06-11 Thread Roger Pau Monné
On Fri, Jun 08, 2018 at 12:39:28PM +0200, Daniel Kiper wrote:
> On Fri, Jun 08, 2018 at 12:08:22PM +0200, Roger Pau Monné wrote:
> > On Fri, Jun 08, 2018 at 11:35:52AM +0200, Daniel Kiper wrote:
> > > On Thu, Jun 07, 2018 at 05:59:06PM +0200, Roger Pau Monne wrote:
> > > > Add a note to spell out that if the address tag is not present the
> > > > file should be loaded using the elf header.
> > > >
> > > > Signed-off-by: Roger Pau Monné 
> > > > ---
> > > > Cc: Daniel Kiper 
> > > > Cc: xen-devel@lists.xenproject.org
> > > > ---
> > > > Changes since v1:
> > > >  - s/elf/@sc{elf}/
> > > >  - s/Multiboot/Multiboot2/
> > > > ---
> > > >  doc/multiboot.texi | 6 ++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/doc/multiboot.texi b/doc/multiboot.texi
> > > > index 2e2d7e74a..3c797787c 100644
> > > > --- a/doc/multiboot.texi
> > > > +++ b/doc/multiboot.texi
> > > > @@ -509,6 +509,12 @@ assumes that no bss segment is present.
> > > >
> > > >  @end table
> > > >
> > > > +Note: This information does not need to be provided if the kernel image
> > > > +is in @sc{elf} format, but it must be provided if the image is in a.out
> > > > +format or in some other format. Compliant boot loaders must be able to
> > > > +load images that are either in @sc{elf} format or contain the address
> > > > +tag embedded in the Multiboot2 header.
> > > > +
> > >
> > > Now it is better. However, there is a lack of information that this tag
> > > should be preferred over the relevant data provided in the ELF header if
> > > both are available in the image. This have to be clear like it is in
> > > Multiboot spec.
> >
> > Would you be OK with adding the following sentence at the end:
> >
> > "When the address tag is present it must be used in order to load the
> > image, regardless of whether an @sc{elf} header is also present."
> 
> I would put this as a second sentence in note, just after "... some
> other format." However, then probably last sentence should be rephrased
> a bit.

I think the following is clear enough:

"Note: This information does not need to be provided if the kernel image is in
@sc{elf} format, but it must be provided if the image is in a.out format or in
some other format. When the address tag is present it must be used in order to
load the image, regardless of whether an @sc{elf} header is also present.
Compliant boot loaders must be able to load images that are either in @sc{elf}
format or contain the address tag embedded in the Multiboot2 header."

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] VT-d: reconcile iommu_inclusive_mapping and iommu=dom0-strict

2018-06-11 Thread Roger Pau Monné
On Fri, Jun 08, 2018 at 04:30:30PM +0100, Paul Durrant wrote:
> The documentation for the iommu_inclusive_mapping Xen command line option
> states:
> 
> "Use this to work around firmware issues providing incorrect RMRR entries"
> 
> Unfortunately this workaround does not function correctly if the dom0-strict
> iommu option is also specified.
> 
> The documentation goes on to say:
> 
> "Rather than only mapping RAM pages for IOMMU accesses for Dom0, with this
>  option all pages up to and including 4GB, not marked as unusable in the
>  E820 table, will get a  mapping established."

My version of xen-command-line.markdown says:

"Use this to work around firmware issues providing incorrect RMRR entries.
Rather than only mapping RAM pages for IOMMU accesses for Dom0, with this
option all pages not marked as unusable in the E820 table will get a mapping
established."

I think the documentation or the code needs fixing, because the
current code does indeed only map up to 4GB.

> 
> This patch modifies the VT-d hardware domain initialization code such that
> the workaround will continue to function in dom0-strict mode, by mapping
> all pages not marked as unusable *unless* they are RAM pages not assigned
> to dom0.
> 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Julien Grall 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Wei Liu 
> Cc: Kevin Tian 
> ---
>  docs/misc/xen-command-line.markdown   | 4 +++-
>  xen/drivers/passthrough/iommu.c   | 2 +-
>  xen/drivers/passthrough/vtd/iommu.c   | 2 +-
>  xen/drivers/passthrough/vtd/x86/vtd.c | 8 
>  xen/include/xen/iommu.h   | 2 +-
>  5 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown 
> b/docs/misc/xen-command-line.markdown
> index 6beb28dada..97768f1633 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1213,7 +1213,9 @@ wait descriptor timed out', try increasing this value.
>  Use this to work around firmware issues providing incorrect RMRR entries.
>  Rather than only mapping RAM pages for IOMMU accesses for Dom0, with this
>  option all pages up to and including 4GB, not marked as unusable in the
> -E820 table, will get a mapping established.
> +E820 table, will get a mapping established. Note that if `dom0-strict`
> +mode is enabled then conventional RAM pages not assigned to dom0 will not
> +be mapped.
>  
>  ### irq\_ratelimit (x86)
>  > `= `
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index 2c44fabf99..a483212356 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -52,7 +52,7 @@ custom_param("iommu", parse_iommu_param);
>  bool_t __initdata iommu_enable = 1;
>  bool_t __read_mostly iommu_enabled;
>  bool_t __read_mostly force_iommu;
> -bool_t __hwdom_initdata iommu_dom0_strict;
> +bool_t __read_mostly iommu_dom0_strict;

I'm not sure why you need to change the attributes of
iommu_dom0_strict, AFAICT it's still only used in hwdom_init (or init)
functions?

>  bool_t __read_mostly iommu_verbose;
>  bool_t __read_mostly iommu_workaround_bios_bug;
>  bool_t __read_mostly iommu_igfx = 1;
> diff --git a/xen/drivers/passthrough/vtd/iommu.c 
> b/xen/drivers/passthrough/vtd/iommu.c
> index 08bce92d40..e3f043288b 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1304,7 +1304,7 @@ static void __hwdom_init intel_iommu_hwdom_init(struct 
> domain *d)
>  {
>  struct acpi_drhd_unit *drhd;
>  
> -if ( !iommu_passthrough && !need_iommu(d) )
> +if ( !iommu_passthrough )

I think you will have to add an is_pv_domain(d) check here, or else PVH
Dom0 will also get those mappings, which is wrong because in the PVH
case we don't want to identity map PFNs into guest p2m.

PVH Dom0 wasn't calling vtd_set_hwdom_mapping because of the
!need_iommu check.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/2] VT-d: re-phrase logic in vtd_set_hwdom_mapping() for clarity

2018-06-11 Thread Roger Pau Monné
On Fri, Jun 08, 2018 at 04:30:29PM +0100, Paul Durrant wrote:
> diff --git a/docs/misc/xen-command-line.markdown 
> b/docs/misc/xen-command-line.markdown
> index 8712a833a2..6beb28dada 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1212,8 +1212,8 @@ wait descriptor timed out', try increasing this value.
>  
>  Use this to work around firmware issues providing incorrect RMRR entries.
>  Rather than only mapping RAM pages for IOMMU accesses for Dom0, with this
> -option all pages not marked as unusable in the E820 table will get a mapping
> -established.
> +option all pages up to and including 4GB, not marked as unusable in the
> +E820 table, will get a mapping established.

Sorry, I've reviewed the patches in the wrong order. You can ignore
the comments I've made related to this in patch 2.

>  ### irq\_ratelimit (x86)
>  > `= `
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c 
> b/xen/drivers/passthrough/vtd/x86/vtd.c
> index 88a60b3307..5c440ba183 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -118,22 +118,26 @@ void __hwdom_init vtd_set_hwdom_mapping(struct domain 
> *d)
>  
>  for ( i = 0; i < top; i++ )
>  {
> +unsigned long pfn = pdx_to_pfn(i);
> +bool map;
>  int rc = 0;
>  
>  /*
> - * Set up 1:1 mapping for dom0. Default to use only conventional RAM
> - * areas and let RMRRs include needed reserved regions. When set, the
> - * inclusive mapping maps in everything below 4GB except unusable
> - * ranges.
> + * Set up 1:1 mapping for dom0. Default to include only
> + * conventional RAM areas and let RMRRs include needed reserved
> + * regions. When set, the inclusive mapping maps in every pfn up
> + * to and including 4GB except those that fall in unusable ranges.
>   */
> -unsigned long pfn = pdx_to_pfn(i);
> +if ( iommu_inclusive_mapping &&
> + pfn <= (0xUL >> PAGE_SHIFT) )

Please use GB(4) here for clarity.

> +map = !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE);
> +else
> +map = page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL);
> +
> +if ( !map )
> +continue;
>  
> -if ( pfn > (0xUL >> PAGE_SHIFT) ?
> - (!mfn_valid(_mfn(pfn)) ||
> -  !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL)) :
> - iommu_inclusive_mapping ?
> - page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) :
> - !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) )
> +if ( pfn > (0xUL >> PAGE_SHIFT) && !mfn_valid(_mfn(pfn)) )

I would maybe do this check before the page_is_ram_type one, so that
you can discard invalid mfns earlier.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] strange behavior with Multiboot2 on EFI

2018-06-15 Thread Roger Pau Monné
Adding Daniel Kiper.

On Wed, Jun 13, 2018 at 02:09:45AM +0300, Kristaps Čivkulis wrote:
> Hello,
> 
> I am implementing Multiboot2 support for FreeBSD loader to load Xen
> kernel. Currently I pass EFI 64-bit system table pointer tag, EFI boot
> services not terminated tag, EFI 64-bit image handle pointer tag and
> Image load base physical address tag.
> 
> The problem is, Xen kernel gets stuck into infinite loop at address
> near 0x7fa419be without printing anything. System table is at
> 0x7fbee018 and image handle is at 0x7f22fd98. If I debugged correctly,
> it got into infinite loop after first time calling void
> efi_console_set_mode(void) [0] because it didn't return from it.
> 
> Note that if I don't pass EFI 64-bit system table pointer tag, boot
> services not terminated tag or 64-bit image handle pointer tag then
> Xen kernel correctly prints error message!
> 
> Is this behavior intended? If not, where could be a problem?
> 
> I compiled Xen kernel on FreeBSD with this command:
> 
> # gmake -j4 xen clang=y LD=/usr/local/bin/ld CC="cc -B/usr/local/bin" \
>   NM=/usr/local/bin/nm
> 
> and I test with this command:
> 
> $ qemu-system-x86_64 -m 2048 -bios OVMF-pure-efi.fd -hda test_disk.img
> 
> OVMF-pure-efi.fd is taken from
> edk2.git-ovmf-x64-0-20180612.155.g5a56c04939.noarch.rpm at [1]

Daniel, ISTR that you used QEMU + OVMF in order to develop multiboot2
support for Xen, do you happen to know if a specific OVMF or QEMU
version is needed to get this working?

Kristaps is trying to get Xen/EFI to boot using OVMF and multiboot2,
but that doesn't seem to work (at least with the versions he is
testing with).

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Building xen-hypervisor 4.10 and xen-tools on Ubuntu 16.04 for PVH

2018-05-29 Thread Roger Pau Monné
On Tue, May 29, 2018 at 03:32:40AM +, .. .. wrote:
> Hello,
> 
> I am doing a study on Virtual Machine Introspection on Intel SGX based
> system using Xen based VMs.
> 
> I followed the article  -
> 
> https://wiki.xenproject.org/wiki/Linux_PVH
> 
> 
> , in order to build xen-hypervisor 4.10 and xen-tools on Ubuntu 16.04 for
> PVH.
> 
> However, after installing xen-tools and reboting, my machine, Ubuntu 16.4
> is stuck in a login loop. I am able to log in but after a few seconds it
> logs out and goes back to login screen.

Hm, that doesn't seem related to Xen. Does it work fine if you boot
the same system without Xen?

> Also, during the service enabling steps, all except xenstored.service were
> enabled. So I am not sure if the login loop issue is due to that.

What error do you get from the xenstored service? Can you provide the
logs of booting Xen and Linux?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Building xen-hypervisor 4.10 and xen-tools on Ubuntu 16.04 for PVH+

2018-05-29 Thread Roger Pau Monné
Please don't drop xen-devel from Cc and don't top post.

On Tue, May 29, 2018 at 09:56:29AM +0200, .. .. wrote:
> Yes, without xen, the system boots up fine. Unfortunately, since the system
> within few seconds goes back to login screen I am not able to check even
> the logs. But I am not sure if this is due to xenstored service. It looks
> more of some grub issue caused by Xen. The error on terminal while trying
> to enable xenstored service was directory not found, even though I was in
> the right path and all other services were successfully enabled.

Can you paste the output of `ls /dev/xen/`?

Did you also change your Dom0 kernel, or only Xen?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 05/11] x86/emul: Unfold %cr4.de handling in x86emul_read_dr()

2018-06-06 Thread Roger Pau Monné
On Mon, Jun 04, 2018 at 02:59:09PM +0100, Andrew Cooper wrote:
> No functional change (as curr->arch.debugreg[5] is zero when DE is clear), but
> this change simplifies the following patch.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 06/11] x86: Reorganise and rename debug register fields in struct vcpu

2018-06-06 Thread Roger Pau Monné
On Mon, Jun 04, 2018 at 02:59:10PM +0100, Andrew Cooper wrote:
> Reusing debugreg[5] for the PV emulated IO breakpoint information is confusing
> to read.  Instead, introduce a dr7_emul field in pv_vcpu for the pupose.
> 
> With the PV emulation out of the way, debugreg[4,5] are entirely unused and
> don't need to be stored.
> 
> Rename debugreg[0..3] to dr[0..3] to reduce code volume, but keep them as an
> array because their behaviour is identical and this helps simplfy some of the
> PV handling.  Introduce dr6 and dr7 fields to replace debugreg[6,7] which
> removes the storage for debugreg[4,5].
> 
> Two minor alterations on the PV side is that merging of the emulated state
> happens along with the other dr handling, rather than much later, and
> arch_set_info_guest() now checks the return value from set_debugreg() fails
> the hypercall rather than silently discarding the values.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 197f8d6..59d5e4a 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -500,6 +500,12 @@ struct pv_vcpu
>  unsigned long shadow_ldt_mapcnt;
>  spinlock_t shadow_ldt_lock;
>  
> +/*
> + * %dr7 bits the guest has set, but aren't loaded into hardware, and are
> + * completely emulated.
> + */
> +uint32_t dr7_emul;

Upper 32bit are reserved ATM, but won't it be better to just use a
unsigned long (like it's used below to store the registers)?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 11/11] x86/dbg: Cleanup of legacy dr6 constants

2018-06-06 Thread Roger Pau Monné
On Mon, Jun 04, 2018 at 02:59:15PM +0100, Andrew Cooper wrote:
> Replace the few remaining uses with X86_DR6_* constants.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 04/11] x86: Fix calculation of %dr6/7 reserved bits

2018-06-06 Thread Roger Pau Monné
On Wed, Jun 06, 2018 at 04:59:24PM +0100, Andrew Cooper wrote:
> On 06/06/18 16:49, Roger Pau Monné wrote:
> > On Mon, Jun 04, 2018 at 02:59:08PM +0100, Andrew Cooper wrote:
> >> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> >> index 10415e6..7fddae1 100644
> >> --- a/xen/arch/x86/hvm/hvm.c
> >> +++ b/xen/arch/x86/hvm/hvm.c
> >> @@ -977,6 +977,7 @@ unsigned long hvm_cr4_guest_valid_bits(const struct 
> >> domain *d, bool restore)
> >>  
> >>  static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
> >>  {
> >> +const struct cpuid_policy *cp = d->arch.cpuid;
> >>  int vcpuid;
> >>  struct vcpu *v;
> >>  struct hvm_hw_cpu ctxt;
> >> @@ -1154,8 +1155,8 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
> >> hvm_domain_context_t *h)
> >>  v->arch.debugreg[1] = ctxt.dr1;
> >>  v->arch.debugreg[2] = ctxt.dr2;
> >>  v->arch.debugreg[3] = ctxt.dr3;
> >> -v->arch.debugreg[6] = ctxt.dr6;
> >> -v->arch.debugreg[7] = ctxt.dr7;
> >> +v->arch.debugreg[6] = adjust_dr6_rsvd(ctxt.dr6, cp->feat.rtm);
> >> +v->arch.debugreg[7] = adjust_dr7_rsvd(ctxt.dr7, cp->feat.rtm);
> > I don't know much, but if the value in dr6/dr7 is changed from the one
> > in the provided context, won't the guest experience issues?
> >
> > Won't it be better to plain refuse to load a context that has
> > unsupported DR6/7 values?
> 
> adjust_dr{6,7}_rsvd() match the adjustments the processor does when
> loading values.
> 
> For backwards compatibility, the processor accepts any bit pattern for
> the bottom 32 bits, then ignores the bits it doesn't care about.  When
> reading the register back, the unimplemented bits appear as their
> defaults, which aren't all 0.
> 
> In the case of hvm_load_cpu_ctxt(), we'd ideally strictly limit it to
> architectural expectation, but there will be a good chunk of VMs out
> there which haven't ever touched %dr6/%dr7 in their lifetime, and will
> therefore have 0's in the migration record at this point.

Right, I guess that's the best approach.

Reviewed-by: Roger Pau Monné 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 07/11] x86/emul: Add pending_dbg field to x86_event

2018-06-06 Thread Roger Pau Monné
On Mon, Jun 04, 2018 at 02:59:11PM +0100, Andrew Cooper wrote:
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index bfa3a0d..39c9ddc 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2483,7 +2483,7 @@ void update_guest_eip(void)
>  }
>  
>  if ( regs->eflags & X86_EFLAGS_TF )
> -hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
> +hvm_inject_debug_exn(X86_DR6_BS);
>  }
>  
>  static void vmx_fpu_dirty_intercept(void)
> @@ -3382,7 +3382,7 @@ static int vmx_handle_eoi_write(void)
>   * It is the callers responsibility to ensure that this function is only used
>   * in the context of an appropriate vmexit.
>   */
> -static void vmx_propagate_intr(unsigned long intr)
> +static void vmx_propagate_intr(unsigned long intr, unsigned long pending_dbg)
>  {
>  struct x86_event event = {
>  .vector = MASK_EXTR(intr, INTR_INFO_VECTOR_MASK),
> @@ -3406,6 +3406,9 @@ static void vmx_propagate_intr(unsigned long intr)
>  else
>  event.insn_len = 0;
>  
> +if ( event.vector == TRAP_debug )
> +event.pending_dbg = pending_dbg;
> +
>  hvm_inject_event();
>  }
>  
> @@ -3715,7 +3718,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>  if ( rc < 0 )
>  goto exit_and_crash;
>  if ( !rc )
> -vmx_propagate_intr(intr_info);
> +vmx_propagate_intr(intr_info, exit_qualification);
>  }
>  else
>  domain_pause_for_debugger();
> @@ -3736,7 +3739,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>  if ( rc < 0 )
>  goto exit_and_crash;
>  if ( !rc )
> -vmx_propagate_intr(intr_info);
> +vmx_propagate_intr(intr_info, 0 /* N/A */);
>  }
>  else
>  {
> @@ -3776,7 +3779,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>  break;
>  case TRAP_alignment_check:
>  HVMTRACE_1D(TRAP, vector);
> -vmx_propagate_intr(intr_info);
> +vmx_propagate_intr(intr_info, 0 /* N/A */);
>  break;
>  case TRAP_nmi:
>  if ( MASK_EXTR(intr_info, INTR_INFO_INTR_TYPE_MASK) !=

I'm kind of lost here. Don't you need to update vmx_inject_event so
that it does something with the pending_dbg field in x86_event?

(and the same to svm_inject_event)

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 07/11] x86/emul: Add pending_dbg field to x86_event

2018-06-06 Thread Roger Pau Monné
On Wed, Jun 06, 2018 at 05:50:38PM +0100, Andrew Cooper wrote:
> On 06/06/18 17:46, Roger Pau Monné wrote:
> > On Mon, Jun 04, 2018 at 02:59:11PM +0100, Andrew Cooper wrote:
> >> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> >> index bfa3a0d..39c9ddc 100644
> >> --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> @@ -2483,7 +2483,7 @@ void update_guest_eip(void)
> >>  }
> >>  
> >>  if ( regs->eflags & X86_EFLAGS_TF )
> >> -hvm_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
> >> +hvm_inject_debug_exn(X86_DR6_BS);
> >>  }
> >>  
> >>  static void vmx_fpu_dirty_intercept(void)
> >> @@ -3382,7 +3382,7 @@ static int vmx_handle_eoi_write(void)
> >>   * It is the callers responsibility to ensure that this function is only 
> >> used
> >>   * in the context of an appropriate vmexit.
> >>   */
> >> -static void vmx_propagate_intr(unsigned long intr)
> >> +static void vmx_propagate_intr(unsigned long intr, unsigned long 
> >> pending_dbg)
> >>  {
> >>  struct x86_event event = {
> >>  .vector = MASK_EXTR(intr, INTR_INFO_VECTOR_MASK),
> >> @@ -3406,6 +3406,9 @@ static void vmx_propagate_intr(unsigned long intr)
> >>  else
> >>  event.insn_len = 0;
> >>  
> >> +if ( event.vector == TRAP_debug )
> >> +event.pending_dbg = pending_dbg;
> >> +
> >>  hvm_inject_event();
> >>  }
> >>  
> >> @@ -3715,7 +3718,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> >>  if ( rc < 0 )
> >>  goto exit_and_crash;
> >>  if ( !rc )
> >> -vmx_propagate_intr(intr_info);
> >> +vmx_propagate_intr(intr_info, exit_qualification);
> >>  }
> >>  else
> >>  domain_pause_for_debugger();
> >> @@ -3736,7 +3739,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> >>  if ( rc < 0 )
> >>  goto exit_and_crash;
> >>  if ( !rc )
> >> -vmx_propagate_intr(intr_info);
> >> +vmx_propagate_intr(intr_info, 0 /* N/A */);
> >>  }
> >>  else
> >>  {
> >> @@ -3776,7 +3779,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> >>  break;
> >>  case TRAP_alignment_check:
> >>  HVMTRACE_1D(TRAP, vector);
> >> -vmx_propagate_intr(intr_info);
> >> +vmx_propagate_intr(intr_info, 0 /* N/A */);
> >>  break;
> >>  case TRAP_nmi:
> >>  if ( MASK_EXTR(intr_info, INTR_INFO_INTR_TYPE_MASK) !=
> > I'm kind of lost here. Don't you need to update vmx_inject_event so
> > that it does something with the pending_dbg field in x86_event?
> >
> > (and the same to svm_inject_event)
> 
> The commit message specifically says "To begin resolving this issue, add
> a ..."
> 
> This patch is plumbing.  Later patches are fixes, and it takes until
> patch 9 to get working sensibly.

Sorry, realized this when reviewing the next patch:

Reviewed-by: Roger Pau Monné 

Thanks.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 09/11] x86: Fix merging of new status bits into %dr6

2018-06-06 Thread Roger Pau Monné
On Mon, Jun 04, 2018 at 02:59:13PM +0100, Andrew Cooper wrote:
> The current logic used to update %dr6 when injecting #DB is buggy.  The
> architectural behaviour is to overwrite B{0..3} (rather than accumulate) and
> accumulate all other bits.
> 
> Introduce a new merge_dr6() helper, which also takes care of handing RTM
> correctly.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 08/11] x86/hvm: RFC - PROBABLY BROKEN - Defer all debugging/monitor actions to {svm, vmx}_inject_event()

2018-06-06 Thread Roger Pau Monné
On Mon, Jun 04, 2018 at 02:59:12PM +0100, Andrew Cooper wrote:
> Currently, there is a lot of functionality in the #DB intercepts, and some
> repeated functionality in the *_inject_event() logic.
> 
> The gdbsx code is implemented at both levels (albeit differently for #BP,
> which is presumably due to the fact that the old emulator behaviour used to be
> to move %rip forwards for traps), while the monitor behaviour only exists at
> the intercept level.
> 
> Updating of %dr6 is implemented (buggily) at both levels, but having it at
> both levels is problematic to implement correctly.
> 
> Rearrange the logic to have nothing interesting at the intercept level, and
> everything implemented at the injection level.  Amongst other things, this
> means that the monitor subsystem will pick up debug actions from emulated
> events.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

> @@ -1797,16 +1803,39 @@ static void vmx_inject_event(const struct x86_event 
> *event)
>  __vmread(GUEST_IA32_DEBUGCTL, );
>  __vmwrite(GUEST_IA32_DEBUGCTL, val & ~IA32_DEBUGCTLMSR_LBR);
>  }
> -if ( cpu_has_monitor_trap_flag )
> -break;
> +
>  /* fall through */
>  case TRAP_int3:
>  if ( curr->domain->debugger_attached )
>  {
>  /* Debug/Int3: Trap to debugger. */
> +if ( _event.vector == TRAP_int3 )
> +{
> +/* N.B. Can't use __update_guest_eip() for risk of recusion. 
> */
> +regs->rip += _event.insn_len;
> +regs->eflags &= ~X86_EFLAGS_RF;
> +curr->arch.gdbsx_vcpu_event = TRAP_int3;
> +}
> +
>  domain_pause_for_debugger();
>  return;
>  }
> +else
> +{
> +int rc = hvm_monitor_debug(regs->rip,
> +   _event.vector == TRAP_debug
> +   ? HVM_MONITOR_DEBUG_EXCEPTION
> +   : HVM_MONITOR_SOFTWARE_BREAKPOINT,
> +   _event.type, _event.insn_len);
> +if ( rc < 0 )
> +{
> +gprintk(XENLOG_ERR, "Monitor debug error %d\n", rc);
> +domain_crash(curr->domain);
> +return;
> +}
> +if ( rc )
> +return; /* VCPU paused.  Wait for monitor. */
> +}
>  break;

This look fairly similar to the svm_inject_event code, I wonder if
those could be merged somehow (or certain part of it shared).

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC] OVMF NVRAM Variable Retention

2018-06-07 Thread Roger Pau Monné
On Thu, Jun 07, 2018 at 09:59:10AM +0100, Ross Lagerwall wrote:
> On 06/06/2018 04:01 PM, aaron.yo...@oracle.com wrote:
> >    Any comments/suggestions/opinions/caveats on this approach?
> 
> I did this a while back. It is easy enough to do:
> 
> 1) Have Xen load OVMF_code.fd rather than the combined blob.
> 
> 2) Tweak the location in guest memory where hvmloader loads the blob.
> 
> 3) Tweak QEMU to load the OVMF_vars.fd blob at the correct location.
> 
> 4) Start QEMU with emulated flash for OVMF_vars.fd only.
> 
> Tweaking the locations ensures that the various parts of OVMF are at the
> location it expects. If you want, I could probably dig up some patches
> (hacks) which do this.

I don't like to tie UEFI (OVMF) to QEMU. We want to also run guests
with UEFI firmware without QEMU, see below.

> > 
> >    Or any other suggested approaches on how to fix this issue?
> 
> However... I did not like this approach for two reasons:
> 
> 1) Having an emulated flash blob is difficult to manage outside of the guest
> (i.e. populating initial state, updating variables if needed). For
> XenServer, we want more flexibility.
> 
> 2) While Secure Boot can be enabled with this implementation, it is not
> sufficiently secure because the guest is able to write anything it wants to
> the emulated flash. KVM solved this problem with SMM mode but I don't like
> that solution either.
> 
> So I am busy implementing:
> 
> A UEFI driver frontend which implements variable services by proxying
> requests to a backend running in dom0 (could be part of QEMU). This ensures
> security because the guest cannot directly write to the variable storage and
> the authentication checks are done outside of guest context. It allows
> flexibility because the backend can store the variables in any form (e.g. an
> sqlite database) rather than being restricted to an emulated flash blob.

This seems like a good idea, specially because that means we don't
need QEMU to load anything into the guest, and this approach could be
likely used by PVH guests running with UEFI also.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 04/11] x86: Fix calculation of %dr6/7 reserved bits

2018-06-06 Thread Roger Pau Monné
On Mon, Jun 04, 2018 at 02:59:08PM +0100, Andrew Cooper wrote:
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 10415e6..7fddae1 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -977,6 +977,7 @@ unsigned long hvm_cr4_guest_valid_bits(const struct 
> domain *d, bool restore)
>  
>  static int hvm_load_cpu_ctxt(struct domain *d, hvm_domain_context_t *h)
>  {
> +const struct cpuid_policy *cp = d->arch.cpuid;
>  int vcpuid;
>  struct vcpu *v;
>  struct hvm_hw_cpu ctxt;
> @@ -1154,8 +1155,8 @@ static int hvm_load_cpu_ctxt(struct domain *d, 
> hvm_domain_context_t *h)
>  v->arch.debugreg[1] = ctxt.dr1;
>  v->arch.debugreg[2] = ctxt.dr2;
>  v->arch.debugreg[3] = ctxt.dr3;
> -v->arch.debugreg[6] = ctxt.dr6;
> -v->arch.debugreg[7] = ctxt.dr7;
> +v->arch.debugreg[6] = adjust_dr6_rsvd(ctxt.dr6, cp->feat.rtm);
> +v->arch.debugreg[7] = adjust_dr7_rsvd(ctxt.dr7, cp->feat.rtm);

I don't know much, but if the value in dr6/dr7 is changed from the one
in the provided context, won't the guest experience issues?

Won't it be better to plain refuse to load a context that has
unsupported DR6/7 values?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/4] firmware/seabios: fix build on systems with non GNU toolchains

2018-07-02 Thread Roger Pau Monné
On Mon, Jul 02, 2018 at 10:42:44AM +0100, Wei Liu wrote:
> On Mon, Jul 02, 2018 at 11:36:28AM +0200, Roger Pau Monné wrote:
> > On Mon, Jul 02, 2018 at 10:32:15AM +0100, Wei Liu wrote:
> > > On Mon, Jul 02, 2018 at 10:28:23AM +0200, Roger Pau Monne wrote:
> > > [...]
> > > >  
> > > >  subtree-force-update:
> > > > @@ -128,3 +128,6 @@ endif
> > > >  subtree-force-update-all:
> > > > $(MAKE) seabios-dir-force-update
> > > > $(MAKE) ovmf-dir-force-update
> > > > +
> > > > +subdir-all-seabios-dir: seabios-dir
> > > > +   $(MAKE) -C $< CC=$(SEABIOSCC) LD=$(SEABIOSLD) PYTHON=$(PYTHON) 
> > > > all;
> > > 
> > > There is a pattern rule in Rules.mk, so I would very much avoid having
> > > it redefined.
> > 
> > I know, but I need to redefine it so I can pass the extra options. I
> > don't see any better way to do this TBH.
> 
> Oh, can you try the following to see if it works?
> 
>   subdir-all-seabios-dir: CC=$(SEABIOSCC) LD=$(SEABIOSLD)

This sadly doesn't seem to work. I assume this is some kind of weird
behaviour of the Xen build system? Is this due to the patter rules
that the build system uses?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] tools/libxc: Drop xc_cpuid_to_str()

2018-07-02 Thread Roger Pau Monné
On Mon, Jul 02, 2018 at 10:57:25AM +0100, Andrew Cooper wrote:
> This helper appears to have been introduced 10 years ago by c/s 5f14a87ceb
> "x86, hvm: Guest CPUID configuration" and never had any users at all.
> 
> alloc_str() is actually an opencoded calloc(), and now only has a single
> caller.  Use calloc() directly and drop alloc_str().
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

> @@ -832,7 +809,7 @@ int xc_cpuid_set(
>  continue;
>  }
>  
> -config_transformed[i] = alloc_str();
> +config_transformed[i] = calloc(33, 1); /* 32 bits, NUL terminator. */

I would rather do sizeof(*config_transformed[i]), but I'm not going to
insist.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  1   2   3   4   5   6   7   8   9   10   >