Re: [Xen-devel] [PATCH for-4.10] libxc: load acpi RSDP table at correct address

2017-11-22 Thread Roger Pau Monné
On Tue, Nov 21, 2017 at 12:13 PM, Juergen Gross  wrote:
> On 21/11/17 11:42, Andrew Cooper wrote:
>> On 21/11/17 07:44, Jan Beulich wrote:
>> On 20.11.17 at 17:59,  wrote:
 On 11/20/2017 11:43 AM, Jan Beulich wrote:
 On 20.11.17 at 17:28,  wrote:
>> On 11/20/2017 11:26 AM, Jan Beulich wrote:
>> On 20.11.17 at 17:14,  wrote:
 What could cause grub2 to fail to find space for the pointer in the
 first page? Will we ever have anything in EBDA (which is one of the
 possible RSDP locations)?
>>> Well, the EBDA (see the B in its name) is again something that's
>>> meaningless without there being a BIOS.
>> Exactly. So it should always be available for grub to copy the pointer
>> there.
> But what use would it be if grub copied it there? It just shouldn't
> be there, neither before nor after grub (just like grub doesn't
> introduce firmware into the system).
 So that the guest can find it using standard methods. If Xen can't
 guarantee ACPI-compliant placement of the pointer then someone has to
 help the guest find it in the expected place. We can do it with a
 dedicated entry point by setting the pointer explicitly (although
 admittedly this is not done correctly now) or we need to have firmware
 (grub2) place it in the "right" location.

 (It does look a bit hacky though)
>>> Indeed. Of course ACPI without any actual firmware is sort of odd,
>>> too. As to dedicated entry point and its alternatives: Xen itself
>>> tells grub (aiui we're talking about a flavor of it running PVH itself)
>>> where the RSDP is. Why can't grub forward that information in a
>>> suitable way (e.g. via a new tag, or - for Linux - as a new entry
>>> in the Linux boot header)?
>>
>> Or if the worst comes to the worst, fabricate an acpi_rsdp= command line
>> parameter?
>
> This would be easy: just replace the #ifdef CONFIG_KEXEC in
> drivers/acpi/osl.c of the Linux kernel with:
>
> #if defined(CONFIG_KEXEC) || defined(CONFIG_XEN_PVH)
>
> and the parameter is usable and active.
>
> Another possibility would be to let grub copy the RSDP to below 1MB and
> add the E820 entry for it.
>
> In any case it seems save to let Xen place the RSDP just below 4GB,
> together with console and Xenstore pages (so this area just expands).
> grub can handle this either on its own or together with the kernel.
>
> Lets see how Roger's plans with BSD look like.

On FreeBSD the position of the RSDP is passed from the loader to the
kernel, memory scanning is only used if the loader hasn't provided a
RSDP address. I don't know much about the Linux boot protocol, but it
would seem sensible to pass this information from the loader to the
kernel in the boot information (IIRC it's called the zero page on
Linux), so that Linux tries to get the RSDP from the information
passed by the loader and then resorts to memory scanning if that's not
provided.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10] libxc: load acpi RSDP table at correct address

2017-11-20 Thread Roger Pau Monné
Adding xen-devel, dropped it on my reply.

Replying from my phone, sorry for the formatting.


El 20 nov. 2017 9:35, "Juergen Gross"  escribió:

For PVH domains loading of the ACPI RSDP table is done via allocating
a domain loader segment after having loaded the kernel. This leads to
the RSDP table being loaded at an arbitrary guest address instead of
the architectural correct address just below 1MB.


AFAIK this is only true for legacy BIOS boot, when using UEFI the RSDP can
be anywhere in memory, hence grub2 must already have an alternative way of
finding the RSDP apart from scanning the low 1MB.

Thanks, Roger.
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-unstable test] 115555: regressions - FAIL

2017-11-10 Thread Roger Pau Monné
On Sat, Nov 04, 2017 at 11:14:35PM +, osstest service owner wrote:
> flight 11 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/11/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-i386-pvgrub 11 guest-start  fail REGR. vs. 
> 115526
>  test-amd64-amd64-amd64-pvgrub 11 guest-start fail REGR. vs. 
> 115526

Both of the above trigger an assertion, it seems to be in pvgrub code
(because I doubt minios has anything kexec related):

root  (hd0,0)

 Filesystem type is ext2fs, partition type 0x83

kernel  /boot/vmlinuz-3.16.0-4-686-pae root=UUID=482efee7-e0f0-453e-bb8e-000bfc

d5a822 ro 

initrd  /boot/initrd.img-3.16.0-4-686-pae

= Init TPM Front 
Tpmfront:Error Unable to read device/vtpm/0/backend-id during tpmfront 
initialization! error = ENOENT
Tpmfront:Info Shutting down tpmfront
close blk: backend=/local/domain/0/backend/vbd/2/51712 node=device/vbd/51712
ASSERTION FAILED: 0 at kexec.c:418.
Do_exit called!
base is 0xb6fca8 caller is 0x6c85d
base is 0xb6fcb8 caller is 0x617ff
base is 0xb6fce8 caller is 0x18fc5
base is 0xb6fdd8 caller is 0x39f8
base is 0xb6fe08 caller is 0x8bbf
base is 0xb6fe18 caller is 0xa6ae
base is 0xb6fe48 caller is 0x10526
base is 0xb6fed8 caller is 0x10c03
base is 0xb6ff58 caller is 0x41f9
base is 0xb6ff7c caller is 0x61f31
base is 0xb6ffec caller is 0x319e

>  test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-localmigrate/x10 fail REGR. 
> vs. 115526

In this case it seems like the guest didn't suspend, but AFAICT the
guest is up and running. No errors on the serial console. It's on the
x16 migration.

>  test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore fail REGR. vs. 
> 115526

This one is worse, there's the following error from libxc:

xc: detail: Frames: 105472/1045504   10%
xc: detail: Frames: 115712/1045504   11%
xc: detail: Frames: 125952/1045504   12%
xc: detail: Frames: 136192/1045504   13%
xc: error: Failed to get types for pfn batch (14 = Bad address): Internal error
xc: error: Save failed (14 = Bad address): Internal error

The guest seems to be alive afterwards.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] blkback reporting incorrect number of sectors, unable to boot

2017-11-09 Thread Roger Pau Monné
On Thu, Nov 09, 2017 at 08:15:52AM -0700, Mike Reardon wrote:
> On Thu, Nov 9, 2017 at 2:30 AM, Roger Pau Monné <roger@citrix.com>
> wrote:
> 
> > Please try to avoid top-posting.
> >
> > On Wed, Nov 08, 2017 at 08:27:17PM -0700, Mike Reardon wrote:
> > > So am I correct in reading this that for at least the foreseeable future
> > > storage using 4k sector sizes is not gonna happen?  I'm just trying to
> > > figure out if I need to get some different hardware.
> >
> > Have you tried to use qdisk instead of blkback for the storage
> > backend?
> >
> > You will have to change your disk configuration line to add
> > backendtype=qdisk.
> >
> > Roger.
> >
> 
> Sorry I didn't realize my client was defaulting to top post.
> 
> If I add that to the disk config line, the system just hangs on the ovmf
> bios screen.  This appears in the qemu-dm log:
> 
> 
> xen be: qdisk-51712: xen be: qdisk-51712: error: Failed to get "write" lock
> error: Failed to get "write" lock
> xen be: qdisk-51712: xen be: qdisk-51712: initialise() failed
> initialise() failed

Hm, that doesn't seem related to the issue at hand. Adding Anthony and
Stefano (the QEMU maintainers).

Is there a know issue when booting a HVM guest with qdisk and UEFI?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/pvh: Do not add DSDT and FACS to PVH dom0 XSDT

2017-11-09 Thread Roger Pau Monné
On Thu, Nov 09, 2017 at 10:37:53AM -0500, Boris Ostrovsky wrote:
> These tables are pointed to from FADT. Adding them will
> result in duplicate entries in the guest's tables.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks, roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2 v2] xen: Add support for initializing 16550 UART using ACPI

2017-11-09 Thread Roger Pau Monné
On Thu, Nov 09, 2017 at 06:18:21AM -0700, Jan Beulich wrote:
> >>> On 09.11.17 at 12:31,  wrote:
> > On Thu, Nov 09, 2017 at 03:49:23PM +0530, Bhupinder Thakur wrote:
> >> +static int ns16550_init_dt(struct ns16550 *uart,
> >> +   const struct dt_device_node *dev)
> >> +{
> >> +return -EINVAL;
> >> +}
> >> +#endif
> >> +
> >> +#ifdef CONFIG_ACPI
> >> +#include 
> > 
> > Please place the include at the top of the file, together with the
> > other ones.
> 
> I disagree here, at least as long as that header isn't itself expanding
> to nothing (or only stubs) when !CONFIG_ACPI.

The header already has a CONFIG_ACPI guard inside, but it doesn't
cover the full content, so my suggestion was to move the include to
the top of the file, but keeping the CONFIG_ACPI guards around it.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2 v2] xen: Add support for initializing 16550 UART using ACPI

2017-11-09 Thread Roger Pau Monné
On Thu, Nov 09, 2017 at 03:49:23PM +0530, Bhupinder Thakur wrote:
> Currently, Xen supports only DT based initialization of 16550 UART.
> This patch adds support for initializing 16550 UART using ACPI SPCR table.
> 
> This patch also makes the uart initialization code common between DT and
> ACPI based initialization.
> 
> Signed-off-by: Bhupinder Thakur 
> ---
> TBD:
> There was one review comment from Julien about how the uart->io_size is being 
> calculated. Currently, I am calulating the io_size based on address of the 
> last
> UART register. 
> 
> pci_uart_config also calcualates the uart->io_size like this:
> 
> uart->io_size = max(8U << param->reg_shift,
>  param->uart_offset);
> 
> I am not sure whether we can use similar logic for calculating uart->io_size.
> 
> Changes since v1:
> - Reused common code between DT and ACPI based initializations
> 
> CC: Andrew Cooper 
> CC: George Dunlap 
> CC: Ian Jackson 
> CC: Jan Beulich 
> CC: Konrad Rzeszutek Wilk 
> CC: Stefano Stabellini 
> CC: Tim Deegan 
> CC: Wei Liu 
> CC: Julien Grall 
> 
>  xen/drivers/char/ns16550.c  | 132 
> 
>  xen/include/xen/8250-uart.h |   1 +
>  2 files changed, 121 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
> index e0f8199..cf42fce 100644
> --- a/xen/drivers/char/ns16550.c
> +++ b/xen/drivers/char/ns16550.c
> @@ -1463,18 +1463,13 @@ void __init ns16550_init(int index, struct 
> ns16550_defaults *defaults)
>  }
>  
>  #ifdef CONFIG_HAS_DEVICE_TREE
> -static int __init ns16550_uart_dt_init(struct dt_device_node *dev,
> -   const void *data)
> +static int ns16550_init_dt(struct ns16550 *uart,
> +   const struct dt_device_node *dev)

Why are you dropping the __init attribute?

>  {
> -struct ns16550 *uart;
> -int res;
> +int res = 0;
>  u32 reg_shift, reg_width;
>  u64 io_size;
>  
> -uart = _com[0];
> -
> -ns16550_init_common(uart);
> -
>  uart->baud  = BAUD_AUTO;
>  uart->data_bits = 8;
>  uart->parity= UART_PARITY_NONE;
> @@ -1510,18 +1505,103 @@ static int __init ns16550_uart_dt_init(struct 
> dt_device_node *dev,
>  
>  uart->dw_usr_bsy = dt_device_is_compatible(dev, "snps,dw-apb-uart");
>  
> +return res;
> +}
> +#else
> +static int ns16550_init_dt(struct ns16550 *uart,
> +   const struct dt_device_node *dev)
> +{
> +return -EINVAL;
> +}
> +#endif
> +
> +#ifdef CONFIG_ACPI
> +#include 

Please place the include at the top of the file, together with the
other ones.

> +static int ns16550_init_acpi(struct ns16550 *uart,
> + const void *data)
> +{
> +struct acpi_table_spcr *spcr = NULL;
> +int status = 0;

I don't think you need to initialize any of those two variables. Or
do:

int status = acpi_get_table(ACPI_SIG_SPCR, 0,
(struct acpi_table_header **));

if ( ... )


> +status = acpi_get_table(ACPI_SIG_SPCR, 0,
> +(struct acpi_table_header **));
> +
> +if ( ACPI_FAILURE(status) )
> +{
> +printk("ns16550: Failed to get SPCR table\n");
> +return -EINVAL;
> +}
> +
> +uart->baud  = BAUD_AUTO;
> +uart->data_bits = 8;
> +uart->parity= spcr->parity;
> +uart->stop_bits = spcr->stop_bits;
> +uart->io_base = spcr->serial_port.address;
> +uart->irq = spcr->interrupt;
> +uart->reg_width = spcr->serial_port.bit_width / 8;
> +uart->reg_shift = 0;
> +uart->io_size = UART_MAX_REG << uart->reg_shift;

You seem to align some of the '=' above but not all, please do either
one, but consistently.

> +
> +irq_set_type(spcr->interrupt, spcr->interrupt_type);
> +
> +return 0;
> +}
> +#else
> +static int ns16550_init_acpi(struct ns16550 *uart,
> + const void *data)
> +{
> +return -EINVAL;
> +}
> +#endif
> +
> +static int ns16550_uart_init(struct ns16550 **puart,
> + const void *data, bool acpi)
> +{
> +struct ns16550 *uart = _com[0];
> +
> +*puart = uart;
> +
> +ns16550_init_common(uart);
> +
> +return ( acpi ) ? ns16550_init_acpi(uart, data)
  ^ unneeded parentheses.

> +: ns16550_init_dt(uart, data);
> +}
> +
> +static void ns16550_vuart_init(struct ns16550 *uart)
> +{
> +#ifdef CONFIG_ARM
>  uart->vuart.base_addr = uart->io_base;
>  uart->vuart.size = uart->io_size;
> -uart->vuart.data_off = UART_THR  -uart->vuart.status_off = UART_LSR -uart->vuart.status = UART_LSR_THRE|UART_LSR_TEMT;
> +uart->vuart.data_off = 

Re: [Xen-devel] [PATCH] x86/pvh: Do not add DSDT and FACS to PVH dom0 XSDT

2017-11-09 Thread Roger Pau Monné
On Wed, Nov 08, 2017 at 03:19:27PM -0500, Boris Ostrovsky wrote:
> These tables are pointed to from FADT. Adding them will
> result in duplicate entries in the guest's tables.
> 
> Signed-off-by: Boris Ostrovsky 
> ---
>  xen/arch/x86/hvm/dom0_build.c | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> index a67071c..c878bba 100644
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -818,6 +818,19 @@ static bool __init pvh_acpi_table_allowed(const char 
> *sig)
>  return true;
>  }
>  
> +static bool __init pvh_acpi_table_in_xsdt(const char *sig)
> +{
> +/*
> + * DSDT and FACS are pointed to from FADT and thus don't belong
> + * in XSDT.
> + */
> +if ( !strncmp(sig, ACPI_SIG_DSDT, ACPI_NAME_SIZE) ||
> + !strncmp(sig, ACPI_SIG_FACS, ACPI_NAME_SIZE) )
> +return false;
> +
> +return true;
> +}

What about adding something like:

static bool __init pvh_acpi_xsdt_table_allowed(const char *sig)
{
return pvh_acpi_table_allowed(sig) &&
   strncmp(sig, ACPI_SIG_DSDT, ACPI_NAME_SIZE) &&
   strncmp(sig, ACPI_SIG_FACS, ACPI_NAME_SIZE);
}

And replacing the pvh_acpi_table_allowed calls in pvh_setup_acpi_xsdt
with that?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] blkback reporting incorrect number of sectors, unable to boot

2017-11-09 Thread Roger Pau Monné
Please try to avoid top-posting.

On Wed, Nov 08, 2017 at 08:27:17PM -0700, Mike Reardon wrote:
> So am I correct in reading this that for at least the foreseeable future
> storage using 4k sector sizes is not gonna happen?  I'm just trying to
> figure out if I need to get some different hardware.

Have you tried to use qdisk instead of blkback for the storage
backend?

You will have to change your disk configuration line to add
backendtype=qdisk.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] win2008 guest cannot get ip through sriov

2017-11-09 Thread Roger Pau Monné
On Thu, Nov 09, 2017 at 12:22:49AM +, Hao, Xudong wrote:
> > Qemu-xen didn't have commit a80363, so I report this issue to ask for sync 
> > up
> > with qemu upstream. Last mail I mean I usually used Qemu Xen tree to do 
> > test,
> > and found out this issue.
> > 
> > Before requesting the backport, have you tested whether it fixes your 
> > issues?
> > 
> 
> Yes, Luwei (Cced) tested it with pass result.

In which case requesting a backport would be in place on the grounds
that's a bug fix.

The patch in question fixes a bug mostly seen when doing
PCI-passthrough of devices that support MSI-X interrupts to Windows
guest (and Windows attempts to setup the interrupts using MSI-X). It
doesn't manifest on Linux or FreeBSD because these OSes use a
different dance to setup MSI-X interrupts, and thus are not affected.
It's likely that other OSes with different MSI-X implementations are
able to trigger that bug. The result of not having the patch is that
interrupts of passed through devices will stay masked, thus
preventing the device from working (unless polling mode only is
used).

Pros:
 - It fixes a bug.
 - The patch is not very big or intrusive.

Cons:
 - It hasn't been tested as it's not in qemu-xen.
 - Only Windows is affected by the bug AFAIK.

IMHO, iff the backport is accepted it should be performed ASAP, so
that the patch can be properly tested before the release.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next 7/9] coverage: introduce support for llvm profiling

2017-11-08 Thread Roger Pau Monné
On Wed, Nov 08, 2017 at 01:38:59AM -0700, Jan Beulich wrote:
> >>> On 26.10.17 at 11:19,  wrote:
> > --- /dev/null
> > +++ b/xen/common/coverage/llvm.c
> > +#define LLVM_PROFILE_MAGIC_64 (uint64_t)255 << 56 | (uint64_t)'l' << 48 | \
> > +   (uint64_t)'p' << 40 | (uint64_t)'r' << 32 | (uint64_t)'o' << 24 |  \
> > +(uint64_t)'f' << 16 | (uint64_t)'r' << 8 | (uint64_t)129
> > +#define LLVM_PROFILE_MAGIC_32 (uint64_t)255 << 56 | (uint64_t)'l' << 48 | \
> > +   (uint64_t)'p' << 40 | (uint64_t)'r' << 32 | (uint64_t)'o' << 24 |  \
> > +(uint64_t)'f' << 16 | (uint64_t)'R' << 8 | (uint64_t)129
> 
> Judging by the file having Xen style, I imply this isn't intended to
> very closely match some other original. With that, parentheses
> should be added around all the shift operations above.
> 
> > +#if __clang_major__ >= 4 || (__clang_major__ == 3 && __clang_minor__ == 9)
> 
> Is it certain that there's never going to be a 3.10? Otherwise
> >= might be more suitable for the minor version check.

No, there's not going to be a clang 3.10.

> > +int __llvm_profile_runtime;
> 
> This isn't used anywhere - please add a brief comment explaining
> the need for it (the more that its type being plain int is also
> suspicious).

This is an internal symbol that must be present because Xen is not
linked against the run-time coverage library. It's described as an int
here:

https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#using-the-profiling-runtime-without-static-initializers

Without this symbol linkage fails.

> > +extern struct llvm_profile_data __start___llvm_prf_data;
> > +extern struct llvm_profile_data __stop___llvm_prf_data;
> > +extern uint64_t __start___llvm_prf_cnts;
> > +extern uint64_t __stop___llvm_prf_cnts;
> > +extern char __start___llvm_prf_names;
> > +extern char __stop___llvm_prf_names;
> 
> I guess all of these really want to have [] added, making ...
> 
> > +static void *start_data = &__start___llvm_prf_data;
> > +static void *end_data = &__stop___llvm_prf_data;
> > +static void *start_counters = &__start___llvm_prf_cnts;
> > +static void *end_counters = &__stop___llvm_prf_cnts;
> > +static void *start_names = &__start___llvm_prf_names;
> > +static void *end_names = &__stop___llvm_prf_names;
> 
> ... the  here unnecessary. But then - do these really need to
> be statics (rather than #define-s)?
> 
> I would also guess that at least the names ones could be const.

Could make them defines indeed.

> > +APPEND_TO_BUFFER((char *)start_data, end_data - start_data);
> > +APPEND_TO_BUFFER((char *)start_counters, end_counters - 
> > start_counters);
> > +APPEND_TO_BUFFER((char *)start_names, end_names - start_names);
> 
> The casts should all be to const char*, and perhaps that would
> better be done inside the macro anyway.

Seems better indeed.

> > --- a/xen/include/public/sysctl.h
> > +++ b/xen/include/public/sysctl.h
> > @@ -646,6 +646,12 @@ struct xen_sysctl_scheduler_op {
> >  
> >  #define XEN_GCOV_FORMAT_MAGIC0x58434f56 /* XCOV */
> 
> Hmm, shouldn't the private magic #define-s actually be put here
> (in which case you'd indeed need to retain both 32- and 64-bit
> variants)?

I don't think so, here XEN_GCOV_FORMAT_MAGIC is a Xen specific gcov
magic number.

OTOH LLVM_PROFILE_MAGIC_{64/32} is an llvm defined magic number,
that's not under our control. Hence I don't think it should be
exported in Xen public headers.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next 6/9] kconfig: add llvm coverage option

2017-11-08 Thread Roger Pau Monné
On Wed, Nov 08, 2017 at 01:13:29AM -0700, Jan Beulich wrote:
> >>> On 26.10.17 at 12:10, <wei.l...@citrix.com> wrote:
> > On Thu, Oct 26, 2017 at 11:08:21AM +0100, Roger Pau Monné wrote:
> >> On Thu, Oct 26, 2017 at 11:03:13AM +0100, Wei Liu wrote:
> >> > On Thu, Oct 26, 2017 at 10:19:35AM +0100, Roger Pau Monne wrote:
> >> > >  config GCOV
> >> > >bool "Gcov Support"
> >> > >depends on !LIVEPATCH
> >> > 
> >> > && !LLVM_COVERAGE
> >> 
> >> That was my idea, but sadly that's not possible because you generate a
> >> circular dependency. The best solution that I found is to only mark
> >> one as exclusive (ie: have depends !GCOV in the LLVM_COVERAGE option
> >> below).
> > 
> > In that case, why not just use "choice" to let user pick the
> > implementation?
> 
> Plus then this choice should probably have an auto-detect option
> just like gcov's gcc version one - at least I assume that it should
> be pretty straightforward to tell at build time which variant to use.
> In fact - what would happen if one enabled the wrong option for
> the compiler in use? IOW I wonder whether auto-detection (and
> then also for the gcc version) should be the only valid mode).

I don't plan to introduce llvm/clang version choices here, I think
autodetection should be fine.

I can remove the gcc ones also, leaving this as a boolean choice (ie:
enable code coverage support), but I would like to have some
confirmation from the gcc side.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [libvirt test] 115636: regressions - FAIL

2017-11-07 Thread Roger Pau Monné
On Tue, Nov 07, 2017 at 12:42:59PM +, osstest service owner wrote:
> flight 115636 libvirt real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/115636/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-i386-libvirt-qcow2 17 guest-start/debian.repeat fail REGR. vs. 
> 115476

One of the local attachments done in order to run the bootloader
failed:

2017-11-07 12:17:16.601+: libxl: 
libxl_disk.c:990:libxl__device_disk_local_initiate_attach: Trying to find local 
path
2017-11-07 12:17:16.601+: libxl: 
libxl_disk.c:998:libxl__device_disk_local_initiate_attach: Local path not 
found, initiating attach.
2017-11-07 12:17:16.601+: libxl: 
libxl_device.c:365:libxl__device_disk_set_backend: Disk vdev=(null) 
spec.backend=qdisk
2017-11-07 12:17:16.601+: libxl: 
libxl_device.c:365:libxl__device_disk_set_backend: Disk vdev=xvda 
spec.backend=qdisk
2017-11-07 12:17:16.606+: libxl: 
libxl_event.c:686:libxl__ev_xswatch_deregister: watch w=0xb3f06544: deregister 
unregistered
2017-11-07 12:17:26.693+: libxl: 
libxl_device.c:1366:libxl__wait_for_backend: Backend 
/local/domain/0/backend/qdisk/0/51712 not ready
2017-11-07 12:17:26.693+: libxl: 
libxl_bootloader.c:417:bootloader_disk_attached_cb: Domain 7:failed to attach 
local disk for bootloader execution
2017-11-07 12:17:26.693+: libxl: 
libxl_event.c:686:libxl__ev_xswatch_deregister: watch w=0xb3f06658: deregister 
unregistered
2017-11-07 12:17:26.693+: libxl: 
libxl_bootloader.c:283:bootloader_local_detached_cb: Domain 7:unable to detach 
locally attached disk
2017-11-07 12:17:26.693+: libxl: 
libxl_create.c:1246:domcreate_rebuild_done: Domain 7:cannot (re-)build domain: 
-3
2017-11-07 12:17:26.708+: libxl: libxl_domain.c:1138:devices_destroy_cb: 
Domain 7:Forked pid 4256 for destroy of domain
2017-11-07 12:17:26.711+: libxl: libxl_event.c:1869:libxl__ao_complete: ao 
0xb3f01578: complete, rc=-3
2017-11-07 12:17:26.712+: libxl: libxl_event.c:1838:libxl__ao__destroy: ao 
0xb3f01578: destroy

Sadly AFAICT there's no log for the QEMU running this backend.
Previous domain creating attempts worked.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] blkback reporting incorrect number of sectors, unable to boot

2017-11-07 Thread Roger Pau Monné
On Tue, Nov 07, 2017 at 04:31:06AM -0700, Jan Beulich wrote:
> >>> On 07.11.17 at 11:30,  wrote:
> > On Mon, Nov 06, 2017 at 05:33:37AM -0700, Jan Beulich wrote:
> >> >>> On 04.11.17 at 05:48,  wrote:
> >> > I added some additional storage to my server with some native 4k sector
> >> > size disks.  The LVM volumes on that array seem to work fine when mounted
> >> > by the host, and when passed through to any of the Linux guests, but
> >> > Windows guests aren't able to use them when using PV drivers.  The work
> >> > fine to install when I first install Windows (Windows 10, latest build) 
> >> > but
> >> > once I install the PV drivers it will no longer boot and give an
> >> > inaccessible boot device error.  If I assign the storage to a different
> >> > Windows guest that already has the drivers installed (as secondary 
> >> > storage,
> >> > not as the boot device) I see the disk listed in disk management, but the
> >> > size of the disk is 8x larger than it should be.  After looking into it a
> >> > bit, the disk is reporting 8x the number of sectors it should have when I
> >> > run xenstore-ls.  Here is the info from xenstore-ls for the relevant 
> >> > volume:
> >> > 
> >> >   51712 = ""
> >> >frontend = "/local/domain/8/device/vbd/51712"
> >> >params = "/dev/tv_storage/main-storage"
> >> >script = "/etc/xen/scripts/block"
> >> >frontend-id = "8"
> >> >online = "1"
> >> >removable = "0"
> >> >bootable = "1"
> >> >state = "2"
> >> >dev = "xvda"
> >> >type = "phy"
> >> >mode = "w"
> >> >device-type = "disk"
> >> >discard-enable = "1"
> >> >feature-max-indirect-segments = "256"
> >> >multi-queue-max-queues = "12"
> >> >max-ring-page-order = "4"
> >> >physical-device = "fe:0"
> >> >physical-device-path = "/dev/dm-0"
> >> >hotplug-status = "connected"
> >> >feature-flush-cache = "1"
> >> >feature-discard = "0"
> >> >feature-barrier = "1"
> >> >feature-persistent = "1"
> >> >sectors = "34359738368"
> >> >info = "0"
> >> >sector-size = "4096"
> >> >physical-sector-size = "4096"
> >> > 
> >> > 
> >> > Here are the numbers for the volume as reported by fdisk:
> >> > 
> >> > Disk /dev/tv_storage/main-storage: 16 TiB, 17592186044416 bytes, 
> >> > 4294967296
> >> > sectors
> >> > Units: sectors of 1 * 4096 = 4096 bytes
> >> > Sector size (logical/physical): 4096 bytes / 4096 bytes
> >> > I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> >> > Disklabel type: dos
> >> > Disk identifier: 0x
> >> > 
> >> > DeviceBoot StartEndSectors Size Id 
> >> > Type
> >> > /dev/tv_storage/main-storage1  1 4294967295 4294967295  16T ee 
> >> > GPT
> >> > 
> >> > 
> >> > As with the size reported in Windows disk management, the number of 
> >> > sectors
> >> > from xenstore seems is 8x higher than what it should be.  The disks 
> >> > aren't
> >> > using 512b sector emulation, they are natively 4k, so I have no idea 
> >> > where
> >> > the 8x increase is coming from.
> >> 
> >> Hmm, looks like a backend problem indeed: struct hd_struct's
> >> nr_sects (which get_capacity() returns) looks to be in 512-byte
> >> units, regardless of actual sector size. Hence the plain
> >> get_capacity() use as well the (wrongly open coded) use of
> >> part_nr_sects_read() looks insufficient in vbd_sz(). Roger,
> >> Konrad?
> > 
> > Hm, AFAICT sector-size should always be set to 512.
> 
> Which would mean that bdev_logical_block_size() can't be used by
> blkback to set this value. Yet then - what's the point of the xenstore
> setting if it's always the same value anyway?

Some frontends (at least FreeBSD) will choke if sector-size is not
set. So we have the following scenario:

 - Windows: acknowledges sector-size * sectors in order to set disk
   capacity.
 - Linux: sets disk capacity to sectors * 512.
 - FreeBSD: sets disk capacity to sector-size * sectors, will choke if
   sector-size is not set.

In order to keep compatibility with all of them AFAICT the only option
is to hardcode sector-size to 512 in xenstore.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [linux-4.9 test] 115504: regressions - FAIL

2017-11-07 Thread Roger Pau Monné
On Fri, Nov 03, 2017 at 08:21:31PM +, osstest service owner wrote:
> flight 115504 linux-4.9 real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/115504/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop   fail REGR. vs. 
> 114814

AFAICT this tree should also be force-pushed, the windows 16 issue is
the same as the one seen on xen-unstable.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Xen/pciback: Implement PCI slot or bus reset with 'do_flr' SysFS attribute

2017-11-07 Thread Roger Pau Monné
On Mon, Nov 06, 2017 at 12:48:42PM -0500, Govinda Tatti wrote:
> The life-cycle of a PCI device in Xen pciback is complex and is constrained
> by the generic PCI locking mechanism.
> 
> - It starts with the device being bound to us, for which we do a function
>   reset (done via SysFS so the PCI lock is held).
> - If the device is unbound from us, we also do a function reset
>   (done via SysFS so the PCI lock is held).
> - If the device is un-assigned from a guest - we do a function reset
>   (no PCI lock is held).
> 
> All reset operations are done on the individual PCI function level
> (so bus:device:function).
> 
> The reset for an individual PCI function means device must support FLR
> (PCIe or AF), PM reset on D3hot->D0 device specific reset, or a secondary
> bus reset for a singleton device on a bus but FLR does not have widespread
> support or it is not reliable in some cases. So, we need to provide an
> alternate mechanism to users to perform a slot or bus level reset.
> 
> Currently, a slot or bus reset is not exposed in SysFS as there is no good
> way of exposing a bus topology there. This is due to the complexity -
> we MUST know that the different functions of a PCIe device are not in use
> by other drivers, or if they are in use (say one of them is assigned to a
> guest and the other is  idle) - it is still OK to reset the slot (assuming
> both of them are owned by Xen pciback).
> 
> This patch does that by doing a slot or bus reset (if slot not supported)
> if all of the functions of a PCIe device belong to Xen PCIback.
> 
> Due to the complexity with the PCI lock we cannot do the reset when a
> device is bound ('echo $BDF > bind') or when unbound ('echo $BDF > unbind')
> as the pci_[slot|bus]_reset also takes the same lock resulting in a
> dead-lock.
> 
> Putting the reset function in a work-queue or thread won't work either -
> as we have to do the reset function outside the 'unbind' context (it holds
> the PCI lock). But once you 'unbind' a device the device is no longer under
> the ownership of Xen pciback and the pci_set_drvdata has been reset, so
> we cannot use a thread for this.
> 
> Instead of doing all this complex dance, we depend on the tool-stack doing
> the right thing. As such, we implement the 'do_flr' SysFS attribute which
> 'xl' uses when a device is detached or attached from/to a guest. It
> bypasses the need to worry about the PCI lock.
> 
> To not inadvertently do a bus reset that would affect devices that are in
> use by other drivers (other than Xen pciback) prior to the reset, we check
> that all of the devices under the bridge are owned by Xen pciback. If they
> are not, we refrain from executing the bus (or slot) reset.
> 
> Signed-off-by: Govinda Tatti 
> Signed-off-by: Konrad Rzeszutek Wilk 
> Reviewed-by: Boris Ostrovsky 
> ---
>  Documentation/ABI/testing/sysfs-driver-pciback |  12 +++
>  drivers/xen/xen-pciback/pci_stub.c | 125 
> +
>  2 files changed, 137 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-driver-pciback 
> b/Documentation/ABI/testing/sysfs-driver-pciback
> index 6a733bf..ccf7dc0 100644
> --- a/Documentation/ABI/testing/sysfs-driver-pciback
> +++ b/Documentation/ABI/testing/sysfs-driver-pciback
> @@ -11,3 +11,15 @@ Description:
>  #echo 00:19.0-E0:2:FF > /sys/bus/pci/drivers/pciback/quirks
>  will allow the guest to read and write to the configuration
>  register 0x0E.
> +
> +What:   /sys/bus/pci/drivers/pciback/do_flr
> +Date:   Nov 2017
> +KernelVersion:  4.15
> +Contact:xen-de...@lists.xenproject.org
> +Description:
> +An option to perform a slot or bus reset when a PCI device
> + is owned by Xen PCI backend. Writing a string of :BB:DD.F
> + will cause the pciback driver to perform a slot or bus reset
> + if the device supports it. It also checks to make sure that
> + all of the devices under the bridge are owned by Xen PCI
> + backend.
> diff --git a/drivers/xen/xen-pciback/pci_stub.c 
> b/drivers/xen/xen-pciback/pci_stub.c
> index 6331a95..2b2c269 100644
> --- a/drivers/xen/xen-pciback/pci_stub.c
> +++ b/drivers/xen/xen-pciback/pci_stub.c
> @@ -244,6 +244,96 @@ struct pci_dev *pcistub_get_pci_dev(struct 
> xen_pcibk_device *pdev,
>   return found_dev;
>  }
>  
> +struct pcistub_args {
> + struct pci_dev *dev;

const?

> + int dcount;

unsigned int.

> +};
> +
> +static int pcistub_search_dev(struct pci_dev *dev, void *data)

Seems like this function would better return a boolean rather than an
int.

> +{
> + struct pcistub_device *psdev;
> + struct pcistub_args *arg = data;
> + bool found_dev = false;
> + unsigned long flags;
> +
> + spin_lock_irqsave(_devices_lock, flags);
> +
> + list_for_each_entry(psdev, _devices, dev_list) {

Re: [Xen-devel] [BUG] blkback reporting incorrect number of sectors, unable to boot

2017-11-07 Thread Roger Pau Monné
On Mon, Nov 06, 2017 at 05:33:37AM -0700, Jan Beulich wrote:
> >>> On 04.11.17 at 05:48,  wrote:
> > I added some additional storage to my server with some native 4k sector
> > size disks.  The LVM volumes on that array seem to work fine when mounted
> > by the host, and when passed through to any of the Linux guests, but
> > Windows guests aren't able to use them when using PV drivers.  The work
> > fine to install when I first install Windows (Windows 10, latest build) but
> > once I install the PV drivers it will no longer boot and give an
> > inaccessible boot device error.  If I assign the storage to a different
> > Windows guest that already has the drivers installed (as secondary storage,
> > not as the boot device) I see the disk listed in disk management, but the
> > size of the disk is 8x larger than it should be.  After looking into it a
> > bit, the disk is reporting 8x the number of sectors it should have when I
> > run xenstore-ls.  Here is the info from xenstore-ls for the relevant volume:
> > 
> >   51712 = ""
> >frontend = "/local/domain/8/device/vbd/51712"
> >params = "/dev/tv_storage/main-storage"
> >script = "/etc/xen/scripts/block"
> >frontend-id = "8"
> >online = "1"
> >removable = "0"
> >bootable = "1"
> >state = "2"
> >dev = "xvda"
> >type = "phy"
> >mode = "w"
> >device-type = "disk"
> >discard-enable = "1"
> >feature-max-indirect-segments = "256"
> >multi-queue-max-queues = "12"
> >max-ring-page-order = "4"
> >physical-device = "fe:0"
> >physical-device-path = "/dev/dm-0"
> >hotplug-status = "connected"
> >feature-flush-cache = "1"
> >feature-discard = "0"
> >feature-barrier = "1"
> >feature-persistent = "1"
> >sectors = "34359738368"
> >info = "0"
> >sector-size = "4096"
> >physical-sector-size = "4096"
> > 
> > 
> > Here are the numbers for the volume as reported by fdisk:
> > 
> > Disk /dev/tv_storage/main-storage: 16 TiB, 17592186044416 bytes, 4294967296
> > sectors
> > Units: sectors of 1 * 4096 = 4096 bytes
> > Sector size (logical/physical): 4096 bytes / 4096 bytes
> > I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> > Disklabel type: dos
> > Disk identifier: 0x
> > 
> > DeviceBoot StartEndSectors Size Id Type
> > /dev/tv_storage/main-storage1  1 4294967295 4294967295  16T ee GPT
> > 
> > 
> > As with the size reported in Windows disk management, the number of sectors
> > from xenstore seems is 8x higher than what it should be.  The disks aren't
> > using 512b sector emulation, they are natively 4k, so I have no idea where
> > the 8x increase is coming from.
> 
> Hmm, looks like a backend problem indeed: struct hd_struct's
> nr_sects (which get_capacity() returns) looks to be in 512-byte
> units, regardless of actual sector size. Hence the plain
> get_capacity() use as well the (wrongly open coded) use of
> part_nr_sects_read() looks insufficient in vbd_sz(). Roger,
> Konrad?

Hm, AFAICT sector-size should always be set to 512.

> Question of course is whether the Linux frontend then
> also needs adjustment, and hence whether the backend can
> be corrected in a compatible way in the first place.

blkfront uses set_capacity, which also seems to expect the sectors to
be hardcoded to 512.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next 1/9] gcov: return ENOSYS for unimplemented gcov domctl

2017-11-07 Thread Roger Pau Monné
On Mon, Nov 06, 2017 at 05:06:18AM -0700, Jan Beulich wrote:
> >>> On 06.11.17 at 12:16,  wrote:
> > Jan Beulich writes ("Re: [PATCH for-next 1/9] gcov: return ENOSYS for 
> > unimplemented gcov domctl"):
> >> On 26.10.17 at 11:19,  wrote:
> >> > --- a/xen/common/gcov/gcov.c
> >> > +++ b/xen/common/gcov/gcov.c
> >> > @@ -239,7 +239,7 @@ int sysctl_gcov_op(struct xen_sysctl_gcov_op *op)
> >> >  break;
> >> >  
> >> >  default:
> >> > -ret = -EINVAL;
> >> > +ret = -ENOSYS;
> >> >  break;
> >> >  }
> >> 
> >> Very certainly ENOSYS is not in any way better. Despite the many
> >> misuses of it, we've started enforcing that this wouldn't be spread.
> >> -EOPNOTSUPP may be fine here, but -EINVAL is suitable as well.
> >> -ENOSYS exclusively means that a _top level_ hypercall is
> >> unimplemented (i.e. with very few exceptions there should be
> >> exactly one place where it gets returned, which is in the main
> >> hypercall dispatch code).
> > 
> > The distinction between unimplemented status of a top-level hypercall
> > and unimplemented status of a sub-op is rarely useful to the caller.
> > 
> > Conversely, the distinction between an unimplemented facility, and a
> > facility which is exists but is being used improperly, is vitally
> > important to anyone who is trying to write compatibility code.
> > 
> > I don't mind if you want to insist on the former distinction,
> > reserving ENOSYS for top-level hypercalls and EOPNOTSUPP for other
> > functions.
> > 
> > But I absolutely do mind the use of EINVAL for "unsupported function".
> > I appreciate that much of the hypervisor has historically used EINVAL
> > this way, but this is (a) a pain for callers (b) evil, bad, and wrong
> > (c) unnecessary since EOPNOTSUPP is available.  We should at least not
> > perpetrate any more of this.  In an unreleased API we should change it
> > before release.

This API has actually been released since ~2013 IIRC, when it was
added to Xen.

> Okay, so EOPNOTSUPP is it then, which is also my preference
> (due to there being so many uses of EINVAL elsewhere). I've
> merely mentioned that EINVAL would be suitable since,
> technically speaking, the value in a "sub-operation" field being
> invalid is no different from this being the case for the value in
> any other field.

If I don't get any more comments I will re-send this patch separately
using EOPNOTSUPP instead of ENOSYS. I will also keep the Acks gathered
so far unless anyone objects.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] win2008 guest cannot get ip through sriov

2017-11-07 Thread Roger Pau Monné
On Mon, Nov 06, 2017 at 01:04:56AM +, Hao, Xudong wrote:
> > -Original Message-
> > From: Roger Pau Monné [mailto:roger@citrix.com]
> > Sent: Friday, November 3, 2017 7:23 PM
> > To: Hao, Xudong <xudong@intel.com>
> > Cc: Julien Grall <julien.gr...@linaro.org>; Stefano Stabellini
> > <sstabell...@kernel.org>; Lars Kurth <lars.ku...@citrix.com>; Quan Xu
> > <quan@gmail.com>; Kang, Luwei <luwei.k...@intel.com>; Zhang,
> > PengtaoX <pengtaox.zh...@intel.com>; Julien Grall <julien.gr...@arm.com>;
> > Jan Beulich <jbeul...@suse.com>; Xen-devel <xen-de...@lists.xenproject.org>;
> > Anthony PERARD <anthony.per...@citrix.com>; Wei Liu <wei.l...@citrix.com>
> > Subject: Re: [Xen-devel] [BUG] win2008 guest cannot get ip through sriov
> > 
> > On Fri, Nov 03, 2017 at 01:10:26AM +, Hao, Xudong wrote:
> > >
> > > > -Original Message-
> > > > From: Julien Grall [mailto:julien.gr...@linaro.org]
> > > > Sent: Thursday, November 2, 2017 9:50 PM
> > > > To: Stefano Stabellini <sstabell...@kernel.org>
> > > > Cc: Hao, Xudong <xudong@intel.com>; Jan Beulich
> > > > <jbeul...@suse.com>; Quan Xu <quan@gmail.com>; Lars Kurth
> > > > <lars.ku...@citrix.com>; Wei Liu <wei.l...@citrix.com>; Zhang,
> > > > PengtaoX <pengtaox.zh...@intel.com>; Kang, Luwei
> > > > <luwei.k...@intel.com>; Julien Grall <julien.gr...@arm.com>; Anthony
> > > > PERARD <anthony.per...@citrix.com>; Xen-devel  > > > de...@lists.xenproject.org>
> > > > Subject: Re: [Xen-devel] [BUG] win2008 guest cannot get ip through
> > > > sriov
> > > >
> > > > Hi,
> > > >
> > > > On 27/10/17 21:16, Stefano Stabellini wrote:
> > > > > On Fri, 27 Oct 2017, Julien Grall wrote:
> > > > >> On 27/10/17 08:27, Hao, Xudong wrote:
> > > > >>> This bug exist much long time, there are many discussion last
> > > > >>> year but not a solution then. I call out it now because there is
> > > > >>> a fix in qemu
> > > > upstream:
> > > > >>> commit a8036336609d2e184fc3543a4c439c0ba7d7f3a2
> > > > >>> Author: Roger Pau Monne <roger@citrix.com>
> > > > >>> Date:   Thu Aug 24 16:07:03 2017 +0100
> > > > >>>
> > > > >>>   xen/pt: allow QEMU to request MSI unmasking at bind time
> > > > >>>
> > > > >>> The fix is not in qemu-xen tree yet, when will qemu-xen sync
> > > > >>> this fix? Is it possible to catch Xen 4.10's qemu-xen?
> > > > >>
> > > > >> I will let Stefano and Anthony providing feedback before giving a
> > > > >> release-ack here.
> > > > >
> > > > > Yes, I think we should backport the commit as it fixes a genuine bug.
> > > > > The backport is not risk-free but it only affects PCI Passthrough.
> > > > > Also the commit has been in QEMU for 2 months now.
> > > >
> > > > Does anyone actually tested it with QEMU Xen tree?
> > > >
> > >
> > > Qemu Xen tree is default which is in Xen source code configuration file
> > Config.mk, I tested it with it.
> > > QEMU_UPSTREAM_URL ?= http://xenbits.xen.org/git-http/qemu-xen.git
> > 
> > Can you please make sure you have QEMU commit a80363: xen/pt: allow QEMU
> > to request MSI unmasking at bind time. AFAICT this is not yet in the 
> > qemu-xen
> > tree.
> > 
> 
> Roger, 
> Maybe I misunderstood of your question and my last mail confused you. 
> Qemu-xen didn't have commit a80363, so I report this issue to ask for sync up 
> with qemu upstream. Last mail I mean I usually used Qemu Xen tree to do test, 
> and found out this issue.

Before requesting the backport, have you tested whether it
fixes your issues?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PVH support in grub2

2017-11-03 Thread Roger Pau Monné
On Fri, Nov 03, 2017 at 07:23:50PM +0100, Juergen Gross wrote:
> On 03/11/17 19:19, Boris Ostrovsky wrote:
> > On 11/03/2017 02:05 PM, Juergen Gross wrote:
> >>
> >> So again the question: how to tell whether we are PVH or HVM in
> >> init_hypervisor_platform()? ACPi tables are scanned way later...
> > 
> > Can we make grub/OVMF append a boot option?
> > 
> > Or set setup_header.hardware_subarch to something? We already have
> > X86_SUBARCH_XEN but it is only used by PV.  Or we might be able to use
> > hardware_subarch_data (will need to get a buy-in from x86 maintainers, I
> > think).
> 
> But wouldn't this break the idea to reuse the native boot paths in
> grub/OVMF without further modifications?
> 
> I'd rather have a way to ask the hypervisor whether we are in PVH mode
> (e.g. via CPUID or a hypercall to test for a devicemodel having itself
> registered).

Keep in mind that from Xen's PoV PVH is just a HVM guest. Xen
currently keeps a bitmap of the emulated devices that are available to
a guest, but that's so far an internal field. We could consider
exporting this on a cpuid leaf, but then we need to make it a fixed
ABI.

Maybe we can make a list of platform devices that are not available on
PVH and that Linux assumes to be present?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Commit moratorium to staging

2017-11-03 Thread Roger Pau Monné
On Fri, Nov 03, 2017 at 05:57:52PM +, George Dunlap wrote:
> On 11/03/2017 02:52 PM, George Dunlap wrote:
> > On 11/03/2017 02:14 PM, Roger Pau Monné wrote:
> >> On Thu, Nov 02, 2017 at 09:55:11AM +, Paul Durrant wrote:
> >>> Hmm. I wonder whether the guest is actually healthy after the migrate. 
> >>> One could imagine a situation where the storage device model (IDE in our 
> >>> case I guess) gets stuck in some way but recovers after a timeout in the 
> >>> guest storage stack. Thus, if you happen to try shut down while it is 
> >>> still stuck Windows starts trying to shut down but can't. Try after the 
> >>> timeout though and it can.
> >>> In the past we did make attempts to support Windows without PV drivers in 
> >>> XenServer but xenrt would never reliably pass VM lifecycle tests using 
> >>> emulated devices. That was with qemu trad, but I wonder whether upstream 
> >>> qemu is actually any better particularly if using older device models 
> >>> such as IDE and RTL8139 (which are probably largely unmodified from trad).
> >>
> >> Since I've been looking into this for a couple of days, and found no
> >> solution I'm going to write what I've found so far:
> >>
> >>  - The issue only affects Windows guests.
> >>  - It only manifests itself when doing live migration, non-live
> >>migration or save/resume work fine.
> >>  - It affects all x86 hardware, the amount of migrations in order to
> >>trigger it seems to depend on the hardware, but doing 20 migrations
> >>reliably triggers it on all the hardware I've tested.
> > 
> > Not good.
> > 
> > You said that Windows reported that the login process failed somehow?
> > 
> > Is it possible something bad is happening, like sending spurious page
> > faults to the guest in logdirty mode?
> > 
> > I wonder if we could reproduce something like it on Linux -- set a build
> > going and start localhost migrating; a spurious page fault is likely to
> > cause the build to fail.
> 
> Well, with a looping xen-build going on in the guest, I've done 40 local
> migrates with no problems yet.
> 
> But Roger -- is this on emulated devices only, no PV drivers?
> 
> That might be something worth looking at.

Yes, windows doesn't have PV devices. But save/restore and non-live
migration seems fine, so it doesn't look to be related to devices, but
rather to log-dirty or some other aspect of live-migration.

Or maybe it's something indeed related to emulated devices that's more
easily triggerable on live-migration.

I'm also thinking it would be helpful to do x20 save/restore,
shutdown, create, x20 migrations and shutdown. That would help us
identify problems related to save/restore and live-migration more
easily.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Commit moratorium to staging

2017-11-03 Thread Roger Pau Monné
On Thu, Nov 02, 2017 at 09:55:11AM +, Paul Durrant wrote:
> > -Original Message-
> > From: Roger Pau Monne
> > Sent: 02 November 2017 09:42
> > To: Paul Durrant <paul.durr...@citrix.com>
> > Cc: Ian Jackson <ian.jack...@citrix.com>; Lars Kurth
> > <lars.ku...@citrix.com>; Wei Liu <wei.l...@citrix.com>; Julien Grall
> > <julien.gr...@linaro.org>; committ...@xenproject.org; xen-devel  > de...@lists.xenproject.org>
> > Subject: Re: [Xen-devel] Commit moratorium to staging
> > 
> > On Thu, Nov 02, 2017 at 09:20:10AM +, Paul Durrant wrote:
> > > > -Original Message-
> > > > From: Roger Pau Monne
> > > > Sent: 02 November 2017 09:15
> > > > To: Roger Pau Monne <roger@citrix.com>
> > > > Cc: Ian Jackson <ian.jack...@citrix.com>; Lars Kurth
> > > > <lars.ku...@citrix.com>; Wei Liu <wei.l...@citrix.com>; Julien Grall
> > > > <julien.gr...@linaro.org>; Paul Durrant <paul.durr...@citrix.com>;
> > > > committ...@xenproject.org; xen-devel  > de...@lists.xenproject.org>
> > > > Subject: Re: [Xen-devel] Commit moratorium to staging
> > > >
> > > > On Wed, Nov 01, 2017 at 04:17:10PM +, Roger Pau Monné wrote:
> > > > > On Wed, Nov 01, 2017 at 02:07:48PM +, Ian Jackson wrote:
> > > > > > * Affected hosts differ from unaffected hosts according to cpuid.
> > > > > >   Roger has repro'd the bug on an unaffected host by masking out
> > > > > >   certain cpuid bits.  There are 6 implicated bits and he is working
> > > > > >   to narrow that down.
> > > > >
> > > > > I'm currently trying to narrow this down and make sure the above is
> > > > > accurate.
> > > >
> > > > So I was wrong with this, I guess I've run the tests on the wrong
> > > > host. Even when masking the different cpuid bits in the guest the
> > > > tests still succeeds.
> > > >
> > > > AFAICT the test fail or succeed reliably depending on the host
> > > > hardware. I don't really have many ideas about what to do next, but I
> > > > think it would be useful to create a manual osstest flight that runs
> > > > the win16 job in all the different hosts in the colo. I would also
> > > > capture the normal information that Xen collects after each test (xl
> > > > info, /proc/cpuid, serial logs...).
> > > >
> > > > Is there anything else not captured by ts-logs-capture that would be
> > > > interesting in order to help debug the issue?
> > >
> > > Does the shutdown reliably complete prior to migrate and then only fail
> > intermittently after a localhost migrate?
> > 
> > AFAICT yes, but it can also be added to the test in order to be sure.
> > 
> > > It might be useful to know what cpuid info is seen by the guest before and
> > after migrate.
> > 
> > Is there anyway to get that from windows in an automatic way? If not I
> > could test that with a Debian guest. In fact it might even be a good
> > thing for Linux based guest to be added to the regular migration tests
> > in order to make sure cpuid bits don't change across migrations.
> > 
> 
> I found this for windows:
> 
> https://www.cpuid.com/downloads/cpu-z/cpu-z_1.81-en.exe
> 
> It can generate a text or html report as well as being run interactively. But 
> you may get more mileage from using a debian HVM guest. I guess it may also 
> be useful is we can get a scan of available MSRs and content before and after 
> migrate too.
> 
> > > Another datapoint... does the shutdown fail if you insert a delay of a 
> > > couple
> > of minutes between the migrate and the shutdown?
> > 
> > Sometimes, after a variable number of calls to xl shutdown ... the
> > guest usually ends up shutting down.
> > 
> 
> Hmm. I wonder whether the guest is actually healthy after the migrate. One 
> could imagine a situation where the storage device model (IDE in our case I 
> guess) gets stuck in some way but recovers after a timeout in the guest 
> storage stack. Thus, if you happen to try shut down while it is still stuck 
> Windows starts trying to shut down but can't. Try after the timeout though 
> and it can.
> In the past we did make attempts to support Windows without PV drivers in 
> XenServer but xenrt would never reliably pass VM lifecycle tests using 
> emulated devices. That was with qemu trad,

Re: [Xen-devel] Xen PVH support in grub2

2017-11-03 Thread Roger Pau Monné
On Fri, Nov 03, 2017 at 01:50:11PM +0100, Juergen Gross wrote:
> On 03/11/17 13:17, Roger Pau Monné wrote:
> > On Fri, Nov 03, 2017 at 01:00:46PM +0100, Juergen Gross wrote:
> >> On 29/09/17 17:51, Roger Pau Monné wrote:
> >>> On Fri, Sep 29, 2017 at 03:33:58PM +, Juergen Gross wrote:
> >>>> On 29/09/17 17:24, Roger Pau Monné wrote:
> >>>>> On Fri, Sep 29, 2017 at 02:46:53PM +, Juergen Gross wrote:
> >>>>> Then, I also wonder whether it would make sense for this grub to load
> >>>>> the kernel using the PVH entry point or the native entry point. Would
> >>>>> it be possible to boot a Linux kernel up to the point where cpuid can
> >>>>> be used inside of a PVH container?
> >>>>
> >>>> I don't think today's Linux allows that. This has been discussed
> >>>> very thoroughly at the time Boris added PVH V2 support to the kernel.
> >>>
> >>> OK, I'm not going to insist on that, but my plans for FreeBSD is to
> >>> make the native entry point capable of booting inside of a PVH
> >>> container up to the point where cpuid (or whatever method) can be used
> >>> to detect the environment.
> >>
> >> Looking more thoroughly into the Linux boot code I think this could
> >> work for Linux, too. But only if we can tell PVH from HVM in the guest.
> >> How would you do that in FreeBSD? Via flags in the boot params? This
> >> would the have to be done in the boot loader (e.g. grub or OVMF).
> > 
> > My plan was not to differentiate between HVM and PVH, but rather to
> > make use of the ACPI information in order to decide which devices are
> > available and which are not inside of a PVH guest.
> > 
> > For example in the FADT "IA-PC Boot Architecture Flags" field for PVH
> > we already set "VGA Not Present" and "CMOS RTC Not Present". There
> > might be other flags/fields that must be set, but I would like to
> > avoid having a CPUID bit or similar saying "PVH", because then Xen
> > will be tied to always providing the same set of devices in PVH
> > containers.
> 
> Why? This would depend on the semantics tied to the flag. It could just
> mean "don't assume availability of legacy stuff" (e.g. BIOS calls).
> 
> Linux would have a problem with the ACPI approach as it would try BIOS
> calls way before it is initializing its ACPI handling. So in Linux I'd
> need another way to tell I'm running in PVH mode, e.g. a "no legacy"
> bit in the Xen HVM cpuid leaf.

If you are booted from the PVH entry point, there's no BIOS or UEFI
(ie: no firmware), if you are booted from the BIOS entry point there's
a BIOS and the same applies to UEFI. How does Linux differentiate
whether it's booted from BIOS or UEFI?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [OSSTEST PATCH] migrations: Do x10 migration 20x instead

2017-11-03 Thread Roger Pau Monné
On Fri, Nov 03, 2017 at 12:52:05PM +, Ian Jackson wrote:
> We want to keep the old testid or some new failures will be "never
> pass".
> 
> Roger reports that this change makes the existing host-specific
> Windows migration failures fail everywhere, so so things may need
 ^ extra so
> force pushing.
> 
> CC: Roger Pau Monné <roger@citrix.com>
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Xen PVH support in grub2

2017-11-03 Thread Roger Pau Monné
On Fri, Nov 03, 2017 at 01:00:46PM +0100, Juergen Gross wrote:
> On 29/09/17 17:51, Roger Pau Monné wrote:
> > On Fri, Sep 29, 2017 at 03:33:58PM +, Juergen Gross wrote:
> >> On 29/09/17 17:24, Roger Pau Monné wrote:
> >>> On Fri, Sep 29, 2017 at 02:46:53PM +, Juergen Gross wrote:
> >>> Then, I also wonder whether it would make sense for this grub to load
> >>> the kernel using the PVH entry point or the native entry point. Would
> >>> it be possible to boot a Linux kernel up to the point where cpuid can
> >>> be used inside of a PVH container?
> >>
> >> I don't think today's Linux allows that. This has been discussed
> >> very thoroughly at the time Boris added PVH V2 support to the kernel.
> > 
> > OK, I'm not going to insist on that, but my plans for FreeBSD is to
> > make the native entry point capable of booting inside of a PVH
> > container up to the point where cpuid (or whatever method) can be used
> > to detect the environment.
> 
> Looking more thoroughly into the Linux boot code I think this could
> work for Linux, too. But only if we can tell PVH from HVM in the guest.
> How would you do that in FreeBSD? Via flags in the boot params? This
> would the have to be done in the boot loader (e.g. grub or OVMF).

My plan was not to differentiate between HVM and PVH, but rather to
make use of the ACPI information in order to decide which devices are
available and which are not inside of a PVH guest.

For example in the FADT "IA-PC Boot Architecture Flags" field for PVH
we already set "VGA Not Present" and "CMOS RTC Not Present". There
might be other flags/fields that must be set, but I would like to
avoid having a CPUID bit or similar saying "PVH", because then Xen
will be tied to always providing the same set of devices in PVH
containers.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG] win2008 guest cannot get ip through sriov

2017-11-03 Thread Roger Pau Monné
On Fri, Nov 03, 2017 at 01:10:26AM +, Hao, Xudong wrote:
> 
> > -Original Message-
> > From: Julien Grall [mailto:julien.gr...@linaro.org]
> > Sent: Thursday, November 2, 2017 9:50 PM
> > To: Stefano Stabellini 
> > Cc: Hao, Xudong ; Jan Beulich ;
> > Quan Xu ; Lars Kurth ; Wei Liu
> > ; Zhang, PengtaoX ; Kang,
> > Luwei ; Julien Grall ; Anthony
> > PERARD ; Xen-devel  > de...@lists.xenproject.org>
> > Subject: Re: [Xen-devel] [BUG] win2008 guest cannot get ip through sriov
> > 
> > Hi,
> > 
> > On 27/10/17 21:16, Stefano Stabellini wrote:
> > > On Fri, 27 Oct 2017, Julien Grall wrote:
> > >> On 27/10/17 08:27, Hao, Xudong wrote:
> > >>> This bug exist much long time, there are many discussion last year
> > >>> but not a solution then. I call out it now because there is a fix in 
> > >>> qemu
> > upstream:
> > >>> commit a8036336609d2e184fc3543a4c439c0ba7d7f3a2
> > >>> Author: Roger Pau Monne 
> > >>> Date:   Thu Aug 24 16:07:03 2017 +0100
> > >>>
> > >>>   xen/pt: allow QEMU to request MSI unmasking at bind time
> > >>>
> > >>> The fix is not in qemu-xen tree yet, when will qemu-xen sync this
> > >>> fix? Is it possible to catch Xen 4.10's qemu-xen?
> > >>
> > >> I will let Stefano and Anthony providing feedback before giving a
> > >> release-ack here.
> > >
> > > Yes, I think we should backport the commit as it fixes a genuine bug.
> > > The backport is not risk-free but it only affects PCI Passthrough.
> > > Also the commit has been in QEMU for 2 months now.
> > 
> > Does anyone actually tested it with QEMU Xen tree?
> > 
> 
> Qemu Xen tree is default which is in Xen source code configuration file 
> Config.mk, I tested it with it.
> QEMU_UPSTREAM_URL ?= http://xenbits.xen.org/git-http/qemu-xen.git

Can you please make sure you have QEMU commit a80363: xen/pt: allow
QEMU to request MSI unmasking at bind time. AFAICT this is not yet in
the qemu-xen tree.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Regression PCI passthrough from 4.5.5 to 4.6.0-rc1

2017-11-02 Thread Roger Pau Monné
On Thu, Nov 02, 2017 at 07:02:49PM +0200, Pasi Kärkkäinen wrote:
> On Thu, Aug 24, 2017 at 09:23:16AM +0100, Roger Pau Monné wrote:
> > On Wed, Aug 23, 2017 at 07:13:00PM +0200, Andreas Kinzler wrote:
> > > > > > From a brief look it looks like this would be doable, but the way
> > > > > these flags are being communicated is rather ugly (the values used
> > > > > here
> > > > > > aren't part of the public interface, and hence it wasn't immediately
> > > > > > clear whether using one of the unused bits would be an option, but
> > > > > > it looks like it is).
> > > > > Yes, it's not pretty... Last used bit is 15, hence bit 16 could be
> > > > > used to signal to Xen whether the interrupt should be unmasked after
> > > > > binding. I have a half-drafted patch, will finish it now.
> > > > Andreas, could you please give a try to the attached two patches? One
> > > > is for Xen and the other one is for QEMU.
> > > 
> > > Seems to work after I fixed a bug ;-)
> > > 
> > > -gflags |= masked ? 0 : XEN_PT_GFLAGSSHIFT_UNMASKED;
> > > +gflags |= masked ? 0 : (1 << XEN_PT_GFLAGSSHIFT_UNMASKED);
> > > 
> > > Please let Jan and/or others review the patches.
> > 
> > Thanks. I would like to add your tested-by if possible, since I'm not
> > able to trigger this behavior myself.
> > 
> 
> Hmm.. did these patches get merged / acked? 

Yes, both the QEMU and the Xen sides have been merged.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Commit moratorium to staging

2017-11-02 Thread Roger Pau Monné
On Thu, Nov 02, 2017 at 09:20:10AM +, Paul Durrant wrote:
> > -Original Message-
> > From: Roger Pau Monne
> > Sent: 02 November 2017 09:15
> > To: Roger Pau Monne <roger@citrix.com>
> > Cc: Ian Jackson <ian.jack...@citrix.com>; Lars Kurth
> > <lars.ku...@citrix.com>; Wei Liu <wei.l...@citrix.com>; Julien Grall
> > <julien.gr...@linaro.org>; Paul Durrant <paul.durr...@citrix.com>;
> > committ...@xenproject.org; xen-devel <xen-de...@lists.xenproject.org>
> > Subject: Re: [Xen-devel] Commit moratorium to staging
> > 
> > On Wed, Nov 01, 2017 at 04:17:10PM +, Roger Pau Monné wrote:
> > > On Wed, Nov 01, 2017 at 02:07:48PM +, Ian Jackson wrote:
> > > > * Affected hosts differ from unaffected hosts according to cpuid.
> > > >   Roger has repro'd the bug on an unaffected host by masking out
> > > >   certain cpuid bits.  There are 6 implicated bits and he is working
> > > >   to narrow that down.
> > >
> > > I'm currently trying to narrow this down and make sure the above is
> > > accurate.
> > 
> > So I was wrong with this, I guess I've run the tests on the wrong
> > host. Even when masking the different cpuid bits in the guest the
> > tests still succeeds.
> > 
> > AFAICT the test fail or succeed reliably depending on the host
> > hardware. I don't really have many ideas about what to do next, but I
> > think it would be useful to create a manual osstest flight that runs
> > the win16 job in all the different hosts in the colo. I would also
> > capture the normal information that Xen collects after each test (xl
> > info, /proc/cpuid, serial logs...).
> > 
> > Is there anything else not captured by ts-logs-capture that would be
> > interesting in order to help debug the issue?
> 
> Does the shutdown reliably complete prior to migrate and then only fail 
> intermittently after a localhost migrate?

AFAICT yes, but it can also be added to the test in order to be sure.

> It might be useful to know what cpuid info is seen by the guest before and 
> after migrate.

Is there anyway to get that from windows in an automatic way? If not I
could test that with a Debian guest. In fact it might even be a good
thing for Linux based guest to be added to the regular migration tests
in order to make sure cpuid bits don't change across migrations.

> Another datapoint... does the shutdown fail if you insert a delay of a couple 
> of minutes between the migrate and the shutdown?

Sometimes, after a variable number of calls to xl shutdown ... the
guest usually ends up shutting down.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Commit moratorium to staging

2017-11-02 Thread Roger Pau Monné
On Wed, Nov 01, 2017 at 04:17:10PM +, Roger Pau Monné wrote:
> On Wed, Nov 01, 2017 at 02:07:48PM +, Ian Jackson wrote:
> > * Affected hosts differ from unaffected hosts according to cpuid.
> >   Roger has repro'd the bug on an unaffected host by masking out
> >   certain cpuid bits.  There are 6 implicated bits and he is working
> >   to narrow that down.
> 
> I'm currently trying to narrow this down and make sure the above is
> accurate.

So I was wrong with this, I guess I've run the tests on the wrong
host. Even when masking the different cpuid bits in the guest the
tests still succeeds.

AFAICT the test fail or succeed reliably depending on the host
hardware. I don't really have many ideas about what to do next, but I
think it would be useful to create a manual osstest flight that runs
the win16 job in all the different hosts in the colo. I would also
capture the normal information that Xen collects after each test (xl
info, /proc/cpuid, serial logs...).

Is there anything else not captured by ts-logs-capture that would be
interesting in order to help debug the issue?

Regards, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Commit moratorium to staging

2017-11-01 Thread Roger Pau Monné
On Wed, Nov 01, 2017 at 02:07:48PM +, Ian Jackson wrote:
> So, investigations (mostly by Roger, and also a bit of archaeology in
> the osstest db by me) have determined:
> 
> * This bug is 100% reproducible on affected hosts.  The repro is
>   to boot the Windows guest, save/restore it, then migrate it,
>   then shut down.  (This is from an IRL conversation with Roger and
>   may not be 100% accurate.  Roger, please correct me.)

Yes, that's correct AFAICT. The affected hosts works fine if windows
is booted and then shut down (without save/restore or migrations
involved).

> * Affected hosts differ from unaffected hosts according to cpuid.
>   Roger has repro'd the bug on an unaffected host by masking out
>   certain cpuid bits.  There are 6 implicated bits and he is working
>   to narrow that down.

I'm currently trying to narrow this down and make sure the above is
accurate.

> * It seems likely that this is therefore a real bug.  Maybe in Xen and
>   perhaps indeed one that should indeed be a release blocker.
> 
> * But this is not a regresson between master and staging.  It affects
>   many osstest branches apparently equally.
> 
> * This test is, effectively, new: before the osstest change
>   "HostDiskRoot: bump to 20G", these jobs would always fail earlier
>   and the affected step would not be run.
> 
> * The passes we got on various osstest branches before were just
>   because those branches hadn't tested on an affected host yet.  As
>   branches test different hosts, they will stick on affected hosts.
> 
> ISTM that this situation would therefore justify a force push.  We
> have established that this bug is very unlikely to be anything to do
> with the commits currently blocked by the failing pushes.

I agree, this is a bug that's always been present (at least in the
tested branches). It's triggered now because the windows tests
have made further progress.

> Furthermore, the test is not intermittent, so a force push will be
> effective in the following sense: we would only get a "spurious" pass,
> resulting in the relevant osstest branch becoming stuck again, if a
> future test was unlucky and got an unaffected host.  That will happen
> infrequently enough.
> 
> So unless anyone objects (and for xen.git#master, with Julien's
> permission), I intend to force push all affected osstest branches when
> the test report shows the only blockage is ws16 and/or win10 tests
> failing the "guest-stop" step.
> 
> Opinions ?

I agree that a force push is justified. This is bug going to be quite
annoying if osstest decides to tests on non-affected hosts, because
then we will get sporadic success flights.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Commit moratorium to staging

2017-10-31 Thread Roger Pau Monné
On Tue, Oct 31, 2017 at 10:49:35AM +, Julien Grall wrote:
> Hi all,
> 
> Master lags 15 days behind staging due to tests failing reliably on some of
> the hardware in osstest (see [1]).
> 
> At the moment a force push is not feasible because the same tests passes on
> different hardware (see [2]).

I've been looking into this, and I'm afraid I don't yet have a cause
for those issues. I'm going to post what I've found so far, maybe
someone is able to spot something I'm missing.

Since I assumed this was somehow related to the ACPI PM1A_STS/EN
blocks (which is how the power button even gets notified to the OS),
I've added the following instrumentation to the pmtimer.c code:

diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c
index 435647ff1e..051fc46df8 100644
--- a/xen/arch/x86/hvm/pmtimer.c
+++ b/xen/arch/x86/hvm/pmtimer.c
@@ -61,9 +61,15 @@ static void pmt_update_sci(PMTState *s)
 ASSERT(spin_is_locked(>lock));
 
 if ( acpi->pm1a_en & acpi->pm1a_sts & SCI_MASK )
+{
+printk("asserting SCI IRQ\n");
 hvm_isa_irq_assert(s->vcpu->domain, SCI_IRQ, NULL);
+}
 else
+{
+printk("de-asserting SCI IRQ\n");
 hvm_isa_irq_deassert(s->vcpu->domain, SCI_IRQ);
+}
 }
 
 void hvm_acpi_power_button(struct domain *d)
@@ -73,6 +79,7 @@ void hvm_acpi_power_button(struct domain *d)
 if ( !has_vpm(d) )
 return;
 
+printk("hvm_acpi_power_button for d%d\n", d->domain_id);
 spin_lock(>lock);
 d->arch.hvm_domain.acpi.pm1a_sts |= PWRBTN_STS;
 pmt_update_sci(s);
@@ -86,6 +93,7 @@ void hvm_acpi_sleep_button(struct domain *d)
 if ( !has_vpm(d) )
 return;
 
+printk("hvm_acpi_sleep_button for d%d\n", d->domain_id);
 spin_lock(>lock);
 d->arch.hvm_domain.acpi.pm1a_sts |= PWRBTN_STS;
 pmt_update_sci(s);
@@ -170,6 +178,7 @@ static int handle_evt_io(
 
 if ( dir == IOREQ_WRITE )
 {
+printk("write PM1a addr: %#x val: %#x\n", addr, *val);
 /* Handle this I/O one byte at a time */
 for ( i = bytes, data = *val;
   i > 0;
@@ -197,6 +206,8 @@ static int handle_evt_io(
  bytes, *val, port);
 }
 }
+printk("result pm1a_sts: %#x pm1a_en: %#x\n",
+  acpi->pm1a_sts, acpi->pm1a_en);
 /* Fix up the SCI state to match the new register state */
 pmt_update_sci(s);
 }

I've then rerun the failing test, and this is what I got in the
failure case (ie: windows ignoring the power event):

(XEN) hvm_acpi_power_button for d14
(XEN) asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0x1
(XEN) result pm1a_sts: 0x100 pm1a_en: 0x320
(XEN) asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0x100
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ
(XEN) write PM1a addr: 0x2 val: 0x320
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ

Strangely enough, the second time I've tried the same command (xl
shutdown -wF ...) on the same guest, it succeed and windows shut down
without issues, this is the log in that case:

(XEN) hvm_acpi_power_button for d14
(XEN) asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0x1
(XEN) result pm1a_sts: 0x100 pm1a_en: 0x320
(XEN) asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0x100
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ
(XEN) write PM1a addr: 0x2 val: 0x320
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ
(XEN) write PM1a addr: 0x2 val: 0x320
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ
(XEN) write PM1a addr: 0 val: 0x8000
(XEN) result pm1a_sts: 0 pm1a_en: 0x320
(XEN) de-asserting SCI IRQ

I have to admit I have no idea why Windows clears the STS power bit
and then completely ignores it on certain occasions.

I'm also afraid I have no idea how to debug Windows in order to know
why this event is acknowledged but ignored.

I've also tried to reproduce the same with a Debian guest, by doing
the same amount of save/restores and migrations, and finally issuing a
xl trigger  power, but Debian has always worked fine and
shut down.

Any comments are welcome.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [examine test] 115400: regressions - FAIL

2017-10-31 Thread Roger Pau Monné
On Mon, Oct 30, 2017 at 06:48:46PM +, osstest service owner wrote:
> flight 115400 examine real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/115400/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  examine-elbling0  2 hosts-allocate broken REGR. vs. 
> 115392
>  examine-godello0  4 memdisk-try-append   fail REGR. vs. 
> 115392
> 
> Tests which did not succeed, but are not blocking:
>  examine-godello1  4 memdisk-try-append   fail  like 
> 115392

I've looked at the failures, and the symptom is the same, the loader
seems to receive a key press that stops the autoboot timeout and drops
into the loader prompt:

Oct 30 16:59:55.887295 Hit [Enter] to boot immediately, or any other key for 
command prompt.
Oct 30 16:59:55.895262 
Oct 30 16:59:55.895297 
Oct 30 16:59:55.895324 Type '?' for a list of commands, 'help' for more 
detailed help.
Oct 30 16:59:55.903265 OK

I will look into this later.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next 9/9] coverage: add documentation for LLVM coverage

2017-10-30 Thread Roger Pau Monné
On Mon, Oct 30, 2017 at 04:48:09PM +, Wei Liu wrote:
> On Thu, Oct 26, 2017 at 10:19:38AM +0100, Roger Pau Monne wrote:
> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> > ---
> > Cc: Andrew Cooper <andrew.coop...@citrix.com>
> > Cc: George Dunlap <george.dun...@eu.citrix.com>
> > Cc: Ian Jackson <ian.jack...@eu.citrix.com>
> > Cc: Jan Beulich <jbeul...@suse.com>
> > Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> > Cc: Stefano Stabellini <sstabell...@kernel.org>
> > Cc: Tim Deegan <t...@xen.org>
> > Cc: Wei Liu <wei.l...@citrix.com>
> > ---
> >  docs/misc/coverage.markdown | 47 
> > +
> >  1 file changed, 47 insertions(+)
> > 
> > diff --git a/docs/misc/coverage.markdown b/docs/misc/coverage.markdown
> > index 0a32c48f4b..565644631a 100644
> > --- a/docs/misc/coverage.markdown
> > +++ b/docs/misc/coverage.markdown
> > @@ -8,6 +8,8 @@ information. Every basic block in the code will be 
> > instrumented by the compiler
> >  to compute these statistics. It should not be used in production as it 
> > slows
> >  down your hypervisor.
> >  
> > +# GCOV (GCC coverage)
> > +
> >  ## Enable coverage
> >  
> >  Test coverage support can be turned on compiling Xen with the `coverage` 
> > option set
> > @@ -87,3 +89,48 @@ blob extracted from xencov!**
> >  * See output in a browser
> >  
> >  firefox cov/index.html
> > +
> > +# LLVM coverage
> > +
> > +## Enable coverage
> > +
> > +Coverage can be enabled using a Kconfig option, from the top-level 
> > directory
> > +use the following command to display the Kconfig menu:
> > +
> > +gmake -C xen menuconfig clang=y
> > +
> > +The LLVM coverage option can be found inside of the "Debugging Options"
> > +section. After enabling it just compile Xen as you would normally do:
> > +
> > +   gmake xen clang=y
> > +
> 
> It can be a bit confusing when make and gmake appear in the same
> document. I suggest you use make all over and add a footnote saying it
> should be gmake on FreeBSD.

This is my fault from copying the runes from my command line. I'm not
sure a footnote is even needed, everything on BSDs should use gmake
instead of make, and I don't plan to edit all the documentation files.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next 5/9] coverage: introduce generic file

2017-10-30 Thread Roger Pau Monné
On Mon, Oct 30, 2017 at 04:48:21PM +, Wei Liu wrote:
> On Thu, Oct 26, 2017 at 10:19:34AM +0100, Roger Pau Monne wrote:
> > 
> > diff --git a/xen/common/coverage/Makefile b/xen/common/coverage/Makefile
> > index f68d050eca..0e0510679e 100644
> > --- a/xen/common/coverage/Makefile
> > +++ b/xen/common/coverage/Makefile
> > @@ -1,4 +1,4 @@
> > -obj-y += gcov_base.o gcov.o
> > +obj-y += gcov_base.o gcov.o coverage.o
> >  obj-$(CONFIG_GCOV_FORMAT_3_4) += gcc_3_4.o
> >  obj-$(CONFIG_GCOV_FORMAT_4_7) += gcc_4_7.o
> >  obj-$(CONFIG_GCOV_FORMAT_4_9) += gcc_4_9.o
> > diff --git a/xen/common/coverage/coverage.c b/xen/common/coverage/coverage.c
> > new file mode 100644
> > index 00..1dec6944be
> > --- /dev/null
> > +++ b/xen/common/coverage/coverage.c
> > @@ -0,0 +1,71 @@
> > +/*
> > + * Generic functionality for coverage analysis.
> > + *
> > + * Copyright (C) 2017 Citrix Systems R
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms and conditions of the GNU General Public
> > + * License, version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public
> > + * License along with this program; If not, see 
> > .
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> 
> Please sort this.

OK, I will have to include type.h in coverage.h then.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 2/29] VIOMMU: Add vIOMMU helper functions to create, destroy vIOMMU instance

2017-10-30 Thread Roger Pau Monné
On Mon, Oct 30, 2017 at 09:41:23AM +0800, Lan Tianyu wrote:
> On 2017年10月25日 09:43, Lan Tianyu wrote:
> >> For all platforms supporting HVM, for PV I don't think it makes sense.
> >> > Since AFAIK ARM guest type is also HVM I would rather introduce this
> >> > field in the hvm_domain structure rather than the generic domain
> >> > structure.
> >> > 
> > This sounds reasonable.
> > 
> >> > You might want to wait for feedback from others regarding this issue.
> >> > 
> > I discussed with Julien before. He hoped no to add viommu code for ARM
> > first.So struct hvm_domain seems to be better place since it's arch
> > specific definition and only add struct viommu for struct hvm_domain of x86.
> 
> Hi Roger:
>   If PV guest needs PV IOMMU support, struct iommu should be put  into
> struct domain and it can be reused by full-virtualization and PV iommu.
> Malcolm Crossley sent out RFC patch of pv iommu before. I found it also
> needs to change struct domain.
> 
> https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg01441.html

This patch series is from February 2016: almost two years old and
there's been no further repost.

If this can indeed be shared with a future pv-iommu work, have you
checked whether the current structure data and hooks would be
suitable for a pv-iommu implementation?

I would rather prefer to move the viommu structure from hvm_domain to
the generic domain struct when it's actually needed (ie: when pv-iommu
is implemented) rather than doing it here.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [linux-linus test] 115244: regressions - FAIL

2017-10-27 Thread Roger Pau Monné
On Fri, Oct 27, 2017 at 01:46:24AM +, osstest service owner wrote:
> flight 115244 linux-linus real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/115244/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop   fail REGR. vs. 
> 114682

This seems to be the failure that we have been seeing in the Mass colo
also.

>  test-amd64-i386-libvirt-qcow2 10 debian-di-install   fail REGR. vs. 
> 114682

This is due to a network failure AFAICT:

Configure the package manager
-

!! ERROR: Cannot access repository

The repository on ftp.debian.org couldn't be accessed, so its updates will not 
be made available to you at this time. You should investigate this later.

Commented out entries for ftp.debian.org have been added to the 
/etc/apt/sources.list file.
[Press enter to continue] 

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Is that possible to merge MBA into Xen 4.10?

2017-10-26 Thread Roger Pau Monné
On Tue, Oct 24, 2017 at 10:10:14AM +0800, Yi Sun wrote:
> Hi, all,
> 
> As you may know, MBA patch set has got enough Reviewed-by/Acked-by in last 
> week.
> It is ready to be merged. 
> 
> This is a feature for Skylake, Intel has launched Skylake and KVM already
> supported MBA, so including it in Xen 4.10 will quickly fill this gap.
> 
> MBA missed the 4.10 feature freeze date for only a few days due to lack of
> timely review for earlier versions which slowed down the patch iteration 
> notably.
> It seems maintainers are very busy recently so that the review progress for 
> 4.10
> is slower than before. So I am wondering if it is possible to merge it into 
> 4.10?
> 
> This patch set mainly touches codes related to PSR in 
> tools/domctl/sysctl/hypervisor.
> It does not touch other features. So, the risk is low to merge it.

I agree that the risk is low, code is limited to PSR related bits, and
IIRC doesn't touch common code.

The main risk here would be breaking other Intel PSR features already
committed, but I guess Intel has made sure this new feature doesn't
break existing functionality.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next 6/9] kconfig: add llvm coverage option

2017-10-26 Thread Roger Pau Monné
On Thu, Oct 26, 2017 at 11:03:13AM +0100, Wei Liu wrote:
> On Thu, Oct 26, 2017 at 10:19:35AM +0100, Roger Pau Monne wrote:
> >  config GCOV
> > bool "Gcov Support"
> > depends on !LIVEPATCH
> 
> && !LLVM_COVERAGE

That was my idea, but sadly that's not possible because you generate a
circular dependency. The best solution that I found is to only mark
one as exclusive (ie: have depends !GCOV in the LLVM_COVERAGE option
below).

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10] gcov: fix typo in documentation

2017-10-26 Thread Roger Pau Monné
On Thu, Oct 26, 2017 at 02:44:22AM -0600, Jan Beulich wrote:
> >>> On 26.10.17 at 10:34,  wrote:
> > --- a/docs/misc/coverage.markdown
> > +++ b/docs/misc/coverage.markdown
> > @@ -1,6 +1,6 @@
> >  # Coverage support for Xen
> >  
> > -Coverare support allow you to get coverage information from Xen execution.
> > +Coverage support allow you to get coverage information from Xen execution.
> 
> How about also adding the missing 's' to "allow" at the same time?

Didn't catch that one, thanks.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 20/29] VIOMMU: Add get irq info callback to convert irq remapping request

2017-10-25 Thread Roger Pau Monné
On Wed, Oct 25, 2017 at 03:30:39PM +0800, Lan Tianyu wrote:
> On 2017年10月19日 23:42, Roger Pau Monné wrote:
> > On Thu, Sep 21, 2017 at 11:02:01PM -0400, Lan Tianyu wrote:
> > 
> >>  
> >>  struct viommu_ops {
> >> @@ -28,6 +29,9 @@ struct viommu_ops {
> >>  int (*destroy)(struct viommu *viommu);
> >>  int (*handle_irq_request)(struct domain *d,
> >>struct arch_irq_remapping_request *request);
> >> +int (*get_irq_info)(struct domain *d,
> >> +struct arch_irq_remapping_request *request,
> > 
> > AFAICT d and request should be constified.
> 
> Did you mean to keep d and request in the same line? This will exceed 80
> chars.

No, I meant that the parameters of the function should be "const struct
domain *d" and "const struct arch_irq_remapping_request *request".
AFAICT you should never modify them inside of get_irq_info.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v12 00/33] osstest: FreeBSD host support

2017-10-23 Thread Roger Pau Monné
On Fri, Oct 20, 2017 at 04:32:44PM +0100, Ian Jackson wrote:
> We have decided:
> 
>  We will push the anoint and examine parts of this series to osstest
>  pretest.  (You're going to give me a suitable branch on Monday.)
>  This should work because we have anointed FreeBSD builds already.

Sorry for the delay, had to cherry-pick some commits from the FreeBSD
host install series in order for the examine one to work. I've pushed
this to the following branch:

git://xenbits.xen.org/people/royger/osstest.git examine

Here is the output of a sample examine flight with the contents of the
branch:

http://osstest.xs.citrite.net/~osstest/testlogs/logs/72345/

Note that patch "ts-freebsd-host-install: add arguments to test
memdisk append options" is missing an Ack (you requested changes to
it).

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 28/29] x86/vvtd: Add queued invalidation (QI) support

2017-10-23 Thread Roger Pau Monné
On Mon, Oct 23, 2017 at 03:50:24PM +0800, Chao Gao wrote:
> On Fri, Oct 20, 2017 at 12:20:06PM +0100, Roger Pau Monné wrote:
> >On Thu, Sep 21, 2017 at 11:02:09PM -0400, Lan Tianyu wrote:
> >> From: Chao Gao <chao@intel.com>
> >> +}
> >> +
> >> +unmap_guest_page((void*)qinval_page);
> >> +return ret;
> >> +
> >> + error:
> >> +unmap_guest_page((void*)qinval_page);
> >> +gdprintk(XENLOG_ERR, "Internal error in Queue Invalidation.\n");
> >> +domain_crash(vvtd->domain);
> >
> >Do you really need to crash the domain in such case?
> 
> We reach here when guest requests some operations vvtd doesn't claim
> supported or emulated. I am afraid it also can be triggered by guest.
> How about ignoring the invalidation request?

What would real hardware do in such case?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10] docs: update coverage.markdown

2017-10-20 Thread Roger Pau Monné
On Fri, Oct 20, 2017 at 05:30:41PM +0100, Wei Liu wrote:
> The coverage support in hypervisor is redone. Update the document.
> 
> Signed-off-by: Wei Liu <wei.l...@citrix.com>

Adding Julien, although I'm not sure if doc changes also need a
release-ack.

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks!

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 00/29]

2017-10-20 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:41PM -0400, Lan Tianyu wrote:
> Change since v2:
>1) Remove vIOMMU hypercall of query capabilities and introduce when 
> necessary.
>2) Remove length field of vIOMMU create parameter of vIOMMU hypercall
>3) Introduce irq remapping mode callback to vIOMMU framework and 
> vIOMMU device models
> can check irq remapping mode by vendor specific ways.
>4) Update vIOMMU docs.
>5) Other changes please see patches' change logs.
> 
> Change since v1:
>1) Fix coding style issues
>2) Add definitions for vIOMMU type and capabilities
>3) Change vIOMMU kconfig and select vIOMMU default on x86
>4) Put vIOMMU creation in libxl__arch_domain_create()
>5) Make vIOMMU structure of tool stack more general for both PV and 
> HVM.
> 
> Change since RFC v2:
>1) Move vvtd.c to drivers/passthrough/vtd directroy. 
>2) Make vIOMMU always built in on x86
>3) Add new boot cmd "viommu" to enable viommu function
>4) Fix some code stype issues.
> 
> Change since RFC v1:
>1) Add Xen virtual IOMMU doc docs/misc/viommu.txt
>2) Move vIOMMU hypercall of create/destroy vIOMMU and query  
> capabilities from dmop to domctl suggested by Paul Durrant. Because
> these hypercalls can be done in tool stack and more VM mode(E,G PVH
> or other modes don't use Qemu) can be benefit.
>3) Add check of input MMIO address and length.
>4) Add iommu_type in vIOMMU hypercall parameter to specify
> vendor vIOMMU device model(E,G Intel VTD, AMD or ARM IOMMU. So far
> only support Intel VTD).
>5) Add save and restore support for vvtd
> 
> 
> This patchset is to introduce vIOMMU framework and add virtual VTD's
> interrupt remapping support according "Xen virtual IOMMU high level
> design doc V3"(https://lists.xenproject.org/archives/html/xen-devel/
> 2016-11/msg01391.html).
> 
> - vIOMMU framework
> New framework provides viommu_ops and help functions to abstract
> vIOMMU operations(E,G create, destroy, handle irq remapping request
> and so on). Vendors(Intel, ARM, AMD and son) can implement their
> vIOMMU callbacks.
> 
> - Virtual VTD
> We enable irq remapping function and covers both
> MSI and IOAPIC interrupts. Don't support post interrupt mode emulation
> and post interrupt mode enabled on host with virtual VTD. will add
> later.

Hello,

Just a couple of generic comments on the whole series:

 - Please make sure that the result after each patch is buildable. It
   is of extreme importance the that Xen tree is bisectable at all
   points.

 - Regarding the organization of the series, I would rather prefer
   that you place the design document at the beginning (like it's done
   now), then the hypervisor changes (possibly the generic framework
   first, then the vvtd functionality and finally all the hooks into
   common code) and the toolstack side at the end. This might be just
   my personal taste, but I think it's clearer to review/understand
   rather than mixed as it is now.

 - Finally, please try to make sure that each patch introduces the
   helpers or structures that it needs. For example don't place all
   the "static inline" helpers together with a bunch of structures in
   an isolated patch, and then a bunch of patches that start making
   use of them. Instead introduce the structures or helpers in the
   context when they are used. An exception of this might be for very
   big or generic structures.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 29/29] x86/vvtd: save and restore emulated VT-d

2017-10-20 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:02:10PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> Provide a save-restore pair to save/restore registers and non-register
> status.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
> v3:
>  - use one entry to save both vvtd registers and other intermediate
>  state
> ---
>  xen/drivers/passthrough/vtd/vvtd.c | 66 
> ++
>  xen/include/public/arch-x86/hvm/save.h | 25 -
>  2 files changed, 76 insertions(+), 15 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> index 668d0c9..2aecd93 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -28,11 +28,13 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "iommu.h"
>  #include "vtd.h"
> @@ -40,20 +42,6 @@
>  /* Supported capabilities by vvtd */
>  unsigned int vvtd_caps = VIOMMU_CAP_IRQ_REMAPPING;
>  
> -struct hvm_hw_vvtd_status {
> -uint32_t eim_enabled : 1,
> - intremap_enabled : 1;
> -uint32_t fault_index;
> -uint32_t irt_max_entry;
> -/* Interrupt remapping table base gfn */
> -uint64_t irt;
> -};
> -
> -union hvm_hw_vvtd_regs {
> -uint32_t data32[256];
> -uint64_t data64[128];
> -};
> -
>  struct vvtd {
>  /* Address range of remapping hardware register-set */
>  uint64_t base_addr;
> @@ -1057,6 +1045,56 @@ static bool vvtd_is_remapping(struct domain *d,
>  return 0;
>  }
>  
> +static int vvtd_load(struct domain *d, hvm_domain_context_t *h)
> +{
> +struct hvm_hw_vvtd *hw_vvtd;
> +
> +if ( !domain_vvtd(d) )
> +return -ENODEV;
> +
> +hw_vvtd = xmalloc(struct hvm_hw_vvtd);
> +if ( !hw_vvtd )
> +return -ENOMEM;
> +
> +if ( hvm_load_entry(VVTD, h, hw_vvtd) )
> +{
> +xfree(hw_vvtd);
> +return -EINVAL;
> +}

If you declare hvm_hw_vvtd_regs as a field inside of
hvm_hw_vvtd_status you won't need to do this alloc + memcpy, bnecause
you could directly load it to domain_vvtd.

In any case, I think the code here is going to change due to all the
other comments on the previous patches.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 28/29] x86/vvtd: Add queued invalidation (QI) support

2017-10-20 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:02:09PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> Queued Invalidation Interface is an expanded invalidation interface with
> extended capabilities. Hardware implementations report support for queued
> invalidation interface through the Extended Capability Register. The queued
> invalidation interface uses an Invalidation Queue (IQ), which is a circular
> buffer in system memory. Software submits commands by writing Invalidation
> Descriptors to the IQ.
> 
> In this patch, a new function viommu_process_iq() is used for emulating how
> hardware handles invalidation requests through QI.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/drivers/passthrough/vtd/iommu.h |  19 ++-
>  xen/drivers/passthrough/vtd/vvtd.c  | 232 
> 
>  2 files changed, 250 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h 
> b/xen/drivers/passthrough/vtd/iommu.h
> index c69cd21..c2b83f1 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -177,6 +177,21 @@
>  #define DMA_IRTA_S(val) (val & 0xf)
>  #define DMA_IRTA_SIZE(val)  (1UL << (DMA_IRTA_S(val) + 1))
>  
> +/* IQA_REG */
> +#define DMA_IQA_ADDR(val)   (val & ~0xfffULL)
> +#define DMA_IQA_QS(val) (val & 0x7)
> +#define DMA_IQA_RSVD0xff8ULL
> +
> +/* IECTL_REG */
> +#define DMA_IECTL_IM_SHIFT 31
> +#define DMA_IECTL_IM(1 << DMA_IECTL_IM_SHIFT)

Isn't this undefined behavior? It should be 1u.

You should consider using u for all the defines added.

> +#define DMA_IECTL_IP_SHIFT 30
> +#define DMA_IECTL_IP(1 << DMA_IECTL_IP_SHIFT)
> +
> +/* ICS_REG */
> +#define DMA_ICS_IWC_SHIFT   0
> +#define DMA_ICS_IWC (1 << DMA_ICS_IWC_SHIFT)
> +
>  /* PMEN_REG */
>  #define DMA_PMEN_EPM(((u32)1) << 31)
>  #define DMA_PMEN_PRS(((u32)1) << 0)
> @@ -211,7 +226,8 @@
>  #define DMA_FSTS_PPF (1U << DMA_FSTS_PPF_SHIFT)
>  #define DMA_FSTS_AFO (1U << 2)
>  #define DMA_FSTS_APF (1U << 3)
> -#define DMA_FSTS_IQE (1U << 4)
> +#define DMA_FSTS_IQE_SHIFT 4
> +#define DMA_FSTS_IQE (1U << DMA_FSTS_IQE_SHIFT)
>  #define DMA_FSTS_ICE (1U << 5)
>  #define DMA_FSTS_ITE (1U << 6)
>  #define DMA_FSTS_PRO_SHIFT 7
> @@ -562,6 +578,7 @@ struct qinval_entry {
>  
>  /* Queue invalidation head/tail shift */
>  #define QINVAL_INDEX_SHIFT 4
> +#define QINVAL_INDEX_MASK  0x7fff0ULL
>  
>  #define qinval_present(v) ((v).lo & 1)
>  #define qinval_fault_disable(v) (((v).lo >> 1) & 1)
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> index 55f7a46..668d0c9 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -419,6 +420,177 @@ static int vvtd_record_fault(struct vvtd *vvtd,
>  return X86EMUL_OKAY;
>  }
>  
> +/*
> + * Process a invalidation descriptor. Currently, only two types descriptors,
> + * Interrupt Entry Cache Invalidation Descritor and Invalidation Wait
> + * Descriptor are handled.
> + * @vvtd: the virtual vtd instance
> + * @i: the index of the invalidation descriptor to be processed
> + *
> + * If success return 0, or return non-zero when failure.
> + */
> +static int process_iqe(struct vvtd *vvtd, int i)

unsigned int.

> +{
> +uint64_t iqa;
> +struct qinval_entry *qinval_page;
> +int ret = 0;
> +
> +iqa = vvtd_get_reg_quad(vvtd, DMAR_IQA_REG);
> +qinval_page = map_guest_page(vvtd->domain, 
> DMA_IQA_ADDR(iqa)>>PAGE_SHIFT);

PFN_DOWN instead of open coding the shift. Both can be initialized
at declaration. Also AFAICT iqa is only used once, so the local
variable is not needed.

> +if ( IS_ERR(qinval_page) )
> +{
> +gdprintk(XENLOG_ERR, "Can't map guest IRT (rc %ld)",
> + PTR_ERR(qinval_page));
> +return PTR_ERR(qinval_page);
> +}
> +
> +switch ( qinval_page[i].q.inv_wait_dsc.lo.type )
> +{
> +case TYPE_INVAL_WAIT:
> +if ( qinval_page[i].q.inv_wait_dsc.lo.sw )
> +{
> +uint32_t data = qinval_page[i].q.inv_wait_dsc.lo.sdata;
> +uint64_t addr = (qinval_page[i].q.inv_wait_dsc.hi.saddr << 2);

Unneeded parentheses.

> +
> +ret = hvm_copy_to_guest_phys(addr, , sizeof(data), current);
> +if ( ret )
> +vvtd_info("Failed to write status address");

Don't you need to return or do something here? (like raise some kind
of error?)

> +}
> +
> +/*
> + * The following code generates an invalidation completion event
> + * indicating the invalidation wait descriptor completion. Note that
> + * the following code fragment is not tested properly.
> + */
> +if ( qinval_page[i].q.inv_wait_dsc.lo.iflag )
> +{
> 

Re: [Xen-devel] [PATCH V3 27/29] x86/vvtd: Enable Queued Invalidation through GCMD

2017-10-20 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:02:08PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> Software writes to QIE field of GCMD to enable or disable queued
> invalidations. This patch emulates QIE field of GCMD.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/drivers/passthrough/vtd/iommu.h |  3 ++-
>  xen/drivers/passthrough/vtd/vvtd.c  | 17 +
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h 
> b/xen/drivers/passthrough/vtd/iommu.h
> index e19b045..c69cd21 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -162,7 +162,8 @@
>  #define DMA_GSTS_FLS(((u64)1) << 29)
>  #define DMA_GSTS_AFLS   (((u64)1) << 28)
>  #define DMA_GSTS_WBFS   (((u64)1) << 27)
> -#define DMA_GSTS_QIES   (((u64)1) <<26)
> +#define DMA_GSTS_QIES_SHIFT 26
> +#define DMA_GSTS_QIES   (((u64)1) << DMA_GSTS_QIES_SHIFT)
>  #define DMA_GSTS_IRES_SHIFT 25
>  #define DMA_GSTS_IRES   (((u64)1) << DMA_GSTS_IRES_SHIFT)
>  #define DMA_GSTS_SIRTPS_SHIFT   24
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> index 745941c..55f7a46 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -496,6 +496,19 @@ static void vvtd_handle_gcmd_ire(struct vvtd *vvtd, 
> uint32_t val)
>  }
>  }
>  
> +static void vvtd_handle_gcmd_qie(struct vvtd *vvtd, uint32_t val)

I would use 'write' intead of 'handle', since this is only used by the
write path.

Also you should consider dropping the vvtd prefixes from the static
functions. It's quite clear they are vvtd related, and since they are
static there's no need to add such a prefix.

> +{
> +vvtd_info("%sable Queue Invalidation", (val & DMA_GCMD_QIE) ? "En" : 
> "Dis");
> +
> +if ( val & DMA_GCMD_QIE )
> +vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_QIES_SHIFT);
> +else
> +{
> +vvtd_set_reg_quad(vvtd, DMAR_IQH_REG, 0);
> +vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_QIES_SHIFT);
> +}
> +}

Since I've seen this pattern in other functions, it might be worth
adding a helper that does:

VVTD_SET_BIT(reg, bit, val)
{
if ( val )
set_bit(...);
else
clear_bit(...);
}

Then the above function could be reduced to:

VVTD_SET_BIT(reg, bit, val);
if ( !val )
vvtd_set_reg_quad(vvtd, DMAR_IQH_REG, 0);

I expect other functions can also be simplified by this macro.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10] string: fix memmove when size is 0

2017-10-20 Thread Roger Pau Monné
On Fri, Oct 20, 2017 at 01:17:40AM -0600, Jan Beulich wrote:
> >>> On 17.10.17 at 14:03,  wrote:
> > --- a/xen/arch/x86/string.c
> > +++ b/xen/arch/x86/string.c
> > @@ -39,6 +39,9 @@ void *(memmove)(void *dest, const void *src, size_t n)
> >  {
> >  long d0, d1, d2;
> >  
> > +if ( !n )
> > +return;
> 
> Actually - I can't see how this would build successfully: The function
> returns void *, not void. I'm taking the liberty to fix this (and also
> add unlikely()) while committing.

Thanks and sorry. I tested this with clang 5.0 + ubsan enabled, but I
have no idea why/how that compiled.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 26/29] x86/vvtd: Handle interrupt translation faults

2017-10-20 Thread Roger Pau Monné
On Fri, Oct 20, 2017 at 01:54:15PM +0800, Chao Gao wrote:
> On Thu, Oct 19, 2017 at 05:31:37PM +0100, Roger Pau Monné wrote:
> >On Thu, Sep 21, 2017 at 11:02:07PM -0400, Lan Tianyu wrote:
> >> +static int vvtd_alloc_frcd(struct vvtd *vvtd)
> >> +{
> >> +int prev;
> >> +uint64_t cap = vvtd_get_reg(vvtd, DMAR_CAP_REG);
> >> +unsigned int base = cap_fault_reg_offset(cap);
> >> +
> >> +/* Set the F bit to indicate the FRCD is in use. */
> >> +if ( !vvtd_test_and_set_bit(vvtd,
> >> +base + vvtd->status.fault_index * 
> >> DMA_FRCD_LEN +
> >> +DMA_FRCD3_OFFSET, DMA_FRCD_F_SHIFT) )
> >> +{
> >> +prev = vvtd->status.fault_index;
> >> +vvtd->status.fault_index = (prev + 1) % cap_num_fault_regs(cap);
> >> +return vvtd->status.fault_index;
> >
> >I would prefer that you return the index as an unsigned int parameter
> >passed by reference rather than as the return value of the function,
> >but that might not be the preference of others.
> 
> What are the pros and cons?

I personally don't like return values that have different meanings
depending on it's sign. Here < 0 means error, while >= 0 is used to
deliver some information.

What I didn't like here specifically (apart from the rant above) is
that I would prefer index to be unsigned int, but I'm not sure that's
enough to ask you to change the function prototype. Just leave it
as-is unless someone else complains.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 15/29] x86/vvtd: Process interrupt remapping request

2017-10-20 Thread Roger Pau Monné
On Fri, Oct 20, 2017 at 01:16:37PM +0800, Chao Gao wrote:
> On Thu, Oct 19, 2017 at 03:26:30PM +0100, Roger Pau Monné wrote:
> >On Thu, Sep 21, 2017 at 11:01:56PM -0400, Lan Tianyu wrote:
> >> +static void unmap_guest_page(void *virt)
> >> +{
> >> +struct page_info *page;
> >> +
> >> +ASSERT((unsigned long)virt & PAGE_MASK);
> >
> >I'm not sure I get the point of the check above.
> 
> I intended to check the address is 4K-page aligned. It should be
> 
> ASSERT(!((unsigned long)virt & (PAGE_SIZE - 1)))

Please use the IS_ALIGNED macro.

> >
> >> +}
> >> +return;
> >> +}
> >> +
> >> +static bool vvtd_irq_request_sanity_check(const struct vvtd *vvtd,
> >> +  struct 
> >> arch_irq_remapping_request *irq)
> >> +{
> >> +if ( irq->type == VIOMMU_REQUEST_IRQ_APIC )
> >> +{
> >> +struct IO_APIC_route_remap_entry rte = { .val = irq->msg.rte };
> >> +
> >> +ASSERT(rte.format);
> >
> >Is it fine to ASSERT here? Can't the guest set rte.format to whatever
> >it wants?
> 
> Guest can use legacy format interrupt (i.e. rte.format = 0). However,
> we only reach here when callback 'check_irq_remapping' return true and
> for vvtd, 'check_irq_remapping' just returns the format bit of irq request.
> If here ret.format isn't true, there must be a bug in our code.

Are you sure the correct locks are hold here to prevent the guest
from changing rte while all this processing is happening?

> >> +vvtd_handle_fault(vvtd, irq, NULL, VTD_FR_IR_REQ_RSVD, 
> >> record_fault);
> >> +return -EINVAL;
> >> +}
> >> +
> >> +if ( entry > vvtd->status.irt_max_entry )
> >> +{
> >> +vvtd_handle_fault(vvtd, irq, NULL, VTD_FR_IR_INDEX_OVER, 
> >> record_fault);
> >> +return -EACCES;
> >> +}
> >> +
> >> +irt_page = map_guest_page(vvtd->domain,
> >> +  vvtd->status.irt + (entry >> 
> >> IREMAP_ENTRY_ORDER));
> >
> >Since AFAICT you have to read this page(s) every time an interrupt
> >needs to be delivered, wouldn't it make sense for performance reasons
> >to have the page permanently mapped?
> 
> Yes. It is. Actually, we have a draft patch to do this. But to justify
> the necessity, I should run some benchmark at first. Mapping a guest
> page is slow on x86, right?

The issue is the tblflush, not the actual modifications of the page
tables.

> >
> >What's the maximum number of pages that can be used here?
> 
> VT-d current support 2^16 entries at most. The size of each entry is 128
> byte. Thus, we need 2^11 pages at most.

Those are guest pages at the end, so it shouldn't be a problem.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 12/29] x86/vvtd: Add MMIO handler for VVTD

2017-10-20 Thread Roger Pau Monné
On Fri, Oct 20, 2017 at 10:58:32AM +0800, Chao Gao wrote:
> On Thu, Oct 19, 2017 at 12:34:54PM +0100, Roger Pau Monné wrote:
> >On Thu, Sep 21, 2017 at 11:01:53PM -0400, Lan Tianyu wrote:
> >> +return 0;
> >> +}
> >> +
> >> +static int vvtd_read(struct vcpu *v, unsigned long addr,
> >> + unsigned int len, unsigned long *pval)
> >> +{
> >> +struct vvtd *vvtd = domain_vvtd(v->domain);
> >> +unsigned int offset = addr - vvtd->base_addr;
> >> +
> >> +vvtd_info("Read offset %x len %d\n", offset, len);
> >> +
> >> +if ( (len != 4 && len != 8) || (offset & (len - 1)) )
> >
> >What value does hardware return when performing unaligned reads or
> >reads with wrong size?
> 
> According to VT-d spec section 10.2, "Software must access 64-bit and
> 128-bit registers as either aligned quadwords or aligned doublewords".
> I am afraid there is no specific hardware action for unaligned access
> information. We can treat it as undefined? Then do nothing.
> But I did see windows driver has such accesses. We need to add a
> workaround for windows later.

I would recommend that you do *pval = ~0ul; in that case then.

> >
> >Here you return with pval not set, which is dangerous.
> 
> Indeed. But I need check whether the pval is initialized by the caller.
> If that, it is safe.

Yes, this was recently fixed as part of a XSA, but I would rather
prefer to set pval here in the error case.

> >
> >> +return X86EMUL_OKAY;
> >> +
> >> +if ( len == 4 )
> >> +*pval = vvtd_get_reg(vvtd, offset);
> >> +else
> >> +*pval = vvtd_get_reg_quad(vvtd, offset);
> >
> >...yet here you don't check for offset < 1024.
> >
> >> +
> >> +return X86EMUL_OKAY;
> >> +}
> >> +
> >> +static int vvtd_write(struct vcpu *v, unsigned long addr,
> >> +  unsigned int len, unsigned long val)
> >> +{
> >> +struct vvtd *vvtd = domain_vvtd(v->domain);
> >> +unsigned int offset = addr - vvtd->base_addr;
> >> +
> >> +vvtd_info("Write offset %x len %d val %lx\n", offset, len, val);
> >> +
> >> +if ( (len != 4 && len != 8) || (offset & (len - 1)) )
> >> +return X86EMUL_OKAY;
> >> +
> >> +if ( len == 4 )
> >> +{
> >> +switch ( offset )
> >> +{
> >> +case DMAR_IEDATA_REG:
> >> +case DMAR_IEADDR_REG:
> >> +case DMAR_IEUADDR_REG:
> >> +case DMAR_FEDATA_REG:
> >> +case DMAR_FEADDR_REG:
> >> +case DMAR_FEUADDR_REG:
> >> +vvtd_set_reg(vvtd, offset, val);
> >
> >Hm, so you are using a full page when you only care for 6 4B
> >registers? Seem like quite of a waste of memory.
> 
> Registers are added here when according features are introduced.

Even at the end of the series it doesn't seem like you are adding
support to 256 registers. From the code it seems like you allow writes
to 16 4B registers, and although you allow read access to all the
register space it seems quite dubious that you need 256 registers.
Hence the question about trying to minimize memory usage to what's
really needed.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 26/29] x86/vvtd: Handle interrupt translation faults

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:02:07PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> Interrupt translation faults are non-recoverable fault. When faults
> are triggered, it needs to populate fault info to Fault Recording
> Registers and inject vIOMMU msi interrupt to notify guest IOMMU driver
> to deal with faults.
> 
> This patch emulates hardware's handling interrupt translation
> faults (more information about the process can be found in VT-d spec,
> chipter "Translation Faults", section "Non-Recoverable Fault
> Reporting" and section "Non-Recoverable Logging").
> Specifically, viommu_record_fault() records the fault information and
> viommu_report_non_recoverable_fault() reports faults to software.
> Currently, only Primary Fault Logging is supported and the Number of
> Fault-recording Registers is 1.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/drivers/passthrough/vtd/iommu.h |  60 +++--
>  xen/drivers/passthrough/vtd/vvtd.c  | 252 
> +++-
>  2 files changed, 301 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h 
> b/xen/drivers/passthrough/vtd/iommu.h
> index 790384f..e19b045 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -198,26 +198,66 @@
>  #define DMA_CCMD_CAIG_MASK(x) (((u64)x) & ((u64) 0x3 << 59))
>  
>  /* FECTL_REG */
> -#define DMA_FECTL_IM (((u64)1) << 31)
> +#define DMA_FECTL_IM_SHIFT 31
> +#define DMA_FECTL_IM (1U << DMA_FECTL_IM_SHIFT)
> +#define DMA_FECTL_IP_SHIFT 30
> +#define DMA_FECTL_IP (1U << DMA_FECTL_IP_SHIFT)

Is it fine to change those from uint64_t to unsigned int?

>  
>  /* FSTS_REG */
> -#define DMA_FSTS_PFO ((u64)1 << 0)
> -#define DMA_FSTS_PPF ((u64)1 << 1)
> -#define DMA_FSTS_AFO ((u64)1 << 2)
> -#define DMA_FSTS_APF ((u64)1 << 3)
> -#define DMA_FSTS_IQE ((u64)1 << 4)
> -#define DMA_FSTS_ICE ((u64)1 << 5)
> -#define DMA_FSTS_ITE ((u64)1 << 6)
> -#define DMA_FSTS_FAULTSDMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | 
> DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE
> +#define DMA_FSTS_PFO_SHIFT 0
> +#define DMA_FSTS_PFO (1U << DMA_FSTS_PFO_SHIFT)
> +#define DMA_FSTS_PPF_SHIFT 1
> +#define DMA_FSTS_PPF (1U << DMA_FSTS_PPF_SHIFT)
> +#define DMA_FSTS_AFO (1U << 2)
> +#define DMA_FSTS_APF (1U << 3)
> +#define DMA_FSTS_IQE (1U << 4)
> +#define DMA_FSTS_ICE (1U << 5)
> +#define DMA_FSTS_ITE (1U << 6)

This seemingly non-functional changes should be done in a separate
patch.

> +#define DMA_FSTS_PRO_SHIFT 7
> +#define DMA_FSTS_PRO (1U << DMA_FSTS_PRO_SHIFT)
> +#define DMA_FSTS_FAULTS(DMA_FSTS_PFO | DMA_FSTS_PPF | DMA_FSTS_AFO | \
> +DMA_FSTS_APF | DMA_FSTS_IQE | DMA_FSTS_ICE | \
> +DMA_FSTS_ITE | DMA_FSTS_PRO)
> +#define DMA_FSTS_RW1CS (DMA_FSTS_PFO | DMA_FSTS_AFO | DMA_FSTS_APF | \
> +DMA_FSTS_IQE | DMA_FSTS_ICE | DMA_FSTS_ITE | \
> +DMA_FSTS_PRO)
>  #define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff)
>  
>  /* FRCD_REG, 32 bits access */
> -#define DMA_FRCD_F (((u64)1) << 31)
> +#define DMA_FRCD_LEN0x10
> +#define DMA_FRCD2_OFFSET0x8
> +#define DMA_FRCD3_OFFSET0xc
> +#define DMA_FRCD_F_SHIFT31
> +#define DMA_FRCD_F ((u64)1 << DMA_FRCD_F_SHIFT)
>  #define dma_frcd_type(d) ((d >> 30) & 1)
>  #define dma_frcd_fault_reason(c) (c & 0xff)
>  #define dma_frcd_source_id(c) (c & 0x)
>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
>  
> +struct vtd_fault_record_register
> +{
> +union {
> +struct {
> +uint64_t lo;
> +uint64_t hi;
> +} bits;
> +struct {
> +uint64_t rsvd0  :12,
> + fault_info :52;
> +uint64_t source_id  :16,
> + rsvd1  :9,
> + pmr:1,  /* Privilege Mode Requested */
> + exe:1,  /* Execute Permission Requested */
> + pasid_p:1,  /* PASID Present */
> + fault_reason   :8,  /* Fault Reason */
> + pasid_val  :20, /* PASID Value */
> + addr_type  :2,  /* Address Type */
> + type   :1,  /* Type. (0) Write (1) 
> Read/AtomicOp */
> + fault  :1;  /* Fault */
> +} fields;
> +};
> +};
> +
>  enum VTD_FAULT_TYPE
>  {
>  /* Interrupt remapping transition faults */
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> index bd1cadd..745941c 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -19,6 +19,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -41,6 +42,7 @@ unsigned int vvtd_caps = 

Re: [Xen-devel] [PATCH V3 25/29] x86/vmsi: Hook delivering remapping format msi to guest

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:02:06PM -0400, Lan Tianyu wrote:
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index 6196334..349a8cf 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -942,21 +942,20 @@ static void __msi_pirq_eoi(struct hvm_pirq_dpci 
> *pirq_dpci)
>  static int _hvm_dpci_msi_eoi(struct domain *d,
>   struct hvm_pirq_dpci *pirq_dpci, void *arg)
>  {
> -int vector = (long)arg;
> +uint8_t vector, dlm, vector_target = (long)arg;

Since you are changing this, please cast to (uint8_t) instead.

> +uint32_t dest;
> +bool dm;

Why are you moving dest, dm, dlm and vector here?

>  
> -if ( (pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI) &&
> - (pirq_dpci->gmsi.legacy.gvec == vector) )
> +if ( pirq_dpci->flags & HVM_IRQ_DPCI_MACH_MSI )
>  {
> -unsigned int dest = MASK_EXTR(pirq_dpci->gmsi.legacy.gflags,
> -  XEN_DOMCTL_VMSI_X86_DEST_ID_MASK);
> -bool dest_mode = pirq_dpci->gmsi.legacy.gflags &
> - XEN_DOMCTL_VMSI_X86_DM_MASK;

AFAICT their scope is limited to this if.

> +if ( pirq_dpci_2_msi_attr(d, pirq_dpci, , , , ) )
> +return 0;
>  
> -if ( vlapic_match_dest(vcpu_vlapic(current), NULL, 0, dest,
> -   dest_mode) )
> +if ( vector == vector_target &&
> + vlapic_match_dest(vcpu_vlapic(current), NULL, 0, dest, dm) )
>  {
> -__msi_pirq_eoi(pirq_dpci);
> -return 1;
> +__msi_pirq_eoi(pirq_dpci);
> +return 1;
>  }
>  }
>  
> -- 
> 1.8.3.1
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 24/29] tools/libxc: Add a new interface to bind remapping format msi with pirq

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:02:05PM -0400, Lan Tianyu wrote:
> From: Chao Gao 

The title for this patch it's wrong, it modifies both the hypervisor
and libxc. Please fix it.

> When exposing vIOMMU (vvtd) to guest, guest can configure the msi to
> remapping format. For pass-through device, the physical interrupt now
> can be bound with remapping format msi. This patch introduce a flag,
> HVM_IRQ_DPCI_GUEST_REMAPPED, which indicate a physical interrupt is
> bound with remapping format guest interrupt. Thus, we can use
> (HVM_IRQ_DPCI_GUEST_REMAPPED | HVM_IRQ_DPCI_GUEST_MSI) to show the new
> binding type. Also provide an new interface to manage the new binding.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> 
> ---
> diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
> index bd8a918..4f5d37b 100644
> --- a/xen/include/asm-x86/hvm/irq.h
> +++ b/xen/include/asm-x86/hvm/irq.h
> @@ -121,6 +121,7 @@ struct dev_intx_gsi_link {
>  #define _HVM_IRQ_DPCI_GUEST_PCI_SHIFT   4
>  #define _HVM_IRQ_DPCI_GUEST_MSI_SHIFT   5
>  #define _HVM_IRQ_DPCI_IDENTITY_GSI_SHIFT6
> +#define _HVM_IRQ_DPCI_GUEST_REMAPPED_SHIFT  7
>  #define _HVM_IRQ_DPCI_TRANSLATE_SHIFT  15
>  #define HVM_IRQ_DPCI_MACH_PCI(1u << _HVM_IRQ_DPCI_MACH_PCI_SHIFT)
>  #define HVM_IRQ_DPCI_MACH_MSI(1u << _HVM_IRQ_DPCI_MACH_MSI_SHIFT)
> @@ -128,6 +129,7 @@ struct dev_intx_gsi_link {
>  #define HVM_IRQ_DPCI_EOI_LATCH   (1u << _HVM_IRQ_DPCI_EOI_LATCH_SHIFT)
>  #define HVM_IRQ_DPCI_GUEST_PCI   (1u << _HVM_IRQ_DPCI_GUEST_PCI_SHIFT)
>  #define HVM_IRQ_DPCI_GUEST_MSI   (1u << _HVM_IRQ_DPCI_GUEST_MSI_SHIFT)
> +#define HVM_IRQ_DPCI_GUEST_REMAPPED  (1u << 
> _HVM_IRQ_DPCI_GUEST_REMAPPED_SHIFT)
>  #define HVM_IRQ_DPCI_IDENTITY_GSI(1u << _HVM_IRQ_DPCI_IDENTITY_GSI_SHIFT)
>  #define HVM_IRQ_DPCI_TRANSLATE   (1u << _HVM_IRQ_DPCI_TRANSLATE_SHIFT)

Please keep this sorted. It should go after the _GSI one.

>  
> @@ -137,6 +139,11 @@ struct hvm_gmsi_info {
>  uint32_t gvec;
>  uint32_t gflags;
>  } legacy;
> +struct {
> +uint32_t source_id;
> +uint32_t data;
> +uint64_t addr;
> +} intremap;
>  };
>  int dest_vcpu_id; /* -1 :multi-dest, non-negative: dest_vcpu_id */
>  bool posted; /* directly deliver to guest via VT-d PI? */
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 68854b6..8c59cfc 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -559,6 +559,7 @@ typedef enum pt_irq_type_e {
>  PT_IRQ_TYPE_MSI,
>  PT_IRQ_TYPE_MSI_TRANSLATE,
>  PT_IRQ_TYPE_SPI,/* ARM: valid range 32-1019 */
> +PT_IRQ_TYPE_MSI_IR,

Introducing a new irq type seems dubious, at the end this is still a
MSI interrupt.

>  } pt_irq_type_t;
>  struct xen_domctl_bind_pt_irq {
>  uint32_t machine_irq;
> @@ -586,6 +587,12 @@ struct xen_domctl_bind_pt_irq {
>  uint64_aligned_t gtable;
>  } msi;
>  struct {
> +uint32_t source_id;
> +uint32_t data;
> +uint64_t addr;
> +uint64_t gtable;
> +} msi_ir;

Have you tried to expand gflags somehow so that you don't need a new
type together with a new structure?

It seems quite cumbersome and also involves adding more handlers to
libxc.

At the end this is a domctl interface, so you should be able to modify
it at will.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 22/29] x86/vioapic: extend vioapic_get_vector() to support remapping format RTE

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:02:03PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> When IOAPIC RTE is in remapping format, it doesn't contain the vector of
> interrupt. For this case, the RTE contains an index of interrupt remapping
> table where the vector of interrupt is stored. This patchs gets the vector
> through a vIOMMU interface.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/arch/x86/hvm/vioapic.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
> index 5d0d1cd..9e47ef4 100644
> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -561,11 +561,25 @@ int vioapic_get_vector(const struct domain *d, unsigned 
> int gsi)
>  {
>  unsigned int pin;
>  const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, );
> +struct arch_irq_remapping_request request;
>  
>  if ( !vioapic )
>  return -EINVAL;
>  
> -return vioapic->redirtbl[pin].fields.vector;
> +irq_request_ioapic_fill(, vioapic->id, 
> vioapic->redirtbl[pin].bits);
> +if ( viommu_check_irq_remapping(vioapic->domain, ) )
> +{
> +int err;
> +struct arch_irq_remapping_info info;
> +
> +err = viommu_get_irq_info(vioapic->domain, , );
> +return !err ? info.vector : err;

You can simplify this as return err :? info.vector;

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 21/29] VIOMMU: Introduce callback of checking irq remapping mode

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:02:02PM -0400, Lan Tianyu wrote:
> This patch is to add callback for vIOAPIC and vMSI to check whether interrupt
> remapping is enabled.
> 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/common/viommu.c  | 15 +++
>  xen/include/xen/viommu.h | 10 ++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> index 0708e43..ff95465 100644
> --- a/xen/common/viommu.c
> +++ b/xen/common/viommu.c
> @@ -194,6 +194,21 @@ int viommu_get_irq_info(struct domain *d,
>  return viommu->ops->get_irq_info(d, request, irq_info);
>  }
>  
> +bool viommu_check_irq_remapping(struct domain *d,
> +struct arch_irq_remapping_request *request)

Both should be constified.

> +{
> +struct viommu *viommu = d->viommu;
> +
> +if ( !viommu )
> +return false;
> +
> +ASSERT(viommu->ops);
> +if ( !viommu->ops->check_irq_remapping )
> +return false;
> +
> +return viommu->ops->check_irq_remapping(d, request);

IMHO this helper should be introduced together with the vvtd
implementation of check_irq_remapping.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 20/29] VIOMMU: Add get irq info callback to convert irq remapping request

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:02:01PM -0400, Lan Tianyu wrote:
> This patch is to add get_irq_info callback for platform implementation
> to convert irq remapping request to irq info (E,G vector, dest, dest_mode
> and so on).
> 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/common/viommu.c  | 16 
>  xen/include/asm-x86/viommu.h |  8 
>  xen/include/xen/viommu.h | 14 ++
>  3 files changed, 38 insertions(+)
> 
> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> index b517158..0708e43 100644
> --- a/xen/common/viommu.c
> +++ b/xen/common/viommu.c
> @@ -178,6 +178,22 @@ int viommu_handle_irq_request(struct domain *d,
>  return viommu->ops->handle_irq_request(d, request);
>  }
>  
> +int viommu_get_irq_info(struct domain *d,
> +struct arch_irq_remapping_request *request,
> +struct arch_irq_remapping_info *irq_info)
> +{
> +struct viommu *viommu = d->viommu;
> +
> +if ( !viommu )
> +return -EINVAL;

OK, here there's a check for !viommu. Can we please have this written
down in the header? (ie: which functions are safe/expected to be
called without a viommu)

> +
> +ASSERT(viommu->ops);
> +if ( !viommu->ops->get_irq_info )
> +return -EINVAL;
> +
> +return viommu->ops->get_irq_info(d, request, irq_info);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
> index 366fbb6..586b6bd 100644
> --- a/xen/include/asm-x86/viommu.h
> +++ b/xen/include/asm-x86/viommu.h
> @@ -24,6 +24,14 @@
>  #define VIOMMU_REQUEST_IRQ_MSI  0
>  #define VIOMMU_REQUEST_IRQ_APIC 1
>  
> +struct arch_irq_remapping_info
> +{
> +uint8_t  vector;
> +uint32_t dest;
> +uint32_t dest_mode:1;
> +uint32_t delivery_mode:3;

Why uint32_t for this two last fields? Also please sort them so that
the padding is limited at the end of the structure.

> +};
> +
>  struct arch_irq_remapping_request
>  {
>  union {
> diff --git a/xen/include/xen/viommu.h b/xen/include/xen/viommu.h
> index 230f6b1..beb40cd 100644
> --- a/xen/include/xen/viommu.h
> +++ b/xen/include/xen/viommu.h
> @@ -21,6 +21,7 @@
>  #define __XEN_VIOMMU_H__
>  
>  struct viommu;
> +struct arch_irq_remapping_info;
>  struct arch_irq_remapping_request;

If you include asm/viommu.h in viommu.h you don't need to forward
declarations.

>  
>  struct viommu_ops {
> @@ -28,6 +29,9 @@ struct viommu_ops {
>  int (*destroy)(struct viommu *viommu);
>  int (*handle_irq_request)(struct domain *d,
>struct arch_irq_remapping_request *request);
> +int (*get_irq_info)(struct domain *d,
> +struct arch_irq_remapping_request *request,

AFAICT d and request should be constified.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 19/29] x86/vioapic: Hook interrupt delivery of vIOAPIC

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:02:00PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> When irq remapping is enabled, IOAPIC Redirection Entry may be in remapping
> format. If that, generate an irq_remapping_request and call the common
> VIOMMU abstraction's callback to handle this interrupt request. Device
> model is responsible for checking the request's validity.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> 
> ---
> v3:
>  - use the new interface to check remapping format.
> ---
>  xen/arch/x86/hvm/vioapic.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
> index 72cae93..5d0d1cd 100644
> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -30,6 +30,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -38,6 +39,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

I think asm/viommu should be included by viommu.h.

>  
>  /* HACK: Route IRQ0 only to VCPU0 to prevent time jumps. */
>  #define IRQ0_SPECIAL_ROUTING 1
> @@ -387,9 +389,17 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, 
> unsigned int pin)
>  struct vlapic *target;
>  struct vcpu *v;
>  unsigned int irq = vioapic->base_gsi + pin;
> +struct arch_irq_remapping_request request;
>  
>  ASSERT(spin_is_locked(>arch.hvm_domain.irq_lock));
>  
> +irq_request_ioapic_fill(, vioapic->id, 
> vioapic->redirtbl[pin].bits);

So the macro introduced in the previous patch should instead be
introduced here.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 18/29] VIOMMU: Add irq request callback to deal with irq remapping

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:59PM -0400, Lan Tianyu wrote:
> This patch is to add irq request callback for platform implementation
> to deal with irq remapping request.
> 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/common/viommu.c  | 15 +
>  xen/include/asm-x86/viommu.h | 72 
> 
>  xen/include/xen/viommu.h | 11 +++
>  3 files changed, 98 insertions(+)
>  create mode 100644 xen/include/asm-x86/viommu.h
> 
> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> index 55feb5d..b517158 100644
> --- a/xen/common/viommu.c
> +++ b/xen/common/viommu.c
> @@ -163,6 +163,21 @@ int viommu_domctl(struct domain *d, struct 
> xen_domctl_viommu_op *op,
>  return rc;
>  }
>  
> +int viommu_handle_irq_request(struct domain *d,
> +  struct arch_irq_remapping_request *request)
> +{
> +struct viommu *viommu = d->viommu;
> +
> +if ( !viommu )
> +return -EINVAL;

ENODEV

> +
> +ASSERT(viommu->ops);
> +if ( !viommu->ops->handle_irq_request )
> +return -EINVAL;
> +
> +return viommu->ops->handle_irq_request(d, request);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/asm-x86/viommu.h b/xen/include/asm-x86/viommu.h
> new file mode 100644
> index 000..366fbb6
> --- /dev/null
> +++ b/xen/include/asm-x86/viommu.h
> @@ -0,0 +1,72 @@
> +/*
> + * include/asm-x86/viommu.h
> + *
> + * Copyright (c) 2017 Intel Corporation.
> + * Author: Lan Tianyu 
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program; If not, see .
> + *
> + */
> +#ifndef __ARCH_X86_VIOMMU_H__
> +#define __ARCH_X86_VIOMMU_H__
> +
> +/* IRQ request type */
> +#define VIOMMU_REQUEST_IRQ_MSI  0
> +#define VIOMMU_REQUEST_IRQ_APIC 1
> +
> +struct arch_irq_remapping_request

Oh, so you have been using arch_irq_remapping_request in previous
patches without it being introduced. This is becoming more and more
hard to review. I will try to finish reviewing the whole series but
please, in the future make sure that each patch compiles on it's
own.

It's impossible to properly review a series when you use a structure
that has not yet been introduced.

> +{
> +union {
> +/* MSI */
> +struct {
> +uint64_t addr;
> +uint32_t data;
> +} msi;
> +/* Redirection Entry in IOAPIC */
> +uint64_t rte;
> +} msg;
> +uint16_t source_id;
> +uint8_t type;

Why don't you make this an enum?

> +};
> +
> +static inline void irq_request_ioapic_fill(struct arch_irq_remapping_request 
> *req,
> +   uint32_t ioapic_id, uint64_t rte)
> +{
> +ASSERT(req);
> +req->type = VIOMMU_REQUEST_IRQ_APIC;
> +req->source_id = ioapic_id;
> +req->msg.rte = rte;
> +}
> +
> +static inline void irq_request_msi_fill(struct arch_irq_remapping_request 
> *req,
> +uint32_t source_id, uint64_t addr,
> +uint32_t data)
> +{
> +ASSERT(req);
> +req->type = VIOMMU_REQUEST_IRQ_MSI;
> +req->source_id = source_id;
> +req->msg.msi.addr = addr;
> +req->msg.msi.data = data;
> +}

You are introducing two functions here that are not used in this
patch. They should be added when they are used, or else it's very hard
to review.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 17/29] x86/vvtd: add a helper function to decide the interrupt format

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:58PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> Different platform may use different method to distinguish
> remapping format interrupt and normal format interrupt.
> 
> Intel uses one bit in IOAPIC RTE or MSI address register to
> indicate the interrupt is remapping format. vvtd will handle
> all the interrupts when .check_irq_remapping() return true.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/drivers/passthrough/vtd/vvtd.c | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> index 5e22ace..bd1cadd 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -536,6 +536,28 @@ static int vvtd_get_irq_info(struct domain *d,
>  return 0;
>  }
>  
> +/* Probe whether the interrupt request is an remapping format */
> +static bool vvtd_is_remapping(struct domain *d,
> +  struct arch_irq_remapping_request *irq)
> +{
> +if ( irq->type == VIOMMU_REQUEST_IRQ_APIC )
> +{
> +struct IO_APIC_route_remap_entry rte = { .val = irq->msg.rte };
> +
> +return rte.format;
> +}
> +else if ( irq->type == VIOMMU_REQUEST_IRQ_MSI )
> +{
> +struct msi_msg_remap_entry msi_msg =
> +{ .address_lo = { .val = irq->msg.msi.addr } };
> +
> +return msi_msg.address_lo.format;
> +}
> +ASSERT_UNREACHABLE();

Switch please.

Also there's a bunch of temporary IO_APIC_route_remap_entry and
msi_msg_remap_entry local structures. Why don't you just create some
kind of union in arch_irq_remapping_request so that you don't need to
do this each time?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 16/29] x86/vvtd: decode interrupt attribute from IRTE

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:57PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> Without interrupt remapping, interrupt attributes can be extracted from
> msi message or IOAPIC RTE. However, with interrupt remapping enabled,
> the attributes are enclosed in the associated IRTE. This callback is
> for cases in which the caller wants to acquire interrupt attributes, for
> example:
> 1. vioapic_get_vector(). With vIOMMU, the RTE may don't contain vector.
^ not
> 2. perform EOI which is always based on the interrupt vector.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
> v3:
>  - add example cases in which we will use this function.
> ---
>  xen/drivers/passthrough/vtd/vvtd.c | 23 ++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> index 90c00f5..5e22ace 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -516,6 +516,26 @@ static int vvtd_handle_irq_request(struct domain *d,
>   irte.remap.tm);
>  }
>  
> +static int vvtd_get_irq_info(struct domain *d,
> + struct arch_irq_remapping_request *irq,
> + struct arch_irq_remapping_info *info)
> +{
> +int ret;
> +struct iremap_entry irte;
> +struct vvtd *vvtd = domain_vvtd(d);

I've realized that some of the helpers perform a if (!vvtd ) return
check, while others don't (like this one). Are some handlers expected
to be called without a vIOMMU? If so it would be good to list them
clearly.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Block device hang after migration

2017-10-19 Thread Roger Pau Monné
On Thu, Oct 19, 2017 at 11:53:11AM +0100, Wei Liu wrote:
> Hi
> 
> In the process of upgrading osstest to Stretch, I discovered an issue
> with the block device. This happens after a local migration.
> 
> [  127.216232] Freezing user space processes ... (elapsed 0.005 seconds) done.
> [  127.222143] Freezing remaining freezable tasks ... 
> [  147.228913] Freezing of tasks failed after 20.006 seconds (1 tasks 
> refusing to freeze, wq_busy=0):
> [  147.228935] jbd2/xvda1-8D0   143  2 0x
> [  147.228964]  880005109000  88000569e000 
> 88001f918240
> [  147.228984]  88001ea7a000 c9004029bb30 816038e3 
> c9004029bbe8
> [  147.229001]  00ff8800056d1500 88001f918240  
> 88000569e000
> [  147.229028] Call Trace:
> [  147.229274]  [] ? __schedule+0x233/0x6d0
> [  147.229297]  [] ? bit_wait+0x50/0x50
> [  147.229307]  [] ? schedule+0x32/0x80
> [  147.229318]  [] ? schedule_timeout+0x1de/0x350
> [  147.229345]  [] ? xen_clocksource_get_cycles+0x11/0x20
> [  147.229363]  [] ? ktime_get+0x3b/0xb0
> [  147.229378]  [] ? bit_wait+0x50/0x50
> [  147.229389]  [] ? io_schedule_timeout+0x9d/0x100
> [  147.229401]  [] ? prepare_to_wait+0x57/0x80
> [  147.229417]  [] ? bit_wait_io+0x17/0x60
> [  147.229427]  [] ? __wait_on_bit+0x53/0x80
> [  147.229442]  [] ? bit_wait+0x50/0x50
> [  147.229457]  [] ? out_of_line_wait_on_bit+0x7e/0xa0
> [  147.229469]  [] ? wake_atomic_t_function+0x60/0x60
> [  147.229563]  [] ? 
> jbd2_journal_commit_transaction+0xdd2/0x17a0 [jbd2]
> [  147.229589]  [] ? finish_task_switch+0x7d/0x1f0
> [  147.229612]  [] ? kjournald2+0xc2/0x260 [jbd2]
> [  147.229624]  [] ? prepare_to_wait_event+0xf0/0xf0
> [  147.229643]  [] ? commit_timeout+0x10/0x10 [jbd2]
> [  147.229656]  [] ? kthread+0xd7/0xf0
> [  147.229667]  [] ? kthread_park+0x60/0x60
> [  147.229684]  [] ? ret_from_fork+0x25/0x30
> [  147.229708] Restarting kernel threads ... done.
> [  147.230496] xen:manage: do_suspend: freeze kernel threads failed -16
> [  147.230508] Restarting tasks ... done.
> [  238.484918] 
> 
> http://logs.test-lab.xenproject.org/osstest/logs/114709/test-amd64-amd64-xl-qcow2/fiano1---var-log-xen-console-guest-debian.stretch.guest.osstest--incoming.log
> 
> And this seems to be the same issue Olivier Bonvale reported in "[Xen-devel]
> task btrfs-transacti:651 blocked for more than 120 seconds".

Not really I think, that didn't involve migration IIRC. Oliver was
attaching 26 PV disks, which starved the grant table.

> The guest in osstest uses ext4, with only one or two vbds. Kernel is Debian's
> stock kernel (4.9).

There's been a lot of patches from Juergen and others since 4.9.
osstest is currently using 4.9 also and doesn't seem to complain.

Is there any newer kernel available from backports?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 15/29] x86/vvtd: Process interrupt remapping request

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:56PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> When a remapping interrupt request arrives, remapping hardware computes the
> interrupt_index per the algorithm described in VTD spec
> "Interrupt Remapping Table", interprets the IRTE and generates a remapped
> interrupt request.
> 
> This patch introduces viommu_handle_irq_request() to emulate the process how
> remapping hardware handles a remapping interrupt request.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> 
> ---
> v3:
>  - Encode map_guest_page()'s error into void* to avoid using another parameter
> ---
>  xen/drivers/passthrough/vtd/iommu.h |  21 +++
>  xen/drivers/passthrough/vtd/vvtd.c  | 264 
> +++-
>  2 files changed, 284 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h 
> b/xen/drivers/passthrough/vtd/iommu.h
> index 703726f..790384f 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -218,6 +218,21 @@
>  #define dma_frcd_source_id(c) (c & 0x)
>  #define dma_frcd_page_addr(d) (d & (((u64)-1) << 12)) /* low 64 bit */
>  
> +enum VTD_FAULT_TYPE
> +{
> +/* Interrupt remapping transition faults */
> +VTD_FR_IR_REQ_RSVD  = 0x20, /* One or more IR request reserved
> + * fields set */
> +VTD_FR_IR_INDEX_OVER= 0x21, /* Index value greater than max */
> +VTD_FR_IR_ENTRY_P   = 0x22, /* Present (P) not set in IRTE */
> +VTD_FR_IR_ROOT_INVAL= 0x23, /* IR Root table invalid */
> +VTD_FR_IR_IRTE_RSVD = 0x24, /* IRTE Rsvd field non-zero with
> + * Present flag set */
> +VTD_FR_IR_REQ_COMPAT= 0x25, /* Encountered compatible IR
> + * request while disabled */
> +VTD_FR_IR_SID_ERR   = 0x26, /* Invalid Source-ID */
> +};

Why does this need to be an enum? Plus enum type names should not be
all in uppercase.

In any case, I would just use defines, like it's done for all other
values in the file.

> +
>  /*
>   * 0: Present
>   * 1-11: Reserved
> @@ -358,6 +373,12 @@ struct iremap_entry {
>  };
>  
>  /*
> + * When VT-d doesn't enable Extended Interrupt Mode. Hardware only interprets
> + * only 8-bits ([15:8]) of Destination-ID field in the IRTEs.
> + */
> +#define IRTE_xAPIC_DEST_MASK 0xff00
> +
> +/*
>   * Posted-interrupt descriptor address is 64 bits with 64-byte aligned, only
>   * the upper 26 bits of lest significiant 32 bits is available.
>   */
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> index a0f63e9..90c00f5 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -23,11 +23,17 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> +#include 
>  #include 
> +#include 
>  #include 
> +#include 
> +#include 
>  
>  #include "iommu.h"
> +#include "vtd.h"
>  
>  /* Supported capabilities by vvtd */
>  unsigned int vvtd_caps = VIOMMU_CAP_IRQ_REMAPPING;
> @@ -111,6 +117,132 @@ static inline uint64_t vvtd_get_reg_quad(struct vvtd 
> *vtd, uint32_t reg)
>  return vtd->regs->data64[reg/sizeof(uint64_t)];
>  }
>  
> +static void* map_guest_page(struct domain *d, uint64_t gfn)

gfn_t seems like a better type then uint64_t.

> +{
> +struct page_info *p;
> +void *ret;
> +
> +p = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);

You can do the initialization of p at declaration.

> +if ( !p )
> +return ERR_PTR(-EINVAL);
> +
> +if ( !get_page_type(p, PGT_writable_page) )
> +{
> +put_page(p);
> +return ERR_PTR(-EINVAL);
> +}
> +
> +ret = __map_domain_page_global(p);
> +if ( !ret )
> +{
> +put_page_and_type(p);
> +return ERR_PTR(-ENOMEM);
> +}
> +
> +return ret;
> +}
> +
> +static void unmap_guest_page(void *virt)
> +{
> +struct page_info *page;
> +
> +ASSERT((unsigned long)virt & PAGE_MASK);

I'm not sure I get the point of the check above.

> +page = mfn_to_page(domain_page_map_to_mfn(virt));
> +
> +unmap_domain_page_global(virt);
> +put_page_and_type(page);
> +}
> +
> +static void vvtd_inj_irq(struct vlapic *target, uint8_t vector,
> + uint8_t trig_mode, uint8_t delivery_mode)
> +{
> +vvtd_debug("dest=v%d, delivery_mode=%x vector=%d trig_mode=%d\n",
> +   vlapic_vcpu(target)->vcpu_id, delivery_mode, vector, 
> trig_mode);
> +
> +ASSERT((delivery_mode == dest_Fixed) ||
> +   (delivery_mode == dest_LowestPrio));
> +
> +vlapic_set_irq(target, vector, trig_mode);
> +}
> +
> +static int vvtd_delivery(struct domain *d, uint8_t vector,
> + uint32_t dest, uint8_t dest_mode,
> + uint8_t delivery_mode, uint8_t trig_mode)
> +{
> +struct vlapic *target;
> +struct vcpu 

Re: [Xen-devel] [PATCH V3 14/29] x86/vvtd: Enable Interrupt Remapping through GCMD

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:55PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> Software writes this field to enable/disable interrupt reampping. This patch
> emulate IRES field of GCMD.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/drivers/passthrough/vtd/iommu.h |  3 ++-
>  xen/drivers/passthrough/vtd/vvtd.c  | 30 +-
>  2 files changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h 
> b/xen/drivers/passthrough/vtd/iommu.h
> index a0d5ec8..703726f 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -163,7 +163,8 @@
>  #define DMA_GSTS_AFLS   (((u64)1) << 28)
>  #define DMA_GSTS_WBFS   (((u64)1) << 27)
>  #define DMA_GSTS_QIES   (((u64)1) <<26)
> -#define DMA_GSTS_IRES   (((u64)1) <<25)
> +#define DMA_GSTS_IRES_SHIFT 25
> +#define DMA_GSTS_IRES   (((u64)1) << DMA_GSTS_IRES_SHIFT)
>  #define DMA_GSTS_SIRTPS_SHIFT   24
>  #define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
>  #define DMA_GSTS_CFIS   (((u64)1) <<23)
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> index 6736956..a0f63e9 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -33,7 +33,8 @@
>  unsigned int vvtd_caps = VIOMMU_CAP_IRQ_REMAPPING;
>  
>  struct hvm_hw_vvtd_status {
> -uint32_t eim_enabled : 1;
> +uint32_t eim_enabled : 1,
> + intremap_enabled : 1;

Again please use bool.

>  uint32_t irt_max_entry;
>  /* Interrupt remapping table base gfn */
>  uint64_t irt;
> @@ -84,6 +85,11 @@ static inline void vvtd_set_bit(struct vvtd *vvtd, 
> uint32_t reg, int nr)
>  __set_bit(nr, >regs->data32[reg/sizeof(uint32_t)]);
>  }
>  
> +static inline void vvtd_clear_bit(struct vvtd *vvtd, uint32_t reg, int nr)
> +{
> +__clear_bit(nr, >regs->data32[reg/sizeof(uint32_t)]);
> +}

I'm not sure this functions are helpful, maybe it would be better to
just have a macro to get vvtd->regs->data32[reg/sizeof(uint32_t)]
instead, which seems to be the cumbersome part of the expression above
(and in vvtd_set_bit).

> +
>  static inline void vvtd_set_reg(struct vvtd *vtd, uint32_t reg, uint32_t 
> value)
>  {
>  vtd->regs->data32[reg/sizeof(uint32_t)] = value;
> @@ -105,6 +111,23 @@ static inline uint64_t vvtd_get_reg_quad(struct vvtd 
> *vtd, uint32_t reg)
>  return vtd->regs->data64[reg/sizeof(uint64_t)];
>  }
>  
> +static void vvtd_handle_gcmd_ire(struct vvtd *vvtd, uint32_t val)
> +{
> +vvtd_info("%sable Interrupt Remapping",
> +  (val & DMA_GCMD_IRE) ? "En" : "Dis");
> +
> +if ( val & DMA_GCMD_IRE )
> +{
> +vvtd->status.intremap_enabled = true;
> +vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
> +}
> +else
> +{
> +vvtd->status.intremap_enabled = false;
> +vvtd_clear_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);
> +}


You cna write the above like:

vvtd->status.intremap_enabled = val & DMA_GCMD_IRE;
(val & DMA_GCMD_IRE) ? vvtd_set_bit : vvtd_clear_bit
(vvtd, DMAR_GSTS_REG, DMA_GSTS_IRES_SHIFT);

Or similar (certainly setting vvtd->status.intremap_enabled doesn't
need to be branched).

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 13/29] x86/vvtd: Set Interrupt Remapping Table Pointer through GCMD

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:54PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> Software sets this field to set/update the interrupt remapping table pointer
> used by hardware. The interrupt remapping table pointer is specified through
> the Interrupt Remapping Table Address (IRTA_REG) register.
> 
> This patch emulates this operation and adds some new fields in VVTD to track
> info (e.g. the table's gfn and max supported entries) of interrupt remapping
> table.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> 
> ---
> v3:
>  - ignore unaligned r/w of vt-d hardware registers and return X86EMUL_OK
> ---
>  xen/drivers/passthrough/vtd/iommu.h | 12 ++-
>  xen/drivers/passthrough/vtd/vvtd.c  | 69 
> +
>  2 files changed, 80 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.h 
> b/xen/drivers/passthrough/vtd/iommu.h
> index ef038c9..a0d5ec8 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -153,6 +153,8 @@
>  #define DMA_GCMD_IRE(((u64)1) << 25)
>  #define DMA_GCMD_SIRTP  (((u64)1) << 24)
>  #define DMA_GCMD_CFI(((u64)1) << 23)
> +/* mask of one-shot bits */
> +#define DMA_GCMD_ONE_SHOT_MASK 0x96ff 

Trailing white space.

>  
>  /* GSTS_REG */
>  #define DMA_GSTS_TES(((u64)1) << 31)
> @@ -162,9 +164,17 @@
>  #define DMA_GSTS_WBFS   (((u64)1) << 27)
>  #define DMA_GSTS_QIES   (((u64)1) <<26)
>  #define DMA_GSTS_IRES   (((u64)1) <<25)
> -#define DMA_GSTS_SIRTPS (((u64)1) << 24)
> +#define DMA_GSTS_SIRTPS_SHIFT   24
> +#define DMA_GSTS_SIRTPS (((u64)1) << DMA_GSTS_SIRTPS_SHIFT)
>  #define DMA_GSTS_CFIS   (((u64)1) <<23)
>  
> +/* IRTA_REG */
> +/* The base of 4KB aligned interrupt remapping table */
> +#define DMA_IRTA_ADDR(val)  ((val) & ~0xfffULL)
> +/* The size of remapping table is 2^(x+1), where x is the size field in IRTA 
> */
> +#define DMA_IRTA_S(val) (val & 0xf)
> +#define DMA_IRTA_SIZE(val)  (1UL << (DMA_IRTA_S(val) + 1))
> +
>  /* PMEN_REG */
>  #define DMA_PMEN_EPM(((u32)1) << 31)
>  #define DMA_PMEN_PRS(((u32)1) << 0)
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> index a3002c3..6736956 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -32,6 +32,13 @@
>  /* Supported capabilities by vvtd */
>  unsigned int vvtd_caps = VIOMMU_CAP_IRQ_REMAPPING;
>  
> +struct hvm_hw_vvtd_status {
> +uint32_t eim_enabled : 1;

bool maybe?

> +uint32_t irt_max_entry;
> +/* Interrupt remapping table base gfn */
> +uint64_t irt;

If it's a gfn, use gfn_t as the type.

> +};
> +
>  union hvm_hw_vvtd_regs {
>  uint32_t data32[256];
>  uint64_t data64[128];
> @@ -43,6 +50,8 @@ struct vvtd {
>  uint64_t length;
>  /* Point back to the owner domain */
>  struct domain *domain;
> +
> +struct hvm_hw_vvtd_status status;

Why you need a sub-struct for this, can't this just be placed inside
of hvm_hw_vvtd_regs directly?

>  union hvm_hw_vvtd_regs *regs;
>  struct page_info *regs_page;
>  };
> @@ -70,6 +79,11 @@ struct vvtd *domain_vvtd(struct domain *d)
>  return (d->viommu) ? d->viommu->priv : NULL;
>  }
>  
> +static inline void vvtd_set_bit(struct vvtd *vvtd, uint32_t reg, int nr)
> +{
> +__set_bit(nr, >regs->data32[reg/sizeof(uint32_t)]);
> +}
> +
>  static inline void vvtd_set_reg(struct vvtd *vtd, uint32_t reg, uint32_t 
> value)
>  {
>  vtd->regs->data32[reg/sizeof(uint32_t)] = value;
> @@ -91,6 +105,44 @@ static inline uint64_t vvtd_get_reg_quad(struct vvtd 
> *vtd, uint32_t reg)
>  return vtd->regs->data64[reg/sizeof(uint64_t)];
>  }
>  
> +static void vvtd_handle_gcmd_sirtp(struct vvtd *vvtd, uint32_t val)
> +{
> +uint64_t irta = vvtd_get_reg_quad(vvtd, DMAR_IRTA_REG);
> +
> +if ( !(val & DMA_GCMD_SIRTP) )
> +return;
> +
> +vvtd->status.irt = DMA_IRTA_ADDR(irta) >> PAGE_SHIFT;
> +vvtd->status.irt_max_entry = DMA_IRTA_SIZE(irta);
> +vvtd->status.eim_enabled = !!(irta & IRTA_EIME);

If you use a bool you don't need the '!!'.

> +vvtd_info("Update IR info (addr=%lx eim=%d size=%d).",

The final '.' is unneeded IMHO.

> +  vvtd->status.irt, vvtd->status.eim_enabled,
> +  vvtd->status.irt_max_entry);
> +vvtd_set_bit(vvtd, DMAR_GSTS_REG, DMA_GSTS_SIRTPS_SHIFT);
> +}
> +
> +static int vvtd_write_gcmd(struct vvtd *vvtd, uint32_t val)
> +{
> +uint32_t orig = vvtd_get_reg(vvtd, DMAR_GSTS_REG);
> +uint32_t changed;
> +
> +orig = orig & DMA_GCMD_ONE_SHOT_MASK;   /* reset the one-shot bits */
> +changed = orig ^ val;
> +
> +if ( !changed )
> +return X86EMUL_OKAY;
> +
> +if ( changed & (changed - 1) )
> +vvtd_info("Guest attempts to write %x to GCMD (current GSTS is %x)," 

Trailing white-space.

> +  "it would lead to update 

Re: [Xen-devel] [PATCH V3 12/29] x86/vvtd: Add MMIO handler for VVTD

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:53PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> This patch adds VVTD MMIO handler to deal with MMIO access.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/drivers/passthrough/vtd/vvtd.c | 91 
> ++
>  1 file changed, 91 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> index c851ec7..a3002c3 100644
> --- a/xen/drivers/passthrough/vtd/vvtd.c
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -47,6 +47,29 @@ struct vvtd {
>  struct page_info *regs_page;
>  };
>  
> +/* Setting viommu_verbose enables debugging messages of vIOMMU */
> +bool __read_mostly viommu_verbose;
> +boolean_runtime_param("viommu_verbose", viommu_verbose);
> +
> +#ifndef NDEBUG
> +#define vvtd_info(fmt...) do {\
> +if ( viommu_verbose ) \
> +gprintk(XENLOG_G_INFO, ## fmt);   \

If you use gprintk you should use XENLOG_INFO, the '_G_' variants are
only used with plain printk.

> +} while(0)
> +#define vvtd_debug(fmt...) do {   \
> +if ( viommu_verbose && printk_ratelimit() )   \

Not sure why you need printk_ratelimit, XENLOG_G_DEBUG is already
rate-limited.

> +printk(XENLOG_G_DEBUG fmt);   \

Any reason why vvtd_info uses gprintk and here you use printk?

> +} while(0)
> +#else
> +#define vvtd_info(fmt...) do {} while(0)
> +#define vvtd_debug(fmt...) do {} while(0)

No need for 'fmt...' just '...' will suffice since you are discarding
the parameters anyway.

> +#endif
> +
> +struct vvtd *domain_vvtd(struct domain *d)
> +{
> +return (d->viommu) ? d->viommu->priv : NULL;

Unneeded parentheses around d->viommu.

Also, it seems wring to call domain_vvtd with !d->viommu. So I think
this helper should just be removed, and d->viommu->priv fetched
directly.

> +}
> +
>  static inline void vvtd_set_reg(struct vvtd *vtd, uint32_t reg, uint32_t 
> value)
>  {
>  vtd->regs->data32[reg/sizeof(uint32_t)] = value;
> @@ -68,6 +91,73 @@ static inline uint64_t vvtd_get_reg_quad(struct vvtd *vtd, 
> uint32_t reg)
>  return vtd->regs->data64[reg/sizeof(uint64_t)];
>  }
>  
> +static int vvtd_in_range(struct vcpu *v, unsigned long addr)
> +{
> +struct vvtd *vvtd = domain_vvtd(v->domain);
> +
> +if ( vvtd )
> +return (addr >= vvtd->base_addr) &&
> +   (addr < vvtd->base_addr + PAGE_SIZE);

So the register set covers a PAGE_SIZE, but hvm_hw_vvtd_regs only
covers from 0 to 1024B, it seems like there's something wrong here...

> +return 0;
> +}
> +
> +static int vvtd_read(struct vcpu *v, unsigned long addr,
> + unsigned int len, unsigned long *pval)
> +{
> +struct vvtd *vvtd = domain_vvtd(v->domain);
> +unsigned int offset = addr - vvtd->base_addr;
> +
> +vvtd_info("Read offset %x len %d\n", offset, len);
> +
> +if ( (len != 4 && len != 8) || (offset & (len - 1)) )

What value does hardware return when performing unaligned reads or
reads with wrong size?

Here you return with pval not set, which is dangerous.

> +return X86EMUL_OKAY;
> +
> +if ( len == 4 )
> +*pval = vvtd_get_reg(vvtd, offset);
> +else
> +*pval = vvtd_get_reg_quad(vvtd, offset);

...yet here you don't check for offset < 1024.

> +
> +return X86EMUL_OKAY;
> +}
> +
> +static int vvtd_write(struct vcpu *v, unsigned long addr,
> +  unsigned int len, unsigned long val)
> +{
> +struct vvtd *vvtd = domain_vvtd(v->domain);
> +unsigned int offset = addr - vvtd->base_addr;
> +
> +vvtd_info("Write offset %x len %d val %lx\n", offset, len, val);
> +
> +if ( (len != 4 && len != 8) || (offset & (len - 1)) )
> +return X86EMUL_OKAY;
> +
> +if ( len == 4 )
> +{
> +switch ( offset )
> +{
> +case DMAR_IEDATA_REG:
> +case DMAR_IEADDR_REG:
> +case DMAR_IEUADDR_REG:
> +case DMAR_FEDATA_REG:
> +case DMAR_FEADDR_REG:
> +case DMAR_FEUADDR_REG:
> +vvtd_set_reg(vvtd, offset, val);

Hm, so you are using a full page when you only care for 6 4B
registers? Seem like quite of a waste of memory.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 11/29] x86/hvm: Introduce a emulated VTD for HVM

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:52PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> This patch adds create/destroy function for the emulated VTD
> and adapts it to the common VIOMMU abstraction.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/drivers/passthrough/vtd/Makefile |   7 +-
>  xen/drivers/passthrough/vtd/iommu.h  |  23 +-
>  xen/drivers/passthrough/vtd/vvtd.c   | 147 
> +++
>  3 files changed, 170 insertions(+), 7 deletions(-)
>  create mode 100644 xen/drivers/passthrough/vtd/vvtd.c
> 
> diff --git a/xen/drivers/passthrough/vtd/Makefile 
> b/xen/drivers/passthrough/vtd/Makefile
> index f302653..163c7fe 100644
> --- a/xen/drivers/passthrough/vtd/Makefile
> +++ b/xen/drivers/passthrough/vtd/Makefile
> @@ -1,8 +1,9 @@
>  subdir-$(CONFIG_X86) += x86
>  
> -obj-y += iommu.o
>  obj-y += dmar.o
> -obj-y += utils.o
> -obj-y += qinval.o
>  obj-y += intremap.o
> +obj-y += iommu.o
> +obj-y += qinval.o
>  obj-y += quirks.o
> +obj-y += utils.o

Why do you need to shuffle the list above?

Also I'm not sure the Intel vIOMMU implementation should live here. As
you can see the path is:

xen/drivers/passthrough/vtd/

The vIOMMU is not tied to passthrough at all, so I would rather place
it in:

xen/drivers/vvtd/

Or maybe you can create something like:

xen/drivers/viommu/

So that all vIOMMU implementations can share some code.

> +obj-$(CONFIG_VIOMMU) += vvtd.o
> diff --git a/xen/drivers/passthrough/vtd/iommu.h 
> b/xen/drivers/passthrough/vtd/iommu.h
> index d7e433e..ef038c9 100644
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -66,6 +66,12 @@
>  #define VER_MAJOR(v)(((v) & 0xf0) >> 4)
>  #define VER_MINOR(v)((v) & 0x0f)
>  
> +/* Supported Adjusted Guest Address Widths */
> +#define DMA_CAP_SAGAW_SHIFT 8
> + /* 39-bit AGAW, 3-level page-table */
> +#define DMA_CAP_SAGAW_39bit (0x2ULL << DMA_CAP_SAGAW_SHIFT)
> +#define DMA_CAP_ND_64K  6ULL
> +
>  /*
>   * Decoding Capability Register
>   */
> @@ -74,6 +80,7 @@
>  #define cap_write_drain(c) (((c) >> 54) & 1)
>  #define cap_max_amask_val(c)   (((c) >> 48) & 0x3f)
>  #define cap_num_fault_regs(c)  c) >> 40) & 0xff) + 1)
> +#define cap_set_num_fault_regs(c)  c) - 1) & 0xff) << 40)
>  #define cap_pgsel_inv(c)   (((c) >> 39) & 1)
>  
>  #define cap_super_page_val(c)  (((c) >> 34) & 0xf)
> @@ -85,11 +92,13 @@
>  #define cap_sps_1tb(c) ((c >> 37) & 1)
>  
>  #define cap_fault_reg_offset(c)c) >> 24) & 0x3ff) * 16)
> +#define cap_set_fault_reg_offset(c) c) / 16) & 0x3ff) << 24 )
>  
>  #define cap_isoch(c)(((c) >> 23) & 1)
>  #define cap_qos(c)(((c) >> 22) & 1)
>  #define cap_mgaw(c)c) >> 16) & 0x3f) + 1)
> -#define cap_sagaw(c)(((c) >> 8) & 0x1f)
> +#define cap_set_mgaw(c) c) - 1) & 0x3f) << 16)
> +#define cap_sagaw(c)(((c) >> DMA_CAP_SAGAW_SHIFT) & 0x1f)
>  #define cap_caching_mode(c)(((c) >> 7) & 1)
>  #define cap_phmr(c)(((c) >> 6) & 1)
>  #define cap_plmr(c)(((c) >> 5) & 1)
> @@ -104,10 +113,16 @@
>  #define ecap_niotlb_iunits(e)e) >> 24) & 0xff) + 1)
>  #define ecap_iotlb_offset(e) e) >> 8) & 0x3ff) * 16)
>  #define ecap_coherent(e) ((e >> 0) & 0x1)
> -#define ecap_queued_inval(e) ((e >> 1) & 0x1)
> +#define DMA_ECAP_QI_SHIFT1
> +#define DMA_ECAP_QI  (1ULL << DMA_ECAP_QI_SHIFT)
> +#define ecap_queued_inval(e) ((e >> DMA_ECAP_QI_SHIFT) & 0x1)

Looks like this could be based on MASK_EXTR instead, but seeing how
the file is full of open-coded mask extracts I'm not sure it's worth
it anymore.

>  #define ecap_dev_iotlb(e)((e >> 2) & 0x1)
> -#define ecap_intr_remap(e)   ((e >> 3) & 0x1)
> -#define ecap_eim(e)  ((e >> 4) & 0x1)
> +#define DMA_ECAP_IR_SHIFT3
> +#define DMA_ECAP_IR  (1ULL << DMA_ECAP_IR_SHIFT)
> +#define ecap_intr_remap(e)   ((e >> DMA_ECAP_IR_SHIFT) & 0x1)
> +#define DMA_ECAP_EIM_SHIFT   4
> +#define DMA_ECAP_EIM (1ULL << DMA_ECAP_EIM_SHIFT)
> +#define ecap_eim(e)  ((e >> DMA_ECAP_EIM_SHIFT) & 0x1)

Maybe worth placing all the DMA_ECAP_* defines in a separate section?
Seems like how it's done for other features like DMA_FSTS or
DMA_CCMD.

>  #define ecap_cache_hints(e)  ((e >> 5) & 0x1)
>  #define ecap_pass_thru(e)((e >> 6) & 0x1)
>  #define ecap_snp_ctl(e)  ((e >> 7) & 0x1)
> diff --git a/xen/drivers/passthrough/vtd/vvtd.c 
> b/xen/drivers/passthrough/vtd/vvtd.c
> new file mode 100644
> index 000..c851ec7
> --- /dev/null
> +++ b/xen/drivers/passthrough/vtd/vvtd.c
> @@ -0,0 +1,147 @@
> +/*
> + * vvtd.c
> + *
> + * virtualize VTD for HVM.
> + *
> + * Copyright (C) 2017 Chao Gao, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under 

Re: [Xen-devel] [PATCH V3 10/29] vtd: add and align register definitions

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:51PM -0400, Lan Tianyu wrote:
> From: Chao Gao <chao@intel.com>
> 
> No functional changes.
> 
> Signed-off-by: Chao Gao <chao@intel.com>
> Signed-off-by: Lan Tianyu <tianyu....@intel.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Would have been nice to maybe split this into two, one patch that
simply fixes the alignment and another one that introduces the new
defines (or even introduce the new defines when they are actually
needed).

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 9/29] tools/libxc: Add viommu operations in libxc

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:50PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> This patch adds XEN_DOMCTL_viommu_op hypercall. This hypercall
> comprises two sub-commands:

Patch description doesn't match actual code. This patch doesn't add
any new hypercalls, it just adds libxc helpers for
XEN_DOMCTL_viommu_op.

> - create(): create a vIOMMU in Xen, given viommu type, register-set
> location and capabilities
> - destroy(): destroy a vIOMMU specified by viommu_id
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
> v3:
>  - Remove API for querying viommu capabilities
>  - Remove pointless cast
>  - Polish commit message
>  - Coding style
> ---
>  tools/libxc/Makefile  |  1 +
>  tools/libxc/include/xenctrl.h |  4 +++
>  tools/libxc/xc_viommu.c   | 64 
> +++
>  3 files changed, 69 insertions(+)
>  create mode 100644 tools/libxc/xc_viommu.c
> 
> diff --git a/tools/libxc/Makefile b/tools/libxc/Makefile
> index 9a019e8..7d8c4b4 100644
> --- a/tools/libxc/Makefile
> +++ b/tools/libxc/Makefile
> @@ -51,6 +51,7 @@ CTRL_SRCS-$(CONFIG_MiniOS) += xc_minios.c
>  CTRL_SRCS-y   += xc_evtchn_compat.c
>  CTRL_SRCS-y   += xc_gnttab_compat.c
>  CTRL_SRCS-y   += xc_devicemodel_compat.c
> +CTRL_SRCS-y   += xc_viommu.c
>  
>  GUEST_SRCS-y :=
>  GUEST_SRCS-y += xg_private.c xc_suspend.c
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index 43151cb..bedca1f 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2501,6 +2501,10 @@ enum xc_static_cpu_featuremask {
>  const uint32_t *xc_get_static_cpu_featuremask(enum 
> xc_static_cpu_featuremask);
>  const uint32_t *xc_get_feature_deep_deps(uint32_t feature);
>  
> +int xc_viommu_create(xc_interface *xch, domid_t dom, uint64_t type,
> + uint64_t base_addr, uint64_t cap, uint32_t *viommu_id);
> +int xc_viommu_destroy(xc_interface *xch, domid_t dom, uint32_t viommu_id);
> +
>  #endif
>  
>  int xc_livepatch_upload(xc_interface *xch,
> diff --git a/tools/libxc/xc_viommu.c b/tools/libxc/xc_viommu.c
> new file mode 100644
> index 000..17507c5
> --- /dev/null
> +++ b/tools/libxc/xc_viommu.c
> @@ -0,0 +1,64 @@
> +/*
> + * xc_viommu.c
> + *
> + * viommu related API functions.
> + *
> + * Copyright (C) 2017 Intel Corporation
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License, version 2.1, as published by the Free Software Foundation.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; If not, see 
> .
> + */
> +
> +#include "xc_private.h"
> +
> +int xc_viommu_create(xc_interface *xch, domid_t dom, uint64_t type,
> + uint64_t base_addr, uint64_t cap, uint32_t *viommu_id)
> +{
> +int rc;
> +

Extra newline.

> +DECLARE_DOMCTL;
> +
> +domctl.cmd = XEN_DOMCTL_viommu_op;
> +domctl.domain = dom;
> +domctl.u.viommu_op.cmd = XEN_DOMCTL_create_viommu;
> +domctl.u.viommu_op.u.create.viommu_type = type;

Please remove the "viommu_" prefix from the field, it's not needed
and just using "type" is already clear.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 8/29] tools/libxl: create vIOMMU during domain construction

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:49PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> If guest is configured to have a vIOMMU, create it during domain construction.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> 
> ---
> v3:
>  - Remove the process of querying capabilities.
> ---
>  tools/libxl/libxl_x86.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index 23c9a55..25cae5f 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -341,8 +341,25 @@ int libxl__arch_domain_create(libxl__gc *gc, 
> libxl_domain_config *d_config,
>  if (d_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM) {
>  unsigned long shadow = DIV_ROUNDUP(d_config->b_info.shadow_memkb,
> 1024);
> +int i;

unsigned int.

> +
>  xc_shadow_control(ctx->xch, domid, 
> XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
>NULL, 0, , 0, NULL);
> +
> +for (i = 0; i < d_config->b_info.num_viommus; i++) {
> +uint32_t id;
> +libxl_viommu_info *viommu = d_config->b_info.viommu + i;

Since this is an array I would rather prefer that you use
_config->b_info.viommu[i].

> +
> +if (viommu->type == LIBXL_VIOMMU_TYPE_INTEL_VTD) {
> +ret = xc_viommu_create(ctx->xch, domid, 
> VIOMMU_TYPE_INTEL_VTD,
> +   viommu->base_addr, viommu->cap, );

As said in another patch: this will break compilation because
xc_viommu_create is introduced in patch 9.

Please organize the patches in a way that the code always compiles and
works fine. Keep in mind that the Xen tree should be bisectable
always.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 7/29] tools/libxl: build DMAR table for a guest with one virtual VTD

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:48PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> A new logic is added to build ACPI DMAR table in tool stack for a guest
> with one virtual VTD and pass through it to guest via existing mechanism. If
> there already are ACPI tables needed to pass through, we joint the tables.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> 
> ---
> v3:
>  - build dmar and initialize related acpi_modules struct in
>  libxl_x86_acpi.c, keeping in accordance with pvh.
> 
> ---
>  tools/libxl/libxl_x86.c  |  3 +-
>  tools/libxl/libxl_x86_acpi.c | 98 
> ++--
>  2 files changed, 96 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
> index 455f6f0..23c9a55 100644
> --- a/tools/libxl/libxl_x86.c
> +++ b/tools/libxl/libxl_x86.c
> @@ -381,8 +381,7 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc 
> *gc,
>  {
>  int rc = 0;
>  
> -if ((info->type == LIBXL_DOMAIN_TYPE_HVM) &&
> -(info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_NONE)) {
> +if (info->type == LIBXL_DOMAIN_TYPE_HVM) {

You will have to rebase this on top of current staging,
LIBXL_DEVICE_MODEL_VERSION_NONE is now gone.

>  rc = libxl__dom_load_acpi(gc, info, dom);
>  if (rc != 0)
>  LOGE(ERROR, "libxl_dom_load_acpi failed");
> diff --git a/tools/libxl/libxl_x86_acpi.c b/tools/libxl/libxl_x86_acpi.c
> index 1761756..adf02f4 100644
> --- a/tools/libxl/libxl_x86_acpi.c
> +++ b/tools/libxl/libxl_x86_acpi.c
> @@ -16,6 +16,7 @@
>  #include "libxl_arch.h"
>  #include 
>  #include 
> +#include "libacpi/acpi2_0.h"
>  #include "libacpi/libacpi.h"
>  
>  #include 
> @@ -161,9 +162,9 @@ out:
>  return rc;
>  }
>  
> -int libxl__dom_load_acpi(libxl__gc *gc,
> - const libxl_domain_build_info *b_info,
> - struct xc_dom_image *dom)
> +static int libxl__dom_load_acpi_pvh(libxl__gc *gc,
> +const libxl_domain_build_info *b_info,
> +struct xc_dom_image *dom)
>  {
>  struct acpi_config config = {0};
>  struct libxl_acpi_ctxt libxl_ctxt;
> @@ -236,6 +237,97 @@ out:
>  return rc;
>  }
>  
> +static void *acpi_memalign(struct acpi_ctxt *ctxt, uint32_t size,
> +   uint32_t align)
> +{
> +int ret;
> +void *ptr;
> +
> +ret = posix_memalign(, align, size);
> +if (ret != 0 || !ptr)
> +return NULL;
> +
> +return ptr;
> +}
> +
> +/*
> + * For hvm, we don't need build acpi in libxl. Instead, it's built in 
> hvmloader.
> + * But if one hvm has virtual VTD(s), we build DMAR table for it and joint 
> this
> + * table with existing content in acpi_modules in order to employ HVM
> + * firmware pass-through mechanism to pass-through DMAR table.
> + */
> +static int libxl__dom_load_acpi_hvm(libxl__gc *gc,
> +const libxl_domain_build_info *b_info,
> +struct xc_dom_image *dom)
> +{

AFAICT there's some code duplication between libxl__dom_load_acpi_hvm
and libxl__dom_load_acpi_pvh, isn't there a chance you could put this
in a common function?

> +struct acpi_config config = { 0 };
> +struct acpi_ctxt ctxt;
> +void *table;
> +uint32_t len;
> +
> +if ((b_info->type != LIBXL_DOMAIN_TYPE_HVM) ||
> +(b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_NONE) ||
> +(b_info->num_viommus != 1) ||
> +(b_info->viommu[0].type != LIBXL_VIOMMU_TYPE_INTEL_VTD))
> +return 0;
> +
> +ctxt.mem_ops.alloc = acpi_memalign;
> +ctxt.mem_ops.v2p = virt_to_phys;
> +ctxt.mem_ops.free = acpi_mem_free;
> +
> +if (libxl_defbool_val(b_info->viommu[0].intremap))
> +config.iommu_intremap_supported = true;
> +/* x2apic is always enabled since in no case we must disable it */
> +config.iommu_x2apic_supported = true;
> +config.iommu_base_addr = b_info->viommu[0].base_addr;

I don't see libxl__dom_load_acpi_pvh setting any of the vIOMMU fields.

> +
> +/* IOAPIC id and PSEUDO BDF */
> +config.ioapic_id = 1;
> +config.ioapic_bus = 0xff;
> +config.ioapic_devfn = 0x0;
> +
> +config.host_addr_width = 39;
> +
> +table = construct_dmar(, );
> +if ( !table )
> +return ERROR_NOMEM;
> +len = ((struct acpi_header *)table)->length;
> +
> +if (len) {
> +libxl__ptr_add(gc, table);
> +if (!dom->acpi_modules[0].data) {
> +dom->acpi_modules[0].data = table;
> +dom->acpi_modules[0].length = len;
> +} else {
> +/* joint tables */
> +void *newdata;
> +
> +newdata = libxl__malloc(gc, len + dom->acpi_modules[0].length);
> +memcpy(newdata, dom->acpi_modules[0].data,
> +   dom->acpi_modules[0].length);
> + 

Re: [Xen-devel] [PATCH V3 6/29] tools/libxl: Add a user configurable parameter to control vIOMMU attributes

2017-10-19 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:47PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> A field, viommu_info, is added to struct libxl_domain_build_info. Several
> attributes can be specified by guest config file for virtual IOMMU. These
> attributes are used for DMAR construction and vIOMMU creation.

IMHO this should come much later in the series, ideally you would
introduce the xl/libxl code in the last patches, together with the
xl.cfg man page change.

> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> 
> ---
> v3:
>  - allow an array of viommu other than only one viommu to present to guest.
>  During domain building, an error would be raised for
>  multiple viommus case since we haven't implemented this yet.
>  - provide a libxl__viommu_set_default() for viommu
> 
> ---
>  docs/man/xl.cfg.pod.5.in| 27 +++
>  tools/libxl/libxl_create.c  | 52 
> +
>  tools/libxl/libxl_types.idl | 12 +++
>  tools/xl/xl_parse.c | 52 
> -
>  4 files changed, 142 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
> index 79cb2ea..9cd7dd7 100644
> --- a/docs/man/xl.cfg.pod.5.in
> +++ b/docs/man/xl.cfg.pod.5.in
> @@ -1547,6 +1547,33 @@ 
> L
>  
>  =back 
>  
> +=item 

Re: [Xen-devel] [PATCH V3 1/29] Xen/doc: Add Xen virtual IOMMU doc

2017-10-19 Thread Roger Pau Monné
On Thu, Oct 19, 2017 at 10:26:36AM +0800, Lan Tianyu wrote:
> Hi Roger:
>  Thanks for review.
> 
> On 2017年10月18日 21:26, Roger Pau Monné wrote:
> > On Thu, Sep 21, 2017 at 11:01:42PM -0400, Lan Tianyu wrote:
> >> +Xen hypervisor vIOMMU command
> >> +=
> >> +Introduce vIOMMU command "viommu=1" to enable vIOMMU function in 
> >> hypervisor.
> >> +It's default disabled.
> > 
> > Hm, I'm not sure we really need this. At the end viommu will be
> > disabled by default for guests, unless explicitly enabled in the
> > config file.
> 
> This is according to Jan's early comments on RFC patch
> https://patchwork.kernel.org/patch/9733869/.
> 
> "It's actually a question whether in our current scheme a Kconfig
> option is appropriate here in the first place. I'd rather see this be
> an always built feature which needs enabling on the command line
> for the time being."

So if I read this correctly Jan wanted you to ditch the Kconfig option
and instead rely on the command line option to enable/disable it.

I don't have a strong opinion here, so it's fine for me if you want to
keep both the Kconfig option and the command line one.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 2/29] VIOMMU: Add vIOMMU helper functions to create, destroy vIOMMU instance

2017-10-19 Thread Roger Pau Monné
On Thu, Oct 19, 2017 at 02:31:22PM +0800, Lan Tianyu wrote:
> On 2017年10月18日 22:05, Roger Pau Monné wrote:
> > On Thu, Sep 21, 2017 at 11:01:43PM -0400, Lan Tianyu wrote:
> >> +static int viommu_create(struct domain *d, uint64_t type,
> >> + uint64_t base_address, uint64_t caps,
> >> + uint32_t *viommu_id)
> > 
> > I'm quite sure this doesn't compile, you are adding a static function
> > here that's not used at all in this patch. Please be careful and don't
> > introduce patches that will break the build.
> 
> This function will be used in the next patch. "DOMCTL: Introduce new
> DOMCTL commands for vIOMMU support.". So this doesn't break patchset
> build. Will combine these two patches to avoid such issue.

If it's used in the next patch, then simply introduce it there.

> 
> 
> > 
> >> +{
> >> +struct viommu *viommu;
> >> +struct viommu_type *viommu_type = NULL;
> >> +int rc;
> >> +
> >> +/* Only support one vIOMMU per domain. */
> >> +if ( d->viommu )
> >> +return -E2BIG;
> >> +
> >> +viommu_type = viommu_get_type(type);
> >> +if ( !viommu_type )
> >> +return -EINVAL;
> >> +
> >> +if ( !viommu_type->ops || !viommu_type->ops->create )
> >> +return -EINVAL;
> > 
> > Can this really happen? What's the point in having a iommu_type
> > without ops or without the create op? I think this should be an ASSERT
> > instead.
> 
> How about add ASSERT(viommu_type->ops->create) here?

Since ops is already a pointer I would rather do

ASSERT(viommu_type->ops && viommu_type->ops->create);

Or else you risk a NULL pointer dereference.

> >> +
> >>  /*
> >>   * Stats
> >>   *
> >> @@ -479,6 +483,10 @@ struct domain
> >>  rwlock_t vnuma_rwlock;
> >>  struct vnuma_info *vnuma;
> >>  
> >> +#ifdef CONFIG_VIOMMU
> >> +struct viommu *viommu;
> >> +#endif
> > 
> > Shouldn't this go inside of x86/hvm/domain.h? (hvm_domain) PV guests
> > will certainly never be able to use it.
> 
> vIOMMU framework should be generic for all platforms and so didn't put
> this in arch/x86.

For all platforms supporting HVM, for PV I don't think it makes sense.
Since AFAIK ARM guest type is also HVM I would rather introduce this
field in the hvm_domain structure rather than the generic domain
structure.

You might want to wait for feedback from others regarding this issue.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 5/29] tools/libacpi: Add new fields in acpi_config for DMAR table

2017-10-19 Thread Roger Pau Monné
On Thu, Oct 19, 2017 at 04:09:02PM +0800, Lan Tianyu wrote:
> On 2017年10月18日 23:12, Roger Pau Monné wrote:
> >> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
> >> index a2efd23..fdd6a78 100644
> >> --- a/tools/libacpi/libacpi.h
> >> +++ b/tools/libacpi/libacpi.h
> >> @@ -20,6 +20,8 @@
> >>  #ifndef __LIBACPI_H__
> >>  #define __LIBACPI_H__
> >>  
> >> +#include 
> > 
> > I'm quite sure you shouldn't add this here, see how headers are added
> > using LIBACPI_STDUTILS.
> > 
> 
> We may replace bool with uint8_t xxx:1 to avoid introduce new head file.

Did you check whether including stdbool is actually required? AFAICT
hvmloader util.h already includes it, and you would only have to
introduce it in libxl if it's not there yet.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 5/29] tools/libacpi: Add new fields in acpi_config for DMAR table

2017-10-18 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:46PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> The BIOS reports the remapping hardware units in a platform to system software
> through the DMA Remapping Reporting (DMAR) ACPI table.
> New fields are introduces for DMAR table. These new fields are set by
 ^ introduced
> toolstack through parsing guest's config file. construct_dmar() is added to
> build DMAR table according to the new fields.
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
> v3:
>  - Remove chip-set specific IOAPIC BDF. Instead, let IOAPIC-related
>  info be passed by struct acpi_config.
> 
> ---
>  tools/libacpi/build.c   | 53 
> +
>  tools/libacpi/libacpi.h | 12 +++
>  2 files changed, 65 insertions(+)
> 
> diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
> index f9881c9..5ee8fcd 100644
> --- a/tools/libacpi/build.c
> +++ b/tools/libacpi/build.c
> @@ -303,6 +303,59 @@ static struct acpi_20_slit *construct_slit(struct 
> acpi_ctxt *ctxt,
>  return slit;
>  }
>  
> +/*
> + * Only one DMA remapping hardware unit is exposed and all devices
> + * are under the remapping hardware unit. I/O APIC should be explicitly
> + * enumerated.
> + */
> +struct acpi_dmar *construct_dmar(struct acpi_ctxt *ctxt,
> + const struct acpi_config *config)
> +{
> +struct acpi_dmar *dmar;
> +struct acpi_dmar_hardware_unit *drhd;
> +struct dmar_device_scope *scope;
> +unsigned int size;
> +unsigned int ioapic_scope_size = sizeof(*scope) + sizeof(scope->path[0]);

I'm not sure I follow why you need to add the size of a uint16_t here.

> +
> +size = sizeof(*dmar) + sizeof(*drhd) + ioapic_scope_size;

size can be initialized at declaration time.

> +
> +dmar = ctxt->mem_ops.alloc(ctxt, size, 16);

Even dmar can be initialized at declaration time.

> +if ( !dmar )
> +return NULL;
> +
> +memset(dmar, 0, size);
> +dmar->header.signature = ACPI_2_0_DMAR_SIGNATURE;
> +dmar->header.revision = ACPI_2_0_DMAR_REVISION;
> +dmar->header.length = size;
> +fixed_strcpy(dmar->header.oem_id, ACPI_OEM_ID);
> +fixed_strcpy(dmar->header.oem_table_id, ACPI_OEM_TABLE_ID);
> +dmar->header.oem_revision = ACPI_OEM_REVISION;
> +dmar->header.creator_id   = ACPI_CREATOR_ID;
> +dmar->header.creator_revision = ACPI_CREATOR_REVISION;
> +dmar->host_address_width = config->host_addr_width - 1;
> +if ( config->iommu_intremap_supported )
> +dmar->flags |= ACPI_DMAR_INTR_REMAP;
> +if ( !config->iommu_x2apic_supported )
> +dmar->flags |= ACPI_DMAR_X2APIC_OPT_OUT;

Is there any reason why we would want to create a guest with a vIOMMU
but not x2APIC support?

> +
> +drhd = (struct acpi_dmar_hardware_unit *)((void*)dmar + sizeof(*dmar));
  ^ space
> +drhd->type = ACPI_DMAR_TYPE_HARDWARE_UNIT;
> +drhd->length = sizeof(*drhd) + ioapic_scope_size;
> +drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
> +drhd->pci_segment = 0;
> +drhd->base_address = config->iommu_base_addr;
> +
> +scope = >scope[0];
> +scope->type = ACPI_DMAR_DEVICE_SCOPE_IOAPIC;
> +scope->length = ioapic_scope_size;
> +scope->enumeration_id = config->ioapic_id;
> +scope->bus = config->ioapic_bus;
> +scope->path[0] = config->ioapic_devfn;
> +
> +set_checksum(dmar, offsetof(struct acpi_header, checksum), size);
> +return dmar;
> +}
> +
>  static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
>  unsigned long *table_ptrs,
>  int nr_tables,
> diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
> index a2efd23..fdd6a78 100644
> --- a/tools/libacpi/libacpi.h
> +++ b/tools/libacpi/libacpi.h
> @@ -20,6 +20,8 @@
>  #ifndef __LIBACPI_H__
>  #define __LIBACPI_H__
>  
> +#include 

I'm quite sure you shouldn't add this here, see how headers are added
using LIBACPI_STDUTILS.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 4/29] tools/libacpi: Add DMA remapping reporting (DMAR) ACPI table structures

2017-10-18 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:45PM -0400, Lan Tianyu wrote:
> From: Chao Gao 
> 
> Add dmar table structure according Chapter 8 "BIOS Considerations" of
> VTd spec Rev. 2.4.
> 
> VTd 
> spec:http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
> 
> Signed-off-by: Chao Gao 
> Signed-off-by: Lan Tianyu 
> ---
>  tools/libacpi/acpi2_0.h | 61 
> +
>  1 file changed, 61 insertions(+)
> 
> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
> index 2619ba3..758a823 100644
> --- a/tools/libacpi/acpi2_0.h
> +++ b/tools/libacpi/acpi2_0.h
> @@ -422,6 +422,65 @@ struct acpi_20_slit {
>  };
>  
>  /*
> + * DMA Remapping Table header definition (DMAR)
> + */
> +
> +/*
> + * DMAR Flags.
> + */
> +#define ACPI_DMAR_INTR_REMAP(1 << 0)
> +#define ACPI_DMAR_X2APIC_OPT_OUT(1 << 1)
> +
> +struct acpi_dmar {
> +struct acpi_header header;
> +uint8_t host_address_width;
> +uint8_t flags;
> +uint8_t reserved[10];
> +};
> +
> +/*
> + * Device Scope Types
> + */
> +#define ACPI_DMAR_DEVICE_SCOPE_PCI_ENDPOINT 0x01
> +#define ACPI_DMAR_DEVICE_SCOPE_PCI_SUB_HIERARACHY   0x01
   ^0x02
> +#define ACPI_DMAR_DEVICE_SCOPE_IOAPIC   0x03
> +#define ACPI_DMAR_DEVICE_SCOPE_HPET 0x04
> +#define ACPI_DMAR_DEVICE_SCOPE_ACPI_NAMESPACE_DEVICE0x05

Maybe you could try to reduce the length of the defines?

> +
> +struct dmar_device_scope {
> +uint8_t type;
> +uint8_t length;
> +uint8_t reserved[2];
> +uint8_t enumeration_id;
> +uint8_t bus;
> +uint16_t path[0];
> +};
> +
> +/*
> + * DMA Remapping Hardware Unit Types
> + */
> +#define ACPI_DMAR_TYPE_HARDWARE_UNIT0x00
> +#define ACPI_DMAR_TYPE_RESERVED_MEMORY  0x01
> +#define ACPI_DMAR_TYPE_ATSR 0x02
> +#define ACPI_DMAR_TYPE_HARDWARE_AFFINITY0x03
> +#define ACPI_DMAR_TYPE_ANDD 0x04

I think you either use acronyms for all of them (like ATSR and ANDD)
or not. But mixing acronyms with full names is confusing.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 3/29] DOMCTL: Introduce new DOMCTL commands for vIOMMU support

2017-10-18 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:44PM -0400, Lan Tianyu wrote:
> This patch is to introduce create, destroy and query capabilities
> command for vIOMMU. vIOMMU layer will deal with requests and call
> arch vIOMMU ops.
> 
> Signed-off-by: Lan Tianyu 
> ---
>  xen/common/domctl.c |  6 ++
>  xen/common/viommu.c | 30 ++
>  xen/include/public/domctl.h | 42 ++
>  xen/include/xen/viommu.h|  2 ++
>  4 files changed, 80 insertions(+)
> 
> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
> index 42658e5..7e28237 100644
> --- a/xen/common/domctl.c
> +++ b/xen/common/domctl.c
> @@ -1149,6 +1149,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
> u_domctl)
>  copyback = 1;
>  break;
>  
> +#ifdef CONFIG_VIOMMU
> +case XEN_DOMCTL_viommu_op:
> +ret = viommu_domctl(d, >u.viommu_op, );

IMHO, I'm not really sure if it's worth to pass the copyback parameter
around. Can you just do the copy if !ret?

> +break;
> +#endif

Instead of guarding every call to a viommu related function with
CONFIG_VIOMMU I would rather add dummy replacements for them in the
!CONFIG_VIOMMU case in the viommu.h header.

> +
>  default:
>  ret = arch_do_domctl(op, d, u_domctl);
>  break;
> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> index 64d91e6..55feb5d 100644
> --- a/xen/common/viommu.c
> +++ b/xen/common/viommu.c
> @@ -133,6 +133,36 @@ static int viommu_create(struct domain *d, uint64_t type,
>  return 0;
>  }
>  
> +int viommu_domctl(struct domain *d, struct xen_domctl_viommu_op *op,
> +  bool *need_copy)
> +{
> +int rc = -EINVAL;

Why do you need to set rc to EINVAL? AFAICT there's no path that would
return rc without being initialized.

> +
> +if ( !viommu_enabled() )
> +return -ENODEV;
> +
> +switch ( op->cmd )
> +{
> +case XEN_DOMCTL_create_viommu:
> +rc = viommu_create(d, op->u.create.viommu_type,
> +   op->u.create.base_address,
> +   op->u.create.capabilities,
> +   >u.create.viommu_id);
> +if ( !rc )
> +*need_copy = true;
> +break;
> +
> +case XEN_DOMCTL_destroy_viommu:
> +rc = viommu_destroy_domain(d);
> +break;
> +
> +default:
> +return -ENOSYS;
> +}
> +
> +return rc;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
> index 50ff58f..68854b6 100644
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -1163,6 +1163,46 @@ struct xen_domctl_psr_cat_op {
>  typedef struct xen_domctl_psr_cat_op xen_domctl_psr_cat_op_t;
>  DEFINE_XEN_GUEST_HANDLE(xen_domctl_psr_cat_op_t);
>  
> +/*  vIOMMU helper
> + *
> + *  vIOMMU interface can be used to create/destroy vIOMMU and
> + *  query vIOMMU capabilities.
> + */
> +
> +/* vIOMMU type - specify vendor vIOMMU device model */
> +#define VIOMMU_TYPE_INTEL_VTD   0
> +
> +/* vIOMMU capabilities */
> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)

Please put those two defines next to the fields they belong to.

> +
> +struct xen_domctl_viommu_op {
> +uint32_t cmd;
> +#define XEN_DOMCTL_create_viommu  0
> +#define XEN_DOMCTL_destroy_viommu 1

Would be nice if the values where right aligned.

> +union {
> +struct {
> +/* IN - vIOMMU type */
> +uint64_t viommu_type;
> +/* 
> + * IN - MMIO base address of vIOMMU. vIOMMU device models
> + * are in charge of to check base_address.
> + */
> +uint64_t base_address;
> +/* IN - Capabilities with which we want to create */
> +uint64_t capabilities;
> +/* OUT - vIOMMU identity */
> +uint32_t viommu_id;
> +} create;
> +
> +struct {
> +/* IN - vIOMMU identity */
> +uint32_t viommu_id;
> +} destroy;
> +} u;
> +};

See my comments about the struct in patch 01/29.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V3 2/29] VIOMMU: Add vIOMMU helper functions to create, destroy vIOMMU instance

2017-10-18 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:43PM -0400, Lan Tianyu wrote:
> This patch is to introduce an abstract layer for arch vIOMMU implementation
> to deal with requests from dom0. Arch vIOMMU code needs to provide callback
> to do create and destroy operation.
> 
> Signed-off-by: Lan Tianyu 
> ---
>  docs/misc/xen-command-line.markdown |   7 ++
>  xen/arch/x86/Kconfig|   1 +
>  xen/common/Kconfig  |   3 +
>  xen/common/Makefile |   1 +
>  xen/common/domain.c |   4 +
>  xen/common/viommu.c | 144 
> 
>  xen/include/xen/sched.h |   8 ++
>  xen/include/xen/viommu.h|  63 
>  8 files changed, 231 insertions(+)
>  create mode 100644 xen/common/viommu.c
>  create mode 100644 xen/include/xen/viommu.h
> 
> diff --git a/docs/misc/xen-command-line.markdown 
> b/docs/misc/xen-command-line.markdown
> index 9797c8d..dfd1db5 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1825,3 +1825,10 @@ mode.
>  > Default: `true`
>  
>  Permit use of the `xsave/xrstor` instructions.
> +
> +### viommu
> +> `= `
> +
> +> Default: `false`
> +
> +Permit use of viommu interface to create and destroy viommu device model.
> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
> index 30c2769..1f1de96 100644
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -23,6 +23,7 @@ config X86
>   select HAS_PDX
>   select NUMA
>   select VGA
> + select VIOMMU
>  
>  config ARCH_DEFCONFIG
>   string
> diff --git a/xen/common/Kconfig b/xen/common/Kconfig
> index dc8e876..2ad2c8d 100644
> --- a/xen/common/Kconfig
> +++ b/xen/common/Kconfig
> @@ -49,6 +49,9 @@ config HAS_CHECKPOLICY
>   string
>   option env="XEN_HAS_CHECKPOLICY"
>  
> +config VIOMMU
> + bool
> +
>  config KEXEC
>   bool "kexec support"
>   default y
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 39e2614..da32f71 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -56,6 +56,7 @@ obj-y += time.o
>  obj-y += timer.o
>  obj-y += trace.o
>  obj-y += version.o
> +obj-$(CONFIG_VIOMMU) += viommu.o
>  obj-y += virtual_region.o
>  obj-y += vm_event.o
>  obj-y += vmap.o
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 5aebcf2..cdb1c9d 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -814,6 +814,10 @@ static void complete_domain_destroy(struct rcu_head 
> *head)
>  
>  sched_destroy_domain(d);
>  
> +#ifdef CONFIG_VIOMMU
> +viommu_destroy_domain(d);
> +#endif
> +
>  /* Free page used by xen oprofile buffer. */
>  #ifdef CONFIG_XENOPROF
>  free_xenoprof_pages(d);
> diff --git a/xen/common/viommu.c b/xen/common/viommu.c
> new file mode 100644
> index 000..64d91e6
> --- /dev/null
> +++ b/xen/common/viommu.c
> @@ -0,0 +1,144 @@
> +/*
> + * common/viommu.c
> + *
> + * Copyright (c) 2017 Intel Corporation
> + * Author: Lan Tianyu 
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along 
> with
> + * this program; If not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +bool __read_mostly opt_viommu;
> +boolean_param("viommu", opt_viommu);
> +
> +static DEFINE_SPINLOCK(type_list_lock);
> +static LIST_HEAD(type_list);
> +
> +struct viommu_type {
> +uint64_t type;

The comment I've made about type being uint64_t in the other patch
stands here.

> +struct viommu_ops *ops;
> +struct list_head node;
> +};
> +
> +int viommu_destroy_domain(struct domain *d)
> +{
> +int ret;
> +
> +if ( !d->viommu )
> +return -EINVAL;

ENODEV would be better.

> +
> +ret = d->viommu->ops->destroy(d->viommu);
> +if ( ret < 0 )
> +return ret;
> +
> +xfree(d->viommu);
> +d->viommu = NULL;

Newline preferably.

> +return 0;
> +}
> +
> +static struct viommu_type *viommu_get_type(uint64_t type)
> +{
> +struct viommu_type *viommu_type = NULL;
> +
> +spin_lock(_list_lock);
> +list_for_each_entry( viommu_type, _list, node )
> +{
> +if ( viommu_type->type == type )
> +{
> +spin_unlock(_list_lock);
> +return viommu_type;
> +}
> +}
> +spin_unlock(_list_lock);

Why do you need a lock here, and a list at all?

AFAICT vIOMMU types will never be added at runtime.

> +
> +return 

Re: [Xen-devel] [PATCH V3 1/29] Xen/doc: Add Xen virtual IOMMU doc

2017-10-18 Thread Roger Pau Monné
On Thu, Sep 21, 2017 at 11:01:42PM -0400, Lan Tianyu wrote:
> This patch is to add Xen virtual IOMMU doc to introduce motivation,
> framework, vIOMMU hypercall and xl configuration.
> 
> Signed-off-by: Lan Tianyu 
> ---
>  docs/misc/viommu.txt | 136 
> +++
>  1 file changed, 136 insertions(+)
>  create mode 100644 docs/misc/viommu.txt
> 
> diff --git a/docs/misc/viommu.txt b/docs/misc/viommu.txt
> new file mode 100644
> index 000..348e8c4
> --- /dev/null
> +++ b/docs/misc/viommu.txt
> @@ -0,0 +1,136 @@
> +Xen virtual IOMMU
> +
> +Motivation
> +==
> +Enable more than 128 vcpu support
> +
> +The current requirements of HPC cloud service requires VM with a high
> +number of CPUs in order to achieve high performance in parallel
> +computing.
> +
> +To support >128 vcpus, X2APIC mode in guest is necessary because legacy
> +APIC(XAPIC) just supports 8-bit APIC ID. The APIC ID used by Xen is
> +CPU ID * 2 (ie: CPU 127 has APIC ID 254, which is the last one available
> +in xAPIC mode) and so it only can support 128 vcpus at most. x2APIC mode
> +supports 32-bit APIC ID and it requires the interrupt remapping functionality
> +of a vIOMMU if the guest wishes to route interrupts to all available vCPUs
> +
> +The reason for this is that there is no modification for existing PCI MSI
> +and IOAPIC when introduce X2APIC.

I'm not sure the above sentence makes much sense. IMHO I would just
remove it.

> PCI MSI/IOAPIC can only send interrupt
> +message containing 8-bit APIC ID, which cannot address cpus with >254
> +APIC ID. Interrupt remapping supports 32-bit APIC ID and so it's necessary
> +for >128 vcpus support.
> +
> +
> +vIOMMU Architecture
> +===
> +vIOMMU device model is inside Xen hypervisor for following factors
> +1) Avoid round trips between Qemu and Xen hypervisor
> +2) Ease of integration with the rest of hypervisor
> +3) HVMlite/PVH doesn't use Qemu

Just use PVH here, HVMlite == PVH now.

> +
> +* Interrupt remapping overview.
> +Interrupts from virtual devices and physical devices are delivered
> +to vLAPIC from vIOAPIC and vMSI. vIOMMU needs to remap interrupt during
> +this procedure.
> +
> ++---+
> +|Qemu   |VM |
> +|   | ++|
> +|   | |  Device driver ||
> +|   | ++---+|
> +|   |  ^|
> +|   ++  | ++---+|
> +|   | Virtual device |  | |  IRQ subsystem ||
> +|   +---++  | ++---+|
> +|   |   |  ^|
> +|   |   |  ||
> ++---+---+
> +|hypervisor |  | VIRQ   |
> +|   |+-++   |
> +|   ||  vLAPIC  |   |
> +|   |VIRQ+-++   |
> +|   |  ^|
> +|   |  ||
> +|   |+-++   |
> +|   ||  vIOMMU  |   |
> +|   |+-++   |
> +|   |  ^|
> +|   |  ||
> +|   |+-++   |
> +|   ||   vIOAPIC/vMSI   |   |
> +|   |++++   |
> +|   | ^^|
> +|   +-+||
> +|  ||
> ++---+
> +HW |IRQ
> ++---+
> +|   PCI Device  |
> ++---+
> +
> +
> +vIOMMU hypercall
> +
> +Introduce a new domctl hypercall "xen_domctl_viommu_op" to create/destroy
> +vIOMMUs.
> +
> +* vIOMMU hypercall parameter structure
> +
> +/* vIOMMU type - specify vendor vIOMMU device model */
> +#define VIOMMU_TYPE_INTEL_VTD   0
> +
> +/* vIOMMU capabilities */
> +#define VIOMMU_CAP_IRQ_REMAPPING  (1u << 0)
> +
> +struct xen_domctl_viommu_op {
> +uint32_t cmd;
> +#define XEN_DOMCTL_create_viommu  0
> +#define XEN_DOMCTL_destroy_viommu 1

I would invert the order of the domctl names:

#define XEN_DOMCTL_viommu_create  0
#define XEN_DOMCTL_viommu_destroy 1

It's clearer if the operation is the last part of the name.

> +union {
> +struct {
> +/* IN - vIOMMU type  */
> +uint64_t 

Re: [Xen-devel] [OSSTEST PATCH] MaxUmask: enforce a maximum umask value

2017-10-18 Thread Roger Pau Monné
On Tue, Oct 17, 2017 at 12:10:38PM +0100, Ian Jackson wrote:
> On some operating systems, the default umask is not 002 as it should
> be (for the sensible setup with personal groups).
> 
> If a user with an 022 or 077 umask invokes osstest in Executive mode,
> they end up creating directories in $c{Logs} which are writeable only
> by them, and that can stop the whole system because the service user
> cannot expire them.
> 
> Prevent this from happening.
> 
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10] string: fix memmove when size is 0

2017-10-18 Thread Roger Pau Monné
On Tue, Oct 17, 2017 at 07:00:25AM -0600, Jan Beulich wrote:
> >>> On 17.10.17 at 14:52,  wrote:
> > On Tue, Oct 17, 2017 at 01:41:35PM +0100, Andrew Cooper wrote:
> >> There are many passed values which could trigger this warning.  Does
> >> 
> >> diff --git a/xen/arch/x86/string.c b/xen/arch/x86/string.c
> >> index cd85a38..4f55856 100644
> >> --- a/xen/arch/x86/string.c
> >> +++ b/xen/arch/x86/string.c
> >> @@ -47,7 +47,7 @@ void *(memmove)(void *dest, const void *src, size_t n)
> >>  "   rep movsb   ; "
> >>  "   cld   "
> >>  : "=" (d0), "=" (d1), "=" (d2)
> >> -: "0" (n), "1" (n-1+(const char *)src), "2" (n-1+(char *)dest)
> >> +: "0" (n), "1" ((uintptr_t)src + n - 1), "2" ((uintptr_t)dest + n 
> >> - 1)
> >>  : "memory");
> >>  
> >>  return dest;
> >> 
> >> work any better?
> > 
> > That does indeed work, but I'm not sure if it would mask legitimate
> > pointer overflows by casting them into integers.
> 
> It certainly would, as the tool can't possibly know that the asm()
> itself then effectively casts the integers back to pointers (i.e. it
> has no basis to try to "look through" the cast and continue analysis).

I assume there are no further steps for me, just wait for Julien's
release Ack.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 for-4.10] ubsan: add clang 5.0 support

2017-10-18 Thread Roger Pau Monné
On Wed, Oct 18, 2017 at 03:53:37AM -0600, Jan Beulich wrote:
> >>> On 18.10.17 at 11:42,  wrote:
> > On Wed, Oct 18, 2017 at 03:23:20AM -0600, Jan Beulich wrote:
> >> >>> On 18.10.17 at 09:45,  wrote:
> >> > +void __ubsan_handle_pointer_overflow(struct pointer_overflow_data *data,
> >> > +unsigned long base, unsigned long 
> >> > result)
> >> > +{
> >> > +unsigned long flags;
> >> > +
> >> > +if (suppress_report(>location))
> >> > +return;
> >> > +
> >> > +ubsan_prologue(>location, );
> >> > +
> >> > +if (((long)base >= 0) == ((long)result >= 0))
> >> > +pr_err("pointer operation %s %p to %p\n",
> >> > +base > result ? "underflowed" : "overflowed",
> >> > +(void *)base, (void *)result);
> >> > +else
> >> > +pr_err("pointer index expression with base %p 
> >> > overflowed to %p\n",
> >> > +(void *)base, (void *)result);
> >> 
> >> Would you mind explaining the difference between if and else
> >> branches? (I do realize I should have asked this on v1 already,
> >> but I didn't pay enough attention.) Whatever the idea behind
> >> this, it should probably be explained in a comment, as it looks
> >> to be heuristic.
> > 
> > The upstream commit is:
> > 
> > https://github.com/llvm-mirror/compiler-rt/commit/079b7657767dcc0fb284225c277d
> >  
> > 2b9ce73e423b
> > 
> > However it's lacking a proper commit message. It seems to me like it's
> > there to detect addition of signed + unsigned values when an overflow
> > happens, but I don't really see it's value rather than just using the
> > first message.
> 
> Right - me too. I'd therefore like to simply drop the "if" and the "else"
> branch (likely easily done while committing), and then the change is
> Acked-by: Jan Beulich 

Yes, feel free to drop the if/else and just keep the first error
message.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 for-4.10] ubsan: add clang 5.0 support

2017-10-18 Thread Roger Pau Monné
On Wed, Oct 18, 2017 at 03:23:20AM -0600, Jan Beulich wrote:
> >>> On 18.10.17 at 09:45,  wrote:
> > +void __ubsan_handle_pointer_overflow(struct pointer_overflow_data *data,
> > +   unsigned long base, unsigned long result)
> > +{
> > +   unsigned long flags;
> > +
> > +   if (suppress_report(>location))
> > +   return;
> > +
> > +   ubsan_prologue(>location, );
> > +
> > +   if (((long)base >= 0) == ((long)result >= 0))
> > +   pr_err("pointer operation %s %p to %p\n",
> > +   base > result ? "underflowed" : "overflowed",
> > +   (void *)base, (void *)result);
> > +   else
> > +   pr_err("pointer index expression with base %p overflowed to 
> > %p\n",
> > +   (void *)base, (void *)result);
> 
> Would you mind explaining the difference between if and else
> branches? (I do realize I should have asked this on v1 already,
> but I didn't pay enough attention.) Whatever the idea behind
> this, it should probably be explained in a comment, as it looks
> to be heuristic.

The upstream commit is:

https://github.com/llvm-mirror/compiler-rt/commit/079b7657767dcc0fb284225c277d2b9ce73e423b

However it's lacking a proper commit message. It seems to me like it's
there to detect addition of signed + unsigned values when an overflow
happens, but I don't really see it's value rather than just using the
first message.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-18 Thread Roger Pau Monné
On Wed, Oct 18, 2017 at 10:46:57AM +0200, Paolo Bonzini wrote:
> On 18/10/2017 10:32, Roger Pau Monné wrote:
> >> I'll have a try to check how much the differences would affect. If it
> >> would not take too much work, I'd like to adapt Xen NVDIMM enabling
> >> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
> >> and MST's suggestions.
> > I don't agree with the end goal of fully switching to the QEMU build
> > ACPI tables. First of all, the only entity that has all the
> > information about the guest it's the toolstack, and so it should be
> > the one in control of the ACPI tables.
> > 
> > Also, Xen guests can use several device models concurrently (via the
> > ioreq server interface), and each should be able to contribute to the
> > information presented in the ACPI tables. Intel is also working on
> > adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
> > ACPI tables should be created by the toolstack and not QEMU. And
> > finally keep in mind that there are Xen guests (PVH) that use ACPI
> > tables but not QEMU.
> 
> I agree with this in fact; QEMU has a view of _most_ of the emulated
> hardware, but not all.
> 
> However, I disagree that the toolstack should be alone in controlling
> the ACPI tables; rather, each involved part of the stack should be
> providing its own part of the tables.  For example, QEMU (in addition to
> NVDIMM information) should be the one providing an SSDT for southbridge
> devices (floppy, COMx, LPTx, etc.).

Yes, that's what I wanted to say, rather than the toolstack providing
all the ACPI tables by itself. Every component should provide the
tables of the devices under it's control, and that should be glued
together by the toolstack (ie: hvmloader).

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-18 Thread Roger Pau Monné
On Tue, Oct 17, 2017 at 08:16:47PM +0800, Haozhong Zhang wrote:
> On 10/17/17 13:45 +0200, Paolo Bonzini wrote:
> > On 14/10/2017 00:46, Stefano Stabellini wrote:
> > > On Fri, 13 Oct 2017, Jan Beulich wrote:
> > > On 13.10.17 at 13:13,  wrote:
> > >>> To Jan, Andrew, Stefano and Anthony,
> > >>>
> > >>> what do you think about allowing QEMU to build the entire guest ACPI
> > >>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
> > >>> still there and just bypassed in this case.
> > >> Well, if that can be made work in a non-quirky way and without
> > >> loss of functionality, I'd probably be fine. I do think, however,
> > >> that there's a reason this is being handled in hvmloader right now.
> > > And not to discourage you, just as a clarification, you'll also need to
> > > consider backward compatibility: unless the tables are identical, I
> > > imagine we'll have to keep using the old tables for already installed
> > > virtual machines.
> > 
> > I agree.  Some of them are already identical, some are not but the QEMU
> > version should be okay, and for yet more it's probably better to keep
> > the Xen-specific parts in hvmloader.
> > 
> > The good thing is that it's possible to proceed incrementally once you
> > have the hvmloader support for merging the QEMU and hvmloader RSDT or
> > XSDT (whatever you are using), starting with just NVDIMM and proceeding
> > later with whatever you see fit.
> > 
> 
> I'll have a try to check how much the differences would affect. If it
> would not take too much work, I'd like to adapt Xen NVDIMM enabling
> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
> and MST's suggestions.

I don't agree with the end goal of fully switching to the QEMU build
ACPI tables. First of all, the only entity that has all the
information about the guest it's the toolstack, and so it should be
the one in control of the ACPI tables.

Also, Xen guests can use several device models concurrently (via the
ioreq server interface), and each should be able to contribute to the
information presented in the ACPI tables. Intel is also working on
adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
ACPI tables should be created by the toolstack and not QEMU. And
finally keep in mind that there are Xen guests (PVH) that use ACPI
tables but not QEMU.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10 2/2] ubsan: disable unaligned access checks

2017-10-17 Thread Roger Pau Monné
On Tue, Oct 17, 2017 at 01:56:18PM +0100, Wei Liu wrote:
> On Tue, Oct 17, 2017 at 12:36:44PM +0100, Roger Pau Monne wrote:
> > Currently there are many offenders of the unaligned access checks,
> > which makes booting with the unaligned check a PVH Dom0 impossible.
> > 
> > The main offenders seem to be the ACPI code, the VMX code and
> > specially the intremap code (set_ire_sid).
> > 
> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> > ---
> > Cc: Andrew Cooper <andrew.coop...@citrix.com>
> > Cc: George Dunlap <george.dun...@eu.citrix.com>
> > Cc: Ian Jackson <ian.jack...@eu.citrix.com>
> > Cc: Jan Beulich <jbeul...@suse.com>
> > Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> > Cc: Stefano Stabellini <sstabell...@kernel.org>
> > Cc: Tim Deegan <t...@xen.org>
> > Cc: Wei Liu <wei.l...@citrix.com>
> > Cc: Julien Grall <julien.gr...@arm.com>
> > ---
> > I'm not sure whether we prefer to fix the offenders, or just disable
> > the alignment wholesale. In any case if we decide to disable the
> > check, the patch should have vary low impact, and hence should be
> > committed to 4.10 on the base that it only affects ubsan, which is not
> > enabled by default and not to be used on production systems.
> 
> I would very much like to fix the offenders but if the fixes turn out to
> be cumbersome, so be it.
> 
> What is wrong to leave this enabled? Each location is reported once,
> right?

With clang it's reported every time it's hit AFAICT (certainly more
than once).

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10 1/2] ubsan: add clang 5.0 support

2017-10-17 Thread Roger Pau Monné
On Tue, Oct 17, 2017 at 06:43:28AM -0600, Jan Beulich wrote:
> >>> On 17.10.17 at 13:36, <roger@citrix.com> wrote:
> > clang 5.0 changed the layout of the type_mismatch_data structure and
> > introduced __ubsan_handle_type_mismatch_v1 and
> > __ubsan_handle_pointer_overflow.
> > 
> > This commit adds support for the new structure layout, adds the
> > missing handlers and the new types for type_check_kinds.
> > 
> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> 
> Acked-by: Jan Beulich <jbeul...@suse.com>
> with a remark:
> 
> > +void __ubsan_handle_pointer_overflow(struct pointer_overflow_data *data,
> > +   unsigned long base, unsigned long result)
> > +{
> > +   unsigned long flags;
> > +
> > +   if (suppress_report(>location))
> > +   return;
> > +
> > +   ubsan_prologue(>location, );
> > +
> > +   pr_err("pointer overflow:\n");
> > +
> > +   if (((long)base >= 0) == ((long)result >= 0))
> > +   pr_err("%s of unsigned offset to %p overflowed to %p\n",
> > +   base > result ? "addition" : "subtraction",
> 
> Strictly speaking you also want to make "to" conditional upon this
> being an add; for subtract it ought to be "from". Or perhaps just
> say overflow and underflow?
> 
> And then - is "base > result" really a valid determination of
> add/subtract (or overflow/underflow)? If the pointed to type is
> wider than one byte, an addition may wrap one or more times
> and still yield a value larger than the starting pointer.

That code is mostly adapted from upstream llvm:

https://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/ubsan/ubsan_handlers.cc?revision=313572=markup

But I see your point, if the types have different widths that check
won't work properly. In any case, I don't see a better way to deal
with this.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10] string: fix memmove when size is 0

2017-10-17 Thread Roger Pau Monné
On Tue, Oct 17, 2017 at 01:41:35PM +0100, Andrew Cooper wrote:
> On 17/10/17 13:03, Roger Pau Monne wrote:
> > ubsan in clang 5.0 complains with:
> >
> > (XEN) UBSAN: Undefined behaviour in string.c:50:28
> > (XEN) pointer overflow:
> > (XEN) addition of unsigned offset to 8310 overflowed to 
> > 830f
> > [...]
> > (XEN) Xen call trace:
> > (XEN)[] ubsan.c#ubsan_epilogue+0xd/0xc0
> > (XEN)[] __ubsan_handle_pointer_overflow+0xa5/0xe0
> > (XEN)[] memmove+0x67/0x80
> > (XEN)[] page_alloc.c#bootmem_region_add+0x157/0x220
> > (XEN)[] init_boot_pages+0x56/0x220
> > (XEN)[] __start_xen+0x2c2d/0x4b50
> > (XEN)[] __high_start+0x53/0x60
> >
> > This is due to memmove doing a n-1+addr when n is 0. This is harmless
> > because the loop is bounded to 0. Fix this by returning earlier when n
> > is 0.
> >
> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> 
> There are many passed values which could trigger this warning.  Does
> 
> diff --git a/xen/arch/x86/string.c b/xen/arch/x86/string.c
> index cd85a38..4f55856 100644
> --- a/xen/arch/x86/string.c
> +++ b/xen/arch/x86/string.c
> @@ -47,7 +47,7 @@ void *(memmove)(void *dest, const void *src, size_t n)
>  "   rep movsb   ; "
>  "   cld   "
>  : "=" (d0), "=" (d1), "=" (d2)
> -: "0" (n), "1" (n-1+(const char *)src), "2" (n-1+(char *)dest)
> +: "0" (n), "1" ((uintptr_t)src + n - 1), "2" ((uintptr_t)dest + n - 
> 1)
>  : "memory");
>  
>  return dest;
> 
> work any better?

That does indeed work, but I'm not sure if it would mask legitimate
pointer overflows by casting them into integers. In any case, as said
on IRC the only problematic case ATM is when n == 0.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10] xen/dom0: Fix latent dom0 construction bugs on all architectures

2017-10-17 Thread Roger Pau Monné
On Mon, Oct 16, 2017 at 03:38:03PM +0100, Andrew Cooper wrote:
>  * x86 PV and ARM dom0's must not clear _VPF_down from v->pause_flags until
>all state is actually set up.  As it currently stands, d0v0 is eligible for
>scheduling before its registers have been set.  This is latent as we also
>hold a systemcontroller pause reference at the time which prevents d0 from
>being scheduled.
> 
>  * x86 PVH dom0's must set v->is_initialised on d0v0, to prevent another vcpu
>being able to call VCPUOP_initialise and modify state under the feet of the
>running vcpu.  This is latent as PVH dom0 construction don't yet function.
> 
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] mm/shadow: fix declaration of fetch_type_names

2017-10-17 Thread Roger Pau Monné
On Tue, Oct 17, 2017 at 11:29:14AM +0100, Andrew Cooper wrote:
> On 17/10/17 11:23, Roger Pau Monne wrote:
> > fetch_type_names usage is guarded by SHADOW_DEBUG_PROPAGATE in
> > SHADOW_DEBUG, fix the declaration so it's also guarded by
> > SHADOW_DEBUG_PROPAGATE instead of DEBUG_TRACE_DUMP.
> >
> > Signed-off-by: Roger Pau Monné <roger@citrix.com>
> 
> Possibly worth noting that this is exposed by Clang when building with
> UBSAN ?

Yes, I hope this can be done upon commit if there are no further
changes required. I would add:

"Noted while building with clang and ubsan enabled"

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] mm/shadow: fix declaration of fetch_type_names

2017-10-17 Thread Roger Pau Monné
Fix Julien Grall address, sorry.

On Tue, Oct 17, 2017 at 11:23:53AM +0100, Roger Pau Monne wrote:
> fetch_type_names usage is guarded by SHADOW_DEBUG_PROPAGATE in
> SHADOW_DEBUG, fix the declaration so it's also guarded by
> SHADOW_DEBUG_PROPAGATE instead of DEBUG_TRACE_DUMP.
> 
> Signed-off-by: Roger Pau Monné <roger@citrix.com>
> ---
> Cc: Tim Deegan <t...@xen.org>
> Cc: George Dunlap <george.dun...@eu.citrix.com>
> Cc: Jan Beulich <jbeul...@suse.com>
> Cc: Andrew Cooper <andrew.coop...@citrix.com>
> Cc: Julien Grall <julien.gr...@citrix.com>
> ---
> IMHO, this is a simple compile-time fix, so it should be accepted for
> 4.10. Any breaking caused by this commit will be spotted at compile
> time.
> ---
>  xen/arch/x86/mm/shadow/multi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> index d540af11d7..9156382056 100644
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -77,7 +77,7 @@ typedef enum {
>  
>  extern const char *const fetch_type_names[];
>  
> -#if defined(DEBUG_TRACE_DUMP) && CONFIG_PAGING_LEVELS == GUEST_PAGING_LEVELS
> +#if SHADOW_DEBUG_PROPAGATE && CONFIG_PAGING_LEVELS == GUEST_PAGING_LEVELS
>  const char *const fetch_type_names[] = {
>  [ft_prefetch] = "prefetch",
>  [ft_demand_read]  = "demand read",
> -- 
> 2.13.5 (Apple Git-94)
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.10] xen/dom0: Fix latent dom0 construction bugs on all architectures

2017-10-16 Thread Roger Pau Monné
On Mon, Oct 16, 2017 at 03:38:03PM +0100, Andrew Cooper wrote:
>  * x86 PV and ARM dom0's must not clear _VPF_down from v->pause_flags until
>all state is actually set up.  As it currently stands, d0v0 is eligible for
>scheduling before its registers have been set.  This is latent as we also
>hold a systemcontroller pause reference at the time which prevents d0 from
>being scheduled.
> 
>  * x86 PVH dom0's must set v->is_initialised on d0v0, to prevent another vcpu
>being able to call VCPUOP_initialise and modify state under the feet of the
>running vcpu.  This is latent as PVH dom0 construction don't yet function.
> 
> Signed-off-by: Andrew Cooper 

LGTM, just one question.

> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> index e8f746c..a67071c 100644
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -614,6 +614,7 @@ static int __init pvh_setup_cpus(struct domain *d, 
> paddr_t entry,
>  
>  update_domain_wallclock_time(d);
>  
> +v->is_initialised = 1;
>  clear_bit(_VPF_down, >pause_flags);

Don't you want to move this to the end of dom0_construct_pvh? In any
case, at this point the vCPU state is already setup correctly.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 09/11] vpci/msi: add MSI handlers

2017-10-13 Thread Roger Pau Monné
On Tue, Oct 10, 2017 at 11:35:26AM +, Roger Pau Monné wrote:
> On Wed, Oct 04, 2017 at 08:34:13AM +, Jan Beulich wrote:
> > >>> On 19.09.17 at 17:29, <roger@citrix.com> wrote:
> > > +static void vpci_msi_enable(const struct pci_dev *pdev, struct vpci_msi 
> > > *msi,
> > > +unsigned int vectors)
> > > +{
> > > +int ret;
> > > +
> > > +ASSERT(!msi->enabled);
> > > +ret = vpci_msi_arch_enable(msi, pdev, vectors);
> > > +if ( ret )
> > > +return;
> > > +
> > > +/* Apply the mask bits. */
> > > +if ( msi->masking )
> > > +{
> > > +unsigned int i;
> > > +uint32_t mask = msi->mask;
> > > +
> > > +for ( i = ffs(mask) - 1; mask && i < vectors; i = ffs(mask) - 1 )
> > > +{
> > > +vpci_msi_arch_mask(msi, pdev, i, true);
> > > +__clear_bit(i, );
> > > +}
> > > +}
> > > +
> > > +__msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> > > + PCI_FUNC(pdev->devfn), msi->pos, 1);
> > 
> > This is very unlikely to be a function that arch-independent code is
> > permitted to call.
> 
> Right, I could remove the '__' prefix, or introduce a
> vpci_msi_arch_dev_enable helper that calls this function.

So would using msi_set_enable instead be acceptable?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/3] x86/pvh: use max_pdx to calculate the paging memory usage

2017-10-13 Thread Roger Pau Monné
On Fri, Oct 13, 2017 at 08:59:29AM +, Jan Beulich wrote:
> >>> On 13.10.17 at 10:49,  wrote:
>  On 29.09.17 at 13:25,  wrote:
> >> nr_pages doesn't take into account holes or MMIO regions, and
> >> underestimates the amount of memory needed for paging. Be on the safe
> >> side and use max_pdx instead.
> >> 
> >> Note that both cases are just approximations, but using max_pdx yields
> >> a number of free pages after Dom0 build always greater than the
> >> minimum reserve (either 1/16 of memory or 128MB, whatever is
> >> smaller).
> >> 
> >> Without this patch on a 16GB box the amount of free memory after
> >> building Dom0 without specifying any dom0_mem parameter would be
> >> 122MB, with this patch applied the amount of free memory after Dom0
> >> build is 144MB, which is greater than the reserved 128MB.
> > 
> > For the case of there not being a "dom0_mem=" this may indeed
> > be acceptable (albeit I notice the gap is larger than before, just
> > this time in the right direction). For the supposedly much more
> > common case of there being "dom0_mem=" (and with a positive
> > value), however, not using nr_pages ...
> >> @@ -288,7 +289,7 @@ unsigned long __init dom0_compute_nr_pages(
> >>  break;
> >>  
> >>  /* Reserve memory for shadow or HAP. */
> >> -avail -= dom0_paging_pages(d, nr_pages);
> >> +avail -= paging_pgs;
> > 
> > ... here is likely going to result in a huge overestimation.
> 
> Which I realize may or may not be a problem - the question is
> whether and if so how far the clamping done by
> 
> nr_pages = min(nr_pages, avail);
> 
> above here would result in a meaningfully different amount of
> memory Dom0 may get for certain command line option / total
> amount of memory combinations. I.e. quite a bit more than a
> single data point would need to be provided to prove this isn't
> going to be perceived as a regression by anyone.

What about using something like max_pdx - total_pages + nr_pages, that
seems like a good compromise that should take into account the MMIO
holes without overestimating as much as just using max_pdx.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [osstest test] 114342: regressions - trouble: blocked/broken/fail/pass

2017-10-12 Thread Roger Pau Monné
On Thu, Oct 12, 2017 at 02:24:23PM +, Ian Jackson wrote:
> osstest service owner writes ("[osstest test] 114342: regressions - trouble: 
> blocked/broken/fail/pass"):
> > flight 114342 osstest real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/114342/
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-armhf-armhf-xl-vhd  broken
> >  test-armhf-armhf-xl-vhd   4 host-install(4)broken REGR. vs. 
> > 114191
> >  test-armhf-armhf-examine  4 host-install   broken REGR. vs. 
> > 114191
> >  test-amd64-i386-xl-qemuu-ovmf-amd64 10 debian-hvm-install fail REGR. vs. 
> > 114191
> >  build-armhf-xsm   6 xen-buildfail REGR. vs. 
> > 114191
> 
> None of these are related to the only osstest patch under test,
> 
>  pvh: rename pvh tests to pvhv2
> 
> The armhf build failure looks like the armhf node crashing.
> 
> I have therefore force pushed this:
> 
> > version targeted for testing:
> >  osstest  2fe57c885d0290caf2d707893bb3bf3f5e8165f5

Thanks, hopefully this will unblock the branches stuck because of the
pvh "regression".

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] xl: dm_restrict: Document that it does not work with PV

2017-10-12 Thread Roger Pau Monné
On Thu, Oct 12, 2017 at 11:21:07AM +, Ian Jackson wrote:
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>
> Reported-by: Roger Pau Monné <roger@citrix.com>

Reviewed-by: Roger Pau Monné <roger@citrix.com>

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] libxl: dm_restrict: Move to domain_build_info

2017-10-12 Thread Roger Pau Monné
On Thu, Oct 12, 2017 at 11:21:06AM +, Ian Jackson wrote:
> Right now, this is broken because libxl__build_device_model_args_new
> is used also for the qemu run for pv guests for qdisk devices, pvfb,
> etc.
> 
> We can either make this option properly HVM-specific, or make it
> generic.
> 
> In principle it is a reasonable request, to make the PV qemu
> deprivileged (even though it is not likely to be implemented any time
> soon).  So make this option generic.
> 
> We retain the name "device model" even though it is arguably
> inaccurate, because the xl docs already say, for example
>   For a PV guest a device-model is sometimes used to provide backends
>   for certain PV devices
> 
> The documentation patch here is pure code motion.  For ease of review
> we will fix up the docs, so the wording to be right for the new
> context, in the next patch.
> 
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>
> Reported-by: Roger Pau Monné <roger@citrix.com>
> ---
>  docs/man/xl.cfg.pod.5.in| 198 
> ++--
>  tools/libxl/libxl_create.c  |   2 +-
>  tools/libxl/libxl_dm.c  |   6 +-
>  tools/libxl/libxl_types.idl |   2 +-

This is at least missing a change to xl_parse.c AFAICT.

The rest LGTM.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [qemu-upstream-unstable test] 114273: regressions - FAIL

2017-10-11 Thread Roger Pau Monné
On Wed, Oct 11, 2017 at 11:34:23AM +, Anthony PERARD wrote:
> On Wed, Oct 11, 2017 at 04:33:26AM +, osstest service owner wrote:
> > flight 114273 qemu-upstream-unstable real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/114273/
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-amd64-xl-pvh-intel 12 guest-start fail REGR. vs. 
> > 114029
> >  test-amd64-amd64-xl-pvh-amd  12 guest-start  fail REGR. vs. 
> > 114029
> 
> Hi,
> 
> Is this an expected failure due to recent change about pvh (in libxl)?
> 
> In the logs, `xl create` fails with this:
> xc: error: panic: xc_dom_x86.c:1587: bootlate_pv: pin_table failed (pfn 
> 0x2374, rc=1): Internal error
> libxl: error: libxl_dom.c:745:libxl__build_dom: xc_dom_boot_image failed: 
> Invalid argument
> domainbuilder: detail: xc_dom_release: called
> 
> 
> Can we force push? as this error does not seems to be related to QEMU.

I've sent a patch for osstest this morning that should solve this:

https://lists.xenproject.org/archives/html/xen-devel/2017-10/msg01328.html

Ian has already pushed it, so it shouldn't take long before this is
fixed, sorry for the breakage.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [linux-3.18 bisection] complete test-amd64-amd64-xl-pvh-intel

2017-10-11 Thread Roger Pau Monné
On Wed, Oct 11, 2017 at 04:54:40AM +, osstest service owner wrote:
> branch xen-unstable
> xenbranch xen-unstable
> job test-amd64-amd64-xl-pvh-intel
> testid guest-start
> 
> Tree: linux 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
> Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
> Tree: qemuu git://xenbits.xen.org/qemu-xen.git
> Tree: xen git://xenbits.xen.org/xen.git
> 
> *** Found and reproduced problem changeset ***
> 
>   Bug is in tree:  xen git://xenbits.xen.org/xen.git
>   Bug introduced:  c7dfe4ac58dd9c8678126b78da961b233a49f3f9
>   Bug not present: 3c44f8ed44ab559c7e5b58316751bea53adfd83b
>   Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/114323/
> 
> 
>   commit c7dfe4ac58dd9c8678126b78da961b233a49f3f9
>   Author: Roger Pau Monne 
>   Date:   Fri Sep 22 16:25:06 2017 +0100
>   
>   xl: introduce a domain type option
>   
>   Introduce a new type option to xl configuration files in order to
>   specify the domain type. This supersedes the current builder option.
>   
>   The new option is documented in the xl.cfg man page, and the previous
>   builder option is marked as deprecated.
>   
>   Signed-off-by: Roger Pau Monn?? 
>   Acked-by: Ian Jackson 

This branch will have to be force pushed, together with the 4.9 one
AFAICT. This test succeed previously because we where testing a classic
PV guest instead of a PVH one.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


  1   2   3   4   5   6   7   8   9   10   >