[Xen-devel] [xen-4.6-testing test] 112661: tolerable FAIL - PUSHED

2017-08-16 Thread osstest service owner
flight 112661 xen-4.6-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112661/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-xtf-amd64-amd64-4 47 xtf/test-hvm64-lbr-tsx-vmentry fail in 112648 pass 
in 112661
 test-xtf-amd64-amd64-4 21 xtf/test-hvm32-invlpg~shadow fail pass in 112648
 test-xtf-amd64-amd64-4  35 xtf/test-hvm32pae-invlpg~shadow fail pass in 112648
 test-xtf-amd64-amd64-3   48 xtf/test-hvm64-lbr-tsx-vmentry fail pass in 112648
 test-xtf-amd64-amd64-4 47 xtf/test-hvm64-invlpg~shadow fail pass in 112648
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail 
pass in 112648

Tests which did not succeed, but are not blocking:
 test-xtf-amd64-amd64-3 21 xtf/test-hvm32-invlpg~shadow fail in 112648 like 
111467
 test-xtf-amd64-amd64-3 34 xtf/test-hvm32pae-invlpg~shadow fail in 112648 like 
111467
 test-xtf-amd64-amd64-3 46 xtf/test-hvm64-invlpg~shadow fail in 112648 like 
111467
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail in 112648 never pass
 test-xtf-amd64-amd64-2  48 xtf/test-hvm64-lbr-tsx-vmentry fail like 111492
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check fail  like 111492
 test-xtf-amd64-amd64-5  48 xtf/test-hvm64-lbr-tsx-vmentry fail like 111514
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail  like 111514
 test-armhf-armhf-libvirt 14 saverestore-support-check fail  like 111514
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 111514
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 111514
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail  like 111514
 test-xtf-amd64-amd64-5   70 xtf/test-pv32pae-xsa-194 fail   never pass
 test-xtf-amd64-amd64-2   70 xtf/test-pv32pae-xsa-194 fail   never pass
 test-xtf-amd64-amd64-1   70 xtf/test-pv32pae-xsa-194 fail   never pass
 test-amd64-amd64-xl-pvh-intel 15 guest-saverestore fail  never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install fail never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check fail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-install fail never pass
 test-amd64-i386-libvirt  13 migrate-support-check fail   never pass
 test-xtf-amd64-amd64-3   70 xtf/test-pv32pae-xsa-194 fail   never pass
 test-xtf-amd64-amd64-4   70 xtf/test-pv32pae-xsa-194 fail   never pass
 test-amd64-amd64-xl-pvh-amd  12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore   fail never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore   fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check fail   never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass

[Xen-devel] "MMIO emulation failed" from booting OVMF on Xen v4.9.0

2017-08-16 Thread Andri Möll

Hey,

As per Andrew [Cooper]'s suggestion, writing here instead of #xen on 
Freenode.


I'm trying out Xen (4.9.0) with OVMF (r21243.3858b4a1ff-1) and having it 
crash right on boot with both the 32-bit and 64-bit OVMF binaries. This is 
on Arch Linux, AMD Ryzen on an X370 motherboard.


Given the following minimal VM declaration:

builder = "hvm"
maxmem = 512
memory = 512
vcpus = 1
on_poweroff = "destroy"
on_reboot = "destroy"
on_crash = "destroy"
bios = "ovmf"
device_model_version = "qemu-xen"
bios_path_override = "/usr/share/ovmf/ovmf_code_ia32.bin"

and running it with `xl create vm.cfg`, I see it crash while booting 
with the following displayed by `xl dmesg`:


(XEN) MMIO emulation failed: d1v0 16bit @ f000:ff54 -> 66 ea 5c ff 
ff ff 10 00 b8 40 06 00 00 0f 22

(XEN) d1v0 Triple fault - invoking HVM shutdown action 1
I've run the hypervisor with `guest_loglvl=all` for more output and 
attached it here and uploaded it at 
https://gist.github.com/moll/a46dffc7466ced93a0365a6916a4db96 in case 
the file doesn't go through.


Any ideas anyone? Thanks in advance!

Andri
(XEN) HVM1 save: CPU
(XEN) HVM1 save: PIC
(XEN) HVM1 save: IOAPIC
(XEN) HVM1 save: LAPIC
(XEN) HVM1 save: LAPIC_REGS
(XEN) HVM1 save: PCI_IRQ
(XEN) HVM1 save: ISA_IRQ
(XEN) HVM1 save: PCI_LINK
(XEN) HVM1 save: PIT
(XEN) HVM1 save: RTC
(XEN) HVM1 save: HPET
(XEN) HVM1 save: PMTIMER
(XEN) HVM1 save: MTRR
(XEN) HVM1 save: VIRIDIAN_DOMAIN
(XEN) HVM1 save: CPU_XSAVE
(XEN) HVM1 save: VIRIDIAN_VCPU
(XEN) HVM1 save: VMCE_VCPU
(XEN) HVM1 save: TSC_ADJUST
(XEN) HVM1 save: CPU_MSR
(XEN) HVM1 restore: CPU 0
(d1) HVM Loader
(d1) Detected Xen v4.9.0
(d1) Xenbus rings @0xfeffc000, event channel 1
(d1) System requested OVMF
(d1) CPU speed is 3001 MHz
(d1) Relocating guest memory for lowmem MMIO space disabled
(d1) PCI-ISA link 0 routed to IRQ5
(d1) PCI-ISA link 1 routed to IRQ10
(d1) PCI-ISA link 2 routed to IRQ11
(d1) PCI-ISA link 3 routed to IRQ5
(d1) pci dev 01:3 INTA->IRQ10
(d1) pci dev 02:0 INTA->IRQ11
(d1) No RAM in high memory; setting high_mem resource base to 1
(d1) pci dev 03:0 bar 10 size 00200: 0f008
(d1) pci dev 02:0 bar 14 size 00100: 0f208
(d1) pci dev 03:0 bar 30 size 1: 0f300
(d1) pci dev 03:0 bar 14 size 01000: 0f301
(d1) pci dev 02:0 bar 10 size 00100: 0c001
(d1) pci dev 01:1 bar 20 size 00010: 0c101
(d1) Multiprocessor initialisation:
(d1)  - CPU0 ... 48-bit phys ... fixed MTRRs ... var MTRRs [1/8] ... done.
(d1) Writing SMBIOS tables ...
(d1) Loading OVMF ...
(XEN) d1v0 Over-allocation for domain 1: 131329 > 131328
(d1) Loading ACPI ...
(d1) CONV disabled
(d1) vm86 TSS at fc00a400
(d1) BIOS map:
(d1)  ffe0-fffd: Main BIOS
(d1) E820 table:
(d1)  [00]: : - :000a: RAM
(d1)  HOLE: :000a - :000f
(d1)  [01]: :000f - :0010: RESERVED
(d1)  [02]: :0010 - :1f715000: RAM
(d1)  HOLE: :1f715000 - :fc00
(d1)  [03]: :fc00 - 0001:: RESERVED
(d1) Invoking OVMF ...
(XEN) MMIO emulation failed: d1v0 16bit @ f000:ff54 -> 66 ea 5c ff ff ff 10 
00 b8 40 06 00 00 0f 22
(XEN) d1v0 Triple fault - invoking HVM shutdown action 1
(XEN) *** Dumping Dom1 vcpu#0 state: ***
(XEN) [ Xen-4.9.0  x86_64  debug=n   Not tainted ]
(XEN) CPU:6
(XEN) RIP:f000:[]
(XEN) RFLAGS: 0046   CONTEXT: hvm guest (d1v0)
(XEN) rax: 4023   rbx: ff74   rcx: 
(XEN) rdx:    rsi:    rdi: 5042
(XEN) rbp:    rsp:    r8:  
(XEN) r9:     r10:    r11: 
(XEN) r12:    r13:    r14: 
(XEN) r15:    cr0: 4033   cr4: 
(XEN) cr3:    cr2: 
(XEN) ds: f000   es:    fs:    gs:    ss:    cs: f000
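
[Editor's note: a plausible decode of the bytes in the "MMIO emulation
failed" line above, assuming they are fetched as 16-bit real-mode code;
this analysis is an editorial addition, not from the thread.]

    /*
     * 66 ea 5c ff ff ff 10 00   jmp far 0x0010:0xffffff5c  (o32 far jump)
     * b8 40 06 00 00            mov eax, 0x640  (if decoded as 32-bit code)
     * 0f 22 ...                 mov to a control register  (truncated)
     *
     * This looks like firmware startup code far-jumping into flat 32-bit
     * code and then loading a control register; presumably it is the
     * operand-size-overridden far jump that the emulator rejects.
     */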


[Xen-devel] linux-next: manual merge of the xen-tip tree with the tip tree

2017-08-16 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the xen-tip tree got a conflict in:

  arch/x86/entry/entry_64.S

between commit:

  UNWIND_HINT_IRET_REGS ("x86/entry/64: Add unwind hint annotations")

from the tip tree and commit:

  ad5b8c4ba323 ("xen: get rid of paravirt op adjust_exception_frame")

from the xen-tip tree.

I fixed it up (see below - though I don't know if a further adjustment
is required) and can carry the fix as necessary. This is now fixed as
far as linux-next is concerned, but any non trivial conflicts should be
mentioned to your upstream maintainer when your tree is submitted for
merging.  You may also want to consider cooperating with the maintainer
of the conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/entry/entry_64.S
index ca0b250eefc4,67fefaf21312..
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@@ -978,15 -891,17 +977,15 @@@ bad_gs
  ENTRY(do_softirq_own_stack)
pushq   %rbp
mov %rsp, %rbp
 -  incl    PER_CPU_VAR(irq_count)
 -  cmove   PER_CPU_VAR(irq_stack_ptr), %rsp
 -  push    %rbp    /* frame pointer backlink */
 +  ENTER_IRQ_STACK regs=0 old_rsp=%r11
    call    __do_softirq
 +  LEAVE_IRQ_STACK regs=0
    leaveq
 -  decl    PER_CPU_VAR(irq_count)
ret
 -END(do_softirq_own_stack)
 +ENDPROC(do_softirq_own_stack)
  
  #ifdef CONFIG_XEN
- idtentry xen_hypervisor_callback xen_do_hypervisor_callback has_error_code=0
+ idtentry hypervisor_callback xen_do_hypervisor_callback has_error_code=0
  
  /*
   * A note on the "critical region" in our callback handler.
@@@ -1053,9 -967,6 +1052,7 @@@ ENTRY(xen_failsafe_callback
    movq    8(%rsp), %r11
addq$0x30, %rsp
pushq   $0  /* RIP */
-   pushq   %r11
-   pushq   %rcx
 +  UNWIND_HINT_IRET_REGS offset=8
jmp general_protection
  1:/* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
    movq    (%rsp), %rcx
@@@ -1251,20 -1156,8 +1247,9 @@@ ENTRY(error_exit
  END(error_exit)
  
  /* Runs on exception stack */
+ /* XXX: broken on Xen PV */
  ENTRY(nmi)
 +  UNWIND_HINT_IRET_REGS
-   /*
-* Fix up the exception frame if we're on Xen.
-* PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
-* one value to the stack on native, so it may clobber the rdx
-* scratch slot, but it won't clobber any of the important
-* slots past it.
-*
-* Xen is a different story, because the Xen frame itself overlaps
-* the "NMI executing" variable.
-*/
-   PARAVIRT_ADJUST_EXCEPTION_FRAME
- 
/*
 * We allow breakpoints in NMIs. If a breakpoint occurs, then
 * the iretq it performs will take us out of NMI context.



Re: [Xen-devel] [PATCH v2 3/4] x86/vtd: introduce a PVH implementation of iommu_inclusive_mapping

2017-08-16 Thread Tian, Kevin
> From: Roger Pau Monne [mailto:roger@citrix.com]
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> On certain Intel systems, as far as I can tell almost all pre-Haswell
> ones, trying to boot a PVH Dom0 will freeze the box completely, up to
> the point that not even the watchdog works. The freeze happens exactly
> when enabling the DMA remapping in the IOMMU, the last line seen is:
> 
> (XEN) [VT-D]iommu_enable_translation: iommu->reg = 82c00021b000
> 
> In order to workaround this (which seems to be a lack of proper RMRR
> entries,

Since you position this patch as a 'workaround', what are the side effects
of the workaround? Do you want to restrict it to old boxes only?

It would also be better to put a comment in the code, so that others
reading it can understand why PVH requires its own path.

> plus the IOMMU being unable to generate faults and freezing the entire
> system) add a PVH specific implementation of iommu_inclusive_mapping,
> that maps non-RAM, non-unusable regions into Dom0 p2m. Note that care is
> taken to not map device MMIO regions that Xen is emulating, like the
> local APIC or the IO APIC.
> 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Kevin Tian 
> ---
>  xen/drivers/passthrough/vtd/extern.h  |  1 +
>  xen/drivers/passthrough/vtd/iommu.c   |  2 ++
>  xen/drivers/passthrough/vtd/x86/vtd.c | 39 +++
>  3 files changed, 42 insertions(+)
> 
> diff --git a/xen/drivers/passthrough/vtd/extern.h
> b/xen/drivers/passthrough/vtd/extern.h
> index fb7edfaef9..0eaf8956ff 100644
> --- a/xen/drivers/passthrough/vtd/extern.h
> +++ b/xen/drivers/passthrough/vtd/extern.h
> @@ -100,5 +100,6 @@ bool_t platform_supports_intremap(void);
>  bool_t platform_supports_x2apic(void);
> 
>  void vtd_set_hwdom_mapping(struct domain *d);
> +void vtd_set_pvh_hwdom_mapping(struct domain *d);
> 
>  #endif // _VTD_EXTERN_H_
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index daaed0abbd..8ed28defe2 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1303,6 +1303,8 @@ static void __hwdom_init
> intel_iommu_hwdom_init(struct domain *d)
>  /* Set up 1:1 page table for hardware domain. */
>  vtd_set_hwdom_mapping(d);
>  }
> +else if ( is_hvm_domain(d) )
> +vtd_set_pvh_hwdom_mapping(d);

Can you elaborate a bit here? Current condition is:

if ( !iommu_passthrough && !need_iommu(d) )
{
/* Set up 1:1 page table for hardware domain. */
vtd_set_hwdom_mapping(d);
}

So you assume that for PVH the above condition will never be true?

> 
>  setup_hwdom_pci_devices(d, setup_hwdom_device);
>  setup_hwdom_rmrr(d);
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c
> b/xen/drivers/passthrough/vtd/x86/vtd.c
> index 88a60b3307..79c9b0526f 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -21,10 +21,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "../iommu.h"
>  #include "../dmar.h"
> @@ -159,3 +161,40 @@ void __hwdom_init
> vtd_set_hwdom_mapping(struct domain *d)
>  }
>  }
> 
> +void __hwdom_init vtd_set_pvh_hwdom_mapping(struct domain *d)
> +{
> +unsigned long pfn;
> +
> +BUG_ON(!is_hardware_domain(d));
> +
> +if ( !iommu_inclusive_mapping )
> +return;
> +
> +/* NB: the low 1MB is already mapped in pvh_setup_p2m. */
> +for ( pfn = PFN_DOWN(MB(1)); pfn < PFN_DOWN(GB(4)); pfn++ )
> +{
> +p2m_access_t a;
> +int rc;
> +
> +if ( !(pfn & 0xfff) )
> +process_pending_softirqs();
> +
> +/* Skip RAM, ACPI and unusable regions. */
> +if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) ||
> + page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) ||
> + page_is_ram_type(pfn, RAM_TYPE_ACPI) ||
> + !iomem_access_permitted(d, pfn, pfn) )
> +continue;

I'm a bit confused here. So you only handle RESERVED memory
type here, which doesn't match the definition of inclusive mapping.

/*
 * iommu_inclusive_mapping: when set, all memory below 4GB is included in dom0
 * 1:1 iommu mappings except xen and unusable regions.
 */

there must be some background which I missed...

> +
> +ASSERT(!xen_in_range(pfn));
> +
> +a = rangeset_contains_range(mmio_ro_ranges, pfn, pfn) ? p2m_access_r
> +                                                       : p2m_access_rw;
> +rc = set_identity_p2m_entry(d, pfn, a, 0);
> +if ( rc )
> +   printk(XENLOG_WARNING VTDPREFIX
> +  " d%d: IOMMU mapping failed pfn %#lx: %d\n",
> +  d->domain_id, pfn, rc);
> +}
> +}
> +
> --
> 2.11.0 (Apple Git-81)


Re: [Xen-devel] [PATCH v2 2/4] x86/dom0: prevent PVH Dom0 from mapping read-only the IO APIC area

2017-08-16 Thread Tian, Kevin
> From: Roger Pau Monne
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> This is emulated by Xen and must not be mapped into PVH Dom0 p2m.

Same comment as on the previous one; please send this separately.

> 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
>  xen/arch/x86/dom0_build.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
> index 3e0910d779..804efee1a9 100644
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -402,7 +402,7 @@ int __init dom0_setup_permissions(struct domain
> *d)
>  for ( i = 0; i < nr_ioapics; i++ )
>  {
>  mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
> -if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
> +if ( dom0_pvh || !rangeset_contains_singleton(mmio_ro_ranges,
> mfn) )
>  rc |= iomem_deny_access(d, mfn, mfn);
>  }
>  /* MSI range. */
> --
> 2.11.0 (Apple Git-81)
> 


Re: [Xen-devel] [PATCH v2 1/4] x86/dom0: prevent access to MMCFG areas for PVH Dom0

2017-08-16 Thread Tian, Kevin
> From: Roger Pau Monne
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> Introduce a helper function to add the MMCFG areas to the list of
> denied iomem regions for PVH Dom0.
> 
> Signed-off-by: Roger Pau Monné 

This patch is a general fix, not just for inclusive mapping; please send
it separately.

> ---
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> ---
> Changes since RFC:
>  - Introduce as helper instead of exposing the internal mmcfg
>variables to the Dom0 builder.
> ---
>  xen/arch/x86/dom0_build.c |  4 
>  xen/arch/x86/x86_64/mmconfig_64.c | 21 +
>  xen/include/xen/pci.h |  2 ++
>  3 files changed, 27 insertions(+)
> 
> diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
> index 0c125e61eb..3e0910d779 100644
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain
> *d)
>  rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>  }
> 
> +/* For PVH prevent access to the MMCFG areas. */
> +if ( dom0_pvh )
> +rc |= pci_mmcfg_set_domain_permissions(d);
> +
>  return rc;
>  }
> 
> diff --git a/xen/arch/x86/x86_64/mmconfig_64.c
> b/xen/arch/x86/x86_64/mmconfig_64.c
> index e84a67dfc4..271fad407f 100644
> --- a/xen/arch/x86/x86_64/mmconfig_64.c
> +++ b/xen/arch/x86/x86_64/mmconfig_64.c
> @@ -15,6 +15,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> 
>  #include "mmconfig.h"
> 
> @@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
> cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
>  }
> 
> +int pci_mmcfg_set_domain_permissions(struct domain *d)
> +{
> +unsigned int idx;
> +int rc = 0;
> +
> +for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
> +{
> +const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
> +unsigned long start = PFN_DOWN(cfg->address) +
> +  PCI_BDF(cfg->start_bus_number, 0, 0);
> +unsigned long end = PFN_DOWN(cfg->address) +
> +PCI_BDF(cfg->end_bus_number, ~0, ~0);
> +
> +rc |= iomem_deny_access(d, start, end);
> +}
> +
> +return rc;
> +}
> +
>  bool_t pci_mmcfg_decode(unsigned long mfn, unsigned int *seg,
>  unsigned int *bdf)
>  {
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 59b6e8a81c..ea6a66b248 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -170,4 +170,6 @@ int msixtbl_pt_register(struct domain *, struct pirq
> *, uint64_t gtable);
>  void msixtbl_pt_unregister(struct domain *, struct pirq *);
>  void msixtbl_pt_cleanup(struct domain *d);
> 
> +int pci_mmcfg_set_domain_permissions(struct domain *d);
> +
>  #endif /* __XEN_PCI_H__ */
> --
> 2.11.0 (Apple Git-81)
> 
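
[Editor's sketch: a quick check of the PFN arithmetic in
pci_mmcfg_set_domain_permissions() above, assuming Xen's
PCI_BDF(bus, dev, fn) packs bus<<8 | dev<<3 | fn, i.e. one 4 KiB config
page per PCI function; illustrative only, not part of the patch.]

    /* Each bus decodes 32 devices x 8 functions = 256 pages (1 MiB) of
     * MMCFG space. */
    #define DEMO_PCI_BDF(b, d, f) \
        ((((b) & 0xff) << 8) | (((d) & 0x1f) << 3) | ((f) & 7))

    /* For an MCFG entry at 0xe0000000 covering buses 0x00-0xff:
     *   start = PFN_DOWN(0xe0000000) + DEMO_PCI_BDF(0x00,  0, 0) = 0xe0000
     *   end   = PFN_DOWN(0xe0000000) + DEMO_PCI_BDF(0xff, 31, 7) = 0xeffff
     * so 0x10000 pages (256 MiB) of config space get denied to Dom0. */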


[Xen-devel] [qemu-mainline test] 112658: regressions - trouble: blocked/broken/fail/pass

2017-08-16 Thread osstest service owner
flight 112658 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112658/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl 16 guest-start/debian.repeat fail REGR. vs. 112646

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-xsm   2 hosts-allocate  broken like 112646
 build-arm64   2 hosts-allocate  broken like 112646
 build-arm64-xsm   3 capture-logs broken like 112646
 build-arm64   3 capture-logs broken like 112646
 build-arm64-pvops 2 hosts-allocate  broken like 112646
 build-arm64-pvops 3 capture-logs broken like 112646
 test-armhf-armhf-libvirt 14 saverestore-support-check fail  like 112646
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 112646
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail  like 112646
 test-amd64-amd64-xl-qemuu-win7-amd64 18 guest-start/win.repeat fail like 112646
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check fail  like 112646
 test-amd64-amd64-xl-rtds 10 debian-install   fail  like 112646
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail  like 112646
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install fail never pass
 test-amd64-i386-libvirt  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore   fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-check fail   never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass

version targeted for testing:
 qemuu    1f296733876434118fd766cfef5eb6f29ecab6a8
baseline version:
 qemuu    c4a6a8887c1b2a669e35ff9da9530824300bdce4

Last test of basis   112646  2017-08-15 13:30:37 Z    1 days
Testing same since   112658  2017-08-16 01:19:21 Z    1 days    1 attempts


People who touched revisions under test:
  Alistair Francis 
  Eric Blake 
  Fam Zheng 
  Kevin Wolf 
  Paolo Bonzini 
  Peter Maydell 
  Portia Stephens 
  Richard Henderson 
  

Re: [Xen-devel] [PATCH v2 0/4] x86/pvh: implement iommu_inclusive_mapping for PVH Dom0

2017-08-16 Thread Tian, Kevin
> From: Roger Pau Monne
> Sent: Saturday, August 12, 2017 12:43 AM
> 
> Hello,
> 
> Currently iommu_inclusive_mapping is not working for PVH Dom0, this

not working on all platforms, or only on older boxes? The subject suggests
the former, while the description below suggests the latter...

> patch
> series allows using it for a PVH Dom0, which seems to be required in order
> to
> boot on older boxes.
> 
> Git branch can be found at:
> 
> git://xenbits.xen.org/people/royger/xen.git iommu_inclusive_v2
> 
> Thanks, Roger.
> 


Re: [Xen-devel] [PATCH v3 2/3] x86/p2m: make p2m_alloc_ptp() return an MFN

2017-08-16 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, August 11, 2017 9:20 PM
> 
> None of the callers really needs the struct page_info pointer.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: George Dunlap 

Reviewed-by: Kevin Tian 



Re: [Xen-devel] [PATCH] x86: Skip check apic_id_limit for Xen

2017-08-16 Thread Lan Tianyu
On 2017-08-16 19:21, Paolo Bonzini wrote:
> On 16/08/2017 02:22, Lan Tianyu wrote:
>> Xen vIOMMU device model will be in Xen hypervisor. Skip vIOMMU
>> check for Xen here when vcpu number is more than 255.
> 
> I think you still need to do a check for vIOMMU being enabled.

Yes, but this check will be done in the Xen toolstack; Qemu doesn't have
such knowledge. The operations to create and destroy the Xen vIOMMU will
be done in the Xen toolstack.

> 
> Paolo
> 
>> Signed-off-by: Lan Tianyu 
>> ---
>>  hw/i386/pc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index 5943539..fc17885 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -1260,7 +1260,7 @@ void pc_machine_done(Notifier *notifier, void *data)
>>  fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
>>  }
>>  
>> -if (pcms->apic_id_limit > 255) {
>> +if (pcms->apic_id_limit > 255 && !xen_enabled()) {
>>  IntelIOMMUState *iommu = 
>> INTEL_IOMMU_DEVICE(x86_iommu_get_default());
>>  
>>  if (!iommu || !iommu->x86_iommu.intr_supported ||
>>
> 




[Xen-devel] [linux-3.18 test] 112657: trouble: blocked/broken/fail/pass

2017-08-16 Thread osstest service owner
flight 112657 linux-3.18 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112657/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-pvops 3 capture-logs   broken REGR. vs. 112102

Regressions which are regarded as allowable (not blocking):
 build-arm64-xsm   2 hosts-allocate broken REGR. vs. 112102
 build-arm64-pvops 2 hosts-allocate broken REGR. vs. 112102
 build-arm64   2 hosts-allocate broken REGR. vs. 112102

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-xsm   3 capture-logs  broken blocked in 112102
 build-arm64   3 capture-logs  broken blocked in 112102
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 112085
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 112085
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check fail  like 112102
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail like 112102
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-localmigrate/x10 fail like 112102
 test-armhf-armhf-libvirt 14 saverestore-support-check fail  like 112102
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail  like 112102
 test-amd64-amd64-xl-rtds 10 debian-install   fail  like 112102
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install fail never pass
 test-amd64-i386-libvirt  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-install fail never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore   fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore   fail never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-install fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail   never pass

version targeted for testing:
 linux    945f1d358141d5d0310966647f58af9f7e740d14
baseline version:
 linux    dd8b674caeef9381345a6369fba29d425ff433f3

Last test of basis   112102  2017-07-21 17:53:24 Z   26 days
Failing since        112351  

Re: [Xen-devel] [PATCH v2 0/2] x86: paravirt related cleanup

2017-08-16 Thread Rusty Russell
Juergen Gross  writes:
> Cleanup special cases of paravirt patching:
>
> - Xen doesn't need a custom patching function, it can use
>   paravirt_patch_default()
>
> - Remove lguest completely from the tree. A LKML mail asking for any
>   users 3 months ago did not reveal any need for keeping lguest [1].

Shit, I didn't see that mail :(

Posting on lkml is a terrible way to find users (you should generally
remove the config option, wait a year, then see, as that gives end users
time to find it).

In this case though, I think it's time.  I intended for it to be removed
with the paravirt infrastructure itself, but I think that's getting
closer anyway.

Acked-by: Rusty Russell 

> In case the patches make it to the tree there is quite some potential
> for further simplification of paravirt stuff. Especially most of the
> pv operations can be put under the CONFIG_XEN_PV umbrella.
>
> Changes in V2:
> - drop patch 3 (removal of vsmp support)
> - patch 1: remove even more stuff no longer needed without xen_patch()
> (Peter Zijlstra)
>
> [1]: https://lkml.org/lkml/2017/5/15/502
>
> Juergen Gross (2):
>   paravirt,xen: remove xen_patch()
>   x86/lguest: remove lguest support
>
>  MAINTAINERS   |   11 -
>  arch/x86/Kbuild   |    3 -
>  arch/x86/Kconfig  |    2 -
>  arch/x86/include/asm/lguest.h |   91 -
>  arch/x86/include/asm/lguest_hcall.h   |   74 -
>  arch/x86/include/asm/processor.h  |    2 +-
>  arch/x86/include/uapi/asm/bootparam.h |    2 +-
>  arch/x86/kernel/asm-offsets_32.c  |   20 -
>  arch/x86/kernel/head_32.S |    2 -
>  arch/x86/kernel/platform-quirks.c |    1 -
>  arch/x86/kvm/Kconfig  |    1 -
>  arch/x86/lguest/Kconfig   |   14 -
>  arch/x86/lguest/Makefile  |    2 -
>  arch/x86/lguest/boot.c    | 1558 ---
>  arch/x86/lguest/head_32.S |  192 --
>  arch/x86/xen/enlighten_pv.c   |   59 +-
>  arch/x86/xen/xen-asm.S    |   24 +-
>  arch/x86/xen/xen-asm.h    |   12 -
>  arch/x86/xen/xen-asm_32.S |   27 +-
>  arch/x86/xen/xen-asm_64.S |   20 +-
>  arch/x86/xen/xen-ops.h    |   15 +-
>  drivers/Makefile  |    1 -
>  drivers/block/Kconfig |    2 +-
>  drivers/char/Kconfig  |    2 +-
>  drivers/char/virtio_console.c |    2 +-
>  drivers/lguest/Kconfig    |   13 -
>  drivers/lguest/Makefile   |   26 -
>  drivers/lguest/README |   47 -
>  drivers/lguest/core.c |  398 
>  drivers/lguest/hypercalls.c   |  304 ---
>  drivers/lguest/interrupts_and_traps.c |  706 ---
>  drivers/lguest/lg.h   |  258 ---
>  drivers/lguest/lguest_user.c  |  446 -
>  drivers/lguest/page_tables.c  | 1239 
>  drivers/lguest/segments.c |  228 ---
>  drivers/lguest/x86/core.c |  724 ---
>  drivers/lguest/x86/switcher_32.S  |  388 
>  drivers/net/Kconfig   |    2 +-
>  drivers/tty/hvc/Kconfig   |    2 +-
>  drivers/virtio/Kconfig    |    4 +-
>  include/linux/lguest.h    |   73 -
>  include/linux/lguest_launcher.h   |   44 -
>  include/uapi/linux/virtio_ring.h  |    4 +-
>  tools/Makefile    |   11 +-
>  tools/lguest/.gitignore   |    2 -
>  tools/lguest/Makefile |   14 -
>  tools/lguest/extract  |   58 -
>  tools/lguest/lguest.c | 3420 -
>  tools/lguest/lguest.txt   |  125 --
>  49 files changed, 36 insertions(+), 10639 deletions(-)
>  delete mode 100644 arch/x86/include/asm/lguest.h
>  delete mode 100644 arch/x86/include/asm/lguest_hcall.h
>  delete mode 100644 arch/x86/lguest/Kconfig
>  delete mode 100644 arch/x86/lguest/Makefile
>  delete mode 100644 arch/x86/lguest/boot.c
>  delete mode 100644 arch/x86/lguest/head_32.S
>  delete mode 100644 arch/x86/xen/xen-asm.h
>  delete mode 100644 drivers/lguest/Kconfig
>  delete mode 100644 drivers/lguest/Makefile
>  delete mode 100644 drivers/lguest/README
>  delete mode 100644 drivers/lguest/core.c
>  delete mode 100644 drivers/lguest/hypercalls.c
>  delete mode 100644 drivers/lguest/interrupts_and_traps.c
>  delete mode 100644 drivers/lguest/lg.h
>  delete mode 100644 drivers/lguest/lguest_user.c
>  delete mode 100644 drivers/lguest/page_tables.c
>  delete mode 100644 drivers/lguest/segments.c
>  delete mode 100644 drivers/lguest/x86/core.c
>  delete mode 100644 drivers/lguest/x86/switcher_32.S
>  delete mode 100644 include/linux/lguest.h
>  delete mode 100644 include/linux/lguest_launcher.h
>  delete mode 100644 tools/lguest/.gitignore
>  delete mode 100644 tools/lguest/Makefile
>  delete mode 100644 

[Xen-devel] [libvirt test] 112659: tolerable trouble: blocked/broken/pass - PUSHED

2017-08-16 Thread osstest service owner
flight 112659 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112659/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt  1 build-check(1)   blocked  n/a
 build-arm64-xsm   2 hosts-allocate  broken like 112640
 build-arm64-pvops 2 hosts-allocate  broken like 112640
 build-arm64-xsm   3 capture-logs broken like 112640
 build-arm64   2 hosts-allocate  broken like 112640
 build-arm64-pvops 3 capture-logs broken like 112640
 build-arm64   3 capture-logs broken like 112640
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check fail  like 112640
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail  like 112640
 test-armhf-armhf-libvirt 14 saverestore-support-check fail  like 112640
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  13 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass

version targeted for testing:
 libvirt  fdab78b57400905acd6040c8fb91206e2afbd795
baseline version:
 libvirt  40cc355c9223e17b54b66fdaedd93e9f6c669704

Last test of basis   112640  2017-08-15 04:32:17 Z    1 days
Testing same since   112659  2017-08-16 04:22:07 Z    0 days    1 attempts


People who touched revisions under test:
  John Ferlan 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  broken  
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  broken  
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  blocked 
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopsbroken  
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm blocked 
 test-armhf-armhf-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt blocked 
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   blocked 
 test-armhf-armhf-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in 

[Xen-devel] [xen-unstable test] 112655: tolerable trouble: blocked/broken/fail/pass - PUSHED

2017-08-16 Thread osstest service owner
flight 112655 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112655/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-rumprun-i386 17 rumprun-demo-xenstorels/xenstorels.repeat fail 
REGR. vs. 112544
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop   fail REGR. vs. 112544
 test-armhf-armhf-xl-rtds16 guest-start/debian.repeat fail REGR. vs. 112544

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-xsm   2 hosts-allocate  broken like 112544
 build-arm64   2 hosts-allocate  broken like 112544
 build-arm64-xsm   3 capture-logs broken like 112544
 build-arm64-pvops 2 hosts-allocate  broken like 112544
 build-arm64-pvops 3 capture-logs broken like 112544
 build-arm64   3 capture-logs broken like 112544
 test-armhf-armhf-libvirt 14 saverestore-support-check fail  like 112544
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail like 112544
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 112544
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail like 112544
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail  like 112544
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check fail  like 112544
 test-amd64-amd64-xl-rtds 10 debian-install   fail  like 112544
 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-install fail never pass
 test-amd64-i386-libvirt  13 migrate-support-check fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore   fail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore   fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail   never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-install fail never pass

version targeted for testing:
 xen  6e2a4c73564ab907b732059adb317d6ca2d138a2
baseline version:
 xen  f5c3e78b5c61e7dfb05749c7a0c862ec18c86384


[Xen-devel] [linux-linus test] 112653: regressions - trouble: blocked/broken/fail/pass

2017-08-16 Thread osstest service owner
flight 112653 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112653/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-examine  7 reboot   fail REGR. vs. 110515
 test-amd64-amd64-xl-xsm   7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-i386-pvgrub  7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-xl-qemut-debianhvm-amd64  7 xen-bootfail REGR. vs. 110515
 test-amd64-amd64-xl-qcow2 7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. 
vs. 110515
 test-amd64-amd64-libvirt  7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-rumprun-amd64  7 xen-boot   fail REGR. vs. 110515
 test-amd64-amd64-xl-pvh-intel  7 xen-bootfail REGR. vs. 110515
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 
110515
 test-amd64-amd64-pygrub   7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host   fail REGR. vs. 110515
 test-amd64-amd64-amd64-pvgrub  7 xen-bootfail REGR. vs. 110515
 test-amd64-amd64-libvirt-vhd  7 xen-boot fail REGR. vs. 110515
 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host   fail REGR. vs. 110515
 test-amd64-amd64-qemuu-nested-intel  7 xen-boot  fail REGR. vs. 110515
 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 110515
 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 110515
 test-amd64-amd64-xl   7 xen-boot fail REGR. vs. 110515
 build-armhf-pvops 6 kernel-build fail REGR. vs. 110515

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-localmigrate/x10 fail pass in 
112638

Regressions which are regarded as allowable (not blocking):
 build-arm64-pvops 2 hosts-allocate broken REGR. vs. 110515
 build-arm64-xsm   2 hosts-allocate broken REGR. vs. 110515
 build-arm64   2 hosts-allocate broken REGR. vs. 110515
 test-armhf-armhf-xl-rtds 12 guest-startfail in 112638 REGR. vs. 110515

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-arm64-arm64-examine  1 build-check(1)   blocked  n/a
 test-armhf-armhf-examine  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-xsm   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-pvops 3 capture-logs  broken blocked in 110515
 build-arm64-xsm   3 capture-logs  broken blocked in 110515
 build-arm64   3 capture-logs  broken blocked in 110515
 test-amd64-i386-xl-qemuu-win7-amd64 18 guest-start/win.repeat fail blocked in 
110515
 test-amd64-amd64-xl-qemuu-win7-amd64 18 guest-start/win.repeat fail blocked in 
110515
 test-armhf-armhf-libvirt 14 saverestore-support-check fail in 112638 like 
110515
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check fail in 112638 like 
110515
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop  fail in 112638 like 110515
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop  fail in 112638 like 110515
 test-armhf-armhf-libvirt-raw 13 saverestore-support-check fail in 112638 like 
110515
 test-armhf-armhf-xl-credit2 13 migrate-support-check fail in 112638 never pass
 test-armhf-armhf-xl-credit2 14 saverestore-support-check fail in 112638 never 
pass
 test-armhf-armhf-xl-xsm 13 migrate-support-check fail in 112638 never pass
 test-armhf-armhf-libvirt13 migrate-support-check fail in 112638 never pass
 

Re: [Xen-devel] [PATCH] x86/svm: Use physical addresses for HSA and Host VMCB

2017-08-16 Thread Boris Ostrovsky
On 08/16/2017 02:23 PM, Andrew Cooper wrote:
> They are only referenced by physical address (either the HSA MSR, or via
> VMSAVE/VMLOAD which take a physical operand).  Allocating xenheap pages and
> storing their virtual address is wasteful.
>
> Allocate them with domheap pages instead, taking the opportunity to suitably
> NUMA-position them.  This avoids Xen needing to perform a virt to phys
> translation on every context switch.
>
> Signed-off-by: Andrew Cooper 

Reviewed-by: Boris Ostrovsky 

> ---
> CC: Jan Beulich 
> CC: Boris Ostrovsky 
> CC: Suravee Suthikulpanit 
>
> TODO at some other point: Figure out why svm_cpu_up_prepare() is reliably
> called twice for every CPU.

That's because it is called by the BSP via the CPU_UP_PREPARE notifier and
then again by the AP itself during svm_cpu_up().

I think

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 0dc9442..3e7b9fc 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1538,7 +1538,7 @@ static int _svm_cpu_up(bool bsp)
 return -EINVAL;
 }
 
-if ( (rc = svm_cpu_up_prepare(cpu)) != 0 )
+if ( bsp && (rc = svm_cpu_up_prepare(cpu)) != 0 )
 return rc;
 
 write_efer(read_efer() | EFER_SVME);


should take care of this. I only had a quick look at the Intel side and it
seems it may have the same problem.


-boris
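
[Editor's sketch: the allocation pattern the commit message describes, a
NUMA-positioned domheap page of which only the machine address is kept.
The function names below are the usual Xen MM APIs; the exact usage in
the patch may differ.]

    static paddr_t alloc_host_save_area(unsigned int cpu)
    {
        /* MEMF_node() steers the allocation towards the CPU's NUMA node. */
        struct page_info *pg =
            alloc_domheap_page(NULL, MEMF_node(cpu_to_node(cpu)));

        if ( !pg )
            return 0;

        clear_domain_page(_mfn(page_to_mfn(pg)));

        /* Hardware only ever sees the physical address, so that is all we
         * store; no virt-to-phys translation is needed on context switch. */
        return page_to_maddr(pg);
    }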



Re: [Xen-devel] [PATCH 4/7] arm: smccc: handle SMCs according to SMCCC

2017-08-16 Thread Volodymyr Babchuk
Hello Jan,

On Wed, Aug 09, 2017 at 05:58:19AM -0600, Jan Beulich wrote:

> 
> > On 08/08/17 21:08, Volodymyr Babchuk wrote:
> >> +#ifndef __XEN_PUBLIC_ARCH_ARM_SMC_H__
> >> +#define __XEN_PUBLIC_ARCH_ARM_SMC_H__
> >> +
> >> +typedef struct {
> >> +uint32_t a[4];
> >> +} xen_arm_smccc_uid;
> 
> This is not the normal way of encoding a UID type.
I thought about this. According to RFC 4122, UUID should be defined like this:
struct xen_uuid_rfc_4122 {
u32 time_low;  /* low part of timestamp */
u16 time_mid;  /* mid part of timestamp */
u16 time_hi_and_version;   /* high part of timestamp and version */
u8  clock_seq_hi_and_reserved; /* clock seq hi and variant */
u8  clock_seq_low; /* clock seq low */
u8  node[6];   /* nodes */
};

This mirrors the structure in the RFC, but it is highly inconvenient to use.
The most common operation on UUIDs is comparison, I think; the next most
common are serialization and deserialization. All of those are trivial if
you use an array instead of separate fields. I just checked: the Linux
kernel uses an array of 16 u8s. I used an array of four u32s because that
is how a UUID is represented in the SMC calling convention.
Now I'm going to create a separate public header for UUIDs, and I'm not
sure that the RFC 4122 approach is the best: serialization code for that
structure requires some fiddling with binary shifts. Personally I'd stick
to the Linux way (uint8_t data[16]).
So, I'm interested in the maintainers' opinion.
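
[Editor's sketch of the point being made: with a flat byte array the
common UUID operations collapse to memcmp()/memcpy(), with no per-field
byte-order handling. The names below are illustrative, not a proposed API.]

    #include <stdint.h>
    #include <string.h>

    typedef struct {
        uint8_t a[16];
    } demo_uuid;

    /* Comparison: a single memcmp. */
    static inline int demo_uuid_cmp(const demo_uuid *l, const demo_uuid *r)
    {
        return memcmp(l->a, r->a, sizeof(l->a));
    }

    /* Serialization to the wire: a plain copy, no shifts. */
    static inline void demo_uuid_to_wire(const demo_uuid *u, uint8_t out[16])
    {
        memcpy(out, u->a, sizeof(u->a));
    }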

> >> +#define XEN_ARM_SMCCC_UID(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)  \
> >> +((xen_arm_smccc_uid) {{(a), ((b) << 16 | (c) ), \
> 
> This is not C89 compatible.

I'm sorry, but I'm not quite sure why this is not C89 compatible. According
to [1], C89 supports initializer lists.

> 
> >> +((d0) << 24 | (d1) << 16 | (d2) << 8 | (d3) << 0),  \
> >> +((d4) << 24 | (d5) << 16 | (d6) << 8 | (d7) << 0)}})
> >> +

[1] http://port70.net/~nsz/c/c89/c89-draft.html#3.5.7
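
[Editor's note: the construct at issue is a compound literal, which C99
added; C89 allows brace initializer lists only in declarations. A minimal
illustration, with an illustrative type name:]

    typedef struct { unsigned int a[4]; } demo_uid;

    /* C89: an initializer list is valid only in a declaration. */
    demo_uid u1 = {{ 1, 2, 3, 4 }};

    /* C99: a compound literal creates an unnamed object inside an
     * expression, which is what XEN_ARM_SMCCC_UID() expands to. */
    demo_uid make_uid(void)
    {
        return (demo_uid){{ 1, 2, 3, 4 }};
    }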



[Xen-devel] [ovmf baseline-only test] 71982: all pass

2017-08-16 Thread Platform Team regression test user
This run is configured for baseline tests only.

flight 71982 ovmf real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/71982/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf af0364f01e8cac95afad01437f13beef90f6640b
baseline version:
 ovmf a6b3d753f98118ee547ae935b347f4f00fa67e7c

Last test of basis    71977  2017-08-15 20:47:23 Z    1 days
Testing same since    71982  2017-08-16 18:51:35 Z    0 days    1 attempts


People who touched revisions under test:
  Michael D Kinney 
  Michael Kinney 
  Sunny Wang 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


commit af0364f01e8cac95afad01437f13beef90f6640b
Author: Michael D Kinney 
Date:   Mon Aug 14 15:18:10 2017 -0700

Nt32/PlatformBootManagerLib: Enable STD_ERROR on all consoles

Add STD_ERROR flag to all output consoles that the Nt32
platform supports so all messages sent to the standard
error console device(s) are visible by default.

The Boot Maintenance Manager can be used to manually disable
standard error output to specific console devices.

UEFI Applications and UEFI Drivers are recommended to be
built with DEBUG() and ASSERT() messages sent to the standard
error device using MdePkg/Library/UefiDebugLibStdErr. Prior
to this change, a user would have to use the Boot Maintenance
Manager to configure a standard error console device to make
these messages visible.

Cc: Ruiyu Ni 
Cc: Hao Wu 
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Michael Kinney 
Reviewed-by: Ruiyu Ni 
Reviewed-by: Sunny Wang 
Tested-by: Sunny Wang 



[Xen-devel] [xen-unstable-smoke test] 112670: tolerable trouble: broken/pass - PUSHED

2017-08-16 Thread osstest service owner
flight 112670 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112670/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-pvops 2 hosts-allocate  broken like 112666
 build-arm64   2 hosts-allocate  broken like 112666
 build-arm64-pvops 3 capture-logs broken like 112666
 build-arm64   3 capture-logs broken like 112666
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass

version targeted for testing:
 xen  2310da993bca1d9101804cbaf2817f38a38b6510
baseline version:
 xen  79d5dd06a677fcc8c5a585d95b32c35bd38bc34e

Last test of basis   112666  2017-08-16 13:03:12 Z    0 days
Testing same since   112670  2017-08-16 18:02:34 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 

jobs:
 build-amd64  pass
 build-arm64  broken  
 build-armhf  pass
 build-amd64-libvirt  pass
 build-arm64-pvopsbroken  
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  broken  
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-step build-arm64-pvops hosts-allocate
broken-step build-arm64 hosts-allocate
broken-step build-arm64-pvops capture-logs
broken-step build-arm64 capture-logs

Pushing revision :

+ branch=xen-unstable-smoke
+ revision=2310da993bca1d9101804cbaf2817f38a38b6510
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
++++ getconfig Repos
++++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 
2310da993bca1d9101804cbaf2817f38a38b6510
+ branch=xen-unstable-smoke
+ revision=2310da993bca1d9101804cbaf2817f38a38b6510
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
++++ getconfig Repos
++++ perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.9-testing
+ '[' x2310da993bca1d9101804cbaf2817f38a38b6510 = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : 

Re: [Xen-devel] [PATCH v2 1/2] paravirt,xen: remove xen_patch()

2017-08-16 Thread Josh Poimboeuf
On Wed, Aug 16, 2017 at 07:31:56PM +0200, Juergen Gross wrote:
>  ENTRY(xen_irq_disable_direct)
>   movb $1, PER_CPU_VAR(xen_vcpu_info) + XEN_vcpu_info_mask
> -ENDPATCH(xen_irq_disable_direct)
>   ret
>   ENDPROC(xen_irq_disable_direct)
> - RELOC(xen_irq_disable_direct, 0)

Might as well remove the ENDPROC indentations while you're at it, for
readability and consistency with other asm code.

Otherwise,

Reviewed-by: Josh Poimboeuf 

-- 
Josh

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 0/3] arm: align check_conditional_instr() with ARMv8

2017-08-16 Thread Volodymyr Babchuk
Hello all,

This is the 4th version of the patch series:

 * Fixed spelling in comments
 * Fixed coding style

Volodymyr Babchuk (3):
  arm: processor: add new struct hsr_smc32 into hsr union
  arm: traps: handle unknown exceptions in check_conditional_instr()
  arm: traps: handle SMC32 in check_conditional_instr()

 xen/arch/arm/traps.c| 19 ++-
 xen/include/asm-arm/processor.h | 17 +
 2 files changed, 35 insertions(+), 1 deletion(-)

-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [ovmf test] 112656: all pass - PUSHED

2017-08-16 Thread osstest service owner
flight 112656 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112656/

Perfect :-)
All tests in this flight passed as required

version targeted for testing:
 ovmf af0364f01e8cac95afad01437f13beef90f6640b
baseline version:
 ovmf a6b3d753f98118ee547ae935b347f4f00fa67e7c

 Last test of basis   112644  2017-08-15 09:49:00 Z    1 days
 Testing same since   112656  2017-08-15 21:17:42 Z    0 days    1 attempts


People who touched revisions under test:
  Michael D Kinney 
  Michael Kinney 
  Sunny Wang 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

+ branch=ovmf
+ revision=af0364f01e8cac95afad01437f13beef90f6640b
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push ovmf 
af0364f01e8cac95afad01437f13beef90f6640b
+ branch=ovmf
+ revision=af0364f01e8cac95afad01437f13beef90f6640b
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=ovmf
+ xenbranch=xen-unstable
+ '[' xovmf = xlinux ']'
+ linuxbranch=
+ '[' x = x ']'
+ qemuubranch=qemu-upstream-unstable
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable
+ prevxenbranch=xen-4.9-testing
+ '[' xaf0364f01e8cac95afad01437f13beef90f6640b = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : git://xenbits.xen.org/xtf.git
++ : git://xenbits.xen.org/libvirt.git
++ : osst...@xenbits.xen.org:/home/xen/git/libvirt.git
++ : git://xenbits.xen.org/libvirt.git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : git
++ : git://xenbits.xen.org/osstest/rumprun.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/rumprun.git
++ : git://git.seabios.org/seabios.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/seabios.git
++ : git://xenbits.xen.org/osstest/seabios.git
++ : https://github.com/tianocore/edk2.git
++ : osst...@xenbits.xen.org:/home/xen/git/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/ovmf.git
++ : git://xenbits.xen.org/osstest/linux-firmware.git
++ : osst...@xenbits.xen.org:/home/osstest/ext/linux-firmware.git
++ : 

[Xen-devel] [PATCH v4 3/3] arm: traps: handle SMC32 in check_conditional_instr()

2017-08-16 Thread Volodymyr Babchuk
On the ARMv8 architecture we need to ensure that the conditional check was
passed for a trapped SMC instruction that originates from AArch32 state
(ARM DDI 0487B.a page D7-2271).
Thus, we should not skip it while checking the HSR.EC value.

For this type of exception a special encoding of HSR.ISS is used. There is
an additional flag (CCKNOWNPASS) to be checked before performing the
standard handling of the CCVALID and COND fields.

Signed-off-by: Volodymyr Babchuk 
---

 * Fixed spelling
 * Fixed coding style in 'if ( )'

---
 xen/arch/arm/traps.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index eae2212..2e92223 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1716,8 +1716,25 @@ static int check_conditional_instr(struct cpu_user_regs 
*regs,
 unsigned long cpsr, cpsr_cond;
 int cond;
 
+/*
+ * The SMC32 instruction case is special. By SMC32 we mean an SMC
+ * instruction on ARMv7 or an SMC instruction originating from
+ * AArch32 state on ARMv8.
+ * On ARMv7 it will be trapped only if it passed its condition check
+ * (ARM DDI 0406C.c page B3-1431), but we need to check the condition
+ * flags on ARMv8 (ARM DDI 0487B.a page D7-2271).
+ * The encoding for HSR.ISS on ARMv8 is backwards compatible with ARMv7:
+ * HSR.ISS is defined as UNK/SBZP on ARMv7, which means that it
+ * will be read as 0. This includes the CCKNOWNPASS field.
+ * If CCKNOWNPASS == 0 then this was an unconditional instruction or
+ * it has passed its condition check (ARM DDI 0487B.a page D7-2272).
+ */
+if ( hsr.ec == HSR_EC_SMC32 && hsr.smc32.ccknownpass == 0 )
+return 1;
+
 /* Unconditional Exception classes */
-if ( hsr.ec == HSR_EC_UNKNOWN || hsr.ec >= 0x10 )
+if ( hsr.ec == HSR_EC_UNKNOWN ||
+ (hsr.ec >= 0x10 && hsr.ec != HSR_EC_SMC32) )
 return 1;
 
 /* Check for valid condition in hsr */
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 2/3] arm: traps: handle unknown exceptions in check_conditional_instr()

2017-08-16 Thread Volodymyr Babchuk
According to the ARM Architecture Reference Manual (ARM DDI 0487B.a page
D7-2259, ARM DDI 0406C.c page B3-1426), an exception with an unknown reason
(HSR.EC == 0) has no valid bits in HSR (apart from HSR.EC), so we can't check
whether it was caused by a conditional instruction. We need to assume that it
is unconditional.

Signed-off-by: Volodymyr Babchuk 
Acked-by: Julien Grall 
---
 xen/arch/arm/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index c07999b..eae2212 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1717,7 +1717,7 @@ static int check_conditional_instr(struct cpu_user_regs 
*regs,
 int cond;
 
 /* Unconditional Exception classes */
-if ( hsr.ec >= 0x10 )
+if ( hsr.ec == HSR_EC_UNKNOWN || hsr.ec >= 0x10 )
 return 1;
 
 /* Check for valid condition in hsr */
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 1/3] arm: processor: add new struct hsr_smc32 into hsr union

2017-08-16 Thread Volodymyr Babchuk
On ARMv8, one of the conditional exceptions (an SMC that originates
from AArch32 state) has an extra field in its HSR.ISS encoding:

CCKNOWNPASS, bit [19]
Indicates whether the instruction might have failed its condition
code check.
   0 - The instruction was unconditional, or was conditional and
   passed  its condition code check.
   1 - The instruction was conditional, and might have failed its
   condition code check.
(ARM DDI 0487B.a page D7-2272)

This is an instruction-specific field, so it is better to add a new structure
to the hsr union. This structure describes the ISS encoding for an exception
from an SMC instruction executing in AArch32 state. But we define this
struct for both ARMv7 and ARMv8, because the ARMv8 encoding is backwards
compatible with ARMv7.

Signed-off-by: Volodymyr Babchuk 
Acked-by: Julien Grall 
---
 xen/include/asm-arm/processor.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
index 855ded1..926ae68 100644
--- a/xen/include/asm-arm/processor.h
+++ b/xen/include/asm-arm/processor.h
@@ -488,6 +488,23 @@ union hsr {
 unsigned long ec:6; /* Exception Class */
 } cp; /* HSR_EC_CP */
 
+/*
+ * This encoding is valid only for ARMv8 (ARM DDI 0487B.a, pages D7-2271 
and
+ * G6-4957). On ARMv7, encoding ISS for EC=0x13 is defined as UNK/SBZP
+ * (ARM DDI 0406C.c page B3-1431). UNK/SBZP means that hardware implements
+ * this field as Read-As-Zero. ARMv8 is backwards compatible with ARMv7:
+ * reading CCKNOWNPASS on ARMv7 will return 0, which means that condition
+ * check was passed or instruction was unconditional.
+ */
+struct hsr_smc32 {
+unsigned long res0:19;  /* Reserved */
+unsigned long ccknownpass:1; /* Instruction passed conditional check */
+unsigned long cc:4;/* Condition Code */
+unsigned long ccvalid:1;/* CC Valid */
+unsigned long len:1;   /* Instruction length */
+unsigned long ec:6;/* Exception Class */
+} smc32; /* HSR_EC_SMC32 */
+
 #ifdef CONFIG_ARM_64
 struct hsr_sysreg {
 unsigned long read:1;   /* Direction */
-- 
2.7.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/pvcalls: use WARN_ON(1) instead of __WARN()

2017-08-16 Thread Boris Ostrovsky
On 07/21/2017 03:26 PM, Stefano Stabellini wrote:
> On Fri, 21 Jul 2017, Arnd Bergmann wrote:
>> __WARN() is an internal helper that is only available on
>> some architectures, but causes a build error e.g. on ARM64
>> in some configurations:
>>
>> drivers/xen/pvcalls-back.c: In function 'set_backend_state':
>> drivers/xen/pvcalls-back.c:1097:5: error: implicit declaration of function 
>> '__WARN' [-Werror=implicit-function-declaration]
>>
>> Unfortunately, there is no equivalent of BUG() that takes no
>> arguments, but WARN_ON(1) is commonly used in other drivers
>> and works on all configurations.
>>
>> Fixes: 7160378206b2 ("xen/pvcalls: xenbus state handling")
>> Signed-off-by: Arnd Bergmann 
> Reviewed-by: Stefano Stabellini 
>

Applied to for-linus-4.14

-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCHES v8 4/8] mm: Scrub memory from idle loop

2017-08-16 Thread Boris Ostrovsky
Instead of scrubbing pages during guest destruction (from
free_heap_pages()) do this opportunistically, from the idle loop.

We might come to scrub_free_pages() from the idle loop while another CPU
uses the mapcache override, resulting in a fault while trying to do
__map_domain_page() in scrub_one_page(). To avoid this, make the mapcache
vcpu override a per-cpu variable.

Signed-off-by: Boris Ostrovsky 
Reviewed-by: Jan Beulich 
Reviewed-by: Dario Faggioli 
---
 xen/arch/arm/domain.c  |   8 ++-
 xen/arch/x86/domain.c  |   8 ++-
 xen/arch/x86/domain_page.c |   6 +--
 xen/common/page_alloc.c| 119 -
 xen/include/xen/mm.h   |   1 +
 5 files changed, 124 insertions(+), 18 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index eeebbdb..42fb8d6 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -51,7 +51,13 @@ void idle_loop(void)
 /* Are we here for running vcpu context tasklets, or for idling? */
 if ( unlikely(tasklet_work_to_do(cpu)) )
 do_tasklet();
-else
+/*
+ * Test softirqs twice --- first to see if we should even try scrubbing
+ * and then, after it is done, whether softirqs became pending
+ * while we were scrubbing.
+ */
+else if ( !softirq_pending(cpu) && !scrub_free_pages() &&
+!softirq_pending(cpu) )
 {
 local_irq_disable();
 if ( cpu_is_haltable(cpu) )
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index baaf815..9b4b959 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -122,7 +122,13 @@ static void idle_loop(void)
 /* Are we here for running vcpu context tasklets, or for idling? */
 if ( unlikely(tasklet_work_to_do(cpu)) )
 do_tasklet();
-else
+/*
+ * Test softirqs twice --- first to see if we should even try scrubbing
+ * and then, after it is done, whether softirqs became pending
+ * while we were scrubbing.
+ */
+else if ( !softirq_pending(cpu) && !scrub_free_pages()  &&
+!softirq_pending(cpu) )
 pm_idle();
 do_softirq();
 /*
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 71baede..0783c1e 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -18,12 +18,12 @@
 #include 
 #include 
 
-static struct vcpu *__read_mostly override;
+static DEFINE_PER_CPU(struct vcpu *, override);
 
 static inline struct vcpu *mapcache_current_vcpu(void)
 {
 /* In the common case we use the mapcache of the running VCPU. */
-struct vcpu *v = override ?: current;
+struct vcpu *v = this_cpu(override) ?: current;
 
 /*
  * When current isn't properly set up yet, this is equivalent to
@@ -59,7 +59,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
 
 void __init mapcache_override_current(struct vcpu *v)
 {
-override = v;
+this_cpu(override) = v;
 }
 
 #define mapcache_l2_entry(e) ((e) >> PAGETABLE_ORDER)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 1303736..d0c2021 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1009,15 +1009,86 @@ static int reserve_offlined_page(struct page_info *head)
 return count;
 }
 
-static void scrub_free_pages(unsigned int node)
+static nodemask_t node_scrubbing;
+
+/*
+ * If get_node is true this will return the closest node that needs to be
+ * scrubbed, with the appropriate bit in node_scrubbing set.
+ * If get_node is not set, this will return *a* node that needs to be scrubbed.
+ * The node_scrubbing bitmask will not be updated.
+ * If no node needs scrubbing then NUMA_NO_NODE is returned.
+ */
+static unsigned int node_to_scrub(bool get_node)
 {
-struct page_info *pg;
-unsigned int zone;
+nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
+nodeid_t closest = NUMA_NO_NODE;
+u8 dist, shortest = 0xff;
 
-ASSERT(spin_is_locked(&heap_lock));
+if ( node == NUMA_NO_NODE )
+node = 0;
 
-if ( !node_need_scrub[node] )
-return;
+if ( node_need_scrub[node] &&
+ (!get_node || !node_test_and_set(node, node_scrubbing)) )
+return node;
+
+/*
+ * See if there are memory-only nodes that need scrubbing and choose
+ * the closest one.
+ */
+local_node = node;
+for ( ; ; )
+{
+do {
+node = cycle_node(node, node_online_map);
+} while ( !cpumask_empty(&node_to_cpumask(node)) &&
+  (node != local_node) );
+
+if ( node == local_node )
+break;
+
+if ( node_need_scrub[node] )
+{
+if ( !get_node )
+return node;
+
+dist = __node_distance(local_node, node);
+
+/*
+ * Grab the node right away. If we find a closer node 

[Xen-devel] [PATCHES v8 6/8] mm: Keep heap accessible to others while scrubbing

2017-08-16 Thread Boris Ostrovsky
Instead of scrubbing pages while holding the heap lock we can mark
the buddy's head as being scrubbed and drop the lock temporarily.
If someone (most likely alloc_heap_pages()) tries to access
this chunk it will signal the scrubber to abort the scrub by setting
the head's scrub_state to BUDDY_SCRUB_ABORT. The scrubber checks this
state after processing each page and stops its work as soon as it sees it.
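
In outline, the abort handshake reduces to the following self-contained
sketch (illustrative C11 code, not the Xen implementation below; it assumes
a single scrubber and a single waiter per buddy, whereas the real code keeps
this state in page_info and coordinates through the heap lock):

#include <stdatomic.h>

enum scrub_state { NOT_SCRUBBING, SCRUBBING, SCRUB_ABORT };
static _Atomic enum scrub_state state = NOT_SCRUBBING;

/* Scrubber: process pages, re-checking the abort request after each one. */
static void scrub_buddy(unsigned int pages)
{
    atomic_store(&state, SCRUBBING);
    for ( unsigned int i = 0; i < pages; i++ )
    {
        /* scrub_one_page(&pg[i]) would go here. */
        if ( atomic_load(&state) == SCRUB_ABORT )
            break;                       /* a waiter wants this buddy */
    }
    atomic_store(&state, NOT_SCRUBBING); /* finish, or acknowledge abort */
}

/* Allocator: request an abort, then wait for the acknowledgement. */
static void request_abort(void)
{
    enum scrub_state expected = SCRUBBING;

    if ( atomic_compare_exchange_strong(&state, &expected, SCRUB_ABORT) )
        while ( atomic_load(&state) == SCRUB_ABORT )
            ;                            /* spin, cf. cpu_relax() */
}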

Signed-off-by: Boris Ostrovsky 
Reviewed-by: Jan Beulich 
---
 xen/common/page_alloc.c  | 110 +--
 xen/include/asm-arm/mm.h |  29 -
 xen/include/asm-x86/mm.h |  27 
 3 files changed, 143 insertions(+), 23 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index d0c2021..48798ca 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -683,6 +683,7 @@ static void page_list_add_scrub(struct page_info *pg, 
unsigned int node,
 {
 PFN_ORDER(pg) = order;
 pg->u.free.first_dirty = first_dirty;
+pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
 
 if ( first_dirty != INVALID_DIRTY_IDX )
 {
@@ -693,6 +694,25 @@ static void page_list_add_scrub(struct page_info *pg, 
unsigned int node,
 page_list_add(pg, &heap(node, zone, order));
 }
 
+static void check_and_stop_scrub(struct page_info *head)
+{
+if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
+{
+typeof(head->u.free) pgfree;
+
+head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
+spin_lock_kick();
+for ( ; ; )
+{
+/* Can't ACCESS_ONCE() a bitfield. */
+pgfree.val = ACCESS_ONCE(head->u.free.val);
+if ( pgfree.scrub_state != BUDDY_SCRUB_ABORT )
+break;
+cpu_relax();
+}
+}
+}
+
 static struct page_info *get_free_buddy(unsigned int zone_lo,
 unsigned int zone_hi,
 unsigned int order, unsigned int 
memflags,
@@ -737,14 +757,19 @@ static struct page_info *get_free_buddy(unsigned int 
zone_lo,
 {
 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
 {
+if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+return pg;
 /*
  * We grab single pages (order=0) even if they are
  * unscrubbed. Given that scrubbing one page is fairly 
quick
  * it is not worth breaking higher orders.
  */
-if ( (order == 0) || use_unscrubbed ||
- pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+if ( (order == 0) || use_unscrubbed )
+{
+check_and_stop_scrub(pg);
 return pg;
+}
+
 page_list_add_tail(pg, &heap(node, zone, j));
 }
 }
@@ -925,6 +950,7 @@ static int reserve_offlined_page(struct page_info *head)
 
 cur_head = head;
 
+check_and_stop_scrub(head);
 /*
  * We may break the buddy so let's mark the head as clean. Then, when
  * merging chunks back into the heap, we will see whether the chunk has
@@ -1075,6 +1101,29 @@ static unsigned int node_to_scrub(bool get_node)
 return closest;
 }
 
+struct scrub_wait_state {
+struct page_info *pg;
+unsigned int first_dirty;
+bool drop;
+};
+
+static void scrub_continue(void *data)
+{
+struct scrub_wait_state *st = data;
+
+if ( st->drop )
+return;
+
+if ( st->pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
+{
+/* There is a waiter for this buddy. Release it. */
+st->drop = true;
+st->pg->u.free.first_dirty = st->first_dirty;
+smp_wmb();
+st->pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+}
+}
+
 bool scrub_free_pages(void)
 {
 struct page_info *pg;
@@ -1097,25 +1146,53 @@ bool scrub_free_pages(void)
 do {
 while ( !page_list_empty(&heap(node, zone, order)) )
 {
-unsigned int i;
+unsigned int i, dirty_cnt;
+struct scrub_wait_state st;
 
 /* Unscrubbed pages are always at the end of the list. */
 pg = page_list_last(&heap(node, zone, order));
 if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
 break;
 
+ASSERT(pg->u.free.scrub_state == BUDDY_NOT_SCRUBBING);
+pg->u.free.scrub_state = BUDDY_SCRUBBING;
+
+spin_unlock(&heap_lock);
+
+dirty_cnt = 0;
+
 for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
 {
 if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
 {
 scrub_one_page(&pg[i]);
+/*
+ * We can modify 

[Xen-devel] [PATCHES v8 2/8] mm: Extract allocation loop from alloc_heap_pages()

2017-08-16 Thread Boris Ostrovsky
This will make the code a bit more readable, especially with the changes that
will be introduced in subsequent patches.

Signed-off-by: Boris Ostrovsky 
Acked-by: Jan Beulich 
---
 xen/common/page_alloc.c | 139 +++-
 1 file changed, 77 insertions(+), 62 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index a39fd81..5c550b5 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -693,22 +693,15 @@ static void page_list_add_scrub(struct page_info *pg, 
unsigned int node,
 page_list_add(pg, &heap(node, zone, order));
 }
 
-/* Allocate 2^@order contiguous pages. */
-static struct page_info *alloc_heap_pages(
-unsigned int zone_lo, unsigned int zone_hi,
-unsigned int order, unsigned int memflags,
-struct domain *d)
+static struct page_info *get_free_buddy(unsigned int zone_lo,
+unsigned int zone_hi,
+unsigned int order, unsigned int 
memflags,
+const struct domain *d)
 {
-unsigned int i, j, zone = 0, nodemask_retry = 0, first_dirty;
 nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
-unsigned long request = 1UL << order;
+nodemask_t nodemask = d ? d->node_affinity : node_online_map;
+unsigned int j, zone, nodemask_retry = 0;
 struct page_info *pg;
-nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
-bool_t need_tlbflush = 0;
-uint32_t tlbflush_timestamp = 0;
-
-/* Make sure there are enough bits in memflags for nodeID. */
-BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
 
 if ( node == NUMA_NO_NODE )
 {
@@ -724,34 +717,6 @@ static struct page_info *alloc_heap_pages(
 first_node = node;
 
 ASSERT(node < MAX_NUMNODES);
-ASSERT(zone_lo <= zone_hi);
-ASSERT(zone_hi < NR_ZONES);
-
-if ( unlikely(order > MAX_ORDER) )
-return NULL;
-
-spin_lock(_lock);
-
-/*
- * Claimed memory is considered unavailable unless the request
- * is made by a domain with sufficient unclaimed pages.
- */
-if ( (outstanding_claims + request >
-  total_avail_pages + tmem_freeable_pages()) &&
-  ((memflags & MEMF_no_refcount) ||
-   !d || d->outstanding_pages < request) )
-goto not_found;
-
-/*
- * TMEM: When available memory is scarce due to tmem absorbing it, allow
- * only mid-size allocations to avoid worst of fragmentation issues.
- * Others try tmem pools then fail.  This is a workaround until all
- * post-dom0-creation-multi-page allocations can be eliminated.
- */
-if ( ((order == 0) || (order >= 9)) &&
- (total_avail_pages <= midsize_alloc_zone_pages) &&
- tmem_freeable_pages() )
-goto try_tmem;
 
 /*
  * Start with requested node, but exhaust all node memory in requested 
@@ -763,17 +728,17 @@ static struct page_info *alloc_heap_pages(
 zone = zone_hi;
 do {
 /* Check if target node can support the allocation. */
-if ( !avail[node] || (avail[node][zone] < request) )
+if ( !avail[node] || (avail[node][zone] < (1UL << order)) )
 continue;
 
 /* Find smallest order which can satisfy the request. */
 for ( j = order; j <= MAX_ORDER; j++ )
 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-goto found;
+return pg;
 } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
 if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
-goto not_found;
+return NULL;
 
 /* Pick next node. */
 if ( !node_isset(node, nodemask) )
@@ -790,46 +755,96 @@ static struct page_info *alloc_heap_pages(
 {
 /* When we have tried all in nodemask, we fall back to others. */
 if ( (memflags & MEMF_exact_node) || nodemask_retry++ )
-goto not_found;
+return NULL;
 nodes_andnot(nodemask, node_online_map, nodemask);
 first_node = node = first_node(nodemask);
 if ( node >= MAX_NUMNODES )
-goto not_found;
+return NULL;
 }
 }
+}
+
+/* Allocate 2^@order contiguous pages. */
+static struct page_info *alloc_heap_pages(
+unsigned int zone_lo, unsigned int zone_hi,
+unsigned int order, unsigned int memflags,
+struct domain *d)
+{
+nodeid_t node;
+unsigned int i, buddy_order, zone, first_dirty;
+unsigned long request = 1UL << order;
+struct page_info *pg;
+bool need_tlbflush = false;
+uint32_t tlbflush_timestamp = 0;
+
+/* Make sure there are enough bits in memflags for nodeID. */
+BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
+
+

[Xen-devel] [PATCHES v8 5/8] spinlock: Introduce spin_lock_cb()

2017-08-16 Thread Boris Ostrovsky
While waiting for a lock we may want to periodically run some
code. This code may, for example, allow the caller to release
resources held by it that are no longer needed in the critical
section protected by the lock.

Specifically, this feature will be needed by the scrubbing code where
the scrubber, while waiting for the heap lock to merge back clean
pages, may be requested by the page allocator (which is currently
holding the lock) to abort merging and release the buddy page head
that the allocator wants.

We could use spin_trylock() but since it doesn't take a lock ticket
it may take a long time until the lock is taken. Instead we add
spin_lock_cb() that allows us to grab the ticket and execute a
callback while waiting. This callback is executed on every iteration
of the spinlock waiting loop.

Since we may be sleeping in the lock until it is released we need a
mechanism that will make sure that the callback has a chance to run.
We add spin_lock_kick() that will wake up the waiter.
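
As an illustration of the intended use, the mm patch in this series (6/8)
releases a contended buddy from the callback roughly as follows (a condensed
sketch of that patch's scrub_continue(), not new functionality):

static void scrub_continue(void *data)
{
    struct scrub_wait_state *st = data;

    /*
     * Runs on every iteration of the ticket-wait loop: if the current
     * lock holder asked us to abort, release the buddy so the holder
     * can make progress, then keep waiting for our ticket.
     */
    if ( !st->drop && st->pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
    {
        st->drop = true;
        st->pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
    }
}

/* Caller, instead of a plain spin_lock(&heap_lock): */
struct scrub_wait_state st = { .pg = pg, .drop = false };
spin_lock_cb(&heap_lock, scrub_continue, &st);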

Signed-off-by: Boris Ostrovsky 
---
Changes in v8:
* Defined arch_lock_signal_wmb() to avoid using smp_wmb() on ARM twice.

 xen/common/spinlock.c  | 9 -
 xen/include/asm-arm/spinlock.h | 2 ++
 xen/include/asm-x86/spinlock.h | 5 +
 xen/include/xen/spinlock.h | 4 
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 2a06406..3c1caae 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -129,7 +129,7 @@ static always_inline u16 observe_head(spinlock_tickets_t *t)
 return read_atomic(&t->head);
 }
 
-void _spin_lock(spinlock_t *lock)
+void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
 {
 spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
 LOCK_PROFILE_VAR;
@@ -140,6 +140,8 @@ void _spin_lock(spinlock_t *lock)
 while ( tickets.tail != observe_head(&lock->tickets) )
 {
 LOCK_PROFILE_BLOCK;
+if ( unlikely(cb) )
+cb(data);
 arch_lock_relax();
 }
 LOCK_PROFILE_GOT;
@@ -147,6 +149,11 @@ void _spin_lock(spinlock_t *lock)
 arch_lock_acquire_barrier();
 }
 
+void _spin_lock(spinlock_t *lock)
+{
+ _spin_lock_cb(lock, NULL, NULL);
+}
+
 void _spin_lock_irq(spinlock_t *lock)
 {
 ASSERT(local_irq_is_enabled());
diff --git a/xen/include/asm-arm/spinlock.h b/xen/include/asm-arm/spinlock.h
index 8cdf9e1..42b0f58 100644
--- a/xen/include/asm-arm/spinlock.h
+++ b/xen/include/asm-arm/spinlock.h
@@ -10,4 +10,6 @@
 sev();  \
 } while(0)
 
+#define arch_lock_signal_wmb()  arch_lock_signal()
+
 #endif /* __ASM_SPINLOCK_H */
diff --git a/xen/include/asm-x86/spinlock.h b/xen/include/asm-x86/spinlock.h
index be72c0f..56f6095 100644
--- a/xen/include/asm-x86/spinlock.h
+++ b/xen/include/asm-x86/spinlock.h
@@ -18,5 +18,10 @@
 
 #define arch_lock_relax() cpu_relax()
 #define arch_lock_signal()
+#define arch_lock_signal_wmb()  \
+({  \
+smp_wmb();  \
+arch_lock_signal(); \
+})
 
 #endif /* __ASM_SPINLOCK_H */
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index c1883bd..b5ca07d 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -153,6 +153,7 @@ typedef struct spinlock {
 #define spin_lock_init(l) (*(l) = (spinlock_t)SPIN_LOCK_UNLOCKED)
 
 void _spin_lock(spinlock_t *lock);
+void _spin_lock_cb(spinlock_t *lock, void (*cond)(void *), void *data);
 void _spin_lock_irq(spinlock_t *lock);
 unsigned long _spin_lock_irqsave(spinlock_t *lock);
 
@@ -169,6 +170,7 @@ void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
 #define spin_lock(l)  _spin_lock(l)
+#define spin_lock_cb(l, c, d) _spin_lock_cb(l, c, d)
 #define spin_lock_irq(l)  _spin_lock_irq(l)
 #define spin_lock_irqsave(l, f) \
 ({  \
@@ -190,6 +192,8 @@ void _spin_unlock_recursive(spinlock_t *lock);
 1 : ({ local_irq_restore(flags); 0; }); \
 })
 
+#define spin_lock_kick(l) arch_lock_signal_wmb()
+
 /* Ensure a lock is quiescent between two critical operations. */
 #define spin_barrier(l)   _spin_barrier(l)
 
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCHES v8 1/8] mm: Place unscrubbed pages at the end of pagelist

2017-08-16 Thread Boris Ostrovsky
.. so that it's easy to find pages that need to be scrubbed (those pages are
now marked with the _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using the
first_dirty field. For now it can have two values, 0 (the whole buddy needs
scrubbing) or
INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches
will allow scrubbing to be interrupted, resulting in first_dirty taking any
value.
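
To make the field's semantics concrete, a consumer would interpret it
roughly like this (buddy_needs_scrub() is an illustrative helper, not part
of the patch; page_info and INVALID_DIRTY_IDX come from the patch below):

/* Does this buddy of 2^order pages contain unscrubbed pages, and if so,
 * at which page index does scrubbing need to start? */
static bool buddy_needs_scrub(const struct page_info *head,
                              unsigned int order, unsigned int *start)
{
    if ( head->u.free.first_dirty == INVALID_DIRTY_IDX )
        return false;                   /* the whole buddy is clean */

    *start = head->u.free.first_dirty;  /* 0 ... (1U << order) - 1 */
    return true;
}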

Signed-off-by: Boris Ostrovsky 
---
Changes in v8:
* Changed x86's definition of page_info.u.free from using bitfields to natural
  datatypes
* Swapped order of bitfields in page_info.u.free for ARM
* Added BUILD_BUG_ON to check page_info.u.free.first_dirty size on x86, moved
  previously defined BUILD_BUG_ON from init_heap_pages() to init_boot_pages()
  (to avoid introducing extra '#ifdef x86' and to keep both together)

 xen/common/page_alloc.c  | 159 ---
 xen/include/asm-arm/mm.h |  17 -
 xen/include/asm-x86/mm.h |  15 +
 3 files changed, 167 insertions(+), 24 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 444ecf3..a39fd81 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -261,7 +261,11 @@ void __init init_boot_pages(paddr_t ps, paddr_t pe)
 #ifdef CONFIG_X86
 const unsigned long *badpage = NULL;
 unsigned int i, array_size;
+
+BUILD_BUG_ON(8 * sizeof(((struct page_info *)0)->u.free.first_dirty) <
+ MAX_ORDER + 1);
 #endif
+BUILD_BUG_ON(sizeof(((struct page_info *)0)->u) != sizeof(unsigned long));
 
 ps = round_pgup(ps);
 pe = round_pgdown(pe);
@@ -375,6 +379,8 @@ typedef struct page_list_head 
heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -670,13 +676,30 @@ static void check_low_mem_virq(void)
 }
 }
 
+/* Pages that need a scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+unsigned int zone, unsigned int order,
+unsigned int first_dirty)
+{
+PFN_ORDER(pg) = order;
+pg->u.free.first_dirty = first_dirty;
+
+if ( first_dirty != INVALID_DIRTY_IDX )
+{
+ASSERT(first_dirty < (1U << order));
+page_list_add_tail(pg, &heap(node, zone, order));
+}
+else
+page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
 unsigned int zone_lo, unsigned int zone_hi,
 unsigned int order, unsigned int memflags,
 struct domain *d)
 {
-unsigned int i, j, zone = 0, nodemask_retry = 0;
+unsigned int i, j, zone = 0, nodemask_retry = 0, first_dirty;
 nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
 unsigned long request = 1UL << order;
 struct page_info *pg;
@@ -790,12 +813,26 @@ static struct page_info *alloc_heap_pages(
 return NULL;
 
  found: 
+
+first_dirty = pg->u.free.first_dirty;
+
 /* We may have to halve the chunk a number of times. */
 while ( j != order )
 {
-PFN_ORDER(pg) = --j;
-page_list_add_tail(pg, &heap(node, zone, j));
-pg += 1 << j;
+j--;
+page_list_add_scrub(pg, node, zone, j,
+(1U << j) > first_dirty ?
+first_dirty : INVALID_DIRTY_IDX);
+pg += 1U << j;
+
+if ( first_dirty != INVALID_DIRTY_IDX )
+{
+/* Adjust first_dirty */
+if ( first_dirty >= 1U << j )
+first_dirty -= 1U << j;
+else
+first_dirty = 0; /* We've moved past original first_dirty */
+}
 }
 
 ASSERT(avail[node][zone] >= request);
@@ -842,12 +879,20 @@ static int reserve_offlined_page(struct page_info *head)
 unsigned int node = phys_to_nid(page_to_maddr(head));
 int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
 struct page_info *cur_head;
-int cur_order;
+unsigned int cur_order, first_dirty;
 
 ASSERT(spin_is_locked(&heap_lock));
 
 cur_head = head;
 
+/*
+ * We may break the buddy so let's mark the head as clean. Then, when
+ * merging chunks back into the heap, we will see whether the chunk has
+ * unscrubbed pages and set its first_dirty properly.
+ */
+first_dirty = head->u.free.first_dirty;
+head->u.free.first_dirty = INVALID_DIRTY_IDX;
+
 page_list_del(head, &heap(node, zone, head_order));
 
 while ( cur_head < (head + (1 << head_order)) )
@@ -858,6 +903,8 @@ static int reserve_offlined_page(struct page_info *head)
 if ( page_state_is(cur_head, offlined) )
 {
 

[Xen-devel] [PATCHES v8 0/8] Memory scrubbing from idle loop

2017-08-16 Thread Boris Ostrovsky
V8:
* Reverted x86's page_info.u.free to using integral types instead of bitfields
* Defined arch_lock_signal_wmb() to avoid having extra barrier in ARM

(see per-patch changes)

When a domain is destroyed the hypervisor must scrub the domain's pages before
giving them to another guest in order to prevent leaking the deceased
guest's data. Currently this is done during the guest's destruction, possibly
causing a very lengthy cleanup process.

This series adds support for scrubbing released pages from the idle loop,
making guest destruction significantly faster. For example, destroying a
1TB guest can now be completed in 40+ seconds as opposed to about 9 minutes
using the existing scrubbing algorithm.

Briefly, the new algorithm places dirty pages at the end of the heap's page
list for each node/zone/order to avoid having to scan the full list while
searching for dirty pages. One processor from each node checks whether the
node has any dirty pages and, if such pages are found, scrubs them. Scrubbing
itself happens without holding the heap lock so other users may access the
heap in the meantime. If, while the idle loop is scrubbing a particular chunk
of pages, this chunk is requested by the heap allocator, scrubbing is
immediately stopped.

On the allocation side, alloc_heap_pages() first tries to satisfy the
allocation request using only clean pages. If this is not possible, the
search is repeated and dirty pages are scrubbed by the allocator.
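
Condensed into a self-contained sketch (simplified stand-in types, not the
Xen code; the real implementation is in patch 1), the list discipline that
makes the search cheap looks like this:

#include <stddef.h>

#define INVALID_DIRTY_IDX (~0U)

struct buddy {
    struct buddy *next;
    unsigned int first_dirty;   /* INVALID_DIRTY_IDX if fully clean */
};

struct freelist { struct buddy *head, *tail; };

/* Clean buddies go to the head of the list, dirty ones to the tail... */
static void add_buddy(struct freelist *l, struct buddy *b)
{
    if ( b->first_dirty != INVALID_DIRTY_IDX )
    {
        b->next = NULL;
        if ( l->tail )
            l->tail->next = b;
        else
            l->head = b;
        l->tail = b;
    }
    else
    {
        b->next = l->head;
        l->head = b;
        if ( !l->tail )
            l->tail = b;
    }
}

/* ...so the scrubber only ever has to inspect the tail of a list. */
static struct buddy *next_dirty(const struct freelist *l)
{
    struct buddy *b = l->tail;
    return (b && b->first_dirty != INVALID_DIRTY_IDX) ? b : NULL;
}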

This series is somewhat based on earlier work by Bob Liu.

V1:
* Only set PGC_need_scrub bit for the buddy head, thus making it unnecessary
  to scan whole buddy
* Fix spin_lock_cb()
* Scrub CPU-less nodes
* ARM support. Note that I have not been able to test this, only built the
  binary
* Added scrub test patch (last one). Not sure whether it should be considered
  for committing but I have been running with it.

V2:
* merge_chunks() returns new buddy head
* scrub_free_pages() returns softirq pending status in addition to (factored 
out)
  status of unscrubbed memory
* spin_lock uses inlined spin_lock_cb()
* scrub debugging code checks whole page, not just the first word.

V3:
* Keep dirty bit per page
* Simplify merge_chunks() (now merge_and_free_buddy())
 * When scrubbing memory-only nodes try to find the closest node.

V4:
* Keep track of dirty pages in a buddy with page_info.u.free.first_dirty.
* Drop patch 1 (factoring out merge_and_free_buddy()) since there is only
  one caller now
* Drop patch patch 5 (from V3) since we are not breaking partially-scrubbed
  buddy anymore
* Extract search loop in alloc_heap_pages() into get_free_buddy() (patch 2)
* Add MEMF_no_scrub flag

V5:
 * Make page_info.u.free a union and use bitfields there.
* Bug fixes

V6:
* Changed first_dirty tracking from pointer-based to index-based (patch 1)
* Added/modified a few ASSERT()s
* Moved/modifed a couple of comments
* Adjusted width of INVALID_DIRTY_IDX

V7:
* Split free_heap_pages() buddy merge changes into a separate patch (patch 1)
* Changed type for page_info.u.free.need_tlbflush to bool:1
* Added BUILD_BUG_ON
* Adjusted datatype of temp variable in check_and_stop_scrub()
* Formatting changes


Deferred:
* Per-node heap locks. In addition to (presumably) improving performance in
  general, once they are available we can parallelize scrubbing further by
  allowing more than one core per node to do idle loop scrubbing.
* AVX-based scrubbing
* Use idle loop scrubbing during boot.



Boris Ostrovsky (8):
  mm: Place unscrubbed pages at the end of pagelist
  mm: Extract allocation loop from alloc_heap_pages()
  mm: Scrub pages in alloc_heap_pages() if needed
  mm: Scrub memory from idle loop
  spinlock: Introduce spin_lock_cb()
  mm: Keep heap accessible to others while scrubbing
  mm: Print number of unscrubbed pages in 'H' debug handler
  mm: Make sure pages are scrubbed

 xen/Kconfig.debug  |   7 +
 xen/arch/arm/domain.c  |   8 +-
 xen/arch/x86/domain.c  |   8 +-
 xen/arch/x86/domain_page.c |   6 +-
 xen/common/page_alloc.c| 580 +++--
 xen/common/spinlock.c  |   9 +-
 xen/include/asm-arm/mm.h   |  32 ++-
 xen/include/asm-arm/spinlock.h |   2 +
 xen/include/asm-x86/mm.h   |  30 ++-
 xen/include/asm-x86/spinlock.h |   5 +
 xen/include/xen/mm.h   |   5 +-
 xen/include/xen/spinlock.h |   4 +
 12 files changed, 600 insertions(+), 96 deletions(-)

-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCHES v8 7/8] mm: Print number of unscrubbed pages in 'H' debug handler

2017-08-16 Thread Boris Ostrovsky
Signed-off-by: Boris Ostrovsky 
Reviewed-by: Wei Liu 
---
 xen/common/page_alloc.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 48798ca..34c45be 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2308,6 +2308,13 @@ static void dump_heap(unsigned char key)
 printk("heap[node=%d][zone=%d] -> %lu pages\n",
i, j, avail[i][j]);
 }
+
+for ( i = 0; i < MAX_NUMNODES; i++ )
+{
+if ( !node_need_scrub[i] )
+continue;
+printk("Node %d has %lu unscrubbed pages\n", i, node_need_scrub[i]);
+}
 }
 
 static __init int register_heap_trigger(void)
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCHES v8 8/8] mm: Make sure pages are scrubbed

2017-08-16 Thread Boris Ostrovsky
Add a debug Kconfig option that will make the page allocator verify
that pages that were supposed to be scrubbed are, in fact, clean.

Signed-off-by: Boris Ostrovsky 
Reviewed-by: Jan Beulich 
---
 xen/Kconfig.debug   |  7 ++
 xen/common/page_alloc.c | 63 -
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index 689f297..195d504 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG
  logged in the Xen ring buffer.
  If unsure, say N here.
 
+config SCRUB_DEBUG
+   bool "Page scrubbing test"
+   default DEBUG
+   ---help---
+ Verify that pages that need to be scrubbed before being allocated to
+ a guest are indeed scrubbed.
+
 endif # DEBUG || EXPERT
 
 endmenu
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 34c45be..388b121 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -170,6 +170,10 @@ boolean_param("bootscrub", opt_bootscrub);
 static unsigned long __initdata opt_bootscrub_chunk = MB(128);
 size_param("bootscrub_chunk", opt_bootscrub_chunk);
 
+#ifdef CONFIG_SCRUB_DEBUG
+static bool __read_mostly boot_scrub_done;
+#endif
+
 /*
  * Bit width of the DMA heap -- used to override NUMA-node-first.
  * allocation strategy, which can otherwise exhaust low memory.
@@ -694,6 +698,43 @@ static void page_list_add_scrub(struct page_info *pg, 
unsigned int node,
 page_list_add(pg, &heap(node, zone, order));
 }
 
+/* SCRUB_PATTERN needs to be a repeating series of bytes. */
+#ifndef NDEBUG
+#define SCRUB_PATTERN        0xc2c2c2c2c2c2c2c2ULL
+#else
+#define SCRUB_PATTERN        0ULL
+#endif
+#define SCRUB_BYTE_PATTERN   (SCRUB_PATTERN & 0xff)
+
+static void poison_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+mfn_t mfn = _mfn(page_to_mfn(pg));
+uint64_t *ptr;
+
+ptr = map_domain_page(mfn);
+*ptr = ~SCRUB_PATTERN;
+unmap_domain_page(ptr);
+#endif
+}
+
+static void check_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+mfn_t mfn = _mfn(page_to_mfn(pg));
+const uint64_t *ptr;
+unsigned int i;
+
+if ( !boot_scrub_done )
+return;
+
+ptr = map_domain_page(mfn);
+for ( i = 0; i < PAGE_SIZE / sizeof (*ptr); i++ )
+ASSERT(ptr[i] == SCRUB_PATTERN);
+unmap_domain_page(ptr);
+#endif
+}
+
 static void check_and_stop_scrub(struct page_info *head)
 {
 if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
@@ -928,6 +969,9 @@ static struct page_info *alloc_heap_pages(
  * guest can control its own visibility of/through the cache.
  */
 flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & 
MEMF_no_icache_flush));
+
+if ( !(memflags & MEMF_no_scrub) )
+check_one_page(&pg[i]);
 }
 
 spin_unlock(_lock);
@@ -1291,7 +1335,10 @@ static void free_heap_pages(
 set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
 
 if ( need_scrub )
+{
 pg[i].count_info |= PGC_need_scrub;
+poison_one_page(&pg[i]);
+}
 }
 
 avail[node][zone] += 1 << order;
@@ -1649,7 +1696,12 @@ static void init_heap_pages(
 nr_pages -= n;
 }
 
+#ifndef CONFIG_SCRUB_DEBUG
 free_heap_pages(pg + i, 0, false);
+#else
+free_heap_pages(pg + i, 0, boot_scrub_done);
+#endif
+   
 }
 }
 
@@ -1915,6 +1967,10 @@ void __init scrub_heap_pages(void)
 
 printk("done.\n");
 
+#ifdef CONFIG_SCRUB_DEBUG
+boot_scrub_done = true;
+#endif
+
 /* Now that the heap is initialized, run checks and set bounds
  * for the low mem virq algorithm. */
 setup_low_mem_virq();
@@ -2188,12 +2244,16 @@ void free_domheap_pages(struct page_info *pg, unsigned 
int order)
 
 spin_unlock_recursive(&d->page_alloc_lock);
 
+#ifndef CONFIG_SCRUB_DEBUG
 /*
  * Normally we expect a domain to clear pages before freeing them,
  * if it cares about the secrecy of their contents. However, after
  * a domain has died we assume responsibility for erasure.
  */
 scrub = !!d->is_dying;
+#else
+scrub = true;
+#endif
 }
 else
 {
@@ -2285,7 +2345,8 @@ void scrub_one_page(struct page_info *pg)
 
 #ifndef NDEBUG
 /* Avoid callers relying on allocations returning zeroed pages. */
-unmap_domain_page(memset(__map_domain_page(pg), 0xc2, PAGE_SIZE));
+unmap_domain_page(memset(__map_domain_page(pg),
+ SCRUB_BYTE_PATTERN, PAGE_SIZE));
 #else
 /* For a production build, clear_page() is the fastest way to scrub. */
 clear_domain_page(_mfn(page_to_mfn(pg)));
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCHES v8 3/8] mm: Scrub pages in alloc_heap_pages() if needed

2017-08-16 Thread Boris Ostrovsky
When allocating pages in alloc_heap_pages() first look for clean pages. If none
are found then retry, taking pages marked as unscrubbed and scrubbing them.

Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet. However,
this will become possible when we stop scrubbing from free_heap_pages() and
instead do it from the idle loop.

Since not all allocations require clean pages (such as xenheap allocations)
introduce a MEMF_no_scrub flag that callers can set if they are willing to
consume unscrubbed pages.
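
For illustration, a hypothetical caller whose buffer is fully overwritten
before use (only MEMF_no_scrub below comes from this patch; the surrounding
fragment is made up) would simply OR the flag into its memflags:

/* Contents are about to be overwritten, so dirty pages are acceptable. */
pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
if ( pg == NULL )
    return -ENOMEM;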

Signed-off-by: Boris Ostrovsky 
Reviewed-by: Jan Beulich 
---
 xen/common/page_alloc.c | 33 +
 xen/include/xen/mm.h|  4 +++-
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 5c550b5..1303736 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -702,6 +702,7 @@ static struct page_info *get_free_buddy(unsigned int 
zone_lo,
 nodemask_t nodemask = d ? d->node_affinity : node_online_map;
 unsigned int j, zone, nodemask_retry = 0;
 struct page_info *pg;
+bool use_unscrubbed = (memflags & MEMF_no_scrub);
 
 if ( node == NUMA_NO_NODE )
 {
@@ -733,8 +734,20 @@ static struct page_info *get_free_buddy(unsigned int 
zone_lo,
 
 /* Find smallest order which can satisfy the request. */
 for ( j = order; j <= MAX_ORDER; j++ )
+{
 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-return pg;
+{
+/*
+ * We grab single pages (order=0) even if they are
+ * unscrubbed. Given that scrubbing one page is fairly 
quick
+ * it is not worth breaking higher orders.
+ */
+if ( (order == 0) || use_unscrubbed ||
+ pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+return pg;
+page_list_add_tail(pg, &heap(node, zone, j));
+}
+}
 } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
 if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
@@ -818,6 +831,10 @@ static struct page_info *alloc_heap_pages(
 }
 
 pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+/* Try getting a dirty buddy if we couldn't get a clean one. */
+if ( !pg && !(memflags & MEMF_no_scrub) )
+pg = get_free_buddy(zone_lo, zone_hi, order,
+memflags | MEMF_no_scrub, d);
 if ( !pg )
 {
 /* No suitable memory blocks. Fail the request. */
@@ -863,7 +880,15 @@ static struct page_info *alloc_heap_pages(
 for ( i = 0; i < (1 << order); i++ )
 {
 /* Reference count must continuously be zero for free pages. */
-BUG_ON(pg[i].count_info != PGC_state_free);
+BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
+
+if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+{
+if ( !(memflags & MEMF_no_scrub) )
+scrub_one_page(&pg[i]);
+node_need_scrub[node]--;
+}
+
 pg[i].count_info = PGC_state_inuse;
 
 if ( !(memflags & MEMF_no_tlbflush) )
@@ -1737,7 +1762,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned 
int memflags)
 ASSERT(!in_irq());
 
 pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN,
-  order, memflags, NULL);
+  order, memflags | MEMF_no_scrub, NULL);
 if ( unlikely(pg == NULL) )
 return NULL;
 
@@ -1787,7 +1812,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned 
int memflags)
 if ( !(memflags >> _MEMF_bits) )
 memflags |= MEMF_bits(xenheap_bits);
 
-pg = alloc_domheap_pages(NULL, order, memflags);
+pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
 if ( unlikely(pg == NULL) )
 return NULL;
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 503b92e..e1f9c42 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -248,7 +248,9 @@ struct npfec {
 #define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
 #define _MEMF_no_icache_flush 7
 #define  MEMF_no_icache_flush (1U<<_MEMF_no_icache_flush)
-#define _MEMF_node8
+#define _MEMF_no_scrub8
+#define  MEMF_no_scrub(1U<<_MEMF_no_scrub)
+#define _MEMF_node16
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n) ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
 #define  MEMF_get_node(f) ((((f) >> _MEMF_node) - 1) & MEMF_node_mask)
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] x86/svm: Use physical addresses for HSA and Host VMCB

2017-08-16 Thread Andrew Cooper
They are only referenced by physical address (either the HSA MSR, or via
VMSAVE/VMLOAD which take a physical operand).  Allocating xenheap pages and
storing their virtual address is wasteful.

Allocate them with domheap pages instead, taking the opportunity to suitably
NUMA-position them.  This avoids Xen needing to perform a virt to phys
translation on every context switch.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Boris Ostrovsky 
CC: Suravee Suthikulpanit 

TODO at some other point: Figure out why svm_cpu_up_prepare() is reliably
called twice for every CPU.
---
 xen/arch/x86/hvm/svm/svm.c | 72 --
 xen/arch/x86/hvm/svm/vmcb.c| 15 
 xen/include/asm-x86/hvm/svm/vmcb.h |  1 -
 3 files changed, 54 insertions(+), 34 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 0dc9442..599a8d3 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -72,11 +72,13 @@ static void svm_update_guest_efer(struct vcpu *);
 
 static struct hvm_function_table svm_function_table;
 
-/* va of hardware host save area */
-static DEFINE_PER_CPU_READ_MOSTLY(void *, hsa);
-
-/* vmcb used for extended host state */
-static DEFINE_PER_CPU_READ_MOSTLY(void *, root_vmcb);
+/*
+ * Physical addresses of the Host State Area (for hardware) and vmcb (for Xen)
+ * which contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state when in
+ * guest vcpu context.
+ */
+static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, hsa);
+static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, host_vmcb);
 
 static bool_t amd_erratum383_found __read_mostly;
 
@@ -1015,7 +1017,7 @@ static void svm_ctxt_switch_from(struct vcpu *v)
 svm_tsc_ratio_save(v);
 
 svm_sync_vmcb(v);
-svm_vmload(per_cpu(root_vmcb, cpu));
+svm_vmload_pa(per_cpu(host_vmcb, cpu));
 
 /* Resume use of ISTs now that the host TR is reinstated. */
 set_ist(_tables[cpu][TRAP_double_fault],  IST_DF);
@@ -1045,7 +1047,7 @@ static void svm_ctxt_switch_to(struct vcpu *v)
 
 svm_restore_dr(v);
 
-svm_vmsave(per_cpu(root_vmcb, cpu));
+svm_vmsave_pa(per_cpu(host_vmcb, cpu));
 svm_vmload(vmcb);
 vmcb->cleanbits.bytes = 0;
 svm_lwp_load(v);
@@ -1468,24 +1470,58 @@ static int svm_event_pending(struct vcpu *v)
 
 static void svm_cpu_dead(unsigned int cpu)
 {
-free_xenheap_page(per_cpu(hsa, cpu));
-per_cpu(hsa, cpu) = NULL;
-free_vmcb(per_cpu(root_vmcb, cpu));
-per_cpu(root_vmcb, cpu) = NULL;
+paddr_t *this_hsa = &per_cpu(hsa, cpu);
+paddr_t *this_vmcb = &per_cpu(host_vmcb, cpu);
+
+if ( *this_hsa )
+{
+free_domheap_page(maddr_to_page(*this_hsa));
+*this_hsa = 0;
+}
+
+if ( *this_vmcb )
+{
+free_domheap_page(maddr_to_page(*this_vmcb));
+*this_vmcb = 0;
+}
 }
 
 static int svm_cpu_up_prepare(unsigned int cpu)
 {
-if ( ((per_cpu(hsa, cpu) == NULL) &&
-  ((per_cpu(hsa, cpu) = alloc_host_save_area()) == NULL)) ||
- ((per_cpu(root_vmcb, cpu) == NULL) &&
-  ((per_cpu(root_vmcb, cpu) = alloc_vmcb()) == NULL)) )
+paddr_t *this_hsa = &per_cpu(hsa, cpu);
+paddr_t *this_vmcb = &per_cpu(host_vmcb, cpu);
+nodeid_t node = cpu_to_node(cpu);
+unsigned int memflags = 0;
+struct page_info *pg;
+
+if ( node != NUMA_NO_NODE )
+memflags = MEMF_node(node);
+
+if ( !*this_hsa )
+{
+pg = alloc_domheap_page(NULL, memflags);
+if ( !pg )
+goto err;
+
+clear_domain_page(_mfn(page_to_mfn(pg)));
+*this_hsa = page_to_maddr(pg);
+}
+
+if ( !*this_vmcb )
 {
-svm_cpu_dead(cpu);
-return -ENOMEM;
+pg = alloc_domheap_page(NULL, memflags);
+if ( !pg )
+goto err;
+
+clear_domain_page(_mfn(page_to_mfn(pg)));
+*this_vmcb = page_to_maddr(pg);
 }
 
 return 0;
+
+ err:
+svm_cpu_dead(cpu);
+return -ENOMEM;
 }
 
 static void svm_init_erratum_383(const struct cpuinfo_x86 *c)
@@ -1544,7 +1580,7 @@ static int _svm_cpu_up(bool bsp)
 write_efer(read_efer() | EFER_SVME);
 
 /* Initialize the HSA for this core. */
-wrmsrl(MSR_K8_VM_HSAVE_PA, (uint64_t)virt_to_maddr(per_cpu(hsa, cpu)));
+wrmsrl(MSR_K8_VM_HSAVE_PA, per_cpu(hsa, cpu));
 
 /* check for erratum 383 */
 svm_init_erratum_383(c);
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 9493215..997e759 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -50,21 +50,6 @@ void free_vmcb(struct vmcb_struct *vmcb)
 free_xenheap_page(vmcb);
 }
 
-struct host_save_area *alloc_host_save_area(void)
-{
-struct host_save_area *hsa;
-
-hsa = alloc_xenheap_page();
-if ( hsa == NULL )
-{
-printk(XENLOG_WARNING "Warning: failed to allocate hsa.\n");
-return NULL;
-}
-
-

[Xen-devel] [PATCH v2 1/2] paravirt,xen: remove xen_patch()

2017-08-16 Thread Juergen Gross
Xen's paravirt patch function xen_patch() does some special casing for
irq_ops functions to apply relocations when those functions can be
patched inline instead of being called.

Unfortunately none of the special case function replacements is small
enough to be patched inline, so the special case never applies.

As xen_patch() will call paravirt_patch_default() in all cases it can
just be dropped. xen-asm.h doesn't seem necessary without xen_patch(),
as the only thing left in it would be the definition of XEN_EFLAGS_NMI,
used only once. So move that definition and remove xen-asm.h.

Signed-off-by: Juergen Gross 
---
 arch/x86/xen/enlighten_pv.c | 59 +
 arch/x86/xen/xen-asm.S  | 24 --
 arch/x86/xen/xen-asm.h  | 12 -
 arch/x86/xen/xen-asm_32.S   | 27 -
 arch/x86/xen/xen-asm_64.S   | 20 ---
 arch/x86/xen/xen-ops.h  | 15 +++-
 6 files changed, 20 insertions(+), 137 deletions(-)
 delete mode 100644 arch/x86/xen/xen-asm.h

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 811e4ddb3f37..98491521bb43 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -981,59 +981,6 @@ void __ref xen_setup_vcpu_info_placement(void)
}
 }
 
-static unsigned xen_patch(u8 type, u16 clobbers, void *insnbuf,
- unsigned long addr, unsigned len)
-{
-   char *start, *end, *reloc;
-   unsigned ret;
-
-   start = end = reloc = NULL;
-
-#define SITE(op, x)\
-   case PARAVIRT_PATCH(op.x):  \
-   if (xen_have_vcpu_info_placement) { \
-   start = (char *)xen_##x##_direct;   \
-   end = xen_##x##_direct_end; \
-   reloc = xen_##x##_direct_reloc; \
-   }   \
-   goto patch_site
-
-   switch (type) {
-   SITE(pv_irq_ops, irq_enable);
-   SITE(pv_irq_ops, irq_disable);
-   SITE(pv_irq_ops, save_fl);
-   SITE(pv_irq_ops, restore_fl);
-#undef SITE
-
-   patch_site:
-   if (start == NULL || (end-start) > len)
-   goto default_patch;
-
-   ret = paravirt_patch_insns(insnbuf, len, start, end);
-
-   /* Note: because reloc is assigned from something that
-  appears to be an array, gcc assumes it's non-null,
-  but doesn't know its relationship with start and
-  end. */
-   if (reloc > start && reloc < end) {
-   int reloc_off = reloc - start;
-   long *relocp = (long *)(insnbuf + reloc_off);
-   long delta = start - (char *)addr;
-
-   *relocp += delta;
-   }
-   break;
-
-   default_patch:
-   default:
-   ret = paravirt_patch_default(type, clobbers, insnbuf,
-addr, len);
-   break;
-   }
-
-   return ret;
-}
-
 static const struct pv_info xen_info __initconst = {
.shared_kernel_pmd = 0,
 
@@ -1043,10 +990,6 @@ static const struct pv_info xen_info __initconst = {
.name = "Xen",
 };
 
-static const struct pv_init_ops xen_init_ops __initconst = {
-   .patch = xen_patch,
-};
-
 static const struct pv_cpu_ops xen_cpu_ops __initconst = {
.cpuid = xen_cpuid,
 
@@ -1244,7 +1187,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
 
/* Install Xen paravirt ops */
pv_info = xen_info;
-   pv_init_ops = xen_init_ops;
+   pv_init_ops.patch = paravirt_patch_default;
pv_cpu_ops = xen_cpu_ops;
 
x86_platform.get_nmi_reason = xen_get_nmi_reason;
diff --git a/arch/x86/xen/xen-asm.S b/arch/x86/xen/xen-asm.S
index eff224df813f..5410c122b8f2 100644
--- a/arch/x86/xen/xen-asm.S
+++ b/arch/x86/xen/xen-asm.S
@@ -1,14 +1,8 @@
 /*
- * Asm versions of Xen pv-ops, suitable for either direct use or
- * inlining.  The inline versions are the same as the direct-use
- * versions, with the pre- and post-amble chopped off.
- *
- * This code is encoded for size rather than absolute efficiency, with
- * a view to being able to inline as much as possible.
+ * Asm versions of Xen pv-ops, suitable for direct use.
  *
  * We only bother with direct forms (ie, vcpu in percpu data) of the
- * operations here; the indirect forms are better handled in C, since
- * they're generally too large to inline anyway.
+ * operations here; the indirect forms are better handled in C.
  */
 
 #include 
@@ -16,7 +10,7 @@
 #include 
 #include 
 
-#include "xen-asm.h"
+#include 
 
 /*
  * Enable events.  This clears the event mask and tests the pending
@@ -38,13 

[Xen-devel] [PATCH v2 0/2] x86: paravirt related cleanup

2017-08-16 Thread Juergen Gross
Cleanup special cases of paravirt patching:

- Xen doesn't need a custom patching function, it can use
  paravirt_patch_default()

- Remove lguest completely from the tree. An LKML mail asking for any
  users 3 months ago did not reveal any need for keeping lguest [1].

In case the patches make it to the tree there is quite some potential
for further simplification of the paravirt code. In particular, most of
the pv operations can be put under the CONFIG_XEN_PV umbrella.

Changes in V2:
- drop patch 3 (removal of vsmp support)
- patch 1: remove even more stuff no longer needed without xen_patch()
(Peter Zijlstra)

[1]: https://lkml.org/lkml/2017/5/15/502

Juergen Gross (2):
  paravirt,xen: remove xen_patch()
  x86/lguest: remove lguest support

 MAINTAINERS   |   11 -
 arch/x86/Kbuild   |3 -
 arch/x86/Kconfig  |2 -
 arch/x86/include/asm/lguest.h |   91 -
 arch/x86/include/asm/lguest_hcall.h   |   74 -
 arch/x86/include/asm/processor.h  |2 +-
 arch/x86/include/uapi/asm/bootparam.h |2 +-
 arch/x86/kernel/asm-offsets_32.c  |   20 -
 arch/x86/kernel/head_32.S |2 -
 arch/x86/kernel/platform-quirks.c |1 -
 arch/x86/kvm/Kconfig  |1 -
 arch/x86/lguest/Kconfig   |   14 -
 arch/x86/lguest/Makefile  |2 -
 arch/x86/lguest/boot.c| 1558 ---
 arch/x86/lguest/head_32.S |  192 --
 arch/x86/xen/enlighten_pv.c   |   59 +-
 arch/x86/xen/xen-asm.S|   24 +-
 arch/x86/xen/xen-asm.h|   12 -
 arch/x86/xen/xen-asm_32.S |   27 +-
 arch/x86/xen/xen-asm_64.S |   20 +-
 arch/x86/xen/xen-ops.h|   15 +-
 drivers/Makefile  |1 -
 drivers/block/Kconfig |2 +-
 drivers/char/Kconfig  |2 +-
 drivers/char/virtio_console.c |2 +-
 drivers/lguest/Kconfig|   13 -
 drivers/lguest/Makefile   |   26 -
 drivers/lguest/README |   47 -
 drivers/lguest/core.c |  398 
 drivers/lguest/hypercalls.c   |  304 ---
 drivers/lguest/interrupts_and_traps.c |  706 ---
 drivers/lguest/lg.h   |  258 ---
 drivers/lguest/lguest_user.c  |  446 -
 drivers/lguest/page_tables.c  | 1239 
 drivers/lguest/segments.c |  228 ---
 drivers/lguest/x86/core.c |  724 ---
 drivers/lguest/x86/switcher_32.S  |  388 
 drivers/net/Kconfig   |2 +-
 drivers/tty/hvc/Kconfig   |2 +-
 drivers/virtio/Kconfig|4 +-
 include/linux/lguest.h|   73 -
 include/linux/lguest_launcher.h   |   44 -
 include/uapi/linux/virtio_ring.h  |4 +-
 tools/Makefile|   11 +-
 tools/lguest/.gitignore   |2 -
 tools/lguest/Makefile |   14 -
 tools/lguest/extract  |   58 -
 tools/lguest/lguest.c | 3420 -
 tools/lguest/lguest.txt   |  125 --
 49 files changed, 36 insertions(+), 10639 deletions(-)
 delete mode 100644 arch/x86/include/asm/lguest.h
 delete mode 100644 arch/x86/include/asm/lguest_hcall.h
 delete mode 100644 arch/x86/lguest/Kconfig
 delete mode 100644 arch/x86/lguest/Makefile
 delete mode 100644 arch/x86/lguest/boot.c
 delete mode 100644 arch/x86/lguest/head_32.S
 delete mode 100644 arch/x86/xen/xen-asm.h
 delete mode 100644 drivers/lguest/Kconfig
 delete mode 100644 drivers/lguest/Makefile
 delete mode 100644 drivers/lguest/README
 delete mode 100644 drivers/lguest/core.c
 delete mode 100644 drivers/lguest/hypercalls.c
 delete mode 100644 drivers/lguest/interrupts_and_traps.c
 delete mode 100644 drivers/lguest/lg.h
 delete mode 100644 drivers/lguest/lguest_user.c
 delete mode 100644 drivers/lguest/page_tables.c
 delete mode 100644 drivers/lguest/segments.c
 delete mode 100644 drivers/lguest/x86/core.c
 delete mode 100644 drivers/lguest/x86/switcher_32.S
 delete mode 100644 include/linux/lguest.h
 delete mode 100644 include/linux/lguest_launcher.h
 delete mode 100644 tools/lguest/.gitignore
 delete mode 100644 tools/lguest/Makefile
 delete mode 100644 tools/lguest/extract
 delete mode 100644 tools/lguest/lguest.c
 delete mode 100644 tools/lguest/lguest.txt

-- 
2.12.3




[Xen-devel] [PATCH v2 2.5/4] xen/x86: Replace mandatory barriers with compiler barriers

2017-08-16 Thread Andrew Cooper
In this case, rmb() is being used for its compiler barrier property.  Replace
it with an explicit barrier() and a comment, to avoid it becoming an unnecessary
lfence instruction (when rmb() gets fixed) or looking like an SMP issue.
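For reference, a compiler barrier is just an empty asm with a memory
clobber; a sketch of the usual definition (the exact Xen spelling may
differ slightly):

    #define barrier() __asm__ __volatile__("" : : : "memory")

It forces the compiler to forget cached memory values across it, but
emits no instruction, unlike an lfence-based rmb().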

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
---
 xen/drivers/passthrough/amd/iommu_init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index a459e99..474992a 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -558,7 +558,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
 return;
 }
 udelay(1);
-rmb();
+barrier(); /* Prevent hoisting of the entry[] read. */
 code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
   IOMMU_EVENT_CODE_SHIFT);
 }
@@ -663,7 +663,7 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
 return;
 }
 udelay(1);
-rmb();
+barrier(); /* Prevent hoisting of the entry[] read. */
 code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
   IOMMU_PPR_LOG_CODE_SHIFT);
 }
-- 
2.1.4




Re: [Xen-devel] [PATCH v2 2/4] xen/x86: Drop unnecessary barriers

2017-08-16 Thread Andrew Cooper
On 16/08/17 17:47, Andrew Cooper wrote:
> On 16/08/17 16:23, Jan Beulich wrote:
> On 16.08.17 at 13:22,  wrote:
>>> x86's current implementation of wmb() is a compiler barrier.  As a result,
>>> the only change in this patch is to remove an mfence instruction from
>>> cpuidle_disable_deep_cstate().
>>>
>>> None of these barriers serve any purpose.  Most aren't synchronising
>>> with any remote cpus, whereas the mcetelem barriers are redundant with
>>> spin_unlock(), which already has full read/write barrier semantics.
>>>
>>> Signed-off-by: Andrew Cooper 
>> For the relevant parts
>> Acked-by: Jan Beulich 
>> For the parts the ack doesn't extend to, however:
>>
>>> --- a/xen/arch/x86/mm/shadow/multi.c
>>> +++ b/xen/arch/x86/mm/shadow/multi.c
>>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
>>>   * will make sure no inconsistent mapping being translated into
>>>   * shadow page table. */
>>>  version = atomic_read(&v->arch.paging.shadow.gtable_dirty_version);
>>> -rmb();
>>>  walk_ok = sh_walk_guest_tables(v, va, , error_code);
>> Isn't this supposed to make sure version is being read first? I.e.
>> doesn't this at least need to be barrier()?
> atomic_read() is not free to be reordered by the compiler.  It is an asm
> volatile with a volatile memory reference.
>
>>> index a459e99..d5b6049 100644
>>> --- a/xen/drivers/passthrough/amd/iommu_init.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_init.c
>>> @@ -558,7 +558,6 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
>>>  return;
>>>  }
>>>  udelay(1);
>>> -rmb();
>>>  code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
>>>IOMMU_EVENT_CODE_SHIFT);
>>>  }
>>> @@ -663,7 +662,6 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
>>>  return;
>>>  }
>>>  udelay(1);
>>> -rmb();
>>>  code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
>>>IOMMU_PPR_LOG_CODE_SHIFT);
>>>  }
>> With these fully removed, what keeps the compiler from moving
>> the entry[1] reads out of the loop? Implementation details of
>> udelay() don't count...
> It is a write to the control variable which is derived from a non-local
> non-constant object.  It can't be hoisted at all.
>
> Consider this simplified version:
>
> while ( count == 0 )
> count = entry[1];
>
> If entry were const, the compiler would be free to expect that the value
> doesn't change on repeated reads, but that is not the case here.

(And continuing my run of luck today), it turns out that GCC does
compile my example here to an infinite loop.

82d08026025f:   84 c0                   test   %al,%al
82d080260261:   75 0a                   jne    82d08026026d

82d080260263:   8b 46 04                mov    0x4(%rsi),%eax
82d080260266:   c1 e8 1c                shr    $0x1c,%eax
82d080260269:   84 c0                   test   %al,%al
82d08026026b:   74 fc                   je     82d080260269



I will move this to being a barrier() with a hoisting comment (to avoid
it looking like an SMP issue), and I'm going to have to re-evaluate how
sane I think the C standard is.

~Andrew



Re: [Xen-devel] [RFC PATCH v2 07/22] ARM: vGIC: introduce priority setter/getter

2017-08-16 Thread Julien Grall



On 16/08/17 17:48, Andre Przywara wrote:

Hi,

On 11/08/17 15:10, Julien Grall wrote:

Hi Andre,

On 21/07/17 20:59, Andre Przywara wrote:

Since the GICs MMIO access always covers a number of IRQs at once,
introduce wrapper functions which loop over those IRQs, take their
locks and read or update the priority values.
This will be used in a later patch.

Signed-off-by: Andre Przywara 
---
 xen/arch/arm/vgic.c| 37 +
 xen/include/asm-arm/vgic.h |  5 +
 2 files changed, 42 insertions(+)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 434b7e2..b2c9632 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -243,6 +243,43 @@ static int vgic_get_virq_priority(struct vcpu *v,
unsigned int virq)
 return ACCESS_ONCE(rank->priority[virq & INTERRUPT_RANK_MASK]);
 }

+#define MAX_IRQS_PER_IPRIORITYR 4


The name gives the impression that you may have an IPRIORITYR with only 1
IRQ. But this is not true. The register always covers 4. However, you are
able to access it using byte or word.


+uint32_t vgic_fetch_irq_priority(struct vcpu *v, unsigned int nrirqs,


I am well aware that the vgic code is mixing between virq and irq.
Moving forward, we should use virq to avoid confusion.


+ unsigned int first_irq)


Please stay consistent with the naming: either nr_irqs/first_irq or
nrirqs/firstirq, but not a mix.


I totally agree, but check this out:
xen/include/asm-arm/irq.h:#define nr_irqs NR_IRQS

So wherever you write nr_irqs in *any* part of ARM IRQ code you end up
with a compile error ...
Not easy to fix, though, hence I moved to the name without the
underscore, even though I don't really like it.


Oh. On a side note, nr_irqs does not cover all the IRQs. It only covers
up to the SPIs, which is a little bit odd.


Anyway, maybe you could rename it to nr. I think it is fairly
straightforward that you are dealing with IRQs.


Cheers,

--
Julien Grall



Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-16 Thread Thomas Garnier
On Wed, Aug 16, 2017 at 8:12 AM, Ingo Molnar  wrote:
>
>
> * Thomas Garnier  wrote:
>
> > On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar  wrote:
> > >
> > > * Thomas Garnier  wrote:
> > >
> > >> > Do these changes get us closer to being able to build the kernel as 
> > >> > truly
> > >> > position independent, i.e. to place it anywhere in the valid x86-64 
> > >> > address
> > >> > space? Or any other advantages?
> > >>
> > >> Yes, PIE allows us to put the kernel anywhere in memory. It will allow 
> > >> us to
> > >> have a full randomized address space where position and order of 
> > >> sections are
> > >> completely random. There is still some work to get there but being able 
> > >> to build
> > >> a PIE kernel is a significant step.
> > >
> > > So I _really_ dislike the whole PIE approach, because of the huge 
> > > slowdown:
> > >
> > > +config RANDOMIZE_BASE_LARGE
> > > +   bool "Increase the randomization range of the kernel image"
> > > +   depends on X86_64 && RANDOMIZE_BASE
> > > +   select X86_PIE
> > > +   select X86_MODULE_PLTS if MODULES
> > > +   default n
> > > +   ---help---
> > > + Build the kernel as a Position Independent Executable (PIE) and
> > > + increase the available randomization range from 1GB to 3GB.
> > > +
> > > + This option impacts performance on kernel CPU intensive 
> > > workloads up
> > > + to 10% due to PIE generated code. Impact on user-mode processes 
> > > and
> > > + typical usage would be significantly less (0.50% when you build 
> > > the
> > > + kernel).
> > > +
> > > + The kernel and modules will generate slightly more assembly (1 
> > > to 2%
> > > + increase on the .text sections). The vmlinux binary will be
> > > + significantly smaller due to less relocations.
> > >
> > > To put 10% kernel overhead into perspective: enabling this option wipes 
> > > out about
> > > 5-10 years worth of painstaking optimizations we've done to keep the 
> > > kernel fast
> > > ... (!!)
> >
> > Note that 10% is the high-bound of a CPU intensive workload.
>
> Note that the 8-10% hackbench or even a 2%-4% range would be 'huge' in terms 
> of
> modern kernel performance. In many cases we are literally applying cycle level
> optimizations that are barely measurable. A 0.1% speedup in linear execution 
> speed
> is already a big success.
>
> > I am going to start doing performance testing on -mcmodel=large to see if 
> > it is
> > faster than -fPIE.
>
> Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine
> instruction level.
>
> Function calls look like this:
>
>  -mcmodel=medium:
>
>757:   e8 98 ff ff ff  callq  6f4 
>
>  -mcmodel=large
>
>77b:   48 b8 10 f7 df ff ffmovabs $0xffdff710,%rax
>782:   ff ff ff
>785:   48 8d 04 03 lea(%rbx,%rax,1),%rax
>789:   ff d0   callq  *%rax
>
> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is
> totally unacceptable.
>

I started looking into mcmodel=large and ran into multiple issues. In
the meantime, I thought I would try different configurations and
compilers.

I did 10 hackbench runs across 10 reboots with and without PIE (same
commit) with gcc 4.9. I copied the result below; based on the hackbench
configuration we are between -0.29% and 1.92% (average across runs is
0.8%), which seems more aligned with what people discussed in this
thread.

I don't know how I got a 10% maximum on hackbench; I am still
investigating. It could be the configuration I used or my base compiler
being too old.

> > > I think the fundamental flaw is the assumption that we need a PIE 
> > > executable
> > > to have a freely relocatable kernel on 64-bit CPUs.
> > >
> > > Have you considered a kernel with -mcmodel=small (or medium) instead of 
> > > -fpie
> > > -mcmodel=large? We can pick a random 2GB window in the (non-kernel) 
> > > canonical
> > > x86-64 address space to randomize the location of kernel text. The 
> > > location of
> > > modules can be further randomized within that 2GB window.
> >
> > -mcmodel=small/medium assume you are in the low 32 bits. They generate
> > instructions where the virtual addresses have the high 32 bits as zero.
>
> How are these assumptions hardcoded by GCC? Most of the instructions should be
> relocatable straight away, as most call/jump/branch instructions are 
> RIP-relative.

I think PIE is capable of using relative instructions well.
mcmodel=large assumes symbols can be anywhere.

>
> I.e. is there no GCC code generation mode where code can be placed anywhere 
> in the
> canonical address space, yet call and jump distance is within 31 bits so that 
> the
> generated code is fast?

I think that's basically PIE. With PIE, you have the assumption that
everything is close; the main issue is any assembly referencing
absolute addresses.
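As a hand-written illustration (the symbol name is made up, this is not
code from the kernel), an absolute reference such as:

    movabs $some_symbol, %rax

has to become RIP-relative before the code can run at an arbitrary
address:

    leaq some_symbol(%rip), %rax

Compiled C gets this for free under -fpie; hand-written assembly has to
be converted case by case.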


Re: [Xen-devel] [RFC PATCH v2 07/22] ARM: vGIC: introduce priority setter/getter

2017-08-16 Thread Andre Przywara
Hi,

On 11/08/17 15:10, Julien Grall wrote:
> Hi Andre,
> 
> On 21/07/17 20:59, Andre Przywara wrote:
>> Since the GICs MMIO access always covers a number of IRQs at once,
>> introduce wrapper functions which loop over those IRQs, take their
>> locks and read or update the priority values.
>> This will be used in a later patch.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>>  xen/arch/arm/vgic.c| 37 +
>>  xen/include/asm-arm/vgic.h |  5 +
>>  2 files changed, 42 insertions(+)
>>
>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>> index 434b7e2..b2c9632 100644
>> --- a/xen/arch/arm/vgic.c
>> +++ b/xen/arch/arm/vgic.c
>> @@ -243,6 +243,43 @@ static int vgic_get_virq_priority(struct vcpu *v,
>> unsigned int virq)
>>  return ACCESS_ONCE(rank->priority[virq & INTERRUPT_RANK_MASK]);
>>  }
>>
>> +#define MAX_IRQS_PER_IPRIORITYR 4
> 
> The name gives the impression that you may have an IPRIORITYR with only 1
> IRQ. But this is not true. The register always covers 4. However, you are
> able to access it using byte or word.
> 
>> +uint32_t vgic_fetch_irq_priority(struct vcpu *v, unsigned int nrirqs,
> 
> I am well aware that the vgic code is mixing between virq and irq.
> Moving forward, we should use virq to avoid confusion.
> 
>> + unsigned int first_irq)
> 
> Please stay consistent with the naming: either nr_irqs/first_irq or
> nrirqs/firstirq, but not a mix.

I totally agree, but check this out:
xen/include/asm-arm/irq.h:#define nr_irqs NR_IRQS

So wherever you write nr_irqs in *any* part of ARM IRQ code you end up
with a compile error ...
Not easy to fix, though, hence I moved to the name without the
underscore, even though I don't really like it.

Cheers,
Andre.

> 
> Also, it makes more sense to describe the start first, then the number.
> 
>> +{
>> +struct pending_irq *pirqs[MAX_IRQS_PER_IPRIORITYR];
>> +unsigned long flags;
>> +uint32_t ret = 0, i;
>> +
>> +local_irq_save(flags);
>> +vgic_lock_irqs(v, nrirqs, first_irq, pirqs);
> 
> I am not convinced on the usefulness of taking all the locks in one go.
> At one point in the time, you only need to lock a given pending_irq.
> 
>> +
>> +for ( i = 0; i < nrirqs; i++ )
>> +ret |= pirqs[i]->priority << (i * 8);
> 
> Please avoid open-coding numbers.
> 
>> +
>> +vgic_unlock_irqs(pirqs, nrirqs);
>> +local_irq_restore(flags);
>> +
>> +return ret;
>> +}
>> +
>> +void vgic_store_irq_priority(struct vcpu *v, unsigned int nrirqs,
>> + unsigned int first_irq, uint32_t value)
>> +{
>> +struct pending_irq *pirqs[MAX_IRQS_PER_IPRIORITYR];
>> +unsigned long flags;
>> +unsigned int i;
>> +
>> +local_irq_save(flags);
>> +vgic_lock_irqs(v, nrirqs, first_irq, pirqs);
>> +
>> +for ( i = 0; i < nrirqs; i++, value >>= 8 )
> 
> Same here.
> 
>> +pirqs[i]->priority = value & 0xff;
>> +
>> +vgic_unlock_irqs(pirqs, nrirqs);
>> +local_irq_restore(flags);
>> +}
>> +
>>  bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned
>> int irq)
>>  {
>>  unsigned long flags;
>> diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
>> index ecf4969..f3791c8 100644
>> --- a/xen/include/asm-arm/vgic.h
>> +++ b/xen/include/asm-arm/vgic.h
>> @@ -198,6 +198,11 @@ void vgic_lock_irqs(struct vcpu *v, unsigned int
>> nrirqs, unsigned int first_irq,
>>  struct pending_irq **pirqs);
>>  void vgic_unlock_irqs(struct pending_irq **pirqs, unsigned int nrirqs);
>>
>> +uint32_t vgic_fetch_irq_priority(struct vcpu *v, unsigned int nrirqs,
>> + unsigned int first_irq);
>> +void vgic_store_irq_priority(struct vcpu *v, unsigned int nrirqs,
>> + unsigned int first_irq, uint32_t reg);
>> +
>>  enum gic_sgi_mode;
>>
>>  /*
>>
> 
> Cheers,
> 



Re: [Xen-devel] [PATCH v2 2/4] xen/x86: Drop unnecessary barriers

2017-08-16 Thread Andrew Cooper
On 16/08/17 16:23, Jan Beulich wrote:
 On 16.08.17 at 13:22,  wrote:
>> x86's current implementation of wmb() is a compiler barrier.  As a result,
>> the only change in this patch is to remove an mfence instruction from
>> cpuidle_disable_deep_cstate().
>>
>> None of these barriers serve any purpose.  Most aren't synchronising
>> with any remote cpus, whereas the mcetelem barriers are redundant with
>> spin_unlock(), which already has full read/write barrier semantics.
>>
>> Signed-off-by: Andrew Cooper 
> For the relevant parts
> Acked-by: Jan Beulich 
> For the parts the ack doesn't extend to, however:
>
>> --- a/xen/arch/x86/mm/shadow/multi.c
>> +++ b/xen/arch/x86/mm/shadow/multi.c
>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
>>   * will make sure no inconsistent mapping being translated into
>>   * shadow page table. */
>>  version = atomic_read(&v->arch.paging.shadow.gtable_dirty_version);
>> -rmb();
>>  walk_ok = sh_walk_guest_tables(v, va, , error_code);
> Isn't this supposed to make sure version is being read first? I.e.
> doesn't this at least need to be barrier()?

atomic_read() is not free to be reordered by the compiler.  It is an asm
volatile with a volatile memory reference.
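As a sketch of the usual shape of such an accessor (not the exact Xen
definition, which wraps an asm volatile, but the same idea):

    #define atomic_read(v) (*(volatile int *)&((v)->counter))

The volatile access is what stops the compiler from caching or
reordering the read.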

>
>> index a459e99..d5b6049 100644
>> --- a/xen/drivers/passthrough/amd/iommu_init.c
>> +++ b/xen/drivers/passthrough/amd/iommu_init.c
>> @@ -558,7 +558,6 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
>>  return;
>>  }
>>  udelay(1);
>> -rmb();
>>  code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
>>IOMMU_EVENT_CODE_SHIFT);
>>  }
>> @@ -663,7 +662,6 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
>>  return;
>>  }
>>  udelay(1);
>> -rmb();
>>  code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
>>IOMMU_PPR_LOG_CODE_SHIFT);
>>  }
> With these fully removed, what keeps the compiler from moving
> the entry[1] reads out of the loop? Implementation details of
> udelay() don't count...

It is a write to the control variable which is derived from a non-local
non-constant object.  It can't be hoisted at all.

Consider this simplified version:

while ( count == 0 )
count = entry[1];

If entry were const, the compiler would be free to expect that the value
doesn't change on repeated reads, but that is not the case here.

~Andrew



[Xen-devel] [PATCH v2 6/6] xen: try to prevent idle timer from firing too often.

2017-08-16 Thread Dario Faggioli
The idea is: the more CPUs are still active in a grace period,
the longer we can wait before checking whether it's time to invoke
the callbacks (on those CPUs that have already quiesced,
are idle, and have callbacks queued).

What we're trying to avoid is for one of those idle CPUs to
wake up, only to discover that the grace period is still
running, and that it hence could have slept longer
(saving more power).

This patch implements a heuristic aimed at achieving
that, at the price of having to call cpumask_weight() on
the 'entering idle' path, on CPUs with queued callbacks.

Of course, at the same time, we don't want to delay
recognising that we can invoke the callbacks by too
much, so we also set a maximum.
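(As a worked example with the values below: with four CPUs still active
in the grace period, the timer is armed 4 x 10ms = 40ms in the future;
with ten or more, it is capped at the 100ms maximum.)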

Signed-off-by: Dario Faggioli 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
Cc: Julien Grall 
---
 xen/common/rcupdate.c |   18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index e27bfed..b9ae6cc 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -110,10 +110,17 @@ struct rcu_data {
  * About how far in the future the timer should be programmed each time,
  * it's hard to tell (guess!!). Since this mimics Linux's periodic timer
  * tick, take values used there as an indication. In Linux 2.6.21, tick
- * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable
- * at least some power saving on the CPU that is going idle.
+ * period can be 10ms, 4ms, 3.33ms or 1ms.
+ *
+ * That being said, we can assume that, the more CPUs are still active in
+ * the current grace period, the longer it will take for it to come to its
+ * end. We wait 10ms for each active CPU, as minimizing the wakeups enables
+ * more effective power saving, on the CPU that has gone idle. But we also
+ * never wait more than 100ms, to avoid delaying recognising the end of a
+ * grace period (and the invocation of the callbacks) by too much.
  */
-#define RCU_IDLE_TIMER_PERIOD MILLISECS(10)
+#define RCU_IDLE_TIMER_CPU_DELAY  MILLISECS(10)
+#define RCU_IDLE_TIMER_PERIOD_MAX MILLISECS(100)
 
 static DEFINE_PER_CPU(struct rcu_data, rcu_data);
 
@@ -444,6 +451,7 @@ int rcu_needs_cpu(int cpu)
 void rcu_idle_timer_start()
 {
struct rcu_data *rdp = &this_cpu(rcu_data);
+s_time_t next;
 
 /*
  * Note that we don't check rcu_pending() here. In fact, we don't want
@@ -453,7 +461,9 @@ void rcu_idle_timer_start()
 if (likely(!rdp->curlist))
 return;
 
-set_timer(&rdp->idle_timer, NOW() + RCU_IDLE_TIMER_PERIOD);
+next = min_t(s_time_t, RCU_IDLE_TIMER_PERIOD_MAX,
+ cpumask_weight(&rcu_ctrlblk.cpumask) * RCU_IDLE_TIMER_CPU_DELAY);
+set_timer(&rdp->idle_timer, NOW() + next);
 rdp->idle_timer_active = true;
 }
 




[Xen-devel] [PATCH v2 3/6] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.

2017-08-16 Thread Dario Faggioli
Xen is a tickless (micro-)kernel, i.e., when a CPU becomes
idle there is no timer tick that will periodically wake the
CPU up.
OTOH, when we imported RCU from Linux, Linux was (on x86) a
ticking kernel, i.e., there was a periodic timer tick always
running, even on idle CPUs. This was bad for power consumption,
but, for instance, made it easy to monitor the quiescent states
of all the CPUs, and hence tell when RCU grace periods ended.

In Xen, that is impossible, and that's particularly problematic
when the system is very lightly loaded, as some CPUs may never
have the chance to tell the RCU core logic about their quiescence,
and grace periods could extend indefinitely!

This has led, on x86, to long (and unpredictable) delays between
RCU callbacks queueing and their actual invocation. On ARM, we've
even seen infinite grace periods (e.g., complete_domain_destroy()
never being actually invoked!). See here:

 https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg02454.html

The first step for fixing this situation is for RCU to record,
at the beginning of a grace period, which CPUs are already idle.
In fact, being idle, they can't be in the middle of any read-side
critical section, and we don't have to wait for their quiescence.

This is tracked in a cpumask, in a similar way to how it was also
done in Linux (on s390, which was tickless already). It is also
basically the same approach used for making Linux x86 tickless,
in 2.6.21 (see commit 79bf2bb3 "tick-management: dyntick /
highres functionality").

For correctness, we also add barriers. One is also present in
Linux (see commit c3f59023, "Fix RCU race in access of nohz_cpu_mask"),
although we change the code comment to something that makes better
sense for us. The other (which is its pair) is put in the newly
introduced function rcu_idle_enter(), right after updating the
cpumask. They prevent races between CPUs going idle during the
beginning of a grace period.
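A sketch of the shape of the new helper described above (the actual
patch may differ in detail):

    void rcu_idle_enter(unsigned int cpu)
    {
        cpumask_set_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
        smp_mb();   /* pairs with the barrier in rcu_start_batch() */
    }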

Signed-off-by: Dario Faggioli 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
Cc: Julien Grall 
---
Changes from v1:
* call rcu_idle_{enter,exit}() from tick suspension/restarting logic.  This
  widens the window during which a CPU has its bit set in the idle cpumask.
  During review, it was suggested to do the opposite (narrow it), and that's
  what I did first. But then I changed my mind, as doing things as they look
  now (wide window) cures another pre-existing (and independent) race which
  Tim discovered, still during v1 review;
* add a barrier in rcu_idle_enter() too, to properly deal with the race Tim
  pointed out during review;
* mark CPU where RCU initialization happens, at boot, as non-idle.
---
 xen/common/rcupdate.c  |   48 ++--
 xen/common/schedule.c  |2 ++
 xen/include/xen/rcupdate.h |3 +++
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index 8cc5a82..9f7d41d 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -52,7 +52,8 @@ static struct rcu_ctrlblk {
 int  next_pending;  /* Is the next batch already waiting? */
 
 spinlock_t  lock __cacheline_aligned;
-cpumask_t   cpumask; /* CPUs that need to switch in order*/
+cpumask_t   cpumask; /* CPUs that need to switch in order ... */
+cpumask_t   idle_cpumask; /* ... unless they are already idle */
 /* for current batch to proceed.*/
 } __cacheline_aligned rcu_ctrlblk = {
 .cur = -300,
@@ -248,7 +249,16 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
 smp_wmb();
 rcp->cur++;
 
-cpumask_copy(&rcp->cpumask, &cpu_online_map);
+/*
+ * Make sure the increment of rcp->cur is visible so, even if a
+ * CPU that is about to go idle, is captured inside rcp->cpumask,
+ * rcu_pending() will return false, which then means cpu_quiet()
+ * will be invoked, before the CPU would actually enter idle.
+ *
+ * This barrier is paired with the one in rcu_idle_enter().
+ */
+smp_mb();
+cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);
 }
 }
 
@@ -474,7 +484,41 @@ static struct notifier_block cpu_nfb = {
 void __init rcu_init(void)
 {
 void *cpu = (void *)(long)smp_processor_id();
+
+cpumask_setall(&rcu_ctrlblk.idle_cpumask);
+/* The CPU we're running on is certainly not idle */
+cpumask_clear_cpu(smp_processor_id(), &rcu_ctrlblk.idle_cpumask);
cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
register_cpu_notifier(&cpu_nfb);
 open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
 }
+
+/*
+ * The CPU is becoming idle, so 

[Xen-devel] [PATCH v2 5/6] xen: RCU: avoid busy waiting until the end of grace period.

2017-08-16 Thread Dario Faggioli
On the CPU where a callback is queued, cpu_is_haltable()
returns false (due to rcu_needs_cpu() being itself false).
That means the CPU would spin inside idle_loop(), continuously
calling do_softirq(), and, in there, continuously checking
rcu_pending(), in a tight loop.

Let's instead allow the CPU to really go idle, but make sure,
by arming a timer, that we periodically check whether the
grace period has come to an end. As the period of the
timer, we pick a value that makes things look like what
happens in Linux, with the periodic tick (as this code
comes from there).

Note that the timer will *only* be armed on CPUs that are
going idle while having queued RCU callbacks. On CPUs that
don't, there won't be any timer, and their sleep won't be
interrupted (and even for CPUs with callbacks, we only
expect an handful of wakeups at most, but that depends on
the system load, as much as from other things).

Signed-off-by: Dario Faggioli 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
Cc: George Dunlap 
Cc: Julien Grall 
---
Changes from v1:
* clarified changelog;
* fix style/indentation issues;
* deal with RCU idle timer in tick suspension logic;
* as a consequence of the point above, the timer now fires, so kill
  the ASSERT_UNREACHABLE, and put a perfcounter there (to count the
  times it triggers);
* add a comment about the value chosen for programming the idle timer;
* avoid pointless/bogus '!!' and void* casts;
* rearrange the rcu_needs_cpu() return condition;
* add a comment to clarify why we don't want to check rcu_pending()
  in rcu_idle_timer_start().
---
 xen/arch/x86/cpu/mwait-idle.c |3 +-
 xen/common/rcupdate.c |   72 -
 xen/common/schedule.c |2 +
 xen/include/xen/perfc_defn.h  |2 +
 xen/include/xen/rcupdate.h|3 ++
 5 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 762dff1..b6770ea 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -741,9 +741,8 @@ static void mwait_idle(void)
}
 
cpufreq_dbs_timer_suspend();
-
sched_tick_suspend();
-   /* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
+   /* Timer related operations can raise TIMER_SOFTIRQ. Process it now. */
process_pending_softirqs();
 
/* Interrupts must be disabled for C2 and higher transitions. */
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index 9f7d41d..e27bfed 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -84,8 +84,37 @@ struct rcu_data {
 int cpu;
 struct rcu_head barrier;
 longlast_rs_qlen; /* qlen during the last resched */
+
+/* 3) idle CPUs handling */
+struct timer idle_timer;
+bool idle_timer_active;
 };
 
+/*
+ * If a CPU with RCU callbacks queued goes idle, when the grace period is
+ * not finished yet, how can we make sure that the callbacks will eventually
+ * be executed? In Linux (2.6.21, the first "tickless idle" Linux kernel),
+ * the periodic timer tick would not be stopped for such CPU. Here in Xen,
+ * we (may) don't even have a periodic timer tick, so we need to use a
+ * special purpose timer.
+ *
+ * Such timer:
+ * 1) is armed only when a CPU with an RCU callback(s) queued goes idle
+ *    before the end of the current grace period (_not_ for any CPUs that
+ *    go idle!);
+ * 2) when it fires, it is only re-armed if the grace period is still
+ *    running;
+ * 3) it is stopped immediately, if the CPU wakes up from idle and
+ *    resumes 'normal' execution.
+ *
+ * About how far in the future the timer should be programmed each time,
+ * it's hard to tell (guess!!). Since this mimics Linux's periodic timer
+ * tick, take values used there as an indication. In Linux 2.6.21, tick
+ * period can be 10ms, 4ms, 3.33ms or 1ms. Let's use 10ms, to enable
+ * at least some power saving on the CPU that is going idle.
+ */
+#define RCU_IDLE_TIMER_PERIOD MILLISECS(10)
+
 static DEFINE_PER_CPU(struct rcu_data, rcu_data);
 
 static int blimit = 10;
@@ -404,7 +433,45 @@ int rcu_needs_cpu(int cpu)
 {
struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
 
-return (!!rdp->curlist || rcu_pending(cpu));
+return (rdp->curlist && !rdp->idle_timer_active) || rcu_pending(cpu);
+}
+
+/*
+ * Timer for making sure the CPU where a callback is queued does
+ * periodically poke rcu_pending(), so that it will invoke the callback
+ * not too late after the end of the grace period.
+ */
+void rcu_idle_timer_start()
+{
+struct rcu_data *rdp = &this_cpu(rcu_data);

[Xen-devel] [PATCH v2 1/6] xen: in do_softirq() sample smp_processor_id() once and for all.

2017-08-16 Thread Dario Faggioli
In fact, right now, we read it at every iteration of the loop.
The reason it's done like this is how context switch was handled
on IA64 (see commit ae9bfcdc, "[XEN] Various softirq cleanups" [1]).

However:
1) we don't have IA64 any longer, and all the architectures that
   we do support are ok with sampling once and for all;
2) sampling at every iteration (slightly) affects performance;
3) sampling at every iteration is misleading, as it makes people
   believe that it is currently possible that SCHEDULE_SOFTIRQ
   moves the execution flow to another CPU (and the comment,
   by reinforcing this belief, makes things even worse!).

Therefore, let's:
- do the sampling only once, and remove the comment;
- leave an ASSERT() around, so that, if context switching
  logic changes (in current or new arches), we will notice.

[1] Some more (historical) information here:

http://old-list-archives.xenproject.org/archives/html/xen-devel/2006-06/msg01262.html

Signed-off-by: Dario Faggioli 
Reviewed-by: George Dunlap 
---
Cc: Andrew Cooper 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Tim Deegan 
---
This has been submitted already, as a part of another series. Discussion is 
here:
 https://lists.xen.org/archives/html/xen-devel/2017-06/msg00102.html

For the super lazy, Jan's latest words in that thread were these:
 "I've voiced my opinion, but I don't mean to block the patch. After
  all there's no active issue the change introduces."
 (https://lists.xen.org/archives/html/xen-devel/2017-06/msg00797.html)

Since then:
- changed "once and for all" with "only once", as requested by George (and
  applied his Reviewed-by, as he said I could).
---
 xen/common/softirq.c |8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/xen/common/softirq.c b/xen/common/softirq.c
index ac12cf8..67c84ba 100644
--- a/xen/common/softirq.c
+++ b/xen/common/softirq.c
@@ -27,16 +27,12 @@ static DEFINE_PER_CPU(unsigned int, batching);
 
 static void __do_softirq(unsigned long ignore_mask)
 {
-unsigned int i, cpu;
+unsigned int i, cpu = smp_processor_id();
 unsigned long pending;
 
 for ( ; ; )
 {
-/*
- * Initialise @cpu on every iteration: SCHEDULE_SOFTIRQ may move
- * us to another processor.
- */
-cpu = smp_processor_id();
+ASSERT(cpu == smp_processor_id());
 
 if ( rcu_pending(cpu) )
 rcu_check_callbacks(cpu);




[Xen-devel] [PATCH v2 4/6] xen: RCU: don't let a CPU with a callback go idle.

2017-08-16 Thread Dario Faggioli
If a CPU has a callback queued, it must be ready to invoke
it as soon as all the other CPUs involved in the grace period
have gone through a quiescent state.

But if we let such a CPU go idle, we can't really tell when (if!)
it will realize that it is actually time to invoke the callback.
To solve this problem, a CPU that has a callback queued (and has
already gone through a quiescent state itself) will stay online,
until the grace period ends, and the callback can be invoked.

This is similar to what Linux does, and is the second and last
step for fixing the overly long (or infinite!) grace periods.
The problem, though, is that, within Linux, we have the tick,
so all that is necessary is to not stop the tick for the CPU
(even if it has gone idle). In Xen, there's no tick, so we must
keep the CPU from going idle entirely, and let it spin on
rcu_pending(), consuming power and causing overhead.
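Concretely, a simplified sketch of the idle loop shape (modelled on the
ARM one touched by patch 2/6 of this series, not the exact Xen code):

    for ( ; ; )
    {
        local_irq_disable();
        if ( cpu_is_haltable(cpu) )   /* false while rcu_needs_cpu(cpu) */
            wfi();                    /* so the CPU never really sleeps */
        local_irq_enable();
        do_softirq();                 /* and rcu_pending() gets polled here */
    }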

In this commit, we implement the above, using rcu_needs_cpu(),
in a way similar to how it is used in Linux. This is correct,
useful and not wasteful for CPUs that participate in the grace
period but do not have a callback queued. For the ones that
have callbacks, an optimization that avoids having to spin is
introduced in a subsequent change.

Signed-off-by: Dario Faggioli 
Reviewed-by: Jan Beulich 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Julien Grall 
Cc: Tim Deegan 
Cc: Wei Liu 
---
 xen/include/xen/sched.h |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 5828a01..c116604 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -847,7 +847,8 @@ uint64_t get_cpu_idle_time(unsigned int cpu);
 
 /*
  * Used by idle loop to decide whether there is work to do:
- *  (1) Run softirqs; or (2) Play dead; or (3) Run tasklets.
+ *  (1) Deal with RCU; (2) or run softirqs; or (3) Play dead;
+ *  or (4) Run tasklets.
  *
  * About (3), if a tasklet is enqueued, it will be scheduled
  * really really soon, and hence it's pointless to try to
@@ -855,7 +856,8 @@ uint64_t get_cpu_idle_time(unsigned int cpu);
  * the tasklet_work_to_do() helper).
  */
 #define cpu_is_haltable(cpu)\
-(!softirq_pending(cpu) &&   \
+(!rcu_needs_cpu(cpu) && \
+ !softirq_pending(cpu) &&   \
  cpu_online(cpu) && \
  !per_cpu(tasklet_work_to_do, cpu))
 




[Xen-devel] [PATCH v2 0/6] xen: RCU: x86/ARM: Add support of rcu_idle_{enter, exit}

2017-08-16 Thread Dario Faggioli
Hello,

This is take 2 of this series, v1 of which can be found here:

 https://lists.xen.org/archives/html/xen-devel/2017-07/msg02770.html

This new version is mostly about taking care of the various review comments
received. Some of the differences are worth a mention here, though:

- patch 3 is significantly different, as a consequence of the fact that Tim
  highlighted, during v1 review, that there was another latent race that we
  should deal with. Luckily, it was basically enough to move the invocations
  of rcu_idle_{enter,exit}() a bit, and add some barriers (details in the
  patch changelog);

- patch 6 has been added, in an attempt to address concerns (coming mainly
  from Stefano) that the timer we need to introduce to deal with idle CPUs
  with queued callbacks may fire too often (leading to power being wasted).

There is a git branch, with this series in it, available here:

 git://xenbits.xen.org/people/dariof/xen.git rel/rcu/introduce-idle-enter-exit-v2
 https://travis-ci.org/fdario/xen/builds/265225626

This patch series addresses the XEN-27 issue, which I think Julien wants to
consider a blocker for 4.10:

 https://xenproject.atlassian.net/browse/XEN-27

Thanks and Regards,
Dario
---

Dario Faggioli (6):
  xen: in do_softirq() sample smp_processor_id() once and for all.
  xen: ARM: suspend the tick (if in use) when going idle.
  xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.
  xen: RCU: don't let a CPU with a callback go idle.
  xen: RCU: avoid busy waiting until the end of grace period.
  xen: try to prevent idle timer from firing too often.

 xen/arch/arm/domain.c |   29 ++---
 xen/arch/x86/cpu/mwait-idle.c |3 -
 xen/common/rcupdate.c |  130 -
 xen/common/schedule.c |4 +
 xen/common/softirq.c  |8 +--
 xen/include/xen/perfc_defn.h  |2 +
 xen/include/xen/rcupdate.h|6 ++
 xen/include/xen/sched.h   |6 +-
 8 files changed, 166 insertions(+), 22 deletions(-)
--
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



[Xen-devel] [PATCH v2 2/6] xen: ARM: suspend the tick (if in use) when going idle.

2017-08-16 Thread Dario Faggioli
Since commit 964fae8ac ("cpuidle: suspend/resume scheduler
tick timer during cpu idle state entry/exit"), if a scheduler
has a periodic tick timer, we stop it when going idle.

This, however, is only true for x86. Make it true for ARM as
well.

Signed-off-by: Dario Faggioli 
Reviewed-by: Stefano Stabellini 
---
Cc: Julien Grall 
---
 xen/arch/arm/domain.c |   29 -
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index eeebbdb..2160d2b 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -39,6 +39,25 @@
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 
+static void do_idle(void)
+{
+unsigned int cpu = smp_processor_id();
+
+sched_tick_suspend();
+/* sched_tick_suspend() can raise TIMER_SOFTIRQ. Process it now. */
+process_pending_softirqs();
+
+local_irq_disable();
+if ( cpu_is_haltable(cpu) )
+{
+dsb(sy);
+wfi();
+}
+local_irq_enable();
+
+sched_tick_resume();
+}
+
 void idle_loop(void)
 {
 unsigned int cpu = smp_processor_id();
@@ -52,15 +71,7 @@ void idle_loop(void)
 if ( unlikely(tasklet_work_to_do(cpu)) )
 do_tasklet();
 else
-{
-local_irq_disable();
-if ( cpu_is_haltable(cpu) )
-{
-dsb(sy);
-wfi();
-}
-local_irq_enable();
-}
+do_idle();
 
 do_softirq();
 /*




Re: [Xen-devel] [RFC PATCH v2 01/22] ARM: vGIC: introduce and initialize pending_irq lock

2017-08-16 Thread Julien Grall



On 16/08/17 17:27, Andre Przywara wrote:

Hi,

On 10/08/17 16:35, Julien Grall wrote:

Hi,

On 21/07/17 20:59, Andre Przywara wrote:

Currently we protect the pending_irq structure with the corresponding
VGIC VCPU lock. There are problems in certain corner cases (for
instance if an IRQ is migrating), so let's introduce a per-IRQ lock,
which will protect the consistency of this structure independent from
any VCPU.
For now this just introduces and initializes the lock, also adds
wrapper macros to simplify its usage (and help debugging).

Signed-off-by: Andre Przywara 
---
 xen/arch/arm/vgic.c|  1 +
 xen/include/asm-arm/vgic.h | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 1e5107b..38dacd3 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -69,6 +69,7 @@ void vgic_init_pending_irq(struct pending_irq *p,
unsigned int virq)
 memset(p, 0, sizeof(*p));
INIT_LIST_HEAD(&p->inflight);
INIT_LIST_HEAD(&p->lr_queue);
+spin_lock_init(&p->lock);
 p->irq = virq;
 p->lpi_vcpu_id = INVALID_VCPU_ID;
 }
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index d4ed23d..1c38b9a 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -90,6 +90,14 @@ struct pending_irq
  * TODO: when implementing irq migration, taking only the current
  * vgic lock is not going to be enough. */
 struct list_head lr_queue;
+/* The lock protects the consistency of this structure. A single
status bit
+ * can be read and/or set without holding the lock using the atomic
+ * set_bit/clear_bit/test_bit functions, however accessing
multiple bits or
+ * relating to other members in this struct requires the lock.
+ * The list_head members are protected by their corresponding
VCPU lock,
+ * it is not sufficient to hold this pending_irq lock here to
query or
+ * change list order or affiliation. */


Actually, I have one question here. Is the vCPU lock sufficient to
protect the list_head members, or do you also mandate that the
pending_irq be locked as well?


For *manipulating* a list (removing or adding a pending_irq) you need to
hold both locks. We need the VCPU lock as the list head in struct vcpu
could change, and we need the per-IRQ lock to prevent a pending_irq to
be inserted into two lists at the same time (and also the list_head
member variables are changed).
However just *checking* whether a certain pending_irq is a member of a
list works with just holding the per-IRQ lock.


This does not seem to be in line with the description above. It says "It
is not sufficient to hold this pending_irq lock here to query...".


Also, there are a few places not taking both lock when updating the 
list. This is at least the case of:

- vgic_clear_pending_irqs
- gic_clear_pending_irqs
- its_discard_event

So something has to be done to bring the code in line with the description.

Cheers,

--
Julien Grall



Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-16 Thread Ard Biesheuvel
On 16 August 2017 at 17:26, Daniel Micay  wrote:
>> How are these assumptions hardcoded by GCC? Most of the instructions
>> should be
>> relocatable straight away, as most call/jump/branch instructions are
>> RIP-relative.
>>
>> I.e. is there no GCC code generation mode where code can be placed
>> anywhere in the
>> canonical address space, yet call and jump distance is within 31 bits
>> so that the
>> generated code is fast?
>
> That's what PIE is meant to do. However, not disabling support for lazy
> linking (-fno-plt) / symbol interposition (-Bsymbolic) is going to cause
> it to add needless overhead.
>
> arm64 is using -pie -shared -Bsymbolic in arch/arm64/Makefile for their
> CONFIG_RELOCATABLE option. See 08cc55b2afd97a654f71b3bebf8bb0ec89fdc498.

The difference with arm64 is that its generic small code model is
already position independent, so we don't have to pass -fpic or -fpie
to the compiler. We only link in PIE mode to get the linker to emit
the dynamic relocation tables into the ELF binary. Relative branches
have a range of +/- 128 MB, which covers the kernel and modules
(unless the option to randomize the module region independently has
been selected, in which case branches between the kernel and modules
may be resolved via PLT entries that are emitted at module load time)

I am not sure how this extrapolates to x86, just adding some context.



Re: [Xen-devel] [RFC PATCH v2 01/22] ARM: vGIC: introduce and initialize pending_irq lock

2017-08-16 Thread Andre Przywara
Hi,

On 10/08/17 16:35, Julien Grall wrote:
> Hi,
> 
> On 21/07/17 20:59, Andre Przywara wrote:
>> Currently we protect the pending_irq structure with the corresponding
>> VGIC VCPU lock. There are problems in certain corner cases (for
>> instance if an IRQ is migrating), so let's introduce a per-IRQ lock,
>> which will protect the consistency of this structure independent from
>> any VCPU.
>> For now this just introduces and initializes the lock, also adds
>> wrapper macros to simplify its usage (and help debugging).
>>
>> Signed-off-by: Andre Przywara 
>> ---
>>  xen/arch/arm/vgic.c|  1 +
>>  xen/include/asm-arm/vgic.h | 11 +++
>>  2 files changed, 12 insertions(+)
>>
>> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
>> index 1e5107b..38dacd3 100644
>> --- a/xen/arch/arm/vgic.c
>> +++ b/xen/arch/arm/vgic.c
>> @@ -69,6 +69,7 @@ void vgic_init_pending_irq(struct pending_irq *p,
>> unsigned int virq)
>>  memset(p, 0, sizeof(*p));
>>  INIT_LIST_HEAD(&p->inflight);
>>  INIT_LIST_HEAD(&p->lr_queue);
>> +spin_lock_init(&p->lock);
>>  p->irq = virq;
>>  p->lpi_vcpu_id = INVALID_VCPU_ID;
>>  }
>> diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
>> index d4ed23d..1c38b9a 100644
>> --- a/xen/include/asm-arm/vgic.h
>> +++ b/xen/include/asm-arm/vgic.h
>> @@ -90,6 +90,14 @@ struct pending_irq
>>   * TODO: when implementing irq migration, taking only the current
>>   * vgic lock is not going to be enough. */
>>  struct list_head lr_queue;
>> +/* The lock protects the consistency of this structure. A single
>> status bit
>> + * can be read and/or set without holding the lock using the atomic
>> + * set_bit/clear_bit/test_bit functions, however accessing
>> multiple bits or
>> + * relating to other members in this struct requires the lock.
>> + * The list_head members are protected by their corresponding
>> VCPU lock,
>> + * it is not sufficient to hold this pending_irq lock here to
>> query or
>> + * change list order or affiliation. */
> 
> Actually, I have one question here. Is the vCPU lock sufficient to
> protect the list_head members, or do you also mandate that the
> pending_irq be locked as well?

For *manipulating* a list (removing or adding a pending_irq) you need to
hold both locks. We need the VCPU lock as the list head in struct vcpu
could change, and we need the per-IRQ lock to prevent a pending_irq to
be inserted into two lists at the same time (and also the list_head
member variables are changed).
However just *checking* whether a certain pending_irq is a member of a
list works with just holding the per-IRQ lock.
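Purely as an illustration of that discipline (a hypothetical sketch,
since the lock order is not written down anywhere yet), using the
wrappers from this patch:

    /* adding or removing: take the VCPU lock first, then the per-IRQ lock */
    spin_lock_irqsave(&v->arch.vgic.lock, flags);
    vgic_irq_lock(p, irq_flags);
    list_del_init(&p->inflight);
    vgic_irq_unlock(p, irq_flags);
    spin_unlock_irqrestore(&v->arch.vgic.lock, flags);

    /* membership check: the per-IRQ lock alone is enough */
    vgic_irq_lock(p, irq_flags);
    in_flight = !list_empty(&p->inflight);
    vgic_irq_unlock(p, irq_flags);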

> Also, it would be good to have the locking order documented maybe in
> docs/misc?

Yes, I agree having a high level VGIC document (focussing on the locking
for the beginning) is a good idea.

Cheers,
Andre.

> 
>> +spinlock_t lock;
>>  };
>>
>>  #define NR_INTERRUPT_PER_RANK   32
>> @@ -156,6 +164,9 @@ struct vgic_ops {
>>  #define vgic_lock(v)   spin_lock_irq(&(v)->domain->arch.vgic.lock)
>>  #define vgic_unlock(v) spin_unlock_irq(&(v)->domain->arch.vgic.lock)
>>
>> +#define vgic_irq_lock(p, flags) spin_lock_irqsave(&(p)->lock, flags)
>> +#define vgic_irq_unlock(p, flags) spin_unlock_irqrestore(&(p)->lock,
>> flags)
>> +
>>  #define vgic_lock_rank(v, r, flags)   spin_lock_irqsave(&(r)->lock,
>> flags)
>>  #define vgic_unlock_rank(v, r, flags)
>> spin_unlock_irqrestore(&(r)->lock, flags)
>>
>>
> 
> Cheers,
> 



Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-16 Thread Daniel Micay
> How are these assumptions hardcoded by GCC? Most of the instructions
> should be 
> relocatable straight away, as most call/jump/branch instructions are
> RIP-relative.
> 
> I.e. is there no GCC code generation mode where code can be placed
> anywhere in the 
> canonical address space, yet call and jump distance is within 31 bits
> so that the 
> generated code is fast?

That's what PIE is meant to do. However, not disabling support for lazy
linking (-fno-plt) / symbol interposition (-Bsymbolic) is going to cause
it to add needless overhead.

arm64 is using -pie -shared -Bsymbolic in arch/arm64/Makefile for their
CONFIG_RELOCATABLE option. See 08cc55b2afd97a654f71b3bebf8bb0ec89fdc498.
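For reference, after that commit the relevant fragment in
arch/arm64/Makefile reads (roughly, quoting from memory):

    ifeq ($(CONFIG_RELOCATABLE), y)
    LDFLAGS_vmlinux += -pie -shared -Bsymbolic
    endif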



Re: [Xen-devel] [PATCH] x86/mm: Reduce debug overhead of __virt_to_maddr()

2017-08-16 Thread Andrew Cooper
On 16/08/17 16:07, Jan Beulich wrote:
 On 16.08.17 at 16:22,  wrote:
>> On 16/08/17 15:14, Andrew Cooper wrote:
>>> On 16/08/17 15:11, Jan Beulich wrote:
>>> On 16.08.17 at 15:58,  wrote:
> --- a/xen/include/asm-x86/x86_64/page.h
> +++ b/xen/include/asm-x86/x86_64/page.h
> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>  
>  static inline unsigned long __virt_to_maddr(unsigned long va)
>  {
> -ASSERT(va >= XEN_VIRT_START);
>  ASSERT(va < DIRECTMAP_VIRT_END);
>  if ( va >= DIRECTMAP_VIRT_START )
>  va -= DIRECTMAP_VIRT_START;
>  else
>  {
> -ASSERT(va < XEN_VIRT_END);
> +BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
> +ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
> +   ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
 Do you really need the casts here? I.e. what's wrong here with
 doing unsigned long arithmetic?
>>> Oh - good point.  This took more than one attempt to get right, and I
>>> first thought I had a sign extension problem.  The actual problem was a
>>> (lack of) + PAGE_SHIFT.
>>>
>>> The other thing to know is that  __virt_to_maddr() is used before the
>>> IDT is set up, so your only signal of something being wrong is a triple
>>> fault.  Let me double check without the casts, but I think it should be
>>> fine.
>> Ok - so it does function when using unsigned arithmetic.
>>
>> However, the generated code is better with signed arithmetic, as
>> ((long)XEN_VIRT_START >> 39) fix in a 32bit sign-extended immediate,
>> whereas XEN_VIRT_START >> 39 needs a movabs.
> Why would that be? Shifting out 39 bits means 25 significant bits
> are left out of the original 64. Or wait - isn't it 30 rather than 39?
> In that case indeed 34 significant bits would remain. In that case
> I'd be fine with the casts left in place, as long as at least the
> commit message (a code comment may be better to keep people
> like me from being tempted to remove the casts as ugly and
> apparently unnecessary) says why.

I'm clearly doing very well at counting today.  I do mean 30 bits (order
18 + page shift of 12).

The generated code is this:

82d0802ff923:   48 89 c2                mov    %rax,%rdx
82d0802ff926:   48 c1 fa 1e             sar    $0x1e,%rdx
82d0802ff92a:   48 81 fa 42 0b fe ff    cmp    $0xfffffffffffe0b42,%rdx

While there are 34 significant bits from this shift, the top 16 of them
are strictly set, meaning there are only 28 usefully significant bits.

FYI, the unsigned case looks like this:

82d0802ffb12:   48 89 c1                mov    %rax,%rcx
82d0802ffb15:   48 c1 e9 1e             shr    $0x1e,%rcx
82d0802ffb19:   48 ba 42 0b fe ff 03    movabs $0x3fffe0b42,%rdx
82d0802ffb20:   00 00 00
82d0802ffb23:   48 39 d1                cmp    %rdx,%rcx
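To spell the arithmetic out: XEN_VIRT_START is 0xffff82d080000000, so
(long)XEN_VIRT_START >> 30 is 0xfffffffffffe0b42, which sign-extends
from 32 bits and hence fits an imm32, while the unsigned shift yields
0x3fffe0b42, which does not and needs the movabs.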

Are you happy with the following comment?

/* Signed arithmetic used so ((long)XEN_VIRT_START >> 30) fits in an imm32. */

~Andrew



Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-16 Thread Christopher Lameter
On Wed, 16 Aug 2017, Ingo Molnar wrote:

> And we'd do this for _EVERY_ function call in the kernel. That kind of crap is
> totally unacceptable.

Ahh finally a limit is in sight as to how much security hardening etc can
reduce kernel performance.




Re: [Xen-devel] [GIT PULL] (xen) stable/for-jens-4.13 for rc5

2017-08-16 Thread Jens Axboe
On 08/15/2017 09:21 PM, Konrad Rzeszutek Wilk wrote:
> Hey Jens,
> 
> Please git pull the following branch:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git 
> stable/for-jens-4.13
> 
> which has two fixes, both of them spotted by Amazon.
> 
>  1) Fix in Xen-blkfront caused by the re-write in 4.8 time-frame.
>  2) Fix in xen_biovec_phys_mergeable, which allowed guest
> requests - when using NVMe - to slurp up more data than allowed,
> leading to an XSA (which has been made public today).

Pulled, thanks Konrad.

-- 
Jens Axboe




[Xen-devel] [xen-4.5-testing test] 112652: regressions - FAIL

2017-08-16 Thread osstest service owner
flight 112652 xen-4.5-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112652/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-vhd   6 xen-install  fail REGR. vs. 110906
 test-amd64-i386-xl-qemuu-winxpsp3 16 guest-localmigrate/x10 fail REGR. vs. 
110906
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-localmigrate/x10 fail REGR. vs. 
110906
 test-armhf-armhf-libvirt-raw 10 debian-di-installfail REGR. vs. 110906

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds  7 xen-boot fail  like 110906
 test-xtf-amd64-amd64-2   57 leak-check/check fail  like 110906
 test-xtf-amd64-amd64-3   57 leak-check/check fail  like 110906
 test-xtf-amd64-amd64-5   57 leak-check/check fail  like 110906
 test-xtf-amd64-amd64-1   57 leak-check/check fail  like 110906
 test-xtf-amd64-amd64-4   57 leak-check/check fail  like 110906
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 110906
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 110906
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-localmigrate/x10 fail like 110906
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 110906
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 110906
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-xtf-amd64-amd64-2   19 xtf/test-hvm32-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-2 32 xtf/test-hvm32pae-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-2 39 xtf/test-hvm32pse-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-2   43 xtf/test-hvm64-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-4   19 xtf/test-hvm32-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-5   19 xtf/test-hvm32-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-1   19 xtf/test-hvm32-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-5 32 xtf/test-hvm32pae-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-1 32 xtf/test-hvm32pae-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-5 39 xtf/test-hvm32pse-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-4 32 xtf/test-hvm32pae-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-1 39 xtf/test-hvm32pse-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-5   43 xtf/test-hvm64-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-1   43 xtf/test-hvm64-cpuid-faulting fail  never pass
 test-xtf-amd64-amd64-4 39 xtf/test-hvm32pse-cpuid-faulting fail never pass
 test-xtf-amd64-amd64-4   43 xtf/test-hvm64-cpuid-faulting fail  never pass
 test-amd64-amd64-xl-pvh-amd  12 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 15 guest-saverestorefail  never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-xtf-amd64-amd64-2   56 xtf/test-hvm64-xsa-195   fail   never pass
 test-xtf-amd64-amd64-3   56 xtf/test-hvm64-xsa-195   fail   never pass
 test-xtf-amd64-amd64-5   56 xtf/test-hvm64-xsa-195   fail   never pass
 test-xtf-amd64-amd64-1   56 xtf/test-hvm64-xsa-195   fail   never pass
 test-xtf-amd64-amd64-4   56 xtf/test-hvm64-xsa-195   fail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 13 guest-saverestore  fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 13 guest-saverestore  fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore   fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore   fail never pass
 

Re: [Xen-devel] [PATCH v2 4/4] xen/x86: Correct mandatory and SMP barrier definitions

2017-08-16 Thread Dario Faggioli
On Wed, 2017-08-16 at 12:22 +0100, Andrew Cooper wrote:
> Barriers are a complicated topic, a source of confusion, and their
> incorrect use is a common cause of bugs.  It *really* doesn't help
> when Xen's API is the same as Linux, but its ABI different.
> 
> Bring the two back in line, so programmers stand a chance of actually
> getting their usage correct.
> 
> Drop the links in the comment, both of which are now stale.  Instead,
> refer to the vendor system manuals.
> 
Does it perhaps make sense to link this:

https://www.kernel.org/doc/Documentation/memory-barriers.txt

> No functional change.
> 
IAC, FWIW:

> Signed-off-by: Andrew Cooper 
>
Reviewed-by: Dario Faggioli 

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers

2017-08-16 Thread Dario Faggioli
On Wed, 2017-08-16 at 12:22 +0100, Andrew Cooper wrote:
> There is no functional change.  Xen currently assigns smp_* meaning
> to the non-smp_* barriers.
> 
> All of these uses are just to deal with shared memory between multiple
> processors, so use the smp_*() which are the correct barriers for the
> purpose.
> 
FWIW, I had to deal with barriers recently, and this is much
appreciated! :-)

> Signed-off-by: Andrew Cooper 
>
Reviewed-by: Dario Faggioli 

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
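
[Aside, for readers unfamiliar with the API/ABI distinction the patch
description draws: mandatory barriers must also order device/MMIO
accesses, while smp_* barriers only order accesses as observed by other
CPUs. A rough x86 illustration follows -- these are not the actual Xen
definitions, just the commonly used x86 mappings:]

#define barrier()  asm volatile("" ::: "memory")

/* Mandatory barriers: must order non-coherent/device accesses too, so
 * they need real fence instructions. */
#define mb()       asm volatile("mfence" ::: "memory")
#define rmb()      asm volatile("lfence" ::: "memory")
#define wmb()      asm volatile("sfence" ::: "memory")

/* SMP barriers: only order accesses between CPUs.  On x86, ordinary
 * loads and stores are already strongly ordered, so a compiler barrier
 * suffices for the read and write variants. */
#define smp_mb()   mb()
#define smp_rmb()  barrier()
#define smp_wmb()  barrier()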



Re: [Xen-devel] [PATCH v2 2/4] xen/x86: Drop unnecessary barriers

2017-08-16 Thread Jan Beulich
>>> On 16.08.17 at 13:22,  wrote:
> x86's current implementation of wmb() is a compiler barrier.  As a result, the
> only change in this patch is to remove an mfence instruction from
> cpuidle_disable_deep_cstate().
> 
> None of these barriers serve any purpose.  Most aren't synchronising
> with any remote cpus, whereas the mcetelem barriers are redundant with
> spin_unlock(), which already has full read/write barrier semantics.
> 
> Signed-off-by: Andrew Cooper 

For the relevant parts
Acked-by: Jan Beulich 
For the parts the ack doesn't extend to, however:

> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
>   * will make sure no inconsistent mapping being translated into
>   * shadow page table. */
>  version = atomic_read(&v->arch.paging.shadow.gtable_dirty_version);
> -rmb();
>  walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);

Isn't this supposed to make sure version is being read first? I.e.
doesn't this at least need to be barrier()?

> index a459e99..d5b6049 100644
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -558,7 +558,6 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
>  return;
>  }
>  udelay(1);
> -rmb();
>  code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
>IOMMU_EVENT_CODE_SHIFT);
>  }
> @@ -663,7 +662,6 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
>  return;
>  }
>  udelay(1);
> -rmb();
>  code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
>IOMMU_PPR_LOG_CODE_SHIFT);
>  }

With these fully removed, what keeps the compiler from moving
the entry[1] reads out of the loop? Implementation details of
udelay() don't count...

Jan
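
[To make the hoisting concern concrete, a minimal sketch -- a
hypothetical polling loop, not the IOMMU code itself:]

#include <stdint.h>

#define barrier() asm volatile("" ::: "memory")

static uint32_t wait_for_code(uint32_t *entry)
{
    uint32_t code;

    do {
        /* Without this compiler barrier (or a volatile/ACCESS_ONCE
         * read), nothing tells the compiler that entry[1] may change
         * behind its back, so the load can legally be hoisted out of
         * the loop, which would then spin forever on a stale zero. */
        barrier();
        code = entry[1] & 0xf;
    } while ( code == 0 );

    return code;
}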




Re: [Xen-devel] x86: PIE support and option to extend KASLR randomization

2017-08-16 Thread Ingo Molnar

* Thomas Garnier  wrote:

> On Tue, Aug 15, 2017 at 12:56 AM, Ingo Molnar  wrote:
> >
> > * Thomas Garnier  wrote:
> >
> >> > Do these changes get us closer to being able to build the kernel
> >> > as truly position independent, i.e. to place it anywhere in the
> >> > valid x86-64 address space? Or any other advantages?
> >>
> >> Yes, PIE allows us to put the kernel anywhere in memory. It will
> >> allow us to have a full randomized address space where position and
> >> order of sections are completely random. There is still some work to
> >> get there but being able to build a PIE kernel is a significant step.
> >
> > So I _really_ dislike the whole PIE approach, because of the huge slowdown:
> >
> > +config RANDOMIZE_BASE_LARGE
> > +   bool "Increase the randomization range of the kernel image"
> > +   depends on X86_64 && RANDOMIZE_BASE
> > +   select X86_PIE
> > +   select X86_MODULE_PLTS if MODULES
> > +   default n
> > +   ---help---
> > + Build the kernel as a Position Independent Executable (PIE) and
> > + increase the available randomization range from 1GB to 3GB.
> > +
> > + This option impacts performance on kernel CPU intensive workloads
> > + up to 10% due to PIE generated code. Impact on user-mode processes
> > + and typical usage would be significantly less (0.50% when you build
> > + the kernel).
> > +
> > + The kernel and modules will generate slightly more assembly (1 to
> > + 2% increase on the .text sections). The vmlinux binary will be
> > + significantly smaller due to less relocations.
> >
> > To put 10% kernel overhead into perspective: enabling this option
> > wipes out about 5-10 years worth of painstaking optimizations we've
> > done to keep the kernel fast ... (!!)
> 
> Note that 10% is the high-bound of a CPU intensive workload.

Note that the 8-10% hackbench or even a 2%-4% range would be 'huge' in
terms of modern kernel performance. In many cases we are literally
applying cycle level optimizations that are barely measurable. A 0.1%
speedup in linear execution speed is already a big success.

> I am going to start doing performance testing on -mcmodel=large to see
> if it is faster than -fPIE.

Unfortunately mcmodel=large looks pretty heavy too AFAICS, at the machine 
instruction level.

Function calls look like this:

 -mcmodel=medium:

   757:   e8 98 ff ff ff  callq  6f4 

 -mcmodel=large

   77b:   48 b8 10 f7 df ff ff    movabs $0xffffffffffdff710,%rax
   782:   ff ff ff
   785:   48 8d 04 03             lea    (%rbx,%rax,1),%rax
   789:   ff d0                   callq  *%rax

And we'd do this for _EVERY_ function call in the kernel. That kind of crap is 
totally unacceptable.

> > I think the fundamental flaw is the assumption that we need a PIE
> > executable to have a freely relocatable kernel on 64-bit CPUs.
> >
> > Have you considered a kernel with -mcmodel=small (or medium) instead
> > of -fpie -mcmodel=large? We can pick a random 2GB window in the
> > (non-kernel) canonical x86-64 address space to randomize the location
> > of kernel text. The location of modules can be further randomized
> > within that 2GB window.
> 
> -mcmodel=small/medium assume you are on the low 32-bit. It generates
> instructions where the virtual addresses have the high 32-bit to be
> zero.

How are these assumptions hardcoded by GCC? Most of the instructions should be 
relocatable straight away, as most call/jump/branch instructions are 
RIP-relative.

I.e. is there no GCC code generation mode where code can be placed
anywhere in the canonical address space, yet call and jump distance is
within 31 bits so that the generated code is fast?

Thanks,

Ingo
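
[One way to probe this empirically is to compare the call-site code GCC
emits under each model with a two-line test file -- an illustration
only, independent of the kernel build:]

/* probe.c -- compare the generated call sites with:
 *   gcc -O2 -mcmodel=medium -S probe.c   (rel32 direct call)
 *   gcc -O2 -mcmodel=large  -S probe.c   (movabs + indirect call)
 *   gcc -O2 -fpie -S probe.c             (position-independent variant)
 */
extern void callee(void);

void caller(void)
{
    callee();
}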



Re: [Xen-devel] [PATCH v4] xen: get rid of paravirt op adjust_exception_frame

2017-08-16 Thread Boris Ostrovsky
On 08/11/2017 10:54 AM, Juergen Gross wrote:
> When running as Xen pv-guest the exception frame on the stack contains
> %r11 and %rcx additional to the other data pushed by the processor.
>
> Instead of having a paravirt op being called for each exception type
> prepend the Xen specific code to each exception entry. When running as
> Xen pv-guest just use the exception entry with prepended instructions,
> otherwise use the entry without the Xen specific code.
>
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/entry/entry_64.S | 23 ++---
>  arch/x86/entry/entry_64_compat.S  |  1 -
>  arch/x86/include/asm/paravirt.h   |  5 --
>  arch/x86/include/asm/paravirt_types.h |  4 --
>  arch/x86/include/asm/proto.h  |  3 ++
>  arch/x86/include/asm/traps.h  | 33 ++--
>  arch/x86/kernel/asm-offsets_64.c  |  1 -
>  arch/x86/kernel/paravirt.c|  3 --
>  arch/x86/xen/enlighten_pv.c   | 96 
> +++
>  arch/x86/xen/irq.c|  3 --
>  arch/x86/xen/xen-asm_64.S | 45 ++--
>  arch/x86/xen/xen-ops.h|  1 -
>  12 files changed, 140 insertions(+), 78 deletions(-)

Reviewed-by: Boris Ostrovsky 

Applied to for-linus-4.14.

-boris




Re: [Xen-devel] [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal

2017-08-16 Thread Jan Beulich
  >>> On 16.08.17 at 13:22,  wrote:
> * Drop trailing whitespace.
> * Move amd_nonfatal_mcheck_init() into .init.text and drop a trailing
>   return.
> * Drop unnecessary wmb()'s.  Because of Xen's implementation, they are
>   only compiler barriers anyway, and each wrmsr() is already fully
>   serialising.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 




Re: [Xen-devel] [PATCH 0/3] xen: do some cleanups

2017-08-16 Thread Boris Ostrovsky
On 08/04/2017 07:36 AM, Juergen Gross wrote:
> Remove stuff no longer needed.
>
> Juergen Gross (3):
>   xen: remove tests for pvh mode in pure pv paths
>   xen: remove unused function xen_set_domain_pte()
>   xen: remove not used trace functions
>
>  arch/x86/include/asm/xen/page.h |  5 -
>  arch/x86/xen/mmu_pv.c   | 20 
>  arch/x86/xen/p2m.c  | 25 +
>  arch/x86/xen/setup.c|  5 +
>  include/trace/events/xen.h  | 38 --
>  5 files changed, 2 insertions(+), 91 deletions(-)
>

Applied to for-linus-4.14.

-boris



Re: [Xen-devel] [PATCH] x86/mm: Reduce debug overhead of __virt_to_maddr()

2017-08-16 Thread Jan Beulich
>>> On 16.08.17 at 16:22,  wrote:
> On 16/08/17 15:14, Andrew Cooper wrote:
>> On 16/08/17 15:11, Jan Beulich wrote:
>> On 16.08.17 at 15:58,  wrote:
 --- a/xen/include/asm-x86/x86_64/page.h
 +++ b/xen/include/asm-x86/x86_64/page.h
 @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
  
  static inline unsigned long __virt_to_maddr(unsigned long va)
  {
 -ASSERT(va >= XEN_VIRT_START);
  ASSERT(va < DIRECTMAP_VIRT_END);
  if ( va >= DIRECTMAP_VIRT_START )
  va -= DIRECTMAP_VIRT_START;
  else
  {
 -ASSERT(va < XEN_VIRT_END);
 +BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
 +ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
 +   ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
>>> Do you really need the casts here? I.e. what's wrong here with
>>> doing unsigned long arithmetic?
>> Oh - good point.  This took more than one attempt to get right, and I
>> first thought I had a sign extension problem.  The actual problem was a
>> (lack of) + PAGE_SHIFT.
>>
>> The other thing to know is that  __virt_to_maddr() is used before the
>> IDT is set up, so your only signal of something being wrong is a triple
>> fault.  Let me double check without the casts, but I think it should be
>> fine.
> 
> Ok - so it does function when using unsigned arithmetic.
> 
> However, the generated code is better with signed arithmetic, as
> ((long)XEN_VIRT_START >> 39) fits in a 32bit sign-extended immediate,
> whereas XEN_VIRT_START >> 39 needs a movabs.

Why would that be? Shifting out 39 bits means 25 significant bits
are left out of the original 64. Or wait - isn't it 30 rather than 39?
In that case indeed 34 significant bits would remain. In that case
I'd be fine with the casts left in place, as long as at least the
commit message (a code comment may be better to keep people
like me from being tempted to remove the casts as ugly and
apparently unnecessary) says why.

Jan




Re: [Xen-devel] [PATCH] xen-platform: constify pci_device_id.

2017-08-16 Thread Boris Ostrovsky
On 08/02/2017 06:36 PM, Boris Ostrovsky wrote:
> On 08/02/2017 01:46 PM, Arvind Yadav wrote:
>> pci_device_id are not supposed to change at runtime. All functions
>> working with pci_device_id provided by  work with
>> const pci_device_id. So mark the non-const structs as const.
>>
>> Signed-off-by: Arvind Yadav 
> Reviewed-by: Boris Ostrovsky 


Applied to for-linus-4.14.

-boris



[Xen-devel] [xen-unstable-smoke test] 112666: tolerable trouble: broken/pass - PUSHED

2017-08-16 Thread osstest service owner
flight 112666 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112666/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-pvops 2 hosts-allocate  broken like 112665
 build-arm64   2 hosts-allocate  broken like 112665
 build-arm64-pvops 3 capture-logsbroken like 112665
 build-arm64   3 capture-logsbroken like 112665
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  79d5dd06a677fcc8c5a585d95b32c35bd38bc34e
baseline version:
 xen  4befb4ed85cf5f6784f8c0aaf1d2dba1dbd26ac0

Last test of basis   112665  2017-08-16 10:48:42 Z0 days
Testing same since   112666  2017-08-16 13:03:12 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 

jobs:
 build-amd64  pass
 build-arm64  broken  
 build-armhf  pass
 build-amd64-libvirt  pass
 build-arm64-pvopsbroken  
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  broken  
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-step build-arm64-pvops hosts-allocate
broken-step build-arm64 hosts-allocate
broken-step build-arm64-pvops capture-logs
broken-step build-arm64 capture-logs

Pushing revision :

+ branch=xen-unstable-smoke
+ revision=79d5dd06a677fcc8c5a585d95b32c35bd38bc34e
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x '!=' x/home/osstest/repos/lock ']'
++ OSSTEST_REPOS_LOCK_LOCKED=/home/osstest/repos/lock
++ exec with-lock-ex -w /home/osstest/repos/lock ./ap-push xen-unstable-smoke 
79d5dd06a677fcc8c5a585d95b32c35bd38bc34e
+ branch=xen-unstable-smoke
+ revision=79d5dd06a677fcc8c5a585d95b32c35bd38bc34e
+ . ./cri-lock-repos
++ . ./cri-common
+++ . ./cri-getconfig
+++ umask 002
+++ getrepos
 getconfig Repos
 perl -e '
use Osstest;
readglobalconfig();
print $c{"Repos"} or die $!;
'
+++ local repos=/home/osstest/repos
+++ '[' -z /home/osstest/repos ']'
+++ '[' '!' -d /home/osstest/repos ']'
+++ echo /home/osstest/repos
++ repos=/home/osstest/repos
++ repos_lock=/home/osstest/repos/lock
++ '[' x/home/osstest/repos/lock '!=' x/home/osstest/repos/lock ']'
+ . ./cri-common
++ . ./cri-getconfig
++ umask 002
+ select_xenbranch
+ case "$branch" in
+ tree=xen
+ xenbranch=xen-unstable-smoke
+ qemuubranch=qemu-upstream-unstable
+ '[' xxen = xlinux ']'
+ linuxbranch=
+ '[' xqemu-upstream-unstable = x ']'
+ select_prevxenbranch
++ ./cri-getprevxenbranch xen-unstable-smoke
+ prevxenbranch=xen-4.9-testing
+ '[' x79d5dd06a677fcc8c5a585d95b32c35bd38bc34e = x ']'
+ : tested/2.6.39.x
+ . ./ap-common
++ : osst...@xenbits.xen.org
+++ getconfig OsstestUpstream
+++ perl -e '
use Osstest;
readglobalconfig();
print $c{"OsstestUpstream"} or die $!;
'
++ :
++ : git://xenbits.xen.org/xen.git
++ : osst...@xenbits.xen.org:/home/xen/git/xen.git
++ : git://xenbits.xen.org/qemu-xen-traditional.git
++ : git://git.kernel.org
++ : git://git.kernel.org/pub/scm/linux/kernel/git
++ : git
++ : git://xenbits.xen.org/xtf.git
++ : osst...@xenbits.xen.org:/home/xen/git/xtf.git
++ : 

Re: [Xen-devel] [PATCH] xen: cleanup xen.h

2017-08-16 Thread Boris Ostrovsky
On 07/27/2017 11:44 AM, Juergen Gross wrote:
> On 27/07/17 17:37, Boris Ostrovsky wrote:
>> On 07/27/2017 11:11 AM, Juergen Gross wrote:
>>> The macros for testing domain types are more complicated than they
>>> need to be. Simplify them.
>>>
>>> Signed-off-by: Juergen Gross 
>>> ---
>>>  include/xen/xen.h | 20 +---
>>>  1 file changed, 9 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/include/xen/xen.h b/include/xen/xen.h
>>> index 6e8b7fc79801..28c59ca529d7 100644
>>> --- a/include/xen/xen.h
>>> +++ b/include/xen/xen.h
>>> @@ -13,11 +13,16 @@ extern enum xen_domain_type xen_domain_type;
>>>  #define xen_domain_type    XEN_NATIVE
>>>  #endif
>>>  
>>> +#ifdef CONFIG_XEN_PVH
>>> +extern bool xen_pvh;
>>> +#else
>>> +#define xen_pvh    0
>>> +#endif
>>> +
>>>  #define xen_domain()       (xen_domain_type != XEN_NATIVE)
>>> -#define xen_pv_domain()    (xen_domain() &&                 \
>>> -                            xen_domain_type == XEN_PV_DOMAIN)
>>> -#define xen_hvm_domain()   (xen_domain() &&                 \
>>> -                            xen_domain_type == XEN_HVM_DOMAIN)
>>> +#define xen_pv_domain()    (xen_domain_type == XEN_PV_DOMAIN)
>> Stray tab.
> No. This is just due to the '+' of the patch.


Applied to for-linus-4.14

-boris



Re: [Xen-devel] [PATCH v3] xen: Implement hypercall for tracing of program counters

2017-08-16 Thread Wei Liu
On Fri, Aug 11, 2017 at 05:25:34PM +0200, Felix Schmoll wrote:
> This commit makes the changes to the hypervisor, the build system as
> well as libxc necessary in order to facilitate tracing of program counters.
> 
> A discussion of the design can be found in the mailing list:
> https://lists.xen.org/archives/html/xen-devel/2017-05/threads.html#02210
> 
> The list of files to be included for tracing might still be too extensive,
> resulting in indeterministic tracing output for some use cases.
> 
> Signed-off-by: Felix Schmoll 

There are some styling issues in code. I have queued this patch up to
one of my branches and will fix those up.

It will be properly upstreamed once I or someone else gets around to
making the build system up to the task.



Re: [Xen-devel] [RFC PATCH v2 21/22] ARM: vITS: injecting LPIs: use pending_irq lock

2017-08-16 Thread Julien Grall

Hi Andre,

On 21/07/17 21:00, Andre Przywara wrote:

Instead of using an atomic access and hoping for the best, let's use
the new pending_irq lock now to make sure we read a sane version of
the target VCPU.


How is this going to bring a saner version?

You only read the vCPU, and nothing prevents it from changing between
the time you get it and the time you lock it in vgic_vcpu_inject_irq.


Cheers,


That still doesn't solve the problem mentioned in the comment, but
paves the way for future improvements.

Signed-off-by: Andre Przywara 
---
 xen/arch/arm/gic-v3-lpi.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index 2306b58..9db26ed 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -140,20 +140,22 @@ void vgic_vcpu_inject_lpi(struct domain *d, unsigned int virq)
 {
 /*
  * TODO: this assumes that the struct pending_irq stays valid all of
- * the time. We cannot properly protect this with the current locking
- * scheme, but the future per-IRQ lock will solve this problem.
+ * the time. We cannot properly protect this with the current code,
+ * but a future refcounting will solve this problem.
  */
 struct pending_irq *p = irq_to_pending(d->vcpu[0], virq);
+unsigned long flags;
 unsigned int vcpu_id;

 if ( !p )
 return;

-vcpu_id = ACCESS_ONCE(p->vcpu_id);
-if ( vcpu_id >= d->max_vcpus )
-  return;
+vgic_irq_lock(p, flags);
+vcpu_id = p->vcpu_id;
+vgic_irq_unlock(p, flags);

-vgic_vcpu_inject_irq(d->vcpu[vcpu_id], virq);
+if ( vcpu_id < d->max_vcpus )
+vgic_vcpu_inject_irq(d->vcpu[vcpu_id], virq);
 }

 /*



--
Julien Grall
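
[A standalone model of the pattern Julien is questioning, with generic
names rather than the Xen ones (compile with -pthread): the lock makes
the read consistent, but the value may be stale as soon as the lock is
dropped.]

#include <pthread.h>
#include <stdio.h>

struct pending {
    pthread_mutex_t lock;
    int vcpu_id;
};

static int read_target(struct pending *p)
{
    pthread_mutex_lock(&p->lock);
    int id = p->vcpu_id;           /* consistent snapshot under the lock */
    pthread_mutex_unlock(&p->lock);
    return id;                     /* may already be stale from here on */
}

int main(void)
{
    struct pending p = { PTHREAD_MUTEX_INITIALIZER, 0 };

    /* Anything that acts on this value (e.g. an injection) still needs
     * to revalidate it under a lock, or hold a reference. */
    printf("target at time of read: %d\n", read_target(&p));
    return 0;
}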



Re: [Xen-devel] [RFC PATCH v2 20/22] ARM: vGIC: move virtual IRQ enable bit from rank to pending_irq

2017-08-16 Thread Julien Grall

Hi Andre,

On 21/07/17 21:00, Andre Przywara wrote:

The enabled bits for a group of IRQs are still stored in the irq_rank
structure, although we already have the same information in pending_irq,
in the GIC_IRQ_GUEST_ENABLED bit of the "status" field.
Remove the storage from the irq_rank and just utilize the existing
wrappers to cover enabling/disabling of multiple IRQs.
This also marks the removal of the last member of struct vgic_irq_rank.

Signed-off-by: Andre Przywara 
---
 xen/arch/arm/vgic-v2.c |  41 +++--
 xen/arch/arm/vgic-v3.c |  41 +++--
 xen/arch/arm/vgic.c| 201 +++--
 xen/include/asm-arm/vgic.h |  10 +--
 4 files changed, 152 insertions(+), 141 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index c7ed3ce..3320642 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -166,9 +166,7 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
register_t *r, void *priv)
 {
 struct hsr_dabt dabt = info->dabt;
-struct vgic_irq_rank *rank;
 int gicd_reg = (int)(info->gpa - v->domain->arch.vgic.dbase);
-unsigned long flags;
 unsigned int irq;

 perfc_incr(vgicd_reads);
@@ -222,20 +220,16 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info,

 case VRANGE32(GICD_ISENABLER, GICD_ISENABLERN):
 if ( dabt.size != DABT_WORD ) goto bad_width;
-rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ISENABLER, DABT_WORD);
-if ( rank == NULL) goto read_as_zero;
-vgic_lock_rank(v, rank, flags);
-*r = vreg_reg32_extract(rank->ienable, info);
-vgic_unlock_rank(v, rank, flags);
+irq = (gicd_reg - GICD_ISENABLER) * 8;
+if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto read_as_zero;
+*r = vgic_fetch_irq_enabled(v, irq);
 return 1;

 case VRANGE32(GICD_ICENABLER, GICD_ICENABLERN):
 if ( dabt.size != DABT_WORD ) goto bad_width;
-rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ICENABLER, DABT_WORD);
-if ( rank == NULL) goto read_as_zero;
-vgic_lock_rank(v, rank, flags);
-*r = vreg_reg32_extract(rank->ienable, info);
-vgic_unlock_rank(v, rank, flags);
+irq = (gicd_reg - GICD_ICENABLER) * 8;
+if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto read_as_zero;
+*r = vgic_fetch_irq_enabled(v, irq);
 return 1;

 /* Read the pending status of an IRQ via GICD is not supported */
@@ -386,10 +380,7 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
 register_t r, void *priv)
 {
 struct hsr_dabt dabt = info->dabt;
-struct vgic_irq_rank *rank;
 int gicd_reg = (int)(info->gpa - v->domain->arch.vgic.dbase);
-uint32_t tr;
-unsigned long flags;
 unsigned int irq;

 perfc_incr(vgicd_writes);
@@ -426,24 +417,16 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,

 case VRANGE32(GICD_ISENABLER, GICD_ISENABLERN):
 if ( dabt.size != DABT_WORD ) goto bad_width;
-rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ISENABLER, DABT_WORD);
-if ( rank == NULL) goto write_ignore;
-vgic_lock_rank(v, rank, flags);
-tr = rank->ienable;
-vreg_reg32_setbits(&rank->ienable, r, info);
-vgic_enable_irqs(v, (rank->ienable) & (~tr), rank->index);
-vgic_unlock_rank(v, rank, flags);
+irq = (gicd_reg - GICD_ISENABLER) * 8;
+if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto write_ignore;
+vgic_store_irq_enable(v, irq, r);
 return 1;

 case VRANGE32(GICD_ICENABLER, GICD_ICENABLERN):
 if ( dabt.size != DABT_WORD ) goto bad_width;
-rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ICENABLER, DABT_WORD);
-if ( rank == NULL) goto write_ignore;
-vgic_lock_rank(v, rank, flags);
-tr = rank->ienable;
-vreg_reg32_clearbits(&rank->ienable, r, info);
-vgic_disable_irqs(v, (~rank->ienable) & tr, rank->index);
-vgic_unlock_rank(v, rank, flags);
+irq = (gicd_reg - GICD_ICENABLER) * 8;
+if ( irq >= v->domain->arch.vgic.nr_spis + 32 ) goto write_ignore;
+vgic_store_irq_disable(v, irq, r);
 return 1;

 case VRANGE32(GICD_ISPENDR, GICD_ISPENDRN):
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index e9d46af..00cc1e5 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -676,8 +676,6 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
 register_t *r)
 {
 struct hsr_dabt dabt = info->dabt;
-struct vgic_irq_rank *rank;
-unsigned long flags;
 unsigned int irq;

 switch ( reg )
@@ -689,20 +687,16 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,

 case 

Re: [Xen-devel] [PATCH v3] passthrough: give XEN_DOMCTL_test_assign_device more sane semantics

2017-08-16 Thread Daniel De Graaf

On 08/16/2017 08:20 AM, Jan Beulich wrote:

So far callers of the libxc interface passed in a domain ID which was
then ignored in the hypervisor. Instead, make the hypervisor honor it
(accepting DOMID_INVALID to obtain original behavior), allowing to
query whether a device can be assigned to a particular domain.

Drop XSM's test_assign_{,dt}device hooks as no longer being
individually useful.

Signed-off-by: Jan Beulich 


Acked-by: Daniel De Graaf 



Re: [Xen-devel] [PATCH] x86/mm: Reduce debug overhead of __virt_to_maddr()

2017-08-16 Thread Andrew Cooper
On 16/08/17 15:14, Andrew Cooper wrote:
> On 16/08/17 15:11, Jan Beulich wrote:
> On 16.08.17 at 15:58,  wrote:
>>> --- a/xen/include/asm-x86/x86_64/page.h
>>> +++ b/xen/include/asm-x86/x86_64/page.h
>>> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>>>  
>>>  static inline unsigned long __virt_to_maddr(unsigned long va)
>>>  {
>>> -ASSERT(va >= XEN_VIRT_START);
>>>  ASSERT(va < DIRECTMAP_VIRT_END);
>>>  if ( va >= DIRECTMAP_VIRT_START )
>>>  va -= DIRECTMAP_VIRT_START;
>>>  else
>>>  {
>>> -ASSERT(va < XEN_VIRT_END);
>>> +BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
>>> +ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
>>> +   ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
>> Do you really need the casts here? I.e. what's wrong here with
>> doing unsigned long arithmetic?
> Oh - good point.  This took more than one attempt to get right, and I
> first thought I had a sign extension problem.  The actual problem was a
> (lack of) + PAGE_SHIFT.
>
> The other thing to know is that  __virt_to_maddr() is used before the
> IDT is set up, so your only signal of something being wrong is a triple
> fault.  Let me double check without the casts, but I think it should be
> fine.

Ok - so it does function when using unsigned arithmetic.

However, the generated code is better with signed arithmetic, as
((long)XEN_VIRT_START >> 39) fits in a 32bit sign-extended immediate,
whereas XEN_VIRT_START >> 39 needs a movabs.

On the whole, I'd prefer to keep patch as is.

~Andrew



Re: [Xen-devel] [PATCH] x86/mm: Reduce debug overhead of __virt_to_maddr()

2017-08-16 Thread Jan Beulich
>>> On 16.08.17 at 16:14,  wrote:
> On 16/08/17 15:11, Jan Beulich wrote:
> On 16.08.17 at 15:58,  wrote:
>>> --- a/xen/include/asm-x86/x86_64/page.h
>>> +++ b/xen/include/asm-x86/x86_64/page.h
>>> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>>>  
>>>  static inline unsigned long __virt_to_maddr(unsigned long va)
>>>  {
>>> -ASSERT(va >= XEN_VIRT_START);
>>>  ASSERT(va < DIRECTMAP_VIRT_END);
>>>  if ( va >= DIRECTMAP_VIRT_START )
>>>  va -= DIRECTMAP_VIRT_START;
>>>  else
>>>  {
>>> -ASSERT(va < XEN_VIRT_END);
>>> +BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
>>> +ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
>>> +   ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
>> Do you really need the casts here? I.e. what's wrong here with
>> doing unsigned long arithmetic?
> 
> Oh - good point.  This took more than one attempt to get right, and I
> first thought I had a sign extension problem.  The actual problem was a
> (lack of) + PAGE_SHIFT.
> 
> The other thing to know is that  __virt_to_maddr() is used before the
> IDT is set up, so your only signal of something being wrong is a triple
> fault.  Let me double check without the casts, but I think it should be
> fine.

If it is, then with the casts dropped
Reviewed-by: Jan Beulich 

Jan




Re: [Xen-devel] [PATCH] x86/mm: Reduce debug overhead of __virt_to_maddr()

2017-08-16 Thread Andrew Cooper
On 16/08/17 15:11, Jan Beulich wrote:
 On 16.08.17 at 15:58,  wrote:
>> --- a/xen/include/asm-x86/x86_64/page.h
>> +++ b/xen/include/asm-x86/x86_64/page.h
>> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>>  
>>  static inline unsigned long __virt_to_maddr(unsigned long va)
>>  {
>> -ASSERT(va >= XEN_VIRT_START);
>>  ASSERT(va < DIRECTMAP_VIRT_END);
>>  if ( va >= DIRECTMAP_VIRT_START )
>>  va -= DIRECTMAP_VIRT_START;
>>  else
>>  {
>> -ASSERT(va < XEN_VIRT_END);
>> +BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
>> +ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
>> +   ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
> Do you really need the casts here? I.e. what's wrong here with
> doing unsigned long arithmetic?

Oh - good point.  This took more than one attempt to get right, and I
first thought I had a sign extension problem.  The actual problem was a
(lack of) + PAGE_SHIFT.

The other thing to know is that  __virt_to_maddr() is used before the
IDT is set up, so your only signal of something being wrong is a triple
fault.  Let me double check without the casts, but I think it should be
fine.

~Andrew



Re: [Xen-devel] [PATCH] x86/mm: Reduce debug overhead of __virt_to_maddr()

2017-08-16 Thread Jan Beulich
>>> On 16.08.17 at 15:58,  wrote:
> --- a/xen/include/asm-x86/x86_64/page.h
> +++ b/xen/include/asm-x86/x86_64/page.h
> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>  
>  static inline unsigned long __virt_to_maddr(unsigned long va)
>  {
> -ASSERT(va >= XEN_VIRT_START);
>  ASSERT(va < DIRECTMAP_VIRT_END);
>  if ( va >= DIRECTMAP_VIRT_START )
>  va -= DIRECTMAP_VIRT_START;
>  else
>  {
> -ASSERT(va < XEN_VIRT_END);
> +BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
> +ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
> +   ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));

Do you really need the casts here? I.e. what's wrong here with
doing unsigned long arithmetic?

Jan




Re: [Xen-devel] [PATCH v3 08/52] xen/arch/x86/genapic/probe.c: let custom parameter parsing routines return errno

2017-08-16 Thread Jan Beulich
>>> On 16.08.17 at 14:51,  wrote:
> Modify the custom parameter parsing routines in:
> 
> xen/arch/x86/genapic/probe.c
> 
> to indicate whether the parameter value was parsed successfully.
> 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 





[Xen-devel] [PATCH] x86/mm: Reduce debug overhead of __virt_to_maddr()

2017-08-16 Thread Andrew Cooper
__virt_to_maddr() is used very frequently, but has a large footprint due to
its assertions and comparisons.

Rearrange its logic to drop one assertion entirely, encoding its check in a
second assertion (with no additional branch, and the comparison performed with
a 32bit immediate rather than requiring a movabs).

Bloat-o-meter net report is:
  add/remove: 0/0 grow/shrink: 1/72 up/down: 3/-2169 (-2166)

along with a reduction of 34 assertion frames (895 down to 861)

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
---
 xen/include/asm-x86/x86_64/page.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
index 947e52b..bd30f25 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
 
 static inline unsigned long __virt_to_maddr(unsigned long va)
 {
-ASSERT(va >= XEN_VIRT_START);
 ASSERT(va < DIRECTMAP_VIRT_END);
 if ( va >= DIRECTMAP_VIRT_START )
 va -= DIRECTMAP_VIRT_START;
 else
 {
-ASSERT(va < XEN_VIRT_END);
+BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
+ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
+   ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
+
 va += xen_phys_start - XEN_VIRT_START;
 }
 return (va & ma_va_bottom_mask) |
-- 
2.1.4




Re: [Xen-devel] [PATCH v3 07/52] xen/arch/x86/dom0_build.c: let custom parameter parsing routines return errno

2017-08-16 Thread Jan Beulich
>>> On 16.08.17 at 14:51,  wrote:
> Modify the custom parameter parsing routines in:
> 
> xen/arch/x86/dom0_build.c
> 
> to indicate whether the parameter value was parsed successfully.
> 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 





Re: [Xen-devel] [PATCH] x86/mm: Drop more PV superpage leftovers

2017-08-16 Thread Jan Beulich
>>> On 16.08.17 at 14:27,  wrote:
> Signed-off-by: Andrew Cooper 

Acked-by: Jan Beulich 





Re: [Xen-devel] [PATCH] x86/mm: Drop __PAGE_OFFSET

2017-08-16 Thread Jan Beulich
>>> On 16.08.17 at 14:49,  wrote:
> It is a vestigial leftover of Xen having inherited Linux's memory management
> code in the early days.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Jan Beulich 





Re: [Xen-devel] [RFC PATCH v2 18/22] ARM: vGIC: move virtual IRQ target VCPU from rank to pending_irq

2017-08-16 Thread Julien Grall

Hi Andre,

On 21/07/17 21:00, Andre Przywara wrote:

The VCPU a shared virtual IRQ is targeting is currently stored in the
irq_rank structure.
For LPIs we already store the target VCPU in struct pending_irq, so
move SPIs over as well.
The ITS code, which was using this field already, was so far using the
VCPU lock to protect the pending_irq, so move this over to the new lock.

Signed-off-by: Andre Przywara 
---
 xen/arch/arm/vgic-v2.c | 56 +++
 xen/arch/arm/vgic-v3-its.c |  9 +++---
 xen/arch/arm/vgic-v3.c | 69 ---
 xen/arch/arm/vgic.c| 73 +-
 xen/include/asm-arm/vgic.h | 13 +++--
 5 files changed, 96 insertions(+), 124 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 0c8a598..c7ed3ce 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -66,19 +66,22 @@ void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
  *
  * Note the byte offset will be aligned to an ITARGETSR boundary.
  */
-static uint32_t vgic_fetch_itargetsr(struct vgic_irq_rank *rank,
- unsigned int offset)
+static uint32_t vgic_fetch_itargetsr(struct vcpu *v, unsigned int offset)
 {
 uint32_t reg = 0;
 unsigned int i;
+unsigned long flags;

-ASSERT(spin_is_locked(&rank->lock));
-
-offset &= INTERRUPT_RANK_MASK;
 offset &= ~(NR_TARGETS_PER_ITARGETSR - 1);

 for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++ )
-reg |= (1 << read_atomic(&rank->vcpu[offset])) << (i * NR_BITS_PER_TARGET);
+{
+struct pending_irq *p = irq_to_pending(v, offset);
+
+vgic_irq_lock(p, flags);
+reg |= (1 << p->vcpu_id) << (i * NR_BITS_PER_TARGET);
+vgic_irq_unlock(p, flags);
+}

 return reg;
 }
@@ -89,32 +92,29 @@ static uint32_t vgic_fetch_itargetsr(struct vgic_irq_rank *rank,
  *
  * Note the byte offset will be aligned to an ITARGETSR boundary.
  */
-static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
+static void vgic_store_itargetsr(struct domain *d,
  unsigned int offset, uint32_t itargetsr)
 {
 unsigned int i;
 unsigned int virq;

-ASSERT(spin_is_locked(&rank->lock));
-
 /*
  * The ITARGETSR0-7, used for SGIs/PPIs, are implemented RO in the
  * emulation and should never call this function.
  *
- * They all live in the first rank.
+ * They all live in the first four bytes of ITARGETSR.
  */
-BUILD_BUG_ON(NR_INTERRUPT_PER_RANK != 32);
-ASSERT(rank->index >= 1);
+ASSERT(offset >= 4);

-offset &= INTERRUPT_RANK_MASK;
+virq = offset;
 offset &= ~(NR_TARGETS_PER_ITARGETSR - 1);

-virq = rank->index * NR_INTERRUPT_PER_RANK + offset;
-
 for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++, virq++ )
 {
 unsigned int new_target, old_target;
+unsigned long flags;
 uint8_t new_mask;
+struct pending_irq *p = spi_to_pending(d, virq);

 /*
  * Don't need to mask as we rely on new_mask to fit for only one
@@ -151,16 +151,14 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
 /* The vCPU ID always starts from 0 */
 new_target--;

-old_target = read_atomic(&rank->vcpu[offset]);
+vgic_irq_lock(p, flags);
+old_target = p->vcpu_id;

 /* Only migrate the vIRQ if the target vCPU has changed */
 if ( new_target != old_target )
-{
-if ( vgic_migrate_irq(d->vcpu[old_target],
- d->vcpu[new_target],
- virq) )
-write_atomic(>vcpu[offset], new_target);
-}
+vgic_migrate_irq(p, &flags, d->vcpu[new_target]);


Why do you need to pass a pointer to the flags and not the value directly?


+else
+vgic_irq_unlock(p, flags);
 }
 }

@@ -264,11 +262,7 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
 uint32_t itargetsr;

 if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width;
-rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR, DABT_WORD);
-if ( rank == NULL) goto read_as_zero;
-vgic_lock_rank(v, rank, flags);
-itargetsr = vgic_fetch_itargetsr(rank, gicd_reg - GICD_ITARGETSR);
-vgic_unlock_rank(v, rank, flags);
+itargetsr = vgic_fetch_itargetsr(v, gicd_reg - GICD_ITARGETSR);


You need a check on the IRQ to avoid calling vgic_fetch_itargetsr with
an IRQ that is not handled.



 *r = vreg_reg32_extract(itargetsr, info);

 return 1;
@@ -498,14 +492,10 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
 uint32_t itargetsr;

 if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width;
-rank = vgic_rank_offset(v, 8, 

[Xen-devel] [xen-4.7-testing test] 112650: regressions - trouble: blocked/broken/fail/pass

2017-08-16 Thread osstest service owner
flight 112650 xen-4.7-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/112650/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-xtf-amd64-amd64-2 47 xtf/test-hvm64-lbr-tsx-vmentry fail REGR. vs. 111516

Regressions which are regarded as allowable (not blocking):
 build-arm64-xsm   2 hosts-allocate broken REGR. vs. 111516
 build-arm64-pvops 2 hosts-allocate broken REGR. vs. 111516
 build-arm64   2 hosts-allocate broken REGR. vs. 111516

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl   1 build-check(1)   blocked  n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-credit2   1 build-check(1)   blocked  n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 build-arm64-xsm   3 capture-logs  broken blocked in 111516
 build-arm64   3 capture-logs  broken blocked in 111516
 build-arm64-pvops 3 capture-logs  broken blocked in 111516
 test-xtf-amd64-amd64-1  47 xtf/test-hvm64-lbr-tsx-vmentry fail like 111516
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 111516
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 111516
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 111516
 test-armhf-armhf-libvirt-xsm 14 saverestore-support-checkfail  like 111516
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 111516
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 10 windows-installfail never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 10 windows-installfail never pass
 test-amd64-amd64-xl-pvh-amd  12 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-pvh-intel 15 guest-saverestorefail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 13 guest-saverestore   fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 13 guest-saverestore   fail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-xsm  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass

version targeted for testing:
 xen  8aebf856caabeb46f89acf07b727193e16ab1242
baseline version:
 xen  4fbfa34b1a0bb329aa57275421e2e9027d32aad5

Last test of basis   111516  2017-07-07 02:22:38 Z   40 days
Testing same since   112650  2017-08-15 

Re: [Xen-devel] [PATCH v5] x86/hvm: Allow guest_request vm_events coming from userspace

2017-08-16 Thread Tamas K Lengyel
On Wed, Aug 16, 2017 at 6:43 AM, Razvan Cojocaru
 wrote:
> On 16.08.2017 15:32, Tamas K Lengyel wrote:
>>
>> On Wed, Aug 16, 2017 at 12:07 AM, Razvan Cojocaru
>>  wrote:
>>>
>>> On 08/16/2017 02:16 AM, Tamas K Lengyel wrote:

 On Tue, Aug 15, 2017 at 2:06 AM, Jan Beulich  wrote:

 On 14.08.17 at 17:53,  wrote:
>>
>> On Tue, Aug 8, 2017 at 2:27 AM, Alexandru Isaila
>>  wrote:
>>>
>>> --- a/xen/arch/x86/hvm/hypercall.c
>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>> @@ -155,6 +155,11 @@ int hvm_hypercall(struct cpu_user_regs *regs)
>>>   /* Fallthrough to permission check. */
>>>   case 4:
>>>   case 2:
>>> +if ( currd->arch.monitor.guest_request_userspace_enabled &&
>>> +eax == __HYPERVISOR_hvm_op &&
>>> +(mode == 8 ? regs->rdi : regs->ebx) ==
>>> HVMOP_guest_request_vm_event )
>>> +break;
>>> +
>>
>>
>> So the CPL check happens after the monitor check, which means this
>> will trigger regardless if the hypercall is coming from userspace or
>> kernelspace. Since the monitor option specifically says userspace,
>> this should probably get moved into the block where CPL was checked.
>
>
> What difference would this make? For CPL0 the hypercall is
> permitted anyway, and for CPL > 0 we specifically want to bypass
> the CPL check. Or are you saying you want to restrict the new
> check to just CPL3?
>

 Yes, according to the name of this monitor option this should only
 trigger a vm_event when the hypercall is coming from CPL3. However,
 the way it is implemented right now I see that this monitor option
 actually requires the other one to be enabled too. By itself this
 monitor option will not work. So I would also like to ask that the
 check in xen/common/monitor.c, if ( d->monitor.guest_request_enabled
 ), to be extended to be: if ( d->monitor.guest_request_enabled ||
 d->monitor.guest_request_userspace_enabled )
>>>
>>>
>>> The option does not trigger anything. Its job is to allow guest requests
>>> coming from userspace (via VMCALLs). And not to _only_ allow these for
>>> userspace, but to allow them coming from userspace _as_well_.
>>>
>>> The current version of the patch, if I've not missed something, does not
>>> require d->monitor.guest_request_enabled to be true to work (the options
>>> can be toggled independently).
>>>
>>> The new function is meant to be called at any time, independent of
>>> enabling / disabling the guest request vm_event (i.e. it only controls
>>> its behaviour once it's enabled). So guest_request_userspace_enabled
>>> should not be used as synonym for guest_request_enabled.
>>>
>>
>> Hi Razvan,
>> so while monitor.guest_request_enabled can indeed be toggled
>> independently, if it is not set, how would the vm_event actually be
>> sent for monitor.guest_request_userspace_enabled? AFAICT it wouldn't
>> because the code responsible for sending the vm_event is gated only on
>> monitor.guest_request_enabled.
>
>
> Hello, indeed it wouldn't.
>
> This new option only control what happens _if_ monitor.guest_request_enabled
> is true. While monitor.guest_request_enabled is false, it has no effect. As
> soon as guest_request_enabled gets toggled on again, the behaviour will be
> the one that has been previously set by using
> guest_request_userspace_enabled.
>
>> And as for this new option being an extended version of
>> guest_request_enabled in the sense that it would allow both userspace
>> _and_ kernelspace, we can do that, but then the name of it is
>> misleading.
>
>
> The naming of the function did get shuffled around a bit. :)
>
> Would xc_monitor_allow_guest_userspace_event() be more appropriate?
>
> Also, if having a separate function feels counter-intuitive, we could also
> simply add a bool parameter to xc_monitor_guest_request(), for example:
>
> int xc_monitor_guest_request(xc_interface *xch, domid_t domain_id,
>  bool enable, bool sync,
>  bool allow_userspace);
>
> and use that to toggle guest_request_userspace_enabled.

In light of the conversation here I have to say I like this option the most.

>
> Thanks,
> Razvan
>
> P.S. Is replying only to me (as opposed to a "reply all") intentional?

Oops, no, I've re-added xen-devel :)

Tamas
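
[If the single-function variant wins out, caller code would presumably
look something like this -- hypothetical usage; the extended fifth
parameter is only proposed above and is not in libxc yet:]

#include <stdbool.h>
#include <stdio.h>
#include <xenctrl.h>

int main(void)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    domid_t domid = 1;   /* example domain */
    int rc;

    if ( !xch )
        return 1;

    /* Proposed extended signature from this thread. */
    rc = xc_monitor_guest_request(xch, domid, true /* enable */,
                                  true /* sync */,
                                  true /* allow_userspace */);
    if ( rc )
        fprintf(stderr, "enabling guest-request events failed: %d\n", rc);

    xc_interface_close(xch);
    return rc;
}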



[Xen-devel] [PATCH v9 11/13] arm/mem_access: Add long-descriptor based gpt

2017-08-16 Thread Sergej Proskurin
This commit adds functionality to walk the guest's page tables using the
long-descriptor translation table format for both ARMv7 and ARMv8.
Similar to the hardware architecture, the implementation supports
different page granularities (4K, 16K, and 64K). The implementation is
based on ARM DDI 0487B.a J1-5922, J1-5999, and ARM DDI 0406C.b B3-1510.

Note that the current implementation lacks support for Large VA/PA on
ARMv8.2 architectures (LVA/LPA, 52-bit virtual and physical address
sizes). The associated location in the code is marked appropriately.

Signed-off-by: Sergej Proskurin 
Acked-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v2: Use TCR_SZ_MASK instead of TTBCR_SZ_MASK for ARM 32-bit guests using
the long-descriptor translation table format.

Cosmetic fixes.

v3: Move the implementation to ./xen/arch/arm/guest_copy.c.

Remove the array strides and declare the array grainsizes as static
const instead of just const to reduce the function stack overhead.

Move parts of the function guest_walk_ld into the static functions
get_ttbr_and_gran_64bit and get_top_bit to reduce complexity.

Use the macro BIT(x) instead of (1UL << x).

Add more comments && Cosmetic fixes.

v4: Move functionality responsible for determining the configured IPA
output-size into a separate function get_ipa_output_size. In this
function, we remove the previously used switch statement, which was
responsible for distinguishing between different IPA output-sizes.
Instead, we retrieve the information from the introduced ipa_sizes
array.

Remove the defines GRANULE_SIZE_INDEX_* and TTBR0_VALID from
guest_walk.h. Instead, introduce the enums granule_size_index
active_ttbr directly inside of guest_walk.c so that the associated
fields don't get exported.

Adapt the function to the new parameter of type "struct vcpu *".

Remove support for 52bit IPA output-sizes entirely from this commit.

Use lpae_* helpers instead of p2m_* helpers.

Cosmetic fixes & Additional comments.

v5: Make use of the function vgic_access_guest_memory to read page table
entries in guest memory.

Invert the indices of the arrays "offsets" and "masks" and simplify
readability by using an appropriate macro for the entries.

Remove remaining CONFIG_ARM_64 #ifdefs.

Remove the use of the macros BITS_PER_WORD and BITS_PER_DOUBLE_WORD.

Use GENMASK_ULL instead of manually creating complex masks to ease
readability.

Also, create a macro CHECK_BASE_SIZE which simply reduces the code
size and simplifies readability.

Make use of the newly introduced lpae_page macro in the if-statement
to test for invalid/reserved mappings in the L3 PTE.

Cosmetic fixes and additional comments.

v6: Convert the macro CHECK_BASE_SIZE into a helper function
check_base_size. The use of the old CHECK_BASE_SIZE was confusing as
it affected the control-flow through a return as part of the macro.

Return the value -EFAULT instead of -EINVAL if access to the guest's
memory fails.

Simplify the check in the end of the table walk that ensures that
the found PTE is a page or a superpage. The new implementation
checks if the pte maps a valid page or a superpage and returns an
-EFAULT only if both conditions are not true.

Adjust the type of the array offsets to paddr_t instead of vaddr_t
to allow working with the changed *_table_offset_* helpers, which
return offsets of type paddr_t.

Make use of renamed function access_guest_memory_by_ipa instead of
vgic_access_guest_memory.

v7: Change the return type of check_base_size to bool as it returns only
two possible values and the caller is interested only whether the call
has succeeded or not.

Use a mask for the computation of the IPA, as the lower values of
the PTE's base address do not need to be zeroed out.

Cosmetic fixes in comments.

v8: By calling access_guest_memory_by_ipa in guest_walk_(ld|sd), we rely
on the p2m->lock (rw_lock) to be recursive. To avoid bugs in the
future implementation, we add a comment in struct p2m_domain to
address this case. Thus, we make the future implementation aware of
the nested use of the lock.

v9: Remove second "to" in a comment.

Add Acked-by Julien Grall.
---
 xen/arch/arm/guest_walk.c | 398 +-
 xen/include/asm-arm/p2m.h |   8 +-
 2 files changed, 403 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
index 78badc2949..d0d45ad659 100644
--- a/xen/arch/arm/guest_walk.c
+++ b/xen/arch/arm/guest_walk.c
@@ -15,7 +15,10 @@
 * this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include 
 #include 
+#include 
+#include 
 
 /*
  * The function guest_walk_sd translates a given GVA into 

[Xen-devel] [PATCH v9 13/13] arm/mem_access: Walk the guest's pt in software

2017-08-16 Thread Sergej Proskurin
In this commit, we make use of the gpt walk functionality introduced in
the previous commits. If mem_access is active, hardware-based gva to ipa
translation might fail, as gva_to_ipa uses the guest's translation
tables, access to which might be restricted by the active VTTBR. To
side-step potential translation errors in the function
p2m_mem_access_check_and_get_page due to restricted memory (e.g. to the
guest's page tables themselves), we walk the guest's page tables in
software.

Signed-off-by: Sergej Proskurin 
Acked-by: Tamas K Lengyel 
---
Cc: Razvan Cojocaru 
Cc: Tamas K Lengyel 
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v2: Check the returned access rights after walking the guest's page tables in
the function p2m_mem_access_check_and_get_page.

v3: Adapt function names and parameters.

v4: Comment why we need to fail if the permission flags that are
requested by the caller do not satisfy the mapped page.

Cosmetic fix that simplifies the if-statement checking for the
GV2M_WRITE permission.

v5: Move comment to ease code readability.
---
 xen/arch/arm/mem_access.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
index e0888bbad2..3e2bb4088a 100644
--- a/xen/arch/arm/mem_access.c
+++ b/xen/arch/arm/mem_access.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
 xenmem_access_t *access)
@@ -101,6 +102,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
   const struct vcpu *v)
 {
 long rc;
+unsigned int perms;
 paddr_t ipa;
 gfn_t gfn;
 mfn_t mfn;
@@ -110,8 +112,35 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
 struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
 
 rc = gva_to_ipa(gva, &ipa, flag);
+
+/*
+ * In case mem_access is active, hardware-based gva_to_ipa translation
+ * might fail. Since gva_to_ipa uses the guest's translation tables, access
+ * to which might be restricted by the active VTTBR, we perform a gva to
+ * ipa translation in software.
+ */
 if ( rc < 0 )
-goto err;
+{
+/*
+ * The software gva to ipa translation can still fail, e.g., if the gva
+ * is not mapped.
+ */
+if ( guest_walk_tables(v, gva, &ipa, &perms) < 0 )
+goto err;
+
+/*
+ * Check permissions that are assumed by the caller. For instance in
+ * case of guestcopy, the caller assumes that the translated page can
+ * be accessed with requested permissions. If this is not the case, we
+ * should fail.
+ *
+ * Please note that we do not check for the GV2M_EXEC permission. Yet,
+ * since the hardware-based translation through gva_to_ipa does not
+ * test for execute permissions this check can be left out.
+ */
+if ( (flag & GV2M_WRITE) && !(perms & GV2M_WRITE) )
+goto err;
+}
 
 gfn = gaddr_to_gfn(ipa);
 
-- 
2.13.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 09/13] arm/guest_access: Rename vgic_access_guest_memory

2017-08-16 Thread Sergej Proskurin
This commit renames the function vgic_access_guest_memory to
access_guest_memory_by_ipa. As the function name suggests, the function
expects an IPA as an argument. All invocations of this function have been
adapted accordingly. Apart from that, we have adjusted all printk
messages for cleanup and to eliminate artefacts of the function's
previous location.

Signed-off-by: Sergej Proskurin 
Acked-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v6: We added this patch to our patch series.

v7: Renamed the function's argument ipa back to gpa.

Removed any mentioning of "vITS" in the function's printk messages
and adjusted the commit message accordingly.

v9: Added Acked-by Julien Grall.
---
 xen/arch/arm/guestcopy.c   | 10 +-
 xen/arch/arm/vgic-v3-its.c | 36 ++--
 xen/include/asm-arm/guest_access.h |  4 ++--
 3 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
index 938ffe2668..4ee07fcea3 100644
--- a/xen/arch/arm/guestcopy.c
+++ b/xen/arch/arm/guestcopy.c
@@ -123,8 +123,8 @@ unsigned long raw_copy_from_guest(void *to, const void __user *from, unsigned len)
  * Temporarily map one physical guest page and copy data to or from it.
  * The data to be copied cannot cross a page boundary.
  */
-int vgic_access_guest_memory(struct domain *d, paddr_t gpa, void *buf,
- uint32_t size, bool is_write)
+int access_guest_memory_by_ipa(struct domain *d, paddr_t gpa, void *buf,
+   uint32_t size, bool is_write)
 {
 struct page_info *page;
 uint64_t offset = gpa & ~PAGE_MASK;  /* Offset within the mapped page */
@@ -134,7 +134,7 @@ int vgic_access_guest_memory(struct domain *d, paddr_t gpa, void *buf,
 /* Do not cross a page boundary. */
 if ( size > (PAGE_SIZE - offset) )
 {
-printk(XENLOG_G_ERR "d%d: vITS: memory access would cross page boundary\n",
+printk(XENLOG_G_ERR "d%d: guestcopy: memory access crosses page boundary.\n",
d->domain_id);
 return -EINVAL;
 }
@@ -142,7 +142,7 @@ int vgic_access_guest_memory(struct domain *d, paddr_t gpa, void *buf,
 page = get_page_from_gfn(d, paddr_to_pfn(gpa), &p2mt, P2M_ALLOC);
 if ( !page )
 {
-printk(XENLOG_G_ERR "d%d: vITS: Failed to get table entry\n",
+printk(XENLOG_G_ERR "d%d: guestcopy: failed to get table entry.\n",
d->domain_id);
 return -EINVAL;
 }
@@ -150,7 +150,7 @@ int vgic_access_guest_memory(struct domain *d, paddr_t gpa, void *buf,
 if ( !p2m_is_ram(p2mt) )
 {
 put_page(page);
-printk(XENLOG_G_ERR "d%d: vITS: memory used by the ITS should be RAM.",
+printk(XENLOG_G_ERR "d%d: guestcopy: guest memory should be RAM.\n",
d->domain_id);
 return -EINVAL;
 }
diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 1af6820cab..72a5c70656 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -131,9 +131,9 @@ static int its_set_collection(struct virt_its *its, 
uint16_t collid,
 if ( collid >= its->max_collections )
 return -ENOENT;
 
-return vgic_access_guest_memory(its->d,
-addr + collid * sizeof(coll_table_entry_t),
-&vcpu_id, sizeof(vcpu_id), true);
+return access_guest_memory_by_ipa(its->d,
+  addr + collid * sizeof(coll_table_entry_t),
+  &vcpu_id, sizeof(vcpu_id), true);
 }
 
 /* Must be called with the ITS lock held. */
@@ -149,9 +149,9 @@ static struct vcpu *get_vcpu_from_collection(struct 
virt_its *its,
 if ( collid >= its->max_collections )
 return NULL;
 
-ret = vgic_access_guest_memory(its->d,
-   addr + collid * sizeof(coll_table_entry_t),
-   &vcpu_id, sizeof(coll_table_entry_t), false);
+ret = access_guest_memory_by_ipa(its->d,
+ addr + collid * sizeof(coll_table_entry_t),
+ &vcpu_id, sizeof(coll_table_entry_t), false);
 if ( ret )
 return NULL;
 
@@ -171,9 +171,9 @@ static int its_set_itt_address(struct virt_its *its, 
uint32_t devid,
 if ( devid >= its->max_devices )
 return -ENOENT;
 
-return vgic_access_guest_memory(its->d,
-addr + devid * sizeof(dev_table_entry_t),
-&itt_entry, sizeof(itt_entry), true);
+return access_guest_memory_by_ipa(its->d,
+  addr + devid * sizeof(dev_table_entry_t),
+  &itt_entry, sizeof(itt_entry), true);
 }
 
 /*
@@ -189,9 +189,9 @@ static int its_get_itt(struct 

[Xen-devel] [PATCH v9 04/13] arm/mem_access: Add short-descriptor pte typedefs and macros

2017-08-16 Thread Sergej Proskurin
The current implementation does not provide appropriate types for
short-descriptor translation table entries. As such, this commit adds new
types, which simplify managing the respective translation table entries.
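
As a hypothetical illustration of what the typedefs buy us, the
standalone sketch below decodes the physical address of a 1MB section
from an (abridged) L1 section descriptor. The struct layout mirrors the
short_desc_l1_sec_t introduced by this patch, but the program itself is
not part of the series:

#include <stdint.h>
#include <stdio.h>

#define L1DESC_SECTION_SHIFT 20

typedef struct {
    uint32_t attrs : 20;   /* attribute bits, abridged */
    uint32_t base  : 12;   /* section base address, PA[31:20] */
} l1_sec_sketch_t;

int main(void)
{
    l1_sec_sketch_t pte = { .base = 0x801 };    /* section at 0x80100000 */
    uint32_t va = 0x800F1234;

    /* Note the cast: pte.base would otherwise be promoted to int. */
    uint64_t pa = ((uint64_t)pte.base << L1DESC_SECTION_SHIFT) |
                  (va & ((1u << L1DESC_SECTION_SHIFT) - 1));

    printf("PA = 0x%llx\n", (unsigned long long)pa);
    return 0;
}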

Signed-off-by: Sergej Proskurin 
Acked-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v3: Add more short-descriptor related pte typedefs that will be used by
the following commits.

v4: Move short-descriptor pte typedefs out of page.h into short-desc.h.

Change the type unsigned int to bool of every bitfield in
short-descriptor related data-structures that holds only one bit.

Change the typedef names from pte_sd_* to short_desc_*.

v5: Add {L1|L2}DESC_* defines to this commit.

v6: Add Julien Grall's Acked-by.
---
 xen/include/asm-arm/short-desc.h | 130 +++
 1 file changed, 130 insertions(+)
 create mode 100644 xen/include/asm-arm/short-desc.h

diff --git a/xen/include/asm-arm/short-desc.h b/xen/include/asm-arm/short-desc.h
new file mode 100644
index 00..9652a103c4
--- /dev/null
+++ b/xen/include/asm-arm/short-desc.h
@@ -0,0 +1,130 @@
+#ifndef __ARM_SHORT_DESC_H__
+#define __ARM_SHORT_DESC_H__
+
+/*
+ * First level translation table descriptor types used by the AArch32
+ * short-descriptor translation table format.
+ */
+#define L1DESC_INVALID  (0)
+#define L1DESC_PAGE_TABLE   (1)
+#define L1DESC_SECTION  (2)
+#define L1DESC_SECTION_PXN  (3)
+
+/* Defines for section and supersection shifts. */
+#define L1DESC_SECTION_SHIFT(20)
+#define L1DESC_SUPERSECTION_SHIFT   (24)
+#define L1DESC_SUPERSECTION_EXT_BASE1_SHIFT (32)
+#define L1DESC_SUPERSECTION_EXT_BASE2_SHIFT (36)
+
+/* Second level translation table descriptor types. */
+#define L2DESC_INVALID  (0)
+
+/* Defines for small (4K) and large page (64K) shifts. */
+#define L2DESC_SMALL_PAGE_SHIFT (12)
+#define L2DESC_LARGE_PAGE_SHIFT (16)
+
+/*
+ * Comprises bits of the level 1 short-descriptor format representing
+ * a section.
+ */
+typedef struct __packed {
+bool pxn:1; /* Privileged Execute Never */
+bool sec:1; /* == 1 if section or supersection */
+bool b:1;   /* Bufferable */
+bool c:1;   /* Cacheable */
+bool xn:1;  /* Execute Never */
+unsigned int dom:4; /* Domain field */
+bool impl:1;/* Implementation defined */
+unsigned int ap:2;  /* AP[1:0] */
+unsigned int tex:3; /* TEX[2:0] */
+bool ro:1;  /* AP[2] */
+bool s:1;   /* Shareable */
+bool ng:1;  /* Non-global */
+bool supersec:1;/* Must be 0 for sections */
+bool ns:1;  /* Non-secure */
+unsigned int base:12;   /* Section base address */
+} short_desc_l1_sec_t;
+
+/*
+ * Comprises bits of the level 1 short-descriptor format representing
+ * a supersection.
+ */
+typedef struct __packed {
+bool pxn:1; /* Privileged Execute Never */
+bool sec:1; /* == 1 if section or supersection */
+bool b:1;   /* Bufferable */
+bool c:1;   /* Cacheable */
+bool xn:1;  /* Execute Never */
+unsigned int extbase2:4;/* Extended base address, PA[39:36] */
+bool impl:1;/* Implementation defined */
+unsigned int ap:2;  /* AP[1:0] */
+unsigned int tex:3; /* TEX[2:0] */
+bool ro:1;  /* AP[2] */
+bool s:1;   /* Shareable */
+bool ng:1;  /* Non-global */
+bool supersec:1;/* Must be 0 for sections */
+bool ns:1;  /* Non-secure */
+unsigned int extbase1:4;/* Extended base address, PA[35:32] */
+unsigned int base:8;/* Supersection base address */
+} short_desc_l1_supersec_t;
+
+/*
+ * Comprises bits of the level 2 short-descriptor format representing
+ * a small page.
+ */
+typedef struct __packed {
+bool xn:1;  /* Execute Never */
+bool page:1;/* ==1 if small page */
+bool b:1;   /* Bufferable */
+bool c:1;   /* Cacheable */
+unsigned int ap:2;  /* AP[1:0] */
+unsigned int tex:3; /* TEX[2:0] */
+bool ro:1;  /* AP[2] */
+bool s:1;   /* Shareable */
+bool ng:1;  /* Non-global */
+unsigned int base:20;   /* Small page base address */
+} short_desc_l2_page_t;
+
+/*
+ * Comprises bits of the level 2 short-descriptor format representing
+ * a large page.
+ */
+typedef struct __packed {
+bool lpage:1;   /* ==1 if large page */

[Xen-devel] [PATCH v9 02/13] arm/mem_access: Add defines supporting PTs with varying page sizes

2017-08-16 Thread Sergej Proskurin
AArch64 supports pages of different sizes (4K, 16K, and 64K). To enable
guest page table walks for various configurations, this commit
extends the defines and helpers of the current implementation.
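
To make the effect concrete, the following standalone sketch (not part
of the patch) splits a 48-bit VA with a 4K granule the way the generated
*_table_offset_4K helpers do: 12 bits of page offset, then 9 bits of
index per lookup level:

#include <stdint.h>
#include <stdio.h>

#define LPAE_SHIFT_4K 9
#define PAGE_SHIFT_4K 12

static uint64_t offset_at(uint64_t va, unsigned int shift)
{
    return (va >> shift) & ((1u << LPAE_SHIFT_4K) - 1);
}

int main(void)
{
    uint64_t va = 0x0000ffffc0081234ULL;
    unsigned int i;

    /* Level 3 uses bits [20:12]; each higher level shifts by 9 more. */
    for ( i = 0; i < 4; i++ )
        printf("level %u index: %llu\n", 3 - i,
               (unsigned long long)offset_at(va, PAGE_SHIFT_4K +
                                             i * LPAE_SHIFT_4K));
    return 0;
}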

Signed-off-by: Sergej Proskurin 
Reviewed-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v3: Eliminate redundant macro definitions by introducing generic macros.

v4: Replace existing macros with ones that generate static inline
helpers as to ease the readability of the code.

Move the introduced code into lpae.h

v5: Remove PAGE_SHIFT_* defines from lpae.h, as we now import them from
the header xen/lib.h.

Remove *_guest_table_offset macros as to reduce the number of
exported macros which are only used once. Instead, use the
associated functionality directly within the
GUEST_TABLE_OFFSET_HELPERS.

Add comment in GUEST_TABLE_OFFSET_HELPERS stating that a page table
with 64K page size granularity does not have a zeroeth lookup level.

Add #undefs for GUEST_TABLE_OFFSET and GUEST_TABLE_OFFSET_HELPERS.

Remove CONFIG_ARM_64 #defines.

v6: Rename *_guest_table_offset_* helpers to *_table_offset_* as they
are sufficiently generic to be applied not only to the guest's page
table walks.

Change the type of the parameter and return value of the
*_table_offset_* helpers from vaddr_t to paddr_t to enable applying
these helpers also for other purposes such as computation of IPA
offsets in second stage translation tables.

v7: Clarify comments in the code and commit message to address AArch64
directly instead of ARMv8 in general.

Rename remaining GUEST_TABLE_* macros into TABLE_* macros, to be
consistent with *_table_offset_* helpers.

Added Reviewed-by Julien Grall.
---
 xen/include/asm-arm/lpae.h | 61 ++
 1 file changed, 61 insertions(+)

diff --git a/xen/include/asm-arm/lpae.h b/xen/include/asm-arm/lpae.h
index a62b118630..efec493313 100644
--- a/xen/include/asm-arm/lpae.h
+++ b/xen/include/asm-arm/lpae.h
@@ -3,6 +3,8 @@
 
 #ifndef __ASSEMBLY__
 
+#include 
+
 /*
  * WARNING!  Unlike the x86 pagetable code, where l1 is the lowest level and
  * l4 is the root of the trie, the ARM pagetables follow ARM's documentation:
@@ -151,6 +153,65 @@ static inline bool lpae_is_superpage(lpae_t pte, unsigned 
int level)
 return (level < 3) && lpae_mapping(pte);
 }
 
+/*
+ * AArch64 supports pages with different sizes (4K, 16K, and 64K). To enable
+ * page table walks for various configurations, the following helpers enable
+ * walking the translation table with varying page size granularities.
+ */
+
+#define LPAE_SHIFT_4K   (9)
+#define LPAE_SHIFT_16K  (11)
+#define LPAE_SHIFT_64K  (13)
+
+#define lpae_entries(gran)  (_AC(1,U) << LPAE_SHIFT_##gran)
+#define lpae_entry_mask(gran)   (lpae_entries(gran) - 1)
+
+#define third_shift(gran)   (PAGE_SHIFT_##gran)
+#define third_size(gran)((paddr_t)1 << third_shift(gran))
+
+#define second_shift(gran)  (third_shift(gran) + LPAE_SHIFT_##gran)
+#define second_size(gran)   ((paddr_t)1 << second_shift(gran))
+
+#define first_shift(gran)   (second_shift(gran) + LPAE_SHIFT_##gran)
+#define first_size(gran)((paddr_t)1 << first_shift(gran))
+
+/* Note that there is no zeroeth lookup level with a 64K granule size. */
+#define zeroeth_shift(gran) (first_shift(gran) + LPAE_SHIFT_##gran)
+#define zeroeth_size(gran)  ((paddr_t)1 << zeroeth_shift(gran))
+
+#define TABLE_OFFSET(offs, gran)  (offs & lpae_entry_mask(gran))
+#define TABLE_OFFSET_HELPERS(gran)  \
+static inline paddr_t third_table_offset_##gran##K(paddr_t va)  \
+{   \
+return TABLE_OFFSET((va >> third_shift(gran##K)), gran##K); \
+}   \
+\
+static inline paddr_t second_table_offset_##gran##K(paddr_t va) \
+{   \
+return TABLE_OFFSET((va >> second_shift(gran##K)), gran##K);\
+}   \
+\
+static inline paddr_t first_table_offset_##gran##K(paddr_t va)  \
+{   \
+return TABLE_OFFSET((va >> first_shift(gran##K)), gran##K); \
+}   \
+\
+static inline paddr_t 

[Xen-devel] [PATCH v9 01/13] arm/mem_access: Add and cleanup (TCR_|TTBCR_)* defines

2017-08-16 Thread Sergej Proskurin
This commit adds (TCR_|TTBCR_)* defines to simplify access to the
respective register contents. At the same time, we adjust the macros
TCR_T0SZ and TCR_TG0_* by using the newly introduced TCR_T0SZ_SHIFT and
TCR_TG0_SHIFT instead of the hardcoded values.
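
As a short, hypothetical illustration (not part of the patch) of why the
shifts help: extracting TxSZ from TCR_EL1 becomes symmetric for the
TTBR0 and TTBR1 halves of the register, with TCR_SZ_MASK assumed here to
be the 6-bit 0x3f mask used by this series:

#include <stdint.h>

#define TCR_T0SZ_SHIFT  (0)
#define TCR_T1SZ_SHIFT  (16)
#define TCR_SZ_MASK     (0x3fULL)

/* Returns T0SZ or T1SZ depending on the shift passed in. */
static unsigned int tcr_txsz(uint64_t tcr, unsigned int shift)
{
    return (tcr >> shift) & TCR_SZ_MASK;
}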

Signed-off-by: Sergej Proskurin 
Acked-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v2: Define TCR_SZ_MASK in a way so that it can be also applied to 32-bit guests
using the long-descriptor translation table format.

Extend the previous commit by further defines allowing a simplified access
to the registers TCR_EL1 and TTBCR.

v3: Replace the hardcoded value 0 in the TCR_T0SZ macro with the newly
introduced TCR_T0SZ_SHIFT. Also, replace the hardcoded value 14 in
the TCR_TG0_* macros with the introduced TCR_TG0_SHIFT.

Comment when to apply the defines TTBCR_PD(0|1), according to ARM
DDI 0487B.a and ARM DDI 0406C.b.

Remove TCR_TB_* defines.

Comment when certain TCR_EL2 register fields can be applied.

v4: Cosmetic changes.

v5: Remove the shift by 0 of the TCR_SZ_MASK as it can be applied to
both TCR_T0SZ and TCR_T1SZ (which reside at different offsets).

Adjust commit message to make clear that we do not only add but also
cleanup some TCR_* defines.

v6: Changed the comment of TCR_SZ_MASK as we falsely referenced a
section instead of a page.

Add Julien Grall's Acked-by.
---
 xen/include/asm-arm/processor.h | 69 ++---
 1 file changed, 65 insertions(+), 4 deletions(-)

diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
index ab5225fa6c..bf0e1bd014 100644
--- a/xen/include/asm-arm/processor.h
+++ b/xen/include/asm-arm/processor.h
@@ -94,6 +94,13 @@
 #define TTBCR_N_2KB  _AC(0x03,U)
 #define TTBCR_N_1KB  _AC(0x04,U)
 
+/*
+ * TTBCR_PD(0|1) can be applied only if LPAE is disabled, i.e., TTBCR.EAE==0
+ * (ARM DDI 0487B.a G6-5203 and ARM DDI 0406C.b B4-1722).
+ */
+#define TTBCR_PD0   (_AC(1,U)<<4)
+#define TTBCR_PD1   (_AC(1,U)<<5)
+
 /* SCTLR System Control Register. */
 /* HSCTLR is a subset of this. */
 #define SCTLR_TE(_AC(1,U)<<30)
@@ -154,7 +161,20 @@
 
 /* TCR: Stage 1 Translation Control */
 
-#define TCR_T0SZ(x) ((x)<<0)
+#define TCR_T0SZ_SHIFT  (0)
+#define TCR_T1SZ_SHIFT  (16)
+#define TCR_T0SZ(x) ((x)<<TCR_T0SZ_SHIFT)

[Xen-devel] [PATCH v9 07/13] arm/mem_access: Introduce GENMASK_ULL bit operation

2017-08-16 Thread Sergej Proskurin
The current implementation of GENMASK is capable of creating bitmasks of
32-bit values on AArch32 and 64-bit values on AArch64. As we need to
create masks for 64-bit values on AArch32 as well, in this commit we
introduce the GENMASK_ULL bit operation. Please note that the
GENMASK_ULL implementation has been lifted from the Linux kernel source
code.
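
A quick worked example of the semantics (using the definition quoted in
the diff below): GENMASK_ULL(h, l) sets bits l..h inclusive, so masks
wider than 32 bits remain correct on AArch32, where unsigned long is
only 32 bits wide:

#include <stdio.h>

#define BITS_PER_LLONG 64
#define GENMASK_ULL(h, l) \
    (((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LLONG - 1 - (h))))

int main(void)
{
    /* Bits [39:12], e.g. the base-address field of a 40-bit PTE. */
    printf("0x%llx\n", GENMASK_ULL(39, 12));    /* prints 0xfffffff000 */
    return 0;
}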

Signed-off-by: Sergej Proskurin 
Reviewed-by: Stefano Stabellini 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Konrad Rzeszutek Wilk 
Cc: Stefano Stabellini 
Cc: Tim Deegan 
Cc: Wei Liu 
---
v6: As similar patches have been already submitted and NACKED in the
past, we resubmit this patch with 'THE REST' maintainers in Cc to
discuss whether this patch shall be applied into common or put into
ARM related code.

v7: Change the introduced macro BITS_PER_LONG_LONG to BITS_PER_LLONG.

Define BITS_PER_LLONG also in asm-x86/config.h in order to allow
global usage of the introduced macro GENMASK_ULL.

Remove previously unintended whitespace elimination in the function
get_bitmask_order as it is not the right patch to address cleanup.

v9: Add Reviewed-by Stefano Stabellini.
---
 xen/include/asm-arm/config.h | 2 ++
 xen/include/asm-x86/config.h | 2 ++
 xen/include/xen/bitops.h | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
index 5b6f3c985d..7da94698e1 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -19,6 +19,8 @@
 #define BITS_PER_LONG (BYTES_PER_LONG << 3)
 #define POINTER_ALIGN BYTES_PER_LONG
 
+#define BITS_PER_LLONG 64
+
 /* xen_ulong_t is always 64 bits */
 #define BITS_PER_XEN_ULONG 64
 
diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
index 25af085af0..0130ac864f 100644
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -15,6 +15,8 @@
 #define BITS_PER_BYTE 8
 #define POINTER_ALIGN BYTES_PER_LONG
 
+#define BITS_PER_LLONG 64
+
 #define BITS_PER_XEN_ULONG BITS_PER_LONG
 
 #define CONFIG_PAGING_ASSISTANCE 1
diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index bd0883ab22..e2019b02a3 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -10,6 +10,9 @@
 #define GENMASK(h, l) \
 (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
 
+#define GENMASK_ULL(h, l) \
+(((~0ULL) << (l)) & (~0ULL >> (BITS_PER_LLONG - 1 - (h))))
+
 /*
  * ffs: find first bit set. This is defined the same way as
  * the libc and compiler builtin ffs routines, therefore
-- 
2.13.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 08/13] arm/guest_access: Move vgic_access_guest_memory to guest_access.h

2017-08-16 Thread Sergej Proskurin
This commit moves the function vgic_access_guest_memory to guestcopy.c
and the header asm/guest_access.h. No functional changes are made.
Please note that the function will be renamed in the following commit.

Signed-off-by: Sergej Proskurin 
Acked-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v6: We added this patch to our patch series.

v7: Add Acked-by Julien Grall.

v9: Include  in  to fix build issues
due to missing type information.
---
 xen/arch/arm/guestcopy.c   | 50 ++
 xen/arch/arm/vgic-v3-its.c |  1 +
 xen/arch/arm/vgic.c| 49 -
 xen/include/asm-arm/guest_access.h |  4 +++
 xen/include/asm-arm/vgic.h |  3 ---
 5 files changed, 55 insertions(+), 52 deletions(-)

diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
index 413125f02b..938ffe2668 100644
--- a/xen/arch/arm/guestcopy.c
+++ b/xen/arch/arm/guestcopy.c
@@ -118,6 +118,56 @@ unsigned long raw_copy_from_guest(void *to, const void __user *from, unsigned len)
 }
 return 0;
 }
+
+/*
+ * Temporarily map one physical guest page and copy data to or from it.
+ * The data to be copied cannot cross a page boundary.
+ */
+int vgic_access_guest_memory(struct domain *d, paddr_t gpa, void *buf,
+ uint32_t size, bool is_write)
+{
+struct page_info *page;
+uint64_t offset = gpa & ~PAGE_MASK;  /* Offset within the mapped page */
+p2m_type_t p2mt;
+void *p;
+
+/* Do not cross a page boundary. */
+if ( size > (PAGE_SIZE - offset) )
+{
+printk(XENLOG_G_ERR "d%d: vITS: memory access would cross page boundary\n",
+   d->domain_id);
+return -EINVAL;
+}
+
+page = get_page_from_gfn(d, paddr_to_pfn(gpa), &p2mt, P2M_ALLOC);
+if ( !page )
+{
+printk(XENLOG_G_ERR "d%d: vITS: Failed to get table entry\n",
+   d->domain_id);
+return -EINVAL;
+}
+
+if ( !p2m_is_ram(p2mt) )
+{
+put_page(page);
+printk(XENLOG_G_ERR "d%d: vITS: memory used by the ITS should be RAM.",
+   d->domain_id);
+return -EINVAL;
+}
+
+p = __map_domain_page(page);
+
+if ( is_write )
+memcpy(p + offset, buf, size);
+else
+memcpy(buf, p + offset, size);
+
+unmap_domain_page(p);
+put_page(page);
+
+return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 9ef792f479..1af6820cab 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 1e5107b9f8..7a4e3cdc88 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -638,55 +638,6 @@ void vgic_free_virq(struct domain *d, unsigned int virq)
 }
 
 /*
- * Temporarily map one physical guest page and copy data to or from it.
- * The data to be copied cannot cross a page boundary.
- */
-int vgic_access_guest_memory(struct domain *d, paddr_t gpa, void *buf,
- uint32_t size, bool is_write)
-{
-struct page_info *page;
-uint64_t offset = gpa & ~PAGE_MASK;  /* Offset within the mapped page */
-p2m_type_t p2mt;
-void *p;
-
-/* Do not cross a page boundary. */
-if ( size > (PAGE_SIZE - offset) )
-{
-printk(XENLOG_G_ERR "d%d: vITS: memory access would cross page boundary\n",
-   d->domain_id);
-return -EINVAL;
-}
-
-page = get_page_from_gfn(d, paddr_to_pfn(gpa), &p2mt, P2M_ALLOC);
-if ( !page )
-{
-printk(XENLOG_G_ERR "d%d: vITS: Failed to get table entry\n",
-   d->domain_id);
-return -EINVAL;
-}
-
-if ( !p2m_is_ram(p2mt) )
-{
-put_page(page);
-printk(XENLOG_G_ERR "d%d: vITS: memory used by the ITS should be RAM.",
-   d->domain_id);
-return -EINVAL;
-}
-
-p = __map_domain_page(page);
-
-if ( is_write )
-memcpy(p + offset, buf, size);
-else
-memcpy(buf, p + offset, size);
-
-unmap_domain_page(p);
-put_page(page);
-
-return 0;
-}
-
-/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/xen/include/asm-arm/guest_access.h b/xen/include/asm-arm/guest_access.h
index 251e935597..df5737cbe4 100644
--- a/xen/include/asm-arm/guest_access.h
+++ b/xen/include/asm-arm/guest_access.h
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 unsigned long raw_copy_to_guest(void *to, const void *from, unsigned len);
 unsigned long raw_copy_to_guest_flush_dcache(void *to, const void *from,
@@ -10,6 +11,9 @@ unsigned long raw_copy_to_guest_flush_dcache(void *to, const void *from,
 unsigned long raw_copy_from_guest(void *to, const void *from, unsigned len);
 

[Xen-devel] [PATCH v9 03/13] arm/lpae: Introduce lpae_is_page helper

2017-08-16 Thread Sergej Proskurin
This commit introduces a new helper that checks whether the target PTE
holds a page mapping or not. This helper will be used as part of the
following commits.
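
As a usage sketch (assuming the Xen tree; the surrounding helper is
invented for illustration), the new predicate pairs naturally with the
existing lpae_is_superpage at the end of a walk:

#include <xen/errno.h>
#include <asm/lpae.h>

/*
 * A walk may legitimately end on a level-3 page or on a level-0..2
 * superpage; anything else (table descriptor, invalid entry) is a
 * translation fault.
 */
static int check_mapping(lpae_t pte, unsigned int level)
{
    if ( !lpae_is_page(pte, level) && !lpae_is_superpage(pte, level) )
        return -EFAULT;

    return 0;
}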

Signed-off-by: Sergej Proskurin 
Reviewed-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v6: Change the name of the lpae_page helper to lpae_is_page.

Add Julien Grall's Reviewed-by.
---
 xen/include/asm-arm/lpae.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/include/asm-arm/lpae.h b/xen/include/asm-arm/lpae.h
index efec493313..118ee5ae1a 100644
--- a/xen/include/asm-arm/lpae.h
+++ b/xen/include/asm-arm/lpae.h
@@ -153,6 +153,11 @@ static inline bool lpae_is_superpage(lpae_t pte, unsigned 
int level)
 return (level < 3) && lpae_mapping(pte);
 }
 
+static inline bool lpae_is_page(lpae_t pte, unsigned int level)
+{
+return (level == 3) && lpae_valid(pte) && pte.walk.table;
+}
+
 /*
  * AArch64 supports pages with different sizes (4K, 16K, and 64K). To enable
  * page table walks for various configurations, the following helpers enable
-- 
2.13.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 10/13] arm/mem_access: Add software guest-page-table walk

2017-08-16 Thread Sergej Proskurin
The function p2m_mem_access_check_and_get_page in mem_access.c
translates a gva to an ipa by means of the hardware functionality of the
ARM architecture. This is implemented in the function gva_to_ipa. If
mem_access is active, hardware-based gva to ipa translation might fail,
as gva_to_ipa uses the guest's translation tables, access to which might
be restricted by the active VTTBR. To address this issue, in this commit
we add a software-based guest-page-table walk, which will be used by the
function p2m_mem_access_check_and_get_page to perform the gva to ipa
translation in software in one of the following commits.

Note: The introduced function guest_walk_tables assumes that the domain,
the gva of which is to be translated, is running on the currently active
vCPU. To walk the guest's page tables on a different vCPU, the following
registers would need to be loaded: TCR_EL1, TTBR0_EL1, TTBR1_EL1, and
SCTLR_EL1.
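
For orientation, here is a sketch of the dispatch logic described above.
The guest_walk_(sd|ld) split, the MMU-off shortcut, and the GV2M_READ
default follow this series, but the exact register checks and define
names below are paraphrased for illustration, not quoted from the
committed code:

int guest_walk_tables(const struct vcpu *v, vaddr_t gva,
                      paddr_t *ipa, unsigned int *perms)
{
    /* The walk assumes that v is the currently active vCPU. */
    if ( v != current )
        return -EFAULT;

    /* Valid stage-1 mappings are readable by EL1 by default (perms
     * assumed non-NULL in this sketch). */
    *perms = GV2M_READ;

    /* With the stage-1 MMU off, the GVA maps 1:1 to the IPA. */
    if ( !(READ_SYSREG(SCTLR_EL1) & SCTLR_M) )
    {
        *ipa = gva;
        return 0;
    }

    /* 32-bit domains without LPAE use the short-descriptor format. */
    if ( is_32bit_domain(v->domain) &&
         !(READ_SYSREG(TCR_EL1) & TTBCR_EAE) )
        return guest_walk_sd(v, gva, ipa, perms);
    else
        return guest_walk_ld(v, gva, ipa, perms);
}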

Signed-off-by: Sergej Proskurin 
Acked-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v2: Rename p2m_gva_to_ipa to p2m_walk_gpt and move it to p2m.c.

Move the functionality responsible for walking long-descriptor based
translation tables out of the function p2m_walk_gpt. Also move out
the long-descriptor based translation out of this commit.

Change function parameters in order to return the access rights
for a requested gva.

Cosmetic fixes.

v3: Rename the introduced functions to guest_walk_(tables|sd|ld) and
move the implementation to guest_copy.(c|h).

Set permissions in guest_walk_tables also if the MMU is disabled.

Change the function parameter of type "struct p2m_domain *" to
"struct vcpu *" in the function guest_walk_tables.

v4: Change the function parameter of type "struct p2m_domain *" to
"struct vcpu *" in the functions guest_walk_(sd|ld) as well.

v5: Merge two if-statements in guest_walk_tables to ease readability.

Set perms to GV2M_READ as to avoid undefined permissions.

Add Julien Grall's Acked-by.

v6: Adjusted change-log of v5.

Remove Julien Grall's Acked-by as we have changed the initialization
of perms. This needs to be reviewed.

Comment why we initialize perms with GV2M_READ by default. This is
because the current implementation assumes a GVA to IPA translation
with EL1 privileges. Since valid mappings in the first stage address
translation table are readable by default for EL1, we initialize
perms with GV2M_READ and extend the permissions according to the
particular page table walk.

v7: Add Acked-by Julien Grall.
---
 xen/arch/arm/Makefile|  1 +
 xen/arch/arm/guest_walk.c| 99 
 xen/include/asm-arm/guest_walk.h | 19 
 3 files changed, 119 insertions(+)
 create mode 100644 xen/arch/arm/guest_walk.c
 create mode 100644 xen/include/asm-arm/guest_walk.h

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 49e1fb2f84..282d2c2949 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_HAS_GICV3) += gic-v3.o
 obj-$(CONFIG_HAS_ITS) += gic-v3-its.o
 obj-$(CONFIG_HAS_ITS) += gic-v3-lpi.o
 obj-y += guestcopy.o
+obj-y += guest_walk.o
 obj-y += hvm.o
 obj-y += io.o
 obj-y += irq.o
diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
new file mode 100644
index 00..78badc2949
--- /dev/null
+++ b/xen/arch/arm/guest_walk.c
@@ -0,0 +1,99 @@
+/*
+ * Guest page table walk
+ * Copyright (c) 2017 Sergej Proskurin 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+
+/*
+ * The function guest_walk_sd translates a given GVA into an IPA using the
+ * short-descriptor translation table format in software. This function assumes
+ * that the domain is running on the currently active vCPU. To walk the guest's
+ * page table on a different vCPU, the following registers would need to be
+ * loaded: TCR_EL1, TTBR0_EL1, TTBR1_EL1, and SCTLR_EL1.
+ */
+static int guest_walk_sd(const struct vcpu *v,
+ vaddr_t gva, paddr_t *ipa,
+ unsigned int *perms)
+{
+/* Not implemented yet. */
+return -EFAULT;
+}
+
+/*
+ * The function guest_walk_ld translates a given GVA into an IPA using the
+ * long-descriptor translation 

[Xen-devel] [PATCH v9 06/13] arm/mem_access: Introduce BIT_ULL bit operation

2017-08-16 Thread Sergej Proskurin
We introduce the BIT_ULL macro, which uses unsigned long long values,
to enable setting bits of 64-bit registers on AArch32. In addition,
this commit adds a define holding the register width of 64 bit
double-word registers. This define simplifies using the associated
constants in the following commits.
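
A small worked example of the difference (standalone, not from the
patch): on AArch32, BIT(40) would shift a 32-bit unsigned long past its
width, which is undefined behaviour, while BIT_ULL(40) produces the
intended 64-bit value:

#include <stdio.h>

#define BIT_ULL(nr) (1ULL << (nr))

int main(void)
{
    unsigned long long mask = BIT_ULL(40) - 1;    /* bits [39:0] */

    printf("0x%llx\n", mask);                     /* 0xffffffffff */
    return 0;
}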

Signed-off-by: Sergej Proskurin 
Reviewed-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v4: We reused the previous commit with the msg "arm/mem_access: Add
defines holding the width of 32/64bit regs" from v3, as we can reuse
the already existing define BITS_PER_WORD.

v5: Introduce a new macro BIT_ULL instead of changing the type of the
macro BIT.

Remove the define BITS_PER_DOUBLE_WORD.

v6: Add Julien Grall's Reviewed-by.
---
 xen/include/asm-arm/bitops.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/include/asm-arm/bitops.h b/xen/include/asm-arm/bitops.h
index bda889841b..1cbfb9edb2 100644
--- a/xen/include/asm-arm/bitops.h
+++ b/xen/include/asm-arm/bitops.h
@@ -24,6 +24,7 @@
 #define BIT(nr) (1UL << (nr))
 #define BIT_MASK(nr)(1UL << ((nr) % BITS_PER_WORD))
 #define BIT_WORD(nr)((nr) / BITS_PER_WORD)
+#define BIT_ULL(nr) (1ULL << (nr))
 #define BITS_PER_BYTE   8
 
 #define ADDR (*(volatile int *) addr)
-- 
2.13.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 00/13] arm/mem_access: Walk guest page tables in SW

2017-08-16 Thread Sergej Proskurin
The function p2m_mem_access_check_and_get_page is called from the
function get_page_from_gva if mem_access is active and the
hardware-aided translation of the given guest virtual address (gva) into
machine address fails. That is, if the stage-2 translation tables
constrain access to the guest's page tables, hardware-assisted
translation will fail. The idea of the function
p2m_mem_access_check_and_get_page is thus to translate the given gva and
check the requested access rights in software. However, as the current
implementation of p2m_mem_access_check_and_get_page makes use of the
hardware-aided gva to ipa translation, the translation might also fail
for the reasons stated above; this will become equally relevant for the
altp2m implementation on ARM. As such, we provide a software guest
translation table walk to address the above-mentioned issue.

The current version of the implementation supports translation of both
the short-descriptor and the long-descriptor translation table formats
on ARMv7 and ARMv8 (AArch32/AArch64).

This revised version incorporates the comments of the previous patch
series, which mainly comprise minor cosmetic fixes. All changes have
been discussed with the associated maintainers and accordingly stated in
the individual patches.

The following patch series can be found on Github[0].

Cheers,
~Sergej

[0] https://github.com/sergej-proskurin/xen (branch arm-gpt-walk-v9)

Sergej Proskurin (13):
  arm/mem_access: Add and cleanup (TCR_|TTBCR_)* defines
  arm/mem_access: Add defines supporting PTs with varying page sizes
  arm/lpae: Introduce lpae_is_page helper
  arm/mem_access: Add short-descriptor pte typedefs and macros
  arm/mem_access: Introduce GV2M_EXEC permission
  arm/mem_access: Introduce BIT_ULL bit operation
  arm/mem_access: Introduce GENMASK_ULL bit operation
  arm/guest_access: Move vgic_access_guest_memory to guest_access.h
  arm/guest_access: Rename vgic_access_guest_memory
  arm/mem_access: Add software guest-page-table walk
  arm/mem_access: Add long-descriptor based gpt
  arm/mem_access: Add short-descriptor based gpt
  arm/mem_access: Walk the guest's pt in software

 xen/arch/arm/Makefile  |   1 +
 xen/arch/arm/guest_walk.c  | 636 +
 xen/arch/arm/guestcopy.c   |  50 +++
 xen/arch/arm/mem_access.c  |  31 +-
 xen/arch/arm/vgic-v3-its.c |  37 +--
 xen/arch/arm/vgic.c|  49 ---
 xen/include/asm-arm/bitops.h   |   1 +
 xen/include/asm-arm/config.h   |   2 +
 xen/include/asm-arm/guest_access.h |   4 +
 xen/include/asm-arm/guest_walk.h   |  19 ++
 xen/include/asm-arm/lpae.h |  66 
 xen/include/asm-arm/p2m.h  |   8 +-
 xen/include/asm-arm/page.h |   1 +
 xen/include/asm-arm/processor.h|  69 +++-
 xen/include/asm-arm/short-desc.h   | 130 
 xen/include/asm-arm/vgic.h |   3 -
 xen/include/asm-x86/config.h   |   2 +
 xen/include/xen/bitops.h   |   3 +
 18 files changed, 1036 insertions(+), 76 deletions(-)
 create mode 100644 xen/arch/arm/guest_walk.c
 create mode 100644 xen/include/asm-arm/guest_walk.h
 create mode 100644 xen/include/asm-arm/short-desc.h

-- 
2.13.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 05/13] arm/mem_access: Introduce GV2M_EXEC permission

2017-08-16 Thread Sergej Proskurin
We extend the current implementation by an additional permission,
GV2M_EXEC, which will be used to describe execute permissions of PTEs
as part of our guest translation table walk implementation.
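
A hypothetical caller-side sketch (the helper is invented; flag and
perms follow the guest_walk_tables interface from this series) of how
the new flag can be checked:

#include <xen/errno.h>
#include <asm/page.h>

/* Reject an instruction fetch from a non-executable mapping. */
static int check_exec(unsigned long flag, unsigned int perms)
{
    if ( (flag & GV2M_EXEC) && !(perms & GV2M_EXEC) )
        return -EFAULT;

    return 0;
}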

Signed-off-by: Sergej Proskurin 
Acked-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
 xen/include/asm-arm/page.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index cef2f28914..b8d641bfaf 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -90,6 +90,7 @@
 /* Flags for get_page_from_gva, gvirt_to_maddr etc */
 #define GV2M_READ  (0u<<0)
 #define GV2M_WRITE (1u<<0)
+#define GV2M_EXEC  (1u<<1)
 
 #ifndef __ASSEMBLY__
 
-- 
2.13.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v9 12/13] arm/mem_access: Add short-descriptor based gpt

2017-08-16 Thread Sergej Proskurin
This commit adds functionality to walk the guest's page tables using the
short-descriptor translation table format for both ARMv7 and ARMv8. The
implementation is based on ARM DDI 0487B-a J1-6002 and ARM DDI 0406C-b
B3-1506.
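
A standalone worked example (not from the patch) of the TTBR selection
rule that the walk below implements: with TTBCR.N = n > 0, GVAs whose
top n bits are zero translate via TTBR0, and all others via TTBR1:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    unsigned int n = 2;                   /* TTBCR.N */
    uint32_t mask = ~0u << (32 - n);      /* here: bits [31:30] */
    uint32_t gva = 0x90000000;

    printf("%s\n", (n == 0 || !(gva & mask)) ? "TTBR0" : "TTBR1");
    return 0;
}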

Signed-off-by: Sergej Proskurin 
Acked-by: Julien Grall 
---
Cc: Stefano Stabellini 
Cc: Julien Grall 
---
v3: Move the implementation to ./xen/arch/arm/guest_copy.c.

Use defines instead of hardcoded values.

Cosmetic fixes and added more comments.

v4: Adjusted the names of short-descriptor data-types.

Adapt the function to the new parameter of type "struct vcpu *".

Cosmetic fixes.

v5: Make use of the function vgic_access_guest_memory to read page table
entries in guest memory. At the same time, eliminate the offsets
array, as there is no need for an array. Instead, we apply the
associated masks to compute the GVA offsets directly in the code.

Use GENMASK to compute complex masks to ease code readability.

Use the type uint32_t for the TTBR register.

Make use of L2DESC_{SMALL|LARGE}_PAGE_SHIFT instead of
PAGE_SHIFT_{4K|64K} macros.

Remove {L1|L2}DESC_* defines from this commit.

Add comments and cosmetic fixes.

v6: Remove the variable level from the function guest_walk_sd as it is a
left-over from previous commits and is not used anymore.

Remove the erroneously introduced change that applied the mask to the
gva using the %-operator in the L1DESC_PAGE_TABLE case. Instead, use
the &-operator, as should have been done in the first place.

Make use of renamed function access_guest_memory_by_ipa instead of
vgic_access_guest_memory.

v7: Added Acked-by Julien Grall.

v8: We cast pte.*.base to paddr_t to cope with C type promotion of
types smaller than int. Otherwise pte.*.base would be cast to
int and subsequently sign-extended, thus leading to a wrong value.
---
 xen/arch/arm/guest_walk.c | 147 +-
 1 file changed, 145 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/guest_walk.c b/xen/arch/arm/guest_walk.c
index d0d45ad659..c38bedcf65 100644
--- a/xen/arch/arm/guest_walk.c
+++ b/xen/arch/arm/guest_walk.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * The function guest_walk_sd translates a given GVA into an IPA using the
@@ -31,8 +32,150 @@ static int guest_walk_sd(const struct vcpu *v,
  vaddr_t gva, paddr_t *ipa,
  unsigned int *perms)
 {
-/* Not implemented yet. */
-return -EFAULT;
+int ret;
+bool disabled = true;
+uint32_t ttbr;
+paddr_t mask, paddr;
+short_desc_t pte;
+register_t ttbcr = READ_SYSREG(TCR_EL1);
+unsigned int n = ttbcr & TTBCR_N_MASK;
+struct domain *d = v->domain;
+
+mask = GENMASK_ULL(31, (32 - n));
+
+if ( n == 0 || !(gva & mask) )
+{
+/*
+ * Use TTBR0 for GVA to IPA translation.
+ *
+ * Note that on AArch32, the TTBR0_EL1 register is 32-bit wide.
+ * Nevertheless, we have to use the READ_SYSREG64 macro, as it is
+ * required for reading TTBR0_EL1.
+ */
+ttbr = READ_SYSREG64(TTBR0_EL1);
+
+/* If TTBCR.PD0 is set, translations using TTBR0 are disabled. */
+disabled = ttbcr & TTBCR_PD0;
+}
+else
+{
+/*
+ * Use TTBR1 for GVA to IPA translation.
+ *
+ * Note that on AArch32, the TTBR1_EL1 register is 32-bit wide.
+ * Nevertheless, we have to use the READ_SYSREG64 macro, as it is
+ * required for reading TTBR1_EL1.
+ */
+ttbr = READ_SYSREG64(TTBR1_EL1);
+
+/* If TTBCR.PD1 is set, translations using TTBR1 are disabled. */
+disabled = ttbcr & TTBCR_PD1;
+
+/*
+ * TTBR1 translation always works like n==0 TTBR0 translation (ARM DDI
+ * 0487B.a J1-6003).
+ */
+n = 0;
+}
+
+if ( disabled )
+return -EFAULT;
+
+/*
+ * The address of the L1 descriptor for the initial lookup has the
+ * following format: [ttbr<31:14-n>:gva<31-n:20>:00] (ARM DDI 0487B.a
+ * J1-6003). Note that the following GPA computation already considers that
+ * the first level address translation might comprise up to four
+ * consecutive pages and does not need to be page-aligned if n > 2.
+ */
+mask = GENMASK(31, (14 - n));
+paddr = (ttbr & mask);
+
+mask = GENMASK((31 - n), 20);
+paddr |= (gva & mask) >> 18;
+
+/* Access the guest's memory to read only one PTE. */
+ret = access_guest_memory_by_ipa(d, paddr, &pte, sizeof(short_desc_t), false);
+if ( ret )
+return -EINVAL;
+
+switch ( pte.walk.dt )
+{
+case L1DESC_INVALID:
+return -EFAULT;
+
+case L1DESC_PAGE_TABLE:
+/*
+ * The address of the L2 descriptor has the following format:
+ * 
