Re: [Xen-devel] [PATCH 3/3] x86/xen: use guest_late_init to detect Xen PVH guest

2017-11-08 Thread Paolo Bonzini
On 08/11/2017 13:24, Juergen Gross wrote:
>> My understanding of Xen is very rusty at this point, but I think a
>> "completely" legacy-free HVM domain will still have a PCI bus and the
>> Xen platform device on that bus.
>>
>> A PVH domain just knows how to access the Xen PV features.
>
> A HVM domain does so, too. Today maybe only partially, but e.g. event
> channels work in a HVM domain even without the Xen platform device.
> Grant tables can be made working without the platform device, too,
> and I'm already preparing a patch to do exactly that.

What about assigned PCI devices?  I think they are not PV pcifront for
HVM.  So the main difference in the end is the PCI bus.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] x86/xen: use guest_late_init to detect Xen PVH guest

2017-11-08 Thread Paolo Bonzini
On 08/11/2017 12:55, Juergen Gross wrote:
> On 08/11/17 12:18, Jan Beulich wrote:
> On 08.11.17 at 10:07,  wrote:
>>> In case we are booted via the default boot entry by a generic loader
>>> like grub or OVMF it is necessary to distinguish between a HVM guest
>>> with a device model supporting legacy devices and a PVH guest without
>>> device model.
>>>
>>> PVH guests will always have x86_platform.legacy.no_vga set and
>>> x86_platform.legacy.rtc cleared, while both won't be true for HVM
>>> guests.
>>>
>>> Test for both conditions in the guest_late_init hook and set xen_pvh
>>> to true if they are met.
>>
>> This sounds pretty fragile to me: I can't see a reason why a proper
>> HVM guest couldn't come without VGA and RTC. That's not possible
>> today, agreed, but certainly an option down the road if virtualization
>> follows bare metal's road towards being legacy free.
> 
> From guest's perspective: what is the difference between a legacy free
> HVM domain and PVH? In the end the need for differentiating is to avoid
> access to legacy features in PVH as those would require a device model.

My understanding of Xen is very rusty at this point, but I think a
"completely" legacy-free HVM domain will still have a PCI bus and the
Xen platform device on that bus.

A PVH domain just knows how to access the Xen PV features.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/3] x86: add guest_late_init hook to hypervisor_x86 structure

2017-11-08 Thread Paolo Bonzini
On 08/11/2017 10:07, Juergen Gross wrote:
> Add a new guest_late_init hook to the hypervisor_x86 structure. It
> will replace the current kvm_guest_init() call which is changed to
> make use of the new hook.
> 
> Signed-off-by: Juergen Gross <jgr...@suse.com>

The trivial KVM changes are of course

Acked-by: Paolo Bonzini <pbonz...@redhat.com>

Paolo

> ---
>  arch/x86/include/asm/hypervisor.h | 11 +++
>  arch/x86/include/asm/kvm_para.h   |  2 --
>  arch/x86/kernel/kvm.c |  3 ++-
>  arch/x86/kernel/setup.c   |  2 +-
>  4 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/hypervisor.h 
> b/arch/x86/include/asm/hypervisor.h
> index 0ead9dbb9130..37320687b8cb 100644
> --- a/arch/x86/include/asm/hypervisor.h
> +++ b/arch/x86/include/asm/hypervisor.h
> @@ -38,6 +38,9 @@ struct hypervisor_x86 {
>   /* Platform setup (run once per boot) */
>   void(*init_platform)(void);
>  
> + /* Guest late init */
> + void(*guest_late_init)(void);
> +
>   /* X2APIC detection (run once per boot) */
>   bool(*x2apic_available)(void);
>  
> @@ -66,9 +69,17 @@ static inline void hypervisor_init_mem_mapping(void)
>   if (x86_hyper && x86_hyper->init_mem_mapping)
>   x86_hyper->init_mem_mapping();
>  }
> +
> +static inline void hypervisor_guest_late_init(void)
> +{
> + if (x86_hyper && x86_hyper->guest_late_init)
> + x86_hyper->guest_late_init();
> +}
> +
>  #else
>  static inline void init_hypervisor_platform(void) { }
>  static inline bool hypervisor_x2apic_available(void) { return false; }
>  static inline void hypervisor_init_mem_mapping(void) { }
> +static inline void hypervisor_guest_late_init(void) { }
>  #endif /* CONFIG_HYPERVISOR_GUEST */
>  #endif /* _ASM_X86_HYPERVISOR_H */
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index c373e44049b1..7b407dda2bd7 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -88,7 +88,6 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned 
> long p1,
>  #ifdef CONFIG_KVM_GUEST
>  bool kvm_para_available(void);
>  unsigned int kvm_arch_para_features(void);
> -void __init kvm_guest_init(void);
>  void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
>  void kvm_async_pf_task_wake(u32 token);
>  u32 kvm_read_and_reset_pf_reason(void);
> @@ -103,7 +102,6 @@ static inline void kvm_spinlock_init(void)
>  #endif /* CONFIG_PARAVIRT_SPINLOCKS */
>  
>  #else /* CONFIG_KVM_GUEST */
> -#define kvm_guest_init() do {} while (0)
>  #define kvm_async_pf_task_wait(T, I) do {} while(0)
>  #define kvm_async_pf_task_wake(T) do {} while(0)
>  
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 8bb9594d0761..d331b5060aa9 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -465,7 +465,7 @@ static void __init kvm_apf_trap_init(void)
>   update_intr_gate(X86_TRAP_PF, async_page_fault);
>  }
>  
> -void __init kvm_guest_init(void)
> +static void __init kvm_guest_init(void)
>  {
>   int i;
>  
> @@ -548,6 +548,7 @@ const struct hypervisor_x86 x86_hyper_kvm __refconst = {
>   .name   = "KVM",
>   .detect = kvm_detect,
>   .x2apic_available   = kvm_para_available,
> + .guest_late_init= kvm_guest_init,
>  };
>  EXPORT_SYMBOL_GPL(x86_hyper_kvm);
>  
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 0957dd73d127..578569481d87 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1294,7 +1294,7 @@ void __init setup_arch(char **cmdline_p)
>  
>   io_apic_init_mappings();
>  
> - kvm_guest_init();
> + hypervisor_guest_late_init();
>  
>   e820__reserve_resources();
>   e820__register_nosave_regions(max_low_pfn);
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v7 2/5] x86/pvclock: add setter for pvclock_pvti_cpu0_va

2017-11-06 Thread Paolo Bonzini
On 19/10/2017 15:39, Joao Martins wrote:
> Right now there is only a pvclock_pvti_cpu0_va() which is defined
> on kvmclock since:
> 
> commit dac16fba6fc5
> ("x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap")
> 
> The only user of this interface so far is kvm. This commit adds a
> setter function for the pvti page and moves pvclock_pvti_cpu0_va
> to pvclock, which is a more generic place to have it; and would
> allow other PV clocksources to use it, such as Xen.
> 
> Signed-off-by: Joao Martins <joao.m.mart...@oracle.com>
> Acked-by: Andy Lutomirski <l...@kernel.org>

Acked-by: Paolo Bonzini <pbonz...@redhat.com>

IOW, the Xen folks are free to pick up the whole series. :)

Paolo

> ---
> Changes since v1:
>  * Rebased: the only conflict was that I had move the export
>  pvclock_pvti_cpu0_va() symbol as it is used by kvm PTP driver.
>  * Do not initialize pvti_cpu0_va to NULL (checkpatch error)
>  ( Comments from Andy Lutomirski )
>  * Removed asm/pvclock.h 'pvclock_set_pvti_cpu0_va' definition
>  for non !PARAVIRT_CLOCK to better track screwed Kconfig stuff.
>  * Add his Acked-by (provided the previous adjustment was made)
> 
> Changes since RFC:
>  (Comments from Andy Lutomirski)
>  * Add __init to pvclock_set_pvti_cpu0_va
>  * Add WARN_ON(vclock_was_used(VCLOCK_PVCLOCK)) to
>  pvclock_set_pvti_cpu0_va
> ---
>  arch/x86/include/asm/pvclock.h | 19 ++-
>  arch/x86/kernel/kvmclock.c |  7 +--
>  arch/x86/kernel/pvclock.c  | 14 ++
>  3 files changed, 25 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
> index 448cfe1b48cf..6f228f90cdd7 100644
> --- a/arch/x86/include/asm/pvclock.h
> +++ b/arch/x86/include/asm/pvclock.h
> @@ -4,15 +4,6 @@
>  #include 
>  #include 
>  
> -#ifdef CONFIG_KVM_GUEST
> -extern struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
> -#else
> -static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
> -{
> - return NULL;
> -}
> -#endif
> -
>  /* some helper functions for xen and kvm pv clock sources */
>  u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
>  u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
> @@ -101,4 +92,14 @@ struct pvclock_vsyscall_time_info {
>  
>  #define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info)
>  
> +#ifdef CONFIG_PARAVIRT_CLOCK
> +void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti);
> +struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void);
> +#else
> +static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
> +{
> + return NULL;
> +}
> +#endif
> +
>  #endif /* _ASM_X86_PVCLOCK_H */
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index d88967659098..538738047ff5 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -47,12 +47,6 @@ early_param("no-kvmclock", parse_no_kvmclock);
>  static struct pvclock_vsyscall_time_info *hv_clock;
>  static struct pvclock_wall_clock wall_clock;
>  
> -struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
> -{
> - return hv_clock;
> -}
> -EXPORT_SYMBOL_GPL(pvclock_pvti_cpu0_va);
> -
>  /*
>   * The wallclock is the time of day when we booted. Since then, some time may
>   * have elapsed since the hypervisor wrote the data. So we try to account for
> @@ -334,6 +328,7 @@ int __init kvm_setup_vsyscall_timeinfo(void)
>   return 1;
>   }
>  
> + pvclock_set_pvti_cpu0_va(hv_clock);
>   put_cpu();
>  
>   kvm_clock.archdata.vclock_mode = VCLOCK_PVCLOCK;
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 5c3f6d6a5078..cb7d6d9c9c2d 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -25,8 +25,10 @@
>  
>  #include 
>  #include 
> +#include 
>  
>  static u8 valid_flags __read_mostly = 0;
> +static struct pvclock_vsyscall_time_info *pvti_cpu0_va __read_mostly;
>  
>  void pvclock_set_flags(u8 flags)
>  {
> @@ -144,3 +146,15 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
> *wall_clock,
>  
>   set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
>  }
> +
> +void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti)
> +{
> + WARN_ON(vclock_was_used(VCLOCK_PVCLOCK));
> + pvti_cpu0_va = pvti;
> +}
> +
> +struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
> +{
> + return pvti_cpu0_va;
> +}
> +EXPORT_SYMBOL_GPL(pvclock_pvti_cpu0_va);
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] pci-assign: Remove

2017-10-20 Thread Paolo Bonzini
Legacy PCI device assignment has been removed from Linux in 4.12,
and had been deprecated 2 years ago there.  We can remove it from
QEMU as well.

The ROM loading code was shared with Xen PCI passthrough, so move
it to hw/xen.

Cc: Stefano Stabellini <sstabell...@kernel.org>
Cc: Anthony Perard <anthony.per...@citrix.com>
Cc: xen-de...@lists.xenproject.org
Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
Xen parts only compile-tested.

 docs/qdev-device-use.txt   |   12 +-
 hw/i386/Makefile.objs  |1 -
 hw/i386/kvm/Makefile.objs  |2 +-
 hw/i386/kvm/pci-assign.c   | 1883 
 hw/xen/Makefile.objs   |1 +
 .../xen_pt_load_rom.c} |4 +-
 include/hw/pci/pci-assign.h|   27 -
 qdev-monitor.c |1 -
 scripts/device-crash-test  |2 -
 9 files changed, 6 insertions(+), 1927 deletions(-)
 delete mode 100644 hw/i386/kvm/pci-assign.c
 rename hw/{i386/pci-assign-load-rom.c => xen/xen_pt_load_rom.c} (96%)
 delete mode 100644 include/hw/pci/pci-assign.h

diff --git a/docs/qdev-device-use.txt b/docs/qdev-device-use.txt
index 1f297b5e9c..8f188d1d0b 100644
--- a/docs/qdev-device-use.txt
+++ b/docs/qdev-device-use.txt
@@ -366,17 +366,9 @@ bus=PCI-BUS,addr=DEVFN to control the PCI device address, 
as usual.
 === Host Device Assignment ===
 
 QEMU supports assigning host PCI devices (qemu-kvm only at this time)
-and host USB devices.
+and host USB devices.  PCI devices can only be assigned with -device:
 
-The old way to assign a host PCI device is
-
--pcidevice host=ADDR,dma=none,id=ID
-
-The new way is
-
--device pci-assign,host=ADDR,iommu=IOMMU,id=ID
-
-The old dma=none becomes iommu=off with -device.
+-device vfio-pci,host=ADDR,id=ID
 
 The old way to assign a host USB device is
 
diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 909ead6a77..2e5e1299ad 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -8,4 +8,3 @@ obj-$(CONFIG_XEN) += ../xenpv/ xen/
 
 obj-y += kvmvapic.o
 obj-y += acpi-build.o
-obj-y += pci-assign-load-rom.o
diff --git a/hw/i386/kvm/Makefile.objs b/hw/i386/kvm/Makefile.objs
index d8bce209bf..4224ed900e 100644
--- a/hw/i386/kvm/Makefile.objs
+++ b/hw/i386/kvm/Makefile.objs
@@ -1 +1 @@
-obj-y += clock.o apic.o i8259.o ioapic.o i8254.o pci-assign.o
+obj-y += clock.o apic.o i8259.o ioapic.o i8254.o
diff --git a/hw/xen/Makefile.objs b/hw/xen/Makefile.objs
index 64a70bc6cb..9ea5c73423 100644
--- a/hw/xen/Makefile.objs
+++ b/hw/xen/Makefile.objs
@@ -3,3 +3,4 @@ common-obj-$(CONFIG_XEN) += xen_backend.o xen_devconfig.o 
xen_pvdev.o xen-common
 
 obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
 obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o 
xen_pt_graphics.o xen_pt_msi.o
+obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt_load_rom.o
diff --git a/hw/i386/pci-assign-load-rom.c b/hw/xen/xen_pt_load_rom.c
similarity index 96%
rename from hw/i386/pci-assign-load-rom.c
rename to hw/xen/xen_pt_load_rom.c
index 43429b66be..2bc3b6c092 100644
--- a/hw/i386/pci-assign-load-rom.c
+++ b/hw/xen/xen_pt_load_rom.c
@@ -12,7 +12,7 @@
 #include "qemu/range.h"
 #include "sysemu/sysemu.h"
 #include "hw/pci/pci.h"
-#include "hw/pci/pci-assign.h"
+#include "xen_pt.h"
 
 /*
  * Scan the assigned devices for the devices that have an option ROM, and then
@@ -80,7 +80,7 @@ close_rom:
 fseek(fp, 0, SEEK_SET);
 val = 0;
 if (!fwrite(, 1, 1, fp)) {
-DEBUG("%s\n", "Failed to disable pci-sysfs rom file");
+XEN_PT_WARN("%s\n", "Failed to disable pci-sysfs rom file");
 }
 fclose(fp);
 
diff --git a/include/hw/pci/pci-assign.h b/include/hw/pci/pci-assign.h
deleted file mode 100644
index 55f42c56fa..00
--- a/include/hw/pci/pci-assign.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/*
- * This work is licensed under the terms of the GNU GPL, version 2.  See
- * the COPYING file in the top-level directory.
- *
- * Just split from hw/i386/kvm/pci-assign.c.
- */
-#ifndef PCI_ASSIGN_H
-#define PCI_ASSIGN_H
-
-#include "hw/pci/pci.h"
-
-//#define DEVICE_ASSIGNMENT_DEBUG
-
-#ifdef DEVICE_ASSIGNMENT_DEBUG
-#define DEBUG(fmt, ...)   \
-do {  \
-fprintf(stderr, "%s: " fmt, __func__ , __VA_ARGS__);  \
-} while (0)
-#else
-#define DEBUG(fmt, ...)
-#endif
-
-void *pci_assign_dev_load_option_rom(PCIDevice *dev, struct Object *owner,
- int *size, unsigned int domain,
- unsigned int bus, unsigned int slot,
- unsigned int function);
-#endif 

Re: [Xen-devel] [PATCH v2 2/2] x86: convert x86_platform_ops to timespec64

2017-10-19 Thread Paolo Bonzini
On 19/10/2017 12:57, Arnd Bergmann wrote:
> The x86 platform operations are fairly isolated, so we can
> change them from using timespec to timespec64. I checked that
> All the users and callers are safe, and there is only one
> critical function that is broken beyond 2106:
> 
> pvclock_read_wallclock() uses a 32-bit number of seconds since
> the epoch to communicate the boot time between host and guest
> in a virtual environment. This will work until 2106, but we
> should ideally find a replacement anyway. I've added a comment
> about it there.
> 
> Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Signed-off-by: Arnd Bergmann <a...@arndb.de>
> ---
> v2 changes:
> - move comment block (Boris)
> - remove unnecessary type cast (Boris)
> - fix format string (0day bot)
> - fix include order (0day bot)
> ---
>  arch/x86/include/asm/intel_mid_vrtc.h|  4 ++--
>  arch/x86/include/asm/mc146818rtc.h   |  4 ++--
>  arch/x86/include/asm/pvclock.h   |  2 +-
>  arch/x86/include/asm/x86_init.h  |  7 +++
>  arch/x86/kernel/kvmclock.c   |  4 ++--
>  arch/x86/kernel/pvclock.c| 15 +++
>  arch/x86/kernel/rtc.c| 16 
>  arch/x86/platform/intel-mid/intel_mid_vrtc.c | 12 ++--
>  arch/x86/xen/time.c  | 10 +-
>  9 files changed, 40 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index d88967659098..01c76e8cd4be 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -58,7 +58,7 @@ EXPORT_SYMBOL_GPL(pvclock_pvti_cpu0_va);
>   * have elapsed since the hypervisor wrote the data. So we try to account for
>   * that with system time
>   */
> -static void kvm_get_wallclock(struct timespec *now)
> +static void kvm_get_wallclock(struct timespec64 *now)
>  {
>   struct pvclock_vcpu_time_info *vcpu_time;
>   int low, high;
> @@ -77,7 +77,7 @@ static void kvm_get_wallclock(struct timespec *now)
>   put_cpu();
>  }
>  
> -static int kvm_set_wallclock(const struct timespec *now)
> +static int kvm_set_wallclock(const struct timespec64 *now)
>  {
>   return -1;
>  }
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 5c3f6d6a5078..013bef851664 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -121,26 +121,33 @@ u64 pvclock_clocksource_read(struct 
> pvclock_vcpu_time_info *src)
>  
>  void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
>   struct pvclock_vcpu_time_info *vcpu_time,
> - struct timespec *ts)
> + struct timespec64 *ts)
>  {
>   u32 version;
>   u64 delta;
> - struct timespec now;
> + struct timespec64 now;
>  
>   /* get wallclock at system boot */
>   do {
>   version = wall_clock->version;
>   rmb();  /* fetch version before time */
> + /*
> +  * Note: wall_clock->sec is a u32 value, so it can
> +  * only store dates between 1970 and 2106. To allow
> +  * times beyond that, we need to create a new hypercall
> +  * interface with an extended pvclock_wall_clock structure
> +  * like ARM has.
> +  */
>   now.tv_sec  = wall_clock->sec;
>   now.tv_nsec = wall_clock->nsec;
>   rmb();  /* fetch time before checking version */
>   } while ((wall_clock->version & 1) || (version != wall_clock->version));
>  
>   delta = pvclock_clocksource_read(vcpu_time);/* time since system 
> boot */
> - delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
> + delta += now.tv_sec * NSEC_PER_SEC + now.tv_nsec;
>  
>   now.tv_nsec = do_div(delta, NSEC_PER_SEC);
>   now.tv_sec = delta;
>  
> - set_normalized_timespec(ts, now.tv_sec, now.tv_nsec);
> + set_normalized_timespec64(ts, now.tv_sec, now.tv_nsec);
>  }
For kvmclock.c and pvclock.c,

Acked-by: Paolo Bonzini <pbonz...@redhat.com>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-18 Thread Paolo Bonzini
On 18/10/2017 10:32, Roger Pau Monné wrote:
>> I'll have a try to check how much the differences would affect. If it
>> would not take too much work, I'd like to adapt Xen NVDIMM enabling
>> patches to the all QEMU built ACPI. Otherwise, I'll fall back to Paolo
>> and MST's suggestions.
> I don't agree with the end goal of fully switching to the QEMU build
> ACPI tables. First of all, the only entity that has all the
> information about the guest it's the toolstack, and so it should be
> the one in control of the ACPI tables.
> 
> Also, Xen guests can use several device models concurrently (via the
> ioreq server interface), and each should be able to contribute to the
> information presented in the ACPI tables. Intel is also working on
> adding IOMMU emulation to the Xen hypervisor, in which case the vIOMMU
> ACPI tables should be created by the toolstack and not QEMU. And
> finally keep in mind that there are Xen guests (PVH) that use ACPI
> tables but not QEMU.

I agree with this in fact; QEMU has a view of _most_ of the emulated
hardware, but not all.

However, I disagree that the toolstack should be alone in controlling
the ACPI tables; rather, each involved part of the stack should be
providing its own part of the tables.  For example, QEMU (in addition to
NVDIMM information) should be the one providing an SSDT for southbridge
devices (floppy, COMx, LPTx, etc.).

The Xen stack (or more likely, hvmloader itself) would provide all the
bits that are provided by the hypervisor (MADT for the IOAPIC, another
SSDT for the HPET and RTC, DMAR tables for IOMMU, and so on).  This
should also work just fine for PVH.  Of course backwards compatibility
is the enemy of simplification, but in the end things _should_ actually
be simpler and I think it's a good idea if a prerequisite for Xen
vNVDIMM is to move AML code for QEMU devices out of hvmloader.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-17 Thread Paolo Bonzini
On 14/10/2017 00:46, Stefano Stabellini wrote:
> On Fri, 13 Oct 2017, Jan Beulich wrote:
> On 13.10.17 at 13:13,  wrote:
>>> To Jan, Andrew, Stefano and Anthony,
>>>
>>> what do you think about allowing QEMU to build the entire guest ACPI
>>> and letting SeaBIOS to load it? ACPI builder code in hvmloader is
>>> still there and just bypassed in this case.
>> Well, if that can be made work in a non-quirky way and without
>> loss of functionality, I'd probably be fine. I do think, however,
>> that there's a reason this is being handled in hvmloader right now.
> And not to discourage you, just as a clarification, you'll also need to
> consider backward compatibility: unless the tables are identical, I
> imagine we'll have to keep using the old tables for already installed
> virtual machines.

I agree.  Some of them are already identical, some are not but the QEMU
version should be okay, and for yet more it's probably better to keep
the Xen-specific parts in hvmloader.

The good thing is that it's possible to proceed incrementally once you
have the hvmloader support for merging the QEMU and hvmloader RSDT or
XSDT (whatever you are using), starting with just NVDIMM and proceeding
later with whatever you see fit.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: convert x86_platform_ops to timespec64

2017-10-16 Thread Paolo Bonzini
On 16/10/2017 14:16, Arnd Bergmann wrote:
> On Mon, Oct 16, 2017 at 2:08 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>> On 16/10/2017 10:11, Arnd Bergmann wrote:
>>> Thanks!
>>>
>>> Since you've looked at it overall, do you have an opinion on the question
>>> how to fix the PV interface to deal with the pvclock_wall_clock overflow?
>>
>> It has to be done separately for each hypervisor.
>>
>> In KVM, for example, it is probably best to abandon
>> pvclock_read_wallclock altogether, and instead use the recently
>> introduced KVM_HC_CLOCK_PAIRING hypercall.  drivers/ptp/ptp_kvm.c is
>> already using it and it's y2106 safe.
> 
> Right, makes sense. I see that this interface is currently implemented
> only for 64-bit x86 in kvm_emulate_hypercall(). Could this be extended
> to x86-32 and the non-x86 architectures as well?

Yes, it could be implemented for x86-32 too.  The whole pvclock concept
however is specific to x86.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: convert x86_platform_ops to timespec64

2017-10-16 Thread Paolo Bonzini
On 16/10/2017 10:11, Arnd Bergmann wrote:
> Thanks!
> 
> Since you've looked at it overall, do you have an opinion on the question
> how to fix the PV interface to deal with the pvclock_wall_clock overflow?

It has to be done separately for each hypervisor.

In KVM, for example, it is probably best to abandon
pvclock_read_wallclock altogether, and instead use the recently
introduced KVM_HC_CLOCK_PAIRING hypercall.  drivers/ptp/ptp_kvm.c is
already using it and it's y2106 safe.

Paolo

> Should we add a new version now and deprecate the existing one, or
> do you think that y2106 is far enough out that we should just ignore the
> problem?


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [RFC QEMU PATCH v3 00/10] Implement vNVDIMM for Xen HVM guest

2017-10-12 Thread Paolo Bonzini
On 12/10/2017 14:45, Haozhong Zhang wrote:
> Basically, QEMU builds two ROMs for guest, /rom@etc/acpi/tables and
> /rom@etc/table-loader. The former is unstructured to guest, and
> contains all data of guest ACPI. The latter is a BIOSLinkerLoader
> organized as a set of commands, which direct the guest (e.g., SeaBIOS
> on KVM/QEMU) to relocate data in the former file, recalculate checksum
> of specified area, and fill guest address in specified ACPI field.
> 
> One part of my patches is to implement a mechanism to tell Xen which
> part of ACPI data is a table (NFIT), and which part defines a
> namespace device and what the device name is. I can add two new loader
> commands for them respectively.
> 
> Because they just provide information and SeaBIOS in non-xen
> environment ignores unrecognized commands, they will not break SeaBIOS
> in non-xen environment.
> 
> On QEMU side, most Xen-specific hacks in ACPI builder could be
> dropped, and replaced by adding the new loader commands (though they
> may be used only by Xen).
> 
> On Xen side, a fw_cfg driver and a BIOSLinkerLoader command executor
> are needed in, perhaps, hvmloader.

If Xen has to parse BIOSLinkerLoader, it can use the existing commands
to process a reduced set of ACPI tables.  In other words,
etc/acpi/tables would only include the NFIT, the SSDT with namespace
devices, and the XSDT.  etc/acpi/rsdp would include the RSDP table as usual.

hvmloader can then:

1) allocate some memory for where the XSDT will go

2) process the BIOSLinkerLoader like SeaBIOS would do

3) find the RSDP in low memory, since the loader script must have placed
it there.  If it cannot find it, allocate some low memory, fill it with
the RSDP header and revision, and and jump to step 6

4) If it found QEMU's RSDP, use it to find QEMU's XSDT

5) Copy ACPI table pointers from QEMU to hvmloader's RSDT and/or XSDT.

6) build hvmloader tables and link them into the RSDT and/or XSDT as usual.

7) overwrite the RSDP in low memory with a pointer to hvmloader's own
RSDT and/or XSDT, and updated the checksums

QEMU's XSDT remains there somewhere in memory, unused but harmless.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] KVM PV

2017-10-02 Thread Paolo Bonzini
On 02/10/2017 12:36, George Dunlap wrote:
>>> Although I'm not business man, I don't think the top cloud provider[s]
>>> would allow nested virtualization, however mature nested virtualization
>>> is. Even xen-pv is unable to be nested in the aws and azure.
>>
>> Check the contributors to KVM nested virtualization, you might be surprised.
>>
>> Nested Xen PV is not possible because the Xen hypervisor cannot run as a PV 
>> guest.>> It's a technical limitation.
> 
> Minor correction: Xen can't run on AWS as a PV guest, but it can run
> as an L1 hypervisor inside any "fully virtualized" VM (as both AWS and
> Azure provide), and provide PV L2 guests.

Yes, that's what I meant.

Thanks George!

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] KVM PV (was: Re: [PATCH v2 2/2] x86/lguest: remove lguest support)

2017-09-29 Thread Paolo Bonzini

- Lai Jiangshan <jiangshanlai+l...@gmail.com> ha scritto:
> On Sat, Sep 30, 2017 at 12:39 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> > On 29/09/2017 17:47, Lai Jiangshan wrote:
> >> Hello, all
> >>
> >> An interesting (at least to me) thinking came up to me when I found
> >> that the lguest was removed. But I don't have enough knowledge
> >> to find out the answer nor energy to implement it in some time.
> >>
> >> Is it possible to implement kvm-pv which allows kvm to run on
> >> the boxes without hardware virtualization support, so that
> >> qemu/kvm can be used on clouds such as aws, azure?
> >
> > No, please don't. :)  Even Xen is moving from PV to PVH (paravirtualized
> > hardware with event channels, grant tables and the like, but still using
> > hardware extensions for MMU).
> >
> > Rather, cloud providers should help getting nested virtualization ready
> > for production use.  At least for KVM it's not that far.
> >
> 
> Although I'm not business man, I don't think the top cloud provider[s]
> would allow nested virtualization, however mature nested virtualization
> is. Even xen-pv is unable to be nested in the aws and azure.

Check the contributors to KVM nested virtualization, you might be surprised.

Nested Xen PV is not possible because the Xen hypervisor cannot run as a PV 
guest. It's a technical limitation.

Paolo

> 
> Thanks,
> Lai
> 
> >


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] KVM PV (was: Re: [PATCH v2 2/2] x86/lguest: remove lguest support)

2017-09-29 Thread Paolo Bonzini
On 29/09/2017 17:47, Lai Jiangshan wrote:
> Hello, all
> 
> An interesting (at least to me) thinking came up to me when I found
> that the lguest was removed. But I don't have enough knowledge
> to find out the answer nor energy to implement it in some time.
> 
> Is it possible to implement kvm-pv which allows kvm to run on
> the boxes without hardware virtualization support, so that
> qemu/kvm can be used on clouds such as aws, azure?

No, please don't. :)  Even Xen is moving from PV to PVH (paravirtualized
hardware with event channels, grant tables and the like, but still using
hardware extensions for MMU).

Rather, cloud providers should help getting nested virtualization ready
for production use.  At least for KVM it's not that far.

Paolo

> Without hardware virtualization support, the host kvm-pv module and
> the guest linux kernel need to cooperate in some ways. And some kvm
> facilities can help. For instance, the existing shadow-paging, which
> was not introduced when lguest had been added to kernel, could be
> reused to help on mmu virtualization. For guest kernel/userspace
> separation in x86_64, the intel cpu's segment registers can help too.
> (or use a new set of page-table for the guest kernel on amd64).
> 
> The thought is quite shallow, but I hope this email brings some
> inspirations rather than annoyance. And I'm sorry if the later things
> would happen.
> 
> Thanks,
> Lai.
> 
> On Thu, Aug 17, 2017 at 1:31 AM, Juergen Gross  wrote:
>> Lguest seems to be rather unused these days. It has seen only patches
>> ensuring it still builds the last two years and its official state is
>> "Odd Fixes".
>>
>> Nuke it in order to be able to clean up the paravirt code.
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 3/3] MAINTAINERS: xen, kvm: track pvclock-abi.h changes

2017-09-28 Thread Paolo Bonzini
On 27/09/2017 18:06, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 27, 2017 at 02:46:23PM +0100, Joao Martins wrote:
>> This file defines an ABI shared between guest and hypervisor(s)
>> (KVM, Xen) and as such there should be an correspondent entry in
>> MAINTAINERS file. Notice that there's already a text notice at the
>> top of the header file, hence this commit simply enforces it more
>> explicitly and have both peers noticed when such changes happen.
>>
>> Signed-off-by: Joao Martins <joao.m.mart...@oracle.com>
>> Acked-by: Juergen Gross <jgr...@suse.com>
> 
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>

Acked-by: Paolo Bonzini <pbonz...@redhat.com>

>> ---
>> In the end, I choose the originally posted because this is so far the
>> only ABI shared between Xen/KVM. Therefore whenever we have more things
>> shared it would deserve its own place in MAINTAINERS file. If the
>> thinking is wrong, I can switch to the alternative with a
>> "PARAVIRT ABIS" section.
>>
>> Changes since v1:
>>  * Add Juergen Gross Acked-by.
>> ---
>>  MAINTAINERS | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 6671f375f7fc..a4834c3c377a 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -7603,6 +7603,7 @@ S: Supported
>>  F:  arch/x86/kvm/
>>  F:  arch/x86/include/uapi/asm/kvm*
>>  F:  arch/x86/include/asm/kvm*
>> +F:  arch/x86/include/asm/pvclock-abi.h
>>  F:  arch/x86/kernel/kvm.c
>>  F:  arch/x86/kernel/kvmclock.c
>>  
>> @@ -14718,6 +14719,7 @@ F:   arch/x86/xen/
>>  F:  drivers/*/xen-*front.c
>>  F:  drivers/xen/
>>  F:  arch/x86/include/asm/xen/
>> +F:  arch/x86/include/asm/pvclock-abi.h
>>  F:  include/xen/
>>  F:  include/uapi/xen/
>>  F:  Documentation/ABI/stable/sysfs-hypervisor-xen
>> -- 
>> 2.11.0
>>


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: Skip check apic_id_limit for Xen

2017-08-18 Thread Paolo Bonzini
On 18/08/2017 18:38, Eduardo Habkost wrote:
>>> I think you still need to do a check for vIOMMU being enabled.
>>
>> Yes, this will be done in the Xen tool stack and Qemu doesn't have such
>> knowledge. Operations of create, destroy Xen vIOMMU will be done in the
>> Xen tool stack.
>
> Shouldn't we make QEMU have knowledge of the vIOMMU device, then?
> Won't QEMU need to know about it eventually?

No, it actually makes sense since Xen hosts all system-wide interrupt
sources outside QEMU, unlike KVM which only hosts the (per-CPU) local APIC.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: Skip check apic_id_limit for Xen

2017-08-16 Thread Paolo Bonzini
On 16/08/2017 02:22, Lan Tianyu wrote:
> Xen vIOMMU device model will be in Xen hypervisor. Skip vIOMMU
> check for Xen here when vcpu number is more than 255.

I think you still need to do a check for vIOMMU being enabled.

Paolo

> Signed-off-by: Lan Tianyu 
> ---
>  hw/i386/pc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 5943539..fc17885 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1260,7 +1260,7 @@ void pc_machine_done(Notifier *notifier, void *data)
>  fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
>  }
>  
> -if (pcms->apic_id_limit > 255) {
> +if (pcms->apic_id_limit > 255 && !xen_enabled()) {
>  IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
>  
>  if (!iommu || !iommu->x86_iommu.intr_supported ||
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 0/3] Qemu: Add Xen vIOMMU interrupt remapping function support

2017-08-10 Thread Paolo Bonzini
On 09/08/2017 22:51, Lan Tianyu wrote:
> This patchset is to deal with MSI interrupt remapping request when guest
> updates MSI registers.
> 
> Chao Gao (3):
>   i386/msi: Correct mask of destination ID in MSI address
>   xen-pt: bind/unbind interrupt remapping format MSI
>   msi: Handle remappable format interrupt request
> 
>  configure |  4 +++-
>  hw/i386/xen/xen-hvm.c |  8 ++-
>  hw/pci/msi.c  |  5 +++--
>  hw/pci/msix.c |  4 +++-
>  hw/xen/xen_pt_msi.c   | 52 
> +++
>  include/hw/i386/apic-msidef.h |  3 ++-
>  include/hw/xen/xen.h  |  2 +-
>  include/hw/xen/xen_common.h   | 25 +
>  stubs/xen-hvm.c   |  2 +-
>  9 files changed, 83 insertions(+), 22 deletions(-)
> 

Non-Xen parts look good (though I cannot ack them).

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache

2017-07-25 Thread Paolo Bonzini

> > Thanks---however, after re-reading xen-mapcache.c, dma needs to be false
> > for unlocked mappings.
> 
> If there is a DMA operation already in progress, it means that we'll
> already have a locked mapping for it.

Yes, I only wanted to say that qemu_ram_ptr_length should pass dma=false
when called by address_space_*_continue (i.e. with locked=false).

Paolo

> When address_space_write_continue is called, which in turn would call
> qemu_map_ram_ptr, or qemu_ram_ptr_length(unlocked), if the start and
> size of the requested mapping matches the one of the previously created
> locked mapping, then a pointer to the locked mapping will be returned.
> 
> If they don't match, a new unlocked mapping will be created and a
> pointer to it will be returned. (Arguably the algorithm could be
> improved so that a new mapping is not created if the address and size
> are contained within the locked mapping. This is a missing optimization
> today.)
> 
> It doesn't matter if a new unlocked mapping is created, or if the locked
> mapping is returned, because the pointer returned by
> qemu_ram_ptr_length(unlocked) is only used to do the memcpy, and never
> again. So I don't think this is a problem.
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache

2017-07-25 Thread Paolo Bonzini


- Original Message -
> From: "Stefano Stabellini" <sstabell...@kernel.org>
> To: "Paolo Bonzini" <pbonz...@redhat.com>
> Cc: "Anthony PERARD" <anthony.per...@citrix.com>, "Stefano Stabellini" 
> <sstabell...@kernel.org>,
> xen-devel@lists.xen.org, qemu-de...@nongnu.org
> Sent: Tuesday, July 25, 2017 8:08:21 PM
> Subject: Re: QEMU commit 04bf2526ce breaks use of xen-mapcache
> 
> On Tue, 25 Jul 2017, Paolo Bonzini wrote:
> > > Hi,
> > > 
> > > Commits 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram)
> > > start using qemu_ram_ptr_length() instead of qemu_map_ram_ptr().
> > > That result in calling xen_map_cache() with lock=true, but this mapping
> > > is never invalidated.
> > > So QEMU use more and more RAM until it stop working for a reason or an
> > > other. (crash if host have little RAM or stop emulating but no crash)
> > > 
> > > I don't know if calling xen_invalidate_map_cache_entry() in
> > > address_space_read_continue() and address_space_write_continue() is the
> > > right answer.  Is there something better to do ?
> > 
> > I think it's correct for dma to be true... maybe add a lock argument to
> > qemu_ram_ptr_length, so that make address_space_{read,write}_continue can
> > pass 0 and everyone else passes 1?
> 
> I think that is a great suggestion. That way, the difference between
> locked mappings and unlocked mappings would be explicit, rather than
> relying on callers to use qemu_map_ram_ptr for unlocked mappings and
> qemu_ram_ptr_length for locked mappings. And there aren't that many
> callers of qemu_ram_ptr_length, so adding a parameter wouldn't be an
> issue.

Thanks---however, after re-reading xen-mapcache.c, dma needs to be false
for unlocked mappings.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] QEMU commit 04bf2526ce breaks use of xen-mapcache

2017-07-25 Thread Paolo Bonzini
> Hi,
> 
> Commits 04bf2526ce (exec: use qemu_ram_ptr_length to access guest ram)
> start using qemu_ram_ptr_length() instead of qemu_map_ram_ptr().
> That result in calling xen_map_cache() with lock=true, but this mapping
> is never invalidated.
> So QEMU use more and more RAM until it stop working for a reason or an
> other. (crash if host have little RAM or stop emulating but no crash)
> 
> I don't know if calling xen_invalidate_map_cache_entry() in
> address_space_read_continue() and address_space_write_continue() is the
> right answer.  Is there something better to do ?

I think it's correct for dma to be true... maybe add a lock argument to
qemu_ram_ptr_length, so that make address_space_{read,write}_continue can
pass 0 and everyone else passes 1?

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] xen-disk: use an IOThread per instance

2017-06-20 Thread Paolo Bonzini
On 20/06/2017 15:47, Paul Durrant wrote:
> This patch allocates an IOThread object for each xen_disk instance and
> sets the AIO context appropriately on connect. This allows processing
> of I/O to proceed in parallel.
> 
> The patch also adds tracepoints into xen_disk to make it possible to
> follow the state transtions of an instance in the log.
> 
> Signed-off-by: Paul Durrant 

The QEMU block layer is not yet thread safe, but code running in
IOThreads still has to take the AioContext lock.  You need to call
aio_context_acquire/release in blk_bh and qemu_aio_complete.

Paolo

> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> ---
>  hw/block/trace-events |  7 +++
>  hw/block/xen_disk.c   | 44 +++-
>  2 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 65e83dc258..608b24ba66 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int 
> num_reqs, uint64_t offset,
>  # hw/block/hd-geometry.c
>  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p 
> LCHS %d %d %d"
>  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, 
> int trans) "blk %p CHS %u %u %u trans %d"
> +
> +# hw/block/xen_disk.c
> +xen_disk_alloc(char *name) "%s"
> +xen_disk_init(char *name) "%s"
> +xen_disk_connect(char *name) "%s"
> +xen_disk_disconnect(char *name) "%s"
> +xen_disk_free(char *name) "%s"
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index a9942d32db..ec1085c802 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -27,10 +27,13 @@
>  #include "hw/xen/xen_backend.h"
>  #include "xen_blkif.h"
>  #include "sysemu/blockdev.h"
> +#include "sysemu/iothread.h"
>  #include "sysemu/block-backend.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qstring.h"
> +#include "qom/object_interfaces.h"
> +#include "trace.h"
>  
>  /* - */
>  
> @@ -128,6 +131,9 @@ struct XenBlkDev {
>  DriveInfo   *dinfo;
>  BlockBackend*blk;
>  QEMUBH  *bh;
> +
> +IOThread*iothread;
> +AioContext  *ctx;
>  };
>  
>  /* - */
> @@ -923,11 +929,31 @@ static void blk_bh(void *opaque)
>  static void blk_alloc(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
> +Object *obj;
> +char *name;
> +Error *err = NULL;
> +
> +trace_xen_disk_alloc(xendev->name);
>  
>  QLIST_INIT(>inflight);
>  QLIST_INIT(>finished);
>  QLIST_INIT(>freelist);
> -blkdev->bh = qemu_bh_new(blk_bh, blkdev);
> +
> +obj = object_new(TYPE_IOTHREAD);
> +name = g_strdup_printf("iothread-%s", xendev->name);
> +
> +object_property_add_child(object_get_objects_root(), name, obj, );
> +assert(!err);
> +
> +g_free(name);
> +
> +user_creatable_complete(obj, );
> +assert(!err);
> +
> +blkdev->iothread = (IOThread *)object_dynamic_cast(obj, TYPE_IOTHREAD);
> +blkdev->ctx = iothread_get_aio_context(blkdev->iothread);
> +blkdev->bh = aio_bh_new(blkdev->ctx, blk_bh, blkdev);
> +
>  if (xen_mode != XEN_EMULATE) {
>  batch_maps = 1;
>  }
> @@ -954,6 +980,8 @@ static int blk_init(struct XenDevice *xendev)
>  int info = 0;
>  char *directiosafe = NULL;
>  
> +trace_xen_disk_init(xendev->name);
> +
>  /* read xenstore entries */
>  if (blkdev->params == NULL) {
>  char *h = NULL;
> @@ -1069,6 +1097,8 @@ static int blk_connect(struct XenDevice *xendev)
>  unsigned int i;
>  uint32_t *domids;
>  
> +trace_xen_disk_connect(xendev->name);
> +
>  /* read-only ? */
>  if (blkdev->directiosafe) {
>  qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
> @@ -1285,6 +1315,8 @@ static int blk_connect(struct XenDevice *xendev)
>  blkdev->persistent_gnt_count = 0;
>  }
>  
> +blk_set_aio_context(blkdev->blk, blkdev->ctx);
> +
>  xen_be_bind_evtchn(>xendev);
>  
>  xen_pv_printf(>xendev, 1, "ok: proto %s, nr-ring-ref %u, "
> @@ -1298,13 +1330,20 @@ static void blk_disconnect(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
>  
> +trace_xen_disk_disconnect(xendev->name);
> +
> +aio_context_acquire(blkdev->ctx);
> +
>  if (blkdev->blk) {
> +blk_set_aio_context(blkdev->blk, qemu_get_aio_context());
>  blk_detach_dev(blkdev->blk, blkdev);
>  blk_unref(blkdev->blk);
>  blkdev->blk = NULL;
>  }
>  xen_pv_unbind_evtchn(>xendev);
>  
> +

Re: [Xen-devel] [PATCH v2 2/3] Check the return value of fcntl in qemu_set_cloexec

2017-05-11 Thread Paolo Bonzini


On 09/05/2017 21:04, Stefano Stabellini wrote:
> Assert that the return value is not an error. This issue was found by
> Coverity.
> 
> CID: 1374831
> 
> Signed-off-by: Stefano Stabellini 
> CC: gr...@kaod.org
> CC: pbonz...@redhat.com
> CC: Eric Blake 

Queued, thanks.

Paolo

> ---
>  util/oslib-posix.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 4d9189e..16894ad 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -182,7 +182,9 @@ void qemu_set_cloexec(int fd)
>  {
>  int f;
>  f = fcntl(fd, F_GETFD);
> -fcntl(fd, F_SETFD, f | FD_CLOEXEC);
> +assert(f != -1);
> +f = fcntl(fd, F_SETFD, f | FD_CLOEXEC);
> +assert(f != -1);
>  }
>  
>  /*
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC] xen/mapcache: store dma information in revmapcache entries for debugging

2017-05-03 Thread Paolo Bonzini


On 03/05/2017 02:08, Stefano Stabellini wrote:
> The Xen mapcache is able to create long term mappings, they are called
> "locked" mappings. The third parameter of the xen_map_cache call
> specifies if a mapping is a "locked" mapping.
> 
> 
> From the QEMU point of view there are two kinds of long term mappings:
> 
> [a] device memory mappings, such as option roms and video memory
> [b] dma mappings, created by dma_memory_map & friends
> 
> After certain operations, ballooning a VM in particular, Xen asks QEMU
> kindly to destroy all mappings. However, certainly [a] mappings are
> present and cannot be removed. That's not a problem as they are not
> affected by balloonning. The *real* problem is that if there are any
> mappings of type [b], any outstanding dma operations could fail. This is
> a known shortcoming. In other words, when Xen asks QEMU to destroy all
> mappings, it is an error if any [b] mappings exist.
> 
> However today we have no way of distinguishing [a] from [b]. Because of
> that, we cannot even print a decent warning.
> 
> This patch introduces a new "dma" bool field to MapCacheRev entires, to
> remember if a given mapping is for dma or is a long term device memory
> mapping. When xen_invalidate_map_cache is called, we print a warning if
> any [b] mappings exist. We ignore [a] mappings.
> 
> Mappings created by qemu_map_ram_ptr are assumed to be [a], while
> mappings created by address_space_map->qemu_ram_ptr_length are assumed
> to be [b].
> 
> The goal of the patch is to make debugging and system understanding
> easier.

Sure, why not.

Acked-by: Paolo Bonzini <pbonz...@redhat.com>

Paolo

> Signed-off-by: Stefano Stabellini <sstabell...@kernel.org>
> 
> diff --git a/exec.c b/exec.c
> index eac6085..85769e1 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2084,10 +2084,10 @@ void *qemu_map_ram_ptr(RAMBlock *ram_block, 
> ram_addr_t addr)
>   * In that case just map until the end of the page.
>   */
>  if (block->offset == 0) {
> -return xen_map_cache(addr, 0, 0);
> +return xen_map_cache(addr, 0, 0, false);
>  }
>  
> -block->host = xen_map_cache(block->offset, block->max_length, 1);
> +block->host = xen_map_cache(block->offset, block->max_length, 1, 
> false);
>  }
>  return ramblock_ptr(block, addr);
>  }
> @@ -2117,10 +2117,10 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, 
> ram_addr_t addr,
>   * In that case just map the requested area.
>   */
>  if (block->offset == 0) {
> -return xen_map_cache(addr, *size, 1);
> +return xen_map_cache(addr, *size, 1, true);
>  }
>  
> -block->host = xen_map_cache(block->offset, block->max_length, 1);
> +block->host = xen_map_cache(block->offset, block->max_length, 1, 
> true);
>  }
>  
>  return ramblock_ptr(block, addr);
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index 31debdf..fa4282a 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -62,6 +62,7 @@ typedef struct MapCacheRev {
>  hwaddr paddr_index;
>  hwaddr size;
>  QTAILQ_ENTRY(MapCacheRev) next;
> +bool dma;
>  } MapCacheRev;
>  
>  typedef struct MapCache {
> @@ -202,7 +203,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
>  }
>  
>  static uint8_t *xen_map_cache_unlocked(hwaddr phys_addr, hwaddr size,
> -   uint8_t lock)
> +   uint8_t lock, bool dma)
>  {
>  MapCacheEntry *entry, *pentry = NULL;
>  hwaddr address_index;
> @@ -289,6 +290,7 @@ tryagain:
>  if (lock) {
>  MapCacheRev *reventry = g_malloc0(sizeof(MapCacheRev));
>  entry->lock++;
> +reventry->dma = dma;
>  reventry->vaddr_req = mapcache->last_entry->vaddr_base + 
> address_offset;
>  reventry->paddr_index = mapcache->last_entry->paddr_index;
>  reventry->size = entry->size;
> @@ -300,12 +302,12 @@ tryagain:
>  }
>  
>  uint8_t *xen_map_cache(hwaddr phys_addr, hwaddr size,
> -   uint8_t lock)
> +   uint8_t lock, bool dma)
>  {
>  uint8_t *p;
>  
>  mapcache_lock();
> -p = xen_map_cache_unlocked(phys_addr, size, lock);
> +p = xen_map_cache_unlocked(phys_addr, size, lock, dma);
>  mapcache_unlock();
>  return p;
>  }
> @@ -426,8 +428,10 @@ void xen_invalidate_map_cache(void)
>  mapcache_lock();
>  
>  QTAILQ_FOREAC

Re: [Xen-devel] [Qemu-devel] [PATCH v2 2/6] qdict: Add convenience helpers for wrapped puts

2017-04-05 Thread Paolo Bonzini


On 05/04/2017 11:01, Richard W.M. Jones wrote:
> On Wed, Apr 05, 2017 at 10:38:37AM +0200, Julia Lawall wrote:
>> OK, there is nothing special about g_assert_cmpint, but Coccinelle expects
>> expressions or types in function argument lists, so it gives a parse error
>> on finding an ==.
> 
> I should have checked the coccinelle mailing list before asking you,
> because I see that Eric already asked this question and you answered
> it.  For reference, that is here:
> 
>   https://systeme.lip6.fr/pipermail/cocci/2017-April/004107.html
> 
> Thanks again,

Eric, are you using the scripts/cocci-macro-file.h?

commit 6ad978e9f40d3edfd9f4a86b4a60e3523eff08fe
Author: Paolo Bonzini <pbonz...@redhat.com>
Date:   Wed May 18 11:11:55 2016 +0200

coccinelle: add g_assert_cmp* to macro file

This helps applying semantic patches to unit tests.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>

Thanks,

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 1/8] xen: import ring.h from xen

2017-03-29 Thread Paolo Bonzini


On 29/03/2017 01:54, Stefano Stabellini wrote:
>>> I understand your point of view, and honestly it wouldn't be a problem
>>> doing it the way you suggested either. However, I think that going
>>> forward it will be less of a maintenance pain to keep ring.h in sync,
>>> compared to maintaining a versioned build dependency between Xen and
>>> QEMU for the compilation of one PV backend. We do have version checks
>>> in QEMU for Xen compatibility, but not for PV backends or the xenpv
>>> machine yet.
>> For the pvUSB backend I just used a mandatory macro from the header for
>> the #ifdef. The backend will signal support when it was defined during
>> build and will refuse initialization otherwise. Xen tools are able to
>> recoginze qemu support of the backend by looking into Xenstore.
> 
> What do you think of the following:

Let's just copy the header...

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH 0/4] Qemu: Add Xen vIOMMU support

2017-03-20 Thread Paolo Bonzini


On 20/03/2017 15:17, Roger Pau Monné wrote:
>>> Hi Paolo:
>>> Thanks for review. For Xen side, we won't reuse Intel IOMMU device model
>>> in Qemu and create counterpart in Xen hypervisor. The reasons are
>>>  1) Avoid round trips between Qemu and Xen hypervisor
>>>  2) Ease of integration with the rest part of the hypervisor(vIOAPIC,
>>> vMSI and so on).
>> Fair enough, though I'd be worried about increasing the attack surface
>> of the hypervisor.  For KVM, for example, IOMMU emulation requires using
>> the "split irqchip" feature to move the PIC and IOAPIC out of the kernel
>> and back to QEMU.
> Yes, that's right, we are increasing the surface of attack. But Xen also needs
> it in order to support APIC IDs > 255 on PVH guests (that have a local APIC 
> but
> no QEMU).

Not necessarily, you only need x2APIC support for that in the local APIC
emulation.  The MSI hypercalls (similar to KVM's MSI ioctls) can be
extended to accept x2APIC VCPU ids.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH 0/4] Qemu: Add Xen vIOMMU support

2017-03-20 Thread Paolo Bonzini


On 20/03/2017 03:40, Lan Tianyu wrote:
>>> Xen only supports emulated I440 and so we enable vIOMMU with emulated
>>> I440 chipset. This works on Linux and Windows guest.
>> Any plans to change this?  Why is Xen not able to use Q35 with Intel
>> IOMMU, with only special hooks for interrupt remapping?
>>
>> Paolo
>>
> Hi Paolo:
> Thanks for review. For Xen side, we won't reuse Intel IOMMU device model
> in Qemu and create counterpart in Xen hypervisor. The reasons are
>  1) Avoid round trips between Qemu and Xen hypervisor
>  2) Ease of integration with the rest part of the hypervisor(vIOAPIC,
> vMSI and so on).

Fair enough, though I'd be worried about increasing the attack surface
of the hypervisor.  For KVM, for example, IOMMU emulation requires using
the "split irqchip" feature to move the PIC and IOAPIC out of the kernel
and back to QEMU.

Also, I think this series is missing changes to support IOMMU
translation in the vIOMMU device model.

Paolo

> We don't have plan to enable Q35 for Xen now. There is no dependency
> between Q35 emulation and vIOMMU implementation from our test. As
> Stefano mentioned, Anthony worked on the Q35 support before. These two
> tasks can be done in parallel.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH 0/4] Qemu: Add Xen vIOMMU support

2017-03-17 Thread Paolo Bonzini


On 17/03/2017 12:29, Lan Tianyu wrote:
> This patchset is to add Xen vIOMMU device model and handle
> irq remapping stuffs. Xen vIOMMU emulation is in the Xen hypervisor
> and the new device module in Qemu works as hypercall wrappers to
> create and destroy vIOMMU in hypervisor.
> 
> Xen only supports emulated I440 and so we enable vIOMMU with emulated
> I440 chipset. This works on Linux and Windows guest.

Any plans to change this?  Why is Xen not able to use Q35 with Intel
IOMMU, with only special hooks for interrupt remapping?

Paolo

> Chao Gao (4):
>   I440: Allow adding sysbus devices with -device on I440
>   Xen: add a dummy vIOMMU to create/destroy vIOMMU in Xen
>   xen-pt: bind/unbind interrupt remapping format MSI
>   msi: taking interrupt format into consideration during judging a pirq
> is binded with a event channel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/9] xen: do not build backends for targets that do not support xen

2017-03-17 Thread Paolo Bonzini


On 16/03/2017 21:01, Stefano Stabellini wrote:
> Change Makefile.objs to use CONFIG_XEN instead of CONFIG_XEN_BACKEND, so
> that the Xen backends are only built for targets that support Xen.
> 
> Set CONFIG_XEN in the toplevel Makefile to ensure that files that are
> built only once pick up Xen support properly.
> 
> Signed-off-by: Stefano Stabellini 
> CC: gr...@kaod.org
> CC: pbonz...@redhat.com
> CC: peter.mayd...@linaro.org
> CC: r...@twiddle.net
> CC: stefa...@redhat.com
> ---
>  Makefile | 1 +
>  hw/block/Makefile.objs   | 2 +-
>  hw/char/Makefile.objs| 2 +-
>  hw/display/Makefile.objs | 2 +-
>  hw/net/Makefile.objs | 2 +-
>  hw/usb/Makefile.objs | 2 +-
>  hw/xen/Makefile.objs | 2 +-
>  7 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 1c4c04f..b246138 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -26,6 +26,7 @@ endif
>  
>  CONFIG_SOFTMMU := $(if $(filter %-softmmu,$(TARGET_DIRS)),y)
>  CONFIG_USER_ONLY := $(if $(filter %-user,$(TARGET_DIRS)),y)
> +CONFIG_XEN := $(CONFIG_XEN_BACKEND)
>  CONFIG_ALL=y
>  -include config-all-devices.mak
>  -include config-all-disas.mak
> diff --git a/hw/block/Makefile.objs b/hw/block/Makefile.objs
> index d4c3ab7..e0ed980 100644
> --- a/hw/block/Makefile.objs
> +++ b/hw/block/Makefile.objs
> @@ -4,7 +4,7 @@ common-obj-$(CONFIG_SSI_M25P80) += m25p80.o
>  common-obj-$(CONFIG_NAND) += nand.o
>  common-obj-$(CONFIG_PFLASH_CFI01) += pflash_cfi01.o
>  common-obj-$(CONFIG_PFLASH_CFI02) += pflash_cfi02.o
> -common-obj-$(CONFIG_XEN_BACKEND) += xen_disk.o
> +common-obj-$(CONFIG_XEN) += xen_disk.o
>  common-obj-$(CONFIG_ECC) += ecc.o
>  common-obj-$(CONFIG_ONENAND) += onenand.o
>  common-obj-$(CONFIG_NVME_PCI) += nvme.o
> diff --git a/hw/char/Makefile.objs b/hw/char/Makefile.objs
> index 6ea76fe..725fdc4 100644
> --- a/hw/char/Makefile.objs
> +++ b/hw/char/Makefile.objs
> @@ -7,7 +7,7 @@ common-obj-$(CONFIG_SERIAL_ISA) += serial-isa.o
>  common-obj-$(CONFIG_SERIAL_PCI) += serial-pci.o
>  common-obj-$(CONFIG_VIRTIO) += virtio-console.o
>  common-obj-$(CONFIG_XILINX) += xilinx_uartlite.o
> -common-obj-$(CONFIG_XEN_BACKEND) += xen_console.o
> +common-obj-$(CONFIG_XEN) += xen_console.o
>  common-obj-$(CONFIG_CADENCE) += cadence_uart.o
>  
>  obj-$(CONFIG_EXYNOS4) += exynos4210_uart.o
> diff --git a/hw/display/Makefile.objs b/hw/display/Makefile.objs
> index 063889b..3d02e8b 100644
> --- a/hw/display/Makefile.objs
> +++ b/hw/display/Makefile.objs
> @@ -5,7 +5,7 @@ common-obj-$(CONFIG_JAZZ_LED) += jazz_led.o
>  common-obj-$(CONFIG_PL110) += pl110.o
>  common-obj-$(CONFIG_SSD0303) += ssd0303.o
>  common-obj-$(CONFIG_SSD0323) += ssd0323.o
> -common-obj-$(CONFIG_XEN_BACKEND) += xenfb.o
> +common-obj-$(CONFIG_XEN) += xenfb.o
>  
>  common-obj-$(CONFIG_VGA_PCI) += vga-pci.o
>  common-obj-$(CONFIG_VGA_ISA) += vga-isa.o
> diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
> index 610ed3e..6a95d92 100644
> --- a/hw/net/Makefile.objs
> +++ b/hw/net/Makefile.objs
> @@ -1,5 +1,5 @@
>  common-obj-$(CONFIG_DP8393X) += dp8393x.o
> -common-obj-$(CONFIG_XEN_BACKEND) += xen_nic.o
> +common-obj-$(CONFIG_XEN) += xen_nic.o
>  
>  # PCI network cards
>  common-obj-$(CONFIG_NE2000_PCI) += ne2000.o
> diff --git a/hw/usb/Makefile.objs b/hw/usb/Makefile.objs
> index 98b5c9d..5958be8 100644
> --- a/hw/usb/Makefile.objs
> +++ b/hw/usb/Makefile.objs
> @@ -40,5 +40,5 @@ common-obj-$(CONFIG_USB_REDIR) += redirect.o quirks.o
>  common-obj-y += $(patsubst %,host-%.o,$(HOST_USB))
>  
>  ifeq ($(CONFIG_USB_LIBUSB),y)
> -common-obj-$(CONFIG_XEN_BACKEND) += xen-usb.o
> +common-obj-$(CONFIG_XEN) += xen-usb.o
>  endif
> diff --git a/hw/xen/Makefile.objs b/hw/xen/Makefile.objs
> index 591cdc2..4be3ec9 100644
> --- a/hw/xen/Makefile.objs
> +++ b/hw/xen/Makefile.objs
> @@ -1,5 +1,5 @@
>  # xen backend driver support
> -common-obj-$(CONFIG_XEN_BACKEND) += xen_backend.o xen_devconfig.o xen_pvdev.o
> +common-obj-$(CONFIG_XEN) += xen_backend.o xen_devconfig.o xen_pvdev.o
>  
>  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o
>  obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen_pt.o xen_pt_config_init.o 
> xen_pt_graphics.o xen_pt_msi.o
> 

I'm queuing this for 2.9 as a bugfix, adding Xen things to various
softcore emulators makes no sense.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/9] configure: change CONFIG_XEN_BACKEND to be a target property

2017-03-15 Thread Paolo Bonzini


On 14/03/2017 21:23, Stefano Stabellini wrote:
> On Tue, 14 Mar 2017, Stefano Stabellini wrote:
>>> Then you add to Makefile:
>>>
>>>  CONFIG_SOFTMMU := $(if $(filter %-softmmu,$(TARGET_DIRS)),y)
>>>  CONFIG_USER_ONLY := $(if $(filter %-user,$(TARGET_DIRS)),y)
>>> +CONFIG_XEN := $(CONFIG_XEN_BACKEND)
>>>  CONFIG_ALL=y
>>>  -include config-all-devices.mak
>>>  -include config-all-disas.mak
>>>
>>> The Makefile change ensures that they are built before descending in the
>>> target-specific directories.
>>
>> But I don't understand this. Please correct me if I am wrong, but this
>> change looks like it would end up setting CONFIG_XEN every time that
>> CONFIG_XEN_BACKEND is set. Without the configure change at the top, it
>> would end up setting CONFIG_XEN whenever the host supports Xen, even for
>> non-x86 and non-ARM targets. What am I missing?

This CONFIG_XEN assignment applies to the toplevel only, i.e. to files
that are built once.  Targets will still take CONFIG_XEN from
config-target.mak, and it will not be set for non-x86/non-ARM targets.
This CONFIG_XEN assignment applies to files that are compiled once.

The issue you reported here:

>   LINKaarch64-softmmu/qemu-system-aarch64
> ../hw/9pfs/xen-9p-backend.o: In function `xen_9pfs_alloc':
> /local/qemu/hw/9pfs/xen-9p-backend.c:387: undefined reference to 
> `xenstore_write_be_str'
> /local/qemu/hw/9pfs/xen-9p-backend.c:388: undefined reference to 
> `xenstore_write_be_int'

is because you need this in patch 9:

-common-obj-$(CONFIG_XEN_BACKEND) += xen-9p-backend.o
+common-obj-$(CONFIG_XEN) += xen-9p-backend.o


> ../hw/9pfs/xen-9p-backend.o: In function `xen_9pfs_connect':
> /local/qemu/hw/9pfs/xen-9p-backend.c:292: undefined reference to 
> `xenstore_read_fe_int'
> /local/qemu/hw/9pfs/xen-9p-backend.c:313: undefined reference to 
> `xenstore_read_fe_int'
> /local/qemu/hw/9pfs/xen-9p-backend.c:357: undefined reference to 
> `xen_pv_printf'
> /local/qemu/hw/9pfs/xen-9p-backend.c:307: undefined reference to 
> `xenstore_read_fe_int'
> /local/qemu/hw/9pfs/xen-9p-backend.c:362: undefined reference to 
> `xenstore_read_be_str'
> /local/qemu/hw/9pfs/xen-9p-backend.c:363: undefined reference to 
> `xenstore_read_be_str'
> /local/qemu/hw/9pfs/xen-9p-backend.c:366: undefined reference to 
> `xenstore_read_fe_str'
> /local/qemu/hw/9pfs/xen-9p-backend.c:353: undefined reference to 
> `xen_pv_printf'
> ../hw/9pfs/xen-9p-backend.o: In function `xen_9pfs_alloc':
> /local/qemu/hw/9pfs/xen-9p-backend.c:389: undefined reference to 
> `xenstore_write_be_int'
> collect2: error: ld returned 1 exit status
> make[1]: *** [qemu-system-arm] Error 1
> make: *** [subdir-arm-softmmu] Error 2
> ../hw/9pfs/xen-9p-backend.o: In function `xen_9pfs_alloc':
> /local/qemu/hw/9pfs/xen-9p-backend.c:387: undefined reference to 
> `xenstore_write_be_str'
> /local/qemu/hw/9pfs/xen-9p-backend.c:388: undefined reference to 
> `xenstore_write_be_int'
> ../hw/9pfs/xen-9p-backend.o: In function `xen_9pfs_connect':
> /local/qemu/hw/9pfs/xen-9p-backend.c:292: undefined reference to 
> `xenstore_read_fe_int'
> /local/qemu/hw/9pfs/xen-9p-backend.c:313: undefined reference to 
> `xenstore_read_fe_int'
> /local/qemu/hw/9pfs/xen-9p-backend.c:357: undefined reference to 
> `xen_pv_printf'
> /local/qemu/hw/9pfs/xen-9p-backend.c:307: undefined reference to 
> `xenstore_read_fe_int'
> /local/qemu/hw/9pfs/xen-9p-backend.c:362: undefined reference to 
> `xenstore_read_be_str'
> /local/qemu/hw/9pfs/xen-9p-backend.c:363: undefined reference to 
> `xenstore_read_be_str'
> /local/qemu/hw/9pfs/xen-9p-backend.c:366: undefined reference to 
> `xenstore_read_fe_str'
> /local/qemu/hw/9pfs/xen-9p-backend.c:353: undefined reference to 
> `xen_pv_printf'
> ../hw/9pfs/xen-9p-backend.o: In function `xen_9pfs_alloc':
> /local/qemu/hw/9pfs/xen-9p-backend.c:389: undefined reference to 
> `xenstore_write_be_int'
> collect2: error: ld returned 1 exit status
> make[1]: *** [qemu-system-aarch64] Error 1
> make: *** [subdir-aarch64-softmmu] Error 2
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/9] configure: change CONFIG_XEN_BACKEND to be a target property

2017-03-14 Thread Paolo Bonzini


On 14/03/2017 00:55, Stefano Stabellini wrote:
> CONFIG_XEN_BACKEND is currently set when the host supports Xen,
> regardless of the chosen targets. As a consequence, Xen backends can be
> enabled even on targets that don't support Xen.
> 
> Fix the issue by setting CONFIG_XEN_BACKEND only for targets that
> support Xen.
> 
> Signed-off-by: Stefano Stabellini 
> CC: gr...@kaod.org
> CC: gr...@kaod.org
> CC: pbonz...@redhat.com
> CC: peter.mayd...@linaro.org
> CC: r...@twiddle.net
> CC: stefa...@redhat.com
> ---
>  configure | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index 6c21975..6d8f752 100755
> --- a/configure
> +++ b/configure
> @@ -5442,7 +5442,6 @@ if test "$virglrenderer" = "yes" ; then
>echo "VIRGL_LIBS=$virgl_libs" >> $config_host_mak
>  fi
>  if test "$xen" = "yes" ; then
> -  echo "CONFIG_XEN_BACKEND=y" >> $config_host_mak
>echo "CONFIG_XEN_CTRL_INTERFACE_VERSION=$xen_ctrl_version" >> 
> $config_host_mak
>if test "$xen_pv_domain_build" = "yes" ; then
>  echo "CONFIG_XEN_PV_DOMAIN_BUILD=y" >> $config_host_mak
> @@ -6028,6 +6027,7 @@ case "$target_name" in
>i386|x86_64)
>  if test "$xen" = "yes" -a "$target_softmmu" = "yes" ; then
>echo "CONFIG_XEN=y" >> $config_target_mak
> +  echo "CONFIG_XEN_BACKEND=y" >> $config_target_mak
>if test "$xen_pci_passthrough" = yes; then
>  echo "CONFIG_XEN_PCI_PASSTHROUGH=y" >> "$config_target_mak"
>fi
> 

This messes up a bit the way xen_nic.o and friends are compiled, I
think, because they are common-obj-y but they are only found when
compiling in the target subdirectories.  So you end up building
x86_64-softmmu/../hw/net/xen_nic.o

If you're unlucky, I believe this can lead to a link failure where a
target is building xen_nic.o, while the other tries to link to a
partially written object file.

I think the files should be changed from
common-obj-$(CONFIG_XEN_BACKEND) to common-obj-$(CONFIG_XEN).  Then you
add to Makefile:

 CONFIG_SOFTMMU := $(if $(filter %-softmmu,$(TARGET_DIRS)),y)
 CONFIG_USER_ONLY := $(if $(filter %-user,$(TARGET_DIRS)),y)
+CONFIG_XEN := $(CONFIG_XEN_BACKEND)
 CONFIG_ALL=y
 -include config-all-devices.mak
 -include config-all-disas.mak

The Makefile change ensures that they are built before descending in the
target-specific directories.  The Makefile.objs change ensures that Xen
backends are not enabled on targets that support Xen.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 0/2] x86/kvm: Reduce vcpu_is_preempted() overhead

2017-02-21 Thread Paolo Bonzini


On 20/02/2017 20:54, Peter Zijlstra wrote:
> On Mon, Feb 20, 2017 at 01:36:02PM -0500, Waiman Long wrote:
>> Waiman Long (2):
>>   x86/paravirt: Change vcp_is_preempted() arg type to long
>>   x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64
>>
>>  arch/x86/include/asm/paravirt.h  |  2 +-
>>  arch/x86/include/asm/qspinlock.h |  2 +-
>>  arch/x86/kernel/asm-offsets_64.c |  9 +
>>  arch/x86/kernel/kvm.c| 26 +-
>>  arch/x86/kernel/paravirt-spinlocks.c |  2 +-
>>  5 files changed, 37 insertions(+), 4 deletions(-)
> 
> I'm assuming this will go through the KVM tree, if people want me to
> take it through the tip tree, please let me know.
> 
> Acked-by: Peter Zijlstra (Intel) 

Applied, thanks.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 4/4] KVM: VMX: Simplify segment_base

2017-02-21 Thread Paolo Bonzini

> Paolo, how stable, non-rebasing are the KVM tree commits?

Whatever ends in linux-next is stable.  I have a separate rebasing branch,
but it's not part of linux-next by design.

> Or should we keep Andy's KVM patches together with the GDT patches? Either
> workflow works for me - it's your call as these are predominantly KVM
> changes.

I'll delay my pull request to Linus a couple days so that I can test
Andy's 6 patches. Then you can just base your branch on Linus's tree.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] x86/paravirt: Don't make vcpu_is_preempted() a callee-save function

2017-02-10 Thread Paolo Bonzini
ed;
>  }
>  
>  #endif   /* CONFIG_PARAVIRT_SPINLOCKS */
> diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
> b/arch/x86/kernel/paravirt-spinlocks.c
> index 6259327..da050bc 100644
> --- a/arch/x86/kernel/paravirt-spinlocks.c
> +++ b/arch/x86/kernel/paravirt-spinlocks.c
> @@ -24,12 +24,10 @@ __visible bool __native_vcpu_is_preempted(int cpu)
>  {
>   return false;
>  }
> -PV_CALLEE_SAVE_REGS_THUNK(__native_vcpu_is_preempted);
>  
>  bool pv_is_native_vcpu_is_preempted(void)
>  {
> - return pv_lock_ops.vcpu_is_preempted.func ==
> - __raw_callee_save___native_vcpu_is_preempted;
> + return pv_lock_ops.vcpu_is_preempted == __native_vcpu_is_preempted;
>  }
>  
>  struct pv_lock_ops pv_lock_ops = {
> @@ -38,7 +36,7 @@ struct pv_lock_ops pv_lock_ops = {
>   .queued_spin_unlock = PV_CALLEE_SAVE(__native_queued_spin_unlock),
>   .wait = paravirt_nop,
>   .kick = paravirt_nop,
> - .vcpu_is_preempted = PV_CALLEE_SAVE(__native_vcpu_is_preempted),
> + .vcpu_is_preempted = __native_vcpu_is_preempted,
>  #endif /* SMP */
>  };
>  EXPORT_SYMBOL(pv_lock_ops);
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 25a7c43..c85bb8f 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -114,8 +114,6 @@ void xen_uninit_lock_cpu(int cpu)
>   per_cpu(irq_name, cpu) = NULL;
>  }
>  
> -PV_CALLEE_SAVE_REGS_THUNK(xen_vcpu_stolen);
> -
>  /*
>   * Our init of PV spinlocks is split in two init functions due to us
>   * using paravirt patching and jump labels patching and having to do
> @@ -138,7 +136,7 @@ void __init xen_init_spinlocks(void)
>   pv_lock_ops.queued_spin_unlock = 
> PV_CALLEE_SAVE(__pv_queued_spin_unlock);
>   pv_lock_ops.wait = xen_qlock_wait;
>   pv_lock_ops.kick = xen_qlock_kick;
> - pv_lock_ops.vcpu_is_preempted = PV_CALLEE_SAVE(xen_vcpu_stolen);
> + pv_lock_ops.vcpu_is_preempted = xen_vcpu_stolen;
>  }
>  
>  static __init int xen_parse_nopvspin(char *arg)
> 

Acked-by: Paolo Bonzini <pbonz...@redhat.com>

Thank you very much!

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 0/7] Move accel, KVM, Xen, qtest files to accel/ subdir

2016-12-21 Thread Paolo Bonzini


On 21/12/2016 15:15, Eduardo Habkost wrote:
>>
>> ifeq ($(CONFIG_SOFTMMU),y)
>> obj-$(CONFIG_KVM) += kvm-all.c
>> obj-$(call lnot, $(CONFIG_KVM)) += kvm-stub.c
>> endif
>>
>> similar to what is done already in Makefile.objs?
> I assume you mean:
> 
>  ifeq ($(CONFIG_SOFTMMU),y)
>  obj-$(CONFIG_KVM) += kvm-all.c
>  endif
>  obj-$(call lnot, $(CONFIG_KVM)) += kvm-stub.c
> 
> Nothing really wrong, we can do that to avoid using libqemustub.
> I was just trying to avoid a more complex rule involving
> ifeq+lnot, and to avoid including accel/ in obj-y on the
> non-softmmu case.

TCG would need accel/ in obj-y in the non-softmmu case, for things such
as cpu-exec.c and translate-all.c.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 0/7] Move accel, KVM, Xen, qtest files to accel/ subdir

2016-12-21 Thread Paolo Bonzini


On 21/12/2016 14:14, Eduardo Habkost wrote:
> On Wed, Dec 21, 2016 at 12:21:44PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 20/12/2016 18:43, Eduardo Habkost wrote:
>>> This moves the KVM and Xen files to the an accel/ subdir.
>>>
>>> Instead of moving the *-stubs.c file to accel/ as-is, I tried to
>>> move most of the stub code to libqemustub.a. This way the obj-y
>>> logic for accel/ is simpler: obj-y includes accel/ only if
>>> CONFIG_SOFTMMU is set.
>>>
>>> The Xen stubs could be moved completely to stubs/, but some of
>>> the KVM stubs depend on cpu.h. So most of the kvm-stub.c code was
>>> moved to stubs/kvm.c, but some of that code was kept in
>>> accel/kvm-stub.c.
>>
>> I think we need to decide what libqemustub is for.
>>
>> The original purpose was to provide different implementations of some
>> functions for tools vs. emulators or (more rarely) for user-mode vs.
>> system emulation.
> 
> So, is sysemu vs user-mode a valid reason for using libqemustub?

Yes, but I was thinking of a different distinction.

You'd use libqemustub if user-mode emulation (or tools) only needs 2-3
functions out of a large file, while system-mode emulation needs all of it.

For example, of the entire monitor API, the tools need pretty much
nothing but monitor_init, monitor_get_fd, cur_mon and
monitor_cur_is_qmp.  Such a small extract of the API makes little sense
except for "this is what is needed to compile the tools", so it's stubs/
rather than monitor-stub.c.

Instead, non-KVM targets need a stub implementation of the entire API,
so it's kvm-stub.c rather than stubs/kvm.c (kvm-stub.c depends on cpu.h
but that's really only needed to compile it---the kvm-stub.c code
actually has no dependency).

There are certainly cases where libqemustub is used instead of lnot.  In
the specific case of sysemu vs. user-mode, stubs/cpus.c and
stubs/replay-user.c should not be in libqemustub.  They should be in a
separate file user-exec-stub.c, which is only used if !CONFIG_SOFTMMU.

> The main reason I have moved some code to sbus/kvm.c is to avoid
> having to include accel/kvm-stub.c in *-user.

What's wrong with

ifeq ($(CONFIG_SOFTMMU),y)
obj-$(CONFIG_KVM) += kvm-all.c
obj-$(call lnot, $(CONFIG_KVM)) += kvm-stub.c
endif

similar to what is done already in Makefile.objs?

> Moving xen-stub.c to libqemustub, on the other hand, is really
> unnecessary.

Why would it be different?

>> In general, I think libqemustub should be the last resort.  If possible,
>> inlines in headers should be the first choice, and stubs in objs-y or
>> common-objs-y (using $(call lnot) in the Makefile) should be the second.
> 
> I understand the reasoning, but I fail to see cases when
> libqemustub would be considered appropriate. Using stubs in
> obj-y/common-obj-y using $(call lnot) is always possible, isn't
> it?
> 
> Hmm, maybe on cases where the decision to use the stub doesn't
> depend on a single build variable (e.g. a function implemented by
> a handful of targets, but not all of them).

This is a good one.

> Are there other examples?

Does the one above (extract a small part of an API) make sense?

libqemustub is a necessary evil and it's almost never necessary.  It
basically exists for cases where you cannot replace a source file with
another wholesale.

There are also some cases of premature optimization.  For example reset
handlers are stubbed, but: 1) system emulation implements them in vl.c
which is an antipattern of its own, and 2) they are small enough that
including them in user-mode emulators (together with the rest of qdev)
is not a big deal.  (I'm planning to remove some stubs in 2.9, so I'm
taking these examples from that branch).

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 0/7] Move accel, KVM, Xen, qtest files to accel/ subdir

2016-12-21 Thread Paolo Bonzini


On 20/12/2016 18:43, Eduardo Habkost wrote:
> This moves the KVM and Xen files to the an accel/ subdir.
> 
> Instead of moving the *-stubs.c file to accel/ as-is, I tried to
> move most of the stub code to libqemustub.a. This way the obj-y
> logic for accel/ is simpler: obj-y includes accel/ only if
> CONFIG_SOFTMMU is set.
> 
> The Xen stubs could be moved completely to stubs/, but some of
> the KVM stubs depend on cpu.h. So most of the kvm-stub.c code was
> moved to stubs/kvm.c, but some of that code was kept in
> accel/kvm-stub.c.

I think we need to decide what libqemustub is for.

The original purpose was to provide different implementations of some
functions for tools vs. emulators or (more rarely) for user-mode vs.
system emulation.

There are also some cases where we use it for functions that are only
implemented by some targets.

In general, I think libqemustub should be the last resort.  If possible,
inlines in headers should be the first choice, and stubs in objs-y or
common-objs-y (using $(call lnot) in the Makefile) should be the second.

The reason is that stubs/ hides files from the corresponding maintainer;
for example, commit 07a32d6b's addition of stubs/get-next-serial.c was
in all likelihood unnecessary, but the new file went in without an ack
for a character device maintainer.  Tracking stubs in the MAINTAINERS
file is unwieldy and, from this point of view, I find

F: accel/kvm*

to be preferrable to

F: accel/kvm*
F: stubs/kvm.c

Thanks,

Paolo

> About TCG:
> --
> 
> It is not obvious to me which TCG-related files could be moved to
> accel/, so this series don't move any of them yet.
> 
> About other CONFIG_SOFTMMU top-level files:
> ---
> 
> I would like to know what we should do with the top-level
> CONFIG_SOFTMMU-only files that don't belong to hw/. Some
> candidates: arch_init.c cpus.c monitor.c gdbstub.c balloon.c
> ioport.c bootdevice.c memory.c cputlb.c memory_mapping.c dump.c.
> 
> Maybe a sysemu/ subdir? In that case, should we still create an
> accel/ subdir, or move xen-*, kvm-* and friends to sysemu/ too?
> 
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: k...@vger.kernel.org
> Cc: Christoffer Dall <christoffer.d...@linaro.org>
> Cc: Anthony Perard <anthony.per...@citrix.com>
> Cc: Stefano Stabellini <sstabell...@kernel.org>
> Cc: xen-de...@lists.xensource.com
> 
> Eduardo Habkost (7):
>   xen: Move xen-*-stub.c to stubs/
>   xen: Move xen files to accel/
>   kvm: Move some kvm-stub.c code to stubs/kvm.c
>   kvm: Include kvm-stub.o only on CONFIG_SOFTMMU
>   kvm: Move kvm*.c files to accel/
>   accel: Move accel.c to accel/
>   accel: Move qtest.c to accel/

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 00/11] implement vcpu preempted check

2016-10-28 Thread Paolo Bonzini


On 28/10/2016 10:11, Pan Xinhui wrote:
> change from v5:
>   spilt x86/kvm patch into guest/host part.
>   introduce kvm_write_guest_offset_cached.
>   fix some typos.
>   rebase patch onto 4.9.2

Acked-by: Paolo Bonzini <pbonz...@redhat.com>

Thanks,

Paolo

> change from v4:
>   spilt x86 kvm vcpu preempted check into two patches.
>   add documentation patch.
>   add x86 vcpu preempted check patch under xen
>   add s390 vcpu preempted check patch 
> change from v3:
>   add x86 vcpu preempted check patch
> change from v2:
>   no code change, fix typos, update some comments
> change from v1:
>   a simplier definition of default vcpu_is_preempted
>   skip mahcine type check on ppc, and add config. remove dedicated macro.
>   add one patch to drop overload of rwsem_spin_on_owner and 
> mutex_spin_on_owner. 
>   add more comments
>   thanks boqun and Peter's suggestion.
> 
> This patch set aims to fix lock holder preemption issues.
> 
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
> 
> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
> 
> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_onwer variant also cause rcu stall before we apply this patch 
> set
> 
> We also have observed some performace improvements in uninx benchmark tests.
> 
> PPC test result:
> 1 copy - 0.94%
> 2 copy - 7.17%
> 4 copy - 11.9%
> 8 copy -  3.04%
> 16 copy - 15.11%
> 
> details below:
> Without patch:
> 
> 1 copy - File Write 4096 bufsize 8000 maxblocks  2188223.0 KBps  (30.0 s, 
> 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks  1804433.0 KBps  (30.0 s, 
> 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks  1237257.0 KBps  (30.0 s, 
> 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks  1032658.0 KBps  (30.0 s, 
> 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks   768000.0 KBps  (30.1 
> s, 1 samples)
> 
> With patch: 
> 
> 1 copy - File Write 4096 bufsize 8000 maxblocks  2209189.0 KBps  (30.0 s, 
> 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks  1943816.0 KBps  (30.0 s, 
> 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks  1405591.0 KBps  (30.0 s, 
> 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks  1065080.0 KBps  (30.0 s, 
> 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks   904762.0 KBps  (30.0 
> s, 1 samples)
> 
> X86 test result:
>   test-case   after-patch   before-patch
> Execl Throughput   |18307.9 lps  |11701.6 lps 
> File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
> File Copy 256 bufsize 500 maxblocks|   367555.6 KBps |   222867.7 KBps
> File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
> Pipe Throughput| 11872208.7 lps  | 11855628.9 lps 
> Pipe-based Context Switching   |  1495126.5 lps  |  1490533.9 lps 
> Process Creation   |29881.2 lps  |28572.8 lps 
> Shell Scripts (1 concurrent)   |23224.3 lpm  |22607.4 lpm 
> Shell Scripts (8 concurrent)   | 3531.4 lpm  | 3211.9 lpm 
> System Call Overhead   | 10385653.0 lps  | 10419979.0 lps 
> 
> Christian Borntraeger (1):
>   s390/spinlock: Provide vcpu_is_preempted
> 
> Juergen Gross (1):
>   x86, xen: support vcpu preempted check
> 
> Pan Xinhui (9):
>   kernel/sched: introduce vcpu preempted check interface
>   locking/osq: Drop the overload of osq_lock()
>   kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
>   powerpc/spinlock: support vcpu preempted check
>   x86, paravirt: Add interface to support kvm/xen vcpu preempted check
>   KVM: Introduce kvm_write_guest_offset_cached
>   x86, kvm/x86.c: support vcpu preempted check
>   x86, kernel/kvm.c: support vcpu preempted check
>   Documentation: virtual: kvm: Support vcpu preempted check
> 
>  Documentation/virtual/kvm/msr.txt |  9 -
>  arch/powerpc/include/asm/spinlock.h   |  8 
>  arch/s390/in

Re: [Xen-devel] [PATCH 15/15] xen: Rename xen_be_frontend_changed

2016-10-13 Thread Paolo Bonzini

> So we are better leave it as is. Maybe we need to rename some functions in
> that file.
>  __iirc__ adding xen_frontend.c is one of Stefano's comments in previous v6
>  or v7..

I agree; it's quite possible that code can be shared between backend and
frontend (renaming functions as needed), and it's good to have a generic
xen frontend mechanism instead of something specific to TPM.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 15/15] xen: Rename xen_be_frontend_changed

2016-10-12 Thread Paolo Bonzini


On 09/10/2016 21:50, Emil Condrea wrote:
> On Tue, Oct 4, 2016 at 11:06 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>>
>>
>> On 04/10/2016 08:43, Emil Condrea wrote:
>>> xen_be_frontend_changed -> xen_fe_frontend_changed
>>
>> This is not correct.  The front-end is implemented in the guest domain,
>> while the back-end is implemented in the dom0 or stubdom.
>>
> 
> You are right, thanks for the feedback! I will drop this patch
> together with the hunk
> from 04/15 patch which moves this function to xen_frontend.c

Actually all of your new xen_frontend.c seems to be reading frontend
information from XenStore, which is typically something that the backend
does.  So I suggest dropping the patch altogether.

Thanks,

Paolo

>> This function processes *in the backed* a notification that the frontend
>> state changed, hence the name should be xen_be_frontend_changed.
>>
>> Paolo
>>
>>> Signed-off-by: Emil Condrea <emilcond...@gmail.com>
>>> ---
>>>  hw/xen/xen_backend.c  | 2 +-
>>>  hw/xen/xen_frontend.c | 4 ++--
>>>  include/hw/xen/xen_frontend.h | 2 +-
>>>  3 files changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
>>> index 30d3aaa..b79e83e 100644
>>> --- a/hw/xen/xen_backend.c
>>> +++ b/hw/xen/xen_backend.c
>>> @@ -213,7 +213,7 @@ static int xen_be_try_setup(struct XenDevice *xendev)
>>>  xen_be_set_state(xendev, XenbusStateInitialising);
>>>
>>>  xen_be_backend_changed(xendev, NULL);
>>> -xen_be_frontend_changed(xendev, NULL);
>>> +xen_fe_frontend_changed(xendev, NULL);
>>>  return 0;
>>>  }
>>>
>>> diff --git a/hw/xen/xen_frontend.c b/hw/xen/xen_frontend.c
>>> index 1407f5f..761688b 100644
>>> --- a/hw/xen/xen_frontend.c
>>> +++ b/hw/xen/xen_frontend.c
>>> @@ -39,7 +39,7 @@ int xenstore_read_fe_uint64(struct XenDevice *xendev, 
>>> const char *node,
>>>  return xenstore_read_uint64(xendev->fe, node, uval);
>>>  }
>>>
>>> -void xen_be_frontend_changed(struct XenDevice *xendev, const char *node)
>>> +void xen_fe_frontend_changed(struct XenDevice *xendev, const char *node)
>>>  {
>>>  int fe_state;
>>>
>>> @@ -85,6 +85,6 @@ void xenstore_update_fe(char *watch, struct XenDevice 
>>> *xendev)
>>>  }
>>>  node = watch + len + 1;
>>>
>>> -xen_be_frontend_changed(xendev, node);
>>> +xen_fe_frontend_changed(xendev, node);
>>>  xen_be_check_state(xendev);
>>>  }
>>> diff --git a/include/hw/xen/xen_frontend.h b/include/hw/xen/xen_frontend.h
>>> index bb0bc23..2a5f03f 100644
>>> --- a/include/hw/xen/xen_frontend.h
>>> +++ b/include/hw/xen/xen_frontend.h
>>> @@ -9,6 +9,6 @@ int xenstore_read_fe_uint64(struct XenDevice *xendev, const 
>>> char *node,
>>>  uint64_t *uval);
>>>  void xenstore_update_fe(char *watch, struct XenDevice *xendev);
>>>
>>> -void xen_be_frontend_changed(struct XenDevice *xendev, const char *node);
>>> +void xen_fe_frontend_changed(struct XenDevice *xendev, const char *node);
>>>
>>>  #endif /* QEMU_HW_XEN_FRONTEND_H */
>>>
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 15/15] xen: Rename xen_be_frontend_changed

2016-10-04 Thread Paolo Bonzini


On 04/10/2016 10:06, Paolo Bonzini wrote:
> 
> 
> On 04/10/2016 08:43, Emil Condrea wrote:
>> xen_be_frontend_changed -> xen_fe_frontend_changed
> 
> This is not correct.  The front-end is implemented in the guest domain,
> while the back-end is implemented in the dom0 or stubdom.
> 
> This function processes *in the backed* a notification that the frontend
> state changed, hence the name should be xen_be_frontend_changed.

Sorry, this comment was in the wrong place.  It should have referred
only to the hunk in hw/xen/xen_backend.c.

Paolo

> Paolo
> 
>> Signed-off-by: Emil Condrea <emilcond...@gmail.com>
>> ---
>>  hw/xen/xen_backend.c  | 2 +-
>>  hw/xen/xen_frontend.c | 4 ++--
>>  include/hw/xen/xen_frontend.h | 2 +-
>>  3 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
>> index 30d3aaa..b79e83e 100644
>> --- a/hw/xen/xen_backend.c
>> +++ b/hw/xen/xen_backend.c
>> @@ -213,7 +213,7 @@ static int xen_be_try_setup(struct XenDevice *xendev)
>>  xen_be_set_state(xendev, XenbusStateInitialising);
>>  
>>  xen_be_backend_changed(xendev, NULL);
>> -xen_be_frontend_changed(xendev, NULL);
>> +xen_fe_frontend_changed(xendev, NULL);
>>  return 0;
>>  }
>>  
>> diff --git a/hw/xen/xen_frontend.c b/hw/xen/xen_frontend.c
>> index 1407f5f..761688b 100644
>> --- a/hw/xen/xen_frontend.c
>> +++ b/hw/xen/xen_frontend.c
>> @@ -39,7 +39,7 @@ int xenstore_read_fe_uint64(struct XenDevice *xendev, 
>> const char *node,
>>  return xenstore_read_uint64(xendev->fe, node, uval);
>>  }
>>  
>> -void xen_be_frontend_changed(struct XenDevice *xendev, const char *node)
>> +void xen_fe_frontend_changed(struct XenDevice *xendev, const char *node)
>>  {
>>  int fe_state;
>>  
>> @@ -85,6 +85,6 @@ void xenstore_update_fe(char *watch, struct XenDevice 
>> *xendev)
>>  }
>>  node = watch + len + 1;
>>  
>> -xen_be_frontend_changed(xendev, node);
>> +xen_fe_frontend_changed(xendev, node);
>>  xen_be_check_state(xendev);
>>  }
>> diff --git a/include/hw/xen/xen_frontend.h b/include/hw/xen/xen_frontend.h
>> index bb0bc23..2a5f03f 100644
>> --- a/include/hw/xen/xen_frontend.h
>> +++ b/include/hw/xen/xen_frontend.h
>> @@ -9,6 +9,6 @@ int xenstore_read_fe_uint64(struct XenDevice *xendev, const 
>> char *node,
>>  uint64_t *uval);
>>  void xenstore_update_fe(char *watch, struct XenDevice *xendev);
>>  
>> -void xen_be_frontend_changed(struct XenDevice *xendev, const char *node);
>> +void xen_fe_frontend_changed(struct XenDevice *xendev, const char *node);
>>  
>>  #endif /* QEMU_HW_XEN_FRONTEND_H */
>>
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 15/15] xen: Rename xen_be_frontend_changed

2016-10-04 Thread Paolo Bonzini


On 04/10/2016 08:43, Emil Condrea wrote:
> xen_be_frontend_changed -> xen_fe_frontend_changed

This is not correct.  The front-end is implemented in the guest domain,
while the back-end is implemented in the dom0 or stubdom.

This function processes *in the backed* a notification that the frontend
state changed, hence the name should be xen_be_frontend_changed.

Paolo

> Signed-off-by: Emil Condrea 
> ---
>  hw/xen/xen_backend.c  | 2 +-
>  hw/xen/xen_frontend.c | 4 ++--
>  include/hw/xen/xen_frontend.h | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
> index 30d3aaa..b79e83e 100644
> --- a/hw/xen/xen_backend.c
> +++ b/hw/xen/xen_backend.c
> @@ -213,7 +213,7 @@ static int xen_be_try_setup(struct XenDevice *xendev)
>  xen_be_set_state(xendev, XenbusStateInitialising);
>  
>  xen_be_backend_changed(xendev, NULL);
> -xen_be_frontend_changed(xendev, NULL);
> +xen_fe_frontend_changed(xendev, NULL);
>  return 0;
>  }
>  
> diff --git a/hw/xen/xen_frontend.c b/hw/xen/xen_frontend.c
> index 1407f5f..761688b 100644
> --- a/hw/xen/xen_frontend.c
> +++ b/hw/xen/xen_frontend.c
> @@ -39,7 +39,7 @@ int xenstore_read_fe_uint64(struct XenDevice *xendev, const 
> char *node,
>  return xenstore_read_uint64(xendev->fe, node, uval);
>  }
>  
> -void xen_be_frontend_changed(struct XenDevice *xendev, const char *node)
> +void xen_fe_frontend_changed(struct XenDevice *xendev, const char *node)
>  {
>  int fe_state;
>  
> @@ -85,6 +85,6 @@ void xenstore_update_fe(char *watch, struct XenDevice 
> *xendev)
>  }
>  node = watch + len + 1;
>  
> -xen_be_frontend_changed(xendev, node);
> +xen_fe_frontend_changed(xendev, node);
>  xen_be_check_state(xendev);
>  }
> diff --git a/include/hw/xen/xen_frontend.h b/include/hw/xen/xen_frontend.h
> index bb0bc23..2a5f03f 100644
> --- a/include/hw/xen/xen_frontend.h
> +++ b/include/hw/xen/xen_frontend.h
> @@ -9,6 +9,6 @@ int xenstore_read_fe_uint64(struct XenDevice *xendev, const 
> char *node,
>  uint64_t *uval);
>  void xenstore_update_fe(char *watch, struct XenDevice *xendev);
>  
> -void xen_be_frontend_changed(struct XenDevice *xendev, const char *node);
> +void xen_fe_frontend_changed(struct XenDevice *xendev, const char *node);
>  
>  #endif /* QEMU_HW_XEN_FRONTEND_H */
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/2] Xen HVM unplug changes

2016-09-01 Thread Paolo Bonzini


On 01/09/2016 14:11, Olaf Hering wrote:
> Update unplug in Xen HVM guests to cover more cases.
> Please review.
> 
> Olaf
> 
> Olaf Hering (2):
>   xen_platform: unplug also SCSI disks
>   xen_platform: SUSE xenlinux unplug for emulated PCI
> 
>  hw/i386/xen/xen_platform.c | 36 +++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
> 

Looks good with the checkpatch complaints fixed.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Regression with commit 095497ffc66b7f031

2016-07-15 Thread Paolo Bonzini


On 15/07/2016 12:41, Juergen Gross wrote:
> On 15/07/16 12:35, Paolo Bonzini wrote:
>>
>>
>> On 15/07/2016 12:12, Gerd Hoffmann wrote:
>>> On Fr, 2016-07-15 at 12:02 +0200, Paolo Bonzini wrote:
>>>>
>>>> On 15/07/2016 10:47, Juergen Gross wrote:
>>>>> Nothing scaring and no real difference between working and not working
>>>>> variant.
>>>>>
>>>>> Meanwhile I've been digging a little bit deeper and found the reason:
>>>>> libxenstore is setting up a reader thread which is waiting for the
>>>>> watch to fire. With above commit the stack size of that thread (16kB)
>>>>> is too small. Setting it to 32kB made qemu work again.
>>>>
>>>> This makes very little sense (not your fault)...  The commit doesn't
>>>> change stack usage at all, TLS should not be on the stack.
>>>>
>>>> Can you capture a backtrace where the 16K stack is exceeded?  Perhaps
>>>> it's only due to inlining decision on the compiler, in which case
>>>> Peter's patch from today is only a bandaid.
>>>
>>> Hmm, I guess I hold off the vnc pull request for now ...
>>
>> Go ahead.  I looked at glibc source code and the patch is okay.
> 
> Paolo, do you know of any interface to obtain the size of the TLS area
> taken from the stack (before calling pthread_create() )? 

https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01643.html has a patch
that _removes_ code to do this from the golang runtime.  The comments
there say that only with glibc before version 2.16 the static TLS size
is taken out of the stack size...

What version of glibc are you using?

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Regression with commit 095497ffc66b7f031

2016-07-15 Thread Paolo Bonzini


On 15/07/2016 12:12, Gerd Hoffmann wrote:
> On Fr, 2016-07-15 at 12:02 +0200, Paolo Bonzini wrote:
>>
>> On 15/07/2016 10:47, Juergen Gross wrote:
>>> Nothing scaring and no real difference between working and not working
>>> variant.
>>>
>>> Meanwhile I've been digging a little bit deeper and found the reason:
>>> libxenstore is setting up a reader thread which is waiting for the
>>> watch to fire. With above commit the stack size of that thread (16kB)
>>> is too small. Setting it to 32kB made qemu work again.
>>
>> This makes very little sense (not your fault)...  The commit doesn't
>> change stack usage at all, TLS should not be on the stack.
>>
>> Can you capture a backtrace where the 16K stack is exceeded?  Perhaps
>> it's only due to inlining decision on the compiler, in which case
>> Peter's patch from today is only a bandaid.
> 
> Hmm, I guess I hold off the vnc pull request for now ...

Go ahead.  I looked at glibc source code and the patch is okay.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Regression with commit 095497ffc66b7f031

2016-07-15 Thread Paolo Bonzini


On 15/07/2016 10:47, Juergen Gross wrote:
> Nothing scaring and no real difference between working and not working
> variant.
> 
> Meanwhile I've been digging a little bit deeper and found the reason:
> libxenstore is setting up a reader thread which is waiting for the
> watch to fire. With above commit the stack size of that thread (16kB)
> is too small. Setting it to 32kB made qemu work again.

This makes very little sense (not your fault)...  The commit doesn't
change stack usage at all, TLS should not be on the stack.

Can you capture a backtrace where the 16K stack is exceeded?  Perhaps
it's only due to inlining decision on the compiler, in which case
Peter's patch from today is only a bandaid.

Thanks,

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] exec: Fix qemu_ram_block_from_host for Xen

2016-06-13 Thread Paolo Bonzini


On 09/06/2016 17:56, Anthony PERARD wrote:
> Since f615f39 (exec: remove ram_addr argument from
> qemu_ram_block_from_host), migration under Xen is likely to fail, with a
> SEGV of QEMU. But the commit only reveal a bug with the calculation of
> the offset value in qemu_ram_block_from_host().
> 
> This patch calculates the offset from the ram_addr as
> qemu_ram_addr_from_host() will later calculate the ram_addr from the
> offset.
> 
> Signed-off-by: Anthony PERARD <anthony.per...@citrix.com>
> ---
>  exec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/exec.c b/exec.c
> index f2c9e37..f13106d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1935,7 +1935,7 @@ RAMBlock *qemu_ram_block_from_host(void *ptr, bool 
> round_offset,
>  ram_addr = xen_ram_addr_from_mapcache(ptr);
>  block = qemu_get_ram_block(ram_addr);
>  if (block) {
> -*offset = (host - block->host);
> +*offset = ram_addr - block->offset;
>  }
>  rcu_read_unlock();
>  return block;
> 

Acked-by: Paolo Bonzini <pbonz...@redhat.com>

Feel free to send a pull request yourself!

Thanks, and sorry for the breakage.  Indeed the broken code comes from
commit 422148d3e56c3c9a07c0cf36c1e0a0b76f09c357.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] xen-hvm: ignore background I/O sections

2016-05-25 Thread Paolo Bonzini


- Original Message -
> From: "Anthony PERARD" <anthony.per...@citrix.com>
> To: "Paul Durrant" <paul.durr...@citrix.com>
> Cc: qemu-de...@nongnu.org, xen-de...@lists.xenproject.org, "Stefano 
> Stabellini" <sstabell...@kernel.org>, "Paolo
> Bonzini" <pbonz...@redhat.com>
> Sent: Wednesday, May 25, 2016 5:52:32 PM
> Subject: Re: [PATCH v3] xen-hvm: ignore background I/O sections
> 
> On Mon, May 09, 2016 at 05:31:20PM +0100, Paul Durrant wrote:
> > Since Xen will correctly handle accesses to unimplemented I/O ports (by
> > returning all 1's for reads and ignoring writes) there is no need for
> > QEMU to register backgroud I/O sections.
> > 
> > This patch therefore adds checks to xen_io_add/del so that sections with
> > memory-region ops pointing at 'unassigned_io_ops' are ignored.
> > 
> > Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> > Cc: Stefano Stabellini <sstabell...@kernel.org>
> > Cc: Anthony Perard <anthony.per...@citrix.com>
> > Cc: Paolo Bonzini <pbonz...@redhat.com>
> 
> Acked-by: Anthony PERARD <anthony.per...@citrix.com>

Do you have a signed GPG key or do I need to apply this for you?

Thanks,

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86, locking: Remove ticket (spin)lock implementation

2016-05-19 Thread Paolo Bonzini


On 18/05/2016 21:34, Peter Zijlstra wrote:
>> I don't know of any enterprise distro that is shipping anything
>> > more modern than 4.1?
> RHEL 7-- v3.10
> SLES 12   -- v3.12
> Debian Jessie -- v3.16
> Ubuntu 16.04 LTS  -- v4.4
> 
> But waiting for the major enterprise distros (RHEL/SLES) would mean
> another decade or so before people start using it. We don't usually wait
> this long for anything.

We're looking at converting a few specific spinlocks to qspinlock in
RHEL, though we cannot convert all of them due to the spin_lock_t ABI.
It won't get into customer's hands for a while of course.

Thanks,

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen-hvm: ignore background I/O sections

2016-05-09 Thread Paolo Bonzini


On 09/05/2016 18:18, Paul Durrant wrote:
> Since Xen will correctly handle accesses to unimplemented I/O ports (by
> returning all 1's for reads and ignoring writes) there is no need for
> QEMU to register backgroud I/O sections.
> 
> This patch therefore adds checks to xen_io_add/del so that sections with
> memory-region ops pointing at 'unassigned_io_ops' are ignored.
> 
> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> Cc: Stefano Stabellini <sstabell...@kernel.org>
> Cc: Anthony Perard <anthony.per...@citrix.com>
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  xen-hvm.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/xen-hvm.c b/xen-hvm.c
> index 039680a..8ab44f0 100644
> --- a/xen-hvm.c
> +++ b/xen-hvm.c
> @@ -510,8 +510,12 @@ static void xen_io_add(MemoryListener *listener,
> MemoryRegionSection *section)
>  {
>  XenIOState *state = container_of(listener, XenIOState, io_listener);
> +MemoryRegion *mr = section->mr;
>  
> -memory_region_ref(section->mr);
> +if (mr->ops == _io_ops)
> +return;

Missing braces, same in xen_io_del.  Otherwise looks ok.

Paolo

> +memory_region_ref(mr);
>  
>  xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
>  }
> @@ -520,10 +524,14 @@ static void xen_io_del(MemoryListener *listener,
> MemoryRegionSection *section)
>  {
>  XenIOState *state = container_of(listener, XenIOState, io_listener);
> +MemoryRegion *mr = section->mr;
> +
> +if (mr->ops == _io_ops)
> +return;
>  
>  xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
>  
> -memory_region_unref(section->mr);
> +memory_region_unref(mr);
>  }
>  
>  static void xen_device_realize(DeviceListener *listener,
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Overlaped PIO with multiple ioreq_server (Xen4.6.1)

2016-05-09 Thread Paolo Bonzini


On 09/05/2016 18:14, Paul Durrant wrote:
>> -Original Message-
>> From: Paul Durrant
>> Sent: 09 May 2016 17:02
>> To: Paul Durrant; Paolo Bonzini; Martin Cerveny
>> Cc: xen-de...@lists.xensource.com; George Dunlap
>> Subject: RE: [Xen-devel] Overlaped PIO with multiple ioreq_server
>> (Xen4.6.1)
>>
>>> -Original Message-
>>> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
>>> Paul Durrant
>>> Sent: 09 May 2016 14:00
>>> To: Paolo Bonzini; Martin Cerveny
>>> Cc: xen-de...@lists.xensource.com; George Dunlap
>>> Subject: Re: [Xen-devel] Overlaped PIO with multiple ioreq_server
>>> (Xen4.6.1)
>>>
>>>> -Original Message-
>>>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>>>> Sent: 09 May 2016 13:56
>>>> To: Paul Durrant; Martin Cerveny
>>>> Cc: George Dunlap; xen-de...@lists.xensource.com
>>>> Subject: Re: [Xen-devel] Overlaped PIO with multiple ioreq_server
>>>> (Xen4.6.1)
>>>>
>>>>
>>>>
>>>> On 28/04/2016 13:25, Paul Durrant wrote:
>>>>>> Maybe you are lucky, qemu is registered before your own demu
>>>>>> emulator.
>>>>>
>>>>> I guess I was lucky.
>>>>
>>>> Yeah, QEMU has been doing that since 2013 (commit 3bb28b7, "memory:
>>>> Provide separate handling of unassigned io ports accesses", 2013-09-05).
>>>>
>>>>>> I used for testing your "demu" 2 years ago, now extending Citrix
>>>>>> "vgpu", all was fine up to xen 4.5.2 (with qemu 2.0.2) but
>>>>>> problem begin when I switched to 4.6.1 (with qemu 2.2.1), but it
>>>>>> maybe lucky timing in registration.
>>>>>
>>>>> I think Xen should really be spotting range overlaps like this, but
>>>>> the QEMU<->Xen interface will clearly need to be fixed to avoid the
>>>>> over-claiming of I/O ports like this.
>>>>
>>>> If the handling of unassigned I/O ports is sane in Xen (in QEMU they
>>>> return all ones and discard writes),
>>>
>>> Yes, it does exactly that.
>>>
>>>> it would be okay to make the
>>>> background 0-65535 range conditional on !xen_enabled().  See
>>>> memory_map_init() in QEMU's exec.c file.
>>>>
>>>
>>> Cool. Thanks for the tip. Will have a look at that now.
>>>
>>
>> Looks like creation of the background range is required. (Well, when I simply
>> #if 0-ed out creating it QEMU crashed on invocation). So, I guess I need to 
>> be
>> able to spot, from the memory listener callback in Xen, when a background
>> range is being added so it can be ignored. Same actually goes for memory as
>> well as I/O, since Xen will handle access to unimplemented MMIO ranges in a
>> similar fashion.
>>
> 
> In fact, this patch seems to do the trick for I/O:
> 
> diff --git a/xen-hvm.c b/xen-hvm.c
> index 039680a..8ab44f0 100644
> --- a/xen-hvm.c
> +++ b/xen-hvm.c
> @@ -510,8 +510,12 @@ static void xen_io_add(MemoryListener *listener,
> MemoryRegionSection *section)
>  {
>  XenIOState *state = container_of(listener, XenIOState, io_listener);
> +MemoryRegion *mr = section->mr;
> 
> -memory_region_ref(section->mr);
> +if (mr->ops == _io_ops)
> +return;
> +
> +memory_region_ref(mr);
> 
>  xen_map_io_section(xen_xc, xen_domid, state->ioservid, section);
>  }
> @@ -520,10 +524,14 @@ static void xen_io_del(MemoryListener *listener,
> MemoryRegionSection *section)
>  {
>  XenIOState *state = container_of(listener, XenIOState, io_listener);
> +MemoryRegion *mr = section->mr;
> +
> +if (mr->ops == _io_ops)
> +return;
> 
>  xen_unmap_io_section(xen_xc, xen_domid, state->ioservid, section);
> 
> -memory_region_unref(section->mr);
> +memory_region_unref(mr);
>  }

Looks good, feel free to Cc me if you send it to qemu-devel (though I'll
let Anthony merge it).

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Overlaped PIO with multiple ioreq_server (Xen4.6.1)

2016-05-09 Thread Paolo Bonzini


On 28/04/2016 13:25, Paul Durrant wrote:
>> Maybe you are lucky, qemu is registered before your own demu
>> emulator.
> 
> I guess I was lucky.

Yeah, QEMU has been doing that since 2013 (commit 3bb28b7, "memory:
Provide separate handling of unassigned io ports accesses", 2013-09-05).

>> I used for testing your "demu" 2 years ago, now extending Citrix
>> "vgpu", all was fine up to xen 4.5.2 (with qemu 2.0.2) but
>> problem begin when I switched to 4.6.1 (with qemu 2.2.1), but it
>> maybe lucky timing in registration.
> 
> I think Xen should really be spotting range overlaps like this, but
> the QEMU<->Xen interface will clearly need to be fixed to avoid the
> over-claiming of I/O ports like this.

If the handling of unassigned I/O ports is sane in Xen (in QEMU they
return all ones and discard writes), it would be okay to make the
background 0-65535 range conditional on !xen_enabled().  See
memory_map_init() in QEMU's exec.c file.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Does __KERNEL_DS serve a purpose?

2016-04-08 Thread Paolo Bonzini


On 08/04/2016 18:00, Andy Lutomirski wrote:
> But %ss can be loaded with 0 on 64-bit kernels.  (I assume that
> loading 0 into %ss sets SS.DPL to 0 if done at CPL0, but I'm vague on
> this, since it only really matters to hypervisor code AFAIK.)

It's even simpler, unless CPL=0 SS cannot be loaded with 0 while in a
64-bit code segment (SS can never be loaded with 0 if you're not in a
64-bit code segment).

Thus indeed SS=0 implies SS.DPL=0 on 64-bit kernels.

Paolo

> 32-bit kernels need __KERNEL_DS, I think.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/5] x86/paravirt: Add paravirt_{read, write}_msr

2016-03-15 Thread Paolo Bonzini


On 14/03/2016 17:53, Andy Lutomirski wrote:
> > Please do the same for KVM.
>
> Can you clarify?  KVM uses the native version, and the native version
> only oopses with this series applied if panic_on_oops is set.

Since you fixed the panic_on_oops thing, I'm okay with letting KVM use
the unsafe version and seeing what breaks...

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/5] x86/paravirt: Add paravirt_{read, write}_msr

2016-03-15 Thread Paolo Bonzini


On 14/03/2016 18:02, Andy Lutomirski wrote:
> On Mon, Mar 14, 2016 at 9:58 AM, Linus Torvalds
>  wrote:
>>
>> On Mar 14, 2016 9:53 AM, "Andy Lutomirski"  wrote:
>>>
>>> Can you clarify?  KVM uses the native version, and the native version
>>> only oopses with this series applied if panic_on_oops is set.
>>
>> Can we please remove that idiocy?
>>
>> There is no reason to panic whatsoever. Seriously. What's the upside of that
>> logic?
> 
> I imagine that people who set panic_on_oops want their systems to stop
> running user code if something happens that could corrupt the state or
> if there's any sign that user code is trying some non-deterministic
> exploit.  So I'm guessing that they'd want this type of "the kernel
> screwed up -- abort" to actually result in a panic.
> 
> As a concrete, although somewhat silly, example, suppose that a write
> to MSR_SYSENTER_STACK fails.  If that happened, then user code could
> subsequently try to take over the kernel by evil manipulation of TF
> and/or perf.
> 
> I'd be okay with removing this too, though, since arranging for MSR
> access to fail seems unlikely as an exploit vector.
> 
> Borislav: SUSE actually uses panic_on_oops, right?  What's their goal?

RHEL also does, and it's mostly to trap kernel page faults before they
do more damage such as filesystem corruption.  The debug kernel has
panic_on_oops=0, while the production kernel has panic_on_oops=1.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/5] x86/paravirt: Add paravirt_{read, write}_msr

2016-03-14 Thread Paolo Bonzini


On 11/03/2016 20:06, Andy Lutomirski wrote:
> This adds paravirt hooks for unsafe MSR access.  On native, they
> call native_{read,write}_msr.  On Xen, they use
> xen_{read,write}_msr_safe.
> 
> Nothing uses them yet for ease of bisection.  The next patch will
> use them in rdmsrl, wrmsrl, etc.
> 
> I intentionally didn't make them OOPS on #GP on Xen.  I think that
> should be done separately by the Xen maintainers.

Please do the same for KVM.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] qemu-upstream compile failure in intel_iommu.c:vtd_context_device_invalidate

2016-01-28 Thread Paolo Bonzini


On 27/01/2016 19:23, Olaf Hering wrote:
> 
> xen.git/tools/qemu-xen-dir/hw/i386/intel_iommu.c: In function 
> ‘vtd_context_device_invalidate’:
> xen.git/tools/qemu-xen-dir/hw/i386/intel_iommu.c:911:46: error: ‘mask’ may be 
> used uninitialized in this function [-Werror=maybe-uninitialized]
>  if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
>   ^
> It works with -O2. From the code flow its clear that mask is always
> initialized. Looks like gcc 5.2.1 does no proper diagnostic at -O1.
> What should be done with such issues, are they fixed in the code?

It's probably simplest to add a

default:
abort();

to the switch statement above.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 10/11] Add Error **errp for xen_pt_config_init()

2016-01-22 Thread Paolo Bonzini


On 21/01/2016 18:01, Stefano Stabellini wrote:
> -XEN_PT_LOG(>dev, "Failed to initialize %d/%ld reg 
> 0x%x in grp_type=0x%x (%d/%ld), rc=%d\n",
> -   j, 
> ARRAY_SIZE(xen_pt_emu_reg_grps[i].emu_regs),
> -   regs->offset, 
> xen_pt_emu_reg_grps[i].grp_type,
> -   i, ARRAY_SIZE(xen_pt_emu_reg_grps), rc);
> +xen_pt_config_reg_init(s, reg_grp_entry, regs, );
> +if (err) {
> +error_append_hint(, "Failed to initialize %d/%zu"
> +" reg 0x%x in grp_type = 0x%x (%d/%zu)",
> +j, 
> ARRAY_SIZE(xen_pt_emu_reg_grps[i].emu_regs),

Coverity noticed a preexisting problem here.  emu_regs is a pointer,
thus ARRAY_SIZE doesn't return what you expect.

Paolo

> +regs->offset, 
> xen_pt_emu_reg_grps[i].grp_type,
> +i, ARRAY_SIZE(xen_pt_emu_reg_grps));
> +error_propagate(errp, err);

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Regression: Xen guest with 5G of RAM on 32bit fail to boot

2015-12-02 Thread Paolo Bonzini


On 01/12/2015 18:53, Anthony PERARD wrote:
> The problem is in qemu_ram_alloc_internal() where 'size' and 'maxsize' are
> now been truncate to 32bit, due to 'qemu_host_page_size' been an uintptr_t
> in the HOST_PAGE_ALIGN macro.

Isn't it qemu_host_page_mask that causes the problem?

This should also work, as it causes qemu_host_page_mask to sign-extend:

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index f9998b9..87a4145 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -174,11 +174,10 @@ extern unsigned long reserved_va;
 #define TARGET_PAGE_MASK ~(TARGET_PAGE_SIZE - 1)
 #define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & 
TARGET_PAGE_MASK)
 
-/* ??? These should be the larger of uintptr_t and target_ulong.  */
 extern uintptr_t qemu_real_host_page_size;
-extern uintptr_t qemu_real_host_page_mask;
+extern intptr_t qemu_real_host_page_mask;
 extern uintptr_t qemu_host_page_size;
-extern uintptr_t qemu_host_page_mask;
+extern intptr_t qemu_host_page_mask;
 
 #define HOST_PAGE_ALIGN(addr) (((addr) + qemu_host_page_size - 1) & 
qemu_host_page_mask)
 #define REAL_HOST_PAGE_ALIGN(addr) (((addr) + qemu_real_host_page_size - 1) & \
diff --git a/translate-all.c b/translate-all.c
index a940bd2..7a15109 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -118,7 +118,7 @@ typedef struct PageDesc {
 #define V_L1_SHIFT (L1_MAP_ADDR_SPACE_BITS - TARGET_PAGE_BITS - V_L1_BITS)
 
 uintptr_t qemu_host_page_size;
-uintptr_t qemu_host_page_mask;
+intptr_t qemu_host_page_mask;
 
 /* The bottom level has pointers to PageDesc */
 static void *l1_map[V_L1_SIZE];
@@ -326,14 +326,14 @@ void page_size_init(void)
 /* NOTE: we can always suppose that qemu_host_page_size >=
TARGET_PAGE_SIZE */
 qemu_real_host_page_size = getpagesize();
-qemu_real_host_page_mask = ~(qemu_real_host_page_size - 1);
+qemu_real_host_page_mask = -(intptr_t)qemu_real_host_page_size;
 if (qemu_host_page_size == 0) {
 qemu_host_page_size = qemu_real_host_page_size;
 }
 if (qemu_host_page_size < TARGET_PAGE_SIZE) {
 qemu_host_page_size = TARGET_PAGE_SIZE;
 }
-qemu_host_page_mask = ~(qemu_host_page_size - 1);
+qemu_host_page_mask = -(intptr_t)qemu_host_page_size;
 }
 
 static void page_init(void)
diff --git a/translate-common.c b/translate-common.c
index 619feb4..171222d 100644
--- a/translate-common.c
+++ b/translate-common.c
@@ -21,7 +21,7 @@
 #include "qom/cpu.h"
 
 uintptr_t qemu_real_host_page_size;
-uintptr_t qemu_real_host_page_mask;
+intptr_t qemu_real_host_page_mask;
 
 #ifndef CONFIG_USER_ONLY
 /* mask must never be zero, except for A20 change call */

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Regression: Xen guest with 5G of RAM on 32bit fail to boot

2015-12-02 Thread Paolo Bonzini


On 02/12/2015 11:30, Paolo Bonzini wrote:
> diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
> index f9998b9..87a4145 100644
> --- a/include/exec/cpu-all.h
> +++ b/include/exec/cpu-all.h
> @@ -174,11 +174,10 @@ extern unsigned long reserved_va;
>  #define TARGET_PAGE_MASK ~(TARGET_PAGE_SIZE - 1)
>  #define TARGET_PAGE_ALIGN(addr) (((addr) + TARGET_PAGE_SIZE - 1) & 
> TARGET_PAGE_MASK)
>  
> -/* ??? These should be the larger of uintptr_t and target_ulong.  */
>  extern uintptr_t qemu_real_host_page_size;
> -extern uintptr_t qemu_real_host_page_mask;
> +extern intptr_t qemu_real_host_page_mask;
>  extern uintptr_t qemu_host_page_size;
> -extern uintptr_t qemu_host_page_mask;
> +extern intptr_t qemu_host_page_mask;
>  
>  #define HOST_PAGE_ALIGN(addr) (((addr) + qemu_host_page_size - 1) & 
> qemu_host_page_mask)
>  #define REAL_HOST_PAGE_ALIGN(addr) (((addr) + qemu_real_host_page_size - 1) 
> & \
> diff --git a/translate-all.c b/translate-all.c
> index a940bd2..7a15109 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -118,7 +118,7 @@ typedef struct PageDesc {
>  #define V_L1_SHIFT (L1_MAP_ADDR_SPACE_BITS - TARGET_PAGE_BITS - V_L1_BITS)
>  
>  uintptr_t qemu_host_page_size;
> -uintptr_t qemu_host_page_mask;
> +intptr_t qemu_host_page_mask;
>  
>  /* The bottom level has pointers to PageDesc */
>  static void *l1_map[V_L1_SIZE];
> @@ -326,14 +326,14 @@ void page_size_init(void)
>  /* NOTE: we can always suppose that qemu_host_page_size >=
> TARGET_PAGE_SIZE */
>  qemu_real_host_page_size = getpagesize();
> -qemu_real_host_page_mask = ~(qemu_real_host_page_size - 1);
> +qemu_real_host_page_mask = -(intptr_t)qemu_real_host_page_size;
>  if (qemu_host_page_size == 0) {
>  qemu_host_page_size = qemu_real_host_page_size;
>  }
>  if (qemu_host_page_size < TARGET_PAGE_SIZE) {
>  qemu_host_page_size = TARGET_PAGE_SIZE;
>  }
> -qemu_host_page_mask = ~(qemu_host_page_size - 1);
> +qemu_host_page_mask = -(intptr_t)qemu_host_page_size;
>  }
>  
>  static void page_init(void)
> diff --git a/translate-common.c b/translate-common.c
> index 619feb4..171222d 100644
> --- a/translate-common.c
> +++ b/translate-common.c
> @@ -21,7 +21,7 @@
>  #include "qom/cpu.h"
>  
>  uintptr_t qemu_real_host_page_size;
> -uintptr_t qemu_real_host_page_mask;
> +intptr_t qemu_real_host_page_mask;
>  
>  #ifndef CONFIG_USER_ONLY
>  /* mask must never be zero, except for A20 change call */
> 
> 

Ok, I tested this by adding

+ assert(HOST_PAGE_ALIGN(0x123456700ll) == 0x123457000ll);
+ assert(REAL_HOST_PAGE_ALIGN(0x123456700ll) == 0x123457000ll);

and doing a 32-bit x86_64-linux-user build.  Since Dave's patch does not
compile for user-mode emulation (ram_addr_t is a softmmu concept), I'm
queuing my patch for 2.5.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

2015-11-19 Thread Paolo Bonzini


On 19/11/2015 09:40, Gerd Hoffmann wrote:
>> > But this code should be
>> > minor to be maintained in libvirt.
> As far I know libvirt only needs to discover those devices.  If they
> look like sr/iov devices in sysfs this might work without any changes to
> libvirt.

I don't think they will look like SR/IOV devices.

The interface may look a little like the sysfs interface that GVT-g is
already using.  However, it should at least be extended to support
multiple vGPUs in a single VM.  This might not be possible for Intel
integrated graphics, but it should definitely be possible for discrete
graphics cards.

Another nit is that the VM id should probably be replaced by a UUID
(because it's too easy to stumble on an existing VM id), assuming a VM
id is needed at all.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

2015-11-19 Thread Paolo Bonzini


On 19/11/2015 16:32, Stefano Stabellini wrote:
> > In addition to Kevin's replies, I have a high-level question: can VFIO
> > be used by QEMU for both KVM and Xen?
> 
> No. VFIO cannot be used with Xen today. When running on Xen, the IOMMU
> is owned by Xen.

I don't think QEMU command line compatibility between KVM and Xen should
be a design goal for GVT-g.

Nevertheless, it shouldn't be a problem to use a "virtual" VFIO (which
doesn't need the IOMMU, because it uses the MMU in the physical GPU)
even under Xen.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Qemu/Xen: Fix early freeing MSIX MMIO memory region

2015-10-12 Thread Paolo Bonzini


On 12/10/2015 13:09, Stefano Stabellini wrote:
> On Sun, 11 Oct 2015, Lan Tianyu wrote:
>> From: >
>>
>> msix->mmio is added to XenPCIPassthroughState's object as property.
>> object_finalize_child_property is called for XenPCIPassthroughState's
>> object, which calls object_property_del_all, which is going to try to
>> delete msix->mmio. object_finalize_child_property() will access
>> msix->mmio's obj. But the whole msix struct has already been freed
>> by xen_pt_msix_delete. This will cause segment fault when msix->mmio
>> has been overwritten.
>>
>> This patch is to fix the issue.
>>
>> Signed-off-by: Lan Tianyu 
> 
> Looks good to me. Paolo?

Also looks good to me.  Thanks!

Paolo

>>  hw/xen/xen_pt.c |8 
>>  hw/xen/xen_pt.h |1 +
>>  hw/xen/xen_pt_config_init.c |2 +-
>>  hw/xen/xen_pt_msi.c |   13 -
>>  4 files changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
>> index 2b54f52..aa96288 100644
>> --- a/hw/xen/xen_pt.c
>> +++ b/hw/xen/xen_pt.c
>> @@ -938,10 +938,18 @@ static void xen_pci_passthrough_class_init(ObjectClass 
>> *klass, void *data)
>>  dc->props = xen_pci_passthrough_properties;
>>  };
>>  
>> +static void xen_pci_passthrough_finalize(Object *obj)
>> +{
>> +XenPCIPassthroughState *s = XEN_PT_DEVICE(obj);
>> +
>> +xen_pt_msix_delete(s);
>> +}
>> +
>>  static const TypeInfo xen_pci_passthrough_info = {
>>  .name = TYPE_XEN_PT_DEVICE,
>>  .parent = TYPE_PCI_DEVICE,
>>  .instance_size = sizeof(XenPCIPassthroughState),
>> +.instance_finalize = xen_pci_passthrough_finalize,
>>  .class_init = xen_pci_passthrough_class_init,
>>  };
>>  
>> diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
>> index 3bc22eb..c545280 100644
>> --- a/hw/xen/xen_pt.h
>> +++ b/hw/xen/xen_pt.h
>> @@ -305,6 +305,7 @@ void xen_pt_msi_disable(XenPCIPassthroughState *s);
>>  
>>  int xen_pt_msix_init(XenPCIPassthroughState *s, uint32_t base);
>>  void xen_pt_msix_delete(XenPCIPassthroughState *s);
>> +void xen_pt_msix_unmap(XenPCIPassthroughState *s);
>>  int xen_pt_msix_update(XenPCIPassthroughState *s);
>>  int xen_pt_msix_update_remap(XenPCIPassthroughState *s, int bar_index);
>>  void xen_pt_msix_disable(XenPCIPassthroughState *s);
>> diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
>> index 4a5bc11..0efee11 100644
>> --- a/hw/xen/xen_pt_config_init.c
>> +++ b/hw/xen/xen_pt_config_init.c
>> @@ -2079,7 +2079,7 @@ void xen_pt_config_delete(XenPCIPassthroughState *s)
>>  
>>  /* free MSI/MSI-X info table */
>>  if (s->msix) {
>> -xen_pt_msix_delete(s);
>> +xen_pt_msix_unmap(s);
>>  }
>>  g_free(s->msi);
>>  
>> diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c
>> index e3d7194..82de2bc 100644
>> --- a/hw/xen/xen_pt_msi.c
>> +++ b/hw/xen/xen_pt_msi.c
>> @@ -610,7 +610,7 @@ error_out:
>>  return rc;
>>  }
>>  
>> -void xen_pt_msix_delete(XenPCIPassthroughState *s)
>> +void xen_pt_msix_unmap(XenPCIPassthroughState *s)
>>  {
>>  XenPTMSIX *msix = s->msix;
>>  
>> @@ -627,6 +627,17 @@ void xen_pt_msix_delete(XenPCIPassthroughState *s)
>>  }
>>  
>>  memory_region_del_subregion(>bar[msix->bar_index], >mmio);
>> +}
>> +
>> +void xen_pt_msix_delete(XenPCIPassthroughState *s)
>> +{
>> +XenPTMSIX *msix = s->msix;
>> +
>> +if (!msix) {
>> +return;
>> +}
>> +
>> +object_unparent(OBJECT(>mmio));
>>  
>>  g_free(s->msix);
>>  s->msix = NULL;
>> -- 
>> 1.7.9.5
>>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-22 Thread Paolo Bonzini


On 21/09/2015 19:43, Andy Lutomirski wrote:
> And maybe the KVM user return notifier.

No, not really.  If anything, the place in KVM where it makes a
difference is vmx_save_host_state, which is also only using
always-present MSRs.  But don't care about KVM.

First clean it up, then we can add back inline versions like __rdmsr or
rdmsr_fault or rdmsr_unsafe or whatever.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-21 Thread Paolo Bonzini


On 21/09/2015 10:46, Ingo Molnar wrote:
> Or we could extend exception table entry encoding to include a 'warning bit', 
> to 
> not bloat the kernel. If the exception handler code encounters such an 
> exception 
> it would generate a one-time warning for that entry, but otherwise not crash 
> the 
> kernel and continue execution with an all-zeroes result for the MSR read.

The 'warning bit' already exists, it is the opcode that caused the fault. :)

The concern about bloat is a good one.  However, why is it necessary to
keep native_*_msr* inline?  If they are moved out-of-line, using the
exception table becomes the obvious solution and doesn't cause bloat
anymore.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] KVM: x86: trap AMD MSRs for the TSeg base and mask

2015-09-18 Thread Paolo Bonzini
These have roughly the same purpose as the SMRR, which we do not need
to implement in KVM.  However, Linux accesses MSR_K8_TSEG_ADDR at
boot, which causes problems when running a Xen dom0 under KVM.
Just return 0, meaning that processor protection of SMRAM is not
in effect.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
 arch/x86/include/asm/msr-index.h | 1 +
 arch/x86/kvm/x86.c   | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index c1c0a1c14344..b98b471a3b7e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -331,6 +331,7 @@
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK0x1800
 #define MSR_K8_TSEG_ADDR   0xc0010112
+#define MSR_K8_TSEG_MASK   0xc0010113
 #define K8_MTRRFIXRANGE_DRAM_ENABLE0x0004 /* MtrrFixDramEn bit*/
 #define K8_MTRRFIXRANGE_DRAM_MODIFY0x0008 /* MtrrFixDramModEn bit */
 #define K8_MTRR_RDMEM_WRMEM_MASK   0x18181818 /* Mask: RdMem|WrMem*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6bbb0dfb99d0..991466bf8dee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2190,6 +2190,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_LASTINTFROMIP:
case MSR_IA32_LASTINTTOIP:
case MSR_K8_SYSCFG:
+   case MSR_K8_TSEG_ADDR:
+   case MSR_K8_TSEG_MASK:
case MSR_K7_HWCR:
case MSR_VM_HSAVE_PA:
case MSR_K8_INT_PENDING_MSG:
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] rdmsr_safe in Linux PV (under Xen) gets an #GP:Re: [Fedora-xen] Running fedora xen on top of KVM?

2015-09-18 Thread Paolo Bonzini


On 18/09/2015 15:54, Borislav Petkov wrote:
> On Thu, Sep 17, 2015 at 01:23:31PM -0700, Andy Lutomirski wrote:
>> > Cc: Borislav.  Is TSEG guaranteed to exist?  Can we defer that until
> Why not? It is the tseg base address.
> 
> I think this is kvm injecting a #GP as it is not ignoring this MSR.
> Presumably modprobing kvm with "ignore_msrs=1" or so should hide the
> issue IIUC...

We should return 0 for the tseg MSRs.  KVM always handles them in the
northbridge emulation code, like Intel chipsets do.  Will send a patch.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Paolo Bonzini


On 17/09/2015 11:31, Borislav Petkov wrote:
> 
>> > Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes 
>> > are 
>> > non-critical and returning the 'safe' result is much better than crashing 
>> > or 
>> > hanging the bootup.
> ... and prepending all MSR accesses with feature/CPUID checks is probably 
> almost
> impossible.

That's not a big deal, that's what *_safe is for.  The problem is that
there are definitely some cases where the *_safe version is not being used.

I agree with Ingo that we should start with a WARN.  For example:

- give the read_msr and write_msr hooks the same prototype as the safe
variants

- make the virt platforms always return "no error" for the unsafe
variants (I understand if your first reaction is "ouch", but this
effectively is already the current behavior)

- change rdmsr/wrmsr/rdmsrl/wrmsrl to WARN if the read_msr and write_msr
hooks return an error

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Paolo Bonzini


On 17/09/2015 10:58, Peter Zijlstra wrote:
> But the far greater problem I have with the whole virt thing is that
> you cannot use rdmsr_safe() to probe if an MSR exists at all because, as
> you told me, these virt thingies return 0 for all 'unknown' MSRs instead
> of faulting.

At least for KVM, that behavior is opt-in (the ignore_msrs parameter)
and no distro that I know enables it by default.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Paolo Bonzini


On 17/09/2015 17:26, Andy Lutomirski wrote:
> Maybe Paolo can fix QEMU to fail bad MSR accesses for real...

I was afraid of someone proposing exactly that. :)

I can do it since the list of MSRs can be lifted from KVM.  Let's first
see the direction these patches go...

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Paolo Bonzini


On 17/09/2015 17:31, Arjan van de Ven wrote:
>>
>> What about 0 + WARN?
> 
> why 0?
> 
> 0xdeadbeef or any other pattern (even 0x3636363636) makes more sense (of
> course also WARN... but most folks don't read dmesg for WARNs)
> 
> (it's the same thing we do for list or slab poison stuff)

Sorry, brain fart.  That makes sense for the safe variants.  It would
require some auditing though.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 21/29] xen/pt: Sync up the dev.config and data values.

2015-09-15 Thread Paolo Bonzini


On 15/09/2015 15:28, Stefano Stabellini wrote:
> 
>> >  2). Send an follow up patch to change this to abort()? (and wherever else 
>> > I used
>> >  assert(..)?
> Yes, that would be good.
> 
> 
>> >  3). Wait till Paolo is done going through the patchset and then revisit 
>> > 1) or 2)?
> I don't know if Paolo has any more comments.

No, the few comments I had were only prompted by Coverity.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 0/19] xen-2015-09-08-tag

2015-09-14 Thread Paolo Bonzini


On 10/09/2015 12:29, Stefano Stabellini wrote:
> +if (lseek(config_fd, pos, SEEK_SET) != pos) {
> +return -errno;
> +}
>  do {
> -rc = pread(config_fd, (uint8_t *), len, pos);
> +rc = read(config_fd, (uint8_t *), len);
>  } while (rc < 0 && (errno == EINTR || errno == EAGAIN));

This leaks config_fd.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PULL 21/29] xen/pt: Sync up the dev.config and data values.

2015-09-14 Thread Paolo Bonzini


On 10/09/2015 19:15, Stefano Stabellini wrote:
> +
> +switch (reg->size) {
> +case 1: rc = xen_host_pci_get_byte(>real_device, offset, (uint8_t 
> *));

A bit ugly, and it relies on the host being little endian.

> +break;
> +case 2: rc = xen_host_pci_get_word(>real_device, offset, 
> (uint16_t *));

Same here.

> +break;
> +case 4: rc = xen_host_pci_get_long(>real_device, offset, );
> +break;
> +default: assert(1);

This should be assert(0) or, better, abort().

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [edk2] EDK II & GPL - Re: OVMF BoF @ KVM Forum 2015

2015-09-10 Thread Paolo Bonzini


On 10/09/2015 16:24, Kevin Davis wrote:
> Further leading me to guess that any actual use of those
> implementations could lead to you actually needing to hire a real
> attorney and not one that you find on YouTube.

The good thing is that attorneys have already figured it out.  IBM
figured out a few years ago how to work around Microsoft's patents, and
that's how FAT32 (and more specifically long file names) are implemented
in Linux.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [edk2] EDK II & GPL - Re: OVMF BoF @ KVM Forum 2015

2015-09-09 Thread Paolo Bonzini
Well, FatPkg is only superficially permissive and not even open source, so 
there is a precedent. (A precedent that, I might add, happens to violate 
SourceForge's the off service).

When we import edk2 into Fedora we just remove FatBinPkg. We would think twice 
before contributing to it, but do not make any kind of fuss about it.

GPL is just the same. For example, it would be possible to have an 
automatically-updated git repo that omits the GPL directory; and development 
would then be easier for people whose legal departments tend not to influence 
the engineers' productivity.

In fact:

1) it is not like, among non-Intel contributors, proprietary software companies 
have the lion's share of edk2 commits, and they probably use Tiano releases. 
Intel could strip any GPL pieces as part of the Tiano release process.

2) the GPL is working just fine for Linux, which is not that different from 
UEFI. So, picture me skeptical. If anything, what Linux can teach edk2 is that 
a closed prices and balkanized trees are a direct cause of the abysmal security 
of those implementations.

Paolo

-Original Message-
From: El-Haj-Mahmoud, Samer [samer.el-haj-mahm...@hpe.com]
Received: mercoledì, 09 set 2015, 21:12
To: Jordan Justen [jordan.l.jus...@intel.com]; Andrew Fish [af...@apple.com]
CC: Lenny Szubowicz [lenn...@redhat.com]; Karen Noel [kn...@redhat.com]; Ard 
Biesheuvel [ard.biesheu...@linaro.org]; edk2-devel-01 [edk2-de...@ml01.01.org]; 
Reza Jelveh [reza.jel...@tuhh.de]; Alexander Graf [ag...@suse.de]; qemu devel 
list [qemu-de...@nongnu.org]; Hannes Reinecke [h...@suse.de]; Gabriel L. Somlo 
(GMail) [gso...@gmail.com]; Peter Jones [pjo...@redhat.com]; Peter Batard 
[p...@akeo.ie]; Gerd Hoffmann [kra...@redhat.com]; Cole Robinson 
[crobi...@redhat.com]; Paolo Bonzini [pbonz...@redhat.com]; 
xen-devel@lists.xen.org [xen-devel@lists.xen.org]; Laszlo Ersek 
[ler...@redhat.com]; Ademar de Souza Reis Jr. [ar...@redhat.com]
Subject: RE: [edk2] EDK II & GPL - Re:  OVMF BoF @ KVM Forum 2015

The recent expansions beyond BSD where all permissive licenses (BSD like) as 
far as I can tell.

I agree with Andrew, opening the door for GPL licensed code in EDK2 will have 
severe consequences for products that are built using EDK2. 



-Original Message-
From: edk2-devel [mailto:edk2-devel-boun...@lists.01.org] On Behalf Of Jordan 
Justen
Sent: Wednesday, September 09, 2015 12:58 PM
To: Andrew Fish <af...@apple.com>
Cc: Lenny Szubowicz <lenn...@redhat.com>; Karen Noel <kn...@redhat.com>; Ard 
Biesheuvel <ard.biesheu...@linaro.org>; edk2-devel-01 
<edk2-de...@lists.01.org>; Reza Jelveh <reza.jel...@tuhh.de>; Alexander Graf 
<ag...@suse.de>; qemu devel list <qemu-de...@nongnu.org>; Hannes Reinecke 
<h...@suse.de>; Gabriel L. Somlo (GMail) <gso...@gmail.com>; Peter Jones 
<pjo...@redhat.com>; Peter Batard <p...@akeo.ie>; Gerd Hoffmann 
<kra...@redhat.com>; Cole Robinson <crobi...@redhat.com>; Paolo Bonzini 
<pbonz...@redhat.com>; xen-devel@lists.xen.org; Laszlo Ersek 
<ler...@redhat.com>; Ademar de Souza Reis Jr. <ar...@redhat.com>
Subject: Re: [edk2] EDK II & GPL - Re: OVMF BoF @ KVM Forum 2015

On 2015-09-09 10:04:50, Andrew Fish wrote:
> 
> > On Sep 9, 2015, at 9:17 AM, Jordan Justen <jordan.l.jus...@intel.com> wrote:
> > 
> > So, related to this, I wonder how the community would feel about a 
> > GplDriverPkg. Would the community allow it as a new package in EDK 
> > II directly, or would a separate repo be required?
> > 
> 
> I think we would need a separate repo, like the FAT driver. That is 
> the only way to deal with the license issues.

There doesn't seem to be any guiding rules here. For example, I think some 
people are not comfortable with the FatBinPkg being in the tree due to the 
license. Why is that okay?

> > With regards to adding it directly into the EDK II tree, here are 
> > some potential concerns that I might anticipate hearing from the community:
> > 
> > * It will make it easier for contributors to choose GPL compared to 
> > a  permissive license, thereby limiting some users of the 
> > contribution
> > 
> > * GPL code will more easily be copied into the permissively licensed  
> > packages
> > 
> > * Some might refuse to look at EDK II entirely if it has a directory  
> > with GPL source code in it
> > 
> 
> Or have their rights to contribute revoked since this is a fundamental 
> change, and would require employees to get reauthorized by their legal 
> departments to contribute.

We've recently expanded beyond just allowing BSD code into the tree, and that 
appeared to be no big deal. No one brought this up as a fundamental change.

Just to be clear, are you saying Apple probably won't be able to contribute to 
EDK II if there is a

Re: [Xen-devel] [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-29 Thread Paolo Bonzini


On 29/07/2015 02:47, Andy Lutomirski wrote:
   If new kernels ignore the IOMMU for devices that don't set the flag
   and there are physical devices that already exist and don't set the
   flag, then those devices won't work reliably on most modern
   non-virtual platforms, PPC included.
 
  Are there many virtio physical devices out there ? We are talking about
  a virtio flag right ? Or have you been considering something else ?

 Yes, virtio flag.  I dislike having a virtio flag at all, but so far
 no one has come up with any better ideas.  If there was a reliable,
 cross-platform mechanism for per-device PCI bus properties, I'd be all
 for using that instead.

No, a virtio flag doesn't make sense.

Blindly using system memory is a bug in QEMU; it has to be fixed to use
the right address space, and then whatever the system provides to
describe the right address space can be used (like the DMAR table on x86).

On PPC I suppose you could use the host bridge's device tree?  If you
need a hook, you can add a

bool virtio_should_bypass_iommu(void)
{
/* lookup something in the device tree?!? */
}
EXPORT_SYMBOL_GPL(virtio_should_bypass_iommu);

in some pseries.c file, and in the driver:

static bool virtio_bypass_iommu(void)
{
bool (*fn)(void);

fn = symbol_get(virtio_should_bypass_iommu);
return fn  fn();
}

Awful, but that's what this thing is.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Paolo Bonzini


On 28/07/2015 03:08, Andy Lutomirski wrote:
 On Mon, Sep 1, 2014 at 10:39 AM, Andy Lutomirski l...@amacapital.net wrote:
 This fixes virtio on Xen guests as well as on any other platform
 that uses virtio_pci on which physical addresses don't match bus
 addresses.

 This can be tested with:

 virtme-run --xen xen --kimg arch/x86/boot/bzImage --console

 using virtme from here:

 https://git.kernel.org/cgit/utils/kernel/virtme/virtme.git

 Without these patches, the guest hangs forever.  With these patches,
 everything works.

 
 Dusting off an ancient thread.
 
 Now that the dust has accumulated^Wsettled, is it worth pursuing this?
  I think the situation is considerably worse than it was when I
 originally wrote these patches: I think that QEMU now supports a nasty
 mode in which the guest's PCI bus appears to be behind an IOMMU but
 the virtio devices on that bus punch straight through that IOMMU.

That is an experimental feature (it's x-iommu), so it can change.

The plan was:

- for PPC, virtio never honors IOMMU

- for non-PPC, either have virtio always honor IOMMU, or enforce that
virtio is not under IOMMU.

Paolo

 I have a half-hearted port to modern kernels here:
 
 https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=virtio_ring_xen
 
 I didn't implement DMA API access for virtio_pci_modern, and I have no
 idea what to do about detecting whether a given virtio device honors
 its IOMMU or not.
 
 --Andy
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Paolo Bonzini


On 28/07/2015 12:12, Benjamin Herrenschmidt wrote:
  That is an experimental feature (it's x-iommu), so it can change.
  
  The plan was:
  
  - for PPC, virtio never honors IOMMU
  
  - for non-PPC, either have virtio always honor IOMMU, or enforce that
  virtio is not under IOMMU.
  
 I dislike having PPC special cased.
 
 In fact, today x86 guests also assume that virtio bypasses IOMMU I
 believe. In fact *all* guests do.

This doesn't matter much, since the only guests that implement an IOMMU
in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
of stability.

 I would much prefer if the information as to whether it honors or not
 gets passed to the guest somewhat. My preference goes for passing it via
 the virtio config space but there were objections that it should be a
 bus property (which is tricky to do with PCI and doesn't properly
 reflect the fact that in qemu you can mix  match IOMMU-honoring devices
 and bypassing-virtio on the same bus). 

Yes, for example on x86 it must be passed through the DMAR table.
virtio-pci device must have a separate DRHD for them.  In QEMU, you
could add an under-iommu property to PCI bridges, and walk the
hierarchy of bridges to build the DRHDs.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Paolo Bonzini


On 28/07/2015 15:11, Jan Kiszka wrote:
 
  This doesn't matter much, since the only guests that implement an IOMMU
  in QEMU are (afaik) PPC and x86, and x86 does not yet promise any kind
  of stability.
  
  Hmm I think Jan (cc) said it was already used out there.
 Yes, no known issues with vt-d emulation for almost a year now. Error
 reporting could be improved, and interrupt remapping is still missing,
 but those are minor issues in this context.

On the other hand interrupt remapping is absolutely necessary for
production use, hence my point that x86 does not promise API stability.

(Any kind of stability actually didn't include crashes; those are not
expected :))

The Google patches for userspace PIC and IOAPIC are proceeding well, so
hopefully we can have interrupt remapping soon.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Paolo Bonzini


On 28/07/2015 18:42, Jan Kiszka wrote:
  On the other hand interrupt remapping is absolutely necessary for
  production use, hence my point that x86 does not promise API stability.
 
 Well, we currently implement the features that the Q35 used to expose.
 Adding interrupt remapping will require a new chipset and/or a hack
 switch to ignore compatibility.

Isn't the VT-d register space separate from other Q35 features and
backwards-compatible?  You could even add it to PIIX in theory just by
adding a DMAR.

It's not like for example SMRAM, where the registers are in the
northbridge configuration space and move around in every chipset generation.

  (Any kind of stability actually didn't include crashes; those are not
  expected :))
  
  The Google patches for userspace PIC and IOAPIC are proceeding well, so
  hopefully we can have interrupt remapping soon.
 
 If the day had 48 hours... I'd love to look into this, first adding QEMU
 support for the new irqchip architecture.

I hope I can squeeze in some time for that...  Google also had an intern
that was looking at it.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 0/4] virtio: Clean up scatterlists and use the DMA API

2015-07-28 Thread Paolo Bonzini


On 28/07/2015 19:19, Jan Kiszka wrote:
 On 2015-07-28 19:15, Paolo Bonzini wrote:


 On 28/07/2015 18:42, Jan Kiszka wrote:
 On the other hand interrupt remapping is absolutely necessary for
 production use, hence my point that x86 does not promise API stability.

 Well, we currently implement the features that the Q35 used to expose.
 Adding interrupt remapping will require a new chipset and/or a hack
 switch to ignore compatibility.

 Isn't the VT-d register space separate from other Q35 features and
 backwards-compatible?  You could even add it to PIIX in theory just by
 adding a DMAR.
 
 Yes, it's practically working, but it's not accurate /wrt how that
 hardware looked like in reality.

We've done that for a long time.  Real PIIX3 didn't have ACPI too, for
example (and it had a USB UHCI that is optional in QEMU).

Of course I'm not advocating adding the IOMMU to PIIX (assuming that
would work even just practically)... but I don't think adding interrupt
remapping to Q35 is a big deal.  It would be optional, just in case you
want to debug something without interrupt remapping, but it can be added.

 The Google patches for userspace PIC and IOAPIC are proceeding well, so
 hopefully we can have interrupt remapping soon.

 If the day had 48 hours... I'd love to look into this, first adding QEMU
 support for the new irqchip architecture.

 I hope I can squeeze in some time for that...  Google also had an intern
 that was looking at it.
 
 Great!

In theory it's easy with the latest series.  All you need is support for
converting IOAPIC routes to KVM routes (and of course the glue code to
enable the capability and create the userspace devices); everything else
should work just by reusing the -machine kernel_irqchip=on code.  In
theory...

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] qemu mainline regression with xen-unstable: unable to start QMP

2015-06-05 Thread Paolo Bonzini
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256



On 05/06/2015 12:11, Stefano Stabellini wrote:
 
 Hopefully I will get to a change to Xen.  However getting the
 Xen change back-ported to enough version(s) will not be
 quick...
 Yeah, this is basically an ABI breakage.
 
 We have been trying to keep QEMU and Xen working together
 reasonably well without the need to specify that only QEMU newer
 than XXX or older than YYY work with Xen 4.x. This change would
 make it very difficult to do so.

It definitely is a 2.4 release blocker, no worries.

Paolo
-BEGIN PGP SIGNATURE-
Version: GnuPG v2

iQEcBAEBCAAGBQJVcXj/AAoJEL/70l94x66DIwQH+gPYUvr5FQMftCFHWApiyBhx
oqq5HjpOZ3ZAcYy24QwbBSGj4WDMP2k5A7TuMZ4c/IcG8YUIjs222cOczgsFBxI4
evu+4QP7bAoVqGEbAfpuGTM70iGeoRzV3dT8ZffRLHPZk6230CvdEwY0w5Mj9fAF
nAhLpY4Z/fm5iezFYTAEPC/B6xgp60s4ShYaXJSulmiwHsV+qjneokDPYTnd8Qku
XZ7gmdnnRO3iaBq8czsuanK+xi1+vRY4VSJRRaIaQT6ud7yMfKil7Xmhl9oKuOj9
IATQki7SgZiTc1/KZL+QwVUP80+UfV85lvb2SutoCHD/qTZiHXUbJB99k0ynrUM=
=ay2H
-END PGP SIGNATURE-

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Do not emulate a floppy drive when -nodefaults

2015-05-15 Thread Paolo Bonzini


On 15/05/2015 09:50, Markus Armbruster wrote:

 --nodefaults must continue to disable all optional parts of the board.
 
 What exactly is optional is for the board / machine type to define.  It
 can't be changed once the machine type is released.
 
 When in doubt, make it optional.

I agree entirely.  This unfortunately applies only to future boards.

 Since Q35 is just starting to become migratable, the time to painlessly
 change its optional parts is *now*.

I can buy this.

Let's just not divert attention from more important stuff.  I may be
more inclined to accept patches, if we add something to our security
policy about disliking cute backronyms that attract bad journalism like
magnets.  And I'm only half joking; more like only 1% joking.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [PATCH] Do not emulate a floppy drive when -nodefaults

2015-05-14 Thread Paolo Bonzini


On 14/05/2015 14:02, Markus Armbruster wrote:
   It should certainly be off for pc-q35-2.4 and newer.  Real Q35 boards
   commonly don't have an FDC (depends on the Super I/O chip used).
 
   We may want to keep it off for pc-i440fx-2.4 and newer.  I doubt
   there's a real i440FX without an FDC, but our virtual i440FX is quite
   unlike a real one in other ways already.

That would break libvirt for people upgrading from 2.3 to 2.4.  So it's
more like pc-i440fx-3.0 and pc-q35-3.0.  Unless for q35 we decide to
break everything and retroactively nuke the controller.

(I'm still not sure why we have backwards-compatible machine types for q35).

Paolo

 * Create the FDC only if the option is on.
 
 * Optional: make -drive if=floppy,... auto-enable it
 
   I wouldn't bother doing the same for -global isa-fdc.driveA=... and
   such.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [PATCH] Do not emulate a floppy drive when -nodefaults

2015-05-14 Thread Paolo Bonzini


On 14/05/2015 14:45, Markus Armbruster wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 
 On 14/05/2015 14:02, Markus Armbruster wrote:
   It should certainly be off for pc-q35-2.4 and newer.  Real Q35 boards
   commonly don't have an FDC (depends on the Super I/O chip used).

   We may want to keep it off for pc-i440fx-2.4 and newer.  I doubt
   there's a real i440FX without an FDC, but our virtual i440FX is quite
   unlike a real one in other ways already.

 That would break libvirt for people upgrading from 2.3 to 2.4.  So it's
 more like pc-i440fx-3.0 and pc-q35-3.0.
 
 What exactly breaks when?

libvirt expects -nodefaults -drive if=none,id=fdd0,... -global
isa-fdc.driveA=fdd0 to result in a machine with a working FDD.  It
doesn't know that it has to add -machine fdc=on.

Besides, adding a new machine option is not the best we can do.  If the
default is no FDC, all that is needed to add one back is -device.  An
FDC is yet another ISA device, it is possible to create one with -device.

 add the magic to make -global isa-fdc... auto-set the option to on.

That would be ugly magic.

The more I think about this, the more I think this is just a kneejerk
reaction to a sensationalist announcement.  The effect of this
vulnerability on properly configured data centers (running
non-prehistoric versions of Xen or KVM and using
stubdom/SELinux/AppArmor properly) should be really close to zero.

It's a storm in a tea cup.

Paolo

  Unless for q35 we decide to
 break everything and retroactively nuke the controller.

 (I'm still not sure why we have backwards-compatible machine types for q35).
 
 Beats me :)
 
 [...]
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [PATCH] Do not emulate a floppy drive when -nodefaults

2015-05-14 Thread Paolo Bonzini


On 14/05/2015 15:25, Sander Eikelenboom wrote:
 I tend to kindly disagree if you look at the broader perspective, yes it's 
 could 
 be a storm in a tea cup, but there seems to be a pattern.
 
 From a cmdline user / platform emulation point of view i can imagine that 
 some sort of 
 auto configuration / bundling in platforms is necessary (to limit the length 
 of 
 the cmdline to be supplied).
 
 But from a library / toolstack point of view it's quite nasty if *all* more 
 or 
 less obscure options have to actively disabled. It's quite undoable, not to 
 mention what
 happens when new ones get added. 

Where do you stop?

Do you want to disable ACPI by default?  It's a relatively large amount
of code, but for most modern guests it is necessary.

Besides, removing just the floppy drive makes little sense.  The
following devices remain with -nodefaults:

- an HPET

- the power management device, which includes an I2C controller

- an IDE controller

- a keyboard controller

- the magic VMware I/O port

- the PC speaker (!)

- the legacy PIT

- the real-time clock

- the two PICs and IOAPIC

Of all these, most guests will only use the PM device (maybe) and a
small subset of the RTC.  At the very least, the IDE controller, PC
speaker and the HPET should be removed by -nodefaults-i-mean-it.  If
you're using UEFI firmware, probably you could remove everything except
the PM device---maybe the RTC and the IOAPIC.  At this point this is not
-nodefaults-i-mean-it, it's a different machine type altogether and it's
remarkably different from a PC.

Reducing the attack surface is always nice, but hypervisor and device
model bugs are going to exist forever.  The amount of code for legacy
devices is all in all not that big, and I can hardly recall other
vulnerabilities in there.  This is why I say it's a knee-jerk reaction.

It is the intrinsic problem of virtualization mentioned in the famous
quote of Theo de Raadt(*).  Even if you do PV like Xen or Hyper-V, you
have less or no QEMU code but you throw more hypervisor code in
untrusted hands (hypervisor bugs exist, e.g. CVE-2012-4539,
CVE-2012-5510 or CVE-2013-1964).

(*) Which of course misses the point of virtualization
altogether.  Once you have established that you need
more density, choosing containers vs. virtualization
is a gamble on whether you prefer more moving parts and
more layers that have to be broken (more isolation), or
fewer moving parts and less isolation.

Paolo

 From this PoV it would be better to have to actively enable all the stuff you 
 want.
 
 At least Xen seemed to have taken the no-defaults as the best option to get 
 there so far, but that doesn't seem to 
 
 It's not the first CVE that has come from this you have to actively disable 
 all 
 you don't want to happen and probably won't be the last.
 
 So a -no-defaults-now-for-real option/mode for libraries / toolstacks, that 
 requires them to be very verbose, explicit and specific in what they actually 
 want, could be valuable.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [PATCH] Do not emulate a floppy drive when -nodefaults

2015-05-14 Thread Paolo Bonzini


On 14/05/2015 16:39, Stefano Stabellini wrote:
 On Thu, 14 May 2015, Paolo Bonzini wrote:
 On 14/05/2015 15:25, Sander Eikelenboom wrote:
 I tend to kindly disagree if you look at the broader perspective, yes it's 
 could 
 be a storm in a tea cup, but there seems to be a pattern.

 From a cmdline user / platform emulation point of view i can imagine 
 that some sort of 
 auto configuration / bundling in platforms is necessary (to limit the 
 length of 
 the cmdline to be supplied).

 But from a library / toolstack point of view it's quite nasty if *all* more 
 or 
 less obscure options have to actively disabled. It's quite undoable, not to 
 mention what
 happens when new ones get added. 

 Where do you stop?
 
 At this stage I would be happy enough if no floppy drives were actually
 emulated when the user asks for none.

Floppy drives aren't being emulated; the controller is.  Same for IDE,
so why single out the FDD controller?

 Sure, but it is harder to write a device emulator in a secure fashion
 than a brand new PV interface, that can be designed with security in
 mind from scratch. The number of VM escaping CVEs affecting Xen PV
 interfaces is extremely low, I think it was just two since the start of
 the project.

Sure; OTOH, treating hypervisor and QEMU escapes would be very silly, if
QEMU has been properly protected (through any combination of stubdoms,
LSM, seccomp, ...).

Paolo

 
 
 From this PoV it would be better to have to actively enable all the stuff 
 you want.

 At least Xen seemed to have taken the no-defaults as the best option to 
 get 
 there so far, but that doesn't seem to 

 It's not the first CVE that has come from this you have to actively 
 disable all 
 you don't want to happen and probably won't be the last.

 So a -no-defaults-now-for-real option/mode for libraries / toolstacks, 
 that 
 requires them to be very verbose, explicit and specific in what they 
 actually 
 want, could be valuable.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Qemu-devel] [PATCH] Do not emulate a floppy drive when -nodefaults

2015-05-14 Thread Paolo Bonzini


On 14/05/2015 13:47, Michael S. Tsirkin wrote:
  I would be OK with a new property too, as we could set it from
  libxl or libvirt. Anybody would be happy to pick this one up or should I
  do it?

 Pls go ahead, I can merge it in the pc tree.

Note that because of the -drive if=none,id=fdd1 -global
isa-fdc.driveA=fdd1 trick to create a floppy drive without if=floppy,
the default should remain on---at least for a few releases---even if
-nodefaults is passed.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] Add a property to disable the floppy controller

2015-05-14 Thread Paolo Bonzini


On 14/05/2015 16:41, Stefano Stabellini wrote:
 Do not change default settings for any machines at the moment: this
 patch does not introduce any changes in behavior unless the user asks
 for no-fdc=on.

Can we avoid the double negative?

Paolo

 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 ---
  hw/i386/pc.c |   27 ---
  include/hw/i386/pc.h |2 ++
  vl.c |4 
  3 files changed, 30 insertions(+), 3 deletions(-)
 
 diff --git a/hw/i386/pc.c b/hw/i386/pc.c
 index f31d55e..501609b 100644
 --- a/hw/i386/pc.c
 +++ b/hw/i386/pc.c
 @@ -1450,10 +1450,14 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq 
 *gsi,
  cpu_exit_irq = qemu_allocate_irqs(cpu_request_exit, NULL, 1);
  DMA_init(0, cpu_exit_irq);
  
 -for(i = 0; i  MAX_FD; i++) {
 -fd[i] = drive_get(IF_FLOPPY, 0, i);
 +*floppy = NULL;
 +if (!object_property_get_bool(qdev_get_machine(),
 +PC_MACHINE_NO_FDC, error_abort)) {
 +for (i = 0; i  MAX_FD; i++) {
 +fd[i] = drive_get(IF_FLOPPY, 0, i);
 +}
 +*floppy = fdctrl_init_isa(isa_bus, fd);
  }
 -*floppy = fdctrl_init_isa(isa_bus, fd);
  }
  
  void pc_nic_init(ISABus *isa_bus, PCIBus *pci_bus)
 @@ -1797,6 +1801,20 @@ static bool pc_machine_get_aligned_dimm(Object *obj, 
 Error **errp)
  return pcms-enforce_aligned_dimm;
  }
  
 +static bool pc_machine_get_no_fdc(Object *obj, Error **errp)
 +{
 +PCMachineState *pcms = PC_MACHINE(obj);
 +
 +return pcms-no_fdc;
 +}
 +
 +static void pc_machine_set_no_fdc(Object *obj, bool value, Error **errp)
 +{
 +PCMachineState *pcms = PC_MACHINE(obj);
 +
 +pcms-no_fdc = value;
 +}
 +
  static void pc_machine_initfn(Object *obj)
  {
  PCMachineState *pcms = PC_MACHINE(obj);
 @@ -1820,6 +1838,9 @@ static void pc_machine_initfn(Object *obj)
  object_property_add_bool(obj, PC_MACHINE_ENFORCE_ALIGNED_DIMM,
   pc_machine_get_aligned_dimm,
   NULL, NULL);
 +object_property_add_bool(obj, PC_MACHINE_NO_FDC,
 + pc_machine_get_no_fdc, pc_machine_set_no_fdc,
 + NULL);
  }
  
  static void pc_machine_class_init(ObjectClass *oc, void *data)
 diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
 index 69d9cf8..93b2442 100644
 --- a/include/hw/i386/pc.h
 +++ b/include/hw/i386/pc.h
 @@ -41,6 +41,7 @@ struct PCMachineState {
  uint64_t max_ram_below_4g;
  OnOffAuto vmport;
  bool enforce_aligned_dimm;
 +bool no_fdc;
  };
  
  #define PC_MACHINE_ACPI_DEVICE_PROP acpi-device
 @@ -48,6 +49,7 @@ struct PCMachineState {
  #define PC_MACHINE_MAX_RAM_BELOW_4G max-ram-below-4g
  #define PC_MACHINE_VMPORT   vmport
  #define PC_MACHINE_ENFORCE_ALIGNED_DIMM enforce-aligned-dimm
 +#define PC_MACHINE_NO_FDC no-fdc
  
  /**
   * PCMachineClass:
 diff --git a/vl.c b/vl.c
 index 91411c1..81d80ae 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -387,6 +387,10 @@ static QemuOptsList qemu_machine_opts = {
  .name = iommu,
  .type = QEMU_OPT_BOOL,
  .help = Set on/off to enable/disable Intel IOMMU (VT-d),
 +},{
 +.name = PC_MACHINE_NO_FDC,
 +.type = QEMU_OPT_BOOL,
 +.help = do not emulate a floppy controller,
  },
  { /* End of list */ }
  },
 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen-pt: Fix bug cause PCI devices re-attach failed

2015-04-15 Thread Paolo Bonzini


On 15/04/2015 16:14, Li, Liang Z wrote:
 Yes,  it's the right place. Put aside the bug fix,  I think the 
 memory_region_ref/unref pair
  should be move to xen_pt_region_update after the conditional as you point 
 out.
  Do you think so?

It would make sense, but I was just guessing... I'm still not sure
whether that causes any difference, also because I'm not familiar with
the Xen PCI passthrough code.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen-pt: Fix bug cause PCI devices re-attach failed

2015-04-13 Thread Paolo Bonzini


On 13/04/2015 16:12, Liang Li wrote:
 2. Do the attach and detach operation with a time interval. eg. 10s.
 
 The error message will not  disappear if retry, in this case, it's
 a bug.
 
 In the 'xen_pt_region_add' and 'xen_pt_region_del', we should only care
 about the 'xen-pci-pt-*' memory region, this can avoid the region's
 reference count is not equal with the dereference count when the
 device is detached and prevent the device's related QemuOpts object from
 being released properly, and then trigger the bug when the device is
 re-attached.

This doesn't explain _which_ region is causing the bug and how.

Assuming this is the right fix, should you instead move the
memory_region_ref/unref pair from xen_pt_region_add/del after this
conditional:

if (bar == -1  (!s-msix || s-msix-mmio != mr)) {
return;
}

in xen_pt_region_update?

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] ARM: KVM/XEN: how should we support virt-what?

2015-03-25 Thread Paolo Bonzini


On 25/03/2015 10:44, Andrew Jones wrote:
 2) create a specific DT node that will get exposed through sysfs, or
somewhere.

Looking at custom DT nodes is what PowerPC does.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3 1/5] Qemu-Xen-vTPM: Support for Xen stubdom vTPM command line options

2015-01-19 Thread Paolo Bonzini


On 31/12/2014 00:02, Quan Xu wrote:
 Signed-off-by: Quan Xu quan...@intel.com
 ---
  configure| 14 ++
  hmp.c|  7 +++
  qapi-schema.json | 19 ---
  qemu-options.hx  | 13 +++--
  tpm.c|  7 ++-
  5 files changed, 54 insertions(+), 6 deletions(-)
 
 diff --git a/configure b/configure
 index a9e4d49..d63b8a1 100755
 --- a/configure
 +++ b/configure
 @@ -2942,6 +2942,16 @@ else
  fi
  
  ##
 +# TPM xenstubdoms is only on x86 Linux
 +
 +if test $targetos = Linux  test $cpu = i386 -o $cpu = x86_64  \
 +   test $xen = yes; then
 +  tpm_xenstubdoms=$tpm
 +else
 +  tpm_xenstubdoms=no
 +fi
 +
 +##
  # attr probe
  
  if test $attr != no ; then
 @@ -4333,6 +4343,7 @@ echo gcov  $gcov_tool
  echo gcov enabled  $gcov
  echo TPM support   $tpm
  echo libssh2 support   $libssh2
 +echo TPM xenstubdoms   $tpm_xenstubdoms
  echo TPM passthrough   $tpm_passthrough
  echo QOM debugging $qom_cast_debug
  echo vhdx  $vhdx
 @@ -4810,6 +4821,9 @@ if test $tpm = yes; then
if test $tpm_passthrough = yes; then
  echo CONFIG_TPM_PASSTHROUGH=y  $config_host_mak
fi
 +  if test $tpm_xenstubdoms = yes; then
 +echo CONFIG_TPM_XENSTUBDOMS=y  $config_host_mak
 +  fi
  fi
  
  echo TRACE_BACKENDS=$trace_backends  $config_host_mak
 diff --git a/hmp.c b/hmp.c
 index 63d7686..1df3ec7 100644
 --- a/hmp.c
 +++ b/hmp.c
 @@ -689,6 +689,7 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict)
  Error *err = NULL;
  unsigned int c = 0;
  TPMPassthroughOptions *tpo;
 +TPMXenstubdomsOptions *txo;
  
  info_list = qmp_query_tpm(err);
  if (err) {
 @@ -718,6 +719,12 @@ void hmp_info_tpm(Monitor *mon, const QDict *qdict)
 tpo-has_cancel_path ? ,cancel-path= : ,
 tpo-has_cancel_path ? tpo-cancel_path : );
  break;
 +case TPM_TYPE_OPTIONS_KIND_XENSTUBDOMS:
 +txo = ti-options-xenstubdoms;
 +if (!txo) {
 +monitor_printf(mon, null TPMXenstubdomsOptions error!\n);
 +}
 +break;

No need for the first four lines of code after the case.

  case TPM_TYPE_OPTIONS_KIND_MAX:
  break;
  }
 diff --git a/qapi-schema.json b/qapi-schema.json
 index 24379ab..9745c2b 100644
 --- a/qapi-schema.json
 +++ b/qapi-schema.json
 @@ -2854,9 +2854,10 @@
  #
  # @passthrough: TPM passthrough type
  #
 -# Since: 1.5
 +# @xenstubdoms: TPM xenstubdoms type (since 2.3)## Since 1.5

Cut-and-paste mistake.

Paolo

 +#
  ##
 -{ 'enum': 'TpmType', 'data': [ 'passthrough' ] }
 +{ 'enum': 'TpmType', 'data': [ 'passthrough', 'xenstubdoms' ] }
  
  ##
  # @query-tpm-types:
 @@ -2884,6 +2885,16 @@
  { 'type': 'TPMPassthroughOptions', 'data': { '*path' : 'str',
   '*cancel-path' : 'str'} }
  
 +# @TPMXenstubdomsOptions:
 +#
 +# Information about the TPM xenstubdoms type
 +#
 +# Since: 2.3
 +##
 +{ 'type': 'TPMXenstubdomsOptions', 'data': {  } }
 +#
 +##
 +
  ##
  # @TpmTypeOptions:
  #
 @@ -2894,7 +2905,9 @@
  # Since: 1.5
  ##
  { 'union': 'TpmTypeOptions',
 -   'data': { 'passthrough' : 'TPMPassthroughOptions' } }
 +  'data': { 'passthrough' : 'TPMPassthroughOptions',
 +'xenstubdoms' : 'TPMXenstubdomsOptions' } }
 +##
  
  ##
  # @TpmInfo:
 diff --git a/qemu-options.hx b/qemu-options.hx
 index 1e7d5b8..fd73f57 100644
 --- a/qemu-options.hx
 +++ b/qemu-options.hx
 @@ -2485,7 +2485,8 @@ DEF(tpmdev, HAS_ARG, QEMU_OPTION_tpmdev, \
  -tpmdev passthrough,id=id[,path=path][,cancel-path=path]\n
  use path to provide path to a character device; default 
 is /dev/tpm0\n
  use cancel-path to provide path to TPM's cancel sysfs 
 entry; if\n
 -not provided it will be searched for in 
 /sys/class/misc/tpm?/device\n,
 +not provided it will be searched for in 
 /sys/class/misc/tpm?/device\n
 +-tpmdev xenstubdoms,id=id\n,
  QEMU_ARCH_ALL)
  STEXI
  
 @@ -2495,7 +2496,8 @@ The general form of a TPM device option is:
  @item -tpmdev @var{backend} ,id=@var{id} [,@var{options}]
  @findex -tpmdev
  Backend type must be:
 -@option{passthrough}.
 +@option{passthrough}, or
 +@option{xenstubdoms}.
  
  The specific backend type will determine the applicable options.
  The @code{-tpmdev} option creates the TPM backend and requires a
 @@ -2545,6 +2547,13 @@ To create a passthrough TPM use the following two 
 options:
  Note that the @code{-tpmdev} id is @code{tpm0} and is referenced by
  @code{tpmdev=tpm0} in the device option.
  
 +To create a xenstubdoms TPM use the following two options:
 +@example
 +-tpmdev xenstubdoms,id=tpm0 -device tpm-tis,tpmdev=tpm0
 +@end example
 +Note that the @code{-tpmdev} id is @code{tpm0} and is referenced by
 +@code{tpmdev=tpm0} in the device option.
 

Re: [Xen-devel] [PATCH] xen-pt: Fix PCI devices re-attach failed

2015-01-12 Thread Paolo Bonzini


On 12/01/2015 14:35, Li, Liang Z wrote:
 
 diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c index c1bf357..f2893b2 100644
 --- a/hw/xen/xen_pt.c
 +++ b/hw/xen/xen_pt.c
 @@ -736,7 +736,7 @@ static int xen_pt_initfn(PCIDevice *d)
  }
  
  out:
 -memory_listener_register(s-memory_listener, address_space_memory);
 +memory_listener_register(s-memory_listener, 
 + s-dev.bus_master_as);
  memory_listener_register(s-io_listener, address_space_io);
  XEN_PT_LOG(d,
 Real physical device %02x:%02x.%d registered 
 successfully!\n,
 
 By further debugging, I found when using 'address_space_memory',  
 'xen_pt_region_del' 
 won't be called when the memory region's name is not  ' xen-pci-pt-*', when 
 using
  ' s-dev.bus_master_as ', there is no such issue.
 
 I think use the device related address space here is more reasonable, but I 
 am not sure.
  Could you give some suggestion?

Yes, this patch makes sense.  The listener will be called every time the
command register is written.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [v3 5/5] Qemu-Xen-vTPM: QEMU machine class is initialized before tpm_init()

2015-01-12 Thread Paolo Bonzini


On 31/12/2014 00:03, Quan Xu wrote:
 make sure QEMU machine class is initialized and QEMU has registered
 Xen stubdom vTPM driver when call tpm_init()
 
 Signed-off-by: Quan Xu quan...@intel.com
 ---
  vl.c | 16 ++--
  1 file changed, 10 insertions(+), 6 deletions(-)
 
 diff --git a/vl.c b/vl.c
 index f6b3546..dd437e1 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -4114,12 +4114,6 @@ int main(int argc, char **argv, char **envp)
  exit(1);
  }
  
 -#ifdef CONFIG_TPM
 -if (tpm_init()  0) {
 -exit(1);
 -}
 -#endif
 -
  /* init the bluetooth world */
  if (foreach_device_config(DEV_BT, bt_parse))
  exit(1);
 @@ -4225,6 +4219,16 @@ int main(int argc, char **argv, char **envp)
  exit(1);
  }
  
 +/* For compatible with Xen stubdom vTPM driver, make
 + * sure QEMU machine class is initialized and QEMU has
 + * registered Xen stubdom vTPM driver ..
 +*/
 +#ifdef CONFIG_TPM
 +if (tpm_init()  0) {
 +exit(1);
 +}
 +#endif
 +
  /* init generic devices */
  if (qemu_opts_foreach(qemu_find_opts(device), device_init_func, NULL, 
 1) != 0)
  exit(1);
 

This is okay.  I think the comment is not necessary, but Stefano can fix
that up if he agrees.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-07 Thread Paolo Bonzini


On 07/01/2015 08:18, Andy Lutomirski wrote:
  Thus far, I've been told unambiguously that a guest can't observe pvti
  while it's being written, and I think you're now telling me that this
  isn't true and that a guest *can* observe pvti while it's being
  written while the low bit of the version field is not set.  If so,
  this is rather strongly incompatible with the spec in the KVM docs.
 
  Where am I saying that?
 I thought the conclusion from what you and Marcelo pointed out about
 the code was that, once the first vCPU updated its pvti, it could
 start running guest code while the other vCPUs are still updating
 pvti, so its guest code can observe the other vCPUs mid-update.

Ah, in that sense you're right.  However, each VCPU cannot observe _its
own_ pvti entry while it's being written (no matter what's in the low
bit of the version field).

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Paolo Bonzini


On 05/01/2015 20:17, Marcelo Tosatti wrote:
 But there is no guarantee that vCPU-N has updated its pvti when
 vCPU-M resumes guest instruction execution.

You're right.

 So the cost this patch removes is mainly from __getcpu (==RDTSCP?) ?
 Perhaps you can use Gleb's idea to stick vcpu id into version field ?

Or just replace __getcpu with rdtscp.

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-01-06 Thread Paolo Bonzini


On 06/01/2015 17:56, Andy Lutomirski wrote:
 Still no good.  We can migrate a bunch of times so we see the same CPU
 all three times

There are no three times.  The CPU you see here:

 
 
 // ... compute nanoseconds from pvti and tsc ...
 rmb();
 }   while(v != pvti-version);

is the same you read here:

 cpu = get_cpu();

The algorithm is:

1) get a consistent (cpu, version, tsc)

   1.a) get cpu
   1.b) get pvti[cpu]-version, ignoring low bit
   1.c) get (tsc, cpu)
   1.d) if cpu from 1.a and 1.c do not match, loop
   1.e) if pvti[cpu] was being updated, we'll loop later

2) compute nanoseconds from pvti[cpu] and tsc

3) if pvti[cpu] changed under our feet during (2), i.e. version doesn't
match, retry.

As long as the CPU is consistent between get_cpu() and rdtscp(), there
is no problem with migration, because pvti is always accessed for that
one CPU.  If there were any problem, it would be caught by the version
check.  Writing it down with two nested do...whiles makes it clearer IMHO.

 and *still* don't get a consistent read, unless we
 play nasty games with lots of version checks (I have a patch for that,
 but I don't like it very much).  The patch is here:
 
 https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso_paranoiaid=a69754dc5ff33f5187162b5338854ad23dd7be8d
 
 but I don't like it.
 
 Thus far, I've been told unambiguously that a guest can't observe pvti
 while it's being written, and I think you're now telling me that this
 isn't true and that a guest *can* observe pvti while it's being
 written while the low bit of the version field is not set.  If so,
 this is rather strongly incompatible with the spec in the KVM docs.

Where am I saying that?

Paolo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


  1   2   >