Re: [PATCH v7 19/19] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-17 Thread Marc Zyngier
On Thu, 17 Dec 2015 15:22:50 +0800
Shannon Zhao  wrote:

> 
> 
> On 2015/12/17 4:33, Christoffer Dall wrote:
> > On Wed, Dec 16, 2015 at 04:06:49PM +0800, Shannon Zhao wrote:
> >> Hi,
> >>
> >> On 2015/12/16 15:31, Shannon Zhao wrote:
> > But in this case, you're returning an error if it is *not* 
> > initialized.
> > I understand that in that case you cannot return an interrupt 
> > number (-1
> > would be weird), but returning -EBUSY feels even more weird.
> >
> > I'd settle for -ENOXIO, or something similar. Anyone having a 
> > better idea?
> >
> > ENXIO or ENODEV would be my choice too, and add that to the
> > Documentation clearly describing when this error code is used.
> >
> > By the way, why do you loop over all VCPUS to set the same value when
> > you can't do anything per VCPU anyway?  It seems to me it's either a
> > per-VM property (that you can store on the VM data structure) or it's a
> > true per-VCPU property?
> >>> This is a per-VCPU property. PMU interrupt could be PPI or SPI. For PPI
> >>> the interrupt numbers are same for each vcpu, while for SPI they are
> >>> different, so it needs to set them separately. I planned to support both
> >>> PPI and SPI. I think I should add support for SPI at this moment and let
> >>> users (QEMU) to set these interrupts for each one.
> >>
> >> How about below vPMU Documentation?
> >>
> >> ARM Virtual Performance Monitor Unit (vPMU)
> >> ===
> >>
> >> Device types supported:
> >>   KVM_DEV_TYPE_ARM_PMU_V3 ARM Performance Monitor Unit v3
> >>
> >> Instantiate one PMU instance for per VCPU through this API.
> >>
> >> Groups:
> >>   KVM_DEV_ARM_PMU_GRP_IRQ
> >>   Attributes:
> >> The attr field of kvm_device_attr encodes two values:
> >> bits: | 63  32 | 31  0 |
> >> values:   | vcpu_index |  irq_num  |
> BTW, I change this Attribute to below format and pass vcpu_index through
> this Attribute while pass irq_num through kvm_device_attr.addr.
> Is it fine?
> 
> bits: | 63  32 | 31   0 |
> values:   |  reserved  | vcpu_index |

Using the .addr field for something that is clearly not an address is
rather odd. Is there any prior usage of that field for something that
is not an address?

Thanks,

M.
-- 
Without deviation from the norm, progress is not possible.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 19/19] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-17 Thread Shannon Zhao


On 2015/12/17 16:33, Marc Zyngier wrote:
> On Thu, 17 Dec 2015 15:22:50 +0800
> Shannon Zhao  wrote:
> 
>> > 
>> > 
>> > On 2015/12/17 4:33, Christoffer Dall wrote:
>>> > > On Wed, Dec 16, 2015 at 04:06:49PM +0800, Shannon Zhao wrote:
 > >> Hi,
 > >>
 > >> On 2015/12/16 15:31, Shannon Zhao wrote:
>>> > > But in this case, you're returning an error if it is 
>>> > > *not* initialized.
>>> > > I understand that in that case you cannot return an 
>>> > > interrupt number (-1
>>> > > would be weird), but returning -EBUSY feels even more 
>>> > > weird.
>>> > >
>>> > > I'd settle for -ENOXIO, or something similar. Anyone 
>>> > > having a better idea?
>>> > >
>>> > > ENXIO or ENODEV would be my choice too, and add that to the
>>> > > Documentation clearly describing when this error code is used.
>>> > >
>>> > > By the way, why do you loop over all VCPUS to set the same 
>>> > > value when
>>> > > you can't do anything per VCPU anyway?  It seems to me it's 
>>> > > either a
>>> > > per-VM property (that you can store on the VM data structure) 
>>> > > or it's a
>>> > > true per-VCPU property?
> > >>> This is a per-VCPU property. PMU interrupt could be PPI or SPI. For 
> > >>> PPI
> > >>> the interrupt numbers are same for each vcpu, while for SPI they are
> > >>> different, so it needs to set them separately. I planned to support 
> > >>> both
> > >>> PPI and SPI. I think I should add support for SPI at this moment 
> > >>> and let
> > >>> users (QEMU) to set these interrupts for each one.
 > >>
 > >> How about below vPMU Documentation?
 > >>
 > >> ARM Virtual Performance Monitor Unit (vPMU)
 > >> ===
 > >>
 > >> Device types supported:
 > >>   KVM_DEV_TYPE_ARM_PMU_V3 ARM Performance Monitor Unit v3
 > >>
 > >> Instantiate one PMU instance for per VCPU through this API.
 > >>
 > >> Groups:
 > >>   KVM_DEV_ARM_PMU_GRP_IRQ
 > >>   Attributes:
 > >> The attr field of kvm_device_attr encodes two values:
 > >> bits: | 63  32 | 31  0 |
 > >> values:   | vcpu_index |  irq_num  |
>> > BTW, I change this Attribute to below format and pass vcpu_index through
>> > this Attribute while pass irq_num through kvm_device_attr.addr.
>> > Is it fine?
>> > 
>> > bits: | 63  32 | 31   0 |
>> > values:   |  reserved  | vcpu_index |
> Using the .addr field for something that is clearly not an address is
> rather odd. Is there any prior usage of that field for something that
> is not an address?

I see this usage in vgic_attr_regs_access(). But if you prefer previous
one, I'll use that.

-- 
Shannon

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] kvm:x86:Fix error handling in the function kvm_write_wall_clock

2015-12-17 Thread Paolo Bonzini


On 17/12/2015 03:30, Nicholas Krause wrote:
> This fixes error handling in the function kvm_write_wall_clock
> by checking if any of the calls to kvm_write_guest have failed
> inside this paricutlar function and if so print to the console
> with pr_err that we are unable to write the data to the guest
> system to warn the user of this failure before directly returning
> to the caller of the function kvm_write_wall_check as we cannot
> continue the function to this function after a failed call to
> 
> Signed-off-by: Nicholas Krause 

I think I have explained before why this is mostly unnecessary, but the
patch can be saved.  I'll go through the problems again first.

> - kvm_write_guest(kvm, wall_clock, , sizeof(version));
> + if (kvm_write_guest(kvm, wall_clock, , sizeof(version))) {
> + pr_err("Unable to correctly write data to guest system\n");
> + return;
> + }

You've written a message to the log that can be triggered by the guest
(by writing an invalid value to the wall clock MSR).  We don't let the
guest spam the host logs.

>   /*
>* The guest calculates current wall clock time by adding
> @@ -1168,10 +1171,16 @@ static void kvm_write_wall_clock(struct kvm *kvm, 
> gpa_t wall_clock)
>   wc.nsec = boot.tv_nsec;
>   wc.version = version;
>  
> - kvm_write_guest(kvm, wall_clock, , sizeof(wc));
> + if (kvm_write_guest(kvm, wall_clock, , sizeof(wc))) {
> + pr_err("Unable to correctly write data to guest system\n");
> + return;
> + }

The other kvm_write_guest will probably also fail but, if it doesn't,
you've left an odd version in the data structure. The guest will loop
forever waiting for the even value.

Plus, same problem with logs.

>   version++;
> - kvm_write_guest(kvm, wall_clock, , sizeof(version));
> + if (kvm_write_guest(kvm, wall_clock, , sizeof(version))) {
> + pr_err("Unable to correctly write data to guest system\n");
> + return;
> + }
>  }

Same problem with logs, and the return is not useful.

Can you send a patch that only adds a return if the *first*
kvm_write_guest fails?  You can leave aside the others, and not add any
pr_err.

Thanks,

Paolo

>  static uint32_t div_frac(uint32_t dividend, uint32_t divisor)
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 19/19] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-17 Thread Shannon Zhao


On 2015/12/17 17:38, Marc Zyngier wrote:
> On 17/12/15 08:41, Shannon Zhao wrote:
>> > 
>> > 
>> > On 2015/12/17 16:33, Marc Zyngier wrote:
>>> >> On Thu, 17 Dec 2015 15:22:50 +0800
>>> >> Shannon Zhao  wrote:
>>> >>
> 
> 
>  On 2015/12/17 4:33, Christoffer Dall wrote:
>>> >> On Wed, Dec 16, 2015 at 04:06:49PM +0800, Shannon Zhao wrote:
>  Hi,
> 
>  On 2015/12/16 15:31, Shannon Zhao wrote:
>>> >> But in this case, you're 
>>> >> returning an error if it is 
>>> >> *not* initialized.
>>> >> I understand that in that case 
>>> >> you cannot return an interrupt 
>>> >> number (-1
>>> >> would be weird), but returning 
>>> >> -EBUSY feels even more weird.
>>> >>
>>> >> I'd settle for -ENOXIO, or 
>>> >> something similar. Anyone having 
>>> >> a better idea?
>>> >>
>>> >> ENXIO or ENODEV would be my choice too, and add 
>>> >> that to the
>>> >> Documentation clearly describing when this error 
>>> >> code is used.
>>> >>
>>> >> By the way, why do you loop over all VCPUS to 
>>> >> set the same value when
>>> >> you can't do anything per VCPU anyway?  It seems 
>>> >> to me it's either a
>>> >> per-VM property (that you can store on the VM 
>>> >> data structure) or it's a
>>> >> true per-VCPU property?
>>> >> This is a per-VCPU property. PMU interrupt could be PPI 
>>> >> or SPI. For PPI
>>> >> the interrupt numbers are same for each vcpu, while for 
>>> >> SPI they are
>>> >> different, so it needs to set them separately. I planned 
>>> >> to support both
>>> >> PPI and SPI. I think I should add support for SPI at 
>>> >> this moment and let
>>> >> users (QEMU) to set these interrupts for each one.
> 
>  How about below vPMU Documentation?
> 
>  ARM Virtual Performance Monitor Unit (vPMU)
>  ===
> 
>  Device types supported:
>    KVM_DEV_TYPE_ARM_PMU_V3 ARM Performance Monitor 
>  Unit v3
> 
>  Instantiate one PMU instance for per VCPU through this API.
> 
>  Groups:
>    KVM_DEV_ARM_PMU_GRP_IRQ
>    Attributes:
>  The attr field of kvm_device_attr encodes two values:
>  bits: | 63  32 | 31  0 |
>  values:   | vcpu_index |  irq_num  |
>  BTW, I change this Attribute to below format and pass vcpu_index 
>  through
>  this Attribute while pass irq_num through kvm_device_attr.addr.
>  Is it fine?
> 
>  bits: | 63  32 | 31   0 |
>  values:   |  reserved  | vcpu_index |
>>> >> Using the .addr field for something that is clearly not an address is
>>> >> rather odd. Is there any prior usage of that field for something that
>>> >> is not an address?
>> > 
>> > I see this usage in vgic_attr_regs_access(). But if you prefer previous
>> > one, I'll use that.
> Ah, you're using the .addr field to point to a userspace value, not to
> carry the value itself! That'd be fine by me (even if I still prefer the
> original one), but I'd like others to chime in (I'm quite bad at
> defining userspace stuff...).

Another reason I think is that it needs to pass the irq_num to user
space when calling get_attr. It could be passed via kvm_device_attr.addr
while can't be passed via kvm_device_attr.attr within current framework.

Thanks,
-- 
Shannon

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread yongji xie



On 2015/12/17 4:14, Alex Williamson wrote:

On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote:

Current vfio-pci implementation disallows to mmap MSI-X table in
case that user get to touch this directly.

However, EEH mechanism could ensure that a given pci device
can only shoot the MSIs assigned for its PE and guest kernel also
would not write to MSI-X table in pci_enable_msix() because
para-virtualization on PPC64 platform. So MSI-X table is safe to
access directly from the guest with EEH mechanism enabled.

The MSI-X table is paravirtualized on vfio in general and interrupt
remapping theoretically protects against errant interrupts, so why is
this PPC64 specific?  We have the same safeguards on x86 if we want to
decide they're sufficient.  Offhand, the only way I can think that a
device can touch the MSI-X table is via backdoors or p2p DMA with
another device.

Maybe I didn't make my point clear. The reasons why we can mmap MSI-X
table on PPC64 are:

1. EEH mechanism could ensure that a given pci device can only shoot
the MSIs assigned for its PE. So it would not do harm to other memory
space when the guest write a garbage MSI-X address/data to the vector table
if we passthough MSI-X tables to guest.

2. The guest kernel would not write to MSI-X table on PPC64 platform
when device drivers call pci_enable_msix() to initialize MSI-X interrupts.

So I think it is safe to mmap/passthrough MSI-X table on PPC64 platform.
And I'm not sure whether other architectures can ensure these two 
points. Thanks.


Regards
Yongji Xie

This patch adds support for this case and allow to mmap MSI-X
table if EEH is supported on PPC64 platform.

And we also add a VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP flag to notify
userspace that it's safe to mmap MSI-X table.

Signed-off-by: Yongji Xie 
---
  drivers/vfio/pci/vfio_pci.c |5 -
  drivers/vfio/pci/vfio_pci_private.h |5 +
  include/uapi/linux/vfio.h   |2 ++
  3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index dbcad99..85d9980 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -446,6 +446,9 @@ static long vfio_pci_ioctl(void *device_data,
if (vfio_pci_bar_page_aligned())
info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED;
  
+		if (vfio_msix_table_mmap_enabled())

+   info.flags |= VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP;
+
info.num_regions = VFIO_PCI_NUM_REGIONS;
info.num_irqs = VFIO_PCI_NUM_IRQS;
  
@@ -871,7 +874,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)

if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
return -EINVAL;
  
-	if (index == vdev->msix_bar) {

+   if (index == vdev->msix_bar && !vfio_msix_table_mmap_enabled()) {
/*
 * Disallow mmaps overlapping the MSI-X table; users don't
 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index 319352a..835619e 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -74,6 +74,11 @@ static inline bool vfio_pci_bar_page_aligned(void)
return IS_ENABLED(CONFIG_PPC64);
  }
  
+static inline bool vfio_msix_table_mmap_enabled(void)

+{
+   return IS_ENABLED(CONFIG_EEH);
+}

I really dislike these.


+
  extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
  extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
  
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h

index 1fc8066..289e662 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -173,6 +173,8 @@ struct vfio_device_info {
  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)/* vfio-amba device */
  /* Platform support all PCI MMIO BARs to be page aligned */
  #define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED(1 << 4)
+/* Platform support mmapping PCI MSI-X vector table */
+#define VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP(1 << 5)

Again, not sure why this is on the device versus the region, but I'd
prefer to investigate whether we can handle this with the sparse mmap
capability (or lack of) in the capability chains I proposed[1]. Thanks,

Alex

[1] https://lkml.org/lkml/2015/11/23/748


Good idea! I wiil investigate it. Thanks.

Regards
Yongji Xie

__u32   num_regions;/* Max region index + 1 */
__u32   num_irqs;   /* Max IRQ index + 1 */
  };


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 19/19] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-17 Thread Marc Zyngier
On 17/12/15 10:10, Shannon Zhao wrote:
> 
> 
> On 2015/12/17 17:38, Marc Zyngier wrote:
>> On 17/12/15 08:41, Shannon Zhao wrote:


 On 2015/12/17 16:33, Marc Zyngier wrote:
>> On Thu, 17 Dec 2015 15:22:50 +0800
>> Shannon Zhao  wrote:
>>
>>
>>
>> On 2015/12/17 4:33, Christoffer Dall wrote:
>> On Wed, Dec 16, 2015 at 04:06:49PM +0800, Shannon Zhao wrote:
>> Hi,
>>
>> On 2015/12/16 15:31, Shannon Zhao wrote:
>> But in this case, you're 
>> returning an error if it is 
>> *not* initialized.
>> I understand that in that case 
>> you cannot return an interrupt 
>> number (-1
>> would be weird), but returning 
>> -EBUSY feels even more weird.
>>
>> I'd settle for -ENOXIO, or 
>> something similar. Anyone having 
>> a better idea?
>>
>> ENXIO or ENODEV would be my choice too, and add 
>> that to the
>> Documentation clearly describing when this error 
>> code is used.
>>
>> By the way, why do you loop over all VCPUS to 
>> set the same value when
>> you can't do anything per VCPU anyway?  It seems 
>> to me it's either a
>> per-VM property (that you can store on the VM 
>> data structure) or it's a
>> true per-VCPU property?
>> This is a per-VCPU property. PMU interrupt could be PPI 
>> or SPI. For PPI
>> the interrupt numbers are same for each vcpu, while for 
>> SPI they are
>> different, so it needs to set them separately. I planned 
>> to support both
>> PPI and SPI. I think I should add support for SPI at 
>> this moment and let
>> users (QEMU) to set these interrupts for each one.
>>
>> How about below vPMU Documentation?
>>
>> ARM Virtual Performance Monitor Unit (vPMU)
>> ===
>>
>> Device types supported:
>>   KVM_DEV_TYPE_ARM_PMU_V3 ARM Performance Monitor 
>> Unit v3
>>
>> Instantiate one PMU instance for per VCPU through this API.
>>
>> Groups:
>>   KVM_DEV_ARM_PMU_GRP_IRQ
>>   Attributes:
>> The attr field of kvm_device_attr encodes two values:
>> bits: | 63  32 | 31  0 |
>> values:   | vcpu_index |  irq_num  |
>> BTW, I change this Attribute to below format and pass vcpu_index 
>> through
>> this Attribute while pass irq_num through kvm_device_attr.addr.
>> Is it fine?
>>
>> bits: | 63  32 | 31   0 |
>> values:   |  reserved  | vcpu_index |
>> Using the .addr field for something that is clearly not an address is
>> rather odd. Is there any prior usage of that field for something that
>> is not an address?

 I see this usage in vgic_attr_regs_access(). But if you prefer previous
 one, I'll use that.
>> Ah, you're using the .addr field to point to a userspace value, not to
>> carry the value itself! That'd be fine by me (even if I still prefer the
>> original one), but I'd like others to chime in (I'm quite bad at
>> defining userspace stuff...).
> 
> Another reason I think is that it needs to pass the irq_num to user
> space when calling get_attr. It could be passed via kvm_device_attr.addr
> while can't be passed via kvm_device_attr.attr within current framework.

That's a very good reason. Go for it.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 19/19] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-17 Thread Marc Zyngier
On 17/12/15 08:41, Shannon Zhao wrote:
> 
> 
> On 2015/12/17 16:33, Marc Zyngier wrote:
>> On Thu, 17 Dec 2015 15:22:50 +0800
>> Shannon Zhao  wrote:
>>


 On 2015/12/17 4:33, Christoffer Dall wrote:
>> On Wed, Dec 16, 2015 at 04:06:49PM +0800, Shannon Zhao wrote:
 Hi,

 On 2015/12/16 15:31, Shannon Zhao wrote:
>> But in this case, you're returning an error if it is 
>> *not* initialized.
>> I understand that in that case you cannot return an 
>> interrupt number (-1
>> would be weird), but returning -EBUSY feels even more 
>> weird.
>>
>> I'd settle for -ENOXIO, or something similar. Anyone 
>> having a better idea?
>>
>> ENXIO or ENODEV would be my choice too, and add that to the
>> Documentation clearly describing when this error code is used.
>>
>> By the way, why do you loop over all VCPUS to set the same value 
>> when
>> you can't do anything per VCPU anyway?  It seems to me it's 
>> either a
>> per-VM property (that you can store on the VM data structure) or 
>> it's a
>> true per-VCPU property?
>> This is a per-VCPU property. PMU interrupt could be PPI or SPI. For 
>> PPI
>> the interrupt numbers are same for each vcpu, while for SPI they are
>> different, so it needs to set them separately. I planned to support 
>> both
>> PPI and SPI. I think I should add support for SPI at this moment and 
>> let
>> users (QEMU) to set these interrupts for each one.

 How about below vPMU Documentation?

 ARM Virtual Performance Monitor Unit (vPMU)
 ===

 Device types supported:
   KVM_DEV_TYPE_ARM_PMU_V3 ARM Performance Monitor Unit v3

 Instantiate one PMU instance for per VCPU through this API.

 Groups:
   KVM_DEV_ARM_PMU_GRP_IRQ
   Attributes:
 The attr field of kvm_device_attr encodes two values:
 bits: | 63  32 | 31  0 |
 values:   | vcpu_index |  irq_num  |
 BTW, I change this Attribute to below format and pass vcpu_index through
 this Attribute while pass irq_num through kvm_device_attr.addr.
 Is it fine?

 bits: | 63  32 | 31   0 |
 values:   |  reserved  | vcpu_index |
>> Using the .addr field for something that is clearly not an address is
>> rather odd. Is there any prior usage of that field for something that
>> is not an address?
> 
> I see this usage in vgic_attr_regs_access(). But if you prefer previous
> one, I'll use that.

Ah, you're using the .addr field to point to a userspace value, not to
carry the value itself! That'd be fine by me (even if I still prefer the
original one), but I'd like others to chime in (I'm quite bad at
defining userspace stuff...).

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 2/3] vfio-pci: Allow to mmap sub-page MMIO BARs if all MMIO BARs are page aligned

2015-12-17 Thread yongji xie



On 2015/12/17 4:04, Alex Williamson wrote:

On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote:

Current vfio-pci implementation disallows to mmap
sub-page(size < PAGE_SIZE) MMIO BARs because these BARs' mmio page
may be shared with other BARs.

But we should allow to mmap these sub-page MMIO BARs if all MMIO BARs
are page aligned which leads the BARs' mmio page would not be shared
with other BARs.

This patch adds support for this case and we also add a
VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED flag to notify userspace that
platform supports all MMIO BARs to be page aligned.

Signed-off-by: Yongji Xie 
---
  drivers/vfio/pci/vfio_pci.c |   10 +-
  drivers/vfio/pci/vfio_pci_private.h |5 +
  include/uapi/linux/vfio.h   |2 ++
  3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c
b/drivers/vfio/pci/vfio_pci.c
index 32b88bd..dbcad99 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -443,6 +443,9 @@ static long vfio_pci_ioctl(void *device_data,
if (vdev->reset_works)
info.flags |= VFIO_DEVICE_FLAGS_RESET;
  
+		if (vfio_pci_bar_page_aligned())

+   info.flags |=
VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED;
+
info.num_regions = VFIO_PCI_NUM_REGIONS;
info.num_irqs = VFIO_PCI_NUM_IRQS;
  
@@ -479,7 +482,8 @@ static long vfio_pci_ioctl(void *device_data,

 VFIO_REGION_INFO_FLAG_WRITE;
if (IS_ENABLED(CONFIG_VFIO_PCI_MMAP) &&
pci_resource_flags(pdev, info.index) &
-   IORESOURCE_MEM && info.size >=
PAGE_SIZE)
+   IORESOURCE_MEM && (info.size >=
PAGE_SIZE ||
+   vfio_pci_bar_page_aligned()))
info.flags |=
VFIO_REGION_INFO_FLAG_MMAP;
break;
case VFIO_PCI_ROM_REGION_INDEX:
@@ -855,6 +859,10 @@ static int vfio_pci_mmap(void *device_data,
struct vm_area_struct *vma)
return -EINVAL;
  
  	phys_len = pci_resource_len(pdev, index);

+
+   if (vfio_pci_bar_page_aligned())
+   phys_len = PAGE_ALIGN(phys_len);
+
req_len = vma->vm_end - vma->vm_start;
pgoff = vma->vm_pgoff &
((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
diff --git a/drivers/vfio/pci/vfio_pci_private.h
b/drivers/vfio/pci/vfio_pci_private.h
index 0e7394f..319352a 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -69,6 +69,11 @@ struct vfio_pci_device {
  #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) ||
is_msix(vdev)))
  #define irq_is(vdev, type) (vdev->irq_type == type)
  
+static inline bool vfio_pci_bar_page_aligned(void)

+{
+   return IS_ENABLED(CONFIG_PPC64);
+}

I really dislike this.  This is a problem for any architecture that
runs on larger pages, and even an annoyance on 4k hosts.  Why are we
only solving it for PPC64?
Yes, I know it's a problem for other architectures. But I'm not sure if 
other archs prefer
to enforce the alignment of all BARs to be at least PAGE_SIZE which 
would result in

some waste of address space.

So I just propose a prototype and add PPC64 support here. And other 
archs could decide

to use it or not by themselves.

Can't we do something similar in the core PCI code and detect it?

So you mean we can do it like this:

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d390fc1..f46c04d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -320,6 +320,11 @@ static inline resource_size_t 
pci_resource_alignment(struct pci_dev *dev,

return resource_alignment(res);
 }

+static inline bool pci_bar_page_aligned(void)
+{
+   return IS_ENABLED(CONFIG_PPC64);
+}
+
 void pci_enable_acs(struct pci_dev *dev);

 struct pci_dev_reset_methods {

or add a config option to indicate that PCI MMIO BARs should be page 
aligned? Thanks.


Regards
Yongji Xie

+
  extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
  extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
  
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h

index 751b69f..1fc8066 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -171,6 +171,8 @@ struct vfio_device_info {
  #define VFIO_DEVICE_FLAGS_PCI (1 << 1)  /* vfio-pci
device */
  #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2) /* vfio-platform
device */
  #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)/* vfio-amba device
*/
+/* Platform support all PCI MMIO BARs to be page aligned */
+#define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED (1 << 4)
__u32   num_regions;/* Max region index + 1 */
__u32   num_irqs;   /* Max IRQ index + 1 */
  };

Why is this on the device info, shouldn't it be per region?  Do we even
need a flag or can we just set the existing mmap flag 

Re: [PATCH 00/16] MIPS: KVM: Misc trivial cleanups

2015-12-17 Thread Paolo Bonzini


On 17/12/2015 00:49, James Hogan wrote:
> This patchset contains a bunch of miscellaneous cleanups (which are
> mostly trivial) for MIPS KVM & MIPS headers, such as:
> - Style/whitespace fixes
> - General cleanup and removal of dead code.
> - Moving/refactoring of general MIPS definitions out of arch/mips/kvm/
>   and into arch/mips/include/asm/ so they can be shared with the rest of
>   arch/mips. Specifically COP0 register bits, exception codes, cache
>   ops, & instruction opcodes.
> - Add MAINTAINERS entry for MIPS KVM.
> 
> Due to the interaction with other arch/mips/ code, I think it makes
> sense for these to go via the MIPS tree.

No objection.

Acked-by: Paolo Bonzini 

I think I'd use s8/u8 instead of int8_t/uint8_t in patch 15, but really
that's just me.  I'm fine either way, and that's really the only comment
I have on the series. :)

Paolo

> James Hogan (16):
>   MIPS: KVM: Trivial whitespace and style fixes
>   MIPS: KVM: Drop some unused definitions from kvm_host.h
>   MIPS: Move definition of DC bit to mipsregs.h
>   MIPS: KVM: Drop unused kvm_mips_host_tlb_inv_index()
>   MIPS: KVM: Convert EXPORT_SYMBOL to _GPL
>   MIPS: KVM: Refactor added offsetof()s
>   MIPS: KVM: Make kvm_mips_{init,exit}() static
>   MIPS: Move Cause.ExcCode trap codes to mipsregs.h
>   MIPS: Update trap codes
>   MIPS: Use EXCCODE_ constants with set_except_vector()
>   MIPS: Break down cacheops.h definitions
>   MIPS: KVM: Use cacheops.h definitions
>   MIPS: Move KVM specific opcodes into asm/inst.h
>   MIPS: KVM: Add missing newline to kvm_err()
>   MIPS: KVM: Consistent use of uint*_t in MMIO handling
>   MAINTAINERS: Add KVM for MIPS entry
> 
>  MAINTAINERS   |   8 +++
>  arch/mips/Kconfig |   3 +-
>  arch/mips/include/asm/cacheops.h  | 106 --
>  arch/mips/include/asm/kvm_host.h  |  39 +
>  arch/mips/include/asm/mipsregs.h  |  34 +++
>  arch/mips/include/uapi/asm/inst.h |   3 +-
>  arch/mips/kernel/cpu-bugs64.c |   8 +--
>  arch/mips/kernel/traps.c  |  52 -
>  arch/mips/kvm/callback.c  |   2 +-
>  arch/mips/kvm/dyntrans.c  |  10 +---
>  arch/mips/kvm/emulate.c   | 118 
> --
>  arch/mips/kvm/interrupt.c |   8 +--
>  arch/mips/kvm/locore.S|  12 ++--
>  arch/mips/kvm/mips.c  |  38 ++--
>  arch/mips/kvm/opcode.h|  22 ---
>  arch/mips/kvm/tlb.c   |  77 +++--
>  arch/mips/kvm/trap_emul.c |   1 -
>  17 files changed, 245 insertions(+), 296 deletions(-)
>  delete mode 100644 arch/mips/kvm/opcode.h
> 
> Cc: Ralf Baechle 
> Cc: Paolo Bonzini 
> Cc: Gleb Natapov 
> Cc: linux-m...@linux-mips.org
> Cc: kvm@vger.kernel.org
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] VFIO: platform: reset: fix a warning message condition

2015-12-17 Thread Eric Auger
Hi Dan,

thanks for pointing this out.

Reviewed-by: Eric Auger 

I add Thomas in CC since the bug also exists in the native driver I
think (drivers/net/ethernet/amd/xgbe/drivers/net/ethernet/amd/xgbe
/xgbe-dev.c, xgbe_exit function).

Best Regards

Eric

On 12/17/2015 01:27 PM, Dan Carpenter wrote:
> This loop ends with count set to -1 and not zero so the warning message
> isn't printed when it should be.  I've fixed this by change the postop
> to a preop.
> 
> Fixes: 0990822c9866 ('VFIO: platform: reset: AMD xgbe reset module')
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c 
> b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
> index da5356f..d4030d0 100644
> --- a/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
> +++ b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
> @@ -110,7 +110,7 @@ int vfio_platform_amdxgbe_reset(struct 
> vfio_platform_device *vdev)
>   usleep_range(10, 15);
>  
>   count = 2000;
> - while (count-- && (ioread32(xgmac_regs->ioaddr + DMA_MR) & 1))
> + while (--count && (ioread32(xgmac_regs->ioaddr + DMA_MR) & 1))
>   usleep_range(500, 600);
>  
>   if (!count)
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] KVM-UNIT-TESTS: Hyper-V SynIC timers test

2015-12-17 Thread Paolo Bonzini


On 16/12/2015 19:51, Denis V. Lunev wrote:
> On 12/08/2015 07:36 PM, Andrey Smetanin wrote:
>> The test checks Hyper-V SynIC timers functionality.
>> The test runs on every vCPU and performs start/stop
>> of periodic/one-shot timers (with period=1ms) and checks
>> validity of received expiration messages in appropriate
>> ISR's.
>>
>> Changes v2:
>> * Share generic Hyper-V tests code
>> * Hyper-V SynIC timers test fixes to improve
>> readability and output
>>
>> Signed-off-by: Andrey Smetanin 
>> Reviewed-by: Roman Kagan 
>> CC: Paolo Bonzini 
>> CC: Marcelo Tosatti 
>> CC: Roman Kagan 
>> CC: Denis V. Lunev 
>> CC: qemu-de...@nongnu.org
>>
>> Andrey Smetanin (3):
>>lib/x86: Make free_page() available to call
>>x86/hyperv: Move Hyper-V generic code into hyperv.h/hyperv.c
>>x86: Hyper-V SynIC timers test
>>
>>   config/config-x86-common.mak |   8 +-
>>   lib/x86/msr.h|  23 ---
>>   lib/x86/vm.h |   1 +
>>   x86/hyperv.c |  24 +++
>>   x86/hyperv.h | 183 +
>>   x86/hyperv_stimer.c  | 376
>> +++
>>   x86/hyperv_synic.c   |  42 +
>>   x86/unittests.cfg|   5 +
>>   8 files changed, 603 insertions(+), 59 deletions(-)
>>   create mode 100644 x86/hyperv.c
>>   create mode 100644 x86/hyperv.h
>>   create mode 100644 x86/hyperv_stimer.c
>>
> ping :)

I was waiting for the QEMU 2.5 release so that I can merge the support
patches first.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] VFIO: platform: reset: fix a warning message condition

2015-12-17 Thread Dan Carpenter
On Thu, Dec 17, 2015 at 01:45:52PM +0100, Eric Auger wrote:
> I add Thomas in CC since the bug also exists in the native driver I
> think (drivers/net/ethernet/amd/xgbe/drivers/net/ethernet/amd/xgbe
> /xgbe-dev.c, xgbe_exit function).
> 

Thanks, but I sent a patch for that one already.  (These are static
checker fixes).

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-17 Thread Marcelo Tosatti
On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski  wrote:
> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini  wrote:
> >>
> >>
> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
> >>> > RAW TSC NTP corrected TSC
> >>> > t0  10  10
> >>> > t1  20  19.99
> >>> > t2  30  29.98
> >>> > t3  40  39.97
> >>> > t4  50  49.96
> >>> >
> >>> > ...
> >>> >
> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> >>> > you can see what will happen.
> >>>
> >>> Sure, but why would you ever switch from one to the other?
> >>
> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
> >> resume, the TSC certainly increases at the same rate as before, but the
> >> raw TSC restarted counting from 0 and systemtime has increased slower
> >> than the guest kvmclock.
> >
> > Wait, are we talking about the host's NTP or the guest's NTP?
> >
> > If it's the host's, then wouldn't systemtime be reset after resume to
> > the NTP corrected value?  If so, the guest wouldn't see time go
> > backwards.
> >
> > If it's the guest's, then the guest's NTP correction is applied on top
> > of kvmclock, and this shouldn't matter.
> >
> > I still feel like I'm missing something very basic here.
> >
> 
> OK, I think I get it.
> 
> Marcelo, I thought that kvmclock was supposed to propagate the host's
> correction to the guest.  If it did, indeed, propagate the correction
> then, after resume, the host's new system_time would match the guest's
> idea of it (after accounting for the guest's long nap), and I don't
> think there would be a problem.
> That being said, I can't find the code in the masterclock stuff that
> would actually do this.

Guest clock is maintained by guest timekeeping code, which does:

timer_interrupt() 
offset = read clocksource since last timer interrupt
accumulate_to_systemclock(offset)

The frequency correction of NTP in the host can be applied to 
kvmclock, which will be visible to the guest 
at "read clocksource since last timer interrupt" 
(kvmclock_clocksource_read function).

This does not mean that the NTP correction in the host is propagated
to the guests system clock directly.

(For example, the guest can run NTP which is free to do further
adjustments at "accumulate_to_systemclock(offset)" time).

> If, on the other hand, the host's NTP correction is not supposed to
> propagate to the guest, 

This is optional. There is a module option to control this, in fact.

Its nice to have, because then you can execute a guest without NTP
(say without network connection), and have a kvmclock (kvmclock is a
clocksource, not a guest system clock) which is NTP corrected.

> then shouldn't KVM just update system_time on
> resume to whatever the guest would think it had (which I think would
> be equivalent to the host's CLOCK_MONOTONIC_RAW value, possibly
> shifted by some per-guest constant offset).
> 
> --Andy

Sure, you could add a correction to compensate and make sure 
the guest clock does not see time backwards.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-17 Thread Andy Lutomirski
On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti  wrote:
> On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
>> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski  wrote:
>> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini  
>> > wrote:
>> >>
>> >>
>> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
>> >>> > RAW TSC NTP corrected TSC
>> >>> > t0  10  10
>> >>> > t1  20  19.99
>> >>> > t2  30  29.98
>> >>> > t3  40  39.97
>> >>> > t4  50  49.96
>> >>> >
>> >>> > ...
>> >>> >
>> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
>> >>> > you can see what will happen.
>> >>>
>> >>> Sure, but why would you ever switch from one to the other?
>> >>
>> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
>> >> resume, the TSC certainly increases at the same rate as before, but the
>> >> raw TSC restarted counting from 0 and systemtime has increased slower
>> >> than the guest kvmclock.
>> >
>> > Wait, are we talking about the host's NTP or the guest's NTP?
>> >
>> > If it's the host's, then wouldn't systemtime be reset after resume to
>> > the NTP corrected value?  If so, the guest wouldn't see time go
>> > backwards.
>> >
>> > If it's the guest's, then the guest's NTP correction is applied on top
>> > of kvmclock, and this shouldn't matter.
>> >
>> > I still feel like I'm missing something very basic here.
>> >
>>
>> OK, I think I get it.
>>
>> Marcelo, I thought that kvmclock was supposed to propagate the host's
>> correction to the guest.  If it did, indeed, propagate the correction
>> then, after resume, the host's new system_time would match the guest's
>> idea of it (after accounting for the guest's long nap), and I don't
>> think there would be a problem.
>> That being said, I can't find the code in the masterclock stuff that
>> would actually do this.
>
> Guest clock is maintained by guest timekeeping code, which does:
>
> timer_interrupt()
> offset = read clocksource since last timer interrupt
> accumulate_to_systemclock(offset)
>
> The frequency correction of NTP in the host can be applied to
> kvmclock, which will be visible to the guest
> at "read clocksource since last timer interrupt"
> (kvmclock_clocksource_read function).

pvclock_clocksource_read?  That seems to do the same thing as all the
other clocksource access functions.

>
> This does not mean that the NTP correction in the host is propagated
> to the guests system clock directly.
>
> (For example, the guest can run NTP which is free to do further
> adjustments at "accumulate_to_systemclock(offset)" time).

Of course.  But I expected that, in the absence of NTP on the guest,
that the guest would track the host's *corrected* time.

>
>> If, on the other hand, the host's NTP correction is not supposed to
>> propagate to the guest,
>
> This is optional. There is a module option to control this, in fact.
>
> Its nice to have, because then you can execute a guest without NTP
> (say without network connection), and have a kvmclock (kvmclock is a
> clocksource, not a guest system clock) which is NTP corrected.

Can you point to how this works?  I found kvm_guest_time_update, whch
is called under circumstances that I haven't untangled.  I can't
really tell what it's trying to do.

In any case, this still seems much more convoluted than it has to be.
In the case in which the host has a stable TSC (tsc is selected in the
core timekeeping code, VCLOCK_TSC is set, etc), which is basically all
the time on the last few generations of CPUs, then the core
timekeeping code is already exposing a linear function that's supposed
to be used for monotonic, cpu-local access to a corrected nanosecond
counter.  It's even in pretty much exactly the right form to pass
through to the guest via pvclock in the gtod data.  Why doesn't KVM
pass it through verbatim, updated in real time?  Is there some legacy
reason that KVM must apply its own corrections and has to jump through
hoops to pause vcpus when updating those vcpu's copies of the pvclock
data?

>
>> then shouldn't KVM just update system_time on
>> resume to whatever the guest would think it had (which I think would
>> be equivalent to the host's CLOCK_MONOTONIC_RAW value, possibly
>> shifted by some per-guest constant offset).
>>
>> --Andy
>
> Sure, you could add a correction to compensate and make sure
> the guest clock does not see time backwards.
>

Could you help do that?  You understand the code far better than I do.

As it stands, it simply doesn't work on any system that suspends and
resumes (unless maybe the system has the upcoming Intel ART feature,
and I have no clue when that'll show up).

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: [PATCH v7 07/19] KVM: ARM64: PMU: Add perf event map and introduce perf event creating function

2015-12-17 Thread Marc Zyngier
On 17/12/15 15:22, Mark Rutland wrote:
> On Tue, Dec 15, 2015 at 04:49:27PM +0800, Shannon Zhao wrote:
>> From: Shannon Zhao 
>>
>> When we use tools like perf on host, perf passes the event type and the
>> id of this event type category to kernel, then kernel will map them to
>> hardware event number and write this number to PMU PMEVTYPER_EL0
>> register. When getting the event number in KVM, directly use raw event
>> type to create a perf_event for it.
>>
>> Signed-off-by: Shannon Zhao 
>> ---
>>  arch/arm64/include/asm/pmu.h |   3 ++
>>  arch/arm64/kvm/Makefile  |   1 +
>>  include/kvm/arm_pmu.h|  11 
>>  virt/kvm/arm/pmu.c   | 122 
>> +++
>>  4 files changed, 137 insertions(+)
>>  create mode 100644 virt/kvm/arm/pmu.c
> 
> [...]
> 
>> +/**
>> + * kvm_pmu_set_counter_event_type - set selected counter to monitor some 
>> event
>> + * @vcpu: The vcpu pointer
>> + * @data: The data guest writes to PMXEVTYPER_EL0
>> + * @select_idx: The number of selected counter
>> + *
>> + * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to 
>> count an
>> + * event with given hardware event number. Here we call perf_event API to
>> + * emulate this action and create a kernel perf event for it.
>> + */
>> +void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
>> +u64 select_idx)
>> +{
>> +struct kvm_pmu *pmu = >arch.pmu;
>> +struct kvm_pmc *pmc = >pmc[select_idx];
>> +struct perf_event *event;
>> +struct perf_event_attr attr;
>> +u64 eventsel, counter;
>> +
>> +kvm_pmu_stop_counter(vcpu, pmc);
>> +eventsel = data & ARMV8_EVTYPE_EVENT;
>> +
>> +memset(, 0, sizeof(struct perf_event_attr));
>> +attr.type = PERF_TYPE_RAW;
>> +attr.size = sizeof(attr);
>> +attr.pinned = 1;
>> +attr.disabled = kvm_pmu_counter_is_enabled(vcpu, select_idx);
>> +attr.exclude_user = data & ARMV8_EXCLUDE_EL0 ? 1 : 0;
>> +attr.exclude_kernel = data & ARMV8_EXCLUDE_EL1 ? 1 : 0;
>> +attr.exclude_hv = 1; /* Don't count EL2 events */
>> +attr.exclude_host = 1; /* Don't count host events */
>> +attr.config = eventsel;
>> +
>> +counter = kvm_pmu_get_counter_value(vcpu, select_idx);
>> +/* The initial sample period (overflow count) of an event. */
>> +attr.sample_period = (-counter) & pmc->bitmask;
>> +
>> +event = perf_event_create_kernel_counter(, -1, current, NULL, pmc);
> 
> As far as I can see, this is going to result in unreliable counts on a
> big.LITTLE system, even if the VCPUs are constrained to one class of
> core.
> 
> As this is a task-bound event (cpu == -1, task is current), the perf
> core will stop as soon as one PMU driver agrees to handle the event. The
> event will then only count on CPUs handled by that driver.
> 
> If you're unlucky, the set of CPUs handled by that driver is not the
> same as the set of CPUs your VM is constrained to. e.g. your VM might be
> on little cores, but the big PMU driver accepted the event, and only
> counts on big cores.
> 
> I'm not sure how we can solve that.

Yeah, another level of BL braindeadness. We should have some API to
match the PMU we want on the CPU we're on at the moment that trap occurs.

I don't think this should block this series though - this is something
we can improve on in parallel (a possible solution being to forbid KVM
on BL platform altogether).

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-17 Thread Marcelo Tosatti
GOn Mon, Dec 14, 2015 at 02:31:10PM -0800, Andy Lutomirski wrote:
> On Mon, Dec 14, 2015 at 2:00 PM, Marcelo Tosatti  wrote:
> > On Mon, Dec 14, 2015 at 02:44:15PM +0100, Paolo Bonzini wrote:
> >>
> >>
> >> On 11/12/2015 22:57, Andy Lutomirski wrote:
> >> > I'm still not seeing the issue.
> >> >
> >> > The formula is:
> >> >
> >> > (((rdtsc - pvti->tsc_timestamp) * pvti->tsc_to_system_mul) >>
> >> > pvti->tsc_shift) + pvti->system_time
> >> >
> >> > Obviously, if you reset pvti->tsc_timestamp to the current tsc value
> >> > after suspend/resume, you would also need to update system_time.
> >> >
> >> > I don't see what this has to do with suspend/resume or with whether
> >> > the effective scale factor is greater than or less than one.  The only
> >> > suspend/resume interaction I can see is that, if the host allows the
> >> > guest-observed TSC value to jump (which is arguably a bug, what that's
> >> > not important here), it needs to update pvti before resuming the
> >> > guest.
> >>
> >> Which is not an issue, since freezing obviously gets all CPUs out of
> >> guest mode.
> >>
> >> Marcelo, can you provide an example with made-up values for tsc and pvti?
> >
> > I meant "systemtime" at ^.
> >
> > guest visible clock = systemtime (updated at time 0, guest initialization) 
> > + scaled tsc reads=LARGE VALUE.
> >   ^^
> > guest reads clock to memory at location A = scaled tsc read.
> > -> suspend resume event
> > guest visible clock = systemtime (updated at time AFTER SUSPEND) + scaled 
> > tsc reads=0.
> > guest reads clock to memory at location B.
> >
> > So before the suspend/resume event, the clock is the RAW TSC values
> > (scaled by kvmclock, but the frequency of the RAW TSC).
> >
> > After suspend/resume event, the clock is updated from the host
> > via get_kernel_ns(), which reads the corrected NTP frequency TSC.
> >
> > So you switch the timebase, from a clock running at a given frequency,
> > to a clock running at another frequency (effective frequency).
> >
> > Example:
> >
> > RAW TSC NTP corrected TSC
> > t0  10  10
> > t1  20  19.99
> > t2  30  29.98
> > t3  40  39.97
> > t4  50  49.96
> >
> > ...
> >
> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> > you can see what will happen.
> 
> Sure, but why would you ever switch from one to the other? 

Because thats what happens when you ask kvmclock to update from system
time (which is a reliable clock, resistant to suspend/resume issues).

>  I'm still not seeing the scenario under which this discontinuity is
> visible to anything other than the kvmclock code itself.

Host userspace can see if it uses TSC and clock_gettime()
and expects them to run hand in hand.

> The only things that need to be monotonic are the output from
> vread_pvclock and the in-kernel equivalent, I think.
> 
> --Andy

clock_gettime as well, should be monotonic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: debug: Remove spurious inline attributes

2015-12-17 Thread Marc Zyngier
On 17/12/15 16:28, Alex Bennée wrote:
> 
> Marc Zyngier  writes:
> 
>> The debug trapping code is pretty heavy on the "inline" attribute,
>> but most functions are actually referenced in the sysreg tables,
>> making the inlining imposible.
>>
>> Removing the useless inline qualifier seems the right thing to do,
>> having verified that the output code is similar.
>>
>> Cc: Alex Bennée 
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/sys_regs.c | 58 
>> +++
>>  1 file changed, 29 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 88adebf..eec3598 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -220,9 +220,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>>   * All writes will set the KVM_ARM64_DEBUG_DIRTY flag to ensure the
>>   * hyp.S code switches between host and guest values in future.
>>   */
>> -static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
>> -  struct sys_reg_params *p,
>> -  u64 *dbg_reg)
>> +static void reg_to_dbg(struct kvm_vcpu *vcpu,
>> +   struct sys_reg_params *p,
>> +   u64 *dbg_reg)
>>  {
>>  u64 val = p->regval;
>>
>> @@ -235,18 +235,18 @@ static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
>>  vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>>  }
>>
>> -static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
>> -  struct sys_reg_params *p,
>> -  u64 *dbg_reg)
>> +static void dbg_to_reg(struct kvm_vcpu *vcpu,
>> +   struct sys_reg_params *p,
>> +   u64 *dbg_reg)
>>  {
>>  p->regval = *dbg_reg;
>>  if (p->is_32bit)
>>  p->regval &= 0xUL;
>>  }
> 
> Christoffer's "register keyword" comments not-withstanding I'd prefer to
> keep the reg_to_dbg/dbg_to_reg functions as inline because they really
> are just boilerplate helpers I didn't want to repeat in the actual
> access functions - although if you've looked at the code I assume that
> means GCC has been smart about it.

Indeed, GCC is smart enough to directly inline it. In general, GCC is
doing a pretty good job at inlining static functions that are small
enough not to be worth jumping to. These days, "static inline" only
really makes sense in an include file.

> The rest all make sense. I wonder why I was being so inline happy?
> 
> Reviewed-by: Alex Bennée 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Question] Switching VCPU CPL from the hypervisor ?

2015-12-17 Thread Yacine HEBBAL
Paolo Bonzini  redhat.com> writes:
> On 15/12/2015 18:02, Hebbal Yacine wrote:
> > What I want to do is: when a controlled process is in user mode, i
> > change its cpl to 0, force it to execute a code that is injected in the
> > VM, set back its cpl to 3 and let it run like if nothing happened
> 
> Could you inject an SMI and place your code in the guest firmware's SMM
> handler?  What input is needed by this CPL=0 code?
> 
> Paolo

I think the the solution follows this logic, meaning we inject an interrupt,
intercept execution of its handler (cpl is 0), save cpu context, execute the
code we want, push a return address, and force the process to execute IRET
instruction which will complete the cleaning, restore cpl to 3 and let the
process resume its normal execution. I'll continue to dig this to see if
there is any better approach, thanks for your help :)


Yacine

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: debug: Remove spurious inline attributes

2015-12-17 Thread Alex Bennée

Marc Zyngier  writes:

> The debug trapping code is pretty heavy on the "inline" attribute,
> but most functions are actually referenced in the sysreg tables,
> making the inlining imposible.
>
> Removing the useless inline qualifier seems the right thing to do,
> having verified that the output code is similar.
>
> Cc: Alex Bennée 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/sys_regs.c | 58 
> +++
>  1 file changed, 29 insertions(+), 29 deletions(-)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 88adebf..eec3598 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -220,9 +220,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>   * All writes will set the KVM_ARM64_DEBUG_DIRTY flag to ensure the
>   * hyp.S code switches between host and guest values in future.
>   */
> -static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
> -   struct sys_reg_params *p,
> -   u64 *dbg_reg)
> +static void reg_to_dbg(struct kvm_vcpu *vcpu,
> +struct sys_reg_params *p,
> +u64 *dbg_reg)
>  {
>   u64 val = p->regval;
>
> @@ -235,18 +235,18 @@ static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
>   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>  }
>
> -static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
> -   struct sys_reg_params *p,
> -   u64 *dbg_reg)
> +static void dbg_to_reg(struct kvm_vcpu *vcpu,
> +struct sys_reg_params *p,
> +u64 *dbg_reg)
>  {
>   p->regval = *dbg_reg;
>   if (p->is_32bit)
>   p->regval &= 0xUL;
>  }

Christoffer's "register keyword" comments not-withstanding I'd prefer to
keep the reg_to_dbg/dbg_to_reg functions as inline because they really
are just boilerplate helpers I didn't want to repeat in the actual
access functions - although if you've looked at the code I assume that
means GCC has been smart about it.

>
> -static inline bool trap_bvr(struct kvm_vcpu *vcpu,
> - struct sys_reg_params *p,
> - const struct sys_reg_desc *rd)
> +static bool trap_bvr(struct kvm_vcpu *vcpu,
> +  struct sys_reg_params *p,
> +  const struct sys_reg_desc *rd)
>  {
>   u64 *dbg_reg = >arch.vcpu_debug_state.dbg_bvr[rd->reg];
>
> @@ -280,15 +280,15 @@ static int get_bvr(struct kvm_vcpu *vcpu, const struct 
> sys_reg_desc *rd,
>   return 0;
>  }
>
> -static inline void reset_bvr(struct kvm_vcpu *vcpu,
> -  const struct sys_reg_desc *rd)
> +static void reset_bvr(struct kvm_vcpu *vcpu,
> +   const struct sys_reg_desc *rd)
>  {
>   vcpu->arch.vcpu_debug_state.dbg_bvr[rd->reg] = rd->val;
>  }
>
> -static inline bool trap_bcr(struct kvm_vcpu *vcpu,
> - struct sys_reg_params *p,
> - const struct sys_reg_desc *rd)
> +static bool trap_bcr(struct kvm_vcpu *vcpu,
> +  struct sys_reg_params *p,
> +  const struct sys_reg_desc *rd)
>  {
>   u64 *dbg_reg = >arch.vcpu_debug_state.dbg_bcr[rd->reg];
>
> @@ -323,15 +323,15 @@ static int get_bcr(struct kvm_vcpu *vcpu, const struct 
> sys_reg_desc *rd,
>   return 0;
>  }
>
> -static inline void reset_bcr(struct kvm_vcpu *vcpu,
> -  const struct sys_reg_desc *rd)
> +static void reset_bcr(struct kvm_vcpu *vcpu,
> +   const struct sys_reg_desc *rd)
>  {
>   vcpu->arch.vcpu_debug_state.dbg_bcr[rd->reg] = rd->val;
>  }
>
> -static inline bool trap_wvr(struct kvm_vcpu *vcpu,
> - struct sys_reg_params *p,
> - const struct sys_reg_desc *rd)
> +static bool trap_wvr(struct kvm_vcpu *vcpu,
> +  struct sys_reg_params *p,
> +  const struct sys_reg_desc *rd)
>  {
>   u64 *dbg_reg = >arch.vcpu_debug_state.dbg_wvr[rd->reg];
>
> @@ -366,15 +366,15 @@ static int get_wvr(struct kvm_vcpu *vcpu, const struct 
> sys_reg_desc *rd,
>   return 0;
>  }
>
> -static inline void reset_wvr(struct kvm_vcpu *vcpu,
> -  const struct sys_reg_desc *rd)
> +static void reset_wvr(struct kvm_vcpu *vcpu,
> +   const struct sys_reg_desc *rd)
>  {
>   vcpu->arch.vcpu_debug_state.dbg_wvr[rd->reg] = rd->val;
>  }
>
> -static inline bool trap_wcr(struct kvm_vcpu *vcpu,
> - struct sys_reg_params *p,
> - const struct sys_reg_desc *rd)
> +static bool trap_wcr(struct kvm_vcpu *vcpu,
> +  struct sys_reg_params *p,
> +  const struct sys_reg_desc *rd)
>  {
>   u64 *dbg_reg = >arch.vcpu_debug_state.dbg_wcr[rd->reg];
>
> @@ -408,8 +408,8 @@ static int get_wcr(struct 

Re: [PATCH v7 07/19] KVM: ARM64: PMU: Add perf event map and introduce perf event creating function

2015-12-17 Thread Mark Rutland
On Tue, Dec 15, 2015 at 04:49:27PM +0800, Shannon Zhao wrote:
> From: Shannon Zhao 
> 
> When we use tools like perf on host, perf passes the event type and the
> id of this event type category to kernel, then kernel will map them to
> hardware event number and write this number to PMU PMEVTYPER_EL0
> register. When getting the event number in KVM, directly use raw event
> type to create a perf_event for it.
> 
> Signed-off-by: Shannon Zhao 
> ---
>  arch/arm64/include/asm/pmu.h |   3 ++
>  arch/arm64/kvm/Makefile  |   1 +
>  include/kvm/arm_pmu.h|  11 
>  virt/kvm/arm/pmu.c   | 122 
> +++
>  4 files changed, 137 insertions(+)
>  create mode 100644 virt/kvm/arm/pmu.c

[...]

> +/**
> + * kvm_pmu_set_counter_event_type - set selected counter to monitor some 
> event
> + * @vcpu: The vcpu pointer
> + * @data: The data guest writes to PMXEVTYPER_EL0
> + * @select_idx: The number of selected counter
> + *
> + * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to 
> count an
> + * event with given hardware event number. Here we call perf_event API to
> + * emulate this action and create a kernel perf event for it.
> + */
> +void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
> + u64 select_idx)
> +{
> + struct kvm_pmu *pmu = >arch.pmu;
> + struct kvm_pmc *pmc = >pmc[select_idx];
> + struct perf_event *event;
> + struct perf_event_attr attr;
> + u64 eventsel, counter;
> +
> + kvm_pmu_stop_counter(vcpu, pmc);
> + eventsel = data & ARMV8_EVTYPE_EVENT;
> +
> + memset(, 0, sizeof(struct perf_event_attr));
> + attr.type = PERF_TYPE_RAW;
> + attr.size = sizeof(attr);
> + attr.pinned = 1;
> + attr.disabled = kvm_pmu_counter_is_enabled(vcpu, select_idx);
> + attr.exclude_user = data & ARMV8_EXCLUDE_EL0 ? 1 : 0;
> + attr.exclude_kernel = data & ARMV8_EXCLUDE_EL1 ? 1 : 0;
> + attr.exclude_hv = 1; /* Don't count EL2 events */
> + attr.exclude_host = 1; /* Don't count host events */
> + attr.config = eventsel;
> +
> + counter = kvm_pmu_get_counter_value(vcpu, select_idx);
> + /* The initial sample period (overflow count) of an event. */
> + attr.sample_period = (-counter) & pmc->bitmask;
> +
> + event = perf_event_create_kernel_counter(, -1, current, NULL, pmc);

As far as I can see, this is going to result in unreliable counts on a
big.LITTLE system, even if the VCPUs are constrained to one class of
core.

As this is a task-bound event (cpu == -1, task is current), the perf
core will stop as soon as one PMU driver agrees to handle the event. The
event will then only count on CPUs handled by that driver.

If you're unlucky, the set of CPUs handled by that driver is not the
same as the set of CPUs your VM is constrained to. e.g. your VM might be
on little cores, but the big PMU driver accepted the event, and only
counts on big cores.

I'm not sure how we can solve that.

Thanks,
Mark.

> + if (IS_ERR(event)) {
> + printk_once("kvm: pmu event creation failed %ld\n",
> + PTR_ERR(event));
> + return;
> + }
> +
> + pmc->perf_event = event;
> +}
> -- 
> 2.0.4
> 
> 
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 01/12] run_tests: move run() to scripts/

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:32PM +0100, Radim Krčmář wrote:
> We'll be using it from scripts/mkstandalone later.
> 
> Signed-off-by: Radim Krčmář 
> ---
>  v2: new
>  
>  run_tests.sh | 53 +
>  scripts/run.bash | 51 +++

Could probably just put run() in scripts/functions.bash

drew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL 17/45] target-i386/kvm: Hyper-V SynIC timers MSR's support

2015-12-17 Thread Paolo Bonzini
From: Andrey Smetanin 

Hyper-V SynIC timers are host timers that are configurable
by guest through corresponding MSR's (HV_X64_MSR_STIMER*).
Guest setup and use fired by host events(SynIC interrupt
and appropriate timer expiration message) as guest clock
events.

The state of Hyper-V SynIC timers are stored in corresponding
MSR's. This patch seria implements such MSR's support and migration.

Signed-off-by: Andrey Smetanin 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Denis V. Lunev 
CC: Roman Kagan 
CC: kvm@vger.kernel.org

Message-Id: <1448464885-8300-3-git-send-email-asmeta...@virtuozzo.com>
Signed-off-by: Paolo Bonzini 
---
 target-i386/cpu-qom.h |  1 +
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  2 ++
 target-i386/kvm.c | 50 +-
 target-i386/machine.c | 29 +
 5 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
index 7ea5b34..5f9d960 100644
--- a/target-i386/cpu-qom.h
+++ b/target-i386/cpu-qom.h
@@ -95,6 +95,7 @@ typedef struct X86CPU {
 bool hyperv_vpindex;
 bool hyperv_runtime;
 bool hyperv_synic;
+bool hyperv_stimer;
 bool check_cpuid;
 bool enforce_cpuid;
 bool expose_kvm;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 2fdd855..92f7cc1 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3147,6 +3147,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
 DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
 DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
+DEFINE_PROP_BOOL("hv-stimer", X86CPU, hyperv_stimer, false),
 DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
 DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 1168ddf..595891e 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -925,6 +925,8 @@ typedef struct CPUX86State {
 uint64_t msr_hv_synic_evt_page;
 uint64_t msr_hv_synic_msg_page;
 uint64_t msr_hv_synic_sint[HV_SYNIC_SINT_COUNT];
+uint64_t msr_hv_stimer_config[HV_SYNIC_STIMER_COUNT];
+uint64_t msr_hv_stimer_count[HV_SYNIC_STIMER_COUNT];
 
 /* exception/interrupt handling */
 int error_code;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d1c2c81..7692b59 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -90,6 +90,7 @@ static bool has_msr_hv_reset;
 static bool has_msr_hv_vpindex;
 static bool has_msr_hv_runtime;
 static bool has_msr_hv_synic;
+static bool has_msr_hv_stimer;
 static bool has_msr_mtrr;
 static bool has_msr_xss;
 
@@ -526,7 +527,8 @@ static bool hyperv_enabled(X86CPU *cpu)
 cpu->hyperv_reset ||
 cpu->hyperv_vpindex ||
 cpu->hyperv_runtime ||
-cpu->hyperv_synic);
+cpu->hyperv_synic ||
+cpu->hyperv_stimer);
 }
 
 static Error *invtsc_mig_blocker;
@@ -630,6 +632,13 @@ int kvm_arch_init_vcpu(CPUState *cs)
 env->msr_hv_synic_sint[sint] = HV_SYNIC_SINT_MASKED;
 }
 }
+if (cpu->hyperv_stimer) {
+if (!has_msr_hv_stimer) {
+fprintf(stderr, "Hyper-V timers aren't supported by kernel\n");
+return -ENOSYS;
+}
+c->eax |= HV_X64_MSR_SYNTIMER_AVAILABLE;
+}
 c = _data.entries[cpuid_i++];
 c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
 if (cpu->hyperv_relaxed_timing) {
@@ -980,6 +989,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hv_synic = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == HV_X64_MSR_STIMER0_CONFIG) {
+has_msr_hv_stimer = true;
+continue;
+}
 }
 }
 
@@ -1558,6 +1571,19 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
   env->msr_hv_synic_sint[j]);
 }
 }
+if (has_msr_hv_stimer) {
+int j;
+
+for (j = 0; j < ARRAY_SIZE(env->msr_hv_stimer_config); j++) {
+kvm_msr_entry_set([n++], HV_X64_MSR_STIMER0_CONFIG + j*2,
+env->msr_hv_stimer_config[j]);
+}
+
+for (j = 0; j < ARRAY_SIZE(env->msr_hv_stimer_count); j++) {
+kvm_msr_entry_set([n++], HV_X64_MSR_STIMER0_COUNT + j*2,
+env->msr_hv_stimer_count[j]);
+}
+}
 if (has_msr_mtrr) {
 

[PULL 16/45] hw/misc: Hyper-V test device 'hyperv-testdev'

2015-12-17 Thread Paolo Bonzini
From: Andrey Smetanin 

'hyperv-testdev' will be used by kvm-unit-tests
to setup Hyper-V SynIC SINT's routing and to inject
Hyper-V SynIC SINT's.

Hyper-V test device is ISA type device that creates 0x3000
IO memory region and catches write access into it. Every
write operation data decoded into ctl code and parameters
for Hyper-V test device.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/misc/Makefile.objs  |   1 +
 hw/misc/hyperv_testdev.c   | 167 +
 4 files changed, 170 insertions(+)
 create mode 100644 hw/misc/hyperv_testdev.c

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 43c96d1..57ccff3 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
+CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index dfb8095..30f5e75 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
+CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 8a235df..d4765c2 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -43,3 +43,4 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
 
 obj-$(CONFIG_PVPANIC) += pvpanic.o
 obj-$(CONFIG_EDU) += edu.o
+obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
diff --git a/hw/misc/hyperv_testdev.c b/hw/misc/hyperv_testdev.c
new file mode 100644
index 000..d88844a
--- /dev/null
+++ b/hw/misc/hyperv_testdev.c
@@ -0,0 +1,167 @@
+/*
+ * QEMU KVM Hyper-V test device to support Hyper-V kvm-unit-tests
+ *
+ * Copyright (C) 2015 Andrey Smetanin 
+ *
+ * Authors:
+ *  Andrey Smetanin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/hw.h"
+#include "hw/qdev.h"
+#include "hw/isa/isa.h"
+#include "sysemu/kvm.h"
+#include "linux/kvm.h"
+#include "target-i386/hyperv.h"
+#include "kvm_i386.h"
+
+#define HV_TEST_DEV_MAX_SINT_ROUTES 64
+
+struct HypervTestDev {
+ISADevice parent_obj;
+MemoryRegion sint_control;
+HvSintRoute *sint_route[HV_TEST_DEV_MAX_SINT_ROUTES];
+};
+typedef struct HypervTestDev HypervTestDev;
+
+#define TYPE_HYPERV_TEST_DEV "hyperv-testdev"
+#define HYPERV_TEST_DEV(obj) \
+OBJECT_CHECK(HypervTestDev, (obj), TYPE_HYPERV_TEST_DEV)
+
+enum {
+HV_TEST_DEV_SINT_ROUTE_CREATE = 1,
+HV_TEST_DEV_SINT_ROUTE_DESTROY,
+HV_TEST_DEV_SINT_ROUTE_SET_SINT
+};
+
+static int alloc_sint_route_index(HypervTestDev *dev)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(dev->sint_route); i++) {
+if (dev->sint_route[i] == NULL) {
+return i;
+}
+}
+return -1;
+}
+
+static void free_sint_route_index(HypervTestDev *dev, int i)
+{
+assert(i >= 0 && i < ARRAY_SIZE(dev->sint_route));
+dev->sint_route[i] = NULL;
+}
+
+static int find_sint_route_index(HypervTestDev *dev, uint32_t vcpu_id,
+ uint32_t sint)
+{
+HvSintRoute *sint_route;
+int i;
+
+for (i = 0; i < ARRAY_SIZE(dev->sint_route); i++) {
+sint_route = dev->sint_route[i];
+if (sint_route && sint_route->vcpu_id == vcpu_id &&
+sint_route->sint == sint) {
+return i;
+}
+}
+return -1;
+}
+
+static void hv_synic_test_dev_control(HypervTestDev *dev, uint32_t ctl,
+  uint32_t vcpu_id, uint32_t sint)
+{
+int i;
+HvSintRoute *sint_route;
+
+switch (ctl) {
+case HV_TEST_DEV_SINT_ROUTE_CREATE:
+i = alloc_sint_route_index(dev);
+assert(i >= 0);
+sint_route = kvm_hv_sint_route_create(vcpu_id, sint, NULL);
+assert(sint_route);
+dev->sint_route[i] = sint_route;
+break;
+case HV_TEST_DEV_SINT_ROUTE_DESTROY:
+i = find_sint_route_index(dev, vcpu_id, sint);
+assert(i >= 0);
+sint_route = dev->sint_route[i];
+kvm_hv_sint_route_destroy(sint_route);
+free_sint_route_index(dev, i);
+break;
+case HV_TEST_DEV_SINT_ROUTE_SET_SINT:
+i = 

[PULL 13/45] target-i386/kvm: Hyper-V SynIC MSR's support

2015-12-17 Thread Paolo Bonzini
From: Andrey Smetanin 

This patch does Hyper-V Synthetic interrupt
controller(Hyper-V SynIC) MSR's support and
migration. Hyper-V SynIC is enabled by cpu's
'hv-synic' option.

This patch does not allow cpu creation if
'hv-synic' option specified but kernel
doesn't support Hyper-V SynIC.

Changes v3:
* removed 'msr_hv_synic_version' migration because
it's value always the same
* moved SynIC msr's initialization into kvm_arch_init_vcpu

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
---
 target-i386/cpu-qom.h |  1 +
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  5 
 target-i386/kvm.c | 66 ++-
 target-i386/machine.c | 37 +
 5 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
index e3bfe9d..7ea5b34 100644
--- a/target-i386/cpu-qom.h
+++ b/target-i386/cpu-qom.h
@@ -94,6 +94,7 @@ typedef struct X86CPU {
 bool hyperv_reset;
 bool hyperv_vpindex;
 bool hyperv_runtime;
+bool hyperv_synic;
 bool check_cpuid;
 bool enforce_cpuid;
 bool expose_kvm;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 11e5e39..2fdd855 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3146,6 +3146,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("hv-reset", X86CPU, hyperv_reset, false),
 DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
 DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
+DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
 DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
 DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 84edfd0..1168ddf 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -920,6 +920,11 @@ typedef struct CPUX86State {
 uint64_t msr_hv_tsc;
 uint64_t msr_hv_crash_params[HV_X64_MSR_CRASH_PARAMS];
 uint64_t msr_hv_runtime;
+uint64_t msr_hv_synic_control;
+uint64_t msr_hv_synic_version;
+uint64_t msr_hv_synic_evt_page;
+uint64_t msr_hv_synic_msg_page;
+uint64_t msr_hv_synic_sint[HV_SYNIC_SINT_COUNT];
 
 /* exception/interrupt handling */
 int error_code;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 6dc9846..ca0708a 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -86,6 +86,7 @@ static bool has_msr_hv_crash;
 static bool has_msr_hv_reset;
 static bool has_msr_hv_vpindex;
 static bool has_msr_hv_runtime;
+static bool has_msr_hv_synic;
 static bool has_msr_mtrr;
 static bool has_msr_xss;
 
@@ -521,7 +522,8 @@ static bool hyperv_enabled(X86CPU *cpu)
 cpu->hyperv_crash ||
 cpu->hyperv_reset ||
 cpu->hyperv_vpindex ||
-cpu->hyperv_runtime);
+cpu->hyperv_runtime ||
+cpu->hyperv_synic);
 }
 
 static Error *invtsc_mig_blocker;
@@ -610,6 +612,21 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (cpu->hyperv_runtime && has_msr_hv_runtime) {
 c->eax |= HV_X64_MSR_VP_RUNTIME_AVAILABLE;
 }
+if (cpu->hyperv_synic) {
+int sint;
+
+if (!has_msr_hv_synic ||
+kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_SYNIC, 0)) {
+fprintf(stderr, "Hyper-V SynIC is not supported by kernel\n");
+return -ENOSYS;
+}
+
+c->eax |= HV_X64_MSR_SYNIC_AVAILABLE;
+env->msr_hv_synic_version = HV_SYNIC_VERSION_1;
+for (sint = 0; sint < ARRAY_SIZE(env->msr_hv_synic_sint); sint++) {
+env->msr_hv_synic_sint[sint] = HV_SYNIC_SINT_MASKED;
+}
+}
 c = _data.entries[cpuid_i++];
 c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
 if (cpu->hyperv_relaxed_timing) {
@@ -956,6 +973,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hv_runtime = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == HV_X64_MSR_SCONTROL) {
+has_msr_hv_synic = true;
+continue;
+}
 }
 }
 
@@ -1517,6 +1538,23 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_set([n++], HV_X64_MSR_VP_RUNTIME,
   env->msr_hv_runtime);
 }
+if (cpu->hyperv_synic) {
+int j;
+

[PULL 15/45] target-i386/hyperv: Hyper-V SynIC SINT routing and vcpu exit

2015-12-17 Thread Paolo Bonzini
From: Andrey Smetanin 

Hyper-V SynIC(synthetic interrupt controller) helpers for
Hyper-V SynIC irq routing setup, irq injection, irq ack
notifications event/message pages changes tracking for future use.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
---
 target-i386/Makefile.objs |   2 +-
 target-i386/hyperv.c  | 127 ++
 target-i386/hyperv.h  |  42 +++
 target-i386/kvm.c |   6 +++
 4 files changed, 176 insertions(+), 1 deletion(-)
 create mode 100644 target-i386/hyperv.c
 create mode 100644 target-i386/hyperv.h

diff --git a/target-i386/Makefile.objs b/target-i386/Makefile.objs
index 437d997..2255f46 100644
--- a/target-i386/Makefile.objs
+++ b/target-i386/Makefile.objs
@@ -3,5 +3,5 @@ obj-y += excp_helper.o fpu_helper.o cc_helper.o int_helper.o 
svm_helper.o
 obj-y += smm_helper.o misc_helper.o mem_helper.o seg_helper.o
 obj-y += gdbstub.o
 obj-$(CONFIG_SOFTMMU) += machine.o arch_memory_mapping.o arch_dump.o monitor.o
-obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_KVM) += kvm.o hyperv.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
diff --git a/target-i386/hyperv.c b/target-i386/hyperv.c
new file mode 100644
index 000..e79b173
--- /dev/null
+++ b/target-i386/hyperv.c
@@ -0,0 +1,127 @@
+/*
+ * QEMU KVM Hyper-V support
+ *
+ * Copyright (C) 2015 Andrey Smetanin 
+ *
+ * Authors:
+ *  Andrey Smetanin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hyperv.h"
+#include "standard-headers/asm-x86/hyperv.h"
+
+int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit)
+{
+CPUX86State *env = >env;
+
+switch (exit->type) {
+case KVM_EXIT_HYPERV_SYNIC:
+if (!cpu->hyperv_synic) {
+return -1;
+}
+
+/*
+ * For now just track changes in SynIC control and msg/evt pages msr's.
+ * When SynIC messaging/events processing will be added in future
+ * here we will do messages queues flushing and pages remapping.
+ */
+switch (exit->u.synic.msr) {
+case HV_X64_MSR_SCONTROL:
+env->msr_hv_synic_control = exit->u.synic.control;
+break;
+case HV_X64_MSR_SIMP:
+env->msr_hv_synic_msg_page = exit->u.synic.msg_page;
+break;
+case HV_X64_MSR_SIEFP:
+env->msr_hv_synic_evt_page = exit->u.synic.evt_page;
+break;
+default:
+return -1;
+}
+return 0;
+default:
+return -1;
+}
+}
+
+static void kvm_hv_sint_ack_handler(EventNotifier *notifier)
+{
+HvSintRoute *sint_route = container_of(notifier, HvSintRoute,
+   sint_ack_notifier);
+event_notifier_test_and_clear(notifier);
+if (sint_route->sint_ack_clb) {
+sint_route->sint_ack_clb(sint_route);
+}
+}
+
+HvSintRoute *kvm_hv_sint_route_create(uint32_t vcpu_id, uint32_t sint,
+  HvSintAckClb sint_ack_clb)
+{
+HvSintRoute *sint_route;
+int r, gsi;
+
+sint_route = g_malloc0(sizeof(*sint_route));
+r = event_notifier_init(_route->sint_set_notifier, false);
+if (r) {
+goto err;
+}
+
+r = event_notifier_init(_route->sint_ack_notifier, false);
+if (r) {
+goto err_sint_set_notifier;
+}
+
+event_notifier_set_handler(_route->sint_ack_notifier,
+   kvm_hv_sint_ack_handler);
+
+gsi = kvm_irqchip_add_hv_sint_route(kvm_state, vcpu_id, sint);
+if (gsi < 0) {
+goto err_gsi;
+}
+
+r = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+   _route->sint_set_notifier,
+   _route->sint_ack_notifier, 
gsi);
+if (r) {
+goto err_irqfd;
+}
+sint_route->gsi = gsi;
+sint_route->sint_ack_clb = sint_ack_clb;
+sint_route->vcpu_id = vcpu_id;
+sint_route->sint = sint;
+
+return sint_route;
+
+err_irqfd:
+kvm_irqchip_release_virq(kvm_state, gsi);
+err_gsi:
+event_notifier_set_handler(_route->sint_ack_notifier, NULL);
+event_notifier_cleanup(_route->sint_ack_notifier);
+err_sint_set_notifier:
+event_notifier_cleanup(_route->sint_set_notifier);
+err:
+g_free(sint_route);
+
+return NULL;
+}
+
+void kvm_hv_sint_route_destroy(HvSintRoute *sint_route)
+{

[PULL 14/45] kvm: Hyper-V SynIC irq routing support

2015-12-17 Thread Paolo Bonzini
From: Andrey Smetanin 

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini 
---
 include/sysemu/kvm.h |  1 +
 kvm-all.c| 33 +
 2 files changed, 34 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index b31f325..9a569f1 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -455,6 +455,7 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, 
MSIMessage msg,
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter);
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint);
 
 int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
EventNotifier *rn, int virq);
diff --git a/kvm-all.c b/kvm-all.c
index ed707fe..e78a378 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1300,6 +1300,34 @@ int kvm_irqchip_add_adapter_route(KVMState *s, 
AdapterInfo *adapter)
 return virq;
 }
 
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint)
+{
+struct kvm_irq_routing_entry kroute = {};
+int virq;
+
+if (!kvm_gsi_routing_enabled()) {
+return -ENOSYS;
+}
+if (!kvm_check_extension(s, KVM_CAP_HYPERV_SYNIC)) {
+return -ENOSYS;
+}
+virq = kvm_irqchip_get_virq(s);
+if (virq < 0) {
+return virq;
+}
+
+kroute.gsi = virq;
+kroute.type = KVM_IRQ_ROUTING_HV_SINT;
+kroute.flags = 0;
+kroute.u.hv_sint.vcpu = vcpu;
+kroute.u.hv_sint.sint = sint;
+
+kvm_add_routing_entry(s, );
+kvm_irqchip_commit_routes(s);
+
+return virq;
+}
+
 #else /* !KVM_CAP_IRQ_ROUTING */
 
 void kvm_init_irq_routing(KVMState *s)
@@ -1325,6 +1353,11 @@ int kvm_irqchip_add_adapter_route(KVMState *s, 
AdapterInfo *adapter)
 return -ENOSYS;
 }
 
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint)
+{
+return -ENOSYS;
+}
+
 static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
 {
 abort();
-- 
2.5.0


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 00/12] Improve the output of test runners

2015-12-17 Thread Radim Krčmář
v1: http://www.spinics.net/lists/kvm/msg125202.html

Drew brought up the existence of scripts/mkstandalone.sh, which
significantly expanded v2 (and my set of curses) ...
I didn't want to do the same twice, so first part of this series,
[1-4/12], reuses run() from run_tests.sh and does some non-conservative
changes to scripts/mkstandalone.sh.  scripts/mkstandalone.sh is lacking
behind run_tests.sh, but should be good enough to fulfill its purpose.

The output of run_tests.sh has also changed a bit and now looks like
this (you'll again need to imagine colours):

> PASS apic (14 tests)
> PASS ioapic (19 tests)
> PASS smptest (1 tests)
> PASS smptest3 (1 tests)
> PASS vmexit_cpuid
> PASS vmexit_vmcall
> PASS vmexit_mov_from_cr8
> PASS vmexit_mov_to_cr8
> PASS vmexit_inl_pmtimer
> PASS vmexit_ipi
> PASS vmexit_ipi_halt
> PASS vmexit_ple_round_robin
> PASS access
> SKIP smap (0 tests)
> SKIP pku (0 tests)
> PASS emulator (132 tests, 1 skipped)
> PASS eventinj (13 tests)
> PASS hypercall (2 tests)
> PASS idt_test (4 tests)
> PASS msr (13 tests)
> PASS pmu (67 tests, 1 expected failures)
> PASS port80
> PASS realmode
> PASS s3
> PASS sieve
> PASS tsc (3 tests)
> PASS tsc_adjust (5 tests)
> PASS xsave (17 tests)
> PASS rmap_chain
> SKIP svm (0 tests)
> SKIP svm-disabled (0 tests)
> SKIP taskswitch (i386 only)
> SKIP taskswitch2 (i386 only)
> PASS kvmclock_test
> PASS pcid (3 tests)
> SKIP vmx (0 tests)
> PASS debug (7 tests)
> SKIP hyperv_synic (failed check)


Radim Krčmář (12):
  run_tests: move run() to scripts/
  run_tests: prepare for changes in scripts/mkstandalone
  scripts/mkstandalone: use common run function
  scripts/mkstandalone: improve exit paths
  lib/report: allow test skipping
  x86/*: report skipped tests
  x86/pmu: expect failure with nmi_watchdog
  scripts/run: generalize check
  x86/hyperv_synic: check for support before testing
  run_tests: print summary
  wrappers: consolidate skip output
  run_tests: suppress stderr

 lib/libcflat.h  |  1 +
 lib/report.c| 44 +++---
 run_tests.sh| 58 +---
 scripts/mkstandalone.sh | 64 +
 scripts/run.bash| 62 +++
 x86/apic.c  |  7 +++---
 x86/emulator.c  |  2 +-
 x86/hyperv_synic.c  |  2 +-
 x86/pku.c   |  2 +-
 x86/pmu.c   | 11 +++--
 x86/smap.c  |  2 +-
 x86/svm.c   |  2 +-
 x86/tsc.c   |  2 +-
 x86/unittests.cfg   |  4 ++--
 14 files changed, 146 insertions(+), 117 deletions(-)
 create mode 100644 scripts/run.bash

-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 01/12] run_tests: move run() to scripts/

2015-12-17 Thread Radim Krčmář
We'll be using it from scripts/mkstandalone later.

Signed-off-by: Radim Krčmář 
---
 v2: new
 
 run_tests.sh | 53 +
 scripts/run.bash | 51 +++
 2 files changed, 52 insertions(+), 52 deletions(-)
 create mode 100644 scripts/run.bash

diff --git a/run_tests.sh b/run_tests.sh
index fad22a935b00..58949e39c38c 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -6,63 +6,12 @@ if [ ! -f config.mak ]; then
 fi
 source config.mak
 source scripts/functions.bash
+source scripts/run.bash
 
 config=$TEST_DIR/unittests.cfg
 qemu=${QEMU:-qemu-system-$ARCH}
 verbose=0
 
-function run()
-{
-local testname="$1"
-local groups="$2"
-local smp="$3"
-local kernel="$4"
-local opts="$5"
-local arch="$6"
-local check="$7"
-local accel="$8"
-
-if [ -z "$testname" ]; then
-return
-fi
-
-if [ -n "$only_group" ] && ! grep -q "$only_group" <<<$groups; then
-return
-fi
-
-if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
-echo "skip $1 ($arch only)"
-return
-fi
-
-# check a file for a particular value before running a test
-# the check line can contain multiple files to check separated by a space
-# but each check parameter needs to be of the form =
-for check_param in ${check[@]}; do
-path=${check_param%%=*}
-value=${check_param#*=}
-if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then
-echo "skip $1 ($path not equal to $value)"
-return
-fi
-done
-
-cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp $smp 
$opts"
-if [ $verbose != 0 ]; then
-echo $cmdline
-fi
-
-# extra_params in the config file may contain backticks that need to be
-# expanded, so use eval to start qemu
-eval $cmdline >> test.log
-
-if [ $? -le 1 ]; then
-echo -e "\e[32mPASS\e[0m $1"
-else
-echo -e "\e[31mFAIL\e[0m $1"
-fi
-}
-
 function usage()
 {
 cat <=
+for check_param in ${check[@]}; do
+path=${check_param%%=*}
+value=${check_param#*=}
+if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then
+echo "skip $1 ($path not equal to $value)"
+return
+fi
+done
+
+cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp $smp 
$opts"
+if [ $verbose != 0 ]; then
+echo $cmdline
+fi
+
+# extra_params in the config file may contain backticks that need to be
+# expanded, so use eval to start qemu
+eval $cmdline >> test.log
+
+if [ $? -le 1 ]; then
+echo -e "\e[32mPASS\e[0m $1"
+else
+echo -e "\e[31mFAIL\e[0m $1"
+fi
+}
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 12/12] run_tests: suppress stderr

2015-12-17 Thread Radim Krčmář
log it instead to keep the output clear.

Signed-off-by: Radim Krčmář 
---
 v2: new (v1 did this by catching all output in a variable)

 run_tests.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/run_tests.sh b/run_tests.sh
index 62c106a0b693..4c8c20a16eac 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -29,7 +29,7 @@ EOF
 }
 
 __run() { ./$TEST_DIR-run "${@}"; }
-__eval_log() { eval "${@}" >> test.log; }
+__eval_log() { eval "${@}" >> test.log 2>&1; }
 __echo_last_log() { cat test.log; } # XXX: ignores the 'last' bit
 
 echo > test.log
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 06/12] x86/*: report skipped tests

2015-12-17 Thread Radim Krčmář
No care to consistency or exhaustivity was given.

(svm-disabled test should be redone and it's weird that x86/hyperv_synic
 is about the only one that does report_skip when unsupported.)

Signed-off-by: Radim Krčmář 
---
 v2: remove double newline in enable_tsc_deadline_timer [Drew]
 
 x86/apic.c | 7 +++
 x86/emulator.c | 2 +-
 x86/hyperv_synic.c | 2 +-
 x86/pku.c  | 2 +-
 x86/pmu.c  | 2 +-
 x86/smap.c | 2 +-
 x86/svm.c  | 2 +-
 x86/tsc.c  | 2 +-
 8 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/x86/apic.c b/x86/apic.c
index d4eec529e535..a97fe5e5a6c5 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -27,7 +27,7 @@ static void tsc_deadline_timer_isr(isr_regs_t *regs)
 ++tdt_count;
 }
 
-static void start_tsc_deadline_timer(void)
+static void __test_tsc_deadline_timer(void)
 {
 handle_irq(TSC_DEADLINE_TIMER_VECTOR, tsc_deadline_timer_isr);
 irq_enable();
@@ -45,7 +45,6 @@ static int enable_tsc_deadline_timer(void)
 if (cpuid(1).c & (1 << 24)) {
 lvtt = TSC_DEADLINE_TIMER_MODE | TSC_DEADLINE_TIMER_VECTOR;
 apic_write(APIC_LVTT, lvtt);
-start_tsc_deadline_timer();
 return 1;
 } else {
 return 0;
@@ -55,9 +54,9 @@ static int enable_tsc_deadline_timer(void)
 static void test_tsc_deadline_timer(void)
 {
 if(enable_tsc_deadline_timer()) {
-printf("tsc deadline timer enabled\n");
+__test_tsc_deadline_timer();
 } else {
-printf("tsc deadline timer not detected\n");
+report_skip("tsc deadline timer not detected");
 }
 }
 
diff --git a/x86/emulator.c b/x86/emulator.c
index e5c1c6b9a2f3..b64a5fe0f3dc 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -1062,7 +1062,7 @@ static void illegal_movbe_handler(struct ex_regs *regs)
 static void test_illegal_movbe(void)
 {
if (!(cpuid(1).c & (1 << 22))) {
-   printf("SKIP: illegal movbe\n");
+   report_skip("illegal movbe");
return;
}
 
diff --git a/x86/hyperv_synic.c b/x86/hyperv_synic.c
index 18d1295bfb37..602b79392bfd 100644
--- a/x86/hyperv_synic.c
+++ b/x86/hyperv_synic.c
@@ -228,7 +228,7 @@ int main(int ac, char **av)
 
 report("Hyper-V SynIC test", ok);
 } else {
-report("Hyper-V SynIC is not supported", true);
+report_skip("Hyper-V SynIC is not supported");
 }
 
 return report_summary();
diff --git a/x86/pku.c b/x86/pku.c
index 0e00b9984d70..58971d21ed05 100644
--- a/x86/pku.c
+++ b/x86/pku.c
@@ -91,7 +91,7 @@ int main(int ac, char **av)
 
 if (!(cpuid_indexed(7, 0).c & (1 << X86_FEATURE_PKU))) {
 printf("PKU not enabled, exiting\n");
-exit(1);
+return report_summary();
 }
 
 setup_vm();
diff --git a/x86/pmu.c b/x86/pmu.c
index 03f80190bb25..c68980044dee 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -387,7 +387,7 @@ int main(int ac, char **av)
 
if (!eax.split.version_id) {
printf("No pmu is detected!\n");
-   return 1;
+   return report_summary();
}
printf("PMU version: %d\n", eax.split.version_id);
printf("GP counters: %d\n", eax.split.num_counters);
diff --git a/x86/smap.c b/x86/smap.c
index d8a7ae82dc00..0aa44054bd48 100644
--- a/x86/smap.c
+++ b/x86/smap.c
@@ -93,7 +93,7 @@ int main(int ac, char **av)
 
if (!(cpuid_indexed(7, 0).b & (1 << X86_FEATURE_SMAP))) {
printf("SMAP not enabled, exiting\n");
-   exit(1);
+   return report_summary();
}
 
setup_vm();
diff --git a/x86/svm.c b/x86/svm.c
index 1046ddf73732..ff1a0f34b4bf 100644
--- a/x86/svm.c
+++ b/x86/svm.c
@@ -1064,7 +1064,7 @@ int main(int ac, char **av)
 
 if (!(cpuid(0x8001).c & 4)) {
 printf("SVM not availble\n");
-return 0;
+return report_summary();
 }
 
 setup_svm();
diff --git a/x86/tsc.c b/x86/tsc.c
index c71dc2a7abe0..ee247459fb42 100644
--- a/x86/tsc.c
+++ b/x86/tsc.c
@@ -43,5 +43,5 @@ int main()
test_rdtscp(0x100);
} else
printf("rdtscp not supported\n");
-   return 0;
+   return report_summary();
 }
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 03/12] scripts/mkstandalone: use common run function

2015-12-17 Thread Radim Krčmář
The biggest change is dependency on bash.  An alternative would be to
rewrite `run` in POSIX shell, but I think it's ok to presume that KVM
unit tests will run on a system where installing bash isn't a problem.
(We already depend on QEMU ...)

Apart from that, there are changes in output and exit codes.
 - summary doesn't go to stderr
 - PASS/FAIL is colourful
 - FAILed scripts return 1

Signed-off-by: Radim Krčmář 
---
 v2: new (I can fix the stderr in v3)
 
 scripts/mkstandalone.sh | 59 +
 1 file changed, 25 insertions(+), 34 deletions(-)

diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index 3ce244aff67b..cf2182dbd936 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -20,6 +20,13 @@ fi
 unittests=$TEST_DIR/unittests.cfg
 mkdir -p tests
 
+escape ()
+{
+   for arg in "${@}"; do
+   printf "%q " "$arg"; # XXX: trailing whitespace
+   done
+}
+
 function mkstandalone()
 {
local testname="$1"
@@ -49,33 +56,14 @@ function mkstandalone()
cmdline=$(cut -d' ' -f2- <<< "$cmdline")
 
cat < $standalone
-#!/bin/sh
+#!/bin/bash
 
-EOF
-if [ "$arch" ]; then
-   cat <> $standalone
 ARCH=\`uname -m | sed -e s/i.86/i386/ | sed -e 's/arm.*/arm/'\`
-[ "\$ARCH" = "aarch64" ] && ARCH="arm64"
-if [ "\$ARCH" != "$arch" ]; then
-   echo "skip $testname ($arch only)" 1>&2
-   exit 1
-fi
 
 EOF
-fi
-if [ "$check" ]; then
-   cat <> $standalone
-for param in $check; do
-   path=\`echo \$param | cut -d= -f1\`
-   value=\`echo \$param | cut -d= -f2\`
-   if [ -f "\$path" ] && [ "\`cat \$path\`" != "\$value" ]; then
-   echo "skip $testname (\$path not equal to \$value)" 1>&2
-   exit 1
-   fi
-done
 
-EOF
-fi
+cat scripts/run.bash >> $standalone
+
 if [ ! -f $kernel ]; then
cat <> $standalone
 echo "skip $testname (test kernel not present)" 1>&2
@@ -100,9 +88,13 @@ MAX_SMP="MAX_SMP"
 echo \$qemu $cmdline -smp $smp $opts
 
 cmdline="\`echo '$cmdline' | sed s%$kernel%_NO_FILE_4Uhere_%\`"
-if \$qemu \$cmdline 2>&1 | grep 'No accelerator found'; then
-   ret=2
-else
+if \$qemu \$cmdline 2>&1 | grep 'No accelerator found' >/dev/null; then
+   echo "skip $testname (QEMU doesn't support KVM)"
+   exit 1
+fi
+
+__run()
+{
MAX_SMP=\`getconf _NPROCESSORS_CONF\`
while \$qemu \$cmdline -smp \$MAX_SMP 2>&1 | grep 'exceeds max cpus' > 
/dev/null; do
MAX_SMP=\`expr \$MAX_SMP - 1\`
@@ -110,16 +102,15 @@ else
 
cmdline="\`echo '$cmdline' | sed s%$kernel%\$bin%\`"
\$qemu \$cmdline -smp $smp $opts
-   ret=\$?
-fi
-echo Return value from qemu: \$ret
-if [ \$ret -le 1 ]; then
-   echo PASS $testname 1>&2
-else
-   echo FAIL $testname 1>&2
-fi
+}
+
+__eval_log() { eval "\${@}"; }
+
+run `escape "${@}"`
+ret=$?
+
 rm -f \$bin
-exit 0
+exit \$ret
 EOF
 fi
 chmod +x $standalone
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 10/12] run_tests: print summary

2015-12-17 Thread Radim Krčmář
Might be interesting and hopefully won't break too many scripts.

Signed-off-by: Radim Krčmář 
---
 v2:
 - don't print "0 unexpected failures" in run_tests' summary. [Drew]
   (This could have been done in lib/report, but I'm not sure why we want
to always print it in the summary, so I've kept it there.)
 - use $testname
 - don't buffer the output (a serious bug in v1)
 - worse performance (reads the output of all tests)
 
 run_tests.sh| 1 +
 scripts/mkstandalone.sh | 2 ++
 scripts/run.bash| 5 -
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/run_tests.sh b/run_tests.sh
index e09d410beaa4..62c106a0b693 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -30,6 +30,7 @@ EOF
 
 __run() { ./$TEST_DIR-run "${@}"; }
 __eval_log() { eval "${@}" >> test.log; }
+__echo_last_log() { cat test.log; } # XXX: ignores the 'last' bit
 
 echo > test.log
 
diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index 778383077769..c37f694398b8 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -104,7 +104,9 @@ __run()
\$qemu \$cmdline -smp $smp $opts
 }
 
+# log goes to stdout and nothing is remembered
 __eval_log() { eval "\${@}"; }
+__echo_last_log() { echo; }
 
 run `escape "${@}"`
 exit \$?
diff --git a/scripts/run.bash b/scripts/run.bash
index f532cb9e8b1c..d3e8d37d315d 100644
--- a/scripts/run.bash
+++ b/scripts/run.bash
@@ -46,7 +46,10 @@ function run()
 *)  echo -ne "\e[31mFAIL\e[0m"
 esac
 
-echo " $1"
+echo -n " $testname"
+
+__echo_last_log | sed 'x;$s/^SUMMARY: //;ta;$s/.*//p;d;
+ :a s/, 0 unexpected failures//;s/^/ (/;s/$/)/'
 
 return $ret
 }
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 04/12] scripts/mkstandalone: improve exit paths

2015-12-17 Thread Radim Krčmář
trap can be called on EXIT, which covers most exits.

Signed-off-by: Radim Krčmář 
---
 v2: new
 
 scripts/mkstandalone.sh | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index cf2182dbd936..778383077769 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -71,7 +71,7 @@ exit 1
 EOF
 else
cat <> $standalone
-trap 'rm -f \$bin; exit 1' HUP INT TERM
+trap 'rm -f \$bin' EXIT
 bin=\`mktemp\`
 base64 -d << 'BIN_EOF' | zcat > \$bin &&
 EOF
@@ -107,10 +107,7 @@ __run()
 __eval_log() { eval "\${@}"; }
 
 run `escape "${@}"`
-ret=$?
-
-rm -f \$bin
-exit \$ret
+exit \$?
 EOF
 fi
 chmod +x $standalone
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 09/12] x86/hyperv_synic: check for support before testing

2015-12-17 Thread Radim Krčmář
It's not easy to distinguish successful unit-test from failed QEMU, so
we check for presence of the needed feature before hand.

Signed-off-by: Radim Krčmář 
---
 v2: remove "> /dev/null" as check doesn't print the output anymore
 
 x86/unittests.cfg | 1 +
 1 file changed, 1 insertion(+)

diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index 6b94ad93dcf0..25779993cc27 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -182,3 +182,4 @@ arch = x86_64
 file = hyperv_synic.flat
 smp = 2
 extra_params = -cpu kvm64,hv_synic -device hyperv-testdev
+check = echo quit | $qemu -cpu kvm64,hv_synic -device hyperv-testdev -monitor 
stdio
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 07/12] x86/pmu: expect failure with nmi_watchdog

2015-12-17 Thread Radim Krčmář
Host's nmi_watchdog takes one slot, making the "all counters" unit-test
fail.  We know exactly what happens, mark it as expected failure.

PMU test is now executed regardless of host_nmi_watchdog.

Signed-off-by: Radim Krčmář 
---
 v2:
 - host_nmi_watchdog made static
 
 x86/pmu.c | 9 -
 x86/unittests.cfg | 3 +--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index c68980044dee..70e9b3a41e96 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -92,6 +92,7 @@ struct pmu_event {
 };
 
 static int num_counters;
+static bool host_nmi_watchdog;
 
 char *buf;
 
@@ -291,7 +292,7 @@ static void check_counters_many(void)
if (!verify_counter([i]))
break;
 
-   report("all counters", i == n);
+   report_xfail("all counters", host_nmi_watchdog, i == n);
 }
 
 static void check_counter_overflow(void)
@@ -374,6 +375,7 @@ static void check_rdpmc(void)
 
 int main(int ac, char **av)
 {
+   int i;
struct cpuid id = cpuid(10);
 
setup_vm();
@@ -385,6 +387,11 @@ int main(int ac, char **av)
ebx.full = id.b;
edx.full = id.d;
 
+   /* XXX: horrible command line parsing */
+   for (i = 1; i < ac; i++)
+   if (!strcmp(av[i], "host_nmi_watchdog=1"))
+   host_nmi_watchdog = true;
+
if (!eax.split.version_id) {
printf("No pmu is detected!\n");
return report_summary();
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index c15c86df..6b94ad93dcf0 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -106,8 +106,7 @@ file = msr.flat
 
 [pmu]
 file = pmu.flat
-extra_params = -cpu host
-check = /proc/sys/kernel/nmi_watchdog=0
+extra_params = -cpu host -append "host_nmi_watchdog=`cat 
/proc/sys/kernel/nmi_watchdog`"
 
 [port80]
 file = port80.flat
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 02/12] run_tests: prepare for changes in scripts/mkstandalone

2015-12-17 Thread Radim Krčmář
mkstandalone has a different mechanism for running tests as well as a
different handling of output and return codes.
 - create two shell function to capture test execution and logging
 - return the return value of unit-test
 - cope with empty $verbose in `run`

Signed-off-by: Radim Krčmář 
---
 v2: new (reused the bitshift and comment from v1)
 
 run_tests.sh |  4 
 scripts/run.bash | 13 +
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/run_tests.sh b/run_tests.sh
index 58949e39c38c..e09d410beaa4 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -28,7 +28,11 @@ specify the appropriate qemu binary for ARCH-run.
 EOF
 }
 
+__run() { ./$TEST_DIR-run "${@}"; }
+__eval_log() { eval "${@}" >> test.log; }
+
 echo > test.log
+
 while getopts "g:hv" opt; do
 case $opt in
 g)
diff --git a/scripts/run.bash b/scripts/run.bash
index 0c5a2569d80e..243586c6d2fc 100644
--- a/scripts/run.bash
+++ b/scripts/run.bash
@@ -34,18 +34,23 @@ function run()
 fi
 done
 
-cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp $smp 
$opts"
-if [ $verbose != 0 ]; then
+cmdline="TESTNAME=$testname ACCEL=$accel __run $kernel -smp $smp $opts"
+if [ "$verbose" -a "$verbose" != 0 ]; then
 echo $cmdline
 fi
 
 # extra_params in the config file may contain backticks that need to be
 # expanded, so use eval to start qemu
-eval $cmdline >> test.log
+__eval_log "$cmdline"
+# The first bit of return value is too hard to use, just skip it.
+# Unit-tests' return value is shifted by one.
+ret=$(($? >> 1))
 
-if [ $? -le 1 ]; then
+if [ $ret -eq 0 ]; then
 echo -e "\e[32mPASS\e[0m $1"
 else
 echo -e "\e[31mFAIL\e[0m $1"
 fi
+
+return $ret
 }
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 05/12] lib/report: allow test skipping

2015-12-17 Thread Radim Krčmář
We can now explicitly mark a unit-test as skipped.
If all unit-tests were skipped, the whole test is reported as skipped as
well.  This also includes the case where no tests were run, but still
ended with report_summary().

When the whole test is skipped, ./run_tests.sh prints yellow "SKIP"
instead of green "PASS".

Return value of 77 is used to please Autotools.  I also renamed few
things in reporting code and chose to refactor a logic while at it.

Signed-off-by: Radim Krčmář 
---
 v2:
 - turned skip into yellow SKIP [Drew]
 - wrapped line at 80 characters [Drew]
 - added static to va_report
 
 lib/libcflat.h   |  1 +
 lib/report.c | 44 ++--
 scripts/run.bash | 12 +++-
 3 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/lib/libcflat.h b/lib/libcflat.h
index 9747ccdbc9f1..070818354ee1 100644
--- a/lib/libcflat.h
+++ b/lib/libcflat.h
@@ -61,6 +61,7 @@ void report_prefix_push(const char *prefix);
 void report_prefix_pop(void);
 void report(const char *msg_fmt, bool pass, ...);
 void report_xfail(const char *msg_fmt, bool xfail, bool pass, ...);
+void report_skip(const char *msg_fmt, ...);
 int report_summary(void);
 
 #define ARRAY_SIZE(_a) (sizeof(_a)/sizeof((_a)[0]))
diff --git a/lib/report.c b/lib/report.c
index 35e664108a92..a47f2e00b529 100644
--- a/lib/report.c
+++ b/lib/report.c
@@ -13,7 +13,7 @@
 #include "libcflat.h"
 #include "asm/spinlock.h"
 
-static unsigned int tests, failures, xfailures;
+static unsigned int tests, failures, xfailures, skipped;
 static char prefixes[256];
 static struct spinlock lock;
 
@@ -43,25 +43,28 @@ void report_prefix_pop(void)
spin_unlock();
 }
 
-void va_report_xfail(const char *msg_fmt, bool xfail, bool cond, va_list va)
+static void va_report(const char *msg_fmt, bool pass, bool xfail, bool skip,
+   va_list va)
 {
-   char *pass = xfail ? "XPASS" : "PASS";
-   char *fail = xfail ? "XFAIL" : "FAIL";
char buf[2000];
+   char *prefix = skip ? "SKIP"
+   : xfail ? (pass ? "XPASS" : "XFAIL")
+   : (pass ? "PASS"  : "FAIL");
 
spin_lock();
 
tests++;
-   printf("%s: ", cond ? pass : fail);
+   printf("%s: ", prefix);
puts(prefixes);
vsnprintf(buf, sizeof(buf), msg_fmt, va);
puts(buf);
puts("\n");
-   if (xfail && cond)
-   failures++;
-   else if (xfail)
+
+   if (skip)
+   skipped++;
+   else if (xfail && !pass)
xfailures++;
-   else if (!cond)
+   else if (xfail || !pass)
failures++;
 
spin_unlock();
@@ -71,7 +74,7 @@ void report(const char *msg_fmt, bool pass, ...)
 {
va_list va;
va_start(va, pass);
-   va_report_xfail(msg_fmt, false, pass, va);
+   va_report(msg_fmt, pass, false, false, va);
va_end(va);
 }
 
@@ -79,7 +82,15 @@ void report_xfail(const char *msg_fmt, bool xfail, bool 
pass, ...)
 {
va_list va;
va_start(va, pass);
-   va_report_xfail(msg_fmt, xfail, pass, va);
+   va_report(msg_fmt, pass, xfail, false, va);
+   va_end(va);
+}
+
+void report_skip(const char *msg_fmt, ...)
+{
+   va_list va;
+   va_start(va, msg_fmt);
+   va_report(msg_fmt, false, false, true, va);
va_end(va);
 }
 
@@ -89,9 +100,14 @@ int report_summary(void)
 
printf("\nSUMMARY: %d tests, %d unexpected failures", tests, failures);
if (xfailures)
-   printf(", %d expected failures\n", xfailures);
-   else
-   printf("\n");
+   printf(", %d expected failures", xfailures);
+   if (skipped)
+   printf(", %d skipped", skipped);
+   printf("\n");
+
+   if (tests == skipped)
+   return 77; /* blame AUTOTOOLS */
+
return failures > 0 ? 1 : 0;
 
spin_unlock();
diff --git a/scripts/run.bash b/scripts/run.bash
index 243586c6d2fc..b92611c29fbb 100644
--- a/scripts/run.bash
+++ b/scripts/run.bash
@@ -46,11 +46,13 @@ function run()
 # Unit-tests' return value is shifted by one.
 ret=$(($? >> 1))
 
-if [ $ret -eq 0 ]; then
-echo -e "\e[32mPASS\e[0m $1"
-else
-echo -e "\e[31mFAIL\e[0m $1"
-fi
+case $ret in
+0)  echo -ne "\e[32mPASS\e[0m" ;;
+77) echo -ne "\e[33mSKIP\e[0m" ;;
+*)  echo -ne "\e[31mFAIL\e[0m"
+esac
+
+echo " $1"
 
 return $ret
 }
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 08/12] scripts/run: generalize check

2015-12-17 Thread Radim Krčmář
config attribute "check" is currently unused.
Provide a simple implementation instead of removing it.

Signed-off-by: Radim Krčmář 
---
 v2:
 - update scripts/mkstandalone.sh [Drew]
 - don't print too much [Drew]
 - log the output of check command
 - use $testname
 
 scripts/run.bash | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/scripts/run.bash b/scripts/run.bash
index b92611c29fbb..f532cb9e8b1c 100644
--- a/scripts/run.bash
+++ b/scripts/run.bash
@@ -22,17 +22,11 @@ function run()
 return
 fi
 
-# check a file for a particular value before running a test
-# the check line can contain multiple files to check separated by a space
-# but each check parameter needs to be of the form =
-for check_param in ${check[@]}; do
-path=${check_param%%=*}
-value=${check_param#*=}
-if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then
-echo "skip $1 ($path not equal to $value)"
-return
-fi
-done
+__eval_log "$check" || {
+__eval_log 'echo "skipped $testname (check returned $?)"'
+echo "skip $testname (failed check)"
+return
+}
 
 cmdline="TESTNAME=$testname ACCEL=$accel __run $kernel -smp $smp $opts"
 if [ "$verbose" -a "$verbose" != 0 ]; then
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH kvm-unit-tests v2 11/12] wrappers: consolidate skip output

2015-12-17 Thread Radim Krčmář
Ugly helpers will get us yellow "SKIP" to stdout and 77 on exit.

Signed-off-by: Radim Krčmář 
---
 v2: new
 
 scripts/mkstandalone.sh |  6 +++---
 scripts/run.bash| 25 -
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index c37f694398b8..adb11abf650c 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -66,7 +66,7 @@ cat scripts/run.bash >> $standalone
 
 if [ ! -f $kernel ]; then
cat <> $standalone
-echo "skip $testname (test kernel not present)" 1>&2
+echo "\$(SKIP) $testname (test kernel not present)"
 exit 1
 EOF
 else
@@ -89,8 +89,8 @@ echo \$qemu $cmdline -smp $smp $opts
 
 cmdline="\`echo '$cmdline' | sed s%$kernel%_NO_FILE_4Uhere_%\`"
 if \$qemu \$cmdline 2>&1 | grep 'No accelerator found' >/dev/null; then
-   echo "skip $testname (QEMU doesn't support KVM)"
-   exit 1
+   echo "\$(SKIP) $testname (QEMU doesn't support KVM)"
+   exit \$EXIT_SKIP
 fi
 
 __run()
diff --git a/scripts/run.bash b/scripts/run.bash
index d3e8d37d315d..06a13b9aaf1a 100644
--- a/scripts/run.bash
+++ b/scripts/run.bash
@@ -1,3 +1,10 @@
+PASS() { echo -ne "\e[32mPASS\e[0m"; }
+SKIP() { echo -ne "\e[33mSKIP\e[0m"; }
+FAIL() { echo -ne "\e[31mFAIL\e[0m"; }
+
+EXIT_SUCCESS=0
+EXIT_SKIP=77
+
 function run()
 {
 local testname="$1"
@@ -10,22 +17,22 @@ function run()
 local accel="$8"
 
 if [ -z "$testname" ]; then
-return
+return $EXIT_SKIP
 fi
 
 if [ -n "$only_group" ] && ! grep -q "$only_group" <<<$groups; then
-return
+return $EXIT_SKIP
 fi
 
 if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
-echo "skip $1 ($arch only)"
-return
+echo "`SKIP` $testname ($arch only)"
+return $EXIT_SKIP
 fi
 
 __eval_log "$check" || {
 __eval_log 'echo "skipped $testname (check returned $?)"'
-echo "skip $testname (failed check)"
-return
+echo "`SKIP` $testname (failed check)"
+return $EXIT_SKIP
 }
 
 cmdline="TESTNAME=$testname ACCEL=$accel __run $kernel -smp $smp $opts"
@@ -41,9 +48,9 @@ function run()
 ret=$(($? >> 1))
 
 case $ret in
-0)  echo -ne "\e[32mPASS\e[0m" ;;
-77) echo -ne "\e[33mSKIP\e[0m" ;;
-*)  echo -ne "\e[31mFAIL\e[0m"
+$EXIT_SUCCESS) PASS ;;
+$EXIT_SKIP)SKIP ;;
+*) FAIL
 esac
 
 echo -n " $testname"
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 02/12] run_tests: prepare for changes in scripts/mkstandalone

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:33PM +0100, Radim Krčmář wrote:
> mkstandalone has a different mechanism for running tests as well as a
> different handling of output and return codes.
>  - create two shell function to capture test execution and logging
>  - return the return value of unit-test
>  - cope with empty $verbose in `run`
> 
> Signed-off-by: Radim Krčmář 
> ---
>  v2: new (reused the bitshift and comment from v1)
>  
>  run_tests.sh |  4 
>  scripts/run.bash | 13 +
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/run_tests.sh b/run_tests.sh
> index 58949e39c38c..e09d410beaa4 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -28,7 +28,11 @@ specify the appropriate qemu binary for ARCH-run.
>  EOF
>  }
>  
> +__run() { ./$TEST_DIR-run "${@}"; }
> +__eval_log() { eval "${@}" >> test.log; }
> +
>  echo > test.log
> +
>  while getopts "g:hv" opt; do
>  case $opt in
>  g)
> diff --git a/scripts/run.bash b/scripts/run.bash
> index 0c5a2569d80e..243586c6d2fc 100644
> --- a/scripts/run.bash
> +++ b/scripts/run.bash
> @@ -34,18 +34,23 @@ function run()
>  fi
>  done
>  
> -cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp 
> $smp $opts"
> -if [ $verbose != 0 ]; then
> +cmdline="TESTNAME=$testname ACCEL=$accel __run $kernel -smp $smp $opts"
> +if [ "$verbose" -a "$verbose" != 0 ]; then

For bash bools I prefer just doing 'if [ "$verbose" = "yes" ]', allowing it
to be empty.

>  echo $cmdline
>  fi
>  
>  # extra_params in the config file may contain backticks that need to be
>  # expanded, so use eval to start qemu
> -eval $cmdline >> test.log
> +__eval_log "$cmdline"
> +# The first bit of return value is too hard to use, just skip it.
> +# Unit-tests' return value is shifted by one.
> +ret=$(($? >> 1))

I just wrote a patch, inspired by reviewing your v1 of this series, that
tackles the ambiguous exit code problem. I'll post it now, but obviously
we'll need to rebase one or the other of our run_tests.sh series'.

>  
> -if [ $? -le 1 ]; then
> +if [ $ret -eq 0 ]; then
>  echo -e "\e[32mPASS\e[0m $1"
>  else
>  echo -e "\e[31mFAIL\e[0m $1"
>  fi
> +
> +return $ret
>  }
> -- 
> 2.6.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 10/12] run_tests: print summary

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:41PM +0100, Radim Krčmář wrote:
> Might be interesting and hopefully won't break too many scripts.
> 
> Signed-off-by: Radim Krčmář 
> ---
>  v2:
>  - don't print "0 unexpected failures" in run_tests' summary. [Drew]
>(This could have been done in lib/report, but I'm not sure why we want
> to always print it in the summary, so I've kept it there.)
>  - use $testname
>  - don't buffer the output (a serious bug in v1)
>  - worse performance (reads the output of all tests)
>  
>  run_tests.sh| 1 +
>  scripts/mkstandalone.sh | 2 ++
>  scripts/run.bash| 5 -
>  3 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/run_tests.sh b/run_tests.sh
> index e09d410beaa4..62c106a0b693 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -30,6 +30,7 @@ EOF
>  
>  __run() { ./$TEST_DIR-run "${@}"; }
>  __eval_log() { eval "${@}" >> test.log; }
> +__echo_last_log() { cat test.log; } # XXX: ignores the 'last' bit
>  
>  echo > test.log
>  
> diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
> index 778383077769..c37f694398b8 100755
> --- a/scripts/mkstandalone.sh
> +++ b/scripts/mkstandalone.sh
> @@ -104,7 +104,9 @@ __run()
>   \$qemu \$cmdline -smp $smp $opts
>  }
>  
> +# log goes to stdout and nothing is remembered
>  __eval_log() { eval "\${@}"; }
> +__echo_last_log() { echo; }
>  
>  run `escape "${@}"`
>  exit \$?
> diff --git a/scripts/run.bash b/scripts/run.bash
> index f532cb9e8b1c..d3e8d37d315d 100644
> --- a/scripts/run.bash
> +++ b/scripts/run.bash
> @@ -46,7 +46,10 @@ function run()
>  *)  echo -ne "\e[31mFAIL\e[0m"
>  esac
>  
> -echo " $1"
> +echo -n " $testname"
> +
> +__echo_last_log | sed 'x;$s/^SUMMARY: //;ta;$s/.*//p;d;
> + :a s/, 0 unexpected failures//;s/^/ (/;s/$/)/'

This sed is way beyond my skillz. I would have just done the following
at the expense of an extra pipe :-)

tac test.log | grep -m1 '^SUMMARY:' | sed 's/.*: \(.*\),.*/\1/'

>  
>  return $ret
>  }
> -- 
> 2.6.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 09/12] x86/hyperv_synic: check for support before testing

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:40PM +0100, Radim Krčmář wrote:
> It's not easy to distinguish successful unit-test from failed QEMU, so
> we check for presence of the needed feature before hand.
> 
> Signed-off-by: Radim Krčmář 
> ---
>  v2: remove "> /dev/null" as check doesn't print the output anymore
>  
>  x86/unittests.cfg | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> index 6b94ad93dcf0..25779993cc27 100644
> --- a/x86/unittests.cfg
> +++ b/x86/unittests.cfg
> @@ -182,3 +182,4 @@ arch = x86_64
>  file = hyperv_synic.flat
>  smp = 2
>  extra_params = -cpu kvm64,hv_synic -device hyperv-testdev
> +check = echo quit | $qemu -cpu kvm64,hv_synic -device hyperv-testdev 
> -monitor stdio

Let's make sure $QEMU==$qemu in contexts where unittests.cfg is used, and
then document (in the unittests.cfg header) that $QEMU may be used in the
check lines.

> -- 
> 2.6.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 10/12] run_tests: print summary

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:41PM +0100, Radim Krčmář wrote:
> Might be interesting and hopefully won't break too many scripts.
> 
> Signed-off-by: Radim Krčmář 
> ---
>  v2:
>  - don't print "0 unexpected failures" in run_tests' summary. [Drew]
>(This could have been done in lib/report, but I'm not sure why we want
> to always print it in the summary, so I've kept it there.)

I vote we kill it in lib/report. It's usually just noise.

drew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 05/12] lib/report: allow test skipping

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:36PM +0100, Radim Krčmář wrote:
> We can now explicitly mark a unit-test as skipped.
> If all unit-tests were skipped, the whole test is reported as skipped as
> well.  This also includes the case where no tests were run, but still
> ended with report_summary().
> 
> When the whole test is skipped, ./run_tests.sh prints yellow "SKIP"
> instead of green "PASS".
> 
> Return value of 77 is used to please Autotools.  I also renamed few
> things in reporting code and chose to refactor a logic while at it.
> 
> Signed-off-by: Radim Krčmář 
> ---
>  v2:
>  - turned skip into yellow SKIP [Drew]
>  - wrapped line at 80 characters [Drew]
>  - added static to va_report
>  
>  lib/libcflat.h   |  1 +
>  lib/report.c | 44 ++--
>  scripts/run.bash | 12 +++-
>  3 files changed, 38 insertions(+), 19 deletions(-)
> 
> diff --git a/lib/libcflat.h b/lib/libcflat.h
> index 9747ccdbc9f1..070818354ee1 100644
> --- a/lib/libcflat.h
> +++ b/lib/libcflat.h
> @@ -61,6 +61,7 @@ void report_prefix_push(const char *prefix);
>  void report_prefix_pop(void);
>  void report(const char *msg_fmt, bool pass, ...);
>  void report_xfail(const char *msg_fmt, bool xfail, bool pass, ...);
> +void report_skip(const char *msg_fmt, ...);
>  int report_summary(void);
>  
>  #define ARRAY_SIZE(_a) (sizeof(_a)/sizeof((_a)[0]))
> diff --git a/lib/report.c b/lib/report.c
> index 35e664108a92..a47f2e00b529 100644
> --- a/lib/report.c
> +++ b/lib/report.c
> @@ -13,7 +13,7 @@
>  #include "libcflat.h"
>  #include "asm/spinlock.h"
>  
> -static unsigned int tests, failures, xfailures;
> +static unsigned int tests, failures, xfailures, skipped;
>  static char prefixes[256];
>  static struct spinlock lock;
>  
> @@ -43,25 +43,28 @@ void report_prefix_pop(void)
>   spin_unlock();
>  }
>  
> -void va_report_xfail(const char *msg_fmt, bool xfail, bool cond, va_list va)
> +static void va_report(const char *msg_fmt, bool pass, bool xfail, bool skip,
> + va_list va)

Making this static disallows unit test writers to create their own
variable arg report() wrapper functions. Perhaps to determine whether
or not a skip is in order, e.g.

 xyz_report(msg, pass, ...)
 {
va_list va;
va_start(va, pass);
if (xyz)
   va_report(msg, pass, false, false, va);
else
   va_report(msg, false, false, true, va);
va_end(va);
 }

>  {
> - char *pass = xfail ? "XPASS" : "PASS";
> - char *fail = xfail ? "XFAIL" : "FAIL";
>   char buf[2000];
> + char *prefix = skip ? "SKIP"
> + : xfail ? (pass ? "XPASS" : "XFAIL")
> + : (pass ? "PASS"  : "FAIL");
>  
>   spin_lock();
>  
>   tests++;
> - printf("%s: ", cond ? pass : fail);
> + printf("%s: ", prefix);
>   puts(prefixes);
>   vsnprintf(buf, sizeof(buf), msg_fmt, va);
>   puts(buf);
>   puts("\n");
> - if (xfail && cond)
> - failures++;
> - else if (xfail)
> +
> + if (skip)
> + skipped++;
> + else if (xfail && !pass)
>   xfailures++;
> - else if (!cond)
> + else if (xfail || !pass)
>   failures++;
>  
>   spin_unlock();
> @@ -71,7 +74,7 @@ void report(const char *msg_fmt, bool pass, ...)
>  {
>   va_list va;
>   va_start(va, pass);
> - va_report_xfail(msg_fmt, false, pass, va);
> + va_report(msg_fmt, pass, false, false, va);
>   va_end(va);
>  }
>  
> @@ -79,7 +82,15 @@ void report_xfail(const char *msg_fmt, bool xfail, bool 
> pass, ...)
>  {
>   va_list va;
>   va_start(va, pass);
> - va_report_xfail(msg_fmt, xfail, pass, va);
> + va_report(msg_fmt, pass, xfail, false, va);
> + va_end(va);
> +}
> +
> +void report_skip(const char *msg_fmt, ...)
> +{
> + va_list va;
> + va_start(va, msg_fmt);
> + va_report(msg_fmt, false, false, true, va);
>   va_end(va);
>  }
>  
> @@ -89,9 +100,14 @@ int report_summary(void)
>  
>   printf("\nSUMMARY: %d tests, %d unexpected failures", tests, failures);
>   if (xfailures)
> - printf(", %d expected failures\n", xfailures);
> - else
> - printf("\n");
> + printf(", %d expected failures", xfailures);
> + if (skipped)
> + printf(", %d skipped", skipped);
> + printf("\n");
> +
> + if (tests == skipped)
> + return 77; /* blame AUTOTOOLS */
> +
>   return failures > 0 ? 1 : 0;
>  
>   spin_unlock();
> diff --git a/scripts/run.bash b/scripts/run.bash
> index 243586c6d2fc..b92611c29fbb 100644
> --- a/scripts/run.bash
> +++ b/scripts/run.bash
> @@ -46,11 +46,13 @@ function run()
>  # Unit-tests' return value is shifted by one.
>  ret=$(($? >> 1))
>  
> -if [ $ret -eq 0 ]; then
> -echo -e "\e[32mPASS\e[0m $1"
> -else
> -echo -e "\e[31mFAIL\e[0m $1"
> -fi
> +case $ret in
> +0)  echo -ne "\e[32mPASS\e[0m" ;;
> + 

Re: [PATCH kvm-unit-tests v2 05/12] lib/report: allow test skipping

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 01:30:23PM -0600, Andrew Jones wrote:
> On Thu, Dec 17, 2015 at 06:53:36PM +0100, Radim Krčmář wrote:
> > We can now explicitly mark a unit-test as skipped.
> > If all unit-tests were skipped, the whole test is reported as skipped as
> > well.  This also includes the case where no tests were run, but still
> > ended with report_summary().
> > 
> > When the whole test is skipped, ./run_tests.sh prints yellow "SKIP"
> > instead of green "PASS".
> > 
> > Return value of 77 is used to please Autotools.  I also renamed few
> > things in reporting code and chose to refactor a logic while at it.
> > 
> > Signed-off-by: Radim Krčmář 
> > ---
> >  v2:
> >  - turned skip into yellow SKIP [Drew]
> >  - wrapped line at 80 characters [Drew]
> >  - added static to va_report
> >  
> >  lib/libcflat.h   |  1 +
> >  lib/report.c | 44 ++--
> >  scripts/run.bash | 12 +++-
> >  3 files changed, 38 insertions(+), 19 deletions(-)
> > 
> > diff --git a/lib/libcflat.h b/lib/libcflat.h
> > index 9747ccdbc9f1..070818354ee1 100644
> > --- a/lib/libcflat.h
> > +++ b/lib/libcflat.h
> > @@ -61,6 +61,7 @@ void report_prefix_push(const char *prefix);
> >  void report_prefix_pop(void);
> >  void report(const char *msg_fmt, bool pass, ...);
> >  void report_xfail(const char *msg_fmt, bool xfail, bool pass, ...);
> > +void report_skip(const char *msg_fmt, ...);
> >  int report_summary(void);
> >  
> >  #define ARRAY_SIZE(_a) (sizeof(_a)/sizeof((_a)[0]))
> > diff --git a/lib/report.c b/lib/report.c
> > index 35e664108a92..a47f2e00b529 100644
> > --- a/lib/report.c
> > +++ b/lib/report.c
> > @@ -13,7 +13,7 @@
> >  #include "libcflat.h"
> >  #include "asm/spinlock.h"
> >  
> > -static unsigned int tests, failures, xfailures;
> > +static unsigned int tests, failures, xfailures, skipped;
> >  static char prefixes[256];
> >  static struct spinlock lock;
> >  
> > @@ -43,25 +43,28 @@ void report_prefix_pop(void)
> > spin_unlock();
> >  }
> >  
> > -void va_report_xfail(const char *msg_fmt, bool xfail, bool cond, va_list 
> > va)
> > +static void va_report(const char *msg_fmt, bool pass, bool xfail, bool 
> > skip,
> > +   va_list va)
> 
> Making this static disallows unit test writers to create their own
> variable arg report() wrapper functions. Perhaps to determine whether
> or not a skip is in order, e.g.
> 
>  xyz_report(msg, pass, ...)
>  {
> va_list va;
> va_start(va, pass);
> if (xyz)
>va_report(msg, pass, false, false, va);
> else
>va_report(msg, false, false, true, va);
> va_end(va);
>  }

Hmm, while I still think we should avoid using static, to allow new wrappers,
the wrapper I wrote here as an example wouldn't be necessary if report_skip's
inputs were instead 

void report_skip(const char *msg_fmt, bool pass, bool skip, ...)

Why not do that?

> 
> >  {
> > -   char *pass = xfail ? "XPASS" : "PASS";
> > -   char *fail = xfail ? "XFAIL" : "FAIL";
> > char buf[2000];
> > +   char *prefix = skip ? "SKIP"
> > +   : xfail ? (pass ? "XPASS" : "XFAIL")
> > +   : (pass ? "PASS"  : "FAIL");
> >  
> > spin_lock();
> >  
> > tests++;
> > -   printf("%s: ", cond ? pass : fail);
> > +   printf("%s: ", prefix);
> > puts(prefixes);
> > vsnprintf(buf, sizeof(buf), msg_fmt, va);
> > puts(buf);
> > puts("\n");
> > -   if (xfail && cond)
> > -   failures++;
> > -   else if (xfail)
> > +
> > +   if (skip)
> > +   skipped++;
> > +   else if (xfail && !pass)
> > xfailures++;
> > -   else if (!cond)
> > +   else if (xfail || !pass)
> > failures++;
> >  
> > spin_unlock();
> > @@ -71,7 +74,7 @@ void report(const char *msg_fmt, bool pass, ...)
> >  {
> > va_list va;
> > va_start(va, pass);
> > -   va_report_xfail(msg_fmt, false, pass, va);
> > +   va_report(msg_fmt, pass, false, false, va);
> > va_end(va);
> >  }
> >  
> > @@ -79,7 +82,15 @@ void report_xfail(const char *msg_fmt, bool xfail, bool 
> > pass, ...)
> >  {
> > va_list va;
> > va_start(va, pass);
> > -   va_report_xfail(msg_fmt, xfail, pass, va);
> > +   va_report(msg_fmt, pass, xfail, false, va);
> > +   va_end(va);
> > +}
> > +
> > +void report_skip(const char *msg_fmt, ...)
> > +{
> > +   va_list va;
> > +   va_start(va, msg_fmt);
> > +   va_report(msg_fmt, false, false, true, va);
> > va_end(va);
> >  }
> >  
> > @@ -89,9 +100,14 @@ int report_summary(void)
> >  
> > printf("\nSUMMARY: %d tests, %d unexpected failures", tests, failures);
> > if (xfailures)
> > -   printf(", %d expected failures\n", xfailures);
> > -   else
> > -   printf("\n");
> > +   printf(", %d expected failures", xfailures);
> > +   if (skipped)
> > +   printf(", %d skipped", skipped);
> > +   printf("\n");
> > +
> > +   if (tests == skipped)
> > +   return 77; /* blame AUTOTOOLS */
> > +
> > 

[kvm-unit-tests PATCH 3/3] add timeout support

2015-12-17 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 arm/run | 8 ++--
 arm/unittests.cfg   | 1 +
 run_tests.sh| 5 -
 scripts/functions.bash  | 8 ++--
 scripts/mkstandalone.sh | 9 +++--
 x86/run | 8 ++--
 x86/unittests.cfg   | 1 +
 7 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/arm/run b/arm/run
index 4a648697d7fb5..a66892becc8ae 100755
--- a/arm/run
+++ b/arm/run
@@ -75,10 +75,14 @@ chr_testdev+=' -device virtconsole,chardev=ctd -chardev 
testdev,id=ctd'
 M+=",accel=$ACCEL"
 command="$qemu $M -cpu $processor $chr_testdev"
 command+=" -display none -serial stdio -kernel"
-echo $command "$@"
+
+if [ "$TIMEOUT" ]; then
+   timeout_cmd="timeout --foreground $TIMEOUT"
+fi
+echo $timeout_cmd $command "$@"
 
 if [ "$DRYRUN" != "yes" ]; then
-   $command "$@"
+   $timeout_cmd $command "$@"
ret=$?
echo Return value from qemu: $ret
exit $ret
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 926bbb153728b..8c6c475f050fc 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -15,6 +15,7 @@
 # accel = kvm|tcg  # Optionally specify if test must run with
 #  # kvm or tcg. If not specified, then kvm will
 #  # be used when available.
+# timeout =  # Optionally specify a timeout.
 ##
 
 #
diff --git a/run_tests.sh b/run_tests.sh
index f8de08cfb21b5..23494fa032c49 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -21,6 +21,7 @@ function run()
 local arch="$6"
 local check="$7"
 local accel="$8"
+local timeout="${9:-$TIMEOUT}"
 local errlog sig ret
 
 if [ -z "$testname" ]; then
@@ -48,7 +49,7 @@ function run()
 fi
 done
 
-cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp $smp 
$opts"
+cmdline="TESTNAME=$testname ACCEL=$accel TIMEOUT=$timeout ./$TEST_DIR-run 
$kernel -smp $smp $opts"
 if [ $verbose != 0 ]; then
 echo $cmdline
 fi
@@ -78,6 +79,8 @@ function run()
 echo -e "\e[31mFAIL\e[0m $1 (debug-exit not called)"
 elif [ $ret -eq 1 ]; then
 echo -e "\e[32mPASS\e[0m $1"
+elif [ $ret -eq 124 ]; then
+echo -e "\e[31mFAIL\e[0m $1 (timeout; duration=$timeout)"
 elif [ $ret -ge 128 ]; then
 ((sig=ret-128))
 echo -e "\e[31mFAIL\e[0m $1 (got signal $sig)"
diff --git a/scripts/functions.bash b/scripts/functions.bash
index f13fe6f88f23d..ee9143c5d630d 100644
--- a/scripts/functions.bash
+++ b/scripts/functions.bash
@@ -11,12 +11,13 @@ function for_each_unittest()
local arch
local check
local accel
+   local timeout
 
exec {fd}<"$unittests"
 
while read -u $fd line; do
if [[ "$line" =~ ^\[(.*)\]$ ]]; then
-   "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" 
"$arch" "$check" "$accel"
+   "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" 
"$arch" "$check" "$accel" "$timeout"
testname=${BASH_REMATCH[1]}
smp=1
kernel=""
@@ -25,6 +26,7 @@ function for_each_unittest()
arch=""
check=""
accel=""
+   timeout=""
elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then
kernel=$TEST_DIR/${BASH_REMATCH[1]}
elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
@@ -39,8 +41,10 @@ function for_each_unittest()
check=${BASH_REMATCH[1]}
elif [[ $line =~ ^accel\ *=\ *(.*)$ ]]; then
accel=${BASH_REMATCH[1]}
+   elif [[ $line =~ ^timeout\ *=\ *(.*)$ ]]; then
+   timeout=${BASH_REMATCH[1]}
fi
done
-   "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" 
"$accel"
+   "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" 
"$accel" "$timeout"
exec {fd}<&-
 }
diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index 94ea0467c5be6..80eb269b430c6 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -30,6 +30,7 @@ function mkstandalone()
local arch="$6"
local check="$7"
local accel="$8"
+   local timeout="$9"
 
if [ -z "$testname" ]; then
return 1
@@ -97,8 +98,12 @@ if [ "\$QEMU" ]; then
qemu="\$QEMU"
 fi
 
+if [ "$timeout" ]; then
+   timeout_cmd='timeout --foreground $timeout'
+fi
+
 MAX_SMP="MAX_SMP"
-echo \$qemu $cmdline -smp $smp $opts
+echo \$timeout_cmd \$qemu $cmdline -smp $smp $opts
 
 cmdline="\`echo '$cmdline' | sed s%$kernel%_NO_FILE_4Uhere_%\`"
 if \$qemu \$cmdline 2>&1 | grep 'No accelerator found'; then
@@ -110,7 +115,7 @@ else
done
 
cmdline="\`echo '$cmdline' 

[kvm-unit-tests PATCH 1/3] run_tests.sh: reduce return code ambiguity

2015-12-17 Thread Andrew Jones
qemu/unittest exit codes are convoluted, causing codes 0 and 1
to be ambiguous. Here are the possible meanings

 .-.
 || 0 | 1  |
 |-|
 | QEMU   | did something successfully,   | FAILURE|
 || but probably didn't run the   ||
 || unittest, OR caught SIGINT,   ||
 || SIGHUP, or SIGTERM||
 |-|
 | unittest   | for some reason exited using  | SUCCESS|
 || ACPI/PSCI, not with debug-exit||
 .-.

As we can see above, an exit code of zero is even ambiguous for each
row, i.e. QEMU could exit with zero because it successfully completed,
or because it caught a signal. unittest could exit with zero because
it successfully powered-off, or because for some odd reason it powered-
off instead of calling debug-exit.

And, the most fun is that exit-code == 1 means QEMU failed, but the
unittest succeeded.

This patch attempts to reduce that ambiguity, by also looking at stderr.
With it, we have

 0  - unexpected exit from qemu, or the unittest not using debug-exit.
  Consider it a FAILURE
 1  - unittest SUCCESS
 < 128  - something failed (could be the unittest, qemu, or a run script)
  Check the logs.
 >= 128 - signal (signum = code - 128)

Signed-off-by: Andrew Jones 
---
 run_tests.sh| 27 +--
 scripts/mkstandalone.sh | 27 ++-
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/run_tests.sh b/run_tests.sh
index fad22a935b007..f8de08cfb21b5 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -21,6 +21,7 @@ function run()
 local arch="$6"
 local check="$7"
 local accel="$8"
+local errlog sig ret
 
 if [ -z "$testname" ]; then
 return
@@ -54,10 +55,32 @@ function run()
 
 # extra_params in the config file may contain backticks that need to be
 # expanded, so use eval to start qemu
-eval $cmdline >> test.log
+errlog=$(mktemp)
+eval $cmdline >> test.log 2> $errlog
+ret=$?
+
+if [ "$(stat -c %s $errlog)" != "0" ]; then
+# Some signals result in a zero return code, but the error log
+# tells the truth.
+sig="$(grep 'terminating on signal' $errlog | sed 's/.*terminating on 
signal \([0-9][0-9]*\).*/\1/')"
+if [ $ret -eq 0 ] && [ "$sig" ]; then
+((ret=sig+128))
+elif [ $ret -eq 1 ]; then
+# We got the unittest SUCCESS code, but also error messages,
+# let's assume qemu failed.
+ret=2
+fi
+cat $errlog >> test.log
+fi
+rm -f $errlog
 
-if [ $? -le 1 ]; then
+if [ $ret -eq 0 ]; then
+echo -e "\e[31mFAIL\e[0m $1 (debug-exit not called)"
+elif [ $ret -eq 1 ]; then
 echo -e "\e[32mPASS\e[0m $1"
+elif [ $ret -ge 128 ]; then
+((sig=ret-128))
+echo -e "\e[31mFAIL\e[0m $1 (got signal $sig)"
 else
 echo -e "\e[31mFAIL\e[0m $1"
 fi
diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index 3ce244aff67b9..94ea0467c5be6 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -83,8 +83,9 @@ exit 1
 EOF
 else
cat <> $standalone
-trap 'rm -f \$bin; exit 1' HUP INT TERM
+trap 'rm -f \$bin \$errlog; exit 1' HUP INT TERM
 bin=\`mktemp\`
+errlog=\`mktemp\`
 base64 -d << 'BIN_EOF' | zcat > \$bin &&
 EOF
 gzip - < $kernel | base64 >> $standalone
@@ -109,16 +110,32 @@ else
done
 
cmdline="\`echo '$cmdline' | sed s%$kernel%\$bin%\`"
-   \$qemu \$cmdline -smp $smp $opts
+   \$qemu \$cmdline -smp $smp $opts 2> \$errlog
ret=\$?
+   echo Return value from qemu: \$ret
+
+   if [ "\`stat -c %s \$errlog\`" != "0" ]; then
+   sig="\`grep 'terminating on signal' \$errlog | sed 
's/.*terminating on signal \([0-9][0-9]*\).*/\1/'\`"
+   if [ \$ret -eq 0 ] && [ "\$sig" ]; then
+   ret=\`expr \$sig + 128\`
+   elif [ \$ret -eq 1 ]; then
+   ret=2
+   fi
+   cat \$errlog
+   fi
 fi
-echo Return value from qemu: \$ret
-if [ \$ret -le 1 ]; then
+
+if [ \$ret -eq 0 ]; then
+   echo "FAIL $testname (debug-exit not called)" 1>&2
+elif [ \$ret -eq 1 ]; then
echo PASS $testname 1>&2
+elif [ \$ret -ge 128 ]; then
+   echo "FAIL $testname (got signal \`expr \$ret - 128\`)" 1>&2
 else
echo FAIL $testname 1>&2
 fi
-rm -f \$bin
+
+rm -f \$bin \$errlog
 exit 0
 EOF
 fi
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to 

[kvm-unit-tests PATCH 2/3] cleanup unittests.cfg headers

2015-12-17 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 arm/unittests.cfg | 26 +-
 x86/unittests.cfg | 21 ++---
 2 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 5e26da1a8c1bc..926bbb153728b 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -1,13 +1,21 @@
-# Define your new unittest following the convention:
+##
+# unittest configuration
+#
 # [unittest_name]
-# file = foo.flat # Name of the flat file to be used
-# smp  = 2# Number of processors the VM will use during this test
-# # Use $MAX_SMP to use the maximum the host supports.
-# extra_params = -append  # Additional parameters used
-# arch = arm|arm64   # Only if test case is specific to one
-# groups = group1 group2 # Used to identify test cases with run_tests -g ...
-# accel = kvm|tcg # Optionally specify if test must run with kvm or tcg.
-# # If not specified, then kvm will be used when available.
+# file = .flat   # Name of the flat file to be used.
+# smp  =  # Number of processors the VM will use
+#  # during this test. Use $MAX_SMP to use
+#  # the maximum the host supports. Defaults
+#  # to one.
+# extra_params = -append# Additional parameters used.
+# arch = arm|arm64 # Select one if the test case is
+#  # specific to only one.
+# groups =   ... # Used to identify test cases
+#  # with run_tests -g ...
+# accel = kvm|tcg  # Optionally specify if test must run with
+#  # kvm or tcg. If not specified, then kvm will
+#  # be used when available.
+##
 
 #
 # Test that the configured number of processors (smp = ), and
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index c15c86df7..ae41781b5b72b 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -1,11 +1,18 @@
-# Define your new unittest following the convention:
+##
+# unittest configuration
+#
 # [unittest_name]
-# file = foo.flat # Name of the flat file to be used
-# smp = 2 # Number of processors the VM will use during this test
-# # Use $MAX_SMP to use the maximum the host supports.
-# extra_params = -cpu qemu64,+x2apic # Additional parameters used
-# arch = i386/x86_64 # Only if the test case works only on one of them
-# groups = group1 group2 # Used to identify test cases with run_tests -g ...
+# file = .flat   # Name of the flat file to be used.
+# smp  =  # Number of processors the VM will use
+#  # during this test. Use $MAX_SMP to use
+#  # the maximum the host supports. Defaults
+#  # to one.
+# extra_params = -append# Additional parameters used.
+# arch = i386|x86_64   # Select one if the test case is
+#  # specific to only one.
+# groups =   ... # Used to identify test cases
+#  # with run_tests -g ...
+##
 
 [apic]
 file = apic.flat
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[kvm-unit-tests PATCH 0/3] run_tests.sh changes

2015-12-17 Thread Andrew Jones
Improve exit code handling and add unittest timeout support.

Andrew Jones (3):
  run_tests.sh: reduce return code ambiguity
  cleanup unittests.cfg headers
  add timeout support

 arm/run |  8 ++--
 arm/unittests.cfg   | 27 ++-
 run_tests.sh| 32 +---
 scripts/functions.bash  |  8 ++--
 scripts/mkstandalone.sh | 34 --
 x86/run |  8 ++--
 x86/unittests.cfg   | 22 +++---
 7 files changed, 108 insertions(+), 31 deletions(-)

-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 12/12] run_tests: suppress stderr

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:43PM +0100, Radim Krčmář wrote:
> log it instead to keep the output clear.
> 
> Signed-off-by: Radim Krčmář 
> ---
>  v2: new (v1 did this by catching all output in a variable)
> 
>  run_tests.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/run_tests.sh b/run_tests.sh
> index 62c106a0b693..4c8c20a16eac 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -29,7 +29,7 @@ EOF
>  }
>  
>  __run() { ./$TEST_DIR-run "${@}"; }
> -__eval_log() { eval "${@}" >> test.log; }
> +__eval_log() { eval "${@}" >> test.log 2>&1; }
>  __echo_last_log() { cat test.log; } # XXX: ignores the 'last' bit

This will greatly conflict with my exit code untangling patch. I'll send
that now so you can see it.

>  
>  echo > test.log
> -- 
> 2.6.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 03/12] scripts/mkstandalone: use common run function

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:34PM +0100, Radim Krčmář wrote:
> The biggest change is dependency on bash.  An alternative would be to
> rewrite `run` in POSIX shell, but I think it's ok to presume that KVM
> unit tests will run on a system where installing bash isn't a problem.

Hmm... as hard as I worked to avoid the dependency on bash for the
standalone tests, then I'm reluctant to give up on that. I do agree
that having the dependency for the printf-%q trick helps a ton in
making the code more maintainable though.

> (We already depend on QEMU ...)

Dependency on qemu doesn't imply a dependency on bash. The idea behind
the standalone version of kvm-unit-tests tests is that you can receive
one in your email and launch it.

> 
> Apart from that, there are changes in output and exit codes.
>  - summary doesn't go to stderr

I wanted the summary on stderr so when you redirect the output of the
test to a file the output would directly diff with the corresponding
output in test.log from a system where run_tests.sh was used.

>  - PASS/FAIL is colourful
>  - FAILed scripts return 1

I'm not sure why I did exit 0 instead of exit $ret. I guess that was
just a thinko on my part. exit $ret makes more sense.

> 
> Signed-off-by: Radim Krčmář 
> ---
>  v2: new (I can fix the stderr in v3)
>  
>  scripts/mkstandalone.sh | 59 
> +
>  1 file changed, 25 insertions(+), 34 deletions(-)
> 
> diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
> index 3ce244aff67b..cf2182dbd936 100755
> --- a/scripts/mkstandalone.sh
> +++ b/scripts/mkstandalone.sh
> @@ -20,6 +20,13 @@ fi
>  unittests=$TEST_DIR/unittests.cfg
>  mkdir -p tests
>  
> +escape ()
> +{
> + for arg in "${@}"; do
> + printf "%q " "$arg"; # XXX: trailing whitespace
> + done
> +}
> +
>  function mkstandalone()
>  {
>   local testname="$1"
> @@ -49,33 +56,14 @@ function mkstandalone()
>   cmdline=$(cut -d' ' -f2- <<< "$cmdline")
>  
>   cat < $standalone
> -#!/bin/sh
> +#!/bin/bash
>  
> -EOF
> -if [ "$arch" ]; then
> - cat <> $standalone
>  ARCH=\`uname -m | sed -e s/i.86/i386/ | sed -e 's/arm.*/arm/'\`
> -[ "\$ARCH" = "aarch64" ] && ARCH="arm64"
> -if [ "\$ARCH" != "$arch" ]; then
> - echo "skip $testname ($arch only)" 1>&2
> - exit 1
> -fi
>  
>  EOF
> -fi
> -if [ "$check" ]; then
> - cat <> $standalone
> -for param in $check; do
> - path=\`echo \$param | cut -d= -f1\`
> - value=\`echo \$param | cut -d= -f2\`
> - if [ -f "\$path" ] && [ "\`cat \$path\`" != "\$value" ]; then
> - echo "skip $testname (\$path not equal to \$value)" 1>&2
> - exit 1
> - fi
> -done
>  
> -EOF
> -fi
> +cat scripts/run.bash >> $standalone
> +
>  if [ ! -f $kernel ]; then
>   cat <> $standalone
>  echo "skip $testname (test kernel not present)" 1>&2
> @@ -100,9 +88,13 @@ MAX_SMP="MAX_SMP"
>  echo \$qemu $cmdline -smp $smp $opts
>  
>  cmdline="\`echo '$cmdline' | sed s%$kernel%_NO_FILE_4Uhere_%\`"
> -if \$qemu \$cmdline 2>&1 | grep 'No accelerator found'; then
> - ret=2
> -else
> +if \$qemu \$cmdline 2>&1 | grep 'No accelerator found' >/dev/null; then
> + echo "skip $testname (QEMU doesn't support KVM)"
> + exit 1
> +fi
> +
> +__run()
> +{
>   MAX_SMP=\`getconf _NPROCESSORS_CONF\`
>   while \$qemu \$cmdline -smp \$MAX_SMP 2>&1 | grep 'exceeds max cpus' > 
> /dev/null; do
>   MAX_SMP=\`expr \$MAX_SMP - 1\`
> @@ -110,16 +102,15 @@ else
>  
>   cmdline="\`echo '$cmdline' | sed s%$kernel%\$bin%\`"
>   \$qemu \$cmdline -smp $smp $opts
> - ret=\$?
> -fi
> -echo Return value from qemu: \$ret
> -if [ \$ret -le 1 ]; then
> - echo PASS $testname 1>&2
> -else
> - echo FAIL $testname 1>&2
> -fi
> +}
> +
> +__eval_log() { eval "\${@}"; }
> +
> +run `escape "${@}"`
> +ret=$?
> +
>  rm -f \$bin
> -exit 0
> +exit \$ret
>  EOF
>  fi
>  chmod +x $standalone
> -- 
> 2.6.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 11/12] wrappers: consolidate skip output

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:42PM +0100, Radim Krčmář wrote:
> Ugly helpers will get us yellow "SKIP" to stdout and 77 on exit.
> 
> Signed-off-by: Radim Krčmář 
> ---
>  v2: new
>  
>  scripts/mkstandalone.sh |  6 +++---
>  scripts/run.bash| 25 -
>  2 files changed, 19 insertions(+), 12 deletions(-)
> 
> diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
> index c37f694398b8..adb11abf650c 100755
> --- a/scripts/mkstandalone.sh
> +++ b/scripts/mkstandalone.sh
> @@ -66,7 +66,7 @@ cat scripts/run.bash >> $standalone
>  
>  if [ ! -f $kernel ]; then
>   cat <> $standalone
> -echo "skip $testname (test kernel not present)" 1>&2
> +echo "\$(SKIP) $testname (test kernel not present)"
>  exit 1
>  EOF
>  else
> @@ -89,8 +89,8 @@ echo \$qemu $cmdline -smp $smp $opts
>  
>  cmdline="\`echo '$cmdline' | sed s%$kernel%_NO_FILE_4Uhere_%\`"
>  if \$qemu \$cmdline 2>&1 | grep 'No accelerator found' >/dev/null; then
> - echo "skip $testname (QEMU doesn't support KVM)"
> - exit 1
> + echo "\$(SKIP) $testname (QEMU doesn't support KVM)"
> + exit \$EXIT_SKIP
>  fi
>  
>  __run()
> diff --git a/scripts/run.bash b/scripts/run.bash
> index d3e8d37d315d..06a13b9aaf1a 100644
> --- a/scripts/run.bash
> +++ b/scripts/run.bash
> @@ -1,3 +1,10 @@
> +PASS() { echo -ne "\e[32mPASS\e[0m"; }
> +SKIP() { echo -ne "\e[33mSKIP\e[0m"; }
> +FAIL() { echo -ne "\e[31mFAIL\e[0m"; }

I definitely put these in functions.bash, but then I'd put run() in
functions.bash too, as I said earlier.

> +
> +EXIT_SUCCESS=0
> +EXIT_SKIP=77
> +
>  function run()
>  {
>  local testname="$1"
> @@ -10,22 +17,22 @@ function run()
>  local accel="$8"
>  
>  if [ -z "$testname" ]; then
> -return
> +return $EXIT_SKIP
>  fi
>  
>  if [ -n "$only_group" ] && ! grep -q "$only_group" <<<$groups; then
> -return
> +return $EXIT_SKIP
>  fi
>  
>  if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
> -echo "skip $1 ($arch only)"
> -return
> +echo "`SKIP` $testname ($arch only)"
> +return $EXIT_SKIP
>  fi
>  
>  __eval_log "$check" || {
>  __eval_log 'echo "skipped $testname (check returned $?)"'
> -echo "skip $testname (failed check)"
> -return
> +echo "`SKIP` $testname (failed check)"
> +return $EXIT_SKIP
>  }
>  
>  cmdline="TESTNAME=$testname ACCEL=$accel __run $kernel -smp $smp $opts"
> @@ -41,9 +48,9 @@ function run()
>  ret=$(($? >> 1))
>  
>  case $ret in
> -0)  echo -ne "\e[32mPASS\e[0m" ;;
> -77) echo -ne "\e[33mSKIP\e[0m" ;;
> -*)  echo -ne "\e[31mFAIL\e[0m"
> +$EXIT_SUCCESS) PASS ;;
> +$EXIT_SKIP)SKIP ;;
> +*) FAIL
>  esac
>  
>  echo -n " $testname"
> -- 
> 2.6.4

Otherwise
Reviewed-by: Andrew Jones 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 3/9] powerpc/xics: Add icp_native_cause_ipi_rm

2015-12-17 Thread Suresh Warrier
Function to cause an IPI by directly updating the MFFR register
in the XICS. The function is meant for real-mode callers since
they cannot use the smp_ops->cause_ipi function which uses an
ioremapped address.

Normal usage is for the the KVM real mode code to set the IPI message
using smp_muxed_ipi_message_pass and then invoke icp_native_cause_ipi_rm
to cause the actual IPI.

The function requires kvm_hstate.xics_phys to have been initialized
with the physical address of XICS.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/xics.h   |  1 +
 arch/powerpc/sysdev/xics/icp-native.c | 21 +
 2 files changed, 22 insertions(+)

diff --git a/arch/powerpc/include/asm/xics.h b/arch/powerpc/include/asm/xics.h
index 0e25bdb..2546048 100644
--- a/arch/powerpc/include/asm/xics.h
+++ b/arch/powerpc/include/asm/xics.h
@@ -30,6 +30,7 @@
 #ifdef CONFIG_PPC_ICP_NATIVE
 extern int icp_native_init(void);
 extern void icp_native_flush_interrupt(void);
+extern void icp_native_cause_ipi_rm(int cpu);
 #else
 static inline int icp_native_init(void) { return -ENODEV; }
 #endif
diff --git a/arch/powerpc/sysdev/xics/icp-native.c 
b/arch/powerpc/sysdev/xics/icp-native.c
index eae3265..afdf62f 100644
--- a/arch/powerpc/sysdev/xics/icp-native.c
+++ b/arch/powerpc/sysdev/xics/icp-native.c
@@ -159,6 +159,27 @@ static void icp_native_cause_ipi(int cpu, unsigned long 
data)
icp_native_set_qirr(cpu, IPI_PRIORITY);
 }
 
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+void icp_native_cause_ipi_rm(int cpu)
+{
+   /*
+* Currently not used to send IPIs to another CPU
+* on the same core. Only caller is KVM real mode.
+* Need the physical address of the XICS to be
+* previously saved in kvm_hstate in the paca.
+*/
+   unsigned long xics_phys;
+
+   /*
+* Just like the cause_ipi functions, it is required to
+* include a full barrier (out8 includes a sync) before
+* causing the IPI.
+*/
+   xics_phys = paca[cpu].kvm_hstate.xics_phys;
+   out_rm8((u8 *)(xics_phys + XICS_MFRR), IPI_PRIORITY);
+}
+#endif
+
 /*
  * Called when an interrupt is received on an off-line CPU to
  * clear the interrupt, so that the CPU can go back to nap mode.
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 8/9] KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU

2015-12-17 Thread Suresh Warrier
This patch adds support to real-mode KVM to search for a core
running in the host partition and send it an IPI message with
VCPU to be woken. This avoids having to switch to the host
partition to complete an H_IPI hypercall when the VCPU which
is the target of the the H_IPI is not loaded (is not running
in the guest).

The patch also includes the support in the IPI handler running
in the host to do the wakeup by calling kvmppc_xics_ipi_action
for the PPC_MSG_RM_HOST_ACTION message.

When a guest is being destroyed, we need to ensure that there
are no pending IPIs waiting to wake up a VCPU before we free
the VCPUs of the guest. This is accomplished by:
- Forces a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs
  before freeing any VCPUs in kvm_arch_destroy_vm()
- Any PPC_MSG_RM_HOST_ACTION messages must be executed first
  before any other PPC_MSG_CALL_FUNCTION messages

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/kernel/smp.c| 11 +
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 81 ++--
 arch/powerpc/kvm/powerpc.c   | 10 +
 3 files changed, 99 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e222efc..cb8be5d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -257,6 +257,17 @@ irqreturn_t smp_ipi_demux(void)
 
do {
all = xchg(>messages, 0);
+#if defined(CONFIG_KVM_XICS) && defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+   /*
+* Must check for PPC_MSG_RM_HOST_ACTION messages
+* before PPC_MSG_CALL_FUNCTION messages because when
+* a VM is destroyed, we call kick_all_cpus_sync()
+* to ensure that any pending PPC_MSG_RM_HOST_ACTION
+* messages have completed before we free any VCPUs.
+*/
+   if (all & IPI_MESSAGE(PPC_MSG_RM_HOST_ACTION))
+   kvmppc_xics_ipi_action();
+#endif
if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNCTION))
generic_smp_call_function_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 43ffbfe..a8ca3ed 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -51,11 +51,70 @@ static void ics_rm_check_resend(struct kvmppc_xics *xics,
 
 /* -- ICP routines -- */
 
+/*
+ * We start the search from our current CPU Id in the core map
+ * and go in a circle until we get back to our ID looking for a
+ * core that is running in host context and that hasn't already
+ * been targeted for another rm_host_ops.
+ *
+ * In the future, could consider using a fairer algorithm (one
+ * that distributes the IPIs better)
+ *
+ * Returns -1, if no CPU could be found in the host
+ * Else, returns a CPU Id which has been reserved for use
+ */
+static inline int grab_next_hostcore(int start,
+   struct kvmppc_host_rm_core *rm_core, int max, int action)
+{
+   bool success;
+   int core;
+   union kvmppc_rm_state old, new;
+
+   for (core = start + 1; core < max; core++)  {
+   old = new = READ_ONCE(rm_core[core].rm_state);
+
+   if (!old.in_host || old.rm_action)
+   continue;
+
+   /* Try to grab this host core if not taken already. */
+   new.rm_action = action;
+
+   success = cmpxchg64(_core[core].rm_state.raw,
+   old.raw, new.raw) == old.raw;
+   if (success) {
+   /*
+* Make sure that the store to the rm_action is made
+* visible before we return to caller (and the
+* subsequent store to rm_data) to synchronize with
+* the IPI handler.
+*/
+   smp_wmb();
+   return core;
+   }
+   }
+
+   return -1;
+}
+
+static inline int find_available_hostcore(int action)
+{
+   int core;
+   int my_core = smp_processor_id() >> threads_shift;
+   struct kvmppc_host_rm_core *rm_core = kvmppc_host_rm_ops_hv->rm_core;
+
+   core = grab_next_hostcore(my_core, rm_core, cpu_nr_cores(), action);
+   if (core == -1)
+   core = grab_next_hostcore(core, rm_core, my_core, action);
+
+   return core;
+}
+
 static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
struct kvm_vcpu *this_vcpu)
 {
struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
int cpu;
+   int hcore, hcpu;
 
/* Mark the target VCPU as having an interrupt pending */
vcpu->stat.queue_intr++;
@@ -67,11 +126,25 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
return;
}
 
-   

[PATCH v3 5/9] KVM: PPC: Book3S HV: Manage core host state

2015-12-17 Thread Suresh Warrier
Update the core host state in kvmppc_host_rm_ops whenever
the primary thread of the core enters the guest or returns
back.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/kvm/book3s_hv.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4042623..95a2ed3 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2261,6 +2261,46 @@ static void post_guest_process(struct kvmppc_vcore *vc, 
bool is_master)
 }
 
 /*
+ * Clear core from the list of active host cores as we are about to
+ * enter the guest. Only do this if it is the primary thread of the
+ * core (not if a subcore) that is entering the guest.
+ */
+static inline void kvmppc_clear_host_core(int cpu)
+{
+   int core;
+
+   if (!kvmppc_host_rm_ops_hv || cpu_thread_in_core(cpu))
+   return;
+   /*
+* Memory barrier can be omitted here as we will do a smp_wmb()
+* later in kvmppc_start_thread and we need ensure that state is
+* visible to other CPUs only after we enter guest.
+*/
+   core = cpu >> threads_shift;
+   kvmppc_host_rm_ops_hv->rm_core[core].rm_state.in_host = 0;
+}
+
+/*
+ * Advertise this core as an active host core since we exited the guest
+ * Only need to do this if it is the primary thread of the core that is
+ * exiting.
+ */
+static inline void kvmppc_set_host_core(int cpu)
+{
+   int core;
+
+   if (!kvmppc_host_rm_ops_hv || cpu_thread_in_core(cpu))
+   return;
+
+   /*
+* Memory barrier can be omitted here because we do a spin_unlock
+* immediately after this which provides the memory barrier.
+*/
+   core = cpu >> threads_shift;
+   kvmppc_host_rm_ops_hv->rm_core[core].rm_state.in_host = 1;
+}
+
+/*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
@@ -2372,6 +2412,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore 
*vc)
}
}
 
+   kvmppc_clear_host_core(pcpu);
+
/* Start all the threads */
active = 0;
for (sub = 0; sub < core_info.n_subcores; ++sub) {
@@ -2468,6 +2510,8 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore 
*vc)
kvmppc_ipi_thread(pcpu + i);
}
 
+   kvmppc_set_host_core(pcpu);
+
spin_unlock(>lock);
 
/* make sure updates to secondary vcpu structs are visible now */
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 4/9] KVM: PPC: Book3S HV: Host-side RM data structures

2015-12-17 Thread Suresh Warrier
This patch defines the data structures to support the setting up
of host side operations while running in real mode in the guest,
and also the functions to allocate and free it.

The operations are for now limited to virtual XICS operations.
Currently, we have only defined one operation in the data
structure:
 - Wake up a VCPU sleeping in the host when it
   receives a virtual interrupt

The operations are assigned at the core level because PowerKVM
requires that the host run in SMT off mode. For each core,
we will need to manage its state atomically - where the state
is defined by:
1. Is the core running in the host?
2. Is there a Real Mode (RM) operation pending on the host?

Currently, core state is only managed at the whole-core level
even when the system is in split-core mode. This just limits
the number of free or "available" cores in the host to perform
any host-side operations.

The kvmppc_host_rm_core.rm_data allows any data to be passed by
KVM in real mode to the host core along with the operation to
be performed.

The kvmppc_host_rm_ops structure is allocated the very first time
a guest VM is started. Initial core state is also set - all online
cores are in the host. This structure is never deleted, not even
when there are no active guests. However, it needs to be freed
when the module is unloaded because the kvmppc_host_rm_ops_hv
can contain function pointers to kvm-hv.ko functions for the
different supported host operations.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/kvm_ppc.h   | 31 
 arch/powerpc/kvm/book3s_hv.c | 70 
 arch/powerpc/kvm/book3s_hv_builtin.c |  3 ++
 3 files changed, 104 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index c6ef05b..47cd441 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -437,6 +437,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.irq_type == KVMPPC_IRQ_XICS;
 }
+extern void kvmppc_alloc_host_rm_ops(void);
+extern void kvmppc_free_host_rm_ops(void);
 extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
 extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server);
 extern int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args);
@@ -446,6 +448,8 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 
icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
 #else
+static inline void kvmppc_alloc_host_rm_ops(void) {};
+static inline void kvmppc_free_host_rm_ops(void) {};
 static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
{ return 0; }
 static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
@@ -459,6 +463,33 @@ static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, 
u32 cmd)
{ return 0; }
 #endif
 
+/*
+ * Host-side operations we want to set up while running in real
+ * mode in the guest operating on the xics.
+ * Currently only VCPU wakeup is supported.
+ */
+
+union kvmppc_rm_state {
+   unsigned long raw;
+   struct {
+   u32 in_host;
+   u32 rm_action;
+   };
+};
+
+struct kvmppc_host_rm_core {
+   union kvmppc_rm_state rm_state;
+   void *rm_data;
+   char pad[112];
+};
+
+struct kvmppc_host_rm_ops {
+   struct kvmppc_host_rm_core  *rm_core;
+   void(*vcpu_kick)(struct kvm_vcpu *vcpu);
+};
+
+extern struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv;
+
 static inline unsigned long kvmppc_get_epr(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_KVM_BOOKE_HV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 54b45b7..4042623 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2967,6 +2967,73 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu 
*vcpu)
goto out_srcu;
 }
 
+#ifdef CONFIG_KVM_XICS
+/*
+ * Allocate a per-core structure for managing state about which cores are
+ * running in the host versus the guest and for exchanging data between
+ * real mode KVM and CPU running in the host.
+ * This is only done for the first VM.
+ * The allocated structure stays even if all VMs have stopped.
+ * It is only freed when the kvm-hv module is unloaded.
+ * It's OK for this routine to fail, we just don't support host
+ * core operations like redirecting H_IPI wakeups.
+ */
+void kvmppc_alloc_host_rm_ops(void)
+{
+   struct kvmppc_host_rm_ops *ops;
+   unsigned long l_ops;
+   int cpu, core;
+   int size;
+
+   /* Not the first time here ? */
+   if (kvmppc_host_rm_ops_hv != NULL)
+   return;
+
+   ops = kzalloc(sizeof(struct kvmppc_host_rm_ops), GFP_KERNEL);
+   if (!ops)
+   return;
+
+   size = cpu_nr_cores() * sizeof(struct kvmppc_host_rm_core);
+   

[PATCH v3 2/9] powerpc/smp: Add smp_muxed_ipi_set_message

2015-12-17 Thread Suresh Warrier
smp_muxed_ipi_message_pass() invokes smp_ops->cause_ipi, which
uses an ioremapped address to access registers on the XICS
interrupt controller to cause the IPI. Because of this real
mode callers cannot call smp_muxed_ipi_message_pass() for IPI
messaging.

This patch creates a separate function smp_muxed_ipi_set_message
just to set the IPI message without the cause_ipi routine.
After calling this function to set the IPI message, real
mode callers must cause the IPI by writing to the XICS registers
directly.

As part of this, we also change smp_muxed_ipi_message_pass
to call smp_muxed_ipi_set_message to set the message instead
of doing it directly inside the routine.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/smp.h | 1 +
 arch/powerpc/kernel/smp.c  | 9 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 9ef9c37..78083ed 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -124,6 +124,7 @@ extern const char *smp_ipi_name[];
 /* for irq controllers with only a single ipi */
 extern void smp_muxed_ipi_set_data(int cpu, unsigned long data);
 extern void smp_muxed_ipi_message_pass(int cpu, int msg);
+extern void smp_muxed_ipi_set_message(int cpu, int msg);
 extern irqreturn_t smp_ipi_demux(void);
 
 void smp_init_pSeries(void);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a53a130..e222efc 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -218,7 +218,7 @@ void smp_muxed_ipi_set_data(int cpu, unsigned long data)
info->data = data;
 }
 
-void smp_muxed_ipi_message_pass(int cpu, int msg)
+void smp_muxed_ipi_set_message(int cpu, int msg)
 {
struct cpu_messages *info = _cpu(ipi_message, cpu);
char *message = (char *)>messages;
@@ -228,6 +228,13 @@ void smp_muxed_ipi_message_pass(int cpu, int msg)
 */
smp_mb();
message[msg] = 1;
+}
+
+void smp_muxed_ipi_message_pass(int cpu, int msg)
+{
+   struct cpu_messages *info = _cpu(ipi_message, cpu);
+
+   smp_muxed_ipi_set_message(cpu, msg);
/*
 * cause_ipi functions are required to include a full barrier
 * before doing whatever causes the IPI.
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread Alex Williamson
On Thu, 2015-12-17 at 18:37 +0800, yongji xie wrote:
> 
> On 2015/12/17 4:14, Alex Williamson wrote:
> > On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote:
> > > Current vfio-pci implementation disallows to mmap MSI-X table in
> > > case that user get to touch this directly.
> > > 
> > > However, EEH mechanism could ensure that a given pci device
> > > can only shoot the MSIs assigned for its PE and guest kernel also
> > > would not write to MSI-X table in pci_enable_msix() because
> > > para-virtualization on PPC64 platform. So MSI-X table is safe to
> > > access directly from the guest with EEH mechanism enabled.
> > The MSI-X table is paravirtualized on vfio in general and interrupt
> > remapping theoretically protects against errant interrupts, so why
> > is
> > this PPC64 specific?  We have the same safeguards on x86 if we want
> > to
> > decide they're sufficient.  Offhand, the only way I can think that
> > a
> > device can touch the MSI-X table is via backdoors or p2p DMA with
> > another device.
> Maybe I didn't make my point clear. The reasons why we can mmap MSI-X
> table on PPC64 are:
> 
> 1. EEH mechanism could ensure that a given pci device can only shoot
> the MSIs assigned for its PE. So it would not do harm to other memory
> space when the guest write a garbage MSI-X address/data to the vector
> table
> if we passthough MSI-X tables to guest.

Interrupt remapping does the same on x86.

> 2. The guest kernel would not write to MSI-X table on PPC64 platform
> when device drivers call pci_enable_msix() to initialize MSI-X
> interrupts.

This is irrelevant to the vfio API.  vfio is a userspace driver
interface, QEMU is just one possible consumer of the interface.  Even
in the case of PPC64 & QEMU, the guest is still capable of writing to
the vector table, it just probably won't.

> So I think it is safe to mmap/passthrough MSI-X table on PPC64
> platform.
> And I'm not sure whether other architectures can ensure these two 
> points. 

There is another consideration, which is the API exposed to the user.
 vfio currently enforces interrupt setup through ioctls by making the
PCI mechanisms for interrupt programming inaccessible through the
device regions.  Ignoring that you are only focused on PPC64 with QEMU,
does it make sense for the vfio API to allow a user to manipulate
interrupt programming in a way that not only will not work, but in a
way that we expect to fail and require error isolation to recover from?
 I can't say I'm fully convinced that a footnote in the documentation
is sufficient for that.  Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/9] powerpc/smp: Support more IPI messages

2015-12-17 Thread Suresh Warrier
This patch increases the number of demuxed messages for a
controller with a single ipi to 8 for 64-bit systems

This is required because we want to use the IPI mechanism
to send messages from a CPU running in KVM real mode in a
guest to a CPU in the host to take some action. Currently,
we only support 4 messages and all 4 are already taken.

Define a fifth message PPC_MSG_RM_HOST_ACTION for this
purpose.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/smp.h | 3 +++
 arch/powerpc/kernel/smp.c  | 8 
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 825663c..9ef9c37 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -114,6 +114,9 @@ extern int cpu_to_core_id(int cpu);
 #define PPC_MSG_TICK_BROADCAST 2
 #define PPC_MSG_DEBUGGER_BREAK  3
 
+/* This is only used by the powernv kernel */
+#define PPC_MSG_RM_HOST_ACTION 4
+
 /* for irq controllers that have dedicated ipis per message (4) */
 extern int smp_request_message_ipi(int virq, int message);
 extern const char *smp_ipi_name[];
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ec9ec20..a53a130 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -206,7 +206,7 @@ int smp_request_message_ipi(int virq, int msg)
 
 #ifdef CONFIG_PPC_SMP_MUXED_IPI
 struct cpu_messages {
-   int messages;   /* current messages */
+   long messages;  /* current messages */
unsigned long data; /* data for cause ipi */
 };
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct cpu_messages, ipi_message);
@@ -236,15 +236,15 @@ void smp_muxed_ipi_message_pass(int cpu, int msg)
 }
 
 #ifdef __BIG_ENDIAN__
-#define IPI_MESSAGE(A) (1 << (24 - 8 * (A)))
+#define IPI_MESSAGE(A) (1uL << ((BITS_PER_LONG - 8) - 8 * (A)))
 #else
-#define IPI_MESSAGE(A) (1 << (8 * (A)))
+#define IPI_MESSAGE(A) (1uL << (8 * (A)))
 #endif
 
 irqreturn_t smp_ipi_demux(void)
 {
struct cpu_messages *info = this_cpu_ptr(_message);
-   unsigned int all;
+   unsigned long all;
 
mb();   /* order any irq clear */
 
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread Alex Williamson
On Thu, 2015-12-17 at 10:08 +, David Laight wrote:
> > The MSI-X table is paravirtualized on vfio in general and interrupt
> > remapping theoretically protects against errant interrupts, so why
> > is
> > this PPC64 specific? We have the same safeguards on x86 if we want
> > to
> > decide they're sufficient. Offhand, the only way I can think that a
> > device can touch the MSI-X table is via backdoors or p2p DMA with
> > another device.
> 
> Is this all related to the statements in the PCI(e) spec that the
> MSI-X table and Pending bit array should in their own BARs?
> (ISTR it even suggests a BAR each.)
> 
> Since the MSI-X table exists in device memory/registers there is
> nothing to stop the device modifying the table contents (or even
> ignoring the contents and writing address+data pairs that are known
> to reference the CPUs MSI-X interrupt generation logic).
> 
> We've an fpga based PCIe slave that has some additional PCIe slaves
> (associated with the interrupt generation logic) that are currently
> next to the PBA (which is 8k from the MSI-X table).
> If we can't map the PBA we can't actually raise any interrupts.
> The same would be true if page size is 64k and mapping the MSI-X
> table banned.
> 
> Do we need to change our PCIe slave address map so we don't need
> to access anything in the same page (which might be 64k were we to
> target large ppc - which we don't at the moment) as both the
> MSI-X table and the PBA?
> 
> I'd also note that being able to read the MSI-X table is a useful
> diagnostic that the relevant interrupts are enabled properly.

Yes, the spec requirement is that MSI-X structures must reside in a 4k
aligned area that doesn't overlap with other configuration registers
for the device.  It's only an advisement to put them into their own
BAR, and 4k clearly wasn't as forward looking as we'd hope.  Vfio
doesn't particularly care about the PBA, but if it resides in the same
host PAGE_SIZE area as the MSI-X vector table, you currently won't be
able to get to it.  Most devices are not at all dependent on the PBA
for any sort of functionality.

It's really more correct to say that both the vector table and PBA are
emulated by QEMU than paravirtualized.  Only PPC64 has the guest OS
taking a paravirtual path to program the vector table, everyone else
attempts to read/write to the device MMIO space, which gets trapped and
emulated in QEMU.  This is why the QEMU side patch has further ugly
hacks to mess with the ordering of MemoryRegions since even if we can
access and mmap the MSI-X vector table, we'll still trap into QEMU for
emulation.

How exactly does the ability to map the PBA affect your ability to
raise an interrupt?  I can only think that maybe you're writing PBA
bits to clear them, but the spec indicates that software should never
write to the PBA, only read, and that writes are undefined.  So that
would be very non-standard, QEMU drops writes, they don't even make it
to the hardware.  Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 6/9] KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs

2015-12-17 Thread Suresh Warrier
The kvmppc_host_rm_ops structure keeps track of which cores are
are in the host by maintaining a bitmask of active/runnable
online CPUs that have not entered the guest. This patch adds
support to manage the bitmask when a CPU is offlined or onlined
in the host.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/kvm/book3s_hv.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 95a2ed3..da2cc56 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3012,6 +3012,36 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu 
*vcpu)
 }
 
 #ifdef CONFIG_KVM_XICS
+static int kvmppc_cpu_notify(struct notifier_block *self, unsigned long action,
+   void *hcpu)
+{
+   unsigned long cpu = (long)hcpu;
+
+   switch (action) {
+   case CPU_UP_PREPARE:
+   case CPU_UP_PREPARE_FROZEN:
+   kvmppc_set_host_core(cpu);
+   break;
+
+#ifdef CONFIG_HOTPLUG_CPU
+   case CPU_DEAD:
+   case CPU_DEAD_FROZEN:
+   case CPU_UP_CANCELED:
+   case CPU_UP_CANCELED_FROZEN:
+   kvmppc_clear_host_core(cpu);
+   break;
+#endif
+   default:
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block kvmppc_cpu_notifier = {
+   .notifier_call = kvmppc_cpu_notify,
+};
+
 /*
  * Allocate a per-core structure for managing state about which cores are
  * running in the host versus the guest and for exchanging data between
@@ -3045,6 +3075,8 @@ void kvmppc_alloc_host_rm_ops(void)
return;
}
 
+   get_online_cpus();
+
for (cpu = 0; cpu < nr_cpu_ids; cpu += threads_per_core) {
if (!cpu_online(cpu))
continue;
@@ -3063,14 +3095,21 @@ void kvmppc_alloc_host_rm_ops(void)
l_ops = (unsigned long) ops;
 
if (cmpxchg64((unsigned long *)_host_rm_ops_hv, 0, l_ops)) {
+   put_online_cpus();
kfree(ops->rm_core);
kfree(ops);
+   return;
}
+
+   register_cpu_notifier(_cpu_notifier);
+
+   put_online_cpus();
 }
 
 void kvmppc_free_host_rm_ops(void)
 {
if (kvmppc_host_rm_ops_hv) {
+   unregister_cpu_notifier(_cpu_notifier);
kfree(kvmppc_host_rm_ops_hv->rm_core);
kfree(kvmppc_host_rm_ops_hv);
kvmppc_host_rm_ops_hv = NULL;
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 9/9] KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection

2015-12-17 Thread Suresh Warrier
Redirecting the wakeup of a VCPU from the H_IPI hypercall to
a core running in the host is usually a good idea, most workloads
seemed to benefit. However, in one heavily interrupt-driven SMT1
workload, some regression was observed. This patch adds a kvm_hv
module parameter called h_ipi_redirect to control this feature.

The default value for this tunable is 1 - that is enable the feature.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/kvm_ppc.h   |  1 +
 arch/powerpc/kvm/book3s_hv.c | 11 +++
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  5 -
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 1b93519..29d1442 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -448,6 +448,7 @@ extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 
icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
 extern void kvmppc_xics_ipi_action(void);
+extern int h_ipi_redirect;
 #else
 static inline void kvmppc_alloc_host_rm_ops(void) {};
 static inline void kvmppc_free_host_rm_ops(void) {};
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d6280ed..182ec84 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -81,6 +81,17 @@ static int target_smt_mode;
 module_param(target_smt_mode, int, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)");
 
+#ifdef CONFIG_KVM_XICS
+static struct kernel_param_ops module_param_ops = {
+   .set = param_set_int,
+   .get = param_get_int,
+};
+
+module_param_cb(h_ipi_redirect, _param_ops, _ipi_redirect,
+   S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");
+#endif
+
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index a8ca3ed..4c062e7 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -24,6 +24,9 @@
 
 #define DEBUG_PASSUP
 
+int h_ipi_redirect = 1;
+EXPORT_SYMBOL(h_ipi_redirect);
+
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp 
*icp,
u32 new_irq);
 
@@ -134,7 +137,7 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
cpu = vcpu->arch.thread_cpu;
if (cpu < 0 || cpu >= nr_cpu_ids) {
hcore = -1;
-   if (kvmppc_host_rm_ops_hv)
+   if (kvmppc_host_rm_ops_hv && h_ipi_redirect)
hcore = find_available_hostcore(XICS_RM_KICK_VCPU);
if (hcore != -1) {
hcpu = hcore << threads_shift;
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 7/9] KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM

2015-12-17 Thread Suresh Warrier
This patch adds the support for the kick VCPU operation for
kvmppc_host_rm_ops. The kvmppc_xics_ipi_action() function
provides the function to be invoked for a host side operation
when poked by the real mode KVM. This is initiated by KVM by
sending an IPI to any free host core.

KVM real mode must set the rm_action to XICS_RM_KICK_VCPU and
rm_data to point to the VCPU to be woken up before sending the IPI.
Note that we have allocated one kvmppc_host_rm_core structure
per core. The above values need to be set in the structure
corresponding to the core to which the IPI will be sent.

Signed-off-by: Suresh Warrier 
---
 arch/powerpc/include/asm/kvm_ppc.h   |  1 +
 arch/powerpc/kvm/book3s_hv.c |  2 ++
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 36 
 3 files changed, 39 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 47cd441..1b93519 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -447,6 +447,7 @@ extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
 extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
struct kvm_vcpu *vcpu, u32 cpu);
+extern void kvmppc_xics_ipi_action(void);
 #else
 static inline void kvmppc_alloc_host_rm_ops(void) {};
 static inline void kvmppc_free_host_rm_ops(void) {};
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index da2cc56..d6280ed 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3085,6 +3085,8 @@ void kvmppc_alloc_host_rm_ops(void)
ops->rm_core[core].rm_state.in_host = 1;
}
 
+   ops->vcpu_kick = kvmppc_fast_vcpu_kick_hv;
+
/*
 * Make the contents of the kvmppc_host_rm_ops structure visible
 * to other CPUs before we assign it to the global variable.
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c 
b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 24f5807..43ffbfe 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "book3s_xics.h"
@@ -623,3 +624,38 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long 
xirr)
  bail:
return check_too_hard(xics, icp);
 }
+
+/*  --- Non-real mode XICS-related built-in routines ---  */
+
+/**
+ * Host Operations poked by RM KVM
+ */
+static void rm_host_ipi_action(int action, void *data)
+{
+   switch (action) {
+   case XICS_RM_KICK_VCPU:
+   kvmppc_host_rm_ops_hv->vcpu_kick(data);
+   break;
+   default:
+   WARN(1, "Unexpected rm_action=%d data=%p\n", action, data);
+   break;
+   }
+
+}
+
+void kvmppc_xics_ipi_action(void)
+{
+   int core;
+   unsigned int cpu = smp_processor_id();
+   struct kvmppc_host_rm_core *rm_corep;
+
+   core = cpu >> threads_shift;
+   rm_corep = _host_rm_ops_hv->rm_core[core];
+
+   if (rm_corep->rm_data) {
+   rm_host_ipi_action(rm_corep->rm_state.rm_action,
+   rm_corep->rm_data);
+   rm_corep->rm_data = NULL;
+   rm_corep->rm_state.rm_action = 0;
+   }
+}
-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/9] KVM: PPC: Book3S HV: Optimize wakeup VCPU from H_IPI

2015-12-17 Thread Suresh Warrier
When the VCPU target of an H_IPI hypercall is not running
in the guest, we need to do a kick VCPU (wake the VCPU thread)
to make it runnable. The real-mode version of the H_IPI hypercall
cannot do this because it involves waking a sleeping thread.
Thus the hcall returns H_TOO_HARD which forces a switch back
to host so that the H_IPI call can be completed in virtual mode.
This has been found to cause a slowdown for many workloads like
YCSB MongoDB, small message networking, etc. 

One solution is to hand off this job of waking the VCPU to a CPU
that is running in the host by sending it a message through the 
IPI mechanism from the hypercall.

This patch set optimizes the wakeup of the target VCPU by posting
a free core already running in the host to do the wakeup, thus
avoiding the switch to host and back. It requires maintaining a
bitmask of all the available cores in the system to indicate if
they are in the host or running in some guest. It also requires
the H_IPI hypercall to search for a free host core and send it a
new IPI message PPC_MSG_RM_HOST_ACTION after stashing away some
parameters like the pointer to VCPU for the IPI handler. Locks
are avoided by using atomic operations to save core state, to
find and reserve a core in the host, etc.

Note that it is possible for a guest to be destroyed and its
VCPUs freed before the IPI handler gets to run. This case is
handled by ensuring that any pending PPC_MSG_RM_HOST_ACTION
IPIs are completed before proceeding with freeing the VCPUs.

Currently, powerpc only support 4 IPI messages and all 4 are 
already taken for other purposes. This patch also set increases
the number of supported IPI messages to 8. It also provides the
code to send an IPI from hypercall running in real-mode since
the existing cause_ipi functions cannot be executed in real-mode.

A tunable h_ipi_redirect is also included in the patch set to
disable the feature. 

v3:
* Updated/clarified commit logs.
* Fix build break when building without KVM.

v2:
* Complete patch set sent to both kvm and linuxppc mailing lists
  to avoid build-breaks.
* Broke up real mode IPI messaging function into two pieces - one
  to set the message and one to cause the IPI. New function
  icp_native_cause_ipi_rm added to arch/powerpc/sysdev/xics/icp-native.c


Suresh Warrier (9):
  powerpc/smp: Support more IPI messages
  powerpc/smp: Add smp_muxed_ipi_set_message
  powerpc/xics: Add icp_native_cause_ipi_rm
  KVM: PPC: Book3S HV: Host-side RM data structures
  KVM: PPC: Book3S HV: Manage core host state
  KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs
  KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM
  KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU
  KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection

 arch/powerpc/include/asm/kvm_ppc.h|  33 +++
 arch/powerpc/include/asm/smp.h|   4 +
 arch/powerpc/include/asm/xics.h   |   1 +
 arch/powerpc/kernel/smp.c |  28 +-
 arch/powerpc/kvm/book3s_hv.c  | 166 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  |   3 +
 arch/powerpc/kvm/book3s_hv_rm_xics.c  | 120 +++-
 arch/powerpc/kvm/powerpc.c|  10 ++
 arch/powerpc/sysdev/xics/icp-native.c |  21 +
 9 files changed, 378 insertions(+), 8 deletions(-)

-- 
1.8.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 2/3] vfio-pci: Allow to mmap sub-page MMIO BARs if all MMIO BARs are page aligned

2015-12-17 Thread Alex Williamson
On Thu, 2015-12-17 at 18:26 +0800, yongji xie wrote:
> 
> On 2015/12/17 4:04, Alex Williamson wrote:
> > On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote:
> > > Current vfio-pci implementation disallows to mmap
> > > sub-page(size < PAGE_SIZE) MMIO BARs because these BARs' mmio
> > > page
> > > may be shared with other BARs.
> > > 
> > > But we should allow to mmap these sub-page MMIO BARs if all MMIO
> > > BARs
> > > are page aligned which leads the BARs' mmio page would not be
> > > shared
> > > with other BARs.
> > > 
> > > This patch adds support for this case and we also add a
> > > VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED flag to notify userspace that
> > > platform supports all MMIO BARs to be page aligned.
> > > 
> > > Signed-off-by: Yongji Xie 
> > > ---
> > >   drivers/vfio/pci/vfio_pci.c |   10 +-
> > >   drivers/vfio/pci/vfio_pci_private.h |5 +
> > >   include/uapi/linux/vfio.h   |2 ++
> > >   3 files changed, 16 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/vfio/pci/vfio_pci.c
> > > b/drivers/vfio/pci/vfio_pci.c
> > > index 32b88bd..dbcad99 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -443,6 +443,9 @@ static long vfio_pci_ioctl(void *device_data,
> > >   if (vdev->reset_works)
> > >   info.flags |= VFIO_DEVICE_FLAGS_RESET;
> > >   
> > > + if (vfio_pci_bar_page_aligned())
> > > + info.flags |=
> > > VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED;
> > > +
> > >   info.num_regions = VFIO_PCI_NUM_REGIONS;
> > >   info.num_irqs = VFIO_PCI_NUM_IRQS;
> > >   
> > > @@ -479,7 +482,8 @@ static long vfio_pci_ioctl(void *device_data,
> > >    VFIO_REGION_INFO_FLAG_WRIT
> > > E;
> > >   if (IS_ENABLED(CONFIG_VFIO_PCI_MMAP) &&
> > >   pci_resource_flags(pdev,
> > > info.index) &
> > > - IORESOURCE_MEM && info.size >=
> > > PAGE_SIZE)
> > > + IORESOURCE_MEM && (info.size >=
> > > PAGE_SIZE ||
> > > + vfio_pci_bar_page_aligned()))
> > >   info.flags |=
> > > VFIO_REGION_INFO_FLAG_MMAP;
> > >   break;
> > >   case VFIO_PCI_ROM_REGION_INDEX:
> > > @@ -855,6 +859,10 @@ static int vfio_pci_mmap(void *device_data,
> > > struct vm_area_struct *vma)
> > >   return -EINVAL;
> > >   
> > >   phys_len = pci_resource_len(pdev, index);
> > > +
> > > + if (vfio_pci_bar_page_aligned())
> > > + phys_len = PAGE_ALIGN(phys_len);
> > > +
> > >   req_len = vma->vm_end - vma->vm_start;
> > >   pgoff = vma->vm_pgoff &
> > >   ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) -
> > > 1);
> > > diff --git a/drivers/vfio/pci/vfio_pci_private.h
> > > b/drivers/vfio/pci/vfio_pci_private.h
> > > index 0e7394f..319352a 100644
> > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > @@ -69,6 +69,11 @@ struct vfio_pci_device {
> > >   #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) ||
> > > is_msix(vdev)))
> > >   #define irq_is(vdev, type) (vdev->irq_type == type)
> > >   
> > > +static inline bool vfio_pci_bar_page_aligned(void)
> > > +{
> > > + return IS_ENABLED(CONFIG_PPC64);
> > > +}
> > I really dislike this.  This is a problem for any architecture that
> > runs on larger pages, and even an annoyance on 4k hosts.  Why are
> > we
> > only solving it for PPC64?
> Yes, I know it's a problem for other architectures. But I'm not sure
> if 
> other archs prefer
> to enforce the alignment of all BARs to be at least PAGE_SIZE which 
> would result in
> some waste of address space.
> 
> So I just propose a prototype and add PPC64 support here. And other 
> archs could decide
> to use it or not by themselves.
> > Can't we do something similar in the core PCI code and detect it?
> So you mean we can do it like this:
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index d390fc1..f46c04d 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -320,6 +320,11 @@ static inline resource_size_t 
> pci_resource_alignment(struct pci_dev *dev,
>  return resource_alignment(res);
>   }
> 
> +static inline bool pci_bar_page_aligned(void)
> +{
> +   return IS_ENABLED(CONFIG_PPC64);
> +}
> +
>   void pci_enable_acs(struct pci_dev *dev);
> 
>   struct pci_dev_reset_methods {
> 
> or add a config option to indicate that PCI MMIO BARs should be page 
> aligned? 

Yes, I'm thinking of a boot commandline option, maybe one that PPC64
can default to enabled if it chooses to.  The problem is not unique to
PPC64 and the solution should not be unique either.  I don't want to
need to revisit this for ARM, which we know is going to be similarly
afflicted.  Thanks,

Alex
--
To unsubscribe from this 

Re: [PATCH v2 0/3] KVM-UNIT-TESTS: Hyper-V SynIC timers test

2015-12-17 Thread Denis V. Lunev

On 12/17/2015 05:55 PM, Paolo Bonzini wrote:


On 16/12/2015 19:51, Denis V. Lunev wrote:

On 12/08/2015 07:36 PM, Andrey Smetanin wrote:

The test checks Hyper-V SynIC timers functionality.
The test runs on every vCPU and performs start/stop
of periodic/one-shot timers (with period=1ms) and checks
validity of received expiration messages in appropriate
ISR's.

Changes v2:
* Share generic Hyper-V tests code
* Hyper-V SynIC timers test fixes to improve
readability and output

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
CC: Paolo Bonzini 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org

Andrey Smetanin (3):
lib/x86: Make free_page() available to call
x86/hyperv: Move Hyper-V generic code into hyperv.h/hyperv.c
x86: Hyper-V SynIC timers test

   config/config-x86-common.mak |   8 +-
   lib/x86/msr.h|  23 ---
   lib/x86/vm.h |   1 +
   x86/hyperv.c |  24 +++
   x86/hyperv.h | 183 +
   x86/hyperv_stimer.c  | 376
+++
   x86/hyperv_synic.c   |  42 +
   x86/unittests.cfg|   5 +
   8 files changed, 603 insertions(+), 59 deletions(-)
   create mode 100644 x86/hyperv.c
   create mode 100644 x86/hyperv.h
   create mode 100644 x86/hyperv_stimer.c


ping :)

I was waiting for the QEMU 2.5 release so that I can merge the support
patches first.

Paolo

great. Thank you :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


linux-next: manual merge of the akpm-current tree with the kvm tree

2015-12-17 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/x86/kvm/mmu.c

between commits:

  7ee0e5b29d27 ("KVM: x86: MMU: Remove unused parameter of __direct_map()")
  029499b47738 ("KVM: x86: MMU: Make mmu_set_spte() return emulate value")

from the kvm tree and commit:

  7fd3f3e7c320 ("kvm: rename pfn_t to kvm_pfn_t")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc arch/x86/kvm/mmu.c
index a1a3d1907fdc,2dd83650d867..
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@@ -2546,9 -2564,10 +2546,9 @@@ done
return ret;
  }
  
 -static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 -   unsigned pte_access, int write_fault, int *emulate,
 -   int level, gfn_t gfn, kvm_pfn_t pfn, bool speculative,
 -   bool host_writable)
 +static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned 
pte_access,
-int write_fault, int level, gfn_t gfn, pfn_t pfn,
++   int write_fault, int level, gfn_t gfn, kvm_pfn_t pfn,
 +   bool speculative, bool host_writable)
  {
int was_rmapped = 0;
int rmap_count;
@@@ -2606,11 -2624,9 +2606,11 @@@
}
  
kvm_release_pfn_clean(pfn);
 +
 +  return emulate;
  }
  
- static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
+ static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
 bool no_dirty_log)
  {
struct kvm_memory_slot *slot;
@@@ -2691,8 -2708,9 +2691,8 @@@ static void direct_pte_prefetch(struct 
__direct_pte_prefetch(vcpu, sp, sptep);
  }
  
 -static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
 -  int map_writable, int level, gfn_t gfn, kvm_pfn_t pfn,
 -  bool prefault)
 +static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
-   int level, gfn_t gfn, pfn_t pfn, bool prefault)
++  int level, gfn_t gfn, kvm_pfn_t pfn, bool prefault)
  {
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread Benjamin Herrenschmidt
On Thu, 2015-12-17 at 14:41 -0700, Alex Williamson wrote:
> > So I think it is safe to mmap/passthrough MSI-X table on PPC64
> > platform.
> > And I'm not sure whether other architectures can ensure these two 
> > points. 
> 
> There is another consideration, which is the API exposed to the user.
>  vfio currently enforces interrupt setup through ioctls by making the
> PCI mechanisms for interrupt programming inaccessible through the
> device regions.  Ignoring that you are only focused on PPC64 with QEMU,
> does it make sense for the vfio API to allow a user to manipulate
> interrupt programming in a way that not only will not work, but in a
> way that we expect to fail and require error isolation to recover from?
>  I can't say I'm fully convinced that a footnote in the documentation
> is sufficient for that.  Thanks,

Well, one could argue that the "isolation" provided by qemu here is
fairly weak anyway ;-)

I mean. .. how do you know the device doesn't have a backdoor path into
that table via some other MMIO registers anyway ?

In any case, the HW isolation on platforms like pseries means that the
worst the guest can do si shoot itself in the foot. Big deal. On the
other hand, not bothering with intercepting the table has benefits,
such as reducing the memory region clutter, but also removing all the
massive performacne problems we see because adapters have critical
registers in the same 64K page as the MSI-X table.

So I don't think there is any question here that we *need* that
functionality in power. The filtering of the table by Qemu doesn't
provide any practical benefit, it just gets in the way.

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] VFIO: platform: reset: fix a warning message condition

2015-12-17 Thread Dan Carpenter
This loop ends with count set to -1 and not zero so the warning message
isn't printed when it should be.  I've fixed this by change the postop
to a preop.

Fixes: 0990822c9866 ('VFIO: platform: reset: AMD xgbe reset module')
Signed-off-by: Dan Carpenter 

diff --git a/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c 
b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
index da5356f..d4030d0 100644
--- a/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
+++ b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
@@ -110,7 +110,7 @@ int vfio_platform_amdxgbe_reset(struct vfio_platform_device 
*vdev)
usleep_range(10, 15);
 
count = 2000;
-   while (count-- && (ioread32(xgmac_regs->ioaddr + DMA_MR) & 1))
+   while (--count && (ioread32(xgmac_regs->ioaddr + DMA_MR) & 1))
usleep_range(500, 600);
 
if (!count)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported

2015-12-17 Thread David Laight
> The MSI-X table is paravirtualized on vfio in general and interrupt
> remapping theoretically protects against errant interrupts, so why is
> this PPC64 specific? We have the same safeguards on x86 if we want to
> decide they're sufficient. Offhand, the only way I can think that a
> device can touch the MSI-X table is via backdoors or p2p DMA with
> another device.

Is this all related to the statements in the PCI(e) spec that the
MSI-X table and Pending bit array should in their own BARs?
(ISTR it even suggests a BAR each.)

Since the MSI-X table exists in device memory/registers there is
nothing to stop the device modifying the table contents (or even
ignoring the contents and writing address+data pairs that are known
to reference the CPUs MSI-X interrupt generation logic).

We've an fpga based PCIe slave that has some additional PCIe slaves
(associated with the interrupt generation logic) that are currently
next to the PBA (which is 8k from the MSI-X table).
If we can't map the PBA we can't actually raise any interrupts.
The same would be true if page size is 64k and mapping the MSI-X
table banned.

Do we need to change our PCIe slave address map so we don't need
to access anything in the same page (which might be 64k were we to
target large ppc - which we don't at the moment) as both the
MSI-X table and the PBA?

I'd also note that being able to read the MSI-X table is a useful
diagnostic that the relevant interrupts are enabled properly.

David



Re: [PATCH 00/16] MIPS: KVM: Misc trivial cleanups

2015-12-17 Thread James Hogan
Hi Paolo.

On Thu, Dec 17, 2015 at 11:39:14AM +0100, Paolo Bonzini wrote:
> 
> 
> On 17/12/2015 00:49, James Hogan wrote:
> > This patchset contains a bunch of miscellaneous cleanups (which are
> > mostly trivial) for MIPS KVM & MIPS headers, such as:
> > - Style/whitespace fixes
> > - General cleanup and removal of dead code.
> > - Moving/refactoring of general MIPS definitions out of arch/mips/kvm/
> >   and into arch/mips/include/asm/ so they can be shared with the rest of
> >   arch/mips. Specifically COP0 register bits, exception codes, cache
> >   ops, & instruction opcodes.
> > - Add MAINTAINERS entry for MIPS KVM.
> > 
> > Due to the interaction with other arch/mips/ code, I think it makes
> > sense for these to go via the MIPS tree.
> 
> No objection.
> 
> Acked-by: Paolo Bonzini 

Thanks!

> 
> I think I'd use s8/u8 instead of int8_t/uint8_t in patch 15, but really
> that's just me.  I'm fine either way, and that's really the only comment
> I have on the series. :)

I'm inclined to agree. u?int(8|16|32)_t all over arch/mips/kvm/ instead
of [us](8|16|32) irritates me as its not very kernel'y.

Ralf: Maybe don't apply patch 15 (I've marked rejected in patchwork),
and I'll do a wider cleanup at some point instead.

Cheers
James

> 
> Paolo
> 
> > James Hogan (16):
> >   MIPS: KVM: Trivial whitespace and style fixes
> >   MIPS: KVM: Drop some unused definitions from kvm_host.h
> >   MIPS: Move definition of DC bit to mipsregs.h
> >   MIPS: KVM: Drop unused kvm_mips_host_tlb_inv_index()
> >   MIPS: KVM: Convert EXPORT_SYMBOL to _GPL
> >   MIPS: KVM: Refactor added offsetof()s
> >   MIPS: KVM: Make kvm_mips_{init,exit}() static
> >   MIPS: Move Cause.ExcCode trap codes to mipsregs.h
> >   MIPS: Update trap codes
> >   MIPS: Use EXCCODE_ constants with set_except_vector()
> >   MIPS: Break down cacheops.h definitions
> >   MIPS: KVM: Use cacheops.h definitions
> >   MIPS: Move KVM specific opcodes into asm/inst.h
> >   MIPS: KVM: Add missing newline to kvm_err()
> >   MIPS: KVM: Consistent use of uint*_t in MMIO handling
> >   MAINTAINERS: Add KVM for MIPS entry
> > 
> >  MAINTAINERS   |   8 +++
> >  arch/mips/Kconfig |   3 +-
> >  arch/mips/include/asm/cacheops.h  | 106 --
> >  arch/mips/include/asm/kvm_host.h  |  39 +
> >  arch/mips/include/asm/mipsregs.h  |  34 +++
> >  arch/mips/include/uapi/asm/inst.h |   3 +-
> >  arch/mips/kernel/cpu-bugs64.c |   8 +--
> >  arch/mips/kernel/traps.c  |  52 -
> >  arch/mips/kvm/callback.c  |   2 +-
> >  arch/mips/kvm/dyntrans.c  |  10 +---
> >  arch/mips/kvm/emulate.c   | 118 
> > --
> >  arch/mips/kvm/interrupt.c |   8 +--
> >  arch/mips/kvm/locore.S|  12 ++--
> >  arch/mips/kvm/mips.c  |  38 ++--
> >  arch/mips/kvm/opcode.h|  22 ---
> >  arch/mips/kvm/tlb.c   |  77 +++--
> >  arch/mips/kvm/trap_emul.c |   1 -
> >  17 files changed, 245 insertions(+), 296 deletions(-)
> >  delete mode 100644 arch/mips/kvm/opcode.h
> > 
> > Cc: Ralf Baechle 
> > Cc: Paolo Bonzini 
> > Cc: Gleb Natapov 
> > Cc: linux-m...@linux-mips.org
> > Cc: kvm@vger.kernel.org
> > 


signature.asc
Description: Digital signature


Re: [PATCH kvm-unit-tests v2 04/12] scripts/mkstandalone: improve exit paths

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:35PM +0100, Radim Krčmář wrote:
> trap can be called on EXIT, which covers most exits.

Not with dash :-) If we decide to depend on bash, then

Reviewed-by: Andrew Jones 

> 
> Signed-off-by: Radim Krčmář 
> ---
>  v2: new
>  
>  scripts/mkstandalone.sh | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
> index cf2182dbd936..778383077769 100755
> --- a/scripts/mkstandalone.sh
> +++ b/scripts/mkstandalone.sh
> @@ -71,7 +71,7 @@ exit 1
>  EOF
>  else
>   cat <> $standalone
> -trap 'rm -f \$bin; exit 1' HUP INT TERM
> +trap 'rm -f \$bin' EXIT
>  bin=\`mktemp\`
>  base64 -d << 'BIN_EOF' | zcat > \$bin &&
>  EOF
> @@ -107,10 +107,7 @@ __run()
>  __eval_log() { eval "\${@}"; }
>  
>  run `escape "${@}"`
> -ret=$?
> -
> -rm -f \$bin
> -exit \$ret
> +exit \$?
>  EOF
>  fi
>  chmod +x $standalone
> -- 
> 2.6.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvm-unit-tests v2 00/12] Improve the output of test runners

2015-12-17 Thread Andrew Jones
On Thu, Dec 17, 2015 at 06:53:31PM +0100, Radim Krčmář wrote:
> v1: http://www.spinics.net/lists/kvm/msg125202.html
> 
> Drew brought up the existence of scripts/mkstandalone.sh, which
> significantly expanded v2 (and my set of curses) ...
> I didn't want to do the same twice, so first part of this series,
> [1-4/12], reuses run() from run_tests.sh and does some non-conservative
> changes to scripts/mkstandalone.sh.  scripts/mkstandalone.sh is lacking
> behind run_tests.sh, but should be good enough to fulfill its purpose.
> 
> The output of run_tests.sh has also changed a bit and now looks like
> this (you'll again need to imagine colours):
> 
> > PASS apic (14 tests)
> > PASS ioapic (19 tests)
> > PASS smptest (1 tests)
> > PASS smptest3 (1 tests)
> > PASS vmexit_cpuid
> > PASS vmexit_vmcall

Why do some tests, which have only 1 test, say (1 tests), but other
tests don't say anything?

> > PASS vmexit_mov_from_cr8
> > PASS vmexit_mov_to_cr8
> > PASS vmexit_inl_pmtimer
> > PASS vmexit_ipi
> > PASS vmexit_ipi_halt
> > PASS vmexit_ple_round_robin
> > PASS access
> > SKIP smap (0 tests)
> > SKIP pku (0 tests)
> > PASS emulator (132 tests, 1 skipped)
> > PASS eventinj (13 tests)
> > PASS hypercall (2 tests)
> > PASS idt_test (4 tests)
> > PASS msr (13 tests)
> > PASS pmu (67 tests, 1 expected failures)
> > PASS port80
> > PASS realmode
> > PASS s3
> > PASS sieve
> > PASS tsc (3 tests)
> > PASS tsc_adjust (5 tests)
> > PASS xsave (17 tests)
> > PASS rmap_chain
> > SKIP svm (0 tests)
> > SKIP svm-disabled (0 tests)
> > SKIP taskswitch (i386 only)
> > SKIP taskswitch2 (i386 only)
> > PASS kvmclock_test
> > PASS pcid (3 tests)
> > SKIP vmx (0 tests)
> > PASS debug (7 tests)
> > SKIP hyperv_synic (failed check)

Some nice improvements with this series. I'm not sure I like depending
on bash in standalone tests, but then that could just be cause I worked
pretty hard at avoiding the dependency, and thus I'll have to cry at the
loss of it...

Please review the series I'll send in about 2 minutes, so we can discuss
how to integrate them.

Thanks,
drew

> 
> 
> Radim Krčmář (12):
>   run_tests: move run() to scripts/
>   run_tests: prepare for changes in scripts/mkstandalone
>   scripts/mkstandalone: use common run function
>   scripts/mkstandalone: improve exit paths
>   lib/report: allow test skipping
>   x86/*: report skipped tests
>   x86/pmu: expect failure with nmi_watchdog
>   scripts/run: generalize check
>   x86/hyperv_synic: check for support before testing
>   run_tests: print summary
>   wrappers: consolidate skip output
>   run_tests: suppress stderr
> 
>  lib/libcflat.h  |  1 +
>  lib/report.c| 44 +++---
>  run_tests.sh| 58 +---
>  scripts/mkstandalone.sh | 64 
> +
>  scripts/run.bash| 62 +++
>  x86/apic.c  |  7 +++---
>  x86/emulator.c  |  2 +-
>  x86/hyperv_synic.c  |  2 +-
>  x86/pku.c   |  2 +-
>  x86/pmu.c   | 11 +++--
>  x86/smap.c  |  2 +-
>  x86/svm.c   |  2 +-
>  x86/tsc.c   |  2 +-
>  x86/unittests.cfg   |  4 ++--
>  14 files changed, 146 insertions(+), 117 deletions(-)
>  create mode 100644 scripts/run.bash
> 
> -- 
> 2.6.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: KVM: debug: Remove spurious inline attributes

2015-12-17 Thread Alex Bennée

Marc Zyngier  writes:

> On 17/12/15 16:28, Alex Bennée wrote:
>>
>> Marc Zyngier  writes:
>>
>>> The debug trapping code is pretty heavy on the "inline" attribute,
>>> but most functions are actually referenced in the sysreg tables,
>>> making the inlining imposible.
>>>
>>> Removing the useless inline qualifier seems the right thing to do,
>>> having verified that the output code is similar.
>>>
>>> Cc: Alex Bennée 
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>  arch/arm64/kvm/sys_regs.c | 58 
>>> +++
>>>  1 file changed, 29 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index 88adebf..eec3598 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -220,9 +220,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>>>   * All writes will set the KVM_ARM64_DEBUG_DIRTY flag to ensure the
>>>   * hyp.S code switches between host and guest values in future.
>>>   */
>>> -static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
>>> - struct sys_reg_params *p,
>>> - u64 *dbg_reg)
>>> +static void reg_to_dbg(struct kvm_vcpu *vcpu,
>>> +  struct sys_reg_params *p,
>>> +  u64 *dbg_reg)
>>>  {
>>> u64 val = p->regval;
>>>
>>> @@ -235,18 +235,18 @@ static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
>>> vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>>>  }
>>>
>>> -static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
>>> - struct sys_reg_params *p,
>>> - u64 *dbg_reg)
>>> +static void dbg_to_reg(struct kvm_vcpu *vcpu,
>>> +  struct sys_reg_params *p,
>>> +  u64 *dbg_reg)
>>>  {
>>> p->regval = *dbg_reg;
>>> if (p->is_32bit)
>>> p->regval &= 0xUL;
>>>  }
>>
>> Christoffer's "register keyword" comments not-withstanding I'd prefer to
>> keep the reg_to_dbg/dbg_to_reg functions as inline because they really
>> are just boilerplate helpers I didn't want to repeat in the actual
>> access functions - although if you've looked at the code I assume that
>> means GCC has been smart about it.
>
> Indeed, GCC is smart enough to directly inline it. In general, GCC is
> doing a pretty good job at inlining static functions that are small
> enough not to be worth jumping to. These days, "static inline" only
> really makes sense in an include file.

Fair enough.

>
>> The rest all make sense. I wonder why I was being so inline happy?
>>
>> Reviewed-by: Alex Bennée 
>
> Thanks,
>
>   M.


--
Alex Bennée
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/3] VFIO: capability chains

2015-12-17 Thread Alexey Kardashevskiy

On 11/24/2015 07:43 AM, Alex Williamson wrote:

Please see the commit log and comments in patch 1 for a general
explanation of the problems that this series tries to address.  The
general problem is that we have several cases where we want to expose
variable sized information to the user, whether it's sparse mmaps for
a region, as implemented here, or DMA mapping ranges of an IOMMU, or
reserved MSI mapping ranges, etc.  Extending data structures is hard;
extending them to report variable sized data is really hard.  After
considering several options, I think the best approach is to copy how
PCI does capabilities.  This allows the ioctl to only expose the
capabilities that are relevant for them, avoids data structures that
are too complicated to parse, and avoids creating a new ioctl each
time we think of something else that we'd like to report.  This method
also doesn't preclude extensions to the fixed structure since the
offset of these capabilities is entirely dynamic.

Comments welcome, I'll also follow-up to the QEMU and KVM lists with
an RFC making use of this for mmaps skipping over the MSI-X table.
Thanks,


Out of curiosity - could this information be exposed to the userspace via 
/sys/bus/pci/devices/:xx:xx:x/vfio_? It seems not to change after 
vfio_pci driver is bound to a device.






Alex

---

Alex Williamson (3):
   vfio: Define capability chains
   vfio: Define sparse mmap capability for regions
   vfio/pci: Include sparse mmap capability for MSI-X table regions


  drivers/vfio/pci/vfio_pci.c |  101 +++
  include/uapi/linux/vfio.h   |   53 ++-
  2 files changed, 152 insertions(+), 2 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




--
Alexey
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/3] VFIO: capability chains

2015-12-17 Thread Alex Williamson
On Fri, 2015-12-18 at 13:05 +1100, Alexey Kardashevskiy wrote:
> On 11/24/2015 07:43 AM, Alex Williamson wrote:
> > Please see the commit log and comments in patch 1 for a general
> > explanation of the problems that this series tries to address.  The
> > general problem is that we have several cases where we want to
> > expose
> > variable sized information to the user, whether it's sparse mmaps
> > for
> > a region, as implemented here, or DMA mapping ranges of an IOMMU,
> > or
> > reserved MSI mapping ranges, etc.  Extending data structures is
> > hard;
> > extending them to report variable sized data is really hard.  After
> > considering several options, I think the best approach is to copy
> > how
> > PCI does capabilities.  This allows the ioctl to only expose the
> > capabilities that are relevant for them, avoids data structures
> > that
> > are too complicated to parse, and avoids creating a new ioctl each
> > time we think of something else that we'd like to report.  This
> > method
> > also doesn't preclude extensions to the fixed structure since the
> > offset of these capabilities is entirely dynamic.
> > 
> > Comments welcome, I'll also follow-up to the QEMU and KVM lists
> > with
> > an RFC making use of this for mmaps skipping over the MSI-X table.
> > Thanks,
> 
> Out of curiosity - could this information be exposed to the userspace
> via 
> /sys/bus/pci/devices/:xx:xx:x/vfio_? It seems not to change
> after 
> vfio_pci driver is bound to a device.

For what purpose?  vfio doesn't have a sysfs interface, why start one? 
Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmclock doesn't work, help?

2015-12-17 Thread Marcelo Tosatti
On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti  wrote:
> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski  
> >> wrote:
> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini  
> >> > wrote:
> >> >>
> >> >>
> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
> >> >>> > RAW TSC NTP corrected TSC
> >> >>> > t0  10  10
> >> >>> > t1  20  19.99
> >> >>> > t2  30  29.98
> >> >>> > t3  40  39.97
> >> >>> > t4  50  49.96

(1)

> >> >>> >
> >> >>> > ...
> >> >>> >
> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
> >> >>> > you can see what will happen.
> >> >>>
> >> >>> Sure, but why would you ever switch from one to the other?
> >> >>
> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
> >> >> resume, the TSC certainly increases at the same rate as before, but the
> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
> >> >> than the guest kvmclock.
> >> >
> >> > Wait, are we talking about the host's NTP or the guest's NTP?
> >> >
> >> > If it's the host's, then wouldn't systemtime be reset after resume to
> >> > the NTP corrected value?  If so, the guest wouldn't see time go
> >> > backwards.
> >> >
> >> > If it's the guest's, then the guest's NTP correction is applied on top
> >> > of kvmclock, and this shouldn't matter.
> >> >
> >> > I still feel like I'm missing something very basic here.
> >> >
> >>
> >> OK, I think I get it.
> >>
> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
> >> correction to the guest.  If it did, indeed, propagate the correction
> >> then, after resume, the host's new system_time would match the guest's
> >> idea of it (after accounting for the guest's long nap), and I don't
> >> think there would be a problem.
> >> That being said, I can't find the code in the masterclock stuff that
> >> would actually do this.
> >
> > Guest clock is maintained by guest timekeeping code, which does:
> >
> > timer_interrupt()
> > offset = read clocksource since last timer interrupt
> > accumulate_to_systemclock(offset)
> >
> > The frequency correction of NTP in the host can be applied to
> > kvmclock, which will be visible to the guest
> > at "read clocksource since last timer interrupt"
> > (kvmclock_clocksource_read function).
> 
> pvclock_clocksource_read?  That seems to do the same thing as all the
> other clocksource access functions.
> 
> >
> > This does not mean that the NTP correction in the host is propagated
> > to the guests system clock directly.
> >
> > (For example, the guest can run NTP which is free to do further
> > adjustments at "accumulate_to_systemclock(offset)" time).
> 
> Of course.  But I expected that, in the absence of NTP on the guest,
> that the guest would track the host's *corrected* time.
> 
> >
> >> If, on the other hand, the host's NTP correction is not supposed to
> >> propagate to the guest,
> >
> > This is optional. There is a module option to control this, in fact.
> >
> > Its nice to have, because then you can execute a guest without NTP
> > (say without network connection), and have a kvmclock (kvmclock is a
> > clocksource, not a guest system clock) which is NTP corrected.
> 
> Can you point to how this works?  I found kvm_guest_time_update, whch
> is called under circumstances that I haven't untangled.  I can't
> really tell what it's trying to do.

Documentation/virtual/kvm/timekeeping.txt.

> In any case, this still seems much more convoluted than it has to be.
> In the case in which the host has a stable TSC (tsc is selected in the
> core timekeeping code, VCLOCK_TSC is set, etc), which is basically all
> the time on the last few generations of CPUs, then the core
> timekeeping code is already exposing a linear function that's supposed
> to be used for monotonic, cpu-local access to a corrected nanosecond
> counter.  It's even in pretty much exactly the right form to pass
> through to the guest via pvclock in the gtod data.  Why doesn't KVM
> pass it through verbatim, updated in real time?  Is there some legacy
> reason that KVM must apply its own corrections and has to jump through
> hoops to pause vcpus when updating those vcpu's copies of the pvclock
> data?

Read the comment on x86.c which starts with
" *
 * Assuming a stable TSC across physical CPUS, and a stable TSC
 * across virtual CPUs, the following condition is possible.
 * Each numbered line represents an event visible to both
 * CPUs at the next numbered event.
"

> >> then shouldn't KVM just update system_time on
> >> resume to whatever the guest would think it had (which I think would
> >> be equivalent to the host's CLOCK_MONOTONIC_RAW value, 

Re: kvmclock doesn't work, help?

2015-12-17 Thread Andy Lutomirski
On Thu, Dec 17, 2015 at 11:08 AM, Marcelo Tosatti  wrote:
> On Thu, Dec 17, 2015 at 08:33:17AM -0800, Andy Lutomirski wrote:
>> On Wed, Dec 16, 2015 at 1:57 PM, Marcelo Tosatti  wrote:
>> > On Wed, Dec 16, 2015 at 10:17:16AM -0800, Andy Lutomirski wrote:
>> >> On Wed, Dec 16, 2015 at 9:48 AM, Andy Lutomirski  
>> >> wrote:
>> >> > On Tue, Dec 15, 2015 at 12:42 AM, Paolo Bonzini  
>> >> > wrote:
>> >> >>
>> >> >>
>> >> >> On 14/12/2015 23:31, Andy Lutomirski wrote:
>> >> >>> > RAW TSC NTP corrected TSC
>> >> >>> > t0  10  10
>> >> >>> > t1  20  19.99
>> >> >>> > t2  30  29.98
>> >> >>> > t3  40  39.97
>> >> >>> > t4  50  49.96
>
> (1)
>
>> >> >>> >
>> >> >>> > ...
>> >> >>> >
>> >> >>> > if you suddenly switch from RAW TSC to NTP corrected TSC,
>> >> >>> > you can see what will happen.
>> >> >>>
>> >> >>> Sure, but why would you ever switch from one to the other?
>> >> >>
>> >> >> The guest uses the raw TSC and systemtime = 0 until suspend.  After
>> >> >> resume, the TSC certainly increases at the same rate as before, but the
>> >> >> raw TSC restarted counting from 0 and systemtime has increased slower
>> >> >> than the guest kvmclock.
>> >> >
>> >> > Wait, are we talking about the host's NTP or the guest's NTP?
>> >> >
>> >> > If it's the host's, then wouldn't systemtime be reset after resume to
>> >> > the NTP corrected value?  If so, the guest wouldn't see time go
>> >> > backwards.
>> >> >
>> >> > If it's the guest's, then the guest's NTP correction is applied on top
>> >> > of kvmclock, and this shouldn't matter.
>> >> >
>> >> > I still feel like I'm missing something very basic here.
>> >> >
>> >>
>> >> OK, I think I get it.
>> >>
>> >> Marcelo, I thought that kvmclock was supposed to propagate the host's
>> >> correction to the guest.  If it did, indeed, propagate the correction
>> >> then, after resume, the host's new system_time would match the guest's
>> >> idea of it (after accounting for the guest's long nap), and I don't
>> >> think there would be a problem.
>> >> That being said, I can't find the code in the masterclock stuff that
>> >> would actually do this.
>> >
>> > Guest clock is maintained by guest timekeeping code, which does:
>> >
>> > timer_interrupt()
>> > offset = read clocksource since last timer interrupt
>> > accumulate_to_systemclock(offset)
>> >
>> > The frequency correction of NTP in the host can be applied to
>> > kvmclock, which will be visible to the guest
>> > at "read clocksource since last timer interrupt"
>> > (kvmclock_clocksource_read function).
>>
>> pvclock_clocksource_read?  That seems to do the same thing as all the
>> other clocksource access functions.
>>
>> >
>> > This does not mean that the NTP correction in the host is propagated
>> > to the guests system clock directly.
>> >
>> > (For example, the guest can run NTP which is free to do further
>> > adjustments at "accumulate_to_systemclock(offset)" time).
>>
>> Of course.  But I expected that, in the absence of NTP on the guest,
>> that the guest would track the host's *corrected* time.
>>
>> >
>> >> If, on the other hand, the host's NTP correction is not supposed to
>> >> propagate to the guest,
>> >
>> > This is optional. There is a module option to control this, in fact.
>> >
>> > Its nice to have, because then you can execute a guest without NTP
>> > (say without network connection), and have a kvmclock (kvmclock is a
>> > clocksource, not a guest system clock) which is NTP corrected.
>>
>> Can you point to how this works?  I found kvm_guest_time_update, whch
>> is called under circumstances that I haven't untangled.  I can't
>> really tell what it's trying to do.
>
> Documentation/virtual/kvm/timekeeping.txt.
>

That document is really long.  I skimmed it and found nothing.

>> In any case, this still seems much more convoluted than it has to be.
>> In the case in which the host has a stable TSC (tsc is selected in the
>> core timekeeping code, VCLOCK_TSC is set, etc), which is basically all
>> the time on the last few generations of CPUs, then the core
>> timekeeping code is already exposing a linear function that's supposed
>> to be used for monotonic, cpu-local access to a corrected nanosecond
>> counter.  It's even in pretty much exactly the right form to pass
>> through to the guest via pvclock in the gtod data.  Why doesn't KVM
>> pass it through verbatim, updated in real time?  Is there some legacy
>> reason that KVM must apply its own corrections and has to jump through
>> hoops to pause vcpus when updating those vcpu's copies of the pvclock
>> data?
>
> Read the comment on x86.c which starts with
> " *
>  * Assuming a stable TSC across physical CPUS, and a stable TSC
>  * across virtual CPUs, the following condition is possible.
>  * Each numbered line 

[PATCH kernel] vfio: Add explicit alignments in vfio_iommu_spapr_tce_create

2015-12-17 Thread Alexey Kardashevskiy
The vfio_iommu_spapr_tce_create struct has 4x32bit and 2x64bit fields
which should have resulted in sizeof(fio_iommu_spapr_tce_create) equal
to 32 bytes. However due to the gcc's default alignment, the actual
size of this struct is 40 bytes.

This fills gaps with __resv1/2 fields.

This should not cause any change in behavior.

Signed-off-by: Alexey Kardashevskiy 
---
 include/uapi/linux/vfio.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9fd7b5d..d117233 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -568,8 +568,10 @@ struct vfio_iommu_spapr_tce_create {
__u32 flags;
/* in */
__u32 page_shift;
+   __u32 __resv1;
__u64 window_size;
__u32 levels;
+   __u32 __resv2;
/* out */
__u64 start_addr;
 };
-- 
2.5.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/3] VFIO: capability chains

2015-12-17 Thread Alexey Kardashevskiy

On 12/18/2015 01:38 PM, Alex Williamson wrote:

On Fri, 2015-12-18 at 13:05 +1100, Alexey Kardashevskiy wrote:

On 11/24/2015 07:43 AM, Alex Williamson wrote:

Please see the commit log and comments in patch 1 for a general
explanation of the problems that this series tries to address.  The
general problem is that we have several cases where we want to
expose
variable sized information to the user, whether it's sparse mmaps
for
a region, as implemented here, or DMA mapping ranges of an IOMMU,
or
reserved MSI mapping ranges, etc.  Extending data structures is
hard;
extending them to report variable sized data is really hard.  After
considering several options, I think the best approach is to copy
how
PCI does capabilities.  This allows the ioctl to only expose the
capabilities that are relevant for them, avoids data structures
that
are too complicated to parse, and avoids creating a new ioctl each
time we think of something else that we'd like to report.  This
method
also doesn't preclude extensions to the fixed structure since the
offset of these capabilities is entirely dynamic.

Comments welcome, I'll also follow-up to the QEMU and KVM lists
with
an RFC making use of this for mmaps skipping over the MSI-X table.
Thanks,


Out of curiosity - could this information be exposed to the userspace
via
/sys/bus/pci/devices/:xx:xx:x/vfio_? It seems not to change
after
vfio_pci driver is bound to a device.


For what purpose?  vfio doesn't have a sysfs interface, why start one?
Thanks,


well, it could simplify debugging a bit if this information was available 
from the userspace without programming a test tool doing some ioctl()'s. 
Not a big deal though...




--
Alexey
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html