Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt

2015-10-06 Thread Alex Williamson
On Tue, 2015-10-06 at 08:32 +, Bhushan Bharat wrote:
> 
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Tuesday, October 06, 2015 4:15 AM
> > To: Bhushan Bharat-R65777 
> > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> > marc.zyng...@arm.com; will.dea...@arm.com
> > Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> > interrupt
> > 
> > On Mon, 2015-10-05 at 07:20 +, Bhushan Bharat wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Saturday, October 03, 2015 4:17 AM
> > > > To: Bhushan Bharat-R65777 
> > > > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > > christoffer.d...@linaro.org; eric.au...@linaro.org;
> > > > pranavku...@linaro.org; marc.zyng...@arm.com; will.dea...@arm.com
> > > > Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> > > > interrupt
> > > >
> > > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > > An MSI-address is allocated and programmed in pcie device during
> > > > > interrupt configuration. Now for a pass-through device, try to
> > > > > create the iommu mapping for this allocted/programmed msi-address.
> > > > > If the iommu mapping is created and the msi address programmed in
> > > > > the pcie device is different from msi-iova as per iommu
> > > > > programming then reconfigure the pci device to use msi-iova as msi
> > address.
> > > > >
> > > > > Signed-off-by: Bharat Bhushan 
> > > > > ---
> > > > >  drivers/vfio/pci/vfio_pci_intrs.c | 36
> > > > > ++--
> > > > >  1 file changed, 34 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > b/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > index 1f577b4..c9690af 100644
> > > > > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > > > > @@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct
> > > > vfio_pci_device *vdev,
> > > > >   int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> > > > >   char *name = msix ? "vfio-msix" : "vfio-msi";
> > > > >   struct eventfd_ctx *trigger;
> > > > > + struct msi_msg msg;
> > > > > + struct vfio_device *device;
> > > > > + uint64_t msi_addr, msi_iova;
> > > > >   int ret;
> > > > >
> > > > >   if (vector >= vdev->num_ctx)
> > > > >   return -EINVAL;
> > > > >
> > > > > + device = vfio_device_get_from_dev(>dev);
> > > >
> > > > Have you looked at this function?  I don't think we want to be doing
> > > > that every time we want to poke the interrupt configuration.
> > >
> > > I am trying to describe what I understood, a device can have many
> > interrupts and we should setup iommu only once, when called for the first
> > time to enable/setup interrupt.
> > > Similarly when disabling the interrupt we should iommu-unmap when
> > > called for the last enabled interrupt for that device. Now with this
> > > understanding, should I move this map-unmap to separate functions and
> > > call them from vfio_msi_set_block() rather than in
> > > vfio_msi_set_vector_signal()
> > 
> > Interrupts can be setup and torn down at any time and I don't see how one
> > function or the other makes much difference.
> > vfio_device_get_from_dev() is enough overhead that the data we need
> > should be cached if we're going to call it with some regularity.  Maybe
> > vfio_iommu_driver_ops.open() should be called with a pointer to the
> > vfio_device... or the vfio_group.
> 
> vfio_iommu_driver_ops.open() ? or do you mean vfio_pci_open() should be 
> called with vfio_device or vfio_group, and we will cache that in 
> vfio_pci_device ?

vfio_pci_open() is an implementation of vfio_iommu_driver_ops.open().
The internal API between vfio and vfio bus drivers would need to have a
parameter added.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: arm/arm64: Fix memory leak if timer initialization fails

2015-10-06 Thread Wei Huang


On 10/06/2015 03:14 AM, Pavel Fedin wrote:
> Jump to correct label and free kvm_host_cpu_state
> 
> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm/kvm/arm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index dc017ad..78b2869 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -1080,7 +1080,7 @@ static int init_hyp_mode(void)
>*/
>   err = kvm_timer_hyp_init();
>   if (err)
> - goto out_free_mappings;
> + goto out_free_context;
>  
>  #ifndef CONFIG_HOTPLUG_CPU
>   free_boot_hyp_pgd();
> 


kvm_host_cpu_state was allocated before kvm_timer_hyp_init() is called.
So it needs to be freed when kvm_timer_hyp_init() fails.

Reviewed-by: Wei Huang 

Thanks,
-Wei
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages

2015-10-06 Thread Alex Williamson
On Tue, 2015-10-06 at 09:05 +, Bhushan Bharat wrote:
> 
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Tuesday, October 06, 2015 4:15 AM
> > To: Bhushan Bharat-R65777 
> > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> > marc.zyng...@arm.com; will.dea...@arm.com
> > Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI
> > pages
> > 
> > On Mon, 2015-10-05 at 06:27 +, Bhushan Bharat wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Saturday, October 03, 2015 4:16 AM
> > > > To: Bhushan Bharat-R65777 
> > > > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > > christoffer.d...@linaro.org; eric.au...@linaro.org;
> > > > pranavku...@linaro.org; marc.zyng...@arm.com; will.dea...@arm.com
> > > > Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap
> > > > MSI pages
> > > >
> > > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > > For MSI interrupts to work for a pass-through devices we need to
> > > > > have mapping of msi-pages in iommu. Now on some platforms (like
> > > > > x86) does this msi-pages mapping happens magically and in these
> > > > > case they chooses an iova which they somehow know that it will
> > > > > never overlap with guest memory. But this magic iova selection may
> > > > > not be always true for all platform (like PowerPC and ARM64).
> > > > >
> > > > > Also on x86 platform, there is no problem as long as running a
> > > > > x86-guest on x86-host but there can be issues when running a
> > > > > non-x86 guest on
> > > > > x86 host or other userspace applications like (I think ODP/DPDK).
> > > > > As in these cases there can be chances that it overlaps with guest
> > > > > memory mapping.
> > > >
> > > > Wow, it's amazing anything works... smoke and mirrors.
> > > >
> > > > > This patch add interface to iommu-map and iommu-unmap msi-pages
> > at
> > > > > reserved iova chosen by userspace.
> > > > >
> > > > > Signed-off-by: Bharat Bhushan 
> > > > > ---
> > > > >  drivers/vfio/vfio.c |  52 +++
> > > > >  drivers/vfio/vfio_iommu_type1.c | 111
> > > > 
> > > > >  include/linux/vfio.h|   9 +++-
> > > > >  3 files changed, 171 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > > > > 2fb29df..a817d2d 100644
> > > > > --- a/drivers/vfio/vfio.c
> > > > > +++ b/drivers/vfio/vfio.c
> > > > > @@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct
> > > > notifier_block *nb,
> > > > >   return NOTIFY_OK;
> > > > >  }
> > > > >
> > > > > +int vfio_device_map_msi(struct vfio_device *device, uint64_t
> > msi_addr,
> > > > > + uint32_t size, uint64_t *msi_iova) {
> > > > > + struct vfio_container *container = device->group->container;
> > > > > + struct vfio_iommu_driver *driver;
> > > > > + int ret;
> > > > > +
> > > > > + /* Validate address and size */
> > > > > + if (!msi_addr || !size || !msi_iova)
> > > > > + return -EINVAL;
> > > > > +
> > > > > + down_read(>group_lock);
> > > > > +
> > > > > + driver = container->iommu_driver;
> > > > > + if (!driver || !driver->ops || !driver->ops->msi_map) {
> > > > > + up_read(>group_lock);
> > > > > + return -EINVAL;
> > > > > + }
> > > > > +
> > > > > + ret = driver->ops->msi_map(container->iommu_data,
> > > > > +msi_addr, size, msi_iova);
> > > > > +
> > > > > + up_read(>group_lock);
> > > > > + return ret;
> > > > > +}
> > > > > +
> > > > > +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t
> > > > msi_iova,
> > > > > +   uint64_t size)
> > > > > +{
> > > > > + struct vfio_container *container = device->group->container;
> > > > > + struct vfio_iommu_driver *driver;
> > > > > + int ret;
> > > > > +
> > > > > + /* Validate address and size */
> > > > > + if (!msi_iova || !size)
> > > > > + return -EINVAL;
> > > > > +
> > > > > + down_read(>group_lock);
> > > > > +
> > > > > + driver = container->iommu_driver;
> > > > > + if (!driver || !driver->ops || !driver->ops->msi_unmap) {
> > > > > + up_read(>group_lock);
> > > > > + return -EINVAL;
> > > > > + }
> > > > > +
> > > > > + ret = driver->ops->msi_unmap(container->iommu_data,
> > > > > +  msi_iova, size);
> > > > > +
> > > > > + up_read(>group_lock);
> > > > > + return ret;
> > > > > +}
> > > > > +
> > > > >  /**
> > > > >   * VFIO driver API
> > > > >   */
> > > > > diff --git 

Re: [kvmtool] Add a link to the lwn.net article

2015-10-06 Thread Will Deacon
Hi Sven,

On Sat, Oct 03, 2015 at 10:42:55PM +1000, Sven Dowideit wrote:
> Signed-off-by: Sven Dowideit 
> ---
>  README | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/README b/README
> index 6667f23..5501f05 100644
> --- a/README
> +++ b/README
> @@ -97,6 +97,9 @@ project:
>  
>  http://thread.gmane.org/gmane.linux.kernel/962051/focus=962620
>  
> +Another detailed example can be found in the lwn.net article:
> +
> +http://lwn.net/Articles/658511/

Currently, this link requires an lwn subscription but I'm happy to take
this patch once it becomes freely available later this week.

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCHv3 1/3] arm: Add PMU test

2015-10-06 Thread Andrew Jones
On Tue, Oct 06, 2015 at 01:49:24PM -0400, Christopher Covington wrote:
> Beginning with a simple sanity check of the control register, add
> a unit test for the ARM Performance Monitors Unit (PMU).
> 
> Signed-off-by: Christopher Covington 
> ---
>  arm/pmu.c   | 66 
> +
>  arm/unittests.cfg   |  5 
>  config/config-arm64.mak |  4 ++-
>  3 files changed, 74 insertions(+), 1 deletion(-)
>  create mode 100644 arm/pmu.c
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> new file mode 100644
> index 000..91a3688
> --- /dev/null
> +++ b/arm/pmu.c
> @@ -0,0 +1,66 @@
> +/*
> + * Test the ARM Performance Monitors Unit (PMU).
> + *
> + * Copyright 2015 The Linux Foundation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU Lesser General Public License version 2.1 and
> + * only version 2.1 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but 
> WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public 
> License
> + * for more details.
> + */
> +#include "libcflat.h"
> +
> +struct pmu_data {
> + union {
> + uint32_t pmcr_el0;
> + struct {
> + uint32_t enable:1;
> + uint32_t event_counter_reset:1;
> + uint32_t cycle_counter_reset:1;
> + uint32_t cycle_counter_clock_divider:1;
> + uint32_t event_counter_export:1;
> + uint32_t cycle_counter_disable_when_prohibited:1;
> + uint32_t cycle_counter_long:1;
> + uint32_t zeros:4;

resv might be a better name than zeros. Actually, does the spec even
state these bits will always be read as zeros?

> + uint32_t num_counters:5;
> + uint32_t identification_code:8;
> + uint32_t implementer:8;
> + };
> + };
> +};
> +
> +/* As a simple sanity check on the PMCR_EL0, ensure the implementer field 
> isn't
> + * null. Also print out a couple other interesting fields for diagnostic
> + * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
> + * event counters and therefore reports zero of them, but hopefully support 
> for
^for
> + * at least the instructions event will be added in the future and the 
> reported
> + * number of event counters will become nonzero.
> + */
> +static bool check_pmcr(void)
> +{
> + struct pmu_data pmcr;

Thanks for making the union and bitfield anonymous. I suggested
naming the struct pmu_data because I'm expecting more stuff to
eventually get shoved in there. If not, then maybe it needs to
be renamed something more pmcr-ish. Or, if it might expand its
purpose as I expect, then this variable should be renamed, i.e.

  struct pmu_data pmu;

> +
> + asm volatile("mrs %0, pmcr_el0" : "=r" (pmcr));

And this should be

  asm volatile("mrs %0, pmcr_el0" : "=r" (pmu.pmcr_el0));

> +
> + printf("PMU implementer: %c\n", pmcr.implementer);
> + printf("Identification code: 0x%x\n", pmcr.identification_code);
> + printf("Event counters:  %d\n", pmcr.num_counters);
> +
> + if (pmcr.implementer)
> + return true;
> +
> + return false;

How about

  return pmcr.implementer != 0;

> +}
> +
> +int main(void)
> +{
> + report_prefix_push("pmu");
> +
> + report("Control register", check_pmcr());
> +
> + return report_summary();
> +}
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index e068a0c..fd94adb 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -35,3 +35,8 @@ file = selftest.flat
>  smp = `getconf _NPROCESSORS_CONF`
>  extra_params = -append 'smp'
>  groups = selftest
> +
> +# Test PMU support without -icount
> +[pmu]
> +file = pmu.flat
> +groups = pmu
> diff --git a/config/config-arm64.mak b/config/config-arm64.mak
> index d61b703..140b611 100644
> --- a/config/config-arm64.mak
> +++ b/config/config-arm64.mak
> @@ -12,9 +12,11 @@ cflatobjs += lib/arm64/processor.o
>  cflatobjs += lib/arm64/spinlock.o
>  
>  # arm64 specific tests
> -tests =
> +tests = $(TEST_DIR)/pmu.flat
>  
>  include config/config-arm-common.mak
>  
>  arch_clean: arm_clean
>   $(RM) lib/arm64/.*.d
> +
> +$(TEST_DIR)/pmu.elf: $(cstart.o) $(TEST_DIR)/pmu.o

We can have a PMU on 32-bit ARM processors too. If this test file
needs to be specific to 64-bit, then we should probably name it
less generically, like pmu64?

Thanks,
drew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-06 Thread Bandan Das
Joerg Roedel  writes:

> On Thu, Oct 01, 2015 at 06:31:27PM -0400, Bandan Das wrote:
>> >> @@ -514,7 +514,7 @@ static void skip_emulated_instruction(struct kvm_vcpu 
>> >> *vcpu)
>> >>   struct vcpu_svm *svm = to_svm(vcpu);
>> >>  
>> >>   if (svm->vmcb->control.next_rip != 0) {
>> >> - WARN_ON(!static_cpu_has(X86_FEATURE_NRIPS));
>> >> + WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
>> >>   svm->next_rip = svm->vmcb->control.next_rip;
>> >>   }
>
> I looked again how this could possibly be triggered, and I am somewhat
> confused now.
>
> So svm->vmcb->control.next_rip is only written by hardware or in
> svm_check_intercept(). Both cases write only to this field, if the
> hardware supports X86_FEATURE_NRIPS. The write in nested_svm_vmexit only

Not until commit f104765b4f81fd74d69e0eb161e89096deade2db. So, an older L1
kernel will trigger it.

> targets the guests VMCB, and we don't use that one again.
>
> So I can't see how the WARN_ON above could be triggered. Do I miss
> something or might this also be a miscompilation of static_cpu_has?
>
>
>   Joerg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-06 Thread Bandan Das
Joerg Roedel  writes:

> On Mon, Oct 05, 2015 at 01:42:43PM -0400, Bandan Das wrote:
>> Joerg Roedel  writes:
>> 
>> > On Mon, Oct 05, 2015 at 12:54:43PM -0400, Bandan Das wrote:
>> >> Joerg Roedel  writes:
>> >> The problems is that the next_rip field could be stale. If the processor 
>> >> supports
>> >> next_rip, then it will clear it out on the next entry. If it doesn't,
>> >> an old value just sits there (no matter who wrote it) and the problem
>> >> happens when skip_emulated_instruction advances the rip with an incorrect
>> >> value.
>> >
>> > So the right fix would be to just set the guests next_rip to 0 when we
>> > emulate vmrun, just like real hardware does, no?
>> 
>> Agreed, resetting to 0 if nrips isn't supported seems right. It would still
>> help having a printk_once in this case IMO :)
>
> I meant to reset it always to 0 on vmrun, like real hardware does.

Atleast the spec don't mention this, I don't know how I got that idea :) The 
spec
just say that it gets written to by hardware on certain intercepts and for 
others
it gets reset to 0 on #VMEXIT.

>
>
>   Joerg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [kvm-unit-tests PATCHv3 2/3] arm: pmu: Check cycle count increases

2015-10-06 Thread Andrew Jones
On Tue, Oct 06, 2015 at 01:49:25PM -0400, Christopher Covington wrote:
> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> even for the smallest delta of two subsequent reads.
> 
> Signed-off-by: Christopher Covington 
> ---
>  arm/pmu.c | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 91a3688..589e605 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -33,6 +33,8 @@ struct pmu_data {
>   };
>  };
>  
> +static const int samples = 10;

#define NR_SAMPLES 10

> +
>  /* As a simple sanity check on the PMCR_EL0, ensure the implementer field 
> isn't
>   * null. Also print out a couple other interesting fields for diagnostic
>   * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
> @@ -56,11 +58,38 @@ static bool check_pmcr(void)
>   return false;
>  }
>  
> +/* Ensure that the cycle counter progresses between back-to-back reads.
> + */

style nit: your block quotes don't have opening wing (the preferred
kernel style - and, fwiw, my preference too...)

> +static bool check_cycles_increase(void)
> +{
> + struct pmu_data pmcr;
> +
> + pmcr.enable = 1;
> + asm volatile("msr pmcr_el0, %0" : : "r" (pmcr));
> +
> + for (int i = 0; i < samples; i++) {
> + int a, b;
> +
> + asm volatile(
> + "mrs %[a], pmccntr_el0\n"
> + "mrs %[b], pmccntr_el0\n"
> + : [a] "=r" (a), [b] "=r" (b));
> +
> + if (a >= b) {
> + printf("Read %d then %d.\n", a, b);
> + return false;
> + }
> + }
> +
> + return true;
> +}
> +
>  int main(void)
>  {
>   report_prefix_push("pmu");
>  
>   report("Control register", check_pmcr());
> + report("Monotonically increasing cycle count", check_cycles_increase());
>  
>   return report_summary();
>  }
> -- 
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCHv3 3/3] arm: pmu: Add CPI checking

2015-10-06 Thread Andrew Jones
On Tue, Oct 06, 2015 at 01:49:26PM -0400, Christopher Covington wrote:
> Check the numbers of cycles per instruction (CPI) implied by ARM PMU
> cycle counter values. Check that in -icount mode these strictly
> match the specified rate.
> 
> Signed-off-by: Christopher Covington 
> ---
>  arm/pmu.c | 72 
> ++-
>  arm/unittests.cfg | 13 ++
>  2 files changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 589e605..0ad113d 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -84,12 +84,82 @@ static bool check_cycles_increase(void)
>   return true;
>  }
>  
> -int main(void)
> +/* Execute a known number of guest instructions. Only odd instruction counts
> + * greater than or equal to 3 are supported by the in-line assembly code. The
> + * control register (PMCR_EL0) is initialized with the provided value 
> (allowing
> + * for example for the cycle counter or event counters to be reset). At the 
> end
> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> + * counting, allowing the cycle counter or event counters to be read at the
> + * leisure of the calling code.
> + */
> +static void measure_instrs(int num, struct pmu_data pmcr)
> +{
> + int i = (num - 1) / 2;
> +
> + if (num < 3 || ((num - 1) % 2))
> + abort();

assert(num >= 3 && ((num - 1) % 2) == 0);

> +
> + asm volatile(
> + "msr pmcr_el0, %[pmcr]\n"
^\t
> + "1: subs %[i], %[i], #1\n"
   ^\t  ^\t
> + "b.gt 1b\n"
 ^\t
> + "msr pmcr_el0, xzr"
^\t
> + : [i] "+r" (i) : [pmcr] "r" (pmcr) : "cc");
> +}
> +
> +/* Measure cycle counts for various known instruction counts. Ensure that the
> + * cycle counter progresses (similar to check_cycles_increase() but with more
> + * instructions and using reset and stop controls). If supplied a positive,
> + * nonzero CPI parameter, also strictly check that every measurement matches
> + * it. Strict CPI checking is used to test -icount mode.
> + */
> +static bool check_cpi(int cpi)
> +{
> + struct pmu_data pmcr;
> +
> + pmcr.cycle_counter_reset = 1;
> + pmcr.enable = 1;
> +
> + if (cpi > 0)
> + printf("Checking for CPI=%d.\n", cpi);
> + printf("instrs : cycles0 cycles1 ...\n");
> +
> + for (int i = 3; i < 300; i += 32) {
> + int avg, sum = 0;
> +
> + printf("%d :", i);
> + for (int j = 0; j < samples; j++) {
> + int cycles;
> +
> + measure_instrs(i, pmcr);
> + asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> + printf(" %d", cycles);
> +
> + if (!cycles || (cpi > 0 && cycles != i * cpi)) {
> + printf("\n");
> + return false;
> + }
> +
> + sum += cycles;
> + }
> + avg = sum / samples;
> + printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
> + sum, avg, i / avg, avg / i);
> + }
> +
> + return true;
> +}
> +
> +int main(int argc, char *argv[])
>  {
>   report_prefix_push("pmu");
>  
>   report("Control register", check_pmcr());
>   report("Monotonically increasing cycle count", check_cycles_increase());
>  
> + int cpi = (argc == 1 ? atol(argv[0]) : 0);

I prefer variable declarations at the top of the function, and

  int cpi = 0;

  if (argc > 1)
cpi = atol(argv[0]);

looks a bit better to me.


> +
> + report("Cycle/instruction ratio", check_cpi(cpi));
> +
>   return report_summary();
>  }
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index fd94adb..333ee0d 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -39,4 +39,17 @@ groups = selftest
>  # Test PMU support without -icount
>  [pmu]
>  file = pmu.flat
> +extra_params = -append '-1'

Why do we need this cpu == -1? Can't it just be zero?

> +groups = pmu
> +
> +# Test PMU support with -icount IPC=1
> +[pmu-icount-1]
> +file = pmu.flat
> +extra_params = -icount 0 -append '1'
> +groups = pmu
> +
> +# Test PMU support with -icount IPC=256
> +[pmu-icount-256]
> +file = pmu.flat
> +extra_params = -icount 8 -append '256'
>  groups = pmu

-icount is a tcg specific parameter. I have a patch[*] in my staging
branch which allows you to specify 'accel = tcg' in unittests.cfg for
this type of test. You'll need to use that for anything with -icount
on the extra_params list.

Thanks,
drew

[*] 
https://github.com/rhdrjones/kvm-unit-tests/commit/85e084cf263e76484f7d82cbc9add4e7602f80a4
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] KVM: x86: fix edge EOI and IOAPIC reconfig race

2015-10-06 Thread Radim Krčmář
2015-08-15 02:00+0200, Paolo Bonzini:
>On 14/08/2015 10:38, Radim Krčmář wrote:
>>> How do you reproduce the bug?
>> I run rhel4 (2.6.9) kernel on 2 VCPUs and frequently alternate
>> smp_affinity of "timer".  The bug is hit within seconds.
> 
> Nice, I'll try to make a unit test for it on the plane. :)

(Time on planes is best spent by sleeping ;])

What do you think about the series?

I made a prototype kvm-unit-test ...
(Reproduces with APICv or split irqchip builds.)
---8<---
x86: test IO-APIC dest_id modification before

Regression test for kvm commit $TODO.
Run with '-smp 2'.

I had problems with handle_irq so this version uses asm workaround;
will fix that in final version.  Initialization also presumes that
everything will work as it did on my machines :)
---
 x86/ioapic.c | 47 +--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/x86/ioapic.c b/x86/ioapic.c
index 1fe1ccc9eadd..55f8eea03496 100644
--- a/x86/ioapic.c
+++ b/x86/ioapic.c
@@ -4,21 +4,27 @@
 #include "smp.h"
 #include "desc.h"
 #include "isr.h"
+#include "io.h"
 
 #define EDGE_TRIGGERED 0
 #define LEVEL_TRIGGERED 1
 
-static void set_ioapic_redir(unsigned line, unsigned vec, unsigned trig_mode)
+static void __set_ioapic_redir(unsigned line, u8 vec, bool trig_mode, u8 
dest_id)
 {
ioapic_redir_entry_t e = {
.vector = vec,
-   .delivery_mode = 0,
.trig_mode = trig_mode,
+   .dest_id = dest_id,
};
 
ioapic_write_redir(line, e);
 }
 
+static void set_ioapic_redir(unsigned line, unsigned vec, unsigned trig_mode)
+{
+   __set_ioapic_redir(line, vec, trig_mode, 0);
+}
+
 static void set_irq_line(unsigned line, int val)
 {
asm volatile("out %0, %1" : : "a"((u8)val), "d"((u16)(0x2000 + line)));
@@ -298,6 +304,41 @@ static void test_ioapic_level_retrigger_mask(void)
set_mask(0x0e, false);
 }
 
+static volatile int pit_working = -1;
+
+static void __attribute__((used)) pit_isr_fn(void)
+{
+   if (!pit_working++);
+   __set_ioapic_redir(0x2, 0x84, EDGE_TRIGGERED, 1);
+   eoi();
+}
+
+void pit_isr(void);
+asm (
+ "pit_isr: \n"
+ "   call pit_isr_fn \n"
+#ifndef __x86_64__
+ "   iret"
+#else
+ "   iretq"
+#endif
+ );
+
+static void test_ioapic_eoi_bug(void)
+{
+   if (cpu_count() < 2)
+   return;
+
+   set_idt_entry(0x84, pit_isr, 0);
+   __set_ioapic_redir(0x2, 0x84, EDGE_TRIGGERED, 0);
+
+   outb(0x34, 0x43);
+
+   for (unsigned loops = 1; loops && pit_working < 1; loops--)
+   asm volatile ("nop");
+
+   report("dest_id reconfiguration before EOI", pit_working >= 1);
+}
 
 int main(void)
 {
@@ -325,5 +366,7 @@ int main(void)
test_ioapic_level_mask();
test_ioapic_level_retrigger_mask();
 
+   test_ioapic_eoi_bug();
+
return report_summary();
 }
-- 
2.5.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state

2015-10-06 Thread Bhushan Bharat


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, October 06, 2015 4:15 AM
> To: Bhushan Bharat-R65777 
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> marc.zyng...@arm.com; will.dea...@arm.com
> Subject: Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs
> automap state
> 
> On Mon, 2015-10-05 at 06:00 +, Bhushan Bharat wrote:
> > > -1138,6 +1156,8 @@
> > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > }
> > > > } else if (cmd == VFIO_IOMMU_GET_INFO) {
> > > > struct vfio_iommu_type1_info info;
> > > > +   struct iommu_domain_msi_maps msi_maps;
> > > > +   int ret;
> > > >
> > > > minsz = offsetofend(struct vfio_iommu_type1_info,
> > > iova_pgsizes);
> > > >
> > > > @@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void
> > > > *iommu_data,
> > > >
> > > > info.flags = 0;
> > > >
> > > > +   ret = vfio_domains_get_msi_maps(iommu, _maps);
> > > > +   if (ret)
> > > > +   return ret;
> > >
> > > And now ioctl(VFIO_IOMMU_GET_INFO) no longer works for any
> IOMMU
> > > implementing domain_get_attr but not supporting
> > > DOMAIN_ATTR_MSI_MAPPING.
> >
> > With this current patch version this will get the default assumed behavior
> as you commented on previous patch.
> 
> How so?

You are right, the ioctl will return failure. But that should be ok, right?

> 
> +   msi_maps->automap = true;
> +   msi_maps->override_automap = false;
> +
> +   if (domain->ops->domain_get_attr)
> +   ret = domain->ops->domain_get_attr(domain, attr,
> + data);
> 
> If domain_get_attr is implemented, but DOMAIN_ATTR_MSI_MAPPING is
> not, ret should be an error code.

Currently it returns same error code returned by 
domain->ops->domain_get_attr(). 
I do not think we want to complicate that we return an error to user-space that 
msi's probably cannot be used but user-space can continue with Legacy 
interrupt, or you want that?

Thanks
-Bharat



RE: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages

2015-10-06 Thread Bhushan Bharat


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, October 06, 2015 4:15 AM
> To: Bhushan Bharat-R65777 
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> marc.zyng...@arm.com; will.dea...@arm.com
> Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI
> pages
> 
> On Mon, 2015-10-05 at 06:27 +, Bhushan Bharat wrote:
> >
> >
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Saturday, October 03, 2015 4:16 AM
> > > To: Bhushan Bharat-R65777 
> > > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > christoffer.d...@linaro.org; eric.au...@linaro.org;
> > > pranavku...@linaro.org; marc.zyng...@arm.com; will.dea...@arm.com
> > > Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap
> > > MSI pages
> > >
> > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > For MSI interrupts to work for a pass-through devices we need to
> > > > have mapping of msi-pages in iommu. Now on some platforms (like
> > > > x86) does this msi-pages mapping happens magically and in these
> > > > case they chooses an iova which they somehow know that it will
> > > > never overlap with guest memory. But this magic iova selection may
> > > > not be always true for all platform (like PowerPC and ARM64).
> > > >
> > > > Also on x86 platform, there is no problem as long as running a
> > > > x86-guest on x86-host but there can be issues when running a
> > > > non-x86 guest on
> > > > x86 host or other userspace applications like (I think ODP/DPDK).
> > > > As in these cases there can be chances that it overlaps with guest
> > > > memory mapping.
> > >
> > > Wow, it's amazing anything works... smoke and mirrors.
> > >
> > > > This patch add interface to iommu-map and iommu-unmap msi-pages
> at
> > > > reserved iova chosen by userspace.
> > > >
> > > > Signed-off-by: Bharat Bhushan 
> > > > ---
> > > >  drivers/vfio/vfio.c |  52 +++
> > > >  drivers/vfio/vfio_iommu_type1.c | 111
> > > 
> > > >  include/linux/vfio.h|   9 +++-
> > > >  3 files changed, 171 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > > > 2fb29df..a817d2d 100644
> > > > --- a/drivers/vfio/vfio.c
> > > > +++ b/drivers/vfio/vfio.c
> > > > @@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct
> > > notifier_block *nb,
> > > > return NOTIFY_OK;
> > > >  }
> > > >
> > > > +int vfio_device_map_msi(struct vfio_device *device, uint64_t
> msi_addr,
> > > > +   uint32_t size, uint64_t *msi_iova) {
> > > > +   struct vfio_container *container = device->group->container;
> > > > +   struct vfio_iommu_driver *driver;
> > > > +   int ret;
> > > > +
> > > > +   /* Validate address and size */
> > > > +   if (!msi_addr || !size || !msi_iova)
> > > > +   return -EINVAL;
> > > > +
> > > > +   down_read(>group_lock);
> > > > +
> > > > +   driver = container->iommu_driver;
> > > > +   if (!driver || !driver->ops || !driver->ops->msi_map) {
> > > > +   up_read(>group_lock);
> > > > +   return -EINVAL;
> > > > +   }
> > > > +
> > > > +   ret = driver->ops->msi_map(container->iommu_data,
> > > > +  msi_addr, size, msi_iova);
> > > > +
> > > > +   up_read(>group_lock);
> > > > +   return ret;
> > > > +}
> > > > +
> > > > +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t
> > > msi_iova,
> > > > + uint64_t size)
> > > > +{
> > > > +   struct vfio_container *container = device->group->container;
> > > > +   struct vfio_iommu_driver *driver;
> > > > +   int ret;
> > > > +
> > > > +   /* Validate address and size */
> > > > +   if (!msi_iova || !size)
> > > > +   return -EINVAL;
> > > > +
> > > > +   down_read(>group_lock);
> > > > +
> > > > +   driver = container->iommu_driver;
> > > > +   if (!driver || !driver->ops || !driver->ops->msi_unmap) {
> > > > +   up_read(>group_lock);
> > > > +   return -EINVAL;
> > > > +   }
> > > > +
> > > > +   ret = driver->ops->msi_unmap(container->iommu_data,
> > > > +msi_iova, size);
> > > > +
> > > > +   up_read(>group_lock);
> > > > +   return ret;
> > > > +}
> > > > +
> > > >  /**
> > > >   * VFIO driver API
> > > >   */
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > b/drivers/vfio/vfio_iommu_type1.c index 3315fb6..ab376c2 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -1003,12 +1003,34 @@ 

[PATCH] KVM: arm/arm64: Fix memory leak if timer initialization fails

2015-10-06 Thread Pavel Fedin
Jump to correct label and free kvm_host_cpu_state

Signed-off-by: Pavel Fedin 
---
 arch/arm/kvm/arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dc017ad..78b2869 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -1080,7 +1080,7 @@ static int init_hyp_mode(void)
 */
err = kvm_timer_hyp_init();
if (err)
-   goto out_free_mappings;
+   goto out_free_context;
 
 #ifndef CONFIG_HOTPLUG_CPU
free_boot_hyp_pgd();
-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


PING: [PATCH v4 0/7] KVM: arm64: Implement API for vGICv3 live migration

2015-10-06 Thread Pavel Fedin
 Hello! Sorry if you're terribly busy, but... small gentle PING...

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

> -Original Message-
> From: kvmarm-boun...@lists.cs.columbia.edu 
> [mailto:kvmarm-boun...@lists.cs.columbia.edu] On
> Behalf Of Pavel Fedin
> Sent: Monday, September 28, 2015 6:27 PM
> To: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org
> Cc: Marc Zyngier; Andre Przywara
> Subject: [PATCH v4 0/7] KVM: arm64: Implement API for vGICv3 live migration
> 
> This patchset adds necessary userspace API in order to support vGICv3 live
> migration. GICv3 registers are accessed using device attribute ioctls,
> similar to GICv2.
> 
> v3 => v4:
> - Split pure refactoring from anything else
> - Documentation brought up to date
> - Cleaned up 'mmio' structure usage in vgic_attr_regs_access(),
>   use call_range_handler() for 64-bit access handling
> - Rebased on new linux-next
> 
> v2 => v3:
> - KVM_DEV_ARM_VGIC_CPUID_MASK enlarged to 20 bits, allowing more than 256
>   CPUs.
> - Bug fix: Correctly set mmio->private, necessary for redistributor access.
> - Added accessors for ICC_AP0R and ICC_AP1R registers
> - Rebased on new linux-next
> 
> v1 => v2:
> - Do not use generic register get/set API for CPU interface, use only
>   device attributes.
> - Introduce size specifier for distributor and redistributor register
>   accesses, do not assume size any more.
> - Lots of refactor and reusable code extraction.
> - Added forgotten documentation
> 
> Pavel Fedin (7):
>   KVM: arm/arm64: Move endianness conversion out of
> vgic_attr_regs_access()
>   KVM: arm/arm64: Refactor vGIC attributes handling code
>   KVM: arm/arm64: Fix the documentation
>   KVM: arm64: Implement vGICv3 distributor and redistributor access from
> userspace
>   KVM: arm64: Refactor system register handlers
>   KVM: arm64: Introduce find_reg_by_id()
>   Implement vGICv3 CPU interface access
> 
>  Documentation/virtual/kvm/devices/arm-vgic.txt |  90 ++-
>  arch/arm64/include/uapi/asm/kvm.h  |  11 +-
>  arch/arm64/kvm/sys_regs.c  |  83 +++---
>  arch/arm64/kvm/sys_regs.h  |   8 +-
>  arch/arm64/kvm/sys_regs_generic_v8.c   |   2 +-
>  include/linux/irqchip/arm-gic-v3.h |  18 +-
>  virt/kvm/arm/vgic-v2-emul.c| 122 ++---
>  virt/kvm/arm/vgic-v3-emul.c| 338 
> -
>  virt/kvm/arm/vgic.c|  65 +
>  virt/kvm/arm/vgic.h|   4 +
>  10 files changed, 571 insertions(+), 170 deletions(-)
> 
> --
> 2.4.4
> 
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt

2015-10-06 Thread Bhushan Bharat


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, October 06, 2015 4:15 AM
> To: Bhushan Bharat-R65777 
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> marc.zyng...@arm.com; will.dea...@arm.com
> Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> interrupt
> 
> On Mon, 2015-10-05 at 07:20 +, Bhushan Bharat wrote:
> >
> >
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Saturday, October 03, 2015 4:17 AM
> > > To: Bhushan Bharat-R65777 
> > > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > christoffer.d...@linaro.org; eric.au...@linaro.org;
> > > pranavku...@linaro.org; marc.zyng...@arm.com; will.dea...@arm.com
> > > Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> > > interrupt
> > >
> > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > An MSI-address is allocated and programmed in pcie device during
> > > > interrupt configuration. Now for a pass-through device, try to
> > > > create the iommu mapping for this allocted/programmed msi-address.
> > > > If the iommu mapping is created and the msi address programmed in
> > > > the pcie device is different from msi-iova as per iommu
> > > > programming then reconfigure the pci device to use msi-iova as msi
> address.
> > > >
> > > > Signed-off-by: Bharat Bhushan 
> > > > ---
> > > >  drivers/vfio/pci/vfio_pci_intrs.c | 36
> > > > ++--
> > > >  1 file changed, 34 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c
> > > > b/drivers/vfio/pci/vfio_pci_intrs.c
> > > > index 1f577b4..c9690af 100644
> > > > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > > > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > > > @@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct
> > > vfio_pci_device *vdev,
> > > > int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> > > > char *name = msix ? "vfio-msix" : "vfio-msi";
> > > > struct eventfd_ctx *trigger;
> > > > +   struct msi_msg msg;
> > > > +   struct vfio_device *device;
> > > > +   uint64_t msi_addr, msi_iova;
> > > > int ret;
> > > >
> > > > if (vector >= vdev->num_ctx)
> > > > return -EINVAL;
> > > >
> > > > +   device = vfio_device_get_from_dev(>dev);
> > >
> > > Have you looked at this function?  I don't think we want to be doing
> > > that every time we want to poke the interrupt configuration.
> >
> > I am trying to describe what I understood, a device can have many
> interrupts and we should setup iommu only once, when called for the first
> time to enable/setup interrupt.
> > Similarly when disabling the interrupt we should iommu-unmap when
> > called for the last enabled interrupt for that device. Now with this
> > understanding, should I move this map-unmap to separate functions and
> > call them from vfio_msi_set_block() rather than in
> > vfio_msi_set_vector_signal()
> 
> Interrupts can be setup and torn down at any time and I don't see how one
> function or the other makes much difference.
> vfio_device_get_from_dev() is enough overhead that the data we need
> should be cached if we're going to call it with some regularity.  Maybe
> vfio_iommu_driver_ops.open() should be called with a pointer to the
> vfio_device... or the vfio_group.

vfio_iommu_driver_ops.open() ? or do you mean vfio_pci_open() should be called 
with vfio_device or vfio_group, and we will cache that in vfio_pci_device ?

> 
> > >  Also note that
> > > IOMMU mappings don't operate on devices, but groups, so maybe we
> > > want to pass the group.
> >
> > Yes, it operates on group. I hesitated to add an API to get group. Do you
> suggest to that it is ok to add API to get group from device.
> 
> No, the above suggestion is probably better.
> 
> > >
> > > > +   if (device == NULL)
> > > > +   return -EINVAL;
> > >
> > > This would be a legitimate BUG_ON(!device)
> > >
> > > > +
> > > > if (vdev->ctx[vector].trigger) {
> > > > free_irq(irq, vdev->ctx[vector].trigger);
> > > > +   get_cached_msi_msg(irq, );
> > > > +   msi_iova = ((u64)msg.address_hi << 32) | msg.address_lo;
> > > > +   vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
> > > > kfree(vdev->ctx[vector].name);
> > > > eventfd_ctx_put(vdev->ctx[vector].trigger);
> > > > vdev->ctx[vector].trigger = NULL; @@ -346,12 +356,11 @@
> static
> > > > int vfio_msi_set_vector_signal(struct
> > > vfio_pci_device *vdev,
> > > >  * cached value of the message prior to enabling.
> > > >  */
> > > > if (msix) {
> > > > - 

[kvm-unit-tests PATCHv3 3/3] arm: pmu: Add CPI checking

2015-10-06 Thread Christopher Covington
Check the numbers of cycles per instruction (CPI) implied by ARM PMU
cycle counter values. Check that in -icount mode these strictly
match the specified rate.

Signed-off-by: Christopher Covington 
---
 arm/pmu.c | 72 ++-
 arm/unittests.cfg | 13 ++
 2 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index 589e605..0ad113d 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -84,12 +84,82 @@ static bool check_cycles_increase(void)
return true;
 }
 
-int main(void)
+/* Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, struct pmu_data pmcr)
+{
+   int i = (num - 1) / 2;
+
+   if (num < 3 || ((num - 1) % 2))
+   abort();
+
+   asm volatile(
+   "msr pmcr_el0, %[pmcr]\n"
+   "1: subs %[i], %[i], #1\n"
+   "b.gt 1b\n"
+   "msr pmcr_el0, xzr"
+   : [i] "+r" (i) : [pmcr] "r" (pmcr) : "cc");
+}
+
+/* Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+   struct pmu_data pmcr;
+
+   pmcr.cycle_counter_reset = 1;
+   pmcr.enable = 1;
+
+   if (cpi > 0)
+   printf("Checking for CPI=%d.\n", cpi);
+   printf("instrs : cycles0 cycles1 ...\n");
+
+   for (int i = 3; i < 300; i += 32) {
+   int avg, sum = 0;
+
+   printf("%d :", i);
+   for (int j = 0; j < samples; j++) {
+   int cycles;
+
+   measure_instrs(i, pmcr);
+   asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
+   printf(" %d", cycles);
+
+   if (!cycles || (cpi > 0 && cycles != i * cpi)) {
+   printf("\n");
+   return false;
+   }
+
+   sum += cycles;
+   }
+   avg = sum / samples;
+   printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
+   sum, avg, i / avg, avg / i);
+   }
+
+   return true;
+}
+
+int main(int argc, char *argv[])
 {
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
report("Monotonically increasing cycle count", check_cycles_increase());
 
+   int cpi = (argc == 1 ? atol(argv[0]) : 0);
+
+   report("Cycle/instruction ratio", check_cpi(cpi));
+
return report_summary();
 }
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index fd94adb..333ee0d 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -39,4 +39,17 @@ groups = selftest
 # Test PMU support without -icount
 [pmu]
 file = pmu.flat
+extra_params = -append '-1'
+groups = pmu
+
+# Test PMU support with -icount IPC=1
+[pmu-icount-1]
+file = pmu.flat
+extra_params = -icount 0 -append '1'
+groups = pmu
+
+# Test PMU support with -icount IPC=256
+[pmu-icount-256]
+file = pmu.flat
+extra_params = -icount 8 -append '256'
 groups = pmu
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[kvm-unit-tests PATCHv3 1/3] arm: Add PMU test

2015-10-06 Thread Christopher Covington
Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU).

Signed-off-by: Christopher Covington 
---
 arm/pmu.c   | 66 +
 arm/unittests.cfg   |  5 
 config/config-arm64.mak |  4 ++-
 3 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 000..91a3688
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,66 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright 2015 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+
+struct pmu_data {
+   union {
+   uint32_t pmcr_el0;
+   struct {
+   uint32_t enable:1;
+   uint32_t event_counter_reset:1;
+   uint32_t cycle_counter_reset:1;
+   uint32_t cycle_counter_clock_divider:1;
+   uint32_t event_counter_export:1;
+   uint32_t cycle_counter_disable_when_prohibited:1;
+   uint32_t cycle_counter_long:1;
+   uint32_t zeros:4;
+   uint32_t num_counters:5;
+   uint32_t identification_code:8;
+   uint32_t implementer:8;
+   };
+   };
+};
+
+/* As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
+ * null. Also print out a couple other interesting fields for diagnostic
+ * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero of them, but hopefully support for
+ * at least the instructions event will be added in the future and the reported
+ * number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+   struct pmu_data pmcr;
+
+   asm volatile("mrs %0, pmcr_el0" : "=r" (pmcr));
+
+   printf("PMU implementer: %c\n", pmcr.implementer);
+   printf("Identification code: 0x%x\n", pmcr.identification_code);
+   printf("Event counters:  %d\n", pmcr.num_counters);
+
+   if (pmcr.implementer)
+   return true;
+
+   return false;
+}
+
+int main(void)
+{
+   report_prefix_push("pmu");
+
+   report("Control register", check_pmcr());
+
+   return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index e068a0c..fd94adb 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -35,3 +35,8 @@ file = selftest.flat
 smp = `getconf _NPROCESSORS_CONF`
 extra_params = -append 'smp'
 groups = selftest
+
+# Test PMU support without -icount
+[pmu]
+file = pmu.flat
+groups = pmu
diff --git a/config/config-arm64.mak b/config/config-arm64.mak
index d61b703..140b611 100644
--- a/config/config-arm64.mak
+++ b/config/config-arm64.mak
@@ -12,9 +12,11 @@ cflatobjs += lib/arm64/processor.o
 cflatobjs += lib/arm64/spinlock.o
 
 # arm64 specific tests
-tests =
+tests = $(TEST_DIR)/pmu.flat
 
 include config/config-arm-common.mak
 
 arch_clean: arm_clean
$(RM) lib/arm64/.*.d
+
+$(TEST_DIR)/pmu.elf: $(cstart.o) $(TEST_DIR)/pmu.o
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[kvm-unit-tests PATCHv3 2/3] arm: pmu: Check cycle count increases

2015-10-06 Thread Christopher Covington
Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
even for the smallest delta of two subsequent reads.

Signed-off-by: Christopher Covington 
---
 arm/pmu.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 91a3688..589e605 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -33,6 +33,8 @@ struct pmu_data {
};
 };
 
+static const int samples = 10;
+
 /* As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
  * null. Also print out a couple other interesting fields for diagnostic
  * purposes. For example, as of fall 2015, QEMU TCG mode doesn't implement
@@ -56,11 +58,38 @@ static bool check_pmcr(void)
return false;
 }
 
+/* Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+   struct pmu_data pmcr;
+
+   pmcr.enable = 1;
+   asm volatile("msr pmcr_el0, %0" : : "r" (pmcr));
+
+   for (int i = 0; i < samples; i++) {
+   int a, b;
+
+   asm volatile(
+   "mrs %[a], pmccntr_el0\n"
+   "mrs %[b], pmccntr_el0\n"
+   : [a] "=r" (a), [b] "=r" (b));
+
+   if (a >= b) {
+   printf("Read %d then %d.\n", a, b);
+   return false;
+   }
+   }
+
+   return true;
+}
+
 int main(void)
 {
report_prefix_push("pmu");
 
report("Control register", check_pmcr());
+   report("Monotonically increasing cycle count", check_cycles_increase());
 
return report_summary();
 }
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[kvm-unit-tests PATCHv3] ARM PMU tests

2015-10-06 Thread Christopher Covington
Changes from v2:

* Explicit test for monotonically increasing cycle count
* Tests now pass or fail
* Tests broken into functions
* Tests/functions broken into separate patches in series
* Style improvements as suggested by Wei Huang and Linux checkpatch.pl
* Spelling and comment improvements

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH v4] os-android: Add support to android platform

2015-10-06 Thread Houcheng Lin
Hi,

There are 7 sources still call basename() directly and block/vvfat.c
define its own static basename() function. Please see the grep below:

➜  qemu git:(patch-v4) ✗ grep  "basename(" **/*.c  | grep -v get_basename
fsdev/virtfs-proxy-helper.c:basename(prog));
hw/vfio/pci.c:group_name = basename(iommu_group_path);
hw/vfio/platform.c:group_name = basename(iommu_group_path);
linux-user/elfload.c:base_filename = strdup(basename(filename));
qemu-io.c:progname = basename(argv[0]);
qemu-nbd.c:snprintf(sockpath, 128, SOCKET_PATH, basename(device));
qga/commands-posix.c:driver = g_strdup(basename(buf));
qga/commands-posix.c:fs->name = g_strdup(basename(syspath));

➜  qemu git:(patch-v4) ✗ grep get_basename **/*.c
block/vvfat.c:static const char* get_basename(const char* path)
block/vvfat.c: basename2 = get_basename(path);
block/vvfat.c:basename = get_basename(mapping->path);
block/vvfat.c: const char* basename = get_basename(mapping->path);
block/vvfat.c: const char* basename = get_basename(mapping->path);
block/vvfat.c: const char* basename2 = get_basename(path);
block/vvfat.c: - strlen(get_basename(commit->path)) - 1;
linux-user/elfload.c:base_filename =
g_path_get_basename(ts->bprm->filename);

Directly change all of them to g_path_get_basename ?

2015-10-06 21:22 GMT+08:00 Paolo Bonzini :
>
>
> On 06/10/2015 14:13, Eric Blake wrote:
>>
>> If you are already writing code to be ported to both Unixy and
>> windows systems, you are better off rolling your own alternative to
>> basename (or better, using something else that has already rolled a
>> portable version for you - while I know gnulib does that, we aren't
>> using gnulib; but I assume glib has something along those lines
>> even though I haven't looked for it).
>
> Yes, there is g_path_get_basename (and g_path_get_dirname).  Added to
> http://wiki.qemu.org/BiteSizedTasks#API_conversion.
>
> Paolo



-- 
Best regards,
Houcheng Lin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region

2015-10-06 Thread Bhushan Bharat


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, October 06, 2015 4:15 AM
> To: Bhushan Bharat-R65777 
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> marc.zyng...@arm.com; will.dea...@arm.com
> Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova
> region
> 
> On Mon, 2015-10-05 at 04:55 +, Bhushan Bharat wrote:
> > Hi Alex,
> >
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Saturday, October 03, 2015 4:16 AM
> > > To: Bhushan Bharat-R65777 
> > > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > christoffer.d...@linaro.org; eric.au...@linaro.org;
> > > pranavku...@linaro.org; marc.zyng...@arm.com; will.dea...@arm.com
> > > Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del
> > > reserved iova region
> > >
> > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > This Patch adds the VFIO APIs to add and remove reserved iova regions.
> > > > The reserved iova region can be used for mapping some specific
> > > > physical address in iommu.
> > > >
> > > > Currently we are planning to use this interface for adding iova
> > > > regions for creating iommu of msi-pages. But the API are designed
> > > > for future extension where some other physical address can be
> mapped.
> > > >
> > > > Signed-off-by: Bharat Bhushan 
> > > > ---
> > > >  drivers/vfio/vfio_iommu_type1.c | 201
> > > +++-
> > > >  include/uapi/linux/vfio.h   |  43 +
> > > >  2 files changed, 243 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > b/drivers/vfio/vfio_iommu_type1.c index 57d8c37..fa5d3e4 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -59,6 +59,7 @@ struct vfio_iommu {
> > > > struct rb_root  dma_list;
> > > > boolv2;
> > > > boolnesting;
> > > > +   struct list_headreserved_iova_list;
> > >
> > > This alignment leads to poor packing in the structure, put it above the
> bools.
> >
> > ok
> >
> > >
> > > >  };
> > > >
> > > >  struct vfio_domain {
> > > > @@ -77,6 +78,15 @@ struct vfio_dma {
> > > > int prot;   /* IOMMU_READ/WRITE */
> > > >  };
> > > >
> > > > +struct vfio_resvd_region {
> > > > +   dma_addr_t  iova;
> > > > +   size_t  size;
> > > > +   int prot;   /* IOMMU_READ/WRITE */
> > > > +   int refcount;   /* ref count of 
> > > > mappings */
> > > > +   uint64_tmap_paddr;  /* Mapped Physical 
> > > > Address
> > > */
> > >
> > > phys_addr_t
> >
> > Ok,
> >
> > >
> > > > +   struct list_head next;
> > > > +};
> > > > +
> > > >  struct vfio_group {
> > > > struct iommu_group  *iommu_group;
> > > > struct list_headnext;
> > > > @@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct
> > > vfio_iommu *iommu,
> > > > return NULL;
> > > >  }
> > > >
> > > > +/* This function must be called with iommu->lock held */ static
> > > > +bool vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
> > > > +  dma_addr_t start, size_t 
> > > > size) {
> > > > +   struct vfio_resvd_region *region;
> > > > +
> > > > +   list_for_each_entry(region, >reserved_iova_list, next) {
> > > > +   if (region->iova < start)
> > > > +   return (start - region->iova < region->size);
> > > > +   else if (start < region->iova)
> > > > +   return (region->iova - start < size);
> > >
> > > <= on both of the return lines?
> >
> > I think is should be "<" and not "=<", no ?
> 
> Yep, looks like you're right.  Maybe there's a more straightforward way to do
> this.
> 
> > >
> > > > +
> > > > +   return (region->size > 0 && size > 0);
> > > > +   }
> > > > +
> > > > +   return false;
> > > > +}
> > > > +
> > > > +/* This function must be called with iommu->lock held */ static
> > > > +struct vfio_resvd_region *vfio_find_resvd_region(struct
> > > > +vfio_iommu
> > > *iommu,
> > > > +dma_addr_t start, 
> > > > size_t
> > > size) {
> > > > +   struct vfio_resvd_region *region;
> > > > +
> > > > +   list_for_each_entry(region, >reserved_iova_list, next)
> > > > +   if (region->iova == start && region->size == size)
> > > > +   return region;
> > > > +
> > > > +   return NULL;
> > > > +}
> > > > +
> > > >  static void vfio_link_dma(struct vfio_iommu *iommu, struct
> > 

Re: [Qemu-devel] [RFC PATCH v4] os-android: Add support to android platform

2015-10-06 Thread Stefan Hajnoczi
On Sat, Oct 03, 2015 at 12:44:14PM +0800, Houcheng Lin wrote:
> diff --git a/configure b/configure
> index d7c24cd..cda88c1 100755
> --- a/configure
> +++ b/configure
> @@ -567,7 +567,6 @@ fi
>  
>  # host *BSD for user mode
>  HOST_VARIANT_DIR=""
> -
>  case $targetos in
>  CYGWIN*)
>mingw32="yes"

Spurious whitespace change

> diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
> index b1beaa6..44beee3 100644
> --- a/hw/i386/kvm/pci-assign.c
> +++ b/hw/i386/kvm/pci-assign.c
> @@ -22,7 +22,6 @@
>   */
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 

What is the justification for this?  Do you know why io.h was included
before?

> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index ab3c876..9e26d10 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -74,6 +74,14 @@ typedef unsigned intuint_fast16_t;
>  typedef signed int  int_fast16_t;
>  #endif
>  
> +#ifdef CONFIG_ANDROID
> +/*
> + * For include the basename prototyping in android.
> + */
> +#include 

Files that use basename(3) should include libgen.h.  Why include it
here?

> +#define IOV_MAX 1024

Are you sure that Android NDK headers do not contain this constant?

> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 3ae4987..4ae746b 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -62,6 +62,8 @@ extern int daemon(int, int);
>  #include 
>  #include 
>  #include 
> +#include 

Why did you include time.h?

> +#include 
>  
>  #ifdef CONFIG_LINUX
>  #include 
> @@ -482,3 +484,17 @@ int qemu_read_password(char *buf, int buf_size)
>  printf("\n");
>  return ret;
>  }
> +
> +int qemu_getdtablesize(void)
> +{
> +#ifdef CONFIG_ANDROID
> +struct rlimit r;
> +
> +if (getrlimit(RLIMIT_NOFILE, ) < 0) {
> +return sysconf(_SC_OPEN_MAX);
> +}
> +return r.rlim_cur;
> +#else
> +return getdtablesize();
> +#endif
> +}

We can probably drop the getdtablesize() call completely and use the
CONFIG_ANDROID code on all platforms.  I suggest splitting this out into
a separate patch that introduces qemu_getdtablesize() and converts all
callers.

> diff --git a/util/qemu-openpty.c b/util/qemu-openpty.c
> index 4c53211..b305886 100644
> --- a/util/qemu-openpty.c
> +++ b/util/qemu-openpty.c
> @@ -51,12 +51,17 @@
>  # include 
>  #endif
>  
> -#ifdef __sun__
> +#if defined(__sun__) || defined(CONFIG_ANDROID)
> +
>  /* Once Solaris has openpty(), this is going to be removed. */
>  static int openpty(int *amaster, int *aslave, char *name,
> struct termios *termp, struct winsize *winp)
>  {
> +#if defined(CONFIG_ANDROID)
> +char slave[PATH_MAX];
> +#else
>  const char *slave;
> +#endif
>  int mfd = -1, sfd = -1;
>  
>  *amaster = *aslave = -1;
> @@ -67,17 +72,22 @@ static int openpty(int *amaster, int *aslave, char *name,
>  
>  if (grantpt(mfd) == -1 || unlockpt(mfd) == -1)
>  goto err;
> -
> +#if defined(CONFIG_ANDROID)
> +if (ptsname_r(mfd, slave, PATH_MAX) < 0)
> +goto err;
> +#else
>  if ((slave = ptsname(mfd)) == NULL)
>  goto err;
> +#endif

ptsname_r(3) should be used on all Linux hosts because it is reentrant.
This improvement isn't Android-specific, please split it into a separate
patch.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvmtool] Skip a few messages by default: command line args; flat binary; earlyprintk.

2015-10-06 Thread Dimitri John Ledkov
Hello,

A bit of context. I'm from Clear Linux* Project for Intel Architecture
and we use lkvm as the hypervisor in Clear Containers for Docker
Engine http://blog.surgut.co.uk/2015/09/clear-containers-for-docker-engine.html

For us, we really do not want any output coming from the hypervisor,
or kernel, or init. We use systemd, and the only unit that has TTY
connected for output is the ultimate docker workload user has
requested. Thus I am avert --vey-quiet mode for lkvm. I understand
that this is outside of the usual intended use-case for kvmtool (i.e.
kernel development), but it's really lean and nice to work with.

On 30 September 2015 at 17:11, Andre Przywara  wrote:
> Hi Dimitri,
>
> thanks for sharing your patches.
>
> On 29/09/15 17:59, Dimitri John Ledkov wrote:
>> The partial command line args & earlyprintk=serial are still enabled
>> in the debug mode. Warning that a flat binary kernel image is attemped
>> to be loaded is completely dropped. These are not that informative,
>> once one is past intial debugging, and only polute the console.
>>
>> Signed-off-by: Dimitri John Ledkov 
>> ---
>>  builtin-run.c | 10 ++
>>  kvm.c |  1 -
>>  x86/kvm.c |  8 ++--
>>  3 files changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/builtin-run.c b/builtin-run.c
>> index e0c8732..8edbf88 100644
>> --- a/builtin-run.c
>> +++ b/builtin-run.c
>> @@ -613,10 +613,12 @@ static struct kvm *kvm_cmd_run_init(int argc, const 
>> char **argv)
>>
>>   kvm->cfg.real_cmdline = real_cmdline;
>>
>> - printf("  # %s run -k %s -m %Lu -c %d --name %s\n", KVM_BINARY_NAME,
>> - kvm->cfg.kernel_filename,
>> - (unsigned long long)kvm->cfg.ram_size / 1024 / 1024,
>> - kvm->cfg.nrcpus, kvm->cfg.guest_name);
>> + if (do_debug_print) {
>> + printf("  # %s run -k %s -m %Lu -c %d --name %s\n", 
>> KVM_BINARY_NAME,
>> +kvm->cfg.kernel_filename,
>> +(unsigned long long)kvm->cfg.ram_size / 1024 / 1024,
>> +kvm->cfg.nrcpus, kvm->cfg.guest_name);
>> + }
>
> I like the general idea. In fact I have this very patch (among others)
> in my tree too. I applied similar guarding to other messages as well
> (mostly those that only show up on ARM, but also the "ended normally"
> message). Like any good UNIX tool kvmtool should keep quiet if it has
> nothing worthwhile to say ;-)
> But looking at it more closely, I see that there is pr_debug() defined
> doing that "if (do_debug_print)" already. The only issue is that is
> prints source line information, which is not really useful here. But
> then again there does not seem to be any user of it?
>

I'd be happy to change these to pr_debug() messages.


> So what about the following:
> - We avoid printing pr_info() messages in the default case. Looking at
> its current users in the tree this information is not really useful for
> normal users. We enable pr_info() output only if do_debug_print is
> enabled or introduce another command line flag (--verbose?) for that.
> - We check each user of pr_info() to see whether this information is
> actually "informational" or whether it should be converted to pr_warn.
> - We change the above line to use pr_info instead of printf.

Sounds good to me. That should work as well.

> - We fix the EOL mayhem we have atm while at it.
>

Not quite sure what you mean by `EOL mayhem' could you please elaborate?

> If you don't mind I will give this a try later this week.
>

Thumbs up!


>>
>>   if (init_list__init(kvm) < 0)
>>   die ("Initialisation failed");
>> diff --git a/kvm.c b/kvm.c
>> index 10ed230..1081072 100644
>> --- a/kvm.c
>> +++ b/kvm.c
>> @@ -378,7 +378,6 @@ bool kvm__load_kernel(struct kvm *kvm, const char 
>> *kernel_filename,
>>   if (ret)
>>   goto found_kernel;
>>
>> - pr_warning("%s is not a bzImage. Trying to load it as a flat 
>> binary...", kernel_filename);
>
> I think on x86 this message is useful to have: to point people to the
> fact that they are trying to load a kernel which most probably isn't one.
> Do you actually load a "flat binary", so not a Linux bzImage? If yes,
> what is it? Does this work for you? I didn't have the impression that
> this code was actually used at all.

We do use flat kernel loading, and we have extra patches for that. But
I haven't reconciled that with current upstream tree yet.

> If you do use it, could you please give my kernel loading series [1] a
> try? I touch flat binary loading there, but had no chance to test it.
>

These look very interesting. I will definitely look into them.

>>  #endif
>>
>>   ret = load_elf_binary(kvm, fd_kernel, fd_initrd, kernel_cmdline);
>> diff --git a/x86/kvm.c b/x86/kvm.c
>> index 512ad67..4a5fa41 100644
>> --- a/x86/kvm.c
>> +++ b/x86/kvm.c
>> @@ -124,8 +124,12 @@ void kvm__arch_set_cmdline(char *cmdline, bool video)
>>  

Re: VM exit profiling

2015-10-06 Thread Prateek Sharma
On Tue, Oct 6, 2015 at 6:50 PM, David Matlack  wrote:
> Have you tried perf kvm stat? e.g.
>
> perf kvm stat record -a sleep 10 # record all vmexits for 10 seconds
> perf kvm stat report --event=vmexit
>
> This gives per-exit counts and min/max/avg latencies.
>
> Alternatively you can record the raw events kvm:kvm_exit and kvm:kvm_entry and
> process the data however you want.
>
> On Tue, Oct 6, 2015 at 3:42 PM, Prateek Sharma  wrote:
>> Hello all,
>>   There used to be perf scripts to do exit-level profiling of VMs
>> (number of exits, time spent per exit, etc). I am using kernel version
>> 3.19, and the perf utility which ships with it has a perf-kvm option, but
>> that only reports total number of exits, and not the reason/latency.
>>   The stats in  /sys/debug/kvm seem to be about number of exits only,
>> not the time information. My question is: whats the most convenient way to
>> get per-exit counts and durations?
>>
>> Thanks,
>> --Prateek
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hi David,
   Thanks for the answer. It works perfectly.
--Prateek
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


VM exit profiling

2015-10-06 Thread Prateek Sharma
Hello all,
  There used to be perf scripts to do exit-level profiling of VMs
(number of exits, time spent per exit, etc). I am using kernel version
3.19, and the perf utility which ships with it has a perf-kvm option, but
that only reports total number of exits, and not the reason/latency.
  The stats in  /sys/debug/kvm seem to be about number of exits only,
not the time information. My question is: whats the most convenient way to
get per-exit counts and durations?

Thanks,
--Prateek
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: VM exit profiling

2015-10-06 Thread David Matlack
Have you tried perf kvm stat? e.g.

perf kvm stat record -a sleep 10 # record all vmexits for 10 seconds
perf kvm stat report --event=vmexit

This gives per-exit counts and min/max/avg latencies.

Alternatively you can record the raw events kvm:kvm_exit and kvm:kvm_entry and
process the data however you want.

On Tue, Oct 6, 2015 at 3:42 PM, Prateek Sharma  wrote:
> Hello all,
>   There used to be perf scripts to do exit-level profiling of VMs
> (number of exits, time spent per exit, etc). I am using kernel version
> 3.19, and the perf utility which ships with it has a perf-kvm option, but
> that only reports total number of exits, and not the reason/latency.
>   The stats in  /sys/debug/kvm seem to be about number of exits only,
> not the time information. My question is: whats the most convenient way to
> get per-exit counts and durations?
>
> Thanks,
> --Prateek
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iommu/s390: add iommu api for s390 pci devices

2015-10-06 Thread Joerg Roedel
On Thu, Oct 01, 2015 at 07:30:28PM +0200, Gerald Schaefer wrote:
> Yes, the DMA API is already implemented in arch/s390/pci/pci_dma.c.
> I thought about moving it over to the new location in drivers/iommu/,
> but I don't see any benefit from it.

Okay, this is true for now. At some point we hopefully have a common
DMA-API implementation for all IOMMU driver, at which point s390 can
make use of it too and abandon its own implementation.

> Also, the two APIs are quite different on s390 and must not be mixed-up.
> For example, we have optimizations in the DMA API to reduce TLB flushes
> based on iommu bitmap wrap-around, which is not possible for the map/unmap
> logic in the IOMMU API. There is also the requirement that each device has
> its own DMA page table (not shared), which is important for DMA API device
> recovery and map/unmap on s390.

This sounds quite similar to what other IOMMU drivers also implement,
especially the AMD IOMMU driver. It also uses non-shared page-tables for
devices and implements the bitmap-allocator optimization.

> Hmm, not sure how this can replace my own struct. I need the struct to
> maintain a list of all devices that share a dma page table. And the
> devices need to be added and removed to/from that list in attach/detach_dev.
> 
> I also need that list during map/unmap, in order to do a TLB flush for
> all affected devices, and this happens under a spin lock.
> 
> So I guess I cannot use the iommu_group->devices list, which is managed
> in add/remove_device and under a mutex, if that was on your mind.

Yeah, right. Thanks for the explanation.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iommu/s390: add iommu api for s390 pci devices

2015-10-06 Thread Joerg Roedel
On Thu, Aug 27, 2015 at 03:33:03PM +0200, Gerald Schaefer wrote:
> This adds an IOMMU API implementation for s390 PCI devices.
> 
> Reviewed-by: Sebastian Ott 
> Signed-off-by: Gerald Schaefer 
> ---
>  MAINTAINERS |   7 +
>  arch/s390/Kconfig   |   1 +
>  arch/s390/include/asm/pci.h |   4 +
>  arch/s390/include/asm/pci_dma.h |   5 +-
>  arch/s390/pci/pci_dma.c |  37 +++--
>  drivers/iommu/Kconfig   |   7 +
>  drivers/iommu/Makefile  |   1 +
>  drivers/iommu/s390-iommu.c  | 337 
> 
>  8 files changed, 386 insertions(+), 13 deletions(-)
>  create mode 100644 drivers/iommu/s390-iommu.c

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-06 Thread Joerg Roedel
On Mon, Oct 05, 2015 at 01:42:43PM -0400, Bandan Das wrote:
> Joerg Roedel  writes:
> 
> > On Mon, Oct 05, 2015 at 12:54:43PM -0400, Bandan Das wrote:
> >> Joerg Roedel  writes:
> >> The problems is that the next_rip field could be stale. If the processor 
> >> supports
> >> next_rip, then it will clear it out on the next entry. If it doesn't,
> >> an old value just sits there (no matter who wrote it) and the problem
> >> happens when skip_emulated_instruction advances the rip with an incorrect
> >> value.
> >
> > So the right fix would be to just set the guests next_rip to 0 when we
> > emulate vmrun, just like real hardware does, no?
> 
> Agreed, resetting to 0 if nrips isn't supported seems right. It would still
> help having a printk_once in this case IMO :)

I meant to reset it always to 0 on vmrun, like real hardware does.



Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI

2015-10-06 Thread Bhushan Bharat


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Tuesday, October 06, 2015 4:25 AM
> To: Bhushan Bharat-R65777 
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> marc.zyng...@arm.com; will.dea...@arm.com
> Subject: Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for
> MSI
> 
> On Mon, 2015-10-05 at 08:33 +, Bhushan Bharat wrote:
> >
> >
> > > -Original Message-
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Saturday, October 03, 2015 4:17 AM
> > > To: Bhushan Bharat-R65777 
> > > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > christoffer.d...@linaro.org; eric.au...@linaro.org;
> > > pranavku...@linaro.org; marc.zyng...@arm.com; will.dea...@arm.com
> > > Subject: Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping
> > > for MSI
> > >
> > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > Finally ARM SMMU declare that iommu-mapping for MSI-pages are not
> > > > set automatically and it should be set explicitly.
> > > >
> > > > Signed-off-by: Bharat Bhushan 
> > > > ---
> > > >  drivers/iommu/arm-smmu.c | 7 ++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > index
> > > > a3956fb..9d37e72 100644
> > > > --- a/drivers/iommu/arm-smmu.c
> > > > +++ b/drivers/iommu/arm-smmu.c
> > > > @@ -1401,13 +1401,18 @@ static int
> arm_smmu_domain_get_attr(struct
> > > iommu_domain *domain,
> > > > enum iommu_attr attr, void *data)  {
> > > > struct arm_smmu_domain *smmu_domain =
> > > to_smmu_domain(domain);
> > > > +   struct iommu_domain_msi_maps *msi_maps;
> > > >
> > > > switch (attr) {
> > > > case DOMAIN_ATTR_NESTING:
> > > > *(int *)data = (smmu_domain->stage ==
> > > ARM_SMMU_DOMAIN_NESTED);
> > > > return 0;
> > > > case DOMAIN_ATTR_MSI_MAPPING:
> > > > -   /* Dummy handling added */
> > > > +   msi_maps = data;
> > > > +
> > > > +   msi_maps->automap = false;
> > > > +   msi_maps->override_automap = true;
> > > > +
> > > > return 0;
> > > > default:
> > > > return -ENODEV;
> > >
> > > In previous discussions I understood one of the problems you were
> > > trying to solve was having a limited number of MSI banks and while
> > > you may be able to get isolated MSI banks for some number of users,
> > > it wasn't unlimited and sharing may be required.  I don't see any of that
> addressed in this series.
> >
> > That problem was on PowerPC. Infact there were two problems, one which
> MSI bank to be used and second how to create iommu-mapping for device
> assigned to userspace.
> > First problem was PowerPC specific and that will be solved separately.
> > For second problem, earlier I tried to added a couple of MSI specific ioctls
> and you suggested (IIUC) that we should have a generic reserved-iova type
> of API and then we can map MSI bank using reserved-iova and this will not
> require involvement of user-space.
> >
> > >
> > > Also, the management of reserved IOVAs vs MSI addresses looks really
> > > dubious to me.  How does your platform pick an MSI address and what
> > > are we breaking by covertly changing it?  We seem to be masking over
> > > at the VFIO level, where there should be lower level interfaces
> > > doing the right thing when we configure MSI on the device.
> >
> > Yes, In my understanding the right solution should be:
> >  1) VFIO driver should know what physical-msi-address will be used for
> devices in an iommu-group.
> > I did not find an generic API, on PowerPC I added some function in
> ffrescale msi-driver and called from vfio-iommu-fsl-pamu.c (not yet
> upstreamed).
> >  2) VFIO driver should know what IOVA to be used for creating
> > iommu-mapping (VFIO APIs patch of this patch series)
> >  3) VFIO driver will create the iommu-mapping using (1) and (2)
> >  4) VFIO driver should be able to tell the msi-driver that for a given 
> > device it
> should use different IOVA. So when composing the msi message (for the
> devices is the given iommu-group) it should use that programmed iova as
> MSI-address. This interface also needed to be developed.
> >
> > I was not sure of which approach we should take. The current approach in
> the patch is simple to develop so I went ahead to take input but I agree this
> does not look very good.
> > What do you think, should drop this approach and work out the approach
> as described above.
> 
> I'm certainly not interested in applying an maintaining an interim solution 
> that
> isn't the right one.  It seems like VFIO is too involved in this process in 
> your
> 

Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-06 Thread Joerg Roedel
On Thu, Oct 01, 2015 at 06:31:27PM -0400, Bandan Das wrote:
> >> @@ -514,7 +514,7 @@ static void skip_emulated_instruction(struct kvm_vcpu 
> >> *vcpu)
> >>struct vcpu_svm *svm = to_svm(vcpu);
> >>  
> >>if (svm->vmcb->control.next_rip != 0) {
> >> -  WARN_ON(!static_cpu_has(X86_FEATURE_NRIPS));
> >> +  WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
> >>svm->next_rip = svm->vmcb->control.next_rip;
> >>}

I looked again how this could possibly be triggered, and I am somewhat
confused now.

So svm->vmcb->control.next_rip is only written by hardware or in
svm_check_intercept(). Both cases write only to this field, if the
hardware supports X86_FEATURE_NRIPS. The write in nested_svm_vmexit only
targets the guests VMCB, and we don't use that one again.

So I can't see how the WARN_ON above could be triggered. Do I miss
something or might this also be a miscompilation of static_cpu_has?


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/12] KVM: x86: Replace call-back set_tsc_khz() with a common function

2015-10-06 Thread Paolo Bonzini


On 06/10/2015 06:06, Haozhong Zhang wrote:
> Alternatively, it's also possible to follow David's comment to use
> divq on x86_64 to keep both precision and safety. On i386, it just
> falls back to above truncating approach.

khz is just 32 bits, so we can do a 96/32 division.  And because this is
a slow path, we can code a generic u64*u32/u32 function and use it to do
(1 << kvm_tsc_scaling_ratio_frac_bits) * khz / tsc_khz:

diff --git a/include/linux/math64.h b/include/linux/math64.h
index c45c089bfdac..5b70af4fa386 100644
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -142,6 +142,13 @@ static inline u64 mul_u64_u32_shr(u64 a, u32 mul,
unsigned int shift)
 }
 #endif /* mul_u64_u32_shr */

+#ifndef mul_u64_u32_div
+static inline u64 mul_u64_u32_div(u64 x, u32 num, u32 den)
+{
+   return (u64)(((unsigned __int128)a * mul) / den);
+}
+#endif
+
 #else

 #ifndef mul_u64_u32_shr
@@ -161,6 +168,35 @@ static inline u64 mul_u64_u32_shr(u64 a, u32 mul,
unsigned int shift)
 }
 #endif /* mul_u64_u32_shr */

+#ifndef mul_u64_u32_div
+static inline u64 mul_u64_u32_div(u64 a, u32 num, u32 den)
+{
+   union {
+   u64 ll;
+   struct {
+#ifdef __BIG_ENDIAN
+   u32 high, low;
+#else
+   u32 low, high;
+#endif
+   } l;
+   } u, rl, rh;
+
+   u.ll = a;
+   rl.ll = (u64)u.l.low * num;
+   rh.ll = (u64)u.l.high * num + rl.l.high;
+
+   /* Bits 32-63 of the result will be in rh.l.low.  */
+   rl.l.high = do_div(rh.ll, den);
+
+   /* Bits 0-31 of the result will be in rl.l.low.  */
+   do_div(rl.ll, den);
+
+   rl.l.high = rh.l.low;
+   return rl.ll;
+}
+#endif
+
 #endif

 #endif /* _LINUX_MATH64_H */
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH v4] os-android: Add support to android platform

2015-10-06 Thread Paolo Bonzini
Just a couple comments since I reviewed the previous versions...

On 06/10/2015 11:47, Stefan Hajnoczi wrote:
> >  #include 
> > -#include 
> >  #include 
> >  #include 
> >  #include 
> 
> What is the justification for this?  Do you know why io.h was included
> before?

No reason, the same patch is en route through qemu-trivial.

>>
>> -
>> +#if defined(CONFIG_ANDROID)
>> +if (ptsname_r(mfd, slave, PATH_MAX) < 0)
>> +goto err;
>> +#else
>>  if ((slave = ptsname(mfd)) == NULL)
>>  goto err;
>> +#endif
> 
> ptsname_r(3) should be used on all Linux hosts because it is reentrant.
> This improvement isn't Android-specific, please split it into a separate
> patch.

Actually everyone except Solaris and Android is already using openpty.
This is emulation code for those two OSes.  (The gnulib manual mentions
that AIX 5.1, HP-UX 11, IRIX 6.5 also don't have openpty, but we don't
support those I think).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/12] KVM: x86: Replace call-back set_tsc_khz() with a common function

2015-10-06 Thread Haozhong Zhang
On Tue, Oct 06, 2015 at 12:40:49PM +0200, Paolo Bonzini wrote:
> 
> 
> On 06/10/2015 06:06, Haozhong Zhang wrote:
> > Alternatively, it's also possible to follow David's comment to use
> > divq on x86_64 to keep both precision and safety. On i386, it just
> > falls back to above truncating approach.
> 
> khz is just 32 bits, so we can do a 96/32 division.  And because this is
> a slow path, we can code a generic u64*u32/u32 function and use it to do
> (1 << kvm_tsc_scaling_ratio_frac_bits) * khz / tsc_khz:
>

This is much better! Thanks Paolo! I'll use this mul_u64_u32_shr() in
the next version.

> diff --git a/include/linux/math64.h b/include/linux/math64.h
> index c45c089bfdac..5b70af4fa386 100644
> --- a/include/linux/math64.h
> +++ b/include/linux/math64.h
> @@ -142,6 +142,13 @@ static inline u64 mul_u64_u32_shr(u64 a, u32 mul,
> unsigned int shift)
>  }
>  #endif /* mul_u64_u32_shr */
> 
> +#ifndef mul_u64_u32_div
> +static inline u64 mul_u64_u32_div(u64 x, u32 num, u32 den)
> +{
> + return (u64)(((unsigned __int128)a * mul) / den);
> +}
> +#endif
> +
>  #else
> 
>  #ifndef mul_u64_u32_shr
> @@ -161,6 +168,35 @@ static inline u64 mul_u64_u32_shr(u64 a, u32 mul,
> unsigned int shift)
>  }
>  #endif /* mul_u64_u32_shr */
> 
> +#ifndef mul_u64_u32_div
> +static inline u64 mul_u64_u32_div(u64 a, u32 num, u32 den)
> +{
> + union {
> + u64 ll;
> + struct {
> +#ifdef __BIG_ENDIAN
> + u32 high, low;
> +#else
> + u32 low, high;
> +#endif
> + } l;
> + } u, rl, rh;
> +
> + u.ll = a;
> + rl.ll = (u64)u.l.low * num;
> + rh.ll = (u64)u.l.high * num + rl.l.high;
> +
> + /* Bits 32-63 of the result will be in rh.l.low.  */
> + rl.l.high = do_div(rh.ll, den);
> +
> + /* Bits 0-31 of the result will be in rl.l.low.  */
> + do_div(rl.ll, den);
> +
> + rl.l.high = rh.l.low;
> + return rl.ll;
> +}
> +#endif
> +
>  #endif
> 
>  #endif /* _LINUX_MATH64_H */
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC PATCH v4] os-android: Add support to android platform

2015-10-06 Thread Eric Blake
On 10/06/2015 03:47 AM, Stefan Hajnoczi wrote:
> On Sat, Oct 03, 2015 at 12:44:14PM +0800, Houcheng Lin wrote:
>> diff --git a/configure b/configure
>> index d7c24cd..cda88c1 100755

>> +++ b/include/qemu/osdep.h
>> @@ -74,6 +74,14 @@ typedef unsigned intuint_fast16_t;
>>  typedef signed int  int_fast16_t;
>>  #endif
>>  
>> +#ifdef CONFIG_ANDROID
>> +/*
>> + * For include the basename prototyping in android.
>> + */
>> +#include 
> 
> Files that use basename(3) should include libgen.h.  Why include it
> here?

What is using basename(3)? POSIX gives basename(3) a worthless contract
- it is free to return non-thread-safe storage.  Also, it is not
portable to Windows-style drive-letters, so I consider any code using
 to already be suspect.

If you are already writing code to be ported to both Unixy and windows
systems, you are better off rolling your own alternative to basename (or
better, using something else that has already rolled a portable version
for you - while I know gnulib does that, we aren't using gnulib; but I
assume glib has something along those lines even though I haven't looked
for it).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC PATCH v4] os-android: Add support to android platform

2015-10-06 Thread Paolo Bonzini


On 06/10/2015 14:13, Eric Blake wrote:
> 
> If you are already writing code to be ported to both Unixy and
> windows systems, you are better off rolling your own alternative to
> basename (or better, using something else that has already rolled a
> portable version for you - while I know gnulib does that, we aren't
> using gnulib; but I assume glib has something along those lines
> even though I haven't looked for it).

Yes, there is g_path_get_basename (and g_path_get_dirname).  Added to
http://wiki.qemu.org/BiteSizedTasks#API_conversion.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH kvmtool] Skip a few messages by default: command line args; flat binary; earlyprintk.

2015-10-06 Thread Will Deacon
On Wed, Sep 30, 2015 at 05:11:15PM +0100, Andre Przywara wrote:
> On 29/09/15 17:59, Dimitri John Ledkov wrote:
> > The partial command line args & earlyprintk=serial are still enabled
> > in the debug mode. Warning that a flat binary kernel image is attemped
> > to be loaded is completely dropped. These are not that informative,
> > once one is past intial debugging, and only polute the console.
> > 
> > Signed-off-by: Dimitri John Ledkov 
> > ---
> >  builtin-run.c | 10 ++
> >  kvm.c |  1 -
> >  x86/kvm.c |  8 ++--
> >  3 files changed, 12 insertions(+), 7 deletions(-)
> > 
> > diff --git a/builtin-run.c b/builtin-run.c
> > index e0c8732..8edbf88 100644
> > --- a/builtin-run.c
> > +++ b/builtin-run.c
> > @@ -613,10 +613,12 @@ static struct kvm *kvm_cmd_run_init(int argc, const 
> > char **argv)
> >  
> > kvm->cfg.real_cmdline = real_cmdline;
> >  
> > -   printf("  # %s run -k %s -m %Lu -c %d --name %s\n", KVM_BINARY_NAME,
> > -   kvm->cfg.kernel_filename,
> > -   (unsigned long long)kvm->cfg.ram_size / 1024 / 1024,
> > -   kvm->cfg.nrcpus, kvm->cfg.guest_name);
> > +   if (do_debug_print) {
> > +   printf("  # %s run -k %s -m %Lu -c %d --name %s\n", 
> > KVM_BINARY_NAME,
> > +  kvm->cfg.kernel_filename,
> > +  (unsigned long long)kvm->cfg.ram_size / 1024 / 1024,
> > +  kvm->cfg.nrcpus, kvm->cfg.guest_name);
> > +   }
> 
> I like the general idea. In fact I have this very patch (among others)
> in my tree too. I applied similar guarding to other messages as well
> (mostly those that only show up on ARM, but also the "ended normally"
> message). Like any good UNIX tool kvmtool should keep quiet if it has
> nothing worthwhile to say ;-)
> But looking at it more closely, I see that there is pr_debug() defined
> doing that "if (do_debug_print)" already. The only issue is that is
> prints source line information, which is not really useful here. But
> then again there does not seem to be any user of it?
> 
> So what about the following:
> - We avoid printing pr_info() messages in the default case. Looking at
> its current users in the tree this information is not really useful for
> normal users. We enable pr_info() output only if do_debug_print is
> enabled or introduce another command line flag (--verbose?) for that.
> - We check each user of pr_info() to see whether this information is
> actually "informational" or whether it should be converted to pr_warn.
> - We change the above line to use pr_info instead of printf.
> - We fix the EOL mayhem we have atm while at it.
> 
> If you don't mind I will give this a try later this week.

Sounds good to me.

Will
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state

2015-10-06 Thread Alex Williamson
On Tue, 2015-10-06 at 08:53 +, Bhushan Bharat wrote:
> 
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Tuesday, October 06, 2015 4:15 AM
> > To: Bhushan Bharat-R65777 
> > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> > marc.zyng...@arm.com; will.dea...@arm.com
> > Subject: Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs
> > automap state
> > 
> > On Mon, 2015-10-05 at 06:00 +, Bhushan Bharat wrote:
> > > > -1138,6 +1156,8 @@
> > > > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > > > >   }
> > > > >   } else if (cmd == VFIO_IOMMU_GET_INFO) {
> > > > >   struct vfio_iommu_type1_info info;
> > > > > + struct iommu_domain_msi_maps msi_maps;
> > > > > + int ret;
> > > > >
> > > > >   minsz = offsetofend(struct vfio_iommu_type1_info,
> > > > iova_pgsizes);
> > > > >
> > > > > @@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void
> > > > > *iommu_data,
> > > > >
> > > > >   info.flags = 0;
> > > > >
> > > > > + ret = vfio_domains_get_msi_maps(iommu, _maps);
> > > > > + if (ret)
> > > > > + return ret;
> > > >
> > > > And now ioctl(VFIO_IOMMU_GET_INFO) no longer works for any
> > IOMMU
> > > > implementing domain_get_attr but not supporting
> > > > DOMAIN_ATTR_MSI_MAPPING.
> > >
> > > With this current patch version this will get the default assumed behavior
> > as you commented on previous patch.
> > 
> > How so?
> 
> You are right, the ioctl will return failure. But that should be ok, right?

Not remotely.  ioctl(VFIO_IOMMU_GET_INFO) can't suddenly stop working on
some platforms.

> > 
> > +   msi_maps->automap = true;
> > +   msi_maps->override_automap = false;
> > +
> > +   if (domain->ops->domain_get_attr)
> > +   ret = domain->ops->domain_get_attr(domain, attr,
> > + data);
> > 
> > If domain_get_attr is implemented, but DOMAIN_ATTR_MSI_MAPPING is
> > not, ret should be an error code.
> 
> Currently it returns same error code returned by 
> domain->ops->domain_get_attr(). 
> I do not think we want to complicate that we return an error to user-space 
> that msi's probably cannot be used but user-space can continue with Legacy 
> interrupt, or you want that?

I can't really parse your statement, but ioctl(VFIO_IOMMU_GET_INFO)
works today and it must work with your changes.  Your change should only
affect whether some flags are visible, MSI has worked just fine up to
this point on other platforms.



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region

2015-10-06 Thread Alex Williamson
On Tue, 2015-10-06 at 09:39 +, Bhushan Bharat wrote:
> 
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Tuesday, October 06, 2015 4:15 AM
> > To: Bhushan Bharat-R65777 
> > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> > marc.zyng...@arm.com; will.dea...@arm.com
> > Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova
> > region
> > 
> > On Mon, 2015-10-05 at 04:55 +, Bhushan Bharat wrote:
> > > Hi Alex,
> > >
> > > > -Original Message-
> > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > Sent: Saturday, October 03, 2015 4:16 AM
> > > > To: Bhushan Bharat-R65777 
> > > > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > > > christoffer.d...@linaro.org; eric.au...@linaro.org;
> > > > pranavku...@linaro.org; marc.zyng...@arm.com; will.dea...@arm.com
> > > > Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del
> > > > reserved iova region
> > > >
> > > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > > > This Patch adds the VFIO APIs to add and remove reserved iova regions.
> > > > > The reserved iova region can be used for mapping some specific
> > > > > physical address in iommu.
> > > > >
> > > > > Currently we are planning to use this interface for adding iova
> > > > > regions for creating iommu of msi-pages. But the API are designed
> > > > > for future extension where some other physical address can be
> > mapped.
> > > > >
> > > > > Signed-off-by: Bharat Bhushan 
> > > > > ---
> > > > >  drivers/vfio/vfio_iommu_type1.c | 201
> > > > +++-
> > > > >  include/uapi/linux/vfio.h   |  43 +
> > > > >  2 files changed, 243 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > > b/drivers/vfio/vfio_iommu_type1.c index 57d8c37..fa5d3e4 100644
> > > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > > @@ -59,6 +59,7 @@ struct vfio_iommu {
> > > > >   struct rb_root  dma_list;
> > > > >   boolv2;
> > > > >   boolnesting;
> > > > > + struct list_headreserved_iova_list;
> > > >
> > > > This alignment leads to poor packing in the structure, put it above the
> > bools.
> > >
> > > ok
> > >
> > > >
> > > > >  };
> > > > >
> > > > >  struct vfio_domain {
> > > > > @@ -77,6 +78,15 @@ struct vfio_dma {
> > > > >   int prot;   /* IOMMU_READ/WRITE */
> > > > >  };
> > > > >
> > > > > +struct vfio_resvd_region {
> > > > > + dma_addr_t  iova;
> > > > > + size_t  size;
> > > > > + int prot;   /* IOMMU_READ/WRITE */
> > > > > + int refcount;   /* ref count of 
> > > > > mappings */
> > > > > + uint64_tmap_paddr;  /* Mapped Physical 
> > > > > Address
> > > > */
> > > >
> > > > phys_addr_t
> > >
> > > Ok,
> > >
> > > >
> > > > > + struct list_head next;
> > > > > +};
> > > > > +
> > > > >  struct vfio_group {
> > > > >   struct iommu_group  *iommu_group;
> > > > >   struct list_headnext;
> > > > > @@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct
> > > > vfio_iommu *iommu,
> > > > >   return NULL;
> > > > >  }
> > > > >
> > > > > +/* This function must be called with iommu->lock held */ static
> > > > > +bool vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
> > > > > +dma_addr_t start, size_t 
> > > > > size) {
> > > > > + struct vfio_resvd_region *region;
> > > > > +
> > > > > + list_for_each_entry(region, >reserved_iova_list, next) {
> > > > > + if (region->iova < start)
> > > > > + return (start - region->iova < region->size);
> > > > > + else if (start < region->iova)
> > > > > + return (region->iova - start < size);
> > > >
> > > > <= on both of the return lines?
> > >
> > > I think is should be "<" and not "=<", no ?
> > 
> > Yep, looks like you're right.  Maybe there's a more straightforward way to 
> > do
> > this.
> > 
> > > >
> > > > > +
> > > > > + return (region->size > 0 && size > 0);
> > > > > + }
> > > > > +
> > > > > + return false;
> > > > > +}
> > > > > +
> > > > > +/* This function must be called with iommu->lock held */ static
> > > > > +struct vfio_resvd_region *vfio_find_resvd_region(struct
> > > > > +vfio_iommu
> > > > *iommu,
> > > > > +  dma_addr_t start, 
> > > > > size_t
> > > > size) {
> > > > > + struct vfio_resvd_region *region;
> > > > > +
> > > > > + list_for_each_entry(region,