Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-05 Thread Bandan Das
Joerg Roedel  writes:

> On Mon, Oct 05, 2015 at 12:54:43PM -0400, Bandan Das wrote:
>> Joerg Roedel  writes:
>> The problems is that the next_rip field could be stale. If the processor 
>> supports
>> next_rip, then it will clear it out on the next entry. If it doesn't,
>> an old value just sits there (no matter who wrote it) and the problem
>> happens when skip_emulated_instruction advances the rip with an incorrect
>> value.
>
> So the right fix would be to just set the guests next_rip to 0 when we
> emulate vmrun, just like real hardware does, no?

Agreed, resetting to 0 if nrips isn't supported seems right. It would still
help having a printk_once in this case IMO :)

>
>   Joerg
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-05 Thread Mario Smarduch
Will do, I'll get them over to you.

-Original Message-
From: Christoffer Dall [mailto:christoffer.d...@linaro.org] 
Sent: Monday, October 05, 2015 10:26 AM
To: Mario Smarduch
Cc: kvm...@lists.cs.columbia.edu; marc.zyng...@arm.com; kvm@vger.kernel.org; 
linux-arm-ker...@lists.infradead.org
Subject: Re: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

On Mon, Oct 05, 2015 at 09:14:57AM -0700, Mario Smarduch wrote:
> Hi Christoffer,
>I just managed to boot qemu arm32 up on arm64 (last Fri - thanks 
> for the tip
> - there were few other issue to clean up), so let me retest it again. 
> Also I noticed some refactoring would help both 32 and 64 bit patches.
> 
> Yes I could provide a the user space tests as well.
> 
I'd like those regardless as I generally test my queue before pushing it to 
next.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/12] KVM: x86: Replace call-back set_tsc_khz() with a common function

2015-10-05 Thread Radim Krčmář
2015-09-28 13:38+0800, Haozhong Zhang:
> Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
> patch removes the call-back set_tsc_khz() and replaces it with a common
> function.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> +static void set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
> +{
> + u64 ratio, khz;
| [...]
> + khz = user_tsc_khz;

I'd use "user_tsc_khz" directly.

> + /* TSC scaling required  - calculate ratio */
> + shift = (kvm_tsc_scaling_ratio_frac_bits <= 32) ?
> + kvm_tsc_scaling_ratio_frac_bits : 32;
> + ratio = khz << shift;
> + do_div(ratio, tsc_khz);
> + ratio <<= (kvm_tsc_scaling_ratio_frac_bits - shift);

VMX is losing 16 bits by this operation;  normal fixed point division
could get us a smaller drift (and an one-liner here) ...
at 4.3 GHz, 32 instead of 48 bits after decimal point translate to one
"lost" TSC tick per second, in the worst case.

Please mention that we are truncating on purpose :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/12] KVM: x86: Use the correct vcpu's TSC rate to compute time scale

2015-10-05 Thread Radim Krčmář
2015-09-28 13:38+0800, Haozhong Zhang:
> This patch makes KVM use virtual_tsc_khz rather than the host TSC rate
> as vcpu's TSC rate to compute the time scale if TSC scaling is enabled.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -1782,7 +1782,9 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>   return 0;
>  
>   if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) {
> - kvm_get_time_scale(NSEC_PER_SEC / 1000, this_tsc_khz,
> + tgt_tsc_khz = kvm_has_tsc_control ?
> + vcpu->virtual_tsc_khz : this_tsc_khz;
> + kvm_get_time_scale(NSEC_PER_SEC / 1000, tgt_tsc_khz,
>  >hv_clock.tsc_shift,
>  >hv_clock.tsc_to_system_mul);

Good catch, it seems that SVM didn't scale kvmclock correctly ...
I think we'll want this patch in stable.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-05 Thread Dirk Müller
> So the right fix would be to just set the guests next_rip to 0 when we
> emulate vmrun, just like real hardware does, no?

Like this? (Note: I’m not sure what I’m doing here..). I Agree with you that 
the warning
for this seems excessive, I’ve just removed it. 



0001-KVM-nSVM-Check-for-NRIP-support-before-passing-on-co.patch
Description: Binary data


Thanks,
Dirk

Re: [PATCH 04/12] KVM: x86: Replace call-back set_tsc_khz() with a common function

2015-10-05 Thread David Matlack
On Mon, Oct 5, 2015 at 12:53 PM, Radim Krčmář  wrote:
> 2015-09-28 13:38+0800, Haozhong Zhang:
>> Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
>> patch removes the call-back set_tsc_khz() and replaces it with a common
>> function.
>>
>> Signed-off-by: Haozhong Zhang 
>> ---
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> +static void set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool scale)
>> +{
>> + u64 ratio, khz;
> | [...]
>> + khz = user_tsc_khz;
>
> I'd use "user_tsc_khz" directly.
>
>> + /* TSC scaling required  - calculate ratio */
>> + shift = (kvm_tsc_scaling_ratio_frac_bits <= 32) ?
>> + kvm_tsc_scaling_ratio_frac_bits : 32;
>> + ratio = khz << shift;
>> + do_div(ratio, tsc_khz);
>> + ratio <<= (kvm_tsc_scaling_ratio_frac_bits - shift);
>
> VMX is losing 16 bits by this operation;  normal fixed point division
> could get us a smaller drift (and an one-liner here) ...
> at 4.3 GHz, 32 instead of 48 bits after decimal point translate to one
> "lost" TSC tick per second, in the worst case.

We can easily avoid losing precision on x86_64 (divq allows a 128-bit
dividend). 32-bit can just lose the 16 bits of precision (TSC scaling
is only available on SkyLake, and I'd be surprised if there were
many hosts running KVM in protected mode on SkyLake :)).

>
> Please mention that we are truncating on purpose :)
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 2/6] iommu: Add interface to get msi-pages mapping attributes

2015-10-05 Thread Bhushan Bharat


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Saturday, October 03, 2015 4:16 AM
> To: Bhushan Bharat-R65777 
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> marc.zyng...@arm.com; will.dea...@arm.com
> Subject: Re: [RFC PATCH 2/6] iommu: Add interface to get msi-pages
> mapping attributes
> 
> [really ought to consider cc'ing the iommu list]
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > This APIs return the capability of automatically mapping msi-pages in
> > iommu with some magic iova. Which is what currently most of iommu's
> > does and is the default behaviour.
> >
> > Further API returns whether iommu allows the user to define different
> > iova for mai-page mapping for the domain. This is required when a msi
> > capable device is directly assigned to user-space/VM and user-space/VM
> > need to define a non-overlapping (from other dma-able address space)
> > iova for msi-pages mapping in iommu.
> >
> > This patch just define the interface and follow up patches will extend
> > this interface.
> 
> This is backwards, generally you want to add the infrastructure and only
> expose it once all the pieces are in place for it to work.  For instance, 
> patch
> 1/6 exposes a new userspace interface for vfio that doesn't do anything yet.
> How does the user know if it's there, *and* works?

Ok, I will reorder the patches.

> 
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  drivers/iommu/arm-smmu.c|  3 +++
> >  drivers/iommu/fsl_pamu_domain.c |  3 +++
> >  drivers/iommu/iommu.c   | 14 ++
> >  include/linux/iommu.h   |  9 -
> >  4 files changed, 28 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index
> > 66a803b..a3956fb 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -1406,6 +1406,9 @@ static int arm_smmu_domain_get_attr(struct
> iommu_domain *domain,
> > case DOMAIN_ATTR_NESTING:
> > *(int *)data = (smmu_domain->stage ==
> ARM_SMMU_DOMAIN_NESTED);
> > return 0;
> > +   case DOMAIN_ATTR_MSI_MAPPING:
> > +   /* Dummy handling added */
> > +   return 0;
> > default:
> > return -ENODEV;
> > }
> > diff --git a/drivers/iommu/fsl_pamu_domain.c
> > b/drivers/iommu/fsl_pamu_domain.c index 1d45293..9a94430 100644
> > --- a/drivers/iommu/fsl_pamu_domain.c
> > +++ b/drivers/iommu/fsl_pamu_domain.c
> > @@ -856,6 +856,9 @@ static int fsl_pamu_get_domain_attr(struct
> iommu_domain *domain,
> > case DOMAIN_ATTR_FSL_PAMUV1:
> > *(int *)data = DOMAIN_ATTR_FSL_PAMUV1;
> > break;
> > +   case DOMAIN_ATTR_MSI_MAPPING:
> > +   /* Dummy handling added */
> > +   break;
> > default:
> > pr_debug("Unsupported attribute type\n");
> > ret = -EINVAL;
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> > d4f527e..16c2eab 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1216,6 +1216,7 @@ int iommu_domain_get_attr(struct
> iommu_domain *domain,
> > bool *paging;
> > int ret = 0;
> > u32 *count;
> > +   struct iommu_domain_msi_maps *msi_maps;
> >
> > switch (attr) {
> > case DOMAIN_ATTR_GEOMETRY:
> > @@ -1236,6 +1237,19 @@ int iommu_domain_get_attr(struct
> iommu_domain *domain,
> > ret = -ENODEV;
> >
> > break;
> > +   case DOMAIN_ATTR_MSI_MAPPING:
> > +   msi_maps = data;
> > +
> > +   /* Default MSI-pages are magically mapped with some iova
> and
> > +* do now allow to configure with different iova.
> > +*/
> > +   msi_maps->automap = true;
> > +   msi_maps->override_automap = false;
> 
> There's no magic.  I think what you're trying to express is the difference
> between platforms that support MSI within the IOMMU IOVA space and
> thus need explicit IOMMU mappings vs platforms where MSI mappings
> either bypass the IOMMU entirely or are setup implicitly with interrupt
> remapping support.

Yes, I wants to differentiate the platforms which requires explicit iommu 
mapping for MSI with other platforms.
I will change the comment and use better name 
(need_mapping/need_iommu_mapping/require_mapping).

> 
> Why does it make sense to impose any sort of defaults?  If the IOMMU
> driver doesn't tell us what to do, I don't think we want to assume anything.
> 
> > +
> > +   if (domain->ops->domain_get_attr)
> > +   ret = domain->ops->domain_get_attr(domain, attr,
> data);
> > +
> > +   break;
> > default:
> > if (!domain->ops->domain_get_attr)
> > return -EINVAL;
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > 

Re: [PATCH 1/3] Target-microblaze: Remove unnecessary variable

2015-10-05 Thread Michael Tokarev
05.10.2015 08:18, Markus Armbruster wrote:
> Michael Tokarev  writes:
> 
>> 25.09.2015 11:37, Shraddha Barke wrote:
>>> Compress lines and remove the variable .
>>
>> Applied to -trivial, removing this piece of commit message:
>>
>> ---
>>> Change made using Coccinelle script
[..snip..]
>> ---
> 
> Why?  I like having the semantic patch in the commit message when
> there's any chance we'll want do the same mechanical change again later.
> 
> You could save space and include it by reference, though: "Same
> Coccinelle semantic patch as is commit 74c373e".

git commit messages aren't good documentation for various scripts
like this, this info will be lost in the noize.  If it might be
better to keep such scripts in a separate file where it is easier
to find, or in a wiki page on the site. The key point is where to
find the info, git log is difficult for that, especially when you
don't know what to search for or that such a script exists in
there in the first place.

On the other hand, when git log is cluttered by such a long messages
for such small changes, it becomes more difficult to find info which
you really look in git log -- namely, which changes were made that
might have introduced this regression, things like that.

So to me, the shorter and cleaner the commit message is, the better.

BTW, I've no idea why this email has been Cc'd to kvm@vger :)

Thanks,

/mjt
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt

2015-10-05 Thread Bhushan Bharat


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Saturday, October 03, 2015 4:17 AM
> To: Bhushan Bharat-R65777 
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> marc.zyng...@arm.com; will.dea...@arm.com
> Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> interrupt
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > An MSI-address is allocated and programmed in pcie device during
> > interrupt configuration. Now for a pass-through device, try to create
> > the iommu mapping for this allocted/programmed msi-address.  If the
> > iommu mapping is created and the msi address programmed in the pcie
> > device is different from msi-iova as per iommu programming then
> > reconfigure the pci device to use msi-iova as msi address.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  drivers/vfio/pci/vfio_pci_intrs.c | 36
> > ++--
> >  1 file changed, 34 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c
> > b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 1f577b4..c9690af 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
> > int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> > char *name = msix ? "vfio-msix" : "vfio-msi";
> > struct eventfd_ctx *trigger;
> > +   struct msi_msg msg;
> > +   struct vfio_device *device;
> > +   uint64_t msi_addr, msi_iova;
> > int ret;
> >
> > if (vector >= vdev->num_ctx)
> > return -EINVAL;
> >
> > +   device = vfio_device_get_from_dev(>dev);
> 
> Have you looked at this function?  I don't think we want to be doing that
> every time we want to poke the interrupt configuration.

I am trying to describe what I understood, a device can have many interrupts 
and we should setup iommu only once, when called for the first time to 
enable/setup interrupt.
Similarly when disabling the interrupt we should iommu-unmap when called for 
the last enabled interrupt for that device. Now with this understanding, should 
I move this map-unmap to separate functions and call them from 
vfio_msi_set_block() rather than in vfio_msi_set_vector_signal()

>  Also note that
> IOMMU mappings don't operate on devices, but groups, so maybe we want
> to pass the group.

Yes, it operates on group. I hesitated to add an API to get group. Do you 
suggest to that it is ok to add API to get group from device.

> 
> > +   if (device == NULL)
> > +   return -EINVAL;
> 
> This would be a legitimate BUG_ON(!device)
> 
> > +
> > if (vdev->ctx[vector].trigger) {
> > free_irq(irq, vdev->ctx[vector].trigger);
> > +   get_cached_msi_msg(irq, );
> > +   msi_iova = ((u64)msg.address_hi << 32) | msg.address_lo;
> > +   vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
> > kfree(vdev->ctx[vector].name);
> > eventfd_ctx_put(vdev->ctx[vector].trigger);
> > vdev->ctx[vector].trigger = NULL;
> > @@ -346,12 +356,11 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
> >  * cached value of the message prior to enabling.
> >  */
> > if (msix) {
> > -   struct msi_msg msg;
> > -
> > get_cached_msi_msg(irq, );
> > pci_write_msi_msg(irq, );
> > }
> >
> > +
> 
> gratuitous newline
> 
> > ret = request_irq(irq, vfio_msihandler, 0,
> >   vdev->ctx[vector].name, trigger);
> > if (ret) {
> > @@ -360,6 +369,29 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
> > return ret;
> > }
> >
> > +   /* Re-program the new-iova in pci-device in case there is
> > +* different iommu-mapping created for programmed msi-address.
> > +*/
> > +   get_cached_msi_msg(irq, );
> > +   msi_iova = 0;
> > +   msi_addr = (u64)(msg.address_hi) << 32 | (u64)(msg.address_lo);
> > +   ret = vfio_device_map_msi(device, msi_addr, PAGE_SIZE,
> _iova);
> > +   if (ret) {
> > +   free_irq(irq, vdev->ctx[vector].trigger);
> > +   kfree(vdev->ctx[vector].name);
> > +   eventfd_ctx_put(trigger);
> > +   return ret;
> > +   }
> > +
> > +   /* Reprogram only if iommu-mapped iova is different from msi-
> address */
> > +   if (msi_iova && (msi_iova != msi_addr)) {
> > +   msg.address_hi = (u32)(msi_iova >> 32);
> > +   /* Keep Lower bits from original msi message address */
> > +   msg.address_lo &= PAGE_MASK;
> > +   msg.address_lo |= (u32)(msi_iova & 0x);
> 
> Seems like you're making some assumptions here that are dependent on the
> architecture and maybe the platform.

What I tried is to map the msi 

Re: [Qemu-devel] [PATCH 1/3] Target-microblaze: Remove unnecessary variable

2015-10-05 Thread Peter Maydell
On 5 October 2015 at 08:18, Michael Tokarev  wrote:
> 05.10.2015 08:18, Markus Armbruster wrote:
>> Why?  I like having the semantic patch in the commit message when
>> there's any chance we'll want do the same mechanical change again later.
>>
>> You could save space and include it by reference, though: "Same
>> Coccinelle semantic patch as is commit 74c373e".
>
> git commit messages aren't good documentation for various scripts
> like this, this info will be lost in the noize.  If it might be
> better to keep such scripts in a separate file where it is easier
> to find, or in a wiki page on the site. The key point is where to
> find the info, git log is difficult for that, especially when you
> don't know what to search for or that such a script exists in
> there in the first place.
>
> On the other hand, when git log is cluttered by such a long messages
> for such small changes, it becomes more difficult to find info which
> you really look in git log -- namely, which changes were made that
> might have introduced this regression, things like that.

I think it can be useful when you're looking at a commit
to know that it was automatically created, especially if
it's a big commit. It means that if you're looking for
a bug in it you can concentrate on the script that created
it rather than the possibly large set of changes it produced,
or if you're trying to cherry-pick it into another branch you
can just apply the script instead.

In a commit with a change this small it's not very significant
either way, though.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state

2015-10-05 Thread Bhushan Bharat


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Saturday, October 03, 2015 4:16 AM
> To: Bhushan Bharat-R65777 
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> marc.zyng...@arm.com; will.dea...@arm.com
> Subject: Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs
> automap state
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > This patch allows the user-space to know whether msi-pages are
> > automatically mapped with some magic iova or not.
> >
> > Even if the msi-pages are automatically mapped, still user-space wants
> > to over-ride the automatic iova selection for msi-mapping.
> > For this user-space need to know whether it is allowed to change the
> > automatic mapping or not and this API provides this mechanism.
> > Follow up patches will provide how to over-ride this.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 32
> 
> >  include/uapi/linux/vfio.h   |  3 +++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index fa5d3e4..3315fb6 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -59,6 +59,7 @@ struct vfio_iommu {
> > struct rb_root  dma_list;
> > boolv2;
> > boolnesting;
> > +   boolallow_msi_reconfig;
> > struct list_headreserved_iova_list;
> >  };
> >
> > @@ -1117,6 +1118,23 @@ static int
> vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> > return ret;
> >  }
> >
> > +static
> > +int vfio_domains_get_msi_maps(struct vfio_iommu *iommu,
> > + struct iommu_domain_msi_maps *msi_maps) {
> > +   struct vfio_domain *d;
> > +   int ret;
> > +
> > +   mutex_lock(>lock);
> > +   /* All domains have same msi-automap property, pick first */
> > +   d = list_first_entry(>domain_list, struct vfio_domain, next);
> > +   ret = iommu_domain_get_attr(d->domain,
> DOMAIN_ATTR_MSI_MAPPING,
> > +   msi_maps);
> > +   mutex_unlock(>lock);
> > +
> > +   return ret;
> > +}
> > +
> >  static long vfio_iommu_type1_ioctl(void *iommu_data,
> >unsigned int cmd, unsigned long arg)  { @@
> -1138,6 +1156,8 @@
> > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > }
> > } else if (cmd == VFIO_IOMMU_GET_INFO) {
> > struct vfio_iommu_type1_info info;
> > +   struct iommu_domain_msi_maps msi_maps;
> > +   int ret;
> >
> > minsz = offsetofend(struct vfio_iommu_type1_info,
> iova_pgsizes);
> >
> > @@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void
> > *iommu_data,
> >
> > info.flags = 0;
> >
> > +   ret = vfio_domains_get_msi_maps(iommu, _maps);
> > +   if (ret)
> > +   return ret;
> 
> And now ioctl(VFIO_IOMMU_GET_INFO) no longer works for any IOMMU
> implementing domain_get_attr but not supporting
> DOMAIN_ATTR_MSI_MAPPING.

With this current patch version this will get the default assumed behavior as 
you commented on previous patch. 

> 
> > +
> > +   if (msi_maps.override_automap) {
> > +   info.flags |=
> VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG;
> > +   iommu->allow_msi_reconfig = true;
> > +   }
> > +
> > +   if (msi_maps.automap)
> > +   info.flags |= VFIO_IOMMU_INFO_MSI_AUTOMAP;
> > +
> > info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
> >
> > return copy_to_user((void __user *)arg, , minsz); diff --
> git
> > a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index
> > 1abd1a9..9998f6e 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -391,6 +391,9 @@ struct vfio_iommu_type1_info {
> > __u32   argsz;
> > __u32   flags;
> >  #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)   /* supported page
> sizes info */
> > +#define VFIO_IOMMU_INFO_MSI_AUTOMAP (1 << 1)   /* MSI pages
> are auto-mapped
> > +  in iommu */
> > +#define VFIO_IOMMU_INFO_MSI_ALLOW_RECONFIG (1 << 2) /* Allows
> > +reconfig automap*/
> > __u64   iova_pgsizes;   /* Bitmap of supported page sizes */
> >  };
> >
> 
> Once again, exposing interfaces to the user before they actually do anything
> is backwards.

Will change the order.

Thanks
-Bharat



RE: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages

2015-10-05 Thread Bhushan Bharat


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Saturday, October 03, 2015 4:16 AM
> To: Bhushan Bharat-R65777 
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> marc.zyng...@arm.com; will.dea...@arm.com
> Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI
> pages
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > For MSI interrupts to work for a pass-through devices we need to have
> > mapping of msi-pages in iommu. Now on some platforms (like x86) does
> > this msi-pages mapping happens magically and in these case they
> > chooses an iova which they somehow know that it will never overlap
> > with guest memory. But this magic iova selection may not be always
> > true for all platform (like PowerPC and ARM64).
> >
> > Also on x86 platform, there is no problem as long as running a
> > x86-guest on x86-host but there can be issues when running a non-x86
> > guest on
> > x86 host or other userspace applications like (I think ODP/DPDK).
> > As in these cases there can be chances that it overlaps with guest
> > memory mapping.
> 
> Wow, it's amazing anything works... smoke and mirrors.
> 
> > This patch add interface to iommu-map and iommu-unmap msi-pages at
> > reserved iova chosen by userspace.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  drivers/vfio/vfio.c |  52 +++
> >  drivers/vfio/vfio_iommu_type1.c | 111
> 
> >  include/linux/vfio.h|   9 +++-
> >  3 files changed, 171 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > 2fb29df..a817d2d 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct
> notifier_block *nb,
> > return NOTIFY_OK;
> >  }
> >
> > +int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr,
> > +   uint32_t size, uint64_t *msi_iova) {
> > +   struct vfio_container *container = device->group->container;
> > +   struct vfio_iommu_driver *driver;
> > +   int ret;
> > +
> > +   /* Validate address and size */
> > +   if (!msi_addr || !size || !msi_iova)
> > +   return -EINVAL;
> > +
> > +   down_read(>group_lock);
> > +
> > +   driver = container->iommu_driver;
> > +   if (!driver || !driver->ops || !driver->ops->msi_map) {
> > +   up_read(>group_lock);
> > +   return -EINVAL;
> > +   }
> > +
> > +   ret = driver->ops->msi_map(container->iommu_data,
> > +  msi_addr, size, msi_iova);
> > +
> > +   up_read(>group_lock);
> > +   return ret;
> > +}
> > +
> > +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t
> msi_iova,
> > + uint64_t size)
> > +{
> > +   struct vfio_container *container = device->group->container;
> > +   struct vfio_iommu_driver *driver;
> > +   int ret;
> > +
> > +   /* Validate address and size */
> > +   if (!msi_iova || !size)
> > +   return -EINVAL;
> > +
> > +   down_read(>group_lock);
> > +
> > +   driver = container->iommu_driver;
> > +   if (!driver || !driver->ops || !driver->ops->msi_unmap) {
> > +   up_read(>group_lock);
> > +   return -EINVAL;
> > +   }
> > +
> > +   ret = driver->ops->msi_unmap(container->iommu_data,
> > +msi_iova, size);
> > +
> > +   up_read(>group_lock);
> > +   return ret;
> > +}
> > +
> >  /**
> >   * VFIO driver API
> >   */
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index 3315fb6..ab376c2 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -1003,12 +1003,34 @@ out_free:
> > return ret;
> >  }
> >
> > +static void vfio_iommu_unmap_all_reserved_regions(struct vfio_iommu
> > +*iommu) {
> > +   struct vfio_resvd_region *region;
> > +   struct vfio_domain *d;
> > +
> > +   list_for_each_entry(region, >reserved_iova_list, next) {
> > +   list_for_each_entry(d, >domain_list, next) {
> > +   if (!region->map_paddr)
> > +   continue;
> > +
> > +   if (!iommu_iova_to_phys(d->domain, region->iova))
> > +   continue;
> > +
> > +   iommu_unmap(d->domain, region->iova,
> PAGE_SIZE);
> 
> PAGE_SIZE?  Why not region->size?

Yes, this should be region->size.

> 
> > +   region->map_paddr = 0;
> > +   cond_resched();
> > +   }
> > +   }
> > +}
> > +
> >  static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)  {
> > struct rb_node *node;
> >
> > while ((node = rb_first(>dma_list)))
> > vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma,
> node));
> > +
> > +   

RE: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI

2015-10-05 Thread Bhushan Bharat


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Saturday, October 03, 2015 4:17 AM
> To: Bhushan Bharat-R65777 
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> marc.zyng...@arm.com; will.dea...@arm.com
> Subject: Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for
> MSI
> 
> On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > Finally ARM SMMU declare that iommu-mapping for MSI-pages are not set
> > automatically and it should be set explicitly.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  drivers/iommu/arm-smmu.c | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index
> > a3956fb..9d37e72 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -1401,13 +1401,18 @@ static int arm_smmu_domain_get_attr(struct
> iommu_domain *domain,
> > enum iommu_attr attr, void *data)  {
> > struct arm_smmu_domain *smmu_domain =
> to_smmu_domain(domain);
> > +   struct iommu_domain_msi_maps *msi_maps;
> >
> > switch (attr) {
> > case DOMAIN_ATTR_NESTING:
> > *(int *)data = (smmu_domain->stage ==
> ARM_SMMU_DOMAIN_NESTED);
> > return 0;
> > case DOMAIN_ATTR_MSI_MAPPING:
> > -   /* Dummy handling added */
> > +   msi_maps = data;
> > +
> > +   msi_maps->automap = false;
> > +   msi_maps->override_automap = true;
> > +
> > return 0;
> > default:
> > return -ENODEV;
> 
> In previous discussions I understood one of the problems you were trying to
> solve was having a limited number of MSI banks and while you may be able
> to get isolated MSI banks for some number of users, it wasn't unlimited and
> sharing may be required.  I don't see any of that addressed in this series.

That problem was on PowerPC. Infact there were two problems, one which MSI bank 
to be used and second how to create iommu-mapping for device assigned to 
userspace.
First problem was PowerPC specific and that will be solved separately.
For second problem, earlier I tried to added a couple of MSI specific ioctls 
and you suggested (IIUC) that we should have a generic reserved-iova type of 
API and then we can map MSI bank using reserved-iova and this will not require 
involvement of user-space.

> 
> Also, the management of reserved IOVAs vs MSI addresses looks really
> dubious to me.  How does your platform pick an MSI address and what are
> we breaking by covertly changing it?  We seem to be masking over at the
> VFIO level, where there should be lower level interfaces doing the right thing
> when we configure MSI on the device.

Yes, In my understanding the right solution should be:
 1) VFIO driver should know what physical-msi-address will be used for devices 
in an iommu-group.
I did not find an generic API, on PowerPC I added some function in 
ffrescale msi-driver and called from vfio-iommu-fsl-pamu.c (not yet upstreamed).
 2) VFIO driver should know what IOVA to be used for creating iommu-mapping 
(VFIO APIs patch of this patch series)
 3) VFIO driver will create the iommu-mapping using (1) and (2)
 4) VFIO driver should be able to tell the msi-driver that for a given device 
it should use different IOVA. So when composing the msi message (for the 
devices is the given iommu-group) it should use that programmed iova as 
MSI-address. This interface also needed to be developed.

I was not sure of which approach we should take. The current approach in the 
patch is simple to develop so I went ahead to take input but I agree this does 
not look very good.
What do you think, should drop this approach and work out the approach as 
described above.

Thanks
-Bharat
> 
> The problem of reporting "automap" base address isn't addressed more than
> leaving some unused field in iommu_domain_msi_maps.

N�r��yb�X��ǧv�^�)޺{.n�+h����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf

Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-05 Thread Joerg Roedel
On Thu, Oct 01, 2015 at 06:31:27PM -0400, Bandan Das wrote:
> Paolo Bonzini  writes:
> 
> > On 01/10/2015 13:43, Dirk Müller wrote:
> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> index 94b7d15..0a42859 100644
> >> --- a/arch/x86/kvm/svm.c
> >> +++ b/arch/x86/kvm/svm.c
> >> @@ -514,7 +514,7 @@ static void skip_emulated_instruction(struct kvm_vcpu 
> >> *vcpu)
> >>struct vcpu_svm *svm = to_svm(vcpu);
> >>  
> >>if (svm->vmcb->control.next_rip != 0) {
> >> -  WARN_ON(!static_cpu_has(X86_FEATURE_NRIPS));
> >> +  WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
> >>svm->next_rip = svm->vmcb->control.next_rip;
> >>}
> >>  
> >
> > Bandan, what was the reason for warning here?
> 
> I added the warning so that we catch if the next_rip field is being written
> to (even if the feature isn't supported) by a buggy L1 hypervisor.

Even if the L1 hypervisor writes to the next_rip field in the VMCB, we
would never see it in this code path, as we access the shadow VMCB in
this statement.

We don't even care if the L1 hypervisor writes to its next_rip field
because we only write to this field on an emulatated VMEXIT and never
read it back.

So what's the point in adding a guest-triggerable warning at all?



Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] kvm-all: notice KVM of vcpu's TSC rate after migration

2015-10-05 Thread Haozhong Zhang
On Wed, Sep 30, 2015 at 05:36:11PM -0300, Eduardo Habkost wrote:
> On Wed, Sep 30, 2015 at 08:32:26AM +0800, Haozhong Zhang wrote:
> > > [...]
> > > > > Or maybe we shouldn't treat this as VM state, but as configuration, 
> > > > > and
> > > > > let management configure the TSC frequency explicitly if the user 
> > > > > really
> > > > > needs it to stay the same during migration.
> > > > >
> > > > > (CCing libvir-list to see if they have feedback)
> > > > >
> > > > 
> > > > Thanks for CC. I'll consider to add a command line option to control
> > > > the configuration of guest TSC fequency.
> > > 
> > > It already exists, -cpu has a "tsc-freq" option.
> > >
> > 
> > What I'm considering is to add a "-keep-tsc-freq" option to control
> > whether the TSC frequency should be migrated if "tsc-freq" is not
> > presented. Compared to "tsc-freq", "-keep-tsc-freq" can free users
> > from figuring out the guest TSC frequency manually in the migration.
> 
> If you do that, please make it a property of the CPU object, so it will
> can be set as a "-cpu" option.
>

Yes, I'll do so.

- Haozhong

> -- 
> Eduardo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 1/3] target-i386: add a subsection of vcpu's TSC rate in vmstate_x86_cpu

2015-10-05 Thread Haozhong Zhang
On Wed, Sep 30, 2015 at 09:07:08AM +0100, Dr. David Alan Gilbert wrote:
> * Haozhong Zhang (haozhong.zh...@intel.com) wrote:
> > On Tue, Sep 29, 2015 at 08:00:13PM +0100, Dr. David Alan Gilbert wrote:
> > > * Haozhong Zhang (haozhong.zh...@intel.com) wrote:
> > > > The newly added subsection 'vmstate_tsc_khz' in this patch results in
> > > > vcpu's TSC rate being saved on the source machine and loaded on the
> > > > target machine during the migration.
> > > > 
> > > > Signed-off-by: Haozhong Zhang 
> > > 
> > > Hi,
> > >   I'd appreciate it if you could tie this to only do it on newer
> > > machine types; that way it won't break back migration.
> > >
> > 
> > Does "back migration" mean migrating from QEMU w/ vmstate_tsc_khz
> > subsection to QEMU w/o that subsection?
> 
> Yes; like if we migrate from a newer qemu to an older qemu but with
> the same machine type.
>

This patch does break the back migration. I'll fix this in the next version.

- Haozhong

> Dave
> 
> > 
> > - Haozhong
> > 
> > > Dave
> > > 
> > > > ---
> > > >  target-i386/machine.c | 20 
> > > >  1 file changed, 20 insertions(+)
> > > > 
> > > > diff --git a/target-i386/machine.c b/target-i386/machine.c
> > > > index 9fa0563..80108a3 100644
> > > > --- a/target-i386/machine.c
> > > > +++ b/target-i386/machine.c
> > > > @@ -752,6 +752,25 @@ static const VMStateDescription vmstate_xss = {
> > > >  }
> > > >  };
> > > >  
> > > > +static bool tsc_khz_needed(void *opaque)
> > > > +{
> > > > +X86CPU *cpu = opaque;
> > > > +CPUX86State *env = >env;
> > > > +
> > > > +return env->tsc_khz != 0;
> > > > +}
> > > > +
> > > > +static const VMStateDescription vmstate_tsc_khz = {
> > > > +.name = "cpu/tsc_khz",
> > > > +.version_id = 1,
> > > > +.minimum_version_id = 1,
> > > > +.needed = tsc_khz_needed,
> > > > +.fields = (VMStateField[]) {
> > > > +VMSTATE_INT64(env.tsc_khz, X86CPU),
> > > > +VMSTATE_END_OF_LIST()
> > > > +}
> > > > +};
> > > > +
> > > >  VMStateDescription vmstate_x86_cpu = {
> > > >  .name = "cpu",
> > > >  .version_id = 12,
> > > > @@ -871,6 +890,7 @@ VMStateDescription vmstate_x86_cpu = {
> > > >  _msr_hyperv_crash,
> > > >  _avx512,
> > > >  _xss,
> > > > +_tsc_khz,
> > > >  NULL
> > > >  }
> > > >  };
> > > > -- 
> > > > 2.4.8
> > > > 
> > > > 
> > > --
> > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/12] KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch

2015-10-05 Thread Haozhong Zhang
On Mon, Oct 05, 2015 at 09:26:30PM +0200, Radim Krčmář wrote:
> 2015-09-28 13:38+0800, Haozhong Zhang:
> > This patch moves the field of TSC scaling ratio from the architecture
> > struct vcpu_svm to the common struct kvm_vcpu_arch.
> > 
> > Signed-off-by: Haozhong Zhang 
> > ---
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > @@ -7080,6 +7080,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
> >  
> > vcpu = kvm_x86_ops->vcpu_create(kvm, id);
> >  
> > +   if (!IS_ERR(vcpu))
> > +   vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;
> 
> This shouldn't be necessary, (and we can definitely do it without error
> checking later)
> 
>  kvm_arch_vcpu_create
>(vmx|svm)_create_vcpu
>  kvm_vcpu_init
>kvm_arch_vcpu_init
>  kvm_set_tsc_khz
> 
> sets vcpu->arch.tsc_scaling_ratio to something reasonable and SVM didn't
> overwrite that value.  (kvm_set_tsc_khz() only doesn't set the ration if
> this_tsc_khz == 0, which we could extend to be extra safe.)

Thanks Radim! I even didn't notice this path. I'll remove the ratio
setting in kvm_arch_vcpu_create(). In kvm_set_tsc_khz(), if
this_tsc_khz == 0, I'll make it set vcpu->arch.tsc_scaling_ratio to
kvm_default_tsc_scaling_ratio.

- Haozhong
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCHv2] arm: Add PMU test

2015-10-05 Thread Wei Huang


On 10/02/2015 10:48 AM, Christopher Covington wrote:
> Add test the ARM Performance Monitors Unit (PMU). The informational
> fields from the control register are printed, but not checked, and
> the number of cycles it takes to run a known-instruction-count loop
> is printed, but not checked. Once QEMU is fixed, we can at least
> begin to check for IPC == 1 when using -icount.

Thanks for submitting it. I think this is a good starting point to add
PMU unit testing support for ARM. Some comments below.

> 
> Signed-off-by: Christopher Covington 
> ---
>  arm/pmu.c   | 89 
> +
>  arm/unittests.cfg   | 11 ++
>  config/config-arm64.mak |  4 ++-
>  3 files changed, 103 insertions(+), 1 deletion(-)
>  create mode 100644 arm/pmu.c
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> new file mode 100644
> index 000..f724c2c
> --- /dev/null
> +++ b/arm/pmu.c
> @@ -0,0 +1,89 @@
> +/*
> + * Test the ARM Performance Monitors Unit (PMU).
> + *
> + * Copyright 2015 The Linux Foundation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU Lesser General Public License version 2.1 and
> + * only version 2.1 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but 
> WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public 
> License
> + * for more details.
> + */
> +#include "libcflat.h"
> +
> +struct pmu_data {
> + union {
> + uint32_t pmcr_el0;
> + struct {
> + unsigned int enable:1;
> + unsigned int event_counter_reset:1;
> + unsigned int cycle_counter_reset:1;
> + unsigned int cycle_counter_clock_divider:1;
> + unsigned int event_counter_export:1;
> + unsigned int cycle_counter_disable_when_prohibited:1;
> + unsigned int cycle_counter_long:1;
> + unsigned int zeros:4;
> + unsigned int num_counters:5;
> + unsigned int identification_code:8;
> + unsigned int implementor:8;

Not saying it is a problem because "unsigned int" is 32-bit on 64bit
machine. But to make it consistent with pmcr_el0, I would prefer
"unsigned int" been replaced by "uint32_t".

> + };
> + };
> +};
> +
> +/* Execute a known number of guest instructions. Only odd instruction counts
> + * greater than or equal to 3 are supported. The control register (PMCR) is
> + * initialized with the provided value (allowing for example for the cycle
> + * counter or eventer count to be reset if needed). After the known 
> instruction
> + * count loop, zero is written to the PMCR to disable counting, allowing the
> + * cycle counter or event counters to be read as needed at a later time.
> + */
> +static void measure_instrs(int len, struct pmu_data pmcr)
> +{
> + int i = (len - 1) / 2;
> +
> + if (len < 3 || ((len - 1) % 2))
> + abort();
> +
> + asm volatile(
> + "msr pmcr_el0, %[pmcr]\n"
> + "1: subs %[i], %[i], #1\n"
> + "b.gt 1b\n"
> + "msr pmcr_el0, xzr"
> + : [i] "+r" (i) : [pmcr] "r" (pmcr) : "cc");
> +}
> +
> +int main()
> +{
> + struct pmu_data pmcr;
> + const int samples = 10;
> +
> + asm volatile("mrs %0, pmcr_el0" : "=r" (pmcr));
> +
> + printf("PMU implementor: %c\n", pmcr.implementor);
> + printf("Identification code: 0x%x\n", pmcr.identification_code);
> + printf("Event counters:  %d\n", pmcr.num_counters);
> +
> + pmcr.cycle_counter_reset = 1;
> + pmcr.enable = 1;
> +
> + printf("\ninstructions : cycles0 cycles1 ...\n");
> +
> + for (int i = 3; i < 300; i += 32) {
> + int avg, sum = 0;
> + printf("%d :", i);
> + for (int j = 0; j < samples; j++) {
> + int val;
> + measure_instrs(i, pmcr);
> + asm volatile("mrs %0, pmccntr_el0" : "=r" (val));
> + sum += val;
> + printf(" %d", val);
> + }
> + avg = sum / samples;
> + printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n", sum, avg, i / 
> avg, avg / i);
> + }

I understand that, as stated in commit message, it currently doesn't
check the correctness of PMU counter values. But it would be better if
testing code can be abstracted into an independent function (e.g.
instr_cycle_check) for report("Instruction Cycles",
instr_cycle_check()). You can return TRUE in the checking code for now.


> +
> + return report_summary();
> +}
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 

Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-05 Thread Bandan Das
Dirk Müller  writes:

>> So the right fix would be to just set the guests next_rip to 0 when we
>> emulate vmrun, just like real hardware does, no?
>
> Like this? (Note: I’m not sure what I’m doing here..). I Agree with you that 
> the warning
> for this seems excessive, I’ve just removed it. 
>
>
> From 47db81837b6fe58aa302317bbf2a092b40af0a36 Mon Sep 17 00:00:00 2001
> From: Dirk Mueller 
> Date: Fri, 2 Oct 2015 08:35:24 +0200
> Subject: [PATCH] KVM: nSVM: Check for NRIP support before passing on
>  control.next_rip
>
> NRIP support itself depends on cpuid Fn8000_000A_EDX[NRIPS], so
> ignore it if the cpu feature is not available. Remove the
> guest-triggerable WARN_ON for this as it just emulates real
> hardware behavior.
>
> Signed-off-by: Dirk Mueller 
> ---
>  arch/x86/kvm/svm.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 0a42859..66e3a4c 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -514,8 +514,10 @@ static void skip_emulated_instruction(struct kvm_vcpu 
> *vcpu)
>   struct vcpu_svm *svm = to_svm(vcpu);
>  
>   if (svm->vmcb->control.next_rip != 0) {
> - WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
> - svm->next_rip = svm->vmcb->control.next_rip;
> + if (static_cpu_has(X86_FEATURE_NRIPS))
> + svm->next_rip = svm->vmcb->control.next_rip;
> + else
> + svm->next_rip = 0;

skip_emulated_instruction probably isn't the right place.

>   }

So, L1 writes this field in svm_check_intercept.

I went back to the spec and it says (15.7.1):
The next sequential instruction pointer (nRIP) is saved in the guest VMCB
control area at location C8h on all #VMEXITs that are due to instruction
intercepts, as defined in Section 15.9 on page 458, as well as MSR and
IOIO intercepts and exceptions caused by the INT3, INTO, and BOUND
instructions. For all other intercepts, nRIP is reset to zero.

So, I think a better way would be to reset the field on vmexit from
L2 to L0 if NRIP isn't supported. Maybe it's enough to do this only for
the intercepts mentioned above but I am not sure.

Also, please include a debug message (pr_debug_once will do). It seems
unnecessary today, but in 2 years when someone is scratching his head why his
nested guests won't run, he will thank you!

BTW, it seems possible to unconditionally advertise SVM_FEATURE_NRIP..

>   if (!svm->next_rip) {
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/12] KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch

2015-10-05 Thread Radim Krčmář
2015-09-28 13:38+0800, Haozhong Zhang:
> This patch moves the field of TSC scaling ratio from the architecture
> struct vcpu_svm to the common struct kvm_vcpu_arch.
> 
> Signed-off-by: Haozhong Zhang 
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -7080,6 +7080,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
>  
>   vcpu = kvm_x86_ops->vcpu_create(kvm, id);
>  
> + if (!IS_ERR(vcpu))
> + vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio;

This shouldn't be necessary, (and we can definitely do it without error
checking later)

 kvm_arch_vcpu_create
   (vmx|svm)_create_vcpu
 kvm_vcpu_init
   kvm_arch_vcpu_init
 kvm_set_tsc_khz

sets vcpu->arch.tsc_scaling_ratio to something reasonable and SVM didn't
overwrite that value.  (kvm_set_tsc_khz() only doesn't set the ration if
this_tsc_khz == 0, which we could extend to be extra safe.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova region

2015-10-05 Thread Alex Williamson
On Mon, 2015-10-05 at 04:55 +, Bhushan Bharat wrote:
> Hi Alex,
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Saturday, October 03, 2015 4:16 AM
> > To: Bhushan Bharat-R65777 
> > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> > marc.zyng...@arm.com; will.dea...@arm.com
> > Subject: Re: [RFC PATCH 1/6] vfio: Add interface for add/del reserved iova
> > region
> > 
> > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > This Patch adds the VFIO APIs to add and remove reserved iova regions.
> > > The reserved iova region can be used for mapping some specific
> > > physical address in iommu.
> > >
> > > Currently we are planning to use this interface for adding iova
> > > regions for creating iommu of msi-pages. But the API are designed for
> > > future extension where some other physical address can be mapped.
> > >
> > > Signed-off-by: Bharat Bhushan 
> > > ---
> > >  drivers/vfio/vfio_iommu_type1.c | 201
> > +++-
> > >  include/uapi/linux/vfio.h   |  43 +
> > >  2 files changed, 243 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > b/drivers/vfio/vfio_iommu_type1.c index 57d8c37..fa5d3e4 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -59,6 +59,7 @@ struct vfio_iommu {
> > >   struct rb_root  dma_list;
> > >   boolv2;
> > >   boolnesting;
> > > + struct list_headreserved_iova_list;
> > 
> > This alignment leads to poor packing in the structure, put it above the 
> > bools.
> 
> ok
> 
> > 
> > >  };
> > >
> > >  struct vfio_domain {
> > > @@ -77,6 +78,15 @@ struct vfio_dma {
> > >   int prot;   /* IOMMU_READ/WRITE */
> > >  };
> > >
> > > +struct vfio_resvd_region {
> > > + dma_addr_t  iova;
> > > + size_t  size;
> > > + int prot;   /* IOMMU_READ/WRITE */
> > > + int refcount;   /* ref count of mappings */
> > > + uint64_tmap_paddr;  /* Mapped Physical Address
> > */
> > 
> > phys_addr_t
> 
> Ok,
> 
> > 
> > > + struct list_head next;
> > > +};
> > > +
> > >  struct vfio_group {
> > >   struct iommu_group  *iommu_group;
> > >   struct list_headnext;
> > > @@ -106,6 +116,38 @@ static struct vfio_dma *vfio_find_dma(struct
> > vfio_iommu *iommu,
> > >   return NULL;
> > >  }
> > >
> > > +/* This function must be called with iommu->lock held */ static bool
> > > +vfio_overlap_with_resvd_region(struct vfio_iommu *iommu,
> > > +dma_addr_t start, size_t size) {
> > > + struct vfio_resvd_region *region;
> > > +
> > > + list_for_each_entry(region, >reserved_iova_list, next) {
> > > + if (region->iova < start)
> > > + return (start - region->iova < region->size);
> > > + else if (start < region->iova)
> > > + return (region->iova - start < size);
> > 
> > <= on both of the return lines?
> 
> I think is should be "<" and not "=<", no ?

Yep, looks like you're right.  Maybe there's a more straightforward way
to do this.

> > 
> > > +
> > > + return (region->size > 0 && size > 0);
> > > + }
> > > +
> > > + return false;
> > > +}
> > > +
> > > +/* This function must be called with iommu->lock held */ static
> > > +struct vfio_resvd_region *vfio_find_resvd_region(struct vfio_iommu
> > *iommu,
> > > +  dma_addr_t start, size_t
> > size) {
> > > + struct vfio_resvd_region *region;
> > > +
> > > + list_for_each_entry(region, >reserved_iova_list, next)
> > > + if (region->iova == start && region->size == size)
> > > + return region;
> > > +
> > > + return NULL;
> > > +}
> > > +
> > >  static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma
> > > *new)  {
> > >   struct rb_node **link = >dma_list.rb_node, *parent =
> > NULL; @@
> > > -580,7 +622,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> > >
> > >   mutex_lock(>lock);
> > >
> > > - if (vfio_find_dma(iommu, iova, size)) {
> > > + if (vfio_find_dma(iommu, iova, size) ||
> > > + vfio_overlap_with_resvd_region(iommu, iova, size)) {
> > >   mutex_unlock(>lock);
> > >   return -EEXIST;
> > >   }
> > > @@ -626,6 +669,127 @@ static int vfio_dma_do_map(struct vfio_iommu
> > *iommu,
> > >   return ret;
> > >  }
> > >
> > > +/* This function must be called with iommu->lock held */ static int
> > > +vfio_iommu_resvd_region_del(struct vfio_iommu *iommu,
> > > + dma_addr_t iova, size_t size, int prot) {
> > > + struct vfio_resvd_region *res_region;
> > 
> > Have some consistency in naming, just use "region".
> 
> Ok,
> 
> > > 

Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi interrupt

2015-10-05 Thread Alex Williamson
On Mon, 2015-10-05 at 07:20 +, Bhushan Bharat wrote:
> 
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Saturday, October 03, 2015 4:17 AM
> > To: Bhushan Bharat-R65777 
> > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> > marc.zyng...@arm.com; will.dea...@arm.com
> > Subject: Re: [RFC PATCH 5/6] vfio-pci: Create iommu mapping for msi
> > interrupt
> > 
> > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > An MSI-address is allocated and programmed in pcie device during
> > > interrupt configuration. Now for a pass-through device, try to create
> > > the iommu mapping for this allocted/programmed msi-address.  If the
> > > iommu mapping is created and the msi address programmed in the pcie
> > > device is different from msi-iova as per iommu programming then
> > > reconfigure the pci device to use msi-iova as msi address.
> > >
> > > Signed-off-by: Bharat Bhushan 
> > > ---
> > >  drivers/vfio/pci/vfio_pci_intrs.c | 36
> > > ++--
> > >  1 file changed, 34 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c
> > > b/drivers/vfio/pci/vfio_pci_intrs.c
> > > index 1f577b4..c9690af 100644
> > > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > > @@ -312,13 +312,23 @@ static int vfio_msi_set_vector_signal(struct
> > vfio_pci_device *vdev,
> > >   int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> > >   char *name = msix ? "vfio-msix" : "vfio-msi";
> > >   struct eventfd_ctx *trigger;
> > > + struct msi_msg msg;
> > > + struct vfio_device *device;
> > > + uint64_t msi_addr, msi_iova;
> > >   int ret;
> > >
> > >   if (vector >= vdev->num_ctx)
> > >   return -EINVAL;
> > >
> > > + device = vfio_device_get_from_dev(>dev);
> > 
> > Have you looked at this function?  I don't think we want to be doing that
> > every time we want to poke the interrupt configuration.
> 
> I am trying to describe what I understood, a device can have many interrupts 
> and we should setup iommu only once, when called for the first time to 
> enable/setup interrupt.
> Similarly when disabling the interrupt we should iommu-unmap when called for 
> the last enabled interrupt for that device. Now with this understanding, 
> should I move this map-unmap to separate functions and call them from 
> vfio_msi_set_block() rather than in vfio_msi_set_vector_signal()

Interrupts can be setup and torn down at any time and I don't see how
one function or the other makes much difference.
vfio_device_get_from_dev() is enough overhead that the data we need
should be cached if we're going to call it with some regularity.  Maybe
vfio_iommu_driver_ops.open() should be called with a pointer to the
vfio_device... or the vfio_group.

> >  Also note that
> > IOMMU mappings don't operate on devices, but groups, so maybe we want
> > to pass the group.
> 
> Yes, it operates on group. I hesitated to add an API to get group. Do you 
> suggest to that it is ok to add API to get group from device.

No, the above suggestion is probably better.

> > 
> > > + if (device == NULL)
> > > + return -EINVAL;
> > 
> > This would be a legitimate BUG_ON(!device)
> > 
> > > +
> > >   if (vdev->ctx[vector].trigger) {
> > >   free_irq(irq, vdev->ctx[vector].trigger);
> > > + get_cached_msi_msg(irq, );
> > > + msi_iova = ((u64)msg.address_hi << 32) | msg.address_lo;
> > > + vfio_device_unmap_msi(device, msi_iova, PAGE_SIZE);
> > >   kfree(vdev->ctx[vector].name);
> > >   eventfd_ctx_put(vdev->ctx[vector].trigger);
> > >   vdev->ctx[vector].trigger = NULL;
> > > @@ -346,12 +356,11 @@ static int vfio_msi_set_vector_signal(struct
> > vfio_pci_device *vdev,
> > >* cached value of the message prior to enabling.
> > >*/
> > >   if (msix) {
> > > - struct msi_msg msg;
> > > -
> > >   get_cached_msi_msg(irq, );
> > >   pci_write_msi_msg(irq, );
> > >   }
> > >
> > > +
> > 
> > gratuitous newline
> > 
> > >   ret = request_irq(irq, vfio_msihandler, 0,
> > > vdev->ctx[vector].name, trigger);
> > >   if (ret) {
> > > @@ -360,6 +369,29 @@ static int vfio_msi_set_vector_signal(struct
> > vfio_pci_device *vdev,
> > >   return ret;
> > >   }
> > >
> > > + /* Re-program the new-iova in pci-device in case there is
> > > +  * different iommu-mapping created for programmed msi-address.
> > > +  */
> > > + get_cached_msi_msg(irq, );
> > > + msi_iova = 0;
> > > + msi_addr = (u64)(msg.address_hi) << 32 | (u64)(msg.address_lo);
> > > + ret = vfio_device_map_msi(device, msi_addr, PAGE_SIZE,
> > _iova);
> > > + if (ret) {
> > > + free_irq(irq, vdev->ctx[vector].trigger);
> > > + kfree(vdev->ctx[vector].name);
> > > + 

Re: [RFC PATCH 3/6] vfio: Extend iommu-info to return MSIs automap state

2015-10-05 Thread Alex Williamson
On Mon, 2015-10-05 at 06:00 +, Bhushan Bharat wrote:
> > -1138,6 +1156,8 @@
> > > static long vfio_iommu_type1_ioctl(void *iommu_data,
> > >   }
> > >   } else if (cmd == VFIO_IOMMU_GET_INFO) {
> > >   struct vfio_iommu_type1_info info;
> > > + struct iommu_domain_msi_maps msi_maps;
> > > + int ret;
> > >
> > >   minsz = offsetofend(struct vfio_iommu_type1_info,
> > iova_pgsizes);
> > >
> > > @@ -1149,6 +1169,18 @@ static long vfio_iommu_type1_ioctl(void
> > > *iommu_data,
> > >
> > >   info.flags = 0;
> > >
> > > + ret = vfio_domains_get_msi_maps(iommu, _maps);
> > > + if (ret)
> > > + return ret;
> > 
> > And now ioctl(VFIO_IOMMU_GET_INFO) no longer works for any IOMMU
> > implementing domain_get_attr but not supporting
> > DOMAIN_ATTR_MSI_MAPPING.
> 
> With this current patch version this will get the default assumed behavior as 
> you commented on previous patch. 

How so?

+   msi_maps->automap = true;
+   msi_maps->override_automap = false;
+
+   if (domain->ops->domain_get_attr)
+   ret = domain->ops->domain_get_attr(domain, attr, data);

If domain_get_attr is implemented, but DOMAIN_ATTR_MSI_MAPPING is not,
ret should be an error code.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI pages

2015-10-05 Thread Alex Williamson
On Mon, 2015-10-05 at 06:27 +, Bhushan Bharat wrote:
> 
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Saturday, October 03, 2015 4:16 AM
> > To: Bhushan Bharat-R65777 
> > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> > marc.zyng...@arm.com; will.dea...@arm.com
> > Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI
> > pages
> > 
> > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > For MSI interrupts to work for a pass-through devices we need to have
> > > mapping of msi-pages in iommu. Now on some platforms (like x86) does
> > > this msi-pages mapping happens magically and in these case they
> > > chooses an iova which they somehow know that it will never overlap
> > > with guest memory. But this magic iova selection may not be always
> > > true for all platform (like PowerPC and ARM64).
> > >
> > > Also on x86 platform, there is no problem as long as running a
> > > x86-guest on x86-host but there can be issues when running a non-x86
> > > guest on
> > > x86 host or other userspace applications like (I think ODP/DPDK).
> > > As in these cases there can be chances that it overlaps with guest
> > > memory mapping.
> > 
> > Wow, it's amazing anything works... smoke and mirrors.
> > 
> > > This patch add interface to iommu-map and iommu-unmap msi-pages at
> > > reserved iova chosen by userspace.
> > >
> > > Signed-off-by: Bharat Bhushan 
> > > ---
> > >  drivers/vfio/vfio.c |  52 +++
> > >  drivers/vfio/vfio_iommu_type1.c | 111
> > 
> > >  include/linux/vfio.h|   9 +++-
> > >  3 files changed, 171 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > > 2fb29df..a817d2d 100644
> > > --- a/drivers/vfio/vfio.c
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct
> > notifier_block *nb,
> > >   return NOTIFY_OK;
> > >  }
> > >
> > > +int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr,
> > > + uint32_t size, uint64_t *msi_iova) {
> > > + struct vfio_container *container = device->group->container;
> > > + struct vfio_iommu_driver *driver;
> > > + int ret;
> > > +
> > > + /* Validate address and size */
> > > + if (!msi_addr || !size || !msi_iova)
> > > + return -EINVAL;
> > > +
> > > + down_read(>group_lock);
> > > +
> > > + driver = container->iommu_driver;
> > > + if (!driver || !driver->ops || !driver->ops->msi_map) {
> > > + up_read(>group_lock);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + ret = driver->ops->msi_map(container->iommu_data,
> > > +msi_addr, size, msi_iova);
> > > +
> > > + up_read(>group_lock);
> > > + return ret;
> > > +}
> > > +
> > > +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t
> > msi_iova,
> > > +   uint64_t size)
> > > +{
> > > + struct vfio_container *container = device->group->container;
> > > + struct vfio_iommu_driver *driver;
> > > + int ret;
> > > +
> > > + /* Validate address and size */
> > > + if (!msi_iova || !size)
> > > + return -EINVAL;
> > > +
> > > + down_read(>group_lock);
> > > +
> > > + driver = container->iommu_driver;
> > > + if (!driver || !driver->ops || !driver->ops->msi_unmap) {
> > > + up_read(>group_lock);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + ret = driver->ops->msi_unmap(container->iommu_data,
> > > +  msi_iova, size);
> > > +
> > > + up_read(>group_lock);
> > > + return ret;
> > > +}
> > > +
> > >  /**
> > >   * VFIO driver API
> > >   */
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > b/drivers/vfio/vfio_iommu_type1.c index 3315fb6..ab376c2 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -1003,12 +1003,34 @@ out_free:
> > >   return ret;
> > >  }
> > >
> > > +static void vfio_iommu_unmap_all_reserved_regions(struct vfio_iommu
> > > +*iommu) {
> > > + struct vfio_resvd_region *region;
> > > + struct vfio_domain *d;
> > > +
> > > + list_for_each_entry(region, >reserved_iova_list, next) {
> > > + list_for_each_entry(d, >domain_list, next) {
> > > + if (!region->map_paddr)
> > > + continue;
> > > +
> > > + if (!iommu_iova_to_phys(d->domain, region->iova))
> > > + continue;
> > > +
> > > + iommu_unmap(d->domain, region->iova,
> > PAGE_SIZE);
> > 
> > PAGE_SIZE?  Why not region->size?
> 
> Yes, this should be region->size.
> 
> > 
> > > + region->map_paddr = 0;
> > > + cond_resched();
> > > + }
> > > + }
> > > +}
> > > +
> > >  static void 

Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for MSI

2015-10-05 Thread Alex Williamson
On Mon, 2015-10-05 at 08:33 +, Bhushan Bharat wrote:
> 
> 
> > -Original Message-
> > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > Sent: Saturday, October 03, 2015 4:17 AM
> > To: Bhushan Bharat-R65777 
> > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org;
> > christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org;
> > marc.zyng...@arm.com; will.dea...@arm.com
> > Subject: Re: [RFC PATCH 6/6] arm-smmu: Allow to set iommu mapping for
> > MSI
> > 
> > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote:
> > > Finally ARM SMMU declare that iommu-mapping for MSI-pages are not set
> > > automatically and it should be set explicitly.
> > >
> > > Signed-off-by: Bharat Bhushan 
> > > ---
> > >  drivers/iommu/arm-smmu.c | 7 ++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index
> > > a3956fb..9d37e72 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -1401,13 +1401,18 @@ static int arm_smmu_domain_get_attr(struct
> > iommu_domain *domain,
> > >   enum iommu_attr attr, void *data)  {
> > >   struct arm_smmu_domain *smmu_domain =
> > to_smmu_domain(domain);
> > > + struct iommu_domain_msi_maps *msi_maps;
> > >
> > >   switch (attr) {
> > >   case DOMAIN_ATTR_NESTING:
> > >   *(int *)data = (smmu_domain->stage ==
> > ARM_SMMU_DOMAIN_NESTED);
> > >   return 0;
> > >   case DOMAIN_ATTR_MSI_MAPPING:
> > > - /* Dummy handling added */
> > > + msi_maps = data;
> > > +
> > > + msi_maps->automap = false;
> > > + msi_maps->override_automap = true;
> > > +
> > >   return 0;
> > >   default:
> > >   return -ENODEV;
> > 
> > In previous discussions I understood one of the problems you were trying to
> > solve was having a limited number of MSI banks and while you may be able
> > to get isolated MSI banks for some number of users, it wasn't unlimited and
> > sharing may be required.  I don't see any of that addressed in this series.
> 
> That problem was on PowerPC. Infact there were two problems, one which MSI 
> bank to be used and second how to create iommu-mapping for device assigned to 
> userspace.
> First problem was PowerPC specific and that will be solved separately.
> For second problem, earlier I tried to added a couple of MSI specific ioctls 
> and you suggested (IIUC) that we should have a generic reserved-iova type of 
> API and then we can map MSI bank using reserved-iova and this will not 
> require involvement of user-space.
> 
> > 
> > Also, the management of reserved IOVAs vs MSI addresses looks really
> > dubious to me.  How does your platform pick an MSI address and what are
> > we breaking by covertly changing it?  We seem to be masking over at the
> > VFIO level, where there should be lower level interfaces doing the right 
> > thing
> > when we configure MSI on the device.
> 
> Yes, In my understanding the right solution should be:
>  1) VFIO driver should know what physical-msi-address will be used for 
> devices in an iommu-group.
> I did not find an generic API, on PowerPC I added some function in 
> ffrescale msi-driver and called from vfio-iommu-fsl-pamu.c (not yet 
> upstreamed).
>  2) VFIO driver should know what IOVA to be used for creating iommu-mapping 
> (VFIO APIs patch of this patch series)
>  3) VFIO driver will create the iommu-mapping using (1) and (2)
>  4) VFIO driver should be able to tell the msi-driver that for a given device 
> it should use different IOVA. So when composing the msi message (for the 
> devices is the given iommu-group) it should use that programmed iova as 
> MSI-address. This interface also needed to be developed.
> 
> I was not sure of which approach we should take. The current approach in the 
> patch is simple to develop so I went ahead to take input but I agree this 
> does not look very good.
> What do you think, should drop this approach and work out the approach as 
> described above.

I'm certainly not interested in applying an maintaining an interim
solution that isn't the right one.  It seems like VFIO is too involved
in this process in your example.  On x86 we have per vector isolation
and the only thing we're missing is reporting back of the region used by
MSI vectors as reserved IOVA space (but it's standard on x86, so an x86
VM user will never use it for IOVA).  In your model, the MSI IOVA space
is programmable, but it has page granularity (I assume).  Therefore we
shouldn't be sharing that page with anyone.  That seems to suggest we
need to allocate a page of vector space from the host kernel, setup the
IOVA mapping, and then the host kernel should know to only allocate MSI
vectors for these devices from that pre-allocated page.  Otherwise we
need to call the interrupts unsafe, like we do on x86 without 

[PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-05 Thread Mario Smarduch
This patch series is a followup to the armv7 fp/simd lazy switch
implementation, uses similar approach and depends on the series -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016567.html
Patches are based on 4.3-rc2 commit 1f93e4a96c91093

Patches are based on earlier arm64 fp/simd optimization work -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
32-bit guest on 64 bit host - 
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

The patch series have been tested on Foundation Model arm64/arm64 and
arm32/arm64. The test program used can be found here 

https://github.com/mjsmar/arm-arm64-fpsimd-test

Launched upto 16 instances on 4-way Guest and another 16 on the host (both 
cases 1mS sleep), ran overnight. 

Changes v1->v2:
- Tested arm32/arm64
- rebased to 4.3-rc2
- changed a couple register accesses from 64 to 32 bit 

Mario Smarduch (2):
  add hooks for armv8 fp/simd lazy switch
  enable armv8 fp/simd lazy switch

 arch/arm/kvm/arm.c|  2 --
 arch/arm64/include/asm/kvm_asm.h  |  1 +
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S  | 59 ++-
 5 files changed, 45 insertions(+), 21 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] add hooks for armv8 fp/simd lazy switch

2015-10-05 Thread Mario Smarduch
This patch adds hooks to support fp/simd lazy switch. A vcpu flag to track
fp/simd state, and flag offset in vcpu structure.

Signed-off-by: Mario Smarduch 
---
 arch/arm64/include/asm/kvm_host.h | 3 +++
 arch/arm64/kernel/asm-offsets.c   | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 4562459..03f25d0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -157,6 +157,9 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u64 irq_lines;  /* IRQ and FIQ levels */
 
+   /* Track fp/simd lazy switch */
+   u32 vfp_lazy;
+
/* Cache some mmu pages needed inside spinlock regions */
struct kvm_mmu_memory_cache mmu_page_cache;
 
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 8d89cf8..8311da4 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -124,6 +124,7 @@ int main(void)
   DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2));
   DEFINE(VCPU_MDCR_EL2,offsetof(struct kvm_vcpu, arch.mdcr_el2));
   DEFINE(VCPU_IRQ_LINES,   offsetof(struct kvm_vcpu, arch.irq_lines));
+  DEFINE(VCPU_VFP_LAZY, offsetof(struct kvm_vcpu, arch.vfp_lazy));
   DEFINE(VCPU_HOST_CONTEXT,offsetof(struct kvm_vcpu, 
arch.host_cpu_context));
   DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, 
arch.host_debug_state));
   DEFINE(VCPU_TIMER_CNTV_CTL,  offsetof(struct kvm_vcpu, 
arch.timer_cpu.cntv_ctl));
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] enable armv8 fp/simd lazy switch

2015-10-05 Thread Mario Smarduch
This patch enables arm64 lazy fp/simd switch. Removes the ARM constraint,
and follows the same approach as armv7 version - found here.

https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016567.html

To summarize - provided the guest accesses fp/simd unit we limit number
of fp/simd context switches to two per vCPU execution schedule.

Signed-off-by: Mario Smarduch 
---
 arch/arm/kvm/arm.c   |  2 --
 arch/arm64/include/asm/kvm_asm.h |  1 +
 arch/arm64/kvm/hyp.S | 59 +++-
 3 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 1b1f9e9..fe609f1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -112,12 +112,10 @@ void kvm_arch_check_processor_compat(void *rtn)
  */
 static void kvm_switch_fp_regs(struct kvm_vcpu *vcpu)
 {
-#ifdef CONFIG_ARM
if (vcpu->arch.vfp_lazy == 1) {
kvm_call_hyp(__kvm_restore_host_vfp_state, vcpu);
vcpu->arch.vfp_lazy = 0;
}
-#endif
 }
 
 /**
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5e37710..83dcac5 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -117,6 +117,7 @@ extern char __kvm_hyp_vector[];
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index e583613..ea99f66 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -385,14 +385,6 @@
tbz \tmp, #KVM_ARM64_DEBUG_DIRTY_SHIFT, \target
 .endm
 
-/*
- * Branch to target if CPTR_EL2.TFP bit is set (VFP/SIMD trapping enabled)
- */
-.macro skip_fpsimd_state tmp, target
-   mrs \tmp, cptr_el2
-   tbnz\tmp, #CPTR_EL2_TFP_SHIFT, \target
-.endm
-
 .macro compute_debug_state target
// Compute debug state: If any of KDE, MDE or KVM_ARM64_DEBUG_DIRTY
// is set, we do a full save/restore cycle and disable trapping.
@@ -433,10 +425,6 @@
mrs x5, ifsr32_el2
stp x4, x5, [x3]
 
-   skip_fpsimd_state x8, 2f
-   mrs x6, fpexc32_el2
-   str x6, [x3, #16]
-2:
skip_debug_state x8, 1f
mrs x7, dbgvcr32_el2
str x7, [x3, #24]
@@ -481,8 +469,15 @@
isb
 99:
msr hcr_el2, x2
-   mov x2, #CPTR_EL2_TTA
+
+   mov x2, #0
+   ldr w3, [x0, #VCPU_VFP_LAZY]
+   tbnzw3, #0, 98f
+
orr x2, x2, #CPTR_EL2_TFP
+98:
+   orr x2, x2, #CPTR_EL2_TTA
+
msr cptr_el2, x2
 
mov x2, #(1 << 15)  // Trap CP15 Cr=15
@@ -669,14 +664,12 @@ __restore_debug:
ret
 
 __save_fpsimd:
-   skip_fpsimd_state x3, 1f
save_fpsimd
-1: ret
+   ret
 
 __restore_fpsimd:
-   skip_fpsimd_state x3, 1f
restore_fpsimd
-1: ret
+   ret
 
 switch_to_guest_fpsimd:
pushx4, lr
@@ -688,6 +681,9 @@ switch_to_guest_fpsimd:
 
mrs x0, tpidr_el2
 
+   mov w2, #1
+   str w2, [x0, #VCPU_VFP_LAZY]
+
ldr x2, [x0, #VCPU_HOST_CONTEXT]
kern_hyp_va x2
bl __save_fpsimd
@@ -763,7 +759,6 @@ __kvm_vcpu_return:
add x2, x0, #VCPU_CONTEXT
 
save_guest_regs
-   bl __save_fpsimd
bl __save_sysregs
 
skip_debug_state x3, 1f
@@ -784,7 +779,6 @@ __kvm_vcpu_return:
kern_hyp_va x2
 
bl __restore_sysregs
-   bl __restore_fpsimd
/* Clear FPSIMD and Trace trapping */
msr cptr_el2, xzr
 
@@ -863,6 +857,33 @@ ENTRY(__kvm_flush_vm_context)
ret
 ENDPROC(__kvm_flush_vm_context)
 
+/**
+ * kvm_switch_fp_regs() - switch guest/host VFP/SIMD registers
+ * @vcpu:  pointer to vcpu structure.
+ *
+ */
+ENTRY(__kvm_restore_host_vfp_state)
+   pushx4, lr
+
+   kern_hyp_va x0
+   add x2, x0, #VCPU_CONTEXT
+
+   // Load Guest HCR, determine if guest is 32 or 64 bit
+   ldr x3, [x0, #VCPU_HCR_EL2]
+   tbnzx3, #HCR_RW_SHIFT, 1f
+   mrs x4, fpexc32_el2
+   str x4, [x2, #CPU_SYSREG_OFFSET(FPEXC32_EL2)]
+1:
+   bl __save_fpsimd
+
+   ldr x2, [x0, #VCPU_HOST_CONTEXT]
+   kern_hyp_va x2
+   bl __restore_fpsimd
+
+   pop x4, lr
+   ret
+ENDPROC(__kvm_restore_host_vfp_state)
+
 __kvm_hyp_panic:
// Guess the context by looking at VTTBR:
// If zero, then we're already a host.
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/12] KVM: x86: Replace call-back set_tsc_khz() with a common function

2015-10-05 Thread Haozhong Zhang
On Mon, Oct 05, 2015 at 09:53:26PM +0200, Radim Krčmář wrote:
> 2015-09-28 13:38+0800, Haozhong Zhang:
> > Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
> > patch removes the call-back set_tsc_khz() and replaces it with a common
> > function.
> > 
> > Signed-off-by: Haozhong Zhang 
> > ---
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > +static void set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool 
> > scale)
> > +{
> > +   u64 ratio, khz;
> | [...]
> > +   khz = user_tsc_khz;
> 
> I'd use "user_tsc_khz" directly.
>

I'll do so.

> > +   /* TSC scaling required  - calculate ratio */
> > +   shift = (kvm_tsc_scaling_ratio_frac_bits <= 32) ?
> > +   kvm_tsc_scaling_ratio_frac_bits : 32;
> > +   ratio = khz << shift;
> > +   do_div(ratio, tsc_khz);
> > +   ratio <<= (kvm_tsc_scaling_ratio_frac_bits - shift);
> 
> VMX is losing 16 bits by this operation;  normal fixed point division
> could get us a smaller drift (and an one-liner here) ...
> at 4.3 GHz, 32 instead of 48 bits after decimal point translate to one
> "lost" TSC tick per second, in the worst case.
>
> Please mention that we are truncating on purpose :)

It's intentional to avoid the potential overflow in
  khz << kvm_tsc_scaling_ratio_frac_bits.

For VMX where kvm_tsc_scaling_ratio_frac_bits == 48, the above
expression is only safe to left shift a pretty small khz (< 2^16 KHz
or 65.5 MHz). Thus, I decided to sacrifice the precision for safety.
I chose to truncate at the boundary of 32 bits which can handle
khz as large as 4294 GHz.

Though this truncation results in losing TSC ticks when khz is larger
than 4.3 GHz, the lost is however pretty small compared with the large
khz.

Alternatively, it's also possible to follow David's comment to use
divq on x86_64 to keep both precision and safety. On i386, it just
falls back to above truncating approach.

- Haozhong
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/15] arm64: Add page size to the kernel image header

2015-10-05 Thread Ard Biesheuvel
On 5 October 2015 at 14:02, Suzuki K. Poulose  wrote:
> On 02/10/15 16:49, Catalin Marinas wrote:
>>
>> On Tue, Sep 15, 2015 at 04:41:18PM +0100, Suzuki K. Poulose wrote:
>>>
>>> From: Ard Biesheuvel 
>>>
>>> This patch adds the page size to the arm64 kernel image header
>>> so that one can infer the PAGESIZE used by the kernel. This will
>>> be helpful to diagnose failures to boot the kernel with page size
>>> not supported by the CPU.
>>>
>>> Signed-off-by: Ard Biesheuvel 
>>
>>
>> This patch needs you signed-off-by as well since you are posting it. And
>> IIRC I acked it as well, I'll check.
>>
>
> Yes, you did mention that you were OK with the patch. But I thought there
> was
> no  'Acked-by' tag added. Hence didn't pick that up.
>

In my version of this patch (which I sent separately before noticing
that Suzuki had already folded it into his series), I took the comment
from Catalin to the email in which I suggested this change as an
implicit Acked-by.

-- 
Ard.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/15] arm64: Add page size to the kernel image header

2015-10-05 Thread Suzuki K. Poulose

On 02/10/15 16:49, Catalin Marinas wrote:

On Tue, Sep 15, 2015 at 04:41:18PM +0100, Suzuki K. Poulose wrote:

From: Ard Biesheuvel 

This patch adds the page size to the arm64 kernel image header
so that one can infer the PAGESIZE used by the kernel. This will
be helpful to diagnose failures to boot the kernel with page size
not supported by the CPU.

Signed-off-by: Ard Biesheuvel 


This patch needs you signed-off-by as well since you are posting it. And
IIRC I acked it as well, I'll check.



Yes, you did mention that you were OK with the patch. But I thought there was
no  'Acked-by' tag added. Hence didn't pick that up.



If you are fine with adding your signed-of-by, I can add it to the patch
when applying (unless I see other issues with the series).



Yes, please go ahead, if I don't have to send another version depending on
the review of KVM bits. If I not, I will add the S-o-b and your Acked-by.


Thanks
Suzuki

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-05 Thread Mario Smarduch
Hi Christoffer,
   I just managed to boot qemu arm32 up on arm64 (last Fri - thanks for the tip
- there were few other issue to clean up), so let me retest it again. Also I
noticed some refactoring would help both 32 and 64 bit patches.

Yes I could provide a the user space tests as well.

Thanks-
- Mario

On 10/5/2015 8:45 AM, Christoffer Dall wrote:
> On Tue, Sep 22, 2015 at 04:34:01PM -0700, Mario Smarduch wrote:
>> This is a 2nd itteration for arm64, v1 patches were posted by mistake from 
>> an 
>> older branch which included several bugs. Hopefully didn't waste too much of 
>> anyones time.
>>
>> This patch series is a followup to the armv7 fp/simd lazy switch
>> implementation, uses similar approach and depends on the series - see
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016516.html
>>
>> It's based on earlier arm64 fp/simd optimization work - see
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html
>>
>> And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
>> 32-bit guest on 64 bit host (and may require more here) - see
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html
>>
>> This series has be tested with arm64 on arm64 with several FP applications 
>> running on host and guest, with substantial decrease on number of 
>> fp/simd context switches. From about 30% down to 2% with one guest running.
>>
>> At this time I don't have arm32/arm64 working and hoping Christoffer and/or
>> Marc (or anyone) can test 32-bit guest/64-bit host.
>>
> Did you already have some test infrastructure/applications that I can
> reuse for this purpose or do I have to write userspace software?
> 
> -Christoffer
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-05 Thread Joerg Roedel
On Mon, Oct 05, 2015 at 12:54:43PM -0400, Bandan Das wrote:
> Joerg Roedel  writes:
> The problems is that the next_rip field could be stale. If the processor 
> supports
> next_rip, then it will clear it out on the next entry. If it doesn't,
> an old value just sits there (no matter who wrote it) and the problem
> happens when skip_emulated_instruction advances the rip with an incorrect
> value.

So the right fix would be to just set the guests next_rip to 0 when we
emulate vmrun, just like real hardware does, no?


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-05 Thread Christoffer Dall
On Mon, Oct 05, 2015 at 09:14:57AM -0700, Mario Smarduch wrote:
> Hi Christoffer,
>I just managed to boot qemu arm32 up on arm64 (last Fri - thanks for the 
> tip
> - there were few other issue to clean up), so let me retest it again. Also I
> noticed some refactoring would help both 32 and 64 bit patches.
> 
> Yes I could provide a the user space tests as well.
> 
I'd like those regardless as I generally test my queue before pushing it
to next.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS

2015-10-05 Thread Bandan Das
Joerg Roedel  writes:

> On Thu, Oct 01, 2015 at 06:31:27PM -0400, Bandan Das wrote:
>> Paolo Bonzini  writes:
>> 
>> > On 01/10/2015 13:43, Dirk Müller wrote:
>> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> >> index 94b7d15..0a42859 100644
>> >> --- a/arch/x86/kvm/svm.c
>> >> +++ b/arch/x86/kvm/svm.c
>> >> @@ -514,7 +514,7 @@ static void skip_emulated_instruction(struct kvm_vcpu 
>> >> *vcpu)
>> >>   struct vcpu_svm *svm = to_svm(vcpu);
>> >>  
>> >>   if (svm->vmcb->control.next_rip != 0) {
>> >> - WARN_ON(!static_cpu_has(X86_FEATURE_NRIPS));
>> >> + WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
>> >>   svm->next_rip = svm->vmcb->control.next_rip;
>> >>   }
>> >>  
>> >
>> > Bandan, what was the reason for warning here?
>> 
>> I added the warning so that we catch if the next_rip field is being written
>> to (even if the feature isn't supported) by a buggy L1 hypervisor.
>
> Even if the L1 hypervisor writes to the next_rip field in the VMCB, we
> would never see it in this code path, as we access the shadow VMCB in
> this statement.
>
> We don't even care if the L1 hypervisor writes to its next_rip field
> because we only write to this field on an emulatated VMEXIT and never
> read it back.

The problems is that the next_rip field could be stale. If the processor 
supports
next_rip, then it will clear it out on the next entry. If it doesn't,
an old value just sits there (no matter who wrote it) and the problem
happens when skip_emulated_instruction advances the rip with an incorrect
value.

> So what's the point in adding a guest-triggerable warning at all?

So, yes, maybe this doesn't have to be a guest specific warning but we still
need to warn if this unsupported field is being written to.

>
>
>   Joerg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFT - PATCH v2 0/2] KVM/arm64: add fp/simd lazy switch support

2015-10-05 Thread Christoffer Dall
On Tue, Sep 22, 2015 at 04:34:01PM -0700, Mario Smarduch wrote:
> This is a 2nd itteration for arm64, v1 patches were posted by mistake from an 
> older branch which included several bugs. Hopefully didn't waste too much of 
> anyones time.
> 
> This patch series is a followup to the armv7 fp/simd lazy switch
> implementation, uses similar approach and depends on the series - see
> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-September/016516.html
> 
> It's based on earlier arm64 fp/simd optimization work - see
> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html
> 
> And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
> 32-bit guest on 64 bit host (and may require more here) - see
> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html
> 
> This series has be tested with arm64 on arm64 with several FP applications 
> running on host and guest, with substantial decrease on number of 
> fp/simd context switches. From about 30% down to 2% with one guest running.
> 
> At this time I don't have arm32/arm64 working and hoping Christoffer and/or
> Marc (or anyone) can test 32-bit guest/64-bit host.
> 
Did you already have some test infrastructure/applications that I can
reuse for this purpose or do I have to write userspace software?

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/15] arm64: Add page size to the kernel image header

2015-10-05 Thread Christoffer Dall
On Fri, Oct 02, 2015 at 05:50:38PM +0100, Marc Zyngier wrote:
> On 02/10/15 17:31, Catalin Marinas wrote:
> > On Fri, Oct 02, 2015 at 04:49:01PM +0100, Catalin Marinas wrote:
> >> On Tue, Sep 15, 2015 at 04:41:18PM +0100, Suzuki K. Poulose wrote:
> >>> From: Ard Biesheuvel 
> >>>
> >>> This patch adds the page size to the arm64 kernel image header
> >>> so that one can infer the PAGESIZE used by the kernel. This will
> >>> be helpful to diagnose failures to boot the kernel with page size
> >>> not supported by the CPU.
> >>>
> >>> Signed-off-by: Ard Biesheuvel 
> >>
> >> This patch needs you signed-off-by as well since you are posting it. And
> >> IIRC I acked it as well, I'll check.
> >>
> >> If you are fine with adding your signed-of-by, I can add it to the patch
> >> when applying (unless I see other issues with the series).
> > 
> > Actually, I just realised that the kvm patches don't have any acks, so
> > I'm not taking this series yet. You may want to repost once you have all
> > the acks in place.
> > 
> 
> I'm in the middle of it. I should be done next week (I may have said the
> same thing two weeks ago...).
> 
Very near the top of my to-review list as well, will try to get it done
this week.

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html