[PATCH] VSOCK: mark virtio_transport.ko experimental

2015-12-03 Thread Stefan Hajnoczi
Be explicit that the virtio_transport.ko code implements a draft virtio
specification that is still subject to change.

Signed-off-by: Stefan Hajnoczi 
---
If you'd rather wait until the device specification has been finalized, feel
free to revert the virtio-vsock code for now.  Apologies for not mentioning the
status in the Kconfig earlier.

 net/vmw_vsock/Kconfig | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 74e0bc8..d8be850 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -28,12 +28,17 @@ config VMWARE_VMCI_VSOCKETS
  will be called vmw_vsock_vmci_transport. If unsure, say N.
 
 config VIRTIO_VSOCKETS
-   tristate "virtio transport for Virtual Sockets"
+   tristate "virtio transport for Virtual Sockets (Experimental)"
depends on VSOCKETS && VIRTIO
select VIRTIO_VSOCKETS_COMMON
+   default n
help
  This module implements a virtio transport for Virtual Sockets.
 
+ This feature is based on a draft of the virtio-vsock device
+ specification that is still subject to change.  It can be used
+ to begin developing applications that use Virtual Sockets.
+
  Enable this transport if your Virtual Machine runs on Qemu/KVM.
 
  To compile this driver as a module, choose M here: the module
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/5] Add virtio transport for AF_VSOCK

2015-12-03 Thread Michael S. Tsirkin
On Wed, Dec 02, 2015 at 02:43:58PM +0800, Stefan Hajnoczi wrote:
> v2:
>  * Rebased onto Linux v4.4-rc2
>  * vhost: Refuse to assign reserved CIDs
>  * vhost: Refuse guest CID if already in use
>  * vhost: Only accept correctly addressed packets (no spoofing!)
>  * vhost: Support flexible rx/tx descriptor layout
>  * vhost: Add missing total_tx_buf decrement
>  * virtio_transport: Fix total_tx_buf accounting
>  * virtio_transport: Add virtio_transport global mutex to prevent races
>  * common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
>semantics)
>  * common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes 
> deadlock)
>  * common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
>  * common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
>  * common: Fix peer_buf_alloc inheritance on child socket
> 
> This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
> AF_VSOCK is designed for communication between virtual machines and
> hypervisors.  It is currently only implemented for VMware's VMCI transport.
> 
> This series implements the proposed virtio-vsock device specification from
> here:
> http://comments.gmane.org/gmane.comp.emulators.virtio.devel/855
> 
> Most of the work was done by Asias He and Gerd Hoffmann a while back.  I have
> picked up the series again.
> 
> The QEMU userspace changes are here:
> https://github.com/stefanha/qemu/commits/vsock
> 
> Why virtio-vsock?
> -
> Guest<->host communication is currently done over the virtio-serial device.
> This makes it hard to port sockets API-based applications and is limited to
> static ports.
> 
> virtio-vsock uses the sockets API so that applications can rely on familiar
> SOCK_STREAM and SOCK_DGRAM semantics.  Applications on the host can easily
> connect to guest agents because the sockets API allows multiple connections to
> a listen socket (unlike virtio-serial).  This simplifies the guest<->host
> communication and eliminates the need for extra processes on the host to
> arbitrate virtio-serial ports.
> 
> Overview
> 
> This series adds 3 pieces:
> 
> 1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko
> 
> 2. virtio_transport.ko - guest driver
> 
> 3. drivers/vhost/vsock.ko - host driver
> 
> Howto
> -
> The following kernel options are needed:
>   CONFIG_VSOCKETS=y
>   CONFIG_VIRTIO_VSOCKETS=y
>   CONFIG_VIRTIO_VSOCKETS_COMMON=y
>   CONFIG_VHOST_VSOCK=m
> 
> Launch QEMU as follows:
>   # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3
> 
> Guest and host can communicate via AF_VSOCK sockets.  The host's CID (address)
> is 2 and the guest is automatically assigned a CID (use VMADDR_CID_ANY (-1) to
> bind to it).
> 
> Status
> --
> There are a few design changes I'd like to make to the virtio-vsock device:
> 
> 1. The 3-way handshake isn't necessary over a reliable transport (virtqueue).
>Spoofing packets is also impossible so the security aspects of the 3-way
>handshake (including syn cookie) add nothing.  The next version will have a
>single operation to establish a connection.

It's hard to discuss without seeing the details, but we do need to
slow down guests that are flooding the host with socket creation requests.
The handshake is a simple way for the hypervisor to defer
such requests until it has resources, without
breaking things.

> 2. Credit-based flow control doesn't work for SOCK_DGRAM since multiple 
> clients
>can transmit to the same listen socket.  There is no way for the clients to
>coordinate buffer space with each other fairly.  The next version will drop
>credit-based flow control for SOCK_DGRAM and only rely on best-effort
>delivery.  SOCK_STREAM still has guaranteed delivery.

I suspect in the end we will need a measure of fairness even
if you drop packets. And recovering from packet loss is
hard enough that not many applications do it correctly.
I suggest disabling SOCK_DGRAM for now.

> 3. In the next version only the host will be able to establish connections
>(i.e. to connect to a guest agent).  This is for security reasons since
>there is currently no ability to provide host services only to certain
>guests.  This also matches how AF_VSOCK works on modern VMware hypervisors.


I see David merged this one already, but the planned changes above are
userspace- and hypervisor/guest-visible.

Once this is upstream and userspace/guests start relying on it,
we'll be stuck supporting this version in addition to
whatever we really want, with no easy way to even test it.

Might it not be better to defer enabling this upstream until the interface is
finalized?


> Asias He (5):
>   VSOCK: Introduce vsock_find_unbound_socket and
> vsock_bind_dgram_generic
>   VSOCK: Introduce virtio-vsock-common.ko
>   VSOCK: Introduce virtio-vsock.ko
>   VSOCK: Introduce vhost-vsock.ko
>   VSOCK: Add Makefile and Kconfig
> 
>  drivers/vhost/Kconfig

Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-03 Thread Lan, Tianyu


On 12/2/2015 10:31 PM, Michael S. Tsirkin wrote:

>We hope
>to find a better way to make SRIOV NIC work in these cases and this is
>worth to do since SRIOV NIC provides better network performance compared
>with PV NIC.

If this is a performance optimization, as the above implies,
you need to include some numbers, and document how you
implemented the switch and how you measured the performance.



OK. Some ideas in my patches come from the paper "CompSC: Live Migration with
Pass-through Devices".
http://www.cl.cam.ac.uk/research/srg/netos/vee_2012/papers/p109.pdf

It compares the performance of the solution of switching between PV and VF
against direct VF migration (Chapter 7: Discussion).




>Current patches have some issues. I think we can find
>solutions for them and improve them step by step.



RE: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-03 Thread Pavel Fedin
 Hello!

> > > Cc: sta...@vger.kernel.org
> > > Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's 
> > > uncachedness")
> > >
> >
> > That commit is not in a release yet, so no need for cc stable
> [...]
> 
> But it is cc'd to stable, so unless it is going to be nacked at review
> stage, any subsequent fixes should also be cc'd.

 Sorry guys for messing things up a bit, but the affected commit actually is in 
a stable branch (4.4-rc3), so I decided to Cc: stable, just in case, because the 
breakage is quite big IMHO.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia



Re: [PATCH 1/3] KVM: arm64: Correctly handle zero register during MMIO

2015-12-03 Thread Marc Zyngier
On 03/12/15 09:58, Pavel Fedin wrote:
> On ARM64, register index 31 corresponds to both the zero register and SP.
> However, all memory access instructions use ZR as the transfer register. SP
> is used only as a base register in indirect memory addressing, or by
> register-register arithmetic, which cannot be trapped here.
> 
> Correct emulation is achieved by introducing new register accessor
> functions, which can do special handling for reg_num == 31. These new
> accessors intentionally do not rely on old vcpu_reg() on ARM64, because
> it is to be removed. Since the affected code is shared by both ARM
> flavours, implementations of these accessors are also added to ARM32 code.
> 
> This patch fixes the MMIO register being set to a random value (actually SP)
> instead of zero by code such as:
> 
>  *((volatile int *)reg) = 0;
> 
> (compilers tend to generate "str wzr, [xx]" here)
> 
> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm/include/asm/kvm_emulate.h   | 12 
>  arch/arm/kvm/mmio.c  |  5 +++--
>  arch/arm64/include/asm/kvm_emulate.h | 13 +
>  3 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h 
> b/arch/arm/include/asm/kvm_emulate.h
> index a9c80a2..b7ff32e 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -28,6 +28,18 @@
>  unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
>  unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
>  
> +static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
> +  u8 reg_num)
> +{
> + return *vcpu_reg(vcpu, reg_num);
> +}
> +
> +static inline void vcpu_set_reg(const struct kvm_vcpu *vcpu, u8 reg_num,
> + unsigned long val)
> +{
> + *vcpu_reg(vcpu, reg_num) = val;
> +}
> +
>  bool kvm_condition_valid(struct kvm_vcpu *vcpu);
>  void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
>  void kvm_inject_undefined(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
> index 974b1c6..3a10c9f 100644
> --- a/arch/arm/kvm/mmio.c
> +++ b/arch/arm/kvm/mmio.c
> @@ -115,7 +115,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct 
> kvm_run *run)
>   trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
>  data);
>   data = vcpu_data_host_to_guest(vcpu, data, len);
> - *vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt) = data;
> + vcpu_set_reg(vcpu, vcpu->arch.mmio_decode.rt, data);
>   }
>  
>   return 0;
> @@ -186,7 +186,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run 
> *run,
>   rt = vcpu->arch.mmio_decode.rt;
>  
>   if (is_write) {
> - data = vcpu_data_guest_to_host(vcpu, *vcpu_reg(vcpu, rt), len);
> + data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
> +len);
>  
>   trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, fault_ipa, data);
>   mmio_write_buf(data_buf, len, data);
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 3ca894e..5a182af 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -109,6 +109,19 @@ static inline unsigned long *vcpu_reg(const struct 
> kvm_vcpu *vcpu, u8 reg_num)
>   return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.regs[reg_num];
>  }
>  
> +static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
> +  u8 reg_num)
> +{
> + return (reg_num == 31) ? 0 : vcpu_gp_regs(vcpu)->regs.regs[reg_num];
> +}
> +
> +static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
> + unsigned long val)
> +{
> + if (reg_num != 31)
> + vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
> +}
> +
>  /* Get vcpu SPSR for current mode */
>  static inline unsigned long *vcpu_spsr(const struct kvm_vcpu *vcpu)
>  {
> 

Thanks for finding this nasty one.

Reviewed-by: Marc Zyngier 

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 0/3] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-03 Thread Marc Zyngier
On 03/12/15 10:53, Pavel Fedin wrote:
>  Hello!
> 
>>> The problem has been discovered by performing an operation
>>>
>>>  *((volatile int *)reg) = 0;
>>>
>>> which compiles as "str xzr, [xx]", and resulted in strange values being
>>> written.
>>
>> Interesting find. Which compiler is that?
> 
> $ aarch64-linux-gnu-gcc --version
> aarch64-linux-gnu-gcc (Linaro GCC 2014.11) 4.9.3 20141031 (prerelease)
> Copyright (C) 2014 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

OK. I was just wondering if that was a new thing or not.

[...]

>  Isn't it legitimate to write from ZR to MMIO register?
>  Another potential case is in our vgic-v3-switch.S:
> 
>   msr_s   ICH_HCR_EL2, xzr
> 
>  It's only because this is KVM code that we have never discovered this
> problem yet. Somebody could write such a thing in some other place,
> with some other register, which would be executed by KVM, and... boo...

I'm certainly not disputing that, this is a real bug that should be
fixed right now.

Looking forward to seeing your v2.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: [PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses

2015-12-03 Thread Pavel Fedin
 Hello!

> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 87a64e8..a667228 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -102,7 +102,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
> >
> > BUG_ON(!p->is_write);
> >
> > -   val = *vcpu_reg(vcpu, p->Rt);
> > +   val = *p->val;
> 
> Why does it have to be a pointer? You could just have "val = p->val" if
> you carried the actual value instead of a pointer to the stack variable
> holding that value.

 There's only one concern with the pointer approach. Actually, this refactor is 
based on my vGICv3 live migration API patch set:
http://www.spinics.net/lists/kvm/msg124205.html
http://www.spinics.net/lists/kvm/msg124202.html

 It's simply more convenient to use a pointer for the exchange with userspace; 
see vgic_v3_cpu_regs_access() and its callers. I wouldn't
like to refactor the code again. What's your opinion on this?
 And of course I'll fix up the rest.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses

2015-12-03 Thread Marc Zyngier
On 03/12/15 11:08, Pavel Fedin wrote:
> Hello!
> 
>>> diff --git a/arch/arm64/kvm/sys_regs.c
>>> b/arch/arm64/kvm/sys_regs.c index 87a64e8..a667228 100644 ---
>>> a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@
>>> -102,7 +102,7 @@ static bool access_vm_reg(struct kvm_vcpu
>>> *vcpu,
>>> 
>>> BUG_ON(!p->is_write);
>>> 
>>> -   val = *vcpu_reg(vcpu, p->Rt); + val = *p->val;
>> 
>> Why does it have to be a pointer? You could just have "val =
>> p->val" if you carried the actual value instead of a pointer to the
>> stack variable holding that value.
> 
> There's only one concern for pointer approach. Actually, this
> refactor is based on my vGICv3 live migration API patch set: 
> http://www.spinics.net/lists/kvm/msg124205.html 
> http://www.spinics.net/lists/kvm/msg124202.html
> 
> It's simply more convenient to use a pointer for exchange with
> userspace, see vgic_v3_cpu_regs_access() and callers. I wouldn't like
> to refactor the code again. What's your opinion on this?

I still don't think this is a good idea. You can still store the value
as an integer in vgic_v3_cpu_regs_access(), and check the write property
to do the writeback on read. Which is the same thing I asked for in this
patch.

> And of course i'll fix up the rest.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: [PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses

2015-12-03 Thread Pavel Fedin
 Hello!

> > It's simply more convenient to use a pointer for exchange with
> > userspace, see vgic_v3_cpu_regs_access() and callers. I wouldn't like
> > to refactor the code again. What's your opinion on this?
> 
> I still don't think this is a good idea. You can still store the value
> as an integer in vgic_v3_cpu_regs_access(), and check the write property
> to do the writeback on read. Which is the same thing I asked for in this
> patch.

 I started doing this and found one more (big) reason against it. All sysreg 
handlers have a 'const struct sys_reg_params' declaration, and so do their 
callers, and their callers' callers... This 'const' is all over the code, and it 
would take a separate huge patch to un-const'ify all of it.
Is it worth that?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-03 Thread Ard Biesheuvel
On 3 December 2015 at 08:14, Pavel Fedin  wrote:
>  Hello!
>
>> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> > index 7dace90..51ad98f 100644
>> > --- a/arch/arm/kvm/mmu.c
>> > +++ b/arch/arm/kvm/mmu.c
>> > @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
>> > *pmd,
>> >
>> > pte = pte_offset_kernel(pmd, addr);
>> > do {
>> > -   if (!pte_none(*pte) && 
>> > !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> > +   if (!pte_none(*pte) &&
>> > +   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>>
>> I think your analysis is correct, but does that not apply to both instances?
>
>  No no, the other one is correct, since it operates on a real PFN (at least it 
> looks that way). I have verified my fix against the original problem (crash on 
> Exynos5410 without generic timer), and it still works fine there.
>

I don't think so. Regardless of whether you are manipulating HYP
mappings or stage-2 mappings, the physical address is always the
output, not the input of the translation, so addr is always either a
virtual address or an intermediate physical address, whereas
pfn_valid() operates on host physical addresses.

>> And instead of reverting, could we fix this properly instead?
>
>  Of course, I'm not against alternative approaches, feel free. I've just 
> suggested what I could to fix things quickly. I'm indeed no expert in KVM 
> memory management yet. After all, this is what mailing lists are for.
>

OK. I will follow up with a patch, as Christoffer requested. I'd
appreciate it if you could test to see if it also fixes the current
issue, and the original arch timer issue.

Thanks,
Ard.


Re: [PATCH v2 10/21] arm64: KVM: Add patchable function selector

2015-12-03 Thread Marc Zyngier
On Wed, 2 Dec 2015 16:34:56 -0600
Andrew Jones  wrote:

> On Fri, Nov 27, 2015 at 06:50:04PM +, Marc Zyngier wrote:
> > KVM so far relies on code patching, and is likely to use it more
> > in the future. The main issue is that our alternative system works
> > at the instruction level, while we'd like to have alternatives at
> > the function level.
> 
> How about setting static-keys at hyp init time?

That was an option I looked at. And while static keys would work to some
extent, they also have some nasty side effects:
- They create both a fast and a slow path. We don't want that - both
  paths should be equally fast, or at least have as little overhead as
  possible
- We do need code patching for some assembly code, and using static
  keys on top creates a parallel mechanism that makes it hard to
  follow/debug/maintain.

You can view this alternative function call as a slightly different
kind of static keys - one that can give you the capability to handle
function calls instead of just jumping over code sequences. Both have
their own merits.

Thanks,

M.
-- 
Without deviation from the norm, progress is not possible.


Re: [RFC PATCH V2 09/10] Qemu/VFIO: Add SRIOV VF migration support

2015-12-03 Thread Lan, Tianyu



On 12/3/2015 6:25 AM, Alex Williamson wrote:

On Tue, 2015-11-24 at 21:35 +0800, Lan Tianyu wrote:

This patch adds SRIOV VF migration support.
It creates a new device type "vfio-sriov" and adds a faked PCI migration
capability to that device type.

The purposes of the new capability are to:
1) Sync migration status with the VF driver in the VM.
2) Get the mailbox irq vector to notify the VF driver during migration.
3) Provide a way to control whether the irq is injected or not.

Qemu will migrate PCI configuration space regs and MSI-X config for the VF.
It injects the mailbox irq at the last stage of migration to notify the VF
about the migration event and waits for the VF driver to be ready. The VF
driver writes the PCI config reg PCI_VF_MIGRATION_VF_STATUS in the new cap
table to tell Qemu.


What makes this sr-iov specific?  Why wouldn't we simply extend vfio-pci
with a migration=on feature?  Thanks,


Sounds reasonable and will update it.



Alex




Re: [RFC PATCH V2 06/10] Qemu/PCI: Add macros for faked PCI migration capability

2015-12-03 Thread Lan, Tianyu



On 12/3/2015 6:25 AM, Alex Williamson wrote:

This will of course break if the PCI SIG defines that capability index.
Couldn't this be done within a vendor-defined capability?  Thanks,


Yes, it should work, and thanks for the suggestion.


RE: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-03 Thread Pavel Fedin
 Hello!

> >> I think your analysis is correct, but does that not apply to both 
> >> instances?
> >
> >  No no, another one is correct, since it operates on real PFN (at least 
> > looks like so). I
> have verified my fix against the original problem (crash on Exynos5410 
> without generic timer),
> and it still works fine there.
> >
> 
> I don't think so. Regardless of whether you are manipulating HYP
> mappings or stage-2 mappings, the physical address is always the
> output, not the input of the translation, so addr is always either a
> virtual address or an intermediate physical address, whereas
> pfn_valid() operates on host physical addresses.

 Yes, you are right. I have reviewed this more carefully, and indeed, 
unmap_range() is also called by unmap_stage2_range(), so the address can be 
either an IPA or a real PA.

> OK. I will follow up with a patch, as Christoffer requested. I'd
> appreciate it if you could test to see if it also fixes the current
> issue, and the original arch timer issue.

 I have just made the same patch, and I am currently testing it on all my 
boards. I'll also test it on my ARM64, just in case. I expect to finish the 
testing and send the patch in maybe one or two hours.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




[PATCH 0/3] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-03 Thread Pavel Fedin
The ARM64 CPU has a zero register, which is read-only with a value of 0.
However, KVM currently incorrectly recognizes it as SP (because
Rt == 31, and in struct user_pt_regs the 'regs' array is followed by SP),
resulting in an invalid value being read, or even SP corruption on write.

The problem was discovered by performing the operation

 *((volatile int *)reg) = 0;

which compiles to "str xzr, [xx]" and resulted in strange values being
written.

Pavel Fedin (3):
  KVM: arm64: Correctly handle zero register during MMIO
  KVM: arm64: Correctly handle zero register in system register accesses
  KVM: arm64: Get rid of old vcpu_reg()

 arch/arm/include/asm/kvm_emulate.h   | 12 ++
 arch/arm/kvm/mmio.c  |  5 ++-
 arch/arm/kvm/psci.c  | 20 -
 arch/arm64/include/asm/kvm_emulate.h | 18 +---
 arch/arm64/kvm/handle_exit.c |  2 +-
 arch/arm64/kvm/sys_regs.c| 79 
 arch/arm64/kvm/sys_regs.h|  4 +-
 arch/arm64/kvm/sys_regs_generic_v8.c |  2 +-
 8 files changed, 85 insertions(+), 57 deletions(-)

-- 
2.4.4



[PATCH 1/3] KVM: arm64: Correctly handle zero register during MMIO

2015-12-03 Thread Pavel Fedin
On ARM64, register index 31 corresponds to both the zero register and SP.
However, all memory access instructions use ZR as the transfer register. SP
is used only as a base register in indirect memory addressing, or by
register-register arithmetic, which cannot be trapped here.

Correct emulation is achieved by introducing new register accessor
functions, which can do special handling for reg_num == 31. These new
accessors intentionally do not rely on old vcpu_reg() on ARM64, because
it is to be removed. Since the affected code is shared by both ARM
flavours, implementations of these accessors are also added to ARM32 code.

This patch fixes the MMIO register being set to a random value (actually SP)
instead of zero by code such as:

 *((volatile int *)reg) = 0;

(compilers tend to generate "str wzr, [xx]" here)

Signed-off-by: Pavel Fedin 
---
 arch/arm/include/asm/kvm_emulate.h   | 12 
 arch/arm/kvm/mmio.c  |  5 +++--
 arch/arm64/include/asm/kvm_emulate.h | 13 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h 
b/arch/arm/include/asm/kvm_emulate.h
index a9c80a2..b7ff32e 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -28,6 +28,18 @@
 unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
 unsigned long *vcpu_spsr(struct kvm_vcpu *vcpu);
 
+static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
+u8 reg_num)
+{
+   return *vcpu_reg(vcpu, reg_num);
+}
+
+static inline void vcpu_set_reg(const struct kvm_vcpu *vcpu, u8 reg_num,
+   unsigned long val)
+{
+   *vcpu_reg(vcpu, reg_num) = val;
+}
+
 bool kvm_condition_valid(struct kvm_vcpu *vcpu);
 void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index 974b1c6..3a10c9f 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -115,7 +115,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
   data);
data = vcpu_data_host_to_guest(vcpu, data, len);
-   *vcpu_reg(vcpu, vcpu->arch.mmio_decode.rt) = data;
+   vcpu_set_reg(vcpu, vcpu->arch.mmio_decode.rt, data);
}
 
return 0;
@@ -186,7 +186,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
rt = vcpu->arch.mmio_decode.rt;
 
if (is_write) {
-   data = vcpu_data_guest_to_host(vcpu, *vcpu_reg(vcpu, rt), len);
+   data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
+  len);
 
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, fault_ipa, data);
mmio_write_buf(data_buf, len, data);
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 3ca894e..5a182af 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -109,6 +109,19 @@ static inline unsigned long *vcpu_reg(const struct 
kvm_vcpu *vcpu, u8 reg_num)
	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.regs[reg_num];
 }
 
+static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
+u8 reg_num)
+{
+   return (reg_num == 31) ? 0 : vcpu_gp_regs(vcpu)->regs.regs[reg_num];
+}
+
+static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
+   unsigned long val)
+{
+   if (reg_num != 31)
+   vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
+}
+
 /* Get vcpu SPSR for current mode */
 static inline unsigned long *vcpu_spsr(const struct kvm_vcpu *vcpu)
 {
-- 
2.4.4



[PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses

2015-12-03 Thread Pavel Fedin
System register accesses also use the zero register for Rt == 31, and
therefore using it will likewise result in getting the SP value instead.
This patch makes them use the new accessors introduced by the previous
patch.

Additionally, it gets rid of the "massive hack" in kvm_handle_cp_64().

Signed-off-by: Pavel Fedin 
---
 arch/arm64/kvm/sys_regs.c| 79 
 arch/arm64/kvm/sys_regs.h|  4 +-
 arch/arm64/kvm/sys_regs_generic_v8.c |  2 +-
 3 files changed, 46 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 87a64e8..a667228 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -102,7 +102,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 
BUG_ON(!p->is_write);
 
-   val = *vcpu_reg(vcpu, p->Rt);
+   val = *p->val;
if (!p->is_aarch32) {
vcpu_sys_reg(vcpu, r->reg) = val;
} else {
@@ -125,13 +125,10 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
   const struct sys_reg_params *p,
   const struct sys_reg_desc *r)
 {
-   u64 val;
-
if (!p->is_write)
return read_from_write_only(vcpu, p);
 
-   val = *vcpu_reg(vcpu, p->Rt);
-   vgic_v3_dispatch_sgi(vcpu, val);
+   vgic_v3_dispatch_sgi(vcpu, *p->val);
 
return true;
 }
@@ -153,7 +150,7 @@ static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
if (p->is_write) {
return ignore_write(vcpu, p);
} else {
-   *vcpu_reg(vcpu, p->Rt) = (1 << 3);
+   *p->val = (1 << 3);
return true;
}
 }
@@ -167,7 +164,7 @@ static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu,
} else {
u32 val;
asm volatile("mrs %0, dbgauthstatus_el1" : "=r" (val));
-   *vcpu_reg(vcpu, p->Rt) = val;
+   *p->val = val;
return true;
}
 }
@@ -204,13 +201,13 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
 {
if (p->is_write) {
-   vcpu_sys_reg(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
+   vcpu_sys_reg(vcpu, r->reg) = *p->val;
vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
} else {
-   *vcpu_reg(vcpu, p->Rt) = vcpu_sys_reg(vcpu, r->reg);
+   *p->val = vcpu_sys_reg(vcpu, r->reg);
}
 
-   trace_trap_reg(__func__, r->reg, p->is_write, *vcpu_reg(vcpu, p->Rt));
+   trace_trap_reg(__func__, r->reg, p->is_write, *p->val);
 
return true;
 }
@@ -228,7 +225,7 @@ static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
  const struct sys_reg_params *p,
  u64 *dbg_reg)
 {
-   u64 val = *vcpu_reg(vcpu, p->Rt);
+   u64 val = *p->val;
 
if (p->is_32bit) {
val &= 0xffffffffUL;
@@ -248,7 +245,7 @@ static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
if (p->is_32bit)
val &= 0xffffffffUL;
 
-   *vcpu_reg(vcpu, p->Rt) = val;
+   *p->val = val;
 }
 
 static inline bool trap_bvr(struct kvm_vcpu *vcpu,
@@ -697,10 +694,10 @@ static bool trap_dbgidr(struct kvm_vcpu *vcpu,
u64 pfr = read_system_reg(SYS_ID_AA64PFR0_EL1);
u32 el3 = !!cpuid_feature_extract_field(pfr, 
ID_AA64PFR0_EL3_SHIFT);
 
-   *vcpu_reg(vcpu, p->Rt) = ((((dfr >> ID_AA64DFR0_WRPS_SHIFT) & 0xf) << 28) |
- (((dfr >> ID_AA64DFR0_BRPS_SHIFT) & 0xf) << 24) |
- (((dfr >> ID_AA64DFR0_CTX_CMPS_SHIFT) & 0xf) << 20) |
- (6 << 16) | (el3 << 14) | (el3 << 12));
+   *p->val = ((((dfr >> ID_AA64DFR0_WRPS_SHIFT) & 0xf) << 28) |
+  (((dfr >> ID_AA64DFR0_BRPS_SHIFT) & 0xf) << 24) |
+  (((dfr >> ID_AA64DFR0_CTX_CMPS_SHIFT) & 0xf) << 20) |
+  (6 << 16) | (el3 << 14) | (el3 << 12));
return true;
}
 }
@@ -710,10 +707,10 @@ static bool trap_debug32(struct kvm_vcpu *vcpu,
 const struct sys_reg_desc *r)
 {
if (p->is_write) {
-   vcpu_cp14(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
+   vcpu_cp14(vcpu, r->reg) = *p->val;
vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
} else {
-   *vcpu_reg(vcpu, p->Rt) = vcpu_cp14(vcpu, r->reg);
+   *p->val = vcpu_cp14(vcpu, r->reg);
}
 
return true;
@@ -740,12 +737,12 @@ static inline bool trap_xvr(struct kvm_vcpu *vcpu,
u64 val = *dbg_reg;
 
val &= 0xffffffffUL;
-   val |= *vcpu_reg(vcpu, p->Rt) << 32;
+   val |= *p->val << 32;
*dbg_reg = val;
 

[PATCH 3/3] KVM: arm64: Get rid of old vcpu_reg()

2015-12-03 Thread Pavel Fedin
Using the old-style vcpu_reg() accessor has proven to be inappropriate and
unsafe on ARM64. This patch fixes the remaining use cases and completely
removes vcpu_reg() on ARM64.

Signed-off-by: Pavel Fedin 
---
 arch/arm/kvm/psci.c  | 20 ++--
 arch/arm64/include/asm/kvm_emulate.h | 11 +++
 arch/arm64/kvm/handle_exit.c |  2 +-
 3 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 0b55696..a9b3b90 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -75,7 +75,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
*source_vcpu)
unsigned long context_id;
phys_addr_t target_pc;
 
-   cpu_id = *vcpu_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
+   cpu_id = vcpu_get_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
if (vcpu_mode_is_32bit(source_vcpu))
cpu_id &= ~((u32) 0);
 
@@ -94,8 +94,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
*source_vcpu)
return PSCI_RET_INVALID_PARAMS;
}
 
-   target_pc = *vcpu_reg(source_vcpu, 2);
-   context_id = *vcpu_reg(source_vcpu, 3);
+   target_pc = vcpu_get_reg(source_vcpu, 2);
+   context_id = vcpu_get_reg(source_vcpu, 3);
 
kvm_reset_vcpu(vcpu);
 
@@ -114,7 +114,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu 
*source_vcpu)
 * NOTE: We always update r0 (or x0) because for PSCI v0.1
 * the general puspose registers are undefined upon CPU_ON.
 */
-   *vcpu_reg(vcpu, 0) = context_id;
+   vcpu_set_reg(vcpu, 0, context_id);
vcpu->arch.power_off = false;
smp_mb();   /* Make sure the above is visible */
 
@@ -134,8 +134,8 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct 
kvm_vcpu *vcpu)
struct kvm *kvm = vcpu->kvm;
struct kvm_vcpu *tmp;
 
-   target_affinity = *vcpu_reg(vcpu, 1);
-   lowest_affinity_level = *vcpu_reg(vcpu, 2);
+   target_affinity = vcpu_get_reg(vcpu, 1);
+   lowest_affinity_level = vcpu_get_reg(vcpu, 2);
 
/* Determine target affinity mask */
target_affinity_mask = psci_affinity_mask(lowest_affinity_level);
@@ -209,7 +209,7 @@ int kvm_psci_version(struct kvm_vcpu *vcpu)
 static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
 {
int ret = 1;
-   unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
+   unsigned long psci_fn = vcpu_get_reg(vcpu, 0) & ~((u32) 0);
unsigned long val;
 
switch (psci_fn) {
@@ -273,13 +273,13 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
break;
}
 
-   *vcpu_reg(vcpu, 0) = val;
+   vcpu_set_reg(vcpu, 0, val);
return ret;
 }
 
 static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
 {
-   unsigned long psci_fn = *vcpu_reg(vcpu, 0) & ~((u32) 0);
+   unsigned long psci_fn = vcpu_get_reg(vcpu, 0) & ~((u32) 0);
unsigned long val;
 
switch (psci_fn) {
@@ -295,7 +295,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
break;
}
 
-   *vcpu_reg(vcpu, 0) = val;
+   vcpu_set_reg(vcpu, 0, val);
return 1;
 }
 
diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 5a182af..25a4021 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -100,15 +100,10 @@ static inline void vcpu_set_thumb(struct kvm_vcpu *vcpu)
 }
 
 /*
- * vcpu_reg should always be passed a register number coming from a
- * read of ESR_EL2. Otherwise, it may give the wrong result on AArch32
- * with banked registers.
+ * vcpu_get_reg and vcpu_set_reg should always be passed a register number
+ * coming from a read of ESR_EL2. Otherwise, it may give the wrong result on
+ * AArch32 with banked registers.
  */
-static inline unsigned long *vcpu_reg(const struct kvm_vcpu *vcpu, u8 reg_num)
-{
-   return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.regs[reg_num];
-}
-
 static inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
 u8 reg_num)
 {
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 68a0759..15f0477 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -37,7 +37,7 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run 
*run)
 {
int ret;
 
-   trace_kvm_hvc_arm64(*vcpu_pc(vcpu), *vcpu_reg(vcpu, 0),
+   trace_kvm_hvc_arm64(*vcpu_pc(vcpu), vcpu_get_reg(vcpu, 0),
kvm_vcpu_hvc_get_imm(vcpu));
 
ret = kvm_psci_call(vcpu);
-- 
2.4.4



RE: [PATCH 0/3] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-03 Thread Pavel Fedin
 Hello!

> > The problem has been discovered by performing an operation
> >
> >  *((volatile int *)reg) = 0;
> >
> > which compiles as "str xzr, [xx]", and resulted in strange values being
> > written.
> 
> Interesting find. Which compiler is that?

$ aarch64-linux-gnu-gcc --version
aarch64-linux-gnu-gcc (Linaro GCC 2014.11) 4.9.3 20141031 (prerelease)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

 This is from my colleague who actually hit the bug in his driver. And I can 
reproduce the issue with a different compiler version using the following small 
testcase:
--- cut ---
p.fedin@fedinw7x64 /cygdrive/d/Projects/Test
$ cat test.c
volatile int *addr;

int test_val(int val)
{
*addr = val;
}

int test_zero(void)
{
*addr = 0;
}

p.fedin@fedinw7x64 /cygdrive/d/Projects/Test
$ aarch64-unknown-linux-gnu-gcc -O2 -c test.c

p.fedin@fedinw7x64 /cygdrive/d/Projects/Test
$ aarch64-unknown-linux-gnu-objdump -d test.o

test.o: file format elf64-littleaarch64


Disassembly of section .text:

0000000000000000 <test_val>:
   0:   2a0003e2        mov     w2, w0
   4:   2a0103e0        mov     w0, w1
   8:   90000001        adrp    x1, 8
   c:   f9400021        ldr     x1, [x1]
  10:   b9000022        str     w2, [x1]
  14:   d65f03c0        ret

0000000000000018 <test_zero>:
  18:   90000001        adrp    x1, 8
  1c:   f9400021        ldr     x1, [x1]
  20:   b900003f        str     wzr, [x1]
  24:   d65f03c0        ret

p.fedin@fedinw7x64 /cygdrive/d/Projects/Test
$ aarch64-unknown-linux-gnu-gcc --version
aarch64-unknown-linux-gnu-gcc (GCC) 4.9.0
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
--- cut ---

 Isn't it legitimate to write from ZR to MMIO register?
 Another potential case is in our vgic-v3-switch.S:

msr_s   ICH_HCR_EL2, xzr

 It's only because it is KVM code we have never discovered this problem yet. 
Somebody could write such a thing in some other place,
with some other register, which would be executed by KVM, and... boo...

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [RFC PATCH V2 02/10] Qemu/VFIO: Add new VFIO_GET_PCI_CAP_INFO ioctl cmd definition

2015-12-03 Thread Lan, Tianyu


On 12/3/2015 6:25 AM, Alex Williamson wrote:

> I didn't see a matching kernel patch series for this, but why is the
> kernel more capable of doing this than userspace is already?

The following link is the kernel patch.
http://marc.info/?l=kvm&m=144837328920989&w=2


> These seem
> like pointless ioctls, we're creating a purely virtual PCI capability,
> the kernel doesn't really need to participate in that.


The VFIO kernel driver has pci_config_map, which records the position and 
length of each PCI capability and so helps to find free PCI config regs. The 
Qemu side doesn't have such info and can't get the exact table size of a PCI 
capability. If we wanted to add such support in Qemu, we would need to 
duplicate a lot of the code of vfio_pci_config.c in Qemu.



> Also, why are we
> restricting ourselves to standard capabilities?


This version is to check whether it's on the right track, and we can extend
this to PCI extended capabilities later.


> That's often a crowded
> space and we can't always know whether an area is free or not based only
> on it being covered by a capability.  Some capabilities can also appear
> more than once, so there's context that isn't being passed to the kernel
> here.  Thanks,


The regions outside of PCI capabilities are not passed to the kernel or are 
used by Qemu for MSI/MSI-X. It's possible to use these places for a new 
capability. One concern is that a guest driver may abuse them, so a quirk 
masking some special regs outside of the capabilities may be helpful.





Re: [PATCH 0/3] KVM: arm64: BUG FIX: Correctly handle zero register transfers

2015-12-03 Thread Marc Zyngier
On 03/12/15 09:58, Pavel Fedin wrote:
> ARM64 CPU has zero register which is read-only, with a value of 0.
> However, KVM currently incorrectly recognizes it being SP (because
> Rt == 31, and in struct user_pt_regs 'regs' array is followed by SP),
> resulting in invalid value being read, or even SP corruption on write.

Not really. XZR and SP do share the same encoding.

> The problem has been discovered by performing an operation
> 
>  *((volatile int *)reg) = 0;
> 
> which compiles as "str xzr, [xx]", and resulted in strange values being
> written.

Interesting find. Which compiler is that?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses

2015-12-03 Thread Marc Zyngier
Pavel,

On 03/12/15 09:58, Pavel Fedin wrote:
> System register accesses also use zero register for Rt == 31, and
> therefore using it will also result in getting SP value instead. This
> patch makes them also using new accessors, introduced by the previous
> patch.
> 
> Additionally, got rid of "massive hack" in kvm_handle_cp_64().

Looks good for a first drop - some comments below.

> 
> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm64/kvm/sys_regs.c| 79 
> 
>  arch/arm64/kvm/sys_regs.h|  4 +-
>  arch/arm64/kvm/sys_regs_generic_v8.c |  2 +-
>  3 files changed, 46 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 87a64e8..a667228 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -102,7 +102,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  
>   BUG_ON(!p->is_write);
>  
> - val = *vcpu_reg(vcpu, p->Rt);
> + val = *p->val;

Why does it have to be a pointer? You could just have "val = p->val" if
you carried the actual value instead of a pointer to the stack variable
holding that value.

>   if (!p->is_aarch32) {
>   vcpu_sys_reg(vcpu, r->reg) = val;
>   } else {
> @@ -125,13 +125,10 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
>  const struct sys_reg_params *p,
>  const struct sys_reg_desc *r)
>  {
> - u64 val;
> -
>   if (!p->is_write)
>   return read_from_write_only(vcpu, p);
>  
> - val = *vcpu_reg(vcpu, p->Rt);
> - vgic_v3_dispatch_sgi(vcpu, val);
> + vgic_v3_dispatch_sgi(vcpu, *p->val);
>  
>   return true;
>  }
> @@ -153,7 +150,7 @@ static bool trap_oslsr_el1(struct kvm_vcpu *vcpu,
>   if (p->is_write) {
>   return ignore_write(vcpu, p);
>   } else {
> - *vcpu_reg(vcpu, p->Rt) = (1 << 3);
> + *p->val = (1 << 3);
>   return true;
>   }
>  }
> @@ -167,7 +164,7 @@ static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu,
>   } else {
>   u32 val;
>   asm volatile("mrs %0, dbgauthstatus_el1" : "=r" (val));
> - *vcpu_reg(vcpu, p->Rt) = val;
> + *p->val = val;
>   return true;
>   }
>  }
> @@ -204,13 +201,13 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>   const struct sys_reg_desc *r)
>  {
>   if (p->is_write) {
> - vcpu_sys_reg(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
> + vcpu_sys_reg(vcpu, r->reg) = *p->val;
>   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>   } else {
> - *vcpu_reg(vcpu, p->Rt) = vcpu_sys_reg(vcpu, r->reg);
> + *p->val = vcpu_sys_reg(vcpu, r->reg);
>   }
>  
> - trace_trap_reg(__func__, r->reg, p->is_write, *vcpu_reg(vcpu, p->Rt));
> + trace_trap_reg(__func__, r->reg, p->is_write, *p->val);
>  
>   return true;
>  }
> @@ -228,7 +225,7 @@ static inline void reg_to_dbg(struct kvm_vcpu *vcpu,
> const struct sys_reg_params *p,
> u64 *dbg_reg)
>  {
> - u64 val = *vcpu_reg(vcpu, p->Rt);
> + u64 val = *p->val;
>  
>   if (p->is_32bit) {
>   val &= 0xUL;
> @@ -248,7 +245,7 @@ static inline void dbg_to_reg(struct kvm_vcpu *vcpu,
>   if (p->is_32bit)
>   val &= 0xUL;
>  
> - *vcpu_reg(vcpu, p->Rt) = val;
> + *p->val = val;
>  }
>  
>  static inline bool trap_bvr(struct kvm_vcpu *vcpu,
> @@ -697,10 +694,10 @@ static bool trap_dbgidr(struct kvm_vcpu *vcpu,
>   u64 pfr = read_system_reg(SYS_ID_AA64PFR0_EL1);
>   u32 el3 = !!cpuid_feature_extract_field(pfr, 
> ID_AA64PFR0_EL3_SHIFT);
>  
> - *vcpu_reg(vcpu, p->Rt) = dfr >> ID_AA64DFR0_WRPS_SHIFT) & 
> 0xf) << 28) |
> -   (((dfr >> ID_AA64DFR0_BRPS_SHIFT) & 
> 0xf) << 24) |
> -   (((dfr >> ID_AA64DFR0_CTX_CMPS_SHIFT) 
> & 0xf) << 20) |
> -   (6 << 16) | (el3 << 14) | (el3 << 
> 12));
> + *p->val = dfr >> ID_AA64DFR0_WRPS_SHIFT) & 0xf) << 28) |
> +(((dfr >> ID_AA64DFR0_BRPS_SHIFT) & 0xf) << 24) |
> +(((dfr >> ID_AA64DFR0_CTX_CMPS_SHIFT) & 0xf) << 20) |
> +(6 << 16) | (el3 << 14) | (el3 << 12));
>   return true;
>   }
>  }
> @@ -710,10 +707,10 @@ static bool trap_debug32(struct kvm_vcpu *vcpu,
>const struct sys_reg_desc *r)
>  {
>   if (p->is_write) {
> - vcpu_cp14(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt);
> + vcpu_cp14(vcpu, r->reg) = *p->val;
>   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>   } else {
> - *vcpu_reg(vcpu, p->Rt) = vcpu_cp14(vcpu, r->reg);
> +   

RE: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO

2015-12-03 Thread Pavel Fedin
 Hello!

> I like that you're making this transparent
> for the user, but at the same time, directly calling function pointers
> through the msi_domain_ops is quite ugly.

 Do you mean dereferencing info->ops->vfio_map() in .c code? I can introduce 
some wrappers in include/linux/msi.h like 
msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not conceptually 
change anything.

>  There needs to be a real, interface there that isn't specific to vfio.

 Hm... What else is going to use this?
 Actually, in my implementation the only thing specific to vfio is using struct 
vfio_iommu_driver_ops. This is because we have to perform MSI mapping for all 
"vfio domains" registered for this container. At least this is how original 
type1 driver works.
 Can anybody explain to me what these "vfio domains" are? From the code it looks 
like we can have several IOMMU instances belonging to one VFIO container, and 
in this case one IOMMU == one "vfio domain". So is my understanding correct 
that a "vfio domain" is an IOMMU instance?
 And here come completely different ideas...
 First of all, can anybody explain why I perform all mappings on a per-IOMMU 
basis, not on a per-device basis? AFAIK at least the ARM SMMU knows about "stream 
IDs", and therefore it should be capable of distinguishing between different 
devices. So can I have per-device mappings? This would make things much simpler.
 So:
 Idea 1: do per-device mappings. In this case I don't have to track down which 
devices belong to which group and which IOMMU...
 Idea 2: What if we indeed simply simulate x86 behavior? What if we just do 1:1 
mapping for MSI register when IOMMU is initialized and forget about it, so that 
MSI messages are guaranteed to reach the host? Or would this mean that we would 
have to do 1:1 mapping for the whole address range? Looks like (1) tried to do 
something similar, with address reservation.
 Idea 3: Is single device guaranteed to correspond to a single "vfio domain" 
(and, as a consequence, to a single IOMMU)? In this case it's very easy to 
unlink interface introduced by 0002 of my series from vfio, and pass just raw 
struct iommu_domain * without any driver_ops? irqchip code would only need 
iommu_map() and iommu_unmap() then, no calling back to vfio layer.

Cc'ed to authors of all mentioned series

> [1] http://www.spinics.net/lists/kvm/msg121669.html,
> http://www.spinics.net/lists/kvm/msg121662.html
> [2] http://www.spinics.net/lists/kvm/msg119236.html

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-03 Thread Lan, Tianyu


On 12/2/2015 10:31 PM, Michael S. Tsirkin wrote:

>We hope
>to find a better way to make SRIOV NIC work in these cases and this is
>worth to do since SRIOV NIC provides better network performance compared
>with PV NIC.

> If this is a performance optimization as the above implies,
> you need to include some numbers, and document how did
> you implement the switch and how did you measure the performance.



OK. Some ideas in my patches come from the paper "CompSC: Live Migration with
Pass-through Devices".
http://www.cl.cam.ac.uk/research/srg/netos/vee_2012/papers/p109.pdf

It compares performance data between the PV/VF switching solution and VF
migration (Chapter 7: Discussion).




>Current patches have some issues. I think we can find
>solutions for them and improve them step by step.



Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO

2015-12-03 Thread Marc Zyngier
On Thu, 3 Dec 2015 16:16:54 +0300
Pavel Fedin  wrote:

Pavel,

>  Hello!
> 
> > I like that you're making this transparent
> > for the user, but at the same time, directly calling function pointers
> > through the msi_domain_ops is quite ugly.
> 
>  Do you mean dereferencing info->ops->vfio_map() in .c code? I can introduce 
> some wrappers in include/linux/msi.h like 
> msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not conceptually 
> change anything.
> 
> >  There needs to be a real, interface there that isn't specific to vfio.
> 
>  Hm... What else is going to use this?
>  Actually, in my implementation the only thing specific to vfio is
> using struct vfio_iommu_driver_ops. This is because we have to
> perform MSI mapping for all "vfio domains" registered for this
> container. At least this is how original type1 driver works.

>  Can anybody explain me, what these "vfio domains" are? From the code
> it looks like we can have several IOMMU instances belonging to one
> VFIO container, and in this case one IOMMU == one "vfio domain". So
> is my understanding correct that "vfio domain" is IOMMU instance?

>  And here come completely different ideas...

>  First of all, can anybody explain, why do i perform all mappings on
> per-IOMMU basis, not on per-device basis? AFAIK at least ARM SMMU
> knows about "stream IDs", and therefore it should be capable of
> distinguishing between different devices. So can i have per-device
> mapping? This would make things much simpler.

Not exactly. You can have a bunch of devices behind a single
StreamID (making them completely indistinguishable). This is a system
integration decision.

>  So:
>  Idea 1: do per-device mappings. In this case i don't have to track
> down which devices belong to which group and which IOMMU...

As explained above, this doesn't work.

>  Idea 2: What if we indeed simply simulate x86 behavior? What if we
> just do 1:1 mapping for MSI register when IOMMU is initialized and
> forget about it, so that MSI messages are guaranteed to reach the
> host? Or would this mean that we would have to do 1:1 mapping for the
> whole address range? Looks like (1) tried to do something similar,
> with address reservation.

Neither. We want userspace to be in control of the memory map, and it
is the kernel's job to tell us whether or not this matches the HW
capabilities or not. A fixed mapping may completely clash with the
memory map I want (think emulating HW x on platform y), and there is no
reason why we should have the restrictions x86 has.

My understanding is that userspace should be in charge here, and maybe
obtain a range of acceptable IOVAs for the MSI doorbell to be mapped.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny.


Re: [PATCH 2/3] KVM: arm64: Correctly handle zero register in system register accesses

2015-12-03 Thread Marc Zyngier
On 03/12/15 11:55, Pavel Fedin wrote:
>  Hello!
> 
>>> It's simply more convenient to use a pointer for exchange with
>>> userspace, see vgic_v3_cpu_regs_access() and callers. I wouldn't like
>>> to refactor the code again. What's your opinion on this?
>>
>> I still don't think this is a good idea. You can still store the value
>> as an integer in vgic_v3_cpu_regs_access(), and check the write property
>> to do the writeback on read. Which is the same thing I asked for in this
>> patch.
> 
>  Started doing this and found one more (big) reason against. All sysreg 
> handlers have 'const struct sys_reg_params' declaration, and
> callers, and their callers... This 'const' is all around the code, and it 
> would take a separate huge patch to un-const'ify all this.
> Does it worth that?

maz@approximate:~/Work/arm-platforms$ git diff --stat
 arch/arm64/kvm/sys_regs.c | 36 ++--
 arch/arm64/kvm/sys_regs.h |  2 +-
 2 files changed, 19 insertions(+), 19 deletions(-)

Hardly a big deal... You can have that as a separate patch.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] KVM: VMX: fix the writing POSTED_INTR_NV

2015-12-03 Thread Paolo Bonzini


On 03/12/2015 06:29, roy.qing...@gmail.com wrote:
> From: Li RongQing  
> 
> POSTED_INTR_NV is 16bit, should not use 64bit write function
> 
> [ 5311.676074] vmwrite error: reg 3 value 0 (err 12)
>   [ 5311.680001] CPU: 49 PID: 4240 Comm: qemu-system-i38 Tainted: G I 
> 4.1.13-WR8.0.0.0_standard #1
>   [ 5311.689343] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS 
> SE5C610.86B.01.01.0008.021120151325 02/11/2015
>   [ 5311.699550]   e69a7e1c c1950de1  e69a7e38 
> fafcff45 fafebd24
>   [ 5311.706924] 0003  000c b6a06dfa e69a7e40 fafcff79 
> e69a7eb0 fafd5f57
>   [ 5311.714296] e69a7ec0 c1080600  0001 c0e18018 01be 
>  0b43
>   [ 5311.721651] Call Trace:
>   [ 5311.722942] [] dump_stack+0x4b/0x75
>   [ 5311.726467] [] vmwrite_error+0x35/0x40 [kvm_intel]
>   [ 5311.731444] [] vmcs_writel+0x29/0x30 [kvm_intel]
>   [ 5311.736228] [] vmx_create_vcpu+0x337/0xb90 [kvm_intel]
>   [ 5311.741600] [] ? dequeue_task_fair+0x2e0/0xf60
>   [ 5311.746197] [] kvm_arch_vcpu_create+0x3a/0x70 [kvm]
>   [ 5311.751278] [] kvm_vm_ioctl+0x14d/0x640 [kvm]
>   [ 5311.755771] [] ? free_pages_prepare+0x1a4/0x2d0
>   [ 5311.760455] [] ? debug_smp_processor_id+0x12/0x20
>   [ 5311.765333] [] ? sched_move_task+0xbe/0x170
>   [ 5311.769621] [] ? kmem_cache_free+0x213/0x230
>   [ 5311.774016] [] ? kvm_set_memory_region+0x60/0x60 [kvm]
>   [ 5311.779379] [] do_vfs_ioctl+0x2e2/0x500
>   [ 5311.783285] [] ? kmem_cache_free+0x213/0x230
>   [ 5311.787677] [] ? __mmdrop+0x63/0xd0
>   [ 5311.791196] [] ? __mmdrop+0x63/0xd0
>   [ 5311.794712] [] ? __mmdrop+0x63/0xd0
>   [ 5311.798234] [] ? __fget+0x57/0x90
>   [ 5311.801559] [] ? __fget_light+0x22/0x50
>   [ 5311.805464] [] SyS_ioctl+0x80/0x90
>   [ 5311.808885] [] sysenter_do_call+0x12/0x12
>   [ 5312.059280] kvm: zapping shadow pages for mmio generation wraparound
>   [ 5313.678415] kvm [4231]: vcpu0 disabled perfctr wrmsr: 0xc2 data 0x
>   [ 5313.726518] kvm [4231]: vcpu0 unhandled rdmsr: 0x570
> 
> Signed-off-by: Li RongQing 
> Cc: Yang Zhang 
> ---
>  arch/x86/kvm/vmx.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5eb56ed..418e084 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4780,7 +4780,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>  
>   vmcs_write16(GUEST_INTR_STATUS, 0);
>  
> - vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> + vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
>   vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
>   }
>  
> @@ -9505,7 +9505,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
>*/
>   vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv;
>   vmx->nested.pi_pending = false;
> - vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> + vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
>   vmcs_write64(POSTED_INTR_DESC_ADDR,
>   page_to_phys(vmx->nested.pi_desc_page) +
>   (unsigned long)(vmcs12->posted_intr_desc_addr &
> 

Indeed this is a problem for 32-bit hosts.

Paolo


[PATCH 0/1] virtio/s390: one fix

2015-12-03 Thread Cornelia Huck
Michael,

here's one fix for the virtio-ccw driver.

Patch is against master, as your git branches on mst/vhost.git seem
to be quite old. Is there some branch I can base my own branch on,
or do you prefer to take patches directly anyway?

Cornelia Huck (1):
  virtio/s390: handle error values in irb

 drivers/s390/virtio/virtio_ccw.c | 62 
 1 file changed, 37 insertions(+), 25 deletions(-)

-- 
2.3.9



[PATCH 1/1] virtio/s390: handle error values in irb

2015-12-03 Thread Cornelia Huck
The common I/O layer may pass an error value as the irb in the device's
interrupt handler (for classic channel I/O). This won't happen in
current virtio-ccw implementations, but it's better to be safe than
sorry.

Let's just return the error conveyed by the irb and clear any possible
pending I/O indications.

Signed-off-by: Cornelia Huck 
Reviewed-by: Guenther Hutzl 
Reviewed-by: Pierre Morel 
---
 drivers/s390/virtio/virtio_ccw.c | 62 
 1 file changed, 37 insertions(+), 25 deletions(-)

diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index b2a1a81..1b83159 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -984,6 +984,36 @@ static struct virtqueue *virtio_ccw_vq_by_ind(struct 
virtio_ccw_device *vcdev,
return vq;
 }
 
+static void virtio_ccw_check_activity(struct virtio_ccw_device *vcdev,
+ __u32 activity)
+{
+   if (vcdev->curr_io & activity) {
+   switch (activity) {
+   case VIRTIO_CCW_DOING_READ_FEAT:
+   case VIRTIO_CCW_DOING_WRITE_FEAT:
+   case VIRTIO_CCW_DOING_READ_CONFIG:
+   case VIRTIO_CCW_DOING_WRITE_CONFIG:
+   case VIRTIO_CCW_DOING_WRITE_STATUS:
+   case VIRTIO_CCW_DOING_SET_VQ:
+   case VIRTIO_CCW_DOING_SET_IND:
+   case VIRTIO_CCW_DOING_SET_CONF_IND:
+   case VIRTIO_CCW_DOING_RESET:
+   case VIRTIO_CCW_DOING_READ_VQ_CONF:
+   case VIRTIO_CCW_DOING_SET_IND_ADAPTER:
+   case VIRTIO_CCW_DOING_SET_VIRTIO_REV:
+   vcdev->curr_io &= ~activity;
+   wake_up(&vcdev->wait_q);
+   break;
+   default:
+   /* don't know what to do... */
+   dev_warn(&vcdev->cdev->dev,
+"Suspicious activity '%08x'\n", activity);
+   WARN_ON(1);
+   break;
+   }
+   }
+}
+
 static void virtio_ccw_int_handler(struct ccw_device *cdev,
   unsigned long intparm,
   struct irb *irb)
@@ -995,6 +1025,12 @@ static void virtio_ccw_int_handler(struct ccw_device 
*cdev,
 
if (!vcdev)
return;
+   if (IS_ERR(irb)) {
+   vcdev->err = PTR_ERR(irb);
+   virtio_ccw_check_activity(vcdev, activity);
+   /* Don't poke around indicators, something's wrong. */
+   return;
+   }
/* Check if it's a notification from the host. */
if ((intparm == 0) &&
(scsw_stctl(&irb->scsw) ==
@@ -1010,31 +1046,7 @@ static void virtio_ccw_int_handler(struct ccw_device 
*cdev,
/* Map everything else to -EIO. */
vcdev->err = -EIO;
}
-   if (vcdev->curr_io & activity) {
-   switch (activity) {
-   case VIRTIO_CCW_DOING_READ_FEAT:
-   case VIRTIO_CCW_DOING_WRITE_FEAT:
-   case VIRTIO_CCW_DOING_READ_CONFIG:
-   case VIRTIO_CCW_DOING_WRITE_CONFIG:
-   case VIRTIO_CCW_DOING_WRITE_STATUS:
-   case VIRTIO_CCW_DOING_SET_VQ:
-   case VIRTIO_CCW_DOING_SET_IND:
-   case VIRTIO_CCW_DOING_SET_CONF_IND:
-   case VIRTIO_CCW_DOING_RESET:
-   case VIRTIO_CCW_DOING_READ_VQ_CONF:
-   case VIRTIO_CCW_DOING_SET_IND_ADAPTER:
-   case VIRTIO_CCW_DOING_SET_VIRTIO_REV:
-   vcdev->curr_io &= ~activity;
-   wake_up(&vcdev->wait_q);
-   break;
-   default:
-   /* don't know what to do... */
-   dev_warn(&cdev->dev, "Suspicious activity '%08x'\n",
-activity);
-   WARN_ON(1);
-   break;
-   }
-   }
+   virtio_ccw_check_activity(vcdev, activity);
for_each_set_bit(i, &vcdev->indicators,
 sizeof(vcdev->indicators) * BITS_PER_BYTE) {
/* The bit clear must happen before the vring kick. */
-- 
2.3.9



Re: [PATCH v2 0/3] Introduce MSI hardware mapping for VFIO

2015-12-03 Thread Alex Williamson
On Thu, 2015-12-03 at 16:16 +0300, Pavel Fedin wrote:
>  Hello!
> 
> > I like that you're making this transparent
> > for the user, but at the same time, directly calling function pointers
> > through the msi_domain_ops is quite ugly.
> 
>  Do you mean dereferencing info->ops->vfio_map() in .c code? I can
> introduce some wrappers in include/linux/msi.h like
> msi_domain_vfio_map()/msi_domain_vfio_unmap(), this would not
> conceptually change anything.

But otherwise we're parsing data structures that are possibly intended
to be private.  An interface also abstracts the implementation from the
caller.

> > There needs to be a real interface there that isn't specific to
> > vfio.
> 
>  Hm... What else is going to use this?

I don't know, but pushing vfio specific data structures and concepts
into core kernel callbacks is clearly the wrong direction to go.

>  Actually, in my implementation the only thing specific to vfio is
> using struct vfio_iommu_driver_ops. This is because we have to perform
> MSI mapping for all "vfio domains" registered for this container. At
> least this is how original type1 driver works.
>  Can anybody explain me, what these "vfio domains" are? From the code
> it looks like we can have several IOMMU instances belonging to one
> VFIO container, and in this case one IOMMU == one "vfio domain". So is
> my understanding correct that "vfio domain" is IOMMU instance?

There's no such thing as a vfio domain, I think you mean iommu domains.
A vfio container represents a user iommu context.  All of the groups
(and thus devices) within a container have the same iommu mappings.
However, not all of the groups are necessarily behind iommu hardware
units that support the same set of features.  We might therefore need to
mirror the user iommu context for the container across multiple physical
iommu contexts (aka domains).  When we walk the iommu->domain_list,
we're mirroring mappings across these multiple iommu domains within the
container.

>  And here come completely different ideas...
>  First of all, can anybody explain, why do i perform all mappings on
> per-IOMMU basis, not on per-device basis? AFAIK at least ARM SMMU
> knows about "stream IDs", and therefore it should be capable of
> distinguishing between different devices. So can i have per-device
> mapping? This would make things much simpler.

vfio is built on iommu groups with the premise being that an iommu group
represents the smallest set of devices that are isolated from all other
devices both by iommu visibility and by DMA isolation (peer-to-peer).
Therefore we base iommu mappings on an iommu group because we cannot
enforce userspace isolation at a sub-group level.  In a system or
topology that is well architected for device isolation, there will be a
one-to-one mapping of iommu groups to devices.

So, iommu mappings are always on a per-group basis, but the user may
attach multiple groups to a single container, which as discussed above
represents a single iommu context.  That single context may be backed by
one or more iommu domains, depending on the capabilities of the iommu
hardware.  You're therefore not performing all mappings on a per-iommu
basis unless you have a user defined container spanning multiple iommus
which are incompatible in a way that requires us to manage them with
separate iommu domains.

The iommu's ability to do per device mappings here is irrelevant.
You're working within a user defined IOMMU context where they have
decided that all of the devices should have the same context.

>  So:
>  Idea 1: do per-device mappings. In this case i don't have to track
> down which devices belong to which group and which IOMMU...

Nak, that doesn't solve anything.

>  Idea 2: What if we indeed simply simulate x86 behavior? What if we
> just do 1:1 mapping for MSI register when IOMMU is initialized and
> forget about it, so that MSI messages are guaranteed to reach the
> host? Or would this mean that we would have to do 1:1 mapping for the
> whole address range? Looks like (1) tried to do something similar,
> with address reservation.

x86 isn't problem-free in this space.  An x86 VM is going to know that
the 0xfee00000 address range is special, it won't be backed by RAM and
won't be a DMA target, thus we'll never attempt to map it for an iova
address.  However, if we run a non-x86 VM or a userspace driver, it
doesn't necessarily know that there's anything special about that range
of iovas.  I intend to resolve this with an extension to the iommu info
ioctl that describes the available iova space for the iommu.  The
interrupt region would simply be excluded.

This may be an option for you too, but you need to consider whether it
precludes things like hotplug.  Even in the x86 case, if we have a
non-x86 VM and we try to hot-add a PCI device, we can't dynamically
remove the RAM that would interfere with with the MSI vector block.  I
don't know what that looks like on your platform, whether you can pick a
fixed range for the 

[PATCH] KVM: VMX: fix read/write sizes of VMCS fields

2015-12-03 Thread Paolo Bonzini
In theory this should have broken EPT on 32-bit kernels (due to
reading the high part of natural-width field GUEST_CR3).  Not sure
if no one noticed or the processor behaves differently from the
documentation.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/vmx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c39737ff0581..b1af1e48070b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4868,7 +4868,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
seg_setup(VCPU_SREG_CS);
vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
-   vmcs_write32(GUEST_CS_BASE, 0xffff0000);
+   vmcs_writel(GUEST_CS_BASE, 0xffff0000ul);
 
seg_setup(VCPU_SREG_DS);
seg_setup(VCPU_SREG_ES);
@@ -4904,7 +4904,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
-   vmcs_write32(GUEST_PENDING_DBG_EXCEPTIONS, 0);
+   vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0);
 
setup_msrs(vmx);
 
@@ -7893,7 +7893,7 @@ static void dump_vmcs(void)
u32 pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL);
u32 secondary_exec_control = 0;
unsigned long cr4 = vmcs_readl(GUEST_CR4);
-   u64 efer = vmcs_readl(GUEST_IA32_EFER);
+   u64 efer = vmcs_read64(GUEST_IA32_EFER);
int i, n;
 
if (cpu_has_secondary_exec_ctrls())
@@ -10159,7 +10159,7 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 * Additionally, restore L2's PDPTR to vmcs12.
 */
if (enable_ept) {
-   vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
+   vmcs12->guest_cr3 = vmcs_readl(GUEST_CR3);
vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);
-- 
1.8.3.1



[PATCH] KVM: VMX: fix read/write sizes of VMCS fields in dump_vmcs

2015-12-03 Thread Paolo Bonzini
This was not printing the high parts of several 64-bit fields on
32-bit kernels.  Separate from the previous one to make the patches
easier to review.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/vmx.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b1af1e48070b..b1a453d78155 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7909,10 +7909,10 @@ static void dump_vmcs(void)
if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT) &&
(cr4 & X86_CR4_PAE) && !(efer & EFER_LMA))
{
-   pr_err("PDPTR0 = 0x%016lx  PDPTR1 = 0x%016lx\n",
-  vmcs_readl(GUEST_PDPTR0), vmcs_readl(GUEST_PDPTR1));
-   pr_err("PDPTR2 = 0x%016lx  PDPTR3 = 0x%016lx\n",
-  vmcs_readl(GUEST_PDPTR2), vmcs_readl(GUEST_PDPTR3));
+   pr_err("PDPTR0 = 0x%016llx  PDPTR1 = 0x%016llx\n",
+  vmcs_read64(GUEST_PDPTR0), vmcs_read64(GUEST_PDPTR1));
+   pr_err("PDPTR2 = 0x%016llx  PDPTR3 = 0x%016llx\n",
+  vmcs_read64(GUEST_PDPTR2), vmcs_read64(GUEST_PDPTR3));
}
pr_err("RSP = 0x%016lx  RIP = 0x%016lx\n",
   vmcs_readl(GUEST_RSP), vmcs_readl(GUEST_RIP));
@@ -7933,16 +7933,16 @@ static void dump_vmcs(void)
vmx_dump_sel("TR:  ", GUEST_TR_SELECTOR);
if ((vmexit_ctl & (VM_EXIT_SAVE_IA32_PAT | VM_EXIT_SAVE_IA32_EFER)) ||
(vmentry_ctl & (VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_IA32_EFER)))
-   pr_err("EFER = 0x%016llx  PAT = 0x%016lx\n",
-  efer, vmcs_readl(GUEST_IA32_PAT));
-   pr_err("DebugCtl = 0x%016lx  DebugExceptions = 0x%016lx\n",
-  vmcs_readl(GUEST_IA32_DEBUGCTL),
+   pr_err("EFER = 0x%016llx  PAT = 0x%016llx\n",
+  efer, vmcs_read64(GUEST_IA32_PAT));
+   pr_err("DebugCtl = 0x%016llx  DebugExceptions = 0x%016lx\n",
+  vmcs_read64(GUEST_IA32_DEBUGCTL),
   vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS));
if (vmentry_ctl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL)
-   pr_err("PerfGlobCtl = 0x%016lx\n",
-  vmcs_readl(GUEST_IA32_PERF_GLOBAL_CTRL));
+   pr_err("PerfGlobCtl = 0x%016llx\n",
+  vmcs_read64(GUEST_IA32_PERF_GLOBAL_CTRL));
if (vmentry_ctl & VM_ENTRY_LOAD_BNDCFGS)
-   pr_err("BndCfgS = 0x%016lx\n", vmcs_readl(GUEST_BNDCFGS));
+   pr_err("BndCfgS = 0x%016llx\n", vmcs_read64(GUEST_BNDCFGS));
pr_err("Interruptibility = %08x  ActivityState = %08x\n",
   vmcs_read32(GUEST_INTERRUPTIBILITY_INFO),
   vmcs_read32(GUEST_ACTIVITY_STATE));
@@ -7971,11 +7971,12 @@ static void dump_vmcs(void)
   vmcs_read32(HOST_IA32_SYSENTER_CS),
   vmcs_readl(HOST_IA32_SYSENTER_EIP));
if (vmexit_ctl & (VM_EXIT_LOAD_IA32_PAT | VM_EXIT_LOAD_IA32_EFER))
-   pr_err("EFER = 0x%016lx  PAT = 0x%016lx\n",
-  vmcs_readl(HOST_IA32_EFER), vmcs_readl(HOST_IA32_PAT));
+   pr_err("EFER = 0x%016llx  PAT = 0x%016llx\n",
+  vmcs_read64(HOST_IA32_EFER),
+  vmcs_read64(HOST_IA32_PAT));
if (vmexit_ctl & VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL)
-   pr_err("PerfGlobCtl = 0x%016lx\n",
-  vmcs_readl(HOST_IA32_PERF_GLOBAL_CTRL));
+   pr_err("PerfGlobCtl = 0x%016llx\n",
+  vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
 
pr_err("*** Control State ***\n");
pr_err("PinBased=%08x CPUBased=%08x SecondaryExec=%08x\n",
@@ -7998,16 +7999,16 @@ static void dump_vmcs(void)
pr_err("IDTVectoring: info=%08x errcode=%08x\n",
   vmcs_read32(IDT_VECTORING_INFO_FIELD),
   vmcs_read32(IDT_VECTORING_ERROR_CODE));
-   pr_err("TSC Offset = 0x%016lx\n", vmcs_readl(TSC_OFFSET));
+   pr_err("TSC Offset = 0x%016llx\n", vmcs_read64(TSC_OFFSET));
if (secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
-   pr_err("TSC Multiplier = 0x%016lx\n",
-  vmcs_readl(TSC_MULTIPLIER));
+   pr_err("TSC Multiplier = 0x%016llx\n",
+  vmcs_read64(TSC_MULTIPLIER));
if (cpu_based_exec_ctrl & CPU_BASED_TPR_SHADOW)
pr_err("TPR Threshold = 0x%02x\n", vmcs_read32(TPR_THRESHOLD));
if (pin_based_exec_ctrl & PIN_BASED_POSTED_INTR)
pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV));
if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT))
-   pr_err("EPT pointer = 0x%016lx\n", vmcs_readl(EPT_POINTER));
+   pr_err("EPT pointer = 0x%016llx\n", vmcs_read64(EPT_POINTER));
n = vmcs_read32(CR3_TARGET_COUNT);
for (i = 0; i + 1 < n; i += 4)
pr_err("CR3 

Re: [PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support

2015-12-03 Thread Marc Zyngier
On 14/11/15 22:12, Mario Smarduch wrote:
> This patch adds vcpu fields to track lazy state, save host FPEXC, and
> offsets to fields.
> 
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm/include/asm/kvm_host.h | 6 ++
>  arch/arm/kernel/asm-offsets.c   | 2 ++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 3df1e97..f1bf551 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -107,6 +107,12 @@ struct kvm_vcpu_arch {
>   /* Interrupt related fields */
>   u32 irq_lines;  /* IRQ and FIQ levels */
>  
> + /* fp/simd dirty flag true if guest accessed register file */
> + boolvfp_dirty;

I think we do not need this bool, because it is already represented by
the state of the trapping bits.

> +
> + /* Save host FPEXC register to later restore on vcpu put */
> + u32 host_fpexc;
> +
>   /* Exception Information */
>   struct kvm_vcpu_fault_info fault;
>  
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index 871b826..9f79712 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -186,6 +186,8 @@ int main(void)
>DEFINE(VCPU_CPSR,  offsetof(struct kvm_vcpu, 
> arch.regs.usr_regs.ARM_cpsr));
>DEFINE(VCPU_HCR,   offsetof(struct kvm_vcpu, arch.hcr));
>DEFINE(VCPU_IRQ_LINES, offsetof(struct kvm_vcpu, arch.irq_lines));
> +  DEFINE(VCPU_VFP_DIRTY, offsetof(struct kvm_vcpu, arch.vfp_dirty));
> +  DEFINE(VCPU_VFP_HOST_FPEXC,offsetof(struct kvm_vcpu, 
> arch.host_fpexc));
>DEFINE(VCPU_HSR,   offsetof(struct kvm_vcpu, arch.fault.hsr));
>DEFINE(VCPU_HxFAR, offsetof(struct kvm_vcpu, arch.fault.hxfar));
>DEFINE(VCPU_HPFAR, offsetof(struct kvm_vcpu, arch.fault.hpfar));
> 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v4 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-12-03 Thread Marc Zyngier
On 14/11/15 22:12, Mario Smarduch wrote:
> This patch tracks armv7 fp/simd hardware state with a vcpu lazy flag.
> On vcpu_load saves host fpexc and enables FP access, and later enables fp/simd
> trapping if lazy flag is not set. On first fp/simd access trap to handler 
> to save host and restore guest context, disable trapping and set vcpu lazy 
> flag. On vcpu_put if flag is set save guest and restore host context and 
> always restore host fpexc.
> 
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm/include/asm/kvm_host.h   | 33 ++
>  arch/arm/kvm/arm.c| 12 
>  arch/arm/kvm/interrupts.S | 58 +++
>  arch/arm/kvm/interrupts_head.S| 26 +-
>  arch/arm64/include/asm/kvm_host.h |  6 
>  5 files changed, 104 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index f1bf551..8fc7a59 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -40,6 +40,38 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>  
> +/*
> + * Reads the host FPEXC register, saves it to vcpu context and enables the
> + * FP/SIMD unit.
> + */
> +#ifdef CONFIG_VFPv3
> +#define kvm_enable_fpexc(vcpu) { \
> + u32 fpexc = 0;  \
> + asm volatile(   \
> + "mrc p10, 7, %0, cr8, cr0, 0\n" \
> + "str %0, [%1]\n"\
> + "orr %0, %0, #(1 << 30)\n"  \
> + "mcr p10, 7, %0, cr8, cr0, 0\n" \

Don't you need an ISB here? Also, it would be a lot nicer if this was a
real function (possibly inlined). I don't see any real reason to make
this a #define.

Also, you're preserving a lot of the host's FPEXC bits. Is that safe?

> + : "+r" (fpexc)  \
> + : "r" (&vcpu->arch.host_fpexc)  \
> + );  \
> +}
> +#else
> +#define kvm_enable_fpexc(vcpu)
> +#endif
> +
> +/* Restores host FPEXC register */
> +#ifdef CONFIG_VFPv3
> +#define kvm_restore_host_fpexc(vcpu) {   \
> + asm volatile(   \
> + "mcr p10, 7, %0, cr8, cr0, 0\n" \
> + : : "r" (vcpu->arch.host_fpexc) \
> + );  \
> +}
> +#else
> +#define kvm_restore_host_fpexc(vcpu)
> +#endif
> +
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> @@ -227,6 +259,7 @@ int kvm_perf_teardown(void);
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>  
>  static inline void kvm_arch_hardware_disable(void) {}
>  static inline void kvm_arch_hardware_unsetup(void) {}
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index dc017ad..cfc348a 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -291,10 +291,22 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>   vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>  
>   kvm_arm_set_running_vcpu(vcpu);
> +
> + /* Save and enable FPEXC before we load guest context */
> + kvm_enable_fpexc(vcpu);
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> + /* If the fp/simd registers are dirty save guest, restore host. */
> + if (vcpu->arch.vfp_dirty) {

See my previous comment about the dirty state.

> + kvm_restore_host_vfp_state(vcpu);
> + vcpu->arch.vfp_dirty = 0;
> + }
> +
> + /* Restore host FPEXC trashed in vcpu_load */
> + kvm_restore_host_fpexc(vcpu);
> +
>   /*
>* The arch-generic KVM code expects the cpu field of a vcpu to be -1
>* if the vcpu is no longer assigned to a cpu.  This is used for the
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 900ef6d..1ddaa89 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -28,6 +28,26 @@
>  #include "interrupts_head.S"
>  
>   .text
> +/**
> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
> + *   This function is called from host to save the guest, and restore host
> + *   fp/simd hardware context. It's placed outside of hyp start/end region.
> + */
> +ENTRY(kvm_restore_host_vfp_state)
> +#ifdef CONFIG_VFPv3
> + push{r4-r7}
> +
> + add r7, vcpu, #VCPU_VFP_GUEST
> + store_vfp_state r7
> +
> + add r7, vcpu, #VCPU_VFP_HOST
> + ldr r7, [r7]
> + restore_vfp_state r7
> +
> + pop {r4-r7}
> +#endif
> + bx  lr
> +ENDPROC(kvm_restore_host_vfp_state)
>  

Please don't mix things that are 

Re: [RFC PATCH V2 02/10] Qemu/VFIO: Add new VFIO_GET_PCI_CAP_INFO ioctl cmd definition

2015-12-03 Thread Alex Williamson
On Thu, 2015-12-03 at 16:40 +0800, Lan, Tianyu wrote:
> On 12/3/2015 6:25 AM, Alex Williamson wrote:
> > I didn't seen a matching kernel patch series for this, but why is the
> > kernel more capable of doing this than userspace is already?
> The following link is the kernel patch.
> http://marc.info/?l=kvm=144837328920989=2
> 
> > These seem
> > like pointless ioctls, we're creating a purely virtual PCI capability,
> > the kernel doesn't really need to participate in that.
> 
> The VFIO kernel driver has pci_config_map, which records the position
> and length of each PCI capability and helps find free PCI config regs.
> The Qemu side doesn't have such info and can't get the exact table size
> of a PCI capability. If we want to add such support in Qemu, we would
> need to duplicate a lot of code from vfio_pci_configs.c.

That's an internal implementation detail of the kernel, not motivation
for creating a new userspace ABI.  QEMU can recreate this data on its
own.  The kernel is in no more of an authoritative position to determine
capability extents than userspace.

> > Also, why are we
> > restricting ourselves to standard capabilities?
> 
> This version is to check whether it's on the right track, and we can
> extend this to PCI extended capability later.
> 
> > That's often a crowded
> > space and we can't always know whether an area is free or not based only
> > on it being covered by a capability.  Some capabilities can also appear
> > more than once, so there's context that isn't being passed to the kernel
> > here.  Thanks,
> 
> The regions outside of PCI capabilities are not passed to the kernel or
> used by Qemu for MSI/MSI-X. It's possible to use these places for a new
> capability. One concern is that a guest driver may abuse them, and a
> quirk masking some special regs outside of capabilities may be helpful.

That's not correct, see kernel commit
a7d1ea1c11b33bda2691f3294b4d735ed635535a.  Gaps between capabilities are
exposed with raw read-write access from the kernel and some drivers and
devices depend on this.  There's also no guarantee that there's a
sufficiently sized gap in conventional space.  Thanks,

Alex



Re: [PATCH v4 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-12-03 Thread Marc Zyngier
On 14/11/15 22:12, Mario Smarduch wrote:
> This patch tracks armv7 and armv8 fp/simd hardware state with a vcpu lazy 
> flag.
> On vcpu_load for 32 bit guests enable FP access, and later enable fp/simd
> trapping for 32 and 64 bit guests if lazy flag is not set. On first fp/simd 
> access trap to handler to save host and restore guest context, disable 
> trapping and set vcpu lazy flag. On vcpu_put if flag is set save guest and 
> restore host context and also save guest fpexc register.
> 
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm/include/asm/kvm_host.h   |  3 ++
>  arch/arm/kvm/arm.c| 18 +++--
>  arch/arm64/include/asm/kvm_asm.h  |  2 +
>  arch/arm64/include/asm/kvm_host.h | 17 +++-
>  arch/arm64/kernel/asm-offsets.c   |  1 +
>  arch/arm64/kvm/hyp.S  | 83 +--
>  6 files changed, 89 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 8fc7a59..6960ff2 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -40,6 +40,8 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>  
> +#define kvm_guest_is32bit(vcpu)  true

This should be defined as an inline function, and placed in
asm/kvm_emulate.h, probably renamed as kvm_guest_vcpu_is_32bit.

> +
>  /*
>   * Reads the host FPEXC register, saves to vcpu context and enables the
>   * FPEXC.
> @@ -260,6 +262,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
> +static inline void kvm_save_guest_fpexc(struct kvm_vcpu *vcpu) {}
>  
>  static inline void kvm_arch_hardware_disable(void) {}
>  static inline void kvm_arch_hardware_unsetup(void) {}
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index cfc348a..7a20530 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  
>   kvm_arm_set_running_vcpu(vcpu);
>  
> - /* Save and enable FPEXC before we load guest context */
> - kvm_enable_fpexc(vcpu);
> + /*
> +  * For 32bit guest executing on arm64, enable fp/simd access in
> +  * EL2. On arm32 save host fpexc and then enable fp/simd access.
> +  */
> + if (kvm_guest_is32bit(vcpu))
> + kvm_enable_fpexc(vcpu);
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -301,10 +305,18 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>   /* If the fp/simd registers are dirty save guest, restore host. */
>   if (vcpu->arch.vfp_dirty) {
>   kvm_restore_host_vfp_state(vcpu);
> +
> + /*
> +  * For 32bit guest on arm64 save the guest fpexc register
> +  * in EL2 mode.
> +  */
> + if (kvm_guest_is32bit(vcpu))
> + kvm_save_guest_fpexc(vcpu);
> +
>   vcpu->arch.vfp_dirty = 0;
>   }
>  
> - /* Restore host FPEXC trashed in vcpu_load */
> + /* For arm32 restore host FPEXC trashed in vcpu_load. */
>   kvm_restore_host_fpexc(vcpu);
>  
>   /*
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 5e37710..c589ca9 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
>  extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> +extern void __kvm_enable_fpexc32(void);
> +extern void __kvm_save_fpexc32(struct kvm_vcpu *vcpu);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 83e65dd..6e2d6b5 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -41,6 +41,8 @@
>  
>  #define KVM_VCPU_MAX_FEATURES 3
>  
> +#define kvm_guest_is32bit(vcpu)  (!(vcpu->arch.hcr_el2 & HCR_RW))
> +
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>  int kvm_arch_dev_ioctl_check_extension(long ext);
> @@ -251,9 +253,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> -static inline void kvm_enable_fpexc(struct kvm_vcpu *vcpu) {}
> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
> +
> +static inline void kvm_enable_fpexc(struct kvm_vcpu *vcpu)
> +{
> + /* Enable FP/SIMD access from EL2 mode*/
> + kvm_call_hyp(__kvm_enable_fpexc32);
> +}
> +
> +static inline void kvm_save_guest_fpexc(struct kvm_vcpu *vcpu)
> +{

[PATCH] KVM: vmx: detect mismatched size in VMCS read/write

2015-12-03 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
I am sending this as RFC because the error messages it produces are
very ugly.  Because of inlining, the original line is lost.  The
alternative is to change vmcs_read/write/checkXX into macros, but
then you need to have a single huge BUILD_BUG_ON or BUILD_BUG_ON_MSG
because multiple BUILD_BUG_ON* with the same __LINE__ are not
supported well.
---
 arch/x86/kvm/vmx.c | 100 -
 1 file changed, 83 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b1a453d78155..62d958a1ec0f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1447,7 +1447,51 @@ static inline void ept_sync_context(u64 eptp)
}
 }
 
-static __always_inline unsigned long vmcs_readl(unsigned long field)
+static __always_inline void vmcs_check16(unsigned long field)
+{
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2000,
+                         "16-bit accessor invalid for 64-bit field");
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2001,
+                         "16-bit accessor invalid for 64-bit high field");
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x4000,
+                         "16-bit accessor invalid for 32-bit high field");
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x6000,
+                         "16-bit accessor invalid for natural width field");
+}
+
+static __always_inline void vmcs_check32(unsigned long field)
+{
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0,
+                         "32-bit accessor invalid for 16-bit field");
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x6000,
+                         "32-bit accessor invalid for natural width field");
+}
+
+static __always_inline void vmcs_check64(unsigned long field)
+{
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0,
+                         "64-bit accessor invalid for 16-bit field");
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2001,
+                         "64-bit accessor invalid for 64-bit high field");
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x4000,
+                         "64-bit accessor invalid for 32-bit field");
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x6000,
+                         "64-bit accessor invalid for natural width field");
+}
+
+static __always_inline void vmcs_checkl(unsigned long field)
+{
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0,
+                         "Natural width accessor invalid for 16-bit field");
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2000,
+                         "Natural width accessor invalid for 64-bit field");
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6001) == 0x2001,
+                         "Natural width accessor invalid for 64-bit high field");
+        BUILD_BUG_ON_MSG(__builtin_constant_p(field) && ((field) & 0x6000) == 0x4000,
+                         "Natural width accessor invalid for 32-bit field");
+}
+
+static __always_inline unsigned long __vmcs_readl(unsigned long field)
 {
unsigned long value;
 
@@ -1458,23 +1502,32 @@ static __always_inline unsigned long vmcs_readl(unsigned long field)
 
 static __always_inline u16 vmcs_read16(unsigned long field)
 {
-   return vmcs_readl(field);
+   vmcs_check16(field);
+   return __vmcs_readl(field);
 }
 
 static __always_inline u32 vmcs_read32(unsigned long field)
 {
-   return vmcs_readl(field);
+   vmcs_check32(field);
+   return __vmcs_readl(field);
 }
 
 static __always_inline u64 vmcs_read64(unsigned long field)
 {
+   vmcs_check64(field);
 #ifdef CONFIG_X86_64
-   return vmcs_readl(field);
+   return __vmcs_readl(field);
 #else
-   return vmcs_readl(field) | ((u64)vmcs_readl(field+1) << 32);
+   return __vmcs_readl(field) | ((u64)__vmcs_readl(field+1) << 32);
 #endif
 }
 
+static __always_inline unsigned long vmcs_readl(unsigned long field)
+{
+   vmcs_checkl(field);
+   return __vmcs_readl(field);
+}
+
 static noinline void vmwrite_error(unsigned long field, unsigned long value)
 {
printk(KERN_ERR "vmwrite error: reg %lx value %lx (err %d)\n",
@@ -1482,7 +1535,7 @@ static noinline void vmwrite_error(unsigned long field, unsigned long value)
dump_stack();
 }
 
-static void vmcs_writel(unsigned long field, unsigned long value)
+static __always_inline void __vmcs_writel(unsigned long field, unsigned long value)
 {
u8 error;
 
@@ -1492,33 +1545,46 @@ static void vmcs_writel(unsigned long field, unsigned long
Re: [PATCH v2 0/5] Add virtio transport for AF_VSOCK

2015-12-03 Thread David Miller
From: Stefan Hajnoczi 
Date: Wed,  2 Dec 2015 14:43:58 +0800

> This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).

Series applied, thanks.


Re: [PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support

2015-12-03 Thread Mario Smarduch


On 12/3/2015 7:46 AM, Marc Zyngier wrote:
> On 14/11/15 22:12, Mario Smarduch wrote:
>> This patch adds vcpu fields to track lazy state, save host FPEXC, and
>> offsets to fields.
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/include/asm/kvm_host.h | 6 ++
>>  arch/arm/kernel/asm-offsets.c   | 2 ++
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 3df1e97..f1bf551 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -107,6 +107,12 @@ struct kvm_vcpu_arch {
>>  /* Interrupt related fields */
>>  u32 irq_lines;  /* IRQ and FIQ levels */
>>  
>> +/* fp/simd dirty flag true if guest accessed register file */
>> +boolvfp_dirty;
> 
> I think we do not need this bool, because it is already represented by
> the state of the trapping bits.

The trapping bit state is lost on exit since they're cleared, no?

> 
>> +
>> +/* Save host FPEXC register to later restore on vcpu put */
>> +u32 host_fpexc;
>> +
>>  /* Exception Information */
>>  struct kvm_vcpu_fault_info fault;
>>  
>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>> index 871b826..9f79712 100644
>> --- a/arch/arm/kernel/asm-offsets.c
>> +++ b/arch/arm/kernel/asm-offsets.c
>> @@ -186,6 +186,8 @@ int main(void)
>>DEFINE(VCPU_CPSR, offsetof(struct kvm_vcpu, 
>> arch.regs.usr_regs.ARM_cpsr));
>>DEFINE(VCPU_HCR,  offsetof(struct kvm_vcpu, arch.hcr));
>>DEFINE(VCPU_IRQ_LINES,offsetof(struct kvm_vcpu, arch.irq_lines));
>> +  DEFINE(VCPU_VFP_DIRTY,offsetof(struct kvm_vcpu, arch.vfp_dirty));
>> +  DEFINE(VCPU_VFP_HOST_FPEXC,   offsetof(struct kvm_vcpu, 
>> arch.host_fpexc));
>>DEFINE(VCPU_HSR,  offsetof(struct kvm_vcpu, arch.fault.hsr));
>>DEFINE(VCPU_HxFAR,offsetof(struct kvm_vcpu, 
>> arch.fault.hxfar));
>>DEFINE(VCPU_HPFAR,offsetof(struct kvm_vcpu, 
>> arch.fault.hpfar));
>>
> 
> Thanks,
> 
>   M.
> 


Re: [PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support

2015-12-03 Thread Marc Zyngier
On 03/12/15 19:21, Mario Smarduch wrote:
> 
> 
> On 12/3/2015 7:46 AM, Marc Zyngier wrote:
>> On 14/11/15 22:12, Mario Smarduch wrote:
>>> This patch adds vcpu fields to track lazy state, save host FPEXC, and
>>> offsets to fields.
>>>
>>> Signed-off-by: Mario Smarduch 
>>> ---
>>>  arch/arm/include/asm/kvm_host.h | 6 ++
>>>  arch/arm/kernel/asm-offsets.c   | 2 ++
>>>  2 files changed, 8 insertions(+)
>>>
>>> diff --git a/arch/arm/include/asm/kvm_host.h 
>>> b/arch/arm/include/asm/kvm_host.h
>>> index 3df1e97..f1bf551 100644
>>> --- a/arch/arm/include/asm/kvm_host.h
>>> +++ b/arch/arm/include/asm/kvm_host.h
>>> @@ -107,6 +107,12 @@ struct kvm_vcpu_arch {
>>> /* Interrupt related fields */
>>> u32 irq_lines;  /* IRQ and FIQ levels */
>>>  
>>> +   /* fp/simd dirty flag true if guest accessed register file */
>>> +   boolvfp_dirty;
>>
>> I think we do not need this bool, because it is already represented by
>> the state of the trapping bits.
> 
> The trapping bit state is lost on exit since they're cleared, no?

But that's what should actually be preserved, no? At the moment, you
maintain some side state to reflect what the trapping state is. You
might as well keep it around all the time.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device

2015-12-03 Thread Yunhong Jiang
On Thu, Dec 03, 2015 at 11:55:53AM -0700, Alex Williamson wrote:
> On Thu, 2015-12-03 at 10:22 -0800, Yunhong Jiang wrote:
> > When assigning a VFIO device to a KVM guest with low latency requirement, 
> > it  
> > is better to handle the interrupt in the hard interrupt context, to reduce 
> > the context switch to/from the IRQ thread.
> > 
> > Based on discussion on https://lkml.org/lkml/2015/10/26/764, the VFIO msi 
> > interrupt is changed to use request_threaded_irq(). The primary interrupt 
> > handler tries to set the guest interrupt atomically. If it fails to achieve 
> > it, a threaded interrupt handler will be invoked.
> > 
> > The irq_bypass manager is extended for this purpose. The KVM eventfd will 
> > provide a irqbypass consumer to handle the interrupt at hard interrupt 
> > context. The producer will invoke the consumer's handler then.
> 
> Do you have any performance data?  Thanks,

Sorry, I should have included the data in the commit messages.

The test:
Launch a VM with an FPGA device, which triggers an interrupt every 1ms. The VM 
is launched on an isolated CPU with NOHZ_FULL enabled.
Two sets of data are collected. On the guest, a special program checks the 
latency from the time the interrupt is triggered to the time it is received by 
the guest ISR. On the host, I use perf to collect the performance data.

The performance Data with the patch:

Host perf data:
[root@otcnfv02 test-tools]# perf kvm stat record -C 34 sleep 10
[root@otcnfv02 test-tools]# perf kvm stat report
Analyze events for all VMs, all VCPUs:
 VM-EXIT              Samples  Samples%   Time%   Min Time   Max Time   Avg time
 EXTERNAL_INTERRUPT      9997    98.55%  99.31%     1.73us    17.07us   2.09us ( +- 0.21% )
 PAUSE_INSTRUCTION        127     1.25%   0.51%     0.69us     1.20us   0.84us ( +- 1.34% )
 MSR_WRITE                 20     0.20%   0.18%     1.62us     3.21us   1.93us ( +- 3.95% )

Guest data:
[nfv@rt-test-1 ~]$ ./run-int-test.sh
Latency is Min: 3.74us Max: 20.08us Avg: 4.49us
No of interrupts = 74995

The performance data without the patch:
Host perf data:
[root@otcnfv02 test-tools]# perf kvm stat record -C 34 sleep 10
[root@otcnfv02 test-tools]# perf kvm stat report
Analyze events for all VMs, all VCPUs:
 VM-EXIT              Samples  Samples%   Time%   Min Time   Max Time   Avg time
 PAUSE_INSTRUCTION     141136    87.74%  50.39%     0.69us     8.51us   0.77us ( +- 0.07% )
 EXTERNAL_INTERRUPT     19701    12.25%  49.59%     2.31us    15.93us   5.46us ( +- 0.12% )
 MSR_WRITE                 24     0.01%   0.02%     1.51us     2.22us   1.91us ( +- 1.91% )

Notice:
The EXTERNAL_INTERRUPT VMExit is different w/ the patch (9997) and w/o the 
patch (19701). It is because with threaded IRQ, the NOHZ_FULL it not working 
because two threads on the pCPU, thus we have both the FPGA device interrupt 
and the tick timer interrupt. After calculation, the average time for the 
FPGA device interrupt is 4.72 us.

Guest data:
[nfv@rt-test-1 ~]$ ./run-int-test.sh
Latency is Min: 6.70us Max: 50.38us Avg: 7.44us
No of interrupts = 42639

Thanks
--jyh

> 
> Alex
> 


Re: [PATCH 0/5] Threaded MSI interrupt for VFIO PCI device

2015-12-03 Thread Alex Williamson
On Thu, 2015-12-03 at 10:22 -0800, Yunhong Jiang wrote:
> When assigning a VFIO device to a KVM guest with low latency requirement, it  
> is better to handle the interrupt in the hard interrupt context, to reduce 
> the context switch to/from the IRQ thread.
> 
> Based on discussion on https://lkml.org/lkml/2015/10/26/764, the VFIO msi 
> interrupt is changed to use request_threaded_irq(). The primary interrupt 
> handler tries to set the guest interrupt atomically. If it fails to achieve 
> it, a threaded interrupt handler will be invoked.
> 
> The irq_bypass manager is extended for this purpose. The KVM eventfd will 
> provide a irqbypass consumer to handle the interrupt at hard interrupt 
> context. The producer will invoke the consumer's handler then.

Do you have any performance data?  Thanks,

Alex



Re: [PATCH v4 1/3] KVM/arm/arm64: add hooks for armv7 fp/simd lazy switch support

2015-12-03 Thread Mario Smarduch


On 12/3/2015 11:24 AM, Marc Zyngier wrote:
> On 03/12/15 19:21, Mario Smarduch wrote:
>>
>>
>> On 12/3/2015 7:46 AM, Marc Zyngier wrote:
>>> On 14/11/15 22:12, Mario Smarduch wrote:
 This patch adds vcpu fields to track lazy state, save host FPEXC, and
 offsets to fields.

 Signed-off-by: Mario Smarduch 
 ---
  arch/arm/include/asm/kvm_host.h | 6 ++
  arch/arm/kernel/asm-offsets.c   | 2 ++
  2 files changed, 8 insertions(+)

 diff --git a/arch/arm/include/asm/kvm_host.h 
 b/arch/arm/include/asm/kvm_host.h
 index 3df1e97..f1bf551 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -107,6 +107,12 @@ struct kvm_vcpu_arch {
/* Interrupt related fields */
u32 irq_lines;  /* IRQ and FIQ levels */
  
 +  /* fp/simd dirty flag true if guest accessed register file */
 +  boolvfp_dirty;
>>>
>>> I think we do not need this bool, because it is already represented by
>>> the state of the trapping bits.
>>
>> The trapping bit state is lost on exit since they're cleared, no?
> 
> But that's what should actually be preserved, no? At the moment, you
> maintain some side state to reflect what the trapping state is. You
> might as well keep it around all the time.

Ok I see, you should be able to preserve and use the trap registers. I'll rework
it.

> 
> Thanks,
> 
>   M.
> 


Re: [PATCH] KVM: VMX: fix the writing POSTED_INTR_NV

2015-12-03 Thread Li RongQing
On Thu, Dec 3, 2015 at 9:58 PM, Paolo Bonzini  wrote:
> Indeed this is a problem for 32-bit hosts.


I met this error when running a 32-bit kernel on an Intel Wildcat Pass platform.


-Roy


Re: [PATCH 4/5] KVM: Add the irq handling consumer

2015-12-03 Thread kbuild test robot
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]
Hi Yunhong,

[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v4.4-rc3 next-20151202]

url:
https://github.com/0day-ci/linux/commits/Yunhong-Jiang/KVM-Extract-the-irqfd_wakeup_pollin-irqfd_wakeup_pollup/20151204-024034
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: arm64-allmodconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   arch/arm64/kvm/built-in.o: In function `kvm_irqfd_assign':
>> :(.text+0x10a70): undefined reference to `irq_bypass_register_consumer'
   arch/arm64/kvm/built-in.o: In function `irqfd_shutdown':
>> :(.text+0x10cd4): undefined reference to `irq_bypass_unregister_consumer'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[PATCH 5/5] KVM: Expose x86 kvm_arch_set_irq_inatomic()

2015-12-03 Thread Yunhong Jiang
x86 supports setting an irq in atomic context; expose it to the VFIO driver.

Signed-off-by: Yunhong Jiang 
---
 arch/x86/kvm/Kconfig |  1 +
 include/linux/kvm_host.h | 19 ---
 virt/kvm/Kconfig |  3 +++
 virt/kvm/eventfd.c   |  9 -
 4 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 639a6e34500c..642e8b905c96 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -30,6 +30,7 @@ config KVM
select HAVE_KVM_IRQFD
select IRQ_BYPASS_MANAGER
select HAVE_KVM_IRQ_BYPASS
+   select KVM_SET_IRQ_INATOMIC
select HAVE_KVM_IRQ_ROUTING
select HAVE_KVM_EVENTFD
select KVM_APIC_ARCHITECTURE
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 590c46e672df..a6e237275928 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -852,9 +852,6 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 
irq, int level,
bool line_status);
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
*kvm,
int irq_source_id, int level, bool line_status);
-int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
-  struct kvm *kvm, int irq_source_id,
-  int level, bool line_status);
 bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin);
 void kvm_notify_acked_gsi(struct kvm *kvm, int gsi);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
@@ -1207,4 +1204,20 @@ int kvm_arch_update_irqfd_routing(struct kvm *kvm, 
unsigned int host_irq,
  uint32_t guest_irq, bool set);
 #endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */
 
+#ifndef CONFIG_KVM_SET_IRQ_INATOMIC
+int __attribute__((weak)) kvm_arch_set_irq_inatomic(
+   struct kvm_kernel_irq_routing_entry *irq,
+   struct kvm *kvm, int irq_source_id,
+   int level,
+   bool line_status)
+{
+   return -EWOULDBLOCK;
+}
+#else
+extern int kvm_arch_set_irq_inatomic(
+   struct kvm_kernel_irq_routing_entry *e,
+   struct kvm *kvm, int irq_source_id, int level,
+   bool line_status);
+#endif
+
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 7a79b6853583..7c99dd4724a4 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -50,3 +50,6 @@ config KVM_COMPAT
 
 config HAVE_KVM_IRQ_BYPASS
bool
+
+config KVM_SET_IRQ_INATOMIC
+   bool
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index b20a2d1bbf73..405c26742380 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -173,15 +173,6 @@ irqfd_deactivate(struct kvm_kernel_irqfd *irqfd)
	queue_work(irqfd_cleanup_wq, &irqfd->shutdown);
 }
 
-int __attribute__((weak)) kvm_arch_set_irq_inatomic(
-   struct kvm_kernel_irq_routing_entry *irq,
-   struct kvm *kvm, int irq_source_id,
-   int level,
-   bool line_status)
-{
-   return -EWOULDBLOCK;
-}
-
 static int
 irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
 {
-- 
1.8.3.1



[PATCH 2/5] VIRT: Support runtime irq_bypass consumer

2015-12-03 Thread Yunhong Jiang
Extend the irq_bypass manager to support runtime consumers. A runtime
irq_bypass consumer can handle an interrupt when the interrupt is
triggered. A runtime consumer has its handle_irq() function set and
passes an irq_context for the irq handling.

A producer keeps a list of the runtime consumers, so that it can invoke
each consumer's handle_irq() when an irq arrives.

Currently the irq_bypass manager has several code paths assuming there is
only one consumer/producer pair for each token. For example, when
registering a producer, it exits the loop after finding one matching
consumer. This is updated to support both static consumers (like the
Posted Interrupt consumer) and runtime consumers.

Signed-off-by: Yunhong Jiang 
---
 include/linux/irqbypass.h |  8 +
 virt/lib/irqbypass.c  | 82 +++
 2 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
index 1551b5b2f4c2..d5bec0c7be3a 100644
--- a/include/linux/irqbypass.h
+++ b/include/linux/irqbypass.h
@@ -12,6 +12,7 @@
 #define IRQBYPASS_H
 
 #include 
+#include 
 
 struct irq_bypass_consumer;
 
@@ -47,6 +48,9 @@ struct irq_bypass_consumer;
  */
 struct irq_bypass_producer {
struct list_head node;
+   /* Update side is synchronized by the lock on irqbypass.c */
+   struct srcu_struct srcu;
+   struct list_head consumers;
void *token;
int irq;
int (*add_consumer)(struct irq_bypass_producer *,
@@ -61,6 +65,7 @@ struct irq_bypass_producer {
  * struct irq_bypass_consumer - IRQ bypass consumer definition
  * @node: IRQ bypass manager private list management
  * @token: opaque token to match between producer and consumer
+ * @sibling: consumers with same token list management
  * @add_producer: Connect the IRQ consumer to an IRQ producer
  * @del_producer: Disconnect the IRQ consumer from an IRQ producer
  * @stop: Perform any quiesce operations necessary prior to add/del (optional)
@@ -73,6 +78,7 @@ struct irq_bypass_producer {
  */
 struct irq_bypass_consumer {
struct list_head node;
+   struct list_head sibling;
void *token;
int (*add_producer)(struct irq_bypass_consumer *,
struct irq_bypass_producer *);
@@ -80,6 +86,8 @@ struct irq_bypass_consumer {
 struct irq_bypass_producer *);
void (*stop)(struct irq_bypass_consumer *);
void (*start)(struct irq_bypass_consumer *);
+   int (*handle_irq)(void *arg);
+   void *irq_context;
 };
 
 int irq_bypass_register_producer(struct irq_bypass_producer *);
diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index 09a03b5a21ff..43ef9e2c77dc 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 MODULE_LICENSE("GPL v2");
 MODULE_DESCRIPTION("IRQ bypass manager utility module");
@@ -49,11 +50,8 @@ static int __connect(struct irq_bypass_producer *prod,
prod->del_consumer(prod, cons);
}
 
-   if (cons->start)
-   cons->start(cons);
-   if (prod->start)
-   prod->start(prod);
-
+   if (!ret && cons->handle_irq)
+		list_add_rcu(&cons->sibling, &prod->consumers);
return ret;
 }
 
@@ -71,6 +69,11 @@ static void __disconnect(struct irq_bypass_producer *prod,
if (prod->del_consumer)
prod->del_consumer(prod, cons);
 
+   if (cons->handle_irq) {
+		list_del_rcu(&cons->sibling);
+		synchronize_srcu(&prod->srcu);
+   }
+
if (cons->start)
cons->start(cons);
if (prod->start)
@@ -87,7 +90,8 @@ static void __disconnect(struct irq_bypass_producer *prod,
 int irq_bypass_register_producer(struct irq_bypass_producer *producer)
 {
struct irq_bypass_producer *tmp;
-   struct irq_bypass_consumer *consumer;
+   struct list_head *node, *next, siblings = LIST_HEAD_INIT(siblings);
+   int ret;
 
might_sleep();
 
@@ -96,6 +100,9 @@ int irq_bypass_register_producer(struct irq_bypass_producer 
*producer)
 
	mutex_lock(&lock);

+	INIT_LIST_HEAD(&producer->consumers);
+	init_srcu_struct(&producer->srcu);
+
	list_for_each_entry(tmp, &producers, node) {
		if (tmp->token == producer->token) {
			mutex_unlock(&lock);
@@ -104,23 +111,48 @@ int irq_bypass_register_producer(struct 
irq_bypass_producer *producer)
}
}
 
-	list_for_each_entry(consumer, &consumers, node) {
+	list_for_each_safe(node, next, &consumers) {
+		struct irq_bypass_consumer *consumer = container_of(
+			node, struct irq_bypass_consumer, node);
+
		if (consumer->token == producer->token) {
-			int ret = __connect(producer, consumer);
-			if (ret) {
-				mutex_unlock(&lock);
-   

[PATCH 1/5] KVM: Extract the irqfd_wakeup_pollin/irqfd_wakeup_pollup

2015-12-03 Thread Yunhong Jiang
Separate irqfd_wakeup_pollin()/irqfd_wakeup_pollup() out of irqfd_wakeup(),
so that we can reuse the logic for MSI fastpath injection.

Signed-off-by: Yunhong Jiang 
---
 virt/kvm/eventfd.c | 86 --
 1 file changed, 51 insertions(+), 35 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 46dbc0a7dfc1..c31d43b762db 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -180,6 +180,53 @@ int __attribute__((weak)) kvm_arch_set_irq_inatomic(
return -EWOULDBLOCK;
 }
 
+static int
+irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
+{
+   struct kvm *kvm = irqfd->kvm;
+   struct kvm_kernel_irq_routing_entry irq;
+   unsigned seq;
+   int idx, ret;
+
+	idx = srcu_read_lock(&kvm->irq_srcu);
+	do {
+		seq = read_seqcount_begin(&irqfd->irq_entry_sc);
+		irq = irqfd->irq_entry;
+	} while (read_seqcount_retry(&irqfd->irq_entry_sc, seq));
+	/* An event has been signaled, inject an interrupt */
+	ret = kvm_arch_set_irq_inatomic(&irq, kvm,
+					KVM_USERSPACE_IRQ_SOURCE_ID, 1,
+					false);
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+
+   return ret;
+}
+
+static int
+irqfd_wakeup_pollup(struct kvm_kernel_irqfd *irqfd)
+{
+   struct kvm *kvm = irqfd->kvm;
+   unsigned long flags;
+
+	spin_lock_irqsave(&kvm->irqfds.lock, flags);
+
+   /*
+* We must check if someone deactivated the irqfd before
+* we could acquire the irqfds.lock since the item is
+* deactivated from the KVM side before it is unhooked from
+* the wait-queue.  If it is already deactivated, we can
+* simply return knowing the other side will cleanup for us.
+* We cannot race against the irqfd going away since the
+* other side is required to acquire wqh->lock, which we hold
+*/
+   if (irqfd_is_active(irqfd))
+   irqfd_deactivate(irqfd);
+
+	spin_unlock_irqrestore(&kvm->irqfds.lock, flags);
+
+   return 0;
+}
+
 /*
  * Called with wqh->lock held and interrupts disabled
  */
@@ -189,45 +236,14 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, 
void *key)
struct kvm_kernel_irqfd *irqfd =
container_of(wait, struct kvm_kernel_irqfd, wait);
unsigned long flags = (unsigned long)key;
-   struct kvm_kernel_irq_routing_entry irq;
-   struct kvm *kvm = irqfd->kvm;
-   unsigned seq;
-   int idx;
 
-	if (flags & POLLIN) {
-		idx = srcu_read_lock(&kvm->irq_srcu);
-		do {
-			seq = read_seqcount_begin(&irqfd->irq_entry_sc);
-			irq = irqfd->irq_entry;
-		} while (read_seqcount_retry(&irqfd->irq_entry_sc, seq));
-		/* An event has been signaled, inject an interrupt */
-		if (kvm_arch_set_irq_inatomic(&irq, kvm,
-					      KVM_USERSPACE_IRQ_SOURCE_ID, 1,
-					      false) == -EWOULDBLOCK)
+	if (flags & POLLIN)
+		if (irqfd_wakeup_pollin(irqfd) == -EWOULDBLOCK)
			schedule_work(&irqfd->inject);
-		srcu_read_unlock(&kvm->irq_srcu, idx);
-	}
 
-   if (flags & POLLHUP) {
+   if (flags & POLLHUP)
/* The eventfd is closing, detach from KVM */
-   unsigned long flags;
-
-		spin_lock_irqsave(&kvm->irqfds.lock, flags);
-
-   /*
-* We must check if someone deactivated the irqfd before
-* we could acquire the irqfds.lock since the item is
-* deactivated from the KVM side before it is unhooked from
-* the wait-queue.  If it is already deactivated, we can
-* simply return knowing the other side will cleanup for us.
-* We cannot race against the irqfd going away since the
-* other side is required to acquire wqh->lock, which we hold
-*/
-   if (irqfd_is_active(irqfd))
-   irqfd_deactivate(irqfd);
-
-		spin_unlock_irqrestore(&kvm->irqfds.lock, flags);
-   }
+   irqfd_wakeup_pollup(irqfd);
 
return 0;
 }
-- 
1.8.3.1



[PATCH 3/5] VFIO: Support threaded interrupt handling on VFIO

2015-12-03 Thread Yunhong Jiang
For a VFIO device with the MSI interrupt type, it's possible to handle the
interrupt in hard interrupt context without invoking the interrupt
thread. Handling the interrupt in hard interrupt context reduces the
interrupt latency.

Signed-off-by: Yunhong Jiang 
---
 drivers/vfio/pci/vfio_pci_intrs.c | 39 ++-
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index 3b3ba15558b7..108d335c5656 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -236,12 +236,35 @@ static void vfio_intx_disable(struct vfio_pci_device 
*vdev)
kfree(vdev->ctx);
 }
 
+static irqreturn_t vfio_msihandler(int irq, void *arg)
+{
+   struct vfio_pci_irq_ctx *ctx = arg;
+	struct irq_bypass_producer *producer = &ctx->producer;
+   struct irq_bypass_consumer *consumer;
+   int ret = IRQ_HANDLED, idx;
+
+	idx = srcu_read_lock(&producer->srcu);
+
+	list_for_each_entry_rcu(consumer, &producer->consumers, sibling) {
+   /*
+* Invoke the thread handler if any consumer would block, but
+* finish all consumes.
+*/
+   if (consumer->handle_irq(consumer->irq_context) == -EWOULDBLOCK)
+   ret = IRQ_WAKE_THREAD;
+   continue;
+   }
+
+	srcu_read_unlock(&producer->srcu, idx);
+   return ret;
+}
+
 /*
  * MSI/MSI-X
  */
-static irqreturn_t vfio_msihandler(int irq, void *arg)
+static irqreturn_t vfio_msihandler_threaded(int irq, void *arg)
 {
-   struct eventfd_ctx *trigger = arg;
+   struct eventfd_ctx *trigger = ((struct vfio_pci_irq_ctx *)arg)->trigger;
 
eventfd_signal(trigger, 1);
return IRQ_HANDLED;
@@ -318,7 +341,7 @@ static int vfio_msi_set_vector_signal(struct 
vfio_pci_device *vdev,
return -EINVAL;
 
if (vdev->ctx[vector].trigger) {
-   free_irq(irq, vdev->ctx[vector].trigger);
+		free_irq(irq, &vdev->ctx[vector]);
		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
kfree(vdev->ctx[vector].name);
eventfd_ctx_put(vdev->ctx[vector].trigger);
@@ -353,8 +376,14 @@ static int vfio_msi_set_vector_signal(struct 
vfio_pci_device *vdev,
		pci_write_msi_msg(irq, &msg);
}
 
-   ret = request_irq(irq, vfio_msihandler, 0,
- vdev->ctx[vector].name, trigger);
+   /*
+* Currently the primary handler for the thread_irq will be invoked on
+* a thread, the IRQF_ONESHOT is a hack for it.
+*/
+   ret = request_threaded_irq(irq, vfio_msihandler,
+  vfio_msihandler_threaded,
+  IRQF_ONESHOT, vdev->ctx[vector].name,
+				   &vdev->ctx[vector]);
if (ret) {
kfree(vdev->ctx[vector].name);
eventfd_ctx_put(trigger);
-- 
1.8.3.1



Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-03 Thread Alexander Duyck
On Wed, Dec 2, 2015 at 6:08 AM, Lan, Tianyu  wrote:
> On 12/1/2015 11:02 PM, Michael S. Tsirkin wrote:
>>>
>>> But
>>> it requires the guest OS to do specific configuration inside and rely on
>>> the bonding driver, which blocks it from working on Windows.
>>> From the performance side, putting the VF and virtio NIC under a bonded
>>> interface will affect their performance even when not doing migration.
>>> These factors block the use of VF NIC passthrough in some use cases
>>> (especially in the cloud) which require migration.
>>
>>
>> That's really up to guest. You don't need to do bonding,
>> you can just move the IP and mac from userspace, that's
>> possible on most OS-es.
>>
>> Or write something in guest kernel that is more lightweight if you are
>> so inclined. What we are discussing here is the host-guest interface,
>> not the in-guest interface.
>>
>>> The current solution we proposed changes the NIC driver and Qemu. The
>>> guest OS doesn't need to do anything special for migration. It's easy
>>> to deploy.
>>
>>
>>
>> Except of course these patches don't even work properly yet.
>>
>> And when they do, even minor changes in host side NIC hardware across
>> migration will break guests in hard to predict ways.
>
>
> Switching between the PV and VF NIC introduces a network outage, and the
> latency of hot-plugging the VF is measurable. For some use cases (cloud
> service and OPNFV) which are sensitive to network stability and
> performance, these are unfriendly and block SR-IOV NIC usage. We hope to
> find a better way to make the SR-IOV NIC work in these cases, and this is
> worth doing since the SR-IOV NIC provides better network performance than
> a PV NIC. The current patches have some issues. I think we can find
> solutions for them and improve them step by step.

I still believe the concepts being put into use here are deeply
flawed.  You are assuming you can somehow complete the migration while
the device is active and I seriously doubt that is the case.  You are
going to cause data corruption or worse cause a kernel panic when you
end up corrupting the guest memory.

You have to halt the device at some point in order to complete the
migration.  Now I fully agree it is best to do this for as small a
window as possible.  I really think that your best approach would be
embrace and extend the current solution that is making use of bonding.
The first step being to make it so that you don't have to hot-plug the
VF until just before you halt the guest instead of before you start the
migration.  Just doing that would yield a significant gain in terms of
performance during the migration.  In addition something like that
should be able to be done without having to be overly invasive into
the drivers.  A few tweaks to the DMA API and you could probably have
that resolved.

As far as avoiding the hot-plug itself that would be better handled as
a separate follow-up, and really belongs more to the PCI layer than
the NIC device drivers.  The device drivers should already have code
for handling a suspend/resume due to a power cycle event.  If you
could make use of that then it is just a matter of implementing
something in the hot-plug or PCIe drivers that would allow QEMU to
signal when the device needs to go into D3 and when it can resume
normal operation at D0.  You could probably use the PCI Bus Master
Enable bit as the test on if the device is ready for migration or not.
If the bit is set you cannot migrate the VM, and if it is cleared than
you are ready to migrate.


[PATCH 0/5] Threaded MSI interrupt for VFIO PCI device

2015-12-03 Thread Yunhong Jiang
When assigning a VFIO device to a KVM guest with low latency requirement, it  
is better to handle the interrupt in the hard interrupt context, to reduce 
the context switch to/from the IRQ thread.

Based on discussion on https://lkml.org/lkml/2015/10/26/764, the VFIO msi 
interrupt is changed to use request_threaded_irq(). The primary interrupt 
handler tries to set the guest interrupt atomically. If it fails to achieve 
it, a threaded interrupt handler will be invoked.

The irq_bypass manager is extended for this purpose. The KVM eventfd will 
provide a irqbypass consumer to handle the interrupt at hard interrupt 
context. The producer will invoke the consumer's handler then.

Yunhong Jiang (5):
  Extract the irqfd_wakeup_pollin/irqfd_wakeup_pollup
  Support runtime irq_bypass consumer
  Support threaded interrupt handling on VFIO
  Add the irq handling consumer
  Expose x86 kvm_arch_set_irq_inatomic()

 arch/x86/kvm/Kconfig  |   1 +
 drivers/vfio/pci/vfio_pci_intrs.c |  39 ++--
 include/linux/irqbypass.h |   8 +++
 include/linux/kvm_host.h  |  19 +-
 include/linux/kvm_irqfd.h |   1 +
 virt/kvm/Kconfig  |   3 +
 virt/kvm/eventfd.c| 131 ++
 virt/lib/irqbypass.c  |  82 ++--
 8 files changed, 214 insertions(+), 70 deletions(-)

-- 
1.8.3.1



[PATCH 4/5] KVM: Add the irq handling consumer

2015-12-03 Thread Yunhong Jiang
Add an irq_bypass consumer to the KVM eventfd, so that when an MSI interrupt
is triggered from VFIO, it can be handled fast.

Signed-off-by: Yunhong Jiang 
---
 include/linux/kvm_irqfd.h |  1 +
 virt/kvm/eventfd.c| 42 ++
 2 files changed, 43 insertions(+)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index 0c1de05098c8..5573d53ccebb 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -65,6 +65,7 @@ struct kvm_kernel_irqfd {
poll_table pt;
struct work_struct shutdown;
struct irq_bypass_consumer consumer;
+   struct irq_bypass_consumer fastpath;
struct irq_bypass_producer *producer;
 };
 
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c31d43b762db..b20a2d1bbf73 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -144,6 +144,8 @@ irqfd_shutdown(struct work_struct *work)
 #ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
	irq_bypass_unregister_consumer(&irqfd->consumer);
 #endif
+   if (irqfd->fastpath.token)
+		irq_bypass_unregister_consumer(&irqfd->fastpath);
eventfd_ctx_put(irqfd->eventfd);
kfree(irqfd);
 }
@@ -203,6 +205,14 @@ irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
 }
 
 static int
+kvm_fastpath_irq(void *arg)
+{
+   struct kvm_kernel_irqfd *irqfd = arg;
+
+   return irqfd_wakeup_pollin(irqfd);
+}
+
+static int
 irqfd_wakeup_pollup(struct kvm_kernel_irqfd *irqfd)
 {
struct kvm *kvm = irqfd->kvm;
@@ -296,6 +306,34 @@ int  __attribute__((weak)) kvm_arch_update_irqfd_routing(
 }
 #endif
 
+static int kvm_fastpath_stub(struct irq_bypass_consumer *stub,
+struct irq_bypass_producer *stub1)
+{
+   return 0;
+}
+
+static void kvm_fastpath_stub1(struct irq_bypass_consumer *stub,
+struct irq_bypass_producer *stub1)
+{
+}
+
+static int setup_fastpath_consumer(struct kvm_kernel_irqfd *irqfd)
+{
+   int ret;
+
+   irqfd->fastpath.token = (void *)irqfd->eventfd;
+   irqfd->fastpath.add_producer = kvm_fastpath_stub;
+   irqfd->fastpath.del_producer = kvm_fastpath_stub1;
+   irqfd->fastpath.handle_irq = kvm_fastpath_irq;
+   irqfd->fastpath.irq_context = irqfd;
+	ret = irq_bypass_register_consumer(&irqfd->fastpath);
+
+   if (ret)
+   /* A special tag to indicate consumer not working */
+   irqfd->fastpath.token = (void *)0;
+   return ret;
+}
+
 static int
 kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 {
@@ -435,6 +473,10 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
irqfd->consumer.token, ret);
 #endif
 
+   if (setup_fastpath_consumer(irqfd))
+		pr_info("irq bypass fastpath consumer (token %p) registration fails: %d\n",
+			irqfd->eventfd, ret);
+
return 0;
 
 fail:
-- 
1.8.3.1



Re: [PATCH v4 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-12-03 Thread Mario Smarduch


On 12/3/2015 8:13 AM, Marc Zyngier wrote:
> On 14/11/15 22:12, Mario Smarduch wrote:
>> This patch tracks armv7 and armv8 fp/simd hardware state with a vcpu lazy 
>> flag.
>> On vcpu_load for 32 bit guests enable FP access, and later enable fp/simd
>> trapping for 32 and 64 bit guests if lazy flag is not set. On first fp/simd 
>> access trap to handler to save host and restore guest context, disable 
>> trapping and set vcpu lazy flag. On vcpu_put if flag is set save guest and 
>> restore host context and also save guest fpexc register.
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/include/asm/kvm_host.h   |  3 ++
>>  arch/arm/kvm/arm.c| 18 +++--
>>  arch/arm64/include/asm/kvm_asm.h  |  2 +
>>  arch/arm64/include/asm/kvm_host.h | 17 +++-
>>  arch/arm64/kernel/asm-offsets.c   |  1 +
>>  arch/arm64/kvm/hyp.S  | 83 +--
>>  6 files changed, 89 insertions(+), 35 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 8fc7a59..6960ff2 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -40,6 +40,8 @@
>>  
>>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>>  
>> +#define kvm_guest_is32bit(vcpu) true
> 
> This should be defined as an inline function, and placed in
> asm/kvm_emulate.h, probably renamed as kvm_guest_vcpu_is_32bit.

Will do; this header file should also resolve my problems on armv7.
> 
>> +
>>  /*
>>   * Reads the host FPEXC register, saves to vcpu context and enables the
>>   * FPEXC.
>> @@ -260,6 +262,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>>  void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>> +static inline void kvm_save_guest_fpexc(struct kvm_vcpu *vcpu) {}
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index cfc348a..7a20530 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  
>>  kvm_arm_set_running_vcpu(vcpu);
>>  
>> -/* Save and enable FPEXC before we load guest context */
>> -kvm_enable_fpexc(vcpu);
>> +/*
>> + * For 32bit guest executing on arm64, enable fp/simd access in
>> + * EL2. On arm32 save host fpexc and then enable fp/simd access.
>> + */
>> +if (kvm_guest_is32bit(vcpu))
>> +kvm_enable_fpexc(vcpu);
>>  }
>>  
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>> @@ -301,10 +305,18 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  /* If the fp/simd registers are dirty save guest, restore host. */
>>  if (vcpu->arch.vfp_dirty) {
>>  kvm_restore_host_vfp_state(vcpu);
>> +
>> +/*
>> + * For 32bit guest on arm64 save the guest fpexc register
>> + * in EL2 mode.
>> + */
>> +if (kvm_guest_is32bit(vcpu))
>> +kvm_save_guest_fpexc(vcpu);
>> +
>>  vcpu->arch.vfp_dirty = 0;
>>  }
>>  
>> -/* Restore host FPEXC trashed in vcpu_load */
>> +/* For arm32 restore host FPEXC trashed in vcpu_load. */
>>  kvm_restore_host_fpexc(vcpu);
>>  
>>  /*
>> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
>> index 5e37710..c589ca9 100644
>> --- a/arch/arm64/include/asm/kvm_asm.h
>> +++ b/arch/arm64/include/asm/kvm_asm.h
>> @@ -117,6 +117,8 @@ extern char __kvm_hyp_vector[];
>>  extern void __kvm_flush_vm_context(void);
>>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>> +extern void __kvm_enable_fpexc32(void);
>> +extern void __kvm_save_fpexc32(struct kvm_vcpu *vcpu);
>>  
>>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>>  
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 83e65dd..6e2d6b5 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -41,6 +41,8 @@
>>  
>>  #define KVM_VCPU_MAX_FEATURES 3
>>  
>> +#define kvm_guest_is32bit(vcpu) (!(vcpu->arch.hcr_el2 & HCR_RW))
>> +
>>  int __attribute_const__ kvm_target_cpu(void);
>>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>>  int kvm_arch_dev_ioctl_check_extension(long ext);
>> @@ -251,9 +253,20 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> -static inline void kvm_enable_fpexc(struct kvm_vcpu *vcpu) {}
>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>> +
>> +static inline void 

Re: [PATCH] KVM: VMX: fix the writing POSTED_INTR_NV

2015-12-03 Thread Yang Zhang

On 2015/12/3 13:29, roy.qing...@gmail.com wrote:

From: Li RongQing 

POSTED_INTR_NV is a 16-bit field, so we should not use the 64-bit write function.

[ 5311.676074] vmwrite error: reg 3 value 0 (err 12)
   [ 5311.680001] CPU: 49 PID: 4240 Comm: qemu-system-i38 Tainted: G I 4.1.13-WR8.0.0.0_standard #1
   [ 5311.689343] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
   [ 5311.699550]   e69a7e1c c1950de1  e69a7e38 fafcff45 fafebd24
   [ 5311.706924] 0003  000c b6a06dfa e69a7e40 fafcff79 e69a7eb0 fafd5f57
   [ 5311.714296] e69a7ec0 c1080600  0001 c0e18018 01be  0b43
   [ 5311.721651] Call Trace:
   [ 5311.722942] [] dump_stack+0x4b/0x75
   [ 5311.726467] [] vmwrite_error+0x35/0x40 [kvm_intel]
   [ 5311.731444] [] vmcs_writel+0x29/0x30 [kvm_intel]
   [ 5311.736228] [] vmx_create_vcpu+0x337/0xb90 [kvm_intel]
   [ 5311.741600] [] ? dequeue_task_fair+0x2e0/0xf60
   [ 5311.746197] [] kvm_arch_vcpu_create+0x3a/0x70 [kvm]
   [ 5311.751278] [] kvm_vm_ioctl+0x14d/0x640 [kvm]
   [ 5311.755771] [] ? free_pages_prepare+0x1a4/0x2d0
   [ 5311.760455] [] ? debug_smp_processor_id+0x12/0x20
   [ 5311.765333] [] ? sched_move_task+0xbe/0x170
   [ 5311.769621] [] ? kmem_cache_free+0x213/0x230
   [ 5311.774016] [] ? kvm_set_memory_region+0x60/0x60 [kvm]
   [ 5311.779379] [] do_vfs_ioctl+0x2e2/0x500
   [ 5311.783285] [] ? kmem_cache_free+0x213/0x230
   [ 5311.787677] [] ? __mmdrop+0x63/0xd0
   [ 5311.791196] [] ? __mmdrop+0x63/0xd0
   [ 5311.794712] [] ? __mmdrop+0x63/0xd0
   [ 5311.798234] [] ? __fget+0x57/0x90
   [ 5311.801559] [] ? __fget_light+0x22/0x50
   [ 5311.805464] [] SyS_ioctl+0x80/0x90
   [ 5311.808885] [] sysenter_do_call+0x12/0x12
   [ 5312.059280] kvm: zapping shadow pages for mmio generation wraparound
   [ 5313.678415] kvm [4231]: vcpu0 disabled perfctr wrmsr: 0xc2 data 0x
   [ 5313.726518] kvm [4231]: vcpu0 unhandled rdmsr: 0x570

Signed-off-by: Li RongQing 
Cc: Yang Zhang 
---
  arch/x86/kvm/vmx.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5eb56ed..418e084 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4780,7 +4780,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)

vmcs_write16(GUEST_INTR_STATUS, 0);

-   vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
+   vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((>pi_desc)));
}

@@ -9505,7 +9505,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 */
vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv;
vmx->nested.pi_pending = false;
-   vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
+   vmcs_write16(POSTED_INTR_NV, POSTED_INTR_VECTOR);
vmcs_write64(POSTED_INTR_DESC_ADDR,
page_to_phys(vmx->nested.pi_desc_page) +
(unsigned long)(vmcs12->posted_intr_desc_addr & (PAGE_SIZE - 1)));



Yes, it is a mistake from my original patch. :(

Reviewed-by: Yang Zhang 

--
best regards
yang


Re: [PATCH v4 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-12-03 Thread Mario Smarduch


On 12/3/2015 7:58 AM, Marc Zyngier wrote:
> On 14/11/15 22:12, Mario Smarduch wrote:
>> This patch tracks armv7 fp/simd hardware state with a vcpu lazy flag.
>> On vcpu_load saves host fpexc and enables FP access, and later enables 
>> fp/simd
>> trapping if lazy flag is not set. On first fp/simd access trap to handler 
>> to save host and restore guest context, disable trapping and set vcpu lazy 
>> flag. On vcpu_put if flag is set save guest and restore host context and 
>> always restore host fpexc.
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/include/asm/kvm_host.h   | 33 ++
>>  arch/arm/kvm/arm.c| 12 
>>  arch/arm/kvm/interrupts.S | 58 +++
>>  arch/arm/kvm/interrupts_head.S| 26 +-
>>  arch/arm64/include/asm/kvm_host.h |  6 
>>  5 files changed, 104 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index f1bf551..8fc7a59 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -40,6 +40,38 @@
>>  
>>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>>  
>> +/*
>> + * Reads the host FPEXC register, saves it to vcpu context and enables the
>> + * FP/SIMD unit.
>> + */
>> +#ifdef CONFIG_VFPv3
>> +#define kvm_enable_fpexc(vcpu) {\
>> +u32 fpexc = 0;  \
>> +asm volatile(   \
>> +"mrc p10, 7, %0, cr8, cr0, 0\n" \
>> +"str %0, [%1]\n"\
>> +"orr %0, %0, #(1 << 30)\n"  \
>> +"mcr p10, 7, %0, cr8, cr0, 0\n" \
> 
> Don't you need an ISB here? 
Yes, it does (B2.7.3); I was thinking of something else, but the manual is clear here.

> Also, it would be a lot nicer if this was a
> real function (possibly inlined). I don't see any real reason to make
> this a #define.
I had some trouble reconciling the arm and arm64 builds when making this
a function in kvm_host.h. I'll work to resolve it.

> 
> Also, you're preserving a lot of the host's FPEXC bits. Is that safe?
No, it may not be; it should just set the enable bit.
> 
>> +: "+r" (fpexc)  \
>> +: "r" (&vcpu->arch.host_fpexc)  \
>> +);  \
>> +}
>> +#else
>> +#define kvm_enable_fpexc(vcpu)
>> +#endif
>> +
>> +/* Restores host FPEXC register */
>> +#ifdef CONFIG_VFPv3
>> +#define kvm_restore_host_fpexc(vcpu) {  \
>> +asm volatile(   \
>> +"mcr p10, 7, %0, cr8, cr0, 0\n" \
>> +: : "r" (vcpu->arch.host_fpexc) \
>> +);  \
>> +}
>> +#else
>> +#define kvm_restore_host_fpexc(vcpu)
>> +#endif
>> +
>>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>>  int __attribute_const__ kvm_target_cpu(void);
>>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>> @@ -227,6 +259,7 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index dc017ad..cfc348a 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -291,10 +291,22 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>>  
>>  kvm_arm_set_running_vcpu(vcpu);
>> +
>> +/* Save and enable FPEXC before we load guest context */
>> +kvm_enable_fpexc(vcpu);
>>  }
>>  
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>> +/* If the fp/simd registers are dirty save guest, restore host. */
>> +if (vcpu->arch.vfp_dirty) {
> 
> See my previous comment about the dirty state.
Yes that change seems to be working out fine.
> 
>> +kvm_restore_host_vfp_state(vcpu);
>> +vcpu->arch.vfp_dirty = 0;
>> +}
>> +
>> +/* Restore host FPEXC trashed in vcpu_load */
>> +kvm_restore_host_fpexc(vcpu);
>> +
>>  /*
>>   * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>   * if the vcpu is no longer assigned to a cpu.  This is used for the
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 900ef6d..1ddaa89 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -28,6 +28,26 @@
>>  #include "interrupts_head.S"
>>  
>>  .text
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) -
>> + *  This function is called from host to save the guest, and restore host
>> + *  fp/simd hardware context. 

Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-03 Thread Ben Hutchings
On Wed, 2015-12-02 at 18:41 +0100, Ard Biesheuvel wrote:
> Hi Pavel,
> 
> Thanks for getting to the bottom of this.
> 
> On 1 December 2015 at 14:03, Pavel Fedin  wrote:
> > This function takes stage-II physical addresses (A.K.A. IPA), on input, not
> > real physical addresses. This causes kvm_is_device_pfn() to return wrong
> > values, depending on how much guest and host memory maps match. This
> > results in completely broken KVM on some boards. The problem has been
> > caught on Samsung proprietary hardware.
> > 
> > Cc: sta...@vger.kernel.org
> > Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's 
> > uncachedness")
> > 
> 
> That commit is not in a release yet, so no need for cc stable
[...]

But it is cc'd to stable, so unless it is going to be nacked at review
stage, any subsequent fixes should also be cc'd.

Ben.

-- 
Ben Hutchings
For every action, there is an equal and opposite criticism. - Harrison



Re: [PATCH] KVM: VMX: fix read/write sizes of VMCS fields

2015-12-03 Thread Yang Zhang

On 2015/12/3 23:11, Paolo Bonzini wrote:

In theory this should have broken EPT on 32-bit kernels (due to
reading the high part of natural-width field GUEST_CR3).  Not sure
if no one noticed or the processor behaves differently from the
documentation.


It seems we check the success of vmcs_write but not vmcs_read. Shouldn't we check vmcs_read as well?




Signed-off-by: Paolo Bonzini 
---
  arch/x86/kvm/vmx.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c39737ff0581..b1af1e48070b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4868,7 +4868,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)

seg_setup(VCPU_SREG_CS);
vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
-   vmcs_write32(GUEST_CS_BASE, 0xffff0000);
+   vmcs_writel(GUEST_CS_BASE, 0xffff0000ul);

seg_setup(VCPU_SREG_DS);
seg_setup(VCPU_SREG_ES);
@@ -4904,7 +4904,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)

vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
-   vmcs_write32(GUEST_PENDING_DBG_EXCEPTIONS, 0);
+   vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0);

setup_msrs(vmx);

@@ -7893,7 +7893,7 @@ static void dump_vmcs(void)
u32 pin_based_exec_ctrl = vmcs_read32(PIN_BASED_VM_EXEC_CONTROL);
u32 secondary_exec_control = 0;
unsigned long cr4 = vmcs_readl(GUEST_CR4);
-   u64 efer = vmcs_readl(GUEST_IA32_EFER);
+   u64 efer = vmcs_read64(GUEST_IA32_EFER);
int i, n;

if (cpu_has_secondary_exec_ctrls())
@@ -10159,7 +10159,7 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 * Additionally, restore L2's PDPTR to vmcs12.
 */
if (enable_ept) {
-   vmcs12->guest_cr3 = vmcs_read64(GUEST_CR3);
+   vmcs12->guest_cr3 = vmcs_readl(GUEST_CR3);
vmcs12->guest_pdptr0 = vmcs_read64(GUEST_PDPTR0);
vmcs12->guest_pdptr1 = vmcs_read64(GUEST_PDPTR1);
vmcs12->guest_pdptr2 = vmcs_read64(GUEST_PDPTR2);





--
best regards
yang


Re: [PATCH net-next 3/3] vhost_net: basic polling support

2015-12-03 Thread Jason Wang


On 12/02/2015 08:36 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 02, 2015 at 01:04:03PM +0800, Jason Wang wrote:
>>
>> On 12/01/2015 10:43 PM, Michael S. Tsirkin wrote:
>>> On Tue, Dec 01, 2015 at 01:17:49PM +0800, Jason Wang wrote:
 On 11/30/2015 06:44 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 25, 2015 at 03:11:29PM +0800, Jason Wang wrote:
>>> This patch tries to poll for new added tx buffer or socket receive
>>> queue for a while at the end of tx/rx processing. The maximum time
>>> spent on polling were specified through a new kind of vring ioctl.
>>>
>>> Signed-off-by: Jason Wang 
> One further enhancement would be to actually poll
> the underlying device. This should be reasonably
> straight-forward with macvtap (especially in the
> passthrough mode).
>
>
 Yes, it is. I have some patches to do this by replacing
 skb_queue_empty() with sk_busy_loop() but for tap.
>>> We probably don't want to do this unconditionally, though.
>>>
 Tests does not show
 any improvement but some regression.
>>> Did you add code to call sk_mark_napi_id on tap then?
>>> sk_busy_loop won't do anything useful without.
>> Yes I did. Probably something wrong elsewhere.
> Is this for guest-to-guest?

Nope. Like you said below, since it requires NAPI, it was external host to guest.

>  the patch to do napi
> for tap is still not upstream due to minor performance
> regression.  Want me to repost it?

Sure, I've played this a little bit in the past too.

>
  Maybe it's better to test macvtap.
>>> Same thing ...
>>>
