Re: [Qemu-devel] [PATCH 0/2] mips/kvm: Fixes for big endian MIPS64 hosts

2015-07-09 Thread Peter Maydell
On 9 July 2015 at 14:49, Paolo Bonzini pbonz...@redhat.com wrote:


 On 09/07/2015 10:13, Leon Alrae wrote:
 On 08/07/2015 16:22, Paolo Bonzini wrote:
 On 08/07/2015 17:03, James Hogan wrote:
 Hi Paolo,

 On 24/04/15 11:26, James Hogan wrote:
 A couple of small fixes for accessing 32-bit KVM registers on
 big endian, and to sign extend struct kvm_regs registers so as to
 work on MIPS64 hosts.

 James Hogan (2): mips/kvm: Fix Big endian 32-bit register access
 mips/kvm: Sign extend registers written to KVM

 Any chance of applying these in time for v2.4?

 I had assumed they'd go through Leon, but I can apply them for 2.4-rc1
 as well.

 I thought these changes would get applied via kvm queue since they touch
 target-mips/kvm.c only. But if such target-specific kvm patches usually go 
 via
 target-* tree then I'm happy to pick them up.

 It's the same---they can go in through any tree.  In this case I've now
 applied them and will send them out next week.

I've actually just applied them to master as buildfixes :-)

thanks
-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer

2015-07-09 Thread Alex Williamson
On Thu, 2015-07-09 at 14:28 +0200, Joerg Roedel wrote:
 On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote:
  Hosting the bypass manager in kernel/irq seemed appropriate, but really
  it could be anywhere.  Does anyone have a different preference or
  specifically want it under their scope?  We had originally thought of
  this as an IOMMU service, but I think we've generalized it beyond that.
  I expect we should also add the necessary hooks to turn it into a
  loadable module to keep the tinification folks happy, I'll incorporate
  the current working changes and post a version with that.
 
 Yeah, this is only an IOMMU service on x86, afaik. So drivers/iommu is
 probably the wrong place to host it.
 
 Will there be any other producers than VFIO or any other consumers than
 KVM? If not, it should live in one of these spaces. KVM is probably the
 best choice, as any hardware feature that uses this targets
 virtualization, so there will hardly ever be another consumer than KVM.

If we think that it's *only* a kvm-vfio interaction then we could add it
to virt/kvm/vfio.c.  vfio could use symbol_get to avoid a module
dependency and effectively disable the code path when not used with kvm.
The reverse model of hosting it in vfio and using symbol_get from
kvm-vfio would also work.  Do we really want to declare it to be
kvm-vfio specific though?  Another option would be to simply host it
under virt/lib with module dependencies for both vfio and kvm.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 16:13, Alex Williamson wrote:
 On Thu, 2015-07-09 at 14:28 +0200, Joerg Roedel wrote:
 On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote:
 Hosting the bypass manager in kernel/irq seemed appropriate, but really
 it could be anywhere.  Does anyone have a different preference or
 specifically want it under their scope?  We had originally thought of
 this as an IOMMU service, but I think we've generalized it beyond that.
 I expect we should also add the necessary hooks to turn it into a
 loadable module to keep the tinification folks happy, I'll incorporate
 the current working changes and post a version with that.

 Yeah, this is only an IOMMU service on x86, afaik. So drivers/iommu is
 probably the wrong place to host it.

 Will there be any other producers than VFIO or any other consumers than
 KVM? If not, it should live in one of these spaces. KVM is probably the
 best choice, as any hardware feature that uses this targets
 virtualization, so there will hardly ever be another consumer than KVM.
 
 If we think that it's *only* a kvm-vfio interaction then we could add it
 to virt/kvm/vfio.c.  vfio could use symbol_get to avoid a module
 dependency and effectively disable the code path when not used with kvm.
 The reverse model of hosting it in vfio and using symbol_get from
 kvm-vfio would also work.  Do we really want to declare it to be
 kvm-vfio specific though?  Another option would be to simply host it
 under virt/lib with module dependencies for both vfio and kvm.

I wonder if in the future we may have some kind of driver-mediated
passthrough, e.g. for network drivers.  They might use the bypass
mechanism too.  So I think drivers/vfio is too restrictive.

virt/ right now only hosts KVM, but it could for example host lguest
too.  virt/lib/ is okay with me.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

2015-07-09 Thread Christoffer Dall
On Thu, Jul 09, 2015 at 02:24:06PM +0200, Christoffer Dall wrote:
 On Thu, Jul 09, 2015 at 01:07:24PM +0100, Peter Maydell wrote:
  On 9 July 2015 at 13:05, Christoffer Dall christoffer.d...@linaro.org 
  wrote:
   As I understand it, the problem is that if we ever run a VCPU after
   reading the value, and write back the value afterwards, you potentially
   make time go backwards and get inconsistent views of time from different
   VCPUs because they may have read the time before/after updating the
   CNTVOFF.
  
  Right, but I think if QEMU does that it's a bug (and more to
  the point I don't entirely understand why we would do that
  yet, even given that we don't have a distinction between
  registers to sync always and registers to sync only on
  reset...)
  
 I think we have evidence that it does that, but we don't know why/how.
 
So I ran this through GDB, and this happens when the guest probes the
virtio devices, specifically the backtrace tells me that

virtio_current_cpu_endian () at /root/src/qemu/hw/virtio/virtio.c:594
= cc-virtio_is_big_endian
  - arm_cpu_is_big_endian
- cpu_synchronize_state
  - kvm_cpu_synchronize_state

which causes cpu-kvm_vcpu_dirty = true, which causes the run-loop to
write the CNTVOFF on a per-vcpu basis without stopping anything, as far
as I can tell.

So yeah, I guess the only required fix here is to fix QEMU in some way
as to not fiddle with the timer registers in this way, and then I
honestly don't know if we should try to fix legacy userspace in the
kernel, but based on the feedback from Jan, I suppose not.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

2015-07-09 Thread Peter Maydell
On 9 July 2015 at 15:17, Christoffer Dall christoffer.d...@linaro.org wrote:
 On Thu, Jul 09, 2015 at 02:24:06PM +0200, Christoffer Dall wrote:
 So I ran this through GDB, and this happens when the guest probes the
 virtio devices, specifically the backtrace tells me that

 virtio_current_cpu_endian () at /root/src/qemu/hw/virtio/virtio.c:594
 = cc-virtio_is_big_endian
   - arm_cpu_is_big_endian
 - cpu_synchronize_state
   - kvm_cpu_synchronize_state

 which causes cpu-kvm_vcpu_dirty = true, which causes the run-loop to
 write the CNTVOFF on a per-vcpu basis without stopping anything, as far
 as I can tell.

Ah, I was wondering if it was going to turn out to be this,
but I hadn't figured out why it was going to cause us to do
a write-back of state rather than just a read.

 So yeah, I guess the only required fix here is to fix QEMU in some way
 as to not fiddle with the timer registers in this way, and then I
 honestly don't know if we should try to fix legacy userspace in the
 kernel, but based on the feedback from Jan, I suppose not.

arm_cpu_is_big_endian() doesn't actually want to write
back any state -- all it's interested in is reading.
So we really ought not to need to write anything back there.
kvm-all.c's sync functions don't seem to provide a sync
kernel state to userspace but I promise I'm not going to
dirty it function though. I guess we could add one.

(Overall it's kind of fragile even if we can avoid this
specific case where we're writing back the counter state,
though -- we should probably think about the sync level
stuff as well.)

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: SVM: use NPT page attributes

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 04:30, Xiao Guangrong wrote:
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index 602b974a60a6..0f125c1860ec 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1085,6 +1085,47 @@ static u64 svm_compute_tsc_offset(struct
 kvm_vcpu *vcpu, u64 target_tsc)
return target_tsc - tsc;
}

 +static void svm_set_guest_pat(struct vcpu_svm *svm, u64 *g_pat)
 +{
 +struct kvm_vcpu *vcpu = svm-vcpu;
 +
 +/* Unlike Intel, AMD takes the guest's CR0.CD into account.

 I noticed this code in svm_set_cr0():

  if (!(vcpu-kvm-arch.disabled_quirks  KVM_QUIRK_CD_NW_CLEARED))
  cr0 = ~(X86_CR0_CD | X86_CR0_NW);

 gCR0.CD is hidden to CPU if KVM_QUIRK_CD_NW_CLEARED is not set and looks
 like it is the normal case after grepping Qemu code.

Hi Xiao,

yes, this is correct.  QEMU still does not have support for disabling
quirks, so gCR0.CD is currently hidden on SVM.  I would like to
include this series in 4.2, while for 4.3 I will disable the quirk above
altogether (it is superseded by the way PAT is forced to all-WB).

In any case, this is just a KVM limitation.  gCR0.CD is taken into
account by the processor when computing the effective memory type.  It
uses all of these:

- the guest page tables PAT/PCD/PWT

- the guest page tables CR3.PCD/CR3.PWD

- the nested page tables PAT/PCD/PWT

- the guest page tables CR3.PCD/CR3.PWD

- the host MTRRs

- gCR0.CD

- hCR0.CD

Paolo

 How about this one? I still do not know how SVM properly emulates
 CR0.CD? :(
 -- 
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/7] KVM: arm/arm64: gsi routing support

2015-07-09 Thread Andre Przywara
Hi Pavel,

On 09/07/15 15:37, Pavel Fedin wrote:
  Hello!
 
 v1 - v2:
 - user API changed:
   x devid id passed in kvm_irq_routing_msi
   x kept the new routing entry type: KVM_IRQ_ROUTING_EXTENDED_MSI
 
  Andre, you never replied to my last comment to the previous series.

Oh dear, my draft folder again :-( Sorry for that!

 Are you going to do the same
 change in your MSI API? Otherwise:
 1. KVM_IRQ_LINE - we have completely own convention. Well, this was already 
 done before us, we
 cannot fix it.
 2. KVM_SIGNAL_MSI - we use VALID_DEVID flag plus devid

Yes, because there is already a flag value and no other way to specify
this, in contrast to ...

 3. KVM_SET_GSI_ROUTING - we use KVM_IRQ_ROUTING_EXTENDED_MSI plus devid

Here we already have a type field with some users, so lets piggy-back on
this.

Both ioctl extensions are coupled with a per-VM capability to let
userland know that it needs to provide a device ID.

  Don't (2) and (3) together still look bad? Since we agreed on not using 
 flags, i would suggest to
 have KVM_SIGNAL_EXTENDED_MSI counterpart, which also doesn't use flags.

Using flags on its own (without an explicit capability) is what I
opposed against, not flags in general. After all, that's what they are
meant for, right? In case of KVM_SET_GSI_ROUTING it just seems awkward
to me to use a flag when a different type would do as well.

But after all, I don't have a strong opinion on that matter, so if
others prefer using a flag I am also fine with that.

Poka,
Andre.

  I know, we were already talking about it, so, if this gets ignored for the 
 second time, i assume
 the Architects decided that fancy APIs are cool, and i promise to stop this.
 
 Kind regards,
 Pavel Fedin
 Expert Engineer
 Samsung Electronics Research center Russia
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] mips/kvm: Fixes for big endian MIPS64 hosts

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 10:13, Leon Alrae wrote:
 On 08/07/2015 16:22, Paolo Bonzini wrote:
 On 08/07/2015 17:03, James Hogan wrote:
 Hi Paolo,

 On 24/04/15 11:26, James Hogan wrote:
 A couple of small fixes for accessing 32-bit KVM registers on
 big endian, and to sign extend struct kvm_regs registers so as to
 work on MIPS64 hosts.

 James Hogan (2): mips/kvm: Fix Big endian 32-bit register access 
 mips/kvm: Sign extend registers written to KVM

 Any chance of applying these in time for v2.4?

 I had assumed they'd go through Leon, but I can apply them for 2.4-rc1
 as well.
 
 I thought these changes would get applied via kvm queue since they touch
 target-mips/kvm.c only. But if such target-specific kvm patches usually go via
 target-* tree then I'm happy to pick them up.

It's the same---they can go in through any tree.  In this case I've now
applied them and will send them out next week.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for 2.4 0/2] MIPS build fixes for v2.4

2015-07-09 Thread Peter Maydell
On 9 July 2015 at 12:52, Leon Alrae leon.al...@imgtec.com wrote:
 On 09/07/2015 10:17, James Hogan wrote:
 These two patches fix build errors for the MIPS TCG backend and MIPS
 KVM.

 Please could they be applied for v2.4.

 James Hogan (2):
   tcg/mips: Fix build error from merged memop+mmu_idx parameter
   mips/kvm: Sync with newer MIPS KVM headers

  target-mips/kvm.c | 15 ++-
  tcg/mips/tcg-target.c |  4 ++--
  2 files changed, 4 insertions(+), 15 deletions(-)

 Reviewed-by: Leon Alrae leon.al...@imgtec.com

 Peter, since these are build fixes, could they be squeezed into rc0?

Applied to master, thanks.

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 0/7] KVM: arm/arm64: gsi routing support

2015-07-09 Thread Pavel Fedin
 Hello!

 v1 - v2:
 - user API changed:
   x devid id passed in kvm_irq_routing_msi
   x kept the new routing entry type: KVM_IRQ_ROUTING_EXTENDED_MSI

 Andre, you never replied to my last comment to the previous series. Are you 
going to do the same
change in your MSI API? Otherwise:
1. KVM_IRQ_LINE - we have completely own convention. Well, this was already 
done before us, we
cannot fix it.
2. KVM_SIGNAL_MSI - we use VALID_DEVID flag plus devid
3. KVM_SET_GSI_ROUTING - we use KVM_IRQ_ROUTING_EXTENDED_MSI plus devid

 Don't (2) and (3) together still look bad? Since we agreed on not using flags, 
i would suggest to
have KVM_SIGNAL_EXTENDED_MSI counterpart, which also doesn't use flags.
 I know, we were already talking about it, so, if this gets ignored for the 
second time, i assume
the Architects decided that fancy APIs are cool, and i promise to stop this.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 0/2] mips/kvm: Fixes for big endian MIPS64 hosts

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 16:00, Peter Maydell wrote:
 
  I've actually just applied them to master as buildfixes :-)
 No, wait, I'm confusing this set with a different 2-patch
 set of MIPS fixes. Paolo, can you go ahead and take them through
 the kvm tree?

Sure, I had already queued them in fact.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 0/2] mips/kvm: Fixes for big endian MIPS64 hosts

2015-07-09 Thread Peter Maydell
On 9 July 2015 at 14:58, Peter Maydell peter.mayd...@linaro.org wrote:
 On 9 July 2015 at 14:49, Paolo Bonzini pbonz...@redhat.com wrote:
 It's the same---they can go in through any tree.  In this case I've now
 applied them and will send them out next week.

 I've actually just applied them to master as buildfixes :-)

No, wait, I'm confusing this set with a different 2-patch
set of MIPS fixes. Paolo, can you go ahead and take them through
the kvm tree?

thanks
-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer

2015-07-09 Thread Joerg Roedel
On Thu, Jul 09, 2015 at 04:38:41PM +0200, Paolo Bonzini wrote:
  On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote:
  If we think that it's *only* a kvm-vfio interaction then we could add it
  to virt/kvm/vfio.c.  vfio could use symbol_get to avoid a module
  dependency and effectively disable the code path when not used with kvm.
  The reverse model of hosting it in vfio and using symbol_get from
  kvm-vfio would also work.  Do we really want to declare it to be
  kvm-vfio specific though?  Another option would be to simply host it
  under virt/lib with module dependencies for both vfio and kvm.
 
 I wonder if in the future we may have some kind of driver-mediated
 passthrough, e.g. for network drivers.  They might use the bypass
 mechanism too.  So I think drivers/vfio is too restrictive.
 
 virt/ right now only hosts KVM, but it could for example host lguest
 too.  virt/lib/ is okay with me.

Yeah, virt/lib is probably the best choice.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 0/7] KVM: arm/arm64: gsi routing support

2015-07-09 Thread Pavel Fedin
 Hi!

  3. KVM_SET_GSI_ROUTING - we use KVM_IRQ_ROUTING_EXTENDED_MSI plus devid
 
 Here we already have a type field with some users, so lets piggy-back on
 this.

 We already have 'flags' there too.

 Both ioctl extensions are coupled with a per-VM capability to let
 userland know that it needs to provide a device ID.

 Using flags on its own (without an explicit capability) is what I
 opposed against, not flags in general.

 Ok, and in your next respin you'll add the capability, correct? So that we 
will finally have all
pieces in place.

 In case of KVM_SET_GSI_ROUTING it just seems awkward
 to me to use a flag when a different type would do as well.

 Well, MSI vs Extended MSI are even not different types really. It's just MSI 
which has devid. And
we *ALREADY* have VALID_DEVID flag.

 But after all, I don't have a strong opinion on that matter, so if
 others prefer using a flag I am also fine with that.

 So, ok, to be short... My vote is for flag, because it's already there and it 
keeps up with the
style we already have. Eric, this is my final statement about it. It's up to 
you to accept or
ignore. In qemu code flag is a little bit nicer because it's just stored in a 
variable and helps to
avoid several if-else's (however small ones). Compare:
--- cut ---
kroute.gsi = virq;
kroute.type = KVM_IRQ_ROUTING_MSI;
kroute.u.msi.address_lo = (uint32_t)msg.address;
kroute.u.msi.address_hi = msg.address  32;
kroute.u.msi.data = le32_to_cpu(msg.data);
kroute.flags = kvm_msi_flags;
if (kroute.flags  KVM_MSI_VALID_DEVID) {
kroute.u.msi.devid = (pci_bus_num(dev-bus)  8) | dev-devfn;
}
--- cut ---
and:
--- cut ---
kroute.gsi = virq;
if (use_extended_msi) {
kroute.u.msi.devid = (pci_bus_num(dev-bus)  8) | dev-devfn;
kroute.type = KVM_IRQ_ROUTING_EXTENDED_MSI;
} else {
kroute.type = KVM_IRQ_ROUTING_MSI;
}
kroute.u.msi.address_lo = (uint32_t)msg.address;
kroute.u.msi.address_hi = msg.address  32;
kroute.u.msi.data = le32_to_cpu(msg.data);
kroute.flags = kvm_msi_flags;
--- cut ---

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

2015-07-09 Thread Christoffer Dall
On Thu, Jul 09, 2015 at 03:26:20PM +0100, Peter Maydell wrote:
 On 9 July 2015 at 15:17, Christoffer Dall christoffer.d...@linaro.org wrote:
  On Thu, Jul 09, 2015 at 02:24:06PM +0200, Christoffer Dall wrote:
  So I ran this through GDB, and this happens when the guest probes the
  virtio devices, specifically the backtrace tells me that
 
  virtio_current_cpu_endian () at /root/src/qemu/hw/virtio/virtio.c:594
  = cc-virtio_is_big_endian
- arm_cpu_is_big_endian
  - cpu_synchronize_state
- kvm_cpu_synchronize_state
 
  which causes cpu-kvm_vcpu_dirty = true, which causes the run-loop to
  write the CNTVOFF on a per-vcpu basis without stopping anything, as far
  as I can tell.
 
 Ah, I was wondering if it was going to turn out to be this,
 but I hadn't figured out why it was going to cause us to do
 a write-back of state rather than just a read.
 
  So yeah, I guess the only required fix here is to fix QEMU in some way
  as to not fiddle with the timer registers in this way, and then I
  honestly don't know if we should try to fix legacy userspace in the
  kernel, but based on the feedback from Jan, I suppose not.
 
 arm_cpu_is_big_endian() doesn't actually want to write
 back any state -- all it's interested in is reading.
 So we really ought not to need to write anything back there.
 kvm-all.c's sync functions don't seem to provide a sync
 kernel state to userspace but I promise I'm not going to
 dirty it function though. I guess we could add one.
 
 (Overall it's kind of fragile even if we can avoid this
 specific case where we're writing back the counter state,
 though -- we should probably think about the sync level
 stuff as well.)

Sounds to me like we should add the sync level stuff first, as it should
catch all callers, and afterwards consider adding the sync-one-way
optimization.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/7] KVM: arm/arm64: gsi routing support

2015-07-09 Thread Eric Auger
Hi Pavel, Andre,
On 07/09/2015 05:52 PM, Pavel Fedin wrote:
  Hi!
 
 3. KVM_SET_GSI_ROUTING - we use KVM_IRQ_ROUTING_EXTENDED_MSI plus devid

 Here we already have a type field with some users, so lets piggy-back on
 this.
 
  We already have 'flags' there too.
 
 Both ioctl extensions are coupled with a per-VM capability to let
 userland know that it needs to provide a device ID.
 
 Using flags on its own (without an explicit capability) is what I
 opposed against, not flags in general.
 
  Ok, and in your next respin you'll add the capability, correct? So that we 
 will finally have all
 pieces in place.
 
 In case of KVM_SET_GSI_ROUTING it just seems awkward
 to me to use a flag when a different type would do as well.
 
  Well, MSI vs Extended MSI are even not different types really. It's just 
 MSI which has devid. And
 we *ALREADY* have VALID_DEVID flag.
 
 But after all, I don't have a strong opinion on that matter, so if
 others prefer using a flag I am also fine with that.
 
  So, ok, to be short... My vote is for flag, because it's already there and 
 it keeps up with the
 style we already have. Eric, this is my final statement about it. It's up to 
 you to accept or
 ignore. In qemu code flag is a little bit nicer because it's just stored in a 
 variable and helps to
 avoid several if-else's (however small ones). Compare:
Well personally I prefer the type thing and I don't see much difference
at userspace level anyway. But I am not this kind of hyperspace
architect guy. So, since there is no consensus here, I would say let's
wait for formal reviews of our maintainers and I will align.

The v2 update is not the outcome of a consensus so I made arbitrary
decisions to progress  fix bugs and I hope this eventually works with
ITS ;-)

Best Regards

Eric
 --- cut ---
 kroute.gsi = virq;
 kroute.type = KVM_IRQ_ROUTING_MSI;
 kroute.u.msi.address_lo = (uint32_t)msg.address;
 kroute.u.msi.address_hi = msg.address  32;
 kroute.u.msi.data = le32_to_cpu(msg.data);
 kroute.flags = kvm_msi_flags;
 if (kroute.flags  KVM_MSI_VALID_DEVID) {
 kroute.u.msi.devid = (pci_bus_num(dev-bus)  8) | dev-devfn;
 }
 --- cut ---
 and:
 --- cut ---
 kroute.gsi = virq;
 if (use_extended_msi) {
 kroute.u.msi.devid = (pci_bus_num(dev-bus)  8) | dev-devfn;
 kroute.type = KVM_IRQ_ROUTING_EXTENDED_MSI;
 } else {
 kroute.type = KVM_IRQ_ROUTING_MSI;
 }
 kroute.u.msi.address_lo = (uint32_t)msg.address;
 kroute.u.msi.address_hi = msg.address  32;
 kroute.u.msi.data = le32_to_cpu(msg.data);
 kroute.flags = kvm_msi_flags;
 --- cut ---
 
 Kind regards,
 Pavel Fedin
 Expert Engineer
 Samsung Electronics Research center Russia
 
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer

2015-07-09 Thread Alex Williamson
On Thu, 2015-07-09 at 17:34 +0200, Joerg Roedel wrote:
 On Thu, Jul 09, 2015 at 04:38:41PM +0200, Paolo Bonzini wrote:
   On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote:
   If we think that it's *only* a kvm-vfio interaction then we could add it
   to virt/kvm/vfio.c.  vfio could use symbol_get to avoid a module
   dependency and effectively disable the code path when not used with kvm.
   The reverse model of hosting it in vfio and using symbol_get from
   kvm-vfio would also work.  Do we really want to declare it to be
   kvm-vfio specific though?  Another option would be to simply host it
   under virt/lib with module dependencies for both vfio and kvm.
  
  I wonder if in the future we may have some kind of driver-mediated
  passthrough, e.g. for network drivers.  They might use the bypass
  mechanism too.  So I think drivers/vfio is too restrictive.
  
  virt/ right now only hosts KVM, but it could for example host lguest
  too.  virt/lib/ is okay with me.
 
 Yeah, virt/lib is probably the best choice.

I will make it so.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-09 Thread Bandan Das
Paolo Bonzini pbonz...@redhat.com writes:

 On 09/07/2015 08:43, Laszlo Ersek wrote:
 On 07/09/15 08:09, Paolo Bonzini wrote:


 On 09/07/2015 00:36, Bandan Das wrote:
 Let userspace inquire the maximum physical address width
 of the host processors; this can be used to identify maximum
 memory that can be assigned to the guest.

 Reported-by: Laszlo Ersek ler...@redhat.com
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  arch/x86/kvm/x86.c   | 3 +++
  include/uapi/linux/kvm.h | 1 +
  2 files changed, 4 insertions(+)

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index bbaf44e..97d6746 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
 long ext)
case KVM_CAP_NR_MEMSLOTS:
r = KVM_USER_MEM_SLOTS;
break;
 +  case KVM_CAP_PHY_ADDR_WIDTH:
 +  r = boot_cpu_data.x86_phys_bits;
 +  break;

 Userspace can just use CPUID, can't it?
 
 I believe KVM's cooperation is necessary, for the following reason:
 
 The truncation only occurs when the guest-phys - host-phys translation
 is done in hardware, *and* the phys bits of the host processor are
 insufficient to represent the highest guest-phys address that the guest
 will ever face.
 
 The first condition (of course) means that the truncation depends on EPT
 being enabled. (I didn't test on AMD so I don't know if RVI has the same
 issue.) If EPT is disabled, either because the host processor lacks it,
 or because the respective kvm_intel module parameter is set so, then the
 issue cannot be experienced.
 
 Therefore I believe a KVM patch is necessary.
 
 However, this specific patch doesn't seem sufficient; it should also
 consider whether EPT is enabled. (And the ioctl should be perhaps
 renamed to reflect that -- what QEMU needs to know is not the raw
 physical address width of the host processor, but whether that width
 will cause EPT to silently truncate high guest-phys addresses.)

 Right; if you want to consider whether EPT is enabled (which is the
 right thing to do, albeit it makes for a much bigger patch) a KVM patch
 is necessary.  In that case you also need to patch the API documentation.

Note that this patch really doesn't do anything except for printing a
message that something might potentially go wrong. Without EPT, you don't
hit the processor limitation with your setup, but the user should nevertheless
still be notified. In fact, I think shadow paging code should also emulate
this behavior if the gpa is out of range.

 Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-09 Thread Laszlo Ersek
On 07/09/15 20:32, Bandan Das wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
 
 On 09/07/2015 08:43, Laszlo Ersek wrote:
 On 07/09/15 08:09, Paolo Bonzini wrote:


 On 09/07/2015 00:36, Bandan Das wrote:
 Let userspace inquire the maximum physical address width
 of the host processors; this can be used to identify maximum
 memory that can be assigned to the guest.

 Reported-by: Laszlo Ersek ler...@redhat.com
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  arch/x86/kvm/x86.c   | 3 +++
  include/uapi/linux/kvm.h | 1 +
  2 files changed, 4 insertions(+)

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index bbaf44e..97d6746 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
 long ext)
   case KVM_CAP_NR_MEMSLOTS:
   r = KVM_USER_MEM_SLOTS;
   break;
 + case KVM_CAP_PHY_ADDR_WIDTH:
 + r = boot_cpu_data.x86_phys_bits;
 + break;

 Userspace can just use CPUID, can't it?

 I believe KVM's cooperation is necessary, for the following reason:

 The truncation only occurs when the guest-phys - host-phys translation
 is done in hardware, *and* the phys bits of the host processor are
 insufficient to represent the highest guest-phys address that the guest
 will ever face.

 The first condition (of course) means that the truncation depends on EPT
 being enabled. (I didn't test on AMD so I don't know if RVI has the same
 issue.) If EPT is disabled, either because the host processor lacks it,
 or because the respective kvm_intel module parameter is set so, then the
 issue cannot be experienced.

 Therefore I believe a KVM patch is necessary.

 However, this specific patch doesn't seem sufficient; it should also
 consider whether EPT is enabled. (And the ioctl should be perhaps
 renamed to reflect that -- what QEMU needs to know is not the raw
 physical address width of the host processor, but whether that width
 will cause EPT to silently truncate high guest-phys addresses.)

 Right; if you want to consider whether EPT is enabled (which is the
 right thing to do, albeit it makes for a much bigger patch) a KVM patch
 is necessary.  In that case you also need to patch the API documentation.
 
 Note that this patch really doesn't do anything except for printing a
 message that something might potentially go wrong.

Yes.

 Without EPT, you don't
 hit the processor limitation with your setup, but the user should nevertheless
 still be notified.

I disagree.

 In fact, I think shadow paging code should also emulate
 this behavior if the gpa is out of range.

I disagree.

There is no out of range gpa. QEMU allocates enough memory, and it
should be completely transparent to the guest. The fact that it silently
breaks with nested paging if the host processor doesn't have enough
address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
sure, but I suspect it's a hardware bug). In any case the guest
shouldn't care at all. It is a *virtual* machine, and the VMM should lie
to it plausibly enough. How much RAM, and how many phys address bits the
host has, is a performance question, but it should not be a correctness
question. A 256 GB guest should run (slowly, but correctly) on a laptop
that has only 4 GB of RAM and only 36 phys addr bits, but plenty of swap
space.

Because otherwise your argument could be extrapolated as TCG should
break too if the gpa is 'out of range'.

So, I disagree. Whatever memory you give to the guest should just work
(unless of course you want to emulate a small address width for the
*VCPU*, but that's absolutely not the use case here). What we have here
is a leaky abstraction: a PCPU limitation giving away a lie that the
guest should never notice. The guest should be able to use all memory
that was specified with QEMU's -m, regardless of TCG vs. KVM-without-EPT
vs. KVM-with-EPT. If the last case cannot work (due to hardware
limitations), that's fine, but then (and only then) a warning should be
printed.

... In any case, please understand that I'm not campaigning for this
warning :) IIRC the warning was your (very welcome!) idea after I
reported the problem; I'm just trying to ensure that the warning match
the exact issue I encountered.

Thanks!
Laszlo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 0/7] KVM: arm/arm64: gsi routing support

2015-07-09 Thread Pavel Fedin
 Well personally I prefer the type thing and I don't see much difference
 at userspace level anyway. But I am not this kind of hyperspace
 architect guy. So, since there is no consensus here, I would say let's
 wait for formal reviews of our maintainers and I will align.

 Hah, okay... Don't take it for too much. Anyway your word weighs much more 
than my one...

 hope this eventually works with
 ITS ;-)

 You know... There are actually problems with irqfd. However i'm not sure 
whether they are result of
some bug or of poorly made vGIC CPU interface software emulation. I don't have 
hardware with working
vGICv3 here. With GICv2m everything is fine, but code path is quite different, 
and LPI != SPI.
 This is further overcomplicated by the fact that as soon as i start up the 
profiler, the problem
goes away. Looks like increased number of context switches have impact.
 Actually, the problem is very poor vhost-net throughput, which jumps up when i 
start the profiler
and goes back down when i stop it. Just to note...
 I'm off since now for a small vacation, will be back on monday or tuesday. See 
you guys!

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] target-i386: Sanity check host processor physical address width

2015-07-09 Thread Bandan Das
Laszlo Ersek ler...@redhat.com writes:
...

 First, see my comments on the KVM patch.

 Second, ram_size is not the right thing to compare. What should be
 checked is whether the highest guest-physical address that maps to RAM
 can be represented in the address width of the host processor (and only
 if EPT is enabled, but that sub-condition belongs to the KVM patch).

 Note that this is not the same as the check written in the patch. For
 example, if you assume a 32-bit PCI hole with size 1 GB, then a total
 guest RAM of size 63 GB will result in the highest guest-phys memory
 address being 0xF__, which just fits into 36 bits.

 Correspondingly, the above code would not print the warning for

   -m $((63 * 1024 + 1))

 on my laptop (which has address sizes   : 36 bits physical, ...), even
 though such a guest would not boot for me (with EPT enabled).

 Please see

 http://thread.gmane.org/gmane.comp.bios.tianocore.devel/15418/focus=15447

 So, ram_size in the controlling expression should be replaced with
 maximum_guest_ram_address (which should be inclusive, and the = relop
 should be preserved).
 also with memory hotplug tuned on we should check if the end of
 hotplug memory area is less then limit, i.e.:
 
   pcms-hotplug_memory.base + hotplug_mem_size  1ULL  max_phys_bits

 Seems reasonable, thanks for the hint!

Thanks Igor and Laszlo, makes sense. I am wondering if this 1GB PCI
hole is always fixed so that I can simply include that in calculating the 
maximum
guest ram address ? Or do we have to figure that out every time ?

 (The LHS in this instance is exclusive though, so equality should *not*
 trigger the warning. maximum_guest_ram_address is inclusive, and
 equality should trigger the warning. (Although equality seems quite
 impossible in practice.))

 Thanks!
 Laszlo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Thomas Huth
On Thu, 9 Jul 2015 16:07:47 +0300
Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote:
  
  
  On 09/07/2015 11:48, Laurent Vivier wrote:
   
   
   On 09/07/2015 09:49, Thomas Huth wrote:
   The option for supporting cross-endianness legacy guests in
   the vhost and tun code should only be available on systems
   that support cross-endian guests.
   
   I'm sure I misunderstand something, but what happens if we use QEMU with
   TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
   little endian host ?
  
  TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost.
  
  Paolo
 
 vhost does not require irqfd anymore.  I think ioeventfd actually works
 fine though I didn't try, it would be easy to support.

That's an interesting issue, thanks for pointing this out, Laurent! So
do we now rather want to leave everything as it currently is, in case
somebody wants to use vhost-net with a cross-endian TCG guest one day?

Or do we assume that either
a) TCG is so slow anyway that nobody wants to accelerate it with vhost
or
b) TCG vhost likely won't happen that soon so we hope that everybody
will already be using virtio 1.0 at that point in time (with a fixed
endianness) 
?
... then I think we should go on and include this patch.

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Thomas Huth
On Thu, 9 Jul 2015 16:07:47 +0300
Michael S. Tsirkin m...@redhat.com wrote:

 On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote:
  
  
  On 09/07/2015 11:48, Laurent Vivier wrote:
   
   
   On 09/07/2015 09:49, Thomas Huth wrote:
   The option for supporting cross-endianness legacy guests in
   the vhost and tun code should only be available on systems
   that support cross-endian guests.
   
   I'm sure I misunderstand something, but what happens if we use QEMU with
   TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
   little endian host ?
  
  TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost.
  
  Paolo
 
 vhost does not require irqfd anymore.  I think ioeventfd actually works
 fine though I didn't try, it would be easy to support.

That's an interesting issue, thanks for pointing this out, Laurent! So
do we now rather want to leave everything as it currently is, in case
somebody wants to use vhost-net with a cross-endian TCG guest one day?

Or do we assume that either
a) TCG is so slow anyway that nobody wants to accelerate it with vhost
or
b) TCG vhost likely won't happen that soon so we hope that everybody
will already be using virtio 1.0 at that point in time (with a fixed
endianness) 
?
... then I think we should go on and include this patch.

 Thomas
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v1 4/5] KVM: x86: Add arch specific routines for irqbypass manager

2015-07-09 Thread Alex Williamson
On Fri, 2015-07-10 at 11:00 +0800, Feng Wu wrote:
 Add the following x86 specific routines for irqbypass manger:
 
 - kvm_arch_irq_bypass_add_producer
 - kvm_arch_irq_bypass_del_producer
 - kvm_arch_irq_bypass_stop
 - kvm_arch_irq_bypass_resume
 - kvm_arch_irq_bypass_update
 
 Signed-off-by: Feng Wu feng...@intel.com
 ---
  arch/x86/kvm/Kconfig |  1 +
  arch/x86/kvm/x86.c   | 55 
 
  2 files changed, 56 insertions(+)
 
 diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
 index 22f6fcb..ae68f2a 100644
 --- a/arch/x86/kvm/Kconfig
 +++ b/arch/x86/kvm/Kconfig
 @@ -62,6 +62,7 @@ config KVM_INTEL
   # for perf_guest_get_msrs():
   depends on CPU_SUP_INTEL
   select IRQ_BYPASS_MANAGER
 + select HAVE_KVM_IRQ_BYPASS
   ---help---
 Provides support for KVM on Intel processors equipped with the VT
 extensions.
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index d81ac02..a59f7e3 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -49,6 +49,8 @@
  #include linux/pci.h
  #include linux/timekeeper_internal.h
  #include linux/pvclock_gtod.h
 +#include linux/kvm_irqfd.h
 +#include linux/irqbypass.h
  #include trace/events/kvm.h
  
  #define CREATE_TRACE_POINTS
 @@ -8023,6 +8025,59 @@ out:
   return ret;
  }
  
 +void kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
 +   struct irq_bypass_producer *prod)
 +{
 + int ret;
 + struct kvm_kernel_irqfd *irqfd =
 + container_of(cons, struct kvm_kernel_irqfd, consumer);
 +
 + irqfd-producer = prod;
 +
 + ret = kvm_arch_update_pi_irte(irqfd-kvm, prod-irq, irqfd-gsi, 1);
 + WARN_ON(ret);
 +}
 +
 +void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
 +   struct irq_bypass_producer *prod)
 +{
 + int ret;
 + struct kvm_kernel_irqfd *irqfd =
 + container_of(cons, struct kvm_kernel_irqfd, consumer);
 +
 + irqfd-producer = NULL;
 +
 + /*
 +  * When producer of consumer is unregistered, we change back to
 +  * remapped mode, so we can re-use the current implementation
 +  * when the irq is masked/disabed or the consumer side (KVM
 +  * int this case doesn't want to receive the interrupts.
 +  */
 + ret = kvm_arch_update_pi_irte(irqfd-kvm, prod-irq, irqfd-gsi, 0);
 + WARN_ON(ret);
 +}
 +
 +void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *cons)
 +{
 +}
 +
 +void kvm_arch_irq_bypass_resume(struct irq_bypass_consumer *cons)
 +{
 +}

No need for dummy functions, these callbacks will be optional.  Thanks,

Alex

 +
 +void kvm_arch_irq_bypass_update(struct irq_bypass_consumer *cons)
 +{
 + int ret;
 + struct kvm_kernel_irqfd *irqfd =
 + container_of(cons, struct kvm_kernel_irqfd, consumer);
 +
 + BUG_ON(!irqfd-producer);
 +
 + ret = kvm_arch_update_pi_irte(irqfd-kvm, irqfd-producer-irq,
 +   irqfd-gsi, 1);
 + WARN_ON(ret);
 +}
 +
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [RFC v1 5/5] Call irqbypass update routine when updating irqfd

2015-07-09 Thread Wu, Feng


 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Alex Williamson
 Sent: Friday, July 10, 2015 11:26 AM
 To: Wu, Feng
 Cc: kvm@vger.kernel.org; pbonz...@redhat.com; j...@8bytes.org;
 avi.kiv...@gmail.com; eric.au...@linaro.org
 Subject: Re: [RFC v1 5/5] Call irqbypass update routine when updating irqfd
 
 On Fri, 2015-07-10 at 11:00 +0800, Feng Wu wrote:
  Call update routine when updating irqfd, this can update the
  IRTE for Intel posted-interrupts.
 
  Signed-off-by: Feng Wu feng...@intel.com
  ---
   virt/kvm/eventfd.c | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)
 
  diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
  index a32cf6c..1226835 100644
  --- a/virt/kvm/eventfd.c
  +++ b/virt/kvm/eventfd.c
  @@ -570,8 +570,10 @@ void kvm_irq_routing_update(struct kvm *kvm)
 
  spin_lock_irq(kvm-irqfds.lock);
 
  -   list_for_each_entry(irqfd, kvm-irqfds.items, list)
  +   list_for_each_entry(irqfd, kvm-irqfds.items, list) {
  irqfd_update(kvm, irqfd);
  +   irqfd-consumer.update(irqfd-consumer);
  +   }
 
  spin_unlock_irq(kvm-irqfds.lock);
   }
 
 I don't understand why the irq bypass manager needs to know about this
 update callback.  We could just as easily make it be a function pointer
 on the irqfd structure or maybe just open code it.  It's defined by the
 consumer and called by the consumer, the irq bypass manager shouldn't
 know about it.  Thanks,

Yes, you are right. All we need is the producer information which has been
passed in the register routine. And we can easily make this update logic
inside the consumer. Thanks for your comments!

Thanks,
Feng

 
 Alex
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-09 Thread Bandan Das
Bandan Das b...@redhat.com writes:

 Laszlo Ersek ler...@redhat.com writes:
 ...
 Yes.

 Without EPT, you don't
 hit the processor limitation with your setup, but the user should 
 nevertheless
 still be notified.

 I disagree.

 In fact, I think shadow paging code should also emulate
 this behavior if the gpa is out of range.

 I disagree.

 There is no out of range gpa. QEMU allocates enough memory, and it
 should be completely transparent to the guest. The fact that it silently
 breaks with nested paging if the host processor doesn't have enough
 address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
 sure, but I suspect it's a hardware bug). In any case the guest
 shouldn't care at all. It is a *virtual* machine, and the VMM should lie
 to it plausibly enough. How much RAM, and how many phys address bits the
 host has, is a performance question, but it should not be a correctness
 question. A 256 GB guest should run (slowly, but correctly) on a laptop
 that has only 4 GB of RAM and only 36 phys addr bits, but plenty of swap
 space.

 Because otherwise your argument could be extrapolated as TCG should
 break too if the gpa is 'out of range'.

 So, I disagree. Whatever memory you give to the guest should just work
 (unless of course you want to emulate a small address width for the
 *VCPU*, but that's absolutely not the use case here). What we have here
 is a leaky abstraction: a PCPU limitation giving away a lie that the
 guest should never notice. The guest should be able to use all memory
 that was specified with QEMU's -m, regardless of TCG vs. KVM-without-EPT
 vs. KVM-with-EPT. If the last case cannot work (due to hardware
 limitations), that's fine, but then (and only then) a warning should be
 printed.

 Hmm... Ok, I understand your point. So, this is more like a EPT
 limitation/bug in that Qemu isn't complaining about the memory assigned
 to the guest but EPT code is breaking owing to the processor physical
 address width. And honestly, I now think that this patch just makes the whole
 situation more confusing :) I am wondering if it's just possible for kvm to
 simply throw an error like a EPT misconfiguration or something ..

 Or in other words, if using a hardware assisted mechanism is just not
 possible, KVM will simply not let it run instead of letting a guest
 stuck in boot.

I noticed that when the guest gets stuck, trace shows an endless loop of
EXTERNAL_INTERRUPT exits with code 14 (PF).

There's a note in 28.2.2 of the spec that No processors supporting
the Intel64 architecture support more than 48 physical-address bits..
An attempt to use such an address causes a page fault. So, my first
guess was to print out the guest physical address. That seems to be
well beyond the range and is always 0xff000 (when the guest is stuck).

The other thing I can think of is the EPT entries have bits in the
51:N range set which is reserved and always 0. I haven't verified
but it looks like there's ept_misconfig_inspect_spte() that should
already catch this condition. I am out of ideas for today :)


 ... In any case, please understand that I'm not campaigning for this
 warning :) IIRC the warning was your (very welcome!) idea after I
 reported the problem; I'm just trying to ensure that the warning match
 the exact issue I encountered.

 Thanks!
 Laszlo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] KVM: SVM: use NPT page attributes

2015-07-09 Thread Xiao Guangrong



On 07/09/2015 11:18 PM, Paolo Bonzini wrote:



On 09/07/2015 04:30, Xiao Guangrong wrote:

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 602b974a60a6..0f125c1860ec 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1085,6 +1085,47 @@ static u64 svm_compute_tsc_offset(struct
kvm_vcpu *vcpu, u64 target_tsc)
return target_tsc - tsc;
}

+static void svm_set_guest_pat(struct vcpu_svm *svm, u64 *g_pat)
+{
+struct kvm_vcpu *vcpu = svm-vcpu;
+
+/* Unlike Intel, AMD takes the guest's CR0.CD into account.


I noticed this code in svm_set_cr0():

  if (!(vcpu-kvm-arch.disabled_quirks  KVM_QUIRK_CD_NW_CLEARED))
  cr0 = ~(X86_CR0_CD | X86_CR0_NW);

gCR0.CD is hidden to CPU if KVM_QUIRK_CD_NW_CLEARED is not set and looks
like it is the normal case after grepping Qemu code.


Hi Xiao,

yes, this is correct.  QEMU still does not have support for disabling
quirks, so gCR0.CD is currently hidden on SVM.  I would like to
include this series in 4.2, while for 4.3 I will disable the quirk above
altogether (it is superseded by the way PAT is forced to all-WB).



That plan sounds good to me.

You will drop disabled_quirks completely or just enable it in Qemu? :)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v1 1/5] vfio: Register/unregister irq_bypass_producer

2015-07-09 Thread Feng Wu
This patch adds the registration/unregistration of an
irq_bypass_producer for MSI/MSIx on vfio pci devices.

Signed-off-by: Feng Wu feng...@intel.com
---
 drivers/vfio/pci/vfio_pci_intrs.c   | 8 ++--
 drivers/vfio/pci/vfio_pci_private.h | 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c 
b/drivers/vfio/pci/vfio_pci_intrs.c
index 4e053be..6e86292 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -322,7 +322,7 @@ static int vfio_msi_set_vector_signal(struct 
vfio_pci_device *vdev,
 
if (vdev-ctx[vector].trigger) {
free_irq(irq, vdev-ctx[vector].trigger);
-   /* irq_bypass_unregister_producer(); */
+   irq_bypass_unregister_producer(vdev-ctx[vector].producer);
kfree(vdev-ctx[vector].name);
eventfd_ctx_put(vdev-ctx[vector].trigger);
vdev-ctx[vector].trigger = NULL;
@@ -364,7 +364,11 @@ static int vfio_msi_set_vector_signal(struct 
vfio_pci_device *vdev,
return ret;
}
 
-   /* irq_bypass_register_producer(); */
+   INIT_LIST_HEAD(vdev-ctx[vector].producer.node);
+   vdev-ctx[vector].producer.token = trigger;
+   vdev-ctx[vector].producer.irq = irq;
+   ret = irq_bypass_register_producer(vdev-ctx[vector].producer);
+   WARN_ON(ret);
 
vdev-ctx[vector].trigger = trigger;
 
diff --git a/drivers/vfio/pci/vfio_pci_private.h 
b/drivers/vfio/pci/vfio_pci_private.h
index ae0e1b4..0e7394f 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -13,6 +13,7 @@
 
 #include linux/mutex.h
 #include linux/pci.h
+#include linux/irqbypass.h
 
 #ifndef VFIO_PCI_PRIVATE_H
 #define VFIO_PCI_PRIVATE_H
@@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
struct virqfd   *mask;
char*name;
boolmasked;
+   struct irq_bypass_producer  producer;
 };
 
 struct vfio_pci_device {
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v1 3/5] KVM: Add pointer to 'struct irq_bypass_produce' in 'kvm_kernel_irqfd'

2015-07-09 Thread Feng Wu
Add reference to struct irq_bypass_produce so we can get the producer
information from the consumer side.

Signed-off-by: Feng Wu feng...@intel.com
---
 include/linux/kvm_irqfd.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index 3c0bd07..0c1de05 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -65,6 +65,7 @@ struct kvm_kernel_irqfd {
poll_table pt;
struct work_struct shutdown;
struct irq_bypass_consumer consumer;
+   struct irq_bypass_producer *producer;
 };
 
 #endif /* __LINUX_KVM_IRQFD_H */
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v1 5/5] Call irqbypass update routine when updating irqfd

2015-07-09 Thread Feng Wu
Call update routine when updating irqfd, this can update the
IRTE for Intel posted-interrupts.

Signed-off-by: Feng Wu feng...@intel.com
---
 virt/kvm/eventfd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index a32cf6c..1226835 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -570,8 +570,10 @@ void kvm_irq_routing_update(struct kvm *kvm)
 
spin_lock_irq(kvm-irqfds.lock);
 
-   list_for_each_entry(irqfd, kvm-irqfds.items, list)
+   list_for_each_entry(irqfd, kvm-irqfds.items, list) {
irqfd_update(kvm, irqfd);
+   irqfd-consumer.update(irqfd-consumer);
+   }
 
spin_unlock_irq(kvm-irqfds.lock);
 }
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v1 2/5] KVM: x86: Update IRTE for posted-interrupts

2015-07-09 Thread Feng Wu
This patch adds the routine to update IRTE for posted-interrupts
when guest changes the interrupt configuration.

Signed-off-by: Feng Wu feng...@intel.com
---
 arch/x86/kvm/x86.c | 73 ++
 1 file changed, 73 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26eaeb5..d81ac02 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -63,6 +63,7 @@
 #include asm/fpu/internal.h /* Ugh! */
 #include asm/pvclock.h
 #include asm/div64.h
+#include asm/irq_remapping.h
 
 #define MAX_IO_MSRS 256
 #define KVM_MAX_MCE_BANKS 32
@@ -7950,6 +7951,78 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 }
 EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
 
+/*
+ * kvm_arch_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success,  0 on failure
+ */
+int kvm_arch_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+   uint32_t guest_irq, bool set)
+{
+   struct kvm_kernel_irq_routing_entry *e;
+   struct kvm_irq_routing_table *irq_rt;
+   struct kvm_lapic_irq irq;
+   struct kvm_vcpu *vcpu;
+   struct vcpu_data vcpu_info;
+   int idx, ret = -EINVAL;
+
+   if (!irq_remapping_cap(IRQ_POSTING_CAP))
+   return 0;
+
+   idx = srcu_read_lock(kvm-irq_srcu);
+   irq_rt = srcu_dereference(kvm-irq_routing, kvm-irq_srcu);
+   BUG_ON(guest_irq = irq_rt-nr_rt_entries);
+
+   hlist_for_each_entry(e, irq_rt-map[guest_irq], link) {
+   if (e-type != KVM_IRQ_ROUTING_MSI)
+   continue;
+   /*
+* VT-d PI cannot support posting multicast/broadcast
+* interrupts to a VCPU, we still use interrupt remapping
+* for these kind of interrupts.
+*
+* For lowest-priority interrupts, we only support
+* those with single CPU as the destination, e.g. user
+* configures the interrupts via /proc/irq or uses
+* irqbalance to make the interrupts single-CPU.
+*
+* We will support full lowest-priority interrupt later.
+*
+*/
+
+   kvm_set_msi_irq(e, irq);
+   if (!kvm_intr_is_single_vcpu(kvm, irq, vcpu))
+   continue;
+
+   vcpu_info.pi_desc_addr = kvm_x86_ops-get_pi_desc_addr(vcpu);
+   vcpu_info.vector = irq.vector;
+
+   if (set)
+   ret = irq_set_vcpu_affinity(host_irq, vcpu_info);
+   else {
+   /* suppress notification event before unposting */
+   kvm_x86_ops-pi_set_sn(vcpu);
+   ret = irq_set_vcpu_affinity(host_irq, NULL);
+   kvm_x86_ops-pi_clear_sn(vcpu);
+   }
+
+   if (ret  0) {
+   printk(KERN_INFO %s: failed to update PI IRTE\n,
+   __func__);
+   goto out;
+   }
+   }
+
+   ret = 0;
+out:
+   srcu_read_unlock(kvm-irq_srcu, idx);
+   return ret;
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v1 4/5] KVM: x86: Add arch specific routines for irqbypass manager

2015-07-09 Thread Feng Wu
Add the following x86 specific routines for irqbypass manger:

- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop
- kvm_arch_irq_bypass_resume
- kvm_arch_irq_bypass_update

Signed-off-by: Feng Wu feng...@intel.com
---
 arch/x86/kvm/Kconfig |  1 +
 arch/x86/kvm/x86.c   | 55 
 2 files changed, 56 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 22f6fcb..ae68f2a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -62,6 +62,7 @@ config KVM_INTEL
# for perf_guest_get_msrs():
depends on CPU_SUP_INTEL
select IRQ_BYPASS_MANAGER
+   select HAVE_KVM_IRQ_BYPASS
---help---
  Provides support for KVM on Intel processors equipped with the VT
  extensions.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d81ac02..a59f7e3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -49,6 +49,8 @@
 #include linux/pci.h
 #include linux/timekeeper_internal.h
 #include linux/pvclock_gtod.h
+#include linux/kvm_irqfd.h
+#include linux/irqbypass.h
 #include trace/events/kvm.h
 
 #define CREATE_TRACE_POINTS
@@ -8023,6 +8025,59 @@ out:
return ret;
 }
 
+void kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+   int ret;
+   struct kvm_kernel_irqfd *irqfd =
+   container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+   irqfd-producer = prod;
+
+   ret = kvm_arch_update_pi_irte(irqfd-kvm, prod-irq, irqfd-gsi, 1);
+   WARN_ON(ret);
+}
+
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+   int ret;
+   struct kvm_kernel_irqfd *irqfd =
+   container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+   irqfd-producer = NULL;
+
+   /*
+* When producer of consumer is unregistered, we change back to
+* remapped mode, so we can re-use the current implementation
+* when the irq is masked/disabed or the consumer side (KVM
+* int this case doesn't want to receive the interrupts.
+*/
+   ret = kvm_arch_update_pi_irte(irqfd-kvm, prod-irq, irqfd-gsi, 0);
+   WARN_ON(ret);
+}
+
+void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *cons)
+{
+}
+
+void kvm_arch_irq_bypass_resume(struct irq_bypass_consumer *cons)
+{
+}
+
+void kvm_arch_irq_bypass_update(struct irq_bypass_consumer *cons)
+{
+   int ret;
+   struct kvm_kernel_irqfd *irqfd =
+   container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+   BUG_ON(!irqfd-producer);
+
+   ret = kvm_arch_update_pi_irte(irqfd-kvm, irqfd-producer-irq,
+ irqfd-gsi, 1);
+   WARN_ON(ret);
+}
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC v1 0/5] irq bypass interface implementation for VT-d Posted-interrupts

2015-07-09 Thread Feng Wu
This series is based on Alex and Eric's irq bypass manager framework. To
make things clear, I only send out the patches related to irq bypass
manager, the purpose here is to show how certain callbacks are used
in VT-d PI and help to improve the irqbypass manager itself.

Feng Wu (5):
  vfio: Register/unregister irq_bypass_producer
  KVM: x86: Update IRTE for posted-interrupts
  KVM: Add pointer to 'struct irq_bypass_produce' in 'kvm_kernel_irqfd'
  KVM: x86: Add arch specific routines for irqbypass manager
  Call irqbypass update routine when updating irqfd

 arch/x86/kvm/Kconfig|   1 +
 arch/x86/kvm/x86.c  | 128 
 drivers/vfio/pci/vfio_pci_intrs.c   |   8 ++-
 drivers/vfio/pci/vfio_pci_private.h |   2 +
 include/linux/kvm_irqfd.h   |   1 +
 virt/kvm/eventfd.c  |   4 +-
 6 files changed, 141 insertions(+), 3 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v1 5/5] Call irqbypass update routine when updating irqfd

2015-07-09 Thread Alex Williamson
On Fri, 2015-07-10 at 11:00 +0800, Feng Wu wrote:
 Call update routine when updating irqfd, this can update the
 IRTE for Intel posted-interrupts.
 
 Signed-off-by: Feng Wu feng...@intel.com
 ---
  virt/kvm/eventfd.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index a32cf6c..1226835 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -570,8 +570,10 @@ void kvm_irq_routing_update(struct kvm *kvm)
  
   spin_lock_irq(kvm-irqfds.lock);
  
 - list_for_each_entry(irqfd, kvm-irqfds.items, list)
 + list_for_each_entry(irqfd, kvm-irqfds.items, list) {
   irqfd_update(kvm, irqfd);
 + irqfd-consumer.update(irqfd-consumer);
 + }
  
   spin_unlock_irq(kvm-irqfds.lock);
  }

I don't understand why the irq bypass manager needs to know about this
update callback.  We could just as easily make it be a function pointer
on the irqfd structure or maybe just open code it.  It's defined by the
consumer and called by the consumer, the irq bypass manager shouldn't
know about it.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Thomas Huth
The option for supporting cross-endianness legacy guests in
the vhost and tun code should only be available on systems
that support cross-endian guests.

Signed-off-by: Thomas Huth th...@redhat.com
---
 arch/arm/kvm/Kconfig | 1 +
 arch/arm64/kvm/Kconfig   | 1 +
 arch/powerpc/kvm/Kconfig | 1 +
 drivers/net/Kconfig  | 1 +
 drivers/vhost/Kconfig| 1 +
 virt/kvm/Kconfig | 3 +++
 6 files changed, 8 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index bfb915d..9d8f363 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select KVM_CROSS_ENDIAN_GUESTS
depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
---help---
  Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bfffe8f..9af39fe 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select KVM_CROSS_ENDIAN_GUESTS
---help---
  Support hosting virtualized guest machines.
 
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..e028710 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
+   select KVM_CROSS_ENDIAN_GUESTS
---help---
  Support running unmodified book3s_64 guest kernels in
  virtual machines on POWER7 and PPC970 processors that have
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c18f9e6..0c4ce47 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -261,6 +261,7 @@ config TUN
 config TUN_VNET_CROSS_LE
bool Support for cross-endian vnet headers on little-endian kernels
default n
+   depends on KVM_CROSS_ENDIAN_GUESTS
---help---
  This option allows TUN/TAP and MACVTAP device drivers in a
  little-endian kernel to parse vnet headers that come from a
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..4d8ae6b 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -35,6 +35,7 @@ config VHOST
 
 config VHOST_CROSS_ENDIAN_LEGACY
bool Cross-endian support for vhost
+   depends on KVM_CROSS_ENDIAN_GUESTS
default n
---help---
  This option allows vhost to support guests with a different byte
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index e2c876d..cc7b28a 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
 config KVM_COMPAT
def_bool y
depends on COMPAT  !S390
+
+config KVM_CROSS_ENDIAN_GUESTS
+   bool
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] mips/kvm: Fixes for big endian MIPS64 hosts

2015-07-09 Thread Leon Alrae
On 08/07/2015 16:22, Paolo Bonzini wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256
 
 
 
 On 08/07/2015 17:03, James Hogan wrote:
 Hi Paolo,

 On 24/04/15 11:26, James Hogan wrote:
 A couple of small fixes for accessing 32-bit KVM registers on
 big endian, and to sign extend struct kvm_regs registers so as to
 work on MIPS64 hosts.

 James Hogan (2): mips/kvm: Fix Big endian 32-bit register access 
 mips/kvm: Sign extend registers written to KVM

 Any chance of applying these in time for v2.4?
 
 I had assumed they'd go through Leon, but I can apply them for 2.4-rc1
 as well.

I thought these changes would get applied via kvm queue since they touch
target-mips/kvm.c only. But if such target-specific kvm patches usually go via
target-* tree then I'm happy to pick them up.

Leon

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

2015-07-09 Thread Peter Maydell
On 9 July 2015 at 11:22, Christoffer Dall christoffer.d...@linaro.org wrote:
 On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote:
 I suspect Jan is right and we really need to distinguish
 the KVM_PUT_*_STATE levels in ARM QEMU. This probably
 implies some kind of whitelist/override mechanism, since
 by and large we neither know nor want to know the
 semantics for system registers, we leave that up to the
 kernel.

 Q: if you have a running VM, and you pause it for
 an hour, what should the CNTVCT register do? Presumably
 it should not advance, but how do we arrange for that
 to happen?

 I think the CNTVCT should not advance when the VM is not scheduled, so
 if we pause the VM or starve all the VCPUs for enough time, the guest
 should not see time progressing, since otherwise the guest scheduler
 cannot maintain fairness and you're bound to see spurious RCU stalls
 etc.

 That's exactly why a guest can read both a virtual and physical counter
 and it is an area where you simply want some level of
 paravirtualization.  I haven't studied how/if Linux deals with this at
 all.

 So I think adjusting CNTVOFF should be managed by the kernel for the
 pause/starvation scenario (which I think Avi once told me x86 does too -
 does anyone know the current state of the art?).

Agreed that we should be aiming for same behaviour as x86
rather than reinventing the wheel...

 So the only situation where I think userspace should adjust the CNTVOFF
 value is for migration where we are talking about a brand new VM with
 has_run_once clear.

For incoming migration/loadvm the VM will be freshly *reset*,
but it will not be completely created from scratch. It's
possible for the user to run a VM for a bit, and then use
loadvm from the monitor to load an old snapshot. This
will do a qemu_system_reset() and then restore the state.
So in that case has_run_once won't be clear. (Would it be
OK to clear it when userspace does VCPU_INIT? What else
does it control?)

 Thus, if we were designing this from scratch now, the API should
 be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the
 VM has run once, but it's too late for that as we would break userspace.
 The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF
 in the kernel side as well, and finally also fix QEMU so that it doesn't
 try to do the thing that future kernels will ignore.

Isn't Marc's patch causing us to return an error to userspace?
(It does a 'return -1', at least -- does that get ignored at
a higher level?)

In principle, if userspace does:
 * stop all vCPUs
 * read the counter values from each vCPU
 * write the counter values back to each vCPU
 * resume vCPUs

this shouldn't cause time to do funny things, should it?
(ie the problem only occurs if we try to write the vCPU
counter value while only one vCPU is stopped).

thanks
-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

2015-07-09 Thread Jan Kiszka
On 2015-07-09 12:22, Christoffer Dall wrote:
 Hi Peter and Marc,
 
 [cc'ing Paolo for his input on x86 timekeeping]
 
 On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote:
 On 8 July 2015 at 17:37, Marc Zyngier marc.zyng...@arm.com wrote:
 On 08/07/15 17:06, Peter Maydell wrote:
 I'd prefer it if somebody could investigate to see why QEMU
 is actually doing this -- so far we just have speculation.

 I'd prefer that too, but so far people seem to be more comfortable
 waiting for the issue to fix itself. In the meantime, VMs are broken in
 weird and wonderful ways, and I don't think the current status-quo helps
 anyone.

 Putting in a patch which might not be the right fix isn't
 necessarily a good plan either...

 Does has_run_once get cleared if we do a re-VCPU_INIT
 of a CPU that's run before? (We need to allow rewriting
 of guest state at that point so that reset VM and
 load migration state behaves correctly.)
 
 no, it does not, has_run_once is set the first time a VCPU is run and is
 currently *never* cleared.
 

 I suspect Jan is right and we really need to distinguish
 the KVM_PUT_*_STATE levels in ARM QEMU. This probably
 implies some kind of whitelist/override mechanism, since
 by and large we neither know nor want to know the
 semantics for system registers, we leave that up to the
 kernel.

 Q: if you have a running VM, and you pause it for
 an hour, what should the CNTVCT register do? Presumably
 it should not advance, but how do we arrange for that
 to happen?

 
 I think the CNTVCT should not advance when the VM is not scheduled, so
 if we pause the VM or starve all the VCPUs for enough time, the guest
 should not see time progressing, since otherwise the guest scheduler
 cannot maintain fairness and you're bound to see spurious RCU stalls
 etc.
 
 That's exactly why a guest can read both a virtual and physical counter
 and it is an area where you simply want some level of
 paravirtualization.  I haven't studied how/if Linux deals with this at
 all.
 
 So I think adjusting CNTVOFF should be managed by the kernel for the
 pause/starvation scenario (which I think Avi once told me x86 does too -
 does anyone know the current state of the art?).
 
 So the only situation where I think userspace should adjust the CNTVOFF
 value is for migration where we are talking about a brand new VM with
 has_run_once clear.
 
 Thus, if we were designing this from scratch now, the API should
 be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the
 VM has run once, but it's too late for that as we would break userspace.
 The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF
 in the kernel side as well, and finally also fix QEMU so that it doesn't
 try to do the thing that future kernels will ignore.

Fixing QEMU to only write on KVM_PUT_FULL_STATE - yes, that should be
done, but I don't think the approach for the kernel is generally right.
The kernel should not do any policing on user space requests to change
the VCPU or VM state unless

 - security is affected
 - userspace lacks information, thus cannot decide correctly
 - legacy userspace has a bug, we can detect it and want to fix that up
   without affecting future userspace that has a reason to do it
   differently

Regarding CNTVOFF, the first two criteria do not apply for sure. Maybe
the last one, don't know. Just think of the hypothetical scenario that a
userspace VM debugger wants to inject certain register manipulations. If
you block this by some hidden VM state like proposed, that feature would
no longer be implementable easily.

Jan




signature.asc
Description: OpenPGP digital signature


Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-09 Thread Laszlo Ersek
On 07/09/15 08:09, Paolo Bonzini wrote:
 
 
 On 09/07/2015 00:36, Bandan Das wrote:
 Let userspace inquire the maximum physical address width
 of the host processors; this can be used to identify maximum
 memory that can be assigned to the guest.

 Reported-by: Laszlo Ersek ler...@redhat.com
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  arch/x86/kvm/x86.c   | 3 +++
  include/uapi/linux/kvm.h | 1 +
  2 files changed, 4 insertions(+)

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index bbaf44e..97d6746 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
 ext)
  case KVM_CAP_NR_MEMSLOTS:
  r = KVM_USER_MEM_SLOTS;
  break;
 +case KVM_CAP_PHY_ADDR_WIDTH:
 +r = boot_cpu_data.x86_phys_bits;
 +break;
 
 Userspace can just use CPUID, can't it?

I believe KVM's cooperation is necessary, for the following reason:

The truncation only occurs when the guest-phys - host-phys translation
is done in hardware, *and* the phys bits of the host processor are
insufficient to represent the highest guest-phys address that the guest
will ever face.

The first condition (of course) means that the truncation depends on EPT
being enabled. (I didn't test on AMD so I don't know if RVI has the same
issue.) If EPT is disabled, either because the host processor lacks it,
or because the respective kvm_intel module parameter is set so, then the
issue cannot be experienced.

Therefore I believe a KVM patch is necessary.

However, this specific patch doesn't seem sufficient; it should also
consider whether EPT is enabled. (And the ioctl should be perhaps
renamed to reflect that -- what QEMU needs to know is not the raw
physical address width of the host processor, but whether that width
will cause EPT to silently truncate high guest-phys addresses.)

Thanks
Laszlo

 
 Paolo
 
  case KVM_CAP_PV_MMU:/* obsolete */
  r = 0;
  break;
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 716ad4a..e7949a1 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_DISABLE_QUIRKS 116
  #define KVM_CAP_X86_SMM 117
  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
 +#define KVM_CAP_PHY_ADDR_WIDTH 119
  
  #ifdef KVM_CAP_IRQ_ROUTING
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm: x86: fix load xsave feature warning

2015-07-09 Thread Wanpeng Li
[   68.196974] WARNING: CPU: 1 PID: 2140 at arch/x86/kvm/x86.c:3161 
kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]()
[   68.196975] Modules linked in: snd_hda_codec_hdmi i915 rfcomm bnep bluetooth 
i2c_algo_bit rfkill nfsd drm_kms_helper nfs_acl nfs drm lockd grace sunrpc 
fscache snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec 
snd_hda_core snd_hwdep snd_pcm snd_seq_dummy snd_seq_oss x86_pkg_temp_thermal 
snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi kvm snd_seq 
ghash_clmulni_intel fuse snd_timer aesni_intel parport_pc ablk_helper 
snd_seq_device cryptd ppdev snd lp parport lrw dcdbas gf128mul i2c_core 
glue_helper lpc_ich video shpchp mfd_core soundcore serio_raw acpi_cpufreq ext4 
mbcache jbd2 sd_mod crc32c_intel ahci libahci libata e1000e ptp pps_core
[   68.197005] CPU: 1 PID: 2140 Comm: qemu-system-x86 Not tainted 4.2.0-rc1+ #2
[   68.197006] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 
01/08/2015
[   68.197007]  a03b0657 8800d984bca8 815915a2 

[   68.197009]   8800d984bce8 81057c0a 
7ff6d0001000
[   68.197010]  0002 880211c1a000 0004 
8800ce0288c0
[   68.197012] Call Trace:
[   68.197017]  [815915a2] dump_stack+0x45/0x57
[   68.197020]  [81057c0a] warn_slowpath_common+0x8a/0xc0
[   68.197022]  [81057cfa] warn_slowpath_null+0x1a/0x20
[   68.197029]  [a037bed8] kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]
[   68.197035]  [a037aede] ? kvm_arch_vcpu_load+0x4e/0x1c0 [kvm]
[   68.197040]  [a03696a6] kvm_vcpu_ioctl+0xc6/0x5c0 [kvm]
[   68.197043]  [811252d2] ? perf_pmu_enable+0x22/0x30
[   68.197044]  [8112663e] ? perf_event_context_sched_in+0x7e/0xb0
[   68.197048]  [811a6882] do_vfs_ioctl+0x2c2/0x4a0
[   68.197050]  [8107bf33] ? finish_task_switch+0x173/0x220
[   68.197053]  [8123307f] ? selinux_file_ioctl+0x4f/0xd0
[   68.197055]  [8122cac3] ? security_file_ioctl+0x43/0x60
[   68.197057]  [811a6ad9] SyS_ioctl+0x79/0x90
[   68.197060]  [81597e57] entry_SYSCALL_64_fastpath+0x12/0x6a
[   68.197061] ---[ end trace 558a5ebf9445fc80 ]---

After commit (0c4109bec0 'x86/fpu/xstate: Fix up bad get_xsave_addr() 
assumptions'), there is no assumption an xsave bit is present in the 
hardware (pcntxt_mask) that it is always present in a given xsave buffer.
An enabled state to be present on 'pcntxt_mask', but *not* in 'xstate_bv'
could happen when the last 'xsave' did not request that this feature be 
saved (unlikely) or because the init optimization caused it to not be 
saved. This patch kill the assumption.

Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
---
Note: This patch against latest linus tree.

 arch/x86/kvm/x86.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbaf44e..7125cefe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3157,8 +3157,7 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
cpuid_count(XSTATE_CPUID, index,
size, offset, ecx, edx);
memcpy(dest, src + offset, size);
-   } else
-   WARN_ON_ONCE(1);
+   }
 
valid -= feature;
}
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 00:36, Bandan Das wrote:
 Let userspace inquire the maximum physical address width
 of the host processors; this can be used to identify maximum
 memory that can be assigned to the guest.
 
 Reported-by: Laszlo Ersek ler...@redhat.com
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  arch/x86/kvm/x86.c   | 3 +++
  include/uapi/linux/kvm.h | 1 +
  2 files changed, 4 insertions(+)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index bbaf44e..97d6746 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
 ext)
   case KVM_CAP_NR_MEMSLOTS:
   r = KVM_USER_MEM_SLOTS;
   break;
 + case KVM_CAP_PHY_ADDR_WIDTH:
 + r = boot_cpu_data.x86_phys_bits;
 + break;

Userspace can just use CPUID, can't it?

Paolo

   case KVM_CAP_PV_MMU:/* obsolete */
   r = 0;
   break;
 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
 index 716ad4a..e7949a1 100644
 --- a/include/uapi/linux/kvm.h
 +++ b/include/uapi/linux/kvm.h
 @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_DISABLE_QUIRKS 116
  #define KVM_CAP_X86_SMM 117
  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
 +#define KVM_CAP_PHY_ADDR_WIDTH 119
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target-i386: Sanity check host processor physical address width

2015-07-09 Thread Laszlo Ersek
On 07/09/15 00:42, Bandan Das wrote:
 
 If a Linux guest is assigned more memory than is supported
 by the host processor, the guest is unable to boot. That
 is expected, however, there's no message indicating the user
 what went wrong. This change prints a message to stderr if
 KVM has the corresponding capability.
 
 Reported-by: Laszlo Ersek ler...@redhat.com
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  linux-headers/linux/kvm.h | 1 +
  target-i386/kvm.c | 6 ++
  2 files changed, 7 insertions(+)
 
 diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
 index 3bac873..6afad49 100644
 --- a/linux-headers/linux/kvm.h
 +++ b/linux-headers/linux/kvm.h
 @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_DISABLE_QUIRKS 116
  #define KVM_CAP_X86_SMM 117
  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
 +#define KVM_CAP_PHY_ADDR_WIDTH 119
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 066d03d..66e3448 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  uint64_t shadow_mem;
  int ret;
  struct utsname utsname;
 +int max_phys_bits;
  
  ret = kvm_get_supported_msrs(s);
  if (ret  0) {
 @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  }
  }
  
 +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH);
 +if (max_phys_bits  (1ULL  max_phys_bits) = ram_size)
 +fprintf(stderr, Warning: The amount of memory assigned to the guest 
 
 +is more than that supported by the host CPU(s). Guest may be 
 unstable.\n);
 +
  if (kvm_check_extension(s, KVM_CAP_X86_SMM)) {
  smram_machine_done.notify = register_smram_listener;
  qemu_add_machine_init_done_notifier(smram_machine_done);
 

First, see my comments on the KVM patch.

Second, ram_size is not the right thing to compare. What should be
checked is whether the highest guest-physical address that maps to RAM
can be represented in the address width of the host processor (and only
if EPT is enabled, but that sub-condition belongs to the KVM patch).

Note that this is not the same as the check written in the patch. For
example, if you assume a 32-bit PCI hole with size 1 GB, then a total
guest RAM of size 63 GB will result in the highest guest-phys memory
address being 0xF__, which just fits into 36 bits.

Correspondingly, the above code would not print the warning for

  -m $((63 * 1024 + 1))

on my laptop (which has address sizes   : 36 bits physical, ...), even
though such a guest would not boot for me (with EPT enabled).

Please see

http://thread.gmane.org/gmane.comp.bios.tianocore.devel/15418/focus=15447

So, ram_size in the controlling expression should be replaced with
maximum_guest_ram_address (which should be inclusive, and the = relop
should be preserved).

Thanks
Laszlo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Thomas Huth
The option for supporting cross-endianness legacy guests in
the vhost and tun code should only be available on systems
that support cross-endian guests.

Signed-off-by: Thomas Huth th...@redhat.com
---
 arch/arm/kvm/Kconfig | 1 +
 arch/arm64/kvm/Kconfig   | 1 +
 arch/powerpc/kvm/Kconfig | 1 +
 drivers/net/Kconfig  | 1 +
 drivers/vhost/Kconfig| 1 +
 virt/kvm/Kconfig | 3 +++
 6 files changed, 8 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index bfb915d..9d8f363 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select KVM_CROSS_ENDIAN_GUESTS
depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
---help---
  Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bfffe8f..9af39fe 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select KVM_CROSS_ENDIAN_GUESTS
---help---
  Support hosting virtualized guest machines.
 
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 3caec2c..e028710 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
select KVM_BOOK3S_HV_POSSIBLE
select MMU_NOTIFIER
select CMA
+   select KVM_CROSS_ENDIAN_GUESTS
---help---
  Support running unmodified book3s_64 guest kernels in
  virtual machines on POWER7 and PPC970 processors that have
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c18f9e6..0c4ce47 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -261,6 +261,7 @@ config TUN
 config TUN_VNET_CROSS_LE
bool Support for cross-endian vnet headers on little-endian kernels
default n
+   depends on KVM_CROSS_ENDIAN_GUESTS
---help---
  This option allows TUN/TAP and MACVTAP device drivers in a
  little-endian kernel to parse vnet headers that come from a
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..4d8ae6b 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -35,6 +35,7 @@ config VHOST
 
 config VHOST_CROSS_ENDIAN_LEGACY
bool Cross-endian support for vhost
+   depends on KVM_CROSS_ENDIAN_GUESTS
default n
---help---
  This option allows vhost to support guests with a different byte
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index e2c876d..cc7b28a 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
 config KVM_COMPAT
def_bool y
depends on COMPAT  !S390
+
+config KVM_CROSS_ENDIAN_GUESTS
+   bool
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target-i386: Sanity check host processor physical address width

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 00:42, Bandan Das wrote:
 
 If a Linux guest is assigned more memory than is supported
 by the host processor, the guest is unable to boot. That
 is expected, however, there's no message indicating the user
 what went wrong. This change prints a message to stderr if
 KVM has the corresponding capability.
 
 Reported-by: Laszlo Ersek ler...@redhat.com
 Signed-off-by: Bandan Das b...@redhat.com

This is not entirely correct, because it doesn't take the PCI hole into
account.

Perhaps KVM could simply hide memory above the limit (i.e. treat it as
MMIO), and the BIOS could remove RAM above the limit from the e820
memory map?

Paolo

 ---
  linux-headers/linux/kvm.h | 1 +
  target-i386/kvm.c | 6 ++
  2 files changed, 7 insertions(+)
 
 diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
 index 3bac873..6afad49 100644
 --- a/linux-headers/linux/kvm.h
 +++ b/linux-headers/linux/kvm.h
 @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_DISABLE_QUIRKS 116
  #define KVM_CAP_X86_SMM 117
  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
 +#define KVM_CAP_PHY_ADDR_WIDTH 119
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 066d03d..66e3448 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  uint64_t shadow_mem;
  int ret;
  struct utsname utsname;
 +int max_phys_bits;
  
  ret = kvm_get_supported_msrs(s);
  if (ret  0) {
 @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  }
  }
  
 +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH);
 +if (max_phys_bits  (1ULL  max_phys_bits) = ram_size)
 +fprintf(stderr, Warning: The amount of memory assigned to the guest 
 
 +is more than that supported by the host CPU(s). Guest may be 
 unstable.\n);
 +
  if (kvm_check_extension(s, KVM_CAP_X86_SMM)) {
  smram_machine_done.notify = register_smram_listener;
  qemu_add_machine_init_done_notifier(smram_machine_done);
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target-i386: Sanity check host processor physical address width

2015-07-09 Thread Laszlo Ersek
On 07/09/15 09:59, Paolo Bonzini wrote:
 
 
 On 09/07/2015 00:42, Bandan Das wrote:

 If a Linux guest is assigned more memory than is supported
 by the host processor, the guest is unable to boot. That
 is expected, however, there's no message indicating the user
 what went wrong. This change prints a message to stderr if
 KVM has the corresponding capability.

 Reported-by: Laszlo Ersek ler...@redhat.com
 Signed-off-by: Bandan Das b...@redhat.com
 
 This is not entirely correct, because it doesn't take the PCI hole into
 account.
 
 Perhaps KVM could simply hide memory above the limit (i.e. treat it as
 MMIO), and the BIOS could remove RAM above the limit from the e820
 memory map?

I'd prefer to leave the guest firmware*s* out of this... :)

E820 is a legacy BIOS concept. In OVMF we'd have to hack the memory
resource descriptor HOBs (which in turn control the DXE memory space
map, which in turn controls the UEFI memory map). Those HOBs are
currently based on what the CMOS reports about the RAM available under
and above 4GB.

It's pretty complex already (will get more complex with SMM support),
and TBH, for working around such an obscure issue, I wouldn't like to
complicate it even further...

After all, this is a host platform limitation. The solution should be to
either move to a more capable host, or do it in software (disable EPT).

Thanks!
Laszlo

 
 Paolo
 
 ---
  linux-headers/linux/kvm.h | 1 +
  target-i386/kvm.c | 6 ++
  2 files changed, 7 insertions(+)

 diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
 index 3bac873..6afad49 100644
 --- a/linux-headers/linux/kvm.h
 +++ b/linux-headers/linux/kvm.h
 @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_DISABLE_QUIRKS 116
  #define KVM_CAP_X86_SMM 117
  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
 +#define KVM_CAP_PHY_ADDR_WIDTH 119
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 066d03d..66e3448 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  uint64_t shadow_mem;
  int ret;
  struct utsname utsname;
 +int max_phys_bits;
  
  ret = kvm_get_supported_msrs(s);
  if (ret  0) {
 @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  }
  }
  
 +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH);
 +if (max_phys_bits  (1ULL  max_phys_bits) = ram_size)
 +fprintf(stderr, Warning: The amount of memory assigned to the 
 guest 
 +is more than that supported by the host CPU(s). Guest may be 
 unstable.\n);
 +
  if (kvm_check_extension(s, KVM_CAP_X86_SMM)) {
  smram_machine_done.notify = register_smram_listener;
  qemu_add_machine_init_done_notifier(smram_machine_done);


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Live migration using shared storage in different networks

2015-07-09 Thread Stefan Hajnoczi
On Mon, Jul 06, 2015 at 09:44:07AM +0100, Miguel Barbosa Gonçalves wrote:
 I am building a KVM cluster that needs VM live migration.
 
 My shared storage as well as the KVM hosts will be running
 CentOS.
 
 Because 10 Gbps Ethernet switches are very expensive at the
 moment I will connect the KVM hosts to the storage by
 cross-over cables and create private networks for each
 connection (10.0.0.0/30 and 10.0.0.4/30).
 
 The following diagram shows the topology
 
  Management ManagementManagement
 VLAN   VLAN  VLAN
  |  | |
 ++-+  10 Gbps  +++  10 Gbps  ++-+
 | KVM Host |---| Storage |---| KVM Host |
 ++-+   +++   ++-+
  |  | |
Public PublicPublic
 VLAN   VLAN  VLAN
 
 My question is: will live migration work in this configuration
 since the storage will have 2 different IP addresses
 (10.0.0.1 and 10.0.0.5) in 2 different networks even though
 it is the same storage?

At the QEMU level this works.

At the libvirt level you may need to be careful how you configure the
storage and domains (VMs).  You need to make sure that the storage
network details are not part of the same VM configuration that gets
applied on both hosts.

For example: if you NFS mount or attach iSCSI LUNs to the host and then
just give libvirt the path to the image file, it will work.  But if you
want libvirt to do the NFS/iSCSI setup for you then you'll have to dig
into the configuration documentation (http://libvirt.org/) and it may
not be possible.

Stefan


pgpDCdJm4w0jZ.pgp
Description: PGP signature


[PATCH for 2.4 0/2] MIPS build fixes for v2.4

2015-07-09 Thread James Hogan
These two patches fix build errors for the MIPS TCG backend and MIPS
KVM.

Please could they be applied for v2.4.

James Hogan (2):
  tcg/mips: Fix build error from merged memop+mmu_idx parameter
  mips/kvm: Sync with newer MIPS KVM headers

 target-mips/kvm.c | 15 ++-
 tcg/mips/tcg-target.c |  4 ++--
 2 files changed, 4 insertions(+), 15 deletions(-)

Cc: Aurelien Jarno aurel...@aurel32.net
Cc: Leon Alrae leon.al...@imgtec.com
Cc: Richard Henderson r...@twiddle.net
Cc: Peter Maydell peter.mayd...@linaro.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: kvm@vger.kernel.org
-- 
2.3.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for 2.4 2/2] mips/kvm: Sync with newer MIPS KVM headers

2015-07-09 Thread James Hogan
The KVM_REG_MIPS_COUNT_* definitions are now included in
linux-headers/asm-mips/kvm.h since commit b061808d39fa (linux-headers:
update linux headers to kvm/next), therefore the duplicate definitions
in target-mips/kvm.c can now be dropped (the definitions were tweaked
slightly in commit 7a52ce8a1607 (linux-headers: update) which
triggered the following build warnings turned errors):

target-mips/kvm.c:232:0: error: KVM_REG_MIPS_COUNT_CTL redefined [-Werror]
linux-headers/asm/kvm.h:129:0: note: this is the location of the previous 
definition
target-mips/kvm.c:236:0: error: KVM_REG_MIPS_COUNT_RESUME redefined [-Werror]
linux-headers/asm/kvm.h:141:0: note: this is the location of the previous 
definition
target-mips/kvm.c:239:0: error: KVM_REG_MIPS_COUNT_HZ redefined [-Werror]
linux-headers/asm/kvm.h:147:0: note: this is the location of the previous 
definition

Also update the MIPS_C0_{32,64} macros to utilise definitions more
recently added to the asm-mips/kvm.h header.

Signed-off-by: James Hogan james.ho...@imgtec.com
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Leon Alrae leon.al...@imgtec.com
Cc: Aurelien Jarno aurel...@aurel32.net
Cc: kvm@vger.kernel.org
---
 target-mips/kvm.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/target-mips/kvm.c b/target-mips/kvm.c
index 7d2293d93492..bd64a70bcda0 100644
--- a/target-mips/kvm.c
+++ b/target-mips/kvm.c
@@ -212,10 +212,10 @@ int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, int 
level)
 }
 
 #define MIPS_CP0_32(_R, _S) \
-(KVM_REG_MIPS | KVM_REG_SIZE_U32 | 0x1 | (8 * (_R) + (_S)))
+(KVM_REG_MIPS_CP0 | KVM_REG_SIZE_U32 | (8 * (_R) + (_S)))
 
 #define MIPS_CP0_64(_R, _S) \
-(KVM_REG_MIPS | KVM_REG_SIZE_U64 | 0x1 | (8 * (_R) + (_S)))
+(KVM_REG_MIPS_CP0 | KVM_REG_SIZE_U64 | (8 * (_R) + (_S)))
 
 #define KVM_REG_MIPS_CP0_INDEX  MIPS_CP0_32(0, 0)
 #define KVM_REG_MIPS_CP0_CONTEXTMIPS_CP0_64(4, 0)
@@ -232,17 +232,6 @@ int kvm_mips_set_ipi_interrupt(MIPSCPU *cpu, int irq, int 
level)
 #define KVM_REG_MIPS_CP0_EPCMIPS_CP0_64(14, 0)
 #define KVM_REG_MIPS_CP0_ERROREPC   MIPS_CP0_64(30, 0)
 
-/* CP0_Count control */
-#define KVM_REG_MIPS_COUNT_CTL  (KVM_REG_MIPS | KVM_REG_SIZE_U64 | \
- 0x2 | 0)
-#define KVM_REG_MIPS_COUNT_CTL_DC   0x0001  /* master disable */
-/* CP0_Count resume monotonic nanoseconds */
-#define KVM_REG_MIPS_COUNT_RESUME   (KVM_REG_MIPS | KVM_REG_SIZE_U64 | \
- 0x2 | 1)
-/* CP0_Count rate in Hz */
-#define KVM_REG_MIPS_COUNT_HZ   (KVM_REG_MIPS | KVM_REG_SIZE_U64 | \
- 0x2 | 2)
-
 static inline int kvm_mips_put_one_reg(CPUState *cs, uint64_t reg_id,
int32_t *addr)
 {
-- 
2.3.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/13] arm64: KVM: VHE: Add alternatives for VHE-enabled world-switch

2015-07-09 Thread Marc Zyngier
Hi Mario,

On 09/07/15 02:29, Mario Smarduch wrote:
 On 07/08/2015 09:19 AM, Marc Zyngier wrote:
 In order to switch between host and guest, a VHE-enabled kernel
 must use different accessors for certain system registers.

 This patch uses runtime patching to use the right instruction
 when required...

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 ---
  arch/arm64/include/asm/kvm_asm.h |  40 ++--
  arch/arm64/kvm/hyp.S | 210 
 ++-
  arch/arm64/kvm/vhe-macros.h  |  18 
  3 files changed, 191 insertions(+), 77 deletions(-)

 []
   * Author: Marc Zyngier marc.zyng...@arm.com
   *
   * This program is free software; you can redistribute it and/or modify
 @@ -67,40 +67,52 @@
   stp x29, lr, [x3, #80]

   mrs x19, sp_el0
 - mrs x20, elr_el2// pc before entering el2
 - mrs x21, spsr_el2   // pstate before entering el2
 + str x19, [x3, #96]
 +.endm

 - stp x19, x20, [x3, #96]
 - str x21, [x3, #112]
 
 Hi Marc,
 
   trying to make a little sense out of this :)

Don't even try, it hurts... ;-)

 In the case of VHE kernel the two 'mrs_hyp()' and 'mrs_el1()'
 calls would  be accessing same registers - namely EL1 variants?
 For non VHE EL2, EL1?
 
 The mrs_s and sysreg_EL12 are new, not sure what these mean.

mrs_s and msr_s are just macros to that deal with system registers that
the assembler doesn't know about (yet). They have been in (moderate) use
for about a year, and have been introduced with the GICv3 support.

See arch/arm64/include/asm/sysreg.h for the gory details.

Now, on to sysreg_EL12: The main idea with VHE is that anything that
used to run at EL1 (the kernel) can now run unmodified at EL2, and that
it is the EL2 software that has to change to deal with it.

So when the kernel uses VHE and runs at EL2, an access to sysreg_EL1
really accesses sysreg_EL2, transparently. This is what makes it
possible to run the kernel at EL2 without any change.

But when the KVM world switch wants to access a guest register, it
cannot use sysreg_EL1 anymore (that would hit on the EL2 register
because of the above rule). For this, it must use sysreg_EL12 which
effectively means access the EL1 register from EL2.

As a consequence, we have the following rules:
- non-VHE: msr_el1 uses EL1, msr_hyp uses EL2
- VHE: msr_el1 uses EL12, msr_hyp uses EL1

Does this help?

M.

 - Mario
 
 +.macro save_el1_state
 + mrs_hyp(x20, ELR)   // pc before entering el2
 + mrs_hyp(x21, SPSR)  // pstate before entering el2

   mrs x22, sp_el1
 - mrs x23, elr_el1
 - mrs x24, spsr_el1
 +
 + mrs_el1(x23, elr)
 + mrs_el1(x24, spsr)
 +
 + add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0
 + stp x20, x21, [x3, #8]  // HACK: Store to the regs after SP_EL0

   str x22, [x2, #CPU_GP_REG_OFFSET(CPU_SP_EL1)]
   str x23, [x2, #CPU_GP_REG_OFFSET(CPU_ELR_EL1)]
   str x24, [x2, #CPU_SPSR_OFFSET(KVM_SPSR_EL1)]
  .endm

 -.macro restore_common_regs
 +.macro restore_el1_state
   // x2: base address for cpu context
   // x3: tmp register

 + add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0
 + ldp x20, x21, [x3, #8] // Same hack again, get guest PC and pstate
 +
   ldr x22, [x2, #CPU_GP_REG_OFFSET(CPU_SP_EL1)]
   ldr x23, [x2, #CPU_GP_REG_OFFSET(CPU_ELR_EL1)]
   ldr x24, [x2, #CPU_SPSR_OFFSET(KVM_SPSR_EL1)]

 + msr_hyp(ELR, x20)   // pc on return from el2
 + msr_hyp(SPSR, x21)  // pstate on return from el2
 +
   msr sp_el1, x22
 - msr elr_el1, x23
 - msr spsr_el1, x24

 - add x3, x2, #CPU_XREG_OFFSET(31)// SP_EL0
 - ldp x19, x20, [x3]
 - ldr x21, [x3, #16]
 + msr_el1(elr, x23)
 + msr_el1(spsr, x24)
 +.endm

 +.macro restore_common_regs
 + // x2: base address for cpu context
 + // x3: tmp register
 +
 + ldr x19, [x2, #CPU_XREG_OFFSET(31)] // SP_EL0
   msr sp_el0, x19
 - msr elr_el2, x20// pc on return from el2
 - msr spsr_el2, x21   // pstate on return from el2

   add x3, x2, #CPU_XREG_OFFSET(19)
   ldp x19, x20, [x3]
 @@ -113,9 +125,15 @@

  .macro save_host_regs
   save_common_regs
 +ifnvhe   nop,b  skip_el1_save
 + save_el1_state
 +skip_el1_save:
  .endm

  .macro restore_host_regs
 +ifnvhe   nop,b  
 skip_el1_restore
 + restore_el1_state
 +skip_el1_restore:
   restore_common_regs
  .endm

 @@ -159,6 +177,7 @@
   stp x6, x7, [x3, #16]

   save_common_regs
 + save_el1_state
  .endm

  .macro restore_guest_regs
 @@ -184,6 +203,7 @@
   ldr x18, [x3, #144]

   // x19-x29, lr, sp*, elr*, spsr*
 + restore_el1_state
   restore_common_regs

   // Last bits of the 

[PATCH v2 0/7] KVM: arm/arm64: gsi routing support

2015-07-09 Thread Eric Auger
With the advent of GICv3 ITS in-kernel emulation, KVM GSI routing
appears to be requested. More specifically MSI routing is needed.
irqchip routing does not sound to be really useful on arm but usage of
MSI routing also mandates to integrate irqchip routing. The initial
implementation of irqfd on arm must be upgraded with the integration
of kvm irqchip.c code and the implementation of its standard hooks
in the architecture specific part.

In case KVM_SET_GSI_ROUTING ioctl is not called, a default routing
table with flat irqchip routing entries is built enabling to inject gsi
corresponding to the SPI indexes seen by the guest.

As soon as KVM_SET_GSI_ROUTING is called, user-space overwrites this
default routing table and is responsible for building the whole routing
table.

for arm/arm64 KVM_SET_GSI_ROUTING has a limited support:
- only applies to KVM_IRQFD and not to KVM_IRQ_LINE

- irqchip routing was tested on Calxeda midway (VFIO with irqfd)
  with and without explicit routing
- MSI routing without GICv3 ITS was tested using APM Xgene-I
  (qemu VIRTIO-PCI vhost net without gsi_direct_mapping).
- MSI routing with GICv3 ITS is *NOT* tested.

Code can be found at 
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.2-rc1-gsi-routing-v2

It applies on Andre's [PATCH 00/13] arm64: KVM: GICv3 ITS emulation
(http://www.spinics.net/lists/kvm/msg117402.html)

History:

v1 - v2:
- user API changed:
  x devid id passed in kvm_irq_routing_msi
  x kept the new routing entry type: KVM_IRQ_ROUTING_EXTENDED_MSI
- kvm_host.h: adopt Andre's proposal to replace the msi_msg by a struct
  composed of the msi_msg and devid in kvm_kernel_irq_routing_entry
- Fix bug reported by Pavel: Added KVM_IRQ_ROUTING_EXTENDED_MSI handling
  in eventfd.c
- added vgic_v2m_inject_msi in vgic-v2-emul.c as suggested by Andre
- fix bug reported by Andre: bad setting of msi.flags and msi.devid
  in kvm_send_userspace_msi
- avoid injecting reserved IRQ numbers in vgic_irqfd_set_irq

RFC - PATCH:
- clearly state limited support on arm/arm64:
  KVM_IRQ_LINE not impacted by GSI routing
- add default routing table feature (new patch file)
- changed uapi to use padding field area
- reword api.txt

Eric Auger (7):
  KVM: api: introduce KVM_IRQ_ROUTING_EXTENDED_MSI
  KVM: kvm_host: add devid in kvm_kernel_irq_routing_entry
  KVM: irqchip: convey devid to kvm_set_msi
  KVM: arm/arm64: enable irqchip routing
  KVM: arm/arm64: build a default routing table
  KVM: arm/arm64: enable MSI routing
  KVM: arm: implement kvm_set_msi by gsi direct mapping

 Documentation/virtual/kvm/api.txt |  32 ++--
 arch/arm/include/asm/kvm_host.h   |   2 +
 arch/arm/kvm/Kconfig  |   3 ++
 arch/arm/kvm/Makefile |   2 +-
 arch/arm64/include/asm/kvm_host.h |   1 +
 arch/arm64/kvm/Kconfig|   2 +
 arch/arm64/kvm/Makefile   |   2 +-
 include/kvm/arm_vgic.h|   9 
 include/linux/kvm_host.h  |   7 ++-
 include/uapi/linux/kvm.h  |   6 ++-
 virt/kvm/arm/vgic-v2-emul.c   |  12 +
 virt/kvm/arm/vgic.c   | 101 +-
 virt/kvm/eventfd.c|   6 ++-
 virt/kvm/irqchip.c|  12 -
 14 files changed, 153 insertions(+), 44 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 5/7] KVM: arm/arm64: build a default routing table

2015-07-09 Thread Eric Auger
Implement a default routing table made of flat irqchip routing
entries (gsi = irqchip.pin) covering the VGIC SPI indexes.
This routing table is overwritten by the first user-space call
to KVM_SET_GSI_ROUTING ioctl.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

PATCH: creation
---
 virt/kvm/arm/vgic.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 6c6c25e..63be67e 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1781,6 +1781,8 @@ int vgic_init(struct kvm *kvm)
ret |= vgic_init_bitmap(dist-irq_cfg, nr_cpus, nr_irqs);
ret |= vgic_init_bytemap(dist-irq_priority, nr_cpus, nr_irqs);
 
+   ret |= kvm_setup_default_irq_routing(kvm);
+
if (ret)
goto out;
 
@@ -2258,6 +2260,25 @@ out:
return r;
 }
 
+int kvm_setup_default_irq_routing(struct kvm *kvm)
+{
+   struct kvm_irq_routing_entry *entries;
+   u32 nr = kvm-arch.vgic.nr_irqs - VGIC_NR_PRIVATE_IRQS;
+   int i, ret;
+
+   entries = kcalloc(nr, sizeof(struct kvm_kernel_irq_routing_entry),
+ GFP_KERNEL);
+   for (i = 0; i  nr; i++) {
+   entries[i].gsi = i;
+   entries[i].type = KVM_IRQ_ROUTING_IRQCHIP;
+   entries[i].u.irqchip.irqchip = 0;
+   entries[i].u.irqchip.pin = i;
+   }
+   ret = kvm_set_irq_routing(kvm, entries, nr, 0);
+   kfree(entries);
+   return ret;
+}
+
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id,
int level, bool line_status)
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/7] KVM: kvm_host: add devid in kvm_kernel_irq_routing_entry

2015-07-09 Thread Eric Auger
Extend kvm_kernel_irq_routing_entry to transport devid. This is
needed for ARM. Its validity depends on the routing type entry.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v1 - v2:
replace msi_msg field by a struct composed of msi_msg and devid

RFC - PATCH:
- reword the commit message after change in first patch (uapi)
---
 include/linux/kvm_host.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9564fd7..cac0fe4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -321,7 +321,10 @@ struct kvm_kernel_irq_routing_entry {
unsigned irqchip;
unsigned pin;
} irqchip;
-   struct msi_msg msi;
+   struct {
+   struct msi_msg msi;
+   u32 devid;
+   };
struct kvm_s390_adapter_int adapter;
};
struct hlist_node link;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping

2015-07-09 Thread Eric Auger
If the ITS modality is not available, let's simply support MSI
injection by transforming the MSI.data into an SPI ID.

This becomes possible to use KVM_SIGNAL_MSI ioctl for arm too.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v1 - v2:
- introduce vgic_v2m_inject_msi in vgic-v2-emul.c following Andre's
  advice
---
 arch/arm/kvm/Kconfig|  1 +
 virt/kvm/arm/vgic-v2-emul.c | 12 
 2 files changed, 13 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 151e710..0f58baf 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select HAVE_KVM_MSI
select HAVE_KVM_IRQCHIP
select HAVE_KVM_IRQ_ROUTING
depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c
index 1390797..43013cc 100644
--- a/virt/kvm/arm/vgic-v2-emul.c
+++ b/virt/kvm/arm/vgic-v2-emul.c
@@ -478,6 +478,17 @@ static bool vgic_v2_queue_sgi(struct kvm_vcpu *vcpu, int 
irq)
 }
 
 /**
+ * Emulates GICv2M MSI injection by injecting the SPI ID matching
+ * the msi data
+ * @kvm: pointer to the kvm struct
+ * @msi: the msi struct handle
+ */
+static int vgic_v2m_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
+{
+   return kvm_vgic_inject_irq(kvm, 0, msi-data, 1);
+}
+
+/**
  * kvm_vgic_map_resources - Configure global VGIC state before running any 
VCPUs
  * @kvm: pointer to the kvm struct
  *
@@ -566,6 +577,7 @@ void vgic_v2_init_emulation(struct kvm *kvm)
dist-vm_ops.add_sgi_source = vgic_v2_add_sgi_source;
dist-vm_ops.init_model = vgic_v2_init_model;
dist-vm_ops.map_resources = vgic_v2_map_resources;
+   dist-vm_ops.inject_msi = vgic_v2m_inject_msi;
 
kvm-arch.max_vcpus = VGIC_V2_MAX_CPUS;
 }
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 6/7] KVM: arm/arm64: enable MSI routing

2015-07-09 Thread Eric Auger
Up to now, only irqchip routing entries could be set. This patch
adds the capability to insert MSI routing entries, with or without
device id. Although standard MSI entries can be set, their
injection still is not supported. For ARM64, let's also increase
KVM_MAX_IRQ_ROUTES to 4096: include SPI irqchip flat routes plus
MSI routes. In the future this might be extended.

The new MSI routing entry type also must be managed similarly to
legacy KVM_IRQ_ROUTING_MSI in eventfd irqfd_wakeup and irqfd_update.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v1 - v2:
- adapt to new routing entry types

RFC - PATCH:
- move api MSI routing updates into that patch file
- use new devid field of user api struct
---
 Documentation/virtual/kvm/api.txt | 10 ++
 include/linux/kvm_host.h  |  2 ++
 virt/kvm/arm/vgic.c   | 13 +
 virt/kvm/eventfd.c|  6 --
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 9276dac..686b121 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1431,6 +1431,11 @@ Sets the GSI routing table entries, overwriting any 
previously set entries.
 On arm/arm64, GSI routing has the following limitation:
 - GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.
 
+On arm/arm64, MSI routing through in-kernel GICv3 ITS must use
+KVM_IRQ_ROUTING_EXTENDED_MSI routing type and device ID must be set
+in msi struct. Otherwise, KVM_IRQ_ROUTING_MSI must be used without
+populating the msi devid field.
+
 struct kvm_irq_routing {
__u32 nr;
__u32 flags;
@@ -2342,6 +2347,11 @@ On arm/arm64, gsi routing being supported, the following 
can happen:
 - in case no routing entry is associated to this gsi, injection fails
 - in case the gsi is associated to an irqchip routing entry,
   irqchip.pin + 32 corresponds to the injected SPI ID.
+- in case the gsi is associated to an MSI routing entry,
+  * without GICv3 ITS in-kernel emulation, MSI data matches the SPI ID
+of the injected SPI
+  * with GICv3 ITS in-kernel emulation, the MSI message and device ID
+are translated into an LPI.
 
 4.76 KVM_PPC_ALLOCATE_HTAB
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cac0fe4..ea1a810 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -976,6 +976,8 @@ static inline int mmu_notifier_retry(struct kvm *kvm, 
unsigned long mmu_seq)
 
 #ifdef CONFIG_S390
 #define KVM_MAX_IRQ_ROUTES 4096 //FIXME: we can have more than that...
+#elif defined(CONFIG_ARM64)
+#define KVM_MAX_IRQ_ROUTES 4096
 #else
 #define KVM_MAX_IRQ_ROUTES 1024
 #endif
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 63be67e..c5546d8 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2252,6 +2252,19 @@ int kvm_set_routing_entry(struct 
kvm_kernel_irq_routing_entry *e,
(e-irqchip.irqchip = KVM_NR_IRQCHIPS))
goto out;
break;
+   case KVM_IRQ_ROUTING_MSI:
+   e-set = kvm_set_msi;
+   e-msi.address_lo = ue-u.msi.address_lo;
+   e-msi.address_hi = ue-u.msi.address_hi;
+   e-msi.data = ue-u.msi.data;
+   break;
+   case KVM_IRQ_ROUTING_EXTENDED_MSI:
+   e-set = kvm_set_msi;
+   e-msi.address_lo = ue-u.msi.address_lo;
+   e-msi.address_hi = ue-u.msi.address_hi;
+   e-msi.data = ue-u.msi.data;
+   e-devid = ue-u.msi.devid;
+   break;
default:
goto out;
}
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..d76d05d 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -238,7 +238,8 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, 
void *key)
irq = irqfd-irq_entry;
} while (read_seqcount_retry(irqfd-irq_entry_sc, seq));
/* An event has been signaled, inject an interrupt */
-   if (irq.type == KVM_IRQ_ROUTING_MSI)
+   if (irq.type == KVM_IRQ_ROUTING_MSI ||
+   irq.type == KVM_IRQ_ROUTING_EXTENDED_MSI)
kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1,
false);
else
@@ -294,7 +295,8 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd 
*irqfd)
e = entries;
for (i = 0; i  n_entries; ++i, ++e) {
/* Only fast-path MSI. */
-   if (e-type == KVM_IRQ_ROUTING_MSI)
+   if (e-type == KVM_IRQ_ROUTING_MSI ||
+   e-type == KVM_IRQ_ROUTING_EXTENDED_MSI)
irqfd-irq_entry = *e;
}
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

[PATCH v2 3/7] KVM: irqchip: convey devid to kvm_set_msi

2015-07-09 Thread Eric Auger
on ARM, a devid field is populated in kvm_msi struct in case the
flag is set to KVM_MSI_VALID_DEVID. Let's populate the corresponding
kvm_kernel_irq_routing_entry devid field and set the msi type to
KVM_IRQ_ROUTING_EXTENDED_MSI.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 virt/kvm/irqchip.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 21c1424..e678f8a 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -72,9 +72,17 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi 
*msi)
 {
struct kvm_kernel_irq_routing_entry route;
 
-   if (!irqchip_in_kernel(kvm) || msi-flags != 0)
+   if (!irqchip_in_kernel(kvm))
return -EINVAL;
 
+   if (msi-flags  KVM_MSI_VALID_DEVID) {
+   route.devid = msi-devid;
+   route.type = KVM_IRQ_ROUTING_EXTENDED_MSI;
+   } else if (!msi-flags)
+   return -EINVAL;
+
+   /* historically the route.type was not set */
+
route.msi.address_lo = msi-address_lo;
route.msi.address_hi = msi-address_hi;
route.msi.data = msi-data;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 4/7] KVM: arm/arm64: enable irqchip routing

2015-07-09 Thread Eric Auger
This patch adds compilation and link against irqchip.

On ARM, irqchip routing is not really useful since there is
a single irqchip. However main motivation behind using irqchip
code is to enable MSI routing code. With the support of in-kernel
GICv3 ITS emulation, it now seems to be a MUST HAVE requirement.

Functions previously implemented in vgic.c and substitute
to more complex irqchip implementation are removed:

- kvm_send_userspace_msi
- kvm_irq_map_chip_pin
- kvm_set_irq
- kvm_irq_map_gsi.

They implemented a kernel default identity GSI routing. This is now
replaced by user-side provided routing.

Routing standard hooks are now implemented in vgic:
- kvm_set_routing_entry
- kvm_set_irq
- kvm_set_msi

Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined.
KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.

MSI routing is not yet allowed.

Signed-off-by: Eric Auger eric.au...@linaro.org

---
v1 - v2:
- fix bug reported by Andre related to msi.flags and msi.devid setting
  in kvm_send_userspace_msi
- avoid injecting reserved IRQ numbers in vgic_irqfd_set_irq

RFC - PATCH
- reword api.txt:
  x move MSI routing comments in a subsequent patch,
  x clearly state GSI routing does not apply to KVM_IRQ_LINE
---
 Documentation/virtual/kvm/api.txt | 12 ---
 arch/arm/include/asm/kvm_host.h   |  2 ++
 arch/arm/kvm/Kconfig  |  2 ++
 arch/arm/kvm/Makefile |  2 +-
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/Kconfig|  2 ++
 arch/arm64/kvm/Makefile   |  2 +-
 include/kvm/arm_vgic.h|  9 -
 virt/kvm/arm/vgic.c   | 71 +--
 virt/kvm/irqchip.c|  2 ++
 10 files changed, 65 insertions(+), 40 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 3d920cd..9276dac 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1421,13 +1421,16 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or 
guest IRQ is allowed.
 4.52 KVM_SET_GSI_ROUTING
 
 Capability: KVM_CAP_IRQ_ROUTING
-Architectures: x86 s390
+Architectures: x86 s390 arm arm64
 Type: vm ioctl
 Parameters: struct kvm_irq_routing (in)
 Returns: 0 on success, -1 on error
 
 Sets the GSI routing table entries, overwriting any previously set entries.
 
+On arm/arm64, GSI routing has the following limitation:
+- GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.
+
 struct kvm_irq_routing {
__u32 nr;
__u32 flags;
@@ -2335,9 +2338,10 @@ Note that closing the resamplefd is not sufficient to 
disable the
 irqfd.  The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
 and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
 
-On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared
-Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is
-given by gsi + 32.
+On arm/arm64, gsi routing being supported, the following can happen:
+- in case no routing entry is associated to this gsi, injection fails
+- in case the gsi is associated to an irqchip routing entry,
+  irqchip.pin + 32 corresponds to the injected SPI ID.
 
 4.76 KVM_PPC_ALLOCATE_HTAB
 
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index e896d2c..9871441 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -42,6 +42,8 @@
 
 #define KVM_VCPU_MAX_FEATURES 2
 
+#define KVM_IRQCHIP_NUM_PINS 988 /* 1020 -32 is the number of SPI */
+
 #include kvm/arm_vgic.h
 
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index bfb915d..151e710 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -31,6 +31,8 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select HAVE_KVM_IRQCHIP
+   select HAVE_KVM_IRQ_ROUTING
depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
---help---
  Support hosting virtualized guest machines.
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index c5eef02c..1a8f48a 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
 AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
 
 KVM := ../../../virt/kvm
-kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o 
$(KVM)/vfio.o
+kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o 
$(KVM)/vfio.o $(KVM)/irqchip.o
 
 obj-y += kvm-arm.o init.o interrupts.o
 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 2709db2..f6201eb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -44,6 +44,7 @@
 #include kvm/arm_arch_timer.h
 
 #define KVM_VCPU_MAX_FEATURES 3
+#define KVM_IRQCHIP_NUM_PINS 988 /* 

[PATCH v2 1/7] KVM: api: introduce KVM_IRQ_ROUTING_EXTENDED_MSI

2015-07-09 Thread Eric Auger
On ARM, the MSI msg (address and data) comes along with
out-of-band device ID information. The device ID encodes the
device that writes the MSI msg. Let's convey the device id in
kvm_irq_routing_msi and use a new routing entry type to
indicate the devid is populated.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v1 - v2:
- devid id passed in kvm_irq_routing_msi instead of in
  kvm_irq_routing_entry

RFC - PATCH
- remove kvm_irq_routing_extended_msi and use union instead
---
 Documentation/virtual/kvm/api.txt | 10 +-
 include/uapi/linux/kvm.h  |  6 +-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index bea8de7..3d920cd 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1453,6 +1453,7 @@ struct kvm_irq_routing_entry {
 #define KVM_IRQ_ROUTING_IRQCHIP 1
 #define KVM_IRQ_ROUTING_MSI 2
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
+#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
 
 No flags are specified so far, the corresponding field must be set to zero.
 
@@ -1465,9 +1466,16 @@ struct kvm_irq_routing_msi {
__u32 address_lo;
__u32 address_hi;
__u32 data;
-   __u32 pad;
+   union {
+   __u32 pad;
+   __u32 devid;
+   };
 };
 
+for KVM_IRQ_ROUTING_EXTENDED_MSI routing entry type, the kvm_irq_routing_msi
+routing entry is used and devid is populated with the device ID that wrote
+the MSI message. For PCI, this is usually a BFD identifier in the lower 16 
bits.
+
 struct kvm_irq_routing_s390_adapter {
__u64 ind_addr;
__u64 summary_addr;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9a261e5..39ec164 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -829,7 +829,10 @@ struct kvm_irq_routing_msi {
__u32 address_lo;
__u32 address_hi;
__u32 data;
-   __u32 pad;
+   union {
+   __u32 pad;
+   __u32 devid;
+   };
 };
 
 struct kvm_irq_routing_s390_adapter {
@@ -844,6 +847,7 @@ struct kvm_irq_routing_s390_adapter {
 #define KVM_IRQ_ROUTING_IRQCHIP 1
 #define KVM_IRQ_ROUTING_MSI 2
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
+#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
 
 struct kvm_irq_routing_entry {
__u32 gsi;
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Detect vGIC presence at runtime

2015-07-09 Thread Christoffer Dall
On Tue, Jul 07, 2015 at 02:11:01PM +0300, Pavel Fedin wrote:
 Allows to use KVM on hardware without vGIC. Interrupt controller has to be
 emulated in userspace in this case.
 
 Signed-off-by: Pavel Fedin p.fe...@samsung.com
 ---
  arch/arm/kvm/arm.c | 19 ++-
  1 file changed, 18 insertions(+), 1 deletion(-)
 
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index d9631ec..f2f7a13 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -61,6 +61,8 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
  static u8 kvm_next_vmid;
  static DEFINE_SPINLOCK(kvm_vmid_lock);
  
 +static bool vgic_present;
 +
  static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
  {
   BUG_ON(preemptible());
 @@ -172,6 +174,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
 ext)
   switch (ext) {
   case KVM_CAP_IRQCHIP:
   case KVM_CAP_IRQFD:
 + r = vgic_present;
 + break;
   case KVM_CAP_IOEVENTFD:
   case KVM_CAP_DEVICE_CTRL:
   case KVM_CAP_USER_MEMORY:
 @@ -850,6 +854,8 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
  
   switch (dev_id) {
   case KVM_ARM_DEVICE_VGIC_V2:
 + if (!vgic_present)
 + return -ENXIO;
   return kvm_vgic_addr(kvm, type, dev_addr-addr, true);
   default:
   return -ENODEV;
 @@ -864,6 +870,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
  
   switch (ioctl) {
   case KVM_CREATE_IRQCHIP: {
 + if (!vgic_present)
 + return -ENXIO;
   return kvm_vgic_create(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
   }
   case KVM_ARM_SET_DEVICE_ADDR: {
 @@ -1046,8 +1054,17 @@ static int init_hyp_mode(void)
* Init HYP view of VGIC
*/
   err = kvm_vgic_hyp_init();
 - if (err)
 + switch (err) {
 + case 0:
 + vgic_present = true;
 + break;
 + case -ENODEV:
 + case -ENXIO:
 + vgic_present = false;

why not report ENXIO as an error?  If probing the vgic fails due to
being unable to request the irq or something similar, then surely your
system has and error and this should be reported.

This may be more nicely implemented by letting the vgic init/probe
functions set the vgic_present, or maybe better yet, just export a
function from vgic.c:

bool kvm_vgic_present(void)
{
return vgic_ops != NULL;
}

 + break;
 + default:
   goto out_free_context;
 + }
  
   /*
* Init HYP architected timer support
 -- 
 2.4.4
 

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for 2.4 0/2] MIPS build fixes for v2.4

2015-07-09 Thread Leon Alrae
On 09/07/2015 10:17, James Hogan wrote:
 These two patches fix build errors for the MIPS TCG backend and MIPS
 KVM.
 
 Please could they be applied for v2.4.
 
 James Hogan (2):
   tcg/mips: Fix build error from merged memop+mmu_idx parameter
   mips/kvm: Sync with newer MIPS KVM headers
 
  target-mips/kvm.c | 15 ++-
  tcg/mips/tcg-target.c |  4 ++--
  2 files changed, 4 insertions(+), 15 deletions(-)

Reviewed-by: Leon Alrae leon.al...@imgtec.com

Peter, since these are build fixes, could they be squeezed into rc0?

Thanks,
Leon

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Cornelia Huck
On Thu,  9 Jul 2015 09:49:05 +0200
Thomas Huth th...@redhat.com wrote:

 The option for supporting cross-endianness legacy guests in

s/cross-endianness/cross-endian/ ?

 the vhost and tun code should only be available on systems
 that support cross-endian guests.
 
 Signed-off-by: Thomas Huth th...@redhat.com
 ---
  arch/arm/kvm/Kconfig | 1 +
  arch/arm64/kvm/Kconfig   | 1 +
  arch/powerpc/kvm/Kconfig | 1 +
  drivers/net/Kconfig  | 1 +
  drivers/vhost/Kconfig| 1 +
  virt/kvm/Kconfig | 3 +++
  6 files changed, 8 insertions(+)

Acked-by: Cornelia Huck cornelia.h...@de.ibm.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Cornelia Huck
On Thu,  9 Jul 2015 09:49:05 +0200
Thomas Huth th...@redhat.com wrote:

 The option for supporting cross-endianness legacy guests in

s/cross-endianness/cross-endian/ ?

 the vhost and tun code should only be available on systems
 that support cross-endian guests.
 
 Signed-off-by: Thomas Huth th...@redhat.com
 ---
  arch/arm/kvm/Kconfig | 1 +
  arch/arm64/kvm/Kconfig   | 1 +
  arch/powerpc/kvm/Kconfig | 1 +
  drivers/net/Kconfig  | 1 +
  drivers/vhost/Kconfig| 1 +
  virt/kvm/Kconfig | 3 +++
  6 files changed, 8 insertions(+)

Acked-by: Cornelia Huck cornelia.h...@de.ibm.com

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2 0/6] IRQ bypass manager and irqfd consumer

2015-07-09 Thread Joerg Roedel
On Tue, Jul 07, 2015 at 11:17:48AM -0600, Alex Williamson wrote:
 Hosting the bypass manager in kernel/irq seemed appropriate, but really
 it could be anywhere.  Does anyone have a different preference or
 specifically want it under their scope?  We had originally thought of
 this as an IOMMU service, but I think we've generalized it beyond that.
 I expect we should also add the necessary hooks to turn it into a
 loadable module to keep the tinification folks happy, I'll incorporate
 the current working changes and post a version with that.

Yeah, this is only an IOMMU service on x86, afaik. So drivers/iommu is
probably the wrong place to host it.

Will there be any other producers than VFIO or any other consumers than
KVM? If not, it should live in one of these spaces. KVM is probably the
best choice, as any hardware feature that uses this targets
virtualization, so there will hardly ever be another consumer than KVM.


Joerg

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

2015-07-09 Thread Peter Maydell
On 9 July 2015 at 13:05, Christoffer Dall christoffer.d...@linaro.org wrote:
 As I understand it, the problem is that if we ever run a VCPU after
 reading the value, and write back the value afterwards, you potentially
 make time go backwards and get inconsistent views of time from different
 VCPUs because they may have read the time before/after updating the
 CNTVOFF.

Right, but I think if QEMU does that it's a bug (and more to
the point I don't entirely understand why we would do that
yet, even given that we don't have a distinction between
registers to sync always and registers to sync only on
reset...)

-- PMM
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

2015-07-09 Thread Christoffer Dall
On Thu, Jul 09, 2015 at 01:07:24PM +0100, Peter Maydell wrote:
 On 9 July 2015 at 13:05, Christoffer Dall christoffer.d...@linaro.org wrote:
  As I understand it, the problem is that if we ever run a VCPU after
  reading the value, and write back the value afterwards, you potentially
  make time go backwards and get inconsistent views of time from different
  VCPUs because they may have read the time before/after updating the
  CNTVOFF.
 
 Right, but I think if QEMU does that it's a bug (and more to
 the point I don't entirely understand why we would do that
 yet, even given that we don't have a distinction between
 registers to sync always and registers to sync only on
 reset...)
 
I think we have evidence that it does that, but we don't know why/how.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 08/11] KVM: arm: implement dirty bit mechanism for debug registers

2015-07-09 Thread Christoffer Dall
On Tue, Jul 07, 2015 at 11:24:06AM +0100, Will Deacon wrote:
 On Tue, Jul 07, 2015 at 11:06:57AM +0100, Zhichao Huang wrote:
  Chazy and me are talking about how to reduce the saving/restoring
  overhead for debug registers.
  We want to add a state in hw_breakpoint.c to indicate whether the host
  enable any hwbrpts or not (might export a fuction that kvm can call),
  then we can read this state from memory instead of reading from real
  hardware registers, and to decide whether we need a world switch or
  not.
  Does it acceptable?
 
 Maybe, hard to tell without the code. There are obvious races to deal with
 if you use variables to indicate whether resources are in use -- why not
 just trap debug access from the host as well? Then you could keep track of
 the owner in kvm and trap accesses from everybody else.
 
The only information we're looking for here is whether the host has
enabled some break/watch point so that we need to disable them before
running the guest.

Just to re-iterate, when we are about to run a guest, we have the
following cases:

1) Neither the host nor the guest has configured any [WB]points
2) Only the host has configured any [WB]points
3) Only the guest has configured any [WB]points
4) Both the host and the guest have configured any [WB]points

In case (1), KVM should enable trapping and swtich the register state on
guest accesses.

In cases (2), (3), and (4) we must switch the register state on each
entry/exit.

If we are to trap debug register accesses in KVM to set a flag to keep
track of the owner (iow. has the host touched the registers) then don't
we impose an ordering requirement of whether KVM or the breakpoint
functionality gets initialized first, and we need to take special care
when tearing down KVM to disable the traps?  It sounds a little complex.

I've previously suggested to simply look at the B/W control registers to
figure out what to do.  Caching the state in memory is an optimization,
do we even have any idea how important such an optimization is?

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

2015-07-09 Thread Christoffer Dall
On Thu, Jul 09, 2015 at 11:38:40AM +0100, Peter Maydell wrote:
 On 9 July 2015 at 11:22, Christoffer Dall christoffer.d...@linaro.org wrote:
  On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote:
  I suspect Jan is right and we really need to distinguish
  the KVM_PUT_*_STATE levels in ARM QEMU. This probably
  implies some kind of whitelist/override mechanism, since
  by and large we neither know nor want to know the
  semantics for system registers, we leave that up to the
  kernel.
 
  Q: if you have a running VM, and you pause it for
  an hour, what should the CNTVCT register do? Presumably
  it should not advance, but how do we arrange for that
  to happen?
 
  I think the CNTVCT should not advance when the VM is not scheduled, so
  if we pause the VM or starve all the VCPUs for enough time, the guest
  should not see time progressing, since otherwise the guest scheduler
  cannot maintain fairness and you're bound to see spurious RCU stalls
  etc.
 
  That's exactly why a guest can read both a virtual and physical counter
  and it is an area where you simply want some level of
  paravirtualization.  I haven't studied how/if Linux deals with this at
  all.
 
  So I think adjusting CNTVOFF should be managed by the kernel for the
  pause/starvation scenario (which I think Avi once told me x86 does too -
  does anyone know the current state of the art?).
 
 Agreed that we should be aiming for same behaviour as x86
 rather than reinventing the wheel...
 
  So the only situation where I think userspace should adjust the CNTVOFF
  value is for migration where we are talking about a brand new VM with
  has_run_once clear.
 
 For incoming migration/loadvm the VM will be freshly *reset*,
 but it will not be completely created from scratch. It's
 possible for the user to run a VM for a bit, and then use
 loadvm from the monitor to load an old snapshot. This
 will do a qemu_system_reset() and then restore the state.
 So in that case has_run_once won't be clear. (Would it be
 OK to clear it when userspace does VCPU_INIT? What else
 does it control?)

We would have to introduce another flag.  It is currently used to only
map the VGIC resources once (we have to do this before running the VCPU
the first time because only then do we have the required info of where
to map the resources etc.), and we use it to enforce that the user
doesn't initialize a VCPU with one setting, and then initialize it with
a different setting afterwards (for example with power-on state or the
emulated CPU target).

 
  Thus, if we were designing this from scratch now, the API should
  be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the
  VM has run once, but it's too late for that as we would break userspace.
  The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF
  in the kernel side as well, and finally also fix QEMU so that it doesn't
  try to do the thing that future kernels will ignore.
 
 Isn't Marc's patch causing us to return an error to userspace?
 (It does a 'return -1', at least -- does that get ignored at
 a higher level?)

I remembered it as it simply ignored it, but you're right, it returns -1
to userspace (which is a strange thing we do in arch_timer.c, I'm not
sure why we return -EPERM here - probably a bug).

So that can't work obviously, because it will regress userspace.

 
 In principle, if userspace does:
  * stop all vCPUs
  * read the counter values from each vCPU
  * write the counter values back to each vCPU
  * resume vCPUs
 
 this shouldn't cause time to do funny things, should it?

No, it shouldn't.

 (ie the problem only occurs if we try to write the vCPU
 counter value while only one vCPU is stopped).
 

As I understand it, the problem is that if we ever run a VCPU after
reading the value, and write back the value afterwards, you potentially
make time go backwards and get inconsistent views of time from different
VCPUs because they may have read the time before/after updating the
CNTVOFF.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

2015-07-09 Thread Christoffer Dall
Hi Jan,

On Thu, Jul 09, 2015 at 12:40:39PM +0200, Jan Kiszka wrote:
 On 2015-07-09 12:22, Christoffer Dall wrote:
  Hi Peter and Marc,
  
  [cc'ing Paolo for his input on x86 timekeeping]
  
  On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote:
  On 8 July 2015 at 17:37, Marc Zyngier marc.zyng...@arm.com wrote:
  On 08/07/15 17:06, Peter Maydell wrote:
  I'd prefer it if somebody could investigate to see why QEMU
  is actually doing this -- so far we just have speculation.
 
  I'd prefer that too, but so far people seem to be more comfortable
  waiting for the issue to fix itself. In the meantime, VMs are broken in
  weird and wonderful ways, and I don't think the current status-quo helps
  anyone.
 
  Putting in a patch which might not be the right fix isn't
  necessarily a good plan either...
 
  Does has_run_once get cleared if we do a re-VCPU_INIT
  of a CPU that's run before? (We need to allow rewriting
  of guest state at that point so that reset VM and
  load migration state behaves correctly.)
  
  no, it does not, has_run_once is set the first time a VCPU is run and is
  currently *never* cleared.
  
 
  I suspect Jan is right and we really need to distinguish
  the KVM_PUT_*_STATE levels in ARM QEMU. This probably
  implies some kind of whitelist/override mechanism, since
  by and large we neither know nor want to know the
  semantics for system registers, we leave that up to the
  kernel.
 
  Q: if you have a running VM, and you pause it for
  an hour, what should the CNTVCT register do? Presumably
  it should not advance, but how do we arrange for that
  to happen?
 
  
  I think the CNTVCT should not advance when the VM is not scheduled, so
  if we pause the VM or starve all the VCPUs for enough time, the guest
  should not see time progressing, since otherwise the guest scheduler
  cannot maintain fairness and you're bound to see spurious RCU stalls
  etc.
  
  That's exactly why a guest can read both a virtual and physical counter
  and it is an area where you simply want some level of
  paravirtualization.  I haven't studied how/if Linux deals with this at
  all.
  
  So I think adjusting CNTVOFF should be managed by the kernel for the
  pause/starvation scenario (which I think Avi once told me x86 does too -
  does anyone know the current state of the art?).
  
  So the only situation where I think userspace should adjust the CNTVOFF
  value is for migration where we are talking about a brand new VM with
  has_run_once clear.
  
  Thus, if we were designing this from scratch now, the API should
  be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the
  VM has run once, but it's too late for that as we would break userspace.
  The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF
  in the kernel side as well, and finally also fix QEMU so that it doesn't
  try to do the thing that future kernels will ignore.
 
 Fixing QEMU to only write on KVM_PUT_FULL_STATE - yes, that should be
 done, but I don't think the approach for the kernel is generally right.
 The kernel should not do any policing on user space requests to change
 the VCPU or VM state unless
 
  - security is affected
  - userspace lacks information, thus cannot decide correctly
  - legacy userspace has a bug, we can detect it and want to fix that up
without affecting future userspace that has a reason to do it
differently
 
 Regarding CNTVOFF, the first two criteria do not apply for sure. Maybe
 the last one, don't know. Just think of the hypothetical scenario that a
 userspace VM debugger wants to inject certain register manipulations. If
 you block this by some hidden VM state like proposed, that feature would
 no longer be implementable easily.
 
Hmm, I didn't think about this, I guess reverse execution and
record/replay scenarios would also want you to make time appear to be
going backwards?

But the third case kind of does apply, if we care about fixing legacy
userspace in the kernel.  We could define a new register index for the
ioctl that allows time to move forward that future userspace can use,
and impose the restriction of not making time go backwards with the
current index.  Would that break any currently working use cases on
arm64 with legacy userspace, such as migration?

Thanks,
-Christoffer


signature.asc
Description: Digital signature


Re: [PATCH 01/13] arm/arm64: Add new is_kernel_in_hyp_mode predicate

2015-07-09 Thread Mark Rutland
Hi,

 +static inline bool is_kernel_in_hyp_mode(void)
 +{
 + u64 el;
 +
 + asm(mrs %0, CurrentEL : =r (el));
 + return el == CurrentEL_EL2;
 +}

If you can include cputype.h, I think this can be:

static inline bool is_kernel_in_hyp_mode(void)
{
return read_cpuid(CurrentEL) == CurrentEL_EL2;
}

Mark.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Laurent Vivier


On 09/07/2015 09:49, Thomas Huth wrote:
 The option for supporting cross-endianness legacy guests in
 the vhost and tun code should only be available on systems
 that support cross-endian guests.

I'm sure I misunderstand something, but what happens if we use QEMU with
TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
little endian host ?
Do you forbid the use of vhost in this case ?

 Signed-off-by: Thomas Huth th...@redhat.com
 ---
  arch/arm/kvm/Kconfig | 1 +
  arch/arm64/kvm/Kconfig   | 1 +
  arch/powerpc/kvm/Kconfig | 1 +
  drivers/net/Kconfig  | 1 +
  drivers/vhost/Kconfig| 1 +
  virt/kvm/Kconfig | 3 +++
  6 files changed, 8 insertions(+)
 
 diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
 index bfb915d..9d8f363 100644
 --- a/arch/arm/kvm/Kconfig
 +++ b/arch/arm/kvm/Kconfig
 @@ -31,6 +31,7 @@ config KVM
   select KVM_VFIO
   select HAVE_KVM_EVENTFD
   select HAVE_KVM_IRQFD
 + select KVM_CROSS_ENDIAN_GUESTS
   depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
   ---help---
 Support hosting virtualized guest machines.
 diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
 index bfffe8f..9af39fe 100644
 --- a/arch/arm64/kvm/Kconfig
 +++ b/arch/arm64/kvm/Kconfig
 @@ -31,6 +31,7 @@ config KVM
   select KVM_VFIO
   select HAVE_KVM_EVENTFD
   select HAVE_KVM_IRQFD
 + select KVM_CROSS_ENDIAN_GUESTS
   ---help---
 Support hosting virtualized guest machines.
  
 diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
 index 3caec2c..e028710 100644
 --- a/arch/powerpc/kvm/Kconfig
 +++ b/arch/powerpc/kvm/Kconfig
 @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
   select KVM_BOOK3S_HV_POSSIBLE
   select MMU_NOTIFIER
   select CMA
 + select KVM_CROSS_ENDIAN_GUESTS
   ---help---
 Support running unmodified book3s_64 guest kernels in
 virtual machines on POWER7 and PPC970 processors that have
 diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
 index c18f9e6..0c4ce47 100644
 --- a/drivers/net/Kconfig
 +++ b/drivers/net/Kconfig
 @@ -261,6 +261,7 @@ config TUN
  config TUN_VNET_CROSS_LE
   bool Support for cross-endian vnet headers on little-endian kernels
   default n
 + depends on KVM_CROSS_ENDIAN_GUESTS
   ---help---
 This option allows TUN/TAP and MACVTAP device drivers in a
 little-endian kernel to parse vnet headers that come from a
 diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
 index 533eaf0..4d8ae6b 100644
 --- a/drivers/vhost/Kconfig
 +++ b/drivers/vhost/Kconfig
 @@ -35,6 +35,7 @@ config VHOST
  
  config VHOST_CROSS_ENDIAN_LEGACY
   bool Cross-endian support for vhost
 + depends on KVM_CROSS_ENDIAN_GUESTS
   default n
   ---help---
 This option allows vhost to support guests with a different byte
 diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
 index e2c876d..cc7b28a 100644
 --- a/virt/kvm/Kconfig
 +++ b/virt/kvm/Kconfig
 @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
  config KVM_COMPAT
 def_bool y
 depends on COMPAT  !S390
 +
 +config KVM_CROSS_ENDIAN_GUESTS
 +   bool
 
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 00/11] KVM Guest Debug support for arm64

2015-07-09 Thread Christoffer Dall
On Tue, Jul 07, 2015 at 05:29:52PM +0100, Alex Bennée wrote:
 Here is V8 of the KVM Guest Debug support for arm64.
 
 The diffstat between v7 and v8 is getting pretty small and as I
 haven't re-based you can run:
 
 git diff -u guest-debug/4.1-v7..guest-debug/4.1-v8
 
 And the kernelci report is at:
 
 http://kernelci.org/build/alex/kernel/v4.1-11-g182a5fa64600/
 
 
 The only real changes apart from comments and white space are to
 sys_regs which sees another minor re-factoring. The 32 bit handling
 explicitly preserves the top 32 bits of the AArch64 registers although
 I'm not convinced it matter too much as for a booting AArch32 guest
 kernel they should start as 0 and never change.
 
 For full details see the changelog on each of the patches.
 
 GIT Repos:
 
 The patches for this series are based off v4.1 and can be found
 at:
 
 Kernel:
 https://git.linaro.org/people/alex.bennee/linux.git
 branch: guest-debug/4.1-v8
 describe: v4.1-11-g182a5faB
 
 QEMU:
 https://github.com/stsquad/qemu
 branch: kvm/guest-debug-v6*** BLURB HERE ***
 
 
I'm happy with this series as it stands now and think it can be put into
-next.

I have boot-tested the series with a VM on both Juno and TC2.

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/13] arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature

2015-07-09 Thread Marc Zyngier
On 09/07/15 10:48, Mark Rutland wrote:
 On Wed, Jul 08, 2015 at 05:19:06PM +0100, Marc Zyngier wrote:
 Add a new ARM64_HAS_VIRT_HOST_EXTN features to indicate that the
 CPU has the ARMv8,1 VHE capability.
 
 Nit: s/,/./
 
 It's probably worth mentioning somewhere that we have to check CurrentEL
 rather than a feature register in case some prior software dropped us to
 EL1N (e.g. if we're a guest under this scheme).

Good point, this is a leftover from a previous version that actually
checked the feature register. I'll clean up the commit log.

Thanks,

M.

 Mark.
 

 This will be used to trigger kernel patching in KVM.

 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 ---
  arch/arm64/include/asm/cpufeature.h |  3 ++-
  arch/arm64/kernel/cpufeature.c  | 11 +++
  2 files changed, 13 insertions(+), 1 deletion(-)

 diff --git a/arch/arm64/include/asm/cpufeature.h 
 b/arch/arm64/include/asm/cpufeature.h
 index c104421..6c3742d 100644
 --- a/arch/arm64/include/asm/cpufeature.h
 +++ b/arch/arm64/include/asm/cpufeature.h
 @@ -25,8 +25,9 @@
  #define ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE1
  #define ARM64_WORKAROUND_845719 2
  #define ARM64_HAS_SYSREG_GIC_CPUIF  3
 +#define ARM64_HAS_VIRT_HOST_EXTN4
  
 -#define ARM64_NCAPS 4
 +#define ARM64_NCAPS 5
  
  #ifndef __ASSEMBLY__
  
 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
 index 5ad86ce..e1dcd63 100644
 --- a/arch/arm64/kernel/cpufeature.c
 +++ b/arch/arm64/kernel/cpufeature.c
 @@ -21,6 +21,7 @@
  #include linux/types.h
  #include asm/cpu.h
  #include asm/cpufeature.h
 +#include asm/virt.h
  
  static bool
  has_id_aa64pfr0_feature(const struct arm64_cpu_capabilities *entry)
 @@ -31,6 +32,11 @@ has_id_aa64pfr0_feature(const struct 
 arm64_cpu_capabilities *entry)
  return (val  entry-register_mask) == entry-register_value;
  }
  
 +static bool runs_at_el2(const struct arm64_cpu_capabilities *entry)
 +{
 +return is_kernel_in_hyp_mode();
 +}
 +
  static const struct arm64_cpu_capabilities arm64_features[] = {
  {
  .desc = GIC system register CPU interface,
 @@ -39,6 +45,11 @@ static const struct arm64_cpu_capabilities 
 arm64_features[] = {
  .register_mask = (0xf  24),
  .register_value = (1  24),
  },
 +{
 +.desc = Virtualization Host Extensions,
 +.capability = ARM64_HAS_VIRT_HOST_EXTN,
 +.matches = runs_at_el2,
 +},
  {},
  };
  
 -- 
 2.1.4

 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

 


-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Christoffer Dall
On Thu, Jul 09, 2015 at 09:49:05AM +0200, Thomas Huth wrote:
 The option for supporting cross-endianness legacy guests in
 the vhost and tun code should only be available on systems
 that support cross-endian guests.
 
 Signed-off-by: Thomas Huth th...@redhat.com

Acked-by: Christoffer Dall christoffer.d...@linaro.org
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Christoffer Dall
On Thu, Jul 09, 2015 at 09:49:05AM +0200, Thomas Huth wrote:
 The option for supporting cross-endianness legacy guests in
 the vhost and tun code should only be available on systems
 that support cross-endian guests.
 
 Signed-off-by: Thomas Huth th...@redhat.com

Acked-by: Christoffer Dall christoffer.d...@linaro.org
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] target-i386: Sanity check host processor physical address width

2015-07-09 Thread Igor Mammedov
On Thu, 09 Jul 2015 09:02:38 +0200
Laszlo Ersek ler...@redhat.com wrote:

 On 07/09/15 00:42, Bandan Das wrote:
  
  If a Linux guest is assigned more memory than is supported
  by the host processor, the guest is unable to boot. That
  is expected, however, there's no message indicating the user
  what went wrong. This change prints a message to stderr if
  KVM has the corresponding capability.
  
  Reported-by: Laszlo Ersek ler...@redhat.com
  Signed-off-by: Bandan Das b...@redhat.com
  ---
   linux-headers/linux/kvm.h | 1 +
   target-i386/kvm.c | 6 ++
   2 files changed, 7 insertions(+)
  
  diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
  index 3bac873..6afad49 100644
  --- a/linux-headers/linux/kvm.h
  +++ b/linux-headers/linux/kvm.h
  @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
   #define KVM_CAP_DISABLE_QUIRKS 116
   #define KVM_CAP_X86_SMM 117
   #define KVM_CAP_MULTI_ADDRESS_SPACE 118
  +#define KVM_CAP_PHY_ADDR_WIDTH 119
   
   #ifdef KVM_CAP_IRQ_ROUTING
   
  diff --git a/target-i386/kvm.c b/target-i386/kvm.c
  index 066d03d..66e3448 100644
  --- a/target-i386/kvm.c
  +++ b/target-i386/kvm.c
  @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
   uint64_t shadow_mem;
   int ret;
   struct utsname utsname;
  +int max_phys_bits;
   
   ret = kvm_get_supported_msrs(s);
   if (ret  0) {
  @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
   }
   }
   
  +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH);
  +if (max_phys_bits  (1ULL  max_phys_bits) = ram_size)
  +fprintf(stderr, Warning: The amount of memory assigned to the 
  guest 
  +is more than that supported by the host CPU(s). Guest may be 
  unstable.\n);
  +
   if (kvm_check_extension(s, KVM_CAP_X86_SMM)) {
   smram_machine_done.notify = register_smram_listener;
   qemu_add_machine_init_done_notifier(smram_machine_done);
  
 
 First, see my comments on the KVM patch.
 
 Second, ram_size is not the right thing to compare. What should be
 checked is whether the highest guest-physical address that maps to RAM
 can be represented in the address width of the host processor (and only
 if EPT is enabled, but that sub-condition belongs to the KVM patch).
 
 Note that this is not the same as the check written in the patch. For
 example, if you assume a 32-bit PCI hole with size 1 GB, then a total
 guest RAM of size 63 GB will result in the highest guest-phys memory
 address being 0xF__, which just fits into 36 bits.
 
 Correspondingly, the above code would not print the warning for
 
   -m $((63 * 1024 + 1))
 
 on my laptop (which has address sizes   : 36 bits physical, ...), even
 though such a guest would not boot for me (with EPT enabled).
 
 Please see
 
 http://thread.gmane.org/gmane.comp.bios.tianocore.devel/15418/focus=15447
 
 So, ram_size in the controlling expression should be replaced with
 maximum_guest_ram_address (which should be inclusive, and the = relop
 should be preserved).
also with memory hotplug tuned on we should check if the end of
hotplug memory area is less then limit, i.e.:

  pcms-hotplug_memory.base + hotplug_mem_size  1ULL  max_phys_bits

 
 Thanks
 Laszlo
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/13] arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature

2015-07-09 Thread Mark Rutland
On Wed, Jul 08, 2015 at 05:19:06PM +0100, Marc Zyngier wrote:
 Add a new ARM64_HAS_VIRT_HOST_EXTN features to indicate that the
 CPU has the ARMv8,1 VHE capability.

Nit: s/,/./

It's probably worth mentioning somewhere that we have to check CurrentEL
rather than a feature register in case some prior software dropped us to
EL1N (e.g. if we're a guest under this scheme).

Mark.

 
 This will be used to trigger kernel patching in KVM.
 
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com
 ---
  arch/arm64/include/asm/cpufeature.h |  3 ++-
  arch/arm64/kernel/cpufeature.c  | 11 +++
  2 files changed, 13 insertions(+), 1 deletion(-)
 
 diff --git a/arch/arm64/include/asm/cpufeature.h 
 b/arch/arm64/include/asm/cpufeature.h
 index c104421..6c3742d 100644
 --- a/arch/arm64/include/asm/cpufeature.h
 +++ b/arch/arm64/include/asm/cpufeature.h
 @@ -25,8 +25,9 @@
  #define ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE 1
  #define ARM64_WORKAROUND_845719  2
  #define ARM64_HAS_SYSREG_GIC_CPUIF   3
 +#define ARM64_HAS_VIRT_HOST_EXTN 4
  
 -#define ARM64_NCAPS  4
 +#define ARM64_NCAPS  5
  
  #ifndef __ASSEMBLY__
  
 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
 index 5ad86ce..e1dcd63 100644
 --- a/arch/arm64/kernel/cpufeature.c
 +++ b/arch/arm64/kernel/cpufeature.c
 @@ -21,6 +21,7 @@
  #include linux/types.h
  #include asm/cpu.h
  #include asm/cpufeature.h
 +#include asm/virt.h
  
  static bool
  has_id_aa64pfr0_feature(const struct arm64_cpu_capabilities *entry)
 @@ -31,6 +32,11 @@ has_id_aa64pfr0_feature(const struct 
 arm64_cpu_capabilities *entry)
   return (val  entry-register_mask) == entry-register_value;
  }
  
 +static bool runs_at_el2(const struct arm64_cpu_capabilities *entry)
 +{
 + return is_kernel_in_hyp_mode();
 +}
 +
  static const struct arm64_cpu_capabilities arm64_features[] = {
   {
   .desc = GIC system register CPU interface,
 @@ -39,6 +45,11 @@ static const struct arm64_cpu_capabilities 
 arm64_features[] = {
   .register_mask = (0xf  24),
   .register_value = (1  24),
   },
 + {
 + .desc = Virtualization Host Extensions,
 + .capability = ARM64_HAS_VIRT_HOST_EXTN,
 + .matches = runs_at_el2,
 + },
   {},
  };
  
 -- 
 2.1.4
 
 ___
 kvmarm mailing list
 kvm...@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Laurent Vivier


On 09/07/2015 09:49, Thomas Huth wrote:
 The option for supporting cross-endianness legacy guests in
 the vhost and tun code should only be available on systems
 that support cross-endian guests.

I'm sure I misunderstand something, but what happens if we use QEMU with
TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
little endian host ?
Do you forbid the use of vhost in this case ?

 Signed-off-by: Thomas Huth th...@redhat.com
 ---
  arch/arm/kvm/Kconfig | 1 +
  arch/arm64/kvm/Kconfig   | 1 +
  arch/powerpc/kvm/Kconfig | 1 +
  drivers/net/Kconfig  | 1 +
  drivers/vhost/Kconfig| 1 +
  virt/kvm/Kconfig | 3 +++
  6 files changed, 8 insertions(+)
 
 diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
 index bfb915d..9d8f363 100644
 --- a/arch/arm/kvm/Kconfig
 +++ b/arch/arm/kvm/Kconfig
 @@ -31,6 +31,7 @@ config KVM
   select KVM_VFIO
   select HAVE_KVM_EVENTFD
   select HAVE_KVM_IRQFD
 + select KVM_CROSS_ENDIAN_GUESTS
   depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
   ---help---
 Support hosting virtualized guest machines.
 diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
 index bfffe8f..9af39fe 100644
 --- a/arch/arm64/kvm/Kconfig
 +++ b/arch/arm64/kvm/Kconfig
 @@ -31,6 +31,7 @@ config KVM
   select KVM_VFIO
   select HAVE_KVM_EVENTFD
   select HAVE_KVM_IRQFD
 + select KVM_CROSS_ENDIAN_GUESTS
   ---help---
 Support hosting virtualized guest machines.
  
 diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
 index 3caec2c..e028710 100644
 --- a/arch/powerpc/kvm/Kconfig
 +++ b/arch/powerpc/kvm/Kconfig
 @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
   select KVM_BOOK3S_HV_POSSIBLE
   select MMU_NOTIFIER
   select CMA
 + select KVM_CROSS_ENDIAN_GUESTS
   ---help---
 Support running unmodified book3s_64 guest kernels in
 virtual machines on POWER7 and PPC970 processors that have
 diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
 index c18f9e6..0c4ce47 100644
 --- a/drivers/net/Kconfig
 +++ b/drivers/net/Kconfig
 @@ -261,6 +261,7 @@ config TUN
  config TUN_VNET_CROSS_LE
   bool Support for cross-endian vnet headers on little-endian kernels
   default n
 + depends on KVM_CROSS_ENDIAN_GUESTS
   ---help---
 This option allows TUN/TAP and MACVTAP device drivers in a
 little-endian kernel to parse vnet headers that come from a
 diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
 index 533eaf0..4d8ae6b 100644
 --- a/drivers/vhost/Kconfig
 +++ b/drivers/vhost/Kconfig
 @@ -35,6 +35,7 @@ config VHOST
  
  config VHOST_CROSS_ENDIAN_LEGACY
   bool Cross-endian support for vhost
 + depends on KVM_CROSS_ENDIAN_GUESTS
   default n
   ---help---
 This option allows vhost to support guests with a different byte
 diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
 index e2c876d..cc7b28a 100644
 --- a/virt/kvm/Kconfig
 +++ b/virt/kvm/Kconfig
 @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
  config KVM_COMPAT
 def_bool y
 depends on COMPAT  !S390
 +
 +config KVM_CROSS_ENDIAN_GUESTS
 +   bool
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] target-i386: Sanity check host processor physical address width

2015-07-09 Thread Laszlo Ersek
On 07/09/15 11:27, Igor Mammedov wrote:
 On Thu, 09 Jul 2015 09:02:38 +0200
 Laszlo Ersek ler...@redhat.com wrote:
 
 On 07/09/15 00:42, Bandan Das wrote:

 If a Linux guest is assigned more memory than is supported
 by the host processor, the guest is unable to boot. That
 is expected, however, there's no message indicating the user
 what went wrong. This change prints a message to stderr if
 KVM has the corresponding capability.

 Reported-by: Laszlo Ersek ler...@redhat.com
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  linux-headers/linux/kvm.h | 1 +
  target-i386/kvm.c | 6 ++
  2 files changed, 7 insertions(+)

 diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
 index 3bac873..6afad49 100644
 --- a/linux-headers/linux/kvm.h
 +++ b/linux-headers/linux/kvm.h
 @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_DISABLE_QUIRKS 116
  #define KVM_CAP_X86_SMM 117
  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
 +#define KVM_CAP_PHY_ADDR_WIDTH 119
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 066d03d..66e3448 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  uint64_t shadow_mem;
  int ret;
  struct utsname utsname;
 +int max_phys_bits;
  
  ret = kvm_get_supported_msrs(s);
  if (ret  0) {
 @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  }
  }
  
 +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH);
 +if (max_phys_bits  (1ULL  max_phys_bits) = ram_size)
 +fprintf(stderr, Warning: The amount of memory assigned to the 
 guest 
 +is more than that supported by the host CPU(s). Guest may be 
 unstable.\n);
 +
  if (kvm_check_extension(s, KVM_CAP_X86_SMM)) {
  smram_machine_done.notify = register_smram_listener;
  qemu_add_machine_init_done_notifier(smram_machine_done);


 First, see my comments on the KVM patch.

 Second, ram_size is not the right thing to compare. What should be
 checked is whether the highest guest-physical address that maps to RAM
 can be represented in the address width of the host processor (and only
 if EPT is enabled, but that sub-condition belongs to the KVM patch).

 Note that this is not the same as the check written in the patch. For
 example, if you assume a 32-bit PCI hole with size 1 GB, then a total
 guest RAM of size 63 GB will result in the highest guest-phys memory
 address being 0xF__, which just fits into 36 bits.

 Correspondingly, the above code would not print the warning for

   -m $((63 * 1024 + 1))

 on my laptop (which has address sizes   : 36 bits physical, ...), even
 though such a guest would not boot for me (with EPT enabled).

 Please see

 http://thread.gmane.org/gmane.comp.bios.tianocore.devel/15418/focus=15447

 So, ram_size in the controlling expression should be replaced with
 maximum_guest_ram_address (which should be inclusive, and the = relop
 should be preserved).
 also with memory hotplug tuned on we should check if the end of
 hotplug memory area is less then limit, i.e.:
 
   pcms-hotplug_memory.base + hotplug_mem_size  1ULL  max_phys_bits

Seems reasonable, thanks for the hint!

(The LHS in this instance is exclusive though, so equality should *not*
trigger the warning. maximum_guest_ram_address is inclusive, and
equality should trigger the warning. (Although equality seems quite
impossible in practice.))

Thanks!
Laszlo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/13] arm/arm64: Add new is_kernel_in_hyp_mode predicate

2015-07-09 Thread Marc Zyngier
On 09/07/15 10:42, Mark Rutland wrote:
 Hi,
 
 +static inline bool is_kernel_in_hyp_mode(void)
 +{
 +u64 el;
 +
 +asm(mrs %0, CurrentEL : =r (el));
 +return el == CurrentEL_EL2;
 +}
 
 If you can include cputype.h, I think this can be:
 
 static inline bool is_kernel_in_hyp_mode(void)
 {
   return read_cpuid(CurrentEL) == CurrentEL_EL2;
 }

This would indeed work, but CurrentEL is hardly an ID register. I feel
slightly uncomfortable using read_cpuid (which might return a cached
version at some point) for random system registers.

Thoughts?

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/13] arm/arm64: Add new is_kernel_in_hyp_mode predicate

2015-07-09 Thread Mark Rutland
On Thu, Jul 09, 2015 at 11:05:34AM +0100, Marc Zyngier wrote:
 On 09/07/15 10:42, Mark Rutland wrote:
  Hi,
  
  +static inline bool is_kernel_in_hyp_mode(void)
  +{
  +  u64 el;
  +
  +  asm(mrs %0, CurrentEL : =r (el));
  +  return el == CurrentEL_EL2;
  +}
  
  If you can include cputype.h, I think this can be:
  
  static inline bool is_kernel_in_hyp_mode(void)
  {
  return read_cpuid(CurrentEL) == CurrentEL_EL2;
  }
 
 This would indeed work, but CurrentEL is hardly an ID register. I feel
 slightly uncomfortable using read_cpuid (which might return a cached
 version at some point) for random system registers.
 
 Thoughts?

I have no strong feelings either way, but I agree with the general
uneasiness w.r.t. what read_cpuid can be expected to do.

Elsewhere we just use inline asm to read non CPUID system registers, so
let's leave your patch as it was.

Thanks,
Mark.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] KVM: arm/arm64: Don't let userspace update CNTVOFF once guest is running

2015-07-09 Thread Christoffer Dall
Hi Peter and Marc,

[cc'ing Paolo for his input on x86 timekeeping]

On Wed, Jul 08, 2015 at 08:13:59PM +0100, Peter Maydell wrote:
 On 8 July 2015 at 17:37, Marc Zyngier marc.zyng...@arm.com wrote:
  On 08/07/15 17:06, Peter Maydell wrote:
  I'd prefer it if somebody could investigate to see why QEMU
  is actually doing this -- so far we just have speculation.
 
  I'd prefer that too, but so far people seem to be more comfortable
  waiting for the issue to fix itself. In the meantime, VMs are broken in
  weird and wonderful ways, and I don't think the current status-quo helps
  anyone.
 
 Putting in a patch which might not be the right fix isn't
 necessarily a good plan either...
 
 Does has_run_once get cleared if we do a re-VCPU_INIT
 of a CPU that's run before? (We need to allow rewriting
 of guest state at that point so that reset VM and
 load migration state behaves correctly.)

no, it does not, has_run_once is set the first time a VCPU is run and is
currently *never* cleared.

 
 I suspect Jan is right and we really need to distinguish
 the KVM_PUT_*_STATE levels in ARM QEMU. This probably
 implies some kind of whitelist/override mechanism, since
 by and large we neither know nor want to know the
 semantics for system registers, we leave that up to the
 kernel.
 
 Q: if you have a running VM, and you pause it for
 an hour, what should the CNTVCT register do? Presumably
 it should not advance, but how do we arrange for that
 to happen?
 

I think the CNTVCT should not advance when the VM is not scheduled, so
if we pause the VM or starve all the VCPUs for enough time, the guest
should not see time progressing, since otherwise the guest scheduler
cannot maintain fairness and you're bound to see spurious RCU stalls
etc.

That's exactly why a guest can read both a virtual and physical counter
and it is an area where you simply want some level of
paravirtualization.  I haven't studied how/if Linux deals with this at
all.

So I think adjusting CNTVOFF should be managed by the kernel for the
pause/starvation scenario (which I think Avi once told me x86 does too -
does anyone know the current state of the art?).

So the only situation where I think userspace should adjust the CNTVOFF
value is for migration where we are talking about a brand new VM with
has_run_once clear.

Thus, if we were designing this from scratch now, the API should
be to return an error when trying to set KVM_REG_ARM_TIMER_CNT after the
VM has run once, but it's too late for that as we would break userspace.
The best alternative IMHO would be to merge Marc's patch and fix CNTVOFF
in the kernel side as well, and finally also fix QEMU so that it doesn't
try to do the thing that future kernels will ignore.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] target-i386: Sanity check host processor physical address width

2015-07-09 Thread Igor Mammedov
On Wed, 08 Jul 2015 18:42:01 -0400
Bandan Das b...@redhat.com wrote:

 
 If a Linux guest is assigned more memory than is supported
 by the host processor, the guest is unable to boot. That
 is expected, however, there's no message indicating the user
 what went wrong. This change prints a message to stderr if
 KVM has the corresponding capability.
 
 Reported-by: Laszlo Ersek ler...@redhat.com
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  linux-headers/linux/kvm.h | 1 +
  target-i386/kvm.c | 6 ++
  2 files changed, 7 insertions(+)
 
 diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
 index 3bac873..6afad49 100644
 --- a/linux-headers/linux/kvm.h
 +++ b/linux-headers/linux/kvm.h
 @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_DISABLE_QUIRKS 116
  #define KVM_CAP_X86_SMM 117
  #define KVM_CAP_MULTI_ADDRESS_SPACE 118
 +#define KVM_CAP_PHY_ADDR_WIDTH 119
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 066d03d..66e3448 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -892,6 +892,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  uint64_t shadow_mem;
  int ret;
  struct utsname utsname;
 +int max_phys_bits;
  
  ret = kvm_get_supported_msrs(s);
  if (ret  0) {
 @@ -945,6 +946,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
  }
  }
  
 +max_phys_bits = kvm_check_extension(s, KVM_CAP_PHY_ADDR_WIDTH);
max_phys_bits seems generic enough and could be applied to other targets
as well.

making it a property of machine, would make accessing/manipulating it easier.
define default value for machine/TCG mode and when KVM is enabled
it would override/set its own limit.

then any board could easily access machine-max_gpa to make board specific
checks.

 +if (max_phys_bits  (1ULL  max_phys_bits) = ram_size)
 +fprintf(stderr, Warning: The amount of memory assigned to the guest 
 
 +is more than that supported by the host CPU(s). Guest may be 
 unstable.\n);
 +
  if (kvm_check_extension(s, KVM_CAP_X86_SMM)) {
  smram_machine_done.notify = register_smram_listener;
  qemu_add_machine_init_done_notifier(smram_machine_done);

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2/2] Detect vGIC presence at runtime

2015-07-09 Thread Pavel Fedin
 Hello!

 why not report ENXIO as an error?  If probing the vgic fails due to
 being unable to request the irq or something similar, then surely your
 system has and error and this should be reported.

 It is reported by probe function itself.
 -ENODEV here means there's no GIC at all. -ENXIO happens when, for example, 
there is GIC node in
the device tree, but it does not specify vGIC resources. Normally this means 
that vGIC is defunct on
the machine.

 This may be more nicely implemented by letting the vgic init/probe
 functions set the vgic_present, or maybe better yet, just export a
 function from vgic.c:
 
 bool kvm_vgic_present(void)
 {
   return vgic_ops != NULL;
 }

 Is it necessary? Actually this flag is not needed anywhere else except 
arch/arm/kvm/arm.c, only at
init time. Runtime should, i believe, use irqchip_in_kernel(), because userland 
can choose just not
to use vGIC for some reason (testing for example).

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: x86: fix load xsave feature warning

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 09:44, Wanpeng Li wrote:
 [   68.196974] WARNING: CPU: 1 PID: 2140 at arch/x86/kvm/x86.c:3161 
 kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]()
 [   68.196975] Modules linked in: snd_hda_codec_hdmi i915 rfcomm bnep 
 bluetooth i2c_algo_bit rfkill nfsd drm_kms_helper nfs_acl nfs drm lockd grace 
 sunrpc fscache snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel 
 snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_dummy snd_seq_oss 
 x86_pkg_temp_thermal snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi 
 kvm snd_seq ghash_clmulni_intel fuse snd_timer aesni_intel parport_pc 
 ablk_helper snd_seq_device cryptd ppdev snd lp parport lrw dcdbas gf128mul 
 i2c_core glue_helper lpc_ich video shpchp mfd_core soundcore serio_raw 
 acpi_cpufreq ext4 mbcache jbd2 sd_mod crc32c_intel ahci libahci libata e1000e 
 ptp pps_core
 [   68.197005] CPU: 1 PID: 2140 Comm: qemu-system-x86 Not tainted 4.2.0-rc1+ 
 #2
 [   68.197006] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 
 01/08/2015
 [   68.197007]  a03b0657 8800d984bca8 815915a2 
 
 [   68.197009]   8800d984bce8 81057c0a 
 7ff6d0001000
 [   68.197010]  0002 880211c1a000 0004 
 8800ce0288c0
 [   68.197012] Call Trace:
 [   68.197017]  [815915a2] dump_stack+0x45/0x57
 [   68.197020]  [81057c0a] warn_slowpath_common+0x8a/0xc0
 [   68.197022]  [81057cfa] warn_slowpath_null+0x1a/0x20
 [   68.197029]  [a037bed8] kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]
 [   68.197035]  [a037aede] ? kvm_arch_vcpu_load+0x4e/0x1c0 [kvm]
 [   68.197040]  [a03696a6] kvm_vcpu_ioctl+0xc6/0x5c0 [kvm]
 [   68.197043]  [811252d2] ? perf_pmu_enable+0x22/0x30
 [   68.197044]  [8112663e] ? perf_event_context_sched_in+0x7e/0xb0
 [   68.197048]  [811a6882] do_vfs_ioctl+0x2c2/0x4a0
 [   68.197050]  [8107bf33] ? finish_task_switch+0x173/0x220
 [   68.197053]  [8123307f] ? selinux_file_ioctl+0x4f/0xd0
 [   68.197055]  [8122cac3] ? security_file_ioctl+0x43/0x60
 [   68.197057]  [811a6ad9] SyS_ioctl+0x79/0x90
 [   68.197060]  [81597e57] entry_SYSCALL_64_fastpath+0x12/0x6a
 [   68.197061] ---[ end trace 558a5ebf9445fc80 ]---
 
 After commit (0c4109bec0 'x86/fpu/xstate: Fix up bad get_xsave_addr() 
 assumptions'), there is no assumption an xsave bit is present in the 
 hardware (pcntxt_mask) that it is always present in a given xsave buffer.
 An enabled state to be present on 'pcntxt_mask', but *not* in 'xstate_bv'
 could happen when the last 'xsave' did not request that this feature be 
 saved (unlikely) or because the init optimization caused it to not be 
 saved. This patch kill the assumption.
 
 Signed-off-by: Wanpeng Li wanpeng...@hotmail.com
 ---
 Note: This patch against latest linus tree.
 
  arch/x86/kvm/x86.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index bbaf44e..7125cefe 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -3157,8 +3157,7 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
   cpuid_count(XSTATE_CPUID, index,
   size, offset, ecx, edx);
   memcpy(dest, src + offset, size);
 - } else
 - WARN_ON_ONCE(1);
 + }
  
   valid -= feature;
   }
 

I'll apply this tomorrow, thanks.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 08/11] KVM: arm: implement dirty bit mechanism for debug registers

2015-07-09 Thread Zhichao Huang


On July 9, 2015 1:08:55 AM GMT+08:00, Will Deacon will.dea...@arm.com wrote:
On Wed, Jul 08, 2015 at 11:50:22AM +0100, Zhichao Huang wrote:
 Are you happy with this?:

You miss the reserved breakpoint, I think.

Sorry, I can't quite understand. What is the reserved breakpoint?

When arch_[iu]ninstall_hw_breakpoint is called, it only set/clear the
brps and wrps slots.

I also still don't understand why this is preferable to trapping.

Will

-- 
Zhichao Huang
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Michael S. Tsirkin
On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote:
 
 
 On 09/07/2015 11:48, Laurent Vivier wrote:
  
  
  On 09/07/2015 09:49, Thomas Huth wrote:
  The option for supporting cross-endianness legacy guests in
  the vhost and tun code should only be available on systems
  that support cross-endian guests.
  
  I'm sure I misunderstand something, but what happens if we use QEMU with
  TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
  little endian host ?
 
 TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost.
 
 Paolo

vhost does not require irqfd anymore.  I think ioeventfd actually works
fine though I didn't try, it would be easy to support.


  Do you forbid the use of vhost in this case ?
  
  Signed-off-by: Thomas Huth th...@redhat.com
  ---
   arch/arm/kvm/Kconfig | 1 +
   arch/arm64/kvm/Kconfig   | 1 +
   arch/powerpc/kvm/Kconfig | 1 +
   drivers/net/Kconfig  | 1 +
   drivers/vhost/Kconfig| 1 +
   virt/kvm/Kconfig | 3 +++
   6 files changed, 8 insertions(+)
 
  diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
  index bfb915d..9d8f363 100644
  --- a/arch/arm/kvm/Kconfig
  +++ b/arch/arm/kvm/Kconfig
  @@ -31,6 +31,7 @@ config KVM
 select KVM_VFIO
 select HAVE_KVM_EVENTFD
 select HAVE_KVM_IRQFD
  +  select KVM_CROSS_ENDIAN_GUESTS
 depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
 ---help---
   Support hosting virtualized guest machines.
  diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
  index bfffe8f..9af39fe 100644
  --- a/arch/arm64/kvm/Kconfig
  +++ b/arch/arm64/kvm/Kconfig
  @@ -31,6 +31,7 @@ config KVM
 select KVM_VFIO
 select HAVE_KVM_EVENTFD
 select HAVE_KVM_IRQFD
  +  select KVM_CROSS_ENDIAN_GUESTS
 ---help---
   Support hosting virtualized guest machines.
   
  diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
  index 3caec2c..e028710 100644
  --- a/arch/powerpc/kvm/Kconfig
  +++ b/arch/powerpc/kvm/Kconfig
  @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
 select KVM_BOOK3S_HV_POSSIBLE
 select MMU_NOTIFIER
 select CMA
  +  select KVM_CROSS_ENDIAN_GUESTS
 ---help---
   Support running unmodified book3s_64 guest kernels in
   virtual machines on POWER7 and PPC970 processors that have
  diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
  index c18f9e6..0c4ce47 100644
  --- a/drivers/net/Kconfig
  +++ b/drivers/net/Kconfig
  @@ -261,6 +261,7 @@ config TUN
   config TUN_VNET_CROSS_LE
 bool Support for cross-endian vnet headers on little-endian kernels
 default n
  +  depends on KVM_CROSS_ENDIAN_GUESTS
 ---help---
   This option allows TUN/TAP and MACVTAP device drivers in a
   little-endian kernel to parse vnet headers that come from a
  diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
  index 533eaf0..4d8ae6b 100644
  --- a/drivers/vhost/Kconfig
  +++ b/drivers/vhost/Kconfig
  @@ -35,6 +35,7 @@ config VHOST
   
   config VHOST_CROSS_ENDIAN_LEGACY
 bool Cross-endian support for vhost
  +  depends on KVM_CROSS_ENDIAN_GUESTS
 default n
 ---help---
   This option allows vhost to support guests with a different byte
  diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
  index e2c876d..cc7b28a 100644
  --- a/virt/kvm/Kconfig
  +++ b/virt/kvm/Kconfig
  @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
   config KVM_COMPAT
  def_bool y
  depends on COMPAT  !S390
  +
  +config KVM_CROSS_ENDIAN_GUESTS
  +   bool
 
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/6] irqchip: GICv2/v3: Add support for irq_vcpu_affinity

2015-07-09 Thread Marc Zyngier
The GICv2 and GICv3 architectures allow an active physical interrupt
to be forwarded to a guest, and the guest to indirectly perform the
deactivation of the interrupt by performing an EOI on the virtual
interrupt (see for example the GICv2 spec, 3.2.1).

This allows some substantial performance improvement for level
triggered interrupts that otherwise have to be masked/unmasked in
VFIO, not to mention the required trap back to KVM when the guest
performs an EOI.

To enable this, the GICs need to be switched to a different EOImode,
where a taken interrupt can be left active (which prevents the same
interrupt from being taken again), while other interrupts are still
being processed normally.

We also use the new irq_set_vcpu_affinity hook that was introduced for
Intel's Posted Interrupts to determine whether or not to perform the
deactivation at EOI-time.

As all of this only makes sense when the kernel can behave as a
hypervisor, we only enable this mode on detecting that the kernel was
actually booted in HYP mode, and that the GIC supports this feature.

This series is a complete rework of a RFC I sent over a year ago:

http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/266328.html

Since then, a lot has been either merged (the irqchip_state) or reworked
(my active-timer series: http://www.spinics.net/lists/kvm/msg118768.html),
and this implements the last few bits for Eric Auger's series to
finally make it into the kernel:

https://lkml.org/lkml/2015/7/2/268
https://lkml.org/lkml/2015/7/6/291

With all these patches combined, physical interrupt routing from the
kernel into a VM becomes possible.

This has been tested on Juno (GICv2) and FastModel (GICv3). A branch
is available at:

git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git 
irq/gic-irq-vcpu-affinity

Marc Zyngier (6):
  irqchip: GICv3: Convert to EOImode == 1
  irqchip: GIC: Convert to EOImode == 1
  irqchip: GICv3: Skip LPI deactivation
  irqchip: GIC: Use chip_data instead of handler_data for cascaded
interrupts
  irqchip: GICv3: Don't deactivate interrupts forwarded to a guest
  irqchip: GIC: Don't deactivate interrupts forwarded to a guest

 drivers/irqchip/irq-gic-v3.c   | 57 +++--
 drivers/irqchip/irq-gic.c  | 64 +++---
 include/linux/irqchip/arm-gic-v3.h |  9 ++
 include/linux/irqchip/arm-gic.h|  4 +++
 4 files changed, 126 insertions(+), 8 deletions(-)

-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Detect vGIC presence at runtime

2015-07-09 Thread Christoffer Dall
On Thu, Jul 09, 2015 at 03:50:49PM +0300, Pavel Fedin wrote:
  Hello!
 
  why not report ENXIO as an error?  If probing the vgic fails due to
  being unable to request the irq or something similar, then surely your
  system has and error and this should be reported.
 
  It is reported by probe function itself.
  -ENODEV here means there's no GIC at all. -ENXIO happens when, for example, 
 there is GIC node in
 the device tree, but it does not specify vGIC resources. Normally this means 
 that vGIC is defunct on
 the machine.

I'd like to distinguish between the 'missing vgic' and 'something bad
happened when trying to initialize the vgic' cases, which I don't think
we do currently, because the ENXIO code is used in various situations.

 
  This may be more nicely implemented by letting the vgic init/probe
  functions set the vgic_present, or maybe better yet, just export a
  function from vgic.c:
  
  bool kvm_vgic_present(void)
  {
  return vgic_ops != NULL;
  }
 
  Is it necessary? Actually this flag is not needed anywhere else except 
 arch/arm/kvm/arm.c, only at
 init time. Runtime should, i believe, use irqchip_in_kernel(), because 
 userland can choose just not
 to use vGIC for some reason (testing for example).
 
I feel the init flow is relatively difficult to follow and adding a
bunch of flags here and there doesn't seem to help.  By adding a
function with a proper comment, it should be more clear, and I don't
like the switch statement on the error return values.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 08:43, Laszlo Ersek wrote:
 On 07/09/15 08:09, Paolo Bonzini wrote:


 On 09/07/2015 00:36, Bandan Das wrote:
 Let userspace inquire the maximum physical address width
 of the host processors; this can be used to identify maximum
 memory that can be assigned to the guest.

 Reported-by: Laszlo Ersek ler...@redhat.com
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  arch/x86/kvm/x86.c   | 3 +++
  include/uapi/linux/kvm.h | 1 +
  2 files changed, 4 insertions(+)

 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index bbaf44e..97d6746 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2683,6 +2683,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
 long ext)
 case KVM_CAP_NR_MEMSLOTS:
 r = KVM_USER_MEM_SLOTS;
 break;
 +   case KVM_CAP_PHY_ADDR_WIDTH:
 +   r = boot_cpu_data.x86_phys_bits;
 +   break;

 Userspace can just use CPUID, can't it?
 
 I believe KVM's cooperation is necessary, for the following reason:
 
 The truncation only occurs when the guest-phys - host-phys translation
 is done in hardware, *and* the phys bits of the host processor are
 insufficient to represent the highest guest-phys address that the guest
 will ever face.
 
 The first condition (of course) means that the truncation depends on EPT
 being enabled. (I didn't test on AMD so I don't know if RVI has the same
 issue.) If EPT is disabled, either because the host processor lacks it,
 or because the respective kvm_intel module parameter is set so, then the
 issue cannot be experienced.
 
 Therefore I believe a KVM patch is necessary.
 
 However, this specific patch doesn't seem sufficient; it should also
 consider whether EPT is enabled. (And the ioctl should be perhaps
 renamed to reflect that -- what QEMU needs to know is not the raw
 physical address width of the host processor, but whether that width
 will cause EPT to silently truncate high guest-phys addresses.)

Right; if you want to consider whether EPT is enabled (which is the
right thing to do, albeit it makes for a much bigger patch) a KVM patch
is necessary.  In that case you also need to patch the API documentation.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 11:48, Laurent Vivier wrote:
 
 
 On 09/07/2015 09:49, Thomas Huth wrote:
 The option for supporting cross-endianness legacy guests in
 the vhost and tun code should only be available on systems
 that support cross-endian guests.
 
 I'm sure I misunderstand something, but what happens if we use QEMU with
 TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
 little endian host ?

TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost.

Paolo

 Do you forbid the use of vhost in this case ?
 
 Signed-off-by: Thomas Huth th...@redhat.com
 ---
  arch/arm/kvm/Kconfig | 1 +
  arch/arm64/kvm/Kconfig   | 1 +
  arch/powerpc/kvm/Kconfig | 1 +
  drivers/net/Kconfig  | 1 +
  drivers/vhost/Kconfig| 1 +
  virt/kvm/Kconfig | 3 +++
  6 files changed, 8 insertions(+)

 diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
 index bfb915d..9d8f363 100644
 --- a/arch/arm/kvm/Kconfig
 +++ b/arch/arm/kvm/Kconfig
 @@ -31,6 +31,7 @@ config KVM
  select KVM_VFIO
  select HAVE_KVM_EVENTFD
  select HAVE_KVM_IRQFD
 +select KVM_CROSS_ENDIAN_GUESTS
  depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
  ---help---
Support hosting virtualized guest machines.
 diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
 index bfffe8f..9af39fe 100644
 --- a/arch/arm64/kvm/Kconfig
 +++ b/arch/arm64/kvm/Kconfig
 @@ -31,6 +31,7 @@ config KVM
  select KVM_VFIO
  select HAVE_KVM_EVENTFD
  select HAVE_KVM_IRQFD
 +select KVM_CROSS_ENDIAN_GUESTS
  ---help---
Support hosting virtualized guest machines.
  
 diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
 index 3caec2c..e028710 100644
 --- a/arch/powerpc/kvm/Kconfig
 +++ b/arch/powerpc/kvm/Kconfig
 @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
  select KVM_BOOK3S_HV_POSSIBLE
  select MMU_NOTIFIER
  select CMA
 +select KVM_CROSS_ENDIAN_GUESTS
  ---help---
Support running unmodified book3s_64 guest kernels in
virtual machines on POWER7 and PPC970 processors that have
 diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
 index c18f9e6..0c4ce47 100644
 --- a/drivers/net/Kconfig
 +++ b/drivers/net/Kconfig
 @@ -261,6 +261,7 @@ config TUN
  config TUN_VNET_CROSS_LE
  bool Support for cross-endian vnet headers on little-endian kernels
  default n
 +depends on KVM_CROSS_ENDIAN_GUESTS
  ---help---
This option allows TUN/TAP and MACVTAP device drivers in a
little-endian kernel to parse vnet headers that come from a
 diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
 index 533eaf0..4d8ae6b 100644
 --- a/drivers/vhost/Kconfig
 +++ b/drivers/vhost/Kconfig
 @@ -35,6 +35,7 @@ config VHOST
  
  config VHOST_CROSS_ENDIAN_LEGACY
  bool Cross-endian support for vhost
 +depends on KVM_CROSS_ENDIAN_GUESTS
  default n
  ---help---
This option allows vhost to support guests with a different byte
 diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
 index e2c876d..cc7b28a 100644
 --- a/virt/kvm/Kconfig
 +++ b/virt/kvm/Kconfig
 @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
  config KVM_COMPAT
 def_bool y
 depends on COMPAT  !S390
 +
 +config KVM_CROSS_ENDIAN_GUESTS
 +   bool

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 11:48, Laurent Vivier wrote:
 
 
 On 09/07/2015 09:49, Thomas Huth wrote:
 The option for supporting cross-endianness legacy guests in
 the vhost and tun code should only be available on systems
 that support cross-endian guests.
 
 I'm sure I misunderstand something, but what happens if we use QEMU with
 TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
 little endian host ?

TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost.

Paolo

 Do you forbid the use of vhost in this case ?
 
 Signed-off-by: Thomas Huth th...@redhat.com
 ---
  arch/arm/kvm/Kconfig | 1 +
  arch/arm64/kvm/Kconfig   | 1 +
  arch/powerpc/kvm/Kconfig | 1 +
  drivers/net/Kconfig  | 1 +
  drivers/vhost/Kconfig| 1 +
  virt/kvm/Kconfig | 3 +++
  6 files changed, 8 insertions(+)

 diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
 index bfb915d..9d8f363 100644
 --- a/arch/arm/kvm/Kconfig
 +++ b/arch/arm/kvm/Kconfig
 @@ -31,6 +31,7 @@ config KVM
  select KVM_VFIO
  select HAVE_KVM_EVENTFD
  select HAVE_KVM_IRQFD
 +select KVM_CROSS_ENDIAN_GUESTS
  depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
  ---help---
Support hosting virtualized guest machines.
 diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
 index bfffe8f..9af39fe 100644
 --- a/arch/arm64/kvm/Kconfig
 +++ b/arch/arm64/kvm/Kconfig
 @@ -31,6 +31,7 @@ config KVM
  select KVM_VFIO
  select HAVE_KVM_EVENTFD
  select HAVE_KVM_IRQFD
 +select KVM_CROSS_ENDIAN_GUESTS
  ---help---
Support hosting virtualized guest machines.
  
 diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
 index 3caec2c..e028710 100644
 --- a/arch/powerpc/kvm/Kconfig
 +++ b/arch/powerpc/kvm/Kconfig
 @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
  select KVM_BOOK3S_HV_POSSIBLE
  select MMU_NOTIFIER
  select CMA
 +select KVM_CROSS_ENDIAN_GUESTS
  ---help---
Support running unmodified book3s_64 guest kernels in
virtual machines on POWER7 and PPC970 processors that have
 diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
 index c18f9e6..0c4ce47 100644
 --- a/drivers/net/Kconfig
 +++ b/drivers/net/Kconfig
 @@ -261,6 +261,7 @@ config TUN
  config TUN_VNET_CROSS_LE
  bool Support for cross-endian vnet headers on little-endian kernels
  default n
 +depends on KVM_CROSS_ENDIAN_GUESTS
  ---help---
This option allows TUN/TAP and MACVTAP device drivers in a
little-endian kernel to parse vnet headers that come from a
 diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
 index 533eaf0..4d8ae6b 100644
 --- a/drivers/vhost/Kconfig
 +++ b/drivers/vhost/Kconfig
 @@ -35,6 +35,7 @@ config VHOST
  
  config VHOST_CROSS_ENDIAN_LEGACY
  bool Cross-endian support for vhost
 +depends on KVM_CROSS_ENDIAN_GUESTS
  default n
  ---help---
This option allows vhost to support guests with a different byte
 diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
 index e2c876d..cc7b28a 100644
 --- a/virt/kvm/Kconfig
 +++ b/virt/kvm/Kconfig
 @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
  config KVM_COMPAT
 def_bool y
 depends on COMPAT  !S390
 +
 +config KVM_CROSS_ENDIAN_GUESTS
 +   bool

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target-i386: Sanity check host processor physical address width

2015-07-09 Thread Paolo Bonzini


On 09/07/2015 10:26, Laszlo Ersek wrote:
  
  Perhaps KVM could simply hide memory above the limit (i.e. treat it as
  MMIO), and the BIOS could remove RAM above the limit from the e820
  memory map?
 I'd prefer to leave the guest firmware*s* out of this... :)
 
 E820 is a legacy BIOS concept. In OVMF we'd have to hack the memory
 resource descriptor HOBs (which in turn control the DXE memory space
 map, which in turn controls the UEFI memory map). Those HOBs are
 currently based on what the CMOS reports about the RAM available under
 and above 4GB.
 
 It's pretty complex already (will get more complex with SMM support),
 and TBH, for working around such an obscure issue, I wouldn't like to
 complicate it even further...
 
 After all, this is a host platform limitation. The solution should be to
 either move to a more capable host, or do it in software (disable EPT).

The reason I mentioned the firmware is because you could in principle
have the same issue on real hardware - say putting 128 GB on your
laptop.  The firmware should cope with it.

If OVMF does not use etc/e820, it can instead hack the values it reads
from CMOS, bounding them according to the CPUID value.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests

2015-07-09 Thread Michael S. Tsirkin
On Thu, Jul 09, 2015 at 02:57:33PM +0200, Paolo Bonzini wrote:
 
 
 On 09/07/2015 11:48, Laurent Vivier wrote:
  
  
  On 09/07/2015 09:49, Thomas Huth wrote:
  The option for supporting cross-endianness legacy guests in
  the vhost and tun code should only be available on systems
  that support cross-endian guests.
  
  I'm sure I misunderstand something, but what happens if we use QEMU with
  TCG instead of KVM, i.e. a big endian powerpc kernel guest on x86_64
  little endian host ?
 
 TCG does not yet support irqfd/ioeventfd, so it cannot be used with vhost.
 
 Paolo

vhost does not require irqfd anymore.  I think ioeventfd actually works
fine though I didn't try, it would be easy to support.


  Do you forbid the use of vhost in this case ?
  
  Signed-off-by: Thomas Huth th...@redhat.com
  ---
   arch/arm/kvm/Kconfig | 1 +
   arch/arm64/kvm/Kconfig   | 1 +
   arch/powerpc/kvm/Kconfig | 1 +
   drivers/net/Kconfig  | 1 +
   drivers/vhost/Kconfig| 1 +
   virt/kvm/Kconfig | 3 +++
   6 files changed, 8 insertions(+)
 
  diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
  index bfb915d..9d8f363 100644
  --- a/arch/arm/kvm/Kconfig
  +++ b/arch/arm/kvm/Kconfig
  @@ -31,6 +31,7 @@ config KVM
 select KVM_VFIO
 select HAVE_KVM_EVENTFD
 select HAVE_KVM_IRQFD
  +  select KVM_CROSS_ENDIAN_GUESTS
 depends on ARM_VIRT_EXT  ARM_LPAE  ARM_ARCH_TIMER
 ---help---
   Support hosting virtualized guest machines.
  diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
  index bfffe8f..9af39fe 100644
  --- a/arch/arm64/kvm/Kconfig
  +++ b/arch/arm64/kvm/Kconfig
  @@ -31,6 +31,7 @@ config KVM
 select KVM_VFIO
 select HAVE_KVM_EVENTFD
 select HAVE_KVM_IRQFD
  +  select KVM_CROSS_ENDIAN_GUESTS
 ---help---
   Support hosting virtualized guest machines.
   
  diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
  index 3caec2c..e028710 100644
  --- a/arch/powerpc/kvm/Kconfig
  +++ b/arch/powerpc/kvm/Kconfig
  @@ -79,6 +79,7 @@ config KVM_BOOK3S_64_HV
 select KVM_BOOK3S_HV_POSSIBLE
 select MMU_NOTIFIER
 select CMA
  +  select KVM_CROSS_ENDIAN_GUESTS
 ---help---
   Support running unmodified book3s_64 guest kernels in
   virtual machines on POWER7 and PPC970 processors that have
  diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
  index c18f9e6..0c4ce47 100644
  --- a/drivers/net/Kconfig
  +++ b/drivers/net/Kconfig
  @@ -261,6 +261,7 @@ config TUN
   config TUN_VNET_CROSS_LE
 bool Support for cross-endian vnet headers on little-endian kernels
 default n
  +  depends on KVM_CROSS_ENDIAN_GUESTS
 ---help---
   This option allows TUN/TAP and MACVTAP device drivers in a
   little-endian kernel to parse vnet headers that come from a
  diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
  index 533eaf0..4d8ae6b 100644
  --- a/drivers/vhost/Kconfig
  +++ b/drivers/vhost/Kconfig
  @@ -35,6 +35,7 @@ config VHOST
   
   config VHOST_CROSS_ENDIAN_LEGACY
 bool Cross-endian support for vhost
  +  depends on KVM_CROSS_ENDIAN_GUESTS
 default n
 ---help---
   This option allows vhost to support guests with a different byte
  diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
  index e2c876d..cc7b28a 100644
  --- a/virt/kvm/Kconfig
  +++ b/virt/kvm/Kconfig
  @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
   config KVM_COMPAT
  def_bool y
  depends on COMPAT  !S390
  +
  +config KVM_CROSS_ENDIAN_GUESTS
  +   bool
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] irqchip: GICv3: Skip LPI deactivation

2015-07-09 Thread Marc Zyngier
Contrary to other GICv3 interrupts, LPIs do not have an active state
by virtue of being edge-triggered only (they only have a pending state).

Given this, there is no point trying to deactivate them, and we can
skip the ICC_DIR_EL1 entierely.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 drivers/irqchip/irq-gic-v3.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 49768fc..e02592b 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -295,10 +295,14 @@ static int gic_irq_get_irqchip_state(struct irq_data *d,
 
 static void gic_eoi_irq(struct irq_data *d)
 {
-   if (static_key_true(supports_deactivate))
+   if (static_key_true(supports_deactivate)) {
+   /* No need to deactivate an LPI */
+   if (gic_irq(d) = 8192)
+   return;
gic_write_dir(gic_irq(d));
-   else
+   } else {
gic_write_eoir(gic_irq(d));
+   }
 }
 
 static int gic_set_type(struct irq_data *d, unsigned int type)
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/6] irqchip: GIC: Convert to EOImode == 1

2015-07-09 Thread Marc Zyngier
So far, GICv2 has been used in with EOImode == 0. The effect of this
mode is to perform the priority drop and the deactivation of the
interrupt at the same time.

While this works perfectly for Linux (we only have a single priority),
it causes issues when an interrupt is forwarded to a guest, and when
we want the guest to perform the EOI itself.

For this case, the GIC architecture provides EOImode == 1, where:
- A write to the EOI register drops the priority of the interrupt and leaves
it active. Other interrupts at the same priority level can now be taken,
but the active interrupt cannot be taken again
- A write to the DIR marks the interrupt as inactive, meaning it can
now be taken again.

We only enable this feature when booted in HYP mode and that
the device-tree reporte a suitable CPU interface. Observable behaviour
should remain unchanged.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 drivers/irqchip/irq-gic.c   | 32 +---
 include/linux/irqchip/arm-gic.h |  4 
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 8d7e1c8..e264675 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -46,6 +46,7 @@
 #include asm/irq.h
 #include asm/exception.h
 #include asm/smp_plat.h
+#include asm/virt.h
 
 #include irq-gic-common.h
 #include irqchip.h
@@ -82,6 +83,8 @@ static DEFINE_RAW_SPINLOCK(irq_controller_lock);
 #define NR_GIC_CPU_IF 8
 static u8 gic_cpu_map[NR_GIC_CPU_IF] __read_mostly;
 
+static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
+
 #ifndef MAX_GIC_NR
 #define MAX_GIC_NR 1
 #endif
@@ -164,7 +167,10 @@ static void gic_unmask_irq(struct irq_data *d)
 
 static void gic_eoi_irq(struct irq_data *d)
 {
-   writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
+   if (static_key_true(supports_deactivate))
+   writel_relaxed(gic_irq(d), gic_cpu_base(d) + 
GIC_CPU_DEACTIVATE);
+   else
+   writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
 }
 
 static int gic_irq_set_irqchip_state(struct irq_data *d,
@@ -272,11 +278,15 @@ static void __exception_irq_entry gic_handle_irq(struct 
pt_regs *regs)
irqnr = irqstat  GICC_IAR_INT_ID_MASK;
 
if (likely(irqnr  15  irqnr  1021)) {
+   if (static_key_true(supports_deactivate))
+   writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
handle_domain_irq(gic-domain, irqnr, regs);
continue;
}
if (irqnr  16) {
writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
+   if (static_key_true(supports_deactivate))
+   writel_relaxed(irqstat, cpu_base + 
GIC_CPU_DEACTIVATE);
 #ifdef CONFIG_SMP
handle_IPI(irqnr, regs);
 #endif
@@ -358,7 +368,11 @@ static u8 gic_get_cpumask(struct gic_chip_data *gic)
 static void gic_cpu_if_up(void)
 {
void __iomem *cpu_base = gic_data_cpu_base(gic_data[0]);
-   u32 bypass = 0;
+   u32 bypass;
+   u32 mode = 0;
+
+   if (static_key_true(supports_deactivate))
+   mode = GIC_CPU_CTRL_EOImodeNS;
 
/*
* Preserve bypass disable bits to be written back later
@@ -366,7 +380,7 @@ static void gic_cpu_if_up(void)
bypass = readl(cpu_base + GIC_CPU_CTRL);
bypass = GICC_DIS_BYPASS_MASK;
 
-   writel_relaxed(bypass | GICC_ENABLE, cpu_base + GIC_CPU_CTRL);
+   writel_relaxed(bypass | mode | GICC_ENABLE, cpu_base + GIC_CPU_CTRL);
 }
 
 
@@ -986,6 +1000,9 @@ void __init gic_init_bases(unsigned int gic_nr, int 
irq_start,
register_cpu_notifier(gic_cpu_notifier);
 #endif
set_handle_irq(gic_handle_irq);
+   pr_info (GIC: Using EOImode == %d\n,
+static_key_true(supports_deactivate));
+
}
 
gic_dist_init(gic);
@@ -1001,6 +1018,7 @@ gic_of_init(struct device_node *node, struct device_node 
*parent)
 {
void __iomem *cpu_base;
void __iomem *dist_base;
+   struct resource cpu_res;
u32 percpu_offset;
int irq;
 
@@ -1013,6 +1031,11 @@ gic_of_init(struct device_node *node, struct device_node 
*parent)
cpu_base = of_iomap(node, 1);
WARN(!cpu_base, unable to map gic cpu registers\n);
 
+   of_address_to_resource(node, 1, cpu_res);
+   if (!gic_cnt 
+   (!is_hyp_mode_available() || resource_size(cpu_res)  SZ_8K))
+   static_key_slow_dec(supports_deactivate);
+
if (of_property_read_u32(node, cpu-offset, percpu_offset))
percpu_offset = 0;
 
@@ -1131,6 +1154,9 @@ gic_v2_acpi_init(struct acpi_table_header *table)
return -ENOMEM;
}
 
+   if (!is_hyp_mode_available())
+   static_key_slow_dec(supports_deactivate);
+
/*
 * 

[PATCH 6/6] irqchip: GIC: Don't deactivate interrupts forwarded to a guest

2015-07-09 Thread Marc Zyngier
Commit 0a4377de3056 (genirq: Introduce irq_set_vcpu_affinity() to
target an interrupt to a VCPU) added just what we needed at the
lowest level to allow an interrupt to be deactivated by a guest.

When such a request reaches the GIC, it knows it doesn't need to
perform the deactivation anymore, and can safely leave the guest
do its magic. This of course requires additional support in both
VFIO and KVM.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 drivers/irqchip/irq-gic.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 3c7f3a4..ca4074a 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -140,6 +140,11 @@ static inline unsigned int gic_irq(struct irq_data *d)
return d-hwirq;
 }
 
+static inline bool forwarded_irq(struct irq_data *d)
+{
+   return d-handler_data != NULL;
+}
+
 /*
  * Routines to acknowledge, disable and enable interrupts
  */
@@ -158,6 +163,12 @@ static int gic_peek_irq(struct irq_data *d, u32 offset)
 static void gic_mask_irq(struct irq_data *d)
 {
gic_poke_irq(d, GIC_DIST_ENABLE_CLEAR);
+   /*
+* When masking a forwarded interrupt, make sure it is
+* deactivated as well.
+*/
+   if (static_key_true(supports_deactivate)  forwarded_irq(d))
+   gic_poke_irq(d, GIC_DIST_ACTIVE_CLEAR);
 }
 
 static void gic_unmask_irq(struct irq_data *d)
@@ -167,10 +178,14 @@ static void gic_unmask_irq(struct irq_data *d)
 
 static void gic_eoi_irq(struct irq_data *d)
 {
-   if (static_key_true(supports_deactivate))
+   if (static_key_true(supports_deactivate)) {
+   /* Do not deactivate an IRQ forwarded to a vcpu. */
+   if (forwarded_irq(d))
+   return;
writel_relaxed(gic_irq(d), gic_cpu_base(d) + 
GIC_CPU_DEACTIVATE);
-   else
+   } else {
writel_relaxed(gic_irq(d), gic_cpu_base(d) + GIC_CPU_EOI);
+   }
 }
 
 static int gic_irq_set_irqchip_state(struct irq_data *d,
@@ -239,6 +254,18 @@ static int gic_set_type(struct irq_data *d, unsigned int 
type)
return gic_configure_irq(gicirq, type, base, NULL);
 }
 
+static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
+{
+   /* Only interrupts on the primary GIC can be forwarded to a vcpu. */
+   if (static_key_true(supports_deactivate) 
+   irq_data_get_irq_chip_data(d) == gic_data[0]) {
+   d-handler_data = vcpu;
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
 #ifdef CONFIG_SMP
 static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
bool force)
@@ -334,6 +361,7 @@ static struct irq_chip gic_chip = {
 #endif
.irq_get_irqchip_state  = gic_irq_get_irqchip_state,
.irq_set_irqchip_state  = gic_irq_set_irqchip_state,
+   .irq_set_vcpu_affinity  = gic_irq_set_vcpu_affinity,
.flags  = IRQCHIP_SET_TYPE_MASKED,
 };
 
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/6] irqchip: GICv3: Don't deactivate interrupts forwarded to a guest

2015-07-09 Thread Marc Zyngier
Commit 0a4377de3056 (genirq: Introduce irq_set_vcpu_affinity() to
target an interrupt to a VCPU) added just what we needed at the
lowest level to allow an interrupt to be deactivated by a guest.

When such a request reaches the GIC, it knows it doesn't need to
perform the deactivation anymore, and can safely leave the guest
do its magic. This of course requires additional support in both
VFIO and KVM.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 drivers/irqchip/irq-gic-v3.c | 29 +++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index e02592b..a1ca9e6 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -70,6 +70,11 @@ static inline int gic_irq_in_rdist(struct irq_data *d)
return gic_irq(d)  32;
 }
 
+static inline bool forwarded_irq(struct irq_data *d)
+{
+   return d-handler_data != NULL;
+}
+
 static inline void __iomem *gic_dist_base(struct irq_data *d)
 {
if (gic_irq_in_rdist(d))/* SGI+PPI - SGI_base for this CPU */
@@ -231,6 +236,12 @@ static void gic_poke_irq(struct irq_data *d, u32 offset)
 static void gic_mask_irq(struct irq_data *d)
 {
gic_poke_irq(d, GICD_ICENABLER);
+   /*
+* When masking a forwarded interrupt, make sure it is
+* deactivated as well.
+*/
+   if (static_key_true(supports_deactivate)  forwarded_irq(d))
+   gic_poke_irq(d, GICD_ICACTIVER);
 }
 
 static void gic_unmask_irq(struct irq_data *d)
@@ -296,8 +307,11 @@ static int gic_irq_get_irqchip_state(struct irq_data *d,
 static void gic_eoi_irq(struct irq_data *d)
 {
if (static_key_true(supports_deactivate)) {
-   /* No need to deactivate an LPI */
-   if (gic_irq(d) = 8192)
+   /*
+* No need to deactivate an LPI, or an interrupt that
+* is is getting forwarded to a vcpu.
+*/
+   if (gic_irq(d) = 8192 || forwarded_irq(d))
return;
gic_write_dir(gic_irq(d));
} else {
@@ -331,6 +345,16 @@ static int gic_set_type(struct irq_data *d, unsigned int 
type)
return gic_configure_irq(irq, type, base, rwp_wait);
 }
 
+static int gic_irq_set_vcpu_affinity(struct irq_data *d, void *vcpu)
+{
+   if (static_key_true(supports_deactivate)) {
+   d-handler_data = vcpu;
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
 static u64 gic_mpidr_to_affinity(u64 mpidr)
 {
u64 aff;
@@ -678,6 +702,7 @@ static struct irq_chip gic_chip = {
.irq_set_affinity   = gic_set_affinity,
.irq_get_irqchip_state  = gic_irq_get_irqchip_state,
.irq_set_irqchip_state  = gic_irq_set_irqchip_state,
+   .irq_set_vcpu_affinity  = gic_irq_set_vcpu_affinity,
.flags  = IRQCHIP_SET_TYPE_MASKED,
 };
 
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] irqchip: GIC: Use chip_data instead of handler_data for cascaded interrupts

2015-07-09 Thread Marc Zyngier
When used as a primary interrupt controller, the GIC driver uses
irq_data-chip_data to extract controller-specific information.

When used as a secondary interrupt controller, it uses handler_data
instead. As this difference is relatively pointless and only creates
confusion, change the secondary path to match what the rest of the
driver does.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 drivers/irqchip/irq-gic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index e264675..3c7f3a4 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -298,7 +298,7 @@ static void __exception_irq_entry gic_handle_irq(struct 
pt_regs *regs)
 
 static void gic_handle_cascade_irq(unsigned int irq, struct irq_desc *desc)
 {
-   struct gic_chip_data *chip_data = irq_get_handler_data(irq);
+   struct gic_chip_data *chip_data = irq_get_chip_data(irq);
struct irq_chip *chip = irq_get_chip(irq);
unsigned int cascade_irq, gic_irq;
unsigned long status;
@@ -341,7 +341,7 @@ void __init gic_cascade_irq(unsigned int gic_nr, unsigned 
int irq)
 {
if (gic_nr = MAX_GIC_NR)
BUG();
-   if (irq_set_handler_data(irq, gic_data[gic_nr]) != 0)
+   if (irq_set_chip_data(irq, gic_data[gic_nr]) != 0)
BUG();
irq_set_chained_handler(irq, gic_handle_cascade_irq);
 }
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/6] irqchip: GICv3: Convert to EOImode == 1

2015-07-09 Thread Marc Zyngier
So far, GICv3 has been used in with EOImode == 0. The effect of this
mode is to perform the priority drop and the deactivation of the
interrupt at the same time.

While this works perfectly for Linux (we only have a single priority),
it causes issues when an interrupt is forwarded to a guest, and when
we want the guest to perform the EOI itself.

For this case, the GIC architecture provides EOImode == 1, where:
- A write to ICC_EOIR1_EL1 drops the priority of the interrupt and leaves
it active. Other interrupts at the same priority level can now be taken,
but the active interrupt cannot be taken again
- A write to ICC_DIR_EL1 marks the interrupt as inactive, meaning it can
now be taken again.

This patch converts the driver to be able to use this new mode, depending
on whether or not the kernel can behave as a hypervisor. No feature change.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 drivers/irqchip/irq-gic-v3.c   | 28 +---
 include/linux/irqchip/arm-gic-v3.h |  9 +
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index c52f7ba..49768fc 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -30,6 +30,7 @@
 #include asm/cputype.h
 #include asm/exception.h
 #include asm/smp_plat.h
+#include asm/virt.h
 
 #include irq-gic-common.h
 #include irqchip.h
@@ -50,6 +51,7 @@ struct gic_chip_data {
 };
 
 static struct gic_chip_data gic_data __read_mostly;
+static struct static_key supports_deactivate = STATIC_KEY_INIT_TRUE;
 
 #define gic_data_rdist()   (this_cpu_ptr(gic_data.rdists.rdist))
 #define gic_data_rdist_rd_base()   (gic_data_rdist()-rd_base)
@@ -293,7 +295,10 @@ static int gic_irq_get_irqchip_state(struct irq_data *d,
 
 static void gic_eoi_irq(struct irq_data *d)
 {
-   gic_write_eoir(gic_irq(d));
+   if (static_key_true(supports_deactivate))
+   gic_write_dir(gic_irq(d));
+   else
+   gic_write_eoir(gic_irq(d));
 }
 
 static int gic_set_type(struct irq_data *d, unsigned int type)
@@ -343,6 +348,10 @@ static asmlinkage void __exception_irq_entry 
gic_handle_irq(struct pt_regs *regs
 
if (likely(irqnr  15  irqnr  1020) || irqnr = 8192) {
int err;
+
+   if (static_key_true(supports_deactivate))
+   gic_write_eoir(irqnr);
+
err = handle_domain_irq(gic_data.domain, irqnr, regs);
if (err) {
WARN_ONCE(true, Unexpected interrupt 
received!\n);
@@ -352,6 +361,8 @@ static asmlinkage void __exception_irq_entry 
gic_handle_irq(struct pt_regs *regs
}
if (irqnr  16) {
gic_write_eoir(irqnr);
+   if (static_key_true(supports_deactivate))
+   gic_write_dir(irqnr);
 #ifdef CONFIG_SMP
handle_IPI(irqnr, regs);
 #else
@@ -451,8 +462,13 @@ static void gic_cpu_sys_reg_init(void)
/* Set priority mask register */
gic_write_pmr(DEFAULT_PMR_VALUE);
 
-   /* EOI deactivates interrupt too (mode 0) */
-   gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir);
+   if (static_key_true(supports_deactivate)) {
+   /* EOI drops priority only (mode 1) */
+   gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop);
+   } else {
+   /* EOI deactivates interrupt too (mode 0) */
+   gic_write_ctlr(ICC_CTLR_EL1_EOImode_drop_dir);
+   }
 
/* ... and let's hit the road... */
gic_write_grpen1(1);
@@ -820,6 +836,12 @@ static int __init gic_of_init(struct device_node *node, 
struct device_node *pare
if (of_property_read_u64(node, redistributor-stride, redist_stride))
redist_stride = 0;
 
+   if (!is_hyp_mode_available())
+   static_key_slow_dec(supports_deactivate);
+
+   pr_info (GIC: Using EOImode == %d\n,
+static_key_true(supports_deactivate));
+
gic_data.dist_base = dist_base;
gic_data.redist_regions = rdist_regs;
gic_data.nr_redist_regions = nr_redist_regions;
diff --git a/include/linux/irqchip/arm-gic-v3.h 
b/include/linux/irqchip/arm-gic-v3.h
index ffbc034..bc98832 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -104,6 +104,8 @@
 #define GICR_SYNCR 0x00C0
 #define GICR_MOVLPIR   0x0100
 #define GICR_MOVALLR   0x0110
+#define GICR_ISACTIVER GICD_ISACTIVER
+#define GICR_ICACTIVER GICD_ICACTIVER
 #define GICR_IDREGSGICD_IDREGS
 #define GICR_PIDR2 GICD_PIDR2
 
@@ -288,6 +290,7 @@
 #define ICH_VMCR_PMR_MASK  (0xffUL  ICH_VMCR_PMR_SHIFT)
 
 #define ICC_EOIR1_EL1  sys_reg(3, 0, 12, 12, 1)
+#define ICC_DIR_EL1  

RE: [PATCH 2/2] Detect vGIC presence at runtime

2015-07-09 Thread Pavel Fedin
 Hello!

 I'd like to distinguish between the 'missing vgic' and 'something bad
 happened when trying to initialize the vgic' cases, which I don't think
 we do currently, because the ENXIO code is used in various situations.

 It is done. Check, for example, vgic_v2_probe(). -ENXIO is returned when some 
of resources are
either missing from DT specification or wrongly given (not page-aligned). In 
the rest of cases error
code is taken from underlying functions, which are more supposed to return 
things like -EINVAL or
-ENOMEM.
 If you are doubtful, i could suggest to replace -EINVAL with -ENODEV in cases 
where resources are
not present. This would give more clear indication of we don't have vGIC 
condition.

 I feel the init flow is relatively difficult to follow and adding a
 bunch of flags here and there doesn't seem to help.  By adding a
 function with a proper comment, it should be more clear, and I don't
 like the switch statement on the error return values.

 Well, the alternative is:
 1. If GIC node or vGIC resources are not present, return 0 instead of error 
code.
 2. Implement a function which you suggest.
 Then:
 1. Situation with missing vGIC is just considered to be normal; it's not a 
to-be-ignored error any
more.
 2. vGIC present situation is automatically determined by vgic_ops != NULL; 
this means that probe
function completely worked and vGIC implementation has been chosen.

 Agree?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


 -Original Message-
 From: Christoffer Dall [mailto:christoffer.d...@linaro.org]
 Sent: Thursday, July 09, 2015 4:25 PM
 To: Pavel Fedin
 Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; 'Marc Zyngier'
 Subject: Re: [PATCH 2/2] Detect vGIC presence at runtime
 
 On Thu, Jul 09, 2015 at 03:50:49PM +0300, Pavel Fedin wrote:
   Hello!
 
   why not report ENXIO as an error?  If probing the vgic fails due to
   being unable to request the irq or something similar, then surely your
   system has and error and this should be reported.
 
   It is reported by probe function itself.
   -ENODEV here means there's no GIC at all. -ENXIO happens when, for 
  example, there is GIC node
in
  the device tree, but it does not specify vGIC resources. Normally this 
  means that vGIC is
defunct on
  the machine.
 
 
   This may be more nicely implemented by letting the vgic init/probe
   functions set the vgic_present, or maybe better yet, just export a
   function from vgic.c:
  
   bool kvm_vgic_present(void)
   {
 return vgic_ops != NULL;
   }
 
   Is it necessary? Actually this flag is not needed anywhere else except 
  arch/arm/kvm/arm.c, only
at
  init time. Runtime should, i believe, use irqchip_in_kernel(), because 
  userland can choose just
not
  to use vGIC for some reason (testing for example).
 
 I feel the init flow is relatively difficult to follow and adding a
 bunch of flags here and there doesn't seem to help.  By adding a
 function with a proper comment, it should be more clear, and I don't
 like the switch statement on the error return values.
 
 -Christoffer

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target-i386: Sanity check host processor physical address width

2015-07-09 Thread Bandan Das
Paolo Bonzini pbonz...@redhat.com writes:

 On 09/07/2015 10:26, Laszlo Ersek wrote:
  
  Perhaps KVM could simply hide memory above the limit (i.e. treat it as
  MMIO), and the BIOS could remove RAM above the limit from the e820
  memory map?
 I'd prefer to leave the guest firmware*s* out of this... :)
 
 E820 is a legacy BIOS concept. In OVMF we'd have to hack the memory
 resource descriptor HOBs (which in turn control the DXE memory space
 map, which in turn controls the UEFI memory map). Those HOBs are
 currently based on what the CMOS reports about the RAM available under
 and above 4GB.
 
 It's pretty complex already (will get more complex with SMM support),
 and TBH, for working around such an obscure issue, I wouldn't like to
 complicate it even further...
 
 After all, this is a host platform limitation. The solution should be to
 either move to a more capable host, or do it in software (disable EPT).

 The reason I mentioned the firmware is because you could in principle
 have the same issue on real hardware - say putting 128 GB on your
 laptop.  The firmware should cope with it.

Agreed, it's probably not a good idea to deviate too much from how real
hardware would behave IMO. As a simplification of Paolo's idea, is it
possible for qemu to completely ignore memory above the limit ? Will
that break anything ? :)

 If OVMF does not use etc/e820, it can instead hack the values it reads
 from CMOS, bounding them according to the CPUID value.

 Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >