Re: KVM on ARM Cortex A53 in 32-bit Mode

2017-03-15 Thread Yasutaka, T
Hi Christoffer and other KVM/ARM community members,

I tried the following QEMU and kernel combination on a Raspberry Pi 3
(in AArch64 mode), but it failed:

Latest commit of Alexander Graf's no-kvm-irqchip QEMU branch:
https://github.com/agraf/qemu/tree/no-kvm-irqchip

raspberrypi/linux's rpi-4.10.y branch (forked from Linux 4.10-rc8?), latest
commit bb0ff9d059c67e1611c7422f7982a6a4876efe67, patched with the diff between
Linux 4.10-rc5 (commit 7a308bb3016f57e5be11a677d15b821536419d36) and the
latest commit of Christoffer's irqs-to-user-v2 branch
(d0a7cc725535df1b9cc64b442d246c20a9edb904)

and got the following error:

pi@raspberrypi:~/os/graf/no-kvm-irqchip/aarch64-softmmu $
qemu-system-aarch64 --enable-kvm -M virt
"kvm" accelerator not found.
No accelerator found!

pi@raspberrypi:~/os/graf/no-kvm-irqchip/aarch64-softmmu $ lkvm run test
  # lkvm run -k test -m 448 -c 4 --name guest-1605
  Error: Unsupported KVM extension detected: KVM_CAP_IRQCHIP
  Fatal: Failed to create virtual GIC


This seems strange to me because dmesg shows the kernel is 4.10.2-v8+, the
boot CPU is AArch64, all CPUs are initialized in HYP mode, and /dev/kvm
exists. So I assumed GIC emulation would also be enabled. Do I need to enable
other config flags or change some settings? Or is this way of applying the
patch wrong? My current config.gz is:
https://gist.github.com/caprice-j/d6e144cb8727df813a4fbe716beac3cb
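
As a minimal sanity check (just a sketch using only the generic /dev/kvm
ioctl API, nothing board-specific), I believe something like this should show
whether the kernel advertises the in-kernel irqchip capability that lkvm
complains about:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);

	if (kvm < 0) {
		perror("open /dev/kvm");
		return 1;
	}
	/* Both ioctls are part of the documented KVM API. */
	printf("KVM API version : %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));
	printf("KVM_CAP_IRQCHIP : %d\n",
	       ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_IRQCHIP));
	return 0;
}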

My other concerns are that no kvm-ok command is available and that
/proc/cpuinfo does not show any virtualization-support flag (I believe the
RPI3's Cortex-A53 (ARMv8) has built-in virtualization support and simply does
not advertise it explicitly in /proc/cpuinfo ... but I'm not sure that is
true):

pi@raspberrypi:~/os/graf/no-kvm-irqchip/aarch64-softmmu $ cat /proc/cpuinfo
processor   : 0
BogoMIPS: 38.40
Features: fp asimd evtstrm crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

I would be delighted if someone could give me some suggestions.

Yasutaka

2017-03-07 4:55 GMT-05:00 Christoffer Dall :

> Hi Yasutaka,
>
> On Thu, Mar 02, 2017 at 09:57:50AM -0500, Yasutaka Tanaka wrote:
> > Hello KVM/ARM community,
> >
> > I am now trying to execute KVM on Raspberry Pi 3 Model B (RPI3).
> > Does anyone know whether KVM/ARM can run with 32-bit mode on ARM Cortex
> A53?
>
> Theoretically, it should.
>
> >
> > The CPU of RPI3 is ARM Cortex A53 which can be executed in both 32-bit
> and
> > 64-bit mode. I've managed to display /dev/kvm on 64-bit mode execution
> > (i.e. with a 64-bit linux kernel) on RPI3, but I failed to do so in
> 32-bit
> > mode.
> >
> > In 32-bit mode, a KVM error message says CPU not supported. I guess the
> > reason for the error message is KVM_ARM_TARGET_CORTEX_A53 is only
> returned
> > in /arch/arm64/kvm/guest.c not under/arch/arm/kvm/guest.c.
> >
> > So I'm wondering whether it's possible or not.
>
> It should be.  You may just have to add the minimal missing definitions
> for 32-bit A53 CPU support.
>
> > What I'd like to do is to
> > compare the execution speed of KVM between 32-bit and 64-bit mode. The
> > second plan is to compare between KVMs of 32-bit RPI2 and 64-bit RPI3 (I
> > have both), but since the RPI2's spec is a little low, I'm worried that some
> > numeric comparisons would be unfair.
>
> I would be careful here, since the RPI3 doesn't have the GIC and you'll
> end up going to user space a lot for interrupt handling.  Especially
> look out for SMP workloads in the VM which may generate a lot of IPIs
> which are really going to suffer from having to go to userspace on the
> host.
>
> >
> > (I was also notified by a KVM/ARM expert that since RPI doesn't have a
> GIC
> > and I need to apply some patches)
>
> Yes, you need the "Support userspace irqchip with arch timers" series
> and corresponding QEMU patches:
>
> http://www.spinics.net/lists/kvm/msg144685.html
>
> Good luck.
> -Christoffer
>



-- 
Yasutaka TANAKA
M.S. Candidate, Computer Science
Fu Foundation School of Engineering and Applied Science
Columbia University

y.tan...@columbia.edu
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory

2017-03-15 Thread James Morse
Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], notifications for
broken memory can call memory_failure() in mm/memory-failure.c to deliver
SIGBUS to any user space process using the page, and notify all the
in-kernel users.

If the page corresponded with guest memory, KVM will unmap this page
from its stage2 page tables. The user space process that allocated
this memory may never have touched this page, in which case it may not
be mapped, meaning SIGBUS won't be delivered.

When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it
comes to process the stage2 fault.

Do as x86 does, and deliver the SIGBUS when we discover
KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb
as this matches the user space mapping size.

Signed-off-by: James Morse 
CC: gengdongjiu 
---
 Without this patch both kvmtool and Qemu exit as the KVM_RUN ioctl() returns
 EFAULT.
 QEMU: error: kvm run failed Bad address
  LKVM: KVM_RUN failed: Bad address

 With this patch both kvmtool and Qemu receive SIGBUS ... and then exit.
 In the future Qemu can use this signal to notify the guest; for more details,
 see hwpoison[1].

 [0] https://www.spinics.net/lists/arm-kernel/msg560009.html
 [1] 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hwpoison.txt
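
 For illustration only, a rough userspace sketch (not part of this patch) of
 how a VMM might observe the signal; a real VMM would record the fault and
 handle it from its KVM_RUN loop rather than printing inside the handler:

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* The kernel also reports the mapping size via si_addr_lsb (PAGE_SHIFT or
 * PMD_SHIFT as set above); only si_code and si_addr are inspected here. */
static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
	if (si->si_code == BUS_MCEERR_AR)
		fprintf(stderr, "hwpoison at hva %p\n", si->si_addr);
}

int main(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGBUS, &sa, NULL);
	/* ... open /dev/kvm, set up the VM, loop on KVM_RUN ... */
	return 0;
}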


 arch/arm/kvm/mmu.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 962616fd4ddd..9d1aa294e88f 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -20,8 +20,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct kvm_vcpu 
*vcpu, kvm_pfn_t pfn,
__coherent_cache_guest_page(vcpu, pfn, size);
 }
 
+static void kvm_send_hwpoison_signal(unsigned long address, bool hugetlb)
+{
+   siginfo_t info;
+
+   info.si_signo   = SIGBUS;
+   info.si_errno   = 0;
+   info.si_code= BUS_MCEERR_AR;
+   info.si_addr= (void __user *)address;
+
+   if (hugetlb)
+   info.si_addr_lsb = PMD_SHIFT;
+   else
+   info.si_addr_lsb = PAGE_SHIFT;
+
+   send_sig_info(SIGBUS, &info, current);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  struct kvm_memory_slot *memslot, unsigned long hva,
  unsigned long fault_status)
@@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
smp_rmb();
 
	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
+   if (pfn == KVM_PFN_ERR_HWPOISON) {
+   kvm_send_hwpoison_signal(hva, hugetlb);
+   return 0;
+   }
if (is_error_noslot_pfn(pfn))
return -EFAULT;
 
-- 
2.10.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH RFC 6/7] ARM64: KVM: Support heterogeneous system

2017-03-15 Thread Andrew Jones
On Wed, Mar 15, 2017 at 04:22:37PM +0100, Christoffer Dall wrote:
> On Wed, Mar 15, 2017 at 03:06:33PM +0100, Andrew Jones wrote:
> > On Wed, Mar 15, 2017 at 02:36:45PM +0100, Christoffer Dall wrote:
> > > > If QEMU wants to know
> > > > whether or not the host it's running on is heterogeneous, then
> > > > it can just query sysfs, rather than ask KVM.
> > > > 
> > > 
> > > Can it?  Is this information available in a reliable way from userspace?
> > 
> > I don't know much (anything) about it, but, afaict, yes. See
> > https://lkml.org/lkml/2017/1/19/380
> > 
> > > > > I'm sure I missed some aspect of this discussion though.
> > > > 
> > > > Me too. As we discussed, it's probably time to try and hash out a fresh
> > > > design doc. It'd be good to get a clear design agreed upon before
> > > > returning to the patches.
> > > > 
> > > 
> > > Yes, it's on my list.
> > 
> > I'm happy to help with this, even to take a stab at the first draft. Just
> > let me know if you or Shannon prefer to do the draft yourselves. If you
> > don't have a preference, then I can start working on it pretty soon.
> > 
> I'd be really happy for you to draft a new design doc, and I can review
> in private or publicly first as you prefer.
>

OK, I'll hopefully have something pulled together by the end of next week.

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH RFC 6/7] ARM64: KVM: Support heterogeneous system

2017-03-15 Thread Christoffer Dall
On Wed, Mar 15, 2017 at 03:06:33PM +0100, Andrew Jones wrote:
> On Wed, Mar 15, 2017 at 02:36:45PM +0100, Christoffer Dall wrote:
> > > If QEMU wants to know
> > > whether or not the host it's running on is heterogeneous, then
> > > it can just query sysfs, rather than ask KVM.
> > > 
> > 
> > Can it?  Is this information available in a reliable way from userspace?
> 
> I don't know much (anything) about it, but, afaict, yes. See
> https://lkml.org/lkml/2017/1/19/380
> 
> > > > I'm sure I missed some aspect of this discussion though.
> > > 
> > > Me too. As we discussed, it's probably time to try and hash out a fresh
> > > design doc. It'd be good to get a clear design agreed upon before
> > > returning to the patches.
> > > 
> > 
> > Yes, it's on my list.
> 
> I'm happy to help with this, even to take a stab at the first draft. Just
> let me know if you or Shannon prefer to do the draft yourselves. If you
> don't have a preference, then I can start working on it pretty soon.
> 
I'd be really happy for you to draft a new design doc, and I can review
in private or publicly first as you prefer.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-03-15 Thread Marc Zyngier
On 15/03/17 14:33, Suzuki K Poulose wrote:
> On 15/03/17 13:28, Marc Zyngier wrote:
>> On 15/03/17 10:56, Christoffer Dall wrote:
>>> On Wed, Mar 15, 2017 at 09:39:26AM +, Marc Zyngier wrote:
 On 15/03/17 09:21, Christoffer Dall wrote:
> On Tue, Mar 14, 2017 at 02:52:34PM +, Suzuki K Poulose wrote:
>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>> unmap_stage2_range() on the entire memory range for the guest. This could
>> cause problems with other callers (e.g, munmap on a memslot) trying to
>> unmap a range.
>>
>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>> Cc: sta...@vger.kernel.org # v3.10+
>> Cc: Marc Zyngier 
>> Cc: Christoffer Dall 
>> Signed-off-by: Suzuki K Poulose 
> 
> ...
> 
>>> ok, then there's just the concern that we may be holding a spinlock for
>>> a very long time.  I seem to recall Mario once added something where he
>>> unlocked and gave a chance to schedule something else for each PUD or
>>> something like that, because he ran into the issue during migration.  Am
>>> I confusing this with something else?
>>
>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>> to give the system a chance to breathe. Maybe we could use a similar
>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616fd4ddd..1786c24212d4 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, 
>> phys_addr_t start, u64 size)
>>  phys_addr_t addr = start, end = start + size;
>>  phys_addr_t next;
>>
>> +BUG_ON(!spin_is_locked(&kvm->mmu_lock));
>> +
>>  pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>  do {
>> +if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> +cond_resched_lock(&kvm->mmu_lock);
> 
> nit: I think we could call cond_resched_lock() unconditionally here,
> given __cond_resched_lock() already does all the above checks:
> 
> kernel/sched/core.c:
> 
> int __cond_resched_lock(spinlock_t *lock)
> {
>  int resched = should_resched(PREEMPT_LOCK_OFFSET);
> 
> ...
> 
>  if (spin_needbreak(lock) || resched) {

Right. And should_resched() also contains a test for need_resched().

This means we can also simplify stage2_wp_range(). Awesome!

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH RFC 6/7] ARM64: KVM: Support heterogeneous system

2017-03-15 Thread Mark Rutland
On Wed, Mar 15, 2017 at 03:06:33PM +0100, Andrew Jones wrote:
> On Wed, Mar 15, 2017 at 02:36:45PM +0100, Christoffer Dall wrote:
> > > If QEMU wants to know
> > > whether or not the host it's running on is heterogeneous, then
> > > it can just query sysfs, rather than ask KVM.
> > > 
> > 
> > Can it?  Is this information available in a reliable way from userspace?
> 
> I don't know much (anything) about it, but, afaict, yes. See
> https://lkml.org/lkml/2017/1/19/380

The "capacity" of a CPU does *not* tell you if your system is
heterogeneous. Two vastly different CPU implementations can stumble
upon the same capacity, and two identical implementations could be
assigned close but not identical capacities.

The "capacity" is purely a scheduler heuristic, and should not be relied
upon for functional correctness.

We have a sysfs interface to see the MIDR and REVIDR of (online) CPUs,
which can tell you. See Documentation/arm64/cpu-feature-registers.txt.

Whether a system is heterogeneous can change at runtime, as CPUs can be
brought online very late (e.g. if booted with maxcpus capped, or if we
get "real" hotplug in future).
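
For illustration, a rough sketch (not from this series) that compares MIDR_EL1
across the CPUs currently exposed in sysfs; the path below is the one I'd
expect from the arm64 cpuinfo code, and as noted above this only covers CPUs
that are online at the time:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char path[128], midr[64], first[64] = "";
	int cpu;

	for (cpu = 0; ; cpu++) {
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/regs/identification/midr_el1",
			 cpu);
		f = fopen(path, "r");
		if (!f)
			break;	/* no more CPUs exposing ID registers */
		if (!fgets(midr, sizeof(midr), f))
			midr[0] = '\0';
		fclose(f);
		midr[strcspn(midr, "\n")] = '\0';
		if (!first[0]) {
			strcpy(first, midr);
		} else if (strcmp(first, midr)) {
			printf("heterogeneous: cpu0 %s vs cpu%d %s\n",
			       first, cpu, midr);
			return 1;
		}
	}
	printf("all CPUs seen report MIDR %s\n", first);
	return 0;
}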

Thanks,
Mark.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH RFC 6/7] ARM64: KVM: Support heterogeneous system

2017-03-15 Thread Andrew Jones
On Wed, Mar 15, 2017 at 02:21:48PM +, Peter Maydell wrote:
> On 15 March 2017 at 14:06, Andrew Jones  wrote:
> > On Wed, Mar 15, 2017 at 02:36:45PM +0100, Christoffer Dall wrote:
> >> > If QEMU wants to know
> >> > whether or not the host it's running on is heterogeneous, then
> >> > it can just query sysfs, rather than ask KVM.
> >> >
> >>
> >> Can it?  Is this information available in a reliable way from userspace?
> >
> > I don't know much (anything) about it, but, afaict, yes. See
> > https://lkml.org/lkml/2017/1/19/380
> 
> AFAICT that won't work if the CPU you're trying to
> look at the ID registers for happens to be offline at
> the time you're looking at it.
>

Hmm, OK, but I think the jury is still out on whether QEMU even
needs to know this information. If we push the burden up to the
user/libvirt, then this information can be obtained [somehow] at
the higher level. The higher level can then choose to use generic,
if no vcpu affinity is desired, or, if one of the heterogeneous
cpu types is preferred for the vcpu's model, then it can also
ensure the vcpu is affined to the appropriate cpuset.
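
(For illustration, a rough sketch of what that affinity step could look like
at the VMM level; the CPU numbers are placeholders, and a real setup would
take them from libvirt/cpuset configuration rather than hard-coding them.)

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin the calling (vCPU) thread to the given host CPUs. */
static int pin_self_to_cpus(const int *cpus, int ncpus)
{
	cpu_set_t set;
	int i;

	CPU_ZERO(&set);
	for (i = 0; i < ncpus; i++)
		CPU_SET(cpus[i], &set);
	return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
	int big_cores[] = { 4, 5 };	/* placeholder CPU numbers */
	int err = pin_self_to_cpus(big_cores, 2);

	if (err)
		fprintf(stderr, "pthread_setaffinity_np: %d\n", err);
	return err ? 1 : 0;
}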

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-03-15 Thread Suzuki K Poulose

On 15/03/17 13:28, Marc Zyngier wrote:

On 15/03/17 10:56, Christoffer Dall wrote:

On Wed, Mar 15, 2017 at 09:39:26AM +, Marc Zyngier wrote:

On 15/03/17 09:21, Christoffer Dall wrote:

On Tue, Mar 14, 2017 at 02:52:34PM +, Suzuki K Poulose wrote:

In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
unmap_stage2_range() on the entire memory range for the guest. This could
cause problems with other callers (e.g, munmap on a memslot) trying to
unmap a range.

Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Cc: sta...@vger.kernel.org # v3.10+
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 


...


ok, then there's just the concern that we may be holding a spinlock for
a very long time.  I seem to recall Mario once added something where he
unlocked and gave a chance to schedule something else for each PUD or
something like that, because he ran into the issue during migration.  Am
I confusing this with something else?


That definitely rings a bell: stage2_wp_range() uses that kind of trick
to give the system a chance to breathe. Maybe we could use a similar
trick in our S2 unmapping code? How about this (completely untested) patch:

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 962616fd4ddd..1786c24212d4 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, 
phys_addr_t start, u64 size)
phys_addr_t addr = start, end = start + size;
phys_addr_t next;

+   BUG_ON(!spin_is_locked(&kvm->mmu_lock));
+
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
do {
+   if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+   cond_resched_lock(&kvm->mmu_lock);


nit: I think we could call cond_resched_lock() unconditionally here,
given __cond_resched_lock() already does all the above checks:

kernel/sched/core.c:

int __cond_resched_lock(spinlock_t *lock)
{
int resched = should_resched(PREEMPT_LOCK_OFFSET);

...

if (spin_needbreak(lock) || resched) {


Suzuki
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH RFC 6/7] ARM64: KVM: Support heterogeneous system

2017-03-15 Thread Peter Maydell
On 15 March 2017 at 14:06, Andrew Jones  wrote:
> On Wed, Mar 15, 2017 at 02:36:45PM +0100, Christoffer Dall wrote:
>> > If QEMU wants to know
>> > whether or not the host it's running on is heterogeneous, then
>> > it can just query sysfs, rather than ask KVM.
>> >
>>
>> Can it?  Is this information available in a reliable way from userspace?
>
> I don't know much (anything) about it, but, afaict, yes. See
> https://lkml.org/lkml/2017/1/19/380

AFAICT that won't work if the CPU you're trying to
look at the ID registers for happens to be offline at
the time you're looking at it.

thanks
-- PMM
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH RFC 6/7] ARM64: KVM: Support heterogeneous system

2017-03-15 Thread Andrew Jones
On Wed, Mar 15, 2017 at 02:36:45PM +0100, Christoffer Dall wrote:
> > If QEMU wants to know
> > whether or not the host it's running on is heterogeneous, then
> > it can just query sysfs, rather than ask KVM.
> > 
> 
> Can it?  Is this information available in a reliable way from userspace?

I don't know much (anything) about it, but, afaict, yes. See
https://lkml.org/lkml/2017/1/19/380

> > > I'm sure I missed some aspect of this discussion though.
> > 
> > Me too. As we discussed, it's probably time to try and hash out a fresh
> > design doc. It'd be good to get a clear design agreed upon before
> > returning to the patches.
> > 
> 
> Yes, it's on my list.

I'm happy to help with this, even to take a stab at the first draft. Just
let me know if you or Shannon prefer to do the draft yourselves. If you
don't have a preference, then I can start working on it pretty soon.

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-03-15 Thread Robin Murphy
Hi Marc,

On 15/03/17 13:43, Marc Zyngier wrote:
> On 15/03/17 13:35, Christoffer Dall wrote:
>> On Wed, Mar 15, 2017 at 01:28:07PM +, Marc Zyngier wrote:
>>> On 15/03/17 10:56, Christoffer Dall wrote:
 On Wed, Mar 15, 2017 at 09:39:26AM +, Marc Zyngier wrote:
> On 15/03/17 09:21, Christoffer Dall wrote:
>> On Tue, Mar 14, 2017 at 02:52:34PM +, Suzuki K Poulose wrote:
>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>> unmap_stage2_range() on the entire memory range for the guest. This 
>>> could
>>> cause problems with other callers (e.g, munmap on a memslot) trying to
>>> unmap a range.
>>>
>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>> Cc: sta...@vger.kernel.org # v3.10+
>>> Cc: Marc Zyngier 
>>> Cc: Christoffer Dall 
>>> Signed-off-by: Suzuki K Poulose 
>>> ---
>>>  arch/arm/kvm/mmu.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 13b9c1f..b361f71 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>> if (kvm->arch.pgd == NULL)
>>> return;
>>>  
>>> +   spin_lock(&kvm->mmu_lock);
>>> unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>>> +   spin_unlock(&kvm->mmu_lock);
>>> +
>>
>> This ends up holding the spin lock for potentially quite a while, where
>> we can do things like __flush_dcache_area(), which I think can fault.
>
> I believe we're always using the linear mapping (or kmap on 32bit) in
> order not to fault.
>

 ok, then there's just the concern that we may be holding a spinlock for
 a very long time.  I seem to recall Mario once added something where he
 unlocked and gave a chance to schedule something else for each PUD or
 something like that, because he ran into the issue during migration.  Am
 I confusing this with something else?
>>>
>>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>>> to give the system a chance to breathe. Maybe we could use a similar
>>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 962616fd4ddd..1786c24212d4 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, 
>>> phys_addr_t start, u64 size)
>>> phys_addr_t addr = start, end = start + size;
>>> phys_addr_t next;
>>>  
>>> +   BUG_ON(!spin_is_locked(&kvm->mmu_lock));

Nit: assert_spin_locked() is somewhat more pleasant (and currently looks
to expand to the exact same code).

Robin.

>>> +
>>> pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>> do {
>>> +   if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>>> +   cond_resched_lock(&kvm->mmu_lock);
>>> +
>>> next = stage2_pgd_addr_end(addr, end);
>>> if (!stage2_pgd_none(*pgd))
>>> unmap_stage2_puds(kvm, pgd, addr, next);
>>>
>>> The additional BUG_ON() is just for my own peace of mind - we seem to
>>> have missed a couple of these lately, and the "breathing" code makes
>>> it imperative that this lock is being taken prior to entering the
>>> function.
>>>
>>
>> Looks good to me!
> 
> OK. I'll stash that on top of Suzuki's series, and start running some
> actual tests... ;-)
> 
> Thanks,
> 
>   M.
> 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-03-15 Thread Marc Zyngier
On 15/03/17 13:35, Christoffer Dall wrote:
> On Wed, Mar 15, 2017 at 01:28:07PM +, Marc Zyngier wrote:
>> On 15/03/17 10:56, Christoffer Dall wrote:
>>> On Wed, Mar 15, 2017 at 09:39:26AM +, Marc Zyngier wrote:
 On 15/03/17 09:21, Christoffer Dall wrote:
> On Tue, Mar 14, 2017 at 02:52:34PM +, Suzuki K Poulose wrote:
>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>> unmap_stage2_range() on the entire memory range for the guest. This could
>> cause problems with other callers (e.g, munmap on a memslot) trying to
>> unmap a range.
>>
>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>> Cc: sta...@vger.kernel.org # v3.10+
>> Cc: Marc Zyngier 
>> Cc: Christoffer Dall 
>> Signed-off-by: Suzuki K Poulose 
>> ---
>>  arch/arm/kvm/mmu.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 13b9c1f..b361f71 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>  if (kvm->arch.pgd == NULL)
>>  return;
>>  
>> +spin_lock(&kvm->mmu_lock);
>>  unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>> +spin_unlock(&kvm->mmu_lock);
>> +
>
> This ends up holding the spin lock for potentially quite a while, where
> we can do things like __flush_dcache_area(), which I think can fault.

 I believe we're always using the linear mapping (or kmap on 32bit) in
 order not to fault.

>>>
>>> ok, then there's just the concern that we may be holding a spinlock for
>>> a very long time.  I seem to recall Mario once added something where he
>>> unlocked and gave a chance to schedule something else for each PUD or
>>> something like that, because he ran into the issue during migration.  Am
>>> I confusing this with something else?
>>
>> That definitely rings a bell: stage2_wp_range() uses that kind of trick
>> to give the system a chance to breathe. Maybe we could use a similar
>> trick in our S2 unmapping code? How about this (completely untested) patch:
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616fd4ddd..1786c24212d4 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, 
>> phys_addr_t start, u64 size)
>>  phys_addr_t addr = start, end = start + size;
>>  phys_addr_t next;
>>  
>> +BUG_ON(!spin_is_locked(&kvm->mmu_lock));
>> +
>>  pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>  do {
>> +if (need_resched() || spin_needbreak(&kvm->mmu_lock))
>> +cond_resched_lock(&kvm->mmu_lock);
>> +
>>  next = stage2_pgd_addr_end(addr, end);
>>  if (!stage2_pgd_none(*pgd))
>>  unmap_stage2_puds(kvm, pgd, addr, next);
>>
>> The additional BUG_ON() is just for my own peace of mind - we seem to
>> have missed a couple of these lately, and the "breathing" code makes
>> it imperative that this lock is being taken prior to entering the
>> function.
>>
> 
> Looks good to me!

OK. I'll stash that on top of Suzuki's series, and start running some
actual tests... ;-)

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH RFC 6/7] ARM64: KVM: Support heterogeneous system

2017-03-15 Thread Christoffer Dall
On Wed, Mar 15, 2017 at 01:51:13PM +0100, Andrew Jones wrote:
> On Wed, Mar 15, 2017 at 12:50:44PM +0100, Christoffer Dall wrote:
> > Hi Drew,
> > 
> > [Replying here to try to capture the discussion about this patch we had
> > at connect].
> > 
> > On Sat, Jan 28, 2017 at 03:55:51PM +0100, Andrew Jones wrote:
> > > On Mon, Jan 16, 2017 at 05:33:33PM +0800, Shannon Zhao wrote:
> > > > From: Shannon Zhao 
> > > > 
> > > > When initializing KVM, check whether physical hardware is a
> > > > heterogeneous system through the MIDR values. If so, force userspace to
> > > > set the KVM_ARM_VCPU_CROSS feature bit. Otherwise, it should fail to
> > > > initialize VCPUs.
> > > > 
> > > > Signed-off-by: Shannon Zhao 
> > > > ---
> > > >  arch/arm/kvm/arm.c   | 26 ++
> > > >  include/uapi/linux/kvm.h |  1 +
> > > >  2 files changed, 27 insertions(+)
> > > > 
> > > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > > index bdceb19..21ec070 100644
> > > > --- a/arch/arm/kvm/arm.c
> > > > +++ b/arch/arm/kvm/arm.c
> > > > @@ -46,6 +46,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >  
> > > >  #ifdef REQUIRES_VIRT
> > > >  __asm__(".arch_extension   virt");
> > > > @@ -65,6 +66,7 @@ static unsigned int kvm_vmid_bits __read_mostly;
> > > >  static DEFINE_SPINLOCK(kvm_vmid_lock);
> > > >  
> > > >  static bool vgic_present;
> > > > +static bool heterogeneous_system;
> > > >  
> > > >  static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
> > > >  
> > > > @@ -210,6 +212,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
> > > > long ext)
> > > > case KVM_CAP_ARM_CROSS_VCPU:
> > > > r = 1;
> > > > break;
> > > > +   case KVM_CAP_ARM_HETEROGENEOUS:
> > > > +   r = heterogeneous_system;
> > > > +   break;
> > > 
> > > What's this for? When/why would userspace check it?
> > > 
> > 
> > Without a capability, how can userspace tell the difference between "I
> > got -EINVAL because I'm on an old kernel" and "I asked for something that
> > no kernel can emulate"?
> > 
> > Do we need to distinguish between these cases?
> 
> Yup, I'm in full agreement that we need a capability for the
> cross-vcpu support. Above this heterogeneous one there's the
> CROSS_VCPU one though. Do we need both? 

Probably not.

> If QEMU wants to know
> whether or not the host it's running on is heterogeneous, then
> it can just query sysfs, rather than ask KVM.
> 

Can it?  Is this information available in a reliable way from userspace?

> > 
> > > > case KVM_CAP_COALESCED_MMIO:
> > > > r = KVM_COALESCED_MMIO_PAGE_OFFSET;
> > > > break;
> > > > @@ -812,6 +817,12 @@ static int kvm_vcpu_set_target(struct kvm_vcpu 
> > > > *vcpu,
> > > > int phys_target = kvm_target_cpu();
> > > > bool cross_vcpu = kvm_vcpu_has_feature_cross_cpu(init);
> > > >  
> > > > +   if (heterogeneous_system && !cross_vcpu) {
> > > > +   kvm_err("%s:Host is a heterogeneous system, set 
> > > > KVM_ARM_VCPU_CROSS bit\n",
> > > > +   __func__);
> > > > +   return -EINVAL;
> > > > +   }
> > > 
> > > Instead of forcing userspace to set a bit, why not just confirm the
> > > target selected will work? E.g. if only generic works on a heterogeneous
> > > system then just 
> > > 
> > >  if (heterogeneous_system && init->target != GENERIC)
> > > return -EINVAL
> > > 
> > > should work
> > > 
> > 
> > Yes, I think we concluded that if we advertise if we can do the
> > cross type emulation or not, then if we can do the emulation we should
> > just do it when possible, for maximum user experience.
> 
> Your agreement here implies to me that we only need the one capability.
> 

Yes.

> > 
> > I'm sure I missed some aspect of this discussion though.
> 
> Me too. As we discussed, it's probably time to try and hash out a fresh
> design doc. It'd be good to get a clear design agreed upon before
> returning to the patches.
> 

Yes, it's on my list.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-03-15 Thread Christoffer Dall
On Wed, Mar 15, 2017 at 01:28:07PM +, Marc Zyngier wrote:
> On 15/03/17 10:56, Christoffer Dall wrote:
> > On Wed, Mar 15, 2017 at 09:39:26AM +, Marc Zyngier wrote:
> >> On 15/03/17 09:21, Christoffer Dall wrote:
> >>> On Tue, Mar 14, 2017 at 02:52:34PM +, Suzuki K Poulose wrote:
>  In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>  unmap_stage2_range() on the entire memory range for the guest. This could
>  cause problems with other callers (e.g, munmap on a memslot) trying to
>  unmap a range.
> 
>  Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>  Cc: sta...@vger.kernel.org # v3.10+
>  Cc: Marc Zyngier 
>  Cc: Christoffer Dall 
>  Signed-off-by: Suzuki K Poulose 
>  ---
>   arch/arm/kvm/mmu.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
>  diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>  index 13b9c1f..b361f71 100644
>  --- a/arch/arm/kvm/mmu.c
>  +++ b/arch/arm/kvm/mmu.c
>  @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>   if (kvm->arch.pgd == NULL)
>   return;
>   
>  +spin_lock(&kvm->mmu_lock);
>   unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>  +spin_unlock(&kvm->mmu_lock);
>  +
> >>>
> >>> This ends up holding the spin lock for potentially quite a while, where
> >>> we can do things like __flush_dcache_area(), which I think can fault.
> >>
> >> I believe we're always using the linear mapping (or kmap on 32bit) in
> >> order not to fault.
> >>
> > 
> > ok, then there's just the concern that we may be holding a spinlock for
> > a very long time.  I seem to recall Mario once added something where he
> > unlocked and gave a chance to schedule something else for each PUD or
> > something like that, because he ran into the issue during migration.  Am
> > I confusing this with something else?
> 
> That definitely rings a bell: stage2_wp_range() uses that kind of trick
> to give the system a chance to breathe. Maybe we could use a similar
> trick in our S2 unmapping code? How about this (completely untested) patch:
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616fd4ddd..1786c24212d4 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, 
> phys_addr_t start, u64 size)
>   phys_addr_t addr = start, end = start + size;
>   phys_addr_t next;
>  
> + BUG_ON(!spin_is_locked(&kvm->mmu_lock));
> +
>   pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>   do {
> + if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> + cond_resched_lock(&kvm->mmu_lock);
> +
>   next = stage2_pgd_addr_end(addr, end);
>   if (!stage2_pgd_none(*pgd))
>   unmap_stage2_puds(kvm, pgd, addr, next);
> 
> The additional BUG_ON() is just for my own peace of mind - we seem to
> have missed a couple of these lately, and the "breathing" code makes
> it imperative that this lock is being taken prior to entering the
> function.
> 

Looks good to me!

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/3] kvm: arm/arm64: Take mmap_sem in stage2_unmap_vm

2017-03-15 Thread Paolo Bonzini


On 15/03/2017 10:17, Christoffer Dall wrote:
> On Tue, Mar 14, 2017 at 02:52:32PM +, Suzuki K Poulose wrote:
>> From: Marc Zyngier 
>>
>> We don't hold the mmap_sem while searching for the VMAs when
>> we try to unmap each memslot for a VM. Fix this properly to
>> avoid unexpected results.
>>
>> Fixes: commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm")
>> Cc: sta...@vger.kernel.org # v3.19+
>> Cc: Christoffer Dall 
>> Signed-off-by: Marc Zyngier 
>> Signed-off-by: Suzuki K Poulose 
>> ---
>>  arch/arm/kvm/mmu.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616f..f2e2e0c 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -803,6 +803,7 @@ void stage2_unmap_vm(struct kvm *kvm)
>>  int idx;
>>  
>>  idx = srcu_read_lock(&kvm->srcu);
>> +down_read(&current->mm->mmap_sem);
>>  spin_lock(&kvm->mmu_lock);
>>  
>>  slots = kvm_memslots(kvm);
>> @@ -810,6 +811,7 @@ void stage2_unmap_vm(struct kvm *kvm)
>>  stage2_unmap_memslot(kvm, memslot);
>>  
>>  spin_unlock(&kvm->mmu_lock);
>> +up_read(&current->mm->mmap_sem);
>>  srcu_read_unlock(&kvm->srcu, idx);
>>  }
>>  
>> -- 
>> 2.7.4
>>
> 
> Are we sure that holding mmu_lock is valid while holding the mmap_sem?

Sure, spinlock-inside-semaphore and spinlock-inside-mutex is always okay.

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-03-15 Thread Marc Zyngier
On 15/03/17 10:56, Christoffer Dall wrote:
> On Wed, Mar 15, 2017 at 09:39:26AM +, Marc Zyngier wrote:
>> On 15/03/17 09:21, Christoffer Dall wrote:
>>> On Tue, Mar 14, 2017 at 02:52:34PM +, Suzuki K Poulose wrote:
 In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
 unmap_stage2_range() on the entire memory range for the guest. This could
 cause problems with other callers (e.g, munmap on a memslot) trying to
 unmap a range.

 Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
 Cc: sta...@vger.kernel.org # v3.10+
 Cc: Marc Zyngier 
 Cc: Christoffer Dall 
 Signed-off-by: Suzuki K Poulose 
 ---
  arch/arm/kvm/mmu.c | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
 index 13b9c1f..b361f71 100644
 --- a/arch/arm/kvm/mmu.c
 +++ b/arch/arm/kvm/mmu.c
 @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
if (kvm->arch.pgd == NULL)
return;
  
 +  spin_lock(&kvm->mmu_lock);
   unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
 +  spin_unlock(&kvm->mmu_lock);
 +
>>>
>>> This ends up holding the spin lock for potentially quite a while, where
>>> we can do things like __flush_dcache_area(), which I think can fault.
>>
>> I believe we're always using the linear mapping (or kmap on 32bit) in
>> order not to fault.
>>
> 
> ok, then there's just the concern that we may be holding a spinlock for
> a very long time.  I seem to recall Mario once added something where he
> unlocked and gave a chance to schedule something else for each PUD or
> something like that, because he ran into the issue during migration.  Am
> I confusing this with something else?

That definitely rings a bell: stage2_wp_range() uses that kind of trick
to give the system a chance to breathe. Maybe we could use a similar
trick in our S2 unmapping code? How about this (completely untested) patch:

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 962616fd4ddd..1786c24212d4 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -292,8 +292,13 @@ static void unmap_stage2_range(struct kvm *kvm, 
phys_addr_t start, u64 size)
phys_addr_t addr = start, end = start + size;
phys_addr_t next;
 
+   BUG_ON(!spin_is_locked(&kvm->mmu_lock));
+
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
do {
+   if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+   cond_resched_lock(&kvm->mmu_lock);
+
next = stage2_pgd_addr_end(addr, end);
if (!stage2_pgd_none(*pgd))
unmap_stage2_puds(kvm, pgd, addr, next);

The additional BUG_ON() is just for my own peace of mind - we seem to
have missed a couple of these lately, and the "breathing" code makes
it imperative that this lock is being taken prior to entering the
function.

Thoughts?

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH RFC 6/7] ARM64: KVM: Support heterogeneous system

2017-03-15 Thread Andrew Jones
On Wed, Mar 15, 2017 at 12:50:44PM +0100, Christoffer Dall wrote:
> Hi Drew,
> 
> [Replying here to try to capture the discussion about this patch we had
> at connect].
> 
> On Sat, Jan 28, 2017 at 03:55:51PM +0100, Andrew Jones wrote:
> > On Mon, Jan 16, 2017 at 05:33:33PM +0800, Shannon Zhao wrote:
> > > From: Shannon Zhao 
> > > 
> > > When initializing KVM, check whether physical hardware is a
> > > heterogeneous system through the MIDR values. If so, force userspace to
> > > set the KVM_ARM_VCPU_CROSS feature bit. Otherwise, it should fail to
> > > initialize VCPUs.
> > > 
> > > Signed-off-by: Shannon Zhao 
> > > ---
> > >  arch/arm/kvm/arm.c   | 26 ++
> > >  include/uapi/linux/kvm.h |  1 +
> > >  2 files changed, 27 insertions(+)
> > > 
> > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > index bdceb19..21ec070 100644
> > > --- a/arch/arm/kvm/arm.c
> > > +++ b/arch/arm/kvm/arm.c
> > > @@ -46,6 +46,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  
> > >  #ifdef REQUIRES_VIRT
> > >  __asm__(".arch_extension virt");
> > > @@ -65,6 +66,7 @@ static unsigned int kvm_vmid_bits __read_mostly;
> > >  static DEFINE_SPINLOCK(kvm_vmid_lock);
> > >  
> > >  static bool vgic_present;
> > > +static bool heterogeneous_system;
> > >  
> > >  static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
> > >  
> > > @@ -210,6 +212,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, 
> > > long ext)
> > >   case KVM_CAP_ARM_CROSS_VCPU:
> > >   r = 1;
> > >   break;
> > > + case KVM_CAP_ARM_HETEROGENEOUS:
> > > + r = heterogeneous_system;
> > > + break;
> > 
> > What's this for? When/why would userspace check it?
> > 
> 
> Without a capability, how can userspace tell the difference between "I
> got -EINVAL because I'm on an old kernel" and "I asked for something that
> no kernel can emulate"?
> 
> Do we need to distinguish between these cases?

Yup, I'm in full agreement that we need a capability for the
cross-vcpu support. Above this heterogeneous one there's the
CROSS_VCPU one though. Do we need both? If QEMU wants to know
whether or not the host it's running on is heterogeneous, then
it can just query sysfs, rather than ask KVM.

> 
> > >   case KVM_CAP_COALESCED_MMIO:
> > >   r = KVM_COALESCED_MMIO_PAGE_OFFSET;
> > >   break;
> > > @@ -812,6 +817,12 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
> > >   int phys_target = kvm_target_cpu();
> > >   bool cross_vcpu = kvm_vcpu_has_feature_cross_cpu(init);
> > >  
> > > + if (heterogeneous_system && !cross_vcpu) {
> > > + kvm_err("%s:Host is a heterogeneous system, set 
> > > KVM_ARM_VCPU_CROSS bit\n",
> > > + __func__);
> > > + return -EINVAL;
> > > + }
> > 
> > Instead of forcing userspace to set a bit, why not just confirm the
> > target selected will work? E.g. if only generic works on a heterogeneous
> > system then just 
> > 
> >  if (heterogeneous_system && init->target != GENERIC)
> > return -EINVAL
> > 
> > should work
> > 
> 
> Yes, I think we concluded that if we advertise if we can do the
> cross type emulation or not, then if we can do the emulation we should
> just do it when possible, for maximum user experience.

Your agreement here implies to me that we only need the one capability.

> 
> I'm sure I missed some aspect of this discussion though.

Me too. As we discussed, it's probably time to try and hash out a fresh
design doc. It'd be good to get a clear design agreed upon before
returning to the patches.

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH RFC 6/7] ARM64: KVM: Support heterogeneous system

2017-03-15 Thread Christoffer Dall
Hi Drew,

[Replying here to try to capture the discussion about this patch we had
at connect].

On Sat, Jan 28, 2017 at 03:55:51PM +0100, Andrew Jones wrote:
> On Mon, Jan 16, 2017 at 05:33:33PM +0800, Shannon Zhao wrote:
> > From: Shannon Zhao 
> > 
> > When initializing KVM, check whether physical hardware is a
> > heterogeneous system through the MIDR values. If so, force userspace to
> > set the KVM_ARM_VCPU_CROSS feature bit. Otherwise, it should fail to
> > initialize VCPUs.
> > 
> > Signed-off-by: Shannon Zhao 
> > ---
> >  arch/arm/kvm/arm.c   | 26 ++
> >  include/uapi/linux/kvm.h |  1 +
> >  2 files changed, 27 insertions(+)
> > 
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index bdceb19..21ec070 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -46,6 +46,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #ifdef REQUIRES_VIRT
> >  __asm__(".arch_extension   virt");
> > @@ -65,6 +66,7 @@ static unsigned int kvm_vmid_bits __read_mostly;
> >  static DEFINE_SPINLOCK(kvm_vmid_lock);
> >  
> >  static bool vgic_present;
> > +static bool heterogeneous_system;
> >  
> >  static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
> >  
> > @@ -210,6 +212,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> > ext)
> > case KVM_CAP_ARM_CROSS_VCPU:
> > r = 1;
> > break;
> > +   case KVM_CAP_ARM_HETEROGENEOUS:
> > +   r = heterogeneous_system;
> > +   break;
> 
> What's this for? When/why would userspace check it?
> 

Without a capability, how can userspace tell the difference between "I
got -EINVAL because I'm on an old kernel" and "I asked for something that
no kernel can emulate"?

Do we need to distinguish between these cases?

> > case KVM_CAP_COALESCED_MMIO:
> > r = KVM_COALESCED_MMIO_PAGE_OFFSET;
> > break;
> > @@ -812,6 +817,12 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
> > int phys_target = kvm_target_cpu();
> > bool cross_vcpu = kvm_vcpu_has_feature_cross_cpu(init);
> >  
> > +   if (heterogeneous_system && !cross_vcpu) {
> > +   kvm_err("%s:Host is a heterogeneous system, set 
> > KVM_ARM_VCPU_CROSS bit\n",
> > +   __func__);
> > +   return -EINVAL;
> > +   }
> 
> Instead of forcing userspace to set a bit, why not just confirm the
> target selected will work? E.g. if only generic works on a heterogeneous
> system then just 
> 
>  if (heterogeneous_system && init->target != GENERIC)
> return -EINVAL
> 
> should work
> 

Yes, I think we concluded that if we advertise if we can do the
cross type emulation or not, then if we can do the emulation we should
just do it when possible, for maximum user experience.

I'm sure I missed some aspect of this discussion though.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/3] kvm: arm/arm64: Take mmap_sem in kvm_arch_prepare_memory_region

2017-03-15 Thread Christoffer Dall
On Tue, Mar 14, 2017 at 02:52:33PM +, Suzuki K Poulose wrote:
> From: Marc Zyngier 
> 
> We don't hold the mmap_sem while searching for VMAs (via find_vma), in
> kvm_arch_prepare_memory_region, which can end up in expected failures.
> 
> Fixes: commit 8eef91239e57 ("arm/arm64: KVM: map MMIO regions at creation 
> time")
> Cc: Ard Biesheuvel 
> Cc: Christoffer Dall 
> Cc: Eric Auger 
> Cc: sta...@vger.kernel.org # v3.18+
> Signed-off-by: Marc Zyngier 
> [ Handle dirty page logging failure case ]
> Signed-off-by: Suzuki K Poulose 

Reviewed-by: Christoffer Dall 

> ---
>  arch/arm/kvm/mmu.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index f2e2e0c..13b9c1f 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1803,6 +1803,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>   (KVM_PHYS_SIZE >> PAGE_SHIFT))
>   return -EFAULT;
>  
> + down_read(&current->mm->mmap_sem);
>   /*
>* A memory region could potentially cover multiple VMAs, and any holes
>* between them, so iterate over all of them to find out if we can map
> @@ -1846,8 +1847,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>   pa += vm_start - vma->vm_start;
>  
>   /* IO region dirty page logging not allowed */
> - if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES)
> - return -EINVAL;
> + if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) {
> + ret = -EINVAL;
> + goto out;
> + }
>  
>   ret = kvm_phys_addr_ioremap(kvm, gpa, pa,
>   vm_end - vm_start,
> @@ -1859,7 +1862,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>   } while (hva < reg_end);
>  
>   if (change == KVM_MR_FLAGS_ONLY)
> - return ret;
> + goto out;
>  
>   spin_lock(&kvm->mmu_lock);
>   if (ret)
> @@ -1867,6 +1870,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>   else
>   stage2_flush_memslot(kvm, memslot);
>   spin_unlock(&kvm->mmu_lock);
> +out:
> + up_read(&current->mm->mmap_sem);
>   return ret;
>  }
>  
> -- 
> 2.7.4
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/3] kvm: arm/arm64: Take mmap_sem in stage2_unmap_vm

2017-03-15 Thread Christoffer Dall
On Wed, Mar 15, 2017 at 09:34:53AM +, Marc Zyngier wrote:
> On 15/03/17 09:17, Christoffer Dall wrote:
> > On Tue, Mar 14, 2017 at 02:52:32PM +, Suzuki K Poulose wrote:
> >> From: Marc Zyngier 
> >>
> >> We don't hold the mmap_sem while searching for the VMAs when
> >> we try to unmap each memslot for a VM. Fix this properly to
> >> avoid unexpected results.
> >>
> >> Fixes: commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm")
> >> Cc: sta...@vger.kernel.org # v3.19+
> >> Cc: Christoffer Dall 
> >> Signed-off-by: Marc Zyngier 
> >> Signed-off-by: Suzuki K Poulose 
> >> ---
> >>  arch/arm/kvm/mmu.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 962616f..f2e2e0c 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -803,6 +803,7 @@ void stage2_unmap_vm(struct kvm *kvm)
> >>int idx;
> >>  
> >>idx = srcu_read_lock(&kvm->srcu);
> >> +  down_read(&current->mm->mmap_sem);
> >>spin_lock(&kvm->mmu_lock);
> >>  
> >>slots = kvm_memslots(kvm);
> >> @@ -810,6 +811,7 @@ void stage2_unmap_vm(struct kvm *kvm)
> >>stage2_unmap_memslot(kvm, memslot);
> >>  
> >>spin_unlock(&kvm->mmu_lock);
> >> +  up_read(&current->mm->mmap_sem);
> >>srcu_read_unlock(&kvm->srcu, idx);
> >>  }
> >>  
> >> -- 
> >> 2.7.4
> >>
> > 
> > Are we sure that holding mmu_lock is valid while holding the mmap_sem?
> 
> Maybe I'm just confused by the many levels of locking. Here's my rationale:
> 
> - kvm->srcu protects the memslot list
> - mmap_sem protects the kernel VMA list
> - mmu_lock protects the stage2 page tables (at least here)
> 
> I don't immediately see any issue with holding the mmap_sem mutex here
> (unless there is a path that would retrigger a down operation on the
> mmap_sem?).
> 
> Or am I missing something obvious?

I was worried that someone else could hold the mmu_lock and take the
mmap_sem, but that wouldn't be allowed of course, because the semaphore
can sleep, so I agree, you should be good.

I just needed this conversation to feel good about this patch ;)

Reviewed-by: Christoffer Dall 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-03-15 Thread Christoffer Dall
On Wed, Mar 15, 2017 at 09:39:26AM +, Marc Zyngier wrote:
> On 15/03/17 09:21, Christoffer Dall wrote:
> > On Tue, Mar 14, 2017 at 02:52:34PM +, Suzuki K Poulose wrote:
> >> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> >> unmap_stage2_range() on the entire memory range for the guest. This could
> >> cause problems with other callers (e.g, munmap on a memslot) trying to
> >> unmap a range.
> >>
> >> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> >> Cc: sta...@vger.kernel.org # v3.10+
> >> Cc: Marc Zyngier 
> >> Cc: Christoffer Dall 
> >> Signed-off-by: Suzuki K Poulose 
> >> ---
> >>  arch/arm/kvm/mmu.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 13b9c1f..b361f71 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
> >>if (kvm->arch.pgd == NULL)
> >>return;
> >>  
> >> +  spin_lock(&kvm->mmu_lock);
> >>unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
> >> +  spin_unlock(&kvm->mmu_lock);
> >> +
> > 
> > This ends up holding the spin lock for potentially quite a while, where
> > we can do things like __flush_dcache_area(), which I think can fault.
> 
> I believe we're always using the linear mapping (or kmap on 32bit) in
> order not to fault.
> 

ok, then there's just the concern that we may be holding a spinlock for
a very long time.  I seem to recall Mario once added something where he
unlocked and gave a chance to schedule something else for each PUD or
something like that, because he ran into the issue during migration.  Am
I confusing this with something else?

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-03-15 Thread Marc Zyngier
On 15/03/17 09:21, Christoffer Dall wrote:
> On Tue, Mar 14, 2017 at 02:52:34PM +, Suzuki K Poulose wrote:
>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>> unmap_stage2_range() on the entire memory range for the guest. This could
>> cause problems with other callers (e.g, munmap on a memslot) trying to
>> unmap a range.
>>
>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>> Cc: sta...@vger.kernel.org # v3.10+
>> Cc: Marc Zyngier 
>> Cc: Christoffer Dall 
>> Signed-off-by: Suzuki K Poulose 
>> ---
>>  arch/arm/kvm/mmu.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 13b9c1f..b361f71 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>  if (kvm->arch.pgd == NULL)
>>  return;
>>  
>> +spin_lock(&kvm->mmu_lock);
>>  unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
>> +spin_unlock(&kvm->mmu_lock);
>> +
> 
> This ends up holding the spin lock for potentially quite a while, where
> we can do things like __flush_dcache_area(), which I think can fault.

I believe we're always using the linear mapping (or kmap on 32bit) in
order not to fault.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/3] kvm: arm/arm64: Take mmap_sem in stage2_unmap_vm

2017-03-15 Thread Marc Zyngier
On 15/03/17 09:17, Christoffer Dall wrote:
> On Tue, Mar 14, 2017 at 02:52:32PM +, Suzuki K Poulose wrote:
>> From: Marc Zyngier 
>>
>> We don't hold the mmap_sem while searching for the VMAs when
>> we try to unmap each memslot for a VM. Fix this properly to
>> avoid unexpected results.
>>
>> Fixes: commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm")
>> Cc: sta...@vger.kernel.org # v3.19+
>> Cc: Christoffer Dall 
>> Signed-off-by: Marc Zyngier 
>> Signed-off-by: Suzuki K Poulose 
>> ---
>>  arch/arm/kvm/mmu.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616f..f2e2e0c 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -803,6 +803,7 @@ void stage2_unmap_vm(struct kvm *kvm)
>>  int idx;
>>  
>>  idx = srcu_read_lock(&kvm->srcu);
>> +down_read(&current->mm->mmap_sem);
>>  spin_lock(&kvm->mmu_lock);
>>  
>>  slots = kvm_memslots(kvm);
>> @@ -810,6 +811,7 @@ void stage2_unmap_vm(struct kvm *kvm)
>>  stage2_unmap_memslot(kvm, memslot);
>>  
>>  spin_unlock(&kvm->mmu_lock);
>> +up_read(&current->mm->mmap_sem);
>>  srcu_read_unlock(&kvm->srcu, idx);
>>  }
>>  
>> -- 
>> 2.7.4
>>
> 
> Are we sure that holding mmu_lock is valid while holding the mmap_sem?

Maybe I'm just confused by the many levels of locking. Here's my rationale:

- kvm->srcu protects the memslot list
- mmap_sem protects the kernel VMA list
- mmu_lock protects the stage2 page tables (at least here)

I don't immediately see any issue with holding the mmap_sem mutex here
(unless there is a path that would retrigger a down operation on the
mmap_sem?).

Or am I missing something obvious?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 3/3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd

2017-03-15 Thread Christoffer Dall
On Tue, Mar 14, 2017 at 02:52:34PM +, Suzuki K Poulose wrote:
> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> unmap_stage2_range() on the entire memory range for the guest. This could
> cause problems with other callers (e.g, munmap on a memslot) trying to
> unmap a range.
> 
> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> Cc: sta...@vger.kernel.org # v3.10+
> Cc: Marc Zyngier 
> Cc: Christoffer Dall 
> Signed-off-by: Suzuki K Poulose 
> ---
>  arch/arm/kvm/mmu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 13b9c1f..b361f71 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -831,7 +831,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>   if (kvm->arch.pgd == NULL)
>   return;
>  
> + spin_lock(&kvm->mmu_lock);
>   unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
> + spin_unlock(&kvm->mmu_lock);
> +

This ends up holding the spin lock for potentially quite a while, where
we can do things like __flush_dcache_area(), which I think can fault.

Is that valid?

Thanks,
-Christoffer

>   /* Free the HW pgd, one page at a time */
>   free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
>   kvm->arch.pgd = NULL;
> -- 
> 2.7.4
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/3] kvm: arm/arm64: Take mmap_sem in stage2_unmap_vm

2017-03-15 Thread Christoffer Dall
On Tue, Mar 14, 2017 at 02:52:32PM +, Suzuki K Poulose wrote:
> From: Marc Zyngier 
> 
> We don't hold the mmap_sem while searching for the VMAs when
> we try to unmap each memslot for a VM. Fix this properly to
> avoid unexpected results.
> 
> Fixes: commit 957db105c997 ("arm/arm64: KVM: Introduce stage2_unmap_vm")
> Cc: sta...@vger.kernel.org # v3.19+
> Cc: Christoffer Dall 
> Signed-off-by: Marc Zyngier 
> Signed-off-by: Suzuki K Poulose 
> ---
>  arch/arm/kvm/mmu.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616f..f2e2e0c 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -803,6 +803,7 @@ void stage2_unmap_vm(struct kvm *kvm)
>   int idx;
>  
>   idx = srcu_read_lock(&kvm->srcu);
> + down_read(&current->mm->mmap_sem);
>   spin_lock(&kvm->mmu_lock);
>  
>   slots = kvm_memslots(kvm);
> @@ -810,6 +811,7 @@ void stage2_unmap_vm(struct kvm *kvm)
>   stage2_unmap_memslot(kvm, memslot);
>  
>   spin_unlock(&kvm->mmu_lock);
> + up_read(&current->mm->mmap_sem);
>   srcu_read_unlock(&kvm->srcu, idx);
>  }
>  
> -- 
> 2.7.4
> 

Are we sure that holding mmu_lock is valid while holding the mmap_sem?

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 00/15] arm64/kvm: use common sysreg definitions

2017-03-15 Thread Christoffer Dall
On Fri, Mar 10, 2017 at 08:17:22AM +, Marc Zyngier wrote:
> On Thu, Mar 09 2017 at  5:07:12 pm GMT, Mark Rutland  
> wrote:
> > Currently we duplicate effort in maintaining system register encodings 
> > across
> > arm64's <asm/sysreg.h>, KVM's sysreg tables, and other places. This 
> > redundancy
> > is unfortunate, and as encodings are encoded in-place without any mnemonic,
> > this ends up more painful to read than necessary.
> >
> > This series ameliorates this by making <asm/sysreg.h> the canonical location
> > for (architected) system register encodings, with other users building atop 
> > of
> > this, e.g. with KVM deriving its sysreg table values from the common 
> > mnemonics.
> >
> > I've only attacked AArch64-native SYS encodings, and ignored CP{15,14}
> > registers for now, but these could be handled similarly. Largely, I've 
> > stuck to
> > only what KVM needs, though for the debug and perfmon groups it was easier 
> > to
> > take the whole group from the ARM ARM than to filter them to only what KVM
> > needed today.
> >
> > To verify that I haven't accidentally broken KVM, I've diffed sys_regs.o and
> > sys_regs_generic_v8.o on a section-by-section basis before and after the 
> > series
> > is applied. The .text, .data, and .rodata sections (and most others) are
> > identical. The __bug_table section, and some .debug* sections differ, and 
> > this
> > appears to be due to line numbers changing due to removed lines.
> >
> > One thing I wasn't sure how to address was banks of registers such as
> > PMEVCNTR_EL0. We currently enumerate all cases for our GICv3 definitions,
> > but it seemed painful to expand ~30 cases for PMEVCNTR_EL0 and friends, 
> > and
> > for these I've made the macros take an 'n' parameter. It would be nice to be
> > consistent either way, and I'm happy to expand those cases.
> >
> > I've pushed this series out to a branch [1] based on v4.11-rc1. It looks 
> > like
> > git rebase is also happy to apply the patches atop of the 
> > kvm-arm-for-4.11-rc2
> > tag.
> 
> I had a quick glance at this series, and this looks like a very good
> piece of work - thanks for doing this.

Agreed.  You can add my acked-by on all the KVM patches if you please.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm