Re: [Qemu-devel] ssh session with qemu-arm using busybox

2019-03-12 Thread Suzuki K Poulose




On 12/03/2019 14:02, Pintu Agarwal wrote:


-netdev user,id=unet,hostfwd=tcp::-:22 \
-net user \

and you'll get the guest's port 22 forwarded to the host's port , so
you can do

ssh root@localhost:

from the host.



I tried many different options, but unfortunately none worked for me.
1)
qemu-system-arm -M vexpress-a9 -m 1024M -kernel
../KERNEL/linux/arch/arm/boot/zImage -dtb
../KERNEL/linux/arch/arm/boot/dts/vexpress-v2p-ca9.dtb -initrd
rootfs.img.gz -append "console=ttyAMA0 root=/dev/ram rdinit=/sbin/init
ip=dhcp" -nographic -smp 4 -netdev user,id=unet,hostfwd=tcp::-:22
-net user

With this the eth0 interface is removed, and I see this message
(although login works):
qemu-system-arm: warning: hub 0 with no nics
qemu-system-arm: warning: netdev unet has no peer
Booting Linux on physical CPU 0x0

NET: Registered protocol family 17

Run /sbin/init as init process
ifconfig: SIOCSIFADDR: No such device
route: SIOCADDRT: Network is unreachable

But, ssh is still not working.
ssh root@localhost:
ssh: Could not resolve hostname localhost:: Name or service not known


man ssh

+

Make sure you have sshd in your custom rootfs and that it has been started.

Cheers
Suzuki



Re: [Qemu-devel] [RFC v4 02/16] linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT

2018-10-19 Thread Suzuki K Poulose

Hi Eric,

On 10/18/2018 03:30 PM, Eric Auger wrote:

This is a header update against kvmarm next branch

git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm kvmarm/next

to get the KVM_ARM_GET_MAX_VM_PHYS_SHIFT ioctl. This allows retrieving
the IPA address range KVM supports.

Signed-off-by: Eric Auger 

---

v3 -> v4:
- update against kvmarm next
---
  linux-headers/linux/kvm.h | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 83ba4eb571..9647ce4fcb 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -750,6 +750,15 @@ struct kvm_ppc_resize_hpt {
  
  #define KVM_S390_SIE_PAGE_OFFSET 1
  
+/*
+ * On arm64, machine type can be used to request the physical
+ * address size for the VM. Bits[7-0] are reserved for the guest
+ * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
+ * value 0 implies the default IPA size, 40bits.
+ */
+#define KVM_VM_TYPE_ARM_IPA_SIZE_MASK  0xffULL
+#define KVM_VM_TYPE_ARM_IPA_SIZE(x)    \
+   ((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
  /*
   * ioctls for /dev/kvm fds:
   */
@@ -953,6 +962,7 @@ struct kvm_ppc_resize_hpt {
  #define KVM_CAP_NESTED_STATE 157
  #define KVM_CAP_ARM_INJECT_SERROR_ESR 158
  #define KVM_CAP_MSR_PLATFORM_INFO 159
+#define KVM_CAP_ARM_VM_IPA_SIZE 160 /* returns maximum IPA bits for a VM */


Please be aware that there have been multiple merge conflicts with
the kvmarm-tree onto kvm tree upstream and the numbers have changed.
I assume that you will be rebasing this to mainline anyways.

Cheers
Suzuki
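
For readers following the header change, here is a minimal sketch of how userspace might turn an IPA size into the machine type argument using the two macros from the hunk above. The helper names and DEFAULT_IPA_BITS are hypothetical and not part of the patch; the mapping of 0 to 40 bits follows the comment in the hunk.

```c
#include <stdint.h>

/* Copied from the quoted hunk. */
#define KVM_VM_TYPE_ARM_IPA_SIZE_MASK  0xffULL
#define KVM_VM_TYPE_ARM_IPA_SIZE(x)    \
	((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)

/* Hypothetical name; the value comes from the comment in the hunk. */
#define DEFAULT_IPA_BITS 40

/* Hypothetical helper: build the machine type for a requested IPA size. */
uint64_t vm_type_for_ipa(unsigned int ipa_bits)
{
	/* 0 keeps the legacy behaviour, i.e. a 40bit IPA space. */
	if (ipa_bits == DEFAULT_IPA_BITS)
		return 0;
	return KVM_VM_TYPE_ARM_IPA_SIZE((uint64_t)ipa_bits);
}

/* Hypothetical helper: recover the IPA size from a machine type. */
unsigned int ipa_bits_from_vm_type(uint64_t type)
{
	uint64_t shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);

	return shift ? (unsigned int)shift : DEFAULT_IPA_BITS;
}
```

The value returned by vm_type_for_ipa() is what would be passed as the argument to KVM_CREATE_VM. Bits above bit 7 are ignored by the decode helper, so they stay free for other uses.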



Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-11 Thread Suzuki K Poulose

On 10/07/18 18:03, Dave Martin wrote:

On Tue, Jul 10, 2018 at 05:38:39PM +0100, Suzuki K Poulose wrote:

On 09/07/18 14:37, Dave Martin wrote:

On Mon, Jul 09, 2018 at 01:29:42PM +0100, Marc Zyngier wrote:

On 09/07/18 12:23, Dave Martin wrote:


[...]


Wedging arguments into a few bits in the type argument feels awkward,
and may be regretted later if we run out of bits, or something can't be
represented in the chosen encoding.


I think that's a pretty convincing argument for a "better" CREATE_VM,
one that would have a clearly defined, structured (and potentially
extensible) argument.

I've quickly hacked the following:

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6270a3b38e9..3e76214034c2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -735,6 +735,20 @@ struct kvm_ppc_resize_hpt {
__u32 pad;
  };

+struct kvm_create_vm2 {
+   __u64   version;    /* Or maybe not */
+   union {
+       struct {
+#define KVM_ARM_SVE_CAPABLE (1 << 0)
+#define KVM_ARM_SELECT_IPA  (1 << 1)
+           __u64   capabilities;
+           __u16   sve_vlen;
+           __u8    ipa_size;
+       } arm64;
+       __u64   dummy[15];
+   };
+};
+
  #define KVMIO 0xAE

  /* machine type bits, to be used as argument to KVM_CREATE_VM */

Other architectures could fill in their own bits if they need to.

Thoughts?


This kind of thing should work, but it may still get messy when we
add additional fields.



Marc, Dave,

I like Dave's approach. Some comments below.



If we want this to work cross-arch, would it make sense to go
for a more generic approach, say

struct kvm_create_vm_attr_any {
 __u32   type;
};

#define KVM_CREATE_VM_ATTR_ARCH_CAPABILITIES 1
struct kvm_create_vm_attr_arch_capabilities {
 __u32   type;
 __u16   size; /* support future expansion of capabilities[] */
 __u16   reserved;
 __u64   capabilities[1];
};


We also need to advertise which attributes are supported by the host,
so that the user can tune the available ones. That would make a bit mask
like the above trickier, unless we return the supported values back
in the argument ptr for the "probe" call. And this scheme in general
can be useful for passing back a non-boolean result specific to the
attribute, without having a per-attribute ioctl. (e.g, maximum limit
for IPA).


Maybe, but this could quickly become bloated.  (My approach already
feels a bit bloated...)

I'm not sure that arbitrarily complex negotiation will really be
needed, but userspace might want to change its mind if setting a
particular property fails.

An alternative might be to have a bunch of per-VM ioctls to configure
different things, like x86 has.  There's at least precedent for that.
For arm, we currently only have a few.  That allows for easy extension,
at the cost of adding ioctls.


As you know, one of the major problems with the per-VM ioctls is
the ordering of different operations and tracking to make sure that
the userspace follows the expected order. e.g, the first approach for
IPA series was based on this and it made things complex enough to drop
it.



There may be some ioctls we can reuse, like KVM_ENABLE_CAP for per-
vm capability flags.


Maybe we could switch to KVM_VM_CAPS and pass a list of capabilities
to be enabled at creation time? The kvm_enable_cap can pass in additional
arguments for each cap. That way we don't have to rely on a new set of
attributes and probing becomes straightforward.

Suzuki



Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-10 Thread Suzuki K Poulose

On 09/07/18 14:37, Dave Martin wrote:

On Mon, Jul 09, 2018 at 01:29:42PM +0100, Marc Zyngier wrote:

On 09/07/18 12:23, Dave Martin wrote:

On Fri, Jul 06, 2018 at 05:39:00PM +0100, Suzuki K Poulose wrote:

On 07/06/2018 04:09 PM, Marc Zyngier wrote:

On 06/07/18 14:49, Suzuki K Poulose wrote:

On 04/07/18 23:03, Suzuki K Poulose wrote:

On 07/04/2018 04:51 PM, Will Deacon wrote:

Hi Suzuki,

On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote:

Allow specifying the physical address size for a new VM via
the kvm_type argument for KVM_CREATE_VM ioctl. This allows
us to finalise the stage2 page table format as early as possible
and hence perform the right checks on the memory slots without
complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
of the type field and can encode more information in the future if
required. The IPA size is still capped at 40bits.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Peter Maydel 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Suzuki K Poulose 
---
   arch/arm/include/asm/kvm_mmu.h   |  2 ++
   arch/arm64/include/asm/kvm_arm.h | 10 +++---
   arch/arm64/include/asm/kvm_mmu.h |  2 ++
   include/uapi/linux/kvm.h | 10 ++
   virt/kvm/arm/arm.c   | 24 ++--
   5 files changed, 39 insertions(+), 9 deletions(-)


[...]


diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4df9bb6..fa4cab0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
   #define KVM_S390_SIE_PAGE_OFFSET 1
   /*
+ * On arm/arm64, machine type can be used to request the physical
+ * address size for the VM. Bits [7-0] have been reserved for the
+ * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
+ * value 0 implies the default IPA size, which is 40bits.
+ */
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK    0xff
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)    \
+    ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)


This seems like you're allocating quite a lot of bits in a non-extensible
interface to a fairly esoteric parameter. Would it be better to add another
ioctl, or condense the number of sizes you support instead?


As I explained in the other thread, we need the size as soon as the VM
is created. The major challenge is keeping the backward compatibility by
mapping 0 to 40bits. I will give it a thought.


Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, which
occupies 3 bits and has the following definitions. (ID_AA64MMFR0_EL1:PARange
also has the field definitions, except that the field is 4bits wide, but
only 3bits are used)

000 32 bits, 4GB.
001 36 bits, 64GB.
010 40 bits, 1TB.
011 42 bits, 4TB.
100 44 bits, 16TB.
101 48 bits, 256TB.
110 52 bits, 4PB

But we need to map 0 => 40bits IPA to make our ABI backward compatible. So
we could use the additional one bit to indicate that IPA size is requested
in the 3 bits.

i.e,

machine_type:

Bit [2:0]   - Requested IPA size. Values follow VTCR_EL2.PS format.

Bit [3]     - 1 => IPA Size bits (Bits[2:0]) requested.
              0 => Not requested

The only minor down side is restricting to the predefined values above,
which is not a real issue for a VM.

Thoughts ?


I'd be very wary of using that 4th bit to do something that is not in
the architecture. We have only a single value left to be used (0b111),
and then your scheme clashes with the architecture definition.


I agree. However, if we ever go beyond the 3bits in PARange, we have an
issue with the {V}TCR counterpart. But let's not take that chance.



I'd rather encode things in a way that is independent from the
architecture, and be done with it. You can map 0 to 40bits, and we have
the ability to express all values the architecture has (just in a
different order).


The other option I can think of is encoding a signed number which is the
difference of the IPA from 40. But that would need 5 bits if we were to
encode it as it is. And if we want to squeeze it in 4bit, we could store
half the difference (limiting the IPA limit to even numbers).

i.e. IPA = 40 + 2 * sign_extend(bits[3:0]);


I came across similar issues when trying to work out how to enable
SVE for KVM.  In the end I reduced this to a per-vcpu feature, but
it means that there is no global opt-in for the SVE-specific KVM
API extensions:

That's a bit gross, because SVE may require a change to the way
vcpus are initialised.  The set of supported SVE vector lengths needs
to be set somehow before the vcpu is set running, but it's tricky to
do that without a new ioctl -- which would mean that if SVE is enabled
for a vcpu then the vcpu is not considered runnable until the new
magic ioctl is called.

Opting into that semantic change globally at VM creation time might
be preferable.  On the SVE side, this is still very much subject to
review/change.


Here:

The KVM_CREATE_VM init argument seems undefined by the KVM core code and

Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-06 Thread Suzuki K Poulose

On 07/06/2018 04:09 PM, Marc Zyngier wrote:

On 06/07/18 14:49, Suzuki K Poulose wrote:

On 04/07/18 23:03, Suzuki K Poulose wrote:

On 07/04/2018 04:51 PM, Will Deacon wrote:

Hi Suzuki,

On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote:

Allow specifying the physical address size for a new VM via
the kvm_type argument for KVM_CREATE_VM ioctl. This allows
us to finalise the stage2 page table format as early as possible
and hence perform the right checks on the memory slots without
complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
of the type field and can encode more information in the future if
required. The IPA size is still capped at 40bits.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Peter Maydel 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Suzuki K Poulose 
---
   arch/arm/include/asm/kvm_mmu.h   |  2 ++
   arch/arm64/include/asm/kvm_arm.h | 10 +++---
   arch/arm64/include/asm/kvm_mmu.h |  2 ++
   include/uapi/linux/kvm.h | 10 ++
   virt/kvm/arm/arm.c   | 24 ++--
   5 files changed, 39 insertions(+), 9 deletions(-)


[...]


diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4df9bb6..fa4cab0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
   #define KVM_S390_SIE_PAGE_OFFSET 1
   /*
+ * On arm/arm64, machine type can be used to request the physical
+ * address size for the VM. Bits [7-0] have been reserved for the
+ * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
+ * value 0 implies the default IPA size, which is 40bits.
+ */
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK    0xff
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)    \
+    ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)


This seems like you're allocating quite a lot of bits in a non-extensible
interface to a fairly esoteric parameter. Would it be better to add another
ioctl, or condense the number of sizes you support instead?


As I explained in the other thread, we need the size as soon as the VM
is created. The major challenge is keeping the backward compatibility by
mapping 0 to 40bits. I will give it a thought.


Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, which
occupies 3 bits and has the following definitions. (ID_AA64MMFR0_EL1:PARange
also has the field definitions, except that the field is 4bits wide, but
only 3bits are used)

000 32 bits, 4GB.
001 36 bits, 64GB.
010 40 bits, 1TB.
011 42 bits, 4TB.
100 44 bits, 16TB.
101 48 bits, 256TB.
110 52 bits, 4PB

But we need to map 0 => 40bits IPA to make our ABI backward compatible. So
we could use the additional one bit to indicate that IPA size is requested
in the 3 bits.

i.e,

machine_type:

Bit [2:0]   - Requested IPA size. Values follow VTCR_EL2.PS format.

Bit [3]     - 1 => IPA Size bits (Bits[2:0]) requested.
              0 => Not requested

The only minor down side is restricting to the predefined values above,
which is not a real issue for a VM.

Thoughts ?


I'd be very wary of using that 4th bit to do something that is not in
the architecture. We have only a single value left to be used (0b111),
and then your scheme clashes with the architecture definition.


I agree. However, if we ever go beyond the 3bits in PARange, we have an
issue with the {V}TCR counterpart. But let's not take that chance.



I'd rather encode things in a way that is independent from the
architecture, and be done with it. You can map 0 to 40bits, and we have
the ability to express all values the architecture has (just in a
different order).


The other option I can think of is encoding a signed number which is the 
difference of the IPA from 40. But that would need 5 bits if we were to
encode it as it is. And if we want to squeeze it in 4bit, we could store 
half the difference (limiting the IPA limit to even numbers).


i.e. IPA = 40 + 2 * sign_extend(bits[3:0]);


Suzuki
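
To make the signed encoding above concrete, here is a minimal sketch of it. The function names are hypothetical; only the formula IPA = 40 + 2 * sign_extend(bits[3:0]) comes from this mail. With 4 bits this covers the even sizes from 24 to 54, and a field of 0 still means 40bits.

```c
#include <stdint.h>

/* Sign-extend the low 4 bits of the field to a signed integer (-8..7). */
int sign_extend4(uint8_t field)
{
	int v = field & 0xf;

	return (v & 0x8) ? v - 16 : v;
}

/* Decode: IPA = 40 + 2 * sign_extend(bits[3:0]). */
int ipa_from_field(uint8_t field)
{
	return 40 + 2 * sign_extend4(field);
}

/*
 * Encode an IPA size into the 4bit field.
 * Returns -1 if the size is odd or outside the representable range.
 */
int field_from_ipa(int ipa_bits)
{
	int diff = ipa_bits - 40;
	int half;

	if (diff % 2 != 0)
		return -1;
	half = diff / 2;
	if (half < -8 || half > 7)
		return -1;
	/* The unsigned conversion gives the two's complement bit pattern. */
	return (int)((unsigned int)half & 0xfu);
}
```

The cost of this scheme shows up in field_from_ipa(): odd sizes cannot be expressed at all.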



Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-06 Thread Suzuki K Poulose

On 04/07/18 23:03, Suzuki K Poulose wrote:

On 07/04/2018 04:51 PM, Will Deacon wrote:

Hi Suzuki,

On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote:

Allow specifying the physical address size for a new VM via
the kvm_type argument for KVM_CREATE_VM ioctl. This allows
us to finalise the stage2 page table format as early as possible
and hence perform the right checks on the memory slots without
complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
of the type field and can encode more information in the future if
required. The IPA size is still capped at 40bits.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Peter Maydel 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Suzuki K Poulose 
---
  arch/arm/include/asm/kvm_mmu.h   |  2 ++
  arch/arm64/include/asm/kvm_arm.h | 10 +++---
  arch/arm64/include/asm/kvm_mmu.h |  2 ++
  include/uapi/linux/kvm.h | 10 ++
  virt/kvm/arm/arm.c   | 24 ++--
  5 files changed, 39 insertions(+), 9 deletions(-)


[...]


diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4df9bb6..fa4cab0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
  #define KVM_S390_SIE_PAGE_OFFSET 1
  /*
+ * On arm/arm64, machine type can be used to request the physical
+ * address size for the VM. Bits [7-0] have been reserved for the
+ * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
+ * value 0 implies the default IPA size, which is 40bits.
+ */
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK    0xff
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)    \
+    ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)


This seems like you're allocating quite a lot of bits in a non-extensible
interface to a fairly esoteric parameter. Would it be better to add another
ioctl, or condense the number of sizes you support instead?


As I explained in the other thread, we need the size as soon as the VM
is created. The major challenge is keeping the backward compatibility by
mapping 0 to 40bits. I will give it a thought.


Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, which
occupies 3 bits and has the following definitions. (ID_AA64MMFR0_EL1:PARange
also has the field definitions, except that the field is 4bits wide, but
only 3bits are used)

000 32 bits, 4GB.
001 36 bits, 64GB.
010 40 bits, 1TB.
011 42 bits, 4TB.
100 44 bits, 16TB.
101 48 bits, 256TB.
110 52 bits, 4PB

But we need to map 0 => 40bits IPA to make our ABI backward compatible. So
we could use the additional one bit to indicate that IPA size is requested
in the 3 bits.

i.e,

machine_type:

Bit [2:0]   - Requested IPA size. Values follow VTCR_EL2.PS format.

Bit [3]     - 1 => IPA Size bits (Bits[2:0]) requested.
              0 => Not requested

The only minor down side is restricting to the predefined values above,
which is not a real issue for a VM.

Thoughts ?

Suzuki
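
A minimal sketch of how the proposed machine_type layout could be decoded, assuming the table above. The macro and function names are hypothetical, and treating the unallocated value 0b111 as an error is an assumption of this sketch, not something stated in the proposal.

```c
#include <stdint.h>

/* Hypothetical names for the two fields described above. */
#define IPA_REQ_VALID	(1u << 3)	/* Bit [3]: IPA size requested */
#define IPA_REQ_PS_MASK	0x7u		/* Bits [2:0]: PS-format value */

/* PS encoding to IPA bits, as listed in the table; 0b111 is unallocated. */
static const int ps_to_ipa_bits[8] = { 32, 36, 40, 42, 44, 48, 52, -1 };

/*
 * Returns the requested IPA size in bits, 40 when no size was requested
 * (keeps type == 0 backward compatible), or -1 for the unallocated value.
 */
int ipa_bits_from_machine_type(uint32_t type)
{
	if (!(type & IPA_REQ_VALID))
		return 40;
	return ps_to_ipa_bits[type & IPA_REQ_PS_MASK];
}
```

Note that the low three bits are ignored here when Bit [3] is clear; a stricter decoder could reject such values instead.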



Re: [Qemu-devel] [kvmtool test PATCH 22/24] kvmtool: arm64: Add support for guest physical address size

2018-07-05 Thread Suzuki K Poulose

On 05/07/18 14:46, Auger Eric wrote:

Hi Marc,

On 07/05/2018 03:20 PM, Marc Zyngier wrote:

On 05/07/18 13:47, Julien Grall wrote:

Hi Will,

On 04/07/18 16:52, Will Deacon wrote:

On Wed, Jul 04, 2018 at 04:00:11PM +0100, Julien Grall wrote:

On 04/07/18 15:09, Will Deacon wrote:

On Fri, Jun 29, 2018 at 12:15:42PM +0100, Suzuki K Poulose wrote:

Add an option to specify the physical address size used by this
VM.

Signed-off-by: Suzuki K Poulose 
---
   arm/aarch64/include/kvm/kvm-config-arch.h | 5 -
   arm/include/arm-common/kvm-config-arch.h  | 1 +
   2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h b/arm/aarch64/include/kvm/kvm-config-arch.h
index 04be43d..dabd22c 100644
--- a/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -8,7 +8,10 @@
"Create PMUv3 device"),   \
OPT_U64('\0', "kaslr-seed", &(cfg)->kaslr_seed,\
"Specify random seed for Kernel Address Space "   \
-   "Layout Randomization (KASLR)"),
+   "Layout Randomization (KASLR)"),  \
+   OPT_INTEGER('\0', "phys-shift", &(cfg)->phys_shift,\
+   "Specify maximum physical address size (not " \
+   "the amount of memory)"),


Given that this is a shift value, I think the help message could be more
informative. Something like:

"Specify maximum number of bits in a guest physical address"

I think I'd actually leave out any mention of memory, because this does
actually have an effect on the amount of addressable memory in a way that I
don't think we want to describe in half of a usage message line :)

Is there any particular reasons to expose this option to the user?

I have recently sent a series to allow the user to specify the position
of the RAM [1]. With that series in mind, I think the user would not really
need to specify the maximum physical shift. Instead we could automatically
find it.


Marc makes a good point that it doesn't help for MMIO regions, so I'm trying
to understand whether we can do something differently there and avoid
sacrificing the type parameter.


I am not sure I understand this. kvmtool knows the memory layout
(including MMIOs) of the guest, so couldn't it guess the maximum
physical shift for that?


That's exactly what Will was trying to avoid, by having KVM compute
the size of the IPA space based on the registered memslots. We've now
established that it doesn't work, so what we need to define is:

- whether we need another ioctl(), or do we carry on piggy-backing on
the CPU type,

kvm type I guess


machine type is more appropriate, going by the existing users.


- assuming the latter, whether we can reduce the number of bits used in
the ioctl parameter by subtly encoding the IPA size.

Getting benefit from your Freudian slip, how should guest CPU PARange
and maximum number of bits in a guest physical address relate?

My understanding is they are not correlated at the moment and our guest
PARange is fixed at the moment. But shouldn't they?

On Intel there is
qemu-system-x86_64 -M pc,accel=kvm -cpu SandyBridge,phys-bits=36
or
qemu-system-x86_64 -M pc,accel=kvm -cpu SandyBridge,host-phys-bits=true

where phys-bits, as far as I understand, has similar semantics to the
PARange.



AFAICT, PARange tells you the maximum (I)Physical Address that can be handled
by the CPU. But your IPA limit tells you where the guest RAM is placed.
So they need not be the same. e.g., on Juno, the A57s have a PARange of 42 if I am
not wrong (but definitely > 40), while the A53s have it at 40 and the system RAM
is at 40bits.

So, if we were to only use the A57s on Juno, we could run a KVM instance with 42
bits IPA or anything lower. So, PARange can be inferred as the maximum limit
of the CPU's capability while the IPA is where the RAM is placed for a given
system.
One could keep them in sync for a VM by emulating, but then nobody
uses the PARange, except KVM. The other problem with capping PARange in the VM
to IPA is restricting the IPA size of a nested VM. So, I don't think this is
really beneficial.

Cheers
Suzuki
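
The Juno example can be put as a small sketch: the largest IPA size a VM could use is bounded by the smallest PARange among the CPUs it may run on. The function name is hypothetical, and the per-CPU values used with it below are the unverified figures quoted in this mail.

```c
#include <stddef.h>

/*
 * Hypothetical helper: upper bound on the IPA size (in bits) for a VM
 * that may run on the given CPUs, i.e. the minimum of their PARange
 * values. Returns 0 for an empty set.
 */
unsigned int max_ipa_bits(const unsigned int *parange_bits, size_t ncpus)
{
	unsigned int min_bits;
	size_t i;

	if (ncpus == 0)
		return 0;
	min_bits = parange_bits[0];
	for (i = 1; i < ncpus; i++) {
		if (parange_bits[i] < min_bits)
			min_bits = parange_bits[i];
	}
	return min_bits;
}
```

With two A57s at 42 and four A53s at 40, the bound is 40 across all six CPUs and 42 when only the A57s are considered. The guest's own IPA size may be anything up to that bound.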




Thanks

Eric


Thanks,

M.






Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-04 Thread Suzuki K Poulose

On 07/04/2018 04:51 PM, Will Deacon wrote:

Hi Suzuki,

On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote:

Allow specifying the physical address size for a new VM via
the kvm_type argument for KVM_CREATE_VM ioctl. This allows
us to finalise the stage2 page table format as early as possible
and hence perform the right checks on the memory slots without
complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
of the type field and can encode more information in the future if
required. The IPA size is still capped at 40bits.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Peter Maydel 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Suzuki K Poulose 
---
  arch/arm/include/asm/kvm_mmu.h   |  2 ++
  arch/arm64/include/asm/kvm_arm.h | 10 +++---
  arch/arm64/include/asm/kvm_mmu.h |  2 ++
  include/uapi/linux/kvm.h | 10 ++
  virt/kvm/arm/arm.c   | 24 ++--
  5 files changed, 39 insertions(+), 9 deletions(-)


[...]


diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4df9bb6..fa4cab0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
  #define KVM_S390_SIE_PAGE_OFFSET 1
  
  /*
+ * On arm/arm64, machine type can be used to request the physical
+ * address size for the VM. Bits [7-0] have been reserved for the
+ * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
+ * value 0 implies the default IPA size, which is 40bits.
+ */
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK    0xff
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)    \
+    ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)


This seems like you're allocating quite a lot of bits in a non-extensible
interface to a fairly esoteric parameter. Would it be better to add another
ioctl, or condense the number of sizes you support instead?


As I explained in the other thread, we need the size as soon as the VM
is created. The major challenge is keeping the backward compatibility by
mapping 0 to 40bits. I will give it a thought.

Suzuki



Re: [Qemu-devel] [kvmtool test PATCH 24/24] kvmtool: arm: Add support for creating VM with PA size

2018-07-04 Thread Suzuki K Poulose

Hi Will,

On 07/04/2018 03:22 PM, Will Deacon wrote:

On Fri, Jun 29, 2018 at 12:15:44PM +0100, Suzuki K Poulose wrote:

diff --git a/arm/kvm.c b/arm/kvm.c
index 5701d41..b1969be 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -11,6 +11,8 @@
  #include 
  #include 
  
+unsigned long kvm_arm_type;
+
  struct kvm_ext kvm_req_ext[] = {
{ DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
{ DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
@@ -18,6 +20,26 @@ struct kvm_ext kvm_req_ext[] = {
{ 0, 0 },
  };
  
+#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT
+#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT  _IO(KVMIO, 0x0b)
+#endif
+
+void kvm__arch_init_hyp(struct kvm *kvm)
+{
+   int max_ipa;
+
+   max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT);
+   if (max_ipa < 0)
+       max_ipa = 40;
+   if (!kvm->cfg.arch.phys_shift)
+       kvm->cfg.arch.phys_shift = 40;
+   if (kvm->cfg.arch.phys_shift > max_ipa)
+       die("Requested PA size (%u) is not supported by the host (%ubits)\n",
+           kvm->cfg.arch.phys_shift, max_ipa);
+   if (kvm->cfg.arch.phys_shift != 40)
+       kvm_arm_type = kvm->cfg.arch.phys_shift;
+}


Seems a bit weird that the "machine type identifier" to KVM_CREATE_VM is
dedicated entirely to holding the physical address shift verbatim. Is this
really the ABI?


The bits[7:0] of the machine type has been reserved for the IPA shift.
This version is missing the updates to the ABI documentation, I have it
for the next version.



Also, couldn't KVM figure it out automatically if you add memslots at high
addresses, making this a niche tunable outside of testing?


The stage2 pgd size is really dependent on the max IPA. Also, unlike the
stage1 (where the maximum size will be 1 page), the size can go up to 16
pages (and different number of levels due to concatenation), so we need
to finalize this at least before the first memory gets mapped (RAM or
Device). That implies, we cannot wait until all the memory slots are
created.

The first version of the series added a separate ioctl for specifying
the limit, which had its own complexities. So, this ABI was suggested
to keep things simpler.


Suzuki
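
To show why the pgd size depends on the IPA size, here is a rough sketch that computes the number of stage2 levels and the number of concatenated entry-level pages. It assumes 8-byte descriptors, up to 16 concatenated tables at the entry level, and ipa_bits of at least 32; the function names are hypothetical and this is not the code from the series.

```c
/* Up to 2^4 = 16 tables may be concatenated at the entry level. */
#define S2_MAX_CONCAT_BITS	4

/* Number of stage2 page table levels for a given IPA size and page size. */
unsigned int stage2_levels(unsigned int ipa_bits, unsigned int page_shift)
{
	/* One page of 8-byte descriptors resolves (page_shift - 3) bits. */
	unsigned int bits_per_level = page_shift - 3;
	/* Bits left for the levels once concatenation absorbs up to 4. */
	unsigned int to_resolve = ipa_bits - page_shift - S2_MAX_CONCAT_BITS;

	return (to_resolve + bits_per_level - 1) / bits_per_level;
}

/* Number of pages needed for the (possibly concatenated) stage2 pgd. */
unsigned int stage2_pgd_pages(unsigned int ipa_bits, unsigned int page_shift)
{
	unsigned int bits_per_level = page_shift - 3;
	unsigned int total = ipa_bits - page_shift;
	unsigned int covered = stage2_levels(ipa_bits, page_shift) * bits_per_level;

	/* Bits not covered by full levels are resolved by concatenation. */
	return total > covered ? 1u << (total - covered) : 1u;
}
```

For a 40bit IPA this gives 3 levels and 2 pgd pages with 4K pages, and 2 levels and 16 pgd pages with 16K pages. Since both values change with the IPA size, they have to be fixed before the first mapping is created.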



Re: [Qemu-devel] [PATCH v3 10/20] kvm: arm64: Dynamic configuration of VTTBR mask

2018-07-04 Thread Suzuki K Poulose

On 07/04/2018 09:24 AM, Auger Eric wrote:

+ *
+ * We have a magic formula for the Magic_N below.
+ *
+ *  Magic_N(PAGE_SIZE, Entry_Level) = 64 - ((PAGE_SHIFT - 3) * Number of levels)


[0] ^^^




+ *
+ * where number of levels = (4 - Entry_Level).


^^^ Doesn't this help make it clear? Using the expansion makes it a bit
more unreadable below.


I just wanted to mention the tables you refer to (D4-23 and D4-25) give
Magic_N for a larger scope as they deal with any lookup level while we
only care about the entry level for BADDR. So I was a little bit
confused when reading the explanation but that's not a big deal.


Ah, ok. I will try to clarify it.

Cheers
Suzuki



Re: [Qemu-devel] [RFC 5/6] hw/arm/virt: support kvm_type property

2018-07-03 Thread Suzuki K Poulose

On 03/07/18 13:47, Andrew Jones wrote:

This infrastructure already is used in hw/ppc/spapr.c

Would it be better if we would pass something like kvm-type=48bGPA?
Otherwise I can decode another virt machine option (min_vm_phys_shift)
in kvm_type callback.


Yes, this is what I'm thinking. I don't believe we have to expose the
details of the KVM API to the user through the QEMU command line. The
details are actually more complicated anyway, as the phys-shift is
only the lower 8-bits of KVM type[*], not the whole value.

Thanks,
drew

[*] Looks like Suzuki's series is missing the Documentation/virtual/kvm/api.txt
 update needed to specify that.


Thanks for spotting, I will update the documentation.

Suzuki



Re: [Qemu-devel] [PATCH v3 10/20] kvm: arm64: Dynamic configuration of VTTBR mask

2018-07-03 Thread Suzuki K Poulose

Hi Eric,

On 02/07/18 15:41, Auger Eric wrote:

Hi Suzuki,

On 06/29/2018 01:15 PM, Suzuki K Poulose wrote:

On arm64 VTTBR_EL2:BADDR holds the base address for the stage2
translation table. The Arm ARM mandates that the bits BADDR[x-1:0]
should be 0, where 'x' is defined for a given IPA Size and the
number of levels for a translation granule size. It is defined
using some magical constants. This patch is a reverse engineered
implementation to calculate the 'x' at runtime for a given ipa and
number of page table levels. See patch for more details.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2:
  - Part 1 of split from VTCR & VTTBR dynamic configuration
---
  arch/arm64/include/asm/kvm_arm.h | 60 +---
  arch/arm64/include/asm/kvm_mmu.h | 25 -
  2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 3dffd38..c557f45 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -140,8 +140,6 @@
   * Note that when using 4K pages, we concatenate two first level page tables
   * together. With 16K pages, we concatenate 16 first level page tables.
   *
- * The magic numbers used for VTTBR_X in this patch can be found in Tables
- * D4-23 and D4-25 in ARM DDI 0487A.b.

Isn't it a pretty old reference? Could you refer to C.a?


Sure, I will update the references everywhere.


+ *
+ * The algorithm defines the expectations on the BaseAddress (for the page
+ * table) bits resolved at each level based on the page size, entry level
+ * and T0SZ. The variable "x" in the algorithm also affects the VTTBR:BADDR
+ * for stage2 page table.
+ *
+ * The value of "x" is calculated as :
+ * x = Magic_N - T0SZ
+ *
+ * where Magic_N is an integer depending on the page size and the entry
+ * level of the page table as below:
+ *
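+ * --------------------------------------------
+ * | Entry level           |  4K  | 16K | 64K |
+ * --------------------------------------------
+ * | Level: 0 (4 levels)   | 28   |  -  |  -  |
+ * --------------------------------------------
+ * | Level: 1 (3 levels)   | 37   | 31  | 25  |
+ * --------------------------------------------
+ * | Level: 2 (2 levels)   | 46   | 42  | 38  |
+ * --------------------------------------------
+ * | Level: 3 (1 level)    | -    | 53  | 51  |
+ * --------------------------------------------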

I understand entry level = Lookup level in the table.


Entry level => The level at which we start the page table walk for
a given address (This is in line with the ARM ARM). So,

Entry_level = (4 - Number_of_Page_table_levels)


But you may want to compute x for BaseAddress matching lookup level 2
with number of levels = 4.


No, the BaseAddress is only calculated for the "Entry_level". So the
above case doesn't exist at all.


So shouldn't you s/Number of levels/4 - entry_level?


Ok, I now understood what you are referring to [0]

for BADDR we want the BaseAddr of the initial lookup level so
effectively the entry level we are interested in is 4 - number of levels
and we don't care about the d) condition. At least this is my understanding ;-)
If correct you may slightly reword the explanation?




+ *
+ * We have a magic formula for the Magic_N below.
+ *
+ *  Magic_N(PAGE_SIZE, Entry_Level) = 64 - ((PAGE_SHIFT - 3) * Number of levels)


[0] ^^^




+ *
+ * where number of levels = (4 - Entry_Level).


^^^ Doesn't this help make it clear ? Using the expansion makes it a bit more
unreadable below.
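The formula quoted above can be checked against the table with a small standalone sketch. The helper names (s2_levels_from_entry, vttbr_magic_n, vttbr_x) are made up for illustration and are not part of the patch; only the arithmetic comes from the quoted comment. The sketch does not reject the combinations that the table marks with "-".

```c
#include <assert.h>

/* Hypothetical helper: number of levels walked when the table walk
 * starts at entry_level (0..3), i.e. levels = 4 - Entry_Level. */
static inline int s2_levels_from_entry(int entry_level)
{
	assert(entry_level >= 0 && entry_level <= 3);
	return 4 - entry_level;
}

/* Magic_N = 64 - ((PAGE_SHIFT - 3) * number of levels),
 * as stated in the quoted comment. */
static inline int vttbr_magic_n(int page_shift, int entry_level)
{
	return 64 - (page_shift - 3) * s2_levels_from_entry(entry_level);
}

/* x = Magic_N - T0SZ */
static inline int vttbr_x(int page_shift, int entry_level, int t0sz)
{
	return vttbr_magic_n(page_shift, entry_level) - t0sz;
}
```

For example, with 4K pages (PAGE_SHIFT = 12), entry level 1 and T0SZ = 24 (a 40bit IPA), Magic_N is 37 and x works out to 13.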

  
+/*

+ * Get the magic number 'x' for VTTBR:BADDR of this KVM instance.
+ * With v8.2 LVA extensions, 'x' should be a minimum of 6 with
+ * 52bit IPS.

Link to the spec?


Sure, will add it.

Thanks for the patience to review :-)

Cheers
Suzuki



Re: [Qemu-devel] [PATCH v3 13/20] kvm: arm64: Configure VTCR per VM

2018-07-03 Thread Suzuki K Poulose

On 02/07/18 13:16, Marc Zyngier wrote:

On 29/06/18 12:15, Suzuki K Poulose wrote:

We set VTCR_EL2 very early during the stage2 init and don't
touch it ever. This is fine as we had a fixed IPA size. This
patch changes the behavior to set the VTCR for a given VM,
depending on its stage2 table. The common configuration for
VTCR is still performed during the early init as we have to
retain the hardware access flag update bits (VTCR_EL2_HA)
per CPU (as they are only set for the CPUs which are capabile).


capable


The bits defining the number of levels in the page table (SL0)
and the size of the Input address to the translation (T0SZ)
are programmed for each VM upon entry to the guest.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Change since V2:
  - Load VTCR for TLB operations
---
  arch/arm64/include/asm/kvm_arm.h  | 19 +--
  arch/arm64/include/asm/kvm_asm.h  |  2 +-
  arch/arm64/include/asm/kvm_host.h |  9 ++---
  arch/arm64/include/asm/kvm_hyp.h  | 11 +++
  arch/arm64/kvm/hyp/s2-setup.c | 17 +
  5 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 11a7db0..b02c316 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -120,9 +120,7 @@
  #define VTCR_EL2_IRGN0_WBWA   TCR_IRGN0_WBWA
  #define VTCR_EL2_SL0_SHIFT6
  #define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT)
-#define VTCR_EL2_SL0_LVL1  (1 << VTCR_EL2_SL0_SHIFT)
  #define VTCR_EL2_T0SZ_MASK0x3f
-#define VTCR_EL2_T0SZ_40B  24
  #define VTCR_EL2_VS_SHIFT 19
  #define VTCR_EL2_VS_8BIT  (0 << VTCR_EL2_VS_SHIFT)
  #define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT)
@@ -137,43 +135,44 @@
   * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time
   * (see hyp-init.S).
   *
+ * VTCR_EL2.SL0 and T0SZ are configured per VM at runtime before switching to
+ * the VM.
+ *
   * Note that when using 4K pages, we concatenate two first level page tables
   * together. With 16K pages, we concatenate 16 first level page tables.
   *
   */
  
-#define VTCR_EL2_T0SZ_IPA	VTCR_EL2_T0SZ_40B

  #define VTCR_EL2_COMMON_BITS  (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \
 VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1)
+#define VTCR_EL2_PRIVATE_MASK  (VTCR_EL2_SL0_MASK | VTCR_EL2_T0SZ_MASK)


What does "private" mean here? It really is the IPA configuration, so
I'd rather have a naming that reflects that.


  #ifdef CONFIG_ARM64_64K_PAGES
  /*
   * Stage2 translation configuration:
   * 64kB pages (TG0 = 1)
- * 2 level page tables (SL = 1)
   */
-#define VTCR_EL2_TGRAN_FLAGS   (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1)
+#define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K
  #define VTCR_EL2_TGRAN_SL0_BASE   3UL
  
  #elif defined(CONFIG_ARM64_16K_PAGES)

  /*
   * Stage2 translation configuration:
   * 16kB pages (TG0 = 2)
- * 2 level page tables (SL = 1)
   */
-#define VTCR_EL2_TGRAN_FLAGS   (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1)
+#define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K
  #define VTCR_EL2_TGRAN_SL0_BASE   3UL
  #else /* 4K */
  /*
   * Stage2 translation configuration:
   * 4kB pages (TG0 = 0)
- * 3 level page tables (SL = 1)
   */
-#define VTCR_EL2_TGRAN_FLAGS   (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1)
+#define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K
  #define VTCR_EL2_TGRAN_SL0_BASE   2UL
  #endif
  
-#define VTCR_EL2_FLAGS			(VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS)

+#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN)
+
  /*
   * VTCR_EL2:SL0 indicates the entry level for Stage2 translation.
   * Interestingly, it depends on the page size.
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 102b5a5..91372eb 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -72,7 +72,7 @@ extern void __vgic_v3_init_lrs(void);
  
  extern u32 __kvm_get_mdcr_el2(void);
  
-extern u32 __init_stage2_translation(void);

+extern void __init_stage2_translation(void);
  
  /* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */

  #define __hyp_this_cpu_ptr(sym)   \
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index fe8777b..328f472 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -442,10 +442,13 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
  
  static inline void __cpu_init_stage2(void)

  {
-   u32 parange = kvm_call_hyp(__init_stage2_translation);
+   u32 ps;
  
-	WARN_ONCE(parange < 40,

- "PARange is %d bits, unsupported configuration!", parange);
+   kvm_call_hyp(__init_stage2_translation);
+   /* Sanity c

Re: [Qemu-devel] [PATCH v3 01/20] virtio: mmio-v1: Validate queue PFN

2018-07-03 Thread Suzuki K Poulose

Hi Michael,

On 06/29/2018 06:42 PM, Michael S. Tsirkin wrote:

On Fri, Jun 29, 2018 at 12:15:21PM +0100, Suzuki K Poulose wrote:

virtio-mmio with virtio-v1 uses a 32bit PFN for the queue.
If the queue pfn is too large to fit in 32bits, which
we could hit on arm64 systems with 52bit physical addresses
(even with 64K page size), we simply miss out a proper link
to the other side of the queue.

Add a check to validate the PFN, rather than silently breaking
the devices.

Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Peter Maydel 
Cc: Jean-Philippe Brucker 
Signed-off-by: Suzuki K Poulose 
---
Changes since v2:
  - Change errno to -E2BIG
---
  drivers/virtio/virtio_mmio.c | 18 --
  1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 67763d3..82cedc8 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -397,9 +397,21 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
/* Activate the queue */
writel(virtqueue_get_vring_size(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NUM);
if (vm_dev->version == 1) {
+   u64 q_pfn = virtqueue_get_desc_addr(vq) >> PAGE_SHIFT;
+
+   /*
+* virtio-mmio v1 uses a 32bit QUEUE PFN. If we have something
+* that doesn't fit in 32bit, fail the setup rather than
+* pretending to be successful.
+*/
+   if (q_pfn >> 32) {
+   dev_err(&vdev->dev, "virtio-mmio: queue address too large\n");


How about:
"hypervisor bug: legacy virtio-mmio must not be used with more than 0x%llx Gigabytes of memory",
0x1ULL << (32 - 30) << PAGE_SHIFT


nit: Do we need to change "hypervisor" => "platform"? Virtio is used by
other tools (e.g., emulators) and not just virtual machines.


Suzuki
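To make the limit being discussed concrete, here is a minimal standalone sketch of the check and of the expression proposed for the message. The function names are hypothetical; the shifts mirror the quoted patch (q_pfn >> 32) and the suggested 0x1ULL << (32 - 30) << PAGE_SHIFT.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the queue's page frame number fits the 32bit legacy
 * QUEUE_PFN register for the given page size. */
static inline bool queue_pfn_fits_32bit(uint64_t desc_addr,
					unsigned int page_shift)
{
	uint64_t pfn = desc_addr >> page_shift;

	return (pfn >> 32) == 0;
}

/* Addressable limit in gigabytes:
 * 2^32 pages * 2^page_shift bytes per page / 2^30 bytes per GB. */
static inline uint64_t legacy_mmio_limit_gb(unsigned int page_shift)
{
	return 0x1ULL << (32 - 30) << page_shift;
}
```

With 4K pages (page_shift = 12) the limit evaluates to 16384 GB, and with 64K pages (page_shift = 16) to 262144 GB.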



Re: [Qemu-devel] [PATCH v3 19/20] kvm: arm64: Allow IPA size supported by the system

2018-07-02 Thread Suzuki K Poulose

On 02/07/18 14:50, Marc Zyngier wrote:

On 29/06/18 12:15, Suzuki K Poulose wrote:

So far we have restricted the IPA size of the VM to the default
value (40bits). Now that we can manage the IPA size per VM and
support dynamic stage2 page tables, allow VMs to have larger IPA.
This is done by setting the IPA limit to the one supported by
the hardware and kernel. This patch also moves the check for
the default IPA size support to kvm_get_ipa_limit().

Since the stage2 page table code is dependent on the stage1
page table, we always ensure that :

   Number of Levels at Stage1 >= Number of Levels at Stage2

So we limit the IPA to make sure that the above condition
is satisfied. This will affect the following combinations
of VA_BITS and IPA for different page sizes.

   39bit VA, 4K  - IPA > 43 (Upto 48)
   36bit VA, 16K - IPA > 40 (Upto 48)
   42bit VA, 64K - IPA > 46 (Upto 52)


I'm not sure I get it. Are these the IPA sizes that we forbid based on
the host VA size and page size configuration?


Yes, that's right.


If so, can you rewrite
this as:

host configuration | unsupported IPA range
39bit VA, 4k   | [44, 48]
36bit VA, 16K  | [41, 48]
42bit VA, 64k  | [47, 52]

and say that all the other combinations are supported?


Sure, that looks much better. Thanks

Suzuki
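The unsupported ranges in the table can be reproduced with a short sketch. It assumes that the level count follows the arithmetic of the kernel's ARM64_HW_PGTABLE_LEVELS() macro, and that stage2 may concatenate up to 16 tables at the entry level (4 extra bits). The function names are illustrative only and do not appear in the series.

```c
#include <stdbool.h>

/* Levels needed to resolve `bits` of address when each level resolves
 * (page_shift - 3) bits. Same arithmetic as ARM64_HW_PGTABLE_LEVELS(). */
static inline int pt_levels(int bits, int page_shift)
{
	return (bits - 4) / (page_shift - 3);
}

/* Assumption: up to 16 concatenated entry-level tables resolve
 * 4 additional bits, so stage2 needs levels for (ipa_bits - 4). */
static inline int stage2_levels(int ipa_bits, int page_shift)
{
	return pt_levels(ipa_bits - 4, page_shift);
}

/* The condition from the commit message:
 * Number of Levels at Stage1 >= Number of Levels at Stage2 */
static inline bool ipa_supported(int va_bits, int ipa_bits, int page_shift)
{
	return pt_levels(va_bits, page_shift) >=
	       stage2_levels(ipa_bits, page_shift);
}
```

Under these assumptions, a 39bit VA host with 4K pages supports a 43bit IPA but not 44, which matches the first row of the table.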



Re: [Qemu-devel] [PATCH v3 16/20] kvm: arm64: Switch to per VM IPA limit

2018-07-02 Thread Suzuki K Poulose



Hi Marc,

On 02/07/18 14:32, Marc Zyngier wrote:

On 29/06/18 12:15, Suzuki K Poulose wrote:

Now that we can manage the stage2 page table per VM, switch the
configuration details to per VM instance. We keep track of the
IPA bits, number of page table levels and the VTCR bits (which
depends on the IPA and the number of levels). While at it, remove
unused pgd_lock field from kvm_arch for arm64.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 




diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 328f472..9a15860 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -61,13 +61,23 @@ struct kvm_arch {
u64vmid_gen;
u32vmid;
  
-	/* 1-level 2nd stage table and lock */

-   spinlock_t pgd_lock;
+   /* stage-2 page table */
pgd_t *pgd;
  
  	/* VTTBR value associated with above pgd and vmid */

u64vttbr;
  
+	/* Private bits of VTCR_EL2 for this VM */

+   u64vtcr_private;


As I said in another email, this should become a full VTCR_EL2 copy.



OK


+   /* Size of the PA size for this guest */
+   u8 phys_shift;
+   /*
+* Number of levels in page table. We could always calculate
+* it from phys_shift above. We cache it for faster switches
+* in stage2 page table helpers.
+*/
+   u8 s2_levels;


And these two fields feel like they should be derived from the VTCR
itself, instead of being there on their own. Any chance you could look
into this?


Yes, the VTCR is computed from the above two values and we could compute
them back from the VTCR. I will give it a try.


diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index ffc37cc..91d7936 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -65,7 +65,6 @@
  #define __s2_pgd_ptrs(pa, lvls)   (1 << ((pa) - 
pt_levels_pgdir_shift((lvls
  #define __s2_pgd_size(pa, lvls)   (__s2_pgd_ptrs((pa), (lvls)) * 
sizeof(pgd_t))
  
-#define kvm_stage2_levels(kvm)		stage2_pt_levels(kvm_phys_shift(kvm))

  #define stage2_pgdir_shift(kvm)   \
pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
  #define stage2_pgdir_size(kvm) (_AC(1, UL) << stage2_pgdir_shift((kvm)))
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index a339e00..d7822e1 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -867,6 +867,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
return -EINVAL;
}
  
+	/* Make sure we have the stage2 configured for this VM */

+   if (WARN_ON(!kvm_phys_shift(kvm)))


Can this be triggered from userspace?


No, as we initialise the phys shift before we get here. If type is left
blank (i.e., 0), we default to 40bits. So there should be something there.
The check is to make sure we have indeed passed the configuration step.


+   return -EINVAL;
+
/* Allocate the HW PGD, making sure that each page gets its own refcount */
pgd = stage2_alloc_pgd(kvm);
if (!pgd)





Cheers
Suzuki



Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-07-02 Thread Suzuki K Poulose

On 02/07/18 14:13, Marc Zyngier wrote:

On 29/06/18 12:15, Suzuki K Poulose wrote:

Allow specifying the physical address size for a new VM via
the kvm_type argument for KVM_CREATE_VM ioctl. This allows
us to finalise the stage2 page table format as early as possible
and hence perform the right checks on the memory slots without
complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
of the type field and can encode more information in the future if
required. The IPA size is still capped at 40bits.


Can't we relax this? There is no technical reason (AFAICS) not to allow
going down to 36bit IPA if the user has requested it.


Sure, we can.



If we run on a 36bit IPA system, the default would fail. But if the user
specified "please give me a 36bit IPA VM", we could satisfy that
requirement and allow them to run their stupidly small guest!


Absolutely. I will fix this in the next version.

Cheers
Suzuki
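As a rough illustration of the encoding described in the commit message (Log2(PA_Size) in bits[7:0] of the type, with 0 meaning the 40bit default, as stated elsewhere in this thread), the following sketch packs and unpacks the field. The macro and function names are hypothetical and are not the ones used by the series.

```c
#include <stdint.h>

#define PHYS_SHIFT_MASK		0xffUL	/* hypothetical: bits[7:0] of the type */
#define DEFAULT_PHYS_SHIFT	40	/* default IPA size mentioned in the thread */

/* Encode the requested Log2(PA_Size) into the low byte of the type. */
static inline unsigned long vm_type_encode(uint8_t phys_shift)
{
	return (unsigned long)phys_shift & PHYS_SHIFT_MASK;
}

/* Decode the IPA size; a blank (0) field falls back to the default. */
static inline unsigned int vm_type_phys_shift(unsigned long type)
{
	unsigned int shift = type & PHYS_SHIFT_MASK;

	return shift ? shift : DEFAULT_PHYS_SHIFT;
}
```

Masking on decode leaves the upper bits of the type free for the future use that the commit message mentions.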



Re: [Qemu-devel] [PATCH v3 09/20] kvm: arm64: Make stage2 page table layout dynamic

2018-07-02 Thread Suzuki K Poulose

Hi Eric,


On 02/07/18 13:14, Auger Eric wrote:

Hi Suzuki,

On 06/29/2018 01:15 PM, Suzuki K Poulose wrote:

So far we had a static stage2 page table handling code, based on a
fixed IPA of 40bits. As we prepare for a configurable IPA size per
VM, make our stage2 page table code dynamic, to do the right thing
for a given VM. We ensure the existing condition is always true even
when we lift the limit on the IPA. i.e,

page table levels in stage1 >= page table levels in stage2

Support for the IPA size configuration needs other changes in the way
we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still
fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h
to the top, before including the asm/stage2_pgtable.h to avoid a forward
declaration.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2
  - Restrict the stage2 page table to allow reusing the host page table
helpers for now, until we get stage1 independent page table helpers.

I would move this up in the commit msg to motivate the fact we enforce
the above condition.


This is mentioned in the commit message for the patch which lifts the limitation
on the IPA. This patch only deals with the dynamic page table level handling,
with the restriction on the levels. Nevertheless, I could add it to the
description.


---
  arch/arm64/include/asm/kvm_mmu.h  |  14 +-
  arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 --
  arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 -
  arch/arm64/include/asm/stage2_pgtable.h   | 207 +++---
  4 files changed, 159 insertions(+), 143 deletions(-)
  delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
  delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h


with my very limited knowledge of S2 page table walkers I fail to
understand why we now can get rid of stage2_pgtable-nopmd.h and
stage2_pgtable-nopud.h and associated FOLDED config. Please could you
explain it in the commit message?


As mentioned above, we have static page table helpers, which are decided
at compile time (just like the stage1). So these files hold the definitions
for the cases where PUD/PMD is folded and included for a given stage1 VA.
But since we are now doing this check per VM, we make the decision
by checking the kvm_stage2_levels(), instead of hard coding it.

Does that help? A short version of that is already there. Maybe I could
elaborate on that a bit.


-
-#define stage2_pgd_index(kvm, addr) \
-   (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1))
+static inline unsigned long stage2_pgd_index(struct kvm *kvm, phys_addr_t addr)
+{
+   return (addr >> stage2_pgdir_shift(kvm)) & (stage2_pgd_ptrs(kvm) - 1);
+}
  
  static inline phys_addr_t

  stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
  {
-   phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
+   phys_addr_t boundary;
  
+	boundary = (addr + stage2_pgdir_size(kvm)) & stage2_pgdir_mask(kvm);

return (boundary - 1 < end - 1) ? boundary : end;
  }
  



Globally this patch is pretty hard to review. I don't know if it is
possible to split into 2. 1) Addition of some helper macros. 2) removal
of nopud and nopmd and implementation of the corresponding macros?


I acknowledge that. The patch redefines the "existing" macros to make the
decision at runtime based on the VM's setting. I will see if there is a
better way to do it.

Cheers
Suzuki



Re: [Qemu-devel] [PATCH v3 07/20] kvm: arm/arm64: Prepare for VM specific stage2 translations

2018-07-02 Thread Suzuki K Poulose



Hi Eric,

On 02/07/18 11:51, Auger Eric wrote:

Hi Suzuki,

On 06/29/2018 01:15 PM, Suzuki K Poulose wrote:

Right now the stage2 page table for a VM is hard coded, assuming
an IPA of 40bits. As we are about to add support for per VM IPA,
prepare the stage2 page table helpers to accept the kvm instance
to make the right decision for the VM. No functional changes.
Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE. Also, moves
some of the definitions dependent on kvm instance to asm/kvm_mmu.h
for arm32. In that process drop the _AC() specifier constants

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2:
  - Update commit description about the movement to asm/kvm_mmu.h
for arm32
  - Drop _AC() specifiers




diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8553d68..f36eb20 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -36,15 +36,19 @@
})
  
  /*

- * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels.
+ * kvm_mmu_cache_min_pages() is the number of stage2 page
+ * table translation levels, excluding the top level, for
+ * the given VM. Since we have a 3 level page-table, this
+ * is fixed.
   */
-#define KVM_MMU_CACHE_MIN_PAGES2
+#define kvm_mmu_cache_min_pages(kvm)   2

nit: In addition to Marc's comment, I can see it defined in
stage2_pgtable.h on arm64 side. Can't we align?


Sure, will do that.


diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index fb9a712..5da8f52 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -141,8 +141,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
   * We currently only support a 40bit IPA.
   */
  #define KVM_PHYS_SHIFT(40)
-#define KVM_PHYS_SIZE  (1UL << KVM_PHYS_SHIFT)
-#define KVM_PHYS_MASK  (KVM_PHYS_SIZE - 1UL)
+
+#define kvm_phys_shift(kvm)KVM_PHYS_SHIFT
+#define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm))

Can't you get rid of _AC() also in arm64 case?




+#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))


Yes, I missed that. I will do it. Thanks for spotting it.

Cheers
Suzuki




Re: [Qemu-devel] [PATCH v3 09/20] kvm: arm64: Make stage2 page table layout dynamic

2018-07-02 Thread Suzuki K Poulose

On 29/06/18 12:15, Suzuki K Poulose wrote:

So far we had a static stage2 page table handling code, based on a
fixed IPA of 40bits. As we prepare for a configurable IPA size per
VM, make our stage2 page table code dynamic, to do the right thing
for a given VM. We ensure the existing condition is always true even
when we lift the limit on the IPA. i.e,

page table levels in stage1 >= page table levels in stage2

Support for the IPA size configuration needs other changes in the way
we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still
fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h
to the top, before including the asm/stage2_pgtable.h to avoid a forward
declaration.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2
  - Restrict the stage2 page table to allow reusing the host page table
helpers for now, until we get stage1 independent page table helpers.


...


-#define stage2_pgd_none(kvm, pgd)  pgd_none(pgd)
-#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd)
-#define stage2_pgd_present(kvm, pgd)   pgd_present(pgd)
-#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud)
-#define stage2_pud_offset(kvm, pgd, address)   pud_offset(pgd, address)
-#define stage2_pud_free(kvm, pud)  pud_free(NULL, pud)
+#define __s2_pud_index(addr) \
+   (((addr) >> __S2_PUD_SHIFT) & (PTRS_PER_PTE - 1))
+#define __s2_pmd_index(addr) \
+   (((addr) >> __S2_PMD_SHIFT) & (PTRS_PER_PTE - 1))
  
-#define stage2_pud_table_empty(kvm, pudp)	kvm_page_empty(pudp)

+#define __kvm_has_stage2_levels(kvm, min_levels)   \
+  ((CONFIG_PGTABLE_LEVELS >= min_levels) && (kvm_stage2_levels(kvm) >= min_levels))


On another look, I have renamed the helpers as follows :

kvm_stage2_has_pud(kvm) => kvm_stage2_has_pmd(kvm)
kvm_stage2_has_pgd(kvm) => kvm_stage2_has_pud(kvm)

below and everywhere.


+
+#define kvm_stage2_has_pgd(kvm)__kvm_has_stage2_levels(kvm, 4)
+#define kvm_stage2_has_pud(kvm) __kvm_has_stage2_levels(kvm, 3)



Suzuki



Re: [Qemu-devel] [PATCH v3 07/20] kvm: arm/arm64: Prepare for VM specific stage2 translations

2018-07-02 Thread Suzuki K Poulose

On 02/07/18 11:12, Marc Zyngier wrote:

On 29/06/18 12:15, Suzuki K Poulose wrote:

Right now the stage2 page table for a VM is hard coded, assuming
an IPA of 40bits. As we are about to add support for per VM IPA,
prepare the stage2 page table helpers to accept the kvm instance
to make the right decision for the VM. No functional changes.
Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE. Also, moves
some of the definitions dependent on kvm instance to asm/kvm_mmu.h
for arm32. In that process drop the _AC() specifier constants

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2:
  - Update commit description about the movement to asm/kvm_mmu.h
for arm32
  - Drop _AC() specifiers
---
  arch/arm/include/asm/kvm_arm.h|   3 +-
  arch/arm/include/asm/kvm_mmu.h|  15 +++-
  arch/arm/include/asm/stage2_pgtable.h |  42 -
  arch/arm64/include/asm/kvm_mmu.h  |   7 +-
  arch/arm64/include/asm/stage2_pgtable-nopmd.h |  18 ++--
  arch/arm64/include/asm/stage2_pgtable-nopud.h |  16 ++--
  arch/arm64/include/asm/stage2_pgtable.h   |  49 ++-
  virt/kvm/arm/arm.c|   2 +-
  virt/kvm/arm/mmu.c| 119 +-
  virt/kvm/arm/vgic/vgic-kvm-device.c   |   2 +-
  10 files changed, 148 insertions(+), 125 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 3ab8b37..c3f1f9b 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -133,8 +133,7 @@
   * space.
   */
  #define KVM_PHYS_SHIFT(40)
-#define KVM_PHYS_SIZE  (_AC(1, ULL) << KVM_PHYS_SHIFT)
-#define KVM_PHYS_MASK  (KVM_PHYS_SIZE - _AC(1, ULL))
+
  #define PTRS_PER_S2_PGD   (_AC(1, ULL) << (KVM_PHYS_SHIFT - 30))
  
  /* Virtualization Translation Control Register (VTCR) bits */

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8553d68..f36eb20 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -36,15 +36,19 @@
})
  
  /*

- * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels.
+ * kvm_mmu_cache_min_pages() is the number of stage2 page
+ * table translation levels, excluding the top level, for
+ * the given VM. Since we have a 3 level page-table, this
+ * is fixed.


I find this comment quite confusing: number of levels, but excluding the
top one? The original one was just as bad, to be honest.

Can't we just say: "kvm_mmu_cache_min_page() is the number of pages
required to install a stage-2 translation"?


Yes, that is much better.  Will change it.


diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index 8b68099..057a405 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -65,10 +65,10 @@
  #define PTRS_PER_S2_PGD   (1 << (KVM_PHYS_SHIFT - S2_PGDIR_SHIFT))
  
  /*

- * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation
+ * kvm_mmu_cache_min_pages is the number of stage2 page table translation
   * levels in addition to the PGD.
   */
-#define KVM_MMU_CACHE_MIN_PAGES(STAGE2_PGTABLE_LEVELS - 1)
+#define kvm_mmu_cache_min_pages(kvm)   (STAGE2_PGTABLE_LEVELS - 1)


Same comment as for the 32bit case.

  






Otherwise:

Acked-by: Marc Zyngier 


Thanks
Suzuki



[Qemu-devel] [kvmtool test PATCH 23/24] kvmtool: arm64: Switch memory layout

2018-06-29 Thread Suzuki K Poulose
If the guest wants to use a larger physical address space place
the RAM at upper half of the address space. Otherwise, it uses the
default layout.

Signed-off-by: Suzuki K Poulose 
---
 arm/aarch32/include/kvm/kvm-arch.h |  6 --
 arm/aarch64/include/kvm/kvm-arch.h | 15 ---
 arm/include/arm-common/kvm-arch.h  | 11 ++-
 arm/kvm.c  |  2 +-
 4 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/arm/aarch32/include/kvm/kvm-arch.h b/arm/aarch32/include/kvm/kvm-arch.h
index cd31e72..bcd382b 100644
--- a/arm/aarch32/include/kvm/kvm-arch.h
+++ b/arm/aarch32/include/kvm/kvm-arch.h
@@ -3,8 +3,10 @@
 
 #define ARM_KERN_OFFSET(...)   0x8000
 
-#define ARM_MAX_MEMORY(...)ARM_LOMAP_MAX_MEMORY
-
 #include "arm-common/kvm-arch.h"
 
+#define ARM_MAX_MEMORY(...)ARM32_MAX_MEMORY
+#define ARM_MEMORY_AREA(...)   ARM32_MEMORY_AREA
+
+
 #endif /* KVM__KVM_ARCH_H */
diff --git a/arm/aarch64/include/kvm/kvm-arch.h b/arm/aarch64/include/kvm/kvm-arch.h
index 9de623a..bad35b9 100644
--- a/arm/aarch64/include/kvm/kvm-arch.h
+++ b/arm/aarch64/include/kvm/kvm-arch.h
@@ -1,14 +1,23 @@
 #ifndef KVM__KVM_ARCH_H
 #define KVM__KVM_ARCH_H
 
+#include "arm-common/kvm-arch.h"
+
+#define ARM64_MEMORY_AREA(phys_shift)  (1UL << (phys_shift - 1))
+#define ARM64_MAX_MEMORY(phys_shift)   \
+   ((1ULL << (phys_shift)) - ARM64_MEMORY_AREA(phys_shift))
+
+#define ARM_MEMORY_AREA(kvm)   ((kvm)->cfg.arch.aarch32_guest ?\
+ARM32_MEMORY_AREA  :   \
+ARM64_MEMORY_AREA(kvm->cfg.arch.phys_shift))
+
 #define ARM_KERN_OFFSET(kvm)   ((kvm)->cfg.arch.aarch32_guest  ?   \
0x8000  :   \
0x8)
 
 #define ARM_MAX_MEMORY(kvm)((kvm)->cfg.arch.aarch32_guest  ?   \
-   ARM_LOMAP_MAX_MEMORY:   \
-   ARM_HIMAP_MAX_MEMORY)
+   ARM32_MAX_MEMORY:   \
+   ARM64_MAX_MEMORY(kvm->cfg.arch.phys_shift))
 
-#include "arm-common/kvm-arch.h"
 
 #endif /* KVM__KVM_ARCH_H */
diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index b9d486d..b29b4b1 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -6,14 +6,15 @@
 #include 
 
 #include "arm-common/gic.h"
-
 #define ARM_IOPORT_AREA_AC(0x, UL)
 #define ARM_MMIO_AREA  _AC(0x0001, UL)
 #define ARM_AXI_AREA   _AC(0x4000, UL)
-#define ARM_MEMORY_AREA_AC(0x8000, UL)
 
-#define ARM_LOMAP_MAX_MEMORY   ((1ULL << 32) - ARM_MEMORY_AREA)
-#define ARM_HIMAP_MAX_MEMORY   ((1ULL << 40) - ARM_MEMORY_AREA)
+#define ARM32_MEMORY_AREA  _AC(0x8000, UL)
+#define ARM32_MAX_MEMORY   ((1ULL << 32) - ARM32_MEMORY_AREA)
+
+#define ARM_IOMEM_AREA_END ARM32_MEMORY_AREA
+
 
 #define ARM_GIC_DIST_BASE  (ARM_AXI_AREA - ARM_GIC_DIST_SIZE)
 #define ARM_GIC_CPUI_BASE  (ARM_GIC_DIST_BASE - ARM_GIC_CPUI_SIZE)
@@ -24,7 +25,7 @@
 #define ARM_IOPORT_SIZE(ARM_MMIO_AREA - ARM_IOPORT_AREA)
 #define ARM_VIRTIO_MMIO_SIZE   (ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
 #define ARM_PCI_CFG_SIZE   (1ULL << 24)
-#define ARM_PCI_MMIO_SIZE  (ARM_MEMORY_AREA - \
+#define ARM_PCI_MMIO_SIZE  (ARM_IOMEM_AREA_END - \
(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
 
 #define KVM_IOPORT_AREAARM_IOPORT_AREA
diff --git a/arm/kvm.c b/arm/kvm.c
index 2ab436e..5701d41 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -30,7 +30,7 @@ void kvm__init_ram(struct kvm *kvm)
u64 phys_start, phys_size;
void *host_mem;
 
-   phys_start  = ARM_MEMORY_AREA;
+   phys_start  = ARM_MEMORY_AREA(kvm);
phys_size   = kvm->ram_size;
host_mem= kvm->ram_start;
 
-- 
2.7.4
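
The layout macros in this patch can be sketched as plain functions to see where RAM ends up for a given IPA size. The function names are illustrative stand-ins for ARM64_MEMORY_AREA() and ARM64_MAX_MEMORY(); the sketch uses 1ULL throughout so that it does not depend on the width of long.

```c
#include <stdint.h>

/* RAM base: the start of the upper half of the guest physical space. */
static inline uint64_t arm64_memory_area(unsigned int phys_shift)
{
	return 1ULL << (phys_shift - 1);
}

/* Maximum RAM size: everything from the RAM base to the top of the
 * guest physical address space. */
static inline uint64_t arm64_max_memory(unsigned int phys_shift)
{
	return (1ULL << phys_shift) - arm64_memory_area(phys_shift);
}
```

For a 40bit guest this puts RAM at 0x8000000000 and allows up to 2^39 bytes of it.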




[Qemu-devel] [kvmtool test PATCH 21/24] kvmtool: Allow backends to run checks on the KVM device fd

2018-06-29 Thread Suzuki K Poulose
Allow architectures to perform initialisation based on the
KVM device fd ioctls, even before the VM is created.

Signed-off-by: Suzuki K Poulose 
---
 include/kvm/kvm.h | 4 
 kvm.c | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 90463b8..a036dd2 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -103,6 +103,10 @@ int kvm__get_sock_by_instance(const char *name);
 int kvm__enumerate_instances(int (*callback)(const char *name, int pid));
 void kvm__remove_socket(const char *name);
 
+#ifndef kvm__arch_init_hyp
+static inline void kvm__arch_init_hyp(struct kvm *kvm) {}
+#endif
+
 void kvm__arch_set_cmdline(char *cmdline, bool video);
 void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size);
 void kvm__arch_delete_ram(struct kvm *kvm);
diff --git a/kvm.c b/kvm.c
index f8f2fdc..b992e74 100644
--- a/kvm.c
+++ b/kvm.c
@@ -304,6 +304,8 @@ int kvm__init(struct kvm *kvm)
goto err_sys_fd;
}
 
+   kvm__arch_init_hyp(kvm);
+
kvm->vm_fd = ioctl(kvm->sys_fd, KVM_CREATE_VM, KVM_VM_TYPE);
if (kvm->vm_fd < 0) {
pr_err("KVM_CREATE_VM ioctl");
-- 
2.7.4




[Qemu-devel] [PATCH v3 16/20] kvm: arm64: Switch to per VM IPA limit

2018-06-29 Thread Suzuki K Poulose
Now that we can manage the stage2 page table per VM, switch the
configuration details to per VM instance. We keep track of the
IPA bits, number of page table levels and the VTCR bits (which
depends on the IPA and the number of levels). While at it, remove
unused pgd_lock field from kvm_arch for arm64.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/kvm_host.h   | 14 --
 arch/arm64/include/asm/kvm_hyp.h|  3 +--
 arch/arm64/include/asm/kvm_mmu.h| 20 ++--
 arch/arm64/include/asm/stage2_pgtable.h |  1 -
 virt/kvm/arm/mmu.c  |  4 
 5 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 328f472..9a15860 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -61,13 +61,23 @@ struct kvm_arch {
u64vmid_gen;
u32vmid;
 
-   /* 1-level 2nd stage table and lock */
-   spinlock_t pgd_lock;
+   /* stage-2 page table */
pgd_t *pgd;
 
/* VTTBR value associated with above pgd and vmid */
u64vttbr;
 
+   /* Private bits of VTCR_EL2 for this VM */
+   u64vtcr_private;
+   /* Size of the PA size for this guest */
+   u8 phys_shift;
+   /*
+* Number of levels in page table. We could always calculate
+* it from phys_shift above. We cache it for faster switches
+* in stage2 page table helpers.
+*/
+   u8 s2_levels;
+
/* The last vcpu id that ran on each physical CPU */
int __percpu *last_vcpu_ran;
 
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 3e8052d1..699f678 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -166,8 +166,7 @@ static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
u64 vtcr = read_sysreg(vtcr_el2);
 
vtcr &= ~VTCR_EL2_PRIVATE_MASK;
-   vtcr |= VTCR_EL2_SL0(kvm_stage2_levels(kvm)) |
-   VTCR_EL2_T0SZ(kvm_phys_shift(kvm));
+   vtcr |= kvm->arch.vtcr_private;
write_sysreg(vtcr, vtcr_el2);
write_sysreg(kvm->arch.vttbr, vttbr_el2);
 }
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index f3fb05a3..a291cdc 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -143,9 +143,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
  */
 #define KVM_PHYS_SHIFT (40)
 
-#define kvm_phys_shift(kvm)KVM_PHYS_SHIFT
+#define kvm_phys_shift(kvm)(kvm->arch.phys_shift)
 #define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm))
 #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
+#define kvm_stage2_levels(kvm) (kvm->arch.s2_levels)
 
 static inline bool kvm_page_empty(void *ptr)
 {
@@ -528,6 +529,18 @@ static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm)
 
 static inline void *stage2_alloc_pgd(struct kvm *kvm)
 {
+   u32 ipa, lvls;
+
+   /*
+* Stage2 page table can support concatenation of (upto 16) tables
+* at the entry level, thereby reducing the number of levels.
+*/
+   ipa = kvm_phys_shift(kvm);
+   lvls = stage2_pt_levels(ipa);
+
+   kvm->arch.s2_levels = lvls;
+   kvm->arch.vtcr_private = VTCR_EL2_SL0(lvls) | TCR_T0SZ(ipa);
+
return alloc_pages_exact(stage2_pgd_size(kvm),
 GFP_KERNEL | __GFP_ZERO);
 }
@@ -537,7 +550,10 @@ static inline u32 kvm_get_ipa_limit(void)
return KVM_PHYS_SHIFT;
 }
 
-static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {}
+static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift)
+{
+   kvm->arch.phys_shift = ipa_shift;
+}
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index ffc37cc..91d7936 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -65,7 +65,6 @@
 #define __s2_pgd_ptrs(pa, lvls)        (1 << ((pa) - pt_levels_pgdir_shift((lvls))))
 #define __s2_pgd_size(pa, lvls)        (__s2_pgd_ptrs((pa), (lvls)) * sizeof(pgd_t))
 
-#define kvm_stage2_levels(kvm) stage2_pt_levels(kvm_phys_shift(kvm))
 #define stage2_pgdir_shift(kvm)        \
pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
 #define stage2_pgdir_size(kvm) (_AC(1, UL) << stage2_pgdir_shift((kvm)))
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index a339e00..d7822e1 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -867,6 +867,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
return -EINVAL;
}
 
+   /* Make sure we have the stage2 configured for this VM

[Qemu-devel] [PATCH v3 19/20] kvm: arm64: Allow IPA size supported by the system

2018-06-29 Thread Suzuki K Poulose
So far we have restricted the IPA size of the VM to the default
value (40bits). Now that we can manage the IPA size per VM and
support dynamic stage2 page tables, allow VMs to have larger IPA.
This is done by setting the IPA limit to the one supported by
the hardware and kernel. This patch also moves the check for
the default IPA size support to kvm_get_ipa_limit().

Since the stage2 page table code is dependent on the stage1
page table, we always ensure that:

  Number of Levels at Stage1 >= Number of Levels at Stage2

So we limit the IPA to make sure that the above condition
is satisfied. This will affect the following combinations
of VA_BITS and IPA for different page sizes.

  39bit VA, 4K  - IPA > 43 (up to 48)
  36bit VA, 16K - IPA > 40 (up to 48)
  42bit VA, 64K - IPA > 46 (up to 52)

Supporting the above combinations needs independent stage2
page table manipulation code, which would need substantial
changes. We could pursue that solution independently and
switch the page table code once we have it ready.

Cc: Catalin Marinas 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2:
 - Restrict the IPA size to limit the number of page table
   levels in stage2 to that of stage1 or less.
---
 arch/arm64/include/asm/kvm_host.h |  6 --
 arch/arm64/include/asm/kvm_mmu.h  | 37 -
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9a15860..e858e49 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -452,13 +452,7 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 
 static inline void __cpu_init_stage2(void)
 {
-   u32 ps;
-
kvm_call_hyp(__init_stage2_translation);
-   /* Sanity check for minimum IPA size support */
-   ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1) & 0x7);
-   WARN_ONCE(ps < 40,
- "PARange is %d bits, unsupported configuration!", ps);
 }
 
 /* Guest/host FPSIMD coordination helpers */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index a291cdc..d38f395 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -547,7 +547,42 @@ static inline void *stage2_alloc_pgd(struct kvm *kvm)
 
 static inline u32 kvm_get_ipa_limit(void)
 {
-   return KVM_PHYS_SHIFT;
+   unsigned int ipa_max, va_max, parange;
+
+   parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7;
+   ipa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
+
+   /* Raise the limit to the default size for backward compatibility */
+   if (ipa_max < KVM_PHYS_SHIFT) {
+   WARN_ONCE(1,
+ "PARange is %d bits, unsupported configuration!",
+ ipa_max);
+   ipa_max = KVM_PHYS_SHIFT;
+   }
+
+   /* Clamp it to the PA size supported by the kernel */
+   ipa_max = (ipa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : ipa_max;
+   /*
+* Since our stage2 table is dependent on the stage1 page table code,
+* we must always honor the following condition:
+*
+*  Number of levels in Stage1 >= Number of levels in Stage2.
+*
+* So clamp the ipa limit further down to limit the number of levels.
+* Since we can concatenate upto 16 tables at entry level, we could
+* go upto 4bits above the maximum VA addressible with the current
+* number of levels.
+*/
+   va_max = PGDIR_SHIFT + PAGE_SHIFT - 3;
+   va_max += 4;
+
+   if (va_max < ipa_max) {
+   kvm_info("Limiting IPA limit to %dbits due to host VA bits limitation\n",
+va_max);
+   ipa_max = va_max;
+   }
+
+   return ipa_max;
 }
 
 static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift)
-- 
2.7.4
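
As a rough illustration of the va_max arithmetic in kvm_get_ipa_limit()
above, the following standalone C sketch reproduces the three limits listed
in the commit message. The helper names are hypothetical, and the
PGDIR_SHIFT values passed in the usage note (30, 25 and 29) are assumptions
derived from the usual arm64 level layout for those configurations; they do
not appear in the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): the largest IPA size, in
 * bits, that keeps the number of stage2 levels at or below the number of
 * stage1 levels. It mirrors the va_max computation in kvm_get_ipa_limit():
 * one full top-level table resolves (page_shift - 3) bits above
 * pgdir_shift, and concatenating up to 16 entry-level tables adds 4 more.
 */
unsigned int stage2_ipa_cap(unsigned int pgdir_shift, unsigned int page_shift)
{
    assert(page_shift > 3);
    return pgdir_shift + (page_shift - 3) + 4;
}

/* Clamp a hardware/kernel IPA limit to the cap computed above. */
unsigned int clamp_ipa(unsigned int ipa_max, unsigned int pgdir_shift,
                       unsigned int page_shift)
{
    unsigned int cap = stage2_ipa_cap(pgdir_shift, page_shift);

    return ipa_max > cap ? cap : ipa_max;
}
```

With these assumptions, stage2_ipa_cap(30, 12), stage2_ipa_cap(25, 14) and
stage2_ipa_cap(29, 16) give 43, 40 and 46, matching the 39bit/4K,
36bit/16K and 42bit/64K rows of the table in the commit message.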




[Qemu-devel] [kvmtool test PATCH 24/24] kvmtool: arm: Add support for creating VM with PA size

2018-06-29 Thread Suzuki K Poulose
Specify the physical size for the VM encoded in the vm type.

Signed-off-by: Suzuki K Poulose 
---
 arm/include/arm-common/kvm-arch.h |  6 +-
 arm/kvm.c | 22 ++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index b29b4b1..d77f3ac 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -44,7 +44,11 @@
 
 #define KVM_IRQ_OFFSET GIC_SPI_IRQ_BASE
 
-#define KVM_VM_TYPE            0
+extern unsigned long   kvm_arm_type;
+extern void kvm__arch_init_hyp(struct kvm *kvm);
+
+#define KVM_VM_TYPE            kvm_arm_type
+#define kvm__arch_init_hyp     kvm__arch_init_hyp
 
 #define VIRTIO_DEFAULT_TRANS(kvm)  \
((kvm)->cfg.arch.virtio_trans_pci ? VIRTIO_PCI : VIRTIO_MMIO)
diff --git a/arm/kvm.c b/arm/kvm.c
index 5701d41..b1969be 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+unsigned long kvm_arm_type;
+
 struct kvm_ext kvm_req_ext[] = {
{ DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
{ DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
@@ -18,6 +20,26 @@ struct kvm_ext kvm_req_ext[] = {
{ 0, 0 },
 };
 
+#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT
+#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT  _IO(KVMIO, 0x0b)
+#endif
+
+void kvm__arch_init_hyp(struct kvm *kvm)
+{
+   int max_ipa;
+
+   max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT);
+   if (max_ipa < 0)
+   max_ipa = 40;
+   if (!kvm->cfg.arch.phys_shift)
+   kvm->cfg.arch.phys_shift = 40;
+   if (kvm->cfg.arch.phys_shift > max_ipa)
+   die("Requested PA size (%u) is not supported by the host (%ubits)\n",
+   kvm->cfg.arch.phys_shift, max_ipa);
+   if (kvm->cfg.arch.phys_shift != 40)
+   kvm_arm_type = kvm->cfg.arch.phys_shift;
+}
+
 bool kvm__arch_cpu_supports_vm(void)
 {
/* The KVM capability check is enough. */
-- 
2.7.4
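
The checks in kvm__arch_init_hyp() above can be read as a small decision
function. The sketch below restates them without the ioctl call so the
behaviour is easy to try out. pick_vm_type() and DEFAULT_PHYS_SHIFT are
hypothetical names introduced for illustration; the patch itself calls
die() where this sketch returns -1.

```c
#include <assert.h>

#define DEFAULT_PHYS_SHIFT 40   /* default IPA size used by the patch */

/*
 * Hypothetical restatement of the checks in kvm__arch_init_hyp().
 *
 * requested: value of --phys-shift (0 if the option was not given)
 * max_ipa:   result of KVM_ARM_GET_MAX_VM_PHYS_SHIFT (negative on failure)
 *
 * Returns the machine type to pass to KVM_CREATE_VM, or -1 if the
 * requested size is larger than what the host reports.
 */
long pick_vm_type(int requested, int max_ipa)
{
    /* Negative requests are outside the scope of this sketch. */
    assert(requested >= 0);

    if (max_ipa < 0)            /* ioctl not available: assume the default */
        max_ipa = DEFAULT_PHYS_SHIFT;
    if (!requested)
        requested = DEFAULT_PHYS_SHIFT;
    if (requested > max_ipa)
        return -1;

    /* The default size keeps the legacy type 0 for older kernels. */
    return requested == DEFAULT_PHYS_SHIFT ? 0 : requested;
}
```

Keeping type 0 for the 40bit default means a kvmtool built with this patch
would, under these assumptions, still start a default-sized guest on a
kernel that rejects non-zero machine types.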




[Qemu-devel] [kvmtool test PATCH 22/24] kvmtool: arm64: Add support for guest physical address size

2018-06-29 Thread Suzuki K Poulose
Add an option to specify the physical address size used by this
VM.

Signed-off-by: Suzuki K Poulose 
---
 arm/aarch64/include/kvm/kvm-config-arch.h | 5 -
 arm/include/arm-common/kvm-config-arch.h  | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h b/arm/aarch64/include/kvm/kvm-config-arch.h
index 04be43d..dabd22c 100644
--- a/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -8,7 +8,10 @@
"Create PMUv3 device"), \
OPT_U64('\0', "kaslr-seed", &(cfg)->kaslr_seed, \
"Specify random seed for Kernel Address Space " \
-   "Layout Randomization (KASLR)"),
+   "Layout Randomization (KASLR)"),\
+   OPT_INTEGER('\0', "phys-shift", &(cfg)->phys_shift, \
+   "Specify maximum physical address size (not "   \
+   "the amount of memory)"),
 
 #include "arm-common/kvm-config-arch.h"
 
diff --git a/arm/include/arm-common/kvm-config-arch.h b/arm/include/arm-common/kvm-config-arch.h
index 6a196f1..e0b531e 100644
--- a/arm/include/arm-common/kvm-config-arch.h
+++ b/arm/include/arm-common/kvm-config-arch.h
@@ -11,6 +11,7 @@ struct kvm_config_arch {
bool has_pmuv3;
u64 kaslr_seed;
enum irqchip_type irqchip;
+   int phys_shift;
 };
 
 int irqchip_parser(const struct option *opt, const char *arg, int unset);
-- 
2.7.4




[Qemu-devel] [PATCH v3 13/20] kvm: arm64: Configure VTCR per VM

2018-06-29 Thread Suzuki K Poulose
We set VTCR_EL2 very early during the stage2 init and don't
touch it ever. This is fine as we had a fixed IPA size. This
patch changes the behavior to set the VTCR for a given VM,
depending on its stage2 table. The common configuration for
VTCR is still performed during the early init as we have to
retain the hardware access flag update bits (VTCR_EL2_HA)
per CPU (as they are only set for the CPUs which are capable).
The bits defining the number of levels in the page table (SL0)
and the size of the Input address to the translation (T0SZ)
are programmed for each VM upon entry to the guest.
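
To make the split between common and per-VM bits concrete, here is a
minimal standalone sketch of how the SL0 and T0SZ fields could be composed
and merged into a VTCR value. The shift and mask values follow the
definitions in the diff below. The function names are hypothetical, and the
SL0 formula (SL0_BASE - (4 - levels)) is an assumption based on the arm64
headers rather than something shown in this message.

```c
#include <assert.h>
#include <stdint.h>

#define SL0_SHIFT    6
#define SL0_MASK     (UINT64_C(3) << SL0_SHIFT)
#define T0SZ_MASK    UINT64_C(0x3f)
#define PRIVATE_MASK (SL0_MASK | T0SZ_MASK)   /* VTCR_EL2_PRIVATE_MASK */

/*
 * Per-VM bits: SL0 from the number of levels, T0SZ = 64 - IPA bits.
 * sl0_base stands in for VTCR_EL2_TGRAN_SL0_BASE (2 for 4K pages,
 * 3 for 16K/64K pages). The SL0 encoding here is an assumption.
 */
uint64_t vtcr_private(unsigned int ipa_bits, unsigned int levels,
                      unsigned int sl0_base)
{
    assert(levels >= 1 && levels <= 4 && sl0_base + levels >= 4);
    assert(ipa_bits >= 1 && ipa_bits <= 63);

    uint64_t sl0 = (uint64_t)(sl0_base - (4 - levels)) << SL0_SHIFT;
    uint64_t t0sz = (uint64_t)(64 - ipa_bits) & T0SZ_MASK;

    return sl0 | t0sz;
}

/* Replace only the per-VM fields, keeping the common bits untouched. */
uint64_t vtcr_for_guest(uint64_t vtcr_common, uint64_t private_bits)
{
    return (vtcr_common & ~PRIVATE_MASK) | private_bits;
}
```

For a 40bit IPA with 4K pages and 3 levels this yields SL0 = 1 and
T0SZ = 24, the same values the removed VTCR_EL2_SL0_LVL1 and
VTCR_EL2_T0SZ_40B constants encoded.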

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Change since V2:
 - Load VTCR for TLB operations
---
 arch/arm64/include/asm/kvm_arm.h  | 19 +--
 arch/arm64/include/asm/kvm_asm.h  |  2 +-
 arch/arm64/include/asm/kvm_host.h |  9 ++---
 arch/arm64/include/asm/kvm_hyp.h  | 11 +++
 arch/arm64/kvm/hyp/s2-setup.c | 17 +
 5 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 11a7db0..b02c316 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -120,9 +120,7 @@
 #define VTCR_EL2_IRGN0_WBWA    TCR_IRGN0_WBWA
 #define VTCR_EL2_SL0_SHIFT 6
 #define VTCR_EL2_SL0_MASK  (3 << VTCR_EL2_SL0_SHIFT)
-#define VTCR_EL2_SL0_LVL1  (1 << VTCR_EL2_SL0_SHIFT)
 #define VTCR_EL2_T0SZ_MASK 0x3f
-#define VTCR_EL2_T0SZ_40B  24
 #define VTCR_EL2_VS_SHIFT  19
 #define VTCR_EL2_VS_8BIT   (0 << VTCR_EL2_VS_SHIFT)
 #define VTCR_EL2_VS_16BIT  (1 << VTCR_EL2_VS_SHIFT)
@@ -137,43 +135,44 @@
  * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time
  * (see hyp-init.S).
  *
+ * VTCR_EL2.SL0 and T0SZ are configured per VM at runtime before switching to
+ * the VM.
+ *
  * Note that when using 4K pages, we concatenate two first level page tables
  * together. With 16K pages, we concatenate 16 first level page tables.
  *
  */
 
-#define VTCR_EL2_T0SZ_IPA  VTCR_EL2_T0SZ_40B
 #define VTCR_EL2_COMMON_BITS   (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \
 VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1)
+#define VTCR_EL2_PRIVATE_MASK  (VTCR_EL2_SL0_MASK | VTCR_EL2_T0SZ_MASK)
 
 #ifdef CONFIG_ARM64_64K_PAGES
 /*
  * Stage2 translation configuration:
  * 64kB pages (TG0 = 1)
- * 2 level page tables (SL = 1)
  */
-#define VTCR_EL2_TGRAN_FLAGS   (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1)
+#define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K
 #define VTCR_EL2_TGRAN_SL0_BASE        3UL
 
 #elif defined(CONFIG_ARM64_16K_PAGES)
 /*
  * Stage2 translation configuration:
  * 16kB pages (TG0 = 2)
- * 2 level page tables (SL = 1)
  */
-#define VTCR_EL2_TGRAN_FLAGS   (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1)
+#define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K
 #define VTCR_EL2_TGRAN_SL0_BASE        3UL
 #else  /* 4K */
 /*
  * Stage2 translation configuration:
  * 4kB pages (TG0 = 0)
- * 3 level page tables (SL = 1)
  */
-#define VTCR_EL2_TGRAN_FLAGS   (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1)
+#define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K
 #define VTCR_EL2_TGRAN_SL0_BASE        2UL
 #endif
 
-#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS)
+#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN)
+
 /*
  * VTCR_EL2:SL0 indicates the entry level for Stage2 translation.
  * Interestingly, it depends on the page size.
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 102b5a5..91372eb 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -72,7 +72,7 @@ extern void __vgic_v3_init_lrs(void);
 
 extern u32 __kvm_get_mdcr_el2(void);
 
-extern u32 __init_stage2_translation(void);
+extern void __init_stage2_translation(void);
 
 /* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */
 #define __hyp_this_cpu_ptr(sym)        \
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index fe8777b..328f472 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -442,10 +442,13 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 
 static inline void __cpu_init_stage2(void)
 {
-   u32 parange = kvm_call_hyp(__init_stage2_translation);
+   u32 ps;
 
-   WARN_ONCE(parange < 40,
- "PARange is %d bits, unsupported configuration!", parange);
+   kvm_call_hyp(__init_stage2_translation);
+   /* Sanity check for minimum IPA size support */
+   ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1) & 0x7);
+   WARN_ONCE(ps < 40,
+ "PARange is %d bits, unsupported configuration!", ps);
 }
 
 /* Guest/host 

[Qemu-devel] [PATCH v3 20/20] kvm: arm64: Fall back to normal stage2 entry level

2018-06-29 Thread Suzuki K Poulose
We use concatenated entry level page tables (up to 16 tables) for
stage2. If we don't have sufficient contiguous pages (e.g., 16 * 64K),
fall back to the normal page table format by going one level
deeper, if permitted.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
New in v3
---
 arch/arm64/include/asm/kvm_arm.h |  7 +++
 arch/arm64/include/asm/kvm_mmu.h | 18 +
 arch/arm64/kvm/guest.c   | 42 
 3 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index cb6a2ee..42eb528 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -137,6 +137,8 @@
  *
  * VTCR_EL2.SL0 and T0SZ are configured per VM at runtime before switching to
  * the VM.
+ *
+ * With 16k/64k, the maximum number of levels supported at Stage2 is 3.
  */
 
 #define VTCR_EL2_COMMON_BITS   (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \
@@ -150,6 +152,7 @@
  */
 #define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K
 #define VTCR_EL2_TGRAN_SL0_BASE        3UL
+#define ARM64_TGRAN_STAGE2_MAX_LEVELS  3
 
 #elif defined(CONFIG_ARM64_16K_PAGES)
 /*
@@ -158,6 +161,8 @@
  */
 #define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K
 #define VTCR_EL2_TGRAN_SL0_BASE        3UL
+#define ARM64_TGRAN_STAGE2_MAX_LEVELS  3
+
 #else  /* 4K */
 /*
  * Stage2 translation configuration:
@@ -165,6 +170,8 @@
  */
 #define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K
 #define VTCR_EL2_TGRAN_SL0_BASE        2UL
+#define ARM64_TGRAN_STAGE2_MAX_LEVELS  4
+
 #endif
 
 #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index d38f395..50f632e 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -527,23 +527,7 @@ static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm)
return vttbr_baddr_mask(kvm_phys_shift(kvm), kvm_stage2_levels(kvm));
 }
 
-static inline void *stage2_alloc_pgd(struct kvm *kvm)
-{
-   u32 ipa, lvls;
-
-   /*
-* Stage2 page table can support concatenation of (upto 16) tables
-* at the entry level, thereby reducing the number of levels.
-*/
-   ipa = kvm_phys_shift(kvm);
-   lvls = stage2_pt_levels(ipa);
-
-   kvm->arch.s2_levels = lvls;
-   kvm->arch.vtcr_private = VTCR_EL2_SL0(lvls) | TCR_T0SZ(ipa);
-
-   return alloc_pages_exact(stage2_pgd_size(kvm),
-GFP_KERNEL | __GFP_ZERO);
-}
+extern void *stage2_alloc_pgd(struct kvm *kvm);
 
 static inline u32 kvm_get_ipa_limit(void)
 {
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 56a0260..5a3a687 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -31,6 +31,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "trace.h"
 
@@ -458,3 +460,43 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 
return ret;
 }
+
+void *stage2_alloc_pgd(struct kvm *kvm)
+{
+   u32 ipa, s2_lvls, lvls;
+   u64 pgd_size;
+   void *pgd;
+
+   /*
+* Stage2 page table can support concatenation of (upto 16) tables
+* at the entry level, thereby reducing the number of levels. We try
+* to use concatenation wherever possible. If we fail, fallback to
+* normal levels if possible.
+*/
+   ipa = kvm_phys_shift(kvm);
+   lvls = s2_lvls = stage2_pt_levels(ipa);
+
+retry:
+   pgd_size = __s2_pgd_size(ipa, lvls);
+   pgd = alloc_pages_exact(pgd_size, GFP_KERNEL | __GFP_ZERO);
+
+   /* Check if the PGD meets the alignment requirements */
+   if (pgd && (virt_to_phys(pgd) & ~vttbr_baddr_mask(ipa, lvls))) {
+   free_pages_exact(pgd, pgd_size);
+   pgd = NULL;
+   }
+
+   if (pgd) {
+   kvm->arch.s2_levels = lvls;
+   kvm->arch.vtcr_private = VTCR_EL2_SL0(lvls) | TCR_T0SZ(ipa);
+   } else {
+   /* Check if we can use an entry level without concatenation */
+   lvls = ARM64_HW_PGTABLE_LEVELS(ipa);
+   if ((lvls > s2_lvls) &&
+   (lvls <= CONFIG_PGTABLE_LEVELS) &&
+   (lvls <= ARM64_TGRAN_STAGE2_MAX_LEVELS))
+   goto retry;
+   }
+
+   return pgd;
+}
-- 
2.7.4
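
To see why the fallback in stage2_alloc_pgd() above helps, it is useful to
compare the PGD size with and without concatenation. The sketch below is a
standalone approximation: the function names are hypothetical, and the
level arithmetic is an assumption modelled on ARM64_HW_PGTABLE_LEVELS(),
stage2_pt_levels() and __s2_pgd_size() as used in this series, with the
page shift passed as a parameter instead of a build-time constant.

```c
#include <assert.h>
#include <stdint.h>

/* Levels needed to resolve va_bits with exactly one table per level. */
unsigned int hw_pgtable_levels(unsigned int va_bits, unsigned int page_shift)
{
    assert(page_shift > 3 && va_bits > 4);
    return (va_bits - 4) / (page_shift - 3);
}

/*
 * Levels when up to 16 entry-level tables are concatenated: the entry
 * level absorbs 4 extra bits, so we size the walk for (ipa - 4) bits.
 */
unsigned int stage2_concat_levels(unsigned int ipa, unsigned int page_shift)
{
    return hw_pgtable_levels(ipa - 4, page_shift);
}

/* Bytes needed for the entry-level table(s), at 8 bytes per entry. */
uint64_t stage2_pgd_bytes(unsigned int ipa, unsigned int lvls,
                          unsigned int page_shift)
{
    unsigned int pgdir_shift = (page_shift - 3) * lvls + 3;

    assert(ipa > pgdir_shift && ipa - pgdir_shift < 61);
    return (UINT64_C(1) << (ipa - pgdir_shift)) * 8;
}
```

Under these assumptions a 46bit IPA with 64K pages needs a contiguous
16 * 64K allocation at 2 levels, but only 128 bytes of entry-level table at
3 levels, which is the case the commit message describes.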




[Qemu-devel] [PATCH v3 18/20] kvm: arm64: Add support for handling 52bit IPA

2018-06-29 Thread Suzuki K Poulose
Add support for handling the 52bit IPA. 52bit IPA
support needs changes to the following:

 1) Page-table entries - We use kernel page table helpers for setting
 up the stage2. Hence we don't need explicit changes here.

 2) VTTBR:BADDR - This is already supported with:
   commit 529c4b05a3cb2f324aa ("arm64: handle 52-bit addresses in TTBR")

 3) VGIC support for 52bit: Supported with a patch in this series.

That leaves us with the handling for PAR and HPFAR. This patch adds
support for handling the 52bit addresses in PAR and HPFAR,
which are used while handling the permission faults in stage1.

Cc: Marc Zyngier 
Cc: Kristina Martsenko 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm64/include/asm/kvm_arm.h | 7 +++
 arch/arm64/kvm/hyp/switch.c  | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 2e90942..cb6a2ee 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -301,6 +301,13 @@
 
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
 #define HPFAR_MASK (~UL(0xf))
+/*
+ * We have
+ * PAR [PA_Shift - 1   : 12] = PA  [PA_Shift - 1 : 12]
+ * HPFAR   [PA_Shift - 9   : 4]  = FIPA[PA_Shift - 1 : 12]
+ */
+#define PAR_TO_HPFAR(par)  \
+   (((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8)
 
 #define kvm_arm_exception_type \
{0, "IRQ" },\
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 355fb25..fb66320 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -260,7 +260,7 @@ static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar)
return false; /* Translation failed, back to guest */
 
/* Convert PAR to HPFAR format */
-   *hpfar = ((tmp >> 12) & ((1UL << 36) - 1)) << 4;
+   *hpfar = PAR_TO_HPFAR(tmp);
return true;
 }
 
-- 
2.7.4
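
The difference between the old open-coded conversion and PAR_TO_HPFAR()
above only shows up for addresses beyond 48 bits. The standalone sketch
below compares the two. genmask_ull() is a local stand-in for the kernel's
GENMASK_ULL(), and passing the PA width as a parameter (instead of using
PHYS_MASK_SHIFT) is an assumption made so both widths can be tried in one
build.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for GENMASK_ULL(h, l): bits h..l set, inclusive. */
static inline uint64_t genmask_ull(unsigned int h, unsigned int l)
{
    assert(h < 64 && l <= h);
    return (~UINT64_C(0) >> (63 - h)) & (~UINT64_C(0) << l);
}

/* Old conversion: keeps only PA[47:12], placed at HPFAR[39:4]. */
uint64_t par_to_hpfar_48(uint64_t par)
{
    return ((par >> 12) & ((UINT64_C(1) << 36) - 1)) << 4;
}

/* New conversion, parameterised on the PA width (PHYS_MASK_SHIFT). */
uint64_t par_to_hpfar(uint64_t par, unsigned int phys_mask_shift)
{
    return (par & genmask_ull(phys_mask_shift - 1, 12)) >> 8;
}
```

For a PAR value of 0x000F123456789ABC, both conversions agree when the PA
width is 48, while a width of 52 keeps the extra nibble from PA[51:48].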




[Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM

2018-06-29 Thread Suzuki K Poulose
Allow specifying the physical address size for a new VM via
the kvm_type argument for KVM_CREATE_VM ioctl. This allows
us to finalise the stage2 page table format as early as possible
and hence perform the right checks on the memory slots without
complication. The size is encoded as Log2(PA_Size) in the bits[7:0]
of the type field and can encode more information in the future if
required. The IPA size is still capped at 40bits.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Peter Maydel 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Signed-off-by: Suzuki K Poulose 
---
 arch/arm/include/asm/kvm_mmu.h   |  2 ++
 arch/arm64/include/asm/kvm_arm.h | 10 +++---
 arch/arm64/include/asm/kvm_mmu.h |  2 ++
 include/uapi/linux/kvm.h | 10 ++
 virt/kvm/arm/arm.c   | 24 ++--
 5 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index d86f8dd..bcc3dd9 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -385,6 +385,8 @@ static inline u32 kvm_get_ipa_limit(void)
return KVM_PHYS_SHIFT;
 }
 
+static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index b02c316..2e90942 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -128,19 +128,15 @@
 #define VTCR_EL2_T0SZ(x)   TCR_T0SZ(x)
 
 /*
- * We configure the Stage-2 page tables to always restrict the IPA space to be
- * 40 bits wide (T0SZ = 24).  Systems with a PARange smaller than 40 bits are
- * not known to exist and will break with this configuration.
+ * We configure the Stage-2 page tables based on the requested size of
+ * IPA for each VM. The default size is set to 40bits and is not allowed
+ * go below that limit (for backward compatibility).
  *
  * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time
  * (see hyp-init.S).
  *
  * VTCR_EL2.SL0 and T0SZ are configured per VM at runtime before switching to
  * the VM.
- *
- * Note that when using 4K pages, we concatenate two first level page tables
- * together. With 16K pages, we concatenate 16 first level page tables.
- *
  */
 
 #define VTCR_EL2_COMMON_BITS   (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index b4564d8..f3fb05a3 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -537,5 +537,7 @@ static inline u32 kvm_get_ipa_limit(void)
return KVM_PHYS_SHIFT;
 }
 
+static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4df9bb6..fa4cab0 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt {
 #define KVM_S390_SIE_PAGE_OFFSET 1
 
 /*
+ * On arm/arm64, machine type can be used to request the physical
+ * address size for the VM. Bits [7-0] have been reserved for the
+ * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
+ * value 0 implies the default IPA size, which is 40bits.
+ */
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK0xff
+#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x)  \
+   ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK)
+
+/*
  * ioctls for /dev/kvm fds:
  */
 #define KVM_GET_API_VERSION   _IO(KVMIO,   0x00)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 0d99e67..1085761 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -112,6 +112,25 @@ void kvm_arch_check_processor_compat(void *rtn)
 }
 
 
+static int kvm_arch_config_vm(struct kvm *kvm, unsigned long type)
+{
+   u32 ipa_shift = KVM_VM_TYPE_ARM_PHYS_SHIFT(type);
+
+   /*
+* Make sure the size, if specified, is within the range of
+* default size and supported maximum limit.
+*/
+   if (ipa_shift) {
+   if (ipa_shift < KVM_PHYS_SHIFT || ipa_shift > kvm_ipa_limit)
+   return -EINVAL;
+   } else {
+   ipa_shift = KVM_PHYS_SHIFT;
+   }
+
+   kvm_config_stage2(kvm, ipa_shift);
+   return 0;
+}
+
 /**
  * kvm_arch_init_vm - initializes a VM data structure
  * @kvm:   pointer to the KVM struct
@@ -120,8 +139,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
int ret, cpu;
 
-   if (type)
-   return -EINVAL;
+   ret = kvm_arch_config_vm(kvm, type);
+   if (ret)
+   return ret;
 
kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
if (!kvm->arch.last_vcpu_ran)
-- 
2.7.4
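
The validation in kvm_arch_config_vm() above can be exercised outside the
kernel with a small stand-in. In the sketch below, decode_ipa_shift() is a
hypothetical name, and ipa_limit plays the role of kvm_ipa_limit. As in
the patch, only bits [7:0] of the type are looked at, so any higher bits
are silently ignored.

```c
#include <assert.h>
#include <errno.h>

#define PHYS_SHIFT_MASK   0xffUL   /* KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK */
#define DEFAULT_IPA_SHIFT 40U      /* KVM_PHYS_SHIFT */

/*
 * Hypothetical stand-in for kvm_arch_config_vm(): returns the IPA shift
 * to configure for the VM, or -EINVAL if the request is out of range.
 */
int decode_ipa_shift(unsigned long type, unsigned int ipa_limit)
{
    unsigned int ipa_shift = (unsigned int)(type & PHYS_SHIFT_MASK);

    assert(ipa_limit >= DEFAULT_IPA_SHIFT);

    if (!ipa_shift)                      /* 0 means "use the default" */
        return (int)DEFAULT_IPA_SHIFT;
    if (ipa_shift < DEFAULT_IPA_SHIFT || ipa_shift > ipa_limit)
        return -EINVAL;
    return (int)ipa_shift;
}
```

With a host limit of 48, a type of 0 decodes to 40, 44 is accepted as is,
and both 39 and 52 are rejected with -EINVAL.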




[Qemu-devel] [PATCH v3 17/20] vgic: Add support for 52bit guest physical address

2018-06-29 Thread Suzuki K Poulose
From: Kristina Martsenko 

Add support for handling 52bit guest physical address to the
VGIC layer. So far we have limited the guest physical address
to 48bits, by explicitly masking the upper bits. This patch
removes the restriction. We do not have to check if the host
supports 52bit, as the gpa is always validated during an access
(e.g., kvm_{read/write}_guest, kvm_is_visible_gfn()).
The ITS table save-restore is not affected by the enhancement
either: the DTE entries already store bits[51:8]
of the ITT_addr (with a 256byte alignment).
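
The GITS_BASER address macros touched by this patch fold PA[51:48] into
register bits [15:12] and back. A minimal standalone sketch of that round
trip follows. The function names are hypothetical restatements of
GITS_BASER_PHYS_52_to_48() and GITS_BASER_ADDR_48_to_52(), and
genmask_ull() is a local stand-in for GENMASK_ULL().

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for GENMASK_ULL(h, l): bits h..l set, inclusive. */
static inline uint64_t genmask_ull(unsigned int h, unsigned int l)
{
    assert(h < 64 && l <= h);
    return (~UINT64_C(0) >> (63 - h)) & (~UINT64_C(0) << l);
}

/* PA[47:16] stays in place; PA[51:48] moves to register bits [15:12]. */
uint64_t baser_phys_52_to_48(uint64_t phys)
{
    return (phys & genmask_ull(47, 16)) | (((phys >> 48) & 0xf) << 12);
}

/* Inverse: rebuild the 52bit address from the register encoding. */
uint64_t baser_addr_48_to_52(uint64_t baser)
{
    return (baser & genmask_ull(47, 16)) | (((baser >> 12) & 0xf) << 48);
}
```

The round trip is only lossless for 64K-aligned addresses, which is the
alignment the 64K ITS page size assumed in this code already requires.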

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Kristina Martsenko 
[ Macro clean ups, fix PROPBASER and PENDBASER accesses ]
Signed-off-by: Suzuki K Poulose 
---
 include/linux/irqchip/arm-gic-v3.h |  5 +
 virt/kvm/arm/vgic/vgic-its.c   | 36 ++--
 virt/kvm/arm/vgic/vgic-mmio-v3.c   |  2 --
 3 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index cbb872c..bc4b95b 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -346,6 +346,8 @@
 #define GITS_CBASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt)
 #define GITS_CBASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb)
 
+#define GITS_CBASER_ADDRESS(cbaser)    ((cbaser) & GENMASK_ULL(52, 12))
+
 #define GITS_BASER_NR_REGS 8
 
 #define GITS_BASER_VALID   (1ULL << 63)
@@ -377,6 +379,9 @@
 #define GITS_BASER_ENTRY_SIZE_MASK GENMASK_ULL(52, 48)
 #define GITS_BASER_PHYS_52_to_48(phys) \
(((phys) & GENMASK_ULL(47, 16)) | (((phys) >> 48) & 0xf) << 12)
+#define GITS_BASER_ADDR_48_to_52(baser)        \
+   (((baser) & GENMASK_ULL(47, 16)) | (((baser) >> 12) & 0xf) << 48)
+
 #define GITS_BASER_SHAREABILITY_SHIFT  (10)
 #define GITS_BASER_InnerShareable  \
GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 4ed79c9..c6eb390 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -234,13 +234,6 @@ static struct its_ite *find_ite(struct vgic_its *its, u32 device_id,
list_for_each_entry(dev, &(its)->device_list, dev_list) \
list_for_each_entry(ite, &(dev)->itt_head, ite_list)
 
-/*
- * We only implement 48 bits of PA at the moment, although the ITS
- * supports more. Let's be restrictive here.
- */
-#define BASER_ADDRESS(x)   ((x) & GENMASK_ULL(47, 16))
-#define CBASER_ADDRESS(x)  ((x) & GENMASK_ULL(47, 12))
-
 #define GIC_LPI_OFFSET 8192
 
 #define VITS_TYPER_IDBITS 16
@@ -752,6 +745,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
 {
int l1_tbl_size = GITS_BASER_NR_PAGES(baser) * SZ_64K;
u64 indirect_ptr, type = GITS_BASER_TYPE(baser);
+   phys_addr_t base = GITS_BASER_ADDR_48_to_52(baser);
int esz = GITS_BASER_ENTRY_SIZE(baser);
int index;
gfn_t gfn;
@@ -776,7 +770,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
if (id >= (l1_tbl_size / esz))
return false;
 
-   addr = BASER_ADDRESS(baser) + id * esz;
+   addr = base + id * esz;
gfn = addr >> PAGE_SHIFT;
 
if (eaddr)
@@ -791,7 +785,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
 
/* Each 1st level entry is represented by a 64-bit value. */
if (kvm_read_guest_lock(its->dev->kvm,
-  BASER_ADDRESS(baser) + index * sizeof(indirect_ptr),
+  base + index * sizeof(indirect_ptr),
   &indirect_ptr, sizeof(indirect_ptr)))
return false;
 
@@ -801,11 +795,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id,
if (!(indirect_ptr & BIT_ULL(63)))
return false;
 
-   /*
-* Mask the guest physical address and calculate the frame number.
-* Any address beyond our supported 48 bits of PA will be caught
-* by the actual check in the final step.
-*/
+   /* Mask the guest physical address and calculate the frame number. */
indirect_ptr &= GENMASK_ULL(51, 16);
 
/* Find the address of the actual entry */
@@ -1297,9 +1287,6 @@ static u64 vgic_sanitise_its_baser(u64 reg)
  GITS_BASER_OUTER_CACHEABILITY_SHIFT,
  vgic_sanitise_outer_cacheability);
 
-   /* Bits 15:12 contain bits 51:48 of the PA, which we don't support. */
-   reg &= ~GENMASK_ULL(15, 12);
-
/* We support only one (ITS) page size: 64K */
reg =

[Qemu-devel] [PATCH v3 08/20] kvm: arm/arm64: Abstract stage2 pgd table allocation

2018-06-29 Thread Suzuki K Poulose
Abstract the allocation of stage2 entry level tables for a
given VM, so that later we can choose to fall back to the
normal page table levels (i.e., avoid entry level table
concatenation) on arm64.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2:
 - New patch
---
 arch/arm/include/asm/kvm_mmu.h   | 6 ++
 arch/arm64/include/asm/kvm_mmu.h | 6 ++
 virt/kvm/arm/mmu.c   | 2 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index f36eb20..b2da5a4 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -372,6 +372,12 @@ static inline int hyp_map_aux_data(void)
return 0;
 }
 
+static inline void *stage2_alloc_pgd(struct kvm *kvm)
+{
+   return alloc_pages_exact(stage2_pgd_size(kvm),
+GFP_KERNEL | __GFP_ZERO);
+}
+
 #define kvm_phys_to_vttbr(addr)        (addr)
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 5da8f52..dbaf513 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -501,5 +501,11 @@ static inline int hyp_map_aux_data(void)
 
 #define kvm_phys_to_vttbr(addr)        phys_to_ttbr(addr)
 
+static inline void *stage2_alloc_pgd(struct kvm *kvm)
+{
+   return alloc_pages_exact(stage2_pgd_size(kvm),
+GFP_KERNEL | __GFP_ZERO);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 82dd571..a339e00 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -868,7 +868,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
}
 
/* Allocate the HW PGD, making sure that each page gets its own refcount */
-   pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
+   pgd = stage2_alloc_pgd(kvm);
if (!pgd)
return -ENOMEM;
 
-- 
2.7.4




[Qemu-devel] [PATCH v3 14/20] kvm: arm/arm64: Expose supported physical address limit for VM

2018-06-29 Thread Suzuki K Poulose
Expose the maximum physical address size supported by the host
for a VM. This could be later used by the userspace to choose the
appropriate size for a given VM. The limit is determined as the
minimum of actual CPU limit, the kernel limit (i.e, either 48 or 52)
and the stage2 page table support limit (which is 40bits at the moment).
For backward compatibility, we support a minimum of 40bits. The 40bit
stage2 cap will be lifted once the stage2 code supports the host
kernel PA limit.

This value may be different from what is exposed to the VM via
CPU ID registers. The limit only applies to the stage2 page table.
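
The "minimum of three limits" rule described above can be sketched as
follows. This is an illustration only: the function names are
hypothetical, the PARange decoding follows the architectural
ID_AA64MMFR0_EL1.PARange encodings, and the value returned for reserved
encodings is an assumption made for this sketch.

```c
#include <assert.h>

/* Decode ID_AA64MMFR0_EL1.PARange into a physical address width. */
unsigned int parange_to_phys_shift(unsigned int parange)
{
    switch (parange & 0x7) {
    case 0: return 32;
    case 1: return 36;
    case 2: return 40;
    case 3: return 42;
    case 4: return 44;
    case 5: return 48;
    case 6: return 52;
    default: return 48;   /* reserved encoding: assumed fallback */
    }
}

/*
 * Limit reported to userspace: the minimum of the CPU limit, the kernel
 * PA limit (48 or 52) and the stage2 page table limit, but never less
 * than the 40bit default that existing userspace relies on.
 */
unsigned int max_vm_phys_shift(unsigned int parange,
                               unsigned int kernel_pa_bits,
                               unsigned int stage2_limit)
{
    unsigned int limit = parange_to_phys_shift(parange);

    assert(kernel_pa_bits == 48 || kernel_pa_bits == 52);

    if (limit > kernel_pa_bits)
        limit = kernel_pa_bits;
    if (limit > stage2_limit)
        limit = stage2_limit;
    if (limit < 40)
        limit = 40;
    return limit;
}
```

With the stage2 limit fixed at 40, as in this patch, the function returns
40 regardless of PARange; the other inputs only start to matter once the
stage2 limit is raised later in the series.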

Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Peter Maydel 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2:
 - Bump the ioctl number
---
 Documentation/virtual/kvm/api.txt | 15 +++
 arch/arm/include/asm/kvm_mmu.h|  5 +
 arch/arm64/include/asm/kvm_mmu.h  |  5 +
 include/uapi/linux/kvm.h  |  6 ++
 virt/kvm/arm/arm.c|  6 ++
 5 files changed, 37 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d10944e..662374b 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3561,6 +3561,21 @@ Returns: 0 on success,
-ENOENT on deassign if the conn_id isn't registered
-EEXIST on assign if the conn_id is already registered
 
+4.113 KVM_ARM_GET_MAX_VM_PHYS_SHIFT
+Capability: basic
+Architectures: arm, arm64
+Type: system ioctl
+Parameters: none
+Returns: log2(Maximum Guest physical address space size) supported by the
+hypervisor.
+
+This ioctl can be used to identify the maximum guest physical address
+space size supported by the hypervisor. The returned value indicates the
+maximum size of the address that can be resolved by the stage2
+translation table on arm/arm64. On arm64, the value is decided based
+on the host kernel configuration and the system wide safe value of
+ID_AA64MMFR0_EL1:PARange. This may not match the value exposed to the
+VM in CPU ID registers.
 
 5. The kvm_run structure
 
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index b2da5a4..d86f8dd 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -380,6 +380,11 @@ static inline void *stage2_alloc_pgd(struct kvm *kvm)
 
 #define kvm_phys_to_vttbr(addr) (addr)
 
+static inline u32 kvm_get_ipa_limit(void)
+{
+   return KVM_PHYS_SHIFT;
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 813a72a..b4564d8 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -532,5 +532,10 @@ static inline void *stage2_alloc_pgd(struct kvm *kvm)
 GFP_KERNEL | __GFP_ZERO);
 }
 
+static inline u32 kvm_get_ipa_limit(void)
+{
+   return KVM_PHYS_SHIFT;
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6270a3..4df9bb6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -775,6 +775,12 @@ struct kvm_ppc_resize_hpt {
 #define KVM_GET_MSR_FEATURE_INDEX_LIST _IOWR(KVMIO, 0x0a, struct kvm_msr_list)
 
 /*
+ * Get the maximum physical address size supported by the host.
+ * Returns log2(Max-Physical-Address-Size)
+ */
+#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT  _IO(KVMIO, 0x0b)
+
+/*
  * Extension capability list.
  */
 #define KVM_CAP_IRQCHIP  0
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index d2637bb..0d99e67 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -66,6 +66,7 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u32 kvm_next_vmid;
 static unsigned int kvm_vmid_bits __read_mostly;
 static DEFINE_RWLOCK(kvm_vmid_lock);
+static u32 kvm_ipa_limit;
 
 static bool vgic_present;
 
@@ -248,6 +249,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
 {
+   if (ioctl == KVM_ARM_GET_MAX_VM_PHYS_SHIFT)
+   return kvm_ipa_limit;
+
return -EINVAL;
 }
 
@@ -1361,6 +1365,8 @@ static int init_common_resources(void)
kvm_vmid_bits = kvm_get_vmid_bits();
kvm_info("%d-bit VMID\n", kvm_vmid_bits);
 
+   kvm_ipa_limit = kvm_get_ipa_limit();
+
return 0;
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH v3 09/20] kvm: arm64: Make stage2 page table layout dynamic

2018-06-29 Thread Suzuki K Poulose
So far we had static stage2 page table handling code, based on a
fixed IPA of 40 bits. As we prepare for a configurable IPA size per
VM, make our stage2 page table code dynamic, to do the right thing
for a given VM. We ensure the existing condition is always true even
when we lift the limit on the IPA, i.e.,

page table levels in stage1 >= page table levels in stage2

Support for the IPA size configuration needs other changes in the way
we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still
fixed to 40 bits. The patch also moves kvm_page_empty() in asm/kvm_mmu.h
to the top, before including asm/stage2_pgtable.h, to avoid a forward
declaration.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2
 - Restrict the stage2 page table to allow reusing the host page table
   helpers for now, until we get stage1 independent page table helpers.
---
 arch/arm64/include/asm/kvm_mmu.h  |  14 +-
 arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 --
 arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 -
 arch/arm64/include/asm/stage2_pgtable.h   | 207 +++---
 4 files changed, 159 insertions(+), 143 deletions(-)
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index dbaf513..a351722 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * As ARMv8.0 only has the TTBR0_EL2 register, we cannot express
@@ -147,6 +148,13 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
 #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
 #define kvm_vttbr_baddr_mask(kvm)  VTTBR_BADDR_MASK
 
+static inline bool kvm_page_empty(void *ptr)
+{
+   struct page *ptr_page = virt_to_page(ptr);
+
+   return page_count(ptr_page) == 1;
+}
+
 #include 
 
 int create_hyp_mappings(void *from, void *to, pgprot_t prot);
@@ -237,12 +245,6 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
 }
 
-static inline bool kvm_page_empty(void *ptr)
-{
-   struct page *ptr_page = virt_to_page(ptr);
-   return page_count(ptr_page) == 1;
-}
-
 #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
 
 #ifdef __PAGETABLE_PMD_FOLDED
diff --git a/arch/arm64/include/asm/stage2_pgtable-nopmd.h 
b/arch/arm64/include/asm/stage2_pgtable-nopmd.h
deleted file mode 100644
index 0280ded..000
--- a/arch/arm64/include/asm/stage2_pgtable-nopmd.h
+++ /dev/null
@@ -1,42 +0,0 @@
-/*
- * Copyright (C) 2016 - ARM Ltd
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef __ARM64_S2_PGTABLE_NOPMD_H_
-#define __ARM64_S2_PGTABLE_NOPMD_H_
-
-#include 
-
-#define __S2_PGTABLE_PMD_FOLDED
-
-#define S2_PMD_SHIFT   S2_PUD_SHIFT
-#define S2_PTRS_PER_PMD 1
-#define S2_PMD_SIZE (1UL << S2_PMD_SHIFT)
-#define S2_PMD_MASK (~(S2_PMD_SIZE-1))
-
-#define stage2_pud_none(kvm, pud)  (0)
-#define stage2_pud_present(kvm, pud)   (1)
-#define stage2_pud_clear(kvm, pud) do { } while (0)
-#define stage2_pud_populate(kvm, pud, pmd) do { } while (0)
-#define stage2_pmd_offset(kvm, pud, address)   ((pmd_t *)(pud))
-
-#define stage2_pmd_free(kvm, pmd)  do { } while (0)
-
-#define stage2_pmd_addr_end(kvm, addr, end) (end)
-
-#define stage2_pud_huge(kvm, pud)  (0)
-#define stage2_pmd_table_empty(kvm, pmdp)  (0)
-
-#endif
diff --git a/arch/arm64/include/asm/stage2_pgtable-nopud.h 
b/arch/arm64/include/asm/stage2_pgtable-nopud.h
deleted file mode 100644
index cd6304e..000
--- a/arch/arm64/include/asm/stage2_pgtable-nopud.h
+++ /dev/null
@@ -1,39 +0,0 @@
-/*
- * Copyright (C) 2016 - ARM Ltd
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should h
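
The condition "page table levels in stage1 >= page table levels in stage2" from the commit message of this patch can be illustrated with a small level calculation. This is a sketch under assumptions, not the code from the (truncated) patch: pgtable_levels() and stage2_levels() are hypothetical names, and the 4-bit allowance models the architectural option of concatenating up to 16 tables at the stage2 entry level.

```c
#include <assert.h>

/*
 * Number of translation levels needed to resolve addr_bits with a
 * given page size. Each level resolves (page_shift - 3) bits because
 * a page holds 2^(page_shift - 3) eight-byte descriptors.
 */
static inline int pgtable_levels(int addr_bits, int page_shift)
{
	int bits_per_level = page_shift - 3;

	/* Round up: a partially used top level still counts as a level. */
	return (addr_bits - page_shift + bits_per_level - 1) / bits_per_level;
}

/*
 * Stage2 may concatenate up to 16 entry-level tables, which resolves
 * 4 extra bits without adding a level (assumption stated above).
 */
static inline int stage2_levels(int ipa_bits, int page_shift)
{
	return pgtable_levels(ipa_bits - 4, page_shift);
}
```

For a 40-bit IPA this gives 3 levels with 4K pages and 2 levels with 16K or 64K pages, which matches the "Stage2 translation configuration" comments in asm/kvm_arm.h, and stays at or below the 4 levels of a 48-bit stage1 with 4K pages.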

[Qemu-devel] [PATCH v3 07/20] kvm: arm/arm64: Prepare for VM specific stage2 translations

2018-06-29 Thread Suzuki K Poulose
Right now the stage2 page table for a VM is hard coded, assuming
an IPA of 40 bits. As we are about to add support for per VM IPA,
prepare the stage2 page table helpers to accept the kvm instance
to make the right decision for the VM. No functional changes.
Add stage2_pgd_size(kvm) to replace S2_PGD_SIZE. Also, move
some of the definitions dependent on the kvm instance to asm/kvm_mmu.h
for arm32. In that process, drop the _AC() specifier from the constants.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2:
 - Update commit description about the movement to asm/kvm_mmu.h
   for arm32
 - Drop _AC() specifiers
---
 arch/arm/include/asm/kvm_arm.h                |   3 +-
 arch/arm/include/asm/kvm_mmu.h                |  15 +++-
 arch/arm/include/asm/stage2_pgtable.h         |  42 -
 arch/arm64/include/asm/kvm_mmu.h              |   7 +-
 arch/arm64/include/asm/stage2_pgtable-nopmd.h |  18 ++--
 arch/arm64/include/asm/stage2_pgtable-nopud.h |  16 ++--
 arch/arm64/include/asm/stage2_pgtable.h       |  49 ++-
 virt/kvm/arm/arm.c                            |   2 +-
 virt/kvm/arm/mmu.c                            | 119 +-
 virt/kvm/arm/vgic/vgic-kvm-device.c           |   2 +-
 10 files changed, 148 insertions(+), 125 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 3ab8b37..c3f1f9b 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -133,8 +133,7 @@
  * space.
  */
 #define KVM_PHYS_SHIFT (40)
-#define KVM_PHYS_SIZE  (_AC(1, ULL) << KVM_PHYS_SHIFT)
-#define KVM_PHYS_MASK  (KVM_PHYS_SIZE - _AC(1, ULL))
+
 #define PTRS_PER_S2_PGD (_AC(1, ULL) << (KVM_PHYS_SHIFT - 30))
 
 /* Virtualization Translation Control Register (VTCR) bits */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8553d68..f36eb20 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -36,15 +36,19 @@
})
 
 /*
- * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation 
levels.
+ * kvm_mmu_cache_min_pages() is the number of stage2 page
+ * table translation levels, excluding the top level, for
+ * the given VM. Since we have a 3 level page-table, this
+ * is fixed.
  */
-#define KVM_MMU_CACHE_MIN_PAGES 2
+#define kvm_mmu_cache_min_pages(kvm)   2
 
 #ifndef __ASSEMBLY__
 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -52,6 +56,13 @@
 /* Ensure compatibility with arm64 */
 #define VA_BITS 32
 
+#define kvm_phys_shift(kvm) KVM_PHYS_SHIFT
+#define kvm_phys_size(kvm) (1ULL << kvm_phys_shift(kvm))
+#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - 1ULL)
+#define kvm_vttbr_baddr_mask(kvm)  VTTBR_BADDR_MASK
+
+#define stage2_pgd_size(kvm)   (PTRS_PER_S2_PGD * sizeof(pgd_t))
+
 int create_hyp_mappings(void *from, void *to, pgprot_t prot);
 int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
   void __iomem **kaddr,
diff --git a/arch/arm/include/asm/stage2_pgtable.h 
b/arch/arm/include/asm/stage2_pgtable.h
index 460d616..e22ae94 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -19,43 +19,45 @@
 #ifndef __ARM_S2_PGTABLE_H_
 #define __ARM_S2_PGTABLE_H_
 
-#define stage2_pgd_none(pgd)   pgd_none(pgd)
-#define stage2_pgd_clear(pgd)  pgd_clear(pgd)
-#define stage2_pgd_present(pgd) pgd_present(pgd)
-#define stage2_pgd_populate(pgd, pud)  pgd_populate(NULL, pgd, pud)
-#define stage2_pud_offset(pgd, address) pud_offset(pgd, address)
-#define stage2_pud_free(pud)   pud_free(NULL, pud)
+#define stage2_pgd_none(kvm, pgd)  pgd_none(pgd)
+#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd)
+#define stage2_pgd_present(kvm, pgd)   pgd_present(pgd)
+#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud)
+#define stage2_pud_offset(kvm, pgd, address)   pud_offset(pgd, address)
+#define stage2_pud_free(kvm, pud)  pud_free(NULL, pud)
 
-#define stage2_pud_none(pud)   pud_none(pud)
-#define stage2_pud_clear(pud)  pud_clear(pud)
-#define stage2_pud_present(pud) pud_present(pud)
-#define stage2_pud_populate(pud, pmd)  pud_populate(NULL, pud, pmd)
-#define stage2_pmd_offset(pud, address) pmd_offset(pud, address)
-#define stage2_pmd_free(pmd)   pmd_free(NULL, pmd)
+#define stage2_pud_none(kvm, pud)  pud_none(pud)
+#define stage2_pud_clear(kvm, pud) pud_clear(pud)
+#define stage2_pud_present(kvm, pud)   pud_present(pud)
+#define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd)
+#define stage2_pmd_offset(kvm, pud, address)   pmd_offset(pud,

[Qemu-devel] [PATCH v3 11/20] kvm: arm64: Helper for computing VTCR_EL2.SL0

2018-06-29 Thread Suzuki K Poulose
VTCR_EL2 holds the following key stage2 translation table
parameters:
  SL0  - Entry level in the page table lookup.
  T0SZ - Denotes the size of the memory addressed by the table.

We have been using fixed values for SL0, depending on the
page size, as we have a fixed IPA size. But since we are about
to make it dynamic, we need to calculate SL0 at runtime
per VM. This patch adds a helper to compute the value of SL0 for
a given number of stage2 page table levels (derived from the IPA).

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since v2:
 - Part 2 of split from VTCR & VTTBR dynamic configuration
---
 arch/arm64/include/asm/kvm_arm.h | 35 ---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index c557f45..11a7db0 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -153,7 +153,8 @@
  * 2 level page tables (SL = 1)
  */
 #define VTCR_EL2_TGRAN_FLAGS   (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1)
-#define VTTBR_X_TGRAN_MAGIC 38
+#define VTCR_EL2_TGRAN_SL0_BASE 3UL
+
 #elif defined(CONFIG_ARM64_16K_PAGES)
 /*
  * Stage2 translation configuration:
@@ -161,7 +162,7 @@
  * 2 level page tables (SL = 1)
  */
 #define VTCR_EL2_TGRAN_FLAGS   (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1)
-#define VTTBR_X_TGRAN_MAGIC 42
+#define VTCR_EL2_TGRAN_SL0_BASE 3UL
 #else  /* 4K */
 /*
  * Stage2 translation configuration:
@@ -169,11 +170,39 @@
  * 3 level page tables (SL = 1)
  */
 #define VTCR_EL2_TGRAN_FLAGS   (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1)
-#define VTTBR_X_TGRAN_MAGIC 37
+#define VTCR_EL2_TGRAN_SL0_BASE 2UL
 #endif
 
 #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS)
 /*
+ * VTCR_EL2:SL0 indicates the entry level for Stage2 translation.
+ * Interestingly, it depends on the page size.
+ * See D.10.2.110, VTCR_EL2, in ARM DDI 0487B.b
+ *
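+ * ------------------------------------------
+ * | Entry level           |  4K  | 16K/64K |
+ * ------------------------------------------
+ * | Level: 0              |  2   |    -    |
+ * ------------------------------------------
+ * | Level: 1              |  1   |    2    |
+ * ------------------------------------------
+ * | Level: 2              |  0   |    1    |
+ * ------------------------------------------
+ * | Level: 3              |  -   |    0    |
+ * ------------------------------------------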
+ *
+ * That table roughly translates to :
+ *
+ * SL0(PAGE_SIZE, Entry_level) = SL0_BASE(PAGE_SIZE) - Entry_Level
+ *
+ * Where SL0_BASE(4K) = 2 and SL0_BASE(16K) = 3, SL0_BASE(64K) = 3, provided
+ * we take care of ruling out the unsupported cases and
+ * Entry_Level = 4 - Number_of_levels.
+ *
+ */
+#define VTCR_EL2_SL0(levels) \
+   ((VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))) << VTCR_EL2_SL0_SHIFT)
+/*
  * ARM VMSAv8-64 defines an algorithm for finding the translation table
  * descriptors in section D4.2.8 in ARM DDI 0487B.b.
  *
-- 
2.7.4




[Qemu-devel] [PATCH v3 12/20] kvm: arm64: Add helper for loading the stage2 setting for a VM

2018-06-29 Thread Suzuki K Poulose
We load the stage2 context of a guest for different operations,
including running the guest and TLB maintenance on behalf of the
guest. As of now only the VTTBR is private to the guest, but this
is about to change with IPA per VM. Add a helper to load the stage2
configuration for a VM, which could do the right thing with the
future changes.

Cc: Christoffer Dall 
Cc: Marc Zyngier 
Signed-off-by: Suzuki K Poulose 
---
Changes since v2:
 - New patch
---
 arch/arm64/include/asm/kvm_hyp.h | 6 ++
 arch/arm64/kvm/hyp/switch.c  | 2 +-
 arch/arm64/kvm/hyp/tlb.c | 4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 384c343..82f9994 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -155,5 +155,11 @@ void deactivate_traps_vhe_put(void);
 u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
 void __noreturn __hyp_do_panic(unsigned long, ...);
 
+/* Must be called from hyp code running at EL2 */
+static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
+{
+   write_sysreg(kvm->arch.vttbr, vttbr_el2);
+}
+
 #endif /* __ARM64_KVM_HYP_H__ */
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d496ef5..355fb25 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -195,7 +195,7 @@ void deactivate_traps_vhe_put(void)
 
 static void __hyp_text __activate_vm(struct kvm *kvm)
 {
-   write_sysreg(kvm->arch.vttbr, vttbr_el2);
+   __load_guest_stage2(kvm);
 }
 
 static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index 131c777..4dbd9c6 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -30,7 +30,7 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm 
*kvm)
 * bits. Changing E2H is impossible (goodbye TTBR1_EL2), so
 * let's flip TGE before executing the TLB operation.
 */
-   write_sysreg(kvm->arch.vttbr, vttbr_el2);
+   __load_guest_stage2(kvm);
val = read_sysreg(hcr_el2);
val &= ~HCR_TGE;
write_sysreg(val, hcr_el2);
@@ -39,7 +39,7 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm 
*kvm)
 
 static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm)
 {
-   write_sysreg(kvm->arch.vttbr, vttbr_el2);
+   __load_guest_stage2(kvm);
isb();
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH v3 05/20] kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table

2018-06-29 Thread Suzuki K Poulose
So far we have only supported 3-level page tables with a fixed IPA of 40 bits.
Fix stage2_flush_memslot() to accommodate 4-level tables, where a pgd entry
can be empty.

Cc: Marc Zyngier 
Acked-by: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
 virt/kvm/arm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1d90d79..061e6b3 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -379,7 +379,8 @@ static void stage2_flush_memslot(struct kvm *kvm,
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
do {
next = stage2_pgd_addr_end(addr, end);
-   stage2_flush_puds(kvm, pgd, addr, next);
+   if (!stage2_pgd_none(*pgd))
+   stage2_flush_puds(kvm, pgd, addr, next);
} while (pgd++, addr = next, addr != end);
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH v3 10/20] kvm: arm64: Dynamic configuration of VTTBR mask

2018-06-29 Thread Suzuki K Poulose
On arm64, VTTBR_EL2:BADDR holds the base address for the stage2
translation table. The Arm ARM mandates that the bits BADDR[x-1:0]
should be 0, where 'x' is defined for a given IPA size and the
number of levels for a translation granule size. It is defined
using some magic constants. This patch is a reverse-engineered
implementation that calculates 'x' at runtime for a given IPA and
number of page table levels. See the patch for more details.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2:
 - Part 1 of split from VTCR & VTTBR dynamic configuration
---
 arch/arm64/include/asm/kvm_arm.h | 60 +---
 arch/arm64/include/asm/kvm_mmu.h | 25 -
 2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 3dffd38..c557f45 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -140,8 +140,6 @@
  * Note that when using 4K pages, we concatenate two first level page tables
  * together. With 16K pages, we concatenate 16 first level page tables.
  *
- * The magic numbers used for VTTBR_X in this patch can be found in Tables
- * D4-23 and D4-25 in ARM DDI 0487A.b.
  */
 
 #define VTCR_EL2_T0SZ_IPA  VTCR_EL2_T0SZ_40B
@@ -175,9 +173,63 @@
 #endif
 
 #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS)
-#define VTTBR_X (VTTBR_X_TGRAN_MAGIC - VTCR_EL2_T0SZ_IPA)
+/*
+ * ARM VMSAv8-64 defines an algorithm for finding the translation table
+ * descriptors in section D4.2.8 in ARM DDI 0487B.b.
+ *
+ * The algorithm defines the expectations on the BaseAddress (for the page
+ * table) bits resolved at each level based on the page size, entry level
+ * and T0SZ. The variable "x" in the algorithm also affects the VTTBR:BADDR
+ * for stage2 page table.
+ *
+ * The value of "x" is calculated as :
+ * x = Magic_N - T0SZ
+ *
+ * where Magic_N is an integer depending on the page size and the entry
+ * level of the page table as below:
+ *
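+ * --------------------------------------------
+ * | Entry level           |  4K  | 16K | 64K |
+ * --------------------------------------------
+ * | Level: 0 (4 levels)   |  28  |  -  |  -  |
+ * --------------------------------------------
+ * | Level: 1 (3 levels)   |  37  | 31  | 25  |
+ * --------------------------------------------
+ * | Level: 2 (2 levels)   |  46  | 42  | 38  |
+ * --------------------------------------------
+ * | Level: 3 (1 level)    |  -   | 53  | 51  |
+ * --------------------------------------------
+ *
+ * We have a magic formula for the Magic_N below.
+ *
+ *  Magic_N(PAGE_SIZE, Entry_Level) = 64 - ((PAGE_SHIFT - 3) * Number of levels)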
+ *
+ * where number of levels = (4 - Entry_Level).
+ *
+ * So, given that T0SZ = (64 - PA_SHIFT), we can compute 'x' as follows:
+ *
+ * x = (64 - ((PAGE_SHIFT - 3) * Number_of_levels)) - (64 - PA_SHIFT)
+ *   = PA_SHIFT - ((PAGE_SHIFT - 3) * Number of levels)
+ *
+ * Here is one way to explain the Magic Formula:
+ *
+ *  x = log2(Size_of_Entry_Level_Table)
+ *
+ * Since, we can resolve (PAGE_SHIFT - 3) bits at each level, and another
+ * PAGE_SHIFT bits in the PTE, we have :
+ *
+ *  Bits_Entry_level = PA_SHIFT - ((PAGE_SHIFT - 3) * (n - 1) + PAGE_SHIFT)
+ *  = PA_SHIFT - (PAGE_SHIFT - 3) * n - 3
+ *  where n = number of levels, and since each pointer is 8bytes, we have:
+ *
+ *  x = Bits_Entry_Level + 3
+ *= PA_SHIFT - (PAGE_SHIFT - 3) * n
+ *
+ * The only constraint here is that, we have to find the number of page table
+ * levels for a given IPA size (which we do, see stage2_pt_levels())
+ */
+#define ARM64_VTTBR_X(ipa, levels) ((ipa) - ((levels) * (PAGE_SHIFT - 3)))
 
-#define VTTBR_BADDR_MASK  (((UL(1) << (PHYS_MASK_SHIFT - VTTBR_X)) - 1) << VTTBR_X)
 #define VTTBR_VMID_SHIFT  (UL(48))
 #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT)
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index a351722..813a72a 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -146,7 +146,6 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
 #define kvm_phys_shift(kvm) KVM_PHYS_SHIFT
 #define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm))
 #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL))
-#define kvm_vttbr_baddr_mask(kvm)  VTTBR_BADDR_MASK
 
 static inline bool kvm_page_empty(void *ptr)
 {
@@ -503,6 +502,30 @@ static inline int hyp_map_aux_data(void)
 
 #define kvm_phys_to_vttbr(addr) phys_to_ttbr(addr)
 
+/*
+ * Get the magic number 'x' for VTTBR:BADDR of this KVM instance.
+ * With v8.2 LVA extensions, 'x' should be a minimum of 6 with
+ * 52bit IPS.
+ */
+sta
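
The "magic formula" in the comment of this patch can be checked against the Magic_N table with a few lines of standalone C. This is a sketch, not the kernel code: magic_n() and vttbr_x() are hypothetical names that take the page shift as an argument, whereas ARM64_VTTBR_X() in the patch uses the build-time PAGE_SHIFT.

```c
#include <assert.h>

/* Magic_N = 64 - (PAGE_SHIFT - 3) * number_of_levels, per the comment. */
static inline int magic_n(int page_shift, int levels)
{
	return 64 - (page_shift - 3) * levels;
}

/*
 * x = Magic_N - T0SZ with T0SZ = 64 - ipa_shift, which simplifies to
 * ipa_shift - (page_shift - 3) * levels, the same expression as
 * ARM64_VTTBR_X(ipa, levels) in the patch.
 */
static inline int vttbr_x(int ipa_shift, int page_shift, int levels)
{
	return ipa_shift - (page_shift - 3) * levels;
}
```

For the fixed 40-bit IPA the sketch reproduces the values the removed VTTBR_X_TGRAN_MAGIC constants produced with T0SZ = 24: 13 for 4K pages (37 - 24), 18 for 16K pages (42 - 24) and 14 for 64K pages (38 - 24).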

[Qemu-devel] [PATCH v3 06/20] kvm: arm/arm64: Remove spurious WARN_ON

2018-06-29 Thread Suzuki K Poulose
With a 4-level page table, a pgd entry can be empty, unlike with a
3-level page table. Remove the spurious WARN_ON() in stage2_get_pud().

Cc: Marc Zyngier 
Acked-by: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
 virt/kvm/arm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 061e6b3..308171c 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -976,7 +976,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache
pud_t *pud;
 
pgd = kvm->arch.pgd + stage2_pgd_index(addr);
-   if (WARN_ON(stage2_pgd_none(*pgd))) {
+   if (stage2_pgd_none(*pgd)) {
if (!cache)
return NULL;
pud = mmu_memory_cache_alloc(cache);
-- 
2.7.4




[Qemu-devel] [PATCH v3 02/20] virtio: pci-legacy: Validate queue pfn

2018-06-29 Thread Suzuki K Poulose
Virtio over legacy PCI uses a 32bit PFN for the queue. If the
queue PFN is too large to fit in 32 bits, which we could hit on
arm64 systems with 52bit physical addresses (even with 64K page
size), the device is silently given a truncated queue address and
the two sides of the queue never link up properly.

Add a check to validate the PFN, rather than silently breaking
the devices.

Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Peter Maydell
Cc: Jean-Philippe Brucker 
Signed-off-by: Suzuki K Poulose 
---
Changes since v2:
 - Change errno to -E2BIG
---
 drivers/virtio/virtio_pci_legacy.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci_legacy.c 
b/drivers/virtio/virtio_pci_legacy.c
index 2780886..c0d6987a 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device 
*vp_dev,
struct virtqueue *vq;
u16 num;
int err;
+   u64 q_pfn;
 
/* Select the queue we're interested in */
iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
@@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device 
*vp_dev,
if (!vq)
return ERR_PTR(-ENOMEM);
 
+   q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
+   if (q_pfn >> 32) {
+   dev_err(&vp_dev->pci_dev->dev, "virtio-pci queue PFN too large\n");
+   err = -E2BIG;
+   goto out_del_vq;
+   }
+
/* activate the queue */
-   iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
- vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+   iowrite32(q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
 
vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY;
 
@@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device 
*vp_dev,
 
 out_deactivate:
iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
+out_del_vq:
vring_del_virtqueue(vq);
return ERR_PTR(err);
 }
-- 
2.7.4




[Qemu-devel] [PATCH v3 04/20] kvm: arm64: Clean up VTCR_EL2 initialisation

2018-06-29 Thread Suzuki K Poulose
Use the new helper for converting the parange to the physical shift.
Also, add the missing definitions for the VTCR_EL2 register fields
and use them instead of hard coding numbers.

Cc: Marc Zyngier 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2
 - Part 2 of the split from original patch.
 - Also add missing VTCR field helpers and use them.
---
 arch/arm64/include/asm/kvm_arm.h |  3 +++
 arch/arm64/kvm/hyp/s2-setup.c    | 30 ++
 2 files changed, 9 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 6dd285e..3dffd38 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -106,6 +106,7 @@
 #define VTCR_EL2_RES1  (1 << 31)
 #define VTCR_EL2_HD (1 << 22)
 #define VTCR_EL2_HA (1 << 21)
+#define VTCR_EL2_PS_SHIFT  TCR_EL2_PS_SHIFT
 #define VTCR_EL2_PS_MASK   TCR_EL2_PS_MASK
 #define VTCR_EL2_TG0_MASK  TCR_TG0_MASK
 #define VTCR_EL2_TG0_4K TCR_TG0_4K
@@ -126,6 +127,8 @@
 #define VTCR_EL2_VS_8BIT   (0 << VTCR_EL2_VS_SHIFT)
 #define VTCR_EL2_VS_16BIT  (1 << VTCR_EL2_VS_SHIFT)
 
+#define VTCR_EL2_T0SZ(x)   TCR_T0SZ(x)
+
 /*
  * We configure the Stage-2 page tables to always restrict the IPA space to be
  * 40 bits wide (T0SZ = 24).  Systems with a PARange smaller than 40 bits are
diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c
index 603e1ee..81094f1 100644
--- a/arch/arm64/kvm/hyp/s2-setup.c
+++ b/arch/arm64/kvm/hyp/s2-setup.c
@@ -19,11 +19,13 @@
 #include 
 #include 
 #include 
+#include 
 
 u32 __hyp_text __init_stage2_translation(void)
 {
u64 val = VTCR_EL2_FLAGS;
u64 parange;
+   u32 phys_shift;
u64 tmp;
 
/*
@@ -34,30 +36,10 @@ u32 __hyp_text __init_stage2_translation(void)
parange = read_sysreg(id_aa64mmfr0_el1) & 7;
if (parange > ID_AA64MMFR0_PARANGE_MAX)
parange = ID_AA64MMFR0_PARANGE_MAX;
-   val |= parange << 16;
+   val |= parange << VTCR_EL2_PS_SHIFT;
 
/* Compute the actual PARange... */
-   switch (parange) {
-   case 0:
-   parange = 32;
-   break;
-   case 1:
-   parange = 36;
-   break;
-   case 2:
-   parange = 40;
-   break;
-   case 3:
-   parange = 42;
-   break;
-   case 4:
-   parange = 44;
-   break;
-   case 5:
-   default:
-   parange = 48;
-   break;
-   }
+   phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
 
/*
 * ... and clamp it to 40 bits, unless we have some braindead
@@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void)
 * return that value for the rest of the kernel to decide what
 * to do.
 */
-   val |= 64 - (parange > 40 ? 40 : parange);
+   val |= VTCR_EL2_T0SZ(phys_shift > 40 ? 40 : phys_shift);
 
/*
 * Check the availability of Hardware Access Flag / Dirty Bit
@@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void)
 
write_sysreg(val, vtcr_el2);
 
-   return parange;
+   return phys_shift;
 }
-- 
2.7.4




[Qemu-devel] [PATCH v3 03/20] arm64: Add a helper for PARange to physical shift conversion

2018-06-29 Thread Suzuki K Poulose
On arm64, ID_AA64MMFR0_EL1.PARange encodes the maximum Physical
Address range supported by the CPU. Add a helper to decode this
to the actual physical shift. If we hit an unallocated value, return
the maximum range supported by the kernel.
This will be used by KVM to set VTCR_EL2.T0SZ, as that code
is about to move. Having this helper keeps the code
movement cleaner.

Cc: Catalin Marinas 
Cc: Marc Zyngier 
Cc: James Morse 
Cc: Christoffer Dall 
Signed-off-by: Suzuki K Poulose 
---
Changes since V2:
 - Split the patch
 - Limit the physical shift only for values unrecognized.
---
 arch/arm64/include/asm/cpufeature.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h 
b/arch/arm64/include/asm/cpufeature.h
index 1717ba1..855cf0e 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -530,6 +530,19 @@ void arm64_set_ssbd_mitigation(bool state);
 static inline void arm64_set_ssbd_mitigation(bool state) {}
 #endif
 
+static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
+{
+   switch (parange) {
+   case 0: return 32;
+   case 1: return 36;
+   case 2: return 40;
+   case 3: return 42;
+   case 4: return 44;
+   case 5: return 48;
+   case 6: return 52;
+   default: return CONFIG_ARM64_PA_BITS;
+   }
+}
 #endif /* __ASSEMBLY__ */
 
 #endif
-- 
2.7.4




[Qemu-devel] [PATCH v3 00/20] arm64: Dynamic & 52bit IPA support

2018-06-29 Thread Suzuki K Poulose
48-to-52 bit conversion for GIC ITS BASER.
   (suggested by Christoffer)
 - Split virtio PFN check patches and address comments.

Kristina Martsenko (1):
  vgic: Add support for 52bit guest physical address

Suzuki K Poulose (19):
  virtio: mmio-v1: Validate queue PFN
  virtio: pci-legacy: Validate queue pfn
  arm64: Add a helper for PARange to physical shift conversion
  kvm: arm64: Clean up VTCR_EL2 initialisation
  kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
  kvm: arm/arm64: Remove spurious WARN_ON
  kvm: arm/arm64: Prepare for VM specific stage2 translations
  kvm: arm/arm64: Abstract stage2 pgd table allocation
  kvm: arm64: Make stage2 page table layout dynamic
  kvm: arm64: Dynamic configuration of VTTBR mask
  kvm: arm64: Helper for computing VTCR_EL2.SL0
  kvm: arm64: Add helper for loading the stage2 setting for a VM
  kvm: arm64: Configure VTCR per VM
  kvm: arm/arm64: Expose supported physical address limit for VM
  kvm: arm/arm64: Allow tuning the physical address size for VM
  kvm: arm64: Switch to per VM IPA limit
  kvm: arm64: Add support for handling 52bit IPA
  kvm: arm64: Allow IPA size supported by the system
  kvm: arm64: Fall back to normal stage2 entry level

 Documentation/virtual/kvm/api.txt |  15 ++
 arch/arm/include/asm/kvm_arm.h |   3 +-
 arch/arm/include/asm/kvm_mmu.h |  28 +++-
 arch/arm/include/asm/stage2_pgtable.h |  42 ++---
 arch/arm64/include/asm/cpufeature.h   |  13 ++
 arch/arm64/include/asm/kvm_arm.h  | 137 ++---
 arch/arm64/include/asm/kvm_asm.h  |   2 +-
 arch/arm64/include/asm/kvm_host.h |  19 ++-
 arch/arm64/include/asm/kvm_hyp.h  |  16 ++
 arch/arm64/include/asm/kvm_mmu.h  |  92 ++-
 arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 -
 arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 -
 arch/arm64/include/asm/stage2_pgtable.h   | 213 +++---
 arch/arm64/kvm/guest.c|  42 +
 arch/arm64/kvm/hyp/s2-setup.c |  37 +
 arch/arm64/kvm/hyp/switch.c   |   4 +-
 arch/arm64/kvm/hyp/tlb.c  |   4 +-
 drivers/virtio/virtio_mmio.c  |  18 ++-
 drivers/virtio/virtio_pci_legacy.c|  12 +-
 include/linux/irqchip/arm-gic-v3.h|   5 +
 include/uapi/linux/kvm.h  |  16 ++
 virt/kvm/arm/arm.c|  32 +++-
 virt/kvm/arm/mmu.c| 124 ---
 virt/kvm/arm/vgic/vgic-its.c  |  36 ++---
 virt/kvm/arm/vgic/vgic-kvm-device.c   |   2 +-
 virt/kvm/arm/vgic/vgic-mmio-v3.c  |   2 -
 26 files changed, 663 insertions(+), 332 deletions(-)
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
 delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h


kvmtool patches :

Suzuki K Poulose (4):
  kvmtool: Allow backends to run checks on the KVM device fd
  kvmtool: arm64: Add support for guest physical address size
  kvmtool: arm64: Switch memory layout
  kvmtool: arm: Add support for creating VM with PA size

 arm/aarch32/include/kvm/kvm-arch.h        |  6 --
 arm/aarch64/include/kvm/kvm-arch.h        | 15 ---
 arm/aarch64/include/kvm/kvm-config-arch.h |  5 -
 arm/include/arm-common/kvm-arch.h         | 17 +++--
 arm/include/arm-common/kvm-config-arch.h  |  1 +
 arm/kvm.c                                 | 24 +++-
 include/kvm/kvm.h                         |  4
 kvm.c                                     |  2 ++
 8 files changed, 61 insertions(+), 13 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH v3 01/20] virtio: mmio-v1: Validate queue PFN

2018-06-29 Thread Suzuki K Poulose
virtio-mmio with virtio-v1 uses a 32bit PFN for the queue.
If the queue PFN is too large to fit in 32 bits, which
we could hit on arm64 systems with 52bit physical addresses
(even with 64K page size), the PFN written to the device is
silently truncated and the device sees the wrong queue address.

Add a check to validate the PFN, rather than silently breaking
the devices.

Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Marc Zyngier 
Cc: Christoffer Dall 
Cc: Peter Maydell
Cc: Jean-Philippe Brucker 
Signed-off-by: Suzuki K Poulose 
---
Changes since v2:
 - Change errno to -E2BIG
---
 drivers/virtio/virtio_mmio.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 67763d3..82cedc8 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -397,9 +397,21 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 	/* Activate the queue */
 	writel(virtqueue_get_vring_size(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NUM);
 	if (vm_dev->version == 1) {
+		u64 q_pfn = virtqueue_get_desc_addr(vq) >> PAGE_SHIFT;
+
+		/*
+		 * virtio-mmio v1 uses a 32bit QUEUE PFN. If we have something
+		 * that doesn't fit in 32bit, fail the setup rather than
+		 * pretending to be successful.
+		 */
+		if (q_pfn >> 32) {
+			dev_err(&vdev->dev, "virtio-mmio: queue address too large\n");
+			err = -E2BIG;
+			goto error_bad_pfn;
+		}
+
 		writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_QUEUE_ALIGN);
-		writel(virtqueue_get_desc_addr(vq) >> PAGE_SHIFT,
-				vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
+		writel(q_pfn, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
 	} else {
 		u64 addr;
 
@@ -430,6 +442,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index,
 
 	return vq;
 
+error_bad_pfn:
+	vring_del_virtqueue(vq);
 error_new_virtqueue:
 	if (vm_dev->version == 1) {
 		writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN);
-- 
2.7.4
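
To illustrate why the check in the patch above matters, here is a minimal
user-space sketch of the same arithmetic. It is not kernel code: the helper
names queue_pfn() and pfn_fits_32bit() are made up for this illustration, and
only the ">> 32" test mirrors what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: page frame number of a queue's descriptor address. */
uint64_t queue_pfn(uint64_t desc_addr, unsigned int page_shift)
{
	assert(page_shift < 64);	/* shifting a u64 by >= 64 is undefined */
	return desc_addr >> page_shift;
}

/* Mirrors the patch's "q_pfn >> 32" test: true if the PFN fits in 32 bits. */
bool pfn_fits_32bit(uint64_t pfn)
{
	return (pfn >> 32) == 0;
}
```

With 64K pages (page shift 16), a descriptor ring placed at address 1 << 51
has a PFN of 1 << 35, which needs 36 bits and therefore fails the check. A
ring at 2GB with 4K pages has a PFN of 0x80000 and passes.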




Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo

2016-05-09 Thread Suzuki K Poulose

On 09/05/16 04:30, Vijay Kilari wrote:

Hi Suzuki,

   The last 5 patches are not compiling on v4.4. Looks like your patch
series is not merged completely. Can you please
rebase your patches and let me know.



Could you please give the tree below a try ?

git://linux-arm.org/linux-skp.git cpu-ftr/v3-4.3-rc4


This works.
Now the question is: are your patches getting merged anytime soon?


Well, we have been waiting for a use case, like this, before we merge
the series.

Will, Catalin,

Now that we have some real users of the infrastructure, what do you think ?
I can post an updated/rebased series, if you would like.


Suzuki




If not, I prefer to go with /proc/cpuinfo.

Another solution is to look for /sys/devices/system/cpu/cpu$ID/identification/midr
and, if it is not available, fall back on /proc/cpuinfo.

Regards
Vijay
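
The "sysfs first, then /proc/cpuinfo" idea above could be sketched roughly as
follows. This is only an illustration under assumptions: the sysfs path is the
one written in this thread and may differ on a given kernel, all function names
are hypothetical, and the fallback simply takes the first "CPU implementer" and
"CPU part" lines it finds rather than matching a specific CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a hex string such as "0x00000000431f0a10\n" into *midr. */
bool parse_midr_text(const char *text, uint64_t *midr)
{
	char *end;

	assert(text != NULL && midr != NULL);
	*midr = strtoull(text, &end, 16);	/* base 16 accepts an optional "0x" */
	return end != text;			/* false if no digits were consumed */
}

/* Try the sysfs file. The path is the one proposed in the thread (assumption). */
bool read_midr_sysfs(int cpu, uint64_t *midr)
{
	char path[128], buf[64];
	FILE *f;
	bool ok;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/identification/midr", cpu);
	f = fopen(path, "r");
	if (!f)
		return false;
	ok = fgets(buf, (int)sizeof(buf), f) && parse_midr_text(buf, midr);
	fclose(f);
	return ok;
}

/* Extract the number from a "key : value" line of /proc/cpuinfo. */
bool cpuinfo_field(const char *line, const char *key, unsigned long *val)
{
	const char *colon;
	char *end;

	if (strncmp(line, key, strlen(key)) != 0)
		return false;
	colon = strchr(line, ':');
	if (!colon)
		return false;
	*val = strtoul(colon + 1, &end, 0);	/* base 0 handles "0x43" and "8" */
	return end != colon + 1;
}

/* Fallback: first "CPU implementer" / "CPU part" lines in /proc/cpuinfo. */
bool read_id_cpuinfo(unsigned long *implementer, unsigned long *part)
{
	char line[256];
	bool have_impl = false, have_part = false;
	FILE *f = fopen("/proc/cpuinfo", "r");

	if (!f)
		return false;
	while (!(have_impl && have_part) && fgets(line, (int)sizeof(line), f)) {
		have_impl = have_impl ||
			    cpuinfo_field(line, "CPU implementer", implementer);
		have_part = have_part ||
			    cpuinfo_field(line, "CPU part", part);
	}
	fclose(f);
	return have_impl && have_part;
}

/* sysfs first, then /proc/cpuinfo. */
bool get_cpu_id(int cpu, unsigned long *implementer, unsigned long *part)
{
	uint64_t midr;

	if (read_midr_sysfs(cpu, &midr)) {
		*implementer = (midr >> 24) & 0xff;	/* MIDR bits [31:24] */
		*part = (midr >> 4) & 0xfff;		/* MIDR bits [15:4] */
		return true;
	}
	return read_id_cpuinfo(implementer, part);
}
```

A caller would use get_cpu_id(0, &implementer, &part) and treat a false return
as "CPU identification not available", for example on a non-ARM host.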






Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo

2016-04-13 Thread Suzuki K Poulose

On 13/04/16 10:54, Vijay Kilari wrote:

On Mon, Apr 11, 2016 at 3:07 PM, Suzuki K Poulose
<suzuki.poul...@arm.com> wrote:

On 11/04/16 07:52, Vijay Kilari wrote:




Hi Suzuki,

  The last 5 patches are not compiling on v4.4. Looks like your patch
series is not merged completely. Can you please
rebase your patches and let me know.



Could you please give the tree below a try ?

git://linux-arm.org/linux-skp.git cpu-ftr/v3-4.3-rc4

Cheers
Suzuki



Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo

2016-04-11 Thread Suzuki K Poulose

On 11/04/16 07:52, Vijay Kilari wrote:

Adding Suzuki Poulose.

Hi Suzuki,

On Fri, Apr 8, 2016 at 3:13 PM, Peter Maydell  wrote:

On 8 April 2016 at 07:21, Vijay Kilari  wrote:

On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell  wrote:

I'm told there are kernel patches in progress to get this sort
of information in a maintainable way to userspace, which are
currently somewhat stalled due to lack of anybody who wants to
consume it. If you have a use case then you should probably
flag it up with the kernel devs.


Can you please give references to those patches/discussion?


I'm told the most recent thread is https://lkml.org/lkml/2015/10/5/517
(and that most of the patches in that series have gone in, except
for the last 4 or 5 which implement the ABI).


Can you please shed some light on the status of the ABI for
reading CPU information from user space?
I want to know the CPU implementer and part number in the QEMU utils,
to add prefetches that speed up live migration on the ThunderX platform.



As for the patch series, except for the last 5 patches (which actually
implement the ABI), the infrastructure patches have been merged in v4.4.

We are awaiting feedback from possible consumers such as the toolchain (gcc, glibc).
If you think this will be suitable for you, that's good to know. There is
documentation available in the last patch in the above series. Could you please
try the series (on v4.4, which would be easier, by simply picking up the last
5 patches) and let us know if that works for you?

Cheers
Suzuki
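
For the use case quoted above (finding the CPU implementer and part number),
a consumer that has obtained a raw MIDR value could decode it as in the sketch
below. The field positions follow the MIDR_EL1 layout in the Arm architecture.
The struct, the function names and the two ThunderX constants are assumptions
made for this illustration and should be checked against the kernel's CPU type
definitions before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of MIDR_EL1, with their bit positions. */
struct midr_fields {
	unsigned int implementer;	/* bits [31:24] */
	unsigned int variant;		/* bits [23:20] */
	unsigned int architecture;	/* bits [19:16] */
	unsigned int part;		/* bits [15:4]  */
	unsigned int revision;		/* bits [3:0]   */
};

struct midr_fields midr_decode(uint64_t midr)
{
	struct midr_fields f = {
		.implementer	= (midr >> 24) & 0xff,
		.variant	= (midr >> 20) & 0xf,
		.architecture	= (midr >> 16) & 0xf,
		.part		= (midr >> 4) & 0xfff,
		.revision	= midr & 0xf,
	};

	return f;
}

/* Assumed ID values for Cavium ThunderX; verify before use. */
#define IMPL_CAVIUM	0x43
#define PART_THUNDERX	0x0a1

bool midr_is_thunderx(uint64_t midr)
{
	struct midr_fields f = midr_decode(midr);

	return f.implementer == IMPL_CAVIUM && f.part == PART_THUNDERX;
}
```

Matching on implementer and part only, and ignoring variant and revision,
keeps the check valid across silicon revisions of the same part.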




[Qemu-devel] Qemu s390x emulation

2013-01-15 Thread Suzuki K. Poulose

Hi

I have been trying to set up a qemu session for qemu-system-s390x (on
x86_64) using a kernel (with the initramfs built into the kernel) without a
disk image. The kernel was built with the s390 defconfig, with loadable
modules disabled (just to keep everything inside the kernel).

$ qemu-system-s390x -M s390 -kernel vmlinux -m 1024


The session dies in say 2 secs, with an exit code of 0. I searched for
some hints / success stories, couldn't find any.

Am I doing something wrong here ? Please let me know the right procedure
for getting this up and running.


Thanks
Suzuki




Re: [Qemu-devel] Qemu s390x emulation

2013-01-15 Thread Suzuki K. Poulose

On 01/15/2013 04:39 PM, Alexander Graf wrote:


On 15.01.2013, at 12:05, Suzuki K. Poulose wrote:


Hi

I have been trying to set up a qemu session for qemu-system-s390x (on
x86_64) using a kernel (with the initramfs built into the kernel) without a
disk image. The kernel was built with the s390 defconfig, with loadable
modules disabled (just to keep everything inside the kernel).

$ qemu-system-s390x -M s390 -kernel vmlinux -m 1024


The session dies in say 2 secs, with an exit code of 0. I searched for
some hints / success stories, couldn't find any.

Am I doing something wrong here ? Please let me know the right procedure
for getting this up and running.


S390 boots using an image file. Please try -kernel <kernel dir>/arch/s390/boot/image.

I tried that as well, but it is no better. By the way, I have moved to the
upstream git tree for qemu.


0
$ /data/src/qemu/s390x-softmmu/qemu-system-s390x -m 1024 -kernel ./image -nographic

$ echo $?
0
$ file ./image
./image: Linux S390

$ cd /data/src/qemu/ ; git log | head -n1
commit cf7c3f0cb5a7129f57fa9e69d410d6a05031988c

Thanks
Suzuki


Alex