Re: [Qemu-devel] ssh session with qemu-arm using busybox
On 12/03/2019 14:02, Pintu Agarwal wrote:
> > -netdev user,id=unet,hostfwd=tcp::-:22 \
> > -net user \
> > and you'll get the guest's port 22 forwarded to the host's port , so you can do ssh root@localhost: from the host.
>
> I tried many different options, but unfortunately none worked for me.
>
> 1) qemu-system-arm -M vexpress-a9 -m 1024M \
>        -kernel ../KERNEL/linux/arch/arm/boot/zImage \
>        -dtb ../KERNEL/linux/arch/arm/boot/dts/vexpress-v2p-ca9.dtb \
>        -initrd rootfs.img.gz \
>        -append "console=ttyAMA0 root=/dev/ram rdinit=/sbin/init ip=dhcp" \
>        -nographic -smp 4 \
>        -netdev user,id=unet,hostfwd=tcp::-:22 -net user
>
> With this the eth0 interface is removed, and I see these messages (although login works):
>
>   qemu-system-arm: warning: hub 0 with no nics
>   qemu-system-arm: warning: netdev unet has no peer
>   Booting Linux on physical CPU 0x0
>   NET: Registered protocol family 17
>   Run /sbin/init as init process
>   ifconfig: SIOCSIFADDR: No such device
>   route: SIOCADDRT: Network is unreachable
>
> But ssh is still not working:
>
>   ssh root@localhost:
>   ssh: Could not resolve hostname localhost:: Name or service not known

man ssh

Also, make sure you have sshd in your custom rootfs and that it has been started.

Cheers
Suzuki
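For reference, a sketch of the setup being suggested (the host port 2222 here is my own placeholder, not from the thread, and the NIC handling is my best guess for vexpress, whose network device is on-board):

```sh
# hostfwd=tcp::HOSTPORT-:GUESTPORT forwards a host TCP port to the guest.
# The warnings above ("hub 0 with no nics" / "netdev unet has no peer")
# come from a -netdev that nothing consumed; for a board with an on-board
# NIC the legacy -net syntax pairs the NIC with the user backend:
qemu-system-arm -M vexpress-a9 -m 1024M \
    -kernel zImage \
    -dtb vexpress-v2p-ca9.dtb \
    -initrd rootfs.img.gz \
    -append "console=ttyAMA0 root=/dev/ram rdinit=/sbin/init" \
    -nographic \
    -net nic -net user,hostfwd=tcp::2222-:22

# From the host, the port goes after -p; a trailing colon as in
# "ssh root@localhost:" makes ssh treat "localhost:" as the hostname,
# which is exactly the resolution error seen above.
ssh -p 2222 root@localhost
```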
Re: [Qemu-devel] [RFC v4 02/16] linux-headers: header update for KVM/ARM KVM_ARM_GET_MAX_VM_PHYS_SHIFT
Hi Eric,

On 10/18/2018 03:30 PM, Eric Auger wrote:

This is a header update against the kvmarm next branch:
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm kvmarm/next
to get the KVM_ARM_GET_MAX_VM_PHYS_SHIFT ioctl. This allows retrieving the IPA address range KVM supports.

Signed-off-by: Eric Auger
---
v3 -> v4:
- update against kvmarm next
---
 linux-headers/linux/kvm.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 83ba4eb571..9647ce4fcb 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -750,6 +750,15 @@ struct kvm_ppc_resize_hpt {
 #define KVM_S390_SIE_PAGE_OFFSET 1

+/*
+ * On arm64, machine type can be used to request the physical
+ * address size for the VM. Bits[7-0] are reserved for the guest
+ * PA size shift (i.e, log2(PA_Size)). For backward compatibility,
+ * value 0 implies the default IPA size, 40bits.
+ */
+#define KVM_VM_TYPE_ARM_IPA_SIZE_MASK	0xffULL
+#define KVM_VM_TYPE_ARM_IPA_SIZE(x)	\
+	((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
 /*
  * ioctls for /dev/kvm fds:
  */
@@ -953,6 +962,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_NESTED_STATE 157
 #define KVM_CAP_ARM_INJECT_SERROR_ESR 158
 #define KVM_CAP_MSR_PLATFORM_INFO 159
+#define KVM_CAP_ARM_VM_IPA_SIZE 160 /* returns maximum IPA bits for a VM */

Please be aware that there have been multiple merge conflicts between the kvmarm tree and the upstream kvm tree, and the numbers have changed. I assume that you will be rebasing this onto mainline anyway.

Cheers
Suzuki
Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM
On 10/07/18 18:03, Dave Martin wrote:
On Tue, Jul 10, 2018 at 05:38:39PM +0100, Suzuki K Poulose wrote:
On 09/07/18 14:37, Dave Martin wrote:
On Mon, Jul 09, 2018 at 01:29:42PM +0100, Marc Zyngier wrote:
On 09/07/18 12:23, Dave Martin wrote:

[...]

Wedging arguments into a few bits in the type argument feels awkward, and may be regretted later if we run out of bits, or something can't be represented in the chosen encoding.

I think that's a pretty convincing argument for a "better" CREATE_VM, one that would have a clearly defined, structured (and potentially extensible) argument. I've quickly hacked the following:

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6270a3b38e9..3e76214034c2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -735,6 +735,20 @@ struct kvm_ppc_resize_hpt {
 	__u32 pad;
 };

+struct kvm_create_vm2 {
+	__u64 version;	/* Or maybe not */
+	union {
+		struct {
+#define KVM_ARM_SVE_CAPABLE	(1 << 0)
+#define KVM_ARM_SELECT_IPA	(1 << 1)
+			__u64 capabilities;
+			__u16 sve_vlen;
+			__u8 ipa_size;
+		} arm64;
+		__u64 dummy[15];
+	};
+};
+
 #define KVMIO 0xAE

 /* machine type bits, to be used as argument to KVM_CREATE_VM */

Other architectures could fill in their own bits if they need to. Thoughts?

This kind of thing should work, but it may still get messy when we add additional fields.

Marc, Dave,

I like Dave's approach. Some comments below.

If we want this to work cross-arch, would it make sense to go for a more generic approach, say:

struct kvm_create_vm_attr_any {
	__u32 type;
};

#define KVM_CREATE_VM_ATTR_ARCH_CAPABILITIES 1

struct kvm_create_vm_attr_arch_capabilities {
	__u32 type;
	__u16 size; /* support future expansion of capabilities[] */
	__u16 reserved;
	__u64 capabilities[1];
};

We also need to advertise which attributes are supported by the host, so that the user can tune the available ones.
That would make a bit mask like the above trickier, unless we return the supported values back in the argument ptr for the "probe" call. And this scheme in general can be useful for passing back a non-boolean result specific to the attribute, without having a per-attribute ioctl (e.g, maximum limit for IPA).

Maybe, but this could quickly become bloated. (My approach already feels a bit bloated...) I'm not sure that arbitrarily complex negotiation will really be needed, but userspace might want to change its mind if setting a particular property fails.

An alternative might be to have a bunch of per-VM ioctls to configure different things, like x86 has. There's at least precedent for that. For arm, we currently only have a few. That allows for easy extension, at the cost of adding ioctls.

As you know, one of the major problems with the per-VM ioctls is the ordering of the different operations and the tracking needed to make sure that userspace follows the expected order. e.g, the first approach for the IPA series was based on this, and it made things complex enough to drop it.

There may be some ioctls we can reuse, like KVM_ENABLE_CAP for per-VM capability flags.

Maybe we could switch to KVM_VM_CAPS and pass a list of capabilities to be enabled at creation time? kvm_enable_cap can pass in additional arguments for each cap. That way we don't have to rely on a new set of attributes, and probing becomes straightforward.

Suzuki
Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM
On 09/07/18 14:37, Dave Martin wrote: On Mon, Jul 09, 2018 at 01:29:42PM +0100, Marc Zyngier wrote: On 09/07/18 12:23, Dave Martin wrote: On Fri, Jul 06, 2018 at 05:39:00PM +0100, Suzuki K Poulose wrote: On 07/06/2018 04:09 PM, Marc Zyngier wrote: On 06/07/18 14:49, Suzuki K Poulose wrote: On 04/07/18 23:03, Suzuki K Poulose wrote: On 07/04/2018 04:51 PM, Will Deacon wrote: Hi Suzuki, On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote: Allow specifying the physical address size for a new VM via the kvm_type argument for KVM_CREATE_VM ioctl. This allows us to finalise the stage2 page table format as early as possible and hence perform the right checks on the memory slots without complication. The size is encoded as Log2(PA_Size) in the bits[7:0] of the type field and can encode more information in the future if required. The IPA size is still capped at 40bits. Cc: Marc Zyngier Cc: Christoffer Dall Cc: Peter Maydel Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Suzuki K Poulose --- arch/arm/include/asm/kvm_mmu.h | 2 ++ arch/arm64/include/asm/kvm_arm.h | 10 +++--- arch/arm64/include/asm/kvm_mmu.h | 2 ++ include/uapi/linux/kvm.h | 10 ++ virt/kvm/arm/arm.c | 24 ++-- 5 files changed, 39 insertions(+), 9 deletions(-) [...] diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 4df9bb6..fa4cab0 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt { #define KVM_S390_SIE_PAGE_OFFSET 1 /* + * On arm/arm64, machine type can be used to request the physical + * address size for the VM. Bits [7-0] have been reserved for the + * PA size shift (i.e, log2(PA_Size)). For backward compatibility, + * value 0 implies the default IPA size, which is 40bits. 
+ */ +#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK 0xff +#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x) \ + ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK) This seems like you're allocating quite a lot of bits in a non-extensible interface to a fairly esoteric parameter. Would it be better to add another ioctl, or condense the number of sizes you support instead? As I explained in the other thread, we need the size as soon as the VM is created. The major challenge is keeping the backward compatibility by mapping 0 to 40bits. I will give it a thought. Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, which occupies 3 bits and has the following definitions. (ID_AA64MMFR0_EL1:PARange also has the field definitions, except that the field is 4bits wide, but only 3bits are used) 000 32 bits, 4GB. 001 36 bits, 64GB. 010 40 bits, 1TB. 011 42 bits, 4TB. 100 44 bits, 16TB. 101 48 bits, 256TB. 110 52 bits, 4PB But we need to map 0 => 40bits IPA to make our ABI backward compatible. So we could use the additional one bit to indicate that IPA size is requested in the 3 bits. i.e, machine_type: Bit [2:0] - Requested IPA size. Values follow VTCR_EL2.PS format. Bit [3] - 1 => IPA Size bits (Bits[2:0]) requested. 0 => Not requested The only minor down side is restricting to the predefined values above, which is not a real issue for a VM. Thoughts ? I'd be very wary of using that 4th bit to do something that is not in the architecture. We have only a single value left to be used (0b111), and then your scheme clashes with the architecture definition. I agree. However, if we ever go beyond the 3bits in PARange, we have an issue with {V}TCR counter part. But lets not take that chance. I'd rather encode things in a way that is independent from the architecture, and be done with it. You can map 0 to 40bits, and we have the ability to express all values the architecture has (just in a different order). 
The other option I can think of is encoding a signed number which is the difference of the IPA from 40. But that would need 5 bits if we were to encode it as it is. And if we want to squeeze it into 4 bits, we could store half the difference (limiting the IPA limit to even numbers), i.e:

IPA = 40 + 2 * sign_extend(bits[3:0]);

I came across similar issues when trying to work out how to enable SVE for KVM. In the end I reduced this to a per-vcpu feature, but it means that there is no global opt-in for the SVE-specific KVM API extensions. That's a bit gross, because SVE may require a change to the way vcpus are initialised. The set of supported SVE vector lengths needs to be set somehow before the vcpu is set running, but it's tricky to do that without a new ioctl -- which would mean that if SVE is enabled for a vcpu, then the vcpu is not considered runnable until the new magic ioctl is called. Opting into that semantic change globally at VM creation time might be preferable.

On the SVE side, this is still very much subject to review/change. Here: The KVM_CREATE_VM init argument seems undefined by the KVM core code and
Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM
On 07/06/2018 04:09 PM, Marc Zyngier wrote: On 06/07/18 14:49, Suzuki K Poulose wrote: On 04/07/18 23:03, Suzuki K Poulose wrote: On 07/04/2018 04:51 PM, Will Deacon wrote: Hi Suzuki, On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote: Allow specifying the physical address size for a new VM via the kvm_type argument for KVM_CREATE_VM ioctl. This allows us to finalise the stage2 page table format as early as possible and hence perform the right checks on the memory slots without complication. The size is encoded as Log2(PA_Size) in the bits[7:0] of the type field and can encode more information in the future if required. The IPA size is still capped at 40bits. Cc: Marc Zyngier Cc: Christoffer Dall Cc: Peter Maydel Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Suzuki K Poulose --- arch/arm/include/asm/kvm_mmu.h | 2 ++ arch/arm64/include/asm/kvm_arm.h | 10 +++--- arch/arm64/include/asm/kvm_mmu.h | 2 ++ include/uapi/linux/kvm.h | 10 ++ virt/kvm/arm/arm.c | 24 ++-- 5 files changed, 39 insertions(+), 9 deletions(-) [...] diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 4df9bb6..fa4cab0 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt { #define KVM_S390_SIE_PAGE_OFFSET 1 /* + * On arm/arm64, machine type can be used to request the physical + * address size for the VM. Bits [7-0] have been reserved for the + * PA size shift (i.e, log2(PA_Size)). For backward compatibility, + * value 0 implies the default IPA size, which is 40bits. + */ +#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK 0xff +#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x) \ + ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK) This seems like you're allocating quite a lot of bits in a non-extensible interface to a fairly esoteric parameter. Would it be better to add another ioctl, or condense the number of sizes you support instead? As I explained in the other thread, we need the size as soon as the VM is created. 
The major challenge is keeping the backward compatibility by mapping 0 to 40bits. I will give it a thought.

Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, which occupies 3 bits and has the following definitions. (ID_AA64MMFR0_EL1:PARange also has the field definitions, except that the field is 4 bits wide, but only 3 bits are used.)

000 32 bits, 4GB.
001 36 bits, 64GB.
010 40 bits, 1TB.
011 42 bits, 4TB.
100 44 bits, 16TB.
101 48 bits, 256TB.
110 52 bits, 4PB.

But we need to map 0 => 40bits IPA to make our ABI backward compatible. So we could use one additional bit to indicate that the IPA size is requested in the 3 bits. i.e, machine_type:

Bit [2:0] - Requested IPA size. Values follow the VTCR_EL2.PS format.
Bit [3]   - 1 => IPA size (Bits[2:0]) requested. 0 => Not requested.

The only minor downside is restricting to the predefined values above, which is not a real issue for a VM. Thoughts?

I'd be very wary of using that 4th bit to do something that is not in the architecture. We have only a single value left to be used (0b111), and then your scheme clashes with the architecture definition.

I agree. However, if we ever go beyond the 3 bits in PARange, we have an issue with the {V}TCR counterpart. But let's not take that chance.

I'd rather encode things in a way that is independent from the architecture, and be done with it. You can map 0 to 40bits, and we have the ability to express all values the architecture has (just in a different order).

The other option I can think of is encoding a signed number which is the difference of the IPA from 40. But that would need 5 bits if we were to encode it as it is. And if we want to squeeze it into 4 bits, we could store half the difference (limiting the IPA limit to even numbers), i.e:

IPA = 40 + 2 * sign_extend(bits[3:0]);

Suzuki
Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM
On 04/07/18 23:03, Suzuki K Poulose wrote: On 07/04/2018 04:51 PM, Will Deacon wrote: Hi Suzuki, On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote: Allow specifying the physical address size for a new VM via the kvm_type argument for KVM_CREATE_VM ioctl. This allows us to finalise the stage2 page table format as early as possible and hence perform the right checks on the memory slots without complication. The size is encoded as Log2(PA_Size) in the bits[7:0] of the type field and can encode more information in the future if required. The IPA size is still capped at 40bits. Cc: Marc Zyngier Cc: Christoffer Dall Cc: Peter Maydel Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Suzuki K Poulose --- arch/arm/include/asm/kvm_mmu.h | 2 ++ arch/arm64/include/asm/kvm_arm.h | 10 +++--- arch/arm64/include/asm/kvm_mmu.h | 2 ++ include/uapi/linux/kvm.h | 10 ++ virt/kvm/arm/arm.c | 24 ++-- 5 files changed, 39 insertions(+), 9 deletions(-) [...] diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 4df9bb6..fa4cab0 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt { #define KVM_S390_SIE_PAGE_OFFSET 1 /* + * On arm/arm64, machine type can be used to request the physical + * address size for the VM. Bits [7-0] have been reserved for the + * PA size shift (i.e, log2(PA_Size)). For backward compatibility, + * value 0 implies the default IPA size, which is 40bits. + */ +#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK 0xff +#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x) \ + ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK) This seems like you're allocating quite a lot of bits in a non-extensible interface to a fairly esoteric parameter. Would it be better to add another ioctl, or condense the number of sizes you support instead? As I explained in the other thread, we need the size as soon as the VM is created. The major challenge is keeping the backward compatibility by mapping 0 to 40bits. 
I will give it a thought. Here is one option. We could re-use the {V}TCR_ELx.{I}PS field format, which occupies 3 bits and has the following definitions. (ID_AA64MMFR0_EL1:PARange also has the field definitions, except that the field is 4bits wide, but only 3bits are used) 000 32 bits, 4GB. 001 36 bits, 64GB. 010 40 bits, 1TB. 011 42 bits, 4TB. 100 44 bits, 16TB. 101 48 bits, 256TB. 110 52 bits, 4PB But we need to map 0 => 40bits IPA to make our ABI backward compatible. So we could use the additional one bit to indicate that IPA size is requested in the 3 bits. i.e, machine_type: Bit [2:0] - Requested IPA size. Values follow VTCR_EL2.PS format. Bit [3] - 1 => IPA Size bits (Bits[2:0]) requested. 0 => Not requested The only minor down side is restricting to the predefined values above, which is not a real issue for a VM. Thoughts ? Suzuki
Re: [Qemu-devel] [kvmtool test PATCH 22/24] kvmtool: arm64: Add support for guest physical address size
On 05/07/18 14:46, Auger Eric wrote:
Hi Marc,
On 07/05/2018 03:20 PM, Marc Zyngier wrote:
On 05/07/18 13:47, Julien Grall wrote:
Hi Will,
On 04/07/18 16:52, Will Deacon wrote:
On Wed, Jul 04, 2018 at 04:00:11PM +0100, Julien Grall wrote:
On 04/07/18 15:09, Will Deacon wrote:
On Fri, Jun 29, 2018 at 12:15:42PM +0100, Suzuki K Poulose wrote:

Add an option to specify the physical address size used by this VM.

Signed-off-by: Suzuki K Poulose
---
 arm/aarch64/include/kvm/kvm-config-arch.h | 5 -
 arm/include/arm-common/kvm-config-arch.h  | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h b/arm/aarch64/include/kvm/kvm-config-arch.h
index 04be43d..dabd22c 100644
--- a/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -8,7 +8,10 @@
 			"Create PMUv3 device"),				\
 	OPT_U64('\0', "kaslr-seed", &(cfg)->kaslr_seed,			\
 		"Specify random seed for Kernel Address Space "		\
-		"Layout Randomization (KASLR)"),
+		"Layout Randomization (KASLR)"),			\
+	OPT_INTEGER('\0', "phys-shift", &(cfg)->phys_shift,		\
+		"Specify maximum physical address size (not "		\
+		"the amount of memory)"),

Given that this is a shift value, I think the help message could be more informative. Something like: "Specify maximum number of bits in a guest physical address". I think I'd actually leave out any mention of memory, because this does actually have an effect on the amount of addressable memory, in a way that I don't think we want to describe in half of a usage message line :)

Is there any particular reason to expose this option to the user? I have recently sent a series to allow the user to specify the position of the RAM [1]. With that series in mind, I think the user would not really need to specify the maximum physical shift. Instead we could automatically find it.
Marc makes a good point that it doesn't help for MMIO regions, so I'm trying to understand whether we can do something differently there and avoid sacrificing the type parameter.

I am not sure I understand this. kvmtool knows the memory layout (including MMIO) of the guest, so couldn't it guess the maximum physical shift from that?

That's exactly what Will was trying to avoid, by having KVM compute the size of the IPA space based on the registered memslots. We've now established that it doesn't work, so what we need to define is:

- whether we need another ioctl(), or do we carry on piggy-backing on the CPU type,

kvm type I guess, machine type is more appropriate, going by the existing users.

- assuming the latter, whether we can reduce the number of bits used in the ioctl parameter by subtly encoding the IPA size.

Getting benefit from your Freudian slip, how should the guest CPU PARange and the maximum number of bits in a guest physical address relate? My understanding is they are not correlated at the moment, and our guest PARange is fixed at the moment. But shouldn't they be? On Intel there is:

qemu-system-x86_64 -M pc,accel=kvm -cpu SandyBridge,phys-bits=36
or
qemu-system-x86_64 -M pc,accel=kvm -cpu SandyBridge,host-phys-bits=true

where phys-bits, as far as I understand, has similar semantics to PARange.

AFAICT, PARange tells you the maximum (Intermediate) Physical Address that can be handled by the CPU, but your IPA limit tells you where the guest RAM is placed. So they need not be the same. e.g, on Juno, the A57s have a PARange of 42 if I am not wrong (but definitely > 40), while the A53s have it at 40, and the system RAM is at 40bits. So, if we were to only use the A57s on Juno, we could run a KVM instance with a 42-bit IPA or anything lower.

So, PARange can be inferred as the maximum limit of the CPU's capability, while the IPA is where the RAM is placed for a given system. One could keep them in sync for a VM by emulation, but then nobody uses the PARange, except KVM. The other problem with capping PARange in the VM to the IPA is restricting the IPA size of a nested VM. So, I don't think this is really beneficial.

Cheers
Suzuki

Thanks
Eric

Thanks,
M.
Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM
On 07/04/2018 04:51 PM, Will Deacon wrote: Hi Suzuki, On Fri, Jun 29, 2018 at 12:15:35PM +0100, Suzuki K Poulose wrote: Allow specifying the physical address size for a new VM via the kvm_type argument for KVM_CREATE_VM ioctl. This allows us to finalise the stage2 page table format as early as possible and hence perform the right checks on the memory slots without complication. The size is encoded as Log2(PA_Size) in the bits[7:0] of the type field and can encode more information in the future if required. The IPA size is still capped at 40bits. Cc: Marc Zyngier Cc: Christoffer Dall Cc: Peter Maydel Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Suzuki K Poulose --- arch/arm/include/asm/kvm_mmu.h | 2 ++ arch/arm64/include/asm/kvm_arm.h | 10 +++--- arch/arm64/include/asm/kvm_mmu.h | 2 ++ include/uapi/linux/kvm.h | 10 ++ virt/kvm/arm/arm.c | 24 ++-- 5 files changed, 39 insertions(+), 9 deletions(-) [...] diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 4df9bb6..fa4cab0 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt { #define KVM_S390_SIE_PAGE_OFFSET 1 /* + * On arm/arm64, machine type can be used to request the physical + * address size for the VM. Bits [7-0] have been reserved for the + * PA size shift (i.e, log2(PA_Size)). For backward compatibility, + * value 0 implies the default IPA size, which is 40bits. + */ +#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK0xff +#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x) \ + ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK) This seems like you're allocating quite a lot of bits in a non-extensible interface to a fairly esoteric parameter. Would it be better to add another ioctl, or condense the number of sizes you support instead? As I explained in the other thread, we need the size as soon as the VM is created. The major challenge is keeping the backward compatibility by mapping 0 to 40bits. I will give it a thought. Suzuki
Re: [Qemu-devel] [kvmtool test PATCH 24/24] kvmtool: arm: Add support for creating VM with PA size
Hi Will,

On 07/04/2018 03:22 PM, Will Deacon wrote:
On Fri, Jun 29, 2018 at 12:15:44PM +0100, Suzuki K Poulose wrote:

diff --git a/arm/kvm.c b/arm/kvm.c
index 5701d41..b1969be 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -11,6 +11,8 @@
 #include
 #include

+unsigned long kvm_arm_type;
+
 struct kvm_ext kvm_req_ext[] = {
 	{ DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) },
 	{ DEFINE_KVM_EXT(KVM_CAP_ONE_REG) },
@@ -18,6 +20,26 @@ struct kvm_ext kvm_req_ext[] = {
 	{ 0, 0 },
 };

+#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT
+#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT	_IO(KVMIO, 0x0b)
+#endif
+
+void kvm__arch_init_hyp(struct kvm *kvm)
+{
+	int max_ipa;
+
+	max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT);
+	if (max_ipa < 0)
+		max_ipa = 40;
+	if (!kvm->cfg.arch.phys_shift)
+		kvm->cfg.arch.phys_shift = 40;
+	if (kvm->cfg.arch.phys_shift > max_ipa)
+		die("Requested PA size (%u) is not supported by the host (%ubits)\n",
+		    kvm->cfg.arch.phys_shift, max_ipa);
+	if (kvm->cfg.arch.phys_shift != 40)
+		kvm_arm_type = kvm->cfg.arch.phys_shift;
+}

Seems a bit weird that the "machine type identifier" to KVM_CREATE_VM is dedicated entirely to holding the physical address shift verbatim. Is this really the ABI?

The bits[7:0] of the machine type have been reserved for the IPA shift. This version is missing the updates to the ABI documentation; I have it for the next version.

Also, couldn't KVM figure it out automatically if you add memslots at high addresses, making this a niche tunable outside of testing?

The stage2 pgd size really depends on the max IPA. Also, unlike stage1 (where the maximum size will be 1 page), the size can go up to 16 pages (and a different number of levels due to concatenation), so we need to finalize this at least before the first memory gets mapped (RAM or device). That implies we cannot wait until all the memory slots are created. The first version of the series added a separate ioctl for specifying the limit, which had its own complexities.
So, this ABI was suggested to keep things simpler. Suzuki
Re: [Qemu-devel] [PATCH v3 10/20] kvm: arm64: Dynamic configuration of VTTBR mask
On 07/04/2018 09:24 AM, Auger Eric wrote:

+ *
+ * We have a magic formula for the Magic_N below:
+ *
+ * Magic_N(PAGE_SIZE, Entry_Level) = 64 - ((PAGE_SHIFT - 3) * Number of levels) [0]
+ *
+ * where number of levels = (4 - Entry_Level).

Doesn't this help make it clear? Using the expansion makes it a bit more unreadable below.

I just wanted to mention that the tables you refer to (D4-23 and D4-25) give Magic_N for a larger scope, as they deal with any lookup level, while we only care about the entry level for BADDR. So I was a little bit confused when reading the explanation, but that's not a big deal.

Ah, ok. I will try to clarify it.

Cheers
Suzuki
Re: [Qemu-devel] [RFC 5/6] hw/arm/virt: support kvm_type property
On 03/07/18 13:47, Andrew Jones wrote:

This infrastructure is already used in hw/ppc/spapr.c. Would it be better if we passed something like kvm-type=48bGPA? Otherwise I can decode another virt machine option (min_vm_phys_shift) in the kvm_type callback.

Yes, this is what I'm thinking. I don't believe we have to expose the details of the KVM API to the user through the QEMU command line. The details are actually more complicated anyway, as the phys-shift is only the lower 8 bits of the KVM type[*], not the whole value.

Thanks,
drew

[*] Looks like Suzuki's series is missing the Documentation/virtual/kvm/api.txt update needed to specify that.

Thanks for spotting, I will update the documentation.

Suzuki
Re: [Qemu-devel] [PATCH v3 10/20] kvm: arm64: Dynamic configuration of VTTBR mask
Hi Eric, On 02/07/18 15:41, Auger Eric wrote: Hi Suzuki, On 06/29/2018 01:15 PM, Suzuki K Poulose wrote: On arm64 VTTBR_EL2:BADDR holds the base address for the stage2 translation table. The Arm ARM mandates that the bits BADDR[x-1:0] should be 0, where 'x' is defined for a given IPA Size and the number of levels for a translation granule size. It is defined using some magical constants. This patch is a reverse engineered implementation to calculate the 'x' at runtime for a given ipa and number of page table levels. See patch for more details. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2: - Part 1 of spilt from VTCR & VTTBR dynamic configuration --- arch/arm64/include/asm/kvm_arm.h | 60 +--- arch/arm64/include/asm/kvm_mmu.h | 25 - 2 files changed, 80 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 3dffd38..c557f45 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -140,8 +140,6 @@ * Note that when using 4K pages, we concatenate two first level page tables * together. With 16K pages, we concatenate 16 first level page tables. * - * The magic numbers used for VTTBR_X in this patch can be found in Tables - * D4-23 and D4-25 in ARM DDI 0487A.b. Isn't it a pretty old reference? Could you refer to C.a? Sure, I will update the references everywhere. + * + * The algorithm defines the expectations on the BaseAddress (for the page + * table) bits resolved at each level based on the page size, entry level + * and T0SZ. The variable "x" in the algorithm also affects the VTTBR:BADDR + * for stage2 page table. 
+ *
+ * The value of "x" is calculated as:
+ *	x = Magic_N - T0SZ
+ *
+ * where Magic_N is an integer depending on the page size and the entry
+ * level of the page table as below:
+ *
+ *	----------------------------------------------
+ *	| Entry level		|  4K  | 16K  | 64K  |
+ *	----------------------------------------------
+ *	| Level: 0 (4 levels)	|  28  |  -   |  -   |
+ *	----------------------------------------------
+ *	| Level: 1 (3 levels)	|  37  |  31  |  25  |
+ *	----------------------------------------------
+ *	| Level: 2 (2 levels)	|  46  |  42  |  38  |
+ *	----------------------------------------------
+ *	| Level: 3 (1 level)	|  -   |  53  |  51  |
+ *	----------------------------------------------

I understand entry level = lookup level in the table.

Entry level => the level at which we start the page table walk for a given address (this is in line with the ARM ARM). So, Entry_level = (4 - Number_of_Page_table_levels).

But you may want to compute x for a BaseAddress matching lookup level 2 with number of levels = 4.

No, the BaseAddress is only calculated for the "Entry_level". So the above case doesn't exist at all.

So shouldn't you s/Number of levels/4 - entry_level?

Ok, I now understood what you are referring to. [0] For BADDR we want the BaseAddr of the initial lookup level, so effectively the entry level we are interested in is 4 - number of levels, and we don't care about the d) condition. At least this is my understanding ;-) If correct, you may slightly reword the explanation?

+ *
+ * We have a magic formula for the Magic_N below:
+ *
+ * Magic_N(PAGE_SIZE, Entry_Level) = 64 - ((PAGE_SHIFT - 3) * Number of levels) [0]
+ *
+ * where number of levels = (4 - Entry_Level).

Doesn't this help make it clear? Using the expansion makes it a bit more unreadable below.

+/*
+ * Get the magic number 'x' for VTTBR:BADDR of this KVM instance.
+ * With v8.2 LVA extensions, 'x' should be a minimum of 6 with
+ * 52bit IPS.

Link to the spec?

Sure, will add it.

Thanks for the patience to review :-)

Cheers
Suzuki
Re: [Qemu-devel] [PATCH v3 13/20] kvm: arm64: Configure VTCR per VM
On 02/07/18 13:16, Marc Zyngier wrote:
On 29/06/18 12:15, Suzuki K Poulose wrote:

We set VTCR_EL2 very early during the stage2 init and don't touch it ever. This is fine as we had a fixed IPA size. This patch changes the behavior to set the VTCR for a given VM, depending on its stage2 table. The common configuration for VTCR is still performed during the early init as we have to retain the hardware access flag update bits (VTCR_EL2_HA) per CPU (as they are only set for the CPUs which are capabile).

capable

The bits defining the number of levels in the page table (SL0) and the size of the input address to the translation (T0SZ) are programmed for each VM upon entry to the guest.

Cc: Marc Zyngier
Cc: Christoffer Dall
Signed-off-by: Suzuki K Poulose
---
Change since V2:
- Load VTCR for TLB operations
---
 arch/arm64/include/asm/kvm_arm.h  | 19 +--
 arch/arm64/include/asm/kvm_asm.h  |  2 +-
 arch/arm64/include/asm/kvm_host.h |  9 ++---
 arch/arm64/include/asm/kvm_hyp.h  | 11 +++
 arch/arm64/kvm/hyp/s2-setup.c     | 17 +
 5 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 11a7db0..b02c316 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -120,9 +120,7 @@
 #define VTCR_EL2_IRGN0_WBWA	TCR_IRGN0_WBWA
 #define VTCR_EL2_SL0_SHIFT	6
 #define VTCR_EL2_SL0_MASK	(3 << VTCR_EL2_SL0_SHIFT)
-#define VTCR_EL2_SL0_LVL1	(1 << VTCR_EL2_SL0_SHIFT)
 #define VTCR_EL2_T0SZ_MASK	0x3f
-#define VTCR_EL2_T0SZ_40B	24
 #define VTCR_EL2_VS_SHIFT	19
 #define VTCR_EL2_VS_8BIT	(0 << VTCR_EL2_VS_SHIFT)
 #define VTCR_EL2_VS_16BIT	(1 << VTCR_EL2_VS_SHIFT)
@@ -137,43 +135,44 @@
  * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time
  * (see hyp-init.S).
  *
+ * VTCR_EL2.SL0 and T0SZ are configured per VM at runtime before switching to
+ * the VM.
+ *
  * Note that when using 4K pages, we concatenate two first level page tables
  * together.
With 16K pages, we concatenate 16 first level page tables. * */ -#define VTCR_EL2_T0SZ_IPA VTCR_EL2_T0SZ_40B #define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \ VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1) +#define VTCR_EL2_PRIVATE_MASK (VTCR_EL2_SL0_MASK | VTCR_EL2_T0SZ_MASK) What does "private" mean here? It really is the IPA configuration, so I'd rather have a naming that reflects that. #ifdef CONFIG_ARM64_64K_PAGES /* * Stage2 translation configuration: * 64kB pages (TG0 = 1) - * 2 level page tables (SL = 1) */ -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1) +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K #define VTCR_EL2_TGRAN_SL0_BASE 3UL #elif defined(CONFIG_ARM64_16K_PAGES) /* * Stage2 translation configuration: * 16kB pages (TG0 = 2) - * 2 level page tables (SL = 1) */ -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1) +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K #define VTCR_EL2_TGRAN_SL0_BASE 3UL #else /* 4K */ /* * Stage2 translation configuration: * 4kB pages (TG0 = 0) - * 3 level page tables (SL = 1) */ -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1) +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K #define VTCR_EL2_TGRAN_SL0_BASE 2UL #endif -#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) +#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN) + /* * VTCR_EL2:SL0 indicates the entry level for Stage2 translation. * Interestingly, it depends on the page size. 
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 102b5a5..91372eb 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -72,7 +72,7 @@ extern void __vgic_v3_init_lrs(void); extern u32 __kvm_get_mdcr_el2(void); -extern u32 __init_stage2_translation(void); +extern void __init_stage2_translation(void); /* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */ #define __hyp_this_cpu_ptr(sym) \ diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index fe8777b..328f472 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -442,10 +442,13 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu, static inline void __cpu_init_stage2(void) { - u32 parange = kvm_call_hyp(__init_stage2_translation); + u32 ps; - WARN_ONCE(parange < 40, - "PARange is %d bits, unsupported configuration!", parange); + kvm_call_hyp(__init_stage2_translation); + /* Sanity c
Re: [Qemu-devel] [PATCH v3 01/20] virtio: mmio-v1: Validate queue PFN
Hi Michael, On 06/29/2018 06:42 PM, Michael S. Tsirkin wrote: On Fri, Jun 29, 2018 at 12:15:21PM +0100, Suzuki K Poulose wrote: virtio-mmio with virtio-v1 uses a 32bit PFN for the queue. If the queue pfn is too large to fit in 32bits, which we could hit on arm64 systems with 52bit physical addresses (even with 64K page size), we simply miss out a proper link to the other side of the queue. Add a check to validate the PFN, rather than silently breaking the devices. Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Marc Zyngier Cc: Christoffer Dall Cc: Peter Maydel Cc: Jean-Philippe Brucker Signed-off-by: Suzuki K Poulose --- Changes since v2: - Change errno to -E2BIG --- drivers/virtio/virtio_mmio.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 67763d3..82cedc8 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -397,9 +397,21 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index, /* Activate the queue */ writel(virtqueue_get_vring_size(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NUM); if (vm_dev->version == 1) { + u64 q_pfn = virtqueue_get_desc_addr(vq) >> PAGE_SHIFT; + + /* +* virtio-mmio v1 uses a 32bit QUEUE PFN. If we have something +* that doesn't fit in 32bit, fail the setup rather than +* pretending to be successful. +*/ + if (q_pfn >> 32) { + dev_err(&vdev->dev, "virtio-mmio: queue address too large\n"); How about: "hypervisor bug: legacy virtio-mmio must not be used with more than 0x%llx Gigabytes of memory", 0x1ULL << (32 - 30) << PAGE_SHIFT nit: Do we need to change "hypervisor" => "platform"? Virtio is used by other tools (e.g., emulators) and not just virtual machines. Suzuki
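The overflow test the patch adds is small enough to capture stand-alone. A minimal sketch, assuming only the legacy register semantics described above; `check_legacy_queue_pfn()` is a name local to this example, not the driver's:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Sketch of the check added by the patch: legacy (version 1) virtio-mmio
 * exposes the queue address as a 32-bit PFN, so a descriptor address whose
 * PFN needs more than 32 bits must fail the setup instead of being
 * silently truncated when written to the QUEUE_PFN register.
 */
static int check_legacy_queue_pfn(uint64_t desc_addr, unsigned int page_shift)
{
	uint64_t q_pfn = desc_addr >> page_shift;

	if (q_pfn >> 32)
		return -E2BIG;	/* does not fit the 32-bit QUEUE_PFN register */
	return 0;
}
```

With 4K pages (page_shift 12), any descriptor placed above 2^44 trips the check, which is exactly the window 52-bit PA systems can reach.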
Re: [Qemu-devel] [PATCH v3 19/20] kvm: arm64: Allow IPA size supported by the system
On 02/07/18 14:50, Marc Zyngier wrote: On 29/06/18 12:15, Suzuki K Poulose wrote: So far we have restricted the IPA size of the VM to the default value (40bits). Now that we can manage the IPA size per VM and support dynamic stage2 page tables, allow VMs to have larger IPA. This is done by setting the IPA limit to the one supported by the hardware and kernel. This patch also moves the check for the default IPA size support to kvm_get_ipa_limit(). Since the stage2 page table code is dependent on the stage1 page table, we always ensure that: Number of Levels at Stage1 >= Number of Levels at Stage2 So we limit the IPA to make sure that the above condition is satisfied. This will affect the following combinations of VA_BITS and IPA for different page sizes. 39bit VA, 4K - IPA > 43 (up to 48) 36bit VA, 16K - IPA > 40 (up to 48) 42bit VA, 64K - IPA > 46 (up to 52) I'm not sure I get it. Are these the IPA sizes that we forbid based on the host VA size and page size configuration? Yes, that's right. If so, can you rewrite this as: host configuration | unsupported IPA range 39bit VA, 4k | [44, 48] 36bit VA, 16K | [41, 48] 42bit VA, 64k | [47, 52] and say that all the other combinations are supported? Sure, that looks much better. Thanks Suzuki
Re: [Qemu-devel] [PATCH v3 16/20] kvm: arm64: Switch to per VM IPA limit
Hi Marc, On 02/07/18 14:32, Marc Zyngier wrote: On 29/06/18 12:15, Suzuki K Poulose wrote: Now that we can manage the stage2 page table per VM, switch the configuration details to per VM instance. We keep track of the IPA bits, number of page table levels and the VTCR bits (which depends on the IPA and the number of levels). While at it, remove unused pgd_lock field from kvm_arch for arm64. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 328f472..9a15860 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -61,13 +61,23 @@ struct kvm_arch { u64vmid_gen; u32vmid; - /* 1-level 2nd stage table and lock */ - spinlock_t pgd_lock; + /* stage-2 page table */ pgd_t *pgd; /* VTTBR value associated with above pgd and vmid */ u64vttbr; + /* Private bits of VTCR_EL2 for this VM */ + u64vtcr_private; As I said in another email, this should become a full VTCR_EL2 copy. OK + /* Size of the PA size for this guest */ + u8 phys_shift; + /* +* Number of levels in page table. We could always calculate +* it from phys_shift above. We cache it for faster switches +* in stage2 page table helpers. +*/ + u8 s2_levels; And these two fields feel like they should be derived from the VTCR itself, instead of being there on their own. Any chance you could look into this? Yes, the VTCR is computed from the above two values and we could compute them back from the VTCR. I will give it a try. 
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index ffc37cc..91d7936 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -65,7 +65,6 @@ #define __s2_pgd_ptrs(pa, lvls) (1 << ((pa) - pt_levels_pgdir_shift((lvls #define __s2_pgd_size(pa, lvls) (__s2_pgd_ptrs((pa), (lvls)) * sizeof(pgd_t)) -#define kvm_stage2_levels(kvm) stage2_pt_levels(kvm_phys_shift(kvm)) #define stage2_pgdir_shift(kvm) \ pt_levels_pgdir_shift(kvm_stage2_levels(kvm)) #define stage2_pgdir_size(kvm)(_AC(1, UL) << stage2_pgdir_shift((kvm))) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index a339e00..d7822e1 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -867,6 +867,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) return -EINVAL; } + /* Make sure we have the stage2 configured for this VM */ + if (WARN_ON(!kvm_phys_shift(kvm))) Can this be triggered from userspace? No: we initialise the phys shift before we get here. If type is left blank (i.e., 0), we default to 40bits. So there should be something there. The check is to make sure we have indeed passed the configuration step. + return -EINVAL; + /* Allocate the HW PGD, making sure that each page gets its own refcount */ pgd = stage2_alloc_pgd(kvm); if (!pgd) Cheers Suzuki
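The ordering guarantee Suzuki describes (the PA shift is fixed at VM-creation time, with a type value of 0 meaning the default 40-bit IPA) can be sketched as follows; the function name and constant are local to this example:

```c
/*
 * Sketch of the invariant discussed above: the PA shift is set when the
 * VM is created (type == 0 selects the default 40-bit IPA), so by the
 * time the stage2 pgd is allocated it is always non-zero and the
 * WARN_ON is purely a sanity check on the configuration order.
 */
#define DEFAULT_IPA_BITS 40U

static unsigned int demo_config_phys_shift(unsigned int requested_ipa_bits)
{
	return requested_ipa_bits ? requested_ipa_bits : DEFAULT_IPA_BITS;
}
```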
Re: [Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM
On 02/07/18 14:13, Marc Zyngier wrote: On 29/06/18 12:15, Suzuki K Poulose wrote: Allow specifying the physical address size for a new VM via the kvm_type argument for KVM_CREATE_VM ioctl. This allows us to finalise the stage2 page table format as early as possible and hence perform the right checks on the memory slots without complication. The size is encoded as Log2(PA_Size) in the bits[7:0] of the type field and can encode more information in the future if required. The IPA size is still capped at 40bits. Can't we relax this? There is no technical reason (AFAICS) not to allow going down to 36bit IPA if the user has requested it. Sure, we can. If we run on a 36bit IPA system, the default would fail. But if the user specified "please give me a 36bit IPA VM", we could satisfy that requirement and allow them to run their stupidly small guest! Absolutely. I will fix this in the next version. Cheers Suzuki
Re: [Qemu-devel] [PATCH v3 09/20] kvm: arm64: Make stage2 page table layout dynamic
Hi Eric, On 02/07/18 13:14, Auger Eric wrote: Hi Suzuki, On 06/29/2018 01:15 PM, Suzuki K Poulose wrote: So far we had a static stage2 page table handling code, based on a fixed IPA of 40bits. As we prepare for a configurable IPA size per VM, make our stage2 page table code dynamic, to do the right thing for a given VM. We ensure the existing condition is always true even when we lift the limit on the IPA. i.e., page table levels in stage1 >= page table levels in stage2 Support for the IPA size configuration needs other changes in the way we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h to the top, before including the asm/stage2_pgtable.h to avoid a forward declaration. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2 - Restrict the stage2 page table to allow reusing the host page table helpers for now, until we get stage1 independent page table helpers. I would move this up in the commit msg to motivate the fact we enforce the above condition. This is mentioned in the commit message for the patch which lifts the limitation on the IPA. This patch only deals with the dynamic page table level handling, with the restriction on the levels. Nevertheless, I could add it to the description. --- arch/arm64/include/asm/kvm_mmu.h | 14 +- arch/arm64/include/asm/stage2_pgtable-nopmd.h | 42 -- arch/arm64/include/asm/stage2_pgtable-nopud.h | 39 - arch/arm64/include/asm/stage2_pgtable.h | 207 +++--- 4 files changed, 159 insertions(+), 143 deletions(-) delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h with my very limited knowledge of S2 page table walkers I fail to understand why we now can get rid of stage2_pgtable-nopmd.h and stage2_pgtable-nopud.h and associated FOLDED config. Please could you explain it in the commit message? 
As mentioned above, we have static page table helpers, which are decided at compile time (just like the stage1). So these files hold the definitions for the cases where PUD/PMD is folded and included for a given stage1 VA. But since we are now doing this check per VM, we make the decision by checking the kvm_stage2_levels(), instead of hard coding it. Does that help? A short version of that is already there. Maybe I could elaborate that a bit. - -#define stage2_pgd_index(kvm, addr) \ - (((addr) >> S2_PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1)) +static inline unsigned long stage2_pgd_index(struct kvm *kvm, phys_addr_t addr) +{ + return (addr >> stage2_pgdir_shift(kvm)) & (stage2_pgd_ptrs(kvm) - 1); +} static inline phys_addr_t stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) { - phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK; + phys_addr_t boundary; + boundary = (addr + stage2_pgdir_size(kvm)) & stage2_pgdir_mask(kvm); return (boundary - 1 < end - 1) ? boundary : end; } Globally this patch is pretty hard to review. I don't know if it is possible to split into 2. 1) Addition of some helper macros. 2) removal of nopud and nopmd and implementation of the corresponding macros? I acknowledge that. The patch redefines the "existing" macros to make the decision at runtime based on the VM's setting. I will see if there is a better way to do it. Cheers Suzuki
Re: [Qemu-devel] [PATCH v3 07/20] kvm: arm/arm64: Prepare for VM specific stage2 translations
Hi Eric, On 02/07/18 11:51, Auger Eric wrote: Hi Suzuki, On 06/29/2018 01:15 PM, Suzuki K Poulose wrote: Right now the stage2 page table for a VM is hard coded, assuming an IPA of 40bits. As we are about to add support for per VM IPA, prepare the stage2 page table helpers to accept the kvm instance to make the right decision for the VM. No functional changes. Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE. Also, moves some of the definitions dependent on kvm instance to asm/kvm_mmu.h for arm32. In that process drop the _AC() specifier constants. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2: - Update commit description about the movement to asm/kvm_mmu.h for arm32 - Drop _AC() specifiers diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 8553d68..f36eb20 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -36,15 +36,19 @@ }) /* - * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels. + * kvm_mmu_cache_min_pages() is the number of stage2 page + * table translation levels, excluding the top level, for + * the given VM. Since we have a 3 level page-table, this + * is fixed. */ -#define KVM_MMU_CACHE_MIN_PAGES2 +#define kvm_mmu_cache_min_pages(kvm) 2 nit: In addition to Marc's comment, I can see it defined in stage2_pgtable.h on arm64 side. Can't we align? Sure, will do that. diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index fb9a712..5da8f52 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -141,8 +141,11 @@ static inline unsigned long __kern_hyp_va(unsigned long v) * We currently only support a 40bit IPA. 
*/ #define KVM_PHYS_SHIFT(40) -#define KVM_PHYS_SIZE (1UL << KVM_PHYS_SHIFT) -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - 1UL) + +#define kvm_phys_shift(kvm)KVM_PHYS_SHIFT +#define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm)) Can't you get rid of _AC() also in arm64 case? +#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL)) Yes, I missed that one. I will do it. Thanks for spotting. Cheers Suzuki
Re: [Qemu-devel] [PATCH v3 09/20] kvm: arm64: Make stage2 page table layout dynamic
On 29/06/18 12:15, Suzuki K Poulose wrote: So far we had a static stage2 page table handling code, based on a fixed IPA of 40bits. As we prepare for a configurable IPA size per VM, make our stage2 page table code dynamic, to do the right thing for a given VM. We ensure the existing condition is always true even when we lift the limit on the IPA. i.e, page table levels in stage1 >= page table levels in stage2 Support for the IPA size configuration needs other changes in the way we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h to the top, before including the asm/stage2_pgtable.h to avoid a forward declaration. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2 - Restrict the stage2 page table to allow reusing the host page table helpers for now, until we get stage1 independent page table helpers. ... -#define stage2_pgd_none(kvm, pgd) pgd_none(pgd) -#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd) -#define stage2_pgd_present(kvm, pgd) pgd_present(pgd) -#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud) -#define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address) -#define stage2_pud_free(kvm, pud) pud_free(NULL, pud) +#define __s2_pud_index(addr) \ + (((addr) >> __S2_PUD_SHIFT) & (PTRS_PER_PTE - 1)) +#define __s2_pmd_index(addr) \ + (((addr) >> __S2_PMD_SHIFT) & (PTRS_PER_PTE - 1)) -#define stage2_pud_table_empty(kvm, pudp) kvm_page_empty(pudp) +#define __kvm_has_stage2_levels(kvm, min_levels) \ + ((CONFIG_PGTABLE_LEVELS >= min_levels) && (kvm_stage2_levels(kvm) >= min_levels)) On another look, I have renamed the helpers as follows : kvm_stage2_has_pud(kvm) => kvm_stage2_has_pmd(kvm) kvm_stage2_has_pgd(kvm) => kvm_stage2_has_pud(kvm) below and everywhere. + +#define kvm_stage2_has_pgd(kvm)__kvm_has_stage2_levels(kvm, 4) +#define kvm_stage2_has_pud(kvm) __kvm_has_stage2_levels(kvm, 3) Suzuki
Re: [Qemu-devel] [PATCH v3 07/20] kvm: arm/arm64: Prepare for VM specific stage2 translations
On 02/07/18 11:12, Marc Zyngier wrote: On 29/06/18 12:15, Suzuki K Poulose wrote: Right now the stage2 page table for a VM is hard coded, assuming an IPA of 40bits. As we are about to add support for per VM IPA, prepare the stage2 page table helpers to accept the kvm instance to make the right decision for the VM. No functional changes. Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE. Also, moves some of the definitions dependent on kvm instance to asm/kvm_mmu.h for arm32. In that process drop the _AC() specifier constants Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2: - Update commit description abuot the movement to asm/kvm_mmu.h for arm32 - Drop _AC() specifiers --- arch/arm/include/asm/kvm_arm.h| 3 +- arch/arm/include/asm/kvm_mmu.h| 15 +++- arch/arm/include/asm/stage2_pgtable.h | 42 - arch/arm64/include/asm/kvm_mmu.h | 7 +- arch/arm64/include/asm/stage2_pgtable-nopmd.h | 18 ++-- arch/arm64/include/asm/stage2_pgtable-nopud.h | 16 ++-- arch/arm64/include/asm/stage2_pgtable.h | 49 ++- virt/kvm/arm/arm.c| 2 +- virt/kvm/arm/mmu.c| 119 +- virt/kvm/arm/vgic/vgic-kvm-device.c | 2 +- 10 files changed, 148 insertions(+), 125 deletions(-) diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h index 3ab8b37..c3f1f9b 100644 --- a/arch/arm/include/asm/kvm_arm.h +++ b/arch/arm/include/asm/kvm_arm.h @@ -133,8 +133,7 @@ * space. */ #define KVM_PHYS_SHIFT(40) -#define KVM_PHYS_SIZE (_AC(1, ULL) << KVM_PHYS_SHIFT) -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - _AC(1, ULL)) + #define PTRS_PER_S2_PGD (_AC(1, ULL) << (KVM_PHYS_SHIFT - 30)) /* Virtualization Translation Control Register (VTCR) bits */ diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 8553d68..f36eb20 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -36,15 +36,19 @@ }) /* - * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels. 
+ * kvm_mmu_cache_min_pages() is the number of stage2 page + * table translation levels, excluding the top level, for + * the given VM. Since we have a 3 level page-table, this + * is fixed. I find this comment quite confusing: number of levels, but excluding the top one? The original one was just as bad, to be honest. Can't we just say: "kvm_mmu_cache_min_page() is the number of pages required to install a stage-2 translation"? Yes, that is much better. Will change it. diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index 8b68099..057a405 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -65,10 +65,10 @@ #define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - S2_PGDIR_SHIFT)) /* - * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation + * kvm_mmmu_cache_min_pages is the number of stage2 page table translation * levels in addition to the PGD. */ -#define KVM_MMU_CACHE_MIN_PAGES(STAGE2_PGTABLE_LEVELS - 1) +#define kvm_mmu_cache_min_pages(kvm) (STAGE2_PGTABLE_LEVELS - 1) Same comment as for the 32bit case. Otherwise: Acked-by: Marc Zyngier Thanks Suzuki
[Qemu-devel] [kvmtool test PATCH 23/24] kvmtool: arm64: Switch memory layout
If the guest wants to use a larger physical address space place the RAM at upper half of the address space. Otherwise, it uses the default layout. Signed-off-by: Suzuki K Poulose --- arm/aarch32/include/kvm/kvm-arch.h | 6 -- arm/aarch64/include/kvm/kvm-arch.h | 15 --- arm/include/arm-common/kvm-arch.h | 11 ++- arm/kvm.c | 2 +- 4 files changed, 23 insertions(+), 11 deletions(-) diff --git a/arm/aarch32/include/kvm/kvm-arch.h b/arm/aarch32/include/kvm/kvm-arch.h index cd31e72..bcd382b 100644 --- a/arm/aarch32/include/kvm/kvm-arch.h +++ b/arm/aarch32/include/kvm/kvm-arch.h @@ -3,8 +3,10 @@ #define ARM_KERN_OFFSET(...) 0x8000 -#define ARM_MAX_MEMORY(...)ARM_LOMAP_MAX_MEMORY - #include "arm-common/kvm-arch.h" +#define ARM_MAX_MEMORY(...)ARM32_MAX_MEMORY +#define ARM_MEMORY_AREA(...) ARM32_MEMORY_AREA + + #endif /* KVM__KVM_ARCH_H */ diff --git a/arm/aarch64/include/kvm/kvm-arch.h b/arm/aarch64/include/kvm/kvm-arch.h index 9de623a..bad35b9 100644 --- a/arm/aarch64/include/kvm/kvm-arch.h +++ b/arm/aarch64/include/kvm/kvm-arch.h @@ -1,14 +1,23 @@ #ifndef KVM__KVM_ARCH_H #define KVM__KVM_ARCH_H +#include "arm-common/kvm-arch.h" + +#define ARM64_MEMORY_AREA(phys_shift) (1UL << (phys_shift - 1)) +#define ARM64_MAX_MEMORY(phys_shift) \ + ((1ULL << (phys_shift)) - ARM64_MEMORY_AREA(phys_shift)) + +#define ARM_MEMORY_AREA(kvm) ((kvm)->cfg.arch.aarch32_guest ?\ +ARM32_MEMORY_AREA : \ +ARM64_MEMORY_AREA(kvm->cfg.arch.phys_shift)) + #define ARM_KERN_OFFSET(kvm) ((kvm)->cfg.arch.aarch32_guest ? \ 0x8000 : \ 0x8) #define ARM_MAX_MEMORY(kvm)((kvm)->cfg.arch.aarch32_guest ? 
\ - ARM_LOMAP_MAX_MEMORY: \ - ARM_HIMAP_MAX_MEMORY) + ARM32_MAX_MEMORY: \ + ARM64_MAX_MEMORY(kvm->cfg.arch.phys_shift)) -#include "arm-common/kvm-arch.h" #endif /* KVM__KVM_ARCH_H */ diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h index b9d486d..b29b4b1 100644 --- a/arm/include/arm-common/kvm-arch.h +++ b/arm/include/arm-common/kvm-arch.h @@ -6,14 +6,15 @@ #include #include "arm-common/gic.h" - #define ARM_IOPORT_AREA_AC(0x, UL) #define ARM_MMIO_AREA _AC(0x0001, UL) #define ARM_AXI_AREA _AC(0x4000, UL) -#define ARM_MEMORY_AREA_AC(0x8000, UL) -#define ARM_LOMAP_MAX_MEMORY ((1ULL << 32) - ARM_MEMORY_AREA) -#define ARM_HIMAP_MAX_MEMORY ((1ULL << 40) - ARM_MEMORY_AREA) +#define ARM32_MEMORY_AREA _AC(0x8000, UL) +#define ARM32_MAX_MEMORY ((1ULL << 32) - ARM32_MEMORY_AREA) + +#define ARM_IOMEM_AREA_END ARM32_MEMORY_AREA + #define ARM_GIC_DIST_BASE (ARM_AXI_AREA - ARM_GIC_DIST_SIZE) #define ARM_GIC_CPUI_BASE (ARM_GIC_DIST_BASE - ARM_GIC_CPUI_SIZE) @@ -24,7 +25,7 @@ #define ARM_IOPORT_SIZE(ARM_MMIO_AREA - ARM_IOPORT_AREA) #define ARM_VIRTIO_MMIO_SIZE (ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE)) #define ARM_PCI_CFG_SIZE (1ULL << 24) -#define ARM_PCI_MMIO_SIZE (ARM_MEMORY_AREA - \ +#define ARM_PCI_MMIO_SIZE (ARM_IOMEM_AREA_END - \ (ARM_AXI_AREA + ARM_PCI_CFG_SIZE)) #define KVM_IOPORT_AREAARM_IOPORT_AREA diff --git a/arm/kvm.c b/arm/kvm.c index 2ab436e..5701d41 100644 --- a/arm/kvm.c +++ b/arm/kvm.c @@ -30,7 +30,7 @@ void kvm__init_ram(struct kvm *kvm) u64 phys_start, phys_size; void *host_mem; - phys_start = ARM_MEMORY_AREA; + phys_start = ARM_MEMORY_AREA(kvm); phys_size = kvm->ram_size; host_mem= kvm->ram_start; -- 2.7.4
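For reference, the layout the new macros describe can be computed stand-alone. A sketch under the assumption that `phys_shift` is the guest PA size in bits; the function names are local to this example, mirroring `ARM64_MEMORY_AREA`/`ARM64_MAX_MEMORY` above:

```c
#include <stdint.h>

/*
 * Sketch of the layout from the patch's macros: for a guest with a given
 * PA size (phys_shift), RAM is placed at the upper half of the address
 * space, leaving the lower half free for I/O regions.
 */
static uint64_t arm64_memory_area(unsigned int phys_shift)
{
	return 1ULL << (phys_shift - 1);	/* RAM base: half the PA space */
}

static uint64_t arm64_max_memory(unsigned int phys_shift)
{
	/* everything from the RAM base up to the top of the PA space */
	return (1ULL << phys_shift) - arm64_memory_area(phys_shift);
}
```

For the default 40-bit IPA this puts RAM at 512GB with up to 512GB of memory, matching the old `ARM_HIMAP_MAX_MEMORY` split.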
[Qemu-devel] [kvmtool test PATCH 21/24] kvmtool: Allow backends to run checks on the KVM device fd
Allow architectures to perform initialisation based on the KVM device fd ioctls, even before the VM is created. Signed-off-by: Suzuki K Poulose --- include/kvm/kvm.h | 4 kvm.c | 2 ++ 2 files changed, 6 insertions(+) diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h index 90463b8..a036dd2 100644 --- a/include/kvm/kvm.h +++ b/include/kvm/kvm.h @@ -103,6 +103,10 @@ int kvm__get_sock_by_instance(const char *name); int kvm__enumerate_instances(int (*callback)(const char *name, int pid)); void kvm__remove_socket(const char *name); +#ifndef kvm__arch_init_hyp +static inline void kvm__arch_init_hyp(struct kvm *kvm) {} +#endif + void kvm__arch_set_cmdline(char *cmdline, bool video); void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size); void kvm__arch_delete_ram(struct kvm *kvm); diff --git a/kvm.c b/kvm.c index f8f2fdc..b992e74 100644 --- a/kvm.c +++ b/kvm.c @@ -304,6 +304,8 @@ int kvm__init(struct kvm *kvm) goto err_sys_fd; } + kvm__arch_init_hyp(kvm); + kvm->vm_fd = ioctl(kvm->sys_fd, KVM_CREATE_VM, KVM_VM_TYPE); if (kvm->vm_fd < 0) { pr_err("KVM_CREATE_VM ioctl"); -- 2.7.4
[Qemu-devel] [PATCH v3 16/20] kvm: arm64: Switch to per VM IPA limit
Now that we can manage the stage2 page table per VM, switch the configuration details to per VM instance. We keep track of the IPA bits, number of page table levels and the VTCR bits (which depends on the IPA and the number of levels). While at it, remove unused pgd_lock field from kvm_arch for arm64. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- arch/arm64/include/asm/kvm_host.h | 14 -- arch/arm64/include/asm/kvm_hyp.h| 3 +-- arch/arm64/include/asm/kvm_mmu.h| 20 ++-- arch/arm64/include/asm/stage2_pgtable.h | 1 - virt/kvm/arm/mmu.c | 4 5 files changed, 35 insertions(+), 7 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 328f472..9a15860 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -61,13 +61,23 @@ struct kvm_arch { u64vmid_gen; u32vmid; - /* 1-level 2nd stage table and lock */ - spinlock_t pgd_lock; + /* stage-2 page table */ pgd_t *pgd; /* VTTBR value associated with above pgd and vmid */ u64vttbr; + /* Private bits of VTCR_EL2 for this VM */ + u64vtcr_private; + /* Size of the PA size for this guest */ + u8 phys_shift; + /* +* Number of levels in page table. We could always calculate +* it from phys_shift above. We cache it for faster switches +* in stage2 page table helpers. 
+*/ + u8 s2_levels; + /* The last vcpu id that ran on each physical CPU */ int __percpu *last_vcpu_ran; diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h index 3e8052d1..699f678 100644 --- a/arch/arm64/include/asm/kvm_hyp.h +++ b/arch/arm64/include/asm/kvm_hyp.h @@ -166,8 +166,7 @@ static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm) u64 vtcr = read_sysreg(vtcr_el2); vtcr &= ~VTCR_EL2_PRIVATE_MASK; - vtcr |= VTCR_EL2_SL0(kvm_stage2_levels(kvm)) | - VTCR_EL2_T0SZ(kvm_phys_shift(kvm)); + vtcr |= kvm->arch.vtcr_private; write_sysreg(vtcr, vtcr_el2); write_sysreg(kvm->arch.vttbr, vttbr_el2); } diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index f3fb05a3..a291cdc 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -143,9 +143,10 @@ static inline unsigned long __kern_hyp_va(unsigned long v) */ #define KVM_PHYS_SHIFT (40) -#define kvm_phys_shift(kvm)KVM_PHYS_SHIFT +#define kvm_phys_shift(kvm)(kvm->arch.phys_shift) #define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm)) #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL)) +#define kvm_stage2_levels(kvm) (kvm->arch.s2_levels) static inline bool kvm_page_empty(void *ptr) { @@ -528,6 +529,18 @@ static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm) static inline void *stage2_alloc_pgd(struct kvm *kvm) { + u32 ipa, lvls; + + /* +* Stage2 page table can support concatenation of (upto 16) tables +* at the entry level, thereby reducing the number of levels. 
+*/ + ipa = kvm_phys_shift(kvm); + lvls = stage2_pt_levels(ipa); + + kvm->arch.s2_levels = lvls; + kvm->arch.vtcr_private = VTCR_EL2_SL0(lvls) | TCR_T0SZ(ipa); + return alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO); } @@ -537,7 +550,10 @@ static inline u32 kvm_get_ipa_limit(void) return KVM_PHYS_SHIFT; } -static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {} +static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) +{ + kvm->arch.phys_shift = ipa_shift; +} #endif /* __ASSEMBLY__ */ #endif /* __ARM64_KVM_MMU_H__ */ diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h index ffc37cc..91d7936 100644 --- a/arch/arm64/include/asm/stage2_pgtable.h +++ b/arch/arm64/include/asm/stage2_pgtable.h @@ -65,7 +65,6 @@ #define __s2_pgd_ptrs(pa, lvls)(1 << ((pa) - pt_levels_pgdir_shift((lvls #define __s2_pgd_size(pa, lvls)(__s2_pgd_ptrs((pa), (lvls)) * sizeof(pgd_t)) -#define kvm_stage2_levels(kvm) stage2_pt_levels(kvm_phys_shift(kvm)) #define stage2_pgdir_shift(kvm)\ pt_levels_pgdir_shift(kvm_stage2_levels(kvm)) #define stage2_pgdir_size(kvm) (_AC(1, UL) << stage2_pgdir_shift((kvm))) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index a339e00..d7822e1 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -867,6 +867,10 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) return -EINVAL; } + /* Make sure we have the stage2 configured for this VM
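The `vtcr_private` value cached in `stage2_alloc_pgd()` above packs the two per-VM fields. The T0SZ half is easy to illustrate: assuming the usual arm64 encoding (T0SZ = 64 minus the input-address bits), the default 40-bit IPA yields 24, which is exactly the `VTCR_EL2_T0SZ_40B` constant this series removes. A sketch with a name local to this example:

```c
/*
 * Sketch of the T0SZ computation folded into vtcr_private: the
 * architecture encodes the stage2 input-address size as
 * T0SZ = 64 - IPA bits, so a 40-bit IPA gives T0SZ = 24.
 */
static unsigned int demo_vtcr_t0sz(unsigned int ipa_bits)
{
	return 64 - ipa_bits;
}
```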
[Qemu-devel] [PATCH v3 19/20] kvm: arm64: Allow IPA size supported by the system
So far we have restricted the IPA size of the VM to the default value (40bits). Now that we can manage the IPA size per VM and support dynamic stage2 page tables, allow VMs to have larger IPA. This is done by setting the IPA limit to the one supported by the hardware and kernel. This patch also moves the check for the default IPA size support to kvm_get_ipa_limit(). Since the stage2 page table code is dependent on the stage1 page table, we always ensure that: Number of Levels at Stage1 >= Number of Levels at Stage2 So we limit the IPA to make sure that the above condition is satisfied. This will affect the following combinations of VA_BITS and IPA for different page sizes. 39bit VA, 4K - IPA > 43 (up to 48) 36bit VA, 16K - IPA > 40 (up to 48) 42bit VA, 64K - IPA > 46 (up to 52) Supporting the above combinations needs independent stage2 page table manipulation code, which would need substantial changes. We could pursue the solution independently and switch the page table code once we have it ready. Cc: Catalin Marinas Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2: - Restrict the IPA size to limit the number of page table levels in stage2 to that of stage1 or less. 
--- arch/arm64/include/asm/kvm_host.h | 6 -- arch/arm64/include/asm/kvm_mmu.h | 37 - 2 files changed, 36 insertions(+), 7 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 9a15860..e858e49 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -452,13 +452,7 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu, static inline void __cpu_init_stage2(void) { - u32 ps; - kvm_call_hyp(__init_stage2_translation); - /* Sanity check for minimum IPA size support */ - ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1) & 0x7); - WARN_ONCE(ps < 40, - "PARange is %d bits, unsupported configuration!", ps); } /* Guest/host FPSIMD coordination helpers */ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index a291cdc..d38f395 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -547,7 +547,42 @@ static inline void *stage2_alloc_pgd(struct kvm *kvm) static inline u32 kvm_get_ipa_limit(void) { - return KVM_PHYS_SHIFT; + unsigned int ipa_max, va_max, parange; + + parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7; + ipa_max = id_aa64mmfr0_parange_to_phys_shift(parange); + + /* Raise the limit to the default size for backward compatibility */ + if (ipa_max < KVM_PHYS_SHIFT) { + WARN_ONCE(1, + "PARange is %d bits, unsupported configuration!", + ipa_max); + ipa_max = KVM_PHYS_SHIFT; + } + + /* Clamp it to the PA size supported by the kernel */ + ipa_max = (ipa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : ipa_max; + /* +* Since our stage2 table is dependent on the stage1 page table code, +* we must always honor the following condition: +* +* Number of levels in Stage1 >= Number of levels in Stage2. +* +* So clamp the ipa limit further down to limit the number of levels. 
+* Since we can concatenate up to 16 tables at entry level, we could +* go up to 4 bits above the maximum VA addressable with the current +* number of levels. +*/ + va_max = PGDIR_SHIFT + PAGE_SHIFT - 3; + va_max += 4; + + if (va_max < ipa_max) { + kvm_info("Limiting IPA limit to %dbits due to host VA bits limitation\n", +va_max); + ipa_max = va_max; + } + + return ipa_max; } static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) -- 2.7.4
[Qemu-devel] [kvmtool test PATCH 24/24] kvmtool: arm: Add support for creating VM with PA size
Specify the physical address size for the VM, encoded in the VM type. Signed-off-by: Suzuki K Poulose --- arm/include/arm-common/kvm-arch.h | 6 +- arm/kvm.c | 22 ++ 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h index b29b4b1..d77f3ac 100644 --- a/arm/include/arm-common/kvm-arch.h +++ b/arm/include/arm-common/kvm-arch.h @@ -44,7 +44,11 @@ #define KVM_IRQ_OFFSET GIC_SPI_IRQ_BASE -#define KVM_VM_TYPE 0 +extern unsigned long kvm_arm_type; +extern void kvm__arch_init_hyp(struct kvm *kvm); + +#define KVM_VM_TYPE kvm_arm_type +#define kvm__arch_init_hyp kvm__arch_init_hyp #define VIRTIO_DEFAULT_TRANS(kvm) \ ((kvm)->cfg.arch.virtio_trans_pci ? VIRTIO_PCI : VIRTIO_MMIO) diff --git a/arm/kvm.c b/arm/kvm.c index 5701d41..b1969be 100644 --- a/arm/kvm.c +++ b/arm/kvm.c @@ -11,6 +11,8 @@ #include #include +unsigned long kvm_arm_type; + struct kvm_ext kvm_req_ext[] = { { DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) }, { DEFINE_KVM_EXT(KVM_CAP_ONE_REG) }, @@ -18,6 +20,26 @@ struct kvm_ext kvm_req_ext[] = { { 0, 0 }, }; +#ifndef KVM_ARM_GET_MAX_VM_PHYS_SHIFT +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT _IO(KVMIO, 0x0b) +#endif + +void kvm__arch_init_hyp(struct kvm *kvm) +{ + int max_ipa; + + max_ipa = ioctl(kvm->sys_fd, KVM_ARM_GET_MAX_VM_PHYS_SHIFT); + if (max_ipa < 0) + max_ipa = 40; + if (!kvm->cfg.arch.phys_shift) + kvm->cfg.arch.phys_shift = 40; + if (kvm->cfg.arch.phys_shift > max_ipa) + die("Requested PA size (%u) is not supported by the host (%ubits)\n", + kvm->cfg.arch.phys_shift, max_ipa); + if (kvm->cfg.arch.phys_shift != 40) + kvm_arm_type = kvm->cfg.arch.phys_shift; +} + bool kvm__arch_cpu_supports_vm(void) { /* The KVM capability check is enough. */ -- 2.7.4
[Qemu-devel] [kvmtool test PATCH 22/24] kvmtool: arm64: Add support for guest physical address size
Add an option to specify the physical address size used by this VM. Signed-off-by: Suzuki K Poulose --- arm/aarch64/include/kvm/kvm-config-arch.h | 5 - arm/include/arm-common/kvm-config-arch.h | 1 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h b/arm/aarch64/include/kvm/kvm-config-arch.h index 04be43d..dabd22c 100644 --- a/arm/aarch64/include/kvm/kvm-config-arch.h +++ b/arm/aarch64/include/kvm/kvm-config-arch.h @@ -8,7 +8,10 @@ "Create PMUv3 device"), \ OPT_U64('\0', "kaslr-seed", &(cfg)->kaslr_seed, \ "Specify random seed for Kernel Address Space " \ - "Layout Randomization (KASLR)"), + "Layout Randomization (KASLR)"),\ + OPT_INTEGER('\0', "phys-shift", &(cfg)->phys_shift, \ + "Specify maximum physical address size (not " \ + "the amount of memory)"), #include "arm-common/kvm-config-arch.h" diff --git a/arm/include/arm-common/kvm-config-arch.h b/arm/include/arm-common/kvm-config-arch.h index 6a196f1..e0b531e 100644 --- a/arm/include/arm-common/kvm-config-arch.h +++ b/arm/include/arm-common/kvm-config-arch.h @@ -11,6 +11,7 @@ struct kvm_config_arch { boolhas_pmuv3; u64 kaslr_seed; enum irqchip_type irqchip; + int phys_shift; }; int irqchip_parser(const struct option *opt, const char *arg, int unset); -- 2.7.4
[Qemu-devel] [PATCH v3 13/20] kvm: arm64: Configure VTCR per VM
We set VTCR_EL2 very early during the stage2 init and never touch it again. This was fine while we had a fixed IPA size. This patch changes the behavior to set the VTCR for a given VM, depending on its stage2 table. The common configuration for VTCR is still performed during the early init as we have to retain the hardware access flag update bits (VTCR_EL2_HA) per CPU (as they are only set for the CPUs which are capable). The bits defining the number of levels in the page table (SL0) and the size of the input address to the translation (T0SZ) are programmed for each VM upon entry to the guest. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2: - Load VTCR for TLB operations --- arch/arm64/include/asm/kvm_arm.h | 19 +-- arch/arm64/include/asm/kvm_asm.h | 2 +- arch/arm64/include/asm/kvm_host.h | 9 ++--- arch/arm64/include/asm/kvm_hyp.h | 11 +++ arch/arm64/kvm/hyp/s2-setup.c | 17 + 5 files changed, 28 insertions(+), 30 deletions(-) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 11a7db0..b02c316 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -120,9 +120,7 @@ #define VTCR_EL2_IRGN0_WBWA TCR_IRGN0_WBWA #define VTCR_EL2_SL0_SHIFT 6 #define VTCR_EL2_SL0_MASK (3 << VTCR_EL2_SL0_SHIFT) -#define VTCR_EL2_SL0_LVL1 (1 << VTCR_EL2_SL0_SHIFT) #define VTCR_EL2_T0SZ_MASK 0x3f -#define VTCR_EL2_T0SZ_40B 24 #define VTCR_EL2_VS_SHIFT 19 #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT) #define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT) @@ -137,43 +135,44 @@ * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time * (see hyp-init.S). * + * VTCR_EL2.SL0 and T0SZ are configured per VM at runtime before switching to + * the VM. + * * Note that when using 4K pages, we concatenate two first level page tables * together. With 16K pages, we concatenate 16 first level page tables. 
* */ -#define VTCR_EL2_T0SZ_IPA VTCR_EL2_T0SZ_40B #define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \ VTCR_EL2_IRGN0_WBWA | VTCR_EL2_RES1) +#define VTCR_EL2_PRIVATE_MASK (VTCR_EL2_SL0_MASK | VTCR_EL2_T0SZ_MASK) #ifdef CONFIG_ARM64_64K_PAGES /* * Stage2 translation configuration: * 64kB pages (TG0 = 1) - * 2 level page tables (SL = 1) */ -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1) +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K #define VTCR_EL2_TGRAN_SL0_BASE3UL #elif defined(CONFIG_ARM64_16K_PAGES) /* * Stage2 translation configuration: * 16kB pages (TG0 = 2) - * 2 level page tables (SL = 1) */ -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1) +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K #define VTCR_EL2_TGRAN_SL0_BASE3UL #else /* 4K */ /* * Stage2 translation configuration: * 4kB pages (TG0 = 0) - * 3 level page tables (SL = 1) */ -#define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1) +#define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K #define VTCR_EL2_TGRAN_SL0_BASE2UL #endif -#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) +#define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN) + /* * VTCR_EL2:SL0 indicates the entry level for Stage2 translation. * Interestingly, it depends on the page size. 
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 102b5a5..91372eb 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -72,7 +72,7 @@ extern void __vgic_v3_init_lrs(void); extern u32 __kvm_get_mdcr_el2(void); -extern u32 __init_stage2_translation(void); +extern void __init_stage2_translation(void); /* Home-grown __this_cpu_{ptr,read} variants that always work at HYP */ #define __hyp_this_cpu_ptr(sym) \ diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index fe8777b..328f472 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -442,10 +442,13 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu, static inline void __cpu_init_stage2(void) { - u32 parange = kvm_call_hyp(__init_stage2_translation); + u32 ps; - WARN_ONCE(parange < 40, - "PARange is %d bits, unsupported configuration!", parange); + kvm_call_hyp(__init_stage2_translation); + /* Sanity check for minimum IPA size support */ + ps = id_aa64mmfr0_parange_to_phys_shift(read_sysreg(id_aa64mmfr0_el1) & 0x7); + WARN_ONCE(ps < 40, + "PARange is %d bits, unsupported configuration!", ps); } /* Guest/host
[Qemu-devel] [PATCH v3 20/20] kvm: arm64: Fall back to normal stage2 entry level
We use concatenated entry level page tables (up to 16 tables) for stage2. If we don't have sufficient contiguous pages (e.g., 16 * 64K), fall back to the normal page table format, by going one level deeper if permitted. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- New in v3 --- arch/arm64/include/asm/kvm_arm.h | 7 +++ arch/arm64/include/asm/kvm_mmu.h | 18 + arch/arm64/kvm/guest.c | 42 3 files changed, 50 insertions(+), 17 deletions(-) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index cb6a2ee..42eb528 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -137,6 +137,8 @@ * * VTCR_EL2.SL0 and T0SZ are configured per VM at runtime before switching to * the VM. + * + * With 16k/64k, the maximum number of levels supported at Stage2 is 3. */ #define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \ @@ -150,6 +152,7 @@ */ #define VTCR_EL2_TGRAN VTCR_EL2_TG0_64K #define VTCR_EL2_TGRAN_SL0_BASE3UL +#define ARM64_TGRAN_STAGE2_MAX_LEVELS 3 #elif defined(CONFIG_ARM64_16K_PAGES) /* @@ -158,6 +161,8 @@ */ #define VTCR_EL2_TGRAN VTCR_EL2_TG0_16K #define VTCR_EL2_TGRAN_SL0_BASE3UL +#define ARM64_TGRAN_STAGE2_MAX_LEVELS 3 + #else /* 4K */ /* * Stage2 translation configuration: @@ -165,6 +170,8 @@ */ #define VTCR_EL2_TGRAN VTCR_EL2_TG0_4K #define VTCR_EL2_TGRAN_SL0_BASE2UL +#define ARM64_TGRAN_STAGE2_MAX_LEVELS 4 + #endif #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN) diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index d38f395..50f632e 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -527,23 +527,7 @@ static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm) return vttbr_baddr_mask(kvm_phys_shift(kvm), kvm_stage2_levels(kvm)); } -static inline void *stage2_alloc_pgd(struct kvm *kvm) -{ - u32 ipa, lvls; - - /* -* Stage2 page table can support concatenation of (upto 16) tables -* at the 
entry level, thereby reducing the number of levels. -*/ - ipa = kvm_phys_shift(kvm); - lvls = stage2_pt_levels(ipa); - - kvm->arch.s2_levels = lvls; - kvm->arch.vtcr_private = VTCR_EL2_SL0(lvls) | TCR_T0SZ(ipa); - - return alloc_pages_exact(stage2_pgd_size(kvm), -GFP_KERNEL | __GFP_ZERO); -} +extern void *stage2_alloc_pgd(struct kvm *kvm); static inline u32 kvm_get_ipa_limit(void) { diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index 56a0260..5a3a687 100644 --- a/arch/arm64/kvm/guest.c +++ b/arch/arm64/kvm/guest.c @@ -31,6 +31,8 @@ #include #include #include +#include +#include #include "trace.h" @@ -458,3 +460,43 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu, return ret; } + +void *stage2_alloc_pgd(struct kvm *kvm) +{ + u32 ipa, s2_lvls, lvls; + u64 pgd_size; + void *pgd; + + /* +* Stage2 page table can support concatenation of (upto 16) tables +* at the entry level, thereby reducing the number of levels. We try +* to use concatenation wherever possible. If we fail, fallback to +* normal levels if possible. +*/ + ipa = kvm_phys_shift(kvm); + lvls = s2_lvls = stage2_pt_levels(ipa); + +retry: + pgd_size = __s2_pgd_size(ipa, lvls); + pgd = alloc_pages_exact(pgd_size, GFP_KERNEL | __GFP_ZERO); + + /* Check if the PGD meets the alignment requirements */ + if (pgd && (virt_to_phys(pgd) & ~vttbr_baddr_mask(ipa, lvls))) { + free_pages_exact(pgd, pgd_size); + pgd = NULL; + } + + if (pgd) { + kvm->arch.s2_levels = lvls; + kvm->arch.vtcr_private = VTCR_EL2_SL0(lvls) | TCR_T0SZ(ipa); + } else { + /* Check if we can use an entry level without concatenation */ + lvls = ARM64_HW_PGTABLE_LEVELS(ipa); + if ((lvls > s2_lvls) && + (lvls <= CONFIG_PGTABLE_LEVELS) && + (lvls <= ARM64_TGRAN_STAGE2_MAX_LEVELS)) + goto retry; + } + + return pgd; +} -- 2.7.4
[Qemu-devel] [PATCH v3 18/20] kvm: arm64: Add support for handling 52bit IPA
Add support for handling the 52bit IPA. 52bit IPA support needs changes to the following: 1) Page-table entries - We use kernel page table helpers for setting up the stage2. Hence we don't need explicit changes here. 2) VTTBR:BADDR - This is already supported with commit 529c4b05a3cb2f324aa ("arm64: handle 52-bit addresses in TTBR") 3) VGIC support for 52bit: Supported with a patch in this series. That leaves us with the handling for PAR and HPFAR. This patch adds support for handling the 52bit addresses in PAR and HPFAR, which are used while handling the permission faults in stage1. Cc: Marc Zyngier Cc: Kristina Martsenko Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- arch/arm64/include/asm/kvm_arm.h | 7 +++ arch/arm64/kvm/hyp/switch.c | 2 +- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 2e90942..cb6a2ee 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -301,6 +301,13 @@ /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */ #define HPFAR_MASK (~UL(0xf)) +/* + * We have + * PAR [PA_Shift - 1 : 12] = PA [PA_Shift - 1 : 12] + * HPFAR [PA_Shift - 9 : 4] = FIPA[PA_Shift - 1 : 12] + */ +#define PAR_TO_HPFAR(par) \ + (((par) & GENMASK_ULL(PHYS_MASK_SHIFT - 1, 12)) >> 8) #define kvm_arm_exception_type \ {0, "IRQ" },\ diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index 355fb25..fb66320 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -260,7 +260,7 @@ static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar) return false; /* Translation failed, back to guest */ /* Convert PAR to HPFAR format */ - *hpfar = ((tmp >> 12) & ((1UL << 36) - 1)) << 4; + *hpfar = PAR_TO_HPFAR(tmp); return true; } -- 2.7.4
[Qemu-devel] [PATCH v3 15/20] kvm: arm/arm64: Allow tuning the physical address size for VM
Allow specifying the physical address size for a new VM via the kvm_type argument to the KVM_CREATE_VM ioctl. This allows us to finalise the stage2 page table format as early as possible and hence perform the right checks on the memory slots without complication. The size is encoded as Log2(PA_Size) in the bits[7:0] of the type field and can encode more information in the future if required. The IPA size is still capped at 40bits. Cc: Marc Zyngier Cc: Christoffer Dall Cc: Peter Maydell Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Suzuki K Poulose --- arch/arm/include/asm/kvm_mmu.h | 2 ++ arch/arm64/include/asm/kvm_arm.h | 10 +++--- arch/arm64/include/asm/kvm_mmu.h | 2 ++ include/uapi/linux/kvm.h | 10 ++ virt/kvm/arm/arm.c | 24 ++-- 5 files changed, 39 insertions(+), 9 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index d86f8dd..bcc3dd9 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -385,6 +385,8 @@ static inline u32 kvm_get_ipa_limit(void) return KVM_PHYS_SHIFT; } +static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {} + #endif /* !__ASSEMBLY__ */ #endif /* __ARM_KVM_MMU_H__ */ diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index b02c316..2e90942 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -128,19 +128,15 @@ #define VTCR_EL2_T0SZ(x) TCR_T0SZ(x) /* - * We configure the Stage-2 page tables to always restrict the IPA space to be - * 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are - * not known to exist and will break with this configuration. + * We configure the Stage-2 page tables based on the requested size of + * IPA for each VM. The default size is set to 40bits and is not allowed + * to go below that limit (for backward compatibility). * * VTCR_EL2.PS is extracted from ID_AA64MMFR0_EL1.PARange at boot time * (see hyp-init.S). 
* * VTCR_EL2.SL0 and T0SZ are configured per VM at runtime before switching to * the VM. - * - * Note that when using 4K pages, we concatenate two first level page tables - * together. With 16K pages, we concatenate 16 first level page tables. - * */ #define VTCR_EL2_COMMON_BITS (VTCR_EL2_SH0_INNER | VTCR_EL2_ORGN0_WBWA | \ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index b4564d8..f3fb05a3 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -537,5 +537,7 @@ static inline u32 kvm_get_ipa_limit(void) return KVM_PHYS_SHIFT; } +static inline void kvm_config_stage2(struct kvm *kvm, u32 ipa_shift) {} + #endif /* __ASSEMBLY__ */ #endif /* __ARM64_KVM_MMU_H__ */ diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 4df9bb6..fa4cab0 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -751,6 +751,16 @@ struct kvm_ppc_resize_hpt { #define KVM_S390_SIE_PAGE_OFFSET 1 /* + * On arm/arm64, machine type can be used to request the physical + * address size for the VM. Bits [7-0] have been reserved for the + * PA size shift (i.e, log2(PA_Size)). For backward compatibility, + * value 0 implies the default IPA size, which is 40bits. + */ +#define KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK0xff +#define KVM_VM_TYPE_ARM_PHYS_SHIFT(x) \ + ((x) & KVM_VM_TYPE_ARM_PHYS_SHIFT_MASK) + +/* * ioctls for /dev/kvm fds: */ #define KVM_GET_API_VERSION _IO(KVMIO, 0x00) diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 0d99e67..1085761 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -112,6 +112,25 @@ void kvm_arch_check_processor_compat(void *rtn) } +static int kvm_arch_config_vm(struct kvm *kvm, unsigned long type) +{ + u32 ipa_shift = KVM_VM_TYPE_ARM_PHYS_SHIFT(type); + + /* +* Make sure the size, if specified, is within the range of +* default size and supported maximum limit. 
+*/ + if (ipa_shift) { + if (ipa_shift < KVM_PHYS_SHIFT || ipa_shift > kvm_ipa_limit) + return -EINVAL; + } else { + ipa_shift = KVM_PHYS_SHIFT; + } + + kvm_config_stage2(kvm, ipa_shift); + return 0; +} + /** * kvm_arch_init_vm - initializes a VM data structure * @kvm: pointer to the KVM struct @@ -120,8 +139,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) { int ret, cpu; - if (type) - return -EINVAL; + ret = kvm_arch_config_vm(kvm, type); + if (ret) + return ret; kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran)); if (!kvm->arch.last_vcpu_ran) -- 2.7.4
[Qemu-devel] [PATCH v3 17/20] vgic: Add support for 52bit guest physical address
From: Kristina Martsenko Add support for handling 52bit guest physical addresses in the VGIC layer. So far we have limited the guest physical address to 48bits, by explicitly masking the upper bits. This patch removes the restriction. We do not have to check if the host supports 52bit as the gpa is always validated during an access (e.g., kvm_{read/write}_guest, kvm_is_visible_gfn()). The ITS table save-restore is also not affected by the enhancement. The DTE entries already store the bits[51:8] of the ITT_addr (with 256-byte alignment). Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Kristina Martsenko [ Macro clean ups, fix PROPBASER and PENDBASER accesses ] Signed-off-by: Suzuki K Poulose --- include/linux/irqchip/arm-gic-v3.h | 5 + virt/kvm/arm/vgic/vgic-its.c | 36 ++-- virt/kvm/arm/vgic/vgic-mmio-v3.c | 2 -- 3 files changed, 15 insertions(+), 28 deletions(-) diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index cbb872c..bc4b95b 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -346,6 +346,8 @@ #define GITS_CBASER_RaWaWt GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt) #define GITS_CBASER_RaWaWb GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb) +#define GITS_CBASER_ADDRESS(cbaser)((cbaser) & GENMASK_ULL(52, 12)) + #define GITS_BASER_NR_REGS 8 #define GITS_BASER_VALID (1ULL << 63) @@ -377,6 +379,9 @@ #define GITS_BASER_ENTRY_SIZE_MASK GENMASK_ULL(52, 48) #define GITS_BASER_PHYS_52_to_48(phys) \ (((phys) & GENMASK_ULL(47, 16)) | (((phys) >> 48) & 0xf) << 12) +#define GITS_BASER_ADDR_48_to_52(baser) \ + (((baser) & GENMASK_ULL(47, 16)) | (((baser) >> 12) & 0xf) << 48) + #define GITS_BASER_SHAREABILITY_SHIFT (10) #define GITS_BASER_InnerShareable \ GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable) diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index 4ed79c9..c6eb390 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ 
-234,13 +234,6 @@ static struct its_ite *find_ite(struct vgic_its *its, u32 device_id, list_for_each_entry(dev, &(its)->device_list, dev_list) \ list_for_each_entry(ite, &(dev)->itt_head, ite_list) -/* - * We only implement 48 bits of PA at the moment, although the ITS - * supports more. Let's be restrictive here. - */ -#define BASER_ADDRESS(x) ((x) & GENMASK_ULL(47, 16)) -#define CBASER_ADDRESS(x) ((x) & GENMASK_ULL(47, 12)) - #define GIC_LPI_OFFSET 8192 #define VITS_TYPER_IDBITS 16 @@ -752,6 +745,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, { int l1_tbl_size = GITS_BASER_NR_PAGES(baser) * SZ_64K; u64 indirect_ptr, type = GITS_BASER_TYPE(baser); + phys_addr_t base = GITS_BASER_ADDR_48_to_52(baser); int esz = GITS_BASER_ENTRY_SIZE(baser); int index; gfn_t gfn; @@ -776,7 +770,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, if (id >= (l1_tbl_size / esz)) return false; - addr = BASER_ADDRESS(baser) + id * esz; + addr = base + id * esz; gfn = addr >> PAGE_SHIFT; if (eaddr) @@ -791,7 +785,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, /* Each 1st level entry is represented by a 64-bit value. */ if (kvm_read_guest_lock(its->dev->kvm, - BASER_ADDRESS(baser) + index * sizeof(indirect_ptr), + base + index * sizeof(indirect_ptr), &indirect_ptr, sizeof(indirect_ptr))) return false; @@ -801,11 +795,7 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, u32 id, if (!(indirect_ptr & BIT_ULL(63))) return false; - /* -* Mask the guest physical address and calculate the frame number. -* Any address beyond our supported 48 bits of PA will be caught -* by the actual check in the final step. -*/ + /* Mask the guest physical address and calculate the frame number. 
*/ indirect_ptr &= GENMASK_ULL(51, 16); /* Find the address of the actual entry */ @@ -1297,9 +1287,6 @@ static u64 vgic_sanitise_its_baser(u64 reg) GITS_BASER_OUTER_CACHEABILITY_SHIFT, vgic_sanitise_outer_cacheability); - /* Bits 15:12 contain bits 51:48 of the PA, which we don't support. */ - reg &= ~GENMASK_ULL(15, 12); - /* We support only one (ITS) page size: 64K */ reg =
[Qemu-devel] [PATCH v3 08/20] kvm: arm/arm64: Abstract stage2 pgd table allocation
Abstract the allocation of stage2 entry level tables for given VM, so that later we can choose to fall back to the normal page table levels (i.e, avoid entry level table concatenation) on arm64. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2: - New patch --- arch/arm/include/asm/kvm_mmu.h | 6 ++ arch/arm64/include/asm/kvm_mmu.h | 6 ++ virt/kvm/arm/mmu.c | 2 +- 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index f36eb20..b2da5a4 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -372,6 +372,12 @@ static inline int hyp_map_aux_data(void) return 0; } +static inline void *stage2_alloc_pgd(struct kvm *kvm) +{ + return alloc_pages_exact(stage2_pgd_size(kvm), +GFP_KERNEL | __GFP_ZERO); +} + #define kvm_phys_to_vttbr(addr)(addr) #endif /* !__ASSEMBLY__ */ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 5da8f52..dbaf513 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -501,5 +501,11 @@ static inline int hyp_map_aux_data(void) #define kvm_phys_to_vttbr(addr)phys_to_ttbr(addr) +static inline void *stage2_alloc_pgd(struct kvm *kvm) +{ + return alloc_pages_exact(stage2_pgd_size(kvm), +GFP_KERNEL | __GFP_ZERO); +} + #endif /* __ASSEMBLY__ */ #endif /* __ARM64_KVM_MMU_H__ */ diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 82dd571..a339e00 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -868,7 +868,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) } /* Allocate the HW PGD, making sure that each page gets its own refcount */ - pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO); + pgd = stage2_alloc_pgd(kvm); if (!pgd) return -ENOMEM; -- 2.7.4
[Qemu-devel] [PATCH v3 14/20] kvm: arm/arm64: Expose supported physical address limit for VM
Expose the maximum physical address size supported by the host for a VM. This could be later used by the userspace to choose the appropriate size for a given VM. The limit is determined as the minimum of actual CPU limit, the kernel limit (i.e, either 48 or 52) and the stage2 page table support limit (which is 40bits at the moment). For backward compatibility, we support a minimum of 40bits. The limit will be lifted as we add support for the stage2 to support the host kernel PA limit. This value may be different from what is exposed to the VM via CPU ID registers. The limit only applies to the stage2 page table. Cc: Christoffer Dall Cc: Marc Zyngier Cc: Peter Maydel Signed-off-by: Suzuki K Poulose --- Changes since V2: - Bump the ioctl number --- Documentation/virtual/kvm/api.txt | 15 +++ arch/arm/include/asm/kvm_mmu.h| 5 + arch/arm64/include/asm/kvm_mmu.h | 5 + include/uapi/linux/kvm.h | 6 ++ virt/kvm/arm/arm.c| 6 ++ 5 files changed, 37 insertions(+) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index d10944e..662374b 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3561,6 +3561,21 @@ Returns: 0 on success, -ENOENT on deassign if the conn_id isn't registered -EEXIST on assign if the conn_id is already registered +4.113 KVM_ARM_GET_MAX_VM_PHYS_SHIFT +Capability: basic +Architectures: arm, arm64 +Type: system ioctl +Parameters: none +Returns: log2(Maximum Guest physical address space size) supported by the +hypervisor. + +This ioctl can be used to identify the maximum guest physical address +space size supported by the hypervisor. The returned value indicates the +maximum size of the address that can be resolved by the stage2 +translation table on arm/arm64. On arm64, the value is decided based +on the host kernel configuration and the system wide safe value of +ID_AA64MMFR0_EL1:PARange. This may not match the value exposed to the +VM in CPU ID registers. 5. 
The kvm_run structure diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index b2da5a4..d86f8dd 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -380,6 +380,11 @@ static inline void *stage2_alloc_pgd(struct kvm *kvm) #define kvm_phys_to_vttbr(addr)(addr) +static inline u32 kvm_get_ipa_limit(void) +{ + return KVM_PHYS_SHIFT; +} + #endif /* !__ASSEMBLY__ */ #endif /* __ARM_KVM_MMU_H__ */ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 813a72a..b4564d8 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -532,5 +532,10 @@ static inline void *stage2_alloc_pgd(struct kvm *kvm) GFP_KERNEL | __GFP_ZERO); } +static inline u32 kvm_get_ipa_limit(void) +{ + return KVM_PHYS_SHIFT; +} + #endif /* __ASSEMBLY__ */ #endif /* __ARM64_KVM_MMU_H__ */ diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index b6270a3..4df9bb6 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -775,6 +775,12 @@ struct kvm_ppc_resize_hpt { #define KVM_GET_MSR_FEATURE_INDEX_LIST_IOWR(KVMIO, 0x0a, struct kvm_msr_list) /* + * Get the maximum physical address size supported by the host. + * Returns log2(Max-Physical-Address-Size) + */ +#define KVM_ARM_GET_MAX_VM_PHYS_SHIFT _IO(KVMIO, 0x0b) + +/* * Extension capability list. 
*/ #define KVM_CAP_IRQCHIP 0 diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index d2637bb..0d99e67 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -66,6 +66,7 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1); static u32 kvm_next_vmid; static unsigned int kvm_vmid_bits __read_mostly; static DEFINE_RWLOCK(kvm_vmid_lock); +static u32 kvm_ipa_limit; static bool vgic_present; @@ -248,6 +249,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { + if (ioctl == KVM_ARM_GET_MAX_VM_PHYS_SHIFT) + return kvm_ipa_limit; + return -EINVAL; } @@ -1361,6 +1365,8 @@ static int init_common_resources(void) kvm_vmid_bits = kvm_get_vmid_bits(); kvm_info("%d-bit VMID\n", kvm_vmid_bits); + kvm_ipa_limit = kvm_get_ipa_limit(); + return 0; } -- 2.7.4
[Qemu-devel] [PATCH v3 09/20] kvm: arm64: Make stage2 page table layout dynamic
So far we had a static stage2 page table handling code, based on a fixed IPA of 40bits. As we prepare for a configurable IPA size per VM, make our stage2 page table code dynamic, to do the right thing for a given VM. We ensure the existing condition is always true even when we lift the limit on the IPA. i.e, page table levels in stage1 >= page table levels in stage2 Support for the IPA size configuration needs other changes in the way we configure the EL2 registers (VTTBR and VTCR). So, the IPA is still fixed to 40bits. The patch also moves the kvm_page_empty() in asm/kvm_mmu.h to the top, before including the asm/stage2_pgtable.h to avoid a forward declaration. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2 - Restrict the stage2 page table to allow reusing the host page table helpers for now, until we get stage1 independent page table helpers. --- arch/arm64/include/asm/kvm_mmu.h | 14 +- arch/arm64/include/asm/stage2_pgtable-nopmd.h | 42 -- arch/arm64/include/asm/stage2_pgtable-nopud.h | 39 - arch/arm64/include/asm/stage2_pgtable.h | 207 +++--- 4 files changed, 159 insertions(+), 143 deletions(-) delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index dbaf513..a351722 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -21,6 +21,7 @@ #include #include #include +#include /* * As ARMv8.0 only has the TTBR0_EL2 register, we cannot express @@ -147,6 +148,13 @@ static inline unsigned long __kern_hyp_va(unsigned long v) #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL)) #define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK +static inline bool kvm_page_empty(void *ptr) +{ + struct page *ptr_page = virt_to_page(ptr); + + return page_count(ptr_page) == 1; +} + #include int create_hyp_mappings(void *from, void *to, pgprot_t 
prot); @@ -237,12 +245,6 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp) return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN); } -static inline bool kvm_page_empty(void *ptr) -{ - struct page *ptr_page = virt_to_page(ptr); - return page_count(ptr_page) == 1; -} - #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef __PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/stage2_pgtable-nopmd.h b/arch/arm64/include/asm/stage2_pgtable-nopmd.h deleted file mode 100644 index 0280ded..000 --- a/arch/arm64/include/asm/stage2_pgtable-nopmd.h +++ /dev/null @@ -1,42 +0,0 @@ -/* - * Copyright (C) 2016 - ARM Ltd - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program. If not, see <http://www.gnu.org/licenses/>. 
- */ - -#ifndef __ARM64_S2_PGTABLE_NOPMD_H_ -#define __ARM64_S2_PGTABLE_NOPMD_H_ - -#include - -#define __S2_PGTABLE_PMD_FOLDED - -#define S2_PMD_SHIFT S2_PUD_SHIFT -#define S2_PTRS_PER_PMD1 -#define S2_PMD_SIZE(1UL << S2_PMD_SHIFT) -#define S2_PMD_MASK(~(S2_PMD_SIZE-1)) - -#define stage2_pud_none(kvm, pud) (0) -#define stage2_pud_present(kvm, pud) (1) -#define stage2_pud_clear(kvm, pud) do { } while (0) -#define stage2_pud_populate(kvm, pud, pmd) do { } while (0) -#define stage2_pmd_offset(kvm, pud, address) ((pmd_t *)(pud)) - -#define stage2_pmd_free(kvm, pmd) do { } while (0) - -#define stage2_pmd_addr_end(kvm, addr, end)(end) - -#define stage2_pud_huge(kvm, pud) (0) -#define stage2_pmd_table_empty(kvm, pmdp) (0) - -#endif diff --git a/arch/arm64/include/asm/stage2_pgtable-nopud.h b/arch/arm64/include/asm/stage2_pgtable-nopud.h deleted file mode 100644 index cd6304e..000 --- a/arch/arm64/include/asm/stage2_pgtable-nopud.h +++ /dev/null @@ -1,39 +0,0 @@ -/* - * Copyright (C) 2016 - ARM Ltd - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should h
[Qemu-devel] [PATCH v3 07/20] kvm: arm/arm64: Prepare for VM specific stage2 translations
Right now the stage2 page table for a VM is hard-coded, assuming an IPA of 40 bits. As we are about to add support for a per-VM IPA, prepare the stage2 page table helpers to accept the kvm instance, so they can make the right decision for the VM. No functional changes. Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE. Also moves some of the definitions dependent on the kvm instance to asm/kvm_mmu.h for arm32. In that process, drop the _AC() specifier on the constants. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2: - Update commit description about the movement to asm/kvm_mmu.h for arm32 - Drop _AC() specifiers --- arch/arm/include/asm/kvm_arm.h| 3 +- arch/arm/include/asm/kvm_mmu.h| 15 +++- arch/arm/include/asm/stage2_pgtable.h | 42 - arch/arm64/include/asm/kvm_mmu.h | 7 +- arch/arm64/include/asm/stage2_pgtable-nopmd.h | 18 ++-- arch/arm64/include/asm/stage2_pgtable-nopud.h | 16 ++-- arch/arm64/include/asm/stage2_pgtable.h | 49 ++- virt/kvm/arm/arm.c| 2 +- virt/kvm/arm/mmu.c| 119 +- virt/kvm/arm/vgic/vgic-kvm-device.c | 2 +- 10 files changed, 148 insertions(+), 125 deletions(-) diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h index 3ab8b37..c3f1f9b 100644 --- a/arch/arm/include/asm/kvm_arm.h +++ b/arch/arm/include/asm/kvm_arm.h @@ -133,8 +133,7 @@ * space. */ #define KVM_PHYS_SHIFT (40) -#define KVM_PHYS_SIZE (_AC(1, ULL) << KVM_PHYS_SHIFT) -#define KVM_PHYS_MASK (KVM_PHYS_SIZE - _AC(1, ULL)) + #define PTRS_PER_S2_PGD(_AC(1, ULL) << (KVM_PHYS_SHIFT - 30)) /* Virtualization Translation Control Register (VTCR) bits */ diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 8553d68..f36eb20 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -36,15 +36,19 @@ }) /* - * KVM_MMU_CACHE_MIN_PAGES is the number of stage2 page table translation levels.
+ * kvm_mmu_cache_min_pages() is the number of stage2 page + * table translation levels, excluding the top level, for + * the given VM. Since we have a 3 level page-table, this + * is fixed. */ -#define KVM_MMU_CACHE_MIN_PAGES2 +#define kvm_mmu_cache_min_pages(kvm) 2 #ifndef __ASSEMBLY__ #include #include #include +#include #include #include #include @@ -52,6 +56,13 @@ /* Ensure compatibility with arm64 */ #define VA_BITS32 +#define kvm_phys_shift(kvm)KVM_PHYS_SHIFT +#define kvm_phys_size(kvm) (1ULL << kvm_phys_shift(kvm)) +#define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - 1ULL) +#define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK + +#define stage2_pgd_size(kvm) (PTRS_PER_S2_PGD * sizeof(pgd_t)) + int create_hyp_mappings(void *from, void *to, pgprot_t prot); int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size, void __iomem **kaddr, diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index 460d616..e22ae94 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -19,43 +19,45 @@ #ifndef __ARM_S2_PGTABLE_H_ #define __ARM_S2_PGTABLE_H_ -#define stage2_pgd_none(pgd) pgd_none(pgd) -#define stage2_pgd_clear(pgd) pgd_clear(pgd) -#define stage2_pgd_present(pgd)pgd_present(pgd) -#define stage2_pgd_populate(pgd, pud) pgd_populate(NULL, pgd, pud) -#define stage2_pud_offset(pgd, address)pud_offset(pgd, address) -#define stage2_pud_free(pud) pud_free(NULL, pud) +#define stage2_pgd_none(kvm, pgd) pgd_none(pgd) +#define stage2_pgd_clear(kvm, pgd) pgd_clear(pgd) +#define stage2_pgd_present(kvm, pgd) pgd_present(pgd) +#define stage2_pgd_populate(kvm, pgd, pud) pgd_populate(NULL, pgd, pud) +#define stage2_pud_offset(kvm, pgd, address) pud_offset(pgd, address) +#define stage2_pud_free(kvm, pud) pud_free(NULL, pud) -#define stage2_pud_none(pud) pud_none(pud) -#define stage2_pud_clear(pud) pud_clear(pud) -#define stage2_pud_present(pud)pud_present(pud) -#define stage2_pud_populate(pud, pmd) 
pud_populate(NULL, pud, pmd) -#define stage2_pmd_offset(pud, address)pmd_offset(pud, address) -#define stage2_pmd_free(pmd) pmd_free(NULL, pmd) +#define stage2_pud_none(kvm, pud) pud_none(pud) +#define stage2_pud_clear(kvm, pud) pud_clear(pud) +#define stage2_pud_present(kvm, pud) pud_present(pud) +#define stage2_pud_populate(kvm, pud, pmd) pud_populate(NULL, pud, pmd) +#define stage2_pmd_offset(kvm, pud, address) pmd_offset(pud,
[Qemu-devel] [PATCH v3 11/20] kvm: arm64: Helper for computing VTCR_EL2.SL0
VTCR_EL2 holds the following key stage2 translation table parameters: SL0 - Entry level in the page table lookup. T0SZ - Denotes the size of the memory addressed by the table. We have been using fixed values for the SL0 depending on the page size, as we have a fixed IPA size. But since we are about to make it dynamic, we need to calculate the SL0 at runtime per VM. This patch adds a helper to compute the value of SL0 for a given IPA. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since v2: - Part 2 of split from VTCR & VTTBR dynamic configuration --- arch/arm64/include/asm/kvm_arm.h | 35 --- 1 file changed, 32 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index c557f45..11a7db0 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -153,7 +153,8 @@ * 2 level page tables (SL = 1) */ #define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_64K | VTCR_EL2_SL0_LVL1) -#define VTTBR_X_TGRAN_MAGIC38 +#define VTCR_EL2_TGRAN_SL0_BASE3UL + #elif defined(CONFIG_ARM64_16K_PAGES) /* * Stage2 translation configuration: @@ -161,7 +162,7 @@ * 2 level page tables (SL = 1) */ #define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_16K | VTCR_EL2_SL0_LVL1) -#define VTTBR_X_TGRAN_MAGIC42 +#define VTCR_EL2_TGRAN_SL0_BASE3UL #else /* 4K */ /* * Stage2 translation configuration: @@ -169,11 +170,39 @@ * 3 level page tables (SL = 1) */ #define VTCR_EL2_TGRAN_FLAGS (VTCR_EL2_TG0_4K | VTCR_EL2_SL0_LVL1) -#define VTTBR_X_TGRAN_MAGIC37 +#define VTCR_EL2_TGRAN_SL0_BASE2UL #endif #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) /* + * VTCR_EL2:SL0 indicates the entry level for Stage2 translation. + * Interestingly, it depends on the page size.
+ * See D.10.2.110, VTCR_EL2, in ARM DDI 0487B.b + * + * - + * | Entry level | 4K | 16K/64K | + * -- + * | Level: 0 | 2 | - | + * -- + * | Level: 1 | 1 | 2 | + * -- + * | Level: 2 | 0 | 1 | + * -- + * | Level: 3 | - | 0 | + * -- + * + * That table roughly translates to : + * + * SL0(PAGE_SIZE, Entry_level) = SL0_BASE(PAGE_SIZE) - Entry_Level + * + * Where SL0_BASE(4K) = 2 and SL0_BASE(16K) = 3, SL0_BASE(64K) = 3, provided + * we take care of ruling out the unsupported cases and + * Entry_Level = 4 - Number_of_levels. + * + */ +#define VTCR_EL2_SL0(levels) \ + ((VTCR_EL2_TGRAN_SL0_BASE - (4 - (levels))) << VTCR_EL2_SL0_SHIFT) +/* * ARM VMSAv8-64 defines an algorithm for finding the translation table * descriptors in section D4.2.8 in ARM DDI 0487B.b. * -- 2.7.4
[Qemu-devel] [PATCH v3 12/20] kvm: arm64: Add helper for loading the stage2 setting for a VM
We load the stage2 context of a guest for different operations, including running the guest and tlb maintenance on behalf of the guest. As of now only the vttbr is private to the guest, but this is about to change with IPA per VM. Add a helper to load the stage2 configuration for a VM, which could do the right thing with the future changes. Cc: Christoffer Dall Cc: Marc Zyngier Signed-off-by: Suzuki K Poulose --- Changes since v2: - New patch --- arch/arm64/include/asm/kvm_hyp.h | 6 ++ arch/arm64/kvm/hyp/switch.c | 2 +- arch/arm64/kvm/hyp/tlb.c | 4 ++-- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h index 384c343..82f9994 100644 --- a/arch/arm64/include/asm/kvm_hyp.h +++ b/arch/arm64/include/asm/kvm_hyp.h @@ -155,5 +155,11 @@ void deactivate_traps_vhe_put(void); u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt); void __noreturn __hyp_do_panic(unsigned long, ...); +/* Must be called from hyp code running at EL2 */ +static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm) +{ + write_sysreg(kvm->arch.vttbr, vttbr_el2); +} + #endif /* __ARM64_KVM_HYP_H__ */ diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index d496ef5..355fb25 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -195,7 +195,7 @@ void deactivate_traps_vhe_put(void) static void __hyp_text __activate_vm(struct kvm *kvm) { - write_sysreg(kvm->arch.vttbr, vttbr_el2); + __load_guest_stage2(kvm); } static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu) diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c index 131c777..4dbd9c6 100644 --- a/arch/arm64/kvm/hyp/tlb.c +++ b/arch/arm64/kvm/hyp/tlb.c @@ -30,7 +30,7 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm) * bits. Changing E2H is impossible (goodbye TTBR1_EL2), so * let's flip TGE before executing the TLB operation. 
*/ - write_sysreg(kvm->arch.vttbr, vttbr_el2); + __load_guest_stage2(kvm); val = read_sysreg(hcr_el2); val &= ~HCR_TGE; write_sysreg(val, hcr_el2); @@ -39,7 +39,7 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm) static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm) { - write_sysreg(kvm->arch.vttbr, vttbr_el2); + __load_guest_stage2(kvm); isb(); } -- 2.7.4
[Qemu-devel] [PATCH v3 05/20] kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
So far we have only supported a 3-level page table with a fixed IPA of 40 bits. Fix stage2_flush_memslot() to accommodate 4-level tables. Cc: Marc Zyngier Acked-by: Christoffer Dall Signed-off-by: Suzuki K Poulose --- virt/kvm/arm/mmu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 1d90d79..061e6b3 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -379,7 +379,8 @@ static void stage2_flush_memslot(struct kvm *kvm, pgd = kvm->arch.pgd + stage2_pgd_index(addr); do { next = stage2_pgd_addr_end(addr, end); - stage2_flush_puds(kvm, pgd, addr, next); + if (!stage2_pgd_none(*pgd)) + stage2_flush_puds(kvm, pgd, addr, next); } while (pgd++, addr = next, addr != end); } -- 2.7.4
[Qemu-devel] [PATCH v3 10/20] kvm: arm64: Dynamic configuration of VTTBR mask
On arm64, VTTBR_EL2:BADDR holds the base address for the stage2 translation table. The Arm ARM mandates that the bits BADDR[x-1:0] should be 0, where 'x' is defined for a given IPA size and the number of levels for a translation granule size. It is defined using some magical constants. This patch is a reverse-engineered implementation to calculate the 'x' at runtime for a given IPA and number of page table levels. See patch for more details. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2: - Part 1 of split from VTCR & VTTBR dynamic configuration --- arch/arm64/include/asm/kvm_arm.h | 60 +--- arch/arm64/include/asm/kvm_mmu.h | 25 - 2 files changed, 80 insertions(+), 5 deletions(-) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 3dffd38..c557f45 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -140,8 +140,6 @@ * Note that when using 4K pages, we concatenate two first level page tables * together. With 16K pages, we concatenate 16 first level page tables. * - * The magic numbers used for VTTBR_X in this patch can be found in Tables - * D4-23 and D4-25 in ARM DDI 0487A.b. */ #define VTCR_EL2_T0SZ_IPA VTCR_EL2_T0SZ_40B @@ -175,9 +173,63 @@ #endif #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) -#define VTTBR_X(VTTBR_X_TGRAN_MAGIC - VTCR_EL2_T0SZ_IPA) +/* + * ARM VMSAv8-64 defines an algorithm for finding the translation table + * descriptors in section D4.2.8 in ARM DDI 0487B.b. + * + * The algorithm defines the expectations on the BaseAddress (for the page + * table) bits resolved at each level based on the page size, entry level + * and T0SZ. The variable "x" in the algorithm also affects the VTTBR:BADDR + * for stage2 page table.
+ * + * The value of "x" is calculated as : + * x = Magic_N - T0SZ + * + * where Magic_N is an integer depending on the page size and the entry + * level of the page table as below: + * + * + * | Entry level | 4K16K 64K | + * + * | Level: 0 (4 levels) | 28 | - | - | + * + * | Level: 1 (3 levels) | 37 | 31 | 25 | + * + * | Level: 2 (2 levels) | 46 | 42 | 38 | + * + * | Level: 3 (1 level)| -| 53 | 51 | + * + * + * We have a magic formula for the Magic_N below. + * + * Magic_N(PAGE_SIZE, Entry_Level) = 64 - ((PAGE_SHIFT - 3) * Number of levels) + * + * where number of levels = (4 - Entry_Level). + * + * So, given that T0SZ = (64 - PA_SHIFT), we can compute 'x' as follows: + * + * x = (64 - ((PAGE_SHIFT - 3) * Number_of_levels)) - (64 - PA_SHIFT) + * = PA_SHIFT - ((PAGE_SHIFT - 3) * Number of levels) + * + * Here is one way to explain the Magic Formula: + * + * x = log2(Size_of_Entry_Level_Table) + * + * Since, we can resolve (PAGE_SHIFT - 3) bits at each level, and another + * PAGE_SHIFT bits in the PTE, we have : + * + * Bits_Entry_level = PA_SHIFT - ((PAGE_SHIFT - 3) * (n - 1) + PAGE_SHIFT) + * = PA_SHIFT - (PAGE_SHIFT - 3) * n - 3 + * where n = number of levels, and since each pointer is 8bytes, we have: + * + * x = Bits_Entry_Level + 3 + *= PA_SHIFT - (PAGE_SHIFT - 3) * n + * + * The only constraint here is that, we have to find the number of page table + * levels for a given IPA size (which we do, see stage2_pt_levels()) + */ +#define ARM64_VTTBR_X(ipa, levels) ((ipa) - ((levels) * (PAGE_SHIFT - 3))) -#define VTTBR_BADDR_MASK (((UL(1) << (PHYS_MASK_SHIFT - VTTBR_X)) - 1) << VTTBR_X) #define VTTBR_VMID_SHIFT (UL(48)) #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT) diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index a351722..813a72a 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -146,7 +146,6 @@ static inline unsigned long __kern_hyp_va(unsigned long v) #define 
kvm_phys_shift(kvm)KVM_PHYS_SHIFT #define kvm_phys_size(kvm) (_AC(1, ULL) << kvm_phys_shift(kvm)) #define kvm_phys_mask(kvm) (kvm_phys_size(kvm) - _AC(1, ULL)) -#define kvm_vttbr_baddr_mask(kvm) VTTBR_BADDR_MASK static inline bool kvm_page_empty(void *ptr) { @@ -503,6 +502,30 @@ static inline int hyp_map_aux_data(void) #define kvm_phys_to_vttbr(addr)phys_to_ttbr(addr) +/* + * Get the magic number 'x' for VTTBR:BADDR of this KVM instance. + * With v8.2 LVA extensions, 'x' should be a minimum of 6 with + * 52bit IPS. + */ +sta
[Qemu-devel] [PATCH v3 06/20] kvm: arm/arm64: Remove spurious WARN_ON
On a 4-level page table, a pgd entry can be empty, unlike in a 3-level page table. Remove the spurious WARN_ON() in stage2_get_pud(). Cc: Marc Zyngier Acked-by: Christoffer Dall Signed-off-by: Suzuki K Poulose --- virt/kvm/arm/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 061e6b3..308171c 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -976,7 +976,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache pud_t *pud; pgd = kvm->arch.pgd + stage2_pgd_index(addr); - if (WARN_ON(stage2_pgd_none(*pgd))) { + if (stage2_pgd_none(*pgd)) { if (!cache) return NULL; pud = mmu_memory_cache_alloc(cache); -- 2.7.4
[Qemu-devel] [PATCH v3 02/20] virtio: pci-legacy: Validate queue pfn
Legacy PCI over virtio uses a 32bit PFN for the queue. If the queue pfn is too large to fit in 32bits, which we could hit on arm64 systems with 52bit physical addresses (even with 64K page size), we simply miss out a proper link to the other side of the queue. Add a check to validate the PFN, rather than silently breaking the devices. Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Marc Zyngier Cc: Christoffer Dall Cc: Peter Maydel Cc: Jean-Philippe Brucker Signed-off-by: Suzuki K Poulose --- Changes since v2: - Change errno to -E2BIG --- drivers/virtio/virtio_pci_legacy.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c index 2780886..c0d6987a 100644 --- a/drivers/virtio/virtio_pci_legacy.c +++ b/drivers/virtio/virtio_pci_legacy.c @@ -122,6 +122,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, struct virtqueue *vq; u16 num; int err; + u64 q_pfn; /* Select the queue we're interested in */ iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL); @@ -141,9 +142,15 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, if (!vq) return ERR_PTR(-ENOMEM); + q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT; + if (q_pfn >> 32) { + dev_err(_dev->pci_dev->dev, "virtio-pci queue PFN too large\n"); + err = -E2BIG; + goto out_del_vq; + } + /* activate the queue */ - iowrite32(virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT, - vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); + iowrite32(q_pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY; @@ -160,6 +167,7 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, out_deactivate: iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); +out_del_vq: vring_del_virtqueue(vq); return ERR_PTR(err); } -- 2.7.4
[Qemu-devel] [PATCH v3 04/20] kvm: arm64: Clean up VTCR_EL2 initialisation
Use the new helper for converting the parange to the physical shift. Also, add the missing definitions for the VTCR_EL2 register fields and use them instead of hard coding numbers. Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2 - Part 2 of the split from original patch. - Also add missing VTCR field helpers and use them. --- arch/arm64/include/asm/kvm_arm.h | 3 +++ arch/arm64/kvm/hyp/s2-setup.c| 30 ++ 2 files changed, 9 insertions(+), 24 deletions(-) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 6dd285e..3dffd38 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -106,6 +106,7 @@ #define VTCR_EL2_RES1 (1 << 31) #define VTCR_EL2_HD(1 << 22) #define VTCR_EL2_HA(1 << 21) +#define VTCR_EL2_PS_SHIFT TCR_EL2_PS_SHIFT #define VTCR_EL2_PS_MASK TCR_EL2_PS_MASK #define VTCR_EL2_TG0_MASK TCR_TG0_MASK #define VTCR_EL2_TG0_4KTCR_TG0_4K @@ -126,6 +127,8 @@ #define VTCR_EL2_VS_8BIT (0 << VTCR_EL2_VS_SHIFT) #define VTCR_EL2_VS_16BIT (1 << VTCR_EL2_VS_SHIFT) +#define VTCR_EL2_T0SZ(x) TCR_T0SZ(x) + /* * We configure the Stage-2 page tables to always restrict the IPA space to be * 40 bits wide (T0SZ = 24). Systems with a PARange smaller than 40 bits are diff --git a/arch/arm64/kvm/hyp/s2-setup.c b/arch/arm64/kvm/hyp/s2-setup.c index 603e1ee..81094f1 100644 --- a/arch/arm64/kvm/hyp/s2-setup.c +++ b/arch/arm64/kvm/hyp/s2-setup.c @@ -19,11 +19,13 @@ #include #include #include +#include u32 __hyp_text __init_stage2_translation(void) { u64 val = VTCR_EL2_FLAGS; u64 parange; + u32 phys_shift; u64 tmp; /* @@ -34,30 +36,10 @@ u32 __hyp_text __init_stage2_translation(void) parange = read_sysreg(id_aa64mmfr0_el1) & 7; if (parange > ID_AA64MMFR0_PARANGE_MAX) parange = ID_AA64MMFR0_PARANGE_MAX; - val |= parange << 16; + val |= parange << VTCR_EL2_PS_SHIFT; /* Compute the actual PARange... 
*/ - switch (parange) { - case 0: - parange = 32; - break; - case 1: - parange = 36; - break; - case 2: - parange = 40; - break; - case 3: - parange = 42; - break; - case 4: - parange = 44; - break; - case 5: - default: - parange = 48; - break; - } + phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange); /* * ... and clamp it to 40 bits, unless we have some braindead @@ -65,7 +47,7 @@ u32 __hyp_text __init_stage2_translation(void) * return that value for the rest of the kernel to decide what * to do. */ - val |= 64 - (parange > 40 ? 40 : parange); + val |= VTCR_EL2_T0SZ(phys_shift > 40 ? 40 : phys_shift); /* * Check the availability of Hardware Access Flag / Dirty Bit @@ -86,5 +68,5 @@ u32 __hyp_text __init_stage2_translation(void) write_sysreg(val, vtcr_el2); - return parange; + return phys_shift; } -- 2.7.4
[Qemu-devel] [PATCH v3 03/20] arm64: Add a helper for PARange to physical shift conversion
On arm64, ID_AA64MMFR0_EL1.PARange encodes the maximum Physical Address range supported by the CPU. Add a helper to decode this to the actual physical shift. If we hit an unallocated value, return the maximum range supported by the kernel. This will be used by KVM to set VTCR_EL2.T0SZ, as that code is about to move; having the helper keeps the code movement cleaner. Cc: Catalin Marinas Cc: Marc Zyngier Cc: James Morse Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since V2: - Split the patch - Limit the physical shift only for values unrecognized. --- arch/arm64/include/asm/cpufeature.h | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 1717ba1..855cf0e 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -530,6 +530,19 @@ void arm64_set_ssbd_mitigation(bool state); static inline void arm64_set_ssbd_mitigation(bool state) {} #endif +static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange) +{ + switch (parange) { + case 0: return 32; + case 1: return 36; + case 2: return 40; + case 3: return 42; + case 4: return 44; + case 5: return 48; + case 6: return 52; + default: return CONFIG_ARM64_PA_BITS; + } +} #endif /* __ASSEMBLY__ */ #endif -- 2.7.4
[Qemu-devel] [PATCH v3 00/20] arm64: Dynamic & 52bit IPA support
48-to-52 bit conversion for GIC ITS BASER. (suggested by Christoffer) - Split virtio PFN check patches and address comments. Kristina Martsenko (1): vgic: Add support for 52bit guest physical address Suzuki K Poulose (19): virtio: mmio-v1: Validate queue PFN virtio: pci-legacy: Validate queue pfn arm64: Add a helper for PARange to physical shift conversion kvm: arm64: Clean up VTCR_EL2 initialisation kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table kvm: arm/arm64: Remove spurious WARN_ON kvm: arm/arm64: Prepare for VM specific stage2 translations kvm: arm/arm64: Abstract stage2 pgd table allocation kvm: arm64: Make stage2 page table layout dynamic kvm: arm64: Dynamic configuration of VTTBR mask kvm: arm64: Helper for computing VTCR_EL2.SL0 kvm: arm64: Add helper for loading the stage2 setting for a VM kvm: arm64: Configure VTCR per VM kvm: arm/arm64: Expose supported physical address limit for VM kvm: arm/arm64: Allow tuning the physical address size for VM kvm: arm64: Switch to per VM IPA limit kvm: arm64: Add support for handling 52bit IPA kvm: arm64: Allow IPA size supported by the system kvm: arm64: Fall back to normal stage2 entry level Documentation/virtual/kvm/api.txt | 15 ++ arch/arm/include/asm/kvm_arm.h| 3 +- arch/arm/include/asm/kvm_mmu.h| 28 +++- arch/arm/include/asm/stage2_pgtable.h | 42 ++--- arch/arm64/include/asm/cpufeature.h | 13 ++ arch/arm64/include/asm/kvm_arm.h | 137 ++--- arch/arm64/include/asm/kvm_asm.h | 2 +- arch/arm64/include/asm/kvm_host.h | 19 ++- arch/arm64/include/asm/kvm_hyp.h | 16 ++ arch/arm64/include/asm/kvm_mmu.h | 92 ++- arch/arm64/include/asm/stage2_pgtable-nopmd.h | 42 - arch/arm64/include/asm/stage2_pgtable-nopud.h | 39 - arch/arm64/include/asm/stage2_pgtable.h | 213 +++--- arch/arm64/kvm/guest.c| 42 + arch/arm64/kvm/hyp/s2-setup.c | 37 + arch/arm64/kvm/hyp/switch.c | 4 +- arch/arm64/kvm/hyp/tlb.c | 4 +- drivers/virtio/virtio_mmio.c | 18 ++- drivers/virtio/virtio_pci_legacy.c| 12 +- 
include/linux/irqchip/arm-gic-v3.h| 5 + include/uapi/linux/kvm.h | 16 ++ virt/kvm/arm/arm.c| 32 +++- virt/kvm/arm/mmu.c| 124 --- virt/kvm/arm/vgic/vgic-its.c | 36 ++--- virt/kvm/arm/vgic/vgic-kvm-device.c | 2 +- virt/kvm/arm/vgic/vgic-mmio-v3.c | 2 - 26 files changed, 663 insertions(+), 332 deletions(-) delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h kvmtool patches : Suzuki K Poulose (4): kvmtool: Allow backends to run checks on the KVM device fd kvmtool: arm64: Add support for guest physical address size kvmtool: arm64: Switch memory layout kvmtool: arm: Add support for creating VM with PA size arm/aarch32/include/kvm/kvm-arch.h| 6 -- arm/aarch64/include/kvm/kvm-arch.h| 15 --- arm/aarch64/include/kvm/kvm-config-arch.h | 5 - arm/include/arm-common/kvm-arch.h | 17 +++-- arm/include/arm-common/kvm-config-arch.h | 1 + arm/kvm.c | 24 +++- include/kvm/kvm.h | 4 kvm.c | 2 ++ 8 files changed, 61 insertions(+), 13 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH v3 01/20] virtio: mmio-v1: Validate queue PFN
virtio-mmio with virtio-v1 uses a 32bit PFN for the queue. If the queue pfn is too large to fit in 32bits, which we could hit on arm64 systems with 52bit physical addresses (even with 64K page size), we simply lose a proper link to the other side of the queue. Add a check to validate the PFN, rather than silently breaking the devices. Cc: "Michael S. Tsirkin" Cc: Jason Wang Cc: Marc Zyngier Cc: Christoffer Dall Cc: Peter Maydel Cc: Jean-Philippe Brucker Signed-off-by: Suzuki K Poulose --- Changes since v2: - Change errno to -E2BIG --- drivers/virtio/virtio_mmio.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 67763d3..82cedc8 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -397,9 +397,21 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index, /* Activate the queue */ writel(virtqueue_get_vring_size(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NUM); if (vm_dev->version == 1) { + u64 q_pfn = virtqueue_get_desc_addr(vq) >> PAGE_SHIFT; + + /* +* virtio-mmio v1 uses a 32bit QUEUE PFN. If we have something +* that doesn't fit in 32bit, fail the setup rather than +* pretending to be successful. +*/ + if (q_pfn >> 32) { + dev_err(>dev, "virtio-mmio: queue address too large\n"); + err = -E2BIG; + goto error_bad_pfn; + } + writel(PAGE_SIZE, vm_dev->base + VIRTIO_MMIO_QUEUE_ALIGN); - writel(virtqueue_get_desc_addr(vq) >> PAGE_SHIFT, - vm_dev->base + VIRTIO_MMIO_QUEUE_PFN); + writel(q_pfn, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN); } else { u64 addr; @@ -430,6 +442,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index, return vq; +error_bad_pfn: + vring_del_virtqueue(vq); error_new_virtqueue: if (vm_dev->version == 1) { writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN); -- 2.7.4
Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
On 09/05/16 04:30, Vijay Kilari wrote: Hi Suzuki, The last 5 patches are not compiling on v4.4. Looks like your patch series is not merged completely. Can you please rebase your patches and let me know. Could you please give the tree below a try ? git://linux-arm.org/linux-skp.git cpu-ftr/v3-4.3-rc4 This works. Now the question is, Are your patches getting merged anytime soon?. Well, we have been waiting for a use case, like this, before we merge the series. Will, Catalin, Now that we have some real users of the infrastructure, what do you think ? I can post an updated/rebased series, if you would like. Suzuki If not, I prefer to go with /proc/cpuinfo. Another solution is look for /sys/devices/system/cpu/cpu$ID/identification/midr if not available then fall back on /proc/cpuinfo. Regards Vijay
Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
On 13/04/16 10:54, Vijay Kilari wrote: On Mon, Apr 11, 2016 at 3:07 PM, Suzuki K Poulose <suzuki.poul...@arm.com> wrote: On 11/04/16 07:52, Vijay Kilari wrote: Hi Suzuki, The last 5 patches are not compiling on v4.4. Looks like your patch series is not merged completely. Can you please rebase your patches and let me know. Could you please give the tree below a try ? git://linux-arm.org/linux-skp.git cpu-ftr/v3-4.3-rc4 Cheers Suzuki
Re: [Qemu-devel] [RFC PATCH v2 2/3] utils: Add cpuinfo helper to fetch /proc/cpuinfo
On 11/04/16 07:52, Vijay Kilari wrote: Adding Suzuki Poulose. Hi Suzuki, On Fri, Apr 8, 2016 at 3:13 PM, Peter Maydellwrote: On 8 April 2016 at 07:21, Vijay Kilari wrote: On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell wrote: I'm told there are kernel patches in progress to get this sort of information in a maintainable way to userspace, which are currently somewhat stalled due to lack of anybody who wants to consume it. If you have a use case then you should probably flag it up with the kernel devs. Can you please give references to those patches/discussion? I'm told the most recent thread is https://lkml.org/lkml/2015/10/5/517 (and that most of the patches in that series have gone in, except for the last 4 or 5 which implement the ABI). Can you please throw some light on what is the status of ABI to read cpu information in user space. I wanted to know cpu implementer, part number in QEMU utils to add prefetches to speed up live migration for Thunderx platform. As for the patch series, except for that last 5 patches (which actually implements the ABI), the infrastructure patches have been merged in v4.4. We are awaiting feedback from possible consumers like toolchain (gcc, glibc). If you think this will be suitable for you, thats good to know. There is documentation available in the last patch in the above series. Could you please try the series (on v4.4, which would be easier, by simply picking up the last 5 patches) and let us know if that works for you ? Cheers Suzuki
[Qemu-devel] Qemu s390x emulation
Hi I have been trying to setup a qemu session for qemu-system-s390x (on x86_64) using a kernel (with initramfs built-in the kernel) without a disk image. The kernel was built with s390 defconfig + disabled loadable modules (just to keep everything inside the kernel). $ qemu-system-s390x -M s390 -kernel vmlinux -m 1024 The session dies in say 2 secs, with an exit code of 0. I searched for some hints / success stories, couldn't find any. Am I doing something wrong here ? Please let me know the right procedure for getting this up and running. Thanks Suzuki
Re: [Qemu-devel] Qemu s390x emulation
On 01/15/2013 04:39 PM, Alexander Graf wrote: On 15.01.2013, at 12:05, Suzuki K. Poulose wrote: Hi I have been trying to setup a qemu session for qemu-system-s390x (on x86_64) using a kernel (with initramfs built-in the kernel) without a disk image. The kernel was built with s390 defconfig + disabled loadable modules (just to keep everything inside the kernel). $ qemu-system-s390x -M s390 -kernel vmlinux -m 1024 The session dies in say 2 secs, with an exit code of 0. I searched for some hints / success stories, couldn't find any. Am I doing something wrong here ? Please let me know the right procedure for getting this up and running. S390 boots using an image file. Please try -kernel kernel dir/arch/s390/boot/image. Tried that even, but not any better. btw, moved to the upstream git for qemu. 0 $/data/src/qemu/s390x-softmmu/qemu-system-s390x -m 1024 -kernel ./image -nographic $echo $? 0 $file ./image ./image: Linux S390 $ cd /data/src/qemu/ ; git log | head -n1 commit cf7c3f0cb5a7129f57fa9e69d410d6a05031988c Thanks Suzuki Alex