Re: [PATCH v13 13/35] KVM: Introduce per-page memory attributes
On Thu, 2023-11-02 at 11:32 +0100, Paolo Bonzini wrote:
> > > +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> > > +static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> > > +{
> > > +	return xa_to_value(xa_load(&kvm->mem_attr_array, gfn));
> > > +}
> >
> > Only call xa_to_value() when xa_load() returns !NULL?
>
> This xarray does not store a pointer, therefore xa_load() actually
> returns an integer that is tagged with 1 in the low bit:
>
> static inline unsigned long xa_to_value(const void *entry)
> {
> 	return (unsigned long)entry >> 1;
> }
>
> Returning zero for an empty entry is okay, so the result of xa_load()
> can be used directly.

Thanks for explaining. I was thinking perhaps it's better to do:

	void *entry = xa_load(...);

	return xa_is_value(entry) ? xa_to_value(entry) : 0;

But "NULL (0) >> 1" is still 0, so yes we can use directly.
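The point settled above can be checked outside the kernel. The following is a minimal userspace sketch, not kernel code: the three xa_*() helpers are re-typed here to mirror the value-entry helpers in include/linux/xarray.h, while attrs_direct() and attrs_checked() are hypothetical names invented for this illustration. It compares the "use the result directly" form from the patch with the guarded form proposed in the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute bit as defined by the patch under review. */
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)

/* Userspace copies of the xarray value-entry helpers (illustration only). */
static inline void *xa_mk_value(unsigned long v)
{
	/* A value entry is the integer shifted left by one with bit 0 set. */
	return (void *)(uintptr_t)((v << 1) | 1);
}

static inline bool xa_is_value(const void *entry)
{
	/* Bit 0 set means "tagged integer", not a pointer. */
	return (uintptr_t)entry & 1;
}

static inline unsigned long xa_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}

/* Hypothetical helper: what the patch does, no NULL check. */
static inline unsigned long attrs_direct(const void *entry)
{
	return xa_to_value(entry);
}

/* Hypothetical helper: the guarded variant suggested in the reply. */
static inline unsigned long attrs_checked(const void *entry)
{
	return xa_is_value(entry) ? xa_to_value(entry) : 0;
}
```

For the two cases that matter here, an empty slot (NULL) and a stored value entry, both helpers return the same number, which is why the unguarded form is sufficient as long as the array only ever holds value entries.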
Re: [PATCH v13 13/35] KVM: Introduce per-page memory attributes
On 11/2/23 04:01, Huang, Kai wrote:
> On Fri, 2023-10-27 at 11:21 -0700, Sean Christopherson wrote:
> > From: Chao Peng
> >
> > In confidential computing usages, whether a page is private or shared is
> > necessary information for KVM to perform operations like page fault
> > handling, page zapping etc. There are other potential use cases for
> > per-page memory attributes, e.g. to make memory read-only (or no-exec,
> > or exec-only, etc.) without having to modify memslots.
> >
> > Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow
> > userspace to operate on the per-page memory attributes.
> >   - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to
> >     a guest memory range.
> >   - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
> >     memory attributes.
> >
> > Use an xarray to store the per-page attributes internally, with a naive,
> > not fully optimized implementation, i.e. prioritize correctness over
> > performance for the initial implementation.
> >
> > Use bit 3 for the PRIVATE attribute so that KVM can use bits 0-2 for RWX
> > attributes/protections in the future, e.g. to give userspace fine-grained
> > control over read, write, and execute protections for guest memory.
> >
> > Provide arch hooks for handling attribute changes before and after common
> > code sets the new attributes, e.g. x86 will use the "pre" hook to zap all
> > relevant mappings, and the "post" hook to track whether or not hugepages
> > can be used to map the range.
> >
> > To simplify the implementation wrap the entire sequence with
> > kvm_mmu_invalidate_{begin,end}() even though the operation isn't strictly
> > guaranteed to be an invalidation. For the initial use case, x86 *will*
> > always invalidate memory, and preventing arch code from creating new
> > mappings while the attributes are in flux makes it much easier to reason
> > about the correctness of consuming attributes.
> >
> > It's possible that future usages may not require an invalidation, e.g.
> > if KVM ends up supporting RWX protections and userspace grants _more_
> > protections, but again opt for simplicity and punt optimizations to
> > if/when they are needed.
> >
> > Suggested-by: Sean Christopherson
> > Link: https://lore.kernel.org/all/y2wb48kd0j4vg...@google.com
> > Cc: Fuad Tabba
> > Cc: Xu Yilun
> > Cc: Mickaël Salaün
> > Signed-off-by: Chao Peng
> > Co-developed-by: Sean Christopherson
> > Signed-off-by: Sean Christopherson
> > [...]
> > +Note, there is no "get" API. Userspace is responsible for explicitly tracking
> > +the state of a gfn/page as needed.
> > +
> > [...]
> > +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> > +static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> > +{
> > +	return xa_to_value(xa_load(&kvm->mem_attr_array, gfn));
> > +}
>
> Only call xa_to_value() when xa_load() returns !NULL?

This xarray does not store a pointer, therefore xa_load() actually
returns an integer that is tagged with 1 in the low bit:

static inline unsigned long xa_to_value(const void *entry)
{
	return (unsigned long)entry >> 1;
}

Returning zero for an empty entry is okay, so the result of xa_load()
can be used directly.

> > +
> > +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> > +				     unsigned long attrs);
>
> Seems it's not immediately clear why this function is needed in this patch,
> especially when you said there is no "get" API above. Add some material to
> changelog?

It's used by later patches; even without a "get" API, it's a pretty
fundamental functionality.

> > +bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
> > +					struct kvm_gfn_range *range);
> > +bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> > +					 struct kvm_gfn_range *range);
>
> Looks if this Kconfig is on, the above two arch hooks won't have
> implementation.
>
> Is it better to have two __weak empty versions here in this patch?
>
> Anyway, from the changelog it seems it's not mandatory for some ARCH to
> provide the above two if one wants to turn this on, i.e., the two hooks can
> be empty and the ARCH can just use the __weak version.

I think this can be added by the first arch that needs memory attributes
and also doesn't need one of these hooks. Or perhaps the x86
kvm_arch_pre_set_memory_attributes() could be made generic and thus that
would be the __weak version. It's too early to tell, so it's better to
leave the implementation to the architectures for now.

Paolo
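For reference, the "__weak empty versions" idea discussed above would look roughly like the sketch below. This illustrates the suggestion only, not what the series does, since the reply leaves the hooks to the architectures. It is written for userspace with GCC or Clang: DEMO_WEAK is a hypothetical stand-in for the kernel's __weak macro, the two structs are opaque placeholders, and the assumption that returning false means "nothing was zapped, no flush required" is mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's __weak annotation (assumption: GCC/Clang). */
#define DEMO_WEAK __attribute__((weak))

/* Opaque placeholders for the kernel types named in the patch. */
struct kvm;
struct kvm_gfn_range;

/*
 * Weak default for the "pre" hook: an architecture that needs real work
 * would provide a normal (strong) definition, which the linker prefers.
 */
DEMO_WEAK bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
						  struct kvm_gfn_range *range)
{
	(void)kvm;
	(void)range;
	return false;	/* assumed meaning: nothing invalidated */
}

/* Weak default for the "post" hook, same idea. */
DEMO_WEAK bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
						   struct kvm_gfn_range *range)
{
	(void)kvm;
	(void)range;
	return false;
}
```

With no strong definition linked in, calls resolve to these defaults. The trade-off raised in the reply still applies: a silent default can hide an architecture that forgot to implement a hook it actually needs.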
Re: [PATCH v13 13/35] KVM: Introduce per-page memory attributes
On Fri, 2023-10-27 at 11:21 -0700, Sean Christopherson wrote:
> From: Chao Peng
>
> In confidential computing usages, whether a page is private or shared is
> necessary information for KVM to perform operations like page fault
> handling, page zapping etc. There are other potential use cases for
> per-page memory attributes, e.g. to make memory read-only (or no-exec,
> or exec-only, etc.) without having to modify memslots.
>
> Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow
> userspace to operate on the per-page memory attributes.
>   - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to
>     a guest memory range.
>   - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
>     memory attributes.
>
> Use an xarray to store the per-page attributes internally, with a naive,
> not fully optimized implementation, i.e. prioritize correctness over
> performance for the initial implementation.
>
> Use bit 3 for the PRIVATE attribute so that KVM can use bits 0-2 for RWX
> attributes/protections in the future, e.g. to give userspace fine-grained
> control over read, write, and execute protections for guest memory.
>
> Provide arch hooks for handling attribute changes before and after common
> code sets the new attributes, e.g. x86 will use the "pre" hook to zap all
> relevant mappings, and the "post" hook to track whether or not hugepages
> can be used to map the range.
>
> To simplify the implementation wrap the entire sequence with
> kvm_mmu_invalidate_{begin,end}() even though the operation isn't strictly
> guaranteed to be an invalidation. For the initial use case, x86 *will*
> always invalidate memory, and preventing arch code from creating new
> mappings while the attributes are in flux makes it much easier to reason
> about the correctness of consuming attributes.
>
> It's possible that future usages may not require an invalidation, e.g.
> if KVM ends up supporting RWX protections and userspace grants _more_
> protections, but again opt for simplicity and punt optimizations to
> if/when they are needed.
>
> Suggested-by: Sean Christopherson
> Link: https://lore.kernel.org/all/y2wb48kd0j4vg...@google.com
> Cc: Fuad Tabba
> Cc: Xu Yilun
> Cc: Mickaël Salaün
> Signed-off-by: Chao Peng
> Co-developed-by: Sean Christopherson
> Signed-off-by: Sean Christopherson
> [...]
> +Note, there is no "get" API. Userspace is responsible for explicitly tracking
> +the state of a gfn/page as needed.
> +
> [...]
>
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> +{
> +	return xa_to_value(xa_load(&kvm->mem_attr_array, gfn));
> +}

Only call xa_to_value() when xa_load() returns !NULL?

> +
> +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> +				     unsigned long attrs);

Seems it's not immediately clear why this function is needed in this patch,
especially when you said there is no "get" API above. Add some material to
changelog?

> +bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
> +					struct kvm_gfn_range *range);
> +bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> +					 struct kvm_gfn_range *range);

Looks if this Kconfig is on, the above two arch hooks won't have
implementation.

Is it better to have two __weak empty versions here in this patch?

Anyway, from the changelog it seems it's not mandatory for some ARCH to
provide the above two if one wants to turn this on, i.e., the two hooks can
be empty and the ARCH can just use the __weak version.
Re: [PATCH v13 13/35] KVM: Introduce per-page memory attributes
On 2023-10-27 11:21 AM, Sean Christopherson wrote:
> From: Chao Peng
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 89c1a991a3b8..df573229651b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -808,6 +809,9 @@ struct kvm {
>
>  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
>  	struct notifier_block pm_notifier;
> +#endif
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +	struct xarray mem_attr_array;

Please document how access to mem_attr_array is synchronized. If I'm
reading the code correctly I think it's...

	/* Protected by slots_lock (for writes) and RCU (for reads) */
Re: [PATCH v13 13/35] KVM: Introduce per-page memory attributes
On Mon, Oct 30, 2023, Sean Christopherson wrote:
> On Mon, Oct 30, 2023, Chao Gao wrote:
> > On Fri, Oct 27, 2023 at 11:21:55AM -0700, Sean Christopherson wrote:
> > >From: Chao Peng
> > >
> > >In confidential computing usages, whether a page is private or shared is
> > >necessary information for KVM to perform operations like page fault
> > >handling, page zapping etc. There are other potential use cases for
> > >per-page memory attributes, e.g. to make memory read-only (or no-exec,
> > >or exec-only, etc.) without having to modify memslots.
> > >
> > >Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow
> > >userspace to operate on the per-page memory attributes.
> > >  - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to
> > >    a guest memory range.
> >
> > >  - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
> > >    memory attributes.
> >
> > This ioctl() is already removed. So, the changelog is out-of-date and needs
> > an update.
>
> Doh, I lost track of this and the fixup for KVM_CAP_MEMORY_ATTRIBUTES below.
>
> > >+:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> > >+:Architectures: x86
> > >+:Type: vm ioctl
> > >+:Parameters: struct kvm_memory_attributes(in)
> >                                            ^ add one space here?
>
> Ah, yeah, that does appear to be the standard.
>
> > >
> > >+static bool kvm_pre_set_memory_attributes(struct kvm *kvm,
> > >+					  struct kvm_gfn_range *range)
> > >+{
> > >+	/*
> > >+	 * Unconditionally add the range to the invalidation set, regardless of
> > >+	 * whether or not the arch callback actually needs to zap SPTEs. E.g.
> > >+	 * if KVM supports RWX attributes in the future and the attributes are
> > >+	 * going from R=>RW, zapping isn't strictly necessary. Unconditionally
> > >+	 * adding the range allows KVM to require that MMU invalidations add at
> > >+	 * least one range between begin() and end(), e.g. allows KVM to detect
> > >+	 * bugs where the add() is missed. Rexlaing the rule *might* be safe,
> >
> > Relaxing
> >
> > >@@ -4640,6 +4850,17 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> > > 	case KVM_CAP_BINARY_STATS_FD:
> > > 	case KVM_CAP_SYSTEM_EVENT_DATA:
> > > 		return 1;
> > >+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> > >+	case KVM_CAP_MEMORY_ATTRIBUTES:
> > >+		u64 attrs = kvm_supported_mem_attributes(kvm);
> > >+
> > >+		r = -EFAULT;
> > >+		if (copy_to_user(argp, &attrs, sizeof(attrs)))
> > >+			goto out;
> > >+		r = 0;
> > >+		break;
> >
> > This cannot work, e.g., no @argp in this function and is fixed by a later
> > commit:
> >
> > 	fcbef1e5e5d2 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for
> > 	guest-specific backing memory")
>
> I'll post a fixup patch for all of these, thanks much!

Heh, that was an -ENOCOFFEE. Fixup patches for a changelog goof and an
ephemeral bug are going to be hard to post.

Paolo, do you want to take care of all of these fixups and typos, or would
you prefer that I start a v14 branch and then hand it off to you at some
point?
Re: [PATCH v13 13/35] KVM: Introduce per-page memory attributes
On Mon, Oct 30, 2023, Chao Gao wrote:
> On Fri, Oct 27, 2023 at 11:21:55AM -0700, Sean Christopherson wrote:
> >From: Chao Peng
> >
> >In confidential computing usages, whether a page is private or shared is
> >necessary information for KVM to perform operations like page fault
> >handling, page zapping etc. There are other potential use cases for
> >per-page memory attributes, e.g. to make memory read-only (or no-exec,
> >or exec-only, etc.) without having to modify memslots.
> >
> >Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow
> >userspace to operate on the per-page memory attributes.
> >  - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to
> >    a guest memory range.
>
> >  - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
> >    memory attributes.
>
> This ioctl() is already removed. So, the changelog is out-of-date and needs
> an update.

Doh, I lost track of this and the fixup for KVM_CAP_MEMORY_ATTRIBUTES below.

> >+:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> >+:Architectures: x86
> >+:Type: vm ioctl
> >+:Parameters: struct kvm_memory_attributes(in)
>                                            ^ add one space here?

Ah, yeah, that does appear to be the standard.

> >
> >+static bool kvm_pre_set_memory_attributes(struct kvm *kvm,
> >+					  struct kvm_gfn_range *range)
> >+{
> >+	/*
> >+	 * Unconditionally add the range to the invalidation set, regardless of
> >+	 * whether or not the arch callback actually needs to zap SPTEs. E.g.
> >+	 * if KVM supports RWX attributes in the future and the attributes are
> >+	 * going from R=>RW, zapping isn't strictly necessary. Unconditionally
> >+	 * adding the range allows KVM to require that MMU invalidations add at
> >+	 * least one range between begin() and end(), e.g. allows KVM to detect
> >+	 * bugs where the add() is missed. Rexlaing the rule *might* be safe,
>
> Relaxing
>
> >@@ -4640,6 +4850,17 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> > 	case KVM_CAP_BINARY_STATS_FD:
> > 	case KVM_CAP_SYSTEM_EVENT_DATA:
> > 		return 1;
> >+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> >+	case KVM_CAP_MEMORY_ATTRIBUTES:
> >+		u64 attrs = kvm_supported_mem_attributes(kvm);
> >+
> >+		r = -EFAULT;
> >+		if (copy_to_user(argp, &attrs, sizeof(attrs)))
> >+			goto out;
> >+		r = 0;
> >+		break;
>
> This cannot work, e.g., no @argp in this function and is fixed by a later
> commit:
>
> 	fcbef1e5e5d2 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for
> 	guest-specific backing memory")

I'll post a fixup patch for all of these, thanks much!
Re: [PATCH v13 13/35] KVM: Introduce per-page memory attributes
On Fri, Oct 27, 2023 at 11:21:55AM -0700, Sean Christopherson wrote:
>From: Chao Peng
>
>In confidential computing usages, whether a page is private or shared is
>necessary information for KVM to perform operations like page fault
>handling, page zapping etc. There are other potential use cases for
>per-page memory attributes, e.g. to make memory read-only (or no-exec,
>or exec-only, etc.) without having to modify memslots.
>
>Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow
>userspace to operate on the per-page memory attributes.
>  - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to
>    a guest memory range.
>  - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
>    memory attributes.

This ioctl() is already removed. So, the changelog is out-of-date and needs
an update.

>
>+
>+:Capability: KVM_CAP_MEMORY_ATTRIBUTES
>+:Architectures: x86
>+:Type: vm ioctl
>+:Parameters: struct kvm_memory_attributes(in)
                                           ^ add one space here?

>+static bool kvm_pre_set_memory_attributes(struct kvm *kvm,
>+					  struct kvm_gfn_range *range)
>+{
>+	/*
>+	 * Unconditionally add the range to the invalidation set, regardless of
>+	 * whether or not the arch callback actually needs to zap SPTEs. E.g.
>+	 * if KVM supports RWX attributes in the future and the attributes are
>+	 * going from R=>RW, zapping isn't strictly necessary. Unconditionally
>+	 * adding the range allows KVM to require that MMU invalidations add at
>+	 * least one range between begin() and end(), e.g. allows KVM to detect
>+	 * bugs where the add() is missed. Rexlaing the rule *might* be safe,

Relaxing

>@@ -4640,6 +4850,17 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
> 	case KVM_CAP_BINARY_STATS_FD:
> 	case KVM_CAP_SYSTEM_EVENT_DATA:
> 		return 1;
>+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
>+	case KVM_CAP_MEMORY_ATTRIBUTES:
>+		u64 attrs = kvm_supported_mem_attributes(kvm);
>+
>+		r = -EFAULT;
>+		if (copy_to_user(argp, &attrs, sizeof(attrs)))
>+			goto out;
>+		r = 0;
>+		break;

This cannot work, e.g., no @argp in this function and is fixed by a later
commit:

	fcbef1e5e5d2 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for
	guest-specific backing memory")
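To make the last objection concrete: a check-extension handler receives only the capability number, so it has no user pointer to copy into, and any result has to travel in the return value. The sketch below models that shape in userspace under stated assumptions. Every demo_* name, the mock capability number, and the has_private_mem field are hypothetical. The "no VM means report everything KVM supports" rule follows the api.rst text in the patch, and returning the mask directly is my reading of how the later fixup resolves the problem, not a quote of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute bit as defined by the patch under review. */
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)

/* Mock capability number, NOT the real uapi value. */
enum { DEMO_CAP_MEMORY_ATTRIBUTES = 1 };

/* Hypothetical stand-in for struct kvm. */
struct demo_vm {
	bool has_private_mem;
};

/* System scope (vm == NULL) reports all attributes; VM scope reports its own. */
static inline uint64_t demo_supported_mem_attributes(const struct demo_vm *vm)
{
	if (!vm || vm->has_private_mem)
		return KVM_MEMORY_ATTRIBUTE_PRIVATE;
	return 0;
}

/* The only inputs are the VM and the capability number: no argp exists here. */
static inline long demo_check_extension(const struct demo_vm *vm, long cap)
{
	switch (cap) {
	case DEMO_CAP_MEMORY_ATTRIBUTES:
		/* The attribute mask is the ioctl's return value. */
		return (long)demo_supported_mem_attributes(vm);
	default:
		return 0;	/* unknown capability: not supported */
	}
}
```

Userspace would then read the supported mask from the KVM_CHECK_EXTENSION return value, consistent with the documentation hunk in the patch.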
[PATCH v13 13/35] KVM: Introduce per-page memory attributes
From: Chao Peng In confidential computing usages, whether a page is private or shared is necessary information for KVM to perform operations like page fault handling, page zapping etc. There are other potential use cases for per-page memory attributes, e.g. to make memory read-only (or no-exec, or exec-only, etc.) without having to modify memslots. Introduce two ioctls (advertised by KVM_CAP_MEMORY_ATTRIBUTES) to allow userspace to operate on the per-page memory attributes. - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes to a guest memory range. - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported memory attributes. Use an xarray to store the per-page attributes internally, with a naive, not fully optimized implementation, i.e. prioritize correctness over performance for the initial implementation. Use bit 3 for the PRIVATE attribute so that KVM can use bits 0-2 for RWX attributes/protections in the future, e.g. to give userspace fine-grained control over read, write, and execute protections for guest memory. Provide arch hooks for handling attribute changes before and after common code sets the new attributes, e.g. x86 will use the "pre" hook to zap all relevant mappings, and the "post" hook to track whether or not hugepages can be used to map the range. To simplify the implementation wrap the entire sequence with kvm_mmu_invalidate_{begin,end}() even though the operation isn't strictly guaranteed to be an invalidation. For the initial use case, x86 *will* always invalidate memory, and preventing arch code from creating new mappings while the attributes are in flux makes it much easier to reason about the correctness of consuming attributes. It's possible that future usages may not require an invalidation, e.g. if KVM ends up supporting RWX protections and userspace grants _more_ protections, but again opt for simplicity and punt optimizations to if/when they are needed. 
Suggested-by: Sean Christopherson Link: https://lore.kernel.org/all/y2wb48kd0j4vg...@google.com Cc: Fuad Tabba Cc: Xu Yilun Cc: Mickaël Salaün Signed-off-by: Chao Peng Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- Documentation/virt/kvm/api.rst | 36 + include/linux/kvm_host.h | 18 +++ include/uapi/linux/kvm.h | 13 ++ virt/kvm/Kconfig | 4 + virt/kvm/kvm_main.c| 233 + 5 files changed, 304 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 860216536810..e2252c748fd6 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6091,6 +6091,42 @@ applied. See KVM_SET_USER_MEMORY_REGION. +4.140 KVM_SET_MEMORY_ATTRIBUTES +--- + +:Capability: KVM_CAP_MEMORY_ATTRIBUTES +:Architectures: x86 +:Type: vm ioctl +:Parameters: struct kvm_memory_attributes(in) +:Returns: 0 on success, <0 on error + +KVM_SET_MEMORY_ATTRIBUTES allows userspace to set memory attributes for a range +of guest physical memory. + +:: + + struct kvm_memory_attributes { + __u64 address; + __u64 size; + __u64 attributes; + __u64 flags; + }; + + #define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3) + +The address and size must be page aligned. The supported attributes can be +retrieved via ioctl(KVM_CHECK_EXTENSION) on KVM_CAP_MEMORY_ATTRIBUTES. If +executed on a VM, KVM_CAP_MEMORY_ATTRIBUTES precisely returns the attributes +supported by that VM. If executed at system scope, KVM_CAP_MEMORY_ATTRIBUTES +returns all attributes supported by KVM. The only attribute defined at this +time is KVM_MEMORY_ATTRIBUTE_PRIVATE, which marks the associated gfn as being +guest private memory. + +Note, there is no "get" API. Userspace is responsible for explicitly tracking +the state of a gfn/page as needed. + +The "flags" field is reserved for future extensions and must be '0'. + 5. 
The kvm_run structure diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 89c1a991a3b8..df573229651b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -256,6 +256,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER union kvm_mmu_notifier_arg { pte_t pte; + unsigned long attributes; }; struct kvm_gfn_range { @@ -808,6 +809,9 @@ struct kvm { #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER struct notifier_block pm_notifier; +#endif +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES + struct xarray mem_attr_array; #endif char stats_id[KVM_STATS_NAME_SIZE]; }; @@ -2340,4 +2344,18 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu, vcpu->run->memory_fault.flags = 0; } +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES +static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn) +{ + return