Re: [PATCH V3] tools/perf: Add perf binary dependent rule for shellcheck log in Makefile.perf

2023-11-06 Thread Athira Rajeev



> On 23-Oct-2023, at 4:14 PM, James Clark  wrote:
> 
> 
> 
> On 13/10/2023 08:36, Athira Rajeev wrote:
>> Add rule in new Makefile "tests/Makefile.tests" for running
>> shellcheck on shell test scripts. This automates below shellcheck
>> into the build.
>> 
>> $ for F in $(find tests/shell/ -perm -o=x -name '*.sh'); do shellcheck -S 
>> warning $F; done
>> 
>> Condition for shellcheck is added in Makefile.perf to avoid build
>> breakage in the absence of shellcheck binary. Update Makefile.perf
>> to contain new rule for "SHELLCHECK_TEST" which is for making
>> shellcheck test as a dependency on perf binary. Added
>> "tests/Makefile.tests" to run shellcheck on shellscripts in
>> tests/shell. The make rule "SHLLCHECK_RUN" ensures that, every
>> time during make, shellcheck will be run only on modified files
>> during subsequent invocations. By this, if any newly added shell
>> scripts or fixes in existing scripts breaks coding/formatting
>> style, it will get captured during the perf build.
>> 
>> Example build failure with present scripts in tests/shell:
>> 
>> INSTALL libsubcmd_headers
>> INSTALL libperf_headers
>> INSTALL libapi_headers
>> INSTALL libsymbol_headers
>> INSTALL libbpf_headers
>> make[3]: *** 
>> [/root/athira/namhyung/perf-tools-next/tools/perf/tests/Makefile.tests:17: 
>> output/tests/shell/record_sideband.dep] Error 1
>> make[3]: *** Waiting for unfinished jobs
>> make[3]: *** 
>> [/root/athira/namhyung/perf-tools-next/tools/perf/tests/Makefile.tests:17: 
>> output/tests/shell/test_arm_coresight.dep] Error 1
>> make[3]: *** 
>> [/root/athira/namhyung/perf-tools-next/tools/perf/tests/Makefile.tests:17: 
>> output/tests/shell/lock_contention.dep] Error 1
>> make[2]: *** [Makefile.perf:675: SHELLCHECK_TEST] Error 2
>> make[1]: *** [Makefile.perf:238: sub-make] Error 2
>> make: *** [Makefile:70: all] Error 2
>> 
>> After this, for testing, changed "tests/shell/record.sh" to
>> break shellcheck format. In the next make run, it is
>> also captured:
>> 
>> make[3]: *** 
>> [/root/athira/namhyung/perf-tools-next/tools/perf/tests/Makefile.tests:17: 
>> output/tests/shell/record_sideband.dep] Error 1
>> make[3]: *** Waiting for unfinished jobs
>> make[3]: *** 
>> [/root/athira/namhyung/perf-tools-next/tools/perf/tests/Makefile.tests:17: 
>> output/tests/shell/record.dep] Error 1
>> make[3]: *** 
>> [/root/athira/namhyung/perf-tools-next/tools/perf/tests/Makefile.tests:17: 
>> output/tests/shell/test_arm_coresight.dep] Error 1
>> make[3]: *** 
>> [/root/athira/namhyung/perf-tools-next/tools/perf/tests/Makefile.tests:17: 
>> output/tests/shell/lock_contention.dep] Error 1
>> make[2]: *** [Makefile.perf:675: SHELLCHECK_TEST] Error 2
>> make[1]: *** [Makefile.perf:238: sub-make] Error 2
>> make: *** [Makefile:70: all] Error 2
>> 
>> The exact failure log can be found in:
>> output/tests/shell/record.dep.log file
>> 
> 
> Hi Athira,
> 
> Having the reason for a hard failure put into a log file rather than the
> console output is very non standard. I'm not sure what the reason for
> this is.
> 
> The log filename isn't even listed in the output so how would anyone
> know what went wrong?
> 
> Can we just have it so that the failure is printed in the make output
> like any other build error.

Sure James, Thanks for looking into and sharing the review comment.
I will address the change in V4

> 
> [...]
> 
>> +output/%.dep: %.sh | $(DIRS)
>> + $(call rule_mkdir)
>> + $(eval input_file := $(subst output/,./,$(patsubst %.dep, %.sh, $@)))
>> + $(Q)$(call frecho-cmd,test)@shellcheck -S warning ${input_file} 1> $@.log 
>> && ([[ ! -s $@.log ]])
> 
> [[ ]] is a bash extension, but the build system seems to use /bin/sh so
> you get this error depending on your distro:
> 
>  tools/perf/tests/Makefile.tests:17: output/tests/shell
>  /record+probe_libc_inet_pton.dep] Error 127
>  /bin/sh: 1: [[: not found
> 
> Changing it to [ ] fixes it

Ok, will make the change in next version

> 
>> + $(Q)$(call frecho-cmd,test)@touch $@
> 
> Touching the source file in the build system doesn't feel right, surely
> this could be open to all kinds of parallel build race conditions or
> version controll issues.
> 
> Isn't the output of the rule the .log file, so just a normal make rule
> based on those two files work? Then if the .log file is older than the
> source file, the shellcheck is re-run, otherwise not? It feels like the
> .dep file would then also be unecessary.

Ok, I will fix this.
> 
> The .dep lines in the make output are a bit confusing because they're
> not in the source tree so it's not clear to an outsider what that make
> output is for.
> 
> Other than that, it does seem to work ok for me.
Thanks for the review. I will post V4 with all the changes

Athira
> 
>> + $(Q)$(call frecho-cmd,test)@rm -rf $@.log
>> +$(DIRS):
>> + @mkdir -p $@
>> +
>> +clean:
>> + @rm -rf output




Re: [PATCH 1/3] perf tests test_arm_coresight: Fix the shellcheck warning in latest test_arm_coresight.sh

2023-11-06 Thread Athira Rajeev



> On 07-Nov-2023, at 3:14 AM, Arnaldo Carvalho de Melo  wrote:
> 
> Em Thu, Oct 05, 2023 at 02:24:15PM +0530, Athira Rajeev escreveu:
>>> On 05-Oct-2023, at 1:50 PM, James Clark  wrote:
>>> On 29/09/2023 05:11, Athira Rajeev wrote:
 Running shellcheck on tests/shell/test_arm_coresight.sh
 throws below warnings:
 
 In tests/shell/test_arm_coresight.sh line 15:
 cs_etm_path=$(find  /sys/bus/event_source/devices/cs_etm/ -name cpu* 
 -print -quit)
 ^--^ SC2061: Quote the parameter to -name so the shell 
 won't interpret it.
 
 In tests/shell/test_arm_coresight.sh line 20:
 if [ $archhver -eq 5 -a "$(printf "0x%X\n" $archpart)" = "0xA13" ] ; then
 ^-- SC2166: Prefer [ p ] && [ q ] as [ p -a q 
 ] is not well defined
 
 This warning is observed after commit:
 "commit bb350847965d ("perf test: Update cs_etm testcase for Arm ETE")"
 
 Fixed this issue by using quoting 'cpu*' for SC2061 and
 using "&&" in line number 20 for SC2166 warning
 
 Fixes: bb350847965d ("perf test: Update cs_etm testcase for Arm ETE")
 Signed-off-by: Athira Rajeev 
 ---
 tools/perf/tests/shell/test_arm_coresight.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/tools/perf/tests/shell/test_arm_coresight.sh 
 b/tools/perf/tests/shell/test_arm_coresight.sh
 index fe78c4626e45..f2115dfa24a5 100755
 --- a/tools/perf/tests/shell/test_arm_coresight.sh
 +++ b/tools/perf/tests/shell/test_arm_coresight.sh
 @@ -12,12 +12,12 @@
 glb_err=0
 
 cs_etm_dev_name() {
 - cs_etm_path=$(find  /sys/bus/event_source/devices/cs_etm/ -name cpu* 
 -print -quit)
 + cs_etm_path=$(find  /sys/bus/event_source/devices/cs_etm/ -name 'cpu*' 
 -print -quit)
 trcdevarch=$(cat ${cs_etm_path}/mgmt/trcdevarch)
 archhver=$((($trcdevarch >> 12) & 0xf))
 archpart=$(($trcdevarch & 0xfff))
 
 - if [ $archhver -eq 5 -a "$(printf "0x%X\n" $archpart)" = "0xA13" ] ; then
 + if [ $archhver -eq 5 ] && [ "$(printf "0x%X\n" $archpart)" = "0xA13" ] ; 
 then
 echo "ete"
 else
 echo "etm"
>>> 
>>> 
>>> Reviewed-by: James Clark 
> 
> Some are not applying, even after b4 picking up v2
> 
> Total patches: 3
> ---
> Cover: 
> ./v2_20231013_atrajeev_fix_for_shellcheck_issues_with_latest_scripts_in_tests_shell.cover
> Link: 
> https://lore.kernel.org/r/20231013073021.99794-1-atraj...@linux.vnet.ibm.com
> Base: not specified
>   git am 
> ./v2_20231013_atrajeev_fix_for_shellcheck_issues_with_latest_scripts_in_tests_shell.mbx
> ⬢[acme@toolbox perf-tools-next]$git am 
> ./v2_20231013_atrajeev_fix_for_shellcheck_issues_with_latest_scripts_in_tests_shell.mbx
> Applying: tools/perf/tests Ignore the shellcheck SC2046 warning in 
> lock_contention
> error: patch failed: tools/perf/tests/shell/lock_contention.sh:33
> error: tools/perf/tests/shell/lock_contention.sh: patch does not apply
> Patch failed at 0001 tools/perf/tests Ignore the shellcheck SC2046 warning in 
> lock_contention
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> ⬢[acme@toolbox perf-tools-next]$ git am --abort
> ⬢[acme@toolbox perf-tools-next]$

Hi Arnaldo

The patch is picked up : 
https://lore.kernel.org/all/169757198796.167943.10552920255799914362.b4...@kernel.org/
 .
Thanks for looking into.

Athira




Re: [PATCH v2 37/37] powerpc: Support execute-only on all powerpc

2023-11-06 Thread Aneesh Kumar K V
On 11/6/23 6:53 PM, Christophe Leroy wrote:
> 
> 
> Le 02/11/2023 à 06:39, Aneesh Kumar K.V a écrit :
>> Christophe Leroy  writes:
>>
>>> Introduce PAGE_EXECONLY_X macro which provides exec-only rights.
>>> The _X may be seen as redundant with the EXECONLY but it helps
>>> keep consistancy, all macros having the EXEC right have _X.
>>>
>>> And put it next to PAGE_NONE as PAGE_EXECONLY_X is
>>> somehow PAGE_NONE + EXEC just like all other SOMETHING_X are
>>> just SOMETHING + EXEC.
>>>
>>> On book3s/64 PAGE_EXECONLY becomes PAGE_READONLY_X.
>>>
>>> On book3s/64, as PAGE_EXECONLY is only valid for Radix add
>>> VM_READ flag in vm_get_page_prot() for non-Radix.
>>>
>>> And update access_error() so that a non exec fault on a VM_EXEC only
>>> mapping is always invalid, even when the underlying layer don't
>>> always generate a fault for that.
>>>
>>> For 8xx, set PAGE_EXECONLY_X as _PAGE_NA | _PAGE_EXEC.
>>> For others, only set it as just _PAGE_EXEC
>>>
>>> With that change, 8xx, e500 and 44x fully honor execute-only
>>> protection.
>>>
>>> On 40x that is a partial implementation of execute-only. The
>>> implementation won't be complete because once a TLB has been loaded
>>> via the Instruction TLB miss handler, it will be possible to read
>>> the page. But at least it can't be read unless it is executed first.
>>>
>>> On 603 MMU, TLB missed are handled by SW and there are separate
>>> DTLB and ITLB. Execute-only is therefore now supported by not loading
>>> DTLB when read access is not permitted.
>>>
>>> On hash (604) MMU it is more tricky because hash table is common to
>>> load/store and execute. Nevertheless it is still possible to check
>>> whether _PAGE_READ is set before loading hash table for a load/store
>>> access. At least it can't be read unless it is executed first.
>>>
>>> Signed-off-by: Christophe Leroy 
>>> Cc: Russell Currey 
>>> Cc: Kees Cook 
>>> ---
>>>   arch/powerpc/include/asm/book3s/32/pgtable.h |  2 +-
>>>   arch/powerpc/include/asm/book3s/64/pgtable.h |  4 +---
>>>   arch/powerpc/include/asm/nohash/32/pte-8xx.h |  1 +
>>>   arch/powerpc/include/asm/nohash/pgtable.h|  2 +-
>>>   arch/powerpc/include/asm/nohash/pte-e500.h   |  1 +
>>>   arch/powerpc/include/asm/pgtable-masks.h |  2 ++
>>>   arch/powerpc/mm/book3s64/pgtable.c   | 10 --
>>>   arch/powerpc/mm/fault.c  |  9 +
>>>   arch/powerpc/mm/pgtable.c|  4 ++--
>>>   9 files changed, 18 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
>>> b/arch/powerpc/include/asm/book3s/32/pgtable.h
>>> index 244621c88510..52971ee30717 100644
>>> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
>>> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
>>> @@ -425,7 +425,7 @@ static inline bool pte_access_permitted(pte_t pte, bool 
>>> write)
>>>   {
>>> /*
>>>  * A read-only access is controlled by _PAGE_READ bit.
>>> -* We have _PAGE_READ set for WRITE and EXECUTE
>>> +* We have _PAGE_READ set for WRITE
>>>  */
>>> if (!pte_present(pte) || !pte_read(pte))
>>> return false;
>>>
>>
>> Should this now be updated to check for EXEC bit ?
> 
> I don't think so based on what I see in arm64: 
> https://elixir.bootlin.com/linux/v6.6/source/arch/arm64/include/asm/pgtable.h#L146
> 

But then there can be a get_user_pages() (FOLL_GET) on an exec only pte right?
if we are going to access the page data(FOLL_PIN), then yes exec only mapping 
should
fail for that. But if we using it to do struct page manipulation we need 
pte_access_permitted
to return true for exec only mapping?


-aneesh




Re: [PATCH] powerpc: Fix signature of pfn_to_kaddr()

2023-11-06 Thread Michael Ellerman
Linus Walleij  writes:
> There is a const in the returned value from pfn_to_kaddr()
> but there are consumers that want to modify the result
> and the generic function pfn_to_virt() in 
> does allow this, so let's relax this requirement and do not
> make the returned value const.
>
> Reported-by: kernel test robot 
> Closes: 
> https://lore.kernel.org/oe-kbuild-all/202311061940.4pbrm44u-...@intel.com/
 
I'm struggling to connect the removal of const with those bug reports.
It looks like all those warnings are about 0xc000 being
outside the range of unsigned long when building 32-bit.

Is it the right bug report link?

The current signature of:

  static inline const void *pfn_to_kaddr(unsigned long pfn) ...

seems OK to me.

It allows code like:

  const void *p = pfn_to_kaddr(pfn);
  p++;

But errors for:

  const void *p = pfn_to_kaddr(pfn);
  unsigned long *q = p;
  *q = 0;

  error: initialization discards ‘const’ qualifier from pointer target type


Having said that it looks like almost every caller of pfn_to_kaddr()
casts the result to unsigned long, so possibly that would be the better
return type in terms of the actual usage. Although that would conflict
with __va() which returns void * :/

cheers


Re: [PATCH 08/34] KVM: Introduce KVM_SET_USER_MEMORY_REGION2

2023-11-06 Thread Yuan Yao
On Sun, Nov 05, 2023 at 05:30:11PM +0100, Paolo Bonzini wrote:
> From: Sean Christopherson 
>
> Introduce a "version 2" of KVM_SET_USER_MEMORY_REGION so that additional
> information can be supplied without setting userspace up to fail.  The
> padding in the new kvm_userspace_memory_region2 structure will be used to
> pass a file descriptor in addition to the userspace_addr, i.e. allow
> userspace to point at a file descriptor and map memory into a guest that
> is NOT mapped into host userspace.
>
> Alternatively, KVM could simply add "struct kvm_userspace_memory_region2"
> without a new ioctl(), but as Paolo pointed out, adding a new ioctl()
> makes detection of bad flags a bit more robust, e.g. if the new fd field
> is guarded only by a flag and not a new ioctl(), then a userspace bug
> (setting a "bad" flag) would generate out-of-bounds access instead of an
> -EINVAL error.
>
> Cc: Jarkko Sakkinen 
> Reviewed-by: Paolo Bonzini 
> Reviewed-by: Xiaoyao Li 
> Signed-off-by: Sean Christopherson 
> Reviewed-by: Fuad Tabba 
> Tested-by: Fuad Tabba 
> Message-Id: <20231027182217.3615211-9-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---
>  Documentation/virt/kvm/api.rst | 22 +
>  arch/x86/kvm/x86.c |  2 +-
>  include/linux/kvm_host.h   |  4 +--
>  include/uapi/linux/kvm.h   | 13 
>  virt/kvm/kvm_main.c| 57 +-
>  5 files changed, 87 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7025b3751027..bdea1423c5f8 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1340,6 +1340,7 @@ yet and must be cleared on entry.
>   __u64 guest_phys_addr;
>   __u64 memory_size; /* bytes */
>   __u64 userspace_addr; /* start of the userspace allocated memory */
> + __u64 pad[16];

Looks incorrect to add padding part in kvm_userspace_memory_region,
only need to apply on kvm_userspace_memory_region2 below.

>};
>
>/* for kvm_userspace_memory_region::flags */
> @@ -6192,6 +6193,27 @@ to know what fields can be changed for the system 
> register described by
>  ``op0, op1, crn, crm, op2``. KVM rejects ID register values that describe a
>  superset of the features supported by the system.
>
> +4.140 KVM_SET_USER_MEMORY_REGION2
> +-
> +
> +:Capability: KVM_CAP_USER_MEMORY2
> +:Architectures: all
> +:Type: vm ioctl
> +:Parameters: struct kvm_userspace_memory_region2 (in)
> +:Returns: 0 on success, -1 on error
> +
> +::
> +
> +  struct kvm_userspace_memory_region2 {
> + __u32 slot;
> + __u32 flags;
> + __u64 guest_phys_addr;
> + __u64 memory_size; /* bytes */
> + __u64 userspace_addr; /* start of the userspace allocated memory */
> +  };
> +
> +See KVM_SET_USER_MEMORY_REGION.
> +
>  5. The kvm_run structure
>  
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2c924075f6f1..7b389f27dffc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12576,7 +12576,7 @@ void __user * __x86_set_memory_region(struct kvm 
> *kvm, int id, gpa_t gpa,
>   }
>
>   for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> - struct kvm_userspace_memory_region m;
> + struct kvm_userspace_memory_region2 m;
>
>   m.slot = id | (i << 16);
>   m.flags = 0;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5faba69403ac..4e741ff27af3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1146,9 +1146,9 @@ enum kvm_mr_change {
>  };
>
>  int kvm_set_memory_region(struct kvm *kvm,
> -   const struct kvm_userspace_memory_region *mem);
> +   const struct kvm_userspace_memory_region2 *mem);
>  int __kvm_set_memory_region(struct kvm *kvm,
> - const struct kvm_userspace_memory_region *mem);
> + const struct kvm_userspace_memory_region2 *mem);
>  void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot);
>  void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen);
>  int kvm_arch_prepare_memory_region(struct kvm *kvm,
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 211b86de35ac..308cc70bd6ab 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -95,6 +95,16 @@ struct kvm_userspace_memory_region {
>   __u64 userspace_addr; /* start of the userspace allocated memory */
>  };
>
> +/* for KVM_SET_USER_MEMORY_REGION2 */
> +struct kvm_userspace_memory_region2 {
> + __u32 slot;
> + __u32 flags;
> + __u64 guest_phys_addr;
> + __u64 memory_size;
> + __u64 userspace_addr;
> + __u64 pad[16];
> +};
> +
>  /*
>   * The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for
>   * userspace, other bits are reserved for kvm internal use which are defined
> @@ -1201,6 +1211,7 @@ 

Re: [PATCH 1/3] perf tests test_arm_coresight: Fix the shellcheck warning in latest test_arm_coresight.sh

2023-11-06 Thread Arnaldo Carvalho de Melo
Em Thu, Oct 05, 2023 at 02:24:15PM +0530, Athira Rajeev escreveu:
> > On 05-Oct-2023, at 1:50 PM, James Clark  wrote:
> > On 29/09/2023 05:11, Athira Rajeev wrote:
> >> Running shellcheck on tests/shell/test_arm_coresight.sh
> >> throws below warnings:
> >> 
> >> In tests/shell/test_arm_coresight.sh line 15:
> >> cs_etm_path=$(find  /sys/bus/event_source/devices/cs_etm/ -name cpu* 
> >> -print -quit)
> >>  ^--^ SC2061: Quote the parameter to -name so the shell 
> >> won't interpret it.
> >> 
> >> In tests/shell/test_arm_coresight.sh line 20:
> >> if [ $archhver -eq 5 -a "$(printf "0x%X\n" $archpart)" = "0xA13" ] ; then
> >>  ^-- SC2166: Prefer [ p ] && [ q ] as [ p -a q 
> >> ] is not well defined
> >> 
> >> This warning is observed after commit:
> >> "commit bb350847965d ("perf test: Update cs_etm testcase for Arm ETE")"
> >> 
> >> Fixed this issue by using quoting 'cpu*' for SC2061 and
> >> using "&&" in line number 20 for SC2166 warning
> >> 
> >> Fixes: bb350847965d ("perf test: Update cs_etm testcase for Arm ETE")
> >> Signed-off-by: Athira Rajeev 
> >> ---
> >> tools/perf/tests/shell/test_arm_coresight.sh | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/tools/perf/tests/shell/test_arm_coresight.sh 
> >> b/tools/perf/tests/shell/test_arm_coresight.sh
> >> index fe78c4626e45..f2115dfa24a5 100755
> >> --- a/tools/perf/tests/shell/test_arm_coresight.sh
> >> +++ b/tools/perf/tests/shell/test_arm_coresight.sh
> >> @@ -12,12 +12,12 @@
> >> glb_err=0
> >> 
> >> cs_etm_dev_name() {
> >> - cs_etm_path=$(find  /sys/bus/event_source/devices/cs_etm/ -name cpu* 
> >> -print -quit)
> >> + cs_etm_path=$(find  /sys/bus/event_source/devices/cs_etm/ -name 'cpu*' 
> >> -print -quit)
> >> trcdevarch=$(cat ${cs_etm_path}/mgmt/trcdevarch)
> >> archhver=$((($trcdevarch >> 12) & 0xf))
> >> archpart=$(($trcdevarch & 0xfff))
> >> 
> >> - if [ $archhver -eq 5 -a "$(printf "0x%X\n" $archpart)" = "0xA13" ] ; then
> >> + if [ $archhver -eq 5 ] && [ "$(printf "0x%X\n" $archpart)" = "0xA13" ] ; 
> >> then
> >> echo "ete"
> >> else
> >> echo "etm"
> > 
> > 
> > Reviewed-by: James Clark 

Some are not applying, even after b4 picking up v2

Total patches: 3
---
Cover: 
./v2_20231013_atrajeev_fix_for_shellcheck_issues_with_latest_scripts_in_tests_shell.cover
 Link: 
https://lore.kernel.org/r/20231013073021.99794-1-atraj...@linux.vnet.ibm.com
 Base: not specified
   git am 
./v2_20231013_atrajeev_fix_for_shellcheck_issues_with_latest_scripts_in_tests_shell.mbx
⬢[acme@toolbox perf-tools-next]$git am 
./v2_20231013_atrajeev_fix_for_shellcheck_issues_with_latest_scripts_in_tests_shell.mbx
Applying: tools/perf/tests Ignore the shellcheck SC2046 warning in 
lock_contention
error: patch failed: tools/perf/tests/shell/lock_contention.sh:33
error: tools/perf/tests/shell/lock_contention.sh: patch does not apply
Patch failed at 0001 tools/perf/tests Ignore the shellcheck SC2046 warning in 
lock_contention
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
⬢[acme@toolbox perf-tools-next]$ git am --abort
⬢[acme@toolbox perf-tools-next]$


Re: [PATCH 25/34] KVM: selftests: Add helpers to convert guest memory b/w private and shared

2023-11-06 Thread Fuad Tabba
On Mon, Nov 6, 2023 at 4:13 PM Sean Christopherson  wrote:
>
> On Mon, Nov 06, 2023, Fuad Tabba wrote:
> > On Sun, Nov 5, 2023 at 4:34 PM Paolo Bonzini  wrote:
> > > +void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t base, uint64_t 
> > > size,
> > > +   bool punch_hole)
> > > +{
> > > +   const int mode = FALLOC_FL_KEEP_SIZE | (punch_hole ? 
> > > FALLOC_FL_PUNCH_HOLE : 0);
> > > +   struct userspace_mem_region *region;
> > > +   uint64_t end = base + size;
> > > +   uint64_t gpa, len;
> > > +   off_t fd_offset;
> > > +   int ret;
> > > +
> > > +   for (gpa = base; gpa < end; gpa += len) {
> > > +   uint64_t offset;
> > > +
> > > +   region = userspace_mem_region_find(vm, gpa, gpa);
> > > +   TEST_ASSERT(region && region->region.flags & 
> > > KVM_MEM_GUEST_MEMFD,
> > > +   "Private memory region not found for GPA 
> > > 0x%lx", gpa);
> > > +
> > > +   offset = (gpa - region->region.guest_phys_addr);
> >
> > nit: why the parentheses?
>
> I simply forgot to remove them when I changed the function to support spanning
> multiple memslots, i.e. when the code went from this
>
> fd_offset = region->region.gmem_offset +
> (gpa - region->region.guest_phys_addr);
>
> to what you see above.

I wasn't actually expecting an answer, but I literally _did_ ask for it :)

Thanks,
/fuad


Re: [PATCH 27/34] KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type

2023-11-06 Thread Fuad Tabba
On Mon, Nov 6, 2023 at 4:04 PM Sean Christopherson  wrote:
>
> On Mon, Nov 06, 2023, Fuad Tabba wrote:
> > On Sun, Nov 5, 2023 at 4:34 PM Paolo Bonzini  wrote:
> > >
> > > From: Sean Christopherson 
> > >
> > > Add a "vm_shape" structure to encapsulate the selftests-defined "mode",
> > > along with the KVM-defined "type" for use when creating a new VM.  "mode"
> > > tracks physical and virtual address properties, as well as the preferred
> > > backing memory type, while "type" corresponds to the VM type.
> > >
> > > Taking the VM type will allow adding tests for KVM_CREATE_GUEST_MEMFD,
> > > a.k.a. guest private memory, without needing an entirely separate set of
> > > helpers.  Guest private memory is effectively usable only by confidential
> > > VM types, and it's expected that x86 will double down and require unique
> > > VM types for TDX and SNP guests.
> > >
> > > Signed-off-by: Sean Christopherson 
> > > Message-Id: <20231027182217.3615211-30-sea...@google.com>
> > > Signed-off-by: Paolo Bonzini 
> > > ---
> >
> > nit: as in a prior selftest commit messages, references in the commit
> > message to guest _private_ memory. Should these be changed to just
> > guest memory?
>
> Hmm, no, "private" is mostly appropriate here.  At this point in time, only 
> x86
> supports KVM_CREATE_GUEST_MEMFD, and x86 only supports it for private memory.
> And the purpose of letting x86 selftests specify KVM_X86_SW_PROTECTED_VM, i.e.
> the reason this patch exists, is purely to get private memory.
>
> Maybe tweak the second paragraph to this?
>
> Taking the VM type will allow adding tests for KVM_CREATE_GUEST_MEMFD
> without needing an entirely separate set of helpers.  At this time,
> guest_memfd is effectively usable only by confidential VM types in the
> form of guest private memory, and it's expected that x86 will double down
> and require unique VM types for TDX and SNP guests.

sgtm
/fuad


Re: [PATCH 25/34] KVM: selftests: Add helpers to convert guest memory b/w private and shared

2023-11-06 Thread Sean Christopherson
On Mon, Nov 06, 2023, Fuad Tabba wrote:
> On Sun, Nov 5, 2023 at 4:34 PM Paolo Bonzini  wrote:
> > +void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t base, uint64_t 
> > size,
> > +   bool punch_hole)
> > +{
> > +   const int mode = FALLOC_FL_KEEP_SIZE | (punch_hole ? 
> > FALLOC_FL_PUNCH_HOLE : 0);
> > +   struct userspace_mem_region *region;
> > +   uint64_t end = base + size;
> > +   uint64_t gpa, len;
> > +   off_t fd_offset;
> > +   int ret;
> > +
> > +   for (gpa = base; gpa < end; gpa += len) {
> > +   uint64_t offset;
> > +
> > +   region = userspace_mem_region_find(vm, gpa, gpa);
> > +   TEST_ASSERT(region && region->region.flags & 
> > KVM_MEM_GUEST_MEMFD,
> > +   "Private memory region not found for GPA 
> > 0x%lx", gpa);
> > +
> > +   offset = (gpa - region->region.guest_phys_addr);
> 
> nit: why the parentheses?

I simply forgot to remove them when I changed the function to support spanning
multiple memslots, i.e. when the code went from this

fd_offset = region->region.gmem_offset +
(gpa - region->region.guest_phys_addr);

to what you see above.


Re: [PATCH 27/34] KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type

2023-11-06 Thread Sean Christopherson
On Mon, Nov 06, 2023, Fuad Tabba wrote:
> On Sun, Nov 5, 2023 at 4:34 PM Paolo Bonzini  wrote:
> >
> > From: Sean Christopherson 
> >
> > Add a "vm_shape" structure to encapsulate the selftests-defined "mode",
> > along with the KVM-defined "type" for use when creating a new VM.  "mode"
> > tracks physical and virtual address properties, as well as the preferred
> > backing memory type, while "type" corresponds to the VM type.
> >
> > Taking the VM type will allow adding tests for KVM_CREATE_GUEST_MEMFD,
> > a.k.a. guest private memory, without needing an entirely separate set of
> > helpers.  Guest private memory is effectively usable only by confidential
> > VM types, and it's expected that x86 will double down and require unique
> > VM types for TDX and SNP guests.
> >
> > Signed-off-by: Sean Christopherson 
> > Message-Id: <20231027182217.3615211-30-sea...@google.com>
> > Signed-off-by: Paolo Bonzini 
> > ---
> 
> nit: as in a prior selftest commit messages, references in the commit
> message to guest _private_ memory. Should these be changed to just
> guest memory?

Hmm, no, "private" is mostly appropriate here.  At this point in time, only x86
supports KVM_CREATE_GUEST_MEMFD, and x86 only supports it for private memory.
And the purpose of letting x86 selftests specify KVM_X86_SW_PROTECTED_VM, i.e.
the reason this patch exists, is purely to get private memory.

Maybe tweak the second paragraph to this?

Taking the VM type will allow adding tests for KVM_CREATE_GUEST_MEMFD
without needing an entirely separate set of helpers.  At this time,
guest_memfd is effectively usable only by confidential VM types in the
form of guest private memory, and it's expected that x86 will double down
and require unique VM types for TDX and SNP guests.


Re: [PATCH v13 20/35] KVM: x86/mmu: Handle page fault for private memory

2023-11-06 Thread Sean Christopherson
On Mon, Nov 06, 2023, Xu Yilun wrote:
> On Sun, Nov 05, 2023 at 05:19:36PM +0100, Paolo Bonzini wrote:
> > On Sun, Nov 5, 2023 at 2:04 PM Xu Yilun  wrote:
> > >
> > > > +static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> > > > +   struct kvm_page_fault 
> > > > *fault)
> > > > +{
> > > > + kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
> > > > +   PAGE_SIZE, fault->write, 
> > > > fault->exec,
> > > > +   fault->is_private);
> > > > +}
> > > > +
> > > > +static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > > > +struct kvm_page_fault *fault)
> > > > +{
> > > > + int max_order, r;
> > > > +
> > > > + if (!kvm_slot_can_be_private(fault->slot)) {
> > > > + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> > > > + return -EFAULT;
> > > > + }
> > > > +
> > > > + r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, 
> > > > >pfn,
> > > > +  _order);
> > > > + if (r) {
> > > > + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> > > > + return r;
> > >
> > > Why report KVM_EXIT_MEMORY_FAULT here? even with a ret != -EFAULT?
> > 
> > The cases are EFAULT, EHWPOISON (which can report
> > KVM_EXIT_MEMORY_FAULT) and ENOMEM. I think it's fine
> > that even -ENOMEM can return KVM_EXIT_MEMORY_FAULT,
> > and it doesn't violate the documentation.  The docs tell you "what
> > can you do if error if EFAULT or EHWPOISON?"; they don't
> > exclude that other errnos result in KVM_EXIT_MEMORY_FAULT,
> > it's just that you're not supposed to look at it
> 
> Thanks, it's OK for ENOMEM + KVM_EXIT_MEMORY_FAULT.
> 
> Another concern is, now 3 places to report EFAULT + KVM_EXIT_MEMORY_FAULT:
> 
>   if (!kvm_slot_can_be_private(fault->slot)) {
>   kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>   return -EFAULT;
>   }
> 
>   file = kvm_gmem_get_file(slot);
>   if (!file)
>   return -EFAULT;
> 
>   if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
>   kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>   return -EFAULT;
>   }
> 
> They are different cases, and seems userspace should handle them
> differently, but not enough information to distinguish them.

For the first, the memory_fault exit will inform userspace that the guest wants
to map memory as private, and userspace will see that the memslot isn't 
configured
to support private mappings.  Userspace may not even need to query memslots, 
e.g.
if the gfn in question has been enumerated to the guest as something that can 
only
be mapped shared.

For the second (no valid guest_memfd file), userspace put the last reference to
the guest_memfd file without informing the guest or creating a memslot.  That's
firmly a userspace bug.

For the third and last, userspace will see that the guest is requesting a 
private
mapping but the gfn is configured for shared mappings.

In all cases, userspace has the necessary information to resolve the issue, 
where
"resolving the issue" may mean terminating the guest.  If userspace isn't 
tracking
memslots or the private attribute, then userspace has far bigger problems.


Re: [PATCH v13 16/35] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory

2023-11-06 Thread Sean Christopherson
On Sat, Nov 04, 2023, Xu Yilun wrote:
> > +KVM_SET_USER_MEMORY_REGION2 is an extension to KVM_SET_USER_MEMORY_REGION 
> > that
> > +allows mapping guest_memfd memory into a guest.  All fields shared with
> > +KVM_SET_USER_MEMORY_REGION identically.  Userspace can set KVM_MEM_PRIVATE 
> > in
> > +flags to have KVM bind the memory region to a given guest_memfd range of
> > +[guest_memfd_offset, guest_memfd_offset + memory_size].  The target 
> > guest_memfd
> ^
> The range end should be exclusive, is it?

Yes, that should be a ')', not a ']'.

> > +static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
> > +{
> > +   const char *anon_name = "[kvm-gmem]";
> > +   struct kvm_gmem *gmem;
> > +   struct inode *inode;
> > +   struct file *file;
> > +   int fd, err;
> > +
> > +   fd = get_unused_fd_flags(0);
> > +   if (fd < 0)
> > +   return fd;
> > +
> > +   gmem = kzalloc(sizeof(*gmem), GFP_KERNEL);
> > +   if (!gmem) {
> > +   err = -ENOMEM;
> > +   goto err_fd;
> > +   }
> > +
> > +   /*
> > +* Use the so called "secure" variant, which creates a unique inode
> > +* instead of reusing a single inode.  Each guest_memfd instance needs
> > +* its own inode to track the size, flags, etc.
> > +*/
> > +   file = anon_inode_getfile_secure(anon_name, _gmem_fops, gmem,
> > +O_RDWR, NULL);
> > +   if (IS_ERR(file)) {
> > +   err = PTR_ERR(file);
> > +   goto err_gmem;
> > +   }
> > +
> > +   file->f_flags |= O_LARGEFILE;
> > +
> > +   inode = file->f_inode;
> > +   WARN_ON(file->f_mapping != inode->i_mapping);
> 
> Just curious, why should we check the mapping fields which is garanteed in
> other subsystem?

Mostly to document the behavior.  The vast majority of folks that read this code
will be KVM developers, not file systems developers, and will likely have no 
clue
about the relationship between f_mapping and i_mapping.  And in the extremely
unlikely scenario that anon_inode_getfile_secure() no longer sets f_mapping, a
WARN detects the issue whereas a comment does not.


Re: [PATCH 14/34] fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()

2023-11-06 Thread Christian Brauner
On Sun, Nov 05, 2023 at 05:30:17PM +0100, Paolo Bonzini wrote:
> The call to the inode_init_security_anon() LSM hook is not the sole
> reason to use anon_inode_getfile_secure() or anon_inode_getfd_secure().
> For example, the functions also allow one to create a file with non-zero
> size, without needing a full-blown filesystem.  In this case, you don't
> need a "secure" version, just unique inodes; the current name of the
> functions is confusing and does not explain well the difference with
> the more "standard" anon_inode_getfile() and anon_inode_getfd().
> 
> Of course, there is another side of the coin; neither io_uring nor
> userfaultfd strictly speaking need distinct inodes, and it is not
> that clear anymore that anon_inode_create_get{file,fd}() allow the LSM
> to intercept and block the inode's creation.  If one was so inclined,
> anon_inode_getfile_secure() and anon_inode_getfd_secure() could be kept,
> using the shared inode or a new one depending on CONFIG_SECURITY.
> However, this is probably overkill, and potentially a cause of bugs in
> different configurations.  Therefore, just add a comment to io_uring
> and userfaultfd explaining the choice of the function.
> 
> While at it, remove the export for what is now anon_inode_create_getfd().
> There is no in-tree module that uses it, and the old name is gone anyway.

That's great, thanks.

> If anybody actually needs the symbol, they can ask or they can just use
> anon_inode_create_getfile(), which will be exported very soon for use
> in KVM.
> 
> Suggested-by: Christian Brauner 
> Signed-off-by: Paolo Bonzini 
> ---

Looks good to me,
Reviewed-by: Christian Brauner 


Re: [PATCH 31/34] KVM: selftests: Expand set_memory_region_test to validate guest_memfd()

2023-11-06 Thread Paolo Bonzini

On 11/5/23 17:30, Paolo Bonzini wrote:

From: Chao Peng 

Expand set_memory_region_test to exercise various positive and negative
testcases for private memory.

  - Non-guest_memfd() file descriptor for private memory
  - guest_memfd() from different VM
  - Overlapping bindings
  - Unaligned bindings


This needs a small fixup:

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h 
b/tools/testing/selftests/kvm/include/kvm_util_base.h
index e4d2cd9218b2..1b58f943562f 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -819,6 +819,7 @@ static inline struct kvm_vm *vm_create_barebones(void)
return vm_create(VM_SHAPE_DEFAULT);
 }
 
+#ifdef __x86_64__

 static inline struct kvm_vm *vm_create_barebones_protected_vm(void)
 {
const struct vm_shape shape = {
@@ -828,6 +829,7 @@ static inline struct kvm_vm 
*vm_create_barebones_protected_vm(void)
 
 	return vm_create(shape);

 }
+#endif
 
 static inline struct kvm_vm *vm_create(uint32_t nr_runnable_vcpus)

 {
diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c 
b/tools/testing/selftests/kvm/set_memory_region_test.c
index 1891774eb6d4..302c7a46955b 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -386,6 +386,7 @@ static void test_add_max_memory_regions(void)
 }
 
 
+#ifdef __x86_64__

 static void test_invalid_guest_memfd(struct kvm_vm *vm, int memfd,
 size_t offset, const char *msg)
 {
@@ -476,14 +477,13 @@ static void 
test_add_overlapping_private_memory_regions(void)
close(memfd);
kvm_vm_free(vm);
 }
+#endif
 
 int main(int argc, char *argv[])

 {
 #ifdef __x86_64__
int i, loops;
-#endif
 
-#ifdef __x86_64__

/*
 * FIXME: the zero-memslot test fails on aarch64 and s390x because
 * KVM_RUN fails with ENOEXEC or EFAULT.
@@ -493,6 +493,7 @@ int main(int argc, char *argv[])
 
 	test_add_max_memory_regions();
 
+#ifdef __x86_64__

if (kvm_has_cap(KVM_CAP_GUEST_MEMFD) &&
(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM))) {
test_add_private_memory_region();
@@ -501,7 +502,6 @@ int main(int argc, char *argv[])
pr_info("Skipping tests for KVM_MEM_GUEST_MEMFD memory 
regions\n");
}
 
-#ifdef __x86_64__

if (argc > 1)
loops = atoi_positive("Number of iterations", argv[1]);
else

in order to compile successfully on non-x86 platforms.



Re: [RFC PATCH v8 13/13] media: vim2m_audio: add virtual driver for audio memory to memory

2023-11-06 Thread Hans Verkuil
On 27/10/2023 12:35, Shengjiu Wang wrote:
> Audio memory to memory virtual driver use video memory to memory
> virtual driver vim2m.c as example. The main difference is
> device type is VFL_TYPE_AUDIO and device cap type is V4L2_CAP_AUDIO_M2M.
> 
> The device_run function is a dummy function, which is simply
> copy the data from input buffer to output buffer.

I started work on the v4l-utils part of this, using this driver.

I noticed that this driver doesn't expose the 
V4L2_CID_M2M_AUDIO_SOURCE/SINK_RATE
controls, and it really should, otherwise it is not representative of this
type of device.

It is enough to start with just a single fixed rate listed for each control.

It would be even nicer if you can have two rates such as 24000 and 48000 and
do the actual rate conversion, i.e. dropping every other sample or duplicating
each sample depending on whether you're halving or doubling the rate. That
should be easy to implement, and it makes this driver much more realistic.

Regards,

Hans

> 
> Signed-off-by: Shengjiu Wang 
> ---
>  drivers/media/test-drivers/Kconfig   |   9 +
>  drivers/media/test-drivers/Makefile  |   1 +
>  drivers/media/test-drivers/vim2m_audio.c | 680 +++
>  3 files changed, 690 insertions(+)
>  create mode 100644 drivers/media/test-drivers/vim2m_audio.c
> 
> diff --git a/drivers/media/test-drivers/Kconfig 
> b/drivers/media/test-drivers/Kconfig
> index 459b433e9fae..c280e192d43a 100644
> --- a/drivers/media/test-drivers/Kconfig
> +++ b/drivers/media/test-drivers/Kconfig
> @@ -17,6 +17,15 @@ config VIDEO_VIM2M
> This is a virtual test device for the memory-to-memory driver
> framework.
>  
> +config VIDEO_VIM2M_AUDIO
> + tristate "Virtual Memory-to-Memory Driver For Audio"
> + depends on VIDEO_DEV
> + select VIDEOBUF2_VMALLOC
> + select V4L2_MEM2MEM_DEV
> + help
> +   This is a virtual audio test device for the memory-to-memory driver
> +   framework.
> +
>  source "drivers/media/test-drivers/vicodec/Kconfig"
>  source "drivers/media/test-drivers/vimc/Kconfig"
>  source "drivers/media/test-drivers/vivid/Kconfig"
> diff --git a/drivers/media/test-drivers/Makefile 
> b/drivers/media/test-drivers/Makefile
> index 740714a4584d..c2c33282bf96 100644
> --- a/drivers/media/test-drivers/Makefile
> +++ b/drivers/media/test-drivers/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_DVB_VIDTV) += vidtv/
>  
>  obj-$(CONFIG_VIDEO_VICODEC) += vicodec/
>  obj-$(CONFIG_VIDEO_VIM2M) += vim2m.o
> +obj-$(CONFIG_VIDEO_VIM2M_AUDIO) += vim2m_audio.o
>  obj-$(CONFIG_VIDEO_VIMC) += vimc/
>  obj-$(CONFIG_VIDEO_VIVID) += vivid/
>  obj-$(CONFIG_VIDEO_VISL) += visl/
> diff --git a/drivers/media/test-drivers/vim2m_audio.c 
> b/drivers/media/test-drivers/vim2m_audio.c
> new file mode 100644
> index ..2134e8338417
> --- /dev/null
> +++ b/drivers/media/test-drivers/vim2m_audio.c
> @@ -0,0 +1,680 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * A virtual v4l2-mem2mem example for audio device.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +MODULE_DESCRIPTION("Virtual device for audio mem2mem testing");
> +MODULE_LICENSE("GPL");
> +
> +static unsigned int debug;
> +module_param(debug, uint, 0644);
> +MODULE_PARM_DESC(debug, "debug level");
> +
> +#define MEM2MEM_NAME "vim2m-audio"
> +
> +#define dprintk(dev, lvl, fmt, arg...) \
> + v4l2_dbg(lvl, debug, &(dev)->v4l2_dev, "%s: " fmt, __func__, ## arg)
> +
> +#define SAMPLE_NUM 4096
> +
> +static void audm2m_dev_release(struct device *dev)
> +{}
> +
> +static struct platform_device audm2m_pdev = {
> + .name   = MEM2MEM_NAME,
> + .dev.release= audm2m_dev_release,
> +};
> +
> +static u32 formats[] = {
> + V4L2_AUDIO_FMT_S8,
> + V4L2_AUDIO_FMT_S16_LE,
> + V4L2_AUDIO_FMT_U16_LE,
> + V4L2_AUDIO_FMT_S24_LE,
> + V4L2_AUDIO_FMT_S24_3LE,
> + V4L2_AUDIO_FMT_U24_LE,
> + V4L2_AUDIO_FMT_U24_3LE,
> + V4L2_AUDIO_FMT_S32_LE,
> + V4L2_AUDIO_FMT_U32_LE,
> + V4L2_AUDIO_FMT_S20_3LE,
> + V4L2_AUDIO_FMT_U20_3LE,
> +};
> +
> +#define NUM_FORMATS ARRAY_SIZE(formats)
> +
> +/* Per-queue, driver-specific private data */
> +struct audm2m_q_data {
> + unsigned intrate;
> + unsigned intchannels;
> + unsigned intbuffersize;
> + u32 fourcc;
> +};
> +
> +enum {
> + V4L2_M2M_SRC = 0,
> + V4L2_M2M_DST = 1,
> +};
> +
> +static snd_pcm_format_t find_format(u32 fourcc)
> +{
> + snd_pcm_format_t fmt;
> + unsigned int k;
> +
> + for (k = 0; k < NUM_FORMATS; k++) {
> + if (formats[k] == fourcc)
> + break;
> + }
> +
> + if (k == NUM_FORMATS)
> + return 0;
> +
> + fmt = v4l2_fourcc_to_audfmt(formats[k]);
> +
> + return fmt;
> +}
> +
> +struct audm2m_dev {
> + 

[PATCH] powerpc: Fix signature of pfn_to_kaddr()

2023-11-06 Thread Linus Walleij
There is a const in the returned value from pfn_to_kaddr()
but there are consumers that want to modify the result
and the generic function pfn_to_virt() in 
does allow this, so let's relax this requirement and do not
make the returned value const.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202311061940.4pbrm44u-...@intel.com/
Fixes: 58b6fed89ab0 ("powerpc: Make virt_to_pfn() a static inline")
Signed-off-by: Linus Walleij 
---
The remaining warnings from the test robot appear a bit bogus.
If someone knows what to do about them, please help. The warnings
often properly uncovers problems that have been around forever
due to these functions being disguised as macros.
---
 arch/powerpc/include/asm/page.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index e5fcc79b5bfb..5243e48dc13a 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -230,7 +230,7 @@ static inline unsigned long virt_to_pfn(const void *kaddr)
return __pa(kaddr) >> PAGE_SHIFT;
 }
 
-static inline const void *pfn_to_kaddr(unsigned long pfn)
+static inline void *pfn_to_kaddr(unsigned long pfn)
 {
return __va(pfn << PAGE_SHIFT);
 }

---
base-commit: d2f51b3516dade79269ff45eae2a7668ae711b25
change-id: 20231106-virt-to-pfn-fix-ppc-d91de47c6017

Best regards,
-- 
Linus Walleij 



[PATCH 5/7] powerpc/rtas: Move post_mobility_fixup() declaration to pseries

2023-11-06 Thread Nathan Lynch via B4 Relay
From: Nathan Lynch 

This is a pseries-specific function declaration that doesn't belong in
rtas.h. Move it to the pseries platform code and adjust
pseries/suspend.c accordingly.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/rtas.h  | 1 -
 arch/powerpc/platforms/pseries/pseries.h | 1 +
 arch/powerpc/platforms/pseries/suspend.c | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index c6568a647cd0..2365668fc13e 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -444,7 +444,6 @@ extern void pSeries_log_error(char *buf, unsigned int 
err_type, int fatal);
 #ifdef CONFIG_PPC_PSERIES
 extern time64_t last_rtas_event;
 extern int clobbering_unread_rtas_event(void);
-extern void post_mobility_fixup(void);
 int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
 #else
 static inline int clobbering_unread_rtas_event(void) { return 0; }
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index 8376f03f932a..bba4ad192b0f 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -55,6 +55,7 @@ extern int dlpar_detach_node(struct device_node *);
 extern int dlpar_acquire_drc(u32 drc_index);
 extern int dlpar_release_drc(u32 drc_index);
 extern int dlpar_unisolate_drc(u32 drc_index);
+extern void post_mobility_fixup(void);
 
 void queue_hotplug_event(struct pseries_hp_errorlog *hp_errlog);
 int handle_dlpar_errorlog(struct pseries_hp_errorlog *hp_errlog);
diff --git a/arch/powerpc/platforms/pseries/suspend.c 
b/arch/powerpc/platforms/pseries/suspend.c
index 5c43435472cc..382003dfdb9a 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include "pseries.h"
 
 static struct device suspend_dev;
 

-- 
2.41.0



[PATCH 3/7] powerpc/rtas: Drop declaration of undefined call_rtas() function

2023-11-06 Thread Nathan Lynch via B4 Relay
From: Nathan Lynch 

The call_rtas() function has never been a part of arch/powerpc, and
its implementation was removed from arch/ppc by 0a26b1364f14 ("ppc:
Remove CHRP, POWER3 and POWER4 support from arch/ppc").

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/rtas.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index c697c3c74694..3bf7f0a4b07e 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -542,8 +542,6 @@ static inline void pSeries_coalesce_init(void) { }
 static inline void rtas_initialize(void) { }
 #endif
 
-extern int call_rtas(const char *, int, int, unsigned long *, ...);
-
 #ifdef CONFIG_HV_PERF_CTRS
 void read_24x7_sys_info(void);
 #else

-- 
2.41.0



[PATCH 0/7] powerpc/rtas: Trivial, coding style, and kernel-doc fixes

2023-11-06 Thread Nathan Lynch via B4 Relay
* Fix recently introduced kernel-doc warnings.
* Make minor coding style adjustments for readability.
* Remove rtas_service_present() and an old call_rtas() declaration.
* Move a pseries-specific function prototype to pseries code.

---
Nathan Lynch (7):
  powerpc/pseries/rtas-work-area: Fix rtas_work_area_reserve_arena() 
kernel-doc
  powerpc/rtas: Fix ppc_rtas_rmo_buf_show() kernel-doc
  powerpc/rtas: Drop declaration of undefined call_rtas() function
  powerpc/rtas: Remove unused rtas_service_present()
  powerpc/rtas: Move post_mobility_fixup() declaration to pseries
  powerpc/rtas: Remove trailing space
  powerpc/rtas: Remove 'extern' from function declarations in rtas.h

 arch/powerpc/include/asm/rtas.h | 62 -
 arch/powerpc/kernel/rtas-proc.c |  2 +
 arch/powerpc/kernel/rtas.c  | 23 -
 arch/powerpc/platforms/pseries/pseries.h|  1 +
 arch/powerpc/platforms/pseries/rtas-work-area.c |  1 +
 arch/powerpc/platforms/pseries/suspend.c|  1 +
 6 files changed, 43 insertions(+), 47 deletions(-)
---
base-commit: 303d77a6e1707498f09c9d8ee91b1dc07ca315a5
change-id: 20231025-rtas-trivial-2c22ce853f46

Best regards,
-- 
Nathan Lynch 



[PATCH 6/7] powerpc/rtas: Remove trailing space

2023-11-06 Thread Nathan Lynch via B4 Relay
From: Nathan Lynch 

Use scripts/cleanfile to remove instances of trailing space in the
core RTAS code and header.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/rtas.h |  6 +++---
 arch/powerpc/kernel/rtas.c  | 18 +-
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 2365668fc13e..1bed6be8ada3 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -268,7 +268,7 @@ typedef struct {
 #define RTAS_TYPE_DEALLOC  0xE3
 #define RTAS_TYPE_DUMP 0xE4
 #define RTAS_TYPE_HOTPLUG  0xE5
-/* I don't add PowerMGM events right now, this is a different topic */ 
+/* I don't add PowerMGM events right now, this is a different topic */
 #define RTAS_TYPE_PMGM_POWER_SW_ON 0x60
 #define RTAS_TYPE_PMGM_POWER_SW_OFF0x61
 #define RTAS_TYPE_PMGM_LID_OPEN0x62
@@ -461,7 +461,7 @@ static inline void rtas_cancel_event_scan(void) { }
 
 /* Error types logged.  */
 #define ERR_FLAG_ALREADY_LOGGED0x0
-#define ERR_FLAG_BOOT  0x1 /* log was pulled from NVRAM on boot */
+#define ERR_FLAG_BOOT  0x1 /* log was pulled from NVRAM on boot */
 #define ERR_TYPE_RTAS_LOG  0x2 /* from rtas event-scan */
 #define ERR_TYPE_KERNEL_PANIC  0x4 /* from die()/panic() */
 #define ERR_TYPE_KERNEL_PANIC_GZ 0x8   /* ditto, compressed */
@@ -471,7 +471,7 @@ static inline void rtas_cancel_event_scan(void) { }
(ERR_TYPE_RTAS_LOG | ERR_TYPE_KERNEL_PANIC | ERR_TYPE_KERNEL_PANIC_GZ)
 
 #define RTAS_DEBUG KERN_DEBUG "RTAS: "
- 
+
 #define RTAS_ERROR_LOG_MAX 2048
 
 /*
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index b5b340a91157..c49f078382a9 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -670,7 +670,7 @@ static void call_rtas_display_status_delay(char c)
static int pending_newline = 0;  /* did last write end with unprinted 
newline? */
static int width = 16;
 
-   if (c == '\n') {
+   if (c == '\n') {
while (width-- > 0)
call_rtas_display_status(' ');
width = 16;
@@ -680,7 +680,7 @@ static void call_rtas_display_status_delay(char c)
if (pending_newline) {
call_rtas_display_status('\r');
call_rtas_display_status('\n');
-   } 
+   }
pending_newline = 0;
if (width--) {
call_rtas_display_status(c);
@@ -820,7 +820,7 @@ void rtas_progress(char *s, unsigned short hex)
else
rtas_call(display_character, 1, 1, NULL, '\r');
}
- 
+
if (row_width)
width = row_width[current_line];
else
@@ -840,9 +840,9 @@ void rtas_progress(char *s, unsigned short hex)
spin_unlock(_lock);
return;
}
- 
+
/* RTAS wants CR-LF, not just LF */
- 
+
if (*os == '\n') {
rtas_call(display_character, 1, 1, NULL, '\r');
rtas_call(display_character, 1, 1, NULL, '\n');
@@ -852,7 +852,7 @@ void rtas_progress(char *s, unsigned short hex)
 */
rtas_call(display_character, 1, 1, NULL, *os);
}
- 
+
if (row_width)
width = row_width[current_line];
else
@@ -861,15 +861,15 @@ void rtas_progress(char *s, unsigned short hex)
width--;
rtas_call(display_character, 1, 1, NULL, *os);
}
- 
+
os++;
- 
+
/* if we overwrite the screen length */
if (width <= 0)
while ((*os != 0) && (*os != '\n') && (*os != '\r'))
os++;
}
- 
+
spin_unlock(_lock);
 }
 EXPORT_SYMBOL_GPL(rtas_progress);  /* needed by rtas_flash module 
*/

-- 
2.41.0



[PATCH 4/7] powerpc/rtas: Remove unused rtas_service_present()

2023-11-06 Thread Nathan Lynch via B4 Relay
From: Nathan Lynch 

rtas_service_present() has no more users.

rtas_function_implemented() is now the appropriate API for determining
whether a given RTAS function is available to call.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/rtas.h | 1 -
 arch/powerpc/kernel/rtas.c  | 5 -
 2 files changed, 6 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 3bf7f0a4b07e..c6568a647cd0 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -409,7 +409,6 @@ static inline bool rtas_function_implemented(const 
rtas_fn_handle_t handle)
return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE;
 }
 extern int rtas_token(const char *service);
-extern int rtas_service_present(const char *service);
 extern int rtas_call(int token, int, int, int *, ...);
 void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
int nret, ...);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index eddc031c4b95..b5b340a91157 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -900,11 +900,6 @@ int rtas_token(const char *service)
 }
 EXPORT_SYMBOL_GPL(rtas_token);
 
-int rtas_service_present(const char *service)
-{
-   return rtas_token(service) != RTAS_UNKNOWN_SERVICE;
-}
-
 #ifdef CONFIG_RTAS_ERROR_LOGGING
 
 static u32 rtas_error_log_max __ro_after_init = RTAS_ERROR_LOG_MAX;

-- 
2.41.0



[PATCH 1/7] powerpc/pseries/rtas-work-area: Fix rtas_work_area_reserve_arena() kernel-doc

2023-11-06 Thread Nathan Lynch via B4 Relay
From: Nathan Lynch 

>From a W=1 build:

>> arch/powerpc/platforms/pseries/rtas-work-area.c:189: warning: Function 
>> parameter or member 'limit' not
>> described in 'rtas_work_area_reserve_arena'

Add the missing description of the limit parameter.

Signed-off-by: Nathan Lynch 
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202309131221.bm1pg96n-...@intel.com/
---
 arch/powerpc/platforms/pseries/rtas-work-area.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/rtas-work-area.c 
b/arch/powerpc/platforms/pseries/rtas-work-area.c
index b37d52f40360..7fe34bee84d8 100644
--- a/arch/powerpc/platforms/pseries/rtas-work-area.c
+++ b/arch/powerpc/platforms/pseries/rtas-work-area.c
@@ -184,6 +184,7 @@ machine_arch_initcall(pseries, 
rtas_work_area_allocator_init);
 
 /**
  * rtas_work_area_reserve_arena() - Reserve memory suitable for RTAS work 
areas.
+ * @limit: Upper limit for memblock allocation.
  */
 void __init rtas_work_area_reserve_arena(const phys_addr_t limit)
 {

-- 
2.41.0



[PATCH 7/7] powerpc/rtas: Remove 'extern' from function declarations in rtas.h

2023-11-06 Thread Nathan Lynch via B4 Relay
From: Nathan Lynch 

This header occasionally gains new function declarations without the
leading extern in accordance with current style rules. Leaving the
legacy externs in place is making the header more difficult to read
over time because of the inconsistency. Remove them.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/include/asm/rtas.h | 52 -
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 1bed6be8ada3..6e32ad62e936 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -408,42 +408,42 @@ static inline bool rtas_function_implemented(const 
rtas_fn_handle_t handle)
 {
return rtas_function_token(handle) != RTAS_UNKNOWN_SERVICE;
 }
-extern int rtas_token(const char *service);
-extern int rtas_call(int token, int, int, int *, ...);
+int rtas_token(const char *service);
+int rtas_call(int token, int, int, int *, ...);
 void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
int nret, ...);
-extern void __noreturn rtas_restart(char *cmd);
-extern void rtas_power_off(void);
-extern void __noreturn rtas_halt(void);
-extern void rtas_os_term(char *str);
+void __noreturn rtas_restart(char *cmd);
+void rtas_power_off(void);
+void __noreturn rtas_halt(void);
+void rtas_os_term(char *str);
 void rtas_activate_firmware(void);
-extern int rtas_get_sensor(int sensor, int index, int *state);
-extern int rtas_get_sensor_fast(int sensor, int index, int *state);
-extern int rtas_get_power_level(int powerdomain, int *level);
-extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
-extern bool rtas_indicator_present(int token, int *maxindex);
-extern int rtas_set_indicator(int indicator, int index, int new_value);
-extern int rtas_set_indicator_fast(int indicator, int index, int new_value);
-extern void rtas_progress(char *s, unsigned short hex);
+int rtas_get_sensor(int sensor, int index, int *state);
+int rtas_get_sensor_fast(int sensor, int index, int *state);
+int rtas_get_power_level(int powerdomain, int *level);
+int rtas_set_power_level(int powerdomain, int level, int *setlevel);
+bool rtas_indicator_present(int token, int *maxindex);
+int rtas_set_indicator(int indicator, int index, int new_value);
+int rtas_set_indicator_fast(int indicator, int index, int new_value);
+void rtas_progress(char *s, unsigned short hex);
 int rtas_ibm_suspend_me(int *fw_status);
 int rtas_error_rc(int rtas_rc);
 
 struct rtc_time;
-extern time64_t rtas_get_boot_time(void);
-extern void rtas_get_rtc_time(struct rtc_time *rtc_time);
-extern int rtas_set_rtc_time(struct rtc_time *rtc_time);
+time64_t rtas_get_boot_time(void);
+void rtas_get_rtc_time(struct rtc_time *rtc_time);
+int rtas_set_rtc_time(struct rtc_time *rtc_time);
 
-extern unsigned int rtas_busy_delay_time(int status);
+unsigned int rtas_busy_delay_time(int status);
 bool rtas_busy_delay(int status);
 
-extern int early_init_dt_scan_rtas(unsigned long node,
+int early_init_dt_scan_rtas(unsigned long node,
const char *uname, int depth, void *data);
 
-extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
+void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
 
 #ifdef CONFIG_PPC_PSERIES
 extern time64_t last_rtas_event;
-extern int clobbering_unread_rtas_event(void);
+int clobbering_unread_rtas_event(void);
 int rtas_syscall_dispatch_ibm_suspend_me(u64 handle);
 #else
 static inline int clobbering_unread_rtas_event(void) { return 0; }
@@ -454,7 +454,7 @@ static inline int rtas_syscall_dispatch_ibm_suspend_me(u64 
handle)
 #endif
 
 #ifdef CONFIG_PPC_RTAS_DAEMON
-extern void rtas_cancel_event_scan(void);
+void rtas_cancel_event_scan(void);
 #else
 static inline void rtas_cancel_event_scan(void) { }
 #endif
@@ -479,7 +479,7 @@ static inline void rtas_cancel_event_scan(void) { }
  *  for all rtas calls that require an error buffer argument.
  *  This includes 'check-exception' and 'rtas-last-error'.
  */
-extern int rtas_get_error_log_max(void);
+int rtas_get_error_log_max(void);
 
 /* Event Scan Parameters */
 #define EVENT_SCAN_ALL_EVENTS  0xf000
@@ -518,8 +518,8 @@ static inline u32 rtas_config_addr(int busno, int devfn, 
int reg)
(devfn << 8) | (reg & 0xff);
 }
 
-extern void rtas_give_timebase(void);
-extern void rtas_take_timebase(void);
+void rtas_give_timebase(void);
+void rtas_take_timebase(void);
 
 #ifdef CONFIG_PPC_RTAS
 static inline int page_is_rtas_user_buf(unsigned long pfn)
@@ -532,7 +532,7 @@ static inline int page_is_rtas_user_buf(unsigned long pfn)
 
 /* Not the best place to put pSeries_coalesce_init, will be fixed when we
  * move some of the rtas suspend-me stuff to pseries */
-extern void pSeries_coalesce_init(void);
+void pSeries_coalesce_init(void);
 void rtas_initialize(void);
 #else
 static inline int page_is_rtas_user_buf(unsigned long pfn) { 

[PATCH 2/7] powerpc/rtas: Fix ppc_rtas_rmo_buf_show() kernel-doc

2023-11-06 Thread Nathan Lynch via B4 Relay
From: Nathan Lynch 

>From a W=1 build:

>> arch/powerpc/kernel/rtas-proc.c:771: warning: Function parameter or member 
>> 'm' not described in
>> 'ppc_rtas_rmo_buf_show'
>> arch/powerpc/kernel/rtas-proc.c:771: warning: Function parameter or member 
>> 'v' not described in
>> 'ppc_rtas_rmo_buf_show'

Add the missing parameter descriptions.

Signed-off-by: Nathan Lynch 
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202309211645.1lvwmbv4-...@intel.com/
---
 arch/powerpc/kernel/rtas-proc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/rtas-proc.c b/arch/powerpc/kernel/rtas-proc.c
index 9454b8395b6a..f38df72e64b8 100644
--- a/arch/powerpc/kernel/rtas-proc.c
+++ b/arch/powerpc/kernel/rtas-proc.c
@@ -752,6 +752,8 @@ static int ppc_rtas_tone_volume_show(struct seq_file *m, 
void *v)
 
 /**
  * ppc_rtas_rmo_buf_show() - Describe RTAS-addressable region for user space.
+ * @m: seq_file output target.
+ * @v: Unused.
  *
  * Base + size description of a range of RTAS-addressable memory set
  * aside for user space to use as work area(s) for certain RTAS

-- 
2.41.0



Re: [PATCH v13 20/35] KVM: x86/mmu: Handle page fault for private memory

2023-11-06 Thread Xu Yilun
On Sun, Nov 05, 2023 at 05:19:36PM +0100, Paolo Bonzini wrote:
> On Sun, Nov 5, 2023 at 2:04 PM Xu Yilun  wrote:
> >
> > > +static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> > > +   struct kvm_page_fault *fault)
> > > +{
> > > + kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
> > > +   PAGE_SIZE, fault->write, fault->exec,
> > > +   fault->is_private);
> > > +}
> > > +
> > > +static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > > +struct kvm_page_fault *fault)
> > > +{
> > > + int max_order, r;
> > > +
> > > + if (!kvm_slot_can_be_private(fault->slot)) {
> > > + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> > > + return -EFAULT;
> > > + }
> > > +
> > > + r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, 
> > > >pfn,
> > > +  _order);
> > > + if (r) {
> > > + kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> > > + return r;
> >
> > Why report KVM_EXIT_MEMORY_FAULT here? even with a ret != -EFAULT?
> 
> The cases are EFAULT, EHWPOISON (which can report
> KVM_EXIT_MEMORY_FAULT) and ENOMEM. I think it's fine
> that even -ENOMEM can return KVM_EXIT_MEMORY_FAULT,
> and it doesn't violate the documentation.  The docs tell you "what
> can you do if error if EFAULT or EHWPOISON?"; they don't
> exclude that other errnos result in KVM_EXIT_MEMORY_FAULT,
> it's just that you're not supposed to look at it

Thanks, it's OK for ENOMEM + KVM_EXIT_MEMORY_FAULT.

Another concern is, now 3 places to report EFAULT + KVM_EXIT_MEMORY_FAULT:

  if (!kvm_slot_can_be_private(fault->slot)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return -EFAULT;
  }

  file = kvm_gmem_get_file(slot);
  if (!file)
return -EFAULT;

  if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return -EFAULT;
  }

They are different cases, and seems userspace should handle them
differently, but not enough information to distinguish them.

Thanks,
Yilun

> 
> Paolo
> 
> 


Re: [PATCH v2 37/37] powerpc: Support execute-only on all powerpc

2023-11-06 Thread Christophe Leroy


Le 02/11/2023 à 06:39, Aneesh Kumar K.V a écrit :
> Christophe Leroy  writes:
> 
>> Introduce PAGE_EXECONLY_X macro which provides exec-only rights.
>> The _X may be seen as redundant with the EXECONLY but it helps
>> keep consistancy, all macros having the EXEC right have _X.
>>
>> And put it next to PAGE_NONE as PAGE_EXECONLY_X is
>> somehow PAGE_NONE + EXEC just like all other SOMETHING_X are
>> just SOMETHING + EXEC.
>>
>> On book3s/64 PAGE_EXECONLY becomes PAGE_READONLY_X.
>>
>> On book3s/64, as PAGE_EXECONLY is only valid for Radix add
>> VM_READ flag in vm_get_page_prot() for non-Radix.
>>
>> And update access_error() so that a non exec fault on a VM_EXEC only
>> mapping is always invalid, even when the underlying layer don't
>> always generate a fault for that.
>>
>> For 8xx, set PAGE_EXECONLY_X as _PAGE_NA | _PAGE_EXEC.
>> For others, only set it as just _PAGE_EXEC
>>
>> With that change, 8xx, e500 and 44x fully honor execute-only
>> protection.
>>
>> On 40x that is a partial implementation of execute-only. The
>> implementation won't be complete because once a TLB has been loaded
>> via the Instruction TLB miss handler, it will be possible to read
>> the page. But at least it can't be read unless it is executed first.
>>
>> On 603 MMU, TLB missed are handled by SW and there are separate
>> DTLB and ITLB. Execute-only is therefore now supported by not loading
>> DTLB when read access is not permitted.
>>
>> On hash (604) MMU it is more tricky because hash table is common to
>> load/store and execute. Nevertheless it is still possible to check
>> whether _PAGE_READ is set before loading hash table for a load/store
>> access. At least it can't be read unless it is executed first.
>>
>> Signed-off-by: Christophe Leroy 
>> Cc: Russell Currey 
>> Cc: Kees Cook 
>> ---
>>   arch/powerpc/include/asm/book3s/32/pgtable.h |  2 +-
>>   arch/powerpc/include/asm/book3s/64/pgtable.h |  4 +---
>>   arch/powerpc/include/asm/nohash/32/pte-8xx.h |  1 +
>>   arch/powerpc/include/asm/nohash/pgtable.h|  2 +-
>>   arch/powerpc/include/asm/nohash/pte-e500.h   |  1 +
>>   arch/powerpc/include/asm/pgtable-masks.h |  2 ++
>>   arch/powerpc/mm/book3s64/pgtable.c   | 10 --
>>   arch/powerpc/mm/fault.c  |  9 +
>>   arch/powerpc/mm/pgtable.c|  4 ++--
>>   9 files changed, 18 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
>> b/arch/powerpc/include/asm/book3s/32/pgtable.h
>> index 244621c88510..52971ee30717 100644
>> --- a/arch/powerpc/include/asm/book3s/32/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
>> @@ -425,7 +425,7 @@ static inline bool pte_access_permitted(pte_t pte, bool 
>> write)
>>   {
>>  /*
>>   * A read-only access is controlled by _PAGE_READ bit.
>> - * We have _PAGE_READ set for WRITE and EXECUTE
>> + * We have _PAGE_READ set for WRITE
>>   */
>>  if (!pte_present(pte) || !pte_read(pte))
>>  return false;
>>
> 
> Should this now be updated to check for EXEC bit ?

I don't think so based on what I see in arm64: 
https://elixir.bootlin.com/linux/v6.6/source/arch/arm64/include/asm/pgtable.h#L146

Christophe


Re: [PATCH v2 29/37] powerpc/nohash: Replace pte_user() by pte_read()

2023-11-06 Thread Christophe Leroy


Le 31/10/2023 à 11:15, Aneesh Kumar K.V a écrit :
> Christophe Leroy  writes:
> 
>> pte_user() is now only used in pte_access_permitted() to check
>> access on vmas. User flag is cleared to make a page unreadable.
>>
>> So rename it pte_read() and remove pte_user() which isn't used
>> anymore.
>>
>> For the time being it checks _PAGE_USER but in the near futur
>> all plateforms will be converted to _PAGE_READ so lets support
>> both for now.
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>>   arch/powerpc/include/asm/nohash/32/pte-8xx.h |  7 ---
>>   arch/powerpc/include/asm/nohash/pgtable.h| 13 +++--
>>   arch/powerpc/mm/ioremap.c|  4 
>>   3 files changed, 7 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h 
>> b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
>> index 62c965a4511a..1ee38befd29a 100644
>> --- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
>> +++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
>> @@ -112,13 +112,6 @@ static inline pte_t pte_mkwrite_novma(pte_t pte)
>>   
>>   #define pte_mkwrite_novma pte_mkwrite_novma
>>   
>> -static inline bool pte_user(pte_t pte)
>> -{
>> -return !(pte_val(pte) & _PAGE_SH);
>> -}
>> -
>> -#define pte_user pte_user
>> -
>>   static inline pte_t pte_mkhuge(pte_t pte)
>>   {
>>  return __pte(pte_val(pte) | _PAGE_SPS | _PAGE_HUGE);
>> diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
>> b/arch/powerpc/include/asm/nohash/pgtable.h
>> index ee677162f9e6..aba56fe3b1c6 100644
>> --- a/arch/powerpc/include/asm/nohash/pgtable.h
>> +++ b/arch/powerpc/include/asm/nohash/pgtable.h
>> @@ -160,9 +160,6 @@ static inline int pte_write(pte_t pte)
>>  return pte_val(pte) & _PAGE_WRITE;
>>   }
>>   #endif
>> -#ifndef pte_read
>> -static inline int pte_read(pte_t pte)   { return 1; }
>> -#endif
>>   static inline int pte_dirty(pte_t pte) { return pte_val(pte) & 
>> _PAGE_DIRTY; }
>>   static inline int pte_special(pte_t pte)   { return pte_val(pte) & 
>> _PAGE_SPECIAL; }
>>   static inline int pte_none(pte_t pte)  { return (pte_val(pte) 
>> & ~_PTE_NONE_MASK) == 0; }
>> @@ -190,10 +187,14 @@ static inline int pte_young(pte_t pte)
>>* and PTE_64BIT, PAGE_KERNEL_X contains _PAGE_BAP_SR which is also in
>>* _PAGE_USER.  Need to explicitly match _PAGE_BAP_UR bit in that case too.
>>*/
>> -#ifndef pte_user
>> -static inline bool pte_user(pte_t pte)
>> +#ifndef pte_read
>> +static inline bool pte_read(pte_t pte)
>>   {
>> +#ifdef _PAGE_READ
>> +return (pte_val(pte) & _PAGE_READ) == _PAGE_READ;
>> +#else
>>  return (pte_val(pte) & _PAGE_USER) == _PAGE_USER;
>> +#endif
>>   }
>>   #endif
>>   
>> @@ -208,7 +209,7 @@ static inline bool pte_access_permitted(pte_t pte, bool 
>> write)
>>   * A read-only access is controlled by _PAGE_USER bit.
>>   * We have _PAGE_READ set for WRITE and EXECUTE
>>   */
>> -if (!pte_present(pte) || !pte_user(pte) || !pte_read(pte))
>> +if (!pte_present(pte) || !pte_read(pte))
>>  return false;
>>   
>>  if (write && !pte_write(pte))
>> diff --git a/arch/powerpc/mm/ioremap.c b/arch/powerpc/mm/ioremap.c
>> index 7823c38f09de..7b0afcabd89f 100644
>> --- a/arch/powerpc/mm/ioremap.c
>> +++ b/arch/powerpc/mm/ioremap.c
>> @@ -50,10 +50,6 @@ void __iomem *ioremap_prot(phys_addr_t addr, size_t size, 
>> unsigned long flags)
>>  if (pte_write(pte))
>>  pte = pte_mkdirty(pte);
>>   
>> -/* we don't want to let _PAGE_USER leak out */
>> -if (WARN_ON(pte_user(pte)))
>> -return NULL;
>>
> 
> This check is still valid right? I understand that we want to remove
> _PAGE_USER. But then loosing this check is ok?

Well, we may have to think about it for book3s/64. For all others 
_PAGE_USER is gone and replaced by a check of addresses versus TASK_SIZE.

As ioremap() will map into vmalloc space that address is necesserally 
correct.

> 
>> -
>>  if (iowa_is_active())
>>  return iowa_ioremap(addr, size, pte_pgprot(pte), caller);
>>  return __ioremap_caller(addr, size, pte_pgprot(pte), caller);
>> -- 
>> 2.41.0


Re: [PATCH 27/34] KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type

2023-11-06 Thread Fuad Tabba
On Sun, Nov 5, 2023 at 4:34 PM Paolo Bonzini  wrote:
>
> From: Sean Christopherson 
>
> Add a "vm_shape" structure to encapsulate the selftests-defined "mode",
> along with the KVM-defined "type" for use when creating a new VM.  "mode"
> tracks physical and virtual address properties, as well as the preferred
> backing memory type, while "type" corresponds to the VM type.
>
> Taking the VM type will allow adding tests for KVM_CREATE_GUEST_MEMFD,
> a.k.a. guest private memory, without needing an entirely separate set of
> helpers.  Guest private memory is effectively usable only by confidential
> VM types, and it's expected that x86 will double down and require unique
> VM types for TDX and SNP guests.
>
> Signed-off-by: Sean Christopherson 
> Message-Id: <20231027182217.3615211-30-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---

nit: as in a prior selftest commit messages, references in the commit
message to guest _private_ memory. Should these be changed to just
guest memory?

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad

>  tools/testing/selftests/kvm/dirty_log_test.c  |  2 +-
>  .../selftests/kvm/include/kvm_util_base.h | 54 +++
>  .../selftests/kvm/kvm_page_table_test.c   |  2 +-
>  tools/testing/selftests/kvm/lib/kvm_util.c| 43 +++
>  tools/testing/selftests/kvm/lib/memstress.c   |  3 +-
>  .../kvm/x86_64/ucna_injection_test.c  |  2 +-
>  6 files changed, 72 insertions(+), 34 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c 
> b/tools/testing/selftests/kvm/dirty_log_test.c
> index 936f3a8d1b83..6cbecf499767 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -699,7 +699,7 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, 
> struct kvm_vcpu **vcpu,
>
> pr_info("Testing guest mode: %s\n", vm_guest_mode_string(mode));
>
> -   vm = __vm_create(mode, 1, extra_mem_pages);
> +   vm = __vm_create(VM_SHAPE(mode), 1, extra_mem_pages);
>
> log_mode_create_vm_done(vm);
> *vcpu = vm_vcpu_add(vm, 0, guest_code);
> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h 
> b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index 1441fca6c273..157508c071f3 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -188,6 +188,23 @@ enum vm_guest_mode {
> NUM_VM_MODES,
>  };
>
> +struct vm_shape {
> +   enum vm_guest_mode mode;
> +   unsigned int type;
> +};
> +
> +#define VM_TYPE_DEFAULT0
> +
> +#define VM_SHAPE(__mode)   \
> +({ \
> +   struct vm_shape shape = {   \
> +   .mode = (__mode),   \
> +   .type = VM_TYPE_DEFAULT \
> +   };  \
> +   \
> +   shape;  \
> +})
> +
>  #if defined(__aarch64__)
>
>  extern enum vm_guest_mode vm_mode_default;
> @@ -220,6 +237,8 @@ extern enum vm_guest_mode vm_mode_default;
>
>  #endif
>
> +#define VM_SHAPE_DEFAULT   VM_SHAPE(VM_MODE_DEFAULT)
> +
>  #define MIN_PAGE_SIZE  (1U << MIN_PAGE_SHIFT)
>  #define PTES_PER_MIN_PAGE  ptes_per_page(MIN_PAGE_SIZE)
>
> @@ -784,21 +803,21 @@ vm_paddr_t vm_alloc_page_table(struct kvm_vm *vm);
>   * __vm_create() does NOT create vCPUs, @nr_runnable_vcpus is used purely to
>   * calculate the amount of memory needed for per-vCPU data, e.g. stacks.
>   */
> -struct kvm_vm *vm_create(enum vm_guest_mode mode);
> -struct kvm_vm *__vm_create(enum vm_guest_mode mode, uint32_t 
> nr_runnable_vcpus,
> +struct kvm_vm *vm_create(struct vm_shape shape);
> +struct kvm_vm *__vm_create(struct vm_shape shape, uint32_t nr_runnable_vcpus,
>uint64_t nr_extra_pages);
>
>  static inline struct kvm_vm *vm_create_barebones(void)
>  {
> -   return vm_create(VM_MODE_DEFAULT);
> +   return vm_create(VM_SHAPE_DEFAULT);
>  }
>
>  static inline struct kvm_vm *vm_create(uint32_t nr_runnable_vcpus)
>  {
> -   return __vm_create(VM_MODE_DEFAULT, nr_runnable_vcpus, 0);
> +   return __vm_create(VM_SHAPE_DEFAULT, nr_runnable_vcpus, 0);
>  }
>
> -struct kvm_vm *__vm_create_with_vcpus(enum vm_guest_mode mode, uint32_t 
> nr_vcpus,
> +struct kvm_vm *__vm_create_with_vcpus(struct vm_shape shape, uint32_t 
> nr_vcpus,
>   uint64_t extra_mem_pages,
>   void *guest_code, struct kvm_vcpu 
> *vcpus[]);
>
> @@ -806,17 +825,27 @@ static inline struct kvm_vm 
> *vm_create_with_vcpus(uint32_t nr_vcpus,
>   void *guest_code,
>   struct kvm_vcpu *vcpus[])
>  {
> -   return 

Re: [PATCH 28/34] KVM: selftests: Add GUEST_SYNC[1-6] macros for synchronizing more data

2023-11-06 Thread Fuad Tabba
On Sun, Nov 5, 2023 at 4:34 PM Paolo Bonzini  wrote:
>
> From: Sean Christopherson 
>
> Add GUEST_SYNC[1-6]() so that tests can pass the maximum amount of
> information supported via ucall(), without needing to resort to shared
> memory.
>
> Signed-off-by: Sean Christopherson 
> Message-Id: <20231027182217.3615211-31-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad

>  tools/testing/selftests/kvm/include/ucall_common.h | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/ucall_common.h 
> b/tools/testing/selftests/kvm/include/ucall_common.h
> index ce33d306c2cb..0fb472a5a058 100644
> --- a/tools/testing/selftests/kvm/include/ucall_common.h
> +++ b/tools/testing/selftests/kvm/include/ucall_common.h
> @@ -52,6 +52,17 @@ int ucall_nr_pages_required(uint64_t page_size);
>  #define GUEST_SYNC_ARGS(stage, arg1, arg2, arg3, arg4) \
> ucall(UCALL_SYNC, 6, "hello", stage, arg1, 
> arg2, arg3, arg4)
>  #define GUEST_SYNC(stage)  ucall(UCALL_SYNC, 2, "hello", stage)
> +#define GUEST_SYNC1(arg0)  ucall(UCALL_SYNC, 1, arg0)
> +#define GUEST_SYNC2(arg0, arg1)ucall(UCALL_SYNC, 2, arg0, arg1)
> +#define GUEST_SYNC3(arg0, arg1, arg2) \
> +   ucall(UCALL_SYNC, 3, arg0, arg1, arg2)
> +#define GUEST_SYNC4(arg0, arg1, arg2, arg3) \
> +   ucall(UCALL_SYNC, 4, arg0, arg1, arg2, arg3)
> +#define GUEST_SYNC5(arg0, arg1, arg2, arg3, arg4) \
> +   ucall(UCALL_SYNC, 5, arg0, arg1, arg2, arg3, 
> arg4)
> +#define GUEST_SYNC6(arg0, arg1, arg2, arg3, arg4, arg5) \
> +   ucall(UCALL_SYNC, 6, arg0, arg1, arg2, arg3, 
> arg4, arg5)
> +
>  #define GUEST_PRINTF(_fmt, _args...) ucall_fmt(UCALL_PRINTF, _fmt, ##_args)
>  #define GUEST_DONE()   ucall(UCALL_DONE, 0)
>
> --
> 2.39.1
>
>


Re: [PATCH 26/34] KVM: selftests: Add helpers to do KVM_HC_MAP_GPA_RANGE hypercalls (x86)

2023-11-06 Thread Fuad Tabba
On Sun, Nov 5, 2023 at 4:34 PM Paolo Bonzini  wrote:
>
> From: Vishal Annapurve 
>
> Add helpers for x86 guests to invoke the KVM_HC_MAP_GPA_RANGE hypercall,
> which KVM will forward to userspace and thus can be used by tests to
> coordinate private<=>shared conversions between host userspace code and
> guest code.
>
> Signed-off-by: Vishal Annapurve 
> [sean: drop shared/private helpers (let tests specify flags)]
> Signed-off-by: Sean Christopherson 
> Message-Id: <20231027182217.3615211-29-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad

>  .../selftests/kvm/include/x86_64/processor.h  | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h 
> b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 25bc61dac5fb..a84863503fcb 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>
> +#include 
>  #include 
>
>  #include "../kvm_util.h"
> @@ -1194,6 +1195,20 @@ uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, 
> uint64_t a1, uint64_t a2,
>  uint64_t __xen_hypercall(uint64_t nr, uint64_t a0, void *a1);
>  void xen_hypercall(uint64_t nr, uint64_t a0, void *a1);
>
> +static inline uint64_t __kvm_hypercall_map_gpa_range(uint64_t gpa,
> +uint64_t size, uint64_t 
> flags)
> +{
> +   return kvm_hypercall(KVM_HC_MAP_GPA_RANGE, gpa, size >> PAGE_SHIFT, 
> flags, 0);
> +}
> +
> +static inline void kvm_hypercall_map_gpa_range(uint64_t gpa, uint64_t size,
> +  uint64_t flags)
> +{
> +   uint64_t ret = __kvm_hypercall_map_gpa_range(gpa, size, flags);
> +
> +   GUEST_ASSERT(!ret);
> +}
> +
>  void __vm_xsave_require_permission(uint64_t xfeature, const char *name);
>
>  #define vm_xsave_require_permission(xfeature)  \
> --
> 2.39.1
>
>


Re: [PATCH 14/34] fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()

2023-11-06 Thread Fuad Tabba
On Sun, Nov 5, 2023 at 4:32 PM Paolo Bonzini  wrote:
>
> The call to the inode_init_security_anon() LSM hook is not the sole
> reason to use anon_inode_getfile_secure() or anon_inode_getfd_secure().
> For example, the functions also allow one to create a file with non-zero
> size, without needing a full-blown filesystem.  In this case, you don't
> need a "secure" version, just unique inodes; the current name of the
> functions is confusing and does not explain well the difference with
> the more "standard" anon_inode_getfile() and anon_inode_getfd().
>
> Of course, there is another side of the coin; neither io_uring nor
> userfaultfd strictly speaking need distinct inodes, and it is not
> that clear anymore that anon_inode_create_get{file,fd}() allow the LSM
> to intercept and block the inode's creation.  If one was so inclined,
> anon_inode_getfile_secure() and anon_inode_getfd_secure() could be kept,
> using the shared inode or a new one depending on CONFIG_SECURITY.
> However, this is probably overkill, and potentially a cause of bugs in
> different configurations.  Therefore, just add a comment to io_uring
> and userfaultfd explaining the choice of the function.
>
> While at it, remove the export for what is now anon_inode_create_getfd().
> There is no in-tree module that uses it, and the old name is gone anyway.
> If anybody actually needs the symbol, they can ask or they can just use
> anon_inode_create_getfile(), which will be exported very soon for use
> in KVM.
>
> Suggested-by: Christian Brauner 
> Signed-off-by: Paolo Bonzini 
> ---

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad


>  fs/anon_inodes.c| 46 +++--
>  fs/userfaultfd.c|  5 ++--
>  include/linux/anon_inodes.h |  4 ++--
>  io_uring/io_uring.c |  3 ++-
>  4 files changed, 36 insertions(+), 22 deletions(-)
>
> diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
> index 24192a7667ed..3d4a27f8b4fe 100644
> --- a/fs/anon_inodes.c
> +++ b/fs/anon_inodes.c
> @@ -79,7 +79,7 @@ static struct file *__anon_inode_getfile(const char *name,
>  const struct file_operations *fops,
>  void *priv, int flags,
>  const struct inode *context_inode,
> -bool secure)
> +bool make_inode)
>  {
> struct inode *inode;
> struct file *file;
> @@ -87,7 +87,7 @@ static struct file *__anon_inode_getfile(const char *name,
> if (fops->owner && !try_module_get(fops->owner))
> return ERR_PTR(-ENOENT);
>
> -   if (secure) {
> +   if (make_inode) {
> inode = anon_inode_make_secure_inode(name, context_inode);
> if (IS_ERR(inode)) {
> file = ERR_CAST(inode);
> @@ -149,13 +149,10 @@ struct file *anon_inode_getfile(const char *name,
>  EXPORT_SYMBOL_GPL(anon_inode_getfile);
>
>  /**
> - * anon_inode_getfile_secure - Like anon_inode_getfile(), but creates a new
> + * anon_inode_create_getfile - Like anon_inode_getfile(), but creates a new
>   * !S_PRIVATE anon inode rather than reuse the
>   * singleton anon inode and calls the
> - * inode_init_security_anon() LSM hook.  This
> - * allows for both the inode to have its own
> - * security context and for the LSM to enforce
> - * policy on the inode's creation.
> + * inode_init_security_anon() LSM hook.
>   *
>   * @name:[in]name of the "class" of the new file
>   * @fops:[in]file operations for the new file
> @@ -164,11 +161,19 @@ EXPORT_SYMBOL_GPL(anon_inode_getfile);
>   * @context_inode:
>   *   [in]the logical relationship with the new inode (optional)
>   *
> + * Create a new anonymous inode and file pair.  This can be done for two
> + * reasons:
> + * - for the inode to have its own security context, so that LSMs can enforce
> + *   policy on the inode's creation;
> + * - if the caller needs a unique inode, for example in order to customize
> + *   the size returned by fstat()
> + *
>   * The LSM may use @context_inode in inode_init_security_anon(), but a
> - * reference to it is not held.  Returns the newly created file* or an error
> - * pointer.  See the anon_inode_getfile() documentation for more information.
> + * reference to it is not held.
> + *
> + * Returns the newly created file* or an error pointer.
>   */
> -struct file *anon_inode_getfile_secure(const char *name,
> +struct file *anon_inode_create_getfile(const char *name,
>const struct file_operations *fops,
>void *priv, int flags,
>const struct 

Re: [PATCH v13 27/35] KVM: selftests: Add helpers to convert guest memory b/w private and shared

2023-11-06 Thread Fuad Tabba
On Fri, Oct 27, 2023 at 7:23 PM Sean Christopherson  wrote:
>
> From: Vishal Annapurve 
>
> Add helpers to convert memory between private and shared via KVM's
> memory attributes, as well as helpers to free/allocate guest_memfd memory
> via fallocate().  Userspace, i.e. tests, is NOT required to do fallocate()
> when converting memory, as the attributes are the single source of true.

true->truth

> Provide allocate() helpers so that tests can mimic a userspace that frees
> private memory on conversion, e.g. to prioritize memory usage over
> performance.
>
> Signed-off-by: Vishal Annapurve 
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> ---
>  .../selftests/kvm/include/kvm_util_base.h | 48 +++
>  tools/testing/selftests/kvm/lib/kvm_util.c| 28 +++
>  2 files changed, 76 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h 
> b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index 9f861182c02a..1441fca6c273 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -333,6 +333,54 @@ static inline void vm_enable_cap(struct kvm_vm *vm, 
> uint32_t cap, uint64_t arg0)
> vm_ioctl(vm, KVM_ENABLE_CAP, _cap);
>  }
>
> +static inline void vm_set_memory_attributes(struct kvm_vm *vm, uint64_t gpa,
> +   uint64_t size, uint64_t 
> attributes)
> +{
> +   struct kvm_memory_attributes attr = {
> +   .attributes = attributes,
> +   .address = gpa,
> +   .size = size,
> +   .flags = 0,
> +   };
> +
> +   /*
> +* KVM_SET_MEMORY_ATTRIBUTES overwrites _all_ attributes.  These flows
> +* need significant enhancements to support multiple attributes.
> +*/
> +   TEST_ASSERT(!attributes || attributes == KVM_MEMORY_ATTRIBUTE_PRIVATE,
> +   "Update me to support multiple attributes!");
> +
> +   vm_ioctl(vm, KVM_SET_MEMORY_ATTRIBUTES, );
> +}
> +
> +
> +static inline void vm_mem_set_private(struct kvm_vm *vm, uint64_t gpa,
> + uint64_t size)
> +{
> +   vm_set_memory_attributes(vm, gpa, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +}
> +
> +static inline void vm_mem_set_shared(struct kvm_vm *vm, uint64_t gpa,
> +uint64_t size)
> +{
> +   vm_set_memory_attributes(vm, gpa, size, 0);
> +}
> +
> +void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t gpa, uint64_t size,
> +   bool punch_hole);
> +
> +static inline void vm_guest_mem_punch_hole(struct kvm_vm *vm, uint64_t gpa,
> +  uint64_t size)
> +{
> +   vm_guest_mem_fallocate(vm, gpa, size, true);
> +}
> +
> +static inline void vm_guest_mem_allocate(struct kvm_vm *vm, uint64_t gpa,
> +uint64_t size)
> +{
> +   vm_guest_mem_fallocate(vm, gpa, size, false);
> +}
> +
>  void vm_enable_dirty_ring(struct kvm_vm *vm, uint32_t ring_size);
>  const char *vm_guest_mode_string(uint32_t i);
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 45050f54701a..a140aee8d0f5 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1176,6 +1176,34 @@ void vm_mem_region_delete(struct kvm_vm *vm, uint32_t 
> slot)
> __vm_mem_region_delete(vm, memslot2region(vm, slot), true);
>  }
>
> +void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t base, uint64_t size,
> +   bool punch_hole)
> +{
> +   const int mode = FALLOC_FL_KEEP_SIZE | (punch_hole ? 
> FALLOC_FL_PUNCH_HOLE : 0);
> +   struct userspace_mem_region *region;
> +   uint64_t end = base + size;
> +   uint64_t gpa, len;
> +   off_t fd_offset;
> +   int ret;
> +
> +   for (gpa = base; gpa < end; gpa += len) {
> +   uint64_t offset;
> +
> +   region = userspace_mem_region_find(vm, gpa, gpa);
> +   TEST_ASSERT(region && region->region.flags & KVM_MEM_PRIVATE,
> +   "Private memory region not found for GPA 0x%lx", 
> gpa);
> +
> +   offset = (gpa - region->region.guest_phys_addr);

nit: why the parentheses?

> +   fd_offset = region->region.guest_memfd_offset + offset;
> +   len = min_t(uint64_t, end - gpa, region->region.memory_size - 
> offset);
> +
> +   ret = fallocate(region->region.guest_memfd, mode, fd_offset, 
> len);
> +   TEST_ASSERT(!ret, "fallocate() failed to %s at %lx (len = 
> %lu), fd = %d, mode = %x, offset = %lx\n",
> +   punch_hole ? "punch hole" : "allocate", gpa, len,
> +   region->region.guest_memfd, mode, fd_offset);
> +   }
> +}
> +

Nits aside:

Reviewed-by: Fuad Tabba 

Re: [PATCH 25/34] KVM: selftests: Add helpers to convert guest memory b/w private and shared

2023-11-06 Thread Fuad Tabba
On Sun, Nov 5, 2023 at 4:34 PM Paolo Bonzini  wrote:
>
> From: Vishal Annapurve 
>
> Add helpers to convert memory between private and shared via KVM's
> memory attributes, as well as helpers to free/allocate guest_memfd memory
> via fallocate().  Userspace, i.e. tests, is NOT required to do fallocate()
> when converting memory, as the attributes are the single source of true.

nit: true->truth

> Provide allocate() helpers so that tests can mimic a userspace that frees
> private memory on conversion, e.g. to prioritize memory usage over
> performance.
>
> Signed-off-by: Vishal Annapurve 
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> Message-Id: <20231027182217.3615211-28-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---
>  .../selftests/kvm/include/kvm_util_base.h | 48 +++
>  tools/testing/selftests/kvm/lib/kvm_util.c| 28 +++
>  2 files changed, 76 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h 
> b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index 9f861182c02a..1441fca6c273 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -333,6 +333,54 @@ static inline void vm_enable_cap(struct kvm_vm *vm, 
> uint32_t cap, uint64_t arg0)
> vm_ioctl(vm, KVM_ENABLE_CAP, _cap);
>  }
>
> +static inline void vm_set_memory_attributes(struct kvm_vm *vm, uint64_t gpa,
> +   uint64_t size, uint64_t 
> attributes)
> +{
> +   struct kvm_memory_attributes attr = {
> +   .attributes = attributes,
> +   .address = gpa,
> +   .size = size,
> +   .flags = 0,
> +   };
> +
> +   /*
> +* KVM_SET_MEMORY_ATTRIBUTES overwrites _all_ attributes.  These flows
> +* need significant enhancements to support multiple attributes.
> +*/
> +   TEST_ASSERT(!attributes || attributes == KVM_MEMORY_ATTRIBUTE_PRIVATE,
> +   "Update me to support multiple attributes!");
> +
> +   vm_ioctl(vm, KVM_SET_MEMORY_ATTRIBUTES, );
> +}
> +
> +
> +static inline void vm_mem_set_private(struct kvm_vm *vm, uint64_t gpa,
> + uint64_t size)
> +{
> +   vm_set_memory_attributes(vm, gpa, size, KVM_MEMORY_ATTRIBUTE_PRIVATE);
> +}
> +
> +static inline void vm_mem_set_shared(struct kvm_vm *vm, uint64_t gpa,
> +uint64_t size)
> +{
> +   vm_set_memory_attributes(vm, gpa, size, 0);
> +}
> +
> +void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t gpa, uint64_t size,
> +   bool punch_hole);
> +
> +static inline void vm_guest_mem_punch_hole(struct kvm_vm *vm, uint64_t gpa,
> +  uint64_t size)
> +{
> +   vm_guest_mem_fallocate(vm, gpa, size, true);
> +}
> +
> +static inline void vm_guest_mem_allocate(struct kvm_vm *vm, uint64_t gpa,
> +uint64_t size)
> +{
> +   vm_guest_mem_fallocate(vm, gpa, size, false);
> +}
> +
>  void vm_enable_dirty_ring(struct kvm_vm *vm, uint32_t ring_size);
>  const char *vm_guest_mode_string(uint32_t i);
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index b63500fca627..95a553400ea9 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1167,6 +1167,34 @@ void vm_mem_region_delete(struct kvm_vm *vm, uint32_t 
> slot)
> __vm_mem_region_delete(vm, memslot2region(vm, slot), true);
>  }
>
> +void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t base, uint64_t size,
> +   bool punch_hole)
> +{
> +   const int mode = FALLOC_FL_KEEP_SIZE | (punch_hole ? 
> FALLOC_FL_PUNCH_HOLE : 0);
> +   struct userspace_mem_region *region;
> +   uint64_t end = base + size;
> +   uint64_t gpa, len;
> +   off_t fd_offset;
> +   int ret;
> +
> +   for (gpa = base; gpa < end; gpa += len) {
> +   uint64_t offset;
> +
> +   region = userspace_mem_region_find(vm, gpa, gpa);
> +   TEST_ASSERT(region && region->region.flags & 
> KVM_MEM_GUEST_MEMFD,
> +   "Private memory region not found for GPA 0x%lx", 
> gpa);
> +
> +   offset = (gpa - region->region.guest_phys_addr);

nit: why the parentheses?

> +   fd_offset = region->region.guest_memfd_offset + offset;
> +   len = min_t(uint64_t, end - gpa, region->region.memory_size - 
> offset);
> +
> +   ret = fallocate(region->region.guest_memfd, mode, fd_offset, 
> len);
> +   TEST_ASSERT(!ret, "fallocate() failed to %s at %lx (len = 
> %lu), fd = %d, mode = %x, offset = %lx\n",
> +   punch_hole ? "punch hole" : "allocate", gpa, len,
> +   

Re: [PATCH 24/34] KVM: selftests: Add support for creating private memslots

2023-11-06 Thread Fuad Tabba
Hi,

Regarding the subject (and the commit message), should we still be
calling them "private" slots, or guestmem_slots?

On Sun, Nov 5, 2023 at 4:34 PM Paolo Bonzini  wrote:
>
> From: Sean Christopherson 
>
> Add support for creating "private" memslots via KVM_CREATE_GUEST_MEMFD and
> KVM_SET_USER_MEMORY_REGION2.  Make vm_userspace_mem_region_add() a wrapper
> to its effective replacement, vm_mem_add(), so that private memslots are
> fully opt-in, i.e. don't require update all tests that add memory regions.

nit: update->updating

>
> Pivot on the KVM_MEM_PRIVATE flag instead of the validity of the "gmem"

KVM_MEM_PRIVATE  -> KVM_MEM_GUEST_MEMFD

> file descriptor so that simple tests can let vm_mem_add() do the heavy
> lifting of creating the guest memfd, but also allow the caller to pass in
> an explicit fd+offset so that fancier tests can do things like back
> multiple memslots with a single file.  If the caller passes in a fd, dup()
> the fd so that (a) __vm_mem_region_delete() can close the fd associated
> with the memory region without needing yet another flag, and (b) so that
> the caller can safely close its copy of the fd without having to first
> destroy memslots.
>
> Co-developed-by: Ackerley Tng 
> Signed-off-by: Ackerley Tng 
> Signed-off-by: Sean Christopherson 
> Message-Id: <20231027182217.3615211-27-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---
>  .../selftests/kvm/include/kvm_util_base.h | 23 ++
>  .../testing/selftests/kvm/include/test_util.h |  5 ++
>  tools/testing/selftests/kvm/lib/kvm_util.c| 76 +++
>  3 files changed, 73 insertions(+), 31 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h 
> b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index 9f144841c2ee..9f861182c02a 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -431,6 +431,26 @@ static inline uint64_t vm_get_stat(struct kvm_vm *vm, 
> const char *stat_name)
>
>  void vm_create_irqchip(struct kvm_vm *vm);
>
> +static inline int __vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
> +   uint64_t flags)
> +{
> +   struct kvm_create_guest_memfd guest_memfd = {
> +   .size = size,
> +   .flags = flags,
> +   };
> +
> +   return __vm_ioctl(vm, KVM_CREATE_GUEST_MEMFD, _memfd);
> +}
> +
> +static inline int vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
> +   uint64_t flags)
> +{
> +   int fd = __vm_create_guest_memfd(vm, size, flags);
> +
> +   TEST_ASSERT(fd >= 0, KVM_IOCTL_ERROR(KVM_CREATE_GUEST_MEMFD, fd));
> +   return fd;
> +}
> +
>  void vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t 
> flags,
>uint64_t gpa, uint64_t size, void *hva);
>  int __vm_set_user_memory_region(struct kvm_vm *vm, uint32_t slot, uint32_t 
> flags,
> @@ -439,6 +459,9 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
> enum vm_mem_backing_src_type src_type,
> uint64_t guest_paddr, uint32_t slot, uint64_t npages,
> uint32_t flags);
> +void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
> +   uint64_t guest_paddr, uint32_t slot, uint64_t npages,
> +   uint32_t flags, int guest_memfd_fd, uint64_t 
> guest_memfd_offset);
>
>  void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t 
> flags);
>  void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa);
> diff --git a/tools/testing/selftests/kvm/include/test_util.h 
> b/tools/testing/selftests/kvm/include/test_util.h
> index 7e614adc6cf4..7257f2243ab9 100644
> --- a/tools/testing/selftests/kvm/include/test_util.h
> +++ b/tools/testing/selftests/kvm/include/test_util.h
> @@ -142,6 +142,11 @@ static inline bool backing_src_is_shared(enum 
> vm_mem_backing_src_type t)
> return vm_mem_backing_src_alias(t)->flag & MAP_SHARED;
>  }
>
> +static inline bool backing_src_can_be_huge(enum vm_mem_backing_src_type t)
> +{
> +   return t != VM_MEM_SRC_ANONYMOUS && t != VM_MEM_SRC_SHMEM;
> +}
> +
>  /* Aligns x up to the next multiple of size. Size must be a power of 2. */
>  static inline uint64_t align_up(uint64_t x, uint64_t size)
>  {
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 3676b37bea38..b63500fca627 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -669,6 +669,8 @@ static void __vm_mem_region_delete(struct kvm_vm *vm,
> TEST_ASSERT(!ret, __KVM_SYSCALL_ERROR("munmap()", ret));
> close(region->fd);
> }
> +   if (region->region.guest_memfd >= 0)
> +   close(region->region.guest_memfd);
>
> free(region);
>  }
> @@ -870,36 +872,15 @@ void 

Re: [PATCH 23/34] KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2

2023-11-06 Thread Fuad Tabba
On Sun, Nov 5, 2023 at 4:33 PM Paolo Bonzini  wrote:
>
> From: Sean Christopherson 
>
> Use KVM_SET_USER_MEMORY_REGION2 throughout KVM's selftests library so that
> support for guest private memory can be added without needing an entirely
> separate set of helpers.
>
> Note, this obviously makes selftests backwards-incompatible with older KVM
> versions from this point forward.
>
> Signed-off-by: Sean Christopherson 
> Message-Id: <20231027182217.3615211-26-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad

>  .../selftests/kvm/include/kvm_util_base.h |  2 +-
>  tools/testing/selftests/kvm/lib/kvm_util.c| 19 ++-
>  2 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h 
> b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index 967eaaeacd75..9f144841c2ee 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -44,7 +44,7 @@ typedef uint64_t vm_paddr_t; /* Virtual Machine (Guest) 
> physical address */
>  typedef uint64_t vm_vaddr_t; /* Virtual Machine (Guest) virtual address */
>
>  struct userspace_mem_region {
> -   struct kvm_userspace_memory_region region;
> +   struct kvm_userspace_memory_region2 region;
> struct sparsebit *unused_phy_pages;
> int fd;
> off_t offset;
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index f09295d56c23..3676b37bea38 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -453,8 +453,9 @@ void kvm_vm_restart(struct kvm_vm *vmp)
> vm_create_irqchip(vmp);
>
> hash_for_each(vmp->regions.slot_hash, ctr, region, slot_node) {
> -   int ret = ioctl(vmp->fd, KVM_SET_USER_MEMORY_REGION, 
> >region);
> -   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION IOCTL 
> failed,\n"
> +   int ret = ioctl(vmp->fd, KVM_SET_USER_MEMORY_REGION2, 
> >region);
> +
> +   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL 
> failed,\n"
> "  rc: %i errno: %i\n"
> "  slot: %u flags: 0x%x\n"
> "  guest_phys_addr: 0x%llx size: 0x%llx",
> @@ -657,7 +658,7 @@ static void __vm_mem_region_delete(struct kvm_vm *vm,
> }
>
> region->region.memory_size = 0;
> -   vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, >region);
> +   vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, >region);
>
> sparsebit_free(>unused_phy_pages);
> ret = munmap(region->mmap_start, region->mmap_size);
> @@ -1014,8 +1015,8 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
> region->region.guest_phys_addr = guest_paddr;
> region->region.memory_size = npages * vm->page_size;
> region->region.userspace_addr = (uintptr_t) region->host_mem;
> -   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, >region);
> -   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION IOCTL failed,\n"
> +   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, >region);
> +   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
> "  rc: %i errno: %i\n"
> "  slot: %u flags: 0x%x\n"
> "  guest_phys_addr: 0x%lx size: 0x%lx",
> @@ -1097,9 +1098,9 @@ void vm_mem_region_set_flags(struct kvm_vm *vm, 
> uint32_t slot, uint32_t flags)
>
> region->region.flags = flags;
>
> -   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, >region);
> +   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, >region);
>
> -   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION IOCTL failed,\n"
> +   TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
> "  rc: %i errno: %i slot: %u flags: 0x%x",
> ret, errno, slot, flags);
>  }
> @@ -1127,9 +1128,9 @@ void vm_mem_region_move(struct kvm_vm *vm, uint32_t 
> slot, uint64_t new_gpa)
>
> region->region.guest_phys_addr = new_gpa;
>
> -   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, >region);
> +   ret = __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, >region);
>
> -   TEST_ASSERT(!ret, "KVM_SET_USER_MEMORY_REGION failed\n"
> +   TEST_ASSERT(!ret, "KVM_SET_USER_MEMORY_REGION2 failed\n"
> "ret: %i errno: %i slot: %u new_gpa: 0x%lx",
> ret, errno, slot, new_gpa);
>  }
> --
> 2.39.1
>
>


Re: [PATCH 22/34] KVM: selftests: Drop unused kvm_userspace_memory_region_find() helper

2023-11-06 Thread Fuad Tabba
On Sun, Nov 5, 2023 at 4:33 PM Paolo Bonzini  wrote:
>
> From: Sean Christopherson 
>
> Drop kvm_userspace_memory_region_find(), it's unused and a terrible API
> (probably why it's unused).  If anything outside of kvm_util.c needs to
> get at the memslot, userspace_mem_region_find() can be exposed to give
> others full access to all memory region/slot information.
>
> Signed-off-by: Sean Christopherson 
> Message-Id: <20231027182217.3615211-25-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad

>  .../selftests/kvm/include/kvm_util_base.h |  4 ---
>  tools/testing/selftests/kvm/lib/kvm_util.c| 29 ---
>  2 files changed, 33 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h 
> b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index a18db6a7b3cf..967eaaeacd75 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -776,10 +776,6 @@ vm_adjust_num_guest_pages(enum vm_guest_mode mode, 
> unsigned int num_guest_pages)
> return n;
>  }
>
> -struct kvm_userspace_memory_region *
> -kvm_userspace_memory_region_find(struct kvm_vm *vm, uint64_t start,
> -uint64_t end);
> -
>  #define sync_global_to_guest(vm, g) ({ \
> typeof(g) *_p = addr_gva2hva(vm, (vm_vaddr_t)&(g)); \
> memcpy(_p, &(g), sizeof(g));\
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c 
> b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 7a8af1821f5d..f09295d56c23 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -590,35 +590,6 @@ userspace_mem_region_find(struct kvm_vm *vm, uint64_t 
> start, uint64_t end)
> return NULL;
>  }
>
> -/*
> - * KVM Userspace Memory Region Find
> - *
> - * Input Args:
> - *   vm - Virtual Machine
> - *   start - Starting VM physical address
> - *   end - Ending VM physical address, inclusive.
> - *
> - * Output Args: None
> - *
> - * Return:
> - *   Pointer to overlapping region, NULL if no such region.
> - *
> - * Public interface to userspace_mem_region_find. Allows tests to look up
> - * the memslot datastructure for a given range of guest physical memory.
> - */
> -struct kvm_userspace_memory_region *
> -kvm_userspace_memory_region_find(struct kvm_vm *vm, uint64_t start,
> -uint64_t end)
> -{
> -   struct userspace_mem_region *region;
> -
> -   region = userspace_mem_region_find(vm, start, end);
> -   if (!region)
> -   return NULL;
> -
> -   return >region;
> -}
> -
>  __weak void vcpu_arch_free(struct kvm_vcpu *vcpu)
>  {
>
> --
> 2.39.1
>
>


Re: [PATCH v13 23/35] KVM: x86: Add support for "protected VMs" that can utilize private memory

2023-11-06 Thread Paolo Bonzini

On 11/6/23 12:00, Fuad Tabba wrote:

Hi,


On Fri, Oct 27, 2023 at 7:23 PM Sean Christopherson  wrote:


Add a new x86 VM type, KVM_X86_SW_PROTECTED_VM, to serve as a development
and testing vehicle for Confidential (CoCo) VMs, and potentially to even
become a "real" product in the distant future, e.g. a la pKVM.

The private memory support in KVM x86 is aimed at AMD's SEV-SNP and
Intel's TDX, but those technologies are extremely complex (understatement),
difficult to debug, don't support running as nested guests, and require
hardware that's isn't universally accessible.  I.e. relying SEV-SNP or TDX


nit: "that isn't"

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 


Hi Fuad,

thanks for your reviews and tests of the gmem patches!  Can you please 
continue replying to v14?


Thanks,

Paolo


Cheers,
/fuad


for maintaining guest private memory isn't a realistic option.

At the very least, KVM_X86_SW_PROTECTED_VM will enable a variety of
selftests for guest_memfd and private memory support without requiring
unique hardware.

Signed-off-by: Sean Christopherson 
---
  Documentation/virt/kvm/api.rst  | 32 
  arch/x86/include/asm/kvm_host.h | 15 +--
  arch/x86/include/uapi/asm/kvm.h |  3 +++
  arch/x86/kvm/Kconfig| 12 
  arch/x86/kvm/mmu/mmu_internal.h |  1 +
  arch/x86/kvm/x86.c  | 16 +++-
  include/uapi/linux/kvm.h|  1 +
  virt/kvm/Kconfig|  5 +
  8 files changed, 78 insertions(+), 7 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 38dc1fda4f45..00029436ac5b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -147,10 +147,29 @@ described as 'basic' will be available.
  The new VM has no virtual cpus and no memory.
  You probably want to use 0 as machine type.

+X86:
+
+
+Supported X86 VM types can be queried via KVM_CAP_VM_TYPES.
+
+S390:
+^
+
  In order to create user controlled virtual machines on S390, check
  KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
  privileged user (CAP_SYS_ADMIN).

+MIPS:
+^
+
+To use hardware assisted virtualization on MIPS (VZ ASE) rather than
+the default trap & emulate implementation (which changes the virtual
+memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
+flag KVM_VM_MIPS_VZ.
+
+ARM64:
+^^
+
  On arm64, the physical address size for a VM (IPA Size limit) is limited
  to 40bits by default. The limit can be configured if the host supports the
  extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
@@ -8650,6 +8669,19 @@ block sizes is exposed in 
KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES as a
  64-bit bitmap (each bit describing a block size). The default value is
  0, to disable the eager page splitting.

+8.41 KVM_CAP_VM_TYPES
+-
+
+:Capability: KVM_CAP_MEMORY_ATTRIBUTES
+:Architectures: x86
+:Type: system ioctl
+
+This capability returns a bitmap of support VM types.  The 1-setting of bit @n
+means the VM type with value @n is supported.  Possible values of @n are::
+
+  #define KVM_X86_DEFAULT_VM   0
+  #define KVM_X86_SW_PROTECTED_VM  1
+
  9. Known KVM API problems
  =

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f9e8d5642069..dff10051e9b6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1244,6 +1244,7 @@ enum kvm_apicv_inhibit {
  };

  struct kvm_arch {
+   unsigned long vm_type;
 unsigned long n_used_mmu_pages;
 unsigned long n_requested_mmu_pages;
 unsigned long n_max_mmu_pages;
@@ -2077,6 +2078,12 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t 
new_pgd);
  void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
int tdp_max_root_level, int tdp_huge_page_level);

+#ifdef CONFIG_KVM_PRIVATE_MEM
+#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.vm_type != 
KVM_X86_DEFAULT_VM)
+#else
+#define kvm_arch_has_private_mem(kvm) false
+#endif
+
  static inline u16 kvm_read_ldt(void)
  {
 u16 ldt;
@@ -2125,14 +2132,10 @@ enum {
  #define HF_SMM_INSIDE_NMI_MASK (1 << 2)

  # define KVM_MAX_NR_ADDRESS_SPACES 2
+/* SMM is currently unsupported for guests with private memory. */
+# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 : 
2)
  # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK ? 
1 : 0)
  # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, (role).smm)
-
-static inline int kvm_arch_nr_memslot_as_ids(struct kvm *kvm)
-{
-   return KVM_MAX_NR_ADDRESS_SPACES;
-}
-
  #else
  # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
  #endif
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 1a6a1f987949..a448d0964fc0 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -562,4 

Re: [PATCH 21/34] KVM: x86: Add support for "protected VMs" that can utilize private memory

2023-11-06 Thread Fuad Tabba
Hi,

On Sun, Nov 5, 2023 at 4:33 PM Paolo Bonzini  wrote:
>
> From: Sean Christopherson 
>
> Add a new x86 VM type, KVM_X86_SW_PROTECTED_VM, to serve as a development
> and testing vehicle for Confidential (CoCo) VMs, and potentially to even
> become a "real" product in the distant future, e.g. a la pKVM.
>
> The private memory support in KVM x86 is aimed at AMD's SEV-SNP and
> Intel's TDX, but those technologies are extremely complex (understatement),
> difficult to debug, don't support running as nested guests, and require
> hardware that's isn't universally accessible.  I.e. relying SEV-SNP or TDX

(replied to v13 earlier, sorry)

nit: "that isn't"

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad

> for maintaining guest private memory isn't a realistic option.
>
> At the very least, KVM_X86_SW_PROTECTED_VM will enable a variety of
> selftests for guest_memfd and private memory support without requiring
> unique hardware.
>
> Signed-off-by: Sean Christopherson 
> Reviewed-by: Paolo Bonzini 
> Message-Id: <20231027182217.3615211-24-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---
>  Documentation/virt/kvm/api.rst  | 32 
>  arch/x86/include/asm/kvm_host.h | 15 +--
>  arch/x86/include/uapi/asm/kvm.h |  3 +++
>  arch/x86/kvm/Kconfig| 12 
>  arch/x86/kvm/mmu/mmu_internal.h |  1 +
>  arch/x86/kvm/x86.c  | 16 +++-
>  include/uapi/linux/kvm.h|  1 +
>  virt/kvm/Kconfig|  5 +
>  8 files changed, 78 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 4a9a291380ad..38882263278d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -147,10 +147,29 @@ described as 'basic' will be available.
>  The new VM has no virtual cpus and no memory.
>  You probably want to use 0 as machine type.
>
> +X86:
> +
> +
> +Supported X86 VM types can be queried via KVM_CAP_VM_TYPES.
> +
> +S390:
> +^
> +
>  In order to create user controlled virtual machines on S390, check
>  KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
>  privileged user (CAP_SYS_ADMIN).
>
> +MIPS:
> +^
> +
> +To use hardware assisted virtualization on MIPS (VZ ASE) rather than
> +the default trap & emulate implementation (which changes the virtual
> +memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
> +flag KVM_VM_MIPS_VZ.
> +
> +ARM64:
> +^^
> +
>  On arm64, the physical address size for a VM (IPA Size limit) is limited
>  to 40bits by default. The limit can be configured if the host supports the
>  extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
> @@ -8766,6 +8785,19 @@ block sizes is exposed in 
> KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES as a
>  64-bit bitmap (each bit describing a block size). The default value is
>  0, to disable the eager page splitting.
>
> +8.41 KVM_CAP_VM_TYPES
> +-
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> +:Architectures: x86
> +:Type: system ioctl
> +
> +This capability returns a bitmap of support VM types.  The 1-setting of bit 
> @n
> +means the VM type with value @n is supported.  Possible values of @n are::
> +
> +  #define KVM_X86_DEFAULT_VM   0
> +  #define KVM_X86_SW_PROTECTED_VM  1
> +
>  9. Known KVM API problems
>  =
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 75ab0da06e64..a565a2e70f30 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1255,6 +1255,7 @@ enum kvm_apicv_inhibit {
>  };
>
>  struct kvm_arch {
> +   unsigned long vm_type;
> unsigned long n_used_mmu_pages;
> unsigned long n_requested_mmu_pages;
> unsigned long n_max_mmu_pages;
> @@ -2089,6 +2090,12 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t 
> new_pgd);
>  void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>int tdp_max_root_level, int tdp_huge_page_level);
>
> +#ifdef CONFIG_KVM_PRIVATE_MEM
> +#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.vm_type != 
> KVM_X86_DEFAULT_VM)
> +#else
> +#define kvm_arch_has_private_mem(kvm) false
> +#endif
> +
>  static inline u16 kvm_read_ldt(void)
>  {
> u16 ldt;
> @@ -2137,14 +2144,10 @@ enum {
>  #define HF_SMM_INSIDE_NMI_MASK (1 << 2)
>
>  # define KVM_MAX_NR_ADDRESS_SPACES 2
> +/* SMM is currently unsupported for guests with private memory. */
> +# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 
> : 2)
>  # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK 
> ? 1 : 0)
>  # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 
> (role).smm)
> -
> -static inline int kvm_arch_nr_memslot_as_ids(struct kvm *kvm)
> -{
> -   return KVM_MAX_NR_ADDRESS_SPACES;
> -}
> -
>  #else
>  # define kvm_memslots_for_spte_role(kvm, role) 

Re: [PATCH v13 23/35] KVM: x86: Add support for "protected VMs" that can utilize private memory

2023-11-06 Thread Fuad Tabba
Hi,


On Fri, Oct 27, 2023 at 7:23 PM Sean Christopherson  wrote:
>
> Add a new x86 VM type, KVM_X86_SW_PROTECTED_VM, to serve as a development
> and testing vehicle for Confidential (CoCo) VMs, and potentially to even
> become a "real" product in the distant future, e.g. a la pKVM.
>
> The private memory support in KVM x86 is aimed at AMD's SEV-SNP and
> Intel's TDX, but those technologies are extremely complex (understatement),
> difficult to debug, don't support running as nested guests, and require
> hardware that's isn't universally accessible.  I.e. relying SEV-SNP or TDX

nit: "that isn't"

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad

> for maintaining guest private memory isn't a realistic option.
>
> At the very least, KVM_X86_SW_PROTECTED_VM will enable a variety of
> selftests for guest_memfd and private memory support without requiring
> unique hardware.
>
> Signed-off-by: Sean Christopherson 
> ---
>  Documentation/virt/kvm/api.rst  | 32 
>  arch/x86/include/asm/kvm_host.h | 15 +--
>  arch/x86/include/uapi/asm/kvm.h |  3 +++
>  arch/x86/kvm/Kconfig| 12 
>  arch/x86/kvm/mmu/mmu_internal.h |  1 +
>  arch/x86/kvm/x86.c  | 16 +++-
>  include/uapi/linux/kvm.h|  1 +
>  virt/kvm/Kconfig|  5 +
>  8 files changed, 78 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 38dc1fda4f45..00029436ac5b 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -147,10 +147,29 @@ described as 'basic' will be available.
>  The new VM has no virtual cpus and no memory.
>  You probably want to use 0 as machine type.
>
> +X86:
> +
> +
> +Supported X86 VM types can be queried via KVM_CAP_VM_TYPES.
> +
> +S390:
> +^
> +
>  In order to create user controlled virtual machines on S390, check
>  KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
>  privileged user (CAP_SYS_ADMIN).
>
> +MIPS:
> +^
> +
> +To use hardware assisted virtualization on MIPS (VZ ASE) rather than
> +the default trap & emulate implementation (which changes the virtual
> +memory layout to fit in user mode), check KVM_CAP_MIPS_VZ and use the
> +flag KVM_VM_MIPS_VZ.
> +
> +ARM64:
> +^^
> +
>  On arm64, the physical address size for a VM (IPA Size limit) is limited
>  to 40bits by default. The limit can be configured if the host supports the
>  extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
> @@ -8650,6 +8669,19 @@ block sizes is exposed in 
> KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES as a
>  64-bit bitmap (each bit describing a block size). The default value is
>  0, to disable the eager page splitting.
>
> +8.41 KVM_CAP_VM_TYPES
> +-
> +
> +:Capability: KVM_CAP_MEMORY_ATTRIBUTES
> +:Architectures: x86
> +:Type: system ioctl
> +
> +This capability returns a bitmap of support VM types.  The 1-setting of bit 
> @n
> +means the VM type with value @n is supported.  Possible values of @n are::
> +
> +  #define KVM_X86_DEFAULT_VM   0
> +  #define KVM_X86_SW_PROTECTED_VM  1
> +
>  9. Known KVM API problems
>  =
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index f9e8d5642069..dff10051e9b6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1244,6 +1244,7 @@ enum kvm_apicv_inhibit {
>  };
>
>  struct kvm_arch {
> +   unsigned long vm_type;
> unsigned long n_used_mmu_pages;
> unsigned long n_requested_mmu_pages;
> unsigned long n_max_mmu_pages;
> @@ -2077,6 +2078,12 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t 
> new_pgd);
>  void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
>int tdp_max_root_level, int tdp_huge_page_level);
>
> +#ifdef CONFIG_KVM_PRIVATE_MEM
> +#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.vm_type != 
> KVM_X86_DEFAULT_VM)
> +#else
> +#define kvm_arch_has_private_mem(kvm) false
> +#endif
> +
>  static inline u16 kvm_read_ldt(void)
>  {
> u16 ldt;
> @@ -2125,14 +2132,10 @@ enum {
>  #define HF_SMM_INSIDE_NMI_MASK (1 << 2)
>
>  # define KVM_MAX_NR_ADDRESS_SPACES 2
> +/* SMM is currently unsupported for guests with private memory. */
> +# define kvm_arch_nr_memslot_as_ids(kvm) (kvm_arch_has_private_mem(kvm) ? 1 
> : 2)
>  # define kvm_arch_vcpu_memslots_id(vcpu) ((vcpu)->arch.hflags & HF_SMM_MASK 
> ? 1 : 0)
>  # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 
> (role).smm)
> -
> -static inline int kvm_arch_nr_memslot_as_ids(struct kvm *kvm)
> -{
> -   return KVM_MAX_NR_ADDRESS_SPACES;
> -}
> -
>  #else
>  # define kvm_memslots_for_spte_role(kvm, role) __kvm_memslots(kvm, 0)
>  #endif
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 1a6a1f987949..a448d0964fc0 100644
> --- a/arch/x86/include/uapi/asm/kvm.h

Re: [PATCH 18/34] KVM: x86/mmu: Handle page fault for private memory

2023-11-06 Thread Fuad Tabba
Hi,

On Sun, Nov 5, 2023 at 4:33 PM Paolo Bonzini  wrote:
>
> From: Chao Peng 
>
> Add support for resolving page faults on guest private memory for VMs
> that differentiate between "shared" and "private" memory.  For such VMs,
> KVM_MEM_PRIVATE memslots can include both fd-based private memory and

KVM_MEM_PRIVATE  -> KVM_MEM_GUEST_MEMFD

Cheers,
/fuad

> hva-based shared memory, and KVM needs to map in the "correct" variant,
> i.e. KVM needs to map the gfn shared/private as appropriate based on the
> current state of the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE flag.
>
> For AMD's SEV-SNP and Intel's TDX, the guest effectively gets to request
> shared vs. private via a bit in the guest page tables, i.e. what the guest
> wants may conflict with the current memory attributes.  To support such
> "implicit" conversion requests, exit to user with KVM_EXIT_MEMORY_FAULT
> to forward the request to userspace.  Add a new flag for memory faults,
> KVM_MEMORY_EXIT_FLAG_PRIVATE, to communicate whether the guest wants to
> map memory as shared vs. private.
>
> Like KVM_MEMORY_ATTRIBUTE_PRIVATE, use bit 3 for flagging private memory
> so that KVM can use bits 0-2 for capturing RWX behavior if/when userspace
> needs such information, e.g. a likely user of KVM_EXIT_MEMORY_FAULT is to
> exit on missing mappings when handling guest page fault VM-Exits.  In
> that case, userspace will want to know RWX information in order to
> correctly/precisely resolve the fault.
>
> Note, private memory *must* be backed by guest_memfd, i.e. shared mappings
> always come from the host userspace page tables, and private mappings
> always come from a guest_memfd instance.
>
> Co-developed-by: Yu Zhang 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> Reviewed-by: Fuad Tabba 
> Tested-by: Fuad Tabba 
> Message-Id: <20231027182217.3615211-21-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---
>  Documentation/virt/kvm/api.rst  |   8 ++-
>  arch/x86/kvm/mmu/mmu.c  | 101 ++--
>  arch/x86/kvm/mmu/mmu_internal.h |   1 +
>  include/linux/kvm_host.h|   8 ++-
>  include/uapi/linux/kvm.h|   1 +
>  5 files changed, 110 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 6d681f45969e..4a9a291380ad 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6953,6 +6953,7 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
>
> /* KVM_EXIT_MEMORY_FAULT */
> struct {
> +  #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)
> __u64 flags;
> __u64 gpa;
> __u64 size;
> @@ -6961,8 +6962,11 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
>  KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
>  could not be resolved by KVM.  The 'gpa' and 'size' (in bytes) describe the
>  guest physical address range [gpa, gpa + size) of the fault.  The 'flags' 
> field
> -describes properties of the faulting access that are likely pertinent.
> -Currently, no flags are defined.
> +describes properties of the faulting access that are likely pertinent:
> +
> + - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault 
> occurred
> +   on a private memory access.  When clear, indicates the fault occurred on a
> +   shared access.
>
>  Note!  KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it
>  accompanies a return code of '-1', not '0'!  errno will always be set to 
> EFAULT
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index f5c6b0643645..754a5aaebee5 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3147,9 +3147,9 @@ static int host_pfn_mapping_level(struct kvm *kvm, 
> gfn_t gfn,
> return level;
>  }
>
> -int kvm_mmu_max_mapping_level(struct kvm *kvm,
> - const struct kvm_memory_slot *slot, gfn_t gfn,
> - int max_level)
> +static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
> +  const struct kvm_memory_slot *slot,
> +  gfn_t gfn, int max_level, bool 
> is_private)
>  {
> struct kvm_lpage_info *linfo;
> int host_level;
> @@ -3161,6 +3161,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
> break;
> }
>
> +   if (is_private)
> +   return max_level;
> +
> if (max_level == PG_LEVEL_4K)
> return PG_LEVEL_4K;
>
> @@ -3168,6 +3171,16 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
> return min(host_level, max_level);
>  }
>
> +int kvm_mmu_max_mapping_level(struct kvm *kvm,
> + const struct kvm_memory_slot *slot, gfn_t gfn,
> + int max_level)
> +{
> + 

Re: [PATCH 15/34] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory

2023-11-06 Thread Fuad Tabba
Hi,

>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 083ed507e200..6d681f45969e 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst

...
>
> +4.142 KVM_CREATE_GUEST_MEMFD
> +
> +
> +:Capability: KVM_CAP_GUEST_MEMFD
> +:Architectures: none
> +:Type: vm ioctl
> +:Parameters: struct kvm_create_guest_memfd(in)

nit: space before (in)

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad

> +:Returns: 0 on success, <0 on error
> +
> +KVM_CREATE_GUEST_MEMFD creates an anonymous file and returns a file 
> descriptor
> +that refers to it.  guest_memfd files are roughly analogous to files created
> +via memfd_create(), e.g. guest_memfd files live in RAM, have volatile 
> storage,
> +and are automatically released when the last reference is dropped.  Unlike
> +"regular" memfd_create() files, guest_memfd files are bound to their owning
> +virtual machine (see below), cannot be mapped, read, or written by userspace,
> +and cannot be resized  (guest_memfd files do however support PUNCH_HOLE).
> +
> +::
> +
> +  struct kvm_create_guest_memfd {
> +   __u64 size;
> +   __u64 flags;
> +   __u64 reserved[6];
> +  };
> +
> +Conceptually, the inode backing a guest_memfd file represents physical 
> memory,
> +i.e. is coupled to the virtual machine as a thing, not to a "struct kvm".  
> The
> +file itself, which is bound to a "struct kvm", is that instance's view of the
> +underlying memory, e.g. effectively provides the translation of guest 
> addresses
> +to host memory.  This allows for use cases where multiple KVM structures are
> +used to manage a single virtual machine, e.g. when performing intrahost
> +migration of a virtual machine.
> +
> +KVM currently only supports mapping guest_memfd via 
> KVM_SET_USER_MEMORY_REGION2,
> +and more specifically via the guest_memfd and guest_memfd_offset fields in
> +"struct kvm_userspace_memory_region2", where guest_memfd_offset is the offset
> +into the guest_memfd instance.  For a given guest_memfd file, there can be at
> +most one mapping per page, i.e. binding multiple memory regions to a single
> +guest_memfd range is not allowed (any number of memory regions can be bound 
> to
> +a single guest_memfd file, but the bound ranges must not overlap).
> +
> +See KVM_SET_USER_MEMORY_REGION2 for additional details.
> +
>  5. The kvm_run structure
>  
>
> diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
> index 3d4a27f8b4fe..6f3d31b4d1e3 100644
> --- a/fs/anon_inodes.c
> +++ b/fs/anon_inodes.c
> @@ -181,6 +181,7 @@ struct file *anon_inode_create_getfile(const char *name,
> return __anon_inode_getfile(name, fops, priv, flags,
> context_inode, true);
>  }
> +EXPORT_SYMBOL_GPL(anon_inode_create_getfile);
>
>  static int __anon_inode_getfd(const char *name,
>   const struct file_operations *fops,
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 68a144cb7dbc..a6de526c0426 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -589,8 +589,20 @@ struct kvm_memory_slot {
> u32 flags;
> short id;
> u16 as_id;
> +
> +#ifdef CONFIG_KVM_PRIVATE_MEM
> +   struct {
> +   struct file __rcu *file;
> +   pgoff_t pgoff;
> +   } gmem;
> +#endif
>  };
>
> +static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot 
> *slot)
> +{
> +   return slot && (slot->flags & KVM_MEM_GUEST_MEMFD);
> +}
> +
>  static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot 
> *slot)
>  {
> return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
> @@ -685,6 +697,17 @@ static inline int kvm_arch_vcpu_memslots_id(struct 
> kvm_vcpu *vcpu)
>  }
>  #endif
>
> +/*
> + * Arch code must define kvm_arch_has_private_mem if support for private 
> memory
> + * is enabled.
> + */
> +#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
> +static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
> +{
> +   return false;
> +}
> +#endif
> +
>  struct kvm_memslots {
> u64 generation;
> atomic_long_t last_used_slot;
> @@ -1400,6 +1423,7 @@ void *kvm_mmu_memory_cache_alloc(struct 
> kvm_mmu_memory_cache *mc);
>  void kvm_mmu_invalidate_begin(struct kvm *kvm);
>  void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
>  void kvm_mmu_invalidate_end(struct kvm *kvm);
> +bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
>
>  long kvm_arch_dev_ioctl(struct file *filp,
> unsigned int ioctl, unsigned long arg);
> @@ -2355,6 +2379,30 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm 
> *kvm,
> struct kvm_gfn_range *range);
>  bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
>  struct 

Re: [PATCH 12/34] KVM: Introduce per-page memory attributes

2023-11-06 Thread Fuad Tabba
Hi,

...

> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 96aa930536b1..68a144cb7dbc 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -256,6 +256,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #ifdef CONFIG_KVM_GENERIC_MMU_NOTIFIER
>  union kvm_mmu_notifier_arg {
> pte_t pte;
> +   unsigned long attributes;
>  };
>
>  struct kvm_gfn_range {
> @@ -806,6 +807,10 @@ struct kvm {
>
>  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
> struct notifier_block pm_notifier;
> +#endif
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +   /* Protected by slots_locks (for writes) and RCU (for reads) */

slots_locks -> slots_lock

Otherwise,
Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad

> +   struct xarray mem_attr_array;
>  #endif
> char stats_id[KVM_STATS_NAME_SIZE];
>  };
> @@ -2338,4 +2343,18 @@ static inline void 
> kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> vcpu->run->memory_fault.flags = 0;
>  }
>
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t 
> gfn)
> +{
> +   return xa_to_value(xa_load(>mem_attr_array, gfn));
> +}
> +
> +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> +unsigned long attrs);
> +bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
> +   struct kvm_gfn_range *range);
> +bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> +struct kvm_gfn_range *range);
> +#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
> +
>  #endif
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 59010a685007..e8d167e54980 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1220,6 +1220,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES 230
>  #define KVM_CAP_USER_MEMORY2 231
>  #define KVM_CAP_MEMORY_FAULT_INFO 232
> +#define KVM_CAP_MEMORY_ATTRIBUTES 233
>
>  #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -2288,4 +2289,16 @@ struct kvm_s390_zpci_op {
>  /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
>  #define KVM_S390_ZPCIOP_REGAEN_HOST(1 << 0)
>
> +/* Available with KVM_CAP_MEMORY_ATTRIBUTES */
> +#define KVM_SET_MEMORY_ATTRIBUTES  _IOW(KVMIO,  0xd2, struct 
> kvm_memory_attributes)
> +
> +struct kvm_memory_attributes {
> +   __u64 address;
> +   __u64 size;
> +   __u64 attributes;
> +   __u64 flags;
> +};
> +
> +#define KVM_MEMORY_ATTRIBUTE_PRIVATE   (1ULL << 3)
> +
>  #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index ecae2914c97e..5bd7fcaf9089 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -96,3 +96,7 @@ config KVM_GENERIC_HARDWARE_ENABLING
>  config KVM_GENERIC_MMU_NOTIFIER
> select MMU_NOTIFIER
> bool
> +
> +config KVM_GENERIC_MEMORY_ATTRIBUTES
> +   select KVM_GENERIC_MMU_NOTIFIER
> +   bool
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 7f3291dec7a6..f1a575d39b3b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1211,6 +1211,9 @@ static struct kvm *kvm_create_vm(unsigned long type, 
> const char *fdname)
> spin_lock_init(>mn_invalidate_lock);
> rcuwait_init(>mn_memslots_update_rcuwait);
> xa_init(>vcpu_array);
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +   xa_init(>mem_attr_array);
> +#endif
>
> INIT_LIST_HEAD(>gpc_list);
> spin_lock_init(>gpc_lock);
> @@ -1391,6 +1394,9 @@ static void kvm_destroy_vm(struct kvm *kvm)
> }
> cleanup_srcu_struct(>irq_srcu);
> cleanup_srcu_struct(>srcu);
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +   xa_destroy(>mem_attr_array);
> +#endif
> kvm_arch_free_vm(kvm);
> preempt_notifier_dec();
> hardware_disable_all();
> @@ -2397,6 +2403,200 @@ static int kvm_vm_ioctl_clear_dirty_log(struct kvm 
> *kvm,
>  }
>  #endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
>
> +#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
> +/*
> + * Returns true if _all_ gfns in the range [@start, @end) have attributes
> + * matching @attrs.
> + */
> +bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
> +unsigned long attrs)
> +{
> +   XA_STATE(xas, >mem_attr_array, start);
> +   unsigned long index;
> +   bool has_attrs;
> +   void *entry;
> +
> +   rcu_read_lock();
> +
> +   if (!attrs) {
> +   has_attrs = !xas_find(, end - 1);
> +   goto out;
> +   }
> +
> +   has_attrs = true;
> +   for (index = start; index < end; index++) {
> +   do {
> +   entry = xas_next();
> +   } while (xas_retry(, entry));
> +
> +   if (xas.xa_index != index || 

Re: [PATCH 09/34] KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace

2023-11-06 Thread Fuad Tabba
On Sun, Nov 5, 2023 at 4:32 PM Paolo Bonzini  wrote:
>
> From: Chao Peng 
>
> Add a new KVM exit type to allow userspace to handle memory faults that
> KVM cannot resolve, but that userspace *may* be able to handle (without
> terminating the guest).
>
> KVM will initially use KVM_EXIT_MEMORY_FAULT to report implicit
> conversions between private and shared memory.  With guest private memory,
> there will be two kind of memory conversions:
>
>   - explicit conversion: happens when the guest explicitly calls into KVM
> to map a range (as private or shared)
>
>   - implicit conversion: happens when the guest attempts to access a gfn
> that is configured in the "wrong" state (private vs. shared)
>
> On x86 (first architecture to support guest private memory), explicit
> conversions will be reported via KVM_EXIT_HYPERCALL+KVM_HC_MAP_GPA_RANGE,
> but reporting KVM_EXIT_HYPERCALL for implicit conversions is undesriable
> as there is (obviously) no hypercall, and there is no guarantee that the
> guest actually intends to convert between private and shared, i.e. what
> KVM thinks is an implicit conversion "request" could actually be the
> result of a guest code bug.
>
> KVM_EXIT_MEMORY_FAULT will be used to report memory faults that appear to
> be implicit conversions.
>
> Note!  To allow for future possibilities where KVM reports
> KVM_EXIT_MEMORY_FAULT and fills run->memory_fault on _any_ unresolved
> fault, KVM returns "-EFAULT" (-1 with errno == EFAULT from userspace's
> perspective), not '0'!  Due to historical baggage within KVM, exiting to
> userspace with '0' from deep callstacks, e.g. in emulation paths, is
> infeasible as doing so would require a near-complete overhaul of KVM,
> whereas KVM already propagates -errno return codes to userspace even when
> the -errno originated in a low level helper.
>
> Report the gpa+size instead of a single gfn even though the initial usage
> is expected to always report single pages.  It's entirely possible, likely
> even, that KVM will someday support sub-page granularity faults, e.g.
> Intel's sub-page protection feature allows for additional protections at
> 128-byte granularity.
>
> Link: https://lore.kernel.org/all/20230908222905.1321305-5-amoor...@google.com
> Link: https://lore.kernel.org/all/zq3amlo2syv3d...@google.com
> Cc: Anish Moorthy 
> Cc: David Matlack 
> Suggested-by: Sean Christopherson 
> Co-developed-by: Yu Zhang 
> Signed-off-by: Yu Zhang 
> Signed-off-by: Chao Peng 
> Co-developed-by: Sean Christopherson 
> Signed-off-by: Sean Christopherson 
> Reviewed-by: Paolo Bonzini 
> Message-Id: <20231027182217.3615211-10-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> ---

Reviewed-by: Fuad Tabba 
Tested-by: Fuad Tabba 

Cheers,
/fuad

>  Documentation/virt/kvm/api.rst | 41 ++
>  arch/x86/kvm/x86.c |  1 +
>  include/linux/kvm_host.h   | 11 +
>  include/uapi/linux/kvm.h   |  8 +++
>  4 files changed, 61 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index bdea1423c5f8..481fb0e2ce90 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6846,6 +6846,26 @@ array field represents return values. The userspace 
> should update the return
>  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>  spec refer, https://github.com/riscv/riscv-sbi-doc.
>
> +::
> +
> +   /* KVM_EXIT_MEMORY_FAULT */
> +   struct {
> +   __u64 flags;
> +   __u64 gpa;
> +   __u64 size;
> +   } memory_fault;
> +
> +KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
> +could not be resolved by KVM.  The 'gpa' and 'size' (in bytes) describe the
> +guest physical address range [gpa, gpa + size) of the fault.  The 'flags' 
> field
> +describes properties of the faulting access that are likely pertinent.
> +Currently, no flags are defined.
> +
> +Note!  KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it
> +accompanies a return code of '-1', not '0'!  errno will always be set to 
> EFAULT
> +or EHWPOISON when KVM exits with KVM_EXIT_MEMORY_FAULT, userspace should 
> assume
> +kvm_run.exit_reason is stale/undefined for all other error numbers.
> +
>  ::
>
>  /* KVM_EXIT_NOTIFY */
> @@ -7880,6 +7900,27 @@ This capability is aimed to mitigate the threat that 
> malicious VMs can
>  cause CPU stuck (due to event windows don't open up) and make the CPU
>  unavailable to host or other VMs.
>
> +7.34 KVM_CAP_MEMORY_FAULT_INFO
> +--
> +
> +:Architectures: x86
> +:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
> +
> +The presence of this capability indicates that KVM_RUN will fill
> +kvm_run.memory_fault if KVM cannot resolve a guest page fault VM-Exit, e.g. 
> if
> +there is a valid memslot but no backing VMA for the corresponding host 

Re: [PATCH 03/34] KVM: Use gfn instead of hva for mmu_notifier_retry

2023-11-06 Thread Huang, Kai
On Sun, 2023-11-05 at 17:30 +0100, Paolo Bonzini wrote:
> From: Chao Peng 
> 
> Currently in mmu_notifier invalidate path, hva range is recorded and then
> checked against by mmu_invalidate_retry_hva() in the page fault handling
> path. However, for the soon-to-be-introduced private memory, a page fault
> may not have a hva associated, checking gfn(gpa) makes more sense.
> 
> For existing hva based shared memory, gfn is expected to also work. The
> only downside is when aliasing multiple gfns to a single hva, the
> current algorithm of checking multiple ranges could result in a much
> larger range being rejected. Such aliasing should be uncommon, so the
> impact is expected small.
> 
> Suggested-by: Sean Christopherson 
> Cc: Xu Yilun 
> Signed-off-by: Chao Peng 
> Reviewed-by: Fuad Tabba 
> Tested-by: Fuad Tabba 
> [sean: convert vmx_set_apic_access_page_addr() to gfn-based API]
> Signed-off-by: Sean Christopherson 
> Reviewed-by: Paolo Bonzini 
> Reviewed-by: Xu Yilun 
> Message-Id: <20231027182217.3615211-4-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> 

Reviewed-by: Kai Huang 


Re: [PATCH 02/34] KVM: Assert that mmu_invalidate_in_progress *never* goes negative

2023-11-06 Thread Huang, Kai
On Sun, 2023-11-05 at 17:30 +0100, Paolo Bonzini wrote:
> From: Sean Christopherson 
> 
> Move the assertion on the in-progress invalidation count from the primary
> MMU's notifier path to KVM's common notification path, i.e. assert that
> the count doesn't go negative even when the invalidation is coming from
> KVM itself.
> 
> Opportunistically convert the assertion to a KVM_BUG_ON(), i.e. kill only
> the affected VM, not the entire kernel.  A corrupted count is fatal to the
> VM, e.g. the non-zero (negative) count will cause mmu_invalidate_retry()
> to block any and all attempts to install new mappings.  But it's far from
> guaranteed that an end() without a start() is fatal or even problematic to
> anything other than the target VM, e.g. the underlying bug could simply be
> a duplicate call to end().  And it's much more likely that a missed
> invalidation, i.e. a potential use-after-free, would manifest as no
> notification whatsoever, not an end() without a start().
> 
> Signed-off-by: Sean Christopherson 
> Reviewed-by: Paolo Bonzini 
> Reviewed-by: Fuad Tabba 
> Tested-by: Fuad Tabba 
> Message-Id: <20231027182217.3615211-3-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> 

Reviewed-by: Kai Huang 



Re: [PATCH 01/34] KVM: Tweak kvm_hva_range and hva_handler_t to allow reusing for gfn ranges

2023-11-06 Thread Huang, Kai
On Sun, 2023-11-05 at 17:30 +0100, Paolo Bonzini wrote:
> From: Sean Christopherson 
> 
> Rework and rename "struct kvm_hva_range" into "kvm_mmu_notifier_range" so
> that the structure can be used to handle notifications that operate on gfn
> context, i.e. that aren't tied to a host virtual address.  Rename the
> handler typedef too (arguably it should always have been gfn_handler_t).
> 
> Practically speaking, this is a nop for 64-bit kernels as the only
> meaningful change is to store start+end as u64s instead of unsigned longs.
> 
> Reviewed-by: Paolo Bonzini 
> Reviewed-by: Xiaoyao Li 
> Signed-off-by: Sean Christopherson 
> Reviewed-by: Fuad Tabba 
> Tested-by: Fuad Tabba 
> Message-Id: <20231027182217.3615211-2-sea...@google.com>
> Signed-off-by: Paolo Bonzini 
> 

Reviewed-by: Kai Huang 



Re: [PATCH 08/34] KVM: Introduce KVM_SET_USER_MEMORY_REGION2

2023-11-06 Thread Huang, Kai
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7025b3751027..bdea1423c5f8 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1340,6 +1340,7 @@ yet and must be cleared on entry.
>   __u64 guest_phys_addr;
>   __u64 memory_size; /* bytes */
>   __u64 userspace_addr; /* start of the userspace allocated memory */
> + __u64 pad[16];

Looks this "pad[16]" should be moved to ...

>};
>  
>/* for kvm_userspace_memory_region::flags */
> @@ -6192,6 +6193,27 @@ to know what fields can be changed for the system 
> register described by
>  ``op0, op1, crn, crm, op2``. KVM rejects ID register values that describe a
>  superset of the features supported by the system.
>  
> +4.140 KVM_SET_USER_MEMORY_REGION2
> +-
> +
> +:Capability: KVM_CAP_USER_MEMORY2
> +:Architectures: all
> +:Type: vm ioctl
> +:Parameters: struct kvm_userspace_memory_region2 (in)
> +:Returns: 0 on success, -1 on error
> +
> +::
> +
> +  struct kvm_userspace_memory_region2 {
> + __u32 slot;
> + __u32 flags;
> + __u64 guest_phys_addr;
> + __u64 memory_size; /* bytes */
> + __u64 userspace_addr; /* start of the userspace allocated memory */
> +  };
> +

... here.

Acked-by: Kai Huang 


Re: mm/debug_vm_pgtable.c:860 warning triggered

2023-11-06 Thread David Hildenbrand

On 06.11.23 07:06, Michael Ellerman wrote:

Anshuman Khandual  writes:

Hello Daniel,

This test just ensures that PFN is preserved during pte <--> swap pte 
transformations
, and the warning here seems to have been caused by powerpc platform specific 
helpers
and/or its pte_t representation. Adding powerpc folks and platform mailing list 
here.




32bit swp_entry_t with 64bit pte is supported by making sure that we 
never store a swap offset larger than what we can actually fit into the 
swp_entry_t.


There is common code in place to handle that: see 
generic_max_swapfile_size(), which does to conversion back and forth to 
see how many bits of the offset actually survive the conversion.



Doesn't the test need a similar treatment to:

   2321ba3e3733 ("mm/debug_vm_pgtable: more pte_swp_exclusive() sanity checks")

Which said:
 Especially, the pfn_pte() is dodgy when the swap PTE layout differs
 heavily from ordinary PTEs.  Let's properly construct a swap PTE from swap
 type+offset.



Sounds reasonable to me.

--
Cheers,

David / dhildenb



Re: [PATCH 2/8] leds: nic78bx: explicitly unregister LEDs at module's shutdown

2023-11-06 Thread Christophe Leroy


Le 25/10/2023 à 15:07, George Stark a écrit :
> LEDs are registered using devm_led_classdev_register() and automatically
> unregistered after module's remove(). led_classdev_unregister() calls
> led_set_brightness() to turn off the LEDs and module's appropriate callback
> uses resources those were destroyed already in module's remove().
> So explicitly unregister LEDs at module shutdown.
> 
> Signed-off-by: George Stark 
> ---
>   drivers/leds/leds-nic78bx.c | 4 
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/leds/leds-nic78bx.c b/drivers/leds/leds-nic78bx.c
> index f196f52eec1e..12b70fcad37f 100644
> --- a/drivers/leds/leds-nic78bx.c
> +++ b/drivers/leds/leds-nic78bx.c
> @@ -170,6 +170,10 @@ static int nic78bx_probe(struct platform_device *pdev)
>   static int nic78bx_remove(struct platform_device *pdev)
>   {
>   struct nic78bx_led_data *led_data = platform_get_drvdata(pdev);
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(nic78bx_leds); i++)
> + devm_led_classdev_unregister(>dev, _leds[i].cdev);

The whole purpose of devm_ functions is that you don't need to call 
unregister when removing the driver as the dev core will do it for you. 
I understand your problem but I think this is not the solution.

>   
>   /* Lock LED register */
>   outb(NIC78BX_LOCK_VALUE,


Re: [PATCH 0/8] devm_led_classdev_register() usage problem

2023-11-06 Thread Christophe Leroy


Le 25/10/2023 à 15:07, George Stark a écrit :
> Lots of drivers use devm_led_classdev_register() to register their led objects
> and let the kernel free those leds at the driver's remove stage.
> It can lead to a problem due to led_classdev_unregister()
> implementation calls led_set_brightness() to turn off the led.
> led_set_brightness() may call one of the module's brightness_set callbacks.
> If that callback uses module's resources allocated without using devm funcs()
> then those resources will be already freed at module's remove() callback and
> we may have use-after-free situation.
> 
> Here is an example:
> 
> module_probe()
> {
>  devm_led_classdev_register(module_brightness_set_cb);
>  mutex_init();
> }
> 
> module_brightness_set_cb()
> {
>  mutex_lock();
>  do_set_brightness();
>  mutex_unlock();
> }
> 
> module_remove()
> {
>  mutex_destroy();
> }
> 
> at rmmod:
> module_remove()
>  ->mutex_destroy();
> devres_release_all()
>  ->led_classdev_unregister();
>  ->led_set_brightness();
>  ->module_brightness_set_cb();
>   ->mutex_lock();  /* use-after-free */
> 
> I think it's an architectural issue and should be discussed thoroughly.
> Some thoughts about fixing it as a start:
> 1) drivers can use devm_led_classdev_unregister() to explicitly free leds 
> before
> dependend resources are freed. devm_led_classdev_register() remains being 
> useful
> to simplify probe implementation.
> As a proof of concept I examined all drivers from drivers/leds and prepared
> patches where it's needed. Sometimes it was not as clean as just calling
> devm_led_classdev_unregister() because several drivers do not track
> their leds object at all - they can call devm_led_classdev_register() and 
> drop the
> returned pointer. In that case I used devres group API.

I see no point in using a device managed function if you have to call 
the unregister function. All the purpose of device managed functions is 
to avoid having to call an unregister function at the end.

> 
> Drivers outside drivers/leds should be checked too after discussion.
> 
> 2) remove led_set_brightness from led_classdev_unregister() and force the 
> drivers
> to turn leds off at shutdown. May be add check that led's brightness is 0
> at led_classdev_unregister() and put a warning to dmesg if it's not.
> Actually in many cases it doesn't really need to turn off the leds manually 
> one-by-one
> if driver shutdowns whole led controller. For the last case to disable the 
> warning
> new flag can be brought in e.g LED_AUTO_OFF_AT_SHUTDOWN (similar to 
> LED_RETAIN_AT_SHUTDOWN).
> 
> George Stark (8):
>leds: powernv: explicitly unregister LEDs at module's shutdown
>leds: nic78bx: explicitly unregister LEDs at module's shutdown
>leds: an30259a: explicitly unregister LEDs at module's shutdown
>leds: mlxreg: explicitly unregister LEDs at module's shutdown
>leds: aw200xx: explicitly unregister LEDs at module's shutdown
>leds: aw2013: explicitly unregister LEDs at module's shutdown
>leds: lp3952: explicitly unregister LEDs at module's shutdown
>leds: lm3532: explicitly unregister LEDs at module's shutdown
> 
>   drivers/leds/leds-an30259a.c |  4 
>   drivers/leds/leds-aw200xx.c  |  4 
>   drivers/leds/leds-aw2013.c   |  4 
>   drivers/leds/leds-lm3532.c   |  6 ++
>   drivers/leds/leds-lp3952.c   |  5 +
>   drivers/leds/leds-mlxreg.c   | 12 +++-
>   drivers/leds/leds-nic78bx.c  |  4 
>   drivers/leds/leds-powernv.c  |  7 +++
>   8 files changed, 45 insertions(+), 1 deletion(-)
>