[PATCH] ASoC: imx-hdmi: Fix refcount leak in imx_hdmi_probe

2022-05-10 Thread Miaoqian Lin
of_find_device_by_node() takes reference, we should use put_device()
to release it. when devm_kzalloc() fails, it doesn't have a
put_device(), it will cause refcount leak.
Add missing put_device() to fix this.

Fixes: 6a5f850aa83a ("ASoC: fsl: Add imx-hdmi machine driver")
Fixes: f670b274f7f6 ("ASoC: imx-hdmi: add put_device() after 
of_find_device_by_node()")
Signed-off-by: Miaoqian Lin 
---
 sound/soc/fsl/imx-hdmi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/fsl/imx-hdmi.c b/sound/soc/fsl/imx-hdmi.c
index 929f69b758af..ec149dc73938 100644
--- a/sound/soc/fsl/imx-hdmi.c
+++ b/sound/soc/fsl/imx-hdmi.c
@@ -126,6 +126,7 @@ static int imx_hdmi_probe(struct platform_device *pdev)
data = devm_kzalloc(>dev, sizeof(*data), GFP_KERNEL);
if (!data) {
ret = -ENOMEM;
+   put_device(_pdev->dev);
goto fail;
}
 
-- 
2.25.1



Re: [PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-10 Thread Baolin Wang




On 5/11/2022 7:28 AM, Andrew Morton wrote:

On Tue, 10 May 2022 16:17:39 -0700 Andrew Morton  
wrote:


+
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep)
+{
+   return ptep_get(ptep);
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+  pte_t *ptep, pte_t pte)
+{
+}
  #endif/* CONFIG_HUGETLB_PAGE */
  


This blows up nommu (arm allnoconfig):

In file included from fs/io_uring.c:71:
./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 
'ptep_get' [-Werror=implicit-function-declaration]
  1100 | return ptep_get(ptep);
   |^~~~


huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply
zapped this change.



Well that wasn't a great success.  Doing this instead.  It's pretty
nasty - something nicer would be nicer please.


Thanks for fixing the building issue. I'll look at this to simplify the 
dummy function. Myabe just remove the ptep_get().


diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1097,7 +1097,7 @@ static inline void set_huge_swap_pte_at(struct 
mm_struct *mm, unsigned long addr

 static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep)
 {
-   return ptep_get(ptep);
+   return *ptep;
 }



--- 
a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix
+++ a/include/linux/hugetlb.h
@@ -1094,6 +1094,7 @@ static inline void set_huge_swap_pte_at(
  {
  }
  
+#ifdef CONFIG_MMU

  static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep)
  {
@@ -1104,6 +1105,7 @@ static inline void set_huge_pte_at(struc
   pte_t *ptep, pte_t pte)
  {
  }
+#endif
  #endif/* CONFIG_HUGETLB_PAGE */
  
  static inline spinlock_t *huge_pte_lock(struct hstate *h,

_


Re: [PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-10 Thread Andrew Morton
On Wed, 11 May 2022 09:19:17 +0800 kernel test robot  wrote:

> Hi Baolin,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on akpm-mm/mm-everything]
> [cannot apply to hnaz-mm/master arm64/for-next/core linus/master v5.18-rc6]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:
> https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220510-114753
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git 
> mm-everything
> config: h8300-buildonly-randconfig-r001-20220509 
> (https://download.01.org/0day-ci/archive/20220511/202205110919.cwiciqye-...@intel.com/config)
> compiler: h8300-linux-gcc (GCC) 11.3.0
> reproduce (this is a W=1 build):
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # 
> https://github.com/intel-lab-lkp/linux/commit/b666792b4c5f9774c350977ff88837bafc36365a
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review 
> Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220510-114753
> git checkout b666792b4c5f9774c350977ff88837bafc36365a
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 
> O=build_dir ARCH=h8300 SHELL=/bin/bash
> 
> ...

Thanks. 
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix.patch
(which is in current mm-everything) fixes this up.



[PATCH V2] powerpc/eeh: Drop redundant spinlock initialization

2022-05-10 Thread Haowen Bai
slot_errbuf_lock has declared and initialized by DEFINE_SPINLOCK,
so we don't need to spin_lock_init again, drop it.

Signed-off-by: Haowen Bai 
---
V1->V2: update comment

 arch/powerpc/platforms/pseries/eeh_pseries.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index f9af879c0222..abf0b577d055 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -847,8 +847,7 @@ static int __init eeh_pseries_init(void)
return -EINVAL;
}
 
-   /* Initialize error log lock and size */
-   spin_lock_init(_errbuf_lock);
+   /* Initialize error log size */
eeh_error_buf_size = rtas_token("rtas-error-log-max");
if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
pr_info("%s: unknown EEH error log size\n",
-- 
2.7.4



Re: [PATCH kernel] KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers

2022-05-10 Thread Alexey Kardashevskiy




On 5/11/22 03:58, Cédric Le Goater wrote:

Hello Alexey,

On 5/9/22 09:11, Alexey Kardashevskiy wrote:

Currently we have 2 sets of interrupt controller hypercalls handlers
for real and virtual modes, this is from POWER8 times when switching
MMU on was considered an expensive operation.

POWER9 however does not have dependent threads and MMU is enabled for
handling hcalls so the XIVE native 


XIVE native does not have any real-mode hcall handlers. In fact, all
are handled at the QEMU level.

or XICS-on-XIVE real mode handlers never execute on real P9 and > 
later CPUs.


They are not ? I am surprised. It must be a "recent" change. Any how,
if you can remove them safely, this is good news and you should be able
to clean up some more code in the PowerNV native interface.



Yes, this is the result of that massive work of Nick to move the KVM's 
asm to c for p9. It could have been the case even before that but harder 
to see in that asm code :)





This untemplate the handlers and only keeps the real mode handlers for
XICS native (up to POWER8) and remove the rest of dead code. Changes
in functions are mechanical except few missing empty lines to make
checkpatch.pl happy.

The default implemented hcalls list already contains XICS hcalls so
no change there.

This should not cause any behavioral change.


In the worse case, it impacts performance a bit but only on "old" distros
(kernel < 4.14), I doubt anyone will complain.


Signed-off-by: Alexey Kardashevskiy 


Acked-by: Cédric Le Goater 



Thanks!



Thanks,

C.



---
  arch/powerpc/kvm/Makefile   |   2 +-
  arch/powerpc/include/asm/kvm_ppc.h  |   7 -
  arch/powerpc/kvm/book3s_xive.h  |   7 -
  arch/powerpc/kvm/book3s_hv_builtin.c    |  64 ---
  arch/powerpc/kvm/book3s_hv_rm_xics.c    |   5 +
  arch/powerpc/kvm/book3s_hv_rm_xive.c    |  46 --
  arch/powerpc/kvm/book3s_xive.c  | 638 +++-
  arch/powerpc/kvm/book3s_xive_template.c | 636 ---
  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  12 +-
  9 files changed, 632 insertions(+), 785 deletions(-)
  delete mode 100644 arch/powerpc/kvm/book3s_hv_rm_xive.c
  delete mode 100644 arch/powerpc/kvm/book3s_xive_template.c

diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 8e3681a86074..f17379b0f161 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -73,7 +73,7 @@ kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
  book3s_hv_tm.o
  kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
-    book3s_hv_rm_xics.o book3s_hv_rm_xive.o
+    book3s_hv_rm_xics.o
  kvm-book3s_64-builtin-tm-objs-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
  book3s_hv_tm_builtin.o
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h

index 44200a27371b..a775377a570e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -787,13 +787,6 @@ long kvmppc_rm_h_page_init(struct kvm_vcpu *vcpu, 
unsigned long flags,

 unsigned long dest, unsigned long src);
  long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
    unsigned long slb_v, unsigned int status, 
bool data);

-unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu);
-unsigned long kvmppc_rm_h_xirr_x(struct kvm_vcpu *vcpu);
-unsigned long kvmppc_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long 
server);

-int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
-    unsigned long mfrr);
-int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr);
-int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
  void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu);
  /*
diff --git a/arch/powerpc/kvm/book3s_xive.h 
b/arch/powerpc/kvm/book3s_xive.h

index 09d0657596c3..1e48f72e8aa5 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -285,13 +285,6 @@ static inline u32 __xive_read_eq(__be32 *qpage, 
u32 msk, u32 *idx, u32 *toggle)

  return cur & 0x7fff;
  }
-extern unsigned long xive_rm_h_xirr(struct kvm_vcpu *vcpu);
-extern unsigned long xive_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned 
long server);

-extern int xive_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
- unsigned long mfrr);
-extern int xive_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr);
-extern int xive_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
-
  /*
   * Common Xive routines for XICS-over-XIVE and XIVE native
   */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c

index 7e52d0beee77..88a8f6473c4e 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -489,70 +489,6 @@ static long kvmppc_read_one_intr(bool *again)
  return kvmppc_check_passthru(xisr, xirr, again);
  }
-#ifdef CONFIG_KVM_XICS
-unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu)
-{
-    if (!kvmppc_xics_enabled(vcpu))
-    return 

Re: [PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-10 Thread kernel test robot
Hi Baolin,

I love your patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[cannot apply to hnaz-mm/master arm64/for-next/core linus/master v5.18-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220510-114753
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git 
mm-everything
config: h8300-buildonly-randconfig-r001-20220509 
(https://download.01.org/0day-ci/archive/20220511/202205110919.cwiciqye-...@intel.com/config)
compiler: h8300-linux-gcc (GCC) 11.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/intel-lab-lkp/linux/commit/b666792b4c5f9774c350977ff88837bafc36365a
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220510-114753
git checkout b666792b4c5f9774c350977ff88837bafc36365a
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 
O=build_dir ARCH=h8300 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from kernel/fork.c:51:
   include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
>> include/linux/hugetlb.h:1100:16: error: implicit declaration of function 
>> 'ptep_get' [-Werror=implicit-function-declaration]
1100 | return ptep_get(ptep);
 |^~~~
>> include/linux/hugetlb.h:1100:16: error: incompatible types when returning 
>> type 'int' but 'pte_t' was expected
1100 | return ptep_get(ptep);
 |^~
   kernel/fork.c: At top level:
   kernel/fork.c:162:13: warning: no previous prototype for 
'arch_release_task_struct' [-Wmissing-prototypes]
 162 | void __weak arch_release_task_struct(struct task_struct *tsk)
 | ^~~~
   kernel/fork.c:847:20: warning: no previous prototype for 
'arch_task_cache_init' [-Wmissing-prototypes]
 847 | void __init __weak arch_task_cache_init(void) { }
 |^~~~
   kernel/fork.c:942:12: warning: no previous prototype for 
'arch_dup_task_struct' [-Wmissing-prototypes]
 942 | int __weak arch_dup_task_struct(struct task_struct *dst,
 |^~~~
   cc1: some warnings being treated as errors
--
   In file included from kernel/sysctl.c:46:
   include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
>> include/linux/hugetlb.h:1100:16: error: implicit declaration of function 
>> 'ptep_get' [-Werror=implicit-function-declaration]
1100 | return ptep_get(ptep);
 |^~~~
>> include/linux/hugetlb.h:1100:16: error: incompatible types when returning 
>> type 'int' but 'pte_t' was expected
1100 | return ptep_get(ptep);
 |^~
   cc1: some warnings being treated as errors
--
   In file included from include/linux/migrate.h:8,
from mm/page_alloc.c:62:
   include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
>> include/linux/hugetlb.h:1100:16: error: implicit declaration of function 
>> 'ptep_get' [-Werror=implicit-function-declaration]
1100 | return ptep_get(ptep);
 |^~~~
>> include/linux/hugetlb.h:1100:16: error: incompatible types when returning 
>> type 'int' but 'pte_t' was expected
1100 | return ptep_get(ptep);
 |^~
   In file included from include/asm-generic/bug.h:5,
from arch/h8300/include/asm/bug.h:8,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:6,
from mm/page_alloc.c:19:
   mm/page_alloc.c: In function 'free_pages':
   include/asm-generic/page.h:89:51: warning: ordered comparison of pointer 
with null pointer [-Wextra]
  89 | #define virt_addr_valid(kaddr)  (((void *)(kaddr) >= (void 
*)PAGE_OFFSET) && \
 |   ^~
   include/linux/compiler.h:78:45: note: in definition of macro 'unlikely'
  78 | # define unlikely(x)__builtin_expect(!!(x), 0)
 | ^
   include/linux/mmd

Re: [PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-10 Thread Andrew Morton
On Tue, 10 May 2022 16:17:39 -0700 Andrew Morton  
wrote:

> > +
> > +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> > + unsigned long addr, pte_t *ptep)
> > +{
> > +   return ptep_get(ptep);
> > +}
> > +
> > +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long 
> > addr,
> > +  pte_t *ptep, pte_t pte)
> > +{
> > +}
> >  #endif /* CONFIG_HUGETLB_PAGE */
> >  
> 
> This blows up nommu (arm allnoconfig):
> 
> In file included from fs/io_uring.c:71:
> ./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
> ./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 
> 'ptep_get' [-Werror=implicit-function-declaration]
>  1100 | return ptep_get(ptep);
>   |^~~~
> 
> 
> huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply
> zapped this change.
> 

Well that wasn't a great success.  Doing this instead.  It's pretty
nasty - something nicer would be nicer please.

--- 
a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix
+++ a/include/linux/hugetlb.h
@@ -1094,6 +1094,7 @@ static inline void set_huge_swap_pte_at(
 {
 }
 
+#ifdef CONFIG_MMU
 static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep)
 {
@@ -1104,6 +1105,7 @@ static inline void set_huge_pte_at(struc
   pte_t *ptep, pte_t pte)
 {
 }
+#endif
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
_



Re: [PATCH v3 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration

2022-05-10 Thread Andrew Morton
On Tue, 10 May 2022 11:45:59 +0800 Baolin Wang  
wrote:

> On some architectures (like ARM64), it can support CONT-PTE/PMD size
> hugetlb, which means it can support not only PMD/PUD size hugetlb:
> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
> size specified.
> 
> When migrating a hugetlb page, we will get the relevant page table
> entry by huge_pte_offset() only once to nuke it and remap it with
> a migration pte entry. This is correct for PMD or PUD size hugetlb,
> since they always contain only one pmd entry or pud entry in the
> page table.
> 
> However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
> since they can contain several continuous pte or pmd entry with
> same page table attributes. So we will nuke or remap only one pte
> or pmd entry for this CONT-PTE/PMD size hugetlb page, which is
> not expected for hugetlb migration. The problem is we can still
> continue to modify the subpages' data of a hugetlb page during
> migrating a hugetlb page, which can cause a serious data consistent
> issue, since we did not nuke the page table entry and set a
> migration pte for the subpages of a hugetlb page.
> 
> To fix this issue, we should change to use huge_ptep_clear_flush()
> to nuke a hugetlb page table, and remap it with set_huge_pte_at()
> and set_huge_swap_pte_at() when migrating a hugetlb page, which
> already considered the CONT-PTE or CONT-PMD size hugetlb.
> 
> ...
>
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct 
> mm_struct *mm, unsigned long addr
>   pte_t *ptep, pte_t pte, unsigned long 
> sz)
>  {
>  }
> +
> +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> +   unsigned long addr, pte_t *ptep)
> +{
> + return ptep_get(ptep);
> +}
> +
> +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> +pte_t *ptep, pte_t pte)
> +{
> +}
>  #endif   /* CONFIG_HUGETLB_PAGE */
>  

This blows up nommu (arm allnoconfig):

In file included from fs/io_uring.c:71:
./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 
'ptep_get' [-Werror=implicit-function-declaration]
 1100 | return ptep_get(ptep);
  |^~~~


huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply
zapped this change.

--- 
a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix
+++ a/include/linux/hugetlb.h
@@ -1093,17 +1093,6 @@ static inline void set_huge_swap_pte_at(
pte_t *ptep, pte_t pte, unsigned long 
sz)
 {
 }
-
-static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
-{
-   return ptep_get(ptep);
-}
-
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-  pte_t *ptep, pte_t pte)
-{
-}
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
_



Re: [PATCH kernel] KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers

2022-05-10 Thread Cédric Le Goater

Hello Alexey,

On 5/9/22 09:11, Alexey Kardashevskiy wrote:

Currently we have 2 sets of interrupt controller hypercalls handlers
for real and virtual modes, this is from POWER8 times when switching
MMU on was considered an expensive operation.

POWER9 however does not have dependent threads and MMU is enabled for
handling hcalls so the XIVE native 


XIVE native does not have any real-mode hcall handlers. In fact, all
are handled at the QEMU level.


or XICS-on-XIVE real mode handlers never execute on real P9 and > later CPUs.


They are not ? I am surprised. It must be a "recent" change. Any how,
if you can remove them safely, this is good news and you should be able
to clean up some more code in the PowerNV native interface.



This untemplate the handlers and only keeps the real mode handlers for
XICS native (up to POWER8) and remove the rest of dead code. Changes
in functions are mechanical except few missing empty lines to make
checkpatch.pl happy.

The default implemented hcalls list already contains XICS hcalls so
no change there.

This should not cause any behavioral change.


In the worse case, it impacts performance a bit but only on "old" distros
(kernel < 4.14), I doubt anyone will complain.


Signed-off-by: Alexey Kardashevskiy 


Acked-by: Cédric Le Goater 

Thanks,

C.

 

---
  arch/powerpc/kvm/Makefile   |   2 +-
  arch/powerpc/include/asm/kvm_ppc.h  |   7 -
  arch/powerpc/kvm/book3s_xive.h  |   7 -
  arch/powerpc/kvm/book3s_hv_builtin.c|  64 ---
  arch/powerpc/kvm/book3s_hv_rm_xics.c|   5 +
  arch/powerpc/kvm/book3s_hv_rm_xive.c|  46 --
  arch/powerpc/kvm/book3s_xive.c  | 638 +++-
  arch/powerpc/kvm/book3s_xive_template.c | 636 ---
  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  12 +-
  9 files changed, 632 insertions(+), 785 deletions(-)
  delete mode 100644 arch/powerpc/kvm/book3s_hv_rm_xive.c
  delete mode 100644 arch/powerpc/kvm/book3s_xive_template.c

diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 8e3681a86074..f17379b0f161 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -73,7 +73,7 @@ kvm-hv-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \
book3s_hv_tm.o
  
  kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \

-   book3s_hv_rm_xics.o book3s_hv_rm_xive.o
+   book3s_hv_rm_xics.o
  
  kvm-book3s_64-builtin-tm-objs-$(CONFIG_PPC_TRANSACTIONAL_MEM) += \

book3s_hv_tm_builtin.o
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 44200a27371b..a775377a570e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -787,13 +787,6 @@ long kvmppc_rm_h_page_init(struct kvm_vcpu *vcpu, unsigned 
long flags,
   unsigned long dest, unsigned long src);
  long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
unsigned long slb_v, unsigned int status, bool 
data);
-unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu);
-unsigned long kvmppc_rm_h_xirr_x(struct kvm_vcpu *vcpu);
-unsigned long kvmppc_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long server);
-int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
-unsigned long mfrr);
-int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr);
-int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
  void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu);
  
  /*

diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
index 09d0657596c3..1e48f72e8aa5 100644
--- a/arch/powerpc/kvm/book3s_xive.h
+++ b/arch/powerpc/kvm/book3s_xive.h
@@ -285,13 +285,6 @@ static inline u32 __xive_read_eq(__be32 *qpage, u32 msk, 
u32 *idx, u32 *toggle)
return cur & 0x7fff;
  }
  
-extern unsigned long xive_rm_h_xirr(struct kvm_vcpu *vcpu);

-extern unsigned long xive_rm_h_ipoll(struct kvm_vcpu *vcpu, unsigned long 
server);
-extern int xive_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
-unsigned long mfrr);
-extern int xive_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr);
-extern int xive_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
-
  /*
   * Common Xive routines for XICS-over-XIVE and XIVE native
   */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 7e52d0beee77..88a8f6473c4e 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -489,70 +489,6 @@ static long kvmppc_read_one_intr(bool *again)
return kvmppc_check_passthru(xisr, xirr, again);
  }
  
-#ifdef CONFIG_KVM_XICS

-unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu)
-{
-   if (!kvmppc_xics_enabled(vcpu))
-   return H_TOO_HARD;
-   if (xics_on_xive())
-   return xive_rm_h_xirr(vcpu);
-   else
-   return xics_rm_h_xirr(vcpu);
-}
-
-unsigned long 

Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-10 Thread John Ogness
On 2022-05-10, Steven Rostedt  wrote:
>> As already mentioned in the other reply, panic() sometimes stops the
>> other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
>> 
>> Another situation is when the CPU using the lock ends in some
>> infinite loop because something went wrong. The system is in
>> an unpredictable state during panic().
>> 
>> I am not sure if this is possible with the code under gsmi_dev.lock
>> but such things really happen during panic() in other subsystems.
>> Using trylock in the panic() code path is a good practice.
>
> I believe that Peter Zijlstra had a special spin lock for NMIs or
> early printk, where it would not block if the lock was held on the
> same CPU. That is, if an NMI happened and paniced while this lock was
> held on the same CPU, it would not deadlock. But it would block if the
> lock was held on another CPU.

Yes. And starting with 5.19 it will be carrying the name that _you_ came
up with (cpu_sync):

printk_cpu_sync_get_irqsave()
printk_cpu_sync_put_irqrestore()

John


Re: [PATCH 18/30] notifier: Show function names on notifier routines if DEBUG_NOTIFIERS is set

2022-05-10 Thread Steven Rostedt
On Thu, 28 Apr 2022 09:01:13 +0800
Xiaoming Ni  wrote:

> > +#ifdef CONFIG_DEBUG_NOTIFIERS
> > +   {
> > +   char sym_name[KSYM_NAME_LEN];
> > +
> > +   pr_info("notifiers: registered %s()\n",
> > +   notifier_name(n, sym_name));
> > +   }  
> 
> Duplicate Code.
> 
> Is it better to use __func__ and %pS?
> 
> pr_info("%s: %pS\n", __func__, n->notifier_call);
> 
> 
> > +#endif

Also, don't sprinkle #ifdef in C code. Instead:

if (IS_ENABLED(CONFIG_DEBUG_NOTIFIERS))
pr_info("notifers: regsiter %ps()\n",
n->notifer_call);


Or define a print macro at the start of the C file that is a nop if it is
not defined, and use the macro.

-- Steve


Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-10 Thread Steven Rostedt
On Tue, 10 May 2022 13:38:39 +0200
Petr Mladek  wrote:

> As already mentioned in the other reply, panic() sometimes stops
> the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
> 
> Another situation is when the CPU using the lock ends in some
> infinite loop because something went wrong. The system is in
> an unpredictable state during panic().
> 
> I am not sure if this is possible with the code under gsmi_dev.lock
> but such things really happen during panic() in other subsystems.
> Using trylock in the panic() code path is a good practice.

I believe that Peter Zijlstra had a special spin lock for NMIs or early
printk, where it would not block if the lock was held on the same CPU. That
is, if an NMI happened and paniced while this lock was held on the same
CPU, it would not deadlock. But it would block if the lock was held on
another CPU.

-- Steve


Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers

2022-05-10 Thread Guilherme G. Piccoli
On 10/05/2022 10:53, Michael Ellerman wrote:
> "Guilherme G. Piccoli"  writes:
>> On 05/05/2022 15:55, Hari Bathini wrote:
>>> [...] 
>>> The change looks good. I have tested it on an LPAR (ppc64).
>>>
>>> Reviewed-by: Hari Bathini 
>>>
>>
>> Hi Michael. do you think it's possible to add this one to powerpc/next
>> (or something like that), or do you prefer a V2 with his tag?
> 
> Ah sorry, I assumed it was going in as part of the whole series. I guess
> I misread the cover letter.
> 
> So you want me to take this patch on its own via the powerpc tree?
> 
> cheers

Hi Michael, thanks for the prompt response!

You didn't misread, that was the plan heh
But some maintainers start to take patches and merge in their trees, and
in the end, it seems to make sense - almost half of this series are
fixes or clean-ups, that are not really necessary to get merged altogether.

So, if you can take this one, I'd appreciate - it'll make V2 a bit
smaller =)

Cheers,


Guilherme


Re: [PATCH 08/30] powerpc/setup: Refactor/untangle panic notifiers

2022-05-10 Thread Michael Ellerman
"Guilherme G. Piccoli"  writes:
> On 05/05/2022 15:55, Hari Bathini wrote:
>> [...] 
>> The change looks good. I have tested it on an LPAR (ppc64).
>> 
>> Reviewed-by: Hari Bathini 
>> 
>
> Hi Michael. do you think it's possible to add this one to powerpc/next
> (or something like that), or do you prefer a V2 with his tag?

Ah sorry, I assumed it was going in as part of the whole series. I guess
I misread the cover letter.

So you want me to take this patch on its own via the powerpc tree?

cheers


Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path

2022-05-10 Thread Guilherme G. Piccoli
On 10/05/2022 08:38, Petr Mladek wrote:
> [...]
> I see two more alternative solutions:
> 
> 1st variant is a trick already used in console write() callbacks.
> They do trylock() when oops_in_progress is set. They remember
> the result to prevent double unlock when printing Oops messages and
> the system will try to continue working. For example:
> 
> pl011_console_write(struct console *co, const char *s, unsigned int count)
> {
> [...]
>   int locked = 1;
> [...]
>   if (uap->port.sysrq)
>   locked = 0;
>   else if (oops_in_progress)
>   locked = spin_trylock(>port.lock);
>   else
>   spin_lock(>port.lock);
> 
> [...]
> 
>   if (locked)
>   spin_unlock(>port.lock);
> }
> 
> 
> 2nd variant is to check panic_cpu variable. It is used in printk.c.
> We might move the function to panic.h:
> 
> static bool panic_in_progress(void)
> {
>   return unlikely(atomic_read(_cpu) != PANIC_CPU_INVALID);
> }
> 
> and then do:
> 
>   if (panic_in_progress()) {
>   ...

Thanks for the review Petr! I feel alternative two is way better, it
checks for panic - the oops_in_progress isn't really enough, since we
can call panic() directly, not necessarily through an oops path, correct?

For me, we could stick with the lock check, but I'll defer to Evan - I
didn't work the V2 patch yet, what do you prefer Evan?


> [...]
> As already mentioned in the other reply, panic() sometimes stops
> the other CPUs using NMI, for example, see kdump_nmi_shootdown_cpus().
> 
> Another situation is when the CPU using the lock ends in some
> infinite loop because something went wrong. The system is in
> an unpredictable state during panic().
> 
> I am not sure if this is possible with the code under gsmi_dev.lock
> but such things really happen during panic() in other subsystems.
> Using trylock in the panic() code path is a good practice.
> 
> Best Regards,
> Petr

Makes total sense, thanks for confirming!
Cheers,


Guilherme


Re: [PATCH 05/30] misc/pvpanic: Convert regular spinlock into trylock on panic path

2022-05-10 Thread Guilherme G. Piccoli
On 10/05/2022 09:14, Petr Mladek wrote:
> [...]
>> With that said, it's dangerous to use regular spinlocks in such path,
>> as introduced by commit b3c0f8774668 ("misc/pvpanic: probe multiple 
>> instances").
>> This patch fixes that by replacing regular spinlocks with the trylock
>> safer approach.
> 
> It seems that the lock is used just to manipulating a list. A super
> safe solution would be to use the rcu API: rcu_add_rcu() and
> list_del_rcu() under rcu_read_lock(). The spin lock will not be
> needed and the list will always be valid.
> 
> The advantage would be that it will always call members that
> were successfully added earlier. That said, I am not familiar
> with pvpanic and am not sure if it is worth it.
> 
>> It also fixes an old comment (about a long gone framebuffer code) and
>> the notifier priority - we should execute hypervisor notifiers early,
>> deferring this way the panic action to the hypervisor, as expected by
>> the users that are setting up pvpanic.
> 
> This should be done in a separate patch. It changes the behavior.
> Also there might be a discussion whether it really should be
> the maximal priority.
> 
> Best Regards,
> Petr

Thanks for the review Petr. Patch was already merged - my goal was to be
concise, i.e., a patch per driver / module, so the patch kinda fixes
whatever I think is wrong with the driver with regards panic handling.

Do you think it worth to remove this patch from Greg's branch just to
split it in 2? Personally I think it's not worth, but opinions are welcome.

About the RCU part, this one really could be a new patch, a good
improvement patch - it makes sense to me, we can think about that after
the fixes I guess.

Cheers,


Guilherme


Re: [PATCH 01/30] x86/crash,reboot: Avoid re-disabling VMX in all CPUs on crash/restart

2022-05-10 Thread Guilherme G. Piccoli
On 09/05/2022 12:52, Sean Christopherson wrote:
> I find the shortlog to be very confusing, the bug has nothing to do with 
> disabling
> VMX and I distinctly remember wrapping VMXOFF with exception fixup to prevent 
> doom
> if VMX is already disabled :-).  The issue is really that 
> nmi_shootdown_cpus() doesn't
> play nice with being called twice.
> 

Hey Sean, OK - I agree with you, the issue is really about the double
list addition.

> [...]
> 
> Call stacks for the two callers would be very, very helpful.
> [...]

> This feels like were just adding more duct tape to the mess.  nmi_shootdown() 
> is
> still unsafe for more than one caller, and it takes a _lot_ of staring and 
> searching
> to understand that crash_smp_send_stop() is invoked iff CONFIG_KEXEC_CORE=y, 
> i.e.
> that it will call smp_ops.crash_stop_other_cpus() and not just 
> smp_send_stop().
> 
> Rather than shared a flag between two relatively unrelated functions, what if 
> we
> instead disabling virtualization in crash_nmi_callback() and then turn the 
> reboot
> call into a nop if an NMI shootdown has already occurred?  That will also add 
> a
> bit of documentation about multiple shootdowns not working.
> 
> And I believe there's also a lurking bug in 
> native_machine_emergency_restart() that
> can be fixed with cleanup.  SVM can also block INIT and so should be disabled 
> during
> an emergency reboot.
> 
> The attached patches are compile tested only.  If they seem sane, I'll post an
> official mini series.

Thanks Sean, it makes sense - my patch is more a "band-aid" whereas
yours fixes it in a more generic way. Confess I found the logic of your
patch complex, but as you said, it requires a *lot* of code analysis to
understand these multiple shutdown patches, the problem is complicated
by nature heh

I've tested your patch 0001 and it works well for all cases [0], so go
ahead and submit the miniseries, feel free to add:

Reported-and-tested-by: Guilherme G. Piccoli 


I've read patch 0002 and it makes sense to me as well, a good proactive
bug fix =)

With that said, I'll of course drop this one from V2 of this series.
Cheers,


Guilherme




[0]
A summary of my tests and the code paths that the panic shutdown take
depending on some conditions:

New function that disables VMX/SVM: cpu_crash_disable_virtualization()
[should be executed in every online CPU on shutdown)

The panic path triggers the following call stacks depending on kdump and
post_notifiers:


(1) kexec/kdump + !crash_kexec_post_notifiers
->machine_crash_shutdown()
.crash_shutdown() 
--native_machine_crash_shutdown() [all custom handlers except Xen PV
call the native generic function]
crash_smp_send_stop()
--kdump_nmi_shootdown_cpus()
nmi_shootdown_cpus(kdump_nmi_callback)
--crash_nmi_callback()
kdump_nmi_callback()
--cpu_crash_disable_virtualization()


(2) kexec/kdump + crash_kexec_post_notifiers
->crash_smp_send_stop()
kdump_nmi_shootdown_cpus()
--nmi_shootdown_cpus(kdump_nmi_callback)
crash_nmi_callback()
--kdump_nmi_callback()
cpu_crash_disable_virtualization()

After this path, will execute machine_crash_shutdown() but
crash_smp_send_stop()
is guarded against double execution. Also, emergency restart calls
emergency_vmx_disable_all() .


(3) !kexec/kdump + crash_kexec_post_notifiers

Same as (2)


(4) !kexec/kdump + !crash_kexec_post_notifiers
-> smp_send_stop()
native_stop_other_cpus()
--apic_send_IPI_allbutself(REBOOT_VECTOR)
sysvec_reboot
--cpu_emergency_vmxoff() 

If not:
--register_stop_handler()
apic_send_IPI_allbutself(NMI_VECTOR)
--smp_stop_nmi_callback()
cpu_emergency_vmxoff()

After that, emergency_vmx_disable_all() gets called in the emergency
restart path as well.


Re: [PATCH] powerpc/eeh: Drop redundant spinlock initialization

2022-05-10 Thread Tyrel Datwyler
On 5/10/22 02:53, Haowen Bai wrote:
> slot_errbuf_lock has declared and initialized by DEFINE_SPINLOCK,
> so we don't need to spin_lock_init again, drop it.
> 
> Signed-off-by: Haowen Bai 
> ---
>  arch/powerpc/platforms/pseries/eeh_pseries.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
> b/arch/powerpc/platforms/pseries/eeh_pseries.c
> index f9af879c0222..77b476093e06 100644
> --- a/arch/powerpc/platforms/pseries/eeh_pseries.c
> +++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
> @@ -848,7 +848,6 @@ static int __init eeh_pseries_init(void)
>   }
> 
>   /* Initialize error log lock and size */

Update the comment, or just drop it entirely?

-Tyrel

> - spin_lock_init(_errbuf_lock);
>   eeh_error_buf_size = rtas_token("rtas-error-log-max");
>   if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
>   pr_info("%s: unknown EEH error log size\n",



Re: [PATCH 23/30] printk: kmsg_dump: Introduce helper to inform number of dumpers

2022-05-10 Thread Steven Rostedt
On Wed, 27 Apr 2022 19:49:17 -0300
"Guilherme G. Piccoli"  wrote:

> Currently we don't have a way to check if there are dumpers set,
> except counting the list members maybe. This patch introduces a very
> simple helper to provide this information, by just keeping track of
> registered/unregistered kmsg dumpers. It's going to be used on the
> panic path in the subsequent patch.

FYI, it is considered "bad form" to reference in the change log "this
patch". We know this is a patch. The change log should just talk about what
is being done. So can you reword your change logs (you do this is almost
every patch). Here's what I would reword the above to be:

 Currently we don't have a way to check if there are dumpers set, except
 perhaps by counting the list members. Introduce a very simple helper to
 provide this information, by just keeping track of registered/unregistered
 kmsg dumpers. This will simplify the refactoring of the panic path.


-- Steve


Re: [PATCH V2 1/2] tools/perf: Add utility function to read /proc/cpuinfo for any field

2022-05-10 Thread Arnaldo Carvalho de Melo
Em Tue, May 10, 2022 at 07:08:47PM +0530, Athira Rajeev escreveu:
> 
> 
> > On 06-May-2022, at 3:03 PM, Athira Rajeev  
> > wrote:
> > 
> > 
> > 
> >> On 05-May-2022, at 10:54 PM, Arnaldo Carvalho de Melo  
> >> wrote:
> >> 
> >> Em Thu, May 05, 2022 at 03:09:59PM +0530, Athira Rajeev escreveu:
> >>> /proc/cpuinfo provides information about type of processor, number
> >>> of CPU's etc. Reading /proc/cpuinfo file outputs useful information
> >>> by field name like cpu, platform, model (depending on architecture)
> >>> and its value separated by colon.
> >>> 
> >>> Add new utility function "cpuinfo_field" in "util/header.c" which
> >>> accepts field name as input string to search in /proc/cpuinfo content.
> >>> This returns the first matching value as resulting string. Example,
> >>> calling the function "cpuinfo_field(platform)" in powerpc returns
> >>> the platform value. This can be used to fetch processor information
> >>> from "cpuinfo" by other utilities/testcases.
> >>> 
> >>> Signed-off-by: Athira Rajeev 
> >>> ---
> >>> tools/perf/util/header.c | 53 
> >>> tools/perf/util/header.h |  1 +
> >>> 2 files changed, 54 insertions(+)
> >>> 
> >>> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> >>> index a27132e5a5ef..f08857f96606 100644
> >>> --- a/tools/perf/util/header.c
> >>> +++ b/tools/perf/util/header.c
> >>> @@ -983,6 +983,59 @@ static int write_dir_format(struct feat_fd *ff,
> >>>   return do_write(ff, >dir.version, sizeof(data->dir.version));
> >>> }
> >>> 
> >>> +/*
> >>> + * Return entry from /proc/cpuinfo
> >>> + * indicated by "search" parameter.
> >>> + */
> >>> +char *cpuinfo_field(const char *search)
> >>> +{
> >>> + FILE *file;
> >>> + char *buf = NULL;
> >>> + char *copy_buf = NULL, *p;
> >>> + size_t len = 0;
> >>> +
> >>> + if (!search)
> >>> + return NULL;
> >>> +
> >>> + file = fopen("/proc/cpuinfo", "r");
> >>> + if (!file)
> >>> + return NULL;
> >>> +
> >>> + while (getline(, , file) > 0) {
> >>> + if (!strncmp(buf, search, strlen(search)))
> >> 
> >> Can you save the search string lenght in a variable and use it instead
> >> of calling strlen() for the same buffer for each line in /proc/cpuinfo?
> > 
> > 
> > Hi Arnaldo, Michael
> > 
> > Thanks for review comments. Based on suggestion from Michael, I am 
> > reworking on patch 2 to SKIP the test
> > if physical_id is set to -1 irrespective of value from cpuinfo.
> > 
> > In this patch, I had written "cpuinfo_field " function as generic function 
> > for retrieving any entry from /proc/cpuinfo.
> > But it won't be used in patch 2 now. Do you think this function is useful 
> > to keep ? Otherwise, I will drop patch 1

Lets add it when the need arises.

- Arnaldo
 
> Hi,
> 
> Requesting for suggestions on this change
> 
> Thanks
> Athira
> > 
> > Thanks
> > Athira Rajeev
> > 
> >> 
> >>> + break;
> >>> + }
> >>> +
> >>> + if (feof(file))
> >>> + goto done;
> >>> +
> >>> + /*
> >>> +  * Trim the new line and separate
> >>> +  * value for search field from ":"
> >>> +  * in cpuinfo line output.
> >>> +  * Example output line:
> >>> +  * platform : 
> >>> +  */
> >>> + copy_buf = buf;
> >>> + p = strchr(copy_buf, ':');
> >> 
> >> So you assume that this will always be there, right? Shouldn't we not
> >> assume that and check if p is NULL and bail out instead?
> >> 
> >>> +
> >>> + /* Go to string after ":" */
> >>> + copy_buf = p + 1;
> >>> + p = strchr(copy_buf, '\n');
> >> 
> >> Ditto.
> >> 
> >>> + if (p)
> >>> + *p = '\0';
> >>> +
> >>> + /* Copy the filtered string after removing space to buf */
> >>> + strcpy(buf, strim(copy_buf));
> >>> +
> >>> + fclose(file);
> >>> + return buf;
> >>> +
> >>> +done:
> >> 
> >> Please rename this goto label to "not_found", "done" isn't intention
> >> revealing.
> >> 
> >>> + free(buf);
> >>> + fclose(file);
> >>> + return NULL;
> >>> +}
> >>> /*
> >>> * Check whether a CPU is online
> >>> *
> >>> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> >>> index 0eb4bc29a5a4..b0f754364bd4 100644
> >>> --- a/tools/perf/util/header.h
> >>> +++ b/tools/perf/util/header.h
> >>> @@ -166,4 +166,5 @@ int get_cpuid(char *buffer, size_t sz);
> >>> 
> >>> char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused);
> >>> int strcmp_cpuid_str(const char *s1, const char *s2);
> >>> +char *cpuinfo_field(const char *search);
> >>> #endif /* __PERF_HEADER_H */
> >>> -- 
> >>> 2.35.1
> >> 
> >> -- 
> >> 
> >> - Arnaldo

-- 

- Arnaldo


Re: [PATCH] tools/perf/tests: Skip perf BPF test if clang is not present

2022-05-10 Thread Arnaldo Carvalho de Melo
Em Fri, May 06, 2022 at 03:07:51PM +0530, Athira Rajeev escreveu:
> 
> 
> > On 05-May-2022, at 10:51 PM, Arnaldo Carvalho de Melo  
> > wrote:
> > 
> > Em Thu, May 05, 2022 at 03:30:39PM +0530, Athira Rajeev escreveu:
> >> Perf BPF filter test fails in environment where "clang"
> >> is not installed.
> >> 
> >> Test failure logs:
> >> 
> >> <<>>
> >> 42: BPF filter:
> >> 42.1: Basic BPF filtering : Skip
> >> 42.2: BPF pinning : FAILED!
> >> 42.3: BPF prologue generation : FAILED!
> >> <<>>
> >> 
> >> Enabling verbose option provided debug logs which says
> >> clang/llvm needs to be installed. Snippet of verbose logs:
> >> 
> >> <<>>
> >> 42.2: BPF pinning  :
> >> --- start ---
> >> test child forked, pid 61423
> >> ERROR: unable to find clang.
> >> Hint:  Try to install latest clang/llvm to support BPF.
> >>Check your $PATH
> >> 
> >> <>
> >> 
> >> Failed to compile test case: 'Basic BPF llvm compile'
> >> Unable to get BPF object, fix kbuild first
> >> test child finished with -1
> >>  end 
> >> BPF filter subtest 2: FAILED!
> >> <<>>
> >> 
> >> Here subtests, "BPF pinning" and "BPF prologue generation"
> >> failed and logs shows clang/llvm is needed. After installing
> >> clang, testcase passes.
> >> 
> >> Reason on why subtest failure happens though logs has proper
> >> debug information:
> >> Main function __test__bpf calls test_llvm__fetch_bpf_obj by
> >> passing 4th argument as true ( 4th arguments maps to parameter
> >> "force" in test_llvm__fetch_bpf_obj ). But this will cause
> >> test_llvm__fetch_bpf_obj to skip the check for clang/llvm.
> >> 
> >> Snippet of code part which checks for clang based on
> >> parameter "force" in test_llvm__fetch_bpf_obj:
> >> 
> >> <<>>
> >> if (!force && (!llvm_param.user_set_param &&
> >> <<>>
> >> 
> >> Since force is set to "false", test won't get skipped and
> >> fails to compile test case. The BPF code compilation needs
> >> clang, So pass the fourth argument as "false" and also skip
> >> the test if reason for return is "TEST_SKIP"
> >> 
> >> After the patch:
> >> 
> >> <<>>
> >> 42: BPF filter:
> >> 42.1: Basic BPF filtering : Skip
> >> 42.2: BPF pinning : Skip
> >> 42.3: BPF prologue generation : Skip
> >> <<>>
> > 
> > Wouldn't it be better to add the reason for the skip, like other tests
> > do?
> > 
> > E.g.:
> > 
> > 23: Watchpoint  :
> > 23.1: Read Only Watchpoint  : Skip 
> > (missing hardware support)
> > 23.2: Write Only Watchpoint : Ok
> > 23.3: Read / Write Watchpoint   : Ok
> > 23.4: Modify Watchpoint
> > 
> > Something like:
> > 
> > After the patch:
> > 
> > <<>>
> > 42: BPF filter:
> > 42.1: Basic BPF filtering : Skip (clang not installed)
> > 42.2: BPF pinning : Skip (clang not installed)
> > 42.3: BPF prologue generation : Skip (clang not installed)
> 
> 
> Hi Arnaldo,
> 
> I tried to use TEST_CASE_REASON("BPF pinning", bpf_pinning, "clang not 
> installed")
> 
> The clang check is done in test_llvm__fetch_bpf_obj under some condition 
> checks:
> 
> <<>>
>  /*
>  * Skip this test if user's .perfconfig doesn't set [llvm] section
>  * and clang is not found in $PATH
>  */
> if (!force && (!llvm_param.user_set_param &&
>llvm__search_clang())) {
> pr_debug("No clang, skip this test\n");
> return TEST_SKIP;
> }
> <<>>
> 
> But the reason for BPF skip could happen at other places also ie non-root 
> user, bpf support checks from check_env.
> So can't exactly print the skip reason to be clang since It could get skipped 
> from other environment checks too. Any suggestions Arnaldo ?

We have cases where the framework isn't flexible enough to say exactly
what was the reason for the failure and we use language such as "maybe
clang isn't installed or some other reason?"

- Arnaldo
 
> Thanks
> Athira
> 
> > <<>>
> > 
> >> Signed-off-by: Athira Rajeev 
> >> ---
> >> tools/perf/tests/bpf.c | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
> >> index 57b9591f7cbb..ae62f01239e3 100644
> >> --- a/tools/perf/tests/bpf.c
> >> +++ b/tools/perf/tests/bpf.c
> >> @@ -222,11 +222,11 @@ static int __test__bpf(int idx)
> >> 
> >>ret = test_llvm__fetch_bpf_obj(_buf, _buf_sz,
> >>   bpf_testcase_table[idx].prog_id,
> >> - true, NULL);
> >> + false, NULL);
> >>if (ret != TEST_OK || !obj_buf || !obj_buf_sz) {
> >>pr_debug("Unable to get BPF object, %s\n",
> >> bpf_testcase_table[idx].msg_compile_fail);
> 

Re: [PATCH] PCI/AER: Iterate over error counters instead of error strings

2022-05-10 Thread Bjorn Helgaas
[+cc Rajat]

On Mon, May 09, 2022 at 06:14:41PM +, Mohamed Khalfella wrote:
> PCI AER stats counters sysfs attributes need to iterate over
> stats counters instead of stats names. 

Thanks for catching this; it definitely looks like a real issue!  I
guess you're probably seeing junk in the sysfs files?

It would be helpful to reviewers if this said *why* we need to iterate
over the counters instead of the names.  I think the problem is that
the current code reads past the end of the stats counters.

There are parallel arrays here:

  #define AER_MAX_TYPEOF_COR_ERRS 16
  #define AER_MAX_TYPEOF_UNCOR_ERRS   27

  aer_correctable_error_string[32]   # 32
  pdev->aer_stats->dev_cor_errs[AER_MAX_TYPEOF_COR_ERRS] # 16
  aer_uncorrectable_error_string[32] # 32
  pdev->aer_stats->dev_fatal_errs[AER_MAX_TYPEOF_UNCOR_ERRS] # 27
  pdev->aer_stats->dev_nonfatal_errs[AER_MAX_TYPEOF_UNCOR_ERRS]  # 27

And here's the current use of them:

  #define aer_stats_dev_attr(..., stats_array, strings_array, ...)
for (i = 0; i < ARRAY_SIZE(strings_array); i++) {
  if (strings_array[i])
sysfs_emit_at(..., strings_array[i], stats[i]);  (1)
  else if (stats[i])
sysfs_emit_at(..., stats[i]);(2)

  aer_stats_dev_attr(..., dev_cor_errs, aer_correctable_error_string,
  aer_stats_dev_attr(..., dev_fatal_errs, aer_uncorrectable_error_string,
  aer_stats_dev_attr(..., dev_nonfatal_errs, aer_uncorrectable_error_string,

The current loop iterates over 0..31, which is safe at (1) because the
non-NULL strings are at aer_correctable_error_string[0..15] and
aer_uncorrectable_error_string[0..26].

But it is unsafe at (2) because it references dev_cor_errs[16..31],
dev_fatal_errs[27..31], and dev_nonfatal_errs[27..31], which are past
the end of the arrays.

> Also, added a build time check to make sure all counters have
> entries in strings array.
>
> Fixes: 0678e3109a3c ("PCI/AER: Simplify __aer_print_error()")

Yep, I blew it there.  Rajat did it correctly when he added this with
81aa5206f9a7 ("PCI/AER: Add sysfs attributes to provide AER stats and
breakdown"), and I broke it by extending the string arrays to 32
entries.

> Cc: sta...@vger.kernel.org
> Reported-by: Meeta Saggi 
> Signed-off-by: Mohamed Khalfella 
> Reviewed-by: Meeta Saggi 
> Reviewed-by: Eric Badger 
> ---
>  drivers/pci/pcie/aer.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 9fa1f97e5b27..ce99a6d44786 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -533,7 +533,7 @@ static const char *aer_agent_string[] = {
>   u64 *stats = pdev->aer_stats->stats_array;  \
>   size_t len = 0; \
>   \
> - for (i = 0; i < ARRAY_SIZE(strings_array); i++) {   \
> + for (i = 0; i < ARRAY_SIZE(pdev->aer_stats->stats_array); i++) {\
>   if (strings_array[i])   \
>   len += sysfs_emit_at(buf, len, "%s %llu\n", \
>strings_array[i],  \

I think maybe we should populate the currently NULL entries in the
string[] arrays and simplify the code here, e.g.,

  static const char *aer_correctable_error_string[] = {
"RxErr",/* Bit Position 0   */
"dev_cor_errs_bit[1]",
...

  if (stats[i])
len += sysfs_emit_at(buf, len, "%s %llu\n", strings_array[i], stats[i]);

It's a little more data space, but easier to verify.

> @@ -1342,6 +1342,11 @@ static int aer_probe(struct pcie_device *dev)
>   struct device *device = >device;
>   struct pci_dev *port = dev->port;
>  
> + BUILD_BUG_ON(ARRAY_SIZE(aer_correctable_error_string) <
> +  AER_MAX_TYPEOF_COR_ERRS);
> + BUILD_BUG_ON(ARRAY_SIZE(aer_uncorrectable_error_string) <
> +  AER_MAX_TYPEOF_UNCOR_ERRS);

And make these check for "!=" instead of "<".


Re: [PATCH 2/2] powerpc/vdso: Link with ld.lld when requested

2022-05-10 Thread Nathan Chancellor
On Tue, May 10, 2022 at 04:22:12PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 5/10/22 06:46, Nathan Chancellor wrote:
> > The PowerPC vDSO is linked with $(CC) instead of $(LD), which means the
> > default linker of the compiler is used instead of the linker requested
> > by the builder.
> > 
> >$ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/
> >...
> > 
> >$ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg
> > 
> >File: arch/powerpc/kernel/vdso/vdso32.so.dbg
> >String dump of section '.comment':
> >[ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> > 
> >File: arch/powerpc/kernel/vdso/vdso64.so.dbg
> >String dump of section '.comment':
> >[ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> > 
> > The compiler option '-fuse-ld' tells the compiler which linker to use
> > when it is invoked as both the compiler and linker. Use '-fuse-ld=lld'
> > when LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is
> > linked with the same linker as the rest of the kernel.
> > 
> >$ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg
> > 
> >File: arch/powerpc/kernel/vdso/vdso32.so.dbg
> >String dump of section '.comment':
> >[ 0] Linker: LLD 14.0.0
> >[14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> > 
> >File: arch/powerpc/kernel/vdso/vdso64.so.dbg
> >String dump of section '.comment':
> >[ 0] Linker: LLD 14.0.0
> >[14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)
> > 
> > LD can be a full path to ld.lld, which will not be handled properly by
> > '-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's
> > search path. '-fuse-ld' can take a path to the linker but it is
> > deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario.
> > 
> > Use '--ld-path' if it is supported, as it will handle a full path or
> > just 'ld.lld' properly. See the LLVM commit below for the full details
> > of '--ld-path'.
> > 
> > Link: https://github.com/ClangBuiltLinux/linux/issues/774
> > Link: 
> > https://github.com/llvm/llvm-project/commit/1bc5c84710a8c73ef21295e63c19d10a8c71f2f5
> > Signed-off-by: Nathan Chancellor 
> > ---
> >   arch/powerpc/kernel/vdso/Makefile | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/powerpc/kernel/vdso/Makefile 
> > b/arch/powerpc/kernel/vdso/Makefile
> > index 954974287ee7..096b0bf1335f 100644
> > --- a/arch/powerpc/kernel/vdso/Makefile
> > +++ b/arch/powerpc/kernel/vdso/Makefile
> > @@ -48,6 +48,7 @@ UBSAN_SANITIZE := n
> >   KASAN_SANITIZE := n
> >   ccflags-y := -shared -fno-common -fno-builtin -nostdlib 
> > -Wl,--hash-style=both
> > +ccflags-$(CONFIG_LD_IS_LLD) += $(call 
> > cc-option,--ld-path=$(LD),-fuse-ld=lld)
> 
> 
> Out of curiosity - how does this work exactly? I can see --ld-path= in the
> output so it works but there is no -fuse-ld=lld, is the second argument of
> cc-option only picked when the first one is not supported?

Correct. See Documentation/kbuild/makefiles.rst for more information
about those macros if you are curious.

> Anyway,
> 
> Tested-by: Alexey Kardashevskiy 
> Reviewed-by: Alexey Kardashevskiy 

Thanks a lot!

Cheers,
Nathan


Re: [PATCH -next] powerpc: add support for syscall stack randomization

2022-05-10 Thread Kees Cook
On Tue, May 10, 2022 at 07:23:46PM +1000, Nicholas Piggin wrote:
> Excerpts from Xiu Jianfeng's message of May 5, 2022 9:19 pm:
> > Add support for adding a random offset to the stack while handling
> > syscalls. This patch uses mftb() instead of get_random_int() for better
> > performance.
> 
> Hey, very nice.

Agreed! :)

> > [...]
> > @@ -82,6 +83,7 @@ notrace long system_call_exception(long r3, long r4, long 
> > r5,
> >  
> > kuap_lock();
> >  
> > +   add_random_kstack_offset();
> > regs->orig_gpr3 = r3;
> >  
> > if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
> 
> This looks like the right place. I wonder why other interrupts don't
> get the same treatment. Userspace can induce the kernel to take a 
> synchronous interrupt, or wait for async ones. Smaller surface area 
> maybe but certain instruction emulation for example could result in
> significant logic that depends on user state. Anyway that's for
> hardening gurus to ponder.

I welcome it being used for any userspace controllable entry to the
kernel! :)

Also, related, have you validated the result using the LKDTM test?
See tools/testing/selftests/lkdtm/stack-entropy.sh

> 
> > @@ -405,6 +407,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, 
> > struct pt_regs *regs)
> >  
> > /* Restore user access locks last */
> > kuap_user_restore(regs);
> > +   choose_random_kstack_offset(mftb() & 0xFF);
> >  
> > return ret;
> >  }
> 
> So this seems to be what x86 and s390 do, but why are we choosing a
> new offset for every interrupt when it's only used on a syscall?
> I would rather you do what arm64 does and just choose the offset
> at the end of system_call_exception.
> 
> I wonder why the choose is separated from the add? I guess it's to
> avoid a data dependency for stack access on an expensive random
> function, so that makes sense (a comment would be nice in the
> generic code).

How does this read? I can send a "real" patch if it looks good:


diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
index 1468caf001c0..ad3e80275c74 100644
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -40,8 +40,11 @@ DECLARE_PER_CPU(u32, kstack_offset);
  */
 #define KSTACK_OFFSET_MAX(x)   ((x) & 0x3FF)
 
-/*
- * These macros must be used during syscall entry when interrupts and
+/**
+ * add_random_kstack_offset - Increase stack utilization by previously
+ *   chosen random offset
+ *
+ * This should be used in the syscall entry path when interrupts and
  * preempt are disabled, and after user registers have been stored to
  * the stack.
  */
@@ -55,6 +58,24 @@ DECLARE_PER_CPU(u32, kstack_offset);
}   \
 } while (0)
 
+/**
+ * choose_random_kstack_offset - Choose the random offsset for the next
+ *  add_random_kstack_offset()
+ *
+ * This should only be used during syscall exit when interrupts and
+ * preempt are disabled, and before user registers have been restored
+ * from the stack. This is done to frustrate attack attempts from
+ * userspace to learn the offset:
+ * - Maximize the timing uncertainty visible from userspace: if the
+ *   the offset is chosen at syscall entry, userspace has much more
+ *   control over the timing between chosen offsets. "How long will we
+ *   be in kernel mode?" tends to be more difficult to know than "how
+ *   long will be be in user mode?"
+ * - Reduce the lifetime of the new offset sitting in memory during
+ *   kernel mode execution. Exposures of "thread-local" (e.g. current,
+ *   percpu, etc) memory contents tends to be easier than arbitrary
+ *   location memory exposures.
+ */
 #define choose_random_kstack_offset(rand) do { \
if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \
_kstack_offset)) {\


-- 
Kees Cook


Re: [PATCH 14/30] panic: Properly identify the panic event to the notifiers' callbacks

2022-05-10 Thread Guilherme G. Piccoli
On 10/05/2022 12:16, Petr Mladek wrote:
> [...]
> Hmm, this looks like a hack. PANIC_UNUSED will never be used.
> All notifiers will be always called with PANIC_NOTIFIER.
> 
> The @val parameter is normally used when the same notifier_list
> is used in different situations.
> 
> But you are going to use it when the same notifier is used
> in more lists. This is normally distinguished by the @nh
> (atomic_notifier_head) parameter.
> 
> IMHO, it is a bad idea. First, it would confuse people because
> it does not follow the original design of the parameters.
> Second, the related code must be touched anyway when
> the notifier is moved into another list so it does not
> help much.
> 
> Or do I miss anything, please?
> 
> Best Regards,
> Petr

Hi Petr, thanks for the review.

I'm not strong attached to this patch, so we could drop it and refactor
the code of next patches to use the @nh as identification - but
personally, I feel this parameter could be used to identify the list
that called such function, in other words, what is the event that
triggered the callback. Some notifiers are even declared with this
parameter called "ev", like the event that triggers the notifier.


You mentioned 2 cases:

(a) Same notifier_list used in different situations;

(b) Same *notifier callback* used in different lists;

Mine is case (b), right? Can you show me an example of case (a)? You can
see in the following patches (or grep the kernel) that people are using
this identification parameter to determine which kind of OOPS trigger
the callback to condition the execution of the function to specific
cases. IIUIC, this is more or less what I'm doing, but extending the
idea for panic notifiers.

Again, as a personal preference, it makes sense to me using id's VS
comparing pointers to differentiate events/callers.

Cheers,


Guilherme


Re: [PATCH v4 10/14] kbuild: check static EXPORT_SYMBOL* by script instead of modpost

2022-05-10 Thread Masahiro Yamada
On Tue, May 10, 2022 at 3:05 AM 'Nick Desaulniers' via Clang Built
Linux  wrote:
>
> On Sun, May 8, 2022 at 12:10 PM Masahiro Yamada  wrote:
> >
> > diff --git a/scripts/check-local-export b/scripts/check-local-export
> > new file mode 100755
> > index ..d1721fa63057
> > --- /dev/null
> > +++ b/scripts/check-local-export
> > @@ -0,0 +1,48 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +#
> > +# Copyright (C) 2022 Masahiro Yamada
> > +
> > +set -e
> > +set -o pipefail
> > +
> > +declare -A symbol_types
> > +declare -a export_symbols
> > +
> > +exit_code=0
> > +
> > +while read value type name
> > +do
> > +   # to avoid error for clang LTO; $name may be empty
> > +   if [[ $value = -* && -z $name ]]; then
> > +   continue
> > +   fi
> > +
> > +   # The first field (value) may be empty. If so, fix it up.
> > +   if [[ -z $name ]]; then
> > +  name=${type}
> > +  type=${value}
> > +   fi
>
> Consider adding examples of output from NM as comments where you're
> handling special cases.
>
> Aren't BOTH from LTO?  The first case is:
>
>  T strncpy


For LTO, I see

 t

in the llvm-nm output.



>
> while the second is
>
>  U strncpy

Right, this happens for all unresolved symbols.
The address part is empty.


I will add the output example in the comment block.





> IIUC?
>
> Reviewed-by: Nick Desaulniers 
> --
> Thanks,
> ~Nick Desaulniers
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to clang-built-linux+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/clang-built-linux/CAKwvOdkhcJB8Bnrt51siRefWe%2BZSvHagCs2G011PzkkrD3cxQw%40mail.gmail.com.



-- 
Best Regards
Masahiro Yamada


Re: [PATCH] tools/perf/tests: Skip perf BPF test if clang is not present

2022-05-10 Thread Athira Rajeev



> On 06-May-2022, at 3:07 PM, Athira Rajeev  wrote:
> 
> 
> 
>> On 05-May-2022, at 10:51 PM, Arnaldo Carvalho de Melo  
>> wrote:
>> 
>> Em Thu, May 05, 2022 at 03:30:39PM +0530, Athira Rajeev escreveu:
>>> Perf BPF filter test fails in environment where "clang"
>>> is not installed.
>>> 
>>> Test failure logs:
>>> 
>>> <<>>
>>> 42: BPF filter:
>>> 42.1: Basic BPF filtering : Skip
>>> 42.2: BPF pinning : FAILED!
>>> 42.3: BPF prologue generation : FAILED!
>>> <<>>
>>> 
>>> Enabling verbose option provided debug logs which says
>>> clang/llvm needs to be installed. Snippet of verbose logs:
>>> 
>>> <<>>
>>> 42.2: BPF pinning  :
>>> --- start ---
>>> test child forked, pid 61423
>>> ERROR:  unable to find clang.
>>> Hint:   Try to install latest clang/llvm to support BPF.
>>>   Check your $PATH
>>> 
>>> <>
>>> 
>>> Failed to compile test case: 'Basic BPF llvm compile'
>>> Unable to get BPF object, fix kbuild first
>>> test child finished with -1
>>>  end 
>>> BPF filter subtest 2: FAILED!
>>> <<>>
>>> 
>>> Here subtests, "BPF pinning" and "BPF prologue generation"
>>> failed and logs shows clang/llvm is needed. After installing
>>> clang, testcase passes.
>>> 
>>> Reason on why subtest failure happens though logs has proper
>>> debug information:
>>> Main function __test__bpf calls test_llvm__fetch_bpf_obj by
>>> passing 4th argument as true ( 4th arguments maps to parameter
>>> "force" in test_llvm__fetch_bpf_obj ). But this will cause
>>> test_llvm__fetch_bpf_obj to skip the check for clang/llvm.
>>> 
>>> Snippet of code part which checks for clang based on
>>> parameter "force" in test_llvm__fetch_bpf_obj:
>>> 
>>> <<>>
>>> if (!force && (!llvm_param.user_set_param &&
>>> <<>>
>>> 
>>> Since force is set to "false", test won't get skipped and
>>> fails to compile test case. The BPF code compilation needs
>>> clang, So pass the fourth argument as "false" and also skip
>>> the test if reason for return is "TEST_SKIP"
>>> 
>>> After the patch:
>>> 
>>> <<>>
>>> 42: BPF filter:
>>> 42.1: Basic BPF filtering : Skip
>>> 42.2: BPF pinning : Skip
>>> 42.3: BPF prologue generation : Skip
>>> <<>>
>> 
>> Wouldn't it be better to add the reason for the skip, like other tests
>> do?
>> 
>> E.g.:
>> 
>> 23: Watchpoint  :
>> 23.1: Read Only Watchpoint  : Skip 
>> (missing hardware support)
>> 23.2: Write Only Watchpoint : Ok
>> 23.3: Read / Write Watchpoint   : Ok
>> 23.4: Modify Watchpoint
>> 
>> Something like:
>> 
>> After the patch:
>> 
>> <<>>
>> 42: BPF filter:
>> 42.1: Basic BPF filtering : Skip (clang not installed)
>> 42.2: BPF pinning : Skip (clang not installed)
>> 42.3: BPF prologue generation : Skip (clang not installed)
> 
> 
> Hi Arnaldo,
> 
> I tried to use TEST_CASE_REASON("BPF pinning", bpf_pinning, "clang not 
> installed")
> 
> The clang check is done in test_llvm__fetch_bpf_obj under some condition 
> checks:
> 
> <<>>
> /*
> * Skip this test if user's .perfconfig doesn't set [llvm] section
> * and clang is not found in $PATH
> */
>if (!force && (!llvm_param.user_set_param &&
>   llvm__search_clang())) {
>pr_debug("No clang, skip this test\n");
>return TEST_SKIP;
>}
> <<>>
> 
> But the reason for BPF skip could happen at other places also ie non-root 
> user, bpf support checks from check_env.
> So can't exactly print the skip reason to be clang since It could get skipped 
> from other environment checks too. Any suggestions Arnaldo ?
> 
> Thanks
> Athira

Hi Arnaldo,

Looking for suggestions on this change.

Thanks
Athira
> 
>> <<>>
>> 
>>> Signed-off-by: Athira Rajeev 
>>> ---
>>> tools/perf/tests/bpf.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
>>> index 57b9591f7cbb..ae62f01239e3 100644
>>> --- a/tools/perf/tests/bpf.c
>>> +++ b/tools/perf/tests/bpf.c
>>> @@ -222,11 +222,11 @@ static int __test__bpf(int idx)
>>> 
>>> ret = test_llvm__fetch_bpf_obj(_buf, _buf_sz,
>>>bpf_testcase_table[idx].prog_id,
>>> -  true, NULL);
>>> +  false, NULL);
>>> if (ret != TEST_OK || !obj_buf || !obj_buf_sz) {
>>> pr_debug("Unable to get BPF object, %s\n",
>>>  bpf_testcase_table[idx].msg_compile_fail);
>>> -   if (idx == 0)
>>> +   if ((idx == 0) || (ret == TEST_SKIP))
>>> return TEST_SKIP;
>>> else
>>> return TEST_FAIL;
>>> -- 
>>> 2.35.1
>> 
>> -- 
>> 
>> - Arnaldo



Re: [PATCH V2 1/2] tools/perf: Add utility function to read /proc/cpuinfo for any field

2022-05-10 Thread Athira Rajeev



> On 06-May-2022, at 3:03 PM, Athira Rajeev  wrote:
> 
> 
> 
>> On 05-May-2022, at 10:54 PM, Arnaldo Carvalho de Melo  
>> wrote:
>> 
>> Em Thu, May 05, 2022 at 03:09:59PM +0530, Athira Rajeev escreveu:
>>> /proc/cpuinfo provides information about type of processor, number
>>> of CPU's etc. Reading /proc/cpuinfo file outputs useful information
>>> by field name like cpu, platform, model (depending on architecture)
>>> and its value separated by colon.
>>> 
>>> Add new utility function "cpuinfo_field" in "util/header.c" which
>>> accepts field name as input string to search in /proc/cpuinfo content.
>>> This returns the first matching value as resulting string. Example,
>>> calling the function "cpuinfo_field(platform)" in powerpc returns
>>> the platform value. This can be used to fetch processor information
>>> from "cpuinfo" by other utilities/testcases.
>>> 
>>> Signed-off-by: Athira Rajeev 
>>> ---
>>> tools/perf/util/header.c | 53 
>>> tools/perf/util/header.h |  1 +
>>> 2 files changed, 54 insertions(+)
>>> 
>>> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
>>> index a27132e5a5ef..f08857f96606 100644
>>> --- a/tools/perf/util/header.c
>>> +++ b/tools/perf/util/header.c
>>> @@ -983,6 +983,59 @@ static int write_dir_format(struct feat_fd *ff,
>>> return do_write(ff, >dir.version, sizeof(data->dir.version));
>>> }
>>> 
>>> +/*
>>> + * Return entry from /proc/cpuinfo
>>> + * indicated by "search" parameter.
>>> + */
>>> +char *cpuinfo_field(const char *search)
>>> +{
>>> +   FILE *file;
>>> +   char *buf = NULL;
>>> +   char *copy_buf = NULL, *p;
>>> +   size_t len = 0;
>>> +
>>> +   if (!search)
>>> +   return NULL;
>>> +
>>> +   file = fopen("/proc/cpuinfo", "r");
>>> +   if (!file)
>>> +   return NULL;
>>> +
>>> +   while (getline(, , file) > 0) {
>>> +   if (!strncmp(buf, search, strlen(search)))
>> 
>> Can you save the search string lenght in a variable and use it instead
>> of calling strlen() for the same buffer for each line in /proc/cpuinfo?
> 
> 
> Hi Arnaldo, Michael
> 
> Thanks for review comments. Based on suggestion from Michael, I am reworking 
> on patch 2 to SKIP the test
> if physical_id is set to -1 irrespective of value from cpuinfo.
> 
> In this patch, I had written "cpuinfo_field " function as generic function 
> for retrieving any entry from /proc/cpuinfo.
> But it won't be used in patch 2 now. Do you think this function is useful to 
> keep ? Otherwise, I will drop patch 1

Hi,

Requesting for suggestions on this change

Thanks
Athira
> 
> Thanks
> Athira Rajeev
> 
>> 
>>> +   break;
>>> +   }
>>> +
>>> +   if (feof(file))
>>> +   goto done;
>>> +
>>> +   /*
>>> +* Trim the new line and separate
>>> +* value for search field from ":"
>>> +* in cpuinfo line output.
>>> +* Example output line:
>>> +* platform : 
>>> +*/
>>> +   copy_buf = buf;
>>> +   p = strchr(copy_buf, ':');
>> 
>> So you assume that this will always be there, right? Shouldn't we not
>> assume that and check if p is NULL and bail out instead?
>> 
>>> +
>>> +   /* Go to string after ":" */
>>> +   copy_buf = p + 1;
>>> +   p = strchr(copy_buf, '\n');
>> 
>> Ditto.
>> 
>>> +   if (p)
>>> +   *p = '\0';
>>> +
>>> +   /* Copy the filtered string after removing space to buf */
>>> +   strcpy(buf, strim(copy_buf));
>>> +
>>> +   fclose(file);
>>> +   return buf;
>>> +
>>> +done:
>> 
>> Please rename this goto label to "not_found", "done" isn't intention
>> revealing.
>> 
>>> +   free(buf);
>>> +   fclose(file);
>>> +   return NULL;
>>> +}
>>> /*
>>> * Check whether a CPU is online
>>> *
>>> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
>>> index 0eb4bc29a5a4..b0f754364bd4 100644
>>> --- a/tools/perf/util/header.h
>>> +++ b/tools/perf/util/header.h
>>> @@ -166,4 +166,5 @@ int get_cpuid(char *buffer, size_t sz);
>>> 
>>> char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused);
>>> int strcmp_cpuid_str(const char *s1, const char *s2);
>>> +char *cpuinfo_field(const char *search);
>>> #endif /* __PERF_HEADER_H */
>>> -- 
>>> 2.35.1
>> 
>> -- 
>> 
>> - Arnaldo



Re: [PATCH v4 06/14] kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS

2022-05-10 Thread Masahiro Yamada
On Tue, May 10, 2022 at 2:51 AM Nick Desaulniers
 wrote:
>
>  On Sun, May 8, 2022 at 12:10 PM Masahiro Yamada  wrote:
> >
> > diff --git a/include/asm-generic/export.h b/include/asm-generic/export.h
> > index 07a36a874dca..51ce72ce80fa 100644
> > --- a/include/asm-generic/export.h
> > +++ b/include/asm-generic/export.h
> > @@ -2,6 +2,14 @@
> >  #ifndef __ASM_GENERIC_EXPORT_H
> >  #define __ASM_GENERIC_EXPORT_H
> >
> > +/*
> > + * This comment block is used by fixdep. Please do not remove.
>
> I don't know much about fixdep. How does that work, if you could summarize?


You can find detailed explanation in scripts/basic/fixdep.c


In short, it works like this:

fixdep parses every source (including headers).
If it finds "CONFIG_MODVERSIONS", it adds a dependency
on $(wildcard include/config/MODVERSIONS)
to the .cmd files.

If CONFIG_MODVERSIONS is toggled in Kconfig,
it touches include/config/MODVERSIONS.[1]

In the next run of Make, all the sources depending on
CONFIG_MODVERSIONS will be re-compiled because the
timestamp of include/config/MODVERSIONS is up-to-date.


[1]: 
https://github.com/torvalds/linux/blob/v5.17/scripts/kconfig/confdata.c#L141





> > + *
> > + * When CONFIG_MODVERSIONS is changed from n to y, all source files having
> > + * EXPORT_SYMBOL variants must be re-compiled because genksyms is run as a
> > + * side effect of the .o build rule.
> > + */
> > +
> >  #ifndef KSYM_FUNC
> >  #define KSYM_FUNC(x) x
> >  #endif
> > @@ -12,9 +20,6 @@
> >  #else
> >  #define KSYM_ALIGN 4
> >  #endif
> > -#ifndef KCRC_ALIGN
> > -#define KCRC_ALIGN 4
> > -#endif
>
> The #ifndef is there because arch/m68k/include/asm/export.h:1 defines
> KCRC_ALIGN. You should delete that, too.

Nice catch! I will clean it up too.




>
> > diff --git a/scripts/genksyms/genksyms.c b/scripts/genksyms/genksyms.c
> > index 4827c5abe5b7..6e6933ae7911 100644
> > --- a/scripts/genksyms/genksyms.c
> > +++ b/scripts/genksyms/genksyms.c
> > @@ -33,7 +33,7 @@ char *cur_filename;
> >  int in_source_file;
> >
> >  static int flag_debug, flag_dump_defs, flag_reference, flag_dump_types,
> > -  flag_preserve, flag_warnings, flag_rel_crcs;
> > +  flag_preserve, flag_warnings;
> >
> >  static int errors;
> >  static int nsyms;
> > @@ -681,10 +681,7 @@ void export_symbol(const char *name)
> > fputs(">\n", debugfile);
> >
> > /* Used as a linker script. */
>
> ^ Does this comment still apply?

No.  From this commit going forward,
the genksyms output will not be used as a linker script.

08/14 will delete this comment anyway, but
it is possible to remove it in this commit.








>
> > -   printf(!flag_rel_crcs ? "__crc_%s = 0x%08lx;\n" :
> > -  "SECTIONS { .rodata : ALIGN(4) { "
> > -  "__crc_%s = .; LONG(0x%08lx); } }\n",
> > -  name, crc);
> > +   printf("__crc_%s = 0x%08lx;\n", name, crc);
> > }
> >  }
> >
> > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > index eceb3ee7ec06..6aee2401f3ad 100755
> > --- a/scripts/link-vmlinux.sh
> > +++ b/scripts/link-vmlinux.sh
> > @@ -88,11 +88,6 @@ modpost_link()
> > gen_initcalls
> > lds="-T .tmp_initcalls.lds"
> >
> > -   if is_enabled CONFIG_MODVERSIONS; then
> > -   gen_symversions
>
> ^ this is the only caller of gen_symversions, right? Then
> gen_symversions can be cleaned up, too?


We can keep it in this commit.

The follow-up cleaning is done in 07/14.

To avoid too big commit,
I separated the build flow change and
trivial cleanups.



>
> > -   lds="${lds} -T .tmp_symversions.lds"
> > -   fi
> > -
> > # This might take a while, so indicate that we're doing
> > # an LTO link
> > info LTO ${1}
> > @@ -183,6 +178,10 @@ vmlinux_link()
> > libs="${KBUILD_VMLINUX_LIBS}"
> > fi
> >
> > +   if is_enabled CONFIG_MODULES; then
> > +   objs="${objs} .vmlinux.export.o"
> > +   fi
> > +
> > if [ "${SRCARCH}" = "um" ]; then
> > wl=-Wl,
> > ld="${CC}"
> > @@ -312,6 +311,7 @@ cleanup()
> > rm -f vmlinux.o
> > rm -f .vmlinux.d
> > rm -f .vmlinux.objs
> > +   rm -f .vmlinux.export.c
>
> Probably can drop the `rm -f .tmp_symversions.lds` here, too?

It will be cleaned up by 07/14.







> --
> Thanks,
> ~Nick Desaulniers



--
Best Regards
Masahiro Yamada


Re: [PATCH v6 29/29] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> The HPET-based hardlockup detector relies on the TSC to determine if an
> observed NMI interrupt was originated by HPET timer. Hence, this detector
> can no longer be used with an unstable TSC.
> 
> In such case, permanently stop the HPET-based hardlockup detector and
> start the perf-based detector.
> 
> Cc: Andi Kleen 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: io...@lists.linux-foundation.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: x...@kernel.org
> Suggested-by: Thomas Gleixner 
> Reviewed-by: Tony Luck 
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Relocated the delcaration of hardlockup_detector_switch_to_perf() to
>x86/nmi.h It does not depend on HPET.
>  * Removed function stub. The shim hardlockup detector is always for x86.
> 
> Changes since v4:
>  * Added a stub version of hardlockup_detector_switch_to_perf() for
>!CONFIG_HPET_TIMER. (lkp)
>  * Reconfigure the whole lockup detector instead of unconditionally
>starting the perf-based hardlockup detector.
> 
> Changes since v3:
>  * None
> 
> Changes since v2:
>  * Introduced this patch.
> 
> Changes since v1:
>  * N/A
> ---
>  arch/x86/include/asm/nmi.h | 6 ++
>  arch/x86/kernel/tsc.c  | 2 ++
>  arch/x86/kernel/watchdog_hld.c | 6 ++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
> index 4a0d5b562c91..47752ff67d8b 100644
> --- a/arch/x86/include/asm/nmi.h
> +++ b/arch/x86/include/asm/nmi.h
> @@ -63,4 +63,10 @@ void stop_nmi(void);
>  void restart_nmi(void);
>  void local_touch_nmi(void);
>  
> +#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR
> +void hardlockup_detector_switch_to_perf(void);
> +#else
> +static inline void hardlockup_detector_switch_to_perf(void) { }
> +#endif
> +
>  #endif /* _ASM_X86_NMI_H */
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index cc1843044d88..74772ffc79d1 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1176,6 +1176,8 @@ void mark_tsc_unstable(char *reason)
>  
>   clocksource_mark_unstable(_tsc_early);
>   clocksource_mark_unstable(_tsc);
> +
> + hardlockup_detector_switch_to_perf();
>  }
>  
>  EXPORT_SYMBOL_GPL(mark_tsc_unstable);
> diff --git a/arch/x86/kernel/watchdog_hld.c b/arch/x86/kernel/watchdog_hld.c
> index ef11f0af4ef5..7940977c6312 100644
> --- a/arch/x86/kernel/watchdog_hld.c
> +++ b/arch/x86/kernel/watchdog_hld.c
> @@ -83,3 +83,9 @@ void watchdog_nmi_start(void)
>   if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET)
>   hardlockup_detector_hpet_start();
>  }
> +
> +void hardlockup_detector_switch_to_perf(void)
> +{
> + detector_type = X86_HARDLOCKUP_DETECTOR_PERF;

Another possible problem along the same lines here,
isn't your watchdog still running at this point? And
it uses detector_type in the switch.

> + lockup_detector_reconfigure();

Actually the detector_type switch is used in some
functions called by lockup_detector_reconfigure()
e.g., watchdog_nmi_stop, so this seems buggy even
without concurrent watchdog.

Is this switching a good idea in general? The admin
has asked for non-standard option because they want
more PMU counterss available and now it eats a
counter potentially causing a problem rather than
detecting one.

I would rather just disable with a warning if it were
up to me. If you *really* wanted to be fancy then
allow admin to re-enable via proc maybe.

Thanks,
Nick



[linux-next] [FC/EXT4] [PPC] WARNING: CPU: 33 PID: 47869 at block/blk-lib.c:50 __blkdev_issue_discard+0x250/0x280

2022-05-10 Thread Tasmiya Nalatwad

Greetings,

linux-next kernel 5.18.0-rc5-next-20220506 WARN_ON is triggered while 
running stress test on FC disk created with the EXT4 filesystem.


Console Logs :

 md127: detected capacity change from 0 to 62879744
 EXT4-fs (dm-11): mounted filesystem with ordered data mode. Quota 
mode: none.

 md127: detected capacity change from 62879744 to 0
 md: md127 stopped.
 EXT4-fs (dm-11): mounted filesystem with ordered data mode. Quota 
mode: none.

 md127: detected capacity change from 0 to 62879744
 WARNING: CPU: 33 PID: 47869 at block/blk-lib.c:50 
__blkdev_issue_discard+0x250/0x280
 Modules linked in: raid0 rpadlpar_io rpaphp nfnetlink tcp_diag 
udp_diag inet_diag unix_diag af_packet_diag netlink_diag bonding rfkill 
sunrpc pseries_rng xts vmx_crypto gf128mul sch_fq_codel binfmt_misc 
ip_tables ext4 mbcache jbd2 dm_round_robin sd_mod t10_pi crc64_rocksoft 
crc64 sg ibmvfc scsi_transport_fc ibmveth dm_multipath dm_mirror 
dm_region_hash dm_log dm_mod fuse
 CPU: 33 PID: 47869 Comm: mkfs.ext4 Kdump: loaded Not tainted 
5.18.0-rc5-next-20220506-autotest #1

 NIP:  c064beb0 LR: c064bf40 CTR: 
 REGS: c000a7a2f870 TRAP: 0700   Not tainted 
(5.18.0-rc5-next-20220506-autotest)

 MSR:  80029033   CR: 28002282  XER: 
 CFAR: c064bd24 IRQMASK: 0
 GPR00: c064bf40 c000a7a2fb10 c28cbd00 
c000ed96
 GPR04:  8000 0cc0 
c000a7a2fc28
 GPR08:  c000c70a  

 GPR12: 2000 c018ff95ee80  

 GPR16:   0077ef00 
00014e1e56a0
 GPR20: 8000 7fff988b0588 7213dfe8 

 GPR24: 0003  0100 
c000ed96
 GPR28: 0100  c000ed96 
c000ed96

 NIP [c064beb0] __blkdev_issue_discard+0x250/0x280
 LR [c064bf40] blkdev_issue_discard+0x60/0xe0
 Call Trace:
 [c000a7a2fb10] [c000a7a2fb60] 0xc000a7a2fb60 (unreliable)
 [c000a7a2fbe0] [c064bf40] blkdev_issue_discard+0x60/0xe0
 [c000a7a2fc70] [c065e840] blkdev_common_ioctl+0x1b0/0xbf0
 [c000a7a2fd00] [c065f6a8] blkdev_ioctl+0x428/0x6e0
 [c000a7a2fd60] [c04857c8] sys_ioctl+0xf8/0x150
 [c000a7a2fdb0] [c002f468] system_call_exception+0x178/0x380
 [c000a7a2fe10] [c000c64c] system_call_common+0xec/0x250
 --- interrupt: c00 at 0x7fff98524480
 NIP:  7fff98524480 LR: 7fff98867828 CTR: 
 REGS: c000a7a2fe80 TRAP: 0c00   Not tainted 
(5.18.0-rc5-next-20220506-autotest)
 MSR:  8280f033   CR: 
24002288  XER: 

 IRQMASK: 0
 GPR00: 0036 7213dec0 7fff98617100 
0003
 GPR04: 20001277 7213df48 0100 
1000
 GPR08: 0003   

 GPR12:  7fff9895ce40  

 GPR16:   0077ef00 
00014e1e56a0
 GPR20: 8000 7fff988b0588 7213dfe8 

 GPR24: 00011b2a0890 00011b2a08a0 00011b2a0880 
00011b2a00d8
 GPR28:  7fff988b0590 00011b2a00e0 
7213e0a0

 NIP [7fff98524480] 0x7fff98524480
 LR [7fff98867828] 0x7fff98867828
 --- interrupt: c00
 Instruction dump:
 6000 2fa3 419e0018 3c62fe6e 38810068 38630da0 4bb95b21 6000
 3b20ffa1 4b60 6000 6000 <0fe0> 7c0802a6 fb010090 fb4100a0
 ---[ end trace  ]---


--
Regards,
Tasmiya Nalatwad
IBM Linux Technology Center


Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

2022-05-10 Thread Thomas Gleixner
On Tue, May 10 2022 at 21:16, Nicholas Piggin wrote:
> Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
>> +/*
>> + * If in use, the HPET hardlockup detector relies on tsc_khz.
>> + * Reconfigure it to make use of the refined tsc_khz.
>> + */
>> +lockup_detector_reconfigure();
>
> I don't know if the API is conceptually good.
>
> You change something that the lockup detector is currently using, 
> *while* the detector is running asynchronously, and then reconfigure
> it. What happens in the window? If this code is only used for small
> adjustments maybe it does not really matter but in principle it's
> a bad API to export.
>
> lockup_detector_reconfigure as an internal API is okay because it
> reconfigures things while the watchdog is stopped [actually that
> looks untrue for soft dog which uses watchdog_thresh in
> is_softlockup(), but that should be fixed].
>
> You're the arch so you're allowed to stop the watchdog and configure
> it, e.g., hardlockup_detector_perf_stop() is called in arch/.
>
> So you want to disable HPET watchdog if it was enabled, then update
> wherever you're using tsc_khz, then re-enable.

The real question is whether making this refined tsc_khz value
immediately effective matters at all. IMO, it does not because up to
that point the watchdog was happily using the coarse calibrated value
and the whole use TSC to assess whether the HPET fired mechanism is just
a guestimate anyway. So what's the point of trying to guess 'more
correct'.

Thanks,

tglx



Re: [PATCH v6 00/23] Rust support

2022-05-10 Thread Miguel Ojeda
Hi David,

On Tue, May 10, 2022 at 6:45 AM David Gow  wrote:
>
> I've just sent out a pull request to get this working under UML as
> well, which would simplify running these further:
> https://github.com/Rust-for-Linux/linux/pull/766

Thanks a lot!

> Yeah, these are all fair points: particularly for small doctests.
>
> Maybe having an optional name, which more significant tests could use
> to override the file:line names? That could be useful for a few of the
> larger, more often referenced tests.

Sounds reasonable. I can add support for that.

> Ugh: it's a bit ugly either way. I suspect that file:line is still
> probably better, if only because we need some way of looking up the
> test in the code if it fails. I'd hate for people to be randomly
> hashing bits of just to find out what test is failing.

One redeeming quality is that the assertion prints the line/file
number in the generated file, so it would still be possible to check
where it came from:

[13:13:43] # rust_kernel_doctest_str_rs_somehash: ASSERTION FAILED
at rust/doctests_kernel_generated.rs:2209
[13:13:43] Expected 2 > 3 to be true, but is false
[13:13:43] not ok 43 - rust_kernel_doctest_str_rs_somehash
[13:13:43] [FAILED] rust_kernel_doctest_str_rs_somehash

Another alternative is to keep the file:line information around
without embedding it into the test name, e.g. in a TAP comment or a
mapping file (which `kunit.py` could read).

But, yeah, before doing hashes or things like that, I would just go
for simplicity and keep things as they are unless some use case really
needs doctests to be stable.

> Oops: I missed that (one of the issues with testing this on a
> different machine which had a rust toolchain). Looks good to me.
>
> Ah: I didn't realise the plan was always to have crate-specific
> suites, and possibly to split things up.
>
> The KTAP output specification does actually support arbitrary nesting
> (though KUnit itself doesn't at the moment), which would potentially
> be an option if (e.g.) providing the complete module nesting made
> sense. I'm not convinced that'd make things easier to read, though.

That is useful to know in case we need it, thanks!

Cheers,
Miguel


Re: [PATCH v2] KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()

2022-05-10 Thread Christophe Leroy


Le 10/05/2022 à 13:18, Alexander Graf a écrit :
> Commit 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
> moved the switch_mmu_context() to C. While in principle a good idea, it
> meant that the function now uses the stack. The stack is not accessible
> from real mode though.
> 
> So to keep calling the function, let's turn on MSR_DR while we call it.
> That way, all pointer references to the stack are handled virtually.

Is the system ready to handle a DSI in case the stack is not mapped ?

> 
> In addition, make sure to save/restore r12 in an SPRG, as it may get
> clobbered by the C function.
> 
> Reported-by: Matt Evans 
> Fixes: 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")

Oops, sorry for that. I didn't realise that there was other callers to 
switch_mmu_context() than switch_mm_irqs_off().

Christophe

> Signed-off-by: Alexander Graf 
> Cc: sta...@vger.kernel.org # v5.14+
> 
> ---
> 
> v1 -> v2:
> 
>- Save and restore R12, so that we don't touch volatile registers
>  while calling into C.
> ---
>   arch/powerpc/kvm/book3s_32_sr.S | 26 +-
>   1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_32_sr.S b/arch/powerpc/kvm/book3s_32_sr.S
> index e3ab9df6cf19..1ce13e3ab072 100644
> --- a/arch/powerpc/kvm/book3s_32_sr.S
> +++ b/arch/powerpc/kvm/book3s_32_sr.S
> @@ -122,11 +122,27 @@
>   
>   /* 0x0 - 0xb */
>   
> - /* 'current->mm' needs to be in r4 */
> - tophys(r4, r2)
> - lwz r4, MM(r4)
> - tophys(r4, r4)
> - /* This only clobbers r0, r3, r4 and r5 */
> + /* switch_mmu_context() clobbers r12, rescue it */
> + SET_SCRATCH0(r12)
> +
> + /* switch_mmu_context() needs paging, let's enable it */
> + mfmsr   r9
> + ori r11, r9, MSR_DR
> + mtmsr   r11
> + sync
> +
> + /* Calling switch_mmu_context(, current->mm, ); */
> + lwz r4, MM(r2)
>   bl  switch_mmu_context
>   
> + /* Disable paging again */
> + mfmsr   r9
> + li  r6, MSR_DR
> + andcr9, r9, r6
> + mtmsr   r9
> + sync
> +
> + /* restore r12 */
> + GET_SCRATCH0(r12)
> +
>   .endm

Re: [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> The HPET hardlockup detector relies on tsc_khz to estimate the value of
> that the TSC will have when its HPET channel fires. A refined tsc_khz
> helps to estimate better the expected TSC value.
> 
> Using the early value of tsc_khz may lead to a large error in the expected
> TSC value. Restarting the NMI watchdog detector has the effect of kicking
> its HPET channel and make use of the refined tsc_khz.
> 
> When the HPET hardlockup is not in use, restarting the NMI watchdog is
> a noop.
> 
> Cc: Andi Kleen 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: io...@lists.linux-foundation.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: x...@kernel.org
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Introduced this patch
> 
> Changes since v4
>  * N/A
> 
> Changes since v3
>  * N/A
> 
> Changes since v2:
>  * N/A
> 
> Changes since v1:
>  * N/A
> ---
>  arch/x86/kernel/tsc.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index cafacb2e58cc..cc1843044d88 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1386,6 +1386,12 @@ static void tsc_refine_calibration_work(struct 
> work_struct *work)
>   /* Inform the TSC deadline clockevent devices about the recalibration */
>   lapic_update_tsc_freq();
>  
> + /*
> +  * If in use, the HPET hardlockup detector relies on tsc_khz.
> +  * Reconfigure it to make use of the refined tsc_khz.
> +  */
> + lockup_detector_reconfigure();

I don't know if the API is conceptually good.

You change something that the lockup detector is currently using, 
*while* the detector is running asynchronously, and then reconfigure
it. What happens in the window? If this code is only used for small
adjustments maybe it does not really matter but in principle it's
a bad API to export.

lockup_detector_reconfigure as an internal API is okay because it
reconfigures things while the watchdog is stopped [actually that
looks untrue for soft dog which uses watchdog_thresh in
is_softlockup(), but that should be fixed].

You're the arch so you're allowed to stop the watchdog and configure
it, e.g., hardlockup_detector_perf_stop() is called in arch/.

So you want to disable HPET watchdog if it was enabled, then update
wherever you're using tsc_khz, then re-enable.

Thanks,
Nick


Re: [PATCH] powerpc/perf: Add support for caps under sysfs in powerpc

2022-05-10 Thread Athira Rajeev



> On 06-May-2022, at 6:55 PM, Michael Ellerman  wrote:
> 
> Hi Athira,
> 
> Some comments below :)
> 
> Athira Rajeev  writes:
>> Add caps support under "/sys/bus/event_source/devices//"
>> for powerpc. This directory can be used to expose some of the
>> specific features that powerpc PMU supports to the user.
>> Example: pmu_name. The name of PMU registered will depend on
>> platform, say power9 or power10 or it could be Generic Compat
>> PMU.
> 
> Is there precedent for adding a "caps" directory? ie. do other PMUs on
> other architectures already do that?
Hi Michael,

Thanks for the review comments.

Yes, There are other PMU’s on other architectures already having “caps” support.
I will add them in changelog for reference in V2.
> 
> Is there precedent for adding "pmu_name"?
> 
> I don't see any mention of them in Documentation/ABI anywhere.
> 
> If we're the first to do that we should add it to the documentation.
> 
> As this would set a precedent for other PMUs, please Cc the perf
> maintainers on v2.
> 
>> Currently the only way to know which is the registered
>> PMU is from the dmesg logs. But clearing the dmesg will make it
>> difficult to know exact PMU backend used. And even extracting
>> from dmesg will be complicated, as we need  to parse the dmesg
>> logs and add filters for pmu name. Whereas by exposing it via
>> caps will make it easy as we just need to directly read it from
>> the sysfs.
>> 
>> Add a caps directory to /sys/bus/event_source/devices/cpu/
>> for power8, power9, power10 and generic compat PMU.
>> 
>> The information exposed currently:
>> - pmu_name : Underlying PMU name from the driver
>> 
>> Example result with power9 pmu:
>> 
>> # ls /sys/bus/event_source/devices/cpu/caps
>> pmu_name
>> 
>> # cat /sys/bus/event_source/devices/cpu/caps/pmu_name
>> power9
>> 
>> Signed-off-by: Athira Rajeev 
>> Reviewed-by: Madhavan Srinivasan 
>> ---
>> arch/powerpc/perf/generic-compat-pmu.c | 20 
>> arch/powerpc/perf/power10-pmu.c| 20 
>> arch/powerpc/perf/power8-pmu.c | 20 
>> arch/powerpc/perf/power9-pmu.c | 20 
>> 4 files changed, 80 insertions(+)
>> 
>> diff --git a/arch/powerpc/perf/generic-compat-pmu.c 
>> b/arch/powerpc/perf/generic-compat-pmu.c
>> index f3db88aee4dd..7b5fe2d89007 100644
>> --- a/arch/powerpc/perf/generic-compat-pmu.c
>> +++ b/arch/powerpc/perf/generic-compat-pmu.c
>> @@ -151,9 +151,29 @@ static const struct attribute_group 
>> generic_compat_pmu_format_group = {
>>  .attrs = generic_compat_pmu_format_attr,
>> };
>> 
>> +static ssize_t pmu_name_show(struct device *cdev,
>> +struct device_attribute *attr,
>> +char *buf)
>> +{
>> +return snprintf(buf, PAGE_SIZE, "generic_compat_pmu");
>> +}
> 
> That's not a great name, now that it's exposed to userspace.
> 
> For starters it's only generic on Book3S, and if you look at
> init_generic_compat_pmu() it's really a "ISA >= v3.0 fallback PMU" - or
> something like that.
> 
>> +static DEVICE_ATTR_RO(pmu_name);
>> +
>> +static struct attribute *generic_compat_pmu_caps_attrs[] = {
>> +_attr_pmu_name.attr,
>> +NULL
>> +};
>> +
>> +static struct attribute_group generic_compat_pmu_caps_group = {
>> +.name  = "caps",
>> +.attrs = generic_compat_pmu_caps_attrs,
>> +};
>> +
>> static const struct attribute_group *generic_compat_pmu_attr_groups[] = {
>>  _compat_pmu_format_group,
>>  _compat_pmu_events_group,
>> +_compat_pmu_caps_group,
>>  NULL,
>> };
>> 
>> diff --git a/arch/powerpc/perf/power10-pmu.c 
>> b/arch/powerpc/perf/power10-pmu.c
>> index d3398100a60f..a622ff783719 100644
>> --- a/arch/powerpc/perf/power10-pmu.c
>> +++ b/arch/powerpc/perf/power10-pmu.c
>> @@ -258,6 +258,25 @@ static const struct attribute_group 
>> power10_pmu_format_group = {
>>  .attrs = power10_pmu_format_attr,
>> };
>> 
>> +static ssize_t pmu_name_show(struct device *cdev,
>> +struct device_attribute *attr,
>> +char *buf)
>> +{
>> +return snprintf(buf, PAGE_SIZE, "power10");
> 
> I believe that should use sysfs_emit().

Sure will change this.
> 
>> +}
>> +
>> +static DEVICE_ATTR_RO(pmu_name);
>> +
>> +static struct attribute *power10_pmu_caps_attrs[] = {
>> +_attr_pmu_name.attr,
>> +NULL
>> +};
>> +
>> +static struct attribute_group power10_pmu_caps_group = {
>> +.name  = "caps",
>> +.attrs = power10_pmu_caps_attrs,
>> +};
>> +
>> static const struct attribute_group *power10_pmu_attr_groups_dd1[] = {
>>  _pmu_format_group,
>>  _pmu_events_group_dd1,
>> @@ -267,6 +286,7 @@ static const struct attribute_group 
>> *power10_pmu_attr_groups_dd1[] = {
>> static const struct attribute_group *power10_pmu_attr_groups[] = {
>>  _pmu_format_group,
>>  _pmu_events_group,
>> +_pmu_caps_group,
>>  NULL,
>> };
> 
> There's a lot of boiler plate repeated for each PMU.
> 
> We already have power_pmu->name, can we use that and make 

Re: [PATCH v6 24/29] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog"

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am:
> Prepare hardlockup_panic_setup() to handle a comma-separated list of
> options. Thus, it can continue parsing its own command-line options while
> ignoring parameters that are relevant only to specific implementations of
> the hardlockup detector. Such implementations may use an early_param to
> parse their own options.

It can't really handle comma separated list though, until the next
patch. nmi_watchdog=panic,0 does not make sense, so you lost error
handling of that.

And is it kosher to double handle options like this? I'm sure it
happens but it's ugly.

Would you consider just add a new option for x86 and avoid changing
this? Less code and patches.

Thanks,
Nick

> 
> Cc: Andi Kleen 
> Cc: Nicholas Piggin 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: io...@lists.linux-foundation.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: x...@kernel.org
> Reviewed-by: Tony Luck 
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Corrected typo in commit message. (Tony)
> 
> Changes since v4:
>  * None
> 
> Changes since v3:
>  * None
> 
> Changes since v2:
>  * Introduced this patch.
> 
> Changes since v1:
>  * None
> ---
>  kernel/watchdog.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 9166220457bc..6443841a755f 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -73,13 +73,13 @@ void __init hardlockup_detector_disable(void)
>  
>  static int __init hardlockup_panic_setup(char *str)
>  {
> - if (!strncmp(str, "panic", 5))
> + if (parse_option_str(str, "panic"))
>   hardlockup_panic = 1;
> - else if (!strncmp(str, "nopanic", 7))
> + else if (parse_option_str(str, "nopanic"))
>   hardlockup_panic = 0;
> - else if (!strncmp(str, "0", 1))
> + else if (parse_option_str(str, "0"))
>   nmi_watchdog_user_enabled = 0;
> - else if (!strncmp(str, "1", 1))
> + else if (parse_option_str(str, "1"))
>   nmi_watchdog_user_enabled = 1;
>   return 1;
>  }
> -- 
> 2.17.1
> 
> 


Re: [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init()

2022-05-10 Thread Nicholas Piggin
Excerpts from Ricardo Neri's message of May 6, 2022 9:59 am:
> Certain implementations of the hardlockup detector require support for
> Inter-Processor Interrupt shorthands. On x86, support for these can only
> be determined after all the possible CPUs have booted once (in
> smp_init()). Other architectures may not need such check.
> 
> lockup_detector_init() only performs the initializations of data
> structures of the lockup detector. Hence, there are no dependencies on
> smp_init().

I think this is the only real thing which affects other watchdog types?

Not sure if it's a big problem, the secondary CPUs coming up won't
have their watchdog active until quite late, and the primary could
implement its own timeout in __cpu_up for secondary coming up, and
IPI it to get traces if necessary which is probably more robust.

Acked-by: Nicholas Piggin 

> 
> Cc: Andi Kleen 
> Cc: Nicholas Piggin 
> Cc: Andrew Morton 
> Cc: Stephane Eranian 
> Cc: "Ravi V. Shankar" 
> Cc: io...@lists.linux-foundation.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: x...@kernel.org
> Reviewed-by: Tony Luck 
> Signed-off-by: Ricardo Neri 
> ---
> Changes since v5:
>  * Introduced this patch
> 
> Changes since v4:
>  * N/A
> 
> Changes since v3:
>  * N/A
> 
> Changes since v2:
>  * N/A
> 
> Changes since v1:
>  * N/A
> ---
>  init/main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/init/main.c b/init/main.c
> index 98182c3c2c4b..62c52c9e4c2b 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1600,9 +1600,11 @@ static noinline void __init kernel_init_freeable(void)
>  
>   rcu_init_tasks_generic();
>   do_pre_smp_initcalls();
> - lockup_detector_init();
>  
>   smp_init();
> +
> + lockup_detector_init();
> +
>   sched_init_smp();
>  
>   padata_init();
> -- 
> 2.17.1
> 
> 


Re: [PATCH] bug: Use normal relative pointers in 'struct bug_entry'

2022-05-10 Thread Catalin Marinas
On Thu, May 05, 2022 at 06:09:45PM -0700, Josh Poimboeuf wrote:
> With CONFIG_GENERIC_BUG_RELATIVE_POINTERS, the addr/file relative
> pointers are calculated weirdly: based on the beginning of the bug_entry
> struct address, rather than their respective pointer addresses.
> 
> Make the relative pointers less surprising to both humans and tools by
> calculating them the normal way.
> 
> Signed-off-by: Josh Poimboeuf 
> ---
>  arch/arm64/include/asm/asm-bug.h |  4 ++--

Acked-by: Catalin Marinas 


Re: [PATCH] KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()

2022-05-10 Thread Matt Evans
Hi Alex,

> On 9 May 2022, at 21:23, Alexander Graf  wrote:
> 
> Commit 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
> moved the switch_mmu_context() to C. While in principle a good idea, it
> meant that the function now uses the stack. The stack is not accessible
> from real mode though.
> 
> So to keep calling the function, let's turn on MSR_DR while we call it.
> That way, all pointer references to the stack are handled virtually.
> 
> Reported-by: Matt Evans 
> Fixes: 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
> Signed-off-by: Alexander Graf 
> Cc: sta...@vger.kernel.org

Many thanks - this addresses the issue I saw, and has been...

Tested-by: Matt Evans 

...on a G4 host.  One comment though:

> —
> arch/powerpc/kvm/book3s_32_sr.S | 20 +++-
> 1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_32_sr.S b/arch/powerpc/kvm/book3s_32_sr.S
> index e3ab9df6cf19..bd4f798f7a46 100644
> --- a/arch/powerpc/kvm/book3s_32_sr.S
> +++ b/arch/powerpc/kvm/book3s_32_sr.S
> @@ -122,11 +122,21 @@
> 
>   /* 0x0 - 0xb */
> 
> - /* 'current->mm' needs to be in r4 */
> - tophys(r4, r2)
> - lwz r4, MM(r4)
> - tophys(r4, r4)
> - /* This only clobbers r0, r3, r4 and r5 */
> + /* switch_mmu_context() needs paging, let's enable it */
> + mfmsr   r9
> + ori r11, r9, MSR_DR
> + mtmsr   r11
> + sync
> +
> + /* Calling switch_mmu_context(, current->mm, ); */
> + lwz r4, MM(r2)
>   bl  switch_mmu_context

Of the volatile registers, I believe r12 is still valuable here and would need 
to be preserved.
(I can’t spot any others but would defer to your judgement here.)

For example:

diff --git a/arch/powerpc/kvm/book3s_32_sr.S b/arch/powerpc/kvm/book3s_32_sr.S
index e3ab9df6cf19..41fc9ca12d38 100644
--- a/arch/powerpc/kvm/book3s_32_sr.S
+++ b/arch/powerpc/kvm/book3s_32_sr.S
@@ -122,11 +122,23 @@
 
/* 0x0 - 0xb */
 
-   /* 'current->mm' needs to be in r4 */
-   tophys(r4, r2)
-   lwz r4, MM(r4)
-   tophys(r4, r4)
-   /* This only clobbers r0, r3, r4 and r5 */
+   /* switch_mmu_context() needs paging, let's enable it */
+   mfmsr   r9
+   ori r11, r9, MSR_DR
+   mtmsr   r11
+   sync
+
+   SAVE_GPR(12, r1)
+   /* Calling switch_mmu_context(, current->mm, ); */
+   lwz r4, MM(r2)
bl  switch_mmu_context
+   REST_GPR(12, r1)
+
+   /* Disable paging again */
+   mfmsr   r9
+   li  r6, MSR_DR
+   andcr9, r9, r6
+   mtmsr   r9
+   sync
 
 .endm


Matt

> 
> + /* Disable paging again */
> + mfmsr   r9
> + li  r6, MSR_DR
> + andcr9, r9, r6
> + mtmsr   r9
> + sync
> +
> .endm
> -- 
> 2.28.0.394.ge197136389
> 
> 
> 
> 
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
> 
> 
> 



[PATCH] powerpc/eeh: Drop redundant spinlock initialization

2022-05-10 Thread Haowen Bai
slot_errbuf_lock has declared and initialized by DEFINE_SPINLOCK,
so we don't need to spin_lock_init again, drop it.

Signed-off-by: Haowen Bai 
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c 
b/arch/powerpc/platforms/pseries/eeh_pseries.c
index f9af879c0222..77b476093e06 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -848,7 +848,6 @@ static int __init eeh_pseries_init(void)
}
 
/* Initialize error log lock and size */
-   spin_lock_init(_errbuf_lock);
eeh_error_buf_size = rtas_token("rtas-error-log-max");
if (eeh_error_buf_size == RTAS_UNKNOWN_SERVICE) {
pr_info("%s: unknown EEH error log size\n",
-- 
2.7.4



Re: [PATCH -next] powerpc: add support for syscall stack randomization

2022-05-10 Thread Nicholas Piggin
Excerpts from Xiu Jianfeng's message of May 5, 2022 9:19 pm:
> Add support for adding a random offset to the stack while handling
> syscalls. This patch uses mftb() instead of get_random_int() for better
> performance.

Hey, very nice.

> 
> Signed-off-by: Xiu Jianfeng 
> ---
>  arch/powerpc/Kconfig| 1 +
>  arch/powerpc/kernel/interrupt.c | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 5fc9153927ac..7e04c9f80cbc 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -192,6 +192,7 @@ config PPC
>   select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
>   select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
>   select HAVE_ARCH_KFENCE if PPC_BOOK3S_32 || PPC_8xx || 
> 40x
> + select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
>   select HAVE_ARCH_KGDB
>   select HAVE_ARCH_MMAP_RND_BITS
>   select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index 784ea3289c84..459385769721 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -4,6 +4,7 @@
>  #include 
>  #include 
>  #include  /* for show_regs */
> +#include 
>  
>  #include 
>  #include 
> @@ -82,6 +83,7 @@ notrace long system_call_exception(long r3, long r4, long 
> r5,
>  
>   kuap_lock();
>  
> + add_random_kstack_offset();
>   regs->orig_gpr3 = r3;
>  
>   if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))

This looks like the right place. I wonder why other interrupts don't
get the same treatment. Userspace can induce the kernel to take a 
synchronous interrupt, or wait for async ones. Smaller surface area 
maybe but certain instruction emulation for example could result in
significant logic that depends on user state. Anyway that's for
hardening gurus to ponder.

> @@ -405,6 +407,7 @@ interrupt_exit_user_prepare_main(unsigned long ret, 
> struct pt_regs *regs)
>  
>   /* Restore user access locks last */
>   kuap_user_restore(regs);
> + choose_random_kstack_offset(mftb() & 0xFF);
>  
>   return ret;
>  }

So this seems to be what x86 and s390 do, but why are we choosing a
new offset for every interrupt when it's only used on a syscall?
I would rather you do what arm64 does and just choose the offset
at the end of system_call_exception.

I wonder why the choose is separated from the add? I guess it's to
avoid a data dependency for stack access on an expensive random
function, so that makes sense (a comment would be nice in the
generic code).

I don't actually know if mftb() is cheaper here than a RNG. It
may not be conditioned all that well either. I would be tempted
to measure. 64-bit *may* be able to use a bit more than 256
bytes of stack too -- we have 16 byte alignment minimum so this
gives only 4 bits of randomness AFAIKS.

Thanks,
Nick


Re: [PATCH] powerpc/crash: save cpu register data in crash_smp_send_stop()

2022-05-10 Thread Nicholas Piggin
Excerpts from Hari Bathini's message of May 7, 2022 2:39 am:
> Capture register data for secondary CPUs in crash_smp_send_stop()
> instead of doing it much later in crash_kexec_prepare_cpus() function
> with another set of NMI IPIs to secondary CPUs. This change avoids
> unnecessarily tricky post processing of data to get the right
> backtrace for these CPUs.

Is the tricky post processing done in crash tools? Is it buggy in
some situations or just fragile code you want to deprecate? Seems
like a good goal either way

I assume the desire to stop secondaries ASAP is not just to get
register data but also to limit the amount of damage they might
cause to the crash process. Can they take interrupts or trigger
the hard lockup watchdog, for example?

> -void crash_smp_send_stop(void)
> -{
> - static bool stopped = false;
> -
> - /*
> -  * In case of fadump, register data for all CPUs is captured by f/w
> -  * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
> -  * this rtas call to avoid tricky post processing of those CPUs'
> -  * backtraces.
> -  */
> - if (should_fadump_crash())
> - return;

This is not actually code you changed, but I wonder if it's wrong,
if fadump is enabled then panic runs without stopping secondaries?
Doesn't seem quite right.

> -
> - if (stopped)
> - return;
> -
> - stopped = true;
> -
> -#ifdef CONFIG_NMI_IPI
> - smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_stop_this_cpu, 100);
> -#else
> - smp_call_function(crash_stop_this_cpu, NULL, 0);
> -#endif /* CONFIG_NMI_IPI */
> -}

Now if kexec is not configured do we lose our crash_smp_send_stop
function, or is it only ever called if kexec is enabled?

> -
>  #ifdef CONFIG_NMI_IPI
>  static void nmi_stop_this_cpu(struct pt_regs *regs)
>  {
> diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
> index 22ceeeb705ab..f06dfe71caca 100644
> --- a/arch/powerpc/kexec/crash.c
> +++ b/arch/powerpc/kexec/crash.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * The primary CPU waits a while for all secondary CPUs to enter. This is to
> @@ -102,7 +103,7 @@ void crash_ipi_callback(struct pt_regs *regs)
>   /* NOTREACHED */
>  }
>  
> -static void crash_kexec_prepare_cpus(int cpu)
> +static void crash_kexec_prepare_cpus(void)
>  {
>   unsigned int msecs;
>   volatile unsigned int ncpus = num_online_cpus() - 1;/* Excluding the 
> panic cpu */
> @@ -203,7 +204,7 @@ void crash_kexec_secondary(struct pt_regs *regs)
>  
>  #else/* ! CONFIG_SMP */
>  
> -static void crash_kexec_prepare_cpus(int cpu)
> +static void crash_kexec_prepare_cpus(void)
>  {
>   /*
>* move the secondaries to us so that we can copy
> @@ -249,6 +250,42 @@ static void __maybe_unused crash_kexec_wait_realmode(int 
> cpu)
>  static inline void crash_kexec_wait_realmode(int cpu) {}
>  #endif   /* CONFIG_SMP && CONFIG_PPC64 */
>  
> +void crash_smp_send_stop(void)
> +{
> + static int cpus_stopped;
> +
> + /*
> +  * In case of fadump, register data for all CPUs is captured by f/w
> +  * on ibm,os-term rtas call. Skip IPI callbacks to other CPUs before
> +  * this rtas call to avoid tricky post processing of those CPUs'
> +  * backtraces.
> +  */
> + if (should_fadump_crash())
> + return;
> +
> + if (cpus_stopped)
> + return;
> +
> + cpus_stopped = 1;
> +
> + /* Avoid hardlocking with irresponsive CPU holding logbuf_lock */
> + printk_deferred_enter();
> +
> + /*
> +  * This function is only called after the system
> +  * has panicked or is otherwise in a critical state.
> +  * The minimum amount of code to allow a kexec'd kernel
> +  * to run successfully needs to happen here.
> +  *
> +  * In practice this means stopping other cpus in
> +  * an SMP system.
> +  * The kernel is broken so disable interrupts.
> +  */
> + hard_irq_disable();
> +
> + crash_kexec_prepare_cpus();

This seems to move a bit of the kexec code around so this runs 
before notifiers in the panic path now. Maybe that's okay, I don't
know this code too well, but how feasible would it be to have
crash_stop_this_cpu() call crash_save_cpu()? And keeping the
second IPI.

I do like the idea of removing the second IPI if possible, but
that could be done later by moving the logic into crash_save_cpu()
(it could just poll on a flag until the primary releases it to
the next phase, rather than have the primary send another IPI).

Thanks,
Nick


> +}
> +
>  /*
>   * Register a function to be called on shutdown.  Only use this if you
>   * can't reset your device in the second kernel.
> @@ -312,21 +349,6 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
>   unsigned int i;
>   int (*old_handler)(struct pt_regs *regs);
>  
> - /* Avoid hardlocking with irresponsive CPU holding logbuf_lock */
> - 

Re: [PATCH v4 03/14] modpost: split the section mismatch checks into section-check.c

2022-05-10 Thread Masahiro Yamada
On Tue, May 10, 2022 at 2:20 AM Nick Desaulniers
 wrote:
>
> On Sun, May 8, 2022 at 12:10 PM Masahiro Yamada  wrote:
> >
> > modpost.c is too big, and the half of the code is for section checks.
> > Split it.
> >
> > I fixed some style issues in the moved code.
>
> It would be helpful for review if the split and restyle were distinct
> patches.  Otherwise I can't tell what has changed.
>
> This does lose the ability to use git blame to get more context on
> some of the oddities in modpost (which I have found useful in the
> past).  I don't feel strongly though.


OK, I will just move the code in v5.





> > diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> > index a78b75f0eeb0..e7e2c70a98f5 100644
> > --- a/scripts/mod/modpost.c
> > +++ b/scripts/mod/modpost.c
> > @@ -31,7 +31,7 @@ static bool external_module;
> >  /* Only warn about unresolved symbols */
> >  static bool warn_unresolved;
> >
> > -static int sec_mismatch_count;
> > +int sec_mismatch_count;
>
> ^ this should go in modpost.h if it is to be used in two translation
> units, rather than forward declaring it in section-check.c.  You did
> this for the functions.


Sorry, I do not understand.


In modpost.h, I put the declaration:

  extern int sec_mismatch_count;

If I moved it to the header without 'extern'
I would get multiple definitions.







--
Best Regards
Masahiro Yamada


Re: [PATCH v4 00/14] kbuild: yet another series of cleanups (modpost, LTO, MODULE_REL_CRCS, export.h)

2022-05-10 Thread Masahiro Yamada
On Tue, May 10, 2022 at 7:13 AM Nathan Chancellor  wrote:
>
> On Mon, May 09, 2022 at 01:24:33PM +0900, Masahiro Yamada wrote:
> > On Mon, May 9, 2022 at 4:09 AM Masahiro Yamada  wrote:
> > >
> > > This is the third batch of cleanups in this development cycle.
> > >
> > > Major changes in v4:
> > >  - Move static EXPORT_SYMBOL check to a script
> > >  - Some refactoring
> > >
> > > Major changes in v3:
> > >
> > >  - Generate symbol CRCs as C code, and remove CONFIG_MODULE_REL_CRCS.
> > >
> > > Major changes in v2:
> > >
> > >  - V1 did not work with CONFIG_MODULE_REL_CRCS.
> > >I fixed this for v2.
> > >
> > >  - Reflect some review comments in v1
> > >
> > >  - Refactor the code more
> > >
> > >  - Avoid too long argument error
> >
> > This series is available at
> > git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild.git
> > lto-cleanup-v4
>
> Hi Masahiro,
>
> I checked this out and went to run it through my QEMU tests but I see
> two new errors.
>
> Failure #1:
>
> In file included from scripts/mod/section-check.c:3:
> scripts/mod/modpost.h:15:10: fatal error: 'elfconfig.h' file not found
> #include "elfconfig.h"
>  ^
> 1 error generated.
>
> I was able to get past that with
>
> diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile
> index ca739c6c68a1..c33b83bfbcad 100644
> --- a/scripts/mod/Makefile
> +++ b/scripts/mod/Makefile
> @@ -16,7 +16,7 @@ targets += $(devicetable-offsets-file) devicetable-offsets.s
>
>  # dependencies on generated files need to be listed explicitly
>
> -$(obj)/modpost.o $(obj)/file2alias.o $(obj)/sumversion.o: $(obj)/elfconfig.h
> +$(obj)/modpost.o $(obj)/file2alias.o $(obj)/sumversion.o 
> $(obj)/section-check.o: $(obj)/elfconfig.h
>  $(obj)/file2alias.o: $(obj)/$(devicetable-offsets-file)
>

Thanks for testing.


I will slightly refactor the code as follows.




@@ -16,7 +16,7 @@ targets += $(devicetable-offsets-file) devicetable-offsets.s

 # dependencies on generated files need to be listed explicitly

-$(obj)/modpost.o $(obj)/file2alias.o $(obj)/sumversion.o: $(obj)/elfconfig.h
+$(addprefix $(obj)/, $(modpost-objs)): $(obj)/elfconfig.h

 $(obj)/file2alias.o: $(obj)/$(devicetable-offsets-file)

 quiet_cmd_elfconfig = MKELF   $@








>  quiet_cmd_elfconfig = MKELF   $@
>
> Failure #2:
>
>   GEN .version
>   CHK include/generated/compile.h
>   GEN .tmp_initcalls.lds
>   LTO vmlinux.o
>   OBJTOOL vmlinux.o
>   MODPOST vmlinux.symvers
>   MODINFO modules.builtin.modinfo
>   GEN modules.builtin
>   LD  .tmp_vmlinux.btf
> ld.lld: error: cannot open .vmlinux.export.o: No such file or directory
>   BTF .btf.vmlinux.bin.o
> pahole: .tmp_vmlinux.btf: No such file or directory
>   CC  .vmlinux.export.c
>   LD  .tmp_vmlinux.kallsyms1
> ld.lld: error: .btf.vmlinux.bin.o: unknown file type
> make[1]: *** [Makefile:1159: vmlinux] Error 1
>
> I was not really able to see what is going wrong here. Attached is the
> configuration that I ran into this with. If you need any other
> information, please let me know!

Ah, OK.
This is because .vmlinux.export.o is compiled after gen_btf.

I will swap the order in v5.





> Cheers,
> Nathan



--
Best Regards
Masahiro Yamada


Re: [PATCH] powerpc/papr_scm: Fix leaking nvdimm_events_map elements

2022-05-10 Thread Aneesh Kumar K.V
Vaibhav Jain  writes:

> Right now 'char *' elements allocated individual 'stat_id' in
> 'papr_scm_priv.nvdimm_events_map' during papr_scm_pmu_check_events() leak in
> papr_scm_remove() and papr_scm_pmu_register(), papr_scm_pmu_check_events() 
> error
> paths.
>
> Also individual 'stat_id' arent NULL terminated 'char *' instead they are 
> fixed
> 8-byte sized identifiers. However papr_scm_pmu_register() assumes it to be a
> NULL terminated 'char *' and at other places it assumes it to be a
> 'papr_scm_perf_stat.stat_id' sized string which is 8-byes in size.
>
> Fix this by allocating the memory for papr_scm_priv.nvdimm_events_map to also
> include space for 'stat_id' entries. This is possible since number of 
> available
> events/stat_ids are known upfront. This saves some memory and one extra level 
> of
> indirection from 'nvdimm_events_map' to 'stat_id'. Also rest of the code
> can continue to call 'kfree(papr_scm_priv.nvdimm_events_map)' without needing 
> to
> iterate over the array and free up individual elements.
>
> Also introduce a new typedef called 'state_id_t' thats a 'u8[8]' and can be 
> used
> across papr_scm to deal with stat_ids.
>
> Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support")
> Signed-off-by: Vaibhav Jain 
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 48 +++
>  1 file changed, 22 insertions(+), 26 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index 39962c905542..f33a865ad5fb 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -70,8 +70,10 @@
>  #define PAPR_SCM_PERF_STATS_VERSION 0x1
>  
>  /* Struct holding a single performance metric */
> +typedef u8 stat_id_t[8];
> +
>  struct papr_scm_perf_stat {
> - u8 stat_id[8];
> + stat_id_t stat_id;
>   __be64 stat_val;
>  } __packed;

Can we do this as two patch? One that fix the leak and other that adds
new type?

>  
> @@ -126,7 +128,7 @@ struct papr_scm_priv {
>   u64 health_bitmap_inject_mask;
>  
>/* array to have event_code and stat_id mappings */
> - char **nvdimm_events_map;
> + stat_id_t *nvdimm_events_map;
>  };
>  
>  static int papr_scm_pmem_flush(struct nd_region *nd_region,
> @@ -462,7 +464,7 @@ static int papr_scm_pmu_check_events(struct papr_scm_priv 
> *p, struct nvdimm_pmu
>  {
>   struct papr_scm_perf_stat *stat;
>   struct papr_scm_perf_stats *stats;
> - int index, rc, count;
> + int index, rc = 0;
>   u32 available_events;
>  
>   if (!p->stat_buffer_len)
> @@ -478,35 +480,29 @@ static int papr_scm_pmu_check_events(struct 
> papr_scm_priv *p, struct nvdimm_pmu
>   return rc;
>   }
>  
> - /* Allocate memory to nvdimm_event_map */
> - p->nvdimm_events_map = kcalloc(available_events, sizeof(char *), 
> GFP_KERNEL);
> - if (!p->nvdimm_events_map) {
> - rc = -ENOMEM;
> - goto out_stats;
> - }
> -
>   /* Called to get list of events supported */
>   rc = drc_pmem_query_stats(p, stats, 0);
>   if (rc)
> - goto out_nvdimm_events_map;
> -
> - for (index = 0, stat = stats->scm_statistic, count = 0;
> -  index < available_events; index++, ++stat) {
> - p->nvdimm_events_map[count] = kmemdup_nul(stat->stat_id, 8, 
> GFP_KERNEL);
> - if (!p->nvdimm_events_map[count]) {
> - rc = -ENOMEM;
> - goto out_nvdimm_events_map;
> - }
> + goto out;
>  
> - count++;
> + /*
> +  * Allocate memory and populate nvdimm_event_map.
> +  * Allocate an extra element for NULL entry
> +  */
> + p->nvdimm_events_map = kcalloc(available_events + 1,
> +sizeof(stat_id_t), GFP_KERNEL);
> + if (!p->nvdimm_events_map) {
> + rc = -ENOMEM;
> + goto out;
>   }
> - p->nvdimm_events_map[count] = NULL;
> - kfree(stats);
> - return 0;
>  
> -out_nvdimm_events_map:
> - kfree(p->nvdimm_events_map);
> -out_stats:
> + /* Copy all stat_ids to event map */
> + for (index = 0, stat = stats->scm_statistic;
> +  index < available_events; index++, ++stat) {
> + memcpy(>nvdimm_events_map[index], >stat_id,
> +sizeof(stat_id_t));
> + }
> +out:
>   kfree(stats);
>   return rc;
>  }
>
> base-commit: 348c71344111d7a48892e3e52264ff11956fc196
> -- 
> 2.35.1



Re: [PATCH 2/2] powerpc/vdso: Link with ld.lld when requested

2022-05-10 Thread Alexey Kardashevskiy




On 5/10/22 06:46, Nathan Chancellor wrote:

The PowerPC vDSO is linked with $(CC) instead of $(LD), which means the
default linker of the compiler is used instead of the linker requested
by the builder.

   $ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/
   ...

   $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg

   File: arch/powerpc/kernel/vdso/vdso32.so.dbg
   String dump of section '.comment':
   [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

   File: arch/powerpc/kernel/vdso/vdso64.so.dbg
   String dump of section '.comment':
   [ 0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

The compiler option '-fuse-ld' tells the compiler which linker to use
when it is invoked as both the compiler and linker. Use '-fuse-ld=lld'
when LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is
linked with the same linker as the rest of the kernel.

   $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg

   File: arch/powerpc/kernel/vdso/vdso32.so.dbg
   String dump of section '.comment':
   [ 0] Linker: LLD 14.0.0
   [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

   File: arch/powerpc/kernel/vdso/vdso64.so.dbg
   String dump of section '.comment':
   [ 0] Linker: LLD 14.0.0
   [14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

LD can be a full path to ld.lld, which will not be handled properly by
'-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's
search path. '-fuse-ld' can take a path to the linker but it is
deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario.

Use '--ld-path' if it is supported, as it will handle a full path or
just 'ld.lld' properly. See the LLVM commit below for the full details
of '--ld-path'.

Link: https://github.com/ClangBuiltLinux/linux/issues/774
Link: 
https://github.com/llvm/llvm-project/commit/1bc5c84710a8c73ef21295e63c19d10a8c71f2f5
Signed-off-by: Nathan Chancellor 
---
  arch/powerpc/kernel/vdso/Makefile | 1 +
  1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/vdso/Makefile 
b/arch/powerpc/kernel/vdso/Makefile
index 954974287ee7..096b0bf1335f 100644
--- a/arch/powerpc/kernel/vdso/Makefile
+++ b/arch/powerpc/kernel/vdso/Makefile
@@ -48,6 +48,7 @@ UBSAN_SANITIZE := n
  KASAN_SANITIZE := n
  
  ccflags-y := -shared -fno-common -fno-builtin -nostdlib -Wl,--hash-style=both

+ccflags-$(CONFIG_LD_IS_LLD) += $(call cc-option,--ld-path=$(LD),-fuse-ld=lld)



Out of curiosity - how does this work exactly? I can see --ld-path= in 
the output so it works but there is no -fuse-ld=lld, is the second 
argument of cc-option only picked when the first one is not supported?


Anyway,

Tested-by: Alexey Kardashevskiy 
Reviewed-by: Alexey Kardashevskiy 



  
  CC32FLAGS := -Wl,-soname=linux-vdso32.so.1 -m32

  AS32FLAGS := -D__VDSO32__ -s


Re: [PATCH 1/2] powerpc/vdso: Remove unused ENTRY in linker scripts

2022-05-10 Thread Alexey Kardashevskiy




On 5/10/22 06:46, Nathan Chancellor wrote:

When linking vdso{32,64}.so.dbg with ld.lld, there is a warning about
not finding _start for the starting address:

   ld.lld: warning: cannot find entry symbol _start; not setting start address
   ld.lld: warning: cannot find entry symbol _start; not setting start address

Looking at GCC + GNU ld, the entry point address is 0x0:

   $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
   File: vdso32.so.dbg
 Entry point address:   0x0
   File: vdso64.so.dbg
 Entry point address:   0x0

This matches what ld.lld emits:

   $ powerpc64le-linux-gnu-readelf -p .comment vdso{32,64}.so.dbg

   File: vdso32.so.dbg

   String dump of section '.comment':
 [ 0]  Linker: LLD 14.0.0
 [14]  clang version 14.0.0 (Fedora 14.0.0-1.fc37)

   File: vdso64.so.dbg

   String dump of section '.comment':
 [ 0]  Linker: LLD 14.0.0
 [14]  clang version 14.0.0 (Fedora 14.0.0-1.fc37)

   $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
   File: vdso32.so.dbg
 Entry point address:   0x0
   File: vdso64.so.dbg
 Entry point address:   0x0

Remove ENTRY to remove the warning, as it is unnecessary for the vDSO to
function correctly.



Sounds more like a bugfix to me - _start is simply not defined, I wonder 
why ld is not complaining.



Tested-by: Alexey Kardashevskiy 
Reviewed-by: Alexey Kardashevskiy 




Signed-off-by: Nathan Chancellor 
---
  arch/powerpc/kernel/vdso/vdso32.lds.S | 1 -
  arch/powerpc/kernel/vdso/vdso64.lds.S | 1 -
  2 files changed, 2 deletions(-)

diff --git a/arch/powerpc/kernel/vdso/vdso32.lds.S 
b/arch/powerpc/kernel/vdso/vdso32.lds.S
index 58e0099f70f4..e0d19d74455f 100644
--- a/arch/powerpc/kernel/vdso/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso/vdso32.lds.S
@@ -13,7 +13,6 @@ OUTPUT_FORMAT("elf32-powerpcle", "elf32-powerpcle", 
"elf32-powerpcle")
  OUTPUT_FORMAT("elf32-powerpc", "elf32-powerpc", "elf32-powerpc")
  #endif
  OUTPUT_ARCH(powerpc:common)
-ENTRY(_start)
  
  SECTIONS

  {
diff --git a/arch/powerpc/kernel/vdso/vdso64.lds.S 
b/arch/powerpc/kernel/vdso/vdso64.lds.S
index 0288cad428b0..1a4a7bc4c815 100644
--- a/arch/powerpc/kernel/vdso/vdso64.lds.S
+++ b/arch/powerpc/kernel/vdso/vdso64.lds.S
@@ -13,7 +13,6 @@ OUTPUT_FORMAT("elf64-powerpcle", "elf64-powerpcle", 
"elf64-powerpcle")
  OUTPUT_FORMAT("elf64-powerpc", "elf64-powerpc", "elf64-powerpc")
  #endif
  OUTPUT_ARCH(powerpc:common64)
-ENTRY(_start)
  
  SECTIONS

  {