[PATCH v2] powerpc/pseries: fix potential memory leak in init_cpu_associativity()

2022-12-13 Thread Wang Yufen
If the vcpu_associativity alloc memory successfully but the
pcpu_associativity fails to alloc memory, the vcpu_associativity
memory leaks.

Fixes: d62c8deeb6e6 ("powerpc/pseries: Provide vcpu dispatch statistics")
Signed-off-by: Wang Yufen 
---
 arch/powerpc/platforms/pseries/lpar.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index 97ef649..bb24545 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -524,8 +524,10 @@ static ssize_t vcpudispatch_stats_write(struct file *file, 
const char __user *p,
 
if (cmd) {
rc = init_cpu_associativity();
-   if (rc)
+   if (rc) {
+   destroy_cpu_associativity();
goto out;
+   }
 
for_each_possible_cpu(cpu) {
disp = per_cpu_ptr(_disp_data, cpu);
-- 
1.8.3.1



Re: [PATCH] powerpc/pseries: fix potential memory leak in init_cpu_associativity()

2022-12-13 Thread wangyufen




在 2022/12/13 14:06, Naveen N. Rao 写道:

Wang Yufen wrote:

If the vcpu_associativity alloc memory successfully but the
pcpu_associativity fails to alloc memory, the vcpu_associativity
memory leaks.

Fixes: d62c8deeb6e6 ("powerpc/pseries: Provide vcpu dispatch statistics")
Signed-off-by: Wang Yufen 
---
 arch/powerpc/platforms/pseries/lpar.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c

index 97ef649..501ee6c 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -211,6 +211,7 @@ static int init_cpu_associativity(void)

 if (!vcpu_associativity || !pcpu_associativity) {
 pr_err("error allocating memory for associativity 
information\n");

+    kfree(vcpu_associativity);


I think we should call destroy_cpu_associativity() here instead. We 
don't know which allocation failed, so it is better to try and free 
both, and also to reset the pointers to 0.


Hi,

Okay, I'll send a v2 with the following modifications:

--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -524,8 +524,10 @@ static ssize_t vcpudispatch_stats_write(struct file 
*file, const char __user *p,


if (cmd) {
rc = init_cpu_associativity();
-   if (rc)
+   if (rc) {
+   destroy_cpu_associativity();
goto out;
+   }

for_each_possible_cpu(cpu) {
disp = per_cpu_ptr(_disp_data, cpu);

Thanks,
Wang




- Naveen




[PATCH] powerpc/64: Implement arch_within_stack_frames

2022-12-13 Thread Nicholas Miehlbradt
Walks the stack when copy_{to,from}_user address is in the stack to
ensure that the object being copied is entirely within a single stack
frame.

Substatially similar to the x86 implementation except using the back
chain to traverse the stack and identify stack frame boundaries.

Signed-off-by: Nicholas Miehlbradt 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/include/asm/thread_info.h | 38 ++
 2 files changed, 39 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2ca5418457ed..4c59d139ea83 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -198,6 +198,7 @@ config PPC
select HAVE_ARCH_KASAN_VMALLOC  if HAVE_ARCH_KASAN
select HAVE_ARCH_KFENCE if ARCH_SUPPORTS_DEBUG_PAGEALLOC
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
+   select HAVE_ARCH_WITHIN_STACK_FRAMESif PPC64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/include/asm/thread_info.h 
b/arch/powerpc/include/asm/thread_info.h
index af58f1ed3952..efdf39e07884 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -186,6 +186,44 @@ static inline bool test_thread_local_flags(unsigned int 
flags)
 #define is_elf2_task() (0)
 #endif
 
+#ifdef CONFIG_PPC64
+
+#ifdef CONFIG_PPC64_ELF_ABI_V1
+#define PARAMETER_SAVE_OFFSET 48
+#else
+#define PARAMETER_SAVE_OFFSET 32
+#endif
+
+/*
+ * Walks up the stack frames to make sure that the specified object is
+ * entirely contained by a single stack frame.
+ *
+ * Returns:
+ * GOOD_FRAME  if within a frame
+ * BAD_STACK   if placed across a frame boundary (or outside stack)
+ */
+static inline int arch_within_stack_frames(const void * const stack,
+  const void * const stackend,
+  const void *obj, unsigned long len)
+{
+   const void *frame;
+   const void *oldframe;
+
+   oldframe = (const void *)current_stack_pointer;
+   frame = *(const void * const *)oldframe;
+
+   while (stack <= frame && frame < stackend) {
+   if (obj + len <= frame)
+   return obj >= oldframe + PARAMETER_SAVE_OFFSET ?
+   GOOD_FRAME : BAD_STACK;
+   oldframe = frame;
+   frame = *(const void * const *)oldframe;
+   }
+
+   return BAD_STACK;
+}
+#endif /* CONFIG_PPC64 */
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
-- 
2.34.1



Re: sched/debug: CPU hotplug operation suffers in a large cpu systems

2022-12-13 Thread Phil Auld
On Wed, Dec 14, 2022 at 10:41:25AM +1100 Michael Ellerman wrote:
> Phil Auld  writes:
> > On Tue, Dec 13, 2022 at 07:23:54AM +0100 Greg Kroah-Hartman wrote:
> >> On Mon, Dec 12, 2022 at 02:17:58PM -0500, Phil Auld wrote:
> >> > Hi,
> >> > 
> >> > On Tue, Nov 08, 2022 at 01:24:39PM +0100 Greg Kroah-Hartman wrote:
> >> > > On Tue, Nov 08, 2022 at 03:30:46PM +0530, Vishal Chourasia wrote:
> >> > > > 
> >> > > > Thanks Greg & Peter for your direction. 
> >> > > > 
> >> > > > While we pursue the idea of having debugfs based on kernfs, we 
> >> > > > thought about
> >> > > > having a boot time parameter which would disable creating and 
> >> > > > updating of the
> >> > > > sched_domain debugfs files and this would also be useful even when 
> >> > > > the kernfs
> >> > > > solution kicks in, as users who may not care about these debugfs 
> >> > > > files would
> >> > > > benefit from a faster CPU hotplug operation.
> >> > > 
> >> > > Ick, no, you would be adding a new user/kernel api that you will be
> >> > > required to support for the next 20+ years.  Just to get over a
> >> > > short-term issue before you solve the problem properly.
> >> > 
> >> > I'm not convinced moving these files from debugfs to kernfs is the right
> >> > fix.  That will take it from ~50 back to ~20 _minutes_ on these systems.
> >> > I don't think either of those numbers is reasonable.
> >> > 
> >> > The issue as I see it is the full rebuild for every change with no way to
> >> > batch the changes. How about something like the below?
> >> > 
> >> > This puts the domains/* files under the sched_verbose flag. About the 
> >> > only
> >> > thing under that flag now are the detailed topology discovery printks 
> >> > anyway
> >> > so this fits together nicely.
> >> > 
> >> > This way the files would be off by default (assuming you don't boot with
> >> > sched_verbose) and can be created at runtime by enabling verbose. 
> >> > Multiple
> >> > changes could also be batched by disabling/makeing changes/re-enabling.
> >> > 
> >> > It does not create a new API, uses one that is already there.
> >> 
> >> The idea seems good, the implementation might need a bit of work :)
> >
> > More than the one comment below? Let me know.
> >
> >> 
> >> > > If you really do not want these debugfs files, just disable debugfs 
> >> > > from
> >> > > your system.  That should be a better short-term solution, right?
> >> > 
> >> > We do find these files useful at times for debugging issue and looking
> >> > at what's going on on the system.
> >> > 
> >> > > 
> >> > > Or better yet, disable SCHED_DEBUG, why can't you do that?
> >> > 
> >> > Same with this... useful information with (modulo issues like this)
> >> > small cost. There are also tuning knobs that are only available
> >> > with SCHED_DEBUG. 
> >> > 
> >> > 
> >> > Cheers,
> >> > Phil
> >> > 
> >> > ---
> >> > 
> >> > sched/debug: Put sched/domains files under verbose flag
> >> > 
> >> > The debug files under sched/domains can take a long time to regenerate,
> >> > especially when updates are done one at a time. Move these files under
> >> > the verbose debug flag. Allow changes to verbose to trigger generation
> >> > of the files. This lets a user batch the updates but still have the
> >> > information available.  The detailed topology printk messages are also
> >> > under verbose.
> >> > 
> >> > Signed-off-by: Phil Auld 
> >> > ---
> >> >  kernel/sched/debug.c | 68 ++--
> >> >  1 file changed, 66 insertions(+), 2 deletions(-)
> >> > 
> >> > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> >> > index 1637b65ba07a..2eb51ee3ccab 100644
> >> > --- a/kernel/sched/debug.c
> >> > +++ b/kernel/sched/debug.c
> >> > @@ -280,6 +280,31 @@ static const struct file_operations 
> >> > sched_dynamic_fops = {
> >> >  
> >> >  __read_mostly bool sched_debug_verbose;
> >> >  
> >> > +static ssize_t sched_verbose_write(struct file *filp, const char __user 
> >> > *ubuf,
> >> > +   size_t cnt, loff_t *ppos);
> >> > +
> >> > +static int sched_verbose_show(struct seq_file *m, void *v)
> >> > +{
> >> > +if (sched_debug_verbose)
> >> > +seq_puts(m,"Y\n");
> >> > +else
> >> > +seq_puts(m,"N\n");
> >> > +return 0;
> >> > +}
> >> > +
> >> > +static int sched_verbose_open(struct inode *inode, struct file *filp)
> >> > +{
> >> > +return single_open(filp, sched_verbose_show, NULL);
> >> > +}
> >> > +
> >> > +static const struct file_operations sched_verbose_fops = {
> >> > +.open   = sched_verbose_open,
> >> > +.write  = sched_verbose_write,
> >> > +.read   = seq_read,
> >> > +.llseek = seq_lseek,
> >> > +.release= seq_release,
> >> > +};
> >> > +
> >> >  static const struct seq_operations sched_debug_sops;
> >> >  
> >> >  static int sched_debug_open(struct inode *inode, struct file *filp)
> >> > @@ 

Re: sched/debug: CPU hotplug operation suffers in a large cpu systems

2022-12-13 Thread Michael Ellerman
Phil Auld  writes:
> On Tue, Dec 13, 2022 at 07:23:54AM +0100 Greg Kroah-Hartman wrote:
>> On Mon, Dec 12, 2022 at 02:17:58PM -0500, Phil Auld wrote:
>> > Hi,
>> > 
>> > On Tue, Nov 08, 2022 at 01:24:39PM +0100 Greg Kroah-Hartman wrote:
>> > > On Tue, Nov 08, 2022 at 03:30:46PM +0530, Vishal Chourasia wrote:
>> > > > 
>> > > > Thanks Greg & Peter for your direction. 
>> > > > 
>> > > > While we pursue the idea of having debugfs based on kernfs, we thought 
>> > > > about
>> > > > having a boot time parameter which would disable creating and updating 
>> > > > of the
>> > > > sched_domain debugfs files and this would also be useful even when the 
>> > > > kernfs
>> > > > solution kicks in, as users who may not care about these debugfs files 
>> > > > would
>> > > > benefit from a faster CPU hotplug operation.
>> > > 
>> > > Ick, no, you would be adding a new user/kernel api that you will be
>> > > required to support for the next 20+ years.  Just to get over a
>> > > short-term issue before you solve the problem properly.
>> > 
>> > I'm not convinced moving these files from debugfs to kernfs is the right
>> > fix.  That will take it from ~50 back to ~20 _minutes_ on these systems.
>> > I don't think either of those numbers is reasonable.
>> > 
>> > The issue as I see it is the full rebuild for every change with no way to
>> > batch the changes. How about something like the below?
>> > 
>> > This puts the domains/* files under the sched_verbose flag. About the only
>> > thing under that flag now are the detailed topology discovery printks 
>> > anyway
>> > so this fits together nicely.
>> > 
>> > This way the files would be off by default (assuming you don't boot with
>> > sched_verbose) and can be created at runtime by enabling verbose. Multiple
>> > changes could also be batched by disabling/makeing changes/re-enabling.
>> > 
>> > It does not create a new API, uses one that is already there.
>> 
>> The idea seems good, the implementation might need a bit of work :)
>
> More than the one comment below? Let me know.
>
>> 
>> > > If you really do not want these debugfs files, just disable debugfs from
>> > > your system.  That should be a better short-term solution, right?
>> > 
>> > We do find these files useful at times for debugging issue and looking
>> > at what's going on on the system.
>> > 
>> > > 
>> > > Or better yet, disable SCHED_DEBUG, why can't you do that?
>> > 
>> > Same with this... useful information with (modulo issues like this)
>> > small cost. There are also tuning knobs that are only available
>> > with SCHED_DEBUG. 
>> > 
>> > 
>> > Cheers,
>> > Phil
>> > 
>> > ---
>> > 
>> > sched/debug: Put sched/domains files under verbose flag
>> > 
>> > The debug files under sched/domains can take a long time to regenerate,
>> > especially when updates are done one at a time. Move these files under
>> > the verbose debug flag. Allow changes to verbose to trigger generation
>> > of the files. This lets a user batch the updates but still have the
>> > information available.  The detailed topology printk messages are also
>> > under verbose.
>> > 
>> > Signed-off-by: Phil Auld 
>> > ---
>> >  kernel/sched/debug.c | 68 ++--
>> >  1 file changed, 66 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
>> > index 1637b65ba07a..2eb51ee3ccab 100644
>> > --- a/kernel/sched/debug.c
>> > +++ b/kernel/sched/debug.c
>> > @@ -280,6 +280,31 @@ static const struct file_operations 
>> > sched_dynamic_fops = {
>> >  
>> >  __read_mostly bool sched_debug_verbose;
>> >  
>> > +static ssize_t sched_verbose_write(struct file *filp, const char __user 
>> > *ubuf,
>> > + size_t cnt, loff_t *ppos);
>> > +
>> > +static int sched_verbose_show(struct seq_file *m, void *v)
>> > +{
>> > +  if (sched_debug_verbose)
>> > +  seq_puts(m,"Y\n");
>> > +  else
>> > +  seq_puts(m,"N\n");
>> > +  return 0;
>> > +}
>> > +
>> > +static int sched_verbose_open(struct inode *inode, struct file *filp)
>> > +{
>> > +  return single_open(filp, sched_verbose_show, NULL);
>> > +}
>> > +
>> > +static const struct file_operations sched_verbose_fops = {
>> > +  .open   = sched_verbose_open,
>> > +  .write  = sched_verbose_write,
>> > +  .read   = seq_read,
>> > +  .llseek = seq_lseek,
>> > +  .release= seq_release,
>> > +};
>> > +
>> >  static const struct seq_operations sched_debug_sops;
>> >  
>> >  static int sched_debug_open(struct inode *inode, struct file *filp)
>> > @@ -303,7 +328,7 @@ static __init int sched_init_debug(void)
>> >debugfs_sched = debugfs_create_dir("sched", NULL);
>> >  
>> >debugfs_create_file("features", 0644, debugfs_sched, NULL, 
>> > _feat_fops);
>> > -  debugfs_create_bool("verbose", 0644, debugfs_sched, 
>> > _debug_verbose);
>> > +  debugfs_create_file("verbose", 0644, debugfs_sched, NULL, 
>> > _verbose_fops);
>> >  #ifdef 

Re: [PATCH v8 4/9] phy: fsl: Add Lynx 10G SerDes driver

2022-12-13 Thread Sean Anderson
On 12/12/22 18:37, Stephen Boyd wrote:
> Quoting Sean Anderson (2022-12-08 07:36:45)
>> On 12/6/22 21:17, Stephen Boyd wrote:
>> > Quoting Sean Anderson (2022-11-01 16:27:21)
>> >> On 11/1/22 16:10, Stephen Boyd wrote:
>> >> >> 
>> >> >> Oh, I remember why I did this. I need the reference clock for 
>> >> >> clk_hw_round_rate,
>> >> >> which is AFAICT the only correct way to implement round_rate.
>> >> >> 
>> >> > 
>> >> > Is the reference clk the parent of the clk implementing
>> >> > clk_ops::round_rate()?
>> >> 
>> >> Yes. We may be able to produce a given output with multiple reference
>> >> rates. However, the clock API provides no mechanism to say "Don't ask
>> >> for the parent clock to be rate X, you just tried it and the parent
>> >> clock can't support it." So instead, we loop over the possible reference
>> >> rates and pick the first one which the parent says it can round to.
>> >> 
>> > 
>> > Sorry, I'm lost. Why can't you loop over possible reference rates in
>> > determine_rate/round_rate clk op here?
>> 
>> This is what I do currently, but you need to have the parent clock to do
>> so. With your suggested method, we never actually get a struct clk(_hw)
>> which we can query for rate support.
> 
> The clk_hw for the parent is given to the determine_rate clk_op in the
> clk_rate_request structure. It's stored in the best_parent_hw pointer
> when the determine_rate function is called. Does that work for you?

Huh, I didn't realize that best_parent_hw was an input as well as an
output. The way it's documented, it makes it seem like this is set by
determine_rate.

I'm still not very happy with having to use an aux bus, as there are
plenty of other drivers which just register clocks with the top-level
device.

I will try and revisit this at some point, but the project has ended so
I'm not sure when I will find the time.

--Sean


Re: powerpc-part: was: Re: [PATCH v6] livepatch: Clear relocation targets on a module removal

2022-12-13 Thread Joe Lawrence
On 12/13/22 8:29 AM, Petr Mladek wrote:
> On Tue 2022-12-13 00:13:46, Song Liu wrote:
>> )() ()On Mon, Dec 12, 2022 at 9:12 AM Petr Mladek  wrote:
>>>
>>> On Fri 2022-12-09 11:59:35, Song Liu wrote:
 On Fri, Dec 9, 2022 at 3:41 AM Petr Mladek  wrote:
> On Mon 2022-11-28 17:57:06, Song Liu wrote:
>> On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek  wrote:
>>>
 --- a/arch/powerpc/kernel/module_64.c
 +++ b/arch/powerpc/kernel/module_64.c
 +#ifdef CONFIG_LIVEPATCH
 +void clear_relocate_add(Elf64_Shdr *sechdrs,
 +const char *strtab,
 +unsigned int symindex,
 +unsigned int relsec,
 +struct module *me)
 +{
>>>
>>> [...]
>>>
 +
 + instruction = (u32 *)location;
 + if (is_mprofile_ftrace_call(symname))
 + continue;
>
> Why do we ignore these symbols?
>
> I can't find any counter-part in apply_relocate_add(). It looks super
> tricky. It would deserve a comment.
>
> And I have no idea how we could maintain these exceptions.
>
 + if 
 (!instr_is_relative_link_branch(ppc_inst(*instruction)))
 + continue;
>
> Same here. It looks super tricky and there is no explanation.

 The two checks are from restore_r2(). But I cannot really remember
 why we needed them. It is probably an updated version from an earlier
 version (3 year earlier..).
>>>
>>> This is a good sign that it has to be explained in a comment.
>>> Or even better, it should not by copy pasted.
>>>
 + instruction += 1;
 + patch_instruction(instruction, ppc_inst(PPC_RAW_NOP()));
>>>
>>> I believe that this is not enough. apply_relocate_add() does this:
>>>
>>> int apply_relocate_add(Elf64_Shdr *sechdrs,
>>> [...]
>>>struct module *me)
>>> {
>>> [...]
>>> case R_PPC_REL24:
>>> /* FIXME: Handle weak symbols here --RR */
>>> if (sym->st_shndx == SHN_UNDEF ||
>>> sym->st_shndx == SHN_LIVEPATCH) {
>>> [...]
>>> if (!restore_r2(strtab + sym->st_name,
>>> (u32 *)location + 
>>> 1, me))
>>> [...]   return -ENOEXEC;
>>>
>>> --->if (patch_instruction((u32 *)location, 
>>> ppc_inst(value)))
>>> return -EFAULT;
>>>
>>> , where restore_r2() does:
>>>
>>> static int restore_r2(const char *name, u32 *instruction, struct module *me)
>>> {
>>> [...]
>>> /* ld r2,R2_STACK_OFFSET(r1) */
>>> --->if (patch_instruction(instruction, ppc_inst(PPC_INST_LD_TOC)))
>>> return 0;
>>> [...]
>>> }
>>>
>>> By other words, apply_relocate_add() modifies two instructions:
>>>
>>>+ patch_instruction() called in restore_r2() writes into "location + 1"
>>>+ patch_instruction() called in apply_relocate_add() writes into 
>>> "location"
>>>
>>> IMHO, we have to clear both.
>>>
>>> IMHO, we need to implement a function that reverts the changes done
>>> in restore_r2(). Also we need to revert the changes done in
>>> apply_relocate_add().
>>
>> I finally got time to read all the details again and recalled what
>> happened with the code.
>>
>> The failure happens when we
>> 1) call apply_relocate_add() on klp load (or module first load,
>>if klp was loaded first);
>> 2) do nothing when the module is unloaded;
>> 3) call apply_relocate_add() on module reload, which failed.
>>
>> The failure happens at this check in restore_r2():
>>
>> if (*instruction != PPC_RAW_NOP()) {
>> pr_err("%s: Expected nop after call, got %08x at %pS\n",
>> me->name, *instruction, instruction);
>> return 0;
>> }
>>
>> Therefore, apply_relocate_add only fails when "location + 1"
>> is not NOP. And to make it not fail, we only need to write NOP to
>> "location + 1" in clear_relocate_add().
> 
> Yes, this should be enough to pass the existing check.
> 
>> IIUC, you want clear_relocate_add() to undo everything we did
>> in apply_relocate_add(); while I was writing clear_relocate_add()
>> to make the next apply_relocate_add() not fail.
>>
>> I agree that, based on the name, clear_relocate_add() should
>> undo everything by apply_relocate_add(). But I am not sure how
>> to handle some cases. For example, how do we undo
>>
>> case R_PPC64_ADDR32:
>> /* Simply set it */
>> *(u32 *)location = value;
>>break;
>>
>> Shall we just write zeros? I don't think this matters.
> 
> I guess that it would be zeros as we do in x86_64.
> 
> 
>> I think this is the question we should answer first:

Re: [PATCH] [BACKPORT FOR 4.14] libtraceevent: Fix build with binutils 2.35

2022-12-13 Thread Segher Boessenkool
On Tue, Dec 13, 2022 at 07:03:07PM +0100, Christophe Leroy wrote:
> In binutils 2.35, 'nm -D' changed to show symbol versions along with
> symbol names, with the usual @@ separator.

2.37 instead?  And --without-symbol-versions is there to restore the
old behaviour.  The script is parsing the output already so this patch
is simpler maybe, but :-)


Segher


Re: powerpc-part: was: Re: [PATCH v6] livepatch: Clear relocation targets on a module removal

2022-12-13 Thread Song Liu
On Fri, Dec 9, 2022 at 3:41 AM Petr Mladek  wrote:
>
> Hi,
>
> this reply is only about the powerpc-specific part.
>
> Also adding Kamalesh and Michael into Cc who worked on the related
> commit a443bf6e8a7674b86221f49 ("powerpc/modules: Add REL24 relocation
> support of livepatch symbols").
>
>
> On Mon 2022-11-28 17:57:06, Song Liu wrote:
> > On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek  wrote:
> > >
> > > > --- a/arch/powerpc/kernel/module_64.c
> > > > +++ b/arch/powerpc/kernel/module_64.c
>
> I put back the name of the modified file so that it is easier
> to know what changes we are talking about.
>
> [...]
> > > > +#ifdef CONFIG_LIVEPATCH
> > > > +void clear_relocate_add(Elf64_Shdr *sechdrs,
> > > > +const char *strtab,
> > > > +unsigned int symindex,
> > > > +unsigned int relsec,
> > > > +struct module *me)
> > > > +{
> > > > + unsigned int i;
> > > > + Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr;
> > > > + Elf64_Sym *sym;
> > > > + unsigned long *location;
> > > > + const char *symname;
> > > > + u32 *instruction;
> > > > +
> > > > + pr_debug("Clearing ADD relocate section %u to %u\n", relsec,
> > > > +  sechdrs[relsec].sh_info);
> > > > +
> > > > + for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rela); i++) {
> > > > + location = (void 
> > > > *)sechdrs[sechdrs[relsec].sh_info].sh_addr
> > > > + + rela[i].r_offset;
> > > > + sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
> > > > + + ELF64_R_SYM(rela[i].r_info);
> > > > + symname = me->core_kallsyms.strtab
> > > > + + sym->st_name;
>
> The above calculation is quite complex. It seems to be copied from
> apply_relocate_add(). If I maintained this code I would want to avoid
> the duplication. definitely.
>

Back to this one...

If we go with option 2 that clear_relocate_add() only does things
needed to make the next apply_relocate_add() succeed, we are
not likely to have one write_relocate_add(), which is shared by
apply_relocate_add() and clear_relocate_add(). To avoid
duplication, shall we have two helpers to calculate location and
sym? Or maybe one more to calculate symname? I personally
don't like such one liner helper with multiple arguments, such as

static unsigned long *get_location(Elf64_Shdr *sechdrs,
   unsigned int relsec, unsigned int idx)
{
Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr;

return (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
 + rela[idx].r_offset;
}

Then use it as
 location = get_location(sechdrs, relsec, i);

We also need get_sym(), which is similar to get_location.

We can probably pass in different arguments. But I don't find
any options that I like. I think duplicate some code is better in
this case. However, if you do think these helpers are better,
or have other suggestions, I won't insist further.

Thanks,
Song


[linux-next:master] BUILD REGRESSION 39ab32797f072eaf86b1faa7384ac73450684110

2022-12-13 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 39ab32797f072eaf86b1faa7384ac73450684110  Add linux-next specific 
files for 20221213

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202211142244.sgkxbwo2-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211150003.lkfys4he-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211242120.mzzvguln-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211282102.qur7hhrw-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202211301634.cejlltjp-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202212020520.0okmino3-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202212040713.rvney9e8-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202212051759.cev6fyhy-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202212111423.xqzdm0kt-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202212132047.wvoyepma-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202212132110.cve3hjb8-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

Documentation/networking/devlink/etas_es58x.rst: WARNING: document isn't 
included in any toctree
arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't 
find starting instruction
arch/powerpc/kernel/optprobes_head.o: warning: objtool: 
optprobe_template_end(): can't find starting instruction
drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn201/irq_service_dcn201.c:139:43:
 warning: unused variable 'dmub_outbox_irq_info_funcs' [-Wunused-const-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn201/irq_service_dcn201.c:40:20: 
warning: no previous prototype for 'to_dal_irq_source_dcn201' 
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/dc/irq/dcn201/irq_service_dcn201.c:40:20: 
warning: no previous prototype for function 'to_dal_irq_source_dcn201' 
[-Wmissing-prototypes]
drivers/regulator/tps65219-regulator.c:310:32: warning: parameter 'dev' set but 
not used [-Wunused-but-set-parameter]
drivers/regulator/tps65219-regulator.c:310:60: warning: parameter 'dev' set but 
not used [-Wunused-but-set-parameter]
drivers/regulator/tps65219-regulator.c:370:26: sparse:int
drivers/regulator/tps65219-regulator.c:370:26: sparse:struct regulator_dev 
*[assigned] rdev
drivers/regulator/tps65219-regulator.c:370:26: warning: ordered comparison of 
pointer with integer zero [-Wextra]
include/linux/compiler_types.h:357:45: error: call to 
'__compiletime_assert_270' declared with attribute error: BUILD_BUG_ON failed: 
sizeof(priv_tbl->probs) % 16
net/netfilter/ipvs/ip_vs_est.c:688:17: sparse:signed long long *
net/netfilter/ipvs/ip_vs_est.c:688:17: sparse:unsigned long long [usertype] 
*

Unverified Error/Warning (likely false positive, please contact us if 
interested):

drivers/i2c/busses/i2c-qcom-geni.c:1028:28: sparse: sparse: symbol 
'i2c_master_hub' was not declared. Should it be static?
drivers/iio/adc/twl6030-gpadc.c:955:16-23: duplicated argument to & or |
drivers/iio/light/tsl2563.c:768:8-33: WARNING: Threaded IRQ with no primary 
handler requested without IRQF_ONESHOT (unless it is nested IRQ)
fs/xfs/xfs_iomap.c:86:29: sparse: sparse: symbol 'xfs_iomap_page_ops' was not 
declared. Should it be static?

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|   |-- 
drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- 
drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- arc-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|   |-- 
drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- 
drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- arm-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|   |-- 
drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- 
drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- arm-randconfig-s032-20221213
|   `-- 
include-linux-compiler_types.h:error:call-to-__compiletime_assert_NNN-declared-with-attribute-error:BUILD_BUG_ON-failed:sizeof(priv_tbl-probs)
|-- arm64-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-irq-dcn201-irq_service_dcn201.c:warning:no-previous-prototype-for-to_dal_irq_source_dcn201
|   |-- 
drivers-regulator-tps65219-regulator.c:warning:ordered-comparison-of-pointer-with-integer-zero
|   `-- 
drivers-regulator-tps65219-regulator.c:warning:parameter-dev-set-but-not-used
|-- i386-allyesconfig
|   |-- 
drivers-gpu-drm-amd

[PATCH] [BACKPORT FOR 4.14] libtraceevent: Fix build with binutils 2.35

2022-12-13 Thread Christophe Leroy
From: Ben Hutchings 

[upstream commit 39efdd94e314336f4acbac4c07e0f37bdc3bef71]

In binutils 2.35, 'nm -D' changed to show symbol versions along with
symbol names, with the usual @@ separator.  When generating
libtraceevent-dynamic-list we need just the names, so strip off the
version suffix if present.

Signed-off-by: Ben Hutchings 
Tested-by: Salvatore Bonaccorso 
Reviewed-by: Steven Rostedt 
Cc: linux-trace-de...@vger.kernel.org
Cc: sta...@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo 
Signed-off-by: Christophe Leroy 
---
 tools/lib/traceevent/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index a0ac01c647f5..2d6989f8a87c 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -263,7 +263,7 @@ define do_generate_dynamic_list_file
xargs echo "U w W" | tr 'w ' 'W\n' | sort -u | xargs echo`;\
if [ "$$symbol_type" = "U W" ];then \
(echo '{';  \
-   $(NM) -u -D $1 | awk 'NF>1 {print "\t"$$2";"}' | sort -u;\
+   $(NM) -u -D $1 | awk 'NF>1 {sub("@.*", "", $$2); print 
"\t"$$2";"}' | sort -u;\
echo '};';  \
) > $2; \
else\
-- 
2.38.1



Re: [PATCH v3 0/8] Generic IPI sending tracepoint

2022-12-13 Thread Palmer Dabbelt

On Fri, 02 Dec 2022 07:58:09 PST (-0800), vschn...@redhat.com wrote:

Background
==

Detecting IPI *reception* is relatively easy, e.g. using
trace_irq_handler_{entry,exit} or even just function-trace
flush_smp_call_function_queue() for SMP calls.

Figuring out their *origin*, is trickier as there is no generic tracepoint tied
to e.g. smp_call_function():

o AFAIA x86 has no tracepoint tied to sending IPIs, only receiving them
  (cf. trace_call_function{_single}_entry()).
o arm/arm64 do have trace_ipi_raise(), which gives us the target cpus but also a
  mostly useless string (smp_calls will all be "Function call interrupts").
o Other architectures don't seem to have any IPI-sending related tracepoint.

I believe one reason those tracepoints used by arm/arm64 ended up as they were
is because these archs used to handle IPIs differently from regular interrupts
(the IRQ driver would directly invoke an IPI-handling routine), which meant they
never showed up in trace_irq_handler_{entry, exit}. The trace_ipi_{entry,exit}
tracepoints gave a way to trace IPI reception but those have become redundant as
of:

  56afcd3dbd19 ("ARM: Allow IPIs to be handled as normal interrupts")
  d3afc7f12987 ("arm64: Allow IPIs to be handled as normal interrupts")

which gave IPIs a "proper" handler function used through
generic_handle_domain_irq(), which makes them show up via
trace_irq_handler_{entry, exit}.

Changing stuff up
=

Per the above, it would make sense to reshuffle trace_ipi_raise() and move it
into generic code. This also came up during Daniel's talk on Osnoise at the CPU
isolation MC of LPC 2022 [1].

Now, to be useful, such a tracepoint needs to export:
o targeted CPU(s)
o calling context

The only way to get the calling context with trace_ipi_raise() is to trigger a
stack dump, e.g. $(trace-cmd -e ipi* -T echo 42).

This is instead introducing a new tracepoint which exports the relevant context
(callsite, and requested callback for when the callsite isn't helpful), and is
usable by all architectures as it sits in generic code.

Another thing worth mentioning is that depending on the callsite, the _RET_IP_
fed to the tracepoint is not always useful - generic_exec_single() doesn't tell
you much about the actual callback being sent via IPI, which is why the new
tracepoint also has a @callback argument.

Patches
===

o Patch 1 is included for convenience and will be merged independently. FYI I
  have libtraceevent patches [2] to improve the
  pretty-printing of cpumasks using the new type, which look like:
  <...>-3322  [021]   560.402583: ipi_send_cpumask: cpumask=14,17,21 
callsite=on_each_cpu_cond_mask+0x40 callback=flush_tlb_func+0x0
  <...>-187   [010]   562.590584: ipi_send_cpumask: cpumask=0-23 
callsite=on_each_cpu_cond_mask+0x40 callback=do_sync_core+0x0

o Patches 2-6 spread out the tracepoint across relevant sites.
  Patch 6 ends up sprinkling lots of #include  which I'm not
  the biggest fan of, but is the least horrible solution I've been able to come
  up with so far.

o Patch 8 is trying to be smart about tracing the callback associated with the
  IPI.

This results in having IPI trace events for:

o smp_call_function*()
o smp_send_reschedule()
o irq_work_queue*()
o standalone uses of __smp_call_single_queue()

This is incomplete, just looking at arm64 there's more IPI types that aren't
covered:

  IPI_CPU_STOP,
  IPI_CPU_CRASH_STOP,
  IPI_TIMER,
  IPI_WAKEUP,

... But it feels like a good starting point.

Links
=

[1]: https://youtu.be/5gT57y4OzBM?t=14234
[2]: https://lore.kernel.org/all/20221116144154.3662923-1-vschn...@redhat.com/

Revisions
=

v2 -> v3


o Dropped the generic export of smp_send_reschedule(), turned it into a macro
  and a bunch of imports
o Dropped the send_call_function_single_ipi() macro madness, split it into sched
  and smp bits using some of Peter's suggestions

v1 -> v2


o Ditched single-CPU tracepoint
o Changed tracepoint signature to include callback
o Changed tracepoint callsite field to void *; the parameter is still UL to save
  up on casts due to using _RET_IP_.
o Fixed linking failures due to not exporting smp_send_reschedule()

Steven Rostedt (Google) (1):
  tracing: Add __cpumask to denote a trace event field that is a
cpumask_t

Valentin Schneider (7):
  trace: Add trace_ipi_send_cpumask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  treewide: Trace IPIs sent via smp_send_reschedule()
  smp: reword smp call IPI comment
  sched, smp: Trace smp callback causing an IPI

 arch/alpha/kernel/smp.c  |  2 +-
 arch/arc/kernel/smp.c|  2 +-
 arch/arm/kernel/smp.c|  5 +-
 arch/arm/mach-actions/platsmp.c  |  2 +
 arch/arm64/kernel/smp.c  |  3 +-
 

Re: sched/debug: CPU hotplug operation suffers in a large cpu systems

2022-12-13 Thread Phil Auld
On Tue, Dec 13, 2022 at 03:31:06PM +0100 Greg Kroah-Hartman wrote:
> On Tue, Dec 13, 2022 at 08:22:58AM -0500, Phil Auld wrote:
> > On Tue, Dec 13, 2022 at 07:23:54AM +0100 Greg Kroah-Hartman wrote:
> > > On Mon, Dec 12, 2022 at 02:17:58PM -0500, Phil Auld wrote:
> > > > Hi,
> > > > 
> > > > On Tue, Nov 08, 2022 at 01:24:39PM +0100 Greg Kroah-Hartman wrote:
> > > > > On Tue, Nov 08, 2022 at 03:30:46PM +0530, Vishal Chourasia wrote:
> > > > > > 
> > > > > > Thanks Greg & Peter for your direction. 
> > > > > > 
> > > > > > While we pursue the idea of having debugfs based on kernfs, we 
> > > > > > thought about
> > > > > > having a boot time parameter which would disable creating and 
> > > > > > updating of the
> > > > > > sched_domain debugfs files and this would also be useful even when 
> > > > > > the kernfs
> > > > > > solution kicks in, as users who may not care about these debugfs 
> > > > > > files would
> > > > > > benefit from a faster CPU hotplug operation.
> > > > > 
> > > > > Ick, no, you would be adding a new user/kernel api that you will be
> > > > > required to support for the next 20+ years.  Just to get over a
> > > > > short-term issue before you solve the problem properly.
> > > > 
> > > > I'm not convinced moving these files from debugfs to kernfs is the right
> > > > fix.  That will take it from ~50 back to ~20 _minutes_ on these systems.
> > > > I don't think either of those numbers is reasonable.
> > > > 
> > > > The issue as I see it is the full rebuild for every change with no way 
> > > > to
> > > > batch the changes. How about something like the below?
> > > > 
> > > > This puts the domains/* files under the sched_verbose flag. About the 
> > > > only
> > > > thing under that flag now are the detailed topology discovery printks 
> > > > anyway
> > > > so this fits together nicely.
> > > > 
> > > > This way the files would be off by default (assuming you don't boot with
> > > > sched_verbose) and can be created at runtime by enabling verbose. 
> > > > Multiple
> > > > changes could also be batched by disabling/makeing changes/re-enabling.
> > > > 
> > > > It does not create a new API, uses one that is already there.
> > > 
> > > The idea seems good, the implementation might need a bit of work :)
> > 
> > More than the one comment below? Let me know.
> 
> No idea, resubmit a working patch and I'll review it properly :)
>

Will do. 


Thanks,
Phil


-- 



Re: [PATCH v6] livepatch: Clear relocation targets on a module removal

2022-12-13 Thread Petr Mladek
On Tue 2022-12-13 00:28:34, Song Liu wrote:
> On Fri, Dec 9, 2022 at 4:55 AM Miroslav Benes  wrote:
> >
> > Hi,
> >
> > first thank you for taking over and I also appologize for not replying
> > much sooner.
> >
> > On Thu, 1 Sep 2022, Song Liu wrote:
> >
> > > From: Miroslav Benes 
> > >
> > > Josh reported a bug:
> > >
> > >   When the object to be patched is a module, and that module is
> > >   rmmod'ed and reloaded, it fails to load with:
> > >
> > >   module: x86/modules: Skipping invalid relocation target, existing value 
> > > is nonzero for type 2, loc ba0302e9, val a03e293c
> > >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 
> > > 'nfsd' (-8)
> > >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to 
> > > load module 'nfsd'
> > >
> > >   The livepatch module has a relocation which references a symbol
> > >   in the _previous_ loading of nfsd. When apply_relocate_add()
> > >   tries to replace the old relocation with a new one, it sees that
> > >   the previous one is nonzero and it errors out.
> > >
> > >   On ppc64le, we have a similar issue:
> > >
> > >   module_64: livepatch_nfsd: Expected nop after call, got e8410018 at 
> > > e_show+0x60/0x548 [livepatch_nfsd]
> > >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 
> > > 'nfsd' (-8)
> > >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to 
> > > load module 'nfsd'
> > >
> > > He also proposed three different solutions. We could remove the error
> > > check in apply_relocate_add() introduced by commit eda9cec4c9a1
> > > ("x86/module: Detect and skip invalid relocations"). However the check
> > > is useful for detecting corrupted modules.
> > >
> > > We could also deny the patched modules to be removed. If it proved to be
> > > a major drawback for users, we could still implement a different
> > > approach. The solution would also complicate the existing code a lot.
> > >
> > > We thus decided to reverse the relocation patching (clear all relocation
> > > targets on x86_64). The solution is not
> > > universal and is too much arch-specific, but it may prove to be simpler
> > > in the end.
> > >
> > > Reported-by: Josh Poimboeuf 
> > > Signed-off-by: Miroslav Benes 
> > > Signed-off-by: Song Liu 
> >
> > Petr has commented on the code aspects. I will just add that s390x was not
> > dealt with at the time because there was no live patching support for
> > s390x back then if I remember correctly and my notes do not lie. The same
> > applies to powerpc32. I think that both should be fixed as well with this
> > patch. It might also help to clean up the ifdeffery in the patch a bit.
> 
> After reading the code (no testing), I think we don't need any logic for
> ppc32 and s390.
> 
> We need clear_relocate_add() to handle module reload failure.
> 
> I don't think we have similar checks for ppc32 and s390, so
> clear_relocate_add() is not needed for the two.
> 
> OTOH, we can argue that clear_relocate_add() should undo
> everything apply_relocate_add() did. But I do think that
> will be an overkill.

It is true that we do not need to clear the relocations if the values
are not checked in apply_relocated_add().

I do not have strong opinion whether we should do it or not.

One one hand, the clearing code might introduce a bug if it modifies
some wrong location. So, it might do more harm then good.

One the other hand, it feels bad when a code is jumping to a
non-existing address. I know, nobody should call this code.
But it is still a kind of a security hole.

Well, I think that we could keep the clearing functions empty
on ppc32 and s390 in this patch(set). It won't be worse than
it is now. And perfection is the enemy of good.

Best Regards,
Petr


Re: sched/debug: CPU hotplug operation suffers in a large cpu systems

2022-12-13 Thread Greg Kroah-Hartman
On Tue, Dec 13, 2022 at 08:22:58AM -0500, Phil Auld wrote:
> On Tue, Dec 13, 2022 at 07:23:54AM +0100 Greg Kroah-Hartman wrote:
> > On Mon, Dec 12, 2022 at 02:17:58PM -0500, Phil Auld wrote:
> > > Hi,
> > > 
> > > On Tue, Nov 08, 2022 at 01:24:39PM +0100 Greg Kroah-Hartman wrote:
> > > > On Tue, Nov 08, 2022 at 03:30:46PM +0530, Vishal Chourasia wrote:
> > > > > 
> > > > > Thanks Greg & Peter for your direction. 
> > > > > 
> > > > > While we pursue the idea of having debugfs based on kernfs, we 
> > > > > thought about
> > > > > having a boot time parameter which would disable creating and 
> > > > > updating of the
> > > > > sched_domain debugfs files and this would also be useful even when 
> > > > > the kernfs
> > > > > solution kicks in, as users who may not care about these debugfs 
> > > > > files would
> > > > > benefit from a faster CPU hotplug operation.
> > > > 
> > > > Ick, no, you would be adding a new user/kernel api that you will be
> > > > required to support for the next 20+ years.  Just to get over a
> > > > short-term issue before you solve the problem properly.
> > > 
> > > I'm not convinced moving these files from debugfs to kernfs is the right
> > > fix.  That will take it from ~50 back to ~20 _minutes_ on these systems.
> > > I don't think either of those numbers is reasonable.
> > > 
> > > The issue as I see it is the full rebuild for every change with no way to
> > > batch the changes. How about something like the below?
> > > 
> > > This puts the domains/* files under the sched_verbose flag. About the only
> > > thing under that flag now are the detailed topology discovery printks 
> > > anyway
> > > so this fits together nicely.
> > > 
> > > This way the files would be off by default (assuming you don't boot with
> > > sched_verbose) and can be created at runtime by enabling verbose. Multiple
> > > changes could also be batched by disabling/makeing changes/re-enabling.
> > > 
> > > It does not create a new API, uses one that is already there.
> > 
> > The idea seems good, the implementation might need a bit of work :)
> 
> More than the one comment below? Let me know.

No idea, resubmit a working patch and I'll review it properly :)

> > > + r = kstrtobool_from_user(ubuf, cnt, );
> > > + if (!r) {
> > > + mutex_lock(_domains_mutex);
> > > + r = debugfs_file_get(dentry);
> > > + if (unlikely(r))
> > > + return r;
> > > + sched_debug_verbose = bv;
> > > + debugfs_file_put(dentry);
> > 
> > Why the get/put of the debugfs dentry? for just this single value?
> 
> That's what debugfs_file_write_bool() does, which is where I got that since
> that's really what this is doing. I couldn't see a good way to make this
> just call that.
> 
> I suppose the get/put may not be needed since the only way this should
> go away is under that mutex too.

Yes, it should not be needed.

> ... erm, yeah, that return is a problem ... I'll fix that.
> 
> Also, this was originally on v6.1-rc7. I can rebase when I repost but I
> didn't want to do it on a random commit so I picked (at the time) the latest
> tag.  Should I just use the head of Linux? 

Yes, or linux-next.

thanks,

greg k-h


Re: [PATCH v1 06/10] powerpc/bpf: Perform complete extra passes to update addresses

2022-12-13 Thread Naveen N. Rao

Christophe Leroy wrote:

BPF core calls the jit compiler again for an extra pass in order
to properly set subprog addresses.

Unlike other architectures, powerpc only updates the addresses
during that extra pass. It means that holes must have been left
in the code in order to enable the maximum possible instruction
size.

In order avoid waste of space, and waste of CPU time on powerpc
processors on which the NOP instruction is not 0-cycle, perform
two real additional passes.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/net/bpf_jit_comp.c | 85 -
 1 file changed, 85 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 43e634126514..8833bf23f5aa 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -23,74 +23,6 @@ static void bpf_jit_fill_ill_insns(void *area, unsigned int 
size)
memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
 }
 
-/* Fix updated addresses (for subprog calls, ldimm64, et al) during extra pass */

-static int bpf_jit_fixup_addresses(struct bpf_prog *fp, u32 *image,
-  struct codegen_context *ctx, u32 *addrs)
-{
-   const struct bpf_insn *insn = fp->insnsi;
-   bool func_addr_fixed;
-   u64 func_addr;
-   u32 tmp_idx;
-   int i, j, ret;
-
-   for (i = 0; i < fp->len; i++) {
-   /*
-* During the extra pass, only the branch target addresses for
-* the subprog calls need to be fixed. All other instructions
-* can left untouched.
-*
-* The JITed image length does not change because we already
-* ensure that the JITed instruction sequence for these calls
-* are of fixed length by padding them with NOPs.
-*/
-   if (insn[i].code == (BPF_JMP | BPF_CALL) &&
-   insn[i].src_reg == BPF_PSEUDO_CALL) {
-   ret = bpf_jit_get_func_addr(fp, [i], true,
-   _addr,
-   _addr_fixed);


I don't see you updating calls to bpf_jit_get_func_addr() in 
bpf_jit_build_body() to set extra_pass to true. Afaics, that's required 
to get the correct address to be branched to for subprogs.



- Naveen



Re: powerpc-part: was: Re: [PATCH v6] livepatch: Clear relocation targets on a module removal

2022-12-13 Thread Petr Mladek
On Tue 2022-12-13 00:13:46, Song Liu wrote:
> )() ()On Mon, Dec 12, 2022 at 9:12 AM Petr Mladek  wrote:
> >
> > On Fri 2022-12-09 11:59:35, Song Liu wrote:
> > > On Fri, Dec 9, 2022 at 3:41 AM Petr Mladek  wrote:
> > > > On Mon 2022-11-28 17:57:06, Song Liu wrote:
> > > > > On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek  wrote:
> > > > > >
> > > > > > > --- a/arch/powerpc/kernel/module_64.c
> > > > > > > +++ b/arch/powerpc/kernel/module_64.c
> > > > > > > +#ifdef CONFIG_LIVEPATCH
> > > > > > > +void clear_relocate_add(Elf64_Shdr *sechdrs,
> > > > > > > +const char *strtab,
> > > > > > > +unsigned int symindex,
> > > > > > > +unsigned int relsec,
> > > > > > > +struct module *me)
> > > > > > > +{
> >
> > [...]
> >
> > > > > > > +
> > > > > > > + instruction = (u32 *)location;
> > > > > > > + if (is_mprofile_ftrace_call(symname))
> > > > > > > + continue;
> > > >
> > > > Why do we ignore these symbols?
> > > >
> > > > I can't find any counter-part in apply_relocate_add(). It looks super
> > > > tricky. It would deserve a comment.
> > > >
> > > > And I have no idea how we could maintain these exceptions.
> > > >
> > > > > > > + if 
> > > > > > > (!instr_is_relative_link_branch(ppc_inst(*instruction)))
> > > > > > > + continue;
> > > >
> > > > Same here. It looks super tricky and there is no explanation.
> > >
> > > The two checks are from restore_r2(). But I cannot really remember
> > > why we needed them. It is probably an updated version from an earlier
> > > version (3 year earlier..).
> >
> > This is a good sign that it has to be explained in a comment.
> > Or even better, it should not by copy pasted.
> >
> > > > > > > + instruction += 1;
> > > > > > > + patch_instruction(instruction, 
> > > > > > > ppc_inst(PPC_RAW_NOP()));
> >
> > I believe that this is not enough. apply_relocate_add() does this:
> >
> > int apply_relocate_add(Elf64_Shdr *sechdrs,
> > [...]
> >struct module *me)
> > {
> > [...]
> > case R_PPC_REL24:
> > /* FIXME: Handle weak symbols here --RR */
> > if (sym->st_shndx == SHN_UNDEF ||
> > sym->st_shndx == SHN_LIVEPATCH) {
> > [...]
> > if (!restore_r2(strtab + sym->st_name,
> > (u32 *)location + 
> > 1, me))
> > [...]   return -ENOEXEC;
> >
> > --->if (patch_instruction((u32 *)location, 
> > ppc_inst(value)))
> > return -EFAULT;
> >
> > , where restore_r2() does:
> >
> > static int restore_r2(const char *name, u32 *instruction, struct module *me)
> > {
> > [...]
> > /* ld r2,R2_STACK_OFFSET(r1) */
> > --->if (patch_instruction(instruction, ppc_inst(PPC_INST_LD_TOC)))
> > return 0;
> > [...]
> > }
> >
> > By other words, apply_relocate_add() modifies two instructions:
> >
> >+ patch_instruction() called in restore_r2() writes into "location + 1"
> >+ patch_instruction() called in apply_relocate_add() writes into 
> > "location"
> >
> > IMHO, we have to clear both.
> >
> > IMHO, we need to implement a function that reverts the changes done
> > in restore_r2(). Also we need to revert the changes done in
> > apply_relocate_add().
> 
> I finally got time to read all the details again and recalled what
> happened with the code.
> 
> The failure happens when we
> 1) call apply_relocate_add() on klp load (or module first load,
>if klp was loaded first);
> 2) do nothing when the module is unloaded;
> 3) call apply_relocate_add() on module reload, which failed.
> 
> The failure happens at this check in restore_r2():
> 
> if (*instruction != PPC_RAW_NOP()) {
> pr_err("%s: Expected nop after call, got %08x at %pS\n",
> me->name, *instruction, instruction);
> return 0;
> }
> 
> Therefore, apply_relocate_add only fails when "location + 1"
> is not NOP. And to make it not fail, we only need to write NOP to
> "location + 1" in clear_relocate_add().

Yes, this should be enough to pass the existing check.

> IIUC, you want clear_relocate_add() to undo everything we did
> in apply_relocate_add(); while I was writing clear_relocate_add()
> to make the next apply_relocate_add() not fail.
> 
> I agree that, based on the name, clear_relocate_add() should
> undo everything by apply_relocate_add(). But I am not sure how
> to handle some cases. For example, how do we undo
> 
> case R_PPC64_ADDR32:
> /* Simply set it */
> *(u32 *)location = value;
>break;
>
> Shall we just write zeros? I don't think this matters.

I guess that it would be zeros as we do in x86_64.


> 

Re: sched/debug: CPU hotplug operation suffers in a large cpu systems

2022-12-13 Thread Phil Auld
On Tue, Dec 13, 2022 at 07:23:54AM +0100 Greg Kroah-Hartman wrote:
> On Mon, Dec 12, 2022 at 02:17:58PM -0500, Phil Auld wrote:
> > Hi,
> > 
> > On Tue, Nov 08, 2022 at 01:24:39PM +0100 Greg Kroah-Hartman wrote:
> > > On Tue, Nov 08, 2022 at 03:30:46PM +0530, Vishal Chourasia wrote:
> > > > 
> > > > Thanks Greg & Peter for your direction. 
> > > > 
> > > > While we pursue the idea of having debugfs based on kernfs, we thought 
> > > > about
> > > > having a boot time parameter which would disable creating and updating 
> > > > of the
> > > > sched_domain debugfs files and this would also be useful even when the 
> > > > kernfs
> > > > solution kicks in, as users who may not care about these debugfs files 
> > > > would
> > > > benefit from a faster CPU hotplug operation.
> > > 
> > > Ick, no, you would be adding a new user/kernel api that you will be
> > > required to support for the next 20+ years.  Just to get over a
> > > short-term issue before you solve the problem properly.
> > 
> > I'm not convinced moving these files from debugfs to kernfs is the right
> > fix.  That will take it from ~50 back to ~20 _minutes_ on these systems.
> > I don't think either of those numbers is reasonable.
> > 
> > The issue as I see it is the full rebuild for every change with no way to
> > batch the changes. How about something like the below?
> > 
> > This puts the domains/* files under the sched_verbose flag. About the only
> > thing under that flag now are the detailed topology discovery printks anyway
> > so this fits together nicely.
> > 
> > This way the files would be off by default (assuming you don't boot with
> > sched_verbose) and can be created at runtime by enabling verbose. Multiple
> > changes could also be batched by disabling/makeing changes/re-enabling.
> > 
> > It does not create a new API, uses one that is already there.
> 
> The idea seems good, the implementation might need a bit of work :)

More than the one comment below? Let me know.

> 
> > > If you really do not want these debugfs files, just disable debugfs from
> > > your system.  That should be a better short-term solution, right?
> > 
> > We do find these files useful at times for debugging issue and looking
> > at what's going on on the system.
> > 
> > > 
> > > Or better yet, disable SCHED_DEBUG, why can't you do that?
> > 
> > Same with this... useful information with (modulo issues like this)
> > small cost. There are also tuning knobs that are only available
> > with SCHED_DEBUG. 
> > 
> > 
> > Cheers,
> > Phil
> > 
> > ---
> > 
> > sched/debug: Put sched/domains files under verbose flag
> > 
> > The debug files under sched/domains can take a long time to regenerate,
> > especially when updates are done one at a time. Move these files under
> > the verbose debug flag. Allow changes to verbose to trigger generation
> > of the files. This lets a user batch the updates but still have the
> > information available.  The detailed topology printk messages are also
> > under verbose.
> > 
> > Signed-off-by: Phil Auld 
> > ---
> >  kernel/sched/debug.c | 68 ++--
> >  1 file changed, 66 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> > index 1637b65ba07a..2eb51ee3ccab 100644
> > --- a/kernel/sched/debug.c
> > +++ b/kernel/sched/debug.c
> > @@ -280,6 +280,31 @@ static const struct file_operations sched_dynamic_fops 
> > = {
> >  
> >  __read_mostly bool sched_debug_verbose;
> >  
> > +static ssize_t sched_verbose_write(struct file *filp, const char __user 
> > *ubuf,
> > +  size_t cnt, loff_t *ppos);
> > +
> > +static int sched_verbose_show(struct seq_file *m, void *v)
> > +{
> > +   if (sched_debug_verbose)
> > +   seq_puts(m,"Y\n");
> > +   else
> > +   seq_puts(m,"N\n");
> > +   return 0;
> > +}
> > +
> > +static int sched_verbose_open(struct inode *inode, struct file *filp)
> > +{
> > +   return single_open(filp, sched_verbose_show, NULL);
> > +}
> > +
> > +static const struct file_operations sched_verbose_fops = {
> > +   .open   = sched_verbose_open,
> > +   .write  = sched_verbose_write,
> > +   .read   = seq_read,
> > +   .llseek = seq_lseek,
> > +   .release= seq_release,
> > +};
> > +
> >  static const struct seq_operations sched_debug_sops;
> >  
> >  static int sched_debug_open(struct inode *inode, struct file *filp)
> > @@ -303,7 +328,7 @@ static __init int sched_init_debug(void)
> > debugfs_sched = debugfs_create_dir("sched", NULL);
> >  
> > debugfs_create_file("features", 0644, debugfs_sched, NULL, 
> > _feat_fops);
> > -   debugfs_create_bool("verbose", 0644, debugfs_sched, 
> > _debug_verbose);
> > +   debugfs_create_file("verbose", 0644, debugfs_sched, NULL, 
> > _verbose_fops);
> >  #ifdef CONFIG_PREEMPT_DYNAMIC
> > debugfs_create_file("preempt", 0644, debugfs_sched, NULL, 
> > _dynamic_fops);
> >  #endif
> > @@ -402,15 +427,23 

Re: Mass-building defconfigs: many fail with assembler errors

2022-12-13 Thread Segher Boessenkool
On Tue, Dec 13, 2022 at 09:41:59AM +0100, Jan-Benedict Glaw wrote:
> On Tue, 2022-12-13 14:49:20 +1100, Michael Ellerman  
> wrote:
> > So your script should exclude all files that end in ".config".
> 
> Thanks!  Will just drop those.
> 
> > To find the names of the generated configs you can use something like:
> > 
> >  $ awk '/PHONY \+= .*config/ {print $3}' arch/powerpc/Makefile
> 
> ...and integrate these instead. Thanks a lot!

You can also pretend you are a simple user and use the targets
"make help" and "make help-boards" suggest :-)


Segher


[PATCH] [REBASED for 4.14] once: add DO_ONCE_SLOW() for sleepable contexts

2022-12-13 Thread Christophe Leroy
From: Eric Dumazet 

[ Upstream commit 62c07983bef9d3e78e71189441e1a470f0d1e653 ]

Christophe Leroy reported a ~80ms latency spike
happening at first TCP connect() time.

This is because __inet_hash_connect() uses get_random_once()
to populate a perturbation table which became quite big
after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")

get_random_once() uses DO_ONCE(), which block hard irqs for the duration
of the operation.

This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock
for operations where we prefer to stay in process context.

Then __inet_hash_connect() can use get_random_slow_once()
to populate its perturbation table.

Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")
Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time")
Reported-by: Christophe Leroy 
Link: 
https://lore.kernel.org/netdev/cann89ilaeybaoyajy0y9umgfff5gpxduog-ervb2jddrnq5...@mail.gmail.com/T/#t
Signed-off-by: Eric Dumazet 
Cc: Willy Tarreau 
Tested-by: Christophe Leroy 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
Signed-off-by: Christophe Leroy 
---
 include/linux/once.h   | 28 
 lib/once.c | 30 ++
 net/ipv4/inet_hashtables.c |  4 ++--
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/include/linux/once.h b/include/linux/once.h
index 6790884d3c57..bb091119b754 100644
--- a/include/linux/once.h
+++ b/include/linux/once.h
@@ -5,10 +5,18 @@
 #include 
 #include 
 
+/* Helpers used from arbitrary contexts.
+ * Hard irqs are blocked, be cautious.
+ */
 bool __do_once_start(bool *done, unsigned long *flags);
 void __do_once_done(bool *done, struct static_key *once_key,
unsigned long *flags);
 
+/* Variant for process contexts only. */
+bool __do_once_slow_start(bool *done);
+void __do_once_slow_done(bool *done, struct static_key *once_key,
+struct module *mod);
+
 /* Call a function exactly once. The idea of DO_ONCE() is to perform
  * a function call such as initialization of random seeds, etc, only
  * once, where DO_ONCE() can live in the fast-path. After @func has
@@ -52,9 +60,29 @@ void __do_once_done(bool *done, struct static_key *once_key,
___ret;  \
})
 
+/* Variant of DO_ONCE() for process/sleepable contexts. */
+#define DO_ONCE_SLOW(func, ...)
 \
+   ({   \
+   bool ___ret = false; \
+   static bool ___done = false; \
+   static struct static_key ___once_key = STATIC_KEY_INIT_TRUE; \
+   if (static_key_true(&___once_key)) { \
+   ___ret = __do_once_slow_start(&___done); \
+   if (unlikely(___ret)) {  \
+   func(__VA_ARGS__);   \
+   __do_once_slow_done(&___done, &___once_key,  \
+   THIS_MODULE);\
+   }\
+   }\
+   ___ret;  \
+   })
+
 #define get_random_once(buf, nbytes)\
DO_ONCE(get_random_bytes, (buf), (nbytes))
 #define get_random_once_wait(buf, nbytes)\
DO_ONCE(get_random_bytes_wait, (buf), (nbytes))  \
 
+#define get_random_slow_once(buf, nbytes)   \
+   DO_ONCE_SLOW(get_random_bytes, (buf), (nbytes))
+
 #endif /* _LINUX_ONCE_H */
diff --git a/lib/once.c b/lib/once.c
index bfb7420d0de3..76c7bbc0aa40 100644
--- a/lib/once.c
+++ b/lib/once.c
@@ -61,3 +61,33 @@ void __do_once_done(bool *done, struct static_key *once_key,
once_disable_jump(once_key);
 }
 EXPORT_SYMBOL(__do_once_done);
+
+static DEFINE_MUTEX(once_mutex);
+
+bool __do_once_slow_start(bool *done)
+   __acquires(once_mutex)
+{
+   mutex_lock(_mutex);
+   if (*done) {
+   mutex_unlock(_mutex);
+   /* Keep sparse happy by restoring an even lock count on
+* this mutex. In case we return here, we don't call into
+* __do_once_done but return early in the DO_ONCE_SLOW() macro.
+*/
+   __acquire(once_mutex);
+   return false;
+   }
+
+   return true;
+}
+EXPORT_SYMBOL(__do_once_slow_start);
+
+void __do_once_slow_done(bool *done, struct static_key *once_key,
+struct module *mod)
+   

Re: Mass-building defconfigs: many fail with assembler errors

2022-12-13 Thread Jan-Benedict Glaw
Hi Segher!

On Mon, 2022-12-12 18:26:13 -0600, Segher Boessenkool 
 wrote:
> On Mon, Dec 12, 2022 at 10:51:17PM +0100, Jan-Benedict Glaw wrote:
[...]
> > For PPC, a good number of those fail,
> > and I probably don't understand PPC well enough to propose patches. Or
> > did I pick wrongly targeted toolchains? Most of the time, my suspicion
> > is that we're not giving the correct -m flags in
> > ./arch/powerpc/boot/?  (My setup for doing test builds is fairly automated, 
> > I
> > can easily throw in patches for testing.)
> 
> Many of those use a 32-bit toolchain with a 64-bit kernel, or they
> require some e500 specific config but not getting it (or the other way
> around).
> 
> > 64-bit.config
> 
> > ==> Why "-m32 -mcpu=powerpc"? Binutils/GCC are for 
> > --target=powerpc64-linux
> 
> Something in your config is forcing that.

That's a configuration that should not have been built. As Michael
wrote, it's just a fragment for a defconfig, I queued it for a build
but that was plain wrong. (As well as other *.config files.)

> > Compiler ICEs (during GIMPLE pass: ccp) in align.c:
> > 
> >   powerpc-linux-gcc -Wp,-MMD,arch/powerpc/kernel/.align.o.d -nostdinc 
> > -I./arch/powerpc/include -I./arch/powerpc/include/generated  -I./include 
> > -I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi 
> > -I./include/uapi -I./include/generated/uapi -include 
> > ./include/linux/compiler-version.h -include ./include/linux/kconfig.h 
> > -include ./include/linux/compiler_types.h -D__KERNEL__ -I ./arch/powerpc 
> > -fmacro-prefix-map=./= -Wall -Wundef -Werror=strict-prototypes 
> > -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE 
> > -Werror=implicit-function-declaration -Werror=implicit-int 
> > -Werror=return-type -Wno-format-security -std=gnu11 -mbig-endian -m32 
> > -msoft-float -pipe -ffixed-r2 -mmultiple -mno-readonly-in-sdata -mcpu=440 
> > -mno-prefixed -mno-pcrel -mno-altivec -mno-vsx -mno-mma 
> > -fno-asynchronous-unwind-tables -mno-string -Wa,-m440 -mbig-endian 
> > -mstack-protector-guard=tls -mstack-protector-guard-reg=r2 
> > -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation 
> > -Wno-format-overflow -Wno-address-of-packed-member -O2 
> > -fno-allow-store-data-races -Wframe-larger-than=1024 
> > -fstack-protector-strong -Wno-main -Wno-unused-but-set-variable 
> > -Wno-unused-const-variable -Wno-dangling-pointer -fomit-frame-pointer 
> > -ftrivial-auto-var-init=zero -fno-stack-clash-protection 
> > -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wcast-function-type 
> > -Wno-stringop-truncation -Wno-stringop-overflow -Wno-restrict 
> > -Wno-maybe-uninitialized -Wno-alloc-size-larger-than 
> > -Wimplicit-fallthrough=5 -fno-strict-overflow -fno-stack-check 
> > -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types 
> > -Werror=designated-init -Wno-packed-not-aligned -g 
> > -mstack-protector-guard-offset=1080 -Werror
> > -DKBUILD_MODFILE='"arch/powerpc/kernel/align"' -DKBUILD_BASENAME='"align"' 
> > -DKBUILD_MODNAME='"align"' -D__KBUILD_MODNAME=kmod_align -c -o 
> > arch/powerpc/kernel/align.o arch/powerpc/kernel/align.c  
> > during GIMPLE pass: ccp
> > arch/powerpc/kernel/align.c: In function 
> > '__copy_inst_from_kernel_nofault':
> > arch/powerpc/kernel/align.c:364:1: internal compiler error: in 
> > maybe_register_def, at tree-into-ssa.cc:1948
> >   364 | }
> >   | ^
> > 0x19d8886 internal_error(char const*, ...)
> >???:0
> > 0x7bb4fe fancy_abort(char const*, int, char const*)
> >???:0
> > 0x1791bfe dom_walker::walk(basic_block_def*)
> >???:0
> > 0xe94ec0 update_ssa(unsigned int)
> >???:0
> > 0x103d6b9 execute_update_addresses_taken()
> >???:0
> > Please submit a full bug report, with preprocessed source (by using 
> > -freport-bug).
> > Please include the complete backtrace with any bug report.
> > See  for instructions.
> > make[3]: *** [scripts/Makefile.build:250: arch/powerpc/kernel/align.o] 
> > Error 1
> > make[2]: *** [scripts/Makefile.build:500: arch/powerpc/kernel] Error 2
> > make[1]: *** [scripts/Makefile.build:500: arch/powerpc] Error 2
> > make: *** [Makefile:1992: .] Error 2
> > 
> > ==> Should probably open a PR for this.
> 
> Yes please!

I'll wait until the current build loop finishes. Looking at
__copy_inst_from_kernel_nofault(), it uses an asm goto, which I had
issues with and there was already a fix for it
(7676235f690e624b7ed41a22b22ce8ccfac1492f,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107997) which might fix
all of those.

Thanks,
  Jan-Benedict

-- 


signature.asc
Description: PGP signature


Re: [PATCH] perf test bpf: Skip test if kernel-debuginfo is not present

2022-12-13 Thread Athira Rajeev



> On 13-Dec-2022, at 12:27 AM, Arnaldo Carvalho de Melo  wrote:
> 
> Em Thu, Nov 03, 2022 at 12:27:01PM +0530, Athira Rajeev escreveu:
>>> On 28-Oct-2022, at 9:12 PM, Kajol Jain  wrote:
>>> 
>>> Perf BPF filter test fails in environment where "kernel-debuginfo"
>>> is not installed.
>>> 
>>> Test failure logs:
>>> <<>>
>>> 42: BPF filter:
>>> 42.1: Basic BPF filtering : Ok
>>> 42.2: BPF pinning : Ok
>>> 42.3: BPF prologue generation : FAILED!
>>> <<>>
>>> 
>>> Enabling verbose option provided debug logs, which says debuginfo
>>> needs to be installed. Snippet of verbose logs:
>>> 
>>> <<>>
>>> 42.3: BPF prologue generation   :
>>> --- start ---
>>> test child forked, pid 28218
>>> <<>>
>>> Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo
>>> package.
>>> bpf_probe: failed to convert perf probe events
>>> Failed to add events selected by BPF
>>> test child finished with -1
>>>  end 
>>> BPF filter subtest 3: FAILED!
>>> <<>>
>>> 
>>> Here subtest, "BPF prologue generation" failed and
>>> logs shows debuginfo is needed. After installing
>>> kernel-debuginfo package, testcase passes.
>>> 
>>> Subtest "BPF prologue generation" failed because, the "do_test"
>>> function returns "TEST_FAIL" without checking the error type
>>> returned by "parse_events_load_bpf_obj" function.
>>> Function parse_events_load_bpf_obj can also return error of type
>>> "-ENOENT" incase kernel-debuginfo package is not installed. Fix this
>>> by adding check for -ENOENT error.
>>> 
>>> Test result after the patch changes:
>>> 
>>> Test failure logs:
>>> <<>>
>>> 42: BPF filter :
>>> 42.1: Basic BPF filtering  : Ok
>>> 42.2: BPF pinning  : Ok
>>> 42.3: BPF prologue generation  : Skip (clang/debuginfo isn't
>>> installed or environment missing BPF support)
>>> 
>>> Fixes: ba1fae431e74bb42 ("perf test: Add 'perf test BPF'")
>>> Signed-off-by: Kajol Jain 
>>> Reviewed-by: Madhavan Srinivasan 
>>> ---
>>> tools/perf/tests/bpf.c | 6 +-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
>>> index 17c023823713..57cecadc1da2 100644
>>> --- a/tools/perf/tests/bpf.c
>>> +++ b/tools/perf/tests/bpf.c
>>> @@ -126,6 +126,10 @@ static int do_test(struct bpf_object *obj, int 
>>> (*func)(void),
>>> 
>>> err = parse_events_load_bpf_obj(_state, _state.list, obj, 
>>> NULL);
>>> parse_events_error__exit(_error);
>>> +   if (err == -ENOENT) {
>>> +   pr_debug("Failed to add events selected by BPF, debuginfo 
>>> package not installed\n");
>>> +   return TEST_SKIP;
>>> +   }
>> 
>> Hi Kajol,
>> 
>> Here, you have used ENOENT to skip the test. But there could be other places 
>> in the code path for “parse_events_load_bpf_obj”
>> which also returns ENOENT. In that case, for any exit that returns ENOENT, 
>> test will get skipped.
>> 
>> Can we look at the logs, example we have this in commit logs:
>> 
>>  Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo
>>  package.
>> 
>> so as to decide whether to skip for debug info ?
> 
> Kajol?
> 
> - Arnaldo

Hi Arnaldo, looking for your suggestion on how to handle the case where 
debuginfo is missing.

Here the bpf test fails because of missing debuginfo. The function which goes 
through the debuginfo check is "parse_events_load_bpf_obj" . 
parse_events_load_bpf_obj internally calls "open_debuginfo" which returns 
ENOENT when debuginfo is missing. The patch fix from Kajol is to skip the test 
using error code ENOENT for debuginfo.

But issue with using this return code is that, there are other places in the 
code path for "parse_events_load_bpf_obj"
which also returns ENOENT. In that case, for any exit path that returns ENOENT, 
test will get skipped.
Hence looking for an alternative way to identify missing debuginfo to skip the 
test. Please share your thoughts on this.

Thanks
Athira


> 
>> Thanks
>> Athira
>> 
>>> if (err || list_empty(_state.list)) {
>>> pr_debug("Failed to add events selected by BPF\n");
>>> return TEST_FAIL;
>>> @@ -368,7 +372,7 @@ static struct test_case bpf_tests[] = {
>>> "clang isn't installed or environment missing BPF 
>>> support"),
>>> #ifdef HAVE_BPF_PROLOGUE
>>> TEST_CASE_REASON("BPF prologue generation", bpf_prologue_test,
>>> -   "clang isn't installed or environment missing BPF 
>>> support"),
>>> +   "clang/debuginfo isn't installed or environment missing 
>>> BPF support"),
>>> #else
>>> TEST_CASE_REASON("BPF prologue generation", bpf_prologue_test, "not 
>>> compiled in"),
>>> #endif
>>> -- 
>>> 2.31.1



Re: Mass-building defconfigs: many fail with assembler errors

2022-12-13 Thread Jan-Benedict Glaw
Hi Michael,

On Tue, 2022-12-13 14:49:20 +1100, Michael Ellerman  wrote:
> Jan-Benedict Glaw  writes:
> > Is anybody else routinely building current Binutils + GCC, to try to
> > build all the Linux defconfigs?
> 
> I did for several years, but eventually stopped because it was taking
> too much time I needed to spend on other things.

I've got one system at my hands to let it build stuff all day. So I'm
trying to extend that as far as possible.

> > For PPC, a good number of those fail,
> > and I probably don't understand PPC well enough to propose patches. Or
> > did I pick wrongly targeted toolchains? Most of the time, my suspicion
> > is that we're not giving the correct -m flags in
> > ./arch/powerpc/boot/?  (My setup for doing test builds is fairly automated, 
> > I
> > can easily throw in patches for testing.)
> 
> All the results against .config are invalid or at least
> dubious. Those files are not standalone defconfigs, they're fragments of
> defconfigs that are assembled together by arch/powerpc/Makefile using
> merge_config.sh.
> 
> So your script should exclude all files that end in ".config".

Thanks!  Will just drop those.

> To find the names of the generated configs you can use something like:
> 
>  $ awk '/PHONY \+= .*config/ {print $3}' arch/powerpc/Makefile

...and integrate these instead. Thanks a lot!

> > 64-bit.config
> >   powerpc64-linux-gcc -Wp,-MD,arch/powerpc/boot/.opal-calls.o.d 
> > -D__ASSEMBLY__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs 
> > -fno-strict-aliasing -O2 -msoft-float -mno-altivec -mno-vsx   -pipe 
> > -fomit-frame-pointer -fno-builtin -fPIC -nostdinc -I./arch/powerpc/include 
> > -I./arch/powerpc/include/generated  -I./include 
> > -I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi 
> > -I./include/uapi -I./include/generated/uapi -include 
> > ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -m32 
> > -mcpu=powerpc -isystem 
> > /var/lib/laminar/run/linux-powerpc-64-bit.config/12/toolchain/bin/../lib/gcc/powerpc64-linux/13.0.0/include
> >  -mbig-endian -nostdinc -c -o arch/powerpc/boot/opal-calls.o 
> > arch/powerpc/boot/opal-calls.S
> > arch/powerpc/boot/opal-calls.S: Assembler messages:
> > arch/powerpc/boot/opal-calls.S:20: Error: unrecognized opcode: `ld'
> > arch/powerpc/boot/opal-calls.S:21: Error: unrecognized opcode: `ld'
> > arch/powerpc/boot/opal-calls.S:32: Error: unrecognized opcode: `std'
> > arch/powerpc/boot/opal-calls.S:49: Error: unrecognized opcode: `ld'
> > arch/powerpc/boot/opal-calls.S:50: Error: unrecognized opcode: `ld'
> > arch/powerpc/boot/opal-calls.S:52: Error: unrecognized opcode: `hrfid'
> > arch/powerpc/boot/opal-calls.S:55: Error: unrecognized opcode: `tdi'
> > arch/powerpc/boot/opal-calls.S:58: Error: unrecognized opcode: `ld'
> > make[1]: *** [arch/powerpc/boot/Makefile:232: 
> > arch/powerpc/boot/opal-calls.o] Error 1
> > make: *** [arch/powerpc/Makefile:247: zImage] Error 2
> 
> ...
> 
> > bamboo_defconfig
> >   powerpc-linux-gcc -Wp,-MD,arch/powerpc/boot/.treeboot-akebono.o.d 
> > -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -O2 
> > -msoft-float -mno-altivec -mno-vsx   -pipe -fomit-frame-pointer 
> > -fno-builtin -fPIC -nostdinc -I./arch/powerpc/include 
> > -I./arch/powerpc/include/generated  -I./include 
> > -I./arch/powerpc/include/uapi -I./arch/powerpc/include/generated/uapi 
> > -I./include/uapi -I./include/generated/uapi -include 
> > ./include/linux/compiler-version.h -include ./include/linux/kconfig.h -m32 
> > -mcpu=powerpc -isystem 
> > /var/lib/laminar/run/linux-powerpc-bamboo_defconfig/12/toolchain/bin/../lib/gcc/powerpc-linux/13.0.0/include
> >  -mbig-endian -fno-stack-protector -include 
> > ./include/linux/compiler_attributes.h -I./arch/powerpc/boot 
> > -I./arch/powerpc/boot -mcpu=405 -c -o arch/powerpc/boot/treeboot-akebono.o 
> > arch/powerpc/boot/treeboot-akebono.c
> > {standard input}: Assembler messages:
> > {standard input}:94: Error: unrecognized opcode: `mtdcrx'
> > {standard input}:101: Error: unrecognized opcode: `mfdcrx'
> > {standard input}:107: Error: unrecognized opcode: `mtdcrx'
> > {standard input}:306: Error: unrecognized opcode: `mfdcrx'
> > make[1]: *** [arch/powerpc/boot/Makefile:229: 
> > arch/powerpc/boot/treeboot-akebono.o] Error 1
> > make: *** [arch/powerpc/Makefile:247: zImage] Error 2
> 
> Both treeboot-akebono.c and treeboot-currituck.c are for 476 so should
> probably be built with -mcpu=476. eg:
> 
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index d32d95aea5d6..acb6eddace8f 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -88,8 +88,8 @@ $(obj)/cuboot-taishan.o: BOOTCFLAGS += -mcpu=440
>  $(obj)/cuboot-katmai.o: BOOTCFLAGS += -mcpu=440
>  $(obj)/cuboot-acadia.o: BOOTCFLAGS += -mcpu=405
>  $(obj)/treeboot-iss4xx.o: BOOTCFLAGS += -mcpu=405
> -$(obj)/treeboot-currituck.o: 

Re: [PATCH v6] livepatch: Clear relocation targets on a module removal

2022-12-13 Thread Song Liu
On Fri, Dec 9, 2022 at 4:55 AM Miroslav Benes  wrote:
>
> Hi,
>
> first thank you for taking over and I also appologize for not replying
> much sooner.
>
> On Thu, 1 Sep 2022, Song Liu wrote:
>
> > From: Miroslav Benes 
> >
> > Josh reported a bug:
> >
> >   When the object to be patched is a module, and that module is
> >   rmmod'ed and reloaded, it fails to load with:
> >
> >   module: x86/modules: Skipping invalid relocation target, existing value 
> > is nonzero for type 2, loc ba0302e9, val a03e293c
> >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' 
> > (-8)
> >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to 
> > load module 'nfsd'
> >
> >   The livepatch module has a relocation which references a symbol
> >   in the _previous_ loading of nfsd. When apply_relocate_add()
> >   tries to replace the old relocation with a new one, it sees that
> >   the previous one is nonzero and it errors out.
> >
> >   On ppc64le, we have a similar issue:
> >
> >   module_64: livepatch_nfsd: Expected nop after call, got e8410018 at 
> > e_show+0x60/0x548 [livepatch_nfsd]
> >   livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' 
> > (-8)
> >   livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to 
> > load module 'nfsd'
> >
> > He also proposed three different solutions. We could remove the error
> > check in apply_relocate_add() introduced by commit eda9cec4c9a1
> > ("x86/module: Detect and skip invalid relocations"). However the check
> > is useful for detecting corrupted modules.
> >
> > We could also deny the patched modules to be removed. If it proved to be
> > a major drawback for users, we could still implement a different
> > approach. The solution would also complicate the existing code a lot.
> >
> > We thus decided to reverse the relocation patching (clear all relocation
> > targets on x86_64). The solution is not
> > universal and is too much arch-specific, but it may prove to be simpler
> > in the end.
> >
> > Reported-by: Josh Poimboeuf 
> > Signed-off-by: Miroslav Benes 
> > Signed-off-by: Song Liu 
>
> Petr has commented on the code aspects. I will just add that s390x was not
> dealt with at the time because there was no live patching support for
> s390x back then if I remember correctly and my notes do not lie. The same
> applies to powerpc32. I think that both should be fixed as well with this
> patch. It might also help to clean up the ifdeffery in the patch a bit.

After reading the code (no testing), I think we don't need any logic for
ppc32 and s390.

We need clear_relocate_add() to handle module reload failure.
The failure happens when we

1) call apply_relocate_add() on klp load (or module first load,
   if klp was loaded first);
2) do nothing when the module is unloaded;
3) call apply_relocate_add() on module reload, which failed.

This failure happens in the sanity check in
apply_relocate_add().

For x86, the check is something like:
if (*(s32 *)loc != 0)
goto invalid_relocation;

For ppc64, the check is in restore_r2():

if (*instruction != PPC_RAW_NOP()) {
pr_err("%s: Expected nop after call, got %08x at %pS\n",
me->name, *instruction, instruction);
return 0;
}

I don't think we have similar checks for ppc32 and s390, so
clear_relocate_add() is not needed for the two.

OTOH, we can argue that clear_relocate_add() should undo
everything apply_relocate_add() did. But I do think that
will be an overkill.

WDYT?

Thanks,
Song


Re: powerpc-part: was: Re: [PATCH v6] livepatch: Clear relocation targets on a module removal

2022-12-13 Thread Song Liu
)() ()On Mon, Dec 12, 2022 at 9:12 AM Petr Mladek  wrote:
>
> On Fri 2022-12-09 11:59:35, Song Liu wrote:
> > On Fri, Dec 9, 2022 at 3:41 AM Petr Mladek  wrote:
> > > On Mon 2022-11-28 17:57:06, Song Liu wrote:
> > > > On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek  wrote:
> > > > >
> > > > > > --- a/arch/powerpc/kernel/module_64.c
> > > > > > +++ b/arch/powerpc/kernel/module_64.c
> > > > > > +#ifdef CONFIG_LIVEPATCH
> > > > > > +void clear_relocate_add(Elf64_Shdr *sechdrs,
> > > > > > +const char *strtab,
> > > > > > +unsigned int symindex,
> > > > > > +unsigned int relsec,
> > > > > > +struct module *me)
> > > > > > +{
>
> [...]
>
> > > > > > +
> > > > > > + instruction = (u32 *)location;
> > > > > > + if (is_mprofile_ftrace_call(symname))
> > > > > > + continue;
> > >
> > > Why do we ignore these symbols?
> > >
> > > I can't find any counter-part in apply_relocate_add(). It looks super
> > > tricky. It would deserve a comment.
> > >
> > > And I have no idea how we could maintain these exceptions.
> > >
> > > > > > + if 
> > > > > > (!instr_is_relative_link_branch(ppc_inst(*instruction)))
> > > > > > + continue;
> > >
> > > Same here. It looks super tricky and there is no explanation.
> >
> > The two checks are from restore_r2(). But I cannot really remember
> > why we needed them. It is probably an updated version from an earlier
> > version (3 year earlier..).
>
> This is a good sign that it has to be explained in a comment.
> Or even better, it should not by copy pasted.
>
> > > > > > + instruction += 1;
> > > > > > + patch_instruction(instruction, 
> > > > > > ppc_inst(PPC_RAW_NOP()));
>
> I believe that this is not enough. apply_relocate_add() does this:
>
> int apply_relocate_add(Elf64_Shdr *sechdrs,
> [...]
>struct module *me)
> {
> [...]
> case R_PPC_REL24:
> /* FIXME: Handle weak symbols here --RR */
> if (sym->st_shndx == SHN_UNDEF ||
> sym->st_shndx == SHN_LIVEPATCH) {
> [...]
> if (!restore_r2(strtab + sym->st_name,
> (u32 *)location + 1, 
> me))
> [...]   return -ENOEXEC;
>
> --->if (patch_instruction((u32 *)location, 
> ppc_inst(value)))
> return -EFAULT;
>
> , where restore_r2() does:
>
> static int restore_r2(const char *name, u32 *instruction, struct module *me)
> {
> [...]
> /* ld r2,R2_STACK_OFFSET(r1) */
> --->if (patch_instruction(instruction, ppc_inst(PPC_INST_LD_TOC)))
> return 0;
> [...]
> }
>
> By other words, apply_relocate_add() modifies two instructions:
>
>+ patch_instruction() called in restore_r2() writes into "location + 1"
>+ patch_instruction() called in apply_relocate_add() writes into "location"
>
> IMHO, we have to clear both.
>
> IMHO, we need to implement a function that reverts the changes done
> in restore_r2(). Also we need to revert the changes done in
> apply_relocate_add().

I finally got time to read all the details again and recalled what
happened with the code.

The failure happens when we
1) call apply_relocate_add() on klp load (or module first load,
   if klp was loaded first);
2) do nothing when the module is unloaded;
3) call apply_relocate_add() on module reload, which failed.

The failure happens at this check in restore_r2():

if (*instruction != PPC_RAW_NOP()) {
pr_err("%s: Expected nop after call, got %08x at %pS\n",
me->name, *instruction, instruction);
return 0;
}

Therefore, apply_relocate_add only fails when "location + 1"
is not NOP. And to make it not fail, we only need to write NOP to
"location + 1" in clear_relocate_add().

IIUC, you want clear_relocate_add() to undo everything we did
in apply_relocate_add(); while I was writing clear_relocate_add()
to make the next apply_relocate_add() not fail.

I agree that, based on the name, clear_relocate_add() should
undo everything by apply_relocate_add(). But I am not sure how
to handle some cases. For example, how do we undo

case R_PPC64_ADDR32:
/* Simply set it */
*(u32 *)location = value;
   break;

Shall we just write zeros? I don't think this matters.

I think this is the question we should answer first:
What shall clear_relocate_add() do?
1) undo everything by apply_relocate_add();
2) only do things needed to make the next
   apply_relocate_add succeed;
3) something between 1) and 2).

WDYT?

Thanks,
Song

>
> Please, avoid code duplication as much as possible. Especially,
> the two checks is_mprofile_ftrace_call() and
> instr_is_relative_link_branch()