date:20220926

Re: [PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function

2022-09-26 Thread Christophe Leroy



On 27/09/2022 at 04:57, Benjamin Gray wrote:
> On Mon, 2022-09-26 at 14:33 +, Christophe Leroy wrote:
>>> +#define patch_memory(addr, val) \
>>> +({ \
>>> +   BUILD_BUG_ON(!__native_word(val)); \
>>> +   __patch_memory(addr, (unsigned long) val, sizeof(val)); \
>>> +})
>>
>> Can you do a static __always_inline function instead of a macro here
>> ?
> 
> I didn't before because it doesn't allow using the type as a parameter.
> I considered these forms
> 
>patch_memory(addr, val, 8);
>patch_memory(addr, val, void*);
>patch_memory(addr, val);  // size taken from val type
> 
> And thought the third was the nicest to use. Though coming back to
> this, I hadn't considered
> 
>patch_memory(addr, val, sizeof(void*))
> 
> which would still allow a type to decide the size, and not be a macro.
> I've got an example implementation further down that also addresses the
> size check issue.

Oh, I missed that you did automatic type sizing. Fair enough.

However I think taking the type of the passed value is dangerous.

See put_user(), it uses the size of the destination pointer, not the 
size of the input value.

patch_memory doesn't seem to be used outside of code-patching.c, so I 
don't think it is worth worrying about a nice-looking API. Just make it 
simple and pass the size to the function.
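Here is a hosted C sketch that makes the put_user() comparison concrete. It uses GNU C `__typeof__`, as the kernel does. `PATCH_BY_VALUE` and `PATCH_BY_DEST` are hypothetical names, and memcpy() stands in for the real patching path. When the size comes from the value, a plain int literal writes only 4 bytes into an 8-byte slot. When the size comes from the destination, as with put_user(), the whole slot is written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: size taken from the type of the value (the v2 approach). */
#define PATCH_BY_VALUE(dest, val)                          \
	do {                                               \
		__typeof__(val) tmp_ = (val);              \
		memcpy((dest), &tmp_, sizeof(tmp_));       \
	} while (0)

/* Hypothetical: size taken from the destination pointer, like put_user(). */
#define PATCH_BY_DEST(dest, val)                           \
	do {                                               \
		__typeof__(*(dest)) tmp_ = (val);          \
		memcpy((dest), &tmp_, sizeof(tmp_));       \
	} while (0)
```

Take `uint64_t slot` as an example. `PATCH_BY_VALUE(&slot, 0)` leaves half of `slot` untouched (which half depends on endianness), while `PATCH_BY_DEST(&slot, 0)` clears all of it.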

> 
>>> +static int __always_inline ___patch_memory(void *patch_addr,
>>> +  unsigned long data,
>>> +  void *prog_addr,
>>> +  size_t size)
>>
>> Is it really needed in the .c file ? I would expect GCC to take the
>> right decision by itself.
> 
> I thought it'd be better to always inline it given it's only used
> generically in do_patch_memory and __do_patch_memory, which both get
> inlined into __patch_memory. But it does end up generating two copies
> due to the different contexts it's called in, so probably not worth it.
> Removed for v3.
> 
> (raw_patch_instruction gets an optimised inline of ___patch_memory
> either way)
> 
>> A BUILD_BUG() would be better here I think.
> 
> BUILD_BUG() as the default case always triggers though, I assume
> because the constant used for size is too far away. How about
> 
>static __always_inline int patch_memory(void *addr,
>unsigned long val,
>size_t size)
>{
>int __patch_memory(void *dest, unsigned long src, size_t size);
> 
>BUILD_BUG_ON_MSG(!(size == sizeof(char)  ||
>   size == sizeof(short) ||
>   size == sizeof(int)   ||
>   size == sizeof(long)),
> "Unsupported size for patch_memory");
>return __patch_memory(addr, val, size);
>}
> 
> Declaring the __patch_memory function inside of patch_memory enforces
> that you can't accidentally call __patch_memory without going through
> this or the *patch_instruction entry points (which hardcode the size).

Aren't you making it more difficult than needed? That's C, not C++, 
and we are not trying to help the user.
All kernel developers know that as soon as they use a function that has 
a leading double underscore they are on their own.

And again, patch_memory() isn't used anywhere else, at least for the 
time being, so why worry about that ?

> 
>>> +   }
>>>
>>> -   __put_kernel_nofault(patch_addr, , u32,
>>> failed);
>>> -   } else {
>>> -   u64 val = ppc_inst_as_ulong(instr);
>>> +   dcbst(patch_addr);
>>> +   dcbst(patch_addr + size - 1); /* Last byte of data may
>>> cross a cacheline */
>>
>> Or the second byte of data may cross a cacheline ...
> 
> It might, but unless we are assuming data cachelines smaller than the
> native word size it will either be in the first or last byte's
> cacheline. Whereas the last byte might be in its own cacheline.
> 
> As justification the comment's misleading though, how about reducing it
> to "data may cross a cacheline" and leaving the reason for flushing the
> last byte implicit?

Yes that was my worry, a misleading comment.
I think "data may cross a cacheline" is what we need as a comment.

> 
>>> -static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
>>> +static int __always_inline __do_patch_memory(void *dest, unsigned
>>> long src, size_t size)
>>>    {
>>
>> Whaou, do we really want all this to be __always_inline ? Did you
>> check
>> the text size increase ?
> 
> These ones are redundant because GCC will already inline them, they
> were just part of experimenting inlining ___patch_memory. Will remove
> for v3.
> 
> The text size doesn't increase though because the call hierarchy is
> just a linear chain of
> __patch_memory -> do_patch_memory -> __do_patch_memory

Yes, I had in mind that all those would be inlined doing to all callers 
of patch_instruction() and

Re: [PATCH v2 5/6] powerpc/64: Add support for out-of-line static calls

2022-09-26 Thread Benjamin Gray

On Mon, 2022-09-26 at 13:16 +, Christophe Leroy wrote:
> Build failure with GCC 5.5 (ppc64le_defconfig):
> 
>    CC  arch/powerpc/kernel/ptrace/ptrace.o
> {standard input}: Assembler messages:
> {standard input}:10: Error: .localentry expression for 
> `__SCT__tp_func_sys_enter' is not a valid power of 2
> {standard input}:29: Error: .localentry expression for 
> `__SCT__tp_func_sys_exit' is not a valid power of 2

It looks like support for a literal st_other value in `.localentry` was added in
binutils 2.32. I'll change the config entry as follows for v3:

  select HAVE_STATIC_CALL if 
  PPC32 || (PPC64_ELF_ABI_V2 && LD_VERSION >= 23200)
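Some context on the error. My understanding of the ELFv2 ABI is that the local entry point is not stored as a raw byte offset. It is encoded in bits 5-7 of the symbol's st_other field. The assembler message suggests that older gas only accepts a byte-offset operand for `.localentry` and encodes it itself. Binutils 2.32, per the reply above, also accepts a literal st_other value.

The sketch below decodes that field. The macro names mirror glibc's `STO_PPC64_LOCAL_*` but are redefined here so the example stands alone. The mapping is my reading of the ABI, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ELFv2 layout: local-entry field in bits 5-7 of st_other. */
#define LOCAL_ENTRY_SHIFT 5
#define LOCAL_ENTRY_MASK  (7u << LOCAL_ENTRY_SHIFT)

/* Extract the 3-bit local-entry field from st_other. */
static unsigned int local_entry_field(uint8_t st_other)
{
	return (st_other & LOCAL_ENTRY_MASK) >> LOCAL_ENTRY_SHIFT;
}

/*
 * Byte offset from the global to the local entry point for a field value.
 * Values 0 and 1 both give offset 0 (they differ in ABI meaning, not in
 * offset); values 2..6 give 1 << value bytes.
 */
static unsigned int local_entry_offset(unsigned int field)
{
	return ((1u << field) >> 2) << 2;
}
```

This explains why a byte offset cannot stand in for a field value of 1. No byte offset distinguishes it from 0, so it has to be given as a literal st_other value.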

Re: [PATCH linux-next][RFC] powerpc: avoid lockdep when we are offline

2022-09-26 Thread Nicholas Piggin

On Tue Sep 27, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote:
> This is the second version of my fix for PPC's "WARNING: suspicious RCU usage";
> I improved it under Paul E. McKenney's guidance:
> Link: 
> https://lore.kernel.org/lkml/20220914021528.15946-1-zhouzho...@gmail.com/T/
>
> During CPU offlining, the subfunctions of xive_teardown_cpu call
> __lock_acquire when CONFIG_LOCKDEP=y. The latter traverses an
> RCU-protected list, so "WARNING: suspicious RCU usage" is
> triggered.
>
> Avoid lockdep when we are offline.

I don't see how this is safe. If RCU is no longer watching the CPU then
the memory it is accessing here could be concurrently freed. I think the
warning is valid.

powerpc's problem is that cpuhp_report_idle_dead() is called before
arch_cpu_idle_dead(), so it must not rely on any RCU protection there.
I would say xive cleanup just needs to be done earlier. I wonder why it
is not done in __cpu_disable or thereabouts, that's where the interrupt
controller is supposed to be stopped.

Thanks,
Nick

>
> Signed-off-by: Zhouyi Zhou 
> ---
> Dear PPC and RCU developers
>
> I found this bug when trying to run rcutorture tests in a ppc VM at the
> Open Source Lab of Oregon State University.
>
> console.log reports the following bug:
> [   37.635545][T0] WARNING: suspicious RCU usage
> [   37.636409][T0] 6.0.0-rc4-next-20220907-dirty #8 Not tainted
> [   37.637575][T0] -
> [   37.638306][T0] kernel/locking/lockdep.c:3723 RCU-list traversed in non-reader section!!
> [   37.639651][T0]
> [   37.639651][T0] other info that might help us debug this:
> [   37.639651][T0]
> [   37.641381][T0]
> [   37.641381][T0] RCU used illegally from offline CPU!
> [   37.641381][T0] rcu_scheduler_active = 2, debug_locks = 1
> [   37.667170][T0] no locks held by swapper/6/0.
> [   37.668328][T0]
> [   37.668328][T0] stack backtrace:
> [   37.669995][T0] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 6.0.0-rc4-next-20220907-dirty #8
> [   37.672777][T0] Call Trace:
> [   37.673729][T0] [c4653920] [c097f9b4] dump_stack_lvl+0x98/0xe0 (unreliable)
> [   37.678579][T0] [c4653960] [c01f2eb8] lockdep_rcu_suspicious+0x148/0x16c
> [   37.680425][T0] [c46539f0] [c01ed9b4] __lock_acquire+0x10f4/0x26e0
> [   37.682450][T0] [c4653b30] [c01efc2c] lock_acquire+0x12c/0x420
> [   37.684113][T0] [c4653c20] [c10d704c] _raw_spin_lock_irqsave+0x6c/0xc0
> [   37.686154][T0] [c4653c60] [c00c7b4c] xive_spapr_put_ipi+0xcc/0x150
> [   37.687879][T0] [c4653ca0] [c10c72a8] xive_cleanup_cpu_ipi+0xc8/0xf0
> [   37.689856][T0] [c4653cf0] [c10c7370] xive_teardown_cpu+0xa0/0xf0
> [   37.691877][T0] [c4653d30] [c00fba5c] pseries_cpu_offline_self+0x5c/0x100
> [   37.693882][T0] [c4653da0] [c005d2c4] arch_cpu_idle_dead+0x44/0x60
> [   37.695739][T0] [c4653dc0] [c01c740c] do_idle+0x16c/0x3d0
> [   37.697536][T0] [c4653e70] [c01c7a1c] cpu_startup_entry+0x3c/0x40
> [   37.699694][T0] [c4653ea0] [c005ca20] start_secondary+0x6c0/0xb50
> [   37.701742][T0] [c4653f90] [c000d054] start_secondary_prolog+0x10/0x14
>
>
> Tested on a PPC VM at the Open Source Lab of Oregon State University.
> The test results show that "WARNING: suspicious RCU usage" is gone,
> and there are fewer "BUG: soft lockup" reports than with the original kernel
> (9 vs 13), which sounds good ;-)
>
> But after my modification, results-rcutorture-kasan/SRCU-P/console.log.diags
> shows a new warning:
> [  222.289242][  T110] WARNING: CPU: 6 PID: 110 at kernel/rcu/rcutorture.c:2806 rcu_torture_fwd_prog+0xc88/0xdd0
>
> I guessed that the above new warning also exists in the original kernel, so I
> wrote a tiny test script as follows:
>
> #!/bin/sh
>
> COUNTER=0
> while [ $COUNTER -lt 1000 ] ; do
> qemu-system-ppc64 -nographic -smp cores=8,threads=1 -net none -M pseries 
> -nodefaults -device spapr-vscsi -serial file:/tmp/console.log -m 2G -kernel 
> /tmp/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 
> rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot 
> rcupdate.rcu_task_stall_timeout=3 rcutorture.torture_type=srcud 
> rcupdate.rcu_self_test=1 rcutorture.fwd_progress=3 srcutree.big_cpu_lim=5 
> rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 
> rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 
> rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 
> rcutorture.verbose=1"&
> qemu_pid=$!
> cd ~/next1/linux-next
> make clean
> #I use "make vmlinux -j 8" to create heavy background jitter
> make vmlinux -j 8  > /dev/null 2>&1 
> make_pid=$!
> wait $qemu_pid

Re: [PATCH v2 6/6] powerpc/64: Add tests for out-of-line static calls

2022-09-26 Thread Benjamin Gray

On Mon, 2022-09-26 at 14:55 +, Christophe Leroy wrote:
> > +config PPC_STATIC_CALL_KUNIT_TEST
> > +   tristate "KUnit tests for PPC64 ELF ABI V2 static calls"
> > +   default KUNIT_ALL_TESTS
> > +   depends on HAVE_STATIC_CALL && PPC64_ELF_ABI_V2 && KUNIT &&
> > m
> 
> Is there a reason why it is dedicated to PPC64 ? In that case, can
> you 
> make it explicit with the name of the config option, and with the
> name 
> of the file below ?

The tests were written to make sure the TOC stays correct, so in theory
PPC64_ELF_ABI_V2 (and potentially PPC64_ELF_ABI_V1) is the only ABI
that should need them. I was thinking other tests should probably go in
static_call_selftest.c

Thinking now though, I suppose runtime modules are out-of-range for
branches on 32-bit as well? I can see it being useful for just testing
the indirect branch fallback in that case, without trying to make some
generic test suite that needs to work on other arches. The TOC specific
checks can be conditionally enabled per ABI.

Re: [PATCH v2 5/6] powerpc/64: Add support for out-of-line static calls

2022-09-26 Thread Benjamin Gray

On Mon, 2022-09-26 at 14:54 +, Christophe Leroy wrote:
> > diff --git a/arch/powerpc/kernel/static_call.c
> > b/arch/powerpc/kernel/static_call.c
> > index 863a7aa24650..ecbb74e1b4d3 100644
> > --- a/arch/powerpc/kernel/static_call.c
> > +++ b/arch/powerpc/kernel/static_call.c
> > @@ -4,30 +4,108 @@
> >   
> >   #include 
> >   
> > +static void* ppc_function_toc(u32 *func)
> > +{
> > +#ifdef CONFIG_PPC64_ELF_ABI_V2
> 
> Can you use IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) instead ?

I tried when implementing it, but the `(u64) func` cast is an issue. I
could side step it and use `unsigned long` if that's preferable?
Otherwise I like being explicit about the size, it's a delicate
function.
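Here is a hosted sketch of what an IS_ENABLED()-style version could look like. It assumes the function starts with the usual ELFv2 global entry pair `addis r2,r12,hi` / `addi r2,r2,lo`.

The address is held as `unsigned long`, which is pointer-sized on both 32-bit and 64-bit Linux. That keeps the code valid on every config, which is presumably the problem with the `(u64) func` cast on 32-bit builds.

`ABI_V2_ENABLED` is a hypothetical stand-in for `IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2)`. The function name and signature are illustrative, not the patch's: the instruction words are passed in separately so the logic can be exercised without real code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2): the code
 * below is always compiled and simply dead when this is 0. */
#define ABI_V2_ENABLED 1

#define ADDIS_R2_R12 0x3c4c0000u /* addis r2,r12,hi */
#define ADDI_R2_R2   0x38420000u /* addi  r2,r2,lo  */

/*
 * Recover the TOC pointer a function sets up in its global entry prologue.
 * func is the function address; insn[] holds its first two instructions.
 * Returns 0 if the expected prologue is not present.
 */
static unsigned long function_toc(unsigned long func, const uint32_t insn[2])
{
	if (!ABI_V2_ENABLED)
		return 0;

	if ((insn[0] & 0xffff0000u) != ADDIS_R2_R12 ||
	    (insn[1] & 0xffff0000u) != ADDI_R2_R2)
		return 0;

	/* Both immediates are signed 16-bit values. */
	long hi = (int16_t)(insn[0] & 0xffff);
	long lo = (int16_t)(insn[1] & 0xffff);

	/* Unsigned arithmetic wraps cleanly for negative displacements. */
	unsigned long off = ((unsigned long)hi << 16) + (unsigned long)lo;

	return func + off;
}
```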

Re: [PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function

2022-09-26 Thread Benjamin Gray

On Mon, 2022-09-26 at 14:33 +, Christophe Leroy wrote:
> > +#define patch_memory(addr, val) \
> > +({ \
> > +   BUILD_BUG_ON(!__native_word(val)); \
> > +   __patch_memory(addr, (unsigned long) val, sizeof(val)); \
> > +})
> 
> Can you do a static __always_inline function instead of a macro here
> ?

I didn't before because it doesn't allow using the type as a parameter.
I considered these forms

  patch_memory(addr, val, 8);
  patch_memory(addr, val, void*);
  patch_memory(addr, val);  // size taken from val type

And thought the third was the nicest to use. Though coming back to
this, I hadn't considered

  patch_memory(addr, val, sizeof(void*))

which would still allow a type to decide the size, and not be a macro.
I've got an example implementation further down that also addresses the
size check issue.

> > +static int __always_inline ___patch_memory(void *patch_addr,
> > +  unsigned long data,
> > +  void *prog_addr,
> > +  size_t size)
> 
> Is it really needed in the .c file ? I would expect GCC to take the 
> right decision by itself.

I thought it'd be better to always inline it given it's only used
generically in do_patch_memory and __do_patch_memory, which both get
inlined into __patch_memory. But it does end up generating two copies
due to the different contexts it's called in, so probably not worth it.
Removed for v3.

(raw_patch_instruction gets an optimised inline of ___patch_memory
either way)

> A BUILD_BUG() would be better here I think.

BUILD_BUG() as the default case always triggers though, I assume
because the constant used for size is too far away. How about

  static __always_inline int patch_memory(void *addr, 
  unsigned long val, 
  size_t size) 
  {
  int __patch_memory(void *dest, unsigned long src, size_t size);

  BUILD_BUG_ON_MSG(!(size == sizeof(char)  ||
 size == sizeof(short) ||
 size == sizeof(int)   ||
 size == sizeof(long)),
   "Unsupported size for patch_memory");
  return __patch_memory(addr, val, size);
  }

Declaring the __patch_memory function inside of patch_memory enforces
that you can't accidentally call __patch_memory without going through
this or the *patch_instruction entry points (which hardcode the size).
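Here is a hosted sketch of the explicit-size shape. It keeps the same size whitelist but checks it at run time, returning -EINVAL, instead of using BUILD_BUG_ON_MSG(). The function name is hypothetical, and memcpy() replaces the real text-patching path.

The sizes are tested with an if/else chain rather than a switch. On 32-bit, `sizeof(int) == sizeof(long)`, so a switch would have duplicate case labels.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical hosted stand-in for an explicit-size patch_memory():
 * narrows val to the requested width and stores it at addr.
 * Returns 0 on success or -EINVAL for an unsupported size.
 */
static int patch_memory_sketch(void *addr, unsigned long val, size_t size)
{
	if (size == sizeof(unsigned char)) {
		unsigned char v = (unsigned char)val;
		memcpy(addr, &v, sizeof(v));
	} else if (size == sizeof(unsigned short)) {
		unsigned short v = (unsigned short)val;
		memcpy(addr, &v, sizeof(v));
	} else if (size == sizeof(unsigned int)) {
		unsigned int v = (unsigned int)val;
		memcpy(addr, &v, sizeof(v));
	} else if (size == sizeof(unsigned long)) {
		memcpy(addr, &val, sizeof(val));
	} else {
		return -EINVAL;
	}
	return 0;
}
```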

> > +   }
> > 
> > -   __put_kernel_nofault(patch_addr, , u32,
> > failed);
> > -   } else {
> > -   u64 val = ppc_inst_as_ulong(instr);
> > +   dcbst(patch_addr);
> > +   dcbst(patch_addr + size - 1); /* Last byte of data may
> > cross a cacheline */
> 
> Or the second byte of data may cross a cacheline ...

It might, but unless we are assuming data cachelines smaller than the
native word size it will either be in the first or last byte's
cacheline. Whereas the last byte might be in its own cacheline.

As justification the comment's misleading though, how about reducing it
to "data may cross a cacheline" and leaving the reason for flushing the
last byte implicit?
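The arithmetic behind flushing only the first and last byte is simple. As long as a write is no larger than a cache block, it spans at most two blocks. Those are exactly the blocks holding its first and last bytes. The helper names and block sizes below are illustrative assumptions, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the cache block holding byte address a (block = 1 << shift bytes). */
static uintptr_t block_of(uintptr_t a, unsigned int shift)
{
	return a >> shift;
}

/* Number of cache blocks touched by the byte range [addr, addr + size). */
static size_t blocks_touched(uintptr_t addr, size_t size, unsigned int shift)
{
	return block_of(addr + size - 1, shift) - block_of(addr, shift) + 1;
}
```

For example, take 128-byte blocks (`shift` = 7) and an 8-byte write at 0x107c. The write covers bytes 0x107c-0x1083, so it touches blocks 0x20 and 0x21. Those are exactly the two that `dcbst(patch_addr)` and `dcbst(patch_addr + size - 1)` would flush.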

> > -static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
> > +static int __always_inline __do_patch_memory(void *dest, unsigned
> > long src, size_t size)
> >   {
> 
> Whaou, do we really want all this to be __always_inline ? Did you
> check 
> the text size increase ?

These ones are redundant because GCC will already inline them, they
were just part of experimenting inlining ___patch_memory. Will remove
for v3. 

The text size doesn't increase though because the call hierarchy is
just a linear chain of
__patch_memory -> do_patch_memory -> __do_patch_memory

The entry point __patch_memory is not inlined.

Re: [PATCH v2 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub

2022-09-26 Thread Benjamin Gray

On Mon, 2022-09-26 at 14:49 +, Christophe Leroy wrote:
> > +   /* Replace indirect branch sequence with direct branch
> > where possible */
> > +   if (!create_branch(, jump_seq_addr, addr, 0))
> > +   if (patch_instruction(jump_seq_addr, direct))
> 
> Why not use patch_branch() ?

I didn't think of it at the time. To get the same abort-if-patch-failed
semantics, the following should work:

  int err;
  ...

  /* Replace indirect branch sequence with direct branch where 
   * possible 
   */
  err = patch_branch(>jump[PPC64_STUB_MTCTR_OFFSET], addr, 0);
  if (err && err != -ERANGE)
  return 0;
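The `err != -ERANGE` check relies on a direct branch only being able to reach nearby targets. My understanding is that patch_branch() reports an unencodable target as -ERANGE, which the snippet treats as "keep the indirect sequence".

This hosted sketch shows the I-form range check that decides whether `b target` can be encoded at all. `encode_branch()` is a hypothetical helper, not the kernel's create_branch(). It follows the same instruction format: opcode 18 with a signed 24-bit word displacement, which gives a reach of roughly +/-32 MiB.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode "b target" (I-form, AA = LK = 0) placed at
 * addr. Returns 0 and writes *insn on success, -EINVAL for a misaligned
 * target, or -ERANGE when the displacement does not fit in the LI field.
 */
static int encode_branch(uint32_t *insn, uintptr_t addr, uintptr_t target)
{
	intptr_t offset = (intptr_t)(target - addr);

	if (offset & 3)
		return -EINVAL;
	if (offset < -0x2000000 || offset > 0x1fffffc)
		return -ERANGE;

	/* Opcode 18 in the top six bits; LI holds offset bits 2..25. */
	*insn = 0x48000000u | ((uint32_t)offset & 0x03fffffcu);
	return 0;
}
```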
>

Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies

2022-09-26 Thread AKASHI Takahiro

On Mon, Sep 26, 2022 at 09:40:25AM +0200, Michal Suchánek wrote:
> On Mon, Sep 26, 2022 at 08:47:32AM +0200, Greg Kroah-Hartman wrote:
> > On Sat, Sep 24, 2022 at 01:55:23PM +0200, Michal Suchánek wrote:
> > > On Sat, Sep 24, 2022 at 12:13:34PM +0200, Greg Kroah-Hartman wrote:
> > > > On Sat, Sep 24, 2022 at 11:45:21AM +0200, Michal Suchánek wrote:
> > > > > On Sat, Sep 24, 2022 at 11:19:19AM +0200, Greg Kroah-Hartman wrote:
> > > > > > On Fri, Sep 23, 2022 at 07:10:28PM +0200, Michal Suchanek wrote:
> > > > > > > Hello,
> > > > > > > 
> > > > > > > this is backport of commit 0d519cadf751
> > > > > > > ("arm64: kexec_file: use more system keyrings to verify kernel 
> > > > > > > image signature")
> > > > > > > to the stable 5.15 tree including the preparatory patches.
> > > > > > 
> > > > > > This feels to me like a new feature for arm64, one that has never 
> > > > > > worked
> > > > > > before and you are just making it feature-parity with x86, right?
> > > > > > 
> > > > > > Or is this a regression fix somewhere?  Why is this needed in 
> > > > > > 5.15.y and
> > > > > > why can't people who need this new feature just use a newer kernel
> > > > > > version (5.19?)
> > > > > 
> > > > > It's half-broken implementation of the kexec kernel verification. At 
> > > > > the time
> > > > > it was implemented for arm64 we had the platform and secondary 
> > > > > keyrings
> > > > > and x86 was using them but on arm64 the initial implementation ignores
> > > > > them.
> > > > 
> > > > Ok, so it's something that never worked.  Adding support to get it to
> > > > work doesn't really fall into the stable kernel rules, right?
> > > 
> > > Not sure. It was defective, not using the facilities available at the
> > > time correctly. Which translates to kernels that can be kexec'd on x86
> > > failing to kexec on arm64 without any explanation (signed with same key,
> > > built for the appropriate arch).
> > 
> > Feature parity across architectures is not a "regression", but rather a
> > "this feature is not implemented for this architecture yet" type of
> > thing.
> 
> That depends on the view - before kexec verification you could boot any
> kernel, now you can boot some kernels signed with a valid key, but not
> others - the initial implementation is buggy, probably because it
> is based on an old version of the x86 code.

Buggy?
Support for the platform keyring had slipped in just before
I submitted the latest version of my patch series, which was eventually merged.
(I should have noticed it, though.)

Looking at the changes in commit 278311e417be ("kexec, KEYS: Make use of platform
keyring for signature verify"), it seems obvious that it is a new feature,
because it introduced a new Kconfig option, CONFIG_INTEGRITY_PLATFORM_KEYRING,
which allows enabling/disabling platform keyring support.

-Takahiro Akashi

> > 
> > > > Again, what's wrong with 5.19 for anyone who wants this?  Who does want
> > > > this?
> > > 
> > > Not sure, really.
> > > 
> > > The final patch was repeatedly backported to stable and failed to build
> > > because the prerequisites were missing.
> > 
> > That's because it was tagged, but now that you show the full set of
> > requirements, it's pretty obvious to me that this is not relevant for
> > going this far back.
> 
> That also works.
> 
> Thanks
> 
> Michal

Re: [PATCH 2/7] mm: Free device private pages have zero refcount

2022-09-26 Thread Alistair Popple

Jason Gunthorpe  writes:

> On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
>> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
>> refcount") device private pages have no longer had an extra reference
>> count when the page is in use. However before handing them back to the
>> owning device driver we add an extra reference count such that free
>> pages have a reference count of one.
>>
>> This makes it difficult to tell if a page is free or not because both
>> free and in use pages will have a non-zero refcount. Instead we should
>> return pages to the driver's page allocator with a zero reference count.
>> Kernel code can then safely use kernel functions such as
>> get_page_unless_zero().
>>
>> Signed-off-by: Alistair Popple 
>> ---
>>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 1 +
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>>  lib/test_hmm.c   | 1 +
>>  mm/memremap.c| 5 -
>>  mm/page_alloc.c  | 6 ++
>>  6 files changed, 10 insertions(+), 5 deletions(-)
>
> I think this is a great idea, but I'm surprised no dax stuff is
> touched here?

free_zone_device_page() shouldn't be called for pgmap->type ==
MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX
there. Except that the folio code looks like it might have introduced a
bug. AFAICT put_page() always calls
put_devmap_managed_page(>page) but folio_put() does not (although
folios_put() does!). So it seems folio_put() won't end up calling
__put_devmap_managed_page_refs() as I think it should.

I think you're right about the change to __init_zone_device_page() - I
should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to
look at Dan's patch series more closely as I suspect it might be better
to rebase this patch on top of that.

> Jason
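The patch's benefit is that free device-private pages now sit at refcount zero. A speculative reference can then be taken with get_page_unless_zero(), and the attempt fails cleanly on a free page.

Below is a hosted C11 model of that increment-unless-zero pattern. The struct and function are toy stand-ins, not the kernel's struct page or page_ref helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a page's reference count. */
struct toy_page {
	atomic_int refcount;
};

/*
 * Take a reference only if the page is in use (refcount != 0), in the
 * style of get_page_unless_zero(). A free page is never resurrected,
 * because the increment only succeeds via a CAS from a non-zero value.
 */
static bool toy_get_page_unless_zero(struct toy_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
		/* On failure the CAS reloaded old; retry unless it is now 0. */
	}
	return false;
}
```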

Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-26 Thread Alistair Popple



Felix Kuehling  writes:

> On 2022-09-26 17:35, Lyude Paul wrote:
>> On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
>>> When the module is unloaded or a GPU is unbound from the module it is
>>> possible for device private pages to be left mapped in currently running
>>> processes. This leads to a kernel crash when the pages are either freed
>>> or accessed from the CPU because the GPU and associated data structures
>>> and callbacks have all been freed.
>>>
>>> Fix this by migrating any mappings back to normal CPU memory prior to
>>> freeing the GPU memory chunks and associated device private pages.
>>>
>>> Signed-off-by: Alistair Popple 
>>>
>>> ---
>>>
>>> I assume the AMD driver might have a similar issue. However I can't see
>>> where device private (or coherent) pages actually get unmapped/freed
>>> during teardown as I couldn't find any relevant calls to
>>> devm_memunmap(), memunmap(), devm_release_mem_region() or
>>> release_mem_region(). So it appears that ZONE_DEVICE pages are not being
>>> properly freed during module unload, unless I'm missing something?
>> I've got no idea, will poke Ben to see if they know the answer to this
>
> I guess we're relying on devm to release the region. Isn't the whole point of
> using devm_request_free_mem_region that we don't have to remember to
> explicitly release it when the device gets destroyed? I believe we had an
> explicit free call at some point by mistake, and that caused a double-free
> during module unload. See this commit for reference:

Argh, thanks for that pointer. I was not so familiar with
devm_request_free_mem_region()/devm_memremap_pages() as currently
Nouveau explicitly manages that itself.

> commit 22f4f4faf337d5fb2d2750aff13215726814273e
> Author: Philip Yang 
> Date:   Mon Sep 20 17:25:52 2021 -0400
>
> drm/amdkfd: fix svm_migrate_fini warning
>  Device manager releases device-specific resources when a driver
> disconnects from a device, devm_memunmap_pages and
> devm_release_mem_region calls in svm_migrate_fini are redundant.
>  It causes below warning trace after patch "drm/amdgpu: Split
> amdgpu_device_fini into early and late", so remove function
> svm_migrate_fini.
>  BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718
>  WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795
> devm_release_action+0x51/0x60
> Call Trace:
> ? memunmap_pages+0x360/0x360
> svm_migrate_fini+0x2d/0x60 [amdgpu]
> kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
> amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
> amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
> amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
> drm_dev_release+0x20/0x40 [drm]
> release_nodes+0x196/0x1e0
> device_release_driver_internal+0x104/0x1d0
> driver_detach+0x47/0x90
> bus_remove_driver+0x7a/0xd0
> pci_unregister_driver+0x3d/0x90
> amdgpu_exit+0x11/0x20 [amdgpu]
>  Signed-off-by: Philip Yang 
> Reviewed-by: Felix Kuehling 
> Signed-off-by: Alex Deucher 
>
> Furthermore, I guess we are assuming that nobody is using the GPU when the
> module is unloaded. As long as any processes have /dev/kfd open, you won't be
> able to unload the module (except by force-unload). I suppose with ZONE_DEVICE
> memory, we can have references to device memory pages even when user mode has
> closed /dev/kfd. We do have a cleanup handler that runs in an
> MMU-free-notifier. In theory that should run after all the pages in the
> mm_struct have been freed. It releases all sorts of other device resources
> and needs the driver to still be there. I'm not sure if there is anything
> preventing a module unload before the free-notifier runs. I'll look into that.

Right - module unload (or device unbind) is one of the other ways we can
hit this issue in Nouveau at least. You can end up with ZONE_DEVICE
pages mapped in a running process after the module has unloaded.
Although, now that you mention it, that seems a bit wrong; the pgmap refcount
should provide some protection against that. Will have to look into
that too.
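Felix's point about devm can be shown with a toy model. This is hosted C, not the kernel's devres API, and all names are hypothetical. A release action is recorded when the resource is set up and runs exactly once, newest first, when the driver detaches.

A driver that also unmaps the region in its own fini path would release it a second time. That is the redundancy the quoted svm_migrate_fini commit removed. The model does not say when processes drop their last reference to the pages, which is the separate question raised above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 8

typedef void (*release_fn)(void *data);

/* Toy device: a stack of release actions recorded at setup time. */
struct toy_dev {
	release_fn fn[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	int n;
};

/* Record a release action; returns -1 if the table is full. */
static int toy_devm_add_action(struct toy_dev *dev, release_fn fn, void *data)
{
	if (dev->n == MAX_ACTIONS)
		return -1;
	dev->fn[dev->n] = fn;
	dev->data[dev->n] = data;
	dev->n++;
	return 0;
}

/* On detach: run every recorded action newest-first, then forget them. */
static void toy_devm_release_all(struct toy_dev *dev)
{
	while (dev->n > 0) {
		dev->n--;
		dev->fn[dev->n](dev->data[dev->n]);
	}
}

/* Example release action: counts how many times it has run. */
static void count_release(void *data)
{
	(*(int *)data)++;
}
```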

> Regards,
>   Felix
>
>
>>
>>> ---
>>>   drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
>>>   1 file changed, 48 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
>>> b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>>> index 66ebbd4..3b247b8 100644
>>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>>> @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
>>> mutex_unlock(>dmem->mutex);
>>>   }
>>>   +/*
>>> + * Evict all pages mapping a chunk.
>>> + */
>>> +void
>>> +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
>>> +{
>>> +   unsigned long i, npages = range_len(>pagemap.range) >> 
>>> PAGE_SHIFT;
>>> +   unsigned long *src_pfns, *dst_pfns;
>>> +   dma_addr_t *dma_addrs;
>>> +   struct

[PATCH linux-next][RFC] powerpc: avoid lockdep when we are offline

2022-09-26 Thread Zhouyi Zhou

This is the second version of my fix for PPC's "WARNING: suspicious RCU usage";
I improved it under Paul E. McKenney's guidance:
Link: 
https://lore.kernel.org/lkml/20220914021528.15946-1-zhouzho...@gmail.com/T/

During CPU offlining, the subfunctions of xive_teardown_cpu call
__lock_acquire when CONFIG_LOCKDEP=y. The latter traverses an
RCU-protected list, so "WARNING: suspicious RCU usage" is
triggered.

Avoid lockdep when we are offline.

Signed-off-by: Zhouyi Zhou 
---
Dear PPC and RCU developers

I found this bug when trying to run rcutorture tests in a ppc VM at the
Open Source Lab of Oregon State University.

console.log reports the following bug:
[   37.635545][T0] WARNING: suspicious RCU usage
[   37.636409][T0] 6.0.0-rc4-next-20220907-dirty #8 Not tainted
[   37.637575][T0] -
[   37.638306][T0] kernel/locking/lockdep.c:3723 RCU-list traversed in non-reader section!!
[   37.639651][T0]
[   37.639651][T0] other info that might help us debug this:
[   37.639651][T0]
[   37.641381][T0]
[   37.641381][T0] RCU used illegally from offline CPU!
[   37.641381][T0] rcu_scheduler_active = 2, debug_locks = 1
[   37.667170][T0] no locks held by swapper/6/0.
[   37.668328][T0]
[   37.668328][T0] stack backtrace:
[   37.669995][T0] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 6.0.0-rc4-next-20220907-dirty #8
[   37.672777][T0] Call Trace:
[   37.673729][T0] [c4653920] [c097f9b4] dump_stack_lvl+0x98/0xe0 (unreliable)
[   37.678579][T0] [c4653960] [c01f2eb8] lockdep_rcu_suspicious+0x148/0x16c
[   37.680425][T0] [c46539f0] [c01ed9b4] __lock_acquire+0x10f4/0x26e0
[   37.682450][T0] [c4653b30] [c01efc2c] lock_acquire+0x12c/0x420
[   37.684113][T0] [c4653c20] [c10d704c] _raw_spin_lock_irqsave+0x6c/0xc0
[   37.686154][T0] [c4653c60] [c00c7b4c] xive_spapr_put_ipi+0xcc/0x150
[   37.687879][T0] [c4653ca0] [c10c72a8] xive_cleanup_cpu_ipi+0xc8/0xf0
[   37.689856][T0] [c4653cf0] [c10c7370] xive_teardown_cpu+0xa0/0xf0
[   37.691877][T0] [c4653d30] [c00fba5c] pseries_cpu_offline_self+0x5c/0x100
[   37.693882][T0] [c4653da0] [c005d2c4] arch_cpu_idle_dead+0x44/0x60
[   37.695739][T0] [c4653dc0] [c01c740c] do_idle+0x16c/0x3d0
[   37.697536][T0] [c4653e70] [c01c7a1c] cpu_startup_entry+0x3c/0x40
[   37.699694][T0] [c4653ea0] [c005ca20] start_secondary+0x6c0/0xb50
[   37.701742][T0] [c4653f90] [c000d054] start_secondary_prolog+0x10/0x14


Tested on a PPC VM at the Open Source Lab of Oregon State University.
The test results show that "WARNING: suspicious RCU usage" is gone,
and there are fewer "BUG: soft lockup" reports than with the original kernel
(9 vs 13), which sounds good ;-)

But after my modification, results-rcutorture-kasan/SRCU-P/console.log.diags
shows a new warning:
[  222.289242][  T110] WARNING: CPU: 6 PID: 110 at kernel/rcu/rcutorture.c:2806 rcu_torture_fwd_prog+0xc88/0xdd0

I guessed that the above new warning also exists in the original kernel, so I
wrote a tiny test script as follows:

#!/bin/sh

COUNTER=0
while [ $COUNTER -lt 1000 ] ; do
qemu-system-ppc64 -nographic -smp cores=8,threads=1 -net none -M pseries \
-nodefaults -device spapr-vscsi -serial file:/tmp/console.log -m 2G \
-kernel /tmp/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 \
rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot \
rcupdate.rcu_task_stall_timeout=3 rcutorture.torture_type=srcud \
rcupdate.rcu_self_test=1 rcutorture.fwd_progress=3 srcutree.big_cpu_lim=5 \
rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 \
rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 \
rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1" &
qemu_pid=$!
cd ~/next1/linux-next
make clean
# Run "make vmlinux -j 8" in the background to create heavy jitter
make vmlinux -j 8 > /dev/null 2>&1 &
make_pid=$!
wait $qemu_pid
kill $qemu_pid
kill $make_pid
if grep -q WARN /tmp/console.log;
then
echo $COUNTER > /tmp/counter
exit
fi
COUNTER=$(($COUNTER+1))
done

The above test shows that the original kernel also warns with
"WARNING: CPU: 6 PID: 110 at kernel/rcu/rcutorture.c:2806 rcu_torture_fwd_prog+0xc88/0xdd0"

But I am not very sure about my results, so I have still added [RFC] to the
subject line.

Thank all of you for your guidance and encouragement ;-)

Cheers
Zhouyi
--
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index e0a7ac5db15d..e47098f00da1 100644
---

Re: [PATCH v5 27/30] RFC: KVM: powerpc: Move processor compatibility check to hardware setup

2022-09-26 Thread Isaku Yamahata

On Fri, Sep 23, 2022 at 04:58:41PM +1000,
Michael Ellerman  wrote:

> isaku.yamah...@intel.com writes:
> > From: Isaku Yamahata 
> >
> > Move processor compatibility check from kvm_arch_processor_compat() into
>   ^ 
>   kvm_arch_check_processor_compat()
> 
> > kvm_arch_hardware_setup().  The check does model name comparison with a
> > global variable, cur_cpu_spec.  There is no point to check it at run time
> > on all processors.
> 
> A key detail I had to look up is that both kvm_arch_hardware_setup() and
> kvm_arch_check_processor_compat() are called from kvm_init(), one after
> the other. But the latter is called on each CPU.
> 
> And because the powerpc implementation of kvm_arch_check_processor_compat()
> just checks a global, there's no need to call it on every CPU.
> 
> > kvmppc_core_check_processor_compat() checks the global variable.  There are
> > five implementations of it, as follows.
> 
> There are three implementations, not five.

Thanks. I'll update the commit message.

> >   arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec;
> >   arch/powerpc/kvm/book3s.c: return 0
> >   arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2")
> >   arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc")
> >  strcmp(cur_cpu_spec->cpu_name, "e5500")
> >  strcmp(cur_cpu_spec->cpu_name, "e6500")
> >
> > Suggested-by: Sean Christopherson 
> > Signed-off-by: Isaku Yamahata 
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: Fabiano Rosas 
> > ---
> >  arch/powerpc/kvm/powerpc.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index 7b56d6ccfdfb..31dc4f231e9d 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -444,12 +444,12 @@ int kvm_arch_hardware_enable(void)
> >  
> >  int kvm_arch_hardware_setup(void *opaque)
> >  {
> > -   return 0;
> > +   return kvmppc_core_check_processor_compat();
> >  }
> >  
> >  int kvm_arch_check_processor_compat(void)
> >  {
> > -   return kvmppc_core_check_processor_compat();
> > +   return 0;
> >  }
> 
> The actual change seems OK. I gave it a quick test boot and ran some
> VMs, everything seems to work as before.
> 
> Acked-by: Michael Ellerman  (powerpc)

Thanks so much for testing. I'll remove the RFC tag.
-- 
Isaku Yamahata
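To make Michael's point concrete, here is a small user-space model (not kernel code) of the call order he describes: kvm_init() calls kvm_arch_hardware_setup() once and then kvm_arch_check_processor_compat() on each CPU. All model_* names, the simulated CPU loop and the string list are hypothetical simplifications; the only claim is that a check of a single global value needs to run once.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for cur_cpu_spec->cpu_name: one global for all CPUs. */
static const char *model_cpu_name = "e500mc";
static int compat_checks; /* how many times the model check ran */

/* Modeled on the e500mc-style check: compare the global CPU name. */
static int model_core_check_processor_compat(void)
{
	compat_checks++;
	if (!strcmp(model_cpu_name, "e500mc") ||
	    !strcmp(model_cpu_name, "e5500") ||
	    !strcmp(model_cpu_name, "e6500"))
		return 0;
	return -1; /* the kernel returns an error code here */
}

/* After the patch: the check runs from the once-only setup hook... */
static int model_hardware_setup(void)
{
	return model_core_check_processor_compat();
}

/* ...and the per-CPU hook no longer does anything. */
static int model_check_processor_compat(void)
{
	return 0;
}

/* Simplified kvm_init(): setup once, then the compat hook on every CPU. */
static int model_kvm_init(int nr_cpus)
{
	int r;

	assert(nr_cpus > 0);
	r = model_hardware_setup();
	if (r)
		return r;
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		r = model_check_processor_compat();
		if (r)
			return r;
	}
	return 0;
}
```

With eight simulated CPUs the check still runs exactly once, whether it passes or fails.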

Re: [PATCH v3 4/4] powerpc/64s: Enable KFENCE on book3s64

2022-09-26 Thread Russell Currey

On Mon, 2022-09-26 at 07:57 +, Nicholas Miehlbradt wrote:
> KFENCE support was added for ppc32 in commit 90cbac0e995d
> ("powerpc: Enable KFENCE for PPC32").
> Enable KFENCE on ppc64 architecture with hash and radix MMUs.
> It uses the same mechanism as debug pagealloc to
> protect/unprotect pages. All KFENCE kunit tests pass on both
> MMUs.
> 
> KFENCE memory is initially allocated using memblock but is
> later marked as SLAB allocated. This necessitates the change
> to __pud_free to ensure that the KFENCE pages are freed
> appropriately.
> 
> Based on previous work by Christophe Leroy and Jordan Niethe.
> 
> Signed-off-by: Nicholas Miehlbradt 

LGTM.  For the whole series:

Reviewed-by: Russell Currey

Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-26 Thread Alistair Popple

John Hubbard  writes:

> On 9/26/22 14:35, Lyude Paul wrote:
>>> +   for (i = 0; i < npages; i++) {
>>> +   if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>>> +   struct page *dpage;
>>> +
>>> +   /*
>>> +* _GFP_NOFAIL because the GPU is going away and there
>>> +* is nothing sensible we can do if we can't copy the
>>> +* data back.
>>> +*/
>>
>> You'll have to excuse me for a moment since this area of nouveau isn't one of
>> my strong points, but are we sure about this? IIRC __GFP_NOFAIL means infinite
>> retry; in the case of a GPU hotplug event I would assume we would rather just
>> stop trying to migrate things to the GPU and just drop the data instead of
>> hanging on infinite retries.
>>

No problem, thanks for taking a look!

> Hi Lyude!
>
> Actually, I really think it's better in this case to keep trying
> (presumably not necessarily infinitely, but only until memory becomes
> available), rather than failing out and corrupting data.
>
> That's because I'm not sure it's completely clear that this memory is
> discardable. And at some point, we're going to make this all work with
> file-backed memory, which will *definitely* not be discardable--I
> realize that we're not there yet, of course.
>
> But here, it's reasonable to commit to just retrying indefinitely,
> really. Memory should eventually show up. And if it doesn't, then
> restarting the machine is better than corrupting data, generally.

The memory is definitely not discardable here if the migration failed
because that implies it is still mapped into some userspace process.

We could avoid restarting the machine by doing something similar to what
happens during memory failure and killing every process that maps the
page(s). But overall I think it's better to retry until memory is
available, because that allows things like reclaim to work and in the
worst case allows the OOM killer to select an appropriate task to kill.
It also won't cause data corruption if/when we have file-backed memory.
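A toy model of the trade-off discussed above, assuming a hypothetical allocator that fails a fixed number of times before memory becomes available. A single attempt gives up, and the caller would have to drop the data; a retry loop in the spirit of __GFP_NOFAIL eventually succeeds. As Alistair notes, in the kernel the time between attempts is what lets reclaim (or, in the worst case, the OOM killer) make progress; this model only counts attempts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator: fails until "reclaim" has freed enough memory. */
static int failures_left;

static void *model_alloc_page(void)
{
	if (failures_left > 0) {
		failures_left--;
		return NULL;
	}
	return malloc(4096);
}

/* One attempt only: on NULL the caller would have to discard the data. */
static void *alloc_or_give_up(void)
{
	return model_alloc_page();
}

/* Keep retrying until memory shows up, counting the attempts made. */
static void *alloc_retry_until_success(int *attempts)
{
	void *p;

	assert(attempts != NULL);
	*attempts = 0;
	do {
		(*attempts)++;
		p = model_alloc_page();
	} while (!p);
	return p;
}
```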

> thanks,

Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-26 Thread Felix Kuehling

On 2022-09-26 17:35, Lyude Paul wrote:
> On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:
> > When the module is unloaded or a GPU is unbound from the module it is
> > possible for device private pages to be left mapped in currently running
> > processes. This leads to a kernel crash when the pages are either freed
> > or accessed from the CPU because the GPU and associated data structures
> > and callbacks have all been freed.
> >
> > Fix this by migrating any mappings back to normal CPU memory prior to
> > freeing the GPU memory chunks and associated device private pages.
> >
> > Signed-off-by: Alistair Popple 
> > ---
> >
> > I assume the AMD driver might have a similar issue. However I can't see
> > where device private (or coherent) pages actually get unmapped/freed
> > during teardown as I couldn't find any relevant calls to
> > devm_memunmap(), memunmap(), devm_release_mem_region() or
> > release_mem_region(). So it appears that ZONE_DEVICE pages are not being
> > properly freed during module unload, unless I'm missing something?
>
> I've got no idea, will poke Ben to see if they know the answer to this

I guess we're relying on devm to release the region. Isn't the whole 
point of using devm_request_free_mem_region that we don't have to 
remember to explicitly release it when the device gets destroyed? I 
believe we had an explicit free call at some point by mistake, and that 
caused a double-free during module unload. See this commit for reference:

commit 22f4f4faf337d5fb2d2750aff13215726814273e
Author: Philip Yang 
Date:   Mon Sep 20 17:25:52 2021 -0400

    drm/amdkfd: fix svm_migrate_fini warning

    Device manager releases device-specific resources when a driver
    disconnects from a device, devm_memunmap_pages and
    devm_release_mem_region calls in svm_migrate_fini are redundant.

    It causes below warning trace after patch "drm/amdgpu: Split
    amdgpu_device_fini into early and late", so remove function
    svm_migrate_fini.

    BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718

    WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795
    devm_release_action+0x51/0x60
    Call Trace:
    ? memunmap_pages+0x360/0x360
    svm_migrate_fini+0x2d/0x60 [amdgpu]
    kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
    amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
    amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
    amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
    drm_dev_release+0x20/0x40 [drm]
    release_nodes+0x196/0x1e0
    device_release_driver_internal+0x104/0x1d0
    driver_detach+0x47/0x90
    bus_remove_driver+0x7a/0xd0
    pci_unregister_driver+0x3d/0x90
    amdgpu_exit+0x11/0x20 [amdgpu]

    Signed-off-by: Philip Yang 
    Reviewed-by: Felix Kuehling 
    Signed-off-by: Alex Deucher 

Furthermore, I guess we are assuming that nobody is using the GPU when 
the module is unloaded. As long as any processes have /dev/kfd open, you 
won't be able to unload the module (except by force-unload). I suppose 
with ZONE_DEVICE memory, we can have references to device memory pages 
even when user mode has closed /dev/kfd. We do have a cleanup handler 
that runs in an MMU-free-notifier. In theory that should run after all 
the pages in the mm_struct have been freed. It releases all sorts of 
other device resources and needs the driver to still be there. I'm not 
sure if there is anything preventing a module unload before the 
free-notifier runs. I'll look into that.

Regards,
  Felix

---
  drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
  1 file changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 66ebbd4..3b247b8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
mutex_unlock(>dmem->mutex);
  }
  
+/*
+ * Evict all pages mapping a chunk.
+ */
+void
+nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
+{
+   unsigned long i, npages = range_len(>pagemap.range) >> 
PAGE_SHIFT;
+   unsigned long *src_pfns, *dst_pfns;
+   dma_addr_t *dma_addrs;
+   struct nouveau_fence *fence;
+
+   src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+   dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+   dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
+
+   migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
+   npages);
+
+   for (i = 0; i < npages; i++) {
+   if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
+   struct page *dpage;
+
+   /*
+* _GFP_NOFAIL because the GPU is going away and there
+* is nothing sensible we can do if we can't copy the
+* data back.
+*/

You'll have to excuse me for a moment since this area

Re: [PATCH 0/8] generic command line v4

2022-09-26 Thread Daniel Walker

On Mon, Sep 26, 2022 at 05:52:18PM -0500, Rob Herring wrote:
> On Thu, Sep 22, 2022 at 4:15 PM Daniel Gimpelevich
>  wrote:
> >
> > On Thu, 2022-09-22 at 14:10 -0700, Daniel Walker wrote:
> > > On Thu, Sep 22, 2022 at 05:03:46PM -0400, Sean Anderson wrote:
> > [snip]
> > > > As recently as last month, someone's patch to add such support was
> > > > rejected for this reason [1].
> > > >
> > > > --Sean
> > > >
> > > > [1] 
> > > > https://lore.kernel.org/linux-arm-kernel/20220812084613.GA3107@willie-the-truck/
> > >
> > >
> > > I had no idea.. Thanks for pointing that out. I guess I will re-submit in
> > > that case.
> > >
> > > Daniel
> >
> > This has been happening repeatedly since circa 2014, on multiple
> > architectures. It's quite frustrating, really.
> 
> It must not be that important. From the last time, IMO Christophe's
> version was much closer to being merged than this series. This is not
> how you get things upstream:
> 
> > * Dropped powerpc changes
> >   Christophe Leroy has reservations about the features for powerpc. I
> >   don't think his reservations are founded, and these changes should
> >   fully work on powerpc. However, I dropped these changes so Christophe
> >   can have more time to get comfortable with the changes.
> 
> Rob

I don't submit often enough, that's true. However, I figured maintainers don't
want the changes. This is a common occurrence in industry: people submit once
or twice, get no traction, and give up. I suppose it's a combination of
problems.

Christophe's changes don't have the same features, so the two series are
really quite different, yet they conflict.

Daniel

Re: [PATCH 0/8] generic command line v4

2022-09-26 Thread Daniel Walker

On Thu, Sep 22, 2022 at 02:15:44PM -0700, Daniel Gimpelevich wrote:
> On Thu, 2022-09-22 at 14:10 -0700, Daniel Walker wrote:
> > On Thu, Sep 22, 2022 at 05:03:46PM -0400, Sean Anderson wrote:
> [snip]
> > > As recently as last month, someone's patch to add such support was
> > > rejected for this reason [1].
> > > 
> > > --Sean
> > > 
> > > [1] 
> > > https://lore.kernel.org/linux-arm-kernel/20220812084613.GA3107@willie-the-truck/
> > 
> > 
> > I had no idea.. Thanks for pointing that out. I guess I will re-submit in
> > that case.
> > 
> > Daniel
> 
> This has been happening repeatedly since circa 2014, on multiple
> architectures. It's quite frustrating, really.

I'm not sure I follow your comments. What exactly is frustrating?

Daniel

Re: [PATCH 0/8] generic command line v4

2022-09-26 Thread Rob Herring

On Thu, Sep 22, 2022 at 4:15 PM Daniel Gimpelevich
 wrote:
>
> On Thu, 2022-09-22 at 14:10 -0700, Daniel Walker wrote:
> > On Thu, Sep 22, 2022 at 05:03:46PM -0400, Sean Anderson wrote:
> [snip]
> > > As recently as last month, someone's patch to add such support was
> > > rejected for this reason [1].
> > >
> > > --Sean
> > >
> > > [1] 
> > > https://lore.kernel.org/linux-arm-kernel/20220812084613.GA3107@willie-the-truck/
> >
> >
> > I had no idea.. Thanks for pointing that out. I guess I will re-submit in
> > that case.
> >
> > Daniel
>
> This has been happening repeatedly since circa 2014, on multiple
> architectures. It's quite frustrating, really.

It must not be that important. From the last time, IMO Christophe's
version was much closer to being merged than this series. This is not
how you get things upstream:

> * Dropped powerpc changes
>   Christophe Leroy has reservations about the features for powerpc. I
>   don't think his reservations are founded, and these changes should
>   fully work on powerpc. However, I dropped these changes so Christophe
>   can have more time to get comfortable with the changes.

Rob

Re: [PATCH v2 2/2] powerpc/rtas: block error injection when locked down

2022-09-26 Thread Paul Moore

On Mon, Sep 26, 2022 at 9:18 AM Nathan Lynch  wrote:
>
> The error injection facility on pseries VMs allows corruption of
> arbitrary guest memory, potentially enabling a sufficiently privileged
> user to disable lockdown or perform other modifications of the running
> kernel via the rtas syscall.
>
> Block the PAPR error injection facility from being opened or called
> when locked down.
>
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/kernel/rtas.c | 25 -
>  include/linux/security.h   |  1 +
>  security/security.c|  1 +
>  3 files changed, 26 insertions(+), 1 deletion(-)

The lockdown changes are trivial, but they look fine to me.

Acked-by: Paul Moore  (LSM)

-- 
paul-moore.com

Re: [PATCH v2 1/2] powerpc/pseries: block untrusted device tree changes when locked down

2022-09-26 Thread Paul Moore

On Mon, Sep 26, 2022 at 9:17 AM Nathan Lynch  wrote:
>
> The /proc/powerpc/ofdt interface allows the root user to freely alter
> the in-kernel device tree, enabling arbitrary physical address writes
> via drivers that could bind to malicious device nodes, thus making it
> possible to disable lockdown.
>
> Historically this interface has been used on the pseries platform to
> facilitate the runtime addition and removal of processor, memory, and
> device resources (aka Dynamic Logical Partitioning or DLPAR). Years
> ago, the processor and memory use cases were migrated to designs that
> happen to be lockdown-friendly: device tree updates are communicated
> directly to the kernel from firmware without passing through untrusted
> user space. I/O device DLPAR via the "drmgr" command in powerpc-utils
> remains the sole legitimate user of /proc/powerpc/ofdt, but it is
> already broken in lockdown since it uses /dev/mem to allocate argument
> buffers for the rtas syscall. So only illegitimate uses of the
> interface should see a behavior change when running on a locked down
> kernel.
>
> Signed-off-by: Nathan Lynch 
> ---
>  arch/powerpc/platforms/pseries/reconfig.c | 5 +
>  include/linux/security.h  | 1 +
>  security/security.c   | 1 +
>  3 files changed, 7 insertions(+)

Thanks for moving the definitions.

Acked-by: Paul Moore  (LSM)

> diff --git a/arch/powerpc/platforms/pseries/reconfig.c 
> b/arch/powerpc/platforms/pseries/reconfig.c
> index cad7a0c93117..599bd2c78514 100644
> --- a/arch/powerpc/platforms/pseries/reconfig.c
> +++ b/arch/powerpc/platforms/pseries/reconfig.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> @@ -361,6 +362,10 @@ static ssize_t ofdt_write(struct file *file, const char 
> __user *buf, size_t coun
> char *kbuf;
> char *tmp;
>
> +   rv = security_locked_down(LOCKDOWN_DEVICE_TREE);
> +   if (rv)
> +   return rv;
> +
> kbuf = memdup_user_nul(buf, count);
> if (IS_ERR(kbuf))
> return PTR_ERR(kbuf);
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 7bd0c490703d..39e7c0e403d9 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -114,6 +114,7 @@ enum lockdown_reason {
> LOCKDOWN_IOPORT,
> LOCKDOWN_MSR,
> LOCKDOWN_ACPI_TABLES,
> +   LOCKDOWN_DEVICE_TREE,
> LOCKDOWN_PCMCIA_CIS,
> LOCKDOWN_TIOCSSERIAL,
> LOCKDOWN_MODULE_PARAMETERS,
> diff --git a/security/security.c b/security/security.c
> index 4b95de24bc8d..51bf66d4f472 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -52,6 +52,7 @@ const char *const 
> lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
> [LOCKDOWN_IOPORT] = "raw io port access",
> [LOCKDOWN_MSR] = "raw MSR access",
> [LOCKDOWN_ACPI_TABLES] = "modifying ACPI tables",
> +   [LOCKDOWN_DEVICE_TREE] = "modifying device tree contents",
> [LOCKDOWN_PCMCIA_CIS] = "direct PCMCIA CIS storage",
> [LOCKDOWN_TIOCSSERIAL] = "reconfiguration of serial port IO",
> [LOCKDOWN_MODULE_PARAMETERS] = "unsafe module parameters",
> --
> 2.37.3
>


-- 
paul-moore.com
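For readers unfamiliar with how a new lockdown reason takes effect, here is a cut-down user-space model. It assumes the kernel's ordering, in which reasons listed before LOCKDOWN_INTEGRITY_MAX are refused once the integrity level is set and the remaining ones only at the confidentiality level; the enum is abbreviated and every MODEL_* / model_* name is hypothetical. LOCKDOWN_DEVICE_TREE sits next to LOCKDOWN_ACPI_TABLES in the patch, so it falls in the integrity group.

```c
#include <assert.h>
#include <errno.h>

/* Abbreviated, hypothetical copy of the lockdown_reason ordering. */
enum model_lockdown_reason {
	MODEL_LOCKDOWN_NONE,
	MODEL_LOCKDOWN_ACPI_TABLES,
	MODEL_LOCKDOWN_DEVICE_TREE,	/* the reason added by this patch */
	MODEL_LOCKDOWN_INTEGRITY_MAX,
	MODEL_LOCKDOWN_KCORE,
	MODEL_LOCKDOWN_CONFIDENTIALITY_MAX,
};

/* Current lockdown level; NONE means not locked down. */
static enum model_lockdown_reason kernel_locked_down = MODEL_LOCKDOWN_NONE;

/* Refuse a reason once the current level is at or above it. */
static int model_security_locked_down(enum model_lockdown_reason what)
{
	assert(what != MODEL_LOCKDOWN_NONE);
	return kernel_locked_down >= what ? -EPERM : 0;
}

/* Stand-in for the early check added to ofdt_write(). */
static int model_ofdt_write(void)
{
	int rv = model_security_locked_down(MODEL_LOCKDOWN_DEVICE_TREE);

	if (rv)
		return rv;
	/* ... copy in and apply the device tree update here ... */
	return 0;
}
```

Under this model, integrity mode blocks /proc/powerpc/ofdt writes but still allows confidentiality-only reasons such as KCORE; confidentiality mode blocks both.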

Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-26 Thread John Hubbard

On 9/26/22 14:35, Lyude Paul wrote:
>> +for (i = 0; i < npages; i++) {
>> +if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>> +struct page *dpage;
>> +
>> +/*
>> + * _GFP_NOFAIL because the GPU is going away and there
>> + * is nothing sensible we can do if we can't copy the
>> + * data back.
>> + */
> 
> You'll have to excuse me for a moment since this area of nouveau isn't one of
> my strong points, but are we sure about this? IIRC __GFP_NOFAIL means infinite
> retry; in the case of a GPU hotplug event I would assume we would rather just
> stop trying to migrate things to the GPU and just drop the data instead of
> hanging on infinite retries.
> 
Hi Lyude!

Actually, I really think it's better in this case to keep trying
(presumably not necessarily infinitely, but only until memory becomes
available), rather than failing out and corrupting data.

That's because I'm not sure it's completely clear that this memory is
discardable. And at some point, we're going to make this all work with
file-backed memory, which will *definitely* not be discardable--I
realize that we're not there yet, of course.

But here, it's reasonable to commit to just retrying indefinitely,
really. Memory should eventually show up. And if it doesn't, then
restarting the machine is better than corrupting data, generally.

thanks,

-- 
John Hubbard
NVIDIA

[PATCH v3] powerpc/smp: poll cpu_callin_map more aggressively in __cpu_up()

2022-09-26 Thread Nathan Lynch

At boot time, it is not necessary to delay between polls of
cpu_callin_map when waiting for a kicked CPU to come up. Remove the
delay intervals, but preserve the overall deadline (five seconds).

At run time, the first poll result is usually negative and we incur a
sleeping wait. If we spin on the callin word for a short time first,
we can reduce the time spent in __cpu_up() from dozens of milliseconds
to under 1ms in the common case on a P9 LPAR:

$ ppc64_cpu --smt=off
$ bpftrace -e 'kprobe:__cpu_up {
 @start[tid] = nsecs;
   }
   kretprobe:__cpu_up /@start[tid]/ {
 @us = hist((nsecs - @start[tid]) / 1000);
 delete(@start[tid]);
   }' -c 'ppc64_cpu --smt=on'

Before:

@us:
[16K, 32K)85 ||
[32K, 64K)13 |@@@ |

After:

@us:
[128, 256)95 ||
[256, 512) 3 |@   |

Signed-off-by: Nathan Lynch 
---

Notes:
Changes since v2:
* Use short optimistic spin for hotplug case and fall back to sleeping
  loop.
* Preserve original deadline for hotplug case, which was effectively
  100 seconds as coded.
* Improve benchmark by timing __cpu_up() duration directly.

Changes since v1:
* Do not poll indefinitely; restore the original 5sec timeout
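The "effectively 100 seconds" figure follows from jiffy rounding. Assuming msleep(ms) sleeps for at least msecs_to_jiffies(ms) + 1 jiffies and HZ=100, each msleep(1) in the old loop lasts at least 2 jiffies (20ms), so 5000 iterations take at least 100 s when the CPU never calls in. A small model of that arithmetic, simplified for HZ values that divide 1000 (the model_* names are hypothetical):

```c
#include <assert.h>

/* Round milliseconds up to whole jiffies (HZ assumed to divide 1000). */
static unsigned long model_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}

/* Lower bound on msleep(ms): msecs_to_jiffies(ms) + 1 jiffies, in ms. */
static unsigned long model_msleep_min_ms(unsigned long ms, unsigned long hz)
{
	assert(hz > 0 && 1000 % hz == 0);
	return (model_msecs_to_jiffies(ms, hz) + 1) * 1000 / hz;
}

/* Lower bound on the old hotplug loop: 5000 iterations of msleep(1). */
static unsigned long old_hotplug_wait_ms(unsigned long hz)
{
	return 5000 * model_msleep_min_ms(1, hz);
}
```

With HZ=1000 the same loop would still last at least 10 s, so the old "Wait five seconds" comment understated the wait either way.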

 arch/powerpc/kernel/smp.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 169703fead57..b7ce46bbc6f1 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1257,7 +1257,12 @@ static void cpu_idle_thread_init(unsigned int cpu, 
struct task_struct *idle)
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
-   int rc, c;
+   const unsigned long boot_spin_ms = 5 * MSEC_PER_SEC;
+   const bool booting = system_state < SYSTEM_RUNNING;
+   const unsigned long hp_spin_ms = 1;
+   unsigned long deadline;
+   int rc;
+   const unsigned long spin_wait_ms = booting ? boot_spin_ms : hp_spin_ms;
 
/*
 * Don't allow secondary threads to come online if inhibited
@@ -1302,22 +1307,23 @@ int __cpu_up(unsigned int cpu, struct task_struct 
*tidle)
}
 
/*
-* wait to see if the cpu made a callin (is actually up).
-* use this value that I found through experimentation.
-* -- Cort
+* At boot time, simply spin on the callin word until the
+* deadline passes.
+*
+* At run time, spin for an optimistic amount of time to avoid
+* sleeping in the common case.
 */
-   if (system_state < SYSTEM_RUNNING)
-   for (c = 5; c && !cpu_callin_map[cpu]; c--)
-   udelay(100);
-#ifdef CONFIG_HOTPLUG_CPU
-   else
-   /*
-* CPUs can take much longer to come up in the
-* hotplug case.  Wait five seconds.
-*/
-   for (c = 5000; c && !cpu_callin_map[cpu]; c--)
-   msleep(1);
-#endif
+   deadline = jiffies + msecs_to_jiffies(spin_wait_ms);
+   spin_until_cond(cpu_callin_map[cpu] || 
time_is_before_jiffies(deadline));
+
+   if (!cpu_callin_map[cpu] && system_state >= SYSTEM_RUNNING) {
+   const unsigned long sleep_interval_us = 10 * USEC_PER_MSEC;
+   const unsigned long sleep_wait_ms = 100 * MSEC_PER_SEC;
+
+   deadline = jiffies + msecs_to_jiffies(sleep_wait_ms);
+   while (!cpu_callin_map[cpu] && time_is_after_jiffies(deadline))
+   fsleep(sleep_interval_us);
+   }
 
if (!cpu_callin_map[cpu]) {
printk(KERN_ERR "Processor %u is stuck.\n", cpu);
-- 
2.37.1
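The spin-then-sleep structure of the new __cpu_up() code can be sketched in user space as below. It substitutes an atomic flag for cpu_callin_map[cpu], CLOCK_MONOTONIC milliseconds for jiffies, and nanosleep() for fsleep(); wait_for_callin() and its parameters are hypothetical names. The values in the patch are 5 s of spinning at boot, and at run time 1 ms of spinning followed by up to 100 s of 10 ms sleeps.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
}

/*
 * Phase 1: busy-poll the flag until spin_ms elapses.
 * Phase 2 (run time only): sleep in sleep_interval_us steps for up to sleep_ms.
 */
static bool wait_for_callin(atomic_bool *callin, bool booting, long spin_ms,
			    long sleep_ms, long sleep_interval_us)
{
	long long deadline = now_ms() + spin_ms;

	assert(spin_ms >= 0 && sleep_ms >= 0 && sleep_interval_us > 0);
	while (!atomic_load(callin) && now_ms() < deadline)
		; /* the kernel uses spin_until_cond() here */

	if (!atomic_load(callin) && !booting) {
		struct timespec nap = {
			.tv_sec = sleep_interval_us / 1000000,
			.tv_nsec = (sleep_interval_us % 1000000) * 1000,
		};

		deadline = now_ms() + sleep_ms;
		while (!atomic_load(callin) && now_ms() < deadline)
			nanosleep(&nap, NULL);
	}
	return atomic_load(callin);
}
```

For example, wait_for_callin(&flag, false, 1, 100000, 10000) mirrors the run-time parameters; at boot there is no sleeping phase, so a stuck CPU is reported as soon as the spin deadline passes.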

Re: [PATCH v2] powerpc: Ignore DSI error caused by the copy/paste instruction

2022-09-26 Thread Haren Myneni

On Mon, 2022-09-26 at 05:55 +, Christophe Leroy wrote:
> 
> Le 25/09/2022 à 22:26, Haren Myneni a écrit :
> > DSI error will be generated when the paste operation is issued on
> > the suspended NX window due to NX state changes. The hypervisor
> > expects the partition to ignore this error during page fault
> > handling. To differentiate DSI caused by an actual HW configuration
> > or by the NX window, a new “ibm,pi-features” type value is defined.
> > Byte 0, bit 3 of pi-attribute-specifier-type is now defined to
> > indicate this DSI error. If this error is not ignored, the user
> > space can get SIGBUS when the NX request is issued.
> 
> It would be nice to mention at least once in the message that NX
> stands for nest accelerator.
> 
> Otherwise, that's confusing with, for example:
> Commit 2e602847d9c2 ("KVM: PPC: Don't flush PTEs on NX/RO hit")
> Commit c49643319715 ("powerpc/32s: Only leave NX unset on segments used for modules")

Thanks. I did not realize this, since the VAS/NX code was added earlier. I will
add the description as you suggested. 

> 
> 
> > This patch adds changes to read the ibm,pi-features property and ignore
> > the DSI error in page fault handling if CPU_FTR_NX_DSI is defined.
> > 
> > Signed-off-by: Haren Myneni 
> > ---
> > v2: Code cleanup as suggested by Christophe Leroy
> > 
> >   arch/powerpc/include/asm/cputable.h |  5 ++--
> >   arch/powerpc/kernel/prom.c  | 36 +---
> > -
> >   arch/powerpc/mm/fault.c | 17 +-
> >   3 files changed, 45 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index 014005428687..cb949f12baa9 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -367,7 +367,22 @@ static void sanity_check_fault(bool is_write,
> > bool is_user,
> >   #elif defined(CONFIG_PPC_8xx)
> >   #define page_fault_is_bad(__err)  ((__err) & DSISR_NOEXEC_OR_G)
> >   #elif defined(CONFIG_PPC64)
> > -#define page_fault_is_bad(__err)   ((__err) & DSISR_BAD_FAULT_64S)
> > +static int page_fault_is_bad(unsigned long err)
> > +{
> > +   unsigned long flag = DSISR_BAD_FAULT_64S;
> > +
> > +   /*
> > +* PAPR 14.15.3.4.1
> > +* If byte 0, bit 3 of pi-attribute-specifier-type in
> > +* ibm,pi-features property is defined, ignore the DSI error
> > +* which is caused by the paste instruction on the
> > +* suspended NX window.
> > +*/
> > +   if (cpu_has_feature(CPU_FTR_NX_DSI))
> > +   flag &= ~DSISR_BAD_COPYPASTE;
> > +
> > +   return (err & flag);
> 
> You don't need the parentheses ( )
> 
> > +}
> >   #else
> >   #define page_fault_is_bad(__err)  ((__err) & DSISR_BAD_FAULT_32S)
> >   #endif
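The effect of the new page_fault_is_bad() can be checked with a user-space model: when the feature bit is present, the copy/paste bit is cleared from the mask of "bad" DSISR bits, so a paste on a suspended NX window no longer counts as a bad fault, while every other bad bit still does. The bit values and the cpu_has_nx_dsi flag below are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real ones live in the kernel headers. */
#define MODEL_DSISR_BAD_COPYPASTE   0x00000008UL
#define MODEL_DSISR_OTHER_BAD_BITS  0x00000030UL /* hypothetical */
#define MODEL_DSISR_BAD_FAULT_64S \
	(MODEL_DSISR_BAD_COPYPASTE | MODEL_DSISR_OTHER_BAD_BITS)

/* Stands in for cpu_has_feature(CPU_FTR_NX_DSI). */
static bool cpu_has_nx_dsi;

static unsigned long model_page_fault_is_bad(unsigned long err)
{
	unsigned long flag = MODEL_DSISR_BAD_FAULT_64S;

	/* With the ibm,pi-features bit set, a copy/paste DSI is not "bad". */
	if (cpu_has_nx_dsi)
		flag &= ~MODEL_DSISR_BAD_COPYPASTE;

	return err & flag;
}
```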

Re: [PATCH V2] tools/perf/tests: Fix perf probe error log check in skip_if_no_debuginfo

2022-09-26 Thread Arnaldo Carvalho de Melo

Em Fri, Sep 16, 2022 at 06:35:41PM +0530, kajoljain escreveu:
> 
> 
> On 9/16/22 16:19, Athira Rajeev wrote:
> > The perf probe related tests like probe_vfs_getname.sh which
> > is in "tools/perf/tests/shell" directory have dependency on
> > debuginfo information in the kernel. Currently debuginfo
> > check is handled by skip_if_no_debuginfo function in the
> > file "lib/probe_vfs_getname.sh". skip_if_no_debuginfo function
> > looks for this specific error log from perf probe to skip
> > the testcase:
> > 
> > <<>>
> > Failed to find the path for the kernel|Debuginfo-analysis is
> > not supported
> > <<>>
> > 
> > But in some cases, like this one on powerpc, the error log
> > observed while running this test is:
> > 
> > <<>>
> > The /lib/modules//build/vmlinux file has no debug information.
> > Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo
> > package.
> >   Error: Failed to add events.
> > <<>>
> > 
> > Update the skip_if_no_debuginfo function to include the above
> > error, to skip the test in these scenarios too.
> 
> Patch looks good to me.
> 
> Reviewed-By: Kajol Jain 

Thanks, applied.

- Arnaldo

 
> Thanks,
> Kajol Jain
> 
> > 
> > Reported-by: Disha Goel 
> > Signed-off-by: Athira Rajeev 
> > ---
> > changelog:
> >  v1 -> v2:
> >  Corrected formatting of spaces in error log.
> >  With spaces in v1 of the patch, the egrep search was
> >  considering spaces also.
> > 
> >  tools/perf/tests/shell/lib/probe_vfs_getname.sh | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/tests/shell/lib/probe_vfs_getname.sh 
> > b/tools/perf/tests/shell/lib/probe_vfs_getname.sh
> > index 5b17d916c555..b616d42bd19d 100644
> > --- a/tools/perf/tests/shell/lib/probe_vfs_getname.sh
> > +++ b/tools/perf/tests/shell/lib/probe_vfs_getname.sh
> > @@ -19,6 +19,6 @@ add_probe_vfs_getname() {
> >  }
> >  
> >  skip_if_no_debuginfo() {
> > -   add_probe_vfs_getname -v 2>&1 | egrep -q "^(Failed to find the path for 
> > the kernel|Debuginfo-analysis is not supported)" && return 2
> > +   add_probe_vfs_getname -v 2>&1 | egrep -q "^(Failed to find the path for 
> > the kernel|Debuginfo-analysis is not supported)|(file has no debug 
> > information)" && return 2
> > return 1
> >  }

-- 

- Arnaldo

[PATCH net-next v5 1/9] dt-bindings: net: Expand pcs-handle to an array

2022-09-26 Thread Sean Anderson

This allows multiple phandles to be specified for pcs-handle, such as
when multiple PCSs are present for a single MAC. To differentiate
between them, also add a pcs-handle-names property.

Signed-off-by: Sean Anderson 
---
This was previously submitted as [1]. I expect to update this series
more, so I have moved it here. Changes from that version include:
- Add maxItems to existing bindings
- Add a dependency from pcs-names to pcs-handle.

[1] 
https://lore.kernel.org/netdev/20220711160519.741990-3-sean.ander...@seco.com/

(no changes since v4)

Changes in v4:
- Use pcs-handle-names instead of pcs-names, as discussed

Changes in v3:
- New

 .../bindings/net/dsa/renesas,rzn1-a5psw.yaml   |  1 +
 .../devicetree/bindings/net/ethernet-controller.yaml   | 10 +-
 .../devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml|  2 +-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml 
b/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml
index 7ca9c19a157c..a53552ee1d0e 100644
--- a/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml
+++ b/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml
@@ -74,6 +74,7 @@ properties:
 
 properties:
   pcs-handle:
+maxItems: 1
 description:
   phandle pointing to a PCS sub-node compatible with
   renesas,rzn1-miic.yaml#
diff --git a/Documentation/devicetree/bindings/net/ethernet-controller.yaml 
b/Documentation/devicetree/bindings/net/ethernet-controller.yaml
index 4b3c590fcebf..5bb2ec2963cf 100644
--- a/Documentation/devicetree/bindings/net/ethernet-controller.yaml
+++ b/Documentation/devicetree/bindings/net/ethernet-controller.yaml
@@ -108,11 +108,16 @@ properties:
 $ref: "#/properties/phy-connection-type"
 
   pcs-handle:
-$ref: /schemas/types.yaml#/definitions/phandle
+$ref: /schemas/types.yaml#/definitions/phandle-array
 description:
   Specifies a reference to a node representing a PCS PHY device on a MDIO
   bus to link with an external PHY (phy-handle) if exists.
 
+  pcs-handle-names:
+$ref: /schemas/types.yaml#/definitions/string-array
+description:
+  The name of each PCS in pcs-handle.
+
   phy-handle:
 $ref: /schemas/types.yaml#/definitions/phandle
 description:
@@ -216,6 +221,9 @@ properties:
 required:
   - speed
 
+dependencies:
+  pcs-handle-names: [pcs-handle]
+
 allOf:
   - if:
   properties:
diff --git a/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml 
b/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml
index 7f620a71a972..600240281e8c 100644
--- a/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml
+++ b/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml
@@ -31,7 +31,7 @@ properties:
   phy-mode: true
 
   pcs-handle:
-$ref: /schemas/types.yaml#/definitions/phandle
+maxItems: 1
 description:
   A reference to a node representing a PCS PHY device found on
   the internal MDIO bus.
-- 
2.35.1.1320.gc452695387.dirty
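With pcs-handle now an array, a consumer pairs each entry of pcs-handle-names with the phandle at the same index, for example pcs-handle-names = "sgmii", "qsgmii", "xfi" in the LS1046 DTS from this series. A user-space sketch of that lookup follows; model_match_pcs_name() is hypothetical, while in-kernel code would typically use of_property_match_string() on the device node.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the index of "wanted" in a pcs-handle-names-style list, i.e. the
 * index of the matching phandle in pcs-handle, or -1 if it is absent.
 */
static int model_match_pcs_name(const char *const *names, int count,
				const char *wanted)
{
	assert(names != NULL || count == 0);
	for (int i = 0; i < count; i++)
		if (!strcmp(names[i], wanted))
			return i;
	return -1;
}
```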

[PATCH net-next v5 9/9] arm64: dts: layerscape: Add nodes for QSGMII PCSs

2022-09-26 Thread Sean Anderson

Now that we actually read registers from QSGMII PCSs, it's important
that we have the correct address (instead of hoping that we're the MAC
with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII
PCSs.  The exact mapping of QSGMII to MACs depends on the SoC.

Since the first QSGMII PCSs share an address with the SGMII and XFI
PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts
on the bus.

Signed-off-by: Sean Anderson 
---

(no changes since v3)

Changes in v3:
- Split this patch off from the previous one

Changes in v2:
- New

 .../boot/dts/freescale/fsl-ls1043-post.dtsi   | 24 ++
 .../boot/dts/freescale/fsl-ls1046-post.dtsi   | 25 +++
 2 files changed, 49 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043-post.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1043-post.dtsi
index d237162a8744..5c4d7eef8b61 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1043-post.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1043-post.dtsi
@@ -24,9 +24,12 @@  {
 
/* these aliases provide the FMan ports mapping */
enet0: ethernet@e {
+   pcs-handle-names = "qsgmii";
};
 
enet1: ethernet@e2000 {
+   pcsphy-handle = <>, <_pcs1>;
+   pcs-handle-names = "sgmii", "qsgmii";
};
 
enet2: ethernet@e4000 {
@@ -36,11 +39,32 @@ enet3: ethernet@e6000 {
};
 
enet4: ethernet@e8000 {
+   pcsphy-handle = <>, <_pcs2>;
+   pcs-handle-names = "sgmii", "qsgmii";
};
 
enet5: ethernet@ea000 {
+   pcsphy-handle = <>, <_pcs3>;
+   pcs-handle-names = "sgmii", "qsgmii";
};
 
enet6: ethernet@f {
};
+
+   mdio@e1000 {
+   qsgmiib_pcs1: ethernet-pcs@1 {
+   compatible = "fsl,lynx-pcs";
+   reg = <0x1>;
+   };
+
+   qsgmiib_pcs2: ethernet-pcs@2 {
+   compatible = "fsl,lynx-pcs";
+   reg = <0x2>;
+   };
+
+   qsgmiib_pcs3: ethernet-pcs@3 {
+   compatible = "fsl,lynx-pcs";
+   reg = <0x3>;
+   };
+   };
 };
diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046-post.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046-post.dtsi
index d6caaea57d90..4e3345093943 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046-post.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046-post.dtsi
@@ -23,6 +23,8 @@  {
  {
/* these aliases provide the FMan ports mapping */
enet0: ethernet@e {
+   pcsphy-handle = <_pcs3>;
+   pcs-handle-names = "qsgmii";
};
 
enet1: ethernet@e2000 {
@@ -35,14 +37,37 @@ enet3: ethernet@e6000 {
};
 
enet4: ethernet@e8000 {
+   pcsphy-handle = <>, <_pcs1>;
+   pcs-handle-names = "sgmii", "qsgmii";
};
 
enet5: ethernet@ea000 {
+   pcsphy-handle = <>, <>;
+   pcs-handle-names = "sgmii", "qsgmii";
};
 
enet6: ethernet@f {
};
 
enet7: ethernet@f2000 {
+   pcsphy-handle = <>, <_pcs2>, <>;
+   pcs-handle-names = "sgmii", "qsgmii", "xfi";
+   };
+
+   mdio@eb000 {
+   qsgmiib_pcs1: ethernet-pcs@1 {
+   compatible = "fsl,lynx-pcs";
+   reg = <0x1>;
+   };
+
+   qsgmiib_pcs2: ethernet-pcs@2 {
+   compatible = "fsl,lynx-pcs";
+   reg = <0x2>;
+   };
+
+   qsgmiib_pcs3: ethernet-pcs@3 {
+   compatible = "fsl,lynx-pcs";
+   reg = <0x3>;
+   };
};
 };

[PATCH net-next v5 8/9] powerpc: dts: qoriq: Add nodes for QSGMII PCSs

2022-09-26 Thread Sean Anderson

Now that we actually read registers from QSGMII PCSs, it's important
that we have the correct address (instead of hoping that we're the MAC
with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII
PCSs. They have the same addresses on all SoCs (e.g. if QSGMIIA is
present it's used for MACs 1 through 4).

Since the first QSGMII PCSs share an address with the SGMII and XFI
PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts
on the bus.

Signed-off-by: Sean Anderson 
---

(no changes since v4)

Changes in v4:
- Add XFI PCS for t208x MAC1/MAC2

Changes in v3:
- Add compatibles for QSGMII PCSs
- Split arm and powerpc dts updates

Changes in v2:
- New

 .../boot/dts/fsl/qoriq-fman3-0-10g-0-best-effort.dtsi  |  3 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi | 10 +-
 .../boot/dts/fsl/qoriq-fman3-0-10g-1-best-effort.dtsi  | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi |  3 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi |  3 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-0.dtsi  |  3 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-1.dtsi  | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-2.dtsi  | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-3.dtsi  | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-4.dtsi  |  3 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-5.dtsi  | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1-10g-0.dtsi | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1-10g-1.dtsi | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-0.dtsi  |  3 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-1.dtsi  | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-2.dtsi  | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-3.dtsi  | 10 +-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-4.dtsi  |  3 ++-
 arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-5.dtsi  | 10 +-
 20 files changed, 131 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0-best-effort.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0-best-effort.dtsi
index baa0c503e741..7e70977f282a 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0-best-effort.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0-best-effort.dtsi
@@ -55,7 +55,8 @@ ethernet@e {
reg = <0xe 0x1000>;
fsl,fman-ports = <_rx_0x08 _tx_0x28>;
ptp-timer = <_timer0>;
-   pcsphy-handle = <>;
+   pcsphy-handle = <>, <>;
+   pcs-handle-names = "sgmii", "qsgmii";
};
 
mdio@e1000 {
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi
index 93095600e808..5f89f7c1761f 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi
@@ -52,7 +52,15 @@ ethernet@f {
compatible = "fsl,fman-memac";
reg = <0xf 0x1000>;
fsl,fman-ports = <_rx_0x10 _tx_0x30>;
-   pcsphy-handle = <>;
+   pcsphy-handle = <>, <_pcs2>, <>;
+   pcs-handle-names = "sgmii", "qsgmii", "xfi";
+   };
+
+   mdio@e9000 {
+   qsgmiib_pcs2: ethernet-pcs@2 {
+   compatible = "fsl,lynx-pcs";
+   reg = <2>;
+   };
};
 
mdio@f1000 {
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1-best-effort.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1-best-effort.dtsi
index ff4bd38f0645..71eb75e82c2e 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1-best-effort.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1-best-effort.dtsi
@@ -55,7 +55,15 @@ ethernet@e2000 {
reg = <0xe2000 0x1000>;
fsl,fman-ports = <_rx_0x09 _tx_0x29>;
ptp-timer = <_timer0>;
-   pcsphy-handle = <>;
+   pcsphy-handle = <>, <_pcs1>;
+   pcs-handle-names = "sgmii", "qsgmii";
+   };
+
+   mdio@e1000 {
+   qsgmiia_pcs1: ethernet-pcs@1 {
+   compatible = "fsl,lynx-pcs";
+   reg = <1>;
+   };
};
 
mdio@e3000 {
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi
index 1fa38ed6f59e..fb7032ddb7fc 100644
--- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi
@@ -52,7 +52,15 @@ ethernet@f2000 {
compatible = "fsl,fman-memac";
reg = <0xf2000 0x1000>;
fsl,fman-ports = <_rx_0x11 _tx_0x31>;
-   pcsphy-handle = <>;
+   pcsphy-handle = <>, <_pcs3>, <>;
+

[PATCH net-next v5 7/9] powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G

2022-09-26 Thread Sean Anderson

On the T208X SoCs, MAC1 and MAC2 support XGMII. Add some new MAC dtsi
fragments, and mark the QMAN ports as 10G.

Fixes: da414bb923d9 ("powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)")
Signed-off-by: Sean Anderson 
---

(no changes since v4)

Changes in v4:
- New

 .../boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi | 44 +++
 .../boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi | 44 +++
 arch/powerpc/boot/dts/fsl/t2081si-post.dtsi   |  4 +-
 3 files changed, 90 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi
 create mode 100644 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi

diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi
new file mode 100644
index ..437dab3fc017
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0-or-later
+/*
+ * QorIQ FMan v3 10g port #2 device tree stub [ controller @ offset 0x40 ]
+ *
+ * Copyright 2022 Sean Anderson 
+ * Copyright 2012 - 2015 Freescale Semiconductor Inc.
+ */
+
+fman@40 {
+   fman0_rx_0x08: port@88000 {
+   cell-index = <0x8>;
+   compatible = "fsl,fman-v3-port-rx";
+   reg = <0x88000 0x1000>;
+   fsl,fman-10g-port;
+   };
+
+   fman0_tx_0x28: port@a8000 {
+   cell-index = <0x28>;
+   compatible = "fsl,fman-v3-port-tx";
+   reg = <0xa8000 0x1000>;
+   fsl,fman-10g-port;
+   };
+
+   ethernet@e {
+   cell-index = <0>;
+   compatible = "fsl,fman-memac";
+   reg = <0xe 0x1000>;
+   fsl,fman-ports = <_rx_0x08 _tx_0x28>;
+   ptp-timer = <_timer0>;
+   pcsphy-handle = <>;
+   };
+
+   mdio@e1000 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "fsl,fman-memac-mdio", "fsl,fman-xmdio";
+   reg = <0xe1000 0x1000>;
+   fsl,erratum-a011043; /* must ignore read errors */
+
+   pcsphy0: ethernet-phy@0 {
+   reg = <0x0>;
+   };
+   };
+};
diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi
new file mode 100644
index ..ad116b17850a
--- /dev/null
+++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0-or-later
+/*
+ * QorIQ FMan v3 10g port #3 device tree stub [ controller @ offset 0x40 ]
+ *
+ * Copyright 2022 Sean Anderson 
+ * Copyright 2012 - 2015 Freescale Semiconductor Inc.
+ */
+
+fman@40 {
+   fman0_rx_0x09: port@89000 {
+   cell-index = <0x9>;
+   compatible = "fsl,fman-v3-port-rx";
+   reg = <0x89000 0x1000>;
+   fsl,fman-10g-port;
+   };
+
+   fman0_tx_0x29: port@a9000 {
+   cell-index = <0x29>;
+   compatible = "fsl,fman-v3-port-tx";
+   reg = <0xa9000 0x1000>;
+   fsl,fman-10g-port;
+   };
+
+   ethernet@e2000 {
+   cell-index = <1>;
+   compatible = "fsl,fman-memac";
+   reg = <0xe2000 0x1000>;
+   fsl,fman-ports = <_rx_0x09 _tx_0x29>;
+   ptp-timer = <_timer0>;
+   pcsphy-handle = <>;
+   };
+
+   mdio@e3000 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "fsl,fman-memac-mdio", "fsl,fman-xmdio";
+   reg = <0xe3000 0x1000>;
+   fsl,erratum-a011043; /* must ignore read errors */
+
+   pcsphy1: ethernet-phy@0 {
+   reg = <0x0>;
+   };
+   };
+};
diff --git a/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi b/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi
index ecbb447920bc..74e17e134387 100644
--- a/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi
+++ b/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi
@@ -609,8 +609,8 @@ usb1: usb@211000 {
 /include/ "qoriq-bman1.dtsi"
 
 /include/ "qoriq-fman3-0.dtsi"
-/include/ "qoriq-fman3-0-1g-0.dtsi"
-/include/ "qoriq-fman3-0-1g-1.dtsi"
+/include/ "qoriq-fman3-0-10g-2.dtsi"
+/include/ "qoriq-fman3-0-10g-3.dtsi"
 /include/ "qoriq-fman3-0-1g-2.dtsi"
 /include/ "qoriq-fman3-0-1g-3.dtsi"
 /include/ "qoriq-fman3-0-1g-4.dtsi"

[PATCH net-next v5 6/9] net: dpaa: Convert to phylink

2022-09-26 Thread Sean Anderson

This converts DPAA to phylink. All MACs are converted. This should work
with no device tree modifications (including those made in this series),
except for QSGMII (as noted previously).

The mEMAC configuration is one of the trickier areas. I have tried to
capture all the restrictions across the various models. Most of the time,
we assume that if the serdes supports a mode or the phy-interface-mode
specifies it, then we support it. The only place we can't do this is
(RG)MII, since there's no serdes. In that case, we rely on a (new)
devicetree property. There are also several cases where half-duplex is
broken. Unfortunately, only a single compatible is used for the MAC, so we
have to use the board compatible instead.
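A rough user-space sketch of the mEMAC mode check described above, assuming the driver's knowledge can be reduced to a few flags. `struct memac_info`, `memac_mode_ok()` and `memac_half_duplex_ok()` are hypothetical names, not the driver's code, and `enum iface` is a cut-down stand-in for `phy_interface_t`:

```c
#include <stdbool.h>

/* Hypothetical stand-in for phy_interface_t, limited to a few modes. */
enum iface {
	IFACE_MII,
	IFACE_RGMII,
	IFACE_SGMII,
	IFACE_QSGMII,
	IFACE_XGMII,
	IFACE_COUNT,
};

/* Hypothetical summary of what the driver knows about one mEMAC. */
struct memac_info {
	enum iface dt_mode;           /* phy-connection-type from the DT */
	bool serdes_can[IFACE_COUNT]; /* SerDes reports support for a mode */
	bool rgmii_capable;           /* stand-in for the DT property */
	bool half_duplex_broken;      /* derived from the board compatible */
};

/*
 * A SerDes mode is allowed if the DT names it or the SerDes supports it.
 * (RG)MII has no SerDes to ask, so only the DT property can enable it.
 */
static bool memac_mode_ok(const struct memac_info *m, enum iface mode)
{
	if (mode == IFACE_MII || mode == IFACE_RGMII)
		return m->rgmii_capable;
	return mode == m->dt_mode || m->serdes_can[mode];
}

/* Boards known (by compatible) to have broken RGMII half duplex. */
static bool memac_half_duplex_ok(const struct memac_info *m, enum iface mode)
{
	return !(mode == IFACE_RGMII && m->half_duplex_broken);
}
```

Per the v2 changelog, the patch folds this kind of logic into a custom phylink validate hook, so the split into two predicates here is illustrative only.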

The 10GEC conversion is very straightforward, since it only supports XAUI.
There is generally nothing to configure.

The dTSEC conversion is broadly similar to mEMAC, but is simpler because we
don't support configuring the SerDes (though this can be easily added) and
we don't have multiple PCSs. From what I can tell, there's nothing
different in the driver or documentation between SGMII and 1000BASE-X
except for the advertising. Similarly, I couldn't find anything about
2500BASE-X. In both cases, I treat them like SGMII. These modes aren't used
by any in-tree boards. Likewise, despite QSGMII being mentioned in the driver, I
couldn't find any documented SoCs which support it, so I have left it
unimplemented for now.
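For illustration only, the dTSEC handling of SerDes modes described above can be summarised as a predicate. This is a hypothetical sketch, assuming `enum iface` stands in for `phy_interface_t`; `dtsec_serdes_mode_ok()` is not a function from the driver:

```c
#include <stdbool.h>

/* Hypothetical stand-in for phy_interface_t (SerDes modes only). */
enum iface {
	IFACE_SGMII,
	IFACE_1000BASEX,
	IFACE_2500BASEX,
	IFACE_QSGMII,
};

/*
 * SGMII, 1000BASE-X and 2500BASE-X are all handled like SGMII.
 * QSGMII is not implemented, so it is refused.
 */
static bool dtsec_serdes_mode_ok(enum iface mode)
{
	switch (mode) {
	case IFACE_SGMII:
	case IFACE_1000BASEX:
	case IFACE_2500BASEX:
		return true;
	case IFACE_QSGMII:
	default:
		return false;
	}
}
```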

10GEC and dTSEC have not been tested at all. I would greatly appreciate it if
someone could try them out.

Signed-off-by: Sean Anderson 
---
This has been tested on an LS1046ARDB.

With managed=phy, I was unable to get the interfaces to come up at all,
hence the default to in-band.

(no changes since v3)

Changes in v3:
- Remove _return label from memac_initialization in favor of returning
  directly
- Fix grabbing the default PCS not checking for -ENODATA from
  of_property_match_string
- Set DTSEC_ECNTRL_R100M in dtsec_link_up instead of dtsec_mac_config
- Remove rmii/mii properties

Changes in v2:
- Remove unused variable slow_10g_if
- Restrict valid link modes based on the phy interface. This is easier
  to set up, and mostly captures what I intended to do the first time.
  We now have a custom validate which restricts half-duplex for some SoCs
  for RGMII, but generally just uses the default phylink validate.
- Configure the SerDes in enable/disable
- Properly implement all ethtool ops and ioctls. These were mostly
  stubbed out just enough to compile last time.
- Convert 10GEC and dTSEC as well

 drivers/net/ethernet/freescale/dpaa/Kconfig   |   4 +-
 .../net/ethernet/freescale/dpaa/dpaa_eth.c|  89 +--
 .../ethernet/freescale/dpaa/dpaa_ethtool.c|  90 +--
 drivers/net/ethernet/freescale/fman/Kconfig   |   1 -
 .../net/ethernet/freescale/fman/fman_dtsec.c  | 459 +++---
 .../net/ethernet/freescale/fman/fman_mac.h|  10 -
 .../net/ethernet/freescale/fman/fman_memac.c  | 578 +-
 .../net/ethernet/freescale/fman/fman_tgec.c   | 131 ++--
 drivers/net/ethernet/freescale/fman/mac.c | 168 +
 drivers/net/ethernet/freescale/fman/mac.h |  23 +-
 10 files changed, 629 insertions(+), 924 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/Kconfig b/drivers/net/ethernet/freescale/dpaa/Kconfig
index 0e1439fd00bd..2b560661c82a 100644
--- a/drivers/net/ethernet/freescale/dpaa/Kconfig
+++ b/drivers/net/ethernet/freescale/dpaa/Kconfig
@@ -2,8 +2,8 @@
 menuconfig FSL_DPAA_ETH
tristate "DPAA Ethernet"
depends on FSL_DPAA && FSL_FMAN
-   select PHYLIB
-   select FIXED_PHY
+   select PHYLINK
+   select PCS_LYNX
help
  Data Path Acceleration Architecture Ethernet driver,
  supporting the Freescale QorIQ chips.
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 0a180d17121c..262a2558353b 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -264,8 +264,19 @@ static int dpaa_netdev_init(struct net_device *net_dev,
net_dev->needed_headroom = priv->tx_headroom;
net_dev->watchdog_timeo = msecs_to_jiffies(tx_timeout);
 
-   mac_dev->net_dev = net_dev;
+   /* The rest of the config is filled in by the mac device already */
+   mac_dev->phylink_config.dev = _dev->dev;
+   mac_dev->phylink_config.type = PHYLINK_NETDEV;
mac_dev->update_speed = dpaa_eth_cgr_set_speed;
+   mac_dev->phylink = phylink_create(_dev->phylink_config,
+ dev_fwnode(mac_dev->dev),
+ mac_dev->phy_if,
+ mac_dev->phylink_ops);
+   if (IS_ERR(mac_dev->phylink)) {
+   err = PTR_ERR(mac_dev->phylink);
+   dev_err_probe(dev, err, "Could not create phylink\n");
+   return err;
+   }
 
/* start without

[PATCH net-next v5 5/9] net: fman: memac: Use lynx pcs driver

2022-09-26 Thread Sean Anderson

Although not stated in the datasheet, as far as I can tell the PCS for mEMACs
is a Lynx PCS. By reusing the existing driver, we can remove the PCS
management code from the memac driver. This requires calling some PCS
functions manually which phylink would usually do for us, but we will let
it do that soon.

One problem is that we don't actually have a PCS for QSGMII. We pretend
that each mEMAC's MDIO bus has four QSGMII PCSs, but this is not the case.
Only the "base" mEMAC's MDIO bus has the four QSGMII PCSs. This is not an
issue yet, because we never get the PCS state. However, it will be once the
conversion to phylink is complete, since the links will appear to never
come up. To get around this, we allow specifying multiple PCSs in pcsphy-handle.
This breaks backwards compatibility with old device trees, but only for
QSGMII. IMO this is the only reasonable way to figure out what the actual
QSGMII PCS is.

Additionally, we now also support a separate XFI PCS. This can allow the
SerDes driver to set different addresses for the SGMII and XFI PCSs so they
can be accessed at the same time.
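A minimal sketch of how a MAC could pick among its three PCSs by interface mode, assuming stand-in types rather than the kernel's `struct phylink_pcs` and `phy_interface_t`. `struct memac_pcs_set` and `memac_pick_pcs()` are hypothetical; the mapping follows the modes this series associates with each PCS (the SGMII family, QSGMII, and XFI for XGMII):

```c
#include <stddef.h>

/* Hypothetical stand-ins for phy_interface_t and struct phylink_pcs. */
enum iface {
	IFACE_RGMII,
	IFACE_SGMII,
	IFACE_1000BASEX,
	IFACE_2500BASEX,
	IFACE_QSGMII,
	IFACE_XGMII,
};

struct pcs {
	const char *name;
};

/*
 * One PCS per kind, mirroring the sgmii_pcs/qsgmii_pcs/xfi_pcs fields
 * this patch adds to struct fman_mac. Any of them may be NULL.
 */
struct memac_pcs_set {
	struct pcs *sgmii;
	struct pcs *qsgmii;
	struct pcs *xfi;
};

/* Return the PCS serving a mode, or NULL if there is none. */
static struct pcs *memac_pick_pcs(const struct memac_pcs_set *set,
				  enum iface mode)
{
	switch (mode) {
	case IFACE_SGMII:
	case IFACE_1000BASEX:
	case IFACE_2500BASEX:
		return set->sgmii;
	case IFACE_QSGMII:
		return set->qsgmii;
	case IFACE_XGMII:
		return set->xfi;
	default:
		return NULL; /* e.g. RGMII has no PCS */
	}
}
```

A `NULL` result covers both modes without a PCS and PCSs that were simply not described in the device tree.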

Signed-off-by: Sean Anderson 
---

(no changes since v3)

Changes in v3:
- Put the PCS mdiodev only after we are done with it (since the PCS
  does not perform a get itself).

Changes in v2:
- Move PCS_LYNX dependency to fman Kconfig

 drivers/net/ethernet/freescale/fman/Kconfig   |   3 +
 .../net/ethernet/freescale/fman/fman_memac.c  | 257 +++---
 2 files changed, 104 insertions(+), 156 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/Kconfig b/drivers/net/ethernet/freescale/fman/Kconfig
index 48bf8088795d..8f5637db41dd 100644
--- a/drivers/net/ethernet/freescale/fman/Kconfig
+++ b/drivers/net/ethernet/freescale/fman/Kconfig
@@ -4,6 +4,9 @@ config FSL_FMAN
depends on FSL_SOC || ARCH_LAYERSCAPE || COMPILE_TEST
select GENERIC_ALLOCATOR
select PHYLIB
+   select PHYLINK
+   select PCS
+   select PCS_LYNX
select CRC32
default n
help
diff --git a/drivers/net/ethernet/freescale/fman/fman_memac.c b/drivers/net/ethernet/freescale/fman/fman_memac.c
index 56a29f505590..80ae34bea818 100644
--- a/drivers/net/ethernet/freescale/fman/fman_memac.c
+++ b/drivers/net/ethernet/freescale/fman/fman_memac.c
@@ -11,43 +11,12 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 
-/* PCS registers */
-#define MDIO_SGMII_CR  0x00
-#define MDIO_SGMII_DEV_ABIL_SGMII  0x04
-#define MDIO_SGMII_LINK_TMR_L  0x12
-#define MDIO_SGMII_LINK_TMR_H  0x13
-#define MDIO_SGMII_IF_MODE 0x14
-
-/* SGMII Control defines */
-#define SGMII_CR_AN_EN 0x1000
-#define SGMII_CR_RESTART_AN0x0200
-#define SGMII_CR_FD0x0100
-#define SGMII_CR_SPEED_SEL1_1G 0x0040
-#define SGMII_CR_DEF_VAL   (SGMII_CR_AN_EN | SGMII_CR_FD | \
-SGMII_CR_SPEED_SEL1_1G)
-
-/* SGMII Device Ability for SGMII defines */
-#define MDIO_SGMII_DEV_ABIL_SGMII_MODE 0x4001
-#define MDIO_SGMII_DEV_ABIL_BASEX_MODE 0x01A0
-
-/* Link timer define */
-#define LINK_TMR_L 0xa120
-#define LINK_TMR_H 0x0007
-#define LINK_TMR_L_BASEX   0xaf08
-#define LINK_TMR_H_BASEX   0x002f
-
-/* SGMII IF Mode defines */
-#define IF_MODE_USE_SGMII_AN   0x0002
-#define IF_MODE_SGMII_EN   0x0001
-#define IF_MODE_SGMII_SPEED_100M   0x0004
-#define IF_MODE_SGMII_SPEED_1G 0x0008
-#define IF_MODE_SGMII_DUPLEX_HALF  0x0010
-
 /* Num of additional exact match MAC adr regs */
 #define MEMAC_NUM_OF_PADDRS 7
 
@@ -326,7 +295,9 @@ struct fman_mac {
struct fman_rev_info fm_rev_info;
bool basex_if;
struct phy *serdes;
-   struct phy_device *pcsphy;
+   struct phylink_pcs *sgmii_pcs;
+   struct phylink_pcs *qsgmii_pcs;
+   struct phylink_pcs *xfi_pcs;
bool allmulti_enabled;
 };
 
@@ -487,91 +458,22 @@ static u32 get_mac_addr_hash_code(u64 eth_addr)
return xor_val;
 }
 
-static void setup_sgmii_internal_phy(struct fman_mac *memac,
-struct fixed_phy_status *fixed_link)
+static void setup_sgmii_internal(struct fman_mac *memac,
+struct phylink_pcs *pcs,
+struct fixed_phy_status *fixed_link)
 {
-   u16 tmp_reg16;
-
-   if (WARN_ON(!memac->pcsphy))
-   return;
-
-   /* SGMII mode */
-   tmp_reg16 = IF_MODE_SGMII_EN;
-   if (!fixed_link)
-   /* AN enable */
-   tmp_reg16 |= IF_MODE_USE_SGMII_AN;
-   else {
-   switch (fixed_link->speed) {
-   case 10:
-   /* For 10M: IF_MODE[SPEED_10M] = 0 */
-   break;
-   case 100:
-   tmp_reg16 |= IF_MODE_SGMII_SPEED_100M;
-   break;
-

[PATCH net-next v5 4/9] net: fman: memac: Add serdes support

2022-09-26 Thread Sean Anderson

This adds support for using a serdes which has to be configured. This is
primarily in preparation for the next commit, which will then change the
serdes mode dynamically.
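The optional-serdes handling in this patch boils down to two decisions, sketched below in plain C. This is an illustration, not the driver's code: `serdes_lookup_result()` and `serdes_mode_needed()` are hypothetical helpers, `err` models the errno carried by the `devm_of_phy_get()` result (0 meaning a serdes was found), and `enum iface` stands in for `phy_interface_t`:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for phy_interface_t. */
enum iface {
	IFACE_RGMII,
	IFACE_SGMII,
	IFACE_1000BASEX,
	IFACE_2500BASEX,
	IFACE_QSGMII,
	IFACE_XGMII,
};

enum serdes_state { SERDES_ABSENT, SERDES_PRESENT, SERDES_FAIL };

/*
 * -ENODEV and -ENOSYS are treated as "no (optional) serdes" and probing
 * continues; any other error aborts the probe.
 */
static enum serdes_state serdes_lookup_result(int err)
{
	if (err == -ENODEV || err == -ENOSYS)
		return SERDES_ABSENT;
	return err ? SERDES_FAIL : SERDES_PRESENT;
}

/* Modes for which the patch calls phy_set_mode_ext() on the serdes. */
static bool serdes_mode_needed(enum iface mode)
{
	switch (mode) {
	case IFACE_SGMII:
	case IFACE_1000BASEX:
	case IFACE_2500BASEX:
	case IFACE_QSGMII:
	case IFACE_XGMII:
		return true;
	default:
		return false;
	}
}
```

When a serdes is present, the patch calls `phy_init()` and `phy_power_on()` before setting the mode, and on failure unwinds with `phy_power_off()` and `phy_exit()` in reverse order.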

Signed-off-by: Sean Anderson 
---

(no changes since v4)

Changes in v4:
- Don't fail if phy support was not compiled in

 .../net/ethernet/freescale/fman/fman_memac.c  | 49 ++-
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman_memac.c b/drivers/net/ethernet/freescale/fman/fman_memac.c
index 32d26cf17843..56a29f505590 100644
--- a/drivers/net/ethernet/freescale/fman/fman_memac.c
+++ b/drivers/net/ethernet/freescale/fman/fman_memac.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* PCS registers */
@@ -324,6 +325,7 @@ struct fman_mac {
void *fm;
struct fman_rev_info fm_rev_info;
bool basex_if;
+   struct phy *serdes;
struct phy_device *pcsphy;
bool allmulti_enabled;
 };
@@ -1203,17 +1205,56 @@ int memac_initialization(struct mac_device *mac_dev,
}
}
 
+   memac->serdes = devm_of_phy_get(mac_dev->dev, mac_node, "serdes");
+   err = PTR_ERR(memac->serdes);
+   if (err == -ENODEV || err == -ENOSYS) {
+   dev_dbg(mac_dev->dev, "could not get (optional) serdes\n");
+   memac->serdes = NULL;
+   } else if (IS_ERR(memac->serdes)) {
+   dev_err_probe(mac_dev->dev, err, "could not get serdes\n");
+   goto _return_fm_mac_free;
+   } else {
+   err = phy_init(memac->serdes);
+   if (err) {
+   dev_err_probe(mac_dev->dev, err,
+ "could not initialize serdes\n");
+   goto _return_fm_mac_free;
+   }
+
+   err = phy_power_on(memac->serdes);
+   if (err) {
+   dev_err_probe(mac_dev->dev, err,
+ "could not power on serdes\n");
+   goto _return_phy_exit;
+   }
+
+   if (memac->phy_if == PHY_INTERFACE_MODE_SGMII ||
+   memac->phy_if == PHY_INTERFACE_MODE_1000BASEX ||
+   memac->phy_if == PHY_INTERFACE_MODE_2500BASEX ||
+   memac->phy_if == PHY_INTERFACE_MODE_QSGMII ||
+   memac->phy_if == PHY_INTERFACE_MODE_XGMII) {
+   err = phy_set_mode_ext(memac->serdes, PHY_MODE_ETHERNET,
+  memac->phy_if);
+   if (err) {
+   dev_err_probe(mac_dev->dev, err,
+ "could not set serdes mode to %s\n",
+ phy_modes(memac->phy_if));
+   goto _return_phy_power_off;
+   }
+   }
+   }
+
if (!mac_dev->phy_node && of_phy_is_fixed_link(mac_node)) {
struct phy_device *phy;
 
err = of_phy_register_fixed_link(mac_node);
if (err)
-   goto _return_fm_mac_free;
+   goto _return_phy_power_off;
 
fixed_link = kzalloc(sizeof(*fixed_link), GFP_KERNEL);
if (!fixed_link) {
err = -ENOMEM;
-   goto _return_fm_mac_free;
+   goto _return_phy_power_off;
}
 
mac_dev->phy_node = of_node_get(mac_node);
@@ -1242,6 +1283,10 @@ int memac_initialization(struct mac_device *mac_dev,
 
goto _return;
 
+_return_phy_power_off:
+   phy_power_off(memac->serdes);
+_return_phy_exit:
+   phy_exit(memac->serdes);
 _return_fixed_link_free:
kfree(fixed_link);
 _return_fm_mac_free:

[PATCH net-next v5 3/9] dt-bindings: net: fman: Add additional interface properties

2022-09-26 Thread Sean Anderson

At the moment, mEMACs are configured almost completely based on the
phy-connection-type. That is, if the phy interface is RGMII, it is assumed
that RGMII is supported. For some interfaces, it is assumed that the
RCW/bootloader has set up the SerDes properly. This is generally OK, but
restricts runtime reconfiguration. The actual link state is never
reported.

To address these shortcomings, the driver will need additional
information. First, it needs to know how to access the PCS/PMAs (in
order to configure them and get the link status). The SGMII PCS/PMA is
the only currently-described PCS/PMA. Add the XFI and QSGMII PCS/PMAs as
well. The XFI (and 10GBASE-KR) PCS/PMA is a c45 "phy" which sits on the
same MDIO bus as SGMII PCS/PMA. By default they will have conflicting
addresses, but they are also not enabled at the same time by default.
Therefore, we can let the XFI PCS/PMA be the default when
phy-connection-type is xgmii. This will allow for
backwards-compatibility.
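The default-selection rule (also spelled out in the binding's pcs-handle description) can be sketched as a lookup over plain arrays. This assumes `names` models `pcs-handle-names` (NULL when the property is absent), `n_handles` is the number of `pcs-handle` entries, and `conn_type` is `phy-connection-type`; `pcs_handle_index()` is a hypothetical helper, not a kernel or driver API:

```c
#include <stddef.h>
#include <string.h>

/* Return the pcs-handle index for PCS type `want`, or -1 if none. */
static int pcs_handle_index(const char *const *names, size_t n_names,
			    size_t n_handles, const char *conn_type,
			    const char *want)
{
	if (!names) {
		/* No names: the first handle is "xfi" for xgmii, else "sgmii". */
		const char *def = (conn_type && strcmp(conn_type, "xgmii") == 0) ?
				  "xfi" : "sgmii";

		return (n_handles > 0 && strcmp(want, def) == 0) ? 0 : -1;
	}

	/* Names present: match by name, ignoring names without a handle. */
	for (size_t i = 0; i < n_names && i < n_handles; i++)
		if (strcmp(names[i], want) == 0)
			return (int)i;
	return -1;
}
```

With names present, a QSGMII MAC would look up "qsgmii" to find the handle that points at the PCS on another MAC's MDIO bus.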

QSGMII, however, cannot work with the current binding. This is because
the QSGMII PCS/PMAs are only present on one MAC's MDIO bus. At the
moment this is worked around by having every MAC write to the PCS/PMA
addresses (without checking if they are present). This only works if
each MAC has the same configuration, and only if we don't need to know
the status. Because the QSGMII PCS/PMA will typically be located on a
different MDIO bus than the MAC's SGMII PCS/PMA, there is no fallback
for the QSGMII PCS/PMA.

Signed-off-by: Sean Anderson 
Reviewed-by: Rob Herring 
---

(no changes since v3)

Changes in v3:
- Add vendor prefix 'fsl,' to rgmii and mii properties.
- Set maxItems for pcs-names
- Remove phy-* properties from example because dt-schema complains and I
  can't be bothered to figure out how to make it work.
- Add pcs-handle as a preferred version of pcsphy-handle
- Deprecate pcsphy-handle
- Remove mii/rmii properties

Changes in v2:
- Better document how we select which PCS to use in the default case

 .../bindings/net/fsl,fman-dtsec.yaml  | 53 ++-
 .../devicetree/bindings/net/fsl-fman.txt  |  5 +-
 2 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml b/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml
index 3a35ac1c260d..c80c880a9dab 100644
--- a/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml
+++ b/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml
@@ -85,9 +85,39 @@ properties:
 $ref: /schemas/types.yaml#/definitions/phandle
 description: A reference to the IEEE1588 timer
 
+  phys:
+description: A reference to the SerDes lane(s)
+maxItems: 1
+
+  phy-names:
+items:
+  - const: serdes
+
   pcsphy-handle:
-$ref: /schemas/types.yaml#/definitions/phandle
-description: A reference to the PCS (typically found on the SerDes)
+$ref: /schemas/types.yaml#/definitions/phandle-array
+minItems: 1
+maxItems: 3
+deprecated: true
+description: See pcs-handle.
+
+  pcs-handle:
+minItems: 1
+maxItems: 3
+description: |
+  A reference to the various PCSs (typically found on the SerDes). If
+  pcs-handle-names is absent, and phy-connection-type is "xgmii", then the first
+  reference will be assumed to be for "xfi". Otherwise, if pcs-handle-names is
+  absent, then the first reference will be assumed to be for "sgmii".
+
+  pcs-handle-names:
+minItems: 1
+maxItems: 3
+items:
+  enum:
+- sgmii
+- qsgmii
+- xfi
+description: The type of each PCS in pcsphy-handle.
 
   tbi-handle:
 $ref: /schemas/types.yaml#/definitions/phandle
@@ -100,6 +130,10 @@ required:
   - fsl,fman-ports
   - ptp-timer
 
+dependencies:
+  pcs-handle-names:
+- pcs-handle
+
 allOf:
   - $ref: ethernet-controller.yaml#
   - if:
@@ -110,14 +144,6 @@ allOf:
 then:
   required:
 - tbi-handle
-  - if:
-  properties:
-compatible:
-  contains:
-const: fsl,fman-memac
-then:
-  required:
-- pcsphy-handle
 
 unevaluatedProperties: false
 
@@ -138,8 +164,9 @@ examples:
 reg = <0xe8000 0x1000>;
 fsl,fman-ports = <_rx_0x0c _tx_0x2c>;
 ptp-timer = <_timer0>;
-pcsphy-handle = <>;
-phy-handle = <_phy1>;
-phy-connection-type = "sgmii";
+pcs-handle = <>, <_pcs1>;
+pcs-handle-names = "sgmii", "qsgmii";
+phys = < 1>;
+phy-names = "serdes";
 };
 ...
diff --git a/Documentation/devicetree/bindings/net/fsl-fman.txt b/Documentation/devicetree/bindings/net/fsl-fman.txt
index b9055335db3b..bda4b41af074 100644
--- a/Documentation/devicetree/bindings/net/fsl-fman.txt
+++ b/Documentation/devicetree/bindings/net/fsl-fman.txt
@@ -320,8 +320,9 @@ For internal PHY device on internal mdio bus, a PHY node should be created.
 See the definition of the PHY node in

[PATCH net-next v5 2/9] dt-bindings: net: Add Lynx PCS binding

2022-09-26 Thread Sean Anderson

This binding is fairly bare-bones for now, since the Lynx driver doesn't
parse any properties (or match based on the compatible). We just need it
in order to prevent the PCS nodes from having phy devices attached to
them. This is not really a problem, but it is a bit inefficient.

This binding is really for three separate PCSs (SGMII, QSGMII, and XFI).
However, the driver treats all of them the same. This works because the
SGMII and XFI devices typically use the same address, and the SerDes
driver (or RCW) muxes between them. The QSGMII PCSs have the same
register layout as the SGMII PCSs. To do things properly, we'd probably
do something like

ethernet-pcs@0 {
#pcs-cells = <1>;
compatible = "fsl,lynx-pcs";
reg = <0>, <1>, <2>, <3>;
};

but that would add complexity, and we can describe the hardware just
fine using separate PCSs for now.

Signed-off-by: Sean Anderson 
---

Changes in v5:
- New

 .../bindings/net/pcs/fsl,lynx-pcs.yaml| 40 +++
 1 file changed, 40 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/pcs/fsl,lynx-pcs.yaml

diff --git a/Documentation/devicetree/bindings/net/pcs/fsl,lynx-pcs.yaml b/Documentation/devicetree/bindings/net/pcs/fsl,lynx-pcs.yaml
new file mode 100644
index ..fbedf696c555
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/pcs/fsl,lynx-pcs.yaml
@@ -0,0 +1,40 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/pcs/fsl,lynx-pcs.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NXP Lynx PCS
+
+maintainers:
+  - Ioana Ciornei 
+
+description: |
+  NXP Lynx 10G and 28G SerDes have Ethernet PCS devices which can be used as
+  protocol controllers. They are accessible over the Ethernet interface's MDIO
+  bus.
+
+properties:
+  compatible:
+const: fsl,lynx-pcs
+
+  reg:
+maxItems: 1
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
+
+examples:
+  - |
+mdio-bus {
+  #address-cells = <1>;
+  #size-cells = <0>;
+
+  qsgmii_pcs1: ethernet-pcs@1 {
+compatible = "fsl,lynx-pcs";
+reg = <1>;
+  };
+};

[PATCH net-next v5 0/9] [RFT] net: dpaa: Convert to phylink

2022-09-26 Thread Sean Anderson

This series converts the DPAA driver to phylink.

I have tried to maintain backwards compatibility with existing device
trees wherever possible. However, one area where I was unable to
achieve this was with QSGMII. Please refer to patch 2 for details.

All MAC drivers have now been converted. I would greatly appreciate it if
anyone has T-series or P-series boards they can test/debug this series
on. I only have an LS1046ARDB. Everything but QSGMII should work without
breakage; QSGMII needs patches 7 and 8. For this reason, the last 4
patches in this series should be applied together (and should not go
through separate trees).

This series depends on [1] and [2].

[1] https://lore.kernel.org/netdev/20220725153730.2604096-1-sean.ander...@seco.com/
[2] https://lore.kernel.org/netdev/20220725151039.2581576-1-sean.ander...@seco.com/

Changes in v5:
- Add Lynx PCS binding

Changes in v4:
- Use pcs-handle-names instead of pcs-names, as discussed
- Don't fail if phy support was not compiled in
- Split off rate adaptation series
- Split off DPAA "preparation" series
- Split off Lynx 10G support
- t208x: Mark MAC1 and MAC2 as 10G
- Add XFI PCS for t208x MAC1/MAC2

Changes in v3:
- Expand pcs-handle to an array
- Add vendor prefix 'fsl,' to rgmii and mii properties.
- Set maxItems for pcs-names
- Remove phy-* properties from example because dt-schema complains and I
  can't be bothered to figure out how to make it work.
- Add pcs-handle as a preferred version of pcsphy-handle
- Deprecate pcsphy-handle
- Remove mii/rmii properties
- Put the PCS mdiodev only after we are done with it (since the PCS
  does not perform a get itself).
- Remove _return label from memac_initialization in favor of returning
  directly
- Fix grabbing the default PCS not checking for -ENODATA from
  of_property_match_string
- Set DTSEC_ECNTRL_R100M in dtsec_link_up instead of dtsec_mac_config
- Remove rmii/mii properties
- Replace 1000Base... with 1000BASE... to match IEEE capitalization
- Add compatibles for QSGMII PCSs
- Split arm and powerpc dts updates

Changes in v2:
- Better document how we select which PCS to use in the default case
- Move PCS_LYNX dependency to fman Kconfig
- Remove unused variable slow_10g_if
- Restrict valid link modes based on the phy interface. This is easier
  to set up, and mostly captures what I intended to do the first time.
  We now have a custom validate which restricts half-duplex for some SoCs
  for RGMII, but generally just uses the default phylink validate.
- Configure the SerDes in enable/disable
- Properly implement all ethtool ops and ioctls. These were mostly
  stubbed out just enough to compile last time.
- Convert 10GEC and dTSEC as well
- Fix capitalization of mEMAC in commit messages
- Add nodes for QSGMII PCSs

Sean Anderson (9):
  dt-bindings: net: Expand pcs-handle to an array
  dt-bindings: net: Add Lynx PCS binding
  dt-bindings: net: fman: Add additional interface properties
  net: fman: memac: Add serdes support
  net: fman: memac: Use lynx pcs driver
  net: dpaa: Convert to phylink
  powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G
  powerpc: dts: qoriq: Add nodes for QSGMII PCSs
  arm64: dts: layerscape: Add nodes for QSGMII PCSs

 .../bindings/net/dsa/renesas,rzn1-a5psw.yaml  |   1 +
 .../bindings/net/ethernet-controller.yaml |  10 +-
 .../bindings/net/fsl,fman-dtsec.yaml  |  53 +-
 .../bindings/net/fsl,qoriq-mc-dpmac.yaml  |   2 +-
 .../devicetree/bindings/net/fsl-fman.txt  |   5 +-
 .../bindings/net/pcs/fsl,lynx-pcs.yaml|  40 +
 .../boot/dts/freescale/fsl-ls1043-post.dtsi   |  24 +
 .../boot/dts/freescale/fsl-ls1046-post.dtsi   |  25 +
 .../fsl/qoriq-fman3-0-10g-0-best-effort.dtsi  |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi |  10 +-
 .../fsl/qoriq-fman3-0-10g-1-best-effort.dtsi  |  10 +-
 .../boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi |  10 +-
 .../boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi |  45 ++
 .../boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi |  45 ++
 .../boot/dts/fsl/qoriq-fman3-0-1g-0.dtsi  |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-1.dtsi  |  10 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-2.dtsi  |  10 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-3.dtsi  |  10 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-4.dtsi  |   3 +-
 .../boot/dts/fsl/qoriq-fman3-0-1g-5.dtsi  |  10 +-
 .../boot/dts/fsl/qoriq-fman3-1-10g-0.dtsi |  10 +-
 .../boot/dts/fsl/qoriq-fman3-1-10g-1.dtsi |  10 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-0.dtsi  |   3 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-1.dtsi  |  10 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-2.dtsi  |  10 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-3.dtsi  |  10 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-4.dtsi  |   3 +-
 .../boot/dts/fsl/qoriq-fman3-1-1g-5.dtsi  |  10 +-
 arch/powerpc/boot/dts/fsl/t2081si-post.dtsi   |   4 +-
 drivers/net/ethernet/freescale/dpaa/Kconfig   |   4 +-
 .../net/ethernet/freescale/dpaa/dpaa_eth.c|  89 +--

Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()

2022-09-26 Thread Bjorn Helgaas

On Mon, Sep 26, 2022 at 10:01:55PM +0800, Zhuo Chen wrote:
> On 9/23/22 5:08 AM, Bjorn Helgaas wrote:
> > On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:
> > > When state is pci_channel_io_frozen in pcie_do_recovery(),
> > > the severity is fatal and fatal status should be cleared.
> > > So we add pci_aer_clear_fatal_status().
> > 
> > Seems sensible to me.  Did you find this by code inspection or by
> > debugging a problem?  If the latter, it would be nice to mention the
> > symptoms of the problem in the commit log.
> 
> I found this by code inspection, so I cannot enumerate what kind of problems
> this code will cause.
> > 
> > > Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
> > > and pci_aer_clear_nonfatal_status() contains the function of
> > > 'if (host->native_aer || pcie_ports_native)', so we move them
> > > out of it.
> > 
> > Wrap commit log to fill 75 columns.
> > 
> > > Signed-off-by: Zhuo Chen 
> > > ---
> > >   drivers/pci/pcie/err.c | 8 ++--
> > >   1 file changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > index 0c5a143025af..e0a8ade4c3fe 100644
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > > @@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> > >* it is responsible for clearing this status.  In that case, the
> > >* signaling device may not even be visible to the OS.
> > >*/
> > > - if (host->native_aer || pcie_ports_native) {
> > > + if (host->native_aer || pcie_ports_native)
> > >   pcie_clear_device_status(dev);
> > 
> > pcie_clear_device_status() doesn't check for pcie_aer_is_native()
> > internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
> > errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
> > status only when we control AER"), both callers check before calling
> > it.
> > 
> > I think we should move the check inside pcie_clear_device_status().
> > That could be a separate preliminary patch.
> > 
> > There are a couple other places (aer_root_reset() and
> > get_port_device_capability()) that do the same check and could be
> > changed to use pcie_aer_is_native() instead.  That could be another
> > preliminary patch.
> > 
> Good suggestion. But I have only one doubt. In aer_root_reset(), if we use
> "if (pcie_aer_is_native(dev) && aer)", when dev->aer_cap
> is NULL and root->aer_cap is not NULL, pcie_aer_is_native() will return
> false. It's different from just using "(host->native_aer ||
> pcie_ports_native)".
> Or if we can use "if (pcie_aer_is_native(root))", at this time a NULL
> pointer check should be added in pcie_aer_is_native() because root may be
> NULL.

Good point.  In aer_root_reset(), we're updating Root Port registers,
so I think they should look like:

  if (pcie_aer_is_native(root) && aer) {
...
  }

Does that seem safe and equivalent to you?

Bjorn
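
A minimal user-space sketch of the difference Zhuo raises. The types and
helpers below (struct pci_dev_model, aer_is_native(), and so on) are
hypothetical stand-ins, not kernel API. The model assumes, as stated in the
thread, that pcie_aer_is_native() returns false when the device has no AER
capability and otherwise tests "pcie_ports_native || host->native_aer". It
shows why checking a device without aer_cap differs from the old host-only
check, and why passing `root` needs a NULL guard if root can be NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the host bridge: did the OS get AER control? */
struct host_bridge {
	bool native_aer;
};

/* Hypothetical stand-in for struct pci_dev. */
struct pci_dev_model {
	int aer_cap;			/* AER capability offset, 0 if absent */
	struct host_bridge *host;	/* host bridge above this device */
};

/* Models the pcie_ports=native command line override. */
static bool pcie_ports_native;

/* Same shape as pcie_aer_is_native(): no AER capability means "not native". */
static bool aer_is_native(const struct pci_dev_model *dev)
{
	if (!dev->aer_cap)
		return false;
	return pcie_ports_native || dev->host->native_aer;
}

/* Variant with the NULL check Zhuo suggests, tolerating a NULL root. */
static bool aer_is_native_nullsafe(const struct pci_dev_model *dev)
{
	return dev && aer_is_native(dev);
}

/* The open-coded check aer_root_reset() used before this series. */
static bool old_root_reset_check(const struct pci_dev_model *dev)
{
	return dev->host->native_aer || pcie_ports_native;
}
```

With an endpoint lacking AER below a native host bridge, the old check passes
while the device-based check fails; a Root Port with AER passes both.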

Re: [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status

2022-09-26 Thread Bjorn Helgaas

On Mon, Sep 26, 2022 at 10:16:23PM +0800, Zhuo Chen wrote:
> On 9/23/22 5:50 AM, Bjorn Helgaas wrote:
> > On Fri, Sep 02, 2022 at 02:16:34AM +0800, Zhuo Chen wrote:
> > > Statements clearing AER error status in aer_enable_rootport() have the
> > > same function as pci_aer_raw_clear_status(). So we replace them, which
> > > has no functional changes.

> > > - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
> > > - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
> > > - pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32);
> > > - pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
> > > - pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
> > > - pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
> > > + pci_aer_raw_clear_status(pdev);
> > 
> > It's true that this is functionally equivalent.
> > 
> > But 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status() to
> > unconditionally clear Error Status") says pci_aer_raw_clear_status()
> > is only for use in the EDR path (this should have been included in the
> > function comment), so I think we should preserve that property and use
> > pci_aer_clear_status() here.
> > 
> > pci_aer_raw_clear_status() is the same as pci_aer_clear_status()
> > except it doesn't check pcie_aer_is_native().  And I'm pretty sure we
> > can't get to aer_enable_rootport() *unless* pcie_aer_is_native(),
> > because get_port_device_capability() checks the same thing, so they
> > should be equivalent here.
> > 
> Thanks Bjorn, this very detailed correction is helpful. By the way, it may be
> better to state 'only for use in the EDR path' explicitly in the function
> comment. So far only the commit log mentions it.

Yes, definitely!  I goofed when I applied that patch without making
sure there was something in the function comment.

Bjorn
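
To illustrate the point that the two helpers differ only in the ownership
check, here is a hedged user-space model. It relies on the AER status
registers being write-1-to-clear, so reading a register and writing the same
value back clears exactly the bits that were set (the open-coded sequence in
the patch). The struct and function names are hypothetical, not kernel API,
and only the Root Port case (which also clears Root Error Status) is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the three AER status registers of a Root Port. */
struct aer_regs {
	uint32_t root_status;
	uint32_t cor_status;
	uint32_t uncor_status;
};

/* Write-1-to-clear semantics: every 1 bit written clears that status bit. */
static void rw1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

/* Read each register and write the value back, like the removed lines. */
static void raw_clear_status(struct aer_regs *r)
{
	uint32_t v;

	v = r->root_status;
	rw1c_write(&r->root_status, v);
	v = r->cor_status;
	rw1c_write(&r->cor_status, v);
	v = r->uncor_status;
	rw1c_write(&r->uncor_status, v);
}

/* Gated variant: refuse to touch the registers unless the OS owns AER. */
static int clear_status(struct aer_regs *r, bool os_owns_aer)
{
	if (!os_owns_aer)
		return -1;	/* models the error return, registers untouched */
	raw_clear_status(r);
	return 0;
}
```

If aer_enable_rootport() is only reachable when the OS owns AER, the gate in
the second helper always passes there, which is why the two are equivalent at
that call site.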

Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-26 Thread Bjorn Helgaas

On Mon, Sep 26, 2022 at 09:30:48PM +0800, Zhuo Chen wrote:
> On 9/23/22 4:02 AM, Bjorn Helgaas wrote:
> > On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote:
> > > On 9/12/22 12:22 AM, Serge Semin wrote:
> > > > On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:

> > > ‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be
> > > removed in the next kernel:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406
> > 
> > This is a problem because 00278564a60e ("crypto: hisilicon - Remove
> > pci_aer_clear_nonfatal_status() call") is in Herbert's cryptodev tree,
> > and if I apply this series to the PCI tree and Linus merges it before
> > Herbert's cryptodev changes, it will break the build.
> > 
> > I think we need to split this patch up like this:
> > 
> >- Add pci_aer_clear_uncorrect_error_status() to PCI core
> >- Convert dpc to use pci_aer_clear_uncorrect_error_status()
> >  (I might end up squashing with above)
> >- Convert lpfc to use pci_aer_clear_uncorrect_error_status()
> >- Convert ntb_hw_idt to use pci_aer_clear_uncorrect_error_status()
> >- Unexport pci_aer_clear_nonfatal_status()
> > 
> > Then I can apply all but the last patch safely.  If the crypto changes
> > are merged first, we can add the last one; otherwise we can do it for
> > the next cycle.
> > 
> Good proposal. I will implement these in the next version.
> 
> Do I need to put pci related modifications (include patch 2/3 and 3/3) in a
> patch set or just single patches?

When in doubt, put them in separate patches.  It's trivial for me to
squash them together if that makes more sense, but much more difficult
for me to split them apart.

Thanks for helping clean up this area!

Bjorn

Re: [PATCH v6 4/8] phy: fsl: Add Lynx 10G SerDes driver

2022-09-26 Thread Sean Anderson




On 9/24/22 2:54 AM, Vinod Koul wrote:
> On 20-09-22, 16:23, Sean Anderson wrote:
>> This adds support for the Lynx 10G "SerDes" devices found on various NXP
>> QorIQ SoCs. There may be up to four SerDes devices on each SoC, each
>> supporting up to eight lanes. Protocol support for each SerDes is highly
>> heterogeneous, with each SoC typically having a totally different
>> selection of supported protocols for each lane. Additionally, the SerDes
>> devices on each SoC also have differing support. One SerDes will
>> typically support Ethernet on most lanes, while the other will typically
>> support PCIe on most lanes.
>> 
>> There is wide hardware support for this SerDes. It is present on QorIQ
>> T-Series and Layerscape processors. Because each SoC typically has
>> specific instructions and exceptions for its SerDes, I have limited the
>> initial scope of this module to just the LS1046A and LS1088A.
>> Additionally, I have only added support for Ethernet protocols. There is
>> not a great need for dynamic reconfiguration for other protocols (except
>> perhaps for M.2 cards), so support for them may never be added.
>> 
>> Nevertheless, I have tried to provide an obvious path for adding support
>> for other SoCs as well as other protocols. SATA just needs support for
>> configuring LNmSSCR0. PCIe may need to configure the equalization
>> registers. It also uses multiple lanes. I have tried to write the driver
>> with multi-lane support in mind, so there should not need to be any
>> large changes. Although there are 6 protocols supported, I have only
>> tested SGMII and XFI. The rest have been implemented as described in
>> the datasheet. Most of these protocols should work "as-is", but
>> 10GBASE-KR will need PCS support for link training.
>> 
>> The PLLs are modeled as clocks proper. This lets us take advantage of
>> the existing clock infrastructure. I have not given the same treatment
>> to the per-lane clocks because they need to be programmed in-concert
>> with the rest of the lane settings. One tricky thing is that the VCO
>> (PLL) rate exceeds 2^32 (maxing out at around 5GHz). This will be a
>> problem on 32-bit platforms, since clock rates are stored as unsigned
>> longs. To work around this, the pll clock rate is generally treated in
>> units of kHz.
>> 
>> The PLLs are configured rather interestingly. Instead of the usual direct
>> programming of the appropriate divisors, the input and output clock rates
>> are selected directly. Generally, the only restriction is that the input
>> and output must be integer multiples of each other. This suggests some kind
>> of internal look-up table. The datasheets generally list out the supported
>> combinations explicitly, and not all input/output combinations are
>> documented. I'm not sure if this is due to lack of support, or due to an
>> oversight. If this becomes an issue, then some combinations can be
>> blacklisted (or whitelisted). This may also be necessary for other SoCs
>> which have more stringent clock requirements.
>> 
>> The general API call list for this PHY is documented under the driver-api
>> docs. I think this is rather standard, except that most drivers configure
>> the mode (protocol) at xlate-time. Unlike some other phys where e.g. PCIe
>> x4 will use 4 separate phys all configured for PCIe, this driver uses one
>> phy configured to use 4 lanes. This is because while the individual lanes
>> may be configured individually, the protocol selection acts on all lanes at
>> once. Additionally, the order which lanes should be configured in is
>> specified by the datasheet.  To coordinate this, lanes are reserved in
>> phy_init, and released in phy_exit.
>> 
>> This driver was written with reference to the LS1046A reference manual.
>> However, it was informed by reference manuals for all processors with
>> mEMACs, especially the T4240 (which appears to have a "maxed-out"
>> configuration). The earlier P-Series processors appear to be similar, but
>> have a different overall register layout (using "banks" instead of
>> separate SerDes). Perhaps those use a "5G Lynx SerDes."
>> 
>> Signed-off-by: Sean Anderson 
>> ---
>> 
>> Changes in v6:
>> - Update MAINTAINERS to include new files
>> - Include bitfield.h and slab.h to allow compilation on non-arm64
>>   arches.
>> - Depend on COMMON_CLK and either layerscape/ppc
>> 
>> Changes in v5:
>> - Remove references to PHY_INTERFACE_MODE_1000BASEKX to allow this
>>   series to be applied directly to linux/master.
>> - Add fsl,lynx-10g.h to MAINTAINERS
>> 
>> Changes in v4:
>> - Rework all debug statements to remove use of __func__. Additional
>>   information has been provided as necessary.
>> - Consider alternative parent rates in round_rate and not in set_rate.
>>   Trying to modify our parent's rate in set_rate will deadlock.
>> - Explicitly perform a stop/reset sequence in set_rate. This way we
>>   always ensure that the PLL is properly stopped.
>> - Set the power-down bit when disabling the
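
The two PLL points in the description above (a VCO rate that overflows a
32-bit unsigned long, and input/output rates that must be integer multiples
of each other) can be sketched in plain C. This is an illustrative model, not
the driver's code: the helper names are hypothetical, uint32_t stands in for
unsigned long on a 32-bit platform, and the multiple check only encodes the
general restriction quoted, not the per-SoC tables in the datasheets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does a rate in Hz fit in a 32-bit unsigned long? (max 4294967295) */
static bool rate_hz_fits_u32(uint64_t rate_hz)
{
	return rate_hz <= UINT32_MAX;
}

/* Express the rate in kHz instead, as the driver does for the PLL. */
static uint32_t hz_to_khz(uint64_t rate_hz)
{
	return (uint32_t)(rate_hz / 1000);
}

/*
 * Hypothetical check for the stated restriction: input and output rates
 * (both in kHz) must be integer multiples of each other.
 */
static bool pll_rates_compatible(uint32_t in_khz, uint32_t out_khz)
{
	if (!in_khz || !out_khz)
		return false;
	if (out_khz >= in_khz)
		return out_khz % in_khz == 0;
	return in_khz % out_khz == 0;
}
```

For example, 5 GHz is 5000000000 Hz, which does not fit in 32 bits, while
5000000 kHz fits easily.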

[PATCH RFC 5/5] mm: remove unused savedwrite infrastructure

2022-09-26 Thread David Hildenbrand

NUMA hinting no longer uses savedwrite, let's rip it out.

... and while at it, drop __pte_write() and __pmd_write() on ppc64.

Signed-off-by: David Hildenbrand 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 80 +---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  2 +-
 include/linux/pgtable.h  | 24 --
 mm/debug_vm_pgtable.c| 32 
 4 files changed, 5 insertions(+), 133 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 392ff48f77df..b3ddc34d71c1 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -418,35 +418,9 @@ static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
 #define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH
 #define pmdp_clear_flush_young pmdp_test_and_clear_young
 
-static inline int __pte_write(pte_t pte)
-{
-   return !!(pte_raw(pte) & cpu_to_be64(_PAGE_WRITE));
-}
-
-#ifdef CONFIG_NUMA_BALANCING
-#define pte_savedwrite pte_savedwrite
-static inline bool pte_savedwrite(pte_t pte)
-{
-   /*
-* Saved write ptes are prot none ptes that doesn't have
-* privileged bit sit. We mark prot none as one which has
-* present and pviliged bit set and RWX cleared. To mark
-* protnone which used to have _PAGE_WRITE set we clear
-* the privileged bit.
-*/
-   return !(pte_raw(pte) & cpu_to_be64(_PAGE_RWX | _PAGE_PRIVILEGED));
-}
-#else
-#define pte_savedwrite pte_savedwrite
-static inline bool pte_savedwrite(pte_t pte)
-{
-   return false;
-}
-#endif
-
 static inline int pte_write(pte_t pte)
 {
-   return __pte_write(pte) || pte_savedwrite(pte);
+   return !!(pte_raw(pte) & cpu_to_be64(_PAGE_WRITE));
 }
 
 static inline int pte_read(pte_t pte)
@@ -458,24 +432,16 @@ static inline int pte_read(pte_t pte)
 static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
  pte_t *ptep)
 {
-   if (__pte_write(*ptep))
+   if (pte_write(*ptep))
pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 0);
-   else if (unlikely(pte_savedwrite(*ptep)))
-   pte_update(mm, addr, ptep, 0, _PAGE_PRIVILEGED, 0);
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
 static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
-   /*
-* We should not find protnone for hugetlb, but this complete the
-* interface.
-*/
-   if (__pte_write(*ptep))
+   if (pte_write(*ptep))
pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 1);
-   else if (unlikely(pte_savedwrite(*ptep)))
-   pte_update(mm, addr, ptep, 0, _PAGE_PRIVILEGED, 1);
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
@@ -552,36 +518,6 @@ static inline int pte_protnone(pte_t pte)
return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE | _PAGE_RWX)) ==
cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE);
 }
-
-#define pte_mk_savedwrite pte_mk_savedwrite
-static inline pte_t pte_mk_savedwrite(pte_t pte)
-{
-   /*
-* Used by Autonuma subsystem to preserve the write bit
-* while marking the pte PROT_NONE. Only allow this
-* on PROT_NONE pte
-*/
-   VM_BUG_ON((pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX | _PAGE_PRIVILEGED)) !=
- cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED));
-   return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_PRIVILEGED));
-}
-
-#define pte_clear_savedwrite pte_clear_savedwrite
-static inline pte_t pte_clear_savedwrite(pte_t pte)
-{
-   /*
-* Used by KSM subsystem to make a protnone pte readonly.
-*/
-   VM_BUG_ON(!pte_protnone(pte));
-   return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PRIVILEGED));
-}
-#else
-#define pte_clear_savedwrite pte_clear_savedwrite
-static inline pte_t pte_clear_savedwrite(pte_t pte)
-{
-   VM_WARN_ON(1);
-   return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_WRITE));
-}
 #endif /* CONFIG_NUMA_BALANCING */
 
 static inline bool pte_hw_valid(pte_t pte)
@@ -658,8 +594,6 @@ static inline unsigned long pte_pfn(pte_t pte)
 /* Generic modifiers for PTE bits */
 static inline pte_t pte_wrprotect(pte_t pte)
 {
-   if (unlikely(pte_savedwrite(pte)))
-   return pte_clear_savedwrite(pte);
return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_WRITE));
 }
 
@@ -1156,8 +1090,6 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_mkclean(pmd)   pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)   pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)   pte_pmd(pte_mkwrite(pmd_pte(pmd)))
-#define pmd_mk_savedwrite(pmd) pte_pmd(pte_mk_savedwrite(pmd_pte(pmd)))
-#define pmd_clear_savedwrite(pmd)  pte_pmd(pte_clear_savedwrite(pmd_pte(pmd)))
 
 #ifdef

[PATCH RFC 4/5] mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite

2022-09-26 Thread David Hildenbrand

commit b191f9b106ea ("mm: numa: preserve PTE write permissions across a
NUMA hinting fault") added remembering write permissions using ordinary
pte_write() for PROT_NONE mapped pages to avoid write faults when
remapping the page !PROT_NONE on NUMA hinting faults.

That commit noted:

The patch looks hacky but the alternatives looked worse. The tidest was
to rewalk the page tables after a hinting fault but it was more complex
than this approach and the performance was worse. It's not generally
safe to just mark the page writable during the fault if it's a write
fault as it may have been read-only for COW so that approach was
discarded.

Later, commit 288bc54949fc ("mm/autonuma: let architecture override how
the write bit should be stashed in a protnone pte.") introduced a family
of savedwrite PTE functions that didn't necessarily improve the whole
situation.

One confusing thing is that nowadays, if a page is pte_protnone()
and pte_savedwrite() then also pte_write() is true. Another source of
confusion is that there is only a single pte_mk_savedwrite() call in the
kernel. All other write-protection code seems to silently rely on
pte_wrprotect().

Ever since PageAnonExclusive was introduced and we started using it in
mprotect context via commit 64fe24a3e05e ("mm/mprotect: try avoiding write
faults for exclusive anonymous pages when changing protection"), we do
have machinery in place to avoid write faults when changing protection,
which is exactly what we want to do here.

Let's similarly do what ordinary mprotect() does nowadays when upgrading
write permissions and reuse can_change_pte_writable() and
can_change_pmd_writable() to detect if we can upgrade PTE permissions to be
writable.

For anonymous pages there should be absolutely no change: if an
anonymous page is not exclusive, it could not have been mapped writable --
because only exclusive anonymous pages can be mapped writable.

However, there *might* be a change for writable shared mappings that
require writenotify: if they are not dirty, we cannot map them writable.
While it might not matter in practice, we'd need a different way to
identify whether writenotify is actually required -- and ordinary mprotect
would benefit from that as well.

We'll remove all savedwrite leftovers next.

Signed-off-by: David Hildenbrand 
---
 include/linux/mm.h |  2 ++
 mm/huge_memory.c   | 28 +---
 mm/ksm.c   |  9 -
 mm/memory.c| 19 ---
 mm/mprotect.c  |  7 ++-
 5 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8a5ad9d050bf..20061a9f7f47 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1954,6 +1954,8 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
 #define  MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \
MM_CP_UFFD_WP_RESOLVE)
 
+bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
+pte_t pte);
 extern unsigned long change_protection(struct mmu_gather *tlb,
  struct vm_area_struct *vma, unsigned long start,
  unsigned long end, pgprot_t newprot,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e5ce3e11d4ae..f148d1295d2e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1507,8 +1507,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
int page_nid = NUMA_NO_NODE;
int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK);
-   bool migrated = false;
-   bool was_writable = pmd_savedwrite(oldpmd);
+   bool try_change_writable, migrated = false;
int flags = 0;
 
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
@@ -1517,13 +1516,22 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
goto out;
}
 
+   /* See mprotect_fixup(). */
+   if (vma->vm_flags & VM_SHARED)
+   try_change_writable = vma_wants_writenotify(vma, vma->vm_page_prot);
+   else
+   try_change_writable = !!(vma->vm_flags & VM_WRITE);
+
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
page = vm_normal_page_pmd(vma, haddr, pmd);
if (!page)
goto out_map;
 
/* See similar comment in do_numa_page for explanation */
-   if (!was_writable)
+   if (try_change_writable && !pmd_write(pmd) &&
+can_change_pmd_writable(vma, vmf->address, pmd))
+   pmd = pmd_mkwrite(pmd);
+   if (!pmd_write(pmd))
flags |= TNF_NO_GROUP;
 
page_nid = page_to_nid(page);
@@ -1568,8 +1576,12 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
/* Restore the PMD */
pmd = pmd_modify(oldpmd, vma->vm_page_prot);
pmd = pmd_mkyoung(pmd);
-   if (was_writable)
+
+   /* Similar to mprotect()

[PATCH RFC 3/5] mm/huge_memory: try avoiding write faults when changing PMD protection

2022-09-26 Thread David Hildenbrand

Let's replicate what we have for PTEs in can_change_pte_writable() also
for PMDs.

While this might look like a pure performance improvement, we'll use this to
get rid of savedwrite handling in do_huge_pmd_numa_page() next. Place
do_huge_pmd_numa_page() strategically for that purpose.

Note that MM_CP_TRY_CHANGE_WRITABLE is currently only set when we come
via mprotect_fixup().

Signed-off-by: David Hildenbrand 
---
 mm/huge_memory.c | 38 --
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f18896c8f9a..e5ce3e11d4ae 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1386,6 +1386,36 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
return VM_FAULT_FALLBACK;
 }
 
+static inline bool can_change_pmd_writable(struct vm_area_struct *vma,
+  unsigned long addr, pmd_t pmd)
+{
+   struct page *page;
+
+   if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
+   return false;
+
+   /* Don't touch entries that are not even readable (NUMA hinting). */
+   if (pmd_protnone(pmd))
+   return false;
+
+   /* Do we need write faults for softdirty tracking? */
+   if (vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd))
+   return false;
+
+   /* Do we need write faults for uffd-wp tracking? */
+   if (userfaultfd_huge_pmd_wp(vma, pmd))
+   return false;
+
+   if (!(vma->vm_flags & VM_SHARED)) {
+   /* See can_change_pte_writable(). */
+   page = vm_normal_page_pmd(vma, addr, pmd);
+   return page && PageAnon(page) && PageAnonExclusive(page);
+   }
+
+   /* See can_change_pte_writable(). */
+   return pmd_dirty(pmd);
+}
+
 /* FOLL_FORCE can write to even unwritable PMDs in COW mappings. */
 static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
struct vm_area_struct *vma,
@@ -1889,13 +1919,17 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 */
entry = pmd_clear_uffd_wp(entry);
}
+
+   /* See change_pte_range(). */
+   if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) &&
+   can_change_pmd_writable(vma, addr, entry))
+   entry = pmd_mkwrite(entry);
+
ret = HPAGE_PMD_NR;
set_pmd_at(mm, addr, pmd, entry);
 
if (huge_pmd_needs_flush(oldpmd, entry))
tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE);
-
-   BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry));
 unlock:
spin_unlock(ptl);
return ret;
-- 
2.37.3

[PATCH RFC 2/5] mm/mprotect: minor can_change_pte_writable() cleanups

2022-09-26 Thread David Hildenbrand

We want to replicate this code for handling PMDs soon. No need to crash
the kernel, warning and rejecting is good enough. As this will no longer
get optimized out, drop the pte_write() check: no harm would be done.

While at it, add a comment why PROT_NONE mapped pages are excluded.

Signed-off-by: David Hildenbrand 
---
 mm/mprotect.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index c6c13a0a4bcc..95323bc9a951 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -43,8 +43,10 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
 {
struct page *page;
 
-   VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte));
+   if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
+   return false;
 
+   /* Don't touch entries that are not even readable (NUMA hinting). */
if (pte_protnone(pte))
return false;
 
-- 
2.37.3

[PATCH RFC 1/5] mm/mprotect: allow clean exclusive anon pages to be writable

2022-09-26 Thread David Hildenbrand

From: Nadav Amit 

Anonymous pages might have the dirty bit clear, but this should not
prevent mprotect from making them writable if they are exclusive.
Therefore, skip the test whether the page is dirty in this case.

Note that there are already other ways to get a writable PTE mapping an
anonymous page that is clean: for example, via MADV_FREE. In an ideal
world, we'd have a different indication from the FS whether writenotify
is still required.

Signed-off-by: Nadav Amit 
[ comment for dirty/clean handling; return directly; update description ]
Signed-off-by: David Hildenbrand 
---
 mm/mprotect.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index ed013f836b4a..c6c13a0a4bcc 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -45,7 +45,7 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
 
VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte));
 
-   if (pte_protnone(pte) || !pte_dirty(pte))
+   if (pte_protnone(pte))
return false;
 
/* Do we need write faults for softdirty tracking? */
@@ -64,11 +64,15 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
 * the PT lock.
 */
page = vm_normal_page(vma, addr, pte);
-   if (!page || !PageAnon(page) || !PageAnonExclusive(page))
-   return false;
+   return page && PageAnon(page) && PageAnonExclusive(page);
}
 
-   return true;
+   /*
+* Shared mapping: "clean" might indicate that the FS still has to be
+* notified via a write fault once first -- see vma_wants_writenotify().
+* If "dirty", the assumption is that there already was a write fault.
+*/
+   return pte_dirty(pte);
 }
 
 static unsigned long change_pte_range(struct mmu_gather *tlb,
-- 
2.37.3
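
As a reading aid for patches 1 and 2, the decision order that
can_change_pte_writable() ends up with can be modeled with plain booleans.
The struct and function below are hypothetical; each field stands for one
kernel predicate named in its comment (pte_protnone(), PageAnonExclusive(),
pte_dirty(), ...), and the model ignores locking and the page lookup itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of what can_change_pte_writable() inspects. */
struct pte_state {
	bool vma_write;		/* VM_WRITE set on the VMA */
	bool vma_shared;	/* VM_SHARED set on the VMA */
	bool protnone;		/* pte_protnone(): NUMA hinting entry */
	bool softdirty_needed;	/* soft-dirty tracking on, soft-dirty bit clear */
	bool uffd_wp;		/* userfaultfd write-protected */
	bool anon;		/* normal page that is PageAnon() */
	bool anon_exclusive;	/* PageAnonExclusive() */
	bool dirty;		/* pte_dirty() */
};

/* Decision order after patches 1 and 2 of this series. */
static bool can_change_writable(const struct pte_state *s)
{
	if (!s->vma_write)
		return false;	/* patch 2: WARN_ON_ONCE() instead of VM_BUG_ON() */
	if (s->protnone)
		return false;	/* not even readable (NUMA hinting) */
	if (s->softdirty_needed || s->uffd_wp)
		return false;	/* write faults are needed for tracking */
	if (!s->vma_shared)
		return s->anon && s->anon_exclusive;	/* patch 1: dirty bit ignored */
	return s->dirty;	/* shared: clean may still need writenotify */
}
```

So a clean but exclusive anonymous page can now be made writable, while a
clean page in a shared mapping still cannot.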

[PATCH RFC 0/5] mm/autonuma: replace savedwrite infrastructure

2022-09-26 Thread David Hildenbrand

As discussed in my talk at LPC, we can reuse the same mechanism for
deciding whether to map a pte writable when upgrading permissions via
mprotect() -- e.g., PROT_READ -> PROT_READ|PROT_WRITE -- to replace the
savedwrite infrastructure used for NUMA hinting faults (e.g., PROT_NONE
-> PROT_READ|PROT_WRITE). Instead of maintaining previous write permissions
for a pte/pmd, we re-determine if the pte/pmd can be writable.

The big benefit is that we have a common logic for deciding whether we can
map a pte/pmd writable on protection changes.

For private mappings, there should be no difference -- from
what I understand, that is what autonuma benchmarks care about.

I ran autonumabench on a system with 2 NUMA nodes, 96 GiB each via:
perf stat --null --repeat 10

The numa1 benchmark is quite noisy in my environment. I suspect that there
is no actual change in performance, even though the numbers indicate that
this series might improve performance slightly.

numa1:
mm-stable:   156.75 +- 11.67 seconds time elapsed  ( +-  7.44% )
mm-stable++: 147.50 +- 9.35 seconds time elapsed  ( +-  6.34% )

numa2:
mm-stable:   15.9834 +- 0.0589 seconds time elapsed  ( +-  0.37% )
mm-stable++: 16.1467 +- 0.0946 seconds time elapsed  ( +-  0.59% )
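
The numa1 noise remark can be checked with simple arithmetic on the numbers
above. A minimal sketch, assuming "mm-stable++" means mm-stable with this
series applied and treating the reported "+-" values only as a rough spread
(a crude heuristic, not a statistical test); the helper names are
hypothetical.

```c
#include <assert.h>

static double abs_d(double x)
{
	return x < 0 ? -x : x;
}

static double max_d(double a, double b)
{
	return a > b ? a : b;
}

/* Relative change of the new mean versus the baseline mean, in percent. */
static double rel_change_pct(double base, double new_mean)
{
	return (new_mean - base) / base * 100.0;
}

/* Crude rule: the difference is "noise" if smaller than the larger spread. */
static int within_spread(double base, double base_spread,
			 double new_mean, double new_spread)
{
	return abs_d(new_mean - base) < max_d(base_spread, new_spread);
}
```

For numa1 the 9.25 s difference (about -5.9%) is smaller than either reported
spread (11.67 s, 9.35 s). By the same crude rule, the numa2 difference (about
+1.0%, 0.1633 s) is larger than both of its reported spreads.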

It is worth noting that for shared writable mappings that require
writenotify, we will only avoid write faults if the pte/pmd is dirty
(inherited from the older mprotect logic). If we ever care about optimizing
that further, we'd need a different mechanism to identify whether the FS
still needs to get notified on the next write access. In any case, such an
optimization will then not be autonuma-specific, but mprotect() permission
upgrades would similarly benefit from it.

Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: Dave Chinner 
Cc: Nadav Amit 
Cc: Peter Xu 
Cc: Andrea Arcangeli 
Cc: Hugh Dickins 
Cc: Vlastimil Babka 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Mike Rapoport 
Cc: Anshuman Khandual 

David Hildenbrand (4):
  mm/mprotect: minor can_change_pte_writable() cleanups
  mm/huge_memory: try avoiding write faults when changing PMD protection
  mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite
  mm: remove unused savedwrite infrastructure

Nadav Amit (1):
  mm/mprotect: allow clean exclusive anon pages to be writable

 arch/powerpc/include/asm/book3s/64/pgtable.h | 80 +---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  2 +-
 include/linux/mm.h   |  2 +
 include/linux/pgtable.h  | 24 --
 mm/debug_vm_pgtable.c| 32 
 mm/huge_memory.c | 66 
 mm/ksm.c |  9 +--
 mm/memory.c  | 19 -
 mm/mprotect.c| 23 +++---
 9 files changed, 93 insertions(+), 164 deletions(-)

-- 
2.37.3

Re: [PATCH v2 6/6] powerpc/64: Add tests for out-of-line static calls

2022-09-26 Thread Christophe Leroy



Le 26/09/2022 à 08:43, Benjamin Gray a écrit :
> KUnit tests for the various combinations of caller/trampoline/target and
> kernel/module. They must be run from a module loaded at runtime to
> guarantee they have a different TOC to the kernel.
> 
> The tests try to mitigate the chance of panicking by restoring the
> TOC after every static call. Not all possible errors can be caught
> by this (we can't stop a trampoline from using a bad TOC itself),
> but it makes certain errors easier to debug.
> 
> Signed-off-by: Benjamin Gray 
> ---
>   arch/powerpc/Kconfig   |  10 +
>   arch/powerpc/kernel/Makefile   |   1 +
>   arch/powerpc/kernel/static_call.c  |  61 ++
>   arch/powerpc/kernel/static_call_test.c | 251 +
>   arch/powerpc/kernel/static_call_test.h |  56 ++
>   5 files changed, 379 insertions(+)
>   create mode 100644 arch/powerpc/kernel/static_call_test.c
>   create mode 100644 arch/powerpc/kernel/static_call_test.h
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index e7a66635eade..0ca60514c0e2 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -1023,6 +1023,16 @@ config PPC_RTAS_FILTER
> Say Y unless you know what you are doing and the filter is causing
> problems for you.
>   
> +config PPC_STATIC_CALL_KUNIT_TEST
> + tristate "KUnit tests for PPC64 ELF ABI V2 static calls"
> + default KUNIT_ALL_TESTS
> + depends on HAVE_STATIC_CALL && PPC64_ELF_ABI_V2 && KUNIT && m

Is there a reason why it is dedicated to PPC64 ? In that case, can you 
make it explicit with the name of the config option, and with the name 
of the file below ?

> + help
> +   Tests that check the TOC is kept consistent across all combinations
> +   of caller/trampoline/target being kernel/module. Must be built as a
> +   module and loaded at runtime to ensure the module has a different
> +   TOC to the kernel.
> +
>   endmenu
>   
>   config ISA_DMA_API
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index a30d0d0f5499..22c07e3d34df 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -131,6 +131,7 @@ obj-$(CONFIG_RELOCATABLE) += reloc_$(BITS).o
>   obj-$(CONFIG_PPC32) += entry_32.o setup_32.o early_32.o
>   obj-$(CONFIG_PPC64) += dma-iommu.o iommu.o
>   obj-$(CONFIG_HAVE_STATIC_CALL)  += static_call.o
> +obj-$(CONFIG_PPC_STATIC_CALL_KUNIT_TEST) += static_call_test.o
>   obj-$(CONFIG_KGDB)  += kgdb.o
>   obj-$(CONFIG_BOOTX_TEXT)+= btext.o
>   obj-$(CONFIG_SMP)   += smp.o
> diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
> index ecbb74e1b4d3..8d338917b70e 100644
> --- a/arch/powerpc/kernel/static_call.c
> +++ b/arch/powerpc/kernel/static_call.c
> @@ -113,3 +113,64 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
>   panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
>   }
>   EXPORT_SYMBOL_GPL(arch_static_call_transform);
> +
> +
> +#if IS_MODULE(CONFIG_PPC_STATIC_CALL_KUNIT_TEST)
> +
> +#include "static_call_test.h"
> +
> +int ppc_sc_kernel_target_1(struct kunit* test)
> +{
> + toc_fixup(test);
> + return 1;
> +}
> +
> +int ppc_sc_kernel_target_2(struct kunit* test)
> +{
> + toc_fixup(test);
> + return 2;
> +}
> +
> +DEFINE_STATIC_CALL(ppc_sc_kernel, ppc_sc_kernel_target_1);
> +
> +int ppc_sc_kernel_call(struct kunit* test)
> +{
> + return PROTECTED_SC(test, int, static_call(ppc_sc_kernel)(test));
> +}
> +
> +int ppc_sc_kernel_call_indirect(struct kunit* test, int (*fn)(struct kunit*))
> +{
> + return PROTECTED_SC(test, int, fn(test));
> +}
> +
> +long ppc_sc_kernel_target_big(struct kunit* test,
> +   long a,
> +   long b,
> +   long c,
> +   long d,
> +   long e,
> +   long f,
> +   long g,
> +   long h,
> +   long i)
> +{
> + toc_fixup(test);
> + KUNIT_EXPECT_EQ(test, a, b);
> + KUNIT_EXPECT_EQ(test, a, c);
> + KUNIT_EXPECT_EQ(test, a, d);
> + KUNIT_EXPECT_EQ(test, a, e);
> + KUNIT_EXPECT_EQ(test, a, f);
> + KUNIT_EXPECT_EQ(test, a, g);
> + KUNIT_EXPECT_EQ(test, a, h);
> + KUNIT_EXPECT_EQ(test, a, i);
> + return ~a;
> +}
> +
> +EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_1);
> +EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_2);
> +EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_big);
> +EXPORT_STATIC_CALL_GPL(ppc_sc_kernel);
> +EXPORT_SYMBOL_GPL(ppc_sc_kernel_call);
> +EXPORT_SYMBOL_GPL(ppc_sc_kernel_call_indirect);
> +
> +#endif /* IS_MODULE(CONFIG_PPC_STATIC_CALL_KUNIT_TEST) */
> diff --git a/arch/powerpc/kernel/static_call_test.c b/arch/powerpc/kernel/static_call_test.c
> new file mode 100644
> index

Re: [PATCH v2 5/6] powerpc/64: Add support for out-of-line static calls

2022-09-26 Thread Christophe Leroy



Le 26/09/2022 à 08:43, Benjamin Gray a écrit :
> Implement static call support for 64 bit V2 ABI. This requires
> making sure the TOC is kept correct across kernel-module
> boundaries. As a secondary concern, it tries to use the local
> entry point of a target wherever possible. It does so by
> checking if both tramp & target are kernel code, and falls
> back to detecting the common global entry point patterns
> if modules are involved. Detecting the global entry point is
> also required for setting the local entry point as the trampoline
> target: if we cannot detect the local entry point, then we need to
> conservatively initialise r12 and use the global entry point.
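The detection of the common global entry point pattern described above can be sketched in user-space C. This is an illustration only: the SKETCH_* constants and helper names are hypothetical, and the sketch assumes the usual ELFv2 prologue of `addis r2,r12,...` (or `lis r2,...`) followed by `addi r2,r2,...`, with the local entry point immediately after those two instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical constants for this sketch. The mask keeps the primary
 * opcode plus the RT and RA fields (top 16 bits of a D-form instruction)
 * and discards the 16-bit signed immediate.
 */
#define SKETCH_OP_RT_RA_MASK 0xffff0000u
#define SKETCH_ADDIS_R2_R12  0x3c4c0000u /* addis r2, r12, 0 */
#define SKETCH_LIS_R2        0x3c400000u /* lis r2, 0 (addis r2, 0, 0) */
#define SKETCH_ADDI_R2_R2    0x38420000u /* addi r2, r2, 0 */

/* True if the first two instructions look like a TOC-setup prologue. */
static bool looks_like_global_entry(const uint32_t insn[2])
{
	uint32_t first = insn[0] & SKETCH_OP_RT_RA_MASK;
	uint32_t second = insn[1] & SKETCH_OP_RT_RA_MASK;

	return (first == SKETCH_ADDIS_R2_R12 || first == SKETCH_LIS_R2) &&
	       second == SKETCH_ADDI_R2_R2;
}

/* Skip the recognised prologue; otherwise fall back to the global entry. */
static const uint32_t *local_entry(const uint32_t *func)
{
	assert(func != NULL);
	return looks_like_global_entry(func) ? func + 2 : func;
}
```

When the prologue is not recognised, the sketch returns the global entry point, which matches the conservative fallback of initialising r12 and branching to the global entry.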
> 
> The trampolines are marked with `.localentry NAME, 1` to make the
> linker save and restore the TOC on each call to the trampoline. This
> allows the trampoline to safely target functions with different TOC
> values.
> 
> However this directive also implies the TOC is not initialised on entry
> to the trampoline. The kernel TOC is easily found in the PACA, but not
> an arbitrary module TOC. Therefore the trampoline implementation depends
> on whether it's in the kernel or not. If in the kernel, we initialise
> the TOC using the PACA. If in a module, we have to initialise the TOC
> with zero context, so it's quite expensive.
> 
> Signed-off-by: Benjamin Gray 
> ---
>   arch/powerpc/Kconfig |  2 +-
>   arch/powerpc/include/asm/code-patching.h |  1 +
>   arch/powerpc/include/asm/static_call.h   | 80 +++--
>   arch/powerpc/kernel/Makefile |  3 +-
>   arch/powerpc/kernel/static_call.c| 90 ++--
>   5 files changed, 164 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 4c466acdc70d..e7a66635eade 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -248,7 +248,7 @@ config PPC
>   select HAVE_SOFTIRQ_ON_OWN_STACK
>   select HAVE_STACKPROTECTOR  if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
>   select HAVE_STACKPROTECTOR  if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
> - select HAVE_STATIC_CALL if PPC32
> + select HAVE_STATIC_CALL if PPC32 || PPC64_ELF_ABI_V2
>   select HAVE_SYSCALL_TRACEPOINTS
>   select HAVE_VIRT_CPU_ACCOUNTING
>   select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
> index 15efd8ab22da..8d1850080af8 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -132,6 +132,7 @@ int translate_branch(ppc_inst_t *instr, const u32 *dest, const u32 *src);
>   bool is_conditional_branch(ppc_inst_t instr);
>   
>   #define OP_RT_RA_MASK   0xUL
> +#define OP_SI_MASK   0xUL
>   #define LIS_R2  (PPC_RAW_LIS(_R2, 0))
>   #define ADDIS_R2_R12  (PPC_RAW_ADDIS(_R2, _R12, 0))
>   #define ADDI_R2_R2  (PPC_RAW_ADDI(_R2, _R2, 0))
> diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
> index de1018cc522b..3d6e82200cb7 100644
> --- a/arch/powerpc/include/asm/static_call.h
> +++ b/arch/powerpc/include/asm/static_call.h
> @@ -2,12 +2,75 @@
>   #ifndef _ASM_POWERPC_STATIC_CALL_H
>   #define _ASM_POWERPC_STATIC_CALL_H
>   
> +#ifdef CONFIG_PPC64_ELF_ABI_V2
> +
> +#ifdef MODULE
> +
> +#define __PPC_SCT(name, inst)\
> + asm(".pushsection .text, \"ax\" \n" \
> + ".align 6   \n" \
> + ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
> + ".localentry " STATIC_CALL_TRAMP_STR(name) ", 1 \n" \
> + STATIC_CALL_TRAMP_STR(name) ":  \n" \
> + "   mflr    11  \n" \
> + "   bcl 20, 31, $+4 \n" \
> + "0: mflr    12  \n" \
> + "   mtlr    11  \n" \
> + "   addi    12, 12, (" STATIC_CALL_TRAMP_STR(name) " - 0b)  \n" \
> + "   addis 2, 12, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@ha \n" \
> + "   addi 2, 2, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@l\n" \
> + "   " inst "\n" \
> + "   ld  12, (2f - " STATIC_CALL_TRAMP_STR(name) ")(12)  \n" \
> + "   mtctr   12  \n" \
> + "   bctr\n" \
> + "1: li  3, 0\n" \
> + "   blr
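The trampoline derives r2 from its own address with an `addis`/`addi` pair using the `@ha` and `@l` operators. The user-space sketch below (hypothetical helper names) shows why `@ha` rounds by 0x8000: `addi` sign-extends its 16-bit immediate, so the high half has to compensate whenever the low half is 0x8000 or more.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit immediate the way addi/addis do. */
static int64_t sext16(uint16_t v)
{
	return v >= 0x8000u ? (int64_t)v - 0x10000 : (int64_t)v;
}

/* @ha: high 16 bits, adjusted for the sign extension of the low half. */
static uint16_t ha16(int32_t x)
{
	return (uint16_t)(((uint32_t)x + 0x8000u) >> 16);
}

/* @l: low 16 bits. */
static uint16_t lo16(int32_t x)
{
	return (uint16_t)((uint32_t)x & 0xffffu);
}

/* Model "addis rT, rA, ha" followed by "addi rT, rT, lo". */
static int64_t addis_addi(int64_t base, uint16_t ha, uint16_t lo)
{
	int64_t t = base + sext16(ha) * 65536;

	return t + sext16(lo);
}
```

Without the adjustment, an offset such as 0x1234f000 would come out 0x10000 too low, because its low half (0xf000) is sign-extended to -0x1000.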

Re: [PATCH v2 4/6] static_call: Move static call selftest to static_call_selftest.c

2022-09-26 Thread Christophe Leroy



Le 26/09/2022 à 08:43, Benjamin Gray a écrit :
> These tests are out-of-line only, so moving them to
> their own file allows them to be run when an arch does
> not implement inline static calls.
> 
> Signed-off-by: Benjamin Gray 

I think you got a Reviewed-by on the previous series.

> ---
>   kernel/Makefile   |  1 +
>   kernel/static_call_inline.c   | 43 ---
>   kernel/static_call_selftest.c | 41 +
>   3 files changed, 42 insertions(+), 43 deletions(-)
>   create mode 100644 kernel/static_call_selftest.c
> 
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 318789c728d3..8ce8beaa3cc0 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -113,6 +113,7 @@ obj-$(CONFIG_KCSAN) += kcsan/
>   obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
>   obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o
>   obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o
> +obj-$(CONFIG_STATIC_CALL_SELFTEST) += static_call_selftest.o
>   obj-$(CONFIG_CFI_CLANG) += cfi.o
>   
>   obj-$(CONFIG_PERF_EVENTS) += events/
> diff --git a/kernel/static_call_inline.c b/kernel/static_call_inline.c
> index dc5665b62814..64d04d054698 100644
> --- a/kernel/static_call_inline.c
> +++ b/kernel/static_call_inline.c
> @@ -498,46 +498,3 @@ int __init static_call_init(void)
>   return 0;
>   }
>   early_initcall(static_call_init);
> -
> -#ifdef CONFIG_STATIC_CALL_SELFTEST
> -
> -static int func_a(int x)
> -{
> - return x+1;
> -}
> -
> -static int func_b(int x)
> -{
> - return x+2;
> -}
> -
> -DEFINE_STATIC_CALL(sc_selftest, func_a);
> -
> -static struct static_call_data {
> -  int (*func)(int);
> -  int val;
> -  int expect;
> -} static_call_data [] __initdata = {
> -  { NULL,   2, 3 },
> -  { func_b, 2, 4 },
> -  { func_a, 2, 3 }
> -};
> -
> -static int __init test_static_call_init(void)
> -{
> -  int i;
> -
> -  for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) {
> -   struct static_call_data *scd = &static_call_data[i];
> -
> -  if (scd->func)
> -  static_call_update(sc_selftest, scd->func);
> -
> -  WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect);
> -  }
> -
> -  return 0;
> -}
> -early_initcall(test_static_call_init);
> -
> -#endif /* CONFIG_STATIC_CALL_SELFTEST */
> diff --git a/kernel/static_call_selftest.c b/kernel/static_call_selftest.c
> new file mode 100644
> index ..246ad89f64eb
> --- /dev/null
> +++ b/kernel/static_call_selftest.c
> @@ -0,0 +1,41 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +
> +static int func_a(int x)
> +{
> + return x+1;
> +}
> +
> +static int func_b(int x)
> +{
> + return x+2;
> +}
> +
> +DEFINE_STATIC_CALL(sc_selftest, func_a);
> +
> +static struct static_call_data {
> + int (*func)(int);
> + int val;
> + int expect;
> +} static_call_data [] __initdata = {
> + { NULL,   2, 3 },
> + { func_b, 2, 4 },
> + { func_a, 2, 3 }
> +};
> +
> +static int __init test_static_call_init(void)
> +{
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) {
> + struct static_call_data *scd = &static_call_data[i];
> +
> + if (scd->func)
> + static_call_update(sc_selftest, scd->func);
> +
> + WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect);
> + }
> +
> + return 0;
> +}
> +early_initcall(test_static_call_init);
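The table-driven selftest above can be modelled in user space to make its semantics explicit: a NULL entry keeps the current target, and every row checks the result of calling through the (possibly updated) target. In this sketch a plain function pointer stands in for the static call and an assignment stands in for static_call_update(); it says nothing about how the kernel actually patches the call site.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sc_fn)(int);

static int func_a(int x) { return x + 1; }
static int func_b(int x) { return x + 2; }

/* Stand-in for DEFINE_STATIC_CALL(sc_selftest, func_a). */
static sc_fn sc_selftest_target = func_a;

struct static_call_data {
	sc_fn func;	/* NULL means "keep the current target" */
	int val;
	int expect;
};

static const struct static_call_data static_call_data[] = {
	{ NULL,   2, 3 },	/* initial target func_a: 2 + 1 */
	{ func_b, 2, 4 },	/* switch to func_b: 2 + 2 */
	{ func_a, 2, 3 },	/* switch back to func_a */
};

/* Returns the number of mismatches instead of WARN_ON()ing. */
static int run_selftest(void)
{
	int failures = 0;

	for (size_t i = 0; i < sizeof(static_call_data) / sizeof(static_call_data[0]); i++) {
		const struct static_call_data *scd = &static_call_data[i];

		if (scd->func)
			sc_selftest_target = scd->func;	/* like static_call_update() */

		if (sc_selftest_target(scd->val) != scd->expect)
			failures++;
	}
	return failures;
}
```

Because nothing in the table depends on inline call-site patching, the same checks are meaningful on an architecture that only implements out-of-line static calls, which is the motivation for moving the test.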

Re: [PATCH v2 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub

2022-09-26 Thread Christophe Leroy



Le 26/09/2022 à 08:43, Benjamin Gray a écrit :
> Inserts a direct branch to the stub target when possible, replacing the
> mtctr/bctr sequence.
> 
> The load into r12 could potentially be skipped too, but that change
> would need to refactor the arguments to indicate that the address
> does not have a separate local entry point.
> 
> This helps the static call implementation, where modules calling their
> own trampolines are called through this stub and the trampoline is
> easily within range of a direct branch.
> 
> Signed-off-by: Benjamin Gray 
> ---
>   arch/powerpc/kernel/module_64.c | 13 +
>   1 file changed, 13 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
> index 4d816f7785b4..745ce9097dcf 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -141,6 +141,12 @@ static u32 ppc64_stub_insns[] = {
>   PPC_RAW_BCTR(),
>   };
>   
> +#ifdef CONFIG_PPC64_ELF_ABI_V1
> +#define PPC64_STUB_MTCTR_OFFSET 5
> +#else
> +#define PPC64_STUB_MTCTR_OFFSET 4
> +#endif
> +
>   /* Count how many different 24-bit relocations (different symbol,
>  different addend) */
>   static unsigned int count_relocs(const Elf64_Rela *rela, unsigned int num)
> @@ -429,6 +435,8 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
>   long reladdr;
>   func_desc_t desc;
>   int i;
> + u32 *jump_seq_addr = &entry->jump[PPC64_STUB_MTCTR_OFFSET];
> + ppc_inst_t direct;
>   
>   if (is_mprofile_ftrace_call(name))
>   return create_ftrace_stub(entry, addr, me);
> @@ -439,6 +447,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
>   return 0;
>   }
>   
> + /* Replace indirect branch sequence with direct branch where possible */
> + if (!create_branch(&direct, jump_seq_addr, addr, 0))
> + if (patch_instruction(jump_seq_addr, direct))

Why not use patch_branch() ?

> + return 0;
> +
>   /* Stub uses address relative to r2. */
>   reladdr = (unsigned long)entry - my_r2(sechdrs, me);
>   if (reladdr > 0x7FFF || reladdr < -(0x8000L)) {
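Replacing the mtctr/bctr sequence with a direct branch only works when the target is reachable by an I-form `b` instruction, whose 24-bit LI field (shifted left by two) gives a signed, word-aligned displacement of roughly +/-32 MiB. The sketch below shows that range check and encoding; the helper names are hypothetical and only loosely modelled on what create_branch() has to do, so it is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* I-form "b target": primary opcode 18, AA = 0, LK = 0. */
#define SKETCH_OP_B 0x48000000u

/* Displacement must be word aligned and fit in 26 signed bits. */
static bool sketch_offset_in_branch_range(int64_t offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}

/*
 * Encode a relative branch from site to target. Returns false when the
 * target is out of range, so the caller can keep the indirect sequence.
 */
static bool sketch_encode_branch(uint32_t *insn, uint64_t site, uint64_t target)
{
	int64_t offset = (int64_t)(target - site);

	if (!sketch_offset_in_branch_range(offset))
		return false;

	*insn = SKETCH_OP_B | ((uint32_t)offset & 0x03fffffcu);
	return true;
}
```

When encoding fails, the stub keeps its existing mtctr/bctr sequence, which is the fallback the patch relies on for far-away targets.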

Re: [PATCH 2/7] mm: Free device private pages have zero refcount

2022-09-26 Thread Jason Gunthorpe

On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote:
> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page
> refcount") device private pages no longer have an extra reference
> count when the page is in use. However before handing them back to the
> owning device driver we add an extra reference count such that free
> pages have a reference count of one.
> 
> This makes it difficult to tell if a page is free or not because both
> free and in use pages will have a non-zero refcount. Instead we should
> return pages to the driver's page allocator with a zero reference count.
> Kernel code can then safely use kernel functions such as
> get_page_unless_zero().
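The point about get_page_unless_zero() is that an "increment unless zero" operation can only tell free pages from in-use ones if free pages sit at zero. A minimal user-space sketch of that primitive with C11 atomics follows; ref_get_unless_zero() is a hypothetical name, and the kernel's page reference counting is more involved than this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is not free (refcount != 0). */
static bool ref_get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	do {
		if (old == 0)
			return false;	/* free: do not resurrect it */
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));

	return true;
}
```

With free pages held at a refcount of one, the same call would succeed on a free page, which is exactly the ambiguity the patch removes.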
> 
> Signed-off-by: Alistair Popple 
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c   | 1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 +
>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 1 +
>  lib/test_hmm.c   | 1 +
>  mm/memremap.c| 5 -
>  mm/page_alloc.c  | 6 ++
>  6 files changed, 10 insertions(+), 5 deletions(-)

I think this is a great idea, but I'm surprised no dax stuff is
touched here?

Jason

Re: [PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function

2022-09-26 Thread Christophe Leroy

Hi,

By the way, my email address is no longer @c-s.fr but @csgroup.eu, 
although the former still works.

Le 26/09/2022 à 08:43, Benjamin Gray a écrit :
> Adds a generic text patching mechanism for patches of 1, 2, 4, or (64-bit) 8
> bytes. The patcher conditionally syncs the icache depending on whether
> the content will be executed (as opposed to, e.g., read-only data).
> 
> The `patch_instruction` function is reimplemented in terms of this
> more generic function. This generic implementation allows patching of
> arbitrary 64-bit data, whereas the original `patch_instruction` decided
> the size based on the 'instruction' opcode, so was not suitable for
> arbitrary data.

This is a lot better, though there is still some slight degradation: 
activating and deactivating ftrace takes approximately 3% more time when 
STRICT_KERNEL_RWX is selected.

Without STRICT_KERNEL_RWX the result is surprising: activation is also 
about 3% slower, but deactivation takes 25% more time.


> 
> Signed-off-by: Benjamin Gray 
> ---
>   arch/powerpc/include/asm/code-patching.h |  7 ++
>   arch/powerpc/lib/code-patching.c | 90 +---
>   2 files changed, 71 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
> index 1c6316ec4b74..15efd8ab22da 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -76,6 +76,13 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
>   int patch_branch(u32 *addr, unsigned long target, int flags);
>   int patch_instruction(u32 *addr, ppc_inst_t instr);
>   int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
> +int __patch_memory(void *dest, unsigned long src, size_t size);
> +
> +#define patch_memory(addr, val) \
> +({ \
> + BUILD_BUG_ON(!__native_word(val)); \
> + __patch_memory(addr, (unsigned long) val, sizeof(val)); \
> +})

Can you do a static __always_inline function instead of a macro here ?
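One way to see the trade-off between sizing by the value's type and passing the size explicitly is the user-space sketch below. The helper and macro names are hypothetical and the copy ignores cache maintenance and fault handling; it only shows that an int literal makes the value-sized variant write 4 bytes into an 8-byte slot, while sizing from the destination (as put_user() does) or passing sizeof() explicitly to a plain function writes all 8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for __patch_memory(): store the low `size` bytes of val. */
static int sketch_patch_memory(void *dest, unsigned long val, size_t size)
{
	switch (size) {
	case 1: { uint8_t v = (uint8_t)val; memcpy(dest, &v, 1); return 0; }
	case 2: { uint16_t v = (uint16_t)val; memcpy(dest, &v, 2); return 0; }
	case 4: { uint32_t v = (uint32_t)val; memcpy(dest, &v, 4); return 0; }
	case 8: { uint64_t v = (uint64_t)val; memcpy(dest, &v, 8); return 0; }
	default: return -1;	/* a kernel version would rather fail the build */
	}
}

/* Size taken from the value, as in the v2 macro. */
#define SKETCH_PATCH_BY_VALUE(addr, val) \
	sketch_patch_memory((addr), (unsigned long)(val), sizeof(val))

/* Size taken from the destination, in the spirit of put_user(). */
#define SKETCH_PATCH_BY_DEST(ptr, val) \
	sketch_patch_memory((ptr), (unsigned long)(val), sizeof(*(ptr)))
```

An explicit size parameter also lets patch_memory() be an ordinary static __always_inline function, as requested above, since no type has to be passed as an argument.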

> 
>   static inline unsigned long patch_site_addr(s32 *site)
>   {
> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> index ad0cf3108dd0..9979380d55ef 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -15,20 +15,47 @@
>   #include 
>   #include 
> 
> -static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
> +static int __always_inline ___patch_memory(void *patch_addr,
> +unsigned long data,
> +void *prog_addr,
> +size_t size)

Is it really needed in the .c file ? I would expect GCC to take the 
right decision by itself.

By the way, the __always_inline must immediately follow static.

>   {
> - if (!ppc_inst_prefixed(instr)) {
> - u32 val = ppc_inst_val(instr);
> + switch (size) {
> + case 1:
> + __put_kernel_nofault(patch_addr, &data, u8, failed);
> + break;
> + case 2:
> + __put_kernel_nofault(patch_addr, &data, u16, failed);
> + break;
> + case 4:
> + __put_kernel_nofault(patch_addr, &data, u32, failed);
> + break;
> +#ifdef CONFIG_PPC64
> + case 8:
> + __put_kernel_nofault(patch_addr, &data, u64, failed);
> + break;
> +#endif
> + default:
> + unreachable();

A BUILD_BUG() would be better here I think.

> + }
> 
> - __put_kernel_nofault(patch_addr, &val, u32, failed);
> - } else {
> - u64 val = ppc_inst_as_ulong(instr);
> + dcbst(patch_addr);
> + dcbst(patch_addr + size - 1); /* Last byte of data may cross a cacheline */

Or the second byte of data may cross a cacheline ...
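Christophe's remark can be checked with a little address arithmetic. For a write no larger than one cache line, flushing the line that holds the first byte and the line that holds the last byte covers every byte, no matter which byte is the first to spill into the next line. The user-space sketch below (hypothetical helper names) computes whether a write straddles two lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Start of the cache line containing addr; line must be a power of two. */
static uintptr_t line_of(uintptr_t addr, uintptr_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0);
	return addr & ~(line - 1);
}

/*
 * True if a write of size bytes at addr touches two cache lines.
 * Assumes 0 < size <= line, so at most two lines are involved.
 */
static bool crosses_line(uintptr_t addr, size_t size, uintptr_t line)
{
	return line_of(addr, line) != line_of(addr + size - 1, line);
}
```

For example, an 8-byte write at 0x101c with 32-byte lines spans 0x101c to 0x1023 and touches two lines, while the same write is contained in one 64-byte line; a 2-byte write at 0x101f already crosses at its second byte.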

> 
> - __put_kernel_nofault(patch_addr, &val, u64, failed);
> - }
> + mb(); /* sync */
> +
> + /* Flush on the EA that may be executed in case of a non-coherent icache */
> + icbi(prog_addr);
> +
> + /* Also flush the last byte of the instruction if it may be a
> +  * prefixed instruction and we aren't assuming minimum 64-byte
> +  * cacheline sizes
> +  */
> + if (IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64)
> + icbi(prog_addr + size - 1);
> 
> - asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
> - "r" (exec_addr));
> + mb(); /* sync */
> + isync();
> 
>   return 0;
> 
> @@ -38,7 +65,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr
> 
>   int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
>   {
> - return __patch_instruction(addr, instr, addr);
> + if (ppc_inst_prefixed(instr))
> + return ___patch_memory(addr, ppc_inst_as_ulong(instr), addr, sizeof(u64));
> + else
> + return ___patch_memory(addr, ppc_inst_val(instr), addr, sizeof(u32));
>

Re: [PATCH v2 0/6] Out-of-line static calls for powerpc64 ELF V2

2022-09-26 Thread Christophe Leroy



Le 26/09/2022 à 08:43, Benjamin Gray a écrit :
> Implementation of out-of-line static calls for PowerPC 64-bit ELF V2 ABI.
> Static calls patch an indirect branch into a direct branch at runtime.
> Out-of-line specifically has a caller directly call a trampoline, and
> the trampoline gets patched to directly call the target.
> 
> Previous version here:
> https://lore.kernel.org/all/20220916062330.430468-1-bg...@linux.ibm.com/
> 
> I couldn't see a dedicated ftrace benchmark in the kernel, but my own
> benchmarking showed no significant impact to ftrace activation.

I use the following hack for benchmarking:

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 439e2ab6905e..e7d0d3deb8bf 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2628,10 +2628,11 @@ void __weak ftrace_replace_code(int mod_flags)
 bool enable = mod_flags & FTRACE_MODIFY_ENABLE_FL;
 int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL;
 int failed;
+   int t0;

 if (unlikely(ftrace_disabled))
 return;
-
+t0 = mftb();
 do_for_each_ftrace_rec(pg, rec) {

 if (rec->flags & FTRACE_FL_DISABLED)
@@ -2646,6 +2647,8 @@ void __weak ftrace_replace_code(int mod_flags)
 if (schedulable)
 cond_resched();
 } while_for_each_ftrace_rec();
+t0 = mftb() - t0;
+pr_err("%s: %d\n", __func__, t0);
  }

  struct ftrace_rec_iter {
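A note on the hack above: t0 is an int, so the timebase value is truncated to 32 bits. That is harmless for short intervals because only the low 32 bits of the difference are kept; with GCC the conversions wrap, and the printed value is correct as long as fewer than 2^31 ticks elapse. The sketch below shows the same arithmetic with unsigned 32-bit snapshots, which avoids relying on implementation-defined conversions; elapsed32() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Elapsed ticks from truncated 32-bit snapshots of a 64-bit counter.
 * Unsigned subtraction is modulo 2^32, so the result is exact as long
 * as fewer than 2^32 ticks elapsed, even if the low word wrapped.
 */
static uint32_t elapsed32(uint64_t start, uint64_t end)
{
	return (uint32_t)end - (uint32_t)start;
}
```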


> 
> The __patch_memory function is meant to be accessed through the size checking
> patch_memory wrapper. I don't think there's a way to expose the macro without
> also exposing __patch_memory though. I considered making the type an explicit
> macro param, but using the value type seemed more ergonomic.
> 
> V2:
> Mostly accounting for feedback from Christophe:
> * Code patching rewritten
>  - Rename to *_memory
>  - Use __always_inline to get the compiler to realise it can
>collapse all the sub-functions
>  - Pass data directly instead of through a pointer, eliding a redundant load
>  - Flush the last byte of data too (technically redundant if an instruction, but
>saves a conditional branch + the isync will be the bottleneck).
>  - Handle a non-coherent icache, assume a coherent dcache
>  - Handle when we don't assume a 64 byte icache on 64-bits
>  - Flatten the poke address init and teardown
>  - Check the data size in patch_memory at build time
>(inline function was suggested, but a macro makes checking
> based on the data type easier).
>  - It builds now on 32 bit and without strict RWX
> * Static call enabling is no longer configurable
> * Refactored arch_static_call_transform to minimise casting
> * Made the KUnit tests more robust (previously they changed non-volatile
>registers in the init hook, but that's incorrect because it returns to
>the KUnit framework before the test case is called).
> * Some other minor refactoring in other patches
> 
> 
> Benjamin Gray (6):
>powerpc/code-patching: Implement generic text patching function
>powerpc/module: Handle caller-saved TOC in module linker
>powerpc/module: Optimise nearby branches in ELF V2 ABI stub
>static_call: Move static call selftest to static_call_selftest.c
>powerpc/64: Add support for out-of-line static calls
>powerpc/64: Add tests for out-of-line static calls
> 
>   arch/powerpc/Kconfig |  12 +-
>   arch/powerpc/include/asm/code-patching.h |   8 +
>   arch/powerpc/include/asm/static_call.h   |  80 +++-
>   arch/powerpc/kernel/Makefile |   4 +-
>   arch/powerpc/kernel/module_64.c  |  27 ++-
>   arch/powerpc/kernel/static_call.c| 151 +-
>   arch/powerpc/kernel/static_call_test.c   | 251 +++
>   arch/powerpc/kernel/static_call_test.h   |  56 +
>   arch/powerpc/lib/code-patching.c |  90 +---
>   kernel/Makefile  |   1 +
>   kernel/static_call_inline.c  |  43 
>   kernel/static_call_selftest.c|  41 
>   12 files changed, 682 insertions(+), 82 deletions(-)
>   create mode 100644 arch/powerpc/kernel/static_call_test.c
>   create mode 100644 arch/powerpc/kernel/static_call_test.h
>   create mode 100644 kernel/static_call_selftest.c
> 
> 
> base-commit: 3d7a198cfdb47405cfb4a3ea523876569fe341e6
> --
> 2.37.3

Re: [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status

2022-09-26 Thread Zhuo Chen

On 9/23/22 5:50 AM, Bjorn Helgaas wrote:

On Fri, Sep 02, 2022 at 02:16:34AM +0800, Zhuo Chen wrote:

The statements clearing AER error status in aer_enable_rootport() do
the same thing as pci_aer_raw_clear_status(), so replace them with a
call to it. No functional change.

Signed-off-by: Zhuo Chen 
---
  drivers/pci/pcie/aer.c | 7 +--
  1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index d2996afa80f6..eb0193f279f2 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1287,12 +1287,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc)
   SYSTEM_ERROR_INTR_ON_MESG_MASK);
  
  	/* Clear error status */

-   pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
-   pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32);
-   pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
-   pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
-   pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
+   pci_aer_raw_clear_status(pdev);


It's true that this is functionally equivalent.

But 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status() to
unconditionally clear Error Status") says pci_aer_raw_clear_status()
is only for use in the EDR path (this should have been included in the
function comment), so I think we should preserve that property and use
pci_aer_clear_status() here.

pci_aer_raw_clear_status() is the same as pci_aer_clear_status()
except it doesn't check pcie_aer_is_native().  And I'm pretty sure we
can't get to aer_enable_rootport() *unless* pcie_aer_is_native(),
because get_port_device_capability() checks the same thing, so they
should be equivalent here.
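Bjorn's equivalence argument can be written down as a small user-space model. It assumes, as he describes, that pci_aer_clear_status() is pci_aer_raw_clear_status() guarded by a pcie_aer_is_native() check, and it models the AER status registers as RW1C (write 1 to clear), which is why the removed read-then-write-back sequence clears exactly the bits that were set. All names are hypothetical, and the real helpers perform additional checks (for example, for the AER capability).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_port {
	bool aer_native;	/* stands in for pcie_aer_is_native() */
	uint32_t root_status;	/* RW1C status registers */
	uint32_t cor_status;
	uint32_t uncor_status;
};

/* RW1C: 1 bits clear the register bit, 0 bits leave it unchanged. */
static void rw1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

/* Read each register and write the value back, like the removed code. */
static int sketch_raw_clear_status(struct sketch_port *p)
{
	rw1c_write(&p->root_status, p->root_status);
	rw1c_write(&p->cor_status, p->cor_status);
	rw1c_write(&p->uncor_status, p->uncor_status);
	return 0;
}

/* Same, but refuses when the OS does not own AER. */
static int sketch_clear_status(struct sketch_port *p)
{
	if (!p->aer_native)
		return -EIO;
	return sketch_raw_clear_status(p);
}
```

If aer_enable_rootport() is only reachable when AER is native, the guard in sketch_clear_status() never triggers there and both variants leave the registers in the same state.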

Bjorn
Thanks Bjorn, this very detailed correction is helpful. By the way, it 
may be better to state 'only for use in the EDR path' explicitly in the 
function comment; so far only the commit log mentions it.


I will change it to use pci_aer_clear_status() in the next version.

--
Thanks,
Zhuo Chen

Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()

2022-09-26 Thread Zhuo Chen

On 9/23/22 5:08 AM, Bjorn Helgaas wrote:

On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:

When the state in pcie_do_recovery() is pci_channel_io_frozen, the
severity is fatal and the fatal status should be cleared, so add
pci_aer_clear_fatal_status().
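To make the fatal/non-fatal split concrete: in the AER capability, a set bit in the Uncorrectable Error Severity register marks the corresponding error as fatal. The sketch below assumes the clearing helpers select bits that way (non-fatal means status bits whose severity bit is clear, fatal means status bits whose severity bit is set) and shows which bits the recovery path would write back in each case. The names are hypothetical; this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits written back (RW1C) when clearing only non-fatal errors. */
static uint32_t nonfatal_bits_to_clear(uint32_t status, uint32_t sev)
{
	return status & ~sev;
}

/* Bits written back when clearing only fatal errors. */
static uint32_t fatal_bits_to_clear(uint32_t status, uint32_t sev)
{
	return status & sev;
}

/* The decision the patch adds: frozen channel => fatal, else non-fatal. */
static uint32_t recovery_bits_to_clear(bool io_frozen, uint32_t status, uint32_t sev)
{
	return io_frozen ? fatal_bits_to_clear(status, sev)
			 : nonfatal_bits_to_clear(status, sev);
}
```

Before the patch, a frozen-channel recovery would only have cleared the non-fatal subset, leaving the fatal bit that caused the recovery latched.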


Seems sensible to me.  Did you find this by code inspection or by
debugging a problem?  If the latter, it would be nice to mention the
symptoms of the problem in the commit log.


I found this by code inspection, so I cannot enumerate what kind of
problems this code would cause.



Since pci_aer_clear_fatal_status() and pci_aer_clear_nonfatal_status()
already call pcie_aer_is_native(), which performs the same check as
'if (host->native_aer || pcie_ports_native)', move them out of that
block.


Wrap commit log to fill 75 columns.


Signed-off-by: Zhuo Chen 
---
  drivers/pci/pcie/err.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 0c5a143025af..e0a8ade4c3fe 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
 * it is responsible for clearing this status.  In that case, the
 * signaling device may not even be visible to the OS.
 */
-   if (host->native_aer || pcie_ports_native) {
+   if (host->native_aer || pcie_ports_native)
pcie_clear_device_status(dev);


pcie_clear_device_status() doesn't check for pcie_aer_is_native()
internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
status only when we control AER"), both callers check before calling
it.

I think we should move the check inside pcie_clear_device_status().
That could be a separate preliminary patch.

There are a couple other places (aer_root_reset() and
get_port_device_capability()) that do the same check and could be
changed to use pcie_aer_is_native() instead.  That could be another
preliminary patch.

Good suggestion, but I have one doubt. In aer_root_reset(), if we use
"if (pcie_aer_is_native(dev) && aer)", then when dev->aer_cap is NULL and
root->aer_cap is not NULL, pcie_aer_is_native() will return false. That
differs from just using "(host->native_aer || pcie_ports_native)".
Alternatively, if we use "if (pcie_aer_is_native(root))", a NULL pointer
check would need to be added to pcie_aer_is_native() because root may
be NULL.

+   if (state == pci_channel_io_frozen)
+   pci_aer_clear_fatal_status(dev);
+   else
pci_aer_clear_nonfatal_status(dev);
-   }
+
pci_info(bridge, "device recovery successful\n");
return status;
  
--

2.30.1 (Apple Git-130)



--
Thanks,
Zhuo Chen

Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status

2022-09-26 Thread Zhuo Chen

On 9/23/22 4:02 AM, Bjorn Helgaas wrote:

On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote:

On 9/12/22 12:22 AM, Serge Semin wrote:

On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote:

pci_aer_clear_nonfatal_status() clears only the status bits for
ERR_NONFATAL errors, but we want to clear all uncorrectable error
status in ntb_hw_idt.c and lpfc_attr.c. So add
pci_aer_clear_uncorrect_error_status() and use it there.
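The behavioural difference for lpfc and ntb_hw_idt can be seen with a small RW1C model of the Uncorrectable Error Status register. Assuming pci_aer_clear_nonfatal_status() writes back only the bits whose severity bit is clear, any latched fatal error bits survive it, whereas clearing all uncorrectable status (what the proposed pci_aer_clear_uncorrect_error_status() is meant to do) leaves the register at zero. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* RW1C register model: written 1 bits are cleared. */
static uint32_t rw1c(uint32_t reg, uint32_t written)
{
	return reg & ~written;
}

/* Register contents after clearing only non-fatal errors. */
static uint32_t after_nonfatal_clear(uint32_t status, uint32_t sev)
{
	return rw1c(status, status & ~sev);
}

/* Register contents after clearing every uncorrectable error bit. */
static uint32_t after_uncorrect_clear(uint32_t status)
{
	return rw1c(status, status);
}
```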


What about the following drivers

drivers/scsi/lpfc/lpfc_attr.c
drivers/crypto/hisilicon/qm.c
drivers/net/ethernet/intel/ice/ice_main.c

which call the pci_aer_clear_nonfatal_status() method too?


‘pci_aer_clear_nonfatal_status()’ in
drivers/net/ethernet/intel/ice/ice_main.c has already been removed and
merged in kernel in: 
https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776


It's better if you can use kernel.org URLs that don't depend on
third parties like github, e.g.,

   https://git.kernel.org/linus/ca415ea1f03a


Good reminder, I'll pay attention next time.


‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be
removed in the next kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406


This is a problem because 00278564a60e ("crypto: hisilicon - Remove
pci_aer_clear_nonfatal_status() call") is in Herbert's cryptodev tree,
and if I apply this series to the PCI tree and Linus merges it before
Herbert's cryptodev changes, it will break the build.

I think we need to split this patch up like this:

   - Add pci_aer_clear_uncorrect_error_status() to PCI core
   - Convert dpc to use pci_aer_clear_uncorrect_error_status()
 (I might end up squashing with above)
   - Convert lpfc to use pci_aer_clear_uncorrect_error_status()
   - Convert ntb_hw_idt to use pci_aer_clear_uncorrect_error_status()
   - Unexport pci_aer_clear_nonfatal_status()

Then I can apply all but the last patch safely.  If the crypto changes
are merged first, we can add the last one; otherwise we can do it for
the next cycle.


Good proposal. I will implement these in the next version.

Should I put the PCI-related modifications (including patches 2/3 and 3/3)
in one patch set, or send them as individual patches?



The uncorrectable error status register was intended to be cleared in
drivers/scsi/lpfc/lpfc_attr.c, but the original behavior was changed in
https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb
and
https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f


This will be a behavior change for lpfc and ntb_hw_idt.  It looks like
it changes the behavior back to what it was before e7b0b847de6d
("PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery"),
so it might be OK, but splitting these out to their own patches will
make the change more obvious and we can make sure that's what we want.

Bjorn


Thanks Bjorn, I will put the lpfc and ntb_hw_idt changes in separate patches.



Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has
no functional changes.

Since pci_aer_clear_nonfatal_status() is used only internally, move
its declaration to the PCI internal header file. Also, no one cares
about the return value of pci_aer_clear_nonfatal_status(), so make it void.

Signed-off-by: Zhuo Chen 
---
   drivers/ntb/hw/idt/ntb_hw_idt.c |  4 ++--
   drivers/pci/pci.h   |  2 ++
   drivers/pci/pcie/aer.c  | 23 ++-
   drivers/pci/pcie/dpc.c  |  3 +--
   drivers/scsi/lpfc/lpfc_attr.c   |  4 ++--
   include/linux/aer.h |  4 ++--
   6 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 733557231ed0..de1dbbc5b9de 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
ret = pci_enable_pcie_error_reporting(pdev);
if (ret != 0)
dev_warn(&pdev->dev, "PCIe AER capability disabled\n");



-   else /* Cleanup nonfatal error status before getting to init */
-   pci_aer_clear_nonfatal_status(pdev);
+   else /* Cleanup uncorrectable error status before getting to init */
+   pci_aer_clear_uncorrect_error_status(pdev);


From the point of view of the IDT NTB PCIe initialization procedure, both
of these methods are equivalent. So for the IDT NTB part:


The IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c: the original
function cleared the whole uncorrectable error status register, including
both fatal and non-fatal error status bits.
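The behavior change Bjorn mentions comes down to which bits are written back to the RW1C (write-1-to-clear) Uncorrectable Error Status register. A minimal userspace model, assuming pci_aer_clear_nonfatal_status() clears only the status bits whose severity bit is 0 (status & ~severity) while the proposed pci_aer_clear_uncorrect_error_status() clears all of them. The struct and helper names are hypothetical stand-ins for config-space reads and writes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two AER config-space registers */
struct aer_regs {
	uint32_t uncor_status;   /* models PCI_ERR_UNCOR_STATUS */
	uint32_t uncor_severity; /* models PCI_ERR_UNCOR_SEVER: 1 = fatal */
};

/* RW1C semantics: every bit written as 1 is cleared */
static void write_uncor_status(struct aer_regs *r, uint32_t val)
{
	r->uncor_status &= ~val;
}

/* Clear only the bits whose severity is non-fatal (severity bit == 0) */
static void clear_nonfatal_status(struct aer_regs *r)
{
	uint32_t status = r->uncor_status & ~r->uncor_severity;

	if (status)
		write_uncor_status(r, status);
}

/* Clear every uncorrectable status bit, fatal and non-fatal alike */
static void clear_uncorrect_error_status(struct aer_regs *r)
{
	uint32_t status = r->uncor_status;

	if (status)
		write_uncor_status(r, status);
}
```

With one fatal and one non-fatal bit pending, the first helper leaves the fatal bit set, while the second clears both. That difference is what lpfc and ntb_hw_idt would observe after the conversion.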


Acked-by: Serge Semin 

-Sergey


/* First enable the PCI device */
ret = pcim_enable_device(pdev);
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index e10cdec6c56e..574176f43025 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);

Re: [PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function

2022-09-26 Thread kernel test robot

Hi Benjamin,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on 3d7a198cfdb47405cfb4a3ea523876569fe341e6]

url:    https://github.com/intel-lab-lkp/linux/commits/Benjamin-Gray/Out-of-line-static-calls-for-powerpc64-ELF-V2/20220926-145009
base:   3d7a198cfdb47405cfb4a3ea523876569fe341e6
config: powerpc-allnoconfig
compiler: powerpc-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/7e7a5738456329ebbc24558228fb729ce5236f60
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Benjamin-Gray/Out-of-line-static-calls-for-powerpc64-ELF-V2/20220926-145009
git checkout 7e7a5738456329ebbc24558228fb729ce5236f60
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/lib/

If you fix the issue, kindly add the following tag where applicable
| Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> arch/powerpc/lib/code-patching.c:18:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration]
  18 | static int __always_inline ___patch_memory(void *patch_addr,
 | ^~
   cc1: all warnings being treated as errors
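The warning is about specifier order. This assumes the kernel spells __always_inline as `inline __attribute__((__always_inline__))`, as in compiler_types.h. Under that assumption, `static int __always_inline` expands to `static int inline ...`, which -Wold-style-declaration flags. The sketch below shows the conventional spelling on a hypothetical helper standing in for ___patch_memory():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumption: modeled on the kernel's compiler_types.h, where
 * __always_inline expands to "inline __attribute__((__always_inline__))".
 * The guard keeps any libc definition already in scope.
 */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/*
 * Hypothetical helper. "static bool __always_inline ..." would expand to
 * "static bool inline ...", and -Wold-style-declaration would warn that
 * 'inline' is not at the beginning of the declaration. Placing the
 * specifier right after "static" avoids the warning.
 */
static __always_inline bool is_supported_patch_size(size_t size)
{
	/* 8 assumes a 64-bit build, like the CONFIG_PPC64 case in ___patch_memory() */
	return size == 1 || size == 2 || size == 4 || size == 8;
}
```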


vim +/inline +18 arch/powerpc/lib/code-patching.c

17  
  > 18  static int __always_inline ___patch_memory(void *patch_addr,
19 unsigned long data,
20 void *prog_addr,
21 size_t size)
22  {
23  switch (size) {
24  case 1:
25  __put_kernel_nofault(patch_addr, &data, u8, failed);
26  break;
27  case 2:
28  __put_kernel_nofault(patch_addr, &data, u16, failed);
29  break;
30  case 4:
31  __put_kernel_nofault(patch_addr, &data, u32, failed);
32  break;
33  #ifdef CONFIG_PPC64
34  case 8:
35  __put_kernel_nofault(patch_addr, &data, u64, failed);
36  break;
37  #endif
38  default:
39  unreachable();
40  }
41  
42  dcbst(patch_addr);
43  dcbst(patch_addr + size - 1); /* Last byte of data may cross a cacheline */
44  
45  mb(); /* sync */
46  
47  /* Flush on the EA that may be executed in case of a non-coherent icache */
48  icbi(prog_addr);
49  
50  /* Also flush the last byte of the instruction if it may be a
51   * prefixed instruction and we aren't assuming minimum 64-byte
52   * cacheline sizes
53   */
54  if (IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64)
55  icbi(prog_addr + size - 1);
56  
57  mb(); /* sync */
58  isync();
59  
60  return 0;
61  
62  failed:
63  return -EPERM;
64  }
65  
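The two dcbst calls in the listing cover the first and last byte of the patched data because a multi-byte write can straddle two cache lines. Below is a small sketch of that check. The helper names are hypothetical, and the line size is passed as a parameter instead of using L1_CACHE_BYTES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Start of the cache line containing addr (line_bytes must be a power of 2) */
static uintptr_t cacheline_base(uintptr_t addr, size_t line_bytes)
{
	return addr & ~(uintptr_t)(line_bytes - 1);
}

/* True if a size-byte write at addr touches two different cache lines */
static bool write_crosses_cacheline(uintptr_t addr, size_t size,
				    size_t line_bytes)
{
	assert(size > 0 && (line_bytes & (line_bytes - 1)) == 0);
	return cacheline_base(addr, line_bytes) !=
	       cacheline_base(addr + size - 1, line_bytes);
}
```

An 8-byte write at offset 60 crosses a 64-byte line, but one at offset 56 does not. As far as we understand ISA 3.1, a prefixed instruction may not cross a 64-byte boundary. That would explain why the second icbi is only needed when L1_CACHE_BYTES < 64.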

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp
#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 6.0.0-rc2 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="powerpc-linux-gcc (GCC) 12.1.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=120100
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23800
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23800
CONFIG_LLD_VERSION=0
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=123
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_XZ is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SYSVIPC is not set
# CONFIG_WATCH_QUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# end of IRQ subsystem

CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
# end of Tim

Re: [PATCH v2] powerpc: Ignore DSI error caused by the copy/paste instruction

2022-09-26 Thread Michael Ellerman

Haren Myneni  writes:
> DSI error will be generated when the paste operation is issued on
> the suspended NX window due to NX state changes. The hypervisor

Please spell out DSI and NX on the first usage.

> expects the partition to ignore this error during page fault
> handling. To differentiate DSI caused by an actual HW configuration
> or by the NX window, a new “ibm,pi-features” type value is defined.
> Byte 0, bit 3 of pi-attribute-specifier-type is now defined to
> indicate this DSI error. If this error is not ignored, the user
> space can get SIGBUS when the NX request is issued.
>
> This patch adds changes to read the ibm,pi-features property and ignore
> DSI error in the page fault handling if CPU_FTR_NX_DSI is defined.
>
> Signed-off-by: Haren Myneni 
> ---
> v2: Code cleanup as suggested by Christophe Leroy 
>
>  arch/powerpc/include/asm/cputable.h |  5 ++--
>  arch/powerpc/kernel/prom.c  | 36 +
>  arch/powerpc/mm/fault.c | 17 +-
>  3 files changed, 45 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
> index ae8c3e13cfce..8dc9949b6365 100644
> --- a/arch/powerpc/include/asm/cputable.h
> +++ b/arch/powerpc/include/asm/cputable.h
> @@ -192,6 +192,7 @@ static inline void cpu_feature_keys_init(void) { }
>  #define CPU_FTR_P9_RADIX_PREFETCH_BUG   LONG_ASM_CONST(0x0002)
>  #define CPU_FTR_ARCH_31                 LONG_ASM_CONST(0x0004)
>  #define CPU_FTR_DAWR1                   LONG_ASM_CONST(0x0008)
> +#define CPU_FTR_NX_DSI                  LONG_ASM_CONST(0x0010)

Can we make this an MMU feature?

We have a lot more free MMU feature bits, it should just be a case of
s/cpu/mmu/ pretty much everywhere you use it.

>  #ifndef __ASSEMBLY__
>  
> @@ -429,7 +430,7 @@ static inline void cpu_feature_keys_init(void) { }
>   CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
>   CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
>   CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_P9_TLBIE_STQ_BUG | \
> - CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR)
> + CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR | CPU_FTR_NX_DSI)
>  #define CPU_FTRS_POWER9_DD2_0 (CPU_FTRS_POWER9 | CPU_FTR_P9_RADIX_PREFETCH_BUG)
>  #define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | \
>  CPU_FTR_P9_RADIX_PREFETCH_BUG | \
> @@ -451,7 +452,7 @@ static inline void cpu_feature_keys_init(void) { }
>   CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
>   CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
>   CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \
> - CPU_FTR_DAWR | CPU_FTR_DAWR1)
> + CPU_FTR_DAWR | CPU_FTR_DAWR1 | CPU_FTR_NX_DSI)

You're turning that bit on by default for Power9 and Power10 - is that
correct?

If so do you have a documentation source for that?

cheers
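The quoted description says byte 0, bit 3 of pi-attribute-specifier-type signals the NX DSI case. The sketch below shows such a lookup. It assumes ibm,pi-features uses the same layout and MSB-first bit numbering as ibm,pa-features: a sequence of [length][type][attribute bytes] descriptors, where the relevant one has type 0. The function name is hypothetical, and this is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if attribute byte 'byte', bit 'bit' (bit 0 = most
 * significant) is set in the type-0 descriptor of a feature property laid
 * out as [len][type][len attribute bytes]... (assumed pa-features style).
 */
static bool feature_bit_set(const unsigned char *prop, size_t proplen,
			    unsigned int byte, unsigned int bit)
{
	assert(bit < 8);

	/* Walk descriptors until we reach the one with type == 0 */
	for (;;) {
		if (proplen < 3)
			return false;
		size_t len = 2 + (size_t)prop[0];
		if (proplen < len)
			return false;	/* truncated, or no type-0 descriptor */
		if (prop[1] == 0)
			break;
		proplen -= len;
		prop += len;
	}

	if (byte >= prop[0])
		return false;	/* attribute byte not present */
	return (prop[2 + byte] >> (7 - bit)) & 1;
}
```

With this numbering, byte 0 equal to 0x10 reports bit 3 as set.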

[PATCH v2 2/2] powerpc/rtas: block error injection when locked down

2022-09-26 Thread Nathan Lynch

The error injection facility on pseries VMs allows corruption of
arbitrary guest memory, potentially enabling a sufficiently privileged
user to disable lockdown or perform other modifications of the running
kernel via the rtas syscall.

Block the PAPR error injection facility from being opened or called
when locked down.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/kernel/rtas.c | 25 -
 include/linux/security.h   |  1 +
 security/security.c|  1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..c2540d393f1c 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -464,6 +465,9 @@ void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
va_end(list);
 }
 
+static int ibm_open_errinjct_token;
+static int ibm_errinjct_token;
+
 int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 {
va_list list;
@@ -476,6 +480,16 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
return -1;
 
+   if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) {
+   /*
+* It would be nicer to not discard the error value
+* from security_locked_down(), but callers expect an
+* RTAS status, not an errno.
+*/
+   if (security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION))
+   return -1;
+   }
+
if ((mfmsr() & (MSR_IR|MSR_DR)) != (MSR_IR|MSR_DR)) {
WARN_ON_ONCE(1);
return -1;
@@ -1227,6 +1241,14 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
if (block_rtas_call(token, nargs, &args))
return -EINVAL;
 
+   if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) {
+   int err;
+
+   err = security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION);
+   if (err)
+   return err;
+   }
+
/* Need to handle ibm,suspend_me call specially */
if (token == rtas_token("ibm,suspend-me")) {
 
@@ -1325,7 +1347,8 @@ void __init rtas_initialize(void)
 #ifdef CONFIG_RTAS_ERROR_LOGGING
rtas_last_error_token = rtas_token("rtas-last-error");
 #endif
-
+   ibm_open_errinjct_token = rtas_token("ibm,open-errinjct");
+   ibm_errinjct_token = rtas_token("ibm,errinjct");
rtas_syscall_filter_init();
 }
 
diff --git a/include/linux/security.h b/include/linux/security.h
index 39e7c0e403d9..70f89dc3a712 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -123,6 +123,7 @@ enum lockdown_reason {
LOCKDOWN_XMON_WR,
LOCKDOWN_BPF_WRITE_USER,
LOCKDOWN_DBG_WRITE_KERNEL,
+   LOCKDOWN_RTAS_ERROR_INJECTION,
LOCKDOWN_INTEGRITY_MAX,
LOCKDOWN_KCORE,
LOCKDOWN_KPROBES,
diff --git a/security/security.c b/security/security.c
index 51bf66d4f472..eabe3ce7e74e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -61,6 +61,7 @@ const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
[LOCKDOWN_XMON_WR] = "xmon write access",
[LOCKDOWN_BPF_WRITE_USER] = "use of bpf to write user RAM",
[LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM",
+   [LOCKDOWN_RTAS_ERROR_INJECTION] = "RTAS error injection",
[LOCKDOWN_INTEGRITY_MAX] = "integrity",
[LOCKDOWN_KCORE] = "/proc/kcore access",
[LOCKDOWN_KPROBES] = "use of kprobes",
-- 
2.37.3
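The patch gates the two error-injection tokens in two places, each with a different error convention. rtas_call() returns the RTAS-style -1, while the rtas syscall returns the errno from security_locked_down(). The userspace model below shows that split. It uses hypothetical token values and a boolean in place of the lockdown state, and it assumes the lockdown check returns -EPERM when the reason is restricted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define RTAS_UNKNOWN_SERVICE (-1)

/* Hypothetical stand-ins for the cached rtas_token() results */
static int ibm_open_errinjct_token = RTAS_UNKNOWN_SERVICE;
static int ibm_errinjct_token = RTAS_UNKNOWN_SERVICE;
static bool locked_down;

/* Models security_locked_down(): 0 if allowed, -EPERM if restricted */
static int model_locked_down(void)
{
	return locked_down ? -EPERM : 0;
}

static bool is_errinjct_token(int token)
{
	return token == ibm_open_errinjct_token || token == ibm_errinjct_token;
}

/* In-kernel path: callers expect an RTAS status, so -1 instead of errno */
static int model_rtas_call(int token)
{
	if (token == RTAS_UNKNOWN_SERVICE)
		return -1;
	if (is_errinjct_token(token) && model_locked_down())
		return -1;
	return 0;	/* pretend the firmware call succeeded */
}

/* Syscall path: propagate the errno from the lockdown check */
static long model_sys_rtas(int token)
{
	if (is_errinjct_token(token)) {
		int err = model_locked_down();

		if (err)
			return err;
	}
	return model_rtas_call(token);
}
```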

Re: [PATCH v2 5/6] powerpc/64: Add support for out-of-line static calls

2022-09-26 Thread Christophe Leroy



Le 26/09/2022 à 08:43, Benjamin Gray a écrit :
> Implement static call support for 64 bit V2 ABI. This requires
> making sure the TOC is kept correct across kernel-module
> boundaries. As a secondary concern, it tries to use the local
> entry point of a target wherever possible. It does so by
> checking if both tramp & target are kernel code, and falls
> back to detecting the common global entry point patterns
> if modules are involved. Detecting the global entry point is
> also required for setting the local entry point as the trampoline
> target: if we cannot detect the local entry point, then we need to
> conservatively initialise r12 and use the global entry point.
> 
> The trampolines are marked with `.localentry NAME, 1` to make the
> linker save and restore the TOC on each call to the trampoline. This
> allows the trampoline to safely target functions with different TOC
> values.
> 
> However this directive also implies the TOC is not initialised on entry
> to the trampoline. The kernel TOC is easily found in the PACA, but not
> an arbitrary module TOC. Therefore the trampoline implementation depends
> on whether it's in the kernel or not. If in the kernel, we initialise
> the TOC using the PACA. If in a module, we have to initialise the TOC
> with zero context, so it's quite expensive.


Build failure with GCC 5.5 (ppc64le_defconfig):

   CC  arch/powerpc/kernel/ptrace/ptrace.o
{standard input}: Assembler messages:
{standard input}:10: Error: .localentry expression for `__SCT__tp_func_sys_enter' is not a valid power of 2
{standard input}:29: Error: .localentry expression for `__SCT__tp_func_sys_exit' is not a valid power of 2


> 
> Signed-off-by: Benjamin Gray 
> ---
>   arch/powerpc/Kconfig |  2 +-
>   arch/powerpc/include/asm/code-patching.h |  1 +
>   arch/powerpc/include/asm/static_call.h   | 80 +++--
>   arch/powerpc/kernel/Makefile |  3 +-
>   arch/powerpc/kernel/static_call.c| 90 ++--
>   5 files changed, 164 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 4c466acdc70d..e7a66635eade 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -248,7 +248,7 @@ config PPC
>   select HAVE_SOFTIRQ_ON_OWN_STACK
>   select HAVE_STACKPROTECTOR  if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
>   select HAVE_STACKPROTECTOR  if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
> - select HAVE_STATIC_CALL if PPC32
> + select HAVE_STATIC_CALL if PPC32 || PPC64_ELF_ABI_V2
>   select HAVE_SYSCALL_TRACEPOINTS
>   select HAVE_VIRT_CPU_ACCOUNTING
>   select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
> diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
> index 15efd8ab22da..8d1850080af8 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -132,6 +132,7 @@ int translate_branch(ppc_inst_t *instr, const u32 *dest, const u32 *src);
>   bool is_conditional_branch(ppc_inst_t instr);
>   
>   #define OP_RT_RA_MASK   0xUL
> +#define OP_SI_MASK   0xUL
>   #define LIS_R2  (PPC_RAW_LIS(_R2, 0))
>   #define ADDIS_R2_R12(PPC_RAW_ADDIS(_R2, _R12, 0))
>   #define ADDI_R2_R2  (PPC_RAW_ADDI(_R2, _R2, 0))
> diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
> index de1018cc522b..3d6e82200cb7 100644
> --- a/arch/powerpc/include/asm/static_call.h
> +++ b/arch/powerpc/include/asm/static_call.h
> @@ -2,12 +2,75 @@
>   #ifndef _ASM_POWERPC_STATIC_CALL_H
>   #define _ASM_POWERPC_STATIC_CALL_H
>   
> +#ifdef CONFIG_PPC64_ELF_ABI_V2
> +
> +#ifdef MODULE
> +
> +#define __PPC_SCT(name, inst)\
> + asm(".pushsection .text, \"ax\" \n" \
> + ".align 6   \n" \
> + ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
> + ".localentry " STATIC_CALL_TRAMP_STR(name) ", 1 \n" \
> + STATIC_CALL_TRAMP_STR(name) ":  \n" \
> + "   mflr 11  \n" \
> + "   bcl 20, 31, $+4 \n" \
> + "0: mflr 12  \n" \
> + "   mtlr 11  \n" \
> + "   addi 12, 12, (" STATIC_CALL_TRAMP_STR(name) " - 0b)  \n" \
> + "   addis 2, 12, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@ha \n" \
> + "   addi 2, 2, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@l\n" \
> + "   " inst "\n" \
> +
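The quoted module trampoline builds r2 from r12 with an addis/addi pair using @ha and @l. The sketch below shows that arithmetic under the standard PowerPC ELF definitions: @l is the low 16 bits, and @ha is the high 16 bits adjusted by 0x8000 because addi sign-extends its immediate. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* @ha: high half, pre-adjusted because the low half is sign-extended */
static uint16_t reloc_ha(int32_t x)
{
	return (uint16_t)(((uint32_t)x + 0x8000u) >> 16);
}

/* @l: low 16 bits */
static uint16_t reloc_l(int32_t x)
{
	return (uint16_t)((uint32_t)x & 0xffffu);
}

/* Sign-extend a 16-bit immediate, as addis and addi both do */
static int64_t sext16(uint16_t v)
{
	return v >= 0x8000 ? (int64_t)v - 0x10000 : (int64_t)v;
}

/* Emulates "addis rT, rA, ha" followed by "addi rT, rT, l" */
static int64_t addis_addi(int64_t ra, uint16_t ha, uint16_t l)
{
	return ra + sext16(ha) * 65536 + sext16(l);
}

/* r12 + offset, the way the trampoline derives the TOC pointer in r2 */
static int64_t toc_from_entry(int64_t r12, int32_t offset)
{
	/* the addis/addi pair cannot reach offsets above 0x7fff7fff */
	assert(offset <= 0x7fff7fff);
	return addis_addi(r12, reloc_ha(offset), reloc_l(offset));
}
```

For example, an offset of 0x18000 splits into @ha = 2 and @l = 0x8000. The second value is -0x8000 once sign-extended, and the two halves sum back to 0x18000.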

[PATCH v2 0/2] powerpc/pseries: restrict error injection and DT changes when locked down

2022-09-26 Thread Nathan Lynch

Add two new lockdown reasons for use in powerpc's pseries platform
code.

The pseries platform allows hardware-level error injection via certain
calls to the RTAS (Run Time Abstraction Services) firmware. ACPI-based
error injection is already restricted in lockdown; this facility
should be restricted for the same reasons.

pseries also allows nearly arbitrary device tree changes via
/proc/powerpc/ofdt. Just as overriding ACPI tables is not allowed
while locked down, so should this facility be restricted.

Changes since v1:
* Move LOCKDOWN_DEVICE_TREE next to LOCKDOWN_ACPI_TABLES.

Nathan Lynch (2):
  powerpc/pseries: block untrusted device tree changes when locked down
  powerpc/rtas: block error injection when locked down

 arch/powerpc/kernel/rtas.c| 25 ++-
 arch/powerpc/platforms/pseries/reconfig.c |  5 +
 include/linux/security.h  |  2 ++
 security/security.c   |  2 ++
 4 files changed, 33 insertions(+), 1 deletion(-)

-- 
2.37.3

[PATCH v2 1/2] powerpc/pseries: block untrusted device tree changes when locked down

2022-09-26 Thread Nathan Lynch

The /proc/powerpc/ofdt interface allows the root user to freely alter
the in-kernel device tree, enabling arbitrary physical address writes
via drivers that could bind to malicious device nodes, thus making it
possible to disable lockdown.

Historically this interface has been used on the pseries platform to
facilitate the runtime addition and removal of processor, memory, and
device resources (aka Dynamic Logical Partitioning or DLPAR). Years
ago, the processor and memory use cases were migrated to designs that
happen to be lockdown-friendly: device tree updates are communicated
directly to the kernel from firmware without passing through untrusted
user space. I/O device DLPAR via the "drmgr" command in powerpc-utils
remains the sole legitimate user of /proc/powerpc/ofdt, but it is
already broken in lockdown since it uses /dev/mem to allocate argument
buffers for the rtas syscall. So only illegitimate uses of the
interface should see a behavior change when running on a locked down
kernel.

Signed-off-by: Nathan Lynch 
---
 arch/powerpc/platforms/pseries/reconfig.c | 5 +
 include/linux/security.h  | 1 +
 security/security.c   | 1 +
 3 files changed, 7 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/reconfig.c b/arch/powerpc/platforms/pseries/reconfig.c
index cad7a0c93117..599bd2c78514 100644
--- a/arch/powerpc/platforms/pseries/reconfig.c
+++ b/arch/powerpc/platforms/pseries/reconfig.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -361,6 +362,10 @@ static ssize_t ofdt_write(struct file *file, const char __user *buf, size_t coun
char *kbuf;
char *tmp;
 
+   rv = security_locked_down(LOCKDOWN_DEVICE_TREE);
+   if (rv)
+   return rv;
+
kbuf = memdup_user_nul(buf, count);
if (IS_ERR(kbuf))
return PTR_ERR(kbuf);
diff --git a/include/linux/security.h b/include/linux/security.h
index 7bd0c490703d..39e7c0e403d9 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -114,6 +114,7 @@ enum lockdown_reason {
LOCKDOWN_IOPORT,
LOCKDOWN_MSR,
LOCKDOWN_ACPI_TABLES,
+   LOCKDOWN_DEVICE_TREE,
LOCKDOWN_PCMCIA_CIS,
LOCKDOWN_TIOCSSERIAL,
LOCKDOWN_MODULE_PARAMETERS,
diff --git a/security/security.c b/security/security.c
index 4b95de24bc8d..51bf66d4f472 100644
--- a/security/security.c
+++ b/security/security.c
@@ -52,6 +52,7 @@ const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
[LOCKDOWN_IOPORT] = "raw io port access",
[LOCKDOWN_MSR] = "raw MSR access",
[LOCKDOWN_ACPI_TABLES] = "modifying ACPI tables",
+   [LOCKDOWN_DEVICE_TREE] = "modifying device tree contents",
[LOCKDOWN_PCMCIA_CIS] = "direct PCMCIA CIS storage",
[LOCKDOWN_TIOCSSERIAL] = "reconfiguration of serial port IO",
[LOCKDOWN_MODULE_PARAMETERS] = "unsafe module parameters",
-- 
2.37.3
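Where a new reason sits in enum lockdown_reason determines when it applies. As far as we know, the lockdown LSM returns -EPERM when the current lockdown level is >= the requested reason. Reasons placed before LOCKDOWN_INTEGRITY_MAX, such as LOCKDOWN_DEVICE_TREE and LOCKDOWN_RTAS_ERROR_INJECTION, are therefore blocked in both integrity and confidentiality mode. Reasons placed after it are blocked only in confidentiality mode. The sketch below uses an abbreviated, hypothetical subset of the enum in which only the relative order matters:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subset of enum lockdown_reason; only the order matters */
enum reason {
	R_NONE,
	R_ACPI_TABLES,
	R_DEVICE_TREE,
	R_RTAS_ERROR_INJECTION,
	R_INTEGRITY_MAX,
	R_KCORE,
	R_CONFIDENTIALITY_MAX,
};

/* Models the level-versus-reason comparison done by the lockdown LSM */
static int is_locked_down(enum reason level, enum reason what)
{
	assert(what < R_CONFIDENTIALITY_MAX);
	return level >= what ? -EPERM : 0;
}
```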

[PATCH -next] soc: fsl: dpio: Add init/exit annotations to module init/exit func

2022-09-26 Thread ruanjinjie

Add missing __init/__exit annotations to module init/exit funcs

Signed-off-by: ruanjinjie 
---
 drivers/soc/fsl/dpio/dpio-driver.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/fsl/dpio/dpio-driver.c b/drivers/soc/fsl/dpio/dpio-driver.c
index 5a2edc48dd79..534e91dd929c 100644
--- a/drivers/soc/fsl/dpio/dpio-driver.c
+++ b/drivers/soc/fsl/dpio/dpio-driver.c
@@ -326,7 +326,7 @@ static struct fsl_mc_driver dpaa2_dpio_driver = {
.match_id_table = dpaa2_dpio_match_id_table
 };
 
-static int dpio_driver_init(void)
+static int __init dpio_driver_init(void)
 {
if (!zalloc_cpumask_var(&cpus_unused_mask, GFP_KERNEL))
return -ENOMEM;
@@ -335,7 +335,7 @@ static int dpio_driver_init(void)
return fsl_mc_driver_register(&dpaa2_dpio_driver);
 }
 
-static void dpio_driver_exit(void)
+static void __exit dpio_driver_exit(void)
 {
free_cpumask_var(cpus_unused_mask);
fsl_mc_driver_unregister(&dpaa2_dpio_driver);
-- 
2.25.1

Re: Is PPC 44x PIKA Warp board still relevant?

2022-09-26 Thread Michael Ellerman

Christophe Leroy  writes:
> Hi Dmitry
>
> Le 25/09/2022 à 07:06, Dmitry Torokhov a écrit :
>> Hi Michael, Nick,
>> 
>> I was wondering if PIKA Warp board still relevant. The reason for my
>> question is that I am interested in dropping legacy gpio APIs,
>> especially OF-specific ones, in favor of newer gpiod APIs, and
>> arch/powerpc/platforms/44x/warp.c is one of few users of it.
>
> As far as I can see, that board is still being sold, see
>
> https://www.voipon.co.uk/pika-warp-asterisk-appliance-p-932.html

On the other hand it looks like PIKA technologies went bankrupt earlier
this year.

>> The code in question is supposed to turn off green led and flash red led
>> in case of overheating, and is doing so by directly accessing GPIOs
>> owned by led-gpio driver without requesting/allocating them. This is not
>> really supported with gpiod API, and is not a good practice in general.
>
> As far as I can see, it was ported to led-gpio by
>
> ba703e1a7a0b powerpc/4xx: Have Warp take advantage of GPIO LEDs default-state = keep
> 805e324b7fbd powerpc: Update Warp to use leds-gpio driver
>
>> Before I spend much time trying to implement a replacement without
>> access to the hardware, I wonder if this board is in use at all, and if
>> it is how important is the feature of flashing red led on critical
>> temperature shutdown?
>
> I don't know who could tell.

I would be surprised if anyone is still running upstream kernels on it.

I can't find any sign of any activity on the mailing list related to it
since it was initially merged.

> Maybe let's do a more standard implementation and see if anybody
> screams?

How much work is it to convert it?

Flashing a LED when the machine dies is nice, but not exactly critical,
hopefully the machine *isn't* dying that often :)

cheers

Re: [PATCH 3/7] powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c

2022-09-26 Thread Pali Rohár

On Monday 26 September 2022 10:17:26 Christophe Leroy wrote:
> Le 26/09/2022 à 11:53, Pali Rohár a écrit :
> > On Monday 26 September 2022 09:48:02 Christophe Leroy wrote:
> >> Le 19/08/2022 à 21:15, Pali Rohár a écrit :
> >>> This moves machine descriptions and all related code for all P2020 boards
> >>> into new p2020.c source file. This is preparation for code deduplication
> >>> and providing one unified machine description for all P2020 boards.
> >>
> >> I'm having a hard time reviewing this patch.
> >>
> >> It looks like you are doing much more than just moving machine
> >> descriptions and related code into p2020.c
> >>
> >> Apparently p2020.c has a lot of code that doesn't seem to be moved from
> >> somewhere else.
> >>
> >> Maybe there is a need to tidy up in order to ease reviewing.
> > 
> > This is probably harder to read due to how git format-patch generated
> > this email. The important part is:
> > 
> >   copy from arch/powerpc/platforms/85xx/mpc85xx_ds.c
> >   copy to arch/powerpc/platforms/85xx/p2020.c
> > 
> > Which means that git thinks that my newly introduced file p2020.c is
> > similar to the old file mpc85xx_ds.c and generated the diff in a format which does:
> > 
> >   1. copy mpc85xx_ds.c to p2020.c
> >   2. apply diff on newly introduced file p2020.c
> > 
> > Code is really moved from mpc85xx_ds.c and mpc85xx_rdb.c files into file
> > p2020.c.
> > 
> > File p2020.c is new in this patch.
> 
> Well, I didn't really look in how the patch was generated, I imported 
> your series and mainly reviewed it in git directly.
> 
> For this patch I have the following diff stat:
> 
> $ git show --stat e2d8c39e2e32855658d1c5f042a7ce88952f488a
> commit e2d8c39e2e32855658d1c5f042a7ce88952f488a
> Author: Pali Rohár 
> Date:   Fri Aug 19 21:15:53 2022 +0200
> 
>  powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c
> 
>  This moves machine descriptions and all related code for all P2020 boards
>  into new p2020.c source file. This is preparation for code deduplication
>  and providing one unified machine description for all P2020 boards.
> 
>  Signed-off-by: Pali Rohár 
> 
>   arch/powerpc/platforms/85xx/Makefile  |   2 ++
>   arch/powerpc/platforms/85xx/mpc85xx_ds.c  |  23 --
>   arch/powerpc/platforms/85xx/mpc85xx_rdb.c |  44 --
>   arch/powerpc/platforms/85xx/p2020.c   | 273 ++
>   4 files changed, 275 insertions(+), 67 deletions(-)
> 
> 
> So there is a lot more code added than deleted.
> 
> If it was really a code move as described in the commit message, I would 
> have approximately the same number of inserts as number of deletions.

I see... The reason is that the ds/rdb helper functions are copied (not
moved), because they are still needed by the ds/rdb boards. Later
patches in this series then clean up and simplify the p2020 helper
functions.

So basically this change moves the p2020 machine descriptions from the
ds/rdb files into p2020.c, and copies the helper functions.

I am not sure what the best way to do it is. I did not want to
introduce regressions, so I preferred not to touch the non-p2020
code in the ds/rdb files.

> 
> > 
> >>>
> >>> Signed-off-by: Pali Rohár 
> >>> ---
> >>>arch/powerpc/platforms/85xx/Makefile  |   2 +
> >>>arch/powerpc/platforms/85xx/mpc85xx_ds.c  |  23 ---
> >>>arch/powerpc/platforms/85xx/mpc85xx_rdb.c |  44 --
> >>>.../platforms/85xx/{mpc85xx_ds.c => p2020.c}  | 134 --
> >>>4 files changed, 91 insertions(+), 112 deletions(-)
> >>>copy arch/powerpc/platforms/85xx/{mpc85xx_ds.c => p2020.c} (65%)
> >>>
> >>> diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
> >>> index 260fbad7967b..1ad261b4eeb6 100644
> >>> --- a/arch/powerpc/platforms/85xx/Makefile
> >>> +++ b/arch/powerpc/platforms/85xx/Makefile
> >>> @@ -23,6 +23,8 @@ obj-$(CONFIG_P1010_RDB)   += p1010rdb.o
> >>>obj-$(CONFIG_P1022_DS)+= p1022_ds.o
> >>>obj-$(CONFIG_P1022_RDK)   += p1022_rdk.o
> >>>obj-$(CONFIG_P1023_RDB)   += p1023_rdb.o
> >>> +obj-$(CONFIG_MPC85xx_DS)  += p2020.o
> >>> +obj-$(CONFIG_MPC85xx_RDB) += p2020.o
> >>>obj-$(CONFIG_TWR_P102x)   += twr_p102x.o
> >>>obj-$(CONFIG_CORENET_GENERIC)   += corenet_generic.o
> >>>obj-$(CONFIG_FB_FSL_DIU)   += t1042rdb_diu.o
> >>> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> >>> index 9a6d637ef54a..05aac997b5ed 100644
> >>> --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> >>> +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> >>> @@ -168,7 +168,6 @@ static int __init mpc8544_ds_probe(void)
> >>>
> >>>machine_arch_initcall(mpc8544_ds, mpc85xx_common_publish_devices);
> >>>machine_arch_initcall(mpc8572_ds,

Re: [PATCH 7/7] powerpc: dts: turris1x.dts: Remove "fsl,P2020RDB-PC" compatible string

2022-09-26 Thread Pali Rohár

On Monday 26 September 2022 10:10:19 Christophe Leroy wrote:
> Le 19/08/2022 à 21:15, Pali Rohár a écrit :
> > "fsl,P2020RDB-PC" compatible string was present in Turris 1.x DTS file just
> > because Linux kernel required it for proper detection of P2020 processor
> > during boot.
> > 
> > This was quite a hack as CZ,NIC Turris 1.x is not compatible with
> > Freescale P2020-RDB-PC board.
> > 
> > Now that the kernel has generic unified support for boards with P2020
> > processors, there is no need to have this "hack" in turris1x.dts file.
> > 
> > So remove incorrect "fsl,P2020RDB-PC" compatible string from turris1x.dts.
> 
> Oh, I thought it was not possible to modify DTSes.

Boards which have DTB binaries hardcoded in the bootloader, or whose
DTS files are out of tree, obviously still need to be supported by the
kernel.

> If it is, can you have a common compatible for all p2020, for instance
> "fsl,p2020", so that you can use it in patch 5 instead of
> of_find_node_by_path("/cpus/PowerPC,P2020@0")?

I can add fsl,p2020. But it does not solve the issue for other boards.

This string fsl,p2020 is not used by any board (yet).

Also, Turris 1.x boards have an older DTB file burned into NOR flash.

So it is problematic.
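Pali's point about DTBs already burned into NOR becomes clear when you look at how a "compatible" property is matched. The property is a packed list of NUL-terminated strings, and a board only matches an entry that is actually present. The sketch below uses hypothetical helper names. Its case-insensitive comparison is an assumption modeled on the kernel's default of_compat_cmp().

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive string equality (assumed to mirror of_compat_cmp()) */
static bool compat_eq(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
		a++;
		b++;
	}
	return *a == *b;
}

/*
 * A "compatible" property is a list of NUL-terminated strings packed back
 * to back; return true if any entry matches compat.
 */
static bool compat_list_contains(const char *prop, size_t len,
				 const char *compat)
{
	const char *end = prop + len;

	while (prop < end) {
		const char *nul = memchr(prop, '\0', (size_t)(end - prop));

		if (!nul)
			break;	/* malformed: last entry not terminated */
		if (compat_eq(prop, compat))
			return true;
		prop = nul + 1;
	}
	return false;
}
```

With the old Turris 1.x property, "fsl,P2020RDB-PC" matches but a new "fsl,p2020" entry does not. Boards with such DTBs would therefore still need a fallback like the node-path lookup.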

> > 
> > Signed-off-by: Pali Rohár 
> > ---
> >   arch/powerpc/boot/dts/turris1x.dts | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/boot/dts/turris1x.dts b/arch/powerpc/boot/dts/turris1x.dts
> > index 12e08271e61f..69c38ed8a3a5 100644
> > --- a/arch/powerpc/boot/dts/turris1x.dts
> > +++ b/arch/powerpc/boot/dts/turris1x.dts
> > @@ -15,7 +15,7 @@
> >   
> >   / {
> > model = "Turris 1.x";
> > -   compatible = "cznic,turris1x", "fsl,P2020RDB-PC"; /* fsl,P2020RDB-PC is required for booting Linux */
> > +   compatible = "cznic,turris1x";
> >   
> > aliases {
> > ethernet0 =

Re: [PATCH 6/7] powerpc/85xx: p2020: Enable boards by new config option CONFIG_P2020

2022-09-26 Thread Pali Rohár

On Monday 26 September 2022 10:08:19 Christophe Leroy wrote:
> Le 19/08/2022 à 21:15, Pali Rohár a écrit :
> > Generic unified P2020 machine description which supports all P2020-based
> > boards is now in separate file p2020.c. So create a separate config option
> > CONFIG_P2020 for it.
> 
> Could it be CONFIG_PPC_P2020 instead? Nowadays, drivers seem to be spread
> all over the drivers/ directory, so it's much better to have a CONFIG_PPC_
> prefix on all dedicated powerpc config items.

Ok! I do not have any strong preference for the config option name.

> > 
> > Previously, machine descriptions for P2020 boards were enabled by the
> > CONFIG_MPC85xx_DS or CONFIG_MPC85xx_RDB option. So set CONFIG_P2020 to be
> > enabled by default when one of those options is enabled.
> > 
> > This allows compiling support for P2020 boards without also enabling
> > support for older mpc85xx boards, and compiling a kernel for old
> > mpc85xx boards without enabling support for the new P2020 boards.
> > 
> > Signed-off-by: Pali Rohár 
> > ---
> >   arch/powerpc/platforms/85xx/Kconfig  | 22 ++
> >   arch/powerpc/platforms/85xx/Makefile |  3 +--
> >   2 files changed, 19 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
> > index be16eba0f704..2cb4e9248b42 100644
> > --- a/arch/powerpc/platforms/85xx/Kconfig
> > +++ b/arch/powerpc/platforms/85xx/Kconfig
> > @@ -78,16 +78,16 @@ config MPC8536_DS
> >   This option enables support for the MPC8536 DS board
> >   
> >   config MPC85xx_DS
> > -   bool "Freescale MPC8544 DS / MPC8572 DS / P2020 DS"
> > +   bool "Freescale MPC8544 DS / MPC8572 DS"
> > select PPC_I8259
> > select DEFAULT_UIMAGE
> > select FSL_ULI1575 if PCI
> > select SWIOTLB
> > help
> > - This option enables support for the MPC8544 DS, MPC8572 DS and P2020 DS boards
> > + This option enables support for the MPC8544 DS and MPC8572 DS boards
> >   
> >   config MPC85xx_RDB
> > -   bool "Freescale P102x MBG/UTM/RDB and P2020 RDB"
> > +   bool "Freescale P102x MBG/UTM/RDB"
> > select PPC_I8259
> > select DEFAULT_UIMAGE
> > select FSL_ULI1575 if PCI
> > @@ -95,7 +95,21 @@ config MPC85xx_RDB
> > help
> >   This option enables support for the P1020 MBG PC, P1020 UTM PC,
> >   P1020 RDB PC, P1020 RDB PD, P1020 RDB, P1021 RDB PC, P1024 RDB,
> > - P1025 RDB, P2020 RDB and P2020 RDB PC boards
> > + and P1025 RDB boards
> > +
> > +config P2020
> > +   bool "Freescale P2020"
> > +   default y if MPC85xx_DS || MPC85xx_RDB
> 
> Is that necessary?
> Can you just update defconfigs?

This is for existing users' defconfigs: if they update the kernel to a new
version, it automatically selects all features which were already
enabled.

But if you think this is not necessary, just drop it.

> By the way, did you have a look at the impact on defconfigs?
> 
> > +   select DEFAULT_UIMAGE
> > +   select SWIOTLB
> > +   imply PPC_I8259
> > +   imply FSL_ULI1575 if PCI
> 
> Why imply and not select?

Because several P2020 boards do not have these two HW parts, I do not
see a reason for a hard dependency. In my opinion, if a user does not need
some kernel option (because their HW does not require it), then the
kernel should allow disabling it, unless there is a strong reason not to.

And IIRC, imply is like select but allows the user to disable the implied
option.
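To make the select/imply distinction concrete, here is a minimal C model of how a dependent symbol such as PPC_I8259 might be resolved when P2020 is enabled. It is a simplification, not Kconfig's real algorithm: it treats symbols as bool (no tristate), ignores the dependent symbol's own `depends on`, assumes its own default is n, and assumes it has a prompt the user can change. `resolve_dep()` and the enums are hypothetical names.

```c
#include <stdbool.h>

/* Kind of reverse dependency from e.g. P2020 to PPC_I8259 (hypothetical model). */
enum rev_dep { REV_SELECT, REV_IMPLY };

/* The user's explicit answer for the dependent symbol, if any. */
enum user_choice { USER_UNSET, USER_NO, USER_YES };

/*
 * Resolved value of the dependent symbol.
 * Assumptions: bool only, dependent symbol defaults to n on its own,
 * and it has a prompt, so an explicit user answer is possible.
 */
static bool resolve_dep(enum rev_dep kind, bool parent_enabled,
			enum user_choice choice)
{
	if (!parent_enabled)
		return choice == USER_YES;  /* no reverse dependency in effect */

	if (kind == REV_SELECT)
		return true;                /* select forces y; choice is ignored */

	/* imply only raises the default; an explicit "n" still wins */
	return choice != USER_NO;
}
```

With `REV_SELECT`, `resolve_dep(REV_SELECT, true, USER_NO)` is still true, while the same call with `REV_IMPLY` yields false, which is the flexibility described above. If the implied symbol had no prompt, the user would have no way to answer n, so imply would likely behave much like select in practice.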

> > +   help
> > + This option enables generic unified support for any board with the
> > + Freescale P2020 processor.
> > +
> > + For example: P2020 DS board, P2020 RDB board, P2020 RDB PC board or
> > + CZ.NIC Turris 1.x boards.
> >   
> >   config P1010_RDB
> > bool "Freescale P1010 RDB"
> > diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
> > index 1ad261b4eeb6..021e168442d7 100644
> > --- a/arch/powerpc/platforms/85xx/Makefile
> > +++ b/arch/powerpc/platforms/85xx/Makefile
> > @@ -23,8 +23,7 @@ obj-$(CONFIG_P1010_RDB)   += p1010rdb.o
> >   obj-$(CONFIG_P1022_DS)+= p1022_ds.o
> >   obj-$(CONFIG_P1022_RDK)   += p1022_rdk.o
> >   obj-$(CONFIG_P1023_RDB)   += p1023_rdb.o
> > -obj-$(CONFIG_MPC85xx_DS)  += p2020.o
> > -obj-$(CONFIG_MPC85xx_RDB) += p2020.o
> > +obj-$(CONFIG_P2020)   += p2020.o
> >   obj-$(CONFIG_TWR_P102x)   += twr_p102x.o
> >   obj-$(CONFIG_CORENET_GENERIC)   += corenet_generic.o
> >   obj-$(CONFIG_FB_FSL_DIU)  += t1042rdb_diu.o

Re: [PATCH 3/7] powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c

2022-09-26 Thread Christophe Leroy



On 26/09/2022 at 11:53, Pali Rohár wrote:
> On Monday 26 September 2022 09:48:02 Christophe Leroy wrote:
>> On 19/08/2022 at 21:15, Pali Rohár wrote:
>>> This moves machine descriptions and all related code for all P2020 boards
>>> into new p2020.c source file. This is preparation for code deduplication
>>> and providing one unified machine description for all P2020 boards.
>>
>> I'm having a hard time reviewing this patch.
>>
>> It looks like you are doing much more than just moving machine
>> descriptions and related code into p2020.c.
>>
>> Apparently p2020.c has a lot of code that doesn't seem to be moved from
>> somewhere else.
>>
>> Maybe there is a need to tidy up in order to ease reviewing.
> 
> This is probably harder to read due to how git format-patch generated
> this email. The important part is:
> 
>   copy from arch/powerpc/platforms/85xx/mpc85xx_ds.c
>   copy to arch/powerpc/platforms/85xx/p2020.c
> 
> This means that git considered my newly introduced file p2020.c similar
> to the old file mpc85xx_ds.c and generated the diff in a format which does:
> 
>   1. copy mpc85xx_ds.c to p2020.c
>   2. apply a diff on the newly introduced file p2020.c
> 
> The code really is moved from the mpc85xx_ds.c and mpc85xx_rdb.c files
> into p2020.c.
> 
> File p2020.c is new in this patch.

Well, I didn't really look at how the patch was generated; I imported 
your series and mainly reviewed it in git directly.

For this patch I have the following diff stat:

$ git show --stat e2d8c39e2e32855658d1c5f042a7ce88952f488a
commit e2d8c39e2e32855658d1c5f042a7ce88952f488a
Author: Pali Rohár 
Date:   Fri Aug 19 21:15:53 2022 +0200

 powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c

 This moves machine descriptions and all related code for all P2020 boards
 into new p2020.c source file. This is preparation for code deduplication
 and providing one unified machine description for all P2020 boards.

 Signed-off-by: Pali Rohár 

  arch/powerpc/platforms/85xx/Makefile  |   2 ++
  arch/powerpc/platforms/85xx/mpc85xx_ds.c  |  23 --
  arch/powerpc/platforms/85xx/mpc85xx_rdb.c |  44 --
  arch/powerpc/platforms/85xx/p2020.c   | 273 ++
  4 files changed, 275 insertions(+), 67 deletions(-)


So there is a lot more code added than deleted.

If it were really a code move as described in the commit message, I would 
expect approximately the same number of insertions as deletions.
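The reasoning above can be phrased as a rough heuristic. The sketch below is hypothetical (the helper names and the `slack` threshold are illustrative and not part of git): a pure move should produce insertion and deletion counts that roughly cancel out.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Net lines added by a change: insertions minus deletions. */
static int net_added(int insertions, int deletions)
{
	return insertions - deletions;
}

/*
 * Hypothetical heuristic: a change "looks like a pure move" when the
 * insertion and deletion counts differ by at most `slack` lines
 * (small differences come from Makefile lines, includes, etc.).
 */
static bool looks_like_pure_move(int insertions, int deletions, int slack)
{
	return abs(net_added(insertions, deletions)) <= slack;
}
```

For the `git show --stat` output above, `net_added(275, 67)` is 208, so `looks_like_pure_move(275, 67, 20)` is false. The format-patch diffstat (91 insertions, 112 deletions) is not comparable, because it counts p2020.c as a modified copy of mpc85xx_ds.c rather than as a new file.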


> 
>>>
>>> Signed-off-by: Pali Rohár 
>>> ---
>>>arch/powerpc/platforms/85xx/Makefile  |   2 +
>>>arch/powerpc/platforms/85xx/mpc85xx_ds.c  |  23 ---
>>>arch/powerpc/platforms/85xx/mpc85xx_rdb.c |  44 --
>>>.../platforms/85xx/{mpc85xx_ds.c => p2020.c}  | 134 --
>>>4 files changed, 91 insertions(+), 112 deletions(-)
>>>copy arch/powerpc/platforms/85xx/{mpc85xx_ds.c => p2020.c} (65%)
>>>
>>> diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
>>> index 260fbad7967b..1ad261b4eeb6 100644
>>> --- a/arch/powerpc/platforms/85xx/Makefile
>>> +++ b/arch/powerpc/platforms/85xx/Makefile
>>> @@ -23,6 +23,8 @@ obj-$(CONFIG_P1010_RDB)   += p1010rdb.o
>>>obj-$(CONFIG_P1022_DS)+= p1022_ds.o
>>>obj-$(CONFIG_P1022_RDK)   += p1022_rdk.o
>>>obj-$(CONFIG_P1023_RDB)   += p1023_rdb.o
>>> +obj-$(CONFIG_MPC85xx_DS)  += p2020.o
>>> +obj-$(CONFIG_MPC85xx_RDB) += p2020.o
>>>obj-$(CONFIG_TWR_P102x)   += twr_p102x.o
>>>obj-$(CONFIG_CORENET_GENERIC)   += corenet_generic.o
>>>obj-$(CONFIG_FB_FSL_DIU) += t1042rdb_diu.o
>>> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
>>> index 9a6d637ef54a..05aac997b5ed 100644
>>> --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
>>> +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
>>> @@ -168,7 +168,6 @@ static int __init mpc8544_ds_probe(void)
>>>
>>>machine_arch_initcall(mpc8544_ds, mpc85xx_common_publish_devices);
>>>machine_arch_initcall(mpc8572_ds, mpc85xx_common_publish_devices);
>>> -machine_arch_initcall(p2020_ds, mpc85xx_common_publish_devices);
>>>
>>>/*
>>> * Called very early, device-tree isn't unflattened
>>> @@ -178,14 +177,6 @@ static int __init mpc8572_ds_probe(void)
>>> return !!of_machine_is_compatible("fsl,MPC8572DS");
>>>}
>>>
>>> -/*
>>> - * Called very early, device-tree isn't unflattened
>>> - */
>>> -static int __init p2020_ds_probe(void)
>>> -{
>>> -   return !!of_machine_is_compatible("fsl,P2020DS");
>>> -}
>>> -
>>>define_machine(mpc8544_ds) {
>>> .name   = "MPC8544 DS",
>>> .probe  = mpc8544_ds_probe,
>>> @@ -213,17 +204,3 @@ define_machine(mpc8572_ds) {
>>> .calibrate_decr = generic_calibrate_decr,
>>> .progress   =

Re: [PATCH 7/7] powerpc: dts: turris1x.dts: Remove "fsl,P2020RDB-PC" compatible string

2022-09-26 Thread Christophe Leroy



On 19/08/2022 at 21:15, Pali Rohár wrote:
> The "fsl,P2020RDB-PC" compatible string was present in the Turris 1.x DTS
> file only because the Linux kernel required it for proper detection of the
> P2020 processor during boot.
> 
> This was quite a hack, as CZ.NIC Turris 1.x is not compatible with the
> Freescale P2020-RDB-PC board.
> 
> Now that the kernel has generic unified support for boards with P2020
> processors, there is no need to have this "hack" in the turris1x.dts file.
> 
> So remove the incorrect "fsl,P2020RDB-PC" compatible string from turris1x.dts.

Oh, I thought it was not possible to modify DTSes.

If it is, can you add a compatible common to all P2020 boards, for instance 
"fsl,p2020", so that you can use it in patch 5 instead of 
of_find_node_by_path("/cpus/PowerPC,P2020@0")?

> 
> Signed-off-by: Pali Rohár 
> ---
>   arch/powerpc/boot/dts/turris1x.dts | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/boot/dts/turris1x.dts b/arch/powerpc/boot/dts/turris1x.dts
> index 12e08271e61f..69c38ed8a3a5 100644
> --- a/arch/powerpc/boot/dts/turris1x.dts
> +++ b/arch/powerpc/boot/dts/turris1x.dts
> @@ -15,7 +15,7 @@
>   
>   / {
>   model = "Turris 1.x";
> - compatible = "cznic,turris1x", "fsl,P2020RDB-PC"; /* fsl,P2020RDB-PC is required for booting Linux */
> + compatible = "cznic,turris1x";
>   
>   aliases {
>   ethernet0 =

Re: [PATCH 5/7] powerpc/85xx: p2020: Define just one machine description

2022-09-26 Thread Pali Rohár

On Monday 26 September 2022 10:02:47 Christophe Leroy wrote:
> > +static int __init p2020_probe(void)
> >   {
> > -   if (of_machine_is_compatible("fsl,P2020RDB-PC"))
> > -   return 1;
> > -   return 0;
> > +   struct device_node *p2020_cpu;
> > +
> > +   /*
> > +* There is no common compatible string for all P2020 boards.
> > +* The only common thing is "PowerPC,P2020@0" cpu node.
> > +* So check for P2020 board via this cpu node.
> > +*/
> > +   p2020_cpu = of_find_node_by_path("/cpus/PowerPC,P2020@0");
> > +   if (!p2020_cpu)
> > +   return 0;
> 
> This looks odd. I thought all probes were using the compatible, and in 
> fact I have a series in preparation that drops all 
> of_machine_is_compatible() checks in probe functions and does it in the 
> caller instead, after adding a .compatible string in the machine 
> description.
> 
> Is there really no compatible that can be used for all p2020?

Really, there is none. I have looked into all available P2020 DTB files
(either external ones passed by the bootloader or the in-tree kernel ones)
and there is no common compatible string. The only "common" thing is the
cpu node, which is how I implemented it in this patch series.

And the same issue exists for boards with P101x and P102x DTB files.
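Combining both positions, a probe could accept a common compatible string when a DTB has one and fall back to the cpu node for boards (such as Turris 1.x) whose DTB lives in flash. The sketch below is a userspace mock: `struct fake_dt` and the `fake_*` helpers are hypothetical stand-ins for `of_machine_is_compatible()` and `of_find_node_by_path()`, and "fsl,p2020" is only the proposed compatible, which no existing DTB uses. Matching here is exact and case-sensitive, which may differ from the kernel's own rules; in the kernel, the node returned by `of_find_node_by_path()` must also be released with `of_node_put()`, as patch 5 does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device tree: only the root node's
 * "compatible" strings and the full paths of the nodes present. */
struct fake_dt {
	const char *const *root_compat;  /* NULL-terminated */
	const char *const *node_paths;   /* NULL-terminated */
};

/* Mock of of_machine_is_compatible(): exact, case-sensitive match. */
static bool fake_machine_is_compatible(const struct fake_dt *dt,
				       const char *compat)
{
	for (size_t i = 0; dt->root_compat[i]; i++)
		if (strcmp(dt->root_compat[i], compat) == 0)
			return true;
	return false;
}

/* Mock of "of_find_node_by_path() returned non-NULL" (no refcount here). */
static bool fake_node_exists(const struct fake_dt *dt, const char *path)
{
	for (size_t i = 0; dt->node_paths[i]; i++)
		if (strcmp(dt->node_paths[i], path) == 0)
			return true;
	return false;
}

/* Prefer the proposed common compatible, else fall back to the cpu node. */
static int p2020_probe_sketch(const struct fake_dt *dt)
{
	if (fake_machine_is_compatible(dt, "fsl,p2020"))
		return 1;
	return fake_node_exists(dt, "/cpus/PowerPC,P2020@0") ? 1 : 0;
}
```

A Turris-like tree with only `"cznic,turris1x"` at the root still matches through the cpu node, while a DTB that one day carries `"fsl,p2020"` would match without the path lookup.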

Re: [PATCH 6/7] powerpc/85xx: p2020: Enable boards by new config option CONFIG_P2020

2022-09-26 Thread Christophe Leroy



On 19/08/2022 at 21:15, Pali Rohár wrote:
> The generic unified P2020 machine description, which supports all P2020-based
> boards, is now in the separate file p2020.c. So create a separate config option
> CONFIG_P2020 for it.

Could it be CONFIG_PPC_P2020 instead? Nowadays drivers seem to spread 
all over the drivers/ directory, so it's much better to have the CONFIG_PPC_ 
prefix on all dedicated powerpc config items.

> 
> Previously, machine descriptions for P2020 boards were enabled by the
> CONFIG_MPC85xx_DS or CONFIG_MPC85xx_RDB option. So set CONFIG_P2020 to be
> enabled by default when one of those options is enabled.
> 
> This allows compiling support for P2020 boards without having to
> enable support for the older mpc85xx boards, and compiling a kernel for the
> old mpc85xx boards without enabling support for the new P2020 boards.
> 
> Signed-off-by: Pali Rohár 
> ---
>   arch/powerpc/platforms/85xx/Kconfig  | 22 ++
>   arch/powerpc/platforms/85xx/Makefile |  3 +--
>   2 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
> index be16eba0f704..2cb4e9248b42 100644
> --- a/arch/powerpc/platforms/85xx/Kconfig
> +++ b/arch/powerpc/platforms/85xx/Kconfig
> @@ -78,16 +78,16 @@ config MPC8536_DS
> This option enables support for the MPC8536 DS board
>   
>   config MPC85xx_DS
> - bool "Freescale MPC8544 DS / MPC8572 DS / P2020 DS"
> + bool "Freescale MPC8544 DS / MPC8572 DS"
>   select PPC_I8259
>   select DEFAULT_UIMAGE
>   select FSL_ULI1575 if PCI
>   select SWIOTLB
>   help
> -   This option enables support for the MPC8544 DS, MPC8572 DS and P2020 DS boards
> +   This option enables support for the MPC8544 DS and MPC8572 DS boards
>   
>   config MPC85xx_RDB
> - bool "Freescale P102x MBG/UTM/RDB and P2020 RDB"
> + bool "Freescale P102x MBG/UTM/RDB"
>   select PPC_I8259
>   select DEFAULT_UIMAGE
>   select FSL_ULI1575 if PCI
> @@ -95,7 +95,21 @@ config MPC85xx_RDB
>   help
> This option enables support for the P1020 MBG PC, P1020 UTM PC,
> P1020 RDB PC, P1020 RDB PD, P1020 RDB, P1021 RDB PC, P1024 RDB,
> -   P1025 RDB, P2020 RDB and P2020 RDB PC boards
> +   and P1025 RDB boards
> +
> +config P2020
> + bool "Freescale P2020"
> + default y if MPC85xx_DS || MPC85xx_RDB

Is that necessary?
Can you just update defconfigs?

By the way, did you have a look at the impact on defconfigs?

> + select DEFAULT_UIMAGE
> + select SWIOTLB
> + imply PPC_I8259
> + imply FSL_ULI1575 if PCI

Why imply and not select?

> + help
> +   This option enables generic unified support for any board with the
> +   Freescale P2020 processor.
> +
> +   For example: P2020 DS board, P2020 RDB board, P2020 RDB PC board or
> +   CZ.NIC Turris 1.x boards.
>   
>   config P1010_RDB
>   bool "Freescale P1010 RDB"
> diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
> index 1ad261b4eeb6..021e168442d7 100644
> --- a/arch/powerpc/platforms/85xx/Makefile
> +++ b/arch/powerpc/platforms/85xx/Makefile
> @@ -23,8 +23,7 @@ obj-$(CONFIG_P1010_RDB)   += p1010rdb.o
>   obj-$(CONFIG_P1022_DS)+= p1022_ds.o
>   obj-$(CONFIG_P1022_RDK)   += p1022_rdk.o
>   obj-$(CONFIG_P1023_RDB)   += p1023_rdb.o
> -obj-$(CONFIG_MPC85xx_DS)  += p2020.o
> -obj-$(CONFIG_MPC85xx_RDB) += p2020.o
> +obj-$(CONFIG_P2020)   += p2020.o
>   obj-$(CONFIG_TWR_P102x)   += twr_p102x.o
>   obj-$(CONFIG_CORENET_GENERIC)   += corenet_generic.o
>   obj-$(CONFIG_FB_FSL_DIU)+= t1042rdb_diu.o

Re: [PATCH 5/7] powerpc/85xx: p2020: Define just one machine description

2022-09-26 Thread Christophe Leroy



On 19/08/2022 at 21:15, Pali Rohár wrote:
> Combine machine descriptions and code of all P2020 boards into just one
> generic unified P2020 machine description. This allows the kernel to boot on
> any P2020-based board with a P2020 DTS file without the need to patch the
> kernel and define a new machine description in the 85xx powerpc platform directory.
> 
> Signed-off-by: Pali Rohár 
> ---
>   arch/powerpc/platforms/85xx/p2020.c | 83 +++--
>   1 file changed, 19 insertions(+), 64 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/85xx/p2020.c b/arch/powerpc/platforms/85xx/p2020.c
> index d327e6c9b838..1a3ffeb47dfc 100644
> --- a/arch/powerpc/platforms/85xx/p2020.c
> +++ b/arch/powerpc/platforms/85xx/p2020.c
> @@ -154,83 +154,38 @@ static void __init p2020_setup_arch(void)
>   #endif
>   }
>   
> -#ifdef CONFIG_MPC85xx_DS
> -machine_arch_initcall(p2020_ds, mpc85xx_common_publish_devices);
> -#endif /* CONFIG_MPC85xx_DS */
> -
> -#ifdef CONFIG_MPC85xx_RDB
> -machine_arch_initcall(p2020_rdb, mpc85xx_common_publish_devices);
> -machine_arch_initcall(p2020_rdb_pc, mpc85xx_common_publish_devices);
> -#endif /* CONFIG_MPC85xx_RDB */
> +machine_arch_initcall(p2020, mpc85xx_common_publish_devices);
>   
>   /*
>* Called very early, device-tree isn't unflattened
>*/
> -#ifdef CONFIG_MPC85xx_DS
> -static int __init p2020_ds_probe(void)
> -{
> - return !!of_machine_is_compatible("fsl,P2020DS");
> -}
> -#endif /* CONFIG_MPC85xx_DS */
> -
> -#ifdef CONFIG_MPC85xx_RDB
> -static int __init p2020_rdb_probe(void)
> -{
> - if (of_machine_is_compatible("fsl,P2020RDB"))
> - return 1;
> - return 0;
> -}
> -
> -static int __init p2020_rdb_pc_probe(void)
> +static int __init p2020_probe(void)
>   {
> - if (of_machine_is_compatible("fsl,P2020RDB-PC"))
> - return 1;
> - return 0;
> + struct device_node *p2020_cpu;
> +
> + /*
> +  * There is no common compatible string for all P2020 boards.
> +  * The only common thing is "PowerPC,P2020@0" cpu node.
> +  * So check for P2020 board via this cpu node.
> +  */
> + p2020_cpu = of_find_node_by_path("/cpus/PowerPC,P2020@0");
> + if (!p2020_cpu)
> + return 0;

This looks odd. I thought all probes were using the compatible, and in 
fact I have a series in preparation that drops all 
of_machine_is_compatible() checks in probe functions and does it in the 
caller instead, after adding a .compatible string in the machine 
description.

Is there really no compatible that can be used for all p2020?
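The caller-side matching described above might look roughly like the userspace sketch below. `struct machine_sketch` is a hypothetical, heavily reduced stand-in for the powerpc machine description; the `.compatible` field mirrors the one the planned series would add, but its exact form is an assumption here. The caller checks the compatible first and only then runs the optional probe, so a machine with no common compatible, like the unified P2020 one, still needs a probe.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced machine description. */
struct machine_sketch {
	const char *name;
	const char *compatible;  /* NULL: no compatible, rely on probe only */
	int (*probe)(void);      /* NULL: a compatible match is sufficient */
};

/* root_compat: NULL-terminated list of the root node's compatible strings. */
static int root_is_compatible(const char *const *root_compat, const char *c)
{
	for (size_t i = 0; root_compat[i]; i++)
		if (strcmp(root_compat[i], c) == 0)
			return 1;
	return 0;
}

/* Return the first machine whose compatible (if set) and probe (if set) match. */
static const struct machine_sketch *
match_machine(const struct machine_sketch *machines, size_t n,
	      const char *const *root_compat)
{
	for (size_t i = 0; i < n; i++) {
		const struct machine_sketch *m = &machines[i];

		if (m->compatible && !root_is_compatible(root_compat, m->compatible))
			continue;
		if (m->probe && !m->probe())
			continue;
		return m;
	}
	return NULL;
}

/* Example table; the stub stands in for the cpu-node check of patch 5. */
static int p2020_probe_stub(void) { return 1; }

static const struct machine_sketch machines[] = {
	{ "MPC8572 DS", "fsl,MPC8572DS", NULL },
	{ "Freescale P2020", NULL, p2020_probe_stub },
};
```

Because the stub always returns 1, table order matters in this sketch: the compatible-only entry must come first, otherwise the P2020 entry would claim every board.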

> +
> + of_node_put(p2020_cpu);
> + return 1;
>   }
> -#endif /* CONFIG_MPC85xx_RDB */
> -
> -#ifdef CONFIG_MPC85xx_DS
> -define_machine(p2020_ds) {
> - .name   = "P2020 DS",
> - .probe  = p2020_ds_probe,
> - .setup_arch = p2020_setup_arch,
> - .init_IRQ   = p2020_pic_init,
> -#ifdef CONFIG_PCI
> - .pcibios_fixup_bus  = fsl_pcibios_fixup_bus,
> - .pcibios_fixup_phb  = fsl_pcibios_fixup_phb,
> -#endif
> - .get_irq= mpic_get_irq,
> - .calibrate_decr = generic_calibrate_decr,
> - .progress   = udbg_progress,
> -};
> -#endif /* CONFIG_MPC85xx_DS */
> -
> -#ifdef CONFIG_MPC85xx_RDB
> -define_machine(p2020_rdb) {
> - .name   = "P2020 RDB",
> - .probe  = p2020_rdb_probe,
> - .setup_arch = p2020_setup_arch,
> - .init_IRQ   = p2020_pic_init,
> -#ifdef CONFIG_PCI
> - .pcibios_fixup_bus  = fsl_pcibios_fixup_bus,
> - .pcibios_fixup_phb  = fsl_pcibios_fixup_phb,
> -#endif
> - .get_irq= mpic_get_irq,
> - .calibrate_decr = generic_calibrate_decr,
> - .progress   = udbg_progress,
> -};
>   
> -define_machine(p2020_rdb_pc) {
> - .name   = "P2020RDB-PC",
> - .probe  = p2020_rdb_pc_probe,
> +define_machine(p2020) {
> + .name   = "Freescale P2020",
> + .probe  = p2020_probe,
>   .setup_arch = p2020_setup_arch,
>   .init_IRQ   = p2020_pic_init,
>   #ifdef CONFIG_PCI
>   .pcibios_fixup_bus  = fsl_pcibios_fixup_bus,
> - .pcibios_fixup_phb  = fsl_pcibios_fixup_phb,
> + .pcibios_fixup_phb  = fsl_pcibios_fixup_phb,
>   #endif
>   .get_irq= mpic_get_irq,
>   .calibrate_decr = generic_calibrate_decr,
>   .progress   = udbg_progress,
>   };
> -#endif /* CONFIG_MPC85xx_RDB */

Re: [PATCH 4/7] powerpc/85xx: p2020: Unify .setup_arch and .init_IRQ callbacks

2022-09-26 Thread Christophe Leroy



On 19/08/2022 at 21:15, Pali Rohár wrote:
> Make just one .setup_arch and one .init_IRQ callback implementation for all
> P2020 board code. This deduplicates repeated identical code.

I think this patch should be split in two parts:

First patch: create the function mpc85xx_8259_init().
Second patch: refactor.

> 
> Signed-off-by: Pali Rohár 
> ---
>   arch/powerpc/platforms/85xx/p2020.c | 97 +
>   1 file changed, 30 insertions(+), 67 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/85xx/p2020.c b/arch/powerpc/platforms/85xx/p2020.c
> index d65d4c88ac47..d327e6c9b838 100644
> --- a/arch/powerpc/platforms/85xx/p2020.c
> +++ b/arch/powerpc/platforms/85xx/p2020.c
> @@ -42,9 +42,8 @@
>   #define DBG(fmt, args...)
>   #endif
>   
> -#ifdef CONFIG_MPC85xx_DS
> -
>   #ifdef CONFIG_PPC_I8259
> +
>   static void mpc85xx_8259_cascade(struct irq_desc *desc)
>   {
>   struct irq_chip *chip = irq_desc_get_chip(desc);
> @@ -55,37 +54,21 @@ static void mpc85xx_8259_cascade(struct irq_desc *desc)
>   }
>   chip->irq_eoi(&desc->irq_data);
>   }
> -#endif   /* CONFIG_PPC_I8259 */
>   
> -static void __init mpc85xx_ds_pic_init(void)
> +static void mpc85xx_8259_init(void)
>   {
> - struct mpic *mpic;
> -#ifdef CONFIG_PPC_I8259
>   struct device_node *np;
>   struct device_node *cascade_node = NULL;
>   int cascade_irq;
> -#endif
> -
> - mpic = mpic_alloc(NULL, 0,
> -   MPIC_BIG_ENDIAN |
> -   MPIC_SINGLE_DEST_CPU,
> - 0, 256, " OpenPIC  ");
>   
> - BUG_ON(mpic == NULL);
> - mpic_init(mpic);
> -
> -#ifdef CONFIG_PPC_I8259
> - /* Initialize the i8259 controller */
>   for_each_node_by_type(np, "interrupt-controller")
>   if (of_device_is_compatible(np, "chrp,iic")) {
>   cascade_node = np;
>   break;
>   }
>   
> - if (cascade_node == NULL) {
> - printk(KERN_DEBUG "Could not find i8259 PIC\n");
> + if (cascade_node == NULL)
>   return;
> - }
>   
>   cascade_irq = irq_of_parse_and_map(cascade_node, 0);
>   if (!cascade_irq) {
> @@ -93,12 +76,30 @@ static void __init mpc85xx_ds_pic_init(void)
>   return;
>   }
>   
> - DBG("mpc85xxds: cascade mapped to irq %d\n", cascade_irq);
> + DBG("i8259: cascade mapped to irq %d\n", cascade_irq);
>   
>   i8259_init(cascade_node, 0);
>   of_node_put(cascade_node);
>   
>   irq_set_chained_handler(cascade_irq, mpc85xx_8259_cascade);
> +}
> +
> +#endif   /* CONFIG_PPC_I8259 */
> +
> +static void __init p2020_pic_init(void)
> +{
> + struct mpic *mpic;
> +
> + mpic = mpic_alloc(NULL, 0,
> +   MPIC_BIG_ENDIAN |
> +   MPIC_SINGLE_DEST_CPU,
> + 0, 256, " OpenPIC  ");
> +
> + BUG_ON(mpic == NULL);
> + mpic_init(mpic);
> +
> +#ifdef CONFIG_PPC_I8259
> + mpc85xx_8259_init();
>   #endif  /* CONFIG_PPC_I8259 */
>   }
>   
> @@ -138,58 +139,20 @@ static void __init mpc85xx_ds_uli_init(void)
>   #endif
>   }
>   
> -#endif /* CONFIG_MPC85xx_DS */
> -
> -#ifdef CONFIG_MPC85xx_RDB
> -static void __init mpc85xx_rdb_pic_init(void)
> -{
> - struct mpic *mpic;
> -
> - mpic = mpic_alloc(NULL, 0,
> -   MPIC_BIG_ENDIAN |
> -   MPIC_SINGLE_DEST_CPU,
> -   0, 256, " OpenPIC  ");
> -
> - BUG_ON(mpic == NULL);
> - mpic_init(mpic);
> -}
> -#endif /* CONFIG_MPC85xx_RDB */
> -
>   /*
>* Setup the architecture
>*/
> -#ifdef CONFIG_MPC85xx_DS
> -static void __init mpc85xx_ds_setup_arch(void)
> +static void __init p2020_setup_arch(void)
>   {
> - if (ppc_md.progress)
> - ppc_md.progress("mpc85xx_ds_setup_arch()", 0);
> -
>   swiotlb_detect_4g();
>   fsl_pci_assign_primary();
>   mpc85xx_ds_uli_init();
>   mpc85xx_smp_init();
>   
> - printk("MPC85xx DS board from Freescale Semiconductor\n");
> -}
> -#endif /* CONFIG_MPC85xx_DS */
> -
> -#ifdef CONFIG_MPC85xx_RDB
> -static void __init mpc85xx_rdb_setup_arch(void)
> -{
> - if (ppc_md.progress)
> - ppc_md.progress("mpc85xx_rdb_setup_arch()", 0);
> -
> - mpc85xx_smp_init();
> -
> - fsl_pci_assign_primary();
> -
>   #ifdef CONFIG_QUICC_ENGINE
>   mpc85xx_qe_par_io_init();
> -#endif   /* CONFIG_QUICC_ENGINE */
> -
> - printk(KERN_INFO "MPC85xx RDB board from Freescale Semiconductor\n");
> +#endif
>   }
> -#endif /* CONFIG_MPC85xx_RDB */
>   
>   #ifdef CONFIG_MPC85xx_DS
>   machine_arch_initcall(p2020_ds, mpc85xx_common_publish_devices);
> @@ -230,8 +193,8 @@ static int __init p2020_rdb_pc_probe(void)
>   define_machine(p2020_ds) {
>   .name   = "P2020 DS",
>   .probe  = p2020_ds_probe,
> - .setup_arch = mpc85xx_ds_setup_arch,
> - .init_IRQ   = mpc85xx_ds_pic_init,
> + .setup_arch = p2020_setup_arch,
> + .init_IRQ   = p2020_pic_init,
>   #ifdef CONFIG_PCI
>
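After the refactor above, `p2020_pic_init()` still wraps the call to `mpc85xx_8259_init()` in `#ifdef CONFIG_PPC_I8259`. A common kernel idiom is to provide an empty inline stub when the option is disabled, so the call site needs no `#ifdef`. The userspace sketch below only illustrates that shape: the counter stands in for the real cascade setup, and `p2020_pic_init_sketch()` is a hypothetical name.

```c
/* Define this to model a kernel built with CONFIG_PPC_I8259=y. */
/* #define CONFIG_PPC_I8259 1 */

/* Illustrative side effect standing in for the real i8259 cascade setup. */
static int i8259_cascade_setups;

#ifdef CONFIG_PPC_I8259
static void mpc85xx_8259_init(void)
{
	/* Real code: find the "chrp,iic" node, map its IRQ, call
	 * i8259_init() and irq_set_chained_handler(). */
	i8259_cascade_setups++;
}
#else
/* Empty stub: compiles away when i8259 support is not built in. */
static inline void mpc85xx_8259_init(void) { }
#endif

static void p2020_pic_init_sketch(void)
{
	/* mpic_alloc() and mpic_init() would come first in the real code. */
	mpc85xx_8259_init();  /* no #ifdef needed at the call site */
}
```

Whether this is preferable to the explicit `#ifdef` is a style choice; it keeps the configuration logic next to the helper rather than at every caller.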

Re: [PATCH 3/7] powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c

2022-09-26 Thread Pali Rohár

On Monday 26 September 2022 09:48:02 Christophe Leroy wrote:
> On 19/08/2022 at 21:15, Pali Rohár wrote:
> > This moves machine descriptions and all related code for all P2020 boards
> > into new p2020.c source file. This is preparation for code deduplication
> > and providing one unified machine description for all P2020 boards.
> 
> I'm having a hard time reviewing this patch.
> 
> It looks like you are doing much more than just moving machine 
> descriptions and related code into p2020.c.
> 
> Apparently p2020.c has a lot of code that doesn't seem to be moved from 
> somewhere else.
> 
> Maybe there is a need to tidy up in order to ease reviewing.

This is probably harder to read due to how git format-patch generated
this email. The important part is:

 copy from arch/powerpc/platforms/85xx/mpc85xx_ds.c
 copy to arch/powerpc/platforms/85xx/p2020.c

This means that git considered my newly introduced file p2020.c similar
to the old file mpc85xx_ds.c and generated the diff in a format which does:

 1. copy mpc85xx_ds.c to p2020.c
 2. apply a diff on the newly introduced file p2020.c

The code really is moved from the mpc85xx_ds.c and mpc85xx_rdb.c files
into p2020.c.

File p2020.c is new in this patch.

> > 
> > Signed-off-by: Pali Rohár 
> > ---
> >   arch/powerpc/platforms/85xx/Makefile  |   2 +
> >   arch/powerpc/platforms/85xx/mpc85xx_ds.c  |  23 ---
> >   arch/powerpc/platforms/85xx/mpc85xx_rdb.c |  44 --
> >   .../platforms/85xx/{mpc85xx_ds.c => p2020.c}  | 134 --
> >   4 files changed, 91 insertions(+), 112 deletions(-)
> >   copy arch/powerpc/platforms/85xx/{mpc85xx_ds.c => p2020.c} (65%)
> > 
> > diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
> > index 260fbad7967b..1ad261b4eeb6 100644
> > --- a/arch/powerpc/platforms/85xx/Makefile
> > +++ b/arch/powerpc/platforms/85xx/Makefile
> > @@ -23,6 +23,8 @@ obj-$(CONFIG_P1010_RDB)   += p1010rdb.o
> >   obj-$(CONFIG_P1022_DS)+= p1022_ds.o
> >   obj-$(CONFIG_P1022_RDK)   += p1022_rdk.o
> >   obj-$(CONFIG_P1023_RDB)   += p1023_rdb.o
> > +obj-$(CONFIG_MPC85xx_DS)  += p2020.o
> > +obj-$(CONFIG_MPC85xx_RDB) += p2020.o
> >   obj-$(CONFIG_TWR_P102x)   += twr_p102x.o
> >   obj-$(CONFIG_CORENET_GENERIC)   += corenet_generic.o
> >   obj-$(CONFIG_FB_FSL_DIU)  += t1042rdb_diu.o
> > diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> > index 9a6d637ef54a..05aac997b5ed 100644
> > --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> > +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> > @@ -168,7 +168,6 @@ static int __init mpc8544_ds_probe(void)
> >   
> >   machine_arch_initcall(mpc8544_ds, mpc85xx_common_publish_devices);
> >   machine_arch_initcall(mpc8572_ds, mpc85xx_common_publish_devices);
> > -machine_arch_initcall(p2020_ds, mpc85xx_common_publish_devices);
> >   
> >   /*
> >* Called very early, device-tree isn't unflattened
> > @@ -178,14 +177,6 @@ static int __init mpc8572_ds_probe(void)
> > return !!of_machine_is_compatible("fsl,MPC8572DS");
> >   }
> >   
> > -/*
> > - * Called very early, device-tree isn't unflattened
> > - */
> > -static int __init p2020_ds_probe(void)
> > -{
> > -   return !!of_machine_is_compatible("fsl,P2020DS");
> > -}
> > -
> >   define_machine(mpc8544_ds) {
> > .name   = "MPC8544 DS",
> > .probe  = mpc8544_ds_probe,
> > @@ -213,17 +204,3 @@ define_machine(mpc8572_ds) {
> > .calibrate_decr = generic_calibrate_decr,
> > .progress   = udbg_progress,
> >   };
> > -
> > -define_machine(p2020_ds) {
> > -   .name   = "P2020 DS",
> > -   .probe  = p2020_ds_probe,
> > -   .setup_arch = mpc85xx_ds_setup_arch,
> > -   .init_IRQ   = mpc85xx_ds_pic_init,
> > -#ifdef CONFIG_PCI
> > -   .pcibios_fixup_bus  = fsl_pcibios_fixup_bus,
> > -   .pcibios_fixup_phb  = fsl_pcibios_fixup_phb,
> > -#endif
> > -   .get_irq= mpic_get_irq,
> > -   .calibrate_decr = generic_calibrate_decr,
> > -   .progress   = udbg_progress,
> > -};
> > diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> > index b6129c148fea..05f1ed635735 100644
> > --- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> > +++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> > @@ -108,8 +108,6 @@ static void __init mpc85xx_rdb_setup_arch(void)
> > printk(KERN_INFO "MPC85xx RDB board from Freescale Semiconductor\n");
> >   }
> >   
> > -machine_arch_initcall(p2020_rdb, mpc85xx_common_publish_devices);
> > -machine_arch_initcall(p2020_rdb_pc, mpc85xx_common_publish_devices);
> >   machine_arch_initcall(p1020_mbg_pc, mpc85xx_common_publish_devices);
> >   machine_arch_initcall(p1020_rdb, mpc85xx_common_publish_devices);
> >   machine_arch_initcall(p1020_rdb_pc, mpc85xx_common_publish_devices);
> > @@ -122,13 +120,6 @@ machine_arch_initcall(p1024_rdb, 
> >

Re: [PATCH 3/7] powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c

2022-09-26 Thread Christophe Leroy



On 19/08/2022 at 21:15, Pali Rohár wrote:
> This moves machine descriptions and all related code for all P2020 boards
> into new p2020.c source file. This is preparation for code deduplication
> and providing one unified machine description for all P2020 boards.

I'm having a hard time reviewing this patch.

It looks like you are doing much more than just moving machine 
descriptions and related code into p2020.c.

Apparently p2020.c has a lot of code that doesn't seem to be moved from 
somewhere else.

Maybe there is a need to tidy up in order to ease reviewing.

> 
> Signed-off-by: Pali Rohár 
> ---
>   arch/powerpc/platforms/85xx/Makefile  |   2 +
>   arch/powerpc/platforms/85xx/mpc85xx_ds.c  |  23 ---
>   arch/powerpc/platforms/85xx/mpc85xx_rdb.c |  44 --
>   .../platforms/85xx/{mpc85xx_ds.c => p2020.c}  | 134 --
>   4 files changed, 91 insertions(+), 112 deletions(-)
>   copy arch/powerpc/platforms/85xx/{mpc85xx_ds.c => p2020.c} (65%)
> 
> diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
> index 260fbad7967b..1ad261b4eeb6 100644
> --- a/arch/powerpc/platforms/85xx/Makefile
> +++ b/arch/powerpc/platforms/85xx/Makefile
> @@ -23,6 +23,8 @@ obj-$(CONFIG_P1010_RDB)   += p1010rdb.o
>   obj-$(CONFIG_P1022_DS)+= p1022_ds.o
>   obj-$(CONFIG_P1022_RDK)   += p1022_rdk.o
>   obj-$(CONFIG_P1023_RDB)   += p1023_rdb.o
> +obj-$(CONFIG_MPC85xx_DS)  += p2020.o
> +obj-$(CONFIG_MPC85xx_RDB) += p2020.o
>   obj-$(CONFIG_TWR_P102x)   += twr_p102x.o
>   obj-$(CONFIG_CORENET_GENERIC)   += corenet_generic.o
>   obj-$(CONFIG_FB_FSL_DIU)+= t1042rdb_diu.o
> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> index 9a6d637ef54a..05aac997b5ed 100644
> --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> @@ -168,7 +168,6 @@ static int __init mpc8544_ds_probe(void)
>   
>   machine_arch_initcall(mpc8544_ds, mpc85xx_common_publish_devices);
>   machine_arch_initcall(mpc8572_ds, mpc85xx_common_publish_devices);
> -machine_arch_initcall(p2020_ds, mpc85xx_common_publish_devices);
>   
>   /*
>* Called very early, device-tree isn't unflattened
> @@ -178,14 +177,6 @@ static int __init mpc8572_ds_probe(void)
>   return !!of_machine_is_compatible("fsl,MPC8572DS");
>   }
>   
> -/*
> - * Called very early, device-tree isn't unflattened
> - */
> -static int __init p2020_ds_probe(void)
> -{
> - return !!of_machine_is_compatible("fsl,P2020DS");
> -}
> -
>   define_machine(mpc8544_ds) {
>   .name   = "MPC8544 DS",
>   .probe  = mpc8544_ds_probe,
> @@ -213,17 +204,3 @@ define_machine(mpc8572_ds) {
>   .calibrate_decr = generic_calibrate_decr,
>   .progress   = udbg_progress,
>   };
> -
> -define_machine(p2020_ds) {
> - .name   = "P2020 DS",
> - .probe  = p2020_ds_probe,
> - .setup_arch = mpc85xx_ds_setup_arch,
> - .init_IRQ   = mpc85xx_ds_pic_init,
> -#ifdef CONFIG_PCI
> - .pcibios_fixup_bus  = fsl_pcibios_fixup_bus,
> - .pcibios_fixup_phb  = fsl_pcibios_fixup_phb,
> -#endif
> - .get_irq= mpic_get_irq,
> - .calibrate_decr = generic_calibrate_decr,
> - .progress   = udbg_progress,
> -};
> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> index b6129c148fea..05f1ed635735 100644
> --- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> +++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> @@ -108,8 +108,6 @@ static void __init mpc85xx_rdb_setup_arch(void)
>   printk(KERN_INFO "MPC85xx RDB board from Freescale Semiconductor\n");
>   }
>   
> -machine_arch_initcall(p2020_rdb, mpc85xx_common_publish_devices);
> -machine_arch_initcall(p2020_rdb_pc, mpc85xx_common_publish_devices);
>   machine_arch_initcall(p1020_mbg_pc, mpc85xx_common_publish_devices);
>   machine_arch_initcall(p1020_rdb, mpc85xx_common_publish_devices);
>   machine_arch_initcall(p1020_rdb_pc, mpc85xx_common_publish_devices);
> @@ -122,13 +120,6 @@ machine_arch_initcall(p1024_rdb, mpc85xx_common_publish_devices);
>   /*
>* Called very early, device-tree isn't unflattened
>*/
> -static int __init p2020_rdb_probe(void)
> -{
> - if (of_machine_is_compatible("fsl,P2020RDB"))
> - return 1;
> - return 0;
> -}
> -
>   static int __init p1020_rdb_probe(void)
>   {
>   if (of_machine_is_compatible("fsl,P1020RDB"))
> @@ -153,13 +144,6 @@ static int __init p1021_rdb_pc_probe(void)
>   return 0;
>   }
>   
> -static int __init p2020_rdb_pc_probe(void)
> -{
> - if (of_machine_is_compatible("fsl,P2020RDB-PC"))
> - return 1;
> - return 0;
> -}
> -
>   static int __init p1025_rdb_probe(void)
>   {
>   return of_machine_is_compatible("fsl,P1025RDB");
> @@ -180,20 +164,6 @@ static int __init
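For readers unfamiliar with the `define_machine()` / `.probe` pattern in these hunks, here is a userspace sketch of probe-by-compatible dispatch. It assumes, as a simplification, that the platform is chosen by calling each machine description's probe() in turn and taking the first one that returns nonzero. The `demo_*` names and the single-string compatible check are hypothetical and are not the kernel's `of_machine_is_compatible()`. The `!!` only normalises any nonzero result to 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the device tree root "compatible" property. */
static const char *demo_root_compatible = "fsl,MPC8572DS";

/* Simplified check: real trees can list several compatible strings. */
static int demo_machine_is_compatible(const char *compat)
{
	return strcmp(demo_root_compatible, compat) == 0;
}

struct demo_machdep {
	const char *name;
	int (*probe)(void);	/* nonzero: this description matches the board */
};

static int demo_mpc8544_ds_probe(void)
{
	return !!demo_machine_is_compatible("fsl,MPC8544DS");
}

static int demo_mpc8572_ds_probe(void)
{
	return !!demo_machine_is_compatible("fsl,MPC8572DS");
}

static const struct demo_machdep demo_machines[] = {
	{ "MPC8544 DS", demo_mpc8544_ds_probe },
	{ "MPC8572 DS", demo_mpc8572_ds_probe },
};

/* Return the first machine whose probe() accepts the board, or NULL. */
static const struct demo_machdep *demo_probe_machine(void)
{
	for (size_t i = 0; i < sizeof(demo_machines) / sizeof(demo_machines[0]); i++)
		if (demo_machines[i].probe())
			return &demo_machines[i];
	return NULL;
}
```

In the sketch, setting `demo_root_compatible` to `"fsl,P2020DS"` makes `demo_probe_machine()` return NULL. This mirrors what dropping the P2020 DS descriptor does to this file, though not necessarily to the kernel as a whole.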

Re: [PATCH 2/7] powerpc/85xx: Mark mpc85xx_ds_pic_init() as static

2022-09-26 Thread Pali Rohár

On Monday 26 September 2022 09:43:55 Christophe Leroy wrote:
> Le 19/08/2022 à 21:15, Pali Rohár a écrit :
> > Function mpc85xx_ds_pic_init() is not used outside of the mpc85xx_ds.c file.
> > 
> > Signed-off-by: Pali Rohár 
> 
> This patch should be squashed into patch 1.

No problem. Just to explain: I split those changes into separate
patches because they touch different files and different board code,
and I thought that different things should be in different patches.

> > ---
> >   arch/powerpc/platforms/85xx/mpc85xx_ds.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> > index f8d2c97f39bd..9a6d637ef54a 100644
> > --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> > +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> > @@ -54,7 +54,7 @@ static void mpc85xx_8259_cascade(struct irq_desc *desc)
> >   }
> >   #endif/* CONFIG_PPC_I8259 */
> >   
> > -void __init mpc85xx_ds_pic_init(void)
> > +static void __init mpc85xx_ds_pic_init(void)
> >   {
> > struct mpic *mpic;
> >   #ifdef CONFIG_PPC_I8259

Re: [PATCH 2/7] powerpc/85xx: Mark mpc85xx_ds_pic_init() as static

2022-09-26 Thread Christophe Leroy



Le 19/08/2022 à 21:15, Pali Rohár a écrit :
> Function mpc85xx_ds_pic_init() is not used outside of the mpc85xx_ds.c file.
> 
> Signed-off-by: Pali Rohár 

This patch should be squashed into patch 1.

> ---
>   arch/powerpc/platforms/85xx/mpc85xx_ds.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> index f8d2c97f39bd..9a6d637ef54a 100644
> --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> @@ -54,7 +54,7 @@ static void mpc85xx_8259_cascade(struct irq_desc *desc)
>   }
>   #endif  /* CONFIG_PPC_I8259 */
>   
> -void __init mpc85xx_ds_pic_init(void)
> +static void __init mpc85xx_ds_pic_init(void)
>   {
>   struct mpic *mpic;
>   #ifdef CONFIG_PPC_I8259

Re: [PATCH v3] powerpc/pseries/mce: Avoid instrumentation in realmode

2022-09-26 Thread Nicholas Piggin

On Mon Sep 26, 2022 at 4:18 PM AEST, Ganesh Goudar wrote:
> Part of machine check error handling is done in realmode.
> As of now, instrumentation is not possible for any code that
> runs in realmode.
> When an MCE is injected on a KASAN-enabled kernel, a crash is
> observed. Hence, force inline or mark as noinstr the functions
> that can run in realmode, to avoid KASAN
> instrumentation.
>
> Signed-off-by: Ganesh Goudar 
> ---
> v2: Force inline a few more functions.
>
> v3: Add noinstr to a few functions instead of __always_inline.

I would still like to consider doing a realmode annotation, but
as a minimal fix for the next merge window I suppose this is okay.
There's still no indication of why the annotation exists on the
functions, which is a bit annoying; maybe not fundamentally worse
than notrace was, but the scope of reasons why it's there gets
bigger.


> ---
>  arch/powerpc/include/asm/hw_irq.h| 8 
>  arch/powerpc/include/asm/interrupt.h | 2 +-
>  arch/powerpc/include/asm/rtas.h  | 4 ++--
>  arch/powerpc/kernel/rtas.c   | 4 ++--
>  4 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
> index 983551859891..c4d542b4a623 100644
> --- a/arch/powerpc/include/asm/hw_irq.h
> +++ b/arch/powerpc/include/asm/hw_irq.h
> @@ -111,7 +111,7 @@ static inline void __hard_RI_enable(void)
>  #ifdef CONFIG_PPC64
>  #include 
>  
> -static inline notrace unsigned long irq_soft_mask_return(void)
> +noinstr static unsigned long irq_soft_mask_return(void)
>  {
>   unsigned long flags;

Don't uninline the ones in headers.

> @@ -128,7 +128,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
>   * for the critical section and as a clobber because
>   * we changed paca->irq_soft_mask
>   */
> -static inline notrace void irq_soft_mask_set(unsigned long mask)
> +noinstr static void irq_soft_mask_set(unsigned long mask)
>  {
>   /*
>* The irq mask must always include the STD bit if any are set.
> @@ -155,7 +155,7 @@ static inline notrace void irq_soft_mask_set(unsigned long mask)
>   : "memory");
>  }
>  
> -static inline notrace unsigned long irq_soft_mask_set_return(unsigned long mask)
> +noinstr static unsigned long irq_soft_mask_set_return(unsigned long mask)
>  {
>   unsigned long flags;
>  
> @@ -191,7 +191,7 @@ static inline notrace unsigned long irq_soft_mask_or_return(unsigned long mask)
>   return flags;
>  }
>  
> -static inline unsigned long arch_local_save_flags(void)
> +static __always_inline unsigned long arch_local_save_flags(void)
>  {
>   return irq_soft_mask_return();
>  }

Can we instead add noinstr to this too, and to the other ones that were
changed to always inline?

Thanks,
Nick

> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> index 8069dbc4b8d1..090895051712 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -92,7 +92,7 @@ static inline bool is_implicit_soft_masked(struct pt_regs *regs)
>   return search_kernel_soft_mask_table(regs->nip);
>  }
>  
> -static inline void srr_regs_clobbered(void)
> +static __always_inline void srr_regs_clobbered(void)
>  {
>   local_paca->srr_valid = 0;
>   local_paca->hsrr_valid = 0;
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 00531af17ce0..52d29d664fdf 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -201,13 +201,13 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
>  #define PSERIES_ELOG_SECT_ID_MCE (('M' << 8) | 'C')
>  
>  static
> -inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
> +__always_inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
>  {
>   return be16_to_cpu(sect->id);
>  }
>  
>  static
> -inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
> +__always_inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
>  {
>   return be16_to_cpu(sect->length);
>  }
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 693133972294..f9d78245c0e8 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -48,7 +48,7 @@
>  /* This is here deliberately so it's only used in this file */
>  void enter_rtas(unsigned long);
>  
> -static inline void do_enter_rtas(unsigned long args)
> +static __always_inline void do_enter_rtas(unsigned long args)
>  {
>   unsigned long msr;
>  
> @@ -435,7 +435,7 @@ static char *__fetch_rtas_last_error(char *altbuf)
>  #endif
>  
>  
> -static void
> +noinstr static void
>  va_rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
> va_list list)
>  {
> -- 
> 2.37.1
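Nick's request (keep header helpers inline, and put noinstr only on out-of-line functions) can be illustrated with a userspace sketch. It assumes GCC or Clang. `DEMO_NOINSTR` and `DEMO_ALWAYS_INLINE` are hypothetical approximations, and the kernel's real `noinstr` does more than these attributes (for example, it also places code in a dedicated text section). The idea is that an always-inlined helper is compiled as part of its caller. It therefore picks up the caller's no-instrumentation treatment instead of existing as a separate, instrumented copy.

```c
#include <assert.h>

/*
 * Hypothetical userspace approximations of the kernel annotations.
 * Without -fsanitize/-finstrument-functions these attributes are no-ops.
 */
#define DEMO_NOINSTR \
	__attribute__((noinline, no_instrument_function, no_sanitize_address))
#define DEMO_ALWAYS_INLINE inline __attribute__((always_inline))

/* Header-style helper: always folded into its caller, never a separate copy. */
static DEMO_ALWAYS_INLINE unsigned long demo_soft_mask_return(const unsigned long *mask)
{
	return *mask;
}

/* Out-of-line function that may run in real mode: opt out of instrumentation.
 * The inlined helper body is compiled under these same attributes. */
DEMO_NOINSTR static unsigned long demo_realmode_handler(const unsigned long *mask)
{
	return demo_soft_mask_return(mask) | 0x1UL;
}
```

For example, with a mask of 0x2, `demo_realmode_handler()` returns 0x3.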

Re: [PATCH -next] powerpc: Avoid platform device leak in the event platform_device_add() fails

2022-09-26 Thread linyujun (C)


kindly ping

On 2022/9/14 11:26, Lin Yujun wrote:

Use platform_device_put() to free the platform device and return
directly in the event platform_device_add() fails.

Fixes: a28d3af2a26c ("[PATCH] 2/5 powerpc: Rework PowerMac i2c part 2")
Signed-off-by: Lin Yujun 
---
  arch/powerpc/platforms/powermac/low_i2c.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powermac/low_i2c.c b/arch/powerpc/platforms/powermac/low_i2c.c
index c1c430c66dc9..5171635c3450 100644
--- a/arch/powerpc/platforms/powermac/low_i2c.c
+++ b/arch/powerpc/platforms/powermac/low_i2c.c
@@ -1487,6 +1487,7 @@ static int __init pmac_i2c_create_platform_devices(void)
  {
struct pmac_i2c_bus *bus;
int i = 0;
+   int ret;
  
  	/* In the case where we are initialized from smp_init(), we must
  	 * not use the timer (and thus the irq). It's safe from now on
@@ -1502,7 +1503,11 @@ static int __init pmac_i2c_create_platform_devices(void)
return -ENOMEM;
bus->platform_dev->dev.platform_data = bus;
bus->platform_dev->dev.of_node = bus->busnode;
-   platform_device_add(bus->platform_dev);
+   ret = platform_device_add(bus->platform_dev);
+   if (ret) {
+   platform_device_put(bus->platform_dev);
+   return ret;
+   }
}
  
  	/* Now call platform "init" functions */
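The rule this patch applies is that a platform device that has been allocated but not successfully added still holds its initial reference. If `platform_device_add()` fails, the caller must therefore drop that reference with `platform_device_put()`. Below is a userspace model of that ownership rule. The `demo_*` names, the plain integer refcount and the forced failure flag are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical refcounted device standing in for struct platform_device. */
struct demo_pdev {
	int refcount;
	int registered;
};

static int demo_frees;	/* counts releases so a caller can observe them */

static struct demo_pdev *demo_pdev_alloc(void)
{
	struct demo_pdev *pdev = calloc(1, sizeof(*pdev));

	if (pdev)
		pdev->refcount = 1;	/* the caller owns the initial reference */
	return pdev;
}

static void demo_pdev_put(struct demo_pdev *pdev)
{
	assert(pdev->refcount > 0);
	if (--pdev->refcount == 0) {
		free(pdev);
		demo_frees++;
	}
}

/* Simulated registration; a nonzero 'fail' forces an error return. */
static int demo_pdev_add(struct demo_pdev *pdev, int fail)
{
	if (fail)
		return -ENOMEM;
	pdev->registered = 1;
	return 0;
}

/* Error path as in the patch: on add failure, put the device and bail out. */
static int demo_create(int fail)
{
	struct demo_pdev *pdev = demo_pdev_alloc();
	int ret;

	if (!pdev)
		return -ENOMEM;
	ret = demo_pdev_add(pdev, fail);
	if (ret) {
		demo_pdev_put(pdev);
		return ret;
	}
	/* On success the registered device intentionally stays alive. */
	return 0;
}
```

A failed `demo_create(1)` releases the device exactly once. A successful `demo_create(0)` leaves it registered and releases nothing.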

Re: [PATCH v3] powerpc/pseries/mce: Avoid instrumentation in realmode

2022-09-26 Thread Sachin Sant




> On 26-Sep-2022, at 11:48 AM, Ganesh Goudar  wrote:
> 
> Part of machine check error handling is done in realmode.
> As of now, instrumentation is not possible for any code that
> runs in realmode.
> When an MCE is injected on a KASAN-enabled kernel, a crash is
> observed. Hence, force inline or mark as noinstr the functions
> that can run in realmode, to avoid KASAN
> instrumentation.
> 
> Signed-off-by: Ganesh Goudar 
> ---
> v2: Force inline a few more functions.
> 
> v3: Add noinstr to a few functions instead of __always_inline.
> ---

Tested-by: Sachin Sant 

- Sachin

RE: [RFC PATCH 2/2] powerpc: nop trap instruction after WARN_ONCE fires

2022-09-26 Thread David Laight

From: Nicholas Piggin
> Sent: 23 September 2022 16:42
>
> WARN_ONCE and similar are often used in frequently executed code, and
> should not crash the system. The program check interrupt caused by
> WARN_ON_ONCE can be a significant overhead even when nothing is being
> printed. This can cause performance to become unacceptable, having the
> same effective impact to the user as a BUG_ON().
> 
> Avoid this overhead by patching the trap with a nop instruction after a
> "once" trap fires. Conditional warnings that return a result must have
> equivalent compare and branch instructions after the trap, so when it is
> nopped the statement will behave the same way. It's possible the asm
> goto should be removed entirely and this comparison just done in C now.
> 
> XXX: possibly this should schedule the patching to run in a different
> context than the program check.

I'm pretty sure WARN_ON_ONCE() is valid everywhere printk() is allowed.
In many cases this means you can't call mutex_lock().

So you need a different scheme.

David
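David's objection is that anything done on the "once" path has to be safe wherever printk() is. A lock-free flag meets that bar for the bookkeeping. The following userspace sketch has hypothetical names and uses C11 `atomic_exchange()`. It models neither the trap nor the instruction patching, which would still need a context where patching is allowed, as the XXX note suggests.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static int demo_reports;	/* how many warnings were actually emitted */

/*
 * Lock-free "once" guard: atomic_exchange() never sleeps, so this is usable
 * where taking a mutex is not. Hypothetical helper, not WARN_ON_ONCE() itself.
 */
static bool demo_warn_on_once(bool cond, atomic_bool *done, const char *msg)
{
	if (cond && !atomic_exchange(done, true)) {
		fprintf(stderr, "WARNING: %s\n", msg);
		demo_reports++;
	}
	return cond;	/* like WARN_ON_ONCE(), report the condition every time */
}
```

Only the first true condition prints. Every call still returns the condition, so callers that branch on the result behave the same after the first report.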


[PATCH v3 2/4] powerpc/64s: Remove unneeded #ifdef CONFIG_DEBUG_PAGEALLOC in hash_utils

2022-09-26 Thread Nicholas Miehlbradt

From: Christophe Leroy 

debug_pagealloc_enabled() is always defined and constant folds to
'false' when CONFIG_DEBUG_PAGEALLOC is not enabled.

Remove the #ifdefs, the code and associated static variables will
be optimised out by the compiler when CONFIG_DEBUG_PAGEALLOC is
not defined.

Signed-off-by: Christophe Leroy 
Signed-off-by: Nicholas Miehlbradt 
---
 arch/powerpc/mm/book3s64/hash_utils.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index fc92613dc2bf..e63ff401a6ea 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -123,11 +123,8 @@ EXPORT_SYMBOL_GPL(mmu_slb_size);
 #ifdef CONFIG_PPC_64K_PAGES
 int mmu_ci_restrictions;
 #endif
-#ifdef CONFIG_DEBUG_PAGEALLOC
 static u8 *linear_map_hash_slots;
 static unsigned long linear_map_hash_count;
-static DEFINE_SPINLOCK(linear_map_hash_lock);
-#endif /* CONFIG_DEBUG_PAGEALLOC */
 struct mmu_hash_ops mmu_hash_ops;
 EXPORT_SYMBOL(mmu_hash_ops);
 
@@ -427,11 +424,9 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
break;
 
cond_resched();
-#ifdef CONFIG_DEBUG_PAGEALLOC
if (debug_pagealloc_enabled() &&
(paddr >> PAGE_SHIFT) < linear_map_hash_count)
linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
-#endif /* CONFIG_DEBUG_PAGEALLOC */
}
return ret < 0 ? ret : 0;
 }
@@ -1066,7 +1061,6 @@ static void __init htab_initialize(void)
 
prot = pgprot_val(PAGE_KERNEL);
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
if (debug_pagealloc_enabled()) {
linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT;
linear_map_hash_slots = memblock_alloc_try_nid(
@@ -1076,7 +1070,6 @@ static void __init htab_initialize(void)
panic("%s: Failed to allocate %lu bytes max_addr=%pa\n",
  __func__, linear_map_hash_count, _rma_size);
}
-#endif /* CONFIG_DEBUG_PAGEALLOC */
 
/* create bolted the linear mapping in the hash table */
for_each_mem_range(i, , ) {
@@ -1991,6 +1984,8 @@ long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
 }
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
+static DEFINE_SPINLOCK(linear_map_hash_lock);
+
 static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 {
unsigned long hash;
-- 
2.34.1
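The commit message argues that the `#ifdef` is unnecessary because a helper that constant-folds to false lets the compiler drop the guarded code. The following sketches that pattern. `DEMO_CONFIG_FEATURE`, `demo_feature_enabled()` and the slot array are hypothetical stand-ins for `CONFIG_DEBUG_PAGEALLOC`, `debug_pagealloc_enabled()` and `linear_map_hash_slots`. This is not the kernel's definition of that helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical config switch: build with -DDEMO_CONFIG_FEATURE to enable. */
#ifdef DEMO_CONFIG_FEATURE
static bool demo_feature_on = true;	/* runtime toggle when built in */
static inline bool demo_feature_enabled(void) { return demo_feature_on; }
#else
/* Compile-time false, so every block guarded by it is dead code. */
static inline bool demo_feature_enabled(void) { return false; }
#endif

/* Always declared, as after the patch removes the surrounding #ifdef. */
static unsigned char demo_slots[16];
static unsigned long demo_slot_count = 16;

static void demo_record_slot(unsigned long pfn, int ret)
{
	/* No #ifdef needed: this statement folds away when the feature is off. */
	if (demo_feature_enabled() && pfn < demo_slot_count)
		demo_slots[pfn] = (unsigned char)(ret | 0x80);
}
```

Built without `-DDEMO_CONFIG_FEATURE`, `demo_record_slot(3, 5)` leaves `demo_slots[3]` at 0. An optimising compiler can typically remove the store altogether.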

[PATCH v3 4/4] powerpc/64s: Enable KFENCE on book3s64

2022-09-26 Thread Nicholas Miehlbradt

KFENCE support was added for ppc32 in commit 90cbac0e995d
("powerpc: Enable KFENCE for PPC32").
Enable KFENCE on ppc64 architecture with hash and radix MMUs.
It uses the same mechanism as debug pagealloc to
protect/unprotect pages. All KFENCE kunit tests pass on both
MMUs.

KFENCE memory is initially allocated using memblock but is
later marked as SLAB allocated. This necessitates the change
to __pud_free to ensure that the KFENCE pages are freed
appropriately.

Based on previous work by Christophe Leroy and Jordan Niethe.

Signed-off-by: Nicholas Miehlbradt 
---
v2: Refactor
v3: Simplified ABI version check
---
 arch/powerpc/Kconfig |  2 +-
 arch/powerpc/include/asm/book3s/64/pgalloc.h |  6 --
 arch/powerpc/include/asm/book3s/64/pgtable.h |  2 +-
 arch/powerpc/include/asm/kfence.h| 15 +++
 arch/powerpc/mm/book3s64/hash_utils.c| 10 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c |  6 --
 6 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a4f8a5276e5c..f7dd0f49510d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -194,7 +194,7 @@ config PPC
select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KASAN  if PPC_RADIX_MMU
select HAVE_ARCH_KASAN_VMALLOC  if HAVE_ARCH_KASAN
-   select HAVE_ARCH_KFENCE if PPC_BOOK3S_32 || PPC_8xx || 40x
+   select HAVE_ARCH_KFENCE if ARCH_SUPPORTS_DEBUG_PAGEALLOC
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index e1af0b394ceb..dd2cff53a111 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -113,9 +113,11 @@ static inline void __pud_free(pud_t *pud)
 
/*
 * Early pud pages allocated via memblock allocator
-* can't be directly freed to slab
+* can't be directly freed to slab. KFENCE pages have
+* both reserved and slab flags set, so they need to be freed
+* with kmem_cache_free.
 */
-   if (PageReserved(page))
+   if (PageReserved(page) && !PageSlab(page))
free_reserved_page(page);
else
kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), pud);
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index cb9d5fd39d7f..fd5d800f2836 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1123,7 +1123,7 @@ static inline void vmemmap_remove_mapping(unsigned long start,
 }
 #endif
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
+#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
 static inline void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
if (radix_enabled())
diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h
index a9846b68c6b9..6fd2b4d486c5 100644
--- a/arch/powerpc/include/asm/kfence.h
+++ b/arch/powerpc/include/asm/kfence.h
@@ -11,11 +11,25 @@
 #include 
 #include 
 
+#ifdef CONFIG_PPC64_ELF_ABI_V1
+#define ARCH_FUNC_PREFIX "."
+#endif
+
 static inline bool arch_kfence_init_pool(void)
 {
return true;
 }
 
+#ifdef CONFIG_PPC64
+static inline bool kfence_protect_page(unsigned long addr, bool protect)
+{
+   struct page *page = virt_to_page(addr);
+
+   __kernel_map_pages(page, 1, !protect);
+
+   return true;
+}
+#else
 static inline bool kfence_protect_page(unsigned long addr, bool protect)
 {
pte_t *kpte = virt_to_kpte(addr);
@@ -29,5 +43,6 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect)
 
return true;
 }
+#endif
 
 #endif /* __ASM_POWERPC_KFENCE_H */
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index b37412fe5930..9cceaa5998a3 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -424,7 +424,7 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend,
break;
 
cond_resched();
-   if (debug_pagealloc_enabled() &&
+   if (debug_pagealloc_enabled_or_kfence() &&
(paddr >> PAGE_SHIFT) < linear_map_hash_count)
linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
}
@@ -773,7 +773,7 @@ static void __init htab_init_page_sizes(void)
bool aligned = true;
init_hpte_page_sizes();
 
-   if (!debug_pagealloc_enabled()) {
+   if (!debug_pagealloc_enabled_or_kfence()) {
/*
 * Pick a size for the linear mapping. Currently, we only
 * support 16M, 1M and 4K which is the default
@@
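The revised `__pud_free()` test can be expressed as a small decision function. This sketch uses hypothetical flag bits in place of `PageReserved()` / `PageSlab()` and returns which free path would be taken. According to the commit message, early memblock pages are reserved only, while KFENCE pages carry both flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for PG_reserved and PG_slab. */
enum { DEMO_PG_RESERVED = 1u << 0, DEMO_PG_SLAB = 1u << 1 };

enum demo_free_path { DEMO_FREE_RESERVED_PAGE, DEMO_KMEM_CACHE_FREE };

/* Mirrors the patched condition: only pages that are reserved and NOT
 * slab-owned go back through free_reserved_page(). */
static enum demo_free_path demo_pud_free_path(unsigned int flags)
{
	bool reserved = flags & DEMO_PG_RESERVED;
	bool slab = flags & DEMO_PG_SLAB;

	return (reserved && !slab) ? DEMO_FREE_RESERVED_PAGE : DEMO_KMEM_CACHE_FREE;
}
```

Before the patch, a KFENCE page (reserved and slab) would have taken the `free_reserved_page()` branch. With the added `!PageSlab()` test it goes to `kmem_cache_free()`.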

[PATCH v3 3/4] powerpc/64s: Allow double call of kernel_[un]map_linear_page()

2022-09-26 Thread Nicholas Miehlbradt

From: Christophe Leroy 

If the page is already mapped (for map) or already unmapped (for unmap), bail out.

Signed-off-by: Christophe Leroy 
Signed-off-by: Nicholas Miehlbradt 
---
 arch/powerpc/mm/book3s64/hash_utils.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index e63ff401a6ea..b37412fe5930 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -2000,6 +2000,9 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
if (!vsid)
return;
 
+   if (linear_map_hash_slots[lmi] & 0x80)
+   return;
+
ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
HPTE_V_BOLTED,
mmu_linear_psize, mmu_kernel_ssize);
@@ -2019,7 +2022,10 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi)
 
hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
spin_lock(_map_hash_lock);
-   BUG_ON(!(linear_map_hash_slots[lmi] & 0x80));
+   if (!(linear_map_hash_slots[lmi] & 0x80)) {
+   spin_unlock(_map_hash_lock);
+   return;
+   }
hidx = linear_map_hash_slots[lmi] & 0x7f;
linear_map_hash_slots[lmi] = 0;
spin_unlock(_map_hash_lock);
-- 
2.34.1
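The 0x80 / 0x7f encoding in `linear_map_hash_slots` and the new early returns can be modelled in a few lines. This single-threaded userspace sketch uses hypothetical names. It omits `linear_map_hash_lock` and the HPTE insert/invalidate calls, and it performs the map-side check right before the update, whereas the patch tests that bit before calling `hpte_insert_repeating()`.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SLOT_VALID 0x80	/* bit 7: page currently has a bolted HPTE */
#define DEMO_SLOT_HIDX  0x7f	/* bits 0-6: hash slot index from the insert */

static unsigned char demo_hash_slots[8];
static int demo_inserts, demo_removes;	/* side effects a caller can count */

/* Map: bail out if already mapped instead of inserting a second HPTE. */
static void demo_map_linear_page(size_t lmi, unsigned char hidx)
{
	assert(lmi < sizeof(demo_hash_slots));
	if (demo_hash_slots[lmi] & DEMO_SLOT_VALID)
		return;
	demo_hash_slots[lmi] = (unsigned char)(DEMO_SLOT_VALID | (hidx & DEMO_SLOT_HIDX));
	demo_inserts++;
}

/* Unmap: bail out if already unmapped instead of hitting a BUG_ON(). */
static void demo_unmap_linear_page(size_t lmi)
{
	assert(lmi < sizeof(demo_hash_slots));
	if (!(demo_hash_slots[lmi] & DEMO_SLOT_VALID))
		return;
	demo_hash_slots[lmi] = 0;
	demo_removes++;
}
```

Calling either function twice in a row now has the same effect as calling it once, which is what KFENCE needs when it protects or unprotects a page that is already in the requested state.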

[PATCH v3 1/4] powerpc/64s: Add DEBUG_PAGEALLOC for radix

2022-09-26 Thread Nicholas Miehlbradt

There is support for DEBUG_PAGEALLOC on hash but not on radix.
Add support on radix.

Signed-off-by: Nicholas Miehlbradt 
---
v2: Revert change to radix_memory_block_size, instead set the size
in radix_init_pgtable and radix__create_section_mapping directly.
v3: Remove max_mapping_size argument of create_physical_mapping
as the value is the same at all call sites.
---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index db2f3d193448..daa40e3b74dd 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -267,13 +268,16 @@ static unsigned long next_boundary(unsigned long addr, unsigned long end)
 
 static int __meminit create_physical_mapping(unsigned long start,
 unsigned long end,
-unsigned long max_mapping_size,
 int nid, pgprot_t _prot)
 {
unsigned long vaddr, addr, mapping_size = 0;
bool prev_exec, exec = false;
pgprot_t prot;
int psize;
+   unsigned long max_mapping_size = radix_mem_block_size;
+
+   if (debug_pagealloc_enabled())
+   max_mapping_size = PAGE_SIZE;
 
start = ALIGN(start, PAGE_SIZE);
end   = ALIGN_DOWN(end, PAGE_SIZE);
@@ -352,7 +356,6 @@ static void __init radix_init_pgtable(void)
}
 
WARN_ON(create_physical_mapping(start, end,
-   radix_mem_block_size,
-1, PAGE_KERNEL));
}
 
@@ -850,7 +853,7 @@ int __meminit radix__create_section_mapping(unsigned long start,
}
 
return create_physical_mapping(__pa(start), __pa(end),
-  radix_mem_block_size, nid, prot);
+  nid, prot);
 }
 
 int __meminit radix__remove_section_mapping(unsigned long start, unsigned long end)
@@ -899,7 +902,14 @@ void __meminit radix__vmemmap_remove_mapping(unsigned long start, unsigned long
 #ifdef CONFIG_DEBUG_PAGEALLOC
 void radix__kernel_map_pages(struct page *page, int numpages, int enable)
 {
-   pr_warn_once("DEBUG_PAGEALLOC not supported in radix mode\n");
+   unsigned long addr;
+
+   addr = (unsigned long)page_address(page);
+
+   if (enable)
+   set_memory_p(addr, numpages);
+   else
+   set_memory_np(addr, numpages);
 }
 #endif
 
-- 
2.34.1
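This patch makes two changes. The radix linear map is built from base pages when DEBUG_PAGEALLOC is active, and `radix__kernel_map_pages()` toggles presence with `set_memory_p()` / `set_memory_np()`. A rough userspace analogy follows, assuming a POSIX system. It uses `mprotect()` with `PROT_NONE` in place of a not-present mapping. The `demo_*` functions are hypothetical, and any mapping-size values passed to them are only examples.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Mirrors the patch's choice of the largest mapping size for the linear map. */
static unsigned long demo_max_mapping_size(bool debug_pagealloc,
					   unsigned long mem_block_size,
					   unsigned long page_size)
{
	return debug_pagealloc ? page_size : mem_block_size;
}

/* Userspace analogy for radix__kernel_map_pages(): make pages accessible
 * (enable != 0) or inaccessible (any access faults). Returns 0 on success. */
static int demo_map_pages(void *addr, int numpages, int enable)
{
	size_t len = (size_t)numpages * (size_t)sysconf(_SC_PAGESIZE);

	return mprotect(addr, len, enable ? PROT_READ | PROT_WRITE : PROT_NONE);
}
```

Like `set_memory_np()`, `mprotect()` works on whole pages. If the linear map used large blocks, a single page could not be made not-present on its own, which is presumably why the patch caps `max_mapping_size` at `PAGE_SIZE`.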

Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")

2022-09-26 Thread Kalle Valo

David Hildenbrand  writes:

>>> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
>>> +**
>>> +
>>> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it
>>> +is common for a given warning condition, if it occurs at all, to occur
>>> +multiple times. This can fill up and wrap the kernel log, and can even slow
>>> +the system enough that the excessive logging turns into its own, additional
>>> +problem.
>>
>> FWIW I have had cases where WARN() messages caused a reboot, maybe
>> mention that here? In my case the logging was so excessive that the
>> watchdog wasn't updated and in the end the device was forcefully
>> rebooted.
>>
>
> That should be covered by the last part, no? What would be your suggestion?

I was just thinking that maybe make it more obvious that even WARN_ON()
can crash the system, something along these lines:

"..., additional problem like stalling the system so much that it causes
a reboot."

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies

2022-09-26 Thread Michal Suchánek

On Mon, Sep 26, 2022 at 08:47:32AM +0200, Greg Kroah-Hartman wrote:
> On Sat, Sep 24, 2022 at 01:55:23PM +0200, Michal Suchánek wrote:
> > On Sat, Sep 24, 2022 at 12:13:34PM +0200, Greg Kroah-Hartman wrote:
> > > On Sat, Sep 24, 2022 at 11:45:21AM +0200, Michal Suchánek wrote:
> > > > On Sat, Sep 24, 2022 at 11:19:19AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Fri, Sep 23, 2022 at 07:10:28PM +0200, Michal Suchanek wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > this is a backport of commit 0d519cadf751
> > > > > > ("arm64: kexec_file: use more system keyrings to verify kernel image signature")
> > > > > > to the stable 5.15 tree, including the preparatory patches.
> > > > > 
> > > > > This feels to me like a new feature for arm64, one that has never worked
> > > > > before and you are just making it feature-parity with x86, right?
> > > > > 
> > > > > Or is this a regression fix somewhere?  Why is this needed in 5.15.y and
> > > > > why can't people who need this new feature just use a newer kernel
> > > > > version (5.19?)
> > > > 
> > > > It's a half-broken implementation of the kexec kernel verification. At the time
> > > > it was implemented for arm64 we had the platform and secondary keyrings
> > > > and x86 was using them but on arm64 the initial implementation ignores
> > > > them.
> > > 
> > > Ok, so it's something that never worked.  Adding support to get it to
> > > work doesn't really fall into the stable kernel rules, right?
> > 
> > Not sure. It was defective, not using the facilities available at the
> > time correctly. Which translates to kernels that can be kexec'd on x86
> > failing to kexec on arm64 without any explanation (signed with same key,
> > built for the appropriate arch).
> 
> Feature parity across architectures is not a "regression", but rather a
> "this feature is not implemented for this architecture yet" type of
> thing.

That depends on the point of view: before kexec verification you could boot any
kernel; now you can boot some kernels signed with a valid key, but not
others. The initial implementation is buggy, probably because it
is based on an old version of the x86 code.

> 
> > > Again, what's wrong with 5.19 for anyone who wants this?  Who does want
> > > this?
> > 
> > Not sure, really.
> > 
> > The final patch was repeatedly backported to stable and failed to build
> > because the prerequisites were missing.
> 
> That's because it was tagged, but now that you show the full set of
> requirements, it's pretty obvious to me that this is not relevant for
> going this far back.

That also works.

Thanks

Michal

Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies

2022-09-26 Thread Greg Kroah-Hartman

On Sat, Sep 24, 2022 at 01:55:23PM +0200, Michal Suchánek wrote:
> On Sat, Sep 24, 2022 at 12:13:34PM +0200, Greg Kroah-Hartman wrote:
> > On Sat, Sep 24, 2022 at 11:45:21AM +0200, Michal Suchánek wrote:
> > > On Sat, Sep 24, 2022 at 11:19:19AM +0200, Greg Kroah-Hartman wrote:
> > > > On Fri, Sep 23, 2022 at 07:10:28PM +0200, Michal Suchanek wrote:
> > > > > Hello,
> > > > > 
> > > > > this is a backport of commit 0d519cadf751
> > > > > ("arm64: kexec_file: use more system keyrings to verify kernel image signature")
> > > > > to the stable 5.15 tree, including the preparatory patches.
> > > > 
> > > > This feels to me like a new feature for arm64, one that has never worked
> > > > before and you are just making it feature-parity with x86, right?
> > > > 
> > > > Or is this a regression fix somewhere?  Why is this needed in 5.15.y and
> > > > why can't people who need this new feature just use a newer kernel
> > > > version (5.19?)
> > > 
> > > It's a half-broken implementation of the kexec kernel verification. At the time
> > > it was implemented for arm64 we had the platform and secondary keyrings
> > > and x86 was using them but on arm64 the initial implementation ignores
> > > them.
> > 
> > Ok, so it's something that never worked.  Adding support to get it to
> > work doesn't really fall into the stable kernel rules, right?
> 
> Not sure. It was defective, not using the facilities available at the
> time correctly. Which translates to kernels that can be kexec'd on x86
> failing to kexec on arm64 without any explanation (signed with same key,
> built for the appropriate arch).

Feature parity across architectures is not a "regression", but rather a
"this feature is not implemented for this architecture yet" type of
thing.

> > Again, what's wrong with 5.19 for anyone who wants this?  Who does want
> > this?
> 
> Not sure, really.
> 
> The final patch was repeatedly backported to stable and failed to build
> because the prerequisites were missing.

That's because it was tagged, but now that you show the full set of
requirements, it's pretty obvious to me that this is not relevant for
going this far back.

thanks,

greg k-h

[PATCH v2 5/6] powerpc/64: Add support for out-of-line static calls

2022-09-26 Thread Benjamin Gray

Implement static call support for 64 bit V2 ABI. This requires
making sure the TOC is kept correct across kernel-module
boundaries. As a secondary concern, it tries to use the local
entry point of a target wherever possible. It does so by
checking if both tramp & target are kernel code, and falls
back to detecting the common global entry point patterns
if modules are involved. Detecting the global entry point is
also required for setting the local entry point as the trampoline
target: if we cannot detect the local entry point, then we need to
conservatively initialise r12 and use the global entry point.

The trampolines are marked with `.localentry NAME, 1` to make the
linker save and restore the TOC on each call to the trampoline. This
allows the trampoline to safely target functions with different TOC
values.

However this directive also implies the TOC is not initialised on entry
to the trampoline. The kernel TOC is easily found in the PACA, but not
an arbitrary module TOC. Therefore the trampoline implementation depends
on whether it's in the kernel or not. If in the kernel, we initialise
the TOC using the PACA. If in a module, we have to initialise the TOC
with zero context, so it's quite expensive.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/Kconfig |  2 +-
 arch/powerpc/include/asm/code-patching.h |  1 +
 arch/powerpc/include/asm/static_call.h   | 80 +++--
 arch/powerpc/kernel/Makefile |  3 +-
 arch/powerpc/kernel/static_call.c| 90 ++--
 5 files changed, 164 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4c466acdc70d..e7a66635eade 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -248,7 +248,7 @@ config PPC
select HAVE_SOFTIRQ_ON_OWN_STACK
	select HAVE_STACKPROTECTOR  if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
	select HAVE_STACKPROTECTOR  if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
-   select HAVE_STATIC_CALL if PPC32
+   select HAVE_STATIC_CALL if PPC32 || PPC64_ELF_ABI_V2
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_VIRT_CPU_ACCOUNTING
select HUGETLB_PAGE_SIZE_VARIABLE   if PPC_BOOK3S_64 && HUGETLB_PAGE
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 15efd8ab22da..8d1850080af8 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -132,6 +132,7 @@ int translate_branch(ppc_inst_t *instr, const u32 *dest, const u32 *src);
 bool is_conditional_branch(ppc_inst_t instr);
 
 #define OP_RT_RA_MASK  0xUL
+#define OP_SI_MASK 0xUL
 #define LIS_R2 (PPC_RAW_LIS(_R2, 0))
 #define ADDIS_R2_R12   (PPC_RAW_ADDIS(_R2, _R12, 0))
 #define ADDI_R2_R2 (PPC_RAW_ADDI(_R2, _R2, 0))
diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
index de1018cc522b..3d6e82200cb7 100644
--- a/arch/powerpc/include/asm/static_call.h
+++ b/arch/powerpc/include/asm/static_call.h
@@ -2,12 +2,75 @@
 #ifndef _ASM_POWERPC_STATIC_CALL_H
 #define _ASM_POWERPC_STATIC_CALL_H
 
+#ifdef CONFIG_PPC64_ELF_ABI_V2
+
+#ifdef MODULE
+
+#define __PPC_SCT(name, inst)  \
+   asm(".pushsection .text, \"ax\" \n" \
+   ".align 6   \n" \
+   ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \
+   ".localentry " STATIC_CALL_TRAMP_STR(name) ", 1 \n" \
+   STATIC_CALL_TRAMP_STR(name) ":  \n" \
+   "   mflr 11  \n" \
+   "   bcl 20, 31, $+4 \n" \
+   "0: mflr 12  \n" \
+   "   mtlr 11  \n" \
+   "   addi 12, 12, (" STATIC_CALL_TRAMP_STR(name) " - 0b)  \n" \
+   "   addis 2, 12, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@ha \n" \
+   "   addi 2, 2, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@l\n" \
+   "   " inst "\n" \
+   "   ld  12, (2f - " STATIC_CALL_TRAMP_STR(name) ")(12)  \n" \
+   "   mtctr   12  \n" \
+   "   bctr\n" \
+   "1: li  3, 0\n" \
+   "   blr \n" \
+   ".balign 8  \n" \
+   "2: .8byte 0\n" \
+   ".type " STATIC_CALL_TRAMP_STR(name) ",

[PATCH v2 2/6] powerpc/module: Handle caller-saved TOC in module linker

2022-09-26 Thread Benjamin Gray

The callee may set a field in `st_other` to 1 to indicate r2 should be
treated as caller-saved. This means a trampoline must be used to save
the current TOC before calling it and restore it afterwards, much like
external calls.

This is necessary for supporting V2 ABI static calls that do not
preserve the TOC.
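To make the `st_other` encoding concrete, here is a minimal standalone sketch (not the kernel code) of decoding the three-bit local-entry field. It assumes the `STO_PPC64_LOCAL_BIT`/`STO_PPC64_LOCAL_MASK` values from the ELF headers (bit 5, mask `7 << 5`), redefined locally so it builds on any host; the helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Constants mirroring the ELF headers' PPC64 st_other encoding, redefined
 * here so the sketch compiles on any host. */
#define STO_PPC64_LOCAL_BIT  5
#define STO_PPC64_LOCAL_MASK (7 << STO_PPC64_LOCAL_BIT)

/* The 3-bit local-entry field stored in bits 5..7 of st_other. */
static unsigned int local_entry_field(unsigned char st_other)
{
	return (st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT;
}

/* Field value 1: one entry point, and r2 must be treated as caller-saved,
 * so a caller with a different TOC needs a stub that saves/restores r2. */
static bool need_r2save_stub(unsigned char st_other)
{
	return local_entry_field(st_other) == 1;
}

/* Field values 2..6: the local entry point is 2^value bytes after the
 * global one. Values 0 and 1 mean there is no separate local entry point
 * (7 is reserved and not treated specially here). */
static unsigned int local_entry_offset(unsigned char st_other)
{
	unsigned int v = local_entry_field(st_other);

	return v >= 2 ? 1u << v : 0;
}
```

Only the top three bits matter: the low bits of `st_other` (symbol visibility) are masked off before the comparison.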

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/module_64.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 7e45dc98df8a..4d816f7785b4 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -55,6 +55,12 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym)
 * of function and try to derive r2 from it). */
return PPC64_LOCAL_ENTRY_OFFSET(sym->st_other);
 }
+
+static bool need_r2save_stub(unsigned char st_other)
+{
+   return ((st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT) == 1;
+}
+
 #else
 
 static func_desc_t func_desc(unsigned long addr)
@@ -66,6 +72,11 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym)
return 0;
 }
 
+static bool need_r2save_stub(unsigned char st_other)
+{
+   return false;
+}
+
 void *dereference_module_function_descriptor(struct module *mod, void *ptr)
 {
if (ptr < (void *)mod->arch.start_opd ||
@@ -632,7 +643,8 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
case R_PPC_REL24:
/* FIXME: Handle weak symbols here --RR */
if (sym->st_shndx == SHN_UNDEF ||
-   sym->st_shndx == SHN_LIVEPATCH) {
+   sym->st_shndx == SHN_LIVEPATCH ||
+   need_r2save_stub(sym->st_other)) {
/* External: go via stub */
value = stub_for_addr(sechdrs, value, me,
strtab + sym->st_name);
-- 
2.37.3

[PATCH v2 4/6] static_call: Move static call selftest to static_call_selftest.c

2022-09-26 Thread Benjamin Gray

These tests are out-of-line only, so moving them to
their own file allows them to be run when an arch does
not implement inline static calls.

Signed-off-by: Benjamin Gray 
---
 kernel/Makefile   |  1 +
 kernel/static_call_inline.c   | 43 ---
 kernel/static_call_selftest.c | 41 +
 3 files changed, 42 insertions(+), 43 deletions(-)
 create mode 100644 kernel/static_call_selftest.c

diff --git a/kernel/Makefile b/kernel/Makefile
index 318789c728d3..8ce8beaa3cc0 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -113,6 +113,7 @@ obj-$(CONFIG_KCSAN) += kcsan/
 obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
 obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o
 obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o
+obj-$(CONFIG_STATIC_CALL_SELFTEST) += static_call_selftest.o
 obj-$(CONFIG_CFI_CLANG) += cfi.o
 
 obj-$(CONFIG_PERF_EVENTS) += events/
diff --git a/kernel/static_call_inline.c b/kernel/static_call_inline.c
index dc5665b62814..64d04d054698 100644
--- a/kernel/static_call_inline.c
+++ b/kernel/static_call_inline.c
@@ -498,46 +498,3 @@ int __init static_call_init(void)
return 0;
 }
 early_initcall(static_call_init);
-
-#ifdef CONFIG_STATIC_CALL_SELFTEST
-
-static int func_a(int x)
-{
-   return x+1;
-}
-
-static int func_b(int x)
-{
-   return x+2;
-}
-
-DEFINE_STATIC_CALL(sc_selftest, func_a);
-
-static struct static_call_data {
-  int (*func)(int);
-  int val;
-  int expect;
-} static_call_data [] __initdata = {
-  { NULL,   2, 3 },
-  { func_b, 2, 4 },
-  { func_a, 2, 3 }
-};
-
-static int __init test_static_call_init(void)
-{
-  int i;
-
-  for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) {
- struct static_call_data *scd = &static_call_data[i];
-
-  if (scd->func)
-  static_call_update(sc_selftest, scd->func);
-
-  WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect);
-  }
-
-  return 0;
-}
-early_initcall(test_static_call_init);
-
-#endif /* CONFIG_STATIC_CALL_SELFTEST */
diff --git a/kernel/static_call_selftest.c b/kernel/static_call_selftest.c
new file mode 100644
index ..246ad89f64eb
--- /dev/null
+++ b/kernel/static_call_selftest.c
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+
+static int func_a(int x)
+{
+   return x+1;
+}
+
+static int func_b(int x)
+{
+   return x+2;
+}
+
+DEFINE_STATIC_CALL(sc_selftest, func_a);
+
+static struct static_call_data {
+   int (*func)(int);
+   int val;
+   int expect;
+} static_call_data [] __initdata = {
+   { NULL,   2, 3 },
+   { func_b, 2, 4 },
+   { func_a, 2, 3 }
+};
+
+static int __init test_static_call_init(void)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) {
+   struct static_call_data *scd = &static_call_data[i];
+
+   if (scd->func)
+   static_call_update(sc_selftest, scd->func);
+
+   WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect);
+   }
+
+   return 0;
+}
+early_initcall(test_static_call_init);
-- 
2.37.3

[PATCH v2 6/6] powerpc/64: Add tests for out-of-line static calls

2022-09-26 Thread Benjamin Gray

KUnit tests for the various combinations of caller/trampoline/target and
kernel/module. They must be run from a module loaded at runtime to
guarantee they have a different TOC to the kernel.

The tests try to reduce the chance of panicking by restoring the
TOC after every static call. Not all possible errors can be caught
by this (we can't stop a trampoline from using a bad TOC itself),
but it makes certain errors easier to debug.

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/Kconfig   |  10 +
 arch/powerpc/kernel/Makefile   |   1 +
 arch/powerpc/kernel/static_call.c  |  61 ++
 arch/powerpc/kernel/static_call_test.c | 251 +
 arch/powerpc/kernel/static_call_test.h |  56 ++
 5 files changed, 379 insertions(+)
 create mode 100644 arch/powerpc/kernel/static_call_test.c
 create mode 100644 arch/powerpc/kernel/static_call_test.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e7a66635eade..0ca60514c0e2 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -1023,6 +1023,16 @@ config PPC_RTAS_FILTER
  Say Y unless you know what you are doing and the filter is causing
  problems for you.
 
+config PPC_STATIC_CALL_KUNIT_TEST
+   tristate "KUnit tests for PPC64 ELF ABI V2 static calls"
+   default KUNIT_ALL_TESTS
+   depends on HAVE_STATIC_CALL && PPC64_ELF_ABI_V2 && KUNIT && m
+   help
+ Tests that check the TOC is kept consistent across all combinations
+ of caller/trampoline/target being kernel/module. Must be built as a
+ module and loaded at runtime to ensure the module has a different
+ TOC to the kernel.
+
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index a30d0d0f5499..22c07e3d34df 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -131,6 +131,7 @@ obj-$(CONFIG_RELOCATABLE)   += reloc_$(BITS).o
 obj-$(CONFIG_PPC32)+= entry_32.o setup_32.o early_32.o
 obj-$(CONFIG_PPC64)+= dma-iommu.o iommu.o
 obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o
+obj-$(CONFIG_PPC_STATIC_CALL_KUNIT_TEST)   += static_call_test.o
 obj-$(CONFIG_KGDB) += kgdb.o
 obj-$(CONFIG_BOOTX_TEXT)   += btext.o
 obj-$(CONFIG_SMP)  += smp.o
diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
index ecbb74e1b4d3..8d338917b70e 100644
--- a/arch/powerpc/kernel/static_call.c
+++ b/arch/powerpc/kernel/static_call.c
@@ -113,3 +113,64 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
panic("%s: patching failed %pS at %pS\n", __func__, func, tramp);
 }
 EXPORT_SYMBOL_GPL(arch_static_call_transform);
+
+
+#if IS_MODULE(CONFIG_PPC_STATIC_CALL_KUNIT_TEST)
+
+#include "static_call_test.h"
+
+int ppc_sc_kernel_target_1(struct kunit* test)
+{
+   toc_fixup(test);
+   return 1;
+}
+
+int ppc_sc_kernel_target_2(struct kunit* test)
+{
+   toc_fixup(test);
+   return 2;
+}
+
+DEFINE_STATIC_CALL(ppc_sc_kernel, ppc_sc_kernel_target_1);
+
+int ppc_sc_kernel_call(struct kunit* test)
+{
+   return PROTECTED_SC(test, int, static_call(ppc_sc_kernel)(test));
+}
+
+int ppc_sc_kernel_call_indirect(struct kunit* test, int (*fn)(struct kunit*))
+{
+   return PROTECTED_SC(test, int, fn(test));
+}
+
+long ppc_sc_kernel_target_big(struct kunit* test,
+ long a,
+ long b,
+ long c,
+ long d,
+ long e,
+ long f,
+ long g,
+ long h,
+ long i)
+{
+   toc_fixup(test);
+   KUNIT_EXPECT_EQ(test, a, b);
+   KUNIT_EXPECT_EQ(test, a, c);
+   KUNIT_EXPECT_EQ(test, a, d);
+   KUNIT_EXPECT_EQ(test, a, e);
+   KUNIT_EXPECT_EQ(test, a, f);
+   KUNIT_EXPECT_EQ(test, a, g);
+   KUNIT_EXPECT_EQ(test, a, h);
+   KUNIT_EXPECT_EQ(test, a, i);
+   return ~a;
+}
+
+EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_1);
+EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_2);
+EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_big);
+EXPORT_STATIC_CALL_GPL(ppc_sc_kernel);
+EXPORT_SYMBOL_GPL(ppc_sc_kernel_call);
+EXPORT_SYMBOL_GPL(ppc_sc_kernel_call_indirect);
+
+#endif /* IS_MODULE(CONFIG_PPC_STATIC_CALL_KUNIT_TEST) */
diff --git a/arch/powerpc/kernel/static_call_test.c b/arch/powerpc/kernel/static_call_test.c
new file mode 100644
index ..2d69524d935f
--- /dev/null
+++ b/arch/powerpc/kernel/static_call_test.c
@@ -0,0 +1,251 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "static_call_test.h"
+
+#include 
+#include 
+#include 
+
+/*
+ * Tests to ensure correctness in a variety of cases for static calls.
+ *
+ * The tests focus on ensuring the TOC is kept consistent across the
+ * module-kernel boundary, as compilers can't see that a trampoline

[PATCH v2 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub

2022-09-26 Thread Benjamin Gray

Inserts a direct branch to the stub target when possible, replacing the
mtctr/bctr sequence.

The load into r12 could potentially be skipped too, but that change
would need to refactor the arguments to indicate that the address
does not have a separate local entry point.

This helps the static call implementation, where calls from a module to
its own trampolines go through this stub, and the trampoline is easily
within range of a direct branch.
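For illustration, the range check that decides whether a direct branch can replace the `mtctr`/`bctr` pair might look like the standalone sketch below. `encode_rel_branch()` is a hypothetical helper; it assumes a plain I-form `b` (primary opcode 18, AA=0, LK=0), whose signed 24-bit word offset limits the reach to roughly ±32 MiB, similar to the check `create_branch()` performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: encode an unconditional relative branch "b to"
 * placed at address `from`. The byte offset must be 4-byte aligned and
 * within [-0x2000000, 0x1fffffc]. Returns false if it cannot be encoded. */
static bool encode_rel_branch(uint64_t from, uint64_t to, uint32_t *insn)
{
	int64_t offset = (int64_t)(to - from);

	if (offset < -0x2000000 || offset > 0x1fffffc || (offset & 3))
		return false;	/* caller keeps the indirect sequence */

	/* Opcode 18 is 0x48000000; the LI field is bits 6..29 of the word. */
	*insn = 0x48000000u | ((uint32_t)offset & 0x03fffffcu);
	return true;
}
```

When the helper fails, the stub simply keeps its indirect `mtctr`/`bctr` sequence, which is the fallback the patch relies on.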

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/kernel/module_64.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c
index 4d816f7785b4..745ce9097dcf 100644
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@ -141,6 +141,12 @@ static u32 ppc64_stub_insns[] = {
PPC_RAW_BCTR(),
 };
 
+#ifdef CONFIG_PPC64_ELF_ABI_V1
+#define PPC64_STUB_MTCTR_OFFSET 5
+#else
+#define PPC64_STUB_MTCTR_OFFSET 4
+#endif
+
 /* Count how many different 24-bit relocations (different symbol,
different addend) */
 static unsigned int count_relocs(const Elf64_Rela *rela, unsigned int num)
@@ -429,6 +435,8 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
long reladdr;
func_desc_t desc;
int i;
+   u32 *jump_seq_addr = &entry->jump[PPC64_STUB_MTCTR_OFFSET];
+   ppc_inst_t direct;
 
if (is_mprofile_ftrace_call(name))
return create_ftrace_stub(entry, addr, me);
@@ -439,6 +447,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs,
return 0;
}
 
+   /* Replace indirect branch sequence with direct branch where possible */
+   if (!create_branch(&direct, jump_seq_addr, addr, 0))
+   if (patch_instruction(jump_seq_addr, direct))
+   return 0;
+
/* Stub uses address relative to r2. */
reladdr = (unsigned long)entry - my_r2(sechdrs, me);
if (reladdr > 0x7FFF || reladdr < -(0x8000L)) {
-- 
2.37.3

[PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function

2022-09-26 Thread Benjamin Gray

Adds a generic text patching mechanism for patches of 1, 2, 4, or (64-bit) 8
bytes. The patcher conditionally syncs the icache depending on whether
the content will be executed (as opposed to, e.g., read-only data).

The `patch_instruction` function is reimplemented in terms of this
more generic function. This generic implementation allows patching of
arbitrary 64-bit data, whereas the original `patch_instruction` decided
the size based on the 'instruction' opcode, so was not suitable for
arbitrary data.
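A userspace sketch of the size-dispatched store, plus a wrapper that takes the size from the value's type and rejects unsupported sizes at compile time, is below. `patch_bytes()` and `patch_value()` are hypothetical names; there is no cache maintenance, and the wrapper assumes GNU statement expressions (as kernel code uses) plus C11 `_Static_assert`. The sketch narrows the value to the target width before storing it; storing the leading bytes of an `unsigned long` instead would pick up its most significant bytes on a big-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a sized patch: store the low `size` bytes of
 * `val` at `dest`. A real text patcher must also flush dcache/icache. */
static int patch_bytes(void *dest, unsigned long val, size_t size)
{
	switch (size) {
	case 1: { uint8_t v = (uint8_t)val; memcpy(dest, &v, 1); break; }
	case 2: { uint16_t v = (uint16_t)val; memcpy(dest, &v, 2); break; }
	case 4: { uint32_t v = (uint32_t)val; memcpy(dest, &v, 4); break; }
	case 8: { uint64_t v = (uint64_t)val; memcpy(dest, &v, 8); break; }
	default: return -1;	/* unsupported size */
	}
	return 0;
}

/* Size taken from the value's type; anything other than 1, 2, 4 or 8
 * bytes fails to compile. sizeof does not evaluate `val`. */
#define patch_value(dest, val) ({					\
	_Static_assert(sizeof(val) == 1 || sizeof(val) == 2 ||		\
		       sizeof(val) == 4 || sizeof(val) == 8,		\
		       "unsupported patch size");			\
	patch_bytes((dest), (unsigned long)(val), sizeof(val));		\
})
```

Passing a `(uint8_t)` value writes exactly one byte, while a `(uint32_t)` value writes four, without the caller spelling out the size.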

Signed-off-by: Benjamin Gray 
---
 arch/powerpc/include/asm/code-patching.h |  7 ++
 arch/powerpc/lib/code-patching.c | 90 +---
 2 files changed, 71 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index 1c6316ec4b74..15efd8ab22da 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -76,6 +76,13 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr,
 int patch_branch(u32 *addr, unsigned long target, int flags);
 int patch_instruction(u32 *addr, ppc_inst_t instr);
 int raw_patch_instruction(u32 *addr, ppc_inst_t instr);
+int __patch_memory(void *dest, unsigned long src, size_t size);
+
+#define patch_memory(addr, val) \
+({ \
+   BUILD_BUG_ON(!__native_word(val)); \
+   __patch_memory(addr, (unsigned long) val, sizeof(val)); \
+})

 static inline unsigned long patch_site_addr(s32 *site)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index ad0cf3108dd0..9979380d55ef 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -15,20 +15,47 @@
 #include 
 #include 

-static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr)
+static int __always_inline ___patch_memory(void *patch_addr,
+  unsigned long data,
+  void *prog_addr,
+  size_t size)
 {
-   if (!ppc_inst_prefixed(instr)) {
-   u32 val = ppc_inst_val(instr);
+   switch (size) {
+   case 1:
+   __put_kernel_nofault(patch_addr, &data, u8, failed);
+   break;
+   case 2:
+   __put_kernel_nofault(patch_addr, &data, u16, failed);
+   break;
+   case 4:
+   __put_kernel_nofault(patch_addr, &data, u32, failed);
+   break;
+#ifdef CONFIG_PPC64
+   case 8:
+   __put_kernel_nofault(patch_addr, &data, u64, failed);
+   break;
+#endif
+   default:
+   unreachable();
+   }

-   __put_kernel_nofault(patch_addr, &val, u32, failed);
-   } else {
-   u64 val = ppc_inst_as_ulong(instr);
+   dcbst(patch_addr);
+   dcbst(patch_addr + size - 1); /* Last byte of data may cross a cacheline */

-   __put_kernel_nofault(patch_addr, &val, u64, failed);
-   }
+   mb(); /* sync */
+
+   /* Flush on the EA that may be executed in case of a non-coherent icache */
+   icbi(prog_addr);
+
+   /* Also flush the last byte of the instruction if it may be a
+* prefixed instruction and we aren't assuming minimum 64-byte
+* cacheline sizes
+*/
+   if (IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64)
+   icbi(prog_addr + size - 1);

-   asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
-   "r" (exec_addr));
+   mb(); /* sync */
+   isync();

return 0;

@@ -38,7 +65,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr

 int raw_patch_instruction(u32 *addr, ppc_inst_t instr)
 {
-   return __patch_instruction(addr, instr, addr);
+   if (ppc_inst_prefixed(instr))
+   return ___patch_memory(addr, ppc_inst_as_ulong(instr), addr, sizeof(u64));
+   else
+   return ___patch_memory(addr, ppc_inst_val(instr), addr, sizeof(u32));
 }

 #ifdef CONFIG_STRICT_KERNEL_RWX
@@ -147,24 +177,22 @@ static void unmap_patch_area(unsigned long addr)
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 }

-static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
+static int __always_inline __do_patch_memory(void *dest, unsigned long src, size_t size)
 {
int err;
u32 *patch_addr;
-   unsigned long text_poke_addr;
pte_t *pte;
-   unsigned long pfn = get_patch_pfn(addr);
-
-   text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr & PAGE_MASK;
-   patch_addr = (u32 *)(text_poke_addr + offset_in_page(addr));
+   unsigned long text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr & PAGE_MASK;
+   unsigned long pfn = get_patch_pfn(dest);

+   patch_addr = (u32 *)(text_poke_addr + offset_in_page(dest));
pte = virt_to_kpte(text_poke_addr);

[PATCH v2 0/6] Out-of-line static calls for powerpc64 ELF V2

2022-09-26 Thread Benjamin Gray

Implementation of out-of-line static calls for PowerPC 64-bit ELF V2 ABI.
Static calls patch an indirect branch into a direct branch at runtime.
Out-of-line specifically has a caller directly call a trampoline, and
the trampoline gets patched to directly call the target.
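To illustrate the caller/trampoline/target split, here is a userspace model, not the kernel mechanism: the direct branch that the kernel patches into the trampoline is represented by a stored function pointer, so no code is rewritten. `struct sc_tramp`, `sc_call()` and `sc_update()` are hypothetical names; `func_a`/`func_b` mirror the targets used by the generic static call selftest.

```c
#include <assert.h>

typedef int (*sc_fn)(int);

/* Model of a trampoline: `target` stands in for the patched direct branch. */
struct sc_tramp {
	sc_fn target;
};

/* Every call site goes through the trampoline... */
static int sc_call(const struct sc_tramp *t, int arg)
{
	return t->target(arg);
}

/* ...and an update retargets the trampoline only, never the call sites. */
static void sc_update(struct sc_tramp *t, sc_fn fn)
{
	t->target = fn;
}

static int func_a(int x) { return x + 1; }
static int func_b(int x) { return x + 2; }
```

Starting from `func_a`, a call with 2 returns 3; after `sc_update(&t, func_b)` the same call site returns 4.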

Previous version here:
https://lore.kernel.org/all/20220916062330.430468-1-bg...@linux.ibm.com/

I couldn't see a dedicated ftrace benchmark in the kernel, but my own
benchmarking showed no significant impact on ftrace activation.

The __patch_memory function is meant to be accessed through the size-checking
patch_memory wrapper. I don't think there's a way to expose the macro without
also exposing __patch_memory though. I considered making the type an explicit
macro param, but using the value type seemed more ergonomic.

V2:
Mostly accounting for feedback from Christophe:
* Code patching rewritten
- Rename to *_memory
- Use __always_inline to get the compiler to realise it can
  collapse all the sub-functions
- Pass data directly instead of through a pointer, eliding a redundant load
- Flush the last byte of data too (technically redundant for an instruction,
  but it saves a conditional branch, and the isync will be the bottleneck).
- Handle a non-coherent icache, assume a coherent dcache
- Handle not assuming 64-byte cachelines on 64-bit
- Flatten the poke address init and teardown
- Check the data size in patch_memory at build time
  (inline function was suggested, but a macro makes checking
   based on the data type easier).
- It now builds on 32-bit and without strict RWX
* Static call enabling is no longer configurable
* Refactored arch_static_call_transform to minimise casting
* Made the KUnit tests more robust (previously they changed non-volatile
  registers in the init hook, but that's incorrect because it returns to
  the KUnit framework before the test case is called).
* Some other minor refactoring in other patches
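Why flushing both the first and the last byte is enough can be checked with a little arithmetic: a write of `size` bytes touches the lines containing `addr` and `addr + size - 1` and everything in between, so when `size` is no larger than the line size it touches at most two lines. A minimal sketch, with `lines_touched()` as a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of cache lines of `line` bytes (a power of
 * two) touched by a write of `size` bytes starting at `addr`. */
static unsigned int lines_touched(uintptr_t addr, size_t size, size_t line)
{
	assert(size > 0 && line && (line & (line - 1)) == 0);

	uintptr_t first = addr & ~(uintptr_t)(line - 1);            /* line of first byte */
	uintptr_t last = (addr + size - 1) & ~(uintptr_t)(line - 1); /* line of last byte */

	return (unsigned int)((last - first) / line) + 1;
}
```

For example, an 8-byte write at 0x3c with 64-byte lines spans two lines, so a single flush at `addr` would miss the second one.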


Benjamin Gray (6):
  powerpc/code-patching: Implement generic text patching function
  powerpc/module: Handle caller-saved TOC in module linker
  powerpc/module: Optimise nearby branches in ELF V2 ABI stub
  static_call: Move static call selftest to static_call_selftest.c
  powerpc/64: Add support for out-of-line static calls
  powerpc/64: Add tests for out-of-line static calls

 arch/powerpc/Kconfig |  12 +-
 arch/powerpc/include/asm/code-patching.h |   8 +
 arch/powerpc/include/asm/static_call.h   |  80 +++-
 arch/powerpc/kernel/Makefile |   4 +-
 arch/powerpc/kernel/module_64.c  |  27 ++-
 arch/powerpc/kernel/static_call.c| 151 +-
 arch/powerpc/kernel/static_call_test.c   | 251 +++
 arch/powerpc/kernel/static_call_test.h   |  56 +
 arch/powerpc/lib/code-patching.c |  90 +---
 kernel/Makefile  |   1 +
 kernel/static_call_inline.c  |  43 
 kernel/static_call_selftest.c|  41 
 12 files changed, 682 insertions(+), 82 deletions(-)
 create mode 100644 arch/powerpc/kernel/static_call_test.c
 create mode 100644 arch/powerpc/kernel/static_call_test.h
 create mode 100644 kernel/static_call_selftest.c


base-commit: 3d7a198cfdb47405cfb4a3ea523876569fe341e6
--
2.37.3

Re: Is PPC 44x PIKA Warp board still relevant?

2022-09-26 Thread Christophe Leroy

Hi Dmitry

Le 25/09/2022 à 07:06, Dmitry Torokhov a écrit :
> Hi Michael, Nick,
> 
> I was wondering if the PIKA Warp board is still relevant. The reason for my
> question is that I am interested in dropping legacy gpio APIs,
> especially OF-specific ones, in favor of newer gpiod APIs, and
> arch/powerpc/platforms/44x/warp.c is one of few users of it.

As far as I can see, that board is still being sold, see

https://www.voipon.co.uk/pika-warp-asterisk-appliance-p-932.html


> 
> The code in question is supposed to turn off green led and flash red led
> in case of overheating, and is doing so by directly accessing GPIOs
> owned by led-gpio driver without requesting/allocating them. This is not
> really supported with gpiod API, and is not a good practice in general.

As far as I can see, it was ported to led-gpio by

ba703e1a7a0b powerpc/4xx: Have Warp take advantage of GPIO LEDs 
default-state = keep
805e324b7fbd powerpc: Update Warp to use leds-gpio driver

> Before I spend much time trying to implement a replacement without
> access to the hardware, I wonder if this board is in use at all, and if
> it is, how important is the feature of flashing the red LED on critical
> temperature shutdown?
> 

I don't know who could tell.

Maybe let's do a more standard implementation and see if anybody
screams?

Christophe

[PATCH v3] powerpc/pseries/mce: Avoid instrumentation in realmode

2022-09-26 Thread Ganesh Goudar

Part of machine check error handling is done in realmode.
As of now, instrumentation is not possible for any code that
runs in realmode.
When an MCE is injected on a KASAN-enabled kernel, a crash is
observed. Hence, force inline or mark noinstr the functions
which can run in realmode, to avoid KASAN
instrumentation.

Signed-off-by: Ganesh Goudar 
---
v2: Force inline a few more functions.

v3: Add noinstr to a few functions instead of __always_inline.
---
 arch/powerpc/include/asm/hw_irq.h| 8 
 arch/powerpc/include/asm/interrupt.h | 2 +-
 arch/powerpc/include/asm/rtas.h  | 4 ++--
 arch/powerpc/kernel/rtas.c   | 4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 983551859891..c4d542b4a623 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -111,7 +111,7 @@ static inline void __hard_RI_enable(void)
 #ifdef CONFIG_PPC64
 #include 
 
-static inline notrace unsigned long irq_soft_mask_return(void)
+noinstr static unsigned long irq_soft_mask_return(void)
 {
unsigned long flags;
 
@@ -128,7 +128,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
  * for the critical section and as a clobber because
  * we changed paca->irq_soft_mask
  */
-static inline notrace void irq_soft_mask_set(unsigned long mask)
+noinstr static void irq_soft_mask_set(unsigned long mask)
 {
/*
 * The irq mask must always include the STD bit if any are set.
@@ -155,7 +155,7 @@ static inline notrace void irq_soft_mask_set(unsigned long mask)
: "memory");
 }
 
-static inline notrace unsigned long irq_soft_mask_set_return(unsigned long mask)
+noinstr static unsigned long irq_soft_mask_set_return(unsigned long mask)
 {
unsigned long flags;
 
@@ -191,7 +191,7 @@ static inline notrace unsigned long irq_soft_mask_or_return(unsigned long mask)
return flags;
 }
 
-static inline unsigned long arch_local_save_flags(void)
+static __always_inline unsigned long arch_local_save_flags(void)
 {
return irq_soft_mask_return();
 }
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 8069dbc4b8d1..090895051712 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -92,7 +92,7 @@ static inline bool is_implicit_soft_masked(struct pt_regs *regs)
return search_kernel_soft_mask_table(regs->nip);
 }
 
-static inline void srr_regs_clobbered(void)
+static __always_inline void srr_regs_clobbered(void)
 {
local_paca->srr_valid = 0;
local_paca->hsrr_valid = 0;
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 00531af17ce0..52d29d664fdf 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -201,13 +201,13 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
 #define PSERIES_ELOG_SECT_ID_MCE   (('M' << 8) | 'C')
 
 static
-inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
+__always_inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
 {
return be16_to_cpu(sect->id);
 }
 
 static
-inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
+__always_inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
 {
return be16_to_cpu(sect->length);
 }
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..f9d78245c0e8 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -48,7 +48,7 @@
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);
 
-static inline void do_enter_rtas(unsigned long args)
+static __always_inline void do_enter_rtas(unsigned long args)
 {
unsigned long msr;
 
@@ -435,7 +435,7 @@ static char *__fetch_rtas_last_error(char *altbuf)
 #endif
 
 
-static void
+noinstr static void
 va_rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
  va_list list)
 {
-- 
2.37.1

Re: [PATCH] powerpc/microwatt: Remove unused early debug code

2022-09-26 Thread Joel Stanley

On Mon, 19 Sept 2022 at 05:28, Michael Ellerman  wrote:
>
> The original microwatt submission[1] included some early debug code for
> using the Microwatt "potato" UART.

The potato is indeed dead.

>
> The series that was eventually merged switched to using a standard UART,
> and so doesn't need any special early debug handling. But some of the
> original code was merged accidentally under the non-existent
> CONFIG_PPC_EARLY_DEBUG_MICROWATT.

The kconfig never got added, so you're right. Using the "legacy serial
console" must be how we get early console on microwatt? I can't quite
work it out.

May or may not be related to https://github.com/linuxppc/issues/issues/413

>
> Drop the unused code.
>
> 1: 
> https://lore.kernel.org/linuxppc-dev/20200509050340.gd1464...@thinks.paulus.ozlabs.org/
>
> Fixes: 48b545b8018d ("powerpc/microwatt: Use standard 16550 UART for console")
> Reported-by: Lukas Bulwahn 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/kernel/udbg_16550.c | 39 
>  1 file changed, 39 deletions(-)
>
> diff --git a/arch/powerpc/kernel/udbg_16550.c b/arch/powerpc/kernel/udbg_16550.c
> index d3942de254c6..ddfbc74bf85f 100644
> --- a/arch/powerpc/kernel/udbg_16550.c
> +++ b/arch/powerpc/kernel/udbg_16550.c
> @@ -296,42 +296,3 @@ void __init udbg_init_40x_realmode(void)
>  }
>
>  #endif /* CONFIG_PPC_EARLY_DEBUG_40x */
> -
> -#ifdef CONFIG_PPC_EARLY_DEBUG_MICROWATT
> -
> -#define UDBG_UART_MW_ADDR  ((void __iomem *)0xc0002000)
> -
> -static u8 udbg_uart_in_isa300_rm(unsigned int reg)
> -{
> -   uint64_t msr = mfmsr();
> -   uint8_t  c;
> -
> -   mtmsr(msr & ~(MSR_EE|MSR_DR));
> -   isync();
> -   eieio();
> -   c = __raw_rm_readb(UDBG_UART_MW_ADDR + (reg << 2));
> -   mtmsr(msr);
> -   isync();
> -   return c;
> -}
> -
> -static void udbg_uart_out_isa300_rm(unsigned int reg, u8 val)
> -{
> -   uint64_t msr = mfmsr();
> -
> -   mtmsr(msr & ~(MSR_EE|MSR_DR));
> -   isync();
> -   eieio();
> -   __raw_rm_writeb(val, UDBG_UART_MW_ADDR + (reg << 2));
> -   mtmsr(msr);
> -   isync();
> -}
> -
> -void __init udbg_init_debug_microwatt(void)
> -{
> -   udbg_uart_in = udbg_uart_in_isa300_rm;
> -   udbg_uart_out = udbg_uart_out_isa300_rm;
> -   udbg_use_uart();
> -}
> -
> -#endif /* CONFIG_PPC_EARLY_DEBUG_MICROWATT */
> --
> 2.37.2
>

[PATCH 7/7] hmm-tests: Add test for migrate_device_range()

2022-09-26 Thread Alistair Popple

Signed-off-by: Alistair Popple 
---
 lib/test_hmm.c | 119 +-
 lib/test_hmm_uapi.h|   1 +-
 tools/testing/selftests/vm/hmm-tests.c |  49 +++-
 3 files changed, 148 insertions(+), 21 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 2bd3a67..d2821dd 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -100,6 +100,7 @@ struct dmirror {
 struct dmirror_chunk {
struct dev_pagemap  pagemap;
struct dmirror_device   *mdevice;
+   bool remove;
 };
 
 /*
@@ -192,11 +193,15 @@ static int dmirror_fops_release(struct inode *inode, struct file *filp)
return 0;
 }
 
+static struct dmirror_chunk *dmirror_page_to_chunk(struct page *page)
+{
+   return container_of(page->pgmap, struct dmirror_chunk, pagemap);
+}
+
 static struct dmirror_device *dmirror_page_to_device(struct page *page)
 
 {
-   return container_of(page->pgmap, struct dmirror_chunk,
-   pagemap)->mdevice;
+   return dmirror_page_to_chunk(page)->mdevice;
 }
 
 static int dmirror_do_fault(struct dmirror *dmirror, struct hmm_range *range)
@@ -1219,6 +1224,84 @@ static int dmirror_snapshot(struct dmirror *dmirror,
return ret;
 }
 
+static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk)
+{
+   unsigned long start_pfn = chunk->pagemap.range.start >> PAGE_SHIFT;
+   unsigned long end_pfn = chunk->pagemap.range.end >> PAGE_SHIFT;
+   unsigned long npages = end_pfn - start_pfn + 1;
+   unsigned long i;
+   unsigned long *src_pfns;
+   unsigned long *dst_pfns;
+
+   src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+   dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+
+   migrate_device_range(src_pfns, start_pfn, npages);
+   for (i = 0; i < npages; i++) {
+   struct page *dpage, *spage;
+
+   spage = migrate_pfn_to_page(src_pfns[i]);
+   if (!spage || !(src_pfns[i] & MIGRATE_PFN_MIGRATE))
+   continue;
+
+   if (WARN_ON(!is_device_private_page(spage) &&
+   !is_device_coherent_page(spage)))
+   continue;
+   spage = BACKING_PAGE(spage);
+   dpage = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_NOFAIL);
+   lock_page(dpage);
+   copy_highpage(dpage, spage);
+   dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
+   if (src_pfns[i] & MIGRATE_PFN_WRITE)
+   dst_pfns[i] |= MIGRATE_PFN_WRITE;
+   }
+   migrate_device_pages(src_pfns, dst_pfns, npages);
+   migrate_device_finalize(src_pfns, dst_pfns, npages);
+   kfree(src_pfns);
+   kfree(dst_pfns);
+}
+
+/* Removes free pages from the free list so they can't be re-allocated */
+static void dmirror_remove_free_pages(struct dmirror_chunk *devmem)
+{
+   struct dmirror_device *mdevice = devmem->mdevice;
+   struct page *page;
+
+   for (page = mdevice->free_pages; page; page = page->zone_device_data)
+   if (dmirror_page_to_chunk(page) == devmem)
+   mdevice->free_pages = page->zone_device_data;
+}
+
+static void dmirror_device_remove_chunks(struct dmirror_device *mdevice)
+{
+   unsigned int i;
+
+   mutex_lock(&mdevice->devmem_lock);
+   if (mdevice->devmem_chunks) {
+   for (i = 0; i < mdevice->devmem_count; i++) {
+   struct dmirror_chunk *devmem =
+   mdevice->devmem_chunks[i];
+
+   spin_lock(&mdevice->lock);
+   devmem->remove = true;
+   dmirror_remove_free_pages(devmem);
+   spin_unlock(&mdevice->lock);
+
+   dmirror_device_evict_chunk(devmem);
+   memunmap_pages(&devmem->pagemap);
+   if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
+   release_mem_region(devmem->pagemap.range.start,
+  range_len(&devmem->pagemap.range));
+   kfree(devmem);
+   }
+   mdevice->devmem_count = 0;
+   mdevice->devmem_capacity = 0;
+   mdevice->free_pages = NULL;
+   kfree(mdevice->devmem_chunks);
+   }
+   mutex_unlock(&mdevice->devmem_lock);
+}
+
 static long dmirror_fops_unlocked_ioctl(struct file *filp,
unsigned int command,
unsigned long arg)
@@ -1273,6 +1356,11 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
ret = dmirror_snapshot(dmirror, &cmd);
break;
 
+   case HMM_DMIRROR_RELEASE:
+   dmirror_device_remove_chunks(dmirror->mdevice);
+   ret = 0;
+   break;
+
default:
return -EINVAL;
}
@@ -1327,9 +1415,13 @@ static void

[PATCH 6/7] nouveau/dmem: Evict device private memory during release

2022-09-26 Thread Alistair Popple

When the module is unloaded or a GPU is unbound from the module, it is
possible for device private pages to be left mapped in currently running
processes. This leads to a kernel crash when the pages are either freed
or accessed from the CPU because the GPU and associated data structures
and callbacks have all been freed.

Fix this by migrating any mappings back to normal CPU memory prior to
freeing the GPU memory chunks and associated device private pages.
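The eviction idea can be modelled in userspace as below; this is only a sketch of the concept, not the `migrate_device_*()` API. `struct dev_page`, `PAGE_BYTES` and `evict_chunk()` are hypothetical: every page still in use in a chunk is copied back to freshly allocated system memory before the chunk goes away, and `abort()` stands in for the `__GFP_NOFAIL` policy of not tolerating allocation failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 16	/* tiny "pages" for the model */

/* A slot in the device chunk: unused, or a page whose only copy is here. */
struct dev_page {
	bool in_use;
	char data[PAGE_BYTES];
};

/* Copy every in-use device page back to a newly allocated system page.
 * Returns an array of system pages (NULL where the slot was unused), or
 * NULL if the array itself cannot be allocated. */
static char **evict_chunk(struct dev_page *chunk, size_t npages)
{
	char **dst = calloc(npages, sizeof(*dst));

	if (!dst)
		return NULL;
	for (size_t i = 0; i < npages; i++) {
		if (!chunk[i].in_use)
			continue;		/* nothing mapped: skip */
		dst[i] = malloc(PAGE_BYTES);
		if (!dst[i])
			abort();		/* stand-in for __GFP_NOFAIL */
		memcpy(dst[i], chunk[i].data, PAGE_BYTES);
		chunk[i].in_use = false;	/* device copy no longer used */
	}
	return dst;
}
```

After eviction nothing refers to the chunk any more, so it can be freed without leaving dangling mappings, which is the property the patch is after.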

Signed-off-by: Alistair Popple 

---

I assume the AMD driver might have a similar issue. However I can't see
where device private (or coherent) pages actually get unmapped/freed
during teardown as I couldn't find any relevant calls to
devm_memunmap(), memunmap(), devm_release_mem_region() or
release_mem_region(). So it appears that ZONE_DEVICE pages are not being
properly freed during module unload, unless I'm missing something?
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
 1 file changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 66ebbd4..3b247b8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
mutex_unlock(&drm->dmem->mutex);
 }
 
+/*
+ * Evict all pages mapping a chunk.
+ */
+static void
+nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
+{
+   unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
+   unsigned long *src_pfns, *dst_pfns;
+   dma_addr_t *dma_addrs;
+   struct nouveau_fence *fence;
+
+   src_pfns = kvcalloc(npages, sizeof(*src_pfns), GFP_KERNEL | __GFP_NOFAIL);
+   dst_pfns = kvcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL | __GFP_NOFAIL);
+   dma_addrs = kvcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL | __GFP_NOFAIL);
+
+   migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
+   npages);
+
+   for (i = 0; i < npages; i++) {
+   if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
+   struct page *dpage;
+
+   /*
+* __GFP_NOFAIL because the GPU is going away and there
+* is nothing sensible we can do if we can't copy the
+* data back.
+*/
+   dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
+   dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
+   nouveau_dmem_copy_one(chunk->drm,
+   migrate_pfn_to_page(src_pfns[i]), dpage,
+   &dma_addrs[i]);
+   }
+   }
+
+   nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence);
+   migrate_device_pages(src_pfns, dst_pfns, npages);
+   nouveau_dmem_fence_done(&fence);
+   migrate_device_finalize(src_pfns, dst_pfns, npages);
+   kvfree(src_pfns);
+   kvfree(dst_pfns);
+   for (i = 0; i < npages; i++)
+   dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE, DMA_BIDIRECTIONAL);
+   kvfree(dma_addrs);
+}
+
 void
 nouveau_dmem_fini(struct nouveau_drm *drm)
 {
@@ -380,8 +426,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm)
mutex_lock(&drm->dmem->mutex);
 
list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list) {
+   nouveau_dmem_evict_chunk(chunk);
nouveau_bo_unpin(chunk->bo);
nouveau_bo_ref(NULL, &chunk->bo);
+   WARN_ON(chunk->callocated);
list_del(&chunk->list);
memunmap_pages(&chunk->pagemap);
release_mem_region(chunk->pagemap.range.start,
-- 
git-series 0.9.1

[PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()

2022-09-26 Thread Alistair Popple

nouveau_dmem_fault_copy_one() is used during handling of CPU faults via
the migrate_to_ram() callback and is used to copy data from GPU to CPU
memory. It is currently specific to fault handling; however, a future
patch implementing eviction of data during teardown needs similar
functionality.

Refactor out the core functionality so that it is not specific to fault
handling.
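The split can be modelled in a few lines of user-space C. This is a hypothetical sketch only: the `toy_*` functions and `TOY_FAULT_*` codes are invented stand-ins for `nouveau_dmem_copy_one()`, `nouveau_dmem_migrate_to_ram()` and `vm_fault_t`. The generic helper reports a plain negative errno, and only the fault-handling caller translates it into a SIGBUS-style fault code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical fault result codes standing in for vm_fault_t values. */
#define TOY_FAULT_OK     0
#define TOY_FAULT_SIGBUS 1

/* Hypothetical copy engine: fails when there is no source page. */
static int toy_copy_engine(char *dst, const char *src, size_t len)
{
	if (!src)
		return -1;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Generic helper in the spirit of nouveau_dmem_copy_one(): it copies one
 * page and reports failure as a negative errno, with no knowledge of
 * faults, VMAs or SVMM locking.
 */
static int toy_copy_one(const char *spage, char *dpage, size_t len)
{
	if (toy_copy_engine(dpage, spage, len))
		return -EIO;
	return 0;
}

/* Fault path: translate the errno into a fault code for the CPU fault. */
static int toy_fault_copy(const char *spage, char *dpage, size_t len)
{
	return toy_copy_one(spage, dpage, len) ? TOY_FAULT_SIGBUS : TOY_FAULT_OK;
}

/* Teardown path: no fault to fail, so the errno is passed straight back. */
static int toy_evict_copy(const char *spage, char *dpage, size_t len)
{
	return toy_copy_one(spage, dpage, len);
}
```

Keeping the fault-specific translation (and, in the driver, the SVMM locking and invalidation) in the caller is what allows the teardown path to reuse the copy helper.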

Signed-off-by: Alistair Popple 
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 59 +--
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index f9234ed..66ebbd4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -139,44 +139,25 @@ static void nouveau_dmem_fence_done(struct nouveau_fence **fence)
}
 }
 
-static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
-   struct vm_fault *vmf, struct migrate_vma *args,
-   dma_addr_t *dma_addr)
+static int nouveau_dmem_copy_one(struct nouveau_drm *drm, struct page *spage,
+   struct page *dpage, dma_addr_t *dma_addr)
 {
struct device *dev = drm->dev->dev;
-   struct page *dpage, *spage;
-   struct nouveau_svmm *svmm;
-
-   spage = migrate_pfn_to_page(args->src[0]);
-   if (!spage || !(args->src[0] & MIGRATE_PFN_MIGRATE))
-   return 0;
 
-   dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
-   if (!dpage)
-   return VM_FAULT_SIGBUS;
lock_page(dpage);
 
*dma_addr = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
if (dma_mapping_error(dev, *dma_addr))
-   goto error_free_page;
+   return -EIO;
 
-   svmm = spage->zone_device_data;
-   mutex_lock(&svmm->mutex);
-   nouveau_svmm_invalidate(svmm, args->start, args->end);
if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr,
-   NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage)))
-   goto error_dma_unmap;
-   mutex_unlock(&svmm->mutex);
+NOUVEAU_APER_VRAM,
+nouveau_dmem_page_addr(spage))) {
+   dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+   return -EIO;
+   }
 
-   args->dst[0] = migrate_pfn(page_to_pfn(dpage));
return 0;
-
-error_dma_unmap:
-   mutex_unlock(&svmm->mutex);
-   dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
-error_free_page:
-   __free_page(dpage);
-   return VM_FAULT_SIGBUS;
 }
 
 static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
@@ -184,9 +165,11 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
struct nouveau_drm *drm = page_to_drm(vmf->page);
struct nouveau_dmem *dmem = drm->dmem;
struct nouveau_fence *fence;
+   struct nouveau_svmm *svmm;
+   struct page *spage, *dpage;
unsigned long src = 0, dst = 0;
dma_addr_t dma_addr = 0;
-   vm_fault_t ret;
+   vm_fault_t ret = 0;
struct migrate_vma args = {
.vma= vmf->vma,
.start  = vmf->address,
@@ -207,9 +190,25 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
if (!args.cpages)
return 0;
 
-   ret = nouveau_dmem_fault_copy_one(drm, vmf, &args, &dma_addr);
-   if (ret || dst == 0)
+   spage = migrate_pfn_to_page(src);
+   if (!spage || !(src & MIGRATE_PFN_MIGRATE))
+   goto done;
+
+   dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
+   if (!dpage)
+   goto done;
+
+   dst = migrate_pfn(page_to_pfn(dpage));
+
+   svmm = spage->zone_device_data;
+   mutex_lock(&svmm->mutex);
+   nouveau_svmm_invalidate(svmm, args.start, args.end);
+   ret = nouveau_dmem_copy_one(drm, spage, dpage, &dma_addr);
+   mutex_unlock(&svmm->mutex);
+   if (ret) {
+   ret = VM_FAULT_SIGBUS;
goto done;
+   }
 
nouveau_fence_new(dmem->migrate.chan, false, &fence);
migrate_vma_pages(&args);
-- 
git-series 0.9.1
