Re: [PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function
On 27/09/2022 at 04:57, Benjamin Gray wrote:
> On Mon, 2022-09-26 at 14:33 +, Christophe Leroy wrote:
>>> +#define patch_memory(addr, val) \
>>> +({ \
>>> +    BUILD_BUG_ON(!__native_word(val)); \
>>> +    __patch_memory(addr, (unsigned long) val, sizeof(val)); \
>>> +})
>>
>> Can you do a static __always_inline function instead of a macro here?
>
> I didn't before because it doesn't allow using the type as a parameter.
> I considered these forms
>
>     patch_memory(addr, val, 8);
>     patch_memory(addr, val, void*);
>     patch_memory(addr, val); // size taken from val type
>
> And thought the third was the nicest to use. Though coming back to
> this, I hadn't considered
>
>     patch_memory(addr, val, sizeof(void*))
>
> which would still allow a type to decide the size, and not be a macro.
> I've got an example implementation further down that also addresses the
> size check issue.

Oh, I missed that you did automatic type sizing. Fair enough.

However, I think taking the type of the passed value is dangerous. See put_user(): it uses the size of the destination pointer, not the size of the input value.

patch_memory() doesn't seem to be used outside of code-patching.c, so I don't think it is worth worrying about a nice-looking API. Just make it simple and pass the size to the function.

>
>>> +static int __always_inline ___patch_memory(void *patch_addr,
>>> +                                           unsigned long data,
>>> +                                           void *prog_addr,
>>> +                                           size_t size)
>>
>> Is it really needed in the .c file? I would expect GCC to take the
>> right decision by itself.
>
> I thought it'd be better to always inline it given it's only used
> generically in do_patch_memory and __do_patch_memory, which both get
> inlined into __patch_memory. But it does end up generating two copies
> due to the different contexts it's called in, so probably not worth it.
> Removed for v3.
>
> (raw_patch_instruction gets an optimised inline of ___patch_memory
> either way)
>
>> A BUILD_BUG() would be better here I think.
>
> BUILD_BUG() as the default case always triggers though, I assume
> because the constant used for size is too far away. How about
>
>     static __always_inline int patch_memory(void *addr,
>                                             unsigned long val,
>                                             size_t size)
>     {
>             int __patch_memory(void *dest, unsigned long src, size_t size);
>
>             BUILD_BUG_ON_MSG(!(size == sizeof(char) ||
>                                size == sizeof(short) ||
>                                size == sizeof(int) ||
>                                size == sizeof(long)),
>                              "Unsupported size for patch_memory");
>             return __patch_memory(addr, val, size);
>     }
>
> Declaring the __patch_memory function inside of patch_memory enforces
> that you can't accidentally call __patch_memory without going through
> this or the *patch_instruction entry points (which hardcode the size).

Aren't you making it more difficult than needed? This is C, not C++, and we are not trying to help the user. All kernel developers know that as soon as they use a function with a leading double underscore they are on their own.

And again, patch_memory() isn't used anywhere else, at least for the time being, so why worry about that?

>
>>> + }
>>>
>>> - __put_kernel_nofault(patch_addr, , u32, failed);
>>> - } else {
>>> - u64 val = ppc_inst_as_ulong(instr);
>>> + dcbst(patch_addr);
>>> + dcbst(patch_addr + size - 1); /* Last byte of data may cross a cacheline */
>>
>> Or the second byte of data may cross a cacheline ...
>
> It might, but unless we are assuming data cachelines smaller than the
> native word size it will either be in the first or last byte's
> cacheline. Whereas the last byte might be in its own cacheline.
>
> As justification the comment's misleading though, how about reducing it
> to "data may cross a cacheline" and leaving the reason for flushing the
> last byte implicit?

Yes, that was my worry: a misleading comment. I think "data may cross a cacheline" is what we need as a comment.
> >>> -static int __do_patch_instruction(u32 *addr, ppc_inst_t instr) >>> +static int __always_inline __do_patch_memory(void *dest, unsigned >>> long src, size_t size) >>> { >> >> Whaou, do we really want all this to be __always_inline ? Did you >> check >> the text size increase ? > > These ones are redundant because GCC will already inline them, they > were just part of experimenting inlining ___patch_memory. Will remove > for v3. > > The text size doesn't increase though because the call hierarchy is > just a linear chain of > __patch_memory -> do_patch_memory -> __do_patch_memory Yes, I had in mind that all those would be inlined doing to all callers of patch_instruction() and
Re: [PATCH v2 5/6] powerpc/64: Add support for out-of-line static calls
On Mon, 2022-09-26 at 13:16 +, Christophe Leroy wrote:
> Build failure with GCC 5.5 (ppc64le_defconfig):
>
>   CC      arch/powerpc/kernel/ptrace/ptrace.o
> {standard input}: Assembler messages:
> {standard input}:10: Error: .localentry expression for `__SCT__tp_func_sys_enter' is not a valid power of 2
> {standard input}:29: Error: .localentry expression for `__SCT__tp_func_sys_exit' is not a valid power of 2

It looks like support for a literal st_other value in `.localentry` was added in binutils 2.32. I'll change the config entry as follows for v3:

    select HAVE_STATIC_CALL if PPC32 || (PPC64_ELF_ABI_V2 && LD_VERSION >= 23200)
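For readers checking the number: 23200 matches binutils 2.32 if LD_VERSION is encoded as major * 10000 + minor * 100 + patchlevel, which is my understanding of the 5-digit form Kconfig uses (treat that encoding as an assumption). A minimal user-space sketch of the comparison; the helper names are made up for illustration and are not kernel functions:

```c
#include <assert.h>

/*
 * Hypothetical helper: encode a binutils version as
 * major * 10000 + minor * 100 + patchlevel (assumed LD_VERSION form).
 */
static int ld_version_code(int major, int minor, int patch)
{
	return 10000 * major + 100 * minor + patch;
}

/* Mirrors the "LD_VERSION >= 23200" half of the proposed condition. */
static int ld_supports_localentry_literal(int ld_version)
{
	return ld_version >= ld_version_code(2, 32, 0);
}
```

Under that assumption, a toolchain shipping binutils 2.31.1 would not select HAVE_STATIC_CALL on PPC64 ELF ABI v2, while 2.32 and later would.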
Re: [PATCH linux-next][RFC] powerpc: avoid lockdep when we are offline
On Tue Sep 27, 2022 at 11:48 AM AEST, Zhouyi Zhou wrote: > This is second version of my fix to PPC's "WARNING: suspicious RCU usage", > I improved my fix under Paul E. McKenney's guidance: > Link: > https://lore.kernel.org/lkml/20220914021528.15946-1-zhouzho...@gmail.com/T/ > > During the cpu offlining, the sub functions of xive_teardown_cpu will > call __lock_acquire when CONFIG_LOCKDEP=y. The latter function will > travel RCU protected list, so "WARNING: suspicious RCU usage" will be > triggered. > > Avoid lockdep when we are offline. I don't see how this is safe. If RCU is no longer watching the CPU then the memory it is accessing here could be concurrently freed. I think the warning is valid. powerpc's problem is that cpuhp_report_idle_dead() is called before arch_cpu_idle_dead(), so it must not rely on any RCU protection there. I would say xive cleanup just needs to be done earlier. I wonder why it is not done in __cpu_disable or thereabouts, that's where the interrupt controller is supposed to be stopped. 
Thanks, Nick > > Signed-off-by: Zhouyi Zhou > --- > Dear PPC and RCU developers > > I found this bug when trying to do rcutorture tests in ppc VM of > Open Source Lab of Oregon State University > > console.log report following bug: > [ 37.635545][T0] WARNING: suspicious RCU usage^M > [ 37.636409][T0] 6.0.0-rc4-next-20220907-dirty #8 Not tainted^M > [ 37.637575][T0] -^M > [ 37.638306][T0] kernel/locking/lockdep.c:3723 RCU-list traversed in > non-reader section!!^M > [ 37.639651][T0] ^M > [ 37.639651][T0] other info that might help us debug this:^M > [ 37.639651][T0] ^M > [ 37.641381][T0] ^M > [ 37.641381][T0] RCU used illegally from offline CPU!^M > [ 37.641381][T0] rcu_scheduler_active = 2, debug_locks = 1^M > [ 37.667170][T0] no locks held by swapper/6/0.^M > [ 37.668328][T0] ^M > [ 37.668328][T0] stack backtrace:^M > [ 37.669995][T0] CPU: 6 PID: 0 Comm: swapper/6 Not tainted > 6.0.0-rc4-next-20220907-dirty #8^M > [ 37.672777][T0] Call Trace:^M > [ 37.673729][T0] [c4653920] [c097f9b4] > dump_stack_lvl+0x98/0xe0 (unreliable)^M > [ 37.678579][T0] [c4653960] [c01f2eb8] > lockdep_rcu_suspicious+0x148/0x16c^M > [ 37.680425][T0] [c46539f0] [c01ed9b4] > __lock_acquire+0x10f4/0x26e0^M > [ 37.682450][T0] [c4653b30] [c01efc2c] > lock_acquire+0x12c/0x420^M > [ 37.684113][T0] [c4653c20] [c10d704c] > _raw_spin_lock_irqsave+0x6c/0xc0^M > [ 37.686154][T0] [c4653c60] [c00c7b4c] > xive_spapr_put_ipi+0xcc/0x150^M > [ 37.687879][T0] [c4653ca0] [c10c72a8] > xive_cleanup_cpu_ipi+0xc8/0xf0^M > [ 37.689856][T0] [c4653cf0] [c10c7370] > xive_teardown_cpu+0xa0/0xf0^M > [ 37.691877][T0] [c4653d30] [c00fba5c] > pseries_cpu_offline_self+0x5c/0x100^M > [ 37.693882][T0] [c4653da0] [c005d2c4] > arch_cpu_idle_dead+0x44/0x60^M > [ 37.695739][T0] [c4653dc0] [c01c740c] > do_idle+0x16c/0x3d0^M > [ 37.697536][T0] [c4653e70] [c01c7a1c] > cpu_startup_entry+0x3c/0x40^M > [ 37.699694][T0] [c4653ea0] [c005ca20] > start_secondary+0x6c0/0xb50^M > [ 37.701742][T0] [c4653f90] [c000d054] > 
start_secondary_prolog+0x10/0x14^M > > > Tested on PPC VM of Open Source Lab of Oregon State University. > Test results show that although "WARNING: suspicious RCU usage" has gone, > and there are less "BUG: soft lockup" reports than the original kernel > (9 vs 13), which sounds good ;-) > > But after my modification, results-rcutorture-kasan/SRCU-P/console.log.diags > shows a new warning: > [ 222.289242][ T110] WARNING: CPU: 6 PID: 110 at > kernel/rcu/rcutorture.c:2806 rcu_torture_fwd_prog+0xc88/0xdd0 > > I guess above new warning also exits in original kernel, so I write a tiny > test script as follows: > > #!/bin/sh > > COUNTER=0 > while [ $COUNTER -lt 1000 ] ; do > qemu-system-ppc64 -nographic -smp cores=8,threads=1 -net none -M pseries > -nodefaults -device spapr-vscsi -serial file:/tmp/console.log -m 2G -kernel > /tmp/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 > rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot > rcupdate.rcu_task_stall_timeout=3 rcutorture.torture_type=srcud > rcupdate.rcu_self_test=1 rcutorture.fwd_progress=3 srcutree.big_cpu_lim=5 > rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 > rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 > rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 > rcutorture.verbose=1"& > qemu_pid=$! > cd ~/next1/linux-next > make clean > #I use "make vmlinux -j 8" to create heavy background jitter > make vmlinux -j 8 > /dev/null 2>&1 > make_pid=$! > wait $qemu_pid
Re: [PATCH v2 6/6] powerpc/64: Add tests for out-of-line static calls
On Mon, 2022-09-26 at 14:55 +, Christophe Leroy wrote:
> > +config PPC_STATIC_CALL_KUNIT_TEST
> > + tristate "KUnit tests for PPC64 ELF ABI V2 static calls"
> > + default KUNIT_ALL_TESTS
> > + depends on HAVE_STATIC_CALL && PPC64_ELF_ABI_V2 && KUNIT && m
>
> Is there a reason why it is dedicated to PPC64? In that case, can you
> make it explicit with the name of the config option, and with the name
> of the file below?

The tests were written to make sure the TOC stays correct, so in theory PPC64_ELF_ABI_V2 (and potentially PPC64_ELF_ABI_V1) is the only ABI that should need them. I was thinking other tests should probably go in static_call_selftest.c.

Thinking now though, I suppose runtime modules are out-of-range for branches on 32-bit as well? I can see it being useful for just testing the indirect branch fallback in that case, without trying to make some generic test suite that needs to work on other arches. The TOC-specific checks can be conditionally enabled per ABI.
Re: [PATCH v2 5/6] powerpc/64: Add support for out-of-line static calls
On Mon, 2022-09-26 at 14:54 +, Christophe Leroy wrote:
> > diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c
> > index 863a7aa24650..ecbb74e1b4d3 100644
> > --- a/arch/powerpc/kernel/static_call.c
> > +++ b/arch/powerpc/kernel/static_call.c
> > @@ -4,30 +4,108 @@
> >
> > #include
> >
> > +static void* ppc_function_toc(u32 *func)
> > +{
> > +#ifdef CONFIG_PPC64_ELF_ABI_V2
>
> Can you use IS_ENABLED(CONFIG_PPC64_ELF_ABI_V2) instead?

I tried when implementing it, but the `(u64) func` cast is an issue. I could sidestep it and use `unsigned long` if that's preferable? Otherwise I like being explicit about the size; it's a delicate function.
Re: [PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function
On Mon, 2022-09-26 at 14:33 +, Christophe Leroy wrote: > > +#define patch_memory(addr, val) \ > > +({ \ > > + BUILD_BUG_ON(!__native_word(val)); \ > > + __patch_memory(addr, (unsigned long) val, sizeof(val)); \ > > +}) > > Can you do a static __always_inline function instead of a macro here > ? I didn't before because it doesn't allow using the type as a parameter. I considered these forms patch_memory(addr, val, 8); patch_memory(addr, val, void*); patch_memory(addr, val); // size taken from val type And thought the third was the nicest to use. Though coming back to this, I hadn't considered patch_memory(addr, val, sizeof(void*)) which would still allow a type to decide the size, and not be a macro. I've got an example implementation further down that also addresses the size check issue. > > +static int __always_inline ___patch_memory(void *patch_addr, > > + unsigned long data, > > + void *prog_addr, > > + size_t size) > > Is it really needed in the .c file ? I would expect GCC to take the > right decision by itself. I thought it'd be better to always inline it given it's only used generically in do_patch_memory and __do_patch_memory, which both get inlined into __patch_memory. But it does end up generating two copies due to the different contexts it's called in, so probably not worth it. Removed for v3. (raw_patch_instruction gets an optimised inline of ___patch_memory either way) > A BUILD_BUG() would be better here I think. BUILD_BUG() as the default case always triggers though, I assume because the constant used for size is too far away. 
How about

    static __always_inline int patch_memory(void *addr,
                                            unsigned long val,
                                            size_t size)
    {
            int __patch_memory(void *dest, unsigned long src, size_t size);

            BUILD_BUG_ON_MSG(!(size == sizeof(char) ||
                               size == sizeof(short) ||
                               size == sizeof(int) ||
                               size == sizeof(long)),
                             "Unsupported size for patch_memory");
            return __patch_memory(addr, val, size);
    }

Declaring the __patch_memory function inside of patch_memory enforces that you can't accidentally call __patch_memory without going through this or the *patch_instruction entry points (which hardcode the size).

> > + }
> >
> > - __put_kernel_nofault(patch_addr, , u32, failed);
> > - } else {
> > - u64 val = ppc_inst_as_ulong(instr);
> > + dcbst(patch_addr);
> > + dcbst(patch_addr + size - 1); /* Last byte of data may cross a cacheline */
>
> Or the second byte of data may cross a cacheline ...

It might, but unless we are assuming data cachelines smaller than the native word size, it will be in either the first or the last byte's cacheline, whereas the last byte might be in its own cacheline.

As justification the comment's misleading though. How about reducing it to "data may cross a cacheline" and leaving the reason for flushing the last byte implicit?

> > -static int __do_patch_instruction(u32 *addr, ppc_inst_t instr)
> > +static int __always_inline __do_patch_memory(void *dest, unsigned long src, size_t size)
> > {
>
> Whaou, do we really want all this to be __always_inline? Did you check
> the text size increase?

These ones are redundant because GCC will already inline them; they were just part of experimenting with inlining ___patch_memory. Will remove for v3.

The text size doesn't increase though, because the call hierarchy is just a linear chain of

    __patch_memory -> do_patch_memory -> __do_patch_memory

The entry point __patch_memory is not inlined.
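The cache-line argument above can be checked with a small user-space sketch. It assumes a 128-byte line purely for illustration (the real size depends on the CPU), and the helper names are hypothetical. The point it demonstrates: when the store is no larger than a line, every byte lies in the line of the first byte or the line of the last byte, so flushing those two addresses is enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 128u	/* assumed cache line size, illustration only */

/* Index of the cache line containing address a. */
static uintptr_t line_of(uintptr_t a)
{
	return a / LINE_SIZE;
}

/* True if [addr, addr + size) spans more than one line (size > 0). */
static int crosses_cacheline(uintptr_t addr, size_t size)
{
	return line_of(addr) != line_of(addr + size - 1);
}

/*
 * True if every byte of the range lies in the first or the last byte's
 * line, i.e. flushing addr and addr + size - 1 covers the whole store.
 */
static int two_flushes_cover(uintptr_t addr, size_t size)
{
	uintptr_t first = line_of(addr);
	uintptr_t last = line_of(addr + size - 1);
	size_t i;

	for (i = 0; i < size; i++) {
		uintptr_t l = line_of(addr + i);

		if (l != first && l != last)
			return 0;
	}
	return 1;
}
```

For native-word sizes the two flushes always suffice; the property only breaks once the range is long enough to contain a whole line in the middle, which patch_memory() sizes never are.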
Re: [PATCH v2 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub
On Mon, 2022-09-26 at 14:49 +, Christophe Leroy wrote:
> > + /* Replace indirect branch sequence with direct branch where possible */
> > + if (!create_branch(, jump_seq_addr, addr, 0))
> > + if (patch_instruction(jump_seq_addr, direct))
>
> Why not use patch_branch() ?

I didn't think of it at the time. To get the same abort-if-patch-failed semantics, the following should work:

    int err;
    ...
    /* Replace indirect branch sequence with direct branch where
     * possible */
    err = patch_branch(>jump[PPC64_STUB_MTCTR_OFFSET], addr, 0);
    if (err && err != -ERANGE)
            return 0;
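A rough user-space sketch of the error handling proposed above, for illustration only. It assumes the usual PowerPC I-form branch reach (a 24-bit word offset, so roughly +/- 32 MiB and 4-byte aligned); try_direct_branch() and stub_setup_ok() are hypothetical stand-ins, not the kernel's patch_branch() or the module loader code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* I-form branch reach: signed 26-bit byte offset, 4-byte aligned. */
static int offset_in_branch_range(int64_t offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}

/* Hypothetical stand-in for patch_branch(): 0 on success, -ERANGE if out of reach. */
static int try_direct_branch(uint64_t from, uint64_t to)
{
	int64_t offset = (int64_t)(to - from);

	if (!offset_in_branch_range(offset))
		return -ERANGE;
	/* A real implementation would encode and patch the instruction here. */
	return 0;
}

/* Mirrors the proposed caller logic: 1 to continue, 0 to abort. */
static int stub_setup_ok(uint64_t from, uint64_t to)
{
	int err = try_direct_branch(from, to);

	if (err && err != -ERANGE)
		return 0;	/* genuine patching failure */
	return 1;		/* patched, or left as the indirect sequence */
}
```

The design point is that -ERANGE is not treated as a failure: an out-of-range target simply keeps the existing indirect branch sequence, and only other errors abort.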
Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies
On Mon, Sep 26, 2022 at 09:40:25AM +0200, Michal Suchánek wrote:
> On Mon, Sep 26, 2022 at 08:47:32AM +0200, Greg Kroah-Hartman wrote:
> > On Sat, Sep 24, 2022 at 01:55:23PM +0200, Michal Suchánek wrote:
> > > On Sat, Sep 24, 2022 at 12:13:34PM +0200, Greg Kroah-Hartman wrote:
> > > > On Sat, Sep 24, 2022 at 11:45:21AM +0200, Michal Suchánek wrote:
> > > > > On Sat, Sep 24, 2022 at 11:19:19AM +0200, Greg Kroah-Hartman wrote:
> > > > > > On Fri, Sep 23, 2022 at 07:10:28PM +0200, Michal Suchanek wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > this is backport of commit 0d519cadf751
> > > > > > > ("arm64: kexec_file: use more system keyrings to verify kernel image signature")
> > > > > > > to stable 5.15 tree including the preparatory patches.
> > > > > >
> > > > > > This feels to me like a new feature for arm64, one that has never worked
> > > > > > before and you are just making it feature-parity with x86, right?
> > > > > >
> > > > > > Or is this a regression fix somewhere? Why is this needed in 5.15.y and
> > > > > > why can't people who need this new feature just use a newer kernel
> > > > > > version (5.19?)
> > > > >
> > > > > It's half-broken implementation of the kexec kernel verification. At the time
> > > > > it was implemented for arm64 we had the platform and secondary keyrings
> > > > > and x86 was using them but on arm64 the initial implementation ignores
> > > > > them.
> > > >
> > > > Ok, so it's something that never worked. Adding support to get it to
> > > > work doesn't really fall into the stable kernel rules, right?
> > >
> > > Not sure. It was defective, not using the facilities available at the
> > > time correctly. Which translates to kernels that can be kexec'd on x86
> > > failing to kexec on arm64 without any explanation (signed with same key,
> > > built for the appropriate arch).
> > > > Feature parity across architectures is not a "regression", but rather a > > "this feature is not implemented for this architecture yet" type of > > thing. > > That depends on the view - before kexec verification you could boot any > kernel, now you can boot some kernels signed with a valid key, but not > others - the initial implementation is buggy, probably because it > is based on an old version of the x86 code. Buggy? The feature of supporting platform ring had been slipped in just before I submitted the latest patch series which was eventually merged. (I should have noticed it though.) Looking at changes in the commit 278311e417be ("kexec, KEYS: Make use of platform keyring for signature verify"), it seems to be obvious that it is a new feature because it introduced a new Kconfig option, CONFIG_INTEGRITY_PLATFORM_KEYRING, which allows for enabling/disabling platform ring support. -Takahiro Akashi > > > > > > Again, what's wrong with 5.19 for anyone who wants this? Who does want > > > > this? > > > > > > Not sure, really. > > > > > > The final patch was repeatedly backported to stable and failed to build > > > because the prerequisites were missing. > > > > That's because it was tagged, but now that you show the full set of > > requirements, it's pretty obvious to me that this is not relevant for > > going this far back. > > That also works. > > Thanks > > Michal
Re: [PATCH 2/7] mm: Free device private pages have zero refcount
Jason Gunthorpe writes: > On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote: >> Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page >> refcount") device private pages have no longer had an extra reference >> count when the page is in use. However before handing them back to the >> owning device driver we add an extra reference count such that free >> pages have a reference count of one. >> >> This makes it difficult to tell if a page is free or not because both >> free and in use pages will have a non-zero refcount. Instead we should >> return pages to the drivers page allocator with a zero reference count. >> Kernel code can then safely use kernel functions such as >> get_page_unless_zero(). >> >> Signed-off-by: Alistair Popple >> --- >> arch/powerpc/kvm/book3s_hv_uvmem.c | 1 + >> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 + >> drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 + >> lib/test_hmm.c | 1 + >> mm/memremap.c| 5 - >> mm/page_alloc.c | 6 ++ >> 6 files changed, 10 insertions(+), 5 deletions(-) > > I think this is a great idea, but I'm surprised no dax stuff is > touched here? free_zone_device_page() shouldn't be called for pgmap->type == MEMORY_DEVICE_FS_DAX so I don't think we should have to worry about DAX there. Except that the folio code looks like it might have introduced a bug. AFAICT put_page() always calls put_devmap_managed_page(>page) but folio_put() does not (although folios_put() does!). So it seems folio_put() won't end up calling __put_devmap_managed_page_refs() as I think it should. I think you're right about the change to __init_zone_device_page() - I should limit it to DEVICE_PRIVATE/COHERENT pages only. But I need to look at Dan's patch series more closely as I suspect it might be better to rebase this patch on top of that. > Jason
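To make the refcount argument concrete: a "get unless zero" primitive only works as a free-page test if free pages really sit at zero. A minimal user-space sketch using C11 atomics follows; struct fake_page and both helpers are hypothetical and only model the idea, they are not the kernel's struct page or get_page_unless_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct fake_page {
	atomic_int refcount;	/* 0 means the page is free */
};

/* Take a reference only if the count is currently non-zero. */
static bool get_unless_zero(struct fake_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the page became free. */
static bool put_ref(struct fake_page *p)
{
	return atomic_fetch_sub(&p->refcount, 1) == 1;
}
```

If free pages were instead handed back with a count of one, get_unless_zero() would succeed on them and the caller could not distinguish a free page from one in use, which is the ambiguity the patch description points at.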
Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release
Felix Kuehling writes: > On 2022-09-26 17:35, Lyude Paul wrote: >> On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote: >>> When the module is unloaded or a GPU is unbound from the module it is >>> possible for device private pages to be left mapped in currently running >>> processes. This leads to a kernel crash when the pages are either freed >>> or accessed from the CPU because the GPU and associated data structures >>> and callbacks have all been freed. >>> >>> Fix this by migrating any mappings back to normal CPU memory prior to >>> freeing the GPU memory chunks and associated device private pages. >>> >>> Signed-off-by: Alistair Popple >>> >>> --- >>> >>> I assume the AMD driver might have a similar issue. However I can't see >>> where device private (or coherent) pages actually get unmapped/freed >>> during teardown as I couldn't find any relevant calls to >>> devm_memunmap(), memunmap(), devm_release_mem_region() or >>> release_mem_region(). So it appears that ZONE_DEVICE pages are not being >>> properly freed during module unload, unless I'm missing something? >> I've got no idea, will poke Ben to see if they know the answer to this > > I guess we're relying on devm to release the region. Isn't the whole point of > using devm_request_free_mem_region that we don't have to remember to > explicitly > release it when the device gets destroyed? I believe we had an explicit free > call at some point by mistake, and that caused a double-free during module > unload. See this commit for reference: Argh, thanks for that pointer. I was not so familiar with devm_request_free_mem_region()/devm_memremap_pages() as currently Nouveau explicitly manages that itself. 
> commit 22f4f4faf337d5fb2d2750aff13215726814273e > Author: Philip Yang > Date: Mon Sep 20 17:25:52 2021 -0400 > > drm/amdkfd: fix svm_migrate_fini warning > Device manager releases device-specific resources when a driver > disconnects from a device, devm_memunmap_pages and > devm_release_mem_region calls in svm_migrate_fini are redundant. > It causes below warning trace after patch "drm/amdgpu: Split > amdgpu_device_fini into early and late", so remove function > svm_migrate_fini. > BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718 > WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795 > devm_release_action+0x51/0x60 > Call Trace: > ? memunmap_pages+0x360/0x360 > svm_migrate_fini+0x2d/0x60 [amdgpu] > kgd2kfd_device_exit+0x23/0xa0 [amdgpu] > amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu] > amdgpu_device_fini_sw+0x45/0x290 [amdgpu] > amdgpu_driver_release_kms+0x12/0x30 [amdgpu] > drm_dev_release+0x20/0x40 [drm] > release_nodes+0x196/0x1e0 > device_release_driver_internal+0x104/0x1d0 > driver_detach+0x47/0x90 > bus_remove_driver+0x7a/0xd0 > pci_unregister_driver+0x3d/0x90 > amdgpu_exit+0x11/0x20 [amdgpu] > Signed-off-by: Philip Yang > Reviewed-by: Felix Kuehling > Signed-off-by: Alex Deucher > > Furthermore, I guess we are assuming that nobody is using the GPU when the > module is unloaded. As long as any processes have /dev/kfd open, you won't be > able to unload the module (except by force-unload). I suppose with ZONE_DEVICE > memory, we can have references to device memory pages even when user mode has > closed /dev/kfd. We do have a cleanup handler that runs in an > MMU-free-notifier. > In theory that should run after all the pages in the mm_struct have been > freed. > It releases all sorts of other device resources and needs the driver to still > be > there. I'm not sure if there is anything preventing a module unload before the > free-notifier runs. I'll look into that. 
Right - module unload (or device unbind) is one of the other ways we can hit this issue in Nouveau at least. You can end up with ZONE_DEVICE pages mapped in a running process after the module has unloaded. Although now you mention it that seems a bit wrong - the pgmap refcount should provide some protection against that. Will have to look into that too. > Regards, > Felix > > >> >>> --- >>> drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++- >>> 1 file changed, 48 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c >>> b/drivers/gpu/drm/nouveau/nouveau_dmem.c >>> index 66ebbd4..3b247b8 100644 >>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c >>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c >>> @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm) >>> mutex_unlock(>dmem->mutex); >>> } >>> +/* >>> + * Evict all pages mapping a chunk. >>> + */ >>> +void >>> +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk) >>> +{ >>> + unsigned long i, npages = range_len(>pagemap.range) >> >>> PAGE_SHIFT; >>> + unsigned long *src_pfns, *dst_pfns; >>> + dma_addr_t *dma_addrs; >>> + struct
[PATCH linux-next][RFC] powerpc: avoid lockdep when we are offline
This is second version of my fix to PPC's "WARNING: suspicious RCU usage", I improved my fix under Paul E. McKenney's guidance: Link: https://lore.kernel.org/lkml/20220914021528.15946-1-zhouzho...@gmail.com/T/ During the cpu offlining, the sub functions of xive_teardown_cpu will call __lock_acquire when CONFIG_LOCKDEP=y. The latter function will travel RCU protected list, so "WARNING: suspicious RCU usage" will be triggered. Avoid lockdep when we are offline. Signed-off-by: Zhouyi Zhou --- Dear PPC and RCU developers I found this bug when trying to do rcutorture tests in ppc VM of Open Source Lab of Oregon State University console.log report following bug: [ 37.635545][T0] WARNING: suspicious RCU usage^M [ 37.636409][T0] 6.0.0-rc4-next-20220907-dirty #8 Not tainted^M [ 37.637575][T0] -^M [ 37.638306][T0] kernel/locking/lockdep.c:3723 RCU-list traversed in non-reader section!!^M [ 37.639651][T0] ^M [ 37.639651][T0] other info that might help us debug this:^M [ 37.639651][T0] ^M [ 37.641381][T0] ^M [ 37.641381][T0] RCU used illegally from offline CPU!^M [ 37.641381][T0] rcu_scheduler_active = 2, debug_locks = 1^M [ 37.667170][T0] no locks held by swapper/6/0.^M [ 37.668328][T0] ^M [ 37.668328][T0] stack backtrace:^M [ 37.669995][T0] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 6.0.0-rc4-next-20220907-dirty #8^M [ 37.672777][T0] Call Trace:^M [ 37.673729][T0] [c4653920] [c097f9b4] dump_stack_lvl+0x98/0xe0 (unreliable)^M [ 37.678579][T0] [c4653960] [c01f2eb8] lockdep_rcu_suspicious+0x148/0x16c^M [ 37.680425][T0] [c46539f0] [c01ed9b4] __lock_acquire+0x10f4/0x26e0^M [ 37.682450][T0] [c4653b30] [c01efc2c] lock_acquire+0x12c/0x420^M [ 37.684113][T0] [c4653c20] [c10d704c] _raw_spin_lock_irqsave+0x6c/0xc0^M [ 37.686154][T0] [c4653c60] [c00c7b4c] xive_spapr_put_ipi+0xcc/0x150^M [ 37.687879][T0] [c4653ca0] [c10c72a8] xive_cleanup_cpu_ipi+0xc8/0xf0^M [ 37.689856][T0] [c4653cf0] [c10c7370] xive_teardown_cpu+0xa0/0xf0^M [ 37.691877][T0] [c4653d30] [c00fba5c] 
pseries_cpu_offline_self+0x5c/0x100^M [ 37.693882][T0] [c4653da0] [c005d2c4] arch_cpu_idle_dead+0x44/0x60^M [ 37.695739][T0] [c4653dc0] [c01c740c] do_idle+0x16c/0x3d0^M [ 37.697536][T0] [c4653e70] [c01c7a1c] cpu_startup_entry+0x3c/0x40^M [ 37.699694][T0] [c4653ea0] [c005ca20] start_secondary+0x6c0/0xb50^M [ 37.701742][T0] [c4653f90] [c000d054] start_secondary_prolog+0x10/0x14^M Tested on PPC VM of Open Source Lab of Oregon State University. Test results show that although "WARNING: suspicious RCU usage" has gone, and there are less "BUG: soft lockup" reports than the original kernel (9 vs 13), which sounds good ;-) But after my modification, results-rcutorture-kasan/SRCU-P/console.log.diags shows a new warning: [ 222.289242][ T110] WARNING: CPU: 6 PID: 110 at kernel/rcu/rcutorture.c:2806 rcu_torture_fwd_prog+0xc88/0xdd0 I guess above new warning also exits in original kernel, so I write a tiny test script as follows: #!/bin/sh COUNTER=0 while [ $COUNTER -lt 1000 ] ; do qemu-system-ppc64 -nographic -smp cores=8,threads=1 -net none -M pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log -m 2G -kernel /tmp/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=3 rcutorture.torture_type=srcud rcupdate.rcu_self_test=1 rcutorture.fwd_progress=3 srcutree.big_cpu_lim=5 rcutorture.onoff_interval=1000 rcutorture.onoff_holdoff=30 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=420 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"& qemu_pid=$! cd ~/next1/linux-next make clean #I use "make vmlinux -j 8" to create heavy background jitter make vmlinux -j 8 > /dev/null 2>&1 make_pid=$! 
wait $qemu_pid
kill $qemu_pid
kill $make_pid
if grep -q WARN /tmp/console.log; then
echo $COUNTER > /tmp/counter
exit
fi
COUNTER=$(($COUNTER+1))
done

The above test shows that the original kernel also warns about
"WARNING: CPU: 6 PID: 110 at kernel/rcu/rcutorture.c:2806 rcu_torture_fwd_prog+0xc88/0xdd0"

But I am not very sure about my results, so I have still added [RFC] to my subject line.

Thank you all for your guidance and encouragement ;-)

Cheers
Zhouyi
--
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index e0a7ac5db15d..e47098f00da1 100644
---
Re: [PATCH v5 27/30] RFC: KVM: powerpc: Move processor compatibility check to hardware setup
On Fri, Sep 23, 2022 at 04:58:41PM +1000, Michael Ellerman wrote:
> isaku.yamah...@intel.com writes:
> > From: Isaku Yamahata
> >
> > Move processor compatibility check from kvm_arch_processor_compat() into
>                                           ^
>                                           kvm_arch_check_processor_compat()
>
> > kvm_arch_hardware_setup(). The check does model name comparison with a
> > global variable, cur_cpu_spec. There is no point to check it at run time
> > on all processors.
>
> A key detail I had to look up is that both kvm_arch_hardware_setup() and
> kvm_arch_check_processor_compat() are called from kvm_init(), one after
> the other. But the latter is called on each CPU.
>
> And because the powerpc implementation of kvm_arch_check_processor_compat()
> just checks a global, there's no need to call it on every CPU.
>
> > kvmppc_core_check_processor_compat() checks the global variable. There are
> > five implementation for it as follows.
>
> There are three implementations not five.

Thanks. I'll update the commit message.

> > arch/powerpc/include/asm/cputable.h: extern struct cpu_spec *cur_cpu_spec;
> > arch/powerpc/kvm/book3s.c: return 0
> > arch/powerpc/kvm/e500.c: strcmp(cur_cpu_spec->cpu_name, "e500v2")
> > arch/powerpc/kvm/e500mc.c: strcmp(cur_cpu_spec->cpu_name, "e500mc")
> >                            strcmp(cur_cpu_spec->cpu_name, "e5500")
> >                            strcmp(cur_cpu_spec->cpu_name, "e6500")
> >
> > Suggested-by: Sean Christopherson
> > Signed-off-by: Isaku Yamahata
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: Fabiano Rosas
> > ---
> >  arch/powerpc/kvm/powerpc.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> > index 7b56d6ccfdfb..31dc4f231e9d 100644
> > --- a/arch/powerpc/kvm/powerpc.c
> > +++ b/arch/powerpc/kvm/powerpc.c
> > @@ -444,12 +444,12 @@ int kvm_arch_hardware_enable(void)
> >
> >  int kvm_arch_hardware_setup(void *opaque)
> >  {
> > -	return 0;
> > +	return kvmppc_core_check_processor_compat();
> >  }
> >
> >  int kvm_arch_check_processor_compat(void)
> >  {
> > -	return kvmppc_core_check_processor_compat();
> > +	return 0;
> >  }
>
> The actual change seems OK. I gave it a quick test boot and ran some
> VMs, everything seems to work as before.
>
> Acked-by: Michael Ellerman (powerpc)

Thanks so much for testing. I'll remove RFC.

--
Isaku Yamahata
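For readers following the thread, the reasoning in the message above (a check that only reads a global yields the same answer on every CPU, so it can run once at setup) can be sketched in plain C. This is a minimal userspace model, not the kernel code: every `sketch_` name, the struct, and the `-1` error value are assumptions made for illustration, loosely mirroring the e500mc-style name comparison quoted in the commit message.

```c
#include <string.h>

/* Hypothetical stand-in for the kernel's cur_cpu_spec global. */
struct sketch_cpu_spec {
	const char *cpu_name;
};

static struct sketch_cpu_spec sketch_spec = { "e500mc" };
static struct sketch_cpu_spec *sketch_cur_cpu_spec = &sketch_spec;

/*
 * Models an e500mc-style check: returns 0 when the model name is one
 * of the supported cores. It reads only the global, so the result
 * does not depend on which CPU executes it.
 */
static int sketch_core_check_processor_compat(void)
{
	const char *name = sketch_cur_cpu_spec->cpu_name;

	if (strcmp(name, "e500mc") == 0 ||
	    strcmp(name, "e5500") == 0 ||
	    strcmp(name, "e6500") == 0)
		return 0;

	return -1; /* illustrative; the kernel returns a negative errno */
}

/* Called once during setup: the only place the global is inspected. */
static int sketch_hardware_setup(void)
{
	return sketch_core_check_processor_compat();
}

/* Called per CPU: nothing CPU-specific is left to verify. */
static int sketch_check_processor_compat(int cpu)
{
	(void)cpu;
	return 0;
}
```

With the check moved into the one-time setup hook, the per-CPU hook has nothing left to do, which is the shape of the two-line diff under discussion.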
Re: [PATCH v3 4/4] powerpc/64s: Enable KFENCE on book3s64
On Mon, 2022-09-26 at 07:57 +, Nicholas Miehlbradt wrote:
> KFENCE support was added for ppc32 in commit 90cbac0e995d
> ("powerpc: Enable KFENCE for PPC32").
> Enable KFENCE on ppc64 architecture with hash and radix MMUs.
> It uses the same mechanism as debug pagealloc to
> protect/unprotect pages. All KFENCE kunit tests pass on both
> MMUs.
>
> KFENCE memory is initially allocated using memblock but is
> later marked as SLAB allocated. This necessitates the change
> to __pud_free to ensure that the KFENCE pages are freed
> appropriately.
>
> Based on previous work by Christophe Leroy and Jordan Niethe.
>
> Signed-off-by: Nicholas Miehlbradt

LGTM. For the whole series:

Reviewed-by: Russell Currey
Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release
John Hubbard writes:

> On 9/26/22 14:35, Lyude Paul wrote:
>>> +	for (i = 0; i < npages; i++) {
>>> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>>> +			struct page *dpage;
>>> +
>>> +			/*
>>> +			 * _GFP_NOFAIL because the GPU is going away and there
>>> +			 * is nothing sensible we can do if we can't copy the
>>> +			 * data back.
>>> +			 */
>>
>> You'll have to excuse me for a moment since this area of nouveau isn't one of
>> my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means infinite
>> retry, in the case of a GPU hotplug event I would assume we would rather just
>> stop trying to migrate things to the GPU and just drop the data instead of
>> hanging on infinite retries.
>>

No problem, thanks for taking a look!

> Hi Lyude!
>
> Actually, I really think it's better in this case to keep trying
> (presumably not necessarily infinitely, but only until memory becomes
> available), rather than failing out and corrupting data.
>
> That's because I'm not sure it's completely clear that this memory is
> discardable. And at some point, we're going to make this all work with
> file-backed memory, which will *definitely* not be discardable--I
> realize that we're not there yet, of course.
>
> But here, it's reasonable to commit to just retrying indefinitely,
> really. Memory should eventually show up. And if it doesn't, then
> restarting the machine is better than corrupting data, generally.

The memory is definitely not discardable here if the migration failed
because that implies it is still mapped into some userspace process.

We could avoid restarting the machine by doing something similar to
what happens during memory failure and killing every process that maps
the page(s). But overall I think it's better to retry until memory is
available, because that allows things like reclaim to work and in the
worst case allows the OOM killer to select an appropriate task to kill.
It also won't cause data corruption if/when we have file-backed memory.

> thanks,
Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release
On 2022-09-26 17:35, Lyude Paul wrote: On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote: When the module is unloaded or a GPU is unbound from the module it is possible for device private pages to be left mapped in currently running processes. This leads to a kernel crash when the pages are either freed or accessed from the CPU because the GPU and associated data structures and callbacks have all been freed. Fix this by migrating any mappings back to normal CPU memory prior to freeing the GPU memory chunks and associated device private pages. Signed-off-by: Alistair Popple --- I assume the AMD driver might have a similar issue. However I can't see where device private (or coherent) pages actually get unmapped/freed during teardown as I couldn't find any relevant calls to devm_memunmap(), memunmap(), devm_release_mem_region() or release_mem_region(). So it appears that ZONE_DEVICE pages are not being properly freed during module unload, unless I'm missing something? I've got no idea, will poke Ben to see if they know the answer to this I guess we're relying on devm to release the region. Isn't the whole point of using devm_request_free_mem_region that we don't have to remember to explicitly release it when the device gets destroyed? I believe we had an explicit free call at some point by mistake, and that caused a double-free during module unload. See this commit for reference: commit 22f4f4faf337d5fb2d2750aff13215726814273e Author: Philip Yang Date: Mon Sep 20 17:25:52 2021 -0400 drm/amdkfd: fix svm_migrate_fini warning Device manager releases device-specific resources when a driver disconnects from a device, devm_memunmap_pages and devm_release_mem_region calls in svm_migrate_fini are redundant. It causes below warning trace after patch "drm/amdgpu: Split amdgpu_device_fini into early and late", so remove function svm_migrate_fini. 
BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718 WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795 devm_release_action+0x51/0x60 Call Trace: ? memunmap_pages+0x360/0x360 svm_migrate_fini+0x2d/0x60 [amdgpu] kgd2kfd_device_exit+0x23/0xa0 [amdgpu] amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu] amdgpu_device_fini_sw+0x45/0x290 [amdgpu] amdgpu_driver_release_kms+0x12/0x30 [amdgpu] drm_dev_release+0x20/0x40 [drm] release_nodes+0x196/0x1e0 device_release_driver_internal+0x104/0x1d0 driver_detach+0x47/0x90 bus_remove_driver+0x7a/0xd0 pci_unregister_driver+0x3d/0x90 amdgpu_exit+0x11/0x20 [amdgpu] Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher Furthermore, I guess we are assuming that nobody is using the GPU when the module is unloaded. As long as any processes have /dev/kfd open, you won't be able to unload the module (except by force-unload). I suppose with ZONE_DEVICE memory, we can have references to device memory pages even when user mode has closed /dev/kfd. We do have a cleanup handler that runs in an MMU-free-notifier. In theory that should run after all the pages in the mm_struct have been freed. It releases all sorts of other device resources and needs the driver to still be there. I'm not sure if there is anything preventing a module unload before the free-notifier runs. I'll look into that. Regards, Felix --- drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++- 1 file changed, 48 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 66ebbd4..3b247b8 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm) mutex_unlock(>dmem->mutex); } +/* + * Evict all pages mapping a chunk. 
+ */ +void +nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk) +{ + unsigned long i, npages = range_len(>pagemap.range) >> PAGE_SHIFT; + unsigned long *src_pfns, *dst_pfns; + dma_addr_t *dma_addrs; + struct nouveau_fence *fence; + + src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL); + dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL); + dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL); + + migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT, + npages); + + for (i = 0; i < npages; i++) { + if (src_pfns[i] & MIGRATE_PFN_MIGRATE) { + struct page *dpage; + + /* +* _GFP_NOFAIL because the GPU is going away and there +* is nothing sensible we can do if we can't copy the +* data back. +*/ You'll have to excuse me for a moment since this area
Re: [PATCH 0/8] generic command line v4
On Mon, Sep 26, 2022 at 05:52:18PM -0500, Rob Herring wrote:
> On Thu, Sep 22, 2022 at 4:15 PM Daniel Gimpelevich wrote:
> >
> > On Thu, 2022-09-22 at 14:10 -0700, Daniel Walker wrote:
> > > On Thu, Sep 22, 2022 at 05:03:46PM -0400, Sean Anderson wrote:
> > [snip]
> > > > As recently as last month, someone's patch to add such support was
> > > > rejected for this reason [1].
> > > >
> > > > --Sean
> > > >
> > > > [1]
> > > > https://lore.kernel.org/linux-arm-kernel/20220812084613.GA3107@willie-the-truck/
> > >
> > >
> > > I had no idea.. Thanks for pointing that out. I guess I will re-submit in
> > > that case.
> > >
> > > Daniel
> >
> > This has been happening repeatedly since circa 2014, on multiple
> > architectures. It's quite frustrating, really.
>
> It must not be that important. From the last time, IMO Christophe's
> version was much closer to being merged than this series. This is not
> how you get things upstream:
>
> > * Dropped powerpc changes
> >   Christophe Leroy has reservations about the features for powerpc. I
> >   don't think his reservations are founded, and these changes should
> >   fully work on powerpc. However, I dropped these changes so Christophe
> >   can have more time to get comfortable with the changes.
>
> Rob

I don't submit often enough, that's true. However, I figured maintainers
don't want the changes. This is a common occurrence in industry, people may
submit once or twice, no traction and they give up. I suppose it's a
combination of problems.

Christophe's don't have the same features, so they are really totally
different but conflicting.

Daniel
Re: [PATCH 0/8] generic command line v4
On Thu, Sep 22, 2022 at 02:15:44PM -0700, Daniel Gimpelevich wrote:
> On Thu, 2022-09-22 at 14:10 -0700, Daniel Walker wrote:
> > On Thu, Sep 22, 2022 at 05:03:46PM -0400, Sean Anderson wrote:
> [snip]
> > > As recently as last month, someone's patch to add such support was
> > > rejected for this reason [1].
> > >
> > > --Sean
> > >
> > > [1]
> > > https://lore.kernel.org/linux-arm-kernel/20220812084613.GA3107@willie-the-truck/
> >
> >
> > I had no idea.. Thanks for pointing that out. I guess I will re-submit in
> > that case.
> >
> > Daniel
>
> This has been happening repeatedly since circa 2014, on multiple
> architectures. It's quite frustrating, really.

I'm not sure I'm following your comments. What's frustrating exactly?

Daniel
Re: [PATCH 0/8] generic command line v4
On Thu, Sep 22, 2022 at 4:15 PM Daniel Gimpelevich wrote:
>
> On Thu, 2022-09-22 at 14:10 -0700, Daniel Walker wrote:
> > On Thu, Sep 22, 2022 at 05:03:46PM -0400, Sean Anderson wrote:
> [snip]
> > > As recently as last month, someone's patch to add such support was
> > > rejected for this reason [1].
> > >
> > > --Sean
> > >
> > > [1]
> > > https://lore.kernel.org/linux-arm-kernel/20220812084613.GA3107@willie-the-truck/
> >
> >
> > I had no idea.. Thanks for pointing that out. I guess I will re-submit in
> > that case.
> >
> > Daniel
>
> This has been happening repeatedly since circa 2014, on multiple
> architectures. It's quite frustrating, really.

It must not be that important. From the last time, IMO Christophe's
version was much closer to being merged than this series. This is not
how you get things upstream:

> * Dropped powerpc changes
>   Christophe Leroy has reservations about the features for powerpc. I
>   don't think his reservations are founded, and these changes should
>   fully work on powerpc. However, I dropped these changes so Christophe
>   can have more time to get comfortable with the changes.

Rob
Re: [PATCH v2 2/2] powerpc/rtas: block error injection when locked down
On Mon, Sep 26, 2022 at 9:18 AM Nathan Lynch wrote:
>
> The error injection facility on pseries VMs allows corruption of
> arbitrary guest memory, potentially enabling a sufficiently privileged
> user to disable lockdown or perform other modifications of the running
> kernel via the rtas syscall.
>
> Block the PAPR error injection facility from being opened or called
> when locked down.
>
> Signed-off-by: Nathan Lynch
> ---
>  arch/powerpc/kernel/rtas.c | 25 -
>  include/linux/security.h   | 1 +
>  security/security.c        | 1 +
>  3 files changed, 26 insertions(+), 1 deletion(-)

The lockdown changes are trivial, but they look fine to me.

Acked-by: Paul Moore (LSM)

--
paul-moore.com
Re: [PATCH v2 1/2] powerpc/pseries: block untrusted device tree changes when locked down
On Mon, Sep 26, 2022 at 9:17 AM Nathan Lynch wrote:
>
> The /proc/powerpc/ofdt interface allows the root user to freely alter
> the in-kernel device tree, enabling arbitrary physical address writes
> via drivers that could bind to malicious device nodes, thus making it
> possible to disable lockdown.
>
> Historically this interface has been used on the pseries platform to
> facilitate the runtime addition and removal of processor, memory, and
> device resources (aka Dynamic Logical Partitioning or DLPAR). Years
> ago, the processor and memory use cases were migrated to designs that
> happen to be lockdown-friendly: device tree updates are communicated
> directly to the kernel from firmware without passing through untrusted
> user space. I/O device DLPAR via the "drmgr" command in powerpc-utils
> remains the sole legitimate user of /proc/powerpc/ofdt, but it is
> already broken in lockdown since it uses /dev/mem to allocate argument
> buffers for the rtas syscall. So only illegitimate uses of the
> interface should see a behavior change when running on a locked down
> kernel.
>
> Signed-off-by: Nathan Lynch
> ---
>  arch/powerpc/platforms/pseries/reconfig.c | 5 +
>  include/linux/security.h                  | 1 +
>  security/security.c                       | 1 +
>  3 files changed, 7 insertions(+)

Thanks for moving the definitions.

Acked-by: Paul Moore (LSM)

> diff --git a/arch/powerpc/platforms/pseries/reconfig.c b/arch/powerpc/platforms/pseries/reconfig.c
> index cad7a0c93117..599bd2c78514 100644
> --- a/arch/powerpc/platforms/pseries/reconfig.c
> +++ b/arch/powerpc/platforms/pseries/reconfig.c
> @@ -10,6 +10,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>
> @@ -361,6 +362,10 @@ static ssize_t ofdt_write(struct file *file, const char __user *buf, size_t coun
>  	char *kbuf;
>  	char *tmp;
>
> +	rv = security_locked_down(LOCKDOWN_DEVICE_TREE);
> +	if (rv)
> +		return rv;
> +
>  	kbuf = memdup_user_nul(buf, count);
>  	if (IS_ERR(kbuf))
>  		return PTR_ERR(kbuf);
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 7bd0c490703d..39e7c0e403d9 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -114,6 +114,7 @@ enum lockdown_reason {
>  	LOCKDOWN_IOPORT,
>  	LOCKDOWN_MSR,
>  	LOCKDOWN_ACPI_TABLES,
> +	LOCKDOWN_DEVICE_TREE,
>  	LOCKDOWN_PCMCIA_CIS,
>  	LOCKDOWN_TIOCSSERIAL,
>  	LOCKDOWN_MODULE_PARAMETERS,
> diff --git a/security/security.c b/security/security.c
> index 4b95de24bc8d..51bf66d4f472 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -52,6 +52,7 @@ const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
>  	[LOCKDOWN_IOPORT] = "raw io port access",
>  	[LOCKDOWN_MSR] = "raw MSR access",
>  	[LOCKDOWN_ACPI_TABLES] = "modifying ACPI tables",
> +	[LOCKDOWN_DEVICE_TREE] = "modifying device tree contents",
>  	[LOCKDOWN_PCMCIA_CIS] = "direct PCMCIA CIS storage",
>  	[LOCKDOWN_TIOCSSERIAL] = "reconfiguration of serial port IO",
>  	[LOCKDOWN_MODULE_PARAMETERS] = "unsafe module parameters",
> --
> 2.37.3
>

--
paul-moore.com
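The ordering in the quoted ofdt_write() hunk (ask the lockdown hook first, return its error before the user buffer is copied) can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the `sketch_` names, the single "level" variable, and the `-1` return are invented for illustration and are not the LSM API.

```c
#include <stddef.h>

/* Hypothetical, much-reduced version of a lockdown reason list. */
enum sketch_lockdown_reason {
	SKETCH_LOCKDOWN_NONE,
	SKETCH_LOCKDOWN_DEVICE_TREE,
	SKETCH_LOCKDOWN_MAX,
};

/* Current lockdown level; reasons at or below it are refused. */
static enum sketch_lockdown_reason sketch_locked_level = SKETCH_LOCKDOWN_NONE;

/* Returns 0 when the operation is allowed, negative when locked down. */
static int sketch_locked_down(enum sketch_lockdown_reason what)
{
	if (what != SKETCH_LOCKDOWN_NONE && what <= sketch_locked_level)
		return -1; /* illustrative error value */
	return 0;
}

/* Models the write handler: the policy check comes before any work. */
static long sketch_ofdt_write(const char *buf, size_t count)
{
	int rv = sketch_locked_down(SKETCH_LOCKDOWN_DEVICE_TREE);

	if (rv)
		return rv; /* refuse before touching the caller's buffer */

	(void)buf; /* a real handler would copy and parse the command here */
	return (long)count;
}
```

Placing the check at the top means a locked-down kernel rejects the write without allocating or parsing anything, which appears to be the intent of the hunk's placement ahead of memdup_user_nul().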
Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release
On 9/26/22 14:35, Lyude Paul wrote:
>> +	for (i = 0; i < npages; i++) {
>> +		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
>> +			struct page *dpage;
>> +
>> +			/*
>> +			 * _GFP_NOFAIL because the GPU is going away and there
>> +			 * is nothing sensible we can do if we can't copy the
>> +			 * data back.
>> +			 */
>
> You'll have to excuse me for a moment since this area of nouveau isn't one of
> my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means infinite
> retry, in the case of a GPU hotplug event I would assume we would rather just
> stop trying to migrate things to the GPU and just drop the data instead of
> hanging on infinite retries.
>

Hi Lyude!

Actually, I really think it's better in this case to keep trying
(presumably not necessarily infinitely, but only until memory becomes
available), rather than failing out and corrupting data.

That's because I'm not sure it's completely clear that this memory is
discardable. And at some point, we're going to make this all work with
file-backed memory, which will *definitely* not be discardable--I
realize that we're not there yet, of course.

But here, it's reasonable to commit to just retrying indefinitely,
really. Memory should eventually show up. And if it doesn't, then
restarting the machine is better than corrupting data, generally.

thanks,
--
John Hubbard
NVIDIA
[PATCH v3] powerpc/smp: poll cpu_callin_map more aggressively in __cpu_up()
At boot time, it is not necessary to delay between polls of
cpu_callin_map when waiting for a kicked CPU to come up. Remove the
delay intervals, but preserve the overall deadline (five seconds).

At run time, the first poll result is usually negative and we incur a
sleeping wait. If we spin on the callin word for a short time first,
we can reduce __cpu_up() from dozens of milliseconds to under 1ms in
the common case on a P9 LPAR:

  $ ppc64_cpu --smt=off
  $ bpftrace -e 'kprobe:__cpu_up {
      @start[tid] = nsecs;
    }
    kretprobe:__cpu_up /@start[tid]/ {
      @us = hist((nsecs - @start[tid]) / 1000);
      delete(@start[tid]);
    }' -c 'ppc64_cpu --smt=on'

Before:

  @us:
  [16K, 32K)    85
  [32K, 64K)    13

After:

  @us:
  [128, 256)    95
  [256, 512)     3

Signed-off-by: Nathan Lynch
---

Notes:
    Changes since v2:

    * Use short optimistic spin for hotplug case and fall back to
      sleeping loop.
    * Preserve original deadline for hotplug case, which was effectively
      100 seconds as coded.
    * Improve benchmark by timing __cpu_up() duration directly.

    Changes since v1:

    * Do not poll indefinitely; restore the original 5sec timeout

 arch/powerpc/kernel/smp.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 169703fead57..b7ce46bbc6f1 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1257,7 +1257,12 @@ static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
-	int rc, c;
+	const unsigned long boot_spin_ms = 5 * MSEC_PER_SEC;
+	const bool booting = system_state < SYSTEM_RUNNING;
+	const unsigned long hp_spin_ms = 1;
+	unsigned long deadline;
+	int rc;
+	const unsigned long spin_wait_ms = booting ? boot_spin_ms : hp_spin_ms;
 
 	/*
 	 * Don't allow secondary threads to come online if inhibited
@@ -1302,22 +1307,23 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 	}
 
 	/*
-	 * wait to see if the cpu made a callin (is actually up).
-	 * use this value that I found through experimentation.
-	 * -- Cort
+	 * At boot time, simply spin on the callin word until the
+	 * deadline passes.
+	 *
+	 * At run time, spin for an optimistic amount of time to avoid
+	 * sleeping in the common case.
 	 */
-	if (system_state < SYSTEM_RUNNING)
-		for (c = 5; c && !cpu_callin_map[cpu]; c--)
-			udelay(100);
-#ifdef CONFIG_HOTPLUG_CPU
-	else
-		/*
-		 * CPUs can take much longer to come up in the
-		 * hotplug case. Wait five seconds.
-		 */
-		for (c = 5000; c && !cpu_callin_map[cpu]; c--)
-			msleep(1);
-#endif
+	deadline = jiffies + msecs_to_jiffies(spin_wait_ms);
+	spin_until_cond(cpu_callin_map[cpu] || time_is_before_jiffies(deadline));
+
+	if (!cpu_callin_map[cpu] && system_state >= SYSTEM_RUNNING) {
+		const unsigned long sleep_interval_us = 10 * USEC_PER_MSEC;
+		const unsigned long sleep_wait_ms = 100 * MSEC_PER_SEC;
+
+		deadline = jiffies + msecs_to_jiffies(sleep_wait_ms);
+		while (!cpu_callin_map[cpu] && time_is_after_jiffies(deadline))
+			fsleep(sleep_interval_us);
+	}
 
 	if (!cpu_callin_map[cpu]) {
 		printk(KERN_ERR "Processor %u is stuck.\n", cpu);
--
2.37.1
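The waiting strategy this patch describes (spin briefly against a short deadline, then fall back to a sleeping poll with a longer deadline) can be tried outside the kernel with a small model. The sketch below is an illustration only, not the patch: the `wait_env` hooks and the `fake_*` simulated clock are hypothetical names introduced here so the logic can run deterministically without jiffies, spin_until_cond() or fsleep().

```c
#include <stdbool.h>

/* Hypothetical hooks so the sketch does not depend on any OS API. */
struct wait_env {
	unsigned long (*now_ms)(void *ctx);            /* monotonic clock */
	void (*sleep_ms)(void *ctx, unsigned long ms); /* blocking sleep */
	bool (*ready)(void *ctx);                      /* polled condition */
	void *ctx;
};

/*
 * Phase 1 busy-polls until spin_ms elapses. Phase 2 sleeps between
 * polls until sleep_total_ms elapses. Passing sleep_total_ms == 0
 * models the boot-time case, where only the spin phase is used.
 */
static bool wait_two_phase(const struct wait_env *env, unsigned long spin_ms,
			   unsigned long sleep_total_ms,
			   unsigned long sleep_step_ms)
{
	unsigned long deadline = env->now_ms(env->ctx) + spin_ms;

	while (!env->ready(env->ctx) && env->now_ms(env->ctx) < deadline)
		; /* optimistic spin: no sleep in the common case */

	if (env->ready(env->ctx))
		return true;

	deadline = env->now_ms(env->ctx) + sleep_total_ms;
	while (!env->ready(env->ctx) && env->now_ms(env->ctx) < deadline)
		env->sleep_ms(env->ctx, sleep_step_ms);

	return env->ready(env->ctx);
}

/* Simulated environment: the clock advances by 1 ms on every read. */
struct fake_cpu {
	unsigned long clock_ms;    /* current simulated time */
	unsigned long ready_at_ms; /* when the "CPU" reports in */
	unsigned long sleeps;      /* number of sleep calls made */
};

static unsigned long fake_now(void *ctx)
{
	struct fake_cpu *f = ctx;

	return f->clock_ms++;
}

static void fake_sleep(void *ctx, unsigned long ms)
{
	struct fake_cpu *f = ctx;

	f->clock_ms += ms;
	f->sleeps++;
}

static bool fake_ready(void *ctx)
{
	struct fake_cpu *f = ctx;

	return f->clock_ms >= f->ready_at_ms;
}
```

In this model a target that reports in within the spin window is detected without a single sleep call, which corresponds to the improvement the histograms in the commit message are meant to show; a slower target is still caught by the sleeping phase, and one that never reports in yields false once both deadlines pass.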
Re: [PATCH v2] powerpc: Ignore DSI error caused by the copy/paste instruction
On Mon, 2022-09-26 at 05:55 +, Christophe Leroy wrote:
>
> On 25/09/2022 at 22:26, Haren Myneni wrote:
> > DSI error will be generated when the paste operation is issued on
> > the suspended NX window due to NX state changes. The hypervisor
> > expects the partition to ignore this error during page pault
> > handling. To differentiate DSI caused by an actual HW configuration
> > or by the NX window, a new “ibm,pi-features” type value is defined.
> > Byte 0, bit 3 of pi-attribute-specifier-type is now defined to
> > indicate this DSI error. If this error is not ignored, the user
> > space can get SIGBUS when the NX request is issued.
>
> Would be nice to mention at least one time in the message that NX
> stands to nest accelerator.
>
> Otherwise, that's confusing with for exemple:
> Commit 2e602847d9c2 ("KVM: PPC: Don't flush PTEs on NX/RO hit")
> Commit c49643319715 ("powerpc/32s: Only leave NX unset on segments
> used for modules")

Thanks. I did not realize since VAS/NX code is added before. I will add
the description as you suggested.

> >
> > This patch adds changes to read ibm,pi-features property and ignore
> > DSI error in the page fault handling if CPU_FTR_NX_DSI if defined.
> >
> > Signed-off-by: Haren Myneni
> > ---
> > v2: Code cleanup as suggested by Christophe Leroy
> >
> >  arch/powerpc/include/asm/cputable.h |  5 ++--
> >  arch/powerpc/kernel/prom.c          | 36 +----
> >  arch/powerpc/mm/fault.c             | 17 +-
> >  3 files changed, 45 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index 014005428687..cb949f12baa9 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -367,7 +367,22 @@ static void sanity_check_fault(bool is_write, bool is_user,
> >  #elif defined(CONFIG_PPC_8xx)
> >  #define page_fault_is_bad(__err)	((__err) & DSISR_NOEXEC_OR_G)
> >  #elif defined(CONFIG_PPC64)
> > -#define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_64S)
> > +static int page_fault_is_bad(unsigned long err)
> > +{
> > +	unsigned long flag = DSISR_BAD_FAULT_64S;
> > +
> > +	/*
> > +	 * PAPR 14.15.3.4.1
> > +	 * If byte 0, bit 3 of pi-attribute-specifier-type in
> > +	 * ibm,pi-features property is defined, ignore the DSI error
> > +	 * which is caused by the paste instruction on the
> > +	 * suspended NX window.
> > +	 */
> > +	if (cpu_has_feature(CPU_FTR_NX_DSI))
> > +		flag &= ~DSISR_BAD_COPYPASTE;
> > +
> > +	return (err & flag);
>
> You don't need parenthesis ( )
>
> > +}
> >  #else
> >  #define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_32S)
> >  #endif
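The masking logic in the quoted page_fault_is_bad() can be exercised on its own with a small userspace model. This is a sketch under loud assumptions: the `SKETCH_` bit values are made up for illustration and are not the real DSISR_* definitions, and the feature test is passed in as a plain boolean instead of calling cpu_has_feature().

```c
#include <stdbool.h>

/* Illustrative bit values only; not the real powerpc DSISR_* masks. */
#define SKETCH_DSISR_BAD_COPYPASTE 0x00000008ul
#define SKETCH_DSISR_OTHER_BAD     0x00000030ul
#define SKETCH_DSISR_BAD_FAULT \
	(SKETCH_DSISR_BAD_COPYPASTE | SKETCH_DSISR_OTHER_BAD)

/*
 * Returns true when err contains a bit that should be treated as a
 * bad fault. With the (hypothetical) NX DSI feature present, the
 * copy/paste bit is removed from the mask and no longer counts.
 */
static bool sketch_page_fault_is_bad(unsigned long err, bool has_nx_dsi)
{
	unsigned long mask = SKETCH_DSISR_BAD_FAULT;

	if (has_nx_dsi)
		mask &= ~SKETCH_DSISR_BAD_COPYPASTE;

	return (err & mask) != 0;
}
```

Returning a bool from an explicit comparison is a choice made for this sketch; the quoted patch returns the masked value as an int, which behaves the same way in a truth test as long as the masked bits fit in an int.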
Re: [PATCH V2] tools/perf/tests: Fix perf probe error log check in skip_if_no_debuginfo
On Fri, Sep 16, 2022 at 06:35:41PM +0530, kajoljain wrote:
>
>
> On 9/16/22 16:19, Athira Rajeev wrote:
> > The perf probe related tests like probe_vfs_getname.sh which
> > is in "tools/perf/tests/shell" directory have dependency on
> > debuginfo information in the kernel. Currently debuginfo
> > check is handled by skip_if_no_debuginfo function in the
> > file "lib/probe_vfs_getname.sh". skip_if_no_debuginfo function
> > looks for this specific error log from perf probe to skip
> > the testcase:
> >
> > <<>>
> > Failed to find the path for the kernel|Debuginfo-analysis is
> > not supported
> > <>>
> >
> > But in some case, like this one in powerpc, while running this
> > test, observed error logs is:
> >
> > <<>>
> > The /lib/modules//build/vmlinux file has no debug information.
> > Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo
> > package.
> > Error: Failed to add events.
> > <<>>
> >
> > Update the skip_if_no_debuginfo function to include the above
> > error, to skip the test in these scenarios too.
>
> Patch looks good to me.
>
> Reviewed-By: Kajol Jain

Thanks, applied.

- Arnaldo

> Thanks,
> Kajol Jain
>
> >
> > Reported-by: Disha Goel
> > Signed-off-by: Athira Rajeev
> > ---
> > changelog:
> > v1 -> v2:
> > Corrected formatting of spaces in error log.
> > With spaces in v1 of the patch, the egrep search was
> > considering spaces also.
> >
> >  tools/perf/tests/shell/lib/probe_vfs_getname.sh | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/tests/shell/lib/probe_vfs_getname.sh b/tools/perf/tests/shell/lib/probe_vfs_getname.sh
> > index 5b17d916c555..b616d42bd19d 100644
> > --- a/tools/perf/tests/shell/lib/probe_vfs_getname.sh
> > +++ b/tools/perf/tests/shell/lib/probe_vfs_getname.sh
> > @@ -19,6 +19,6 @@ add_probe_vfs_getname() {
> >  }
> >
> >  skip_if_no_debuginfo() {
> > -	add_probe_vfs_getname -v 2>&1 | egrep -q "^(Failed to find the path for the kernel|Debuginfo-analysis is not supported)" && return 2
> > +	add_probe_vfs_getname -v 2>&1 | egrep -q "^(Failed to find the path for the kernel|Debuginfo-analysis is not supported)|(file has no debug information)" && return 2
> >  	return 1
> >  }

--

- Arnaldo
[PATCH net-next v5 1/9] dt-bindings: net: Expand pcs-handle to an array
This allows multiple phandles to be specified for pcs-handle, such as when multiple PCSs are present for a single MAC. To differentiate between them, also add a pcs-handle-names property. Signed-off-by: Sean Anderson --- This was previously submitted as [1]. I expect to update this series more, so I have moved it here. Changes from that version include: - Add maxItems to existing bindings - Add a dependency from pcs-names to pcs-handle. [1] https://lore.kernel.org/netdev/20220711160519.741990-3-sean.ander...@seco.com/ (no changes since v4) Changes in v4: - Use pcs-handle-names instead of pcs-names, as discussed Changes in v3: - New .../bindings/net/dsa/renesas,rzn1-a5psw.yaml | 1 + .../devicetree/bindings/net/ethernet-controller.yaml | 10 +- .../devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml| 2 +- 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml b/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml index 7ca9c19a157c..a53552ee1d0e 100644 --- a/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml +++ b/Documentation/devicetree/bindings/net/dsa/renesas,rzn1-a5psw.yaml @@ -74,6 +74,7 @@ properties: properties: pcs-handle: +maxItems: 1 description: phandle pointing to a PCS sub-node compatible with renesas,rzn1-miic.yaml# diff --git a/Documentation/devicetree/bindings/net/ethernet-controller.yaml b/Documentation/devicetree/bindings/net/ethernet-controller.yaml index 4b3c590fcebf..5bb2ec2963cf 100644 --- a/Documentation/devicetree/bindings/net/ethernet-controller.yaml +++ b/Documentation/devicetree/bindings/net/ethernet-controller.yaml @@ -108,11 +108,16 @@ properties: $ref: "#/properties/phy-connection-type" pcs-handle: -$ref: /schemas/types.yaml#/definitions/phandle +$ref: /schemas/types.yaml#/definitions/phandle-array description: Specifies a reference to a node representing a PCS PHY device on a MDIO bus to link with an external PHY (phy-handle) if exists. 
+ pcs-handle-names: +$ref: /schemas/types.yaml#/definitions/string-array +description: + The name of each PCS in pcs-handle. + phy-handle: $ref: /schemas/types.yaml#/definitions/phandle description: @@ -216,6 +221,9 @@ properties: required: - speed +dependencies: + pcs-handle-names: [pcs-handle] + allOf: - if: properties: diff --git a/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml b/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml index 7f620a71a972..600240281e8c 100644 --- a/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml +++ b/Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml @@ -31,7 +31,7 @@ properties: phy-mode: true pcs-handle: -$ref: /schemas/types.yaml#/definitions/phandle +maxItems: 1 description: A reference to a node representing a PCS PHY device found on the internal MDIO bus. -- 2.35.1.1320.gc452695387.dirty
[PATCH net-next v5 9/9] arm64: dts: layerscape: Add nodes for QSGMII PCSs
Now that we actually read registers from QSGMII PCSs, it's important that we have the correct address (instead of hoping that we're the MAC with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII PCSs. The exact mapping of QSGMII to MACs depends on the SoC. Since the first QSGMII PCSs share an address with the SGMII and XFI PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts on the bus. Signed-off-by: Sean Anderson --- (no changes since v3) Changes in v3: - Split this patch off from the previous one Changes in v2: - New .../boot/dts/freescale/fsl-ls1043-post.dtsi | 24 ++ .../boot/dts/freescale/fsl-ls1046-post.dtsi | 25 +++ 2 files changed, 49 insertions(+) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1043-post.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1043-post.dtsi index d237162a8744..5c4d7eef8b61 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1043-post.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1043-post.dtsi @@ -24,9 +24,12 @@ { /* these aliases provide the FMan ports mapping */ enet0: ethernet@e { + pcs-handle-names = "qsgmii"; }; enet1: ethernet@e2000 { + pcsphy-handle = <>, <_pcs1>; + pcs-handle-names = "sgmii", "qsgmii"; }; enet2: ethernet@e4000 { @@ -36,11 +39,32 @@ enet3: ethernet@e6000 { }; enet4: ethernet@e8000 { + pcsphy-handle = <>, <_pcs2>; + pcs-handle-names = "sgmii", "qsgmii"; }; enet5: ethernet@ea000 { + pcsphy-handle = <>, <_pcs3>; + pcs-handle-names = "sgmii", "qsgmii"; }; enet6: ethernet@f { }; + + mdio@e1000 { + qsgmiib_pcs1: ethernet-pcs@1 { + compatible = "fsl,lynx-pcs"; + reg = <0x1>; + }; + + qsgmiib_pcs2: ethernet-pcs@2 { + compatible = "fsl,lynx-pcs"; + reg = <0x2>; + }; + + qsgmiib_pcs3: ethernet-pcs@3 { + compatible = "fsl,lynx-pcs"; + reg = <0x3>; + }; + }; }; diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046-post.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046-post.dtsi index d6caaea57d90..4e3345093943 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1046-post.dtsi +++ 
b/arch/arm64/boot/dts/freescale/fsl-ls1046-post.dtsi @@ -23,6 +23,8 @@ { { /* these aliases provide the FMan ports mapping */ enet0: ethernet@e { + pcsphy-handle = <_pcs3>; + pcs-handle-names = "qsgmii"; }; enet1: ethernet@e2000 { @@ -35,14 +37,37 @@ enet3: ethernet@e6000 { }; enet4: ethernet@e8000 { + pcsphy-handle = <>, <_pcs1>; + pcs-handle-names = "sgmii", "qsgmii"; }; enet5: ethernet@ea000 { + pcsphy-handle = <>, <>; + pcs-handle-names = "sgmii", "qsgmii"; }; enet6: ethernet@f { }; enet7: ethernet@f2000 { + pcsphy-handle = <>, <_pcs2>, <>; + pcs-handle-names = "sgmii", "qsgmii", "xfi"; + }; + + mdio@eb000 { + qsgmiib_pcs1: ethernet-pcs@1 { + compatible = "fsl,lynx-pcs"; + reg = <0x1>; + }; + + qsgmiib_pcs2: ethernet-pcs@2 { + compatible = "fsl,lynx-pcs"; + reg = <0x2>; + }; + + qsgmiib_pcs3: ethernet-pcs@3 { + compatible = "fsl,lynx-pcs"; + reg = <0x3>; + }; }; }; -- 2.35.1.1320.gc452695387.dirty
[PATCH net-next v5 8/9] powerpc: dts: qoriq: Add nodes for QSGMII PCSs
Now that we actually read registers from QSGMII PCSs, it's important that we have the correct address (instead of hoping that we're the MAC with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII PCSs. They have the same addresses on all SoCs (e.g. if QSGMIIA is present it's used for MACs 1 through 4). Since the first QSGMII PCSs share an address with the SGMII and XFI PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts on the bus. Signed-off-by: Sean Anderson --- (no changes since v4) Changes in v4: - Add XFI PCS for t208x MAC1/MAC2 Changes in v3: - Add compatibles for QSGMII PCSs - Split arm and powerpcs dts updates Changes in v2: - New .../boot/dts/fsl/qoriq-fman3-0-10g-0-best-effort.dtsi | 3 ++- arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-0-10g-1-best-effort.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi | 3 ++- arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi | 3 ++- arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-0.dtsi | 3 ++- arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-1.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-2.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-3.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-4.dtsi | 3 ++- arch/powerpc/boot/dts/fsl/qoriq-fman3-0-1g-5.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-1-10g-0.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-1-10g-1.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-0.dtsi | 3 ++- arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-1.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-2.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-3.dtsi | 10 +- arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-4.dtsi | 3 ++- arch/powerpc/boot/dts/fsl/qoriq-fman3-1-1g-5.dtsi | 10 +- 20 files changed, 131 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0-best-effort.dtsi 
b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0-best-effort.dtsi index baa0c503e741..7e70977f282a 100644 --- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0-best-effort.dtsi +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0-best-effort.dtsi @@ -55,7 +55,8 @@ ethernet@e { reg = <0xe 0x1000>; fsl,fman-ports = <_rx_0x08 _tx_0x28>; ptp-timer = <_timer0>; - pcsphy-handle = <>; + pcsphy-handle = <>, <>; + pcs-handle-names = "sgmii", "qsgmii"; }; mdio@e1000 { diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi index 93095600e808..5f89f7c1761f 100644 --- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi @@ -52,7 +52,15 @@ ethernet@f { compatible = "fsl,fman-memac"; reg = <0xf 0x1000>; fsl,fman-ports = <_rx_0x10 _tx_0x30>; - pcsphy-handle = <>; + pcsphy-handle = <>, <_pcs2>, <>; + pcs-handle-names = "sgmii", "qsgmii", "xfi"; + }; + + mdio@e9000 { + qsgmiib_pcs2: ethernet-pcs@2 { + compatible = "fsl,lynx-pcs"; + reg = <2>; + }; }; mdio@f1000 { diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1-best-effort.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1-best-effort.dtsi index ff4bd38f0645..71eb75e82c2e 100644 --- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1-best-effort.dtsi +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1-best-effort.dtsi @@ -55,7 +55,15 @@ ethernet@e2000 { reg = <0xe2000 0x1000>; fsl,fman-ports = <_rx_0x09 _tx_0x29>; ptp-timer = <_timer0>; - pcsphy-handle = <>; + pcsphy-handle = <>, <_pcs1>; + pcs-handle-names = "sgmii", "qsgmii"; + }; + + mdio@e1000 { + qsgmiia_pcs1: ethernet-pcs@1 { + compatible = "fsl,lynx-pcs"; + reg = <1>; + }; }; mdio@e3000 { diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi index 1fa38ed6f59e..fb7032ddb7fc 100644 --- a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi 
@@ -52,7 +52,15 @@ ethernet@f2000 { compatible = "fsl,fman-memac"; reg = <0xf2000 0x1000>; fsl,fman-ports = <_rx_0x11 _tx_0x31>; - pcsphy-handle = <>; + pcsphy-handle = <>, <_pcs3>, <>; +
[PATCH net-next v5 7/9] powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G
On the T208X SoCs, MAC1 and MAC2 support XGMII. Add some new MAC dtsi fragments, and mark the QMAN ports as 10G. Fixes: da414bb923d9 ("powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)") Signed-off-by: Sean Anderson --- (no changes since v4) Changes in v4: - New .../boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi | 44 +++ .../boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi | 44 +++ arch/powerpc/boot/dts/fsl/t2081si-post.dtsi | 4 +- 3 files changed, 90 insertions(+), 2 deletions(-) create mode 100644 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi create mode 100644 arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi new file mode 100644 index ..437dab3fc017 --- /dev/null +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0-or-later +/* + * QorIQ FMan v3 10g port #2 device tree stub [ controller @ offset 0x40 ] + * + * Copyright 2022 Sean Anderson + * Copyright 2012 - 2015 Freescale Semiconductor Inc. 
+ */ + +fman@40 { + fman0_rx_0x08: port@88000 { + cell-index = <0x8>; + compatible = "fsl,fman-v3-port-rx"; + reg = <0x88000 0x1000>; + fsl,fman-10g-port; + }; + + fman0_tx_0x28: port@a8000 { + cell-index = <0x28>; + compatible = "fsl,fman-v3-port-tx"; + reg = <0xa8000 0x1000>; + fsl,fman-10g-port; + }; + + ethernet@e { + cell-index = <0>; + compatible = "fsl,fman-memac"; + reg = <0xe 0x1000>; + fsl,fman-ports = <_rx_0x08 _tx_0x28>; + ptp-timer = <_timer0>; + pcsphy-handle = <>; + }; + + mdio@e1000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "fsl,fman-memac-mdio", "fsl,fman-xmdio"; + reg = <0xe1000 0x1000>; + fsl,erratum-a011043; /* must ignore read errors */ + + pcsphy0: ethernet-phy@0 { + reg = <0x0>; + }; + }; +}; diff --git a/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi new file mode 100644 index ..ad116b17850a --- /dev/null +++ b/arch/powerpc/boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0-or-later +/* + * QorIQ FMan v3 10g port #3 device tree stub [ controller @ offset 0x40 ] + * + * Copyright 2022 Sean Anderson + * Copyright 2012 - 2015 Freescale Semiconductor Inc. 
+ */ + +fman@40 { + fman0_rx_0x09: port@89000 { + cell-index = <0x9>; + compatible = "fsl,fman-v3-port-rx"; + reg = <0x89000 0x1000>; + fsl,fman-10g-port; + }; + + fman0_tx_0x29: port@a9000 { + cell-index = <0x29>; + compatible = "fsl,fman-v3-port-tx"; + reg = <0xa9000 0x1000>; + fsl,fman-10g-port; + }; + + ethernet@e2000 { + cell-index = <1>; + compatible = "fsl,fman-memac"; + reg = <0xe2000 0x1000>; + fsl,fman-ports = <_rx_0x09 _tx_0x29>; + ptp-timer = <_timer0>; + pcsphy-handle = <>; + }; + + mdio@e3000 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "fsl,fman-memac-mdio", "fsl,fman-xmdio"; + reg = <0xe3000 0x1000>; + fsl,erratum-a011043; /* must ignore read errors */ + + pcsphy1: ethernet-phy@0 { + reg = <0x0>; + }; + }; +}; diff --git a/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi b/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi index ecbb447920bc..74e17e134387 100644 --- a/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi +++ b/arch/powerpc/boot/dts/fsl/t2081si-post.dtsi @@ -609,8 +609,8 @@ usb1: usb@211000 { /include/ "qoriq-bman1.dtsi" /include/ "qoriq-fman3-0.dtsi" -/include/ "qoriq-fman3-0-1g-0.dtsi" -/include/ "qoriq-fman3-0-1g-1.dtsi" +/include/ "qoriq-fman3-0-10g-2.dtsi" +/include/ "qoriq-fman3-0-10g-3.dtsi" /include/ "qoriq-fman3-0-1g-2.dtsi" /include/ "qoriq-fman3-0-1g-3.dtsi" /include/ "qoriq-fman3-0-1g-4.dtsi" -- 2.35.1.1320.gc452695387.dirty
[PATCH net-next v5 6/9] net: dpaa: Convert to phylink
This converts DPAA to phylink. All macs are converted. This should work with no device tree modifications (including those made in this series), except for QSGMII (as noted previously). The mEMAC configuration is one of the trickier areas. I have tried to capture all the restrictions across the various models. Most of the time, we assume that if the serdes supports a mode or the phy-interface-mode specifies it, then we support it. The only place we can't do this is (RG)MII, since there's no serdes. In that case, we rely on a (new) devicetree property. There are also several cases where half-duplex is broken. Unfortunately, only a single compatible is used for the MAC, so we have to use the board compatible instead. The 10GEC conversion is very straightforward, since it only supports XAUI. There is generally nothing to configure. The dTSEC conversion is broadly similar to mEMAC, but is simpler because we don't support configuring the SerDes (though this can be easily added) and we don't have multiple PCSs. From what I can tell, there's nothing different in the driver or documentation between SGMII and 1000BASE-X except for the advertising. Similarly, I couldn't find anything about 2500BASE-X. In both cases, I treat them like SGMII. These modes aren't used by any in-tree boards. Similarly, despite being mentioned in the driver, I couldn't find any documented SoCs which supported QSGMII. I have left it unimplemented for now. 10GEC and dTSEC have not been tested at all. I would greatly appreciate if someone could try them out. Signed-off-by: Sean Anderson --- This has been tested on an LS1046ARDB. With managed=phy, I was unable to get the interfaces to come up at all, hence the default to in-band.
(no changes since v3) Changes in v3: - Remove _return label from memac_initialization in favor of returning directly - Fix grabbing the default PCS not checking for -ENODATA from of_property_match_string - Set DTSEC_ECNTRL_R100M in dtsec_link_up instead of dtsec_mac_config - Remove rmii/mii properties Changes in v2: - Remove unused variable slow_10g_if - Restrict valid link modes based on the phy interface. This is easier to set up, and mostly captures what I intended to do the first time. We now have a custom validate which restricts half-duplex for some SoCs for RGMII, but generally just uses the default phylink validate. - Configure the SerDes in enable/disable - Properly implement all ethtool ops and ioctls. These were mostly stubbed out just enough to compile last time. - Convert 10GEC and dTSEC as well drivers/net/ethernet/freescale/dpaa/Kconfig | 4 +- .../net/ethernet/freescale/dpaa/dpaa_eth.c| 89 +-- .../ethernet/freescale/dpaa/dpaa_ethtool.c| 90 +-- drivers/net/ethernet/freescale/fman/Kconfig | 1 - .../net/ethernet/freescale/fman/fman_dtsec.c | 459 +++--- .../net/ethernet/freescale/fman/fman_mac.h| 10 - .../net/ethernet/freescale/fman/fman_memac.c | 578 +- .../net/ethernet/freescale/fman/fman_tgec.c | 131 ++-- drivers/net/ethernet/freescale/fman/mac.c | 168 + drivers/net/ethernet/freescale/fman/mac.h | 23 +- 10 files changed, 629 insertions(+), 924 deletions(-) diff --git a/drivers/net/ethernet/freescale/dpaa/Kconfig b/drivers/net/ethernet/freescale/dpaa/Kconfig index 0e1439fd00bd..2b560661c82a 100644 --- a/drivers/net/ethernet/freescale/dpaa/Kconfig +++ b/drivers/net/ethernet/freescale/dpaa/Kconfig @@ -2,8 +2,8 @@ menuconfig FSL_DPAA_ETH tristate "DPAA Ethernet" depends on FSL_DPAA && FSL_FMAN - select PHYLIB - select FIXED_PHY + select PHYLINK + select PCS_LYNX help Data Path Acceleration Architecture Ethernet driver, supporting the Freescale QorIQ chips. 
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c index 0a180d17121c..262a2558353b 100644 --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c @@ -264,8 +264,19 @@ static int dpaa_netdev_init(struct net_device *net_dev, net_dev->needed_headroom = priv->tx_headroom; net_dev->watchdog_timeo = msecs_to_jiffies(tx_timeout); - mac_dev->net_dev = net_dev; + /* The rest of the config is filled in by the mac device already */ + mac_dev->phylink_config.dev = _dev->dev; + mac_dev->phylink_config.type = PHYLINK_NETDEV; mac_dev->update_speed = dpaa_eth_cgr_set_speed; + mac_dev->phylink = phylink_create(_dev->phylink_config, + dev_fwnode(mac_dev->dev), + mac_dev->phy_if, + mac_dev->phylink_ops); + if (IS_ERR(mac_dev->phylink)) { + err = PTR_ERR(mac_dev->phylink); + dev_err_probe(dev, err, "Could not create phylink\n"); + return err; + } /* start without
[PATCH net-next v5 5/9] net: fman: memac: Use lynx pcs driver
Although not stated in the datasheet, as far as I can tell PCS for mEMACs is a "Lynx." By reusing the existing driver, we can remove the PCS management code from the memac driver. This requires calling some PCS functions manually which phylink would usually do for us, but we will let it do that soon. One problem is that we don't actually have a PCS for QSGMII. We pretend that each mEMAC's MDIO bus has four QSGMII PCSs, but this is not the case. Only the "base" mEMAC's MDIO bus has the four QSGMII PCSs. This is not an issue yet, because we never get the PCS state. However, it will be once the conversion to phylink is complete, since the links will appear to never come up. To get around this, we allow specifying multiple PCSs in pcsphy. This breaks backwards compatibility with old device trees, but only for QSGMII. IMO this is the only reasonable way to figure out what the actual QSGMII PCS is. Additionally, we now also support a separate XFI PCS. This can allow the SerDes driver to set different addresses for the SGMII and XFI PCSs so they can be accessed at the same time. Signed-off-by: Sean Anderson --- (no changes since v3) Changes in v3: - Put the PCS mdiodev only after we are done with it (since the PCS does not perform a get itself). 
Changes in v2: - Move PCS_LYNX dependency to fman Kconfig drivers/net/ethernet/freescale/fman/Kconfig | 3 + .../net/ethernet/freescale/fman/fman_memac.c | 257 +++--- 2 files changed, 104 insertions(+), 156 deletions(-) diff --git a/drivers/net/ethernet/freescale/fman/Kconfig b/drivers/net/ethernet/freescale/fman/Kconfig index 48bf8088795d..8f5637db41dd 100644 --- a/drivers/net/ethernet/freescale/fman/Kconfig +++ b/drivers/net/ethernet/freescale/fman/Kconfig @@ -4,6 +4,9 @@ config FSL_FMAN depends on FSL_SOC || ARCH_LAYERSCAPE || COMPILE_TEST select GENERIC_ALLOCATOR select PHYLIB + select PHYLINK + select PCS + select PCS_LYNX select CRC32 default n help diff --git a/drivers/net/ethernet/freescale/fman/fman_memac.c b/drivers/net/ethernet/freescale/fman/fman_memac.c index 56a29f505590..80ae34bea818 100644 --- a/drivers/net/ethernet/freescale/fman/fman_memac.c +++ b/drivers/net/ethernet/freescale/fman/fman_memac.c @@ -11,43 +11,12 @@ #include #include +#include #include #include #include #include -/* PCS registers */ -#define MDIO_SGMII_CR 0x00 -#define MDIO_SGMII_DEV_ABIL_SGMII 0x04 -#define MDIO_SGMII_LINK_TMR_L 0x12 -#define MDIO_SGMII_LINK_TMR_H 0x13 -#define MDIO_SGMII_IF_MODE 0x14 - -/* SGMII Control defines */ -#define SGMII_CR_AN_EN 0x1000 -#define SGMII_CR_RESTART_AN0x0200 -#define SGMII_CR_FD0x0100 -#define SGMII_CR_SPEED_SEL1_1G 0x0040 -#define SGMII_CR_DEF_VAL (SGMII_CR_AN_EN | SGMII_CR_FD | \ -SGMII_CR_SPEED_SEL1_1G) - -/* SGMII Device Ability for SGMII defines */ -#define MDIO_SGMII_DEV_ABIL_SGMII_MODE 0x4001 -#define MDIO_SGMII_DEV_ABIL_BASEX_MODE 0x01A0 - -/* Link timer define */ -#define LINK_TMR_L 0xa120 -#define LINK_TMR_H 0x0007 -#define LINK_TMR_L_BASEX 0xaf08 -#define LINK_TMR_H_BASEX 0x002f - -/* SGMII IF Mode defines */ -#define IF_MODE_USE_SGMII_AN 0x0002 -#define IF_MODE_SGMII_EN 0x0001 -#define IF_MODE_SGMII_SPEED_100M 0x0004 -#define IF_MODE_SGMII_SPEED_1G 0x0008 -#define IF_MODE_SGMII_DUPLEX_HALF 0x0010 - /* Num of additional exact match 
MAC adr regs */ #define MEMAC_NUM_OF_PADDRS 7 @@ -326,7 +295,9 @@ struct fman_mac { struct fman_rev_info fm_rev_info; bool basex_if; struct phy *serdes; - struct phy_device *pcsphy; + struct phylink_pcs *sgmii_pcs; + struct phylink_pcs *qsgmii_pcs; + struct phylink_pcs *xfi_pcs; bool allmulti_enabled; }; @@ -487,91 +458,22 @@ static u32 get_mac_addr_hash_code(u64 eth_addr) return xor_val; } -static void setup_sgmii_internal_phy(struct fman_mac *memac, -struct fixed_phy_status *fixed_link) +static void setup_sgmii_internal(struct fman_mac *memac, +struct phylink_pcs *pcs, +struct fixed_phy_status *fixed_link) { - u16 tmp_reg16; - - if (WARN_ON(!memac->pcsphy)) - return; - - /* SGMII mode */ - tmp_reg16 = IF_MODE_SGMII_EN; - if (!fixed_link) - /* AN enable */ - tmp_reg16 |= IF_MODE_USE_SGMII_AN; - else { - switch (fixed_link->speed) { - case 10: - /* For 10M: IF_MODE[SPEED_10M] = 0 */ - break; - case 100: - tmp_reg16 |= IF_MODE_SGMII_SPEED_100M; - break; -
[PATCH net-next v5 4/9] net: fman: memac: Add serdes support
This adds support for using a serdes which has to be configured. This is primarily in preparation for the next commit, which will then change the serdes mode dynamically. Signed-off-by: Sean Anderson --- (no changes since v4) Changes in v4: - Don't fail if phy support was not compiled in .../net/ethernet/freescale/fman/fman_memac.c | 49 ++- 1 file changed, 47 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/freescale/fman/fman_memac.c b/drivers/net/ethernet/freescale/fman/fman_memac.c index 32d26cf17843..56a29f505590 100644 --- a/drivers/net/ethernet/freescale/fman/fman_memac.c +++ b/drivers/net/ethernet/freescale/fman/fman_memac.c @@ -13,6 +13,7 @@ #include #include #include +#include #include /* PCS registers */ @@ -324,6 +325,7 @@ struct fman_mac { void *fm; struct fman_rev_info fm_rev_info; bool basex_if; + struct phy *serdes; struct phy_device *pcsphy; bool allmulti_enabled; }; @@ -1203,17 +1205,56 @@ int memac_initialization(struct mac_device *mac_dev, } } + memac->serdes = devm_of_phy_get(mac_dev->dev, mac_node, "serdes"); + err = PTR_ERR(memac->serdes); + if (err == -ENODEV || err == -ENOSYS) { + dev_dbg(mac_dev->dev, "could not get (optional) serdes\n"); + memac->serdes = NULL; + } else if (IS_ERR(memac->serdes)) { + dev_err_probe(mac_dev->dev, err, "could not get serdes\n"); + goto _return_fm_mac_free; + } else { + err = phy_init(memac->serdes); + if (err) { + dev_err_probe(mac_dev->dev, err, + "could not initialize serdes\n"); + goto _return_fm_mac_free; + } + + err = phy_power_on(memac->serdes); + if (err) { + dev_err_probe(mac_dev->dev, err, + "could not power on serdes\n"); + goto _return_phy_exit; + } + + if (memac->phy_if == PHY_INTERFACE_MODE_SGMII || + memac->phy_if == PHY_INTERFACE_MODE_1000BASEX || + memac->phy_if == PHY_INTERFACE_MODE_2500BASEX || + memac->phy_if == PHY_INTERFACE_MODE_QSGMII || + memac->phy_if == PHY_INTERFACE_MODE_XGMII) { + err = phy_set_mode_ext(memac->serdes, PHY_MODE_ETHERNET, + memac->phy_if); + if (err) { + 
dev_err_probe(mac_dev->dev, err, + "could not set serdes mode to %s\n", + phy_modes(memac->phy_if)); + goto _return_phy_power_off; + } + } + } + if (!mac_dev->phy_node && of_phy_is_fixed_link(mac_node)) { struct phy_device *phy; err = of_phy_register_fixed_link(mac_node); if (err) - goto _return_fm_mac_free; + goto _return_phy_power_off; fixed_link = kzalloc(sizeof(*fixed_link), GFP_KERNEL); if (!fixed_link) { err = -ENOMEM; - goto _return_fm_mac_free; + goto _return_phy_power_off; } mac_dev->phy_node = of_node_get(mac_node); @@ -1242,6 +1283,10 @@ int memac_initialization(struct mac_device *mac_dev, goto _return; +_return_phy_power_off: + phy_power_off(memac->serdes); +_return_phy_exit: + phy_exit(memac->serdes); _return_fixed_link_free: kfree(fixed_link); _return_fm_mac_free: -- 2.35.1.1320.gc452695387.dirty
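
For readers following the error handling in the hunk above, here is a minimal user-space sketch of the same ordering: treat a missing serdes as optional, then init, power on and set the mode, unwinding in reverse on failure. It is only an illustration under stated assumptions. struct fake_phy, the fake_phy_*() helpers and serdes_bring_up() are hypothetical stand-ins, not kernel APIs; the real code uses phy_init(), phy_power_on(), phy_set_mode_ext(), phy_power_off() and phy_exit() as shown in the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a generic PHY; it only records state. */
struct fake_phy {
	int fail_init;      /* error to return from init, 0 for success */
	int fail_power_on;  /* error to return from power on */
	int fail_set_mode;  /* error to return from set mode */
	int inited;
	int powered;
};

static int fake_phy_init(struct fake_phy *p)
{
	if (p->fail_init)
		return p->fail_init;
	p->inited = 1;
	return 0;
}

static void fake_phy_exit(struct fake_phy *p)
{
	p->inited = 0;
}

static int fake_phy_power_on(struct fake_phy *p)
{
	if (p->fail_power_on)
		return p->fail_power_on;
	p->powered = 1;
	return 0;
}

static void fake_phy_power_off(struct fake_phy *p)
{
	p->powered = 0;
}

static int fake_phy_set_mode(struct fake_phy *p)
{
	return p->fail_set_mode;
}

/*
 * get_err models the error decoded from the lookup (0 when the serdes
 * was found). -ENODEV and -ENOSYS mean "no serdes", which is allowed.
 */
int serdes_bring_up(struct fake_phy *serdes, int get_err)
{
	int err;

	if (get_err == -ENODEV || get_err == -ENOSYS)
		return 0;          /* optional serdes is absent: not an error */
	if (get_err)
		return get_err;    /* any other lookup failure is fatal */

	err = fake_phy_init(serdes);
	if (err)
		return err;        /* nothing to undo yet */

	err = fake_phy_power_on(serdes);
	if (err)
		goto err_exit;     /* undo init only */

	err = fake_phy_set_mode(serdes);
	if (err)
		goto err_power_off; /* undo power on, then init */

	return 0;

err_power_off:
	fake_phy_power_off(serdes);
err_exit:
	fake_phy_exit(serdes);
	return err;
}
```

The labels fall through in reverse order of acquisition, which is the same shape as _return_phy_power_off and _return_phy_exit in the patch.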
[PATCH net-next v5 3/9] dt-bindings: net: fman: Add additional interface properties
At the moment, mEMACs are configured almost completely based on the phy-connection-type. That is, if the phy interface is RGMII, it is assumed that RGMII is supported. For some interfaces, it is assumed that the RCW/bootloader has set up the SerDes properly. This is generally OK, but restricts runtime reconfiguration. The actual link state is never reported. To address these shortcomings, the driver will need additional information. First, it needs to know how to access the PCS/PMAs (in order to configure them and get the link status). The SGMII PCS/PMA is the only currently-described PCS/PMA. Add the XFI and QSGMII PCS/PMAs as well. The XFI (and 10GBASE-KR) PCS/PMA is a c45 "phy" which sits on the same MDIO bus as SGMII PCS/PMA. By default they will have conflicting addresses, but they are also not enabled at the same time by default. Therefore, we can let the XFI PCS/PMA be the default when phy-connection-type is xgmii. This will allow for backwards-compatibility. QSGMII, however, cannot work with the current binding. This is because the QSGMII PCS/PMAs are only present on one MAC's MDIO bus. At the moment this is worked around by having every MAC write to the PCS/PMA addresses (without checking if they are present). This only works if each MAC has the same configuration, and only if we don't need to know the status. Because the QSGMII PCS/PMA will typically be located on a different MDIO bus than the MAC's SGMII PCS/PMA, there is no fallback for the QSGMII PCS/PMA. Signed-off-by: Sean Anderson Reviewed-by: Rob Herring --- (no changes since v3) Changes in v3: - Add vendor prefix 'fsl,' to rgmii and mii properties. - Set maxItems for pcs-names - Remove phy-* properties from example because dt-schema complains and I can't be bothered to figure out how to make it work. 
- Add pcs-handle as a preferred version of pcsphy-handle - Deprecate pcsphy-handle - Remove mii/rmii properties Changes in v2: - Better document how we select which PCS to use in the default case .../bindings/net/fsl,fman-dtsec.yaml | 53 ++- .../devicetree/bindings/net/fsl-fman.txt | 5 +- 2 files changed, 43 insertions(+), 15 deletions(-) diff --git a/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml b/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml index 3a35ac1c260d..c80c880a9dab 100644 --- a/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml +++ b/Documentation/devicetree/bindings/net/fsl,fman-dtsec.yaml @@ -85,9 +85,39 @@ properties: $ref: /schemas/types.yaml#/definitions/phandle description: A reference to the IEEE1588 timer + phys: +description: A reference to the SerDes lane(s) +maxItems: 1 + + phy-names: +items: + - const: serdes + pcsphy-handle: -$ref: /schemas/types.yaml#/definitions/phandle -description: A reference to the PCS (typically found on the SerDes) +$ref: /schemas/types.yaml#/definitions/phandle-array +minItems: 1 +maxItems: 3 +deprecated: true +description: See pcs-handle. + + pcs-handle: +minItems: 1 +maxItems: 3 +description: | + A reference to the various PCSs (typically found on the SerDes). If + pcs-handle-names is absent, and phy-connection-type is "xgmii", then the first + reference will be assumed to be for "xfi". Otherwise, if pcs-handle-names is + absent, then the first reference will be assumed to be for "sgmii". + + pcs-handle-names: +minItems: 1 +maxItems: 3 +items: + enum: +- sgmii +- qsgmii +- xfi +description: The type of each PCS in pcsphy-handle. 
tbi-handle: $ref: /schemas/types.yaml#/definitions/phandle @@ -100,6 +130,10 @@ required: - fsl,fman-ports - ptp-timer +dependencies: + pcs-handle-names: +- pcs-handle + allOf: - $ref: ethernet-controller.yaml# - if: @@ -110,14 +144,6 @@ allOf: then: required: - tbi-handle - - if: - properties: -compatible: - contains: -const: fsl,fman-memac -then: - required: -- pcsphy-handle unevaluatedProperties: false @@ -138,8 +164,9 @@ examples: reg = <0xe8000 0x1000>; fsl,fman-ports = <_rx_0x0c _tx_0x2c>; ptp-timer = <_timer0>; -pcsphy-handle = <>; -phy-handle = <_phy1>; -phy-connection-type = "sgmii"; +pcs-handle = <>, <_pcs1>; +pcs-handle-names = "sgmii", "qsgmii"; +phys = < 1>; +phy-names = "serdes"; }; ... diff --git a/Documentation/devicetree/bindings/net/fsl-fman.txt b/Documentation/devicetree/bindings/net/fsl-fman.txt index b9055335db3b..bda4b41af074 100644 --- a/Documentation/devicetree/bindings/net/fsl-fman.txt +++ b/Documentation/devicetree/bindings/net/fsl-fman.txt @@ -320,8 +320,9 @@ For internal PHY device on internal mdio bus, a PHY node should be created. See the definition of the PHY node in
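
The fallback rule in the pcs-handle description above can be read as a small decision function. The sketch below is only an illustration of that wording, not the driver's code: pcs_type_for_index() is a hypothetical helper, and returning NULL for entries other than the first when pcs-handle-names is absent is an assumption, since the binding text only defines the first reference in that case.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: which PCS type is assumed for entry idx of
 * pcs-handle. names is NULL when pcs-handle-names is absent.
 */
const char *pcs_type_for_index(const char *const *names, size_t n_names,
			       const char *phy_connection_type, size_t idx)
{
	/* With pcs-handle-names, each entry is named explicitly. */
	if (names)
		return idx < n_names ? names[idx] : NULL;

	/* Assumption: without names only the first reference is defined. */
	if (idx != 0)
		return NULL;

	/* "xgmii" selects the XFI PCS as the default... */
	if (phy_connection_type && strcmp(phy_connection_type, "xgmii") == 0)
		return "xfi";

	/* ...otherwise the first reference is taken to be SGMII. */
	return "sgmii";
}
```

This keeps old device trees with a single unnamed pcsphy-handle working for SGMII and XFI, while QSGMII has no fallback and must be named.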
[PATCH net-next v5 2/9] dt-bindings: net: Add Lynx PCS binding
This binding is fairly bare-bones for now, since the Lynx driver doesn't parse any properties (or match based on the compatible). We just need it in order to prevent the PCS nodes from having phy devices attached to them. This is not really a problem, but it is a bit inefficient. This binding is really for three separate PCSs (SGMII, QSGMII, and XFI). However, the driver treats all of them the same. This works because the SGMII and XFI devices typically use the same address, and the SerDes driver (or RCW) muxes between them. The QSGMII PCSs have the same register layout as the SGMII PCSs. To do things properly, we'd probably do something like ethernet-pcs@0 { #pcs-cells = <1>; compatible = "fsl,lynx-pcs"; reg = <0>, <1>, <2>, <3>; }; but that would add complexity, and we can describe the hardware just fine using separate PCSs for now. Signed-off-by: Sean Anderson --- Changes in v5: - New .../bindings/net/pcs/fsl,lynx-pcs.yaml| 40 +++ 1 file changed, 40 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/pcs/fsl,lynx-pcs.yaml diff --git a/Documentation/devicetree/bindings/net/pcs/fsl,lynx-pcs.yaml b/Documentation/devicetree/bindings/net/pcs/fsl,lynx-pcs.yaml new file mode 100644 index ..fbedf696c555 --- /dev/null +++ b/Documentation/devicetree/bindings/net/pcs/fsl,lynx-pcs.yaml @@ -0,0 +1,40 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/pcs/fsl,lynx-pcs.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: NXP Lynx PCS + +maintainers: + - Ioana Ciornei + +description: | + NXP Lynx 10G and 28G SerDes have Ethernet PCS devices which can be used as + protocol controllers. They are accessible over the Ethernet interface's MDIO + bus. 
+ +properties: + compatible: +const: fsl,lynx-pcs + + reg: +maxItems: 1 + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - | +mdio-bus { + #address-cells = <1>; + #size-cells = <0>; + + qsgmii_pcs1: ethernet-pcs@1 { +compatible = "fsl,lynx-pcs"; +reg = <1>; + }; +}; -- 2.35.1.1320.gc452695387.dirty
[PATCH net-next v5 0/9] [RFT] net: dpaa: Convert to phylink
This series converts the DPAA driver to phylink. I have tried to maintain backwards compatibility with existing device trees wherever possible. However, one area where I was unable to achieve this was with QSGMII. Please refer to patch 2 for details. All mac drivers have now been converted. I would greatly appreciate if anyone has T-series or P-series boards they can test/debug this series on. I only have an LS1046ARDB. Everything but QSGMII should work without breakage; QSGMII needs patches 7 and 8. For this reason, the last 4 patches in this series should be applied together (and should not go through separate trees). This series depends on [1] and [2]. [1] https://lore.kernel.org/netdev/20220725153730.2604096-1-sean.ander...@seco.com/ [2] https://lore.kernel.org/netdev/20220725151039.2581576-1-sean.ander...@seco.com/ Changes in v5: - Add Lynx PCS binding Changes in v4: - Use pcs-handle-names instead of pcs-names, as discussed - Don't fail if phy support was not compiled in - Split off rate adaptation series - Split off DPAA "preparation" series - Split off Lynx 10G support - t208x: Mark MAC1 and MAC2 as 10G - Add XFI PCS for t208x MAC1/MAC2 Changes in v3: - Expand pcs-handle to an array - Add vendor prefix 'fsl,' to rgmii and mii properties. - Set maxItems for pcs-names - Remove phy-* properties from example because dt-schema complains and I can't be bothered to figure out how to make it work. - Add pcs-handle as a preferred version of pcsphy-handle - Deprecate pcsphy-handle - Remove mii/rmii properties - Put the PCS mdiodev only after we are done with it (since the PCS does not perform a get itself). - Remove _return label from memac_initialization in favor of returning directly - Fix grabbing the default PCS not checking for -ENODATA from of_property_match_string - Set DTSEC_ECNTRL_R100M in dtsec_link_up instead of dtsec_mac_config - Remove rmii/mii properties - Replace 1000Base... with 1000BASE... 
to match IEEE capitalization - Add compatibles for QSGMII PCSs - Split arm and powerpcs dts updates Changes in v2: - Better document how we select which PCS to use in the default case - Move PCS_LYNX dependency to fman Kconfig - Remove unused variable slow_10g_if - Restrict valid link modes based on the phy interface. This is easier to set up, and mostly captures what I intended to do the first time. We now have a custom validate which restricts half-duplex for some SoCs for RGMII, but generally just uses the default phylink validate. - Configure the SerDes in enable/disable - Properly implement all ethtool ops and ioctls. These were mostly stubbed out just enough to compile last time. - Convert 10GEC and dTSEC as well - Fix capitalization of mEMAC in commit messages - Add nodes for QSGMII PCSs - Add nodes for QSGMII PCSs Sean Anderson (9): dt-bindings: net: Expand pcs-handle to an array dt-bindings: net: Add Lynx PCS binding dt-bindings: net: fman: Add additional interface properties net: fman: memac: Add serdes support net: fman: memac: Use lynx pcs driver net: dpaa: Convert to phylink powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G powerpc: dts: qoriq: Add nodes for QSGMII PCSs arm64: dts: layerscape: Add nodes for QSGMII PCSs .../bindings/net/dsa/renesas,rzn1-a5psw.yaml | 1 + .../bindings/net/ethernet-controller.yaml | 10 +- .../bindings/net/fsl,fman-dtsec.yaml | 53 +- .../bindings/net/fsl,qoriq-mc-dpmac.yaml | 2 +- .../devicetree/bindings/net/fsl-fman.txt | 5 +- .../bindings/net/pcs/fsl,lynx-pcs.yaml| 40 + .../boot/dts/freescale/fsl-ls1043-post.dtsi | 24 + .../boot/dts/freescale/fsl-ls1046-post.dtsi | 25 + .../fsl/qoriq-fman3-0-10g-0-best-effort.dtsi | 3 +- .../boot/dts/fsl/qoriq-fman3-0-10g-0.dtsi | 10 +- .../fsl/qoriq-fman3-0-10g-1-best-effort.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-0-10g-1.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-0-10g-2.dtsi | 45 ++ .../boot/dts/fsl/qoriq-fman3-0-10g-3.dtsi | 45 ++ .../boot/dts/fsl/qoriq-fman3-0-1g-0.dtsi | 3 +- 
.../boot/dts/fsl/qoriq-fman3-0-1g-1.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-0-1g-2.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-0-1g-3.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-0-1g-4.dtsi | 3 +- .../boot/dts/fsl/qoriq-fman3-0-1g-5.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-1-10g-0.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-1-10g-1.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-1-1g-0.dtsi | 3 +- .../boot/dts/fsl/qoriq-fman3-1-1g-1.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-1-1g-2.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-1-1g-3.dtsi | 10 +- .../boot/dts/fsl/qoriq-fman3-1-1g-4.dtsi | 3 +- .../boot/dts/fsl/qoriq-fman3-1-1g-5.dtsi | 10 +- arch/powerpc/boot/dts/fsl/t2081si-post.dtsi | 4 +- drivers/net/ethernet/freescale/dpaa/Kconfig | 4 +- .../net/ethernet/freescale/dpaa/dpaa_eth.c| 89 +--
Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()
On Mon, Sep 26, 2022 at 10:01:55PM +0800, Zhuo Chen wrote:
> On 9/23/22 5:08 AM, Bjorn Helgaas wrote:
> > On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote:
> > > When state is pci_channel_io_frozen in pcie_do_recovery(),
> > > the severity is fatal and fatal status should be cleared.
> > > So we add pci_aer_clear_fatal_status().
> >
> > Seems sensible to me. Did you find this by code inspection or by
> > debugging a problem? If the latter, it would be nice to mention the
> > symptoms of the problem in the commit log.
>
> I found this by code inspection so I may not enumerate what kind of problems
> this code will cause.
> >
> > > Since pcie_aer_is_native() in pci_aer_clear_fatal_status()
> > > and pci_aer_clear_nonfatal_status() contains the function of
> > > 'if (host->native_aer || pcie_ports_native)', so we move them
> > > out of it.
> >
> > Wrap commit log to fill 75 columns.
> >
> > > Signed-off-by: Zhuo Chen
> > > ---
> > >  drivers/pci/pcie/err.c | 8 ++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > index 0c5a143025af..e0a8ade4c3fe 100644
> > > --- a/drivers/pci/pcie/err.c
> > > +++ b/drivers/pci/pcie/err.c
> > > @@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev
> > > *dev,
> > >    * it is responsible for clearing this status. In that case,
> > > the
> > >    * signaling device may not even be visible to the OS.
> > >    */
> > > - if (host->native_aer || pcie_ports_native) {
> > > + if (host->native_aer || pcie_ports_native)
> > >           pcie_clear_device_status(dev);
> >
> > pcie_clear_device_status() doesn't check for pcie_aer_is_native()
> > internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status
> > errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER
> > status only when we control AER"), both callers check before calling
> > it.
> >
> > I think we should move the check inside pcie_clear_device_status().
> > That could be a separate preliminary patch.
> >
> > There are a couple other places (aer_root_reset() and
> > get_port_device_capability()) that do the same check and could be
> > changed to use pcie_aer_is_native() instead. That could be another
> > preliminary patch.
> >
> Good suggestion. But I have only one doubt. In aer_root_reset(), if we use
> "if (pcie_aer_is_native(dev) && aer)", when dev->aer_cap
> is NULL and root->aer_cap is not NULL, pcie_aer_is_native() will return
> false. It's different from just using "(host->native_aer ||
> pcie_ports_native)".
> Or if we can use "if (pcie_aer_is_native(root))", at this time a NULL
> pointer check should be added in pcie_aer_is_native() because root may be
> NULL.

Good point. In aer_root_reset(), we're updating Root Port registers,
so I think they should look like:

  if (pcie_aer_is_native(root) && aer) {
        ...
  }

Does that seem safe and equivalent to you?

Bjorn
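
To make the difference discussed above concrete, here is a small user-space model of the two checks. It is a sketch under assumptions, not the kernel implementation: the toy_* types and functions are hypothetical, and the NULL check in toy_aer_is_native() reflects the suggestion made in the thread rather than confirmed kernel behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the host bridge and device. */
struct toy_host {
	bool native_aer;
};

struct toy_dev {
	struct toy_host *host;
	unsigned short aer_cap;   /* 0 when the device has no AER capability */
};

bool toy_ports_native;        /* models the pcie_ports_native override */

/* The bare check used at the call sites quoted above. */
bool toy_host_check(const struct toy_host *host)
{
	return host->native_aer || toy_ports_native;
}

/*
 * Models a helper that also requires an AER capability on the device.
 * The NULL test is the extra check proposed in the thread so that a
 * missing Root Port can be passed safely.
 */
bool toy_aer_is_native(const struct toy_dev *dev)
{
	if (!dev || !dev->aer_cap)
		return false;
	return toy_ports_native || dev->host->native_aer;
}
```

With these definitions the helper and the bare host check disagree exactly when the device passed in has no AER capability, which is why the choice between dev and root matters in aer_root_reset().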
Re: [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status
On Mon, Sep 26, 2022 at 10:16:23PM +0800, Zhuo Chen wrote:
> On 9/23/22 5:50 AM, Bjorn Helgaas wrote:
> > On Fri, Sep 02, 2022 at 02:16:34AM +0800, Zhuo Chen wrote:
> > > Statements clearing AER error status in aer_enable_rootport() has the
> > > same function as pci_aer_raw_clear_status(). So we replace them, which
> > > has no functional changes.
> > > -	pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32);
> > > -	pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32);
> > > -	pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32);
> > > -	pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32);
> > > -	pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32);
> > > -	pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32);
> > > +	pci_aer_raw_clear_status(pdev);
> >
> > It's true that this is functionally equivalent.
> >
> > But 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status() to
> > unconditionally clear Error Status") says pci_aer_raw_clear_status()
> > is only for use in the EDR path (this should have been included in the
> > function comment), so I think we should preserve that property and use
> > pci_aer_clear_status() here.
> >
> > pci_aer_raw_clear_status() is the same as pci_aer_clear_status()
> > except it doesn't check pcie_aer_is_native(). And I'm pretty sure we
> > can't get to aer_enable_rootport() *unless* pcie_aer_is_native(),
> > because get_port_device_capability() checks the same thing, so they
> > should be equivalent here.
> >
> Thanks Bjorn, this very detailed correction is helpful. By the way, 'only
> for use in the EDR path' obviously written in the function comments may be
> better. So far only commit log has included these.

Yes, definitely! I goofed when I applied that patch without making
sure there was something in the function comment.

Bjorn
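For readers unfamiliar with the read-then-write-back pattern in the removed lines above: it relies on the AER status registers being write-1-to-clear, so writing back whatever was read clears exactly the bits that were set. The following is a rough stand-alone sketch of that idea only. The struct and the model_* names are hypothetical and are not part of the kernel PCI API.

```c
#include <stdint.h>

/* Hypothetical model of a write-1-to-clear (RW1C) status register. */
struct model_rw1c_reg {
	uint32_t bits;
};

/* Reading returns the currently latched status bits. */
static uint32_t model_read(const struct model_rw1c_reg *r)
{
	return r->bits;
}

/* Writing a 1 clears that bit; writing a 0 leaves the bit unchanged. */
static void model_write(struct model_rw1c_reg *r, uint32_t val)
{
	r->bits &= ~val;
}

/* Read the status, then write the same value back to clear what was set. */
static void model_clear_status(struct model_rw1c_reg *r)
{
	uint32_t v = model_read(r);

	model_write(r, v);
}
```

In this model, the three read/write pairs in the quoted diff would each correspond to one model_clear_status() call on a different register.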
Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status
On Mon, Sep 26, 2022 at 09:30:48PM +0800, Zhuo Chen wrote: > On 9/23/22 4:02 AM, Bjorn Helgaas wrote: > > On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote: > > > On 9/12/22 12:22 AM, Serge Semin wrote: > > > > On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote: > > > ‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be > > > removed in the next kernel: > > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406 > > > > This is a problem because 00278564a60e ("crypto: hisilicon - Remove > > pci_aer_clear_nonfatal_status() call") is in Herbert's cryptodev tree, > > and if I apply this series to the PCI tree and Linus merges it before > > Herbert's cryptodev changes, it will break the build. > > > > I think we need to split this patch up like this: > > > >- Add pci_aer_clear_uncorrect_error_status() to PCI core > >- Convert dpc to use pci_aer_clear_uncorrect_error_status() > > (I might end up squashing with above) > >- Convert lpfc to use pci_aer_clear_uncorrect_error_status() > >- Convert ntb_hw_idt to use pci_aer_clear_uncorrect_error_status() > >- Unexport pci_aer_clear_nonfatal_status() > > > > Then I can apply all but the last patch safely. If the crypto changes > > are merged first, we can add the last one; otherwise we can do it for > > the next cycle. > > > Good proposal. I will implement these in the next version. > > Do I need to put pci related modifications (include patch 2/3 and 3/3) in a > patch set or just single patches? When in doubt, put them in separate patches. It's trivial for me to squash them together if that makes more sense, but much more difficult for me to split them apart. Thanks for helping clean up this area! Bjorn
Re: [PATCH v6 4/8] phy: fsl: Add Lynx 10G SerDes driver
On 9/24/22 2:54 AM, Vinod Koul wrote: > On 20-09-22, 16:23, Sean Anderson wrote: >> This adds support for the Lynx 10G "SerDes" devices found on various NXP >> QorIQ SoCs. There may be up to four SerDes devices on each SoC, each >> supporting up to eight lanes. Protocol support for each SerDes is highly >> heterogeneous, with each SoC typically having a totally different >> selection of supported protocols for each lane. Additionally, the SerDes >> devices on each SoC also have differing support. One SerDes will >> typically support Ethernet on most lanes, while the other will typically >> support PCIe on most lanes. >> >> There is wide hardware support for this SerDes. It is present on QorIQ >> T-Series and Layerscape processors. Because each SoC typically has >> specific instructions and exceptions for its SerDes, I have limited the >> initial scope of this module to just the LS1046A and LS1088A. >> Additionally, I have only added support for Ethernet protocols. There is >> not a great need for dynamic reconfiguration for other protocols (except >> perhaps for M.2 cards), so support for them may never be added. >> >> Nevertheless, I have tried to provide an obvious path for adding support >> for other SoCs as well as other protocols. SATA just needs support for >> configuring LNmSSCR0. PCIe may need to configure the equalization >> registers. It also uses multiple lanes. I have tried to write the driver >> with multi-lane support in mind, so there should not need to be any >> large changes. Although there are 6 protocols supported, I have only >> tested SGMII and XFI. The rest have been implemented as described in >> the datasheet. Most of these protocols should work "as-is", but >> 10GBASE-KR will need PCS support for link training. >> >> The PLLs are modeled as clocks proper. This lets us take advantage of >> the existing clock infrastructure. 
I have not given the same treatment >> to the per-lane clocks because they need to be programmed in-concert >> with the rest of the lane settings. One tricky thing is that the VCO >> (PLL) rate exceeds 2^32 (maxing out at around 5GHz). This will be a >> problem on 32-bit platforms, since clock rates are stored as unsigned >> longs. To work around this, the pll clock rate is generally treated in >> units of kHz. >> >> The PLLs are configured rather interestingly. Instead of the usual direct >> programming of the appropriate divisors, the input and output clock rates >> are selected directly. Generally, the only restriction is that the input >> and output must be integer multiples of each other. This suggests some kind >> of internal look-up table. The datasheets generally list out the supported >> combinations explicitly, and not all input/output combinations are >> documented. I'm not sure if this is due to lack of support, or due to an >> oversight. If this becomes an issue, then some combinations can be >> blacklisted (or whitelisted). This may also be necessary for other SoCs >> which have more stringent clock requirements. >> >> The general API call list for this PHY is documented under the driver-api >> docs. I think this is rather standard, except that most drivers configure >> the mode (protocol) at xlate-time. Unlike some other phys where e.g. PCIe >> x4 will use 4 separate phys all configured for PCIe, this driver uses one >> phy configured to use 4 lanes. This is because while the individual lanes >> may be configured individually, the protocol selection acts on all lanes at >> once. Additionally, the order which lanes should be configured in is >> specified by the datasheet. To coordinate this, lanes are reserved in >> phy_init, and released in phy_exit. >> >> This driver was written with reference to the LS1046A reference manual. 
>> However, it was informed by reference manuals for all processors with >> mEMACs, especially the T4240 (which appears to have a "maxed-out" >> configuration). The earlier P-Series processors appear to be similar, but >> have a different overall register layout (using "banks" instead of >> separate SerDes). Perhaps this those use a "5G Lynx SerDes." >> >> Signed-off-by: Sean Anderson >> --- >> >> Changes in v6: >> - Update MAINTAINERS to include new files >> - Include bitfield.h and slab.h to allow compilation on non-arm64 >> arches. >> - Depend on COMMON_CLK and either layerscape/ppc >> >> Changes in v5: >> - Remove references to PHY_INTERFACE_MODE_1000BASEKX to allow this >> series to be applied directly to linux/master. >> - Add fsl,lynx-10g.h to MAINTAINERS >> >> Changes in v4: >> - Rework all debug statements to remove use of __func__. Additional >> information has been provided as necessary. >> - Consider alternative parent rates in round_rate and not in set_rate. >> Trying to modify out parent's rate in set_rate will deadlock. >> - Explicitly perform a stop/reset sequence in set_rate. This way we >> always ensure that the PLL is properly stopped. >> - Set the power-down bit when disabling the
[PATCH RFC 5/5] mm: remove unused savedwrite infrastructure
NUMA hinting no longer uses savedwrite, let's rip it out. ... and while at it, drop __pte_write() and __pmd_write() on ppc64. Signed-off-by: David Hildenbrand --- arch/powerpc/include/asm/book3s/64/pgtable.h | 80 +--- arch/powerpc/kvm/book3s_hv_rm_mmu.c | 2 +- include/linux/pgtable.h | 24 -- mm/debug_vm_pgtable.c| 32 4 files changed, 5 insertions(+), 133 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index 392ff48f77df..b3ddc34d71c1 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -418,35 +418,9 @@ static inline int __ptep_test_and_clear_young(struct mm_struct *mm, #define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH #define pmdp_clear_flush_young pmdp_test_and_clear_young -static inline int __pte_write(pte_t pte) -{ - return !!(pte_raw(pte) & cpu_to_be64(_PAGE_WRITE)); -} - -#ifdef CONFIG_NUMA_BALANCING -#define pte_savedwrite pte_savedwrite -static inline bool pte_savedwrite(pte_t pte) -{ - /* -* Saved write ptes are prot none ptes that doesn't have -* privileged bit sit. We mark prot none as one which has -* present and pviliged bit set and RWX cleared. To mark -* protnone which used to have _PAGE_WRITE set we clear -* the privileged bit. 
-*/ - return !(pte_raw(pte) & cpu_to_be64(_PAGE_RWX | _PAGE_PRIVILEGED)); -} -#else -#define pte_savedwrite pte_savedwrite -static inline bool pte_savedwrite(pte_t pte) -{ - return false; -} -#endif - static inline int pte_write(pte_t pte) { - return __pte_write(pte) || pte_savedwrite(pte); + return !!(pte_raw(pte) & cpu_to_be64(_PAGE_WRITE)); } static inline int pte_read(pte_t pte) @@ -458,24 +432,16 @@ static inline int pte_read(pte_t pte) static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - if (__pte_write(*ptep)) + if (pte_write(*ptep)) pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 0); - else if (unlikely(pte_savedwrite(*ptep))) - pte_update(mm, addr, ptep, 0, _PAGE_PRIVILEGED, 0); } #define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT static inline void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - /* -* We should not find protnone for hugetlb, but this complete the -* interface. -*/ - if (__pte_write(*ptep)) + if (pte_write(*ptep)) pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 1); - else if (unlikely(pte_savedwrite(*ptep))) - pte_update(mm, addr, ptep, 0, _PAGE_PRIVILEGED, 1); } #define __HAVE_ARCH_PTEP_GET_AND_CLEAR @@ -552,36 +518,6 @@ static inline int pte_protnone(pte_t pte) return (pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE | _PAGE_RWX)) == cpu_to_be64(_PAGE_PRESENT | _PAGE_PTE); } - -#define pte_mk_savedwrite pte_mk_savedwrite -static inline pte_t pte_mk_savedwrite(pte_t pte) -{ - /* -* Used by Autonuma subsystem to preserve the write bit -* while marking the pte PROT_NONE. 
Only allow this -* on PROT_NONE pte -*/ - VM_BUG_ON((pte_raw(pte) & cpu_to_be64(_PAGE_PRESENT | _PAGE_RWX | _PAGE_PRIVILEGED)) != - cpu_to_be64(_PAGE_PRESENT | _PAGE_PRIVILEGED)); - return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_PRIVILEGED)); -} - -#define pte_clear_savedwrite pte_clear_savedwrite -static inline pte_t pte_clear_savedwrite(pte_t pte) -{ - /* -* Used by KSM subsystem to make a protnone pte readonly. -*/ - VM_BUG_ON(!pte_protnone(pte)); - return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PRIVILEGED)); -} -#else -#define pte_clear_savedwrite pte_clear_savedwrite -static inline pte_t pte_clear_savedwrite(pte_t pte) -{ - VM_WARN_ON(1); - return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_WRITE)); -} #endif /* CONFIG_NUMA_BALANCING */ static inline bool pte_hw_valid(pte_t pte) @@ -658,8 +594,6 @@ static inline unsigned long pte_pfn(pte_t pte) /* Generic modifiers for PTE bits */ static inline pte_t pte_wrprotect(pte_t pte) { - if (unlikely(pte_savedwrite(pte))) - return pte_clear_savedwrite(pte); return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_WRITE)); } @@ -1156,8 +1090,6 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd) #define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd))) #define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd))) #define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd))) -#define pmd_mk_savedwrite(pmd) pte_pmd(pte_mk_savedwrite(pmd_pte(pmd))) -#define pmd_clear_savedwrite(pmd) pte_pmd(pte_clear_savedwrite(pmd_pte(pmd))) #ifdef
[PATCH RFC 4/5] mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite
commit b191f9b106ea ("mm: numa: preserve PTE write permissions across a NUMA hinting fault") added remembering write permissions using ordinary pte_write() for PROT_NONE mapped pages to avoid write faults when remapping the page !PROT_NONE on NUMA hinting faults. That commit noted: The patch looks hacky but the alternatives looked worse. The tidest was to rewalk the page tables after a hinting fault but it was more complex than this approach and the performance was worse. It's not generally safe to just mark the page writable during the fault if it's a write fault as it may have been read-only for COW so that approach was discarded. Later, commit 288bc54949fc ("mm/autonuma: let architecture override how the write bit should be stashed in a protnone pte.") introduced a family of savedwrite PTE functions that didn't necessarily improve the whole situation. One confusing thing is that nowadays, if a page is pte_protnone() and pte_savedwrite() then also pte_write() is true. Another source of confusion is that there is only a single pte_mk_savedwrite() call in the kernel. All other write-protection code seems to silently rely on pte_wrprotect(). Ever since PageAnonExclusive was introduced and we started using it in mprotect context via commit 64fe24a3e05e ("mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection"), we do have machinery in place to avoid write faults when changing protection, which is exactly what we want to do here. Let's similarly do what ordinary mprotect() does nowadays when upgrading write permissions and reuse can_change_pte_writable() and can_change_pmd_writable() to detect if we can upgrade PTE permissions to be writable. For anonymous pages there should be absolutely no change: if an anonymous page is not exclusive, it could not have been mapped writable -- because only exclusive anonymous pages can be mapped writable. 
However, there *might* be a change for writable shared mappings that require writenotify: if they are not dirty, we cannot map them writable. While it might not matter in practice, we'd need a different way to identify whether writenotify is actually required -- and ordinary mprotect would benefit from that as well. We'll remove all savedwrite leftovers next. Signed-off-by: David Hildenbrand --- include/linux/mm.h | 2 ++ mm/huge_memory.c | 28 +--- mm/ksm.c | 9 - mm/memory.c| 19 --- mm/mprotect.c | 7 ++- 5 files changed, 41 insertions(+), 24 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 8a5ad9d050bf..20061a9f7f47 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1954,6 +1954,8 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma, #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ MM_CP_UFFD_WP_RESOLVE) +bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr, +pte_t pte); extern unsigned long change_protection(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long start, unsigned long end, pgprot_t newprot, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index e5ce3e11d4ae..f148d1295d2e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1507,8 +1507,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) unsigned long haddr = vmf->address & HPAGE_PMD_MASK; int page_nid = NUMA_NO_NODE; int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK); - bool migrated = false; - bool was_writable = pmd_savedwrite(oldpmd); + bool try_change_writable, migrated = false; int flags = 0; vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); @@ -1517,13 +1516,22 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) goto out; } + /* See mprotect_fixup(). 
*/ + if (vma->vm_flags & VM_SHARED) + try_change_writable = vma_wants_writenotify(vma, vma->vm_page_prot); + else + try_change_writable = !!(vma->vm_flags & VM_WRITE); + pmd = pmd_modify(oldpmd, vma->vm_page_prot); page = vm_normal_page_pmd(vma, haddr, pmd); if (!page) goto out_map; /* See similar comment in do_numa_page for explanation */ - if (!was_writable) + if (try_change_writable && !pmd_write(pmd) && +can_change_pmd_writable(vma, vmf->address, pmd)) + pmd = pmd_mkwrite(pmd); + if (!pmd_write(pmd)) flags |= TNF_NO_GROUP; page_nid = page_to_nid(page); @@ -1568,8 +1576,12 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf) /* Restore the PMD */ pmd = pmd_modify(oldpmd, vma->vm_page_prot); pmd = pmd_mkyoung(pmd); - if (was_writable) + + /* Similar to mprotect()
[PATCH RFC 3/5] mm/huge_memory: try avoiding write faults when changing PMD protection
Let's replicate what we have for PTEs in can_change_pte_writable() also for
PMDs.

While this might look like a pure performance improvement, we'll use this to
get rid of savedwrite handling in do_huge_pmd_numa_page() next. Place
do_huge_pmd_numa_page() strategically well for that purpose.

Note that MM_CP_TRY_CHANGE_WRITABLE is currently only set when we come
via mprotect_fixup().

Signed-off-by: David Hildenbrand
---
 mm/huge_memory.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2f18896c8f9a..e5ce3e11d4ae 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1386,6 +1386,36 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf)
 	return VM_FAULT_FALLBACK;
 }
 
+static inline bool can_change_pmd_writable(struct vm_area_struct *vma,
+					   unsigned long addr, pmd_t pmd)
+{
+	struct page *page;
+
+	if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
+		return false;
+
+	/* Don't touch entries that are not even readable (NUMA hinting). */
+	if (pmd_protnone(pmd))
+		return false;
+
+	/* Do we need write faults for softdirty tracking? */
+	if (vma_soft_dirty_enabled(vma) && !pmd_soft_dirty(pmd))
+		return false;
+
+	/* Do we need write faults for uffd-wp tracking? */
+	if (userfaultfd_huge_pmd_wp(vma, pmd))
+		return false;
+
+	if (!(vma->vm_flags & VM_SHARED)) {
+		/* See can_change_pte_writable(). */
+		page = vm_normal_page_pmd(vma, addr, pmd);
+		return page && PageAnon(page) && PageAnonExclusive(page);
+	}
+
+	/* See can_change_pte_writable(). */
+	return pmd_dirty(pmd);
+}
+
 /* FOLL_FORCE can write to even unwritable PMDs in COW mappings. */
 static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
 					struct vm_area_struct *vma,
@@ -1889,13 +1919,17 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 */
 		entry = pmd_clear_uffd_wp(entry);
 	}
+
+	/* See change_pte_range(). */
+	if ((cp_flags & MM_CP_TRY_CHANGE_WRITABLE) && !pmd_write(entry) &&
+	    can_change_pmd_writable(vma, addr, entry))
+		entry = pmd_mkwrite(entry);
+
 	ret = HPAGE_PMD_NR;
 	set_pmd_at(mm, addr, pmd, entry);
 
 	if (huge_pmd_needs_flush(oldpmd, entry))
 		tlb_flush_pmd_range(tlb, addr, HPAGE_PMD_SIZE);
-
-	BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry));
 unlock:
 	spin_unlock(ptl);
 	return ret;
-- 
2.37.3
[PATCH RFC 2/5] mm/mprotect: minor can_change_pte_writable() cleanups
We want to replicate this code for handling PMDs soon.

No need to crash the kernel, warning and rejecting is good enough. As this
will no longer get optimized out, drop the pte_write() check: no harm
would be done.

While at it, add a comment why PROT_NONE mapped pages are excluded.

Signed-off-by: David Hildenbrand
---
 mm/mprotect.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index c6c13a0a4bcc..95323bc9a951 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -43,8 +43,10 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
 {
 	struct page *page;
 
-	VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte));
+	if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
+		return false;
 
+	/* Don't touch entries that are not even readable (NUMA hinting). */
 	if (pte_protnone(pte))
 		return false;
 
-- 
2.37.3
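The "warn and reject instead of crash" idea in this patch can be illustrated outside the kernel. The sketch below is a user-space approximation only: model_warn_on_once() and model_vma_allows_upgrade() are hypothetical names, and the real WARN_ON_ONCE() macro does considerably more (stack trace, taint) than printing one line.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for WARN_ON_ONCE(): report the problem
 * the first time it is seen, and always hand the condition back to the
 * caller so it can bail out.
 */
static bool model_warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Models the start of the check: reject the request instead of crashing. */
static bool model_vma_allows_upgrade(bool vma_has_vm_write)
{
	static bool warned;

	if (model_warn_on_once(!vma_has_vm_write, &warned, "VM_WRITE not set"))
		return false;
	return true;
}
```

Calling model_vma_allows_upgrade(false) repeatedly returns false every time but prints the warning only once, which is the behaviour the commit message describes as "good enough".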
[PATCH RFC 1/5] mm/mprotect: allow clean exclusive anon pages to be writable
From: Nadav Amit

Anonymous pages might have the dirty bit clear, but this should not
prevent mprotect from making them writable if they are exclusive.
Therefore, skip the test whether the page is dirty in this case.

Note that there are already other ways to get a writable PTE mapping an
anonymous page that is clean: for example, via MADV_FREE. In an ideal
world, we'd have a different indication from the FS whether writenotify
is still required.

Signed-off-by: Nadav Amit
[ comment for dirty/clean handling; return directly; update description ]
Signed-off-by: David Hildenbrand
---
 mm/mprotect.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index ed013f836b4a..c6c13a0a4bcc 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -45,7 +45,7 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
 
 	VM_BUG_ON(!(vma->vm_flags & VM_WRITE) || pte_write(pte));
 
-	if (pte_protnone(pte) || !pte_dirty(pte))
+	if (pte_protnone(pte))
 		return false;
 
 	/* Do we need write faults for softdirty tracking? */
@@ -64,11 +64,15 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
 		 * the PT lock.
 		 */
 		page = vm_normal_page(vma, addr, pte);
-		if (!page || !PageAnon(page) || !PageAnonExclusive(page))
-			return false;
+		return page && PageAnon(page) && PageAnonExclusive(page);
 	}
 
-	return true;
+	/*
+	 * Shared mapping: "clean" might indicate that the FS still has to be
+	 * notified via a write fault once first -- see vma_wants_writenotify().
+	 * If "dirty", the assumption is that there already was a write fault.
+	 */
+	return pte_dirty(pte);
 }
 
 static unsigned long change_pte_range(struct mmu_gather *tlb,
-- 
2.37.3
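The decision order that results from this patch may be easier to follow as a flat model. The sketch below is an illustration under assumptions, not the kernel function: struct model_pte_state and its field names are invented here to stand in for the VMA flags, PTE bits and page state the real code inspects, and the early return for a missing VM_WRITE follows the later cleanup in this series rather than the VM_BUG_ON() still present in this patch.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the inputs the checks look at. */
struct model_pte_state {
	bool vma_write;             /* VM_WRITE set on the VMA */
	bool vma_shared;            /* VM_SHARED set on the VMA */
	bool protnone;              /* PROT_NONE (NUMA hinting) entry */
	bool needs_softdirty_fault; /* soft-dirty tracking still wants a write fault */
	bool uffd_wp;               /* uffd-wp tracking still wants a write fault */
	bool dirty;                 /* dirty bit set in the entry */
	bool anon_exclusive;        /* page exists, is anonymous, and is exclusive */
};

/* Models the order of checks after this patch (assumption: simplified). */
static bool model_can_change_writable(const struct model_pte_state *s)
{
	if (!s->vma_write)
		return false;
	if (s->protnone)
		return false;
	if (s->needs_softdirty_fault)
		return false;
	if (s->uffd_wp)
		return false;

	/* Private mapping: the dirty bit no longer matters, exclusivity does. */
	if (!s->vma_shared)
		return s->anon_exclusive;

	/* Shared mapping: a clean entry may still need a writenotify fault. */
	return s->dirty;
}
```

The point of the patch shows up in the private branch: a clean but exclusive anonymous page is accepted, whereas before the dirty test rejected it.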
[PATCH RFC 0/5] mm/autonuma: replace savedwrite infrastructure
As discussed in my talk at LPC, we can reuse the same mechanism for deciding
whether to map a pte writable when upgrading permissions via mprotect()
-- e.g., PROT_READ -> PROT_READ|PROT_WRITE -- to replace the savedwrite
infrastructure used for NUMA hinting faults (e.g., PROT_NONE ->
PROT_READ|PROT_WRITE).

Instead of maintaining previous write permissions for a pte/pmd, we
re-determine if the pte/pmd can be writable. The big benefit is that we
have a common logic for deciding whether we can map a pte/pmd writable on
protection changes.

For private mappings, there should be no difference -- from what I
understand, that is what autonuma benchmarks care about.

I ran autonumabench on a system with 2 NUMA nodes, 96 GiB each via:
	perf stat --null --repeat 10
The numa1 benchmark is quite noisy in my environment. I suspect that there
is no actual change in performance, even though the numbers indicate that
this series might improve performance slightly.

numa1:
	mm-stable:   156.75 +- 11.67 seconds time elapsed  ( +- 7.44% )
	mm-stable++: 147.50 +- 9.35 seconds time elapsed  ( +- 6.34% )

numa2:
	mm-stable:   15.9834 +- 0.0589 seconds time elapsed  ( +- 0.37% )
	mm-stable++: 16.1467 +- 0.0946 seconds time elapsed  ( +- 0.59% )

It is worth noting that for shared writable mappings that require
writenotify, we will only avoid write faults if the pte/pmd is dirty
(inherited from the older mprotect logic). If we ever care about optimizing
that further, we'd need a different mechanism to identify whether the FS
still needs to get notified on the next write access. In any case, such an
optimization will then not be autonuma-specific, but mprotect() permission
upgrades would similarly benefit from it.

Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Dave Chinner
Cc: Nadav Amit
Cc: Peter Xu
Cc: Andrea Arcangeli
Cc: Hugh Dickins
Cc: Vlastimil Babka
Cc: Michael Ellerman
Cc: Nicholas Piggin
Cc: Mike Rapoport
Cc: Anshuman Khandual

David Hildenbrand (4):
  mm/mprotect: minor can_change_pte_writable() cleanups
  mm/huge_memory: try avoiding write faults when changing PMD protection
  mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite
  mm: remove unused savedwrite infrastructure

Nadav Amit (1):
  mm/mprotect: allow clean exclusive anon pages to be writable

 arch/powerpc/include/asm/book3s/64/pgtable.h | 80 +---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c          |  2 +-
 include/linux/mm.h                           |  2 +
 include/linux/pgtable.h                      | 24 --
 mm/debug_vm_pgtable.c                        | 32
 mm/huge_memory.c                             | 66
 mm/ksm.c                                     |  9 +--
 mm/memory.c                                  | 19 -
 mm/mprotect.c                                | 23 +++---
 9 files changed, 93 insertions(+), 164 deletions(-)

-- 
2.37.3
Re: [PATCH v2 6/6] powerpc/64: Add tests for out-of-line static calls
Le 26/09/2022 à 08:43, Benjamin Gray a écrit : > KUnit tests for the various combinations of caller/trampoline/target and > kernel/module. They must be run from a module loaded at runtime to > guarantee they have a different TOC to the kernel. > > The tests try to mitigate the chance of panicing by restoring the > TOC after every static call. Not all possible errors can be caught > by this (we can't stop a trampoline from using a bad TOC itself), > but it makes certain errors easier to debug. > > Signed-off-by: Benjamin Gray > --- > arch/powerpc/Kconfig | 10 + > arch/powerpc/kernel/Makefile | 1 + > arch/powerpc/kernel/static_call.c | 61 ++ > arch/powerpc/kernel/static_call_test.c | 251 + > arch/powerpc/kernel/static_call_test.h | 56 ++ > 5 files changed, 379 insertions(+) > create mode 100644 arch/powerpc/kernel/static_call_test.c > create mode 100644 arch/powerpc/kernel/static_call_test.h > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index e7a66635eade..0ca60514c0e2 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -1023,6 +1023,16 @@ config PPC_RTAS_FILTER > Say Y unless you know what you are doing and the filter is causing > problems for you. > > +config PPC_STATIC_CALL_KUNIT_TEST > + tristate "KUnit tests for PPC64 ELF ABI V2 static calls" > + default KUNIT_ALL_TESTS > + depends on HAVE_STATIC_CALL && PPC64_ELF_ABI_V2 && KUNIT && m Is there a reason why it is dedicated to PPC64 ? In that case, can you make it explicit with the name of the config option, and with the name of the file below ? > + help > + Tests that check the TOC is kept consistent across all combinations > + of caller/trampoline/target being kernel/module. Must be built as a > + module and loaded at runtime to ensure the module has a different > + TOC to the kernel. 
> + > endmenu > > config ISA_DMA_API > diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile > index a30d0d0f5499..22c07e3d34df 100644 > --- a/arch/powerpc/kernel/Makefile > +++ b/arch/powerpc/kernel/Makefile > @@ -131,6 +131,7 @@ obj-$(CONFIG_RELOCATABLE) += reloc_$(BITS).o > obj-$(CONFIG_PPC32) += entry_32.o setup_32.o early_32.o > obj-$(CONFIG_PPC64) += dma-iommu.o iommu.o > obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o > +obj-$(CONFIG_PPC_STATIC_CALL_KUNIT_TEST) += static_call_test.o > obj-$(CONFIG_KGDB) += kgdb.o > obj-$(CONFIG_BOOTX_TEXT)+= btext.o > obj-$(CONFIG_SMP) += smp.o > diff --git a/arch/powerpc/kernel/static_call.c > b/arch/powerpc/kernel/static_call.c > index ecbb74e1b4d3..8d338917b70e 100644 > --- a/arch/powerpc/kernel/static_call.c > +++ b/arch/powerpc/kernel/static_call.c > @@ -113,3 +113,64 @@ void arch_static_call_transform(void *site, void *tramp, > void *func, bool tail) > panic("%s: patching failed %pS at %pS\n", __func__, func, > tramp); > } > EXPORT_SYMBOL_GPL(arch_static_call_transform); > + > + > +#if IS_MODULE(CONFIG_PPC_STATIC_CALL_KUNIT_TEST) > + > +#include "static_call_test.h" > + > +int ppc_sc_kernel_target_1(struct kunit* test) > +{ > + toc_fixup(test); > + return 1; > +} > + > +int ppc_sc_kernel_target_2(struct kunit* test) > +{ > + toc_fixup(test); > + return 2; > +} > + > +DEFINE_STATIC_CALL(ppc_sc_kernel, ppc_sc_kernel_target_1); > + > +int ppc_sc_kernel_call(struct kunit* test) > +{ > + return PROTECTED_SC(test, int, static_call(ppc_sc_kernel)(test)); > +} > + > +int ppc_sc_kernel_call_indirect(struct kunit* test, int (*fn)(struct kunit*)) > +{ > + return PROTECTED_SC(test, int, fn(test)); > +} > + > +long ppc_sc_kernel_target_big(struct kunit* test, > + long a, > + long b, > + long c, > + long d, > + long e, > + long f, > + long g, > + long h, > + long i) > +{ > + toc_fixup(test); > + KUNIT_EXPECT_EQ(test, a, b); > + KUNIT_EXPECT_EQ(test, a, c); > + KUNIT_EXPECT_EQ(test, a, d); > + KUNIT_EXPECT_EQ(test, 
a, e); > + KUNIT_EXPECT_EQ(test, a, f); > + KUNIT_EXPECT_EQ(test, a, g); > + KUNIT_EXPECT_EQ(test, a, h); > + KUNIT_EXPECT_EQ(test, a, i); > + return ~a; > +} > + > +EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_1); > +EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_2); > +EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_big); > +EXPORT_STATIC_CALL_GPL(ppc_sc_kernel); > +EXPORT_SYMBOL_GPL(ppc_sc_kernel_call); > +EXPORT_SYMBOL_GPL(ppc_sc_kernel_call_indirect); > + > +#endif /* IS_MODULE(CONFIG_PPC_STATIC_CALL_KUNIT_TEST) */ > diff --git a/arch/powerpc/kernel/static_call_test.c > b/arch/powerpc/kernel/static_call_test.c > new file mode 100644 > index
Re: [PATCH v2 5/6] powerpc/64: Add support for out-of-line static calls
On 26/09/2022 at 08:43, Benjamin Gray wrote:
> Implement static call support for 64 bit V2 ABI. This requires
> making sure the TOC is kept correct across kernel-module
> boundaries. As a secondary concern, it tries to use the local
> entry point of a target wherever possible. It does so by
> checking if both tramp & target are kernel code, and falls
> back to detecting the common global entry point patterns
> if modules are involved. Detecting the global entry point is
> also required for setting the local entry point as the trampoline
> target: if we cannot detect the local entry point, then we need to
> conservatively initialise r12 and use the global entry point.
>
> The trampolines are marked with `.localentry NAME, 1` to make the
> linker save and restore the TOC on each call to the trampoline. This
> allows the trampoline to safely target functions with different TOC
> values.
>
> However this directive also implies the TOC is not initialised on entry
> to the trampoline. The kernel TOC is easily found in the PACA, but not
> an arbitrary module TOC. Therefore the trampoline implementation depends
> on whether it's in the kernel or not. If in the kernel, we initialise
> the TOC using the PACA. If in a module, we have to initialise the TOC
> with zero context, so it's quite expensive.
> > Signed-off-by: Benjamin Gray > --- > arch/powerpc/Kconfig | 2 +- > arch/powerpc/include/asm/code-patching.h | 1 + > arch/powerpc/include/asm/static_call.h | 80 +++-- > arch/powerpc/kernel/Makefile | 3 +- > arch/powerpc/kernel/static_call.c| 90 ++-- > 5 files changed, 164 insertions(+), 12 deletions(-) > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 4c466acdc70d..e7a66635eade 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -248,7 +248,7 @@ config PPC > select HAVE_SOFTIRQ_ON_OWN_STACK > select HAVE_STACKPROTECTOR if PPC32 && > $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2) > select HAVE_STACKPROTECTOR if PPC64 && > $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13) > - select HAVE_STATIC_CALL if PPC32 > + select HAVE_STATIC_CALL if PPC32 || PPC64_ELF_ABI_V2 > select HAVE_SYSCALL_TRACEPOINTS > select HAVE_VIRT_CPU_ACCOUNTING > select HUGETLB_PAGE_SIZE_VARIABLE if PPC_BOOK3S_64 && HUGETLB_PAGE > diff --git a/arch/powerpc/include/asm/code-patching.h > b/arch/powerpc/include/asm/code-patching.h > index 15efd8ab22da..8d1850080af8 100644 > --- a/arch/powerpc/include/asm/code-patching.h > +++ b/arch/powerpc/include/asm/code-patching.h > @@ -132,6 +132,7 @@ int translate_branch(ppc_inst_t *instr, const u32 *dest, > const u32 *src); > bool is_conditional_branch(ppc_inst_t instr); > > #define OP_RT_RA_MASK 0xUL > +#define OP_SI_MASK 0xUL > #define LIS_R2 (PPC_RAW_LIS(_R2, 0)) > #define ADDIS_R2_R12(PPC_RAW_ADDIS(_R2, _R12, 0)) > #define ADDI_R2_R2 (PPC_RAW_ADDI(_R2, _R2, 0)) > diff --git a/arch/powerpc/include/asm/static_call.h > b/arch/powerpc/include/asm/static_call.h > index de1018cc522b..3d6e82200cb7 100644 > --- a/arch/powerpc/include/asm/static_call.h > +++ b/arch/powerpc/include/asm/static_call.h > @@ -2,12 +2,75 @@ > #ifndef _ASM_POWERPC_STATIC_CALL_H > #define _ASM_POWERPC_STATIC_CALL_H > > +#ifdef CONFIG_PPC64_ELF_ABI_V2 > + > +#ifdef MODULE > + > +#define __PPC_SCT(name, 
inst)\ > + asm(".pushsection .text, \"ax\" \n" \ > + ".align 6 \n" \ > + ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \ > + ".localentry " STATIC_CALL_TRAMP_STR(name) ", 1 \n" \ > + STATIC_CALL_TRAMP_STR(name) ": \n" \ > + " mflr11 \n" \ > + " bcl 20, 31, $+4 \n" \ > + "0: mflr12 \n" \ > + " mtlr11 \n" \ > + " addi12, 12, (" STATIC_CALL_TRAMP_STR(name) " - 0b) \n" > \ > + " addis 2, 12, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@ha \n" > \ > + " addi 2, 2, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@l\n" > \ > + " " inst "\n" \ > + " ld 12, (2f - " STATIC_CALL_TRAMP_STR(name) ")(12) \n" > \ > + " mtctr 12 \n" \ > + " bctr\n" \ > + "1: li 3, 0\n" \ > + " blr
Re: [PATCH v2 4/6] static_call: Move static call selftest to static_call_selftest.c
On 26/09/2022 at 08:43, Benjamin Gray wrote: > These tests are out-of-line only, so moving them to > their own file allows them to be run when an arch does > not implement inline static calls. > > Signed-off-by: Benjamin Gray I think you got a Reviewed-by on the previous series. > --- > kernel/Makefile | 1 + > kernel/static_call_inline.c | 43 --- > kernel/static_call_selftest.c | 41 + > 3 files changed, 42 insertions(+), 43 deletions(-) > create mode 100644 kernel/static_call_selftest.c > > diff --git a/kernel/Makefile b/kernel/Makefile > index 318789c728d3..8ce8beaa3cc0 100644 > --- a/kernel/Makefile > +++ b/kernel/Makefile > @@ -113,6 +113,7 @@ obj-$(CONFIG_KCSAN) += kcsan/ > obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o > obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o > obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o > +obj-$(CONFIG_STATIC_CALL_SELFTEST) += static_call_selftest.o > obj-$(CONFIG_CFI_CLANG) += cfi.o > > obj-$(CONFIG_PERF_EVENTS) += events/ > diff --git a/kernel/static_call_inline.c b/kernel/static_call_inline.c > index dc5665b62814..64d04d054698 100644 > --- a/kernel/static_call_inline.c > +++ b/kernel/static_call_inline.c > @@ -498,46 +498,3 @@ int __init static_call_init(void) > return 0; > } > early_initcall(static_call_init); > - > -#ifdef CONFIG_STATIC_CALL_SELFTEST > - > -static int func_a(int x) > -{ > - return x+1; > -} > - > -static int func_b(int x) > -{ > - return x+2; > -} > - > -DEFINE_STATIC_CALL(sc_selftest, func_a); > - > -static struct static_call_data { > - int (*func)(int); > - int val; > - int expect; > -} static_call_data [] __initdata = { > - { NULL, 2, 3 }, > - { func_b, 2, 4 }, > - { func_a, 2, 3 } > -}; > - > -static int __init test_static_call_init(void) > -{ > - int i; > - > - for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) { > - struct static_call_data *scd = &static_call_data[i]; > - > - if (scd->func) > - static_call_update(sc_selftest, scd->func); > - > - WARN_ON(static_call(sc_selftest)(scd->val) != 
scd->expect); > - } > - > - return 0; > -} > -early_initcall(test_static_call_init); > - > -#endif /* CONFIG_STATIC_CALL_SELFTEST */ > diff --git a/kernel/static_call_selftest.c b/kernel/static_call_selftest.c > new file mode 100644 > index ..246ad89f64eb > --- /dev/null > +++ b/kernel/static_call_selftest.c > @@ -0,0 +1,41 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include > + > +static int func_a(int x) > +{ > + return x+1; > +} > + > +static int func_b(int x) > +{ > + return x+2; > +} > + > +DEFINE_STATIC_CALL(sc_selftest, func_a); > + > +static struct static_call_data { > + int (*func)(int); > + int val; > + int expect; > +} static_call_data [] __initdata = { > + { NULL, 2, 3 }, > + { func_b, 2, 4 }, > + { func_a, 2, 3 } > +}; > + > +static int __init test_static_call_init(void) > +{ > + int i; > + > + for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) { > + struct static_call_data *scd = &static_call_data[i]; > + > + if (scd->func) > + static_call_update(sc_selftest, scd->func); > + > + WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect); > + } > + > + return 0; > +} > +early_initcall(test_static_call_init);
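For readers who want to follow the table-driven logic of the selftest without a kernel build, here is a minimal userspace sketch. It is only a model: a plain function pointer stands in for the static call trampoline, an assignment stands in for static_call_update(), and run_selftest_model() is a hypothetical name that does not exist in the kernel. The real test WARNs on a mismatch; the sketch counts mismatches instead.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: in the kernel the call site or trampoline is
 * patched; here a function pointer plays that role. */
static int func_a(int x) { return x + 1; }
static int func_b(int x) { return x + 2; }

/* Models DEFINE_STATIC_CALL(sc_selftest, func_a). */
static int (*sc_selftest)(int) = func_a;

struct static_call_data {
	int (*func)(int);	/* NULL means "keep the current target" */
	int val;
	int expect;
};

static const struct static_call_data static_call_data[] = {
	{ NULL,   2, 3 },	/* initial target func_a: 2 + 1 */
	{ func_b, 2, 4 },	/* retarget to func_b:    2 + 2 */
	{ func_a, 2, 3 },	/* retarget back to func_a      */
};

/* Hypothetical helper: returns the number of mismatching entries. */
static int run_selftest_model(void)
{
	int failures = 0;
	size_t i;

	for (i = 0; i < sizeof(static_call_data) / sizeof(static_call_data[0]); i++) {
		const struct static_call_data *scd = &static_call_data[i];

		if (scd->func)
			sc_selftest = scd->func;	/* models static_call_update() */

		if (sc_selftest(scd->val) != scd->expect)
			failures++;
	}

	return failures;
}
```

The first table entry has a NULL func on purpose: it checks the initial target before any update has happened.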
Re: [PATCH v2 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub
On 26/09/2022 at 08:43, Benjamin Gray wrote: > Inserts a direct branch to the stub target when possible, replacing the > mtctr/bctr sequence. > > The load into r12 could potentially be skipped too, but that change > would need to refactor the arguments to indicate that the address > does not have a separate local entry point. > > This helps the static call implementation, where modules calling their > own trampolines are called through this stub and the trampoline is > easily within range of a direct branch. > > Signed-off-by: Benjamin Gray > --- > arch/powerpc/kernel/module_64.c | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c > index 4d816f7785b4..745ce9097dcf 100644 > --- a/arch/powerpc/kernel/module_64.c > +++ b/arch/powerpc/kernel/module_64.c > @@ -141,6 +141,12 @@ static u32 ppc64_stub_insns[] = { > PPC_RAW_BCTR(), > }; > > +#ifdef CONFIG_PPC64_ELF_ABI_V1 > +#define PPC64_STUB_MTCTR_OFFSET 5 > +#else > +#define PPC64_STUB_MTCTR_OFFSET 4 > +#endif > + > /* Count how many different 24-bit relocations (different symbol, > different addend) */ > static unsigned int count_relocs(const Elf64_Rela *rela, unsigned int num) > @@ -429,6 +435,8 @@ static inline int create_stub(const Elf64_Shdr *sechdrs, > long reladdr; > func_desc_t desc; > int i; > + u32 *jump_seq_addr = &entry->jump[PPC64_STUB_MTCTR_OFFSET]; > + ppc_inst_t direct; > > if (is_mprofile_ftrace_call(name)) > return create_ftrace_stub(entry, addr, me); > @@ -439,6 +447,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs, > return 0; > } > > + /* Replace indirect branch sequence with direct branch where possible */ > + if (!create_branch(&direct, jump_seq_addr, addr, 0)) > + if (patch_instruction(jump_seq_addr, direct)) Why not use patch_branch()? > + return 0; > + > /* Stub uses address relative to r2. */ > reladdr = (unsigned long)entry - my_r2(sechdrs, me); > if (reladdr > 0x7FFF || reladdr < -(0x8000L)) {
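The "where possible" in the patch depends on whether the target is reachable by a single unconditional branch. As a rough illustration, the sketch below models the reach of a PowerPC I-form branch: a 24-bit word offset, i.e. a signed 26-bit byte offset that must be 4-byte aligned. The function names are made up for this sketch and are not the kernel's helpers; it only shows why a nearby trampoline qualifies and a distant target does not.

```c
#include <assert.h>
#include <stdbool.h>

/* I-form branch: LI is 24 bits, shifted left by 2 and sign-extended,
 * giving a byte offset in [-0x2000000, 0x1fffffc], multiple of 4. */
static bool offset_in_branch_range(long offset)
{
	return offset >= -0x2000000L && offset <= 0x1fffffcL && !(offset & 0x3);
}

/* Hypothetical helper: can an instruction at 'from' branch straight
 * to 'to' with a relative branch? */
static bool can_branch_directly(unsigned long from, unsigned long to)
{
	return offset_in_branch_range((long)(to - from));
}
```

When the check fails, the stub has to keep the indirect mtctr/bctr sequence.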
Re: [PATCH 2/7] mm: Free device private pages have zero refcount
On Mon, Sep 26, 2022 at 04:03:06PM +1000, Alistair Popple wrote: > Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page > refcount") device private pages have no longer had an extra reference > count when the page is in use. However before handing them back to the > owning device driver we add an extra reference count such that free > pages have a reference count of one. > > This makes it difficult to tell if a page is free or not because both > free and in use pages will have a non-zero refcount. Instead we should > return pages to the drivers page allocator with a zero reference count. > Kernel code can then safely use kernel functions such as > get_page_unless_zero(). > > Signed-off-by: Alistair Popple > --- > arch/powerpc/kvm/book3s_hv_uvmem.c | 1 + > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 + > drivers/gpu/drm/nouveau/nouveau_dmem.c | 1 + > lib/test_hmm.c | 1 + > mm/memremap.c| 5 - > mm/page_alloc.c | 6 ++ > 6 files changed, 10 insertions(+), 5 deletions(-) I think this is a great idea, but I'm surprised no dax stuff is touched here? Jason
Re: [PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function
Hi, By the way, my email address is no longer @c-s.fr but @csgroup.eu, although the former still works. On 26/09/2022 at 08:43, Benjamin Gray wrote: > Adds a generic text patching mechanism for patches of 1, 2, 4, or (64-bit) 8 > bytes. The patcher conditionally syncs the icache depending on if > the content will be executed (as opposed to, e.g., read-only data). > > The `patch_instruction` function is reimplemented in terms of this > more generic function. This generic implementation allows patching of > arbitrary 64-bit data, whereas the original `patch_instruction` decided > the size based on the 'instruction' opcode, so was not suitable for > arbitrary data. I get much better results, though still with some slight degradation: activating and de-activating ftrace takes approx 3% more time when STRICT_KERNEL_RWX is selected. The result without STRICT_KERNEL_RWX is surprising: activation also takes 3% more time, but de-activation takes 25% more time. > > Signed-off-by: Benjamin Gray > --- > arch/powerpc/include/asm/code-patching.h | 7 ++ > arch/powerpc/lib/code-patching.c | 90 +--- > 2 files changed, 71 insertions(+), 26 deletions(-) > > diff --git a/arch/powerpc/include/asm/code-patching.h > b/arch/powerpc/include/asm/code-patching.h > index 1c6316ec4b74..15efd8ab22da 100644 > --- a/arch/powerpc/include/asm/code-patching.h > +++ b/arch/powerpc/include/asm/code-patching.h > @@ -76,6 +76,13 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr, > int patch_branch(u32 *addr, unsigned long target, int flags); > int patch_instruction(u32 *addr, ppc_inst_t instr); > int raw_patch_instruction(u32 *addr, ppc_inst_t instr); > +int __patch_memory(void *dest, unsigned long src, size_t size); > + > +#define patch_memory(addr, val) \ > +({ \ > + BUILD_BUG_ON(!__native_word(val)); \ > + __patch_memory(addr, (unsigned long) val, sizeof(val)); \ > +}) Can you do a static __always_inline function instead of a macro here? 
> > static inline unsigned long patch_site_addr(s32 *site) > { > diff --git a/arch/powerpc/lib/code-patching.c > b/arch/powerpc/lib/code-patching.c > index ad0cf3108dd0..9979380d55ef 100644 > --- a/arch/powerpc/lib/code-patching.c > +++ b/arch/powerpc/lib/code-patching.c > @@ -15,20 +15,47 @@ > #include > #include > > -static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 > *patch_addr) > +static int __always_inline ___patch_memory(void *patch_addr, > +unsigned long data, > +void *prog_addr, > +size_t size) Is it really needed in the .c file ? I would expect GCC to take the right decision by itself. By the way, the __always_inline must immediately follow static. > { > - if (!ppc_inst_prefixed(instr)) { > - u32 val = ppc_inst_val(instr); > + switch (size) { > + case 1: > + __put_kernel_nofault(patch_addr, , u8, failed); > + break; > + case 2: > + __put_kernel_nofault(patch_addr, , u16, failed); > + break; > + case 4: > + __put_kernel_nofault(patch_addr, , u32, failed); > + break; > +#ifdef CONFIG_PPC64 > + case 8: > + __put_kernel_nofault(patch_addr, , u64, failed); > + break; > +#endif > + default: > + unreachable(); A BUILD_BUG() would be better here I think. > + } > > - __put_kernel_nofault(patch_addr, , u32, failed); > - } else { > - u64 val = ppc_inst_as_ulong(instr); > + dcbst(patch_addr); > + dcbst(patch_addr + size - 1); /* Last byte of data may cross a > cacheline */ Or the second byte of data may cross a cacheline ... 
> > - __put_kernel_nofault(patch_addr, , u64, failed); > - } > + mb(); /* sync */ > + > + /* Flush on the EA that may be executed in case of a non-coherent > icache */ > + icbi(prog_addr); > + > + /* Also flush the last byte of the instruction if it may be a > + * prefixed instruction and we aren't assuming minimum 64-byte > + * cacheline sizes > + */ > + if (IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64) > + icbi(prog_addr + size - 1); > > - asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr), > - "r" (exec_addr)); > + mb(); /* sync */ > + isync(); > > return 0; > > @@ -38,7 +65,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t > instr, u32 *patch_addr > > int raw_patch_instruction(u32 *addr, ppc_inst_t instr) > { > - return __patch_instruction(addr, instr, addr); > + if (ppc_inst_prefixed(instr)) > + return ___patch_memory(addr, ppc_inst_as_ulong(instr), addr, > sizeof(u64)); > + else > + return ___patch_memory(addr, ppc_inst_val(instr), addr, > sizeof(u32)); >
Re: [PATCH v2 0/6] Out-of-line static calls for powerpc64 ELF V2
On 26/09/2022 at 08:43, Benjamin Gray wrote: > Implementation of out-of-line static calls for PowerPC 64-bit ELF V2 ABI. > Static calls patch an indirect branch into a direct branch at runtime. > Out-of-line specifically has a caller directly call a trampoline, and > the trampoline gets patched to directly call the target. > > Previous version here: > https://lore.kernel.org/all/20220916062330.430468-1-bg...@linux.ibm.com/ > > I couldn't see a dedicated ftrace benchmark in the kernel, but my own > benchmarking showed no significant impact to ftrace activation. I use the following hack for benchmarking: diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 439e2ab6905e..e7d0d3deb8bf 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -2628,10 +2628,11 @@ void __weak ftrace_replace_code(int mod_flags) bool enable = mod_flags & FTRACE_MODIFY_ENABLE_FL; int schedulable = mod_flags & FTRACE_MODIFY_MAY_SLEEP_FL; int failed; + int t0; if (unlikely(ftrace_disabled)) return; - +t0 = mftb(); do_for_each_ftrace_rec(pg, rec) { if (rec->flags & FTRACE_FL_DISABLED) @@ -2646,6 +2647,8 @@ void __weak ftrace_replace_code(int mod_flags) if (schedulable) cond_resched(); } while_for_each_ftrace_rec(); +t0 = mftb() - t0; +pr_err("%s: %d\n", __func__, t0); } struct ftrace_rec_iter { > > The __patch_memory function is meant to be accessed through the size checking > patch_memory wrapper. I don't think there's a way to expose the macro without > also exposing __patch_memory though. I considered making the type an explicit > macro param, but using the value type seemed more ergonomic. 
> > V2: > Mostly accounting for feedback from Christophe: > * Code patching rewritten > - Rename to *_memory > - Use __always_inline to get the compiler to realise it can >collapse all the sub-functions > - Pass data directly instead of through a pointer, elliding a redundant > load > - Flush the last byte of data too (technically redundant if an > instrucion, but >saves a conditional branch + the isync will be the bottleneck). > - Handle a non-cohenrent icache, assume a coherent dcache > - Handle when we don't assume a 64 byte icache on 64-bits > - Flatten the poke address init and teardown > - Check the data size in patch_memory at build time >(inline function was suggested, but a macro makes checking > based on the data type easier). > - It builds now on 32 bit and without strict RWX > * Static call enabling is no longer configurable > * Refactored arch_static_call_transform to minimise casting > * Made the KUnit tests more robust (previously they changed non-volatile >registers in the init hook, but that's incorrect because it returns to >the KUnit framework before the test case is called). 
> * Some other minor refactoring in other patches > > > Benjamin Gray (6): >powerpc/code-patching: Implement generic text patching function >powerpc/module: Handle caller-saved TOC in module linker >powerpc/module: Optimise nearby branches in ELF V2 ABI stub >static_call: Move static call selftest to static_call_selftest.c >powerpc/64: Add support for out-of-line static calls >powerpc/64: Add tests for out-of-line static calls > > arch/powerpc/Kconfig | 12 +- > arch/powerpc/include/asm/code-patching.h | 8 + > arch/powerpc/include/asm/static_call.h | 80 +++- > arch/powerpc/kernel/Makefile | 4 +- > arch/powerpc/kernel/module_64.c | 27 ++- > arch/powerpc/kernel/static_call.c| 151 +- > arch/powerpc/kernel/static_call_test.c | 251 +++ > arch/powerpc/kernel/static_call_test.h | 56 + > arch/powerpc/lib/code-patching.c | 90 +--- > kernel/Makefile | 1 + > kernel/static_call_inline.c | 43 > kernel/static_call_selftest.c| 41 > 12 files changed, 682 insertions(+), 82 deletions(-) > create mode 100644 arch/powerpc/kernel/static_call_test.c > create mode 100644 arch/powerpc/kernel/static_call_test.h > create mode 100644 kernel/static_call_selftest.c > > > base-commit: 3d7a198cfdb47405cfb4a3ea523876569fe341e6 > -- > 2.37.3
Re: [PATCH 3/3] PCI/AER: Use pci_aer_raw_clear_status() to clear root port's AER error status
On 9/23/22 5:50 AM, Bjorn Helgaas wrote: On Fri, Sep 02, 2022 at 02:16:34AM +0800, Zhuo Chen wrote: Statements clearing AER error status in aer_enable_rootport() has the same function as pci_aer_raw_clear_status(). So we replace them, which has no functional changes. Signed-off-by: Zhuo Chen --- drivers/pci/pcie/aer.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index d2996afa80f6..eb0193f279f2 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1287,12 +1287,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc) SYSTEM_ERROR_INTR_ON_MESG_MASK); /* Clear error status */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, &reg32); - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, reg32); - pci_read_config_dword(pdev, aer + PCI_ERR_COR_STATUS, &reg32); - pci_write_config_dword(pdev, aer + PCI_ERR_COR_STATUS, reg32); - pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, &reg32); - pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32); + pci_aer_raw_clear_status(pdev); It's true that this is functionally equivalent. But 20e15e673b05 ("PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error Status") says pci_aer_raw_clear_status() is only for use in the EDR path (this should have been included in the function comment), so I think we should preserve that property and use pci_aer_clear_status() here. pci_aer_raw_clear_status() is the same as pci_aer_clear_status() except it doesn't check pcie_aer_is_native(). And I'm pretty sure we can't get to aer_enable_rootport() *unless* pcie_aer_is_native(), because get_port_device_capability() checks the same thing, so they should be equivalent here. Bjorn Thanks Bjorn, this very detailed correction is helpful. By the way, it may be better to state 'only for use in the EDR path' explicitly in the function comment; so far only the commit log says so. I will change this to use pci_aer_clear_status() in the next patch. 
-- Thanks, Zhuo Chen
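To make the read-then-write-back idiom in the removed lines concrete, here is a small userspace model. It assumes the status registers are write-1-to-clear, so writing back the value just read clears exactly the bits that were set. The struct and function names are invented for this sketch; clear_status() plays the role of the raw variant and clear_status_if_native() the role of the variant that first checks for native AER ownership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a write-1-to-clear status register. */
struct rw1c_reg {
	uint32_t bits;
};

static uint32_t reg_read(const struct rw1c_reg *r)
{
	return r->bits;
}

/* Writing a 1 to a bit position clears it; writing 0 leaves it alone. */
static void reg_write(struct rw1c_reg *r, uint32_t val)
{
	r->bits &= ~val;
}

/* Unconditional clear: read the status and write the same value back. */
static void clear_status(struct rw1c_reg *r)
{
	uint32_t v = reg_read(r);

	reg_write(r, v);
}

/* Guarded clear: do nothing unless the OS owns error handling. */
static void clear_status_if_native(struct rw1c_reg *r, bool native)
{
	if (!native)
		return;

	clear_status(r);
}
```

If the caller is only reachable when AER is native, the two variants behave the same, which is the equivalence Bjorn describes above.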
Re: [PATCH 2/3] PCI/ERR: Clear fatal status in pcie_do_recovery()
On 9/23/22 5:08 AM, Bjorn Helgaas wrote: On Fri, Sep 02, 2022 at 02:16:33AM +0800, Zhuo Chen wrote: When state is pci_channel_io_frozen in pcie_do_recovery(), the severity is fatal and fatal status should be cleared. So we add pci_aer_clear_fatal_status(). Seems sensible to me. Did you find this by code inspection or by debugging a problem? If the latter, it would be nice to mention the symptoms of the problem in the commit log. I found this by code inspection, so I cannot enumerate what kinds of problems this code would cause. Since pcie_aer_is_native() in pci_aer_clear_fatal_status() and pci_aer_clear_nonfatal_status() contains the function of 'if (host->native_aer || pcie_ports_native)', so we move them out of it. Wrap commit log to fill 75 columns. Signed-off-by: Zhuo Chen --- drivers/pci/pcie/err.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index 0c5a143025af..e0a8ade4c3fe 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -243,10 +243,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, * it is responsible for clearing this status. In that case, the * signaling device may not even be visible to the OS. */ - if (host->native_aer || pcie_ports_native) { + if (host->native_aer || pcie_ports_native) pcie_clear_device_status(dev); pcie_clear_device_status() doesn't check for pcie_aer_is_native() internally, but after 068c29a248b6 ("PCI/ERR: Clear PCIe Device Status errors only if OS owns AER") and aa344bc8b727 ("PCI/ERR: Clear AER status only when we control AER"), both callers check before calling it. I think we should move the check inside pcie_clear_device_status(). That could be a separate preliminary patch. There are a couple other places (aer_root_reset() and get_port_device_capability()) that do the same check and could be changed to use pcie_aer_is_native() instead. That could be another preliminary patch. Good suggestion, but I have one doubt. 
In aer_root_reset(), if we use "if (pcie_aer_is_native(dev) && aer)", when dev->aer_cap is NULL and root->aer_cap is not NULL, pcie_aer_is_native() will return false. It's different from just using "(host->native_aer || pcie_ports_native)". Or if we can use "if (pcie_aer_is_native(root))", at this time a NULL pointer check should be added in pcie_aer_is_native() because root may be NULL. + if (state == pci_channel_io_frozen) + pci_aer_clear_fatal_status(dev); + else pci_aer_clear_nonfatal_status(dev); - } + pci_info(bridge, "device recovery successful\n"); return status; -- 2.30.1 (Apple Git-130) -- Thanks, Zhuo Chen
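The doubt raised above can be shown with a small userspace model. This is an assumption-laden sketch, not the kernel's code: struct model_dev, its fields and both functions are hypothetical stand-ins, and the helper is modelled as "no AER capability on this device means not native", which is the behaviour the reply describes. The NULL guard is the extra check the reply says would be needed if the helper were called on a root pointer that may be NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few fields that matter here. */
struct model_dev {
	int aer_cap;		/* 0 means no AER capability found */
	bool host_native_aer;	/* models host->native_aer */
};

static bool pcie_ports_native_model;	/* models pcie_ports_native */

/* Models the open-coded check: looks only at host/global ownership. */
static bool open_coded_native(const struct model_dev *dev)
{
	return dev->host_native_aer || pcie_ports_native_model;
}

/* Models the helper, with the NULL guard discussed above added. */
static bool aer_is_native_model(const struct model_dev *dev)
{
	if (!dev || !dev->aer_cap)
		return false;

	return pcie_ports_native_model || dev->host_native_aer;
}
```

For a device without its own AER capability under a native host bridge, the open-coded check says yes and the helper says no, so swapping one for the other is not a pure refactor in that case.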
Re: [PATCH 1/3] PCI/AER: Use pci_aer_clear_uncorrect_error_status() to clear uncorrectable error status
On 9/23/22 4:02 AM, Bjorn Helgaas wrote: On Mon, Sep 12, 2022 at 01:09:05AM +0800, Zhuo Chen wrote: On 9/12/22 12:22 AM, Serge Semin wrote: On Fri, Sep 02, 2022 at 02:16:32AM +0800, Zhuo Chen wrote: Status bits for ERR_NONFATAL errors only are cleared in pci_aer_clear_nonfatal_status(), but we want clear uncorrectable error status in ntb_hw_idt.c and lpfc_attr.c. So we add pci_aer_clear_uncorrect_error_status() and change to use it. What about the next drivers drivers/scsi/lpfc/lpfc_attr.c drivers/crypto/hisilicon/qm.c drivers/net/ethernet/intel/ice/ice_main.c which call the pci_aer_clear_nonfatal_status() method too? ‘pci_aer_clear_nonfatal_status()’ in drivers/net/ethernet/intel/ice/ice_main.c has already been removed and merged in kernel in: https://github.com/torvalds/linux/commit/ca415ea1f03abf34fc8e4cc5fc30a00189b4e776 It's better if you can use kernel.org URLs that don't depend on third parties like github, e.g., https://git.kernel.org/linus/ca415ea1f03a Good reminder, I'll pay attention next time. ‘pci_aer_clear_nonfatal_status()’ in drivers/crypto/hisilicon/qm.c will be removed in the next kernel: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/hisilicon/qm.c?id=00278564a60e11df8bcca0ececd8b2f55434e406 This is a problem because 00278564a60e ("crypto: hisilicon - Remove pci_aer_clear_nonfatal_status() call") is in Herbert's cryptodev tree, and if I apply this series to the PCI tree and Linus merges it before Herbert's cryptodev changes, it will break the build. I think we need to split this patch up like this: - Add pci_aer_clear_uncorrect_error_status() to PCI core - Convert dpc to use pci_aer_clear_uncorrect_error_status() (I might end up squashing with above) - Convert lpfc to use pci_aer_clear_uncorrect_error_status() - Convert ntb_hw_idt to use pci_aer_clear_uncorrect_error_status() - Unexport pci_aer_clear_nonfatal_status() Then I can apply all but the last patch safely. 
If the crypto changes are merged first, we can add the last one; otherwise we can do it for the next cycle. Good proposal. I will implement these in the next version. Do I need to put the PCI-related modifications (including patches 2/3 and 3/3) in one patch set, or send them as separate patches? Uncorrectable error status register was intended to be cleared in drivers/scsi/lpfc/lpfc_attr.c. But originally function was changed in https://github.com/torvalds/linux/commit/e7b0b847de6db161e3917732276e425bc92a2feb and https://github.com/torvalds/linux/commit/894020fdd88c1e9a74c60b67c0f19f1c7696ba2f This will be a behavior change for lpfc and ntb_hw_idt. It looks like it changes the behavior back to what it was before e7b0b847de6d ("PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery"), so it might be OK, but splitting these out to their own patches will make the change more obvious and we can make sure that's what we want. Bjorn Thanks Bjorn, I will put the lpfc and ntb_hw_idt changes in separate patches. Use pci_aer_clear_nonfatal_status() in dpc_process_error(), which has no functional changes. Since pci_aer_clear_nonfatal_status() is used only internally, move its declaration to the PCI internal header file. Also, no one cares about return value of pci_aer_clear_nonfatal_status(), so make it void. 
Signed-off-by: Zhuo Chen --- drivers/ntb/hw/idt/ntb_hw_idt.c | 4 ++-- drivers/pci/pci.h | 2 ++ drivers/pci/pcie/aer.c | 23 ++- drivers/pci/pcie/dpc.c | 3 +-- drivers/scsi/lpfc/lpfc_attr.c | 4 ++-- include/linux/aer.h | 4 ++-- 6 files changed, 27 insertions(+), 13 deletions(-) diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c index 733557231ed0..de1dbbc5b9de 100644 --- a/drivers/ntb/hw/idt/ntb_hw_idt.c +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c @@ -2657,8 +2657,8 @@ static int idt_init_pci(struct idt_ntb_dev *ndev) ret = pci_enable_pcie_error_reporting(pdev); if (ret != 0) dev_warn(>dev, "PCIe AER capability disabled\n"); - else /* Cleanup nonfatal error status before getting to init */ - pci_aer_clear_nonfatal_status(pdev); + else /* Cleanup uncorrectable error status before getting to init */ + pci_aer_clear_uncorrect_error_status(pdev); From the IDT NTB PCIe initialization procedure point of view both of these methods are equivalent. So for the IDT NTB part: IDT NTB part is the same as drivers/scsi/lpfc/lpfc_attr.c. The original function is clear uncorrectable error status register including fatal and non-fatal error status bits. Acked-by: Serge Semin -Sergey /* First enable the PCI device */ ret = pcim_enable_device(pdev); diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h index e10cdec6c56e..574176f43025 100644 --- a/drivers/pci/pci.h +++ b/drivers/pci/pci.h @@ -686,6 +686,7 @@ void pci_aer_init(struct pci_dev *dev);
Re: [PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function
Hi Benjamin, Thank you for the patch! Yet something to improve: [auto build test ERROR on 3d7a198cfdb47405cfb4a3ea523876569fe341e6] url: https://github.com/intel-lab-lkp/linux/commits/Benjamin-Gray/Out-of-line-static-calls-for-powerpc64-ELF-V2/20220926-145009 base: 3d7a198cfdb47405cfb4a3ea523876569fe341e6 config: powerpc-allnoconfig compiler: powerpc-linux-gcc (GCC) 12.1.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/7e7a5738456329ebbc24558228fb729ce5236f60 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Benjamin-Gray/Out-of-line-static-calls-for-powerpc64-ELF-V2/20220926-145009 git checkout 7e7a5738456329ebbc24558228fb729ce5236f60 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/lib/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot All errors (new ones prefixed by >>): >> arch/powerpc/lib/code-patching.c:18:1: error: 'inline' is not at beginning >> of declaration [-Werror=old-style-declaration] 18 | static int __always_inline ___patch_memory(void *patch_addr, | ^~ cc1: all warnings being treated as errors vim +/inline +18 arch/powerpc/lib/code-patching.c 17 > 18 static int __always_inline ___patch_memory(void *patch_addr, 19 unsigned long data, 20 void *prog_addr, 21 size_t size) 22 { 23 switch (size) { 24 case 1: 25 __put_kernel_nofault(patch_addr, , u8, failed); 26 break; 27 case 2: 28 __put_kernel_nofault(patch_addr, , u16, failed); 29 break; 30 case 4: 31 __put_kernel_nofault(patch_addr, , u32, failed); 32 break; 33 #ifdef CONFIG_PPC64 34 case 8: 35 __put_kernel_nofault(patch_addr, , u64, failed); 36 break; 37 #endif 38 default: 39 unreachable(); 40 } 41 42 
dcbst(patch_addr); 43 dcbst(patch_addr + size - 1); /* Last byte of data may cross a cacheline */ 44 45 mb(); /* sync */ 46 47 /* Flush on the EA that may be executed in case of a non-coherent icache */ 48 icbi(prog_addr); 49 50 /* Also flush the last byte of the instruction if it may be a 51 * prefixed instruction and we aren't assuming minimum 64-byte 52 * cacheline sizes 53 */ 54 if (IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64) 55 icbi(prog_addr + size - 1); 56 57 mb(); /* sync */ 58 isync(); 59 60 return 0; 61 62 failed: 63 return -EPERM; 64 } 65 -- 0-DAY CI Kernel Test Service https://01.org/lkp # # Automatically generated file; DO NOT EDIT. # Linux/powerpc 6.0.0-rc2 Kernel Configuration # CONFIG_CC_VERSION_TEXT="powerpc-linux-gcc (GCC) 12.1.0" CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=120100 CONFIG_CLANG_VERSION=0 CONFIG_AS_IS_GNU=y CONFIG_AS_VERSION=23800 CONFIG_LD_IS_BFD=y CONFIG_LD_VERSION=23800 CONFIG_LLD_VERSION=0 CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y CONFIG_CC_HAS_ASM_INLINE=y CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y CONFIG_PAHOLE_VERSION=123 CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_TABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set # CONFIG_WERROR is not set CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_BUILD_SALT="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_XZ is not set CONFIG_DEFAULT_INIT="" CONFIG_DEFAULT_HOSTNAME="(none)" # CONFIG_SYSVIPC is not set # CONFIG_WATCH_QUEUE is not set # CONFIG_CROSS_MEMORY_ATTACH is not set # CONFIG_USELIB is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_IRQ_SHOW_LEVEL=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y # end of IRQ subsystem CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_HZ_PERIODIC=y # CONFIG_NO_HZ_IDLE is not 
set # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set # end of Tim
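As a side note on the build error reported above, the sketch below shows the specifier order the compiler asks for: storage class first, then the inline specifier, then the return type. It also models the comment in the quoted code about the last byte of a patch landing in a different cacheline. Both function names are hypothetical, the fallback definition of __always_inline is an assumption for building outside the kernel, and the line size is passed in as a parameter rather than taken from L1_CACHE_BYTES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fallback for userspace builds where the macro is not provided. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* 'static __always_inline <type>' keeps 'inline' ahead of the return
 * type; 'static int __always_inline' is the order W=1 rejects. */
static __always_inline bool patch_size_supported(size_t size)
{
	switch (size) {
	case 1:
	case 2:
	case 4:
	case 8:
		return true;
	default:
		return false;
	}
}

/* True when the first and last byte of the patch sit in different
 * cachelines, so flushing the first byte alone is not enough. */
static __always_inline bool spans_two_cachelines(uintptr_t addr, size_t size,
						 size_t line_bytes)
{
	assert(size > 0 && size <= line_bytes);

	return (addr / line_bytes) != ((addr + size - 1) / line_bytes);
}
```

Because the patch is never larger than a cacheline, it can touch at most two lines, so flushing the first and the last byte covers every byte in between.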
Re: [PATCH v2] powerpc: Ignore DSI error caused by the copy/paste instruction
Haren Myneni writes: > DSI error will be generated when the paste operation is issued on > the suspended NX window due to NX state changes. The hypervisor Please spell out DSI and NX on the first usage. > expects the partition to ignore this error during page fault > handling. To differentiate DSI caused by an actual HW configuration > or by the NX window, a new “ibm,pi-features” type value is defined. > Byte 0, bit 3 of pi-attribute-specifier-type is now defined to > indicate this DSI error. If this error is not ignored, the user > space can get SIGBUS when the NX request is issued. > > This patch adds changes to read ibm,pi-features property and ignore > DSI error in the page fault handling if CPU_FTR_NX_DSI is defined. > > Signed-off-by: Haren Myneni > --- > v2: Code cleanup as suggested by Christophe Leroy > > arch/powerpc/include/asm/cputable.h | 5 ++-- > arch/powerpc/kernel/prom.c | 36 + > arch/powerpc/mm/fault.c | 17 +- > 3 files changed, 45 insertions(+), 13 deletions(-) > > diff --git a/arch/powerpc/include/asm/cputable.h > b/arch/powerpc/include/asm/cputable.h > index ae8c3e13cfce..8dc9949b6365 100644 > --- a/arch/powerpc/include/asm/cputable.h > +++ b/arch/powerpc/include/asm/cputable.h > @@ -192,6 +192,7 @@ static inline void cpu_feature_keys_init(void) { } > #define CPU_FTR_P9_RADIX_PREFETCH_BUG > LONG_ASM_CONST(0x0002) > #define CPU_FTR_ARCH_31 > LONG_ASM_CONST(0x0004) > #define CPU_FTR_DAWR1 > LONG_ASM_CONST(0x0008) > +#define CPU_FTR_NX_DSI > LONG_ASM_CONST(0x0010) Can we make this an MMU feature? We have a lot more free MMU feature bits, it should just be a case of s/cpu/mmu/ pretty much everywhere you use it. 
> #ifndef __ASSEMBLY__ > > @@ -429,7 +430,7 @@ static inline void cpu_feature_keys_init(void) { } > CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ > CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ > CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_P9_TLBIE_STQ_BUG | \ > - CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR) > + CPU_FTR_P9_TLBIE_ERAT_BUG | CPU_FTR_P9_TIDR | CPU_FTR_NX_DSI) > #define CPU_FTRS_POWER9_DD2_0 (CPU_FTRS_POWER9 | > CPU_FTR_P9_RADIX_PREFETCH_BUG) > #define CPU_FTRS_POWER9_DD2_1 (CPU_FTRS_POWER9 | \ > CPU_FTR_P9_RADIX_PREFETCH_BUG | \ > @@ -451,7 +452,7 @@ static inline void cpu_feature_keys_init(void) { } > CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \ > CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \ > CPU_FTR_ARCH_300 | CPU_FTR_ARCH_31 | \ > - CPU_FTR_DAWR | CPU_FTR_DAWR1) > + CPU_FTR_DAWR | CPU_FTR_DAWR1 | CPU_FTR_NX_DSI) You're turning that bit on by default for Power9 and Power10 - is that correct? If so do you have a documentation source for that? cheers
[PATCH v2 2/2] powerpc/rtas: block error injection when locked down
The error injection facility on pseries VMs allows corruption of arbitrary guest memory, potentially enabling a sufficiently privileged user to disable lockdown or perform other modifications of the running kernel via the rtas syscall. Block the PAPR error injection facility from being opened or called when locked down. Signed-off-by: Nathan Lynch --- arch/powerpc/kernel/rtas.c | 25 - include/linux/security.h | 1 + security/security.c| 1 + 3 files changed, 26 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 693133972294..c2540d393f1c 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -464,6 +465,9 @@ void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret, va_end(list); } +static int ibm_open_errinjct_token; +static int ibm_errinjct_token; + int rtas_call(int token, int nargs, int nret, int *outputs, ...) { va_list list; @@ -476,6 +480,16 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...) if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE) return -1; + if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) { + /* +* It would be nicer to not discard the error value +* from security_locked_down(), but callers expect an +* RTAS status, not an errno. 
+*/ + if (security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION)) + return -1; + } + if ((mfmsr() & (MSR_IR|MSR_DR)) != (MSR_IR|MSR_DR)) { WARN_ON_ONCE(1); return -1; @@ -1227,6 +1241,14 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs) if (block_rtas_call(token, nargs, )) return -EINVAL; + if (token == ibm_open_errinjct_token || token == ibm_errinjct_token) { + int err; + + err = security_locked_down(LOCKDOWN_RTAS_ERROR_INJECTION); + if (err) + return err; + } + /* Need to handle ibm,suspend_me call specially */ if (token == rtas_token("ibm,suspend-me")) { @@ -1325,7 +1347,8 @@ void __init rtas_initialize(void) #ifdef CONFIG_RTAS_ERROR_LOGGING rtas_last_error_token = rtas_token("rtas-last-error"); #endif - + ibm_open_errinjct_token = rtas_token("ibm,open-errinjct"); + ibm_errinjct_token = rtas_token("ibm,errinjct"); rtas_syscall_filter_init(); } diff --git a/include/linux/security.h b/include/linux/security.h index 39e7c0e403d9..70f89dc3a712 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -123,6 +123,7 @@ enum lockdown_reason { LOCKDOWN_XMON_WR, LOCKDOWN_BPF_WRITE_USER, LOCKDOWN_DBG_WRITE_KERNEL, + LOCKDOWN_RTAS_ERROR_INJECTION, LOCKDOWN_INTEGRITY_MAX, LOCKDOWN_KCORE, LOCKDOWN_KPROBES, diff --git a/security/security.c b/security/security.c index 51bf66d4f472..eabe3ce7e74e 100644 --- a/security/security.c +++ b/security/security.c @@ -61,6 +61,7 @@ const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = { [LOCKDOWN_XMON_WR] = "xmon write access", [LOCKDOWN_BPF_WRITE_USER] = "use of bpf to write user RAM", [LOCKDOWN_DBG_WRITE_KERNEL] = "use of kgdb/kdb to write kernel RAM", + [LOCKDOWN_RTAS_ERROR_INJECTION] = "RTAS error injection", [LOCKDOWN_INTEGRITY_MAX] = "integrity", [LOCKDOWN_KCORE] = "/proc/kcore access", [LOCKDOWN_KPROBES] = "use of kprobes", -- 2.37.3
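The two hunks above gate the same pair of tokens but report the failure differently. A minimal userspace sketch of that difference follows. It is a model under stated assumptions, not the kernel implementation: the token numbers are hypothetical, stub_locked_down() stands in for security_locked_down() and is assumed to return -EPERM when locked down, and the firmware call itself is reduced to "return 0".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical token numbers; the real ones are looked up at boot. */
enum { TOKEN_OPEN_ERRINJCT = 38, TOKEN_ERRINJCT = 39, TOKEN_OTHER = 7 };

static bool locked_down;	/* stand-in for the kernel's lockdown state */

/* Stub: 0 when permitted, negative errno (assumed -EPERM) when locked down. */
static int stub_locked_down(void)
{
	return locked_down ? -EPERM : 0;
}

static bool is_errinjct_token(int token)
{
	return token == TOKEN_OPEN_ERRINJCT || token == TOKEN_ERRINJCT;
}

/* In-kernel path: callers expect an RTAS status, so the errno is discarded. */
static int model_rtas_call(int token)
{
	if (is_errinjct_token(token) && stub_locked_down())
		return -1;
	return 0;	/* pretend the firmware call succeeded */
}

/* Syscall path: the errno can be handed straight back to user space. */
static int model_sys_rtas(int token)
{
	if (is_errinjct_token(token)) {
		int err = stub_locked_down();

		if (err)
			return err;
	}
	return 0;
}
```

Other tokens are unaffected in both paths, so only the error injection services change behaviour on a locked down kernel.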
Re: [PATCH v2 5/6] powerpc/64: Add support for out-of-line static calls
Le 26/09/2022 à 08:43, Benjamin Gray a écrit : > Implement static call support for 64 bit V2 ABI. This requires > making sure the TOC is kept correct across kernel-module > boundaries. As a secondary concern, it tries to use the local > entry point of a target wherever possible. It does so by > checking if both tramp & target are kernel code, and falls > back to detecting the common global entry point patterns > if modules are involved. Detecting the global entry point is > also required for setting the local entry point as the trampoline > target: if we cannot detect the local entry point, then we need to > convservatively initialise r12 and use the global entry point. > > The trampolines are marked with `.localentry NAME, 1` to make the > linker save and restore the TOC on each call to the trampoline. This > allows the trampoline to safely target functions with different TOC > values. > > However this directive also implies the TOC is not initialised on entry > to the trampoline. The kernel TOC is easily found in the PACA, but not > an arbitrary module TOC. Therefore the trampoline implementation depends > on whether it's in the kernel or not. If in the kernel, we initialise > the TOC using the PACA. If in a module, we have to initialise the TOC > with zero context, so it's quite expensive. 
Build failure with GCC 5.5 (ppc64le_defconfig): CC arch/powerpc/kernel/ptrace/ptrace.o {standard input}: Assembler messages: {standard input}:10: Error: .localentry expression for `__SCT__tp_func_sys_enter' is not a valid power of 2 {standard input}:29: Error: .localentry expression for `__SCT__tp_func_sys_exit' is not a valid power of 2 > > Signed-off-by: Benjamin Gray > --- > arch/powerpc/Kconfig | 2 +- > arch/powerpc/include/asm/code-patching.h | 1 + > arch/powerpc/include/asm/static_call.h | 80 +++-- > arch/powerpc/kernel/Makefile | 3 +- > arch/powerpc/kernel/static_call.c| 90 ++-- > 5 files changed, 164 insertions(+), 12 deletions(-) > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index 4c466acdc70d..e7a66635eade 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -248,7 +248,7 @@ config PPC > select HAVE_SOFTIRQ_ON_OWN_STACK > select HAVE_STACKPROTECTOR if PPC32 && > $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2) > select HAVE_STACKPROTECTOR if PPC64 && > $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13) > - select HAVE_STATIC_CALL if PPC32 > + select HAVE_STATIC_CALL if PPC32 || PPC64_ELF_ABI_V2 > select HAVE_SYSCALL_TRACEPOINTS > select HAVE_VIRT_CPU_ACCOUNTING > select HUGETLB_PAGE_SIZE_VARIABLE if PPC_BOOK3S_64 && HUGETLB_PAGE > diff --git a/arch/powerpc/include/asm/code-patching.h > b/arch/powerpc/include/asm/code-patching.h > index 15efd8ab22da..8d1850080af8 100644 > --- a/arch/powerpc/include/asm/code-patching.h > +++ b/arch/powerpc/include/asm/code-patching.h > @@ -132,6 +132,7 @@ int translate_branch(ppc_inst_t *instr, const u32 *dest, > const u32 *src); > bool is_conditional_branch(ppc_inst_t instr); > > #define OP_RT_RA_MASK 0xUL > +#define OP_SI_MASK 0xUL > #define LIS_R2 (PPC_RAW_LIS(_R2, 0)) > #define ADDIS_R2_R12(PPC_RAW_ADDIS(_R2, _R12, 0)) > #define ADDI_R2_R2 (PPC_RAW_ADDI(_R2, _R2, 0)) > diff --git a/arch/powerpc/include/asm/static_call.h > 
b/arch/powerpc/include/asm/static_call.h > index de1018cc522b..3d6e82200cb7 100644 > --- a/arch/powerpc/include/asm/static_call.h > +++ b/arch/powerpc/include/asm/static_call.h > @@ -2,12 +2,75 @@ > #ifndef _ASM_POWERPC_STATIC_CALL_H > #define _ASM_POWERPC_STATIC_CALL_H > > +#ifdef CONFIG_PPC64_ELF_ABI_V2 > + > +#ifdef MODULE > + > +#define __PPC_SCT(name, inst)\ > + asm(".pushsection .text, \"ax\" \n" \ > + ".align 6 \n" \ > + ".globl " STATIC_CALL_TRAMP_STR(name) " \n" \ > + ".localentry " STATIC_CALL_TRAMP_STR(name) ", 1 \n" \ > + STATIC_CALL_TRAMP_STR(name) ": \n" \ > + " mflr11 \n" \ > + " bcl 20, 31, $+4 \n" \ > + "0: mflr12 \n" \ > + " mtlr11 \n" \ > + " addi12, 12, (" STATIC_CALL_TRAMP_STR(name) " - 0b) \n" > \ > + " addis 2, 12, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@ha \n" > \ > + " addi 2, 2, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@l\n" > \ > + " " inst "\n" \ > +
[PATCH v2 0/2] powerpc/pseries: restrict error injection and DT changes when locked down
Add two new lockdown reasons for use in powerpc's pseries platform code. The pseries platform allows hardware-level error injection via certain calls to the RTAS (Run Time Abstraction Services) firmware. ACPI-based error injection is already restricted in lockdown; this facility should be restricted for the same reasons. pseries also allows nearly arbitrary device tree changes via /proc/powerpc/ofdt. Just as overriding ACPI tables is not allowed while locked down, so should this facility be restricted. Changes since v1: * Move LOCKDOWN_DEVICE_TREE next to LOCKDOWN_ACPI_TABLES. Nathan Lynch (2): powerpc/pseries: block untrusted device tree changes when locked down powerpc/rtas: block error injection when locked down arch/powerpc/kernel/rtas.c| 25 ++- arch/powerpc/platforms/pseries/reconfig.c | 5 + include/linux/security.h | 2 ++ security/security.c | 2 ++ 4 files changed, 33 insertions(+), 1 deletion(-) -- 2.37.3
[PATCH v2 1/2] powerpc/pseries: block untrusted device tree changes when locked down
The /proc/powerpc/ofdt interface allows the root user to freely alter the in-kernel device tree, enabling arbitrary physical address writes via drivers that could bind to malicious device nodes, thus making it possible to disable lockdown. Historically this interface has been used on the pseries platform to facilitate the runtime addition and removal of processor, memory, and device resources (aka Dynamic Logical Partitioning or DLPAR). Years ago, the processor and memory use cases were migrated to designs that happen to be lockdown-friendly: device tree updates are communicated directly to the kernel from firmware without passing through untrusted user space. I/O device DLPAR via the "drmgr" command in powerpc-utils remains the sole legitimate user of /proc/powerpc/ofdt, but it is already broken in lockdown since it uses /dev/mem to allocate argument buffers for the rtas syscall. So only illegitimate uses of the interface should see a behavior change when running on a locked down kernel. 
Signed-off-by: Nathan Lynch --- arch/powerpc/platforms/pseries/reconfig.c | 5 + include/linux/security.h | 1 + security/security.c | 1 + 3 files changed, 7 insertions(+) diff --git a/arch/powerpc/platforms/pseries/reconfig.c b/arch/powerpc/platforms/pseries/reconfig.c index cad7a0c93117..599bd2c78514 100644 --- a/arch/powerpc/platforms/pseries/reconfig.c +++ b/arch/powerpc/platforms/pseries/reconfig.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -361,6 +362,10 @@ static ssize_t ofdt_write(struct file *file, const char __user *buf, size_t coun char *kbuf; char *tmp; + rv = security_locked_down(LOCKDOWN_DEVICE_TREE); + if (rv) + return rv; + kbuf = memdup_user_nul(buf, count); if (IS_ERR(kbuf)) return PTR_ERR(kbuf); diff --git a/include/linux/security.h b/include/linux/security.h index 7bd0c490703d..39e7c0e403d9 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -114,6 +114,7 @@ enum lockdown_reason { LOCKDOWN_IOPORT, LOCKDOWN_MSR, LOCKDOWN_ACPI_TABLES, + LOCKDOWN_DEVICE_TREE, LOCKDOWN_PCMCIA_CIS, LOCKDOWN_TIOCSSERIAL, LOCKDOWN_MODULE_PARAMETERS, diff --git a/security/security.c b/security/security.c index 4b95de24bc8d..51bf66d4f472 100644 --- a/security/security.c +++ b/security/security.c @@ -52,6 +52,7 @@ const char *const lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = { [LOCKDOWN_IOPORT] = "raw io port access", [LOCKDOWN_MSR] = "raw MSR access", [LOCKDOWN_ACPI_TABLES] = "modifying ACPI tables", + [LOCKDOWN_DEVICE_TREE] = "modifying device tree contents", [LOCKDOWN_PCMCIA_CIS] = "direct PCMCIA CIS storage", [LOCKDOWN_TIOCSSERIAL] = "reconfiguration of serial port IO", [LOCKDOWN_MODULE_PARAMETERS] = "unsafe module parameters", -- 2.37.3
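The position of a new entry in enum lockdown_reason matters because the enum doubles as an ordering of lockdown levels. The sketch below models that under an assumption: the lockdown LSM is taken to deny a request when the active level compares greater than or equal to the reason. The enum is heavily abridged and the R_* names are illustrative, not the kernel's identifiers.

```c
#include <errno.h>

/* Abridged ordering; most reasons are omitted and the names are stand-ins. */
enum reason {
	R_NONE,
	R_ACPI_TABLES,
	R_DEVICE_TREE,		/* placed next to the ACPI tables entry */
	R_INTEGRITY_MAX,
	R_KCORE,
	R_CONFIDENTIALITY_MAX,
};

/*
 * Assumed model of the lockdown check: a reason is denied when it sorts
 * at or below the currently active lockdown level.
 */
static int model_locked_down(enum reason level, enum reason what)
{
	return level >= what ? -EPERM : 0;
}
```

Under this model, placing the device tree reason before the integrity marker means it is refused in both integrity and confidentiality modes, while reasons after the marker are refused only in confidentiality mode.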
[PATCH -next] soc: fsl: dpio: Add __init/__exit annotations to module init/exit func
Add missing __init/__exit annotations to module init/exit funcs Signed-off-by: ruanjinjie --- drivers/soc/fsl/dpio/dpio-driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/soc/fsl/dpio/dpio-driver.c b/drivers/soc/fsl/dpio/dpio-driver.c index 5a2edc48dd79..534e91dd929c 100644 --- a/drivers/soc/fsl/dpio/dpio-driver.c +++ b/drivers/soc/fsl/dpio/dpio-driver.c @@ -326,7 +326,7 @@ static struct fsl_mc_driver dpaa2_dpio_driver = { .match_id_table = dpaa2_dpio_match_id_table }; -static int dpio_driver_init(void) +static int __init dpio_driver_init(void) { if (!zalloc_cpumask_var(_unused_mask, GFP_KERNEL)) return -ENOMEM; @@ -335,7 +335,7 @@ static int dpio_driver_init(void) return fsl_mc_driver_register(_dpio_driver); } -static void dpio_driver_exit(void) +static void __exit dpio_driver_exit(void) { free_cpumask_var(cpus_unused_mask); fsl_mc_driver_unregister(_dpio_driver); -- 2.25.1
Re: Is PPC 44x PIKA Warp board still relevant?
Christophe Leroy writes: > Hi Dmitry > > Le 25/09/2022 à 07:06, Dmitry Torokhov a écrit : >> Hi Michael, Nick, >> >> I was wondering if PIKA Warp board still relevant. The reason for my >> question is that I am interested in dropping legacy gpio APIs, >> especially OF-specific ones, in favor of newer gpiod APIs, and >> arch/powerpc/platforms/44x/warp.c is one of few users of it. > > As far as I can see, that board is still being sold, see > > https://www.voipon.co.uk/pika-warp-asterisk-appliance-p-932.html On the other hand it looks like PIKA technologies went bankrupt earlier this year. >> The code in question is supposed to turn off green led and flash red led >> in case of overheating, and is doing so by directly accessing GPIOs >> owned by led-gpio driver without requesting/allocating them. This is not >> really supported with gpiod API, and is not a good practice in general. > > As far as I can see, it was ported to led-gpio by > > ba703e1a7a0b powerpc/4xx: Have Warp take advantage of GPIO LEDs > default-state = keep > 805e324b7fbd powerpc: Update Warp to use leds-gpio driver > >> Before I spend much time trying to implement a replacement without >> access to the hardware, I wonder if this board is in use at all, and if >> it is how important is the feature of flashing red led on critical >> temperature shutdown? > > Don't know who can tell it ? I would be surprised if anyone is still running upstream kernels on it. I can't find any sign of any activity on the mailing list related to it since it was initially merged. > Maybe let's perform a more standard implementation is see if anybody > screams ? How much work is it to convert it? Flashing a LED when the machine dies is nice, but not exactly critical, hopefully the machine *isn't* dying that often :) cheers
Re: [PATCH 3/7] powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c
On Monday 26 September 2022 10:17:26 Christophe Leroy wrote: > Le 26/09/2022 à 11:53, Pali Rohár a écrit : > > On Monday 26 September 2022 09:48:02 Christophe Leroy wrote: > >> Le 19/08/2022 à 21:15, Pali Rohár a écrit : > >>> This moves machine descriptions and all related code for all P2020 boards > >>> into new p2020.c source file. This is preparation for code deduplication > >>> and providing one unified machine description for all P2020 boards. > >> > >> I'm having hard time to review this patch. > >> > >> It looks like you are doing much more than just moving machine > >> descriptions and related code into p2020.c > >> > >> Apparently p2020.c has a lot of code that doesn't seem be move from > >> somewhere else. > >> > >> Maybe there is a need to tidy up in order to ease reviewing. > > > > This is probably harder to read due to how git format-patch generated > > this email. The important is: > > > > copy from arch/powerpc/platforms/85xx/mpc85xx_ds.c > > copy to arch/powerpc/platforms/85xx/p2020.c > > > > Which means that git thinks that my newly introduced file p2020.c is > > similar to old file mpc85xx_ds.c and generated diff in format which do: > > > > 1. copy mpc85xx_ds.c to p2020.c > > 2. apply diff on newly introduced file p2020.c > > > > Code is really moved from mpc85xx_ds.c and mpc85xx_rdb.c files into file > > p2020.c. > > > > File p2020.c is new in this patch. > > Well, I didn't really look in how the patch was generated, I imported > your series and mainly reviewed it in git directly. > > For this patch I have the following diff stat: > > $ git show --stat e2d8c39e2e32855658d1c5f042a7ce88952f488a > commit e2d8c39e2e32855658d1c5f042a7ce88952f488a > Author: Pali Rohár > Date: Fri Aug 19 21:15:53 2022 +0200 > > powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c > > This moves machine descriptions and all related code for all P2020 > boards > into new p2020.c source file. 
This is preparation for code > deduplication > and providing one unified machine description for all P2020 boards. > > Signed-off-by: Pali Rohár > > arch/powerpc/platforms/85xx/Makefile | 2 ++ > arch/powerpc/platforms/85xx/mpc85xx_ds.c | 23 -- > arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 44 -- > arch/powerpc/platforms/85xx/p2020.c | 273 > ++ > 4 files changed, 275 insertions(+), 67 deletions(-) > > > So there is a lot more code added than deleted. > > If it was really a code move as described in the commit message, I would > have approximately the same number of inserts as number of deletions. I see... The reason is that the helper ds/rdb functions are copied (not moved) because they are still needed by the ds/rdb boards. Later patches in this series then clean up and simplify the p2020 helper functions. So, as I see it, this change basically moves the p2020 machine descriptions from the ds/rdb files into p2020.c and copies the helper functions. I am not sure what the best way to do this is. I did not want to introduce a regression in the code, so I preferred not to touch the non-p2020 code in the ds/rdb files. 
> > > > >>> > >>> Signed-off-by: Pali Rohár > >>> --- > >>>arch/powerpc/platforms/85xx/Makefile | 2 + > >>>arch/powerpc/platforms/85xx/mpc85xx_ds.c | 23 --- > >>>arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 44 -- > >>>.../platforms/85xx/{mpc85xx_ds.c => p2020.c} | 134 -- > >>>4 files changed, 91 insertions(+), 112 deletions(-) > >>>copy arch/powerpc/platforms/85xx/{mpc85xx_ds.c => p2020.c} (65%) > >>> > >>> diff --git a/arch/powerpc/platforms/85xx/Makefile > >>> b/arch/powerpc/platforms/85xx/Makefile > >>> index 260fbad7967b..1ad261b4eeb6 100644 > >>> --- a/arch/powerpc/platforms/85xx/Makefile > >>> +++ b/arch/powerpc/platforms/85xx/Makefile > >>> @@ -23,6 +23,8 @@ obj-$(CONFIG_P1010_RDB) += p1010rdb.o > >>>obj-$(CONFIG_P1022_DS)+= p1022_ds.o > >>>obj-$(CONFIG_P1022_RDK) += p1022_rdk.o > >>>obj-$(CONFIG_P1023_RDB) += p1023_rdb.o > >>> +obj-$(CONFIG_MPC85xx_DS) += p2020.o > >>> +obj-$(CONFIG_MPC85xx_RDB) += p2020.o > >>>obj-$(CONFIG_TWR_P102x) += twr_p102x.o > >>>obj-$(CONFIG_CORENET_GENERIC) += corenet_generic.o > >>>obj-$(CONFIG_FB_FSL_DIU) += t1042rdb_diu.o > >>> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c > >>> b/arch/powerpc/platforms/85xx/mpc85xx_ds.c > >>> index 9a6d637ef54a..05aac997b5ed 100644 > >>> --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c > >>> +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c > >>> @@ -168,7 +168,6 @@ static int __init mpc8544_ds_probe(void) > >>> > >>>machine_arch_initcall(mpc8544_ds, mpc85xx_common_publish_devices); > >>>machine_arch_initcall(mpc8572_ds,
Re: [PATCH 7/7] powerpc: dts: turris1x.dts: Remove "fsl,P2020RDB-PC" compatible string
On Monday 26 September 2022 10:10:19 Christophe Leroy wrote: > On 19/08/2022 at 21:15, Pali Rohár wrote: > > "fsl,P2020RDB-PC" compatible string was present in Turris 1.x DTS file just > > because Linux kernel required it for proper detection of P2020 processor > > during boot. > > > > This was quite a hack as CZ,NIC Turris 1.x is not compatible with > > Freescale P2020-RDB-PC board. > > > > Now when kernel has generic unified support for boards with P2020 > > processors, there is no need to have this "hack" in turris1x.dts file. > > > > So remove incorrect "fsl,P2020RDB-PC" compatible string from turris1x.dts. > > Oh, I thought it was not possible to modify DTSes. Boards which have DTB binaries hardcoded in the bootloader, or whose DTS files live outside the kernel tree, obviously still need to be supported by the kernel. > If it is, can you have a common compatible to all p2020, for instance > "fsl,p2020", so that you can use it in patch 5 instead of > of_find_node_by_path("/cpus/PowerPC,P2020@0") ? I can add fsl,p2020, but it does not solve the issue for other boards: the string fsl,p2020 is not used by any board (yet). Also, Turris 1.x boards have an older DTB file burned into NOR flash, so it is problematic. > > > > Signed-off-by: Pali Rohár > > --- > > arch/powerpc/boot/dts/turris1x.dts | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/boot/dts/turris1x.dts > > b/arch/powerpc/boot/dts/turris1x.dts > > index 12e08271e61f..69c38ed8a3a5 100644 > > --- a/arch/powerpc/boot/dts/turris1x.dts > > +++ b/arch/powerpc/boot/dts/turris1x.dts > > @@ -15,7 +15,7 @@ > > > > / { > > model = "Turris 1.x"; > > - compatible = "cznic,turris1x", "fsl,P2020RDB-PC"; /* fsl,P2020RDB-PC is > > required for booting Linux */ > > + compatible = "cznic,turris1x"; > > > > aliases { > > ethernet0 =
Re: [PATCH 6/7] powerpc/85xx: p2020: Enable boards by new config option CONFIG_P2020
On Monday 26 September 2022 10:08:19 Christophe Leroy wrote: > Le 19/08/2022 à 21:15, Pali Rohár a écrit : > > Generic unified P2020 machine description which supports all P2020-based > > boards is now in separate file p2020.c. So create a separate config option > > CONFIG_P2020 for it. > > Could it be CONFIG_PPC_P2020 instead ? Nowadays, drivers seems to spread > all over driver/ directory, so it's much better to have CONFIG_PPC_ > prefix on all dedicated powerpc config items. Ok! I do not have any strong preference of config option name. > > > > Previously machine descriptions for P2020 boards were enabled by > > CONFIG_MPC85xx_DS or CONFIG_MPC85xx_RDB option. So set CONFIG_P2020 to be > > enabled by default when one of those option is enabled. > > > > This allows to compile support for P2020 boards without need to have > > enabled support for older mpc85xx boards. And to compile kernel for old > > mpc85xx boards without having enabled support for new P2020 boards. > > > > Signed-off-by: Pali Rohár > > --- > > arch/powerpc/platforms/85xx/Kconfig | 22 ++ > > arch/powerpc/platforms/85xx/Makefile | 3 +-- > > 2 files changed, 19 insertions(+), 6 deletions(-) > > > > diff --git a/arch/powerpc/platforms/85xx/Kconfig > > b/arch/powerpc/platforms/85xx/Kconfig > > index be16eba0f704..2cb4e9248b42 100644 > > --- a/arch/powerpc/platforms/85xx/Kconfig > > +++ b/arch/powerpc/platforms/85xx/Kconfig > > @@ -78,16 +78,16 @@ config MPC8536_DS > > This option enables support for the MPC8536 DS board > > > > config MPC85xx_DS > > - bool "Freescale MPC8544 DS / MPC8572 DS / P2020 DS" > > + bool "Freescale MPC8544 DS / MPC8572 DS" > > select PPC_I8259 > > select DEFAULT_UIMAGE > > select FSL_ULI1575 if PCI > > select SWIOTLB > > help > > - This option enables support for the MPC8544 DS, MPC8572 DS and P2020 > > DS boards > > + This option enables support for the MPC8544 DS and MPC8572 DS boards > > > > config MPC85xx_RDB > > - bool "Freescale P102x MBG/UTM/RDB and P2020 RDB" > > + 
bool "Freescale P102x MBG/UTM/RDB" > > select PPC_I8259 > > select DEFAULT_UIMAGE > > select FSL_ULI1575 if PCI > > @@ -95,7 +95,21 @@ config MPC85xx_RDB > > help > > This option enables support for the P1020 MBG PC, P1020 UTM PC, > > P1020 RDB PC, P1020 RDB PD, P1020 RDB, P1021 RDB PC, P1024 RDB, > > - P1025 RDB, P2020 RDB and P2020 RDB PC boards > > + and P1025 RDB boards > > + > > +config P2020 > > + bool "Freescale P2020" > > + default y if MPC85xx_DS || MPC85xx_RDB > > Is that necessary ? > Can you just update defconfigs ? This is for users' old defconfigs, so that when they update to a new kernel version it automatically selects all features which were already enabled. But if you think this is not necessary, just drop it. > By the way, did you have a look at the impact on defconfigs ? > > > + select DEFAULT_UIMAGE > > + select SWIOTLB > > + imply PPC_I8259 > > + imply FSL_ULI1575 if PCI > > Why imply and not select ? Because several P2020 boards do not have these two HW parts, so I do not see a reason for a hard dependency. In my opinion, if the user does not need some kernel option (because his HW does not require it), then the kernel should allow disabling it, unless there is a strong reason not to. And IIRC imply is like select but allows the user to disable the specified option. > > + help > > + This option enables generic unified support for any board with the > > + Freescale P2020 processor. > > + > > + For example: P2020 DS board, P2020 RDB board, P2020 RDB PC board or > > + CZ.NIC Turris 1.x boards. 
> > > > config P1010_RDB > > bool "Freescale P1010 RDB" > > diff --git a/arch/powerpc/platforms/85xx/Makefile > > b/arch/powerpc/platforms/85xx/Makefile > > index 1ad261b4eeb6..021e168442d7 100644 > > --- a/arch/powerpc/platforms/85xx/Makefile > > +++ b/arch/powerpc/platforms/85xx/Makefile > > @@ -23,8 +23,7 @@ obj-$(CONFIG_P1010_RDB) += p1010rdb.o > > obj-$(CONFIG_P1022_DS)+= p1022_ds.o > > obj-$(CONFIG_P1022_RDK) += p1022_rdk.o > > obj-$(CONFIG_P1023_RDB) += p1023_rdb.o > > -obj-$(CONFIG_MPC85xx_DS) += p2020.o > > -obj-$(CONFIG_MPC85xx_RDB) += p2020.o > > +obj-$(CONFIG_P2020) += p2020.o > > obj-$(CONFIG_TWR_P102x) += twr_p102x.o > > obj-$(CONFIG_CORENET_GENERIC) += corenet_generic.o > > obj-$(CONFIG_FB_FSL_DIU) += t1042rdb_diu.o
Re: [PATCH 3/7] powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c
Le 26/09/2022 à 11:53, Pali Rohár a écrit : > On Monday 26 September 2022 09:48:02 Christophe Leroy wrote: >> Le 19/08/2022 à 21:15, Pali Rohár a écrit : >>> This moves machine descriptions and all related code for all P2020 boards >>> into new p2020.c source file. This is preparation for code deduplication >>> and providing one unified machine description for all P2020 boards. >> >> I'm having hard time to review this patch. >> >> It looks like you are doing much more than just moving machine >> descriptions and related code into p2020.c >> >> Apparently p2020.c has a lot of code that doesn't seem be move from >> somewhere else. >> >> Maybe there is a need to tidy up in order to ease reviewing. > > This is probably harder to read due to how git format-patch generated > this email. The important is: > > copy from arch/powerpc/platforms/85xx/mpc85xx_ds.c > copy to arch/powerpc/platforms/85xx/p2020.c > > Which means that git thinks that my newly introduced file p2020.c is > similar to old file mpc85xx_ds.c and generated diff in format which do: > > 1. copy mpc85xx_ds.c to p2020.c > 2. apply diff on newly introduced file p2020.c > > Code is really moved from mpc85xx_ds.c and mpc85xx_rdb.c files into file > p2020.c. > > File p2020.c is new in this patch. Well, I didn't really look in how the patch was generated, I imported your series and mainly reviewed it in git directly. For this patch I have the following diff stat: $ git show --stat e2d8c39e2e32855658d1c5f042a7ce88952f488a commit e2d8c39e2e32855658d1c5f042a7ce88952f488a Author: Pali Rohár Date: Fri Aug 19 21:15:53 2022 +0200 powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c This moves machine descriptions and all related code for all P2020 boards into new p2020.c source file. This is preparation for code deduplication and providing one unified machine description for all P2020 boards. 
Signed-off-by: Pali Rohár arch/powerpc/platforms/85xx/Makefile | 2 ++ arch/powerpc/platforms/85xx/mpc85xx_ds.c | 23 -- arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 44 -- arch/powerpc/platforms/85xx/p2020.c | 273 ++ 4 files changed, 275 insertions(+), 67 deletions(-) So there is a lot more code added than deleted. If it was really a code move as described in the commit message, I would have approximately the same number of inserts as number of deletions. > >>> >>> Signed-off-by: Pali Rohár >>> --- >>>arch/powerpc/platforms/85xx/Makefile | 2 + >>>arch/powerpc/platforms/85xx/mpc85xx_ds.c | 23 --- >>>arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 44 -- >>>.../platforms/85xx/{mpc85xx_ds.c => p2020.c} | 134 -- >>>4 files changed, 91 insertions(+), 112 deletions(-) >>>copy arch/powerpc/platforms/85xx/{mpc85xx_ds.c => p2020.c} (65%) >>> >>> diff --git a/arch/powerpc/platforms/85xx/Makefile >>> b/arch/powerpc/platforms/85xx/Makefile >>> index 260fbad7967b..1ad261b4eeb6 100644 >>> --- a/arch/powerpc/platforms/85xx/Makefile >>> +++ b/arch/powerpc/platforms/85xx/Makefile >>> @@ -23,6 +23,8 @@ obj-$(CONFIG_P1010_RDB) += p1010rdb.o >>>obj-$(CONFIG_P1022_DS)+= p1022_ds.o >>>obj-$(CONFIG_P1022_RDK) += p1022_rdk.o >>>obj-$(CONFIG_P1023_RDB) += p1023_rdb.o >>> +obj-$(CONFIG_MPC85xx_DS) += p2020.o >>> +obj-$(CONFIG_MPC85xx_RDB) += p2020.o >>>obj-$(CONFIG_TWR_P102x) += twr_p102x.o >>>obj-$(CONFIG_CORENET_GENERIC) += corenet_generic.o >>>obj-$(CONFIG_FB_FSL_DIU) += t1042rdb_diu.o >>> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c >>> b/arch/powerpc/platforms/85xx/mpc85xx_ds.c >>> index 9a6d637ef54a..05aac997b5ed 100644 >>> --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c >>> +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c >>> @@ -168,7 +168,6 @@ static int __init mpc8544_ds_probe(void) >>> >>>machine_arch_initcall(mpc8544_ds, mpc85xx_common_publish_devices); >>>machine_arch_initcall(mpc8572_ds, mpc85xx_common_publish_devices); >>> -machine_arch_initcall(p2020_ds, 
mpc85xx_common_publish_devices); >>> >>>/* >>> * Called very early, device-tree isn't unflattened >>> @@ -178,14 +177,6 @@ static int __init mpc8572_ds_probe(void) >>> return !!of_machine_is_compatible("fsl,MPC8572DS"); >>>} >>> >>> -/* >>> - * Called very early, device-tree isn't unflattened >>> - */ >>> -static int __init p2020_ds_probe(void) >>> -{ >>> - return !!of_machine_is_compatible("fsl,P2020DS"); >>> -} >>> - >>>define_machine(mpc8544_ds) { >>> .name = "MPC8544 DS", >>> .probe = mpc8544_ds_probe, >>> @@ -213,17 +204,3 @@ define_machine(mpc8572_ds) { >>> .calibrate_decr = generic_calibrate_decr, >>> .progress =
Re: [PATCH 7/7] powerpc: dts: turris1x.dts: Remove "fsl,P2020RDB-PC" compatible string
On 19/08/2022 at 21:15, Pali Rohár wrote: > "fsl,P2020RDB-PC" compatible string was present in Turris 1.x DTS file just > because Linux kernel required it for proper detection of P2020 processor > during boot. > > This was quite a hack as CZ,NIC Turris 1.x is not compatible with > Freescale P2020-RDB-PC board. > > Now when kernel has generic unified support for boards with P2020 > processors, there is no need to have this "hack" in turris1x.dts file. > > So remove incorrect "fsl,P2020RDB-PC" compatible string from turris1x.dts. Oh, I thought it was not possible to modify DTSes. If it is, can you have a common compatible for all p2020, for instance "fsl,p2020", so that you can use it in patch 5 instead of of_find_node_by_path("/cpus/PowerPC,P2020@0") ? > > Signed-off-by: Pali Rohár > --- > arch/powerpc/boot/dts/turris1x.dts | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/powerpc/boot/dts/turris1x.dts > b/arch/powerpc/boot/dts/turris1x.dts > index 12e08271e61f..69c38ed8a3a5 100644 > --- a/arch/powerpc/boot/dts/turris1x.dts > +++ b/arch/powerpc/boot/dts/turris1x.dts > @@ -15,7 +15,7 @@ > > / { > model = "Turris 1.x"; > - compatible = "cznic,turris1x", "fsl,P2020RDB-PC"; /* fsl,P2020RDB-PC is > required for booting Linux */ > + compatible = "cznic,turris1x"; > > aliases { > ethernet0 =
Re: [PATCH 5/7] powerpc/85xx: p2020: Define just one machine description
On Monday 26 September 2022 10:02:47 Christophe Leroy wrote:
> > +static int __init p2020_probe(void)
> >  {
> > -	if (of_machine_is_compatible("fsl,P2020RDB-PC"))
> > -		return 1;
> > -	return 0;
> > +	struct device_node *p2020_cpu;
> > +
> > +	/*
> > +	 * There is no common compatible string for all P2020 boards.
> > +	 * The only common thing is "PowerPC,P2020@0" cpu node.
> > +	 * So check for P2020 board via this cpu node.
> > +	 */
> > +	p2020_cpu = of_find_node_by_path("/cpus/PowerPC,P2020@0");
> > +	if (!p2020_cpu)
> > +		return 0;
>
> This looks odd. I thought all probes were using the compatible, and in
> fact I have a series in preparation that drops all
> of_machine_is_compatible() checks in probe functions and does it in the
> caller instead, after adding a .compatible string in the machine
> description.
>
> Is there really no compatible that can be used for all p2020 ?

Really. There is none. I have looked into all available P2020 DTB files (both external ones passed by the bootloader and those in the kernel tree) and there is no common compatible string. The only "common" thing is the cpu node, which is how I implemented it in this patch series.

The same issue exists for boards with P101x and P102x DTB files.
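The cpu-node check discussed above can be modelled outside the kernel. What follows is a minimal user-space sketch, not the kernel code: `struct fake_node`, `fake_find_node_by_path()` and `fake_node_put()` are hypothetical stand-ins for `struct device_node`, `of_find_node_by_path()` and `of_node_put()`. It only illustrates that a successful lookup takes a reference which the probe has to drop before returning.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node: a path and a refcount. */
struct fake_node {
	const char *path;
	int refcount;
};

/* Models of_find_node_by_path(): a hit is returned with its refcount raised. */
static struct fake_node *fake_find_node_by_path(struct fake_node *nodes,
						size_t n, const char *path)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(nodes[i].path, path) == 0) {
			nodes[i].refcount++;
			return &nodes[i];
		}
	}
	return NULL;
}

/* Models of_node_put(): drops the reference taken by the lookup. */
static void fake_node_put(struct fake_node *node)
{
	if (node)
		node->refcount--;
}

/* Same shape as the probe in the patch: look up, release, report. */
static int fake_p2020_probe(struct fake_node *nodes, size_t n)
{
	struct fake_node *cpu;

	cpu = fake_find_node_by_path(nodes, n, "/cpus/PowerPC,P2020@0");
	if (!cpu)
		return 0;

	fake_node_put(cpu);
	return 1;
}
```

In this model the refcount of the cpu node is the same before and after a successful probe, which is the property the `of_node_put()` call in the patch is there to preserve.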
Re: [PATCH 6/7] powerpc/85xx: p2020: Enable boards by new config option CONFIG_P2020
On 19/08/2022 at 21:15, Pali Rohár wrote:
> Generic unified P2020 machine description which supports all P2020-based
> boards is now in separate file p2020.c. So create a separate config option
> CONFIG_P2020 for it.

Could it be CONFIG_PPC_P2020 instead? Nowadays drivers seem to be spread all over the drivers/ directory, so it is much better to have a CONFIG_PPC_ prefix on all dedicated powerpc config items.

>
> Previously machine descriptions for P2020 boards were enabled by
> CONFIG_MPC85xx_DS or CONFIG_MPC85xx_RDB option. So set CONFIG_P2020 to be
> enabled by default when one of those option is enabled.
>
> This allows to compile support for P2020 boards without need to have
> enabled support for older mpc85xx boards. And to compile kernel for old
> mpc85xx boards without having enabled support for new P2020 boards.
>
> Signed-off-by: Pali Rohár
> ---
>  arch/powerpc/platforms/85xx/Kconfig | 22 ++
>  arch/powerpc/platforms/85xx/Makefile | 3 +--
>  2 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
> index be16eba0f704..2cb4e9248b42 100644
> --- a/arch/powerpc/platforms/85xx/Kconfig
> +++ b/arch/powerpc/platforms/85xx/Kconfig
> @@ -78,16 +78,16 @@ config MPC8536_DS
>  	  This option enables support for the MPC8536 DS board
>
>  config MPC85xx_DS
> -	bool "Freescale MPC8544 DS / MPC8572 DS / P2020 DS"
> +	bool "Freescale MPC8544 DS / MPC8572 DS"
>  	select PPC_I8259
>  	select DEFAULT_UIMAGE
>  	select FSL_ULI1575 if PCI
>  	select SWIOTLB
>  	help
> -	  This option enables support for the MPC8544 DS, MPC8572 DS and P2020 DS boards
> +	  This option enables support for the MPC8544 DS and MPC8572 DS boards
>
>  config MPC85xx_RDB
> -	bool "Freescale P102x MBG/UTM/RDB and P2020 RDB"
> +	bool "Freescale P102x MBG/UTM/RDB"
>  	select PPC_I8259
>  	select DEFAULT_UIMAGE
>  	select FSL_ULI1575 if PCI
> @@ -95,7 +95,21 @@ config MPC85xx_RDB
>  	help
>  	  This option enables support for the P1020 MBG PC, P1020 UTM PC,
> P1020 RDB PC, P1020 RDB PD, P1020 RDB, P1021 RDB PC, P1024 RDB, > - P1025 RDB, P2020 RDB and P2020 RDB PC boards > + and P1025 RDB boards > + > +config P2020 > + bool "Freescale P2020" > + default y if MPC85xx_DS || MPC85xx_RDB Is that necessary ? Can you just update defconfigs ? By the way, did you have a look at the impact on defconfigs ? > + select DEFAULT_UIMAGE > + select SWIOTLB > + imply PPC_I8259 > + imply FSL_ULI1575 if PCI Why imply and not select ? > + help > + This option enables generic unified support for any board with the > + Freescale P2020 processor. > + > + For example: P2020 DS board, P2020 RDB board, P2020 RDB PC board or > + CZ.NIC Turris 1.x boards. > > config P1010_RDB > bool "Freescale P1010 RDB" > diff --git a/arch/powerpc/platforms/85xx/Makefile > b/arch/powerpc/platforms/85xx/Makefile > index 1ad261b4eeb6..021e168442d7 100644 > --- a/arch/powerpc/platforms/85xx/Makefile > +++ b/arch/powerpc/platforms/85xx/Makefile > @@ -23,8 +23,7 @@ obj-$(CONFIG_P1010_RDB) += p1010rdb.o > obj-$(CONFIG_P1022_DS)+= p1022_ds.o > obj-$(CONFIG_P1022_RDK) += p1022_rdk.o > obj-$(CONFIG_P1023_RDB) += p1023_rdb.o > -obj-$(CONFIG_MPC85xx_DS) += p2020.o > -obj-$(CONFIG_MPC85xx_RDB) += p2020.o > +obj-$(CONFIG_P2020) += p2020.o > obj-$(CONFIG_TWR_P102x) += twr_p102x.o > obj-$(CONFIG_CORENET_GENERIC) += corenet_generic.o > obj-$(CONFIG_FB_FSL_DIU)+= t1042rdb_diu.o
Re: [PATCH 5/7] powerpc/85xx: p2020: Define just one machine description
Le 19/08/2022 à 21:15, Pali Rohár a écrit : > Combine machine descriptions and code of all P2020 boards into just one > generic unified P2020 machine description. This allows kernel to boot on > any P2020-based board with P2020 DTS file without need to patch kernel and > define a new machine description in 85xx powerpc platform directory. > > Signed-off-by: Pali Rohár > --- > arch/powerpc/platforms/85xx/p2020.c | 83 +++-- > 1 file changed, 19 insertions(+), 64 deletions(-) > > diff --git a/arch/powerpc/platforms/85xx/p2020.c > b/arch/powerpc/platforms/85xx/p2020.c > index d327e6c9b838..1a3ffeb47dfc 100644 > --- a/arch/powerpc/platforms/85xx/p2020.c > +++ b/arch/powerpc/platforms/85xx/p2020.c > @@ -154,83 +154,38 @@ static void __init p2020_setup_arch(void) > #endif > } > > -#ifdef CONFIG_MPC85xx_DS > -machine_arch_initcall(p2020_ds, mpc85xx_common_publish_devices); > -#endif /* CONFIG_MPC85xx_DS */ > - > -#ifdef CONFIG_MPC85xx_RDB > -machine_arch_initcall(p2020_rdb, mpc85xx_common_publish_devices); > -machine_arch_initcall(p2020_rdb_pc, mpc85xx_common_publish_devices); > -#endif /* CONFIG_MPC85xx_RDB */ > +machine_arch_initcall(p2020, mpc85xx_common_publish_devices); > > /* >* Called very early, device-tree isn't unflattened >*/ > -#ifdef CONFIG_MPC85xx_DS > -static int __init p2020_ds_probe(void) > -{ > - return !!of_machine_is_compatible("fsl,P2020DS"); > -} > -#endif /* CONFIG_MPC85xx_DS */ > - > -#ifdef CONFIG_MPC85xx_RDB > -static int __init p2020_rdb_probe(void) > -{ > - if (of_machine_is_compatible("fsl,P2020RDB")) > - return 1; > - return 0; > -} > - > -static int __init p2020_rdb_pc_probe(void) > +static int __init p2020_probe(void) > { > - if (of_machine_is_compatible("fsl,P2020RDB-PC")) > - return 1; > - return 0; > + struct device_node *p2020_cpu; > + > + /* > + * There is no common compatible string for all P2020 boards. > + * The only common thing is "PowerPC,P2020@0" cpu node. > + * So check for P2020 board via this cpu node. 
> + */ > + p2020_cpu = of_find_node_by_path("/cpus/PowerPC,P2020@0"); > + if (!p2020_cpu) > + return 0; This looks odd. I though all probe were using the compatible, and in fact I have a series in preparation that drops all of_machine_is_compatible() checks in probe functions and do it in the caller instead, after adding a .compatible string in the machine description. Is there really no compatible that can be used for all p2020 ? > + > + of_node_put(p2020_cpu); > + return 1; > } > -#endif /* CONFIG_MPC85xx_RDB */ > - > -#ifdef CONFIG_MPC85xx_DS > -define_machine(p2020_ds) { > - .name = "P2020 DS", > - .probe = p2020_ds_probe, > - .setup_arch = p2020_setup_arch, > - .init_IRQ = p2020_pic_init, > -#ifdef CONFIG_PCI > - .pcibios_fixup_bus = fsl_pcibios_fixup_bus, > - .pcibios_fixup_phb = fsl_pcibios_fixup_phb, > -#endif > - .get_irq= mpic_get_irq, > - .calibrate_decr = generic_calibrate_decr, > - .progress = udbg_progress, > -}; > -#endif /* CONFIG_MPC85xx_DS */ > - > -#ifdef CONFIG_MPC85xx_RDB > -define_machine(p2020_rdb) { > - .name = "P2020 RDB", > - .probe = p2020_rdb_probe, > - .setup_arch = p2020_setup_arch, > - .init_IRQ = p2020_pic_init, > -#ifdef CONFIG_PCI > - .pcibios_fixup_bus = fsl_pcibios_fixup_bus, > - .pcibios_fixup_phb = fsl_pcibios_fixup_phb, > -#endif > - .get_irq= mpic_get_irq, > - .calibrate_decr = generic_calibrate_decr, > - .progress = udbg_progress, > -}; > > -define_machine(p2020_rdb_pc) { > - .name = "P2020RDB-PC", > - .probe = p2020_rdb_pc_probe, > +define_machine(p2020) { > + .name = "Freescale P2020", > + .probe = p2020_probe, > .setup_arch = p2020_setup_arch, > .init_IRQ = p2020_pic_init, > #ifdef CONFIG_PCI > .pcibios_fixup_bus = fsl_pcibios_fixup_bus, > - .pcibios_fixup_phb = fsl_pcibios_fixup_phb, > + .pcibios_fixup_phb = fsl_pcibios_fixup_phb, > #endif > .get_irq= mpic_get_irq, > .calibrate_decr = generic_calibrate_decr, > .progress = udbg_progress, > }; > -#endif /* CONFIG_MPC85xx_RDB */
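The reply above mentions a series in preparation that moves the compatible check out of each probe function and into the caller, driven by a .compatible string in the machine description. That series is not shown in this thread, so the following is only a user-space sketch of one possible shape: `struct fake_machdep`, its field names, `fake_root_is_compatible()` and `fake_select_machine()` are all hypothetical, and plain `strcmp()` is used as a simplification of the kernel's compatible matching.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical machine description; the field names are assumptions. */
struct fake_machdep {
	const char *name;
	const char *compatible;	/* NULL means "no compatible check" */
	int (*probe)(void);	/* optional extra check, may be NULL */
};

/* Returns 1 if 'compat' appears in the board's root compatible list. */
static int fake_root_is_compatible(const char *const *root, size_t n,
				   const char *compat)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(root[i], compat) == 0)
			return 1;
	return 0;
}

/* The caller, not each probe function, performs the compatible match. */
static const struct fake_machdep *
fake_select_machine(const struct fake_machdep *machines, size_t nmach,
		    const char *const *root, size_t nroot)
{
	for (size_t i = 0; i < nmach; i++) {
		const struct fake_machdep *m = &machines[i];

		if (m->compatible &&
		    !fake_root_is_compatible(root, nroot, m->compatible))
			continue;
		if (m->probe && !m->probe())
			continue;
		return m;
	}
	return NULL;
}
```

With a layout like this, a board whose root compatible list carried a shared string such as the "fsl,p2020" suggested elsewhere in the thread would need no probe function at all. Boards without such a string would still need a probe callback like the cpu-node lookup in the patch.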
Re: [PATCH 4/7] powerpc/85xx: p2020: Unify .setup_arch and .init_IRQ callbacks
Le 19/08/2022 à 21:15, Pali Rohár a écrit : > Make just one .setup_arch and one .init_IRQ callback implementation for all > P2020 board code. This deduplicate repeated and same code. I think this patch should be split in two parts: First patch : Create function mpc85xx_8259_init Second patch : Refactor. > > Signed-off-by: Pali Rohár > --- > arch/powerpc/platforms/85xx/p2020.c | 97 + > 1 file changed, 30 insertions(+), 67 deletions(-) > > diff --git a/arch/powerpc/platforms/85xx/p2020.c > b/arch/powerpc/platforms/85xx/p2020.c > index d65d4c88ac47..d327e6c9b838 100644 > --- a/arch/powerpc/platforms/85xx/p2020.c > +++ b/arch/powerpc/platforms/85xx/p2020.c > @@ -42,9 +42,8 @@ > #define DBG(fmt, args...) > #endif > > -#ifdef CONFIG_MPC85xx_DS > - > #ifdef CONFIG_PPC_I8259 > + > static void mpc85xx_8259_cascade(struct irq_desc *desc) > { > struct irq_chip *chip = irq_desc_get_chip(desc); > @@ -55,37 +54,21 @@ static void mpc85xx_8259_cascade(struct irq_desc *desc) > } > chip->irq_eoi(>irq_data); > } > -#endif /* CONFIG_PPC_I8259 */ > > -static void __init mpc85xx_ds_pic_init(void) > +static void mpc85xx_8259_init(void) > { > - struct mpic *mpic; > -#ifdef CONFIG_PPC_I8259 > struct device_node *np; > struct device_node *cascade_node = NULL; > int cascade_irq; > -#endif > - > - mpic = mpic_alloc(NULL, 0, > - MPIC_BIG_ENDIAN | > - MPIC_SINGLE_DEST_CPU, > - 0, 256, " OpenPIC "); > > - BUG_ON(mpic == NULL); > - mpic_init(mpic); > - > -#ifdef CONFIG_PPC_I8259 > - /* Initialize the i8259 controller */ > for_each_node_by_type(np, "interrupt-controller") > if (of_device_is_compatible(np, "chrp,iic")) { > cascade_node = np; > break; > } > > - if (cascade_node == NULL) { > - printk(KERN_DEBUG "Could not find i8259 PIC\n"); > + if (cascade_node == NULL) > return; > - } > > cascade_irq = irq_of_parse_and_map(cascade_node, 0); > if (!cascade_irq) { > @@ -93,12 +76,30 @@ static void __init mpc85xx_ds_pic_init(void) > return; > } > > - DBG("mpc85xxds: cascade mapped to irq %d\n", 
cascade_irq); > + DBG("i8259: cascade mapped to irq %d\n", cascade_irq); > > i8259_init(cascade_node, 0); > of_node_put(cascade_node); > > irq_set_chained_handler(cascade_irq, mpc85xx_8259_cascade); > +} > + > +#endif /* CONFIG_PPC_I8259 */ > + > +static void __init p2020_pic_init(void) > +{ > + struct mpic *mpic; > + > + mpic = mpic_alloc(NULL, 0, > + MPIC_BIG_ENDIAN | > + MPIC_SINGLE_DEST_CPU, > + 0, 256, " OpenPIC "); > + > + BUG_ON(mpic == NULL); > + mpic_init(mpic); > + > +#ifdef CONFIG_PPC_I8259 > + mpc85xx_8259_init(); > #endif /* CONFIG_PPC_I8259 */ > } > > @@ -138,58 +139,20 @@ static void __init mpc85xx_ds_uli_init(void) > #endif > } > > -#endif /* CONFIG_MPC85xx_DS */ > - > -#ifdef CONFIG_MPC85xx_RDB > -static void __init mpc85xx_rdb_pic_init(void) > -{ > - struct mpic *mpic; > - > - mpic = mpic_alloc(NULL, 0, > - MPIC_BIG_ENDIAN | > - MPIC_SINGLE_DEST_CPU, > - 0, 256, " OpenPIC "); > - > - BUG_ON(mpic == NULL); > - mpic_init(mpic); > -} > -#endif /* CONFIG_MPC85xx_RDB */ > - > /* >* Setup the architecture >*/ > -#ifdef CONFIG_MPC85xx_DS > -static void __init mpc85xx_ds_setup_arch(void) > +static void __init p2020_setup_arch(void) > { > - if (ppc_md.progress) > - ppc_md.progress("mpc85xx_ds_setup_arch()", 0); > - > swiotlb_detect_4g(); > fsl_pci_assign_primary(); > mpc85xx_ds_uli_init(); > mpc85xx_smp_init(); > > - printk("MPC85xx DS board from Freescale Semiconductor\n"); > -} > -#endif /* CONFIG_MPC85xx_DS */ > - > -#ifdef CONFIG_MPC85xx_RDB > -static void __init mpc85xx_rdb_setup_arch(void) > -{ > - if (ppc_md.progress) > - ppc_md.progress("mpc85xx_rdb_setup_arch()", 0); > - > - mpc85xx_smp_init(); > - > - fsl_pci_assign_primary(); > - > #ifdef CONFIG_QUICC_ENGINE > mpc85xx_qe_par_io_init(); > -#endif /* CONFIG_QUICC_ENGINE */ > - > - printk(KERN_INFO "MPC85xx RDB board from Freescale Semiconductor\n"); > +#endif > } > -#endif /* CONFIG_MPC85xx_RDB */ > > #ifdef CONFIG_MPC85xx_DS > machine_arch_initcall(p2020_ds, mpc85xx_common_publish_devices); > @@ 
-230,8 +193,8 @@ static int __init p2020_rdb_pc_probe(void) > define_machine(p2020_ds) { > .name = "P2020 DS", > .probe = p2020_ds_probe, > - .setup_arch = mpc85xx_ds_setup_arch, > - .init_IRQ = mpc85xx_ds_pic_init, > + .setup_arch = p2020_setup_arch, > + .init_IRQ = p2020_pic_init, > #ifdef CONFIG_PCI >
Re: [PATCH 3/7] powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c
On Monday 26 September 2022 09:48:02 Christophe Leroy wrote:
> On 19/08/2022 at 21:15, Pali Rohár wrote:
> > This moves machine descriptions and all related code for all P2020 boards
> > into new p2020.c source file. This is preparation for code deduplication
> > and providing one unified machine description for all P2020 boards.
>
> I'm having a hard time reviewing this patch.
>
> It looks like you are doing much more than just moving machine
> descriptions and related code into p2020.c
>
> Apparently p2020.c has a lot of code that doesn't seem to be moved from
> somewhere else.
>
> Maybe there is a need to tidy up in order to ease reviewing.

This is probably harder to read because of how git format-patch generated this email. The important part is:

  copy from arch/powerpc/platforms/85xx/mpc85xx_ds.c
  copy to arch/powerpc/platforms/85xx/p2020.c

This means that git considers my newly introduced file p2020.c to be similar to the old file mpc85xx_ds.c, and it generated the diff in a format that does the following:

1. copy mpc85xx_ds.c to p2020.c
2. apply the diff to the newly introduced file p2020.c

The code really is moved from the mpc85xx_ds.c and mpc85xx_rdb.c files into p2020.c. File p2020.c is new in this patch.
> > > > Signed-off-by: Pali Rohár > > --- > > arch/powerpc/platforms/85xx/Makefile | 2 + > > arch/powerpc/platforms/85xx/mpc85xx_ds.c | 23 --- > > arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 44 -- > > .../platforms/85xx/{mpc85xx_ds.c => p2020.c} | 134 -- > > 4 files changed, 91 insertions(+), 112 deletions(-) > > copy arch/powerpc/platforms/85xx/{mpc85xx_ds.c => p2020.c} (65%) > > > > diff --git a/arch/powerpc/platforms/85xx/Makefile > > b/arch/powerpc/platforms/85xx/Makefile > > index 260fbad7967b..1ad261b4eeb6 100644 > > --- a/arch/powerpc/platforms/85xx/Makefile > > +++ b/arch/powerpc/platforms/85xx/Makefile > > @@ -23,6 +23,8 @@ obj-$(CONFIG_P1010_RDB) += p1010rdb.o > > obj-$(CONFIG_P1022_DS)+= p1022_ds.o > > obj-$(CONFIG_P1022_RDK) += p1022_rdk.o > > obj-$(CONFIG_P1023_RDB) += p1023_rdb.o > > +obj-$(CONFIG_MPC85xx_DS) += p2020.o > > +obj-$(CONFIG_MPC85xx_RDB) += p2020.o > > obj-$(CONFIG_TWR_P102x) += twr_p102x.o > > obj-$(CONFIG_CORENET_GENERIC) += corenet_generic.o > > obj-$(CONFIG_FB_FSL_DIU) += t1042rdb_diu.o > > diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c > > b/arch/powerpc/platforms/85xx/mpc85xx_ds.c > > index 9a6d637ef54a..05aac997b5ed 100644 > > --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c > > +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c > > @@ -168,7 +168,6 @@ static int __init mpc8544_ds_probe(void) > > > > machine_arch_initcall(mpc8544_ds, mpc85xx_common_publish_devices); > > machine_arch_initcall(mpc8572_ds, mpc85xx_common_publish_devices); > > -machine_arch_initcall(p2020_ds, mpc85xx_common_publish_devices); > > > > /* > >* Called very early, device-tree isn't unflattened > > @@ -178,14 +177,6 @@ static int __init mpc8572_ds_probe(void) > > return !!of_machine_is_compatible("fsl,MPC8572DS"); > > } > > > > -/* > > - * Called very early, device-tree isn't unflattened > > - */ > > -static int __init p2020_ds_probe(void) > > -{ > > - return !!of_machine_is_compatible("fsl,P2020DS"); > > -} > > - > > define_machine(mpc8544_ds) { > > .name 
= "MPC8544 DS", > > .probe = mpc8544_ds_probe, > > @@ -213,17 +204,3 @@ define_machine(mpc8572_ds) { > > .calibrate_decr = generic_calibrate_decr, > > .progress = udbg_progress, > > }; > > - > > -define_machine(p2020_ds) { > > - .name = "P2020 DS", > > - .probe = p2020_ds_probe, > > - .setup_arch = mpc85xx_ds_setup_arch, > > - .init_IRQ = mpc85xx_ds_pic_init, > > -#ifdef CONFIG_PCI > > - .pcibios_fixup_bus = fsl_pcibios_fixup_bus, > > - .pcibios_fixup_phb = fsl_pcibios_fixup_phb, > > -#endif > > - .get_irq= mpic_get_irq, > > - .calibrate_decr = generic_calibrate_decr, > > - .progress = udbg_progress, > > -}; > > diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c > > b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c > > index b6129c148fea..05f1ed635735 100644 > > --- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c > > +++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c > > @@ -108,8 +108,6 @@ static void __init mpc85xx_rdb_setup_arch(void) > > printk(KERN_INFO "MPC85xx RDB board from Freescale Semiconductor\n"); > > } > > > > -machine_arch_initcall(p2020_rdb, mpc85xx_common_publish_devices); > > -machine_arch_initcall(p2020_rdb_pc, mpc85xx_common_publish_devices); > > machine_arch_initcall(p1020_mbg_pc, mpc85xx_common_publish_devices); > > machine_arch_initcall(p1020_rdb, mpc85xx_common_publish_devices); > > machine_arch_initcall(p1020_rdb_pc, mpc85xx_common_publish_devices); > > @@ -122,13 +120,6 @@ machine_arch_initcall(p1024_rdb, > >
Re: [PATCH 3/7] powerpc/85xx: p2020: Move all P2020 machine descriptions to p2020.c
On 19/08/2022 at 21:15, Pali Rohár wrote:
> This moves machine descriptions and all related code for all P2020 boards
> into new p2020.c source file. This is preparation for code deduplication
> and providing one unified machine description for all P2020 boards.

I'm having a hard time reviewing this patch.

It looks like you are doing much more than just moving machine descriptions and related code into p2020.c.

Apparently p2020.c has a lot of code that doesn't seem to be moved from somewhere else.

Maybe there is a need to tidy up in order to ease reviewing.

>
> Signed-off-by: Pali Rohár
> ---
>  arch/powerpc/platforms/85xx/Makefile | 2 +
>  arch/powerpc/platforms/85xx/mpc85xx_ds.c | 23 ---
>  arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 44 --
>  .../platforms/85xx/{mpc85xx_ds.c => p2020.c} | 134 --
>  4 files changed, 91 insertions(+), 112 deletions(-)
>  copy arch/powerpc/platforms/85xx/{mpc85xx_ds.c => p2020.c} (65%)
>
> diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
> index 260fbad7967b..1ad261b4eeb6 100644
> --- a/arch/powerpc/platforms/85xx/Makefile
> +++ b/arch/powerpc/platforms/85xx/Makefile
> @@ -23,6 +23,8 @@ obj-$(CONFIG_P1010_RDB) += p1010rdb.o
>  obj-$(CONFIG_P1022_DS) += p1022_ds.o
>  obj-$(CONFIG_P1022_RDK) += p1022_rdk.o
>  obj-$(CONFIG_P1023_RDB) += p1023_rdb.o
> +obj-$(CONFIG_MPC85xx_DS) += p2020.o
> +obj-$(CONFIG_MPC85xx_RDB) += p2020.o
>  obj-$(CONFIG_TWR_P102x) += twr_p102x.o
>  obj-$(CONFIG_CORENET_GENERIC) += corenet_generic.o
>  obj-$(CONFIG_FB_FSL_DIU) += t1042rdb_diu.o
> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> index 9a6d637ef54a..05aac997b5ed 100644
> --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> @@ -168,7 +168,6 @@ static int __init mpc8544_ds_probe(void)
>
>  machine_arch_initcall(mpc8544_ds, mpc85xx_common_publish_devices);
>  machine_arch_initcall(mpc8572_ds, mpc85xx_common_publish_devices);
>
-machine_arch_initcall(p2020_ds, mpc85xx_common_publish_devices); > > /* >* Called very early, device-tree isn't unflattened > @@ -178,14 +177,6 @@ static int __init mpc8572_ds_probe(void) > return !!of_machine_is_compatible("fsl,MPC8572DS"); > } > > -/* > - * Called very early, device-tree isn't unflattened > - */ > -static int __init p2020_ds_probe(void) > -{ > - return !!of_machine_is_compatible("fsl,P2020DS"); > -} > - > define_machine(mpc8544_ds) { > .name = "MPC8544 DS", > .probe = mpc8544_ds_probe, > @@ -213,17 +204,3 @@ define_machine(mpc8572_ds) { > .calibrate_decr = generic_calibrate_decr, > .progress = udbg_progress, > }; > - > -define_machine(p2020_ds) { > - .name = "P2020 DS", > - .probe = p2020_ds_probe, > - .setup_arch = mpc85xx_ds_setup_arch, > - .init_IRQ = mpc85xx_ds_pic_init, > -#ifdef CONFIG_PCI > - .pcibios_fixup_bus = fsl_pcibios_fixup_bus, > - .pcibios_fixup_phb = fsl_pcibios_fixup_phb, > -#endif > - .get_irq= mpic_get_irq, > - .calibrate_decr = generic_calibrate_decr, > - .progress = udbg_progress, > -}; > diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c > b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c > index b6129c148fea..05f1ed635735 100644 > --- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c > +++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c > @@ -108,8 +108,6 @@ static void __init mpc85xx_rdb_setup_arch(void) > printk(KERN_INFO "MPC85xx RDB board from Freescale Semiconductor\n"); > } > > -machine_arch_initcall(p2020_rdb, mpc85xx_common_publish_devices); > -machine_arch_initcall(p2020_rdb_pc, mpc85xx_common_publish_devices); > machine_arch_initcall(p1020_mbg_pc, mpc85xx_common_publish_devices); > machine_arch_initcall(p1020_rdb, mpc85xx_common_publish_devices); > machine_arch_initcall(p1020_rdb_pc, mpc85xx_common_publish_devices); > @@ -122,13 +120,6 @@ machine_arch_initcall(p1024_rdb, > mpc85xx_common_publish_devices); > /* >* Called very early, device-tree isn't unflattened >*/ > -static int __init p2020_rdb_probe(void) > -{ > - if 
(of_machine_is_compatible("fsl,P2020RDB")) > - return 1; > - return 0; > -} > - > static int __init p1020_rdb_probe(void) > { > if (of_machine_is_compatible("fsl,P1020RDB")) > @@ -153,13 +144,6 @@ static int __init p1021_rdb_pc_probe(void) > return 0; > } > > -static int __init p2020_rdb_pc_probe(void) > -{ > - if (of_machine_is_compatible("fsl,P2020RDB-PC")) > - return 1; > - return 0; > -} > - > static int __init p1025_rdb_probe(void) > { > return of_machine_is_compatible("fsl,P1025RDB"); > @@ -180,20 +164,6 @@ static int __init
Re: [PATCH 2/7] powerpc/85xx: Mark mpc85xx_ds_pic_init() as static
On Monday 26 September 2022 09:43:55 Christophe Leroy wrote:
> On 19/08/2022 at 21:15, Pali Rohár wrote:
> > Function mpc85xx_ds_pic_init() is not used out of the mpc85xx_ds.c file.
> >
> > Signed-off-by: Pali Rohár
>
> This patch should be squashed into patch 1.

No problem. Just to explain: I split those changes into different patches because they touch different files and different board code, and I thought that different things should be in different patches.

> > ---
> >  arch/powerpc/platforms/85xx/mpc85xx_ds.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> > index f8d2c97f39bd..9a6d637ef54a 100644
> > --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> > +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> > @@ -54,7 +54,7 @@ static void mpc85xx_8259_cascade(struct irq_desc *desc)
> >  }
> >  #endif /* CONFIG_PPC_I8259 */
> >
> > -void __init mpc85xx_ds_pic_init(void)
> > +static void __init mpc85xx_ds_pic_init(void)
> >  {
> >  	struct mpic *mpic;
> >  #ifdef CONFIG_PPC_I8259
Re: [PATCH 2/7] powerpc/85xx: Mark mpc85xx_ds_pic_init() as static
On 19/08/2022 at 21:15, Pali Rohár wrote:
> Function mpc85xx_ds_pic_init() is not used out of the mpc85xx_ds.c file.
>
> Signed-off-by: Pali Rohár

This patch should be squashed into patch 1.

> ---
>  arch/powerpc/platforms/85xx/mpc85xx_ds.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_ds.c b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> index f8d2c97f39bd..9a6d637ef54a 100644
> --- a/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> +++ b/arch/powerpc/platforms/85xx/mpc85xx_ds.c
> @@ -54,7 +54,7 @@ static void mpc85xx_8259_cascade(struct irq_desc *desc)
>  }
>  #endif /* CONFIG_PPC_I8259 */
>
> -void __init mpc85xx_ds_pic_init(void)
> +static void __init mpc85xx_ds_pic_init(void)
>  {
>  	struct mpic *mpic;
>  #ifdef CONFIG_PPC_I8259
Re: [PATCH v3] powerpc/pseries/mce: Avoid instrumentation in realmode
On Mon Sep 26, 2022 at 4:18 PM AEST, Ganesh Goudar wrote: > Part of machine check error handling is done in realmode, > As of now instrumentation is not possible for any code that > runs in realmode. > When MCE is injected on KASAN enabled kernel, crash is > observed, Hence force inline or mark no instrumentation > for functions which can run in realmode, to avoid KASAN > instrumentation. > > Signed-off-by: Ganesh Goudar > --- > v2: Force inline few more functions. > > v3: Adding noinstr to few functions instead of __always_inline. I would still like to consider doing a realmode annotation, but as a minimal fix for the next merge window I suppose this is okay. There's still no indication for why the annotation exists on the functions which is a bit annoying, maybe not fundamentally worse than notrace was, but the scope of reasons why it's there gets bigger. > --- > arch/powerpc/include/asm/hw_irq.h| 8 > arch/powerpc/include/asm/interrupt.h | 2 +- > arch/powerpc/include/asm/rtas.h | 4 ++-- > arch/powerpc/kernel/rtas.c | 4 ++-- > 4 files changed, 9 insertions(+), 9 deletions(-) > > diff --git a/arch/powerpc/include/asm/hw_irq.h > b/arch/powerpc/include/asm/hw_irq.h > index 983551859891..c4d542b4a623 100644 > --- a/arch/powerpc/include/asm/hw_irq.h > +++ b/arch/powerpc/include/asm/hw_irq.h > @@ -111,7 +111,7 @@ static inline void __hard_RI_enable(void) > #ifdef CONFIG_PPC64 > #include > > -static inline notrace unsigned long irq_soft_mask_return(void) > +noinstr static unsigned long irq_soft_mask_return(void) > { > unsigned long flags; Don't uninline the ones in headers. > @@ -128,7 +128,7 @@ static inline notrace unsigned long > irq_soft_mask_return(void) > * for the critical section and as a clobber because > * we changed paca->irq_soft_mask > */ > -static inline notrace void irq_soft_mask_set(unsigned long mask) > +noinstr static void irq_soft_mask_set(unsigned long mask) > { > /* >* The irq mask must always include the STD bit if any are set. 
> @@ -155,7 +155,7 @@ static inline notrace void irq_soft_mask_set(unsigned > long mask) > : "memory"); > } > > -static inline notrace unsigned long irq_soft_mask_set_return(unsigned long > mask) > +noinstr static unsigned long irq_soft_mask_set_return(unsigned long mask) > { > unsigned long flags; > > @@ -191,7 +191,7 @@ static inline notrace unsigned long > irq_soft_mask_or_return(unsigned long mask) > return flags; > } > > -static inline unsigned long arch_local_save_flags(void) > +static __always_inline unsigned long arch_local_save_flags(void) > { > return irq_soft_mask_return(); > } Can we instead add noinstr to this too, the the other ones that were changed to always inline? Thanks, Nick > diff --git a/arch/powerpc/include/asm/interrupt.h > b/arch/powerpc/include/asm/interrupt.h > index 8069dbc4b8d1..090895051712 100644 > --- a/arch/powerpc/include/asm/interrupt.h > +++ b/arch/powerpc/include/asm/interrupt.h > @@ -92,7 +92,7 @@ static inline bool is_implicit_soft_masked(struct pt_regs > *regs) > return search_kernel_soft_mask_table(regs->nip); > } > > -static inline void srr_regs_clobbered(void) > +static __always_inline void srr_regs_clobbered(void) > { > local_paca->srr_valid = 0; > local_paca->hsrr_valid = 0; > diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h > index 00531af17ce0..52d29d664fdf 100644 > --- a/arch/powerpc/include/asm/rtas.h > +++ b/arch/powerpc/include/asm/rtas.h > @@ -201,13 +201,13 @@ inline uint32_t rtas_ext_event_company_id(struct > rtas_ext_event_log_v6 *ext_log) > #define PSERIES_ELOG_SECT_ID_MCE (('M' << 8) | 'C') > > static > -inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect) > +__always_inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect) > { > return be16_to_cpu(sect->id); > } > > static > -inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect) > +__always_inline uint16_t pseries_errorlog_length(struct pseries_errorlog > *sect) > { > return 
be16_to_cpu(sect->length); > } > diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c > index 693133972294..f9d78245c0e8 100644 > --- a/arch/powerpc/kernel/rtas.c > +++ b/arch/powerpc/kernel/rtas.c > @@ -48,7 +48,7 @@ > /* This is here deliberately so it's only used in this file */ > void enter_rtas(unsigned long); > > -static inline void do_enter_rtas(unsigned long args) > +static __always_inline void do_enter_rtas(unsigned long args) > { > unsigned long msr; > > @@ -435,7 +435,7 @@ static char *__fetch_rtas_last_error(char *altbuf) > #endif > > > -static void > +noinstr static void > va_rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret, > va_list list) > { > -- > 2.37.1
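One reading of the review comments above is that small helpers living in headers should stay inline, while the out-of-line functions that can run in realmode carry the no-instrumentation annotation. The following user-space sketch illustrates that split under stated assumptions: `fake_always_inline` and `fake_noinstr` are hypothetical approximations built from compiler attributes, and the kernel's real `__always_inline` and `noinstr` definitions carry more than is shown here. The function names are invented for the example.

```c
/*
 * Hypothetical user-space approximations of the kernel annotations.
 * The real noinstr also places the function in a dedicated section
 * and disables further kinds of instrumentation.
 */
#define fake_always_inline inline __attribute__((always_inline))
#define fake_noinstr __attribute__((noinline, no_sanitize_address))

/*
 * Header-style helper: forced inline, so its body is compiled as part
 * of whichever function calls it.
 */
static fake_always_inline unsigned long
fake_soft_mask_return(const unsigned long *soft_mask)
{
	return *soft_mask;
}

/*
 * Out-of-line entry point that must not be instrumented. The helper
 * above is expanded inside it rather than called.
 */
fake_noinstr unsigned long fake_realmode_entry(const unsigned long *soft_mask)
{
	return fake_soft_mask_return(soft_mask) | 0x1UL;
}
```

Whether a given helper in the patch should be `__always_inline` or `noinstr` is exactly what the review is still discussing, so this sketch takes no position on the individual functions.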
Re: [PATCH -next] powerpc: Avoid platform device Leak in the event of platform_device_add() fails
kindly ping

On 2022/9/14 11:26, Lin Yujun wrote:

Use platform_device_put() to free the platform device and return directly in the event platform_device_add() fails.

Fixes: a28d3af2a26c ("[PATCH] 2/5 powerpc: Rework PowerMac i2c part 2")
Signed-off-by: Lin Yujun
---
 arch/powerpc/platforms/powermac/low_i2c.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powermac/low_i2c.c b/arch/powerpc/platforms/powermac/low_i2c.c
index c1c430c66dc9..5171635c3450 100644
--- a/arch/powerpc/platforms/powermac/low_i2c.c
+++ b/arch/powerpc/platforms/powermac/low_i2c.c
@@ -1487,6 +1487,7 @@ static int __init pmac_i2c_create_platform_devices(void)
 {
 	struct pmac_i2c_bus *bus;
 	int i = 0;
+	int ret;
 
 	/* In the case where we are initialized from smp_init(), we must
 	 * not use the timer (and thus the irq). It's safe from now on
@@ -1502,7 +1503,11 @@ static int __init pmac_i2c_create_platform_devices(void)
 			return -ENOMEM;
 		bus->platform_dev->dev.platform_data = bus;
 		bus->platform_dev->dev.of_node = bus->busnode;
-		platform_device_add(bus->platform_dev);
+		ret = platform_device_add(bus->platform_dev);
+		if (ret) {
+			platform_device_put(bus->platform_dev);
+			return ret;
+		}
 	}
 
 	/* Now call platform "init" functions */
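The error path added by this patch follows a common reference-counting pattern: the allocation hands the caller one reference, and if registration fails the caller must drop it. A minimal user-space model of that pattern is sketched below. `struct fake_pdev` and the `fake_pdev_*()` helpers are hypothetical stand-ins and do not reproduce the platform device API. The failure is injected through a parameter purely for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted platform device. */
struct fake_pdev {
	int refcount;
	int registered;
};

/* Number of devices currently allocated; used to spot leaks. */
static int fake_live_devices;

/* Models allocation: the caller receives the initial reference. */
static struct fake_pdev *fake_pdev_alloc(void)
{
	struct fake_pdev *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->refcount = 1;
	fake_live_devices++;
	return p;
}

/* Models the put: frees the device when the last reference is dropped. */
static void fake_pdev_put(struct fake_pdev *p)
{
	if (p && --p->refcount == 0) {
		free(p);
		fake_live_devices--;
	}
}

/* Models registration, with an injectable failure. */
static int fake_pdev_add(struct fake_pdev *p, int simulate_failure)
{
	if (simulate_failure)
		return -EBUSY;
	p->registered = 1;
	return 0;
}

/* Same shape as the patched loop body: put the device if add fails. */
static int fake_create_device(int simulate_failure, struct fake_pdev **out)
{
	struct fake_pdev *p = fake_pdev_alloc();
	int ret;

	if (!p)
		return -ENOMEM;

	ret = fake_pdev_add(p, simulate_failure);
	if (ret) {
		fake_pdev_put(p);
		return ret;
	}

	*out = p;
	return 0;
}
```

Without the `fake_pdev_put()` call on the failure branch, `fake_live_devices` would stay at one after a failed add, which is the leak the patch title describes.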
Re: [PATCH v3] powerpc/pseries/mce: Avoid instrumentation in realmode
> On 26-Sep-2022, at 11:48 AM, Ganesh Goudar wrote:
>
> Part of machine check error handling is done in realmode,
> As of now instrumentation is not possible for any code that
> runs in realmode.
> When MCE is injected on KASAN enabled kernel, crash is
> observed, Hence force inline or mark no instrumentation
> for functions which can run in realmode, to avoid KASAN
> instrumentation.
>
> Signed-off-by: Ganesh Goudar
> ---
> v2: Force inline few more functions.
>
> v3: Adding noinstr to few functions instead of __always_inline.
> ---

Tested-by: Sachin Sant

- Sachin
RE: [RFC PATCH 2/2] powerpc: nop trap instruction after WARN_ONCE fires
From: Nicholas Piggin
> Sent: 23 September 2022 16:42
>
> WARN_ONCE and similar are often used in frequently executed code, and
> should not crash the system. The program check interrupt caused by
> WARN_ON_ONCE can be a significant overhead even when nothing is being
> printed. This can cause performance to become unacceptable, having the
> same effective impact to the user as a BUG_ON().
>
> Avoid this overhead by patching the trap with a nop instruction after a
> "once" trap fires. Conditional warnings that return a result must have
> equivalent compare and branch instructions after the trap, so when it is
> nopped the statement will behave the same way. It's possible the asm
> goto should be removed entirely and this comparison just done in C now.
>
> XXX: possibly this should schedule the patching to run in a different
> context than the program check.

I'm pretty sure WARN_ON_ONCE() is valid everywhere printk() is allowed.
In many cases this means you can't call mutex_enter().
So you need a different scheme.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
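The quoted description requires that a conditional warning keeps returning the same result after its trap has been nopped. That requirement can be modelled in plain C as below. This is only a sketch of the intended semantics: `struct fake_warn_site` and `fake_warn_on_once()` are hypothetical, a boolean flag stands in for the patched instruction, and nothing here addresses the question raised in the reply about which context the patching may run in.

```c
#include <stdbool.h>

/* Hypothetical per-call-site state; 'fired' stands in for the nopped trap. */
struct fake_warn_site {
	bool fired;
	int reports;
};

/*
 * Models a "once" warning: the expensive report path runs at most one
 * time per site, but the condition is still evaluated and returned on
 * every call so callers that branch on the result behave the same.
 */
static bool fake_warn_on_once(struct fake_warn_site *site, bool cond)
{
	if (cond && !site->fired) {
		site->fired = true;	/* models replacing the trap with a nop */
		site->reports++;	/* models the costly program check path */
	}
	return cond;
}
```

After the first hit, later calls with a true condition skip the report but still return true, which corresponds to the compare and branch that the description says must remain after the trap.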
[PATCH v3 2/4] powerpc/64s: Remove unneeded #ifdef CONFIG_DEBUG_PAGEALLOC in hash_utils
From: Christophe Leroy debug_pagealloc_enabled() is always defined and constant folds to 'false' when CONFIG_DEBUG_PAGEALLOC is not enabled. Remove the #ifdefs, the code and associated static variables will be optimised out by the compiler when CONFIG_DEBUG_PAGEALLOC is not defined. Signed-off-by: Christophe Leroy Signed-off-by: Nicholas Miehlbradt --- arch/powerpc/mm/book3s64/hash_utils.c | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c index fc92613dc2bf..e63ff401a6ea 100644 --- a/arch/powerpc/mm/book3s64/hash_utils.c +++ b/arch/powerpc/mm/book3s64/hash_utils.c @@ -123,11 +123,8 @@ EXPORT_SYMBOL_GPL(mmu_slb_size); #ifdef CONFIG_PPC_64K_PAGES int mmu_ci_restrictions; #endif -#ifdef CONFIG_DEBUG_PAGEALLOC static u8 *linear_map_hash_slots; static unsigned long linear_map_hash_count; -static DEFINE_SPINLOCK(linear_map_hash_lock); -#endif /* CONFIG_DEBUG_PAGEALLOC */ struct mmu_hash_ops mmu_hash_ops; EXPORT_SYMBOL(mmu_hash_ops); @@ -427,11 +424,9 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend, break; cond_resched(); -#ifdef CONFIG_DEBUG_PAGEALLOC if (debug_pagealloc_enabled() && (paddr >> PAGE_SHIFT) < linear_map_hash_count) linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80; -#endif /* CONFIG_DEBUG_PAGEALLOC */ } return ret < 0 ? 
ret : 0; } @@ -1066,7 +1061,6 @@ static void __init htab_initialize(void) prot = pgprot_val(PAGE_KERNEL); -#ifdef CONFIG_DEBUG_PAGEALLOC if (debug_pagealloc_enabled()) { linear_map_hash_count = memblock_end_of_DRAM() >> PAGE_SHIFT; linear_map_hash_slots = memblock_alloc_try_nid( @@ -1076,7 +1070,6 @@ static void __init htab_initialize(void) panic("%s: Failed to allocate %lu bytes max_addr=%pa\n", __func__, linear_map_hash_count, &ppc64_rma_size); } -#endif /* CONFIG_DEBUG_PAGEALLOC */ /* create bolted the linear mapping in the hash table */ for_each_mem_range(i, &base, &end) { @@ -1991,6 +1984,8 @@ long hpte_insert_repeating(unsigned long hash, unsigned long vpn, } #ifdef CONFIG_DEBUG_PAGEALLOC +static DEFINE_SPINLOCK(linear_map_hash_lock); + static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi) { unsigned long hash; -- 2.34.1
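The reasoning in this patch (a predicate that is always defined and constant-folds to false lets the compiler drop the guarded code, so the #ifdef is unnecessary) can be illustrated with a small userspace sketch. Everything here is a stand-in: CONFIG_DEBUG_PAGEALLOC_SKETCH, debug_pagealloc_enabled_sketch(), record_slot() and the slots array are hypothetical names chosen for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Kconfig symbol; 0 models "option disabled". */
#ifndef CONFIG_DEBUG_PAGEALLOC_SKETCH
#define CONFIG_DEBUG_PAGEALLOC_SKETCH 0
#endif

/* Always defined; evaluates to a compile-time constant either way. */
static inline bool debug_pagealloc_enabled_sketch(void)
{
	return CONFIG_DEBUG_PAGEALLOC_SKETCH != 0;
}

#define NSLOTS 16
static unsigned char slots[NSLOTS];

/*
 * No #ifdef around the body: when the predicate folds to false the
 * store is dead code and an optimising compiler is free to remove it,
 * yet the code is still type-checked in every configuration.
 */
static void record_slot(size_t idx, unsigned char slot)
{
	if (debug_pagealloc_enabled_sketch() && idx < NSLOTS)
		slots[idx] = slot | 0x80;
}
```

With the default value of 0 a call such as record_slot(3, 5) leaves slots[3] untouched; building with -DCONFIG_DEBUG_PAGEALLOC_SKETCH=1 makes the same call store 0x85.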
[PATCH v3 4/4] powerpc/64s: Enable KFENCE on book3s64
KFENCE support was added for ppc32 in commit 90cbac0e995d ("powerpc: Enable KFENCE for PPC32"). Enable KFENCE on ppc64 architecture with hash and radix MMUs. It uses the same mechanism as debug pagealloc to protect/unprotect pages. All KFENCE kunit tests pass on both MMUs. KFENCE memory is initially allocated using memblock but is later marked as SLAB allocated. This necessitates the change to __pud_free to ensure that the KFENCE pages are freed appropriately. Based on previous work by Christophe Leroy and Jordan Niethe. Signed-off-by: Nicholas Miehlbradt --- v2: Refactor v3: Simplified ABI version check --- arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/book3s/64/pgalloc.h | 6 -- arch/powerpc/include/asm/book3s/64/pgtable.h | 2 +- arch/powerpc/include/asm/kfence.h| 15 +++ arch/powerpc/mm/book3s64/hash_utils.c| 10 +- arch/powerpc/mm/book3s64/radix_pgtable.c | 6 -- 6 files changed, 30 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index a4f8a5276e5c..f7dd0f49510d 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -194,7 +194,7 @@ config PPC select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14 select HAVE_ARCH_KASAN if PPC_RADIX_MMU select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN - select HAVE_ARCH_KFENCE if PPC_BOOK3S_32 || PPC_8xx || 40x + select HAVE_ARCH_KFENCE if ARCH_SUPPORTS_DEBUG_PAGEALLOC select HAVE_ARCH_KGDB select HAVE_ARCH_MMAP_RND_BITS select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h index e1af0b394ceb..dd2cff53a111 100644 --- a/arch/powerpc/include/asm/book3s/64/pgalloc.h +++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h @@ -113,9 +113,11 @@ static inline void __pud_free(pud_t *pud) /* * Early pud pages allocated via memblock allocator -* can't be directly freed to slab +* can't be directly freed to slab. 
KFENCE pages have +* both reserved and slab flags set so need to be freed +* kmem_cache_free. */ - if (PageReserved(page)) + if (PageReserved(page) && !PageSlab(page)) free_reserved_page(page); else kmem_cache_free(PGT_CACHE(PUD_CACHE_INDEX), pud); diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index cb9d5fd39d7f..fd5d800f2836 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -1123,7 +1123,7 @@ static inline void vmemmap_remove_mapping(unsigned long start, } #endif -#ifdef CONFIG_DEBUG_PAGEALLOC +#if defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE) static inline void __kernel_map_pages(struct page *page, int numpages, int enable) { if (radix_enabled()) diff --git a/arch/powerpc/include/asm/kfence.h b/arch/powerpc/include/asm/kfence.h index a9846b68c6b9..6fd2b4d486c5 100644 --- a/arch/powerpc/include/asm/kfence.h +++ b/arch/powerpc/include/asm/kfence.h @@ -11,11 +11,25 @@ #include #include +#ifdef CONFIG_PPC64_ELF_ABI_V1 +#define ARCH_FUNC_PREFIX "." 
+#endif + static inline bool arch_kfence_init_pool(void) { return true; } +#ifdef CONFIG_PPC64 +static inline bool kfence_protect_page(unsigned long addr, bool protect) +{ + struct page *page = virt_to_page(addr); + + __kernel_map_pages(page, 1, !protect); + + return true; +} +#else static inline bool kfence_protect_page(unsigned long addr, bool protect) { pte_t *kpte = virt_to_kpte(addr); @@ -29,5 +43,6 @@ static inline bool kfence_protect_page(unsigned long addr, bool protect) return true; } +#endif #endif /* __ASM_POWERPC_KFENCE_H */ diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c index b37412fe5930..9cceaa5998a3 100644 --- a/arch/powerpc/mm/book3s64/hash_utils.c +++ b/arch/powerpc/mm/book3s64/hash_utils.c @@ -424,7 +424,7 @@ int htab_bolt_mapping(unsigned long vstart, unsigned long vend, break; cond_resched(); - if (debug_pagealloc_enabled() && + if (debug_pagealloc_enabled_or_kfence() && (paddr >> PAGE_SHIFT) < linear_map_hash_count) linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80; } @@ -773,7 +773,7 @@ static void __init htab_init_page_sizes(void) bool aligned = true; init_hpte_page_sizes(); - if (!debug_pagealloc_enabled()) { + if (!debug_pagealloc_enabled_or_kfence()) { /* * Pick a size for the linear mapping. Currently, we only * support 16M, 1M and 4K which is the default @@
[PATCH v3 3/4] powerpc/64s: Allow double call of kernel_[un]map_linear_page()
From: Christophe Leroy If the page is already mapped resp. already unmapped, bail out. Signed-off-by: Christophe Leroy Signed-off-by: Nicholas Miehlbradt --- arch/powerpc/mm/book3s64/hash_utils.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c index e63ff401a6ea..b37412fe5930 100644 --- a/arch/powerpc/mm/book3s64/hash_utils.c +++ b/arch/powerpc/mm/book3s64/hash_utils.c @@ -2000,6 +2000,9 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi) if (!vsid) return; + if (linear_map_hash_slots[lmi] & 0x80) + return; + ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode, HPTE_V_BOLTED, mmu_linear_psize, mmu_kernel_ssize); @@ -2019,7 +2022,10 @@ static void kernel_unmap_linear_page(unsigned long vaddr, unsigned long lmi) hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize); spin_lock(&linear_map_hash_lock); - BUG_ON(!(linear_map_hash_slots[lmi] & 0x80)); + if (!(linear_map_hash_slots[lmi] & 0x80)) { + spin_unlock(&linear_map_hash_lock); + return; + } hidx = linear_map_hash_slots[lmi] & 0x7f; linear_map_hash_slots[lmi] = 0; spin_unlock(&linear_map_hash_lock); -- 2.34.1
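The bail-out behaviour of this patch can be modelled in userspace as below. This is only a sketch under stated assumptions: the 0x80 "mapped" bit and the 0x7f slot mask come from the diff, while map_page_sketch(), unmap_page_sketch() and the linear_slots array are hypothetical names, and the spinlock and the actual hash table insert/remove are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOT_VALID 0x80 /* top bit: page is currently mapped */
#define SLOT_MASK  0x7f /* low bits: hash slot index */
#define NPAGES     8

static unsigned char linear_slots[NPAGES];

/* Returns true if a mapping was recorded, false if already mapped. */
static bool map_page_sketch(size_t lmi, unsigned char slot)
{
	if (linear_slots[lmi] & SLOT_VALID)
		return false; /* double map: bail out instead of inserting again */
	linear_slots[lmi] = (slot & SLOT_MASK) | SLOT_VALID;
	return true;
}

/* Returns true if a mapping was removed, false if nothing was mapped. */
static bool unmap_page_sketch(size_t lmi)
{
	if (!(linear_slots[lmi] & SLOT_VALID))
		return false; /* double unmap: bail out instead of BUG_ON() */
	linear_slots[lmi] = 0;
	return true;
}
```

Calling either function twice in a row for the same page is harmless in this model: the second call sees the state left by the first and returns without touching the slot.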
[PATCH v3 1/4] powerpc/64s: Add DEBUG_PAGEALLOC for radix
There is support for DEBUG_PAGEALLOC on hash but not on radix. Add support on radix. Signed-off-by: Nicholas Miehlbradt --- v2: Revert change to radix_memory_block_size, instead set the size in radix_init_pgtable and radix__create_section_mapping directly. v3: Remove max_mapping_size argument of create_physical_mapping as the value is the same at all call sites. --- arch/powerpc/mm/book3s64/radix_pgtable.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c index db2f3d193448..daa40e3b74dd 100644 --- a/arch/powerpc/mm/book3s64/radix_pgtable.c +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c @@ -30,6 +30,7 @@ #include #include #include +#include #include @@ -267,13 +268,16 @@ static unsigned long next_boundary(unsigned long addr, unsigned long end) static int __meminit create_physical_mapping(unsigned long start, unsigned long end, -unsigned long max_mapping_size, int nid, pgprot_t _prot) { unsigned long vaddr, addr, mapping_size = 0; bool prev_exec, exec = false; pgprot_t prot; int psize; + unsigned long max_mapping_size = radix_mem_block_size; + + if (debug_pagealloc_enabled()) + max_mapping_size = PAGE_SIZE; start = ALIGN(start, PAGE_SIZE); end = ALIGN_DOWN(end, PAGE_SIZE); @@ -352,7 +356,6 @@ static void __init radix_init_pgtable(void) } WARN_ON(create_physical_mapping(start, end, - radix_mem_block_size, -1, PAGE_KERNEL)); } @@ -850,7 +853,7 @@ int __meminit radix__create_section_mapping(unsigned long start, } return create_physical_mapping(__pa(start), __pa(end), - radix_mem_block_size, nid, prot); + nid, prot); } int __meminit radix__remove_section_mapping(unsigned long start, unsigned long end) @@ -899,7 +902,14 @@ void __meminit radix__vmemmap_remove_mapping(unsigned long start, unsigned long #ifdef CONFIG_DEBUG_PAGEALLOC void radix__kernel_map_pages(struct page *page, int numpages, int enable) { - pr_warn_once("DEBUG_PAGEALLOC not supported in radix mode\n"); 
+ unsigned long addr; + + addr = (unsigned long)page_address(page); + + if (enable) + set_memory_p(addr, numpages); + else + set_memory_np(addr, numpages); } #endif -- 2.34.1
Re: [PATCH v1 1/3] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")
David Hildenbrand writes: >>> +Use WARN_ON_ONCE() rather than WARN() or WARN_ON() >>> +** >>> + >>> +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it >>> +is common for a given warning condition, if it occurs at all, to occur >>> +multiple times. This can fill up and wrap the kernel log, and can even slow >>> +the system enough that the excessive logging turns into its own, additional >>> +problem. >> >> FWIW I have had cases where WARN() messages caused a reboot, maybe >> mention that here? In my case the logging was so excessive that the >> watchdog wasn't updated and in the end the device was forcefully >> rebooted. >> > > That should be covered by the last part, no? What would be your suggestion? I was just thinking that maybe make it more obvious that even WARN_ON() can crash the system, something along these lines: "..., additional problem like stalling the system so much that it causes a reboot." -- https://patchwork.kernel.org/project/linux-wireless/list/ https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
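To make the log-flooding argument concrete, here is a rough userspace model of a "warn once" check. It is not the kernel's WARN_ON_ONCE() implementation; warn_once(), report_warning(), check_len() and the warnings_printed counter are hypothetical names for illustration, and the sketch ignores concurrency.

```c
#include <stdbool.h>
#include <stdio.h>

static unsigned int warnings_printed;

/* Stand-in for writing a warning and backtrace to the kernel log. */
static void report_warning(const char *file, int line)
{
	warnings_printed++;
	fprintf(stderr, "WARNING at %s:%d\n", file, line);
}

/*
 * Returns the truth value of cond on every call, but reports only the
 * first time cond is true for the given per-call-site flag.
 */
static bool warn_once(bool cond, bool *warned, const char *file, int line)
{
	if (cond && !*warned) {
		*warned = true;
		report_warning(file, line);
	}
	return cond;
}

/* A frequently executed check with its own once-flag. */
static bool check_len(int len)
{
	static bool warned; /* one flag per call site */

	return warn_once(len < 0, &warned, __FILE__, __LINE__);
}
```

If check_len(-1) is called a thousand times in a loop, the condition is still reported as true each time, but only one line reaches the log instead of a thousand.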
Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies
On Mon, Sep 26, 2022 at 08:47:32AM +0200, Greg Kroah-Hartman wrote: > On Sat, Sep 24, 2022 at 01:55:23PM +0200, Michal Suchánek wrote: > > On Sat, Sep 24, 2022 at 12:13:34PM +0200, Greg Kroah-Hartman wrote: > > > On Sat, Sep 24, 2022 at 11:45:21AM +0200, Michal Suchánek wrote: > > > > On Sat, Sep 24, 2022 at 11:19:19AM +0200, Greg Kroah-Hartman wrote: > > > > > On Fri, Sep 23, 2022 at 07:10:28PM +0200, Michal Suchanek wrote: > > > > > > Hello, > > > > > > > > > > > > this is backport of commit 0d519cadf751 > > > > > > ("arm64: kexec_file: use more system keyrings to verify kernel > > > > > > image signature") > > > > > > to table 5.15 tree including the preparatory patches. > > > > > > > > > > This feels to me like a new feature for arm64, one that has never > > > > > worked > > > > > before and you are just making it feature-parity with x86, right? > > > > > > > > > > Or is this a regression fix somewhere? Why is this needed in 5.15.y > > > > > and > > > > > why can't people who need this new feature just use a newer kernel > > > > > version (5.19?) > > > > > > > > It's half-broken implementation of the kexec kernel verification. At > > > > the time > > > > it was implemented for arm64 we had the platform and secondary keyrings > > > > and x86 was using them but on arm64 the initial implementation ignores > > > > them. > > > > > > Ok, so it's something that never worked. Adding support to get it to > > > work doesn't really fall into the stable kernel rules, right? > > > > Not sure. It was defective, not using the facilities available at the > > time correctly. Which translates to kernels that can be kexec'd on x86 > > failing to kexec on arm64 without any explanation (signed with same key, > > built for the appropriate arch). > > Feature parity across architectures is not a "regression", but rather a > "this feature is not implemented for this architecture yet" type of > thing. 
That depends on the view - before kexec verification you could boot any kernel, now you can boot some kernels signed with a valid key, but not others - the initial implementation is buggy, probably because it is based on an old version of the x86 code. > > > > Again, what's wrong with 5.19 for anyone who wants this? Who does want > > > this? > > > > Not sure, really. > > > > The final patch was repeatedly backported to stable and failed to build > > because the prerequisites were missing. > > That's because it was tagged, but now that you show the full set of > requirements, it's pretty obvious to me that this is not relevant for > going this far back. That also works. Thanks Michal
Re: [PATCH 5.15 0/6] arm64: kexec_file: use more system keyrings to verify kernel image signature + dependencies
On Sat, Sep 24, 2022 at 01:55:23PM +0200, Michal Suchánek wrote: > On Sat, Sep 24, 2022 at 12:13:34PM +0200, Greg Kroah-Hartman wrote: > > On Sat, Sep 24, 2022 at 11:45:21AM +0200, Michal Suchánek wrote: > > > On Sat, Sep 24, 2022 at 11:19:19AM +0200, Greg Kroah-Hartman wrote: > > > > On Fri, Sep 23, 2022 at 07:10:28PM +0200, Michal Suchanek wrote: > > > > > Hello, > > > > > > > > > > this is backport of commit 0d519cadf751 > > > > > ("arm64: kexec_file: use more system keyrings to verify kernel image > > > > > signature") > > > > > to table 5.15 tree including the preparatory patches. > > > > > > > > This feels to me like a new feature for arm64, one that has never worked > > > > before and you are just making it feature-parity with x86, right? > > > > > > > > Or is this a regression fix somewhere? Why is this needed in 5.15.y and > > > > why can't people who need this new feature just use a newer kernel > > > > version (5.19?) > > > > > > It's half-broken implementation of the kexec kernel verification. At the > > > time > > > it was implemented for arm64 we had the platform and secondary keyrings > > > and x86 was using them but on arm64 the initial implementation ignores > > > them. > > > > Ok, so it's something that never worked. Adding support to get it to > > work doesn't really fall into the stable kernel rules, right? > > Not sure. It was defective, not using the facilities available at the > time correctly. Which translates to kernels that can be kexec'd on x86 > failing to kexec on arm64 without any explanation (signed with same key, > built for the appropriate arch). Feature parity across architectures is not a "regression", but rather a "this feature is not implemented for this architecture yet" type of thing. > > Again, what's wrong with 5.19 for anyone who wants this? Who does want > > this? > > Not sure, really. > > The final patch was repeatedly backported to stable and failed to build > because the prerequisites were missing. 
That's because it was tagged, but now that you show the full set of requirements, it's pretty obvious to me that this is not relevant for going this far back. thanks, greg k-h
[PATCH v2 5/6] powerpc/64: Add support for out-of-line static calls
Implement static call support for 64 bit V2 ABI. This requires making sure the TOC is kept correct across kernel-module boundaries. As a secondary concern, it tries to use the local entry point of a target wherever possible. It does so by checking if both tramp & target are kernel code, and falls back to detecting the common global entry point patterns if modules are involved. Detecting the global entry point is also required for setting the local entry point as the trampoline target: if we cannot detect the local entry point, then we need to conservatively initialise r12 and use the global entry point. The trampolines are marked with `.localentry NAME, 1` to make the linker save and restore the TOC on each call to the trampoline. This allows the trampoline to safely target functions with different TOC values. However this directive also implies the TOC is not initialised on entry to the trampoline. The kernel TOC is easily found in the PACA, but not an arbitrary module TOC. Therefore the trampoline implementation depends on whether it's in the kernel or not. If in the kernel, we initialise the TOC using the PACA. If in a module, we have to initialise the TOC with zero context, so it's quite expensive.
Signed-off-by: Benjamin Gray --- arch/powerpc/Kconfig | 2 +- arch/powerpc/include/asm/code-patching.h | 1 + arch/powerpc/include/asm/static_call.h | 80 +++-- arch/powerpc/kernel/Makefile | 3 +- arch/powerpc/kernel/static_call.c| 90 ++-- 5 files changed, 164 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 4c466acdc70d..e7a66635eade 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -248,7 +248,7 @@ config PPC select HAVE_SOFTIRQ_ON_OWN_STACK select HAVE_STACKPROTECTOR if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2) select HAVE_STACKPROTECTOR if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13) - select HAVE_STATIC_CALL if PPC32 + select HAVE_STATIC_CALL if PPC32 || PPC64_ELF_ABI_V2 select HAVE_SYSCALL_TRACEPOINTS select HAVE_VIRT_CPU_ACCOUNTING select HUGETLB_PAGE_SIZE_VARIABLE if PPC_BOOK3S_64 && HUGETLB_PAGE diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index 15efd8ab22da..8d1850080af8 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -132,6 +132,7 @@ int translate_branch(ppc_inst_t *instr, const u32 *dest, const u32 *src); bool is_conditional_branch(ppc_inst_t instr); #define OP_RT_RA_MASK 0xUL +#define OP_SI_MASK 0xUL #define LIS_R2 (PPC_RAW_LIS(_R2, 0)) #define ADDIS_R2_R12 (PPC_RAW_ADDIS(_R2, _R12, 0)) #define ADDI_R2_R2 (PPC_RAW_ADDI(_R2, _R2, 0)) diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h index de1018cc522b..3d6e82200cb7 100644 --- a/arch/powerpc/include/asm/static_call.h +++ b/arch/powerpc/include/asm/static_call.h @@ -2,12 +2,75 @@ #ifndef _ASM_POWERPC_STATIC_CALL_H #define _ASM_POWERPC_STATIC_CALL_H +#ifdef CONFIG_PPC64_ELF_ABI_V2 + +#ifdef MODULE + +#define __PPC_SCT(name, inst) \ + asm(".pushsection .text, \"ax\" \n" \ + ".align 6 \n" \ + ".globl " STATIC_CALL_TRAMP_STR(name) 
" \n" \ + ".localentry " STATIC_CALL_TRAMP_STR(name) ", 1 \n" \ + STATIC_CALL_TRAMP_STR(name) ": \n" \ + " mflr11 \n" \ + " bcl 20, 31, $+4 \n" \ + "0: mflr12 \n" \ + " mtlr11 \n" \ + " addi12, 12, (" STATIC_CALL_TRAMP_STR(name) " - 0b) \n" \ + " addis 2, 12, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@ha \n" \ + " addi 2, 2, (.TOC.-" STATIC_CALL_TRAMP_STR(name) ")@l\n" \ + " " inst "\n" \ + " ld 12, (2f - " STATIC_CALL_TRAMP_STR(name) ")(12) \n" \ + " mtctr 12 \n" \ + " bctr\n" \ + "1: li 3, 0\n" \ + " blr \n" \ + ".balign 8 \n" \ + "2: .8byte 0\n" \ + ".type " STATIC_CALL_TRAMP_STR(name) ",
[PATCH v2 2/6] powerpc/module: Handle caller-saved TOC in module linker
The callee may set a field in `st_other` to 1 to indicate r2 should be treated as caller-saved. This means a trampoline must be used to save the current TOC before calling it and restore it afterwards, much like external calls. This is necessary for supporting V2 ABI static calls that do not preserve the TOC. Signed-off-by: Benjamin Gray --- arch/powerpc/kernel/module_64.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c index 7e45dc98df8a..4d816f7785b4 100644 --- a/arch/powerpc/kernel/module_64.c +++ b/arch/powerpc/kernel/module_64.c @@ -55,6 +55,12 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym) * of function and try to derive r2 from it). */ return PPC64_LOCAL_ENTRY_OFFSET(sym->st_other); } + +static bool need_r2save_stub(unsigned char st_other) +{ + return ((st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT) == 1; +} + #else static func_desc_t func_desc(unsigned long addr) @@ -66,6 +72,11 @@ static unsigned int local_entry_offset(const Elf64_Sym *sym) return 0; } +static bool need_r2save_stub(unsigned char st_other) +{ + return false; +} + void *dereference_module_function_descriptor(struct module *mod, void *ptr) { if (ptr < (void *)mod->arch.start_opd || @@ -632,7 +643,8 @@ int apply_relocate_add(Elf64_Shdr *sechdrs, case R_PPC_REL24: /* FIXME: Handle weak symbols here --RR */ if (sym->st_shndx == SHN_UNDEF || - sym->st_shndx == SHN_LIVEPATCH) { + sym->st_shndx == SHN_LIVEPATCH || + need_r2save_stub(sym->st_other)) { /* External: go via stub */ value = stub_for_addr(sechdrs, value, me, strtab + sym->st_name); -- 2.37.3
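The st_other check added here can be tried out in userspace with the sketch below. It assumes the usual ELFv2 encoding, a 3-bit field at bits 5 to 7 of st_other, with the constants defined locally rather than taken from a system header; the *_sketch function names are hypothetical, and the offset formula mirrors the form of the kernel's PPC64_LOCAL_ENTRY_OFFSET() as far as I know it.

```c
#include <stdbool.h>

/* Assumed ELFv2 values, defined locally for the sketch. */
#define STO_PPC64_LOCAL_BIT  5
#define STO_PPC64_LOCAL_MASK (7 << STO_PPC64_LOCAL_BIT)

/* Extract the 3-bit local entry field from st_other. */
static unsigned int local_entry_field(unsigned char st_other)
{
	return (st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT;
}

/* Field value 1: the callee treats r2 as caller-saved, so go via a stub. */
static bool need_r2save_stub_sketch(unsigned char st_other)
{
	return local_entry_field(st_other) == 1;
}

/*
 * Bytes between the global and local entry points:
 * 0, 0, 4, 8, 16, 32, 64 for field values 0 to 6.
 */
static unsigned int local_entry_offset_sketch(unsigned char st_other)
{
	return ((1u << local_entry_field(st_other)) >> 2) << 2;
}
```

A symbol marked with `.localentry NAME, 1` would have the field set to 1, so need_r2save_stub_sketch() returns true for it while its local entry offset is still 0, which is why the relocation code needs the extra check rather than relying on the offset alone.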
[PATCH v2 4/6] static_call: Move static call selftest to static_call_selftest.c
These tests are out-of-line only, so moving them to their own file allows them to be run when an arch does not implement inline static calls. Signed-off-by: Benjamin Gray --- kernel/Makefile | 1 + kernel/static_call_inline.c | 43 --- kernel/static_call_selftest.c | 41 + 3 files changed, 42 insertions(+), 43 deletions(-) create mode 100644 kernel/static_call_selftest.c diff --git a/kernel/Makefile b/kernel/Makefile index 318789c728d3..8ce8beaa3cc0 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -113,6 +113,7 @@ obj-$(CONFIG_KCSAN) += kcsan/ obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o +obj-$(CONFIG_STATIC_CALL_SELFTEST) += static_call_selftest.o obj-$(CONFIG_CFI_CLANG) += cfi.o obj-$(CONFIG_PERF_EVENTS) += events/ diff --git a/kernel/static_call_inline.c b/kernel/static_call_inline.c index dc5665b62814..64d04d054698 100644 --- a/kernel/static_call_inline.c +++ b/kernel/static_call_inline.c @@ -498,46 +498,3 @@ int __init static_call_init(void) return 0; } early_initcall(static_call_init); - -#ifdef CONFIG_STATIC_CALL_SELFTEST - -static int func_a(int x) -{ - return x+1; -} - -static int func_b(int x) -{ - return x+2; -} - -DEFINE_STATIC_CALL(sc_selftest, func_a); - -static struct static_call_data { - int (*func)(int); - int val; - int expect; -} static_call_data [] __initdata = { - { NULL, 2, 3 }, - { func_b, 2, 4 }, - { func_a, 2, 3 } -}; - -static int __init test_static_call_init(void) -{ - int i; - - for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) { - struct static_call_data *scd = &static_call_data[i]; - - if (scd->func) - static_call_update(sc_selftest, scd->func); - - WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect); - } - - return 0; -} -early_initcall(test_static_call_init); - -#endif /* CONFIG_STATIC_CALL_SELFTEST */ diff --git a/kernel/static_call_selftest.c b/kernel/static_call_selftest.c new file mode 100644 index ..246ad89f64eb ---
/dev/null +++ b/kernel/static_call_selftest.c @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0 +#include + +static int func_a(int x) +{ + return x+1; +} + +static int func_b(int x) +{ + return x+2; +} + +DEFINE_STATIC_CALL(sc_selftest, func_a); + +static struct static_call_data { + int (*func)(int); + int val; + int expect; +} static_call_data [] __initdata = { + { NULL, 2, 3 }, + { func_b, 2, 4 }, + { func_a, 2, 3 } +}; + +static int __init test_static_call_init(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(static_call_data); i++ ) { + struct static_call_data *scd = &static_call_data[i]; + + if (scd->func) + static_call_update(sc_selftest, scd->func); + + WARN_ON(static_call(sc_selftest)(scd->val) != scd->expect); + } + + return 0; +} +early_initcall(test_static_call_init); -- 2.37.3
[PATCH v2 6/6] powerpc/64: Add tests for out-of-line static calls
KUnit tests for the various combinations of caller/trampoline/target and kernel/module. They must be run from a module loaded at runtime to guarantee they have a different TOC to the kernel. The tests try to mitigate the chance of panicking by restoring the TOC after every static call. Not all possible errors can be caught by this (we can't stop a trampoline from using a bad TOC itself), but it makes certain errors easier to debug. Signed-off-by: Benjamin Gray --- arch/powerpc/Kconfig | 10 + arch/powerpc/kernel/Makefile | 1 + arch/powerpc/kernel/static_call.c | 61 ++ arch/powerpc/kernel/static_call_test.c | 251 + arch/powerpc/kernel/static_call_test.h | 56 ++ 5 files changed, 379 insertions(+) create mode 100644 arch/powerpc/kernel/static_call_test.c create mode 100644 arch/powerpc/kernel/static_call_test.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index e7a66635eade..0ca60514c0e2 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -1023,6 +1023,16 @@ config PPC_RTAS_FILTER Say Y unless you know what you are doing and the filter is causing problems for you. +config PPC_STATIC_CALL_KUNIT_TEST + tristate "KUnit tests for PPC64 ELF ABI V2 static calls" + default KUNIT_ALL_TESTS + depends on HAVE_STATIC_CALL && PPC64_ELF_ABI_V2 && KUNIT && m + help + Tests that check the TOC is kept consistent across all combinations + of caller/trampoline/target being kernel/module. Must be built as a + module and loaded at runtime to ensure the module has a different + TOC to the kernel.
+ endmenu config ISA_DMA_API diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index a30d0d0f5499..22c07e3d34df 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -131,6 +131,7 @@ obj-$(CONFIG_RELOCATABLE) += reloc_$(BITS).o obj-$(CONFIG_PPC32)+= entry_32.o setup_32.o early_32.o obj-$(CONFIG_PPC64)+= dma-iommu.o iommu.o obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o +obj-$(CONFIG_PPC_STATIC_CALL_KUNIT_TEST) += static_call_test.o obj-$(CONFIG_KGDB) += kgdb.o obj-$(CONFIG_BOOTX_TEXT) += btext.o obj-$(CONFIG_SMP) += smp.o diff --git a/arch/powerpc/kernel/static_call.c b/arch/powerpc/kernel/static_call.c index ecbb74e1b4d3..8d338917b70e 100644 --- a/arch/powerpc/kernel/static_call.c +++ b/arch/powerpc/kernel/static_call.c @@ -113,3 +113,64 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail) panic("%s: patching failed %pS at %pS\n", __func__, func, tramp); } EXPORT_SYMBOL_GPL(arch_static_call_transform); + + +#if IS_MODULE(CONFIG_PPC_STATIC_CALL_KUNIT_TEST) + +#include "static_call_test.h" + +int ppc_sc_kernel_target_1(struct kunit* test) +{ + toc_fixup(test); + return 1; +} + +int ppc_sc_kernel_target_2(struct kunit* test) +{ + toc_fixup(test); + return 2; +} + +DEFINE_STATIC_CALL(ppc_sc_kernel, ppc_sc_kernel_target_1); + +int ppc_sc_kernel_call(struct kunit* test) +{ + return PROTECTED_SC(test, int, static_call(ppc_sc_kernel)(test)); +} + +int ppc_sc_kernel_call_indirect(struct kunit* test, int (*fn)(struct kunit*)) +{ + return PROTECTED_SC(test, int, fn(test)); +} + +long ppc_sc_kernel_target_big(struct kunit* test, + long a, + long b, + long c, + long d, + long e, + long f, + long g, + long h, + long i) +{ + toc_fixup(test); + KUNIT_EXPECT_EQ(test, a, b); + KUNIT_EXPECT_EQ(test, a, c); + KUNIT_EXPECT_EQ(test, a, d); + KUNIT_EXPECT_EQ(test, a, e); + KUNIT_EXPECT_EQ(test, a, f); + KUNIT_EXPECT_EQ(test, a, g); + KUNIT_EXPECT_EQ(test, a, h); + KUNIT_EXPECT_EQ(test, a, i); + return ~a; +} 
+ +EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_1); +EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_2); +EXPORT_SYMBOL_GPL(ppc_sc_kernel_target_big); +EXPORT_STATIC_CALL_GPL(ppc_sc_kernel); +EXPORT_SYMBOL_GPL(ppc_sc_kernel_call); +EXPORT_SYMBOL_GPL(ppc_sc_kernel_call_indirect); + +#endif /* IS_MODULE(CONFIG_PPC_STATIC_CALL_KUNIT_TEST) */ diff --git a/arch/powerpc/kernel/static_call_test.c b/arch/powerpc/kernel/static_call_test.c new file mode 100644 index ..2d69524d935f --- /dev/null +++ b/arch/powerpc/kernel/static_call_test.c @@ -0,0 +1,251 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include "static_call_test.h" + +#include +#include +#include + +/* + * Tests to ensure correctness in a variety of cases for static calls. + * + * The tests focus on ensuring the TOC is kept consistent across the + * module-kernel boundary, as compilers can't see that a trampoline
[PATCH v2 3/6] powerpc/module: Optimise nearby branches in ELF V2 ABI stub
Inserts a direct branch to the stub target when possible, replacing the mtctr/bctr sequence. The load into r12 could potentially be skipped too, but that change would need to refactor the arguments to indicate that the address does not have a separate local entry point. This helps the static call implementation, where modules calling their own trampolines are called through this stub and the trampoline is easily within range of a direct branch. Signed-off-by: Benjamin Gray --- arch/powerpc/kernel/module_64.c | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/powerpc/kernel/module_64.c b/arch/powerpc/kernel/module_64.c index 4d816f7785b4..745ce9097dcf 100644 --- a/arch/powerpc/kernel/module_64.c +++ b/arch/powerpc/kernel/module_64.c @@ -141,6 +141,12 @@ static u32 ppc64_stub_insns[] = { PPC_RAW_BCTR(), }; +#ifdef CONFIG_PPC64_ELF_ABI_V1 +#define PPC64_STUB_MTCTR_OFFSET 5 +#else +#define PPC64_STUB_MTCTR_OFFSET 4 +#endif + /* Count how many different 24-bit relocations (different symbol, different addend) */ static unsigned int count_relocs(const Elf64_Rela *rela, unsigned int num) @@ -429,6 +435,8 @@ static inline int create_stub(const Elf64_Shdr *sechdrs, long reladdr; func_desc_t desc; int i; + u32 *jump_seq_addr = &entry->jump[PPC64_STUB_MTCTR_OFFSET]; + ppc_inst_t direct; if (is_mprofile_ftrace_call(name)) return create_ftrace_stub(entry, addr, me); @@ -439,6 +447,11 @@ static inline int create_stub(const Elf64_Shdr *sechdrs, return 0; } + /* Replace indirect branch sequence with direct branch where possible */ + if (!create_branch(&direct, jump_seq_addr, addr, 0)) + if (patch_instruction(jump_seq_addr, direct)) + return 0; + /* Stub uses address relative to r2. */ reladdr = (unsigned long)entry - my_r2(sechdrs, me); if (reladdr > 0x7FFF || reladdr < -(0x8000L)) { -- 2.37.3
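"When possible" in this patch comes down to whether the target is reachable by a single relative branch. A userspace sketch of that test follows, assuming the standard I-form `b` encoding (primary opcode 18, a signed, word-aligned displacement of roughly +/-32MB). in_branch_range() and make_direct_branch() are hypothetical helpers written for this illustration, not the kernel's create_branch().

```c
#include <stdbool.h>
#include <stdint.h>

/* A relative "b" reaches -0x2000000 .. +0x1fffffc, in 4-byte steps. */
static bool in_branch_range(int64_t offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}

/*
 * Encode "b target" for an instruction located at addr.
 * Returns false, leaving *insn untouched, when the target is out of
 * range or misaligned; the caller then keeps the indirect sequence.
 */
static bool make_direct_branch(uint32_t *insn, uint64_t addr, uint64_t target)
{
	int64_t offset = (int64_t)(target - addr);

	if (!in_branch_range(offset))
		return false;

	/* Opcode 18 in the top six bits, displacement in bits 2..25. */
	*insn = 0x48000000u | ((uint32_t)offset & 0x03fffffcu);
	return true;
}
```

A trampoline that lives in the same module as the stub is normally well inside this window, which is the case the commit message has in mind; a target more than 32MB away fails the range test and the stub keeps its mtctr/bctr sequence.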
[PATCH v2 1/6] powerpc/code-patching: Implement generic text patching function
Adds a generic text patching mechanism for patches of 1, 2, 4, or (64-bit) 8 bytes. The patcher conditionally syncs the icache depending on if the content will be executed (as opposed to, e.g., read-only data). The `patch_instruction` function is reimplemented in terms of this more generic function. This generic implementation allows patching of arbitrary 64-bit data, whereas the original `patch_instruction` decided the size based on the 'instruction' opcode, so was not suitable for arbitrary data. Signed-off-by: Benjamin Gray --- arch/powerpc/include/asm/code-patching.h | 7 ++ arch/powerpc/lib/code-patching.c | 90 +--- 2 files changed, 71 insertions(+), 26 deletions(-) diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h index 1c6316ec4b74..15efd8ab22da 100644 --- a/arch/powerpc/include/asm/code-patching.h +++ b/arch/powerpc/include/asm/code-patching.h @@ -76,6 +76,13 @@ int create_cond_branch(ppc_inst_t *instr, const u32 *addr, int patch_branch(u32 *addr, unsigned long target, int flags); int patch_instruction(u32 *addr, ppc_inst_t instr); int raw_patch_instruction(u32 *addr, ppc_inst_t instr); +int __patch_memory(void *dest, unsigned long src, size_t size); + +#define patch_memory(addr, val) \ +({ \ + BUILD_BUG_ON(!__native_word(val)); \ + __patch_memory(addr, (unsigned long) val, sizeof(val)); \ +}) static inline unsigned long patch_site_addr(s32 *site) { diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c index ad0cf3108dd0..9979380d55ef 100644 --- a/arch/powerpc/lib/code-patching.c +++ b/arch/powerpc/lib/code-patching.c @@ -15,20 +15,47 @@ #include #include -static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr) +static int __always_inline ___patch_memory(void *patch_addr, + unsigned long data, + void *prog_addr, + size_t size) { - if (!ppc_inst_prefixed(instr)) { - u32 val = ppc_inst_val(instr); + switch (size) { + case 1: + __put_kernel_nofault(patch_addr, &data,
u8, failed); + break; + case 2: + __put_kernel_nofault(patch_addr, , u16, failed); + break; + case 4: + __put_kernel_nofault(patch_addr, , u32, failed); + break; +#ifdef CONFIG_PPC64 + case 8: + __put_kernel_nofault(patch_addr, , u64, failed); + break; +#endif + default: + unreachable(); + } - __put_kernel_nofault(patch_addr, , u32, failed); - } else { - u64 val = ppc_inst_as_ulong(instr); + dcbst(patch_addr); + dcbst(patch_addr + size - 1); /* Last byte of data may cross a cacheline */ - __put_kernel_nofault(patch_addr, , u64, failed); - } + mb(); /* sync */ + + /* Flush on the EA that may be executed in case of a non-coherent icache */ + icbi(prog_addr); + + /* Also flush the last byte of the instruction if it may be a +* prefixed instruction and we aren't assuming minimum 64-byte +* cacheline sizes +*/ + if (IS_ENABLED(CONFIG_PPC64) && L1_CACHE_BYTES < 64) + icbi(prog_addr + size - 1); - asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr), - "r" (exec_addr)); + mb(); /* sync */ + isync(); return 0; @@ -38,7 +65,10 @@ static int __patch_instruction(u32 *exec_addr, ppc_inst_t instr, u32 *patch_addr int raw_patch_instruction(u32 *addr, ppc_inst_t instr) { - return __patch_instruction(addr, instr, addr); + if (ppc_inst_prefixed(instr)) + return ___patch_memory(addr, ppc_inst_as_ulong(instr), addr, sizeof(u64)); + else + return ___patch_memory(addr, ppc_inst_val(instr), addr, sizeof(u32)); } #ifdef CONFIG_STRICT_KERNEL_RWX @@ -147,24 +177,22 @@ static void unmap_patch_area(unsigned long addr) flush_tlb_kernel_range(addr, addr + PAGE_SIZE); } -static int __do_patch_instruction(u32 *addr, ppc_inst_t instr) +static int __always_inline __do_patch_memory(void *dest, unsigned long src, size_t size) { int err; u32 *patch_addr; - unsigned long text_poke_addr; pte_t *pte; - unsigned long pfn = get_patch_pfn(addr); - - text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr & PAGE_MASK; - patch_addr = (u32 *)(text_poke_addr + 
offset_in_page(addr)); + unsigned long text_poke_addr = (unsigned long)__this_cpu_read(text_poke_area)->addr & PAGE_MASK; + unsigned long pfn = get_patch_pfn(dest); + patch_addr = (u32 *)(text_poke_addr + offset_in_page(dest)); pte = virt_to_kpte(text_poke_addr);
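As an aside on the size question raised in review (pass the size explicitly rather than deriving it from the value's type), here is a minimal user-space sketch of a size-dispatched store. It is not the kernel implementation: `store_sized` is a hypothetical name, there is no fault handling or cache maintenance, and `memcpy` stands in for the real store. Narrowing into a typed temporary first means the low-order bytes are written regardless of host endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of a size-dispatched store.
 * Returns 0 on success, -1 if the size is not supported.
 */
static int store_sized(void *dest, unsigned long val, size_t size)
{
	switch (size) {
	case 1: {
		uint8_t v = (uint8_t)val;	/* narrow first, then copy */
		memcpy(dest, &v, sizeof(v));
		return 0;
	}
	case 2: {
		uint16_t v = (uint16_t)val;
		memcpy(dest, &v, sizeof(v));
		return 0;
	}
	case 4: {
		uint32_t v = (uint32_t)val;
		memcpy(dest, &v, sizeof(v));
		return 0;
	}
	case 8: {
		uint64_t v = (uint64_t)val;

		/* Only meaningful when unsigned long can hold 8 bytes. */
		if (sizeof(unsigned long) < sizeof(v))
			return -1;
		memcpy(dest, &v, sizeof(v));
		return 0;
	}
	default:
		return -1;	/* caller passed an unsupported size */
	}
}
```

A caller would write `store_sized(&word, value, sizeof(word))`, so the destination decides the size, in the same spirit as put_user().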
[PATCH v2 0/6] Out-of-line static calls for powerpc64 ELF V2
Implementation of out-of-line static calls for PowerPC 64-bit ELF V2 ABI.
Static calls patch an indirect branch into a direct branch at runtime.
Out-of-line specifically has a caller directly call a trampoline, and
the trampoline gets patched to directly call the target.

Previous version here:
https://lore.kernel.org/all/20220916062330.430468-1-bg...@linux.ibm.com/

I couldn't see a dedicated ftrace benchmark in the kernel, but my own
benchmarking showed no significant impact to ftrace activation.

The __patch_memory function is meant to be accessed through the size
checking patch_memory wrapper. I don't think there's a way to expose the
macro without also exposing __patch_memory though. I considered making
the type an explicit macro param, but using the value type seemed more
ergonomic.

V2: Mostly accounting for feedback from Christophe:

* Code patching rewritten
  - Rename to *_memory
  - Use __always_inline to get the compiler to realise it can collapse
    all the sub-functions
  - Pass data directly instead of through a pointer, eliding a
    redundant load
  - Flush the last byte of data too (technically redundant if an
    instruction, but saves a conditional branch + the isync will be
    the bottleneck).
  - Handle a non-coherent icache, assume a coherent dcache
  - Handle when we don't assume a 64 byte icache on 64-bits
  - Flatten the poke address init and teardown
  - Check the data size in patch_memory at build time (inline function
    was suggested, but a macro makes checking based on the data type
    easier).
  - It builds now on 32 bit and without strict RWX
* Static call enabling is no longer configurable
* Refactored arch_static_call_transform to minimise casting
* Made the KUnit tests more robust (previously they changed
  non-volatile registers in the init hook, but that's incorrect
  because it returns to the KUnit framework before the test case is
  called).
* Some other minor refactoring in other patches Benjamin Gray (6): powerpc/code-patching: Implement generic text patching function powerpc/module: Handle caller-saved TOC in module linker powerpc/module: Optimise nearby branches in ELF V2 ABI stub static_call: Move static call selftest to static_call_selftest.c powerpc/64: Add support for out-of-line static calls powerpc/64: Add tests for out-of-line static calls arch/powerpc/Kconfig | 12 +- arch/powerpc/include/asm/code-patching.h | 8 + arch/powerpc/include/asm/static_call.h | 80 +++- arch/powerpc/kernel/Makefile | 4 +- arch/powerpc/kernel/module_64.c | 27 ++- arch/powerpc/kernel/static_call.c| 151 +- arch/powerpc/kernel/static_call_test.c | 251 +++ arch/powerpc/kernel/static_call_test.h | 56 + arch/powerpc/lib/code-patching.c | 90 +--- kernel/Makefile | 1 + kernel/static_call_inline.c | 43 kernel/static_call_selftest.c| 41 12 files changed, 682 insertions(+), 82 deletions(-) create mode 100644 arch/powerpc/kernel/static_call_test.c create mode 100644 arch/powerpc/kernel/static_call_test.h create mode 100644 kernel/static_call_selftest.c base-commit: 3d7a198cfdb47405cfb4a3ea523876569fe341e6 -- 2.37.3
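To illustrate the call structure described in the cover letter (caller -> trampoline -> target), here is a rough user-space model. It is only an analogy: the real series patches a branch instruction in the trampoline's text, whereas this sketch uses a function pointer, which is exactly the kind of indirect call that static calls exist to avoid. All names (`trampoline`, `static_call_update_sketch`, `add_one`, `twice`) are hypothetical.

```c
typedef int (*target_fn)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return 2 * x; }

/* Stand-in for the patchable branch inside the trampoline. */
static target_fn tramp_target = add_one;

/* Out-of-line trampoline: callers always branch here. */
static int trampoline(int x)
{
	return tramp_target(x);
}

/* Model of retargeting: only the trampoline changes, not the callers. */
static void static_call_update_sketch(target_fn fn)
{
	tramp_target = fn;
}

/* Call site: compiled once as a direct call to the trampoline. */
static int caller(int x)
{
	return trampoline(x);
}
```

The point of the out-of-line design is visible in `caller`: the call site never needs to be rewritten when the target changes.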
Re: Is PPC 44x PIKA Warp board still relevant?
Hi Dmitry

On 25/09/2022 at 07:06, Dmitry Torokhov wrote:
> Hi Michael, Nick,
>
> I was wondering if PIKA Warp board still relevant. The reason for my
> question is that I am interested in dropping legacy gpio APIs,
> especially OF-specific ones, in favor of newer gpiod APIs, and
> arch/powerpc/platforms/44x/warp.c is one of few users of it.

As far as I can see, that board is still being sold, see
https://www.voipon.co.uk/pika-warp-asterisk-appliance-p-932.html

>
> The code in question is supposed to turn off green led and flash red led
> in case of overheating, and is doing so by directly accessing GPIOs
> owned by led-gpio driver without requesting/allocating them. This is not
> really supported with gpiod API, and is not a good practice in general.

As far as I can see, it was ported to led-gpio by

ba703e1a7a0b powerpc/4xx: Have Warp take advantage of GPIO LEDs default-state = keep
805e324b7fbd powerpc: Update Warp to use leds-gpio driver

> Before I spend much time trying to implement a replacement without
> access to the hardware, I wonder if this board is in use at all, and if
> it is how important is the feature of flashing red led on critical
> temperature shutdown?
>

I don't know who could tell. Maybe just do a more standard
implementation and see if anybody screams?

Christophe
[PATCH v3] powerpc/pseries/mce: Avoid instrumentation in realmode
Part of machine check error handling is done in realmode. As of now,
instrumentation is not possible for any code that runs in realmode.
When an MCE is injected on a KASAN-enabled kernel, a crash is observed.
Hence force inline, or mark as noinstr, the functions which can run in
realmode, to avoid KASAN instrumentation.

Signed-off-by: Ganesh Goudar
---
v2: Force inline few more functions.
v3: Adding noinstr to few functions instead of __always_inline.
---
 arch/powerpc/include/asm/hw_irq.h    | 8
 arch/powerpc/include/asm/interrupt.h | 2 +-
 arch/powerpc/include/asm/rtas.h      | 4 ++--
 arch/powerpc/kernel/rtas.c           | 4 ++--
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 983551859891..c4d542b4a623 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -111,7 +111,7 @@ static inline void __hard_RI_enable(void)
 #ifdef CONFIG_PPC64
 #include

-static inline notrace unsigned long irq_soft_mask_return(void)
+noinstr static unsigned long irq_soft_mask_return(void)
 {
 	unsigned long flags;

@@ -128,7 +128,7 @@ static inline notrace unsigned long irq_soft_mask_return(void)
  * for the critical section and as a clobber because
  * we changed paca->irq_soft_mask
  */
-static inline notrace void irq_soft_mask_set(unsigned long mask)
+noinstr static void irq_soft_mask_set(unsigned long mask)
 {
 	/*
 	 * The irq mask must always include the STD bit if any are set.
@@ -155,7 +155,7 @@ static inline notrace void irq_soft_mask_set(unsigned long mask)
 		: "memory");
 }

-static inline notrace unsigned long irq_soft_mask_set_return(unsigned long mask)
+noinstr static unsigned long irq_soft_mask_set_return(unsigned long mask)
 {
 	unsigned long flags;

@@ -191,7 +191,7 @@ static inline notrace unsigned long irq_soft_mask_or_return(unsigned long mask)
 	return flags;
 }

-static inline unsigned long arch_local_save_flags(void)
+static __always_inline unsigned long arch_local_save_flags(void)
 {
 	return irq_soft_mask_return();
 }
diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 8069dbc4b8d1..090895051712 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -92,7 +92,7 @@ static inline bool is_implicit_soft_masked(struct pt_regs *regs)
 	return search_kernel_soft_mask_table(regs->nip);
 }

-static inline void srr_regs_clobbered(void)
+static __always_inline void srr_regs_clobbered(void)
 {
 	local_paca->srr_valid = 0;
 	local_paca->hsrr_valid = 0;
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 00531af17ce0..52d29d664fdf 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -201,13 +201,13 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
 #define PSERIES_ELOG_SECT_ID_MCE (('M' << 8) | 'C')

 static
-inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
+__always_inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
 {
 	return be16_to_cpu(sect->id);
 }

 static
-inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
+__always_inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
 {
 	return be16_to_cpu(sect->length);
 }
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 693133972294..f9d78245c0e8 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -48,7 +48,7 @@
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);

-static inline void do_enter_rtas(unsigned long args)
+static __always_inline void do_enter_rtas(unsigned long args)
 {
 	unsigned long msr;

@@ -435,7 +435,7 @@ static char *__fetch_rtas_last_error(char *altbuf)
 #endif

-static void
+noinstr static void
 va_rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
 		      va_list list)
 {
--
2.37.1
Re: [PATCH] powerpc/microwatt: Remove unused early debug code
On Mon, 19 Sept 2022 at 05:28, Michael Ellerman wrote: > > The original microwatt submission[1] included some early debug code for > using the Microwatt "potato" UART. The potato is indeed dead. > > The series that was eventually merged switched to using a standard UART, > and so doesn't need any special early debug handling. But some of the > original code was merged accidentally under the non-existent > CONFIG_PPC_EARLY_DEBUG_MICROWATT. The kconfig never got added, so you're right. Using the "legacy serial console" must be how we get early console on microwatt? I can't quite work it out. May or may not be related to https://github.com/linuxppc/issues/issues/413 > > Drop the unused code. > > 1: > https://lore.kernel.org/linuxppc-dev/20200509050340.gd1464...@thinks.paulus.ozlabs.org/ > > Fixes: 48b545b8018d ("powerpc/microwatt: Use standard 16550 UART for console") > Reported-by: Lukas Bulwahn > Signed-off-by: Michael Ellerman > --- > arch/powerpc/kernel/udbg_16550.c | 39 > 1 file changed, 39 deletions(-) > > diff --git a/arch/powerpc/kernel/udbg_16550.c > b/arch/powerpc/kernel/udbg_16550.c > index d3942de254c6..ddfbc74bf85f 100644 > --- a/arch/powerpc/kernel/udbg_16550.c > +++ b/arch/powerpc/kernel/udbg_16550.c > @@ -296,42 +296,3 @@ void __init udbg_init_40x_realmode(void) > } > > #endif /* CONFIG_PPC_EARLY_DEBUG_40x */ > - > -#ifdef CONFIG_PPC_EARLY_DEBUG_MICROWATT > - > -#define UDBG_UART_MW_ADDR ((void __iomem *)0xc0002000) > - > -static u8 udbg_uart_in_isa300_rm(unsigned int reg) > -{ > - uint64_t msr = mfmsr(); > - uint8_t c; > - > - mtmsr(msr & ~(MSR_EE|MSR_DR)); > - isync(); > - eieio(); > - c = __raw_rm_readb(UDBG_UART_MW_ADDR + (reg << 2)); > - mtmsr(msr); > - isync(); > - return c; > -} > - > -static void udbg_uart_out_isa300_rm(unsigned int reg, u8 val) > -{ > - uint64_t msr = mfmsr(); > - > - mtmsr(msr & ~(MSR_EE|MSR_DR)); > - isync(); > - eieio(); > - __raw_rm_writeb(val, UDBG_UART_MW_ADDR + (reg << 2)); > - mtmsr(msr); > - isync(); > -} > - > -void 
__init udbg_init_debug_microwatt(void) > -{ > - udbg_uart_in = udbg_uart_in_isa300_rm; > - udbg_uart_out = udbg_uart_out_isa300_rm; > - udbg_use_uart(); > -} > - > -#endif /* CONFIG_PPC_EARLY_DEBUG_MICROWATT */ > -- > 2.37.2 >
[PATCH 7/7] hmm-tests: Add test for migrate_device_range()
Signed-off-by: Alistair Popple
---
 lib/test_hmm.c                         | 119 +-
 lib/test_hmm_uapi.h                    |   1 +-
 tools/testing/selftests/vm/hmm-tests.c |  49 +++-
 3 files changed, 148 insertions(+), 21 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 2bd3a67..d2821dd 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -100,6 +100,7 @@ struct dmirror {
 struct dmirror_chunk {
 	struct dev_pagemap pagemap;
 	struct dmirror_device *mdevice;
+	bool remove;
 };

 /*
@@ -192,11 +193,15 @@ static int dmirror_fops_release(struct inode *inode, struct file *filp)
 	return 0;
 }

+static struct dmirror_chunk *dmirror_page_to_chunk(struct page *page)
+{
+	return container_of(page->pgmap, struct dmirror_chunk, pagemap);
+}
+
 static struct dmirror_device *dmirror_page_to_device(struct page *page)
 {
-	return container_of(page->pgmap, struct dmirror_chunk,
-			    pagemap)->mdevice;
+	return dmirror_page_to_chunk(page)->mdevice;
 }

 static int dmirror_do_fault(struct dmirror *dmirror, struct hmm_range *range)
@@ -1219,6 +1224,84 @@ static int dmirror_snapshot(struct dmirror *dmirror,
 	return ret;
 }

+static void dmirror_device_evict_chunk(struct dmirror_chunk *chunk)
+{
+	unsigned long start_pfn = chunk->pagemap.range.start >> PAGE_SHIFT;
+	unsigned long end_pfn = chunk->pagemap.range.end >> PAGE_SHIFT;
+	unsigned long npages = end_pfn - start_pfn + 1;
+	unsigned long i;
+	unsigned long *src_pfns;
+	unsigned long *dst_pfns;
+
+	src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+	dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+
+	migrate_device_range(src_pfns, start_pfn, npages);
+	for (i = 0; i < npages; i++) {
+		struct page *dpage, *spage;
+
+		spage = migrate_pfn_to_page(src_pfns[i]);
+		if (!spage || !(src_pfns[i] & MIGRATE_PFN_MIGRATE))
+			continue;
+
+		if (WARN_ON(!is_device_private_page(spage) &&
+			    !is_device_coherent_page(spage)))
+			continue;
+		spage = BACKING_PAGE(spage);
+		dpage = alloc_page(GFP_HIGHUSER_MOVABLE | __GFP_NOFAIL);
+		lock_page(dpage);
+		copy_highpage(dpage, spage);
+		dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
+		if (src_pfns[i] & MIGRATE_PFN_WRITE)
+			dst_pfns[i] |= MIGRATE_PFN_WRITE;
+	}
+	migrate_device_pages(src_pfns, dst_pfns, npages);
+	migrate_device_finalize(src_pfns, dst_pfns, npages);
+	kfree(src_pfns);
+	kfree(dst_pfns);
+}
+
+/* Removes free pages from the free list so they can't be re-allocated */
+static void dmirror_remove_free_pages(struct dmirror_chunk *devmem)
+{
+	struct dmirror_device *mdevice = devmem->mdevice;
+	struct page *page;
+
+	for (page = mdevice->free_pages; page; page = page->zone_device_data)
+		if (dmirror_page_to_chunk(page) == devmem)
+			mdevice->free_pages = page->zone_device_data;
+}
+
+static void dmirror_device_remove_chunks(struct dmirror_device *mdevice)
+{
+	unsigned int i;
+
+	mutex_lock(&mdevice->devmem_lock);
+	if (mdevice->devmem_chunks) {
+		for (i = 0; i < mdevice->devmem_count; i++) {
+			struct dmirror_chunk *devmem =
+				mdevice->devmem_chunks[i];
+
+			spin_lock(&mdevice->lock);
+			devmem->remove = true;
+			dmirror_remove_free_pages(devmem);
+			spin_unlock(&mdevice->lock);
+
+			dmirror_device_evict_chunk(devmem);
+			memunmap_pages(&devmem->pagemap);
+			if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
+				release_mem_region(devmem->pagemap.range.start,
+						   range_len(&devmem->pagemap.range));
+			kfree(devmem);
+		}
+		mdevice->devmem_count = 0;
+		mdevice->devmem_capacity = 0;
+		mdevice->free_pages = NULL;
+		kfree(mdevice->devmem_chunks);
+	}
+	mutex_unlock(&mdevice->devmem_lock);
+}
+
 static long dmirror_fops_unlocked_ioctl(struct file *filp,
 					unsigned int command,
 					unsigned long arg)
@@ -1273,6 +1356,11 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
 		ret = dmirror_snapshot(dmirror, &cmd);
 		break;

+	case HMM_DMIRROR_RELEASE:
+		dmirror_device_remove_chunks(dmirror->mdevice);
+		ret = 0;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -1327,9 +1415,13 @@ static void
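For reference, dropping every entry that belongs to one owner from a singly-linked free list, wherever in the list it sits, is commonly done with a pointer-to-pointer walk. The following is a generic user-space sketch of that technique, not code from this patch; `struct node`, its `owner` field and `remove_owned` are hypothetical stand-ins for the page, its chunk and the removal helper.

```c
#include <stddef.h>

/* Hypothetical free-list entry; 'owner' stands in for the owning chunk. */
struct node {
	int owner;
	struct node *next;
};

/*
 * Unlink every node whose owner matches, at the head or in the middle.
 * 'link' always points at the pointer that refers to the current node,
 * so unlinking is a single assignment with no special case for the head.
 * Returns the number of nodes removed.
 */
static size_t remove_owned(struct node **head, int owner)
{
	struct node **link = head;
	size_t removed = 0;

	while (*link) {
		if ((*link)->owner == owner) {
			*link = (*link)->next;	/* skip over the match */
			removed++;
		} else {
			link = &(*link)->next;	/* advance to next link */
		}
	}
	return removed;
}
```

The caller is expected to hold whatever lock protects the list for the duration of the walk.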
[PATCH 6/7] nouveau/dmem: Evict device private memory during release
When the module is unloaded or a GPU is unbound from the module it is
possible for device private pages to be left mapped in currently running
processes. This leads to a kernel crash when the pages are either freed
or accessed from the CPU because the GPU and associated data structures
and callbacks have all been freed.

Fix this by migrating any mappings back to normal CPU memory prior to
freeing the GPU memory chunks and associated device private pages.

Signed-off-by: Alistair Popple
---

I assume the AMD driver might have a similar issue. However I can't see
where device private (or coherent) pages actually get unmapped/freed
during teardown as I couldn't find any relevant calls to
devm_memunmap(), memunmap(), devm_release_mem_region() or
release_mem_region(). So it appears that ZONE_DEVICE pages are not being
properly freed during module unload, unless I'm missing something?
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++-
 1 file changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 66ebbd4..3b247b8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
 	mutex_unlock(&drm->dmem->mutex);
 }

+/*
+ * Evict all pages mapping a chunk.
+ */
+void
+nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
+{
+	unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
+	unsigned long *src_pfns, *dst_pfns;
+	dma_addr_t *dma_addrs;
+	struct nouveau_fence *fence;
+
+	src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+	dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+	dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
+
+	migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
+			npages);
+
+	for (i = 0; i < npages; i++) {
+		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
+			struct page *dpage;
+
+			/*
+			 * _GFP_NOFAIL because the GPU is going away and there
+			 * is nothing sensible we can do if we can't copy the
+			 * data back.
+			 */
+			dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
+			dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
+			nouveau_dmem_copy_one(chunk->drm,
+					migrate_pfn_to_page(src_pfns[i]), dpage,
+					&dma_addrs[i]);
+		}
+	}
+
+	nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence);
+	migrate_device_pages(src_pfns, dst_pfns, npages);
+	nouveau_dmem_fence_done(&fence);
+	migrate_device_finalize(src_pfns, dst_pfns, npages);
+	kfree(src_pfns);
+	kfree(dst_pfns);
+	for (i = 0; i < npages; i++)
+		dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE, DMA_BIDIRECTIONAL);
+	kfree(dma_addrs);
+}
+
 void
 nouveau_dmem_fini(struct nouveau_drm *drm)
 {
@@ -380,8 +426,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm)
 	mutex_lock(&drm->dmem->mutex);

 	list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list) {
+		nouveau_dmem_evict_chunk(chunk);
 		nouveau_bo_unpin(chunk->bo);
 		nouveau_bo_ref(NULL, &chunk->bo);
+		WARN_ON(chunk->callocated);
 		list_del(&chunk->list);
 		memunmap_pages(&chunk->pagemap);
 		release_mem_region(chunk->pagemap.range.start,
--
git-series 0.9.1
[PATCH 5/7] nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()
nouveau_dmem_fault_copy_one() is used during handling of CPU faults via
the migrate_to_ram() callback and is used to copy data from GPU to CPU
memory. It is currently specific to fault handling, however a future
patch implementing eviction of data during teardown needs similar
functionality.

Refactor out the core functionality so that it is not specific to fault
handling.

Signed-off-by: Alistair Popple
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 59 +--
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index f9234ed..66ebbd4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -139,44 +139,25 @@ static void nouveau_dmem_fence_done(struct nouveau_fence **fence)
 	}
 }

-static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
-		struct vm_fault *vmf, struct migrate_vma *args,
-		dma_addr_t *dma_addr)
+static int nouveau_dmem_copy_one(struct nouveau_drm *drm, struct page *spage,
+				struct page *dpage, dma_addr_t *dma_addr)
 {
 	struct device *dev = drm->dev->dev;
-	struct page *dpage, *spage;
-	struct nouveau_svmm *svmm;
-
-	spage = migrate_pfn_to_page(args->src[0]);
-	if (!spage || !(args->src[0] & MIGRATE_PFN_MIGRATE))
-		return 0;

-	dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
-	if (!dpage)
-		return VM_FAULT_SIGBUS;
 	lock_page(dpage);

 	*dma_addr = dma_map_page(dev, dpage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
 	if (dma_mapping_error(dev, *dma_addr))
-		goto error_free_page;
+		return -EIO;

-	svmm = spage->zone_device_data;
-	mutex_lock(&svmm->mutex);
-	nouveau_svmm_invalidate(svmm, args->start, args->end);
 	if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_HOST, *dma_addr,
-			NOUVEAU_APER_VRAM, nouveau_dmem_page_addr(spage)))
-		goto error_dma_unmap;
-	mutex_unlock(&svmm->mutex);
+					 NOUVEAU_APER_VRAM,
+					 nouveau_dmem_page_addr(spage))) {
+		dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
+		return -EIO;
+	}

-	args->dst[0] = migrate_pfn(page_to_pfn(dpage));
 	return 0;
-
-error_dma_unmap:
-	mutex_unlock(&svmm->mutex);
-	dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
-error_free_page:
-	__free_page(dpage);
-	return VM_FAULT_SIGBUS;
 }

 static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
@@ -184,9 +165,11 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
 	struct nouveau_drm *drm = page_to_drm(vmf->page);
 	struct nouveau_dmem *dmem = drm->dmem;
 	struct nouveau_fence *fence;
+	struct nouveau_svmm *svmm;
+	struct page *spage, *dpage;
 	unsigned long src = 0, dst = 0;
 	dma_addr_t dma_addr = 0;
-	vm_fault_t ret;
+	vm_fault_t ret = 0;
 	struct migrate_vma args = {
 		.vma = vmf->vma,
 		.start = vmf->address,
@@ -207,9 +190,25 @@ static vm_fault_t nouveau_dmem_migrate_to_ram(struct vm_fault *vmf)
 	if (!args.cpages)
 		return 0;

-	ret = nouveau_dmem_fault_copy_one(drm, vmf, &args, &dma_addr);
-	if (ret || dst == 0)
+	spage = migrate_pfn_to_page(src);
+	if (!spage || !(src & MIGRATE_PFN_MIGRATE))
+		goto done;
+
+	dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
+	if (!dpage)
+		goto done;
+
+	dst = migrate_pfn(page_to_pfn(dpage));
+
+	svmm = spage->zone_device_data;
+	mutex_lock(&svmm->mutex);
+	nouveau_svmm_invalidate(svmm, args.start, args.end);
+	ret = nouveau_dmem_copy_one(drm, spage, dpage, &dma_addr);
+	mutex_unlock(&svmm->mutex);
+	if (ret) {
+		ret = VM_FAULT_SIGBUS;
 		goto done;
+	}

 	nouveau_fence_new(dmem->migrate.chan, false, &fence);
 	migrate_vma_pages(&args);
--
git-series 0.9.1
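The shape of this refactor, a context-free core helper that returns an errno-style int with each caller translating that into its own result type, can be sketched in user space as follows. This is only an illustration of the pattern under simplified assumptions: `copy_one`, `handle_fault`, `evict_all` and `enum fault_result` are hypothetical names, and plain ints stand in for pages.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a fault-handler result code. */
enum fault_result { FAULT_OK = 0, FAULT_SIGBUS = 1 };

/* Core helper: knows nothing about faults, reports errors as -errno. */
static int copy_one(const int *src, int *dst)
{
	if (!src || !dst)
		return -EIO;
	*dst = *src;
	return 0;
}

/* Fault path: maps any core failure onto its own result type. */
static enum fault_result handle_fault(const int *src, int *dst)
{
	if (copy_one(src, dst))
		return FAULT_SIGBUS;
	return FAULT_OK;
}

/* Teardown path: reuses the same helper and counts successful copies. */
static size_t evict_all(const int *src, int *dst, size_t n)
{
	size_t i, copied = 0;

	for (i = 0; i < n; i++)
		if (copy_one(&src[i], &dst[i]) == 0)
			copied++;
	return copied;
}
```

Keeping the fault-specific return code out of the helper is what lets the second caller exist without duplicating the copy logic.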