Re: Calculating array sizes in C - was: Re: Build regressions/improvements in v6.2-rc1
Hello Adrian, Could you post a kernel patch for that? I would be happy to test it on my SH-7785CLR board. Also, I'm going to file a bug report against GCC. I filed the bug already. It's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108483. The diff is attached. It's published as CC0 in case anyone considers this trivial change copyrightable. This patch prevents this one specific warning from being upgraded to "error" even if you configure the kernel to use "-Werror". It still keeps it active as warning, though. Kind regards, Michael Karcher diff --git a/Makefile b/Makefile index e09fe100efb2..b4cd075c6a19 100644 --- a/Makefile +++ b/Makefile @@ -870,7 +870,7 @@ stackp-flags-$(CONFIG_STACKPROTECTOR_STRONG) := -fstack-protector-strong KBUILD_CFLAGS += $(stackp-flags-y) -KBUILD_CPPFLAGS-$(CONFIG_WERROR) += -Werror +KBUILD_CPPFLAGS-$(CONFIG_WERROR) += -Werror -Wno-error=sizeof-pointer-div KBUILD_CPPFLAGS += $(KBUILD_CPPFLAGS-y) KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
Re: [PATCH RESEND 0/8] Resend LED patches
On Fri, 20 Jan 2023, Arnd Bergmann wrote: > On Fri, Jan 20, 2023, at 18:15, Lee Jones wrote: > > On Fri, 20 Jan 2023, Arnd Bergmann wrote: > > >> > Marek Behún (3): > >> > leds: turris-omnia: support HW controlled mode via private trigger > >> > leds: turris-omnia: initialize multi-intensity to full > >> > leds: turris-omnia: change max brightness from 255 to 1 > >> > > >> > Pali Rohár (5): > >> > dt-bindings: leds: register-bit-led: Add active-low property > >> > leds: syscon: Implement support for active-low property > >> > powerpc/85xx: DTS: Add CPLD definitions for P1021RDB Combo Board CPL > >> > Design > >> > dt-bindings: leds: Add cznic,turris1x-leds.yaml binding > >> > leds: Add support for Turris 1.x LEDs > >> > > >> > .../testing/sysfs-class-led-driver-turris1x | 31 ++ > >> > .../bindings/leds/cznic,turris1x-leds.yaml| 118 + > >> > .../bindings/leds/register-bit-led.yaml | 5 + > >> > arch/powerpc/boot/dts/fsl/p1020mbg-pc.dtsi| 92 > >> > arch/powerpc/boot/dts/fsl/p1020mbg-pc_32b.dts | 6 +- > >> > arch/powerpc/boot/dts/fsl/p1020mbg-pc_36b.dts | 6 +- > >> > arch/powerpc/boot/dts/fsl/p1020rdb-pd.dts | 44 +- > >> > arch/powerpc/boot/dts/fsl/p1020utm-pc.dtsi| 37 ++ > >> > arch/powerpc/boot/dts/fsl/p1020utm-pc_32b.dts | 4 +- > >> > arch/powerpc/boot/dts/fsl/p1020utm-pc_36b.dts | 4 +- > >> > arch/powerpc/boot/dts/fsl/p1021rdb-pc.dtsi| 37 ++ > >> > arch/powerpc/boot/dts/fsl/p1021rdb-pc_32b.dts | 5 +- > >> > arch/powerpc/boot/dts/fsl/p1021rdb-pc_36b.dts | 5 +- > >> > arch/powerpc/boot/dts/fsl/p2020rdb-pc.dtsi| 33 +- > > > >> > drivers/leds/Kconfig | 10 + > >> > drivers/leds/Makefile | 1 + > >> > drivers/leds/leds-syscon.c| 14 +- > >> > drivers/leds/leds-turris-1x.c | 474 ++ > >> > drivers/leds/leds-turris-omnia.c | 46 +- > > > > If everyone is convinced that applying these drivers is the correct > > thing to do, I'd be happy to (rather) take them via LEDs. > > Ok, thanks. I had not actually looked at the patches until today. > They were in the soc tree backlog but appeared to be misplaced > there until I read the 0/10 message text. > > Looking at it now, I see: > > - patches 1 and 2 seem obvious and have been reviewed by > others already > > - patch 3 is for arch/powerpc and should get merged through > there if there are no objections to the binding in patch 4. > > - patch 5 is the big driver patch, with a Reviewed-by tag > from Marek Behún, who is the author of the last three patches. > An earlier version of this patch was sent in June and got > a few Acks and detailed feedback from Andy [1], but he's also > not on Cc, and I don't know if his comments are all resolved > in this version. > > - Patches 6, 7 and 8 all seem simple LED subsystem patches, > they just need review from you in order to get applied. > These are also missing a Signed-off-by from the submitter > in addition to the author in order to be applied. Very well. Let's have them resent then please (with past reviewers on Cc:) and we'll go from there. -- Lee Jones [李琼斯]
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
* Suren Baghdasaryan [230120 12:50]: > On Fri, Jan 20, 2023 at 9:32 AM Matthew Wilcox wrote: > > > > On Fri, Jan 20, 2023 at 09:17:46AM -0800, Suren Baghdasaryan wrote: > > > On Fri, Jan 20, 2023 at 9:08 AM Liam R. Howlett > > > wrote: > > > > > > > > * Matthew Wilcox [230120 11:50]: > > > > > On Fri, Jan 20, 2023 at 08:45:21AM -0800, Suren Baghdasaryan wrote: > > > > > > On Fri, Jan 20, 2023 at 8:20 AM Suren Baghdasaryan > > > > > > wrote: > > > > > > > > > > > > > > On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko > > > > > > > wrote: > > > > > > > > > > > > > > > > On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > > > > > > > > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > > > > > > > > > call_rcu() can take a long time when callback offloading > > > > > > > > > > > is enabled. > > > > > > > > > > > Its use in the vm_area_free can cause regressions in the > > > > > > > > > > > exit path when > > > > > > > > > > > multiple VMAs are being freed. To minimize that impact, > > > > > > > > > > > place VMAs into > > > > > > > > > > > a list and free them in groups using one call_rcu() call > > > > > > > > > > > per group. > > > > > > > > > > > > > > > > > > > > After some more clarification I can understand how call_rcu > > > > > > > > > > might not be > > > > > > > > > > super happy about thousands of callbacks to be invoked and > > > > > > > > > > I do agree > > > > > > > > > > that this is not really optimal. > > > > > > > > > > > > > > > > > > > > On the other hand I do not like this solution much either. > > > > > > > > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help > > > > > > > > > > all that > > > > > > > > > > much with processes with a huge number of vmas either. It > > > > > > > > > > would still be > > > > > > > > > > in housands of callbacks to be scheduled without a good > > > > > > > > > > reason. > > > > > > > > > > > > > > > > > > > > Instead, are there any other cases than remove_vma that > > > > > > > > > > need this > > > > > > > > > > batching? We could easily just link all the vmas into > > > > > > > > > > linked list and > > > > > > > > > > use a single call_rcu instead, no? This would both simplify > > > > > > > > > > the > > > > > > > > > > implementation, remove the scaling issue as well and we do > > > > > > > > > > not have to > > > > > > > > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or > > > > > > > > > > epsilon + 1. > > > > > > > > > > > > > > > > > > Yes, I agree the solution is not stellar. I wanted something > > > > > > > > > simple > > > > > > > > > but this is probably too simple. OTOH keeping all dead > > > > > > > > > vm_area_structs > > > > > > > > > on the list without hooking up a shrinker (additional > > > > > > > > > complexity) does > > > > > > > > > not sound too appealing either. > > > > > > > > > > > > > > > > I suspect you have missed my idea. I do not really want to keep > > > > > > > > the list > > > > > > > > around or any shrinker. It is dead simple. Collect all vmas in > > > > > > > > remove_vma and then call_rcu the whole list at once after the > > > > > > > > whole list > > > > > > > > (be it from exit_mmap or remove_mt). See? > > > > > > > > > > > > > > Yes, I understood your idea but keeping dead objects until the > > > > > > > process > > > > > > > exits even when the system is low on memory (no shrinkers > > > > > > > attached) > > > > > > > seems too wasteful. If we do this I would advocate for attaching a > > > > > > > shrinker. > > > > > > > > > > > > Maybe even simpler, since we are hit with this VMA freeing flood > > > > > > during exit_mmap (when all VMAs are destroyed), we pass a hint to > > > > > > vm_area_free to batch the destruction and all other cases call > > > > > > call_rcu()? I don't think there will be other cases of VMA > > > > > > destruction > > > > > > floods. > > > > > > > > > > ... or have two different call_rcu functions; one for munmap() and > > > > > one for exit. It'd be nice to use kmem_cache_free_bulk(). > > > > > > > > Do we even need a call_rcu on exit? At the point of freeing the VMAs we > > > > have set the MMF_OOM_SKIP bit and unmapped the vmas under the read lock. > > > > Once we have obtained the write lock again, I think it's safe to say we > > > > can just go ahead and free the VMAs directly. > > > > > > I think that would be still racy if the page fault handler found that > > > VMA under read-RCU protection but did not lock it yet (no locks are > > > held yet). If it's preempted, the VMA can be freed and destroyed from > > > under it without RCU grace period. > > > > The page fault handler (or whatever other reader -- ptrace, proc, etc) > > should have a refcount on the mm_struct, so we can't be in this path > > trying to free VMAs. Right? > > Hmm. That sounds right. I checked process_mrelease() as well, which > operated
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Fri, Jan 20, 2023 at 9:21 AM Paul E. McKenney wrote: > > On Fri, Jan 20, 2023 at 04:49:42PM +, Matthew Wilcox wrote: > > On Fri, Jan 20, 2023 at 08:45:21AM -0800, Suren Baghdasaryan wrote: > > > On Fri, Jan 20, 2023 at 8:20 AM Suren Baghdasaryan > > > wrote: > > > > > > > > On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko wrote: > > > > > > > > > > On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > > > > > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko > > > > > > wrote: > > > > > > > > > > > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > > > > > > call_rcu() can take a long time when callback offloading is > > > > > > > > enabled. > > > > > > > > Its use in the vm_area_free can cause regressions in the exit > > > > > > > > path when > > > > > > > > multiple VMAs are being freed. To minimize that impact, place > > > > > > > > VMAs into > > > > > > > > a list and free them in groups using one call_rcu() call per > > > > > > > > group. > > > > > > > > > > > > > > After some more clarification I can understand how call_rcu might > > > > > > > not be > > > > > > > super happy about thousands of callbacks to be invoked and I do > > > > > > > agree > > > > > > > that this is not really optimal. > > > > > > > > > > > > > > On the other hand I do not like this solution much either. > > > > > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all > > > > > > > that > > > > > > > much with processes with a huge number of vmas either. It would > > > > > > > still be > > > > > > > in housands of callbacks to be scheduled without a good reason. > > > > > > > > > > > > > > Instead, are there any other cases than remove_vma that need this > > > > > > > batching? We could easily just link all the vmas into linked list > > > > > > > and > > > > > > > use a single call_rcu instead, no? This would both simplify the > > > > > > > implementation, remove the scaling issue as well and we do not > > > > > > > have to > > > > > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon > > > > > > > + 1. > > > > > > > > > > > > Yes, I agree the solution is not stellar. I wanted something simple > > > > > > but this is probably too simple. OTOH keeping all dead > > > > > > vm_area_structs > > > > > > on the list without hooking up a shrinker (additional complexity) > > > > > > does > > > > > > not sound too appealing either. > > > > > > > > > > I suspect you have missed my idea. I do not really want to keep the > > > > > list > > > > > around or any shrinker. It is dead simple. Collect all vmas in > > > > > remove_vma and then call_rcu the whole list at once after the whole > > > > > list > > > > > (be it from exit_mmap or remove_mt). See? > > > > > > > > Yes, I understood your idea but keeping dead objects until the process > > > > exits even when the system is low on memory (no shrinkers attached) > > > > seems too wasteful. If we do this I would advocate for attaching a > > > > shrinker. > > > > > > Maybe even simpler, since we are hit with this VMA freeing flood > > > during exit_mmap (when all VMAs are destroyed), we pass a hint to > > > vm_area_free to batch the destruction and all other cases call > > > call_rcu()? I don't think there will be other cases of VMA destruction > > > floods. > > > > ... or have two different call_rcu functions; one for munmap() and > > one for exit. It'd be nice to use kmem_cache_free_bulk(). > > Good point, kfree_rcu(p, r) where "r" is the name of the rcu_head > structure's field, is much more cache-efficient. > > The penalty is that there is no callback function to do any cleanup. > There is just a kfree()/kvfree (bulk version where applicable), > nothing else. If Liam's suggestion works then we won't need anything additional. We will free the vm_area_structs directly on process exit and use call_rcu() in all other cases. Let's see if Michal knows of any case which still needs an RCU grace period during exit_mmap. > > Thanx, Paul > > -- > To unsubscribe from this group and stop receiving emails from it, send an > email to kernel-team+unsubscr...@android.com. >
Re: Calculating array sizes in C - was: Re: Build regressions/improvements in v6.2-rc1
Hello! Can someone please file a GCC PR? With reduced testcase preferably. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108483 There you are. Kind regars, Michael Karcher
[PATCH v3] of: Make of framebuffer devices unique
Since Linux 5.19 this error is observed: sysfs: cannot create duplicate filename '/devices/platform/of-display' This is because multiple devices with the same name 'of-display' are created on the same bus. Update the code to create numbered device names for the non-boot disaplay. cc: linuxppc-dev@lists.ozlabs.org References: https://bugzilla.kernel.org/show_bug.cgi?id=216095 Fixes: 52b1b46c39ae ("of: Create platform devices for OF framebuffers") Reported-by: Erhard F. Suggested-by: Thomas Zimmermann Signed-off-by: Michal Suchanek --- v3: - merge fix into original patch - Update the device name format - add missing const - do not continue with iterating display devices when formatting device name fails --- drivers/of/platform.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/of/platform.c b/drivers/of/platform.c index 81c8c227ab6b..4d3a4d9f79f2 100644 --- a/drivers/of/platform.c +++ b/drivers/of/platform.c @@ -525,6 +525,9 @@ static int __init of_platform_default_populate_init(void) if (IS_ENABLED(CONFIG_PPC)) { struct device_node *boot_display = NULL; struct platform_device *dev; + int display_number = 0; + char buf[14]; + const char *of_display_format = "of-display.%d"; int ret; /* Check if we have a MacOS display without a node spec */ @@ -555,7 +558,10 @@ static int __init of_platform_default_populate_init(void) if (!of_get_property(node, "linux,opened", NULL) || !of_get_property(node, "linux,boot-display", NULL)) continue; - dev = of_platform_device_create(node, "of-display", NULL); + ret = snprintf(buf, sizeof(buf), of_display_format, display_number++); + if (ret >= sizeof(buf)) + return -EOVERFLOW; + dev = of_platform_device_create(node, buf, NULL); if (WARN_ON(!dev)) return -ENOMEM; boot_display = node; @@ -564,7 +570,10 @@ static int __init of_platform_default_populate_init(void) for_each_node_by_type(node, "display") { if (!of_get_property(node, "linux,opened", NULL) || node == boot_display) continue; - of_platform_device_create(node, "of-display", NULL); + ret = snprintf(buf, sizeof(buf), of_display_format, display_number++); + if (ret >= sizeof(buf)) + break; + of_platform_device_create(node, buf, NULL); } } else { -- 2.35.3
Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code
Hello, On Fri, Jan 20, 2023 at 11:23:39AM -0600, Rob Herring wrote: > On Thu, Jan 19, 2023 at 3:53 AM Michal Suchanek wrote: > > > > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique") > > breaks build because of wrong argument to snprintf. That certainly > > avoids the runtime error but is not the intended outcome. > > > > Also use standard device name format of-display.N for all created > > devices. > > > > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique") > > Signed-off-by: Michal Suchanek > > --- > > v2: Update the device name format > > --- > > drivers/of/platform.c | 12 > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > > index f2a5d679a324..8c1b1de22036 100644 > > --- a/drivers/of/platform.c > > +++ b/drivers/of/platform.c > > @@ -525,7 +525,9 @@ static int __init > > of_platform_default_populate_init(void) > > if (IS_ENABLED(CONFIG_PPC)) { > > struct device_node *boot_display = NULL; > > struct platform_device *dev; > > - int display_number = 1; > > + int display_number = 0; > > + char buf[14]; > > + char *of_display_format = "of-display.%d"; > > static const as suggested and can we just move on please... Only const, static could be dodgy > > int ret; > > > > /* Check if we have a MacOS display without a node spec */ > > @@ -556,7 +558,10 @@ static int __init > > of_platform_default_populate_init(void) > > if (!of_get_property(node, "linux,opened", NULL) || > > !of_get_property(node, "linux,boot-display", > > NULL)) > > continue; > > - dev = of_platform_device_create(node, "of-display", > > NULL); > > + ret = snprintf(buf, sizeof(buf), of_display_format, > > display_number++); > > The boot display is always "of-display.0". Just use the fixed string > here. Then we can get rid of the whole debate around static const. I prefer to use the same format string when the names should be consistent. Also it would resurrect the starting from 1 debate. But if you really want to have two strings I do not care all that much. > > > + if (ret >= sizeof(buf)) > > + continue; > > This only happens if display_number becomes too big. Why continue on? > The next iteration will fail too. Yes, there is no need to continue with the loop. Thanks Michal > > > + dev = of_platform_device_create(node, buf, NULL); > > if (WARN_ON(!dev)) > > return -ENOMEM; > > boot_display = node; > > @@ -564,10 +569,9 @@ static int __init > > of_platform_default_populate_init(void) > > } > > > > for_each_node_by_type(node, "display") { > > - char *buf[14]; > > if (!of_get_property(node, "linux,opened", NULL) || > > node == boot_display) > > continue; > > - ret = snprintf(buf, "of-display-%d", > > display_number++); > > + ret = snprintf(buf, sizeof(buf), of_display_format, > > display_number++); > > if (ret >= sizeof(buf)) > > continue; > > Here too in the original change. > > > of_platform_device_create(node, buf, NULL); > > -- > > 2.35.3 > >
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Fri, Jan 20, 2023 at 9:32 AM Matthew Wilcox wrote: > > On Fri, Jan 20, 2023 at 09:17:46AM -0800, Suren Baghdasaryan wrote: > > On Fri, Jan 20, 2023 at 9:08 AM Liam R. Howlett > > wrote: > > > > > > * Matthew Wilcox [230120 11:50]: > > > > On Fri, Jan 20, 2023 at 08:45:21AM -0800, Suren Baghdasaryan wrote: > > > > > On Fri, Jan 20, 2023 at 8:20 AM Suren Baghdasaryan > > > > > wrote: > > > > > > > > > > > > On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko > > > > > > wrote: > > > > > > > > > > > > > > On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > > > > > > > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > > > > > > > > call_rcu() can take a long time when callback offloading is > > > > > > > > > > enabled. > > > > > > > > > > Its use in the vm_area_free can cause regressions in the > > > > > > > > > > exit path when > > > > > > > > > > multiple VMAs are being freed. To minimize that impact, > > > > > > > > > > place VMAs into > > > > > > > > > > a list and free them in groups using one call_rcu() call > > > > > > > > > > per group. > > > > > > > > > > > > > > > > > > After some more clarification I can understand how call_rcu > > > > > > > > > might not be > > > > > > > > > super happy about thousands of callbacks to be invoked and I > > > > > > > > > do agree > > > > > > > > > that this is not really optimal. > > > > > > > > > > > > > > > > > > On the other hand I do not like this solution much either. > > > > > > > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help > > > > > > > > > all that > > > > > > > > > much with processes with a huge number of vmas either. It > > > > > > > > > would still be > > > > > > > > > in housands of callbacks to be scheduled without a good > > > > > > > > > reason. > > > > > > > > > > > > > > > > > > Instead, are there any other cases than remove_vma that need > > > > > > > > > this > > > > > > > > > batching? We could easily just link all the vmas into linked > > > > > > > > > list and > > > > > > > > > use a single call_rcu instead, no? This would both simplify > > > > > > > > > the > > > > > > > > > implementation, remove the scaling issue as well and we do > > > > > > > > > not have to > > > > > > > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or > > > > > > > > > epsilon + 1. > > > > > > > > > > > > > > > > Yes, I agree the solution is not stellar. I wanted something > > > > > > > > simple > > > > > > > > but this is probably too simple. OTOH keeping all dead > > > > > > > > vm_area_structs > > > > > > > > on the list without hooking up a shrinker (additional > > > > > > > > complexity) does > > > > > > > > not sound too appealing either. > > > > > > > > > > > > > > I suspect you have missed my idea. I do not really want to keep > > > > > > > the list > > > > > > > around or any shrinker. It is dead simple. Collect all vmas in > > > > > > > remove_vma and then call_rcu the whole list at once after the > > > > > > > whole list > > > > > > > (be it from exit_mmap or remove_mt). See? > > > > > > > > > > > > Yes, I understood your idea but keeping dead objects until the > > > > > > process > > > > > > exits even when the system is low on memory (no shrinkers attached) > > > > > > seems too wasteful. If we do this I would advocate for attaching a > > > > > > shrinker. > > > > > > > > > > Maybe even simpler, since we are hit with this VMA freeing flood > > > > > during exit_mmap (when all VMAs are destroyed), we pass a hint to > > > > > vm_area_free to batch the destruction and all other cases call > > > > > call_rcu()? I don't think there will be other cases of VMA destruction > > > > > floods. > > > > > > > > ... or have two different call_rcu functions; one for munmap() and > > > > one for exit. It'd be nice to use kmem_cache_free_bulk(). > > > > > > Do we even need a call_rcu on exit? At the point of freeing the VMAs we > > > have set the MMF_OOM_SKIP bit and unmapped the vmas under the read lock. > > > Once we have obtained the write lock again, I think it's safe to say we > > > can just go ahead and free the VMAs directly. > > > > I think that would be still racy if the page fault handler found that > > VMA under read-RCU protection but did not lock it yet (no locks are > > held yet). If it's preempted, the VMA can be freed and destroyed from > > under it without RCU grace period. > > The page fault handler (or whatever other reader -- ptrace, proc, etc) > should have a refcount on the mm_struct, so we can't be in this path > trying to free VMAs. Right? Hmm. That sounds right. I checked process_mrelease() as well, which operated on mm with only mmgrab()+mmap_read_lock() but it only unmaps VMAs without freeing them, so we are still good. Michal, do you agree this is ok? lock_vma_under_rcu() receives mm as a parameter, so I guess it's implied that the caller should either mmget() it or
Re: [PATCH RESEND 0/8] Resend LED patches
On Fri, Jan 20, 2023, at 18:15, Lee Jones wrote: > On Fri, 20 Jan 2023, Arnd Bergmann wrote: >> > Marek Behún (3): >> > leds: turris-omnia: support HW controlled mode via private trigger >> > leds: turris-omnia: initialize multi-intensity to full >> > leds: turris-omnia: change max brightness from 255 to 1 >> > >> > Pali Rohár (5): >> > dt-bindings: leds: register-bit-led: Add active-low property >> > leds: syscon: Implement support for active-low property >> > powerpc/85xx: DTS: Add CPLD definitions for P1021RDB Combo Board CPL >> > Design >> > dt-bindings: leds: Add cznic,turris1x-leds.yaml binding >> > leds: Add support for Turris 1.x LEDs >> > >> > .../testing/sysfs-class-led-driver-turris1x | 31 ++ >> > .../bindings/leds/cznic,turris1x-leds.yaml| 118 + >> > .../bindings/leds/register-bit-led.yaml | 5 + >> > arch/powerpc/boot/dts/fsl/p1020mbg-pc.dtsi| 92 >> > arch/powerpc/boot/dts/fsl/p1020mbg-pc_32b.dts | 6 +- >> > arch/powerpc/boot/dts/fsl/p1020mbg-pc_36b.dts | 6 +- >> > arch/powerpc/boot/dts/fsl/p1020rdb-pd.dts | 44 +- >> > arch/powerpc/boot/dts/fsl/p1020utm-pc.dtsi| 37 ++ >> > arch/powerpc/boot/dts/fsl/p1020utm-pc_32b.dts | 4 +- >> > arch/powerpc/boot/dts/fsl/p1020utm-pc_36b.dts | 4 +- >> > arch/powerpc/boot/dts/fsl/p1021rdb-pc.dtsi| 37 ++ >> > arch/powerpc/boot/dts/fsl/p1021rdb-pc_32b.dts | 5 +- >> > arch/powerpc/boot/dts/fsl/p1021rdb-pc_36b.dts | 5 +- >> > arch/powerpc/boot/dts/fsl/p2020rdb-pc.dtsi| 33 +- > >> > drivers/leds/Kconfig | 10 + >> > drivers/leds/Makefile | 1 + >> > drivers/leds/leds-syscon.c| 14 +- >> > drivers/leds/leds-turris-1x.c | 474 ++ >> > drivers/leds/leds-turris-omnia.c | 46 +- > > If everyone is convinced that applying these drivers is the correct > thing to do, I'd be happy to (rather) take them via LEDs. Ok, thanks. I had not actually looked at the patches until today. They were in the soc tree backlog but appeared to be misplaced there until I read the 0/10 message text. Looking at it now, I see: - patches 1 and 2 seem obvious and have been reviewed by others already - patch 3 is for arch/powerpc and should get merged through there if there are no objections to the binding in patch 4. - patch 5 is the big driver patch, with a Reviewed-by tag from Marek Behún, who is the author of the last three patches. An earlier version of this patch was sent in June and got a few Acks and detailed feedback from Andy [1], but he's also not on Cc, and I don't know if his comments are all resolved in this version. - Patches 6, 7 and 8 all seem simple LED subsystem patches, they just need review from you in order to get applied. These are also missing a Signed-off-by from the submitter in addition to the author in order to be applied. Arnd [1] https://lore.kernel.org/all/CAHp75Vcr6o2rm+T6Tr8sS4VXCLVHtmLPWy-njOKAvO4AcZoW=a...@mail.gmail.com/
Re: [PATCH v2 01/10] dt-bindings: arm: fsl: add TQ-Systems LS1021A board
On 20/01/2023 14:34, Alexander Stein wrote: > From: Matthias Schiffer > > TQMLS102xA is a SOM family using NXP LS1021A CPU family. > MBLS102xA is an evaluation mainboard for this SOM. > > Signed-off-by: Matthias Schiffer > Signed-off-by: Alexander Stein > --- > Changes in v2: > * Improved the description mentioning this is a socketable module > Acked-by: Krzysztof Kozlowski Best regards, Krzysztof
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Fri, Jan 20, 2023 at 09:17:46AM -0800, Suren Baghdasaryan wrote: > On Fri, Jan 20, 2023 at 9:08 AM Liam R. Howlett > wrote: > > > > * Matthew Wilcox [230120 11:50]: > > > On Fri, Jan 20, 2023 at 08:45:21AM -0800, Suren Baghdasaryan wrote: > > > > On Fri, Jan 20, 2023 at 8:20 AM Suren Baghdasaryan > > > > wrote: > > > > > > > > > > On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko wrote: > > > > > > > > > > > > On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > > > > > > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko > > > > > > > wrote: > > > > > > > > > > > > > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > > > > > > > call_rcu() can take a long time when callback offloading is > > > > > > > > > enabled. > > > > > > > > > Its use in the vm_area_free can cause regressions in the exit > > > > > > > > > path when > > > > > > > > > multiple VMAs are being freed. To minimize that impact, place > > > > > > > > > VMAs into > > > > > > > > > a list and free them in groups using one call_rcu() call per > > > > > > > > > group. > > > > > > > > > > > > > > > > After some more clarification I can understand how call_rcu > > > > > > > > might not be > > > > > > > > super happy about thousands of callbacks to be invoked and I do > > > > > > > > agree > > > > > > > > that this is not really optimal. > > > > > > > > > > > > > > > > On the other hand I do not like this solution much either. > > > > > > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all > > > > > > > > that > > > > > > > > much with processes with a huge number of vmas either. It would > > > > > > > > still be > > > > > > > > in housands of callbacks to be scheduled without a good reason. > > > > > > > > > > > > > > > > Instead, are there any other cases than remove_vma that need > > > > > > > > this > > > > > > > > batching? We could easily just link all the vmas into linked > > > > > > > > list and > > > > > > > > use a single call_rcu instead, no? This would both simplify the > > > > > > > > implementation, remove the scaling issue as well and we do not > > > > > > > > have to > > > > > > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or > > > > > > > > epsilon + 1. > > > > > > > > > > > > > > Yes, I agree the solution is not stellar. I wanted something > > > > > > > simple > > > > > > > but this is probably too simple. OTOH keeping all dead > > > > > > > vm_area_structs > > > > > > > on the list without hooking up a shrinker (additional complexity) > > > > > > > does > > > > > > > not sound too appealing either. > > > > > > > > > > > > I suspect you have missed my idea. I do not really want to keep the > > > > > > list > > > > > > around or any shrinker. It is dead simple. Collect all vmas in > > > > > > remove_vma and then call_rcu the whole list at once after the whole > > > > > > list > > > > > > (be it from exit_mmap or remove_mt). See? > > > > > > > > > > Yes, I understood your idea but keeping dead objects until the process > > > > > exits even when the system is low on memory (no shrinkers attached) > > > > > seems too wasteful. If we do this I would advocate for attaching a > > > > > shrinker. > > > > > > > > Maybe even simpler, since we are hit with this VMA freeing flood > > > > during exit_mmap (when all VMAs are destroyed), we pass a hint to > > > > vm_area_free to batch the destruction and all other cases call > > > > call_rcu()? I don't think there will be other cases of VMA destruction > > > > floods. > > > > > > ... or have two different call_rcu functions; one for munmap() and > > > one for exit. It'd be nice to use kmem_cache_free_bulk(). > > > > Do we even need a call_rcu on exit? At the point of freeing the VMAs we > > have set the MMF_OOM_SKIP bit and unmapped the vmas under the read lock. > > Once we have obtained the write lock again, I think it's safe to say we > > can just go ahead and free the VMAs directly. > > I think that would be still racy if the page fault handler found that > VMA under read-RCU protection but did not lock it yet (no locks are > held yet). If it's preempted, the VMA can be freed and destroyed from > under it without RCU grace period. The page fault handler (or whatever other reader -- ptrace, proc, etc) should have a refcount on the mm_struct, so we can't be in this path trying to free VMAs. Right?
Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code
On Thu, Jan 19, 2023 at 3:53 AM Michal Suchanek wrote: > > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique") > breaks build because of wrong argument to snprintf. That certainly > avoids the runtime error but is not the intended outcome. > > Also use standard device name format of-display.N for all created > devices. > > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique") > Signed-off-by: Michal Suchanek > --- > v2: Update the device name format > --- > drivers/of/platform.c | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > index f2a5d679a324..8c1b1de22036 100644 > --- a/drivers/of/platform.c > +++ b/drivers/of/platform.c > @@ -525,7 +525,9 @@ static int __init of_platform_default_populate_init(void) > if (IS_ENABLED(CONFIG_PPC)) { > struct device_node *boot_display = NULL; > struct platform_device *dev; > - int display_number = 1; > + int display_number = 0; > + char buf[14]; > + char *of_display_format = "of-display.%d"; static const as suggested and can we just move on please... > int ret; > > /* Check if we have a MacOS display without a node spec */ > @@ -556,7 +558,10 @@ static int __init of_platform_default_populate_init(void) > if (!of_get_property(node, "linux,opened", NULL) || > !of_get_property(node, "linux,boot-display", > NULL)) > continue; > - dev = of_platform_device_create(node, "of-display", > NULL); > + ret = snprintf(buf, sizeof(buf), of_display_format, > display_number++); The boot display is always "of-display.0". Just use the fixed string here. Then we can get rid of the whole debate around static const. > + if (ret >= sizeof(buf)) > + continue; This only happens if display_number becomes too big. Why continue on? The next iteration will fail too. > + dev = of_platform_device_create(node, buf, NULL); > if (WARN_ON(!dev)) > return -ENOMEM; > boot_display = node; > @@ -564,10 +569,9 @@ static int __init of_platform_default_populate_init(void) > } > > for_each_node_by_type(node, "display") { > - char *buf[14]; > if (!of_get_property(node, "linux,opened", NULL) || > node == boot_display) > continue; > - ret = snprintf(buf, "of-display-%d", > display_number++); > + ret = snprintf(buf, sizeof(buf), of_display_format, > display_number++); > if (ret >= sizeof(buf)) > continue; Here too in the original change. > of_platform_device_create(node, buf, NULL); > -- > 2.35.3 >
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Fri, Jan 20, 2023 at 04:49:42PM +, Matthew Wilcox wrote: > On Fri, Jan 20, 2023 at 08:45:21AM -0800, Suren Baghdasaryan wrote: > > On Fri, Jan 20, 2023 at 8:20 AM Suren Baghdasaryan > > wrote: > > > > > > On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko wrote: > > > > > > > > On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > > > > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko wrote: > > > > > > > > > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > > > > > call_rcu() can take a long time when callback offloading is > > > > > > > enabled. > > > > > > > Its use in the vm_area_free can cause regressions in the exit > > > > > > > path when > > > > > > > multiple VMAs are being freed. To minimize that impact, place > > > > > > > VMAs into > > > > > > > a list and free them in groups using one call_rcu() call per > > > > > > > group. > > > > > > > > > > > > After some more clarification I can understand how call_rcu might > > > > > > not be > > > > > > super happy about thousands of callbacks to be invoked and I do > > > > > > agree > > > > > > that this is not really optimal. > > > > > > > > > > > > On the other hand I do not like this solution much either. > > > > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that > > > > > > much with processes with a huge number of vmas either. It would > > > > > > still be > > > > > > in housands of callbacks to be scheduled without a good reason. > > > > > > > > > > > > Instead, are there any other cases than remove_vma that need this > > > > > > batching? We could easily just link all the vmas into linked list > > > > > > and > > > > > > use a single call_rcu instead, no? This would both simplify the > > > > > > implementation, remove the scaling issue as well and we do not have > > > > > > to > > > > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + > > > > > > 1. > > > > > > > > > > Yes, I agree the solution is not stellar. I wanted something simple > > > > > but this is probably too simple. OTOH keeping all dead vm_area_structs > > > > > on the list without hooking up a shrinker (additional complexity) does > > > > > not sound too appealing either. > > > > > > > > I suspect you have missed my idea. I do not really want to keep the list > > > > around or any shrinker. It is dead simple. Collect all vmas in > > > > remove_vma and then call_rcu the whole list at once after the whole list > > > > (be it from exit_mmap or remove_mt). See? > > > > > > Yes, I understood your idea but keeping dead objects until the process > > > exits even when the system is low on memory (no shrinkers attached) > > > seems too wasteful. If we do this I would advocate for attaching a > > > shrinker. > > > > Maybe even simpler, since we are hit with this VMA freeing flood > > during exit_mmap (when all VMAs are destroyed), we pass a hint to > > vm_area_free to batch the destruction and all other cases call > > call_rcu()? I don't think there will be other cases of VMA destruction > > floods. > > ... or have two different call_rcu functions; one for munmap() and > one for exit. It'd be nice to use kmem_cache_free_bulk(). Good point, kfree_rcu(p, r) where "r" is the name of the rcu_head structure's field, is much more cache-efficient. The penalty is that there is no callback function to do any cleanup. There is just a kfree()/kvfree (bulk version where applicable), nothing else. Thanx, Paul
Re: [PATCH v2 10/10] ARM: add multi_v7_lpae_defconfig
On Fri, Jan 20, 2023, at 14:34, Alexander Stein wrote: > From: Nicolas Saenz Julienne > > The only missing configuration option preventing us from using > multi_v7_defconfig with the Raspberry Pi 4 is ARM_LPAE. It's needed as > the PCIe controller found on the SoC depends on 64bit addressing, yet > can't be included as not all v7 boards support LPAE. > > Introduce multi_v7_lpae_defconfig, built off multi_v7_defconfig, which will > avoid us having to duplicate and maintain multiple similar configurations. > > Needless to say the Raspberry Pi 4 is not the only platform that can > benefit from this new configuration. > > Signed-off-by: Nicolas Saenz Julienne > Signed-off-by: Alexander Stein This is ok in principle, two minor points though: > +include $(srctree)/scripts/Makefile.defconf > +PHONY += multi_v7_lpae_defconfig > +multi_v7_lpae_defconfig: > + $(call merge_into_defconfig,multi_v7_defconfig,lpae) > > define archhelp >echo '* zImage- Compressed kernel image > (arch/$(ARCH)/boot/zImage)' The new target does not get listed in 'make help' as far as I can tell, can you add it there in the process? > diff --git a/arch/arm/configs/lpae.config b/arch/arm/configs/lpae.config > new file mode 100644 > index 0..19bab134e014b > --- /dev/null > +++ b/arch/arm/configs/lpae.config > @@ -0,0 +1 @@ > +CONFIG_ARM_LPAE=y My feeling is that we probably want CONFIG_VMSPLIT_2G=y here as well, given that a lot of the systems that want LPAE will have a lot of memory, and are limited by the amount of lowmem even when CONFIG_HIGHMEM is enabled. Can you make sure that this works on your machine, and include this option? Arnd
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Fri, Jan 20, 2023 at 9:08 AM Liam R. Howlett wrote: > > * Matthew Wilcox [230120 11:50]: > > On Fri, Jan 20, 2023 at 08:45:21AM -0800, Suren Baghdasaryan wrote: > > > On Fri, Jan 20, 2023 at 8:20 AM Suren Baghdasaryan > > > wrote: > > > > > > > > On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko wrote: > > > > > > > > > > On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > > > > > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko > > > > > > wrote: > > > > > > > > > > > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > > > > > > call_rcu() can take a long time when callback offloading is > > > > > > > > enabled. > > > > > > > > Its use in the vm_area_free can cause regressions in the exit > > > > > > > > path when > > > > > > > > multiple VMAs are being freed. To minimize that impact, place > > > > > > > > VMAs into > > > > > > > > a list and free them in groups using one call_rcu() call per > > > > > > > > group. > > > > > > > > > > > > > > After some more clarification I can understand how call_rcu might > > > > > > > not be > > > > > > > super happy about thousands of callbacks to be invoked and I do > > > > > > > agree > > > > > > > that this is not really optimal. > > > > > > > > > > > > > > On the other hand I do not like this solution much either. > > > > > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all > > > > > > > that > > > > > > > much with processes with a huge number of vmas either. It would > > > > > > > still be > > > > > > > in housands of callbacks to be scheduled without a good reason. > > > > > > > > > > > > > > Instead, are there any other cases than remove_vma that need this > > > > > > > batching? We could easily just link all the vmas into linked list > > > > > > > and > > > > > > > use a single call_rcu instead, no? This would both simplify the > > > > > > > implementation, remove the scaling issue as well and we do not > > > > > > > have to > > > > > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon > > > > > > > + 1. > > > > > > > > > > > > Yes, I agree the solution is not stellar. I wanted something simple > > > > > > but this is probably too simple. OTOH keeping all dead > > > > > > vm_area_structs > > > > > > on the list without hooking up a shrinker (additional complexity) > > > > > > does > > > > > > not sound too appealing either. > > > > > > > > > > I suspect you have missed my idea. I do not really want to keep the > > > > > list > > > > > around or any shrinker. It is dead simple. Collect all vmas in > > > > > remove_vma and then call_rcu the whole list at once after the whole > > > > > list > > > > > (be it from exit_mmap or remove_mt). See? > > > > > > > > Yes, I understood your idea but keeping dead objects until the process > > > > exits even when the system is low on memory (no shrinkers attached) > > > > seems too wasteful. If we do this I would advocate for attaching a > > > > shrinker. > > > > > > Maybe even simpler, since we are hit with this VMA freeing flood > > > during exit_mmap (when all VMAs are destroyed), we pass a hint to > > > vm_area_free to batch the destruction and all other cases call > > > call_rcu()? I don't think there will be other cases of VMA destruction > > > floods. > > > > ... or have two different call_rcu functions; one for munmap() and > > one for exit. It'd be nice to use kmem_cache_free_bulk(). > > Do we even need a call_rcu on exit? At the point of freeing the VMAs we > have set the MMF_OOM_SKIP bit and unmapped the vmas under the read lock. > Once we have obtained the write lock again, I think it's safe to say we > can just go ahead and free the VMAs directly. I think that would be still racy if the page fault handler found that VMA under read-RCU protection but did not lock it yet (no locks are held yet). If it's preempted, the VMA can be freed and destroyed from under it without RCU grace period. > > -- > To unsubscribe from this group and stop receiving emails from it, send an > email to kernel-team+unsubscr...@android.com. >
Re: [PATCH RESEND 0/8] Resend LED patches
On Fri, 20 Jan 2023, Arnd Bergmann wrote: > On Mon, Dec 26, 2022, at 13:36, Pali Rohár wrote: > > Linus Walleij suggested me to send these patches to SoC tree [1] > > instead. So I'm doing it. > > > > This patch series contains LED patches which are on the linux-leds > > mailing list for a long time without any future movement. Could you > > please handle them here via SoC tree? Thanks. > > > > [1] - > > https://lore.kernel.org/linux-leds/cacrpkdad6wdo7rgfa4mw8zz0mlxmcpho+sec-ylqnrz_kdr...@mail.gmail.com/ > > I'm going through the backlog of patches sent to s...@kernel.org > and came across this series. While I don't mind taking these > patches through the soc tree in principle, it is important > that this is only done as an exception, and with all the > relevant parties on Cc. > > In particular, the original series that you got no > feedback for did not include the arch/powerpc/ changes, > and I would assume those should go through the powerpc > tree anyway. We have recently decided to take > risc-v and loongarch dts changes through the soc > tree, and I don't mind doing it for powerpc as well > if the powerpc maintainers prefer that, but this is > not something we have even discussed so far. > > I've added everyone to Cc on this mail, but please > resend the series once more so everyone has the patches, > and then we can decide who will pick up what. Thanks Arnd (PSB). > > Marek Behún (3): > > leds: turris-omnia: support HW controlled mode via private trigger > > leds: turris-omnia: initialize multi-intensity to full > > leds: turris-omnia: change max brightness from 255 to 1 > > > > Pali Rohár (5): > > dt-bindings: leds: register-bit-led: Add active-low property > > leds: syscon: Implement support for active-low property > > powerpc/85xx: DTS: Add CPLD definitions for P1021RDB Combo Board CPL > > Design > > dt-bindings: leds: Add cznic,turris1x-leds.yaml binding > > leds: Add support for Turris 1.x LEDs > > > > .../testing/sysfs-class-led-driver-turris1x | 31 ++ > > .../bindings/leds/cznic,turris1x-leds.yaml| 118 + > > .../bindings/leds/register-bit-led.yaml | 5 + > > arch/powerpc/boot/dts/fsl/p1020mbg-pc.dtsi| 92 > > arch/powerpc/boot/dts/fsl/p1020mbg-pc_32b.dts | 6 +- > > arch/powerpc/boot/dts/fsl/p1020mbg-pc_36b.dts | 6 +- > > arch/powerpc/boot/dts/fsl/p1020rdb-pd.dts | 44 +- > > arch/powerpc/boot/dts/fsl/p1020utm-pc.dtsi| 37 ++ > > arch/powerpc/boot/dts/fsl/p1020utm-pc_32b.dts | 4 +- > > arch/powerpc/boot/dts/fsl/p1020utm-pc_36b.dts | 4 +- > > arch/powerpc/boot/dts/fsl/p1021rdb-pc.dtsi| 37 ++ > > arch/powerpc/boot/dts/fsl/p1021rdb-pc_32b.dts | 5 +- > > arch/powerpc/boot/dts/fsl/p1021rdb-pc_36b.dts | 5 +- > > arch/powerpc/boot/dts/fsl/p2020rdb-pc.dtsi| 33 +- > > drivers/leds/Kconfig | 10 + > > drivers/leds/Makefile | 1 + > > drivers/leds/leds-syscon.c| 14 +- > > drivers/leds/leds-turris-1x.c | 474 ++ > > drivers/leds/leds-turris-omnia.c | 46 +- If everyone is convinced that applying these drivers is the correct thing to do, I'd be happy to (rather) take them via LEDs. -- Lee Jones [李琼斯]
Re: [PATCH v2 09/10] kbuild: Add config fragment merge functionality
On Fri, Jan 20, 2023, at 14:34, Alexander Stein wrote: > From: Nicolas Saenz Julienne > > So far this function was only used locally in powerpc, some other > architectures might benefit from it. Move it into > scripts/Makefile.defconf. > > Signed-off-by: Nicolas Saenz Julienne > Signed-off-by: Alexander Stein > --- > Directly applied from > https://lore.kernel.org/linux-arm-kernel/20200203184820.4433-2-nsaenzjulie...@suse.de/T/#m96968dd45c0aaa88e0a7387024b5ac13b002363d Acked-by: Arnd Bergmann I can probably best take this through the soc tree directly along with patch 10, while the first eight patches should got through the nxp maintainer tree. Arnd
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
* Matthew Wilcox [230120 11:50]: > On Fri, Jan 20, 2023 at 08:45:21AM -0800, Suren Baghdasaryan wrote: > > On Fri, Jan 20, 2023 at 8:20 AM Suren Baghdasaryan > > wrote: > > > > > > On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko wrote: > > > > > > > > On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > > > > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko wrote: > > > > > > > > > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > > > > > call_rcu() can take a long time when callback offloading is > > > > > > > enabled. > > > > > > > Its use in the vm_area_free can cause regressions in the exit > > > > > > > path when > > > > > > > multiple VMAs are being freed. To minimize that impact, place > > > > > > > VMAs into > > > > > > > a list and free them in groups using one call_rcu() call per > > > > > > > group. > > > > > > > > > > > > After some more clarification I can understand how call_rcu might > > > > > > not be > > > > > > super happy about thousands of callbacks to be invoked and I do > > > > > > agree > > > > > > that this is not really optimal. > > > > > > > > > > > > On the other hand I do not like this solution much either. > > > > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that > > > > > > much with processes with a huge number of vmas either. It would > > > > > > still be > > > > > > in housands of callbacks to be scheduled without a good reason. > > > > > > > > > > > > Instead, are there any other cases than remove_vma that need this > > > > > > batching? We could easily just link all the vmas into linked list > > > > > > and > > > > > > use a single call_rcu instead, no? This would both simplify the > > > > > > implementation, remove the scaling issue as well and we do not have > > > > > > to > > > > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + > > > > > > 1. > > > > > > > > > > Yes, I agree the solution is not stellar. I wanted something simple > > > > > but this is probably too simple. OTOH keeping all dead vm_area_structs > > > > > on the list without hooking up a shrinker (additional complexity) does > > > > > not sound too appealing either. > > > > > > > > I suspect you have missed my idea. I do not really want to keep the list > > > > around or any shrinker. It is dead simple. Collect all vmas in > > > > remove_vma and then call_rcu the whole list at once after the whole list > > > > (be it from exit_mmap or remove_mt). See? > > > > > > Yes, I understood your idea but keeping dead objects until the process > > > exits even when the system is low on memory (no shrinkers attached) > > > seems too wasteful. If we do this I would advocate for attaching a > > > shrinker. > > > > Maybe even simpler, since we are hit with this VMA freeing flood > > during exit_mmap (when all VMAs are destroyed), we pass a hint to > > vm_area_free to batch the destruction and all other cases call > > call_rcu()? I don't think there will be other cases of VMA destruction > > floods. > > ... or have two different call_rcu functions; one for munmap() and > one for exit. It'd be nice to use kmem_cache_free_bulk(). Do we even need a call_rcu on exit? At the point of freeing the VMAs we have set the MMF_OOM_SKIP bit and unmapped the vmas under the read lock. Once we have obtained the write lock again, I think it's safe to say we can just go ahead and free the VMAs directly.
Re: [PATCH RESEND 0/8] Resend LED patches
On Mon, Dec 26, 2022, at 13:36, Pali Rohár wrote: > Linus Walleij suggested me to send these patches to SoC tree [1] > instead. So I'm doing it. > > This patch series contains LED patches which are on the linux-leds > mailing list for a long time without any future movement. Could you > please handle them here via SoC tree? Thanks. > > [1] - > https://lore.kernel.org/linux-leds/cacrpkdad6wdo7rgfa4mw8zz0mlxmcpho+sec-ylqnrz_kdr...@mail.gmail.com/ I'm going through the backlog of patches sent to s...@kernel.org and came across this series. While I don't mind taking these patches through the soc tree in principle, it is important that this is only done as an exception, and with all the relevant parties on Cc. In particular, the original series that you got no feedback for did not include the arch/powerpc/ changes, and I would assume those should go through the powerpc tree anyway. We have recently decided to take risc-v and loongarch dts changes through the soc tree, and I don't mind doing it for powerpc as well if the powerpc maintainers prefer that, but this is not something we have even discussed so far. I've added everyone to Cc on this mail, but please resend the series once more so everyone has the patches, and then we can decide who will pick up what. Arnd > > Marek Behún (3): > leds: turris-omnia: support HW controlled mode via private trigger > leds: turris-omnia: initialize multi-intensity to full > leds: turris-omnia: change max brightness from 255 to 1 > > Pali Rohár (5): > dt-bindings: leds: register-bit-led: Add active-low property > leds: syscon: Implement support for active-low property > powerpc/85xx: DTS: Add CPLD definitions for P1021RDB Combo Board CPL > Design > dt-bindings: leds: Add cznic,turris1x-leds.yaml binding > leds: Add support for Turris 1.x LEDs > > .../testing/sysfs-class-led-driver-turris1x | 31 ++ > .../bindings/leds/cznic,turris1x-leds.yaml| 118 + > .../bindings/leds/register-bit-led.yaml | 5 + > arch/powerpc/boot/dts/fsl/p1020mbg-pc.dtsi| 92 > arch/powerpc/boot/dts/fsl/p1020mbg-pc_32b.dts | 6 +- > arch/powerpc/boot/dts/fsl/p1020mbg-pc_36b.dts | 6 +- > arch/powerpc/boot/dts/fsl/p1020rdb-pd.dts | 44 +- > arch/powerpc/boot/dts/fsl/p1020utm-pc.dtsi| 37 ++ > arch/powerpc/boot/dts/fsl/p1020utm-pc_32b.dts | 4 +- > arch/powerpc/boot/dts/fsl/p1020utm-pc_36b.dts | 4 +- > arch/powerpc/boot/dts/fsl/p1021rdb-pc.dtsi| 37 ++ > arch/powerpc/boot/dts/fsl/p1021rdb-pc_32b.dts | 5 +- > arch/powerpc/boot/dts/fsl/p1021rdb-pc_36b.dts | 5 +- > arch/powerpc/boot/dts/fsl/p2020rdb-pc.dtsi| 33 +- > drivers/leds/Kconfig | 10 + > drivers/leds/Makefile | 1 + > drivers/leds/leds-syscon.c| 14 +- > drivers/leds/leds-turris-1x.c | 474 ++ > drivers/leds/leds-turris-omnia.c | 46 +- > 19 files changed, 945 insertions(+), 27 deletions(-) > create mode 100644 > Documentation/ABI/testing/sysfs-class-led-driver-turris1x > create mode 100644 > Documentation/devicetree/bindings/leds/cznic,turris1x-leds.yaml > create mode 100644 drivers/leds/leds-turris-1x.c > > -- > 2.20.1
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Fri, Jan 20, 2023 at 08:45:21AM -0800, Suren Baghdasaryan wrote: > On Fri, Jan 20, 2023 at 8:20 AM Suren Baghdasaryan wrote: > > > > On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko wrote: > > > > > > On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > > > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko wrote: > > > > > > > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > > > > call_rcu() can take a long time when callback offloading is enabled. > > > > > > Its use in the vm_area_free can cause regressions in the exit path > > > > > > when > > > > > > multiple VMAs are being freed. To minimize that impact, place VMAs > > > > > > into > > > > > > a list and free them in groups using one call_rcu() call per group. > > > > > > > > > > After some more clarification I can understand how call_rcu might not > > > > > be > > > > > super happy about thousands of callbacks to be invoked and I do agree > > > > > that this is not really optimal. > > > > > > > > > > On the other hand I do not like this solution much either. > > > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that > > > > > much with processes with a huge number of vmas either. It would still > > > > > be > > > > > in housands of callbacks to be scheduled without a good reason. > > > > > > > > > > Instead, are there any other cases than remove_vma that need this > > > > > batching? We could easily just link all the vmas into linked list and > > > > > use a single call_rcu instead, no? This would both simplify the > > > > > implementation, remove the scaling issue as well and we do not have to > > > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + 1. > > > > > > > > Yes, I agree the solution is not stellar. I wanted something simple > > > > but this is probably too simple. OTOH keeping all dead vm_area_structs > > > > on the list without hooking up a shrinker (additional complexity) does > > > > not sound too appealing either. > > > > > > I suspect you have missed my idea. I do not really want to keep the list > > > around or any shrinker. It is dead simple. Collect all vmas in > > > remove_vma and then call_rcu the whole list at once after the whole list > > > (be it from exit_mmap or remove_mt). See? > > > > Yes, I understood your idea but keeping dead objects until the process > > exits even when the system is low on memory (no shrinkers attached) > > seems too wasteful. If we do this I would advocate for attaching a > > shrinker. > > Maybe even simpler, since we are hit with this VMA freeing flood > during exit_mmap (when all VMAs are destroyed), we pass a hint to > vm_area_free to batch the destruction and all other cases call > call_rcu()? I don't think there will be other cases of VMA destruction > floods. ... or have two different call_rcu functions; one for munmap() and one for exit. It'd be nice to use kmem_cache_free_bulk().
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Fri, Jan 20, 2023 at 8:20 AM Suren Baghdasaryan wrote: > > On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko wrote: > > > > On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko wrote: > > > > > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > > > call_rcu() can take a long time when callback offloading is enabled. > > > > > Its use in the vm_area_free can cause regressions in the exit path > > > > > when > > > > > multiple VMAs are being freed. To minimize that impact, place VMAs > > > > > into > > > > > a list and free them in groups using one call_rcu() call per group. > > > > > > > > After some more clarification I can understand how call_rcu might not be > > > > super happy about thousands of callbacks to be invoked and I do agree > > > > that this is not really optimal. > > > > > > > > On the other hand I do not like this solution much either. > > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that > > > > much with processes with a huge number of vmas either. It would still be > > > > in housands of callbacks to be scheduled without a good reason. > > > > > > > > Instead, are there any other cases than remove_vma that need this > > > > batching? We could easily just link all the vmas into linked list and > > > > use a single call_rcu instead, no? This would both simplify the > > > > implementation, remove the scaling issue as well and we do not have to > > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + 1. > > > > > > Yes, I agree the solution is not stellar. I wanted something simple > > > but this is probably too simple. OTOH keeping all dead vm_area_structs > > > on the list without hooking up a shrinker (additional complexity) does > > > not sound too appealing either. > > > > I suspect you have missed my idea. I do not really want to keep the list > > around or any shrinker. It is dead simple. Collect all vmas in > > remove_vma and then call_rcu the whole list at once after the whole list > > (be it from exit_mmap or remove_mt). See? > > Yes, I understood your idea but keeping dead objects until the process > exits even when the system is low on memory (no shrinkers attached) > seems too wasteful. If we do this I would advocate for attaching a > shrinker. Maybe even simpler, since we are hit with this VMA freeing flood during exit_mmap (when all VMAs are destroyed), we pass a hint to vm_area_free to batch the destruction and all other cases call call_rcu()? I don't think there will be other cases of VMA destruction floods. > > > > > -- > > Michal Hocko > > SUSE Labs
Re: [PATCH v9 00/10] phy: Add support for Lynx 10G SerDes
On 1/20/23 03:06, Vinod Koul wrote: > On 19-01-23, 11:22, Sean Anderson wrote: >> On 1/18/23 11:54, Vinod Koul wrote: >> > On 17-01-23, 11:46, Sean Anderson wrote: >> >> >> >> I noticed that this series is marked "changes requested" on patchwork. >> >> However, I have received only automated feedback. I have done my best >> >> effort to address feedback I have received on prior revisions. I would >> >> appreciate getting another round of review before resending this series. >> > >> > Looking at the series, looks like kernel-bot sent some warnings on the >> > series so I was expecting an updated series for review >> > >> >> Generally, multiple reviewers will comment on a patch, even if another >> reviewer finds something which needs to be changed. This is a one-line >> fix, so I would appreciate getting more substantial feedback before >> respinning. Every time I send a new series I have to rebase and test on >> hardware. It's work that I would rather do when there is something to be >> gained. > > I review to apply, if I can apply, I would typically skip this > It is much more efficient to conduct reviews in parallel. So e.g. the bindings can be reviewed at the same time as the driver, at the same time as the device tree changes. This way, I can get a series applied after max(N, M, ...) revisions, where I would otherwise need N revisions to get the bindings ready, M revisions to get the driver ready, etc. But what's happening is that I have to make N + M + ... revisions! I am very frustrated by your refusal to review anything until there are no other comments, since it unnecessarily extends the process of getting a series applied. I have been trying to get this series applied since June, with nine revisions, and you have reviewed it *twice*! I think the driver is in a good state and is ready to be applied (aside from the one known issue), but I have no idea if you agree with that assessment. --Sean
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Fri, Jan 20, 2023 at 12:52 AM Michal Hocko wrote: > > On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko wrote: > > > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > > call_rcu() can take a long time when callback offloading is enabled. > > > > Its use in the vm_area_free can cause regressions in the exit path when > > > > multiple VMAs are being freed. To minimize that impact, place VMAs into > > > > a list and free them in groups using one call_rcu() call per group. > > > > > > After some more clarification I can understand how call_rcu might not be > > > super happy about thousands of callbacks to be invoked and I do agree > > > that this is not really optimal. > > > > > > On the other hand I do not like this solution much either. > > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that > > > much with processes with a huge number of vmas either. It would still be > > > in housands of callbacks to be scheduled without a good reason. > > > > > > Instead, are there any other cases than remove_vma that need this > > > batching? We could easily just link all the vmas into linked list and > > > use a single call_rcu instead, no? This would both simplify the > > > implementation, remove the scaling issue as well and we do not have to > > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + 1. > > > > Yes, I agree the solution is not stellar. I wanted something simple > > but this is probably too simple. OTOH keeping all dead vm_area_structs > > on the list without hooking up a shrinker (additional complexity) does > > not sound too appealing either. > > I suspect you have missed my idea. I do not really want to keep the list > around or any shrinker. It is dead simple. Collect all vmas in > remove_vma and then call_rcu the whole list at once after the whole list > (be it from exit_mmap or remove_mt). See? Yes, I understood your idea but keeping dead objects until the process exits even when the system is low on memory (no shrinkers attached) seems too wasteful. If we do this I would advocate for attaching a shrinker. > > -- > Michal Hocko > SUSE Labs
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Fri, Jan 20, 2023 at 09:57:05AM +0100, Michal Hocko wrote: > On Thu 19-01-23 11:17:07, Paul E. McKenney wrote: > > On Thu, Jan 19, 2023 at 01:52:14PM +0100, Michal Hocko wrote: > > > On Wed 18-01-23 11:01:08, Suren Baghdasaryan wrote: > > > > On Wed, Jan 18, 2023 at 10:34 AM Paul E. McKenney > > > > wrote: > > > [...] > > > > > There are a couple of possibilities here. > > > > > > > > > > First, if I am remembering correctly, the time between the call_rcu() > > > > > and invocation of the corresponding callback was taking multiple > > > > > seconds, > > > > > but that was because the kernel was built with CONFIG_LAZY_RCU=y in > > > > > order to save power by batching RCU work over multiple call_rcu() > > > > > invocations. If this is causing a problem for a given call site, the > > > > > shiny new call_rcu_hurry() can be used instead. Doing this gets back > > > > > to the old-school non-laziness, but can of course consume more power. > > > > > > > > That would not be the case because CONFIG_LAZY_RCU was not an option > > > > at the time I was profiling this issue. > > > > Laxy RCU would be a great option to replace this patch but > > > > unfortunately it's not the default behavior, so I would still have to > > > > implement this batching in case lazy RCU is not enabled. > > > > > > > > > > > > > > Second, there is a much shorter one-jiffy delay between the call_rcu() > > > > > and the invocation of the corresponding callback in kernels built with > > > > > either CONFIG_NO_HZ_FULL=y (but only on CPUs mentioned in the > > > > > nohz_full > > > > > or rcu_nocbs kernel boot parameters) or CONFIG_RCU_NOCB_CPU=y (but > > > > > only > > > > > on CPUs mentioned in the rcu_nocbs kernel boot parameters). The > > > > > purpose > > > > > of this delay is to avoid lock contention, and so this delay is > > > > > incurred > > > > > only on CPUs that are queuing callbacks at a rate exceeding > > > > > 16K/second. > > > > > This is reduced to a per-jiffy limit, so on a HZ=1000 system, a CPU > > > > > invoking call_rcu() at least 16 times within a given jiffy will incur > > > > > the added delay. The reason for this delay is the use of a separate > > > > > ->nocb_bypass list. As Suren says, this bypass list is used to reduce > > > > > lock contention on the main ->cblist. This is not needed in > > > > > old-school > > > > > kernels built without either CONFIG_NO_HZ_FULL=y or > > > > > CONFIG_RCU_NOCB_CPU=y > > > > > (including most datacenter kernels) because in that case the callbacks > > > > > enqueued by call_rcu() are touched only by the corresponding CPU, so > > > > > that there is no need for locks. > > > > > > > > I believe this is the reason in my profiled case. > > > > > > > > > > > > > > Third, if you are instead seeing multiple milliseconds of CPU > > > > > consumed by > > > > > call_rcu() in the common case (for example, without the aid of > > > > > interrupts, > > > > > NMIs, or SMIs), please do let me know. That sounds to me like a bug. > > > > > > > > I don't think I've seen such a case. > > > > Thanks for clarifications, Paul! > > > > > > Thanks for the explanation Paul. I have to say this has caught me as a > > > surprise. There are just not enough details about the benchmark to > > > understand what is going on but I find it rather surprising that > > > call_rcu can induce a higher overhead than the actual kmem_cache_free > > > which is the callback. My naive understanding has been that call_rcu is > > > really fast way to defer the execution to the RCU safe context to do the > > > final cleanup. > > > > If I am following along correctly (ha!), then your "induce a higher > > overhead" should be something like "induce a higher to-kfree() latency". > > Yes, this is expected. > > > Of course, there already is a higher latency-to-kfree via call_rcu() > > than via a direct call to kfree(), and callback-offload CPUs that are > > being flooded with callbacks raise that latency a jiffy or so more in > > order to avoid lock contention. > > > > If this becomes a problem, the callback-offloading code can be a bit > > smarter about avoiding lock contention, but need to see a real problem > > before I make that change. But if there is a real problem I will of > > course fix it. > > I believe that Suren claims that the call_rcu is really visible in the > exit_mmap case. Time-to-free actual vmas shouldn't really be material > for that path. If that happens much more later on there could be some > side effects by an increased memory consumption but that should be > marginal. How fast exit_mmap really is should only depend on direct > calls from that path. > > But I guess we need some specific numbers from Suren to be sure what is > going on here. Actually, Suren did discuss these (perhaps offlist) back in August. I was just being forgetful. :-/ Thanx, Paul
Re: [PATCH V3] tools/perf/tests: Fix string substitutions in build id test
Em Fri, Jan 20, 2023 at 08:51:59AM +, David Laight escreveu: > From: Arnaldo Carvalho de Melo > > Sent: 19 January 2023 17:00 > > > > Em Thu, Jan 19, 2023 at 03:49:15PM +, David Laight escreveu: > > > From: Athira Rajeev > > > > Sent: 19 January 2023 14:27 > > > ... > > > > The test script "tests/shell/buildid.sh" uses some of the > > > > string substitution ways which are supported in bash, but not in > > > > "sh" or other shells. Above error on line number 69 that reports > > > > "Bad substitution" is: > > > > > > Looks better - assuming it works :-) > > > > :-) > > > > Can I take this as an: > > > > Acked-by: David Laight > > I'm not sure that is worth anything. > You can add a Reviewed-by Thanks, I'll then add: Reviewed-by: David Laight - Arnaldo
Re: [PATCH] tools/perf: Disable perf probe when libtraceevent is missing
Em Fri, Jan 20, 2023 at 05:32:56PM +0530, Athira Rajeev escreveu: > While parsing the tracepoint events in parse_events_add_tracepoint() > function, code checks for HAVE_LIBTRACEEVENT support. This is needed > since libtraceevent is necessary for tracepoint. But while adding > probe points, check for LIBTRACEEVENT is not done in case of perf probe. > Hence, in environment with missing libtraceevent-devel, it is > observed that adding a probe point works even though its not > supported. > Example: > Adding probe point: > ./perf probe 'vfs_getname=getname_flags:72 pathname=result->name:string' > Added new event: > probe:vfs_getname(on getname_flags:72 with > pathname=result->name:string) > You can now use it in all perf tools, such as: > perf record -e probe:vfs_getname -aR sleep 1 > But trying perf record: > ./perf record -e probe:vfs_getname -aR sleep 1 > event syntax error: 'probe:vfs_getname' > \___ unsupported tracepoint > libtraceevent is necessary for tracepoint support > Run 'perf list' for a list of valid events > > Fix this by wrapping "builtin-probe" compilation and > "perf probe" usage under "CONFIG_LIBTRACEEVENT" check. Humm, but 'perf probe' continues to work, as uou demoed above, the problem is with the suggestion to use other perf tools that need to parse tracefs and without libtraceevent, currently can't do it: [root@quaco ~]# perf probe 'vfs_getname=getname_flags:72 pathname=result->name:string' Added new event: probe:vfs_getname(on getname_flags:72 with pathname=result->name:string) You can now use it in all perf tools, such as: perf record -e probe:vfs_getname -aR sleep 1 [root@quaco ~]# perf probe -l probe:vfs_getname(on getname_flags:72@fs/namei.c with pathname) [root@quaco ~]# perf trace -e probe:vfs_getname perf: 'trace' is not a perf-command. See 'perf --help'. [root@quaco ~]# cd /sys/kernel/tracing/events/probe/vfs_getname/ [root@quaco vfs_getname]# ls enable filter format hist id trigger [root@quaco vfs_getname]# ls -la total 0 drwxr-x---. 2 root root 0 Jan 20 11:18 . drwxr-x---. 3 root root 0 Jan 20 11:18 .. -rw-r-. 1 root root 0 Jan 20 11:18 enable -rw-r-. 1 root root 0 Jan 20 11:18 filter -r--r-. 1 root root 0 Jan 20 11:18 format -r--r-. 1 root root 0 Jan 20 11:18 hist -r--r-. 1 root root 0 Jan 20 11:18 id -rw-r-. 1 root root 0 Jan 20 11:18 trigger [root@quaco vfs_getname]# But we can go on from there: [root@quaco tracing]# pwd /sys/kernel/tracing [root@quaco tracing]# echo 1 > /sys/kernel/tracing/events/probe/vfs_getname/enable [root@quaco tracing]# echo 1 > tracing_on [root@quaco tracing]# head trace_pipe systemd-oomd-979 [003] . 96369.978971: vfs_getname: (getname_flags.part.0+0x6b/0x1c0) pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service" systemd-oomd-979 [003] . 96369.979022: vfs_getname: (getname_flags.part.0+0x6b/0x1c0) pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.current" systemd-oomd-979 [003] . 96369.979084: vfs_getname: (getname_flags.part.0+0x6b/0x1c0) pathname="" systemd-oomd-979 [003] . 96369.979162: vfs_getname: (getname_flags.part.0+0x6b/0x1c0) pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.min" systemd-oomd-979 [003] . 96369.979197: vfs_getname: (getname_flags.part.0+0x6b/0x1c0) pathname="" systemd-oomd-979 [003] . 96369.979267: vfs_getname: (getname_flags.part.0+0x6b/0x1c0) pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.low" systemd-oomd-979 [003] . 96369.979303: vfs_getname: (getname_flags.part.0+0x6b/0x1c0) pathname="" systemd-oomd-979 [003] . 96369.979372: vfs_getname: (getname_flags.part.0+0x6b/0x1c0) pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.swap.current" systemd-oomd-979 [003] . 96369.979406: vfs_getname: (getname_flags.part.0+0x6b/0x1c0) pathname="" systemd-oomd-979 [003] . 96369.979475: vfs_getname: (getname_flags.part.0+0x6b/0x1c0) pathname="/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.stat" [root@quaco tracing]# So you could instead replace the suggestion from: " You can now use it in all perf tools, such as: perf record -e probe:vfs_getname -aR sleep 1 " To: " perf is not linked with libtraceevent, to use the new probe you can use tracefs: cd /sys/kernel/tracing/ echo 1 > events/probe/vfs_getname/enable echo 1 > tracing_on cat trace_pipe " wdyt? - Arnaldo > Signed-off-by: Athira Rajeev > --- > tools/perf/Build | 4 +++- > tools/perf/perf.c | 2 ++ > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/tools/perf/Build b/tools/perf/Build > index
[PATCH v2 10/10] ARM: add multi_v7_lpae_defconfig
From: Nicolas Saenz Julienne The only missing configuration option preventing us from using multi_v7_defconfig with the Raspberry Pi 4 is ARM_LPAE. It's needed as the PCIe controller found on the SoC depends on 64bit addressing, yet can't be included as not all v7 boards support LPAE. Introduce multi_v7_lpae_defconfig, built off multi_v7_defconfig, which will avoid us having to duplicate and maintain multiple similar configurations. Needless to say the Raspberry Pi 4 is not the only platform that can benefit from this new configuration. Signed-off-by: Nicolas Saenz Julienne Signed-off-by: Alexander Stein --- Directly applied from https://lore.kernel.org/linux-arm-kernel/20200203184820.4433-2-nsaenzjulie...@suse.de/T/#m96968dd45c0aaa88e0a7387024b5ac13b002363d Although I had to apply manually arch/arm/Makefile| 4 arch/arm/configs/lpae.config | 1 + 2 files changed, 5 insertions(+) create mode 100644 arch/arm/configs/lpae.config diff --git a/arch/arm/Makefile b/arch/arm/Makefile index 506dbc72323bc..80d9eaf3dc06a 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -314,6 +314,10 @@ endif # My testing targets (bypasses dependencies) bp:; $(Q)$(MAKE) $(build)=$(boot) $(boot)/bootpImage +include $(srctree)/scripts/Makefile.defconf +PHONY += multi_v7_lpae_defconfig +multi_v7_lpae_defconfig: + $(call merge_into_defconfig,multi_v7_defconfig,lpae) define archhelp echo '* zImage- Compressed kernel image (arch/$(ARCH)/boot/zImage)' diff --git a/arch/arm/configs/lpae.config b/arch/arm/configs/lpae.config new file mode 100644 index 0..19bab134e014b --- /dev/null +++ b/arch/arm/configs/lpae.config @@ -0,0 +1 @@ +CONFIG_ARM_LPAE=y -- 2.34.1
[PATCH v2 09/10] kbuild: Add config fragment merge functionality
From: Nicolas Saenz Julienne So far this function was only used locally in powerpc, some other architectures might benefit from it. Move it into scripts/Makefile.defconf. Signed-off-by: Nicolas Saenz Julienne Signed-off-by: Alexander Stein --- Directly applied from https://lore.kernel.org/linux-arm-kernel/20200203184820.4433-2-nsaenzjulie...@suse.de/T/#m96968dd45c0aaa88e0a7387024b5ac13b002363d arch/powerpc/Makefile| 12 +--- scripts/Makefile.defconf | 15 +++ 2 files changed, 16 insertions(+), 11 deletions(-) create mode 100644 scripts/Makefile.defconf diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index dc4cbf0a5ca95..533457466ce25 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -253,17 +253,7 @@ PHONY += bootwrapper_install bootwrapper_install: $(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@) -# Used to create 'merged defconfigs' -# To use it $(call) it with the first argument as the base defconfig -# and the second argument as a space separated list of .config files to merge, -# without the .config suffix. -define merge_into_defconfig - $(Q)$(CONFIG_SHELL) $(srctree)/scripts/kconfig/merge_config.sh \ - -m -O $(objtree) $(srctree)/arch/$(ARCH)/configs/$(1) \ - $(foreach config,$(2),$(srctree)/arch/$(ARCH)/configs/$(config).config) - +$(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig -endef - +include $(srctree)/scripts/Makefile.defconf PHONY += pseries_le_defconfig pseries_le_defconfig: $(call merge_into_defconfig,pseries_defconfig,le) diff --git a/scripts/Makefile.defconf b/scripts/Makefile.defconf new file mode 100644 index 0..ab332f7534f51 --- /dev/null +++ b/scripts/Makefile.defconf @@ -0,0 +1,15 @@ +# SPDX-License-Identifier: GPL-2.0 +# Configuration heplers + +# Creates 'merged defconfigs' +# --- +# Usage: +# $(call merge_into_defconfig,base_config,config_fragment1 config_fragment2 ...) +# +# Input config fragments without '.config' suffix +define merge_into_defconfig + $(Q)$(CONFIG_SHELL) $(srctree)/scripts/kconfig/merge_config.sh \ + -m -O $(objtree) $(srctree)/arch/$(ARCH)/configs/$(1) \ + $(foreach config,$(2),$(srctree)/arch/$(ARCH)/configs/$(config).config) + +$(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig +endef -- 2.34.1
[PATCH v2 08/10] ARM: multi_v7_defconfig: Add options to support TQMLS102xA series
Enable drivers used on TQMLS102xA + MBLS1021A. Signed-off-by: Alexander Stein --- Changes in v2: * Changed symbols to 'm' where possible arch/arm/configs/multi_v7_defconfig | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig index ee184eb37adcf..92628a160cfb3 100644 --- a/arch/arm/configs/multi_v7_defconfig +++ b/arch/arm/configs/multi_v7_defconfig @@ -191,6 +191,7 @@ CONFIG_PCI_TEGRA=y CONFIG_PCI_RCAR_GEN2=y CONFIG_PCIE_RCAR_HOST=y CONFIG_PCI_DRA7XX_EP=y +CONFIG_PCI_LAYERSCAPE=y CONFIG_PCI_ENDPOINT=y CONFIG_PCI_ENDPOINT_CONFIGFS=y CONFIG_PCI_EPF_TEST=m @@ -249,6 +250,7 @@ CONFIG_AHCI_ST=y CONFIG_AHCI_IMX=y CONFIG_AHCI_SUNXI=y CONFIG_AHCI_TEGRA=y +CONFIG_AHCI_QORIQ=y CONFIG_SATA_HIGHBANK=y CONFIG_SATA_MV=y CONFIG_SATA_RCAR=y @@ -329,6 +331,7 @@ CONFIG_TOUCHSCREEN_ADC=m CONFIG_TOUCHSCREEN_ATMEL_MXT=m CONFIG_TOUCHSCREEN_ELAN=m CONFIG_TOUCHSCREEN_MMS114=m +CONFIG_TOUCHSCREEN_EDT_FT5X06=m CONFIG_TOUCHSCREEN_WM97XX=m CONFIG_TOUCHSCREEN_ST1232=m CONFIG_TOUCHSCREEN_STMPE=y @@ -483,6 +486,7 @@ CONFIG_GPIO_ASPEED_SGPIO=y CONFIG_GPIO_DAVINCI=y CONFIG_GPIO_DWAPB=y CONFIG_GPIO_EM=y +CONFIG_GPIO_MPC8XXX=y CONFIG_GPIO_MXC=y CONFIG_GPIO_RCAR=y CONFIG_GPIO_SYSCON=y @@ -493,6 +497,7 @@ CONFIG_GPIO_PCA953X=y CONFIG_GPIO_PCA953X_IRQ=y CONFIG_GPIO_PCF857X=y CONFIG_GPIO_PALMAS=y +CONFIG_GPIO_STMPE=y CONFIG_GPIO_TPS6586X=y CONFIG_GPIO_TPS65910=y CONFIG_GPIO_TWL4030=y @@ -533,6 +538,7 @@ CONFIG_SENSORS_INA2XX=m CONFIG_CPU_THERMAL=y CONFIG_DEVFREQ_THERMAL=y CONFIG_IMX_THERMAL=y +CONFIG_QORIQ_THERMAL=m CONFIG_ROCKCHIP_THERMAL=y CONFIG_RCAR_THERMAL=y CONFIG_ARMADA_THERMAL=y @@ -821,6 +827,8 @@ CONFIG_SND_SOC_MSM8916_WCD_ANALOG=m CONFIG_SND_SOC_MSM8916_WCD_DIGITAL=m CONFIG_SND_SOC_SGTL5000=m CONFIG_SND_SOC_STI_SAS=m +CONFIG_SND_SOC_TLV320AIC32X4=m +CONFIG_SND_SOC_TLV320AIC32X4_I2C=m CONFIG_SND_SOC_WM8978=m CONFIG_SND_AUDIO_GRAPH_CARD=m CONFIG_USB=y @@ -830,6 +838,7 @@ CONFIG_USB_XHCI_MVEBU=y CONFIG_USB_XHCI_TEGRA=m CONFIG_USB_BRCMSTB=m CONFIG_USB_EHCI_HCD=y +CONFIG_USB_EHCI_FSL=m CONFIG_USB_EHCI_HCD_STI=y CONFIG_USB_EHCI_EXYNOS=m CONFIG_USB_EHCI_MV=m @@ -933,6 +942,8 @@ CONFIG_NEW_LEDS=y CONFIG_LEDS_CLASS=y CONFIG_LEDS_CLASS_FLASH=m CONFIG_LEDS_CPCAP=m +CONFIG_LEDS_PCA9532=m +CONFIG_LEDS_PCA9532_GPIO=y CONFIG_LEDS_GPIO=y CONFIG_LEDS_PWM=y CONFIG_LEDS_MAX8997=m @@ -949,6 +960,7 @@ CONFIG_LEDS_TRIGGER_DEFAULT_ON=y CONFIG_LEDS_TRIGGER_TRANSIENT=y CONFIG_LEDS_TRIGGER_CAMERA=y CONFIG_EDAC=y +CONFIG_EDAC_LAYERSCAPE=y CONFIG_EDAC_HIGHBANK_MC=y CONFIG_EDAC_HIGHBANK_L2=y CONFIG_RTC_CLASS=y @@ -962,6 +974,7 @@ CONFIG_RTC_DRV_MAX8997=m CONFIG_RTC_DRV_MAX77686=y CONFIG_RTC_DRV_RK808=m CONFIG_RTC_DRV_RS5C372=m +CONFIG_RTC_DRV_PCF85063=m CONFIG_RTC_DRV_PCF85363=m CONFIG_RTC_DRV_BQ32K=m CONFIG_RTC_DRV_TWL4030=y -- 2.34.1
[PATCH v2 07/10] ARM: dts: ls1021a: add TQMLS1021A/MBLS102xA LVDS CDTECH FC21 overlay
Add device tree overlay for LVDS display usage. Signed-off-by: Alexander Stein --- Changes in v2: * None arch/arm/boot/dts/Makefile| 2 + ...1021a-tqmls1021a-mbls1021a-cdtech-fc21.dts | 55 +++ 2 files changed, 57 insertions(+) create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-cdtech-fc21.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index 774cc57274e20..bc51a5b868c10 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -806,10 +806,12 @@ dtb-$(CONFIG_SOC_LS1021A) += \ ls1021a-tsn.dtb \ ls1021a-twr.dtb +ls1021a-tqmls1021a-mbls1021a-cdtech-fc21-dtbs := ls1021a-tqmls1021a-mbls1021a.dtb ls1021a-tqmls1021a-mbls1021a-cdtech-fc21.dtbo ls1021a-tqmls1021a-mbls1021a-cdtech-dc44-dtbs := ls1021a-tqmls1021a-mbls1021a.dtb ls1021a-tqmls1021a-mbls1021a-cdtech-dc44.dtbo ls1021a-tqmls1021a-mbls1021a-hdmi-dtbs := ls1021a-tqmls1021a-mbls1021a.dtb ls1021a-tqmls1021a-mbls1021a-hdmi.dtbo ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33-dtbs := ls1021a-tqmls1021a-mbls1021a.dtb ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtbo +dtb-$(CONFIG_SOC_LS1021A) += ls1021a-tqmls1021a-mbls1021a-cdtech-fc21.dtb dtb-$(CONFIG_SOC_LS1021A) += ls1021a-tqmls1021a-mbls1021a-cdtech-dc44.dtb dtb-$(CONFIG_SOC_LS1021A) += ls1021a-tqmls1021a-mbls1021a-hdmi.dtb dtb-$(CONFIG_SOC_LS1021A) += ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtb diff --git a/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-cdtech-fc21.dts b/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-cdtech-fc21.dts new file mode 100644 index 0..4bd10d0e17b90 --- /dev/null +++ b/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-cdtech-fc21.dts @@ -0,0 +1,55 @@ +// SPDX-License-Identifier: (GPL-2.0-or-later OR X11) +/* + * Copyright 2018-2023 TQ-Systems GmbH , + * D-82229 Seefeld, Germany. + * Author: Alexander Stein + + */ + +/dts-v1/; +/plugin/; + +#include +#include + +_dcu { + status = "okay"; +}; + + { + status = "okay"; + + port { + dcu_out: endpoint { + remote-endpoint = <_in>; + }; + }; +}; + + { + compatible = "cdtech,s070pws19hp-fc21"; + status = "okay"; + + port { + panel_in: endpoint { + remote-endpoint = <_out>; + }; + }; +}; + + { + #address-cells = <1>; + #size-cells = <0>; + + polytouch: touchscreen@38 { + compatible = "edt,edt-ft5406"; + reg = <0x38>; + interrupt-parent = <_0>; + interrupts = <6 IRQ_TYPE_EDGE_FALLING>; + /* LCD_PWR_EN -> TSC_WAKE */ + wake-gpios = <_1 4 GPIO_ACTIVE_HIGH>; + gain = <20>; + touchscreen-size-x = <1024>; + touchscreen-size-y = <600>; + }; +}; -- 2.34.1
[PATCH v2 06/10] ARM: dts: ls1021a: add TQMLS1021A/MBLS102xA LVDS CDTECH DC44 overlay
Add device tree overlay for LVDS display usage. Signed-off-by: Alexander Stein --- Changes in v2: * None arch/arm/boot/dts/Makefile| 2 + ...1021a-tqmls1021a-mbls1021a-cdtech-dc44.dts | 55 +++ 2 files changed, 57 insertions(+) create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-cdtech-dc44.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index d5105e8179431..774cc57274e20 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -806,9 +806,11 @@ dtb-$(CONFIG_SOC_LS1021A) += \ ls1021a-tsn.dtb \ ls1021a-twr.dtb +ls1021a-tqmls1021a-mbls1021a-cdtech-dc44-dtbs := ls1021a-tqmls1021a-mbls1021a.dtb ls1021a-tqmls1021a-mbls1021a-cdtech-dc44.dtbo ls1021a-tqmls1021a-mbls1021a-hdmi-dtbs := ls1021a-tqmls1021a-mbls1021a.dtb ls1021a-tqmls1021a-mbls1021a-hdmi.dtbo ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33-dtbs := ls1021a-tqmls1021a-mbls1021a.dtb ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtbo +dtb-$(CONFIG_SOC_LS1021A) += ls1021a-tqmls1021a-mbls1021a-cdtech-dc44.dtb dtb-$(CONFIG_SOC_LS1021A) += ls1021a-tqmls1021a-mbls1021a-hdmi.dtb dtb-$(CONFIG_SOC_LS1021A) += ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtb diff --git a/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-cdtech-dc44.dts b/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-cdtech-dc44.dts new file mode 100644 index 0..ddc71bc597295 --- /dev/null +++ b/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-cdtech-dc44.dts @@ -0,0 +1,55 @@ +// SPDX-License-Identifier: (GPL-2.0-or-later OR X11) +/* + * Copyright 2018-2023 TQ-Systems GmbH , + * D-82229 Seefeld, Germany. + * Author: Alexander Stein + + */ + +/dts-v1/; +/plugin/; + +#include +#include + +_dcu { + status = "okay"; +}; + + { + status = "okay"; + + port { + dcu_out: endpoint { + remote-endpoint = <_in>; + }; + }; +}; + + { + compatible = "cdtech,s070swv29hg-dc44"; + status = "okay"; + + port { + panel_in: endpoint { + remote-endpoint = <_out>; + }; + }; +}; + + { + #address-cells = <1>; + #size-cells = <0>; + + polytouch: touchscreen@38 { + compatible = "edt,edt-ft5406"; + reg = <0x38>; + interrupt-parent = <_0>; + interrupts = <6 IRQ_TYPE_EDGE_FALLING>; + /* LCD_PWR_EN -> TSC_WAKE */ + wake-gpios = <_1 4 GPIO_ACTIVE_HIGH>; + gain = <20>; + touchscreen-size-x = <800>; + touchscreen-size-y = <480>; + }; +}; -- 2.34.1
[PATCH v2 05/10] ARM: dts: ls1021a: add TQMLS1021A/MBLS102xA HDMI overlay
Add device tree overlay for HDMI usage. Signed-off-by: Alexander Stein --- Changes in v2: * None arch/arm/boot/dts/Makefile| 2 ++ .../ls1021a-tqmls1021a-mbls1021a-hdmi.dtso| 36 +++ 2 files changed, 38 insertions(+) create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-hdmi.dtso diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index e6943dc73a3fd..d5105e8179431 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -806,8 +806,10 @@ dtb-$(CONFIG_SOC_LS1021A) += \ ls1021a-tsn.dtb \ ls1021a-twr.dtb +ls1021a-tqmls1021a-mbls1021a-hdmi-dtbs := ls1021a-tqmls1021a-mbls1021a.dtb ls1021a-tqmls1021a-mbls1021a-hdmi.dtbo ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33-dtbs := ls1021a-tqmls1021a-mbls1021a.dtb ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtbo +dtb-$(CONFIG_SOC_LS1021A) += ls1021a-tqmls1021a-mbls1021a-hdmi.dtb dtb-$(CONFIG_SOC_LS1021A) += ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtb dtb-$(CONFIG_SOC_VF610) += \ diff --git a/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-hdmi.dtso b/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-hdmi.dtso new file mode 100644 index 0..f5ca22643c08e --- /dev/null +++ b/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-hdmi.dtso @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: (GPL-2.0-or-later OR X11) +/* + * Copyright 2018-2023 TQ-Systems GmbH , + * D-82229 Seefeld, Germany. + * Author: Alexander Stein + + */ + +/dts-v1/; +/plugin/; + + { + status = "okay"; + + port { + dcu_out: endpoint { + remote-endpoint = <_in>; + }; + }; +}; + +_out { + status = "okay"; +}; + + { + status = "okay"; + + ports { + port@0 { + sii9022a_in: endpoint { + remote-endpoint = <_out>; + }; + }; + }; +}; -- 2.34.1
[PATCH v2 04/10] ARM: dts: ls1021a: add TQMLS1021A/MBLS102xA LVDS TM070JVHG33 overlay
Add device tree overlay for LVDS display usage. Signed-off-by: Alexander Stein --- Changes in v2: * None arch/arm/boot/dts/Makefile| 5 ++ ...tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtso | 56 +++ 2 files changed, 61 insertions(+) create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtso diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index 44b5ed44b13d6..e6943dc73a3fd 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -805,6 +805,11 @@ dtb-$(CONFIG_SOC_LS1021A) += \ ls1021a-tqmls1021a-mbls1021a.dtb \ ls1021a-tsn.dtb \ ls1021a-twr.dtb + +ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33-dtbs := ls1021a-tqmls1021a-mbls1021a.dtb ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtbo + +dtb-$(CONFIG_SOC_LS1021A) += ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtb + dtb-$(CONFIG_SOC_VF610) += \ vf500-colibri-eval-v3.dtb \ vf610-bk4.dtb \ diff --git a/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtso b/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtso new file mode 100644 index 0..252ef982dd862 --- /dev/null +++ b/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtso @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: (GPL-2.0-or-later OR X11) +/* + * Copyright 2018-2023 TQ-Systems GmbH , + * D-82229 Seefeld, Germany. + * Author: Alexander Stein + + */ + +/dts-v1/; +/plugin/; + +#include + +_dcu { + status = "okay"; +}; + + { + status = "okay"; + + port { + + dcu_out: endpoint { + remote-endpoint = <_encoder_in>; + }; + }; +}; + + { + compatible = "tianma,tm070jvhg33"; + status = "okay"; + + port { + panel_in: endpoint { + remote-endpoint = <_encoder_out>; + }; + }; +}; + +_encoder { + status = "okay"; + + ports { + port@0 { + lvds_encoder_in: endpoint { + remote-endpoint = <_out>; + }; + }; + + port@1 { + lvds_encoder_out: endpoint { + remote-endpoint = <_in>; + }; + }; + }; +}; -- 2.34.1
[PATCH v2 03/10] ARM: dts: ls1021a: add TQMLS1021A flash partition layout
The bootloader does not add the partitions into DT, so add them manually here. Signed-off-by: Alexander Stein --- Changes in v2: * None arch/arm/boot/dts/ls1021a-tqmls1021a.dtsi | 31 +++ 1 file changed, 31 insertions(+) diff --git a/arch/arm/boot/dts/ls1021a-tqmls1021a.dtsi b/arch/arm/boot/dts/ls1021a-tqmls1021a.dtsi index 24ad4a76fe597..ea6734253ba88 100644 --- a/arch/arm/boot/dts/ls1021a-tqmls1021a.dtsi +++ b/arch/arm/boot/dts/ls1021a-tqmls1021a.dtsi @@ -77,5 +77,36 @@ qflash0: flash@0 { spi-rx-bus-width = <4>; spi-tx-bus-width = <4>; reg = <0>; + + partitions { + compatible = "fixed-partitions"; + #address-cells = <1>; + #size-cells = <1>; + + uboot@0 { + label = "U-Boot-PBL"; + reg = <0x0 0xe>; + }; + + env@e { + label = "U-Boot Environment"; + reg = <0xe 0x1>; + }; + + dtb@f { + label = "DTB"; + reg = <0xf 0x1>; + }; + + linux@10 { + label = "Linux"; + reg = <0x10 0x70>; + }; + + rootfs@80 { + label = "RootFS"; + reg = <0x80 0x380>; + }; + }; }; }; -- 2.34.1
[PATCH v2 02/10] ARM: dts: ls1021a: add TQ-Systems MBLS102xA device tree
Add device tree for the MBLS102xA mainboard with TQMLS1021A SoM. Signed-off-by: Alexander Stein --- Changes in v2: * Remove unnecessary status = "okay" * Remove underscore from node names * Move reg direct below compatiblefor i2c devices * Remove i2c device nodes without software support Add a comment about existance for the device though arch/arm/boot/dts/Makefile| 1 + .../boot/dts/ls1021a-tqmls1021a-mbls1021a.dts | 406 ++ arch/arm/boot/dts/ls1021a-tqmls1021a.dtsi | 81 3 files changed, 488 insertions(+) create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a.dts create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a.dtsi diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index d0c07867aeabe..44b5ed44b13d6 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -802,6 +802,7 @@ dtb-$(CONFIG_SOC_LS1021A) += \ ls1021a-iot.dtb \ ls1021a-moxa-uc-8410a.dtb \ ls1021a-qds.dtb \ + ls1021a-tqmls1021a-mbls1021a.dtb \ ls1021a-tsn.dtb \ ls1021a-twr.dtb dtb-$(CONFIG_SOC_VF610) += \ diff --git a/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a.dts b/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a.dts new file mode 100644 index 0..aa8b605344655 --- /dev/null +++ b/arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a.dts @@ -0,0 +1,406 @@ +// SPDX-License-Identifier: (GPL-2.0-or-later OR X11) +/* + * Copyright 2013-2014 Freescale Semiconductor, Inc. + * Copyright 2018-2023 TQ-Systems GmbH , + * D-82229 Seefeld, Germany. + * Author: Alexander Stein + */ + +/dts-v1/; + +#include +#include +#include +#include +#include +#include "ls1021a-tqmls1021a.dtsi" + +/ { + model = "TQMLS102xA SOM on MBLS102xA"; + compatible = "tq,ls1021a-tqmls1021a-mbls102xa", "tq,ls1021a-tqmls1021a", "fsl,ls1021a"; + + audio_mclk: audio-clock { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <2500>; + }; + + backlight_dcu: backlight { + compatible = "gpio-backlight"; + gpios = < 0 GPIO_ACTIVE_LOW>; + status = "disabled"; + }; + + gpio-keys { + compatible = "gpio-keys"; + autorepeat; + + switch-1 { + label = "S6"; + linux,code = ; + gpios = <_0 0 GPIO_ACTIVE_LOW>; + }; + + btn2: switch-2 { + label = "S7"; + linux,code = ; + gpios = <_0 1 GPIO_ACTIVE_LOW>; + }; + + switch-3 { + label = "S8"; + linux,code = ; + gpios = <_0 2 GPIO_ACTIVE_LOW>; + }; + }; + + gpio_leds: gpio-leds { + compatible = "gpio-leds"; + + led-0 { + color = ; + function = LED_FUNCTION_STATUS; + function-enumerator = <0>; + gpios = <_2 4 GPIO_ACTIVE_LOW>; + linux,default-trigger = "default-on"; + }; + + led-1 { + color = ; + function = LED_FUNCTION_STATUS; + function-enumerator = <1>; + gpios = <_2 5 GPIO_ACTIVE_LOW>; + linux,default-trigger = "default-on"; + }; + + led-2 { + color = ; + function = LED_FUNCTION_STATUS; + function-enumerator = <2>; + gpios = <_2 6 GPIO_ACTIVE_LOW>; + linux,default-trigger = "default-on"; + }; + + led-3 { + color = ; + function = LED_FUNCTION_HEARTBEAT; + function-enumerator = <0>; + gpios = <_2 7 GPIO_ACTIVE_LOW>; + linux,default-trigger = "heartbeat"; + }; + }; + + lvds_encoder: lvds-encoder { + compatible = "ti,sn75lvds83", "lvds-encoder"; + power-supply = <_3p3v>; + status = "disabled"; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + + lvds_encoder_in: endpoint {}; + }; + + port@1 { + reg = <1>; + + lvds_encoder_out: endpoint {}; + }; + }; + }; + + reg_1p2v: regulator-1p2v { + compatible = "regulator-fixed"; + regulator-name = "1P2V"; +
[PATCH v2 01/10] dt-bindings: arm: fsl: add TQ-Systems LS1021A board
From: Matthias Schiffer TQMLS102xA is a SOM family using NXP LS1021A CPU family. MBLS102xA is an evaluation mainboard for this SOM. Signed-off-by: Matthias Schiffer Signed-off-by: Alexander Stein --- Changes in v2: * Improved the description mentioning this is a socketable module Documentation/devicetree/bindings/arm/fsl.yaml | 10 ++ 1 file changed, 10 insertions(+) diff --git a/Documentation/devicetree/bindings/arm/fsl.yaml b/Documentation/devicetree/bindings/arm/fsl.yaml index 3ba354578e8f9..489d350c3bbbe 100644 --- a/Documentation/devicetree/bindings/arm/fsl.yaml +++ b/Documentation/devicetree/bindings/arm/fsl.yaml @@ -1164,6 +1164,16 @@ properties: - fsl,ls1021a-twr - const: fsl,ls1021a + - description: + TQ-Systems TQMLS102xA is a series of socketable SOM featuring + LS102x system-on-chip variants. MBLS102xA mainboard can be used as + starterkit. +items: + - enum: + - tq,ls1021a-tqmls1021a-mbls102xa + - const: tq,ls1021a-tqmls1021a + - const: fsl,ls1021a + - description: LS1028A based Boards items: - enum: -- 2.34.1
[PATCH v2 00/10] TQMLS1021A support
Hi, this is the second round of this series to add support for the TQMLS1021A using the MBLS1021A mainboard. The changelog is included in the individual patches but the most notable one is picking up the old patch series for adding lpae config snippet support. Best regards, Alexander Alexander Stein (7): ARM: dts: ls1021a: add TQ-Systems MBLS102xA device tree ARM: dts: ls1021a: add TQMLS1021A flash partition layout ARM: dts: ls1021a: add TQMLS1021A/MBLS102xA LVDS TM070JVHG33 overlay ARM: dts: ls1021a: add TQMLS1021A/MBLS102xA HDMI overlay ARM: dts: ls1021a: add TQMLS1021A/MBLS102xA LVDS CDTECH DC44 overlay ARM: dts: ls1021a: add TQMLS1021A/MBLS102xA LVDS CDTECH FC21 overlay ARM: multi_v7_defconfig: Add options to support TQMLS102xA series Matthias Schiffer (1): dt-bindings: arm: fsl: add TQ-Systems LS1021A board Nicolas Saenz Julienne (2): kbuild: Add config fragment merge functionality ARM: add multi_v7_lpae_defconfig .../devicetree/bindings/arm/fsl.yaml | 10 + arch/arm/Makefile | 4 + arch/arm/boot/dts/Makefile| 12 + ...1021a-tqmls1021a-mbls1021a-cdtech-dc44.dts | 55 +++ ...1021a-tqmls1021a-mbls1021a-cdtech-fc21.dts | 55 +++ .../ls1021a-tqmls1021a-mbls1021a-hdmi.dtso| 36 ++ ...tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtso | 56 +++ .../boot/dts/ls1021a-tqmls1021a-mbls1021a.dts | 406 ++ arch/arm/boot/dts/ls1021a-tqmls1021a.dtsi | 112 + arch/arm/configs/lpae.config | 1 + arch/arm/configs/multi_v7_defconfig | 13 + arch/powerpc/Makefile | 12 +- scripts/Makefile.defconf | 15 + 13 files changed, 776 insertions(+), 11 deletions(-) create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-cdtech-dc44.dts create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-cdtech-fc21.dts create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-hdmi.dtso create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a-lvds-tm070jvhg33.dtso create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a-mbls1021a.dts create mode 100644 arch/arm/boot/dts/ls1021a-tqmls1021a.dtsi create mode 100644 arch/arm/configs/lpae.config create mode 100644 scripts/Makefile.defconf -- 2.34.1
Re: [PATCH v3 16/51] cpuidle: Annotate poll_idle()
On Thu, Jan 12, 2023 at 08:43:30PM +0100, Peter Zijlstra wrote: > The __cpuidle functions will become a noinstr class, as such they need > explicit annotations. > > Signed-off-by: Peter Zijlstra (Intel) > Reviewed-by: Rafael J. Wysocki > Acked-by: Frederic Weisbecker > Tested-by: Tony Lindgren > Tested-by: Ulf Hansson > --- > drivers/cpuidle/poll_state.c |6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > --- a/drivers/cpuidle/poll_state.c > +++ b/drivers/cpuidle/poll_state.c > @@ -13,7 +13,10 @@ > static int __cpuidle poll_idle(struct cpuidle_device *dev, > struct cpuidle_driver *drv, int index) > { > - u64 time_start = local_clock(); > + u64 time_start; > + > + instrumentation_begin(); > + time_start = local_clock(); > > dev->poll_time_limit = false; > > @@ -39,6 +42,7 @@ static int __cpuidle poll_idle(struct cp > raw_local_irq_disable(); > > current_clr_polling(); > + instrumentation_end(); > > return index; > } Pff, this patch is garbage. However wrote it didn't have his brain engaged :/ Something like the below fixes it, but I still need to build me funny configs like ia64 and paravirt to see if I didn't wreck me something... diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a78e73da4a74..70c07e11caa6 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -215,7 +215,7 @@ static void __init cyc2ns_init_secondary_cpus(void) /* * Scheduler clock - returns current time in nanosec units. */ -u64 native_sched_clock(void) +noinstr u64 native_sched_clock(void) { if (static_branch_likely(&__use_tsc)) { u64 tsc_now = rdtsc(); diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 500d1720421e..0b00f21cefe3 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -426,7 +426,7 @@ void cpuidle_reflect(struct cpuidle_device *dev, int index) * @dev: the cpuidle device * */ -u64 cpuidle_poll_time(struct cpuidle_driver *drv, +__cpuidle u64 cpuidle_poll_time(struct cpuidle_driver *drv, struct cpuidle_device *dev) { int i; diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c index d25ec52846e6..bdcfeaecd228 100644 --- a/drivers/cpuidle/poll_state.c +++ b/drivers/cpuidle/poll_state.c @@ -15,7 +15,6 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev, { u64 time_start; - instrumentation_begin(); time_start = local_clock(); dev->poll_time_limit = false; @@ -42,7 +41,6 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev, raw_local_irq_disable(); current_clr_polling(); - instrumentation_end(); return index; } diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h index 867d588314e0..7960f0769884 100644 --- a/include/linux/sched/clock.h +++ b/include/linux/sched/clock.h @@ -45,7 +45,7 @@ static inline u64 cpu_clock(int cpu) return sched_clock(); } -static inline u64 local_clock(void) +static __always_inline u64 local_clock(void) { return sched_clock(); } @@ -79,7 +79,7 @@ static inline u64 cpu_clock(int cpu) return sched_clock_cpu(cpu); } -static inline u64 local_clock(void) +static __always_inline u64 local_clock(void) { return sched_clock_cpu(raw_smp_processor_id()); } diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c index e374c0c923da..6b3b0559e53c 100644 --- a/kernel/sched/clock.c +++ b/kernel/sched/clock.c @@ -260,7 +260,7 @@ notrace static inline u64 wrap_max(u64 x, u64 y) * - filter out backward motion * - use the GTOD tick value to create a window to filter crazy TSC values */ -notrace static u64 sched_clock_local(struct sched_clock_data *scd) +noinstr static u64 sched_clock_local(struct sched_clock_data *scd) { u64 now, clock, old_clock, min_clock, max_clock, gtod; s64 delta; @@ -287,7 +287,7 @@ notrace static u64 sched_clock_local(struct sched_clock_data *scd) clock = wrap_max(clock, min_clock); clock = wrap_min(clock, max_clock); - if (!try_cmpxchg64(>clock, _clock, clock)) + if (!arch_try_cmpxchg64(>clock, _clock, clock)) goto again; return clock; @@ -360,7 +360,7 @@ notrace static u64 sched_clock_remote(struct sched_clock_data *scd) * * See cpu_clock(). */ -notrace u64 sched_clock_cpu(int cpu) +noinstr u64 sched_clock_cpu(int cpu) { struct sched_clock_data *scd; u64 clock;
Re: [PATCH v4 5/7] treewide: Trace IPIs sent via smp_send_reschedule()
For LoongArch parts, Acked-by: Huacai Chen On Thu, Jan 19, 2023 at 10:37 PM Valentin Schneider wrote: > > To be able to trace invocations of smp_send_reschedule(), rename the > arch-specific definitions of it to arch_smp_send_reschedule() and wrap it > into an smp_send_reschedule() that contains a tracepoint. > > Changes to include the declaration of the tracepoint were driven by the > following coccinelle script: > > @func_use@ > @@ > smp_send_reschedule(...); > > @include@ > @@ > #include > > @no_include depends on func_use && !include@ > @@ > #include <...> > + > + #include > > Signed-off-by: Valentin Schneider > [csky bits] > Acked-by: Guo Ren > [riscv bits] > Acked-by: Palmer Dabbelt > --- > arch/alpha/kernel/smp.c | 2 +- > arch/arc/kernel/smp.c| 2 +- > arch/arm/kernel/smp.c| 2 +- > arch/arm/mach-actions/platsmp.c | 2 ++ > arch/arm64/kernel/smp.c | 2 +- > arch/csky/kernel/smp.c | 2 +- > arch/hexagon/kernel/smp.c| 2 +- > arch/ia64/kernel/smp.c | 4 ++-- > arch/loongarch/kernel/smp.c | 4 ++-- > arch/mips/include/asm/smp.h | 2 +- > arch/mips/kernel/rtlx-cmp.c | 2 ++ > arch/openrisc/kernel/smp.c | 2 +- > arch/parisc/kernel/smp.c | 4 ++-- > arch/powerpc/kernel/smp.c| 6 -- > arch/powerpc/kvm/book3s_hv.c | 3 +++ > arch/powerpc/platforms/powernv/subcore.c | 2 ++ > arch/riscv/kernel/smp.c | 4 ++-- > arch/s390/kernel/smp.c | 2 +- > arch/sh/kernel/smp.c | 2 +- > arch/sparc/kernel/smp_32.c | 2 +- > arch/sparc/kernel/smp_64.c | 2 +- > arch/x86/include/asm/smp.h | 2 +- > arch/x86/kvm/svm/svm.c | 4 > arch/x86/kvm/x86.c | 2 ++ > arch/xtensa/kernel/smp.c | 2 +- > include/linux/smp.h | 11 +-- > virt/kvm/kvm_main.c | 2 ++ > 27 files changed, 52 insertions(+), 26 deletions(-) > > diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c > index f4e20f75438f8..38637eb9eebd5 100644 > --- a/arch/alpha/kernel/smp.c > +++ b/arch/alpha/kernel/smp.c > @@ -562,7 +562,7 @@ handle_ipi(struct pt_regs *regs) > } > > void > -smp_send_reschedule(int cpu) > +arch_smp_send_reschedule(int cpu) > { > #ifdef DEBUG_IPI_MSG > if (cpu == hard_smp_processor_id()) > diff --git a/arch/arc/kernel/smp.c b/arch/arc/kernel/smp.c > index ad93fe6e4b77d..409cfa4675b40 100644 > --- a/arch/arc/kernel/smp.c > +++ b/arch/arc/kernel/smp.c > @@ -292,7 +292,7 @@ static void ipi_send_msg(const struct cpumask *callmap, > enum ipi_msg_type msg) > ipi_send_msg_one(cpu, msg); > } > > -void smp_send_reschedule(int cpu) > +void arch_smp_send_reschedule(int cpu) > { > ipi_send_msg_one(cpu, IPI_RESCHEDULE); > } > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index 45b8ca2ce521f..dea24a6e0ed6f 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -744,7 +744,7 @@ void __init set_smp_ipi_range(int ipi_base, int n) > ipi_setup(smp_processor_id()); > } > > -void smp_send_reschedule(int cpu) > +void arch_smp_send_reschedule(int cpu) > { > smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE); > } > diff --git a/arch/arm/mach-actions/platsmp.c b/arch/arm/mach-actions/platsmp.c > index f26618b435145..7b208e96fbb67 100644 > --- a/arch/arm/mach-actions/platsmp.c > +++ b/arch/arm/mach-actions/platsmp.c > @@ -20,6 +20,8 @@ > #include > #include > > +#include > + > #define OWL_CPU1_ADDR 0x50 > #define OWL_CPU1_FLAG 0x5c > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c > index 937d2623e06ba..8d108edc4a89f 100644 > --- a/arch/arm64/kernel/smp.c > +++ b/arch/arm64/kernel/smp.c > @@ -976,7 +976,7 @@ void __init set_smp_ipi_range(int ipi_base, int n) > ipi_setup(smp_processor_id()); > } > > -void smp_send_reschedule(int cpu) > +void arch_smp_send_reschedule(int cpu) > { > smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE); > } > diff --git a/arch/csky/kernel/smp.c b/arch/csky/kernel/smp.c > index 4b605aa2e1d65..fd7f81be16dd6 100644 > --- a/arch/csky/kernel/smp.c > +++ b/arch/csky/kernel/smp.c > @@ -140,7 +140,7 @@ void smp_send_stop(void) > on_each_cpu(ipi_stop, NULL, 1); > } > > -void smp_send_reschedule(int cpu) > +void arch_smp_send_reschedule(int cpu) > { > send_ipi_message(cpumask_of(cpu), IPI_RESCHEDULE); > } > diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c > index 4ba93e59370c4..4e8bee25b8c68 100644 > --- a/arch/hexagon/kernel/smp.c > +++ b/arch/hexagon/kernel/smp.c > @@ -217,7 +217,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus) > } > } > > -void smp_send_reschedule(int
Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code
On Thu, Jan 19, 2023 at 11:34:46AM +0100, Michal Suchánek wrote: > Hello, > > On Thu, Jan 19, 2023 at 10:24:07AM +, Christophe Leroy wrote: > > > > > > Le 19/01/2023 à 10:53, Michal Suchanek a écrit : > > > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique") > > > breaks build because of wrong argument to snprintf. That certainly > > > avoids the runtime error but is not the intended outcome. > > > > > > Also use standard device name format of-display.N for all created > > > devices. > > > > > > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique") > > > Signed-off-by: Michal Suchanek > > > --- > > > v2: Update the device name format > > > --- > > > drivers/of/platform.c | 12 > > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > > > index f2a5d679a324..8c1b1de22036 100644 > > > --- a/drivers/of/platform.c > > > +++ b/drivers/of/platform.c > > > @@ -525,7 +525,9 @@ static int __init > > > of_platform_default_populate_init(void) > > > if (IS_ENABLED(CONFIG_PPC)) { > > > struct device_node *boot_display = NULL; > > > struct platform_device *dev; > > > - int display_number = 1; > > > + int display_number = 0; > > > + char buf[14]; > > > > Can you declare that in the for block where it is used instead ? > > No, there are two for blocks. > > > > > > + char *of_display_format = "of-display.%d"; > > > > Should be const ? > > Yes, could be. > > > > > > int ret; > > > > > > /* Check if we have a MacOS display without a node spec > > > */ > > > @@ -556,7 +558,10 @@ static int __init > > > of_platform_default_populate_init(void) > > > if (!of_get_property(node, "linux,opened", > > > NULL) || > > > !of_get_property(node, > > > "linux,boot-display", NULL)) > > > continue; > > > - dev = of_platform_device_create(node, "of-display", > > > NULL); > > > + ret = snprintf(buf, sizeof(buf), of_display_format, > > > display_number++); > > > + if (ret >= sizeof(buf)) > > > + continue; > > > > > > Can you make buf big enough to avoid that ? > > It would be a bit fragile that way. > > The buffer would have to theoretically accomodate > "of-display.-9223372036854775808", and any change to the format requires > recalculating the length, by hand. > > Of course, the memory would run out way before allocating that many > devices so it's kind of pointless to try and accomodate all possible > device numbers. > > > > > And by the way could it be called something else than 'buf' ? > > > > See exemple here : > > https://elixir.bootlin.com/linux/v6.1/source/drivers/fsi/fsi-occ.c#L690 > > Yes, that is quite possible. Nonetheless, just like 'ret' generic > variable names also work. And in fact judicious use of short generic variable names is more readeable than naming all variables foobar_* as far as I am concerned. Of course, YMMV. Thanks Michal
[PATCH] tools/perf: Disable perf probe when libtraceevent is missing
While parsing the tracepoint events in parse_events_add_tracepoint() function, code checks for HAVE_LIBTRACEEVENT support. This is needed since libtraceevent is necessary for tracepoint. But while adding probe points, check for LIBTRACEEVENT is not done in case of perf probe. Hence, in environment with missing libtraceevent-devel, it is observed that adding a probe point works even though its not supported. Example: Adding probe point: ./perf probe 'vfs_getname=getname_flags:72 pathname=result->name:string' Added new event: probe:vfs_getname(on getname_flags:72 with pathname=result->name:string) You can now use it in all perf tools, such as: perf record -e probe:vfs_getname -aR sleep 1 But trying perf record: ./perf record -e probe:vfs_getname -aR sleep 1 event syntax error: 'probe:vfs_getname' \___ unsupported tracepoint libtraceevent is necessary for tracepoint support Run 'perf list' for a list of valid events Fix this by wrapping "builtin-probe" compilation and "perf probe" usage under "CONFIG_LIBTRACEEVENT" check. Signed-off-by: Athira Rajeev --- tools/perf/Build | 4 +++- tools/perf/perf.c | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/tools/perf/Build b/tools/perf/Build index 6dd67e502295..a138a2304929 100644 --- a/tools/perf/Build +++ b/tools/perf/Build @@ -33,7 +33,9 @@ ifeq ($(CONFIG_LIBTRACEEVENT),y) perf-$(CONFIG_TRACE) += trace/beauty/ endif -perf-$(CONFIG_LIBELF) += builtin-probe.o +ifeq ($(CONFIG_LIBTRACEEVENT),y) + perf-$(CONFIG_LIBELF) += builtin-probe.o +endif perf-y += bench/ perf-y += tests/ diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 82bbe0ca858b..7b0d79284d5a 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -80,9 +80,11 @@ static struct cmd_struct commands[] = { #ifdef HAVE_LIBTRACEEVENT { "sched", cmd_sched, 0 }, #endif +#ifdef HAVE_LIBTRACEEVENT #ifdef HAVE_LIBELF_SUPPORT { "probe", cmd_probe, 0 }, #endif +#endif #ifdef HAVE_LIBTRACEEVENT { "kmem", cmd_kmem, 0 }, { "lock", cmd_lock, 0 }, -- 2.39.0
Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code
On Fri, Jan 20, 2023 at 12:39:23PM +0100, Thomas Zimmermann wrote: > Hi > > Am 20.01.23 um 12:27 schrieb Michal Suchánek: > > Hello, > > > > On Thu, Jan 19, 2023 at 04:20:57PM +0100, Thomas Zimmermann wrote: > > > Hi > > > > > > Am 19.01.23 um 14:23 schrieb Michal Suchánek: > > > > On Thu, Jan 19, 2023 at 02:11:13PM +0100, Thomas Zimmermann wrote: > > > > > Hi > > > > > > > > > > Am 19.01.23 um 11:24 schrieb Christophe Leroy: > > > > > > > > > > > > > > > > > > Le 19/01/2023 à 10:53, Michal Suchanek a écrit : > > > > > > > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique") > > > > > > > breaks build because of wrong argument to snprintf. That certainly > > > > > > > avoids the runtime error but is not the intended outcome. > > > > > > > > > > > > > > Also use standard device name format of-display.N for all created > > > > > > > devices. > > > > > > > > > > > > > > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique") > > > > > > > Signed-off-by: Michal Suchanek > > > > > > > --- > > > > > > > v2: Update the device name format > > > > > > > --- > > > > > > > drivers/of/platform.c | 12 > > > > > > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > > > > > > > index f2a5d679a324..8c1b1de22036 100644 > > > > > > > --- a/drivers/of/platform.c > > > > > > > +++ b/drivers/of/platform.c > > > > > > > @@ -525,7 +525,9 @@ static int __init > > > > > > > of_platform_default_populate_init(void) > > > > > > > if (IS_ENABLED(CONFIG_PPC)) { > > > > > > > struct device_node *boot_display = NULL; > > > > > > > struct platform_device *dev; > > > > > > > - int display_number = 1; > > > > > > > + int display_number = 0; > > > > > > > + char buf[14]; > > > > > > > > > > > > Can you declare that in the for block where it is used instead ? > > > > > > > > > > > > > + char *of_display_format = "of-display.%d"; > > > > > > > > > > > > Should be const ? > > > > > > > > > > That should be static const of_display_format[] = then > > > > > > > > Why? It sounds completely fine to have a const pointer to a string > > > > constatnt. > > > > > > Generally speaking: > > > > > > 'static' because your const pointer is then not a local variable, so it > > > takes pressure off the stack. For global variables, you don't want them to > > > show up in any linker symbol tables. > > > > This sounds a lot like an exemplar case of premature optimization. > > A simplistic compiler might do exactly what you say, and allocate a slot > > for the variable on the stack the moment the function is entered. > > > > However, in real compilers there is no stack pressure from having a > > local variable: > > - the compiler can put the variable into a register > > - it can completely omit the variable before and after it's actually > > used which is that specific function call > > > > > The string "of-display.%d" is stored as an array in the ELF data section. > > > And your char pointer is a reference to that array. For static pointers, > > > these indirections take CPU cycles to update when the loader has to > > > relocate > > > > Provided that the char pointer ever exists in the compiled code. Its > > address is not taken so it does not need to. > > > > > sections. If you declare of_display_format[] directly as array, you avoid > > > the reference and work directly with the array. > > > > > > Of course, this is a kernel module and the string is self-contained within > > > the function. So the compiler can probably detect that and optimize the > > > code > > > to be like the 'static const []' version. It's still good to follow best > > > practices, as someone might copy from this function. > > > > If it could not detect it there would be a lot of trouble all around. > > The issues definitely exist in userspace code. Kernel modules are simpler, > so compiler optimization is easier. > > But I'm not really trying to make a technical argument. My point here is > that someone might read your code and duplicate the pattern. That's not > unreasonable: it's core Linux code, so it can be assumed to be good (or at > least not bad). But your current code teaches the reader a bad practices, > which should be avoided. It is better to do the correct thing, even if it > makes no difference to the compiled code. The point I am trying to get across is that besides the original objection about missing 'const' this code is not bad. Loading a string constant address into a local variable and passing it as function call argument is perfectly fine. If you get any advantage by the alternate convoluted construct it's more likely than anything else a bug in the compiler you are using. It may be necessary to work around such bugs in performance-critical code but not in driver probing code that runs exactly once during boot. Thanks
RE: Calculating array sizes in C - was: Re: Build regressions/improvements in v6.2-rc1
From: Segher Boessenkool > Sent: 20 January 2023 10:54 ... > > > I suggest to file a bug against gcc complaining about a "spurious > > > warning", and using "-Werror -Wno-error-sizeof-pointer-div" until gcc is > > > adapted to not emit the warning about the pointer division if the result > > > is not used. > > Yeah. If the first operand of a conditional operator is non-zero, the > second operand is not evaluated, and if the first is zero, the third > operand is not evaluated. It is better if we do not warn about > something we do not evaluate. In cases like here where it is clear at > compile time which branch is taken, that shouldn't be too hard. > > Can someone please file a GCC PR? With reduced testcase preferably. It's not a bug. All the operands of the conditional operator have to be valid. It might be that the optimiser can discard one, but that happens much later on. Even the operands of choose_expr() have to be valid - but can have different types. I'm not sure what the code is trying to do or why it is failing. Was it a fail in userspace - where the option to allow sizeof (void) isn't allowed. FWIW you can check for a compile-time NULL (or 0) with: #define is_null(x) (sizeof *(0 : (void *)(x) ? (int *)0) != 1) Although that is a compile-time error for non-NULL unless 'void *' arithmetic is allowed. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH v7 4/8] crash: add phdr for possible CPUs in elfcorehdr
On 19/01/2023 19:29:52, Laurent Dufour wrote: > On 15/01/2023 16:02:02, Sourabh Jain wrote: >> On architectures like PowerPC the crash notes are available for all >> possible CPUs. So let's populate the elfcorehdr for all possible >> CPUs having crash notes to avoid updating elfcorehdr during in-kernel >> crash update on CPU hotplug events. >> >> The similar technique is used in kexec-tool for kexec_load case. >> >> Signed-off-by: Sourabh Jain >> --- >> kernel/crash_core.c | 9 ++--- >> 1 file changed, 6 insertions(+), 3 deletions(-) > > This patch is not applying on ppc/next (53ab112a9508). > > As far as I could see, crash_prepare_elf64_headers() is defined in the file > kernel/kexec_file.c and that's not recent, see babac4a84a88 (kexec_file, > x86: move re-factored code to generic side, 2018-04-13) > > Am I missing something? My mistake, sounds that your series is based on top of the Eric's one (not yet upstream): https://lore.kernel.org/lkml/20230118213544.2128-1-eric.devol...@oracle.com/ > >> >> diff --git a/kernel/crash_core.c b/kernel/crash_core.c >> index 910d377ea317e..19f987b3851e8 100644 >> --- a/kernel/crash_core.c >> +++ b/kernel/crash_core.c >> @@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct kimage *image, >> struct crash_mem *mem, >> ehdr->e_ehsize = sizeof(Elf64_Ehdr); >> ehdr->e_phentsize = sizeof(Elf64_Phdr); >> >> -/* Prepare one phdr of type PT_NOTE for each present CPU */ >> -for_each_present_cpu(cpu) { >> +/* Prepare one phdr of type PT_NOTE for possible CPU with crash note. */ >> +for_each_possible_cpu(cpu) { >> #ifdef CONFIG_CRASH_HOTPLUG >> if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) { >> /* Skip the soon-to-be offlined cpu */ >> @@ -373,8 +373,11 @@ int crash_prepare_elf64_headers(struct kimage *image, >> struct crash_mem *mem, >> continue; >> } >> #endif >> -phdr->p_type = PT_NOTE; >> notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu)); >> +if (!notes_addr) >> +continue; >> + >> +phdr->p_type = PT_NOTE; >> phdr->p_offset = phdr->p_paddr = notes_addr; >> phdr->p_filesz = phdr->p_memsz = sizeof(note_buf_t); >> (ehdr->e_phnum)++; >
Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code
Hi Am 20.01.23 um 12:27 schrieb Michal Suchánek: Hello, On Thu, Jan 19, 2023 at 04:20:57PM +0100, Thomas Zimmermann wrote: Hi Am 19.01.23 um 14:23 schrieb Michal Suchánek: On Thu, Jan 19, 2023 at 02:11:13PM +0100, Thomas Zimmermann wrote: Hi Am 19.01.23 um 11:24 schrieb Christophe Leroy: Le 19/01/2023 à 10:53, Michal Suchanek a écrit : The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique") breaks build because of wrong argument to snprintf. That certainly avoids the runtime error but is not the intended outcome. Also use standard device name format of-display.N for all created devices. Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique") Signed-off-by: Michal Suchanek --- v2: Update the device name format --- drivers/of/platform.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/of/platform.c b/drivers/of/platform.c index f2a5d679a324..8c1b1de22036 100644 --- a/drivers/of/platform.c +++ b/drivers/of/platform.c @@ -525,7 +525,9 @@ static int __init of_platform_default_populate_init(void) if (IS_ENABLED(CONFIG_PPC)) { struct device_node *boot_display = NULL; struct platform_device *dev; - int display_number = 1; + int display_number = 0; + char buf[14]; Can you declare that in the for block where it is used instead ? + char *of_display_format = "of-display.%d"; Should be const ? That should be static const of_display_format[] = then Why? It sounds completely fine to have a const pointer to a string constatnt. Generally speaking: 'static' because your const pointer is then not a local variable, so it takes pressure off the stack. For global variables, you don't want them to show up in any linker symbol tables. This sounds a lot like an exemplar case of premature optimization. A simplistic compiler might do exactly what you say, and allocate a slot for the variable on the stack the moment the function is entered. However, in real compilers there is no stack pressure from having a local variable: - the compiler can put the variable into a register - it can completely omit the variable before and after it's actually used which is that specific function call The string "of-display.%d" is stored as an array in the ELF data section. And your char pointer is a reference to that array. For static pointers, these indirections take CPU cycles to update when the loader has to relocate Provided that the char pointer ever exists in the compiled code. Its address is not taken so it does not need to. sections. If you declare of_display_format[] directly as array, you avoid the reference and work directly with the array. Of course, this is a kernel module and the string is self-contained within the function. So the compiler can probably detect that and optimize the code to be like the 'static const []' version. It's still good to follow best practices, as someone might copy from this function. If it could not detect it there would be a lot of trouble all around. The issues definitely exist in userspace code. Kernel modules are simpler, so compiler optimization is easier. But I'm not really trying to make a technical argument. My point here is that someone might read your code and duplicate the pattern. That's not unreasonable: it's core Linux code, so it can be assumed to be good (or at least not bad). But your current code teaches the reader a bad practices, which should be avoided. It is better to do the correct thing, even if it makes no difference to the compiled code. Best regards Thomas Thanks Michal -- Thomas Zimmermann Graphics Driver Developer SUSE Software Solutions Germany GmbH Maxfeldstr. 5, 90409 Nürnberg, Germany (HRB 36809, AG Nürnberg) Geschäftsführer: Ivo Totev OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH v2] of: Fix of platform build on powerpc due to bad of disaply code
Hello, On Thu, Jan 19, 2023 at 04:20:57PM +0100, Thomas Zimmermann wrote: > Hi > > Am 19.01.23 um 14:23 schrieb Michal Suchánek: > > On Thu, Jan 19, 2023 at 02:11:13PM +0100, Thomas Zimmermann wrote: > > > Hi > > > > > > Am 19.01.23 um 11:24 schrieb Christophe Leroy: > > > > > > > > > > > > Le 19/01/2023 à 10:53, Michal Suchanek a écrit : > > > > > The commit 2d681d6a23a1 ("of: Make of framebuffer devices unique") > > > > > breaks build because of wrong argument to snprintf. That certainly > > > > > avoids the runtime error but is not the intended outcome. > > > > > > > > > > Also use standard device name format of-display.N for all created > > > > > devices. > > > > > > > > > > Fixes: 2d681d6a23a1 ("of: Make of framebuffer devices unique") > > > > > Signed-off-by: Michal Suchanek > > > > > --- > > > > > v2: Update the device name format > > > > > --- > > > > > drivers/of/platform.c | 12 > > > > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/drivers/of/platform.c b/drivers/of/platform.c > > > > > index f2a5d679a324..8c1b1de22036 100644 > > > > > --- a/drivers/of/platform.c > > > > > +++ b/drivers/of/platform.c > > > > > @@ -525,7 +525,9 @@ static int __init > > > > > of_platform_default_populate_init(void) > > > > > if (IS_ENABLED(CONFIG_PPC)) { > > > > > struct device_node *boot_display = NULL; > > > > > struct platform_device *dev; > > > > > - int display_number = 1; > > > > > + int display_number = 0; > > > > > + char buf[14]; > > > > > > > > Can you declare that in the for block where it is used instead ? > > > > > > > > > + char *of_display_format = "of-display.%d"; > > > > > > > > Should be const ? > > > > > > That should be static const of_display_format[] = then > > > > Why? It sounds completely fine to have a const pointer to a string > > constatnt. > > Generally speaking: > > 'static' because your const pointer is then not a local variable, so it > takes pressure off the stack. For global variables, you don't want them to > show up in any linker symbol tables. This sounds a lot like an exemplar case of premature optimization. A simplistic compiler might do exactly what you say, and allocate a slot for the variable on the stack the moment the function is entered. However, in real compilers there is no stack pressure from having a local variable: - the compiler can put the variable into a register - it can completely omit the variable before and after it's actually used which is that specific function call > The string "of-display.%d" is stored as an array in the ELF data section. > And your char pointer is a reference to that array. For static pointers, > these indirections take CPU cycles to update when the loader has to relocate Provided that the char pointer ever exists in the compiled code. Its address is not taken so it does not need to. > sections. If you declare of_display_format[] directly as array, you avoid > the reference and work directly with the array. > > Of course, this is a kernel module and the string is self-contained within > the function. So the compiler can probably detect that and optimize the code > to be like the 'static const []' version. It's still good to follow best > practices, as someone might copy from this function. If it could not detect it there would be a lot of trouble all around. Thanks Michal
Re: Calculating array sizes in C - was: Re: Build regressions/improvements in v6.2-rc1
On Thu, Jan 19, 2023 at 09:31:21PM -0600, Rob Landley wrote: > On 1/19/23 16:11, Michael.Karcher wrote: > > I don't see a clear bug at this point. We are talking about the C expression > > > > __same_type((void*)0, (void*)0)? 0 : sizeof((void*)0)/sizeof(*((void*0)) (__same_type is a kernel macro, it expands to something with __builtin_compatible_type()). > *(void*) is type "void" which does not have a size. It has size 1, in GCC, so that you can do arithmetic on pointers to void. This is a long-standing and very widely used GCC extension. """ 6.24 Arithmetic on 'void'- and Function-Pointers In GNU C, addition and subtraction operations are supported on pointers to 'void' and on pointers to functions. This is done by treating the size of a 'void' or of a function as 1. A consequence of this is that 'sizeof' is also allowed on 'void' and on function types, and returns 1. The option '-Wpointer-arith' requests a warning if these extensions are used. """ > The problem is gcc "optimizing out" an earlier type check, the same way it > "optimizes out" checks for signed integer math overflowing, or "optimizes > out" a > comparison to pointers from two different local variables from different > function calls trying to calculate the amount of stack used, or "optimizes > out" Are you saying something in the kernel code here is invalid code? Because your other examples are. > using char *x = (char *)1; as a flag value and then doing "if (!(x-1)) because > it can "never happen"... Like here. And no, this is not allowed by -fno-strict-aliasing. > > I suggest to file a bug against gcc complaining about a "spurious > > warning", and using "-Werror -Wno-error-sizeof-pointer-div" until gcc is > > adapted to not emit the warning about the pointer division if the result > > is not used. Yeah. If the first operand of a conditional operator is non-zero, the second operand is not evaluated, and if the first is zero, the third operand is not evaluated. It is better if we do not warn about something we do not evaluate. In cases like here where it is clear at compile time which branch is taken, that shouldn't be too hard. Can someone please file a GCC PR? With reduced testcase preferably. Segher
[PATCH] powerpc/powernv/ioda: Skip unallocated resources when mapping to PE
pnv_ioda_setup_pe_res() calls opal to map a resource with a PE. However, the code assumes the resource is allocated and it uses the resource address to find out the segment(s) which need to be mapped to the PE. In the unlikely case where the resource hasn't been allocated, the computation for the segment number is garbage, which can lead to invalid memory access and potentially a kernel crash, such as: [ ] pci_bus 0002:02: Configuring PE for bus [ ] pci 0002:02 : [PE# fc] Secondary bus 0x0002..0x0002 associated with PE#fc [ ] BUG: Kernel NULL pointer dereference on write at 0x [ ] Faulting instruction address: 0xc005eac4 [ ] Oops: Kernel access of bad area, sig: 7 [#1] [ ] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV [ ] Modules linked in: [ ] CPU: 12 PID: 1 Comm: swapper/20 Not tainted 5.10.50-openpower1 #2 [ ] NIP: c005eac4 LR: c005ea44 CTR: 30061b9c [ ] REGS: c00027383650 TRAP: 0300 Not tainted (5.10.50-openpower1) [ ] MSR: 90009033 CR: 44000224 XER: 2004 [ ] CFAR: c005eaa0 DAR: DSISR: 0208 IRQMASK: 0 [ ] GPR00: c005dd98 c000273838e0 c185de00 c000200fff018960 [ ] GPR04: 00fc 0003 [ ] GPR08: 90001033 [ ] GPR12: 31cb c00e6a80 c0010a58 [ ] GPR16: [ ] GPR20: c711e200 [ ] GPR24: 0100 c00029501120 c0002cee2800 03ff [ ] GPR28: c000200fff018960 c000200ffcb7fd00 [ ] NIP [c005eac4] pnv_ioda_setup_pe_res+0x94/0x1a0 [ ] LR [c005ea44] pnv_ioda_setup_pe_res+0x14/0x1a0 [ ] Call Trace: [ ] [c000273838e0] [c005eb98] pnv_ioda_setup_pe_res+0x168/0x1a0 (unreliable) [ ] [c00027383970] [c005dd98] pnv_pci_ioda_dma_dev_setup+0x43c/0x970 [ ] [c00027383a60] [c0032cdc] pcibios_bus_add_device+0x78/0x18c [ ] [c00027383aa0] [c028f2bc] pci_bus_add_device+0x28/0xbc [ ] [c00027383b10] [c028f3a0] pci_bus_add_devices+0x50/0x7c [ ] [c00027383b50] [c028f3c4] pci_bus_add_devices+0x74/0x7c [ ] [c00027383b90] [c028f3c4] pci_bus_add_devices+0x74/0x7c [ ] [c00027383bd0] [c069ad0c] pcibios_init+0xf0/0x104 [ ] [c00027383c50] [c00106d8] do_one_initcall+0x84/0x1c4 [ ] [c00027383d20] [c06910b8] kernel_init_freeable+0x264/0x268 [ ] [c00027383dc0] [c0010a68] kernel_init+0x18/0x138 [ ] [c00027383e20] [c000cbfc] ret_from_kernel_thread+0x5c/0x80 [ ] Instruction dump: [ ] 7f89e840 409d000c 7fbbf840 409c000c 38210090 4848f448 809c002c e95e0120 [ ] 7ba91764 38a3 57a7043e 38c0 <7c8a492e> 5484043e e87e0018 4bff23bd Hitting the problem is not that easy. It was seen with a (semi-bogus) PCI device with a class code of 0. The generic PCI framework doesn't allocate resources in such a case. The patch is simply skipping resources which are still flagged with IORESOURCE_UNSET. We don't have the problem with 64-bit mem resources, as the address of the resource is checked to be within the range of the 64-bit mmio window. See pnv_ioda_reserve_dev_m64_pe() and pnv_pci_is_m64(). Reported-by: Andrew Jeffery Fixes: 23e79425fe7c ("powerpc/powernv: Simplify pnv_ioda_setup_pe_seg()") Signed-off-by: Frederic Barrat --- arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 5c144c05cbfd..4f6e20a35aa1 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2325,7 +2325,8 @@ static void pnv_ioda_setup_pe_res(struct pnv_ioda_pe *pe, int index; int64_t rc; - if (!res || !res->flags || res->start > res->end) + if (!res || !res->flags || res->start > res->end || + res->flags & IORESOURCE_UNSET) return; if (res->flags & IORESOURCE_IO) { -- 2.39.0
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Thu 19-01-23 11:17:07, Paul E. McKenney wrote: > On Thu, Jan 19, 2023 at 01:52:14PM +0100, Michal Hocko wrote: > > On Wed 18-01-23 11:01:08, Suren Baghdasaryan wrote: > > > On Wed, Jan 18, 2023 at 10:34 AM Paul E. McKenney > > > wrote: > > [...] > > > > There are a couple of possibilities here. > > > > > > > > First, if I am remembering correctly, the time between the call_rcu() > > > > and invocation of the corresponding callback was taking multiple > > > > seconds, > > > > but that was because the kernel was built with CONFIG_LAZY_RCU=y in > > > > order to save power by batching RCU work over multiple call_rcu() > > > > invocations. If this is causing a problem for a given call site, the > > > > shiny new call_rcu_hurry() can be used instead. Doing this gets back > > > > to the old-school non-laziness, but can of course consume more power. > > > > > > That would not be the case because CONFIG_LAZY_RCU was not an option > > > at the time I was profiling this issue. > > > Laxy RCU would be a great option to replace this patch but > > > unfortunately it's not the default behavior, so I would still have to > > > implement this batching in case lazy RCU is not enabled. > > > > > > > > > > > Second, there is a much shorter one-jiffy delay between the call_rcu() > > > > and the invocation of the corresponding callback in kernels built with > > > > either CONFIG_NO_HZ_FULL=y (but only on CPUs mentioned in the nohz_full > > > > or rcu_nocbs kernel boot parameters) or CONFIG_RCU_NOCB_CPU=y (but only > > > > on CPUs mentioned in the rcu_nocbs kernel boot parameters). The purpose > > > > of this delay is to avoid lock contention, and so this delay is incurred > > > > only on CPUs that are queuing callbacks at a rate exceeding 16K/second. > > > > This is reduced to a per-jiffy limit, so on a HZ=1000 system, a CPU > > > > invoking call_rcu() at least 16 times within a given jiffy will incur > > > > the added delay. The reason for this delay is the use of a separate > > > > ->nocb_bypass list. As Suren says, this bypass list is used to reduce > > > > lock contention on the main ->cblist. This is not needed in old-school > > > > kernels built without either CONFIG_NO_HZ_FULL=y or > > > > CONFIG_RCU_NOCB_CPU=y > > > > (including most datacenter kernels) because in that case the callbacks > > > > enqueued by call_rcu() are touched only by the corresponding CPU, so > > > > that there is no need for locks. > > > > > > I believe this is the reason in my profiled case. > > > > > > > > > > > Third, if you are instead seeing multiple milliseconds of CPU consumed > > > > by > > > > call_rcu() in the common case (for example, without the aid of > > > > interrupts, > > > > NMIs, or SMIs), please do let me know. That sounds to me like a bug. > > > > > > I don't think I've seen such a case. > > > Thanks for clarifications, Paul! > > > > Thanks for the explanation Paul. I have to say this has caught me as a > > surprise. There are just not enough details about the benchmark to > > understand what is going on but I find it rather surprising that > > call_rcu can induce a higher overhead than the actual kmem_cache_free > > which is the callback. My naive understanding has been that call_rcu is > > really fast way to defer the execution to the RCU safe context to do the > > final cleanup. > > If I am following along correctly (ha!), then your "induce a higher > overhead" should be something like "induce a higher to-kfree() latency". Yes, this is expected. > Of course, there already is a higher latency-to-kfree via call_rcu() > than via a direct call to kfree(), and callback-offload CPUs that are > being flooded with callbacks raise that latency a jiffy or so more in > order to avoid lock contention. > > If this becomes a problem, the callback-offloading code can be a bit > smarter about avoiding lock contention, but need to see a real problem > before I make that change. But if there is a real problem I will of > course fix it. I believe that Suren claims that the call_rcu is really visible in the exit_mmap case. Time-to-free actual vmas shouldn't really be material for that path. If that happens much more later on there could be some side effects by an increased memory consumption but that should be marginal. How fast exit_mmap really is should only depend on direct calls from that path. But I guess we need some specific numbers from Suren to be sure what is going on here. Thanks! -- Michal Hocko SUSE Labs
Re: [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free
On Thu 19-01-23 10:52:03, Suren Baghdasaryan wrote: > On Thu, Jan 19, 2023 at 4:59 AM Michal Hocko wrote: > > > > On Mon 09-01-23 12:53:34, Suren Baghdasaryan wrote: > > > call_rcu() can take a long time when callback offloading is enabled. > > > Its use in the vm_area_free can cause regressions in the exit path when > > > multiple VMAs are being freed. To minimize that impact, place VMAs into > > > a list and free them in groups using one call_rcu() call per group. > > > > After some more clarification I can understand how call_rcu might not be > > super happy about thousands of callbacks to be invoked and I do agree > > that this is not really optimal. > > > > On the other hand I do not like this solution much either. > > VM_AREA_FREE_LIST_MAX is arbitrary and it won't really help all that > > much with processes with a huge number of vmas either. It would still be > > in housands of callbacks to be scheduled without a good reason. > > > > Instead, are there any other cases than remove_vma that need this > > batching? We could easily just link all the vmas into linked list and > > use a single call_rcu instead, no? This would both simplify the > > implementation, remove the scaling issue as well and we do not have to > > argue whether VM_AREA_FREE_LIST_MAX should be epsilon or epsilon + 1. > > Yes, I agree the solution is not stellar. I wanted something simple > but this is probably too simple. OTOH keeping all dead vm_area_structs > on the list without hooking up a shrinker (additional complexity) does > not sound too appealing either. I suspect you have missed my idea. I do not really want to keep the list around or any shrinker. It is dead simple. Collect all vmas in remove_vma and then call_rcu the whole list at once after the whole list (be it from exit_mmap or remove_mt). See? -- Michal Hocko SUSE Labs
RE: [PATCH V3] tools/perf/tests: Fix string substitutions in build id test
From: Arnaldo Carvalho de Melo > Sent: 19 January 2023 17:00 > > Em Thu, Jan 19, 2023 at 03:49:15PM +, David Laight escreveu: > > From: Athira Rajeev > > > Sent: 19 January 2023 14:27 > > ... > > > The test script "tests/shell/buildid.sh" uses some of the > > > string substitution ways which are supported in bash, but not in > > > "sh" or other shells. Above error on line number 69 that reports > > > "Bad substitution" is: > > > > Looks better - assuming it works :-) > > :-) > > Can I take this as an: > > Acked-by: David Laight I'm not sure that is worth anything. You can add a Reviewed-by David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: Calculating array sizes in C - was: Re: Build regressions/improvements in v6.2-rc1
Hi Michael! On 1/19/23 23:11, Michael.Karcher wrote: I suggest to file a bug against gcc complaining about a "spurious warning", and using "-Werror -Wno-error-sizeof-pointer-div" until gcc is adapted to not emit the warning about the pointer division if the result is not used. Could you post a kernel patch for that? I would be happy to test it on my SH-7785CLR board. Also, I'm going to file a bug report against GCC. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. `' Physicist `-GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
Re: [PATCH v9 00/10] phy: Add support for Lynx 10G SerDes
On 19-01-23, 11:22, Sean Anderson wrote: > On 1/18/23 11:54, Vinod Koul wrote: > > On 17-01-23, 11:46, Sean Anderson wrote: > >> > >> I noticed that this series is marked "changes requested" on patchwork. > >> However, I have received only automated feedback. I have done my best > >> effort to address feedback I have received on prior revisions. I would > >> appreciate getting another round of review before resending this series. > > > > Looking at the series, looks like kernel-bot sent some warnings on the > > series so I was expecting an updated series for review > > > > Generally, multiple reviewers will comment on a patch, even if another > reviewer finds something which needs to be changed. This is a one-line > fix, so I would appreciate getting more substantial feedback before > respinning. Every time I send a new series I have to rebase and test on > hardware. It's work that I would rather do when there is something to be > gained. I review to apply, if I can apply, I would typically skip this -- ~Vinod
Re: [PATCH V3] tools/perf/tests: Fix string substitutions in build id test
Environment with /bin/sh # readlink -f $(which sh) /bin/dash Running perf test from tmp.perf/urgent # ./perf test -v "build id cache operations" 73: build id cache operations : --- start --- test child forked, pid 71063 WARNING: wine not found. PE binaries will not be run. test binaries: /tmp/perf.ex.SHA1.RNm /tmp/perf.ex.MD5.Flx ./tests/shell/../pe-file.exe DEBUGINFOD_URLS= Adding 4abd406f041feb4f10ecde3fc30fd0639e1a91cb /tmp/perf.ex.SHA1.RNm: Ok build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb ./tests/shell/buildid.sh: 69: ./tests/shell/buildid.sh: Bad substitution test child finished with -2 end After applying the patch in same environment, error is not seen and test runs fine. Tested the same patch in bash as well. # readlink -f $(which sh) /usr/bin/bash The test works fine with the changes. Tested-by: Disha Goel On 1/19/23 7:57 PM, Athira Rajeev wrote: The perf test named “build id cache operations” skips with below error on some distros: <<>> 78: build id cache operations : test child forked, pid 01 WARNING: wine not found. PE binaries will not be run. test binaries: /tmp/perf.ex.SHA1.PKz /tmp/perf.ex.MD5.Gt3 ./tests/shell/../pe-file.exe DEBUGINFOD_URLS= Adding 4abd406f041feb4f10ecde3fc30fd0639e1a91cb /tmp/perf.ex.SHA1.PKz: Ok build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb ./tests/shell/buildid.sh: 69: ./tests/shell/buildid.sh: Bad substitution test child finished with -2 build id cache operations: Skip <<>> The test script "tests/shell/buildid.sh" uses some of the string substitution ways which are supported in bash, but not in "sh" or other shells. Above error on line number 69 that reports "Bad substitution" is: <<>> link=${build_id_dir}/.build-id/${id:0:2}/${id:2} <<>> Here the way of getting first two characters from id ie, ${id:0:2} and similarly expressions like ${id:2} is not recognised in "sh". So the line errors and instead of hitting failure, the test gets skipped as shown in logs. So the syntax issue causes test not to be executed in such cases. Similarly usage : "${@: -1}" [ to pick last argument passed to a function] in “test_record” doesn’t work in all distros. Fix this by using alternative way with shell substitution to pick required characters from the string. Also fix the usage of “${@: -1}” to work in all cases. Another usage in “test_record” is: <<>> ${perf} record --buildid-all -o ${data} $@ &> ${log} <<>> This causes the perf record to start in background and Results in the data file not being created by the time "check" function is invoked. Below log shows perf record result getting displayed after the call to "check" function. <<>> running: perf record /tmp/perf.ex.SHA1.EAU build id: 4abd406f041feb4f10ecde3fc30fd0639e1a91cb link: /tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb failed: link /tmp/perf.debug.mLT/.build-id/4a/bd406f041feb4f10ecde3fc30fd0639e1a91cb does not exist test child finished with -1 build id cache operations: FAILED! root@machine:~/athira/linux/tools/perf# Couldn't synthesize bpf events. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.010 MB /tmp/perf.data.bFF ] <<>> Fix this by redirecting output instead of using “&” which starts the command in background. Signed-off-by: Athira Rajeev Acked-by: Ian Rogers --- Changelog: From v2 -> v3 - Addressed review comments from David Laight for using shell substitutions. From v1 -> v2 - Added Acked-by from Ian. - Rebased to tmp.perf/urgent of: git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tools/perf/tests/shell/buildid.sh | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/tools/perf/tests/shell/buildid.sh b/tools/perf/tests/shell/buildid.sh index aaf851108ca3..0ce22ea0a7f1 100755 --- a/tools/perf/tests/shell/buildid.sh +++ b/tools/perf/tests/shell/buildid.sh @@ -66,7 +66,9 @@ check() esac echo "build id: ${id}" - link=${build_id_dir}/.build-id/${id:0:2}/${id:2} + id_file=${id#??} + id_dir=${id%$id_file} + link=$build_id_dir/.build-id/$id_dir/$id_file echo "link: ${link}" if [ ! -h $link ]; then @@ -74,7 +76,7 @@ check() exit 1 fi - file=${build_id_dir}/.build-id/${id:0:2}/`readlink ${link}`/elf + file=${build_id_dir}/.build-id/$id_dir/`readlink ${link}`/elf echo "file: ${file}" # Check for file permission of original file @@ -130,20 +132,22 @@ test_record() { data=$(mktemp /tmp/perf.data.XXX) build_id_dir=$(mktemp -d /tmp/perf.debug.XXX) - log=$(mktemp /tmp/perf.log.XXX) + log_out=$(mktemp /tmp/perf.log.out.XXX) + log_err=$(mktemp /tmp/perf.log.err.XXX) perf="perf --buildid-dir ${build_id_dir}" echo "running: perf record $@" - ${perf} record --buildid-all -o ${data} $@ &> ${log} + ${perf} record --buildid-all -o ${data} $@ 1>${log_out} 2>${log_err} if [
[PATCH v4 07/24] powerpc/secvar: Handle max object size in the consumer
From: Russell Currey Currently the max object size is handled in the core secvar code with an entirely OPAL-specific implementation, so create a new max_size() op and move the existing implementation into the powernv platform. Should be no functional change. Signed-off-by: Russell Currey Signed-off-by: Andrew Donnellan --- v3: Change uint64_t type to u64 (mpe) v4: Return immediately if node is NULL (gjoyce) --- arch/powerpc/include/asm/secvar.h| 1 + arch/powerpc/kernel/secvar-sysfs.c | 17 +++ arch/powerpc/platforms/powernv/opal-secvar.c | 22 3 files changed, 26 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/secvar.h b/arch/powerpc/include/asm/secvar.h index 2d9816dff128..b97ab793cc8a 100644 --- a/arch/powerpc/include/asm/secvar.h +++ b/arch/powerpc/include/asm/secvar.h @@ -18,6 +18,7 @@ struct secvar_operations { int (*get_next)(const char *key, u64 *key_len, u64 keybufsize); int (*set)(const char *key, u64 key_len, u8 *data, u64 data_size); ssize_t (*format)(char *buf, size_t bufsize); + int (*max_size)(u64 *max_size); }; #ifdef CONFIG_PPC_SECURE_BOOT diff --git a/arch/powerpc/kernel/secvar-sysfs.c b/arch/powerpc/kernel/secvar-sysfs.c index 4beec935f5e7..d2f65a67845c 100644 --- a/arch/powerpc/kernel/secvar-sysfs.c +++ b/arch/powerpc/kernel/secvar-sysfs.c @@ -132,27 +132,16 @@ static struct kobj_type secvar_ktype = { static int update_kobj_size(void) { - struct device_node *node; u64 varsize; - int rc = 0; + int rc = secvar_ops->max_size(); - node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend"); - if (!of_device_is_available(node)) { - rc = -ENODEV; - goto out; - } - - rc = of_property_read_u64(node, "max-var-size", ); if (rc) - goto out; + return rc; data_attr.size = varsize; update_attr.size = varsize; -out: - of_node_put(node); - - return rc; + return 0; } static int secvar_sysfs_load(void) diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c b/arch/powerpc/platforms/powernv/opal-secvar.c index e33bb703ecbc..a8436bf35e2f 100644 --- a/arch/powerpc/platforms/powernv/opal-secvar.c +++ b/arch/powerpc/platforms/powernv/opal-secvar.c @@ -122,11 +122,33 @@ static ssize_t opal_secvar_format(char *buf, size_t bufsize) return rc; } +static int opal_secvar_max_size(u64 *max_size) +{ + int rc; + struct device_node *node; + + node = of_find_compatible_node(NULL, NULL, "ibm,secvar-backend"); + if (!node) + return -ENODEV; + + if (!of_device_is_available(node)) { + rc = -ENODEV; + goto out; + } + + rc = of_property_read_u64(node, "max-var-size", max_size); + +out: + of_node_put(node); + return rc; +} + static const struct secvar_operations opal_secvar_ops = { .get = opal_get_variable, .get_next = opal_get_next_variable, .set = opal_set_variable, .format = opal_secvar_format, + .max_size = opal_secvar_max_size, }; static int opal_secvar_probe(struct platform_device *pdev) -- 2.39.0
[PATCH v4 12/24] powerpc/secvar: Don't print error on ENOENT when reading variables
If attempting to read the size or data attributes of a non-existent variable (which will be possible after a later patch to expose the PLPKS via the secvar interface), don't spam the kernel log with error messages. Only print errors for return codes that aren't ENOENT. Reported-by: Sudhakar Kuppusamy Signed-off-by: Andrew Donnellan --- v3: New patch --- arch/powerpc/kernel/secvar-sysfs.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/secvar-sysfs.c b/arch/powerpc/kernel/secvar-sysfs.c index 6dd9b4f6f87c..33d1797851ba 100644 --- a/arch/powerpc/kernel/secvar-sysfs.c +++ b/arch/powerpc/kernel/secvar-sysfs.c @@ -43,8 +43,8 @@ static ssize_t size_show(struct kobject *kobj, struct kobj_attribute *attr, rc = secvar_ops->get(kobj->name, strlen(kobj->name) + 1, NULL, ); if (rc) { - pr_err("Error retrieving %s variable size %d\n", kobj->name, - rc); + if (rc != -ENOENT) + pr_err("Error retrieving %s variable size %d\n", kobj->name, rc); return rc; } @@ -61,7 +61,8 @@ static ssize_t data_read(struct file *filep, struct kobject *kobj, rc = secvar_ops->get(kobj->name, strlen(kobj->name) + 1, NULL, ); if (rc) { - pr_err("Error getting %s variable size %d\n", kobj->name, rc); + if (rc != -ENOENT) + pr_err("Error getting %s variable size %d\n", kobj->name, rc); return rc; } pr_debug("dsize is %llu\n", dsize); -- 2.39.0
[PATCH v4 18/24] powerpc/pseries: Make caller pass buffer to plpks_read_var()
Currently, plpks_read_var() allocates a buffer to pass to the H_PKS_READ_OBJECT hcall, then allocates another buffer, of the caller's preferred size if necessary, into which the data is copied, and returns that buffer to the caller. This is a bit over the top - while we probably still want to allocate a separate buffer to pass to the hypervisor in the hcall, we can let the caller allocate the final buffer and specify the size. Don't allocate var->data in plpks_read_var(), instead expect the caller to allocate it. If the caller needs to discover the size, it can set var->data to NULL and var->datalen will be populated. Update header file to document this. Suggested-by: Michael Ellerman Signed-off-by: Andrew Donnellan Signed-off-by: Russell Currey --- v3: New patch (mpe) --- arch/powerpc/include/asm/plpks.h | 12 arch/powerpc/platforms/pseries/plpks.c | 11 --- 2 files changed, 16 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/plpks.h b/arch/powerpc/include/asm/plpks.h index e7204e6c0ca4..0c49969b0864 100644 --- a/arch/powerpc/include/asm/plpks.h +++ b/arch/powerpc/include/asm/plpks.h @@ -88,16 +88,28 @@ int plpks_remove_var(char *component, u8 varos, /** * Returns the data for the specified os variable. + * + * Caller must allocate a buffer in var->data with length in var->datalen. + * If no buffer is provided, var->datalen will be populated with the object's + * size. */ int plpks_read_os_var(struct plpks_var *var); /** * Returns the data for the specified firmware variable. + * + * Caller must allocate a buffer in var->data with length in var->datalen. + * If no buffer is provided, var->datalen will be populated with the object's + * size. */ int plpks_read_fw_var(struct plpks_var *var); /** * Returns the data for the specified bootloader variable. + * + * Caller must allocate a buffer in var->data with length in var->datalen. + * If no buffer is provided, var->datalen will be populated with the object's + * size. */ int plpks_read_bootloader_var(struct plpks_var *var); diff --git a/arch/powerpc/platforms/pseries/plpks.c b/arch/powerpc/platforms/pseries/plpks.c index 96a026a37285..5d9c6a3b2014 100644 --- a/arch/powerpc/platforms/pseries/plpks.c +++ b/arch/powerpc/platforms/pseries/plpks.c @@ -578,17 +578,14 @@ static int plpks_read_var(u8 consumer, struct plpks_var *var) goto out_free_output; } - if (var->datalen == 0 || var->datalen > retbuf[0]) + if (!var->data || var->datalen > retbuf[0]) var->datalen = retbuf[0]; - var->data = kzalloc(var->datalen, GFP_KERNEL); - if (!var->data) { - rc = -ENOMEM; - goto out_free_output; - } var->policy = retbuf[1]; - memcpy(var->data, output, var->datalen); + if (var->data) + memcpy(var->data, output, var->datalen); + rc = 0; out_free_output: -- 2.39.0
[PATCH v4 23/24] integrity/powerpc: Improve error handling & reporting when loading certs
From: Russell Currey A few improvements to load_powerpc.c: - include integrity.h for the pr_fmt() - move all error reporting out of get_cert_list() - use ERR_PTR() to better preserve error detail - don't use pr_err() for missing keys Signed-off-by: Russell Currey Signed-off-by: Andrew Donnellan --- v3: New patch --- .../integrity/platform_certs/load_powerpc.c | 26 ++- 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/security/integrity/platform_certs/load_powerpc.c b/security/integrity/platform_certs/load_powerpc.c index 1e4f80a4e71c..dee51606d5f4 100644 --- a/security/integrity/platform_certs/load_powerpc.c +++ b/security/integrity/platform_certs/load_powerpc.c @@ -14,9 +14,15 @@ #include #include #include "keyring_handler.h" +#include "../integrity.h" /* * Get a certificate list blob from the named secure variable. + * + * Returns: + * - a pointer to a kmalloc'd buffer containing the cert list on success + * - NULL if the key does not exist + * - an ERR_PTR on error */ static __init void *get_cert_list(u8 *key, unsigned long keylen, u64 *size) { @@ -25,19 +31,19 @@ static __init void *get_cert_list(u8 *key, unsigned long keylen, u64 *size) rc = secvar_ops->get(key, keylen, NULL, size); if (rc) { - pr_err("Couldn't get size: %d\n", rc); - return NULL; + if (rc == -ENOENT) + return NULL; + return ERR_PTR(rc); } db = kmalloc(*size, GFP_KERNEL); if (!db) - return NULL; + return ERR_PTR(-ENOMEM); rc = secvar_ops->get(key, keylen, db, size); if (rc) { kfree(db); - pr_err("Error reading %s var: %d\n", key, rc); - return NULL; + return ERR_PTR(rc); } return db; @@ -69,7 +75,11 @@ static int __init load_powerpc_certs(void) */ db = get_cert_list("db", 3, ); if (!db) { - pr_err("Couldn't get db list from firmware\n"); + pr_info("Couldn't get db list from firmware\n"); + } else if (IS_ERR(db)) { + rc = PTR_ERR(db); + pr_err("Error reading db from firmware: %d\n", rc); + return rc; } else { rc = parse_efi_signature_list("powerpc:db", db, dbsize, get_handler_for_db); @@ -81,6 +91,10 @@ static int __init load_powerpc_certs(void) dbx = get_cert_list("dbx", 4, ); if (!dbx) { pr_info("Couldn't get dbx list from firmware\n"); + } else if (IS_ERR(dbx)) { + rc = PTR_ERR(dbx); + pr_err("Error reading dbx from firmware: %d\n", rc); + return rc; } else { rc = parse_efi_signature_list("powerpc:dbx", dbx, dbxsize, get_handler_for_dbx); -- 2.39.0
[PATCH v4 19/24] powerpc/pseries: Turn PSERIES_PLPKS into a hidden option
It seems a bit unnecessary for the PLPKS code to have a user-visible config option when it doesn't do anything on its own, and there's existing options for enabling Secure Boot-related features. It should be enabled by PPC_SECURE_BOOT, which will eventually be what uses PLPKS to populate keyrings. However, we can't get of the separate option completely, because it will also be used for SED Opal purposes. Change PSERIES_PLPKS into a hidden option, which is selected by PPC_SECURE_BOOT. Signed-off-by: Andrew Donnellan Signed-off-by: Russell Currey --- v3: New patch --- arch/powerpc/Kconfig | 1 + arch/powerpc/platforms/pseries/Kconfig | 11 +-- 2 files changed, 2 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index b8c4ac56bddc..d4ed46101bec 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -1029,6 +1029,7 @@ config PPC_SECURE_BOOT depends on PPC_POWERNV || PPC_PSERIES depends on IMA_ARCH_POLICY imply IMA_SECURE_AND_OR_TRUSTED_BOOT + select PSERIES_PLPKS if PPC_PSERIES help Systems with firmware secure boot enabled need to define security policies to extend secure boot to the OS. This config allows a user diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig index a3b4d99567cb..82b6f993be0f 100644 --- a/arch/powerpc/platforms/pseries/Kconfig +++ b/arch/powerpc/platforms/pseries/Kconfig @@ -151,16 +151,7 @@ config IBMEBUS config PSERIES_PLPKS depends on PPC_PSERIES - bool "Support for the Platform Key Storage" - help - PowerVM provides an isolated Platform Keystore(PKS) storage - allocation for each LPAR with individually managed access - controls to store sensitive information securely. It can be - used to store asymmetric public keys or secrets as required - by different usecases. Select this config to enable - operating system interface to hypervisor to access this space. - - If unsure, select N. + bool config PAPR_SCM depends on PPC_PSERIES && MEMORY_HOTPLUG && LIBNVDIMM -- 2.39.0
[PATCH v4 21/24] powerpc/pseries: Pass PLPKS password on kexec
From: Russell Currey Before interacting with the PLPKS, we ask the hypervisor to generate a password for the current boot, which is then required for most further PLPKS operations. If we kexec into a new kernel, the new kernel will try and fail to generate a new password, as the password has already been set. Pass the password through to the new kernel via the device tree, in /chosen/plpks-pw. Check for the presence of this property before trying to generate a new password - if it exists, use the existing password and remove it from the device tree. Signed-off-by: Russell Currey Signed-off-by: Andrew Donnellan --- v3: New patch v4: Fix compile when CONFIG_PSERIES_PLPKS=n (snowpatch) Fix error handling on fdt_path_offset() call (ruscur) --- arch/powerpc/kexec/file_load_64.c | 18 ++ arch/powerpc/platforms/pseries/plpks.c | 18 +- 2 files changed, 35 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c index af8854f9eae3..0c9130af60cc 100644 --- a/arch/powerpc/kexec/file_load_64.c +++ b/arch/powerpc/kexec/file_load_64.c @@ -27,6 +27,7 @@ #include #include #include +#include struct umem_info { u64 *buf; /* data buffer for usable-memory property */ @@ -1156,6 +1157,9 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, { struct crash_mem *umem = NULL, *rmem = NULL; int i, nr_ranges, ret; +#ifdef CONFIG_PSERIES_PLPKS + int chosen_offset; +#endif /* * Restrict memory usage for kdump kernel by setting up @@ -1230,6 +1234,20 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, } } +#ifdef CONFIG_PSERIES_PLPKS + // If we have PLPKS active, we need to provide the password + if (plpks_is_available()) { + chosen_offset = fdt_path_offset(fdt, "/chosen"); + if (chosen_offset < 0) { + pr_err("Can't find chosen node: %s\n", + fdt_strerror(chosen_offset)); + goto out; + } + ret = fdt_setprop(fdt, chosen_offset, "ibm,plpks-pw", + plpks_get_password(), plpks_get_passwordlen()); + } +#endif // CONFIG_PSERIES_PLPKS + out: kfree(rmem); kfree(umem); diff --git a/arch/powerpc/platforms/pseries/plpks.c b/arch/powerpc/platforms/pseries/plpks.c index b3c7410a4f13..0350f10e1755 100644 --- a/arch/powerpc/platforms/pseries/plpks.c +++ b/arch/powerpc/platforms/pseries/plpks.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -126,7 +127,22 @@ static int plpks_gen_password(void) { unsigned long retbuf[PLPAR_HCALL_BUFSIZE] = { 0 }; u8 *password, consumer = PLPKS_OS_OWNER; - int rc; + struct property *prop; + int rc, len; + + // Before we generate the password, we may have been booted by kexec and + // provided with a previous password. Check for that first. + prop = of_find_property(of_chosen, "ibm,plpks-pw", ); + if (prop) { + ospasswordlength = (u16)len; + ospassword = kzalloc(ospasswordlength, GFP_KERNEL); + if (!ospassword) { + of_remove_property(of_chosen, prop); + return -ENOMEM; + } + memcpy(ospassword, prop->value, len); + return of_remove_property(of_chosen, prop); + } // The password must not cross a page boundary, so we align to the next power of 2 password = kzalloc(roundup_pow_of_two(maxpwsize), GFP_KERNEL); -- 2.39.0
[PATCH v4 22/24] powerpc/pseries: Implement secvars for dynamic secure boot
From: Russell Currey The pseries platform can support dynamic secure boot (i.e. secure boot using user-defined keys) using variables contained with the PowerVM LPAR Platform KeyStore (PLPKS). Using the powerpc secvar API, expose the relevant variables for pseries dynamic secure boot through the existing secvar filesystem layout. The relevant variables for dynamic secure boot are signed in the keystore, and can only be modified using the H_PKS_SIGNED_UPDATE hcall. Object labels in the keystore are encoded using ucs2 format. With our fixed variable names we don't have to care about encoding outside of the necessary byte padding. When a user writes to a variable, the first 8 bytes of data must contain the signed update flags as defined by the hypervisor. When a user reads a variable, the first 4 bytes of data contain the policies defined for the object. Limitations exist due to the underlying implementation of sysfs binary attributes, as is the case for the OPAL secvar implementation - partial writes are unsupported and writes cannot be larger than PAGE_SIZE. (Even when using bin_attributes, which can be larger than a single page, sysfs only gives us one page's worth of write buffer at a time, and the hypervisor does not expose an interface for partial writes.) Co-developed-by: Nayna Jain Signed-off-by: Nayna Jain Co-developed-by: Andrew Donnellan Signed-off-by: Andrew Donnellan Signed-off-by: Russell Currey --- v2: Remove unnecessary config vars from sysfs and document the others, thanks to review from Greg. If we end up needing to expose more, we can add them later and update the docs. Use sysfs_emit() instead of sprintf(), thanks to Greg. Change the size of the sysfs binary attributes to include the 8-byte flags header, preventing truncation of large writes. v3: plpks_set_variable(): pass var to plpks_signed_update_var() as a pointer (mpe) Update copyright date (ajd) Consistent comment style (ajd) Change device_initcall() to machine_arch_initcall(pseries...) so we don't try to load on powernv and kill the machine (mpe) Add config attributes into plpks_secvar_ops (mpe) Get rid of PLPKS_SECVAR_COUNT macro (mpe) Reworded descriptions in ABI documentation (mpe) Switch to using secvar_ops->var_names rather than secvar_ops->get_next() (ajd/mpe) Optimise allocation/copying of buffers (mpe) Elaborate the comment documenting the "format" string (mpe) Return -EIO on errors in the read case (mpe) Add "grubdbx" variable (Sudhakar Kuppusamy) Use utf8s_to_utf16s() rather than our own "UCS-2" conversion code (mpe) Change uint64_t to u64 (mpe) Fix SB_VERSION data length (ruscur) Stop prepending policy data on read (ruscur) Enforce max format length on format string (not strictly needed, but makes the length limit clear) (ajd) Update include of plpks.h to reflect new path (ruscur) Consistent constant naming scheme (ruscur) v4: Return set_secvar_ops() return code Pass buffer size to plpks_secvar_format() (stefanb, npiggin) Add missing null check (stefanb) Add comment to commit message explaining PAGE_SIZE write limit (joel) --- Documentation/ABI/testing/sysfs-secvar| 75 +- arch/powerpc/platforms/pseries/Makefile | 4 +- arch/powerpc/platforms/pseries/plpks-secvar.c | 215 ++ 3 files changed, 291 insertions(+), 3 deletions(-) create mode 100644 arch/powerpc/platforms/pseries/plpks-secvar.c diff --git a/Documentation/ABI/testing/sysfs-secvar b/Documentation/ABI/testing/sysfs-secvar index feebb8c57294..a19f4d5fcec6 100644 --- a/Documentation/ABI/testing/sysfs-secvar +++ b/Documentation/ABI/testing/sysfs-secvar @@ -18,6 +18,14 @@ Description: A string indicating which backend is in use by the firmware. This determines the format of the variable and the accepted format of variable updates. + On powernv/OPAL, this value is provided by the OPAL firmware + and is expected to be "ibm,edk2-compat-v1". + + On pseries/PLPKS, this is generated by the kernel based on the + version number in the SB_VERSION variable in the keystore, and + has the form "ibm,plpks-sb-v", or + "ibm,plpks-sb-unknown" if there is no SB_VERSION variable. + What: /sys/firmware/secvar/vars/ Date: August 2019 Contact: Nayna Jain @@ -34,7 +42,7 @@ Description: An integer representation of the size of the content of the What: /sys/firmware/secvar/vars//data Date: August 2019 -Contact: Nayna Jain h +Contact: Nayna Jain Description: A read-only file containing the value of the variable. The size of the file represents the maximum size of the variable data. @@ -44,3 +52,68 @@ Contact: Nayna Jain Description: A write-only file that is used to submit the new