[6.1.0-rc2] insecure W+X mapping warning during kdump kernel boot
With CONFIG_DEBUG_WX=y I am observing following warning During kdump kernel boot. This warning is not seen during production kernel boot. Kernel crash dump is captured correctly. [ cut here ] [ 11.541311] powerpc/mm: Found insecure W+X mapping at address 749d3849/0xc000 [ 11.541328] WARNING: CPU: 28 PID: 1 at arch/powerpc/mm/ptdump/ptdump.c:194 note_page+0x408/0x430 [ 11.541342] Modules linked in: [ 11.541348] CPU: 28 PID: 1 Comm: swapper/28 Not tainted 6.1.0-rc2-gb229b6ca5abb #1 [ 11.541356] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1030.00 (NH1030_026) hv:phyp pSeries [ 11.541364] NIP: c000100b1ac8 LR: c000100b1ac4 CTR: 00725d90 [ 11.541370] REGS: c000156f7720 TRAP: 0700 Not tainted (6.1.0-rc2-gb229b6ca5abb) [ 11.541377] MSR: 8282b033 CR: 28000822 XER: 0006 [ 11.541393] CFAR: c00010157514 IRQMASK: 0 [ 11.541393] GPR00: c000100b1ac4 c000156f79c0 c0001135eb00 0055 [ 11.541393] GPR04: fffe c000156f7780 c000156f7778 [ 11.541393] GPR08: fffe c000125d73b8 c00083dfffe8 [ 11.541393] GPR12: c00012e12400 c00010f3b5f0 c01f [ 11.541393] GPR16: c00012b2dba0 0001 c000156f7bd8 c0008ffc0010 [ 11.541393] GPR20: c00010f3b5f0 c0003fff c001 c00012b2dbb0 [ 11.541393] GPR24: c00012b2dba8 c0008ffe c020 c001 [ 11.541393] GPR28: c001018e 0004 c001018e c000156f7cc0 [ 11.541462] NIP [c000100b1ac8] note_page+0x408/0x430 [ 11.541469] LR [c000100b1ac4] note_page+0x404/0x430 [ 11.541476] Call Trace: [ 11.541478] [c000156f79c0] [c000100b1ac4] note_page+0x404/0x430 (unreliable) [ 11.541488] [c000156f7a70] [c00010550a14] ptdump_pte_entry+0xa4/0x100 [ 11.541498] [c000156f7ab0] [c0001049849c] walk_pgd_range+0x8ec/0xb20 [ 11.541507] [c000156f7bb0] [c00010498bf4] walk_page_range_novma+0x74/0xc0 [ 11.541515] [c000156f7c10] [c00010550e48] ptdump_walk_pgd+0x98/0x170 [ 11.541523] [c000156f7c60] [c000100b1b84] ptdump_check_wx+0x94/0x100 [ 11.541532] [c000156f7d40] [c00010094cb0] mark_rodata_ro+0x30/0x70 [ 11.541540] [c000156f7da0] [c0001001282c] kernel_init+0x8c/0x1b0 [ 11.541548] [c000156f7e10] [c0001000cf60] ret_from_kernel_thread+0x5c/0x64 [ 11.541556] Instruction dump: [ 11.541560] eb410080 ebc100a0 7c0803a6 4bfffc94 3c62ffe1 3921 3d42016b 7ca42b78 [ 11.541571] 3863e5c0 992a9e23 480a59ed 6000 <0fe0> fae10068 fb010070 fb210078 [ 11.541583] ---[ end trace ]— - Sachin
Re: [PATCH] powerpc/warp: switch to using gpiod API
On Mon, Sep 26, 2022 at 11:03:25PM -0700, Dmitry Torokhov wrote: > This switches PIKA Warp away from legacy gpio API and to newer gpiod > API, so that we can eventually deprecate the former. > > Because LEDs are normally driven by leds-gpio driver, but the > platform code also wants to access the LEDs during thermal shutdown, > and gpiod API does not allow locating GPIO without requesting it, > the platform code is now responsible for locating GPIOs through device > tree and requesting them. It then constructs platform data for > leds-gpio platform device and registers it. This allows platform > code to retain access to LED GPIO descriptors and use them when needed. > > Signed-off-by: Dmitry Torokhov Gentle ping on this... Could I get a feedback if this is acceptable or if you want me to rework this somehow? Thanks! > --- > > Compiled only, no hardware to test this. > > arch/powerpc/boot/dts/warp.dts| 4 +- > arch/powerpc/platforms/44x/warp.c | 105 ++ > 2 files changed, 94 insertions(+), 15 deletions(-) > > diff --git a/arch/powerpc/boot/dts/warp.dts b/arch/powerpc/boot/dts/warp.dts > index b4f32740870e..aa62d08e97c2 100644 > --- a/arch/powerpc/boot/dts/warp.dts > +++ b/arch/powerpc/boot/dts/warp.dts > @@ -258,14 +258,12 @@ GPIO1: gpio@ef600c00 { > }; > > power-leds { > - compatible = "gpio-leds"; > + compatible = "warp-power-leds"; > green { > gpios = < 0 0>; > - default-state = "keep"; > }; > red { > gpios = < 1 0>; > - default-state = "keep"; > }; > }; > > diff --git a/arch/powerpc/platforms/44x/warp.c > b/arch/powerpc/platforms/44x/warp.c > index f03432ef010b..cefa313c09f0 100644 > --- a/arch/powerpc/platforms/44x/warp.c > +++ b/arch/powerpc/platforms/44x/warp.c > @@ -5,15 +5,17 @@ > * Copyright (c) 2008-2009 PIKA Technologies > * Sean MacLennan > */ > +#include > #include > #include > #include > +#include > #include > #include > #include > #include > #include > -#include > +#include > #include > #include > > @@ -92,8 +94,6 @@ static int __init warp_post_info(void) > > static LIST_HEAD(dtm_shutdown_list); > static void __iomem *dtm_fpga; > -static unsigned green_led, red_led; > - > > struct dtm_shutdown { > struct list_head list; > @@ -101,7 +101,6 @@ struct dtm_shutdown { > void *arg; > }; > > - > int pika_dtm_register_shutdown(void (*func)(void *arg), void *arg) > { > struct dtm_shutdown *shutdown; > @@ -132,6 +131,35 @@ int pika_dtm_unregister_shutdown(void (*func)(void > *arg), void *arg) > return -EINVAL; > } > > +#define WARP_GREEN_LED 0 > +#define WARP_RED_LED 1 > + > +static struct gpio_led warp_gpio_led_pins[] = { > + [WARP_GREEN_LED] = { > + .name = "green", > + .default_state = LEDS_DEFSTATE_KEEP, > + .gpiod = NULL, /* to be filled by pika_setup_leds() */ > + }, > + [WARP_RED_LED] = { > + .name = "red", > + .default_state = LEDS_DEFSTATE_KEEP, > + .gpiod = NULL, /* to be filled by pika_setup_leds() */ > + }, > +}; > + > +static struct gpio_led_platform_data warp_gpio_led_data = { > + .leds = warp_gpio_led_pins, > + .num_leds = ARRAY_SIZE(warp_gpio_led_pins), > +}; > + > +static struct platform_device warp_gpio_leds = { > + .name = "leds-gpio", > + .id = -1, > + .dev= { > + .platform_data = _gpio_led_data, > + }, > +}; > + > static irqreturn_t temp_isr(int irq, void *context) > { > struct dtm_shutdown *shutdown; > @@ -139,7 +167,7 @@ static irqreturn_t temp_isr(int irq, void *context) > > local_irq_disable(); > > - gpio_set_value(green_led, 0); > + gpiod_set_value(warp_gpio_led_pins[WARP_GREEN_LED].gpiod, 0); > > /* Run through the shutdown list. */ > list_for_each_entry(shutdown, _shutdown_list, list) > @@ -153,7 +181,7 @@ static irqreturn_t temp_isr(int irq, void *context) > out_be32(dtm_fpga + 0x14, reset); > } > > - gpio_set_value(red_led, value); > + gpiod_set_value(warp_gpio_led_pins[WARP_RED_LED].gpiod, value); > value ^= 1; > mdelay(500); > } > @@ -162,25 +190,78 @@ static irqreturn_t temp_isr(int irq, void *context) > return IRQ_HANDLED; > } > > +/* > + * Because green and red power LEDs are normally driven by leds-gpio driver, > + * but in case of critical temperature shutdown we want to drive them > + * ourselves, we acquire both and then
Re: [PATCH v3 2/2] mm, hwpoison: When copy-on-write hits poison, take page offline
On 2022/10/22 4:01, Tony Luck wrote: > Cannot call memory_failure() directly from the fault handler because > mmap_lock (and others) are held. Could you please explain which lock makes it unfeasible to call memory_failure() directly and why? I'm somewhat confused. But I agree using memory_failure_queue() should be a good idea. > > It is important, but not urgent, to mark the source page as h/w poisoned > and unmap it from other tasks. > > Use memory_failure_queue() to request a call to memory_failure() for the > page with the error. > > Also provide a stub version for CONFIG_MEMORY_FAILURE=n > > Signed-off-by: Tony Luck > --- > include/linux/mm.h | 5 - > mm/memory.c| 4 +++- > 2 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 8bbcccbc5565..03ced659eb58 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -3268,7 +3268,6 @@ enum mf_flags { > int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index, > unsigned long count, int mf_flags); > extern int memory_failure(unsigned long pfn, int flags); > -extern void memory_failure_queue(unsigned long pfn, int flags); > extern void memory_failure_queue_kick(int cpu); > extern int unpoison_memory(unsigned long pfn); > extern int sysctl_memory_failure_early_kill; > @@ -3277,8 +3276,12 @@ extern void shake_page(struct page *p); > extern atomic_long_t num_poisoned_pages __read_mostly; > extern int soft_offline_page(unsigned long pfn, int flags); > #ifdef CONFIG_MEMORY_FAILURE > +extern void memory_failure_queue(unsigned long pfn, int flags); > extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags); > #else > +static inline void memory_failure_queue(unsigned long pfn, int flags) > +{ > +} > static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags) > { > return 0; > diff --git a/mm/memory.c b/mm/memory.c > index b6056eef2f72..eae242351726 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2866,8 +2866,10 @@ static inline int __wp_page_copy_user(struct page > *dst, struct page *src, > unsigned long addr = vmf->address; > > if (likely(src)) { > - if (copy_mc_user_highpage(dst, src, addr, vma)) > + if (copy_mc_user_highpage(dst, src, addr, vma)) { > + memory_failure_queue(page_to_pfn(src), 0); It seems MF_ACTION_REQUIRED is not needed for memory_failure_queue() here. Thanks for your patch. Reviewed-by: Miaohe Lin Thanks, Miaohe Lin
Re: [PATCH v4 2/2] arm64: support batched/deferred tlb shootdown during page reclamation
On 10/28/22 03:25, Barry Song wrote: > On Fri, Oct 28, 2022 at 3:19 AM Punit Agrawal > wrote: >> >> [ Apologies for chiming in late in the conversation ] >> >> Anshuman Khandual writes: >> >>> On 9/28/22 05:53, Barry Song wrote: On Tue, Sep 27, 2022 at 10:15 PM Yicong Yang wrote: > On 2022/9/27 14:16, Anshuman Khandual wrote: >> [...] >> >> On 9/21/22 14:13, Yicong Yang wrote: >>> +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) >>> +{ >>> +/* for small systems with small number of CPUs, TLB shootdown is >>> cheap */ >>> +if (num_online_cpus() <= 4) >> It would be great to have some more inputs from others, whether 4 (which >> should >> to be codified into a macro e.g ARM64_NR_CPU_DEFERRED_TLB, or something >> similar) >> is optimal for an wide range of arm64 platforms. >> I have tested it on a 4-cpus and 8-cpus machine. but i have no machine with 5,6,7 cores. I saw improvement on 8-cpus machines and I found 4-cpus machines don't need this patch. so it seems safe to have if (num_online_cpus() < 8) > Do you prefer this macro to be static or make it configurable through > kconfig then > different platforms can make choice based on their own situations? It > maybe hard to > test on all the arm64 platforms. Maybe we can have this default enabled on machines with 8 and more cpus and provide a tlbflush_batched = on or off to allow users enable or disable it according to their hardware and products. Similar example: rodata=on or off. >>> No, sounds bit excessive. Kernel command line options should not be added >>> for every possible run time switch options. >>> Hi Anshuman, Will, Catalin, Andrew, what do you think about this approach? BTW, haoxin mentioned another important user scenarios for tlb bach on arm64: https://lore.kernel.org/lkml/393d6318-aa38-01ed-6ad8-f9eac89bf...@linux.alibaba.com/ I do believe we need it based on the expensive cost of tlb shootdown in arm64 even by hardware broadcast. >>> Alright, for now could we enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH >>> selectively >>> with CONFIG_EXPERT and for num_online_cpus() > 8 ? >> When running the test program in the commit in a VM, I saw benefits from >> the patches at all sizes from 2, 4, 8, 32 vcpus. On the test machine, >> ptep_clear_flush() went from ~1% in the unpatched version to not showing >> up. >> >> Yicong mentioned that he didn't see any benefit for <= 4 CPUs but is >> there any overhead? I am wondering what are the downsides of enabling >> the config by default. > As we are deferring tlb flush, but sometimes while we are modifying the vma > which are deferred, we need to do a sync by flush_tlb_batched_pending() in > mprotect() , madvise() to make sure they can see the flushed result. if nobody > is doing mprotect(), madvise() etc in the deferred period, the overhead is > zero. Right, it is difficult to justify this overhead for smaller systems, which for sure would not benefit from this batched TLB framework.
Re: [PATCH v3 1/2] mm, hwpoison: Try to recover from copy-on write faults
On 2022/10/22 4:01, Tony Luck wrote: > If the kernel is copying a page as the result of a copy-on-write > fault and runs into an uncorrectable error, Linux will crash because > it does not have recovery code for this case where poison is consumed > by the kernel. > > It is easy to set up a test case. Just inject an error into a private > page, fork(2), and have the child process write to the page. > > I wrapped that neatly into a test at: > > git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git > > just enable ACPI error injection and run: > > # ./einj_mem-uc -f copy-on-write > > Add a new copy_user_highpage_mc() function that uses copy_mc_to_kernel() > on architectures where that is available (currently x86 and powerpc). > When an error is detected during the page copy, return VM_FAULT_HWPOISON > to caller of wp_page_copy(). This propagates up the call stack. Both x86 > and powerpc have code in their fault handler to deal with this code by > sending a SIGBUS to the application. > > Note that this patch avoids a system crash and signals the process that > triggered the copy-on-write action. It does not take any action for the > memory error that is still in the shared page. To handle that a call to > memory_failure() is needed. But this cannot be done from wp_page_copy() > because it holds mmap_lock(). Perhaps the architecture fault handlers > can deal with this loose end in a subsequent patch? > > On Intel/x86 this loose end will often be handled automatically because > the memory controller provides an additional notification of the h/w > poison in memory, the handler for this will call memory_failure(). This > isn't a 100% solution. If there are multiple errors, not all may be > logged in this way. > > Reviewed-by: Dan Williams > Signed-off-by: Tony Luck Thanks for your work, Tony. > > --- > Changes in V3: > Dan Williams > Rename copy_user_highpage_mc() to copy_mc_user_highpage() for > consistency with Linus' discussion on names of functions that > check for machine check. > Write complete functions for the have/have-not copy_mc_to_kernel > cases (so grep shows there are two versions) > Change __wp_page_copy_user() to return 0 for success, negative for fail > [I picked -EAGAIN for both non-EHWPOISON cases] > > Changes in V2: >Naoya Horiguchi: > 1) Use -EHWPOISON error code instead of minus one. > 2) Poison path needs also to deal with old_page >Tony Luck: > Rewrote commit message > Added some powerpc folks to Cc: list > --- > include/linux/highmem.h | 24 > mm/memory.c | 30 -- > 2 files changed, 44 insertions(+), 10 deletions(-) > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index e9912da5441b..a32c64681f03 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -319,6 +319,30 @@ static inline void copy_user_highpage(struct page *to, > struct page *from, > > #endif > > +#ifdef copy_mc_to_kernel > +static inline int copy_mc_user_highpage(struct page *to, struct page *from, > + unsigned long vaddr, struct > vm_area_struct *vma) > +{ > + unsigned long ret; > + char *vfrom, *vto; > + > + vfrom = kmap_local_page(from); > + vto = kmap_local_page(to); > + ret = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE); In copy_user_highpage(), kmsan_unpoison_memory(page_address(to), PAGE_SIZE) is done after the copy when __HAVE_ARCH_COPY_USER_HIGHPAGE isn't defined. Do we need to do something similar here? But I'm not familiar with kmsan, so I can easy be wrong. Anyway, this patch looks good to me. Thanks. Reviewed-by: Miaohe Lin Thanks, Miaohe Lin
Re: [PATCH v4 2/2] arm64: support batched/deferred tlb shootdown during page reclamation
On 10/28/22 03:37, Barry Song wrote: > On Thu, Oct 27, 2022 at 11:42 PM Anshuman Khandual > wrote: >> >> >> >> On 9/28/22 05:53, Barry Song wrote: >>> On Tue, Sep 27, 2022 at 10:15 PM Yicong Yang wrote: On 2022/9/27 14:16, Anshuman Khandual wrote: > [...] > > On 9/21/22 14:13, Yicong Yang wrote: >> +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) >> +{ >> +/* for small systems with small number of CPUs, TLB shootdown is >> cheap */ >> +if (num_online_cpus() <= 4) > > It would be great to have some more inputs from others, whether 4 (which > should > to be codified into a macro e.g ARM64_NR_CPU_DEFERRED_TLB, or something > similar) > is optimal for an wide range of arm64 platforms. > >>> >>> I have tested it on a 4-cpus and 8-cpus machine. but i have no machine >>> with 5,6,7 >>> cores. >>> I saw improvement on 8-cpus machines and I found 4-cpus machines don't need >>> this patch. >>> >>> so it seems safe to have >>> if (num_online_cpus() < 8) >>> Do you prefer this macro to be static or make it configurable through kconfig then different platforms can make choice based on their own situations? It maybe hard to test on all the arm64 platforms. >>> >>> Maybe we can have this default enabled on machines with 8 and more cpus and >>> provide a tlbflush_batched = on or off to allow users enable or >>> disable it according >>> to their hardware and products. Similar example: rodata=on or off. >> >> No, sounds bit excessive. Kernel command line options should not be added >> for every possible run time switch options. >> >>> >>> Hi Anshuman, Will, Catalin, Andrew, >>> what do you think about this approach? >>> >>> BTW, haoxin mentioned another important user scenarios for tlb bach on >>> arm64: >>> https://lore.kernel.org/lkml/393d6318-aa38-01ed-6ad8-f9eac89bf...@linux.alibaba.com/ >>> >>> I do believe we need it based on the expensive cost of tlb shootdown in >>> arm64 >>> even by hardware broadcast. >> >> Alright, for now could we enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH >> selectively >> with CONFIG_EXPERT and for num_online_cpus() > 8 ? > > Sounds good to me. It is a good start to bring up tlb batched flush in > ARM64. Later on, we > might want to see it in both memory reclamation and migration. Right, that is the idea, CONFIG_EXPERT gives an way to test this out for some time on various platforms, and later it can be dropped off. Regarding num_online_cpus() = '8' as the threshold which would potentially give benefit of batched TLB should be defined as a macro e.g NR_CPUS_FOR_BATCHED_TLB or internal (non user selectable) config , with a proper in-code comment, explaining the rationale.
[powerpc:fixes-test] BUILD SUCCESS 32f648fac0c17215014a47f1cd5caaaa17d8abec
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test branch HEAD: 32f648fac0c17215014a47f1cd5c17d8abec powerpc/64e: Fix amdgpu build on Book3E w/o AltiVec elapsed time: 724m configs tested: 32 configs skipped: 94 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: powerpc allnoconfig powerpc allmodconfig arc randconfig-r043-20221026 s390 randconfig-r044-20221026 riscvrandconfig-r042-20221026 x86_64 defconfig x86_64 allyesconfig x86_64 rhel-8.3 ia64 allmodconfig x86_64 rhel-8.3-kvm x86_64 rhel-8.3-syz x86_64 rhel-8.3-kunit i386 allyesconfig i386defconfig x86_64 rhel-8.3-func x86_64rhel-8.3-kselftests arm64allyesconfig arm defconfig arm allyesconfig m68k allyesconfig m68k allmodconfig arc allyesconfig alphaallyesconfig s390defconfig s390 allmodconfig arc defconfig alpha defconfig s390 allyesconfig um x86_64_defconfig um i386_defconfig mips allyesconfig sh allmodconfig -- 0-DAY CI Kernel Test Service https://01.org/lkp
[powerpc:merge] BUILD SUCCESS eb05539de71d4784705d3e502d951ac17fea05e3
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge branch HEAD: eb05539de71d4784705d3e502d951ac17fea05e3 Automatic merge of 'fixes' into merge (2022-10-27 22:29) elapsed time: 724m configs tested: 58 configs skipped: 2 The following configs have been built successfully. More configs may be tested in the coming days. gcc tested configs: arc defconfig s390 allmodconfig alpha defconfig s390 allyesconfig s390defconfig arm defconfig x86_64 rhel-8.3-syz um i386_defconfig x86_64 rhel-8.3-kunit x86_64 rhel-8.3-kvm um x86_64_defconfig x86_64 defconfig m68k allmodconfig alphaallyesconfig powerpc allnoconfig powerpc allmodconfig mips allyesconfig m68k allyesconfig i386defconfig arm allyesconfig arc randconfig-r043-20221027 sh allmodconfig arm64allyesconfig i386 randconfig-a001 i386 randconfig-a003 x86_64 allyesconfig arc allyesconfig x86_64rhel-8.3-kselftests x86_64 rhel-8.3-func x86_64 rhel-8.3 i386 allyesconfig i386 randconfig-a005 ia64 allmodconfig x86_64randconfig-a013 x86_64randconfig-a011 x86_64randconfig-a015 x86_64randconfig-a004 x86_64randconfig-a002 i386 randconfig-a014 i386 randconfig-a012 i386 randconfig-a016 x86_64randconfig-a006 clang tested configs: hexagon randconfig-r041-20221027 riscvrandconfig-r042-20221027 i386 randconfig-a002 hexagon randconfig-r045-20221027 i386 randconfig-a004 s390 randconfig-r044-20221027 i386 randconfig-a006 x86_64randconfig-a014 x86_64randconfig-a016 x86_64randconfig-a012 x86_64randconfig-a001 x86_64randconfig-a003 i386 randconfig-a013 i386 randconfig-a011 i386 randconfig-a015 x86_64randconfig-a005 -- 0-DAY CI Kernel Test Service https://01.org/lkp
Re: [PATCH v4 2/2] arm64: support batched/deferred tlb shootdown during page reclamation
On 2022/10/27 22:19, Punit Agrawal wrote: > > [ Apologies for chiming in late in the conversation ] > > Anshuman Khandual writes: > >> On 9/28/22 05:53, Barry Song wrote: >>> On Tue, Sep 27, 2022 at 10:15 PM Yicong Yang wrote: On 2022/9/27 14:16, Anshuman Khandual wrote: > [...] > > On 9/21/22 14:13, Yicong Yang wrote: >> +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) >> +{ >> +/* for small systems with small number of CPUs, TLB shootdown is >> cheap */ >> +if (num_online_cpus() <= 4) > > It would be great to have some more inputs from others, whether 4 (which > should > to be codified into a macro e.g ARM64_NR_CPU_DEFERRED_TLB, or something > similar) > is optimal for an wide range of arm64 platforms. > >>> >>> I have tested it on a 4-cpus and 8-cpus machine. but i have no machine >>> with 5,6,7 >>> cores. >>> I saw improvement on 8-cpus machines and I found 4-cpus machines don't need >>> this patch. >>> >>> so it seems safe to have >>> if (num_online_cpus() < 8) >>> Do you prefer this macro to be static or make it configurable through kconfig then different platforms can make choice based on their own situations? It maybe hard to test on all the arm64 platforms. >>> >>> Maybe we can have this default enabled on machines with 8 and more cpus and >>> provide a tlbflush_batched = on or off to allow users enable or >>> disable it according >>> to their hardware and products. Similar example: rodata=on or off. >> >> No, sounds bit excessive. Kernel command line options should not be added >> for every possible run time switch options. >> >>> >>> Hi Anshuman, Will, Catalin, Andrew, >>> what do you think about this approach? >>> >>> BTW, haoxin mentioned another important user scenarios for tlb bach on >>> arm64: >>> https://lore.kernel.org/lkml/393d6318-aa38-01ed-6ad8-f9eac89bf...@linux.alibaba.com/ >>> >>> I do believe we need it based on the expensive cost of tlb shootdown in >>> arm64 >>> even by hardware broadcast. >> >> Alright, for now could we enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH >> selectively >> with CONFIG_EXPERT and for num_online_cpus() > 8 ? > > When running the test program in the commit in a VM, I saw benefits from > the patches at all sizes from 2, 4, 8, 32 vcpus. On the test machine, > ptep_clear_flush() went from ~1% in the unpatched version to not showing > up. > Maybe you're booting VM on a server with more than 32 cores and Barry tested on his 4 CPUs embedded platform. I guess a 4 CPU VM is not fully equivalent to a 4 CPU real machine as the tbli and dsb in the VM may influence the host as well. > Yicong mentioned that he didn't see any benefit for <= 4 CPUs but is > there any overhead? I am wondering what are the downsides of enabling > the config by default. > > Thanks, > Punit > . >
Re: [PATCH v4 2/2] arm64: support batched/deferred tlb shootdown during page reclamation
[ Apologies for chiming in late in the conversation ] Anshuman Khandual writes: > On 9/28/22 05:53, Barry Song wrote: >> On Tue, Sep 27, 2022 at 10:15 PM Yicong Yang wrote: >>> >>> On 2022/9/27 14:16, Anshuman Khandual wrote: [...] On 9/21/22 14:13, Yicong Yang wrote: > +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) > +{ > +/* for small systems with small number of CPUs, TLB shootdown is > cheap */ > +if (num_online_cpus() <= 4) It would be great to have some more inputs from others, whether 4 (which should to be codified into a macro e.g ARM64_NR_CPU_DEFERRED_TLB, or something similar) is optimal for an wide range of arm64 platforms. >> >> I have tested it on a 4-cpus and 8-cpus machine. but i have no machine >> with 5,6,7 >> cores. >> I saw improvement on 8-cpus machines and I found 4-cpus machines don't need >> this patch. >> >> so it seems safe to have >> if (num_online_cpus() < 8) >> >>> >>> Do you prefer this macro to be static or make it configurable through >>> kconfig then >>> different platforms can make choice based on their own situations? It maybe >>> hard to >>> test on all the arm64 platforms. >> >> Maybe we can have this default enabled on machines with 8 and more cpus and >> provide a tlbflush_batched = on or off to allow users enable or >> disable it according >> to their hardware and products. Similar example: rodata=on or off. > > No, sounds bit excessive. Kernel command line options should not be added > for every possible run time switch options. > >> >> Hi Anshuman, Will, Catalin, Andrew, >> what do you think about this approach? >> >> BTW, haoxin mentioned another important user scenarios for tlb bach on arm64: >> https://lore.kernel.org/lkml/393d6318-aa38-01ed-6ad8-f9eac89bf...@linux.alibaba.com/ >> >> I do believe we need it based on the expensive cost of tlb shootdown in arm64 >> even by hardware broadcast. > > Alright, for now could we enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH selectively > with CONFIG_EXPERT and for num_online_cpus() > 8 ? When running the test program in the commit in a VM, I saw benefits from the patches at all sizes from 2, 4, 8, 32 vcpus. On the test machine, ptep_clear_flush() went from ~1% in the unpatched version to not showing up. Yicong mentioned that he didn't see any benefit for <= 4 CPUs but is there any overhead? I am wondering what are the downsides of enabling the config by default. Thanks, Punit
Re: [PATCH v3 0/2] Fix /proc/cpuinfo cpumask warning
On Fri, 14 Oct 2022 08:58:43 PDT (-0700), ajo...@ventanamicro.com wrote: Commit 78e5a3399421 ("cpumask: fix checking valid cpu range") has started issuing warnings[*] when cpu indices equal to nr_cpu_ids - 1 are passed to cpumask_next* functions. seq_read_iter() and cpuinfo's start and next seq operations implement a pattern like n = cpumask_next(n - 1, mask); show(n); while (1) { ++n; n = cpumask_next(n - 1, mask); if (n >= nr_cpu_ids) break; show(n); } which will issue the warning when reading /proc/cpuinfo. [*] Warnings will only appear with DEBUG_PER_CPU_MAPS enabled. This series address the issue for x86 and riscv, but from a quick grep of cpuinfo seq operations, I think at least openrisc, powerpc, and s390 also need an equivalent patch. While the test is simple (see next paragraph) I'm not equipped to test on each architecture. To test, just build a kernel with DEBUG_PER_CPU_MAPS enabled, boot to a shell, do 'cat /proc/cpuinfo', and look for a kernel warning. While the patches are being posted together in a series since they're for two different architectures they don't necessarily need to go through the same tree. v3: - Change condition from >= to == in order to still get a warning for > as that's unexpected. [Yury] - Picked up tags on the riscv patch v2: - Added all the information I should have in the first place to the commit message [Boris] - Changed style of fix [Boris] Andrew Jones (2): RISC-V: Fix /proc/cpuinfo cpumask warning I just took the RISC-V fix, might be worth re-sending the x86 one alone as nobody's replied over there so it may be lost. Thanks! x86: Fix /proc/cpuinfo cpumask warning arch/riscv/kernel/cpu.c| 3 +++ arch/x86/kernel/cpu/proc.c | 3 +++ 2 files changed, 6 insertions(+)
Re: [PATCH v8 4/9] phy: fsl: Add Lynx 10G SerDes driver
Quoting Sean Anderson (2022-10-27 12:11:08) > diff --git a/drivers/phy/freescale/Kconfig b/drivers/phy/freescale/Kconfig > index 853958fb2c06..a6f9e39b 100644 > --- a/drivers/phy/freescale/Kconfig > +++ b/drivers/phy/freescale/Kconfig > @@ -47,3 +47,25 @@ config PHY_FSL_LYNX_28G > found on NXP's Layerscape platforms such as LX2160A. > Used to change the protocol running on SerDes lanes at runtime. > Only useful for a restricted set of Ethernet protocols. > + > +config PHY_FSL_LYNX_10G > + tristate "Freescale QorIQ Lynx 10G SerDes support" > + depends on COMMON_CLK Does something not compile if COMMON_CLK is disabled? > + depends on ARCH_LAYERSCAPE || PPC || COMPILE_TEST > + select GENERIC_PHY > + select REGMAP_MMIO > + help > + This adds support for the Lynx "SerDes" devices found on various > QorIQ > + SoCs. There may be up to four SerDes devices on each SoC, and each > + device supports up to eight lanes. The SerDes is configured by > + default by the RCW, but this module is necessary in order to support > + some modes (such as 2.5G SGMII or 1000BASE-KX), or clock setups (as > + only as subset of clock configurations are supported by the RCW). > + The hardware supports a variety of protocols, including Ethernet, > + SATA, PCIe, and more exotic links such as Interlaken and Aurora. > This > + driver only supports Ethernet, but it will try not to touch lanes > + configured for other protocols. > + > + If you have a QorIQ processor and want to dynamically reconfigure > your > + SerDes, say Y. If this driver is compiled as a module, it will be > + named phy-fsl-lynx-10g-drv. > diff --git a/drivers/phy/freescale/Makefile b/drivers/phy/freescale/Makefile > index cedb328bc4d2..1f18936507e0 100644 > --- a/drivers/phy/freescale/Makefile > +++ b/drivers/phy/freescale/Makefile > @@ -3,4 +3,7 @@ obj-$(CONFIG_PHY_FSL_IMX8MQ_USB)+= > phy-fsl-imx8mq-usb.o > obj-$(CONFIG_PHY_MIXEL_LVDS_PHY) += phy-fsl-imx8qm-lvds-phy.o > obj-$(CONFIG_PHY_MIXEL_MIPI_DPHY) += phy-fsl-imx8-mipi-dphy.o > obj-$(CONFIG_PHY_FSL_IMX8M_PCIE) += phy-fsl-imx8m-pcie.o > +phy-fsl-lynx-10g-drv-y += phy-fsl-lynx-10g.o > +phy-fsl-lynx-10g-drv-y += phy-fsl-lynx-10g-clk.o > +obj-$(CONFIG_PHY_FSL_LYNX_10G) += phy-fsl-lynx-10g-drv.o > obj-$(CONFIG_PHY_FSL_LYNX_28G) += phy-fsl-lynx-28g.o > diff --git a/drivers/phy/freescale/lynx-10g.h > b/drivers/phy/freescale/lynx-10g.h > new file mode 100644 > index ..75d9353a867b > --- /dev/null > +++ b/drivers/phy/freescale/lynx-10g.h > @@ -0,0 +1,16 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Copyright (C) 2022 Sean Anderson > + */ > + > +#ifndef LYNX_10G > +#define LYNX_10G > + > +struct clk; > +struct device; > +struct regmap; > + > +int lynx_clks_init(struct device *dev, struct regmap *regmap, Can you use auxiliary bus to register this clk controller instead and then move the clk file to drivers/clk/? > + struct clk *plls[2], struct clk *ex_dlys[2]); > + > +#endif /* LYNX 10G */ > diff --git a/drivers/phy/freescale/phy-fsl-lynx-10g-clk.c > b/drivers/phy/freescale/phy-fsl-lynx-10g-clk.c > new file mode 100644 > index ..6ec32bdfb9dd > --- /dev/null > +++ b/drivers/phy/freescale/phy-fsl-lynx-10g-clk.c > @@ -0,0 +1,503 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2022 Sean Anderson > + * > + * This file contains the implementation for the PLLs found on Lynx 10G phys. > + * > + * XXX: The VCO rate of the PLLs can exceed ~4GHz, which is the maximum rate > + * expressable in an unsigned long. To work around this, rates are specified > in > + * kHz. This is as if there was a division by 1000 in the PLL. > + */ > + > +#include Ideally clk.h isn't included in a clk provider. This allows us to easily identify drivers that are both a consumer (clk.h) and a provider (clk-provider.h). A provider/consumer is rare. > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "lynx-10g.h" > + > +#define PLL_STRIDE 0x20 > +#define PLLa(a, off) ((a) * PLL_STRIDE + (off)) > +#define PLLaRSTCTL(a) PLLa(a, 0x00) > +#define PLLaCR0(a) PLLa(a, 0x04) > + > +#define PLLaRSTCTL_RSTREQ BIT(31) > +#define PLLaRSTCTL_RST_DONEBIT(30) > +#define PLLaRSTCTL_RST_ERR BIT(29) > +#define PLLaRSTCTL_PLLRST_BBIT(7) > +#define PLLaRSTCTL_SDRST_B BIT(6) > +#define PLLaRSTCTL_SDENBIT(5) > + > +#define PLLaRSTCTL_ENABLE_SET (PLLaRSTCTL_RST_DONE | PLLaRSTCTL_PLLRST_B | \ > +PLLaRSTCTL_SDRST_B | PLLaRSTCTL_SDEN) > +#define PLLaRSTCTL_ENABLE_MASK (PLLaRSTCTL_ENABLE_SET | PLLaRSTCTL_RST_ERR) > + > +#define PLLaCR0_POFF BIT(31) > +#define PLLaCR0_RFCLK_SEL GENMASK(30, 28) >
Re: [PATCH v4 2/2] arm64: support batched/deferred tlb shootdown during page reclamation
On Thu, Oct 27, 2022 at 11:42 PM Anshuman Khandual wrote: > > > > On 9/28/22 05:53, Barry Song wrote: > > On Tue, Sep 27, 2022 at 10:15 PM Yicong Yang wrote: > >> > >> On 2022/9/27 14:16, Anshuman Khandual wrote: > >>> [...] > >>> > >>> On 9/21/22 14:13, Yicong Yang wrote: > +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) > +{ > +/* for small systems with small number of CPUs, TLB shootdown is > cheap */ > +if (num_online_cpus() <= 4) > >>> > >>> It would be great to have some more inputs from others, whether 4 (which > >>> should > >>> to be codified into a macro e.g ARM64_NR_CPU_DEFERRED_TLB, or something > >>> similar) > >>> is optimal for an wide range of arm64 platforms. > >>> > > > > I have tested it on a 4-cpus and 8-cpus machine. but i have no machine > > with 5,6,7 > > cores. > > I saw improvement on 8-cpus machines and I found 4-cpus machines don't need > > this patch. > > > > so it seems safe to have > > if (num_online_cpus() < 8) > > > >> > >> Do you prefer this macro to be static or make it configurable through > >> kconfig then > >> different platforms can make choice based on their own situations? It > >> maybe hard to > >> test on all the arm64 platforms. > > > > Maybe we can have this default enabled on machines with 8 and more cpus and > > provide a tlbflush_batched = on or off to allow users enable or > > disable it according > > to their hardware and products. Similar example: rodata=on or off. > > No, sounds bit excessive. Kernel command line options should not be added > for every possible run time switch options. > > > > > Hi Anshuman, Will, Catalin, Andrew, > > what do you think about this approach? > > > > BTW, haoxin mentioned another important user scenarios for tlb bach on > > arm64: > > https://lore.kernel.org/lkml/393d6318-aa38-01ed-6ad8-f9eac89bf...@linux.alibaba.com/ > > > > I do believe we need it based on the expensive cost of tlb shootdown in > > arm64 > > even by hardware broadcast. > > Alright, for now could we enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH selectively > with CONFIG_EXPERT and for num_online_cpus() > 8 ? Sounds good to me. It is a good start to bring up tlb batched flush in ARM64. Later on, we might want to see it in both memory reclamation and migration. Thanks Barry
Re: [PATCH v8 3/9] dt-bindings: clock: Add ids for Lynx 10g PLLs
On 10/27/22 17:49, Stephen Boyd wrote: > Quoting Sean Anderson (2022-10-27 12:11:07) >> This adds ids for the Lynx 10g SerDes's internal PLLs. These may be used >> with assigned-clock* to specify a particular frequency to use. For >> example, to set the second PLL (at offset 0x20)'s frequency, use >> LYNX10G_PLLa(1). These are for use only in the device tree, and are not >> otherwise used by the driver. >> >> Signed-off-by: Sean Anderson >> Acked-by: Rob Herring >> --- > > Acked-by: Stephen Boyd Whoops, looks like I forgot to pick this up in v5. There's also an (internal) clock driver in the next patch if you want to look that over. --Sean
Re: [PATCH v4 2/2] arm64: support batched/deferred tlb shootdown during page reclamation
On Fri, Oct 28, 2022 at 3:19 AM Punit Agrawal wrote: > > > [ Apologies for chiming in late in the conversation ] > > Anshuman Khandual writes: > > > On 9/28/22 05:53, Barry Song wrote: > >> On Tue, Sep 27, 2022 at 10:15 PM Yicong Yang wrote: > >>> > >>> On 2022/9/27 14:16, Anshuman Khandual wrote: > [...] > > On 9/21/22 14:13, Yicong Yang wrote: > > +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) > > +{ > > +/* for small systems with small number of CPUs, TLB shootdown is > > cheap */ > > +if (num_online_cpus() <= 4) > > It would be great to have some more inputs from others, whether 4 (which > should > to be codified into a macro e.g ARM64_NR_CPU_DEFERRED_TLB, or something > similar) > is optimal for an wide range of arm64 platforms. > > >> > >> I have tested it on a 4-cpus and 8-cpus machine. but i have no machine > >> with 5,6,7 > >> cores. > >> I saw improvement on 8-cpus machines and I found 4-cpus machines don't need > >> this patch. > >> > >> so it seems safe to have > >> if (num_online_cpus() < 8) > >> > >>> > >>> Do you prefer this macro to be static or make it configurable through > >>> kconfig then > >>> different platforms can make choice based on their own situations? It > >>> maybe hard to > >>> test on all the arm64 platforms. > >> > >> Maybe we can have this default enabled on machines with 8 and more cpus and > >> provide a tlbflush_batched = on or off to allow users enable or > >> disable it according > >> to their hardware and products. Similar example: rodata=on or off. > > > > No, sounds bit excessive. Kernel command line options should not be added > > for every possible run time switch options. > > > >> > >> Hi Anshuman, Will, Catalin, Andrew, > >> what do you think about this approach? > >> > >> BTW, haoxin mentioned another important user scenarios for tlb bach on > >> arm64: > >> https://lore.kernel.org/lkml/393d6318-aa38-01ed-6ad8-f9eac89bf...@linux.alibaba.com/ > >> > >> I do believe we need it based on the expensive cost of tlb shootdown in > >> arm64 > >> even by hardware broadcast. > > > > Alright, for now could we enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH > > selectively > > with CONFIG_EXPERT and for num_online_cpus() > 8 ? > > When running the test program in the commit in a VM, I saw benefits from > the patches at all sizes from 2, 4, 8, 32 vcpus. On the test machine, > ptep_clear_flush() went from ~1% in the unpatched version to not showing > up. > > Yicong mentioned that he didn't see any benefit for <= 4 CPUs but is > there any overhead? I am wondering what are the downsides of enabling > the config by default. As we are deferring tlb flush, but sometimes while we are modifying the vma which are deferred, we need to do a sync by flush_tlb_batched_pending() in mprotect() , madvise() to make sure they can see the flushed result. if nobody is doing mprotect(), madvise() etc in the deferred period, the overhead is zero. > > Thanks, > Punit Thanks Barry
Re: [PATCH v8 3/9] dt-bindings: clock: Add ids for Lynx 10g PLLs
Quoting Sean Anderson (2022-10-27 12:11:07) > This adds ids for the Lynx 10g SerDes's internal PLLs. These may be used > with assigned-clock* to specify a particular frequency to use. For > example, to set the second PLL (at offset 0x20)'s frequency, use > LYNX10G_PLLa(1). These are for use only in the device tree, and are not > otherwise used by the driver. > > Signed-off-by: Sean Anderson > Acked-by: Rob Herring > --- Acked-by: Stephen Boyd
Re: [PATCH v7 3/8] dt-bindings: clock: Add ids for Lynx 10g PLLs
Quoting Sean Anderson (2022-10-18 16:11:07) > This adds ids for the Lynx 10g SerDes's internal PLLs. These may be used > with assigned-clock* to specify a particular frequency to use. For > example, to set the second PLL (at offset 0x20)'s frequency, use > LYNX10G_PLLa(1). These are for use only in the device tree, and are not > otherwise used by the driver. > > Signed-off-by: Sean Anderson > Acked-by: Rob Herring > --- Acked-by: Stephen Boyd
Re: [PATCH v3] hugetlb: simplify hugetlb handling in follow_page_mask
On Wed, Oct 26, 2022 at 05:34:04PM -0700, Mike Kravetz wrote: > On 10/26/22 17:59, Peter Xu wrote: > > Hi, Mike, > > > > On Sun, Sep 18, 2022 at 07:13:48PM -0700, Mike Kravetz wrote: > > > +struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, > > > + unsigned long address, unsigned int flags) > > > +{ > > > + struct hstate *h = hstate_vma(vma); > > > + struct mm_struct *mm = vma->vm_mm; > > > + unsigned long haddr = address & huge_page_mask(h); > > > + struct page *page = NULL; > > > + spinlock_t *ptl; > > > + pte_t *pte, entry; > > > + > > > + /* > > > + * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via > > > + * follow_hugetlb_page(). > > > + */ > > > + if (WARN_ON_ONCE(flags & FOLL_PIN)) > > > + return NULL; > > > + > > > +retry: > > > + /* > > > + * vma lock prevents racing with another thread doing a pmd unshare. > > > + * This keeps pte as returned by huge_pte_offset valid. > > > + */ > > > + hugetlb_vma_lock_read(vma); > > > > I'm not sure whether it's okay to take a rwsem here, as the code can be > > called by e.g. FOLL_NOWAIT? > > I think you are right. This is possible even thought not called this > way today, > > > I'm wondering whether it's fine to just drop this anyway, just always walk > > it lockless. IIUC gup callers should be safe here because the worst case > > is the caller will fetch a wrong page, but then it should be invalidated > > very soon with mmu notifiers. One thing worth mention is that pmd unshare > > should never free a pgtable page. > > You are correct in that pmd unshare will not directly free a pgtable page. > However, I think a 'very worst case' race could be caused by two threads(1,2) > in the same process A, and another process B. Processes A and B share a PMD. > - Process A thread 1 gets a *ptep via huge_pte_offset and gets scheduled out. > - Process A thread 2 calls mprotect to change protection and unshares > the PMD shared with process B. > - Process B then unmaps the PMD shared with process A and the PMD page > gets deleted. [2] > - The *ptep in Process A thread 1 then points into a freed page. > This is VERY unlikely, but I do think it is possible and is the reason I > may be overcautious about protecting against races with pmd unshare. Yes this is possible, I just realized that actually huge_pte_offset() is a soft pgtable walker too. Thanks for pointing that out. If we want to use the vma read lock to protect here as the slow gup path, then please check again with below [1] - I think we'll also need to protect it with fast-gup (probably with trylock only, because fast-gup cannot sleep) or it'll encounter the same race, iiuc. Actually, instead of using vma lock, I really think this is another problem and needs standalone fixing. The problem is we allows huge_pte_offset() to walk the process pgtable without any protection, while pmd unsharing can drop a page anytime. huge_pte_offset() is always facing use-after-free when walking the PUD page. We may want RCU lock to protect the pgtable pages from getting away when huge_pte_offset() is walking it, it'll be safe then because pgtable pages are released in RCU fashion only (e.g. in above example, process [2] will munmap() and release the last ref to the "used to be shared" pmd and the PUD that maps the shared pmds will be released only after a RCU grace period), and afaict that's also what's protecting fast-gup from accessing freed pgtable pages. If with all huge_pte_offset() callers becoming RCU-safe, then IIUC we can drop the vma lock in all GUP code, aka, in hugetlb_follow_page_mask() here, because both slow and fast gup should be safe too in the same manner. Thanks, > > IIUC it's also the same as fast-gup - afaiu we don't take the read vma lock > > in fast-gup too but I also think it's safe. But I hope I didn't miss > > something. [1] -- Peter Xu
[PATCH v8 9/9] [DO NOT MERGE] arm64: dts: ls1088ardb: Add serdes bindings
This adds serdes support to the LS1088ARDB. I have tested the QSGMII ports as well as the two 10G ports. The SFP slot is now fully supported, instead of being modeled as a fixed-link. Linux hangs around when the serdes is initialized if the si5341 is enabled with the in-tree driver, so I have modeled it as a two fixed clocks instead. There are a few registers in the QIXIS FPGA which control the SFP GPIOs; I have modeled them as discrete GPIO controllers for now. I never saw the AQR105 interrupt fire; not sure what was going on, but I have removed it to force polling. The MC firmware needs to be fairly new (it must support DPAA2_MAC_FEATURE_PROTOCOL_CHANGE), and the DPC needs to set the macs to MAC_LINK_TYPE_BACKPLANE. This will break ethernet if those features are not enabled. I am not sure what the upstreaming plan should be. Signed-off-by: Sean Anderson --- Changes in v8: - Rename serdes phy handles like the LS1046A - Add SFP slot binding - Fix incorrect lane ordering (it's backwards on the LS1088A just like it is in the LS1046A). - Fix duplicated lane 2 (it should have been lane 3). - Fix incorrectly-documented value for XFI1. - Remove interrupt for aquantia phy. It never fired for whatever reason, preventing the link from coming up. - Add GPIOs for QIXIS FPGA. - Enable MAC1 PCS - Remove si5341 binding Changes in v4: - Convert to new bindings .../boot/dts/freescale/fsl-ls1088a-rdb.dts| 168 +- 1 file changed, 166 insertions(+), 2 deletions(-) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-ls1088a-rdb.dts index 1bfbce69cc8b..dac3337acbc9 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a-rdb.dts +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a-rdb.dts @@ -10,17 +10,143 @@ /dts-v1/; +#include +#include + #include "fsl-ls1088a.dtsi" / { model = "LS1088A RDB Board"; compatible = "fsl,ls1088a-rdb", "fsl,ls1088a"; + + clocks { + si5341_xtal: clock-48mhz { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <4800>; + }; + + clk_100mhz: clock-100mhz { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <1>; + }; + + clk_156mhz: clock-156mhz { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <15625>; + }; + }; + + sfp_slot: sfp { + compatible = "sff,sfp"; + i2c-bus = <_i2c>; + los-gpios = <_stat 5 GPIO_ACTIVE_HIGH>; + tx-fault-gpios = <_stat 4 GPIO_ACTIVE_HIGH>; + tx-disable-gpios = < 4 GPIO_ACTIVE_HIGH>; + }; +}; + + { + //clocks = < 0 8>, < 0 9>; + clocks = <_100mhz>, <_156mhz>; + clock-names = "ref0", "ref1"; + status = "okay"; + + /* +* XXX: Lane A uses pins SD1_RX3_P/N! That is, the lane numbers and pin +* numbers are _reversed_. +*/ + serdes1_A: phy@0 { + #phy-cells = <0>; + reg = <0>; + + /* QSGb */ + qsgmii-0 { + fsl,pccr = <0x9>; + fsl,index = <0>; + fsl,cfg = <0x1>; + phy-type = ; + }; + }; + + serdes1_B: phy@1 { + #phy-cells = <0>; + reg = <1>; + + /* QSGa */ + qsgmii-1 { + fsl,pccr = <0x9>; + fsl,index = <1>; + fsl,cfg = <0x1>; + phy-type = ; + }; + }; + + serdes1_C: phy@2 { + #phy-cells = <0>; + reg = <2>; + + /* SG1 */ + sgmii-1 { + fsl,pccr = <0x8>; + fsl,index = <2>; + fsl,cfg = <0x1>; + phy-type = ; + }; + + /* +* XFI1 +* Table 23-1 and section 23.5.16.4 disagree; this reflects the +* table. +* +* fsl,cfg is documented as 1, but it is set to 2 by the RCW! +* This is the same as the LS1046A. +*/ + xfi-0 { + fsl,pccr = <0xb>; + fsl,index = <0>; + fsl,cfg = <0x2>; + phy-type = ; + }; + }; + + serdes1_D: phy@3 { + #phy-cells = <0>; + reg = <3>; + + /* SG2 */ + sgmii-3 { + fsl,pccr = <0x8>; + fsl,index = <3>; + fsl,cfg =
[PATCH v8 8/9] arm64: dts: ls1088a: Prevent PCSs from probing as phys
The internal PCSs are not always accessible during boot (such as if the serdes has deselected the appropriate link mode). Give them appropriate compatible strings so they don't automatically (fail to) probe as genphys. Signed-off-by: Sean Anderson --- Changes in v8: - New .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 30 --- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi index 3810d66f8725..7603ea6328d2 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi @@ -817,7 +817,8 @@ pcs_mdio1: mdio@8c07000 { #size-cells = <0>; status = "disabled"; - pcs1: ethernet-phy@0 { + pcs1: ethernet-pcs@0 { + compatible = "fsl,lynx-pcs"; reg = <0>; }; }; @@ -830,7 +831,8 @@ pcs_mdio2: mdio@8c0b000 { #size-cells = <0>; status = "disabled"; - pcs2: ethernet-phy@0 { + pcs2: ethernet-pcs@0 { + compatible = "fsl,lynx-pcs"; reg = <0>; }; }; @@ -843,19 +845,23 @@ pcs_mdio3: mdio@8c0f000 { #size-cells = <0>; status = "disabled"; - pcs3_0: ethernet-phy@0 { + pcs3_0: ethernet-pcs@0 { + compatible = "fsl,lynx-pcs"; reg = <0>; }; - pcs3_1: ethernet-phy@1 { + pcs3_1: ethernet-pcs@1 { + compatible = "fsl,lynx-pcs"; reg = <1>; }; - pcs3_2: ethernet-phy@2 { + pcs3_2: ethernet-pcs@2 { + compatible = "fsl,lynx-pcs"; reg = <2>; }; - pcs3_3: ethernet-phy@3 { + pcs3_3: ethernet-pcs@3 { + compatible = "fsl,lynx-pcs"; reg = <3>; }; }; @@ -868,19 +874,23 @@ pcs_mdio7: mdio@8c1f000 { #size-cells = <0>; status = "disabled"; - pcs7_0: ethernet-phy@0 { + pcs7_0: ethernet-pcs@0 { + compatible = "fsl,lynx-pcs"; reg = <0>; }; - pcs7_1: ethernet-phy@1 { + pcs7_1: ethernet-pcs@1 { + compatible = "fsl,lynx-pcs"; reg = <1>; }; - pcs7_2: ethernet-phy@2 { + pcs7_2: ethernet-pcs@2 { + compatible = "fsl,lynx-pcs"; reg = <2>; }; - pcs7_3: ethernet-phy@3 { + pcs7_3: ethernet-pcs@3 { + compatible = "fsl,lynx-pcs"; reg = <3>; }; }; -- 2.35.1.1320.gc452695387.dirty
[PATCH v8 7/9] arm64: dts: ls1088a: Add serdes bindings
This adds bindings for the SerDes devices. They are disabled by default to prevent any breakage on existing boards. Signed-off-by: Sean Anderson --- (no changes since v4) Changes in v4: - Convert to new bindings Changes in v3: - New arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi index 421d879013d7..3810d66f8725 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi @@ -238,6 +238,24 @@ reset: syscon@1e6 { reg = <0x0 0x1e6 0x0 0x1>; }; + serdes1: serdes@1ea { + #address-cells = <1>; + #size-cells = <0>; + #clock-cells = <1>; + compatible = "fsl,ls1088a-serdes", "fsl,lynx-10g"; + reg = <0x0 0x1ea 0x0 0x2000>; + status = "disabled"; + }; + + serdes2: serdes@1eb { + #address-cells = <1>; + #size-cells = <0>; + #clock-cells = <1>; + compatible = "fsl,ls1088a-serdes", "fsl,lynx-10g"; + reg = <0x0 0x1eb 0x0 0x2000>; + status = "disabled"; + }; + isc: syscon@1f7 { compatible = "fsl,ls1088a-isc", "syscon"; reg = <0x0 0x1f7 0x0 0x1>; -- 2.35.1.1320.gc452695387.dirty
[PATCH v8 6/9] arm64: dts: ls1046ardb: Add serdes bindings
This adds appropriate bindings for the macs which use the SerDes. The 156.25MHz fixed clock is a crystal. The 100MHz clocks (there are actually 3) come from a Renesas 6V49205B at address 69 on i2c0. There is no driver for this device (and as far as I know all you can do with the 100MHz clocks is gate them), so I have chosen to model it as a single fixed clock. Note: the SerDes1 lane numbering for the LS1046A is *reversed*. This means that Lane A (what the driver thinks is lane 0) uses pins SD1_TX3_P/N. Because this will break ethernet if the serdes is not enabled, enable the serdes driver by default on Layerscape. Signed-off-by: Sean Anderson --- This depends on [1]. [1] https://lore.kernel.org/netdev/20220804194705.459670-4-sean.ander...@seco.com/ Changes in v8: - Rename serdes phy handles to use _A, _B, etc. instead of _0, _1, etc. This should help remind readers that the numbering corresponds to the physical layout of the registers, and not the lane (pin) number. Changes in v6: - XGI.9 -> XFI.9 Changes in v4: - Convert to new bindings .../boot/dts/freescale/fsl-ls1046a-rdb.dts| 112 ++ drivers/phy/freescale/Kconfig | 1 + 2 files changed, 113 insertions(+) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dts index 7025aad8ae89..50bc310ac31d 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dts +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a-rdb.dts @@ -10,6 +10,8 @@ /dts-v1/; +#include + #include "fsl-ls1046a.dtsi" / { @@ -26,8 +28,110 @@ aliases { chosen { stdout-path = "serial0:115200n8"; }; + + clocks { + clk_100mhz: clock-100mhz { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <1>; + }; + + clk_156mhz: clock-156mhz { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <15625>; + }; + }; }; + { + clocks = <_100mhz>, <_156mhz>; + clock-names = "ref0", "ref1"; + status = "okay"; + + /* +* XXX: Lane A uses pins SD1_RX3_P/N! That is, the lane numbers and pin +* numbers are _reversed_. In addition, the PCCR documentation is +* _inconsistent_ in its usage of these terms! +* +* PCCR "Lane 0" refers to... +* = +*0 Lane A +*2 Lane A +*8 Lane A +*9 Lane A +*B Lane D! +*/ + serdes1_A: phy@0 { + #phy-cells = <0>; + reg = <0>; + + /* SGMII.6 */ + sgmii-0 { + fsl,pccr = <0x8>; + fsl,index = <0>; + fsl,cfg = <0x1>; + phy-type = ; + }; + }; + + serdes1_B: phy@1 { + #phy-cells = <0>; + reg = <1>; + + /* SGMII.5 */ + sgmii-1 { + fsl,pccr = <0x8>; + fsl,index = <1>; + fsl,cfg = <0x1>; + phy-type = ; + }; + }; + + serdes1_C: phy@2 { + #phy-cells = <0>; + reg = <2>; + + /* SGMII.10 */ + sgmii-2 { + fsl,pccr = <0x8>; + fsl,index = <2>; + fsl,cfg = <0x1>; + phy-type = ; + }; + + /* XFI.10 */ + xfi-0 { + fsl,pccr = <0xb>; + fsl,index = <0>; + fsl,cfg = <0x2>; + phy-type = ; + }; + }; + + serdes1_D: phy@3 { + #phy-cells = <0>; + reg = <3>; + + /* SGMII.9 */ + sgmii-3 { + fsl,pccr = <0x8>; + fsl,index = <3>; + fsl,cfg = <0x1>; + phy-type = ; + }; + + /* XFI.9 */ + xfi-9 { + fsl,pccr = <0xb>; + fsl,index = <1>; + fsl,cfg = <0x1>; + phy-type = ; + }; + }; +}; + + { status = "okay"; }; @@ -140,21 +244,29 @@ ethernet@e6000 { ethernet@e8000 { phy-handle = <_phy1>; phy-connection-type = "sgmii"; + phys = <_B>; + phy-names = "serdes"; }; ethernet@ea000 { phy-handle = <_phy2>; phy-connection-type = "sgmii"; + phys = <_A>; + phy-names = "serdes"; };
[PATCH v8 4/9] phy: fsl: Add Lynx 10G SerDes driver
This adds support for the Lynx 10G "SerDes" devices found on various NXP QorIQ SoCs. There may be up to four SerDes devices on each SoC, each supporting up to eight lanes. Protocol support for each SerDes is highly heterogeneous, with each SoC typically having a totally different selection of supported protocols for each lane. Additionally, the SerDes devices on each SoC also have differing support. One SerDes will typically support Ethernet on most lanes, while the other will typically support PCIe on most lanes. There is wide hardware support for this SerDes. It is present on QorIQ T-Series and Layerscape processors. Because each SoC typically has specific instructions and exceptions for its SerDes, I have limited the initial scope of this module to just the LS1046A and LS1088A. Additionally, I have only added support for Ethernet protocols. There is not a great need for dynamic reconfiguration for other protocols (except perhaps for M.2 cards), so support for them may never be added. Nevertheless, I have tried to provide an obvious path for adding support for other SoCs as well as other protocols. SATA just needs support for configuring LNmSSCR0. PCIe may need to configure the equalization registers. It also uses multiple lanes. I have tried to write the driver with multi-lane support in mind, so there should not need to be any large changes. Although there are 6 protocols supported, I have only tested SGMII and XFI. The rest have been implemented as described in the datasheet. Most of these protocols should work "as-is", but 10GBASE-KR will need PCS support for link training. The PLLs are modeled as clocks proper. This lets us take advantage of the existing clock infrastructure. I have not given the same treatment to the per-lane clocks because they need to be programmed in-concert with the rest of the lane settings. One tricky thing is that the VCO (PLL) rate exceeds 2^32 (maxing out at around 5GHz). This will be a problem on 32-bit platforms, since clock rates are stored as unsigned longs. To work around this, the pll clock rate is generally treated in units of kHz. The PLLs are configured rather interestingly. Instead of the usual direct programming of the appropriate divisors, the input and output clock rates are selected directly. Generally, the only restriction is that the input and output must be integer multiples of each other. This suggests some kind of internal look-up table. The datasheets generally list out the supported combinations explicitly, and not all input/output combinations are documented. I'm not sure if this is due to lack of support, or due to an oversight. If this becomes an issue, then some combinations can be blacklisted (or whitelisted). This may also be necessary for other SoCs which have more stringent clock requirements. Unlike some other phys where e.g. PCIe x4 will use 4 separate phys all configured for PCIe, this driver uses one phy configured to use 4 lanes. This is because while the individual lanes may be configured individually, the protocol selection acts on all lanes at once. Additionally, the order which lanes should be configured in is specified by the datasheet. To coordinate this, lanes are reserved in phy_init, and released in phy_exit. This driver was written with reference to the LS1046A reference manual. However, it was informed by reference manuals for all processors with mEMACs, especially the T4240 (which appears to have a "maxed-out" configuration). The earlier P-series processors appear to be similar, but have a different overall register layout (using "banks" instead of separate SerDes). Perhaps this those use a "5G Lynx SerDes." Signed-off-by: Sean Anderson --- Changes in v8: - Remove unused variable from lynx_ls_mode_init Changes in v7: - Break out call order into generic documentation - Refuse to switch "major" protocols - Update Kconfig to reflect restrictions - Remove set/clear of "pcs reset" bit, since it doesn't seem to fix anything. Changes in v6: - Update MAINTAINERS to include new files - Include bitfield.h and slab.h to allow compilation on non-arm64 arches. - Depend on COMMON_CLK and either layerscape/ppc Changes in v5: - Remove references to PHY_INTERFACE_MODE_1000BASEKX to allow this series to be applied directly to linux/master. - Add fsl,lynx-10g.h to MAINTAINERS Changes in v4: - Rework all debug statements to remove use of __func__. Additional information has been provided as necessary. - Consider alternative parent rates in round_rate and not in set_rate. Trying to modify out parent's rate in set_rate will deadlock. - Explicitly perform a stop/reset sequence in set_rate. This way we always ensure that the PLL is properly stopped. - Set the power-down bit when disabling the PLL. We can do this now that enable/disable aren't abused during the set rate sequence. - Fix typos in QSGMII_OFFSET and XFI_OFFSET - Rename LNmTECR0_TEQ_TYPE_PRE to LNmTECR0_TEQ_TYPE_POST to better reflect its function (adding
[PATCH v8 5/9] arm64: dts: ls1046a: Add serdes bindings
This adds bindings for the SerDes devices. They are disabled by default to prevent any breakage on existing boards. Signed-off-by: Sean Anderson --- (no changes since v4) Changes in v4: - Convert to new bindings Changes in v3: - Describe modes in device tree Changes in v2: - Use one phy cell for SerDes1, since no lanes can be grouped - Disable SerDes by default to prevent breaking boards inadvertently. arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi index 3d9e29824bb2..8f986b4f5efc 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi @@ -423,6 +423,24 @@ sfp: efuse@1e8 { clock-names = "sfp"; }; + serdes1: serdes@1ea { + #address-cells = <1>; + #size-cells = <0>; + #clock-cells = <1>; + compatible = "fsl,ls1046a-serdes", "fsl,lynx-10g"; + reg = <0x0 0x1ea 0x0 0x2000>; + status = "disabled"; + }; + + serdes2: serdes@1eb { + #address-cells = <1>; + #size-cells = <0>; + #clock-cells = <1>; + compatible = "fsl,ls1046a-serdes", "fsl,lynx-10g"; + reg = <0x0 0x1eb 0x0 0x2000>; + status = "disabled"; + }; + dcfg: dcfg@1ee { compatible = "fsl,ls1046a-dcfg", "syscon"; reg = <0x0 0x1ee 0x0 0x1000>; -- 2.35.1.1320.gc452695387.dirty
[PATCH v8 3/9] dt-bindings: clock: Add ids for Lynx 10g PLLs
This adds ids for the Lynx 10g SerDes's internal PLLs. These may be used with assigned-clock* to specify a particular frequency to use. For example, to set the second PLL (at offset 0x20)'s frequency, use LYNX10G_PLLa(1). These are for use only in the device tree, and are not otherwise used by the driver. Signed-off-by: Sean Anderson Acked-by: Rob Herring --- (no changes since v6) Changes in v6: - frequence -> frequency Changes in v5: - Update commit description - Dual id header Changes in v4: - New include/dt-bindings/clock/fsl,lynx-10g.h | 14 ++ 1 file changed, 14 insertions(+) create mode 100644 include/dt-bindings/clock/fsl,lynx-10g.h diff --git a/include/dt-bindings/clock/fsl,lynx-10g.h b/include/dt-bindings/clock/fsl,lynx-10g.h new file mode 100644 index ..15362ae85304 --- /dev/null +++ b/include/dt-bindings/clock/fsl,lynx-10g.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */ +/* + * Copyright (C) 2022 Sean Anderson + */ + +#ifndef __DT_BINDINGS_CLK_LYNX_10G_H +#define __DT_BINDINGS_CLK_LYNX_10G_H + +#define LYNX10G_CLKS_PER_PLL 2 + +#define LYNX10G_PLLa(a)((a) * LYNX10G_CLKS_PER_PLL) +#define LYNX10G_PLLa_EX_DLY(a) ((a) * LYNX10G_CLKS_PER_PLL + 1) + +#endif /* __DT_BINDINGS_CLK_LYNX_10G_H */ -- 2.35.1.1320.gc452695387.dirty
[PATCH v8 1/9] dt-bindings: phy: Add 2500BASE-X and 10GBASE-R
This adds some modes necessary for Lynx 10G support. 2500BASE-X, also known as 2.5G SGMII, is 1000BASE-X/SGMII overclocked to 3.125 GHz, with autonegotiation disabled. 10GBASE-R, also known as XFI, is the protocol spoken between the PMA and PMD ethernet layers for 10GBASE-T and 10GBASE-S/L/E. It is typically used to communicate directly with SFP+ modules, or with 10GBASE-T phys. Signed-off-by: Sean Anderson Acked-by: Rob Herring --- PR increasing phy-type maximum [1]. If this commit could be applied sooner rather than later, I'd appreciate it. This should help avoid another respin if someone else adds another phy type. [1] https://github.com/devicetree-org/dt-schema/pull/85 (no changes since v6) Changes in v6: - Bump PHY_TYPE_2500BASEX to 13, since PHY_TYPE_USXGMII was added in the meantime Changes in v4: - New include/dt-bindings/phy/phy.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/dt-bindings/phy/phy.h b/include/dt-bindings/phy/phy.h index 6b901b342348..5b2b674d8d25 100644 --- a/include/dt-bindings/phy/phy.h +++ b/include/dt-bindings/phy/phy.h @@ -23,5 +23,7 @@ #define PHY_TYPE_DPHY 10 #define PHY_TYPE_CPHY 11 #define PHY_TYPE_USXGMII 12 +#define PHY_TYPE_2500BASEX 13 +#define PHY_TYPE_10GBASER 14 #endif /* _DT_BINDINGS_PHY */ -- 2.35.1.1320.gc452695387.dirty
[PATCH v8 2/9] dt-bindings: phy: Add Lynx 10G phy binding
This adds a binding for the SerDes module found on QorIQ processors. Each phy is a subnode of the top-level device, possibly supporting multiple lanes and protocols. This "thick" #phy-cells is used due to allow for better organization of parameters. Note that the particular parameters necessary to select a protocol-controller/lane combination vary across different SoCs, and even within different SerDes on the same SoC. The driver is designed to be able to completely reconfigure lanes at runtime. Generally, the phy consumer can select the appropriate protocol using set_mode. There are two PLLs, each of which can be used as the master clock for each lane. Each PLL has its own reference. For the moment they are required, because it simplifies the driver implementation. Absent reference clocks can be modeled by a fixed-clock with a rate of 0. Signed-off-by: Sean Anderson Reviewed-by: Rob Herring Acked-by: Krzysztof Kozlowski --- (no changes since v7) Changes in v7: - Use double quotes everywhere in yaml Changes in v6: - fsl,type -> phy-type Changes in v4: - Use subnodes to describe lane configuration, instead of describing PCCRs. This is the same style used by phy-cadence-sierra et al. Changes in v3: - Manually expand yaml references - Add mode configuration to device tree Changes in v2: - Rename to fsl,lynx-10g.yaml - Refer to the device in the documentation, rather than the binding - Move compatible first - Document phy cells in the description - Allow a value of 1 for phy-cells. This allows for compatibility with the similar (but according to Ioana Ciornei different enough) lynx-28g binding. - Remove minItems - Use list for clock-names - Fix example binding having too many cells in regs - Add #clock-cells. This will allow using assigned-clocks* to configure the PLLs. - Document the structure of the compatible strings .../devicetree/bindings/phy/fsl,lynx-10g.yaml | 236 ++ 1 file changed, 236 insertions(+) create mode 100644 Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml diff --git a/Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml b/Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml new file mode 100644 index ..f326fdc159c5 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/fsl,lynx-10g.yaml @@ -0,0 +1,236 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/phy/fsl,lynx-10g.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: NXP Lynx 10G SerDes + +maintainers: + - Sean Anderson + +description: | + These Lynx "SerDes" devices are found in NXP's QorIQ line of processors. The + SerDes provides up to eight lanes. Each lane may be configured individually, + or may be combined with adjacent lanes for a multi-lane protocol. The SerDes + supports a variety of protocols, including up to 10G Ethernet, PCIe, SATA, and + others. The specific protocols supported for each lane depend on the + particular SoC. + +properties: + compatible: +items: + - enum: + - fsl,ls1046a-serdes + - fsl,ls1088a-serdes + - const: fsl,lynx-10g + + "#address-cells": +const: 1 + + "#size-cells": +const: 0 + + "#clock-cells": +const: 1 +description: | + The cell contains an ID as described in dt-bindings/clock/fsl,lynx-10g.h. + Note that when assigning a rate to a PLL, the PLL's rate is divided by + 1000 to avoid overflow. A rate of 500 corresponds to 5GHz. + + clocks: +maxItems: 2 +description: | + Clock for each PLL reference clock input. + + clock-names: +minItems: 2 +maxItems: 2 +items: + enum: +- ref0 +- ref1 + + reg: +maxItems: 1 + +patternProperties: + "^phy@": +type: object + +description: | + A contiguous group of lanes which will be configured together. Each group + corresponds to one phy device. Lanes not described by any group will be + left as-is. + +properties: + "#phy-cells": +const: 0 + + reg: +minItems: 1 +maxItems: 8 +description: + The lanes in the group. These must be listed in order. The first lane + will have the FIRST_LANE bit set in GCR0. The order of lanes also + determines the reset order (TRSTDIR). + +patternProperties: + "^(q?sgmii|xfi)": +type: object + +description: | + A protocol controller which may control the group of lanes. Each + controller is selected through the PCCRs. In addition to protocols + desired for use by the OS, protocols which may have been configured + by the bootloader must also be described. This ensures that only one + protocol controller is attached to a group of lanes at once. + +properties: + fsl,pccr: +$ref: /schemas/types.yaml#/definitions/uint32 +description: | + The index of
[PATCH v8 0/9] phy: Add support for Lynx 10G SerDes
This adds support for the Lynx 10G SerDes found on the QorIQ T-series and Layerscape series. Due to limited time and hardware, only support for the LS1046ARDB and LS1088ARDB is added in this initial series. This series is based on phy/next, but it requires phylink support. This is already present for the LS1088A, and it was recently added for the LS1046A in net-next/master. Dynamic reconfiguration does not work. That is, the configuration must match what is set in the RCW. From my testing, SerDes register settings appear identical. The issue appears to be between the PCS and the MAC. The link itself comes up at both ends, and a mac loopback succeeds. However, a PCS loopback results in dropped packets. Perhaps there is some undocumented register in the PCS? I suspect this driver is around 95% complete, but I don't have the documentation to make it work completely. At the very least it is useful for two cases: - Although this is untested, it should support 2.5G SGMII as well as 1000BASE-KX. The latter needs MAC and PCS support, but the former should work out of the box. - It allows for clock configurations not supported by the RCW. This is very useful if you want to use e.g. SRDS_PRTCL_S1=0x and =0x1133 on the same board. This is because the former setting will use PLL1 as the 1G reference, but the latter will use PLL1 as the 10G reference. Because we can reconfigure the PLLs, it is possible to always use PLL1 as the 1G reference. The final patch in this series should not be applied, as it depends on recent MC firmware (and configuration). See the commit message for details. Changes in v8: - Remove unused variable from lynx_ls_mode_init - Rename serdes phy handles to use _A, _B, etc. instead of _0, _1, etc. This should help remind readers that the numbering corresponds to the physical layout of the registers, and not the lane (pin) number. - Prevent PCSs from probing as phys - Rename serdes phy handles like the LS1046A - Add SFP slot binding - Fix incorrect lane ordering (it's backwards on the LS1088A just like it is in the LS1046A). - Fix duplicated lane 2 (it should have been lane 3). - Fix incorrectly-documented value for XFI1. - Remove interrupt for aquantia phy. It never fired for whatever reason, preventing the link from coming up. - Add GPIOs for QIXIS FPGA. - Enable MAC1 PCS - Remove si5341 binding Changes in v7: - Use double quotes everywhere in yaml - Break out call order into generic documentation - Refuse to switch "major" protocols - Update Kconfig to reflect restrictions - Remove set/clear of "pcs reset" bit, since it doesn't seem to fix anything. Changes in v6: - Bump PHY_TYPE_2500BASEX to 13, since PHY_TYPE_USXGMII was added in the meantime - fsl,type -> phy-type - frequence -> frequency - Update MAINTAINERS to include new files - Include bitfield.h and slab.h to allow compilation on non-arm64 arches. - Depend on COMMON_CLK and either layerscape/ppc - XGI.9 -> XFI.9 Changes in v5: - Update commit description - Dual id header - Remove references to PHY_INTERFACE_MODE_1000BASEKX to allow this series to be applied directly to linux/master. - Add fsl,lynx-10g.h to MAINTAINERS Changes in v4: - Add 2500BASE-X and 10GBASE-R phy types - Use subnodes to describe lane configuration, instead of describing PCCRs. This is the same style used by phy-cadence-sierra et al. - Add ids for Lynx 10g PLLs - Rework all debug statements to remove use of __func__. Additional information has been provided as necessary. - Consider alternative parent rates in round_rate and not in set_rate. Trying to modify out parent's rate in set_rate will deadlock. - Explicitly perform a stop/reset sequence in set_rate. This way we always ensure that the PLL is properly stopped. - Set the power-down bit when disabling the PLL. We can do this now that enable/disable aren't abused during the set rate sequence. - Fix typos in QSGMII_OFFSET and XFI_OFFSET - Rename LNmTECR0_TEQ_TYPE_PRE to LNmTECR0_TEQ_TYPE_POST to better reflect its function (adding post-cursor equalization). - Use of_clk_hw_onecell_get instead of a custom function. - Return struct clks from lynx_clks_init instead of embedding lynx_clk in lynx_priv. - Rework PCCR helper functions; T-series SoCs differ from Layerscape SoCs primarily in the layout and offset of the PCCRs. This will help bring a cleaner abstraction layer. The caps have been removed, since this handles the only current usage. - Convert to use new binding format. As a result of this, we no longer need to have protocols for PCIe or SATA. Additionally, modes now live in lynx_group instead of lynx_priv. - Remove teq from lynx_proto_params, since it can be determined from preq_ratio/postq_ratio. - Fix an early return from lynx_set_mode not releasing serdes->lock. - Rename lynx_priv.conf to .cfg, since I kept mistyping it. Changes in v3: - Manually expand yaml references - Add mode configuration to device tree - Rename remaining references to
Re: [PATCH 1/3] ASoC: dt-bindings: fsl,sai: Add compatible string for i.MX93 platform
On 27/10/2022 02:03, Chancel Liu wrote: > Add compatible string "fsl,imx93-sai" for i.MX93 platform > > Signed-off-by: Chancel Liu > --- > Documentation/devicetree/bindings/sound/fsl,sai.yaml | 1 + Acked-by: Krzysztof Kozlowski Best regards, Krzysztof
[PATCH] powerpc/64e: Fix amdgpu build on Book3E w/o AltiVec
There's a build failure for Book3E without AltiVec: Error: cc1: error: AltiVec not supported in this target make[6]: *** [/linux/scripts/Makefile.build:250: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o] Error 1 This happens because the amdgpu build is only gated by PPC_LONG_DOUBLE_128, but that symbol can be enabled even though AltiVec is disabled. The only user of PPC_LONG_DOUBLE_128 is amdgpu, so just add a dependency on AltiVec to that symbol to fix the build. Signed-off-by: Michael Ellerman --- arch/powerpc/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 699df27b0e2f..20fb1765238c 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -285,7 +285,7 @@ config PPC # config PPC_LONG_DOUBLE_128 - depends on PPC64 + depends on PPC64 && ALTIVEC def_bool $(success,test "$(shell,echo __LONG_DOUBLE_128__ | $(CC) -E -P -)" = 1) config PPC_BARRIER_NOSPEC -- 2.37.3
Re: [PATCH net-next] eth: fealnx: delete the driver for Myson MTD-800
Hello: This patch was applied to netdev/net-next.git (master) by Paolo Abeni : On Tue, 25 Oct 2022 11:42:54 -0700 you wrote: > The git history for this driver seems to be completely > automated / tree wide changes. I can't find any boards > or systems which would use this chip. Google search > shows pictures of towel warmers and no networking products. > > Signed-off-by: Jakub Kicinski > > [...] Here is the summary with links: - [net-next] eth: fealnx: delete the driver for Myson MTD-800 https://git.kernel.org/netdev/net-next/c/d5e2d038dbec You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [PATCH v4 2/2] arm64: support batched/deferred tlb shootdown during page reclamation
On 9/28/22 05:53, Barry Song wrote: > On Tue, Sep 27, 2022 at 10:15 PM Yicong Yang wrote: >> >> On 2022/9/27 14:16, Anshuman Khandual wrote: >>> [...] >>> >>> On 9/21/22 14:13, Yicong Yang wrote: +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm) +{ +/* for small systems with small number of CPUs, TLB shootdown is cheap */ +if (num_online_cpus() <= 4) >>> >>> It would be great to have some more inputs from others, whether 4 (which >>> should >>> to be codified into a macro e.g ARM64_NR_CPU_DEFERRED_TLB, or something >>> similar) >>> is optimal for an wide range of arm64 platforms. >>> > > I have tested it on a 4-cpus and 8-cpus machine. but i have no machine > with 5,6,7 > cores. > I saw improvement on 8-cpus machines and I found 4-cpus machines don't need > this patch. > > so it seems safe to have > if (num_online_cpus() < 8) > >> >> Do you prefer this macro to be static or make it configurable through >> kconfig then >> different platforms can make choice based on their own situations? It maybe >> hard to >> test on all the arm64 platforms. > > Maybe we can have this default enabled on machines with 8 and more cpus and > provide a tlbflush_batched = on or off to allow users enable or > disable it according > to their hardware and products. Similar example: rodata=on or off. No, sounds bit excessive. Kernel command line options should not be added for every possible run time switch options. > > Hi Anshuman, Will, Catalin, Andrew, > what do you think about this approach? > > BTW, haoxin mentioned another important user scenarios for tlb bach on arm64: > https://lore.kernel.org/lkml/393d6318-aa38-01ed-6ad8-f9eac89bf...@linux.alibaba.com/ > > I do believe we need it based on the expensive cost of tlb shootdown in arm64 > even by hardware broadcast. Alright, for now could we enable ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH selectively with CONFIG_EXPERT and for num_online_cpus() > 8 ?
Re: [PATCH] powerpc/books: Never call nmi_enter for real-mode NMIs
Hi Nicholas, I love your patch! Yet something to improve: [auto build test ERROR on powerpc/next] [also build test ERROR on linus/master v6.1-rc2 next-20221027] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Nicholas-Piggin/powerpc-books-Never-call-nmi_enter-for-real-mode-NMIs/20221027-154503 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next patch link: https://lore.kernel.org/r/20221027074314.2084016-1-npiggin%40gmail.com patch subject: [PATCH] powerpc/books: Never call nmi_enter for real-mode NMIs config: powerpc-allnoconfig compiler: powerpc-linux-gcc (GCC) 12.1.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/57c237cb1fe034e85db9cd44fb09986fa27b2197 git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Nicholas-Piggin/powerpc-books-Never-call-nmi_enter-for-real-mode-NMIs/20221027-154503 git checkout 57c237cb1fe034e85db9cd44fb09986fa27b2197 # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/lib/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot All errors (new ones prefixed by >>): In file included from arch/powerpc/lib/feature-fixups.c:20: arch/powerpc/include/asm/interrupt.h: In function 'interrupt_nmi_enter_prepare': >> arch/powerpc/include/asm/interrupt.h:349:18: error: 'struct >> interrupt_nmi_state' has no member named 'mmu_enabled' 349 | if (state->mmu_enabled) | ^~ In file included from arch/powerpc/include/asm/bug.h:158, from include/linux/bug.h:5, from arch/powerpc/include/asm/cmpxchg.h:8, from arch/powerpc/include/asm/atomic.h:11, from include/linux/atomic.h:7, from include/linux/jump_label.h:254, from arch/powerpc/lib/feature-fixups.c:12: arch/powerpc/include/asm/interrupt.h: In function 'interrupt_nmi_exit_prepare': arch/powerpc/include/asm/interrupt.h:355:27: error: 'struct interrupt_nmi_state' has no member named 'mmu_enabled' 355 | WARN_ON_ONCE(state->mmu_enabled != !!(mfmsr() & MSR_DR)); | ^~ include/asm-generic/bug.h:110:32: note: in definition of macro 'WARN_ON_ONCE' 110 | int __ret_warn_on = !!(condition); \ |^ arch/powerpc/include/asm/interrupt.h:357:18: error: 'struct interrupt_nmi_state' has no member named 'mmu_enabled' 357 | if (state->mmu_enabled) | ^~ vim +349 arch/powerpc/include/asm/interrupt.h 337 338 /* 339 * If data relocations are enabled, it's safe to use nmi_enter(). 340 * Otherwise avoid using it because the core kernel may touch 341 * vmalloc (e.g., in per-CPU variables), which is not accessible 342 * with the MMU off. Linear memory beyond the VRMA limit is also 343 * a problem for hash guests. 344 * 345 * The real-mode machine checks should not use RCU, tracing, lockdep 346 * locks, and should not printk, access per-CPU variables, among 347 * many other restrictions. 348 */ > 349 if (state->mmu_enabled) 350 nmi_enter(); 351 } 352 -- 0-DAY CI Kernel Test Service https://01.org/lkp # # Automatically generated file; DO NOT EDIT. # Linux/powerpc 6.1.0-rc2 Kernel Configuration # CONFIG_CC_VERSION_TEXT="powerpc-linux-gcc (GCC) 12.1.0" CONFIG_CC_IS_GCC=y CONFIG_GCC_VERSION=120100 CONFIG_CLANG_VERSION=0 CONFIG_AS_IS_GNU=y CONFIG_AS_VERSION=23800 CONFIG_LD_IS_BFD=y CONFIG_LD_VERSION=23800 CONFIG_LLD_VERSION=0 CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y CONFIG_CC_HAS_ASM_INLINE=y CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y CONFIG_PAHOLE_VERSION=123 CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_TABLE_SORT=y CONFIG_THREAD_INFO_IN_TASK=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 # CONFIG_COMPILE_TEST is not set # CONFIG_WERROR is not set CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_BUILD_SALT="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_XZ is not set CONFIG_DEFAULT_INIT="&qu
[PATCH v3] soc: fsl: qe: Avoid using gpio_to_desc()
The qe gpio driver is a custom API combined GPIO and pin control driver that exist outside of the pin control subsystem for historical reasons. We want to get rid of the old GPIO numberspace, so instead of calling gpio_to_desc() we get the gpio descriptor for the requested line from the device tree directly without passing through the GPIO numberspace, and then we get the gpiochip from the descriptor. Using the reference counting inside the gpio descriptor we can drop the reference counting code in this driver. A second gpiod_get() will not succeed. To obtain the local hardware offset of the GPIO line, the driver need to include the header from the gpiolib internals. This isn't pretty but it is the lesser evil compared to keeping the code as a roadblock to gpiolib refactoring. A proper solution would be to rewrite the driver as a real pin control driver with a built-in gpio_chip. Cc: Bartosz Golaszewski Cc: linux-g...@vger.kernel.org Signed-off-by: Linus Walleij --- SoC folks: please apply this directly for -next if it seems OK. ChangeLog v2->v3: - Realize what the driver is trying to do and make a deeper refactoring to get at gpiolib internals. ChangeLog v1->v2: - Add back the include: we are using the mm_of_gpio_chip, which should be fixed, but not now. --- drivers/soc/fsl/qe/gpio.c | 66 ++- 1 file changed, 30 insertions(+), 36 deletions(-) diff --git a/drivers/soc/fsl/qe/gpio.c b/drivers/soc/fsl/qe/gpio.c index 99f7de43c3c6..af9193e7e49b 100644 --- a/drivers/soc/fsl/qe/gpio.c +++ b/drivers/soc/fsl/qe/gpio.c @@ -14,20 +14,23 @@ #include #include #include +#include #include -/* FIXME: needed for gpio_to_chip() get rid of this */ -#include #include #include #include +/* + * FIXME: this is legacy code that is accessing gpiolib internals in order + * to implement a custom pin controller. The proper solution is to create + * a real combined pin control and GPIO driver in drivers/pinctrl. However + * this hack is here for legacy code reasons. + */ +#include "../../../gpio/gpiolib.h" struct qe_gpio_chip { struct of_mm_gpio_chip mm_gc; spinlock_t lock; - unsigned long pin_flags[QE_PIO_PINS]; -#define QE_PIN_REQUESTED 0 - /* shadowed data register to clear/set bits safely */ u32 cpdata; @@ -144,6 +147,7 @@ struct qe_pin { * something like qe_pio_controller. Someday. */ struct qe_gpio_chip *controller; + struct gpio_desc *gpiod; int num; }; @@ -160,9 +164,8 @@ struct qe_pin *qe_pin_request(struct device_node *np, int index) { struct qe_pin *qe_pin; struct gpio_chip *gc; - struct qe_gpio_chip *qe_gc; + struct gpio_desc *gpiod; int err; - unsigned long flags; qe_pin = kzalloc(sizeof(*qe_pin), GFP_KERNEL); if (!qe_pin) { @@ -170,38 +173,36 @@ struct qe_pin *qe_pin_request(struct device_node *np, int index) return ERR_PTR(-ENOMEM); } - err = of_get_gpio(np, index); - if (err < 0) + gpiod = fwnode_gpiod_get_index(of_fwnode_handle(np), NULL, index, GPIOD_ASIS, "qe"); + if (IS_ERR(gpiod)) { + err = PTR_ERR(gpiod); + goto err0; + } + if (!gpiod) { + err = -EINVAL; goto err0; - gc = gpio_to_chip(err); + } + gc = gpiod_to_chip(gpiod); if (WARN_ON(!gc)) { err = -ENODEV; goto err0; } + qe_pin->gpiod = gpiod; + qe_pin->controller = gpiochip_get_data(gc); + /* +* FIXME: this gets the local offset on the gpio_chip so that the driver +* can manipulate pin control settings through its custom API. The real +* solution is to create a real pin control driver for this. +*/ + qe_pin->num = gpio_chip_hwgpio(gpiod); if (!of_device_is_compatible(gc->of_node, "fsl,mpc8323-qe-pario-bank")) { pr_debug("%s: tried to get a non-qe pin\n", __func__); + gpiod_put(gpiod); err = -EINVAL; goto err0; } - - qe_gc = gpiochip_get_data(gc); - - spin_lock_irqsave(_gc->lock, flags); - - err -= gc->base; - if (test_and_set_bit(QE_PIN_REQUESTED, _gc->pin_flags[err]) == 0) { - qe_pin->controller = qe_gc; - qe_pin->num = err; - err = 0; - } else { - err = -EBUSY; - } - - spin_unlock_irqrestore(_gc->lock, flags); - - if (!err) - return qe_pin; + return qe_pin; err0: kfree(qe_pin); pr_debug("%s failed with status %d\n", __func__, err); @@ -219,14 +220,7 @@ EXPORT_SYMBOL(qe_pin_request); */ void qe_pin_free(struct qe_pin *qe_pin) { - struct qe_gpio_chip *qe_gc = qe_pin->controller; - unsigned long flags; - const int pin = qe_pin->num; - -
Re: [PATCH] ALSA: aoa: i2sbus: fix possible memory leak in i2sbus_add_dev()
On Thu, 27 Oct 2022 09:41:03 +0200, Yang Yingliang wrote: > > > On 2022/10/27 14:38, Takashi Iwai wrote: > > On Thu, 27 Oct 2022 03:34:38 +0200, > > Yang Yingliang wrote: > >> dev_set_name() in soundbus_add_one() allocates memory for name, it need be > >> freed when of_device_register() fails, call soundbus_dev_put() to give up > >> the reference that hold in device_initialize(), so that it can be freed in > >> kobject_cleanup() when the refcount hit to 0. And other resources are also > >> freed in i2sbus_release_dev(), so it can return 0 directly. > >> > >> Fixes: f3d9478b2ce4 ("[ALSA] snd-aoa: add snd-aoa") > >> Signed-off-by: Yang Yingliang > > The check of kobj state is awkward, but it seems to be the simplest > > way... Applied now. Thanks! > > Indeed, it's awkward, shall we introduce a helper like this: > > diff --git a/sound/aoa/soundbus/i2sbus/core.c > b/sound/aoa/soundbus/i2sbus/core.c > index f6841daf9e3b..950c37e0297e 100644 > --- a/sound/aoa/soundbus/i2sbus/core.c > +++ b/sound/aoa/soundbus/i2sbus/core.c > @@ -302,7 +302,7 @@ static int i2sbus_add_dev(struct macio_dev *macio, > > if (soundbus_add_one(>sound)) { > printk(KERN_DEBUG "i2sbus: device registration error!\n"); > - if (dev->sound.ofdev.dev.kobj.state_initialized) { > + if (soundbus_dev_initialized(>sound)) { > soundbus_dev_put(>sound); > return 0; > } > diff --git a/sound/aoa/soundbus/soundbus.h b/sound/aoa/soundbus/soundbus.h > index 3a99c1f1a3ca..2c9c95cf156a 100644 > --- a/sound/aoa/soundbus/soundbus.h > +++ b/sound/aoa/soundbus/soundbus.h > @@ -174,6 +174,10 @@ struct soundbus_dev { > > extern int soundbus_add_one(struct soundbus_dev *dev); > extern void soundbus_remove_one(struct soundbus_dev *dev); > +static inline bool soundbus_dev_initialized(struct soundbus_dev *dev) > +{ > + return dev->ofdev.dev.kobj.state_initialized; > +} I think it's not worth much as it's used only at a single place. Takashi
[PATCH] powerpc/books: Never call nmi_enter for real-mode NMIs
NMIs that are taken in real mode (the early MCE and HMI handlers) skipped calling nmi_enter() in some configurations, in the hope that more modern configurations like radix suffer fewer restrictions. This just turns into whack-a-mole and fragile when core kernel code changes anything. A recent such example that breaks with radix, an HMI real mode interrupt tries to access vmalloc memory, causing it to take a machine check: --- interrupt: 200 at perf_trace_rcu_dyntick+0x140/0x190 NIP: c01d4720 LR: c01d2bb4 CTR: c01d45e0 REGS: c00fffdbfd60 TRAP: 0200 Tainted: G M(6.0.0-dirty) MSR: 90201003 CR: 24024228 XER: 2004 CFAR: c01d4648 DAR: c009e16e29a8 DSISR: 0008 IRQMASK: 3 GPR00: c01d2bb4 c00fffdc7b30 c255c100 c23089f8 GPR04: c1bd0438 4000 4002 00964794 GPR08: c009dff0055b29a8 000ffc13 7265677368657265 GPR12: c01d45e0 c00d7000 c014e7c8 cab74280 GPR16: c31a64d8 GPR20: cd9f7b00 0006 c2446a28 c009e16e29a8 GPR24: c1bd0438 4000 4002 00964794 GPR28: c01d2bb4 4002 c23089f8 c000263f0668 perf_trace_rcu_dyntick+0x140/0x190 __traceiter_rcu_dyntick+0x84/0xc0 --- interrupt: 200 rcu_read_lock_sched_held+0x10/0xe0 (unreliable) __traceiter_rcu_dyntick+0x84/0xc0 ct_nmi_enter+0x118/0x280 interrupt_nmi_enter_prepare+0x118/0x1f0 hmi_exception_realmode+0x38/0xe4 hmi_exception_early_common+0x114/0x2a0 --- interrupt: e60 at arch_local_irq_restore+0x11c/0x1b0 Just disable this entirely. It turns out the features that might be enabled by nmi_enter(), like RCU or printk are unlikely to be usable in real mode anyway. Reported-by: Michael Ellerman Cc: Mahesh Salgaonkar Signed-off-by: Nicholas Piggin --- arch/powerpc/include/asm/interrupt.h | 49 +--- 1 file changed, 16 insertions(+), 33 deletions(-) diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h index 4745bb9998bd..3e87e9ec5117 100644 --- a/arch/powerpc/include/asm/interrupt.h +++ b/arch/powerpc/include/asm/interrupt.h @@ -276,6 +276,7 @@ struct interrupt_nmi_state { u8 irq_soft_mask; u8 irq_happened; u8 ftrace_enabled; + u8 mmu_enabled; u64 softe; #endif }; @@ -303,6 +304,7 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte state->irq_soft_mask = local_paca->irq_soft_mask; state->irq_happened = local_paca->irq_happened; state->softe = regs->softe; + state->mmu_enabled = !!(mfmsr() & MSR_DR); /* * Set IRQS_ALL_DISABLED unconditionally so irqs_disabled() does @@ -333,46 +335,27 @@ static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct inte } #endif - /* If data relocations are enabled, it's safe to use nmi_enter() */ - if (mfmsr() & MSR_DR) { - nmi_enter(); - return; - } - - /* -* But do not use nmi_enter() for pseries hash guest taking a real-mode -* NMI because not everything it touches is within the RMA limit. -*/ - if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && - firmware_has_feature(FW_FEATURE_LPAR) && - !radix_enabled()) - return; - /* -* Likewise, don't use it if we have some form of instrumentation (like -* KASAN shadow) that is not safe to access in real mode (even on radix) +* If data relocations are enabled, it's safe to use nmi_enter(). +* Otherwise avoid using it because the core kernel may touch +* vmalloc (e.g., in per-CPU variables), which is not accessible +* with the MMU off. Linear memory beyond the VRMA limit is also +* a problem for hash guests. +* +* The real-mode machine checks should not use RCU, tracing, lockdep +* locks, and should not printk, access per-CPU variables, among +* many other restrictions. */ - if (IS_ENABLED(CONFIG_KASAN)) - return; - - /* Otherwise, it should be safe to call it */ - nmi_enter(); + if (state->mmu_enabled) + nmi_enter(); } static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state) { - if (mfmsr() & MSR_DR) { - // nmi_exit if relocations are on - nmi_exit(); - } else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && - firmware_has_feature(FW_FEATURE_LPAR) && - !radix_enabled()) { - // no nmi_exit for a pseries hash guest taking a real mode exception - } else if (IS_ENABLED(CONFIG_KASAN)) { - // no nmi_exit for KASAN
Re: [PATCH] ALSA: aoa: i2sbus: fix possible memory leak in i2sbus_add_dev()
On 2022/10/27 14:38, Takashi Iwai wrote: On Thu, 27 Oct 2022 03:34:38 +0200, Yang Yingliang wrote: dev_set_name() in soundbus_add_one() allocates memory for name, it need be freed when of_device_register() fails, call soundbus_dev_put() to give up the reference that hold in device_initialize(), so that it can be freed in kobject_cleanup() when the refcount hit to 0. And other resources are also freed in i2sbus_release_dev(), so it can return 0 directly. Fixes: f3d9478b2ce4 ("[ALSA] snd-aoa: add snd-aoa") Signed-off-by: Yang Yingliang The check of kobj state is awkward, but it seems to be the simplest way... Applied now. Thanks! Indeed, it's awkward, shall we introduce a helper like this: diff --git a/sound/aoa/soundbus/i2sbus/core.c b/sound/aoa/soundbus/i2sbus/core.c index f6841daf9e3b..950c37e0297e 100644 --- a/sound/aoa/soundbus/i2sbus/core.c +++ b/sound/aoa/soundbus/i2sbus/core.c @@ -302,7 +302,7 @@ static int i2sbus_add_dev(struct macio_dev *macio, if (soundbus_add_one(>sound)) { printk(KERN_DEBUG "i2sbus: device registration error!\n"); - if (dev->sound.ofdev.dev.kobj.state_initialized) { + if (soundbus_dev_initialized(>sound)) { soundbus_dev_put(>sound); return 0; } diff --git a/sound/aoa/soundbus/soundbus.h b/sound/aoa/soundbus/soundbus.h index 3a99c1f1a3ca..2c9c95cf156a 100644 --- a/sound/aoa/soundbus/soundbus.h +++ b/sound/aoa/soundbus/soundbus.h @@ -174,6 +174,10 @@ struct soundbus_dev { extern int soundbus_add_one(struct soundbus_dev *dev); extern void soundbus_remove_one(struct soundbus_dev *dev); +static inline bool soundbus_dev_initialized(struct soundbus_dev *dev) +{ + return dev->ofdev.dev.kobj.state_initialized; +} Thanks, Yang Takashi --- sound/aoa/soundbus/i2sbus/core.c | 4 1 file changed, 4 insertions(+) diff --git a/sound/aoa/soundbus/i2sbus/core.c b/sound/aoa/soundbus/i2sbus/core.c index faf6b03131ee..f6841daf9e3b 100644 --- a/sound/aoa/soundbus/i2sbus/core.c +++ b/sound/aoa/soundbus/i2sbus/core.c @@ -302,6 +302,10 @@ static int i2sbus_add_dev(struct macio_dev *macio, if (soundbus_add_one(>sound)) { printk(KERN_DEBUG "i2sbus: device registration error!\n"); + if (dev->sound.ofdev.dev.kobj.state_initialized) { + soundbus_dev_put(>sound); + return 0; + } goto err; } -- 2.25.1 .
Re: [PATCH kernel v2 0/3] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
Michael, Fred, ping? On 20/09/2022 23:04, Alexey Kardashevskiy wrote: Here is another take on iommu_ops on POWER to make VFIO work again on POWERPC64. Tested on PPC, kudos to Fred! The tree with all prerequisites is here: https://github.com/aik/linux/tree/kvm-fixes-wip The previous discussion is here: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20220707135552.3688927-1-...@ozlabs.ru/ https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-...@ozlabs.ru/ https://lore.kernel.org/all/20220714081822.3717693-3-...@ozlabs.ru/T/ Please comment. Thanks. This is based on sha1 ce888220d5c7 Linus Torvalds "Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi". Please comment. Thanks. Alexey Kardashevskiy (3): powerpc/iommu: Add "borrowing" iommu_table_group_ops powerpc/pci_64: Init pcibios subsys a bit later powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains arch/powerpc/include/asm/iommu.h | 6 +- arch/powerpc/include/asm/pci-bridge.h | 7 + arch/powerpc/platforms/pseries/pseries.h | 4 + arch/powerpc/kernel/iommu.c | 247 +- arch/powerpc/kernel/pci_64.c | 2 +- arch/powerpc/platforms/powernv/pci-ioda.c | 36 +++- arch/powerpc/platforms/pseries/iommu.c| 27 +++ arch/powerpc/platforms/pseries/setup.c| 3 + drivers/vfio/vfio_iommu_spapr_tce.c | 96 ++--- 9 files changed, 334 insertions(+), 94 deletions(-) -- Alexey
Re: [PATCH 0/3] Add support for SAI on i.MX93 platform
On Thu, Oct 27, 2022 at 9:14 AM Chancel Liu wrote: > > This patchset supports SAI on i.MX93 platform. > > Chancel Liu (3): > ASoC: dt-bindings: fsl,sai: Add compatible string for i.MX93 platform > ASoC: fsl_sai: Add support for i.MX93 platform > ASoC: fsl_sai: Specify the maxburst to 8 on i.MX93 platform Reviewed-by: Daniel Baluta
Re: [PATCH] ALSA: aoa: i2sbus: fix possible memory leak in i2sbus_add_dev()
On Thu, 27 Oct 2022 03:34:38 +0200, Yang Yingliang wrote: > > dev_set_name() in soundbus_add_one() allocates memory for name, it need be > freed when of_device_register() fails, call soundbus_dev_put() to give up > the reference that hold in device_initialize(), so that it can be freed in > kobject_cleanup() when the refcount hit to 0. And other resources are also > freed in i2sbus_release_dev(), so it can return 0 directly. > > Fixes: f3d9478b2ce4 ("[ALSA] snd-aoa: add snd-aoa") > Signed-off-by: Yang Yingliang The check of kobj state is awkward, but it seems to be the simplest way... Applied now. Thanks! Takashi > --- > sound/aoa/soundbus/i2sbus/core.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/sound/aoa/soundbus/i2sbus/core.c > b/sound/aoa/soundbus/i2sbus/core.c > index faf6b03131ee..f6841daf9e3b 100644 > --- a/sound/aoa/soundbus/i2sbus/core.c > +++ b/sound/aoa/soundbus/i2sbus/core.c > @@ -302,6 +302,10 @@ static int i2sbus_add_dev(struct macio_dev *macio, > > if (soundbus_add_one(>sound)) { > printk(KERN_DEBUG "i2sbus: device registration error!\n"); > + if (dev->sound.ofdev.dev.kobj.state_initialized) { > + soundbus_dev_put(>sound); > + return 0; > + } > goto err; > } > > -- > 2.25.1 >
[PATCH 2/3] ASoC: fsl_sai: Add support for i.MX93 platform
Add compatible string and specific soc data to support SAI on i.MX93 platform. Signed-off-by: Chancel Liu --- sound/soc/fsl/fsl_sai.c | 12 1 file changed, 12 insertions(+) diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c index 81f89f6767a2..68e1cc4c369a 100644 --- a/sound/soc/fsl/fsl_sai.c +++ b/sound/soc/fsl/fsl_sai.c @@ -1579,6 +1579,17 @@ static const struct fsl_sai_soc_data fsl_sai_imx8ulp_data = { .max_register = FSL_SAI_RTCAP, }; +static const struct fsl_sai_soc_data fsl_sai_imx93_data = { + .use_imx_pcm = true, + .use_edma = true, + .fifo_depth = 128, + .reg_offset = 8, + .mclk0_is_mclk1 = false, + .pins = 4, + .flags = 0, + .max_register = FSL_SAI_MCTL, +}; + static const struct of_device_id fsl_sai_ids[] = { { .compatible = "fsl,vf610-sai", .data = _sai_vf610_data }, { .compatible = "fsl,imx6sx-sai", .data = _sai_imx6sx_data }, @@ -1590,6 +1601,7 @@ static const struct of_device_id fsl_sai_ids[] = { { .compatible = "fsl,imx8mp-sai", .data = _sai_imx8mp_data }, { .compatible = "fsl,imx8ulp-sai", .data = _sai_imx8ulp_data }, { .compatible = "fsl,imx8mn-sai", .data = _sai_imx8mp_data }, + { .compatible = "fsl,imx93-sai", .data = _sai_imx93_data }, { /* sentinel */ } }; MODULE_DEVICE_TABLE(of, fsl_sai_ids); -- 2.25.1
[PATCH 3/3] ASoC: fsl_sai: Specify the maxburst to 8 on i.MX93 platform
There is a limit to eDMA AXI on i.MX93. Only TCD that has NBYTES in a multiple of 8bytes can enable scatter-gather. NBYTES is calculated by bus width times maxburst. On i.MX93 platform the value of maxburst is specified to 8. It makes sure that NBYTES is a multiple of 8bytes. Signed-off-by: Chancel Liu --- sound/soc/fsl/fsl_sai.c | 11 +++ sound/soc/fsl/fsl_sai.h | 1 + 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c index 68e1cc4c369a..a0ea27f06997 100644 --- a/sound/soc/fsl/fsl_sai.c +++ b/sound/soc/fsl/fsl_sai.c @@ -872,10 +872,10 @@ static int fsl_sai_dai_probe(struct snd_soc_dai *cpu_dai) regmap_update_bits(sai->regmap, FSL_SAI_TCR1(ofs), FSL_SAI_CR1_RFW_MASK(sai->soc_data->fifo_depth), - sai->soc_data->fifo_depth - FSL_SAI_MAXBURST_TX); + sai->soc_data->fifo_depth - sai->dma_params_tx.maxburst); regmap_update_bits(sai->regmap, FSL_SAI_RCR1(ofs), FSL_SAI_CR1_RFW_MASK(sai->soc_data->fifo_depth), - FSL_SAI_MAXBURST_RX - 1); + sai->dma_params_rx.maxburst - 1); snd_soc_dai_init_dma_data(cpu_dai, >dma_params_tx, >dma_params_rx); @@ -1416,8 +1416,10 @@ static int fsl_sai_probe(struct platform_device *pdev) sai->dma_params_rx.addr = sai->res->start + FSL_SAI_RDR0; sai->dma_params_tx.addr = sai->res->start + FSL_SAI_TDR0; - sai->dma_params_rx.maxburst = FSL_SAI_MAXBURST_RX; - sai->dma_params_tx.maxburst = FSL_SAI_MAXBURST_TX; + sai->dma_params_rx.maxburst = + sai->soc_data->max_burst[RX] ? sai->soc_data->max_burst[RX] : FSL_SAI_MAXBURST_RX; + sai->dma_params_tx.maxburst = + sai->soc_data->max_burst[TX] ? sai->soc_data->max_burst[TX] : FSL_SAI_MAXBURST_TX; sai->pinctrl = devm_pinctrl_get(>dev); @@ -1588,6 +1590,7 @@ static const struct fsl_sai_soc_data fsl_sai_imx93_data = { .pins = 4, .flags = 0, .max_register = FSL_SAI_MCTL, + .max_burst = {8, 8}, }; static const struct of_device_id fsl_sai_ids[] = { diff --git a/sound/soc/fsl/fsl_sai.h b/sound/soc/fsl/fsl_sai.h index 697f6690068c..197748a888d5 100644 --- a/sound/soc/fsl/fsl_sai.h +++ b/sound/soc/fsl/fsl_sai.h @@ -235,6 +235,7 @@ struct fsl_sai_soc_data { unsigned int reg_offset; unsigned int flags; unsigned int max_register; + unsigned int max_burst[2]; }; /** -- 2.25.1
[PATCH 1/3] ASoC: dt-bindings: fsl,sai: Add compatible string for i.MX93 platform
Add compatible string "fsl,imx93-sai" for i.MX93 platform Signed-off-by: Chancel Liu --- Documentation/devicetree/bindings/sound/fsl,sai.yaml | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/sound/fsl,sai.yaml b/Documentation/devicetree/bindings/sound/fsl,sai.yaml index 70c4111d59c7..11aab891288d 100644 --- a/Documentation/devicetree/bindings/sound/fsl,sai.yaml +++ b/Documentation/devicetree/bindings/sound/fsl,sai.yaml @@ -26,6 +26,7 @@ properties: - fsl,imx8mq-sai - fsl,imx8qm-sai - fsl,imx8ulp-sai + - fsl,imx93-sai - items: - enum: - fsl,imx8mm-sai -- 2.25.1
[PATCH 0/3] Add support for SAI on i.MX93 platform
This patchset supports SAI on i.MX93 platform. Chancel Liu (3): ASoC: dt-bindings: fsl,sai: Add compatible string for i.MX93 platform ASoC: fsl_sai: Add support for i.MX93 platform ASoC: fsl_sai: Specify the maxburst to 8 on i.MX93 platform .../devicetree/bindings/sound/fsl,sai.yaml| 1 + sound/soc/fsl/fsl_sai.c | 23 +++ sound/soc/fsl/fsl_sai.h | 1 + 3 files changed, 21 insertions(+), 4 deletions(-) -- 2.25.1