[PATCH v2] powerpc: implement CONFIG_DEBUG_VIRTUAL
This patch implements CONFIG_DEBUG_VIRTUAL to warn about incorrect use of virt_to_phys() and page_to_phys() It also warns about DMA on stack when CONFIG_HAVE_ARCH_VMAP_STACK is selected. It will help locate them before activating CONFIG_VMAP_STACK Below is the result of test_debug_virtual: [1.438746] WARNING: CPU: 0 PID: 1 at ./arch/powerpc/include/asm/io.h:808 test_debug_virtual_init+0x3c/0xd4 [1.448156] CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc5-00560-g6bfb52e23a00-dirty #532 [1.457259] NIP: c066c550 LR: c0650ccc CTR: c066c514 [1.462257] REGS: c900bdb0 TRAP: 0700 Not tainted (4.20.0-rc5-00560-g6bfb52e23a00-dirty) [1.471184] MSR: 00029032 CR: 48000422 XER: 2000 [1.477811] [1.477811] GPR00: c0650ccc c900be60 c60d 006000c0 c900 9032 c7fa0020 [1.477811] GPR08: 2400 0001 0900 c07b5d04 c00037d8 [1.477811] GPR16: c076 c074 0092 c0685bb0 [1.477811] GPR24: c065042c c068a734 c0685b8c 0006 c076 c075c3c0 [1.512711] NIP [c066c550] test_debug_virtual_init+0x3c/0xd4 [1.518315] LR [c0650ccc] do_one_initcall+0x8c/0x1cc [1.523163] Call Trace: [1.525595] [c900be60] [c0567340] 0xc0567340 (unreliable) [1.530954] [c900be90] [c0650ccc] do_one_initcall+0x8c/0x1cc [1.536551] [c900bef0] [c0651000] kernel_init_freeable+0x1f4/0x2cc [1.542658] [c900bf30] [c00037ec] kernel_init+0x14/0x110 [1.547913] [c900bf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c [1.553971] Instruction dump: [1.556909] 3ca50100 bfa10024 54a5000e 3fa0c076 7c0802a6 3d454000 813dc204 554893be [1.564566] 7d294010 7d294910 90010034 39290001 <0f09> 7c3e0b78 955e0008 3fe0c062 [1.572425] ---[ end trace 6f6984225b280ad6 ]--- [1.577467] PA: 0x0900 for VA: 0xc900 [1.581799] PA: 0x061e8f50 for VA: 0xc61e8f50 Signed-off-by: Christophe Leroy --- v2: Using asm/pgtable.h to avoid build failure on ppc64e. Added a verification that the object is not in stack to catch problems before activing VMAP_STACK. arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/io.h | 19 ++- arch/powerpc/mm/pgtable_32.c | 2 +- 3 files changed, 20 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index e312e92e3381..94b46624068d 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -128,6 +128,7 @@ config PPC # # Please keep this list sorted alphabetically. # + select ARCH_HAS_DEBUG_VIRTUAL select ARCH_HAS_DEVMEM_IS_ALLOWED select ARCH_HAS_DMA_SET_COHERENT_MASK select ARCH_HAS_ELF_RANDOMIZE diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index e746becd9d6f..51e96a4413d2 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -29,12 +29,14 @@ extern struct pci_dev *isa_bridge_pcidev; #include #include +#include #include #include #include #include #include #include +#include #ifdef CONFIG_PPC64 #include @@ -804,6 +806,11 @@ extern void __iounmap_at(void *ea, unsigned long size); */ static inline unsigned long virt_to_phys(volatile void * address) { + if (IS_ENABLED(CONFIG_DEBUG_VIRTUAL) && + !WARN_ON(IS_ENABLED(CONFIG_HAVE_ARCH_VMAP_STACK) && current->pid && +object_is_on_stack((const void*)address))) + WARN_ON(!virt_addr_valid(address)); + return __pa((unsigned long)address); } @@ -827,7 +834,17 @@ static inline void * phys_to_virt(unsigned long address) /* * Change "struct page" to physical address. */ -#define page_to_phys(page) ((phys_addr_t)page_to_pfn(page) << PAGE_SHIFT) +static inline phys_addr_t page_to_phys(struct page *page) +{ + unsigned long pfn = page_to_pfn(page); + + if (IS_ENABLED(CONFIG_DEBUG_VIRTUAL) && + !WARN_ON(IS_ENABLED(CONFIG_HAVE_ARCH_VMAP_STACK) && current->pid && +object_is_on_stack(__va(PFN_PHYS(pfn) + WARN_ON(!pfn_valid(pfn)); + + return PFN_PHYS(pfn); +} /* * 32 bits still uses virt_to_bus() for it's implementation of DMA diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c index 4fc77a99c9bf..68d204a45cd0 100644 --- a/arch/powerpc/mm/pgtable_32.c +++ b/arch/powerpc/mm/pgtable_32.c @@ -143,7 +143,7 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, pgprot_t prot, void *call * Don't allow anybody to remap normal RAM that we're using. * mem_init() sets high_memory so only do the check after that. */ - if (slab_is_available() && (p < virt_to_phys(high_memory)) && + if (slab_is_available() && virt_addr_valid(p) && page_is_ram(__phys_to_pfn(p))) { printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n", (unsigned long long)p, _
[PATCH] lib: fix build failure in CONFIG_DEBUG_VIRTUAL test
On several arches, virt_to_phys() is in io.h Build fails without it: CC lib/test_debug_virtual.o lib/test_debug_virtual.c: In function 'test_debug_virtual_init': lib/test_debug_virtual.c:26:7: error: implicit declaration of function 'virt_to_phys' [-Werror=implicit-function-declaration] pa = virt_to_phys(va); ^ Fixes: e4dace361552 ("lib: add test module for CONFIG_DEBUG_VIRTUAL") CC: sta...@vger.kernel.org Signed-off-by: Christophe Leroy --- lib/test_debug_virtual.c | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/test_debug_virtual.c b/lib/test_debug_virtual.c index d5a06addeb27..bf864c73e462 100644 --- a/lib/test_debug_virtual.c +++ b/lib/test_debug_virtual.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #ifdef CONFIG_MIPS -- 2.13.3
Re: [PATCH v2.1 24/34] dt-bindings: arm: Convert Rockchip board/soc bindings to json-schema
Am Sonntag, 9. Dezember 2018, 23:14:05 CET schrieb Heiko Stuebner: Forgot the From: Rob Herring here, but if you're ok with how it looks I can apply it to my tree. Heiko > Convert Rockchip SoC bindings to DT schema format using json-schema. > > Cc: Mark Rutland > Cc: Heiko Stuebner > Cc: devicet...@vger.kernel.org > Cc: linux-arm-ker...@lists.infradead.org > Cc: linux-rockc...@lists.infradead.org > Signed-off-by: Rob Herring > [move to per-board entries and added recently added boards] > Signed-off-by: Heiko Stuebner > --- > Hi Rob, > > there are boards where the description adds much value and on others > it is maybe less, but personally I'd like to keep things uniform, > as that makes reading these things easier if the format stays the > same all the time, so I've gone forward and just did the conversion > > make dtbs_check did not complain about the schema it seems but I > did end up with an error later on: > > FATAL ERROR: Unknown output format "yaml" > make[2]: *** [scripts/Makefile.lib:313: arch/arm/boot/dts/rk3036-evb.dt.yaml] > Fehler 1 > > But I guess I did not mess up the schema yet. > > So does it look ok that way? > Heiko > > .../devicetree/bindings/arm/rockchip.txt | 240 -- > .../devicetree/bindings/arm/rockchip.yaml | 419 ++ > 2 files changed, 419 insertions(+), 240 deletions(-) > delete mode 100644 Documentation/devicetree/bindings/arm/rockchip.txt > create mode 100644 Documentation/devicetree/bindings/arm/rockchip.yaml > > diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt > b/Documentation/devicetree/bindings/arm/rockchip.txt > deleted file mode 100644 > index 0cc71236d639.. > --- a/Documentation/devicetree/bindings/arm/rockchip.txt > +++ /dev/null > @@ -1,240 +0,0 @@ > -Rockchip platforms device tree bindings > > - > -- 96boards RK3399 Ficus (ROCK960 Enterprise Edition) > -Required root node properties: > - - compatible = "vamrs,ficus", "rockchip,rk3399"; > - > -- 96boards RK3399 Rock960 (ROCK960 Consumer Edition) > -Required root node properties: > - - compatible = "vamrs,rock960", "rockchip,rk3399"; > - > -- Amarula Vyasa RK3288 board > -Required root node properties: > - - compatible = "amarula,vyasa-rk3288", "rockchip,rk3288"; > - > -- Asus Tinker board > -Required root node properties: > - - compatible = "asus,rk3288-tinker", "rockchip,rk3288"; > - > -- Asus Tinker board S > -Required root node properties: > - - compatible = "asus,rk3288-tinker-s", "rockchip,rk3288"; > - > -- Kylin RK3036 board: > -Required root node properties: > - - compatible = "rockchip,kylin-rk3036", "rockchip,rk3036"; > - > -- MarsBoard RK3066 board: > -Required root node properties: > - - compatible = "haoyu,marsboard-rk3066", "rockchip,rk3066a"; > - > -- bq Curie 2 tablet: > -Required root node properties: > - - compatible = "mundoreader,bq-curie2", "rockchip,rk3066a"; > - > -- ChipSPARK Rayeager PX2 board: > -Required root node properties: > - - compatible = "chipspark,rayeager-px2", "rockchip,rk3066a"; > - > -- Radxa Rock board: > -Required root node properties: > - - compatible = "radxa,rock", "rockchip,rk3188"; > - > -- Radxa Rock2 Square board: > -Required root node properties: > - - compatible = "radxa,rock2-square", "rockchip,rk3288"; > - > -- Rikomagic MK808 v1 board: > -Required root node properties: > - - compatible = "rikomagic,mk808", "rockchip,rk3066a"; > - > -- Firefly Firefly-RK3288 board: > -Required root node properties: > - - compatible = "firefly,firefly-rk3288", "rockchip,rk3288"; > -or > - - compatible = "firefly,firefly-rk3288-beta", "rockchip,rk3288"; > - > -- Firefly Firefly-RK3288 Reload board: > -Required root node properties: > - - compatible = "firefly,firefly-rk3288-reload", "rockchip,rk3288"; > - > -- Firefly Firefly-RK3399 board: > -Required root node properties: > - - compatible = "firefly,firefly-rk3399", "rockchip,rk3399"; > - > -- Firefly roc-rk3328-cc board: > -Required root node properties: > - - compatible = "firefly,roc-rk3328-cc", "rockchip,rk3328"; > - > -- Firefly ROC-RK3399-PC board: > -Required root node properties: > - - compatible = "firefly,roc-rk3399-pc", "rockchip,rk3399"; > - > -- ChipSPARK PopMetal-RK3288 board: > -Required root node properties: > - - compatible = "chipspark,popmetal-rk3288", "rockchip,rk3288"; > - > -- Netxeon R89 board: > -Required root node properties: > - - compatible = "netxeon,r89", "rockchip,rk3288"; > - > -- GeekBuying GeekBox: > -Required root node properties: > - - compatible = "geekbuying,geekbox", "rockchip,rk3368"; > - > -- Google Bob (Asus Chromebook Flip C101PA): > -Required root node properties: > - compatible = "google,bob-rev13", "google,bob-rev12", > - "google,bob-rev11", "google,bob-rev10",
Re: [PATCH v2 26/34] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema
On Thu, Dec 06, 2018 at 01:38:42PM -0600, Rob Herring wrote: > On Wed, Dec 5, 2018 at 1:44 PM Simon Horman wrote: > > > > On Tue, Dec 04, 2018 at 09:08:57AM -0600, Rob Herring wrote: > > > On Tue, Dec 4, 2018 at 8:57 AM Geert Uytterhoeven > > > wrote: > > > > > > > > Hi Simon, > > > > > > > > On Tue, Dec 4, 2018 at 3:48 PM Simon Horman wrote: > > > > > On Mon, Dec 03, 2018 at 03:32:15PM -0600, Rob Herring wrote: > > > > > > Convert Renesas SoC bindings to DT schema format using json-schema. > > > > > > > > > > > > Cc: Simon Horman > > > > > > Cc: Magnus Damm > > > > > > Cc: Mark Rutland > > > > > > Cc: linux-renesas-...@vger.kernel.org > > > > > > Cc: devicet...@vger.kernel.org > > > > > > Signed-off-by: Rob Herring > > > > > > --- > > > > > > .../devicetree/bindings/arm/shmobile.txt | 151 > > > > > > .../devicetree/bindings/arm/shmobile.yaml | 218 > > > > > > ++ > > > > > > 2 files changed, 218 insertions(+), 151 deletions(-) > > > > > > delete mode 100644 > > > > > > Documentation/devicetree/bindings/arm/shmobile.txt > > > > > > create mode 100644 > > > > > > Documentation/devicetree/bindings/arm/shmobile.yaml > > > > > > > > > > Hi Rob, > > > > > > > > > > what is this based on? I get a conflict when applying the .txt change > > > > > and if I knew the base for this patch it would be rather easy to work > > > > > out what has changed. > > > > > > 4.20-rc2 > > > > > > > > > > > > > Also, should we do an s/shmobile.txt/shmobile.yaml/ in MAINTAINERS? > > > > > > Yes. Though it was pointed out that get_maintainers.pl can pull emails > > > out of this file. We'd need to get that to work by default though. > > > > > > > Probably even s/shmobile.yaml/renesas.yaml/, while at it? > > > > > > Sure, if that's what you all want. > > > > How about this? > > LGTM Thanks. As my tree is already closed for v4.21 I have applied this for v4.22.
[PATCH] powerpc/83xx: handle machine check caused by watchdog timer
When the watchdog timer is set in interrupt mode, it causes a machine check when it times out. The purpose of this mode is to ease debugging, not to crash the kernel and reboot the machine. This patch implements a special handling for that, in order to not crash the kernel if the watchdog times out while in interrupt or within the idle task. Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/cputable.h | 1 + arch/powerpc/include/asm/reg.h | 2 ++ arch/powerpc/kernel/cputable.c | 10 ++ arch/powerpc/platforms/83xx/misc.c | 16 4 files changed, 25 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h index a0395ccbbe9e..d05f0c28e515 100644 --- a/arch/powerpc/include/asm/cputable.h +++ b/arch/powerpc/include/asm/cputable.h @@ -44,6 +44,7 @@ extern int machine_check_e500(struct pt_regs *regs); extern int machine_check_e200(struct pt_regs *regs); extern int machine_check_47x(struct pt_regs *regs); int machine_check_8xx(struct pt_regs *regs); +int machine_check_83xx(struct pt_regs *regs); extern void cpu_down_flush_e500v2(void); extern void cpu_down_flush_e500mc(void); diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 0d2139a0d5b9..1c98ef1f2d5b 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -769,6 +769,8 @@ #define SRR1_PROGTRAP0x0002 /* Trap */ #define SRR1_PROGADDR0x0001 /* SRR0 contains subsequent addr */ +#define SRR1_MCE_MCP 0x0008 /* Machine check signal caused interrupt */ + #define SPRN_HSRR0 0x13A /* Save/Restore Register 0 */ #define SPRN_HSRR1 0x13B /* Save/Restore Register 1 */ #define HSRR1_DENORM 0x0010 /* Denorm exception */ diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index 2da01340c84c..1eab54bc6ee9 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -1141,6 +1141,7 @@ static struct cpu_spec __initdata cpu_specs[] = { .machine_check = machine_check_generic, .platform = "ppc603", }, +#ifdef CONFIG_PPC_83xx { /* e300c1 (a 603e core, plus some) on 83xx */ .pvr_mask = 0x7fff, .pvr_value = 0x0083, @@ -1151,7 +1152,7 @@ static struct cpu_spec __initdata cpu_specs[] = { .icache_bsize = 32, .dcache_bsize = 32, .cpu_setup = __setup_cpu_603, - .machine_check = machine_check_generic, + .machine_check = machine_check_83xx, .platform = "ppc603", }, { /* e300c2 (an e300c1 core, plus some, minus FPU) on 83xx */ @@ -1165,7 +1166,7 @@ static struct cpu_spec __initdata cpu_specs[] = { .icache_bsize = 32, .dcache_bsize = 32, .cpu_setup = __setup_cpu_603, - .machine_check = machine_check_generic, + .machine_check = machine_check_83xx, .platform = "ppc603", }, { /* e300c3 (e300c1, plus one IU, half cache size) on 83xx */ @@ -1179,7 +1180,7 @@ static struct cpu_spec __initdata cpu_specs[] = { .icache_bsize = 32, .dcache_bsize = 32, .cpu_setup = __setup_cpu_603, - .machine_check = machine_check_generic, + .machine_check = machine_check_83xx, .num_pmcs = 4, .oprofile_cpu_type = "ppc/e300", .oprofile_type = PPC_OPROFILE_FSL_EMB, @@ -1196,12 +1197,13 @@ static struct cpu_spec __initdata cpu_specs[] = { .icache_bsize = 32, .dcache_bsize = 32, .cpu_setup = __setup_cpu_603, - .machine_check = machine_check_generic, + .machine_check = machine_check_83xx, .num_pmcs = 4, .oprofile_cpu_type = "ppc/e300", .oprofile_type = PPC_OPROFILE_FSL_EMB, .platform = "ppc603", }, +#endif { /* default match, we assume split I/D cache & TB (non-601)... */ .pvr_mask = 0x, .pvr_value = 0x, diff --git a/arch/powerpc/platforms/83xx/misc.c b/arch/powerpc/platforms/83xx/misc.c index d75c9816a5c9..3e591d159413 100644 --- a/arch/powerpc/platforms/83xx/misc.c +++ b/arch/powerpc/platforms/83xx/misc.c @@ -150,3 +150,19 @@ void __init mpc83xx_setup_arch(void) mpc83xx_setup_pci(); }
Re: [PATCH] powerpc/ipic: Fix a bounds check in ipic_set_priority()
Le 07/12/2018 à 03:07, Michael Ellerman a écrit : Christophe LEROY writes: Le 05/12/2018 à 04:26, Michael Ellerman a écrit : Hi Dan, Thanks for the patch. Dan Carpenter writes: The ipic_info[] array only has 95 elements so I have made the bounds check smaller to prevent a read overflow. It was Smatch that found this issue: arch/powerpc/sysdev/ipic.c:784 ipic_set_priority() error: buffer overflow 'ipic_info' 95 <= 127 Signed-off-by: Dan Carpenter --- I wasn't able to find any callers of this code. Maybe we removed the last one in commit b9f0f1bb2bca ("[POWERPC] Adapt ipic driver to new host_ops interface, add set_irq_type to set IRQ sense"). So perhaps we should just remove it. I'm not really comfortable doing that myself, because I don't know the code well enough and can't build test it properly. Hah wow, last usage removed in 2006! I don't see any mention of it since then, so I'll remove it. If it breaks something we can put it back. Can smatch help us find things like this that are defined non-static but never used? I think we have to do that carrefully. Some of those functions might be used by out-of-tree boards. We don't keep unused code around for out-of-tree boards. Either the out-of-tree code should be merged upstream, or you can maintain whatever extra functions you need as part of your out-of-tree code base. I'm thinking especially at ipic_get_mcp_status() and ipic_set_mcp_status(). They are used in my 832x boards's machine check handler to know when a machine check is a timeout from the 832x watchdog. Thanks for pointing them out, I'll send a patch to remove them :) Lol :) If you are doing the housework, you can remove ipic_set_highest_priority() ipic_enable_mcp() and ipic_disable_mcp() But seriously, why is your machine check code not in-tree, is there some reason you can't merge it? Maybe because you haven't merged it yet allthough I sent it more than three minutes ago. :) Seriously, that was just left over with other priorities, and also because I'm not too happy about it because what I would really like is that it kills the userland app if any but don't crash the system when it is in interrupt or in idle. But due to b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt"), die_will_crash() doesn't work anymore. So for the time being, the patch I sent is not killing anybody, just doing an Oops for notification (note that die_will_crash() is used in the Opal machine check handler, so it probably doesn't work anymore there either). Christophe
[PATCH] powerpc/mm: define an empty slice_init_new_context_exec()
Define slice_init_new_context_exec() at all time to avoid Signed-off-by: Christophe Leroy --- arch/powerpc/include/asm/slice.h | 14 +- arch/powerpc/mm/mmu_context_nohash.c | 2 -- 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/slice.h b/arch/powerpc/include/asm/slice.h index a595461c9cb0..44816cbc4198 100644 --- a/arch/powerpc/include/asm/slice.h +++ b/arch/powerpc/include/asm/slice.h @@ -10,6 +10,10 @@ #include #endif +#ifndef __ASSEMBLY__ + +struct mm_struct; + #ifdef CONFIG_PPC_MM_SLICES #ifdef CONFIG_HUGETLB_PAGE @@ -18,10 +22,6 @@ #define HAVE_ARCH_UNMAPPED_AREA #define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN -#ifndef __ASSEMBLY__ - -struct mm_struct; - unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len, unsigned long flags, unsigned int psize, int topdown); @@ -34,8 +34,12 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start, void slice_init_new_context_exec(struct mm_struct *mm); void slice_setup_new_exec(void); -#endif /* __ASSEMBLY__ */ +#else /* CONFIG_PPC_MM_SLICES */ + +static inline void slice_init_new_context_exec(struct mm_struct *mm) {} #endif /* CONFIG_PPC_MM_SLICES */ +#endif /* __ASSEMBLY__ */ + #endif /* _ASM_POWERPC_SLICE_H */ diff --git a/arch/powerpc/mm/mmu_context_nohash.c b/arch/powerpc/mm/mmu_context_nohash.c index 431ecf37f17c..22d71a58167f 100644 --- a/arch/powerpc/mm/mmu_context_nohash.c +++ b/arch/powerpc/mm/mmu_context_nohash.c @@ -372,7 +372,6 @@ int init_new_context(struct task_struct *t, struct mm_struct *mm) { pr_hard("initing context for mm @%p\n", mm); -#ifdef CONFIG_PPC_MM_SLICES /* * We have MMU_NO_CONTEXT set to be ~0. Hence check * explicitly against context.id == 0. This ensures that we properly @@ -382,7 +381,6 @@ int init_new_context(struct task_struct *t, struct mm_struct *mm) */ if (mm->context.id == 0) slice_init_new_context_exec(mm); -#endif mm->context.id = MMU_NO_CONTEXT; mm->context.active = 0; pte_frag_set(&mm->context, NULL); -- 2.13.3
Re: [PATCH v6] powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call
On 12/07, Dmitry V. Levin wrote: > > Please make either v5 or v6 edition of this fix, or any similar fix, > into v4.20. IIUC, v5 above means [PATCH v5 23/25] powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call you sent in another series... > long do_syscall_trace_enter(struct pt_regs *regs) > { > + struct thread_info *ti; > + u32 cached_flags; > + > user_exit(); > > - if (test_thread_flag(TIF_SYSCALL_EMU)) { > - ptrace_report_syscall(regs); > - /* > - * Returning -1 will skip the syscall execution. We want to > - * avoid clobbering any register also, thus, not 'gotoing' > - * skip label. > - */ > - return -1; > - } > + ti = current_thread_info(); > + cached_flags = READ_ONCE(ti->flags) & > +(_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE | > + _TIF_SYSCALL_TRACEPOINT); > > - /* > - * The tracer may decide to abort the syscall, if so tracehook > - * will return !0. Note that the tracer may also just change > - * regs->gpr[0] to an invalid syscall number, that is handled > - * below on the exit path. > - */ > - if (test_thread_flag(TIF_SYSCALL_TRACE) && > - tracehook_report_syscall_entry(regs)) > - goto skip; > + if (cached_flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) { > + int rc = tracehook_report_syscall_entry(regs); > + > + if (unlikely(cached_flags & _TIF_SYSCALL_EMU)) { > + /* > + * A nonzero return code from > + * tracehook_report_syscall_entry() tells us > + * to prevent the syscall execution, but > + * we are not going to execute it anyway. > + * > + * Returning -1 will skip the syscall execution. > + * We want to avoid clobbering any register also, > + * thus, not 'gotoing' skip label. > + */ > + return -1; > + } > + > + if (rc) { > + /* > + * The tracer decided to abort the syscall. > + * Note that the tracer may also just change > + * regs->gpr[0] to an invalid syscall number, > + * that is handled below on the exit path. > + */ > + goto skip; > + } > + } > > /* Run seccomp after ptrace; allow it to set gpr[3]. */ > if (do_seccomp(regs)) > @@ -3293,7 +3309,7 @@ long do_syscall_trace_enter(struct pt_regs *regs) > if (regs->gpr[0] >= NR_syscalls) > goto skip; > > - if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT))) > + if (unlikely(cached_flags & _TIF_SYSCALL_TRACEPOINT)) I will leave this to maintainers, but to me this change looks good and imo it also cleanups the code. However I am not sure cached_flags should include _TIF_SYSCALL_TRACEPOINT. If nothing else, the caller can sleep in ptrace_stop() unpredictably long and TIF_SYSCALL_TRACEPOINT can be set/cleared meanwhile. Oleg.
Re: [PATCH v6] powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call
On Mon, Dec 10, 2018 at 02:28:07PM +0100, Oleg Nesterov wrote: > On 12/07, Dmitry V. Levin wrote: > > > > Please make either v5 or v6 edition of this fix, or any similar fix, > > into v4.20. > > IIUC, v5 above means > > [PATCH v5 23/25] powerpc/ptrace: replace ptrace_report_syscall() with a > tracehook call > > you sent in another series... They just happen to have the same v5 here and there. In that series I included the most trivial variant of the change. > > long do_syscall_trace_enter(struct pt_regs *regs) > > { > > + struct thread_info *ti; > > + u32 cached_flags; > > + > > user_exit(); > > > > - if (test_thread_flag(TIF_SYSCALL_EMU)) { > > - ptrace_report_syscall(regs); > > - /* > > -* Returning -1 will skip the syscall execution. We want to > > -* avoid clobbering any register also, thus, not 'gotoing' > > -* skip label. > > -*/ > > - return -1; > > - } > > + ti = current_thread_info(); > > + cached_flags = READ_ONCE(ti->flags) & > > + (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE | > > + _TIF_SYSCALL_TRACEPOINT); > > > > - /* > > -* The tracer may decide to abort the syscall, if so tracehook > > -* will return !0. Note that the tracer may also just change > > -* regs->gpr[0] to an invalid syscall number, that is handled > > -* below on the exit path. > > -*/ > > - if (test_thread_flag(TIF_SYSCALL_TRACE) && > > - tracehook_report_syscall_entry(regs)) > > - goto skip; > > + if (cached_flags & (_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE)) { > > + int rc = tracehook_report_syscall_entry(regs); > > + > > + if (unlikely(cached_flags & _TIF_SYSCALL_EMU)) { > > + /* > > +* A nonzero return code from > > +* tracehook_report_syscall_entry() tells us > > +* to prevent the syscall execution, but > > +* we are not going to execute it anyway. > > +* > > +* Returning -1 will skip the syscall execution. > > +* We want to avoid clobbering any register also, > > +* thus, not 'gotoing' skip label. > > +*/ > > + return -1; > > + } > > + > > + if (rc) { > > + /* > > +* The tracer decided to abort the syscall. > > +* Note that the tracer may also just change > > +* regs->gpr[0] to an invalid syscall number, > > +* that is handled below on the exit path. > > +*/ > > + goto skip; > > + } > > + } > > > > /* Run seccomp after ptrace; allow it to set gpr[3]. */ > > if (do_seccomp(regs)) > > @@ -3293,7 +3309,7 @@ long do_syscall_trace_enter(struct pt_regs *regs) > > if (regs->gpr[0] >= NR_syscalls) > > goto skip; > > > > - if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT))) > > + if (unlikely(cached_flags & _TIF_SYSCALL_TRACEPOINT)) > > I will leave this to maintainers, but to me this change looks good and imo it > also cleanups the code. > > However I am not sure cached_flags should include _TIF_SYSCALL_TRACEPOINT. If > nothing else, the caller can sleep in ptrace_stop() unpredictably long and > TIF_SYSCALL_TRACEPOINT can be set/cleared meanwhile. I agree, we shouldn't cache _TIF_SYSCALL_TRACEPOINT. -- ldv signature.asc Description: PGP signature
Re: [REPOST PATCH v6 0/4] kgdb: Fix kgdb_roundup_cpus()
Hi Doug, On Fri, Dec 07, 2018 at 10:40:24AM -0800, Doug Anderson wrote: > On Fri, Dec 7, 2018 at 9:42 AM Catalin Marinas > wrote: > > On Tue, Dec 04, 2018 at 07:38:24PM -0800, Douglas Anderson wrote: > > > Douglas Anderson (4): > > > kgdb: Remove irq flags from roundup > > > kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function() > > > kgdb: Don't round up a CPU that failed rounding up before > > > kdb: Don't back trace on a cpu that didn't round up > > > > FWIW, trying these on arm64 (ThunderX2) with CONFIG_KGDB_TESTS_ON_BOOT=y > > on top of 4.20-rc5 doesn't boot. It looks like they leave interrupts > > disabled when they shouldn't and it trips over the BUG at > > mm/vmalloc.c:1380 (called via do_fork -> copy_process). > > > > Now, I don't think these patches make things worse on arm64 since prior > > to them the kgdb boot tests on arm64 were stuck in a loop (RUN > > singlestep). > > Thanks for the report! ...actually, I'd never tried CONFIG_KGDB_TESTS > before. ...so I tried them now: > > A) chromeos-4.19 tree on qcom-sdm845 without this series: booted up OK > B) chromeos-4.19 tree on qcom-sdm845 with this series: booted up OK > C) v4.20-rc5-90-g30002dd008ed on rockchip-rk3399 (kevin) with this > series: booted up OK > > Example output from B) above: > > localhost ~ # dmesg | grep kgdbts > [2.139814] KGDB: Registered I/O driver kgdbts > [2.144582] kgdbts:RUN plant and detach test > [2.165333] kgdbts:RUN sw breakpoint test > [2.172990] kgdbts:RUN bad memory access test > [2.178640] kgdbts:RUN singlestep test 1000 iterations > [2.187765] kgdbts:RUN singlestep [0/1000] > [2.559596] kgdbts:RUN singlestep [100/1000] > [2.931419] kgdbts:RUN singlestep [200/1000] > [3.303474] kgdbts:RUN singlestep [300/1000] > [3.675121] kgdbts:RUN singlestep [400/1000] > [4.046867] kgdbts:RUN singlestep [500/1000] > [4.418920] kgdbts:RUN singlestep [600/1000] > [4.790824] kgdbts:RUN singlestep [700/1000] > [5.162479] kgdbts:RUN singlestep [800/1000] > [5.534103] kgdbts:RUN singlestep [900/1000] > [5.902299] kgdbts:RUN do_fork for 100 breakpoints > [8.463900] KGDB: Unregistered I/O driver kgdbts, debugger disabled > > ...so I guess I'm a little confused. Either I have a different config > than you do or something is special about your machine? I tried it now on a Juno board both as a host and a guest and boots fine. It must be something that only triggers ThunderX2. Ignore the report for now, if I find anything interesting I'll let you know. -- Catalin
Re: [PATCH] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support
On 10/12/2018 14:21, Rafael David Tinoco wrote: On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the physical frame number might be so big that zsmalloc obj encoding (to location) will break, causing: BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc Read of size 4 at addr by task mkfs.ext4/623 CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted 4.19.0-rc8-00017-g8239bc6d3307-dirty #15 Hardware name: Generic DT based system [] (unwind_backtrace) from [] (show_stack+0x20/0x24) [] (show_stack) from [] (dump_stack+0xbc/0xe8) [] (dump_stack) from [] (kasan_report+0x248/0x390) [] (kasan_report) from [] (__asan_load4+0x78/0xb4) [] (__asan_load4) from [] (zs_map_object+0xa4/0x2bc) [] (zs_map_object) from [] (zram_bvec_rw.constprop.2+0x324/0x8e4 [zram]) [] (zram_bvec_rw.constprop.2 [zram]) from [] (zram_make_request+0x234/0x46c [zram]) [] (zram_make_request [zram]) from [] (generic_make_request+0x304/0x63c) [] (generic_make_request) from [] (submit_bio+0x4c/0x1c8) [] (submit_bio) from [] (submit_bh_wbc.constprop.15+0x238/0x26c) [] (submit_bh_wbc.constprop.15) from [] (__block_write_full_page+0x524/0x76c) [] (__block_write_full_page) from [] (block_write_full_page+0x1bc/0x1d4) [] (block_write_full_page) from [] (blkdev_writepage+0x24/0x28) [] (blkdev_writepage) from [] (__writepage+0x44/0x78) [] (__writepage) from [] (write_cache_pages+0x3b8/0x800) [] (write_cache_pages) from [] (generic_writepages+0x74/0xa0) [] (generic_writepages) from [] (blkdev_writepages+0x18/0x1c) [] (blkdev_writepages) from [] (do_writepages+0x68/0x134) [] (do_writepages) from [] (__filemap_fdatawrite_range+0xb0/0xf4) [] (__filemap_fdatawrite_range) from [] (file_write_and_wait_range+0x64/0xd0) [] (file_write_and_wait_range) from [] (blkdev_fsync+0x54/0x84) [] (blkdev_fsync) from [] (vfs_fsync_range+0x70/0xd4) [] (vfs_fsync_range) from [] (do_fsync+0x4c/0x80) [] (do_fsync) from [] (sys_fsync+0x1c/0x20) [] (sys_fsync) from [] (ret_fast_syscall+0x0/0x2c) when trying to decode (the pfn) and map the object. That happens because one architecture might not re-define MAX_PHYSMEM_BITS, like in this ARM 32-bit w/ LPAE enabled example. For 32-bit systems, if not re-defined, MAX_POSSIBLE_PHYSMEM_BITS will default to BITS_PER_LONG (32) in most cases, and, with PAE enabled, _PFN_BITS might be wrong: which may cause obj variable to overflow if frame number is in HIGHMEM and referencing a page above the 4GB watermark. commit 6e00ec00b1a7 ("staging: zsmalloc: calculate MAX_PHYSMEM_BITS if not defined") realized MAX_PHYSMEM_BITS depended on SPARSEMEM headers and "fixed" it by calculating it using BITS_PER_LONG if SPARSEMEM wasn't used, like in the example given above. Systems with potential for PAE exist for a long time and assuming BITS_PER_LONG seems inadequate. Defining MAX_PHYSMEM_BITS looks better, however it is NOT a constant anymore for x86. SO, instead, MAX_POSSIBLE_PHYSMEM_BITS should be defined by every architecture using zsmalloc, together with a sanity check for MAX_POSSIBLE_PHYSMEM_BITS being too big on 32-bit systems. Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17 Signed-off-by: Rafael David Tinoco --- arch/arm/include/asm/pgtable-2level-types.h | 2 ++ arch/arm/include/asm/pgtable-3level-types.h | 2 ++ arch/arm64/include/asm/pgtable-types.h | 2 ++ arch/ia64/include/asm/page.h| 2 ++ arch/mips/include/asm/page.h| 2 ++ arch/powerpc/include/asm/mmu.h | 2 ++ arch/s390/include/asm/page.h| 2 ++ arch/sh/include/asm/page.h | 2 ++ arch/sparc/include/asm/page_32.h| 2 ++ arch/sparc/include/asm/page_64.h| 2 ++ arch/x86/include/asm/pgtable-2level_types.h | 2 ++ arch/x86/include/asm/pgtable-3level_types.h | 3 +- arch/x86/include/asm/pgtable_64_types.h | 4 +-- mm/zsmalloc.c | 35 +++-- 14 files changed, 45 insertions(+), 19 deletions(-) diff --git a/arch/arm/include/asm/pgtable-2level-types.h b/arch/arm/include/asm/pgtable-2level-types.h index 66cb5b0e89c5..552dba411324 100644 --- a/arch/arm/include/asm/pgtable-2level-types.h +++ b/arch/arm/include/asm/pgtable-2level-types.h @@ -64,4 +64,6 @@ typedef pteval_t pgprot_t; #endif /* STRICT_MM_TYPECHECKS */ +#define MAX_POSSIBLE_PHYSMEM_BITS 32 + #endif/* _ASM_PGTABLE_2LEVEL_TYPES_H */ diff --git a/arch/arm/include/asm/pgtable-3level-types.h b/arch/arm/include/asm/pgtable-3level-types.h index 921aa30259c4..664c39e6517c 100644 --- a/arch/arm/include/asm/pgtable-3level-types.h +++ b/arch/arm/include/asm/pgtable-3level-types.h @@ -67,4 +67,6 @@ typedef pteval_t pgprot_t; #endif /* STRICT_MM_TYPECHECKS */ +#define MAX_POSSIBLE_PHYSMEM_BITS 36 Nit: with LPAE, physical addresses go up to 40 bits, not just 36. Robin. + #endif/* _ASM_PGTABLE_3LEVEL_TYPES_H */ diff --git a/arch/arm64/include/asm/pgtable-types.h b/arch/arm64/inclu
Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
On 12/07/2018 04:36 PM, Michael Roth wrote: > Quoting Michael Ellerman (2018-12-07 06:31:13) >> Michael Roth writes: >> >>> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF >>> JIT allocations. At compile time it defaults to PAGE_SIZE * 4, >>> and is adjusted again at init time if MODULES_VADDR is defined. >>> >>> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with >> >> But maybe it should be, I don't know why we don't define it. >> >>> the compile-time default at boot-time, which is 0x9c40 when >>> using 64K page size. This overflows the signed 32-bit bpf_jit_limit >>> value: >>> >>> root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit >>> -1673527296 >>> >>> and can cause various unexpected failures throughout the network >>> stack. In one case `strace dhclient eth0` reported: >>> >>> setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, >>> 16) = -1 ENOTSUPP (Unknown error 524) >>> >>> and similar failures can be seen with tools like tcpdump. This doesn't >>> always reproduce however, and I'm not sure why. The more consistent >>> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host >>> would time out on systemd/netplan configuring a virtio-net NIC with no >>> noticeable errors in the logs. >>> >>> Fix this by limiting the compile-time default for bpf_jit_limit to >>> INT_MAX. >> >> INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on >> whether we should be using PAGE_SIZE here at all. I guess each BPF >> program uses at least one page is the thinking? > > That seems to be the case, at least, the max number of minimum-sized > allocations would be less on ppc64 since the allocations are always at > least PAGE_SIZE in size. The init-time default also limits to INT_MAX, > so it seemed consistent to do that here too. > >> >> Thanks for tracking this down. For some reason none of my ~10 test boxes >> have hit this, perhaps I don't have new enough userspace? > > I'm not too sure, I would've thought things like the dhclient case in > the commit log would fail every time, but sometimes I need to reboot the > guest before I start seeing the behavior. Maybe there's something special > about when JIT allocations are actually done that can affect > reproducibility? > > In my case at least the virtio-net networking timeout was consistent > enough for a bisect, but maybe it depends on the specific network > configuration (single NIC, basic DHCP through netplan/systemd in my case). > >> >> You don't mention why you needed to add BPF_MIN(), I assume because the >> kernel version of min() has gotten too complicated to work here? > > I wasn't sure if it was safe here or not, so I tried looking at other > users and came across: > > mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : (y)) > /* can't use min() */ > > I'm not sure what the reasoning was (or whether it still applies), but I > figured it was safer to do the same here. Maybe Nick still recalls? > >> >> Daniel I assume you'll merge this via your tree? >> >> cheers >> >>> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv >>> allocations") >>> Cc: linuxppc-...@ozlabs.org >>> Cc: Daniel Borkmann >>> Cc: Sandipan Das >>> Cc: Alexei Starovoitov >>> Signed-off-by: Michael Roth Thanks for the reports / fixes and sorry for my late reply (bit too swamped last week), some more thoughts below. >>> kernel/bpf/core.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c >>> index b1a3545d0ec8..55de4746cdfd 100644 >>> --- a/kernel/bpf/core.c >>> +++ b/kernel/bpf/core.c >>> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) >>> } >>> >>> #ifdef CONFIG_BPF_JIT >>> -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4) >>> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y)) >>> +# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 4), INT_MAX) >>> >>> /* All BPF JIT sysctl knobs here. */ >>> int bpf_jit_enable __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT define also given for 4.21 arm64 will have its own dedicated area for JIT allocations where neither the above limit nor the MODULES_END/ MODULES_VADDR one would fit and I don't want to make this even more ugly with adding further cases into the core. Would the below variant work for you? Thanks, Daniel >From da9daf462d41ce5506c6b6318a9fa3d6d8a64f6c Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 10 Dec 2018 14:30:27 +0100 Subject: [PATCH bpf] bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K Michael and Sandipan report: Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF JIT allocations. At compile time it defaults to PAGE_SIZE * 4, and is adjusted again at init time if MODULES_VADDR is defined. For ppc64 kernels, MODULES_VADDR isn't defined, so we
Re: [PATCH] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support
On Mon, Dec 10, 2018 at 02:35:55PM +, Robin Murphy wrote: > On 10/12/2018 14:21, Rafael David Tinoco wrote: > >On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the > >physical frame number might be so big that zsmalloc obj encoding (to > >location) will break, causing: > > > >BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc > >Read of size 4 at addr by task mkfs.ext4/623 > >CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted > >4.19.0-rc8-00017-g8239bc6d3307-dirty #15 > >Hardware name: Generic DT based system > >[] (unwind_backtrace) from [] (show_stack+0x20/0x24) > >[] (show_stack) from [] (dump_stack+0xbc/0xe8) > >[] (dump_stack) from [] (kasan_report+0x248/0x390) > >[] (kasan_report) from [] (__asan_load4+0x78/0xb4) > >[] (__asan_load4) from [] (zs_map_object+0xa4/0x2bc) > >[] (zs_map_object) from [] > >(zram_bvec_rw.constprop.2+0x324/0x8e4 [zram]) > >[] (zram_bvec_rw.constprop.2 [zram]) from [] > >(zram_make_request+0x234/0x46c [zram]) > >[] (zram_make_request [zram]) from [] > >(generic_make_request+0x304/0x63c) > >[] (generic_make_request) from [] (submit_bio+0x4c/0x1c8) > >[] (submit_bio) from [] > >(submit_bh_wbc.constprop.15+0x238/0x26c) > >[] (submit_bh_wbc.constprop.15) from [] > >(__block_write_full_page+0x524/0x76c) > >[] (__block_write_full_page) from [] > >(block_write_full_page+0x1bc/0x1d4) > >[] (block_write_full_page) from [] > >(blkdev_writepage+0x24/0x28) > >[] (blkdev_writepage) from [] (__writepage+0x44/0x78) > >[] (__writepage) from [] (write_cache_pages+0x3b8/0x800) > >[] (write_cache_pages) from [] > >(generic_writepages+0x74/0xa0) > >[] (generic_writepages) from [] > >(blkdev_writepages+0x18/0x1c) > >[] (blkdev_writepages) from [] (do_writepages+0x68/0x134) > >[] (do_writepages) from [] > >(__filemap_fdatawrite_range+0xb0/0xf4) > >[] (__filemap_fdatawrite_range) from [] > >(file_write_and_wait_range+0x64/0xd0) > >[] (file_write_and_wait_range) from [] > >(blkdev_fsync+0x54/0x84) > >[] (blkdev_fsync) from [] (vfs_fsync_range+0x70/0xd4) > >[] (vfs_fsync_range) from [] (do_fsync+0x4c/0x80) > >[] (do_fsync) from [] (sys_fsync+0x1c/0x20) > >[] (sys_fsync) from [] (ret_fast_syscall+0x0/0x2c) > > > >when trying to decode (the pfn) and map the object. > > > >That happens because one architecture might not re-define > >MAX_PHYSMEM_BITS, like in this ARM 32-bit w/ LPAE enabled example. For > >32-bit systems, if not re-defined, MAX_POSSIBLE_PHYSMEM_BITS will > >default to BITS_PER_LONG (32) in most cases, and, with PAE enabled, > >_PFN_BITS might be wrong: which may cause obj variable to overflow if > >frame number is in HIGHMEM and referencing a page above the 4GB > >watermark. > > > >commit 6e00ec00b1a7 ("staging: zsmalloc: calculate MAX_PHYSMEM_BITS if > >not defined") realized MAX_PHYSMEM_BITS depended on SPARSEMEM headers > >and "fixed" it by calculating it using BITS_PER_LONG if SPARSEMEM wasn't > >used, like in the example given above. > > > >Systems with potential for PAE exist for a long time and assuming > >BITS_PER_LONG seems inadequate. Defining MAX_PHYSMEM_BITS looks better, > >however it is NOT a constant anymore for x86. > > > >SO, instead, MAX_POSSIBLE_PHYSMEM_BITS should be defined by every > >architecture using zsmalloc, together with a sanity check for > >MAX_POSSIBLE_PHYSMEM_BITS being too big on 32-bit systems. > > > >Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17 > >Signed-off-by: Rafael David Tinoco > >--- > > arch/arm/include/asm/pgtable-2level-types.h | 2 ++ > > arch/arm/include/asm/pgtable-3level-types.h | 2 ++ > > arch/arm64/include/asm/pgtable-types.h | 2 ++ > > arch/ia64/include/asm/page.h| 2 ++ > > arch/mips/include/asm/page.h| 2 ++ > > arch/powerpc/include/asm/mmu.h | 2 ++ > > arch/s390/include/asm/page.h| 2 ++ > > arch/sh/include/asm/page.h | 2 ++ > > arch/sparc/include/asm/page_32.h| 2 ++ > > arch/sparc/include/asm/page_64.h| 2 ++ > > arch/x86/include/asm/pgtable-2level_types.h | 2 ++ > > arch/x86/include/asm/pgtable-3level_types.h | 3 +- > > arch/x86/include/asm/pgtable_64_types.h | 4 +-- > > mm/zsmalloc.c | 35 +++-- > > 14 files changed, 45 insertions(+), 19 deletions(-) > > > >diff --git a/arch/arm/include/asm/pgtable-2level-types.h > >b/arch/arm/include/asm/pgtable-2level-types.h > >index 66cb5b0e89c5..552dba411324 100644 > >--- a/arch/arm/include/asm/pgtable-2level-types.h > >+++ b/arch/arm/include/asm/pgtable-2level-types.h > >@@ -64,4 +64,6 @@ typedef pteval_t pgprot_t; > > #endif /* STRICT_MM_TYPECHECKS */ > >+#define MAX_POSSIBLE_PHYSMEM_BITS 32 > >+ > > #endif /* _ASM_PGTABLE_2LEVEL_TYPES_H */ > >diff --git a/arch/arm/include/asm/pgtable-3level-types.h > >b/arch/arm/include/asm/pgtable-3level-types.h > >index 921aa30259c4..664c39e6517c 100644 > >--- a/arch/arm/include/asm/pgtable-3level-types.h > >+++ b/arch/a
Re: [PATCH v2.1 24/34] dt-bindings: arm: Convert Rockchip board/soc bindings to json-schema
On Sun, Dec 9, 2018 at 4:14 PM Heiko Stuebner wrote: > > Convert Rockchip SoC bindings to DT schema format using json-schema. > > Cc: Mark Rutland > Cc: Heiko Stuebner > Cc: devicet...@vger.kernel.org > Cc: linux-arm-ker...@lists.infradead.org > Cc: linux-rockc...@lists.infradead.org > Signed-off-by: Rob Herring > [move to per-board entries and added recently added boards] > Signed-off-by: Heiko Stuebner > --- > Hi Rob, > > there are boards where the description adds much value and on others > it is maybe less, but personally I'd like to keep things uniform, > as that makes reading these things easier if the format stays the > same all the time, so I've gone forward and just did the conversion > > make dtbs_check did not complain about the schema it seems but I > did end up with an error later on: > > FATAL ERROR: Unknown output format "yaml" > make[2]: *** [scripts/Makefile.lib:313: arch/arm/boot/dts/rk3036-evb.dt.yaml] > Fehler 1 You need libyaml and its headers installed so dtc can output yaml, but that's not needed for checking the schema against the meta-schema. > But I guess I did not mess up the schema yet. > > So does it look ok that way? Yes, but one comment... > + - description: Firefly Firefly-RK3288 > +items: > + - const: firefly,firefly-rk3288 > + - const: rockchip,rk3288 > + > + - description: Firefly Firefly-RK3288 (beta board) > +items: > + - const: firefly,firefly-rk3288-beta > + - const: rockchip,rk3288 > + > + - description: Firefly Firefly-RK3288 Reload Seems like combining these 3 (or first 2?) would make sense if this is just revs of the same board. But either way is fine. > +items: > + - const: firefly,firefly-rk3288-reload > + - const: rockchip,rk3288
Re: [PATCH] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support
On Mon, Dec 10, 2018 at 12:21:05PM -0200, Rafael David Tinoco wrote: > diff --git a/arch/x86/include/asm/pgtable_64_types.h > b/arch/x86/include/asm/pgtable_64_types.h > index 84bd9bdc1987..d808cfde3d19 100644 > --- a/arch/x86/include/asm/pgtable_64_types.h > +++ b/arch/x86/include/asm/pgtable_64_types.h > @@ -64,8 +64,6 @@ extern unsigned int ptrs_per_p4d; > #define P4D_SIZE (_AC(1, UL) << P4D_SHIFT) > #define P4D_MASK (~(P4D_SIZE - 1)) > > -#define MAX_POSSIBLE_PHYSMEM_BITS52 > - > #else /* CONFIG_X86_5LEVEL */ > > /* > @@ -154,4 +152,6 @@ extern unsigned int ptrs_per_p4d; > > #define PGD_KERNEL_START ((PAGE_SIZE / 2) / sizeof(pgd_t)) > > +#define MAX_POSSIBLE_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46) > + ... > #endif /* _ASM_X86_PGTABLE_64_DEFS_H */ > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 0787d33b80d8..132c20b6fd4f 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c ... > @@ -116,6 +100,25 @@ > */ > #define OBJ_ALLOCATED_TAG 1 > #define OBJ_TAG_BITS 1 > + > +/* > + * MAX_POSSIBLE_PHYSMEM_BITS should be defined by all archs using zsmalloc: > + * Trying to guess it from MAX_PHYSMEM_BITS, or considering it BITS_PER_LONG, > + * proved to be wrong by not considering PAE capabilities, or using SPARSEMEM > + * only headers, leading to bad object encoding due to object index overflow. > + */ > +#ifndef MAX_POSSIBLE_PHYSMEM_BITS > + #define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG > + #error "MAX_POSSIBLE_PHYSMEM_BITS HAS to be defined by arch using zsmalloc"; > +#else > + #ifndef CONFIG_64BIT > + #if (MAX_POSSIBLE_PHYSMEM_BITS >= (BITS_PER_LONG + PAGE_SHIFT - > OBJ_TAG_BITS)) > + #error "MAX_POSSIBLE_PHYSMEM_BITS is wrong for this arch"; > + #endif > + #endif > +#endif > + > +#define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT) > #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS) > #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) Have you tested it with CONFIG_X86_5LEVEL=y? ASAICS, the patch makes OBJ_INDEX_BITS and what depends from it dynamic -- it depends what paging mode we are booting in. ZS_SIZE_CLASSES depends indirectly on OBJ_INDEX_BITS and I don't see how struct zs_pool definition can compile with dynamic ZS_SIZE_CLASSES. Hm? -- Kirill A. Shutemov
[PATCH] ocxl: Fix endiannes bug in read_afu_name()
The double word returned by read_afu_info(OCXL_DVSEC_TEMPL_NAME) contains four characters of the AFU name, read from the PCI config space, hence with a little-endian ordering. When composing the string, a big-endian system must swap the bytes so that the characters appear in the right order. Do this with le32_to_cpu(). Signed-off-by: Greg Kurz --- drivers/misc/ocxl/config.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c index 57a6bb1fd3c9..b76198ba8630 100644 --- a/drivers/misc/ocxl/config.c +++ b/drivers/misc/ocxl/config.c @@ -318,7 +318,7 @@ static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn, if (rc) return rc; ptr = (u32 *) &afu->name[i]; - *ptr = val; + *ptr = le32_to_cpu(val); } afu->name[OCXL_AFU_NAME_SZ - 1] = '\0'; /* play safe */ return 0;
Re: [PATCH] ocxl: Fix endiannes bug in read_afu_name()
Le 10/12/2018 à 16:10, Greg Kurz a écrit : The double word returned by read_afu_info(OCXL_DVSEC_TEMPL_NAME) contains four characters of the AFU name, read from the PCI config space, hence with a little-endian ordering. When composing the string, a big-endian system must swap the bytes so that the characters appear in the right order. Do this with le32_to_cpu(). Signed-off-by: Greg Kurz --- Thanks! Acked-by: Frederic Barrat drivers/misc/ocxl/config.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c index 57a6bb1fd3c9..b76198ba8630 100644 --- a/drivers/misc/ocxl/config.c +++ b/drivers/misc/ocxl/config.c @@ -318,7 +318,7 @@ static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn, if (rc) return rc; ptr = (u32 *) &afu->name[i]; - *ptr = val; + *ptr = le32_to_cpu(val); } afu->name[OCXL_AFU_NAME_SZ - 1] = '\0'; /* play safe */ return 0;
Re: use generic DMA mapping code in powerpc V4
Next step: 64ecd2c160bbef31465c4d34efc0f076a2aad4df (powerpc/dma: use phys_to_dma instead of get_dma_offset) The P5020 board boots and the PASEMI onboard ethernet works. -- Christian On 09 December 2018 at 7:26PM, Christian Zigotzky wrote: Next step: c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a (dma-mapping, powerpc: simplify the arch dma_set_mask override) Result: No problems with the PASEMI onboard ethernet and with booting the X5000 (P5020 board). -- Christian On 09 December 2018 at 3:20PM, Christian Zigotzky wrote: Next step: 602307b034734ce77a05da4b99333a2eaf6b6482 (powerpc/fsl_pci: simplify fsl_pci_dma_set_mask) git checkout 602307b034734ce77a05da4b99333a2eaf6b6482 The PASEMI onboard ethernet works and the X5000 boots. -- Christian On 08 December 2018 at 2:47PM, Christian Zigotzky wrote: Next step: e15cd8173ef85e9cc3e2a9c7cc2982f5c1355615 (powerpc/dma: fix an off-by-one in dma_capable) git checkout e15cd8173ef85e9cc3e2a9c7cc2982f5c1355615 The PASEMI onboard ethernet also works with this commit and the X5000 boots without any problems. -- Christian On 08 December 2018 at 11:29AM, Christian Zigotzky wrote: Next step: 7ebc44c535f6bd726d553756d38b137acc718443 (powerpc/dma: remove max_direct_dma_addr) git checkout 7ebc44c535f6bd726d553756d38b137acc718443 OK, the PASEMI onboard ethernet works and the P5020 board boots. -- Christian On 07 December 2018 at 7:33PM, Christian Zigotzky wrote: Next step: 13c1fdec5682b6e13257277fa16aa31f342d167d (powerpc/dma: move pci_dma_dev_setup_swiotlb to fsl_pci.c) git checkout 13c1fdec5682b6e13257277fa16aa31f342d167d Result: The PASEMI onboard ethernet works and the P5020 board boots. — Christian
Re: [PATCH v2 01/34] kbuild: Add support for DT binding schema checks
On Fri, Dec 7, 2018 at 10:47 PM Masahiro Yamada wrote: > > Hi Rob, > > > On Tue, Dec 4, 2018 at 6:32 AM Rob Herring wrote: > > > > This adds the build infrastructure for checking DT binding schema > > documents and validating dts files using the binding schema. > > > > Check DT binding schema documents: > > make dt_binding_check > > > > Build dts files and check using DT binding schema: > > make dtbs_check > > > > Optionally, DT_SCHEMA_FILES can passed in with a schema file(s) to use > > for validation. This makes it easier to find and fix errors generated by > > a specific schema. > > > > Currently, the validation targets are separate from a normal build to > > avoid a hard dependency on the external DT schema project and because > > there are lots of warnings generated. > > > > Cc: Jonathan Corbet > > Cc: Mark Rutland > > Cc: Masahiro Yamada > > Cc: Michal Marek > > Cc: linux-...@vger.kernel.org > > Cc: devicet...@vger.kernel.org > > Cc: linux-kbu...@vger.kernel.org > > Signed-off-by: Rob Herring > > --- > > .gitignore | 1 + > > Documentation/Makefile | 2 +- > > Documentation/devicetree/bindings/.gitignore | 1 + > > Documentation/devicetree/bindings/Makefile | 33 > > Makefile | 11 +-- > > scripts/Makefile.lib | 24 -- > > 6 files changed, 67 insertions(+), 5 deletions(-) > > create mode 100644 Documentation/devicetree/bindings/.gitignore > > create mode 100644 Documentation/devicetree/bindings/Makefile > > > > diff --git a/.gitignore b/.gitignore > > index 97ba6b79834c..a20ac26aa2f5 100644 > > --- a/.gitignore > > +++ b/.gitignore > > @@ -15,6 +15,7 @@ > > *.bin > > *.bz2 > > *.c.[012]*.* > > +*.dt.yaml > > *.dtb > > *.dtb.S > > *.dwo > > diff --git a/Documentation/Makefile b/Documentation/Makefile > > index 2ca77ad0f238..9786957c6a35 100644 > > --- a/Documentation/Makefile > > +++ b/Documentation/Makefile > > @@ -2,7 +2,7 @@ > > # Makefile for Sphinx documentation > > # > > > > -subdir-y := > > +subdir-y := devicetree/bindings/ > > > > # You can set these variables from the command line. > > SPHINXBUILD = sphinx-build > > diff --git a/Documentation/devicetree/bindings/.gitignore > > b/Documentation/devicetree/bindings/.gitignore > > new file mode 100644 > > index ..d9194c02dd08 > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/.gitignore > > @@ -0,0 +1 @@ > > +*.example.dts > > diff --git a/Documentation/devicetree/bindings/Makefile > > b/Documentation/devicetree/bindings/Makefile > > new file mode 100644 > > index ..ee0110dd8131 > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/Makefile > > @@ -0,0 +1,33 @@ > > +# SPDX-License-Identifier: GPL-2.0 > > +DT_DOC_CHECKER ?= dt-doc-validate > > +DT_EXTRACT_EX ?= dt-extract-example > > +DT_MK_SCHEMA ?= dt-mk-schema > > +DT_MK_SCHEMA_FLAGS := $(if $(DT_SCHEMA_FILES), -u) > > + > > +quiet_cmd_chk_binding = CHKDT $< > > + cmd_chk_binding = (set -e; \ > > + $(DT_DOC_CHECKER) $< ; \ > > + mkdir -p $(dir $@) ; \ > > + $(DT_EXTRACT_EX) $< > $@ ) > > + > > +$(obj)/%.example.dts: $(src)/%.yaml FORCE > > + $(call if_changed,chk_binding) > > + > > +DT_TMP_SCHEMA := .schema.yaml.tmp > > +extra-y += $(DT_TMP_SCHEMA) > > + > > +quiet_cmd_mk_schema = SCHEMA $@ > > + cmd_mk_schema = mkdir -p $(obj); \ > > + rm -f $@; \ > > + $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $< > > > I think '$<' is wrong. > > '$<' is replaced with the first prerequisite. Indeed. > You can easily check what is happening here. > > $ cat Documentation/devicetree/bindings/..schema.yaml.tmp.cmd > cmd_Documentation/devicetree/bindings/.schema.yaml.tmp := mkdir -p > Documentation/devicetree/bindings; rm -f > Documentation/devicetree/bindings/.schema.yaml.tmp; dt-mk-schema -o > Documentation/devicetree/bindings/.schema.yaml.tmp > Documentation/devicetree/bindings/arm/ti/ti,davinci.yaml > > > So, the dt-validater will check only binding from ti,davinci.yaml, > which is almost useless. > > > > If I understand it correctly, > .schema.yaml.tmp should contain all binding yaml. > > > I fixed it up like follows: > > diff --git a/Documentation/devicetree/bindings/Makefile > b/Documentation/devicetree/bindings/Makefile > index ee0110d..267458f 100644 > --- a/Documentation/devicetree/bindings/Makefile > +++ b/Documentation/devicetree/bindings/Makefile > @@ -19,7 +19,7 @@ extra-y += $(DT_TMP_SCHEMA) > quiet_cmd_mk_schema = SCHEMA $@ >cmd_mk_schema = mkdir -p $(obj); \ >rm -f $@; \ > - $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $< > + $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ > $(filter-out FORCE, $^) Thanks, I incorporated this fix. > > DT_DOCS = $(shell cd $(srctree)/$
Re: [PATCH] ocxl/afu_irq: Don't include
Le 10/12/2018 à 16:13, Greg Kurz a écrit : The AFU irq code doesn't need to reach out to the platform. Signed-off-by: Greg Kurz --- Acked-by: Frederic Barrat drivers/misc/ocxl/afu_irq.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c index e70cfa24577f..11ab996657a2 100644 --- a/drivers/misc/ocxl/afu_irq.c +++ b/drivers/misc/ocxl/afu_irq.c @@ -2,7 +2,6 @@ // Copyright 2017 IBM Corp. #include #include -#include #include "ocxl_internal.h" #include "trace.h"
[PATCH] ocxl: Clarify error path in setup_xsl_irq()
Implementing rollback with goto and labels is a common practice that leads to prettier and more maintainable code. FWIW, this design pattern is already being used in alloc_link() a few lines below in this file. Do the same in setup_xsl_irq(). Signed-off-by: Greg Kurz --- drivers/misc/ocxl/link.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index eed92055184d..659977a17405 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -273,9 +273,9 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) spa->irq_name = kasprintf(GFP_KERNEL, "ocxl-xsl-%x-%x-%x", link->domain, link->bus, link->dev); if (!spa->irq_name) { - unmap_irq_registers(spa); dev_err(&dev->dev, "Can't allocate name for xsl interrupt\n"); - return -ENOMEM; + rc = -ENOMEM; + goto err_xsl; } /* * At some point, we'll need to look into allowing a higher @@ -283,11 +283,10 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) */ spa->virq = irq_create_mapping(NULL, hwirq); if (!spa->virq) { - kfree(spa->irq_name); - unmap_irq_registers(spa); dev_err(&dev->dev, "irq_create_mapping failed for translation interrupt\n"); - return -EINVAL; + rc = -EINVAL; + goto err_name; } dev_dbg(&dev->dev, "hwirq %d mapped to virq %d\n", hwirq, spa->virq); @@ -295,15 +294,21 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) rc = request_irq(spa->virq, xsl_fault_handler, 0, spa->irq_name, link); if (rc) { - irq_dispose_mapping(spa->virq); - kfree(spa->irq_name); - unmap_irq_registers(spa); dev_err(&dev->dev, "request_irq failed for translation interrupt: %d\n", rc); - return -EINVAL; + rc = -EINVAL; + goto err_mapping; } return 0; + +err_mapping: + irq_dispose_mapping(spa->virq); +err_name: + kfree(spa->irq_name); +err_xsl: + unmap_irq_registers(spa); + return rc; } static void release_xsl_irq(struct link *link)
Re: [PATCH] ocxl: Simplify free_spa()
Le 10/12/2018 à 16:15, Greg Kurz a écrit : The only users of free_spa() are alloc_link() and free_link(), and in both cases: - link->spa != NULL - free_spa(link) is immediatly followed by kfree(link) The check isn't needed, and it doesn't bring much to clear the link->spa pointer. Drop both. Signed-off-by: Greg Kurz --- OK, that looks safe enough Acked-by: Frederic Barrat drivers/misc/ocxl/link.c |7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 31695a078485..eed92055184d 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -352,11 +352,8 @@ static void free_spa(struct link *link) pr_debug("Freeing SPA for %x:%x:%x\n", link->domain, link->bus, link->dev); - if (spa && spa->spa_mem) { - free_pages((unsigned long) spa->spa_mem, spa->spa_order); - kfree(spa); - link->spa = NULL; - } + free_pages((unsigned long) spa->spa_mem, spa->spa_order); + kfree(spa); } static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link)
[PATCH] ocxl: Simplify free_spa()
The only users of free_spa() are alloc_link() and free_link(), and in both cases: - link->spa != NULL - free_spa(link) is immediatly followed by kfree(link) The check isn't needed, and it doesn't bring much to clear the link->spa pointer. Drop both. Signed-off-by: Greg Kurz --- drivers/misc/ocxl/link.c |7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 31695a078485..eed92055184d 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -352,11 +352,8 @@ static void free_spa(struct link *link) pr_debug("Freeing SPA for %x:%x:%x\n", link->domain, link->bus, link->dev); - if (spa && spa->spa_mem) { - free_pages((unsigned long) spa->spa_mem, spa->spa_order); - kfree(spa); - link->spa = NULL; - } + free_pages((unsigned long) spa->spa_mem, spa->spa_order); + kfree(spa); } static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link)
[PATCH] ocxl/afu_irq: Don't include
The AFU irq code doesn't need to reach out to the platform. Signed-off-by: Greg Kurz --- drivers/misc/ocxl/afu_irq.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c index e70cfa24577f..11ab996657a2 100644 --- a/drivers/misc/ocxl/afu_irq.c +++ b/drivers/misc/ocxl/afu_irq.c @@ -2,7 +2,6 @@ // Copyright 2017 IBM Corp. #include #include -#include #include "ocxl_internal.h" #include "trace.h"
Re: use generic DMA mapping code in powerpc V4
On Sat, 8 Dec 2018 at 17:17, Christoph Hellwig wrote: > > On Sun, Dec 02, 2018 at 05:11:02PM +1100, Benjamin Herrenschmidt wrote: > > Talking of which ... Christoph, not sure if we can do something about > > this at the DMA API level or keep hacks but some adapters such as the > > nVidia GPUs have a HW hack we can use to work around their limitations > > in that case. > > > > They have a register that can program a fixed value for the top bits > > that they don't support. > > > > This works fine for any linear mapping with an offset, provided they > > can program the offset in that register and they have enough DMA range > > to cover all memory from that offset. > > > > I can probably get the info about this from them so we can exploit it > > in nouveau. > > I think we can expose the direct mapping offset if people care enough, > we just have to be very careful designing the API. I'll happily leave > that to those that actually want to use it, but I'll gladly help > reviewing it. Hi, Christoph and Ben, It just came to my mind (and this is most likely a stupid question, but still)… Is there any possibility of these changes having an (positive) effect on the long-standing problem of Power Mac machines with AGP graphics cards (which have to be limited to PCI transfers, otherwise they'll hang, due to coherence issues)? If so, I have a G4 machine where I'd gladly test them. Thanks, Rui
Re: [PATCH] ocxl: Clarify error path in setup_xsl_irq()
Le 10/12/2018 à 16:18, Greg Kurz a écrit : Implementing rollback with goto and labels is a common practice that leads to prettier and more maintainable code. FWIW, this design pattern is already being used in alloc_link() a few lines below in this file. Do the same in setup_xsl_irq(). Signed-off-by: Greg Kurz --- This looks good. I don't have a fixed limit when I start using the "goto undo" pattern, so it's likely inconsistent in other places as well. Truth is I'm not too fussed either way. Acked-by: Frederic Barrat drivers/misc/ocxl/link.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index eed92055184d..659977a17405 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -273,9 +273,9 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) spa->irq_name = kasprintf(GFP_KERNEL, "ocxl-xsl-%x-%x-%x", link->domain, link->bus, link->dev); if (!spa->irq_name) { - unmap_irq_registers(spa); dev_err(&dev->dev, "Can't allocate name for xsl interrupt\n"); - return -ENOMEM; + rc = -ENOMEM; + goto err_xsl; } /* * At some point, we'll need to look into allowing a higher @@ -283,11 +283,10 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) */ spa->virq = irq_create_mapping(NULL, hwirq); if (!spa->virq) { - kfree(spa->irq_name); - unmap_irq_registers(spa); dev_err(&dev->dev, "irq_create_mapping failed for translation interrupt\n"); - return -EINVAL; + rc = -EINVAL; + goto err_name; } dev_dbg(&dev->dev, "hwirq %d mapped to virq %d\n", hwirq, spa->virq); @@ -295,15 +294,21 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) rc = request_irq(spa->virq, xsl_fault_handler, 0, spa->irq_name, link); if (rc) { - irq_dispose_mapping(spa->virq); - kfree(spa->irq_name); - unmap_irq_registers(spa); dev_err(&dev->dev, "request_irq failed for translation interrupt: %d\n", rc); - return -EINVAL; + rc = -EINVAL; + goto err_mapping; } return 0; + +err_mapping: + irq_dispose_mapping(spa->virq); +err_name: + kfree(spa->irq_name); +err_xsl: + unmap_irq_registers(spa); + return rc; } static void release_xsl_irq(struct link *link)
Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
Quoting Daniel Borkmann (2018-12-10 08:26:31) > On 12/07/2018 04:36 PM, Michael Roth wrote: > > Quoting Michael Ellerman (2018-12-07 06:31:13) > >> Michael Roth writes: > >> > >>> Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF > >>> JIT allocations. At compile time it defaults to PAGE_SIZE * 4, > >>> and is adjusted again at init time if MODULES_VADDR is defined. > >>> > >>> For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with > >> > >> But maybe it should be, I don't know why we don't define it. > >> > >>> the compile-time default at boot-time, which is 0x9c40 when > >>> using 64K page size. This overflows the signed 32-bit bpf_jit_limit > >>> value: > >>> > >>> root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit > >>> -1673527296 > >>> > >>> and can cause various unexpected failures throughout the network > >>> stack. In one case `strace dhclient eth0` reported: > >>> > >>> setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, > >>> filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524) > >>> > >>> and similar failures can be seen with tools like tcpdump. This doesn't > >>> always reproduce however, and I'm not sure why. The more consistent > >>> failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host > >>> would time out on systemd/netplan configuring a virtio-net NIC with no > >>> noticeable errors in the logs. > >>> > >>> Fix this by limiting the compile-time default for bpf_jit_limit to > >>> INT_MAX. > >> > >> INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on > >> whether we should be using PAGE_SIZE here at all. I guess each BPF > >> program uses at least one page is the thinking? > > > > That seems to be the case, at least, the max number of minimum-sized > > allocations would be less on ppc64 since the allocations are always at > > least PAGE_SIZE in size. The init-time default also limits to INT_MAX, > > so it seemed consistent to do that here too. > > > >> > >> Thanks for tracking this down. For some reason none of my ~10 test boxes > >> have hit this, perhaps I don't have new enough userspace? > > > > I'm not too sure, I would've thought things like the dhclient case in > > the commit log would fail every time, but sometimes I need to reboot the > > guest before I start seeing the behavior. Maybe there's something special > > about when JIT allocations are actually done that can affect > > reproducibility? > > > > In my case at least the virtio-net networking timeout was consistent > > enough for a bisect, but maybe it depends on the specific network > > configuration (single NIC, basic DHCP through netplan/systemd in my case). > > > >> > >> You don't mention why you needed to add BPF_MIN(), I assume because the > >> kernel version of min() has gotten too complicated to work here? > > > > I wasn't sure if it was safe here or not, so I tried looking at other > > users and came across: > > > > mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : > > (y)) /* can't use min() */ > > > > I'm not sure what the reasoning was (or whether it still applies), but I > > figured it was safer to do the same here. Maybe Nick still recalls? > > > >> > >> Daniel I assume you'll merge this via your tree? > >> > >> cheers > >> > >>> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv > >>> allocations") > >>> Cc: linuxppc-...@ozlabs.org > >>> Cc: Daniel Borkmann > >>> Cc: Sandipan Das > >>> Cc: Alexei Starovoitov > >>> Signed-off-by: Michael Roth > > Thanks for the reports / fixes and sorry for my late reply (bit too > swamped last week), some more thoughts below. > > >>> kernel/bpf/core.c | 3 ++- > >>> 1 file changed, 2 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > >>> index b1a3545d0ec8..55de4746cdfd 100644 > >>> --- a/kernel/bpf/core.c > >>> +++ b/kernel/bpf/core.c > >>> @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) > >>> } > >>> > >>> #ifdef CONFIG_BPF_JIT > >>> -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4) > >>> +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y)) > >>> +# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 4), > >>> INT_MAX) > >>> > >>> /* All BPF JIT sysctl knobs here. */ > >>> int bpf_jit_enable __read_mostly = > >>> IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); > > I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT > define also given for 4.21 arm64 will have its own dedicated area for > JIT allocations where neither the above limit nor the MODULES_END/ > MODULES_VADDR one would fit and I don't want to make this even more > ugly with adding further cases into the core. Would the below variant > work for you? Looks good to me. My one concern (which is probably a separate issue) is that the INT_MAX limit is a bit more punishing for larger page sizes since the minimum allocations seem to be 1 page. Are there reasonable workloads that cou
Re: [PATCH] lib: fix build failure in CONFIG_DEBUG_VIRTUAL test
On Mon, Dec 10, 2018 at 12:08 AM Christophe Leroy wrote: > > On several arches, virt_to_phys() is in io.h > > Build fails without it: > > CC lib/test_debug_virtual.o > lib/test_debug_virtual.c: In function 'test_debug_virtual_init': > lib/test_debug_virtual.c:26:7: error: implicit declaration of function > 'virt_to_phys' [-Werror=implicit-function-declaration] > pa = virt_to_phys(va); >^ > > Fixes: e4dace361552 ("lib: add test module for CONFIG_DEBUG_VIRTUAL") > CC: sta...@vger.kernel.org > Signed-off-by: Christophe Leroy Reviewed-by: Kees Cook -Kees > --- > lib/test_debug_virtual.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/test_debug_virtual.c b/lib/test_debug_virtual.c > index d5a06addeb27..bf864c73e462 100644 > --- a/lib/test_debug_virtual.c > +++ b/lib/test_debug_virtual.c > @@ -5,6 +5,7 @@ > #include > #include > #include > +#include > > #include > #ifdef CONFIG_MIPS > -- > 2.13.3 > -- Kees Cook
Re: use generic DMA mapping code in powerpc V4
On Mon, Dec 10, 2018 at 05:04:46PM +, Rui Salvaterra wrote: > Hi, Christoph and Ben, > > It just came to my mind (and this is most likely a stupid question, > but still)… Is there any possibility of these changes having an > (positive) effect on the long-standing problem of Power Mac machines > with AGP graphics cards (which have to be limited to PCI transfers, > otherwise they'll hang, due to coherence issues)? If so, I have a G4 > machine where I'd gladly test them. These patches themselves are not going to affect that directly. But IFF the problem really is that the AGP needs to be treated as not cache coherent (I have no idea if that is true) the generic direct mapping code has full support for a per-device coherent flag, so support for a non-coherent AGP slot could be implemented relatively simply.
[PATCH 3/6] powerpc/frace: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack
From: "Steven Rostedt (VMware)" [ Folks, I'm working on rewriting the function graph tracer. In order to do so, some changes need to be done that affect architecture specific code. I'm only able to compile test these changes. I would like to have folks check out my repo and give them a test. git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git ftrace/core Head SHA1: 51584396cff54aaf57ed0bd353767d71429f77b4 ] The structure of the ret_stack array on the task struct is going to change, and accessing it directly via the curr_ret_stack index will no longer give the ret_stack entry that holds the return address. To access that, architectures must now use ftrace_graph_get_ret_stack() to get the associated ret_stack that matches the saved return address. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Steven Rostedt (VMware) --- arch/powerpc/kernel/process.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 96f34730010f..ce393df243aa 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -2061,9 +2061,10 @@ void show_stack(struct task_struct *tsk, unsigned long *stack) int count = 0; int firstframe = 1; #ifdef CONFIG_FUNCTION_GRAPH_TRACER - int curr_frame = current->curr_ret_stack; + struct ftrace_ret_stack *ret_stack; extern void return_to_handler(void); unsigned long rth = (unsigned long)return_to_handler; + int curr_frame = 0; #endif sp = (unsigned long) stack; @@ -2089,9 +2090,13 @@ void show_stack(struct task_struct *tsk, unsigned long *stack) printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip); #ifdef CONFIG_FUNCTION_GRAPH_TRACER if ((ip == rth) && curr_frame >= 0) { - pr_cont(" (%pS)", - (void *)current->ret_stack[curr_frame].ret); - curr_frame--; + ret_stack = ftrace_graph_get_ret_stack(current, + curr_frame++); + if (ret_stack) + pr_cont(" (%pS)", + (void *)ret_stack->ret); + else + curr_frame = -1; } #endif if (firstframe) -- 2.19.1
[PATCH] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support
On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the physical frame number might be so big that zsmalloc obj encoding (to location) will break, causing: BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc Read of size 4 at addr by task mkfs.ext4/623 CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted 4.19.0-rc8-00017-g8239bc6d3307-dirty #15 Hardware name: Generic DT based system [] (unwind_backtrace) from [] (show_stack+0x20/0x24) [] (show_stack) from [] (dump_stack+0xbc/0xe8) [] (dump_stack) from [] (kasan_report+0x248/0x390) [] (kasan_report) from [] (__asan_load4+0x78/0xb4) [] (__asan_load4) from [] (zs_map_object+0xa4/0x2bc) [] (zs_map_object) from [] (zram_bvec_rw.constprop.2+0x324/0x8e4 [zram]) [] (zram_bvec_rw.constprop.2 [zram]) from [] (zram_make_request+0x234/0x46c [zram]) [] (zram_make_request [zram]) from [] (generic_make_request+0x304/0x63c) [] (generic_make_request) from [] (submit_bio+0x4c/0x1c8) [] (submit_bio) from [] (submit_bh_wbc.constprop.15+0x238/0x26c) [] (submit_bh_wbc.constprop.15) from [] (__block_write_full_page+0x524/0x76c) [] (__block_write_full_page) from [] (block_write_full_page+0x1bc/0x1d4) [] (block_write_full_page) from [] (blkdev_writepage+0x24/0x28) [] (blkdev_writepage) from [] (__writepage+0x44/0x78) [] (__writepage) from [] (write_cache_pages+0x3b8/0x800) [] (write_cache_pages) from [] (generic_writepages+0x74/0xa0) [] (generic_writepages) from [] (blkdev_writepages+0x18/0x1c) [] (blkdev_writepages) from [] (do_writepages+0x68/0x134) [] (do_writepages) from [] (__filemap_fdatawrite_range+0xb0/0xf4) [] (__filemap_fdatawrite_range) from [] (file_write_and_wait_range+0x64/0xd0) [] (file_write_and_wait_range) from [] (blkdev_fsync+0x54/0x84) [] (blkdev_fsync) from [] (vfs_fsync_range+0x70/0xd4) [] (vfs_fsync_range) from [] (do_fsync+0x4c/0x80) [] (do_fsync) from [] (sys_fsync+0x1c/0x20) [] (sys_fsync) from [] (ret_fast_syscall+0x0/0x2c) when trying to decode (the pfn) and map the object. That happens because one architecture might not re-define MAX_PHYSMEM_BITS, like in this ARM 32-bit w/ LPAE enabled example. For 32-bit systems, if not re-defined, MAX_POSSIBLE_PHYSMEM_BITS will default to BITS_PER_LONG (32) in most cases, and, with PAE enabled, _PFN_BITS might be wrong: which may cause obj variable to overflow if frame number is in HIGHMEM and referencing a page above the 4GB watermark. commit 6e00ec00b1a7 ("staging: zsmalloc: calculate MAX_PHYSMEM_BITS if not defined") realized MAX_PHYSMEM_BITS depended on SPARSEMEM headers and "fixed" it by calculating it using BITS_PER_LONG if SPARSEMEM wasn't used, like in the example given above. Systems with potential for PAE exist for a long time and assuming BITS_PER_LONG seems inadequate. Defining MAX_PHYSMEM_BITS looks better, however it is NOT a constant anymore for x86. SO, instead, MAX_POSSIBLE_PHYSMEM_BITS should be defined by every architecture using zsmalloc, together with a sanity check for MAX_POSSIBLE_PHYSMEM_BITS being too big on 32-bit systems. Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17 Signed-off-by: Rafael David Tinoco --- arch/arm/include/asm/pgtable-2level-types.h | 2 ++ arch/arm/include/asm/pgtable-3level-types.h | 2 ++ arch/arm64/include/asm/pgtable-types.h | 2 ++ arch/ia64/include/asm/page.h| 2 ++ arch/mips/include/asm/page.h| 2 ++ arch/powerpc/include/asm/mmu.h | 2 ++ arch/s390/include/asm/page.h| 2 ++ arch/sh/include/asm/page.h | 2 ++ arch/sparc/include/asm/page_32.h| 2 ++ arch/sparc/include/asm/page_64.h| 2 ++ arch/x86/include/asm/pgtable-2level_types.h | 2 ++ arch/x86/include/asm/pgtable-3level_types.h | 3 +- arch/x86/include/asm/pgtable_64_types.h | 4 +-- mm/zsmalloc.c | 35 +++-- 14 files changed, 45 insertions(+), 19 deletions(-) diff --git a/arch/arm/include/asm/pgtable-2level-types.h b/arch/arm/include/asm/pgtable-2level-types.h index 66cb5b0e89c5..552dba411324 100644 --- a/arch/arm/include/asm/pgtable-2level-types.h +++ b/arch/arm/include/asm/pgtable-2level-types.h @@ -64,4 +64,6 @@ typedef pteval_t pgprot_t; #endif /* STRICT_MM_TYPECHECKS */ +#define MAX_POSSIBLE_PHYSMEM_BITS 32 + #endif /* _ASM_PGTABLE_2LEVEL_TYPES_H */ diff --git a/arch/arm/include/asm/pgtable-3level-types.h b/arch/arm/include/asm/pgtable-3level-types.h index 921aa30259c4..664c39e6517c 100644 --- a/arch/arm/include/asm/pgtable-3level-types.h +++ b/arch/arm/include/asm/pgtable-3level-types.h @@ -67,4 +67,6 @@ typedef pteval_t pgprot_t; #endif /* STRICT_MM_TYPECHECKS */ +#define MAX_POSSIBLE_PHYSMEM_BITS 36 + #endif /* _ASM_PGTABLE_3LEVEL_TYPES_H */ diff --git a/arch/arm64/include/asm/pgtable-types.h b/arch/arm64/include/asm/pgtable-types.h index 345a072b5856..45c3834eb4c8 100644 --- a/arch/arm64/include/asm/pgtable-types.h +++ b/arch/arm64/include/asm/pgtable-types.h @@ -64,4 +64,
Re: [PATCH] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support
On 12/10/18 1:15 PM, Kirill A. Shutemov wrote: > On Mon, Dec 10, 2018 at 12:21:05PM -0200, Rafael David Tinoco wrote: >> diff --git a/arch/x86/include/asm/pgtable_64_types.h >> b/arch/x86/include/asm/pgtable_64_types.h >> index 84bd9bdc1987..d808cfde3d19 100644 >> --- a/arch/x86/include/asm/pgtable_64_types.h >> +++ b/arch/x86/include/asm/pgtable_64_types.h >> @@ -64,8 +64,6 @@ extern unsigned int ptrs_per_p4d; >> #define P4D_SIZE(_AC(1, UL) << P4D_SHIFT) >> #define P4D_MASK(~(P4D_SIZE - 1)) >> >> -#define MAX_POSSIBLE_PHYSMEM_BITS 52 >> - >> #else /* CONFIG_X86_5LEVEL */ >> >> /* >> @@ -154,4 +152,6 @@ extern unsigned int ptrs_per_p4d; >> >> #define PGD_KERNEL_START((PAGE_SIZE / 2) / sizeof(pgd_t)) >> >> +#define MAX_POSSIBLE_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46) >> + > > ... > >> #endif /* _ASM_X86_PGTABLE_64_DEFS_H */ >> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c >> index 0787d33b80d8..132c20b6fd4f 100644 >> --- a/mm/zsmalloc.c >> +++ b/mm/zsmalloc.c > > ... > >> @@ -116,6 +100,25 @@ >> */ >> #define OBJ_ALLOCATED_TAG 1 >> #define OBJ_TAG_BITS 1 >> + >> +/* >> + * MAX_POSSIBLE_PHYSMEM_BITS should be defined by all archs using zsmalloc: >> + * Trying to guess it from MAX_PHYSMEM_BITS, or considering it >> BITS_PER_LONG, >> + * proved to be wrong by not considering PAE capabilities, or using >> SPARSEMEM >> + * only headers, leading to bad object encoding due to object index >> overflow. >> + */ >> +#ifndef MAX_POSSIBLE_PHYSMEM_BITS >> + #define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG >> + #error "MAX_POSSIBLE_PHYSMEM_BITS HAS to be defined by arch using >> zsmalloc"; >> +#else >> + #ifndef CONFIG_64BIT >> + #if (MAX_POSSIBLE_PHYSMEM_BITS >= (BITS_PER_LONG + PAGE_SHIFT - >> OBJ_TAG_BITS)) >> + #error "MAX_POSSIBLE_PHYSMEM_BITS is wrong for this arch"; >> + #endif >> + #endif >> +#endif >> + >> +#define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT) >> #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS) >> #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) > > Have you tested it with CONFIG_X86_5LEVEL=y? > > ASAICS, the patch makes OBJ_INDEX_BITS and what depends from it dynamic -- > it depends what paging mode we are booting in. ZS_SIZE_CLASSES depends > indirectly on OBJ_INDEX_BITS and I don't see how struct zs_pool definition > can compile with dynamic ZS_SIZE_CLASSES. > > Hm? > You're right, terribly sorry. This was a last time change. mm/zsmalloc.c:256:21: error: variably modified ‘size_class’ at file scope I'll revisit the patch. Any other comments are welcome. Thank you
Re: [PATCH] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support
On 12/10/18 1:05 PM, Russell King - ARM Linux wrote: > On Mon, Dec 10, 2018 at 02:35:55PM +, Robin Murphy wrote: >> On 10/12/2018 14:21, Rafael David Tinoco wrote: >>> On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the >>> physical frame number might be so big that zsmalloc obj encoding (to >>> location) will break, causing: >>> >>> BUG: KASAN: null-ptr-deref in zs_map_object+0xa4/0x2bc >>> Read of size 4 at addr by task mkfs.ext4/623 >>> CPU: 2 PID: 623 Comm: mkfs.ext4 Not tainted >>> 4.19.0-rc8-00017-g8239bc6d3307-dirty #15 >>> Hardware name: Generic DT based system >>> [] (unwind_backtrace) from [] (show_stack+0x20/0x24) >>> [] (show_stack) from [] (dump_stack+0xbc/0xe8) >>> [] (dump_stack) from [] (kasan_report+0x248/0x390) >>> [] (kasan_report) from [] (__asan_load4+0x78/0xb4) >>> [] (__asan_load4) from [] (zs_map_object+0xa4/0x2bc) >>> [] (zs_map_object) from [] >>> (zram_bvec_rw.constprop.2+0x324/0x8e4 [zram]) >>> [] (zram_bvec_rw.constprop.2 [zram]) from [] >>> (zram_make_request+0x234/0x46c [zram]) >>> [] (zram_make_request [zram]) from [] >>> (generic_make_request+0x304/0x63c) >>> [] (generic_make_request) from [] >>> (submit_bio+0x4c/0x1c8) >>> [] (submit_bio) from [] >>> (submit_bh_wbc.constprop.15+0x238/0x26c) >>> [] (submit_bh_wbc.constprop.15) from [] >>> (__block_write_full_page+0x524/0x76c) >>> [] (__block_write_full_page) from [] >>> (block_write_full_page+0x1bc/0x1d4) >>> [] (block_write_full_page) from [] >>> (blkdev_writepage+0x24/0x28) >>> [] (blkdev_writepage) from [] (__writepage+0x44/0x78) >>> [] (__writepage) from [] (write_cache_pages+0x3b8/0x800) >>> [] (write_cache_pages) from [] >>> (generic_writepages+0x74/0xa0) >>> [] (generic_writepages) from [] >>> (blkdev_writepages+0x18/0x1c) >>> [] (blkdev_writepages) from [] >>> (do_writepages+0x68/0x134) >>> [] (do_writepages) from [] >>> (__filemap_fdatawrite_range+0xb0/0xf4) >>> [] (__filemap_fdatawrite_range) from [] >>> (file_write_and_wait_range+0x64/0xd0) >>> [] (file_write_and_wait_range) from [] >>> (blkdev_fsync+0x54/0x84) >>> [] (blkdev_fsync) from [] (vfs_fsync_range+0x70/0xd4) >>> [] (vfs_fsync_range) from [] (do_fsync+0x4c/0x80) >>> [] (do_fsync) from [] (sys_fsync+0x1c/0x20) >>> [] (sys_fsync) from [] (ret_fast_syscall+0x0/0x2c) >>> >>> when trying to decode (the pfn) and map the object. >>> >>> That happens because one architecture might not re-define >>> MAX_PHYSMEM_BITS, like in this ARM 32-bit w/ LPAE enabled example. For >>> 32-bit systems, if not re-defined, MAX_POSSIBLE_PHYSMEM_BITS will >>> default to BITS_PER_LONG (32) in most cases, and, with PAE enabled, >>> _PFN_BITS might be wrong: which may cause obj variable to overflow if >>> frame number is in HIGHMEM and referencing a page above the 4GB >>> watermark. >>> >>> commit 6e00ec00b1a7 ("staging: zsmalloc: calculate MAX_PHYSMEM_BITS if >>> not defined") realized MAX_PHYSMEM_BITS depended on SPARSEMEM headers >>> and "fixed" it by calculating it using BITS_PER_LONG if SPARSEMEM wasn't >>> used, like in the example given above. >>> >>> Systems with potential for PAE exist for a long time and assuming >>> BITS_PER_LONG seems inadequate. Defining MAX_PHYSMEM_BITS looks better, >>> however it is NOT a constant anymore for x86. >>> >>> SO, instead, MAX_POSSIBLE_PHYSMEM_BITS should be defined by every >>> architecture using zsmalloc, together with a sanity check for >>> MAX_POSSIBLE_PHYSMEM_BITS being too big on 32-bit systems. >>> >>> Link: https://bugs.linaro.org/show_bug.cgi?id=3765#c17 >>> Signed-off-by: Rafael David Tinoco >>> --- >>> arch/arm/include/asm/pgtable-2level-types.h | 2 ++ >>> arch/arm/include/asm/pgtable-3level-types.h | 2 ++ >>> arch/arm64/include/asm/pgtable-types.h | 2 ++ >>> arch/ia64/include/asm/page.h| 2 ++ >>> arch/mips/include/asm/page.h| 2 ++ >>> arch/powerpc/include/asm/mmu.h | 2 ++ >>> arch/s390/include/asm/page.h| 2 ++ >>> arch/sh/include/asm/page.h | 2 ++ >>> arch/sparc/include/asm/page_32.h| 2 ++ >>> arch/sparc/include/asm/page_64.h| 2 ++ >>> arch/x86/include/asm/pgtable-2level_types.h | 2 ++ >>> arch/x86/include/asm/pgtable-3level_types.h | 3 +- >>> arch/x86/include/asm/pgtable_64_types.h | 4 +-- >>> mm/zsmalloc.c | 35 +++-- >>> 14 files changed, 45 insertions(+), 19 deletions(-) >>> >>> diff --git a/arch/arm/include/asm/pgtable-2level-types.h >>> b/arch/arm/include/asm/pgtable-2level-types.h >>> index 66cb5b0e89c5..552dba411324 100644 >>> --- a/arch/arm/include/asm/pgtable-2level-types.h >>> +++ b/arch/arm/include/asm/pgtable-2level-types.h >>> @@ -64,4 +64,6 @@ typedef pteval_t pgprot_t; >>> #endif /* STRICT_MM_TYPECHECKS */ >>> +#define MAX_POSSIBLE_PHYSMEM_BITS 32 >>> + >>> #endif /* _ASM_PGTABLE_2LEVEL_TYPES_H */ >>> diff --git a/arch/arm/include/asm/pgtable-3level-types.h >>> b/arch/arm/incl
[PATCH v1 0/2] Interface to expose VPHN information
This is an alternate way to provide statistics around vcpu dispatches in a SPLPAR environment. By obtaining access to the VPHN information, and in concert with the existing debugfs dtl interface, we can derive different statistics related to how vcpus are dispatched by the hypervisor. - Naveen Naveen N. Rao (2): powerpc/pseries: Generalize hcall_vphn() powerpc/pseries: Add debugfs interface to retrieve VPHN info arch/powerpc/mm/numa.c | 132 - 1 file changed, 118 insertions(+), 14 deletions(-) -- 2.19.2
[PATCH v1 1/2] powerpc/pseries: Generalize hcall_vphn()
H_HOME_NODE_ASSOCIATIVITY hcall can take two different flags and return different associativity information in each case. Generalize the existing hcall_vphn() function to take flags as an argument and to return the result. Update the only existing user to pass the proper arguments. Signed-off-by: Naveen N. Rao --- arch/powerpc/mm/numa.c | 27 +-- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 87f0dd004295..6677a578f18d 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -1078,6 +1078,17 @@ static void reset_topology_timer(void); static int topology_timer_secs = 1; static int topology_inited; +static long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity) +{ + long rc; + long retbuf[PLPAR_HCALL9_BUFSIZE] = {0}; + + rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, retbuf, flags, cpu); + vphn_unpack_associativity(retbuf, associativity); + + return rc; +} + /* * Change polling interval for associativity changes. */ @@ -1156,25 +1167,13 @@ static int update_cpu_associativity_changes_mask(void) * Retrieve the new associativity information for a virtual processor's * home node. */ -static long hcall_vphn(unsigned long cpu, __be32 *associativity) -{ - long rc; - long retbuf[PLPAR_HCALL9_BUFSIZE] = {0}; - u64 flags = 1; - int hwcpu = get_hard_smp_processor_id(cpu); - - rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, retbuf, flags, hwcpu); - vphn_unpack_associativity(retbuf, associativity); - - return rc; -} - static long vphn_get_associativity(unsigned long cpu, __be32 *associativity) { long rc; + int hwcpu = get_hard_smp_processor_id(cpu); - rc = hcall_vphn(cpu, associativity); + rc = hcall_vphn(hwcpu, 1, associativity); switch (rc) { case H_FUNCTION: -- 2.19.2
[PATCH v1 2/2] powerpc/pseries: Add debugfs interface to retrieve VPHN info
Add debugfs interface to retrieve associativity information for lpar vcpus (debugfs/vphn/lpar) and the hypervisor cpus (debugfs/vphn/hyp). This information is useful to derive various metrics, including the vcpu dispatch statistics in a SPLPAR environment. Signed-off-by: Naveen N. Rao --- arch/powerpc/mm/numa.c | 105 + 1 file changed, 105 insertions(+) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 6677a578f18d..f0b0e87016e6 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -40,6 +40,7 @@ #include #include #include +#include static int numa_enabled = 1; @@ -1089,6 +1090,107 @@ static long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity) return rc; } +#ifdef CONFIG_DEBUG_FS +static ssize_t vphn_lpar_cpu_file_read(struct file *filp, char __user *buf, + size_t len, loff_t *pos) +{ + int cpu = (long)filp->private_data; + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0}; + int hwcpu = get_hard_smp_processor_id(cpu); + long int rc; + + if (len != sizeof(associativity)) + return -EINVAL; + + rc = hcall_vphn(hwcpu, 1, associativity); + if (rc) + return -EFAULT; + + rc = copy_to_user(buf, &associativity, sizeof(associativity)); + if (rc) + return -EFAULT; + + return sizeof(associativity); +} + +static ssize_t vphn_hyp_cpu_file_read(struct file *filp, char __user *buf, + size_t len, loff_t *pos) +{ + int cpu = (long)filp->private_data; + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0}; + long int rc; + + if (len != sizeof(associativity)) + return -EINVAL; + + rc = hcall_vphn(cpu, 2, associativity); + if (rc) + return -EFAULT; + + rc = copy_to_user(buf, &associativity, sizeof(associativity)); + if (rc) + return -EFAULT; + + return sizeof(associativity); +} + +static const struct file_operations vphn_lpar_cpu_fops = { + .open = simple_open, + .read = vphn_lpar_cpu_file_read, + .llseek = no_llseek, +}; + +static const struct file_operations vphn_hyp_cpu_fops = { + .open = simple_open, + .read = vphn_hyp_cpu_file_read, + .llseek = no_llseek, +}; + +static int debug_init_vphn_entries(void) +{ + struct dentry *vphn_dir, *vphn_lpar_dir, *vphn_hyp_dir; + struct dentry *vphn_lpar_cpu_file, *vphn_hyp_cpu_file; + long cpu; + char name[10]; + + if (!firmware_has_feature(FW_FEATURE_SPLPAR)) + return 0; + + vphn_dir = debugfs_create_dir("vphn", powerpc_debugfs_root); + if (!vphn_dir) { + pr_warn("%s: can't create vphn debugfs root dir\n", __func__); + return -ENOMEM; + } + + vphn_lpar_dir = debugfs_create_dir("lpar", vphn_dir); + vphn_hyp_dir = debugfs_create_dir("hyp", vphn_dir); + if (!vphn_lpar_dir || !vphn_hyp_dir) { + pr_warn("%s: can't create vphn dir\n", __func__); + goto err_remove_dir; + } + + for_each_possible_cpu(cpu) { + sprintf(name, "cpu-%ld", cpu); + vphn_lpar_cpu_file = debugfs_create_file(name, 0400, + vphn_lpar_dir, (void *)cpu, &vphn_lpar_cpu_fops); + vphn_hyp_cpu_file = debugfs_create_file(name, 0400, + vphn_hyp_dir, (void *)cpu, &vphn_hyp_cpu_fops); + if (!vphn_lpar_cpu_file || !vphn_hyp_cpu_file) { + pr_warn("%s: can't create vphn cpu file\n", __func__); + goto err_remove_dir; + } + } + + return 0; + +err_remove_dir: + debugfs_remove_recursive(vphn_dir); + return -ENOMEM; +} +#else +static int debug_init_vphn_entries(void) { return 0; } +#endif /* CONFIG_DEBUG_FS */ + /* * Change polling interval for associativity changes. */ @@ -1619,6 +1721,9 @@ static int topology_update_init(void) if (!proc_create("powerpc/topology_updates", 0644, NULL, &topology_ops)) return -ENOMEM; + if (!debug_init_vphn_entries()) + return -ENOMEM; + topology_inited = 1; return 0; } -- 2.19.2
Re: use generic DMA mapping code in powerpc V4
On Mon, 10 Dec 2018 at 19:33, Christoph Hellwig wrote: > > On Mon, Dec 10, 2018 at 05:04:46PM +, Rui Salvaterra wrote: > > Hi, Christoph and Ben, > > > > It just came to my mind (and this is most likely a stupid question, > > but still)… Is there any possibility of these changes having an > > (positive) effect on the long-standing problem of Power Mac machines > > with AGP graphics cards (which have to be limited to PCI transfers, > > otherwise they'll hang, due to coherence issues)? If so, I have a G4 > > machine where I'd gladly test them. > > These patches themselves are not going to affect that directly. > But IFF the problem really is that the AGP needs to be treated as not > cache coherent (I have no idea if that is true) the generic direct > mapping code has full support for a per-device coherent flag, so > support for a non-coherent AGP slot could be implemented relatively > simply. Thanks for the insight, Christoph. Well, the problem[1] is real, and it's been known for a long time, though I can't be sure if it's *only* a coherence issue. If someone who knows the hardware manages to cook up a patch (as hacky is it may be), I'll definitely fire up old my G4 laptop to test it! :) [1] https://bugs.freedesktop.org/show_bug.cgi?id=95017
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
I have asked Scott Mayes to take a look at one of these crashes from the phyp side. I will let you know if he finds anything notable. Michael On 12/07/2018 08:40 PM, Thiago Jung Bauermann wrote: > > Gautham R Shenoy writes: >> On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote: >>> Sure. I will test the patch and report back. >> >> I added the following debug patch on top of your patch, and after an >> hour's run, the system crashed. Appending the log at the end. > > Thank you very much for testing! Your debug patch was very helpful as > well. > >> I suppose we still need to increase the number of tries since we wait >> only for 2.5ms looping before giving up. > > Do you think it would have helped? In the debug output you posted I > would have expected the following message to show up if the loop > finished too early, and it didn't: > > "Querying DEAD? cpu %i (%i) shows %i\n" > > So I don't think increasing the loop length would have made a > difference. In fact, the call to smp_query_cpu_stopped() always > succeeded on the first iteration. > > I think there is something else going on which we don't fully understand > yet. From your other email: > >> I agree that the Kernel has to respect RTAS's restriction. The PAPR >> v2.8.1, Requirement R1-7.2.3-8 under section 7.2.3 says the following: >> >> "The stop-self service needs to be serialized with calls to the >> stop-self, start-cpu, and set-power-level services. The OS must >> be able to call RTAS services on other processors while the >> processor is stopped or being stopped" >> >> Thus the onus is on the OS to ensure that there are no concurrent rtas >> calls with "stop-self" token. > > As you say perhaps there's another call to stop-self, start-cpu or > set-power-level being made concurrently. I don't currently see how more > than one stop-self or start-cpu call could be in flight at the same time > given that there are a number of locks being grabbed during CPU hotplug > and unplug. OTOH the CPU that actually calls stop-self doesn't seem to > grab any locks itself so it's a possibility. > > As for set-power-level, it's only used in the case of PCI hotplug from > what I can see, and that isn't part of the picture in this case, right? > > We could address this problem directly by adding another lock separate > from rtas.lock to serialize just these calls. The challenge is with > stop-self, because the CPU calling it will never return to release the > lock. Is it possible to grab a lock (or down a semaphore) in the CPU > calling stop-self and then release the lock (or up the semaphore) in the > CPU running pseries_cpu_die()? > >>> There's also a race between the CPU driving the unplug and the CPU >>> being unplugged which I think is not easy for the CPU being >>> unplugged to win, which makes the busy loop in pseries_cpu_die() a >>> bit fragile. I describe the race in the patch description. >>> >>> My solution to make the race less tight is to make the CPU driving >>> the unplug to only start the busy loop only after the CPU being >>> unplugged is in the CPU_STATE_OFFLINE state. At that point, we know >>> that it either is about to call RTAS or it already has. >> >> Ah, yes this is good optimization. Though, I think we ought to >> unconditionally wait until the target CPU has woken up from CEDE and >> changed its state to CPU_STATE_OFFLINE. After if PROD failed, then we >> would have caught it in dlpar_offline_cpu() itself. > > I recently saw a QEMU-implemented hcall (H_LOGICAL_CI_LOAD) return > success when it had been given an invalid memory address to load from, > so my confidence in the error reporting of hcalls is a bit shaken. :-) > > In that case the CPU would wait forever for the CPU state to change. If > you believe 100 ms is too short a timeout, we could make it 500 ms or > even 1s. What do you think? > >> cpu 112 (hwid 112) Ready to die... >> [DEBUG] Waited for CPU 112 to enter rtas: tries=0, time=65 >> cpu 113 (hwid 113) Ready to die... >> [DEBUG] Waited for CPU 113 to enter rtas: tries=0, time=1139 >> cpu 114 (hwid 114) Ready to die... >> [DEBUG] Waited for CPU 114 to enter rtas: tries=0, time=1036 >> cpu 115 (hwid 115) Ready to die... >> [DEBUG] Waited for CPU 115 to enter rtas: tries=0, time=133 >> cpu 116 (hwid 116) Ready to die... >> [DEBUG] Waited for CPU 116 to enter rtas: tries=0, time=1231 >> cpu 117 (hwid 117) Ready to die... >> [DEBUG] Waited for CPU 117 to enter rtas: tries=0, time=1231 >> cpu 118 (hwid 118) Ready to die... >> [DEBUG] Waited for CPU 118 to enter rtas: tries=0, time=1231 >> cpu 119 (hwid 119) Ready to die... >> [DEBUG] Waited for CPU 119 to enter rtas: tries=0, time=1131 >> cpu 104 (hwid 104) Ready to die... >> [DEBUG] Waited for CPU 104 to enter rtas: tries=0, time=40 > > Interesting, so 1.2 ms can pass before the dying CPU actually gets close > to making the stop-self call. And even in those cases the retry loop is > succeeding on the first try! So this shows th
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Hello Michael, Michael Bringmann writes: > I have asked Scott Mayes to take a look at one of these crashes from > the phyp side. I will let you know if he finds anything notable. Thanks! It might make sense to test whether booting with cede_offline=off makes the bug go away. One suspicion I have is regarding the code handling CPU_STATE_INACTIVE. >From what I understand, it is a powerpc-specific CPU state and from the perspective of the generic CPU hotplug state machine, inactive CPUs are already fully offline. Which means that the locking performed by the generic code state machine doesn't apply to transitioning CPUs from INACTIVE to OFFLINE state. Perhaps the bug is that there is more than one CPU making that transition at the same time? That would cause two CPUs to call RTAS stop-self. I haven't checked whether this is really possible or not, though. It's just a conjecture. -- Thiago Jung Bauermann IBM Linux Technology Center
[PATCH v3] kbuild: Add support for DT binding schema checks
This adds the build infrastructure for checking DT binding schema documents and validating dts files using the binding schema. Check DT binding schema documents: make dt_binding_check Build dts files and check using DT binding schema: make dtbs_check Optionally, DT_SCHEMA_FILES can passed in with a schema file(s) to use for validation. This makes it easier to find and fix errors generated by a specific schema. Currently, the validation targets are separate from a normal build to avoid a hard dependency on the external DT schema project and because there are lots of warnings generated. Cc: Jonathan Corbet Cc: Mark Rutland Cc: Masahiro Yamada Cc: Michal Marek Cc: linux-...@vger.kernel.org Cc: devicet...@vger.kernel.org Cc: linux-kbu...@vger.kernel.org Signed-off-by: Rob Herring --- v3: - Fix error causing only 1st schema file to get used. - Add a more useful error message when dtc is missing YAML support telling the user they need to install libyaml devel package. .gitignore | 1 + Documentation/Makefile | 2 +- Documentation/devicetree/bindings/.gitignore | 1 + Documentation/devicetree/bindings/Makefile | 33 + Makefile | 11 -- scripts/Makefile.lib | 37 ++-- 6 files changed, 80 insertions(+), 5 deletions(-) create mode 100644 Documentation/devicetree/bindings/.gitignore create mode 100644 Documentation/devicetree/bindings/Makefile diff --git a/.gitignore b/.gitignore index 97ba6b79834c..a20ac26aa2f5 100644 --- a/.gitignore +++ b/.gitignore @@ -15,6 +15,7 @@ *.bin *.bz2 *.c.[012]*.* +*.dt.yaml *.dtb *.dtb.S *.dwo diff --git a/Documentation/Makefile b/Documentation/Makefile index 2ca77ad0f238..9786957c6a35 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile @@ -2,7 +2,7 @@ # Makefile for Sphinx documentation # -subdir-y := +subdir-y := devicetree/bindings/ # You can set these variables from the command line. SPHINXBUILD = sphinx-build diff --git a/Documentation/devicetree/bindings/.gitignore b/Documentation/devicetree/bindings/.gitignore new file mode 100644 index ..d9194c02dd08 --- /dev/null +++ b/Documentation/devicetree/bindings/.gitignore @@ -0,0 +1 @@ +*.example.dts diff --git a/Documentation/devicetree/bindings/Makefile b/Documentation/devicetree/bindings/Makefile new file mode 100644 index ..43f8657ab070 --- /dev/null +++ b/Documentation/devicetree/bindings/Makefile @@ -0,0 +1,33 @@ +# SPDX-License-Identifier: GPL-2.0 +DT_DOC_CHECKER ?= dt-doc-validate +DT_EXTRACT_EX ?= dt-extract-example +DT_MK_SCHEMA ?= dt-mk-schema +DT_MK_SCHEMA_FLAGS := $(if $(DT_SCHEMA_FILES), -u) + +quiet_cmd_chk_binding = CHKDT $< + cmd_chk_binding = (set -e; \ + $(DT_DOC_CHECKER) $< ; \ + mkdir -p $(dir $@) ; \ + $(DT_EXTRACT_EX) $< > $@ ) + +$(obj)/%.example.dts: $(src)/%.yaml FORCE + $(call if_changed,chk_binding) + +DT_TMP_SCHEMA := .schema.yaml.tmp +extra-y += $(DT_TMP_SCHEMA) + +quiet_cmd_mk_schema = SCHEMA $@ + cmd_mk_schema = mkdir -p $(obj); \ + rm -f $@; \ + $(DT_MK_SCHEMA) $(DT_MK_SCHEMA_FLAGS) -o $@ $(filter-out FORCE, $^) + +DT_DOCS = $(shell cd $(srctree)/$(src) && find * -name '*.yaml') +DT_SCHEMA_FILES ?= $(addprefix $(src)/,$(DT_DOCS)) + +extra-y += $(patsubst $(src)/%.yaml,%.example.dts, $(DT_SCHEMA_FILES)) +extra-y += $(patsubst $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES)) + +$(obj)/$(DT_TMP_SCHEMA): | $(addprefix $(obj)/,$(patsubst $(src)/%.yaml,%.example.dtb, $(DT_SCHEMA_FILES))) + +$(obj)/$(DT_TMP_SCHEMA): $(addprefix $(srctree)/, $(DT_SCHEMA_FILES)) FORCE + $(call if_changed,mk_schema) diff --git a/Makefile b/Makefile index 2f36db897895..ff59adf43300 100644 --- a/Makefile +++ b/Makefile @@ -1232,10 +1232,13 @@ ifneq ($(dtstree),) %.dtb: prepare3 scripts_dtc $(Q)$(MAKE) $(build)=$(dtstree) $(dtstree)/$@ -PHONY += dtbs dtbs_install +PHONY += dtbs dtbs_install dt_binding_check dtbs: prepare3 scripts_dtc $(Q)$(MAKE) $(build)=$(dtstree) +dtbs_check: prepare3 dt_binding_check + $(Q)$(MAKE) $(build)=$(dtstree) CHECK_DTBS=1 + dtbs_install: $(Q)$(MAKE) $(dtbinst)=$(dtstree) @@ -1249,6 +1252,9 @@ PHONY += scripts_dtc scripts_dtc: scripts_basic $(Q)$(MAKE) $(build)=scripts/dtc +dt_binding_check: scripts_dtc + $(Q)$(MAKE) $(build)=Documentation/devicetree/bindings + # --- # Modules @@ -1611,7 +1617,8 @@ clean: $(clean-dirs) $(call cmd,rmfiles) @find $(if $(KBUILD_EXTMOD), $(KBUILD_EXTMOD), .) $(RCS_FIND_IGNORE) \ \( -name '*.[aios]' -o -name '*.ko' -o -name '.*.cmd' \ - -o -name '*.ko.*' -o -name '*.dtb' -o -name '*.dtb.S' \ + -o -name '*.ko.*' \ +
Re: use generic DMA mapping code in powerpc V4
On Mon, 2018-12-10 at 20:33 +0100, Christoph Hellwig wrote: > On Mon, Dec 10, 2018 at 05:04:46PM +, Rui Salvaterra wrote: > > Hi, Christoph and Ben, > > > > It just came to my mind (and this is most likely a stupid question, > > but still)… Is there any possibility of these changes having an > > (positive) effect on the long-standing problem of Power Mac machines > > with AGP graphics cards (which have to be limited to PCI transfers, > > otherwise they'll hang, due to coherence issues)? If so, I have a G4 > > machine where I'd gladly test them. > > These patches themselves are not going to affect that directly. > But IFF the problem really is that the AGP needs to be treated as not > cache coherent (I have no idea if that is true) the generic direct > mapping code has full support for a per-device coherent flag, so > support for a non-coherent AGP slot could be implemented relatively > simply. AGP is a gigantic nightmare :-) It's not just cache coherency issues (some implementations are coherent, some aren't, Apple's is ... weird). Apple has all sort of bugs, and Darwin source code only sheds light on some of them. Some implementation can only read, not write I think, for example. There are issues with transfers crossing some boundaries I beleive, but it's all unclear. Apple makes this work with a combination of hacks in the AGP "driver" and the closed source GPU driver, which we don't see. I have given up trying to make that stuff work reliably a decade ago :) Cheers, Ben.
[PATCH net 0/2] net/ibmvnic: Fix reset work item locking bugs
This patch set fixes issues with scheduling reset work items in a tasklet context. Since ibmvnic_reset can called in an interrupt, it should not use a mutex or allocate memory non-atomically. Thomas Falcon (2): ibmvnic: Convert reset work item mutex to spin lock ibmvnic: Fix non-atomic memory allocation in IRQ context drivers/net/ethernet/ibm/ibmvnic.c | 18 ++ drivers/net/ethernet/ibm/ibmvnic.h | 2 +- 2 files changed, 11 insertions(+), 9 deletions(-) -- 2.12.3
[PATCH net 1/2] ibmvnic: Convert reset work item mutex to spin lock
ibmvnic_reset can create and schedule a reset work item from an IRQ context, so do not use a mutex, which can sleep. Convert the reset work item mutex to a spin lock. Locking debugger generated the trace output below. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908 in_atomic(): 1, irqs_disabled(): 1, pid: 120, name: kworker/8:1 4 locks held by kworker/8:1/120: #0: 17c05720 ((wq_completion)"events"){+.+.}, at: process_one_work+0x188/0x710 #1: ace90706 ((linkwatch_work).work){+.+.}, at: process_one_work+0x188/0x710 #2: 7632871f (rtnl_mutex){+.+.}, at: rtnl_lock+0x30/0x50 #3: fc36813a (&(&crq->lock)->rlock){..-.}, at: ibmvnic_tasklet+0x88/0x2010 [ibmvnic] irq event stamp: 26293 hardirqs last enabled at (26292): [] tasklet_action_common.isra.12+0x78/0x1c0 hardirqs last disabled at (26293): [] _raw_spin_lock_irqsave+0x48/0xf0 softirqs last enabled at (26288): [] dev_deactivate_queue.constprop.28+0xc8/0x160 softirqs last disabled at (26289): [] call_do_softirq+0x14/0x24 CPU: 8 PID: 120 Comm: kworker/8:1 Kdump: loaded Not tainted 4.20.0-rc6 #6 Workqueue: events linkwatch_event Call Trace: [c003fffa7a50] [c0bc83e4] dump_stack+0xe8/0x164 (unreliable) [c003fffa7aa0] [c015ba0c] ___might_sleep+0x2dc/0x320 [c003fffa7b20] [c0be960c] __mutex_lock+0x8c/0xb40 [c003fffa7c30] [d6202ac8] ibmvnic_reset+0x78/0x330 [ibmvnic] [c003fffa7cc0] [d62097f4] ibmvnic_tasklet+0x1054/0x2010 [ibmvnic] [c003fffa7e00] [c01224c8] tasklet_action_common.isra.12+0xd8/0x1c0 [c003fffa7e60] [c0bf1238] __do_softirq+0x1a8/0x64c [c003fffa7f90] [c00306e0] call_do_softirq+0x14/0x24 [c003f3f87980] [c001ba50] do_softirq_own_stack+0x60/0xb0 [c003f3f879c0] [c01218a8] do_softirq+0xa8/0x100 [c003f3f879f0] [c0121a74] __local_bh_enable_ip+0x174/0x180 [c003f3f87a60] [c0bf003c] _raw_spin_unlock_bh+0x5c/0x80 [c003f3f87a90] [c0a8ac78] dev_deactivate_queue.constprop.28+0xc8/0x160 [c003f3f87ad0] [c0a8c8b0] dev_deactivate_many+0xd0/0x520 [c003f3f87b70] [c0a8cd40] dev_deactivate+0x40/0x60 [c003f3f87ba0] [c0a5e0c4] linkwatch_do_dev+0x74/0xd0 [c003f3f87bd0] [c0a5e694] __linkwatch_run_queue+0x1a4/0x1f0 [c003f3f87c30] [c0a5e728] linkwatch_event+0x48/0x60 [c003f3f87c50] [c01444e8] process_one_work+0x238/0x710 [c003f3f87d20] [c0144a48] worker_thread+0x88/0x4e0 [c003f3f87db0] [c014e3a8] kthread+0x178/0x1c0 [c003f3f87e20] [c000bfd0] ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Thomas Falcon --- drivers/net/ethernet/ibm/ibmvnic.c | 16 +--- drivers/net/ethernet/ibm/ibmvnic.h | 2 +- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index ed50b8dee44f..ffc0cab05b0f 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -1939,8 +1939,9 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, static struct ibmvnic_rwi *get_next_rwi(struct ibmvnic_adapter *adapter) { struct ibmvnic_rwi *rwi; + unsigned long flags; - mutex_lock(&adapter->rwi_lock); + spin_lock_irqsave(&adapter->rwi_lock, flags); if (!list_empty(&adapter->rwi_list)) { rwi = list_first_entry(&adapter->rwi_list, struct ibmvnic_rwi, @@ -1950,7 +1951,7 @@ static struct ibmvnic_rwi *get_next_rwi(struct ibmvnic_adapter *adapter) rwi = NULL; } - mutex_unlock(&adapter->rwi_lock); + spin_unlock_irqrestore(&adapter->rwi_lock, flags); return rwi; } @@ -2025,6 +2026,7 @@ static int ibmvnic_reset(struct ibmvnic_adapter *adapter, struct list_head *entry, *tmp_entry; struct ibmvnic_rwi *rwi, *tmp; struct net_device *netdev = adapter->netdev; + unsigned long flags; int ret; if (adapter->state == VNIC_REMOVING || @@ -2041,13 +2043,13 @@ static int ibmvnic_reset(struct ibmvnic_adapter *adapter, goto err; } - mutex_lock(&adapter->rwi_lock); + spin_lock_irqsave(&adapter->rwi_lock, flags); list_for_each(entry, &adapter->rwi_list) { tmp = list_entry(entry, struct ibmvnic_rwi, list); if (tmp->reset_reason == reason) { netdev_dbg(netdev, "Skipping matching reset\n"); - mutex_unlock(&adapter->rwi_lock); + spin_unlock_irqrestore(&adapter->rwi_lock, flags); ret = EBUSY; goto err; } @@ -2055,7 +2057,7 @@ static int ibmvnic_reset(struct ibmvnic_adapter *adapter, rwi = kzalloc(sizeof(*rwi), GFP_KERNEL); if (!rwi) { - mutex_unlock(&adapter->rwi_lock); + spin_unl
[PATCH net 2/2] ibmvnic: Fix non-atomic memory allocation in IRQ context
ibmvnic_reset allocated new reset work item objects in a non-atomic context. This can be called from a tasklet, generating the output below. Allocate work items with the GFP_ATOMIC flag instead. BUG: sleeping function called from invalid context at mm/slab.h:421 in_atomic(): 1, irqs_disabled(): 1, pid: 93, name: kworker/0:2 INFO: lockdep is turned off. irq event stamp: 66049 hardirqs last enabled at (66048): [] tasklet_action_common.isra.12+0x78/0x1c0 hardirqs last disabled at (66049): [] _raw_spin_lock_irqsave+0x48/0xf0 softirqs last enabled at (66044): [] dev_deactivate_queue.constprop.28+0xc8/0x160 softirqs last disabled at (66045): [] call_do_softirq+0x14/0x24 CPU: 0 PID: 93 Comm: kworker/0:2 Kdump: loaded Not tainted 4.20.0-rc6-1-g1b50a8f03706 #7 Workqueue: events linkwatch_event Call Trace: [c003fffe7ae0] [c0bc83e4] dump_stack+0xe8/0x164 (unreliable) [c003fffe7b30] [c015ba0c] ___might_sleep+0x2dc/0x320 [c003fffe7bb0] [c0391514] kmem_cache_alloc_trace+0x3e4/0x440 [c003fffe7c30] [d5b2309c] ibmvnic_reset+0x16c/0x360 [ibmvnic] [c003fffe7cc0] [d5b29834] ibmvnic_tasklet+0x1054/0x2010 [ibmvnic] [c003fffe7e00] [c01224c8] tasklet_action_common.isra.12+0xd8/0x1c0 [c003fffe7e60] [c0bf1238] __do_softirq+0x1a8/0x64c [c003fffe7f90] [c00306e0] call_do_softirq+0x14/0x24 [c003f3967980] [c001ba50] do_softirq_own_stack+0x60/0xb0 [c003f39679c0] [c01218a8] do_softirq+0xa8/0x100 [c003f39679f0] [c0121a74] __local_bh_enable_ip+0x174/0x180 [c003f3967a60] [c0bf003c] _raw_spin_unlock_bh+0x5c/0x80 [c003f3967a90] [c0a8ac78] dev_deactivate_queue.constprop.28+0xc8/0x160 [c003f3967ad0] [c0a8c8b0] dev_deactivate_many+0xd0/0x520 [c003f3967b70] [c0a8cd40] dev_deactivate+0x40/0x60 [c003f3967ba0] [c0a5e0c4] linkwatch_do_dev+0x74/0xd0 [c003f3967bd0] [c0a5e694] __linkwatch_run_queue+0x1a4/0x1f0 [c003f3967c30] [c0a5e728] linkwatch_event+0x48/0x60 [c003f3967c50] [c01444e8] process_one_work+0x238/0x710 [c003f3967d20] [c0144a48] worker_thread+0x88/0x4e0 [c003f3967db0] [c014e3a8] kthread+0x178/0x1c0 [c003f3967e20] [c000bfd0] ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Thomas Falcon --- drivers/net/ethernet/ibm/ibmvnic.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index ffc0cab05b0f..67cc6d9c8fd7 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -2055,7 +2055,7 @@ static int ibmvnic_reset(struct ibmvnic_adapter *adapter, } } - rwi = kzalloc(sizeof(*rwi), GFP_KERNEL); + rwi = kzalloc(sizeof(*rwi), GFP_ATOMIC); if (!rwi) { spin_unlock_irqrestore(&adapter->rwi_lock, flags); ibmvnic_close(netdev); -- 2.12.3
[PATCH v03] powerpc/mobility: Fix node detach/rename problem
The PPC mobility code receives RTAS requests to delete nodes with platform-/hardware-specific attributes when restarting the kernel after a migration. My example is for migration between a P8 Alpine and a P8 Brazos. Nodes to be deleted include 'ibm,random-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1', and, 'ibm,compression-v1'. The mobility.c code calls 'of_detach_node' for the nodes and their children. This makes calls to detach the properties and to remove the associated sysfs/kernfs files. Then new copies of the same nodes are next provided by the PHYP, local copies are built, and a pointer to the 'struct device_node' is passed to of_attach_node. Before the call to of_attach_node, the phandle is initialized to 0 when the data structure is alloced. During the call to of_attach_node, it calls __of_attach_node which pulls the actual name and phandle from just created sub-properties named something like 'name' and 'ibm,phandle'. This is all fine for the first migration. The problem occurs with the second and subsequent migrations when the PHYP on the new system wants to replace the same set of nodes again, referenced with the same names and phandle values. On the second and subsequent migrations, the PHYP tells the system to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1', 'ibm,compression-v1', 'ibm,sym-encryption-v1'. It specifies these nodes by its known set of phandle values -- the same handles used by the PHYP on the source system are known on the target system. The mobility.c code calls of_find_node_by_phandle() with these values and ends up locating the first instance of each node that was added during the original boot, instead of the second instance of each node created after the first migration. The detach during the second migration fails with errors like, [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0 [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: GW 4.18.0-rc1-wi107836-v05-120+ #201 [ 4565.030737] NIP: c07c1ea8 LR: c07c1fb4 CTR: 00655170 [ 4565.030741] REGS: c003f302b690 TRAP: 0700 Tainted: GW (4.18.0-rc1-wi107836-v05-120+) [ 4565.030745] MSR: 80010282b033 CR: 22288822 XER: 000a [ 4565.030757] CFAR: c07c1fb0 IRQMASK: 1 [ 4565.030757] GPR00: c07c1fa4 c003f302b910 c114bf00 c0038e68 [ 4565.030757] GPR04: 0001 80c008e0b4b8 [ 4565.030757] GPR08: 0001 8003 2843 [ 4565.030757] GPR12: 8800 c0001ec9ae00 4000 [ 4565.030757] GPR16: 0008 f6ff [ 4565.030757] GPR20: 0007 c003e9f1f034 0001 [ 4565.030757] GPR24: [ 4565.030757] GPR28: c1549d28 c1134828 c0038e68 c003f302b930 [ 4565.030804] NIP [c07c1ea8] __of_detach_node+0x8/0xa0 [ 4565.030808] LR [c07c1fb4] of_detach_node+0x74/0xd0 [ 4565.030811] Call Trace: [ 4565.030815] [c003f302b910] [c07c1fa4] of_detach_node+0x64/0xd0 (unreliable) [ 4565.030821] [c003f302b980] [c00c33c4] dlpar_detach_node+0xb4/0x150 [ 4565.030826] [c003f302ba10] [c00c3ffc] delete_dt_node+0x3c/0x80 [ 4565.030831] [c003f302ba40] [c00c4380] pseries_devicetree_update+0x150/0x4f0 [ 4565.030836] [c003f302bb70] [c00c479c] post_mobility_fixup+0x7c/0xf0 [ 4565.030841] [c003f302bbe0] [c00c4908] migration_store+0xf8/0x130 [ 4565.030847] [c003f302bc70] [c0998160] kobj_attr_store+0x30/0x60 [ 4565.030852] [c003f302bc90] [c0412f14] sysfs_kf_write+0x64/0xa0 [ 4565.030857] [c003f302bcb0] [c0411cac] kernfs_fop_write+0x16c/0x240 [ 4565.030862] [c003f302bd00] [c0355f20] __vfs_write+0x40/0x220 [ 4565.030867] [c003f302bd90] [c0356358] vfs_write+0xc8/0x240 [ 4565.030872] [c003f302bde0] [c03566cc] ksys_write+0x5c/0x100 [ 4565.030880] [c003f302be30] [c000b288] system_call+0x5c/0x70 [ 4565.030884] Instruction dump: [ 4565.030887] 38210070 3860 e8010010 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 [ 4565.030895] 7c0803a6 4e800020 e9230098 7929f7e2 <0b09> 2f89 4cde0020 e9030040 [ 4565.030903] ---[ end trace 5bd54cb1df9d2976 ]--- The mobility.c code continues on during the second migration, accepts the definitions of the new nodes from the PHYP and ends up renaming the new properties e
Re: [PATCH v2 2/3] powerpc: Discard dynsym section for !PPC32
On Wed, 5 Dec 2018 at 04:11, Segher Boessenkool wrote: > > On Tue, Dec 04, 2018 at 11:24:28AM +1030, Joel Stanley wrote: > > Alan Modra explains: > > > > > Likely you could discard .interp > and .dynstr too, and .dynsym when > > > !CONFIG_PPC32. > > > > Discarding of interp and dynstr happened in a previous patch. The dynsym > > cleanup was a bit less straightforward, so it gets it's own patch. > > > diff --git a/arch/powerpc/kernel/vmlinux.lds.S > > b/arch/powerpc/kernel/vmlinux.lds.S > > index 6570209b0671..2c93a420f456 100644 > > --- a/arch/powerpc/kernel/vmlinux.lds.S > > +++ b/arch/powerpc/kernel/vmlinux.lds.S > > @@ -266,14 +266,13 @@ SECTIONS > > } > > #ifdef CONFIG_RELOCATABLE > > . = ALIGN(8); > > +#ifdef CONFIG_PPC32 > > .dynsym : AT(ADDR(.dynsym) - LOAD_OFFSET) > > { > > -#ifdef CONFIG_PPC32 > > __dynamic_symtab = .; > > -#endif > > *(.dynsym) > > } > > - .dynstr : AT(ADDR(.dynstr) - LOAD_OFFSET) { *(.dynstr) } > > So this last line belongs in the previous patch then, right? Correct. Good catch.
[PATCH] powerpc: eeh_event: convert semaphore to completion
For this use case, completions and semaphores are equivalent, but semaphores are an awkward interface that should generally be avoided, so use the completion instead. Signed-off-by: Arnd Bergmann --- arch/powerpc/kernel/eeh_event.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c index 61c9356bf9c9..227e57f980df 100644 --- a/arch/powerpc/kernel/eeh_event.c +++ b/arch/powerpc/kernel/eeh_event.c @@ -35,7 +35,7 @@ */ static DEFINE_SPINLOCK(eeh_eventlist_lock); -static struct semaphore eeh_eventlist_sem; +static DECLARE_COMPLETION(eeh_eventlist_event); static LIST_HEAD(eeh_eventlist); /** @@ -55,7 +55,7 @@ static int eeh_event_handler(void * dummy) struct eeh_pe *pe; while (!kthread_should_stop()) { - if (down_interruptible(&eeh_eventlist_sem)) + if (wait_for_completion_interruptible(&eeh_eventlist_event)) break; /* Fetch EEH event from the queue */ @@ -102,9 +102,6 @@ int eeh_event_init(void) struct task_struct *t; int ret = 0; - /* Initialize semaphore */ - sema_init(&eeh_eventlist_sem, 0); - t = kthread_run(eeh_event_handler, NULL, "eehd"); if (IS_ERR(t)) { ret = PTR_ERR(t); @@ -142,7 +139,7 @@ int eeh_send_failure_event(struct eeh_pe *pe) spin_unlock_irqrestore(&eeh_eventlist_lock, flags); /* For EEH deamon to knick in */ - up(&eeh_eventlist_sem); + complete(&eeh_eventlist_event); return 0; } -- 2.20.0
Re: [PATCHv3 1/4] dt-bindings: add DT binding for the layerscape PCIe controller with EP mode
On Mon, 3 Dec 2018 18:35:02 +0800, Xiaowei Bao wrote: > Add the documentation for the Device Tree binding for the layerscape PCIe > controller with EP mode. > > Signed-off-by: Xiaowei Bao > --- > v2: > - Add the SoC specific compatibles. > v3: > - modify the commit message. > > .../devicetree/bindings/pci/layerscape-pci.txt |3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > Reviewed-by: Rob Herring
[PATCH v3 0/3] powerpc: Add to linker script discards
v3 fixes up the splitting of the patches, moving the dynstr hunk from patch 2 to patch 1. v2 pulls in the branch_lt patch too. No changes to the first two patches. Joel Stanley (3): powerpc: Discard more sections in linker script powerpc: Discard dynsym section for !PPC32 powerpc: Discard .branch_lt section arch/powerpc/kernel/vmlinux.lds.S | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) -- 2.19.1
[PATCH v3 1/3] powerpc: Discard more sections in linker script
Building the ppc64 kernel with a modern binutils results in this warning: powerpc64le-linux-gnu-ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in section `.gnu.hash' Alan Modra explains: > .gnu.hash, like .hash, is used by glibc ld.so for dynamic symbol > lookup. I imagine you don't need either section in a kernel, so > discarding both sounds reasonable. Likely you could discard .interp > and .dynstr too, and .dynsym when !CONFIG_PPC32. Reported-by: Stephen Rothwell Signed-off-by: Joel Stanley --- See https://lore.kernel.org/lkml/CACPK8Xft3n5KkpTjN3=7_vucxhfck7mxvzm2rrqu7tppcbo...@mail.gmail.com/T/#m58532c86cf0c7b4fb01cc1fe724e48d4c7d8e4a7 v3: Add dynstr hunk to this patch (it was incorrectly left in patch 2) --- arch/powerpc/kernel/vmlinux.lds.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 434581bcd5b4..779b8b3075a1 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -273,14 +273,11 @@ SECTIONS #endif *(.dynsym) } - .dynstr : AT(ADDR(.dynstr) - LOAD_OFFSET) { *(.dynstr) } .dynamic : AT(ADDR(.dynamic) - LOAD_OFFSET) { __dynamic_start = .; *(.dynamic) } - .hash : AT(ADDR(.hash) - LOAD_OFFSET) { *(.hash) } - .interp : AT(ADDR(.interp) - LOAD_OFFSET) { *(.interp) } .rela.dyn : AT(ADDR(.rela.dyn) - LOAD_OFFSET) { __rela_dyn_start = .; @@ -388,5 +385,8 @@ SECTIONS *(.gnu.version*) *(.gnu.attributes) *(.eh_frame) + *(.hash .gnu.hash) + *(.interp) + *(.dynstr) } } -- 2.19.1
[PATCH v3 2/3] powerpc: Discard dynsym section for !PPC32
Alan Modra explains: > Likely you could discard .interp > and .dynstr too, and .dynsym when > !CONFIG_PPC32. Discarding of interp and dynstr happened in a previous patch. The dynsym cleanup was a bit less straightforward, so it gets it's own patch. Signed-off-by: Joel Stanley --- See https://lore.kernel.org/lkml/CACPK8Xft3n5KkpTjN3=7_vucxhfck7mxvzm2rrqu7tppcbo...@mail.gmail.com/T/#m58532c86cf0c7b4fb01cc1fe724e48d4c7d8e4a7 v3: Move dynstr hunk to patch 1 (it was incorrectly left in this patch) --- arch/powerpc/kernel/vmlinux.lds.S | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 779b8b3075a1..2c93a420f456 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -266,13 +266,13 @@ SECTIONS } #ifdef CONFIG_RELOCATABLE . = ALIGN(8); +#ifdef CONFIG_PPC32 .dynsym : AT(ADDR(.dynsym) - LOAD_OFFSET) { -#ifdef CONFIG_PPC32 __dynamic_symtab = .; -#endif *(.dynsym) } +#endif .dynamic : AT(ADDR(.dynamic) - LOAD_OFFSET) { __dynamic_start = .; @@ -388,5 +388,8 @@ SECTIONS *(.hash .gnu.hash) *(.interp) *(.dynstr) +#ifndef CONFIG_PPC32 + *(.dynsym) +#endif } } -- 2.19.1
[PATCH v3 3/3] powerpc: Discard .branch_lt section
When building a 32 bit powerpc kernel with Binutils 2.31.1 this warning is emitted: powerpc-linux-gnu-ld: warning: orphan section `.branch_lt' from `arch/powerpc/kernel/head_44x.o' being placed in section `.branch_lt' As of binutils commit 2d7ad24e8726 ("Support PLT16 relocs against local symbols")[1], 32 bit targets can produce .branch_lt sections in their output. These sections should be empty for the kernel build so discard them for both PPC64 and PPC32. [1] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=2d7ad24e8726ba4c45c9e67be08223a146a837ce Signed-off-by: Joel Stanley --- v2: Discard instead of keep. Alan said "[discarding] can be a recipe for finding linker bugs", so lets go with that. --- arch/powerpc/kernel/vmlinux.lds.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 2c93a420f456..af0f81df8bb9 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -316,7 +316,6 @@ SECTIONS DATA_DATA *(.data.rel*) *(.toc1) - *(.branch_lt) } .opd : AT(ADDR(.opd) - LOAD_OFFSET) { @@ -388,6 +387,7 @@ SECTIONS *(.hash .gnu.hash) *(.interp) *(.dynstr) + *(.branch_lt) #ifndef CONFIG_PPC32 *(.dynsym) #endif -- 2.19.1
[PATCH v2.2 24/34] dt-bindings: arm: Convert Rockchip board/soc bindings to json-schema
From: Rob Herring Convert Rockchip SoC bindings to DT schema format using json-schema. Cc: Mark Rutland Cc: Heiko Stuebner Cc: devicet...@vger.kernel.org Cc: linux-arm-ker...@lists.infradead.org Cc: linux-rockc...@lists.infradead.org Signed-off-by: Rob Herring [move to per-board entries and added recently added boards] Signed-off-by: Heiko Stuebner --- Hi Rob, thanks for the libyaml hint, now dtc does check my dts nicely and emits quite a number of little complaints ;-) . Also that suggestion to move the original firefly release+beta boards together was great and I just did that. Should look ok now, if so I'll apply it tomorrow. Heiko changes in v2.2: - use enum to differentiate between rk3288-firefly and rk3288-firefly-beta as suggested by Rob (firefly-reload is a completely different board though) - add the rk3288-evb variants, which were missing from the old binding changes in v2.1: - move to one entry per board - add boards added after the original yaml conversion .../devicetree/bindings/arm/rockchip.txt | 240 -- .../devicetree/bindings/arm/rockchip.yaml | 423 ++ 2 files changed, 423 insertions(+), 240 deletions(-) delete mode 100644 Documentation/devicetree/bindings/arm/rockchip.txt create mode 100644 Documentation/devicetree/bindings/arm/rockchip.yaml diff --git a/Documentation/devicetree/bindings/arm/rockchip.txt b/Documentation/devicetree/bindings/arm/rockchip.txt deleted file mode 100644 index 0cc71236d639.. --- a/Documentation/devicetree/bindings/arm/rockchip.txt +++ /dev/null @@ -1,240 +0,0 @@ -Rockchip platforms device tree bindings - -- 96boards RK3399 Ficus (ROCK960 Enterprise Edition) -Required root node properties: - - compatible = "vamrs,ficus", "rockchip,rk3399"; - -- 96boards RK3399 Rock960 (ROCK960 Consumer Edition) -Required root node properties: - - compatible = "vamrs,rock960", "rockchip,rk3399"; - -- Amarula Vyasa RK3288 board -Required root node properties: - - compatible = "amarula,vyasa-rk3288", "rockchip,rk3288"; - -- Asus Tinker board -Required root node properties: - - compatible = "asus,rk3288-tinker", "rockchip,rk3288"; - -- Asus Tinker board S -Required root node properties: - - compatible = "asus,rk3288-tinker-s", "rockchip,rk3288"; - -- Kylin RK3036 board: -Required root node properties: - - compatible = "rockchip,kylin-rk3036", "rockchip,rk3036"; - -- MarsBoard RK3066 board: -Required root node properties: - - compatible = "haoyu,marsboard-rk3066", "rockchip,rk3066a"; - -- bq Curie 2 tablet: -Required root node properties: - - compatible = "mundoreader,bq-curie2", "rockchip,rk3066a"; - -- ChipSPARK Rayeager PX2 board: -Required root node properties: - - compatible = "chipspark,rayeager-px2", "rockchip,rk3066a"; - -- Radxa Rock board: -Required root node properties: - - compatible = "radxa,rock", "rockchip,rk3188"; - -- Radxa Rock2 Square board: -Required root node properties: - - compatible = "radxa,rock2-square", "rockchip,rk3288"; - -- Rikomagic MK808 v1 board: -Required root node properties: - - compatible = "rikomagic,mk808", "rockchip,rk3066a"; - -- Firefly Firefly-RK3288 board: -Required root node properties: - - compatible = "firefly,firefly-rk3288", "rockchip,rk3288"; -or - - compatible = "firefly,firefly-rk3288-beta", "rockchip,rk3288"; - -- Firefly Firefly-RK3288 Reload board: -Required root node properties: - - compatible = "firefly,firefly-rk3288-reload", "rockchip,rk3288"; - -- Firefly Firefly-RK3399 board: -Required root node properties: - - compatible = "firefly,firefly-rk3399", "rockchip,rk3399"; - -- Firefly roc-rk3328-cc board: -Required root node properties: - - compatible = "firefly,roc-rk3328-cc", "rockchip,rk3328"; - -- Firefly ROC-RK3399-PC board: -Required root node properties: - - compatible = "firefly,roc-rk3399-pc", "rockchip,rk3399"; - -- ChipSPARK PopMetal-RK3288 board: -Required root node properties: - - compatible = "chipspark,popmetal-rk3288", "rockchip,rk3288"; - -- Netxeon R89 board: -Required root node properties: - - compatible = "netxeon,r89", "rockchip,rk3288"; - -- GeekBuying GeekBox: -Required root node properties: - - compatible = "geekbuying,geekbox", "rockchip,rk3368"; - -- Google Bob (Asus Chromebook Flip C101PA): -Required root node properties: - compatible = "google,bob-rev13", "google,bob-rev12", -"google,bob-rev11", "google,bob-rev10", -"google,bob-rev9", "google,bob-rev8", -"google,bob-rev7", "google,bob-rev6", -"google,bob-rev5", "google,bob-rev4", -"google,bob", "google,gru", "rockchip,rk3399"; - -- Google Brain (dev-board): -Required root node properties: - - compatible = "google,veyron-brain-rev0", "google,veyr
Re: [PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K
On 12/10/2018 06:27 PM, Michael Roth wrote: > Quoting Daniel Borkmann (2018-12-10 08:26:31) >> On 12/07/2018 04:36 PM, Michael Roth wrote: >>> Quoting Michael Ellerman (2018-12-07 06:31:13) Michael Roth writes: > Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF > JIT allocations. At compile time it defaults to PAGE_SIZE * 4, > and is adjusted again at init time if MODULES_VADDR is defined. > > For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with But maybe it should be, I don't know why we don't define it. > the compile-time default at boot-time, which is 0x9c40 when > using 64K page size. This overflows the signed 32-bit bpf_jit_limit > value: > > root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit > -1673527296 > > and can cause various unexpected failures throughout the network > stack. In one case `strace dhclient eth0` reported: > > setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, > filter=0x105dd27f8}, 16) = -1 ENOTSUPP (Unknown error 524) > > and similar failures can be seen with tools like tcpdump. This doesn't > always reproduce however, and I'm not sure why. The more consistent > failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host > would time out on systemd/netplan configuring a virtio-net NIC with no > noticeable errors in the logs. > > Fix this by limiting the compile-time default for bpf_jit_limit to > INT_MAX. INT_MAX is a lot more than (4k * 4), so I guess I'm not clear on whether we should be using PAGE_SIZE here at all. I guess each BPF program uses at least one page is the thinking? >>> >>> That seems to be the case, at least, the max number of minimum-sized >>> allocations would be less on ppc64 since the allocations are always at >>> least PAGE_SIZE in size. The init-time default also limits to INT_MAX, >>> so it seemed consistent to do that here too. >>> Thanks for tracking this down. For some reason none of my ~10 test boxes have hit this, perhaps I don't have new enough userspace? >>> >>> I'm not too sure, I would've thought things like the dhclient case in >>> the commit log would fail every time, but sometimes I need to reboot the >>> guest before I start seeing the behavior. Maybe there's something special >>> about when JIT allocations are actually done that can affect >>> reproducibility? >>> >>> In my case at least the virtio-net networking timeout was consistent >>> enough for a bisect, but maybe it depends on the specific network >>> configuration (single NIC, basic DHCP through netplan/systemd in my case). >>> You don't mention why you needed to add BPF_MIN(), I assume because the kernel version of min() has gotten too complicated to work here? >>> >>> I wasn't sure if it was safe here or not, so I tried looking at other >>> users and came across: >>> >>> mm/vmalloc.c:777:#define VMAP_MIN(x, y) ((x) < (y) ? (x) : >>> (y)) /* can't use min() */ >>> >>> I'm not sure what the reasoning was (or whether it still applies), but I >>> figured it was safer to do the same here. Maybe Nick still recalls? >>> Daniel I assume you'll merge this via your tree? cheers > Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv > allocations") > Cc: linuxppc-...@ozlabs.org > Cc: Daniel Borkmann > Cc: Sandipan Das > Cc: Alexei Starovoitov > Signed-off-by: Michael Roth >> >> Thanks for the reports / fixes and sorry for my late reply (bit too >> swamped last week), some more thoughts below. >> > kernel/bpf/core.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index b1a3545d0ec8..55de4746cdfd 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp) > } > > #ifdef CONFIG_BPF_JIT > -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4) > +# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y)) > +# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 4), > INT_MAX) > > /* All BPF JIT sysctl knobs here. */ > int bpf_jit_enable __read_mostly = > IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON); >> >> I would actually just like to get rid of the BPF_JIT_LIMIT_DEFAULT >> define also given for 4.21 arm64 will have its own dedicated area for >> JIT allocations where neither the above limit nor the MODULES_END/ >> MODULES_VADDR one would fit and I don't want to make this even more >> ugly with adding further cases into the core. Would the below variant >> work for you? > > Looks good to me. My one concern (which is probably a separate > issue) is that the INT_MAX limit is a bit more punishing for larger > page sizes since the minimum allocations seem
Re: [PATCH] powerpc: eeh_event: convert semaphore to completion
On Tue, Dec 11, 2018 at 8:52 AM Arnd Bergmann wrote: > > For this use case, completions and semaphores are equivalent, > but semaphores are an awkward interface that should generally > be avoided, so use the completion instead. IIRC Sam has been reworking the locking used inside of EEH so this is probably going to clash with his changes. Converting to a completion is probably a good idea, but we might want to do it as a part of his series since it's going to collide with this anyway. Sam, what do you think? > > Signed-off-by: Arnd Bergmann > --- > arch/powerpc/kernel/eeh_event.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c > index 61c9356bf9c9..227e57f980df 100644 > --- a/arch/powerpc/kernel/eeh_event.c > +++ b/arch/powerpc/kernel/eeh_event.c > @@ -35,7 +35,7 @@ > */ > > static DEFINE_SPINLOCK(eeh_eventlist_lock); > -static struct semaphore eeh_eventlist_sem; > +static DECLARE_COMPLETION(eeh_eventlist_event); > static LIST_HEAD(eeh_eventlist); > > /** > @@ -55,7 +55,7 @@ static int eeh_event_handler(void * dummy) > struct eeh_pe *pe; > > while (!kthread_should_stop()) { > - if (down_interruptible(&eeh_eventlist_sem)) > + if (wait_for_completion_interruptible(&eeh_eventlist_event)) > break; > > /* Fetch EEH event from the queue */ > @@ -102,9 +102,6 @@ int eeh_event_init(void) > struct task_struct *t; > int ret = 0; > > - /* Initialize semaphore */ > - sema_init(&eeh_eventlist_sem, 0); > - > t = kthread_run(eeh_event_handler, NULL, "eehd"); > if (IS_ERR(t)) { > ret = PTR_ERR(t); > @@ -142,7 +139,7 @@ int eeh_send_failure_event(struct eeh_pe *pe) > spin_unlock_irqrestore(&eeh_eventlist_lock, flags); > > /* For EEH deamon to knick in */ > - up(&eeh_eventlist_sem); > + complete(&eeh_eventlist_event); > > return 0; > } > -- > 2.20.0 >
Re: [PATCH] powerpc: eeh_event: convert semaphore to completion
On Tue, Dec 11, 2018 at 10:18:31AM +1100, Oliver wrote: > On Tue, Dec 11, 2018 at 8:52 AM Arnd Bergmann wrote: > > > > For this use case, completions and semaphores are equivalent, > > but semaphores are an awkward interface that should generally > > be avoided, so use the completion instead. > > IIRC Sam has been reworking the locking used inside of EEH so this is > probably going to clash with his changes. Converting to a completion > is probably a good idea, but we might want to do it as a part of his > series since it's going to collide with this anyway. > > Sam, what do you think? It's such a small change, I don't think it will cause any problems for the rework. Anyway it seems like a good change, so I'd prefer to see it go in :-) Cheers, Sam. > > > > Signed-off-by: Arnd Bergmann > > --- > > arch/powerpc/kernel/eeh_event.c | 9 +++-- > > 1 file changed, 3 insertions(+), 6 deletions(-) > > > > diff --git a/arch/powerpc/kernel/eeh_event.c > > b/arch/powerpc/kernel/eeh_event.c > > index 61c9356bf9c9..227e57f980df 100644 > > --- a/arch/powerpc/kernel/eeh_event.c > > +++ b/arch/powerpc/kernel/eeh_event.c > > @@ -35,7 +35,7 @@ > > */ > > > > static DEFINE_SPINLOCK(eeh_eventlist_lock); > > -static struct semaphore eeh_eventlist_sem; > > +static DECLARE_COMPLETION(eeh_eventlist_event); > > static LIST_HEAD(eeh_eventlist); > > > > /** > > @@ -55,7 +55,7 @@ static int eeh_event_handler(void * dummy) > > struct eeh_pe *pe; > > > > while (!kthread_should_stop()) { > > - if (down_interruptible(&eeh_eventlist_sem)) > > + if (wait_for_completion_interruptible(&eeh_eventlist_event)) > > break; > > > > /* Fetch EEH event from the queue */ > > @@ -102,9 +102,6 @@ int eeh_event_init(void) > > struct task_struct *t; > > int ret = 0; > > > > - /* Initialize semaphore */ > > - sema_init(&eeh_eventlist_sem, 0); > > - > > t = kthread_run(eeh_event_handler, NULL, "eehd"); > > if (IS_ERR(t)) { > > ret = PTR_ERR(t); > > @@ -142,7 +139,7 @@ int eeh_send_failure_event(struct eeh_pe *pe) > > spin_unlock_irqrestore(&eeh_eventlist_lock, flags); > > > > /* For EEH deamon to knick in */ > > - up(&eeh_eventlist_sem); > > + complete(&eeh_eventlist_event); > > > > return 0; > > } > > -- > > 2.20.0 > > > signature.asc Description: PGP signature
Re: [PATCH v3 00/12] perf/core: Generalise event exclusion checking
On Fri, Dec 07, 2018 at 05:25:17PM +, Will Deacon wrote: > On Thu, Dec 06, 2018 at 04:47:17PM +, Andrew Murray wrote: > > Many PMU drivers do not have the capability to exclude counting events > > that occur in specific contexts such as idle, kernel, guest, etc. These > > drivers indicate this by returning an error in their event_init upon > > testing the events attribute flags. > > > > However this approach requires that each time a new event modifier is > > added to perf, all the perf drivers need to be modified to indicate that > > they don't support the attribute. This results in additional boiler-plate > > code common to many drivers that needs to be maintained. Furthermore the > > drivers are not consistent with regards to the error value they return > > when reporting unsupported attributes. > > > > This patchset allow PMU drivers to advertise their inability to exclude > > based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This > > allows the perf core to reject requests for exclusion events where there > > is no support in the PMU. > > > > This is a functional change, in particular: > > > > - Some drivers will now additionally (but correctly) report unsupported > >exclusion flags. It's typical for existing userspace tools such as > >perf to handle such errors by retrying the system call without the > >unsupported flags. > > > > - Drivers that do not support any exclusion that previously reported > >-EPERM or -EOPNOTSUPP will now report -EINVAL - this is consistent > >with the majority and results in userspace perf retrying without > >exclusion. > > > > All drivers touched by this patchset have been compile tested. > > For the bits under arch/arm/ and drivers/perf: > > Acked-by: Will Deacon > > Note that I've queued the TX2 uncore PMU for 4.21 [1], which could also > benefit from your new flag. Ah thanks for pointing this out, I'll send a patch in due course. Thanks, Andrwe Murray > > Will > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/commit/?h=for-next/perf&id=69c32972d59388c041268e8206e8eb1acff29b9a
Re: [PATCH] ocxl: Fix endiannes bug in read_afu_name()
On 11/12/18 2:10 am, Greg Kurz wrote: The double word returned by read_afu_info(OCXL_DVSEC_TEMPL_NAME) contains four characters of the AFU name, read from the PCI config space, hence with a little-endian ordering. When composing the string, a big-endian system must swap the bytes so that the characters appear in the right order. Do this with le32_to_cpu(). Signed-off-by: Greg Kurz snowpatch reports the following sparse warning: +drivers/misc/ocxl/config.c:321:24: warning: cast to restricted __le32 You probably need to change val from a u32 to a __le32. --- drivers/misc/ocxl/config.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c index 57a6bb1fd3c9..b76198ba8630 100644 --- a/drivers/misc/ocxl/config.c +++ b/drivers/misc/ocxl/config.c @@ -318,7 +318,7 @@ static int read_afu_name(struct pci_dev *dev, struct ocxl_fn_config *fn, if (rc) return rc; ptr = (u32 *) &afu->name[i]; - *ptr = val; + *ptr = le32_to_cpu(val); } afu->name[OCXL_AFU_NAME_SZ - 1] = '\0'; /* play safe */ return 0; -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited
Re: [PATCH kernel v4 19/19] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver
On Fri, 23 Nov 2018 16:53:04 +1100 Alexey Kardashevskiy wrote: > POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not > pluggable PCIe devices but still have PCIe links which are used > for config space and MMIO. In addition to that the GPUs have 6 NVLinks > which are connected to other GPUs and the POWER9 CPU. POWER9 chips > have a special unit on a die called an NPU which is an NVLink2 host bus > adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each. > These systems also support ATS (address translation services) which is > a part of the NVLink2 protocol. Such GPUs also share on-board RAM > (16GB or 32GB) to the system via the same NVLink2 so a CPU has > cache-coherent access to a GPU RAM. > > This exports GPU RAM to the userspace as a new VFIO device region. This > preregisters the new memory as device memory as it might be used for DMA. > This inserts pfns from the fault handler as the GPU memory is not onlined > until the vendor driver is loaded and trained the NVLinks so doing this > earlier causes low level errors which we fence in the firmware so > it does not hurt the host system but still better be avoided. > > This exports an ATSD (Address Translation Shootdown) register of NPU which > allows TLB invalidations inside GPU for an operating system. The register > conveniently occupies a single 64k page. It is also presented to > the userspace as a new VFIO device region. > > In order to provide the userspace with the information about GPU-to-NVLink > connections, this exports an additional capability called "tgt" > (which is an abbreviated host system bus address). The "tgt" property > tells the GPU its own system address and allows the guest driver to > conglomerate the routing information so each GPU knows how to get directly > to the other GPUs. > > For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to > know LPID (a logical partition ID or a KVM guest hardware ID in other > words) and PID (a memory context ID of a userspace process, not to be > confused with a linux pid). This assigns a GPU to LPID in the NPU and > this is why this adds a listener for KVM on an IOMMU group. A PID comes > via NVLink from a GPU and NPU uses a PID wildcard to pass it through. > > This requires coherent memory and ATSD to be available on the host as > the GPU vendor only supports configurations with both features enabled > and other configurations are known not to work. Because of this and > because of the ways the features are advertised to the host system > (which is a device tree with very platform specific properties), > this requires enabled POWERNV platform. > > The V100 GPUs do not advertise none of these capabilities via the config s/none/any/ > space and there are more than just one device ID so this relies on > the platform to tell whether these GPUs have special abilities such as > NVLinks. > > Signed-off-by: Alexey Kardashevskiy > --- > Changes: > v4: > * added nvlink-speed to the NPU bridge capability as this turned out to > be not a constant value > * instead of looking at the exact device ID (which also changes from system > to system), now this (indirectly) looks at the device tree to know > if GPU and NPU support NVLink > > v3: > * reworded the commit log about tgt > * added tracepoints (do we want them enabled for entire vfio-pci?) > * added code comments > * added write|mmap flags to the new regions > * auto enabled VFIO_PCI_NVLINK2 config option > * added 'tgt' capability to a GPU so QEMU can recreate ibm,npu and ibm,gpu > references; there are required by the NVIDIA driver > * keep notifier registered only for short time > --- > drivers/vfio/pci/Makefile | 1 + > drivers/vfio/pci/trace.h| 102 +++ > drivers/vfio/pci/vfio_pci_private.h | 2 + > include/uapi/linux/vfio.h | 27 ++ > drivers/vfio/pci/vfio_pci.c | 37 ++- > drivers/vfio/pci/vfio_pci_nvlink2.c | 448 > drivers/vfio/pci/Kconfig| 6 + > 7 files changed, 621 insertions(+), 2 deletions(-) > create mode 100644 drivers/vfio/pci/trace.h > create mode 100644 drivers/vfio/pci/vfio_pci_nvlink2.c > > diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile > index 76d8ec0..9662c06 100644 > --- a/drivers/vfio/pci/Makefile > +++ b/drivers/vfio/pci/Makefile > @@ -1,5 +1,6 @@ > > vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o > vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o > +vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o > > obj-$(CONFIG_VFIO_PCI) += vfio-pci.o ... > diff --git a/drivers/vfio/pci/vfio_pci_private.h > b/drivers/vfio/pci/vfio_pci_private.h > index 93c1738..7639241 100644 > --- a/drivers/vfio/pci/vfio_pci_private.h > +++ b/drivers/vfio/pci/vfio_pci_private.h > @@ -163,4 +163,6 @@ static inline int vfio_pci_igd_init(struct > vfio_pci_device *vdev) > return -ENODEV; > } > #endif > +extern int vfio_pci_nvdia_v100_n
Re: [PATCH kernel v4 17/19] vfio_pci: Allow mapping extra regions
On Fri, 23 Nov 2018 16:53:02 +1100 Alexey Kardashevskiy wrote: > So far we only allowed mapping of MMIO BARs to the userspace. However > there there are GPUs with on-board coherent RAM accessible via side s/there there/there/ Otherwise: Acked-by: Alex Williamson > channels which we also want to map to the userspace. The first client > for this is NVIDIA V100 GPU with NVLink2 direct links to a POWER9 > NPU-enabled CPU; such GPUs have 16GB RAM which is coherently mapped > to the system address space, we are going to export these as an extra > PCI region. > > We already support extra PCI regions and this adds support for mapping > them to the userspace. > > Signed-off-by: Alexey Kardashevskiy > Reviewed-by: David Gibson > --- > Changes: > v2: > * reverted one of mistakenly removed error checks > --- > drivers/vfio/pci/vfio_pci_private.h | 3 +++ > drivers/vfio/pci/vfio_pci.c | 9 + > 2 files changed, 12 insertions(+) > > diff --git a/drivers/vfio/pci/vfio_pci_private.h > b/drivers/vfio/pci/vfio_pci_private.h > index cde3b5d..86aab05 100644 > --- a/drivers/vfio/pci/vfio_pci_private.h > +++ b/drivers/vfio/pci/vfio_pci_private.h > @@ -59,6 +59,9 @@ struct vfio_pci_regops { > size_t count, loff_t *ppos, bool iswrite); > void(*release)(struct vfio_pci_device *vdev, > struct vfio_pci_region *region); > + int (*mmap)(struct vfio_pci_device *vdev, > + struct vfio_pci_region *region, > + struct vm_area_struct *vma); > }; > > struct vfio_pci_region { > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index fef5002..4a6f7c0 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -1130,6 +1130,15 @@ static int vfio_pci_mmap(void *device_data, struct > vm_area_struct *vma) > return -EINVAL; > if ((vma->vm_flags & VM_SHARED) == 0) > return -EINVAL; > + if (index >= VFIO_PCI_NUM_REGIONS) { > + int regnum = index - VFIO_PCI_NUM_REGIONS; > + struct vfio_pci_region *region = vdev->region + regnum; > + > + if (region && region->ops && region->ops->mmap && > + (region->flags & VFIO_REGION_INFO_FLAG_MMAP)) > + return region->ops->mmap(vdev, region, vma); > + return -EINVAL; > + } > if (index >= VFIO_PCI_ROM_REGION_INDEX) > return -EINVAL; > if (!vdev->bar_mmap_supported[index])
Re: [PATCH] ocxl/afu_irq: Don't include
Acked-by: Andrew Donnellan On 11/12/18 2:13 am, Greg Kurz wrote: The AFU irq code doesn't need to reach out to the platform. Signed-off-by: Greg Kurz --- drivers/misc/ocxl/afu_irq.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/misc/ocxl/afu_irq.c b/drivers/misc/ocxl/afu_irq.c index e70cfa24577f..11ab996657a2 100644 --- a/drivers/misc/ocxl/afu_irq.c +++ b/drivers/misc/ocxl/afu_irq.c @@ -2,7 +2,6 @@ // Copyright 2017 IBM Corp. #include #include -#include #include "ocxl_internal.h" #include "trace.h" -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited
Re: [PATCH kernel v4 18/19] vfio_pci: Allow regions to add own capabilities
On Fri, 23 Nov 2018 16:53:03 +1100 Alexey Kardashevskiy wrote: > VFIO regions already support region capabilities with a limited set of > fields. However the subdriver might have to report to the userspace > additional bits. > > This adds an add_capability() hook to vfio_pci_regops. > > Signed-off-by: Alexey Kardashevskiy > --- > Changes: > v3: > * removed confusing rationale for the patch, the next patch makes > use of it anyway > --- > drivers/vfio/pci/vfio_pci_private.h | 3 +++ > drivers/vfio/pci/vfio_pci.c | 6 ++ > 2 files changed, 9 insertions(+) Acked-by: Alex Williamson > diff --git a/drivers/vfio/pci/vfio_pci_private.h > b/drivers/vfio/pci/vfio_pci_private.h > index 86aab05..93c1738 100644 > --- a/drivers/vfio/pci/vfio_pci_private.h > +++ b/drivers/vfio/pci/vfio_pci_private.h > @@ -62,6 +62,9 @@ struct vfio_pci_regops { > int (*mmap)(struct vfio_pci_device *vdev, > struct vfio_pci_region *region, > struct vm_area_struct *vma); > + int (*add_capability)(struct vfio_pci_device *vdev, > + struct vfio_pci_region *region, > + struct vfio_info_cap *caps); > }; > > struct vfio_pci_region { > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index 4a6f7c0..6cb70cf 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -763,6 +763,12 @@ static long vfio_pci_ioctl(void *device_data, > if (ret) > return ret; > > + if (vdev->region[i].ops->add_capability) { > + ret = vdev->region[i].ops->add_capability(vdev, > + &vdev->region[i], &caps); > + if (ret) > + return ret; > + } > } > } >
Re: [PATCH] ocxl: Clarify error path in setup_xsl_irq()
On 11/12/18 2:18 am, Greg Kurz wrote: Implementing rollback with goto and labels is a common practice that leads to prettier and more maintainable code. FWIW, this design pattern is already being used in alloc_link() a few lines below in this file. Do the same in setup_xsl_irq(). Signed-off-by: Greg Kurz This is good, thanks. Acked-by: Andrew Donnellan --- drivers/misc/ocxl/link.c | 23 ++- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index eed92055184d..659977a17405 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -273,9 +273,9 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) spa->irq_name = kasprintf(GFP_KERNEL, "ocxl-xsl-%x-%x-%x", link->domain, link->bus, link->dev); if (!spa->irq_name) { - unmap_irq_registers(spa); dev_err(&dev->dev, "Can't allocate name for xsl interrupt\n"); - return -ENOMEM; + rc = -ENOMEM; + goto err_xsl; } /* * At some point, we'll need to look into allowing a higher @@ -283,11 +283,10 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) */ spa->virq = irq_create_mapping(NULL, hwirq); if (!spa->virq) { - kfree(spa->irq_name); - unmap_irq_registers(spa); dev_err(&dev->dev, "irq_create_mapping failed for translation interrupt\n"); - return -EINVAL; + rc = -EINVAL; + goto err_name; } dev_dbg(&dev->dev, "hwirq %d mapped to virq %d\n", hwirq, spa->virq); @@ -295,15 +294,21 @@ static int setup_xsl_irq(struct pci_dev *dev, struct link *link) rc = request_irq(spa->virq, xsl_fault_handler, 0, spa->irq_name, link); if (rc) { - irq_dispose_mapping(spa->virq); - kfree(spa->irq_name); - unmap_irq_registers(spa); dev_err(&dev->dev, "request_irq failed for translation interrupt: %d\n", rc); - return -EINVAL; + rc = -EINVAL; + goto err_mapping; } return 0; + +err_mapping: + irq_dispose_mapping(spa->virq); +err_name: + kfree(spa->irq_name); +err_xsl: + unmap_irq_registers(spa); + return rc; } static void release_xsl_irq(struct link *link) -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited
Re: [PATCH] ocxl: Fix endiannes bug in read_afu_name()
On 11/12/18 11:05 am, Andrew Donnellan wrote: On 11/12/18 2:10 am, Greg Kurz wrote: The double word returned by read_afu_info(OCXL_DVSEC_TEMPL_NAME) contains four characters of the AFU name, read from the PCI config space, hence with a little-endian ordering. When composing the string, a big-endian system must swap the bytes so that the characters appear in the right order. Do this with le32_to_cpu(). Signed-off-by: Greg Kurz snowpatch reports the following sparse warning: +drivers/misc/ocxl/config.c:321:24: warning: cast to restricted __le32 You probably need to change val from a u32 to a __le32. Also does this need to go to stable? -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited
Re: [PATCH kernel v4 19/19] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver
On 11/12/2018 11:08, Alex Williamson wrote: > On Fri, 23 Nov 2018 16:53:04 +1100 > Alexey Kardashevskiy wrote: > >> POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not >> pluggable PCIe devices but still have PCIe links which are used >> for config space and MMIO. In addition to that the GPUs have 6 NVLinks >> which are connected to other GPUs and the POWER9 CPU. POWER9 chips >> have a special unit on a die called an NPU which is an NVLink2 host bus >> adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each. >> These systems also support ATS (address translation services) which is >> a part of the NVLink2 protocol. Such GPUs also share on-board RAM >> (16GB or 32GB) to the system via the same NVLink2 so a CPU has >> cache-coherent access to a GPU RAM. >> >> This exports GPU RAM to the userspace as a new VFIO device region. This >> preregisters the new memory as device memory as it might be used for DMA. >> This inserts pfns from the fault handler as the GPU memory is not onlined >> until the vendor driver is loaded and trained the NVLinks so doing this >> earlier causes low level errors which we fence in the firmware so >> it does not hurt the host system but still better be avoided. >> >> This exports an ATSD (Address Translation Shootdown) register of NPU which >> allows TLB invalidations inside GPU for an operating system. The register >> conveniently occupies a single 64k page. It is also presented to >> the userspace as a new VFIO device region. >> >> In order to provide the userspace with the information about GPU-to-NVLink >> connections, this exports an additional capability called "tgt" >> (which is an abbreviated host system bus address). The "tgt" property >> tells the GPU its own system address and allows the guest driver to >> conglomerate the routing information so each GPU knows how to get directly >> to the other GPUs. >> >> For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to >> know LPID (a logical partition ID or a KVM guest hardware ID in other >> words) and PID (a memory context ID of a userspace process, not to be >> confused with a linux pid). This assigns a GPU to LPID in the NPU and >> this is why this adds a listener for KVM on an IOMMU group. A PID comes >> via NVLink from a GPU and NPU uses a PID wildcard to pass it through. >> >> This requires coherent memory and ATSD to be available on the host as >> the GPU vendor only supports configurations with both features enabled >> and other configurations are known not to work. Because of this and >> because of the ways the features are advertised to the host system >> (which is a device tree with very platform specific properties), >> this requires enabled POWERNV platform. >> >> The V100 GPUs do not advertise none of these capabilities via the config > > s/none/any/ > >> space and there are more than just one device ID so this relies on >> the platform to tell whether these GPUs have special abilities such as >> NVLinks. >> >> Signed-off-by: Alexey Kardashevskiy >> --- >> Changes: >> v4: >> * added nvlink-speed to the NPU bridge capability as this turned out to >> be not a constant value >> * instead of looking at the exact device ID (which also changes from system >> to system), now this (indirectly) looks at the device tree to know >> if GPU and NPU support NVLink >> >> v3: >> * reworded the commit log about tgt >> * added tracepoints (do we want them enabled for entire vfio-pci?) >> * added code comments >> * added write|mmap flags to the new regions >> * auto enabled VFIO_PCI_NVLINK2 config option >> * added 'tgt' capability to a GPU so QEMU can recreate ibm,npu and ibm,gpu >> references; there are required by the NVIDIA driver >> * keep notifier registered only for short time >> --- >> drivers/vfio/pci/Makefile | 1 + >> drivers/vfio/pci/trace.h| 102 +++ >> drivers/vfio/pci/vfio_pci_private.h | 2 + >> include/uapi/linux/vfio.h | 27 ++ >> drivers/vfio/pci/vfio_pci.c | 37 ++- >> drivers/vfio/pci/vfio_pci_nvlink2.c | 448 >> drivers/vfio/pci/Kconfig| 6 + >> 7 files changed, 621 insertions(+), 2 deletions(-) >> create mode 100644 drivers/vfio/pci/trace.h >> create mode 100644 drivers/vfio/pci/vfio_pci_nvlink2.c >> >> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile >> index 76d8ec0..9662c06 100644 >> --- a/drivers/vfio/pci/Makefile >> +++ b/drivers/vfio/pci/Makefile >> @@ -1,5 +1,6 @@ >> >> vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o >> vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o >> +vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o >> >> obj-$(CONFIG_VFIO_PCI) += vfio-pci.o > ... >> diff --git a/drivers/vfio/pci/vfio_pci_private.h >> b/drivers/vfio/pci/vfio_pci_private.h >> index 93c1738..7639241 100644 >> --- a/drivers/vfio/pci/vfio_pci_private.h >> +++ b/drivers/vfio/pci/vfio_pci_private.h >> @@ -163,4 +163
Re: [PATCH] ocxl: Simplify free_spa()
On 11/12/18 2:15 am, Greg Kurz wrote: The only users of free_spa() are alloc_link() and free_link(), and in both cases: - link->spa != NULL - free_spa(link) is immediatly followed by kfree(link) The check isn't needed, and it doesn't bring much to clear the link->spa pointer. Drop both. Signed-off-by: Greg Kurz I like defensive programming but for this case I don't really care too much either way Acked-by: Andrew Donnellan --- drivers/misc/ocxl/link.c |7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c index 31695a078485..eed92055184d 100644 --- a/drivers/misc/ocxl/link.c +++ b/drivers/misc/ocxl/link.c @@ -352,11 +352,8 @@ static void free_spa(struct link *link) pr_debug("Freeing SPA for %x:%x:%x\n", link->domain, link->bus, link->dev); - if (spa && spa->spa_mem) { - free_pages((unsigned long) spa->spa_mem, spa->spa_order); - kfree(spa); - link->spa = NULL; - } + free_pages((unsigned long) spa->spa_mem, spa->spa_order); + kfree(spa); } static int alloc_link(struct pci_dev *dev, int PE_mask, struct link **out_link) -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited
Re: [PATCH kernel v4 19/19] vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver
On Tue, 11 Dec 2018 11:57:20 +1100 Alexey Kardashevskiy wrote: > On 11/12/2018 11:08, Alex Williamson wrote: > > On Fri, 23 Nov 2018 16:53:04 +1100 > > Alexey Kardashevskiy wrote: > > > >> POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not > >> pluggable PCIe devices but still have PCIe links which are used > >> for config space and MMIO. In addition to that the GPUs have 6 NVLinks > >> which are connected to other GPUs and the POWER9 CPU. POWER9 chips > >> have a special unit on a die called an NPU which is an NVLink2 host bus > >> adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each. > >> These systems also support ATS (address translation services) which is > >> a part of the NVLink2 protocol. Such GPUs also share on-board RAM > >> (16GB or 32GB) to the system via the same NVLink2 so a CPU has > >> cache-coherent access to a GPU RAM. > >> > >> This exports GPU RAM to the userspace as a new VFIO device region. This > >> preregisters the new memory as device memory as it might be used for DMA. > >> This inserts pfns from the fault handler as the GPU memory is not onlined > >> until the vendor driver is loaded and trained the NVLinks so doing this > >> earlier causes low level errors which we fence in the firmware so > >> it does not hurt the host system but still better be avoided. > >> > >> This exports an ATSD (Address Translation Shootdown) register of NPU which > >> allows TLB invalidations inside GPU for an operating system. The register > >> conveniently occupies a single 64k page. It is also presented to > >> the userspace as a new VFIO device region. > >> > >> In order to provide the userspace with the information about GPU-to-NVLink > >> connections, this exports an additional capability called "tgt" > >> (which is an abbreviated host system bus address). The "tgt" property > >> tells the GPU its own system address and allows the guest driver to > >> conglomerate the routing information so each GPU knows how to get directly > >> to the other GPUs. > >> > >> For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to > >> know LPID (a logical partition ID or a KVM guest hardware ID in other > >> words) and PID (a memory context ID of a userspace process, not to be > >> confused with a linux pid). This assigns a GPU to LPID in the NPU and > >> this is why this adds a listener for KVM on an IOMMU group. A PID comes > >> via NVLink from a GPU and NPU uses a PID wildcard to pass it through. > >> > >> This requires coherent memory and ATSD to be available on the host as > >> the GPU vendor only supports configurations with both features enabled > >> and other configurations are known not to work. Because of this and > >> because of the ways the features are advertised to the host system > >> (which is a device tree with very platform specific properties), > >> this requires enabled POWERNV platform. > >> > >> The V100 GPUs do not advertise none of these capabilities via the config > > > > s/none/any/ > > > >> space and there are more than just one device ID so this relies on > >> the platform to tell whether these GPUs have special abilities such as > >> NVLinks. > >> > >> Signed-off-by: Alexey Kardashevskiy > >> --- > >> Changes: > >> v4: > >> * added nvlink-speed to the NPU bridge capability as this turned out to > >> be not a constant value > >> * instead of looking at the exact device ID (which also changes from system > >> to system), now this (indirectly) looks at the device tree to know > >> if GPU and NPU support NVLink > >> > >> v3: > >> * reworded the commit log about tgt > >> * added tracepoints (do we want them enabled for entire vfio-pci?) > >> * added code comments > >> * added write|mmap flags to the new regions > >> * auto enabled VFIO_PCI_NVLINK2 config option > >> * added 'tgt' capability to a GPU so QEMU can recreate ibm,npu and ibm,gpu > >> references; there are required by the NVIDIA driver > >> * keep notifier registered only for short time > >> --- > >> drivers/vfio/pci/Makefile | 1 + > >> drivers/vfio/pci/trace.h| 102 +++ > >> drivers/vfio/pci/vfio_pci_private.h | 2 + > >> include/uapi/linux/vfio.h | 27 ++ > >> drivers/vfio/pci/vfio_pci.c | 37 ++- > >> drivers/vfio/pci/vfio_pci_nvlink2.c | 448 > >> drivers/vfio/pci/Kconfig| 6 + > >> 7 files changed, 621 insertions(+), 2 deletions(-) > >> create mode 100644 drivers/vfio/pci/trace.h > >> create mode 100644 drivers/vfio/pci/vfio_pci_nvlink2.c > >> > >> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile > >> index 76d8ec0..9662c06 100644 > >> --- a/drivers/vfio/pci/Makefile > >> +++ b/drivers/vfio/pci/Makefile > >> @@ -1,5 +1,6 @@ > >> > >> vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o > >> vfio_pci_config.o > >> vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o > >> +vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o > >> > >> obj-$(C
Re: [PATCH net 0/2] net/ibmvnic: Fix reset work item locking bugs
From: Thomas Falcon Date: Mon, 10 Dec 2018 15:22:21 -0600 > This patch set fixes issues with scheduling reset work items in > a tasklet context. Since ibmvnic_reset can called in an interrupt, > it should not use a mutex or allocate memory non-atomically. Series applied, thanks.
[PATCH] powerpc: Fix bogus usage of MSR_RI on BookE and 40x
BookE and 40x processors lack the MSR:RI bit. However, we have a few common code places that rely on it. This fixes it by not defining MSR_RI on those processor types and using the appropriate ifdef's in those locations. Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/include/asm/reg.h | 2 ++ arch/powerpc/include/asm/reg_booke.h | 4 ++-- arch/powerpc/kernel/process.c| 2 +- arch/powerpc/kernel/traps.c | 8 ++-- arch/powerpc/lib/sstep.c | 2 ++ 5 files changed, 13 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index de52c31..41d0b2e 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -110,7 +110,9 @@ #ifndef MSR_PMM #define MSR_PMM__MASK(MSR_PMM_LG) /* Performance monitor */ #endif +#if !defined(CONFIG_BOOKE) && !defined(CONFIG_4xx) #define MSR_RI __MASK(MSR_RI_LG) /* Recoverable Exception */ +#endif #define MSR_LE __MASK(MSR_LE_LG) /* Little Endian */ #define MSR_TM __MASK(MSR_TM_LG) /* Transactional Mem Available */ diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h index eb2a33d..06967f1 100644 --- a/arch/powerpc/include/asm/reg_booke.h +++ b/arch/powerpc/include/asm/reg_booke.h @@ -46,10 +46,10 @@ #define MSR_USER32 (MSR_ | MSR_PR | MSR_EE) #define MSR_USER64 (MSR_USER32 | MSR_64BIT) #elif defined (CONFIG_40x) -#define MSR_KERNEL (MSR_ME|MSR_RI|MSR_IR|MSR_DR|MSR_CE) +#define MSR_KERNEL (MSR_ME|MSR_IR|MSR_DR|MSR_CE) #define MSR_USER (MSR_KERNEL|MSR_PR|MSR_EE) #else -#define MSR_KERNEL (MSR_ME|MSR_RI|MSR_CE) +#define MSR_KERNEL (MSR_ME|MSR_CE) #define MSR_USER (MSR_KERNEL|MSR_PR|MSR_EE) #endif diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 96f3473..77679a7 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1359,7 +1359,7 @@ static struct regbit msr_bits[] = { {MSR_IR,"IR"}, {MSR_DR,"DR"}, {MSR_PMM, "PMM"}, -#ifndef CONFIG_BOOKE +#if !defined(CONFIG_BOOKE) && !defined(CONFIG_40x) {MSR_RI,"RI"}, {MSR_LE,"LE"}, #endif diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 9a86572..2d00696 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -429,10 +429,11 @@ void system_reset_exception(struct pt_regs *regs) if (get_paca()->in_nmi > 1) nmi_panic(regs, "Unrecoverable nested System Reset"); #endif +#ifdef MSR_RI /* Must die if the interrupt is not recoverable */ if (!(regs->msr & MSR_RI)) nmi_panic(regs, "Unrecoverable System Reset"); - +#endif if (!nested) nmi_exit(); @@ -478,7 +479,9 @@ static inline int check_io_access(struct pt_regs *regs) printk(KERN_DEBUG "%s bad port %lx at %p\n", (*nip & 0x100)? "OUT to": "IN from", regs->gpr[rb] - _IO_BASE, nip); +#ifdef MSR_RI regs->msr |= MSR_RI; +#endif regs->nip = extable_fixup(entry); return 1; } @@ -763,10 +766,11 @@ void machine_check_exception(struct pt_regs *regs) if (check_io_access(regs)) goto bail; +#ifdef MSR_RI /* Must die if the interrupt is not recoverable */ if (!(regs->msr & MSR_RI)) nmi_panic(regs, "Unrecoverable Machine check"); - +#endif if (!nested) nmi_exit(); diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index d81568f..c03c453 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -3059,9 +3059,11 @@ int emulate_step(struct pt_regs *regs, unsigned int instr) case MTMSR: val = regs->gpr[op.reg]; +#ifdef MSR_RI if ((val & MSR_RI) == 0) /* can't step mtmsr[d] that would clear MSR_RI */ return -1; +#endif /* here op.val is the mask of bits to change */ regs->msr = (regs->msr & ~op.val) | (val & op.val); goto instr_done;
[PATCH] powerpc/44x/bamboo: Fix PCI range
>From 88509506b80b4960004146280eb740be64513a0b Mon Sep 17 00:00:00 2001 The bamboo dts has a bug: it uses a non-naturally aligned range for PCI memory space. This isnt' supported by the code, thus causing PCI to break on this system. This is due to the fact that while the chip memory map has 1G reserved for PCI memory, it's only 512M aligned. The code doesn't know how to split that into 2 different PMMs and fails, so limit the region to 512M. Signed-off-by: Benjamin Herrenschmidt --- arch/powerpc/boot/dts/bamboo.dts | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/boot/dts/bamboo.dts b/arch/powerpc/boot/dts/bamboo.dts index 538e42b..b5861fa 100644 --- a/arch/powerpc/boot/dts/bamboo.dts +++ b/arch/powerpc/boot/dts/bamboo.dts @@ -268,8 +268,10 @@ /* Outbound ranges, one memory and one IO, * later cannot be changed. Chip supports a second * IO range but we don't use it for now +* The chip also supports a larger memory range but +* it's not naturally aligned, so our code will break */ - ranges = <0x0200 0x 0xa000 0x 0xa000 0x 0x4000 + ranges = <0x0200 0x 0xa000 0x 0xa000 0x 0x2000 0x0200 0x 0x 0x 0xe000 0x 0x0010 0x0100 0x 0x 0x 0xe800 0x 0x0001>;
Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
On Thu, Dec 06, 2018 at 08:45:09AM +0200, Leon Romanovsky wrote: > On Thu, Dec 06, 2018 at 03:19:51PM +1100, David Gibson wrote: > > Mellanox ConnectX-5 IB cards (MT27800) seem to cause a call trace when > > unbound from their regular driver and attached to vfio-pci in order to pass > > them through to a guest. > > > > This goes away if the disable_idle_d3 option is used, so it looks like a > > problem with the hardware handling D3 state. To fix that more permanently, > > use a device quirk to disable D3 state for these devices. > > > > We do this by renaming the existing quirk_no_ata_d3() more generally and > > attaching it to the ConnectX-[45] devices (0x15b3:0x1013). > > > > Signed-off-by: David Gibson > > --- > > drivers/pci/quirks.c | 17 +++-- > > 1 file changed, 11 insertions(+), 6 deletions(-) > > > > Hi David, > > Thank for your patch, > > I would like to reproduce the calltrace before moving forward, > but have trouble to reproduce the original issue. > > I'm working with vfio-pci and CX-4/5 cards on daily basis, > tried manually enter into D3 state now, and it worked for me. > > Can you please post your full calltrace, and "lspci -s PCI_ID -vv" > output? Sorry, I may have jumped the gun on this. Using disable_idle_d3 seems to do _something_ for these cards, but there are some other things going wrong which are confusing the issue. This is on POWER, which might affect the situation. I'll get back to you once I have some more information. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: Is it worth to fix the crashkernel reserved memory blocks the hotplug issue?
On 12/10/18 at 12:08pm, Pingfan Liu wrote: > Hi, > I found in powerpc code, it is doable to reserve memory region in > movable zone, such as crashkernel does. But in x86 code, it checks the > hotpluggable attribute of memory, hence if manually specifying a > region in hotpluggable region, it will fail. Yes, it is a problem. In x86, for crashkernel=xx@yy case to specify base address of crashkernel reservation, it will find the region firstly, if found region's base is not equal to specified base 'yy', means that region has been occupied. The reservation will fail. And memblock finding will only iterate unhotpluggable area, this will avoid reserving crashkernel memory on hotpluggable region. In my opinion, it's worth fixing. Thanks Baoquan > The x86 code: > /* 0 means: find the address automatically */ > if (crash_base <= 0) { > /* > * Set CRASH_ADDR_LOW_MAX upper bound for crash memory, > * as old kexec-tools loads bzImage below that, unless > * "crashkernel=size[KMG],high" is specified. > */ > crash_base = memblock_find_in_range(CRASH_ALIGN, >high ? CRASH_ADDR_HIGH_MAX > : CRASH_ADDR_LOW_MAX, >crash_size, CRASH_ALIGN); > if (!crash_base) { > pr_info("crashkernel reservation failed - No suitable area found.\n"); > return; > } > > } else { > unsigned long long start; > > start = memblock_find_in_range(crash_base, --> this func will check > the hotpluggable attribute of memory and return failure if the > specifying region intersects with it. > crash_base + crash_size, > crash_size, 1 << 20); > if (start != crash_base) { > pr_info("crashkernel reservation failed - memory is in use.\n"); > return; > } > } > > Thanks, > Pingfan
[PATCH] net/ibmvnic: Remove tests of member address
The driver was checking for non-NULL address. - adapter->napi[i] This is pointless as these will be always non-NULL, since the 'dapter->napi' is allocated in init_napi(). It is safe to get rid of useless checks for addresses to fix the coccinelle warning: >>drivers/net/ethernet/ibm/ibmvnic.c: test of a variable/field address Since such statements always return true, they are redundant. Signed-off-by: Wen Yang CC: Benjamin Herrenschmidt CC: Paul Mackerras CC: Michael Ellerman CC: Thomas Falcon CC: John Allen CC: "David S. Miller" CC: linuxppc-dev@lists.ozlabs.org CC: net...@vger.kernel.org CC: linux-ker...@vger.kernel.org --- drivers/net/ethernet/ibm/ibmvnic.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index ed50b8dee44f..14d00985f087 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -773,11 +773,8 @@ static void release_napi(struct ibmvnic_adapter *adapter) return; for (i = 0; i < adapter->num_active_rx_napi; i++) { - if (&adapter->napi[i]) { - netdev_dbg(adapter->netdev, - "Releasing napi[%d]\n", i); - netif_napi_del(&adapter->napi[i]); - } + netdev_dbg(adapter->netdev, "Releasing napi[%d]\n", i); + netif_napi_del(&adapter->napi[i]); } kfree(adapter->napi); -- 2.19.1
Re: [PATCH v3] kbuild: Add support for DT binding schema checks
Hi Rob, On Tue, Dec 11, 2018 at 5:50 AM Rob Herring wrote: > > This adds the build infrastructure for checking DT binding schema > documents and validating dts files using the binding schema. > > Check DT binding schema documents: > make dt_binding_check > > Build dts files and check using DT binding schema: > make dtbs_check > > Optionally, DT_SCHEMA_FILES can passed in with a schema file(s) to use Perhaps, "can be passed" ? > for validation. This makes it easier to find and fix errors generated by > a specific schema. > > Currently, the validation targets are separate from a normal build to > avoid a hard dependency on the external DT schema project and because > there are lots of warnings generated. > > Cc: Jonathan Corbet > Cc: Mark Rutland > Cc: Masahiro Yamada > Cc: Michal Marek > Cc: linux-...@vger.kernel.org > Cc: devicet...@vger.kernel.org > Cc: linux-kbu...@vger.kernel.org > Signed-off-by: Rob Herring > --- > v3: > - Fix error causing only 1st schema file to get used. > - Add a more useful error message when dtc is missing YAML support > telling the user they need to install libyaml devel package. > > > .gitignore | 1 + > Documentation/Makefile | 2 +- > Documentation/devicetree/bindings/.gitignore | 1 + > Documentation/devicetree/bindings/Makefile | 33 + > Makefile | 11 -- > scripts/Makefile.lib | 37 ++-- > 6 files changed, 80 insertions(+), 5 deletions(-) > create mode 100644 Documentation/devicetree/bindings/.gitignore > create mode 100644 Documentation/devicetree/bindings/Makefile > > diff --git a/.gitignore b/.gitignore > index 97ba6b79834c..a20ac26aa2f5 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -15,6 +15,7 @@ > *.bin > *.bz2 > *.c.[012]*.* > +*.dt.yaml > *.dtb > *.dtb.S > *.dwo > diff --git a/Documentation/Makefile b/Documentation/Makefile > index 2ca77ad0f238..9786957c6a35 100644 > --- a/Documentation/Makefile > +++ b/Documentation/Makefile > @@ -2,7 +2,7 @@ > # Makefile for Sphinx documentation > # > > -subdir-y := > +subdir-y := devicetree/bindings/ > > # You can set these variables from the command line. > SPHINXBUILD = sphinx-build > diff --git a/Documentation/devicetree/bindings/.gitignore > b/Documentation/devicetree/bindings/.gitignore > new file mode 100644 > index ..d9194c02dd08 > --- /dev/null > +++ b/Documentation/devicetree/bindings/.gitignore > @@ -0,0 +1 @@ > +*.example.dts > diff --git a/Documentation/devicetree/bindings/Makefile > b/Documentation/devicetree/bindings/Makefile > new file mode 100644 > index ..43f8657ab070 > --- /dev/null > +++ b/Documentation/devicetree/bindings/Makefile > @@ -0,0 +1,33 @@ > +# SPDX-License-Identifier: GPL-2.0 > +DT_DOC_CHECKER ?= dt-doc-validate > +DT_EXTRACT_EX ?= dt-extract-example > +DT_MK_SCHEMA ?= dt-mk-schema > +DT_MK_SCHEMA_FLAGS := $(if $(DT_SCHEMA_FILES), -u) > + > +quiet_cmd_chk_binding = CHKDT $< In convention, the short log displays the target name instead of the prerequisite. If O=foo/bar is given, $< shows the full path, then log will look like follows: CHKDT /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/calxeda.yaml DTC Documentation/devicetree/bindings/arm/calxeda.example.dtb CHKDT /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/qcom.yaml DTC Documentation/devicetree/bindings/arm/qcom.example.dtb CHKDT /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/xilinx.yaml DTC Documentation/devicetree/bindings/arm/xilinx.example.dtb CHKDT /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/ti/nspire.yaml DTC Documentation/devicetree/bindings/arm/ti/nspire.example.dtb CHKDT /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/ti/ti,davinci.yaml DTC Documentation/devicetree/bindings/arm/ti/ti,davinci.example.dtb CHKDT /home/masahiro/ref/linux-devicetree/Documentation/devicetree/bindings/arm/altera.yaml You might think it is ugly. > + cmd_chk_binding = (set -e; \ > + $(DT_DOC_CHECKER) $< ; \ > + mkdir -p $(dir $@) ; \ > + $(DT_EXTRACT_EX) $< > $@ ) - 'set -e' is redundant because if_changed already has it. - 'mkdir mkdir -p $(dir $@)' is also redundant because scripts/Makefile.build automatically creates it. - You do not need to execute this in a sub-shell You can simplify like this: cmd_chk_binding = $(DT_DOC_CHECKER) $< ; \ $(DT_EXTRACT_EX) $< > $@ > +$(obj)/%.example.dts: $(src)/%.yaml FORCE > + $(call if_changed,chk_binding) > + > +DT_TMP_SCHEMA := .schema.yaml.tmp BTW, why does this file start with a period? What is the meaning of '.tmp' extension? > +extra-y += $(DT_TMP_SCHEMA) > + > +quiet_c
Re: [RFC PATCH v2 11/11] powerpc/book3s32: Implement Kernel Userspace Access Protection
On Wed, 2018-11-28 at 09:27 +, Christophe Leroy wrote: > This patch implements Kernel Userspace Access Protection for > book3s/32. > > Due to limitations of the processor page protection capabilities, > the protection is only against writing. read protection cannot be > achieved using page protection. > > In order to provide the protection, Ku and Ks keys are modified in > Userspace Segment registers, and different PP bits are used to: > > PP01 provides RW for Key 0 and RO for Key 1 > PP10 provides RW for all > PP11 provides RO for all > > Today PP10 is used for RW pages and PP11 for RO pages. This patch > modifies page protection to PP01 for RW pages. > > Then segment registers are set to Ku 0 and Ks 1. When kernel needs > to write to RW pages, the associated segment register is changed to > Ks 0 in order to allow write access to the kernel. > > In order to avoid having the read all segment registers when > locking/unlocking the access, some data is kept in the thread_struct > and saved on stack on exceptions. The field identifies both the > first unlocked segment and the first segment following the last > unlocked one. When no segment is unlocked, it contains value 0. > > Signed-off-by: Christophe Leroy Hey Christophe, I tried to test this and got a machine check after the kernel starts init. Vector: 700 (Program Check) at [ef0b5e70] pc: 0ca4 lr: b7e1a030 sp: ef0b5f30 msr: 81002 current = 0xef0b8000 pid = 1, comm = init Testing with mac99 model in qemu. - Russell
Re: [PATCH] crypto/testmgr: fix skcipher test with CONFIG_VMAP_STACK
On Fri, Dec 07, 2018 at 09:26:15PM +0100, Ard Biesheuvel wrote: > On Fri, 7 Dec 2018 at 18:33, Christophe Leroy wrote: > > > > [2.364486] WARNING: CPU: 0 PID: 60 at > > ./arch/powerpc/include/asm/io.h:837 dma_nommu_map_page+0x44/0xd4 > > [2.373579] CPU: 0 PID: 60 Comm: cryptomgr_test Tainted: GW > >4.20.0-rc5-00560-g6bfb52e23a00-dirty #531 > > [2.384740] NIP: c000c540 LR: c000c584 CTR: > > [2.389743] REGS: c95abab0 TRAP: 0700 Tainted: GW > > (4.20.0-rc5-00560-g6bfb52e23a00-dirty) > > [2.400042] MSR: 00029032 CR: 24042204 XER: > > [2.406669] > > [2.406669] GPR00: c02f2244 c95abb60 c6262990 c95abd80 256a 0001 > > 0001 0001 > > [2.406669] GPR08: 2000 0010 0010 24042202 > > 0100 c95abd88 > > [2.406669] GPR16: c05569d4 0001 0010 c95abc88 c0615664 > > 0004 > > [2.406669] GPR24: 0010 c95abc88 c95abc88 c61ae210 c7ff6d40 > > c61ae210 3d68 > > [2.441559] NIP [c000c540] dma_nommu_map_page+0x44/0xd4 > > [2.446720] LR [c000c584] dma_nommu_map_page+0x88/0xd4 > > [2.451762] Call Trace: > > [2.454195] [c95abb60] [82000808] 0x82000808 (unreliable) > > [2.459572] [c95abb80] [c02f2244] talitos_edesc_alloc+0xbc/0x3c8 > > [2.465493] [c95abbb0] [c02f2600] ablkcipher_edesc_alloc+0x4c/0x5c > > [2.471606] [c95abbd0] [c02f4ed0] ablkcipher_encrypt+0x20/0x64 > > [2.477389] [c95abbe0] [c02023b0] __test_skcipher+0x4bc/0xa08 > > [2.483049] [c95abe00] [c0204b60] test_skcipher+0x2c/0xcc > > [2.488385] [c95abe20] [c0204c48] alg_test_skcipher+0x48/0xbc > > [2.494064] [c95abe40] [c0205cec] alg_test+0x164/0x2e8 > > [2.499142] [c95abf00] [c0200dec] cryptomgr_test+0x48/0x50 > > [2.504558] [c95abf10] [c0039ff4] kthread+0xe4/0x110 > > [2.509471] [c95abf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c > > [2.515532] Instruction dump: > > [2.518468] 7c7e1b78 7c9d2378 7cbf2b78 41820054 3d20c076 8089c200 > > 3d20c076 7c84e850 > > [2.526127] 8129c204 7c842e70 7f844840 419c0008 <0fe0> 2f9e > > 54847022 7c84fa14 > > [2.533960] ---[ end trace bf78d94af73fe3b8 ]--- > > [2.539123] talitos ff02.crypto: master data transfer error > > [2.544775] talitos ff02.crypto: TEA error: ISR 0x2000_0040 > > [2.551625] alg: skcipher: encryption failed on test 1 for > > ecb-aes-talitos: ret=22 > > > > IV cannot be on stack when CONFIG_VMAP_STACK is selected because the stack > > cannot be DMA mapped anymore. > > > > This patch allocates it with kmalloc() > > > > This looks like a driver bug to me. Other accelerators in > drivers/crypto all take a copy of the IV before mapping it for DMA, so > it would be better if talitos did the same. Yes please fix the driver. In general, if a paramter is coming in as a pointer, you must assume that it may lie on the stack. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] powerpc: Fix bogus usage of MSR_RI on BookE and 40x
Hi Benjamin, I love your patch! Yet something to improve: [auto build test ERROR on powerpc/next] [also build test ERROR on v4.20-rc6 next-20181210] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Benjamin-Herrenschmidt/powerpc-Fix-bogus-usage-of-MSR_RI-on-BookE-and-40x/20181211-110017 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc-ppa8548_defconfig (attached as .config) compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.2.0 make.cross ARCH=powerpc All errors (new ones prefixed by >>): arch/powerpc/sysdev/fsl_rio.c: In function 'fsl_rio_mcheck_exception': >> arch/powerpc/sysdev/fsl_rio.c:115:17: error: 'MSR_RI' undeclared (first use >> in this function); did you mean 'MSR_BE'? regs->msr |= MSR_RI; ^~ MSR_BE arch/powerpc/sysdev/fsl_rio.c:115:17: note: each undeclared identifier is reported only once for each function it appears in vim +115 arch/powerpc/sysdev/fsl_rio.c a52c8f52 Alexandre Bounine 2010-05-26 96 ff33f182 Li Yang 2010-06-18 97 #ifdef CONFIG_E500 cce1f106 Shaohui Xie 2010-11-18 98 int fsl_rio_mcheck_exception(struct pt_regs *regs) a52c8f52 Alexandre Bounine 2010-05-26 99 { 82a9a480 Scott Wood2011-06-16 100 const struct exception_table_entry *entry; 82a9a480 Scott Wood2011-06-16 101 unsigned long reason; 82a9a480 Scott Wood2011-06-16 102 82a9a480 Scott Wood2011-06-16 103 if (!rio_regs_win) 82a9a480 Scott Wood2011-06-16 104 return 0; a52c8f52 Alexandre Bounine 2010-05-26 105 a52c8f52 Alexandre Bounine 2010-05-26 106 reason = in_be32((u32 *)(rio_regs_win + RIO_LTLEDCSR)); a52c8f52 Alexandre Bounine 2010-05-26 107 if (reason & (RIO_LTLEDCSR_IER | RIO_LTLEDCSR_PRT)) { a52c8f52 Alexandre Bounine 2010-05-26 108 /* Check if we are prepared to handle this fault */ a52c8f52 Alexandre Bounine 2010-05-26 109 entry = search_exception_tables(regs->nip); a52c8f52 Alexandre Bounine 2010-05-26 110 if (entry) { a52c8f52 Alexandre Bounine 2010-05-26 111 pr_debug("RIO: %s - MC Exception handled\n", a52c8f52 Alexandre Bounine 2010-05-26 112 __func__); a52c8f52 Alexandre Bounine 2010-05-26 113 out_be32((u32 *)(rio_regs_win + RIO_LTLEDCSR), a52c8f52 Alexandre Bounine 2010-05-26 114 0); a52c8f52 Alexandre Bounine 2010-05-26 @115 regs->msr |= MSR_RI; 61a92f70 Nicholas Piggin 2016-10-14 116 regs->nip = extable_fixup(entry); a52c8f52 Alexandre Bounine 2010-05-26 117 return 1; a52c8f52 Alexandre Bounine 2010-05-26 118 } a52c8f52 Alexandre Bounine 2010-05-26 119 } a52c8f52 Alexandre Bounine 2010-05-26 120 cce1f106 Shaohui Xie 2010-11-18 121 return 0; a52c8f52 Alexandre Bounine 2010-05-26 122 } cce1f106 Shaohui Xie 2010-11-18 123 EXPORT_SYMBOL_GPL(fsl_rio_mcheck_exception); ff33f182 Li Yang 2010-06-18 124 #endif a52c8f52 Alexandre Bounine 2010-05-26 125 :: The code at line 115 was first introduced by commit :: a52c8f521fed43bce53451d7dfddf2b42a2af689 rapidio, powerpc/85xx: Add MChk handler for SRIO port :: TO: Alexandre Bounine :: CC: Linus Torvalds --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
[PATCH] crypto: talitos - fix ablkcipher for CONFIG_VMAP_STACK
[2.364486] WARNING: CPU: 0 PID: 60 at ./arch/powerpc/include/asm/io.h:837 dma_nommu_map_page+0x44/0xd4 [2.373579] CPU: 0 PID: 60 Comm: cryptomgr_test Tainted: GW 4.20.0-rc5-00560-g6bfb52e23a00-dirty #531 [2.384740] NIP: c000c540 LR: c000c584 CTR: [2.389743] REGS: c95abab0 TRAP: 0700 Tainted: GW (4.20.0-rc5-00560-g6bfb52e23a00-dirty) [2.400042] MSR: 00029032 CR: 24042204 XER: [2.406669] [2.406669] GPR00: c02f2244 c95abb60 c6262990 c95abd80 256a 0001 0001 0001 [2.406669] GPR08: 2000 0010 0010 24042202 0100 c95abd88 [2.406669] GPR16: c05569d4 0001 0010 c95abc88 c0615664 0004 [2.406669] GPR24: 0010 c95abc88 c95abc88 c61ae210 c7ff6d40 c61ae210 3d68 [2.441559] NIP [c000c540] dma_nommu_map_page+0x44/0xd4 [2.446720] LR [c000c584] dma_nommu_map_page+0x88/0xd4 [2.451762] Call Trace: [2.454195] [c95abb60] [82000808] 0x82000808 (unreliable) [2.459572] [c95abb80] [c02f2244] talitos_edesc_alloc+0xbc/0x3c8 [2.465493] [c95abbb0] [c02f2600] ablkcipher_edesc_alloc+0x4c/0x5c [2.471606] [c95abbd0] [c02f4ed0] ablkcipher_encrypt+0x20/0x64 [2.477389] [c95abbe0] [c02023b0] __test_skcipher+0x4bc/0xa08 [2.483049] [c95abe00] [c0204b60] test_skcipher+0x2c/0xcc [2.488385] [c95abe20] [c0204c48] alg_test_skcipher+0x48/0xbc [2.494064] [c95abe40] [c0205cec] alg_test+0x164/0x2e8 [2.499142] [c95abf00] [c0200dec] cryptomgr_test+0x48/0x50 [2.504558] [c95abf10] [c0039ff4] kthread+0xe4/0x110 [2.509471] [c95abf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c [2.515532] Instruction dump: [2.518468] 7c7e1b78 7c9d2378 7cbf2b78 41820054 3d20c076 8089c200 3d20c076 7c84e850 [2.526127] 8129c204 7c842e70 7f844840 419c0008 <0fe0> 2f9e 54847022 7c84fa14 [2.533960] ---[ end trace bf78d94af73fe3b8 ]--- [2.539123] talitos ff02.crypto: master data transfer error [2.544775] talitos ff02.crypto: TEA error: ISR 0x2000_0040 [2.551625] alg: skcipher: encryption failed on test 1 for ecb-aes-talitos: ret=22 IV cannot be on stack when CONFIG_VMAP_STACK is selected because the stack cannot be DMA mapped anymore. This patch copies the IV from areq->info to the context ctx->iv. Fixes: 4de9d0b547b9 ("crypto: talitos - Add ablkcipher algorithms") Cc: sta...@vger.kernel.org Signed-off-by: Christophe Leroy --- drivers/crypto/talitos.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 6988012deca4..385ec970b639 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -1668,8 +1668,11 @@ static struct talitos_edesc *ablkcipher_edesc_alloc(struct ablkcipher_request * struct talitos_ctx *ctx = crypto_ablkcipher_ctx(cipher); unsigned int ivsize = crypto_ablkcipher_ivsize(cipher); + if (ivsize) + memcpy(ctx->iv, areq->info, ivsize); + return talitos_edesc_alloc(ctx->dev, areq->src, areq->dst, - areq->info, 0, areq->nbytes, 0, ivsize, 0, + ctx->iv, 0, areq->nbytes, 0, ivsize, 0, areq->base.flags, encrypt); } -- 2.13.3
Re: [PATCH] crypto/testmgr: fix skcipher test with CONFIG_VMAP_STACK
Le 11/12/2018 à 06:07, Herbert Xu a écrit : On Fri, Dec 07, 2018 at 09:26:15PM +0100, Ard Biesheuvel wrote: On Fri, 7 Dec 2018 at 18:33, Christophe Leroy wrote: [2.364486] WARNING: CPU: 0 PID: 60 at ./arch/powerpc/include/asm/io.h:837 dma_nommu_map_page+0x44/0xd4 [2.373579] CPU: 0 PID: 60 Comm: cryptomgr_test Tainted: GW 4.20.0-rc5-00560-g6bfb52e23a00-dirty #531 [2.384740] NIP: c000c540 LR: c000c584 CTR: [2.389743] REGS: c95abab0 TRAP: 0700 Tainted: GW (4.20.0-rc5-00560-g6bfb52e23a00-dirty) [2.400042] MSR: 00029032 CR: 24042204 XER: [2.406669] [2.406669] GPR00: c02f2244 c95abb60 c6262990 c95abd80 256a 0001 0001 0001 [2.406669] GPR08: 2000 0010 0010 24042202 0100 c95abd88 [2.406669] GPR16: c05569d4 0001 0010 c95abc88 c0615664 0004 [2.406669] GPR24: 0010 c95abc88 c95abc88 c61ae210 c7ff6d40 c61ae210 3d68 [2.441559] NIP [c000c540] dma_nommu_map_page+0x44/0xd4 [2.446720] LR [c000c584] dma_nommu_map_page+0x88/0xd4 [2.451762] Call Trace: [2.454195] [c95abb60] [82000808] 0x82000808 (unreliable) [2.459572] [c95abb80] [c02f2244] talitos_edesc_alloc+0xbc/0x3c8 [2.465493] [c95abbb0] [c02f2600] ablkcipher_edesc_alloc+0x4c/0x5c [2.471606] [c95abbd0] [c02f4ed0] ablkcipher_encrypt+0x20/0x64 [2.477389] [c95abbe0] [c02023b0] __test_skcipher+0x4bc/0xa08 [2.483049] [c95abe00] [c0204b60] test_skcipher+0x2c/0xcc [2.488385] [c95abe20] [c0204c48] alg_test_skcipher+0x48/0xbc [2.494064] [c95abe40] [c0205cec] alg_test+0x164/0x2e8 [2.499142] [c95abf00] [c0200dec] cryptomgr_test+0x48/0x50 [2.504558] [c95abf10] [c0039ff4] kthread+0xe4/0x110 [2.509471] [c95abf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c [2.515532] Instruction dump: [2.518468] 7c7e1b78 7c9d2378 7cbf2b78 41820054 3d20c076 8089c200 3d20c076 7c84e850 [2.526127] 8129c204 7c842e70 7f844840 419c0008 <0fe0> 2f9e 54847022 7c84fa14 [2.533960] ---[ end trace bf78d94af73fe3b8 ]--- [2.539123] talitos ff02.crypto: master data transfer error [2.544775] talitos ff02.crypto: TEA error: ISR 0x2000_0040 [2.551625] alg: skcipher: encryption failed on test 1 for ecb-aes-talitos: ret=22 IV cannot be on stack when CONFIG_VMAP_STACK is selected because the stack cannot be DMA mapped anymore. This patch allocates it with kmalloc() This looks like a driver bug to me. Other accelerators in drivers/crypto all take a copy of the IV before mapping it for DMA, so it would be better if talitos did the same. Yes please fix the driver. In general, if a paramter is coming in as a pointer, you must assume that it may lie on the stack. Yes, just sent a patch for the talitos driver for it. But note that the IV is received via areq->info and not via a standalone pointer. Christophe