Re: [PATCH 3/5] powerpc/mm/32: Use page_is_ram to check for RAM
Jonathan Neuschäfer writes: > Signed-off-by: Jonathan Neuschäfer > --- > arch/powerpc/mm/pgtable_32.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c > index d35d9ad3c1cd..d54e1a9c1c99 100644 > --- a/arch/powerpc/mm/pgtable_32.c > +++ b/arch/powerpc/mm/pgtable_32.c > @@ -145,9 +145,8 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, > unsigned long flags, > #ifndef CONFIG_CRASH_DUMP > /* >* Don't allow anybody to remap normal RAM that we're using. > - * mem_init() sets high_memory so only do the check after that. >*/ > - if (slab_is_available() && (p < virt_to_phys(high_memory)) && > + if (page_is_ram(__phys_to_pfn(p)) && > !(__allow_ioremap_reserved && memblock_is_region_reserved(p, > size))) { > printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n", > (unsigned long long)p, __builtin_return_address(0)); This is killing my p5020ds (Freescale e5500) unfortunately: smp: Bringing up secondary CPUs ... __ioremap(): phys addr 0x7fef5000 is RAM lr smp_85xx_kick_cpu Unable to handle kernel paging request for data at address 0x Faulting instruction address: 0xc0029080 Oops: Kernel access of bad area, sig: 11 [#1] BE SMP NR_CPUS=24 CoreNet Generic Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc4-gcc-4.6.3-00076-g85319478bdb4 #86 NIP: c0029080 LR: c0029020 CTR: 0001 REGS: e804bd40 TRAP: 0300 Not tainted (4.16.0-rc4-gcc-4.6.3-00076-g85319478bdb4) MSR: 00021002 CR: 24ad4e22 XER: DEAR: ESR: GPR00: c0029020 e804bdf0 e805 00021002 004d c0aaaeed GPR08: 2d57d000 22adbe84 c0002630 GPR16: c0a8b4a4 GPR24: 0001 00029002 0001 0001 0001 NIP [c0029080] smp_85xx_kick_cpu+0x100/0x2c0 LR [c0029020] smp_85xx_kick_cpu+0xa0/0x2c0 Call Trace: [e804bdf0] [c0029020] smp_85xx_kick_cpu+0xa0/0x2c0 (unreliable) [e804be30] [c0011194] __cpu_up+0xb4/0x1c0 [e804be60] [c002f16c] bringup_cpu+0x2c/0xf0 [e804be80] [c002ec9c] cpuhp_invoke_callback+0x12c/0x310 [e804beb0] [c002fdd8] cpu_up+0x108/0x230 [e804bee0] [c09f7438] smp_init+0x84/0x104 [e804bf00] [c09e9acc] kernel_init_freeable+0xc4/0x228 [e804bf30] [c0002644] kernel_init+0x14/0x110 [e804bf40] [c000f3b0] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 57990032 3b1c0057 7f19c050 7f3acb78 5718d1be 2e18 41920024 7f29cb78 7f0903a6 6000 6000 6000 <7c0048ac> 39290040 4200fff8 7c0004ac random: get_random_bytes called from init_oops_id+0x5c/0x70 with crng_init=0 ---[ end trace c3807aa91cf16cd8 ]--- The obvious fix of changing the test in smp_85xx_start_cpu() didn't work, I get a different oops: Unable to handle kernel paging request for data at address 0x3fef5140 Faulting instruction address: 0xc00290a0 Oops: Kernel access of bad area, sig: 11 [#1] BE SMP NR_CPUS=24 CoreNet Generic Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc4-gcc-4.6.3-00076-g85319478bdb4-dirty #90 NIP: c00290a0 LR: c0029040 CTR: 0001 REGS: e804bd50 TRAP: 0300 Not tainted (4.16.0-rc4-gcc-4.6.3-00076-g85319478bdb4-dirty) MSR: 00021002 CR: 24a24e22 XER: DEAR: 3fef5140 ESR: GPR00: c0029040 e804be00 e805 0023 00021002 004e c0aaaed3 GPR08: 3fef5140 2d57d000 22a2be84 c0002630 GPR16: c0a8b4a4 GPR24: 0004 0001 3fef5140 3fef5140 00029002 3fef5140 0001 0001 NIP [c00290a0] smp_85xx_kick_cpu+0x120/0x2e0 LR [c0029040] smp_85xx_kick_cpu+0xc0/0x2e0 Call Trace: [e804be00] [c0029040] smp_85xx_kick_cpu+0xc0/0x2e0 (unreliable) [e804be30] [c0011194] __cpu_up+0xb4/0x1c0 [e804be60] [c002f18c] bringup_cpu+0x2c/0xf0 [e804be80] [c002ecbc] cpuhp_invoke_callback+0x12c/0x310 [e804beb0] [c002fdf8] cpu_up+0x108/0x230 [e804bee0] [c09f7438] smp_init+0x84/0x104 [e804bf00] [c09e9acc] kernel_init_freeable+0xc4/0x228 [e804bf30] [c0002644] kernel_init+0x14/0x110 [e804bf40] [c000f3b0] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 7c0903a6 4e800421 57ba0032 3b3d0057 7f3ac850 7f5bd378 5739d1be 2e19 4192001c 7f49d378 7f2903a6 6000 <7c0048ac> 39290040 4200fff8 7c0004ac random: get_random_bytes called from init_oops_id+0x5c/0x70 with crng_init=0 ---[ end trace 950df40ee04f2d5e ]--- So that will require a bit more debugging. cheers
[PATCH 5/5] powerpc/zImage: Also check for stdout-path
The /chosen/linux,stdout-path is "deprecated" in favour of /chosen/stdout-path so we should be checking for both. Signed-off-by: Oliver O'Halloran --- arch/powerpc/boot/serial.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c index a7f9ac9a27b5..9ef1ed35ae62 100644 --- a/arch/powerpc/boot/serial.c +++ b/arch/powerpc/boot/serial.c @@ -92,7 +92,8 @@ static void *serial_get_stdout_devp(void) if (devp == NULL) goto err_out; - if (getprop(devp, "linux,stdout-path", path, MAX_PATH_LEN) > 0) { + if (getprop(devp, "linux,stdout-path", path, MAX_PATH_LEN) > 0 || + getprop(devp, "stdout-path", path, MAX_PATH_LEN) > 0) { devp = finddevice(path); if (devp == NULL) goto err_out; -- 2.9.5
[PATCH 2/5] powerpc/zImage: Don't build zlib object files
The required source files are directly #include`ed into decompress.c when using gzip compression. Building the separate .o files for the zlib sources isn't required and can cause linker errors due to the symbols being defined in decompress.o and the zlib .o files. Signed-off-by: Oliver O'Halloran --- You can stop throwing up now --- arch/powerpc/boot/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile index 26d5d2a5b8e9..f66b9dba99fd 100644 --- a/arch/powerpc/boot/Makefile +++ b/arch/powerpc/boot/Makefile @@ -108,7 +108,7 @@ $(addprefix $(obj)/,$(libfdt) libfdt-wrapper.o simpleboot.o epapr.o opal.o \ src-wlib-y := string.S crt0.S stdio.c decompress.c main.c \ $(libfdt) libfdt-wrapper.c \ ns16550.c serial.c simple_alloc.c div64.S util.S \ - elf_util.c $(zlib-y) devtree.c stdlib.c \ + elf_util.c devtree.c stdlib.c \ oflib.c ofconsole.c cuboot.c src-wlib-$(CONFIG_PPC_MPC52XX) += mpc52xx-psc.c -- 2.9.5
[PATCH 4/5] powerpc/zImage: Rework serial driver probing
Rather than checking the compatible string in serial_driver_init() we call into the driver's init function and wait for a driver to inidicate it bound to the device by returning zero. The pointers to each driver probe functions are stored in the ".serialdrv" section of the zImage, similar to how initcalls and whatnot are handled inside Linux. This structure allows us to conditionally compile specific driver files based on the KConfig settings. This is needed because we don't have access to the KConfig #defines in the zImage source files (it's a giant #include headache) so conditional compilation is the only way to eliminate unnecessary code. Signed-off-by: Oliver O'Halloran --- arch/powerpc/boot/cpm-serial.c | 1 + arch/powerpc/boot/mpc52xx-psc.c | 1 + arch/powerpc/boot/mpsc.c| 1 + arch/powerpc/boot/ns16550.c | 2 ++ arch/powerpc/boot/opal.c| 1 + arch/powerpc/boot/ops.h | 16 +-- arch/powerpc/boot/serial.c | 41 +++-- arch/powerpc/boot/uartlite.c| 1 + arch/powerpc/boot/wrapper | 2 +- arch/powerpc/boot/zImage.coff.lds.S | 4 arch/powerpc/boot/zImage.lds.S | 8 11 files changed, 42 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/boot/cpm-serial.c b/arch/powerpc/boot/cpm-serial.c index f6b1cf23946e..5ebf3dfd810a 100644 --- a/arch/powerpc/boot/cpm-serial.c +++ b/arch/powerpc/boot/cpm-serial.c @@ -295,3 +295,4 @@ int cpm_console_init(void *devp, struct serial_console_data *scdp) return 0; } +SERIAL_DRIVER(cpm_console_init); diff --git a/arch/powerpc/boot/mpc52xx-psc.c b/arch/powerpc/boot/mpc52xx-psc.c index 75470936e661..c09e82eaf006 100644 --- a/arch/powerpc/boot/mpc52xx-psc.c +++ b/arch/powerpc/boot/mpc52xx-psc.c @@ -66,3 +66,4 @@ int mpc5200_psc_console_init(void *devp, struct serial_console_data *scdp) return 0; } +SERIAL_DRIVER(mpc5200_psc_console_init); diff --git a/arch/powerpc/boot/mpsc.c b/arch/powerpc/boot/mpsc.c index 59e23e886440..2d0a72522b3c 100644 --- a/arch/powerpc/boot/mpsc.c +++ b/arch/powerpc/boot/mpsc.c @@ -170,3 +170,4 @@ int mpsc_console_init(void *devp, struct serial_console_data *scdp) err_out: return -1; } +SERIAL_DRIVER(mpsc_console_init); diff --git a/arch/powerpc/boot/ns16550.c b/arch/powerpc/boot/ns16550.c index 0e7bd5284255..49e8c299b829 100644 --- a/arch/powerpc/boot/ns16550.c +++ b/arch/powerpc/boot/ns16550.c @@ -81,3 +81,5 @@ int ns16550_console_init(void *devp, struct serial_console_data *scdp) return 0; } + +SERIAL_DRIVER(ns16550_console_init); diff --git a/arch/powerpc/boot/opal.c b/arch/powerpc/boot/opal.c index 0e92537536b9..dc345ff63422 100644 --- a/arch/powerpc/boot/opal.c +++ b/arch/powerpc/boot/opal.c @@ -102,3 +102,4 @@ int opal_console_init(void *devp, struct serial_console_data *scdp) return 0; } +SERIAL_DRIVER(opal_console_init); diff --git a/arch/powerpc/boot/ops.h b/arch/powerpc/boot/ops.h index fad1862f4b2d..06151ecb2f45 100644 --- a/arch/powerpc/boot/ops.h +++ b/arch/powerpc/boot/ops.h @@ -84,13 +84,17 @@ extern struct loader_info loader_info; void start(void); void fdt_init(void *blob); + +struct driver_ptr { + int (*ptr)(void *node, struct serial_console_data *scdp); +}; + +#define SERIAL_DRIVER(init_function) static const struct driver_ptr \ + __attribute__((__section__(".serialdrv"))) __attribute__((used)) \ + init_function##_ptr = { .ptr = &init_function, } + int serial_console_init(void); -int ns16550_console_init(void *devp, struct serial_console_data *scdp); -int mpsc_console_init(void *devp, struct serial_console_data *scdp); -int cpm_console_init(void *devp, struct serial_console_data *scdp); -int mpc5200_psc_console_init(void *devp, struct serial_console_data *scdp); -int uartlite_console_init(void *devp, struct serial_console_data *scdp); -int opal_console_init(void *devp, struct serial_console_data *scdp); + void *simple_alloc_init(char *base, unsigned long heap_size, unsigned long granularity, unsigned long max_allocs); extern void flush_cache(void *, unsigned long); diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c index 88955095ec07..a7f9ac9a27b5 100644 --- a/arch/powerpc/boot/serial.c +++ b/arch/powerpc/boot/serial.c @@ -105,11 +105,15 @@ static void *serial_get_stdout_devp(void) return NULL; } +extern unsigned long _serial_drv_start; +extern unsigned long _serial_drv_end; + static struct serial_console_data serial_cd; /* Node's "compatible" property determines which serial driver to use */ int serial_console_init(void) { + struct driver_ptr *start, *end, *probe; void *devp; int rc = -1; @@ -117,35 +121,14 @@ int serial_console_init(void) if (devp == NULL) goto err_out; - if (dt_is_compatible(devp, "ns16550") || - dt_is_compatible(devp, "pnpPNP,501")) - rc =
[PATCH 3/5] powerpc/zImage: Check compatible at driver init
Have each driver's init function check the compatible string of the node given by the stdout path. This gives the drivers a more traditional probe-also-inits structure. Signed-off-by: Oliver O'Halloran --- arch/powerpc/boot/cpm-serial.c | 2 ++ arch/powerpc/boot/mpc52xx-psc.c | 3 +++ arch/powerpc/boot/mpsc.c| 3 +++ arch/powerpc/boot/ns16550.c | 4 arch/powerpc/boot/opal.c| 3 +++ arch/powerpc/boot/uartlite.c| 5 + 6 files changed, 20 insertions(+) diff --git a/arch/powerpc/boot/cpm-serial.c b/arch/powerpc/boot/cpm-serial.c index dfb56829cace..f6b1cf23946e 100644 --- a/arch/powerpc/boot/cpm-serial.c +++ b/arch/powerpc/boot/cpm-serial.c @@ -212,6 +212,8 @@ int cpm_console_init(void *devp, struct serial_console_data *scdp) } else if (dt_is_compatible(devp, "fsl,cpm2-smc-uart")) { is_cpm2 = 1; is_smc = 1; + } else if (!dt_is_compatible(devp, "fsl,cpm1-scc-uart")) { + return 1; /* not compatible */ } if (is_smc) { diff --git a/arch/powerpc/boot/mpc52xx-psc.c b/arch/powerpc/boot/mpc52xx-psc.c index c2c08633ee35..75470936e661 100644 --- a/arch/powerpc/boot/mpc52xx-psc.c +++ b/arch/powerpc/boot/mpc52xx-psc.c @@ -52,6 +52,9 @@ static unsigned char psc_getc(void) int mpc5200_psc_console_init(void *devp, struct serial_console_data *scdp) { + if (!dt_is_compatible(devp, "fsl,mpc5200-psc-uart")) + return 1; + /* Get the base address of the psc registers */ if (dt_get_virtual_reg(devp, &psc, 1) < 1) return -1; diff --git a/arch/powerpc/boot/mpsc.c b/arch/powerpc/boot/mpsc.c index 425ad88cce8d..59e23e886440 100644 --- a/arch/powerpc/boot/mpsc.c +++ b/arch/powerpc/boot/mpsc.c @@ -128,6 +128,9 @@ int mpsc_console_init(void *devp, struct serial_console_data *scdp) int n, reg_set; volatile char *sdma_base; + if (!dt_is_compatible(devp, "marvell,mv64360-mpsc")) + return 1; + n = getprop(devp, "virtual-reg", &v, sizeof(v)); if (n != sizeof(v)) goto err_out; diff --git a/arch/powerpc/boot/ns16550.c b/arch/powerpc/boot/ns16550.c index b0da4466d419..0e7bd5284255 100644 --- a/arch/powerpc/boot/ns16550.c +++ b/arch/powerpc/boot/ns16550.c @@ -58,6 +58,10 @@ int ns16550_console_init(void *devp, struct serial_console_data *scdp) int n; u32 reg_offset; + if (!dt_is_compatible(devp, "ns16550") && + !dt_is_compatible(devp, "pnpPNP,501")) + return 1; + if (dt_get_virtual_reg(devp, (void **)®_base, 1) < 1) return -1; diff --git a/arch/powerpc/boot/opal.c b/arch/powerpc/boot/opal.c index dfb199ef5b94..0e92537536b9 100644 --- a/arch/powerpc/boot/opal.c +++ b/arch/powerpc/boot/opal.c @@ -83,6 +83,9 @@ static void opal_init(void) int opal_console_init(void *devp, struct serial_console_data *scdp) { + if (!dt_is_compatible(devp, "ibm,opal-console-raw")) + return 1; + opal_init(); if (devp) { diff --git a/arch/powerpc/boot/uartlite.c b/arch/powerpc/boot/uartlite.c index 46bed69b4169..fc66a5fcea66 100644 --- a/arch/powerpc/boot/uartlite.c +++ b/arch/powerpc/boot/uartlite.c @@ -62,6 +62,11 @@ int uartlite_console_init(void *devp, struct serial_console_data *scdp) int n; unsigned long reg_phys; + + if (!dt_is_compatible(devp, "xlnx,opb-uartlite-1.00.b") && + !dt_is_compatible(devp, "xlnx,xps-uartlite-1.00.a")) + return 1; + n = getprop(devp, "virtual-reg", ®_base, sizeof(reg_base)); if (n != sizeof(reg_base)) { if (!dt_xlate_reg(devp, 0, ®_phys, NULL)) -- 2.9.5
[PATCH 1/5] powerpc/zImage: Remove #ifdef in opal.c
This file is only ever compiled if CONFIG_PPC64_BOOT_WRAPPER is set so this check is unnecessary. Signed-off-by: Oliver O'Halloran --- arch/powerpc/boot/opal.c | 8 1 file changed, 8 deletions(-) diff --git a/arch/powerpc/boot/opal.c b/arch/powerpc/boot/opal.c index 0272570d02de..dfb199ef5b94 100644 --- a/arch/powerpc/boot/opal.c +++ b/arch/powerpc/boot/opal.c @@ -13,8 +13,6 @@ #include #include "../include/asm/opal-api.h" -#ifdef CONFIG_PPC64_BOOT_WRAPPER - /* Global OPAL struct used by opal-call.S */ struct opal { u64 base; @@ -101,9 +99,3 @@ int opal_console_init(void *devp, struct serial_console_data *scdp) return 0; } -#else -int opal_console_init(void *devp, struct serial_console_data *scdp) -{ - return -1; -} -#endif /* __powerpc64__ */ -- 2.9.5
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Sun, Mar 18, 2018 at 06:18:48PM +, Al Viro wrote: > I'd done some digging in that area, will find the notes and post. OK, found: We have two ABIs in the game - syscall and normal C. The latter (for all targets we support) can be described in the following terms: * there is a series of word-sized objects used to pass arguments, some in registers, some on stack. Arguments are mapped on that sequence. Anything up to word size simply takes the next available word, so on 64bit it's all there is - nth argument goes into the nth object, types simply do not matter. On 32bit it's not that simple - there 64bit arguments occupy two objects. They are still picked from the same sequence; on little-endian the lower half goes first, on big-endian - the higher one. For some architectures that's all there is to it. However, on quite a few there's an extra complication - not every pair can be used for 64bit value. Rules for those are arch-dependent. One variety is "pairs should be aligned", i.e. "if we'd consumed an odd number of slots, add a dummy one before eating a pair". Another is "don't let a pair span the registers/stack boundary"; surprisingly enough, that's not universal. The syscall ABI can considerably differ from C one. First of all, we really do *not* want to pass anything on stack - it's a major headache at syscall entry. See mips/o32 for the scale of that headache. Not fun. OTOH, the registers that can be used are a limited resource. i386 can't pass more than 6 words and that has served as an upper limit. If we encode the argument sizes (W - word, D - 64bit) we have the following variants among the syscalls: * no arguments at all (SYSCALL_DEFINE0) * W (SYSCALL_DEFINE1) * WW (SYSCALL_DEFINE2) * WWW (SYSCALL_DEFINE3) * (SYSCALL_DEFINE4) * W (SYSCALL_DEFINE5) * WW (SYSCALL_DEFINE6) * WD (SYSCALL_DEFINE2, truncate64, ftruncate64) * DWW (SYSCALL_DEFINE3, lookup_dcookie) * WDW (SYSCALL_DEFINE3, readahead) * WWWD (SYSCALL_DEFINE4, pread64, pwrite64) * WWDD (SYSCALL_DEFINE4, fallocate, sync_file_range2) * WDDW (SYSCALL_DEFINE4, sync_file_range, fadvise64_64) * WDWW (SYSCALL_DEFINE4, fadvise64) * WWDWW (SYSCALL_DEFINE5, fanotify_mark) The list for each long long-passing variant might be incomplete, but they really are rare. And a source of headache; not just for biarch architectures - e.g. c6x and metag are not biarch and these syscalls are exactly what stinks them up. One thing we *really* don't want is syscall-dependent mapping from syscall ABI registers to C ABI sequence. Inside a syscall (or in per-syscall glue) - sure, we can do it (if not happily). In the syscall dispatcher - fuck no, too much PITA. mips/o32 used to be a horrible example of why not, then they went for "copy from userland stack whether we need it or not"... For simple syscalls (first 7 classes in the above, each argument fits into word) we simply map registers to the first 6 slots of the sequence and we are done. It's bloody tempting to use the same mapping for the rest - use the same code that calls simple syscalls and have it call sys_readahead() instead of sys_mkdir(), etc. For something like x86 or sparc that works perfectly - these guys have no padding for 64bit arguments, so we are good (provided that userland sets those registers sanely, that is). OTOH, consider arm. There we have * r0, r1, r2, r3, [sp,#8], [sp,#12], [sp,#16]... is the sequence of objects used to pass arguments * 32bit and less - pick the next available slot * 64bit - skip a slot if we'd already taken an odd number, then use the next two slots for lower and upper 32 bits of the argument. So our classes take simple n-argument: 0 to 6 slots WD 4 slots DWW 4 slots WDW 5 slots WWDD6 slots WDWW5 slots WWWD6 slots WWDWW 6 slots WDDW7 slots (!) Also , , !@#!@#!@#!# and other nice and well-deserved comments from arch maintainers, some of them even printable: /* It would be nice if people remember that not all the world's an i386 when they introduce new system calls */ SYSCALL_DEFINE4(sync_file_range2, int, fd, unsigned int, flags, loff_t, offset, loff_t, nbytes) No idea why that hadn't been done to fadvise64_64() - that got /* * Since loff_t is a 64 bit type we avoid a lot of ABI hassle * with a different argument ordering. */ asmlinkage long sys_arm_fadvise64_64(int fd, int advice, loff_t offset, loff_t len) { return sys_fadvise64_64(fd, offset, len, advice); } long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low, u32 len_high, u32 len_low) { return sys_fadvise64(fd, (u64)offset_hig
[PATCH v2 9/9] powerpc/eeh: Add eeh_state_active() helper
Checking for a "fully active" device state requires testing two flag bits, which is open coded in several places, so add a function to do it. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh.h | 6 ++ arch/powerpc/kernel/eeh.c| 19 ++- arch/powerpc/platforms/powernv/eeh-powernv.c | 9 ++--- 3 files changed, 14 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h index fd37cc101f4f..c2266ca61853 100644 --- a/arch/powerpc/include/asm/eeh.h +++ b/arch/powerpc/include/asm/eeh.h @@ -256,6 +256,12 @@ static inline void eeh_serialize_unlock(unsigned long flags) raw_spin_unlock_irqrestore(&confirm_error_lock, flags); } +static inline bool eeh_state_active(int state) +{ + return (state & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) + == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE); +} + typedef void *(*eeh_traverse_func)(void *data, void *flag); void eeh_set_pe_aux_size(int size); int eeh_phb_pe_create(struct pci_controller *phb); diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 2b9df0040d6b..bc640e4c5ca5 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -394,9 +394,7 @@ static int eeh_phb_check_failure(struct eeh_pe *pe) /* Check PHB state */ ret = eeh_ops->get_state(phb_pe, NULL); if ((ret < 0) || - (ret == EEH_STATE_NOT_SUPPORT) || - (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) == - (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) { + (ret == EEH_STATE_NOT_SUPPORT) || eeh_state_active(ret)) { ret = 0; goto out; } @@ -433,7 +431,6 @@ static int eeh_phb_check_failure(struct eeh_pe *pe) int eeh_dev_check_failure(struct eeh_dev *edev) { int ret; - int active_flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE); unsigned long flags; struct device_node *dn; struct pci_dev *dev; @@ -525,8 +522,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev) * state, PE is in good state. */ if ((ret < 0) || - (ret == EEH_STATE_NOT_SUPPORT) || - ((ret & active_flags) == active_flags)) { + (ret == EEH_STATE_NOT_SUPPORT) || eeh_state_active(ret)) { eeh_stats.false_positives++; pe->false_positives++; rc = 0; @@ -546,8 +542,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev) /* Frozen parent PE ? */ ret = eeh_ops->get_state(parent_pe, NULL); - if (ret > 0 && - (ret & active_flags) != active_flags) + if (ret > 0 && !eeh_state_active(ret)) pe = parent_pe; /* Next parent level */ @@ -888,7 +883,6 @@ static void *eeh_set_dev_freset(void *data, void *flag) */ int eeh_pe_reset_full(struct eeh_pe *pe) { - int active_flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE); int reset_state = (EEH_PE_RESET | EEH_PE_CFG_BLOCKED); int type = EEH_RESET_HOT; unsigned int freset = 0; @@ -919,7 +913,7 @@ int eeh_pe_reset_full(struct eeh_pe *pe) /* Wait until the PE is in a functioning state */ state = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC); - if ((state & active_flags) == active_flags) + if (eeh_state_active(state)) break; if (state < 0) { @@ -1352,16 +1346,15 @@ static int eeh_pe_change_owner(struct eeh_pe *pe) struct eeh_dev *edev, *tmp; struct pci_dev *pdev; struct pci_device_id *id; - int flags, ret; + int ret; /* Check PE state */ - flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE); ret = eeh_ops->get_state(pe, NULL); if (ret < 0 || ret == EEH_STATE_NOT_SUPPORT) return 0; /* Unfrozen PE, nothing to do */ - if ((ret & flags) == flags) + if (eeh_state_active(ret)) return 0; /* Frozen PE, check if it needs PE level reset */ diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c index 33c86c1a1720..ddfc3544d285 100644 --- a/arch/powerpc/platforms/powernv/eeh-powernv.c +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c @@ -1425,11 +1425,8 @@ static int pnv_eeh_get_pe(struct pci_controller *hose, dev_pe = dev_pe->parent; while (dev_pe && !(dev_pe->type & EEH_PE_PHB)) { int ret; - int active_flags = (EEH_STATE_MMIO_ACTIVE | - EEH_STATE_DMA_ACTIVE); - ret = eeh_ops->get_state(dev_pe, NULL); - if (ret <= 0 || (ret & active_flags) == active_flags) { + if (ret <= 0 || eeh_state_active(ret)) { dev_pe = de
[PATCH v2 8/9] powerpc/eeh: Factor out common code eeh_reset_device()
The caller will always pass NULL for 'rmv_data' when 'eeh_aware_driver' is true, so the first two calls to eeh_pe_dev_traverse() can be combined without changing behaviour as can the two arms of the final 'if' block. This should not change behaviour. Signed-off-by: Sam Bobroff --- == v1 -> v2: == arch/powerpc/kernel/eeh_driver.c | 32 ++-- 1 file changed, 10 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 93fc22e791fa..aa3a2b08aec9 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -647,16 +647,12 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, * into pci_hp_add_devices(). */ eeh_pe_state_mark(pe, EEH_PE_KEEP); - if (!driver_eeh_aware) { - if (pe->type & EEH_PE_VF) { - eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL); - } else { - pci_lock_rescan_remove(); - pci_hp_remove_devices(bus); - pci_unlock_rescan_remove(); - } - } else { + if (driver_eeh_aware || (pe->type & EEH_PE_VF)) { eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data); + } else { + pci_lock_rescan_remove(); + pci_hp_remove_devices(bus); + pci_unlock_rescan_remove(); } /* @@ -691,8 +687,9 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, * the device up before the scripts have taken it down, * potentially weird things happen. */ - if (!driver_eeh_aware) { - pr_info("EEH: Sleep 5s ahead of complete hotplug\n"); + if (!driver_eeh_aware || rmv_data->removed) { + pr_info("EEH: Sleep 5s ahead of %s hotplug\n", + (eeh_aware_driver ? "partial" : "complete")); ssleep(5); /* @@ -705,19 +702,10 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, if (pe->type & EEH_PE_VF) { eeh_add_virt_device(edev, NULL); } else { - eeh_pe_state_clear(pe, EEH_PE_PRI_BUS); + if (!eeh_aware_driver) + eeh_pe_state_clear(pe, EEH_PE_PRI_BUS); pci_hp_add_devices(bus); } - } else if (rmv_data->removed) { - pr_info("EEH: Sleep 5s ahead of partial hotplug\n"); - ssleep(5); - - edev = list_first_entry(&pe->edevs, struct eeh_dev, list); - eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL); - if (pe->type & EEH_PE_VF) - eeh_add_virt_device(edev, NULL); - else - pci_hp_add_devices(bus); } eeh_pe_state_clear(pe, EEH_PE_KEEP); -- 2.16.1.74.g9b0b1f47b
[PATCH v2 7/9] powerpc/eeh: Remove always-true tests in eeh_reset_device()
eeh_reset_device() tests the value of 'bus' more than once but the only caller, eeh_handle_normal_device() does this test itself and will never pass NULL. So, remove the dead tests. This should not change behaviour. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_driver.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 07437d765434..93fc22e791fa 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -655,7 +655,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, pci_hp_remove_devices(bus); pci_unlock_rescan_remove(); } - } else if (bus) { + } else { eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data); } @@ -708,7 +708,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, eeh_pe_state_clear(pe, EEH_PE_PRI_BUS); pci_hp_add_devices(bus); } - } else if (bus && rmv_data->removed) { + } else if (rmv_data->removed) { pr_info("EEH: Sleep 5s ahead of partial hotplug\n"); ssleep(5); -- 2.16.1.74.g9b0b1f47b
[PATCH v2 6/9] powerpc/eeh: Clarify arguments to eeh_reset_device()
It is currently difficult to understand the behaviour of eeh_reset_device() due to the way it's parameters are used. In particular, when 'bus' is NULL, it's value is still necessary so the same value is looked up again locally under a different name ('frozen_bus') but behaviour is changed. To clarify this, add a new parameter 'driver_eeh_aware', and have the caller set it when it would have passed NULL for 'bus' and always pass a value for 'bus'. Then change any test that was on 'bus' to one on '!driver_eeh_aware' and replace uses of 'frozen_bus' with 'bus'. Also update the function's comment. This should not change behaviour. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_driver.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index cb584d72b0a5..07437d765434 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -619,17 +619,19 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe) /** * eeh_reset_device - Perform actual reset of a pci slot + * @driver_eeh_aware: Does the device's driver provide EEH support? * @pe: EEH PE * @bus: PCI bus corresponding to the isolcated slot + * @rmv_data: Optional, list to record removed devices * * This routine must be called to do reset on the indicated PE. * During the reset, udev might be invoked because those affected * PCI devices will be removed and then added. */ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, - struct eeh_rmv_data *rmv_data) + struct eeh_rmv_data *rmv_data, + bool driver_eeh_aware) { - struct pci_bus *frozen_bus = eeh_pe_bus_get(pe); time64_t tstamp; int cnt, rc; struct eeh_dev *edev; @@ -645,7 +647,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, * into pci_hp_add_devices(). */ eeh_pe_state_mark(pe, EEH_PE_KEEP); - if (bus) { + if (!driver_eeh_aware) { if (pe->type & EEH_PE_VF) { eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL); } else { @@ -653,7 +655,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, pci_hp_remove_devices(bus); pci_unlock_rescan_remove(); } - } else if (frozen_bus) { + } else if (bus) { eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data); } @@ -689,7 +691,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, * the device up before the scripts have taken it down, * potentially weird things happen. */ - if (bus) { + if (!driver_eeh_aware) { pr_info("EEH: Sleep 5s ahead of complete hotplug\n"); ssleep(5); @@ -706,7 +708,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, eeh_pe_state_clear(pe, EEH_PE_PRI_BUS); pci_hp_add_devices(bus); } - } else if (frozen_bus && rmv_data->removed) { + } else if (bus && rmv_data->removed) { pr_info("EEH: Sleep 5s ahead of partial hotplug\n"); ssleep(5); @@ -715,7 +717,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, if (pe->type & EEH_PE_VF) eeh_add_virt_device(edev, NULL); else - pci_hp_add_devices(frozen_bus); + pci_hp_add_devices(bus); } eeh_pe_state_clear(pe, EEH_PE_KEEP); @@ -820,7 +822,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) */ if (result == PCI_ERS_RESULT_NONE) { pr_info("EEH: Reset with hotplug activity\n"); - rc = eeh_reset_device(pe, bus, NULL); + rc = eeh_reset_device(pe, bus, NULL, false); if (rc) { pr_warn("%s: Unable to reset, err=%d\n", __func__, rc); @@ -872,7 +874,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) /* If any device called out for a reset, then reset the slot */ if (result == PCI_ERS_RESULT_NEED_RESET) { pr_info("EEH: Reset without hotplug activity\n"); - rc = eeh_reset_device(pe, NULL, &rmv_data); + rc = eeh_reset_device(pe, bus, &rmv_data, true); if (rc) { pr_warn("%s: Cannot reset, err=%d\n", __func__, rc); -- 2.16.1.74.g9b0b1f47b
[PATCH v2 5/9] powerpc/eeh: Rename frozen_bus to bus in eeh_handle_normal_event()
The name "frozen_bus" is misleading: it's not necessarily frozen, it's just the PE's PCI bus. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_driver.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 04a5d9db5499..cb584d72b0a5 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -754,14 +754,14 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, */ void eeh_handle_normal_event(struct eeh_pe *pe) { - struct pci_bus *frozen_bus; + struct pci_bus *bus; struct eeh_dev *edev, *tmp; int rc = 0; enum pci_ers_result result = PCI_ERS_RESULT_NONE; struct eeh_rmv_data rmv_data = {LIST_HEAD_INIT(rmv_data.edev_list), 0}; - frozen_bus = eeh_pe_bus_get(pe); - if (!frozen_bus) { + bus = eeh_pe_bus_get(pe); + if (!bus) { pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n", __func__, pe->phb->global_number, pe->addr); return; @@ -820,7 +820,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) */ if (result == PCI_ERS_RESULT_NONE) { pr_info("EEH: Reset with hotplug activity\n"); - rc = eeh_reset_device(pe, frozen_bus, NULL); + rc = eeh_reset_device(pe, bus, NULL); if (rc) { pr_warn("%s: Unable to reset, err=%d\n", __func__, rc); @@ -938,7 +938,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe) eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED); pci_lock_rescan_remove(); - pci_hp_remove_devices(frozen_bus); + pci_hp_remove_devices(bus); pci_unlock_rescan_remove(); /* The passed PE should no longer be used */ return; -- 2.16.1.74.g9b0b1f47b
[PATCH v2 4/9] powerpc/eeh: Remove misleading test in eeh_handle_normal_event()
Remove a test that checks if "frozen_bus" is NULL, because it cannot have changed since it was tested at the start of the function and so must be true here. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_driver.c | 24 +++- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 5b7a5ed4db4d..04a5d9db5499 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -930,20 +930,18 @@ void eeh_handle_normal_event(struct eeh_pe *pe) * all removed devices correctly to avoid access * the their PCI config any more. */ - if (frozen_bus) { - if (pe->type & EEH_PE_VF) { - eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL); - eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED); - } else { - eeh_pe_state_clear(pe, EEH_PE_PRI_BUS); - eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED); + if (pe->type & EEH_PE_VF) { + eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL); + eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED); + } else { + eeh_pe_state_clear(pe, EEH_PE_PRI_BUS); + eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED); - pci_lock_rescan_remove(); - pci_hp_remove_devices(frozen_bus); - pci_unlock_rescan_remove(); - /* The passed PE should no longer be used */ - return; - } + pci_lock_rescan_remove(); + pci_hp_remove_devices(frozen_bus); + pci_unlock_rescan_remove(); + /* The passed PE should no longer be used */ + return; } final: eeh_pe_state_clear(pe, EEH_PE_RECOVERING); -- 2.16.1.74.g9b0b1f47b
[PATCH v2 3/9] powerpc/eeh: Fix misleading comment in __eeh_addr_cache_get_device()
Commit "0ba17b05 powerpc/eeh: Remove reference to PCI device" removed a call to pci_dev_get() from __eeh_addr_cache_get_device() but did not update the comment to match. Signed-off-by: Sam Bobroff --- arch/powerpc/kernel/eeh_cache.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c index d4cc26618809..201943d54a6e 100644 --- a/arch/powerpc/kernel/eeh_cache.c +++ b/arch/powerpc/kernel/eeh_cache.c @@ -84,8 +84,7 @@ static inline struct eeh_dev *__eeh_addr_cache_get_device(unsigned long addr) * @addr: mmio (PIO) phys address or i/o port number * * Given an mmio phys address, or a port number, find a pci device - * that implements this address. Be sure to pci_dev_put the device - * when finished. I/O port numbers are assumed to be offset + * that implements this address. I/O port numbers are assumed to be offset * from zero (that is, they do *not* have pci_io_addr added in). * It is safe to call this function within an interrupt. */ -- 2.16.1.74.g9b0b1f47b
[PATCH v2 2/9] powerpc/eeh: Manage EEH_PE_RECOVERING inside eeh_handle_normal_event()
Currently the EEH_PE_RECOVERING flag for a PE is managed by both the caller and callee of eeh_handle_normal_event() (among other places not considered here). This is complicated by the fact that the PE may or may not have been invalidated by the call. So move the callee's handling into eeh_handle_normal_event(), which clarifies it and allows the return type to be changed to void (because it no longer needs to indicate at the PE has been invalidated). This should not change behaviour except in eeh_event_handler() where it was previously possible to cause eeh_pe_state_clear() to be called on an invalid PE, which is now avoided. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh_event.h | 2 +- arch/powerpc/kernel/eeh_driver.c | 29 +++-- arch/powerpc/kernel/eeh_event.c | 2 -- 3 files changed, 12 insertions(+), 21 deletions(-) diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h index 0a168038882d..9884e872686f 100644 --- a/arch/powerpc/include/asm/eeh_event.h +++ b/arch/powerpc/include/asm/eeh_event.h @@ -34,7 +34,7 @@ struct eeh_event { int eeh_event_init(void); int eeh_send_failure_event(struct eeh_pe *pe); void eeh_remove_event(struct eeh_pe *pe, bool force); -bool eeh_handle_normal_event(struct eeh_pe *pe); +void eeh_handle_normal_event(struct eeh_pe *pe); void eeh_handle_special_event(void); #endif /* __KERNEL__ */ diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 51b21c97910f..5b7a5ed4db4d 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -733,7 +733,8 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, /** * eeh_handle_normal_event - Handle EEH events on a specific PE - * @pe: EEH PE + * @pe: EEH PE - which should not be used after we return, as it may + * have been invalidated. * * Attempts to recover the given PE. If recovery fails or the PE has failed * too many times, remove the PE. @@ -750,10 +751,8 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, * & devices under this slot, and then finally restarting the device * drivers (which cause a second set of hotplug events to go out to * userspace). - * - * Returns true if @pe should no longer be used, else false. */ -bool eeh_handle_normal_event(struct eeh_pe *pe) +void eeh_handle_normal_event(struct eeh_pe *pe) { struct pci_bus *frozen_bus; struct eeh_dev *edev, *tmp; @@ -765,9 +764,11 @@ bool eeh_handle_normal_event(struct eeh_pe *pe) if (!frozen_bus) { pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n", __func__, pe->phb->global_number, pe->addr); - return false; + return; } + eeh_pe_state_mark(pe, EEH_PE_RECOVERING); + eeh_pe_update_time_stamp(pe); pe->freeze_count++; if (pe->freeze_count > eeh_max_freezes) { @@ -904,7 +905,7 @@ bool eeh_handle_normal_event(struct eeh_pe *pe) pr_info("EEH: Notify device driver to resume\n"); eeh_pe_dev_traverse(pe, eeh_report_resume, NULL); - return false; + goto final; hard_fail: /* @@ -940,12 +941,12 @@ bool eeh_handle_normal_event(struct eeh_pe *pe) pci_lock_rescan_remove(); pci_hp_remove_devices(frozen_bus); pci_unlock_rescan_remove(); - /* The passed PE should no longer be used */ - return true; + return; } } - return false; +final: + eeh_pe_state_clear(pe, EEH_PE_RECOVERING); } /** @@ -1018,15 +1019,7 @@ void eeh_handle_special_event(void) */ if (rc == EEH_NEXT_ERR_FROZEN_PE || rc == EEH_NEXT_ERR_FENCED_PHB) { - /* -* eeh_handle_normal_event() can make the PE stale if it -* determines that the PE cannot possibly be recovered. -* Don't modify the PE state if that's the case. -*/ - if (eeh_handle_normal_event(pe)) - continue; - - eeh_pe_state_clear(pe, EEH_PE_RECOVERING); + eeh_handle_normal_event(pe); } else { pci_lock_rescan_remove(); list_for_each_entry(hose, &hose_list, list_node) { diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c index 872bcfe8f90e..61c9356bf9c9 100644 --- a/arch/powerpc/kernel/eeh_event.c +++ b/arch/powerpc/kernel/eeh_event.c @@ -73,7 +73,6 @@ static int eeh_event_handler(void * dummy) /* We might have event without binding PE */ pe = event->pe; if (pe) { - eeh_pe_state_mark(pe, EEH_PE_RE
[PATCH v2 1/9] powerpc/eeh: Remove eeh_handle_event()
The function eeh_handle_event(pe) does nothing other than switching between calling eeh_handle_normal_event(pe) and eeh_handle_special_event(). However it is only called in two places, one where pe can't be NULL and the other where it must be NULL (see eeh_event_handler()) so it does nothing but obscure the flow of control. So, remove it. Signed-off-by: Sam Bobroff --- arch/powerpc/include/asm/eeh_event.h | 3 ++- arch/powerpc/kernel/eeh_driver.c | 42 +--- arch/powerpc/kernel/eeh_event.c | 4 ++-- 3 files changed, 19 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/include/asm/eeh_event.h b/arch/powerpc/include/asm/eeh_event.h index 1e551a2d6f82..0a168038882d 100644 --- a/arch/powerpc/include/asm/eeh_event.h +++ b/arch/powerpc/include/asm/eeh_event.h @@ -34,7 +34,8 @@ struct eeh_event { int eeh_event_init(void); int eeh_send_failure_event(struct eeh_pe *pe); void eeh_remove_event(struct eeh_pe *pe, bool force); -void eeh_handle_event(struct eeh_pe *pe); +bool eeh_handle_normal_event(struct eeh_pe *pe); +void eeh_handle_special_event(void); #endif /* __KERNEL__ */ #endif /* ASM_POWERPC_EEH_EVENT_H */ diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c index 0c0b66fc5bfb..51b21c97910f 100644 --- a/arch/powerpc/kernel/eeh_driver.c +++ b/arch/powerpc/kernel/eeh_driver.c @@ -738,9 +738,22 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus, * Attempts to recover the given PE. If recovery fails or the PE has failed * too many times, remove the PE. * + * While PHB detects address or data parity errors on particular PCI + * slot, the associated PE will be frozen. Besides, DMA's occurring + * to wild addresses (which usually happen due to bugs in device + * drivers or in PCI adapter firmware) can cause EEH error. #SERR, + * #PERR or other misc PCI-related errors also can trigger EEH errors. + * + * Recovery process consists of unplugging the device driver (which + * generated hotplug events to userspace), then issuing a PCI #RST to + * the device, then reconfiguring the PCI config space for all bridges + * & devices under this slot, and then finally restarting the device + * drivers (which cause a second set of hotplug events to go out to + * userspace). + * * Returns true if @pe should no longer be used, else false. */ -static bool eeh_handle_normal_event(struct eeh_pe *pe) +bool eeh_handle_normal_event(struct eeh_pe *pe) { struct pci_bus *frozen_bus; struct eeh_dev *edev, *tmp; @@ -942,7 +955,7 @@ static bool eeh_handle_normal_event(struct eeh_pe *pe) * specific PE. Iterates through possible failures and handles them as * necessary. */ -static void eeh_handle_special_event(void) +void eeh_handle_special_event(void) { struct eeh_pe *pe, *phb_pe; struct pci_bus *bus; @@ -1049,28 +1062,3 @@ static void eeh_handle_special_event(void) break; } while (rc != EEH_NEXT_ERR_NONE); } - -/** - * eeh_handle_event - Reset a PCI device after hard lockup. - * @pe: EEH PE - * - * While PHB detects address or data parity errors on particular PCI - * slot, the associated PE will be frozen. Besides, DMA's occurring - * to wild addresses (which usually happen due to bugs in device - * drivers or in PCI adapter firmware) can cause EEH error. #SERR, - * #PERR or other misc PCI-related errors also can trigger EEH errors. - * - * Recovery process consists of unplugging the device driver (which - * generated hotplug events to userspace), then issuing a PCI #RST to - * the device, then reconfiguring the PCI config space for all bridges - * & devices under this slot, and then finally restarting the device - * drivers (which cause a second set of hotplug events to go out to - * userspace). - */ -void eeh_handle_event(struct eeh_pe *pe) -{ - if (pe) - eeh_handle_normal_event(pe); - else - eeh_handle_special_event(); -} diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c index accbf8b5fd46..872bcfe8f90e 100644 --- a/arch/powerpc/kernel/eeh_event.c +++ b/arch/powerpc/kernel/eeh_event.c @@ -81,10 +81,10 @@ static int eeh_event_handler(void * dummy) pr_info("EEH: Detected PCI bus error on " "PHB#%x-PE#%x\n", pe->phb->global_number, pe->addr); - eeh_handle_event(pe); + eeh_handle_normal_event(pe); eeh_pe_state_clear(pe, EEH_PE_RECOVERING); } else { - eeh_handle_event(NULL); + eeh_handle_special_event(); } kfree(event); -- 2.16.1.74.g9b0b1f47b
[PATCH v2 0/9] EEH refactoring 1
Hello everyone, Here is a set of some small, mostly idempotent, changes to improve maintainability in some of the EEH code, primarily in eeh_driver.c. I've kept them all small to aid review but perhaps they should be squashed down before being applied. Cheers, Sam. Patch set changelog follows: == v1 -> v2: == Patch 1/9: powerpc/eeh: Remove eeh_handle_event() Patch 2/9: powerpc/eeh: Manage EEH_PE_RECOVERING inside eeh_handle_normal_event() Patch 3/9: powerpc/eeh: Fix misleading comment in __eeh_addr_cache_get_device() Patch 4/9: powerpc/eeh: Remove misleading test in eeh_handle_normal_event() Patch 5/9: powerpc/eeh: Rename frozen_bus to bus in eeh_handle_normal_event() Patch 6/9: powerpc/eeh: Clarify arguments to eeh_reset_device() * Re-ordered parameters to eeh_reset_device() to keep pe first. * Changed eeh_aware_driver to driver_eeh_aware. Patch 7/9: powerpc/eeh: Remove always-true tests in eeh_reset_device() Patch 8/9: powerpc/eeh: Factor out common code eeh_reset_device() * In one case, added braces to "if" to match "else". Patch 9/9: powerpc/eeh: Add eeh_state_active() helper == v1: == Patch 1/9: powerpc/eeh: Remove eeh_handle_event() Patch 2/9: powerpc/eeh: Manage EEH_PE_RECOVERING inside eeh_handle_normal_event() Patch 3/9: powerpc/eeh: Fix misleading comment in __eeh_addr_cache_get_device() Patch 4/9: powerpc/eeh: Remove misleading test in eeh_handle_normal_event() Patch 5/9: powerpc/eeh: Rename frozen_bus to bus in eeh_handle_normal_event() Patch 6/9: powerpc/eeh: Clarify arguments to eeh_reset_device() Patch 7/9: powerpc/eeh: Remove always-true tests in eeh_reset_device() Patch 8/9: powerpc/eeh: Factor out common code eeh_reset_device() Patch 9/9: powerpc/eeh: Add eeh_state_active() helper Sam Bobroff (9): powerpc/eeh: Remove eeh_handle_event() powerpc/eeh: Manage EEH_PE_RECOVERING inside eeh_handle_normal_event() powerpc/eeh: Fix misleading comment in __eeh_addr_cache_get_device() powerpc/eeh: Remove misleading test in eeh_handle_normal_event() powerpc/eeh: Rename frozen_bus to bus in eeh_handle_normal_event() powerpc/eeh: Clarify arguments to eeh_reset_device() powerpc/eeh: Remove always-true tests in eeh_reset_device() powerpc/eeh: Factor out common code eeh_reset_device() powerpc/eeh: Add eeh_state_active() helper arch/powerpc/include/asm/eeh.h | 6 ++ arch/powerpc/include/asm/eeh_event.h | 3 +- arch/powerpc/kernel/eeh.c| 19 ++-- arch/powerpc/kernel/eeh_cache.c | 3 +- arch/powerpc/kernel/eeh_driver.c | 137 +++ arch/powerpc/kernel/eeh_event.c | 6 +- arch/powerpc/platforms/powernv/eeh-powernv.c | 9 +- 7 files changed, 72 insertions(+), 111 deletions(-) -- 2.16.1.74.g9b0b1f47b
Re: [PATCH v3 18/18] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs
On Sat, Mar 17, 2018 at 02:30:10PM -0400, Sinan Kaya wrote: > Somebody also has to take a task and work very hard to get rid of > __raw_writeX() > APIs in drivers/net directory. It looked like a very common practice though > it clearly violates multiarch portability concerns Jason and Deve highlighted. When you posted your list I thought most of the hits were in what I'd think of 'one-arch drivers', eg an IRQ controller or clock driver or something.. Some might have a reason for it (eg avoiding the swap, for instance), maybe it is a hold over from before writel_relaxed, or maybe it is just a cargo-cult behavior.. It is the obviously multi-arch drivers that probably need some attention.. Jason
Re: [RFC PATCH 2/2] powerpc: Only support DYNAMIC_FTRACE not static
Steven Rostedt writes: > On Sat, 17 Mar 2018 00:46:33 +1100 > Michael Ellerman wrote: > >> We've had dynamic ftrace support for over 9 years since Steve first >> wrote it, all the distros use dynamic, and static is basically >> untested these days, so drop support for static ftrace. >> >> Signed-off-by: Michael Ellerman >> --- >> arch/powerpc/Kconfig | 1 + >> arch/powerpc/include/asm/ftrace.h | 4 +--- >> arch/powerpc/include/asm/module.h | 5 - >> arch/powerpc/kernel/trace/ftrace.c | 2 -- >> arch/powerpc/kernel/trace/ftrace_32.S | 20 -- >> arch/powerpc/kernel/trace/ftrace_64.S | 29 >> -- >> arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 3 --- >> arch/powerpc/kernel/trace/ftrace_64_pg.S | 2 -- >> 8 files changed, 2 insertions(+), 64 deletions(-) >> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig >> index 73ce5dd07642..23a325df784a 100644 >> --- a/arch/powerpc/Kconfig >> +++ b/arch/powerpc/Kconfig >> @@ -189,6 +189,7 @@ config PPC >> select HAVE_DEBUG_STACKOVERFLOW >> select HAVE_DMA_API_DEBUG >> select HAVE_DYNAMIC_FTRACE >> +select HAVE_DYNAMIC_FTRACE_ONLY > > I still think adding: > > select DYNAMIC_FTRACE if FUNCTION_TRACER > > is the better approach. OK. As I said in my other reply it's a bit fragile, but it does work. I'll do a version for powerpc using the above approach. > But I'm all for this patch. I've debated doing the same thing for x86, > but the only reason I have not, was because it's the only way I test > the !DYNAMIC_FTRACE code. I've broken the static function tracing > several times and only find out during my test suite that still tests > that case. But yeah, it would be nice to just nuke static function > tracing for all archs. Perhaps after we finish removing unused archs, > that may be the way to go forward. Yeah I did look and we still have some arches that support ftrace but not dynamic ftrace, but there's not many. cheers
Re: [RFC PATCH 1/2] ftrace: Allow arches to opt-out of static ftrace
Steven Rostedt writes: > On Sat, 17 Mar 2018 00:46:32 +1100 > Michael Ellerman wrote: > >> There is a small but non-zero amount of code required by arches to >> suppory non-dynamic (static) ftrace, and more importantly there is the >> added work of testing both configurations. >> >> There are also almost no down sides to dynamic ftrace once it's well >> tested, other than a small increase in code/data size. >> >> So give arches the option to opt-out of supporting static ftrace. >> >> This is implemented as a DYNAMIC_FTRACE_CHOICE option, which controls >> whether DYNAMIC_FTRACE is presented as a user-selectable option or if >> it is just enabled based on its dependencies being enabled (because >> it's already default y). >> >> Then the CHOICE option depends on an arch *not* selecting >> HAVE_DYNAMIC_FTRACE_ONLY. This would be more natural in reverse, as a >> HAVE_STATIC_FTRACE option, but that would require updating every arch. >> >> Signed-off-by: Michael Ellerman > > Why not just add in arch/powerpc/Kconfig: > > config PPC > [..] > select DYNAMIC_FTRACE if FUNCTION_TRACER > > ? > > It seems to work for me. It does work, but it's a bit fragile. It requires duplicating the dependencies of DYNAMIC_FTRACE in the 'if' condition. Currently that's: config DYNAMIC_FTRACE depends on FUNCTION_TRACER depends on HAVE_DYNAMIC_FTRACE So technically we should use: select DYNAMIC_FTRACE if FUNCTION_TRACER and HAVE_DYNAMIC_FTRACE Though we happen to know we just selected HAVE_DYNAMIC_FTRACE so we can leave that out. As long as the dependencies of DYNAMIC_FTRACE don't change, or the 'if' clause is updated when they do, then it's OK and it is certainly simpler. cheers
Re: [PATCH 1/5] mtd: Initialize ->fail_addr early in mtd_erase()
On Mon, 12 Feb 2018 22:03:07 +0100 Boris Brezillon wrote: > mtd_erase() can return an error before ->fail_addr is initialized to > MTD_FAIL_ADDR_UNKNOWN. Move this initialization at the very beginning > of the function. Applied the patchset after addressing Miquel's comments. > > Signed-off-by: Boris Brezillon > --- > drivers/mtd/mtdcore.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c > index a1c94526fb88..c87859ff338b 100644 > --- a/drivers/mtd/mtdcore.c > +++ b/drivers/mtd/mtdcore.c > @@ -953,6 +953,8 @@ EXPORT_SYMBOL_GPL(__put_mtd_device); > */ > int mtd_erase(struct mtd_info *mtd, struct erase_info *instr) > { > + instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN; > + > if (!mtd->erasesize || !mtd->_erase) > return -ENOTSUPP; > > @@ -961,7 +963,6 @@ int mtd_erase(struct mtd_info *mtd, struct erase_info > *instr) > if (!(mtd->flags & MTD_WRITEABLE)) > return -EROFS; > > - instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN; > if (!instr->len) { > instr->state = MTD_ERASE_DONE; > mtd_erase_callback(instr); -- Boris Brezillon, Bootlin (formerly Free Electrons) Embedded Linux and Kernel engineering https://bootlin.com
Re: [RFC PATCH 2/6] fs: provide a generic compat_sys_truncate64() implementation
On Sun, Mar 18, 2018 at 9:10 AM, Dominik Brodowski wrote: > +#ifdef __ARCH_WANT_COMPAT_SYS_TRUNCATE64 > +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE4(truncate64, const char __user *, filename, u32 > padding, > + unsigned int, offset_low, unsigned int, offset_high) > +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + !defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE4(truncate64, const char __user *, filename, u32 > padding, > + unsigned int, offset_high, unsigned int, offset_low) > +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE3(truncate64, const char __user *, filename, > + unsigned int, offset_low, unsigned int, offset_high) > +#else /* no padding, big endian */ > +COMPAT_SYSCALL_DEFINE3(truncate64, const char __user *, filename, > + unsigned int, offset_high, unsigned int, offset_low) > +#endif > +{ > +#ifdef CONFIG_SPARC > + if ((int) offset_high < 0) > + return -EINVAL; > +#endif > + return do_sys_truncate(filename, > + ((loff_t) offset_high << 32) | offset_low); > +} > +#endif /* __ARCH_WANT_COMPAT_SYS_TRUNCATE64 */ This really screams out for a sparc-specific wrapper, or maybe that #ifdef CONFIG_SPARC should just happen for everybody. But regardless, code like the above is completely unacceptable. And yes, it also shows that my suggested alternative doesn't really work, because of the way the padding changes the number of arguments, giving that whole COMPAT_SYSCALL_DEFINE 3-vs-4 argument version. So I think just making it be arch-specific is the right thing. Alternatively, the COMPAT_SYSCALL_DEFINE() macro could be made smarter and do self-counting. There are generally tricks to count macro arguments, particularly when you know that they are limited. Things like #define NARG(...) __NARG(__VA_ARGS__##, 6,5,4,3,2,1) #define _NARG(a,b,c,d,e,f,...) f or something (I may have screwed that up, you get the idea). Linus
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Sun, Mar 18, 2018 at 11:06:42AM -0700, Linus Torvalds wrote: > and then we can do > > COMPAT_SYSCALL_DEFINE5(readahead, int, fd, > COMPAT_ARG_64BIT_ODD(off), compat_size_t, count) > { > return do_readahead(fd, off_lo + ((u64)off_hi << 64), count); > } > > which at least looks reasonably legible, and has *zero* ifdef's anywhere. It's a bit more complicated, but... > I do *not* want to see those disgusting __ARCH_WANT_LE_COMPAT_SYS > things and crazy #ifdef's in code. Absolutely. Those piles of ifdefs are unreadable garbage. > So either let the architectures do their own trivial wrappers > entirely, or do something clean like the above. Do *not* do > #ifdef'fery at the system call declaration time. > > Also note that the "ODD" arguments may not be the ones that need > padding. I could easily see a system call argument numbering scheme > like > >r0 - system call number >r1 - first argument >r2 - second argument >... > > and then it's the *EVEN* 64-bit arguments that would need the padding > (because they are actually odd in the register numbers). The above > COMPAT_ARG_64BIT[_ODD]() model allows for that too. > > Of course, if some architecture then has some other arbitrary rules (I > could see register pairing rules that aren't the usual "even register" > ones), then such an architecture would really have to have its own > wrapper, but the above at least would handle the simple cases, and > doesn't look disgusting to use. I'd done some digging in that area, will find the notes and post. Basically, we can even avoid the odd/even annotations and have COMPAT_SYSCALL_DEFINE... sort it out. It's a bit more hairy than I would like at this stage in the cycle, though. I'll see if it can be done without too much PITA. However, there still are genuinely speci^Wfucked in head cases - see e.g. this sad story: commit ab8a261ba5e2dd9206da640de5870cc31d568a7c Author: Helge Deller Date: Thu Jul 10 18:07:17 2014 +0200 parisc: fix fanotify_mark() syscall on 32bit compat kernel Those certainly ought to stay in arch/*
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Sun, Mar 18, 2018 at 05:10:54PM +0100, Dominik Brodowski wrote: > +#ifdef __ARCH_WANT_COMPAT_SYS_READAHEAD > +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding, > +unsigned int, off_lo, unsigned int, off_hi, > +size_t, count) > +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + !defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding, > +unsigned int, off_hi, unsigned int, off_lo, > +size_t, count) > +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE4(readahead, int, fd, > +unsigned int, off_lo, unsigned int, off_hi, > +size_t, count) > +#else /* no padding, big endian */ > +COMPAT_SYSCALL_DEFINE4(readahead, int, fd, > +unsigned int, off_hi, unsigned int, off_lo, > +size_t, count) > +#endif > +{ > + return do_readahead(fd, ((u64) off_hi << 32) | off_lo, count); > } *UGH* static inline compat_to_u64(u32 w0, u32 w1) { #ifdef __BIG_ENDIAN return ((u64)w0 << 32) | w1; #else return ((u64)w1 << 32) | w0; #endif } in compat.h, then this turns into #ifdef __ARCH_WANT_COMPAT_SYS_WITH_PADDING COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding, u32, off0, u32 off1, compat_size_t, count) #else COMPAT_SYSCALL_DEFINE4(readahead, int, fd, u32, off0, u32 off1, compat_size_t, count) #endif { return do_readahead(fd, compat_to_u64(off0, off1), count); }
Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
On Sun, Mar 18, 2018 at 10:40 AM, Al Viro wrote: > > *UGH* > > static inline compat_to_u64(u32 w0, u32 w1) > { > #ifdef __BIG_ENDIAN > return ((u64)w0 << 32) | w1; > #else > return ((u64)w1 << 32) | w0; > #endif > } > > in compat.h, then this turns into > > #ifdef __ARCH_WANT_COMPAT_SYS_WITH_PADDING > COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding, >u32, off0, u32 off1, >compat_size_t, count) > #else > COMPAT_SYSCALL_DEFINE4(readahead, int, fd, >u32, off0, u32 off1, >compat_size_t, count) > #endif No. This is still too ugly to live. What *may* be acceptable is if architectures defined something like this: x86: /* Little endian registers - low bits first, no padding for odd register numbers necessary */ #define COMPAT_ARG_64BIT(x) unsigned int x##_lo, unsigned int x##_hi #define COMPAT_ARG_64BIT_ODD(x) COMPAT_ARG_64BIT(x) ppc BE: /* Big-endian registers - high bits first, odd argument pairs padded up to the next even register */ #define COMPAT_ARG_64BIT(x) unsigned int x##_hi, unsigned int x##_lo #define COMPAT_ARG_64BIT_ODD(x) unsigned int x##_padding, COMPAT_ARG_64BIT(x) and then we can do COMPAT_SYSCALL_DEFINE5(readahead, int, fd, COMPAT_ARG_64BIT_ODD(off), compat_size_t, count) { return do_readahead(fd, off_lo + ((u64)off_hi << 64), count); } which at least looks reasonably legible, and has *zero* ifdef's anywhere. I do *not* want to see those disgusting __ARCH_WANT_LE_COMPAT_SYS things and crazy #ifdef's in code. So either let the architectures do their own trivial wrappers entirely, or do something clean like the above. Do *not* do #ifdef'fery at the system call declaration time. Also note that the "ODD" arguments may not be the ones that need padding. I could easily see a system call argument numbering scheme like r0 - system call number r1 - first argument r2 - second argument ... and then it's the *EVEN* 64-bit arguments that would need the padding (because they are actually odd in the register numbers). The above COMPAT_ARG_64BIT[_ODD]() model allows for that too. Of course, if some architecture then has some other arbitrary rules (I could see register pairing rules that aren't the usual "even register" ones), then such an architecture would really have to have its own wrapper, but the above at least would handle the simple cases, and doesn't look disgusting to use. Linus PS. It is possible that we should then add a #define COMPAT_ARG_64BIT_VAL(x) (x_##lo + ((u64)x_##hi << 32)) and then do COMPAT_SYSCALL_DEFINE5(readahead, int, fd, COMPAT_ARG_64BIT_ODD(off), compat_size_t, count) { return do_readahead(fd, COMPAT_ARG_64BIT_VAL(off), count); } because then we could perhaps generate the *non*compat system calls this way too, just using #define COMPAT_ARG_64BIT(x) unsigned long x #define COMPAT_ARG_64BIT_VAL(x) (x) instead (there might also be a "compat" mode that actually has access to 64-bit registers, like x32 does, but I suspect it would have other issues).
Re: [RFC PATCH 3/6] fs: provide generic compat_sys_p{read, write}64() implementations
On Sun, Mar 18, 2018 at 05:10:53PM +0100, Dominik Brodowski wrote: > +#ifdef __ARCH_WANT_COMPAT_SYS_PREADWRITE64 > +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE6(pread64, unsigned int, fd, char __user *, ubuf, > +u32, count, u32, padding, u32, poslo, u32, poshi) > +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + !defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE6(pread64, unsigned int, fd, char __user *, ubuf, > +u32, count, u32, padding, u32, poshi, u32, poslo) > +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, ubuf, > +u32, count, u32, poslo, u32, poshi) > +#else /* no padding, big endian */ > +COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, ubuf, > +u32, count, u32, poshi, u32, poslo) > +#endif > +{ > +#ifdef CONFIG_S390 > + if ((compat_ssize_t) count < 0) > + return -EINVAL; > +#endif /* CONFIG_S390 */ > + return do_pread64(fd, ubuf, count, > + ((loff_t) (unsigned long) (poshi) << 32) | > + (unsigned long) (poslo)); > +} Egads... You have 4 ifdefs before you even get to the body. And good luck trying to actually keep track of that mess. They clearly go in 2 pairs, right? One parameter is "do we have padding" (== does ABI prohibit passing 64bit value in 4th and 5th words), another is the order in which the halves of 64bit are passed. On l-e you have bits 0..31 in the first one and bits 32..63 in the second; on b-e it's the other way round. Only the logics for putting them together into a 64bit value cares which half is which; insisting on the names of form {hi,lo} gives you arseloads of similar variants in ifdefs, all for the sake of not having conditional code in the body. Or, actually, in the inlined helper for building that 64bit out of two halves...
Re: [RFC PATCH 2/6] fs: provide a generic compat_sys_truncate64() implementation
On Sun, Mar 18, 2018 at 05:10:52PM +0100, Dominik Brodowski wrote: > +#ifdef __ARCH_WANT_COMPAT_SYS_TRUNCATE64 > +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE4(truncate64, const char __user *, filename, u32 > padding, > +unsigned int, offset_low, unsigned int, offset_high) > +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + !defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE4(truncate64, const char __user *, filename, u32 > padding, > +unsigned int, offset_high, unsigned int, offset_low) > +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE3(truncate64, const char __user *, filename, > +unsigned int, offset_low, unsigned int, offset_high) > +#else /* no padding, big endian */ > +COMPAT_SYSCALL_DEFINE3(truncate64, const char __user *, filename, > +unsigned int, offset_high, unsigned int, offset_low) > +#endif > +{ > +#ifdef CONFIG_SPARC > + if ((int) offset_high < 0) > + return -EINVAL; > +#endif > + return do_sys_truncate(filename, > +((loff_t) offset_high << 32) | offset_low); > +} > +#endif /* __ARCH_WANT_COMPAT_SYS_TRUNCATE64 */ Ow... For one thing, the same observation as for readahead(2). For another, that sparc-specific test is very suspicious, innit? Let's take a look at do_sys_truncate(): static long do_sys_truncate(const char __user *pathname, loff_t length) { unsigned int lookup_flags = LOOKUP_FOLLOW; struct path path; int error; if (length < 0) /* sorry, but loff_t says... */ return -EINVAL; So in case of offset_high having bit 31 set, we would get length with bit 63 set, and step into that if (length < 0) return -EINVAL; Sure, any set of texts can be combined, given a sufficiently large pile of ifdefs, but you are replacing an arseload of almost but not quite identical functions spread all over the tree with something that is in one place, but is awfully hard to look at, nevermind reading it...
Re: [RFC PATCH 3/6] fs: provide generic compat_sys_p{read, write}64() implementations
Honestly, I think the patches like this are disgusting: On Sun, Mar 18, 2018 at 9:10 AM, Dominik Brodowski wrote: > +#ifdef __ARCH_WANT_COMPAT_SYS_PREADWRITE64 > +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE6(pread64, unsigned int, fd, char __user *, ubuf, > + u32, count, u32, padding, u32, poslo, u32, poshi) > +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + !defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE6(pread64, unsigned int, fd, char __user *, ubuf, > + u32, count, u32, padding, u32, poshi, u32, poslo) > +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \ > + defined(__ARCH_WANT_LE_COMPAT_SYS) > +COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, ubuf, > + u32, count, u32, poslo, u32, poshi) > +#else /* no padding, big endian */ > +COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, ubuf, > + u32, count, u32, poshi, u32, poslo) > +#endif > +{ > +#ifdef CONFIG_S390 > + if ((compat_ssize_t) count < 0) > + return -EINVAL; > +#endif /* CONFIG_S390 */ and we should just keep code like this entirely architecture-dependent. It doesn't save all that many lines: 19 files changed, 97 insertions(+), 106 deletions(-) and the lines it adds are an unreadable mess compared to the lines it removes. So please keep the high/low/padding stuff in the arch wrapper, and just make them use "do_pwrite64()" and friends instead (or "kern_pwrite64()", or whatever we ended up using as the kernel naming for in-kernel system call wrappers). Linus
[RFC PATCH 1/6] fs: provide a generic compat_sys_fallocate() implementation
The compat_sys_fallocate() implementations in mips, powerpc, s390, sparc and x86 only differed based on the endianness of the u64 being passed as parameters (3, 4) and (5, 6). In addition, do not call sys_fallocate() from compat_sys_fallocate(), but use a common do_fallocate() helper instead. This patch is part of a series which tries to remove in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. Cc: Ralf Baechle Cc: James Hogan Cc: linux-m...@linux-mips.org Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: linux-s...@vger.kernel.org Cc: David S. Miller Cc: sparcli...@vger.kernel.org Cc: Ingo Molnar Cc: Jiri Slaby Cc: x...@kernel.org Cc: Al Viro Signed-off-by: Dominik Brodowski --- arch/mips/include/asm/unistd.h | 4 arch/mips/kernel/linux32.c | 7 --- arch/mips/kernel/scall64-o32.S | 2 +- arch/powerpc/include/asm/unistd.h | 1 + arch/powerpc/kernel/sys_ppc32.c| 7 --- arch/s390/include/asm/unistd.h | 1 + arch/s390/kernel/compat_linux.c| 7 --- arch/s390/kernel/compat_linux.h| 1 - arch/s390/kernel/syscalls/syscall.tbl | 2 +- arch/sparc/include/asm/unistd.h| 1 + arch/sparc/kernel/sys_sparc32.c| 7 --- arch/sparc/kernel/systbls.h| 2 -- arch/x86/entry/syscalls/syscall_32.tbl | 2 +- arch/x86/ia32/sys_ia32.c | 8 arch/x86/include/asm/sys_ia32.h| 2 -- arch/x86/include/asm/unistd.h | 2 ++ fs/open.c | 24 +++- include/linux/compat.h | 6 ++ 18 files changed, 41 insertions(+), 45 deletions(-) diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h index 3c09450908aa..bec9c6f55956 100644 --- a/arch/mips/include/asm/unistd.h +++ b/arch/mips/include/asm/unistd.h @@ -45,6 +45,10 @@ # endif # ifdef CONFIG_MIPS32_O32 # define __ARCH_WANT_COMPAT_SYS_TIME +# define __ARCH_WANT_COMPAT_SYS_FALLOCATE +# ifdef __MIPSEL__ +#define __ARCH_WANT_LE_COMPAT_SYS +# endif # endif #define __ARCH_WANT_SYS_FORK #define __ARCH_WANT_SYS_CLONE diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c index b332f6fc1e72..f6e6cb41c01f 100644 --- a/arch/mips/kernel/linux32.c +++ b/arch/mips/kernel/linux32.c @@ -153,10 +153,3 @@ asmlinkage long sys32_fadvise64_64(int fd, int __pad, merge_64(a2, a3), merge_64(a4, a5), flags); } - -asmlinkage long sys32_fallocate(int fd, int mode, unsigned offset_a2, - unsigned offset_a3, unsigned len_a4, unsigned len_a5) -{ - return sys_fallocate(fd, mode, merge_64(offset_a2, offset_a3), -merge_64(len_a4, len_a5)); -} diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S index 9ebe3e2403b1..da8babc2c1f5 100644 --- a/arch/mips/kernel/scall64-o32.S +++ b/arch/mips/kernel/scall64-o32.S @@ -536,7 +536,7 @@ EXPORT(sys32_call_table) PTR compat_sys_signalfd PTR sys_ni_syscall /* was timerfd */ PTR sys_eventfd - PTR sys32_fallocate /* 4320 */ + PTR compat_sys_fallocate/* 4320 */ PTR sys_timerfd_create PTR compat_sys_timerfd_gettime PTR compat_sys_timerfd_settime diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index daf1ba97a00c..46890687fb1d 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -49,6 +49,7 @@ #define __ARCH_WANT_COMPAT_SYS_TIME #define __ARCH_WANT_SYS_NEWFSTATAT #define __ARCH_WANT_COMPAT_SYS_SENDFILE +#define __ARCH_WANT_COMPAT_SYS_FALLOCATE #endif #define __ARCH_WANT_SYS_FORK #define __ARCH_WANT_SYS_VFORK diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c index 15f216d022e2..7c6da6273367 100644 --- a/arch/powerpc/kernel/sys_ppc32.c +++ b/arch/powerpc/kernel/sys_ppc32.c @@ -97,13 +97,6 @@ asmlinkage int compat_sys_truncate64(const char __user * path, u32 reg4, return sys_truncate(path, (high << 32) | low); } -asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offhi, u32 offlo, -u32 lenhi, u32 lenlo) -{ - return sys_fallocate(fd, mode, ((loff_t)offhi << 32) | offlo, -((loff_t)lenhi << 32) | lenlo); -} - asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long high, unsigned long low) { diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h index fd79c0d35dc4..5e919c11c11f 100644 --- a/arch/s390/include/asm/unistd.h +++ b/arch/s390/include/asm/unistd.h @@ -34,6 +34,7 @@ #define __ARCH_WANT_SYS_SIGPROCMASK # ifdef CONFIG_COMPAT # define __ARCH_WANT_C
[RFC PATCH 3/6] fs: provide generic compat_sys_p{read, write}64() implementations
The compat_sys_{read,write}64() implementations in mips, powerpc, s390, sparc and x86 only differed based on whether the u64 parameter needed padding and on their endianness. Oh, and some defined the parameters as u64 or "unsigned long" which expanded to u64, though it only expected u32 in these parameters. This patch is part of a series which tries to remove in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. Suggested-by: Al Viro Cc: Ralf Baechle Cc: James Hogan Cc: linux-m...@linux-mips.org Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: linux-s...@vger.kernel.org Cc: David S. Miller Cc: sparcli...@vger.kernel.org Cc: Ingo Molnar Cc: Jiri Slaby Cc: x...@kernel.org Cc: Al Viro Signed-off-by: Dominik Brodowski --- arch/mips/include/asm/unistd.h | 1 + arch/mips/kernel/linux32.c | 16 arch/mips/kernel/scall64-o32.S | 4 +- arch/powerpc/include/asm/unistd.h | 1 + arch/powerpc/kernel/sys_ppc32.c| 12 -- arch/s390/include/asm/unistd.h | 1 + arch/s390/kernel/compat_linux.c| 16 arch/s390/kernel/compat_linux.h| 2 - arch/s390/kernel/syscalls/syscall.tbl | 4 +- arch/sparc/include/asm/unistd.h| 1 + arch/sparc/kernel/sys_sparc32.c| 18 - arch/sparc/kernel/systbls.h| 10 - arch/sparc/kernel/systbls_64.S | 2 +- arch/x86/entry/syscalls/syscall_32.tbl | 4 +- arch/x86/ia32/sys_ia32.c | 16 arch/x86/include/asm/sys_ia32.h| 5 --- arch/x86/include/asm/unistd.h | 1 + fs/read_write.c| 74 -- include/linux/compat.h | 15 +++ 19 files changed, 97 insertions(+), 106 deletions(-) diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h index 8aa5b7a19133..3ddc271ad77b 100644 --- a/arch/mips/include/asm/unistd.h +++ b/arch/mips/include/asm/unistd.h @@ -48,6 +48,7 @@ # define __ARCH_WANT_COMPAT_SYS_TIME # define __ARCH_WANT_COMPAT_SYS_FALLOCATE # define __ARCH_WANT_COMPAT_SYS_TRUNCATE64 +# define __ARCH_WANT_COMPAT_SYS_PREADWRITE64 # ifdef __MIPSEL__ #define __ARCH_WANT_LE_COMPAT_SYS # endif diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c index f3aad4ca5560..871cda53a915 100644 --- a/arch/mips/kernel/linux32.c +++ b/arch/mips/kernel/linux32.c @@ -92,22 +92,6 @@ SYSCALL_DEFINE5(32_llseek, unsigned int, fd, unsigned int, offset_high, return sys_llseek(fd, offset_high, offset_low, result, origin); } -/* From the Single Unix Spec: pread & pwrite act like lseek to pos + op + - lseek back to original location. They fail just like lseek does on - non-seekable files. */ - -SYSCALL_DEFINE6(32_pread, unsigned long, fd, char __user *, buf, size_t, count, - unsigned long, unused, unsigned long, a4, unsigned long, a5) -{ - return sys_pread64(fd, buf, count, merge_64(a4, a5)); -} - -SYSCALL_DEFINE6(32_pwrite, unsigned int, fd, const char __user *, buf, - size_t, count, u32, unused, u64, a4, u64, a5) -{ - return sys_pwrite64(fd, buf, count, merge_64(a4, a5)); -} - SYSCALL_DEFINE1(32_personality, unsigned long, personality) { unsigned int p = personality & 0x; diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S index 216450516b44..fbc463b234a1 100644 --- a/arch/mips/kernel/scall64-o32.S +++ b/arch/mips/kernel/scall64-o32.S @@ -416,8 +416,8 @@ EXPORT(sys32_call_table) PTR compat_sys_rt_sigtimedwait PTR compat_sys_rt_sigqueueinfo PTR compat_sys_rt_sigsuspend - PTR sys_32_pread/* 4200 */ - PTR sys_32_pwrite + PTR compat_sys_pread64 /* 4200 */ + PTR compat_sys_pwrite64 PTR sys_chown PTR sys_getcwd PTR sys_capget diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index dca76157f27e..704f2413ac30 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -52,6 +52,7 @@ #define __ARCH_WANT_COMPAT_SYS_SENDFILE #define __ARCH_WANT_COMPAT_SYS_FALLOCATE #define __ARCH_WANT_COMPAT_SYS_TRUNCATE64 +#define __ARCH_WANT_COMPAT_SYS_PREADWRITE64 #endif #define __ARCH_WANT_SYS_FORK #define __ARCH_WANT_SYS_VFORK diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c index dab9eece7731..ec896c8df968 100644 --- a/arch/powerpc/kernel/sys_ppc32.c +++ b/arch/powerpc/kernel/sys_ppc32.c @@ -74,18 +74,6 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t len, * The 32 bit ABI passes long longs in an odd even register pair. */ -compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, compat_size_t count, -u32 reg6, u32 poshi, u32
[RFC PATCH 2/6] fs: provide a generic compat_sys_truncate64() implementation
The compat_sys_truncate64() implementations in mips, powerpc, s390, sparc and x86 only differed based on whether the u64 parameter needed padding and on its endianness. Oh, and some defined the parameters as "unsigned long" which expanded to u64, though it only expected u32 in these parameters. This patch is part of a series which tries to remove in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. Cc: Ralf Baechle Cc: James Hogan Cc: linux-m...@linux-mips.org Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: linux-s...@vger.kernel.org Cc: David S. Miller Cc: sparcli...@vger.kernel.org Cc: Ingo Molnar Cc: Jiri Slaby Cc: x...@kernel.org Cc: Al Viro Signed-off-by: Dominik Brodowski --- arch/mips/include/asm/unistd.h | 2 ++ arch/mips/kernel/linux32.c | 6 -- arch/mips/kernel/scall64-o32.S | 2 +- arch/powerpc/include/asm/unistd.h | 2 ++ arch/powerpc/kernel/sys_ppc32.c| 6 -- arch/s390/include/asm/unistd.h | 1 + arch/s390/kernel/compat_linux.c| 5 - arch/s390/kernel/compat_linux.h| 1 - arch/s390/kernel/syscalls/syscall.tbl | 2 +- arch/sparc/include/asm/unistd.h| 1 + arch/sparc/kernel/sys_sparc32.c| 8 arch/sparc/kernel/systbls.h| 3 --- arch/sparc/kernel/systbls_64.S | 2 +- arch/x86/entry/syscalls/syscall_32.tbl | 2 +- arch/x86/ia32/sys_ia32.c | 7 --- arch/x86/include/asm/sys_ia32.h| 2 -- arch/x86/include/asm/unistd.h | 1 + fs/open.c | 28 +++- include/linux/compat.h | 11 +++ 19 files changed, 49 insertions(+), 43 deletions(-) diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h index bec9c6f55956..8aa5b7a19133 100644 --- a/arch/mips/include/asm/unistd.h +++ b/arch/mips/include/asm/unistd.h @@ -44,8 +44,10 @@ # define __ARCH_WANT_SYS_TIME # endif # ifdef CONFIG_MIPS32_O32 +# define __ARCH_WANT_COMPAT_SYS_WITH_PADDING # define __ARCH_WANT_COMPAT_SYS_TIME # define __ARCH_WANT_COMPAT_SYS_FALLOCATE +# define __ARCH_WANT_COMPAT_SYS_TRUNCATE64 # ifdef __MIPSEL__ #define __ARCH_WANT_LE_COMPAT_SYS # endif diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c index f6e6cb41c01f..f3aad4ca5560 100644 --- a/arch/mips/kernel/linux32.c +++ b/arch/mips/kernel/linux32.c @@ -79,12 +79,6 @@ struct rlimit32 { int rlim_max; }; -SYSCALL_DEFINE4(32_truncate64, const char __user *, path, - unsigned long, __dummy, unsigned long, a2, unsigned long, a3) -{ - return sys_truncate(path, merge_64(a2, a3)); -} - SYSCALL_DEFINE4(32_ftruncate64, unsigned long, fd, unsigned long, __dummy, unsigned long, a2, unsigned long, a3) { diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S index da8babc2c1f5..216450516b44 100644 --- a/arch/mips/kernel/scall64-o32.S +++ b/arch/mips/kernel/scall64-o32.S @@ -427,7 +427,7 @@ EXPORT(sys32_call_table) PTR sys_ni_syscall PTR sys_ni_syscall PTR sys_mips_mmap2 /* 4210 */ - PTR sys_32_truncate64 + PTR compat_sys_truncate64 PTR sys_32_ftruncate64 PTR sys_newstat PTR sys_newlstat diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index 46890687fb1d..dca76157f27e 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -46,10 +46,12 @@ #define __ARCH_WANT_OLD_STAT #endif #ifdef CONFIG_PPC64 +#define __ARCH_WANT_COMPAT_SYS_WITH_PADDING #define __ARCH_WANT_COMPAT_SYS_TIME #define __ARCH_WANT_SYS_NEWFSTATAT #define __ARCH_WANT_COMPAT_SYS_SENDFILE #define __ARCH_WANT_COMPAT_SYS_FALLOCATE +#define __ARCH_WANT_COMPAT_SYS_TRUNCATE64 #endif #define __ARCH_WANT_SYS_FORK #define __ARCH_WANT_SYS_VFORK diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c index 7c6da6273367..dab9eece7731 100644 --- a/arch/powerpc/kernel/sys_ppc32.c +++ b/arch/powerpc/kernel/sys_ppc32.c @@ -91,12 +91,6 @@ compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offhi, u32 offlo, u32 co return sys_readahead(fd, ((loff_t)offhi << 32) | offlo, count); } -asmlinkage int compat_sys_truncate64(const char __user * path, u32 reg4, - unsigned long high, unsigned long low) -{ - return sys_truncate(path, (high << 32) | low); -} - asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long high, unsigned long low) { diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h index 5e919c11c11f..7667a2d0b1e1 100644 --- a/arch/s390/include/asm/unistd.h +++ b/arch/s390/include/asm/unistd.h @@ -35,6 +35,7 @@ # ifdef
[RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation
The compat_sys_readahead() implementations in mips, powerpc, s390, sparc and x86 only differed based on whether the u64 parameter needed padding and on their endianness. Oh, and some defined the parameters as u64 or "unsigned long" which expanded to u64, though it only expected u32 in these parameters. This patch is part of a series which tries to remove in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. Cc: Ralf Baechle Cc: James Hogan Cc: linux-m...@linux-mips.org Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: linux-s...@vger.kernel.org Cc: David S. Miller Cc: sparcli...@vger.kernel.org Cc: Ingo Molnar Cc: Jiri Slaby Cc: x...@kernel.org Cc: Al Viro Signed-off-by: Dominik Brodowski --- arch/mips/include/asm/unistd.h | 1 + arch/mips/kernel/linux32.c | 6 --- arch/mips/kernel/scall64-o32.S | 2 +- arch/powerpc/include/asm/unistd.h | 1 + arch/powerpc/kernel/sys_ppc32.c| 5 --- arch/s390/include/asm/unistd.h | 1 + arch/s390/kernel/compat_linux.c| 5 --- arch/s390/kernel/compat_linux.h| 1 - arch/s390/kernel/syscalls/syscall.tbl | 2 +- arch/sparc/include/asm/unistd.h| 1 + arch/sparc/kernel/sys_sparc32.c| 8 arch/sparc/kernel/systbls.h| 4 -- arch/x86/entry/syscalls/syscall_32.tbl | 2 +- arch/x86/ia32/sys_ia32.c | 6 --- arch/x86/include/asm/sys_ia32.h| 2 - arch/x86/include/asm/unistd.h | 1 + include/linux/compat.h | 10 + mm/readahead.c | 81 -- 18 files changed, 76 insertions(+), 63 deletions(-) diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h index 3ddc271ad77b..f8f9046164ae 100644 --- a/arch/mips/include/asm/unistd.h +++ b/arch/mips/include/asm/unistd.h @@ -49,6 +49,7 @@ # define __ARCH_WANT_COMPAT_SYS_FALLOCATE # define __ARCH_WANT_COMPAT_SYS_TRUNCATE64 # define __ARCH_WANT_COMPAT_SYS_PREADWRITE64 +# define __ARCH_WANT_COMPAT_SYS_READAHEAD # ifdef __MIPSEL__ #define __ARCH_WANT_LE_COMPAT_SYS # endif diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c index 871cda53a915..c40ce08be17d 100644 --- a/arch/mips/kernel/linux32.c +++ b/arch/mips/kernel/linux32.c @@ -106,12 +106,6 @@ SYSCALL_DEFINE1(32_personality, unsigned long, personality) return ret; } -asmlinkage ssize_t sys32_readahead(int fd, u32 pad0, u64 a2, u64 a3, - size_t count) -{ - return sys_readahead(fd, merge_64(a2, a3), count); -} - asmlinkage long sys32_sync_file_range(int fd, int __pad, unsigned long a2, unsigned long a3, unsigned long a4, unsigned long a5, diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S index fbc463b234a1..eb4e66ba025a 100644 --- a/arch/mips/kernel/scall64-o32.S +++ b/arch/mips/kernel/scall64-o32.S @@ -439,7 +439,7 @@ EXPORT(sys32_call_table) PTR compat_sys_fcntl64 /* 4220 */ PTR sys_ni_syscall PTR sys_gettid - PTR sys32_readahead + PTR compat_sys_readahead PTR sys_setxattr PTR sys_lsetxattr /* 4225 */ PTR sys_fsetxattr diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h index 704f2413ac30..870317a35763 100644 --- a/arch/powerpc/include/asm/unistd.h +++ b/arch/powerpc/include/asm/unistd.h @@ -53,6 +53,7 @@ #define __ARCH_WANT_COMPAT_SYS_FALLOCATE #define __ARCH_WANT_COMPAT_SYS_TRUNCATE64 #define __ARCH_WANT_COMPAT_SYS_PREADWRITE64 +#define __ARCH_WANT_COMPAT_SYS_READAHEAD #endif #define __ARCH_WANT_SYS_FORK #define __ARCH_WANT_SYS_VFORK diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c index ec896c8df968..289ae55bb4b5 100644 --- a/arch/powerpc/kernel/sys_ppc32.c +++ b/arch/powerpc/kernel/sys_ppc32.c @@ -74,11 +74,6 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t len, * The 32 bit ABI passes long longs in an odd even register pair. */ -compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offhi, u32 offlo, u32 count) -{ - return sys_readahead(fd, ((loff_t)offhi << 32) | offlo, count); -} - asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long high, unsigned long low) { diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h index 71e6f7d65762..685ad7944850 100644 --- a/arch/s390/include/asm/unistd.h +++ b/arch/s390/include/asm/unistd.h @@ -37,6 +37,7 @@ # define __ARCH_WANT_COMPAT_SYS_FALLOCATE # define __ARCH_WANT_COMPAT_SYS_TRUNCATE64 # define __ARCH_WANT_COMPAT_SYS_PREADWRITE64 +# define __ARCH_WANT_COMPAT_SYS_READAHEAD # endif #define __ARCH_WANT_SYS_FORK #define __ARCH_WANT_SYS_VFORK d
Re: [PATCH v12 00/11] Application Data Integrity feature introduced by SPARC M7
From: Khalid Aziz Date: Wed, 21 Feb 2018 10:15:42 -0700 > V12 changes: > This series is same as v10 and v11 and was simply rebased on 4.16-rc2 > kernel and patch 11 was added to update signal delivery code to use the > new helper functions added by Eric Biederman. Can mm maintainers please > review patches 2, 7, 8 and 9 which are arch independent, and > include/linux/mm.h and mm/ksm.c changes in patch 10 and ack these if > everything looks good? Khalid I've applied this series to sparc-next, thank you! But one thing has to be fixed up. In uapi/asm/auxvec.h you conditionalize the ADI aux vectors on CONFIG_SPARC64. That's not correct, you should never control user facing definitions based upon kernel configuration. Also, both 32-bit and 64-bit applications running on ADI capable machines want to compile against and use this informaiton. So please remove these CPP checks.
[PATCH V5 4/4] powerpc/mm/hash: Don't memset pgd table if not needed
We need to zero-out pgd table only if we share the slab cache with pud/pmd level caches. With the support of 4PB, we don't share the slab cache anymore. Instead of removing the code completely hide it within an #ifdef. We don't need to do this with any other page table level, because they all allocate table of double the size and we take of initializing the first half corrrectly during page table zap. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/pgalloc.h | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h index 4746bc68d446..07f0dbac479f 100644 --- a/arch/powerpc/include/asm/book3s/64/pgalloc.h +++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h @@ -80,8 +80,19 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm) pgd = kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE), pgtable_gfp_flags(mm, GFP_KERNEL)); + /* +* With hugetlb, we don't clear the second half of the page table. +* If we share the same slab cache with the pmd or pud level table, +* we need to make sure we zero out the full table on alloc. +* With 4K we don't store slot in the second half. Hence we don't +* need to do this for 4k. +*/ +#if (H_PGD_INDEX_SIZE == H_PUD_CACHE_INDEX) || \ + (H_PGD_INDEX_SIZE == H_PMD_CACHE_INDEX) +#if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_PPC_64K_PAGES) memset(pgd, 0, PGD_TABLE_SIZE); - +#endif +#endif return pgd; } -- 2.14.3
[PATCH V5 3/4] powerpc/mm/hash64: Increase the VA range
This patch increase the max virtual address value to 4PB. With 4K page size config we continue to limit ourself to 64TB. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/hash-64k.h | 2 +- arch/powerpc/include/asm/processor.h | 9 - arch/powerpc/mm/init_64.c | 6 -- arch/powerpc/mm/pgtable_64.c | 5 - 4 files changed, 9 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h index 8d0cbbb31023..55d5cd64cdb3 100644 --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h @@ -4,7 +4,7 @@ #define H_PTE_INDEX_SIZE 8 #define H_PMD_INDEX_SIZE 10 -#define H_PUD_INDEX_SIZE 7 +#define H_PUD_INDEX_SIZE 10 #define H_PGD_INDEX_SIZE 8 /* diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index 75b084486ce1..bb9cb25ffb20 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -109,6 +109,13 @@ void release_thread(struct task_struct *); #define TASK_SIZE_64TB (0x4000UL) #define TASK_SIZE_128TB (0x8000UL) #define TASK_SIZE_512TB (0x0002UL) +#define TASK_SIZE_1PB (0x0004UL) +#define TASK_SIZE_2PB (0x0008UL) +/* + * With 52 bits in the address we can support + * upto 4PB of range. + */ +#define TASK_SIZE_4PB (0x0010UL) /* * For now 512TB is only supported with book3s and 64K linux page size. @@ -117,7 +124,7 @@ void release_thread(struct task_struct *); /* * Max value currently used: */ -#define TASK_SIZE_USER64 TASK_SIZE_512TB +#define TASK_SIZE_USER64 TASK_SIZE_4PB #define DEFAULT_MAP_WINDOW_USER64 TASK_SIZE_128TB #define TASK_CONTEXT_SIZE TASK_SIZE_512TB #else diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c index fdb424a29f03..63470b06c502 100644 --- a/arch/powerpc/mm/init_64.c +++ b/arch/powerpc/mm/init_64.c @@ -68,12 +68,6 @@ #include "mmu_decl.h" -#ifdef CONFIG_PPC_BOOK3S_64 -#if H_PGTABLE_RANGE > USER_VSID_RANGE -#warning Limited user VSID range means pagetable space is wasted -#endif -#endif /* CONFIG_PPC_BOOK3S_64 */ - phys_addr_t memstart_addr = ~0; EXPORT_SYMBOL_GPL(memstart_addr); phys_addr_t kernstart_addr; diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c index 28c980eb4422..16636bdf3331 100644 --- a/arch/powerpc/mm/pgtable_64.c +++ b/arch/powerpc/mm/pgtable_64.c @@ -57,11 +57,6 @@ #include "mmu_decl.h" -#ifdef CONFIG_PPC_BOOK3S_64 -#if TASK_SIZE_USER64 > (1UL << (ESID_BITS + SID_SHIFT)) -#error TASK_SIZE_USER64 exceeds user VSID range -#endif -#endif #ifdef CONFIG_PPC_BOOK3S_64 /* -- 2.14.3
[PATCH V5 2/4] powerpc/mm: Add support for handling > 512TB address in SLB miss
For address above 512TB we allocate additional mmu context. To make it all easy, address above 512TB is handled with IR/DR=1 and with stack frame setup. The mmu_context_t is also updated to track the new extended_ids. To support upto 4PB we need a total 8 contexts. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/book3s/64/hash-4k.h | 6 ++ arch/powerpc/include/asm/book3s/64/hash-64k.h | 6 ++ arch/powerpc/include/asm/book3s/64/mmu.h | 31 +++- arch/powerpc/include/asm/mmu_context.h| 38 + arch/powerpc/include/asm/processor.h | 6 ++ arch/powerpc/kernel/exceptions-64s.S | 11 ++- arch/powerpc/kernel/traps.c | 12 --- arch/powerpc/mm/copro_fault.c | 2 +- arch/powerpc/mm/hash_utils_64.c | 4 +- arch/powerpc/mm/mmu_context_book3s64.c| 15 +++- arch/powerpc/mm/pgtable-hash64.c | 2 +- arch/powerpc/mm/slb.c | 108 ++ arch/powerpc/mm/slb_low.S | 8 +- arch/powerpc/mm/slice.c | 15 +++- arch/powerpc/mm/tlb_hash64.c | 2 +- 15 files changed, 239 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h index 67c5475311ee..1a35eb944481 100644 --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h @@ -11,6 +11,12 @@ #define H_PUD_INDEX_SIZE 9 #define H_PGD_INDEX_SIZE 9 +/* + * Each context is 512TB. But on 4k we restrict our max TASK size to 64TB + * Hence also limit max EA bits to 64TB. + */ +#define MAX_EA_BITS_PER_CONTEXT46 + #ifndef __ASSEMBLY__ #define H_PTE_TABLE_SIZE (sizeof(pte_t) << H_PTE_INDEX_SIZE) #define H_PMD_TABLE_SIZE (sizeof(pmd_t) << H_PMD_INDEX_SIZE) diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h index 3bcf269f8f55..8d0cbbb31023 100644 --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h @@ -7,6 +7,12 @@ #define H_PUD_INDEX_SIZE 7 #define H_PGD_INDEX_SIZE 8 +/* + * Each context is 512TB size. SLB miss for first context/default context + * is handled in the hotpath. + */ +#define MAX_EA_BITS_PER_CONTEXT49 + /* * 64k aligned address free up few of the lower bits of RPN for us * We steal that here. For more deatils look at pte_pfn/pfn_pte() diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h index 78579305..33ba81d69bd1 100644 --- a/arch/powerpc/include/asm/book3s/64/mmu.h +++ b/arch/powerpc/include/asm/book3s/64/mmu.h @@ -91,7 +91,18 @@ struct slice_mask { }; typedef struct { - mm_context_id_t id; + union { + /* +* We use id as the PIDR content for radix. On hash we can use +* more than one id. The extended ids are used when we start +* having address above 512TB. We allocate one extended id +* for each 512TB. The new id is then used with the 49 bit +* EA to build a new VA. We always use ESID_BITS_1T_MASK bits +* from EA and new context ids to build the new VAs. +*/ + mm_context_id_t id; + mm_context_id_t extended_id[TASK_SIZE_USER64/TASK_CONTEXT_SIZE]; + }; u16 user_psize; /* page size index */ /* Number of bits in the mm_cpumask */ @@ -193,5 +204,23 @@ extern void radix_init_pseries(void); static inline void radix_init_pseries(void) { }; #endif +static inline int get_ea_context(mm_context_t *ctx, unsigned long ea) +{ + int index = ea >> MAX_EA_BITS_PER_CONTEXT; + + if (likely(index < ARRAY_SIZE(ctx->extended_id))) + return ctx->extended_id[index]; + /* should never happen */ + BUG(); +} + +static inline unsigned long get_user_vsid(mm_context_t *ctx, + unsigned long ea, int ssize) +{ + unsigned long context = get_ea_context(ctx, ea); + + return get_vsid(context, ea, ssize); +} + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */ diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 051b3d63afe3..61d96b4a03b9 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -60,12 +60,50 @@ extern int hash__alloc_context_id(void); extern void hash__reserve_context_id(int id); extern void __destroy_context(int context_id); static inline void mmu_context_init(void) { } + +static inline int alloc_extended_context(struct mm_struct *mm, +unsigned long ea) +{ + int context_id; + + int index = ea >> MAX_EA_BITS_PER_CONTEXT; + + context_id = hash__alloc_conte
[PATCH V5 1/4] powerpc/mm/slice: Consolidate the return path for the function.
In the follow up patch, on finding a free area we will need to do allocated extra contexts as needed. Consolidating the return path for slice_get_unmapped_area makes it easy. Split into a separate patch to make review easy. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/mm/slice.c | 36 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c index 15a857772617..f64bfe8bae57 100644 --- a/arch/powerpc/mm/slice.c +++ b/arch/powerpc/mm/slice.c @@ -585,7 +585,8 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len, */ if (slice_check_range_fits(mm, &good_mask, addr, len)) { slice_dbg(" fits good !\n"); - return addr; + newaddr = addr; + goto return_addr; } } else { /* Now let's see if we can find something in the existing @@ -598,7 +599,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len, * we thus return directly */ slice_dbg(" found area at 0x%lx\n", newaddr); - return newaddr; + goto return_addr; } } /* @@ -612,6 +613,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len, if (addr || fixed) { if (slice_check_range_fits(mm, &potential_mask, addr, len)) { slice_dbg(" fits potential !\n"); + newaddr = addr; goto convert; } } @@ -626,34 +628,34 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len, * anywhere in the good area. */ if (addr) { - addr = slice_find_area(mm, len, &good_mask, - psize, topdown, high_limit); - if (addr != -ENOMEM) { - slice_dbg(" found area at 0x%lx\n", addr); - return addr; + newaddr = slice_find_area(mm, len, &good_mask, + psize, topdown, high_limit); + if (newaddr != -ENOMEM) { + slice_dbg(" found area at 0x%lx\n", newaddr); + goto return_addr; } } /* Now let's see if we can find something in the existing slices * for that size plus free slices */ - addr = slice_find_area(mm, len, &potential_mask, - psize, topdown, high_limit); + newaddr = slice_find_area(mm, len, &potential_mask, + psize, topdown, high_limit); #ifdef CONFIG_PPC_64K_PAGES - if (addr == -ENOMEM && psize == MMU_PAGE_64K) { + if (newaddr == -ENOMEM && psize == MMU_PAGE_64K) { /* retry the search with 4k-page slices included */ slice_or_mask(&potential_mask, &potential_mask, compat_maskp, high_slices); - addr = slice_find_area(mm, len, &potential_mask, - psize, topdown, high_limit); + newaddr = slice_find_area(mm, len, &potential_mask, + psize, topdown, high_limit); } #endif - if (addr == -ENOMEM) + if (newaddr == -ENOMEM) return -ENOMEM; - slice_range_to_mask(addr, len, &potential_mask, high_slices); - slice_dbg(" found potential area at 0x%lx\n", addr); + slice_range_to_mask(newaddr, len, &potential_mask, high_slices); + slice_dbg(" found potential area at 0x%lx\n", newaddr); slice_print_mask(" mask", &potential_mask); convert: @@ -667,7 +669,9 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len, if (psize > MMU_PAGE_BASE) on_each_cpu(slice_flush_segments, mm, 1); } - return addr; + +return_addr: + return newaddr; } EXPORT_SYMBOL_GPL(slice_get_unmapped_area); -- 2.14.3
[PATCH V5 0/4] Add support for 4PB virtual address space on hash
This patch series extended the max virtual address space value from 512TB to 4PB with 64K page size. We do that by allocating one vsid context for each 512TB range. More details of that is explained in patch 3. Changes from V4: * Move context allocation to mmap time instead of SLB miss time * Address review comments Changes from V3: * move extended_id to be a union with mm_context_t id. This reduce some array index complexity. * Add addr_limit check when handling slb miss for extended context Changes from V2: * Rebased on top of slice_mask series from Nick Piggin * Fixed segfault when mmap with 512TB hint address Aneesh Kumar K.V (4): powerpc/mm/slice: Consolidate the return path for the function. powerpc/mm: Add support for handling > 512TB address in SLB miss powerpc/mm/hash64: Increase the VA range powerpc/mm/hash: Don't memset pgd table if not needed arch/powerpc/include/asm/book3s/64/hash-4k.h | 6 ++ arch/powerpc/include/asm/book3s/64/hash-64k.h | 8 +- arch/powerpc/include/asm/book3s/64/mmu.h | 31 +++- arch/powerpc/include/asm/book3s/64/pgalloc.h | 13 +++- arch/powerpc/include/asm/mmu_context.h| 38 + arch/powerpc/include/asm/processor.h | 15 +++- arch/powerpc/kernel/exceptions-64s.S | 11 ++- arch/powerpc/kernel/traps.c | 12 --- arch/powerpc/mm/copro_fault.c | 2 +- arch/powerpc/mm/hash_utils_64.c | 4 +- arch/powerpc/mm/init_64.c | 6 -- arch/powerpc/mm/mmu_context_book3s64.c| 15 +++- arch/powerpc/mm/pgtable-hash64.c | 2 +- arch/powerpc/mm/pgtable_64.c | 5 -- arch/powerpc/mm/slb.c | 108 ++ arch/powerpc/mm/slb_low.S | 8 +- arch/powerpc/mm/slice.c | 49 arch/powerpc/mm/tlb_hash64.c | 2 +- 18 files changed, 279 insertions(+), 56 deletions(-) -- 2.14.3