Re: [PATCH 3/5] powerpc/mm/32: Use page_is_ram to check for RAM

2018-03-18 Thread Michael Ellerman
Jonathan Neuschäfer  writes:

> Signed-off-by: Jonathan Neuschäfer 
> ---
>  arch/powerpc/mm/pgtable_32.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index d35d9ad3c1cd..d54e1a9c1c99 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -145,9 +145,8 @@ __ioremap_caller(phys_addr_t addr, unsigned long size, 
> unsigned long flags,
>  #ifndef CONFIG_CRASH_DUMP
>   /*
>* Don't allow anybody to remap normal RAM that we're using.
> -  * mem_init() sets high_memory so only do the check after that.
>*/
> - if (slab_is_available() && (p < virt_to_phys(high_memory)) &&
> + if (page_is_ram(__phys_to_pfn(p)) &&
>   !(__allow_ioremap_reserved && memblock_is_region_reserved(p, 
> size))) {
>   printk("__ioremap(): phys addr 0x%llx is RAM lr %ps\n",
>  (unsigned long long)p, __builtin_return_address(0));


This is killing my p5020ds (Freescale e5500) unfortunately:

  smp: Bringing up secondary CPUs ...
  __ioremap(): phys addr 0x7fef5000 is RAM lr smp_85xx_kick_cpu
  Unable to handle kernel paging request for data at address 0x
  Faulting instruction address: 0xc0029080
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE SMP NR_CPUS=24 CoreNet Generic
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.16.0-rc4-gcc-4.6.3-00076-g85319478bdb4 #86
  NIP:  c0029080 LR: c0029020 CTR: 0001
  REGS: e804bd40 TRAP: 0300   Not tainted  
(4.16.0-rc4-gcc-4.6.3-00076-g85319478bdb4)
  MSR:  00021002   CR: 24ad4e22  XER: 
  DEAR:  ESR:  
  GPR00: c0029020 e804bdf0 e805  00021002 004d  
c0aaaeed 
  GPR08:    2d57d000 22adbe84  c0002630 
 
  GPR16:        
c0a8b4a4 
  GPR24: 0001   00029002  0001 0001 
0001 
  NIP [c0029080] smp_85xx_kick_cpu+0x100/0x2c0
  LR [c0029020] smp_85xx_kick_cpu+0xa0/0x2c0
  Call Trace:
  [e804bdf0] [c0029020] smp_85xx_kick_cpu+0xa0/0x2c0 (unreliable)
  [e804be30] [c0011194] __cpu_up+0xb4/0x1c0
  [e804be60] [c002f16c] bringup_cpu+0x2c/0xf0
  [e804be80] [c002ec9c] cpuhp_invoke_callback+0x12c/0x310
  [e804beb0] [c002fdd8] cpu_up+0x108/0x230
  [e804bee0] [c09f7438] smp_init+0x84/0x104
  [e804bf00] [c09e9acc] kernel_init_freeable+0xc4/0x228
  [e804bf30] [c0002644] kernel_init+0x14/0x110
  [e804bf40] [c000f3b0] ret_from_kernel_thread+0x5c/0x64
  Instruction dump:
  57990032 3b1c0057 7f19c050 7f3acb78 5718d1be 2e18 41920024 7f29cb78 
  7f0903a6 6000 6000 6000 <7c0048ac> 39290040 4200fff8 7c0004ac 
  random: get_random_bytes called from init_oops_id+0x5c/0x70 with crng_init=0
  ---[ end trace c3807aa91cf16cd8 ]---


The obvious fix of changing the test in smp_85xx_start_cpu() didn't
work, I get a different oops:

  Unable to handle kernel paging request for data at address 0x3fef5140
  Faulting instruction address: 0xc00290a0
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE SMP NR_CPUS=24 CoreNet Generic
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
4.16.0-rc4-gcc-4.6.3-00076-g85319478bdb4-dirty #90
  NIP:  c00290a0 LR: c0029040 CTR: 0001
  REGS: e804bd50 TRAP: 0300   Not tainted  
(4.16.0-rc4-gcc-4.6.3-00076-g85319478bdb4-dirty)
  MSR:  00021002   CR: 24a24e22  XER: 
  DEAR: 3fef5140 ESR:  
  GPR00: c0029040 e804be00 e805 0023 00021002 004e  
c0aaaed3 
  GPR08:  3fef5140  2d57d000 22a2be84  c0002630 
 
  GPR16:        
c0a8b4a4 
  GPR24: 0004 0001 3fef5140 3fef5140 00029002 3fef5140 0001 
0001 
  NIP [c00290a0] smp_85xx_kick_cpu+0x120/0x2e0
  LR [c0029040] smp_85xx_kick_cpu+0xc0/0x2e0
  Call Trace:
  [e804be00] [c0029040] smp_85xx_kick_cpu+0xc0/0x2e0 (unreliable)
  [e804be30] [c0011194] __cpu_up+0xb4/0x1c0
  [e804be60] [c002f18c] bringup_cpu+0x2c/0xf0
  [e804be80] [c002ecbc] cpuhp_invoke_callback+0x12c/0x310
  [e804beb0] [c002fdf8] cpu_up+0x108/0x230
  [e804bee0] [c09f7438] smp_init+0x84/0x104
  [e804bf00] [c09e9acc] kernel_init_freeable+0xc4/0x228
  [e804bf30] [c0002644] kernel_init+0x14/0x110
  [e804bf40] [c000f3b0] ret_from_kernel_thread+0x5c/0x64
  Instruction dump:
  7c0903a6 4e800421 57ba0032 3b3d0057 7f3ac850 7f5bd378 5739d1be 2e19 
  4192001c 7f49d378 7f2903a6 6000 <7c0048ac> 39290040 4200fff8 7c0004ac 
  random: get_random_bytes called from init_oops_id+0x5c/0x70 with crng_init=0
  ---[ end trace 950df40ee04f2d5e ]---


So that will require a bit more debugging.

cheers


[PATCH 5/5] powerpc/zImage: Also check for stdout-path

2018-03-18 Thread Oliver O'Halloran
The /chosen/linux,stdout-path is "deprecated" in favour of
/chosen/stdout-path so we should be checking for both.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/boot/serial.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c
index a7f9ac9a27b5..9ef1ed35ae62 100644
--- a/arch/powerpc/boot/serial.c
+++ b/arch/powerpc/boot/serial.c
@@ -92,7 +92,8 @@ static void *serial_get_stdout_devp(void)
if (devp == NULL)
goto err_out;
 
-   if (getprop(devp, "linux,stdout-path", path, MAX_PATH_LEN) > 0) {
+   if (getprop(devp, "linux,stdout-path", path, MAX_PATH_LEN) > 0 ||
+   getprop(devp, "stdout-path", path, MAX_PATH_LEN) > 0) {
devp = finddevice(path);
if (devp == NULL)
goto err_out;
-- 
2.9.5



[PATCH 2/5] powerpc/zImage: Don't build zlib object files

2018-03-18 Thread Oliver O'Halloran
The required source files are directly #include`ed into
decompress.c when using gzip compression. Building the
separate .o files for the zlib sources isn't required
and can cause linker errors due to the symbols being
defined in decompress.o and the zlib .o files.

Signed-off-by: Oliver O'Halloran 
---
You can stop throwing up now
---
 arch/powerpc/boot/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 26d5d2a5b8e9..f66b9dba99fd 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -108,7 +108,7 @@ $(addprefix $(obj)/,$(libfdt) libfdt-wrapper.o simpleboot.o 
epapr.o opal.o \
 src-wlib-y := string.S crt0.S stdio.c decompress.c main.c \
$(libfdt) libfdt-wrapper.c \
ns16550.c serial.c simple_alloc.c div64.S util.S \
-   elf_util.c $(zlib-y) devtree.c stdlib.c \
+   elf_util.c devtree.c stdlib.c \
oflib.c ofconsole.c cuboot.c
 
 src-wlib-$(CONFIG_PPC_MPC52XX) += mpc52xx-psc.c
-- 
2.9.5



[PATCH 4/5] powerpc/zImage: Rework serial driver probing

2018-03-18 Thread Oliver O'Halloran
Rather than checking the compatible string in serial_driver_init()
we call into the driver's init function and wait for a driver to
inidicate it bound to the device by returning zero.

The pointers to each driver probe functions are stored in the
".serialdrv" section of the zImage, similar to how initcalls and
whatnot are handled inside Linux. This structure allows us to
conditionally compile specific driver files based on the KConfig
settings. This is needed because we don't have access to the
KConfig #defines in the zImage source files (it's a giant #include
headache) so conditional compilation is the only way to eliminate
unnecessary code.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/boot/cpm-serial.c  |  1 +
 arch/powerpc/boot/mpc52xx-psc.c |  1 +
 arch/powerpc/boot/mpsc.c|  1 +
 arch/powerpc/boot/ns16550.c |  2 ++
 arch/powerpc/boot/opal.c|  1 +
 arch/powerpc/boot/ops.h | 16 +--
 arch/powerpc/boot/serial.c  | 41 +++--
 arch/powerpc/boot/uartlite.c|  1 +
 arch/powerpc/boot/wrapper   |  2 +-
 arch/powerpc/boot/zImage.coff.lds.S |  4 
 arch/powerpc/boot/zImage.lds.S  |  8 
 11 files changed, 42 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/boot/cpm-serial.c b/arch/powerpc/boot/cpm-serial.c
index f6b1cf23946e..5ebf3dfd810a 100644
--- a/arch/powerpc/boot/cpm-serial.c
+++ b/arch/powerpc/boot/cpm-serial.c
@@ -295,3 +295,4 @@ int cpm_console_init(void *devp, struct serial_console_data 
*scdp)
 
return 0;
 }
+SERIAL_DRIVER(cpm_console_init);
diff --git a/arch/powerpc/boot/mpc52xx-psc.c b/arch/powerpc/boot/mpc52xx-psc.c
index 75470936e661..c09e82eaf006 100644
--- a/arch/powerpc/boot/mpc52xx-psc.c
+++ b/arch/powerpc/boot/mpc52xx-psc.c
@@ -66,3 +66,4 @@ int mpc5200_psc_console_init(void *devp, struct 
serial_console_data *scdp)
 
return 0;
 }
+SERIAL_DRIVER(mpc5200_psc_console_init);
diff --git a/arch/powerpc/boot/mpsc.c b/arch/powerpc/boot/mpsc.c
index 59e23e886440..2d0a72522b3c 100644
--- a/arch/powerpc/boot/mpsc.c
+++ b/arch/powerpc/boot/mpsc.c
@@ -170,3 +170,4 @@ int mpsc_console_init(void *devp, struct 
serial_console_data *scdp)
 err_out:
return -1;
 }
+SERIAL_DRIVER(mpsc_console_init);
diff --git a/arch/powerpc/boot/ns16550.c b/arch/powerpc/boot/ns16550.c
index 0e7bd5284255..49e8c299b829 100644
--- a/arch/powerpc/boot/ns16550.c
+++ b/arch/powerpc/boot/ns16550.c
@@ -81,3 +81,5 @@ int ns16550_console_init(void *devp, struct 
serial_console_data *scdp)
 
return 0;
 }
+
+SERIAL_DRIVER(ns16550_console_init);
diff --git a/arch/powerpc/boot/opal.c b/arch/powerpc/boot/opal.c
index 0e92537536b9..dc345ff63422 100644
--- a/arch/powerpc/boot/opal.c
+++ b/arch/powerpc/boot/opal.c
@@ -102,3 +102,4 @@ int opal_console_init(void *devp, struct 
serial_console_data *scdp)
 
return 0;
 }
+SERIAL_DRIVER(opal_console_init);
diff --git a/arch/powerpc/boot/ops.h b/arch/powerpc/boot/ops.h
index fad1862f4b2d..06151ecb2f45 100644
--- a/arch/powerpc/boot/ops.h
+++ b/arch/powerpc/boot/ops.h
@@ -84,13 +84,17 @@ extern struct loader_info loader_info;
 
 void start(void);
 void fdt_init(void *blob);
+
+struct driver_ptr {
+   int (*ptr)(void *node, struct serial_console_data *scdp);
+};
+
+#define SERIAL_DRIVER(init_function) static const struct driver_ptr \
+   __attribute__((__section__(".serialdrv"))) __attribute__((used)) \
+   init_function##_ptr = { .ptr = &init_function, }
+
 int serial_console_init(void);
-int ns16550_console_init(void *devp, struct serial_console_data *scdp);
-int mpsc_console_init(void *devp, struct serial_console_data *scdp);
-int cpm_console_init(void *devp, struct serial_console_data *scdp);
-int mpc5200_psc_console_init(void *devp, struct serial_console_data *scdp);
-int uartlite_console_init(void *devp, struct serial_console_data *scdp);
-int opal_console_init(void *devp, struct serial_console_data *scdp);
+
 void *simple_alloc_init(char *base, unsigned long heap_size,
unsigned long granularity, unsigned long max_allocs);
 extern void flush_cache(void *, unsigned long);
diff --git a/arch/powerpc/boot/serial.c b/arch/powerpc/boot/serial.c
index 88955095ec07..a7f9ac9a27b5 100644
--- a/arch/powerpc/boot/serial.c
+++ b/arch/powerpc/boot/serial.c
@@ -105,11 +105,15 @@ static void *serial_get_stdout_devp(void)
return NULL;
 }
 
+extern unsigned long _serial_drv_start;
+extern unsigned long _serial_drv_end;
+
 static struct serial_console_data serial_cd;
 
 /* Node's "compatible" property determines which serial driver to use */
 int serial_console_init(void)
 {
+   struct driver_ptr *start, *end, *probe;
void *devp;
int rc = -1;
 
@@ -117,35 +121,14 @@ int serial_console_init(void)
if (devp == NULL)
goto err_out;
 
-   if (dt_is_compatible(devp, "ns16550") ||
-   dt_is_compatible(devp, "pnpPNP,501"))
-   rc =

[PATCH 3/5] powerpc/zImage: Check compatible at driver init

2018-03-18 Thread Oliver O'Halloran
Have each driver's init function check the compatible string
of the node given by the stdout path. This gives the drivers
a more traditional probe-also-inits structure.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/boot/cpm-serial.c  | 2 ++
 arch/powerpc/boot/mpc52xx-psc.c | 3 +++
 arch/powerpc/boot/mpsc.c| 3 +++
 arch/powerpc/boot/ns16550.c | 4 
 arch/powerpc/boot/opal.c| 3 +++
 arch/powerpc/boot/uartlite.c| 5 +
 6 files changed, 20 insertions(+)

diff --git a/arch/powerpc/boot/cpm-serial.c b/arch/powerpc/boot/cpm-serial.c
index dfb56829cace..f6b1cf23946e 100644
--- a/arch/powerpc/boot/cpm-serial.c
+++ b/arch/powerpc/boot/cpm-serial.c
@@ -212,6 +212,8 @@ int cpm_console_init(void *devp, struct serial_console_data 
*scdp)
} else if (dt_is_compatible(devp, "fsl,cpm2-smc-uart")) {
is_cpm2 = 1;
is_smc = 1;
+   } else if (!dt_is_compatible(devp, "fsl,cpm1-scc-uart")) {
+   return 1; /* not compatible */
}
 
if (is_smc) {
diff --git a/arch/powerpc/boot/mpc52xx-psc.c b/arch/powerpc/boot/mpc52xx-psc.c
index c2c08633ee35..75470936e661 100644
--- a/arch/powerpc/boot/mpc52xx-psc.c
+++ b/arch/powerpc/boot/mpc52xx-psc.c
@@ -52,6 +52,9 @@ static unsigned char psc_getc(void)
 
 int mpc5200_psc_console_init(void *devp, struct serial_console_data *scdp)
 {
+   if (!dt_is_compatible(devp, "fsl,mpc5200-psc-uart"))
+   return 1;
+
/* Get the base address of the psc registers */
if (dt_get_virtual_reg(devp, &psc, 1) < 1)
return -1;
diff --git a/arch/powerpc/boot/mpsc.c b/arch/powerpc/boot/mpsc.c
index 425ad88cce8d..59e23e886440 100644
--- a/arch/powerpc/boot/mpsc.c
+++ b/arch/powerpc/boot/mpsc.c
@@ -128,6 +128,9 @@ int mpsc_console_init(void *devp, struct 
serial_console_data *scdp)
int n, reg_set;
volatile char *sdma_base;
 
+   if (!dt_is_compatible(devp, "marvell,mv64360-mpsc"))
+   return 1;
+
n = getprop(devp, "virtual-reg", &v, sizeof(v));
if (n != sizeof(v))
goto err_out;
diff --git a/arch/powerpc/boot/ns16550.c b/arch/powerpc/boot/ns16550.c
index b0da4466d419..0e7bd5284255 100644
--- a/arch/powerpc/boot/ns16550.c
+++ b/arch/powerpc/boot/ns16550.c
@@ -58,6 +58,10 @@ int ns16550_console_init(void *devp, struct 
serial_console_data *scdp)
int n;
u32 reg_offset;
 
+   if (!dt_is_compatible(devp, "ns16550") &&
+   !dt_is_compatible(devp, "pnpPNP,501"))
+   return 1;
+
if (dt_get_virtual_reg(devp, (void **)®_base, 1) < 1)
return -1;
 
diff --git a/arch/powerpc/boot/opal.c b/arch/powerpc/boot/opal.c
index dfb199ef5b94..0e92537536b9 100644
--- a/arch/powerpc/boot/opal.c
+++ b/arch/powerpc/boot/opal.c
@@ -83,6 +83,9 @@ static void opal_init(void)
 
 int opal_console_init(void *devp, struct serial_console_data *scdp)
 {
+   if (!dt_is_compatible(devp, "ibm,opal-console-raw"))
+   return 1;
+
opal_init();
 
if (devp) {
diff --git a/arch/powerpc/boot/uartlite.c b/arch/powerpc/boot/uartlite.c
index 46bed69b4169..fc66a5fcea66 100644
--- a/arch/powerpc/boot/uartlite.c
+++ b/arch/powerpc/boot/uartlite.c
@@ -62,6 +62,11 @@ int uartlite_console_init(void *devp, struct 
serial_console_data *scdp)
int n;
unsigned long reg_phys;
 
+
+   if (!dt_is_compatible(devp, "xlnx,opb-uartlite-1.00.b") &&
+   !dt_is_compatible(devp, "xlnx,xps-uartlite-1.00.a"))
+   return 1;
+
n = getprop(devp, "virtual-reg", ®_base, sizeof(reg_base));
if (n != sizeof(reg_base)) {
if (!dt_xlate_reg(devp, 0, ®_phys, NULL))
-- 
2.9.5



[PATCH 1/5] powerpc/zImage: Remove #ifdef in opal.c

2018-03-18 Thread Oliver O'Halloran
This file is only ever compiled if CONFIG_PPC64_BOOT_WRAPPER is set
so this check is unnecessary.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/boot/opal.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/powerpc/boot/opal.c b/arch/powerpc/boot/opal.c
index 0272570d02de..dfb199ef5b94 100644
--- a/arch/powerpc/boot/opal.c
+++ b/arch/powerpc/boot/opal.c
@@ -13,8 +13,6 @@
 #include 
 #include "../include/asm/opal-api.h"
 
-#ifdef CONFIG_PPC64_BOOT_WRAPPER
-
 /* Global OPAL struct used by opal-call.S */
 struct opal {
u64 base;
@@ -101,9 +99,3 @@ int opal_console_init(void *devp, struct serial_console_data 
*scdp)
 
return 0;
 }
-#else
-int opal_console_init(void *devp, struct serial_console_data *scdp)
-{
-   return -1;
-}
-#endif /* __powerpc64__ */
-- 
2.9.5



Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-18 Thread Al Viro
On Sun, Mar 18, 2018 at 06:18:48PM +, Al Viro wrote:

> I'd done some digging in that area, will find the notes and post.

OK, found:

We have two ABIs in the game - syscall and normal C.  The latter
(for all targets we support) can be described in the following
terms:
* there is a series of word-sized objects used to pass
arguments, some in registers, some on stack.  Arguments are
mapped on that sequence.  Anything up to word size simply takes
the next available word, so on 64bit it's all there is - nth
argument goes into the nth object, types simply do not matter.
On 32bit it's not that simple - there 64bit arguments occupy
two objects.  They are still picked from the same sequence;
on little-endian the lower half goes first, on big-endian -
the higher one.  For some architectures that's all there is to it.
However, on quite a few there's an extra complication - not every
pair can be used for 64bit value.  Rules for those are arch-dependent.
One variety is "pairs should be aligned", i.e. "if we'd consumed
an odd number of slots, add a dummy one before eating a pair".
Another is "don't let a pair span the registers/stack boundary";
surprisingly enough, that's not universal.

The syscall ABI can considerably differ from C one.  First of all,
we really do *not* want to pass anything on stack - it's a major
headache at syscall entry.  See mips/o32 for the scale of that headache.
Not fun.  OTOH, the registers that can be used are a limited resource.
i386 can't pass more than 6 words and that has served as an upper
limit.  If we encode the argument sizes (W - word, D - 64bit) we
have the following variants among the syscalls:
* no arguments at all (SYSCALL_DEFINE0)
* W (SYSCALL_DEFINE1)
* WW (SYSCALL_DEFINE2)
* WWW (SYSCALL_DEFINE3)
*  (SYSCALL_DEFINE4)
* W (SYSCALL_DEFINE5)
* WW (SYSCALL_DEFINE6)
* WD (SYSCALL_DEFINE2, truncate64, ftruncate64)
* DWW (SYSCALL_DEFINE3, lookup_dcookie)
* WDW (SYSCALL_DEFINE3, readahead)
* WWWD (SYSCALL_DEFINE4, pread64, pwrite64)
* WWDD (SYSCALL_DEFINE4, fallocate, sync_file_range2)
* WDDW (SYSCALL_DEFINE4, sync_file_range, fadvise64_64)
* WDWW (SYSCALL_DEFINE4, fadvise64)
* WWDWW (SYSCALL_DEFINE5, fanotify_mark)

The list for each long long-passing variant might be incomplete, but
they really are rare.  And a source of headache; not just for biarch
architectures - e.g. c6x and metag are not biarch and these syscalls
are exactly what stinks them up.

One thing we *really* don't want is syscall-dependent mapping from
syscall ABI registers to C ABI sequence.  Inside a syscall (or in
per-syscall glue) - sure, we can do it (if not happily).  In the
syscall dispatcher - fuck no, too much PITA.  mips/o32 used to be
a horrible example of why not, then they went for "copy from userland
stack whether we need it or not"...

For simple syscalls (first 7 classes in the above, each argument fits
into word) we simply map registers to the first 6 slots of the sequence
and we are done.  It's bloody tempting to use the same mapping for
the rest - use the same code that calls simple syscalls and have it
call sys_readahead() instead of sys_mkdir(), etc.  For something like
x86 or sparc that works perfectly - these guys have no padding for 64bit
arguments, so we are good (provided that userland sets those registers
sanely, that is).

OTOH, consider arm.  There we have
* r0, r1, r2, r3, [sp,#8], [sp,#12], [sp,#16]... is the sequence
of objects used to pass arguments
* 32bit and less - pick the next available slot
* 64bit - skip a slot if we'd already taken an odd number, then use
the next two slots for lower and upper 32 bits of the argument.

So our classes take
simple n-argument:  0 to 6 slots
WD  4 slots
DWW 4 slots
WDW 5 slots
WWDD6 slots
WDWW5 slots
WWWD6 slots
WWDWW   6 slots
WDDW7 slots (!)  Also , , !@#!@#!@#!# and other nice
and well-deserved comments from arch maintainers, some of them even printable:
/* It would be nice if people remember that not all the world's an i386
   when they introduce new system calls */
SYSCALL_DEFINE4(sync_file_range2, int, fd, unsigned int, flags,
 loff_t, offset, loff_t, nbytes)

No idea why that hadn't been done to fadvise64_64() - that got
/*
 * Since loff_t is a 64 bit type we avoid a lot of ABI hassle
 * with a different argument ordering.
 */
asmlinkage long sys_arm_fadvise64_64(int fd, int advice,
 loff_t offset, loff_t len)
{
return sys_fadvise64_64(fd, offset, len, advice);
}
long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
  u32 len_high, u32 len_low)
{
return sys_fadvise64(fd, (u64)offset_hig

[PATCH v2 9/9] powerpc/eeh: Add eeh_state_active() helper

2018-03-18 Thread Sam Bobroff
Checking for a "fully active" device state requires testing two flag
bits, which is open coded in several places, so add a function to do
it.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh.h   |  6 ++
 arch/powerpc/kernel/eeh.c| 19 ++-
 arch/powerpc/platforms/powernv/eeh-powernv.c |  9 ++---
 3 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index fd37cc101f4f..c2266ca61853 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -256,6 +256,12 @@ static inline void eeh_serialize_unlock(unsigned long 
flags)
raw_spin_unlock_irqrestore(&confirm_error_lock, flags);
 }
 
+static inline bool eeh_state_active(int state)
+{
+   return (state & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE))
+   == (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
+}
+
 typedef void *(*eeh_traverse_func)(void *data, void *flag);
 void eeh_set_pe_aux_size(int size);
 int eeh_phb_pe_create(struct pci_controller *phb);
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 2b9df0040d6b..bc640e4c5ca5 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -394,9 +394,7 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
/* Check PHB state */
ret = eeh_ops->get_state(phb_pe, NULL);
if ((ret < 0) ||
-   (ret == EEH_STATE_NOT_SUPPORT) ||
-   (ret & (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) ==
-   (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE)) {
+   (ret == EEH_STATE_NOT_SUPPORT) || eeh_state_active(ret)) {
ret = 0;
goto out;
}
@@ -433,7 +431,6 @@ static int eeh_phb_check_failure(struct eeh_pe *pe)
 int eeh_dev_check_failure(struct eeh_dev *edev)
 {
int ret;
-   int active_flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
unsigned long flags;
struct device_node *dn;
struct pci_dev *dev;
@@ -525,8 +522,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 * state, PE is in good state.
 */
if ((ret < 0) ||
-   (ret == EEH_STATE_NOT_SUPPORT) ||
-   ((ret & active_flags) == active_flags)) {
+   (ret == EEH_STATE_NOT_SUPPORT) || eeh_state_active(ret)) {
eeh_stats.false_positives++;
pe->false_positives++;
rc = 0;
@@ -546,8 +542,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 
/* Frozen parent PE ? */
ret = eeh_ops->get_state(parent_pe, NULL);
-   if (ret > 0 &&
-   (ret & active_flags) != active_flags)
+   if (ret > 0 && !eeh_state_active(ret))
pe = parent_pe;
 
/* Next parent level */
@@ -888,7 +883,6 @@ static void *eeh_set_dev_freset(void *data, void *flag)
  */
 int eeh_pe_reset_full(struct eeh_pe *pe)
 {
-   int active_flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
int reset_state = (EEH_PE_RESET | EEH_PE_CFG_BLOCKED);
int type = EEH_RESET_HOT;
unsigned int freset = 0;
@@ -919,7 +913,7 @@ int eeh_pe_reset_full(struct eeh_pe *pe)
 
/* Wait until the PE is in a functioning state */
state = eeh_ops->wait_state(pe, PCI_BUS_RESET_WAIT_MSEC);
-   if ((state & active_flags) == active_flags)
+   if (eeh_state_active(state))
break;
 
if (state < 0) {
@@ -1352,16 +1346,15 @@ static int eeh_pe_change_owner(struct eeh_pe *pe)
struct eeh_dev *edev, *tmp;
struct pci_dev *pdev;
struct pci_device_id *id;
-   int flags, ret;
+   int ret;
 
/* Check PE state */
-   flags = (EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE);
ret = eeh_ops->get_state(pe, NULL);
if (ret < 0 || ret == EEH_STATE_NOT_SUPPORT)
return 0;
 
/* Unfrozen PE, nothing to do */
-   if ((ret & flags) == flags)
+   if (eeh_state_active(ret))
return 0;
 
/* Frozen PE, check if it needs PE level reset */
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 33c86c1a1720..ddfc3544d285 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1425,11 +1425,8 @@ static int pnv_eeh_get_pe(struct pci_controller *hose,
dev_pe = dev_pe->parent;
while (dev_pe && !(dev_pe->type & EEH_PE_PHB)) {
int ret;
-   int active_flags = (EEH_STATE_MMIO_ACTIVE |
-   EEH_STATE_DMA_ACTIVE);
-
ret = eeh_ops->get_state(dev_pe, NULL);
-   if (ret <= 0 || (ret & active_flags) == active_flags) {
+   if (ret <= 0 || eeh_state_active(ret)) {
dev_pe = de

[PATCH v2 8/9] powerpc/eeh: Factor out common code eeh_reset_device()

2018-03-18 Thread Sam Bobroff
The caller will always pass NULL for 'rmv_data' when
'eeh_aware_driver' is true, so the first two calls to
eeh_pe_dev_traverse() can be combined without changing behaviour as
can the two arms of the final 'if' block.

This should not change behaviour.

Signed-off-by: Sam Bobroff 
---
== v1 -> v2: ==

 arch/powerpc/kernel/eeh_driver.c | 32 ++--
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 93fc22e791fa..aa3a2b08aec9 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -647,16 +647,12 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
 * into pci_hp_add_devices().
 */
eeh_pe_state_mark(pe, EEH_PE_KEEP);
-   if (!driver_eeh_aware) {
-   if (pe->type & EEH_PE_VF) {
-   eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
-   } else {
-   pci_lock_rescan_remove();
-   pci_hp_remove_devices(bus);
-   pci_unlock_rescan_remove();
-   }
-   } else {
+   if (driver_eeh_aware || (pe->type & EEH_PE_VF)) {
eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data);
+   } else {
+   pci_lock_rescan_remove();
+   pci_hp_remove_devices(bus);
+   pci_unlock_rescan_remove();
}
 
/*
@@ -691,8 +687,9 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
 * the device up before the scripts have taken it down,
 * potentially weird things happen.
 */
-   if (!driver_eeh_aware) {
-   pr_info("EEH: Sleep 5s ahead of complete hotplug\n");
+   if (!driver_eeh_aware || rmv_data->removed) {
+   pr_info("EEH: Sleep 5s ahead of %s hotplug\n",
+   (eeh_aware_driver ? "partial" : "complete"));
ssleep(5);
 
/*
@@ -705,19 +702,10 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
if (pe->type & EEH_PE_VF) {
eeh_add_virt_device(edev, NULL);
} else {
-   eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
+   if (!eeh_aware_driver)
+   eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
pci_hp_add_devices(bus);
}
-   } else if (rmv_data->removed) {
-   pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
-   ssleep(5);
-
-   edev = list_first_entry(&pe->edevs, struct eeh_dev, list);
-   eeh_pe_traverse(pe, eeh_pe_detach_dev, NULL);
-   if (pe->type & EEH_PE_VF)
-   eeh_add_virt_device(edev, NULL);
-   else
-   pci_hp_add_devices(bus);
}
eeh_pe_state_clear(pe, EEH_PE_KEEP);
 
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 7/9] powerpc/eeh: Remove always-true tests in eeh_reset_device()

2018-03-18 Thread Sam Bobroff
eeh_reset_device() tests the value of 'bus' more than once but the
only caller, eeh_handle_normal_device() does this test itself and will
never pass NULL.

So, remove the dead tests.

This should not change behaviour.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh_driver.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 07437d765434..93fc22e791fa 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -655,7 +655,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
pci_hp_remove_devices(bus);
pci_unlock_rescan_remove();
}
-   } else if (bus) {
+   } else {
eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data);
}
 
@@ -708,7 +708,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
pci_hp_add_devices(bus);
}
-   } else if (bus && rmv_data->removed) {
+   } else if (rmv_data->removed) {
pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
ssleep(5);
 
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 6/9] powerpc/eeh: Clarify arguments to eeh_reset_device()

2018-03-18 Thread Sam Bobroff
It is currently difficult to understand the behaviour of
eeh_reset_device() due to the way it's parameters are used. In
particular, when 'bus' is NULL, it's value is still necessary so the
same value is looked up again locally under a different name
('frozen_bus') but behaviour is changed.

To clarify this, add a new parameter 'driver_eeh_aware', and have the
caller set it when it would have passed NULL for 'bus' and always pass
a value for 'bus'. Then change any test that was on 'bus' to one on
'!driver_eeh_aware' and replace uses of 'frozen_bus' with 'bus'.

Also update the function's comment.

This should not change behaviour.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh_driver.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index cb584d72b0a5..07437d765434 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -619,17 +619,19 @@ int eeh_pe_reset_and_recover(struct eeh_pe *pe)
 
 /**
  * eeh_reset_device - Perform actual reset of a pci slot
+ * @driver_eeh_aware: Does the device's driver provide EEH support?
  * @pe: EEH PE
  * @bus: PCI bus corresponding to the isolcated slot
+ * @rmv_data: Optional, list to record removed devices
  *
  * This routine must be called to do reset on the indicated PE.
  * During the reset, udev might be invoked because those affected
  * PCI devices will be removed and then added.
  */
 static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus,
-   struct eeh_rmv_data *rmv_data)
+   struct eeh_rmv_data *rmv_data,
+   bool driver_eeh_aware)
 {
-   struct pci_bus *frozen_bus = eeh_pe_bus_get(pe);
time64_t tstamp;
int cnt, rc;
struct eeh_dev *edev;
@@ -645,7 +647,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
 * into pci_hp_add_devices().
 */
eeh_pe_state_mark(pe, EEH_PE_KEEP);
-   if (bus) {
+   if (!driver_eeh_aware) {
if (pe->type & EEH_PE_VF) {
eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
} else {
@@ -653,7 +655,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
pci_hp_remove_devices(bus);
pci_unlock_rescan_remove();
}
-   } else if (frozen_bus) {
+   } else if (bus) {
eeh_pe_dev_traverse(pe, eeh_rmv_device, rmv_data);
}
 
@@ -689,7 +691,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
 * the device up before the scripts have taken it down,
 * potentially weird things happen.
 */
-   if (bus) {
+   if (!driver_eeh_aware) {
pr_info("EEH: Sleep 5s ahead of complete hotplug\n");
ssleep(5);
 
@@ -706,7 +708,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
pci_hp_add_devices(bus);
}
-   } else if (frozen_bus && rmv_data->removed) {
+   } else if (bus && rmv_data->removed) {
pr_info("EEH: Sleep 5s ahead of partial hotplug\n");
ssleep(5);
 
@@ -715,7 +717,7 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
if (pe->type & EEH_PE_VF)
eeh_add_virt_device(edev, NULL);
else
-   pci_hp_add_devices(frozen_bus);
+   pci_hp_add_devices(bus);
}
eeh_pe_state_clear(pe, EEH_PE_KEEP);
 
@@ -820,7 +822,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 */
if (result == PCI_ERS_RESULT_NONE) {
pr_info("EEH: Reset with hotplug activity\n");
-   rc = eeh_reset_device(pe, bus, NULL);
+   rc = eeh_reset_device(pe, bus, NULL, false);
if (rc) {
pr_warn("%s: Unable to reset, err=%d\n",
__func__, rc);
@@ -872,7 +874,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
/* If any device called out for a reset, then reset the slot */
if (result == PCI_ERS_RESULT_NEED_RESET) {
pr_info("EEH: Reset without hotplug activity\n");
-   rc = eeh_reset_device(pe, NULL, &rmv_data);
+   rc = eeh_reset_device(pe, bus, &rmv_data, true);
if (rc) {
pr_warn("%s: Cannot reset, err=%d\n",
__func__, rc);
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 5/9] powerpc/eeh: Rename frozen_bus to bus in eeh_handle_normal_event()

2018-03-18 Thread Sam Bobroff
The name "frozen_bus" is misleading: it's not necessarily frozen, it's
just the PE's PCI bus.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh_driver.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 04a5d9db5499..cb584d72b0a5 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -754,14 +754,14 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
  */
 void eeh_handle_normal_event(struct eeh_pe *pe)
 {
-   struct pci_bus *frozen_bus;
+   struct pci_bus *bus;
struct eeh_dev *edev, *tmp;
int rc = 0;
enum pci_ers_result result = PCI_ERS_RESULT_NONE;
struct eeh_rmv_data rmv_data = {LIST_HEAD_INIT(rmv_data.edev_list), 0};
 
-   frozen_bus = eeh_pe_bus_get(pe);
-   if (!frozen_bus) {
+   bus = eeh_pe_bus_get(pe);
+   if (!bus) {
pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n",
__func__, pe->phb->global_number, pe->addr);
return;
@@ -820,7 +820,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 */
if (result == PCI_ERS_RESULT_NONE) {
pr_info("EEH: Reset with hotplug activity\n");
-   rc = eeh_reset_device(pe, frozen_bus, NULL);
+   rc = eeh_reset_device(pe, bus, NULL);
if (rc) {
pr_warn("%s: Unable to reset, err=%d\n",
__func__, rc);
@@ -938,7 +938,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 
pci_lock_rescan_remove();
-   pci_hp_remove_devices(frozen_bus);
+   pci_hp_remove_devices(bus);
pci_unlock_rescan_remove();
/* The passed PE should no longer be used */
return;
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 4/9] powerpc/eeh: Remove misleading test in eeh_handle_normal_event()

2018-03-18 Thread Sam Bobroff
Remove a test that checks if "frozen_bus" is NULL, because it cannot
have changed since it was tested at the start of the function and so
must be true here.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh_driver.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 5b7a5ed4db4d..04a5d9db5499 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -930,20 +930,18 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 * all removed devices correctly to avoid access
 * the their PCI config any more.
 */
-   if (frozen_bus) {
-   if (pe->type & EEH_PE_VF) {
-   eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
-   eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
-   } else {
-   eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
-   eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+   if (pe->type & EEH_PE_VF) {
+   eeh_pe_dev_traverse(pe, eeh_rmv_device, NULL);
+   eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
+   } else {
+   eeh_pe_state_clear(pe, EEH_PE_PRI_BUS);
+   eeh_pe_dev_mode_mark(pe, EEH_DEV_REMOVED);
 
-   pci_lock_rescan_remove();
-   pci_hp_remove_devices(frozen_bus);
-   pci_unlock_rescan_remove();
-   /* The passed PE should no longer be used */
-   return;
-   }
+   pci_lock_rescan_remove();
+   pci_hp_remove_devices(frozen_bus);
+   pci_unlock_rescan_remove();
+   /* The passed PE should no longer be used */
+   return;
}
 final:
eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 3/9] powerpc/eeh: Fix misleading comment in __eeh_addr_cache_get_device()

2018-03-18 Thread Sam Bobroff
Commit "0ba17b05 powerpc/eeh: Remove reference to PCI device"
removed a call to pci_dev_get() from __eeh_addr_cache_get_device() but
did not update the comment to match.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/kernel/eeh_cache.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index d4cc26618809..201943d54a6e 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -84,8 +84,7 @@ static inline struct eeh_dev 
*__eeh_addr_cache_get_device(unsigned long addr)
  * @addr: mmio (PIO) phys address or i/o port number
  *
  * Given an mmio phys address, or a port number, find a pci device
- * that implements this address.  Be sure to pci_dev_put the device
- * when finished.  I/O port numbers are assumed to be offset
+ * that implements this address.  I/O port numbers are assumed to be offset
  * from zero (that is, they do *not* have pci_io_addr added in).
  * It is safe to call this function within an interrupt.
  */
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 2/9] powerpc/eeh: Manage EEH_PE_RECOVERING inside eeh_handle_normal_event()

2018-03-18 Thread Sam Bobroff
Currently the EEH_PE_RECOVERING flag for a PE is managed by both the
caller and callee of eeh_handle_normal_event() (among other places not
considered here). This is complicated by the fact that the PE may
or may not have been invalidated by the call.

So move the callee's handling into eeh_handle_normal_event(), which
clarifies it and allows the return type to be changed to void (because
it no longer needs to indicate at the PE has been invalidated).

This should not change behaviour except in eeh_event_handler() where
it was previously possible to cause eeh_pe_state_clear() to be called
on an invalid PE, which is now avoided.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh_event.h |  2 +-
 arch/powerpc/kernel/eeh_driver.c | 29 +++--
 arch/powerpc/kernel/eeh_event.c  |  2 --
 3 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh_event.h 
b/arch/powerpc/include/asm/eeh_event.h
index 0a168038882d..9884e872686f 100644
--- a/arch/powerpc/include/asm/eeh_event.h
+++ b/arch/powerpc/include/asm/eeh_event.h
@@ -34,7 +34,7 @@ struct eeh_event {
 int eeh_event_init(void);
 int eeh_send_failure_event(struct eeh_pe *pe);
 void eeh_remove_event(struct eeh_pe *pe, bool force);
-bool eeh_handle_normal_event(struct eeh_pe *pe);
+void eeh_handle_normal_event(struct eeh_pe *pe);
 void eeh_handle_special_event(void);
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 51b21c97910f..5b7a5ed4db4d 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -733,7 +733,8 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
 
 /**
  * eeh_handle_normal_event - Handle EEH events on a specific PE
- * @pe: EEH PE
+ * @pe: EEH PE - which should not be used after we return, as it may
+ * have been invalidated.
  *
  * Attempts to recover the given PE.  If recovery fails or the PE has failed
  * too many times, remove the PE.
@@ -750,10 +751,8 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
  * & devices under this slot, and then finally restarting the device
  * drivers (which cause a second set of hotplug events to go out to
  * userspace).
- *
- * Returns true if @pe should no longer be used, else false.
  */
-bool eeh_handle_normal_event(struct eeh_pe *pe)
+void eeh_handle_normal_event(struct eeh_pe *pe)
 {
struct pci_bus *frozen_bus;
struct eeh_dev *edev, *tmp;
@@ -765,9 +764,11 @@ bool eeh_handle_normal_event(struct eeh_pe *pe)
if (!frozen_bus) {
pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n",
__func__, pe->phb->global_number, pe->addr);
-   return false;
+   return;
}
 
+   eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
+
eeh_pe_update_time_stamp(pe);
pe->freeze_count++;
if (pe->freeze_count > eeh_max_freezes) {
@@ -904,7 +905,7 @@ bool eeh_handle_normal_event(struct eeh_pe *pe)
pr_info("EEH: Notify device driver to resume\n");
eeh_pe_dev_traverse(pe, eeh_report_resume, NULL);
 
-   return false;
+   goto final;
 
 hard_fail:
/*
@@ -940,12 +941,12 @@ bool eeh_handle_normal_event(struct eeh_pe *pe)
pci_lock_rescan_remove();
pci_hp_remove_devices(frozen_bus);
pci_unlock_rescan_remove();
-
/* The passed PE should no longer be used */
-   return true;
+   return;
}
}
-   return false;
+final:
+   eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
 }
 
 /**
@@ -1018,15 +1019,7 @@ void eeh_handle_special_event(void)
 */
if (rc == EEH_NEXT_ERR_FROZEN_PE ||
rc == EEH_NEXT_ERR_FENCED_PHB) {
-   /*
-* eeh_handle_normal_event() can make the PE stale if it
-* determines that the PE cannot possibly be recovered.
-* Don't modify the PE state if that's the case.
-*/
-   if (eeh_handle_normal_event(pe))
-   continue;
-
-   eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
+   eeh_handle_normal_event(pe);
} else {
pci_lock_rescan_remove();
list_for_each_entry(hose, &hose_list, list_node) {
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index 872bcfe8f90e..61c9356bf9c9 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -73,7 +73,6 @@ static int eeh_event_handler(void * dummy)
/* We might have event without binding PE */
pe = event->pe;
if (pe) {
-   eeh_pe_state_mark(pe, EEH_PE_RE

[PATCH v2 1/9] powerpc/eeh: Remove eeh_handle_event()

2018-03-18 Thread Sam Bobroff
The function eeh_handle_event(pe) does nothing other than switching
between calling eeh_handle_normal_event(pe) and
eeh_handle_special_event(). However it is only called in two places,
one where pe can't be NULL and the other where it must be NULL (see
eeh_event_handler()) so it does nothing but obscure the flow of
control.

So, remove it.

Signed-off-by: Sam Bobroff 
---
 arch/powerpc/include/asm/eeh_event.h |  3 ++-
 arch/powerpc/kernel/eeh_driver.c | 42 +---
 arch/powerpc/kernel/eeh_event.c  |  4 ++--
 3 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh_event.h 
b/arch/powerpc/include/asm/eeh_event.h
index 1e551a2d6f82..0a168038882d 100644
--- a/arch/powerpc/include/asm/eeh_event.h
+++ b/arch/powerpc/include/asm/eeh_event.h
@@ -34,7 +34,8 @@ struct eeh_event {
 int eeh_event_init(void);
 int eeh_send_failure_event(struct eeh_pe *pe);
 void eeh_remove_event(struct eeh_pe *pe, bool force);
-void eeh_handle_event(struct eeh_pe *pe);
+bool eeh_handle_normal_event(struct eeh_pe *pe);
+void eeh_handle_special_event(void);
 
 #endif /* __KERNEL__ */
 #endif /* ASM_POWERPC_EEH_EVENT_H */
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 0c0b66fc5bfb..51b21c97910f 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -738,9 +738,22 @@ static int eeh_reset_device(struct eeh_pe *pe, struct 
pci_bus *bus,
  * Attempts to recover the given PE.  If recovery fails or the PE has failed
  * too many times, remove the PE.
  *
+ * While PHB detects address or data parity errors on particular PCI
+ * slot, the associated PE will be frozen. Besides, DMA's occurring
+ * to wild addresses (which usually happen due to bugs in device
+ * drivers or in PCI adapter firmware) can cause EEH error. #SERR,
+ * #PERR or other misc PCI-related errors also can trigger EEH errors.
+ *
+ * Recovery process consists of unplugging the device driver (which
+ * generated hotplug events to userspace), then issuing a PCI #RST to
+ * the device, then reconfiguring the PCI config space for all bridges
+ * & devices under this slot, and then finally restarting the device
+ * drivers (which cause a second set of hotplug events to go out to
+ * userspace).
+ *
  * Returns true if @pe should no longer be used, else false.
  */
-static bool eeh_handle_normal_event(struct eeh_pe *pe)
+bool eeh_handle_normal_event(struct eeh_pe *pe)
 {
struct pci_bus *frozen_bus;
struct eeh_dev *edev, *tmp;
@@ -942,7 +955,7 @@ static bool eeh_handle_normal_event(struct eeh_pe *pe)
  * specific PE.  Iterates through possible failures and handles them as
  * necessary.
  */
-static void eeh_handle_special_event(void)
+void eeh_handle_special_event(void)
 {
struct eeh_pe *pe, *phb_pe;
struct pci_bus *bus;
@@ -1049,28 +1062,3 @@ static void eeh_handle_special_event(void)
break;
} while (rc != EEH_NEXT_ERR_NONE);
 }
-
-/**
- * eeh_handle_event - Reset a PCI device after hard lockup.
- * @pe: EEH PE
- *
- * While PHB detects address or data parity errors on particular PCI
- * slot, the associated PE will be frozen. Besides, DMA's occurring
- * to wild addresses (which usually happen due to bugs in device
- * drivers or in PCI adapter firmware) can cause EEH error. #SERR,
- * #PERR or other misc PCI-related errors also can trigger EEH errors.
- *
- * Recovery process consists of unplugging the device driver (which
- * generated hotplug events to userspace), then issuing a PCI #RST to
- * the device, then reconfiguring the PCI config space for all bridges
- * & devices under this slot, and then finally restarting the device
- * drivers (which cause a second set of hotplug events to go out to
- * userspace).
- */
-void eeh_handle_event(struct eeh_pe *pe)
-{
-   if (pe)
-   eeh_handle_normal_event(pe);
-   else
-   eeh_handle_special_event();
-}
diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
index accbf8b5fd46..872bcfe8f90e 100644
--- a/arch/powerpc/kernel/eeh_event.c
+++ b/arch/powerpc/kernel/eeh_event.c
@@ -81,10 +81,10 @@ static int eeh_event_handler(void * dummy)
pr_info("EEH: Detected PCI bus error on "
"PHB#%x-PE#%x\n",
pe->phb->global_number, pe->addr);
-   eeh_handle_event(pe);
+   eeh_handle_normal_event(pe);
eeh_pe_state_clear(pe, EEH_PE_RECOVERING);
} else {
-   eeh_handle_event(NULL);
+   eeh_handle_special_event();
}
 
kfree(event);
-- 
2.16.1.74.g9b0b1f47b



[PATCH v2 0/9] EEH refactoring 1

2018-03-18 Thread Sam Bobroff
Hello everyone,

Here is a set of some small, mostly idempotent, changes to improve
maintainability in some of the EEH code, primarily in eeh_driver.c.

I've kept them all small to aid review but perhaps they should be squashed down
before being applied.

Cheers,
Sam.

Patch set changelog follows:

== v1 -> v2: ==

Patch 1/9: powerpc/eeh: Remove eeh_handle_event()

Patch 2/9: powerpc/eeh: Manage EEH_PE_RECOVERING inside 
eeh_handle_normal_event()

Patch 3/9: powerpc/eeh: Fix misleading comment in __eeh_addr_cache_get_device()

Patch 4/9: powerpc/eeh: Remove misleading test in eeh_handle_normal_event()

Patch 5/9: powerpc/eeh: Rename frozen_bus to bus in eeh_handle_normal_event()

Patch 6/9: powerpc/eeh: Clarify arguments to eeh_reset_device()
* Re-ordered parameters to eeh_reset_device() to keep pe first.
* Changed eeh_aware_driver to driver_eeh_aware.

Patch 7/9: powerpc/eeh: Remove always-true tests in eeh_reset_device()

Patch 8/9: powerpc/eeh: Factor out common code eeh_reset_device()
* In one case, added braces to "if" to match "else".

Patch 9/9: powerpc/eeh: Add eeh_state_active() helper

== v1: ==

Patch 1/9: powerpc/eeh: Remove eeh_handle_event()
Patch 2/9: powerpc/eeh: Manage EEH_PE_RECOVERING inside 
eeh_handle_normal_event()
Patch 3/9: powerpc/eeh: Fix misleading comment in __eeh_addr_cache_get_device()
Patch 4/9: powerpc/eeh: Remove misleading test in eeh_handle_normal_event()
Patch 5/9: powerpc/eeh: Rename frozen_bus to bus in eeh_handle_normal_event()
Patch 6/9: powerpc/eeh: Clarify arguments to eeh_reset_device()
Patch 7/9: powerpc/eeh: Remove always-true tests in eeh_reset_device()
Patch 8/9: powerpc/eeh: Factor out common code eeh_reset_device()
Patch 9/9: powerpc/eeh: Add eeh_state_active() helper

Sam Bobroff (9):
  powerpc/eeh: Remove eeh_handle_event()
  powerpc/eeh: Manage EEH_PE_RECOVERING inside eeh_handle_normal_event()
  powerpc/eeh: Fix misleading comment in __eeh_addr_cache_get_device()
  powerpc/eeh: Remove misleading test in eeh_handle_normal_event()
  powerpc/eeh: Rename frozen_bus to bus in eeh_handle_normal_event()
  powerpc/eeh: Clarify arguments to eeh_reset_device()
  powerpc/eeh: Remove always-true tests in eeh_reset_device()
  powerpc/eeh: Factor out common code eeh_reset_device()
  powerpc/eeh: Add eeh_state_active() helper

 arch/powerpc/include/asm/eeh.h   |   6 ++
 arch/powerpc/include/asm/eeh_event.h |   3 +-
 arch/powerpc/kernel/eeh.c|  19 ++--
 arch/powerpc/kernel/eeh_cache.c  |   3 +-
 arch/powerpc/kernel/eeh_driver.c | 137 +++
 arch/powerpc/kernel/eeh_event.c  |   6 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c |   9 +-
 7 files changed, 72 insertions(+), 111 deletions(-)

-- 
2.16.1.74.g9b0b1f47b



Re: [PATCH v3 18/18] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs

2018-03-18 Thread Jason Gunthorpe
On Sat, Mar 17, 2018 at 02:30:10PM -0400, Sinan Kaya wrote:

> Somebody also has to take a task and work very hard to get rid of 
> __raw_writeX()
> APIs in drivers/net directory. It looked like a very common practice though
> it clearly violates multiarch portability concerns Jason and Deve highlighted.

When you posted your list I thought most of the hits were in what I'd
think of 'one-arch drivers', eg an IRQ controller or clock driver or
something.. Some might have a reason for it (eg avoiding the swap, for
instance), maybe it is a hold over from before writel_relaxed, or
maybe it is just a cargo-cult behavior..

It is the obviously multi-arch drivers that probably need some
attention..

Jason


Re: [RFC PATCH 2/2] powerpc: Only support DYNAMIC_FTRACE not static

2018-03-18 Thread Michael Ellerman
Steven Rostedt  writes:

> On Sat, 17 Mar 2018 00:46:33 +1100
> Michael Ellerman  wrote:
>
>> We've had dynamic ftrace support for over 9 years since Steve first
>> wrote it, all the distros use dynamic, and static is basically
>> untested these days, so drop support for static ftrace.
>> 
>> Signed-off-by: Michael Ellerman 
>> ---
>>  arch/powerpc/Kconfig   |  1 +
>>  arch/powerpc/include/asm/ftrace.h  |  4 +---
>>  arch/powerpc/include/asm/module.h  |  5 -
>>  arch/powerpc/kernel/trace/ftrace.c |  2 --
>>  arch/powerpc/kernel/trace/ftrace_32.S  | 20 --
>>  arch/powerpc/kernel/trace/ftrace_64.S  | 29 
>> --
>>  arch/powerpc/kernel/trace/ftrace_64_mprofile.S |  3 ---
>>  arch/powerpc/kernel/trace/ftrace_64_pg.S   |  2 --
>>  8 files changed, 2 insertions(+), 64 deletions(-)
>> 
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index 73ce5dd07642..23a325df784a 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -189,6 +189,7 @@ config PPC
>>  select HAVE_DEBUG_STACKOVERFLOW
>>  select HAVE_DMA_API_DEBUG
>>  select HAVE_DYNAMIC_FTRACE
>> +select HAVE_DYNAMIC_FTRACE_ONLY
>
> I still think adding:
>
>   select DYNAMIC_FTRACE   if FUNCTION_TRACER
>
> is the better approach.

OK. As I said in my other reply it's a bit fragile, but it does work.

I'll do a version for powerpc using the above approach.

> But I'm all for this patch. I've debated doing the same thing for x86,
> but the only reason I have not, was because it's the only way I test
> the !DYNAMIC_FTRACE code. I've broken the static function tracing
> several times and only find out during my test suite that still tests
> that case. But yeah, it would be nice to just nuke static function
> tracing for all archs. Perhaps after we finish removing unused archs,
> that may be the way to go forward.

Yeah I did look and we still have some arches that support ftrace but
not dynamic ftrace, but there's not many.

cheers


Re: [RFC PATCH 1/2] ftrace: Allow arches to opt-out of static ftrace

2018-03-18 Thread Michael Ellerman
Steven Rostedt  writes:

> On Sat, 17 Mar 2018 00:46:32 +1100
> Michael Ellerman  wrote:
>
>> There is a small but non-zero amount of code required by arches to
>> suppory non-dynamic (static) ftrace, and more importantly there is the
>> added work of testing both configurations.
>> 
>> There are also almost no down sides to dynamic ftrace once it's well
>> tested, other than a small increase in code/data size.
>> 
>> So give arches the option to opt-out of supporting static ftrace.
>> 
>> This is implemented as a DYNAMIC_FTRACE_CHOICE option, which controls
>> whether DYNAMIC_FTRACE is presented as a user-selectable option or if
>> it is just enabled based on its dependencies being enabled (because
>> it's already default y).
>> 
>> Then the CHOICE option depends on an arch *not* selecting
>> HAVE_DYNAMIC_FTRACE_ONLY. This would be more natural in reverse, as a
>> HAVE_STATIC_FTRACE option, but that would require updating every arch.
>> 
>> Signed-off-by: Michael Ellerman 
>
> Why not just add in arch/powerpc/Kconfig:
>
> config PPC
>   [..]
>   select DYNAMIC_FTRACE   if FUNCTION_TRACER
>
> ?
>
> It seems to work for me.

It does work, but it's a bit fragile. It requires duplicating the
dependencies of DYNAMIC_FTRACE in the 'if' condition.

Currently that's:

config DYNAMIC_FTRACE
depends on FUNCTION_TRACER
depends on HAVE_DYNAMIC_FTRACE

So technically we should use:

select DYNAMIC_FTRACE   if FUNCTION_TRACER and 
HAVE_DYNAMIC_FTRACE

Though we happen to know we just selected HAVE_DYNAMIC_FTRACE so we can
leave that out.

As long as the dependencies of DYNAMIC_FTRACE don't change, or the 'if'
clause is updated when they do, then it's OK and it is certainly simpler.

cheers


Re: [PATCH 1/5] mtd: Initialize ->fail_addr early in mtd_erase()

2018-03-18 Thread Boris Brezillon
On Mon, 12 Feb 2018 22:03:07 +0100
Boris Brezillon  wrote:

> mtd_erase() can return an error before ->fail_addr is initialized to
> MTD_FAIL_ADDR_UNKNOWN. Move this initialization at the very beginning
> of the function.

Applied the patchset after addressing Miquel's comments.

> 
> Signed-off-by: Boris Brezillon 
> ---
>  drivers/mtd/mtdcore.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> index a1c94526fb88..c87859ff338b 100644
> --- a/drivers/mtd/mtdcore.c
> +++ b/drivers/mtd/mtdcore.c
> @@ -953,6 +953,8 @@ EXPORT_SYMBOL_GPL(__put_mtd_device);
>   */
>  int mtd_erase(struct mtd_info *mtd, struct erase_info *instr)
>  {
> + instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN;
> +
>   if (!mtd->erasesize || !mtd->_erase)
>   return -ENOTSUPP;
>  
> @@ -961,7 +963,6 @@ int mtd_erase(struct mtd_info *mtd, struct erase_info 
> *instr)
>   if (!(mtd->flags & MTD_WRITEABLE))
>   return -EROFS;
>  
> - instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN;
>   if (!instr->len) {
>   instr->state = MTD_ERASE_DONE;
>   mtd_erase_callback(instr);



-- 
Boris Brezillon, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [RFC PATCH 2/6] fs: provide a generic compat_sys_truncate64() implementation

2018-03-18 Thread Linus Torvalds
On Sun, Mar 18, 2018 at 9:10 AM, Dominik Brodowski
 wrote:
> +#ifdef __ARCH_WANT_COMPAT_SYS_TRUNCATE64
> +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> +   defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE4(truncate64, const char __user *, filename, u32 
> padding,
> +  unsigned int, offset_low, unsigned int, offset_high)
> +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> +   !defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE4(truncate64, const char __user *, filename, u32 
> padding,
> +  unsigned int, offset_high, unsigned int, offset_low)
> +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> +   defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE3(truncate64, const char __user *, filename,
> +  unsigned int, offset_low, unsigned int, offset_high)
> +#else /* no padding, big endian */
> +COMPAT_SYSCALL_DEFINE3(truncate64, const char __user *, filename,
> +  unsigned int, offset_high, unsigned int, offset_low)
> +#endif
> +{
> +#ifdef CONFIG_SPARC
> +   if ((int) offset_high < 0)
> +   return -EINVAL;
> +#endif
> +   return do_sys_truncate(filename,
> +  ((loff_t) offset_high << 32) | offset_low);
> +}
> +#endif /* __ARCH_WANT_COMPAT_SYS_TRUNCATE64 */

This really screams out for a sparc-specific wrapper, or maybe that
#ifdef CONFIG_SPARC should just happen for everybody.

But regardless, code like the above is completely unacceptable.

And yes, it also shows that my suggested alternative doesn't really
work, because of the way the padding changes the number of arguments,
giving that whole COMPAT_SYSCALL_DEFINE 3-vs-4 argument version.

So I think just making it be arch-specific is the right thing.

Alternatively, the COMPAT_SYSCALL_DEFINE() macro could be made smarter
and do self-counting. There are generally tricks to count macro
arguments, particularly when you know that they are limited. Things
like

#define NARG(...) __NARG(__VA_ARGS__##, 6,5,4,3,2,1)
#define _NARG(a,b,c,d,e,f,...) f

or something (I may have screwed that up, you get the idea).

Linus


Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-18 Thread Al Viro
On Sun, Mar 18, 2018 at 11:06:42AM -0700, Linus Torvalds wrote:

> and then we can do
> 
>   COMPAT_SYSCALL_DEFINE5(readahead, int, fd,
> COMPAT_ARG_64BIT_ODD(off), compat_size_t, count)
>   {
>   return do_readahead(fd, off_lo + ((u64)off_hi << 64), count);
>   }
> 
> which at least looks reasonably legible, and has *zero* ifdef's anywhere.

It's a bit more complicated, but...

> I do *not* want to see those disgusting __ARCH_WANT_LE_COMPAT_SYS
> things and crazy #ifdef's in code.

Absolutely.  Those piles of ifdefs are unreadable garbage.

> So either let the architectures do their own trivial wrappers
> entirely, or do something clean like the above. Do *not* do
> #ifdef'fery at the system call declaration time.
> 
> Also note that the "ODD" arguments may not be the ones that need
> padding. I could easily see a system call argument numbering scheme
> like
> 
>r0 - system call number
>r1 - first argument
>r2 - second argument
>...
> 
> and then it's the *EVEN* 64-bit arguments that would need the padding
> (because they are actually odd in the register numbers). The above
> COMPAT_ARG_64BIT[_ODD]() model allows for that too.
> 
> Of course, if some architecture then has some other arbitrary rules (I
> could see register pairing rules that aren't the usual "even register"
> ones), then such an architecture would really have to have its own
> wrapper, but the above at least would handle the simple cases, and
> doesn't look disgusting to use.

I'd done some digging in that area, will find the notes and post.
Basically, we can even avoid the odd/even annotations and have
COMPAT_SYSCALL_DEFINE... sort it out.  It's a bit more hairy than
I would like at this stage in the cycle, though.  I'll see if it can
be done without too much PITA.

However, there still are genuinely speci^Wfucked in head cases - see
e.g. this sad story:
commit ab8a261ba5e2dd9206da640de5870cc31d568a7c
Author: Helge Deller 
Date:   Thu Jul 10 18:07:17 2014 +0200

parisc: fix fanotify_mark() syscall on 32bit compat kernel

Those certainly ought to stay in arch/*


Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-18 Thread Al Viro
On Sun, Mar 18, 2018 at 05:10:54PM +0100, Dominik Brodowski wrote:

> +#ifdef __ARCH_WANT_COMPAT_SYS_READAHEAD
> +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding,
> +unsigned int, off_lo, unsigned int, off_hi,
> +size_t, count)
> +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + !defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding,
> +unsigned int, off_hi, unsigned int, off_lo,
> +size_t, count)
> +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE4(readahead, int, fd,
> +unsigned int, off_lo, unsigned int, off_hi,
> +size_t, count)
> +#else /* no padding, big endian */
> +COMPAT_SYSCALL_DEFINE4(readahead, int, fd,
> +unsigned int, off_hi, unsigned int, off_lo,
> +size_t, count)
> +#endif
> +{
> + return do_readahead(fd, ((u64) off_hi << 32) | off_lo, count);
>  }

*UGH*

static inline compat_to_u64(u32 w0, u32 w1)
{
#ifdef __BIG_ENDIAN
return ((u64)w0 << 32) | w1;
#else
return ((u64)w1 << 32) | w0;
#endif
}

in compat.h, then this turns into

#ifdef __ARCH_WANT_COMPAT_SYS_WITH_PADDING
COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding,
   u32, off0, u32 off1,
   compat_size_t, count)
#else
COMPAT_SYSCALL_DEFINE4(readahead, int, fd,
   u32, off0, u32 off1,
   compat_size_t, count)
#endif
{
return do_readahead(fd, compat_to_u64(off0, off1), count);
}


Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-18 Thread Linus Torvalds
On Sun, Mar 18, 2018 at 10:40 AM, Al Viro  wrote:
>
> *UGH*
>
> static inline compat_to_u64(u32 w0, u32 w1)
> {
> #ifdef __BIG_ENDIAN
> return ((u64)w0 << 32) | w1;
> #else
> return ((u64)w1 << 32) | w0;
> #endif
> }
>
> in compat.h, then this turns into
>
> #ifdef __ARCH_WANT_COMPAT_SYS_WITH_PADDING
> COMPAT_SYSCALL_DEFINE5(readahead, int, fd, unsigned int, padding,
>u32, off0, u32 off1,
>compat_size_t, count)
> #else
> COMPAT_SYSCALL_DEFINE4(readahead, int, fd,
>u32, off0, u32 off1,
>compat_size_t, count)
> #endif

No. This is still too ugly to live.

What *may* be acceptable is if architectures defined something like this:

x86:

/* Little endian registers - low bits first, no padding for odd
register numbers necessary */
#define COMPAT_ARG_64BIT(x) unsigned int x##_lo, unsigned int x##_hi
#define COMPAT_ARG_64BIT_ODD(x) COMPAT_ARG_64BIT(x)

ppc BE:

/* Big-endian registers - high bits first, odd argument pairs
padded up to the next even register */
#define COMPAT_ARG_64BIT(x) unsigned int x##_hi, unsigned int x##_lo
#define COMPAT_ARG_64BIT_ODD(x) unsigned int  x##_padding,
COMPAT_ARG_64BIT(x)

and then we can do

  COMPAT_SYSCALL_DEFINE5(readahead, int, fd,
COMPAT_ARG_64BIT_ODD(off), compat_size_t, count)
  {
  return do_readahead(fd, off_lo + ((u64)off_hi << 64), count);
  }

which at least looks reasonably legible, and has *zero* ifdef's anywhere.

I do *not* want to see those disgusting __ARCH_WANT_LE_COMPAT_SYS
things and crazy #ifdef's in code.

So either let the architectures do their own trivial wrappers
entirely, or do something clean like the above. Do *not* do
#ifdef'fery at the system call declaration time.

Also note that the "ODD" arguments may not be the ones that need
padding. I could easily see a system call argument numbering scheme
like

   r0 - system call number
   r1 - first argument
   r2 - second argument
   ...

and then it's the *EVEN* 64-bit arguments that would need the padding
(because they are actually odd in the register numbers). The above
COMPAT_ARG_64BIT[_ODD]() model allows for that too.

Of course, if some architecture then has some other arbitrary rules (I
could see register pairing rules that aren't the usual "even register"
ones), then such an architecture would really have to have its own
wrapper, but the above at least would handle the simple cases, and
doesn't look disgusting to use.

   Linus

PS. It is possible that we should then add a

   #define COMPAT_ARG_64BIT_VAL(x) (x_##lo + ((u64)x_##hi << 32))

and then do

  COMPAT_SYSCALL_DEFINE5(readahead, int, fd,
COMPAT_ARG_64BIT_ODD(off), compat_size_t, count)
  {
  return do_readahead(fd, COMPAT_ARG_64BIT_VAL(off), count);
  }

because then we could perhaps generate the *non*compat system calls
this way too, just using

#define COMPAT_ARG_64BIT(x) unsigned long x
#define COMPAT_ARG_64BIT_VAL(x) (x)

instead (there might also be a "compat" mode that actually has access
to 64-bit registers, like x32 does, but I suspect it would have other
issues).


Re: [RFC PATCH 3/6] fs: provide generic compat_sys_p{read, write}64() implementations

2018-03-18 Thread Al Viro
On Sun, Mar 18, 2018 at 05:10:53PM +0100, Dominik Brodowski wrote:

> +#ifdef __ARCH_WANT_COMPAT_SYS_PREADWRITE64
> +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE6(pread64, unsigned int, fd, char __user *, ubuf,
> +u32, count, u32, padding, u32, poslo, u32, poshi)
> +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + !defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE6(pread64, unsigned int, fd, char __user *, ubuf,
> +u32, count, u32, padding, u32, poshi, u32, poslo)
> +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, ubuf,
> +u32, count, u32, poslo, u32, poshi)
> +#else /* no padding, big endian */
> +COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, ubuf,
> +u32, count, u32, poshi, u32, poslo)
> +#endif
> +{
> +#ifdef CONFIG_S390
> + if ((compat_ssize_t) count < 0)
> + return -EINVAL;
> +#endif /* CONFIG_S390 */
> + return do_pread64(fd, ubuf, count,
> +   ((loff_t) (unsigned long) (poshi) << 32) |
> + (unsigned long) (poslo));
> +}

Egads...  You have 4 ifdefs before you even get to the body.  And good luck
trying to actually keep track of that mess.

They clearly go in 2 pairs, right?  One parameter is "do we have padding"
(== does ABI prohibit passing 64bit value in 4th and 5th words), another
is the order in which the halves of 64bit are passed.  On l-e you have
bits 0..31 in the first one and bits 32..63 in the second; on b-e it's the
other way round.

Only the logics for putting them together into a 64bit value cares which
half is which; insisting on the names of form {hi,lo} gives
you arseloads of similar variants in ifdefs, all for the sake of not
having conditional code in the body.  Or, actually, in the inlined
helper for building that 64bit out of two halves...


Re: [RFC PATCH 2/6] fs: provide a generic compat_sys_truncate64() implementation

2018-03-18 Thread Al Viro
On Sun, Mar 18, 2018 at 05:10:52PM +0100, Dominik Brodowski wrote:

> +#ifdef __ARCH_WANT_COMPAT_SYS_TRUNCATE64
> +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE4(truncate64, const char __user *, filename, u32 
> padding,
> +unsigned int, offset_low, unsigned int, offset_high)
> +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + !defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE4(truncate64, const char __user *, filename, u32 
> padding,
> +unsigned int, offset_high, unsigned int, offset_low)
> +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> + defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE3(truncate64, const char __user *, filename,
> +unsigned int, offset_low, unsigned int, offset_high)
> +#else /* no padding, big endian */
> +COMPAT_SYSCALL_DEFINE3(truncate64, const char __user *, filename,
> +unsigned int, offset_high, unsigned int, offset_low)
> +#endif
> +{
> +#ifdef CONFIG_SPARC
> + if ((int) offset_high < 0)
> + return -EINVAL;
> +#endif
> + return do_sys_truncate(filename,
> +((loff_t) offset_high << 32) | offset_low);
> +}
> +#endif /* __ARCH_WANT_COMPAT_SYS_TRUNCATE64 */

Ow...

For one thing, the same observation as for readahead(2).  For another, that
sparc-specific test is very suspicious, innit?  Let's take a look at
do_sys_truncate():
static long do_sys_truncate(const char __user *pathname, loff_t length)
{
unsigned int lookup_flags = LOOKUP_FOLLOW;
struct path path;
int error;

if (length < 0) /* sorry, but loff_t says... */
return -EINVAL;

So in case of offset_high having bit 31 set, we would get length with bit 63 
set,
and step into that if (length < 0) return -EINVAL;

Sure, any set of texts can be combined, given a sufficiently large pile of 
ifdefs,
but you are replacing an arseload of almost but not quite identical functions
spread all over the tree with something that is in one place, but is awfully 
hard
to look at, nevermind reading it...


Re: [RFC PATCH 3/6] fs: provide generic compat_sys_p{read, write}64() implementations

2018-03-18 Thread Linus Torvalds
Honestly, I think the patches like this are disgusting:

On Sun, Mar 18, 2018 at 9:10 AM, Dominik Brodowski
 wrote:
> +#ifdef __ARCH_WANT_COMPAT_SYS_PREADWRITE64
> +#if defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> +   defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE6(pread64, unsigned int, fd, char __user *, ubuf,
> +  u32, count, u32, padding, u32, poslo, u32, poshi)
> +#elif defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> +   !defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE6(pread64, unsigned int, fd, char __user *, ubuf,
> +  u32, count, u32, padding, u32, poshi, u32, poslo)
> +#elif !defined(__ARCH_WANT_COMPAT_SYS_WITH_PADDING) && \
> +   defined(__ARCH_WANT_LE_COMPAT_SYS)
> +COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, ubuf,
> +  u32, count, u32, poslo, u32, poshi)
> +#else /* no padding, big endian */
> +COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, ubuf,
> +  u32, count, u32, poshi, u32, poslo)
> +#endif
> +{
> +#ifdef CONFIG_S390
> +   if ((compat_ssize_t) count < 0)
> +   return -EINVAL;
> +#endif /* CONFIG_S390 */

and we should just keep code like this entirely architecture-dependent.

It doesn't save all that many lines:

 19 files changed, 97 insertions(+), 106 deletions(-)

and the lines it adds are an unreadable mess compared to the lines it removes.

So please keep the high/low/padding stuff in the arch wrapper, and
just make them use "do_pwrite64()" and friends instead (or
"kern_pwrite64()", or whatever we ended up using as the kernel naming
for in-kernel system call wrappers).

Linus


[RFC PATCH 1/6] fs: provide a generic compat_sys_fallocate() implementation

2018-03-18 Thread Dominik Brodowski
The compat_sys_fallocate() implementations in mips, powerpc, s390, sparc
and x86 only differed based on the endianness of the u64 being passed as
parameters (3, 4) and (5, 6).

In addition, do not call sys_fallocate() from compat_sys_fallocate(), but
use a common do_fallocate() helper instead.

This patch is part of a series which tries to remove in-kernel calls to
syscalls. On this basis, the syscall entry path can be streamlined.

Cc: Ralf Baechle 
Cc: James Hogan 
Cc: linux-m...@linux-mips.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: linux-s...@vger.kernel.org
Cc: David S. Miller 
Cc: sparcli...@vger.kernel.org
Cc: Ingo Molnar 
Cc: Jiri Slaby 
Cc: x...@kernel.org
Cc: Al Viro 
Signed-off-by: Dominik Brodowski 
---
 arch/mips/include/asm/unistd.h |  4 
 arch/mips/kernel/linux32.c |  7 ---
 arch/mips/kernel/scall64-o32.S |  2 +-
 arch/powerpc/include/asm/unistd.h  |  1 +
 arch/powerpc/kernel/sys_ppc32.c|  7 ---
 arch/s390/include/asm/unistd.h |  1 +
 arch/s390/kernel/compat_linux.c|  7 ---
 arch/s390/kernel/compat_linux.h|  1 -
 arch/s390/kernel/syscalls/syscall.tbl  |  2 +-
 arch/sparc/include/asm/unistd.h|  1 +
 arch/sparc/kernel/sys_sparc32.c|  7 ---
 arch/sparc/kernel/systbls.h|  2 --
 arch/x86/entry/syscalls/syscall_32.tbl |  2 +-
 arch/x86/ia32/sys_ia32.c   |  8 
 arch/x86/include/asm/sys_ia32.h|  2 --
 arch/x86/include/asm/unistd.h  |  2 ++
 fs/open.c  | 24 +++-
 include/linux/compat.h |  6 ++
 18 files changed, 41 insertions(+), 45 deletions(-)

diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h
index 3c09450908aa..bec9c6f55956 100644
--- a/arch/mips/include/asm/unistd.h
+++ b/arch/mips/include/asm/unistd.h
@@ -45,6 +45,10 @@
 # endif
 # ifdef CONFIG_MIPS32_O32
 #  define __ARCH_WANT_COMPAT_SYS_TIME
+#  define __ARCH_WANT_COMPAT_SYS_FALLOCATE
+#  ifdef __MIPSEL__
+#define __ARCH_WANT_LE_COMPAT_SYS
+#  endif
 # endif
 #define __ARCH_WANT_SYS_FORK
 #define __ARCH_WANT_SYS_CLONE
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index b332f6fc1e72..f6e6cb41c01f 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -153,10 +153,3 @@ asmlinkage long sys32_fadvise64_64(int fd, int __pad,
merge_64(a2, a3), merge_64(a4, a5),
flags);
 }
-
-asmlinkage long sys32_fallocate(int fd, int mode, unsigned offset_a2,
-   unsigned offset_a3, unsigned len_a4, unsigned len_a5)
-{
-   return sys_fallocate(fd, mode, merge_64(offset_a2, offset_a3),
-merge_64(len_a4, len_a5));
-}
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index 9ebe3e2403b1..da8babc2c1f5 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -536,7 +536,7 @@ EXPORT(sys32_call_table)
PTR compat_sys_signalfd
PTR sys_ni_syscall  /* was timerfd */
PTR sys_eventfd
-   PTR sys32_fallocate /* 4320 */
+   PTR compat_sys_fallocate/* 4320 */
PTR sys_timerfd_create
PTR compat_sys_timerfd_gettime
PTR compat_sys_timerfd_settime
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index daf1ba97a00c..46890687fb1d 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -49,6 +49,7 @@
 #define __ARCH_WANT_COMPAT_SYS_TIME
 #define __ARCH_WANT_SYS_NEWFSTATAT
 #define __ARCH_WANT_COMPAT_SYS_SENDFILE
+#define __ARCH_WANT_COMPAT_SYS_FALLOCATE
 #endif
 #define __ARCH_WANT_SYS_FORK
 #define __ARCH_WANT_SYS_VFORK
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 15f216d022e2..7c6da6273367 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -97,13 +97,6 @@ asmlinkage int compat_sys_truncate64(const char __user * 
path, u32 reg4,
return sys_truncate(path, (high << 32) | low);
 }
 
-asmlinkage long compat_sys_fallocate(int fd, int mode, u32 offhi, u32 offlo,
-u32 lenhi, u32 lenlo)
-{
-   return sys_fallocate(fd, mode, ((loff_t)offhi << 32) | offlo,
-((loff_t)lenhi << 32) | lenlo);
-}
-
 asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long 
high,
 unsigned long low)
 {
diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h
index fd79c0d35dc4..5e919c11c11f 100644
--- a/arch/s390/include/asm/unistd.h
+++ b/arch/s390/include/asm/unistd.h
@@ -34,6 +34,7 @@
 #define __ARCH_WANT_SYS_SIGPROCMASK
 # ifdef CONFIG_COMPAT
 #   define __ARCH_WANT_C

[RFC PATCH 3/6] fs: provide generic compat_sys_p{read, write}64() implementations

2018-03-18 Thread Dominik Brodowski
The compat_sys_{read,write}64() implementations in mips, powerpc, s390,
sparc and x86 only differed based on whether the u64 parameter needed
padding and on their endianness.

Oh, and some defined the parameters as u64 or "unsigned long" which
expanded to u64, though it only expected u32 in these parameters.

This patch is part of a series which tries to remove in-kernel calls to
syscalls. On this basis, the syscall entry path can be streamlined.

Suggested-by: Al Viro 
Cc: Ralf Baechle 
Cc: James Hogan 
Cc: linux-m...@linux-mips.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: linux-s...@vger.kernel.org
Cc: David S. Miller 
Cc: sparcli...@vger.kernel.org
Cc: Ingo Molnar 
Cc: Jiri Slaby 
Cc: x...@kernel.org
Cc: Al Viro 
Signed-off-by: Dominik Brodowski 
---
 arch/mips/include/asm/unistd.h |  1 +
 arch/mips/kernel/linux32.c | 16 
 arch/mips/kernel/scall64-o32.S |  4 +-
 arch/powerpc/include/asm/unistd.h  |  1 +
 arch/powerpc/kernel/sys_ppc32.c| 12 --
 arch/s390/include/asm/unistd.h |  1 +
 arch/s390/kernel/compat_linux.c| 16 
 arch/s390/kernel/compat_linux.h|  2 -
 arch/s390/kernel/syscalls/syscall.tbl  |  4 +-
 arch/sparc/include/asm/unistd.h|  1 +
 arch/sparc/kernel/sys_sparc32.c| 18 -
 arch/sparc/kernel/systbls.h| 10 -
 arch/sparc/kernel/systbls_64.S |  2 +-
 arch/x86/entry/syscalls/syscall_32.tbl |  4 +-
 arch/x86/ia32/sys_ia32.c   | 16 
 arch/x86/include/asm/sys_ia32.h|  5 ---
 arch/x86/include/asm/unistd.h  |  1 +
 fs/read_write.c| 74 --
 include/linux/compat.h | 15 +++
 19 files changed, 97 insertions(+), 106 deletions(-)

diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h
index 8aa5b7a19133..3ddc271ad77b 100644
--- a/arch/mips/include/asm/unistd.h
+++ b/arch/mips/include/asm/unistd.h
@@ -48,6 +48,7 @@
 #  define __ARCH_WANT_COMPAT_SYS_TIME
 #  define __ARCH_WANT_COMPAT_SYS_FALLOCATE
 #  define __ARCH_WANT_COMPAT_SYS_TRUNCATE64
+#  define __ARCH_WANT_COMPAT_SYS_PREADWRITE64
 #  ifdef __MIPSEL__
 #define __ARCH_WANT_LE_COMPAT_SYS
 #  endif
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index f3aad4ca5560..871cda53a915 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -92,22 +92,6 @@ SYSCALL_DEFINE5(32_llseek, unsigned int, fd, unsigned int, 
offset_high,
return sys_llseek(fd, offset_high, offset_low, result, origin);
 }
 
-/* From the Single Unix Spec: pread & pwrite act like lseek to pos + op +
-   lseek back to original location.  They fail just like lseek does on
-   non-seekable files. */
-
-SYSCALL_DEFINE6(32_pread, unsigned long, fd, char __user *, buf, size_t, count,
-   unsigned long, unused, unsigned long, a4, unsigned long, a5)
-{
-   return sys_pread64(fd, buf, count, merge_64(a4, a5));
-}
-
-SYSCALL_DEFINE6(32_pwrite, unsigned int, fd, const char __user *, buf,
-   size_t, count, u32, unused, u64, a4, u64, a5)
-{
-   return sys_pwrite64(fd, buf, count, merge_64(a4, a5));
-}
-
 SYSCALL_DEFINE1(32_personality, unsigned long, personality)
 {
unsigned int p = personality & 0x;
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index 216450516b44..fbc463b234a1 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -416,8 +416,8 @@ EXPORT(sys32_call_table)
PTR compat_sys_rt_sigtimedwait
PTR compat_sys_rt_sigqueueinfo
PTR compat_sys_rt_sigsuspend
-   PTR sys_32_pread/* 4200 */
-   PTR sys_32_pwrite
+   PTR compat_sys_pread64  /* 4200 */
+   PTR compat_sys_pwrite64
PTR sys_chown
PTR sys_getcwd
PTR sys_capget
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index dca76157f27e..704f2413ac30 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -52,6 +52,7 @@
 #define __ARCH_WANT_COMPAT_SYS_SENDFILE
 #define __ARCH_WANT_COMPAT_SYS_FALLOCATE
 #define __ARCH_WANT_COMPAT_SYS_TRUNCATE64
+#define __ARCH_WANT_COMPAT_SYS_PREADWRITE64
 #endif
 #define __ARCH_WANT_SYS_FORK
 #define __ARCH_WANT_SYS_VFORK
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index dab9eece7731..ec896c8df968 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -74,18 +74,6 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t 
len,
  * The 32 bit ABI passes long longs in an odd even register pair.
  */
 
-compat_ssize_t compat_sys_pread64(unsigned int fd, char __user *ubuf, 
compat_size_t count,
-u32 reg6, u32 poshi, u32

[RFC PATCH 2/6] fs: provide a generic compat_sys_truncate64() implementation

2018-03-18 Thread Dominik Brodowski
The compat_sys_truncate64() implementations in mips, powerpc, s390, sparc
and x86 only differed based on whether the u64 parameter needed padding
and on its endianness.

Oh, and some defined the parameters as "unsigned long" which expanded to
u64, though it only expected u32 in these parameters.

This patch is part of a series which tries to remove in-kernel calls to
syscalls. On this basis, the syscall entry path can be streamlined.

Cc: Ralf Baechle 
Cc: James Hogan 
Cc: linux-m...@linux-mips.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: linux-s...@vger.kernel.org
Cc: David S. Miller 
Cc: sparcli...@vger.kernel.org
Cc: Ingo Molnar 
Cc: Jiri Slaby 
Cc: x...@kernel.org
Cc: Al Viro 
Signed-off-by: Dominik Brodowski 
---
 arch/mips/include/asm/unistd.h |  2 ++
 arch/mips/kernel/linux32.c |  6 --
 arch/mips/kernel/scall64-o32.S |  2 +-
 arch/powerpc/include/asm/unistd.h  |  2 ++
 arch/powerpc/kernel/sys_ppc32.c|  6 --
 arch/s390/include/asm/unistd.h |  1 +
 arch/s390/kernel/compat_linux.c|  5 -
 arch/s390/kernel/compat_linux.h|  1 -
 arch/s390/kernel/syscalls/syscall.tbl  |  2 +-
 arch/sparc/include/asm/unistd.h|  1 +
 arch/sparc/kernel/sys_sparc32.c|  8 
 arch/sparc/kernel/systbls.h|  3 ---
 arch/sparc/kernel/systbls_64.S |  2 +-
 arch/x86/entry/syscalls/syscall_32.tbl |  2 +-
 arch/x86/ia32/sys_ia32.c   |  7 ---
 arch/x86/include/asm/sys_ia32.h|  2 --
 arch/x86/include/asm/unistd.h  |  1 +
 fs/open.c  | 28 +++-
 include/linux/compat.h | 11 +++
 19 files changed, 49 insertions(+), 43 deletions(-)

diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h
index bec9c6f55956..8aa5b7a19133 100644
--- a/arch/mips/include/asm/unistd.h
+++ b/arch/mips/include/asm/unistd.h
@@ -44,8 +44,10 @@
 #  define __ARCH_WANT_SYS_TIME
 # endif
 # ifdef CONFIG_MIPS32_O32
+#  define __ARCH_WANT_COMPAT_SYS_WITH_PADDING
 #  define __ARCH_WANT_COMPAT_SYS_TIME
 #  define __ARCH_WANT_COMPAT_SYS_FALLOCATE
+#  define __ARCH_WANT_COMPAT_SYS_TRUNCATE64
 #  ifdef __MIPSEL__
 #define __ARCH_WANT_LE_COMPAT_SYS
 #  endif
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index f6e6cb41c01f..f3aad4ca5560 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -79,12 +79,6 @@ struct rlimit32 {
int rlim_max;
 };
 
-SYSCALL_DEFINE4(32_truncate64, const char __user *, path,
-   unsigned long, __dummy, unsigned long, a2, unsigned long, a3)
-{
-   return sys_truncate(path, merge_64(a2, a3));
-}
-
 SYSCALL_DEFINE4(32_ftruncate64, unsigned long, fd, unsigned long, __dummy,
unsigned long, a2, unsigned long, a3)
 {
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index da8babc2c1f5..216450516b44 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -427,7 +427,7 @@ EXPORT(sys32_call_table)
PTR sys_ni_syscall
PTR sys_ni_syscall
PTR sys_mips_mmap2  /* 4210 */
-   PTR sys_32_truncate64
+   PTR compat_sys_truncate64
PTR sys_32_ftruncate64
PTR sys_newstat
PTR sys_newlstat
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index 46890687fb1d..dca76157f27e 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -46,10 +46,12 @@
 #define __ARCH_WANT_OLD_STAT
 #endif
 #ifdef CONFIG_PPC64
+#define __ARCH_WANT_COMPAT_SYS_WITH_PADDING
 #define __ARCH_WANT_COMPAT_SYS_TIME
 #define __ARCH_WANT_SYS_NEWFSTATAT
 #define __ARCH_WANT_COMPAT_SYS_SENDFILE
 #define __ARCH_WANT_COMPAT_SYS_FALLOCATE
+#define __ARCH_WANT_COMPAT_SYS_TRUNCATE64
 #endif
 #define __ARCH_WANT_SYS_FORK
 #define __ARCH_WANT_SYS_VFORK
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index 7c6da6273367..dab9eece7731 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -91,12 +91,6 @@ compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 
offhi, u32 offlo, u32 co
return sys_readahead(fd, ((loff_t)offhi << 32) | offlo, count);
 }
 
-asmlinkage int compat_sys_truncate64(const char __user * path, u32 reg4,
-   unsigned long high, unsigned long low)
-{
-   return sys_truncate(path, (high << 32) | low);
-}
-
 asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long 
high,
 unsigned long low)
 {
diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h
index 5e919c11c11f..7667a2d0b1e1 100644
--- a/arch/s390/include/asm/unistd.h
+++ b/arch/s390/include/asm/unistd.h
@@ -35,6 +35,7 @@
 # ifdef 

[RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-18 Thread Dominik Brodowski
The compat_sys_readahead() implementations in mips, powerpc, s390, sparc
and x86 only differed based on whether the u64 parameter needed padding
and on their endianness.

Oh, and some defined the parameters as u64 or "unsigned long" which
expanded to u64, though it only expected u32 in these parameters.

This patch is part of a series which tries to remove in-kernel calls to
syscalls. On this basis, the syscall entry path can be streamlined.

Cc: Ralf Baechle 
Cc: James Hogan 
Cc: linux-m...@linux-mips.org
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: linux-s...@vger.kernel.org
Cc: David S. Miller 
Cc: sparcli...@vger.kernel.org
Cc: Ingo Molnar 
Cc: Jiri Slaby 
Cc: x...@kernel.org
Cc: Al Viro 
Signed-off-by: Dominik Brodowski 
---
 arch/mips/include/asm/unistd.h |  1 +
 arch/mips/kernel/linux32.c |  6 ---
 arch/mips/kernel/scall64-o32.S |  2 +-
 arch/powerpc/include/asm/unistd.h  |  1 +
 arch/powerpc/kernel/sys_ppc32.c|  5 ---
 arch/s390/include/asm/unistd.h |  1 +
 arch/s390/kernel/compat_linux.c|  5 ---
 arch/s390/kernel/compat_linux.h|  1 -
 arch/s390/kernel/syscalls/syscall.tbl  |  2 +-
 arch/sparc/include/asm/unistd.h|  1 +
 arch/sparc/kernel/sys_sparc32.c|  8 
 arch/sparc/kernel/systbls.h|  4 --
 arch/x86/entry/syscalls/syscall_32.tbl |  2 +-
 arch/x86/ia32/sys_ia32.c   |  6 ---
 arch/x86/include/asm/sys_ia32.h|  2 -
 arch/x86/include/asm/unistd.h  |  1 +
 include/linux/compat.h | 10 +
 mm/readahead.c | 81 --
 18 files changed, 76 insertions(+), 63 deletions(-)

diff --git a/arch/mips/include/asm/unistd.h b/arch/mips/include/asm/unistd.h
index 3ddc271ad77b..f8f9046164ae 100644
--- a/arch/mips/include/asm/unistd.h
+++ b/arch/mips/include/asm/unistd.h
@@ -49,6 +49,7 @@
 #  define __ARCH_WANT_COMPAT_SYS_FALLOCATE
 #  define __ARCH_WANT_COMPAT_SYS_TRUNCATE64
 #  define __ARCH_WANT_COMPAT_SYS_PREADWRITE64
+#  define __ARCH_WANT_COMPAT_SYS_READAHEAD
 #  ifdef __MIPSEL__
 #define __ARCH_WANT_LE_COMPAT_SYS
 #  endif
diff --git a/arch/mips/kernel/linux32.c b/arch/mips/kernel/linux32.c
index 871cda53a915..c40ce08be17d 100644
--- a/arch/mips/kernel/linux32.c
+++ b/arch/mips/kernel/linux32.c
@@ -106,12 +106,6 @@ SYSCALL_DEFINE1(32_personality, unsigned long, personality)
return ret;
 }
 
-asmlinkage ssize_t sys32_readahead(int fd, u32 pad0, u64 a2, u64 a3,
-  size_t count)
-{
-   return sys_readahead(fd, merge_64(a2, a3), count);
-}
-
 asmlinkage long sys32_sync_file_range(int fd, int __pad,
unsigned long a2, unsigned long a3,
unsigned long a4, unsigned long a5,
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index fbc463b234a1..eb4e66ba025a 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -439,7 +439,7 @@ EXPORT(sys32_call_table)
PTR compat_sys_fcntl64  /* 4220 */
PTR sys_ni_syscall
PTR sys_gettid
-   PTR sys32_readahead
+   PTR compat_sys_readahead
PTR sys_setxattr
PTR sys_lsetxattr   /* 4225 */
PTR sys_fsetxattr
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index 704f2413ac30..870317a35763 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -53,6 +53,7 @@
 #define __ARCH_WANT_COMPAT_SYS_FALLOCATE
 #define __ARCH_WANT_COMPAT_SYS_TRUNCATE64
 #define __ARCH_WANT_COMPAT_SYS_PREADWRITE64
+#define __ARCH_WANT_COMPAT_SYS_READAHEAD
 #endif
 #define __ARCH_WANT_SYS_FORK
 #define __ARCH_WANT_SYS_VFORK
diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c
index ec896c8df968..289ae55bb4b5 100644
--- a/arch/powerpc/kernel/sys_ppc32.c
+++ b/arch/powerpc/kernel/sys_ppc32.c
@@ -74,11 +74,6 @@ unsigned long compat_sys_mmap2(unsigned long addr, size_t 
len,
  * The 32 bit ABI passes long longs in an odd even register pair.
  */
 
-compat_ssize_t compat_sys_readahead(int fd, u32 r4, u32 offhi, u32 offlo, u32 
count)
-{
-   return sys_readahead(fd, ((loff_t)offhi << 32) | offlo, count);
-}
-
 asmlinkage int compat_sys_ftruncate64(unsigned int fd, u32 reg4, unsigned long 
high,
 unsigned long low)
 {
diff --git a/arch/s390/include/asm/unistd.h b/arch/s390/include/asm/unistd.h
index 71e6f7d65762..685ad7944850 100644
--- a/arch/s390/include/asm/unistd.h
+++ b/arch/s390/include/asm/unistd.h
@@ -37,6 +37,7 @@
 #   define __ARCH_WANT_COMPAT_SYS_FALLOCATE
 #   define __ARCH_WANT_COMPAT_SYS_TRUNCATE64
 #   define __ARCH_WANT_COMPAT_SYS_PREADWRITE64
+#   define __ARCH_WANT_COMPAT_SYS_READAHEAD
 # endif
 #define __ARCH_WANT_SYS_FORK
 #define __ARCH_WANT_SYS_VFORK
d

Re: [PATCH v12 00/11] Application Data Integrity feature introduced by SPARC M7

2018-03-18 Thread David Miller
From: Khalid Aziz 
Date: Wed, 21 Feb 2018 10:15:42 -0700

> V12 changes:
> This series is same as v10 and v11 and was simply rebased on 4.16-rc2
> kernel and patch 11 was added to update signal delivery code to use the
> new helper functions added by Eric Biederman. Can mm maintainers please
> review patches 2, 7, 8 and 9 which are arch independent, and
> include/linux/mm.h and mm/ksm.c changes in patch 10 and ack these if
> everything looks good? 

Khalid I've applied this series to sparc-next, thank you!

But one thing has to be fixed up.

In uapi/asm/auxvec.h you conditionalize the ADI aux vectors on
CONFIG_SPARC64.

That's not correct, you should never control user facing definitions
based upon kernel configuration.

Also, both 32-bit and 64-bit applications running on ADI capable
machines want to compile against and use this informaiton.

So please remove these CPP checks.


[PATCH V5 4/4] powerpc/mm/hash: Don't memset pgd table if not needed

2018-03-18 Thread Aneesh Kumar K.V
We need to zero-out pgd table only if we share the slab cache with pud/pmd
level caches. With the support of 4PB, we don't share the slab cache anymore.
Instead of removing the code completely hide it within an #ifdef. We don't need
to do this with any other page table level, because they all allocate table
of double the size and we take of initializing the first half corrrectly during
page table zap.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 4746bc68d446..07f0dbac479f 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -80,8 +80,19 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
pgd = kmem_cache_alloc(PGT_CACHE(PGD_INDEX_SIZE),
   pgtable_gfp_flags(mm, GFP_KERNEL));
+   /*
+* With hugetlb, we don't clear the second half of the page table.
+* If we share the same slab cache with the pmd or pud level table,
+* we need to make sure we zero out the full table on alloc.
+* With 4K we don't store slot in the second half. Hence we don't
+* need to do this for 4k.
+*/
+#if (H_PGD_INDEX_SIZE == H_PUD_CACHE_INDEX) || \
+   (H_PGD_INDEX_SIZE == H_PMD_CACHE_INDEX)
+#if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_PPC_64K_PAGES)
memset(pgd, 0, PGD_TABLE_SIZE);
-
+#endif
+#endif
return pgd;
 }
 
-- 
2.14.3



[PATCH V5 3/4] powerpc/mm/hash64: Increase the VA range

2018-03-18 Thread Aneesh Kumar K.V
This patch increase the max virtual address value to 4PB. With 4K page size
config we continue to limit ourself to 64TB.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 2 +-
 arch/powerpc/include/asm/processor.h  | 9 -
 arch/powerpc/mm/init_64.c | 6 --
 arch/powerpc/mm/pgtable_64.c  | 5 -
 4 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 8d0cbbb31023..55d5cd64cdb3 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -4,7 +4,7 @@
 
 #define H_PTE_INDEX_SIZE  8
 #define H_PMD_INDEX_SIZE  10
-#define H_PUD_INDEX_SIZE  7
+#define H_PUD_INDEX_SIZE  10
 #define H_PGD_INDEX_SIZE  8
 
 /*
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 75b084486ce1..bb9cb25ffb20 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -109,6 +109,13 @@ void release_thread(struct task_struct *);
 #define TASK_SIZE_64TB  (0x4000UL)
 #define TASK_SIZE_128TB (0x8000UL)
 #define TASK_SIZE_512TB (0x0002UL)
+#define TASK_SIZE_1PB   (0x0004UL)
+#define TASK_SIZE_2PB   (0x0008UL)
+/*
+ * With 52 bits in the address we can support
+ * upto 4PB of range.
+ */
+#define TASK_SIZE_4PB   (0x0010UL)
 
 /*
  * For now 512TB is only supported with book3s and 64K linux page size.
@@ -117,7 +124,7 @@ void release_thread(struct task_struct *);
 /*
  * Max value currently used:
  */
-#define TASK_SIZE_USER64   TASK_SIZE_512TB
+#define TASK_SIZE_USER64   TASK_SIZE_4PB
 #define DEFAULT_MAP_WINDOW_USER64  TASK_SIZE_128TB
 #define TASK_CONTEXT_SIZE  TASK_SIZE_512TB
 #else
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index fdb424a29f03..63470b06c502 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -68,12 +68,6 @@
 
 #include "mmu_decl.h"
 
-#ifdef CONFIG_PPC_BOOK3S_64
-#if H_PGTABLE_RANGE > USER_VSID_RANGE
-#warning Limited user VSID range means pagetable space is wasted
-#endif
-#endif /* CONFIG_PPC_BOOK3S_64 */
-
 phys_addr_t memstart_addr = ~0;
 EXPORT_SYMBOL_GPL(memstart_addr);
 phys_addr_t kernstart_addr;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 28c980eb4422..16636bdf3331 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -57,11 +57,6 @@
 
 #include "mmu_decl.h"
 
-#ifdef CONFIG_PPC_BOOK3S_64
-#if TASK_SIZE_USER64 > (1UL << (ESID_BITS + SID_SHIFT))
-#error TASK_SIZE_USER64 exceeds user VSID range
-#endif
-#endif
 
 #ifdef CONFIG_PPC_BOOK3S_64
 /*
-- 
2.14.3



[PATCH V5 2/4] powerpc/mm: Add support for handling > 512TB address in SLB miss

2018-03-18 Thread Aneesh Kumar K.V
For address above 512TB we allocate additional mmu context. To make it all
easy, address above 512TB is handled with IR/DR=1 and with stack frame setup.

The mmu_context_t is also updated to track the new extended_ids. To support
upto 4PB we need a total 8 contexts.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |   6 ++
 arch/powerpc/include/asm/book3s/64/hash-64k.h |   6 ++
 arch/powerpc/include/asm/book3s/64/mmu.h  |  31 +++-
 arch/powerpc/include/asm/mmu_context.h|  38 +
 arch/powerpc/include/asm/processor.h  |   6 ++
 arch/powerpc/kernel/exceptions-64s.S  |  11 ++-
 arch/powerpc/kernel/traps.c   |  12 ---
 arch/powerpc/mm/copro_fault.c |   2 +-
 arch/powerpc/mm/hash_utils_64.c   |   4 +-
 arch/powerpc/mm/mmu_context_book3s64.c|  15 +++-
 arch/powerpc/mm/pgtable-hash64.c  |   2 +-
 arch/powerpc/mm/slb.c | 108 ++
 arch/powerpc/mm/slb_low.S |   8 +-
 arch/powerpc/mm/slice.c   |  15 +++-
 arch/powerpc/mm/tlb_hash64.c  |   2 +-
 15 files changed, 239 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 67c5475311ee..1a35eb944481 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -11,6 +11,12 @@
 #define H_PUD_INDEX_SIZE  9
 #define H_PGD_INDEX_SIZE  9
 
+/*
+ * Each context is 512TB. But on 4k we restrict our max TASK size to 64TB
+ * Hence also limit max EA bits to 64TB.
+ */
+#define MAX_EA_BITS_PER_CONTEXT46
+
 #ifndef __ASSEMBLY__
 #define H_PTE_TABLE_SIZE   (sizeof(pte_t) << H_PTE_INDEX_SIZE)
 #define H_PMD_TABLE_SIZE   (sizeof(pmd_t) << H_PMD_INDEX_SIZE)
diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 3bcf269f8f55..8d0cbbb31023 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -7,6 +7,12 @@
 #define H_PUD_INDEX_SIZE  7
 #define H_PGD_INDEX_SIZE  8
 
+/*
+ * Each context is 512TB size. SLB miss for first context/default context
+ * is handled in the hotpath.
+ */
+#define MAX_EA_BITS_PER_CONTEXT49
+
 /*
  * 64k aligned address free up few of the lower bits of RPN for us
  * We steal that here. For more deatils look at pte_pfn/pfn_pte()
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 78579305..33ba81d69bd1 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -91,7 +91,18 @@ struct slice_mask {
 };
 
 typedef struct {
-   mm_context_id_t id;
+   union {
+   /*
+* We use id as the PIDR content for radix. On hash we can use
+* more than one id. The extended ids are used when we start
+* having address above 512TB. We allocate one extended id
+* for each 512TB. The new id is then used with the 49 bit
+* EA to build a new VA. We always use ESID_BITS_1T_MASK bits
+* from EA and new context ids to build the new VAs.
+*/
+   mm_context_id_t id;
+   mm_context_id_t extended_id[TASK_SIZE_USER64/TASK_CONTEXT_SIZE];
+   };
u16 user_psize; /* page size index */
 
/* Number of bits in the mm_cpumask */
@@ -193,5 +204,23 @@ extern void radix_init_pseries(void);
 static inline void radix_init_pseries(void) { };
 #endif
 
+static inline int get_ea_context(mm_context_t *ctx, unsigned long ea)
+{
+   int index = ea >> MAX_EA_BITS_PER_CONTEXT;
+
+   if (likely(index < ARRAY_SIZE(ctx->extended_id)))
+   return ctx->extended_id[index];
+   /* should never happen */
+   BUG();
+}
+
+static inline unsigned long get_user_vsid(mm_context_t *ctx,
+ unsigned long ea, int ssize)
+{
+   unsigned long context = get_ea_context(ctx, ea);
+
+   return get_vsid(context, ea, ssize);
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 051b3d63afe3..61d96b4a03b9 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -60,12 +60,50 @@ extern int hash__alloc_context_id(void);
 extern void hash__reserve_context_id(int id);
 extern void __destroy_context(int context_id);
 static inline void mmu_context_init(void) { }
+
+static inline int alloc_extended_context(struct mm_struct *mm,
+unsigned long ea)
+{
+   int context_id;
+
+   int index = ea >> MAX_EA_BITS_PER_CONTEXT;
+
+   context_id = hash__alloc_conte

[PATCH V5 1/4] powerpc/mm/slice: Consolidate the return path for the function.

2018-03-18 Thread Aneesh Kumar K.V
In the follow up patch, on finding a free area we will need to do allocated
extra contexts as needed. Consolidating the return path for
slice_get_unmapped_area makes it easy.

Split into a separate patch to make review easy.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/slice.c | 36 
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 15a857772617..f64bfe8bae57 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -585,7 +585,8 @@ unsigned long slice_get_unmapped_area(unsigned long addr, 
unsigned long len,
 */
if (slice_check_range_fits(mm, &good_mask, addr, len)) {
slice_dbg(" fits good !\n");
-   return addr;
+   newaddr = addr;
+   goto return_addr;
}
} else {
/* Now let's see if we can find something in the existing
@@ -598,7 +599,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, 
unsigned long len,
 * we thus return directly
 */
slice_dbg(" found area at 0x%lx\n", newaddr);
-   return newaddr;
+   goto return_addr;
}
}
/*
@@ -612,6 +613,7 @@ unsigned long slice_get_unmapped_area(unsigned long addr, 
unsigned long len,
if (addr || fixed) {
if (slice_check_range_fits(mm, &potential_mask, addr, len)) {
slice_dbg(" fits potential !\n");
+   newaddr = addr;
goto convert;
}
}
@@ -626,34 +628,34 @@ unsigned long slice_get_unmapped_area(unsigned long addr, 
unsigned long len,
 * anywhere in the good area.
 */
if (addr) {
-   addr = slice_find_area(mm, len, &good_mask,
-  psize, topdown, high_limit);
-   if (addr != -ENOMEM) {
-   slice_dbg(" found area at 0x%lx\n", addr);
-   return addr;
+   newaddr = slice_find_area(mm, len, &good_mask,
+ psize, topdown, high_limit);
+   if (newaddr != -ENOMEM) {
+   slice_dbg(" found area at 0x%lx\n", newaddr);
+   goto return_addr;
}
}
 
/* Now let's see if we can find something in the existing slices
 * for that size plus free slices
 */
-   addr = slice_find_area(mm, len, &potential_mask,
-  psize, topdown, high_limit);
+   newaddr = slice_find_area(mm, len, &potential_mask,
+ psize, topdown, high_limit);
 
 #ifdef CONFIG_PPC_64K_PAGES
-   if (addr == -ENOMEM && psize == MMU_PAGE_64K) {
+   if (newaddr == -ENOMEM && psize == MMU_PAGE_64K) {
/* retry the search with 4k-page slices included */
slice_or_mask(&potential_mask, &potential_mask, compat_maskp, 
high_slices);
-   addr = slice_find_area(mm, len, &potential_mask,
-  psize, topdown, high_limit);
+   newaddr = slice_find_area(mm, len, &potential_mask,
+ psize, topdown, high_limit);
}
 #endif
 
-   if (addr == -ENOMEM)
+   if (newaddr == -ENOMEM)
return -ENOMEM;
 
-   slice_range_to_mask(addr, len, &potential_mask, high_slices);
-   slice_dbg(" found potential area at 0x%lx\n", addr);
+   slice_range_to_mask(newaddr, len, &potential_mask, high_slices);
+   slice_dbg(" found potential area at 0x%lx\n", newaddr);
slice_print_mask(" mask", &potential_mask);
 
  convert:
@@ -667,7 +669,9 @@ unsigned long slice_get_unmapped_area(unsigned long addr, 
unsigned long len,
if (psize > MMU_PAGE_BASE)
on_each_cpu(slice_flush_segments, mm, 1);
}
-   return addr;
+
+return_addr:
+   return newaddr;
 
 }
 EXPORT_SYMBOL_GPL(slice_get_unmapped_area);
-- 
2.14.3



[PATCH V5 0/4] Add support for 4PB virtual address space on hash

2018-03-18 Thread Aneesh Kumar K.V
This patch series extended the max virtual address space value from 512TB
to 4PB with 64K page size. We do that by allocating one vsid context for
each 512TB range. More details of that is explained in patch 3.

Changes from V4:
* Move context allocation to mmap time instead of SLB miss time
* Address review comments 

Changes from V3:
* move extended_id to be a union with mm_context_t id. This reduce some
 array index complexity.
* Add addr_limit check when handling slb miss for extended context

Changes from V2:
* Rebased on top of slice_mask series from Nick Piggin
* Fixed segfault when mmap with 512TB hint address



Aneesh Kumar K.V (4):
  powerpc/mm/slice: Consolidate the return path for the function.
  powerpc/mm: Add support for handling > 512TB address in SLB miss
  powerpc/mm/hash64: Increase the VA range
  powerpc/mm/hash: Don't memset pgd table if not needed

 arch/powerpc/include/asm/book3s/64/hash-4k.h  |   6 ++
 arch/powerpc/include/asm/book3s/64/hash-64k.h |   8 +-
 arch/powerpc/include/asm/book3s/64/mmu.h  |  31 +++-
 arch/powerpc/include/asm/book3s/64/pgalloc.h  |  13 +++-
 arch/powerpc/include/asm/mmu_context.h|  38 +
 arch/powerpc/include/asm/processor.h  |  15 +++-
 arch/powerpc/kernel/exceptions-64s.S  |  11 ++-
 arch/powerpc/kernel/traps.c   |  12 ---
 arch/powerpc/mm/copro_fault.c |   2 +-
 arch/powerpc/mm/hash_utils_64.c   |   4 +-
 arch/powerpc/mm/init_64.c |   6 --
 arch/powerpc/mm/mmu_context_book3s64.c|  15 +++-
 arch/powerpc/mm/pgtable-hash64.c  |   2 +-
 arch/powerpc/mm/pgtable_64.c  |   5 --
 arch/powerpc/mm/slb.c | 108 ++
 arch/powerpc/mm/slb_low.S |   8 +-
 arch/powerpc/mm/slice.c   |  49 
 arch/powerpc/mm/tlb_hash64.c  |   2 +-
 18 files changed, 279 insertions(+), 56 deletions(-)

-- 
2.14.3