date:20181206

Re: use generic DMA mapping code in powerpc V4

2018-12-06 Thread Christian Zigotzky

Good to know. Sorry because of the email.

Sent from my iPhone

> On 6. Dec 2018, at 20:36, Christoph Hellwig  wrote:
> 
>> On Thu, Dec 06, 2018 at 06:10:54PM +0100, Christian Zigotzky wrote:
>> Please don’t merge this code. We are still testing and trying to figure out 
>> where the problems are in the code.
> 
> The ones I sent pings for were either tested successfully by you
> (the zone change) or are trivial cleanups that don't affect your setup.

Re: [PATCH v4 0/5] powerpc: system call table generation support

2018-12-06 Thread Satheesh Rajendran

On Fri, Dec 07, 2018 at 11:41:35AM +0530, Firoz Khan wrote:
> The purpose of this patch series is, we can easily
> add/modify/delete system call table support by cha-
> nging entry in syscall.tbl file instead of manually
> changing many files. The other goal is to unify the 
> system call table generation support implementation 
> across all the architectures. 
> 
> The system call tables are in different format in 
> all architecture. It will be difficult to manually
> add, modify or delete the system calls in the resp-
> ective files manually. To make it easy by keeping a 
> script and which'll generate uapi header file and 
> syscall table file.
> 
> syscall.tbl contains the list of available system 
> calls along with system call number and correspond-
> ing entry point. Add a new system call in this arch-
> itecture will be possible by adding new entry in 
> the syscall.tbl file.
> 
> Adding a new table entry consisting of:
> - System call number.
> - ABI.
> - System call name.
> - Entry point name.
>   - Compat entry name, if required.
>   - spu entry name, if required.
> 
> ARM, s390 and x86 architecuture does exist the sim-
> ilar support. I leverage their implementation to 
> come up with a generic solution.
> 
> I have done the same support for work for alpha, 
> ia64, m68k, microblaze, mips, parisc, sh, sparc, 
> and xtensa. Below mentioned git repository contains
> more details about the workflow.
> 
> https://github.com/frzkhn/system_call_table_generator/
> 
> Finally, this is the ground work to solve the Y2038
> issue. We need to add two dozen of system calls to 
> solve Y2038 issue. So this patch series will help to
> add new system calls easily by adding new entry in the
> syscall.tbl.
> 
> Changes since v3:
>  - split compat syscall table out from native table.
>  - modified the script to add new line in the generated
>file.
Hi Firoz,

This version(v4) booted fine in IBM Power8 box.
Compiled with 'ppc64le_defconfig' aginst 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=merge=a26b21082959cee3075b3edb7ef23071c7e0b09a

Reference failure v3 version: 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-November/182110.html

Regards,
-Satheesh.
> 
> Changes since v2:
>  - modified/optimized the syscall.tbl to avoid duplicate
>for the spu entries.
>  - updated the syscalltbl.sh to meet the above point.
> 
> Changes since v1:
>  - optimized/updated the syscall table generation 
>scripts.
>  - fixed all mixed indentation issues in syscall.tbl.
>  - added "comments" in syscall_*.tbl.
>  - changed from generic-y to generated-y in Kbuild.
> 
> Firoz Khan (5):
>   powerpc: add __NR_syscalls along with NR_syscalls
>   powerpc: move macro definition from asm/systbl.h
>   powerpc: add system call table generation support
>   powerpc: split compat syscall table out from native table
>   powerpc: generate uapi header and system call table files
> 
>  arch/powerpc/Makefile   |   3 +
>  arch/powerpc/include/asm/Kbuild |   4 +
>  arch/powerpc/include/asm/syscall.h  |   3 +-
>  arch/powerpc/include/asm/systbl.h   | 396 --
>  arch/powerpc/include/asm/unistd.h   |   3 +-
>  arch/powerpc/include/uapi/asm/Kbuild|   2 +
>  arch/powerpc/include/uapi/asm/unistd.h  | 389 +
>  arch/powerpc/kernel/Makefile|  10 -
>  arch/powerpc/kernel/entry_64.S  |   7 +-
>  arch/powerpc/kernel/syscalls/Makefile   |  63 
>  arch/powerpc/kernel/syscalls/syscall.tbl| 427 
> 
>  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
>  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  36 +++
>  arch/powerpc/kernel/systbl.S|  40 ++-
>  arch/powerpc/kernel/systbl_chk.c|  60 
>  arch/powerpc/kernel/vdso.c  |   7 +-
>  arch/powerpc/platforms/cell/spu_callbacks.c |  17 +-
>  17 files changed, 606 insertions(+), 898 deletions(-)
>  delete mode 100644 arch/powerpc/include/asm/systbl.h
>  create mode 100644 arch/powerpc/kernel/syscalls/Makefile
>  create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl
>  create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
>  create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/powerpc/kernel/systbl_chk.c
> 
> -- 
> 1.9.1
>

[PATCH -next] powerpc/mm: drop test in pp_601

2018-12-06 Thread YueHaibing

The test selects between two identical values, so it doesn't look useful.
It turns out that the tested expression can only be true anyway, so drop
the test, the corresponding parameter, and the corresponding argument at
the only call site.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// 
@@
expression e,e1;
@@

* e ? e1 : e1
// 

Signed-off-by: YueHaibing 
---
 arch/powerpc/mm/dump_bats.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/dump_bats.c b/arch/powerpc/mm/dump_bats.c
index a0d23e9..33450a8 100644
--- a/arch/powerpc/mm/dump_bats.c
+++ b/arch/powerpc/mm/dump_bats.c
@@ -17,8 +17,8 @@ static char *pp_601(int k, int pp)
if (pp == 1)
return k ? "ROX" : "RWX";
if (pp == 2)
-   return k ? "RWX" : "RWX";
-   return k ? "ROX" : "ROX";
+   return "RWX";
+   return "ROX";
 }
 
 static void bat_show_601(struct seq_file *m, int idx, u32 lower, u32 upper)

Re: [PATCH v4 0/5] powerpc: system call table generation support

2018-12-06 Thread Firoz Khan

++ sathn...@linux.vnet.ibm.com,

Hi Satheesh,

On Fri, 7 Dec 2018 at 11:42, Firoz Khan  wrote:
>
>
> Changes since v3:
>  - split compat syscall table out from native table.
>  - modified the script to add new line in the generated
>file.

I have fixed few major issue. Could you perform boot test with v4
on the platform.

Thanks
Firoz

[PATCH v4 5/5] powerpc: generate uapi header and system call table files

2018-12-06 Thread Firoz Khan

System call table generation script must be run to gener-
ate unistd_32/64.h and syscall_table_32/64/c32/spu.h files.
This patch will have changes which will invokes the script.

This patch will generate unistd_32/64.h and syscall_table-
_32/64/c32/spu.h files by the syscall table generation
script invoked by parisc/Makefile and the generated files
against the removed files must be identical.

The generated uapi header file will be included in uapi/-
asm/unistd.h and generated system call table header file
will be included by kernel/systbl.S file.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/Makefile   |   3 +
 arch/powerpc/include/asm/Kbuild |   4 +
 arch/powerpc/include/asm/systbl.h   | 395 
 arch/powerpc/include/uapi/asm/Kbuild|   2 +
 arch/powerpc/include/uapi/asm/unistd.h  | 392 +--
 arch/powerpc/kernel/Makefile|  10 -
 arch/powerpc/kernel/systbl.S|  52 +---
 arch/powerpc/kernel/systbl_chk.c|  60 -
 arch/powerpc/platforms/cell/spu_callbacks.c |  17 +-
 9 files changed, 26 insertions(+), 909 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/systbl.h
 delete mode 100644 arch/powerpc/kernel/systbl_chk.c

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 8a2ce14..34897191 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -402,6 +402,9 @@ archclean:
 
 archprepare: checkbin
 
+archheaders:
+   $(Q)$(MAKE) $(build)=arch/powerpc/kernel/syscalls all
+
 ifdef CONFIG_STACKPROTECTOR
 prepare: stack_protector_prepare
 
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 3196d22..77ff7fb 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -1,3 +1,7 @@
+generated-y += syscall_table_32.h
+generated-y += syscall_table_64.h
+generated-y += syscall_table_c32.h
+generated-y += syscall_table_spu.h
 generic-y += div64.h
 generic-y += export.h
 generic-y += irq_regs.h
diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
deleted file mode 100644
index c4321b9..000
--- a/arch/powerpc/include/asm/systbl.h
+++ /dev/null
@@ -1,395 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * List of powerpc syscalls. For the meaning of the _SPU suffix see
- * arch/powerpc/platforms/cell/spu_callbacks.c
- */
-
-SYSCALL(restart_syscall)
-SYSCALL(exit)
-PPC_SYS(fork)
-SYSCALL_SPU(read)
-SYSCALL_SPU(write)
-COMPAT_SYS_SPU(open)
-SYSCALL_SPU(close)
-SYSCALL_SPU(waitpid)
-SYSCALL_SPU(creat)
-SYSCALL_SPU(link)
-SYSCALL_SPU(unlink)
-COMPAT_SYS(execve)
-SYSCALL_SPU(chdir)
-COMPAT_SYS_SPU(time)
-SYSCALL_SPU(mknod)
-SYSCALL_SPU(chmod)
-SYSCALL_SPU(lchown)
-SYSCALL(ni_syscall)
-OLDSYS(stat)
-COMPAT_SYS_SPU(lseek)
-SYSCALL_SPU(getpid)
-COMPAT_SYS(mount)
-SYSX(sys_ni_syscall,sys_oldumount,sys_oldumount)
-SYSCALL_SPU(setuid)
-SYSCALL_SPU(getuid)
-COMPAT_SYS_SPU(stime)
-COMPAT_SYS(ptrace)
-SYSCALL_SPU(alarm)
-OLDSYS(fstat)
-SYSCALL(pause)
-COMPAT_SYS(utime)
-SYSCALL(ni_syscall)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(access)
-SYSCALL_SPU(nice)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(sync)
-SYSCALL_SPU(kill)
-SYSCALL_SPU(rename)
-SYSCALL_SPU(mkdir)
-SYSCALL_SPU(rmdir)
-SYSCALL_SPU(dup)
-SYSCALL_SPU(pipe)
-COMPAT_SYS_SPU(times)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(brk)
-SYSCALL_SPU(setgid)
-SYSCALL_SPU(getgid)
-SYSCALL(signal)
-SYSCALL_SPU(geteuid)
-SYSCALL_SPU(getegid)
-SYSCALL(acct)
-SYSCALL(umount)
-SYSCALL(ni_syscall)
-COMPAT_SYS_SPU(ioctl)
-COMPAT_SYS_SPU(fcntl)
-SYSCALL(ni_syscall)
-SYSCALL_SPU(setpgid)
-SYSCALL(ni_syscall)
-SYSX(sys_ni_syscall,sys_olduname,sys_olduname)
-SYSCALL_SPU(umask)
-SYSCALL_SPU(chroot)
-COMPAT_SYS(ustat)
-SYSCALL_SPU(dup2)
-SYSCALL_SPU(getppid)
-SYSCALL_SPU(getpgrp)
-SYSCALL_SPU(setsid)
-SYS32ONLY(sigaction)
-SYSCALL_SPU(sgetmask)
-SYSCALL_SPU(ssetmask)
-SYSCALL_SPU(setreuid)
-SYSCALL_SPU(setregid)
-SYS32ONLY(sigsuspend)
-SYSX(sys_ni_syscall,compat_sys_sigpending,sys_sigpending)
-SYSCALL_SPU(sethostname)
-COMPAT_SYS_SPU(setrlimit)
-SYSX(sys_ni_syscall,compat_sys_old_getrlimit,sys_old_getrlimit)
-COMPAT_SYS_SPU(getrusage)
-COMPAT_SYS_SPU(gettimeofday)
-COMPAT_SYS_SPU(settimeofday)
-SYSCALL_SPU(getgroups)
-SYSCALL_SPU(setgroups)
-SYSX(sys_ni_syscall,sys_ni_syscall,ppc_select)
-SYSCALL_SPU(symlink)
-OLDSYS(lstat)
-SYSCALL_SPU(readlink)
-SYSCALL(uselib)
-SYSCALL(swapon)
-SYSCALL(reboot)
-SYSX(sys_ni_syscall,compat_sys_old_readdir,sys_old_readdir)
-SYSCALL_SPU(mmap)
-SYSCALL_SPU(munmap)
-COMPAT_SYS_SPU(truncate)
-COMPAT_SYS_SPU(ftruncate)
-SYSCALL_SPU(fchmod)
-SYSCALL_SPU(fchown)
-SYSCALL_SPU(getpriority)
-SYSCALL_SPU(setpriority)
-SYSCALL(ni_syscall)
-COMPAT_SYS(statfs)
-COMPAT_SYS(fstatfs)
-SYSCALL(ni_syscall)
-COMPAT_SYS_SPU(socketcall)
-SYSCALL_SPU(syslog)
-COMPAT_SYS_SPU(setitimer)
-COMPAT_SYS_SPU(getitimer)
-COMPAT_SYS_SPU(newstat)
-COMPAT_SYS_SPU(newlstat)
-COMPAT_SYS_SPU(newfstat)
-SYSX(sys_ni_syscall,sys_uname,sys_uname)

[PATCH v4 4/5] powerpc: split compat syscall table out from native table

2018-12-06 Thread Firoz Khan

PowerPC uses a syscall table with native and compat calls
interleaved, which is a slightly simpler way to define two
matching tables.

As we move to having the tables generated, that advantage
is no longer important, but the interleaved table gets in
the way of using the same scripts as on the other archit-
ectures.

Split out a new compat_sys_call_table symbol that contains
all the compat calls, and leave the main table for the nat-
ive calls, to more closely match the method we use every-
where else.

Suggested-by: Arnd Bergmann 
Signed-off-by: Firoz Khan 
---
 arch/powerpc/include/asm/syscall.h |  3 +--
 arch/powerpc/kernel/entry_64.S |  7 +--
 arch/powerpc/kernel/systbl.S   | 35 ---
 arch/powerpc/kernel/vdso.c |  7 +--
 4 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/syscall.h 
b/arch/powerpc/include/asm/syscall.h
index ab9f3f0..1a0e7a8 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -18,9 +18,8 @@
 #include 
 
 /* ftrace syscalls requires exporting the sys_call_table */
-#ifdef CONFIG_FTRACE_SYSCALLS
 extern const unsigned long sys_call_table[];
-#endif /* CONFIG_FTRACE_SYSCALLS */
+extern const unsigned long compat_sys_call_table[];
 
 static inline int syscall_get_nr(struct task_struct *task, struct pt_regs 
*regs)
 {
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 7b1693a..5574d92 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -54,6 +54,9 @@
 SYS_CALL_TABLE:
.tc sys_call_table[TC],sys_call_table
 
+COMPAT_SYS_CALL_TABLE:
+   .tc compat_sys_call_table[TC],compat_sys_call_table
+
 /* This value is used to mark exception frames on the stack. */
 exception_marker:
.tc ID_EXC_MARKER[TC],STACK_FRAME_REGS_MARKER
@@ -173,7 +176,7 @@ system_call:/* label this so stack 
traces look sane */
ld  r11,SYS_CALL_TABLE@toc(2)
andis.  r10,r10,_TIF_32BIT@h
beq 15f
-   addir11,r11,8   /* use 32-bit syscall entries */
+   ld  r11,COMPAT_SYS_CALL_TABLE@toc(2)
clrldi  r3,r3,32
clrldi  r4,r4,32
clrldi  r5,r5,32
@@ -181,7 +184,7 @@ system_call:/* label this so stack 
traces look sane */
clrldi  r7,r7,32
clrldi  r8,r8,32
 15:
-   slwir0,r0,4
+   slwir0,r0,3
 
barrier_nospec_asm
/*
diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S
index 9ff1913..0fa84e1 100644
--- a/arch/powerpc/kernel/systbl.S
+++ b/arch/powerpc/kernel/systbl.S
@@ -17,13 +17,13 @@
 #include 
 
 #ifdef CONFIG_PPC64
-#define SYSCALL(func)  .8byte  DOTSYM(sys_##func),DOTSYM(sys_##func)
-#define COMPAT_SYS(func)   .8byte  
DOTSYM(sys_##func),DOTSYM(compat_sys_##func)
-#define PPC_SYS(func)  .8byte  DOTSYM(ppc_##func),DOTSYM(ppc_##func)
-#define OLDSYS(func)   .8byte  
DOTSYM(sys_ni_syscall),DOTSYM(sys_ni_syscall)
-#define SYS32ONLY(func).8byte  
DOTSYM(sys_ni_syscall),DOTSYM(compat_sys_##func)
-#define PPC64ONLY(func).8byte  
DOTSYM(ppc_##func),DOTSYM(sys_ni_syscall)
-#define SYSX(f, f3264, f32).8byte  DOTSYM(f),DOTSYM(f3264)
+#define SYSCALL(func)  .8byte  DOTSYM(sys_##func)
+#define COMPAT_SYS(func)   .8byte  DOTSYM(sys_##func)
+#define PPC_SYS(func)  .8byte  DOTSYM(ppc_##func)
+#define OLDSYS(func)   .8byte  DOTSYM(sys_ni_syscall)
+#define SYS32ONLY(func).8byte  DOTSYM(sys_ni_syscall)
+#define PPC64ONLY(func).8byte  DOTSYM(ppc_##func)
+#define SYSX(f, f3264, f32).8byte  DOTSYM(f)
 #else
 #define SYSCALL(func)  .long   sys_##func
 #define COMPAT_SYS(func)   .long   sys_##func
@@ -46,6 +46,27 @@
 
 .globl sys_call_table
 sys_call_table:
+#include 
+
+#undef SYSCALL
+#undef COMPAT_SYS
+#undef PPC_SYS
+#undef OLDSYS
+#undef SYS32ONLY
+#undef PPC64ONLY
+#undef SYSX
 
+#ifdef CONFIG_COMPAT
+#define SYSCALL(func)  .8byte  DOTSYM(sys_##func)
+#define COMPAT_SYS(func)   .8byte  DOTSYM(compat_sys_##func)
+#define PPC_SYS(func)  .8byte  DOTSYM(ppc_##func)
+#define OLDSYS(func)   .8byte  DOTSYM(sys_ni_syscall)
+#define SYS32ONLY(func).8byte  DOTSYM(compat_sys_##func)
+#define PPC64ONLY(func).8byte  DOTSYM(sys_ni_syscall)
+#define SYSX(f, f3264, f32).8byte  DOTSYM(f3264)
+
+.globl compat_sys_call_table
+compat_sys_call_table:
 #define compat_sys_sigsuspend  sys_sigsuspend
 #include 
+#endif
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 65b3bdb..7725a97 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -671,15 +671,18 @@ static void __init vdso_setup_syscall_map(void)
 {
unsigned int i;
extern unsigned long *sys_call_table;
+#ifdef CONFIG_PPC64
+   extern unsigned long

[PATCH v4 3/5] powerpc: add system call table generation support

2018-12-06 Thread Firoz Khan

The system call tables are in different format in all
architecture and it will be difficult to manually add or
modify the system calls in the respective files. To make
it easy by keeping a script and which will generate the
uapi header and syscall table file. This change will also
help to unify the implementation across all architectures.

The system call table generation script is added in
syscalls directory which contain the script to generate
both uapi header file and system call table files.
The syscall.tbl file will be the input for the scripts.

syscall.tbl contains the list of available system calls
along with system call number and corresponding entry point.
Add a new system call in this architecture will be possible
by adding new entry in the syscall.tbl file.

Adding a new table entry consisting of:
- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.

syscallhdr.sh and syscalltbl.sh will generate uapi header-
unistd_32/64.h and syscall_table_32/64/c32/spu.h files
respectively. File syscall_table_32/64/c32/spu.h is incl-
uded by syscall.S - the real system call table. Both *.sh
files will parse the content syscall.tbl to generate the
header and table files.

ARM, s390 and x86 architecuture does have similar support.
I leverage their implementation to come up with a generic
solution.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/kernel/syscalls/Makefile  |  63 +
 arch/powerpc/kernel/syscalls/syscall.tbl   | 427 +
 arch/powerpc/kernel/syscalls/syscallhdr.sh |  37 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh |  36 +++
 4 files changed, 563 insertions(+)
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh

diff --git a/arch/powerpc/kernel/syscalls/Makefile 
b/arch/powerpc/kernel/syscalls/Makefile
new file mode 100644
index 000..27b4895
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/Makefile
@@ -0,0 +1,63 @@
+# SPDX-License-Identifier: GPL-2.0
+kapi := arch/$(SRCARCH)/include/generated/asm
+uapi := arch/$(SRCARCH)/include/generated/uapi/asm
+
+_dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)')  \
+ $(shell [ -d '$(kapi)' ] || mkdir -p '$(kapi)')
+
+syscall := $(srctree)/$(src)/syscall.tbl
+syshdr := $(srctree)/$(src)/syscallhdr.sh
+systbl := $(srctree)/$(src)/syscalltbl.sh
+
+quiet_cmd_syshdr = SYSHDR  $@
+  cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@'   \
+  '$(syshdr_abis_$(basetarget))'   \
+  '$(syshdr_pfx_$(basetarget))'\
+  '$(syshdr_offset_$(basetarget))'
+
+quiet_cmd_systbl = SYSTBL  $@
+  cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@'   \
+  '$(systbl_abis_$(basetarget))'   \
+  '$(systbl_abi_$(basetarget))'\
+  '$(systbl_offset_$(basetarget))'
+
+syshdr_abis_unistd_32 := common,nospu,32
+$(uapi)/unistd_32.h: $(syscall) $(syshdr)
+   $(call if_changed,syshdr)
+
+syshdr_abis_unistd_64 := common,nospu,64
+$(uapi)/unistd_64.h: $(syscall) $(syshdr)
+   $(call if_changed,syshdr)
+
+systbl_abis_syscall_table_32 := common,nospu,32
+systbl_abi_syscall_table_32 := 32
+$(kapi)/syscall_table_32.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abis_syscall_table_64 := common,nospu,64
+systbl_abi_syscall_table_64 := 64
+$(kapi)/syscall_table_64.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abis_syscall_table_c32 := common,nospu,32
+systbl_abi_syscall_table_c32 := c32
+$(kapi)/syscall_table_c32.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abis_syscall_table_spu := common,spu
+systbl_abi_syscall_table_spu := spu
+$(kapi)/syscall_table_spu.h: $(syscall) $(systbl)
+   $(call if_changed,systbl)
+
+uapisyshdr-y   += unistd_32.h unistd_64.h
+kapisyshdr-y   += syscall_table_32.h   \
+  syscall_table_64.h   \
+  syscall_table_c32.h  \
+  syscall_table_spu.h
+
+targets+= $(uapisyshdr-y) $(kapisyshdr-y)
+
+PHONY += all
+all: $(addprefix $(uapi)/,$(uapisyshdr-y))
+all: $(addprefix $(kapi)/,$(kapisyshdr-y))
+   @:
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl 
b/arch/powerpc/kernel/syscalls/syscall.tbl
new file mode 100644
index 000..db3bbb8
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -0,0 +1,427 @@
+# SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note
+#
+# system call numbers and entry vectors for powerpc
+#
+# The format is:
+# 
+#
+# The  can be common, spu, nospu, 64, or 32 for this file.
+#
+0  nospu   restart_syscall sys_restart_syscall

[PATCH v4 2/5] powerpc: move macro definition from asm/systbl.h

2018-12-06 Thread Firoz Khan

Move the macro definition for compat_sys_sigsuspend from
asm/systbl.h to the file which it is getting included.

One of the patch in this patch series is generating uapi
header and syscall table files. In order to come up with
a common implimentation across all architecture, we need
to do this change.

This change will simplify the implementation of system
call table generation script and help to come up a common
implementation across all architecture.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/include/asm/systbl.h | 1 -
 arch/powerpc/kernel/systbl.S  | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 01b5171..c4321b9 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -76,7 +76,6 @@
 SYSCALL_SPU(ssetmask)
 SYSCALL_SPU(setreuid)
 SYSCALL_SPU(setregid)
-#define compat_sys_sigsuspend sys_sigsuspend
 SYS32ONLY(sigsuspend)
 SYSX(sys_ni_syscall,compat_sys_sigpending,sys_sigpending)
 SYSCALL_SPU(sethostname)
diff --git a/arch/powerpc/kernel/systbl.S b/arch/powerpc/kernel/systbl.S
index 919a327..9ff1913 100644
--- a/arch/powerpc/kernel/systbl.S
+++ b/arch/powerpc/kernel/systbl.S
@@ -47,4 +47,5 @@
 .globl sys_call_table
 sys_call_table:
 
+#define compat_sys_sigsuspend  sys_sigsuspend
 #include 
-- 
1.9.1

[PATCH v4 1/5] powerpc: add __NR_syscalls along with NR_syscalls

2018-12-06 Thread Firoz Khan

NR_syscalls macro holds the number of system call exist
in powerpc architecture. We have to change the value of
NR_syscalls, if we add or delete a system call.

One of the patch in this patch series has a script which
will generate a uapi header based on syscall.tbl file.
The syscall.tbl file contains the number of system call
information. So we have two option to update NR_syscalls
value.

1. Update NR_syscalls in asm/unistd.h manually by count-
   ing the no.of system calls. No need to update NR_sys-
   calls until we either add a new system call or delete
   existing system call.

2. We can keep this feature in above mentioned script,
   that will count the number of syscalls and keep it in
   a generated file. In this case we don't need to expli-
   citly update NR_syscalls in asm/unistd.h file.

The 2nd option will be the recommended one. For that, I
added the __NR_syscalls macro in uapi/asm/unistd.h along
with NR_syscalls asm/unistd.h. The macro __NR_syscalls
also added for making the name convention same across all
architecture. While __NR_syscalls isn't strictly part of
the uapi, having it as part of the generated header to
simplifies the implementation. We also need to enclose
this macro with #ifdef __KERNEL__ to avoid side effects.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/include/asm/unistd.h  | 3 +--
 arch/powerpc/include/uapi/asm/unistd.h | 5 -
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index b0de85b..a3c35e6 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -11,8 +11,7 @@
 
 #include 
 
-
-#define NR_syscalls389
+#define NR_syscalls__NR_syscalls
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h 
b/arch/powerpc/include/uapi/asm/unistd.h
index 985534d..7195868 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -10,7 +10,6 @@
 #ifndef _UAPI_ASM_POWERPC_UNISTD_H_
 #define _UAPI_ASM_POWERPC_UNISTD_H_
 
-
 #define __NR_restart_syscall 0
 #define __NR_exit1
 #define __NR_fork2
@@ -401,4 +400,8 @@
 #define __NR_rseq  387
 #define __NR_io_pgetevents 388
 
+#ifdef __KERNEL__
+#define __NR_syscalls  389
+#endif
+
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
1.9.1

[PATCH v4 0/5] powerpc: system call table generation support

2018-12-06 Thread Firoz Khan

The purpose of this patch series is, we can easily
add/modify/delete system call table support by cha-
nging entry in syscall.tbl file instead of manually
changing many files. The other goal is to unify the 
system call table generation support implementation 
across all the architectures. 

The system call tables are in different format in 
all architecture. It will be difficult to manually
add, modify or delete the system calls in the resp-
ective files manually. To make it easy by keeping a 
script and which'll generate uapi header file and 
syscall table file.

syscall.tbl contains the list of available system 
calls along with system call number and correspond-
ing entry point. Add a new system call in this arch-
itecture will be possible by adding new entry in 
the syscall.tbl file.

Adding a new table entry consisting of:
- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.
- spu entry name, if required.

ARM, s390 and x86 architecuture does exist the sim-
ilar support. I leverage their implementation to 
come up with a generic solution.

I have done the same support for work for alpha, 
ia64, m68k, microblaze, mips, parisc, sh, sparc, 
and xtensa. Below mentioned git repository contains
more details about the workflow.

https://github.com/frzkhn/system_call_table_generator/

Finally, this is the ground work to solve the Y2038
issue. We need to add two dozen of system calls to 
solve Y2038 issue. So this patch series will help to
add new system calls easily by adding new entry in the
syscall.tbl.

Changes since v3:
 - split compat syscall table out from native table.
 - modified the script to add new line in the generated
   file.

Changes since v2:
 - modified/optimized the syscall.tbl to avoid duplicate
   for the spu entries.
 - updated the syscalltbl.sh to meet the above point.

Changes since v1:
 - optimized/updated the syscall table generation 
   scripts.
 - fixed all mixed indentation issues in syscall.tbl.
 - added "comments" in syscall_*.tbl.
 - changed from generic-y to generated-y in Kbuild.

Firoz Khan (5):
  powerpc: add __NR_syscalls along with NR_syscalls
  powerpc: move macro definition from asm/systbl.h
  powerpc: add system call table generation support
  powerpc: split compat syscall table out from native table
  powerpc: generate uapi header and system call table files

 arch/powerpc/Makefile   |   3 +
 arch/powerpc/include/asm/Kbuild |   4 +
 arch/powerpc/include/asm/syscall.h  |   3 +-
 arch/powerpc/include/asm/systbl.h   | 396 --
 arch/powerpc/include/asm/unistd.h   |   3 +-
 arch/powerpc/include/uapi/asm/Kbuild|   2 +
 arch/powerpc/include/uapi/asm/unistd.h  | 389 +
 arch/powerpc/kernel/Makefile|  10 -
 arch/powerpc/kernel/entry_64.S  |   7 +-
 arch/powerpc/kernel/syscalls/Makefile   |  63 
 arch/powerpc/kernel/syscalls/syscall.tbl| 427 
 arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh  |  36 +++
 arch/powerpc/kernel/systbl.S|  40 ++-
 arch/powerpc/kernel/systbl_chk.c|  60 
 arch/powerpc/kernel/vdso.c  |   7 +-
 arch/powerpc/platforms/cell/spu_callbacks.c |  17 +-
 17 files changed, 606 insertions(+), 898 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/systbl.h
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
 delete mode 100644 arch/powerpc/kernel/systbl_chk.c

-- 
1.9.1

Re: [PATCH 0/8] KVM: PPC: Implement passthrough of emulated devices for nested guests

2018-12-06 Thread Suraj Jitindar Singh

On Fri, 2018-12-07 at 14:43 +1100, Suraj Jitindar Singh wrote:
> This patch series allows for emulated devices to be passed through to
> nested
> guests, irrespective of at which level the device is being emulated.
> 
> Note that the emulated device must be using dma, not virtio.
> 
> For example, passing through an emulated e1000:
> 
> 1. Emulate the device at L(n) for L(n+1)
> 
> qemu-system-ppc64 -netdev type=user,id=net0 -device e1000,netdev=net0
> 
> 2. Assign the VFIO-PCI driver at L(n+1)
> 
> echo :00:00.0 > /sys/bus/pci/drivers/e1000/unbind
> echo :00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
> chmod 666 /dev/vfio/0
> 
> 3. Pass the device through from L(n+1) to L(n+2)
> 
> qemu-system-ppc64 -device vfio-pci,host=:00:00.0
> 
> 4. L(n+2) can now access the device which will be emulated at L(n)

Note,
[PATCH] KVM: PPC: Book3S PR: Set hflag to indicate that POWER9 supports
1T segments
is not supposed to be part of this series

> 
> Suraj Jitindar Singh (8):
>   KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
>   KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()
>   KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2
>   KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops
> struct
>   KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants
>   KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an
> L2
> guest
>   KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access
> quadrants
> 1 & 2
>   KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an
> L3
> guest
> 
>  arch/powerpc/include/asm/hvcall.h|   1 +
>  arch/powerpc/include/asm/kvm_book3s.h|  10 ++-
>  arch/powerpc/include/asm/kvm_book3s_64.h |  13 
>  arch/powerpc/include/asm/kvm_host.h  |   3 +
>  arch/powerpc/include/asm/kvm_ppc.h   |   4 ++
>  arch/powerpc/kernel/exceptions-64s.S |   9 +++
>  arch/powerpc/kvm/book3s_64_mmu_radix.c   |  97
> ++
>  arch/powerpc/kvm/book3s_hv.c |  58 ++--
>  arch/powerpc/kvm/book3s_hv_nested.c  | 114
> +--
>  arch/powerpc/kvm/powerpc.c   |  28 +++-
>  arch/powerpc/mm/fault.c  |   1 +
>  11 files changed, 323 insertions(+), 15 deletions(-)
>

[PATCH 8/8] KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guest

2018-12-06 Thread Suraj Jitindar Singh

Previously when a device was being emulated by an L1 guest for an L2
guest, that device couldn't then be passed through to an L3 guest. This
was because the L1 guest had no method for accessing L3 memory.

The hcall H_COPY_TOFROM_GUEST provides this access. Thus this setup for
passthrough can now be allowed.

Signed-off-by: Suraj Jitindar Singh 
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 9 -
 arch/powerpc/kvm/book3s_hv_nested.c| 5 -
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index da89d10e5886..cf16e9d207a5 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -37,11 +37,10 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int 
pid,
int old_pid, old_lpid;
bool is_load = !!to;
 
-   /* Can't access quadrants 1 or 2 in non-HV mode */
-   if (kvmhv_on_pseries()) {
-   /* TODO h-call */
-   return -EPERM;
-   }
+   /* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */
+   if (kvmhv_on_pseries())
+   return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr,
+ to, from, n);
 
quadrant = 1;
if (!pid)
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index f54301fcfbe4..acde90eb56f7 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1284,11 +1284,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_run 
*run,
}
 
/* passthrough of emulated MMIO case */
-   if (kvmhv_on_pseries()) {
-   pr_err("emulated MMIO passthrough?\n");
-   return -EINVAL;
-   }
-
return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea, writing);
}
if (memslot->flags & KVM_MEM_READONLY) {
-- 
2.13.6

[PATCH 7/8] KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2

2018-12-06 Thread Suraj Jitindar Singh

A guest cannot access quadrants 1 or 2 as this would result in an
exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used by a
guest when it wants to perform an access to quadrants 1 or 2, for
example when it wants to access memory for one of its nested guests.

Also provide an implementation for the kvm-hv module.

Signed-off-by: Suraj Jitindar Singh 
---
 arch/powerpc/include/asm/hvcall.h  |  1 +
 arch/powerpc/include/asm/kvm_book3s.h  |  4 ++
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  7 ++--
 arch/powerpc/kvm/book3s_hv.c   |  6 ++-
 arch/powerpc/kvm/book3s_hv_nested.c| 75 ++
 5 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index 33a4fc891947..463c63a9fcf1 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -335,6 +335,7 @@
 #define H_SET_PARTITION_TABLE  0xF800
 #define H_ENTER_NESTED 0xF804
 #define H_TLB_INVALIDATE   0xF808
+#define H_COPY_TOFROM_GUEST0xF80C
 
 /* Values for 2nd argument to H_SET_MODE */
 #define H_SET_MODE_RESOURCE_SET_CIABR  1
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index ea94110bfde4..720483733bb2 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -188,6 +188,9 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm *kvm, 
unsigned long hc);
 extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run,
struct kvm_vcpu *vcpu,
unsigned long ea, unsigned long dsisr);
+extern unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
+   gva_t eaddr, void *to, void *from,
+   unsigned long n);
 extern long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr,
void *to, unsigned long n);
 extern long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr,
@@ -302,6 +305,7 @@ long kvmhv_nested_init(void);
 void kvmhv_nested_exit(void);
 void kvmhv_vm_nested_init(struct kvm *kvm);
 long kvmhv_set_partition_table(struct kvm_vcpu *vcpu);
+long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu);
 void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c 
b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index e1e3ef710bd0..da89d10e5886 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -29,9 +29,9 @@
  */
 static int p9_supported_radix_bits[4] = { 5, 9, 9, 13 };
 
-static unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
-   gva_t eaddr, void *to, void *from,
-   unsigned long n)
+unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
+ gva_t eaddr, void *to, void *from,
+ unsigned long n)
 {
unsigned long quadrant, ret = n;
int old_pid, old_lpid;
@@ -82,6 +82,7 @@ static unsigned long __kvmhv_copy_tofrom_guest_radix(int 
lpid, int pid,
 
return ret;
 }
+EXPORT_SYMBOL_GPL(__kvmhv_copy_tofrom_guest_radix);
 
 static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr,
  void *to, void *from, unsigned long n)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index e7233499e063..e2e15722584a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -995,7 +995,11 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
if (nesting_enabled(vcpu->kvm))
ret = kvmhv_do_nested_tlbie(vcpu);
break;
-
+   case H_COPY_TOFROM_GUEST:
+   ret = H_FUNCTION;
+   if (nesting_enabled(vcpu->kvm))
+   ret = kvmhv_copy_tofrom_guest_nested(vcpu);
+   break;
default:
return RESUME_HOST;
}
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 991f40ce4eea..f54301fcfbe4 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -462,6 +462,81 @@ long kvmhv_set_partition_table(struct kvm_vcpu *vcpu)
 }
 
 /*
+ * Handle the H_COPY_TOFROM_GUEST hcall.
+ * r4 = L1 lpid of nested guest
+ * r5 = pid
+ * r6 = eaddr to access
+ * r7 = to buffer (L1 gpa)
+ * r8 = from buffer (L1 gpa)
+ * r9 = n bytes to copy
+ */
+long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu)
+{
+   struct kvm_nested_guest *gp;
+   int l1_lpid = kvmppc_get_gpr(vcpu, 4);
+   int pid = kvmppc_get_gpr(vcpu, 5);
+   gva_t eaddr

[PATCH 6/8] KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest

2018-12-06 Thread Suraj Jitindar Singh

Allow for a device which is being emulated at L0 (the host) for an L1
guest to be passed through to a nested (L2) guest.

The existing kvmppc_hv_emulate_mmio function can be used here. The main
challenge is that for a load the result must be stored into the L2 gpr,
not an L1 gpr as would normally be the case after going out to qemu to
complete the operation. This presents a challenge as at this point the
L2 gpr state has been written back into L1 memory.

To work around this we store the address in L1 memory of the L2 gpr
where the result of the load is to be stored and use the new io_gpr
value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load for
which completion must be done when returning back into the kernel. Then
in kvmppc_complete_mmio_load() the resultant value is written into L1
memory at the location of the indicated L2 gpr.

Note that we don't currently let an L1 guest emulate a device for an L2
guest which is then passed through to an L3 guest.

Signed-off-by: Suraj Jitindar Singh 
---
 arch/powerpc/include/asm/kvm_book3s.h |  2 +-
 arch/powerpc/include/asm/kvm_host.h   |  3 +++
 arch/powerpc/kvm/book3s_hv.c  | 12 ++
 arch/powerpc/kvm/book3s_hv_nested.c   | 43 ++-
 arch/powerpc/kvm/powerpc.c|  4 
 5 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 5883fcce7009..ea94110bfde4 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -311,7 +311,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu,
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
 void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu,
   struct hv_guest_state *hr);
-long int kvmhv_nested_page_fault(struct kvm_vcpu *vcpu);
+long int kvmhv_nested_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu);
 
 void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong fac);
 
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index fac6f631ed29..7a2483a139cf 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -793,6 +793,7 @@ struct kvm_vcpu_arch {
/* For support of nested guests */
struct kvm_nested_guest *nested;
u32 nested_vcpu_id;
+   gpa_t nested_io_gpr;
 #endif
 
 #ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING
@@ -827,6 +828,8 @@ struct kvm_vcpu_arch {
 #define KVM_MMIO_REG_FQPR  0x00c0
 #define KVM_MMIO_REG_VSX   0x0100
 #define KVM_MMIO_REG_VMX   0x0180
+#define KVM_MMIO_REG_NESTED_GPR0xffc0
+
 
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6c8b4f632168..e7233499e063 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -984,6 +984,10 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
if (ret == H_INTERRUPT) {
kvmppc_set_gpr(vcpu, 3, 0);
return -EINTR;
+   } else if (ret == H_TOO_HARD) {
+   kvmppc_set_gpr(vcpu, 3, 0);
+   vcpu->arch.hcall_needed = 0;
+   return RESUME_HOST;
}
break;
case H_TLB_INVALIDATE:
@@ -1335,7 +1339,7 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
return r;
 }
 
-static int kvmppc_handle_nested_exit(struct kvm_vcpu *vcpu)
+static int kvmppc_handle_nested_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu)
 {
int r;
int srcu_idx;
@@ -1393,7 +1397,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu 
*vcpu)
 */
case BOOK3S_INTERRUPT_H_DATA_STORAGE:
srcu_idx = srcu_read_lock(>kvm->srcu);
-   r = kvmhv_nested_page_fault(vcpu);
+   r = kvmhv_nested_page_fault(run, vcpu);
srcu_read_unlock(>kvm->srcu, srcu_idx);
break;
case BOOK3S_INTERRUPT_H_INST_STORAGE:
@@ -1403,7 +1407,7 @@ static int kvmppc_handle_nested_exit(struct kvm_vcpu 
*vcpu)
if (vcpu->arch.shregs.msr & HSRR1_HISI_WRITE)
vcpu->arch.fault_dsisr |= DSISR_ISSTORE;
srcu_idx = srcu_read_lock(>kvm->srcu);
-   r = kvmhv_nested_page_fault(vcpu);
+   r = kvmhv_nested_page_fault(run, vcpu);
srcu_read_unlock(>kvm->srcu, srcu_idx);
break;
 
@@ -4058,7 +4062,7 @@ int kvmhv_run_single_vcpu(struct kvm_run *kvm_run,
if (!nested)
r = kvmppc_handle_exit_hv(kvm_run, vcpu, current);
else
-   r = kvmppc_handle_nested_exit(vcpu);
+   r = kvmppc_handle_nested_exit(kvm_run, vcpu);
}
vcpu->arch.ret

[PATCH 5/8] KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants

2018-12-06 Thread Suraj Jitindar Singh

The functions kvmppc_st and kvmppc_ld are used to access guest memory
from the host using a guest effective address. They do so by translating
through the process table to obtain a guest real address and then using
kvm_read_guest or kvm_write_guest to make the access with the guest real
address.

This method of access however only works for L1 guests and will give the
incorrect results for a nested guest.

We can however use the store_to_eaddr and load_from_eaddr kvmppc_ops to
perform the access for a nested guesti (and a L1 guest). So attempt this
method first and fall back to the old method if this fails and we aren't
running a nested guest.

At this stage there is no fall back method to perform the access for a
nested guest and this is left as a future improvement. For now we will
return to the nested guest and rely on the fact that a translation
should be faulted in before retrying the access.

Signed-off-by: Suraj Jitindar Singh 
---
 arch/powerpc/kvm/powerpc.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 95859c53a5cd..cb029fcab404 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -331,10 +331,17 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int 
size, void *ptr,
 {
ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK;
struct kvmppc_pte pte;
-   int r;
+   int r = -EINVAL;
 
vcpu->stat.st++;
 
+   if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->store_to_eaddr)
+   r = vcpu->kvm->arch.kvm_ops->store_to_eaddr(vcpu, eaddr, ptr,
+   size);
+
+   if ((!r) || (r == -EAGAIN))
+   return r;
+
r = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST,
 XLATE_WRITE, );
if (r < 0)
@@ -367,10 +374,17 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int 
size, void *ptr,
 {
ulong mp_pa = vcpu->arch.magic_page_pa & KVM_PAM & PAGE_MASK;
struct kvmppc_pte pte;
-   int rc;
+   int rc = -EINVAL;
 
vcpu->stat.ld++;
 
+   if (vcpu->kvm->arch.kvm_ops && vcpu->kvm->arch.kvm_ops->load_from_eaddr)
+   rc = vcpu->kvm->arch.kvm_ops->load_from_eaddr(vcpu, eaddr, ptr,
+ size);
+
+   if ((!rc) || (rc == -EAGAIN))
+   return rc;
+
rc = kvmppc_xlate(vcpu, *eaddr, data ? XLATE_DATA : XLATE_INST,
  XLATE_READ, );
if (rc)
-- 
2.13.6

[PATCH 4/8] KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops struct

2018-12-06 Thread Suraj Jitindar Singh

The kvmppc_ops struct is used to store function pointers to kvm
implementation specific functions.

Introduce two new functions load_from_eaddr and store_to_eaddr to be
used to load from and store to a guest effective address respectively.

Also implement these for the kvm-hv module. If we are using the radix
mmu then we can call the functions to access quadrant 1 and 2.

Signed-off-by: Suraj Jitindar Singh 
---
 arch/powerpc/include/asm/kvm_ppc.h |  4 
 arch/powerpc/kvm/book3s_hv.c   | 40 ++
 2 files changed, 44 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 9b89b1918dfc..159dd76700cb 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -326,6 +326,10 @@ struct kvmppc_ops {
unsigned long flags);
void (*giveup_ext)(struct kvm_vcpu *vcpu, ulong msr);
int (*enable_nested)(struct kvm *kvm);
+   int (*load_from_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
+  int size);
+   int (*store_to_eaddr)(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
+ int size);
 };
 
 extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d65b961661fb..6c8b4f632168 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5213,6 +5213,44 @@ static int kvmhv_enable_nested(struct kvm *kvm)
return 0;
 }
 
+static int kvmhv_load_from_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void 
*ptr,
+int size)
+{
+   int rc = -EINVAL;
+
+   if (kvmhv_vcpu_is_radix(vcpu)) {
+   rc = kvmhv_copy_from_guest_radix(vcpu, *eaddr, ptr, size);
+
+   if (rc > 0)
+   rc = -EINVAL;
+   }
+
+   /* For now quadrants are the only way to access nested guest memory */
+   if (rc && vcpu->arch.nested)
+   rc = -EAGAIN;
+
+   return rc;
+}
+
+static int kvmhv_store_to_eaddr(struct kvm_vcpu *vcpu, ulong *eaddr, void *ptr,
+   int size)
+{
+   int rc = -EINVAL;
+
+   if (kvmhv_vcpu_is_radix(vcpu)) {
+   rc = kvmhv_copy_to_guest_radix(vcpu, *eaddr, ptr, size);
+
+   if (rc > 0)
+   rc = -EINVAL;
+   }
+
+   /* For now quadrants are the only way to access nested guest memory */
+   if (rc && vcpu->arch.nested)
+   rc = -EAGAIN;
+
+   return rc;
+}
+
 static struct kvmppc_ops kvm_ops_hv = {
.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
@@ -5253,6 +5291,8 @@ static struct kvmppc_ops kvm_ops_hv = {
.get_rmmu_info = kvmhv_get_rmmu_info,
.set_smt_mode = kvmhv_set_smt_mode,
.enable_nested = kvmhv_enable_nested,
+   .load_from_eaddr = kvmhv_load_from_eaddr,
+   .store_to_eaddr = kvmhv_store_to_eaddr,
 };
 
 static int kvm_init_subcore_bitmap(void)
-- 
2.13.6

[PATCH 3/8] KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2

2018-12-06 Thread Suraj Jitindar Singh

The POWER9 radix mmu has the concept of quadrants. The quadrant number
is the two high bits of the effective address and determines the fully
qualified address to be used for the translation. The fully qualified
address consists of the effective lpid, the effective pid and the
effective address. This gives then 4 possible quadrants 0, 1, 2, and 3.

When accessing these quadrants the fully qualified address is obtained
as follows:

Quadrant| Hypervisor| Guest
--
| EA[0:1] = 0xb00   | EA[0:1] = 0xb00
0   | effLPID = 0   | effLPID = LPIDR
| effPID  = PIDR| effPID  = PIDR
--
| EA[0:1] = 0xb01   |
1   | effLPID = LPIDR   | Invalid Access
| effPID  = PIDR|
--
| EA[0:1] = 0xb10   |
2   | effLPID = LPIDR   | Invalid Access
| effPID  = 0   |
--
| EA[0:1] = 0xb11   | EA[0:1] = 0xb11
3   | effLPID = 0   | effLPID = LPIDR
| effPID  = 0   | effPID  = 0
--

In the Guest;
Quadrant 3 is normally used to address the operating system since this
uses effPID=0 and effLPID=LPIDR, meaning the PID register doesn't need to
be switched.
Quadrant 0 is normally used to address user space since the effLPID and
effPID are taken from the corresponding registers.

In the Host;
Quadrant 0 and 3 are used as above, however the effLPID is always 0 to
address the host.

Quadrants 1 and 2 can be used by the host to address guest memory using
a guest effective address. Since the effLPID comes from the LPID register,
the host loads the LPID of the guest it would like to access (and the
PID of the process) and can perform accesses to a guest effective
address.

This means quadrant 1 can be used to address the guest user space and
quadrant 2 can be used to address the guest operating system from the
hypervisor, using a guest effective address.

Access to the quadrants can cause a Hypervisor Data Storage Interrupt
(HDSI) due to being unable to perform partition scoped translation.
Previously this could only be generated from a guest and so the code
path expects us to take the KVM trampoline in the interrupt handler.
This is no longer the case so we modify the handler to call
bad_page_fault() to check if we were expecting this fault so we can
handle it gracefully and just return with an error code. In the hash mmu
case we still raise an unknown exception since quadrants aren't defined
for the hash mmu.

Signed-off-by: Suraj Jitindar Singh 
---
 arch/powerpc/include/asm/kvm_book3s.h  |  4 ++
 arch/powerpc/kernel/exceptions-64s.S   |  9 
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 97 ++
 arch/powerpc/mm/fault.c|  1 +
 4 files changed, 111 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index 09f8e9ba69bc..5883fcce7009 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -188,6 +188,10 @@ extern int kvmppc_book3s_hcall_implemented(struct kvm 
*kvm, unsigned long hc);
 extern int kvmppc_book3s_radix_page_fault(struct kvm_run *run,
struct kvm_vcpu *vcpu,
unsigned long ea, unsigned long dsisr);
+extern long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr,
+   void *to, unsigned long n);
+extern long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr,
+ void *from, unsigned long n);
 extern int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr,
  struct kvmppc_pte *gpte, u64 root,
  u64 *pte_ret_p);
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 89d32bb79d5e..db2691ff4c0b 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -995,7 +995,16 @@ EXC_COMMON_BEGIN(h_data_storage_common)
bl  save_nvgprs
RECONCILE_IRQ_STATE(r10, r11)
addir3,r1,STACK_FRAME_OVERHEAD
+BEGIN_MMU_FTR_SECTION
+   ld  r4,PACA_EXGEN+EX_DAR(r13)
+   lwz r5,PACA_EXGEN+EX_DSISR(r13)
+   std r4,_DAR(r1)
+   std r5,_DSISR(r1)
+   li  r5,SIGSEGV
+   bl  bad_page_fault
+MMU_FTR_SECTION_ELSE
bl  unknown_exception

[PATCH 2/8] KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()

2018-12-06 Thread Suraj Jitindar Singh

There exists a function kvm_is_radix() which is used to determine if a
kvm instance is using the radix mmu. However this only applies to the
first level (L1) guest. Add a function kvmhv_vcpu_is_radix() which can
be used to determine if the current execution context of the vcpu is
radix, accounting for if the vcpu is running a nested guest.

Currently all nested guests must be radix but this may change in the
future.

Signed-off-by: Suraj Jitindar Singh 
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 13 +
 arch/powerpc/kvm/book3s_hv_nested.c  |  1 +
 2 files changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 6d298145d564..7a9e472f2872 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -55,6 +55,7 @@ struct kvm_nested_guest {
cpumask_t need_tlb_flush;
cpumask_t cpu_in_guest;
short prev_cpu[NR_CPUS];
+   u8 radix;   /* is this nested guest radix */
 };
 
 /*
@@ -150,6 +151,18 @@ static inline bool kvm_is_radix(struct kvm *kvm)
return kvm->arch.radix;
 }
 
+static inline bool kvmhv_vcpu_is_radix(struct kvm_vcpu *vcpu)
+{
+   bool radix;
+
+   if (vcpu->arch.nested)
+   radix = vcpu->arch.nested->radix;
+   else
+   radix = kvm_is_radix(vcpu->kvm);
+
+   return radix;
+}
+
 #define KVM_DEFAULT_HPT_ORDER  24  /* 16MB HPT by default */
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index 401d2ecbebc5..4fca462e54c4 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -480,6 +480,7 @@ struct kvm_nested_guest *kvmhv_alloc_nested(struct kvm 
*kvm, unsigned int lpid)
if (shadow_lpid < 0)
goto out_free2;
gp->shadow_lpid = shadow_lpid;
+   gp->radix = 1;
 
memset(gp->prev_cpu, -1, sizeof(gp->prev_cpu));
 
-- 
2.13.6

[PATCH 1/8] KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines

2018-12-06 Thread Suraj Jitindar Singh

The kvm capability KVM_CAP_SPAPR_TCE_VFIO is used to indicate the
availability of in kernel tce acceleration for vfio. However it is
currently the case that this is only available on a powernv machine,
not for a pseries machine.

Thus make this capability dependent on having the cpu feature
CPU_FTR_HVMODE.

Signed-off-by: Suraj Jitindar Singh 
---
 arch/powerpc/kvm/powerpc.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2869a299c4ed..95859c53a5cd 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -496,6 +496,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
int r;
/* Assume we're using HV mode when the HV module is loaded */
int hv_enabled = kvmppc_hv_ops ? 1 : 0;
+   int kvm_on_pseries = !cpu_has_feature(CPU_FTR_HVMODE);
 
if (kvm) {
/*
@@ -543,8 +544,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 #ifdef CONFIG_PPC_BOOK3S_64
case KVM_CAP_SPAPR_TCE:
case KVM_CAP_SPAPR_TCE_64:
-   /* fallthrough */
+   r = 1;
+   break;
case KVM_CAP_SPAPR_TCE_VFIO:
+   r = !kvm_on_pseries;
+   break;
case KVM_CAP_PPC_RTAS:
case KVM_CAP_PPC_FIXUP_HCALL:
case KVM_CAP_PPC_ENABLE_HCALL:
-- 
2.13.6

[PATCH] KVM: PPC: Book3S PR: Set hflag to indicate that POWER9 supports 1T segments

2018-12-06 Thread Suraj Jitindar Singh

When booting a kvm-pr guest on a POWER9 machine the following message is
observed:
"qemu-system-ppc64: KVM does not support 1TiB segments which guest expects"

This is because the guest is expecting to be able to use 1T segments
however we don't indicate support for it. This is because we don't set
the BOOK3S_HFLAG_MULTI_PGSIZE flag in the hflags in kvmppc_set_pvr_pr()
on POWER9.

POWER9 does indeed have support for 1T segments, so add a case for
POWER9 to the switch statement to ensure it is set.

Signed-off-by: Suraj Jitindar Singh 
---
 arch/powerpc/kvm/book3s_pr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 4efd65d9e828..82840160c606 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -587,6 +587,7 @@ void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr)
case PVR_POWER8:
case PVR_POWER8E:
case PVR_POWER8NVL:
+   case PVR_POWER9:
vcpu->arch.hflags |= BOOK3S_HFLAG_MULTI_PGSIZE |
BOOK3S_HFLAG_NEW_TLBIE;
break;
-- 
2.13.6

[PATCH 0/8] KVM: PPC: Implement passthrough of emulated devices for nested guests

2018-12-06 Thread Suraj Jitindar Singh

This patch series allows for emulated devices to be passed through to nested
guests, irrespective of at which level the device is being emulated.

Note that the emulated device must be using dma, not virtio.

For example, passing through an emulated e1000:

1. Emulate the device at L(n) for L(n+1)

qemu-system-ppc64 -netdev type=user,id=net0 -device e1000,netdev=net0

2. Assign the VFIO-PCI driver at L(n+1)

echo :00:00.0 > /sys/bus/pci/drivers/e1000/unbind
echo :00:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
chmod 666 /dev/vfio/0

3. Pass the device through from L(n+1) to L(n+2)

qemu-system-ppc64 -device vfio-pci,host=:00:00.0

4. L(n+2) can now access the device which will be emulated at L(n)

Suraj Jitindar Singh (8):
  KVM: PPC: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
  KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()
  KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2
  KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops
struct
  KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants
  KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2
guest
  KVM: PPC: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants
1 & 2
  KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3
guest

 arch/powerpc/include/asm/hvcall.h|   1 +
 arch/powerpc/include/asm/kvm_book3s.h|  10 ++-
 arch/powerpc/include/asm/kvm_book3s_64.h |  13 
 arch/powerpc/include/asm/kvm_host.h  |   3 +
 arch/powerpc/include/asm/kvm_ppc.h   |   4 ++
 arch/powerpc/kernel/exceptions-64s.S |   9 +++
 arch/powerpc/kvm/book3s_64_mmu_radix.c   |  97 ++
 arch/powerpc/kvm/book3s_hv.c |  58 ++--
 arch/powerpc/kvm/book3s_hv_nested.c  | 114 +--
 arch/powerpc/kvm/powerpc.c   |  28 +++-
 arch/powerpc/mm/fault.c  |   1 +
 11 files changed, 323 insertions(+), 15 deletions(-)

-- 
2.13.6

Re: [PATCH] powerpc/ipic: Fix a bounds check in ipic_set_priority()

2018-12-06 Thread Michael Ellerman

Christophe LEROY  writes:
> Le 05/12/2018 à 04:26, Michael Ellerman a écrit :
>> Hi Dan,
>> 
>> Thanks for the patch.
>> 
>> Dan Carpenter  writes:
>>> The ipic_info[] array only has 95 elements so I have made the bounds
>>> check smaller to prevent a read overflow.  It was Smatch that found
>>> this issue:
>>>
>>>  arch/powerpc/sysdev/ipic.c:784 ipic_set_priority()
>>>  error: buffer overflow 'ipic_info' 95 <= 127
>>>
>>> Signed-off-by: Dan Carpenter 
>>> ---
>>> I wasn't able to find any callers of this code.  Maybe we removed the
>>> last one in commit b9f0f1bb2bca ("[POWERPC] Adapt ipic driver to new
>>> host_ops interface, add set_irq_type to set IRQ sense").  So perhaps we
>>> should just remove it.  I'm not really comfortable doing that myself,
>>> because I don't know the code well enough and can't build test
>>> it properly.
>> 
>> Hah wow, last usage removed in 2006!
>> 
>> I don't see any mention of it since then, so I'll remove it. If it
>> breaks something we can put it back.
>> 
>> Can smatch help us find things like this that are defined non-static but
>> never used?
>> 
>
> I think we have to do that carrefully. Some of those functions might be 
> used by out-of-tree boards.

We don't keep unused code around for out-of-tree boards.

Either the out-of-tree code should be merged upstream, or you can
maintain whatever extra functions you need as part of your out-of-tree
code base.

> I'm thinking especially at ipic_get_mcp_status() and 
> ipic_set_mcp_status(). They are used in my 832x boards's machine check 
> handler to know when a machine check is a timeout from the 832x watchdog.

Thanks for pointing them out, I'll send a patch to remove them :)

But seriously, why is your machine check code not in-tree, is there some
reason you can't merge it?

cheers

Re: linux-next: manual merge of the akpm-current tree with the powerpc tree

2018-12-06 Thread Joel Fernandes

On Thu, Dec 06, 2018 at 05:44:17PM +1100, Stephen Rothwell wrote:
> Hi Andrew,
> 
> Today's linux-next merge of the akpm-current tree got conflicts in:
> 
>   arch/powerpc/include/asm/book3s/32/pgalloc.h
>   arch/powerpc/include/asm/nohash/32/pgalloc.h
>   arch/powerpc/mm/pgtable-book3s64.c
> 
> between commits:
> 
>   a95d133c8643 ("powerpc/mm: Move pte_fragment_alloc() to a common location")
>   32ea4c149990 ("powerpc/mm: Extend pte_fragment functionality to PPC32")
> 
> from the powerpc tree and commit:
> 
>   913c2d755b39 ("mm: treewide: remove unused address argument from pte_alloc 
> functions")
> 
> from the akpm-current tree.
> 
> I fixed it up (see below, plus the extra merge fix patch) and can
> carry the fix as necessary. This is now fixed as far as linux-next is
> concerned, but any non trivial conflicts should be mentioned to your
> upstream maintainer when your tree is submitted for merging.  You may
> also want to consider cooperating with the maintainer of the conflicting
> tree to minimise any particularly complex conflicts.

The conflict resolution looks good to me.

Reviewed-by: Joel Fernandes (Google) 

thanks,

 - Joel

Re: [PATCH v4] powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call

2018-12-06 Thread Dmitry V. Levin

On Mon, Dec 03, 2018 at 06:18:23AM +0300, Dmitry V. Levin wrote:
> From: Elvira Khabirova 
> 
> Arch code should use tracehook_*() helpers, as documented
> in include/linux/tracehook.h,
> ptrace_report_syscall() is not expected to be used outside that file.
> 
> Co-authored-by: Dmitry V. Levin 
> Fixes: 5521eb4bca2d ("powerpc/ptrace: Add support for PTRACE_SYSEMU")
> Signed-off-by: Elvira Khabirova 
> Signed-off-by: Dmitry V. Levin 
> ---
> v4: rewritten to call tracehook_report_syscall_entry() once, compile-tested
> v3: add a descriptive comment
> v2: explicitly ignore tracehook_report_syscall_entry() return code
> 
>  arch/powerpc/kernel/ptrace.c | 54 +++-
>  1 file changed, 35 insertions(+), 19 deletions(-)

Sorry, this patch does not work, please ignore it.
However, the bug blocks PTRACE_GET_SYSCALL_INFO, so please fix it.

I'm going to use
if (tracehook_report_syscall_entry(regs))
return -1;
return -1;
in the series until you have a better fix.


-- 
ldv


signature.asc
Description: PGP signature

Re: [PATCH v2 18/34] dt-bindings: arm: Convert FSL board/soc bindings to json-schema

2018-12-06 Thread Rob Herring

On Wed, Dec 5, 2018 at 8:32 PM Shawn Guo  wrote:
>
> On Mon, Dec 03, 2018 at 03:32:07PM -0600, Rob Herring wrote:
> > Convert Freescale SoC bindings to DT schema format using json-schema.
> >
> > Cc: Shawn Guo 
> > Cc: Mark Rutland 
> > Cc: devicet...@vger.kernel.org
> > Signed-off-by: Rob Herring 
> > ---
> >  .../devicetree/bindings/arm/armadeus.txt  |   6 -
> >  Documentation/devicetree/bindings/arm/bhf.txt |   6 -
> >  .../bindings/arm/compulab-boards.txt  |  25 --
> >  Documentation/devicetree/bindings/arm/fsl.txt | 229 --
> >  .../devicetree/bindings/arm/fsl.yaml  | 214 
>
> Rob,
>
> I do have any changes on bindings/arm/fsl.txt queued for 4.21 on my
> tree, so please send it via your tree.

What about:

c386f362957b dt-bindings: Add compatible string for LS1028A-QDS
3671cd57de06 dt-bindings: ls1012a: Add FRWY-LS1012A device tree binding

>
> Acked-by: Shawn Guo

Re: [PATCH v2 30/34] dt-bindings: arm: Convert Tegra board/soc bindings to json-schema

2018-12-06 Thread Rob Herring

On Tue, Dec 4, 2018 at 2:50 AM Thierry Reding  wrote:
>
> On Mon, Dec 03, 2018 at 03:32:19PM -0600, Rob Herring wrote:
> > Convert Tegra SoC bindings to DT schema format using json-schema.
> >
> > Cc: Mark Rutland 
> > Cc: Thierry Reding 
> > Cc: Jonathan Hunter 
> > Cc: devicet...@vger.kernel.org
> > Cc: linux-te...@vger.kernel.org
> > Signed-off-by: Rob Herring 
> > ---
> >  .../devicetree/bindings/arm/tegra.txt |  65 ---
> >  .../devicetree/bindings/arm/tegra.yaml| 101 ++
> >  2 files changed, 101 insertions(+), 65 deletions(-)
> >  delete mode 100644 Documentation/devicetree/bindings/arm/tegra.txt
> >  create mode 100644 Documentation/devicetree/bindings/arm/tegra.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/arm/tegra.txt 
> > b/Documentation/devicetree/bindings/arm/tegra.txt
> > deleted file mode 100644
> > index c59b15f64346..
> > --- a/Documentation/devicetree/bindings/arm/tegra.txt
> > +++ /dev/null
> > @@ -1,65 +0,0 @@
> > -NVIDIA Tegra device tree bindings
> > 
> > -
> > -SoCs
> > 
> > -
> > -Each device tree must specify which Tegra SoC it uses, using one of the
> > -following compatible values:
> > -
> > -  nvidia,tegra20
> > -  nvidia,tegra30
> > -  nvidia,tegra114
> > -  nvidia,tegra124
> > -  nvidia,tegra132
> > -  nvidia,tegra210
> > -  nvidia,tegra186
> > -  nvidia,tegra194
> > -
> > -Boards
> > 
> > -
> > -Each device tree must specify which one or more of the following
> > -board-specific compatible values:
> > -
> > -  ad,medcom-wide
> > -  ad,plutux
> > -  ad,tamonten
> > -  ad,tec
> > -  compal,paz00
> > -  compulab,trimslice
> > -  nvidia,beaver
> > -  nvidia,cardhu
> > -  nvidia,cardhu-a02
> > -  nvidia,cardhu-a04
> > -  nvidia,dalmore
> > -  nvidia,harmony
> > -  nvidia,jetson-tk1
> > -  nvidia,norrin
> > -  nvidia,p2371-
> > -  nvidia,p2371-2180
> > -  nvidia,p2571
> > -  nvidia,p2771-
> > -  nvidia,p2972-
> > -  nvidia,roth
> > -  nvidia,seaboard
> > -  nvidia,tn7
> > -  nvidia,ventana
> > -  toradex,apalis_t30
> > -  toradex,apalis_t30-eval
> > -  toradex,apalis_t30-v1.1
> > -  toradex,apalis_t30-v1.1-eval
> > -  toradex,apalis-tk1
> > -  toradex,apalis-tk1-eval
> > -  toradex,apalis-tk1-v1.2
> > -  toradex,apalis-tk1-v1.2-eval
> > -  toradex,colibri_t20
> > -  toradex,colibri_t20-eval-v3
> > -  toradex,colibri_t20-iris
> > -  toradex,colibri_t30
> > -  toradex,colibri_t30-eval-v3
> > -
> > -Trusted Foundations
> > 
> > -Tegra supports the Trusted Foundation secure monitor. See the
> > -"tlm,trusted-foundations" binding's documentation for more details.
> > diff --git a/Documentation/devicetree/bindings/arm/tegra.yaml 
> > b/Documentation/devicetree/bindings/arm/tegra.yaml
> > new file mode 100644
> > index ..66493892ffc1
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/arm/tegra.yaml
> > @@ -0,0 +1,101 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/arm/tegra.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
>
> Could you explain what these are? They give 404, so I assume they are
> more like placeholders and not actually used?

Please read the documentation in patch 2. They are used, but aren't live URLs.

> > +
> > +title: NVIDIA Tegra device tree bindings
> > +
> > +maintainers:
> > +  - Marcel Ziswiler 
> > +  - Peter De Schrijver 
>
> Not sure how you got that list, but probably from git history. I think
> it makes sense to replace Marcel and Peter with Jon and myself.

Will do.

> Other than that, looks fine to me. Some of the enumerations below look
> somewhat hard to parse, but I suspect that it will become second nature
> in no time.

We can add descriptions and/or comments if that helps. I didn't add
them if not already there.

> Acked-by: Thierry Reding 

Thanks.

Rob

Re: [PATCH V3 1/5] mm: Update ptep_modify_prot_start/commit to take vm_area_struct as arg

2018-12-06 Thread kbuild test robot

Hi Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.20-rc5 next-20181206]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/NestMMU-pte-upgrade-workaround-for-mprotect/20181207-040417
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/msr.h:246:0,
from arch/x86/include/asm/processor.h:21,
from arch/x86/include/asm/cpufeature.h:8,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:81,
from include/linux/spinlock.h:51,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/slab.h:15,
from include/linux/crypto.h:24,
from arch/x86/kernel/asm-offsets.c:9:
   arch/x86/include/asm/paravirt.h: In function 'ptep_modify_prot_start':
>> arch/x86/include/asm/paravirt.h:424:28: error: dereferencing pointer to 
>> incomplete type 'struct vm_area_struct'
 struct mm_struct *mm = vma->vm_mm;
   ^~
   make[2]: *** [arch/x86/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +424 arch/x86/include/asm/paravirt.h

   418  
   419  #define  __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION
   420  static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma, 
unsigned long addr,
   421 pte_t *ptep)
   422  {
   423  pteval_t ret;
 > 424  struct mm_struct *mm = vma->vm_mm;
   425  
   426  ret = PVOP_CALL3(pteval_t, mmu.ptep_modify_prot_start, mm, 
addr, ptep);
   427  
   428  return (pte_t) { .pte = ret };
   429  }
   430  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH v2 5/6] arch: simplify several early memory allocations

2018-12-06 Thread Mike Rapoport

On Thu, Dec 06, 2018 at 07:08:26PM +0100, Sam Ravnborg wrote:
> On Mon, Dec 03, 2018 at 06:49:21PM +0200, Mike Rapoport wrote:
> > On Mon, Dec 03, 2018 at 05:29:08PM +0100, Sam Ravnborg wrote:
> > > Hi Mike.
> > > 
> > > > index c37955d..2a17665 100644
> > > > --- a/arch/sparc/kernel/prom_64.c
> > > > +++ b/arch/sparc/kernel/prom_64.c
> > > > @@ -34,16 +34,13 @@
> > > >  
> > > >  void * __init prom_early_alloc(unsigned long size)
> > > >  {
> > > > -   unsigned long paddr = memblock_phys_alloc(size, 
> > > > SMP_CACHE_BYTES);
> > > > -   void *ret;
> > > > +   void *ret = memblock_alloc(size, SMP_CACHE_BYTES);
> > > >  
> > > > -   if (!paddr) {
> > > > +   if (!ret) {
> > > > prom_printf("prom_early_alloc(%lu) failed\n", size);
> > > > prom_halt();
> > > > }
> > > >  
> > > > -   ret = __va(paddr);
> > > > -   memset(ret, 0, size);
> > > > prom_early_allocated += size;
> > > >  
> > > > return ret;
> > > 
> > > memblock_alloc() calls memblock_alloc_try_nid().
> > > And if allocation fails then memblock_alloc_try_nid() calls panic().
> > > So will we ever hit the prom_halt() code?
> > 
> > memblock_phys_alloc_try_nid() also calls panic if an allocation fails. So
> > in either case we never reach prom_halt() code.
> 
> So we have code here we never reach - not nice.
> If the idea is to avoid relying on the panic inside memblock_alloc() then
> maybe replace it with a variant that do not call panic?
> To make it clear what happens.

My plan is to completely remove memblock variants that call panic() and
make the callers check the return value.

I've started to work on it, but with the holidays it progresses slower than
I'd like to.

Since the code here was unreachable for several year, a few more weeks
won't make real difference so I'd prefer to keep the variant with panic()
for now. 
 
>   Sam
> 

-- 
Sincerely yours,
Mike.

Re: [PATCH v2 14/34] dt-bindings: arm: Convert Amlogic board/soc bindings to json-schema

2018-12-06 Thread Rob Herring

On Tue, Dec 4, 2018 at 8:44 AM Rob Herring  wrote:
>
> On Tue, Dec 4, 2018 at 2:39 AM Neil Armstrong  wrote:
> >
> > Hi Rob,
> >
> > You forgot linux-amlogic in CC...
> >
> > On 03/12/2018 22:32, Rob Herring wrote:
> > > Convert Amlogic SoC bindings to DT schema format using json-schema.
> > >
> > > Cc: Carlo Caione 
> > > Cc: Kevin Hilman 
> > > Cc: Mark Rutland 
> > > Cc: devicet...@vger.kernel.org
> > > Signed-off-by: Rob Herring 
> > > ---
>
> [...]
>
> > > +  - items:
> > > +  - enum:
> > > +  - amlogic,s400
> > > +  - const: amlogic,a113d
> > > +  - const: amlogic,meson-axg
> > > +  - items:
> > > +  - enum:
> > > +  - amlogic,u200
> > > +  - const: amlogic,g12a
> >
> > but all this feels wrong for me.
> >
> > First of all, this yaml description is not human friendly and not intuitive 
> > at all,
> > and secondly with this conversion we loose all the comments about the SoC 
> > family relationship
> > with the compatible strings !
> >
> > I really understand the point to have automated verification, but really 
> > it's a pain to read
> > (I can't imagine newcomers... the actual DT bindings are already hard to 
> > read...) and
> > I feel it will be a real pain to write !
>
> What do you suggest that would be easier? Is it the YAML itself or the
> json-schema vocabulary? For the former, we could use {} and [] to make
> things more json style. But I imagine it is the latter.
>
> There is some learning curve for json-schema and is certainly a
> concern I have, but there would be a learning curve for anything. Our
> choices are use some existing schema language or invent one. All the
> previous efforts (there's been about 5 since 2013) have been inventing
> one, and they've not gone far. There will be far few resources
> available to train people with if we do something custom.
>
> > Can't we mix an "humam text" with a "yaml" part on a same document ? we are 
> > in 2018 (nearly 2019),
> > and it should be easy to extract a yaml description from a text document 
> > without pain and
> > keep all the human description, no ?
>
> Yes. Please go look at the annotated example in patch 2.

How's this?:

  compatible:
oneOf:
  - description: Boards with the Amlogic Meson6 SoC
items:
  - enum:
  - geniatech,atv1200
  - const: amlogic,meson6

  - description: Boards with the Amlogic Meson8 SoC
items:
  - enum:
  - minix,neo-x8
  - const: amlogic,meson8

  - description: Boards with the Amlogic Meson8m2 SoC
items:
  - enum:
  - tronsmart,mxiii-plus
  - const: amlogic,meson8m2

Re: [PATCH v2 2/2] kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops

2018-12-06 Thread Borislav Petkov

On Thu, Dec 06, 2018 at 08:07:40PM +, Christophe Leroy wrote:
> checkpatch.pl reports the following:
> 
>   WARNING: struct kgdb_arch should normally be const
>   #28: FILE: arch/mips/kernel/kgdb.c:397:
>   +struct kgdb_arch arch_kgdb_ops = {
> 
> This report makes sense, as all other ops struct, this
> one should also be const. This patch does the change.
> 
> Cc: Vineet Gupta 
> Cc: Russell King 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Yoshinori Sato 
> Cc: Richard Kuo 
> Cc: Michal Simek 
> Cc: Ralf Baechle 
> Cc: Paul Burton 
> Cc: James Hogan 
> Cc: Ley Foon Tan 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: Rich Felker 
> Cc: "David S. Miller" 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: x...@kernel.org
> Acked-by: Daniel Thompson 
> Acked-by: Paul Burton 
> Signed-off-by: Christophe Leroy 
> ---
>  v2: Added CCs to all maintainers/supporters identified by get_maintainer.pl 
> and Acks from Daniel and Paul.
> 
>  arch/arc/kernel/kgdb.c| 2 +-
>  arch/arm/kernel/kgdb.c| 2 +-
>  arch/arm64/kernel/kgdb.c  | 2 +-
>  arch/h8300/kernel/kgdb.c  | 2 +-
>  arch/hexagon/kernel/kgdb.c| 2 +-
>  arch/microblaze/kernel/kgdb.c | 2 +-
>  arch/mips/kernel/kgdb.c   | 2 +-
>  arch/nios2/kernel/kgdb.c  | 2 +-
>  arch/powerpc/kernel/kgdb.c| 2 +-
>  arch/sh/kernel/kgdb.c | 2 +-
>  arch/sparc/kernel/kgdb_32.c   | 2 +-
>  arch/sparc/kernel/kgdb_64.c   | 2 +-
>  arch/x86/kernel/kgdb.c| 2 +-
>  include/linux/kgdb.h  | 2 +-
>  14 files changed, 14 insertions(+), 14 deletions(-)

...

> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index 8e36f249646e..e7effc02f13c 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -804,7 +804,7 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
> (char *)bpt->saved_instr, BREAK_INSTR_SIZE);
>  }
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* Breakpoint instruction: */
>   .gdb_bpt_instr  = { 0xcc },
>   .flags  = KGDB_HW_BREAKPOINT,

For the x86 bits:

Acked-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Applied "ASoC: Use of_node_name_eq for node name comparisons" to the asoc tree

2018-12-06 Thread Mark Brown

The patch

   ASoC: Use of_node_name_eq for node name comparisons

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 1d52a74ea2300158f87196fa381cde52d98cf4e4 Mon Sep 17 00:00:00 2001
From: Rob Herring 
Date: Wed, 5 Dec 2018 13:50:49 -0600
Subject: [PATCH] ASoC: Use of_node_name_eq for node name comparisons

Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.

For the FSL ASoC card, the full node names appear to be "ssi", "esai",
and "sai", so there's not any reason to use strstr and of_node_name_eq
can be used instead.

Cc: Timur Tabi 
Cc: Nicolin Chen 
Cc: Xiubo Li 
Cc: Fabio Estevam 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: alsa-de...@alsa-project.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring 
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/fsl-asoc-card.c   | 6 +++---
 sound/soc/generic/simple-scu-card.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 44433b20435c..81f2fe2c6d23 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -571,17 +571,17 @@ static int fsl_asoc_card_probe(struct platform_device 
*pdev)
}
 
/* Common settings for corresponding Freescale CPU DAI driver */
-   if (strstr(cpu_np->name, "ssi")) {
+   if (of_node_name_eq(cpu_np, "ssi")) {
/* Only SSI needs to configure AUDMUX */
ret = fsl_asoc_card_audmux_init(np, priv);
if (ret) {
dev_err(>dev, "failed to init audmux\n");
goto asrc_fail;
}
-   } else if (strstr(cpu_np->name, "esai")) {
+   } else if (of_node_name_eq(cpu_np, "esai")) {
priv->cpu_priv.sysclk_id[1] = ESAI_HCKT_EXTAL;
priv->cpu_priv.sysclk_id[0] = ESAI_HCKR_EXTAL;
-   } else if (strstr(cpu_np->name, "sai")) {
+   } else if (of_node_name_eq(cpu_np, "sai")) {
priv->cpu_priv.sysclk_id[1] = FSL_SAI_CLK_MAST1;
priv->cpu_priv.sysclk_id[0] = FSL_SAI_CLK_MAST1;
}
diff --git a/sound/soc/generic/simple-scu-card.c 
b/sound/soc/generic/simple-scu-card.c
index 7ae1901b2f85..656abe2015e1 100644
--- a/sound/soc/generic/simple-scu-card.c
+++ b/sound/soc/generic/simple-scu-card.c
@@ -223,7 +223,7 @@ static int asoc_simple_card_parse_of(struct 
simple_card_data *priv)
i = 0;
for_each_child_of_node(node, np) {
is_fe = false;
-   if (strcmp(np->name, PREFIX "cpu") == 0)
+   if (of_node_name_eq(np, PREFIX "cpu"))
is_fe = true;
 
ret = asoc_simple_card_dai_link_of(np, priv, daifmt, i, is_fe);
-- 
2.19.0.rc2

Re: [PATCH 2/2] kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops

2018-12-06 Thread Christophe LEROY





Le 06/12/2018 à 15:07, Daniel Thompson a écrit :

On Wed, Dec 05, 2018 at 04:41:11AM +, Christophe Leroy wrote:

checkpatch.pl reports the following:

   WARNING: struct kgdb_arch should normally be const
   #28: FILE: arch/mips/kernel/kgdb.c:397:
   +struct kgdb_arch arch_kgdb_ops = {

This report makes sense, as all other ops struct, this
one should also be const. This patch does the change.

Signed-off-by: Christophe Leroy 


Acked-by: Daniel Thompson 

Similar to https://patchwork.kernel.org/patch/10701129/ I would be more
comfortable to see a resend with the relevant arch maintainers
explicitly called out with a Cc: entry here.


Done in v2

And
Acked-by: Paul Burton 

Thanks
Christophe





---
  arch/arc/kernel/kgdb.c| 2 +-
  arch/arm/kernel/kgdb.c| 2 +-
  arch/arm64/kernel/kgdb.c  | 2 +-
  arch/h8300/kernel/kgdb.c  | 2 +-
  arch/hexagon/kernel/kgdb.c| 2 +-
  arch/microblaze/kernel/kgdb.c | 2 +-
  arch/mips/kernel/kgdb.c   | 2 +-
  arch/nios2/kernel/kgdb.c  | 2 +-
  arch/powerpc/kernel/kgdb.c| 2 +-
  arch/sh/kernel/kgdb.c | 2 +-
  arch/sparc/kernel/kgdb_32.c   | 2 +-
  arch/sparc/kernel/kgdb_64.c   | 2 +-
  arch/x86/kernel/kgdb.c| 2 +-
  include/linux/kgdb.h  | 2 +-
  14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 9a3c34af2ae8..bfd04b442e36 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -204,7 +204,7 @@ void kgdb_roundup_cpus(unsigned long flags)
local_irq_disable();
  }
  
-struct kgdb_arch arch_kgdb_ops = {

+const struct kgdb_arch arch_kgdb_ops = {
/* breakpoint instruction: TRAP_S 0x3 */
  #ifdef CONFIG_CPU_BIG_ENDIAN
.gdb_bpt_instr  = {0x78, 0x7e},
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index caa0dbe3dc61..21a6d5958955 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -274,7 +274,7 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
   * and we handle the normal undef case within the do_undefinstr
   * handler.
   */
-struct kgdb_arch arch_kgdb_ops = {
+const struct kgdb_arch arch_kgdb_ops = {
  #ifndef __ARMEB__
.gdb_bpt_instr  = {0xfe, 0xde, 0xff, 0xe7}
  #else /* ! __ARMEB__ */
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a20de58061a8..fe1d1f935b90 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -357,7 +357,7 @@ void kgdb_arch_exit(void)
unregister_die_notifier(_notifier);
  }
  
-struct kgdb_arch arch_kgdb_ops;

+const struct kgdb_arch arch_kgdb_ops;
  
  int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)

  {
diff --git a/arch/h8300/kernel/kgdb.c b/arch/h8300/kernel/kgdb.c
index 1a1d30cb0609..602e478afbd5 100644
--- a/arch/h8300/kernel/kgdb.c
+++ b/arch/h8300/kernel/kgdb.c
@@ -129,7 +129,7 @@ void kgdb_arch_exit(void)
/* Nothing to do */
  }
  
-struct kgdb_arch arch_kgdb_ops = {

+const struct kgdb_arch arch_kgdb_ops = {
/* Breakpoint instruction: trapa #2 */
.gdb_bpt_instr = { 0x57, 0x20 },
  };
diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
index 16c24b22d0b2..f1924d483e78 100644
--- a/arch/hexagon/kernel/kgdb.c
+++ b/arch/hexagon/kernel/kgdb.c
@@ -83,7 +83,7 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
{ "syscall_nr", GDB_SIZEOF_REG, offsetof(struct pt_regs, syscall_nr)},
  };
  
-struct kgdb_arch arch_kgdb_ops = {

+const struct kgdb_arch arch_kgdb_ops = {
/* trap0(#0xDB) 0x0cdb0054 */
.gdb_bpt_instr = {0x54, 0x00, 0xdb, 0x0c},
  };
diff --git a/arch/microblaze/kernel/kgdb.c b/arch/microblaze/kernel/kgdb.c
index 6366f69d118e..130cd0f064ce 100644
--- a/arch/microblaze/kernel/kgdb.c
+++ b/arch/microblaze/kernel/kgdb.c
@@ -143,7 +143,7 @@ void kgdb_arch_exit(void)
  /*
   * Global data
   */
-struct kgdb_arch arch_kgdb_ops = {
+const struct kgdb_arch arch_kgdb_ops = {
  #ifdef __MICROBLAZEEL__
.gdb_bpt_instr = {0x18, 0x00, 0x0c, 0xba}, /* brki r16, 0x18 */
  #else
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index 31eff1bec577..edfdc2ec2d16 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -394,7 +394,7 @@ int kgdb_arch_handle_exception(int vector, int signo, int 
err_code,
return -1;
  }
  
-struct kgdb_arch arch_kgdb_ops = {

+const struct kgdb_arch arch_kgdb_ops = {
  #ifdef CONFIG_CPU_BIG_ENDIAN
.gdb_bpt_instr = { spec_op << 2, 0x00, 0x00, break_op },
  #else
diff --git a/arch/nios2/kernel/kgdb.c b/arch/nios2/kernel/kgdb.c
index 117859122d1c..37b25f844a2d 100644
--- a/arch/nios2/kernel/kgdb.c
+++ b/arch/nios2/kernel/kgdb.c
@@ -165,7 +165,7 @@ void kgdb_arch_exit(void)
/* Nothing to do */
  }
  
-struct kgdb_arch arch_kgdb_ops = {

+const struct kgdb_arch arch_kgdb_ops = {
/* Breakpoint instruction: trap 30 */
.gdb_bpt_instr = { 0xba, 0x6f, 0x3b, 0x00 },
  };
diff --git a/arch/powerpc/kernel/kgdb.c

Re: [PATCH 1/2] mips/kgdb: prepare arch_kgdb_ops for constness

2018-12-06 Thread Christophe LEROY





Le 06/12/2018 à 15:09, Daniel Thompson a écrit :

On Wed, Dec 05, 2018 at 04:41:09AM +, Christophe Leroy wrote:

MIPS is the only architecture modifying arch_kgdb_ops during init.
This patch makes the init static, so that it can be changed to
const in following patch, as recommended by checkpatch.pl

Suggested-by: Paul Burton 
Signed-off-by: Christophe Leroy 


 From my side this is
Acked-by: Daniel Thompson 

Since this is a dependency for the next patch I'd be happy to take via
my tree... but would need an ack from the MIPS guys to do that.


Got an ack from MIPS:
Acked-by: Paul Burton 

Included in commit text v2

Christophe




Daniel.


---
  arch/mips/kernel/kgdb.c | 16 +++-
  1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index eb6c0d582626..31eff1bec577 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -394,18 +394,16 @@ int kgdb_arch_handle_exception(int vector, int signo, int 
err_code,
return -1;
  }
  
-struct kgdb_arch arch_kgdb_ops;

+struct kgdb_arch arch_kgdb_ops = {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+   .gdb_bpt_instr = { spec_op << 2, 0x00, 0x00, break_op },
+#else
+   .gdb_bpt_instr = { break_op, 0x00, 0x00, spec_op << 2 },
+#endif
+};
  
  int kgdb_arch_init(void)

  {
-   union mips_instruction insn = {
-   .r_format = {
-   .opcode = spec_op,
-   .func   = break_op,
-   }
-   };
-   memcpy(arch_kgdb_ops.gdb_bpt_instr, insn.byte, BREAK_INSTR_SIZE);
-
register_die_notifier(_notifier);
  
  	return 0;

--
2.13.3

[PATCH v2 2/2] kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops

2018-12-06 Thread Christophe Leroy

checkpatch.pl reports the following:

  WARNING: struct kgdb_arch should normally be const
  #28: FILE: arch/mips/kernel/kgdb.c:397:
  +struct kgdb_arch arch_kgdb_ops = {

This report makes sense, as all other ops struct, this
one should also be const. This patch does the change.

Cc: Vineet Gupta 
Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Yoshinori Sato 
Cc: Richard Kuo 
Cc: Michal Simek 
Cc: Ralf Baechle 
Cc: Paul Burton 
Cc: James Hogan 
Cc: Ley Foon Tan 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Rich Felker 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Acked-by: Daniel Thompson 
Acked-by: Paul Burton 
Signed-off-by: Christophe Leroy 
---
 v2: Added CCs to all maintainers/supporters identified by get_maintainer.pl 
and Acks from Daniel and Paul.

 arch/arc/kernel/kgdb.c| 2 +-
 arch/arm/kernel/kgdb.c| 2 +-
 arch/arm64/kernel/kgdb.c  | 2 +-
 arch/h8300/kernel/kgdb.c  | 2 +-
 arch/hexagon/kernel/kgdb.c| 2 +-
 arch/microblaze/kernel/kgdb.c | 2 +-
 arch/mips/kernel/kgdb.c   | 2 +-
 arch/nios2/kernel/kgdb.c  | 2 +-
 arch/powerpc/kernel/kgdb.c| 2 +-
 arch/sh/kernel/kgdb.c | 2 +-
 arch/sparc/kernel/kgdb_32.c   | 2 +-
 arch/sparc/kernel/kgdb_64.c   | 2 +-
 arch/x86/kernel/kgdb.c| 2 +-
 include/linux/kgdb.h  | 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 9a3c34af2ae8..bfd04b442e36 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -204,7 +204,7 @@ void kgdb_roundup_cpus(unsigned long flags)
local_irq_disable();
 }
 
-struct kgdb_arch arch_kgdb_ops = {
+const struct kgdb_arch arch_kgdb_ops = {
/* breakpoint instruction: TRAP_S 0x3 */
 #ifdef CONFIG_CPU_BIG_ENDIAN
.gdb_bpt_instr  = {0x78, 0x7e},
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index caa0dbe3dc61..21a6d5958955 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -274,7 +274,7 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
  * and we handle the normal undef case within the do_undefinstr
  * handler.
  */
-struct kgdb_arch arch_kgdb_ops = {
+const struct kgdb_arch arch_kgdb_ops = {
 #ifndef __ARMEB__
.gdb_bpt_instr  = {0xfe, 0xde, 0xff, 0xe7}
 #else /* ! __ARMEB__ */
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a20de58061a8..fe1d1f935b90 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -357,7 +357,7 @@ void kgdb_arch_exit(void)
unregister_die_notifier(_notifier);
 }
 
-struct kgdb_arch arch_kgdb_ops;
+const struct kgdb_arch arch_kgdb_ops;
 
 int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
 {
diff --git a/arch/h8300/kernel/kgdb.c b/arch/h8300/kernel/kgdb.c
index 1a1d30cb0609..602e478afbd5 100644
--- a/arch/h8300/kernel/kgdb.c
+++ b/arch/h8300/kernel/kgdb.c
@@ -129,7 +129,7 @@ void kgdb_arch_exit(void)
/* Nothing to do */
 }
 
-struct kgdb_arch arch_kgdb_ops = {
+const struct kgdb_arch arch_kgdb_ops = {
/* Breakpoint instruction: trapa #2 */
.gdb_bpt_instr = { 0x57, 0x20 },
 };
diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
index 16c24b22d0b2..f1924d483e78 100644
--- a/arch/hexagon/kernel/kgdb.c
+++ b/arch/hexagon/kernel/kgdb.c
@@ -83,7 +83,7 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
{ "syscall_nr", GDB_SIZEOF_REG, offsetof(struct pt_regs, syscall_nr)},
 };
 
-struct kgdb_arch arch_kgdb_ops = {
+const struct kgdb_arch arch_kgdb_ops = {
/* trap0(#0xDB) 0x0cdb0054 */
.gdb_bpt_instr = {0x54, 0x00, 0xdb, 0x0c},
 };
diff --git a/arch/microblaze/kernel/kgdb.c b/arch/microblaze/kernel/kgdb.c
index 6366f69d118e..130cd0f064ce 100644
--- a/arch/microblaze/kernel/kgdb.c
+++ b/arch/microblaze/kernel/kgdb.c
@@ -143,7 +143,7 @@ void kgdb_arch_exit(void)
 /*
  * Global data
  */
-struct kgdb_arch arch_kgdb_ops = {
+const struct kgdb_arch arch_kgdb_ops = {
 #ifdef __MICROBLAZEEL__
.gdb_bpt_instr = {0x18, 0x00, 0x0c, 0xba}, /* brki r16, 0x18 */
 #else
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index 31eff1bec577..edfdc2ec2d16 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -394,7 +394,7 @@ int kgdb_arch_handle_exception(int vector, int signo, int 
err_code,
return -1;
 }
 
-struct kgdb_arch arch_kgdb_ops = {
+const struct kgdb_arch arch_kgdb_ops = {
 #ifdef CONFIG_CPU_BIG_ENDIAN
.gdb_bpt_instr = { spec_op << 2, 0x00, 0x00, break_op },
 #else
diff --git a/arch/nios2/kernel/kgdb.c b/arch/nios2/kernel/kgdb.c
index 117859122d1c..37b25f844a2d 100644
--- a/arch/nios2/kernel/kgdb.c
+++ b/arch/nios2/kernel/kgdb.c
@@ -165,7 +165,7 @@ void kgdb_arch_exit(void)
/* Nothing to do */
 }
 
-struct kgdb_arch arch_kgdb_ops = {
+const struct kgdb_arch arch_kgdb_ops = {
/* Breakpoint instruction: trap 30 */

[PATCH v2 1/2] mips/kgdb: prepare arch_kgdb_ops for constness

2018-12-06 Thread Christophe Leroy

MIPS is the only architecture modifying arch_kgdb_ops during init.
This patch makes the init static, so that it can be changed to
const in following patch, as recommended by checkpatch.pl

Suggested-by: Paul Burton 
Acked-by: Daniel Thompson 
Acked-by: Paul Burton 
Signed-off-by: Christophe Leroy 
---
 v2: Added acks from Daniel and Paul.

 arch/mips/kernel/kgdb.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index eb6c0d582626..31eff1bec577 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -394,18 +394,16 @@ int kgdb_arch_handle_exception(int vector, int signo, int 
err_code,
return -1;
 }
 
-struct kgdb_arch arch_kgdb_ops;
+struct kgdb_arch arch_kgdb_ops = {
+#ifdef CONFIG_CPU_BIG_ENDIAN
+   .gdb_bpt_instr = { spec_op << 2, 0x00, 0x00, break_op },
+#else
+   .gdb_bpt_instr = { break_op, 0x00, 0x00, spec_op << 2 },
+#endif
+};
 
 int kgdb_arch_init(void)
 {
-   union mips_instruction insn = {
-   .r_format = {
-   .opcode = spec_op,
-   .func   = break_op,
-   }
-   };
-   memcpy(arch_kgdb_ops.gdb_bpt_instr, insn.byte, BREAK_INSTR_SIZE);
-
register_die_notifier(_notifier);
 
return 0;
-- 
2.13.3

Re: [PATCH v2 26/34] dt-bindings: arm: Convert Renesas board/soc bindings to json-schema

2018-12-06 Thread Rob Herring

On Wed, Dec 5, 2018 at 1:44 PM Simon Horman  wrote:
>
> On Tue, Dec 04, 2018 at 09:08:57AM -0600, Rob Herring wrote:
> > On Tue, Dec 4, 2018 at 8:57 AM Geert Uytterhoeven  
> > wrote:
> > >
> > > Hi Simon,
> > >
> > > On Tue, Dec 4, 2018 at 3:48 PM Simon Horman  wrote:
> > > > On Mon, Dec 03, 2018 at 03:32:15PM -0600, Rob Herring wrote:
> > > > > Convert Renesas SoC bindings to DT schema format using json-schema.
> > > > >
> > > > > Cc: Simon Horman 
> > > > > Cc: Magnus Damm 
> > > > > Cc: Mark Rutland 
> > > > > Cc: linux-renesas-...@vger.kernel.org
> > > > > Cc: devicet...@vger.kernel.org
> > > > > Signed-off-by: Rob Herring 
> > > > > ---
> > > > >  .../devicetree/bindings/arm/shmobile.txt  | 151 
> > > > >  .../devicetree/bindings/arm/shmobile.yaml | 218 
> > > > > ++
> > > > >  2 files changed, 218 insertions(+), 151 deletions(-)
> > > > >  delete mode 100644 Documentation/devicetree/bindings/arm/shmobile.txt
> > > > >  create mode 100644 
> > > > > Documentation/devicetree/bindings/arm/shmobile.yaml
> > > >
> > > > Hi Rob,
> > > >
> > > > what is this based on? I get a conflict when applying the .txt change
> > > > and if I knew the base for this patch it would be rather easy to work
> > > > out what has changed.
> >
> > 4.20-rc2
> >
> > > >
> > > > Also, should we do an s/shmobile.txt/shmobile.yaml/ in MAINTAINERS?
> >
> > Yes. Though it was pointed out that get_maintainers.pl can pull emails
> > out of this file. We'd need to get that to work by default though.
> >
> > > Probably even s/shmobile.yaml/renesas.yaml/, while at it?
> >
> > Sure, if that's what you all want.
>
> How about this?

LGTM

> From: Rob Herring 
> Subject: [PATCH v2.1] dt-bindings: arm: Convert Renesas board/soc bindings to
>  json-schema
>
> Convert Renesas SoC bindings to DT schema format using json-schema.
>
> v2.1 [Simon Horman]
> - rebased on renesas-devel-20181204-v4.20-rc5
>   + Added r8a7744 development platform and SoM
>   + Correct RZ/G2E part number
> - Update MAINTAINERS
>
> Signed-off-by: Rob Herring 
> Signed-off-by: Simon Horman 
> ---
>  Documentation/devicetree/bindings/arm/renesas.yaml | 228 
> +
>  Documentation/devicetree/bindings/arm/shmobile.txt | 155 --
>  MAINTAINERS|   4 +-
>  3 files changed, 230 insertions(+), 157 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/arm/renesas.yaml
>  delete mode 100644 Documentation/devicetree/bindings/arm/shmobile.txt

Re: use generic DMA mapping code in powerpc V4

2018-12-06 Thread Christoph Hellwig

On Thu, Dec 06, 2018 at 06:10:54PM +0100, Christian Zigotzky wrote:
> Please don’t merge this code. We are still testing and trying to figure out 
> where the problems are in the code.

The ones I sent pings for were either tested successfully by you
(the zone change) or are trivial cleanups that don't affect your setup.

Re: [PATCH v2 5/6] arch: simplify several early memory allocations

2018-12-06 Thread Sam Ravnborg

On Mon, Dec 03, 2018 at 06:49:21PM +0200, Mike Rapoport wrote:
> On Mon, Dec 03, 2018 at 05:29:08PM +0100, Sam Ravnborg wrote:
> > Hi Mike.
> > 
> > > index c37955d..2a17665 100644
> > > --- a/arch/sparc/kernel/prom_64.c
> > > +++ b/arch/sparc/kernel/prom_64.c
> > > @@ -34,16 +34,13 @@
> > >  
> > >  void * __init prom_early_alloc(unsigned long size)
> > >  {
> > > - unsigned long paddr = memblock_phys_alloc(size, SMP_CACHE_BYTES);
> > > - void *ret;
> > > + void *ret = memblock_alloc(size, SMP_CACHE_BYTES);
> > >  
> > > - if (!paddr) {
> > > + if (!ret) {
> > >   prom_printf("prom_early_alloc(%lu) failed\n", size);
> > >   prom_halt();
> > >   }
> > >  
> > > - ret = __va(paddr);
> > > - memset(ret, 0, size);
> > >   prom_early_allocated += size;
> > >  
> > >   return ret;
> > 
> > memblock_alloc() calls memblock_alloc_try_nid().
> > And if allocation fails then memblock_alloc_try_nid() calls panic().
> > So will we ever hit the prom_halt() code?
> 
> memblock_phys_alloc_try_nid() also calls panic if an allocation fails. So
> in either case we never reach prom_halt() code.

So we have code here we never reach - not nice.
If the idea is to avoid relying on the panic inside memblock_alloc() then
maybe replace it with a variant that do not call panic?
To make it clear what happens.

Sam

Re: [PATCH 1/2] mips/kgdb: prepare arch_kgdb_ops for constness

2018-12-06 Thread Paul Burton

Hi Christophe & Daniel,

On Thu, Dec 06, 2018 at 02:09:02PM +, Daniel Thompson wrote:
> On Wed, Dec 05, 2018 at 04:41:09AM +, Christophe Leroy wrote:
> > MIPS is the only architecture modifying arch_kgdb_ops during init.
> > This patch makes the init static, so that it can be changed to
> > const in following patch, as recommended by checkpatch.pl
> > 
> > Suggested-by: Paul Burton 
> > Signed-off-by: Christophe Leroy 
> 
> From my side this is
> Acked-by: Daniel Thompson 
> 
> Since this is a dependency for the next patch I'd be happy to take via
> my tree... but would need an ack from the MIPS guys to do that.

For both patches in this series:

Acked-by: Paul Burton 

Thanks,
Paul

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-06 Thread Thiago Jung Bauermann



Hello Gautham,

Gautham R. Shenoy  writes:

> From: "Gautham R. Shenoy" 
>
> Currently running DLPAR offline/online operations in a loop on a
> POWER9 system with SMT=off results in the following crash:
>
> [  223.321032] cpu 112 (hwid 112) Ready to die...
> [  223.355963] Querying DEAD? cpu 113 (113) shows 2
> [  223.356233] cpu 114 (hwid 114) Ready to die...
> [  223.356236] cpu 113 (hwid 113) Ready to die...
> [  223.356241] Bad kernel stack pointer 1faf6ca0 at 1faf6d50
> [  223.356243] Oops: Bad kernel stack pointer, sig: 6 [#1]
> [  223.356244] LE SMP NR_CPUS=2048 NUMA pSeries
> [  223.356247] Modules linked in:
> [  223.356255] CPU: 114 PID: 0 Comm: swapper/114 Not tainted 4.20.0-rc3 #39
> [  223.356258] NIP:  1faf6d50 LR: 1faf6d50 CTR: 
> 1ec6d06c
> [  223.356259] REGS: c0001e5cbd30 TRAP: 0700   Not tainted  (4.20.0-rc3)
> [  223.356260] MSR:  80081000   CR: 2804  XER: 0020
> [  223.356263] CFAR: 1ec98590 IRQMASK: 80009033
>GPR00: 1faf6d50 1faf6ca0 1ed1c448 
> 0267e6a273c3
>GPR04:  00e0 dfe8 
> 1faf6d30
>GPR08: 1faf6d28 0267e6a273c3 1ec1b108 
> 
>GPR12: 01b6d998 c0001eb55080 c003a1b8bf90 
> 1eea3f40
>GPR16:  c006fda85100 c004c8b0 
> c13d5300
>GPR20: c006fda85300 0008 c19d2cf8 
> c13d6888
>GPR24: 0072 c13d688c 0002 
> c13d688c
>GPR28: c19cecf0 0390  
> 1faf6ca0
> [  223.356281] NIP [1faf6d50] 0x1faf6d50
> [  223.356281] LR [1faf6d50] 0x1faf6d50
> [  223.356282] Call Trace:
> [  223.356283] Instruction dump:
> [  223.356285]        
> 
> [  223.356286]        
> 
> [  223.356290] ---[ end trace f46a4e046b564d1f ]---
>
> This is due to multiple offlined CPUs (CPU 113 and CPU 114 above)
> concurrently (within 3us) trying to make the rtas-call with the
> "stop-self" token, something that is prohibited by the PAPR.
>
> The concurrent calls can happen as follows.
>
>   o In dlpar_offline_cpu() we prod an offline CPU X (offline due to
> SMT=off) and loop for 25 tries in pseries_cpu_die() querying if
> the target CPU X has been stopped in RTAS. After 25 tries, we
> prints the message "Querying DEAD? cpu X (X) shows 2" and return
> to dlpar_offline_cpu(). Note that at this point CPU X has not yet
> called rtas with the "stop-self" token, but can do so anytime now.
>
>   o Back in dlpar_offline_cpu(), we prod the next offline CPU Y. CPU Y
> promptly wakes up and calls RTAS with the "stop-self" token.
>
>   o Before RTAS can stop CPU Y, CPU X also calls RTAS with the
> "stop-self" token.
>
> The problem occurs due to the short number of tries (25) in
> pseries_cpu_die() which covers 90% of the cases. For the remaining 10%
> of the cases, it was observed that we would need to loop for a few
> hundred iterations before the target CPU calls rtas with "stop-self"
> token.Experiments show that each try takes roughly ~25us.
>
> In this patch we fix the problem by increasing the number of tries
> from 25 to 4000 (which roughly corresponds to 100ms) before bailing
> out and declaring that we have failed to observe the target CPU call
> rtas-stop-self. This provides sufficient serialization to ensure that
> there no concurrent rtas calls with "stop-self" token.
>
>
>
> Reported-by: Michael Bringmann 
> Signed-off-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 2f8e621..c913c44 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -214,14 +214,25 @@ static void pseries_cpu_die(unsigned int cpu)
>   msleep(1);
>   }
>   } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> + int max_tries = 4000; /* Roughly corresponds to 100ms */
> + u64 tb_before = mftb();
>
> - for (tries = 0; tries < 25; tries++) {
> + for (tries = 0; tries < max_tries; tries++) {
>   cpu_status = smp_query_cpu_stopped(pcpu);
>   if (cpu_status == QCSS_STOPPED ||
>   cpu_status == QCSS_HARDWARE_ERROR)
>   break;
>   cpu_relax();
>   }
> +
> + if (tries == max_tries) {
> + u64

Re: use generic DMA mapping code in powerpc V4

2018-12-06 Thread Christian Zigotzky

Please don’t merge this code. We are still testing and trying to figure out 
where the problems are in the code.

— Christian

Sent from my iPhone

> On 6. Dec 2018, at 11:55, Christian Zigotzky  wrote:
> 
>> On 05 December 2018 at 3:05PM, Christoph Hellwig wrote:
>> 
>> Thanks.  Can you try a few stepping points in the tree?
>> 
>> First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
>> (the first one) applied?
>> 
>> Second with all commits up to 5da11e49df21f21dac25a2491aa788307bdacb6b
>> 
>> And if that still works with commits up to
>> c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a
>> 
> Hi Christoph,
> 
> I undid the commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 with the 
> following command:
> 
> git checkout 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
> 
> Result: PASEMI onboard ethernet works again and the P5020 board boots.
> 
> I will test the other commits in the next days.
> 
> @All
> It is really important, that you also test Christoph's work on your PASEMI 
> and NXP boards. Could you please help us with solving the issues?
> 
> 'git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.5 a'
> 
> Thanks,
> Christian
>

[PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-06 Thread Michael Roth

Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
JIT allocations. At compile time it defaults to PAGE_SIZE * 4,
and is adjusted again at init time if MODULES_VADDR is defined.

For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
the compile-time default at boot-time, which is 0x9c40 when
using 64K page size. This overflows the signed 32-bit bpf_jit_limit
value:

  root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
  -1673527296

and can cause various unexpected failures throughout the network
stack. In one case `strace dhclient eth0` reported:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 16) 
= -1 ENOTSUPP (Unknown error 524)

and similar failures can be seen with tools like tcpdump. This doesn't
always reproduce however, and I'm not sure why. The more consistent
failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9 host
would time out on systemd/netplan configuring a virtio-net NIC with no
noticeable errors in the logs.

Fix this by limiting the compile-time default for bpf_jit_limit to
INT_MAX.

Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
allocations")
Cc: linuxppc-...@ozlabs.org
Cc: Daniel Borkmann 
Cc: Sandipan Das 
Cc: Alexei Starovoitov 
Signed-off-by: Michael Roth 
---
 kernel/bpf/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b1a3545d0ec8..55de4746cdfd 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
 }
 
 #ifdef CONFIG_BPF_JIT
-# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4)
+# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
+# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 4), INT_MAX)
 
 /* All BPF JIT sysctl knobs here. */
 int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
-- 
2.17.1

Re: [PATCH bpf] bpf: fix default unprivileged allocation limit

2018-12-06 Thread Michael Roth

Quoting Sandipan Das (2018-12-06 03:27:32)
> When using a large page size, the default value of the bpf_jit_limit
> knob becomes invalid and users are not able to run unprivileged bpf
> programs.
> 
> The bpf_jit_limit knob is represented internally as a 32-bit signed
> integer because of which the default value, i.e. PAGE_SIZE * 4,
> overflows in case of an architecture like powerpc64 which uses 64K
> as the default page size (i.e. CONFIG_PPC_64K_PAGES is set).
> 
> So, instead of depending on the page size, use a constant value.
> 
> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
> allocations")

This also consistently caused a virtio-net KVM Ubuntu 18.04 guest to time out
on configuring networking during boot via systemd/netplan. A bisect
pointed to the same commit this patch addresses.

> Signed-off-by: Sandipan Das 
> ---
>  kernel/bpf/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index b1a3545d0ec8..a81d097a17fb 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -365,7 +365,7 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
>  }
> 
>  #ifdef CONFIG_BPF_JIT
> -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4)
> +# define BPF_JIT_LIMIT_DEFAULT (4096 * 4)

This isn't quite right as we still use (bpf_jit_limit >> PAGE_SHIFT) to check
allocations in bpf_jit_charge_modmem(), so that should be fixed up as well.

Another alternative which is to clamp BPF_JIT_LIMIT_DEFAULT to INT_MAX,
which fixes the issue for me and is similar to what
bpf_jit_charge_init() does for kernels where MODULES_VADDR is defined.
I'll go ahead and send the patch in case that seems preferable.

> 
>  /* All BPF JIT sysctl knobs here. */
>  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
> -- 
> 2.19.2
>

[PATCH v3 12/12] perf/core: remove unused perf_flags

2018-12-06 Thread Andrew Murray

Now that perf_flags is not used we remove it.

Signed-off-by: Andrew Murray 
---
 include/uapi/linux/perf_event.h   | 2 --
 tools/include/uapi/linux/perf_event.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f35eb72..ba89bd3 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
-#define perf_flags(attr)   (*(&(attr)->read_format + 1))
-
 /*
  * Ioctls that can be done on a perf event fd:
  */
diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index f35eb72..ba89bd3 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -445,8 +445,6 @@ struct perf_event_query_bpf {
__u32   ids[0];
 };
 
-#define perf_flags(attr)   (*(&(attr)->read_format + 1))
-
 /*
  * Ioctls that can be done on a perf event fd:
  */
-- 
2.7.4

[PATCH v3 11/12] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray

For x86 PMUs that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

This change means that amd/iommu and amd/uncore will now also
indicate that they do not support exclude_{hv|idle} and intel/uncore
that it does not support exclude_{guest|host}.

Signed-off-by: Andrew Murray 
---
 arch/x86/events/amd/iommu.c| 6 +-
 arch/x86/events/amd/uncore.c   | 7 ++-
 arch/x86/events/intel/uncore.c | 9 +
 3 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/arch/x86/events/amd/iommu.c b/arch/x86/events/amd/iommu.c
index 3210fee..7635c23 100644
--- a/arch/x86/events/amd/iommu.c
+++ b/arch/x86/events/amd/iommu.c
@@ -223,11 +223,6 @@ static int perf_iommu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* IOMMU counters do not have usr/os/guest/host bits */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_host || event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -414,6 +409,7 @@ static const struct pmu iommu_pmu __initconst = {
.read   = perf_iommu_read,
.task_ctx_nr= perf_invalid_context,
.attr_groups= amd_iommu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static __init int init_one_iommu(unsigned int idx)
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 8671de1..988cb9c 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -201,11 +201,6 @@ static int amd_uncore_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* NB and Last level cache counters do not have usr/os/guest/host bits 
*/
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_host || event->attr.exclude_guest)
-   return -EINVAL;
-
/* and we do not enable counter overflow interrupts */
hwc->config = event->attr.config & AMD64_RAW_EVENT_MASK_NB;
hwc->idx = -1;
@@ -307,6 +302,7 @@ static struct pmu amd_nb_pmu = {
.start  = amd_uncore_start,
.stop   = amd_uncore_stop,
.read   = amd_uncore_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static struct pmu amd_llc_pmu = {
@@ -317,6 +313,7 @@ static struct pmu amd_llc_pmu = {
.start  = amd_uncore_start,
.stop   = amd_uncore_stop,
.read   = amd_uncore_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static struct amd_uncore *amd_uncore_alloc(unsigned int cpu)
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 27a4614..d516161 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -695,14 +695,6 @@ static int uncore_pmu_event_init(struct perf_event *event)
if (pmu->func_id < 0)
return -ENOENT;
 
-   /*
-* Uncore PMU does measure at all privilege level all the time.
-* So it doesn't make sense to specify any exclude bits.
-*/
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle)
-   return -EINVAL;
-
/* Sampling not supported yet */
if (hwc->sample_period)
return -EINVAL;
@@ -800,6 +792,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
.stop   = uncore_pmu_event_stop,
.read   = uncore_pmu_event_read,
.module = THIS_MODULE,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
} else {
pmu->pmu = *pmu->type->pmu;
-- 
2.7.4

[PATCH v3 10/12] x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray

For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NOEXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

Signed-off-by: Andrew Murray 
---
 arch/x86/events/amd/ibs.c  | 13 +
 arch/x86/events/amd/power.c| 10 ++
 arch/x86/events/intel/cstate.c | 12 +++-
 arch/x86/events/intel/rapl.c   |  9 ++---
 arch/x86/events/intel/uncore_snb.c |  9 ++---
 arch/x86/events/msr.c  | 10 ++
 6 files changed, 12 insertions(+), 51 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index d50bb4d..62f317c 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -253,15 +253,6 @@ static int perf_ibs_precise_event(struct perf_event 
*event, u64 *config)
return -EOPNOTSUPP;
 }
 
-static const struct perf_event_attr ibs_notsupp = {
-   .exclude_user   = 1,
-   .exclude_kernel = 1,
-   .exclude_hv = 1,
-   .exclude_idle   = 1,
-   .exclude_host   = 1,
-   .exclude_guest  = 1,
-};
-
 static int perf_ibs_init(struct perf_event *event)
 {
struct hw_perf_event *hwc = >hw;
@@ -282,9 +273,6 @@ static int perf_ibs_init(struct perf_event *event)
if (event->pmu != _ibs->pmu)
return -ENOENT;
 
-   if (perf_flags(>attr) & perf_flags(_notsupp))
-   return -EINVAL;
-
if (config & ~perf_ibs->config_mask)
return -EINVAL;
 
@@ -537,6 +525,7 @@ static struct perf_ibs perf_ibs_fetch = {
.start  = perf_ibs_start,
.stop   = perf_ibs_stop,
.read   = perf_ibs_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
},
.msr= MSR_AMD64_IBSFETCHCTL,
.config_mask= IBS_FETCH_CONFIG_MASK,
diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index 2aefacf..c5ff084 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -136,14 +136,7 @@ static int pmu_event_init(struct perf_event *event)
return -ENOENT;
 
/* Unsupported modes and filters. */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest  ||
-   /* no sampling */
-   event->attr.sample_period)
+   if (event->attr.sample_period)
return -EINVAL;
 
if (cfg != AMD_POWER_EVENTSEL_PKG)
@@ -226,6 +219,7 @@ static struct pmu pmu_class = {
.start  = pmu_event_start,
.stop   = pmu_event_stop,
.read   = pmu_event_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static int power_cpu_exit(unsigned int cpu)
diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 9f8084f..15a1981 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -280,13 +280,7 @@ static int cstate_pmu_event_init(struct perf_event *event)
return -ENOENT;
 
/* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest  ||
-   event->attr.sample_period) /* no sampling */
+   if (event->attr.sample_period) /* no sampling */
return -EINVAL;
 
if (event->cpu < 0)
@@ -437,7 +431,7 @@ static struct pmu cstate_core_pmu = {
.start  = cstate_pmu_event_start,
.stop   = cstate_pmu_event_stop,
.read   = cstate_pmu_event_update,
-   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT,
+   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
 };
 
@@ -451,7 +445,7 @@ static struct pmu cstate_pkg_pmu = {
.start  = cstate_pmu_event_start,
.stop   = cstate_pmu_event_stop,
.read   = cstate_pmu_event_update,
-   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT,
+   .capabilities   = PERF_PMU_CAP_NO_INTERRUPT | PERF_PMU_CAP_NO_EXCLUDE,
.module = THIS_MODULE,
 };
 
diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 32f3e94..18a5628 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -397,13 +397,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
return -EINVAL;
 
/* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-

[PATCH v3 09/12] powerpc: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray

For PowerPC PMUs that do not support context exclusion let's
advertise the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that
perf will prevent us from handling events where any exclusion flags
are set. Let's also remove the now unnecessary check for exclusion
flags.

Signed-off-by: Andrew Murray 
---
 arch/powerpc/perf/hv-24x7.c | 10 +-
 arch/powerpc/perf/hv-gpci.c | 10 +-
 arch/powerpc/perf/imc-pmu.c | 19 +--
 3 files changed, 3 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 72238ee..d2b8e60 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1306,15 +1306,6 @@ static int h_24x7_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
@@ -1577,6 +1568,7 @@ static struct pmu h_24x7_pmu = {
.start_txn   = h_24x7_event_start_txn,
.commit_txn  = h_24x7_event_commit_txn,
.cancel_txn  = h_24x7_event_cancel_txn,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static int hv_24x7_init(void)
diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
index 43fabb3..735e77b 100644
--- a/arch/powerpc/perf/hv-gpci.c
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -232,15 +232,6 @@ static int h_gpci_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/* no branch sampling */
if (has_branch_stack(event))
return -EOPNOTSUPP;
@@ -285,6 +276,7 @@ static struct pmu h_gpci_pmu = {
.start   = h_gpci_event_start,
.stop= h_gpci_event_stop,
.read= h_gpci_event_update,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 static int hv_gpci_init(void)
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 1fafc32b..1dbb0ee 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -473,15 +473,6 @@ static int nest_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -748,15 +739,6 @@ static int core_imc_event_init(struct perf_event *event)
if (event->hw.sample_period)
return -EINVAL;
 
-   /* unsupported modes and filters */
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -1069,6 +1051,7 @@ static int update_pmu_ops(struct imc_pmu *pmu)
pmu->pmu.stop = imc_event_stop;
pmu->pmu.read = imc_event_update;
pmu->pmu.attr_groups = pmu->attr_groups;
+   pmu->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
pmu->attr_groups[IMC_FORMAT_ATTR] = _format_group;
 
switch (pmu->domain) {
-- 
2.7.4

[PATCH v3 08/12] drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray

For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

This change means that qcom_{l2|l3}_pmu will now also indicate that
they do not support exclude_{host|guest} and that xgene_pmu does
not also support exclude_idle and exclude_hv.

Note that for qcom_l2_pmu we now implictly return -EINVAL instead
of -EOPNOTSUPP. This change will result in the perf userspace
utility retrying the perf_event_open system call with fallback
event attributes that do not fail.

Signed-off-by: Andrew Murray 
---
 drivers/perf/qcom_l2_pmu.c | 9 +
 drivers/perf/qcom_l3_pmu.c | 8 +---
 drivers/perf/xgene_pmu.c   | 6 +-
 3 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/drivers/perf/qcom_l2_pmu.c b/drivers/perf/qcom_l2_pmu.c
index 842135c..091b4d7 100644
--- a/drivers/perf/qcom_l2_pmu.c
+++ b/drivers/perf/qcom_l2_pmu.c
@@ -509,14 +509,6 @@ static int l2_cache_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   /* We cannot filter accurately so we just don't allow it. */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle) {
-   dev_dbg_ratelimited(_pmu->pdev->dev,
-   "Can't exclude execution levels\n");
-   return -EOPNOTSUPP;
-   }
-
if (((L2_EVT_GROUP(event->attr.config) > L2_EVT_GROUP_MAX) ||
 ((event->attr.config & ~L2_EVT_MASK) != 0)) &&
(event->attr.config != L2CYCLE_CTR_RAW_CODE)) {
@@ -982,6 +974,7 @@ static int l2_cache_pmu_probe(struct platform_device *pdev)
.stop   = l2_cache_event_stop,
.read   = l2_cache_event_read,
.attr_groups= l2_cache_pmu_attr_grps,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
l2cache_pmu->num_counters = get_num_counters();
diff --git a/drivers/perf/qcom_l3_pmu.c b/drivers/perf/qcom_l3_pmu.c
index 2dc63d6..5d70646 100644
--- a/drivers/perf/qcom_l3_pmu.c
+++ b/drivers/perf/qcom_l3_pmu.c
@@ -495,13 +495,6 @@ static int qcom_l3_cache__event_init(struct perf_event 
*event)
return -ENOENT;
 
/*
-* There are no per-counter mode filters in the PMU.
-*/
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_hv || event->attr.exclude_idle)
-   return -EINVAL;
-
-   /*
 * Sampling not supported since these events are not core-attributable.
 */
if (hwc->sample_period)
@@ -777,6 +770,7 @@ static int qcom_l3_cache_pmu_probe(struct platform_device 
*pdev)
.read   = qcom_l3_cache__event_read,
 
.attr_groups= qcom_l3_cache_pmu_attr_grps,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
memrc = platform_get_resource(pdev, IORESOURCE_MEM, 0);
diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 0e31f13..dad6169 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -914,11 +914,6 @@ static int xgene_perf_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   /* SOC counters do not have usr/os/guest/host bits */
-   if (event->attr.exclude_user || event->attr.exclude_kernel ||
-   event->attr.exclude_host || event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
/*
@@ -1133,6 +1128,7 @@ static int xgene_init_perf(struct xgene_pmu_dev *pmu_dev, 
char *name)
.start  = xgene_perf_start,
.stop   = xgene_perf_stop,
.read   = xgene_perf_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
/* Hardware counter init */
-- 
2.7.4

[PATCH v3 07/12] drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray

For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

Signed-off-by: Andrew Murray 
---
 drivers/perf/arm-cci.c| 10 +-
 drivers/perf/arm-ccn.c|  6 ++
 drivers/perf/arm_dsu_pmu.c|  9 ++---
 drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c |  1 +
 drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_pmu.c  |  9 -
 7 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index 1bfeb16..bfd03e0 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -1327,15 +1327,6 @@ static int cci_pmu_event_init(struct perf_event *event)
if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
return -EOPNOTSUPP;
 
-   /* We have no filtering of any kind */
-   if (event->attr.exclude_user||
-   event->attr.exclude_kernel  ||
-   event->attr.exclude_hv  ||
-   event->attr.exclude_idle||
-   event->attr.exclude_host||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
/*
 * Following the example set by other "uncore" PMUs, we accept any CPU
 * and rewrite its affinity dynamically rather than having perf core
@@ -1433,6 +1424,7 @@ static int cci_pmu_init(struct cci_pmu *cci_pmu, struct 
platform_device *pdev)
.stop   = cci_pmu_stop,
.read   = pmu_read,
.attr_groups= pmu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
cci_pmu->plat_device = pdev;
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 7dd850e..2ae7602 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -741,10 +741,7 @@ static int arm_ccn_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   if (has_branch_stack(event) || event->attr.exclude_user ||
-   event->attr.exclude_kernel || event->attr.exclude_hv ||
-   event->attr.exclude_idle || event->attr.exclude_host ||
-   event->attr.exclude_guest) {
+   if (has_branch_stack(event)) {
dev_dbg(ccn->dev, "Can't exclude execution levels!\n");
return -EINVAL;
}
@@ -1290,6 +1287,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
.read = arm_ccn_pmu_event_read,
.pmu_enable = arm_ccn_pmu_enable,
.pmu_disable = arm_ccn_pmu_disable,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
 
/* No overflow interrupt? Have to use a timer instead. */
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
index 660cb8a..5851de5 100644
--- a/drivers/perf/arm_dsu_pmu.c
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -562,13 +562,7 @@ static int dsu_pmu_event_init(struct perf_event *event)
return -EINVAL;
}
 
-   if (has_branch_stack(event) ||
-   event->attr.exclude_user ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle ||
-   event->attr.exclude_host ||
-   event->attr.exclude_guest) {
+   if (has_branch_stack(event)) {
dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
return -EINVAL;
}
@@ -735,6 +729,7 @@ static int dsu_pmu_device_probe(struct platform_device 
*pdev)
.read   = dsu_pmu_read,
 
.attr_groups= dsu_pmu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
rc = perf_pmu_register(_pmu->pmu, name, -1);
diff --git a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c 
b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
index 1b10ea0..296fef8 100644
--- a/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c
@@ -396,6 +396,7 @@ static int hisi_ddrc_pmu_probe(struct platform_device *pdev)
.stop   = hisi_uncore_pmu_stop,
.read   = hisi_uncore_pmu_read,
.attr_groups= hisi_ddrc_pmu_attr_groups,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
};
 
ret = perf_pmu_register(_pmu->pmu, name, -1);
diff --git a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c 
b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
index 443906e..2553a84 100644
--- a/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
+++ b/drivers/perf/hisilicon/hisi_uncore_hha_pmu.c
@@ -407,6 +407,7 @@ static int hisi_hha_pmu_probe(struct platform_device

[PATCH v3 06/12] arm: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs

2018-12-06 Thread Andrew Murray

For drivers that do not support context exclusion let's advertise the
PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

Signed-off-by: Andrew Murray 
---
 arch/arm/mach-imx/mmdc.c | 9 ++---
 arch/arm/mm/cache-l2x0-pmu.c | 9 +
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/arch/arm/mach-imx/mmdc.c b/arch/arm/mach-imx/mmdc.c
index 04b3bf7..3453838 100644
--- a/arch/arm/mach-imx/mmdc.c
+++ b/arch/arm/mach-imx/mmdc.c
@@ -293,13 +293,7 @@ static int mmdc_pmu_event_init(struct perf_event *event)
return -EOPNOTSUPP;
}
 
-   if (event->attr.exclude_user||
-   event->attr.exclude_kernel  ||
-   event->attr.exclude_hv  ||
-   event->attr.exclude_idle||
-   event->attr.exclude_host||
-   event->attr.exclude_guest   ||
-   event->attr.sample_period)
+   if (event->attr.sample_period)
return -EINVAL;
 
if (cfg < 0 || cfg >= MMDC_NUM_COUNTERS)
@@ -455,6 +449,7 @@ static int mmdc_pmu_init(struct mmdc_pmu *pmu_mmdc,
.start  = mmdc_pmu_event_start,
.stop   = mmdc_pmu_event_stop,
.read   = mmdc_pmu_event_update,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
},
.mmdc_base = mmdc_base,
.dev = dev,
diff --git a/arch/arm/mm/cache-l2x0-pmu.c b/arch/arm/mm/cache-l2x0-pmu.c
index afe5b4c..99bcd07 100644
--- a/arch/arm/mm/cache-l2x0-pmu.c
+++ b/arch/arm/mm/cache-l2x0-pmu.c
@@ -314,14 +314,6 @@ static int l2x0_pmu_event_init(struct perf_event *event)
event->attach_state & PERF_ATTACH_TASK)
return -EINVAL;
 
-   if (event->attr.exclude_user   ||
-   event->attr.exclude_kernel ||
-   event->attr.exclude_hv ||
-   event->attr.exclude_idle   ||
-   event->attr.exclude_host   ||
-   event->attr.exclude_guest)
-   return -EINVAL;
-
if (event->cpu < 0)
return -EINVAL;
 
@@ -544,6 +536,7 @@ static __init int l2x0_pmu_init(void)
.del = l2x0_pmu_event_del,
.event_init = l2x0_pmu_event_init,
.attr_groups = l2x0_pmu_attr_groups,
+   .capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
 
l2x0_pmu_reset();
-- 
2.7.4

[PATCH v3 05/12] arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE

2018-12-06 Thread Andrew Murray

The ARM PMU driver can be used to represent a variety of ARM based
PMUs. Some of these PMUs do not provide support for context
exclusion, where this is the case we advertise the
PERF_PMU_CAP_NO_EXCLUDE capability to ensure that perf prevents us
from handling events where any exclusion flags are set.

Signed-off-by: Andrew Murray 
---
 drivers/perf/arm_pmu.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 7f01f6f..ea69067 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -357,13 +357,6 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 }
 
 static int
-event_requires_mode_exclusion(struct perf_event_attr *attr)
-{
-   return attr->exclude_idle || attr->exclude_user ||
-  attr->exclude_kernel || attr->exclude_hv;
-}
-
-static int
 __hw_perf_event_init(struct perf_event *event)
 {
struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
@@ -393,9 +386,8 @@ __hw_perf_event_init(struct perf_event *event)
/*
 * Check whether we need to exclude the counter from certain modes.
 */
-   if ((!armpmu->set_event_filter ||
-armpmu->set_event_filter(hwc, >attr)) &&
-event_requires_mode_exclusion(>attr)) {
+   if (armpmu->set_event_filter &&
+   armpmu->set_event_filter(hwc, >attr)) {
pr_debug("ARM performance counters do not support "
 "mode exclusion\n");
return -EOPNOTSUPP;
@@ -861,6 +853,9 @@ int armpmu_register(struct arm_pmu *pmu)
if (ret)
return ret;
 
+   if (!pmu->set_event_filter)
+   pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
+
ret = perf_pmu_register(>pmu, pmu->name, -1);
if (ret)
goto out_destroy;
-- 
2.7.4

[PATCH v3 04/12] alpha: perf/core: use PERF_PMU_CAP_NO_EXCLUDE

2018-12-06 Thread Andrew Murray

As the Alpha PMU doesn't support context exclusion let's advertise
the PERF_PMU_CAP_NO_EXCLUDE capability. This ensures that perf will
prevent us from handling events where any exclusion flags are set.
Let's also remove the now unnecessary check for exclusion flags.

This change means that __hw_perf_event_init will now also
indicate that it doesn't support exclude_host and exclude_guest and
will now implicitly return -EINVAL instead of -EPERM. This is likely
more desirable as -EPERM will result in a kernel.perf_event_paranoid
related warning from the perf userspace utility.

Signed-off-by: Andrew Murray 
---
 arch/alpha/kernel/perf_event.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c
index 5613aa37..4341ccf 100644
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -630,12 +630,6 @@ static int __hw_perf_event_init(struct perf_event *event)
return ev;
}
 
-   /* The EV67 does not support mode exclusion */
-   if (attr->exclude_kernel || attr->exclude_user
-   || attr->exclude_hv || attr->exclude_idle) {
-   return -EPERM;
-   }
-
/*
 * We place the event type in event_base here and leave calculation
 * of the codes to programme the PMU for alpha_pmu_enable() because
@@ -771,6 +765,7 @@ static struct pmu pmu = {
.start  = alpha_pmu_start,
.stop   = alpha_pmu_stop,
.read   = alpha_pmu_read,
+   .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
 };
 
 
-- 
2.7.4

[PATCH v3 03/12] perf/core: add PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs

2018-12-06 Thread Andrew Murray

Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the events attribute flags. This approach is error prone and
often inconsistent.

Let's instead allow PMU drivers to advertise their inability to exclude
based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This
allows the perf core to reject requests for exclusion events where
there is no support in the PMU.

Signed-off-by: Andrew Murray 
---
 include/linux/perf_event.h | 1 +
 kernel/events/core.c   | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b2e806f..fe92b89 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -244,6 +244,7 @@ struct perf_event;
 #define PERF_PMU_CAP_EXCLUSIVE 0x10
 #define PERF_PMU_CAP_ITRACE0x20
 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS0x40
+#define PERF_PMU_CAP_NO_EXCLUDE0x80
 
 /**
  * struct pmu - generic performance monitoring unit
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5a97f34..5113cfc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9743,6 +9743,15 @@ static int perf_try_init_event(struct pmu *pmu, struct 
perf_event *event)
if (ctx)
perf_event_ctx_unlock(event->group_leader, ctx);
 
+   if (!ret) {
+   if (pmu->capabilities & PERF_PMU_CAP_NO_EXCLUDE &&
+   event_has_any_exclude_flag(event)) {
+   if (event->destroy)
+   event->destroy(event);
+   ret = -EINVAL;
+   }
+   }
+
if (ret)
module_put(pmu->module);
 
-- 
2.7.4

[PATCH v3 02/12] perf/core: add function to test for event exclusion flags

2018-12-06 Thread Andrew Murray

Add a function that tests if any of the perf event exclusion flags
are set on a given event.

Signed-off-by: Andrew Murray 
---
 include/linux/perf_event.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f..b2e806f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1004,6 +1004,15 @@ perf_event__output_id_sample(struct perf_event *event,
 extern void
 perf_log_lost_samples(struct perf_event *event, u64 lost);
 
+static inline bool event_has_any_exclude_flag(struct perf_event *event)
+{
+   struct perf_event_attr *attr = >attr;
+
+   return attr->exclude_idle || attr->exclude_user ||
+  attr->exclude_kernel || attr->exclude_hv ||
+  attr->exclude_guest || attr->exclude_host;
+}
+
 static inline bool is_sampling_event(struct perf_event *event)
 {
return event->attr.sample_period != 0;
-- 
2.7.4

[PATCH v3 01/12] perf/doc: update design.txt for exclude_{host|guest} flags

2018-12-06 Thread Andrew Murray

Update design.txt to reflect the presence of the exclude_host
and exclude_guest perf flags.

Signed-off-by: Andrew Murray 
---
 tools/perf/design.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index a28dca2..0453ba2 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits 
provide a
 way to request that counting of events be restricted to times when the
 CPU is in user, kernel and/or hypervisor mode.
 
+Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
+to request counting of events restricted to guest and host contexts when
+using Linux as the hypervisor.
+
 The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap
 operations, these can be used to relate userspace IP addresses to actual
 code, even after the mapping (or even the whole process) is gone,
-- 
2.7.4

[PATCH v3 00/12] perf/core: Generalise event exclusion checking

2018-12-06 Thread Andrew Murray

Many PMU drivers do not have the capability to exclude counting events
that occur in specific contexts such as idle, kernel, guest, etc. These
drivers indicate this by returning an error in their event_init upon
testing the events attribute flags.

However this approach requires that each time a new event modifier is
added to perf, all the perf drivers need to be modified to indicate that
they don't support the attribute. This results in additional boiler-plate
code common to many drivers that needs to be maintained. Furthermore the
drivers are not consistent with regards to the error value they return
when reporting unsupported attributes.

This patchset allow PMU drivers to advertise their inability to exclude
based on context via a new capability: PERF_PMU_CAP_NO_EXCLUDE. This
allows the perf core to reject requests for exclusion events where there
is no support in the PMU.

This is a functional change, in particular:

 - Some drivers will now additionally (but correctly) report unsupported
   exclusion flags. It's typical for existing userspace tools such as
   perf to handle such errors by retrying the system call without the
   unsupported flags.

 - Drivers that do not support any exclusion that previously reported
   -EPERM or -EOPNOTSUPP will now report -EINVAL - this is consistent
   with the majority and results in userspace perf retrying without
   exclusion.

All drivers touched by this patchset have been compile tested.

Changes from v2:

 - Invert logic from CAP_EXCLUDE to CAP_NO_EXCLUDE

Changes from v1:

 - Changed approach from explicitly rejecting events in unsupporting PMU
   drivers to explicitly advertising a capability in PMU drivers that
   do support exclusion events

 - Added additional information to tools/perf/design.txt

 - Rename event_has_exclude_flags to event_has_any_exclude_flag and
   update commit log to reflect it's a function

Andrew Murray (12):
  perf/doc: update design.txt for exclude_{host|guest} flags
  perf/core: add function to test for event exclusion flags
  perf/core: add PERF_PMU_CAP_NO_EXCLUDE for exclusion incapable PMUs
  alpha: perf/core: use PERF_PMU_CAP_NO_EXCLUDE
  arm: perf: conditionally use PERF_PMU_CAP_NO_EXCLUDE
  arm: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
  drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude
incapable PMUs
  drivers/perf: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude
incapable PMUs
  powerpc: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable
PMUs
  x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
  x86: perf/core: use PERF_PMU_CAP_NO_EXCLUDE for exclude incapable PMUs
  perf/core: remove unused perf_flags

 arch/alpha/kernel/perf_event.c|  7 +--
 arch/arm/mach-imx/mmdc.c  |  9 ++---
 arch/arm/mm/cache-l2x0-pmu.c  |  9 +
 arch/powerpc/perf/hv-24x7.c   | 10 +-
 arch/powerpc/perf/hv-gpci.c   | 10 +-
 arch/powerpc/perf/imc-pmu.c   | 19 +--
 arch/x86/events/amd/ibs.c | 13 +
 arch/x86/events/amd/iommu.c   |  6 +-
 arch/x86/events/amd/power.c   | 10 ++
 arch/x86/events/amd/uncore.c  |  7 ++-
 arch/x86/events/intel/cstate.c| 12 +++-
 arch/x86/events/intel/rapl.c  |  9 ++---
 arch/x86/events/intel/uncore.c|  9 +
 arch/x86/events/intel/uncore_snb.c|  9 ++---
 arch/x86/events/msr.c | 10 ++
 drivers/perf/arm-cci.c| 10 +-
 drivers/perf/arm-ccn.c|  6 ++
 drivers/perf/arm_dsu_pmu.c|  9 ++---
 drivers/perf/arm_pmu.c| 15 +--
 drivers/perf/hisilicon/hisi_uncore_ddrc_pmu.c |  1 +
 drivers/perf/hisilicon/hisi_uncore_hha_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c  |  1 +
 drivers/perf/hisilicon/hisi_uncore_pmu.c  |  9 -
 drivers/perf/qcom_l2_pmu.c|  9 +
 drivers/perf/qcom_l3_pmu.c|  8 +---
 drivers/perf/xgene_pmu.c  |  6 +-
 include/linux/perf_event.h| 10 ++
 include/uapi/linux/perf_event.h   |  2 --
 kernel/events/core.c  |  9 +
 tools/include/uapi/linux/perf_event.h |  2 --
 tools/perf/design.txt |  4 
 31 files changed, 62 insertions(+), 189 deletions(-)

-- 
2.7.4

Re: [PATCH v2] misc: cxl: Use device_type helpers to access the node type

2018-12-06 Thread Frederic Barrat





Le 05/12/2018 à 20:16, Rob Herring a écrit :

Remove directly accessing device_type property and use the
of_node_is_type accessor instead. While not using it here, this is
part of eventually removing the struct device_node.type pointer.

Cc: Frederic Barrat 
Cc: Arnd Bergmann 
Cc: Greg Kroah-Hartman 
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Andrew Donnellan 
Signed-off-by: Rob Herring 
---
v2:
- Reword commit message as this change was using the .type ptr.


Acked-by: Frederic Barrat 




  drivers/misc/cxl/pci.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
index b66d832d3233..c79ba1c699ad 100644
--- a/drivers/misc/cxl/pci.c
+++ b/drivers/misc/cxl/pci.c
@@ -1718,7 +1718,6 @@ int cxl_slot_is_switched(struct pci_dev *dev)
  {
struct device_node *np;
int depth = 0;
-   const __be32 *prop;

if (!(np = pci_device_to_OF_node(dev))) {
pr_err("cxl: np = NULL\n");
@@ -1727,8 +1726,7 @@ int cxl_slot_is_switched(struct pci_dev *dev)
of_node_get(np);
while (np) {
np = of_get_next_parent(np);
-   prop = of_get_property(np, "device_type", NULL);
-   if (!prop || strcmp((char *)prop, "pciex"))
+   if (!of_node_is_type(np, "pciex"))
break;
depth++;
}

[PATCH 7/7] powerpc/mm: Fallback to RAM if the altmap is unusable

2018-12-06 Thread Oliver O'Halloran

The "altmap" is used to provide a pool of memory that is reserved for
the vmemmap backing of hot-plugged memory. This is useful when adding
large amount of ZONE_DEVICE memory to a system with a limited amount of
normal memory.

On ppc64 we use huge pages to map the vmemmap which requires the backing
storage to be contigious and aligned to the hugepage size. The altmap
implementation allows for the altmap provider to reserve a few PFNs at
the start of the range for it's own uses and when this occurs the
first chunk of the altmap is not usage for hugepage mappings. On hash
there is no sane way to fall back to a normal sized page mapping so we
fail the allocation. This results in memory hotplug failing with
ENOMEM when the new range doesn't fall into an existing vmemmap block.

This patch handles this case by falling back to using system memory
rather than failing if we cannot allocate from the altmap. This
fallback should only ever be used for the first vmemmap block so it
should not cause excess memory consumption.

Fixes: 7b73d978a5d0 ("mm: pass the vmem_altmap to vmemmap_populate")
Signed-off-by: Oliver O'Halloran 
---
The Fixes here is a little dubious since the bug was present when altmap
support for powerpc was first added in b584c2544041 ("powerpc/vmemmap:
Add altmap support"). However, the rework in 7b73d978a5d0 ("mm: pass the
vmem_altmap to vmemmap_populate") moved the altmap allocation out of
generic code and into arch code so a fix for b584c2544041 would
be a completely different patch.
---
 arch/powerpc/mm/init_64.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index be6e964625da..960f2a44c525 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -173,15 +173,20 @@ int __meminit vmemmap_populate(unsigned long start, 
unsigned long end, int node,
pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node);
 
for (; start < end; start += page_size) {
-   void *p;
+   void *p = NULL;
int rc;
 
if (vmemmap_populated(start, page_size))
continue;
 
+   /*
+* Allocate from the altmap first if we have one. This may
+* fail due to alignment issues when using 16MB hugepages, so
+* fall back to system memory if the altmap allocation fail.
+*/
if (altmap)
p = altmap_alloc_block_buf(page_size, altmap);
-   else
+   if (!p)
p = vmemmap_alloc_block_buf(page_size, node);
if (!p)
return -ENOMEM;
@@ -241,8 +246,15 @@ void __ref vmemmap_free(unsigned long start, unsigned long 
end,
 {
unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
unsigned long page_order = get_order(page_size);
+   unsigned long alt_start = ~0, alt_end = ~0;
+   unsigned long base_pfn;
 
start = _ALIGN_DOWN(start, page_size);
+   if (altmap) {
+   alt_start = altmap->base_pfn;
+   alt_end = altmap->base_pfn + altmap->reserve +
+ altmap->free + altmap->alloc + altmap->align;
+   }
 
pr_debug("vmemmap_free %lx...%lx\n", start, end);
 
@@ -266,8 +278,9 @@ void __ref vmemmap_free(unsigned long start, unsigned long 
end,
page = pfn_to_page(addr >> PAGE_SHIFT);
section_base = pfn_to_page(vmemmap_section_start(start));
nr_pages = 1 << page_order;
+   base_pfn = PHYS_PFN(addr);
 
-   if (altmap) {
+   if (base_pfn >= alt_start && base_pfn < alt_end) {
vmem_altmap_free(altmap, nr_pages);
} else if (PageReserved(page)) {
/* allocated from bootmem */
-- 
2.17.2

[PATCH 6/7] powerpc/papr_scm: Use ibm,unit-guid as the iset cookie

2018-12-06 Thread Oliver O'Halloran

The interleave set cookie is used to determine if a label stored in the
metadata space should be applied to the current region. This is
important in the case of NVDIMMs since the firmware may change the
interleaving configuration of a DIMM which would invalidate the existing
labels. In our case the hypervisor hides those details from us so we
don't really care, but libnvdimm still requires the interleave set
cookie to be non-zero.

For our purposes we just need the set cookie to be unique and fixed for
a given PAPR SCM region and using the unit-guid (really a UUID) is fine
for this purpose.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 871b386dd4b2..92f76c656f85 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -264,6 +264,8 @@ static int papr_scm_probe(struct platform_device *pdev)
uint32_t drc_index, metadata_size;
uint64_t blocks, block_size;
struct papr_scm_priv *p;
+   const char *uuid_str;
+   uint64_t uuid[2];
int rc;
 
/* check we have all the required DT properties */
@@ -282,6 +284,11 @@ static int papr_scm_probe(struct platform_device *pdev)
return -ENODEV;
}
 
+   if (of_property_read_string(dn, "ibm,unit-guid", _str)) {
+   dev_err(>dev, "%pOF: missing unit-guid!\n", dn);
+   return -ENODEV;
+   }
+
p = kzalloc(sizeof(*p), GFP_KERNEL);
if (!p)
return -ENOMEM;
@@ -294,6 +301,11 @@ static int papr_scm_probe(struct platform_device *pdev)
p->block_size = block_size;
p->blocks = blocks;
 
+   /* We just need to ensure that set cookies are unique across */
+   uuid_parse(uuid_str, (uuid_t *) uuid);
+   p->nd_set.cookie1 = uuid[0];
+   p->nd_set.cookie2 = uuid[1];
+
/* might be zero */
p->metadata_size = metadata_size;
p->pdev = pdev;
-- 
2.17.2

[PATCH 5/7] powerpc/papr_scm: Fix DIMM device registration race

2018-12-06 Thread Oliver O'Halloran

When a new nvdimm device is registered with libnvdimm via
nvdimm_create() it is added as a device on the nvdimm bus. The probe
function for the DIMM driver is potentially quite slow so actually
registering and probing the device is done in an async domain rather
than immediately after device creation. This can result in a race where
the region device (created 2nd) is probed first and fails to activate at
boot.

To fix this we use the same approach as the ACPI/NFIT driver which is to
check that all the DIMM devices registered successfully. LibNVDIMM
provides the nvdimm_bus_count_dimms() function which synchronises with
the async domain and verifies that the dimm was successfully registered
with the bus.

If either of these does not occur then we bail.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 90203039dee8..871b386dd4b2 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -223,6 +223,9 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
goto err;
}
 
+   if (nvdimm_bus_check_dimm_count(p->bus, 1))
+   goto err;
+
/* now add the region */
 
memset(, 0, sizeof(mapping));
-- 
2.17.2

[PATCH 4/7] powerpc/papr_scm: Remove endian conversions

2018-12-06 Thread Oliver O'Halloran

The return values of a h-call are returned in the CPU registers and
written to the provided buffer by the plpar_hcall() wrapper. As a result
the values written to memory are always in the native endian and should
not be byte swapped.

The inital implementation of the H-Call interface was done in qemu and
the returned values were byte swapped unnecessarily in both the
hypervisor and in the driver so this was only noticed when bringing up
the PowerVM implementation.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index ed1082cc1d27..90203039dee8 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -55,7 +55,7 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
do {
rc = plpar_hcall(H_SCM_BIND_MEM, ret, p->drc_index, 0,
p->blocks, BIND_ANY_ADDR, token);
-   token = be64_to_cpu(ret[0]);
+   token = ret[0];
cond_resched();
} while (rc == H_BUSY);
 
@@ -64,7 +64,7 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
return -ENXIO;
}
 
-   p->bound_addr = be64_to_cpu(ret[1]);
+   p->bound_addr = ret[1];
 
dev_dbg(>pdev->dev, "bound drc %x to %pR\n", p->drc_index, >res);
 
@@ -82,7 +82,7 @@ static int drc_pmem_unbind(struct papr_scm_priv *p)
do {
rc = plpar_hcall(H_SCM_UNBIND_MEM, ret, p->drc_index,
p->bound_addr, p->blocks, token);
-   token = be64_to_cpu(ret);
+   token = ret[0];
cond_resched();
} while (rc == H_BUSY);
 
-- 
2.17.2

[PATCH 3/7] powerpc/papr_scm: Update DT properties

2018-12-06 Thread Oliver O'Halloran

The ibm,unit-sizes property was originally specified as an array of two
u32s corresponding to the memory block size, and the number of blocks
available in that region. A fairly last-minute change to the SCM DT
specification was splitting that into two seperate u64 properties:
ibm,block-sizes and ibm,number-of-blocks that convey the same
information. No firmware / hypervisor that emitted the ibm,unit-size
property ever appeared in the wild.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 390badd33547..ed1082cc1d27 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -257,8 +257,9 @@ err:nvdimm_bus_unregister(p->bus);
 
 static int papr_scm_probe(struct platform_device *pdev)
 {
-   uint32_t drc_index, metadata_size, unit_cap[2];
struct device_node *dn = pdev->dev.of_node;
+   uint32_t drc_index, metadata_size;
+   uint64_t blocks, block_size;
struct papr_scm_priv *p;
int rc;
 
@@ -268,8 +269,13 @@ static int papr_scm_probe(struct platform_device *pdev)
return -ENODEV;
}
 
-   if (of_property_read_u32_array(dn, "ibm,unit-capacity", unit_cap, 2)) {
-   dev_err(>dev, "%pOF: missing unit-capacity!\n", dn);
+   if (of_property_read_u64(dn, "ibm,block-size", _size)) {
+   dev_err(>dev, "%pOF: missing block-size!\n", dn);
+   return -ENODEV;
+   }
+
+   if (of_property_read_u64(dn, "ibm,number-of-blocks", )) {
+   dev_err(>dev, "%pOF: missing number-of-blocks!\n", dn);
return -ENODEV;
}
 
@@ -282,8 +288,8 @@ static int papr_scm_probe(struct platform_device *pdev)
 
p->dn = dn;
p->drc_index = drc_index;
-   p->block_size = unit_cap[0];
-   p->blocks = unit_cap[1];
+   p->block_size = block_size;
+   p->blocks = blocks;
 
/* might be zero */
p->metadata_size = metadata_size;
-- 
2.17.2

[PATCH 2/7] powerpc/papr_scm: Fix resource end address

2018-12-06 Thread Oliver O'Halloran

Fix an off-by-one error in the memory resource range. This resource is
used to determine the address range of the memory to be hot-plugged as
ZONE_DEVICE memory. The current end address results in the kernel
attempting to map an additional memblock and the hypervisor may reject
the mapping resulting in the entire hot-plug failing.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index ee9372b65ca5..390badd33547 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -296,7 +296,7 @@ static int papr_scm_probe(struct platform_device *pdev)
 
/* setup the resource for the newly bound range */
p->res.start = p->bound_addr;
-   p->res.end   = p->bound_addr + p->blocks * p->block_size;
+   p->res.end   = p->bound_addr + p->blocks * p->block_size - 1;
p->res.name  = pdev->name;
p->res.flags = IORESOURCE_MEM;
 
-- 
2.17.2

[PATCH 1/7] powerpc/papr_scm: Use depend instead of select

2018-12-06 Thread Oliver O'Halloran

Making PAPR_SCM select LIBNVDIMM results in circular dependencies in
Kconfig when another symbol depends on it. Fix this by replacing the
select with a depends.

Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Reported-by: Alastair D'Silva 
Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/pseries/Kconfig | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index ed91c9aea78e..e31e397003b3 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -141,8 +141,7 @@ config IBMEBUS
  Bus device driver for GX bus based adapters.
 
 config PAPR_SCM
-   depends on PPC_PSERIES && MEMORY_HOTPLUG
-   select LIBNVDIMM
+   depends on PPC_PSERIES && MEMORY_HOTPLUG && LIBNVDIMM
tristate "Support for the PAPR Storage Class Memory interface"
help
  Enable access to hypervisor provided storage class memory.
-- 
2.17.2

papr SCM driver fixes

2018-12-06 Thread Oliver O'Halloran

Various bug fixes for the papr_scm driver that were found while bringing
up the PowerVM implementation of the interface. There's a few minor bugs
there were a result of bugs in the original QEMU implementation, a few
due to the memory layouts being different and one due to a change to the
DT bindings.

Mostly self-contained to the driver with the exception of the last patch
which touches the powerpc specific parts of the vmemmap handling.

Re: [PATCH v2 08/34] dt-bindings: arm: Convert PMU binding to json-schema

2018-12-06 Thread Will Deacon

On Wed, Dec 05, 2018 at 09:42:04AM -0600, Rob Herring wrote:
> On Wed, Dec 5, 2018 at 4:08 AM Will Deacon  wrote:
> > On Mon, Dec 03, 2018 at 03:31:57PM -0600, Rob Herring wrote:
> > > +properties:
> > > +  compatible:
> > > +oneOf:
> > > +  - items:
> > > +  - enum:
> > > +  - apm,potenza-pmu
> > > +  - arm,armv8-pmuv3
> > > +  - arm,cortex-a73-pmu
> > > +  - arm,cortex-a72-pmu
> > > +  - arm,cortex-a57-pmu
> > > +  - arm,cortex-a53-pmu
> > > +  - arm,cortex-a35-pmu
> > > +  - arm,cortex-a17-pmu
> > > +  - arm,cortex-a15-pmu
> > > +  - arm,cortex-a12-pmu
> > > +  - arm,cortex-a9-pmu
> > > +  - arm,cortex-a8-pmu
> > > +  - arm,cortex-a7-pmu
> > > +  - arm,cortex-a5-pmu
> > > +  - arm,arm11mpcore-pmu
> > > +  - arm,arm1176-pmu
> > > +  - arm,arm1136-pmu
> > > +  - brcm,vulcan-pmu
> > > +  - cavium,thunder-pmu
> > > +  - qcom,scorpion-pmu
> > > +  - qcom,scorpion-mp-pmu
> > > +  - qcom,krait-pmu
> > > +  - items:
> > > +  - const: arm,cortex-a7-pmu
> > > +  - const: arm,cortex-a15-pmu
> >
> > What do these last two mean?
> 
> The first list only allows a single compatible string. This list says
> there are 2 and that the compatible property should be:
> compatible = "arm,cortex-a7-pmu", "arm,cortex-a15-pmu";
> 
> Which shows up here:
> arch/arm/boot/dts/sun6i-a31.dtsi:   compatible =
> "arm,cortex-a7-pmu", "arm,cortex-a15-pmu";
> arch/arm/boot/dts/sun7i-a20.dtsi:   compatible =
> "arm,cortex-a7-pmu", "arm,cortex-a15-pmu";
> 
> Maybe the dts is wrong and should be fixed?

Yes, that's wrong and you could end up with the kernel exposing the wrong
events to userspace. So I think we can fix the .dts and leave the binding
without this.

> > > +required:
> > > +  - compatible
> >
> > I probably said this before, but I do think that it's a shame to lose the
> > example binding, especially for something like the PMU where you can
> > pretty much take an example and bang in your own IRQ numbers to get it
> > up and running.
> 
> I just thought it is so trivial that it didn't add much. I think most
> folks would copy-n-paste from an actual dts file which has all the
> other nodes you just need to update addresses and irq nums.

True... even more of a reason to fix thise sun* .dts files then!

Will

Re: [PATCH 08/34] powerpc/dma: untangle vio_dma_mapping_ops from dma_iommu_ops

2018-12-06 Thread Christoph Hellwig

ping?

On Wed, Nov 14, 2018 at 09:22:48AM +0100, Christoph Hellwig wrote:
> vio_dma_mapping_ops currently does a lot of indirect calls through
> dma_iommu_ops, which not only make the code harder to follow but are
> also expensive in the post-spectre world.  Unwind the indirect calls
> by calling the ppc_iommu_* or iommu_* APIs directly applicable, or
> just use the dma_iommu_* methods directly where we can.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/include/asm/iommu.h |  1 +
>  arch/powerpc/kernel/dma-iommu.c  |  2 +-
>  arch/powerpc/platforms/pseries/vio.c | 87 
>  3 files changed, 38 insertions(+), 52 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h 
> b/arch/powerpc/include/asm/iommu.h
> index 35db0cbc9222..75daa10f31a4 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -242,6 +242,7 @@ static inline int __init tce_iommu_bus_notifier_init(void)
>  }
>  #endif /* !CONFIG_IOMMU_API */
>  
> +u64 dma_iommu_get_required_mask(struct device *dev);
>  int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr);
>  
>  #else
> diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
> index 2ca6cfaebf65..0613278abf9f 100644
> --- a/arch/powerpc/kernel/dma-iommu.c
> +++ b/arch/powerpc/kernel/dma-iommu.c
> @@ -92,7 +92,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
>   return 1;
>  }
>  
> -static u64 dma_iommu_get_required_mask(struct device *dev)
> +u64 dma_iommu_get_required_mask(struct device *dev)
>  {
>   struct iommu_table *tbl = get_iommu_table_base(dev);
>   u64 mask;
> diff --git a/arch/powerpc/platforms/pseries/vio.c 
> b/arch/powerpc/platforms/pseries/vio.c
> index 88f1ad1d6309..ea3a9745c812 100644
> --- a/arch/powerpc/platforms/pseries/vio.c
> +++ b/arch/powerpc/platforms/pseries/vio.c
> @@ -492,7 +492,9 @@ static void *vio_dma_iommu_alloc_coherent(struct device 
> *dev, size_t size,
>   return NULL;
>   }
>  
> - ret = dma_iommu_ops.alloc(dev, size, dma_handle, flag, attrs);
> + ret = iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
> + dma_handle, dev->coherent_dma_mask, flag,
> + dev_to_node(dev));
>   if (unlikely(ret == NULL)) {
>   vio_cmo_dealloc(viodev, roundup(size, PAGE_SIZE));
>   atomic_inc(>cmo.allocs_failed);
> @@ -507,8 +509,7 @@ static void vio_dma_iommu_free_coherent(struct device 
> *dev, size_t size,
>  {
>   struct vio_dev *viodev = to_vio_dev(dev);
>  
> - dma_iommu_ops.free(dev, size, vaddr, dma_handle, attrs);
> -
> + iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
>   vio_cmo_dealloc(viodev, roundup(size, PAGE_SIZE));
>  }
>  
> @@ -518,22 +519,22 @@ static dma_addr_t vio_dma_iommu_map_page(struct device 
> *dev, struct page *page,
>   unsigned long attrs)
>  {
>   struct vio_dev *viodev = to_vio_dev(dev);
> - struct iommu_table *tbl;
> + struct iommu_table *tbl = get_iommu_table_base(dev);
>   dma_addr_t ret = IOMMU_MAPPING_ERROR;
>  
> - tbl = get_iommu_table_base(dev);
> - if (vio_cmo_alloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl {
> - atomic_inc(>cmo.allocs_failed);
> - return ret;
> - }
> -
> - ret = dma_iommu_ops.map_page(dev, page, offset, size, direction, attrs);
> - if (unlikely(dma_mapping_error(dev, ret))) {
> - vio_cmo_dealloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl)));
> - atomic_inc(>cmo.allocs_failed);
> - }
> -
> + if (vio_cmo_alloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl
> + goto out_fail;
> + ret = iommu_map_page(dev, tbl, page, offset, size, device_to_mask(dev),
> + direction, attrs);
> + if (unlikely(ret == IOMMU_MAPPING_ERROR))
> + goto out_deallocate;
>   return ret;
> +
> +out_deallocate:
> + vio_cmo_dealloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl)));
> +out_fail:
> + atomic_inc(>cmo.allocs_failed);
> + return IOMMU_MAPPING_ERROR;
>  }
>  
>  static void vio_dma_iommu_unmap_page(struct device *dev, dma_addr_t 
> dma_handle,
> @@ -542,11 +543,9 @@ static void vio_dma_iommu_unmap_page(struct device *dev, 
> dma_addr_t dma_handle,
>unsigned long attrs)
>  {
>   struct vio_dev *viodev = to_vio_dev(dev);
> - struct iommu_table *tbl;
> -
> - tbl = get_iommu_table_base(dev);
> - dma_iommu_ops.unmap_page(dev, dma_handle, size, direction, attrs);
> + struct iommu_table *tbl = get_iommu_table_base(dev);
>  
> + iommu_unmap_page(tbl, dma_handle, size, direction, attrs);
>   vio_cmo_dealloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl)));
>  }
>  
> @@ -555,34 +554,32 @@ static int vio_dma_iommu_map_sg(struct device *dev, 
> struct scatterlist *sglist,

Re: [PATCH 14/34] powerpc/dart: remove dead cleanup code in iommu_init_early_dart

2018-12-06 Thread Christoph Hellwig

ping?

On Wed, Nov 14, 2018 at 09:22:54AM +0100, Christoph Hellwig wrote:
> If dart_init failed we didn't have a chance to setup dma or controller
> ops yet, so there is no point in resetting them.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/sysdev/dart_iommu.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/dart_iommu.c 
> b/arch/powerpc/sysdev/dart_iommu.c
> index a5b40d1460f1..283ce04c5844 100644
> --- a/arch/powerpc/sysdev/dart_iommu.c
> +++ b/arch/powerpc/sysdev/dart_iommu.c
> @@ -428,7 +428,7 @@ void __init iommu_init_early_dart(struct 
> pci_controller_ops *controller_ops)
>  
>   /* Initialize the DART HW */
>   if (dart_init(dn) != 0)
> - goto bail;
> + return;
>  
>   /* Setup bypass if supported */
>   if (dart_is_u4)
> @@ -439,15 +439,6 @@ void __init iommu_init_early_dart(struct 
> pci_controller_ops *controller_ops)
>  
>   /* Setup pci_dma ops */
>   set_pci_dma_ops(_iommu_ops);
> - return;
> -
> - bail:
> - /* If init failed, use direct iommu and null setup functions */
> - controller_ops->dma_dev_setup = NULL;
> - controller_ops->dma_bus_setup = NULL;
> -
> - /* Setup pci_dma ops */
> - set_pci_dma_ops(_nommu_ops);
>  }
>  
>  #ifdef CONFIG_PM
> -- 
> 2.19.1
> 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---

Re: [PATCH 19/34] cxl: drop the dma_set_mask callback from vphb

2018-12-06 Thread Christoph Hellwig

ping?

On Wed, Nov 14, 2018 at 09:22:59AM +0100, Christoph Hellwig wrote:
> The CXL code never even looks at the dma mask, so there is no good
> reason for this sanity check.  Remove it because it gets in the way
> of the dma ops refactoring.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/misc/cxl/vphb.c | 12 
>  1 file changed, 12 deletions(-)
> 
> diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
> index 7908633d9204..49da2f744bbf 100644
> --- a/drivers/misc/cxl/vphb.c
> +++ b/drivers/misc/cxl/vphb.c
> @@ -11,17 +11,6 @@
>  #include 
>  #include "cxl.h"
>  
> -static int cxl_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
> -{
> - if (dma_mask < DMA_BIT_MASK(64)) {
> - pr_info("%s only 64bit DMA supported on CXL", __func__);
> - return -EIO;
> - }
> -
> - *(pdev->dev.dma_mask) = dma_mask;
> - return 0;
> -}
> -
>  static int cxl_pci_probe_mode(struct pci_bus *bus)
>  {
>   return PCI_PROBE_NORMAL;
> @@ -220,7 +209,6 @@ static struct pci_controller_ops cxl_pci_controller_ops =
>   .reset_secondary_bus = cxl_pci_reset_secondary_bus,
>   .setup_msi_irqs = cxl_setup_msi_irqs,
>   .teardown_msi_irqs = cxl_teardown_msi_irqs,
> - .dma_set_mask = cxl_dma_set_mask,
>  };
>  
>  int cxl_pci_vphb_add(struct cxl_afu *afu)
> -- 
> 2.19.1
> 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---

Re: [PATCH 06/34] powerpc/dma: split the two __dma_alloc_coherent implementations

2018-12-06 Thread Christoph Hellwig

ping?

On Wed, Nov 14, 2018 at 09:22:46AM +0100, Christoph Hellwig wrote:
> The implemementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
> any code with the one for systems with coherent caches.  Split it off
> and merge it with the helpers in dma-noncoherent.c that have no other
> callers.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/include/asm/dma-mapping.h |  5 -
>  arch/powerpc/kernel/dma.c  | 14 ++
>  arch/powerpc/mm/dma-noncoherent.c  | 15 +++
>  arch/powerpc/platforms/44x/warp.c  |  2 +-
>  4 files changed, 10 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> b/arch/powerpc/include/asm/dma-mapping.h
> index f2a4a7142b1e..dacd0f93f2b2 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -39,9 +39,6 @@ extern int dma_nommu_mmap_coherent(struct device *dev,
>   * to ensure it is consistent.
>   */
>  struct device;
> -extern void *__dma_alloc_coherent(struct device *dev, size_t size,
> -   dma_addr_t *handle, gfp_t gfp);
> -extern void __dma_free_coherent(size_t size, void *vaddr);
>  extern void __dma_sync(void *vaddr, size_t size, int direction);
>  extern void __dma_sync_page(struct page *page, unsigned long offset,
>size_t size, int direction);
> @@ -52,8 +49,6 @@ extern unsigned long __dma_get_coherent_pfn(unsigned long 
> cpu_addr);
>   * Cache coherent cores.
>   */
>  
> -#define __dma_alloc_coherent(dev, gfp, size, handle) NULL
> -#define __dma_free_coherent(size, addr)  ((void)0)
>  #define __dma_sync(addr, size, rw)   ((void)0)
>  #define __dma_sync_page(pg, off, sz, rw) ((void)0)
>  
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index 6551685a4ed0..d6deb458bb91 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -62,18 +62,12 @@ static int dma_nommu_dma_supported(struct device *dev, 
> u64 mask)
>  #endif
>  }
>  
> +#ifndef CONFIG_NOT_COHERENT_CACHE
>  void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
> dma_addr_t *dma_handle, gfp_t flag,
> unsigned long attrs)
>  {
>   void *ret;
> -#ifdef CONFIG_NOT_COHERENT_CACHE
> - ret = __dma_alloc_coherent(dev, size, dma_handle, flag);
> - if (ret == NULL)
> - return NULL;
> - *dma_handle += get_dma_offset(dev);
> - return ret;
> -#else
>   struct page *page;
>   int node = dev_to_node(dev);
>  #ifdef CONFIG_FSL_SOC
> @@ -110,19 +104,15 @@ void *__dma_nommu_alloc_coherent(struct device *dev, 
> size_t size,
>   *dma_handle = __pa(ret) + get_dma_offset(dev);
>  
>   return ret;
> -#endif
>  }
>  
>  void __dma_nommu_free_coherent(struct device *dev, size_t size,
>   void *vaddr, dma_addr_t dma_handle,
>   unsigned long attrs)
>  {
> -#ifdef CONFIG_NOT_COHERENT_CACHE
> - __dma_free_coherent(size, vaddr);
> -#else
>   free_pages((unsigned long)vaddr, get_order(size));
> -#endif
>  }
> +#endif /* !CONFIG_NOT_COHERENT_CACHE */
>  
>  static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
>  dma_addr_t *dma_handle, gfp_t flag,
> diff --git a/arch/powerpc/mm/dma-noncoherent.c 
> b/arch/powerpc/mm/dma-noncoherent.c
> index b6e7b5952ab5..e955539686a4 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -29,7 +29,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  
>  #include 
> @@ -151,8 +151,8 @@ static struct ppc_vm_region *ppc_vm_region_find(struct 
> ppc_vm_region *head, unsi
>   * Allocate DMA-coherent memory space and return both the kernel remapped
>   * virtual and bus address for that space.
>   */
> -void *
> -__dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, 
> gfp_t gfp)
> +void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
>  {
>   struct page *page;
>   struct ppc_vm_region *c;
> @@ -223,7 +223,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, 
> dma_addr_t *handle, gfp_t
>   /*
>* Set the "dma handle"
>*/
> - *handle = page_to_phys(page);
> + *dma_handle = phys_to_dma(dev, page_to_phys(page));
>  
>   do {
>   SetPageReserved(page);
> @@ -249,12 +249,12 @@ __dma_alloc_coherent(struct device *dev, size_t size, 
> dma_addr_t *handle, gfp_t
>   no_page:
>   return NULL;
>  }
> -EXPORT_SYMBOL(__dma_alloc_coherent);
>  
>  /*
>   * free a page as defined by the above mapping.
>   */
> -void __dma_free_coherent(size_t size, void *vaddr)
> +void

Re: [PATCH 07/34] powerpc/dma: remove the no-op dma_nommu_unmap_{page, sg} routines

2018-12-06 Thread Christoph Hellwig

ping?

On Wed, Nov 14, 2018 at 09:22:47AM +0100, Christoph Hellwig wrote:
> These methods are optional, no need to implement no-op versions.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/kernel/dma.c | 16 
>  1 file changed, 16 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index d6deb458bb91..7078d72baec2 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -197,12 +197,6 @@ static int dma_nommu_map_sg(struct device *dev, struct 
> scatterlist *sgl,
>   return nents;
>  }
>  
> -static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction direction,
> - unsigned long attrs)
> -{
> -}
> -
>  static u64 dma_nommu_get_required_mask(struct device *dev)
>  {
>   u64 end, mask;
> @@ -228,14 +222,6 @@ static inline dma_addr_t dma_nommu_map_page(struct 
> device *dev,
>   return page_to_phys(page) + offset + get_dma_offset(dev);
>  }
>  
> -static inline void dma_nommu_unmap_page(struct device *dev,
> -  dma_addr_t dma_address,
> -  size_t size,
> -  enum dma_data_direction direction,
> -  unsigned long attrs)
> -{
> -}
> -
>  #ifdef CONFIG_NOT_COHERENT_CACHE
>  static inline void dma_nommu_sync_sg(struct device *dev,
>   struct scatterlist *sgl, int nents,
> @@ -261,10 +247,8 @@ const struct dma_map_ops dma_nommu_ops = {
>   .free   = dma_nommu_free_coherent,
>   .mmap   = dma_nommu_mmap_coherent,
>   .map_sg = dma_nommu_map_sg,
> - .unmap_sg   = dma_nommu_unmap_sg,
>   .dma_supported  = dma_nommu_dma_supported,
>   .map_page   = dma_nommu_map_page,
> - .unmap_page = dma_nommu_unmap_page,
>   .get_required_mask  = dma_nommu_get_required_mask,
>  #ifdef CONFIG_NOT_COHERENT_CACHE
>   .sync_single_for_cpu= dma_nommu_sync_single,
> -- 
> 2.19.1
> 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---

Re: [PATCH 03/34] powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define

2018-12-06 Thread Christoph Hellwig

ping?

On Wed, Nov 14, 2018 at 09:22:43AM +0100, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> Acked-by: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/include/asm/dma-mapping.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> b/arch/powerpc/include/asm/dma-mapping.h
> index 8fa394520af6..f2a4a7142b1e 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -112,7 +112,5 @@ extern int dma_set_mask(struct device *dev, u64 dma_mask);
>  
>  extern u64 __dma_get_required_mask(struct device *dev);
>  
> -#define ARCH_HAS_DMA_MMAP_COHERENT
> -
>  #endif /* __KERNEL__ */
>  #endif   /* _ASM_DMA_MAPPING_H */
> -- 
> 2.19.1
> 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---

Re: [PATCH 04/34] powerpc/dma: remove the unused ISA_DMA_THRESHOLD export

2018-12-06 Thread Christoph Hellwig

ping?

On Wed, Nov 14, 2018 at 09:22:44AM +0100, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> Acked-by: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/kernel/setup_32.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
> index 81909600013a..07f7e6aaf104 100644
> --- a/arch/powerpc/kernel/setup_32.c
> +++ b/arch/powerpc/kernel/setup_32.c
> @@ -59,7 +59,6 @@ unsigned long ISA_DMA_THRESHOLD;
>  unsigned int DMA_MODE_READ;
>  unsigned int DMA_MODE_WRITE;
>  
> -EXPORT_SYMBOL(ISA_DMA_THRESHOLD);
>  EXPORT_SYMBOL(DMA_MODE_READ);
>  EXPORT_SYMBOL(DMA_MODE_WRITE);
>  
> -- 
> 2.19.1
> 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---

Re: [PATCH 05/34] powerpc/dma: remove the unused dma_iommu_ops export

2018-12-06 Thread Christoph Hellwig

ping?

On Wed, Nov 14, 2018 at 09:22:45AM +0100, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/kernel/dma-iommu.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
> index f9fe2080ceb9..2ca6cfaebf65 100644
> --- a/arch/powerpc/kernel/dma-iommu.c
> +++ b/arch/powerpc/kernel/dma-iommu.c
> @@ -6,7 +6,6 @@
>   * busses using the iommu infrastructure
>   */
>  
> -#include 
>  #include 
>  
>  /*
> @@ -123,4 +122,3 @@ struct dma_map_ops dma_iommu_ops = {
>   .get_required_mask  = dma_iommu_get_required_mask,
>   .mapping_error  = dma_iommu_mapping_error,
>  };
> -EXPORT_SYMBOL(dma_iommu_ops);
> -- 
> 2.19.1
> 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---

Re: [PATCH 01/34] powerpc: use mm zones more sensibly

2018-12-06 Thread Christoph Hellwig

Ben / Michael,

can we get this one queued up for 4.21 to prepare for the DMA work later
on?

On Wed, Nov 14, 2018 at 09:22:41AM +0100, Christoph Hellwig wrote:
> Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on
> common 64-bit configfs, and ZONE_DMA32 is used for 31-bit schemes.
> 
> Move to a scheme closer to what other architectures use (and I dare to
> say the intent of the system):
> 
>  - ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
>  - ZONE_NORMAL: everything addressable by the kernel
>  - ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels
> 
> Also provide information on how ZONE_DMA is used by defining
> ARCH_ZONE_DMA_BITS.
> 
> Contains various fixes from Benjamin Herrenschmidt.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/Kconfig  |  8 +---
>  arch/powerpc/include/asm/page.h   |  2 +
>  arch/powerpc/include/asm/pgtable.h|  1 -
>  arch/powerpc/kernel/dma-swiotlb.c |  6 +--
>  arch/powerpc/kernel/dma.c |  7 +--
>  arch/powerpc/mm/mem.c | 47 +++
>  arch/powerpc/platforms/85xx/corenet_generic.c | 10 
>  arch/powerpc/platforms/85xx/qemu_e500.c   |  9 
>  include/linux/mmzone.h|  2 +-
>  9 files changed, 25 insertions(+), 67 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 8be31261aec8..c3613bc1 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -374,9 +374,9 @@ config PPC_ADV_DEBUG_DAC_RANGE
>   depends on PPC_ADV_DEBUG_REGS && 44x
>   default y
>  
> -config ZONE_DMA32
> +config ZONE_DMA
>   bool
> - default y if PPC64
> + default y if PPC_BOOK3E_64
>  
>  config PGTABLE_LEVELS
>   int
> @@ -869,10 +869,6 @@ config ISA
> have an IBM RS/6000 or pSeries machine, say Y.  If you have an
> embedded board, consult your board documentation.
>  
> -config ZONE_DMA
> - bool
> - default y
> -
>  config GENERIC_ISA_DMA
>   bool
>   depends on ISA_DMA_API
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index f6a1265face2..fc8c9ac0c6be 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -354,4 +354,6 @@ typedef struct page *pgtable_t;
>  #endif /* __ASSEMBLY__ */
>  #include 
>  
> +#define ARCH_ZONE_DMA_BITS 31
> +
>  #endif /* _ASM_POWERPC_PAGE_H */
> diff --git a/arch/powerpc/include/asm/pgtable.h 
> b/arch/powerpc/include/asm/pgtable.h
> index 9679b7519a35..8af32ce93c7f 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -66,7 +66,6 @@ extern unsigned long empty_zero_page[];
>  
>  extern pgd_t swapper_pg_dir[];
>  
> -void limit_zone_pfn(enum zone_type zone, unsigned long max_pfn);
>  int dma_pfn_limit_to_zone(u64 pfn_limit);
>  extern void paging_init(void);
>  
> diff --git a/arch/powerpc/kernel/dma-swiotlb.c 
> b/arch/powerpc/kernel/dma-swiotlb.c
> index 5fc335f4d9cd..678811abccfc 100644
> --- a/arch/powerpc/kernel/dma-swiotlb.c
> +++ b/arch/powerpc/kernel/dma-swiotlb.c
> @@ -108,12 +108,8 @@ int __init swiotlb_setup_bus_notifier(void)
>  
>  void __init swiotlb_detect_4g(void)
>  {
> - if ((memblock_end_of_DRAM() - 1) > 0x) {
> + if ((memblock_end_of_DRAM() - 1) > 0x)
>   ppc_swiotlb_enable = 1;
> -#ifdef CONFIG_ZONE_DMA32
> - limit_zone_pfn(ZONE_DMA32, (1ULL << 32) >> PAGE_SHIFT);
> -#endif
> - }
>  }
>  
>  static int __init check_swiotlb_enabled(void)
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index dbfc7056d7df..6551685a4ed0 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -50,7 +50,7 @@ static int dma_nommu_dma_supported(struct device *dev, u64 
> mask)
>   return 1;
>  
>  #ifdef CONFIG_FSL_SOC
> - /* Freescale gets another chance via ZONE_DMA/ZONE_DMA32, however
> + /* Freescale gets another chance via ZONE_DMA, however
>* that will have to be refined if/when they support iommus
>*/
>   return 1;
> @@ -94,13 +94,10 @@ void *__dma_nommu_alloc_coherent(struct device *dev, 
> size_t size,
>   }
>  
>   switch (zone) {
> +#ifdef CONFIG_ZONE_DMA
>   case ZONE_DMA:
>   flag |= GFP_DMA;
>   break;
> -#ifdef CONFIG_ZONE_DMA32
> - case ZONE_DMA32:
> - flag |= GFP_DMA32;
> - break;
>  #endif
>   };
>  #endif /* CONFIG_FSL_SOC */
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 0a64fffabee1..c0b676c3a5ba 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -246,35 +246,19 @@ static int __init mark_nonram_nosave(void)
>  }
>  #endif
>  
> -static bool zone_limits_final;
> -
>  /*
> - * The memory zones past TOP_ZONE are managed by generic mm code.
> - * These should be set to zero since that's what every other
> - *

Re: [PATCH 02/34] powerpc: allow NOT_COHERENT_CACHE for amigaone

2018-12-06 Thread Christoph Hellwig

powerpc maintainers, can you pick this up as this is a bug fix for the
currently existing powerpc Kconfig code?

On Wed, Nov 14, 2018 at 09:22:42AM +0100, Christoph Hellwig wrote:
> AMIGAONE select NOT_COHERENT_CACHE, so we better allow it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/platforms/Kconfig.cputype | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/Kconfig.cputype 
> b/arch/powerpc/platforms/Kconfig.cputype
> index f4e2c5729374..6fedbf349fce 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -412,7 +412,8 @@ config NR_CPUS
>  
>  config NOT_COHERENT_CACHE
>   bool
> - depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
> + depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || \
> + GAMECUBE_COMMON || AMIGAONE
>   default n if PPC_47x
>   default y
>  
> -- 
> 2.19.1
> 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---

Re: [PATCH 1/2] mips/kgdb: prepare arch_kgdb_ops for constness

2018-12-06 Thread Daniel Thompson

On Wed, Dec 05, 2018 at 04:41:09AM +, Christophe Leroy wrote:
> MIPS is the only architecture modifying arch_kgdb_ops during init.
> This patch makes the init static, so that it can be changed to
> const in following patch, as recommended by checkpatch.pl
> 
> Suggested-by: Paul Burton 
> Signed-off-by: Christophe Leroy 

>From my side this is
Acked-by: Daniel Thompson 

Since this is a dependency for the next patch I'd be happy to take via
my tree... but would need an ack from the MIPS guys to do that.


Daniel.

> ---
>  arch/mips/kernel/kgdb.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
> index eb6c0d582626..31eff1bec577 100644
> --- a/arch/mips/kernel/kgdb.c
> +++ b/arch/mips/kernel/kgdb.c
> @@ -394,18 +394,16 @@ int kgdb_arch_handle_exception(int vector, int signo, 
> int err_code,
>   return -1;
>  }
>  
> -struct kgdb_arch arch_kgdb_ops;
> +struct kgdb_arch arch_kgdb_ops = {
> +#ifdef CONFIG_CPU_BIG_ENDIAN
> + .gdb_bpt_instr = { spec_op << 2, 0x00, 0x00, break_op },
> +#else
> + .gdb_bpt_instr = { break_op, 0x00, 0x00, spec_op << 2 },
> +#endif
> +};
>  
>  int kgdb_arch_init(void)
>  {
> - union mips_instruction insn = {
> - .r_format = {
> - .opcode = spec_op,
> - .func   = break_op,
> - }
> - };
> - memcpy(arch_kgdb_ops.gdb_bpt_instr, insn.byte, BREAK_INSTR_SIZE);
> -
>   register_die_notifier(_notifier);
>  
>   return 0;
> -- 
> 2.13.3
>

Re: [PATCH 2/2] kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops

2018-12-06 Thread Daniel Thompson

On Wed, Dec 05, 2018 at 04:41:11AM +, Christophe Leroy wrote:
> checkpatch.pl reports the following:
> 
>   WARNING: struct kgdb_arch should normally be const
>   #28: FILE: arch/mips/kernel/kgdb.c:397:
>   +struct kgdb_arch arch_kgdb_ops = {
> 
> This report makes sense, as all other ops struct, this
> one should also be const. This patch does the change.
> 
> Signed-off-by: Christophe Leroy 

Acked-by: Daniel Thompson 

Similar to https://patchwork.kernel.org/patch/10701129/ I would be more
comfortable to see a resend with the relevant arch maintainers
explicitly called out with a Cc: entry here.


> ---
>  arch/arc/kernel/kgdb.c| 2 +-
>  arch/arm/kernel/kgdb.c| 2 +-
>  arch/arm64/kernel/kgdb.c  | 2 +-
>  arch/h8300/kernel/kgdb.c  | 2 +-
>  arch/hexagon/kernel/kgdb.c| 2 +-
>  arch/microblaze/kernel/kgdb.c | 2 +-
>  arch/mips/kernel/kgdb.c   | 2 +-
>  arch/nios2/kernel/kgdb.c  | 2 +-
>  arch/powerpc/kernel/kgdb.c| 2 +-
>  arch/sh/kernel/kgdb.c | 2 +-
>  arch/sparc/kernel/kgdb_32.c   | 2 +-
>  arch/sparc/kernel/kgdb_64.c   | 2 +-
>  arch/x86/kernel/kgdb.c| 2 +-
>  include/linux/kgdb.h  | 2 +-
>  14 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
> index 9a3c34af2ae8..bfd04b442e36 100644
> --- a/arch/arc/kernel/kgdb.c
> +++ b/arch/arc/kernel/kgdb.c
> @@ -204,7 +204,7 @@ void kgdb_roundup_cpus(unsigned long flags)
>   local_irq_disable();
>  }
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* breakpoint instruction: TRAP_S 0x3 */
>  #ifdef CONFIG_CPU_BIG_ENDIAN
>   .gdb_bpt_instr  = {0x78, 0x7e},
> diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
> index caa0dbe3dc61..21a6d5958955 100644
> --- a/arch/arm/kernel/kgdb.c
> +++ b/arch/arm/kernel/kgdb.c
> @@ -274,7 +274,7 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
>   * and we handle the normal undef case within the do_undefinstr
>   * handler.
>   */
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>  #ifndef __ARMEB__
>   .gdb_bpt_instr  = {0xfe, 0xde, 0xff, 0xe7}
>  #else /* ! __ARMEB__ */
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index a20de58061a8..fe1d1f935b90 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -357,7 +357,7 @@ void kgdb_arch_exit(void)
>   unregister_die_notifier(_notifier);
>  }
>  
> -struct kgdb_arch arch_kgdb_ops;
> +const struct kgdb_arch arch_kgdb_ops;
>  
>  int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
>  {
> diff --git a/arch/h8300/kernel/kgdb.c b/arch/h8300/kernel/kgdb.c
> index 1a1d30cb0609..602e478afbd5 100644
> --- a/arch/h8300/kernel/kgdb.c
> +++ b/arch/h8300/kernel/kgdb.c
> @@ -129,7 +129,7 @@ void kgdb_arch_exit(void)
>   /* Nothing to do */
>  }
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* Breakpoint instruction: trapa #2 */
>   .gdb_bpt_instr = { 0x57, 0x20 },
>  };
> diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
> index 16c24b22d0b2..f1924d483e78 100644
> --- a/arch/hexagon/kernel/kgdb.c
> +++ b/arch/hexagon/kernel/kgdb.c
> @@ -83,7 +83,7 @@ struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
>   { "syscall_nr", GDB_SIZEOF_REG, offsetof(struct pt_regs, syscall_nr)},
>  };
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* trap0(#0xDB) 0x0cdb0054 */
>   .gdb_bpt_instr = {0x54, 0x00, 0xdb, 0x0c},
>  };
> diff --git a/arch/microblaze/kernel/kgdb.c b/arch/microblaze/kernel/kgdb.c
> index 6366f69d118e..130cd0f064ce 100644
> --- a/arch/microblaze/kernel/kgdb.c
> +++ b/arch/microblaze/kernel/kgdb.c
> @@ -143,7 +143,7 @@ void kgdb_arch_exit(void)
>  /*
>   * Global data
>   */
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>  #ifdef __MICROBLAZEEL__
>   .gdb_bpt_instr = {0x18, 0x00, 0x0c, 0xba}, /* brki r16, 0x18 */
>  #else
> diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
> index 31eff1bec577..edfdc2ec2d16 100644
> --- a/arch/mips/kernel/kgdb.c
> +++ b/arch/mips/kernel/kgdb.c
> @@ -394,7 +394,7 @@ int kgdb_arch_handle_exception(int vector, int signo, int 
> err_code,
>   return -1;
>  }
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>  #ifdef CONFIG_CPU_BIG_ENDIAN
>   .gdb_bpt_instr = { spec_op << 2, 0x00, 0x00, break_op },
>  #else
> diff --git a/arch/nios2/kernel/kgdb.c b/arch/nios2/kernel/kgdb.c
> index 117859122d1c..37b25f844a2d 100644
> --- a/arch/nios2/kernel/kgdb.c
> +++ b/arch/nios2/kernel/kgdb.c
> @@ -165,7 +165,7 @@ void kgdb_arch_exit(void)
>   /* Nothing to do */
>  }
>  
> -struct kgdb_arch arch_kgdb_ops = {
> +const struct kgdb_arch arch_kgdb_ops = {
>   /* Breakpoint instruction: trap 30 */
>   .gdb_bpt_instr = { 0xba, 0x6f, 0x3b, 0x00

Re: [PATCH] ASoC: Use of_node_name_eq for node name comparisons

2018-12-06 Thread Rob Herring

On Wed, Dec 5, 2018 at 7:53 PM Nicolin Chen  wrote:
>
> Hi Rob,
>
> On Thu, Dec 6, 2018 at 3:51 AM Rob Herring  wrote:
> >
> > Convert string compares of DT node names to use of_node_name_eq helper
> > instead. This removes direct access to the node name pointer.
> >
> > For the FSL ASoC card, the full node names appear to be "ssi", "esai",
> > and "sai", so there's not any reason to use strstr and of_node_name_eq
>
> I am not quite sure about the replacement of strstr.
> IIRC, a node name in fsl dts might appear to be "ssi@".

That will work fine. Unit-addresses are not part of the node name for
the comparison.

> I am currently out of town so cannot verify the patch.
>
> Fabio, would it be possible for you to run a boot test?
>
> Thanks
> Nicolin

[Resend PATCH V5 7/10] KVM: Make kvm_set_spte_hva() return int

2018-12-06 Thread lantianyu1986

From: Lan Tianyu 

The patch is to make kvm_set_spte_hva() return int and caller can
check return value to determine flush tlb or not.

Signed-off-by: Lan Tianyu 
---
 arch/arm/include/asm/kvm_host.h | 2 +-
 arch/arm64/include/asm/kvm_host.h   | 2 +-
 arch/mips/include/asm/kvm_host.h| 2 +-
 arch/mips/kvm/mmu.c | 3 ++-
 arch/powerpc/include/asm/kvm_host.h | 2 +-
 arch/powerpc/kvm/book3s.c   | 3 ++-
 arch/powerpc/kvm/e500_mmu_host.c| 3 ++-
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/mmu.c  | 3 ++-
 virt/kvm/arm/mmu.c  | 6 --
 10 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 5ca5d9af0c26..fb62b500e590 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -225,7 +225,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
unsigned long start, unsigned long end);
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 52fbc823ff8c..b3df7b38ba7d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -360,7 +360,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
unsigned long start, unsigned long end);
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 2c1c53d12179..71c3f21d80d5 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -933,7 +933,7 @@ enum kvm_mips_fault_result kvm_trap_emul_gva_fault(struct 
kvm_vcpu *vcpu,
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva_range(struct kvm *kvm,
unsigned long start, unsigned long end);
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index d8dcdb350405..97e538a8c1be 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -551,7 +551,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gfn_t gfn, 
gfn_t gfn_end,
   (pte_dirty(old_pte) && !pte_dirty(hva_pte));
 }
 
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 {
unsigned long end = hva + PAGE_SIZE;
int ret;
@@ -559,6 +559,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, 
pte_t pte)
ret = handle_hva_to_gpa(kvm, hva, end, _set_spte_handler, );
if (ret)
kvm_mips_callbacks->flush_shadow_all(kvm);
+   return 0;
 }
 
 static int kvm_age_hva_handler(struct kvm *kvm, gfn_t gfn, gfn_t gfn_end,
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index fac6f631ed29..ab23379c53a9 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -72,7 +72,7 @@ extern int kvm_unmap_hva_range(struct kvm *kvm,
   unsigned long start, unsigned long end);
 extern int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long 
end);
 extern int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
-extern void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+extern int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
 #define HPTEG_CACHE_NUM(1 << 15)
 #define HPTEG_HASH_BITS_PTE13
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index fd9893bc7aa1..437613bb609a 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -850,9 +850,10 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
return kvm->arch.kvm_ops->test_age_hva(kvm, hva);
 }
 
-void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 {
kvm->arch.kvm_ops->set_spte_hva(kvm, hva, pte);
+   return 0;
 }
 
 void kvmppc_mmu_destroy(struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c

[Resend PATCH V5 6/10] KVM: Replace old tlb flush function with new one to flush a specified range.

2018-12-06 Thread lantianyu1986

From: Lan Tianyu 

This patch is to replace kvm_flush_remote_tlbs() with kvm_flush_
remote_tlbs_with_address() in some functions without logic change.

Signed-off-by: Lan Tianyu 
---
 arch/x86/kvm/mmu.c | 31 +--
 arch/x86/kvm/paging_tmpl.h |  3 ++-
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 74df957f2f81..1cd3c316a7e6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1485,8 +1485,12 @@ static bool __drop_large_spte(struct kvm *kvm, u64 
*sptep)
 
 static void drop_large_spte(struct kvm_vcpu *vcpu, u64 *sptep)
 {
-   if (__drop_large_spte(vcpu->kvm, sptep))
-   kvm_flush_remote_tlbs(vcpu->kvm);
+   if (__drop_large_spte(vcpu->kvm, sptep)) {
+   struct kvm_mmu_page *sp = page_header(__pa(sptep));
+
+   kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
+   KVM_PAGES_PER_HPAGE(sp->role.level));
+   }
 }
 
 /*
@@ -1954,7 +1958,8 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 
*spte, gfn_t gfn)
rmap_head = gfn_to_rmap(vcpu->kvm, gfn, sp);
 
kvm_unmap_rmapp(vcpu->kvm, rmap_head, NULL, gfn, sp->role.level, 0);
-   kvm_flush_remote_tlbs(vcpu->kvm);
+   kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
+   KVM_PAGES_PER_HPAGE(sp->role.level));
 }
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
@@ -2470,7 +2475,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
account_shadowed(vcpu->kvm, sp);
if (level == PT_PAGE_TABLE_LEVEL &&
  rmap_write_protect(vcpu, gfn))
-   kvm_flush_remote_tlbs(vcpu->kvm);
+   kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn, 1);
 
if (level > PT_PAGE_TABLE_LEVEL && need_sync)
flush |= kvm_sync_pages(vcpu, gfn, _list);
@@ -2590,7 +2595,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, 
u64 *sptep,
return;
 
drop_parent_pte(child, sptep);
-   kvm_flush_remote_tlbs(vcpu->kvm);
+   kvm_flush_remote_tlbs_with_address(vcpu->kvm, child->gfn, 1);
}
 }
 
@@ -3014,8 +3019,10 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep, unsigned pte_access,
ret = RET_PF_EMULATE;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
+
if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
-   kvm_flush_remote_tlbs(vcpu->kvm);
+   kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
+   KVM_PAGES_PER_HPAGE(level));
 
if (unlikely(is_mmio_spte(*sptep)))
ret = RET_PF_EMULATE;
@@ -5672,7 +5679,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 * on PT_WRITABLE_MASK anymore.
 */
if (flush)
-   kvm_flush_remote_tlbs(kvm);
+   kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+   memslot->npages);
 }
 
 static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
@@ -5742,7 +5750,8 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
 * dirty_bitmap.
 */
if (flush)
-   kvm_flush_remote_tlbs(kvm);
+   kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+   memslot->npages);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);
 
@@ -5760,7 +5769,8 @@ void kvm_mmu_slot_largepage_remove_write_access(struct 
kvm *kvm,
lockdep_assert_held(>slots_lock);
 
if (flush)
-   kvm_flush_remote_tlbs(kvm);
+   kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+   memslot->npages);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_slot_largepage_remove_write_access);
 
@@ -5777,7 +5787,8 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
 
/* see kvm_mmu_slot_leaf_clear_dirty */
if (flush)
-   kvm_flush_remote_tlbs(kvm);
+   kvm_flush_remote_tlbs_with_address(kvm, memslot->base_gfn,
+   memslot->npages);
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7cf2185b7eb5..6bdca39829bc 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -894,7 +894,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, 
hpa_t root_hpa)
pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
 
if (mmu_page_zap_pte(vcpu->kvm, sp, sptep))
-   kvm_flush_remote_tlbs(vcpu->kvm);
+   kvm_flush_remote_tlbs_with_address(vcpu->kvm,
+   sp->gfn, 
KVM_PAGES_PER_HPAGE(sp->role.level));

[Resend PATCH V5 5/10] KVM/MMU: Add tlb flush with range helper function

2018-12-06 Thread lantianyu1986

From: Lan Tianyu 

This patch is to add wrapper functions for tlb_remote_flush_with_range
callback and flush tlb directly in kvm_mmu_zap_collapsible_spte().
kvm_mmu_zap_collapsible_spte() returns flush request to the
slot_handle_leaf() and the latter does flush on demand. When
range flush is available, make kvm_mmu_zap_collapsible_spte()
to flush tlb with range directly to avoid returning range back
to slot_handle_leaf().

Signed-off-by: Lan Tianyu 
---
Change since V4:
   Remove flush list interface and flush tlb directly in
   kvm_mmu_zap_collapsible_spte().
Change since V3:
   Remove code of updating "tlbs_dirty"
Change since V2:
   Fix comment in the kvm_flush_remote_tlbs_with_range()
---
 arch/x86/kvm/mmu.c | 37 -
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7c03c0f35444..74df957f2f81 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -264,6 +264,35 @@ static void mmu_spte_set(u64 *sptep, u64 spte);
 static union kvm_mmu_page_role
 kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu);
 
+
+static inline bool kvm_available_flush_tlb_with_range(void)
+{
+   return kvm_x86_ops->tlb_remote_flush_with_range;
+}
+
+static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
+   struct kvm_tlb_range *range)
+{
+   int ret = -ENOTSUPP;
+
+   if (range && kvm_x86_ops->tlb_remote_flush_with_range)
+   ret = kvm_x86_ops->tlb_remote_flush_with_range(kvm, range);
+
+   if (ret)
+   kvm_flush_remote_tlbs(kvm);
+}
+
+static void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
+   u64 start_gfn, u64 pages)
+{
+   struct kvm_tlb_range range;
+
+   range.start_gfn = start_gfn;
+   range.pages = pages;
+
+   kvm_flush_remote_tlbs_with_range(kvm, );
+}
+
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value)
 {
BUG_ON((mmio_mask & mmio_value) != mmio_value);
@@ -5671,7 +5700,13 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
!kvm_is_reserved_pfn(pfn) &&
PageTransCompoundMap(pfn_to_page(pfn))) {
pte_list_remove(rmap_head, sptep);
-   need_tlb_flush = 1;
+
+   if (kvm_available_flush_tlb_with_range())
+   kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
+   KVM_PAGES_PER_HPAGE(sp->role.level));
+   else
+   need_tlb_flush = 1;
+
goto restart;
}
}
-- 
2.14.4

[Resend PATCH V5 4/10] KVM/VMX: Add hv tlb range flush support

2018-12-06 Thread lantianyu1986

From: Lan Tianyu 

This patch is to register tlb_remote_flush_with_range callback with
hv tlb range flush interface.

Signed-off-by: Lan Tianyu 
---
Change since v4:
- Use new function kvm_fill_hv_flush_list_func() to fill flush
   request.
Change since v3:
- Merge Vitaly's don't pass EPT configuration info to
vmx_hv_remote_flush_tlb() fix.
Change since v1:
- Pass flush range with new hyper-v tlb flush struct rather
   than KVM tlb flush struct.
---
 arch/x86/kvm/vmx.c | 63 +++---
 1 file changed, 46 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2356118ea440..bad2aa6c5ca1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1569,7 +1569,34 @@ static void check_ept_pointer_match(struct kvm *kvm)
to_kvm_vmx(kvm)->ept_pointers_match = EPT_POINTERS_MATCH;
 }
 
-static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
+int kvm_fill_hv_flush_list_func(struct hv_guest_mapping_flush_list *flush,
+   void *data)
+{
+   struct kvm_tlb_range *range = data;
+
+   return hyperv_fill_flush_guest_mapping_list(flush, range->start_gfn,
+   range->pages);
+}
+
+static inline int __hv_remote_flush_tlb_with_range(struct kvm *kvm,
+   struct kvm_vcpu *vcpu, struct kvm_tlb_range *range)
+{
+   u64 ept_pointer = to_vmx(vcpu)->ept_pointer;
+
+   /*
+* FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE hypercall needs address
+* of the base of EPT PML4 table, strip off EPT configuration
+* information.
+*/
+   if (range)
+   return hyperv_flush_guest_mapping_range(ept_pointer & PAGE_MASK,
+   kvm_fill_hv_flush_list_func, (void *)range);
+   else
+   return hyperv_flush_guest_mapping(ept_pointer & PAGE_MASK);
+}
+
+static int hv_remote_flush_tlb_with_range(struct kvm *kvm,
+   struct kvm_tlb_range *range)
 {
struct kvm_vcpu *vcpu;
int ret = -ENOTSUPP, i;
@@ -1579,29 +1606,26 @@ static int vmx_hv_remote_flush_tlb(struct kvm *kvm)
if (to_kvm_vmx(kvm)->ept_pointers_match == EPT_POINTERS_CHECK)
check_ept_pointer_match(kvm);
 
-   /*
-* FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE hypercall needs the address of the
-* base of EPT PML4 table, strip off EPT configuration information.
-* If ept_pointer is invalid pointer, bypass the flush request.
-*/
if (to_kvm_vmx(kvm)->ept_pointers_match != EPT_POINTERS_MATCH) {
kvm_for_each_vcpu(i, vcpu, kvm) {
-   u64 ept_pointer = to_vmx(vcpu)->ept_pointer;
-
-   if (!VALID_PAGE(ept_pointer))
-   continue;
-
-   ret |= hyperv_flush_guest_mapping(
-   ept_pointer & PAGE_MASK);
+   /* If ept_pointer is invalid pointer, bypass flush 
request. */
+   if (VALID_PAGE(to_vmx(vcpu)->ept_pointer))
+   ret |= __hv_remote_flush_tlb_with_range(
+   kvm, vcpu, range);
}
} else {
-   ret = hyperv_flush_guest_mapping(
-   to_vmx(kvm_get_vcpu(kvm, 0))->ept_pointer & PAGE_MASK);
+   ret = __hv_remote_flush_tlb_with_range(kvm,
+   kvm_get_vcpu(kvm, 0), range);
}
 
spin_unlock(_kvm_vmx(kvm)->ept_pointer_lock);
return ret;
 }
+
+static int hv_remote_flush_tlb(struct kvm *kvm)
+{
+   return hv_remote_flush_tlb_with_range(kvm, NULL);
+}
 #else /* !IS_ENABLED(CONFIG_HYPERV) */
 static inline void evmcs_write64(unsigned long field, u64 value) {}
 static inline void evmcs_write32(unsigned long field, u32 value) {}
@@ -7971,8 +7995,11 @@ static __init int hardware_setup(void)
 
 #if IS_ENABLED(CONFIG_HYPERV)
if (ms_hyperv.nested_features & HV_X64_NESTED_GUEST_MAPPING_FLUSH
-   && enable_ept)
-   kvm_x86_ops->tlb_remote_flush = vmx_hv_remote_flush_tlb;
+   && enable_ept) {
+   kvm_x86_ops->tlb_remote_flush = hv_remote_flush_tlb;
+   kvm_x86_ops->tlb_remote_flush_with_range =
+   hv_remote_flush_tlb_with_range;
+   }
 #endif
 
if (!cpu_has_vmx_ple()) {
@@ -11612,6 +11639,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
*kvm, unsigned int id)
vmx->nested.posted_intr_nv = -1;
vmx->nested.current_vmptr = -1ull;
 
+   vmx->ept_pointer = INVALID_PAGE;
+
vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED;
 
/*
-- 
2.14.4

[Resend PATCH V5 3/10] x86/Hyper-v: Add trace in the hyperv_nested_flush_guest_mapping_range()

2018-12-06 Thread lantianyu1986

From: Lan Tianyu 

This patch is to trace log in the hyperv_nested_flush_
guest_mapping_range().

Signed-off-by: Lan Tianyu 
---
 arch/x86/hyperv/nested.c|  1 +
 arch/x86/include/asm/trace/hyperv.h | 14 ++
 2 files changed, 15 insertions(+)

diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index 3d0f31e46954..dd0a843f766d 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -130,6 +130,7 @@ int hyperv_flush_guest_mapping_range(u64 as,
else
ret = status;
 fault:
+   trace_hyperv_nested_flush_guest_mapping_range(as, ret);
return ret;
 }
 EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping_range);
diff --git a/arch/x86/include/asm/trace/hyperv.h 
b/arch/x86/include/asm/trace/hyperv.h
index 2e6245a023ef..ace464f09681 100644
--- a/arch/x86/include/asm/trace/hyperv.h
+++ b/arch/x86/include/asm/trace/hyperv.h
@@ -42,6 +42,20 @@ TRACE_EVENT(hyperv_nested_flush_guest_mapping,
TP_printk("address space %llx ret %d", __entry->as, __entry->ret)
);
 
+TRACE_EVENT(hyperv_nested_flush_guest_mapping_range,
+   TP_PROTO(u64 as, int ret),
+   TP_ARGS(as, ret),
+
+   TP_STRUCT__entry(
+   __field(u64, as)
+   __field(int, ret)
+   ),
+   TP_fast_assign(__entry->as = as;
+  __entry->ret = ret;
+   ),
+   TP_printk("address space %llx ret %d", __entry->as, __entry->ret)
+   );
+
 TRACE_EVENT(hyperv_send_ipi_mask,
TP_PROTO(const struct cpumask *cpus,
 int vector),
-- 
2.14.4

[Resend PATCH V5 2/10] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

2018-12-06 Thread lantianyu1986

From: Lan Tianyu 

Hyper-V provides HvFlushGuestAddressList() hypercall to flush EPT tlb
with specified ranges. This patch is to add the hypercall support.

Reviewed-by:  Michael Kelley 
Signed-off-by: Lan Tianyu 
---
Change sincd v4:
   - Expose function hyperv_fill_flush_guest_mapping_list()
   out of hyperv file
   - Adjust parameter of hyperv_flush_guest_mapping_range()

Change since v2:
  Fix some coding style issues
- Move HV_MAX_FLUSH_PAGES and HV_MAX_FLUSH_REP_COUNT to
hyperv-tlfs.h.
- Calculate HV_MAX_FLUSH_REP_COUNT in the macro definition
- Use HV_MAX_FLUSH_REP_COUNT to define length of gpa_list in
struct hv_guest_mapping_flush_list.

Change since v1:
   Add hyperv tlb flush struct to avoid use kvm tlb flush struct
in the hyperv file.
---
 arch/x86/hyperv/nested.c   | 79 ++
 arch/x86/include/asm/hyperv-tlfs.h | 32 +++
 arch/x86/include/asm/mshyperv.h| 15 
 3 files changed, 126 insertions(+)

diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
index b8e60cc50461..3d0f31e46954 100644
--- a/arch/x86/hyperv/nested.c
+++ b/arch/x86/hyperv/nested.c
@@ -7,6 +7,7 @@
  *
  * Author : Lan Tianyu 
  */
+#define pr_fmt(fmt)  "Hyper-V: " fmt
 
 
 #include 
@@ -54,3 +55,81 @@ int hyperv_flush_guest_mapping(u64 as)
return ret;
 }
 EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping);
+
+int hyperv_fill_flush_guest_mapping_list(
+   struct hv_guest_mapping_flush_list *flush,
+   u64 start_gfn, u64 pages)
+{
+   u64 cur = start_gfn;
+   u64 additional_pages;
+   int gpa_n = 0;
+
+   do {
+   /*
+* If flush requests exceed max flush count, go back to
+* flush tlbs without range.
+*/
+   if (gpa_n >= HV_MAX_FLUSH_REP_COUNT)
+   return -ENOSPC;
+
+   additional_pages = min_t(u64, pages, HV_MAX_FLUSH_PAGES) - 1;
+
+   flush->gpa_list[gpa_n].page.additional_pages = additional_pages;
+   flush->gpa_list[gpa_n].page.largepage = false;
+   flush->gpa_list[gpa_n].page.basepfn = cur;
+
+   pages -= additional_pages + 1;
+   cur += additional_pages + 1;
+   gpa_n++;
+   } while (pages > 0);
+
+   return gpa_n;
+}
+EXPORT_SYMBOL_GPL(hyperv_fill_flush_guest_mapping_list);
+
+int hyperv_flush_guest_mapping_range(u64 as,
+   hyperv_fill_flush_list_func fill_flush_list_func, void *data)
+{
+   struct hv_guest_mapping_flush_list **flush_pcpu;
+   struct hv_guest_mapping_flush_list *flush;
+   u64 status = 0;
+   unsigned long flags;
+   int ret = -ENOTSUPP;
+   int gpa_n = 0;
+
+   if (!hv_hypercall_pg || !fill_flush_list_func)
+   goto fault;
+
+   local_irq_save(flags);
+
+   flush_pcpu = (struct hv_guest_mapping_flush_list **)
+   this_cpu_ptr(hyperv_pcpu_input_arg);
+
+   flush = *flush_pcpu;
+   if (unlikely(!flush)) {
+   local_irq_restore(flags);
+   goto fault;
+   }
+
+   flush->address_space = as;
+   flush->flags = 0;
+
+   gpa_n = fill_flush_list_func(flush, data);
+   if (gpa_n < 0) {
+   local_irq_restore(flags);
+   goto fault;
+   }
+
+   status = hv_do_rep_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST,
+gpa_n, 0, flush, NULL);
+
+   local_irq_restore(flags);
+
+   if (!(status & HV_HYPERCALL_RESULT_MASK))
+   ret = 0;
+   else
+   ret = status;
+fault:
+   return ret;
+}
+EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping_range);
diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
b/arch/x86/include/asm/hyperv-tlfs.h
index 4139f7650fe5..405a378e1c62 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -10,6 +10,7 @@
 #define _ASM_X86_HYPERV_TLFS_H
 
 #include 
+#include 
 
 /*
  * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
@@ -358,6 +359,7 @@ struct hv_tsc_emulation_status {
 #define HVCALL_POST_MESSAGE0x005c
 #define HVCALL_SIGNAL_EVENT0x005d
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
+#define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
 
 #define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE   0x0001
 #define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT12
@@ -757,6 +759,36 @@ struct hv_guest_mapping_flush {
u64 flags;
 };
 
+/*
+ *  HV_MAX_FLUSH_PAGES = "additional_pages" + 1. It's limited
+ *  by the bitwidth of "additional_pages" in union hv_gpa_page_range.
+ */
+#define HV_MAX_FLUSH_PAGES (2048)
+
+/* HvFlushGuestPhysicalAddressList hypercall */
+union hv_gpa_page_range {
+   u64 address_space;
+   struct {
+   u64 additional_pages:11;
+   u64 largepage:1;
+

[Resend PATCH V5 1/10] KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops

2018-12-06 Thread lantianyu1986

From: Lan Tianyu 

Add flush range call back in the kvm_x86_ops and platform can use it
to register its associated function. The parameter "kvm_tlb_range"
accepts a single range and flush list which contains a list of ranges.

Signed-off-by: Lan Tianyu 
---
Change since v1:
   Change "end_gfn" to "pages" to aviod confusion as to whether
"end_gfn" is inclusive or exlusive.
---
 arch/x86/include/asm/kvm_host.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fbda5a917c5b..fc7513ecfc13 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -439,6 +439,11 @@ struct kvm_mmu {
u64 pdptrs[4]; /* pae */
 };
 
+struct kvm_tlb_range {
+   u64 start_gfn;
+   u64 pages;
+};
+
 enum pmc_type {
KVM_PMC_GP = 0,
KVM_PMC_FIXED,
@@ -1042,6 +1047,8 @@ struct kvm_x86_ops {
 
void (*tlb_flush)(struct kvm_vcpu *vcpu, bool invalidate_gpa);
int  (*tlb_remote_flush)(struct kvm *kvm);
+   int  (*tlb_remote_flush_with_range)(struct kvm *kvm,
+   struct kvm_tlb_range *range);
 
/*
 * Flush any TLB entries associated with the given GVA.
-- 
2.14.4

[Resend PATCH V5 0/10] x86/KVM/Hyper-v: Add HV ept tlb range flush hypercall support in KVM

2018-12-06 Thread lantianyu1986

From: Lan Tianyu 

For nested memory virtualization, Hyper-v doesn't set write-protect
L1 hypervisor EPT page directory and page table node to track changes 
while it relies on guest to tell it changes via HvFlushGuestAddressLlist
hypercall. HvFlushGuestAddressLlist hypercall provides a way to flush
EPT page table with ranges which are specified by L1 hypervisor.

If L1 hypervisor uses INVEPT or HvFlushGuestAddressSpace hypercall to
flush EPT tlb, Hyper-V will invalidate associated EPT shadow page table
and sync L1's EPT table when next EPT page fault is triggered.
HvFlushGuestAddressLlist hypercall helps to avoid such redundant EPT
page fault and synchronization of shadow page table.

This patchset is based on the Patch "KVM/VMX: Check ept_pointer before
flushing ept tlb"(https://marc.info/?l=kvm=154408169705686=2).

Change since v4:
   1) Split flush address and flush list patches. This patchset only 
contains
   flush address patches. Will post flush list patches later.
   2) Expose function hyperv_fill_flush_guest_mapping_list()
   out of hyperv file
   3) Adjust parameter of hyperv_flush_guest_mapping_range()
   4) Reorder patchset and move Hyper-V and VMX changes ahead.

Change since v3:
1) Remove code of updating "tlbs_dirty" in 
kvm_flush_remote_tlbs_with_range()
2) Remove directly tlb flush in the kvm_handle_hva_range()
3) Move tlb flush in kvm_set_pte_rmapp() to 
kvm_mmu_notifier_change_pte()
4) Combine Vitaly's "don't pass EPT configuration info to
vmx_hv_remote_flush_tlb()" fix

Change since v2:
   1) Fix comment in the kvm_flush_remote_tlbs_with_range()
   2) Move HV_MAX_FLUSH_PAGES and HV_MAX_FLUSH_REP_COUNT to
hyperv-tlfs.h.
   3) Calculate HV_MAX_FLUSH_REP_COUNT in the macro definition
   4) Use HV_MAX_FLUSH_REP_COUNT to define length of gpa_list in
struct hv_guest_mapping_flush_list.

Change since v1:
   1) Convert "end_gfn" of struct kvm_tlb_range to "pages" in order
  to avoid confusion as to whether "end_gfn" is inclusive or exlusive.
   2) Add hyperv tlb range struct and replace kvm tlb range struct
  with new struct in order to avoid using kvm struct in the hyperv
  code directly.



Lan Tianyu (10):
  KVM: Add tlb_remote_flush_with_range callback in kvm_x86_ops
  x86/hyper-v: Add HvFlushGuestAddressList hypercall support
  x86/Hyper-v: Add trace in the
hyperv_nested_flush_guest_mapping_range()
  KVM/VMX: Add hv tlb range flush support
  KVM/MMU: Add tlb flush with range helper function
  KVM: Replace old tlb flush function with new one to flush a specified
range.
  KVM: Make kvm_set_spte_hva() return int
  KVM/MMU: Move tlb flush in kvm_set_pte_rmapp() to
kvm_mmu_notifier_change_pte()
  KVM/MMU: Flush tlb directly in the kvm_set_pte_rmapp()
  KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()

 arch/arm/include/asm/kvm_host.h |  2 +-
 arch/arm64/include/asm/kvm_host.h   |  2 +-
 arch/mips/include/asm/kvm_host.h|  2 +-
 arch/mips/kvm/mmu.c |  3 +-
 arch/powerpc/include/asm/kvm_host.h |  2 +-
 arch/powerpc/kvm/book3s.c   |  3 +-
 arch/powerpc/kvm/e500_mmu_host.c|  3 +-
 arch/x86/hyperv/nested.c| 80 +++
 arch/x86/include/asm/hyperv-tlfs.h  | 32 +
 arch/x86/include/asm/kvm_host.h |  9 +++-
 arch/x86/include/asm/mshyperv.h | 15 ++
 arch/x86/include/asm/trace/hyperv.h | 14 ++
 arch/x86/kvm/mmu.c  | 96 +
 arch/x86/kvm/paging_tmpl.h  |  3 +-
 arch/x86/kvm/vmx.c  | 63 +---
 virt/kvm/arm/mmu.c  |  6 ++-
 virt/kvm/kvm_main.c |  5 +-
 17 files changed, 292 insertions(+), 48 deletions(-)

-- 
2.14.4

Re: [PATCH] powerpc/ipic: Fix a bounds check in ipic_set_priority()

2018-12-06 Thread Dan Carpenter

On Wed, Dec 05, 2018 at 02:26:47PM +1100, Michael Ellerman wrote:
> Can smatch help us find things like this that are defined non-static but
> never used?
> 

It's too tricky because it depends on the .config as well.

regards,
dan carpenter

[PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-06 Thread Gautham R. Shenoy

From: "Gautham R. Shenoy" 

Currently running DLPAR offline/online operations in a loop on a
POWER9 system with SMT=off results in the following crash:

[  223.321032] cpu 112 (hwid 112) Ready to die...
[  223.355963] Querying DEAD? cpu 113 (113) shows 2
[  223.356233] cpu 114 (hwid 114) Ready to die...
[  223.356236] cpu 113 (hwid 113) Ready to die...
[  223.356241] Bad kernel stack pointer 1faf6ca0 at 1faf6d50
[  223.356243] Oops: Bad kernel stack pointer, sig: 6 [#1]
[  223.356244] LE SMP NR_CPUS=2048 NUMA pSeries
[  223.356247] Modules linked in:
[  223.356255] CPU: 114 PID: 0 Comm: swapper/114 Not tainted 4.20.0-rc3 #39
[  223.356258] NIP:  1faf6d50 LR: 1faf6d50 CTR: 1ec6d06c
[  223.356259] REGS: c0001e5cbd30 TRAP: 0700   Not tainted  (4.20.0-rc3)
[  223.356260] MSR:  80081000   CR: 2804  XER: 0020
[  223.356263] CFAR: 1ec98590 IRQMASK: 80009033
   GPR00: 1faf6d50 1faf6ca0 1ed1c448 
0267e6a273c3
   GPR04:  00e0 dfe8 
1faf6d30
   GPR08: 1faf6d28 0267e6a273c3 1ec1b108 

   GPR12: 01b6d998 c0001eb55080 c003a1b8bf90 
1eea3f40
   GPR16:  c006fda85100 c004c8b0 
c13d5300
   GPR20: c006fda85300 0008 c19d2cf8 
c13d6888
   GPR24: 0072 c13d688c 0002 
c13d688c
   GPR28: c19cecf0 0390  
1faf6ca0
[  223.356281] NIP [1faf6d50] 0x1faf6d50
[  223.356281] LR [1faf6d50] 0x1faf6d50
[  223.356282] Call Trace:
[  223.356283] Instruction dump:
[  223.356285]        

[  223.356286]        

[  223.356290] ---[ end trace f46a4e046b564d1f ]---

This is due to multiple offlined CPUs (CPU 113 and CPU 114 above)
concurrently (within 3us) trying to make the rtas-call with the
"stop-self" token, something that is prohibited by the PAPR.

The concurrent calls can happen as follows.

  o In dlpar_offline_cpu() we prod an offline CPU X (offline due to
SMT=off) and loop for 25 tries in pseries_cpu_die() querying if
the target CPU X has been stopped in RTAS. After 25 tries, we
prints the message "Querying DEAD? cpu X (X) shows 2" and return
to dlpar_offline_cpu(). Note that at this point CPU X has not yet
called rtas with the "stop-self" token, but can do so anytime now.

  o Back in dlpar_offline_cpu(), we prod the next offline CPU Y. CPU Y
promptly wakes up and calls RTAS with the "stop-self" token.

  o Before RTAS can stop CPU Y, CPU X also calls RTAS with the
"stop-self" token.

The problem occurs due to the short number of tries (25) in
pseries_cpu_die() which covers 90% of the cases. For the remaining 10%
of the cases, it was observed that we would need to loop for a few
hundred iterations before the target CPU calls rtas with "stop-self"
token.Experiments show that each try takes roughly ~25us.

In this patch we fix the problem by increasing the number of tries
from 25 to 4000 (which roughly corresponds to 100ms) before bailing
out and declaring that we have failed to observe the target CPU call
rtas-stop-self. This provides sufficient serialization to ensure that
there no concurrent rtas calls with "stop-self" token.

Reported-by: Michael Bringmann 
Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 2f8e621..c913c44 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -214,14 +214,25 @@ static void pseries_cpu_die(unsigned int cpu)
msleep(1);
}
} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+   int max_tries = 4000; /* Roughly corresponds to 100ms */
+   u64 tb_before = mftb();
 
-   for (tries = 0; tries < 25; tries++) {
+   for (tries = 0; tries < max_tries; tries++) {
cpu_status = smp_query_cpu_stopped(pcpu);
if (cpu_status == QCSS_STOPPED ||
cpu_status == QCSS_HARDWARE_ERROR)
break;
cpu_relax();
}
+
+   if (tries == max_tries) {
+   u64 time_elapsed_us = div_u64(mftb() - tb_before,
+ tb_ticks_per_usec);
+
+   pr_warn("Offlined CPU %d isn't stopped by RTAS after 
%llu us\n",
+

RE: [v11 1/7] dmaengine: fsldma: Replace DMA_IN/OUT by FSL_DMA_IN/OUT

2018-12-06 Thread Peng Ma

Hi Vinod,

Thanks for your apply, I have finished update DTS patch, please review.

Best Regards,
Peng

>-Original Message-
>From: Vinod Koul 
>Sent: 2018年12月6日 1:49
>To: Peng Ma 
>Cc: robh...@kernel.org; mark.rutl...@arm.com; shawn...@kernel.org; Leo
>Li ; dan.j.willi...@intel.com; z...@zh-kernel.org;
>dmaeng...@vger.kernel.org; devicet...@vger.kernel.org;
>linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
>linuxppc-dev@lists.ozlabs.org; Wen He 
>Subject: Re: [v11 1/7] dmaengine: fsldma: Replace DMA_IN/OUT by
>FSL_DMA_IN/OUT
>
>On 30-10-18, 10:35, Peng Ma wrote:
>> From: Wen He 
>>
>> This patch implement a standard macro call functions is used to NXP
>> dma drivers.
>
>Applied all except DTS patches, thanks
>--
>~Vinod

RE: [v11 6/7] arm64: dts: ls1046a: add qdma device tree nodes

2018-12-06 Thread Peng Ma

Hi Shawn,

Thanks for your review , I have used GIC_SPI and IRQ_TYPE_xxx to my 
dtsi, please check and review.

Best Regard,
Peng

>-Original Message-
>From: Shawn Guo 
>Sent: 2018年12月6日 9:06
>To: Peng Ma 
>Cc: vk...@kernel.org; robh...@kernel.org; mark.rutl...@arm.com; Leo Li
>; dan.j.willi...@intel.com; z...@zh-kernel.org;
>dmaeng...@vger.kernel.org; devicet...@vger.kernel.org;
>linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
>linuxppc-dev@lists.ozlabs.org; Wen He 
>Subject: Re: [v11 6/7] arm64: dts: ls1046a: add qdma device tree nodes
>
>On Tue, Oct 30, 2018 at 10:36:03AM +0800, Peng Ma wrote:
>> add the qDMA device tree nodes for LS1046A devices.
>>
>> Signed-off-by: Wen He 
>> Signed-off-by: Peng Ma 
>> ---
>> change in v11:
>>  - no
>>
>>  arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi |   21
>+
>>  1 files changed, 21 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> index ef83786..dc65318 100644
>> --- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
>> @@ -704,6 +704,27 @@
>>  < 0 0 4  GIC_SPI 154
>IRQ_TYPE_LEVEL_HIGH>;
>>  };
>>
>> +qdma: dma-controller@838 {
>> +compatible = "fsl,ls1046a-qdma", "fsl,ls1021a-qdma";
>> +reg = <0x0 0x838 0x0 0x1000>, /* Controller regs */
>> +  <0x0 0x839 0x0 0x1>, /* Status regs */
>> +  <0x0 0x83a 0x0 0x4>; /* Block regs */
>> +interrupts = <0 153 0x4>,
>> + <0 39 0x4>,
>> + <0 40 0x4>,
>> + <0 41 0x4>,
>> + <0 42 0x4>;
>
>Use GIC_SPI and IRQ_TYPE_xxx defines.
>
>Shawn
>
>> +interrupt-names = "qdma-error", "qdma-queue0",
>> +"qdma-queue1", "qdma-queue2", "qdma-queue3";
>> +dma-channels = <8>;
>> +block-number = <1>;
>> +block-offset = <0x1>;
>> +fsl,dma-queues = <2>;
>> +status-sizes = <64>;
>> +queue-sizes = <64 64>;
>> +big-endian;
>> +};
>> +
>>  };
>>
>>  reserved-memory {
>> --
>> 1.7.1
>>

Re: use generic DMA mapping code in powerpc V4

2018-12-06 Thread Christian Zigotzky


On 05 December 2018 at 3:05PM, Christoph Hellwig wrote:


Thanks.  Can you try a few stepping points in the tree?

First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
(the first one) applied?

Second with all commits up to 5da11e49df21f21dac25a2491aa788307bdacb6b

And if that still works with commits up to
c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a


Hi Christoph,

I undid the commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 with the 
following command:


git checkout 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6

Result: PASEMI onboard ethernet works again and the P5020 board boots.

I will test the other commits in the next days.

@All
It is really important, that you also test Christoph's work on your 
PASEMI and NXP boards. Could you please help us with solving the issues?


'git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.5 a'

Thanks,
Christian

Re: [PATCH RFC 7/7] mm: better document PG_reserved

2018-12-06 Thread David Hildenbrand

On 05.12.18 19:13, David Hildenbrand wrote:
> On 05.12.18 18:32, Matthew Wilcox wrote:
>> On Wed, Dec 05, 2018 at 04:05:12PM +0100, David Hildenbrand wrote:
>>> On 05.12.18 15:35, Matthew Wilcox wrote:
 On Wed, Dec 05, 2018 at 01:28:51PM +0100, David Hildenbrand wrote:
> I don't see a reason why we have to document "Some of them might not even
> exist". If there is a user, we should document it. E.g. for balloon
> drivers we now use PG_offline to indicate that a page might currently
> not be backed by memory in the hypervisor. And that is independent from
> PG_reserved.

 I think you're confused by the meaning of "some of them might not even
 exist".  What this means is that there might not be memory there; maybe
 writes to that memory will be discarded, or maybe they'll cause a machine
 check.  Maybe reads will return ~0, or 0, or cause a machine check.
 We just don't know what's there, and we shouldn't try touching the memory.
>>>
>>> If there are users, let's document it. And I need more details for that :)
>>>
>>> 1. machine check: if there is a HW error, we set PG_hwpoison (except
>>> ia64 MCA, see the list)
>>>
>>> 2. Writes to that memory will be discarded
>>>
>>> Who is the user of that? When will we have such pages right now?
>>>
>>> 3. Reads will return ~0, / 0?
>>>
>>> I think this is a special case of e.g. x86? But where do we have that,
>>> are there any user?
>>
>> When there are gaps in the physical memory.  As in, if you put that
>> physical address on the bus (or in a packet), no device will respond
>> to it.  Look:
>>
>> -0fff : Reserved
>> 1000-00057fff : System RAM
>> 00058000-00058fff : Reserved
>> 00059000-0009dfff : System RAM
>> 0009e000-000f : Reserved
>>
>> Those examples I gave are examples of how various different architectures
>> respond to "no device responded to this memory access".
>>
> 
> Okay, so for this memory we will have
> a) vmmaps
> b) Memory block devices
> c) A sections that is online
> 
> So essentially "Gaps in physical memory" which is part of a online section.
> 
> This might be a candidate for PG_offline as well.
> 
> Thanks for the info, I'll try to find out how such things are handled.
> In general I assume this memory has to be readable, because otherwise
> kdump and friends would crash already when trying to dump?
> 

So I finally understood how physical memory holes in online sections are
handled when dumping. They won't be dumped because the list of dumpable
chunks (contained in /proc/kcore and after a crash /proc/vmcore) is
built using walk_system_ram_range(). So anything not listed as RAM will
be ignored.

I will update the documentation, describing that if we have an online
section that is not completely IORESOURCE_SYSTEM_RAM, that the physical
memory gaps will also be set to PG_reserved.

Trying to touch this memory is indeed dangerous, luckily dumping is
properly handled.

I'll think about if marking these ranges as PG_offline might make sense
(and if it can be easily added). Then we directly know when seeing that
page type that we should not touch it. Ever.

That hint was really helpful :)

-- 

Thanks,

David / dhildenb

Re: [PATCH v2 17/34] dt-bindings: arm: Convert TI davinci board/soc bindings to json-schema

2018-12-06 Thread Sekhar Nori

On 04/12/18 3:02 AM, Rob Herring wrote:
> Convert TI Davinci SoC bindings to DT schema format using json-schema.
> 
> Cc: Sekhar Nori 
> Cc: Kevin Hilman 
> Cc: Mark Rutland 
> Cc: devicet...@vger.kernel.org
> Signed-off-by: Rob Herring 

Reviewed-by: Sekhar Nori 

Thanks,
Sekhar

Re: [PATCH] ALSA: Use of_node_name_eq for node name comparisons

2018-12-06 Thread Takashi Iwai

On Wed, 05 Dec 2018 20:50:46 +0100,
Rob Herring wrote:
> 
> Convert string compares of DT node names to use of_node_name_eq helper
> instead. This removes direct access to the node name pointer.
> 
> A couple of open coded iterating thru the child node names are converted
> to use for_each_child_of_node() instead.
> 
> Cc: Johannes Berg 
> Cc: Jaroslav Kysela 
> Cc: Takashi Iwai 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: alsa-de...@alsa-project.org
> Signed-off-by: Rob Herring 

Applied, thanks.


Takashi

Re: [PATCH] ALSA: soundbus: Remove direct OF name and type accesses

2018-12-06 Thread Takashi Iwai

On Wed, 05 Dec 2018 20:50:48 +0100,
Rob Herring wrote:
> 
> Convert soundbus uevent and sysfs OF node name and device type usage to
> use printf specifier and helper functions instead of directly accessing
> the name and type pointers. This will allow the eventual removal of the
> pointers.
> 
> Cc: Johannes Berg 
> Cc: Jaroslav Kysela 
> Cc: Takashi Iwai 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: alsa-de...@alsa-project.org
> Signed-off-by: Rob Herring 

Applied, thanks.


Takashi

[PATCH bpf] bpf: fix default unprivileged allocation limit

2018-12-06 Thread Sandipan Das

When using a large page size, the default value of the bpf_jit_limit
knob becomes invalid and users are not able to run unprivileged bpf
programs.

The bpf_jit_limit knob is represented internally as a 32-bit signed
integer because of which the default value, i.e. PAGE_SIZE * 4,
overflows in case of an architecture like powerpc64 which uses 64K
as the default page size (i.e. CONFIG_PPC_64K_PAGES is set).

So, instead of depending on the page size, use a constant value.

Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
allocations")
Signed-off-by: Sandipan Das 
---
 kernel/bpf/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b1a3545d0ec8..a81d097a17fb 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -365,7 +365,7 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
 }
 
 #ifdef CONFIG_BPF_JIT
-# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 4)
+# define BPF_JIT_LIMIT_DEFAULT (4096 * 4)
 
 /* All BPF JIT sysctl knobs here. */
 int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
-- 
2.19.2

[PATCH bpf] bpf: powerpc: fix broken uapi for BPF_PROG_TYPE_PERF_EVENT

2018-12-06 Thread Sandipan Das

Now that there are different variants of pt_regs for userspace and
kernel, the uapi for the BPF_PROG_TYPE_PERF_EVENT program type must
be changed by exporting the user_pt_regs structure instead of the
pt_regs structure that is in-kernel only.

Fixes: 002af9391bfb ("powerpc: Split user/kernel definitions of struct pt_regs")
Signed-off-by: Sandipan Das 
---
 arch/powerpc/include/asm/perf_event.h  | 2 ++
 arch/powerpc/include/uapi/asm/Kbuild   | 1 -
 arch/powerpc/include/uapi/asm/bpf_perf_event.h | 9 +
 3 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/uapi/asm/bpf_perf_event.h

diff --git a/arch/powerpc/include/asm/perf_event.h 
b/arch/powerpc/include/asm/perf_event.h
index 8bf1b6351716..16a49819da9a 100644
--- a/arch/powerpc/include/asm/perf_event.h
+++ b/arch/powerpc/include/asm/perf_event.h
@@ -26,6 +26,8 @@
 #include 
 #include 
 
+#define perf_arch_bpf_user_pt_regs(regs) >user_regs
+
 /*
  * Overload regs->result to specify whether we should use the MSR (result
  * is zero) or the SIAR (result is non zero).
diff --git a/arch/powerpc/include/uapi/asm/Kbuild 
b/arch/powerpc/include/uapi/asm/Kbuild
index a658091a19f9..3712152206f3 100644
--- a/arch/powerpc/include/uapi/asm/Kbuild
+++ b/arch/powerpc/include/uapi/asm/Kbuild
@@ -1,7 +1,6 @@
 # UAPI Header export list
 include include/uapi/asm-generic/Kbuild.asm
 
-generic-y += bpf_perf_event.h
 generic-y += param.h
 generic-y += poll.h
 generic-y += resource.h
diff --git a/arch/powerpc/include/uapi/asm/bpf_perf_event.h 
b/arch/powerpc/include/uapi/asm/bpf_perf_event.h
new file mode 100644
index ..b551b741653d
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/bpf_perf_event.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _UAPI__ASM_BPF_PERF_EVENT_H__
+#define _UAPI__ASM_BPF_PERF_EVENT_H__
+
+#include 
+
+typedef struct user_pt_regs bpf_user_pt_regs_t;
+
+#endif /* _UAPI__ASM_BPF_PERF_EVENT_H__ */
-- 
2.19.2

Re: [PATCH] powerpc/ipic: Fix a bounds check in ipic_set_priority()

2018-12-06 Thread Julia Lawall



On Thu, 6 Dec 2018, Christophe LEROY wrote:

>
>
> Le 05/12/2018 à 04:26, Michael Ellerman a écrit :
> > Hi Dan,
> >
> > Thanks for the patch.
> >
> > Dan Carpenter  writes:
> > > The ipic_info[] array only has 95 elements so I have made the bounds
> > > check smaller to prevent a read overflow.  It was Smatch that found
> > > this issue:
> > >
> > >  arch/powerpc/sysdev/ipic.c:784 ipic_set_priority()
> > >  error: buffer overflow 'ipic_info' 95 <= 127
> > >
> > > Signed-off-by: Dan Carpenter 
> > > ---
> > > I wasn't able to find any callers of this code.  Maybe we removed the
> > > last one in commit b9f0f1bb2bca ("[POWERPC] Adapt ipic driver to new
> > > host_ops interface, add set_irq_type to set IRQ sense").  So perhaps we
> > > should just remove it.  I'm not really comfortable doing that myself,
> > > because I don't know the code well enough and can't build test
> > > it properly.
> >
> > Hah wow, last usage removed in 2006!
> >
> > I don't see any mention of it since then, so I'll remove it. If it
> > breaks something we can put it back.
> >
> > Can smatch help us find things like this that are defined non-static but
> > never used?
> >
>
> I think we have to do that carrefully. Some of those functions might be used
> by out-of-tree boards.
>
> I'm thinking especially at ipic_get_mcp_status() and ipic_set_mcp_status().
> They are used in my 832x boards's machine check handler to know when a machine
> check is a timeout from the 832x watchdog.

The message I have gotten in the past is that the Linux kernel doesn't
support code that is not used in the Linux kernel.  However, if I were to
do this, I would send the code to the individual maintainers, who
presumably would know what is actually needed and what is not.

Perhaps a good sanity check would be if the code has been used in the
past.  If there was a use in the past that has been removed, then perhaps
it is more likely that the function was intended for internal kernel use
rather than the case that you are describing.

julia

99 matches

Mail list logo