Re: [PATCH v2 5/5] powerpc: Remove -mno-sched-epilog

2018-09-14 Thread Segher Boessenkool
On Fri, Sep 14, 2018 at 03:20:18PM -0700, Nick Desaulniers wrote:
> On Fri, Sep 14, 2018 at 2:56 PM Segher Boessenkool wrote:
> > On Sat, Sep 15, 2018 at 06:43:05AM +1000, Nicholas Piggin wrote:
> > > On Fri, 14 Sep 2018 11:03:38 -0700
> > > Nick Desaulniers  wrote:
> > >
> > > Sorry I forgot to cc you. This has links to some of the sched
> > > epilog bugs.
> > >
> > > > https://marc.info/?l=linuxppc-embedded&m=153690223909654&w=2
> 
> Cool, cc me on the thread if you'd like me to add my reviewed-by tag
> visibly on the thread.
> 
> > PR44199 was backported to 4.4 and PR52828 is fixed in 4.8.
> 
> Are those GCC versions? If so, what does that mean for the users of
> the many GCC releases between 4.4 and 4.8?

Yes, GCC bug 44199 was fixed in GCC version 4.4, etc.  It of course also
is fixed in all later versions; so GCC releases between 4.4 and 4.8 have
the fix for PR44199 but not that for PR52828.

I didn't check exactly what 4.4.x versions have the fix, etc.  I always
assume anyone using x.y.z uses the highest z available.

I don't know if those are the only fixes you need; those are the two
bugs mentioned in Nicholas' patch (that MARC link above).
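
The version reasoning above can be sketched as a small check. This is only an illustration, assuming the fixes are present from the named release series onward; the helper names are hypothetical, and the sketch ignores patch levels (x.y.z) and any other fixes that may be needed.

```c
/* Hypothetical helper: nonzero when GCC major.minor is at least
 * want_major.want_minor. Patch levels (x.y.z) are ignored. */
static int gcc_at_least(int major, int minor, int want_major, int want_minor)
{
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* PR44199: fix was backported to the 4.4 series (per the thread). */
static int has_pr44199_fix(int major, int minor)
{
    return gcc_at_least(major, minor, 4, 4);
}

/* PR52828: fixed in 4.8 (per the thread). */
static int has_pr52828_fix(int major, int minor)
{
    return gcc_at_least(major, minor, 4, 8);
}
```

For example, a GCC 4.6 compiler would pass the first check but not the second, which is the case Nick asked about.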


Segher


[PATCH 3/3] powerpc: uapi header and system call table file generation

2018-09-14 Thread Firoz Khan
The system call table generation script must be run to generate the
unistd_32/64.h and syscall_table_32/64/c32.h files. This patch
contains the changes that invoke the script.

With this patch, the unistd_32/64.h and syscall_table_32/64/c32.h
files are generated by the syscall table generation script, which
is invoked from arch/powerpc/Makefile. The generated files are
identical to the files they replace.

The generated uapi header files are provided as uapi/asm/
unistd_32/64.h, and the generated system call table support files
are included by the arch/powerpc/kernel/syscall_table_32/64.S files.

Signed-off-by: Firoz Khan 
---
 arch/powerpc/Makefile  |   3 +
 arch/powerpc/include/asm/Kbuild|   3 +
 arch/powerpc/include/uapi/asm/Kbuild   |   2 +
 arch/powerpc/include/uapi/asm/unistd.h | 393 +
 arch/powerpc/kernel/Makefile   |   3 +-
 arch/powerpc/kernel/syscall_table_32.S |   9 +
 arch/powerpc/kernel/syscall_table_64.S |  17 ++
 arch/powerpc/kernel/systbl.S   |  50 -
 8 files changed, 39 insertions(+), 441 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_table_32.S
 create mode 100644 arch/powerpc/kernel/syscall_table_64.S
 delete mode 100644 arch/powerpc/kernel/systbl.S

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 11a1acb..90614c9 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -400,6 +400,9 @@ archclean:
 
 archprepare: checkbin
 
+archheaders:
+   $(Q)$(MAKE) $(build)=arch/powerpc/kernel/syscalls all
+
 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
 # to stdout and these checks are run even on install targets.
 TOUT   := .tmp_gas_check
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 3196d22..74e63b4 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -8,3 +8,6 @@ generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += vtime.h
 generic-y += msi.h
+generated-y += syscall_table_32.h
+generated-y += syscall_table_64.h
+generated-y += syscall_table_c32.h
\ No newline at end of file
diff --git a/arch/powerpc/include/uapi/asm/Kbuild b/arch/powerpc/include/uapi/asm/Kbuild
index 1a6ed59..a731c5b 100644
--- a/arch/powerpc/include/uapi/asm/Kbuild
+++ b/arch/powerpc/include/uapi/asm/Kbuild
@@ -7,3 +7,5 @@ generic-y += poll.h
 generic-y += resource.h
 generic-y += sockios.h
 generic-y += statfs.h
+generated-y += unistd_32.h
+generated-y += unistd_64.h
\ No newline at end of file
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index f999df2..9084a0c 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -10,397 +10,10 @@
 #ifndef _UAPI_ASM_POWERPC_UNISTD_H_
 #define _UAPI_ASM_POWERPC_UNISTD_H_
 
-
-#define __NR_restart_syscall 0
-#define __NR_exit            1
-#define __NR_fork            2
-#define __NR_read            3
-#define __NR_write           4
-#define __NR_open            5
-#define __NR_close           6
-#define __NR_waitpid         7
-#define __NR_creat           8
-#define __NR_link            9
-#define __NR_unlink          10
-#define __NR_execve          11
-#define __NR_chdir           12
-#define __NR_time            13
-#define __NR_mknod           14
-#define __NR_chmod           15
-#define __NR_lchown          16
-#define __NR_break           17
-#define __NR_oldstat         18
-#define __NR_lseek           19
-#define __NR_getpid          20
-#define __NR_mount           21
-#define __NR_umount          22
-#define __NR_setuid          23
-#define __NR_getuid          24
-#define __NR_stime           25
-#define __NR_ptrace          26
-#define __NR_alarm           27
-#define __NR_oldfstat        28
-#define __NR_pause           29
-#define __NR_utime           30
-#define __NR_stty            31
-#define __NR_gtty            32
-#define __NR_access          33
-#define __NR_nice            34
-#define __NR_ftime           35
-#define __NR_sync            36
-#define __NR_kill            37
-#define __NR_rename          38
-#define __NR_mkdir           39
-#define __NR_rmdir           40
-#define __NR_dup             41
-#define __NR_pipe            42
-#define __NR_times           43
-#define __NR_prof            44
-#define __NR_brk             45
-#define __NR_setgid          46
-#define __NR_getgid          47
-#define __NR_signal          48
-#define __NR_geteuid         49
-#define __NR_getegid         50
-#define __NR_acct            51
-#define __NR_umount2         52
-#define __NR_lock            53
-#define __NR_ioctl           54
-#define __NR_fcntl           55
-#define __NR_mpx             56
-#define __NR_setpgid         57
-#define __NR_ulimit          58
-#define __NR_oldolduname     59
-#define 

[PATCH 2/3] powerpc: Add system call table generation support

2018-09-14 Thread Firoz Khan
The system call tables are in a different format on every
architecture, which makes it difficult to add or modify
system calls by hand in the respective files. To make this
easier, add a script that generates the header file and the
syscall table file; this change also helps unify them across
all architectures.

The system call table generation scripts are added in the
syscalls directory, which contains the scripts that generate
the uapi header file and the system call table file, together
with the syscall_32/64.tbl files that are the input to those
scripts.

syscall_32/64.tbl contains the list of available system calls
along with the system call number and the corresponding entry
point. A new system call can be added to this architecture by
adding a new entry to the syscall_32/64.tbl file.

A new table entry consists of:
- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.

syscallhdr.sh and syscalltbl.sh generate the uapi header files
unistd_32/64.h and the syscall_table_32/64/c32.h files, respectively.
The syscall_table_32/64/c32.h files are included by syscall.S, the
real system call table. Both .sh files parse the content of
syscall.tbl to generate the header and table files.

The ARM, s390 and x86 architectures already have similar support.
I leveraged their implementations to come up with a generic
solution.
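
To make the table-to-header step concrete, here is a minimal sketch in C of how one row in the format described above could be turned into a "#define __NR_..." line. It is not the syscallhdr.sh script from this patch; the function name tbl_row_to_define and the exact output spacing are assumptions for illustration only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn one syscall.tbl row into a "#define" line.
 * Returns 1 if a define was written to out, 0 for comment, blank or
 * malformed rows. */
static int tbl_row_to_define(const char *row, char *out, size_t outlen)
{
    unsigned int nr;
    char abi[32], name[64];

    /* Skip leading whitespace, then ignore comments and empty rows. */
    while (*row == ' ' || *row == '\t')
        row++;
    if (*row == '#' || *row == '\n' || *row == '\0')
        return 0;

    /* Fields: number, ABI, name. The entry point and the optional
     * compat entry point are not needed for the uapi header. */
    if (sscanf(row, "%u %31s %63s", &nr, abi, name) != 3)
        return 0;

    snprintf(out, outlen, "#define __NR_%s\t%u", name, nr);
    return 1;
}
```

Feeding it the row for open (number 5, ABI "common") would produce a define of __NR_open as 5, while comment rows starting with '#' are skipped.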

Signed-off-by: Firoz Khan 
---
 arch/powerpc/kernel/syscalls/Makefile   |  51 
 arch/powerpc/kernel/syscalls/syscall_32.tbl | 378 
 arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 +++
 arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh  |  38 +++
 5 files changed, 876 insertions(+)
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall_32.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscall_64.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh

diff --git a/arch/powerpc/kernel/syscalls/Makefile b/arch/powerpc/kernel/syscalls/Makefile
new file mode 100644
index 000..0c87acb
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/Makefile
@@ -0,0 +1,51 @@
+# SPDX-License-Identifier: GPL-2.0
+out := arch/$(SRCARCH)/include/generated/asm
+uapi := arch/$(SRCARCH)/include/generated/uapi/asm
+
+_dummy := $(shell [ -d '$(uapi)' ] || mkdir -p '$(uapi)') \
+ $(shell [ -d '$(out)' ] || mkdir -p '$(out)')
+
+syscall32 := $(srctree)/$(src)/syscall_32.tbl
+syscall64 := $(srctree)/$(src)/syscall_64.tbl
+
+syshdr := $(srctree)/$(src)/syscallhdr.sh
+systbl := $(srctree)/$(src)/syscalltbl.sh
+
+quiet_cmd_syshdr = SYSHDR  $@
+  cmd_syshdr = $(CONFIG_SHELL) '$(syshdr)' '$<' '$@'  \
+  '$(syshdr_abi_$(basetarget))'  \
+  '$(syshdr_pfx_$(basetarget))'  \
+  '$(syshdr_offset_$(basetarget))'
+
+quiet_cmd_systbl = SYSTBL  $@
+  cmd_systbl = $(CONFIG_SHELL) '$(systbl)' '$<' '$@'  \
+  '$(systbl_abi_$(basetarget))'
+
+$(uapi)/unistd_32.h: $(syscall32) $(syshdr)
+   $(call if_changed,syshdr)
+
+$(uapi)/unistd_64.h: $(syscall64) $(syshdr)
+   $(call if_changed,syshdr)
+
+systbl_abi_syscall_table_32 := 32
+$(out)/syscall_table_32.h: $(syscall32) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abi_syscall_table_64 := 64
+$(out)/syscall_table_64.h: $(syscall64) $(systbl)
+   $(call if_changed,systbl)
+
+systbl_abi_syscall_table_c32 := c32
+$(out)/syscall_table_c32.h: $(syscall32) $(systbl)
+   $(call if_changed,systbl)
+
+uapisyshdr-y   += unistd_32.h unistd_64.h
+syshdr-y   += syscall_table_32.h syscall_table_64.h \
+  syscall_table_c32.h
+
+targets+= $(uapisyshdr-y) $(syshdr-y)
+
+PHONY += all
+all: $(addprefix $(uapi)/,$(uapisyshdr-y))
+all: $(addprefix $(out)/,$(syshdr-y))
+   @:
diff --git a/arch/powerpc/kernel/syscalls/syscall_32.tbl b/arch/powerpc/kernel/syscalls/syscall_32.tbl
new file mode 100644
index 000..50c419c
--- /dev/null
+++ b/arch/powerpc/kernel/syscalls/syscall_32.tbl
@@ -0,0 +1,378 @@
+#
+# 32-bit system call numbers and entry vectors
+#
+# The format is:
+# <number> <abi> <name> <entry point> <compat entry point>
+#
+# The abi is always "common" for this file.
+#
+0   common  restart_syscall sys_restart_syscall
+1   common  exit            sys_exit
+2   common  fork            ppc_fork
+3   common  read            sys_read
+4   common  write           sys_write
+5   common  open            sys_open             compat_sys_open
+6   common  close           sys_close

[PATCH 1/3] powerpc: Replace NR_syscalls macro from asm/unistd.h

2018-09-14 Thread Firoz Khan
The NR_syscalls macro holds the number of system calls that exist on
the POWERPC architecture. This macro is currently part of the
asm/unistd.h file. Its value has to be changed whenever a system
call is added or deleted.

One of the patches in this series adds a script that generates
a uapi header based on the syscall.tbl file. The syscall.tbl file
contains the system call numbers, so we have two options for
updating the value:

1. Update the macro in asm/unistd.h manually by counting the
   number of system calls. No update is needed until we either
   add a new system call or delete an existing system call.

2. Keep this feature in the above-mentioned script, which will
   count the number of syscalls and keep it in a generated file.
   In this case we don't need to explicitly update the macro
   in the asm/unistd.h file.

The second option is the recommended one. For that, I moved the
value of the NR_syscalls macro from asm/unistd.h to uapi/asm/unistd.h.
The macro there is named __NR_syscalls instead of NR_syscalls, to
keep the naming convention the same across all architectures. While
__NR_syscalls isn't strictly part of the uapi, having it as part of
the generated header simplifies the implementation.
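
A minimal sketch of the second option, written in C rather than as the actual generation script: it assumes the generated value is "highest system call number + 1", which matches this patch (io_pgetevents is 388 and __NR_syscalls is 389). The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: derive the __NR_syscalls value from the system
 * call numbers found in the table, as "highest number + 1". */
static unsigned int nr_syscalls_from_table(const unsigned int *numbers,
                                           size_t count)
{
    unsigned int highest = 0;
    size_t i;

    /* An empty table has no system calls at all. */
    if (count == 0)
        return 0;

    for (i = 0; i < count; i++)
        if (numbers[i] > highest)
            highest = numbers[i];

    return highest + 1;
}
```

With this approach, adding a row for a new system call to the table changes the generated value without any manual edit to asm/unistd.h.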

Signed-off-by: Firoz Khan 
---
 arch/powerpc/include/asm/unistd.h  | 3 +--
 arch/powerpc/include/uapi/asm/unistd.h | 2 ++
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index c19379f..54732f9 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -11,8 +11,7 @@
 
 #include <uapi/asm/unistd.h>
 
-
-#define NR_syscalls389
+#define NR_syscalls  __NR_syscalls
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index 985534d..f999df2 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -401,4 +401,6 @@
 #define __NR_rseq  387
 #define __NR_io_pgetevents 388
 
+#define __NR_syscalls   389
+
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
-- 
1.9.1



[PATCH 0/3] System call table generation support

2018-09-14 Thread Firoz Khan
The purpose of this patch series is:
1. We can easily add, modify or delete a system call by changing an
entry in the syscall.tbl file. There is no need to manually edit
many files.

2. It becomes easy to unify the system call implementation across
all the architectures.

The system call tables are in a different format on every architecture,
which makes it difficult to add or modify system calls by hand in the
respective files. To make this easier, add a script that generates
the header file and the syscall table file; this change also helps
unify them across all architectures.

syscall.tbl contains the list of available system calls along with
the system call number and the corresponding entry point. A new system
call can be added to this architecture by adding a new entry to
the syscall.tbl file.

A new table entry consists of:
- System call number.
- ABI.
- System call name.
- Entry point name.
- Compat entry name, if required.

The ARM, s390 and x86 architectures already have similar support. I
leveraged their implementations to come up with a generic solution.

I have done the same work for alpha, m68k, microblaze,
ia64, mips, parisc, sh, sparc, and xtensa, but I am starting by
sending the patches for one architecture for review. The git
repository mentioned below contains more details.
Git repo:- https://github.com/frzkhn/system_call_table_generator/

Finally, this is the groundwork for solving the Y2038 issue. We
need to add or change about two dozen system calls to solve it,
so this patch series will make it easier to move from the existing
system calls to Y2038-compatible system calls.

I started working on system call table generation on 4.17-rc1. I used
Marcin's script - https://github.com/hrw/syscalls-table - to generate
the syscall.tbl file, which is the input to the system call
table generation script. A couple of system calls have been added
in the latest rc release, though. If Marcin's script is run on the
latest release, it will generate a new syscall.tbl. I am still using
the old syscall.tbl file, and once all the review is over I'll update
syscall.tbl alone with respect to the tip of the kernel. The impact of
this is that a few of the system calls won't work.

Firoz Khan (3):
  powerpc: Replace NR_syscalls macro from asm/unistd.h
  powerpc: Add system call table generation support
  powerpc: uapi header and system call table file generation

 arch/powerpc/Makefile   |   3 +
 arch/powerpc/include/asm/Kbuild |   3 +
 arch/powerpc/include/asm/unistd.h   |   3 +-
 arch/powerpc/include/uapi/asm/Kbuild|   2 +
 arch/powerpc/include/uapi/asm/unistd.h  | 391 +---
 arch/powerpc/kernel/Makefile|   3 +-
 arch/powerpc/kernel/syscall_table_32.S  |   9 +
 arch/powerpc/kernel/syscall_table_64.S  |  17 ++
 arch/powerpc/kernel/syscalls/Makefile   |  51 
 arch/powerpc/kernel/syscalls/syscall_32.tbl | 378 +++
 arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 ++
 arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
 arch/powerpc/kernel/syscalls/syscalltbl.sh  |  38 +++
 arch/powerpc/kernel/systbl.S|  50 
 14 files changed, 916 insertions(+), 441 deletions(-)
 create mode 100644 arch/powerpc/kernel/syscall_table_32.S
 create mode 100644 arch/powerpc/kernel/syscall_table_64.S
 create mode 100644 arch/powerpc/kernel/syscalls/Makefile
 create mode 100644 arch/powerpc/kernel/syscalls/syscall_32.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscall_64.tbl
 create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
 create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
 delete mode 100644 arch/powerpc/kernel/systbl.S

-- 
1.9.1



Re: [PATCH v2 5/5] powerpc: Remove -mno-sched-epilog

2018-09-14 Thread Segher Boessenkool
On Sat, Sep 15, 2018 at 06:43:05AM +1000, Nicholas Piggin wrote:
> On Fri, 14 Sep 2018 11:03:38 -0700
> Nick Desaulniers  wrote:
> 
> > On Thu, Sep 13, 2018 at 9:07 PM Joel Stanley  wrote:
> > > Last time this was proposed there was an issue reported:
> > >
> > >  https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-September/121214.html
> > >   
> > 
> > Heh, did PASemi sell boxes? Interesting, I'll have to read up on my history.
> > 
> > On Thu, Sep 13, 2018 at 10:06 PM Nicholas Piggin  wrote:
> > > I don't think we can remove it completely because up to at least 4.6
> > > maybe 4.8 has problems.
> > >
> > > I have a few patches lying around I started looking at this... I'll
> > > send them.  
> > 
> > Yeah, it's too bad the link above doesn't mention gcc version.
> > 
> > The gcc bugreport mentions fixing the bug in
> > 7563dc64585324f443f5ac107eb6d89ee813a2d2, not sure how to check what
> > release version of gcc that is? (Do they tag releases?)
> 
> I'm not sure, that's not in my gcc tree AFAIKS.

This is a git hash in the kernel tree.

> > Nick, do you have a test case or more context about this still being
> > an issue in gcc 4.8? (maybe I should wait for your patch series?)
> 
> Sorry I forgot to cc you. This has links to some of the sched
> epilog bugs.
> 
> https://marc.info/?l=linuxppc-embedded&m=153690223909654&w=2

PR44199 was backported to 4.4 and PR52828 is fixed in 4.8.


Segher


Re: [PATCH 3/3] scripts/dtc: Update to upstream version v1.4.7-14-gc86da84d30e4

2018-09-14 Thread Frank Rowand
On 09/13/18 13:28, Rob Herring wrote:
> Major changes are I2C and SPI bus checks, YAML output format (for
> future validation), some new libfdt functions, and more libfdt
> validation of dtbs.
> 
> The YAML addition adds an optional dependency on libyaml. pkg-config is
> used to test for it and pkg-config became a kconfig dependency in 4.18.

For Ubuntu, the libyaml dependency is provided by the packages:

   libyaml-0-2
   libyaml-dev


-Frank

> 
> This adds the following commits from upstream:
> 
> c86da84d30e4 Add support for YAML encoded output
> 361b5e7d8067 Make type_marker_length helper public


< snip >


Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg

2018-09-14 Thread Al Viro
On Fri, Sep 14, 2018 at 01:35:06PM -0700, Darren Hart wrote:
 
> Acked-by: Darren Hart (VMware) 
> 
> As for a longer term solution, would it be possible to init fops in such
> a way that the compat_ioctl call defaults to generic_compat_ioctl_ptrarg
> so we don't have to duplicate this boilerplate for every ioctl fops
> structure?

Bad idea, that...  Because several years down the road somebody will add
an ioctl that takes an unsigned int for argument.  Without so much as looking
at your magical mystery macro being used to initialize file_operations.

FWIW, I would name that helper in more blunt way - something like
compat_ioctl_only_compat_pointer_ioctls_here()...
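
The hazard can be modelled in userspace. The sketch below assumes an s390-style compat_ptr() that keeps only the low 31 bits, as described in the patch under discussion; the model_* names are hypothetical and this is not kernel code.

```c
#include <stdint.h>

/* Userspace model of an s390-style compat_ptr(): a compat pointer is
 * only 31 bits wide, so the top bit of the 32-bit value is cleared. */
static uint64_t model_compat_ptr(uint32_t uptr)
{
    return (uint64_t)(uptr & 0x7fffffffu);
}

/* Model of a pointer-only compat ioctl helper: every argument is
 * treated as a user pointer before reaching the native handler. */
static uint64_t model_ptrarg_conversion(uint32_t arg)
{
    return model_compat_ptr(arg);
}
```

A pointer-like argument passes through unchanged, but an integer argument with bit 31 set, say 0x80000001, reaches the handler as 1. That silent change is what happens if an integer-taking ioctl is later added behind a pointer-only default.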


Re: [PATCH v2 5/5] PCI/powerpc/eeh: Add pcibios hooks for preparing to rescan

2018-09-14 Thread Sergey Miroshnichenko


On 9/12/18 1:39 PM,  wrote:
> On Mon, 2018-09-10 at 19:00 +0300, Sergey Miroshnichenko wrote:
>>
>> Yes, missing a real EEH event is possible, unfortunately, and it is
>> indeed worth mentioning.
>>
>> To reduce this probability the next patchset I'll post in a few days
>> among other things puts all the affected device drivers to pause during
>> rescan, mainly because of moving BARs and bridge windows, but it will
>> also help here a bit.
> 
> How do you deal with moving BARs etc... within the segmenting
> restrictions of EEH ?
> 

Actually, [un]fortunately, we haven't encountered any segmenting issues
yet, but to move BARs we are using the same existing mechanism in the
Linux kernel that re-enumerates the PCIe topology during startup with
"pci=realloc" and PCI_REASSIGN_ALL_BUS. What restrictions must be broken
to provoke a segmenting event?

Are there any other limitations on segmenting besides keeping all the
BARs of the PHB within its huge M32+M64 segments which are 2GiB+4GiB on
our setup?

> It's a horrible mess right now and I don't know if the current code can
> even work properly to be honest.
> 
> Cheers,
> Ben.
> 
> 

Best regards,
Serge


Re: [PATCH v2 2/5] powerpc/boot: Fix crt0.S syntax for clang

2018-09-14 Thread Segher Boessenkool
On Fri, Sep 14, 2018 at 10:47:08AM -0700, Nick Desaulniers wrote:
> On Thu, Sep 13, 2018 at 9:07 PM Joel Stanley  wrote:
> >  10:addis   r12,r12,(-RELACOUNT)@ha
> > -   cmpdi   r12,RELACOUNT@l
> > +   cmpdi   r12,(RELACOUNT)@l
> 
> Yep, as we can see above, when RELACOUNT is negated, it's wrapped in
> parens.

The only thing that does is make it easier for humans to read; it means
exactly the same thing.


Segher


Re: [PATCH 4/5] powerpc/powernv/pci: Enable reassigning the bus numbers

2018-09-14 Thread Sergey Miroshnichenko
Hello Ben,

On 9/12/18 1:35 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2018-09-05 at 18:40 +0300, Sergey Miroshnichenko wrote:
>> PowerNV doesn't depend on PCIe topology info from DT anymore, and now
>> it is able to enumerate the fabric and assign the bus numbers.
> 
> No it's not, at least unless we drop P7 support.
> 
> P7 has constraints on the bus ranges being aligned power-of-two for the
> PE assignment to work, which is why we have to honor the firmware
> provided numbers.
> 
> Additionally, this breaks the mapping between the firmware idea of the
> bus numbers and Linux idea. This will probably break all of the SR-IOV
> stuff.
> 

Oh, I see. To make this more controllable and less intrusive I've bound
the PCI_REASSIGN_ALL_BUS flag to the "pci=realloc" command line argument
(in version 3 of this patchset) instead of the unconditional setting.

> Now we should probably fix it all by removing the FW bits completely
> and doing it all from Linux, though we really need to better handle how
> we deal with the segmented MMIO space.
> 
> I would also be wary of what other parts of the code depend on that
> matching between the FW bdfn and the Linux bdfn.
> 

This approach allows us to use the same in-kernel hotplug mechanisms for
PowerNV+OPAL and other platforms, so we are highly interested. Would you
kindly advise what the essential parts to start with are, and maybe
point out some documentation on EEH segmentation and FW/OS sync?

> Cheers,
> Ben.
> 

Best regards,
Serge

>> Signed-off-by: Sergey Miroshnichenko 
>> ---
>>  arch/powerpc/platforms/powernv/pci.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
>> index 6d4280086a08..f6eaca3123cd 100644
>> --- a/arch/powerpc/platforms/powernv/pci.c
>> +++ b/arch/powerpc/platforms/powernv/pci.c
>> @@ -1104,6 +1104,7 @@ void __init pnv_pci_init(void)
>>  struct device_node *np;
>>  
>>  pci_add_flags(PCI_CAN_SKIP_ISA_ALIGN);
>> +pci_add_flags(PCI_REASSIGN_ALL_BUS);
>>  
>>  /* If we don't have OPAL, eg. in sim, just skip PCI probe */
>>  if (!firmware_has_feature(FW_FEATURE_OPAL))
> 


Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg

2018-09-14 Thread Darren Hart
On Wed, Sep 12, 2018 at 05:08:52PM +0200, Arnd Bergmann wrote:
> The .ioctl and .compat_ioctl file operations have the same prototype so
> they can both point to the same function, which works great almost all
> the time when all the commands are compatible.
> 
> One exception is the s390 architecture, where a compat pointer is only
> 31 bit wide, and converting it into a 64-bit pointer requires calling
> compat_ptr(). Most drivers here will never run on s390, but since we now
> have a generic helper for it, it's easy enough to use it consistently.
> 
> I double-checked all these drivers to ensure that all ioctl arguments
> are used as pointers or are ignored, but are not interpreted as integer
> values.
> 
> Signed-off-by: Arnd Bergmann 
> ---
...
>  drivers/platform/x86/wmi.c  | 2 +-
...
>  static void link_event_work(struct work_struct *work)
> diff --git a/drivers/platform/x86/wmi.c b/drivers/platform/x86/wmi.c
> index 04791ea5d97b..e4d0697e07d6 100644
> --- a/drivers/platform/x86/wmi.c
> +++ b/drivers/platform/x86/wmi.c
> @@ -886,7 +886,7 @@ static const struct file_operations wmi_fops = {
>   .read   = wmi_char_read,
>   .open   = wmi_char_open,
>   .unlocked_ioctl = wmi_ioctl,
> - .compat_ioctl   = wmi_ioctl,
> + .compat_ioctl   = generic_compat_ioctl_ptrarg,
>  };

For drivers/platform/x86:

Acked-by: Darren Hart (VMware) 

As for a longer term solution, would it be possible to init fops in such
a way that the compat_ioctl call defaults to generic_compat_ioctl_ptrarg
so we don't have to duplicate this boilerplate for every ioctl fops
structure?

-- 
Darren Hart
VMware Open Source Technology Center


Re: [PATCH v2 5/5] powerpc: Remove -mno-sched-epilog

2018-09-14 Thread Nicholas Piggin
On Fri, 14 Sep 2018 11:03:38 -0700
Nick Desaulniers  wrote:

> On Thu, Sep 13, 2018 at 9:07 PM Joel Stanley  wrote:
> > Last time this was proposed there was an issue reported:
> >
> >  https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-September/121214.html 
> >  
> 
> Heh, did PASemi sell boxes? Interesting, I'll have to read up on my history.
> 
> On Thu, Sep 13, 2018 at 10:06 PM Nicholas Piggin  wrote:
> > I don't think we can remove it completely because up to at least 4.6
> > maybe 4.8 has problems.
> >
> > I have a few patches lying around I started looking at this... I'll
> > send them.  
> 
> Yeah, it's too bad the link above doesn't mention gcc version.
> 
> The gcc bugreport mentions fixing the bug in
> 7563dc64585324f443f5ac107eb6d89ee813a2d2, not sure how to check what
> release version of gcc that is? (Do they tag releases?)

I'm not sure, that's not in my gcc tree AFAIKS.

> 
> Nick, do you have a test case or more context about this still being
> an issue in gcc 4.8? (maybe I should wait for your patch series?)

Sorry I forgot to cc you. This has links to some of the sched
epilog bugs.

https://marc.info/?l=linuxppc-embedded&m=153690223909654&w=2

Thanks,
Nick


Re: [PATCH v3 0/6] powerpc/powernv/pci: Discover surprise-hotplugged PCIe devices during rescan

2018-09-14 Thread Sergey Miroshnichenko
Hello Oliver,

On 9/12/18 12:49 PM, Oliver wrote:
> On Tue, Sep 11, 2018 at 9:56 PM, Sergey Miroshnichenko wrote:
>> This patchset allows hotplugged PCIe devices to be enumerated during a bus
>> rescan being issued via sysfs on PowerNV platforms, when the "Presence
>> Detect Changed" interrupt is not available.
> 
> Seems to be on par with the sysfs slot power hack that pnv_php uses.
> 

Yes, ours is just for manual initiation of rescan, which helps us with
reliable detection of a bridge hotplug in our particular config.

>> As a first part of our work on adding support for hotplugging PCIe bridges
>> full of devices (without special requirement such as Hot-Plug Controller,
>> reservation of bus numbers and memory regions by firmware, etc.), this
>> series is intended to solve the first two of the problems listed below:
>>
>> I   PowerNV doesn't discover new hotplugged PCIe devices
>> II  EEH is falsely triggered when poking empty slots during the PCIe rescan
> 
> We avoid this problem in pnv_php by having OPAL do the rescan; Linux then
> requests an FDT fragment of everything under the slot. I don't think it's a
> great system, but it keeps firmware and the OS on the same page.
> 

So if we re-enumerate the PCIe topology from Linux, we must then
synchronize with the firmware? How would you recommend approaching that
for PowerNV and OPAL? Is there a list of criteria somewhere that we can
use to ensure that they are properly synced?

>> III The PCI subsystem is not prepared to runtime changes of BAR addresses
>> IV  Device drivers don't track changes of their BAR addresses
>> V   BARs of working devices don't move to make space for new ones
> 
> I'm having a really hard time figuring out what would make this
> necessary. Keep in mind that each PHB has its own set of bus numbers
> and its own MMIO space, so it's not like you're short on either.
> 
> How are you planning on making this sort of live-device-migration work?
> And what are you trying to do that makes the added complexity worth it?
> 

With the "pci=realloc" command line argument and with the
PCI_REASSIGN_ALL_BUS flag the kernel doesn't rely on the values of bus
numbers and BAR addresses provided by firmware (OPAL via FDT in our
case, BIOS/UEFI/Coreboot for x86_64), but re-enumerates the PCIe
topology by its own means, and it arranges BARs quite compactly.

Let's say we have two bridges plugged into neighboring ports of the
root/PHB, and each of them has a few NVME drives inserted and several
empty slots when the system boots. Linux makes their bridge windows
adjacent, so if we plug a new NVME into the first of them, there will
be no free space for its BARs.

Without considering memory pre-allocation, the only way we see to free
some space for new BARs is to move existing BARs of the second bridge
(in this example).

We've implemented a "firmware-independent" proof-of-concept (not
flawless, though, as you and Ben pointed out) and verified on
PowerNV+OPAL and x86_64 that a running NVME with an ongoing "fio"
benchmark always survives BAR movement during hotplug - of course after
applying a patch that pauses the NVME Linux driver during rescan. The
only visible effect is that the bandwidth temporarily drops to 0 for a
second or two, until the NVME restarts. The same goes for a network
adapter: an SSH connection just freezes for a while.

This patchset is a first part of our work, and we've just published [1]
a second part (on BAR movement and pausing the drivers) for the
community to review, discuss and validate.

[1] https://www.spinics.net/lists/linux-pci/msg76211.html

Best regards,
Serge

>> Tested on:
>>  - POWER8 PowerNV+OPAL ppc64le (our Vesnin server) w/ and w/o pci=realloc;
>>  - POWER8 IBM 8247-42L (pSeries);
>>  - POWER8 IBM 8247-42L (PowerNV+OPAL) w/ and w/o pci=realloc.
>>
>> Changes since v2:
>>  - Don't reassign bus numbers on PowerNV by default (to retain the default
>>behavior), but only when pci=realloc is passed;
>>  - Less code affected;
>>  - pci_add_device_node_info is refactored with add_one_dev_pci_data;
>>  - Minor code cleanup.
>>
>> Changes since v1:
>>  - Fixed build for ppc64le and ppc64be when CONFIG_PCI_IOV is disabled;
>>  - Fixed build for ppc64e when CONFIG_EEH is disabled;
>>  - Fixed code style warnings.
>>
>> Sergey Miroshnichenko (6):
>>   powerpc/pci: Access PCI config space directly w/o pci_dn
>>   powerpc/pci: Create pci_dn on demand
>>   powerpc/pci: Use DT to create pci_dn for root bridges only
>>   powerpc/powernv/pci: Enable reassigning the bus numbers
>>   PCI/powerpc/eeh: Add pcibios hooks for preparing to rescan
>>   powerpc/pci: Reduce code duplication in pci_add_device_node_info
>>
>>  arch/powerpc/include/asm/eeh.h   |   2 +
>>  arch/powerpc/kernel/eeh.c|  12 ++
>>  arch/powerpc/kernel/pci_dn.c | 119 ++-
>>  arch/powerpc/kernel/rtas_pci.c   |  97 ++-
>>  arch/powerpc/platforms/powernv/eeh-powernv.c |  22 

[GIT PULL] Devicetree fix for 4.19-rc

2018-09-14 Thread Rob Herring
Linus,

Please pull. One regression for a 20 year old PowerMac.

Rob

The following changes since commit 0413bedabc886c3a56804d1c80a58e99077b1d91:

  of: Add device_type access helper functions (2018-08-31 08:30:42 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git tags/devicetree-fixes-for-4.19-2

for you to fetch changes up to e54192b48da75f025ae4b277925eaf6aca1d13bd:

  of: fix phandle cache creation for DTs with no phandles (2018-09-11 11:28:40 -0500)


Devicetree fixes for 4.19, part 2:

- Fix a regression on systems having a DT without any phandles which
  happens on a PowerMac G3.


Rob Herring (1):
  of: fix phandle cache creation for DTs with no phandles

 drivers/of/base.c | 3 +++
 1 file changed, 3 insertions(+)


Re: [PATCH v2 1/5] powerpc/Makefiles: Fix clang/llvm build

2018-09-14 Thread Nick Desaulniers
On Thu, Sep 13, 2018 at 9:07 PM Joel Stanley  wrote:
>
> From: Anton Blanchard 
>
> Commit 15a3204d24a3 ("powerpc/64s: Set assembler machine type to POWER4")
> passes -mpower4 to the assembler. We have more recent instructions in our
> assembly files, but gas permits them. The clang/llvm integrated assembler
> is more strict, and we get a build failure.

Note that we disable clang's integrated assembler in the top level
Makefile for now, but it will still validate constraints for inline
assembly.  Do you know which case is meant by "build failure?"

Is there a link to the Clang bug?  It would be good to have that
context in the commit message.

>
> Fix this by calling the assembler with -mcpu=power8 if as supports it,
> else fall back to power4.
>
> Suggested-by: Nicholas Piggin 
> Signed-off-by: Anton Blanchard 
> Signed-off-by: Joel Stanley 
> ---
>  arch/powerpc/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 11a1acba164a..a70639482053 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -238,7 +238,7 @@ cpu-as-$(CONFIG_4xx)+= -Wa,-m405
>  cpu-as-$(CONFIG_ALTIVEC)   += $(call as-option,-Wa$(comma)-maltivec)
>  cpu-as-$(CONFIG_E200)  += -Wa,-me200
>  cpu-as-$(CONFIG_E500)  += -Wa,-me500
> -cpu-as-$(CONFIG_PPC_BOOK3S_64) += -Wa,-mpower4
> +cpu-as-$(CONFIG_PPC_BOOK3S_64) += $(call as-option,-Wa$(comma)-mpower8,-Wa$(comma)-mpower4)
>  cpu-as-$(CONFIG_PPC_E500MC)+= $(call as-option,-Wa$(comma)-me500mc)
>
>  KBUILD_AFLAGS += $(cpu-as-y)
> --
> 2.17.1
>

-- 
Thanks,
~Nick Desaulniers


Re: [PATCH v2 2/3] watchdog: mpc8xxx: provide boot status

2018-09-14 Thread Guenter Roeck
On Fri, Sep 14, 2018 at 01:32:01PM +, Christophe Leroy wrote:
> mpc8xxx watchdog driver supports the following platforms:
> - mpc8xx
> - mpc83xx
> - mpc86xx
> 
> Those three platforms have a 32 bits register which provides the
> reason of the last boot, including whether it was caused by the
> watchdog.
> 
> mpc8xx: Register RSR, bit SWRS (bit 3)
> mpc83xx: Register RSR, bit SWRS (bit 28)
> mpc86xx: Register RSTRSCR, bit WDT_RR (bit 11)
> 
> This patch maps the register as defined in the device tree and updates
> wdt.bootstatus based on the value of the watchdog related bit. Then
> the information can be retrieved via the WDIOC_GETBOOTSTATUS ioctl.
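For reference, a userspace program could read that status roughly as
follows. This is a minimal sketch, not part of the patch:
watchdog_caused_last_boot() is a hypothetical helper name, and the device
path is whatever the caller passes (commonly /dev/watchdog). Only
WDIOC_GETBOOTSTATUS and WDIOF_CARDRESET from <linux/watchdog.h> are real
API. Be aware that opening a watchdog device normally starts the timer.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the last reset was caused by the
 * watchdog, 0 if it was not, and -1 if the status could not be read.
 */
static inline int watchdog_caused_last_boot(const char *path)
{
	int flags = 0;
	int ret;
	int fd;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);

	/*
	 * Magic close: ask the driver to stop the timer on close(). This is
	 * only honoured if the driver advertises WDIOF_MAGICCLOSE and
	 * nowayout is not set, so the result is deliberately ignored.
	 */
	if (write(fd, "V", 1) != 1) {
		/* nothing useful to do here */
	}
	close(fd);

	if (ret < 0)
		return -1;

	return (flags & WDIOF_CARDRESET) ? 1 : 0;
}
```

A caller would typically log the result once at startup, for example
watchdog_caused_last_boot("/dev/watchdog"), and treat -1 as "unknown".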
> 
> Hereunder is an example of devicetree for mpc8xx,
> the Reset Status Register being at offset 0x288:
> 
>   WDT: watchdog@0 {
>   compatible = "fsl,mpc823-wdt";
>   reg = <0x0 0x10 0x288 0x4>;
>   };
> 
> On the mpc83xx, RSR is at offset 0x910
> On the mpc86xx, RSTRSCR is at offset 0xe0094
> 
> Suggested-by: Radu Rendec 
> Tested-by: Christophe Leroy  # On mpc885
> Signed-off-by: Christophe Leroy 
> ---
>  drivers/watchdog/mpc8xxx_wdt.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c
> index 1dcf5f10cdd9..4a4700458b17 100644
> --- a/drivers/watchdog/mpc8xxx_wdt.c
> +++ b/drivers/watchdog/mpc8xxx_wdt.c
> @@ -47,6 +47,7 @@ struct mpc8xxx_wdt {
>  struct mpc8xxx_wdt_type {
>   int prescaler;
>   bool hw_enabled;
> + u32 rsr_mask;
>  };
>  
>  struct mpc8xxx_wdt_ddata {
> @@ -136,6 +137,7 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
>   u32 freq = fsl_get_sys_freq();
>   bool enabled;
>   struct device *dev = &ofdev->dev;
> + u32 __iomem *rsr = NULL;
>  
>   wdt_type = of_device_get_match_data(dev);
>   if (!wdt_type)
> @@ -159,6 +161,21 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
>   return -ENODEV;
>   }
>  
> + res = platform_get_resource(ofdev, IORESOURCE_MEM, 1);
> + if (res)
> + rsr = ioremap(res->start, resource_size(res));
> + if (rsr) {

This if() can be inside the first if(), and it should be something like

	if (res) {
		rsr = ioremap(res->start, resource_size(res));
		if (!rsr) {
			dev_err(...);
			return -ENOMEM;
		}
		...
	}

... because _if_ the resource is provided in dt it should be valid.

Thanks,
Guenter

> + bool status = in_be32(rsr) & wdt_type->rsr_mask;
> +
> + ddata->wdd.bootstatus = status ? WDIOF_CARDRESET : 0;
> +  /* clear reset status bits related to watchdog timer */
> + out_be32(rsr, wdt_type->rsr_mask);
> + iounmap(rsr);
> +
> + dev_info(dev, "Last boot was %scaused by watchdog\n",
> +  status ? "" : "not ");
> + }
> +
>   spin_lock_init(&ddata->lock);
>  
>   ddata->wdd.info = &mpc8xxx_wdt_info,
> @@ -216,6 +233,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
>   .compatible = "mpc83xx_wdt",
>   .data = &(struct mpc8xxx_wdt_type) {
>   .prescaler = 0x1,
> + .rsr_mask = BIT(3), /* RSR Bit SWRS */
>   },
>   },
>   {
> @@ -223,6 +241,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
>   .data = &(struct mpc8xxx_wdt_type) {
>   .prescaler = 0x1,
>   .hw_enabled = true,
> + .rsr_mask = BIT(20), /* RSTRSCR Bit WDT_RR */
>   },
>   },
>   {
> @@ -230,6 +249,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
>   .data = &(struct mpc8xxx_wdt_type) {
>   .prescaler = 0x800,
>   .hw_enabled = true,
> + .rsr_mask = BIT(28), /* RSR Bit SWRS */
>   },
>   },
>   {},
> -- 
> 2.13.3
> 
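A note on the masks: the commit message counts bits with bit 0 as the most
significant bit of the 32-bit register (the usual numbering in the PowerPC
manuals), while BIT() counts from the least significant bit. Assuming that
convention, the numbers line up as bit n <-> BIT(31 - n). The sketch below
only illustrates that arithmetic; ppc_bit32() and reset_was_watchdog() are
hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Linux-style mask: bit 0 is the least significant bit. */
#define BIT(n) (UINT32_C(1) << (n))

/*
 * Mask for bit n of a 32-bit register when bit 0 is the MOST significant
 * bit. Hypothetical helper for illustration only.
 */
static inline uint32_t ppc_bit32(unsigned int n)
{
	assert(n < 32);
	return UINT32_C(1) << (31 - n);
}

/* Nonzero when the watchdog-reset bit selected by mask is set in rsr. */
static inline int reset_was_watchdog(uint32_t rsr, uint32_t mask)
{
	return (rsr & mask) != 0;
}
```

With this mapping, "bit 3" on the mpc8xx corresponds to BIT(28), "bit 28" on
the mpc83xx to BIT(3), and "bit 11" on the mpc86xx to BIT(20), which matches
the rsr_mask values in the match table.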


Re: [PATCH v2 3/3] dt-bindings: watchdog: add mpc8xxx-wdt support

2018-09-14 Thread Guenter Roeck
On Fri, Sep 14, 2018 at 01:32:03PM +, Christophe Leroy wrote:
> Add description of DT bindings for mpc8xxx-wdt driver which
> handles the CPU watchdog timer on the mpc83xx, mpc86xx and mpc8xx.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  .../devicetree/bindings/watchdog/mpc8xxx-wdt.txt   | 25 
> ++
>  1 file changed, 25 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt
> 
> diff --git a/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt b/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt
> new file mode 100644
> index ..1d99e1e4d306
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt
> @@ -0,0 +1,25 @@
> +* Freescale mpc8xxx watchdog driver (For 83xx, 86xx and 8xx)
> +
> +Required properties:
> +- compatible: Shall contain one of the following:
> + "mpc83xx_wdt" for an mpc83xx
> + "fsl,mpc8610-wdt" for an mpc86xx
> + "fsl,mpc823-wdt" for an mpc8xx
> +- reg: base physical address and length of the area hosting the
> +   watchdog registers.
> + On the 83xx, "Watchdog Timer Registers" area:   <0x200 0x100>
> + On the 86xx, "Watchdog Timer Registers" area:   <0xe4000 0x100>
> + On the 8xx, "General System Interface Unit" area: <0x0 0x10>
> +

Note for Rob: The above has been implemented for several years.
This is merely to document the current implementation.
Maybe "mpc83xx_wdt" should be deprecated and replaced, but I think
that should be a separate patch.

> +Optional properties:
> +- reg: additionnal physical address and length (4) of location of the

s/additionnal/additional/

> +   Reset Status Register (called RSTRSCR on the mpc86xx)
> + On the 83xx, it is located at offset 0x910
> + On the 86xx, it is located at offset 0xe0094
> + On the 8xx, it is located at offset 0x288
> +
> +Example:
> + WDT: watchdog@0 {
> + compatible = "fsl,mpc823-wdt";
> + reg = <0x0 0x10 0x288 0x4>;
> + };
> -- 
> 2.13.3
> 


Re: [PATCH v2 1/3] watchdog: mpc8xxx: use dev_xxxx() instead of pr_xxxx()

2018-09-14 Thread Guenter Roeck
On Fri, Sep 14, 2018 at 01:31:59PM +, Christophe Leroy wrote:
> mpc8xxx watchdog driver is a platform device driver, it is
> therefore possible to use dev_xxx() messaging rather than pr_xxx().
> 
> Signed-off-by: Christophe Leroy 

Reviewed-by: Guenter Roeck 

> ---
>  drivers/watchdog/mpc8xxx_wdt.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c
> index aca2d6323f8a..1dcf5f10cdd9 100644
> --- a/drivers/watchdog/mpc8xxx_wdt.c
> +++ b/drivers/watchdog/mpc8xxx_wdt.c
> @@ -17,8 +17,6 @@
>   * option) any later version.
>   */
>  
> -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> -
>  #include 
>  #include 
>  #include 
> @@ -137,26 +135,27 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
>   struct mpc8xxx_wdt_ddata *ddata;
>   u32 freq = fsl_get_sys_freq();
>   bool enabled;
> + struct device *dev = &ofdev->dev;
>  
> - wdt_type = of_device_get_match_data(&ofdev->dev);
> + wdt_type = of_device_get_match_data(dev);
>   if (!wdt_type)
>   return -EINVAL;
>  
>   if (!freq || freq == -1)
>   return -EINVAL;
>  
> - ddata = devm_kzalloc(&ofdev->dev, sizeof(*ddata), GFP_KERNEL);
> + ddata = devm_kzalloc(dev, sizeof(*ddata), GFP_KERNEL);
>   if (!ddata)
>   return -ENOMEM;
>  
>   res = platform_get_resource(ofdev, IORESOURCE_MEM, 0);
> - ddata->base = devm_ioremap_resource(&ofdev->dev, res);
> + ddata->base = devm_ioremap_resource(dev, res);
>   if (IS_ERR(ddata->base))
>   return PTR_ERR(ddata->base);
>  
>   enabled = in_be32(&ddata->base->swcrr) & SWCRR_SWEN;
>   if (!enabled && wdt_type->hw_enabled) {
> - pr_info("could not be enabled in software\n");
> + dev_info(dev, "could not be enabled in software\n");
>   return -ENODEV;
>   }
>  
> @@ -166,7 +165,7 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
>   ddata->wdd.ops = &mpc8xxx_wdt_ops,
>  
>   ddata->wdd.timeout = WATCHDOG_TIMEOUT;
> - watchdog_init_timeout(&ddata->wdd, timeout, &ofdev->dev);
> + watchdog_init_timeout(&ddata->wdd, timeout, dev);
>  
>   watchdog_set_nowayout(&ddata->wdd, nowayout);
>  
> @@ -189,12 +188,13 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
>  
>   ret = watchdog_register_device(&ddata->wdd);
>   if (ret) {
> - pr_err("cannot register watchdog device (err=%d)\n", ret);
> + dev_err(dev, "cannot register watchdog device (err=%d)\n", ret);
>   return ret;
>   }
>  
> - pr_info("WDT driver for MPC8xxx initialized. mode:%s timeout=%d sec\n",
> - reset ? "reset" : "interrupt", ddata->wdd.timeout);
> + dev_info(dev,
> +  "WDT driver for MPC8xxx initialized. mode:%s timeout=%d sec\n",
> +  reset ? "reset" : "interrupt", ddata->wdd.timeout);
>  
>   platform_set_drvdata(ofdev, ddata);
>   return 0;
> @@ -204,8 +204,8 @@ static int mpc8xxx_wdt_remove(struct platform_device *ofdev)
>  {
>   struct mpc8xxx_wdt_ddata *ddata = platform_get_drvdata(ofdev);
>  
> - pr_crit("Watchdog removed, expect the %s soon!\n",
> - reset ? "reset" : "machine check exception");
> + dev_crit(&ofdev->dev, "Watchdog removed, expect the %s soon!\n",
> +  reset ? "reset" : "machine check exception");
>   watchdog_unregister_device(&ddata->wdd);
>  
>   return 0;
> -- 
> 2.13.3
> 


Re: KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM workarounds

2018-09-14 Thread sathnaga

On 2018-09-14 22:26, sathn...@linux.vnet.ibm.com wrote:

Date: Thu, 13 Sep 2018 15:33:47 +1000
From: Michael Neuling 
To: m...@ellerman.id.au
Cc: linuxppc-dev@lists.ozlabs.org, kvm-...@vger.kernel.org,
pau...@ozlabs.org, sjitindarsi...@gmail.com, mi...@neuling.org
Subject: KVM: PPC: Book3S HV: Fix guest r11 corruption with POWER9 TM
workarounds

When we come into the softpatch handler (0x1500), we use r11 to store
the HSRR0 for later use by the denorm handler.

We also use the softpatch handler for the TM workarounds for
POWER9. Unfortunately, in kvmppc_interrupt_hv we later store r11 out
to the vcpu assuming it's still what we got from userspace.

This causes r11 to be corrupted in the VCPU and hence when we restore
the guest, we get a corrupted r11. We've seen this when running TM
tests inside guests on P9.

This fixes the problem by only touching r11 in the denorm case.

Fixes: 4bb3c7a020 ("KVM: PPC: Book3S HV: Work around transactional
memory bugs in POWER9")
Cc:  # 4.17+
Tested-by: Suraj Jitindar Singh 
Reviewed-by: Paul Mackerras 
Signed-off-by: Michael Neuling 
---
 arch/powerpc/kernel/exceptions-64s.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


Tested-by: Satheesh Rajendran 

Test details: 
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1792501


Regards,
-Satheesh.


diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ea04dfb8c0..2d8fc8c9da 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1314,9 +1314,7 @@ EXC_REAL_BEGIN(denorm_exception_hv, 0x1500, 0x100)

 #ifdef CONFIG_PPC_DENORMALISATION
    mfspr   r10,SPRN_HSRR1
-   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
    andis.  r10,r10,(HSRR1_DENORM)@h /* denorm? */
-   addi    r11,r11,-4  /* HSRR0 is next instruction */
    bne+    denorm_assist
 #endif

@@ -1382,6 +1380,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
  */
    XVCPSGNDP32(32)
 denorm_done:
+   mfspr   r11,SPRN_HSRR0
+   subi    r11,r11,4
    mtspr   SPRN_HSRR0,r11
    mtcrf   0x80,r9
    ld      r9,PACA_EXGEN+EX_R9(r13)





Re: [PATCH] powerpc/fadump: re-register firmware-assisted dump if already registered

2018-09-14 Thread Hari Bathini




On Friday 14 September 2018 07:58 PM, Petr Tesarik wrote:

On Fri, 14 Sep 2018 19:36:02 +0530
Hari Bathini  wrote:


Firmware-Assisted Dump (FADump) needs to be registered again after any
memory hot add/remove operation to update the crash memory ranges. But
currently, the kernel returns '-EEXIST' if we try to register without
unregistering it first. This could expose the system to racing issues
while unregistering and registering FADump from userspace during udev
events. Spare the userspace of this and let it be taken care of in the
kernel space for a simpler interface.

Since this change, running 'echo 1 > /sys/kernel/fadump_registered'
would result in re-registering (unregistering and registering) FADump,
if it was already registered.

Great improvement to the API!

Any suggestions what should be done in a client which tries to be
compatible with kernels before this change and after this change?


If `echo 1 > /sys/kernel/fadump_registered` fails, check the output of
`cat /sys/kernel/fadump_registered`. If it is still `1`, that indicates
an old kernel on which FADump is already registered. Treat that as
success if being registered is all you care about, or unregister and
then register again if re-registering is the intention.

Hope that helps.

Thanks
Hari
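
The fallback described above could be sketched in a client as follows. This
is only an illustration under stated assumptions: fadump_next_action(),
write_flag(), read_flag() and fadump_register() are hypothetical names, and
the only thing taken from this thread is the behaviour of
/sys/kernel/fadump_registered (write 1 to register, read back 1 when
registered). Writing 0 to unregister is the existing sysfs interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum fadump_action {
	FADUMP_DONE,           /* registered, nothing more to do */
	FADUMP_UNREG_THEN_REG, /* old kernel: write 0, then write 1 */
	FADUMP_FAILED          /* write failed and FADump is not registered */
};

/* Pure decision step, so it can be tested without touching sysfs. */
static inline enum fadump_action
fadump_next_action(bool write_ok, char state, bool want_reregister)
{
	if (write_ok)
		return FADUMP_DONE;
	if (state == '1') /* write failed but still registered: old kernel */
		return want_reregister ? FADUMP_UNREG_THEN_REG : FADUMP_DONE;
	return FADUMP_FAILED;
}

/* Write one character to a sysfs-style file; true on success. */
static inline bool write_flag(const char *path, char c)
{
	FILE *f;
	bool ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return false;
	ok = (fputc(c, f) != EOF);
	/* A failing store is reported when the buffer is flushed. */
	if (fclose(f) != 0)
		ok = false;
	return ok;
}

/* Read the first character of the file, or '\0' on error. */
static inline char read_flag(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return '\0';
	c = fgetc(f);
	fclose(f);
	return (c == EOF) ? '\0' : (char)c;
}

/* Register (or re-register) FADump on both old and new kernels. */
static inline bool fadump_register(const char *path, bool want_reregister)
{
	bool ok = write_flag(path, '1');
	char state = ok ? '1' : read_flag(path);

	switch (fadump_next_action(ok, state, want_reregister)) {
	case FADUMP_DONE:
		return true;
	case FADUMP_UNREG_THEN_REG:
		return write_flag(path, '0') && write_flag(path, '1');
	default:
		return false;
	}
}
```

A udev helper would call
fadump_register("/sys/kernel/fadump_registered", true) after a memory
hot add/remove event. Note that the unregister/register fallback on old
kernels still has the race the patch is meant to close.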



[PATCH 12/12] powerpc/64s/hash: Add a SLB preload cache

2018-09-14 Thread Nicholas Piggin
When switching processes, currently all user SLBEs are cleared, and a
few (exec_base, pc, and stack) are preloaded. In trivial testing with
small apps, this tends to miss the heap and low 256MB segments, and it
will also miss commonly accessed segments on large memory workloads.

Add a simple round-robin preload cache that just inserts the last SLB
miss into the head of the cache and preloads those at context switch
time. Every 256 context switches, the oldest entry is removed from the
cache to shrink the cache and require fewer slbmte if they are unused.
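
As an illustration of the scheme just described, here is a userspace sketch
of such a ring. It is not the code from this patch (the slb.c hunk is not
shown in full here): struct preload_cache and the preload_*() helpers are
hypothetical, and the 8-bit counter that wraps every 256 context switches
is an assumption about how the aging might be driven.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLB_PRELOAD_NR 16U

struct preload_cache {
	uint8_t nr;   /* number of valid entries */
	uint8_t tail; /* index of the oldest entry */
	uint8_t age;  /* wraps to 0 every 256 context switches */
	uint32_t esid[SLB_PRELOAD_NR];
};

/* True if esid is already cached. */
static inline bool preload_hit(const struct preload_cache *c, uint32_t esid)
{
	for (unsigned int i = 0; i < c->nr; i++) {
		if (c->esid[(c->tail + i) % SLB_PRELOAD_NR] == esid)
			return true;
	}
	return false;
}

/* Record an SLB miss at the head; returns true if an entry was added. */
static inline bool preload_add(struct preload_cache *c, uint32_t esid)
{
	unsigned int idx;

	assert(c->nr <= SLB_PRELOAD_NR);
	if (preload_hit(c, esid))
		return false;

	idx = (c->tail + c->nr) % SLB_PRELOAD_NR;
	c->esid[idx] = esid;
	if (c->nr == SLB_PRELOAD_NR)
		c->tail = (c->tail + 1) % SLB_PRELOAD_NR; /* overwrote oldest */
	else
		c->nr++;
	return true;
}

/* Drop the oldest entry, if any. */
static inline void preload_age(struct preload_cache *c)
{
	if (!c->nr)
		return;
	c->nr--;
	c->tail = (c->tail + 1) % SLB_PRELOAD_NR;
}

/* Call once per context switch; ages the cache on every 256th call. */
static inline void preload_context_switch(struct preload_cache *c)
{
	c->age++; /* uint8_t wraps from 255 to 0 */
	if (c->age == 0)
		preload_age(c);
}
```

The ring keeps insertion O(1) and bounds the number of slbmte instructions
per context switch to SLB_PRELOAD_NR, while the slow aging lets a task that
stops touching a segment eventually stop paying for its preload.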

Much more could go into this, including into the SLB entry reclaim
side to track some LRU information etc, which would require a study of
large memory workloads. But this is a simple thing we can do now that
is an obvious win for common workloads.

With the full series, process switching speed on the context_switch
benchmark on POWER9/hash (with kernel speculation security measures
disabled) increases from 140K/s to 178K/s (27%).

POWER8 does not change much (within 1%); it's unclear why it does not
see a big gain like POWER9.

Booting to busybox init with 256MB segments has SLB misses go down
from 945 to 69, and with 1T segments 900 to 21. These could almost all
be eliminated by preloading a bit more carefully with ELF binary
loading.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/processor.h   |   1 +
 arch/powerpc/include/asm/thread_info.h |   5 +
 arch/powerpc/kernel/process.c  |   7 ++
 arch/powerpc/mm/mmu_context_book3s64.c |   4 +
 arch/powerpc/mm/slb.c  | 166 +++--
 5 files changed, 143 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 12b76ecdc57d..936795acba48 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -273,6 +273,7 @@ struct thread_struct {
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */
unsigned long   trap_nr;/* last trap # on this thread */
+   u8 load_slb;/* Ages out SLB preload cache entries */
u8 load_fp;
 #ifdef CONFIG_ALTIVEC
u8 load_vec;
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index f9a442bb5a72..9e78b7d26b64 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -29,6 +29,7 @@
 #include 
 #include 
 
+#define SLB_PRELOAD_NR 16U
 /*
  * low level task data.
  */
@@ -44,6 +45,10 @@ struct thread_info {
 #if defined(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) && defined(CONFIG_PPC32)
struct cpu_accounting_data accounting;
 #endif
+   unsigned char slb_preload_nr;
+   unsigned char slb_preload_tail;
+   u32 slb_preload_esid[SLB_PRELOAD_NR];
+
/* low level flags - has atomic operations done on it */
unsigned long   flags cacheline_aligned_in_smp;
 };
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e4feb45ae4c6..03c2e1f134bc 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1719,6 +1719,8 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
return 0;
 }
 
+void preload_new_slb_context(unsigned long start, unsigned long sp);
+
 /*
  * Set up a thread for executing a new program
  */
@@ -1726,6 +1728,10 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
 {
 #ifdef CONFIG_PPC64
unsigned long load_addr = regs->gpr[2]; /* saved by ELF_PLAT_INIT */
+
+#ifdef CONFIG_PPC_BOOK3S_64
+   preload_new_slb_context(start, sp);
+#endif
 #endif
 
/*
@@ -1816,6 +1822,7 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
 #ifdef CONFIG_VSX
current->thread.used_vsr = 0;
 #endif
+   current->thread.load_slb = 0;
current->thread.load_fp = 0;
memset(&current->thread.fp_state, 0, sizeof(current->thread.fp_state));
current->thread.fp_save_area = NULL;
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index f7352c66b6b8..510f103d7813 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -53,6 +53,8 @@ int hash__alloc_context_id(void)
 }
 EXPORT_SYMBOL_GPL(hash__alloc_context_id);
 
+void slb_setup_new_exec(void);
+
 static int hash__init_new_context(struct mm_struct *mm)
 {
int index;
@@ -87,6 +89,8 @@ static int hash__init_new_context(struct mm_struct *mm)
 void hash__setup_new_exec(void)
 {
slice_setup_new_exec();
+
+   slb_setup_new_exec();
 }
 
 static int radix__init_new_context(struct mm_struct *mm)
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 98521fec3536..d200728fe41b 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -187,41 +187,119 @@ void slb_vmalloc_update(void)

[PATCH 11/12] powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup

2018-09-14 Thread Nicholas Piggin
This will be used by the SLB code in the next patch, but for now this
sets the slb_addr_limit to the correct size for 32-bit tasks.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 ++
 arch/powerpc/include/asm/slice.h  |  1 +
 arch/powerpc/include/asm/thread_info.h|  6 ++
 arch/powerpc/kernel/process.c |  9 +
 arch/powerpc/mm/mmu_context_book3s64.c|  5 +
 arch/powerpc/mm/slice.c   | 14 ++
 6 files changed, 37 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 4c8d413ce99a..fc68058554fa 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -487,6 +487,8 @@ int htab_remove_mapping(unsigned long vstart, unsigned long vend,
 extern void pseries_add_gpage(u64 addr, u64 page_size, unsigned long number_of_pages);
 extern void demote_segment_4k(struct mm_struct *mm, unsigned long addr);
 
+extern void hash__setup_new_exec(void);
+
 #ifdef CONFIG_PPC_PSERIES
 void hpte_init_pseries(void);
 #else
diff --git a/arch/powerpc/include/asm/slice.h b/arch/powerpc/include/asm/slice.h
index e40406cf5628..a595461c9cb0 100644
--- a/arch/powerpc/include/asm/slice.h
+++ b/arch/powerpc/include/asm/slice.h
@@ -32,6 +32,7 @@ void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
   unsigned long len, unsigned int psize);
 
 void slice_init_new_context_exec(struct mm_struct *mm);
+void slice_setup_new_exec(void);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index 3c0002044bc9..f9a442bb5a72 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -72,6 +72,12 @@ static inline struct thread_info *current_thread_info(void)
 }
 
 extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
+
+#ifdef CONFIG_PPC_BOOK3S_64
+void arch_setup_new_exec(void);
+#define arch_setup_new_exec arch_setup_new_exec
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 /*
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 913c5725cdb2..e4feb45ae4c6 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1482,6 +1482,15 @@ void flush_thread(void)
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 }
 
+#ifdef CONFIG_PPC_BOOK3S_64
+void arch_setup_new_exec(void)
+{
+   if (radix_enabled())
+   return;
+   hash__setup_new_exec();
+}
+#endif
+
 int set_thread_uses_vas(void)
 {
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index dbd8f762140b..f7352c66b6b8 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -84,6 +84,11 @@ static int hash__init_new_context(struct mm_struct *mm)
return index;
 }
 
+void hash__setup_new_exec(void)
+{
+   slice_setup_new_exec();
+}
+
 static int radix__init_new_context(struct mm_struct *mm)
 {
unsigned long rts_field;
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 606f424aac47..fc5b3a1ec666 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -746,6 +746,20 @@ void slice_init_new_context_exec(struct mm_struct *mm)
bitmap_fill(mask->high_slices, SLICE_NUM_HIGH);
 }
 
+#ifdef CONFIG_PPC_BOOK3S_64
+void slice_setup_new_exec(void)
+{
+   struct mm_struct *mm = current->mm;
+
+   slice_dbg("slice_setup_new_exec(mm=%p)\n", mm);
+
+   if (!is_32bit_task())
+   return;
+
+   mm->context.slb_addr_limit = DEFAULT_MAP_WINDOW;
+}
+#endif
+
 void slice_set_range_psize(struct mm_struct *mm, unsigned long start,
   unsigned long len, unsigned int psize)
 {
-- 
2.18.0
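
A note on the "#define arch_setup_new_exec arch_setup_new_exec" line in the
thread_info.h hunk above: defining a macro with the same name as the
function lets generic code test for an arch override with #ifndef and fall
back to an empty inline otherwise. The sketch below reproduces that idiom in
a single userspace file. The split into "arch" and "generic" parts, the
use_radix flag standing in for radix_enabled(), and the call counter are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* "Arch header": declare the hook and announce that it exists. */
void arch_setup_new_exec(void);
#define arch_setup_new_exec arch_setup_new_exec

/* "Generic header": empty fallback, compiled only without an override. */
#ifndef arch_setup_new_exec
static inline void arch_setup_new_exec(void) { }
#endif

static bool use_radix;       /* stand-in for radix_enabled() */
static int hash_setup_calls; /* counts calls, for demonstration only */

/* Stand-in for hash__setup_new_exec(). */
static void hash_setup_new_exec(void)
{
	assert(!use_radix); /* must only run with the hash MMU */
	hash_setup_calls++;
}

/* "Arch implementation": same shape as the function added by the patch. */
void arch_setup_new_exec(void)
{
	if (use_radix)
		return;
	hash_setup_new_exec();
}
```

Because the macro expands to its own name, the later definition and all
callers still refer to the real function; the macro exists only so that the
preprocessor can see that an override is present.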



[PATCH 10/12] powerpc/64s: xmon do not dump hash fields when using radix mode

2018-09-14 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/xmon/xmon.c | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 323aac8321fa..5dec84aba59e 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2378,30 +2378,32 @@ static void dump_one_paca(int cpu)
DUMP(p, cpu_start, "%#-*x");
DUMP(p, kexec_state, "%#-*x");
 #ifdef CONFIG_PPC_BOOK3S_64
-   for (i = 0; i < SLB_NUM_BOLTED; i++) {
-   u64 esid, vsid;
+   if (!early_radix_enabled()) {
+   for (i = 0; i < SLB_NUM_BOLTED; i++) {
+   u64 esid, vsid;
 
-   if (!p->slb_shadow_ptr)
-   continue;
+   if (!p->slb_shadow_ptr)
+   continue;
 
-   esid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].esid);
-   vsid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].vsid);
+   esid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].esid);
+   vsid = be64_to_cpu(p->slb_shadow_ptr->save_area[i].vsid);
 
-   if (esid || vsid) {
-   printf(" %-*s[%d] = 0x%016llx 0x%016llx\n",
-  22, "slb_shadow", i, esid, vsid);
+   if (esid || vsid) {
+   printf(" %-*s[%d] = 0x%016llx 0x%016llx\n",
+  22, "slb_shadow", i, esid, vsid);
+   }
}
-   }
-   DUMP(p, vmalloc_sllp, "%#-*x");
-   DUMP(p, stab_rr, "%#-*x");
-   DUMP(p, slb_used_bitmap, "%#-*x");
-   DUMP(p, slb_kern_bitmap, "%#-*x");
+   DUMP(p, vmalloc_sllp, "%#-*x");
+   DUMP(p, stab_rr, "%#-*x");
+   DUMP(p, slb_used_bitmap, "%#-*x");
+   DUMP(p, slb_kern_bitmap, "%#-*x");
 
-   if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) {
-   DUMP(p, slb_cache_ptr, "%#-*x");
-   for (i = 0; i < SLB_CACHE_ENTRIES; i++)
-   printf(" %-*s[%d] = 0x%016x\n",
-  22, "slb_cache", i, p->slb_cache[i]);
+   if (!early_cpu_has_feature(CPU_FTR_ARCH_300)) {
+   DUMP(p, slb_cache_ptr, "%#-*x");
+   for (i = 0; i < SLB_CACHE_ENTRIES; i++)
+   printf(" %-*s[%d] = 0x%016x\n",
+  22, "slb_cache", i, p->slb_cache[i]);
+   }
}
 
DUMP(p, rfi_flush_fallback_area, "%-*px");
-- 
2.18.0



[PATCH 09/12] powerpc/64s/hash: SLB allocation status bitmaps

2018-09-14 Thread Nicholas Piggin
Add 32-entry bitmaps to track the allocation status of the first 32
SLB entries, and whether they are user or kernel entries. These are
used to allocate free SLB entries first, before resorting to the round
robin allocator.
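
The allocation policy can be sketched in userspace as below. This is an
illustration, not the code from the patch (the alloc_slb_index() hunk is cut
off in this archive): struct slb_state, the helper names, SLB_NUM_BOLTED = 2
and a 32-entry SLB are all assumptions. Only the bitmap initialisation,
(1U << SLB_NUM_BOLTED) - 1, and the reset of the used bitmap to the kernel
bitmap on context switch are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLB_NUM_BOLTED 2U /* assumed number of bolted kernel entries */
#define SLB_SIZE 32U      /* assumed stand-in for mmu_slb_size */

struct slb_state {
	uint32_t used_bitmap; /* bit i set: entry i is allocated */
	uint32_t kern_bitmap; /* bit i set: entry i holds a kernel segment */
	uint8_t rr;           /* round-robin cursor */
};

/* Bolted entries start out as used kernel entries. */
static inline void slb_state_init(struct slb_state *s)
{
	s->kern_bitmap = (1U << SLB_NUM_BOLTED) - 1;
	s->used_bitmap = s->kern_bitmap;
	s->rr = SLB_NUM_BOLTED - 1;
}

/* After user entries are flushed, only kernel entries remain in use. */
static inline void slb_switch_user(struct slb_state *s)
{
	s->used_bitmap = s->kern_bitmap;
}

/* Prefer a free entry; fall back to round-robin replacement. */
static inline unsigned int slb_alloc_index(struct slb_state *s, bool kernel)
{
	unsigned int index;

	if (s->used_bitmap != UINT32_MAX) {
		/* Find the lowest clear bit; one exists because != MAX. */
		for (index = 0; s->used_bitmap & (1U << index); index++)
			;
		s->used_bitmap |= 1U << index;
	} else {
		index = s->rr;
		if (index < SLB_SIZE - 1)
			index++;
		else
			index = SLB_NUM_BOLTED; /* never evict bolted entries */
		s->rr = index;
	}

	/* Record whether the chosen entry now holds a kernel segment. */
	if (kernel)
		s->kern_bitmap |= 1U << index;
	else
		s->kern_bitmap &= ~(1U << index);

	assert(index >= SLB_NUM_BOLTED);
	return index;
}
```

Tracking kernel entries separately is what makes the context-switch reset
cheap: the used bitmap is simply overwritten with the kernel bitmap instead
of being rebuilt entry by entry.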

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/paca.h   |  6 ++-
 arch/powerpc/kernel/asm-offsets.c |  2 +-
 arch/powerpc/mm/slb.c | 62 +--
 arch/powerpc/xmon/xmon.c  |  4 +-
 4 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 8c258a057207..bf7ab59be3b8 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -113,7 +113,10 @@ struct paca_struct {
 * on the linear mapping */
/* SLB related definitions */
u16 vmalloc_sllp;
-   u16 slb_cache_ptr;
+   u8 slb_cache_ptr;
+   u8 stab_rr; /* stab/slb round-robin counter */
+   u32 slb_used_bitmap;/* Bitmaps for first 32 SLB entries. */
+   u32 slb_kern_bitmap;
u32 slb_cache[SLB_CACHE_ENTRIES];
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
@@ -148,7 +151,6 @@ struct paca_struct {
 */
struct task_struct *__current;  /* Pointer to current */
u64 kstack; /* Saved Kernel stack addr */
-   u64 stab_rr;/* stab/slb round-robin counter */
u64 saved_r1;   /* r1 save for RTAS calls or PM or EE=0 */
u64 saved_msr;  /* MSR saved here by enter_rtas */
u16 trap_save;  /* Used when bad stack is encountered */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 43b67ead5b97..1f79cbf3da62 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -173,7 +173,6 @@ int main(void)
OFFSET(PACAKSAVE, paca_struct, kstack);
OFFSET(PACACURRENT, paca_struct, __current);
OFFSET(PACASAVEDMSR, paca_struct, saved_msr);
-   OFFSET(PACASTABRR, paca_struct, stab_rr);
OFFSET(PACAR1, paca_struct, saved_r1);
OFFSET(PACATOC, paca_struct, kernel_toc);
OFFSET(PACAKBASE, paca_struct, kernelbase);
@@ -203,6 +202,7 @@ int main(void)
 #ifdef CONFIG_PPC_BOOK3S_64
OFFSET(PACASLBCACHE, paca_struct, slb_cache);
OFFSET(PACASLBCACHEPTR, paca_struct, slb_cache_ptr);
+   OFFSET(PACASTABRR, paca_struct, stab_rr);
OFFSET(PACAVMALLOCSLLP, paca_struct, vmalloc_sllp);
 #ifdef CONFIG_PPC_MM_SLICES
OFFSET(MMUPSIZESLLP, mmu_psize_def, sllp);
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index d782a70d4a5d..98521fec3536 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -122,6 +122,9 @@ void slb_restore_bolted_realmode(void)
 {
__slb_restore_bolted_realmode();
get_paca()->slb_cache_ptr = 0;
+
+   get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1;
+   get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap;
 }
 
 /*
@@ -129,9 +132,6 @@ void slb_restore_bolted_realmode(void)
  */
 void slb_flush_all_realmode(void)
 {
-   /*
-* This flushes all SLB entries including 0, so it must be realmode.
-*/
asm volatile("slbmte %0,%0; slbia" : : "r" (0));
 }
 
@@ -177,6 +177,9 @@ void slb_flush_and_rebolt(void)
 : "memory");
 
get_paca()->slb_cache_ptr = 0;
+
+   get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1;
+   get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap;
 }
 
 void slb_vmalloc_update(void)
@@ -273,10 +276,13 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 "isync"
 :: "r"(ksp_vsid_data),
"r"(ksp_esid_data));
+
+   get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1;
}
 
get_paca()->slb_cache_ptr = 0;
}
+   get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap;
 
/*
 * preload some userspace segments into the SLB.
@@ -349,6 +355,8 @@ void slb_initialize(void)
}
 
get_paca()->stab_rr = SLB_NUM_BOLTED - 1;
+   get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1;
+   get_paca()->slb_used_bitmap = get_paca()->slb_kern_bitmap;
 
lflags = SLB_VSID_KERNEL | linear_llp;
 
@@ -400,17 +408,47 @@ static void slb_cache_update(unsigned long esid_data)
}
 }
 
-static enum slb_index alloc_slb_index(void)
+static enum slb_index alloc_slb_index(bool kernel)
 {
enum slb_index index;
 
-   /* round-robin replacement of slb starting at SLB_NUM_BOLTED. */
-   index = get_paca()->stab_rr;
-   if (index < (mmu_slb_size - 1))
-   index++;
-   else
-   index = SLB_NUM_BOLTED;
-   get_paca()->stab_rr = index;
+  

[PATCH 08/12] powerpc/64s/hash: remove user SLB data from the paca

2018-09-14 Thread Nicholas Piggin
User SLB mapping data is copied into the PACA from the mm->context so
it can be accessed by the SLB miss handlers.

After the C conversion, SLB miss handlers now run with relocation on,
and user SLB misses are able to take recursive kernel SLB misses, so
the user SLB mapping data can be removed from the paca and accessed
directly.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  1 +
 arch/powerpc/include/asm/paca.h   | 13 --
 arch/powerpc/kernel/asm-offsets.c |  9 
 arch/powerpc/kernel/paca.c| 21 -
 arch/powerpc/mm/hash_utils_64.c   | 46 +--
 arch/powerpc/mm/mmu_context.c |  3 +-
 arch/powerpc/mm/slb.c | 20 +++-
 arch/powerpc/mm/slice.c   | 29 
 8 files changed, 40 insertions(+), 102 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 20d9ca736bbd..4c8d413ce99a 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -496,6 +496,7 @@ static inline void hpte_init_pseries(void) { }
 extern void hpte_init_native(void);
 
 extern void slb_initialize(void);
+extern void core_flush_all_slbs(struct mm_struct *mm);
 extern void slb_flush_and_rebolt(void);
 void slb_flush_all_realmode(void);
 void __slb_restore_bolted_realmode(void);
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4331295db0f7..8c258a057207 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -143,18 +143,6 @@ struct paca_struct {
struct tlb_core_data tcd;
 #endif /* CONFIG_PPC_BOOK3E */
 
-#ifdef CONFIG_PPC_BOOK3S
-   mm_context_id_t mm_ctx_id;
-#ifdef CONFIG_PPC_MM_SLICES
-   unsigned char mm_ctx_low_slices_psize[BITS_PER_LONG / BITS_PER_BYTE];
-   unsigned char mm_ctx_high_slices_psize[SLICE_ARRAY_SIZE];
-   unsigned long mm_ctx_slb_addr_limit;
-#else
-   u16 mm_ctx_user_psize;
-   u16 mm_ctx_sllp;
-#endif
-#endif
-
/*
 * then miscellaneous read-write fields
 */
@@ -256,7 +244,6 @@ struct paca_struct {
 #endif /* CONFIG_PPC_PSERIES */
 } cacheline_aligned;
 
-extern void copy_mm_to_paca(struct mm_struct *mm);
 extern struct paca_struct **paca_ptrs;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
 extern void setup_paca(struct paca_struct *new_paca);
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 7834256585f1..43b67ead5b97 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -181,15 +181,6 @@ int main(void)
OFFSET(PACAIRQSOFTMASK, paca_struct, irq_soft_mask);
OFFSET(PACAIRQHAPPENED, paca_struct, irq_happened);
OFFSET(PACA_FTRACE_ENABLED, paca_struct, ftrace_enabled);
-#ifdef CONFIG_PPC_BOOK3S
-   OFFSET(PACACONTEXTID, paca_struct, mm_ctx_id);
-#ifdef CONFIG_PPC_MM_SLICES
-   OFFSET(PACALOWSLICESPSIZE, paca_struct, mm_ctx_low_slices_psize);
-   OFFSET(PACAHIGHSLICEPSIZE, paca_struct, mm_ctx_high_slices_psize);
-   OFFSET(PACA_SLB_ADDR_LIMIT, paca_struct, mm_ctx_slb_addr_limit);
-   DEFINE(MMUPSIZEDEFSIZE, sizeof(struct mmu_psize_def));
-#endif /* CONFIG_PPC_MM_SLICES */
-#endif
 
 #ifdef CONFIG_PPC_BOOK3E
OFFSET(PACAPGD, paca_struct, pgd);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 0ee3e6d50f28..6752e17f0281 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -259,24 +259,3 @@ void __init free_unused_pacas(void)
paca_ptrs_size + paca_struct_size, nr_cpu_ids);
 }
 
-void copy_mm_to_paca(struct mm_struct *mm)
-{
-#ifdef CONFIG_PPC_BOOK3S
-   mm_context_t *context = &mm->context;
-
-   get_paca()->mm_ctx_id = context->id;
-#ifdef CONFIG_PPC_MM_SLICES
-   VM_BUG_ON(!mm->context.slb_addr_limit);
-   get_paca()->mm_ctx_slb_addr_limit = mm->context.slb_addr_limit;
-   memcpy(&get_paca()->mm_ctx_low_slices_psize,
-  &context->low_slices_psize, sizeof(context->low_slices_psize));
-   memcpy(&get_paca()->mm_ctx_high_slices_psize,
-  &context->high_slices_psize, TASK_SLICE_ARRAY_SZ(mm));
-#else /* CONFIG_PPC_MM_SLICES */
-   get_paca()->mm_ctx_user_psize = context->user_psize;
-   get_paca()->mm_ctx_sllp = context->sllp;
-#endif
-#else /* !CONFIG_PPC_BOOK3S */
-   return;
-#endif
-}
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index f23a89d8e4ce..88c95dc8b141 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1088,16 +1088,16 @@ unsigned int hash_page_do_lazy_icache(unsigned int pp, 
pte_t pte, int trap)
 }
 
 #ifdef CONFIG_PPC_MM_SLICES
-static unsigned int get_paca_psize(unsigned long addr)
+static unsigned int get_psize(struct mm_struct *mm, unsigned long addr)
 {

[PATCH 07/12] powerpc/64s/hash: convert SLB miss handlers to C

2018-09-14 Thread Nicholas Piggin
This patch moves SLB miss handlers completely to C, using the standard
exception handler macros to set up the stack and branch to C.

This can be done because the segment containing the kernel stack is
always bolted, so accessing it with relocation on will not cause an
SLB exception.

Arbitrary kernel memory may not be accessed when handling kernel space
SLB misses, so care should be taken there. However user SLB misses can
access any kernel memory, which can be used to move some fields out of
the paca (in later patches).

User SLB misses could quite easily reconcile IRQs and set up a first
class kernel environment and exit via ret_from_except, however that
doesn't seem to be necessary at the moment, so we only do that if a
bad fault is encountered.

[ Credit to Aneesh for bug fixes, error checks, and improvements to bad
  address handling, etc ]

Signed-off-by: Nicholas Piggin 

Since RFC:
- Added MSR[RI] handling
- Fixed up a register loss bug exposed by irq tracing (Aneesh)
- Reject misses outside the defined kernel regions (Aneesh)
- Added several more sanity checks and error handling (Aneesh), we may
  look at consolidating these tests and tightening up the code but for
  a first pass we decided it's better to check carefully.

Since v1:
- Fixed SLB cache corruption (Aneesh)
- Fixed untidy SLBE allocation "leak" in get_vsid error case
- Now survives some stress testing on real hardware
---
 arch/powerpc/include/asm/asm-prototypes.h |   2 +
 arch/powerpc/include/asm/exception-64s.h  |   8 -
 arch/powerpc/kernel/exceptions-64s.S  | 202 +++--
 arch/powerpc/mm/Makefile  |   2 +-
 arch/powerpc/mm/slb.c | 271 +
 arch/powerpc/mm/slb_low.S | 335 --
 6 files changed, 196 insertions(+), 624 deletions(-)
 delete mode 100644 arch/powerpc/mm/slb_low.S

diff --git a/arch/powerpc/include/asm/asm-prototypes.h 
b/arch/powerpc/include/asm/asm-prototypes.h
index 1f4691ce4126..78ed3c3f879a 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -78,6 +78,8 @@ void kernel_bad_stack(struct pt_regs *regs);
 void system_reset_exception(struct pt_regs *regs);
 void machine_check_exception(struct pt_regs *regs);
 void emulation_assist_interrupt(struct pt_regs *regs);
+long do_slb_fault(struct pt_regs *regs, unsigned long ea);
+void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err);
 
 /* signals, syscalls and interrupts */
 long sys_swapcontext(struct ucontext __user *old_ctx,
diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index a86fead0..47578b79f0fb 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -60,14 +60,6 @@
  */
 #define MAX_MCE_DEPTH  4
 
-/*
- * EX_LR is only used in EXSLB and where it does not overlap with EX_DAR
- * EX_CCR similarly with DSISR, but being 4 byte registers there is a hole
- * in the save area so it's not necessary to overlap them. Could be used
- * for future savings though if another 4 byte register was to be saved.
- */
-#define EX_LR  EX_DAR
-
 /*
  * EX_R3 is only used by the bad_stack handler. bad_stack reloads and
  * saves DAR from SPRN_DAR, and EX_DAR is not used. So EX_R3 can overlap
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 9dad73722d1a..c4f372ef4842 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -567,28 +567,36 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
 
 
 EXC_REAL_BEGIN(data_access_slb, 0x380, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, KVMTEST_PR, 0x380)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_DAR
-   mfspr   r11,SPRN_SRR1
-   crset   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, KVMTEST_PR, 
0x380);
 EXC_REAL_END(data_access_slb, 0x380, 0x80)
 
 EXC_VIRT_BEGIN(data_access_slb, 0x4380, 0x80)
-   SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXSLB)
-   EXCEPTION_PROLOG_1(PACA_EXSLB, NOTEST, 0x380)
-   mr  r12,r3  /* save r3 */
-   mfspr   r3,SPRN_DAR
-   mfspr   r11,SPRN_SRR1
-   crset   4*cr6+eq
-   BRANCH_TO_COMMON(r10, slb_miss_common)
+EXCEPTION_RELON_PROLOG(PACA_EXSLB, data_access_slb_common, EXC_STD, NOTEST, 
0x380);
 EXC_VIRT_END(data_access_slb, 0x4380, 0x80)
+
 TRAMP_KVM_SKIP(PACA_EXSLB, 0x380)
 
+EXC_COMMON_BEGIN(data_access_slb_common)
+   mfspr   r10,SPRN_DAR
+   std r10,PACA_EXSLB+EX_DAR(r13)
+   EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB)
+   ld  r4,PACA_EXSLB+EX_DAR(r13)
+   std r4,_DAR(r1)
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl  do_slb_fault
+   cmpdi   r3,0
+   bne-    1f
+   b   fast_exception_return
+1: /* 

[PATCH 06/12] powerpc/64s/hash: Use POWER9 SLBIA IH=3 variant in switch_slb

2018-09-14 Thread Nicholas Piggin
POWER9 introduces SLBIA IH=3, which invalidates all SLB entries and
associated lookaside information that have a class value of 1, which
Linux assigns to user addresses. This matches what switch_slb wants,
and allows a simple fast implementation that avoids the slb_cache
complexity.

As a side-effect, the POWER5 < DD2.1 SLB invalidation workaround is
also avoided on POWER9.

Process context switching rate is improved about 2.2% for a small
process that hits the slb cache which is the best case for the current
code.
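The path selection that switch_slb ends up with after this patch can be
sketched in plain C. This is only an illustration read off the diff below:
the enum and the pick_flush_path() helper are hypothetical names, the
feature tests are passed in as booleans instead of calling
cpu_has_feature()/mmu_has_feature(), and no SLB instruction is issued.

```c
#include <assert.h>
#include <stdbool.h>

#define SLB_CACHE_ENTRIES 8

/* Hypothetical labels for the three invalidation paths in switch_slb(). */
enum slb_flush_path {
	FLUSH_SLBIA_IH3,    /* ARCH_300: one SLBIA IH=3, slb_cache not used */
	FLUSH_SLBIE_CACHED, /* one slbie per entry recorded in slb_cache */
	FLUSH_SLBIA_IH1,    /* SLBIA IH=1, then re-insert the kernel stack SLBE */
};

/*
 * arch_300:      stands in for cpu_has_feature(CPU_FTR_ARCH_300)
 * no_slbie_b:    stands in for mmu_has_feature(MMU_FTR_NO_SLBIE_B)
 * slb_cache_ptr: number of user segments recorded since the last switch
 */
static enum slb_flush_path pick_flush_path(bool arch_300, bool no_slbie_b,
					   unsigned long slb_cache_ptr)
{
	if (arch_300)
		return FLUSH_SLBIA_IH3;
	if (!no_slbie_b && slb_cache_ptr <= SLB_CACHE_ENTRIES)
		return FLUSH_SLBIE_CACHED;
	return FLUSH_SLBIA_IH1;
}
```

On a pre-POWER9 CPU the slbie loop is still chosen as long as the cache
has not overflowed; only the overflow case falls back to SLBIA IH=1.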

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/slb.c| 86 +++-
 arch/powerpc/xmon/xmon.c | 11 +++--
 2 files changed, 57 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 03fa1c663ccf..319c772f7cbd 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -209,7 +209,6 @@ static inline int esids_match(unsigned long addr1, unsigned 
long addr2)
 /* Flush all user entries from the segment table of the current processor. */
 void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 {
-   unsigned long offset;
unsigned long pc = KSTK_EIP(tsk);
unsigned long stack = KSTK_ESP(tsk);
unsigned long exec_base;
@@ -221,45 +220,57 @@ void switch_slb(struct task_struct *tsk, struct mm_struct 
*mm)
 * which would update the slb_cache/slb_cache_ptr fields in the PACA.
 */
hard_irq_disable();
-   offset = get_paca()->slb_cache_ptr;
-   if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) &&
-   offset <= SLB_CACHE_ENTRIES) {
-   unsigned long slbie_data;
-   int i;
-
-   asm volatile("isync" : : : "memory");
-   for (i = 0; i < offset; i++) {
-   slbie_data = (unsigned long)get_paca()->slb_cache[i]
-   << SID_SHIFT; /* EA */
-   slbie_data |= user_segment_size(slbie_data)
-   << SLBIE_SSIZE_SHIFT;
-   slbie_data |= SLBIE_C; /* C set for user addresses */
-   asm volatile("slbie %0" : : "r" (slbie_data));
-   }
-
-   /* Workaround POWER5 < DD2.1 issue */
-   if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1)
-   asm volatile("slbie %0" : : "r" (slbie_data));
+   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   /*
+* SLBIA IH=3 invalidates all Class=1 SLBEs and their
+* associated lookaside structures, which matches what
+* switch_slb wants. So ARCH_300 does not use the slb
+* cache.
+*/
+   asm volatile("isync ; " PPC_SLBIA(3)" ; isync");
 
-   asm volatile("isync" : : : "memory");
} else {
-   struct slb_shadow *p = get_slb_shadow();
-   unsigned long ksp_esid_data =
-   be64_to_cpu(p->save_area[KSTACK_INDEX].esid);
-   unsigned long ksp_vsid_data =
-   be64_to_cpu(p->save_area[KSTACK_INDEX].vsid);
-
-   asm volatile("isync\n"
-PPC_SLBIA(1) "\n"
-"slbmte  %0,%1\n"
-"isync"
-:: "r"(ksp_vsid_data),
-   "r"(ksp_esid_data));
-
-   asm volatile("isync" : : : "memory");
+   unsigned long offset = get_paca()->slb_cache_ptr;
+
+   if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) &&
+   offset <= SLB_CACHE_ENTRIES) {
+   unsigned long slbie_data;
+   int i;
+
+   asm volatile("isync" : : : "memory");
+   for (i = 0; i < offset; i++) {
+   /* EA */
+   slbie_data = (unsigned long)
+   get_paca()->slb_cache[i] << SID_SHIFT;
+   slbie_data |= user_segment_size(slbie_data)
+   << SLBIE_SSIZE_SHIFT;
+   slbie_data |= SLBIE_C; /* user slbs have C=1 */
+   asm volatile("slbie %0" : : "r" (slbie_data));
+   }
+
+   /* Workaround POWER5 < DD2.1 issue */
+   if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1)
+   asm volatile("slbie %0" : : "r" (slbie_data));
+
+   asm volatile("isync" : : : "memory");
+   } else {
+   struct slb_shadow *p = get_slb_shadow();
+   unsigned long ksp_esid_data =
+   be64_to_cpu(p->save_area[KSTACK_INDEX].esid);
+   unsigned long ksp_vsid_data =
+   

[PATCH 05/12] powerpc/64s/hash: Use POWER6 SLBIA IH=1 variant in switch_slb

2018-09-14 Thread Nicholas Piggin
On processors that support the hint (e.g., POWER6 and newer), SLBIA
IH=1 removes all non-zero SLBEs but only invalidates ERAT entries
associated with a class value of 1, which Linux assigns to user
addresses.

This prevents kernel ERAT entries from being invalidated when
context switching (if the thread faulted in more than 8 user SLBEs).

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/slb.c | 38 +++---
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index a5e58f11d676..03fa1c663ccf 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -128,13 +128,21 @@ void slb_flush_all_realmode(void)
asm volatile("slbmte %0,%0; slbia" : : "r" (0));
 }
 
-static void __slb_flush_and_rebolt(void)
+void slb_flush_and_rebolt(void)
 {
/* If you change this make sure you change SLB_NUM_BOLTED
 * and PR KVM appropriately too. */
unsigned long linear_llp, lflags;
unsigned long ksp_esid_data, ksp_vsid_data;
 
+   WARN_ON(!irqs_disabled());
+
+   /*
+* We can't take a PMU exception in the following code, so hard
+* disable interrupts.
+*/
+   hard_irq_disable();
+
linear_llp = mmu_psize_defs[mmu_linear_psize].sllp;
lflags = SLB_VSID_KERNEL | linear_llp;
 
@@ -160,20 +168,7 @@ static void __slb_flush_and_rebolt(void)
 :: "r"(ksp_vsid_data),
"r"(ksp_esid_data)
 : "memory");
-}
 
-void slb_flush_and_rebolt(void)
-{
-
-   WARN_ON(!irqs_disabled());
-
-   /*
-* We can't take a PMU exception in the following code, so hard
-* disable interrupts.
-*/
-   hard_irq_disable();
-
-   __slb_flush_and_rebolt();
get_paca()->slb_cache_ptr = 0;
 }
 
@@ -248,7 +243,20 @@ void switch_slb(struct task_struct *tsk, struct mm_struct 
*mm)
 
asm volatile("isync" : : : "memory");
} else {
-   __slb_flush_and_rebolt();
+   struct slb_shadow *p = get_slb_shadow();
+   unsigned long ksp_esid_data =
+   be64_to_cpu(p->save_area[KSTACK_INDEX].esid);
+   unsigned long ksp_vsid_data =
+   be64_to_cpu(p->save_area[KSTACK_INDEX].vsid);
+
+   asm volatile("isync\n"
+PPC_SLBIA(1) "\n"
+"slbmte  %0,%1\n"
+"isync"
+:: "r"(ksp_vsid_data),
+   "r"(ksp_esid_data));
+
+   asm volatile("isync" : : : "memory");
}
 
get_paca()->slb_cache_ptr = 0;
-- 
2.18.0



[PATCH 04/12] powerpc/64s/hash: remove the vmalloc segment from the bolted SLB

2018-09-14 Thread Nicholas Piggin
Remove the vmalloc segment from bolted SLBEs. This is not required to
be bolted, and seems like it was added to help pre-load the SLB on
context switch. However there are now other segments like the vmemmap
segment and non-zero node memory that often take misses after a context
switch, so it is better to solve this in a more general way.

A subsequent change will track free SLB entries and use those rather
than overwriting valid entries round-robin, which makes it far less
likely for kernel SLBEs to be evicted after they are installed.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |  2 +-
 arch/powerpc/mm/slb.c | 23 ---
 2 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index b3520b549cba..20d9ca736bbd 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -30,7 +30,7 @@
  * SLB
  */
 
-#define SLB_NUM_BOLTED 3
+#define SLB_NUM_BOLTED 2
 #define SLB_CACHE_ENTRIES  8
 #define SLB_MIN_SIZE   32
 
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index d952ece3abf7..a5e58f11d676 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -30,8 +30,7 @@
 
 enum slb_index {
LINEAR_INDEX= 0, /* Kernel linear map  (0xc000) */
-   VMALLOC_INDEX   = 1, /* Kernel virtual map (0xd000) */
-   KSTACK_INDEX= 2, /* Kernel stack map */
+   KSTACK_INDEX= 1, /* Kernel stack map */
 };
 
 extern void slb_allocate(unsigned long ea);
@@ -133,13 +132,11 @@ static void __slb_flush_and_rebolt(void)
 {
/* If you change this make sure you change SLB_NUM_BOLTED
 * and PR KVM appropriately too. */
-   unsigned long linear_llp, vmalloc_llp, lflags, vflags;
+   unsigned long linear_llp, lflags;
unsigned long ksp_esid_data, ksp_vsid_data;
 
linear_llp = mmu_psize_defs[mmu_linear_psize].sllp;
-   vmalloc_llp = mmu_psize_defs[mmu_vmalloc_psize].sllp;
lflags = SLB_VSID_KERNEL | linear_llp;
-   vflags = SLB_VSID_KERNEL | vmalloc_llp;
 
ksp_esid_data = mk_esid_data(get_paca()->kstack, mmu_kernel_ssize, 
KSTACK_INDEX);
if ((ksp_esid_data & ~0xfffUL) <= PAGE_OFFSET) {
@@ -157,14 +154,10 @@ static void __slb_flush_and_rebolt(void)
 * the stack between the slbia and rebolting it. */
asm volatile("isync\n"
 "slbia\n"
-/* Slot 1 - first VMALLOC segment */
+/* Slot 1 - kernel stack */
 "slbmte  %0,%1\n"
-/* Slot 2 - kernel stack */
-"slbmte  %2,%3\n"
 "isync"
-:: "r"(mk_vsid_data(VMALLOC_START, mmu_kernel_ssize, 
vflags)),
-   "r"(mk_esid_data(VMALLOC_START, mmu_kernel_ssize, 
VMALLOC_INDEX)),
-   "r"(ksp_vsid_data),
+:: "r"(ksp_vsid_data),
"r"(ksp_esid_data)
 : "memory");
 }
@@ -186,10 +179,6 @@ void slb_flush_and_rebolt(void)
 
 void slb_vmalloc_update(void)
 {
-   unsigned long vflags;
-
-   vflags = SLB_VSID_KERNEL | mmu_psize_defs[mmu_vmalloc_psize].sllp;
-   slb_shadow_update(VMALLOC_START, mmu_kernel_ssize, vflags, 
VMALLOC_INDEX);
slb_flush_and_rebolt();
 }
 
@@ -324,7 +313,7 @@ void slb_set_size(u16 size)
 void slb_initialize(void)
 {
unsigned long linear_llp, vmalloc_llp, io_llp;
-   unsigned long lflags, vflags;
+   unsigned long lflags;
static int slb_encoding_inited;
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
unsigned long vmemmap_llp;
@@ -360,14 +349,12 @@ void slb_initialize(void)
get_paca()->stab_rr = SLB_NUM_BOLTED - 1;
 
lflags = SLB_VSID_KERNEL | linear_llp;
-   vflags = SLB_VSID_KERNEL | vmalloc_llp;
 
/* Invalidate the entire SLB (even entry 0) & all the ERATS */
asm volatile("isync":::"memory");
asm volatile("slbmte  %0,%0"::"r" (0) : "memory");
asm volatile("isync; slbia; isync":::"memory");
create_shadowed_slbe(PAGE_OFFSET, mmu_kernel_ssize, lflags, 
LINEAR_INDEX);
-   create_shadowed_slbe(VMALLOC_START, mmu_kernel_ssize, vflags, 
VMALLOC_INDEX);
 
/* For the boot cpu, we're running on the stack in init_thread_union,
 * which is in the first segment of the linear mapping, and also
-- 
2.18.0



[PATCH 03/12] powerpc/64s/hash: move POWER5 < DD2.1 slbie workaround where it is needed

2018-09-14 Thread Nicholas Piggin
The POWER5 < DD2.1 issue is that slbie needs to be issued more than
once. It came in with this change:

ChangeSet@1.1608, 2004-04-29 07:12:31-07:00, da...@gibson.dropbear.id.au
  [PATCH] POWER5 erratum workaround

  Early POWER5 revisions (
---
 arch/powerpc/mm/slb.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 1c7128c63a4b..d952ece3abf7 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -226,7 +226,6 @@ static inline int esids_match(unsigned long addr1, unsigned 
long addr2)
 void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 {
unsigned long offset;
-   unsigned long slbie_data = 0;
unsigned long pc = KSTK_EIP(tsk);
unsigned long stack = KSTK_ESP(tsk);
unsigned long exec_base;
@@ -241,7 +240,9 @@ void switch_slb(struct task_struct *tsk, struct mm_struct 
*mm)
offset = get_paca()->slb_cache_ptr;
if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) &&
offset <= SLB_CACHE_ENTRIES) {
+   unsigned long slbie_data;
int i;
+
asm volatile("isync" : : : "memory");
for (i = 0; i < offset; i++) {
slbie_data = (unsigned long)get_paca()->slb_cache[i]
@@ -251,15 +252,14 @@ void switch_slb(struct task_struct *tsk, struct mm_struct 
*mm)
slbie_data |= SLBIE_C; /* C set for user addresses */
asm volatile("slbie %0" : : "r" (slbie_data));
}
-   asm volatile("isync" : : : "memory");
-   } else {
-   __slb_flush_and_rebolt();
-   }
 
-   if (!cpu_has_feature(CPU_FTR_ARCH_207S)) {
/* Workaround POWER5 < DD2.1 issue */
-   if (offset == 1 || offset > SLB_CACHE_ENTRIES)
+   if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1)
asm volatile("slbie %0" : : "r" (slbie_data));
+
+   asm volatile("isync" : : : "memory");
+   } else {
+   __slb_flush_and_rebolt();
}
 
get_paca()->slb_cache_ptr = 0;
-- 
2.18.0



[PATCH 02/12] powerpc/64s/hash: avoid the POWER5 < DD2.1 slb invalidate workaround on POWER8/9

2018-09-14 Thread Nicholas Piggin
I only have POWER8/9 to test, so just remove it for those.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/entry_64.S | 2 ++
 arch/powerpc/mm/slb.c  | 8 +---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2206912ea4f0..77a888bfcb53 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -672,7 +672,9 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 
isync
slbie   r6
+BEGIN_FTR_SECTION
slbie   r6  /* Workaround POWER5 < DD2.1 issue */
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
slbmte  r7,r0
isync
 2:
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 2f162c6e52d4..1c7128c63a4b 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -256,9 +256,11 @@ void switch_slb(struct task_struct *tsk, struct mm_struct 
*mm)
__slb_flush_and_rebolt();
}
 
-   /* Workaround POWER5 < DD2.1 issue */
-   if (offset == 1 || offset > SLB_CACHE_ENTRIES)
-   asm volatile("slbie %0" : : "r" (slbie_data));
+   if (!cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   /* Workaround POWER5 < DD2.1 issue */
+   if (offset == 1 || offset > SLB_CACHE_ENTRIES)
+   asm volatile("slbie %0" : : "r" (slbie_data));
+   }
 
get_paca()->slb_cache_ptr = 0;
copy_mm_to_paca(mm);
-- 
2.18.0



[PATCH 01/12] powerpc/64s/hash: Fix stab_rr off by one initialization

2018-09-14 Thread Nicholas Piggin
This causes SLB allocation to start 1 beyond the start of the SLB.
There is no real problem because after it wraps it starts behaving
properly, it's just surprising to see when looking at SLB traces.
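A small model of the effect, assuming the allocator advances stab_rr
before using it as the slot index and wraps back to SLB_NUM_BOLTED (my
reading of the round-robin code; slb_next_slot() and SLB_SIZE are
illustrative names, not kernel symbols):

```c
#include <assert.h>

#define SLB_NUM_BOLTED 3   /* value at the time of this patch */
#define SLB_SIZE       32  /* assumed SLB size, for illustration only */

/* Advance the round-robin pointer, then use the new value as the slot. */
static unsigned int slb_next_slot(unsigned int *stab_rr)
{
	unsigned int slot = *stab_rr + 1;

	if (slot >= SLB_SIZE)
		slot = SLB_NUM_BOLTED; /* wrap to the first non-bolted entry */
	*stab_rr = slot;
	return slot;
}
```

Under that assumption, starting from SLB_NUM_BOLTED hands out slot 4
first and only reaches slot 3 after a wrap, while starting from
SLB_NUM_BOLTED - 1 hands out slot 3 straight away.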

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/mm/slb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 9f574e59d178..2f162c6e52d4 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -355,7 +355,7 @@ void slb_initialize(void)
 #endif
}
 
-   get_paca()->stab_rr = SLB_NUM_BOLTED;
+   get_paca()->stab_rr = SLB_NUM_BOLTED - 1;
 
lflags = SLB_VSID_KERNEL | linear_llp;
vflags = SLB_VSID_KERNEL | vmalloc_llp;
-- 
2.18.0



[PATCH 00/12] SLB miss conversion to C, and SLB optimisations

2018-09-14 Thread Nicholas Piggin
This is a repost of the SLB conversion to C, no real change since last
post. But given that it slows down the SLB miss handler, I promised some
optimisations could be made to mitigate that.

The two main optimisations after the C conversion are the SLB allocation
bitmaps, and the preload cache.

Thanks,
Nick

Nicholas Piggin (12):
  powerpc/64s/hash: Fix stab_rr off by one initialization
  powerpc/64s/hash: avoid the POWER5 < DD2.1 slb invalidate workaround
on POWER8/9
  powerpc/64s/hash: move POWER5 < DD2.1 slbie workaround where it is
needed
  powerpc/64s/hash: remove the vmalloc segment from the bolted SLB
  powerpc/64s/hash: Use POWER6 SLBIA IH=1 variant in switch_slb
  powerpc/64s/hash: Use POWER9 SLBIA IH=3 variant in switch_slb
  powerpc/64s/hash: convert SLB miss handlers to C
  powerpc/64s/hash: remove user SLB data from the paca
  powerpc/64s/hash: SLB allocation status bitmaps
  powerpc/64s: xmon do not dump hash fields when using radix mode
  powerpc/64s/hash: provide arch_setup_exec hooks for hash slice setup
  powerpc/64s/hash: Add a SLB preload cache

 arch/powerpc/include/asm/asm-prototypes.h |   2 +
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   5 +-
 arch/powerpc/include/asm/exception-64s.h  |   8 -
 arch/powerpc/include/asm/paca.h   |  19 +-
 arch/powerpc/include/asm/processor.h  |   1 +
 arch/powerpc/include/asm/slice.h  |   1 +
 arch/powerpc/include/asm/thread_info.h|  11 +
 arch/powerpc/kernel/asm-offsets.c |  11 +-
 arch/powerpc/kernel/entry_64.S|   2 +
 arch/powerpc/kernel/exceptions-64s.S  | 202 ++
 arch/powerpc/kernel/paca.c|  21 -
 arch/powerpc/kernel/process.c |  16 +
 arch/powerpc/mm/Makefile  |   2 +-
 arch/powerpc/mm/hash_utils_64.c   |  46 +-
 arch/powerpc/mm/mmu_context.c |   3 +-
 arch/powerpc/mm/mmu_context_book3s64.c|   9 +
 arch/powerpc/mm/slb.c | 596 --
 arch/powerpc/mm/slb_low.S | 335 --
 arch/powerpc/mm/slice.c   |  43 +-
 arch/powerpc/xmon/xmon.c  |  37 +-
 20 files changed, 540 insertions(+), 830 deletions(-)
 delete mode 100644 arch/powerpc/mm/slb_low.S

-- 
2.18.0



Re: [PATCH] powerpc/fadump: re-register firmware-assisted dump if already registered

2018-09-14 Thread Petr Tesarik
On Fri, 14 Sep 2018 19:36:02 +0530
Hari Bathini  wrote:

> Firmware-Assisted Dump (FADump) needs to be registered again after any
> memory hot add/remove operation to update the crash memory ranges. But
> currently, the kernel returns '-EEXIST' if we try to register without
> unregistering it first. This could expose the system to racing issues
> while unregistering and registering FADump from userspace during udev
> events. Spare the userspace of this and let it be taken care of in the
> kernel space for a simpler interface.
> 
> Since this change, running 'echo 1 > /sys/kernel/fadump_registered'
> would result in re-registering (unregistering and registering) FADump,
> if it was already registered.

Great improvement to the API!

Any suggestions what should be done in a client which tries to be
compatible with kernels before this change and after this change?

Petr T

> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/kernel/fadump.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index a711d22..761b28b 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -1444,8 +1444,8 @@ static ssize_t fadump_register_store(struct kobject 
> *kobj,
>   break;
>   case 1:
>   if (fw_dump.dump_registered == 1) {
> - ret = -EEXIST;
> - goto unlock_out;
> + /* Un-register Firmware-assisted dump */
> + fadump_unregister_dump();
>   }
>   /* Register Firmware-assisted dump */
>   ret = register_fadump();
> 



Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg

2018-09-14 Thread David Sterba
On Wed, Sep 12, 2018 at 05:08:52PM +0200, Arnd Bergmann wrote:
> The .ioctl and .compat_ioctl file operations have the same prototype so
> they can both point to the same function, which works great almost all
> the time when all the commands are compatible.
> 
> One exception is the s390 architecture, where a compat pointer is only
> 31 bit wide, and converting it into a 64-bit pointer requires calling
> compat_ptr(). Most drivers here will never run on s390, but since we now
> have a generic helper for it, it's easy enough to use it consistently.
> 
> I double-checked all these drivers to ensure that all ioctl arguments
> are used as pointers or are ignored, but are not interpreted as integer
> values.
> 
> Signed-off-by: Arnd Bergmann 
> ---

>  fs/btrfs/super.c| 2 +-

Acked-by: David Sterba 


[PATCH] powerpc/fadump: re-register firmware-assisted dump if already registered

2018-09-14 Thread Hari Bathini
Firmware-Assisted Dump (FADump) needs to be registered again after any
memory hot add/remove operation to update the crash memory ranges. But
currently, the kernel returns '-EEXIST' if we try to register without
unregistering it first. This could expose the system to racing issues
while unregistering and registering FADump from userspace during udev
events. Spare the userspace of this and let it be taken care of in the
kernel space for a simpler interface.

Since this change, running 'echo 1 > /sys/kernel/fadump_registered'
would result in re-registering (unregistering and registering) FADump,
if it was already registered.
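For a userspace client that has to cope with kernels both with and
without this change, one possible approach (a sketch only, not something
the patch provides) is to try the plain write first and fall back to an
explicit unregister only when the kernel answers -EEXIST. The helper
names are made up for illustration, and the sysfs path is passed in by
the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write a single character to a sysfs-style file; returns 0 or -errno. */
static int write_flag(const char *path, char value)
{
	int fd = open(path, O_WRONLY);
	int err = 0;
	ssize_t n;

	if (fd < 0)
		return -errno;
	n = write(fd, &value, 1);
	if (n < 0)
		err = -errno;
	else if (n != 1)
		err = -EIO;
	close(fd);
	return err;
}

/* Hypothetical helper: re-register FADump on old and new kernels alike. */
static int fadump_reregister(const char *path)
{
	int err = write_flag(path, '1');

	if (err != -EEXIST)
		return err; /* done on newer kernels, or an unrelated error */

	/* Older kernel: unregister explicitly, then register again. */
	err = write_flag(path, '0');
	if (err)
		return err;
	return write_flag(path, '1');
}
```

A caller would pass "/sys/kernel/fadump_registered". Note that on older
kernels the fallback still has the unregister/register window described
above; the sketch only hides the difference in return codes.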

Signed-off-by: Hari Bathini 
---
 arch/powerpc/kernel/fadump.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index a711d22..761b28b 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -1444,8 +1444,8 @@ static ssize_t fadump_register_store(struct kobject *kobj,
break;
case 1:
if (fw_dump.dump_registered == 1) {
-   ret = -EEXIST;
-   goto unlock_out;
+   /* Un-register Firmware-assisted dump */
+   fadump_unregister_dump();
}
/* Register Firmware-assisted dump */
ret = register_fadump();



Re: [PATCH] watchdog: mpc8xxx: provide boot status

2018-09-14 Thread Christophe LEROY




Le 13/09/2018 à 22:25, Guenter Roeck a écrit :

On Thu, Sep 13, 2018 at 08:07:21AM +, Christophe Leroy wrote:

mpc8xxx watchdog driver supports the following platforms:
- mpc8xx
- mpc83xx
- mpc86xx

Those three platforms have a 32-bit register which provides the
reason for the last boot, including whether it was caused by the
watchdog.

mpc8xx: Register RSR, bit SWRS (bit 3)
mpc83xx: Register RSR, bit SWRS (bit 28)
mpc86xx: Register RSTRSCR, bit WDT_RR (bit 11)

This patch maps the register as defined in the device tree and updates
wdt.bootstatus based on the value of the watchdog related bit. Then
the information can be retrieved via the WDIOC_GETBOOTSTATUS ioctl.

Hereunder is an exemple of devicetree for mpc8xx,


example


ok




the Reset Status Register being at offset 0x288:

WDT: watchdog@0 {
compatible = "fsl,mpc823-wdt";
reg = <0x0 0x10 0x288 0x4>;


This isn't documented anywhere, and no one will know how to use it.
So far that was grandfathered in, but with more complex usage
it really needs to be documented.


Ok, added a binding




};

On the mpc83xx, RSR is at offset 0x910
On the mpc86xx, RSTRSCR is at offset 0xe0094

Suggested-by: Radu Rendec 
Tested-by: Christophe Leroy  # On mpc885
Signed-off-by: Christophe Leroy 
---
  drivers/watchdog/mpc8xxx_wdt.c | 22 ++
  1 file changed, 22 insertions(+)

diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c
index aca2d6323f8a..2951a485a6b4 100644
--- a/drivers/watchdog/mpc8xxx_wdt.c
+++ b/drivers/watchdog/mpc8xxx_wdt.c
@@ -49,10 +49,12 @@ struct mpc8xxx_wdt {
  struct mpc8xxx_wdt_type {
int prescaler;
bool hw_enabled;
+   u32 rsr_mask;
  };
  
  struct mpc8xxx_wdt_ddata {

struct mpc8xxx_wdt __iomem *base;
+   u32 __iomem *rsr;
struct watchdog_device wdd;
spinlock_t lock;
u16 swtc;
@@ -137,6 +139,7 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
struct mpc8xxx_wdt_ddata *ddata;
u32 freq = fsl_get_sys_freq();
bool enabled;
+   struct device *dev = &ofdev->dev;


If you introduce this variable, please use it everywhere
in the function.


ok, introduced it and changed every pr_xxx() to dev_xxx() in a
preceding patch.




  
  	wdt_type = of_device_get_match_data(&ofdev->dev);

if (!wdt_type)
@@ -160,6 +163,22 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
return -ENODEV;
}
  
+	res = platform_get_resource(ofdev, IORESOURCE_MEM, 1);

+   ddata->rsr = devm_ioremap_resource(dev, res);
+   if (IS_ERR(ddata->rsr)) {
+   dev_info(dev, "Could not map reset status register");


Please, no such message. It would start to show up everywhere unless
devicetree files are updated, which likely won't happen. Then we get
bogged down by people asking where this message suddenly comes from.


ok




+   } else {
+   u32 rsr_v = in_be32(ddata->rsr);
+   bool status = rsr_v & wdt_type->rsr_mask;
+
+   ddata->wdd.bootstatus = status ? WDIOF_CARDRESET : 0;
+/* clear reset status bits related to watchdog time */
+   out_be32(ddata->rsr, wdt_type->rsr_mask);
+
+   dev_info(dev, "Last boot was %s by watchdog (RSR = 0x%8.8x)\n",
+status ? "caused" : "not caused", rsr_v);


The hex value of RSR may be interesting for developers, but not for users.
Please drop.

Also, "caused" is redundant. Add it to the base string and add "not "
when needed.


Ok, I did it in v2, although I find the code less readable.




+   }
+
spin_lock_init(&ddata->lock);
  
  	ddata->wdd.info = &mpc8xxx_wdt_info,

@@ -216,6 +235,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
.compatible = "mpc83xx_wdt",
.data = &(struct mpc8xxx_wdt_type) {
.prescaler = 0x1,
+   .rsr_mask = BIT(3), /* RSR Bit 28 */


The comment is quite useless. How does BIT(3) match RSR bit 28 ?
I am sure it is because the HW manual counts bits the other way,
but here it is just confusing and thus doesn't add value unless
you provide additional context.


Ok, put the BIT names instead.

Thanks for the review
Christophe




},
},
{
@@ -223,6 +243,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
.data = &(struct mpc8xxx_wdt_type) {
.prescaler = 0x1,
.hw_enabled = true,
+   .rsr_mask = BIT(20), /* RSTRSCR Bit 11 */
},
},
{
@@ -230,6 +251,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
.data = &(struct mpc8xxx_wdt_type) {
.prescaler = 0x800,
.hw_enabled = true,
+   .rsr_mask = BIT(28), /* RSR Bit 3 */
   

[PATCH v2 3/3] dt-bindings: watchdog: add mpc8xxx-wdt support

2018-09-14 Thread Christophe Leroy
Add description of DT bindings for mpc8xxx-wdt driver which
handles the CPU watchdog timer on the mpc83xx, mpc86xx and mpc8xx.

Signed-off-by: Christophe Leroy 
---
 .../devicetree/bindings/watchdog/mpc8xxx-wdt.txt   | 25 ++
 1 file changed, 25 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt

diff --git a/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt 
b/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt
new file mode 100644
index ..1d99e1e4d306
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/mpc8xxx-wdt.txt
@@ -0,0 +1,25 @@
+* Freescale mpc8xxx watchdog driver (For 83xx, 86xx and 8xx)
+
+Required properties:
+- compatible: Shall contain one of the following:
+   "mpc83xx_wdt" for an mpc83xx
+   "fsl,mpc8610-wdt" for an mpc86xx
+   "fsl,mpc823-wdt" for an mpc8xx
+- reg: base physical address and length of the area hosting the
+   watchdog registers.
+   On the 83xx, "Watchdog Timer Registers" area:   <0x200 0x100>
+   On the 86xx, "Watchdog Timer Registers" area:   <0xe4000 0x100>
+   On the 8xx, "General System Interface Unit" area: <0x0 0x10>
+
+Optional properties:
+- reg: additional physical address and length (4) of location of the
+   Reset Status Register (called RSTRSCR on the mpc86xx)
+   On the 83xx, it is located at offset 0x910
+   On the 86xx, it is located at offset 0xe0094
+   On the 8xx, it is located at offset 0x288
+
+Example:
+   WDT: watchdog@0 {
+   compatible = "fsl,mpc823-wdt";
+   reg = <0x0 0x10 0x288 0x4>;
+   };
-- 
2.13.3



[PATCH v2 2/3] watchdog: mpc8xxx: provide boot status

2018-09-14 Thread Christophe Leroy
mpc8xxx watchdog driver supports the following platforms:
- mpc8xx
- mpc83xx
- mpc86xx

Those three platforms have a 32-bit register which provides the
reason for the last boot, including whether it was caused by the
watchdog.

mpc8xx: Register RSR, bit SWRS (bit 3)
mpc83xx: Register RSR, bit SWRS (bit 28)
mpc86xx: Register RSTRSCR, bit WDT_RR (bit 11)

This patch maps the register as defined in the device tree and updates
wdt.bootstatus based on the value of the watchdog related bit. Then
the information can be retrieved via the WDIOC_GETBOOTSTATUS ioctl.

Below is an example device tree node for the mpc8xx,
the Reset Status Register being at offset 0x288:

WDT: watchdog@0 {
compatible = "fsl,mpc823-wdt";
reg = <0x0 0x10 0x288 0x4>;
};

On the mpc83xx, RSR is at offset 0x910
On the mpc86xx, RSTRSCR is at offset 0xe0094

Suggested-by: Radu Rendec 
Tested-by: Christophe Leroy  # On mpc885
Signed-off-by: Christophe Leroy 
---
 drivers/watchdog/mpc8xxx_wdt.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c
index 1dcf5f10cdd9..4a4700458b17 100644
--- a/drivers/watchdog/mpc8xxx_wdt.c
+++ b/drivers/watchdog/mpc8xxx_wdt.c
@@ -47,6 +47,7 @@ struct mpc8xxx_wdt {
 struct mpc8xxx_wdt_type {
int prescaler;
bool hw_enabled;
+   u32 rsr_mask;
 };
 
 struct mpc8xxx_wdt_ddata {
@@ -136,6 +137,7 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
u32 freq = fsl_get_sys_freq();
bool enabled;
struct device *dev = >dev;
+   u32 __iomem *rsr = NULL;
 
wdt_type = of_device_get_match_data(dev);
if (!wdt_type)
@@ -159,6 +161,21 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
return -ENODEV;
}
 
+   res = platform_get_resource(ofdev, IORESOURCE_MEM, 1);
+   if (res)
+   rsr = ioremap(res->start, resource_size(res));
+   if (rsr) {
+   bool status = in_be32(rsr) & wdt_type->rsr_mask;
+
+   ddata->wdd.bootstatus = status ? WDIOF_CARDRESET : 0;
+/* clear reset status bits related to watchdog timer */
+   out_be32(rsr, wdt_type->rsr_mask);
+   iounmap(rsr);
+
+   dev_info(dev, "Last boot was %scaused by watchdog\n",
+status ? "" : "not ");
+   }
+
spin_lock_init(>lock);
 
ddata->wdd.info = _wdt_info,
@@ -216,6 +233,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
.compatible = "mpc83xx_wdt",
.data = &(struct mpc8xxx_wdt_type) {
.prescaler = 0x1,
+   .rsr_mask = BIT(3), /* RSR Bit SWRS */
},
},
{
@@ -223,6 +241,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
.data = &(struct mpc8xxx_wdt_type) {
.prescaler = 0x1,
.hw_enabled = true,
+   .rsr_mask = BIT(20), /* RSTRSCR Bit WDT_RR */
},
},
{
@@ -230,6 +249,7 @@ static const struct of_device_id mpc8xxx_wdt_match[] = {
.data = &(struct mpc8xxx_wdt_type) {
.prescaler = 0x800,
.hw_enabled = true,
+   .rsr_mask = BIT(28), /* RSR Bit SWRS */
},
},
{},
-- 
2.13.3
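
The commit message above says the boot status can be retrieved via the WDIOC_GETBOOTSTATUS ioctl. A minimal user-space sketch of that query follows. WDIOC_GETBOOTSTATUS and WDIOF_CARDRESET come from <linux/watchdog.h>; the two function names are made up for this illustration, and the caller is assumed to have already opened the watchdog device node.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>

/*
 * Illustrative helper: ask the watchdog driver behind fd for its boot
 * status flags. Returns 0 on success, -1 on error (errno is set by
 * ioctl()).
 */
static int read_bootstatus(int fd, int *flags)
{
	assert(flags != NULL);
	return ioctl(fd, WDIOC_GETBOOTSTATUS, flags);
}

/* Nonzero if the flags report that the last reset came from the watchdog. */
static int reset_by_watchdog(int flags)
{
	return (flags & WDIOF_CARDRESET) != 0;
}
```

Note that opening the watchdog device node usually starts the watchdog, so a tool that only wants the boot status has to keep feeding the device or close it in whatever way the driver and the nowayout setting allow.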



[PATCH v2 1/3] watchdog: mpc8xxx: use dev_xxxx() instead of pr_xxxx()

2018-09-14 Thread Christophe Leroy
The mpc8xxx watchdog driver is a platform device driver, so it can
use dev_xxx() messaging rather than pr_xxx().

Signed-off-by: Christophe Leroy 
---
 drivers/watchdog/mpc8xxx_wdt.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/watchdog/mpc8xxx_wdt.c b/drivers/watchdog/mpc8xxx_wdt.c
index aca2d6323f8a..1dcf5f10cdd9 100644
--- a/drivers/watchdog/mpc8xxx_wdt.c
+++ b/drivers/watchdog/mpc8xxx_wdt.c
@@ -17,8 +17,6 @@
  * option) any later version.
  */
 
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
 #include 
 #include 
 #include 
@@ -137,26 +135,27 @@ static int mpc8xxx_wdt_probe(struct platform_device 
*ofdev)
struct mpc8xxx_wdt_ddata *ddata;
u32 freq = fsl_get_sys_freq();
bool enabled;
+   struct device *dev = >dev;
 
-   wdt_type = of_device_get_match_data(>dev);
+   wdt_type = of_device_get_match_data(dev);
if (!wdt_type)
return -EINVAL;
 
if (!freq || freq == -1)
return -EINVAL;
 
-   ddata = devm_kzalloc(>dev, sizeof(*ddata), GFP_KERNEL);
+   ddata = devm_kzalloc(dev, sizeof(*ddata), GFP_KERNEL);
if (!ddata)
return -ENOMEM;
 
res = platform_get_resource(ofdev, IORESOURCE_MEM, 0);
-   ddata->base = devm_ioremap_resource(>dev, res);
+   ddata->base = devm_ioremap_resource(dev, res);
if (IS_ERR(ddata->base))
return PTR_ERR(ddata->base);
 
enabled = in_be32(>base->swcrr) & SWCRR_SWEN;
if (!enabled && wdt_type->hw_enabled) {
-   pr_info("could not be enabled in software\n");
+   dev_info(dev, "could not be enabled in software\n");
return -ENODEV;
}
 
@@ -166,7 +165,7 @@ static int mpc8xxx_wdt_probe(struct platform_device *ofdev)
ddata->wdd.ops = _wdt_ops,
 
ddata->wdd.timeout = WATCHDOG_TIMEOUT;
-   watchdog_init_timeout(>wdd, timeout, >dev);
+   watchdog_init_timeout(>wdd, timeout, dev);
 
watchdog_set_nowayout(>wdd, nowayout);
 
@@ -189,12 +188,13 @@ static int mpc8xxx_wdt_probe(struct platform_device 
*ofdev)
 
ret = watchdog_register_device(>wdd);
if (ret) {
-   pr_err("cannot register watchdog device (err=%d)\n", ret);
+   dev_err(dev, "cannot register watchdog device (err=%d)\n", ret);
return ret;
}
 
-   pr_info("WDT driver for MPC8xxx initialized. mode:%s timeout=%d sec\n",
-   reset ? "reset" : "interrupt", ddata->wdd.timeout);
+   dev_info(dev,
+"WDT driver for MPC8xxx initialized. mode:%s timeout=%d sec\n",
+reset ? "reset" : "interrupt", ddata->wdd.timeout);
 
platform_set_drvdata(ofdev, ddata);
return 0;
@@ -204,8 +204,8 @@ static int mpc8xxx_wdt_remove(struct platform_device *ofdev)
 {
struct mpc8xxx_wdt_ddata *ddata = platform_get_drvdata(ofdev);
 
-   pr_crit("Watchdog removed, expect the %s soon!\n",
-   reset ? "reset" : "machine check exception");
+   dev_crit(>dev, "Watchdog removed, expect the %s soon!\n",
+reset ? "reset" : "machine check exception");
watchdog_unregister_device(>wdd);
 
return 0;
-- 
2.13.3



[PATCH] kdb: use correct pointer when 'btc' calls 'btt'

2018-09-14 Thread Christophe Leroy
On a powerpc 8xx, 'btc' fails as follows:

Entering kdb (current=0x(ptrval), pid 282) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0x0

When booting the kernel with 'debug_boot_weak_hash', it fails as well:

Entering kdb (current=0xba99ad80, pid 284) due to Keyboard Entry
kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0
kdb_getarea: Bad address 0xba99ad80

On other platforms, Oopses have been observed too, see
https://github.com/linuxppc/linux/issues/139

This is because 'btc' calls 'btt' with a pointer formatted with %p,
which prints a hashed value instead of the real address.

This patch replaces %p with %px so that 'btt' receives the real
pointer value it expects.

Signed-off-by: Christophe Leroy 
Cc:  # 4.15+
---
 kernel/debug/kdb/kdb_bt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
index 6ad4a9fcbd6f..7921ae4fca8d 100644
--- a/kernel/debug/kdb/kdb_bt.c
+++ b/kernel/debug/kdb/kdb_bt.c
@@ -179,14 +179,14 @@ kdb_bt(int argc, const char **argv)
kdb_printf("no process for cpu %ld\n", cpu);
return 0;
}
-   sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
+   sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
kdb_parse(buf);
return 0;
}
kdb_printf("btc: cpu status: ");
kdb_parse("cpu\n");
for_each_online_cpu(cpu) {
-   sprintf(buf, "btt 0x%p\n", KDB_TSK(cpu));
+   sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
kdb_parse(buf);
touch_nmi_watchdog();
}
-- 
2.13.3



[PATCH 30/30] docs/boot-time-mm: remove bootmem documentation

2018-09-14 Thread Mike Rapoport
Signed-off-by: Mike Rapoport 
---
 Documentation/core-api/boot-time-mm.rst | 71 +
 1 file changed, 10 insertions(+), 61 deletions(-)

diff --git a/Documentation/core-api/boot-time-mm.rst 
b/Documentation/core-api/boot-time-mm.rst
index 03cb164..e5ec9f1 100644
--- a/Documentation/core-api/boot-time-mm.rst
+++ b/Documentation/core-api/boot-time-mm.rst
@@ -5,54 +5,23 @@ Boot time memory management
 Early system initialization cannot use "normal" memory management
 simply because it is not set up yet. But there is still need to
 allocate memory for various data structures, for instance for the
-physical page allocator. To address this, a specialized allocator
-called the :ref:`Boot Memory Allocator `, or bootmem, was
-introduced. Several years later PowerPC developers added a "Logical
-Memory Blocks" allocator, which was later adopted by other
-architectures and renamed to :ref:`memblock `. There is also
-a compatibility layer called `nobootmem` that translates bootmem
-allocation interfaces to memblock calls.
+physical page allocator.
 
-The selection of the early allocator is done using
-``CONFIG_NO_BOOTMEM`` and ``CONFIG_HAVE_MEMBLOCK`` kernel
-configuration options. These options are enabled or disabled
-statically by the architectures' Kconfig files.
-
-* Architectures that rely only on bootmem select
-  ``CONFIG_NO_BOOTMEM=n && CONFIG_HAVE_MEMBLOCK=n``.
-* The users of memblock with the nobootmem compatibility layer set
-  ``CONFIG_NO_BOOTMEM=y && CONFIG_HAVE_MEMBLOCK=y``.
-* And for those that use both memblock and bootmem the configuration
-  includes ``CONFIG_NO_BOOTMEM=n && CONFIG_HAVE_MEMBLOCK=y``.
-
-Whichever allocator is used, it is the responsibility of the
-architecture specific initialization to set it up in
-:c:func:`setup_arch` and tear it down in :c:func:`mem_init` functions.
+A specialized allocator called ``memblock`` performs the
+boot time memory management. The architecture specific initialization
+must set it up in :c:func:`setup_arch` and tear it down in
+:c:func:`mem_init` functions.
 
 Once the early memory management is available it offers a variety of
 functions and macros for memory allocations. The allocation request
 may be directed to the first (and probably the only) node or to a
 particular node in a NUMA system. There are API variants that panic
-when an allocation fails and those that don't. And more recent and
-advanced memblock even allows controlling its own behaviour.
-
-.. _bootmem:
-
-Bootmem
-===
-
-(mostly stolen from Mel Gorman's "Understanding the Linux Virtual
-Memory Manager" `book`_)
-
-.. _book: https://www.kernel.org/doc/gorman/
-
-.. kernel-doc:: mm/bootmem.c
-   :doc: bootmem overview
+when an allocation fails and those that don't.
 
-.. _memblock:
+Memblock also offers a variety of APIs that control its own behaviour.
 
-Memblock
-
+Memblock Overview
+=
 
 .. kernel-doc:: mm/memblock.c
:doc: memblock overview
@@ -61,26 +30,6 @@ Memblock
 Functions and structures
 
 
-Common API
---
-
-The functions that are described in this section are available
-regardless of what early memory manager is enabled.
-
-.. kernel-doc:: mm/nobootmem.c
-
-Bootmem specific API
-
-
-These interfaces available only with bootmem, i.e when ``CONFIG_NO_BOOTMEM=n``
-
-.. kernel-doc:: include/linux/bootmem.h
-.. kernel-doc:: mm/bootmem.c
-   :nodocs:
-
-Memblock specific API
--
-
 Here is the description of memblock data structures, functions and
 macros. Some of them are actually internal, but since they are
 documented it would be silly to omit them. Besides, reading the
@@ -89,4 +38,4 @@ really happens under the hood.
 
 .. kernel-doc:: include/linux/memblock.h
 .. kernel-doc:: mm/memblock.c
-   :nodocs:
+   :functions:
-- 
2.7.4



[PATCH 29/30] mm: remove include/linux/bootmem.h

2018-09-14 Thread Mike Rapoport
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include 

@@
@@
- #include 
+ #include 

Signed-off-by: Mike Rapoport 
---
 arch/alpha/kernel/core_cia.c|   2 +-
 arch/alpha/kernel/core_irongate.c   |   1 -
 arch/alpha/kernel/core_marvel.c |   2 +-
 arch/alpha/kernel/core_titan.c  |   2 +-
 arch/alpha/kernel/core_tsunami.c|   2 +-
 arch/alpha/kernel/pci-noop.c|   2 +-
 arch/alpha/kernel/pci.c |   2 +-
 arch/alpha/kernel/pci_iommu.c   |   2 +-
 arch/alpha/kernel/setup.c   |   1 -
 arch/alpha/kernel/sys_nautilus.c|   2 +-
 arch/alpha/mm/init.c|   2 +-
 arch/alpha/mm/numa.c|   1 -
 arch/arc/kernel/unwind.c|   2 +-
 arch/arc/mm/highmem.c   |   2 +-
 arch/arc/mm/init.c  |   1 -
 arch/arm/kernel/devtree.c   |   1 -
 arch/arm/kernel/setup.c |   1 -
 arch/arm/mach-omap2/omap_hwmod.c|   2 +-
 arch/arm/mm/dma-mapping.c   |   1 -
 arch/arm/mm/init.c  |   1 -
 arch/arm/xen/mm.c   |   1 -
 arch/arm/xen/p2m.c  |   2 +-
 arch/arm64/kernel/acpi.c|   1 -
 arch/arm64/kernel/acpi_numa.c   |   1 -
 arch/arm64/kernel/setup.c   |   1 -
 arch/arm64/mm/dma-mapping.c |   2 +-
 arch/arm64/mm/init.c|   1 -
 arch/arm64/mm/kasan_init.c  |   1 -
 arch/arm64/mm/numa.c|   1 -
 arch/c6x/kernel/setup.c |   1 -
 arch/c6x/mm/init.c  |   2 +-
 arch/h8300/kernel/setup.c   |   1 -
 arch/h8300/mm/init.c|   2 +-
 arch/hexagon/kernel/dma.c   |   2 +-
 arch/hexagon/kernel/setup.c |   2 +-
 arch/hexagon/mm/init.c  |   1 -
 arch/ia64/kernel/crash.c|   2 +-
 arch/ia64/kernel/efi.c  |   2 +-
 arch/ia64/kernel/ia64_ksyms.c   |   2 +-
 arch/ia64/kernel/iosapic.c  |   2 +-
 arch/ia64/kernel/mca.c  |   2 +-
 arch/ia64/kernel/mca_drv.c  |   2 +-
 arch/ia64/kernel/setup.c|   1 -
 arch/ia64/kernel/smpboot.c  |   2 +-
 arch/ia64/kernel/topology.c |   2 +-
 arch/ia64/kernel/unwind.c   |   2 +-
 arch/ia64/mm/contig.c   |   1 -
 arch/ia64/mm/discontig.c|   1 -
 arch/ia64/mm/init.c |   1 -
 arch/ia64/mm/numa.c |   2 +-
 arch/ia64/mm/tlb.c  |   2 +-
 arch/ia64/pci/pci.c |   2 +-
 arch/ia64/sn/kernel/bte.c   |   2 +-
 arch/ia64/sn/kernel/io_common.c |   2 +-
 arch/ia64/sn/kernel/setup.c |   2 +-
 arch/m68k/atari/stram.c |   2 +-
 arch/m68k/coldfire/m54xx.c  |   2 +-
 arch/m68k/kernel/setup_mm.c |   1 -
 arch/m68k/kernel/setup_no.c |   1 -
 arch/m68k/kernel/uboot.c|   2 +-
 arch/m68k/mm/init.c |   2 +-
 arch/m68k/mm/mcfmmu.c   |   1 -
 arch/m68k/mm/motorola.c |   1 -
 arch/m68k/mm/sun3mmu.c  |   2 +-
 arch/m68k/sun3/config.c |   2 +-
 arch/m68k/sun3/dvma.c   |   2 +-
 arch/m68k/sun3/mmu_emu.c|   2 +-
 arch/m68k/sun3/sun3dvma.c   |   2 +-
 arch/m68k/sun3x/dvma.c  |   2 +-
 arch/microblaze/mm/consistent.c |   2 +-
 arch/microblaze/mm/init.c   |   3 +-
 arch/microblaze/pci/pci-common.c|   2 +-
 arch/mips/ar7/memory.c  |   2 +-
 arch/mips/ath79/setup.c |   2 +-
 arch/mips/bcm63xx/prom.c|   2 +-
 arch/mips/bcm63xx/setup.c   |   2 +-
 arch/mips/bmips/setup.c |   2 +-
 arch/mips/cavium-octeon/dma-octeon.c|   2 +-
 arch/mips/dec/prom/memory.c |   2 +-
 arch/mips/emma/common/prom.c|   2 +-
 arch/mips/fw/arc/memory.c   |   2 +-
 arch/mips/jazz/jazzdma.c|   2 +-
 arch/mips/kernel/crash.c|   2 +-
 arch/mips/kernel/crash_dump.c   |   2 +-
 arch/mips/kernel/prom.c |   2 +-
 arch/mips/kernel/setup.c|   1 -
 arch/mips/kernel/traps.c|   1 -
 

[PATCH 28/30] memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variants

2018-09-14 Thread Mike Rapoport
Drop BOOTMEM_ALLOC_ACCESSIBLE and BOOTMEM_ALLOC_ANYWHERE in favor of
identical MEMBLOCK definitions.

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/ia64/mm/discontig.c   | 2 +-
 arch/powerpc/kernel/setup_64.c | 2 +-
 arch/sparc/kernel/smp_64.c | 2 +-
 arch/x86/kernel/setup_percpu.c | 2 +-
 arch/x86/mm/kasan_init_64.c| 4 ++--
 mm/hugetlb.c   | 3 ++-
 mm/kasan/kasan_init.c  | 2 +-
 mm/memblock.c  | 8 
 mm/page_ext.c  | 2 +-
 mm/sparse-vmemmap.c| 3 ++-
 mm/sparse.c| 5 +++--
 11 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 918dda9..70609f8 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -453,7 +453,7 @@ static void __init *memory_less_node_alloc(int nid, 
unsigned long pernodesize)
 
ptr = memblock_alloc_try_nid(pernodesize, PERCPU_PAGE_SIZE,
 __pa(MAX_DMA_ADDRESS),
-BOOTMEM_ALLOC_ACCESSIBLE,
+MEMBLOCK_ALLOC_ACCESSIBLE,
 bestnode);
 
return ptr;
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index e564b27..b3e70cc 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -758,7 +758,7 @@ void __init emergency_stack_init(void)
 static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
 {
return memblock_alloc_try_nid(size, align, __pa(MAX_DMA_ADDRESS),
- BOOTMEM_ALLOC_ACCESSIBLE,
+ MEMBLOCK_ALLOC_ACCESSIBLE,
  early_cpu_to_node(cpu));
 
 }
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index a087a6a..6cc80d0 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1595,7 +1595,7 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, 
size_t size,
 cpu, size, __pa(ptr));
} else {
ptr = memblock_alloc_try_nid(size, align, goal,
-BOOTMEM_ALLOC_ACCESSIBLE, node);
+MEMBLOCK_ALLOC_ACCESSIBLE, node);
pr_debug("per cpu data for cpu%d %lu bytes on node%d at "
 "%016lx\n", cpu, size, node, __pa(ptr));
}
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index a006f1b..483412f 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -114,7 +114,7 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, 
unsigned long size,
 cpu, size, __pa(ptr));
} else {
ptr = memblock_alloc_try_nid_nopanic(size, align, goal,
-BOOTMEM_ALLOC_ACCESSIBLE,
+MEMBLOCK_ALLOC_ACCESSIBLE,
 node);
 
pr_debug("per cpu data for cpu%d %lu bytes on node%d at 
%016lx\n",
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 77b857c..8f87499 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -29,10 +29,10 @@ static __init void *early_alloc(size_t size, int nid, bool 
panic)
 {
if (panic)
return memblock_alloc_try_nid(size, size,
-   __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid);
+   __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, nid);
else
return memblock_alloc_try_nid_nopanic(size, size,
-   __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid);
+   __pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, nid);
 }
 
 static void __init kasan_populate_pmd(pmd_t *pmd, unsigned long addr,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3b63370..67629dc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2102,7 +2103,7 @@ int __alloc_bootmem_huge_page(struct hstate *h)
 
addr = memblock_alloc_try_nid_raw(
huge_page_size(h), huge_page_size(h),
-   0, BOOTMEM_ALLOC_ACCESSIBLE, node);
+   0, MEMBLOCK_ALLOC_ACCESSIBLE, node);
if (addr) {
/*
 * Use the beginning of the huge page to store the
diff --git a/mm/kasan/kasan_init.c b/mm/kasan/kasan_init.c
index 24d734b..785a970 100644
--- a/mm/kasan/kasan_init.c
+++ b/mm/kasan/kasan_init.c
@@ -84,7 +84,7 @@ static inline bool kasan_zero_page_entry(pte_t pte)
 static __init void *early_alloc(size_t size, int node)
 {

[PATCH 27/30] mm: remove nobootmem

2018-09-14 Thread Mike Rapoport
Move a few remaining functions from nobootmem.c to memblock.c and remove
nobootmem

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 mm/Makefile|   1 -
 mm/memblock.c  | 104 ++
 mm/nobootmem.c | 128 -
 3 files changed, 104 insertions(+), 129 deletions(-)
 delete mode 100644 mm/nobootmem.c

diff --git a/mm/Makefile b/mm/Makefile
index ca3c844..d210cc9 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -42,7 +42,6 @@ obj-y := filemap.o mempool.o oom_kill.o 
fadvise.o \
   debug.o $(mmu-y)
 
 obj-y += init-mm.o
-obj-y += nobootmem.o
 obj-y += memblock.o
 
 ifdef CONFIG_MMU
diff --git a/mm/memblock.c b/mm/memblock.c
index a2cd61d..4591f38 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -82,6 +82,16 @@
  * initialization compltes.
  */
 
+#ifndef CONFIG_NEED_MULTIPLE_NODES
+struct pglist_data __refdata contig_page_data;
+EXPORT_SYMBOL(contig_page_data);
+#endif
+
+unsigned long max_low_pfn;
+unsigned long min_low_pfn;
+unsigned long max_pfn;
+unsigned long long max_possible_pfn;
+
 static struct memblock_region 
memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
 static struct memblock_region 
memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
 #ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
@@ -1929,6 +1939,100 @@ static int __init early_memblock(char *p)
 }
 early_param("memblock", early_memblock);
 
+static void __init __free_pages_memory(unsigned long start, unsigned long end)
+{
+   int order;
+
+   while (start < end) {
+   order = min(MAX_ORDER - 1UL, __ffs(start));
+
+   while (start + (1UL << order) > end)
+   order--;
+
+   memblock_free_pages(pfn_to_page(start), start, order);
+
+   start += (1UL << order);
+   }
+}
+
+static unsigned long __init __free_memory_core(phys_addr_t start,
+phys_addr_t end)
+{
+   unsigned long start_pfn = PFN_UP(start);
+   unsigned long end_pfn = min_t(unsigned long,
+ PFN_DOWN(end), max_low_pfn);
+
+   if (start_pfn >= end_pfn)
+   return 0;
+
+   __free_pages_memory(start_pfn, end_pfn);
+
+   return end_pfn - start_pfn;
+}
+
+static unsigned long __init free_low_memory_core_early(void)
+{
+   unsigned long count = 0;
+   phys_addr_t start, end;
+   u64 i;
+
+   memblock_clear_hotplug(0, -1);
+
+   for_each_reserved_mem_region(i, , )
+   reserve_bootmem_region(start, end);
+
+   /*
+* We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
+*  because in some case like Node0 doesn't have RAM installed
+*  low ram will be on Node1
+*/
+   for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, , ,
+   NULL)
+   count += __free_memory_core(start, end);
+
+   return count;
+}
+
+static int reset_managed_pages_done __initdata;
+
+void reset_node_managed_pages(pg_data_t *pgdat)
+{
+   struct zone *z;
+
+   for (z = pgdat->node_zones; z < pgdat->node_zones + MAX_NR_ZONES; z++)
+   z->managed_pages = 0;
+}
+
+void __init reset_all_zones_managed_pages(void)
+{
+   struct pglist_data *pgdat;
+
+   if (reset_managed_pages_done)
+   return;
+
+   for_each_online_pgdat(pgdat)
+   reset_node_managed_pages(pgdat);
+
+   reset_managed_pages_done = 1;
+}
+
+/**
+ * memblock_free_all - release free pages to the buddy allocator
+ *
+ * Return: the number of pages actually released.
+ */
+unsigned long __init memblock_free_all(void)
+{
+   unsigned long pages;
+
+   reset_all_zones_managed_pages();
+
+   pages = free_low_memory_core_early();
+   totalram_pages += pages;
+
+   return pages;
+}
+
 #if defined(CONFIG_DEBUG_FS) && !defined(CONFIG_ARCH_DISCARD_MEMBLOCK)
 
 static int memblock_debug_show(struct seq_file *m, void *private)
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
deleted file mode 100644
index 9608bc5..000
--- a/mm/nobootmem.c
+++ /dev/null
@@ -1,128 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- *  bootmem - A boot-time physical memory allocator and configurator
- *
- *  Copyright (C) 1999 Ingo Molnar
- *1999 Kanoj Sarcar, SGI
- *2008 Johannes Weiner
- *
- * Access to this subsystem has to be serialized externally (which is true
- * for the boot process anyway).
- */
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
-#include "internal.h"
-
-#ifndef CONFIG_NEED_MULTIPLE_NODES
-struct pglist_data __refdata contig_page_data;
-EXPORT_SYMBOL(contig_page_data);
-#endif
-
-unsigned long max_low_pfn;
-unsigned long min_low_pfn;
-unsigned long max_pfn;
-unsigned long long max_possible_pfn;
-
-static void __init 
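
The __free_pages_memory() loop moved by this patch splits a page frame range into the largest naturally aligned power-of-two blocks before handing them to the page allocator. The user-space sketch below shows only that splitting logic, under stated assumptions: SKETCH_MAX_ORDER is an assumed constant standing in for the kernel's MAX_ORDER, __builtin_ctzl() (a GCC/Clang builtin) stands in for __ffs(), and the explicit check for a zero start is an addition of the sketch, since counting trailing zeros of 0 is undefined.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration; the kernel's MAX_ORDER is configurable. */
#define SKETCH_MAX_ORDER 11UL

/*
 * Order of the largest naturally aligned block that starts at page frame
 * `start` and still fits before `end` (exclusive).
 */
static unsigned long block_order(unsigned long start, unsigned long end)
{
	unsigned long order = SKETCH_MAX_ORDER - 1;

	assert(start < end);

	/* Alignment limit: position of the lowest set bit of start. */
	if (start != 0) {
		unsigned long align = (unsigned long)__builtin_ctzl(start);

		if (align < order)
			order = align;
	}

	/* Size limit: shrink until the block no longer overshoots end. */
	while (start + (1UL << order) > end)
		order--;

	return order;
}

/*
 * Walk [start, end) block by block. Stores the order of each block in
 * orders[] (at most cap entries) and returns the number of blocks.
 */
static size_t split_range(unsigned long start, unsigned long end,
			  unsigned long *orders, size_t cap)
{
	size_t n = 0;

	while (start < end) {
		unsigned long order = block_order(start, end);

		if (n < cap)
			orders[n] = order;
		n++;
		start += 1UL << order;
	}

	return n;
}
```

For example, the range [5, 20) is split into blocks of order 0, 1, 3 and 2 (1 + 2 + 8 + 4 = 15 pages), because each block must be aligned to its own size and must not run past the end of the range.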

[PATCH 26/30] memblock: rename __free_pages_bootmem to memblock_free_pages

2018-09-14 Thread Mike Rapoport
The conversion is done using

sed -i 's@__free_pages_bootmem@memblock_free_pages@' \
$(git grep -l __free_pages_bootmem)

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 mm/internal.h   | 2 +-
 mm/memblock.c   | 2 +-
 mm/nobootmem.c  | 2 +-
 mm/page_alloc.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 87256ae..291eb2b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -161,7 +161,7 @@ static inline struct page *pageblock_pfn_to_page(unsigned 
long start_pfn,
 }
 
 extern int __isolate_free_page(struct page *page, unsigned int order);
-extern void __free_pages_bootmem(struct page *page, unsigned long pfn,
+extern void memblock_free_pages(struct page *page, unsigned long pfn,
unsigned int order);
 extern void prep_compound_page(struct page *page, unsigned int order);
 extern void post_alloc_hook(struct page *page, unsigned int order,
diff --git a/mm/memblock.c b/mm/memblock.c
index 1534edb..a2cd61d 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1615,7 +1615,7 @@ void __init __memblock_free_late(phys_addr_t base, 
phys_addr_t size)
end = PFN_DOWN(base + size);
 
for (; cursor < end; cursor++) {
-   __free_pages_bootmem(pfn_to_page(cursor), cursor, 0);
+   memblock_free_pages(pfn_to_page(cursor), cursor, 0);
totalram_pages++;
}
 }
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index bb64b09..9608bc5 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -43,7 +43,7 @@ static void __init __free_pages_memory(unsigned long start, 
unsigned long end)
while (start + (1UL << order) > end)
order--;
 
-   __free_pages_bootmem(pfn_to_page(start), start, order);
+   memblock_free_pages(pfn_to_page(start), start, order);
 
start += (1UL << order);
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 13e394c..f4a8bc8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1334,7 +1334,7 @@ meminit_pfn_in_nid(unsigned long pfn, int node,
 #endif
 
 
-void __init __free_pages_bootmem(struct page *page, unsigned long pfn,
+void __init memblock_free_pages(struct page *page, unsigned long pfn,
unsigned int order)
 {
if (early_page_uninitialised(pfn))
-- 
2.7.4



[PATCH 25/30] memblock: rename free_all_bootmem to memblock_free_all

2018-09-14 Thread Mike Rapoport
The conversion is done using

sed -i 's@free_all_bootmem@memblock_free_all@' \
$(git grep -l free_all_bootmem)

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/alpha/mm/init.c   | 2 +-
 arch/arc/mm/init.c | 2 +-
 arch/arm/mm/init.c | 2 +-
 arch/arm64/mm/init.c   | 2 +-
 arch/c6x/mm/init.c | 2 +-
 arch/h8300/mm/init.c   | 2 +-
 arch/hexagon/mm/init.c | 2 +-
 arch/ia64/mm/init.c| 2 +-
 arch/m68k/mm/init.c| 2 +-
 arch/microblaze/mm/init.c  | 2 +-
 arch/mips/loongson64/loongson-3/numa.c | 2 +-
 arch/mips/mm/init.c| 2 +-
 arch/mips/sgi-ip27/ip27-memory.c   | 2 +-
 arch/nds32/mm/init.c   | 2 +-
 arch/nios2/mm/init.c   | 2 +-
 arch/openrisc/mm/init.c| 2 +-
 arch/parisc/mm/init.c  | 2 +-
 arch/powerpc/mm/mem.c  | 2 +-
 arch/riscv/mm/init.c   | 2 +-
 arch/s390/mm/init.c| 2 +-
 arch/sh/mm/init.c  | 2 +-
 arch/sparc/mm/init_32.c| 2 +-
 arch/sparc/mm/init_64.c| 4 ++--
 arch/um/kernel/mem.c   | 2 +-
 arch/unicore32/mm/init.c   | 2 +-
 arch/x86/mm/highmem_32.c   | 2 +-
 arch/x86/mm/init_32.c  | 4 ++--
 arch/x86/mm/init_64.c  | 4 ++--
 arch/x86/xen/mmu_pv.c  | 2 +-
 arch/xtensa/mm/init.c  | 2 +-
 include/linux/bootmem.h| 2 +-
 mm/memblock.c  | 2 +-
 mm/nobootmem.c | 4 ++--
 mm/page_alloc.c| 2 +-
 mm/page_poison.c   | 2 +-
 35 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/arch/alpha/mm/init.c b/arch/alpha/mm/init.c
index 9d74520..853d153 100644
--- a/arch/alpha/mm/init.c
+++ b/arch/alpha/mm/init.c
@@ -282,7 +282,7 @@ mem_init(void)
 {
set_max_mapnr(max_low_pfn);
high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
-   free_all_bootmem();
+   memblock_free_all();
mem_init_print_info(NULL);
 }
 
diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c
index ba14506..0f29c65 100644
--- a/arch/arc/mm/init.c
+++ b/arch/arc/mm/init.c
@@ -218,7 +218,7 @@ void __init mem_init(void)
free_highmem_page(pfn_to_page(tmp));
 #endif
 
-   free_all_bootmem();
+   memblock_free_all();
mem_init_print_info(NULL);
 }
 
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 0cc8e04..d421a10 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -508,7 +508,7 @@ void __init mem_init(void)
 
/* this will put all unused low memory onto the freelists */
free_unused_memmap();
-   free_all_bootmem();
+   memblock_free_all();
 
 #ifdef CONFIG_SA
/* now that our DMA memory is actually so designated, we can free it */
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index e335452..ae21849 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -601,7 +601,7 @@ void __init mem_init(void)
free_unused_memmap();
 #endif
/* this will put all unused low memory onto the freelists */
-   free_all_bootmem();
+   memblock_free_all();
 
kexec_reserve_crashkres_pages();
 
diff --git a/arch/c6x/mm/init.c b/arch/c6x/mm/init.c
index dc369ad..3383df8 100644
--- a/arch/c6x/mm/init.c
+++ b/arch/c6x/mm/init.c
@@ -62,7 +62,7 @@ void __init mem_init(void)
high_memory = (void *)(memory_end & PAGE_MASK);
 
/* this will put all memory onto the freelists */
-   free_all_bootmem();
+   memblock_free_all();
 
mem_init_print_info(NULL);
 }
diff --git a/arch/h8300/mm/init.c b/arch/h8300/mm/init.c
index 5d31ac9..f2bf448 100644
--- a/arch/h8300/mm/init.c
+++ b/arch/h8300/mm/init.c
@@ -96,7 +96,7 @@ void __init mem_init(void)
max_mapnr = MAP_NR(high_memory);
 
/* this will put all low memory onto the freelists */
-   free_all_bootmem();
+   memblock_free_all();
 
mem_init_print_info(NULL);
 }
diff --git a/arch/hexagon/mm/init.c b/arch/hexagon/mm/init.c
index d789b9c..88643fa 100644
--- a/arch/hexagon/mm/init.c
+++ b/arch/hexagon/mm/init.c
@@ -68,7 +68,7 @@ unsigned long long kmap_generation;
 void __init mem_init(void)
 {
/*  No idea where this is actually declared.  Seems to evade LXR.  */
-   free_all_bootmem();
+   memblock_free_all();
mem_init_print_info(NULL);
 
/*
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 2169ca5..43ea4a4 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -627,7 +627,7 @@ mem_init (void)
 
set_max_mapnr(max_low_pfn);
high_memory = __va(max_low_pfn * PAGE_SIZE);
-   free_all_bootmem();
+   memblock_free_all();
mem_init_print_info(NULL);
 
/*

[PATCH 24/30] memblock: replace free_bootmem_late with memblock_free_late

2018-09-14 Thread Mike Rapoport
The free_bootmem_late and memblock_free_late do exactly the same thing:
they iterate over a range and give pages to the page allocator.

Replace calls to free_bootmem_late with calls to memblock_free_late and
remove the bootmem variant.

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/sparc/kernel/mdesc.c   |  3 ++-
 arch/x86/platform/efi/quirks.c  |  6 +++---
 drivers/firmware/efi/apple-properties.c |  2 +-
 include/linux/bootmem.h |  2 --
 mm/nobootmem.c  | 24 
 5 files changed, 6 insertions(+), 31 deletions(-)

diff --git a/arch/sparc/kernel/mdesc.c b/arch/sparc/kernel/mdesc.c
index 59131e7..a41526b 100644
--- a/arch/sparc/kernel/mdesc.c
+++ b/arch/sparc/kernel/mdesc.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -190,7 +191,7 @@ static void __init mdesc_memblock_free(struct mdesc_handle 
*hp)
 
alloc_size = PAGE_ALIGN(hp->handle_size);
start = __pa(hp);
-   free_bootmem_late(start, alloc_size);
+   memblock_free_late(start, alloc_size);
 }
 
 static struct mdesc_mem_ops memblock_mdesc_ops = {
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31c..7b4854c 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -332,7 +332,7 @@ void __init efi_reserve_boot_services(void)
 
/*
 * Because the following memblock_reserve() is paired
-* with free_bootmem_late() for this region in
+* with memblock_free_late() for this region in
 * efi_free_boot_services(), we must be extremely
 * careful not to reserve, and subsequently free,
 * critical regions of memory (like the kernel image) or
@@ -363,7 +363,7 @@ void __init efi_reserve_boot_services(void)
 * doesn't make sense as far as the firmware is
 * concerned, but it does provide us with a way to tag
 * those regions that must not be paired with
-* free_bootmem_late().
+* memblock_free_late().
 */
md->attribute |= EFI_MEMORY_RUNTIME;
}
@@ -413,7 +413,7 @@ void __init efi_free_boot_services(void)
size -= rm_size;
}
 
-   free_bootmem_late(start, size);
+   memblock_free_late(start, size);
}
 
if (!num_entries)
diff --git a/drivers/firmware/efi/apple-properties.c b/drivers/firmware/efi/apple-properties.c
index 60a9571..2b675f7 100644
--- a/drivers/firmware/efi/apple-properties.c
+++ b/drivers/firmware/efi/apple-properties.c
@@ -235,7 +235,7 @@ static int __init map_properties(void)
 */
data->len = 0;
memunmap(data);
-   free_bootmem_late(pa_data + sizeof(*data), data_len);
+   memblock_free_late(pa_data + sizeof(*data), data_len);
 
return ret;
}
diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 706cf8e..bcc7e2f 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -30,8 +30,6 @@ extern unsigned long free_all_bootmem(void);
 extern void reset_node_managed_pages(pg_data_t *pgdat);
 extern void reset_all_zones_managed_pages(void);
 
-extern void free_bootmem_late(unsigned long physaddr, unsigned long size);
-
 /* We are using top down, so it is safe to use 0 here */
 #define BOOTMEM_LOW_LIMIT 0
 
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 85e1822..ee0f7fc 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -33,30 +33,6 @@ unsigned long min_low_pfn;
 unsigned long max_pfn;
 unsigned long long max_possible_pfn;
 
-/**
- * free_bootmem_late - free bootmem pages directly to page allocator
- * @addr: starting address of the range
- * @size: size of the range in bytes
- *
- * This is only useful when the bootmem allocator has already been torn
- * down, but we are still initializing the system.  Pages are given directly
- * to the page allocator, no bootmem metadata is updated because it is gone.
- */
-void __init free_bootmem_late(unsigned long addr, unsigned long size)
-{
-   unsigned long cursor, end;
-
-   kmemleak_free_part_phys(addr, size);
-
-   cursor = PFN_UP(addr);
-   end = PFN_DOWN(addr + size);
-
-   for (; cursor < end; cursor++) {
-   __free_pages_bootmem(pfn_to_page(cursor), cursor, 0);
-   totalram_pages++;
-   }
-}
-
 static void __init __free_pages_memory(unsigned long start, unsigned long end)
 {
int order;
-- 
2.7.4



[PATCH 23/30] memblock: replace free_bootmem{_node} with memblock_free

2018-09-14 Thread Mike Rapoport
free_bootmem() and free_bootmem_node() are merely wrappers for
memblock_free(). Replace their usage with a call to memblock_free() using the
following semantic patch:

@@
expression e1, e2, e3;
@@
(
- free_bootmem(e1, e2)
+ memblock_free(e1, e2)
|
- free_bootmem_node(e1, e2, e3)
+ memblock_free(e2, e3)
)
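
To see why the node argument can simply be dropped, here is a small userspace
sketch. All mock_* names are hypothetical stand-ins, not kernel code: the
old-style wrappers forward only the address and size, so both call paths end
up making the same request.

```c
#include <stddef.h>

/* Records the most recent request so the two call paths can be compared. */
struct free_record {
	unsigned long base;
	unsigned long size;
	int calls;
};

struct free_record last_free;

/* Stand-in for memblock_free(); it only records its arguments. */
void mock_memblock_free(unsigned long base, unsigned long size)
{
	last_free.base = base;
	last_free.size = size;
	last_free.calls++;
}

/* Old-style wrapper without a node argument. */
void mock_free_bootmem(unsigned long addr, unsigned long size)
{
	mock_memblock_free(addr, size);
}

/* Old-style wrapper with a node argument that is accepted but never used. */
void mock_free_bootmem_node(void *pgdat, unsigned long addr, unsigned long size)
{
	(void)pgdat;
	mock_memblock_free(addr, size);
}
```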

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/alpha/kernel/core_irongate.c |  3 +--
 arch/arm64/mm/init.c  |  2 +-
 arch/mips/kernel/setup.c  |  2 +-
 arch/powerpc/kernel/setup_64.c|  2 +-
 arch/sparc/kernel/smp_64.c|  2 +-
 arch/um/kernel/mem.c  |  3 ++-
 arch/unicore32/mm/init.c  |  2 +-
 arch/x86/kernel/setup_percpu.c|  3 ++-
 arch/x86/kernel/tce_64.c  |  3 ++-
 arch/x86/xen/p2m.c|  3 ++-
 drivers/macintosh/smu.c   |  2 +-
 drivers/usb/early/xhci-dbc.c  | 11 ++-
 drivers/xen/swiotlb-xen.c |  4 +++-
 include/linux/bootmem.h   |  4 
 mm/nobootmem.c| 30 --
 15 files changed, 24 insertions(+), 52 deletions(-)

diff --git a/arch/alpha/kernel/core_irongate.c b/arch/alpha/kernel/core_irongate.c
index f709866..35572be 100644
--- a/arch/alpha/kernel/core_irongate.c
+++ b/arch/alpha/kernel/core_irongate.c
@@ -234,8 +234,7 @@ albacore_init_arch(void)
unsigned long size;
 
size = initrd_end - initrd_start;
-   free_bootmem_node(NODE_DATA(0), __pa(initrd_start),
- PAGE_ALIGN(size));
+   memblock_free(__pa(initrd_start), PAGE_ALIGN(size));
if (!move_initrd(pci_mem))
printk("irongate_init_arch: initrd too big "
   "(%ldK)\ndisabling initrd\n",
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 787e279..e335452 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -538,7 +538,7 @@ static inline void free_memmap(unsigned long start_pfn, unsigned long end_pfn)
 * memmap array.
 */
if (pg < pgend)
-   free_bootmem(pg, pgend - pg);
+   memblock_free(pg, pgend - pg);
 }
 
 /*
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index a6bc2f6..86c9eda 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -561,7 +561,7 @@ static void __init bootmem_init(void)
extern void show_kernel_relocation(const char *level);
 
offset = __pa_symbol(_text) - __pa_symbol(VMLINUX_LOAD_ADDRESS);
-   free_bootmem(__pa_symbol(VMLINUX_LOAD_ADDRESS), offset);
+   memblock_free(__pa_symbol(VMLINUX_LOAD_ADDRESS), offset);
 
 #if defined(CONFIG_DEBUG_KERNEL) && defined(CONFIG_DEBUG_INFO)
/*
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6add560..e564b27 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -765,7 +765,7 @@ static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
 
 static void __init pcpu_fc_free(void *ptr, size_t size)
 {
-   free_bootmem(__pa(ptr), size);
+   memblock_free(__pa(ptr), size);
 }
 
 static int pcpu_cpu_distance(unsigned int from, unsigned int to)
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index 337febd..a087a6a 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1607,7 +1607,7 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, size_t size,
 
 static void __init pcpu_free_bootmem(void *ptr, size_t size)
 {
-   free_bootmem(__pa(ptr), size);
+   memblock_free(__pa(ptr), size);
 }
 
 static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 185f6bb..3555c13 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -46,7 +47,7 @@ void __init mem_init(void)
 */
brk_end = (unsigned long) UML_ROUND_UP(sbrk(0));
map_memory(brk_end, __pa(brk_end), uml_reserved - brk_end, 1, 1, 0);
-   free_bootmem(__pa(brk_end), uml_reserved - brk_end);
+   memblock_free(__pa(brk_end), uml_reserved - brk_end);
uml_reserved = brk_end;
 
/* this will put all low memory onto the freelists */
diff --git a/arch/unicore32/mm/init.c b/arch/unicore32/mm/init.c
index 44ccc15..4c572ab 100644
--- a/arch/unicore32/mm/init.c
+++ b/arch/unicore32/mm/init.c
@@ -241,7 +241,7 @@ free_memmap(unsigned long start_pfn, unsigned long end_pfn)
 * free the section of the memmap array.
 */
if (pg < pgend)
-   free_bootmem(pg, pgend - pg);
+   memblock_free(pg, pgend - pg);
 }
 
 /*
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 

[PATCH 22/30] mm: nobootmem: remove bootmem allocation APIs

2018-09-14 Thread Mike Rapoport
The bootmem compatibility APIs are not used and can be removed.

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 include/linux/bootmem.h |  47 --
 mm/nobootmem.c  | 224 
 2 files changed, 271 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index c97c105..73f1272 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -36,33 +36,6 @@ extern void free_bootmem_node(pg_data_t *pgdat,
 extern void free_bootmem(unsigned long physaddr, unsigned long size);
 extern void free_bootmem_late(unsigned long physaddr, unsigned long size);
 
-extern void *__alloc_bootmem(unsigned long size,
-unsigned long align,
-unsigned long goal);
-extern void *__alloc_bootmem_nopanic(unsigned long size,
-unsigned long align,
-unsigned long goal) __malloc;
-extern void *__alloc_bootmem_node(pg_data_t *pgdat,
- unsigned long size,
- unsigned long align,
- unsigned long goal) __malloc;
-void *__alloc_bootmem_node_high(pg_data_t *pgdat,
- unsigned long size,
- unsigned long align,
- unsigned long goal) __malloc;
-extern void *__alloc_bootmem_node_nopanic(pg_data_t *pgdat,
- unsigned long size,
- unsigned long align,
- unsigned long goal) __malloc;
-void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
- unsigned long size,
- unsigned long align,
- unsigned long goal,
- unsigned long limit) __malloc;
-extern void *__alloc_bootmem_low(unsigned long size,
-unsigned long align,
-unsigned long goal) __malloc;
-
 /* We are using top down, so it is safe to use 0 here */
 #define BOOTMEM_LOW_LIMIT 0
 
@@ -70,26 +43,6 @@ extern void *__alloc_bootmem_low(unsigned long size,
 #define ARCH_LOW_ADDRESS_LIMIT  0xffffffffUL
 #endif
 
-#define alloc_bootmem(x) \
-   __alloc_bootmem(x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT)
-#define alloc_bootmem_align(x, align) \
-   __alloc_bootmem(x, align, BOOTMEM_LOW_LIMIT)
-#define alloc_bootmem_pages(x) \
-   __alloc_bootmem(x, PAGE_SIZE, BOOTMEM_LOW_LIMIT)
-#define alloc_bootmem_pages_nopanic(x) \
-   __alloc_bootmem_nopanic(x, PAGE_SIZE, BOOTMEM_LOW_LIMIT)
-#define alloc_bootmem_node(pgdat, x) \
-   __alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT)
-#define alloc_bootmem_node_nopanic(pgdat, x) \
-   __alloc_bootmem_node_nopanic(pgdat, x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT)
-#define alloc_bootmem_pages_node(pgdat, x) \
-   __alloc_bootmem_node(pgdat, x, PAGE_SIZE, BOOTMEM_LOW_LIMIT)
-
-#define alloc_bootmem_low(x) \
-   __alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
-#define alloc_bootmem_low_pages(x) \
-   __alloc_bootmem_low(x, PAGE_SIZE, 0)
-
 /* FIXME: use MEMBLOCK_ALLOC_* variants here */
 #define BOOTMEM_ALLOC_ACCESSIBLE   0
 #define BOOTMEM_ALLOC_ANYWHERE (~(phys_addr_t)0)
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 44ce7de..bc38e56 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -33,41 +33,6 @@ unsigned long min_low_pfn;
 unsigned long max_pfn;
 unsigned long long max_possible_pfn;
 
-static void * __init __alloc_memory_core_early(int nid, u64 size, u64 align,
-   u64 goal, u64 limit)
-{
-   void *ptr;
-   u64 addr;
-   enum memblock_flags flags = choose_memblock_flags();
-
-   if (limit > memblock.current_limit)
-   limit = memblock.current_limit;
-
-again:
-   addr = memblock_find_in_range_node(size, align, goal, limit, nid,
-  flags);
-   if (!addr && (flags & MEMBLOCK_MIRROR)) {
-   flags &= ~MEMBLOCK_MIRROR;
-   pr_warn("Could not allocate %pap bytes of mirrored memory\n",
-   &size);
-   goto again;
-   }
-   if (!addr)
-   return NULL;
-
-   if (memblock_reserve(addr, size))
-   return NULL;
-
-   ptr = phys_to_virt(addr);
-   memset(ptr, 0, size);
-   /*
-* The min_count is set to 0 so that bootmem allocated blocks
-* are never reported as leaks.
-*/
-   kmemleak_alloc(ptr, size, 0, 0);
-   return ptr;
-}
-
 /**
  * free_bootmem_late - free bootmem pages directly to page allocator
  * @addr: starting address of the range
@@ -215,192 +180,3 @@ void __init free_bootmem(unsigned long addr, unsigned long size)
 {
memblock_free(addr, size);
 }
-
-static void * __init 

[PATCH 21/30] memblock: replace alloc_bootmem with memblock_alloc

2018-09-14 Thread Mike Rapoport
alloc_bootmem(size) is a shortcut for allocating SMP_CACHE_BYTES-aligned
memory. When the align parameter of memblock_alloc() is 0, the alignment is
implicitly set to SMP_CACHE_BYTES, so alloc_bootmem(size) and
memblock_alloc(size, 0) are equivalent.

The conversion is done using the following semantic patch:

@@
expression size;
@@
- alloc_bootmem(size)
+ memblock_alloc(size, 0)
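
The equivalence can be sketched in userspace with a toy bump allocator. This
is not the memblock implementation: the toy_* names, the cache line size of 64
and the bottom-up cursor are all assumptions made for the example. The only
point it shows is that an align of 0 and an explicit SMP_CACHE_BYTES give the
same placement.

```c
#include <stdint.h>

/* Assumption for this sketch; the kernel derives SMP_CACHE_BYTES per architecture. */
#define SMP_CACHE_BYTES 64UL

/* Toy bump allocator state: next free "address" in an imaginary region. */
uintptr_t toy_cursor = 0x1008;

/* Round x up to the next multiple of a power-of-two alignment. */
uintptr_t toy_align_up(uintptr_t x, uintptr_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Toy stand-in for memblock_alloc(): align == 0 selects SMP_CACHE_BYTES. */
uintptr_t toy_memblock_alloc(uintptr_t size, uintptr_t align)
{
	uintptr_t addr;

	if (!align)
		align = SMP_CACHE_BYTES;

	addr = toy_align_up(toy_cursor, align);
	toy_cursor = addr + size;
	return addr;
}

/* Toy stand-in for the old alloc_bootmem(size) shortcut. */
uintptr_t toy_alloc_bootmem(uintptr_t size)
{
	return toy_memblock_alloc(size, SMP_CACHE_BYTES);
}
```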

Signed-off-by: Mike Rapoport 
---
 arch/alpha/kernel/core_marvel.c | 4 ++--
 arch/alpha/kernel/pci-noop.c| 4 ++--
 arch/alpha/kernel/pci.c | 4 ++--
 arch/alpha/kernel/pci_iommu.c   | 4 ++--
 arch/ia64/kernel/mca.c  | 4 ++--
 arch/ia64/mm/tlb.c  | 4 ++--
 arch/m68k/sun3/sun3dvma.c   | 3 ++-
 arch/microblaze/mm/init.c   | 2 +-
 arch/mips/kernel/setup.c| 2 +-
 arch/um/drivers/net_kern.c  | 2 +-
 arch/um/drivers/vector_kern.c   | 2 +-
 arch/um/kernel/initrd.c | 2 +-
 arch/x86/kernel/acpi/boot.c | 3 ++-
 arch/x86/kernel/apic/io_apic.c  | 2 +-
 arch/x86/kernel/e820.c  | 2 +-
 arch/x86/platform/olpc/olpc_dt.c| 2 +-
 arch/xtensa/platforms/iss/network.c | 2 +-
 drivers/macintosh/smu.c | 2 +-
 init/main.c | 4 ++--
 19 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/arch/alpha/kernel/core_marvel.c b/arch/alpha/kernel/core_marvel.c
index bdebb8c2..1f00c94 100644
--- a/arch/alpha/kernel/core_marvel.c
+++ b/arch/alpha/kernel/core_marvel.c
@@ -82,7 +82,7 @@ mk_resource_name(int pe, int port, char *str)
char *name;

sprintf(tmp, "PCI %s PE %d PORT %d", str, pe, port);
-   name = alloc_bootmem(strlen(tmp) + 1);
+   name = memblock_alloc(strlen(tmp) + 1, 0);
strcpy(name, tmp);
 
return name;
@@ -117,7 +117,7 @@ alloc_io7(unsigned int pe)
return NULL;
}
 
-   io7 = alloc_bootmem(sizeof(*io7));
+   io7 = memblock_alloc(sizeof(*io7), 0);
io7->pe = pe;
raw_spin_lock_init(&io7->irq_lock);
 
diff --git a/arch/alpha/kernel/pci-noop.c b/arch/alpha/kernel/pci-noop.c
index c7c5879..59cbfc2 100644
--- a/arch/alpha/kernel/pci-noop.c
+++ b/arch/alpha/kernel/pci-noop.c
@@ -33,7 +33,7 @@ alloc_pci_controller(void)
 {
struct pci_controller *hose;
 
-   hose = alloc_bootmem(sizeof(*hose));
+   hose = memblock_alloc(sizeof(*hose), 0);
 
*hose_tail = hose;
hose_tail = &hose->next;
@@ -44,7 +44,7 @@ alloc_pci_controller(void)
 struct resource * __init
 alloc_resource(void)
 {
-   return alloc_bootmem(sizeof(struct resource));
+   return memblock_alloc(sizeof(struct resource), 0);
 }
 
 SYSCALL_DEFINE3(pciconfig_iobase, long, which, unsigned long, bus,
diff --git a/arch/alpha/kernel/pci.c b/arch/alpha/kernel/pci.c
index c668c3b..4cc3eb9 100644
--- a/arch/alpha/kernel/pci.c
+++ b/arch/alpha/kernel/pci.c
@@ -392,7 +392,7 @@ alloc_pci_controller(void)
 {
struct pci_controller *hose;
 
-   hose = alloc_bootmem(sizeof(*hose));
+   hose = memblock_alloc(sizeof(*hose), 0);
 
*hose_tail = hose;
hose_tail = &hose->next;
@@ -403,7 +403,7 @@ alloc_pci_controller(void)
 struct resource * __init
 alloc_resource(void)
 {
-   return alloc_bootmem(sizeof(struct resource));
+   return memblock_alloc(sizeof(struct resource), 0);
 }
 
 
diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 0c05493..5d178c7 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -79,7 +79,7 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base,
printk("%s: couldn't allocate arena from node %d\n"
   "falling back to system-wide allocation\n",
   __func__, nid);
-   arena = alloc_bootmem(sizeof(*arena));
+   arena = memblock_alloc(sizeof(*arena), 0);
}
 
arena->ptes = memblock_alloc_node(sizeof(*arena), align, nid);
@@ -92,7 +92,7 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base,
 
 #else /* CONFIG_DISCONTIGMEM */
 
-   arena = alloc_bootmem(sizeof(*arena));
+   arena = memblock_alloc(sizeof(*arena), 0);
arena->ptes = memblock_alloc_from(mem_size, align, 0);
 
 #endif /* CONFIG_DISCONTIGMEM */
diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index 5586926..7120976 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -361,9 +361,9 @@ static ia64_state_log_t ia64_state_log[IA64_MAX_LOG_TYPES];
 
 #define IA64_LOG_ALLOCATE(it, size) \
{ia64_state_log[it].isl_log[IA64_LOG_CURR_INDEX(it)] = \
-   (ia64_err_rec_t *)alloc_bootmem(size); \
+   (ia64_err_rec_t *)memblock_alloc(size, 0); \
ia64_state_log[it].isl_log[IA64_LOG_NEXT_INDEX(it)] = \
-   (ia64_err_rec_t *)alloc_bootmem(size);}
+   (ia64_err_rec_t *)memblock_alloc(size, 0);}
 #define 

[PATCH 20/30] memblock: replace __alloc_bootmem with memblock_alloc_from

2018-09-14 Thread Mike Rapoport
The functions are equivalent; the latter just does not require the nobootmem
translation layer.

The conversion is done using the following semantic patch:

@@
expression size, align, goal;
@@
- __alloc_bootmem(size, align, goal)
+ memblock_alloc_from(size, align, goal)
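
As a rough picture of what the third argument means, the following userspace
sketch treats it as a lower bound on the returned address. The region bounds
and the toy_alloc_from() name are assumptions for the example; the sketch is
stateless and does not model any fallback the real allocator may perform when
the lower bound cannot be honoured.

```c
#include <stdint.h>

/* Imaginary free region [TOY_START, TOY_END) used by this sketch only. */
#define TOY_START 0x1000UL
#define TOY_END   0x9000UL

/*
 * Toy stand-in for memblock_alloc_from(size, align, min_addr): return the
 * lowest suitably aligned address at or above min_addr that still fits in
 * the region, or 0 if there is none. align must be a power of two.
 */
uintptr_t toy_alloc_from(uintptr_t size, uintptr_t align, uintptr_t min_addr)
{
	uintptr_t base = min_addr > TOY_START ? min_addr : TOY_START;
	uintptr_t addr = (base + align - 1) & ~(align - 1);

	if (addr >= TOY_END || size > TOY_END - addr)
		return 0;
	return addr;
}
```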

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/alpha/kernel/core_cia.c  |  2 +-
 arch/alpha/kernel/pci_iommu.c |  4 ++--
 arch/alpha/kernel/setup.c |  2 +-
 arch/ia64/kernel/mca.c|  4 ++--
 arch/ia64/mm/contig.c |  5 +++--
 arch/mips/kernel/traps.c  |  2 +-
 arch/sparc/kernel/prom_32.c   |  2 +-
 arch/sparc/kernel/smp_64.c| 10 +-
 arch/sparc/mm/init_32.c   |  2 +-
 arch/sparc/mm/init_64.c   |  9 ++---
 arch/sparc/mm/srmmu.c | 10 +-
 include/linux/bootmem.h   |  8 
 12 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/arch/alpha/kernel/core_cia.c b/arch/alpha/kernel/core_cia.c
index 4b38386..026ee95 100644
--- a/arch/alpha/kernel/core_cia.c
+++ b/arch/alpha/kernel/core_cia.c
@@ -331,7 +331,7 @@ cia_prepare_tbia_workaround(int window)
long i;
 
/* Use minimal 1K map. */
-   ppte = __alloc_bootmem(CIA_BROKEN_TBIA_SIZE, 32768, 0);
+   ppte = memblock_alloc_from(CIA_BROKEN_TBIA_SIZE, 32768, 0);
pte = (virt_to_phys(ppte) >> (PAGE_SHIFT - 1)) | 1;
 
for (i = 0; i < CIA_BROKEN_TBIA_SIZE / sizeof(unsigned long); ++i)
diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index b52d76f..0c05493 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -87,13 +87,13 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base,
printk("%s: couldn't allocate arena ptes from node %d\n"
   "falling back to system-wide allocation\n",
   __func__, nid);
-   arena->ptes = __alloc_bootmem(mem_size, align, 0);
+   arena->ptes = memblock_alloc_from(mem_size, align, 0);
}
 
 #else /* CONFIG_DISCONTIGMEM */
 
arena = alloc_bootmem(sizeof(*arena));
-   arena->ptes = __alloc_bootmem(mem_size, align, 0);
+   arena->ptes = memblock_alloc_from(mem_size, align, 0);
 
 #endif /* CONFIG_DISCONTIGMEM */
 
diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index 4f0d944..64c06a0 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -294,7 +294,7 @@ move_initrd(unsigned long mem_limit)
unsigned long size;
 
size = initrd_end - initrd_start;
-   start = __alloc_bootmem(PAGE_ALIGN(size), PAGE_SIZE, 0);
+   start = memblock_alloc_from(PAGE_ALIGN(size), PAGE_SIZE, 0);
if (!start || __pa(start) + size > mem_limit) {
initrd_start = initrd_end = 0;
return NULL;
diff --git a/arch/ia64/kernel/mca.c b/arch/ia64/kernel/mca.c
index 6115464..5586926 100644
--- a/arch/ia64/kernel/mca.c
+++ b/arch/ia64/kernel/mca.c
@@ -1835,8 +1835,8 @@ format_mca_init_stack(void *mca_data, unsigned long offset,
 /* Caller prevents this from being called after init */
 static void * __ref mca_bootmem(void)
 {
-   return __alloc_bootmem(sizeof(struct ia64_mca_cpu),
-   KERNEL_STACK_SIZE, 0);
+   return memblock_alloc_from(sizeof(struct ia64_mca_cpu),
+  KERNEL_STACK_SIZE, 0);
 }
 
 /* Do per-CPU MCA-related initialization.  */
diff --git a/arch/ia64/mm/contig.c b/arch/ia64/mm/contig.c
index e2e40bb..9e5c23a 100644
--- a/arch/ia64/mm/contig.c
+++ b/arch/ia64/mm/contig.c
@@ -85,8 +85,9 @@ void *per_cpu_init(void)
 static inline void
 alloc_per_cpu_data(void)
 {
-   cpu_data = __alloc_bootmem(PERCPU_PAGE_SIZE * num_possible_cpus(),
-  PERCPU_PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
+   cpu_data = memblock_alloc_from(PERCPU_PAGE_SIZE * num_possible_cpus(),
+  PERCPU_PAGE_SIZE,
+  __pa(MAX_DMA_ADDRESS));
 }
 
 /**
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 5feef28..623dc18 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2263,7 +2263,7 @@ void __init trap_init(void)
 
memblock_set_bottom_up(true);
ebase = (unsigned long)
-   __alloc_bootmem(size, 1 << fls(size), 0);
+   memblock_alloc_from(size, 1 << fls(size), 0);
memblock_set_bottom_up(false);
 
/*
diff --git a/arch/sparc/kernel/prom_32.c b/arch/sparc/kernel/prom_32.c
index b51cbb9..4389944 100644
--- a/arch/sparc/kernel/prom_32.c
+++ b/arch/sparc/kernel/prom_32.c
@@ -32,7 +32,7 @@ void * __init prom_early_alloc(unsigned long size)
 {
void *ret;
 
-   ret = __alloc_bootmem(size, SMP_CACHE_BYTES, 0UL);
+   ret = memblock_alloc_from(size, SMP_CACHE_BYTES, 0UL);
if (ret != NULL)
memset(ret, 0, 

[PATCH 19/30] memblock: replace alloc_bootmem_pages with memblock_alloc

2018-09-14 Thread Mike Rapoport
The alloc_bootmem_pages() function allocates PAGE_SIZE aligned memory.
memblock_alloc() with alignment set to PAGE_SIZE does exactly the same
thing.

The conversion is done using the following semantic patch:

@@
expression e;
@@
- alloc_bootmem_pages(e)
+ memblock_alloc(e, PAGE_SIZE)

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/c6x/mm/init.c | 3 ++-
 arch/h8300/mm/init.c   | 2 +-
 arch/m68k/mm/init.c| 2 +-
 arch/m68k/mm/mcfmmu.c  | 4 ++--
 arch/m68k/mm/motorola.c| 2 +-
 arch/m68k/mm/sun3mmu.c | 4 ++--
 arch/sh/mm/init.c  | 4 ++--
 arch/x86/kernel/apic/io_apic.c | 3 ++-
 arch/x86/mm/init_64.c  | 2 +-
 drivers/xen/swiotlb-xen.c  | 3 ++-
 10 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/arch/c6x/mm/init.c b/arch/c6x/mm/init.c
index 4cc72b0..dc369ad 100644
--- a/arch/c6x/mm/init.c
+++ b/arch/c6x/mm/init.c
@@ -38,7 +38,8 @@ void __init paging_init(void)
struct pglist_data *pgdat = NODE_DATA(0);
unsigned long zones_size[MAX_NR_ZONES] = {0, };
 
-   empty_zero_page  = (unsigned long) alloc_bootmem_pages(PAGE_SIZE);
+   empty_zero_page  = (unsigned long) memblock_alloc(PAGE_SIZE,
+ PAGE_SIZE);
memset((void *)empty_zero_page, 0, PAGE_SIZE);
 
/*
diff --git a/arch/h8300/mm/init.c b/arch/h8300/mm/init.c
index 015287a..5d31ac9 100644
--- a/arch/h8300/mm/init.c
+++ b/arch/h8300/mm/init.c
@@ -67,7 +67,7 @@ void __init paging_init(void)
 * Initialize the bad page table and bad page to point
 * to a couple of allocated pages.
 */
-   empty_zero_page = (unsigned long)alloc_bootmem_pages(PAGE_SIZE);
+   empty_zero_page = (unsigned long)memblock_alloc(PAGE_SIZE, PAGE_SIZE);
memset((void *)empty_zero_page, 0, PAGE_SIZE);
 
/*
diff --git a/arch/m68k/mm/init.c b/arch/m68k/mm/init.c
index 38e2b27..977363e 100644
--- a/arch/m68k/mm/init.c
+++ b/arch/m68k/mm/init.c
@@ -93,7 +93,7 @@ void __init paging_init(void)
 
high_memory = (void *) end_mem;
 
-   empty_zero_page = alloc_bootmem_pages(PAGE_SIZE);
+   empty_zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 
/*
 * Set up SFC/DFC registers (user data space).
diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c
index f5453d9..38a1d92 100644
--- a/arch/m68k/mm/mcfmmu.c
+++ b/arch/m68k/mm/mcfmmu.c
@@ -44,7 +44,7 @@ void __init paging_init(void)
enum zone_type zone;
int i;
 
-   empty_zero_page = (void *) alloc_bootmem_pages(PAGE_SIZE);
+   empty_zero_page = (void *) memblock_alloc(PAGE_SIZE, PAGE_SIZE);
memset((void *) empty_zero_page, 0, PAGE_SIZE);
 
pg_dir = swapper_pg_dir;
@@ -52,7 +52,7 @@ void __init paging_init(void)
 
size = num_pages * sizeof(pte_t);
size = (size + PAGE_SIZE) & ~(PAGE_SIZE-1);
-   next_pgtable = (unsigned long) alloc_bootmem_pages(size);
+   next_pgtable = (unsigned long) memblock_alloc(size, PAGE_SIZE);
 
bootmem_end = (next_pgtable + size + PAGE_SIZE) & PAGE_MASK;
pg_dir += PAGE_OFFSET >> PGDIR_SHIFT;
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 8bcf57e..2113eec 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -276,7 +276,7 @@ void __init paging_init(void)
 * initialize the bad page table and bad page to point
 * to a couple of allocated pages
 */
-   empty_zero_page = alloc_bootmem_pages(PAGE_SIZE);
+   empty_zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 
/*
 * Set up SFC/DFC registers
diff --git a/arch/m68k/mm/sun3mmu.c b/arch/m68k/mm/sun3mmu.c
index 4a99799..19c05ab 100644
--- a/arch/m68k/mm/sun3mmu.c
+++ b/arch/m68k/mm/sun3mmu.c
@@ -45,7 +45,7 @@ void __init paging_init(void)
unsigned long zones_size[MAX_NR_ZONES] = { 0, };
unsigned long size;
 
-   empty_zero_page = alloc_bootmem_pages(PAGE_SIZE);
+   empty_zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 
address = PAGE_OFFSET;
pg_dir = swapper_pg_dir;
@@ -55,7 +55,7 @@ void __init paging_init(void)
size = num_pages * sizeof(pte_t);
size = (size + PAGE_SIZE) & ~(PAGE_SIZE-1);
 
-   next_pgtable = (unsigned long)alloc_bootmem_pages(size);
+   next_pgtable = (unsigned long)memblock_alloc(size, PAGE_SIZE);
bootmem_end = (next_pgtable + size + PAGE_SIZE) & PAGE_MASK;
 
/* Map whole memory from PAGE_OFFSET (0x0E00) */
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 7713c08..c884b76 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -128,7 +128,7 @@ static pmd_t * __init one_md_table_init(pud_t *pud)
if (pud_none(*pud)) {
pmd_t *pmd;
 
-   pmd = alloc_bootmem_pages(PAGE_SIZE);
+   pmd = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
pud_populate(&init_mm, pud, pmd);

[PATCH 18/30] memblock: replace alloc_bootmem_low_pages with memblock_alloc_low

2018-09-14 Thread Mike Rapoport
The alloc_bootmem_low_pages() function allocates PAGE_SIZE aligned regions
from low memory. memblock_alloc_low() with alignment set to PAGE_SIZE does
exactly the same thing.

The conversion is done using the following semantic patch:

@@
expression e;
@@
- alloc_bootmem_low_pages(e)
+ memblock_alloc_low(e, PAGE_SIZE)

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/arc/mm/highmem.c|  2 +-
 arch/m68k/atari/stram.c  |  3 ++-
 arch/m68k/mm/motorola.c  |  5 +++--
 arch/mips/cavium-octeon/dma-octeon.c |  2 +-
 arch/mips/mm/init.c  |  3 ++-
 arch/um/kernel/mem.c | 10 ++
 arch/xtensa/mm/mmu.c |  2 +-
 7 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/arc/mm/highmem.c b/arch/arc/mm/highmem.c
index 77ff64a..f582dc8 100644
--- a/arch/arc/mm/highmem.c
+++ b/arch/arc/mm/highmem.c
@@ -123,7 +123,7 @@ static noinline pte_t * __init alloc_kmap_pgtable(unsigned long kvaddr)
pud_k = pud_offset(pgd_k, kvaddr);
pmd_k = pmd_offset(pud_k, kvaddr);
 
-   pte_k = (pte_t *)alloc_bootmem_low_pages(PAGE_SIZE);
+   pte_k = (pte_t *)memblock_alloc_low(PAGE_SIZE, PAGE_SIZE);
pmd_populate_kernel(&init_mm, pmd_k, pte_k);
return pte_k;
 }
diff --git a/arch/m68k/atari/stram.c b/arch/m68k/atari/stram.c
index c83d664..1089d67 100644
--- a/arch/m68k/atari/stram.c
+++ b/arch/m68k/atari/stram.c
@@ -95,7 +95,8 @@ void __init atari_stram_reserve_pages(void *start_mem)
 {
if (kernel_in_stram) {
pr_debug("atari_stram pool: kernel in ST-RAM, using alloc_bootmem!\n");
-   stram_pool.start = (resource_size_t)alloc_bootmem_low_pages(pool_size);
+   stram_pool.start = (resource_size_t)memblock_alloc_low(pool_size,
+  PAGE_SIZE);
stram_pool.end = stram_pool.start + pool_size - 1;
request_resource(&iomem_resource, &stram_pool);
stram_virt_offset = 0;
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 4e17ecb..8bcf57e 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -55,7 +55,7 @@ static pte_t * __init kernel_page_table(void)
 {
pte_t *ptablep;
 
-   ptablep = (pte_t *)alloc_bootmem_low_pages(PAGE_SIZE);
+   ptablep = (pte_t *)memblock_alloc_low(PAGE_SIZE, PAGE_SIZE);
 
clear_page(ptablep);
__flush_page_to_ram(ptablep);
@@ -95,7 +95,8 @@ static pmd_t * __init kernel_ptr_table(void)
 
last_pgtable += PTRS_PER_PMD;
if (((unsigned long)last_pgtable & ~PAGE_MASK) == 0) {
-   last_pgtable = (pmd_t *)alloc_bootmem_low_pages(PAGE_SIZE);
+   last_pgtable = (pmd_t *)memblock_alloc_low(PAGE_SIZE,
+  PAGE_SIZE);
 
clear_page(last_pgtable);
__flush_page_to_ram(last_pgtable);
diff --git a/arch/mips/cavium-octeon/dma-octeon.c b/arch/mips/cavium-octeon/dma-octeon.c
index 236833b..c44c1a6 100644
--- a/arch/mips/cavium-octeon/dma-octeon.c
+++ b/arch/mips/cavium-octeon/dma-octeon.c
@@ -244,7 +244,7 @@ void __init plat_swiotlb_setup(void)
swiotlb_nslabs = ALIGN(swiotlb_nslabs, IO_TLB_SEGSIZE);
swiotlbsize = swiotlb_nslabs << IO_TLB_SHIFT;
 
-   octeon_swiotlb = alloc_bootmem_low_pages(swiotlbsize);
+   octeon_swiotlb = memblock_alloc_low(swiotlbsize, PAGE_SIZE);
 
if (swiotlb_init_with_tbl(octeon_swiotlb, swiotlb_nslabs, 1) == -ENOMEM)
panic("Cannot allocate SWIOTLB buffer");
diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 400676c..a010fba7 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -244,7 +244,8 @@ void __init fixrange_init(unsigned long start, unsigned long end,
pmd = (pmd_t *)pud;
for (; (k < PTRS_PER_PMD) && (vaddr < end); pmd++, k++) {
if (pmd_none(*pmd)) {
-   pte = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
+   pte = (pte_t *) memblock_alloc_low(PAGE_SIZE,
+  PAGE_SIZE);
set_pmd(pmd, __pmd((unsigned long)pte));
BUG_ON(pte != pte_offset_kernel(pmd, 0));
}
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 3c0e470..185f6bb 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -64,7 +64,8 @@ void __init mem_init(void)
 static void __init one_page_table_init(pmd_t *pmd)
 {
if (pmd_none(*pmd)) {
-   pte_t *pte = (pte_t *) alloc_bootmem_low_pages(PAGE_SIZE);
+   pte_t *pte = (pte_t *) memblock_alloc_low(PAGE_SIZE,
+ PAGE_SIZE);
  

[PATCH 17/30] memblock: replace alloc_bootmem_node with memblock_alloc_node

2018-09-14 Thread Mike Rapoport
Both functions attempt to allocate memory with specified alignment from a
particular node. If the allocation from that node fails, they both fall
back to allocating from any node in the system.

Using the native memblock API eliminates the nobootmem translation layer.
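
The fallback behaviour described above can be pictured with a toy userspace
model. The node count, the per-node byte counters and the toy_alloc_node()
name are assumptions for this sketch and have nothing to do with how memblock
tracks memory internally.

```c
#include <stddef.h>

#define TOY_NR_NODES 3

/* Free bytes left on each imaginary node; the values are arbitrary. */
size_t toy_node_free[TOY_NR_NODES] = { 4096, 128, 4096 };

/*
 * Toy stand-in for a node-aware boot allocation: try the preferred node
 * first, then fall back to any node with enough room. Returns the node
 * that satisfied the request, or -1 if no node could.
 */
int toy_alloc_node(size_t size, int nid)
{
	int i;

	if (nid >= 0 && nid < TOY_NR_NODES && toy_node_free[nid] >= size) {
		toy_node_free[nid] -= size;
		return nid;
	}

	for (i = 0; i < TOY_NR_NODES; i++) {
		if (toy_node_free[i] >= size) {
			toy_node_free[i] -= size;
			return i;
		}
	}

	return -1;
}
```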

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/alpha/kernel/pci_iommu.c   | 4 ++--
 arch/ia64/sn/kernel/io_common.c | 7 ++-
 arch/ia64/sn/kernel/setup.c | 4 ++--
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 6923b0d..b52d76f 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -74,7 +74,7 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base,
 
 #ifdef CONFIG_DISCONTIGMEM
 
-   arena = alloc_bootmem_node(NODE_DATA(nid), sizeof(*arena));
+   arena = memblock_alloc_node(sizeof(*arena), align, nid);
if (!NODE_DATA(nid) || !arena) {
printk("%s: couldn't allocate arena from node %d\n"
   "falling back to system-wide allocation\n",
@@ -82,7 +82,7 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base,
arena = alloc_bootmem(sizeof(*arena));
}
 
-   arena->ptes = __alloc_bootmem_node(NODE_DATA(nid), mem_size, align, 0);
+   arena->ptes = memblock_alloc_node(sizeof(*arena), align, nid);
if (!NODE_DATA(nid) || !arena->ptes) {
printk("%s: couldn't allocate arena ptes from node %d\n"
   "falling back to system-wide allocation\n",
diff --git a/arch/ia64/sn/kernel/io_common.c b/arch/ia64/sn/kernel/io_common.c
index 102aaba..8b05d55 100644
--- a/arch/ia64/sn/kernel/io_common.c
+++ b/arch/ia64/sn/kernel/io_common.c
@@ -385,16 +385,13 @@ void __init hubdev_init_node(nodepda_t * npda, cnodeid_t node)
 {
struct hubdev_info *hubdev_info;
int size;
-   pg_data_t *pg;
 
size = sizeof(struct hubdev_info);
 
if (node >= num_online_nodes()) /* Headless/memless IO nodes */
-   pg = NODE_DATA(0);
-   else
-   pg = NODE_DATA(node);
+   node = 0;
 
-   hubdev_info = (struct hubdev_info *)alloc_bootmem_node(pg, size);
+   hubdev_info = (struct hubdev_info *)memblock_alloc_node(size, 0, node);
 
npda->pdinfo = (void *)hubdev_info;
 }
diff --git a/arch/ia64/sn/kernel/setup.c b/arch/ia64/sn/kernel/setup.c
index 5f6b6b4..ab2564f 100644
--- a/arch/ia64/sn/kernel/setup.c
+++ b/arch/ia64/sn/kernel/setup.c
@@ -511,7 +511,7 @@ static void __init sn_init_pdas(char **cmdline_p)
 */
for_each_online_node(cnode) {
nodepdaindr[cnode] =
-   alloc_bootmem_node(NODE_DATA(cnode), sizeof(nodepda_t));
+   memblock_alloc_node(sizeof(nodepda_t), 0, cnode);
memset(nodepdaindr[cnode]->phys_cpuid, -1,
sizeof(nodepdaindr[cnode]->phys_cpuid));
spin_lock_init(&nodepdaindr[cnode]->ptc_lock);
@@ -522,7 +522,7 @@ static void __init sn_init_pdas(char **cmdline_p)
 */
for (cnode = num_online_nodes(); cnode < num_cnodes; cnode++)
nodepdaindr[cnode] =
-   alloc_bootmem_node(NODE_DATA(0), sizeof(nodepda_t));
+   memblock_alloc_node(sizeof(nodepda_t), 0, 0);
 
/*
 * Now copy the array of nodepda pointers to each nodepda.
-- 
2.7.4



[PATCH 16/30] memblock: replace __alloc_bootmem_node with appropriate memblock_ API

2018-09-14 Thread Mike Rapoport
Use memblock_alloc_try_nid whenever a goal (i.e. a minimal address) is
specified, and memblock_alloc_node otherwise.

Signed-off-by: Mike Rapoport 
---
 arch/ia64/mm/discontig.c   |  6 ++++--
 arch/powerpc/kernel/setup_64.c |  6 ++++--
 arch/sparc/kernel/setup_64.c   | 10 ++++------
 arch/sparc/kernel/smp_64.c |  4 ++--
 4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
index 1928d57..918dda9 100644
--- a/arch/ia64/mm/discontig.c
+++ b/arch/ia64/mm/discontig.c
@@ -451,8 +451,10 @@ static void __init *memory_less_node_alloc(int nid, unsigned long pernodesize)
if (bestnode == -1)
bestnode = anynode;
 
-   ptr = __alloc_bootmem_node(pgdat_list[bestnode], pernodesize,
-   PERCPU_PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
+   ptr = memblock_alloc_try_nid(pernodesize, PERCPU_PAGE_SIZE,
+__pa(MAX_DMA_ADDRESS),
+BOOTMEM_ALLOC_ACCESSIBLE,
+bestnode);
 
return ptr;
 }
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 6a501b2..6add560 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -757,8 +757,10 @@ void __init emergency_stack_init(void)
 
 static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
 {
-   return __alloc_bootmem_node(NODE_DATA(early_cpu_to_node(cpu)), size, align,
-   __pa(MAX_DMA_ADDRESS));
+   return memblock_alloc_try_nid(size, align, __pa(MAX_DMA_ADDRESS),
+ BOOTMEM_ALLOC_ACCESSIBLE,
+ early_cpu_to_node(cpu));
+
 }
 
 static void __init pcpu_fc_free(void *ptr, size_t size)
diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 206bf81..5fb11ea 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -622,12 +622,10 @@ void __init alloc_irqstack_bootmem(void)
for_each_possible_cpu(i) {
node = cpu_to_node(i);
 
-   softirq_stack[i] = __alloc_bootmem_node(NODE_DATA(node),
-   THREAD_SIZE,
-   THREAD_SIZE, 0);
-   hardirq_stack[i] = __alloc_bootmem_node(NODE_DATA(node),
-   THREAD_SIZE,
-   THREAD_SIZE, 0);
+   softirq_stack[i] = memblock_alloc_node(THREAD_SIZE,
+  THREAD_SIZE, node);
+   hardirq_stack[i] = memblock_alloc_node(THREAD_SIZE,
+  THREAD_SIZE, node);
}
 }
 
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index d3ea1f3..83ff88d 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1594,8 +1594,8 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, size_t size,
pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
 cpu, size, __pa(ptr));
} else {
-   ptr = __alloc_bootmem_node(NODE_DATA(node),
-  size, align, goal);
+   ptr = memblock_alloc_try_nid(size, align, goal,
+BOOTMEM_ALLOC_ACCESSIBLE, node);
pr_debug("per cpu data for cpu%d %lu bytes on node%d at "
 "%016lx\n", cpu, size, node, __pa(ptr));
}
-- 
2.7.4



[PATCH 15/30] memblock: replace alloc_bootmem_pages_node with memblock_alloc_node

2018-09-14 Thread Mike Rapoport
The functions are equivalent; the latter just does not require the
nobootmem translation layer.

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/ia64/mm/init.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 3b85c3e..2169ca5 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -447,19 +447,19 @@ int __init create_mem_map_page_table(u64 start, u64 end, void *arg)
for (address = start_page; address < end_page; address += PAGE_SIZE) {
pgd = pgd_offset_k(address);
if (pgd_none(*pgd))
-   pgd_populate(&init_mm, pgd, alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE));
+   pgd_populate(&init_mm, pgd, memblock_alloc_node(PAGE_SIZE, PAGE_SIZE, node));
pud = pud_offset(pgd, address);
 
if (pud_none(*pud))
-   pud_populate(&init_mm, pud, alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE));
+   pud_populate(&init_mm, pud, memblock_alloc_node(PAGE_SIZE, PAGE_SIZE, node));
pmd = pmd_offset(pud, address);
 
if (pmd_none(*pmd))
-   pmd_populate_kernel(&init_mm, pmd, alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE));
+   pmd_populate_kernel(&init_mm, pmd, memblock_alloc_node(PAGE_SIZE, PAGE_SIZE, node));
pte = pte_offset_kernel(pmd, address);
 
if (pte_none(*pte))
-   set_pte(pte, pfn_pte(__pa(alloc_bootmem_pages_node(NODE_DATA(node), PAGE_SIZE)) >> PAGE_SHIFT,
+   set_pte(pte, pfn_pte(__pa(memblock_alloc_node(PAGE_SIZE, PAGE_SIZE, node)) >> PAGE_SHIFT,
 PAGE_KERNEL));
}
return 0;
-- 
2.7.4



[PATCH 14/30] memblock: add align parameter to memblock_alloc_node()

2018-09-14 Thread Mike Rapoport
With the align parameter, memblock_alloc_node() can be used as a drop-in
replacement for alloc_bootmem_pages_node() and __alloc_bootmem_node(),
which is done in the following patches.

Signed-off-by: Mike Rapoport 
---
 include/linux/bootmem.h | 4 ++--
 mm/sparse.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 7d91f0f..3896af2 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -157,9 +157,9 @@ static inline void * __init memblock_alloc_from_nopanic(
 }
 
 static inline void * __init memblock_alloc_node(
-   phys_addr_t size, int nid)
+   phys_addr_t size, phys_addr_t align, int nid)
 {
-   return memblock_alloc_try_nid(size, 0, BOOTMEM_LOW_LIMIT,
+   return memblock_alloc_try_nid(size, align, BOOTMEM_LOW_LIMIT,
BOOTMEM_ALLOC_ACCESSIBLE, nid);
 }
 
diff --git a/mm/sparse.c b/mm/sparse.c
index 04e97af..509828f 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -68,7 +68,7 @@ static noinline struct mem_section __ref *sparse_index_alloc(int nid)
if (slab_is_available())
section = kzalloc_node(array_size, GFP_KERNEL, nid);
else
-   section = memblock_alloc_node(array_size, nid);
+   section = memblock_alloc_node(array_size, 0, nid);
 
return section;
 }
-- 
2.7.4
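
Note on 14/30: the effect of threading the alignment through the wrapper
can be tried outside the kernel with a toy model. This is only a sketch
under stated assumptions, not the kernel implementation: every model_*
name and MODEL_* constant below is hypothetical, the node id is ignored,
and treating align == 0 as "use a default alignment" is an assumption
taken from how the series passes 0 where no alignment is requested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
#define MODEL_DEFAULT_ALIGN 64u    /* plays the role of SMP_CACHE_BYTES */
#define MODEL_ARENA_SIZE    8192u  /* total bytes the toy allocator manages */

static size_t model_used;          /* bytes handed out so far */

/*
 * Toy counterpart of memblock_alloc_try_nid(): bump allocation by offset.
 * Returns the offset of the new block, or -1 if it does not fit.
 * The node id is accepted but ignored in this model.
 */
static long model_alloc_try_nid(size_t size, size_t align, int nid)
{
	size_t start;

	(void)nid;
	if (align == 0)
		align = MODEL_DEFAULT_ALIGN;   /* assumed meaning of align == 0 */
	assert((align & (align - 1)) == 0);    /* powers of two only */

	start = (model_used + align - 1) & ~(align - 1);
	if (start > MODEL_ARENA_SIZE || size > MODEL_ARENA_SIZE - start)
		return -1;
	model_used = start + size;
	return (long)start;
}

/* Before the patch: the alignment is hardwired to 0 (the default). */
static long model_alloc_node_old(size_t size, int nid)
{
	return model_alloc_try_nid(size, 0, nid);
}

/* After the patch: the caller chooses the alignment. */
static long model_alloc_node(size_t size, size_t align, int nid)
{
	return model_alloc_try_nid(size, align, nid);
}
```

Callers that used to need page alignment had no way to ask for it through
the old two-argument form; with the extra parameter they can pass
PAGE_SIZE (4096 in this model) and existing callers keep passing 0.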



[PATCH 13/30] memblock: replace __alloc_bootmem_nopanic with memblock_alloc_from_nopanic

2018-09-14 Thread Mike Rapoport
When __alloc_bootmem_nopanic() is used with an explicit lower limit for the
allocation, it attempts to allocate memory at or above that limit and falls
back to an allocation with no limit set.

memblock_alloc_from_nopanic() does exactly the same thing and can be
used as a replacement for __alloc_bootmem_nopanic() in such cases.

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/arc/kernel/unwind.c   | 4 ++--
 arch/x86/kernel/setup_percpu.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arc/kernel/unwind.c b/arch/arc/kernel/unwind.c
index 183391d..2a01dd1 100644
--- a/arch/arc/kernel/unwind.c
+++ b/arch/arc/kernel/unwind.c
@@ -181,8 +181,8 @@ static void init_unwind_hdr(struct unwind_table *table,
  */
 static void *__init unw_hdr_alloc_early(unsigned long sz)
 {
-   return __alloc_bootmem_nopanic(sz, sizeof(unsigned int),
-  MAX_DMA_ADDRESS);
+   return memblock_alloc_from_nopanic(sz, sizeof(unsigned int),
+  MAX_DMA_ADDRESS);
 }
 
 static void *unw_hdr_alloc(unsigned long sz)
diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index 67d48e26..041663a 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -106,7 +106,7 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size,
void *ptr;
 
if (!node_online(node) || !NODE_DATA(node)) {
-   ptr = __alloc_bootmem_nopanic(size, align, goal);
+   ptr = memblock_alloc_from_nopanic(size, align, goal);
pr_info("cpu %d has no node %d or node-local memory\n",
cpu, node);
pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
@@ -121,7 +121,7 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size,
}
return ptr;
 #else
-   return __alloc_bootmem_nopanic(size, align, goal);
+   return memblock_alloc_from_nopanic(size, align, goal);
 #endif
 }
 
-- 
2.7.4
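
Note on 13/30: the "try at or above the limit, then retry without a
limit" behaviour described in the commit message can be illustrated with
a small userspace model. It is a sketch only: memory is reduced to eight
slots, and model_find_from(), model_alloc_from_nopanic() and MODEL_NSLOTS
are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NSLOTS 8   /* hypothetical size of the toy memory, in slots */

static unsigned char model_slot_used[MODEL_NSLOTS];  /* 1 = slot handed out */

/*
 * First-fit search for nslots free consecutive slots at index >= min_slot.
 * Returns the first slot index, or -1 if no such run exists.
 */
static int model_find_from(int nslots, int min_slot)
{
	int start, i;

	for (start = min_slot; start + nslots <= MODEL_NSLOTS; start++) {
		for (i = 0; i < nslots; i++)
			if (model_slot_used[start + i])
				break;
		if (i == nslots)
			return start;
	}
	return -1;
}

/*
 * Toy counterpart of memblock_alloc_from_nopanic(): honour the lower limit
 * if possible, otherwise retry without it, and report failure to the
 * caller instead of panicking.
 */
static int model_alloc_from_nopanic(int nslots, int min_slot)
{
	int start, i;

	assert(nslots > 0 && min_slot >= 0);

	start = model_find_from(nslots, min_slot);
	if (start < 0 && min_slot > 0)
		start = model_find_from(nslots, 0);   /* fallback: no limit */
	if (start < 0)
		return -1;

	for (i = 0; i < nslots; i++)
		model_slot_used[start + i] = 1;
	return start;
}
```

In this model a second request for two slots at or above slot 6 cannot be
satisfied there once slots 6 and 7 are taken, so it is served from the
bottom of the range instead of failing.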



[PATCH 12/30] memblock: replace alloc_bootmem_low with memblock_alloc_low

2018-09-14 Thread Mike Rapoport
alloc_bootmem_low(size) allocates low memory with the default alignment
and can be replaced by memblock_alloc_low(size, 0).

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/arm64/kernel/setup.c | 2 +-
 arch/unicore32/kernel/setup.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 5b4fac4..cf7a7b7 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -213,7 +213,7 @@ static void __init request_standard_resources(void)
kernel_data.end = __pa_symbol(_end - 1);
 
for_each_memblock(memory, region) {
-   res = alloc_bootmem_low(sizeof(*res));
+   res = memblock_alloc_low(sizeof(*res), 0);
if (memblock_is_nomap(region)) {
res->name  = "reserved";
res->flags = IORESOURCE_MEM;
diff --git a/arch/unicore32/kernel/setup.c b/arch/unicore32/kernel/setup.c
index c2bffa5..9f163f9 100644
--- a/arch/unicore32/kernel/setup.c
+++ b/arch/unicore32/kernel/setup.c
@@ -207,7 +207,7 @@ request_standard_resources(struct meminfo *mi)
if (mi->bank[i].size == 0)
continue;
 
-   res = alloc_bootmem_low(sizeof(*res));
+   res = memblock_alloc_low(sizeof(*res), 0);
res->name  = "System RAM";
res->start = mi->bank[i].start;
res->end   = mi->bank[i].start + mi->bank[i].size - 1;
-- 
2.7.4



[PATCH 11/30] memblock: replace alloc_bootmem_pages_nopanic with memblock_alloc_nopanic

2018-09-14 Thread Mike Rapoport
alloc_bootmem_pages_nopanic(size) is a shortcut for
__alloc_bootmem_nopanic(size, PAGE_SIZE, BOOTMEM_LOW_LIMIT), which allocates
PAGE_SIZE aligned memory. Since BOOTMEM_LOW_LIMIT is hardwired to 0, there
are no restrictions on where the allocated memory should reside.

memblock_alloc_nopanic(size, PAGE_SIZE) also allocates PAGE_SIZE
aligned memory without any restrictions and thus can be used as a
replacement for alloc_bootmem_pages_nopanic().

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 drivers/usb/early/xhci-dbc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/early/xhci-dbc.c b/drivers/usb/early/xhci-dbc.c
index e15e896..16df968 100644
--- a/drivers/usb/early/xhci-dbc.c
+++ b/drivers/usb/early/xhci-dbc.c
@@ -94,7 +94,7 @@ static void * __init xdbc_get_page(dma_addr_t *dma_addr)
 {
void *virt;
 
-   virt = alloc_bootmem_pages_nopanic(PAGE_SIZE);
+   virt = memblock_alloc_nopanic(PAGE_SIZE, PAGE_SIZE);
if (!virt)
return NULL;
 
-- 
2.7.4



[PATCH 10/30] memblock: replace __alloc_bootmem_node_nopanic with memblock_alloc_try_nid_nopanic

2018-09-14 Thread Mike Rapoport
The __alloc_bootmem_node_nopanic() attempts to allocate memory for a
specified node. If the allocation fails it then retries to allocate memory
from any node. Upon success, the allocated memory is set to 0.

The memblock_alloc_try_nid_nopanic() does exactly the same thing and can be
used instead.

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/x86/kernel/setup_percpu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index ea554f8..67d48e26 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -112,8 +112,10 @@ static void * __init pcpu_alloc_bootmem(unsigned int cpu, unsigned long size,
pr_debug("per cpu data for cpu%d %lu bytes at %016lx\n",
 cpu, size, __pa(ptr));
} else {
-   ptr = __alloc_bootmem_node_nopanic(NODE_DATA(node),
-  size, align, goal);
+   ptr = memblock_alloc_try_nid_nopanic(size, align, goal,
+BOOTMEM_ALLOC_ACCESSIBLE,
+node);
+
pr_debug("per cpu data for cpu%d %lu bytes on node%d at %016lx\n",
 cpu, size, node, __pa(ptr));
}
-- 
2.7.4
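
Note on 10/30: the three properties the commit message relies on (prefer
the requested node, fall back to any node, return zeroed memory) can be
sketched in userspace as follows. This is an illustration under
assumptions, not the kernel code: two fixed 16-byte "nodes" stand in for
NUMA memory, and all model_* names and MODEL_* constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NODES     2    /* hypothetical node count */
#define MODEL_NODE_SIZE 16   /* hypothetical bytes of memory per node */

static unsigned char model_mem[MODEL_NODES][MODEL_NODE_SIZE];
static size_t model_node_used[MODEL_NODES];

/* Take size bytes from one node, or return NULL if the node is too full. */
static void *model_alloc_on_node(size_t size, int nid)
{
	void *ptr;

	if (size > MODEL_NODE_SIZE - model_node_used[nid])
		return NULL;
	ptr = &model_mem[nid][model_node_used[nid]];
	model_node_used[nid] += size;
	return ptr;
}

/*
 * Toy counterpart of memblock_alloc_try_nid_nopanic(): prefer node nid,
 * fall back to any node, zero the memory, return NULL on failure.
 */
static void *model_alloc_try_nid_nopanic(size_t size, int nid)
{
	void *ptr;
	int n;

	assert(nid >= 0 && nid < MODEL_NODES);

	ptr = model_alloc_on_node(size, nid);
	for (n = 0; !ptr && n < MODEL_NODES; n++)
		ptr = model_alloc_on_node(size, n);   /* fallback: any node */
	if (ptr)
		memset(ptr, 0, size);
	return ptr;
}

/* Inspection helper for the model only: which node owns ptr? */
static int model_node_of(const void *ptr)
{
	const unsigned char *base = (const unsigned char *)model_mem;

	return (int)(((const unsigned char *)ptr - base) / MODEL_NODE_SIZE);
}
```

A caller has to check the result for NULL, as pcpu_alloc_bootmem() and its
callers already do, because the _nopanic variant reports failure instead
of halting the boot.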



[PATCH 08/30] memblock: replace alloc_bootmem_align with memblock_alloc

2018-09-14 Thread Mike Rapoport
The functions are equivalent; the latter just does not require the
nobootmem translation layer.

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/x86/xen/p2m.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index d6d74ef..5de761b 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -182,7 +182,7 @@ static void p2m_init_identity(unsigned long *p2m, unsigned long pfn)
 static void * __ref alloc_p2m_page(void)
 {
if (unlikely(!slab_is_available()))
-   return alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE);
+   return memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 
return (void *)__get_free_page(GFP_KERNEL);
 }
-- 
2.7.4



[PATCH 09/30] memblock: replace alloc_bootmem_low with memblock_alloc_low

2018-09-14 Thread Mike Rapoport
The functions are equivalent; the latter just does not require the
nobootmem translation layer.

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/x86/kernel/tce_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tce_64.c b/arch/x86/kernel/tce_64.c
index f386bad..54c9b5a 100644
--- a/arch/x86/kernel/tce_64.c
+++ b/arch/x86/kernel/tce_64.c
@@ -173,7 +173,7 @@ void * __init alloc_tce_table(void)
size = table_size_to_number_of_entries(specified_table_size);
size *= TCE_ENTRY_SIZE;
 
-   return __alloc_bootmem_low(size, size, 0);
+   return memblock_alloc_low(size, size);
 }
 
 void __init free_tce_table(void *tbl)
-- 
2.7.4



[PATCH 07/30] memblock: remove _virt from APIs returning virtual address

2018-09-14 Thread Mike Rapoport
The conversion is done using

sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
$(git grep -l memblock_virt_alloc)

Signed-off-by: Mike Rapoport 
---
 arch/arm/kernel/setup.c   |  4 ++--
 arch/arm/mach-omap2/omap_hwmod.c  |  6 ++---
 arch/arm64/mm/kasan_init.c|  2 +-
 arch/arm64/mm/numa.c  |  2 +-
 arch/mips/kernel/setup.c  |  2 +-
 arch/powerpc/kernel/pci_32.c  |  2 +-
 arch/powerpc/lib/alloc.c  |  2 +-
 arch/powerpc/mm/mmu_context_nohash.c  |  6 ++---
 arch/powerpc/platforms/powermac/nvram.c   |  2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  6 ++---
 arch/powerpc/platforms/ps3/setup.c|  2 +-
 arch/powerpc/sysdev/msi_bitmap.c  |  2 +-
 arch/s390/kernel/setup.c  | 12 +-
 arch/s390/kernel/smp.c|  2 +-
 arch/s390/kernel/topology.c   |  4 ++--
 arch/s390/numa/mode_emu.c |  2 +-
 arch/s390/numa/toptree.c  |  2 +-
 arch/x86/mm/kasan_init_64.c   |  4 ++--
 arch/xtensa/mm/kasan_init.c   |  2 +-
 drivers/clk/ti/clk.c  |  2 +-
 drivers/firmware/memmap.c |  2 +-
 drivers/of/fdt.c  |  2 +-
 drivers/of/unittest.c |  2 +-
 include/linux/bootmem.h   | 38 +++
 init/main.c   |  6 ++---
 kernel/dma/swiotlb.c  |  8 +++
 kernel/power/snapshot.c   |  2 +-
 kernel/printk/printk.c|  4 ++--
 lib/cpumask.c |  2 +-
 mm/hugetlb.c  |  2 +-
 mm/kasan/kasan_init.c |  2 +-
 mm/memblock.c | 26 ++---
 mm/page_alloc.c   |  8 +++
 mm/page_ext.c |  2 +-
 mm/percpu.c   | 28 +++
 mm/sparse-vmemmap.c   |  2 +-
 mm/sparse.c   | 12 +-
 37 files changed, 108 insertions(+), 108 deletions(-)

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 4c249cb..39e6090 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -857,7 +857,7 @@ static void __init request_standard_resources(const struct machine_desc *mdesc)
 */
boot_alias_start = phys_to_idmap(start);
if (arm_has_idmap_alias() && boot_alias_start != IDMAP_INVALID_ADDR) {
-   res = memblock_virt_alloc(sizeof(*res), 0);
+   res = memblock_alloc(sizeof(*res), 0);
res->name = "System RAM (boot alias)";
res->start = boot_alias_start;
res->end = phys_to_idmap(end);
@@ -865,7 +865,7 @@ static void __init request_standard_resources(const struct machine_desc *mdesc)
request_resource(&iomem_resource, res);
}
 
-   res = memblock_virt_alloc(sizeof(*res), 0);
+   res = memblock_alloc(sizeof(*res), 0);
res->name  = "System RAM";
res->start = start;
res->end = end;
diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
index 56a1fe9..1f9b34a 100644
--- a/arch/arm/mach-omap2/omap_hwmod.c
+++ b/arch/arm/mach-omap2/omap_hwmod.c
@@ -726,7 +726,7 @@ static int __init _setup_clkctrl_provider(struct device_node *np)
u64 size;
int i;
 
-   provider = memblock_virt_alloc(sizeof(*provider), 0);
+   provider = memblock_alloc(sizeof(*provider), 0);
if (!provider)
return -ENOMEM;
 
@@ -736,12 +736,12 @@ static int __init _setup_clkctrl_provider(struct device_node *np)
of_property_count_elems_of_size(np, "reg", sizeof(u32)) / 2;
 
provider->addr =
-   memblock_virt_alloc(sizeof(void *) * provider->num_addrs, 0);
+   memblock_alloc(sizeof(void *) * provider->num_addrs, 0);
if (!provider->addr)
return -ENOMEM;
 
provider->size =
-   memblock_virt_alloc(sizeof(u32) * provider->num_addrs, 0);
+   memblock_alloc(sizeof(u32) * provider->num_addrs, 0);
if (!provider->size)
return -ENOMEM;
 
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 1214587..2391560 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -38,7 +38,7 @@ static pgd_t tmp_pg_dir[PTRS_PER_PGD] __initdata __aligned(PGD_SIZE);
 
 static phys_addr_t __init kasan_alloc_zeroed_page(int node)
 {
-   void *p = memblock_virt_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
+   void *p = memblock_alloc_try_nid(PAGE_SIZE, PAGE_SIZE,
  __pa(MAX_DMA_ADDRESS),

[PATCH 06/30] memblock: rename memblock_alloc{_nid, _try_nid} to memblock_phys_alloc*

2018-09-14 Thread Mike Rapoport
Make it explicit that the caller gets a physical address rather than a
virtual one.

This will also allow using the memblock_alloc prefix for memblock allocations
returning a virtual address, which is done in the following patches.

The conversion is done using the following semantic patch:

@@
expression e1, e2, e3;
@@
(
- memblock_alloc(e1, e2)
+ memblock_phys_alloc(e1, e2)
|
- memblock_alloc_nid(e1, e2, e3)
+ memblock_phys_alloc_nid(e1, e2, e3)
|
- memblock_alloc_try_nid(e1, e2, e3)
+ memblock_phys_alloc_try_nid(e1, e2, e3)
)

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 arch/arm/mm/mmu.c |  2 +-
 arch/arm64/mm/mmu.c   |  2 +-
 arch/arm64/mm/numa.c  |  2 +-
 arch/c6x/mm/dma-coherent.c|  4 ++--
 arch/nds32/mm/init.c  |  8 
 arch/openrisc/mm/init.c   |  2 +-
 arch/openrisc/mm/ioremap.c|  2 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |  4 +---
 arch/powerpc/kernel/paca.c|  2 +-
 arch/powerpc/kernel/prom.c|  2 +-
 arch/powerpc/kernel/setup-common.c|  3 +--
 arch/powerpc/kernel/setup_32.c| 10 +-
 arch/powerpc/mm/numa.c|  2 +-
 arch/powerpc/mm/pgtable_32.c  |  2 +-
 arch/powerpc/mm/ppc_mmu_32.c  |  2 +-
 arch/powerpc/platforms/pasemi/iommu.c |  2 +-
 arch/powerpc/platforms/powernv/opal.c |  2 +-
 arch/powerpc/sysdev/dart_iommu.c  |  2 +-
 arch/s390/kernel/crash_dump.c |  2 +-
 arch/s390/kernel/setup.c  |  3 ++-
 arch/s390/mm/vmem.c   |  4 ++--
 arch/s390/numa/numa.c |  2 +-
 arch/sparc/kernel/mdesc.c |  2 +-
 arch/sparc/kernel/prom_64.c   |  2 +-
 arch/sparc/mm/init_64.c   | 11 ++-
 arch/unicore32/mm/mmu.c   |  2 +-
 arch/x86/mm/numa.c|  2 +-
 drivers/firmware/efi/memmap.c |  2 +-
 include/linux/memblock.h  |  6 +++---
 mm/memblock.c |  8 
 30 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index e46a6a4..f5cc1cc 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -721,7 +721,7 @@ EXPORT_SYMBOL(phys_mem_access_prot);
 
 static void __init *early_alloc_aligned(unsigned long sz, unsigned long align)
 {
-   void *ptr = __va(memblock_alloc(sz, align));
+   void *ptr = __va(memblock_phys_alloc(sz, align));
memset(ptr, 0, sz);
return ptr;
 }
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8080c9f..b8e037b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -83,7 +83,7 @@ static phys_addr_t __init early_pgtable_alloc(void)
phys_addr_t phys;
void *ptr;
 
-   phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+   phys = memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE);
 
/*
 * The FIX_{PGD,PUD,PMD} slots may be in active use, but the FIX_PTE
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 146c04c..e5aacd6 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -237,7 +237,7 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
if (start_pfn >= end_pfn)
pr_info("Initmem setup node %d []\n", nid);
 
-   nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
+   nd_pa = memblock_phys_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
nd = __va(nd_pa);
 
/* report and initialize */
diff --git a/arch/c6x/mm/dma-coherent.c b/arch/c6x/mm/dma-coherent.c
index d0a8e0c..01305c7 100644
--- a/arch/c6x/mm/dma-coherent.c
+++ b/arch/c6x/mm/dma-coherent.c
@@ -135,8 +135,8 @@ void __init coherent_mem_init(phys_addr_t start, u32 size)
if (dma_size & (PAGE_SIZE - 1))
++dma_pages;
 
-   bitmap_phys = memblock_alloc(BITS_TO_LONGS(dma_pages) * sizeof(long),
-sizeof(long));
+   bitmap_phys = memblock_phys_alloc(BITS_TO_LONGS(dma_pages) * sizeof(long),
+ sizeof(long));
 
dma_bitmap = phys_to_virt(bitmap_phys);
memset(dma_bitmap, 0, dma_pages * PAGE_SIZE);
diff --git a/arch/nds32/mm/init.c b/arch/nds32/mm/init.c
index c713d2a..5af81b8 100644
--- a/arch/nds32/mm/init.c
+++ b/arch/nds32/mm/init.c
@@ -81,7 +81,7 @@ static void __init map_ram(void)
}
 
/* Alloc one page for holding PTE's... */
-   pte = (pte_t *) __va(memblock_alloc(PAGE_SIZE, PAGE_SIZE));
+   pte = (pte_t *) __va(memblock_phys_alloc(PAGE_SIZE, PAGE_SIZE));
memset(pte, 0, PAGE_SIZE);
set_pmd(pme, __pmd(__pa(pte) + _PAGE_KERNEL_TABLE));
 
@@ -114,7 +114,7 @@ static void __init fixedrange_init(void)
pgd = swapper_pg_dir + pgd_index(vaddr);
pud = pud_offset(pgd, vaddr);
pmd = pmd_offset(pud, vaddr);
-   fixmap_pmd_p = (pmd_t *) 

[PATCH 05/30] mm: nobootmem: remove dead code

2018-09-14 Thread Mike Rapoport
Several bootmem functions and macros are not used. Remove them.

Signed-off-by: Mike Rapoport 
---
 include/linux/bootmem.h | 26 --
 mm/nobootmem.c  | 35 ---
 2 files changed, 61 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index fce6278..b74bafd1 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -36,17 +36,6 @@ extern void free_bootmem_node(pg_data_t *pgdat,
 extern void free_bootmem(unsigned long physaddr, unsigned long size);
 extern void free_bootmem_late(unsigned long physaddr, unsigned long size);
 
-/*
- * Flags for reserve_bootmem (also if CONFIG_HAVE_ARCH_BOOTMEM_NODE,
- * the architecture-specific code should honor this).
- *
- * If flags is BOOTMEM_DEFAULT, then the return value is always 0 (success).
- * If flags contains BOOTMEM_EXCLUSIVE, then -EBUSY is returned if the memory
- * already was reserved.
- */
-#define BOOTMEM_DEFAULT0
-#define BOOTMEM_EXCLUSIVE  (1<<0)
-
 extern void *__alloc_bootmem(unsigned long size,
 unsigned long align,
 unsigned long goal);
@@ -73,13 +62,6 @@ void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
 extern void *__alloc_bootmem_low(unsigned long size,
 unsigned long align,
 unsigned long goal) __malloc;
-void *__alloc_bootmem_low_nopanic(unsigned long size,
-unsigned long align,
-unsigned long goal) __malloc;
-extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
- unsigned long size,
- unsigned long align,
- unsigned long goal) __malloc;
 
 /* We are using top down, so it is safe to use 0 here */
 #define BOOTMEM_LOW_LIMIT 0
@@ -92,8 +74,6 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
__alloc_bootmem(x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT)
 #define alloc_bootmem_align(x, align) \
__alloc_bootmem(x, align, BOOTMEM_LOW_LIMIT)
-#define alloc_bootmem_nopanic(x) \
-   __alloc_bootmem_nopanic(x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT)
 #define alloc_bootmem_pages(x) \
__alloc_bootmem(x, PAGE_SIZE, BOOTMEM_LOW_LIMIT)
 #define alloc_bootmem_pages_nopanic(x) \
@@ -104,17 +84,11 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
__alloc_bootmem_node_nopanic(pgdat, x, SMP_CACHE_BYTES, BOOTMEM_LOW_LIMIT)
 #define alloc_bootmem_pages_node(pgdat, x) \
__alloc_bootmem_node(pgdat, x, PAGE_SIZE, BOOTMEM_LOW_LIMIT)
-#define alloc_bootmem_pages_node_nopanic(pgdat, x) \
-   __alloc_bootmem_node_nopanic(pgdat, x, PAGE_SIZE, BOOTMEM_LOW_LIMIT)
 
 #define alloc_bootmem_low(x) \
__alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
-#define alloc_bootmem_low_pages_nopanic(x) \
-   __alloc_bootmem_low_nopanic(x, PAGE_SIZE, 0)
 #define alloc_bootmem_low_pages(x) \
__alloc_bootmem_low(x, PAGE_SIZE, 0)
-#define alloc_bootmem_low_pages_node(pgdat, x) \
-   __alloc_bootmem_low_node(pgdat, x, PAGE_SIZE, 0)
 
 /* FIXME: use MEMBLOCK_ALLOC_* variants here */
 #define BOOTMEM_ALLOC_ACCESSIBLE   0
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index d4d0cd4..44ce7de 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -404,38 +404,3 @@ void * __init __alloc_bootmem_low(unsigned long size, unsigned long align,
 {
return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
 }
-
-void * __init __alloc_bootmem_low_nopanic(unsigned long size,
- unsigned long align,
- unsigned long goal)
-{
-   return ___alloc_bootmem_nopanic(size, align, goal,
-   ARCH_LOW_ADDRESS_LIMIT);
-}
-
-/**
- * __alloc_bootmem_low_node - allocate low boot memory from a specific node
- * @pgdat: node to allocate from
- * @size: size of the request in bytes
- * @align: alignment of the region
- * @goal: preferred starting address of the region
- *
- * The goal is dropped if it can not be satisfied and the allocation will
- * fall back to memory below @goal.
- *
- * Allocation may fall back to any node in the system if the specified node
- * can not hold the requested memory.
- *
- * The function panics if the request can not be satisfied.
- *
- * Return: address of the allocated region.
- */
-void * __init __alloc_bootmem_low_node(pg_data_t *pgdat, unsigned long size,
-  unsigned long align, unsigned long goal)
-{
-   if (WARN_ON_ONCE(slab_is_available()))
-   return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
-
-   return ___alloc_bootmem_node(pgdat, size, align, goal,
-ARCH_LOW_ADDRESS_LIMIT);
-}
-- 
2.7.4



[PATCH 04/30] mm: remove bootmem allocator implementation.

2018-09-14 Thread Mike Rapoport
All architectures have been converted to use MEMBLOCK + NO_BOOTMEM. The
bootmem allocator implementation can be removed.

Signed-off-by: Mike Rapoport 
Acked-by: Michal Hocko 
---
 include/linux/bootmem.h |  16 -
 mm/bootmem.c| 811 
 2 files changed, 827 deletions(-)
 delete mode 100644 mm/bootmem.c

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index ee61ac3..fce6278 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -26,14 +26,6 @@ extern unsigned long max_pfn;
  */
 extern unsigned long long max_possible_pfn;
 
-extern unsigned long bootmem_bootmap_pages(unsigned long);
-
-extern unsigned long init_bootmem_node(pg_data_t *pgdat,
-  unsigned long freepfn,
-  unsigned long startpfn,
-  unsigned long endpfn);
-extern unsigned long init_bootmem(unsigned long addr, unsigned long memend);
-
 extern unsigned long free_all_bootmem(void);
 extern void reset_node_managed_pages(pg_data_t *pgdat);
 extern void reset_all_zones_managed_pages(void);
@@ -55,14 +47,6 @@ extern void free_bootmem_late(unsigned long physaddr, unsigned long size);
 #define BOOTMEM_DEFAULT0
 #define BOOTMEM_EXCLUSIVE  (1<<0)
 
-extern int reserve_bootmem(unsigned long addr,
-  unsigned long size,
-  int flags);
-extern int reserve_bootmem_node(pg_data_t *pgdat,
-   unsigned long physaddr,
-   unsigned long size,
-   int flags);
-
 extern void *__alloc_bootmem(unsigned long size,
 unsigned long align,
 unsigned long goal);
diff --git a/mm/bootmem.c b/mm/bootmem.c
deleted file mode 100644
index 97db0e8..000
--- a/mm/bootmem.c
+++ /dev/null
@@ -1,811 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- *  bootmem - A boot-time physical memory allocator and configurator
- *
- *  Copyright (C) 1999 Ingo Molnar
- *1999 Kanoj Sarcar, SGI
- *2008 Johannes Weiner
- *
- * Access to this subsystem has to be serialized externally (which is true
- * for the boot process anyway).
- */
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "internal.h"
-
-/**
- * DOC: bootmem overview
- *
- * Bootmem is a boot-time physical memory allocator and configurator.
- *
- * It is used early in the boot process before the page allocator is
- * set up.
- *
- * Bootmem is based on the most basic of allocators, a First Fit
- * allocator which uses a bitmap to represent memory. If a bit is 1,
- * the page is allocated and 0 if unallocated. To satisfy allocations
- * of sizes smaller than a page, the allocator records the Page Frame
- * Number (PFN) of the last allocation and the offset the allocation
- * ended at. Subsequent small allocations are merged together and
- * stored on the same page.
- *
- * The information used by the bootmem allocator is represented by
- * :c:type:`struct bootmem_data`. An array to hold up to %MAX_NUMNODES
- * such structures is statically allocated and then it is discarded
- * when the system initialization completes. Each entry in this array
- * corresponds to a node with memory. For UMA systems only entry 0 is
- * used.
- *
- * The bootmem allocator is initialized during early architecture
- * specific setup. Each architecture is required to supply a
- * :c:func:`setup_arch` function which, among other tasks, is
- * responsible for acquiring the necessary parameters to initialise
- * the boot memory allocator. These parameters define limits of usable
- * physical memory:
- *
- * * @min_low_pfn - the lowest PFN that is available in the system
- * * @max_low_pfn - the highest PFN that may be addressed by low
- *   memory (%ZONE_NORMAL)
- * * @max_pfn - the last PFN available to the system.
- *
- * After those limits are determined, the :c:func:`init_bootmem` or
- * :c:func:`init_bootmem_node` function should be called to initialize
- * the bootmem allocator. The UMA case should use the `init_bootmem`
- * function. It will initialize ``contig_page_data`` structure that
- * represents the only memory node in the system. In the NUMA case the
- * `init_bootmem_node` function should be called to initialize the
- * bootmem allocator for each node.
- *
- * Once the allocator is set up, it is possible to use either single
- * node or NUMA variant of the allocation APIs.
- */
-
-#ifndef CONFIG_NEED_MULTIPLE_NODES
-struct pglist_data __refdata contig_page_data = {
-   .bdata = &bootmem_node_data[0]
-};
-EXPORT_SYMBOL(contig_page_data);
-#endif
-
-unsigned long max_low_pfn;
-unsigned long min_low_pfn;
-unsigned long max_pfn;
-unsigned long long max_possible_pfn;
-
-bootmem_data_t bootmem_node_data[MAX_NUMNODES] __initdata;
-
-static struct list_head 
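
Note on 04/30: the DOC comment in the removed mm/bootmem.c describes a
first-fit allocator over a page bitmap that merges small allocations onto
the last partially used page. A much reduced userspace sketch of that
idea is shown here. It is not the removed implementation: alignment,
goals, nodes and freeing are left out, and model_bootmem_alloc() and the
MODEL_* constants are hypothetical names chosen for this note.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* hypothetical page size */
#define MODEL_NPAGES    8u      /* hypothetical number of managed pages */

static unsigned int model_bitmap;    /* bit n set = page n is allocated */
static unsigned int model_last_pfn;  /* page of the most recent allocation */
static unsigned int model_last_end;  /* offset where it ended; 0 = no room */

/* Allocate size bytes; returns a byte address, or -1 on failure. */
static long model_bootmem_alloc(size_t size)
{
	unsigned int npages, start, i;
	long addr;

	assert(size > 0);

	/* Small request: merge onto the partially used last page if it fits. */
	if (model_last_end != 0 && size <= MODEL_PAGE_SIZE - model_last_end) {
		addr = (long)model_last_pfn * MODEL_PAGE_SIZE + model_last_end;
		model_last_end = (model_last_end + size) % MODEL_PAGE_SIZE;
		return addr;
	}

	if (size > (size_t)MODEL_NPAGES * MODEL_PAGE_SIZE)
		return -1;

	/* First fit: take the lowest run of npages clear bits. */
	npages = (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	for (start = 0; start + npages <= MODEL_NPAGES; start++) {
		for (i = 0; i < npages; i++)
			if (model_bitmap & (1u << (start + i)))
				break;
		if (i < npages)
			continue;            /* run interrupted, try next start */
		for (i = 0; i < npages; i++)
			model_bitmap |= 1u << (start + i);
		model_last_pfn = start + npages - 1;
		model_last_end = size % MODEL_PAGE_SIZE;
		return (long)start * MODEL_PAGE_SIZE;
	}
	return -1;
}
```

In this model two requests of 100 and 200 bytes share page 0, while a
following full-page request takes page 1, which matches the merging
behaviour the DOC comment describes for sub-page allocations.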

[PATCH 03/30] mm: remove CONFIG_HAVE_MEMBLOCK

2018-09-14 Thread Mike Rapoport
All architectures use memblock for early memory management. There is no need
for the CONFIG_HAVE_MEMBLOCK configuration option.

Signed-off-by: Mike Rapoport 
---
 arch/alpha/Kconfig  |   1 -
 arch/arc/Kconfig|   1 -
 arch/arm/Kconfig|   1 -
 arch/arm64/Kconfig  |   1 -
 arch/c6x/Kconfig|   1 -
 arch/h8300/Kconfig  |   1 -
 arch/hexagon/Kconfig|   1 -
 arch/ia64/Kconfig   |   1 -
 arch/m68k/Kconfig   |   1 -
 arch/microblaze/Kconfig |   1 -
 arch/mips/Kconfig   |   1 -
 arch/nds32/Kconfig  |   1 -
 arch/nios2/Kconfig  |   1 -
 arch/openrisc/Kconfig   |   1 -
 arch/parisc/Kconfig |   1 -
 arch/powerpc/Kconfig|   1 -
 arch/riscv/Kconfig  |   1 -
 arch/s390/Kconfig   |   1 -
 arch/sh/Kconfig |   1 -
 arch/sparc/Kconfig  |   1 -
 arch/um/Kconfig |   1 -
 arch/unicore32/Kconfig  |   1 -
 arch/x86/Kconfig|   1 -
 arch/xtensa/Kconfig |   1 -
 drivers/of/fdt.c|   2 -
 drivers/of/of_reserved_mem.c|  13 +
 drivers/staging/android/ion/Kconfig |   2 +-
 fs/pstore/Kconfig   |   1 -
 include/linux/bootmem.h | 112 
 include/linux/memblock.h|   2 -
 include/linux/mm.h  |   2 +-
 lib/Kconfig.debug   |   3 +-
 mm/Kconfig  |   5 +-
 mm/Makefile |   2 +-
 mm/nobootmem.c  |   4 --
 mm/page_alloc.c |   4 +-
 36 files changed, 8 insertions(+), 168 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 04de6be..5b4f883 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -31,7 +31,6 @@ config ALPHA
select ODD_RT_SIGACTION
select OLD_SIGSUSPEND
select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67
-   select HAVE_MEMBLOCK
help
  The Alpha is a 64-bit general-purpose processor designed and
  marketed by the Digital Equipment Corporation of blessed memory,
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index 04ebead..5260440 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -37,7 +37,6 @@ config ARC
select HAVE_KERNEL_LZMA
select HAVE_KPROBES
select HAVE_KRETPROBES
-   select HAVE_MEMBLOCK
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_OPROFILE
select HAVE_PERF_EVENTS
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a961d70..33f4653 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -82,7 +82,6 @@ config ARM
select HAVE_KERNEL_XZ
select HAVE_KPROBES if !XIP_KERNEL && !CPU_ENDIAN_BE32 && !CPU_V7M
select HAVE_KRETPROBES if (HAVE_KPROBES)
-   select HAVE_MEMBLOCK
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_NMI
select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1795eaa..23ae619 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -134,7 +134,6 @@ config ARM64
select HAVE_GENERIC_DMA_COHERENT
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IRQ_TIME_ACCOUNTING
-   select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP if NUMA
select HAVE_NMI
select HAVE_PATA_PLATFORM
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index a641b0b..833fdb0 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -13,7 +13,6 @@ config C6X
select GENERIC_ATOMIC64
select GENERIC_IRQ_SHOW
select HAVE_ARCH_TRACEHOOK
-   select HAVE_MEMBLOCK
select SPARSE_IRQ
select IRQ_DOMAIN
select OF
diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index 5e89d40..d19c6b16 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -15,7 +15,6 @@ config H8300
select OF
select OF_IRQ
select OF_EARLY_FLATTREE
-   select HAVE_MEMBLOCK
select TIMER_OF
select H8300_TMR8
select HAVE_KERNEL_GZIP
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 24a6da9..d9ae82b 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -31,7 +31,6 @@ config HEXAGON
select GENERIC_CLOCKEVENTS_BROADCAST
select MODULES_USE_ELF_RELA
select GENERIC_CPU_DEVICES
-   select HAVE_MEMBLOCK
select ARCH_DISCARD_MEMBLOCK
---help---
  Qualcomm Hexagon is a processor architecture designed for high
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 2bf4ef7..36773de 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -26,7 +26,6 @@ config IA64
select HAVE_FUNCTION_TRACER
select TTY
select HAVE_ARCH_TRACEHOOK
- 

[PATCH 02/30] mm: remove CONFIG_NO_BOOTMEM

2018-09-14 Thread Mike Rapoport
All architectures select NO_BOOTMEM, so it is effectively 'Y' for every
kernel configuration and can therefore be removed.

Signed-off-by: Mike Rapoport 
---
 arch/alpha/Kconfig  |  1 -
 arch/arc/Kconfig|  1 -
 arch/arm/Kconfig|  1 -
 arch/arm64/Kconfig  |  1 -
 arch/c6x/Kconfig|  1 -
 arch/h8300/Kconfig  |  1 -
 arch/hexagon/Kconfig|  1 -
 arch/ia64/Kconfig   |  1 -
 arch/m68k/Kconfig   |  1 -
 arch/microblaze/Kconfig |  1 -
 arch/mips/Kconfig   |  1 -
 arch/nds32/Kconfig  |  1 -
 arch/nios2/Kconfig  |  1 -
 arch/openrisc/Kconfig   |  1 -
 arch/parisc/Kconfig |  1 -
 arch/powerpc/Kconfig|  1 -
 arch/riscv/Kconfig  |  1 -
 arch/s390/Kconfig   |  1 -
 arch/sh/Kconfig |  1 -
 arch/sparc/Kconfig  |  1 -
 arch/um/Kconfig |  1 -
 arch/unicore32/Kconfig  |  1 -
 arch/x86/Kconfig|  3 ---
 arch/xtensa/Kconfig |  1 -
 include/linux/bootmem.h | 36 ++--
 include/linux/mmzone.h  |  5 +
 mm/Kconfig  |  3 ---
 mm/Makefile |  7 +--
 mm/memblock.c   |  2 --
 29 files changed, 4 insertions(+), 75 deletions(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 620b0a7..04de6be 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -32,7 +32,6 @@ config ALPHA
select OLD_SIGSUSPEND
select CPU_NO_EFFICIENT_FFS if !ALPHA_EV67
select HAVE_MEMBLOCK
-   select NO_BOOTMEM
help
  The Alpha is a 64-bit general-purpose processor designed and
  marketed by the Digital Equipment Corporation of blessed memory,
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index b4441b0..04ebead 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -44,7 +44,6 @@ config ARC
select HANDLE_DOMAIN_IRQ
select IRQ_DOMAIN
select MODULES_USE_ELF_RELA
-   select NO_BOOTMEM
select OF
select OF_EARLY_FLATTREE
select OF_RESERVED_MEM
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4607d32..a961d70 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -100,7 +100,6 @@ config ARM
select IRQ_FORCED_THREADING
select MODULES_USE_ELF_REL
select NEED_DMA_MAP_STATE
-   select NO_BOOTMEM
select OF_EARLY_FLATTREE if OF
select OF_RESERVED_MEM if OF
select OLD_SIGACTION
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0128d84..1795eaa 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -156,7 +156,6 @@ config ARM64
select MULTI_IRQ_HANDLER
select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH
-   select NO_BOOTMEM
select OF
select OF_EARLY_FLATTREE
select OF_RESERVED_MEM
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index 85ed568..a641b0b 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -14,7 +14,6 @@ config C6X
select GENERIC_IRQ_SHOW
select HAVE_ARCH_TRACEHOOK
select HAVE_MEMBLOCK
-   select NO_BOOTMEM
select SPARSE_IRQ
select IRQ_DOMAIN
select OF
diff --git a/arch/h8300/Kconfig b/arch/h8300/Kconfig
index 0b334b6..5e89d40 100644
--- a/arch/h8300/Kconfig
+++ b/arch/h8300/Kconfig
@@ -16,7 +16,6 @@ config H8300
select OF_IRQ
select OF_EARLY_FLATTREE
select HAVE_MEMBLOCK
-   select NO_BOOTMEM
select TIMER_OF
select H8300_TMR8
select HAVE_KERNEL_GZIP
diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 3ba6873..24a6da9 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -33,7 +33,6 @@ config HEXAGON
select GENERIC_CPU_DEVICES
select HAVE_MEMBLOCK
select ARCH_DISCARD_MEMBLOCK
-   select NO_BOOTMEM
---help---
  Qualcomm Hexagon is a processor architecture designed for high
  performance and low power across a wide variety of applications.
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 8b4a0c17..2bf4ef7 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -28,7 +28,6 @@ config IA64
select HAVE_ARCH_TRACEHOOK
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
-   select NO_BOOTMEM
select HAVE_VIRT_CPU_ACCOUNTING
select ARCH_HAS_DMA_MARK_CLEAN
select ARCH_HAS_SG_CHAIN
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index 0705537..8c7111d 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -29,7 +29,6 @@ config M68K
select DMA_NONCOHERENT_OPS if HAS_DMA
select HAVE_MEMBLOCK
select ARCH_DISCARD_MEMBLOCK
-   select NO_BOOTMEM
 
 config CPU_BIG_ENDIAN
def_bool y
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index ace5c5b..56379b9 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -28,7 +28,6 @@ config MICROBLAZE
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
-   

[PATCH 01/30] mips: switch to NO_BOOTMEM

2018-09-14 Thread Mike Rapoport
MIPS already has memblock support and all the memory is already registered
with it.

This patch replaces bootmem memory reservations with memblock ones and
removes the bootmem initialization.

Since memblock allocates memory in top-down mode, the memblock limit is set
to max_low_pfn to prevent allocations from high memory.

To keep the exceptions base in the lower 512M of physical memory, its
allocation in arch/mips/kernel/traps.c::traps_init() uses bottom-up
mode.
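
As a rough illustration of the two allocation directions described above,
here is a minimal user-space sketch. It is not the memblock code:
toy_region, toy_alloc and the limit parameter are hypothetical names made
up for this example, and a single free region stands in for the whole
memory map. Top-down mode returns the highest suitable address below the
limit, which is why the limit is capped at the end of low memory;
bottom-up mode returns the lowest suitable address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; not the kernel's memblock implementation. */
struct toy_region {
	uint64_t base;	/* first free address */
	uint64_t size;	/* number of free bytes */
};

/*
 * Pick `size` bytes with power-of-two alignment `align` from one free
 * region without going above `limit`. Returns 0 when nothing fits
 * (address 0 is treated as "no memory" in this sketch).
 */
static uint64_t toy_alloc(const struct toy_region *r, uint64_t size,
			  uint64_t align, uint64_t limit, bool bottom_up)
{
	uint64_t end = r->base + r->size;
	uint64_t addr;

	assert(align != 0 && (align & (align - 1)) == 0);

	if (end > limit)
		end = limit;		/* clamp to the allocation limit */
	if (end < r->base || end - r->base < size)
		return 0;

	if (bottom_up)
		addr = (r->base + align - 1) & ~(align - 1);	/* round up */
	else
		addr = (end - size) & ~(align - 1);		/* round down */

	if (addr < r->base || addr + size > end)
		return 0;
	return addr;
}
```

With a 1 GiB region starting at 0x100000 and a 512 MiB limit, a top-down
request for one 4 KiB page lands at 0x1ffff000, while the same request in
bottom-up mode lands at 0x100000.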

Signed-off-by: Mike Rapoport 
---
 arch/mips/Kconfig  |  1 +
 arch/mips/kernel/setup.c   | 99 --
 arch/mips/kernel/traps.c   |  3 ++
 arch/mips/loongson64/loongson-3/numa.c | 34 ++--
 arch/mips/sgi-ip27/ip27-memory.c   | 11 ++--
 5 files changed, 46 insertions(+), 102 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 54532f2..1b5fa1a 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -78,6 +78,7 @@ config MIPS
select RTC_LIB
select SYSCTL_EXCEPTION_TRACE
select VIRT_TO_BUS
+   select NO_BOOTMEM
 
 menu "Machine selection"
 
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 32fc11d..2fde53e 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -333,7 +333,7 @@ static void __init finalize_initrd(void)
 
maybe_bswap_initrd();
 
-   reserve_bootmem(__pa(initrd_start), size, BOOTMEM_DEFAULT);
+   memblock_reserve(__pa(initrd_start), size);
initrd_below_start_ok = 1;
 
pr_info("Initial ramdisk at: 0x%lx (%lu bytes)\n",
@@ -370,20 +370,10 @@ static void __init bootmem_init(void)
 
 #else  /* !CONFIG_SGI_IP27 */
 
-static unsigned long __init bootmap_bytes(unsigned long pages)
-{
-   unsigned long bytes = DIV_ROUND_UP(pages, 8);
-
-   return ALIGN(bytes, sizeof(long));
-}
-
 static void __init bootmem_init(void)
 {
unsigned long reserved_end;
-   unsigned long mapstart = ~0UL;
-   unsigned long bootmap_size;
phys_addr_t ramstart = PHYS_ADDR_MAX;
-   bool bootmap_valid = false;
int i;
 
/*
@@ -395,6 +385,8 @@ static void __init bootmem_init(void)
init_initrd();
reserved_end = (unsigned long) PFN_UP(__pa_symbol(&_end));
 
+   memblock_reserve(PHYS_OFFSET, reserved_end << PAGE_SHIFT);
+
/*
 * max_low_pfn is not a number of pages. The number of pages
 * of the system is given by 'max_low_pfn - min_low_pfn'.
@@ -442,9 +434,6 @@ static void __init bootmem_init(void)
if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end)))
continue;
 #endif
-   if (start >= mapstart)
-   continue;
-   mapstart = max(reserved_end, start);
}
 
if (min_low_pfn >= max_low_pfn)
@@ -456,9 +445,11 @@ static void __init bootmem_init(void)
/*
 * Reserve any memory between the start of RAM and PHYS_OFFSET
 */
-   if (ramstart > PHYS_OFFSET)
+   if (ramstart > PHYS_OFFSET) {
add_memory_region(PHYS_OFFSET, ramstart - PHYS_OFFSET,
  BOOT_MEM_RESERVED);
+   memblock_reserve(PHYS_OFFSET, ramstart - PHYS_OFFSET);
+   }
 
if (min_low_pfn > ARCH_PFN_OFFSET) {
pr_info("Wasting %lu bytes for tracking %lu unused pages\n",
@@ -483,52 +474,6 @@ static void __init bootmem_init(void)
max_low_pfn = PFN_DOWN(HIGHMEM_START);
}
 
-#ifdef CONFIG_BLK_DEV_INITRD
-   /*
-* mapstart should be after initrd_end
-*/
-   if (initrd_end)
-   mapstart = max(mapstart, (unsigned long)PFN_UP(__pa(initrd_end)));
-#endif
-
-   /*
-* check that mapstart doesn't overlap with any of
-* memory regions that have been reserved through eg. DTB
-*/
-   bootmap_size = bootmap_bytes(max_low_pfn - min_low_pfn);
-
-   bootmap_valid = memory_region_available(PFN_PHYS(mapstart),
-   bootmap_size);
-   for (i = 0; i < boot_mem_map.nr_map && !bootmap_valid; i++) {
-   unsigned long mapstart_addr;
-
-   switch (boot_mem_map.map[i].type) {
-   case BOOT_MEM_RESERVED:
-   mapstart_addr = PFN_ALIGN(boot_mem_map.map[i].addr +
-   boot_mem_map.map[i].size);
-   if (PHYS_PFN(mapstart_addr) < mapstart)
-   break;
-
-   bootmap_valid = memory_region_available(mapstart_addr,
-   bootmap_size);
-   if (bootmap_valid)
-   mapstart = PHYS_PFN(mapstart_addr);
-   break;
-   default:
-   break;
-   }
-   }
-
-   if 

[PATCH 00/30] mm: remove bootmem allocator

2018-09-14 Thread Mike Rapoport
Hi,

These patches switch early memory management to use memblock directly
without any bootmem compatibility wrappers. As a result, both bootmem and
nobootmem are removed.

The patchset survived allyesconfig builds on arm, arm64, i386, mips, nds32,
parisc, powerpc, riscv, s390 and x86 and most of the *_defconfig builds for
all architectures except unicore32.

The patchset is based on v4.19-rc3-mmotm-2018-09-12-16-40, so I needed a
small PSI fix from [1] for some of the builds.

I did my best to verify that the failures are not caused by my changes, but
I may have missed something. Most defconfig build failures I've seen were
caused by assembler being unhappy about unsupported opcode, wrong encoding
or something else. Some builds for allyesconfig also failed because of it
and others failed because of symbol mismatch in spi-sprd or n_hdlc.

I've done boot testing on real x86-64 and Power8 machines and on
qemu-system-alpha and qemu-system-mips64el VMs.

I've tried to keep the distribution list as small as possible, but it's
still pretty long; my apologies for spamming.

Changes since RFC:
* updated MIPS conversion to nobootmem: 
  - set memblock limit to max_low_pfn to avoid allocation attempts from
high memory
  - use bottom-up mode for allocation of the exceptions base
* added elaborate changelogs
* updated boot-time-mm documentation

[1] https://lkml.org/lkml/2018/9/13/88

Mike Rapoport (30):
  mips: switch to NO_BOOTMEM
  mm: remove CONFIG_NO_BOOTMEM
  mm: remove CONFIG_HAVE_MEMBLOCK
  mm: remove bootmem allocator implementation.
  mm: nobootmem: remove dead code
  memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*
  memblock: remove _virt from APIs returning virtual address
  memblock: replace alloc_bootmem_align with memblock_alloc
  memblock: replace alloc_bootmem_low with memblock_alloc_low
  memblock: replace __alloc_bootmem_node_nopanic with
memblock_alloc_try_nid_nopanic
  memblock: replace alloc_bootmem_pages_nopanic with
memblock_alloc_nopanic
  memblock: replace alloc_bootmem_low with memblock_alloc_low
  memblock: replace __alloc_bootmem_nopanic with
memblock_alloc_from_nopanic
  memblock: add align parameter to memblock_alloc_node()
  memblock: replace alloc_bootmem_pages_node with memblock_alloc_node
  memblock: replace __alloc_bootmem_node with appropriate memblock_ API
  memblock: replace alloc_bootmem_node with memblock_alloc_node
  memblock: replace alloc_bootmem_low_pages with memblock_alloc_low
  memblock: replace alloc_bootmem_pages with memblock_alloc
  memblock: replace __alloc_bootmem with memblock_alloc_from
  memblock: replace alloc_bootmem with memblock_alloc
  mm: nobootmem: remove bootmem allocation APIs
  memblock: replace free_bootmem{_node} with memblock_free
  memblock: replace free_bootmem_late with memblock_free_late
  memblock: rename free_all_bootmem to memblock_free_all
  memblock: rename __free_pages_bootmem to memblock_free_pages
  mm: remove nobootmem
  memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variants
  mm: remove include/linux/bootmem.h
  docs/boot-time-mm: remove bootmem documentation

 Documentation/core-api/boot-time-mm.rst |  71 +--
 arch/alpha/Kconfig  |   2 -
 arch/alpha/kernel/core_cia.c|   4 +-
 arch/alpha/kernel/core_irongate.c   |   4 +-
 arch/alpha/kernel/core_marvel.c |   6 +-
 arch/alpha/kernel/core_titan.c  |   2 +-
 arch/alpha/kernel/core_tsunami.c|   2 +-
 arch/alpha/kernel/pci-noop.c|   6 +-
 arch/alpha/kernel/pci.c |   6 +-
 arch/alpha/kernel/pci_iommu.c   |  14 +-
 arch/alpha/kernel/setup.c   |   3 +-
 arch/alpha/kernel/sys_nautilus.c|   2 +-
 arch/alpha/mm/init.c|   4 +-
 arch/alpha/mm/numa.c|   1 -
 arch/arc/Kconfig|   2 -
 arch/arc/kernel/unwind.c|   6 +-
 arch/arc/mm/highmem.c   |   4 +-
 arch/arc/mm/init.c  |   3 +-
 arch/arm/Kconfig|   2 -
 arch/arm/kernel/devtree.c   |   1 -
 arch/arm/kernel/setup.c |   5 +-
 arch/arm/mach-omap2/omap_hwmod.c|   8 +-
 arch/arm/mm/dma-mapping.c   |   1 -
 arch/arm/mm/init.c  |   3 +-
 arch/arm/mm/mmu.c   |   2 +-
 arch/arm/xen/mm.c   |   1 -
 arch/arm/xen/p2m.c  |   2 +-
 arch/arm64/Kconfig  |   2 -
 arch/arm64/kernel/acpi.c|   1 -
 arch/arm64/kernel/acpi_numa.c   |   1 -
 arch/arm64/kernel/setup.c   |   3 +-
 arch/arm64/mm/dma-mapping.c |   2 +-
 arch/arm64/mm/init.c|   5 +-
 arch/arm64/mm/kasan_init.c  |   3 +-
 arch/arm64/mm/mmu.c   

[PATCH] serial: cpm_uart: return immediately from console poll

2018-09-14 Thread Christophe Leroy
kgdb expects the poll function to return immediately, returning
NO_POLL_CHAR when no character is available. Replace the busy-wait on an
empty buffer descriptor with an immediate return.
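
The contract can be sketched in user space as follows. This is only an
illustration under stated assumptions: sketch_rx, sketch_poll_get_char and
SKETCH_NO_POLL_CHAR are hypothetical stand-ins, not the driver's
structures or the kernel's definition, and a plain byte buffer plays the
role of the receive buffer descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in sentinel; chosen outside the range of any byte value. */
#define SKETCH_NO_POLL_CHAR 0x00ff0000

/* Hypothetical receive state: bytes the "hardware" has delivered so far. */
struct sketch_rx {
	const unsigned char *buf;
	size_t len;	/* bytes available in buf */
	size_t pos;	/* next byte to hand out */
};

/*
 * Non-blocking poll: return the next byte, or the sentinel at once when
 * nothing is pending. There is deliberately no loop waiting for data.
 */
static int sketch_poll_get_char(struct sketch_rx *rx)
{
	assert(rx != NULL);

	if (rx->pos >= rx->len)
		return SKETCH_NO_POLL_CHAR;
	return rx->buf[rx->pos++];
}
```

A caller that sees the sentinel simply polls again later, so the debugger
stays responsive even when the line is idle.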

Fixes: f5316b4aea024 ("kgdb,8250,pl011: Return immediately from console poll")
Cc: Jason Wessel 
Cc: 
Signed-off-by: Christophe Leroy 
---
 drivers/tty/serial/cpm_uart/cpm_uart_core.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
index cd3f3fc4e0a5..280acc4dfa90 100644
--- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
+++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
@@ -1093,8 +1093,8 @@ static int poll_wait_key(char *obuf, struct uart_cpm_port *pinfo)
/* Get the address of the host memory buffer.
 */
bdp = pinfo->rx_cur;
-   while (bdp->cbd_sc & BD_SC_EMPTY)
-   ;
+   if (bdp->cbd_sc & BD_SC_EMPTY)
+   return NO_POLL_CHAR;
 
/* If the buffer address is in the CPM DPRAM, don't
 * convert it.
@@ -1129,7 +1129,11 @@ static int cpm_get_poll_char(struct uart_port *port)
poll_chars = 0;
}
if (poll_chars <= 0) {
-   poll_chars = poll_wait_key(poll_buf, pinfo);
+   int ret = poll_wait_key(poll_buf, pinfo);
+
+   if (ret == NO_POLL_CHAR)
+   return ret;
+   poll_chars = ret;
pollp = poll_buf;
}
poll_chars--;
-- 
2.13.3



Re: [PATCH 2/3] powerpc: Add system call table generation support

2018-09-14 Thread Arnd Bergmann
On Fri, Sep 14, 2018 at 10:33 AM Firoz Khan  wrote:

> ---
>  arch/powerpc/kernel/syscalls/Makefile   |  51 
>  arch/powerpc/kernel/syscalls/syscall_32.tbl | 378 
> 
>  arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 +++
>  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
>  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  38 +++

I think you should only need a single .tbl  input file here.
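
A minimal sketch of that idea, assuming a table whose second column names
the ABI, as in the "common" rows quoted further down. Everything here is
hypothetical: sketch_row, sketch_tbl and sketch_count_for_abi are names
invented for the example, and the two ABI-specific rows are made up purely
to show the filtering.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one table row. */
struct sketch_row {
	int nr;
	const char *abi;	/* "common", "32" or "64" in this sketch */
	const char *name;
};

static const struct sketch_row sketch_tbl[] = {
	{ 383, "common", "statx" },
	{ 384, "common", "pkey_alloc" },
	{ 900, "32",     "only32" },	/* made-up row */
	{ 901, "64",     "only64" },	/* made-up row */
};

/* Count the rows one output file would get: "common" plus its own ABI. */
static size_t sketch_count_for_abi(const char *abi)
{
	size_t i, n = 0;

	assert(abi != NULL);
	for (i = 0; i < sizeof(sketch_tbl) / sizeof(sketch_tbl[0]); i++) {
		if (strcmp(sketch_tbl[i].abi, "common") == 0 ||
		    strcmp(sketch_tbl[i].abi, abi) == 0)
			n++;
	}
	return n;
}
```

Each generated header then comes from the same input, filtered by a
different ABI name.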


> +
> +systbl_abi_syscall_table_32 := 32
> +$(out)/syscall_table_32.h: $(syscall32) $(systbl)
> +   $(call if_changed,systbl)
> +
> +systbl_abi_syscall_table_64 := 64
> +$(out)/syscall_table_64.h: $(syscall64) $(systbl)
> +   $(call if_changed,systbl)
> +
> +systbl_abi_syscall_table_c32 := c32
> +$(out)/syscall_table_c32.h: $(syscall32) $(systbl)
> +   $(call if_changed,systbl)

And here you need a fourth output file for the SPU table on ppc64.

> +383 common  statx   sys_statx
> +384 common  pkey_alloc  sys_pkey_alloc
> +385 common  pkey_free   sys_pkey_free
> +386 common  pkey_mprotect   sys_pkey_mprotect

This also misses rseq and io_pgetevents.

   Arnd


[PATCH v2 5/5] arm64: dts: add LX2160ARDB board support

2018-09-14 Thread Vabhav Sharma
LX2160A reference design board (RDB) is a high-performance
computing, evaluation, and development platform with LX2160A
SoC.

Signed-off-by: Priyanka Jain 
Signed-off-by: Sriram Dash 
Signed-off-by: Vabhav Sharma 
---
 arch/arm64/boot/dts/freescale/Makefile|  1 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts | 88 +++
 2 files changed, 89 insertions(+)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts

diff --git a/arch/arm64/boot/dts/freescale/Makefile b/arch/arm64/boot/dts/freescale/Makefile
index 86e18ad..445b72b 100644
--- a/arch/arm64/boot/dts/freescale/Makefile
+++ b/arch/arm64/boot/dts/freescale/Makefile
@@ -13,3 +13,4 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb
+dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb
diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
new file mode 100644
index 000..1bbe663
--- /dev/null
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
@@ -0,0 +1,88 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+//
+// Device Tree file for LX2160ARDB
+//
+// Copyright 2018 NXP
+
+/dts-v1/;
+
+#include "fsl-lx2160a.dtsi"
+
+/ {
+   model = "NXP Layerscape LX2160ARDB";
+   compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
+
+   chosen {
+   stdout-path = "serial0:115200n8";
+   };
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+   i2c-mux@77 {
+   compatible = "nxp,pca9547";
+   reg = <0x77>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   i2c@2 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x2>;
+
+   power-monitor@40 {
+   compatible = "ti,ina220";
+   reg = <0x40>;
+   shunt-resistor = <1000>;
+   };
+   };
+
+   i2c@3 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x3>;
+
+   temperature-sensor@4c {
+   compatible = "nxp,sa56004";
+   reg = <0x4c>;
+   };
+
+   temperature-sensor@4d {
+   compatible = "nxp,sa56004";
+   reg = <0x4d>;
+   };
+   };
+   };
+};
+
+ {
+   status = "okay";
+
+   rtc@51 {
+   compatible = "nxp,pcf2129";
+   reg = <0x51>;
+   // IRQ10_B
+   interrupts = <0 150 0x4>;
+   };
+
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
+
+ {
+   status = "okay";
+};
-- 
2.7.4



[PATCH v2 4/5] arm64: dts: add QorIQ LX2160A SoC support

2018-09-14 Thread Vabhav Sharma
LX2160A SoC is based on Layerscape Chassis Generation 3.2 Architecture.

LX2160A features 16 advanced 64-bit ARMv8 Cortex-A72 processor cores in
8 clusters, CCN508, GICv3, two 64-bit DDR4 memory controllers, 8 I2C
controllers, 3 DSPI, 2 eSDHC, 2 USB 3.0, MMU-500, 3 SATA, 4 PL011 SBSA
UARTs, etc.

Signed-off-by: Ramneek Mehresh 
Signed-off-by: Zhang Ying-22455 
Signed-off-by: Nipun Gupta 
Signed-off-by: Priyanka Jain 
Signed-off-by: Yogesh Gaur 
Signed-off-by: Sriram Dash 
Signed-off-by: Vabhav Sharma 
---
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi | 693 +
 1 file changed, 693 insertions(+)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi

diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
new file mode 100644
index 000..46eea16
--- /dev/null
+++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
@@ -0,0 +1,693 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+//
+// Device Tree Include file for Layerscape-LX2160A family SoC.
+//
+// Copyright 2018 NXP
+
+#include 
+
+/memreserve/ 0x8000 0x0001;
+
+/ {
+   compatible = "fsl,lx2160a";
+   interrupt-parent = <>;
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   // 8 clusters having 2 Cortex-A72 cores each
+   cpu@0 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x0>;
+   clocks = < 1 0>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@1 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x1>;
+   clocks = < 1 0>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@100 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x100>;
+   clocks = < 1 1>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@101 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x101>;
+   clocks = < 1 1>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@200 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x200>;
+   clocks = < 1 2>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@201 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x201>;
+   clocks = < 1 2>;
+   d-cache-size = <0x8000>;
+   d-cache-line-size = <64>;
+   d-cache-sets = <128>;
+   i-cache-size = <0xC000>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <192>;
+   next-level-cache = <_l2>;
+   };
+
+   cpu@300 {
+   device_type = "cpu";
+   compatible = "arm,cortex-a72";
+   reg = <0x300>;
+   clocks 

[PATCH v2 3/5] drivers: clk-qoriq: Add clockgen support for lx2160a

2018-09-14 Thread Vabhav Sharma
From: Yogesh Gaur 

Add clockgen support for lx2160a by adding an entry for the
'fsl,lx2160a-clockgen' compatible.

Signed-off-by: Tang Yuantian 
Signed-off-by: Yogesh Gaur 
Signed-off-by: Vabhav Sharma 
Acked-by: Stephen Boyd 
---
 drivers/clk/clk-qoriq.c | 14 +-
 drivers/cpufreq/qoriq-cpufreq.c |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c
index 3a1812f..e9ae70b 100644
--- a/drivers/clk/clk-qoriq.c
+++ b/drivers/clk/clk-qoriq.c
@@ -79,7 +79,7 @@ struct clockgen_chipinfo {
const struct clockgen_muxinfo *cmux_groups[2];
const struct clockgen_muxinfo *hwaccel[NUM_HWACCEL];
void (*init_periph)(struct clockgen *cg);
-   int cmux_to_group[NUM_CMUX]; /* -1 terminates if fewer than NUM_CMUX */
+   int cmux_to_group[NUM_CMUX+1]; /* -1 terminated; extra slot holds the terminator */
u32 pll_mask;   /* 1 << n bit set if PLL n is valid */
u32 flags;  /* CG_xxx */
 };
@@ -570,6 +570,17 @@ static const struct clockgen_chipinfo chipinfo[] = {
.flags = CG_VER3 | CG_LITTLE_ENDIAN,
},
{
+   .compat = "fsl,lx2160a-clockgen",
+   .cmux_groups = {
+   _cmux_cga12, _cmux_cgb
+   },
+   .cmux_to_group = {
+   0, 0, 0, 0, 1, 1, 1, 1, -1
+   },
+   .pll_mask = 0x37,
+   .flags = CG_VER3 | CG_LITTLE_ENDIAN,
+   },
+   {
.compat = "fsl,p2041-clockgen",
.guts_compat = "fsl,qoriq-device-config-1.0",
.init_periph = p2041_init_periph,
@@ -1424,6 +1435,7 @@ CLK_OF_DECLARE(qoriq_clockgen_ls1043a, "fsl,ls1043a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1046a, "fsl,ls1046a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls1088a, "fsl,ls1088a-clockgen", clockgen_init);
 CLK_OF_DECLARE(qoriq_clockgen_ls2080a, "fsl,ls2080a-clockgen", clockgen_init);
+CLK_OF_DECLARE(qoriq_clockgen_lx2160a, "fsl,lx2160a-clockgen", clockgen_init);
 
 /* Legacy nodes */
 CLK_OF_DECLARE(qoriq_sysclk_1, "fsl,qoriq-sysclk-1.0", sysclk_init);
diff --git a/drivers/cpufreq/qoriq-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c
index 3d773f6..83921b7 100644
--- a/drivers/cpufreq/qoriq-cpufreq.c
+++ b/drivers/cpufreq/qoriq-cpufreq.c
@@ -295,6 +295,7 @@ static const struct of_device_id node_matches[] __initconst = {
{ .compatible = "fsl,ls1046a-clockgen", },
{ .compatible = "fsl,ls1088a-clockgen", },
{ .compatible = "fsl,ls2080a-clockgen", },
+   { .compatible = "fsl,lx2160a-clockgen", },
{ .compatible = "fsl,p4080-clockgen", },
{ .compatible = "fsl,qoriq-clockgen-1.0", },
{ .compatible = "fsl,qoriq-clockgen-2.0", },
-- 
2.7.4



[PATCH v2 2/5] soc/fsl/guts: Add compatible string for LX2160A

2018-09-14 Thread Vabhav Sharma
Add the compatible string "fsl,lx2160a-dcfg" to
initialize the guts driver for LX2160A.
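
For readers unfamiliar with match tables, the lookup can be pictured with
the user-space sketch below. It is an illustration only: sketch_match,
sketch_guts_match and sketch_is_supported are hypothetical names, and a
plain string comparison stands in for the kernel's device-tree matching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified match entry: just the compatible string. */
struct sketch_match {
	const char *compatible;
};

static const struct sketch_match sketch_guts_match[] = {
	{ "fsl,ls1088a-dcfg" },
	{ "fsl,ls1012a-dcfg" },
	{ "fsl,ls1046a-dcfg" },
	{ "fsl,lx2160a-dcfg" },	/* the entry this patch adds */
	{ NULL }		/* empty sentinel ends the table */
};

/* Return 1 if the full compatible string appears in the table. */
static int sketch_is_supported(const char *compatible)
{
	const struct sketch_match *m;

	assert(compatible != NULL);
	for (m = sketch_guts_match; m->compatible != NULL; m++) {
		if (strcmp(m->compatible, compatible) == 0)
			return 1;
	}
	return 0;
}
```

Note that the whole string has to match, vendor prefix included, so a
node has to carry exactly "fsl,lx2160a-dcfg" to be picked up.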

Signed-off-by: Vabhav Sharma 
---
 drivers/soc/fsl/guts.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
index 302e0c8..5e1e633 100644
--- a/drivers/soc/fsl/guts.c
+++ b/drivers/soc/fsl/guts.c
@@ -222,6 +222,7 @@ static const struct of_device_id fsl_guts_of_match[] = {
{ .compatible = "fsl,ls1088a-dcfg", },
{ .compatible = "fsl,ls1012a-dcfg", },
{ .compatible = "fsl,ls1046a-dcfg", },
+   { .compatible = "fsl,lx2160a-dcfg", },
{}
 };
 MODULE_DEVICE_TABLE(of, fsl_guts_of_match);
-- 
2.7.4



[PATCH v2 1/5] dt-bindings: arm64: add compatible for LX2160A

2018-09-14 Thread Vabhav Sharma
Add compatible strings for the LX2160A SoC and its QDS and RDB boards.

Signed-off-by: Vabhav Sharma 
---
 Documentation/devicetree/bindings/arm/fsl.txt | 12 
 1 file changed, 12 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/fsl.txt b/Documentation/devicetree/bindings/arm/fsl.txt
index cdb9dd7..76256bd 100644
--- a/Documentation/devicetree/bindings/arm/fsl.txt
+++ b/Documentation/devicetree/bindings/arm/fsl.txt
@@ -218,3 +218,15 @@ Required root node properties:
 LS2088A ARMv8 based RDB Board
 Required root node properties:
 - compatible = "fsl,ls2088a-rdb", "fsl,ls2088a";
+
+LX2160A SoC
+Required root node properties:
+- compatible = "fsl,lx2160a";
+
+LX2160A ARMv8 based QDS Board
+Required root node properties:
+- compatible = "fsl,lx2160a-qds", "fsl,lx2160a";
+
+LX2160A ARMv8 based RDB Board
+Required root node properties:
+- compatible = "fsl,lx2160a-rdb", "fsl,lx2160a";
-- 
2.7.4



[PATCH v2 0/5] arm64: dts: NXP: add basic dts file for LX2160A SoC

2018-09-14 Thread Vabhav Sharma
Changes for v2:
- Modified cmux_to_group array to include -1 terminator
- Revert NUM_CMUX to original value 8 from 16
- Remove “As LX2160A is 16 core, so modified value for NUM_CMUX”
  in patch "[PATCH 3/5] drivers: clk-qoriq: Add clockgen support for
  lx2160a" description
- Populated cache properties for L1 and L2 cache in lx2160a device-tree.
- Removed reboot node from lx2160a device-tree as PSCI is implemented.
- Removed incorrect comment for timer node interrupt property in
  lx2160a device-tree.
- Modified pmu node compatible property from "arm,armv8-pmuv3" to
  "arm,cortex-a72-pmu" in lx2160a device-tree
- Non-standard aliases removed in lx2160a rdb board device-tree
- Updated i2c child nodes to generic name in lx2160a rdb device-tree.
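
The first two v2 items can be pictured with a small user-space sketch. It
assumes a consumer that relies only on the -1 terminator to find the end
of the table; sketch_cmux_to_group and sketch_count_cmux are hypothetical
names, and the array contents mirror the lx2160a entry in patch 3/5.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CMUX 8

/*
 * All NUM_CMUX slots are in use, so the array needs one extra element
 * to leave room for the -1 terminator.
 */
static const int sketch_cmux_to_group[NUM_CMUX + 1] = {
	0, 0, 0, 0, 1, 1, 1, 1, -1
};

/* Count valid entries by walking the map until the -1 terminator. */
static int sketch_count_cmux(const int *map)
{
	int n = 0;

	assert(map != NULL);
	while (map[n] >= 0)
		n++;
	return n;
}
```

With only NUM_CMUX elements, a fully populated table would have no slot
left for the terminator, and a walk like this one would run past the end
of the array.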

Changes for v1:
- Add compatible string for LX2160A clockgen support
- Add compatible string to initialize LX2160A guts driver
- Add compatible string for LX2160A support in dt-bindings
- Add dts file to enable support for LX2160A SoC and LX2160A RDB
  (Reference design board)

Vabhav Sharma (4):
  dt-bindings: arm64: add compatible for LX2160A
  soc/fsl/guts: Add compatible string for LX2160A
  arm64: dts: add QorIQ LX2160A SoC support
  arm64: dts: add LX2160ARDB board support

Yogesh Gaur (1):
  drivers: clk-qoriq: Add clockgen support for lx2160a

 Documentation/devicetree/bindings/arm/fsl.txt |  12 +
 arch/arm64/boot/dts/freescale/Makefile|   1 +
 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts |  88 +++
 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi| 693 ++
 drivers/clk/clk-qoriq.c   |  14 +-
 drivers/cpufreq/qoriq-cpufreq.c   |   1 +
 drivers/soc/fsl/guts.c|   1 +
 7 files changed, 809 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a-rdb.dts
 create mode 100644 arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi

-- 
2.7.4