Re: [RFC][PATCH 0/2] Add support for using reserved memory for ima buffer pass

2020-05-06 Thread Prakhar Srivastava

Hi Mark,

This patch set currently only addresses the pure DT implementation.
EFI and ACPI implementations will be posted in subsequent patchsets.

The logs are intended to be carried over kexec; once read, the
logs are no longer needed. In a prior conversation with James (
https://lore.kernel.org/linux-arm-kernel/0053eb68-0905-4679-c97a-00c5cb6f1...@arm.com/)
the approach of using a chosen node turned out not to support this case.

The DT entry makes the reservation permanent and thus doesn't need
kernel segments to be used for this. Using a chosen node with
reserved memory, on the other hand, only changes the node information;
the memory is still reserved via the reserved-memory section.
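
For illustration, a minimal sketch (not this patch set's actual API) of how such
a reserved region could be located from the device tree and mapped; the node
path and helper name below are assumptions made up for this example:

#include <linux/of.h>
#include <linux/of_address.h>
#include <linux/io.h>

/* illustration only: look up a hypothetical /reserved-memory/ima-kexec-buffer node */
static void *ima_map_reserved_region(size_t *sizep)
{
	struct device_node *np;
	struct resource res;

	np = of_find_node_by_path("/reserved-memory/ima-kexec-buffer");
	if (!np)
		return NULL;

	if (of_address_to_resource(np, 0, &res)) {
		of_node_put(np);
		return NULL;
	}
	of_node_put(np);

	*sizep = resource_size(&res);
	/* map the region so the previous kernel's log can be read */
	return memremap(res.start, *sizep, MEMREMAP_WB);
}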

On 5/5/20 2:59 AM, Mark Rutland wrote:

Hi Prakhar,

On Mon, May 04, 2020 at 01:38:27PM -0700, Prakhar Srivastava wrote:

IMA during kexec (kexec file load) verifies the kernel signature and measures
the signature of the kernel. The signature in the logs can be used to verify the
authenticity of the kernel. The logs do not get carried over kexec and thus
remote attestation cannot verify the signature of the running kernel.

Introduce an ABI to carry forward the ima logs over kexec.
Memory reserved via a device tree reservation can be used to store the logs and
read them back via the of_* functions.


This flow needs to work for:

1) Pure DT
2) DT + EFI memory map
3) ACPI + EFI memory map

... and if this is just for transiently passing the log, I don't think
that a reserved memory region is the right thing to use, since they're
supposed to be more permanent.

This sounds analogous to passing the initrd, and should probably use
properties under the chosen node (which can be used for all three boot
flows above).

For reference, how big is the IMA log likely to be? Does it need
physically contiguous space?


It purely depends on the policy used and the modules/files that are
accessed. For my local testing, over a kexec session the log is
about 30KB.

The current implementation expects enough contiguous memory to be allocated to
carry forward the logs. If the log size exceeds the reserved memory, the
call will fail.

Thanks,
Prakhar Srivastava


Thanks,
Mark.



Reserved memory stores the size (sizeof(size_t)) of the buffer at the starting
address, followed by the IMA log contents.
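
As a rough illustration of that layout (not the patch's code; function names and
the memcpy-based access are assumptions for this sketch), writing and reading
the buffer could look like:

#include <linux/string.h>

/* illustration: first sizeof(size_t) bytes hold the log length, the log follows */
static void ima_buffer_write(void *base, const void *log, size_t log_size)
{
	memcpy(base, &log_size, sizeof(log_size));	/* store the size first */
	memcpy(base + sizeof(log_size), log, log_size);	/* then the log contents */
}

static size_t ima_buffer_read_size(const void *base)
{
	size_t sz;

	memcpy(&sz, base, sizeof(sz));	/* size stored at the starting address */
	return sz;
}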

Tested on:
   arm64 with Uboot

Prakhar Srivastava (2):
   Add a layer of abstraction to use the memory reserved by device tree
 for ima buffer pass.
   Add support for ima buffer pass using reserved memory for arm64 kexec.
 Update the arch specific code path in kexec file load to store the
 ima buffer in the reserved memory. The same reserved memory is read
 on kexec or cold boot.

  arch/arm64/Kconfig |   1 +
  arch/arm64/include/asm/ima.h   |  22 
  arch/arm64/include/asm/kexec.h |   5 +
  arch/arm64/kernel/Makefile |   1 +
  arch/arm64/kernel/ima_kexec.c  |  64 ++
  arch/arm64/kernel/machine_kexec_file.c |   1 +
  arch/powerpc/include/asm/ima.h |   3 +-
  arch/powerpc/kexec/ima.c   |  14 ++-
  drivers/of/Kconfig |   6 +
  drivers/of/Makefile|   1 +
  drivers/of/of_ima.c| 165 +
  include/linux/of.h |  34 +
  security/integrity/ima/ima_kexec.c |  15 ++-
  13 files changed, 325 insertions(+), 7 deletions(-)
  create mode 100644 arch/arm64/include/asm/ima.h
  create mode 100644 arch/arm64/kernel/ima_kexec.c
  create mode 100644 drivers/of/of_ima.c

--
2.25.1



Re: [PATCH v2 1/2] cpufreq: qoriq: convert to a platform driver

2020-05-06 Thread Viresh Kumar
On 28-04-20, 16:31, Viresh Kumar wrote:
> On 21-04-20, 10:29, Mian Yousaf Kaukab wrote:
> > The driver has to be manually loaded if it is built as a module. It
> > is neither exporting MODULE_DEVICE_TABLE nor MODULE_ALIAS. Moreover,
> > no platform-device is created (and thus no uevent is sent) for the
> > clockgen nodes it depends on.
> > 
> > Convert the module to a platform driver with its own alias. Moreover,
> > drop whitelisted SOCs. Platform device will be created only for the
> > compatible platforms.
> > 
> > Reviewed-by: Yuantian Tang 
> > Acked-by: Viresh Kumar 
> > Signed-off-by: Mian Yousaf Kaukab 
> > ---
> > v2:
> >  +Rafael, Stephen, linux-clk
> >  Add Reviewed-by and Acked-by tags
> > 
> >  drivers/cpufreq/qoriq-cpufreq.c | 76 
> > -
> >  1 file changed, 29 insertions(+), 47 deletions(-)
> 
> @Rafael,
> 
> Though this looks to be PPC stuff, it is used on both ARM and PPC. Do you
> want to pick them up or should I do that ?

Applied now. Thanks.

-- 
viresh


Re: [PATCH V2 08/11] arch/kmap: Ensure kmap_prot visibility

2020-05-06 Thread Christoph Hellwig
On Sun, May 03, 2020 at 06:09:09PM -0700, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> We want to support kmap_atomic_prot() on all architectures and it makes
> sense to define kmap_atomic() to use the default kmap_prot.
> 
> So we ensure all arch's have a globally available kmap_prot either as a
> define or exported symbol.
> 
> Signed-off-by: Ira Weiny 

Looks good,

Reviewed-by: Christoph Hellwig 


[PATCH -next] ALSA: sound/ppc: Use bitwise instead of arithmetic operator for flags

2020-05-06 Thread Samuel Zou
Fix the following coccinelle warnings:

sound/ppc/pmac.c:729:57-58: WARNING: sum of probable bitmasks, consider |
sound/ppc/pmac.c:229:37-38: WARNING: sum of probable bitmasks, consider |
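
As a side note, a tiny illustration of why '|' is preferred for combining flag
bits (the constants below are made up, not the DBDMA ones):

#include <assert.h>

#define FLAG_A 0x0010
#define FLAG_B 0x4000

int main(void)
{
	/* for disjoint masks '+' and '|' happen to give the same value ... */
	assert((FLAG_A + FLAG_B) == (FLAG_A | FLAG_B));
	/* ... but '|' is idempotent, while '+' carries into neighbouring bits */
	assert((FLAG_A | FLAG_A | FLAG_B) == (FLAG_A | FLAG_B));
	assert((FLAG_A + FLAG_A + FLAG_B) != (FLAG_A | FLAG_B));
	return 0;
}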

Reported-by: Hulk Robot 
Signed-off-by: Samuel Zou 
---
 sound/ppc/pmac.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/ppc/pmac.c b/sound/ppc/pmac.c
index 592532c..2e750b3 100644
--- a/sound/ppc/pmac.c
+++ b/sound/ppc/pmac.c
@@ -226,7 +226,7 @@ static int snd_pmac_pcm_prepare(struct snd_pmac *chip, 
struct pmac_stream *rec,
offset += rec->period_size;
}
/* make loop */
-   cp->command = cpu_to_le16(DBDMA_NOP + BR_ALWAYS);
+   cp->command = cpu_to_le16(DBDMA_NOP | BR_ALWAYS);
cp->cmd_dep = cpu_to_le32(rec->cmd.addr);
 
snd_pmac_dma_stop(rec);
@@ -726,7 +726,7 @@ void snd_pmac_beep_dma_start(struct snd_pmac *chip, int 
bytes, unsigned long add
chip->extra_dma.cmds->xfer_status = cpu_to_le16(0);
chip->extra_dma.cmds->cmd_dep = cpu_to_le32(chip->extra_dma.addr);
chip->extra_dma.cmds->phy_addr = cpu_to_le32(addr);
-   chip->extra_dma.cmds->command = cpu_to_le16(OUTPUT_MORE + BR_ALWAYS);
+   chip->extra_dma.cmds->command = cpu_to_le16(OUTPUT_MORE | BR_ALWAYS);
	out_le32(&chip->awacs->control,
		 (in_le32(&chip->awacs->control) & ~0x1f00)
 | (speed << 8));
-- 
2.6.2



[PATCH -next] soc: fsl_asrc: Make some functions static

2020-05-06 Thread ChenTao
Fix the following warning:

sound/soc/fsl/fsl_asrc.c:157:5: warning:
symbol 'fsl_asrc_request_pair' was not declared. Should it be static?
sound/soc/fsl/fsl_asrc.c:200:6: warning:
symbol 'fsl_asrc_release_pair' was not declared. Should it be static?

Reported-by: Hulk Robot 
Signed-off-by: ChenTao 
---
 sound/soc/fsl/fsl_asrc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 067a54ab554f..432936039de4 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -154,7 +154,7 @@ static void fsl_asrc_sel_proc(int inrate, int outrate,
  * within range [ANCA, ANCA+ANCB-1], depends on the channels of pair A
  * while pair A and pair C are comparatively independent.
  */
-int fsl_asrc_request_pair(int channels, struct fsl_asrc_pair *pair)
+static int fsl_asrc_request_pair(int channels, struct fsl_asrc_pair *pair)
 {
enum asrc_pair_index index = ASRC_INVALID_PAIR;
struct fsl_asrc *asrc = pair->asrc;
@@ -197,7 +197,7 @@ int fsl_asrc_request_pair(int channels, struct 
fsl_asrc_pair *pair)
  *
  * It clears the resource from asrc and releases the occupied channels.
  */
-void fsl_asrc_release_pair(struct fsl_asrc_pair *pair)
+static void fsl_asrc_release_pair(struct fsl_asrc_pair *pair)
 {
struct fsl_asrc *asrc = pair->asrc;
enum asrc_pair_index index = pair->index;
-- 
2.22.0



Re: [PATCH v2] powerpc/ima: fix secure boot rules in ima arch policy

2020-05-06 Thread Mimi Zohar
On Fri, 2020-05-01 at 10:16 -0400, Nayna Jain wrote:
> To prevent verifying the kernel module appended signature twice
> (finit_module), once by the module_sig_check() and again by IMA, powerpc
> secure boot rules define an IMA architecture specific policy rule
> only if CONFIG_MODULE_SIG_FORCE is not enabled. This, unfortunately, does
> not take into account the ability of enabling "sig_enforce" on the boot
> command line (module.sig_enforce=1).
> 
> Including the IMA module appraise rule results in failing the finit_module
> syscall, unless the module signing public key is loaded onto the IMA
> keyring.
> 
> This patch fixes secure boot policy rules to be based on CONFIG_MODULE_SIG
> instead.
> 
> Fixes: 4238fad366a6 ("powerpc/ima: Add support to initialize ima policy 
> rules")
> Signed-off-by: Nayna Jain 

Thanks, Nayna.

Signed-off-by: Mimi Zohar 


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-05-06 Thread Segher Boessenkool
On Wed, May 06, 2020 at 08:10:57PM +0200, Christophe Leroy wrote:
> On 06/05/2020 at 19:58, Segher Boessenkool wrote:
> >>  #define __put_user_asm_goto(x, addr, label, op)   \
> >>asm volatile goto(  \
> >>-   "1: " op "%U1%X1 %0,%1  # put_user\n"   \
> >>+   "1: " op "%X1 %0,%1 # put_user\n"   \
> >>EX_TABLE(1b, %l2)   \
> >>:   \
> >>-   : "r" (x), "m<>" (*addr)\
> >>+   : "r" (x), "m" (*addr)  \
> >>:   \
> >>: label)
> >
> >Like that.  But you will have to do that to *all* places we use the "<>"
> >constraints, or wait for more stuff to fail?  And, there probably are
> >places we *do* want update form insns used (they do help in some loops,
> >for example)?
> >
> 
> AFAICT, git grep "m<>" provides no result.

Ah, okay.

> However many places have %Ux:



Yes, all of those are from when "m" still meant what "m<>" is now.  For
seeing how many update form insns can be generated (and how much that
matters), these all should be fixed, at a minimum.


Segher


Re: [PATCH V2 08/11] arch/kmap: Ensure kmap_prot visibility

2020-05-06 Thread Ira Weiny
On Tue, May 05, 2020 at 11:13:26PM -0700, Christoph Hellwig wrote:
> On Sun, May 03, 2020 at 06:09:09PM -0700, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > We want to support kmap_atomic_prot() on all architectures and it makes
> > sense to define kmap_atomic() to use the default kmap_prot.
> > 
> > So we ensure all arch's have a globally available kmap_prot either as a
> > define or exported symbol.
> 
> FYI, I still think a
> 
> #ifndef kmap_prot
> #define kmap_prot PAGE_KERNEL
> #endif
> 
> in linux/highmem.h would be nicer.  Then only xtensa and sparc need
> to override it and clearly stand out.

That would be nice...  But...  in this particular patch kmap_prot needs to be
in arch/microblaze/include/asm/highmem.h to preserve bisect-ability.

So there would be an inversion with this define and the core #ifndef...

I like the change but I'm going to add this change as a follow on patch because
at the end of the series microblaze no longer needs this.

If this is reasonable could I get a review on this patch to add to the next
series?

Ira



Re: [PATCH V2 05/11] {x86,powerpc,microblaze}/kmap: Move preempt disable

2020-05-06 Thread Ira Weiny
On Tue, May 05, 2020 at 11:11:13PM -0700, Christoph Hellwig wrote:
> On Sun, May 03, 2020 at 06:09:06PM -0700, ira.we...@intel.com wrote:
> > From: Ira Weiny 
> > 
> > During this kmap() conversion series we must maintain bisect-ability.
> > To do this, kmap_atomic_prot() in x86, powerpc, and microblaze need to
> > remain functional.
> > 
> > Create a temporary inline version of kmap_atomic_prot within these
> > architectures so we can rework their kmap_atomic() calls and then lift
> > kmap_atomic_prot() to the core.
> > 
> > Signed-off-by: Ira Weiny 
> > 
> > ---
> > Changes from V1:
> > New patch
> > ---
> >  arch/microblaze/include/asm/highmem.h | 11 ++-
> >  arch/microblaze/mm/highmem.c  | 10 ++
> >  arch/powerpc/include/asm/highmem.h| 11 ++-
> >  arch/powerpc/mm/highmem.c |  9 ++---
> >  arch/x86/include/asm/highmem.h| 11 ++-
> >  arch/x86/mm/highmem_32.c  | 10 ++
> >  6 files changed, 36 insertions(+), 26 deletions(-)
> > 
> > diff --git a/arch/microblaze/include/asm/highmem.h 
> > b/arch/microblaze/include/asm/highmem.h
> > index 0c94046f2d58..ec9954b091e1 100644
> > --- a/arch/microblaze/include/asm/highmem.h
> > +++ b/arch/microblaze/include/asm/highmem.h
> > @@ -51,7 +51,16 @@ extern pte_t *pkmap_page_table;
> >  #define PKMAP_NR(virt)  ((virt - PKMAP_BASE) >> PAGE_SHIFT)
> >  #define PKMAP_ADDR(nr)  (PKMAP_BASE + ((nr) << PAGE_SHIFT))
> >  
> > -extern void *kmap_atomic_prot(struct page *page, pgprot_t prot);
> > +extern void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
> > +void *kmap_atomic_prot(struct page *page, pgprot_t prot)
> 
> Shouldn't this be marked inline?

Yes Thanks.  Done.
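
For reference, a minimal sketch of how such an inline wrapper might look once
lifted to the core (assuming kmap_atomic_high_prot() is the arch-provided
highmem path; this is an illustration, not the exact patch):

static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
{
	preempt_disable();
	pagefault_disable();
	if (!PageHighMem(page))
		return page_address(page);	/* lowmem: use the direct mapping */
	return kmap_atomic_high_prot(page, prot);
}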

> 
> The rest looks fine:
> 
> Reviewed-by: Christoph Hellwig 

Thanks,
Ira



Re: [PATCH net 11/16] net: ethernet: marvell: mvneta: fix fixed-link phydev leaks

2020-05-06 Thread Naresh Kamboju
On Tue, 29 Nov 2016 at 00:00, Johan Hovold  wrote:
>
> Make sure to deregister and free any fixed-link PHY registered using
> of_phy_register_fixed_link() on probe errors and on driver unbind.
>
> Fixes: 83895bedeee6 ("net: mvneta: add support for fixed links")
> Signed-off-by: Johan Hovold 
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c 
> b/drivers/net/ethernet/marvell/mvneta.c
> index 0c0a45af950f..707bc4680b9b 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -4191,6 +4191,8 @@ static int mvneta_probe(struct platform_device *pdev)
> clk_disable_unprepare(pp->clk);
>  err_put_phy_node:
> of_node_put(phy_node);
> +   if (of_phy_is_fixed_link(dn))
> +   of_phy_deregister_fixed_link(dn);

While building the kernel image for the arm architecture on the stable-rc 4.4
branch, the following build error was found.

drivers/net/ethernet/marvell/mvneta.c:3442:3: error: implicit
declaration of function 'of_phy_deregister_fixed_link'; did you mean
'of_phy_register_fixed_link'? [-Werror=implicit-function-declaration]
|of_phy_deregister_fixed_link(dn);
|^~~~
|of_phy_register_fixed_link

ref:
https://gitlab.com/Linaro/lkft/kernel-runs/-/jobs/541374729

- Naresh


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-05-06 Thread Christophe Leroy




On 06/05/2020 at 19:58, Segher Boessenkool wrote:

On Wed, May 06, 2020 at 10:58:55AM +1000, Michael Ellerman wrote:

The "m<>" here is breaking GCC 4.6.3, which we allegedly still support.


[ You shouldn't use 4.6.3, there has been 4.6.4 since a while.  And 4.6
   is nine years old now.  Most projects do not support < 4.8 anymore, on
   any architecture.  ]


Moving up to 4.6.4 wouldn't actually help with this though would it?


Nope.  But 4.6.4 is a bug-fix release, 91 bugs fixed since 4.6.3, so you
should switch to it if you can :-)


Also I have 4.6.3 compilers already built, I don't really have time to
rebuild them for 4.6.4.

The kernel has a top-level minimum version, which I'm not in charge of, see:

https://www.kernel.org/doc/html/latest/process/changes.html?highlight=gcc


Yes, I know.  And it is much preferred not to have stricter requirements
for Power, I know that too.  Something has to give though :-/


There were discussions about making 4.8 the minimum, but I'm not sure
where they got to.


Yeah, just petered out I think?

All significant distros come with a 4.8 as system compiler.


Plain "m" works, how much does the "<>" affect code gen in practice?

A quick diff here shows no difference from removing "<>".


It will make it impossible to use update-form instructions here.  That
probably does not matter much at all, in this case.

If you remove the "<>" constraints, also remove the "%Un" output modifier?


So like this?

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 62cc8d7640ec..ca847aed8e45 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -207,10 +207,10 @@ do {								\
 
 #define __put_user_asm_goto(x, addr, label, op)			\
 	asm volatile goto(					\
-		"1:	" op "%U1%X1 %0,%1	# put_user\n"	\
+		"1:	" op "%X1 %0,%1	# put_user\n"		\
 		EX_TABLE(1b, %l2)				\
 		:						\
-		: "r" (x), "m<>" (*addr)			\
+		: "r" (x), "m" (*addr)				\
 		:						\
 		: label)


Like that.  But you will have to do that to *all* places we use the "<>"
constraints, or wait for more stuff to fail?  And, there probably are
places we *do* want update form insns used (they do help in some loops,
for example)?



AFAICT, git grep "m<>" provides no result.

However many places have %Ux:

arch/powerpc/boot/io.h:	__asm__ __volatile__("lbz%U1%X1 %0,%1; twi 0,%0,0; isync"
arch/powerpc/boot/io.h:	__asm__ __volatile__("stb%U0%X0 %1,%0; sync"
arch/powerpc/boot/io.h:	__asm__ __volatile__("lhz%U1%X1 %0,%1; twi 0,%0,0; isync"
arch/powerpc/boot/io.h:	__asm__ __volatile__("sth%U0%X0 %1,%0; sync"
arch/powerpc/boot/io.h:	__asm__ __volatile__("lwz%U1%X1 %0,%1; twi 0,%0,0; isync"
arch/powerpc/boot/io.h:	__asm__ __volatile__("stw%U0%X0 %1,%0; sync"
arch/powerpc/include/asm/atomic.h:	__asm__ __volatile__("lwz%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter));
arch/powerpc/include/asm/atomic.h:	__asm__ __volatile__("stw%U0%X0 %1,%0" : "=m"(v->counter) : "r"(i));
arch/powerpc/include/asm/atomic.h:	__asm__ __volatile__("ld%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter));
arch/powerpc/include/asm/atomic.h:	__asm__ __volatile__("std%U0%X0 %1,%0" : "=m"(v->counter) : "r"(i));
arch/powerpc/include/asm/book3s/32/pgtable.h:	stw%U0%X0 %2,%0\n\
arch/powerpc/include/asm/book3s/32/pgtable.h:	stw%U0%X0 %L2,%1"
arch/powerpc/include/asm/io.h:	__asm__ __volatile__("sync;"#insn"%U1%X1 %0,%1;twi 0,%0,0;isync"\
arch/powerpc/include/asm/io.h:	__asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0"			\
arch/powerpc/include/asm/nohash/pgtable.h:	stw%U0%X0 %2,%0\n\
arch/powerpc/include/asm/nohash/pgtable.h:	stw%U0%X0 %L2,%1"
arch/powerpc/kvm/powerpc.c:	asm ("lfs%U1%X1 0,%1; stfd%U0%X0 0,%0" : "=m" (fprd) : "m" (fprs)
arch/powerpc/kvm/powerpc.c:	asm ("lfd%U1%X1 0,%1; stfs%U0%X0 0,%0" : "=m" (fprs) : "m" (fprd)



Christophe


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-05-06 Thread Segher Boessenkool
On Wed, May 06, 2020 at 11:36:00AM +1000, Michael Ellerman wrote:
> >> As far as I understood that's mandatory on recent gcc to get the 
> >> pre-update form of the instruction. With older versions "m" was doing 
> >> the same, but not anymore.
> >
> > Yes.  How much that matters depends on the asm.  On older CPUs (6xx/7xx,
> > say) the update form was just as fast as the non-update form.  On newer
> > or bigger CPUs it is usually executed just the same as an add followed
> > by the memory access, so it just saves a bit of code size.
> 
> The update-forms are stdux, sthux etc. right?

And stdu, sthu, etc.  "x" is "indexed form" (reg+reg addressing).

> I don't see any change in the number of those with or without the
> constraint. That's using GCC 9.3.0.

It's most useful in loops (and happens more often there).  You probably
do not have many loops that allow update form insns.
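
To illustrate (not from the thread, and only accepted by a GCC recent enough to
take "m<>"): the "<>" constraint characters allow GCC to pick a pre-update form
of the store (e.g. stwu), and the "%U" output modifier then emits the trailing
'u' when it does. A minimal sketch mirroring the kernel's atomic_set() style:

/* illustration only */
static inline void store_u32(unsigned int *p, unsigned int val)
{
	__asm__ __volatile__("stw%U0%X0 %1,%0" : "=m<>" (*p) : "r" (val));
}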

> >> Should we ifdef the "m<>" or "m" based on GCC 
> >> version ?
> >
> > That will be a lot of churn.  Just make 4.8 minimum?
> 
> As I said in my other mail that's not really up to us. We could mandate
> a higher minimum for powerpc, but I'd rather not.

Yeah, I quite understand that.

> I think for now I'm inclined to just drop the "<>", and we can revisit
> in a release or two when hopefully GCC 4.8 has become the minimum.

An unhappy resolution, but it leaves a light on the horizon :-)

In that case, leave the "%Un", if you will put the "<>" back soonish?
Not much point removing it and putting it back later (it is harmless,
there are more instances of it in the kernel as well, since many years).

Thanks!


Segher


Re: [PATCH v4 1/2] powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'

2020-05-06 Thread Segher Boessenkool
On Wed, May 06, 2020 at 10:58:55AM +1000, Michael Ellerman wrote:
> >> The "m<>" here is breaking GCC 4.6.3, which we allegedly still support.
> >
> > [ You shouldn't use 4.6.3, there has been 4.6.4 since a while.  And 4.6
> >   is nine years old now.  Most projects do not support < 4.8 anymore, on
> >   any architecture.  ]
> 
> Moving up to 4.6.4 wouldn't actually help with this though would it?

Nope.  But 4.6.4 is a bug-fix release, 91 bugs fixed since 4.6.3, so you
should switch to it if you can :-)

> Also I have 4.6.3 compilers already built, I don't really have time to
> rebuild them for 4.6.4.
> 
> The kernel has a top-level minimum version, which I'm not in charge of, see:
> 
> https://www.kernel.org/doc/html/latest/process/changes.html?highlight=gcc

Yes, I know.  And it is much preferred not to have stricter requirements
for Power, I know that too.  Something has to give though :-/

> There were discussions about making 4.8 the minimum, but I'm not sure
> where they got to.

Yeah, just petered out I think?

All significant distros come with a 4.8 as system compiler.

> >> Plain "m" works, how much does the "<>" affect code gen in practice?
> >> 
> >> A quick diff here shows no difference from removing "<>".
> >
> > It will make it impossible to use update-form instructions here.  That
> > probably does not matter much at all, in this case.
> >
> > If you remove the "<>" constraints, also remove the "%Un" output modifier?
> 
> So like this?
> 
> diff --git a/arch/powerpc/include/asm/uaccess.h 
> b/arch/powerpc/include/asm/uaccess.h
> index 62cc8d7640ec..ca847aed8e45 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -207,10 +207,10 @@ do {
> \
>  
>  #define __put_user_asm_goto(x, addr, label, op)  \
>   asm volatile goto(  \
> - "1: " op "%U1%X1 %0,%1  # put_user\n"   \
> + "1: " op "%X1 %0,%1 # put_user\n"   \
>   EX_TABLE(1b, %l2)   \
>   :   \
> - : "r" (x), "m<>" (*addr)\
> + : "r" (x), "m" (*addr)  \
>   :   \
>   : label)

Like that.  But you will have to do that to *all* places we use the "<>"
constraints, or wait for more stuff to fail?  And, there probably are
places we *do* want update form insns used (they do help in some loops,
for example)?


Segher


[PATCH v2 45/45] powerpc/32s: Implement dedicated kasan_init_region()

2020-05-06 Thread Christophe Leroy
Implement a kasan_init_region() dedicated to book3s/32 that
allocates KASAN regions using BATs.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/kasan.h  |  1 +
 arch/powerpc/mm/kasan/Makefile|  1 +
 arch/powerpc/mm/kasan/book3s_32.c | 57 +++
 arch/powerpc/mm/kasan/kasan_init_32.c |  2 +-
 4 files changed, 60 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/mm/kasan/book3s_32.c

diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 107a24c3f7b3..be85c7005fb1 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -34,6 +34,7 @@ static inline void kasan_init(void) { }
 static inline void kasan_late_init(void) { }
 #endif
 
+void kasan_update_early_region(unsigned long k_start, unsigned long k_end, 
pte_t pte);
 int kasan_init_shadow_page_tables(unsigned long k_start, unsigned long k_end);
 int kasan_init_region(void *start, size_t size);
 
diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index 440038ea79f1..bb1a5408b86b 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -4,3 +4,4 @@ KASAN_SANITIZE := n
 
 obj-$(CONFIG_PPC32)   += kasan_init_32.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
+obj-$(CONFIG_PPC_BOOK3S_32)+= book3s_32.o
diff --git a/arch/powerpc/mm/kasan/book3s_32.c 
b/arch/powerpc/mm/kasan/book3s_32.c
new file mode 100644
index ..4bc491a4a1fd
--- /dev/null
+++ b/arch/powerpc/mm/kasan/book3s_32.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define DISABLE_BRANCH_PROFILING
+
+#include 
+#include 
+#include 
+#include 
+
+int __init kasan_init_region(void *start, size_t size)
+{
+   unsigned long k_start = (unsigned long)kasan_mem_to_shadow(start);
+   unsigned long k_end = (unsigned long)kasan_mem_to_shadow(start + size);
+   unsigned long k_cur = k_start;
+   int k_size = k_end - k_start;
+   int k_size_base = 1 << (ffs(k_size) - 1);
+   int ret;
+   void *block;
+
+   block = memblock_alloc(k_size, k_size_base);
+
+   if (block && k_size_base >= SZ_128K && k_start == ALIGN(k_start, 
k_size_base)) {
+   int k_size_more = 1 << (ffs(k_size - k_size_base) - 1);
+
+   setbat(-1, k_start, __pa(block), k_size_base, PAGE_KERNEL);
+   if (k_size_more >= SZ_128K)
+   setbat(-1, k_start + k_size_base, __pa(block) + 
k_size_base,
+  k_size_more, PAGE_KERNEL);
+   if (v_block_mapped(k_start))
+   k_cur = k_start + k_size_base;
+   if (v_block_mapped(k_start + k_size_base))
+   k_cur = k_start + k_size_base + k_size_more;
+
+   update_bats();
+   }
+
+   if (!block)
+   block = memblock_alloc(k_size, PAGE_SIZE);
+   if (!block)
+   return -ENOMEM;
+
+   ret = kasan_init_shadow_page_tables(k_start, k_end);
+   if (ret)
+   return ret;
+
+   kasan_update_early_region(k_start, k_cur, __pte(0));
+
+   for (; k_cur < k_end; k_cur += PAGE_SIZE) {
+   pmd_t *pmd = pmd_ptr_k(k_cur);
+   void *va = block + k_cur - k_start;
+   pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
+
+		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
+   }
+   flush_tlb_kernel_range(k_start, k_end);
+   return 0;
+}
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index 76d418af4ce8..c42085801c04 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -79,7 +79,7 @@ int __init __weak kasan_init_region(void *start, size_t size)
return 0;
 }
 
-static void __init
+void __init
 kasan_update_early_region(unsigned long k_start, unsigned long k_end, pte_t 
pte)
 {
unsigned long k_cur;
-- 
2.25.0



[PATCH v2 44/45] powerpc/32s: Allow mapping with BATs with DEBUG_PAGEALLOC

2020-05-06 Thread Christophe Leroy
DEBUG_PAGEALLOC only manages RW data.

Text and RO data can still be mapped with BATs.

In order to map with BATs, also enforce data alignment. Set
by default to 256M, which is a good compromise for also keeping
enough BATs available for KASAN and IMMR.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   | 1 +
 arch/powerpc/mm/book3s32/mmu.c | 6 ++
 arch/powerpc/mm/init_32.c  | 5 ++---
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 9d94e8b178d8..5c1fcfe9be74 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -796,6 +796,7 @@ config DATA_SHIFT
range 17 28 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC) && PPC_BOOK3S_32
range 19 23 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC) && PPC_8xx
default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   default 18 if DEBUG_PAGEALLOC && PPC_BOOK3S_32
default 23 if STRICT_KERNEL_RWX && PPC_8xx
default 23 if DEBUG_PAGEALLOC && PPC_8xx && PIN_TLB_DATA
default 19 if DEBUG_PAGEALLOC && PPC_8xx
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index a9b2cbc74797..a6dcc708eee3 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -170,6 +170,12 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
pr_debug("RAM mapped without BATs\n");
return base;
}
+   if (debug_pagealloc_enabled()) {
+   if (base >= border)
+   return base;
+   if (top >= border)
+   top = border;
+   }
 
if (!strict_kernel_rwx_enabled() || base >= border || top <= border)
return __mmu_mapin_ram(base, top);
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 8977a7c2543d..36c39bd37256 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -99,10 +99,9 @@ static void __init MMU_setup(void)
if (IS_ENABLED(CONFIG_PPC_8xx))
return;
 
-   if (debug_pagealloc_enabled()) {
-   __map_without_bats = 1;
+   if (debug_pagealloc_enabled())
__map_without_ltlbs = 1;
-   }
+
if (strict_kernel_rwx_enabled())
__map_without_ltlbs = 1;
 }
-- 
2.25.0



[PATCH v2 43/45] powerpc/8xx: Implement dedicated kasan_init_region()

2020-05-06 Thread Christophe Leroy
Implement a kasan_init_region() dedicated to 8xx that
allocates KASAN regions using huge pages.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/kasan/8xx.c| 74 ++
 arch/powerpc/mm/kasan/Makefile |  1 +
 2 files changed, 75 insertions(+)
 create mode 100644 arch/powerpc/mm/kasan/8xx.c

diff --git a/arch/powerpc/mm/kasan/8xx.c b/arch/powerpc/mm/kasan/8xx.c
new file mode 100644
index ..db4ef44af22f
--- /dev/null
+++ b/arch/powerpc/mm/kasan/8xx.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define DISABLE_BRANCH_PROFILING
+
+#include 
+#include 
+#include 
+#include 
+
+static int __init
+kasan_init_shadow_8M(unsigned long k_start, unsigned long k_end, void *block)
+{
+   pmd_t *pmd = pmd_ptr_k(k_start);
+   unsigned long k_cur, k_next;
+
+   for (k_cur = k_start; k_cur != k_end; k_cur = k_next, pmd += 2, block 
+= SZ_8M) {
+   pte_basic_t *new;
+
+   k_next = pgd_addr_end(k_cur, k_end);
+   k_next = pgd_addr_end(k_next, k_end);
+   if ((void *)pmd_page_vaddr(*pmd) != kasan_early_shadow_pte)
+   continue;
+
+   new = memblock_alloc(sizeof(pte_basic_t), SZ_4K);
+   if (!new)
+   return -ENOMEM;
+
+   *new = pte_val(pte_mkhuge(pfn_pte(PHYS_PFN(__pa(block)), 
PAGE_KERNEL)));
+
+   hugepd_populate_kernel((hugepd_t *)pmd, (pte_t *)new, 
PAGE_SHIFT_8M);
+   hugepd_populate_kernel((hugepd_t *)pmd + 1, (pte_t *)new, 
PAGE_SHIFT_8M);
+   }
+   return 0;
+}
+
+int __init kasan_init_region(void *start, size_t size)
+{
+   unsigned long k_start = (unsigned long)kasan_mem_to_shadow(start);
+   unsigned long k_end = (unsigned long)kasan_mem_to_shadow(start + size);
+   unsigned long k_cur;
+   int ret;
+   void *block;
+
+   block = memblock_alloc(k_end - k_start, SZ_8M);
+   if (!block)
+   return -ENOMEM;
+
+   if (IS_ALIGNED(k_start, SZ_8M)) {
+   kasan_init_shadow_8M(k_start, ALIGN_DOWN(k_end, SZ_8M), block);
+   k_cur = ALIGN_DOWN(k_end, SZ_8M);
+   if (k_cur == k_end)
+   goto finish;
+   } else {
+   k_cur = k_start;
+   }
+
+   ret = kasan_init_shadow_page_tables(k_start, k_end);
+   if (ret)
+   return ret;
+
+   for (; k_cur < k_end; k_cur += PAGE_SIZE) {
+   pmd_t *pmd = pmd_ptr_k(k_cur);
+   void *va = block + k_cur - k_start;
+   pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
+
+   if (k_cur < ALIGN_DOWN(k_end, SZ_512K))
+   pte = pte_mkhuge(pte);
+
+		__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), pte, 0);
+   }
+finish:
+   flush_tlb_kernel_range(k_start, k_end);
+   return 0;
+}
diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index 6577897673dd..440038ea79f1 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -3,3 +3,4 @@
 KASAN_SANITIZE := n
 
 obj-$(CONFIG_PPC32)   += kasan_init_32.o
+obj-$(CONFIG_PPC_8xx)  += 8xx.o
-- 
2.25.0



[PATCH v2 42/45] powerpc/8xx: Allow large TLBs with DEBUG_PAGEALLOC

2020-05-06 Thread Christophe Leroy
DEBUG_PAGEALLOC only manages RW data.

Text and RO data can still be mapped with hugepages and pinned TLB.

In order to map with hugepages, also enforce a 512kB data alignment
minimum. That's a trade-off between size and speed, taking into
account that DEBUG_PAGEALLOC is a debug option. Anyway, the alignment
is still tunable.

We also allow tuning of alignment for book3s to limit the complexity
of the test in Kconfig that will anyway disappear in the following
patches once DEBUG_PAGEALLOC is handled together with BATs.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   | 11 +++
 arch/powerpc/mm/init_32.c  |  5 -
 arch/powerpc/mm/nohash/8xx.c   | 11 ---
 arch/powerpc/platforms/8xx/Kconfig |  2 +-
 4 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index edbe39140da0..9d94e8b178d8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -780,8 +780,9 @@ config THREAD_SHIFT
 config DATA_SHIFT_BOOL
bool "Set custom data alignment"
depends on ADVANCED_OPTIONS
-   depends on STRICT_KERNEL_RWX
-   depends on PPC_BOOK3S_32 || (PPC_8xx && !PIN_TLB_DATA && !PIN_TLB_TEXT)
+   depends on STRICT_KERNEL_RWX || DEBUG_PAGEALLOC
+   depends on PPC_BOOK3S_32 || (PPC_8xx && !PIN_TLB_DATA && \
+(!PIN_TLB_TEXT || !STRICT_KERNEL_RWX))
help
  This option allows you to set the kernel data alignment. When
  RAM is mapped by blocks, the alignment needs to fit the size and
@@ -792,10 +793,12 @@ config DATA_SHIFT_BOOL
 config DATA_SHIFT
int "Data shift" if DATA_SHIFT_BOOL
default 24 if STRICT_KERNEL_RWX && PPC64
-   range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
-   range 19 23 if STRICT_KERNEL_RWX && PPC_8xx
+   range 17 28 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC) && PPC_BOOK3S_32
+   range 19 23 if (STRICT_KERNEL_RWX || DEBUG_PAGEALLOC) && PPC_8xx
default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default 23 if STRICT_KERNEL_RWX && PPC_8xx
+   default 23 if DEBUG_PAGEALLOC && PPC_8xx && PIN_TLB_DATA
+   default 19 if DEBUG_PAGEALLOC && PPC_8xx
default PPC_PAGE_SHIFT
help
  On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO.
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index a6991ef8727d..8977a7c2543d 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -96,11 +96,14 @@ static void __init MMU_setup(void)
if (strstr(boot_command_line, "noltlbs")) {
__map_without_ltlbs = 1;
}
+   if (IS_ENABLED(CONFIG_PPC_8xx))
+   return;
+
if (debug_pagealloc_enabled()) {
__map_without_bats = 1;
__map_without_ltlbs = 1;
}
-   if (strict_kernel_rwx_enabled() && !IS_ENABLED(CONFIG_PPC_8xx))
+   if (strict_kernel_rwx_enabled())
__map_without_ltlbs = 1;
 }
 
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index 35796ce81695..e112eb157e48 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -149,7 +149,8 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 {
unsigned long etext8 = ALIGN(__pa(_etext), SZ_8M);
unsigned long sinittext = __pa(_sinittext);
-   unsigned long boundary = strict_kernel_rwx_enabled() ? sinittext : 
etext8;
+   bool strict_boundary = strict_kernel_rwx_enabled() || 
debug_pagealloc_enabled();
+   unsigned long boundary = strict_boundary ? sinittext : etext8;
unsigned long einittext8 = ALIGN(__pa(_einittext), SZ_8M);
 
WARN_ON(top < einittext8);
@@ -160,8 +161,12 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
return 0;
 
mmu_mapin_ram_chunk(0, boundary, PAGE_KERNEL_TEXT, true);
-   mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL_TEXT, true);
-   mmu_mapin_ram_chunk(einittext8, top, PAGE_KERNEL, true);
+   if (debug_pagealloc_enabled()) {
+   top = boundary;
+   } else {
+   mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL_TEXT, 
true);
+   mmu_mapin_ram_chunk(einittext8, top, PAGE_KERNEL, true);
+   }
 
if (top > SZ_32M)
memblock_set_current_limit(top);
diff --git a/arch/powerpc/platforms/8xx/Kconfig 
b/arch/powerpc/platforms/8xx/Kconfig
index 05669f2fadce..abb2b45b2789 100644
--- a/arch/powerpc/platforms/8xx/Kconfig
+++ b/arch/powerpc/platforms/8xx/Kconfig
@@ -167,7 +167,7 @@ menu "8xx advanced setup"
 
 config PIN_TLB
bool "Pinned Kernel TLBs"
-   depends on ADVANCED_OPTIONS && !DEBUG_PAGEALLOC
+   depends on ADVANCED_OPTIONS
help
  On the 8xx, we have 32 instruction TLBs and 32 data TLBs. In each
  table 4 TLBs can be pinned.
-- 
2.25.0



[PATCH v2 41/45] powerpc/8xx: Allow STRICT_KERNEL_RWX with pinned TLB

2020-05-06 Thread Christophe Leroy
Pinned TLBs are 8M. Now that there is no strict boundary anymore
between text and RO data, it is possible to use an 8M pinned executable
TLB that covers both text and RO data.

When PIN_TLB_DATA or PIN_TLB_TEXT is selected, enforce 8M RW data
alignment and allow STRICT_KERNEL_RWX.

Signed-off-by: Christophe Leroy 
---
v2: Use the new function that sets all pinned TLBs at once.
---
 arch/powerpc/Kconfig   | 8 +---
 arch/powerpc/mm/nohash/8xx.c   | 9 +++--
 arch/powerpc/platforms/8xx/Kconfig | 2 +-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 970a5802850f..edbe39140da0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -778,9 +778,10 @@ config THREAD_SHIFT
  want. Only change this if you know what you are doing.
 
 config DATA_SHIFT_BOOL
-   bool "Set custom data alignment" if STRICT_KERNEL_RWX && \
-   (PPC_BOOK3S_32 || PPC_8xx)
+   bool "Set custom data alignment"
depends on ADVANCED_OPTIONS
+   depends on STRICT_KERNEL_RWX
+   depends on PPC_BOOK3S_32 || (PPC_8xx && !PIN_TLB_DATA && !PIN_TLB_TEXT)
help
  This option allows you to set the kernel data alignment. When
  RAM is mapped by blocks, the alignment needs to fit the size and
@@ -802,7 +803,8 @@ config DATA_SHIFT
 
  On 8xx, large pages (512kb or 8M) are used to map kernel linear
  memory. Aligning to 8M reduces TLB misses as only 8M pages are used
- in that case.
+ in that case. If PIN_TLB is selected, it must be aligned to 8M as
+ 8M pages will be pinned.
 
 config FORCE_MAX_ZONEORDER
int "Maximum zone order"
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index c62cab996d4d..35796ce81695 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -126,8 +126,8 @@ void __init mmu_mapin_immr(void)
PAGE_KERNEL_NCG, MMU_PAGE_512K, true);
 }
 
-static void __init mmu_mapin_ram_chunk(unsigned long offset, unsigned long top,
-  pgprot_t prot, bool new)
+static void mmu_mapin_ram_chunk(unsigned long offset, unsigned long top,
+   pgprot_t prot, bool new)
 {
unsigned long v = PAGE_OFFSET + offset;
unsigned long p = offset;
@@ -180,6 +180,9 @@ void mmu_mark_initmem_nx(void)
 
mmu_mapin_ram_chunk(0, boundary, PAGE_KERNEL_TEXT, false);
mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);
+
+   if (IS_ENABLED(CONFIG_PIN_TLB_TEXT))
+   mmu_pin_tlb(block_mapped_ram, false);
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
@@ -188,6 +191,8 @@ void mmu_mark_rodata_ro(void)
unsigned long sinittext = __pa(_sinittext);
 
mmu_mapin_ram_chunk(0, sinittext, PAGE_KERNEL_ROX, false);
+   if (IS_ENABLED(CONFIG_PIN_TLB_DATA))
+   mmu_pin_tlb(block_mapped_ram, true);
 }
 #endif
 
diff --git a/arch/powerpc/platforms/8xx/Kconfig 
b/arch/powerpc/platforms/8xx/Kconfig
index 04ea1a8a0bdc..05669f2fadce 100644
--- a/arch/powerpc/platforms/8xx/Kconfig
+++ b/arch/powerpc/platforms/8xx/Kconfig
@@ -167,7 +167,7 @@ menu "8xx advanced setup"
 
 config PIN_TLB
bool "Pinned Kernel TLBs"
-   depends on ADVANCED_OPTIONS && !DEBUG_PAGEALLOC && !STRICT_KERNEL_RWX
+   depends on ADVANCED_OPTIONS && !DEBUG_PAGEALLOC
help
  On the 8xx, we have 32 instruction TLBs and 32 data TLBs. In each
  table 4 TLBs can be pinned.
-- 
2.25.0



[PATCH v2 40/45] powerpc/8xx: Map linear memory with huge pages

2020-05-06 Thread Christophe Leroy
Map linear memory space with 512k and 8M pages whenever
possible.

Three mappings are performed:
- One for kernel text
- One for RO data
- One for the rest

Separating the mappings is done to be able to update the
protection later when using STRICT_KERNEL_RWX.

The ITLB miss handler now needs to also handle huge TLBs
unless kernel text is pinned.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S |  4 +--
 arch/powerpc/mm/nohash/8xx.c   | 50 +-
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 9a117b9f0998..abb71fad7d6a 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -224,7 +224,7 @@ InstructionTLBMiss:
 3:
	mtcr	r11
 #endif
-#ifdef CONFIG_HUGETLBFS
+#if defined(CONFIG_HUGETLBFS) || !defined(CONFIG_PIN_TLB_TEXT)
lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r10)/* Get level 1 
entry */
mtspr   SPRN_MD_TWC, r11
 #else
@@ -234,7 +234,7 @@ InstructionTLBMiss:
 #endif
mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
-#ifdef CONFIG_HUGETLBFS
+#if defined(CONFIG_HUGETLBFS) || !defined(CONFIG_PIN_TLB_TEXT)
rlwimi  r11, r10, 32 - 9, _PMD_PAGE_512K
mtspr   SPRN_MI_TWC, r11
 #endif
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index fb31a0c1c2a4..c62cab996d4d 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -126,20 +126,68 @@ void __init mmu_mapin_immr(void)
PAGE_KERNEL_NCG, MMU_PAGE_512K, true);
 }
 
+static void __init mmu_mapin_ram_chunk(unsigned long offset, unsigned long top,
+  pgprot_t prot, bool new)
+{
+   unsigned long v = PAGE_OFFSET + offset;
+   unsigned long p = offset;
+
+   WARN_ON(!IS_ALIGNED(offset, SZ_512K) || !IS_ALIGNED(top, SZ_512K));
+
+   for (; p < ALIGN(p, SZ_8M) && p < top; p += SZ_512K, v += SZ_512K)
+   __early_map_kernel_hugepage(v, p, prot, MMU_PAGE_512K, new);
+   for (; p < ALIGN_DOWN(top, SZ_8M) && p < top; p += SZ_8M, v += SZ_8M)
+   __early_map_kernel_hugepage(v, p, prot, MMU_PAGE_8M, new);
+   for (; p < ALIGN_DOWN(top, SZ_512K) && p < top; p += SZ_512K, v += 
SZ_512K)
+   __early_map_kernel_hugepage(v, p, prot, MMU_PAGE_512K, new);
+
+   if (!new)
+   flush_tlb_kernel_range(PAGE_OFFSET + v, PAGE_OFFSET + top);
+}
+
 unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
+   unsigned long etext8 = ALIGN(__pa(_etext), SZ_8M);
+   unsigned long sinittext = __pa(_sinittext);
+   unsigned long boundary = strict_kernel_rwx_enabled() ? sinittext : 
etext8;
+   unsigned long einittext8 = ALIGN(__pa(_einittext), SZ_8M);
+
+   WARN_ON(top < einittext8);
+
mmu_mapin_immr();
 
-   return 0;
+   if (__map_without_ltlbs)
+   return 0;
+
+   mmu_mapin_ram_chunk(0, boundary, PAGE_KERNEL_TEXT, true);
+   mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL_TEXT, true);
+   mmu_mapin_ram_chunk(einittext8, top, PAGE_KERNEL, true);
+
+   if (top > SZ_32M)
+   memblock_set_current_limit(top);
+
+   block_mapped_ram = top;
+
+   return top;
 }
 
 void mmu_mark_initmem_nx(void)
 {
+   unsigned long etext8 = ALIGN(__pa(_etext), SZ_8M);
+   unsigned long sinittext = __pa(_sinittext);
+   unsigned long boundary = strict_kernel_rwx_enabled() ? sinittext : 
etext8;
+   unsigned long einittext8 = ALIGN(__pa(_einittext), SZ_8M);
+
+   mmu_mapin_ram_chunk(0, boundary, PAGE_KERNEL_TEXT, false);
+   mmu_mapin_ram_chunk(boundary, einittext8, PAGE_KERNEL, false);
 }
 
 #ifdef CONFIG_STRICT_KERNEL_RWX
 void mmu_mark_rodata_ro(void)
 {
+   unsigned long sinittext = __pa(_sinittext);
+
+   mmu_mapin_ram_chunk(0, sinittext, PAGE_KERNEL_ROX, false);
 }
 #endif
 
-- 
2.25.0



[PATCH v2 39/45] powerpc/8xx: Map IMMR with a huge page

2020-05-06 Thread Christophe Leroy
Map the IMMR area with a single 512k huge page.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/nohash/8xx.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index 570ab2114a73..fb31a0c1c2a4 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -117,17 +117,13 @@ static bool immr_is_mapped __initdata;
 
 void __init mmu_mapin_immr(void)
 {
-   unsigned long p = PHYS_IMMR_BASE;
-   unsigned long v = VIRT_IMMR_BASE;
-   int offset;
-
if (immr_is_mapped)
return;
 
immr_is_mapped = true;
 
-   for (offset = 0; offset < IMMR_SIZE; offset += PAGE_SIZE)
-   map_kernel_page(v + offset, p + offset, PAGE_KERNEL_NCG);
+   __early_map_kernel_hugepage(VIRT_IMMR_BASE, PHYS_IMMR_BASE,
+   PAGE_KERNEL_NCG, MMU_PAGE_512K, true);
 }
 
 unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
-- 
2.25.0



[PATCH v2 38/45] powerpc/8xx: Add a function to early map kernel via huge pages

2020-05-06 Thread Christophe Leroy
Add a function to early map kernel memory using huge pages.

For 512k pages, just use standard page table and map in using 512k
pages.

For 8M pages, create a hugepd table and populate the two PGD
entries with it.

This function can only be used to create page tables at startup. Once
the regular SLAB allocation functions replace memblock functions,
this function cannot allocate new pages anymore. However it can still
update existing mappings with new protections.

hugepd_none() macro is moved into asm/hugetlb.h to be usable outside
of mm/hugetlbpage.c

early_pte_alloc_kernel() is made visible.

_PAGE_HUGE flag is now displayed by ptdump.

Signed-off-by: Christophe Leroy 
---
v2: Select CONFIG_HUGETLBFS instead of CONFIG_HUGETLB_PAGE which leads to 
linktime failure
---
 .../include/asm/nohash/32/hugetlb-8xx.h   |  5 ++
 arch/powerpc/include/asm/pgtable.h|  2 +
 arch/powerpc/mm/nohash/8xx.c  | 52 +++
 arch/powerpc/mm/pgtable_32.c  |  2 +-
 arch/powerpc/mm/ptdump/8xx.c  |  5 ++
 arch/powerpc/platforms/Kconfig.cputype|  1 +
 6 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h 
b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
index 1c7d4693a78e..e752a5807a59 100644
--- a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
@@ -35,6 +35,11 @@ static inline void hugepd_populate(hugepd_t *hpdp, pte_t 
*new, unsigned int pshi
*hpdp = __hugepd(__pa(new) | _PMD_USER | _PMD_PRESENT | _PMD_PAGE_8M);
 }
 
+static inline void hugepd_populate_kernel(hugepd_t *hpdp, pte_t *new, unsigned 
int pshift)
+{
+   *hpdp = __hugepd(__pa(new) | _PMD_PRESENT | _PMD_PAGE_8M);
+}
+
 static inline int check_and_get_huge_psize(int shift)
 {
return shift_to_mmu_psize(shift);
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index b1f1d5339735..961895be932a 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -107,6 +107,8 @@ unsigned long vmalloc_to_phys(void *vmalloc_addr);
 
 void pgtable_cache_add(unsigned int shift);
 
+pte_t *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va);
+
 #if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_PPC32)
 void mark_initmem_nx(void);
 #else
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index d9f205d9a654..570ab2114a73 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -9,8 +9,10 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -54,6 +56,56 @@ unsigned long p_block_mapped(phys_addr_t pa)
return 0;
 }
 
+static pte_t __init *early_hugepd_alloc_kernel(hugepd_t *pmdp, unsigned long 
va)
+{
+   if (hpd_val(*pmdp) == 0) {
+   pte_t *ptep = memblock_alloc(sizeof(pte_basic_t), SZ_4K);
+
+   if (!ptep)
+   return NULL;
+
+   hugepd_populate_kernel((hugepd_t *)pmdp, ptep, PAGE_SHIFT_8M);
+   hugepd_populate_kernel((hugepd_t *)pmdp + 1, ptep, 
PAGE_SHIFT_8M);
+   }
+   return hugepte_offset(*(hugepd_t *)pmdp, va, PGDIR_SHIFT);
+}
+
+static int __ref __early_map_kernel_hugepage(unsigned long va, phys_addr_t pa,
+pgprot_t prot, int psize, bool new)
+{
+   pmd_t *pmdp = pmd_ptr_k(va);
+   pte_t *ptep;
+
+   if (WARN_ON(psize != MMU_PAGE_512K && psize != MMU_PAGE_8M))
+   return -EINVAL;
+
+   if (new) {
+   if (WARN_ON(slab_is_available()))
+   return -EINVAL;
+
+   if (psize == MMU_PAGE_512K)
+   ptep = early_pte_alloc_kernel(pmdp, va);
+   else
+   ptep = early_hugepd_alloc_kernel((hugepd_t *)pmdp, va);
+   } else {
+   if (psize == MMU_PAGE_512K)
+   ptep = pte_offset_kernel(pmdp, va);
+   else
+   ptep = hugepte_offset(*(hugepd_t *)pmdp, va, 
PGDIR_SHIFT);
+   }
+
+   if (WARN_ON(!ptep))
+   return -ENOMEM;
+
+   /* The PTE should never be already present */
+   if (new && WARN_ON(pte_present(*ptep) && pgprot_val(prot)))
+   return -EINVAL;
+
+	set_huge_pte_at(&init_mm, va, ptep, pte_mkhuge(pfn_pte(pa >> PAGE_SHIFT, prot)));
+
+   return 0;
+}
+
 /*
  * MMU_init_hw does the chip-specific initialization of the MMU hardware.
  */
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index bd0cb6e3573e..05902bbff8d6 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -61,7 +61,7 @@ static void __init *early_alloc_pgtable(unsigned long size)
return ptr;
 }
 
-static pte_t __init *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
+pte_t __init *early_pte_alloc_kernel(pmd_t *pmdp, unsigned long va)
 {
  

[PATCH v2 37/45] powerpc/8xx: Refactor kernel address boundary comparison

2020-05-06 Thread Christophe Leroy
Now that linear and IMMR dedicated TLB handling is gone, kernel
boundary address comparison is similar in ITLB miss handler and
in DTLB miss handler.

Create a macro named compare_to_kernel_boundary.

When TASK_SIZE is strictly below 0x80000000 and PAGE_OFFSET is
above 0x80000000, it is enough to compare to 0x80000000, and this
can be done with a single instruction.

Using the not. instruction, we get to use the 'blt' conditional branch
just as when doing a regular comparison:

0x00000000 <= addr <= 0x7fffffff  ==>
0xffffffff >= NOT(addr) >= 0x80000000
The above test corresponds to a 'blt'.

Otherwise, do a regular comparison using two instructions.
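
A quick host-side check of the equivalence described above (illustration only,
not part of the patch):

#include <assert.h>
#include <stdint.h>

int main(void)
{
	uint32_t user = 0x10000000, kernel = 0xc0000000;

	/* user address (< 0x80000000): ~addr has bit 31 set -> 'blt' is taken */
	assert((int32_t)~user < 0);
	/* kernel address: ~addr has bit 31 clear -> falls through to the kernel path */
	assert((int32_t)~kernel >= 0);
	return 0;
}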

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 9f3f7f3d03a7..9a117b9f0998 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -32,10 +32,15 @@
 
 #include "head_32.h"
 
+.macro compare_to_kernel_boundary scratch, addr
 #if CONFIG_TASK_SIZE <= 0x80000000 && CONFIG_PAGE_OFFSET >= 0x80000000
 /* By simply checking Address >= 0x80000000, we know if its a kernel address */
-#define SIMPLE_KERNEL_ADDRESS	1
+	not.	\scratch, \addr
+#else
+   rlwinm  \scratch, \addr, 16, 0xfff8
+   cmpli   cr0, \scratch, PAGE_OFFSET@h
 #endif
+.endm
 
 /*
  * We need an ITLB miss handler for kernel addresses if:
@@ -209,20 +214,11 @@ InstructionTLBMiss:
mtspr   SPRN_MD_EPN, r10
 #ifdef ITLB_MISS_KERNEL
	mfcr	r11
-#if defined(SIMPLE_KERNEL_ADDRESS)
-	cmpi	cr0, r10, 0	/* Address >= 0x80000000 */
-#else
-	rlwinm	r10, r10, 16, 0xfff8
-	cmpli	cr0, r10, PAGE_OFFSET@h
-#endif
+	compare_to_kernel_boundary r10, r10
 #endif
	mfspr	r10, SPRN_M_TWB	/* Get level 1 table */
 #ifdef ITLB_MISS_KERNEL
-#if defined(SIMPLE_KERNEL_ADDRESS)
-	bge+	3f
-#else
	blt+	3f
-#endif
	rlwinm	r10, r10, 0, 20, 31
	oris	r10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
@@ -288,9 +284,7 @@ DataStoreTLBMiss:
 * kernel page tables.
 */
mfspr   r10, SPRN_MD_EPN
-   rlwinm  r10, r10, 16, 0xfff8
-   cmpli   cr0, r10, PAGE_OFFSET@h
-
+   compare_to_kernel_boundary r10, r10
mfspr   r10, SPRN_M_TWB /* Get level 1 table */
	blt+	3f
rlwinm  r10, r10, 0, 20, 31
-- 
2.25.0



[PATCH v2 36/45] powerpc/mm: Don't be too strict with _etext alignment on PPC32

2020-05-06 Thread Christophe Leroy
Similar to PPC64, accept mapping RO data as ROX as a trade-off between
security and memory usage.

Having RO data executable is not a high risk as RO data can't be
modified to forge an exploit.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  | 26 --
 arch/powerpc/kernel/vmlinux.lds.S |  3 +--
 2 files changed, 1 insertion(+), 28 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f552726c9de2..970a5802850f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -777,32 +777,6 @@ config THREAD_SHIFT
  Used to define the stack size. The default is almost always what you
  want. Only change this if you know what you are doing.
 
-config ETEXT_SHIFT_BOOL
-   bool "Set custom etext alignment" if STRICT_KERNEL_RWX && \
-(PPC_BOOK3S_32 || PPC_8xx)
-   depends on ADVANCED_OPTIONS
-   help
- This option allows you to set the kernel end of text alignment. When
- RAM is mapped by blocks, the alignment needs to fit the size and
- number of possible blocks. The default should be OK for most configs.
-
- Say N here unless you know what you are doing.
-
-config ETEXT_SHIFT
-   int "_etext shift" if ETEXT_SHIFT_BOOL
-   range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
-   range 19 23 if STRICT_KERNEL_RWX && PPC_8xx
-   default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
-   default 19 if STRICT_KERNEL_RWX && PPC_8xx
-   default PPC_PAGE_SHIFT
-   help
- On Book3S 32 (603+), IBATs are used to map kernel text.
- Smaller is the alignment, greater is the number of necessary IBATs.
-
- On 8xx, large pages (512kb or 8M) are used to map kernel linear
- memory. Aligning to 8M reduces TLB misses as only 8M pages are used
- in that case.
-
 config DATA_SHIFT_BOOL
bool "Set custom data alignment" if STRICT_KERNEL_RWX && \
(PPC_BOOK3S_32 || PPC_8xx)
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 31a0f201fb6f..54f23205c2b9 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -15,7 +15,6 @@
 #include 
 
 #define STRICT_ALIGN_SIZE  (1 << CONFIG_DATA_SHIFT)
-#define ETEXT_ALIGN_SIZE   (1 << CONFIG_ETEXT_SHIFT)
 
 ENTRY(_stext)
 
@@ -116,7 +115,7 @@ SECTIONS
 
} :text
 
-   . = ALIGN(ETEXT_ALIGN_SIZE);
+   . = ALIGN(PAGE_SIZE);
_etext = .;
PROVIDE32 (etext = .);
 
-- 
2.25.0



[PATCH v2 35/45] powerpc/8xx: Move DTLB perf handling closer.

2020-05-06 Thread Christophe Leroy
Now that space has been freed next to the DTLB miss handler,
its associated DTLB perf handling can be brought back into
the same place.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index fb5d17187772..9f3f7f3d03a7 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -344,6 +344,17 @@ DataStoreTLBMiss:
rfi
patch_site  0b, patch__dtlbmiss_exit_1
 
+#ifdef CONFIG_PERF_EVENTS
+   patch_site  0f, patch__dtlbmiss_perf
+0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
	addi	r10, r10, 1
+   stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
+   mfspr   r10, SPRN_DAR
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+   mfspr   r11, SPRN_M_TW
+   rfi
+#endif
+
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
@@ -390,18 +401,6 @@ DARFixed:/* Return from dcbx instruction bug workaround */
/* 0x300 is DataAccess exception, needed by bad_page_fault() */
EXC_XFER_LITE(0x300, handle_page_fault)
 
-/* Called from DataStoreTLBMiss when perf TLB misses events are activated */
-#ifdef CONFIG_PERF_EVENTS
-   patch_site  0f, patch__dtlbmiss_perf
-0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
	addi	r10, r10, 1
-   stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
-   mfspr   r10, SPRN_DAR
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
-   mfspr   r11, SPRN_M_TW
-   rfi
-#endif
-
 stack_overflow:
vmap_stack_overflow_exception
 
-- 
2.25.0



[PATCH v2 34/45] powerpc/8xx: Remove now unused TLB miss functions

2020-05-06 Thread Christophe Leroy
The code to set up the linear and IMMR mappings via huge TLB entries is
not called anymore. Remove it.

Also remove the handling of removed code exits in the perf driver.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h |  8 +-
 arch/powerpc/kernel/head_8xx.S   | 83 
 arch/powerpc/perf/8xx-pmu.c  | 10 ---
 3 files changed, 1 insertion(+), 100 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index 4d3ef3841b00..e82368838416 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -240,13 +240,7 @@ static inline unsigned int mmu_psize_to_shift(unsigned int 
mmu_psize)
 }
 
 /* patch sites */
-extern s32 patch__itlbmiss_linmem_top, patch__itlbmiss_linmem_top8;
-extern s32 patch__dtlbmiss_linmem_top, patch__dtlbmiss_immr_jmp;
-extern s32 patch__fixupdar_linmem_top;
-extern s32 patch__dtlbmiss_romem_top, patch__dtlbmiss_romem_top8;
-
-extern s32 patch__itlbmiss_exit_1, patch__itlbmiss_exit_2;
-extern s32 patch__dtlbmiss_exit_1, patch__dtlbmiss_exit_2, 
patch__dtlbmiss_exit_3;
+extern s32 patch__itlbmiss_exit_1, patch__dtlbmiss_exit_1;
 extern s32 patch__itlbmiss_perf, patch__dtlbmiss_perf;
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index d1546f379757..fb5d17187772 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -278,33 +278,6 @@ InstructionTLBMiss:
rfi
 #endif
 
-#ifndef CONFIG_PIN_TLB_TEXT
-ITLBMissLinear:
-	mtcr	r11
-#if defined(CONFIG_STRICT_KERNEL_RWX) && CONFIG_ETEXT_SHIFT < 23
-   patch_site  0f, patch__itlbmiss_linmem_top8
-
-   mfspr   r10, SPRN_SRR0
-0: subis   r11, r10, (PAGE_OFFSET - 0x8000)@ha
-   rlwinm  r11, r11, 4, MI_PS8MEG ^ MI_PS512K
-   ori r11, r11, MI_PS512K | MI_SVALID
-   rlwinm  r10, r10, 0, 0x0ff8 /* 8xx supports max 256Mb RAM */
-#else
-   /* Set 8M byte page and mark it valid */
-   li  r11, MI_PS8MEG | MI_SVALID
-   rlwinm  r10, r10, 20, 0x0f80/* 8xx supports max 256Mb RAM */
-#endif
-   mtspr   SPRN_MI_TWC, r11
-   ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT
-   mtspr   SPRN_MI_RPN, r10/* Update TLB entry */
-
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
-   rfi
-   patch_site  0b, patch__itlbmiss_exit_2
-#endif
-
. = 0x1200
 DataStoreTLBMiss:
mtspr   SPRN_DAR, r10
@@ -371,62 +344,6 @@ DataStoreTLBMiss:
rfi
patch_site  0b, patch__dtlbmiss_exit_1
 
-DTLBMissIMMR:
-   mtcrr11
-   /* Set 512k byte guarded page and mark it valid */
-   li  r10, MD_PS512K | MD_GUARDED | MD_SVALID
-   mtspr   SPRN_MD_TWC, r10
-   mfspr   r10, SPRN_IMMR  /* Get current IMMR */
-   rlwinm  r10, r10, 0, 0xfff8 /* Get 512 kbytes boundary */
-   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT | _PAGE_NO_CACHE
-   mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
-
-   li  r11, RPN_PATTERN
-
-0: mfspr   r10, SPRN_DAR
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
-   mfspr   r11, SPRN_M_TW
-   rfi
-   patch_site  0b, patch__dtlbmiss_exit_2
-
-DTLBMissLinear:
-   mtcrr11
-   rlwinm  r10, r10, 20, 0x0f80/* 8xx supports max 256Mb RAM */
-#if defined(CONFIG_STRICT_KERNEL_RWX) && CONFIG_DATA_SHIFT < 23
-   patch_site  0f, patch__dtlbmiss_romem_top8
-
-0: subis   r11, r10, (PAGE_OFFSET - 0x8000)@ha
-   rlwinm  r11, r11, 0, 0xff80
-   neg r10, r11
-   or  r11, r11, r10
-   rlwinm  r11, r11, 4, MI_PS8MEG ^ MI_PS512K
-   ori r11, r11, MI_PS512K | MI_SVALID
-   mfspr   r10, SPRN_MD_EPN
-   rlwinm  r10, r10, 0, 0x0ff8 /* 8xx supports max 256Mb RAM */
-#else
-   /* Set 8M byte page and mark it valid */
-   li  r11, MD_PS8MEG | MD_SVALID
-#endif
-   mtspr   SPRN_MD_TWC, r11
-#ifdef CONFIG_STRICT_KERNEL_RWX
-   patch_site  0f, patch__dtlbmiss_romem_top
-
-0: subis   r11, r10, 0
-   rlwimi  r10, r11, 11, _PAGE_RO
-#endif
-   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT
-   mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
-
-   li  r11, RPN_PATTERN
-
-0: mfspr   r10, SPRN_DAR
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
-   mfspr   r11, SPRN_M_TW
-   rfi
-   patch_site  0b, patch__dtlbmiss_exit_3
-
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error 

[PATCH v2 33/45] powerpc/8xx: Drop special handling of Linear and IMMR mappings in I/D TLB handlers

2020-05-06 Thread Christophe Leroy
Up to now, linear and IMMR mappings are managed via huge TLB entries
through specific code directly in TLB miss handlers. This implies
some patching of the TLB miss handlers at startup, and a lot of
dedicated code.

Remove all this specific dedicated code.

For now we are back to normal handling via standard 4k pages. In the
next patches, linear memory mapping and IMMR mapping will be managed
through huge pages.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S |  29 +
 arch/powerpc/mm/nohash/8xx.c   | 106 +
 2 files changed, 3 insertions(+), 132 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index b0cceee6405c..d1546f379757 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -207,31 +207,21 @@ InstructionTLBMiss:
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r10)
mtspr   SPRN_MD_EPN, r10
-   /* Only modules will cause ITLB Misses as we always
-* pin the first 8MB of kernel memory */
 #ifdef ITLB_MISS_KERNEL
mfcrr11
-#if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
+#if defined(SIMPLE_KERNEL_ADDRESS)
cmpicr0, r10, 0 /* Address >= 0x8000 */
 #else
rlwinm  r10, r10, 16, 0xfff8
cmpli   cr0, r10, PAGE_OFFSET@h
-#ifndef CONFIG_PIN_TLB_TEXT
-   /* It is assumed that kernel code fits into the first 32M */
-0: cmpli   cr7, r10, (PAGE_OFFSET + 0x200)@h
-   patch_site  0b, patch__itlbmiss_linmem_top
-#endif
 #endif
 #endif
mfspr   r10, SPRN_M_TWB /* Get level 1 table */
 #ifdef ITLB_MISS_KERNEL
-#if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
+#if defined(SIMPLE_KERNEL_ADDRESS)
bge+3f
 #else
blt+3f
-#endif
-#ifndef CONFIG_PIN_TLB_TEXT
-   blt cr7, ITLBMissLinear
 #endif
rlwinm  r10, r10, 0, 20, 31
orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
@@ -327,19 +317,9 @@ DataStoreTLBMiss:
mfspr   r10, SPRN_MD_EPN
rlwinm  r10, r10, 16, 0xfff8
cmpli   cr0, r10, PAGE_OFFSET@h
-#ifndef CONFIG_PIN_TLB_IMMR
-   cmpli   cr6, r10, VIRT_IMMR_BASE@h
-#endif
-0: cmpli   cr7, r10, (PAGE_OFFSET + 0x200)@h
-   patch_site  0b, patch__dtlbmiss_linmem_top
 
mfspr   r10, SPRN_M_TWB /* Get level 1 table */
blt+3f
-#ifndef CONFIG_PIN_TLB_IMMR
-0: beq-cr6, DTLBMissIMMR
-   patch_site  0b, patch__dtlbmiss_immr_jmp
-#endif
-   blt cr7, DTLBMissLinear
rlwinm  r10, r10, 0, 20, 31
orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
@@ -571,14 +551,9 @@ FixupDAR:/* Entry point for dcbx workaround. */
cmpli   cr1, r11, PAGE_OFFSET@h
mfspr   r11, SPRN_M_TWB /* Get level 1 table */
blt+cr1, 3f
-   rlwinm  r11, r10, 16, 0xfff8
-
-0: cmpli   cr7, r11, (PAGE_OFFSET + 0x180)@h
-   patch_site  0b, patch__fixupdar_linmem_top
 
/* create physical page address from effective address */
tophys(r11, r10)
-   blt-cr7, 201f
mfspr   r11, SPRN_M_TWB /* Get level 1 table */
rlwinm  r11, r11, 0, 20, 31
orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index 43578a8a8cad..d9f205d9a654 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -54,8 +54,6 @@ unsigned long p_block_mapped(phys_addr_t pa)
return 0;
 }
 
-#define LARGE_PAGE_SIZE_8M (1<<23)
-
 /*
  * MMU_init_hw does the chip-specific initialization of the MMU hardware.
  */
@@ -80,122 +78,20 @@ void __init mmu_mapin_immr(void)
map_kernel_page(v + offset, p + offset, PAGE_KERNEL_NCG);
 }
 
-static void mmu_patch_cmp_limit(s32 *site, unsigned long mapped)
-{
-   modify_instruction_site(site, 0x, (unsigned long)__va(mapped) >> 
16);
-}
-
-static void mmu_patch_addis(s32 *site, long simm)
-{
-   unsigned int instr = *(unsigned int *)patch_site_addr(site);
-
-   instr &= 0x;
-   instr |= ((unsigned long)simm) >> 16;
-   patch_instruction_site(site, instr);
-}
-
-static void mmu_mapin_ram_chunk(unsigned long offset, unsigned long top, 
pgprot_t prot)
-{
-   unsigned long s = offset;
-   unsigned long v = PAGE_OFFSET + s;
-   phys_addr_t p = memstart_addr + s;
-
-   for (; s < top; s += PAGE_SIZE) {
-   map_kernel_page(v, p, prot);
-   v += PAGE_SIZE;
-   p += PAGE_SIZE;
-   }
-}
-
 unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
-   unsigned long mapped;
-
mmu_mapin_immr();
 
-   if (__map_without_ltlbs) {
-   mapped = 0;
-   if (!IS_ENABLED(CONFIG_PIN_TLB_IMMR))
-   patch_instruction_site(__dtlbmiss_immr_jmp, 
PPC_INST_NOP);
- 

[PATCH v2 32/45] powerpc/8xx: Always pin TLBs at startup.

2020-05-06 Thread Christophe Leroy
At startup, map 32 Mbytes of memory through 4 pages of 8M,
and pin them unconditionally. They need to be pinned because
KASAN uses page tables early and the TLBs might be
dynamically replaced otherwise.

Remove RSV4I flag after installing mappings unless
CONFIG_PIN_TLB_ is selected.
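
As a rough illustration (not part of the patch), the four pinned 8M entries
cover the first 32 Mbytes of lowmem one after the other:

    /* Sketch only: layout covered by the four pinned 8M ITLB/DTLB entries. */
    unsigned long i;

    for (i = 0; i < 4; i++)
            printk("entry %lu: VA 0x%08lx -> PA 0x%08lx, 8M\n",
                   i, PAGE_OFFSET + i * SZ_8M, i * SZ_8M);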

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 31 +--
 arch/powerpc/mm/nohash/8xx.c   | 19 +--
 2 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index d607f4b53e0f..b0cceee6405c 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -765,6 +765,14 @@ start_here:
mtspr   SPRN_MD_RPN, r0
lis r0, (MD_TWAM | MD_RSV4I)@h
mtspr   SPRN_MD_CTR, r0
+#endif
+#ifndef CONFIG_PIN_TLB_TEXT
+   li  r0, 0
+   mtspr   SPRN_MI_CTR, r0
+#endif
+#if !defined(CONFIG_PIN_TLB_DATA) && !defined(CONFIG_PIN_TLB_IMMR)
+   lis r0, MD_TWAM@h
+   mtspr   SPRN_MD_CTR, r0
 #endif
tlbia   /* Clear all TLB entries */
sync/* wait for tlbia/tlbie to finish */
@@ -802,10 +810,6 @@ initial_mmu:
mtspr   SPRN_MD_CTR, r10/* remove PINNED DTLB entries */
 
tlbia   /* Invalidate all TLB entries */
-#ifdef CONFIG_PIN_TLB_DATA
-   orisr10, r10, MD_RSV4I@h
-   mtspr   SPRN_MD_CTR, r10/* Set data TLB control */
-#endif
 
lis r8, MI_APG_INIT@h   /* Set protection modes */
ori r8, r8, MI_APG_INIT@l
@@ -814,33 +818,32 @@ initial_mmu:
ori r8, r8, MD_APG_INIT@l
mtspr   SPRN_MD_AP, r8
 
-   /* Now map the lower RAM (up to 32 Mbytes) into the ITLB. */
-#ifdef CONFIG_PIN_TLB_TEXT
+   /* Map the lower RAM (up to 32 Mbytes) into the ITLB and DTLB */
lis r8, MI_RSV4I@h
ori r8, r8, 0x1c00
-#endif
+   orisr12, r10, MD_RSV4I@h
+   ori r12, r12, 0x1c00
li  r9, 4   /* up to 4 pages of 8M */
mtctr   r9
lis r9, KERNELBASE@h/* Create vaddr for TLB */
li  r10, MI_PS8MEG | MI_SVALID  /* Set 8M byte page */
li  r11, MI_BOOTINIT/* Create RPN for address 0 */
-   lis r12, _einittext@h
-   ori r12, r12, _einittext@l
 1:
-#ifdef CONFIG_PIN_TLB_TEXT
mtspr   SPRN_MI_CTR, r8 /* Set instruction MMU control */
addir8, r8, 0x100
-#endif
-
ori r0, r9, MI_EVALID   /* Mark it valid */
mtspr   SPRN_MI_EPN, r0
mtspr   SPRN_MI_TWC, r10
mtspr   SPRN_MI_RPN, r11/* Store TLB entry */
+   mtspr   SPRN_MD_CTR, r12
+   addir12, r12, 0x100
+   mtspr   SPRN_MD_EPN, r0
+   mtspr   SPRN_MD_TWC, r10
+   mtspr   SPRN_MD_RPN, r11
addis   r9, r9, 0x80
addis   r11, r11, 0x80
 
-   cmplcr0, r9, r12
-   bdnzf   gt, 1b
+   bdnz1b
 
/* Since the cache is enabled according to the information we
 * just loaded into the TLB, invalidate and enable the caches here.
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index d54d395c3378..43578a8a8cad 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -61,23 +61,6 @@ unsigned long p_block_mapped(phys_addr_t pa)
  */
 void __init MMU_init_hw(void)
 {
-   /* PIN up to the 3 first 8Mb after IMMR in DTLB table */
-   if (IS_ENABLED(CONFIG_PIN_TLB_DATA)) {
-   unsigned long ctr = mfspr(SPRN_MD_CTR) & 0xfe00;
-   unsigned long flags = 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY;
-   int i = 28;
-   unsigned long addr = 0;
-   unsigned long mem = total_lowmem;
-
-   for (; i < 32 && mem >= LARGE_PAGE_SIZE_8M; i++) {
-   mtspr(SPRN_MD_CTR, ctr | (i << 8));
-   mtspr(SPRN_MD_EPN, (unsigned long)__va(addr) | 
MD_EVALID);
-   mtspr(SPRN_MD_TWC, MD_PS8MEG | MD_SVALID);
-   mtspr(SPRN_MD_RPN, addr | flags | _PAGE_PRESENT);
-   addr += LARGE_PAGE_SIZE_8M;
-   mem -= LARGE_PAGE_SIZE_8M;
-   }
-   }
 }
 
 static bool immr_is_mapped __initdata;
@@ -225,7 +208,7 @@ void __init setup_initial_memory_limit(phys_addr_t 
first_memblock_base,
BUG_ON(first_memblock_base != 0);
 
/* 8xx can only access 32MB at the moment */
-   memblock_set_current_limit(min_t(u64, first_memblock_size, 0x0200));
+   memblock_set_current_limit(min_t(u64, first_memblock_size, SZ_32M));
 }
 
 /*
-- 
2.25.0



[PATCH v2 31/45] powerpc/8xx: Don't set IMMR map anymore at boot

2020-05-06 Thread Christophe Leroy
Only early debug requires IMMR to be mapped early.

No need to set it up and pin it in assembly. Map it
through page tables at udbg init when necessary.

If CONFIG_PIN_TLB_IMMR is selected, pin it once we
don't need the 32 Mb pinned RAM anymore.
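
The cpm_common.c change is not shown in full above; presumably the early
debug path only has to make sure the mapping exists before touching the
IMMR, along the lines of this hedged sketch (the exact hook and guards may
differ in the actual patch):

    /* Sketch only: ensure IMMR is mapped before the early CPM console uses it. */
    void __init udbg_init_cpm(void)
    {
    #ifdef CONFIG_PPC_8xx
            mmu_mapin_immr();       /* assumption: map IMMR via page tables first */
    #endif
            /* ... existing early console setup ... */
    }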

Signed-off-by: Christophe Leroy 
---
v2: Disable TLB reservation to modify entry 31
---
 arch/powerpc/kernel/head_8xx.S | 39 +-
 arch/powerpc/mm/mmu_decl.h |  4 +++
 arch/powerpc/mm/nohash/8xx.c   | 15 +---
 arch/powerpc/platforms/8xx/Kconfig |  2 +-
 arch/powerpc/sysdev/cpm_common.c   |  2 ++
 5 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index c9e3d54e6a6f..d607f4b53e0f 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -749,6 +749,23 @@ start_here:
rfi
 /* Load up the kernel context */
 2:
+#ifdef CONFIG_PIN_TLB_IMMR
+   lis r0, MD_TWAM@h
+   orisr0, r0, 0x1f00
+   mtspr   SPRN_MD_CTR, r0
+   LOAD_REG_IMMEDIATE(r0, VIRT_IMMR_BASE | MD_EVALID)
+   tlbie   r0
+   mtspr   SPRN_MD_EPN, r0
+   LOAD_REG_IMMEDIATE(r0, MD_SVALID | MD_PS512K | MD_GUARDED)
+   mtspr   SPRN_MD_TWC, r0
+   mfspr   r0, SPRN_IMMR
+   rlwinm  r0, r0, 0, 0xfff8
+   ori r0, r0, 0xf0 | _PAGE_DIRTY | _PAGE_SPS | _PAGE_SH | \
+   _PAGE_NO_CACHE | _PAGE_PRESENT
+   mtspr   SPRN_MD_RPN, r0
+   lis r0, (MD_TWAM | MD_RSV4I)@h
+   mtspr   SPRN_MD_CTR, r0
+#endif
tlbia   /* Clear all TLB entries */
sync/* wait for tlbia/tlbie to finish */
 
@@ -797,28 +814,6 @@ initial_mmu:
ori r8, r8, MD_APG_INIT@l
mtspr   SPRN_MD_AP, r8
 
-   /* Map a 512k page for the IMMR to get the processor
-* internal registers (among other things).
-*/
-#ifdef CONFIG_PIN_TLB_IMMR
-   orisr10, r10, MD_RSV4I@h
-   ori r10, r10, 0x1c00
-   mtspr   SPRN_MD_CTR, r10
-
-   mfspr   r9, 638 /* Get current IMMR */
-   andis.  r9, r9, 0xfff8  /* Get 512 kbytes boundary */
-
-   lis r8, VIRT_IMMR_BASE@h/* Create vaddr for TLB */
-   ori r8, r8, MD_EVALID   /* Mark it valid */
-   mtspr   SPRN_MD_EPN, r8
-   li  r8, MD_PS512K | MD_GUARDED  /* Set 512k byte page */
-   ori r8, r8, MD_SVALID   /* Make it valid */
-   mtspr   SPRN_MD_TWC, r8
-   mr  r8, r9  /* Create paddr for TLB */
-   ori r8, r8, MI_BOOTINIT|0x2 /* Inhibit cache -- Cort */
-   mtspr   SPRN_MD_RPN, r8
-#endif
-
/* Now map the lower RAM (up to 32 Mbytes) into the ITLB. */
 #ifdef CONFIG_PIN_TLB_TEXT
lis r8, MI_RSV4I@h
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 7097e07a209a..1b6d39e9baed 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -182,6 +182,10 @@ static inline void mmu_mark_initmem_nx(void) { }
 static inline void mmu_mark_rodata_ro(void) { }
 #endif
 
+#ifdef CONFIG_PPC_8xx
+void __init mmu_mapin_immr(void);
+#endif
+
 #ifdef CONFIG_PPC_DEBUG_WX
 void ptdump_check_wx(void);
 #else
diff --git a/arch/powerpc/mm/nohash/8xx.c b/arch/powerpc/mm/nohash/8xx.c
index d83a12c5bc7f..d54d395c3378 100644
--- a/arch/powerpc/mm/nohash/8xx.c
+++ b/arch/powerpc/mm/nohash/8xx.c
@@ -65,7 +65,7 @@ void __init MMU_init_hw(void)
if (IS_ENABLED(CONFIG_PIN_TLB_DATA)) {
unsigned long ctr = mfspr(SPRN_MD_CTR) & 0xfe00;
unsigned long flags = 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY;
-   int i = IS_ENABLED(CONFIG_PIN_TLB_IMMR) ? 29 : 28;
+   int i = 28;
unsigned long addr = 0;
unsigned long mem = total_lowmem;
 
@@ -80,12 +80,19 @@ void __init MMU_init_hw(void)
}
 }
 
-static void __init mmu_mapin_immr(void)
+static bool immr_is_mapped __initdata;
+
+void __init mmu_mapin_immr(void)
 {
unsigned long p = PHYS_IMMR_BASE;
unsigned long v = VIRT_IMMR_BASE;
int offset;
 
+   if (immr_is_mapped)
+   return;
+
+   immr_is_mapped = true;
+
for (offset = 0; offset < IMMR_SIZE; offset += PAGE_SIZE)
map_kernel_page(v + offset, p + offset, PAGE_KERNEL_NCG);
 }
@@ -121,9 +128,10 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 {
unsigned long mapped;
 
+   mmu_mapin_immr();
+
if (__map_without_ltlbs) {
mapped = 0;
-   mmu_mapin_immr();
if (!IS_ENABLED(CONFIG_PIN_TLB_IMMR))
patch_instruction_site(__dtlbmiss_immr_jmp, 
PPC_INST_NOP);
if (!IS_ENABLED(CONFIG_PIN_TLB_TEXT))
@@ -142,7 +150,6 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
 */
mmu_mapin_ram_chunk(0, 

[PATCH v2 30/45] powerpc/8xx: Add function to set pinned TLBs

2020-05-06 Thread Christophe Leroy
Pinned TLBs cannot be modified when the MMU is enabled.

Create a function to rewrite the pinned TLB entries with MMU off.

To set pinned TLBs, we have to turn off the MMU, disable pinning,
do a TLB flush (either with tlbie or tlbia), then reprogram
the TLB entries, enable pinning and turn the MMU back on.

If using tlbie, it clears entries in both the instruction and data
TLBs regardless of whether pinning is disabled or not.
If using tlbia, it clears all entries of the TLB which has
pinning disabled.

To make it easy, just clear all entries in both TLBs, and
reprogram them.

The function takes two arguments, the top of the memory to
consider and whether data is RO under _sinittext.
When DEBUG_PAGEALLOC is set, the top is the end of kernel rodata.
Otherwise, that's the top of physical RAM.

Everything below _sinittext is set RX, over _sinittext that's RW.
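
The overall flow, expressed as a C-comment sketch (the real mmu_pin_tlb()
below is assembly and runs with the MMU off; this is only a summary of the
steps described above, not code from the patch):

    /*
     * Sketch of the mmu_pin_tlb(top, readonly) sequence:
     *   1. rfi to a real-mode stub with MSR_IR/MSR_DR/MSR_RI cleared
     *   2. clear RSV4I in MI_CTR/MD_CTR to disable pinning
     *   3. tlbia to flush every entry in both TLBs
     *   4. reprogram the ITLB entries: 8M pages up to _sinittext, RX
     *   5. reprogram the DTLB entries up to 'top', RO below _sinittext if asked
     *   6. set RSV4I again where the CONFIG_PIN_TLB_* options require it
     *   7. rfi back with the saved MSR
     */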

Signed-off-by: Christophe Leroy 
---
v2: Function rewritten to manage all entries at once.
---
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h |   2 +
 arch/powerpc/kernel/head_8xx.S   | 103 +++
 2 files changed, 105 insertions(+)

diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index a092e6434bda..4d3ef3841b00 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -193,6 +193,8 @@
 
 #include 
 
+void mmu_pin_tlb(unsigned long top, bool readonly);
+
 typedef struct {
unsigned int id;
unsigned int active;
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 423465b10c82..c9e3d54e6a6f 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -16,6 +16,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -866,6 +867,108 @@ initial_mmu:
mtspr   SPRN_DER, r8
blr
 
+#ifdef CONFIG_PIN_TLB
+_GLOBAL(mmu_pin_tlb)
+   lis r9, (1f - PAGE_OFFSET)@h
+   ori r9, r9, (1f - PAGE_OFFSET)@l
+   mfmsr   r10
+   mflrr11
+   li  r12, MSR_KERNEL & ~(MSR_IR | MSR_DR | MSR_RI)
+   rlwinm  r0, r10, 0, ~MSR_RI
+   rlwinm  r0, r0, 0, ~MSR_EE
+   mtmsr   r0
+   isync
+   .align  4
+   mtspr   SPRN_SRR0, r9
+   mtspr   SPRN_SRR1, r12
+   rfi
+1:
+   li  r5, 0
+   lis r6, MD_TWAM@h
+   mtspr   SPRN_MI_CTR, r5
+   mtspr   SPRN_MD_CTR, r6
+   tlbia
+
+#ifdef CONFIG_PIN_TLB_TEXT
+   LOAD_REG_IMMEDIATE(r5, 28 << 8)
+   LOAD_REG_IMMEDIATE(r6, PAGE_OFFSET)
+   LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG)
+   LOAD_REG_IMMEDIATE(r8, 0xf0 | _PAGE_RO | _PAGE_SPS | _PAGE_SH | 
_PAGE_PRESENT)
+   LOAD_REG_ADDR(r9, _sinittext)
+   li  r0, 4
+   mtctr   r0
+
+2: ori r0, r6, MI_EVALID
+   mtspr   SPRN_MI_CTR, r5
+   mtspr   SPRN_MI_EPN, r0
+   mtspr   SPRN_MI_TWC, r7
+   mtspr   SPRN_MI_RPN, r8
+   addir5, r5, 0x100
+   addis   r6, r6, SZ_8M@h
+   addis   r8, r8, SZ_8M@h
+   cmplw   r6, r9
+   bdnzt   lt, 2b
+   lis r0, MI_RSV4I@h
+   mtspr   SPRN_MI_CTR, r0
+#endif
+   LOAD_REG_IMMEDIATE(r5, 28 << 8 | MD_TWAM)
+#ifdef CONFIG_PIN_TLB_DATA
+   LOAD_REG_IMMEDIATE(r6, PAGE_OFFSET)
+   LOAD_REG_IMMEDIATE(r7, MI_SVALID | MI_PS8MEG)
+#ifdef CONFIG_PIN_TLB_IMMR
+   li  r0, 3
+#else
+   li  r0, 4
+#endif
+   mtctr   r0
+   cmpwi   r4, 0
+   beq 4f
+   LOAD_REG_IMMEDIATE(r8, 0xf0 | _PAGE_RO | _PAGE_SPS | _PAGE_SH | 
_PAGE_PRESENT)
+   LOAD_REG_ADDR(r9, _sinittext)
+
+2: ori r0, r6, MD_EVALID
+   mtspr   SPRN_MD_CTR, r5
+   mtspr   SPRN_MD_EPN, r0
+   mtspr   SPRN_MD_TWC, r7
+   mtspr   SPRN_MD_RPN, r8
+   addir5, r5, 0x100
+   addis   r6, r6, SZ_8M@h
+   addis   r8, r8, SZ_8M@h
+   cmplw   r6, r9
+   bdnzt   lt, 2b
+
+4: LOAD_REG_IMMEDIATE(r8, 0xf0 | _PAGE_SPS | _PAGE_SH | _PAGE_PRESENT)
+2: ori r0, r6, MD_EVALID
+   mtspr   SPRN_MD_CTR, r5
+   mtspr   SPRN_MD_EPN, r0
+   mtspr   SPRN_MD_TWC, r7
+   mtspr   SPRN_MD_RPN, r8
+   addir5, r5, 0x100
+   addis   r6, r6, SZ_8M@h
+   addis   r8, r8, SZ_8M@h
+   cmplw   r6, r3
+   bdnzt   lt, 2b
+#endif
+#ifdef CONFIG_PIN_TLB_IMMR
+   LOAD_REG_IMMEDIATE(r0, VIRT_IMMR_BASE | MD_EVALID)
+   LOAD_REG_IMMEDIATE(r7, MD_SVALID | MD_PS512K | MD_GUARDED)
+   mfspr   r8, SPRN_IMMR
+   rlwinm  r8, r8, 0, 0xfff8
+   ori r8, r8, 0xf0 | _PAGE_DIRTY | _PAGE_SPS | _PAGE_SH | \
+   _PAGE_NO_CACHE | _PAGE_PRESENT
+   mtspr   SPRN_MD_CTR, r5
+   mtspr   SPRN_MD_EPN, r0
+   mtspr   SPRN_MD_TWC, r7
+   mtspr   SPRN_MD_RPN, r8
+#endif
+#if defined(CONFIG_PIN_TLB_IMMR) || defined(CONFIG_PIN_TLB_DATA)
+   lis r0, (MD_RSV4I | MD_TWAM)@h
+   mtspr   SPRN_MI_CTR, r0
+#endif
+   mtspr   SPRN_SRR1, r10
+   mtspr   SPRN_SRR0, r11
+   rfi
+#endif 

[PATCH v2 29/45] powerpc/8xx: Move PPC_PIN_TLB options into 8xx Kconfig

2020-05-06 Thread Christophe Leroy
PPC_PIN_TLB options are dedicated to the 8xx, move them into
the 8xx Kconfig.

While we are at it, add some text to explain what it does.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   | 20 ---
 arch/powerpc/platforms/8xx/Kconfig | 41 ++
 2 files changed, 41 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8324d98728db..f552726c9de2 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -1226,26 +1226,6 @@ config TASK_SIZE
hex "Size of user task space" if TASK_SIZE_BOOL
default "0x8000" if PPC_8xx
default "0xc000"
-
-config PIN_TLB
-   bool "Pinned Kernel TLBs (860 ONLY)"
-   depends on ADVANCED_OPTIONS && PPC_8xx && \
-  !DEBUG_PAGEALLOC && !STRICT_KERNEL_RWX
-
-config PIN_TLB_DATA
-   bool "Pinned TLB for DATA"
-   depends on PIN_TLB
-   default y
-
-config PIN_TLB_IMMR
-   bool "Pinned TLB for IMMR"
-   depends on PIN_TLB || PPC_EARLY_DEBUG_CPM
-   default y
-
-config PIN_TLB_TEXT
-   bool "Pinned TLB for TEXT"
-   depends on PIN_TLB
-   default y
 endmenu
 
 if PPC64
diff --git a/arch/powerpc/platforms/8xx/Kconfig 
b/arch/powerpc/platforms/8xx/Kconfig
index b37de62d7e7f..0d036cd868ef 100644
--- a/arch/powerpc/platforms/8xx/Kconfig
+++ b/arch/powerpc/platforms/8xx/Kconfig
@@ -162,4 +162,45 @@ config UCODE_PATCH
default y
depends on !NO_UCODE_PATCH
 
+menu "8xx advanced setup"
+   depends on PPC_8xx
+
+config PIN_TLB
+   bool "Pinned Kernel TLBs"
+   depends on ADVANCED_OPTIONS && !DEBUG_PAGEALLOC && !STRICT_KERNEL_RWX
+   help
+ On the 8xx, we have 32 instruction TLBs and 32 data TLBs. In each
+ table 4 TLBs can be pinned.
+
+ It reduces the amount of usable TLBs to 28 (ie by 12%). That's the
+ reason why we make it selectable.
+
+ This option does nothing, it just activate the selection of what
+ to pin.
+
+config PIN_TLB_DATA
+   bool "Pinned TLB for DATA"
+   depends on PIN_TLB
+   default y
+   help
+ This pins the first 32 Mbytes of memory with 8M pages.
+
+config PIN_TLB_IMMR
+   bool "Pinned TLB for IMMR"
+   depends on PIN_TLB || PPC_EARLY_DEBUG_CPM
+   default y
+   help
+ This pins the IMMR area with a 512kbytes page. In case
+ CONFIG_PIN_TLB_DATA is also selected, it will reduce
+ CONFIG_PIN_TLB_DATA to 24 Mbytes.
+
+config PIN_TLB_TEXT
+   bool "Pinned TLB for TEXT"
+   depends on PIN_TLB
+   default y
+   help
+ This pins kernel text with 8M pages.
+
+endmenu
+
 endmenu
-- 
2.25.0



[PATCH v2 28/45] powerpc/8xx: MM_SLICE is not needed anymore

2020-05-06 Thread Christophe Leroy
As the 8xx now manages 512k pages in standard page tables,
it doesn't need CONFIG_PPC_MM_SLICES anymore.

Don't select it anymore and remove all related code.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 64 
 arch/powerpc/include/asm/nohash/32/slice.h   | 20 --
 arch/powerpc/include/asm/slice.h |  2 -
 arch/powerpc/platforms/Kconfig.cputype   |  1 -
 4 files changed, 87 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/nohash/32/slice.h

diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index 26b7cee34dfe..a092e6434bda 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -176,12 +176,6 @@
  */
 #define SPRN_M_TW  799
 
-#ifdef CONFIG_PPC_MM_SLICES
-#include 
-#define SLICE_ARRAY_SIZE   (1 << (32 - SLICE_LOW_SHIFT - 1))
-#define LOW_SLICE_ARRAY_SZ SLICE_ARRAY_SIZE
-#endif
-
 #if defined(CONFIG_PPC_4K_PAGES)
 #define mmu_virtual_psize  MMU_PAGE_4K
 #elif defined(CONFIG_PPC_16K_PAGES)
@@ -199,71 +193,13 @@
 
 #include 
 
-struct slice_mask {
-   u64 low_slices;
-   DECLARE_BITMAP(high_slices, 0);
-};
-
 typedef struct {
unsigned int id;
unsigned int active;
unsigned long vdso_base;
-#ifdef CONFIG_PPC_MM_SLICES
-   u16 user_psize; /* page size index */
-   unsigned char low_slices_psize[SLICE_ARRAY_SIZE];
-   unsigned char high_slices_psize[0];
-   unsigned long slb_addr_limit;
-   struct slice_mask mask_base_psize; /* 4k or 16k */
-   struct slice_mask mask_512k;
-   struct slice_mask mask_8m;
-#endif
void *pte_frag;
 } mm_context_t;
 
-#ifdef CONFIG_PPC_MM_SLICES
-static inline u16 mm_ctx_user_psize(mm_context_t *ctx)
-{
-   return ctx->user_psize;
-}
-
-static inline void mm_ctx_set_user_psize(mm_context_t *ctx, u16 user_psize)
-{
-   ctx->user_psize = user_psize;
-}
-
-static inline unsigned char *mm_ctx_low_slices(mm_context_t *ctx)
-{
-   return ctx->low_slices_psize;
-}
-
-static inline unsigned char *mm_ctx_high_slices(mm_context_t *ctx)
-{
-   return ctx->high_slices_psize;
-}
-
-static inline unsigned long mm_ctx_slb_addr_limit(mm_context_t *ctx)
-{
-   return ctx->slb_addr_limit;
-}
-
-static inline void mm_ctx_set_slb_addr_limit(mm_context_t *ctx, unsigned long 
limit)
-{
-   ctx->slb_addr_limit = limit;
-}
-
-static inline struct slice_mask *slice_mask_for_size(mm_context_t *ctx, int 
psize)
-{
-   if (psize == MMU_PAGE_512K)
-   return &ctx->mask_512k;
-   if (psize == MMU_PAGE_8M)
-   return &ctx->mask_8m;
-
-   BUG_ON(psize != mmu_virtual_psize);
-
-   return &ctx->mask_base_psize;
-}
-#endif /* CONFIG_PPC_MM_SLICE */
-
 #define PHYS_IMMR_BASE (mfspr(SPRN_IMMR) & 0xfff8)
 #define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
 
diff --git a/arch/powerpc/include/asm/nohash/32/slice.h 
b/arch/powerpc/include/asm/nohash/32/slice.h
deleted file mode 100644
index 39eb0154ae2d..
--- a/arch/powerpc/include/asm/nohash/32/slice.h
+++ /dev/null
@@ -1,20 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_POWERPC_NOHASH_32_SLICE_H
-#define _ASM_POWERPC_NOHASH_32_SLICE_H
-
-#ifdef CONFIG_PPC_MM_SLICES
-
-#define SLICE_LOW_SHIFT26  /* 64 slices */
-#define SLICE_LOW_TOP  (0x1ull)
-#define SLICE_NUM_LOW  (SLICE_LOW_TOP >> SLICE_LOW_SHIFT)
-#define GET_LOW_SLICE_INDEX(addr)  ((addr) >> SLICE_LOW_SHIFT)
-
-#define SLICE_HIGH_SHIFT   0
-#define SLICE_NUM_HIGH 0ul
-#define GET_HIGH_SLICE_INDEX(addr) (addr & 0)
-
-#define SLB_ADDR_LIMIT_DEFAULT DEFAULT_MAP_WINDOW
-
-#endif /* CONFIG_PPC_MM_SLICES */
-
-#endif /* _ASM_POWERPC_NOHASH_32_SLICE_H */
diff --git a/arch/powerpc/include/asm/slice.h b/arch/powerpc/include/asm/slice.h
index c6f466f4c241..0bdd9c62eca0 100644
--- a/arch/powerpc/include/asm/slice.h
+++ b/arch/powerpc/include/asm/slice.h
@@ -4,8 +4,6 @@
 
 #ifdef CONFIG_PPC_BOOK3S_64
 #include 
-#elif defined(CONFIG_PPC_MMU_NOHASH_32)
-#include 
 #endif
 
 #ifndef __ASSEMBLY__
diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 27a81c291be8..5774a55a9c58 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -55,7 +55,6 @@ config PPC_8xx
select SYS_SUPPORTS_HUGETLBFS
select PPC_HAVE_KUEP
select PPC_HAVE_KUAP
-   select PPC_MM_SLICES if HUGETLB_PAGE
select HAVE_ARCH_VMAP_STACK
 
 config 40x
-- 
2.25.0



[PATCH v2 27/45] powerpc/8xx: Only 8M pages are hugepte pages now

2020-05-06 Thread Christophe Leroy
512k pages are now standard pages, so only 8M pages
are hugepte.

No more handling of normal page tables through hugepd allocation
and freeing, and hugepte helpers can also be simplified.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h |  7 +++
 arch/powerpc/mm/hugetlbpage.c| 16 +++-
 2 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h 
b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
index 785437323576..1c7d4693a78e 100644
--- a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
@@ -13,13 +13,13 @@ static inline pte_t *hugepd_page(hugepd_t hpd)
 
 static inline unsigned int hugepd_shift(hugepd_t hpd)
 {
-   return ((hpd_val(hpd) & _PMD_PAGE_MASK) >> 1) + 17;
+   return PAGE_SHIFT_8M;
 }
 
 static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
unsigned int pdshift)
 {
-   unsigned long idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT;
+   unsigned long idx = (addr & (SZ_4M - 1)) >> PAGE_SHIFT;
 
return hugepd_page(hpd) + idx;
 }
@@ -32,8 +32,7 @@ static inline void flush_hugetlb_page(struct vm_area_struct 
*vma,
 
 static inline void hugepd_populate(hugepd_t *hpdp, pte_t *new, unsigned int 
pshift)
 {
-   *hpdp = __hugepd(__pa(new) | _PMD_USER | _PMD_PRESENT |
-(pshift == PAGE_SHIFT_8M ? _PMD_PAGE_8M : 
_PMD_PAGE_512K));
+   *hpdp = __hugepd(__pa(new) | _PMD_USER | _PMD_PRESENT | _PMD_PAGE_8M);
 }
 
 static inline int check_and_get_huge_psize(int shift)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 35eb29584b54..243e90db400c 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -54,24 +54,17 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t 
*hpdp,
if (pshift >= pdshift) {
cachep = PGT_CACHE(PTE_T_ORDER);
num_hugepd = 1 << (pshift - pdshift);
-   new = NULL;
-   } else if (IS_ENABLED(CONFIG_PPC_8xx)) {
-   cachep = NULL;
-   num_hugepd = 1;
-   new = pte_alloc_one(mm);
} else {
cachep = PGT_CACHE(pdshift - pshift);
num_hugepd = 1;
-   new = NULL;
}
 
-   if (!cachep && !new) {
+   if (!cachep) {
WARN_ONCE(1, "No page table cache created for hugetlb tables");
return -ENOMEM;
}
 
-   if (cachep)
-   new = kmem_cache_alloc(cachep, pgtable_gfp_flags(mm, 
GFP_KERNEL));
+   new = kmem_cache_alloc(cachep, pgtable_gfp_flags(mm, GFP_KERNEL));
 
BUG_ON(pshift > HUGEPD_SHIFT_MASK);
BUG_ON((unsigned long)new & HUGEPD_SHIFT_MASK);
@@ -102,10 +95,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t 
*hpdp,
if (i < num_hugepd) {
for (i = i - 1 ; i >= 0; i--, hpdp--)
*hpdp = __hugepd(0);
-   if (cachep)
-   kmem_cache_free(cachep, new);
-   else
-   pte_free(mm, new);
+   kmem_cache_free(cachep, new);
} else {
kmemleak_ignore(new);
}
-- 
2.25.0



[PATCH v2 26/45] powerpc/8xx: Manage 512k huge pages as standard pages.

2020-05-06 Thread Christophe Leroy
At the time being, 512k huge pages are handled through hugepd page
tables. The PMD entry is flagged as a hugepd pointer and it
means that only 512k hugepages can be managed in that 4M block.
However, the hugepd table has the same size as a normal page
table, and 512k entries can therefore be nested with normal pages.

On the 8xx, TLB loading is performed by software and although the
page tables are organised to match the L1 and L2 levels defined by
the HW, each TLB entry has independent L1 and L2 parts.
It means that even if two TLB entries are associated with the same
PMD entry, they can be loaded with different values in the L1 part.

The L1 entry contains the page size (PS field):
- 00 for 4k and 16k pages
- 01 for 512k pages
- 11 for 8M pages

By adding a flag for hugepages in the PTE (_PAGE_HUGE) and copying it
into the lower bit of PS, we can then manage 512k pages with normal
page tables:
- PMD entry has PS=11 for 8M pages
- PMD entry has PS=00 for other pages.

As a PMD entry covers 4M areas, a PMD will either point to a hugepd
table having a single entry to an 8M page, or the PMD will point to
a standard page table which will have either entries to 4k or 16k or
512k pages. For 512k pages, as the L1 entry will not know it is a
512k page before the PTE is read, there will be 128 entries in the
PTE table, as if they were 4k pages. But when loading the TLB, the
entry will be flagged as a 512k page.

Note that we can't use pmd_ptr() in asm/nohash/32/pgtable.h because
it is not defined yet.

In the ITLB miss handler, we keep the possibility to opt it out, as when
kernel text is pinned and no user hugepages are used we can save several
instructions by not using r11.

In DTLB miss, that's just one instruction so it's not worth bothering
with it.
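
To make the idea concrete, here is a rough C sketch of how the number of PTE
slots covering one mapping can be derived (only an illustration of the
mechanism described above; the helper name is made up, the flags are the ones
used by the patch):

    /* Sketch: how many 4k-sized PTE slots describe one mapping on the 8xx. */
    static int pte_slots(pmd_t *pmd, bool huge)
    {
            if (!huge)
                    return PAGE_SIZE / SZ_4K;       /* 1 for 4k, 4 for 16k kernel pages */
            if ((pmd_val(*pmd) & _PMD_PAGE_MASK) == _PMD_PAGE_8M)
                    return 1;                       /* 8M: single hugepd entry */
            return SZ_512K / SZ_4K;                 /* 512k: 128 entries flagged _PAGE_HUGE */
    }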

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 10 ++---
 arch/powerpc/include/asm/nohash/32/pte-8xx.h |  4 +++-
 arch/powerpc/include/asm/nohash/pgtable.h|  2 +-
 arch/powerpc/kernel/head_8xx.S   | 12 +--
 arch/powerpc/mm/hugetlbpage.c| 22 +---
 arch/powerpc/mm/pgtable.c| 10 -
 6 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 1a86d20b58f3..1504af38a9a8 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -229,8 +229,9 @@ static inline void pmd_clear(pmd_t *pmdp)
  * those implementations.
  *
  * On the 8xx, the page tables are a bit special. For 16k pages, we have
- * 4 identical entries. For other page sizes, we have a single entry in the
- * table.
+ * 4 identical entries. For 512k pages, we have 128 entries as if it was
+ * 4k pages, but they are flagged as 512k pages for the hardware.
+ * For other page sizes, we have a single entry in the table.
  */
 #ifdef CONFIG_PPC_8xx
 static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, 
pte_t *p,
@@ -240,13 +241,16 @@ static inline pte_basic_t pte_update(struct mm_struct 
*mm, unsigned long addr, p
pte_basic_t old = pte_val(*p);
pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
int num, i;
+   pmd_t *pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);
 
if (!huge)
num = PAGE_SIZE / SZ_4K;
+   else if ((pmd_val(*pmd) & _PMD_PAGE_MASK) != _PMD_PAGE_8M)
+   num = SZ_512K / SZ_4K;
else
num = 1;
 
-   for (i = 0; i < num; i++, entry++)
+   for (i = 0; i < num; i++, entry++, new += SZ_4K)
*entry = new;
 
return old;
diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h 
b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index c9e4b2d90f65..66f403a7da44 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -46,6 +46,8 @@
 #define _PAGE_NA   0x0200  /* Supervisor NA, User no access */
 #define _PAGE_RO   0x0600  /* Supervisor RO, User no access */
 
+#define _PAGE_HUGE 0x0800  /* Copied to L1 PS bit 29 */
+
 /* cache related flags non existing on 8xx */
 #define _PAGE_COHERENT 0
 #define _PAGE_WRITETHRU0
@@ -128,7 +130,7 @@ static inline pte_t pte_mkuser(pte_t pte)
 
 static inline pte_t pte_mkhuge(pte_t pte)
 {
-   return __pte(pte_val(pte) | _PAGE_SPS);
+   return __pte(pte_val(pte) | _PAGE_SPS | _PAGE_HUGE);
 }
 
 #define pte_mkhuge pte_mkhuge
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 7fed9dc0f147..f27c967d9269 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -267,7 +267,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, 
unsigned long pfn,
 static inline int hugepd_ok(hugepd_t hpd)
 {
 #ifdef CONFIG_PPC_8xx
-   return ((hpd_val(hpd) & 0x4) != 0);
+   return 

[PATCH v2 25/45] powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.

2020-05-06 Thread Christophe Leroy
Prepare the ITLB handler to handle _PAGE_HUGE when CONFIG_HUGETLBFS
is enabled. This means that the L1 entry has to be kept in r11
until the L2 entry is read, in order to insert _PAGE_HUGE into it.

Also move pgd_offset helpers before pte_update() as they
will be needed there in next patch.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 13 ++---
 arch/powerpc/kernel/head_8xx.S   | 15 +--
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index dd5835354e33..1a86d20b58f3 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -206,6 +206,12 @@ static inline void pmd_clear(pmd_t *pmdp)
 }
 
 
+/* to find an entry in a kernel page-table-directory */
+#define pgd_offset_k(address) pgd_offset(&init_mm, address)
+
+/* to find an entry in a page-table-directory */
+#define pgd_index(address)  ((address) >> PGDIR_SHIFT)
+#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
 
 /*
  * PTE updates. This function is called whenever an existing
@@ -348,13 +354,6 @@ static inline int pte_young(pte_t pte)
pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
 #endif
 
-/* to find an entry in a kernel page-table-directory */
-#define pgd_offset_k(address) pgd_offset(&init_mm, address)
-
-/* to find an entry in a page-table-directory */
-#define pgd_index(address)  ((address) >> PGDIR_SHIFT)
-#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
-
 /* Find an entry in the third-level page table.. */
 #define pte_index(address) \
(((address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 905205c79a25..adad8baadcf5 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -196,7 +196,7 @@ SystemCall:
 
 InstructionTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || 
defined(CONFIG_HUGETLBFS)
mtspr   SPRN_SPRG_SCRATCH1, r11
 #endif
 
@@ -235,16 +235,19 @@ InstructionTLBMiss:
rlwinm  r10, r10, 0, 20, 31
orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
+   mtcrr11
 #endif
+#ifdef CONFIG_HUGETLBFS
+   lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r10)/* Get level 1 
entry */
+   mtspr   SPRN_MI_TWC, r11/* Set segment attributes */
+   mtspr   SPRN_MD_TWC, r11
+#else
lwz r10, (swapper_pg_dir-PAGE_OFFSET)@l(r10)/* Get level 1 
entry */
mtspr   SPRN_MI_TWC, r10/* Set segment attributes */
-
mtspr   SPRN_MD_TWC, r10
+#endif
mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
-#ifdef ITLB_MISS_KERNEL
-   mtcrr11
-#endif
 #ifdef CONFIG_SWAP
rlwinm  r11, r10, 32-5, _PAGE_PRESENT
and r11, r11, r10
@@ -263,7 +266,7 @@ InstructionTLBMiss:
 
/* Restore registers */
 0: mfspr   r10, SPRN_SPRG_SCRATCH0
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || 
defined(CONFIG_HUGETLBFS)
mfspr   r11, SPRN_SPRG_SCRATCH1
 #endif
rfi
-- 
2.25.0



[PATCH v2 24/45] powerpc/8xx: Drop CONFIG_8xx_COPYBACK option

2020-05-06 Thread Christophe Leroy
CONFIG_8xx_COPYBACK was there to help disable copyback cache mode
when debugging hardware. But nobody will design new boards with the 8xx now.

All 8xx platforms select it, so make it the default and remove
the option.

Also remove the Mx_RESETVAL values which are pretty useless and hide
the real value while reading code.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/configs/adder875_defconfig  |  1 -
 arch/powerpc/configs/ep88xc_defconfig|  1 -
 arch/powerpc/configs/mpc866_ads_defconfig|  1 -
 arch/powerpc/configs/mpc885_ads_defconfig|  1 -
 arch/powerpc/configs/tqm8xx_defconfig|  1 -
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h |  2 --
 arch/powerpc/kernel/head_8xx.S   | 15 +--
 arch/powerpc/platforms/8xx/Kconfig   |  9 -
 8 files changed, 1 insertion(+), 30 deletions(-)

diff --git a/arch/powerpc/configs/adder875_defconfig 
b/arch/powerpc/configs/adder875_defconfig
index f55e23cb176c..5326bc739279 100644
--- a/arch/powerpc/configs/adder875_defconfig
+++ b/arch/powerpc/configs/adder875_defconfig
@@ -10,7 +10,6 @@ CONFIG_EXPERT=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_PARTITION_ADVANCED=y
 CONFIG_PPC_ADDER875=y
-CONFIG_8xx_COPYBACK=y
 CONFIG_GEN_RTC=y
 CONFIG_HZ_1000=y
 # CONFIG_SECCOMP is not set
diff --git a/arch/powerpc/configs/ep88xc_defconfig 
b/arch/powerpc/configs/ep88xc_defconfig
index 0e2e5e81a359..f5c3e72da719 100644
--- a/arch/powerpc/configs/ep88xc_defconfig
+++ b/arch/powerpc/configs/ep88xc_defconfig
@@ -12,7 +12,6 @@ CONFIG_EXPERT=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_PARTITION_ADVANCED=y
 CONFIG_PPC_EP88XC=y
-CONFIG_8xx_COPYBACK=y
 CONFIG_GEN_RTC=y
 CONFIG_HZ_100=y
 # CONFIG_SECCOMP is not set
diff --git a/arch/powerpc/configs/mpc866_ads_defconfig 
b/arch/powerpc/configs/mpc866_ads_defconfig
index 5320735395e7..5c56d36cdfc5 100644
--- a/arch/powerpc/configs/mpc866_ads_defconfig
+++ b/arch/powerpc/configs/mpc866_ads_defconfig
@@ -12,7 +12,6 @@ CONFIG_EXPERT=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_PARTITION_ADVANCED=y
 CONFIG_MPC86XADS=y
-CONFIG_8xx_COPYBACK=y
 CONFIG_GEN_RTC=y
 CONFIG_HZ_1000=y
 CONFIG_MATH_EMULATION=y
diff --git a/arch/powerpc/configs/mpc885_ads_defconfig 
b/arch/powerpc/configs/mpc885_ads_defconfig
index 82a008c04eae..949ff9ccda5e 100644
--- a/arch/powerpc/configs/mpc885_ads_defconfig
+++ b/arch/powerpc/configs/mpc885_ads_defconfig
@@ -11,7 +11,6 @@ CONFIG_EXPERT=y
 # CONFIG_VM_EVENT_COUNTERS is not set
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_PARTITION_ADVANCED=y
-CONFIG_8xx_COPYBACK=y
 CONFIG_GEN_RTC=y
 CONFIG_HZ_100=y
 # CONFIG_SECCOMP is not set
diff --git a/arch/powerpc/configs/tqm8xx_defconfig 
b/arch/powerpc/configs/tqm8xx_defconfig
index eda8bfb2d0a3..77857d513022 100644
--- a/arch/powerpc/configs/tqm8xx_defconfig
+++ b/arch/powerpc/configs/tqm8xx_defconfig
@@ -15,7 +15,6 @@ CONFIG_MODULE_SRCVERSION_ALL=y
 # CONFIG_BLK_DEV_BSG is not set
 CONFIG_PARTITION_ADVANCED=y
 CONFIG_TQM8XX=y
-CONFIG_8xx_COPYBACK=y
 # CONFIG_8xx_CPU15 is not set
 CONFIG_GEN_RTC=y
 CONFIG_HZ_100=y
diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index 76af5b0cb16e..26b7cee34dfe 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -19,7 +19,6 @@
 #define MI_RSV4I   0x0800  /* Reserve 4 TLB entries */
 #define MI_PPCS0x0200  /* Use MI_RPN prob/priv state */
 #define MI_IDXMASK 0x1f00  /* TLB index to be loaded */
-#define MI_RESETVAL0x  /* Value of register at reset */
 
 /* These are the Ks and Kp from the PowerPC books.  For proper operation,
  * Ks = 0, Kp = 1.
@@ -95,7 +94,6 @@
 #define MD_TWAM0x0400  /* Use 4K page hardware assist 
*/
 #define MD_PPCS0x0200  /* Use MI_RPN prob/priv state */
 #define MD_IDXMASK 0x1f00  /* TLB index to be loaded */
-#define MD_RESETVAL0x0400  /* Value of register at reset */
 
 #define SPRN_M_CASID   793 /* Address space ID (context) to match */
 #define MC_ASIDMASK0x000f  /* Bits used for ASID value */
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 073a651787df..905205c79a25 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -779,10 +779,7 @@ start_here:
 initial_mmu:
li  r8, 0
mtspr   SPRN_MI_CTR, r8 /* remove PINNED ITLB entries */
-   lis r10, MD_RESETVAL@h
-#ifndef CONFIG_8xx_COPYBACK
-   orisr10, r10, MD_WTDEF@h
-#endif
+   lis r10, MD_TWAM@h
mtspr   SPRN_MD_CTR, r10/* remove PINNED DTLB entries */
 
tlbia   /* Invalidate all TLB entries */
@@ -857,17 +854,7 @@ initial_mmu:
mtspr   SPRN_DC_CST, r8
lis r8, IDC_ENABLE@h
mtspr   SPRN_IC_CST, r8
-#ifdef CONFIG_8xx_COPYBACK
-   mtspr   SPRN_DC_CST, r8
-#else
-   /* For a 

[PATCH v2 23/45] powerpc/mm: Reduce hugepd size for 8M hugepages on 8xx

2020-05-06 Thread Christophe Leroy
Commit 55c8fc3f4930 ("powerpc/8xx: reintroduce 16K pages with HW
assistance") redefined pte_t as a struct of 4 pte_basic_t, because
in 16K pages mode there are four identical entries in the page table.
But hugepd entries for 8M pages require only one entry of size
pte_basic_t. So there is no point in creating a cache for 4 entries
page tables.

Calculate PTE_T_ORDER using the size of pte_basic_t instead of pte_t.

Define specific huge_pte helpers (set_huge_pte_at(), huge_pte_clear(),
huge_ptep_set_wrprotect()) to write the pte in a single entry instead
of using set_pte_at() which writes 4 identical entries in 16k pages
mode. Also make sure that __ptep_set_access_flags() properly handle
the huge_pte case.

Define set_pte_filter() inline, otherwise GCC doesn't inline it anymore
because it is now used twice, and that gives pretty suboptimal code
because pte_t is a struct of 4 entries.

Those functions are also used for 512k pages, which only require one
entry as well, although replicating it four times was harmless as 512k
page entries are spread every 128 bytes in the table.
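
As a quick sanity check of the resulting allocation order (a worked example,
assuming 32-bit pointers and 16k pages where pte_t wraps four pte_basic_t):

    /* Sketch of the PTE_T_ORDER arithmetic on a 32-bit build with 16k pages:
     * before: __builtin_ffs(sizeof(pte_t))       - __builtin_ffs(sizeof(void *)) = 5 - 3 = 2
     * after:  __builtin_ffs(sizeof(pte_basic_t)) - __builtin_ffs(sizeof(void *)) = 3 - 3 = 0
     * i.e. the hugepd cache no longer has to be sized for 4 identical entries.
     */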

Signed-off-by: Christophe Leroy 
---
 .../include/asm/nohash/32/hugetlb-8xx.h   | 20 ++
 arch/powerpc/include/asm/nohash/32/pgtable.h  |  3 ++-
 arch/powerpc/mm/hugetlbpage.c |  3 ++-
 arch/powerpc/mm/pgtable.c | 26 ---
 4 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h 
b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
index a46616937d20..785437323576 100644
--- a/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/hugetlb-8xx.h
@@ -41,4 +41,24 @@ static inline int check_and_get_huge_psize(int shift)
return shift_to_mmu_psize(shift);
 }
 
+#define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
+void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, 
pte_t pte);
+
+#define __HAVE_ARCH_HUGE_PTE_CLEAR
+static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
+ pte_t *ptep, unsigned long sz)
+{
+   pte_update(mm, addr, ptep, ~0UL, 0, 1);
+}
+
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
+static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
+  unsigned long addr, pte_t *ptep)
+{
+   unsigned long clr = ~pte_val(pte_wrprotect(__pte(~0)));
+   unsigned long set = pte_val(pte_wrprotect(__pte(0)));
+
+   pte_update(mm, addr, ptep, clr, set, 1);
+}
+
 #endif /* _ASM_POWERPC_NOHASH_32_HUGETLB_8XX_H */
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 272963a05ab2..dd5835354e33 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -314,8 +314,9 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
pte_t pte_clr = 
pte_mkyoung(pte_mkdirty(pte_mkwrite(pte_mkexec(__pte(~0)))));
unsigned long set = pte_val(entry) & pte_val(pte_set);
unsigned long clr = ~pte_val(entry) & ~pte_val(pte_clr);
+   int huge = psize > mmu_virtual_psize ? 1 : 0;
 
-   pte_update(vma->vm_mm, address, ptep, clr, set, 0);
+   pte_update(vma->vm_mm, address, ptep, clr, set, huge);
 
flush_tlb_page(vma, address);
 }
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 33b3461d91e8..edf511c2a30a 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -30,7 +30,8 @@ bool hugetlb_disabled = false;
 
 #define hugepd_none(hpd)   (hpd_val(hpd) == 0)
 
-#define PTE_T_ORDER(__builtin_ffs(sizeof(pte_t)) - 
__builtin_ffs(sizeof(void *)))
+#define PTE_T_ORDER(__builtin_ffs(sizeof(pte_basic_t)) - \
+__builtin_ffs(sizeof(void *)))
 
 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long 
sz)
 {
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index e3759b69f81b..214a5f4beb6c 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -100,7 +100,7 @@ static pte_t set_pte_filter_hash(pte_t pte) { return pte; }
  * as we don't have two bits to spare for _PAGE_EXEC and _PAGE_HWEXEC so
  * instead we "filter out" the exec permission for non clean pages.
  */
-static pte_t set_pte_filter(pte_t pte)
+static inline pte_t set_pte_filter(pte_t pte)
 {
struct page *pg;
 
@@ -249,16 +249,34 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 
 #else
/*
-* Not used on non book3s64 platforms. But 8xx
-* can possibly use tsize derived from hstate.
+* Not used on non book3s64 platforms.
+* 8xx compares it with mmu_virtual_psize to
+* know if it is a huge page or not.
 */
-   psize = 0;
+   psize = MMU_PAGE_COUNT;
 #endif

[PATCH v2 22/45] powerpc/mm: Create a dedicated pte_update() for 8xx

2020-05-06 Thread Christophe Leroy
pte_update() is a bit special for the 8xx. At the time
being, that's an #ifdef inside the nohash/32 pte_update().

As we are going to make it even more special in the coming
patches, create a dedicated version for pte_update() for 8xx.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 29 +---
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 75880eb1cb91..272963a05ab2 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -221,7 +221,31 @@ static inline void pmd_clear(pmd_t *pmdp)
  * that an executable user mapping was modified, which is needed
  * to properly flush the virtually tagged instruction cache of
  * those implementations.
+ *
+ * On the 8xx, the page tables are a bit special. For 16k pages, we have
+ * 4 identical entries. For other page sizes, we have a single entry in the
+ * table.
  */
+#ifdef CONFIG_PPC_8xx
+static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, 
pte_t *p,
+unsigned long clr, unsigned long set, int 
huge)
+{
+   pte_basic_t *entry = &p->pte;
+   pte_basic_t old = pte_val(*p);
+   pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
+   int num, i;
+
+   if (!huge)
+   num = PAGE_SIZE / SZ_4K;
+   else
+   num = 1;
+
+   for (i = 0; i < num; i++, entry++)
+   *entry = new;
+
+   return old;
+}
+#else
 static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, 
pte_t *p,
 unsigned long clr, unsigned long set, int 
huge)
 {
@@ -242,11 +266,7 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, 
unsigned long addr, p
pte_basic_t old = pte_val(*p);
pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
 
-#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
-   p->pte = p->pte1 = p->pte2 = p->pte3 = new;
-#else
*p = __pte(new);
-#endif
 #endif /* !PTE_ATOMIC_UPDATES */
 
 #ifdef CONFIG_44x
@@ -255,6 +275,7 @@ static inline pte_basic_t pte_update(struct mm_struct *mm, 
unsigned long addr, p
 #endif
return old;
 }
+#endif
 
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
-- 
2.25.0



[PATCH v2 21/45] powerpc/mm: Standardise pte_update() prototype between PPC32 and PPC64

2020-05-06 Thread Christophe Leroy
PPC64 takes 3 additional parameters compared to PPC32:
- mm
- address
- huge

These 3 parameters will be needed in order to perform different
actions depending on the page size on the 8xx.

Make pte_update() prototype identical for PPC32 and PPC64.

This allows dropping an #ifdef in huge_ptep_get_and_clear().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 15 ---
 arch/powerpc/include/asm/hugetlb.h   |  4 
 arch/powerpc/include/asm/nohash/32/pgtable.h | 13 +++--
 3 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 8122f0b55d21..f5eab98c4e41 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -218,7 +218,7 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, 
pgprot_t prot);
  */
 
 #define pte_clear(mm, addr, ptep) \
-   do { pte_update(ptep, ~_PAGE_HASHPTE, 0); } while (0)
+   do { pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0); } while (0)
 
 #define pmd_none(pmd)  (!pmd_val(pmd))
 #definepmd_bad(pmd)(pmd_val(pmd) & _PMD_BAD)
@@ -254,7 +254,8 @@ extern void flush_hash_entry(struct mm_struct *mm, pte_t 
*ptep,
  * when using atomic updates, only the low part of the PTE is
  * accessed atomically.
  */
-static inline pte_basic_t pte_update(pte_t *p, unsigned long clr, unsigned 
long set)
+static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, 
pte_t *p,
+unsigned long clr, unsigned long set, int 
huge)
 {
pte_basic_t old;
unsigned long tmp;
@@ -292,7 +293,7 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
  unsigned long addr, pte_t *ptep)
 {
unsigned long old;
-   old = pte_update(ptep, _PAGE_ACCESSED, 0);
+   old = pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0, 0);
if (old & _PAGE_HASHPTE) {
unsigned long ptephys = __pa(ptep) & PAGE_MASK;
flush_hash_pages(mm->context.id, addr, ptephys, 1);
@@ -306,14 +307,14 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
   pte_t *ptep)
 {
-   return __pte(pte_update(ptep, ~_PAGE_HASHPTE, 0));
+   return __pte(pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, 0, 0));
 }
 
 #define __HAVE_ARCH_PTEP_SET_WRPROTECT
 static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
  pte_t *ptep)
 {
-   pte_update(ptep, _PAGE_RW, 0);
+   pte_update(mm, addr, ptep, _PAGE_RW, 0, 0);
 }
 
 static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
@@ -324,7 +325,7 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
unsigned long set = pte_val(entry) &
(_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
 
-   pte_update(ptep, 0, set);
+   pte_update(vma->vm_mm, address, ptep, 0, set, 0);
 
flush_tlb_page(vma, address);
 }
@@ -522,7 +523,7 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
*ptep = __pte((pte_val(*ptep) & _PAGE_HASHPTE)
  | (pte_val(pte) & ~_PAGE_HASHPTE));
else
-   pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte));
+   pte_update(mm, addr, ptep, ~_PAGE_HASHPTE, pte_val(pte), 0);
 
 #elif defined(CONFIG_PTE_64BIT)
/* Second case is 32-bit with 64-bit PTE.  In this case, we
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index bd6504c28c2f..e4276af034e9 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -40,11 +40,7 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned 
long addr,
 static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
 {
-#ifdef CONFIG_PPC64
return __pte(pte_update(mm, addr, ptep, ~0UL, 0, 1));
-#else
-   return __pte(pte_update(ptep, ~0UL, 0));
-#endif
 }
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index ddf681ceb860..75880eb1cb91 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -166,7 +166,7 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, 
pgprot_t prot);
 #ifndef __ASSEMBLY__
 
 #define pte_clear(mm, addr, ptep) \
-   do { pte_update(ptep, ~0, 0); } while (0)
+   do { pte_update(mm, addr, ptep, ~0, 0, 0); } while (0)
 
 #ifndef pte_mkwrite
 static inline pte_t pte_mkwrite(pte_t pte)
@@ -222,7 +222,8 @@ static inline void 

[PATCH v2 20/45] powerpc/mm: Standardise __ptep_test_and_clear_young() params between PPC32 and PPC64

2020-05-06 Thread Christophe Leroy
On PPC32, __ptep_test_and_clear_young() takes the mm->context.id

In preparation for standardising the pte_update() parameters between PPC32
and PPC64, __ptep_test_and_clear_young() needs mm instead of mm->context.id.

Replace context param by mm.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 7 ---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 5 +++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index d1108d25e2e5..8122f0b55d21 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -288,18 +288,19 @@ static inline pte_basic_t pte_update(pte_t *p, unsigned 
long clr, unsigned long
  * for our hash-based implementation, we fix that up here.
  */
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
-static inline int __ptep_test_and_clear_young(unsigned int context, unsigned 
long addr, pte_t *ptep)
+static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
 {
unsigned long old;
old = pte_update(ptep, _PAGE_ACCESSED, 0);
if (old & _PAGE_HASHPTE) {
unsigned long ptephys = __pa(ptep) & PAGE_MASK;
-   flush_hash_pages(context, addr, ptephys, 1);
+   flush_hash_pages(mm->context.id, addr, ptephys, 1);
}
return (old & _PAGE_ACCESSED) != 0;
 }
 #define ptep_test_and_clear_young(__vma, __addr, __ptep) \
-   __ptep_test_and_clear_young((__vma)->vm_mm->context.id, __addr, __ptep)
+   __ptep_test_and_clear_young((__vma)->vm_mm, __addr, __ptep)
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 9eaf386a747b..ddf681ceb860 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -256,14 +256,15 @@ static inline pte_basic_t pte_update(pte_t *p, unsigned 
long clr, unsigned long
 }
 
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
-static inline int __ptep_test_and_clear_young(unsigned int context, unsigned 
long addr, pte_t *ptep)
+static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
+ unsigned long addr, pte_t *ptep)
 {
unsigned long old;
old = pte_update(ptep, _PAGE_ACCESSED, 0);
return (old & _PAGE_ACCESSED) != 0;
 }
 #define ptep_test_and_clear_young(__vma, __addr, __ptep) \
-   __ptep_test_and_clear_young((__vma)->vm_mm->context.id, __addr, __ptep)
+   __ptep_test_and_clear_young((__vma)->vm_mm, __addr, __ptep)
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr,
-- 
2.25.0



[PATCH v2 19/45] powerpc/mm: Refactor pte_update() on book3s/32

2020-05-06 Thread Christophe Leroy
When CONFIG_PTE_64BIT is set, pte_update() operates on
'unsigned long long'
When CONFIG_PTE_64BIT is not set, pte_update() operates on
'unsigned long'

In asm/page.h, we have pte_basic_t which is 'unsigned long long'
when CONFIG_PTE_64BIT is set and 'unsigned long' otherwise.

Refactor pte_update() using pte_basic_t.

While we are at it, drop the comment on 44x which is not applicable
to book3s version of pte_update().

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 58 +++-
 1 file changed, 20 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 7549393c4c43..d1108d25e2e5 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -253,53 +253,35 @@ extern void flush_hash_entry(struct mm_struct *mm, pte_t 
*ptep,
  * and the PTE may be either 32 or 64 bit wide. In the later case,
  * when using atomic updates, only the low part of the PTE is
  * accessed atomically.
- *
- * In addition, on 44x, we also maintain a global flag indicating
- * that an executable user mapping was modified, which is needed
- * to properly flush the virtually tagged instruction cache of
- * those implementations.
  */
-#ifndef CONFIG_PTE_64BIT
-static inline unsigned long pte_update(pte_t *p,
-  unsigned long clr,
-  unsigned long set)
+static inline pte_basic_t pte_update(pte_t *p, unsigned long clr, unsigned 
long set)
 {
-   unsigned long old, tmp;
-
-   __asm__ __volatile__("\
-1: lwarx   %0,0,%3\n\
-   andc%1,%0,%4\n\
-   or  %1,%1,%5\n"
-"  stwcx.  %1,0,%3\n\
-   bne-1b"
-   : "=" (old), "=" (tmp), "=m" (*p)
-   : "r" (p), "r" (clr), "r" (set), "m" (*p)
-   : "cc" );
-
-   return old;
-}
-#else /* CONFIG_PTE_64BIT */
-static inline unsigned long long pte_update(pte_t *p,
-   unsigned long clr,
-   unsigned long set)
-{
-   unsigned long long old;
+   pte_basic_t old;
unsigned long tmp;
 
-   __asm__ __volatile__("\
-1: lwarx   %L0,0,%4\n\
-   lwzx%0,0,%3\n\
-   andc%1,%L0,%5\n\
-   or  %1,%1,%6\n"
-"  stwcx.  %1,0,%4\n\
-   bne-1b"
+   __asm__ __volatile__(
+#ifndef CONFIG_PTE_64BIT
+"1:lwarx   %0, 0, %3\n"
+"  andc%1, %0, %4\n"
+#else
+"1:lwarx   %L0, 0, %3\n"
+"  lwz %0, -4(%3)\n"
+"  andc%1, %L0, %4\n"
+#endif
+"  or  %1, %1, %5\n"
+"  stwcx.  %1, 0, %3\n"
+"  bne-1b"
: "=" (old), "=" (tmp), "=m" (*p)
-   : "r" (p), "r" ((unsigned long)(p) + 4), "r" (clr), "r" (set), "m" (*p)
+#ifndef CONFIG_PTE_64BIT
+   : "r" (p),
+#else
+   : "b" ((unsigned long)(p) + 4),
+#endif
+ "r" (clr), "r" (set), "m" (*p)
: "cc" );
 
return old;
 }
-#endif /* CONFIG_PTE_64BIT */
 
 /*
  * 2.6 calls this without flushing the TLB entry; this is wrong
-- 
2.25.0



[PATCH v2 18/45] powerpc/mm: Refactor pte_update() on nohash/32

2020-05-06 Thread Christophe Leroy
When CONFIG_PTE_64BIT is set, pte_update() operates on
'unsigned long long'
When CONFIG_PTE_64BIT is not set, pte_update() operates on
'unsigned long'

In asm/page.h, we have pte_basic_t which is 'unsigned long long'
when CONFIG_PTE_64BIT is set and 'unsigned long' otherwise.

Refactor pte_update() using pte_basic_t.
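
For reference, the convention described above boils down to roughly the
following (illustration only; the real definition lives in asm/page.h):

    /* Sketch of the asm/page.h pte_basic_t convention referenced above. */
    #ifdef CONFIG_PTE_64BIT
    typedef unsigned long long pte_basic_t;
    #else
    typedef unsigned long pte_basic_t;
    #endif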

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 26 +++-
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 523c4c3876c5..9eaf386a747b 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -222,12 +222,9 @@ static inline void pmd_clear(pmd_t *pmdp)
  * to properly flush the virtually tagged instruction cache of
  * those implementations.
  */
-#ifndef CONFIG_PTE_64BIT
-static inline unsigned long pte_update(pte_t *p,
-  unsigned long clr,
-  unsigned long set)
+static inline pte_basic_t pte_update(pte_t *p, unsigned long clr, unsigned 
long set)
 {
-#ifdef PTE_ATOMIC_UPDATES
+#if defined(PTE_ATOMIC_UPDATES) && !defined(CONFIG_PTE_64BIT)
unsigned long old, tmp;
 
__asm__ __volatile__("\
@@ -241,8 +238,8 @@ static inline unsigned long pte_update(pte_t *p,
: "r" (p), "r" (clr), "r" (set), "m" (*p)
: "cc" );
 #else /* PTE_ATOMIC_UPDATES */
-   unsigned long old = pte_val(*p);
-   unsigned long new = (old & ~clr) | set;
+   pte_basic_t old = pte_val(*p);
+   pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
 
 #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
p->pte = p->pte1 = p->pte2 = p->pte3 = new;
@@ -257,21 +254,6 @@ static inline unsigned long pte_update(pte_t *p,
 #endif
return old;
 }
-#else /* CONFIG_PTE_64BIT */
-static inline unsigned long long pte_update(pte_t *p,
-   unsigned long clr,
-   unsigned long set)
-{
-   unsigned long long old = pte_val(*p);
-   *p = __pte((old & ~(unsigned long long)clr) | set);
-
-#ifdef CONFIG_44x
-   if ((old & _PAGE_USER) && (old & _PAGE_EXEC))
-   icache_44x_need_flush = 1;
-#endif
-   return old;
-}
-#endif /* CONFIG_PTE_64BIT */
 
 #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
 static inline int __ptep_test_and_clear_young(unsigned int context, unsigned 
long addr, pte_t *ptep)
-- 
2.25.0



[PATCH v2 17/45] powerpc/mm: PTE_ATOMIC_UPDATES is only for 40x

2020-05-06 Thread Christophe Leroy
Only 40x still uses PTE_ATOMIC_UPDATES.
40x cannot select CONFIG_PTE_64BIT.

Drop handling of PTE_ATOMIC_UPDATES:
- In nohash/64
- In nohash/32 for CONFIG_PTE_64BIT

Keep PTE_ATOMIC_UPDATES only for nohash/32 for !CONFIG_PTE_64BIT

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pgtable.h | 17 
 arch/powerpc/include/asm/nohash/64/pgtable.h | 28 +---
 2 files changed, 1 insertion(+), 44 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index b04ba257fddb..523c4c3876c5 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -262,25 +262,8 @@ static inline unsigned long long pte_update(pte_t *p,
unsigned long clr,
unsigned long set)
 {
-#ifdef PTE_ATOMIC_UPDATES
-   unsigned long long old;
-   unsigned long tmp;
-
-   __asm__ __volatile__("\
-1: lwarx   %L0,0,%4\n\
-   lwzx%0,0,%3\n\
-   andc%1,%L0,%5\n\
-   or  %1,%1,%6\n"
-   PPC405_ERR77(0,%3)
-"  stwcx.  %1,0,%4\n\
-   bne-1b"
-   : "=" (old), "=" (tmp), "=m" (*p)
-   : "r" (p), "r" ((unsigned long)(p) + 4), "r" (clr), "r" (set), "m" (*p)
-   : "cc" );
-#else /* PTE_ATOMIC_UPDATES */
unsigned long long old = pte_val(*p);
*p = __pte((old & ~(unsigned long long)clr) | set);
-#endif /* !PTE_ATOMIC_UPDATES */
 
 #ifdef CONFIG_44x
if ((old & _PAGE_USER) && (old & _PAGE_EXEC))
diff --git a/arch/powerpc/include/asm/nohash/64/pgtable.h 
b/arch/powerpc/include/asm/nohash/64/pgtable.h
index 9a33b8bd842d..9c703b140d64 100644
--- a/arch/powerpc/include/asm/nohash/64/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/64/pgtable.h
@@ -211,22 +211,9 @@ static inline unsigned long pte_update(struct mm_struct 
*mm,
   unsigned long set,
   int huge)
 {
-#ifdef PTE_ATOMIC_UPDATES
-   unsigned long old, tmp;
-
-   __asm__ __volatile__(
-   "1: ldarx   %0,0,%3 # pte_update\n\
-   andc%1,%0,%4 \n\
-   or  %1,%1,%6\n\
-   stdcx.  %1,0,%3 \n\
-   bne-1b"
-   : "=" (old), "=" (tmp), "=m" (*ptep)
-   : "r" (ptep), "r" (clr), "m" (*ptep), "r" (set)
-   : "cc" );
-#else
unsigned long old = pte_val(*ptep);
*ptep = __pte((old & ~clr) | set);
-#endif
+
/* huge pages use the old page table lock */
if (!huge)
assert_pte_locked(mm, addr);
@@ -310,21 +297,8 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
unsigned long bits = pte_val(entry) &
(_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
 
-#ifdef PTE_ATOMIC_UPDATES
-   unsigned long old, tmp;
-
-   __asm__ __volatile__(
-   "1: ldarx   %0,0,%4\n\
-   or  %0,%3,%0\n\
-   stdcx.  %0,0,%4\n\
-   bne-1b"
-   :"=" (old), "=" (tmp), "=m" (*ptep)
-   :"r" (bits), "r" (ptep), "m" (*ptep)
-   :"cc");
-#else
unsigned long old = pte_val(*ptep);
*ptep = __pte(old | bits);
-#endif
 
flush_tlb_page(vma, address);
 }
-- 
2.25.0



[PATCH v2 15/45] powerpc/mm: Allocate static page tables for fixmap

2020-05-06 Thread Christophe Leroy
Allocate static page tables for the fixmap area. This allows
setting mappings through page tables before memblock is ready.
That's needed to use early_ioremap() early and to use standard
page mappings with fixmap.
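
As a rough illustration of what this enables (not part of the patch; the
function, device and MMIO address below are made up), an early boot user
could do something like:

    /* Hypothetical early user of early_ioremap(), possible once the
     * static fixmap page tables are in place. Assumes linux/io.h and
     * asm/early_ioremap.h are available. */
    static void __init example_early_probe(void)
    {
        void __iomem *regs = early_ioremap(0xff000000, PAGE_SIZE); /* made-up base */

        if (regs) {
            u32 id = readl(regs);   /* peek at a register before ioremap() works */

            pr_info("example device id: %08x\n", id);
            early_iounmap(regs, PAGE_SIZE);
        }
    }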

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/fixmap.h |  4 
 arch/powerpc/kernel/setup_32.c|  2 +-
 arch/powerpc/mm/pgtable_32.c  | 16 
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/fixmap.h 
b/arch/powerpc/include/asm/fixmap.h
index 2ef155a3c821..ccbe2e83c950 100644
--- a/arch/powerpc/include/asm/fixmap.h
+++ b/arch/powerpc/include/asm/fixmap.h
@@ -86,6 +86,10 @@ enum fixed_addresses {
 #define __FIXADDR_SIZE (__end_of_fixed_addresses << PAGE_SHIFT)
 #define FIXADDR_START  (FIXADDR_TOP - __FIXADDR_SIZE)
 
+#define FIXMAP_ALIGNED_SIZE	(ALIGN(FIXADDR_TOP, PGDIR_SIZE) - \
+				 ALIGN_DOWN(FIXADDR_START, PGDIR_SIZE))
+#define FIXMAP_PTE_SIZE	(FIXMAP_ALIGNED_SIZE / PGDIR_SIZE * PTE_TABLE_SIZE)
+
 #define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_NCG
 #define FIXMAP_PAGE_IO PAGE_KERNEL_NCG
 
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 305ca89d856f..fee167d2701f 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -80,7 +80,7 @@ notrace void __init machine_init(u64 dt_ptr)
/* Configure static keys first, now that we're relocated. */
setup_feature_keys();
 
-   early_ioremap_setup();
+   early_ioremap_init();
 
/* Enable early debugging if any specified (see udbg.h) */
udbg_early_init();
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index f62de06e3d07..9934659cb871 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -29,11 +29,27 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
 extern char etext[], _stext[], _sinittext[], _einittext[];
 
+static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data;
+
+notrace void __init early_ioremap_init(void)
+{
+   unsigned long addr = ALIGN_DOWN(FIXADDR_START, PGDIR_SIZE);
+   pte_t *ptep = (pte_t *)early_fixmap_pagetable;
+   pmd_t *pmdp = pmd_ptr_k(addr);
+
+   for (; (s32)(FIXADDR_TOP - addr) > 0;
+addr += PGDIR_SIZE, ptep += PTRS_PER_PTE, pmdp++)
+   pmd_populate_kernel(&init_mm, pmdp, ptep);
+
+   early_ioremap_setup();
+}
+
 static void __init *early_alloc_pgtable(unsigned long size)
 {
void *ptr = memblock_alloc(size, size);
-- 
2.25.0



[PATCH v2 16/45] powerpc/mm: Fix conditions to perform MMU specific management by blocks on PPC32.

2020-05-06 Thread Christophe Leroy
Setting init mem to NX shall depend on sinittext being mapped by
block, not on stext being mapped by block.

Setting text and rodata to RO shall depend on stext being mapped by
block, not on sinittext being mapped by block.

Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable_32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 9934659cb871..bd0cb6e3573e 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -185,7 +185,7 @@ void mark_initmem_nx(void)
unsigned long numpages = PFN_UP((unsigned long)_einittext) -
 PFN_DOWN((unsigned long)_sinittext);
 
-   if (v_block_mapped((unsigned long)_stext + 1))
+   if (v_block_mapped((unsigned long)_sinittext))
mmu_mark_initmem_nx();
else
change_page_attr(page, numpages, PAGE_KERNEL);
@@ -197,7 +197,7 @@ void mark_rodata_ro(void)
struct page *page;
unsigned long numpages;
 
-   if (v_block_mapped((unsigned long)_sinittext)) {
+   if (v_block_mapped((unsigned long)_stext + 1)) {
mmu_mark_rodata_ro();
ptdump_check_wx();
return;
-- 
2.25.0



[PATCH v2 14/45] powerpc/32s: Don't warn when mapping RO data ROX.

2020-05-06 Thread Christophe Leroy
Mapping RO data as ROX is not an issue since that data
cannot be modified to introduce an exploit.

PPC64 accepts to have RO data mapped ROX, as a trade off
between kernel size and strictness of protection.

On PPC32, kernel size is even more critical as amount of
memory is usually small.

Depending on the number of available IBATs, the last IBATs
might overflow the end of text. Only warn if it crosses
the end of RO data.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/book3s32/mmu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 39ba53ca5bb5..a9b2cbc74797 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -187,6 +187,7 @@ void mmu_mark_initmem_nx(void)
int i;
unsigned long base = (unsigned long)_stext - PAGE_OFFSET;
unsigned long top = (unsigned long)_etext - PAGE_OFFSET;
+   unsigned long border = (unsigned long)__init_begin - PAGE_OFFSET;
unsigned long size;
 
if (IS_ENABLED(CONFIG_PPC_BOOK3S_601))
@@ -201,9 +202,10 @@ void mmu_mark_initmem_nx(void)
size = block_size(base, top);
size = max(size, 128UL << 10);
if ((top - base) > size) {
-   if (strict_kernel_rwx_enabled())
-   pr_warn("Kernel _etext not properly aligned\n");
size <<= 1;
+   if (strict_kernel_rwx_enabled() && base + size > border)
+   pr_warn("Some RW data is getting mapped X. "
+   "Adjust CONFIG_DATA_SHIFT to avoid 
that.\n");
}
setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
base += size;
-- 
2.25.0



[PATCH v2 13/45] powerpc/ptdump: Handle hugepd at PGD level

2020-05-06 Thread Christophe Leroy
The 8xx is about to map kernel linear space and IMMR using huge
pages.

In order to display those pages properly, ptdump needs to handle
hugepd tables at PGD level.

For the time being do it only at PGD level. Further patches may
add handling of hugepd tables at lower level for other platforms
when needed in the future.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ptdump/ptdump.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 64434b66f240..1adaa7e794f3 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -270,6 +271,26 @@ static void walk_pte(struct pg_state *st, pmd_t *pmd, 
unsigned long start)
}
 }
 
+static void walk_hugepd(struct pg_state *st, hugepd_t *phpd, unsigned long 
start,
+   int pdshift, int level)
+{
+#ifdef CONFIG_ARCH_HAS_HUGEPD
+   unsigned int i;
+   int shift = hugepd_shift(*phpd);
+   int ptrs_per_hpd = pdshift - shift > 0 ? 1 << (pdshift - shift) : 1;
+
+   if (start & ((1 << shift) - 1))
+   return;
+
+   for (i = 0; i < ptrs_per_hpd; i++) {
+   unsigned long addr = start + (i << shift);
+   pte_t *pte = hugepte_offset(*phpd, addr, pdshift);
+
+   note_page(st, addr, level + 1, pte_val(*pte), shift);
+   }
+#endif
+}
+
 static void walk_pmd(struct pg_state *st, pud_t *pud, unsigned long start)
 {
pmd_t *pmd = pmd_offset(pud, 0);
@@ -313,11 +334,13 @@ static void walk_pagetables(struct pg_state *st)
 * the hash pagetable.
 */
for (i = pgd_index(addr); i < PTRS_PER_PGD; i++, pgd++, addr += 
PGDIR_SIZE) {
-   if (!pgd_none(*pgd) && !pgd_is_leaf(*pgd))
+   if (pgd_none(*pgd) || pgd_is_leaf(*pgd))
+   note_page(st, addr, 1, pgd_val(*pgd), PUD_SHIFT);
+   else if (is_hugepd(__hugepd(pgd_val(*pgd))))
+   walk_hugepd(st, (hugepd_t *)pgd, addr, PGDIR_SHIFT, 1);
+   else
/* pgd exists */
walk_pud(st, pgd, addr);
-   else
-   note_page(st, addr, 1, pgd_val(*pgd), PUD_SHIFT);
}
 }
 
-- 
2.25.0



[PATCH v2 11/45] powerpc/ptdump: Standardise display of BAT flags

2020-05-06 Thread Christophe Leroy
Display BAT flags the same way as page flags: rwx and wimg

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ptdump/bats.c | 37 ++-
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/mm/ptdump/bats.c b/arch/powerpc/mm/ptdump/bats.c
index d6c660f63d71..cebb58c7e289 100644
--- a/arch/powerpc/mm/ptdump/bats.c
+++ b/arch/powerpc/mm/ptdump/bats.c
@@ -15,12 +15,12 @@
 static char *pp_601(int k, int pp)
 {
if (pp == 0)
-   return k ? "NA" : "RWX";
+   return k ? "   " : "rwx";
if (pp == 1)
-   return k ? "ROX" : "RWX";
+   return k ? "r x" : "rwx";
if (pp == 2)
-   return k ? "RWX" : "RWX";
-   return k ? "ROX" : "ROX";
+   return "rwx";
+   return "r x";
 }
 
 static void bat_show_601(struct seq_file *m, int idx, u32 lower, u32 upper)
@@ -48,12 +48,9 @@ static void bat_show_601(struct seq_file *m, int idx, u32 
lower, u32 upper)
 
seq_printf(m, "Kernel %s User %s", pp_601(k & 2, pp), pp_601(k & 1, 
pp));
 
-   if (lower & _PAGE_WRITETHRU)
-   seq_puts(m, "write through ");
-   if (lower & _PAGE_NO_CACHE)
-   seq_puts(m, "no cache ");
-   if (lower & _PAGE_COHERENT)
-   seq_puts(m, "coherent ");
+   seq_puts(m, lower & _PAGE_WRITETHRU ? "w " : "  ");
+   seq_puts(m, lower & _PAGE_NO_CACHE ? "i " : "  ");
+   seq_puts(m, lower & _PAGE_COHERENT ? "m " : "  ");
seq_puts(m, "\n");
 }
 
@@ -101,20 +98,16 @@ static void bat_show_603(struct seq_file *m, int idx, u32 
lower, u32 upper, bool
seq_puts(m, "Kernel/User ");
 
if (lower & BPP_RX)
-   seq_puts(m, is_d ? "RO " : "EXEC ");
+   seq_puts(m, is_d ? "r   " : "  x ");
else if (lower & BPP_RW)
-   seq_puts(m, is_d ? "RW " : "EXEC ");
+   seq_puts(m, is_d ? "rw  " : "  x ");
else
-   seq_puts(m, is_d ? "NA " : "NX   ");
-
-   if (lower & _PAGE_WRITETHRU)
-   seq_puts(m, "write through ");
-   if (lower & _PAGE_NO_CACHE)
-   seq_puts(m, "no cache ");
-   if (lower & _PAGE_COHERENT)
-   seq_puts(m, "coherent ");
-   if (lower & _PAGE_GUARDED)
-   seq_puts(m, "guarded ");
+   seq_puts(m, is_d ? "" : "");
+
+   seq_puts(m, lower & _PAGE_WRITETHRU ? "w " : "  ");
+   seq_puts(m, lower & _PAGE_NO_CACHE ? "i " : "  ");
+   seq_puts(m, lower & _PAGE_COHERENT ? "m " : "  ");
+   seq_puts(m, lower & _PAGE_GUARDED ? "g " : "  ");
seq_puts(m, "\n");
 }
 
-- 
2.25.0



[PATCH v2 12/45] powerpc/ptdump: Properly handle non standard page size

2020-05-06 Thread Christophe Leroy
In order to properly display information regardless of the page size,
it is necessary to take into account the real page size.

Signed-off-by: Christophe Leroy 
Fixes: cabe8138b23c ("powerpc: dump as a single line areas mapping a single 
physical page.")
Cc: sta...@vger.kernel.org
---
 arch/powerpc/mm/ptdump/ptdump.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 1f97668853e3..64434b66f240 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -60,6 +60,7 @@ struct pg_state {
unsigned long start_address;
unsigned long start_pa;
unsigned long last_pa;
+   unsigned long page_size;
unsigned int level;
u64 current_flags;
bool check_wx;
@@ -168,9 +169,9 @@ static void dump_addr(struct pg_state *st, unsigned long 
addr)
 #endif
 
pt_dump_seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 
1);
-   if (st->start_pa == st->last_pa && st->start_address + PAGE_SIZE != 
addr) {
+   if (st->start_pa == st->last_pa && st->start_address + st->page_size != 
addr) {
pt_dump_seq_printf(st->seq, "[" REG "]", st->start_pa);
-   delta = PAGE_SIZE >> 10;
+   delta = st->page_size >> 10;
} else {
pt_dump_seq_printf(st->seq, " " REG " ", st->start_pa);
delta = (addr - st->start_address) >> 10;
@@ -195,10 +196,11 @@ static void note_prot_wx(struct pg_state *st, unsigned 
long addr)
 }
 
 static void note_page(struct pg_state *st, unsigned long addr,
-  unsigned int level, u64 val)
+  unsigned int level, u64 val, int shift)
 {
u64 flag = val & pg_level[level].mask;
u64 pa = val & PTE_RPN_MASK;
+   unsigned long page_size = 1 << shift;
 
/* At first no level is set */
if (!st->level) {
@@ -207,6 +209,7 @@ static void note_page(struct pg_state *st, unsigned long 
addr,
st->start_address = addr;
st->start_pa = pa;
st->last_pa = pa;
+   st->page_size = page_size;
pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
/*
 * Dump the section of virtual memory when:
@@ -218,7 +221,7 @@ static void note_page(struct pg_state *st, unsigned long 
addr,
 */
} else if (flag != st->current_flags || level != st->level ||
   addr >= st->marker[1].start_address ||
-  (pa != st->last_pa + PAGE_SIZE &&
+  (pa != st->last_pa + st->page_size &&
(pa != st->start_pa || st->start_pa != st->last_pa))) {
 
/* Check the PTE flags */
@@ -246,6 +249,7 @@ static void note_page(struct pg_state *st, unsigned long 
addr,
st->start_address = addr;
st->start_pa = pa;
st->last_pa = pa;
+   st->page_size = page_size;
st->current_flags = flag;
st->level = level;
} else {
@@ -261,7 +265,7 @@ static void walk_pte(struct pg_state *st, pmd_t *pmd, 
unsigned long start)
 
for (i = 0; i < PTRS_PER_PTE; i++, pte++) {
addr = start + i * PAGE_SIZE;
-   note_page(st, addr, 4, pte_val(*pte));
+   note_page(st, addr, 4, pte_val(*pte), PAGE_SHIFT);
 
}
 }
@@ -278,7 +282,7 @@ static void walk_pmd(struct pg_state *st, pud_t *pud, 
unsigned long start)
/* pmd exists */
walk_pte(st, pmd, addr);
else
-   note_page(st, addr, 3, pmd_val(*pmd));
+   note_page(st, addr, 3, pmd_val(*pmd), PTE_SHIFT);
}
 }
 
@@ -294,7 +298,7 @@ static void walk_pud(struct pg_state *st, pgd_t *pgd, 
unsigned long start)
/* pud exists */
walk_pmd(st, pud, addr);
else
-   note_page(st, addr, 2, pud_val(*pud));
+   note_page(st, addr, 2, pud_val(*pud), PMD_SHIFT);
}
 }
 
@@ -313,7 +317,7 @@ static void walk_pagetables(struct pg_state *st)
/* pgd exists */
walk_pud(st, pgd, addr);
else
-   note_page(st, addr, 1, pgd_val(*pgd));
+   note_page(st, addr, 1, pgd_val(*pgd), PUD_SHIFT);
}
 }
 
@@ -368,7 +372,7 @@ static int ptdump_show(struct seq_file *m, void *v)
 
/* Traverse kernel page tables */
	walk_pagetables(&st);
-	note_page(&st, 0, 0, 0);
+	note_page(&st, 0, 0, 0, 0);
return 0;
 }
 
-- 
2.25.0



[PATCH v2 06/45] powerpc/kasan: Declare kasan_init_region() weak

2020-05-06 Thread Christophe Leroy
In order to allow sub-arches to allocate KASAN regions using optimised
methods (huge pages on 8xx, BATs on BOOK3S, ...), declare
kasan_init_region() weak.

Also make kasan_init_shadow_page_tables() accessible from outside,
so that it can be called from the specific kasan_init_region()
functions if needed.

And populate the remaining KASAN address space only once the region
mapping is done, to allow 8xx to allocate hugepd instead of standard
page tables for mapping via 8M hugepages.
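
A platform override of the weak function could then look roughly like
this (hypothetical sketch; the real 8xx and book3s/32 versions come later
in the series):

    /* Sketch of a platform-specific kasan_init_region() overriding the
     * weak generic one; the optimised mapping step is only a placeholder. */
    int __init kasan_init_region(void *start, size_t size)
    {
        unsigned long k_start = (unsigned long)kasan_mem_to_shadow(start);
        unsigned long k_end = (unsigned long)kasan_mem_to_shadow(start + size);

        /* map [k_start, k_end) with huge pages / BATs / ... here */

        /* ... and fall back to the generic helper for whatever remains */
        return kasan_init_shadow_page_tables(k_start, k_end);
    }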

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/kasan.h  |  3 +++
 arch/powerpc/mm/kasan/kasan_init_32.c | 21 +++--
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 4769bbf7173a..107a24c3f7b3 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -34,5 +34,8 @@ static inline void kasan_init(void) { }
 static inline void kasan_late_init(void) { }
 #endif
 
+int kasan_init_shadow_page_tables(unsigned long k_start, unsigned long k_end);
+int kasan_init_region(void *start, size_t size);
+
 #endif /* __ASSEMBLY */
 #endif
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index 10481d904fea..76d418af4ce8 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -28,7 +28,7 @@ static void __init kasan_populate_pte(pte_t *ptep, pgprot_t 
prot)
	__set_pte_at(&init_mm, va, ptep, pfn_pte(PHYS_PFN(pa), prot), 
0);
 }
 
-static int __init kasan_init_shadow_page_tables(unsigned long k_start, 
unsigned long k_end)
+int __init kasan_init_shadow_page_tables(unsigned long k_start, unsigned long 
k_end)
 {
pmd_t *pmd;
unsigned long k_cur, k_next;
@@ -52,7 +52,7 @@ static int __init kasan_init_shadow_page_tables(unsigned long 
k_start, unsigned
return 0;
 }
 
-static int __init kasan_init_region(void *start, size_t size)
+int __init __weak kasan_init_region(void *start, size_t size)
 {
unsigned long k_start = (unsigned long)kasan_mem_to_shadow(start);
unsigned long k_end = (unsigned long)kasan_mem_to_shadow(start + size);
@@ -122,14 +122,6 @@ static void __init kasan_mmu_init(void)
int ret;
struct memblock_region *reg;
 
-   if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE) ||
-   IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
-   ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, 
KASAN_SHADOW_END);
-
-   if (ret)
-   panic("kasan: kasan_init_shadow_page_tables() failed");
-   }
-
for_each_memblock(memory, reg) {
phys_addr_t base = reg->base;
phys_addr_t top = min(base + reg->size, total_lowmem);
@@ -141,6 +133,15 @@ static void __init kasan_mmu_init(void)
if (ret)
panic("kasan: kasan_init_region() failed");
}
+
+   if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE) ||
+   IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
+   ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, 
KASAN_SHADOW_END);
+
+   if (ret)
+   panic("kasan: kasan_init_shadow_page_tables() failed");
+   }
+
 }
 
 void __init kasan_init(void)
-- 
2.25.0



[PATCH v2 10/45] powerpc/ptdump: Display size of BATs

2020-05-06 Thread Christophe Leroy
Display the size of areas mapped with BATs.

For that, the size display for pages is refactored.

Signed-off-by: Christophe Leroy 
---
v2: Add missing include of linux/seq_file.h (Thanks to kbuild test robot)
---
 arch/powerpc/mm/ptdump/bats.c   |  4 
 arch/powerpc/mm/ptdump/ptdump.c | 23 ++-
 arch/powerpc/mm/ptdump/ptdump.h |  3 +++
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/ptdump/bats.c b/arch/powerpc/mm/ptdump/bats.c
index d3a5d6b318d1..d6c660f63d71 100644
--- a/arch/powerpc/mm/ptdump/bats.c
+++ b/arch/powerpc/mm/ptdump/bats.c
@@ -10,6 +10,8 @@
 #include 
 #include 
 
+#include "ptdump.h"
+
 static char *pp_601(int k, int pp)
 {
if (pp == 0)
@@ -42,6 +44,7 @@ static void bat_show_601(struct seq_file *m, int idx, u32 
lower, u32 upper)
 #else
seq_printf(m, "0x%08x ", pbn);
 #endif
+   pt_dump_size(m, size);
 
seq_printf(m, "Kernel %s User %s", pp_601(k & 2, pp), pp_601(k & 1, 
pp));
 
@@ -88,6 +91,7 @@ static void bat_show_603(struct seq_file *m, int idx, u32 
lower, u32 upper, bool
 #else
seq_printf(m, "0x%08x ", brpn);
 #endif
+   pt_dump_size(m, size);
 
if (k == 1)
seq_puts(m, "User ");
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index d92bb8ea229c..1f97668853e3 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -112,6 +112,19 @@ static struct addr_marker address_markers[] = {
seq_putc(m, c); \
 })
 
+void pt_dump_size(struct seq_file *m, unsigned long size)
+{
+   static const char units[] = "KMGTPE";
+   const char *unit = units;
+
+   /* Work out what appropriate unit to use */
+   while (!(size & 1023) && unit[1]) {
+   size >>= 10;
+   unit++;
+   }
+   pt_dump_seq_printf(m, "%9lu%c ", size, *unit);
+}
+
 static void dump_flag_info(struct pg_state *st, const struct flag_info
*flag, u64 pte, int num)
 {
@@ -146,8 +159,6 @@ static void dump_flag_info(struct pg_state *st, const 
struct flag_info
 
 static void dump_addr(struct pg_state *st, unsigned long addr)
 {
-   static const char units[] = "KMGTPE";
-   const char *unit = units;
unsigned long delta;
 
 #ifdef CONFIG_PPC64
@@ -164,13 +175,7 @@ static void dump_addr(struct pg_state *st, unsigned long 
addr)
pt_dump_seq_printf(st->seq, " " REG " ", st->start_pa);
delta = (addr - st->start_address) >> 10;
}
-   /* Work out what appropriate unit to use */
-   while (!(delta & 1023) && unit[1]) {
-   delta >>= 10;
-   unit++;
-   }
-   pt_dump_seq_printf(st->seq, "%9lu%c", delta, *unit);
-
+   pt_dump_size(st->seq, delta);
 }
 
 static void note_prot_wx(struct pg_state *st, unsigned long addr)
diff --git a/arch/powerpc/mm/ptdump/ptdump.h b/arch/powerpc/mm/ptdump/ptdump.h
index 5d513636de73..154efae96ae0 100644
--- a/arch/powerpc/mm/ptdump/ptdump.h
+++ b/arch/powerpc/mm/ptdump/ptdump.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include 
+#include 
 
 struct flag_info {
u64 mask;
@@ -17,3 +18,5 @@ struct pgtable_level {
 };
 
 extern struct pgtable_level pg_level[5];
+
+void pt_dump_size(struct seq_file *m, unsigned long delta);
-- 
2.25.0



[PATCH v2 05/45] powerpc/kasan: Refactor update of early shadow mappings

2020-05-06 Thread Christophe Leroy
kasan_remap_early_shadow_ro() and kasan_unmap_early_shadow_vmalloc()
are both updating the early shadow mapping: the first one sets
the mapping read-only while the other clears the mapping.

Refactor and create kasan_update_early_region()

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/kasan/kasan_init_32.c | 39 +--
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index 91e2ade75192..10481d904fea 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -79,45 +79,42 @@ static int __init kasan_init_region(void *start, size_t 
size)
return 0;
 }
 
-static void __init kasan_remap_early_shadow_ro(void)
+static void __init
+kasan_update_early_region(unsigned long k_start, unsigned long k_end, pte_t 
pte)
 {
-   pgprot_t prot = kasan_prot_ro();
-   unsigned long k_start = KASAN_SHADOW_START;
-   unsigned long k_end = KASAN_SHADOW_END;
unsigned long k_cur;
phys_addr_t pa = __pa(kasan_early_shadow_page);
 
-   kasan_populate_pte(kasan_early_shadow_pte, prot);
-
-   for (k_cur = k_start & PAGE_MASK; k_cur != k_end; k_cur += PAGE_SIZE) {
+   for (k_cur = k_start; k_cur != k_end; k_cur += PAGE_SIZE) {
pmd_t *pmd = pmd_ptr_k(k_cur);
pte_t *ptep = pte_offset_kernel(pmd, k_cur);
 
if ((pte_val(*ptep) & PTE_RPN_MASK) != pa)
continue;
 
-   __set_pte_at(&init_mm, k_cur, ptep, pfn_pte(PHYS_PFN(pa), 
prot), 0);
+   __set_pte_at(&init_mm, k_cur, ptep, pte, 0);
}
-   flush_tlb_kernel_range(KASAN_SHADOW_START, KASAN_SHADOW_END);
+
+   flush_tlb_kernel_range(k_start, k_end);
 }
 
-static void __init kasan_unmap_early_shadow_vmalloc(void)
+static void __init kasan_remap_early_shadow_ro(void)
 {
-   unsigned long k_start = (unsigned long)kasan_mem_to_shadow((void 
*)VMALLOC_START);
-   unsigned long k_end = (unsigned long)kasan_mem_to_shadow((void 
*)VMALLOC_END);
-   unsigned long k_cur;
+   pgprot_t prot = kasan_prot_ro();
phys_addr_t pa = __pa(kasan_early_shadow_page);
 
-   for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
-   pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(k_cur), k_cur), 
k_cur);
-   pte_t *ptep = pte_offset_kernel(pmd, k_cur);
+   kasan_populate_pte(kasan_early_shadow_pte, prot);
 
-   if ((pte_val(*ptep) & PTE_RPN_MASK) != pa)
-   continue;
+   kasan_update_early_region(KASAN_SHADOW_START, KASAN_SHADOW_END,
+ pfn_pte(PHYS_PFN(pa), prot));
+}
 
-   __set_pte_at(&init_mm, k_cur, ptep, __pte(0), 0);
-   }
-   flush_tlb_kernel_range(k_start, k_end);
+static void __init kasan_unmap_early_shadow_vmalloc(void)
+{
+   unsigned long k_start = (unsigned long)kasan_mem_to_shadow((void 
*)VMALLOC_START);
+   unsigned long k_end = (unsigned long)kasan_mem_to_shadow((void 
*)VMALLOC_END);
+
+   kasan_update_early_region(k_start, k_end, __pte(0));
 }
 
 static void __init kasan_mmu_init(void)
-- 
2.25.0



[PATCH v2 09/45] powerpc/ptdump: Add _PAGE_COHERENT flag

2020-05-06 Thread Christophe Leroy
For platforms using shared.c (4xx, Book3e, Book3s/32),
also handle the _PAGE_COHERENT flag which corresponds to the
M bit of the WIMG flags.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ptdump/shared.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/mm/ptdump/shared.c b/arch/powerpc/mm/ptdump/shared.c
index dab5d8028a9b..634b83aa3487 100644
--- a/arch/powerpc/mm/ptdump/shared.c
+++ b/arch/powerpc/mm/ptdump/shared.c
@@ -40,6 +40,11 @@ static const struct flag_info flag_array[] = {
.val= _PAGE_NO_CACHE,
.set= "i",
.clear  = " ",
+   }, {
+   .mask   = _PAGE_COHERENT,
+   .val= _PAGE_COHERENT,
+   .set= "m",
+   .clear  = " ",
}, {
.mask   = _PAGE_GUARDED,
.val= _PAGE_GUARDED,
-- 
2.25.0



[PATCH v2 08/45] powerpc/ptdump: Reorder flags

2020-05-06 Thread Christophe Leroy
Reorder flags in a more logical way:
- Page size (huge) first
- User
- RWX
- Present
- WIMG
- Special
- Dirty and Accessed

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ptdump/8xx.c| 30 +++---
 arch/powerpc/mm/ptdump/shared.c | 30 +++---
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/mm/ptdump/8xx.c b/arch/powerpc/mm/ptdump/8xx.c
index ca9ce94672f5..a3169677dced 100644
--- a/arch/powerpc/mm/ptdump/8xx.c
+++ b/arch/powerpc/mm/ptdump/8xx.c
@@ -11,11 +11,6 @@
 
 static const struct flag_info flag_array[] = {
{
-   .mask   = _PAGE_SH,
-   .val= _PAGE_SH,
-   .set= "sh",
-   .clear  = "  ",
-   }, {
.mask   = _PAGE_RO | _PAGE_NA,
.val= 0,
.set= "rw",
@@ -37,11 +32,26 @@ static const struct flag_info flag_array[] = {
.val= _PAGE_PRESENT,
.set= "p",
.clear  = " ",
+   }, {
+   .mask   = _PAGE_NO_CACHE,
+   .val= _PAGE_NO_CACHE,
+   .set= "i",
+   .clear  = " ",
}, {
.mask   = _PAGE_GUARDED,
.val= _PAGE_GUARDED,
.set= "g",
.clear  = " ",
+   }, {
+   .mask   = _PAGE_SH,
+   .val= _PAGE_SH,
+   .set= "sh",
+   .clear  = "  ",
+   }, {
+   .mask   = _PAGE_SPECIAL,
+   .val= _PAGE_SPECIAL,
+   .set= "s",
+   .clear  = " ",
}, {
.mask   = _PAGE_DIRTY,
.val= _PAGE_DIRTY,
@@ -52,16 +62,6 @@ static const struct flag_info flag_array[] = {
.val= _PAGE_ACCESSED,
.set= "a",
.clear  = " ",
-   }, {
-   .mask   = _PAGE_NO_CACHE,
-   .val= _PAGE_NO_CACHE,
-   .set= "i",
-   .clear  = " ",
-   }, {
-   .mask   = _PAGE_SPECIAL,
-   .val= _PAGE_SPECIAL,
-   .set= "s",
-   .clear  = " ",
}
 };
 
diff --git a/arch/powerpc/mm/ptdump/shared.c b/arch/powerpc/mm/ptdump/shared.c
index 44a8a64a664f..dab5d8028a9b 100644
--- a/arch/powerpc/mm/ptdump/shared.c
+++ b/arch/powerpc/mm/ptdump/shared.c
@@ -30,21 +30,6 @@ static const struct flag_info flag_array[] = {
.val= _PAGE_PRESENT,
.set= "p",
.clear  = " ",
-   }, {
-   .mask   = _PAGE_GUARDED,
-   .val= _PAGE_GUARDED,
-   .set= "g",
-   .clear  = " ",
-   }, {
-   .mask   = _PAGE_DIRTY,
-   .val= _PAGE_DIRTY,
-   .set= "d",
-   .clear  = " ",
-   }, {
-   .mask   = _PAGE_ACCESSED,
-   .val= _PAGE_ACCESSED,
-   .set= "a",
-   .clear  = " ",
}, {
.mask   = _PAGE_WRITETHRU,
.val= _PAGE_WRITETHRU,
@@ -55,11 +40,26 @@ static const struct flag_info flag_array[] = {
.val= _PAGE_NO_CACHE,
.set= "i",
.clear  = " ",
+   }, {
+   .mask   = _PAGE_GUARDED,
+   .val= _PAGE_GUARDED,
+   .set= "g",
+   .clear  = " ",
}, {
.mask   = _PAGE_SPECIAL,
.val= _PAGE_SPECIAL,
.set= "s",
.clear  = " ",
+   }, {
+   .mask   = _PAGE_DIRTY,
+   .val= _PAGE_DIRTY,
+   .set= "d",
+   .clear  = " ",
+   }, {
+   .mask   = _PAGE_ACCESSED,
+   .val= _PAGE_ACCESSED,
+   .set= "a",
+   .clear  = " ",
}
 };
 
-- 
2.25.0



[PATCH v2 07/45] powerpc/ptdump: Limit size of flags text to 1/2 chars on PPC32

2020-05-06 Thread Christophe Leroy
In order to have all flags fit on an 80-char-wide screen,
reduce the flags to 1 char (2 where ambiguous).

No cache is 'i'
User is 'ur' (Supervisor would be sr)
Shared (for 8xx) becomes 'sh' (it was 'user' when not shared but
that was ambiguous because that's not entirely right)

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ptdump/8xx.c| 33 ---
 arch/powerpc/mm/ptdump/shared.c | 35 +
 2 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/mm/ptdump/8xx.c b/arch/powerpc/mm/ptdump/8xx.c
index 9e2d8e847d6e..ca9ce94672f5 100644
--- a/arch/powerpc/mm/ptdump/8xx.c
+++ b/arch/powerpc/mm/ptdump/8xx.c
@@ -12,9 +12,9 @@
 static const struct flag_info flag_array[] = {
{
.mask   = _PAGE_SH,
-   .val= 0,
-   .set= "user",
-   .clear  = "",
+   .val= _PAGE_SH,
+   .set= "sh",
+   .clear  = "  ",
}, {
.mask   = _PAGE_RO | _PAGE_NA,
.val= 0,
@@ -30,37 +30,38 @@ static const struct flag_info flag_array[] = {
}, {
.mask   = _PAGE_EXEC,
.val= _PAGE_EXEC,
-   .set= " X ",
-   .clear  = "   ",
+   .set= "x",
+   .clear  = " ",
}, {
.mask   = _PAGE_PRESENT,
.val= _PAGE_PRESENT,
-   .set= "present",
-   .clear  = "   ",
+   .set= "p",
+   .clear  = " ",
}, {
.mask   = _PAGE_GUARDED,
.val= _PAGE_GUARDED,
-   .set= "guarded",
-   .clear  = "   ",
+   .set= "g",
+   .clear  = " ",
}, {
.mask   = _PAGE_DIRTY,
.val= _PAGE_DIRTY,
-   .set= "dirty",
-   .clear  = " ",
+   .set= "d",
+   .clear  = " ",
}, {
.mask   = _PAGE_ACCESSED,
.val= _PAGE_ACCESSED,
-   .set= "accessed",
-   .clear  = "",
+   .set= "a",
+   .clear  = " ",
}, {
.mask   = _PAGE_NO_CACHE,
.val= _PAGE_NO_CACHE,
-   .set= "no cache",
-   .clear  = "",
+   .set= "i",
+   .clear  = " ",
}, {
.mask   = _PAGE_SPECIAL,
.val= _PAGE_SPECIAL,
-   .set= "special",
+   .set= "s",
+   .clear  = " ",
}
 };
 
diff --git a/arch/powerpc/mm/ptdump/shared.c b/arch/powerpc/mm/ptdump/shared.c
index f7ed2f187cb0..44a8a64a664f 100644
--- a/arch/powerpc/mm/ptdump/shared.c
+++ b/arch/powerpc/mm/ptdump/shared.c
@@ -13,8 +13,8 @@ static const struct flag_info flag_array[] = {
{
.mask   = _PAGE_USER,
.val= _PAGE_USER,
-   .set= "user",
-   .clear  = "",
+   .set= "ur",
+   .clear  = "  ",
}, {
.mask   = _PAGE_RW,
.val= _PAGE_RW,
@@ -23,42 +23,43 @@ static const struct flag_info flag_array[] = {
}, {
.mask   = _PAGE_EXEC,
.val= _PAGE_EXEC,
-   .set= " X ",
-   .clear  = "   ",
+   .set= "x",
+   .clear  = " ",
}, {
.mask   = _PAGE_PRESENT,
.val= _PAGE_PRESENT,
-   .set= "present",
-   .clear  = "   ",
+   .set= "p",
+   .clear  = " ",
}, {
.mask   = _PAGE_GUARDED,
.val= _PAGE_GUARDED,
-   .set= "guarded",
-   .clear  = "   ",
+   .set= "g",
+   .clear  = " ",
}, {
.mask   = _PAGE_DIRTY,
.val= _PAGE_DIRTY,
-   .set= "dirty",
-   .clear  = " ",
+   .set= "d",
+   .clear  = " ",
}, {
.mask   = _PAGE_ACCESSED,
.val= _PAGE_ACCESSED,
-   .set= "accessed",
-   .clear  = "",
+   .set= "a",
+   .clear  = " ",
}, {
.mask   = _PAGE_WRITETHRU,
.val= _PAGE_WRITETHRU,
-   .set= "write through",
-   .clear  = " ",
+   .set= "w",
+   .clear  = " ",
}, {
.mask   = _PAGE_NO_CACHE,
.val= _PAGE_NO_CACHE,
-   .set= "no cache",
-   .clear  = "",
+   .set= "i",
+   .clear  = " ",
   

[PATCH v2 03/45] powerpc/kasan: Fix shadow pages allocation failure

2020-05-06 Thread Christophe Leroy
Doing KASAN page allocation in MMU_init is too early: the kernel doesn't
yet have access to the entire memory space and memblock_alloc() fails
when the kernel is a bit big.

Do it from kasan_init() instead.

Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/kasan.h  | 2 --
 arch/powerpc/mm/init_32.c | 2 --
 arch/powerpc/mm/kasan/kasan_init_32.c | 4 +++-
 3 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index fc900937f653..4769bbf7173a 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -27,12 +27,10 @@
 
 #ifdef CONFIG_KASAN
 void kasan_early_init(void);
-void kasan_mmu_init(void);
 void kasan_init(void);
 void kasan_late_init(void);
 #else
 static inline void kasan_init(void) { }
-static inline void kasan_mmu_init(void) { }
 static inline void kasan_late_init(void) { }
 #endif
 
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 872df48ae41b..a6991ef8727d 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -170,8 +170,6 @@ void __init MMU_init(void)
btext_unmap();
 #endif
 
-   kasan_mmu_init();
-
setup_kup();
 
/* Shortly after that, the entire linear mapping will be available */
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index 8b15fe09b967..b7c287adfd59 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -131,7 +131,7 @@ static void __init kasan_unmap_early_shadow_vmalloc(void)
flush_tlb_kernel_range(k_start, k_end);
 }
 
-void __init kasan_mmu_init(void)
+static void __init kasan_mmu_init(void)
 {
int ret;
struct memblock_region *reg;
@@ -159,6 +159,8 @@ void __init kasan_mmu_init(void)
 
 void __init kasan_init(void)
 {
+   kasan_mmu_init();
+
kasan_remap_early_shadow_ro();
 
clear_page(kasan_early_shadow_page);
-- 
2.25.0



[PATCH v2 04/45] powerpc/kasan: Remove unnecessary page table locking

2020-05-06 Thread Christophe Leroy
Commit 45ff3c559585 ("powerpc/kasan: Fix parallel loading of
modules.") added spinlocks to manage parallele module loading.

Since then commit 47febbeeec44 ("powerpc/32: Force KASAN_VMALLOC for
modules") converted the module loading to KASAN_VMALLOC.

The spinlocking has then become unneeded and can be removed to
simplify kasan_init_shadow_page_tables()

Also remove inclusion of linux/moduleloader.h and linux/vmalloc.h
which are not needed anymore since the removal of modules management.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/kasan/kasan_init_32.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index b7c287adfd59..91e2ade75192 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -5,9 +5,7 @@
 #include 
 #include 
 #include 
-#include <linux/moduleloader.h>
 #include 
-#include <linux/vmalloc.h>
 #include 
 #include 
 #include 
@@ -34,31 +32,22 @@ static int __init kasan_init_shadow_page_tables(unsigned 
long k_start, unsigned
 {
pmd_t *pmd;
unsigned long k_cur, k_next;
-   pte_t *new = NULL;
 
pmd = pmd_ptr_k(k_start);
 
for (k_cur = k_start; k_cur != k_end; k_cur = k_next, pmd++) {
+   pte_t *new;
+
k_next = pgd_addr_end(k_cur, k_end);
if ((void *)pmd_page_vaddr(*pmd) != kasan_early_shadow_pte)
continue;
 
-   if (!new)
-   new = memblock_alloc(PTE_FRAG_SIZE, PTE_FRAG_SIZE);
+   new = memblock_alloc(PTE_FRAG_SIZE, PTE_FRAG_SIZE);
 
if (!new)
return -ENOMEM;
kasan_populate_pte(new, PAGE_KERNEL);
-
-   smp_wmb(); /* See comment in __pte_alloc */
-
-   spin_lock(&init_mm.page_table_lock);
-   /* Has another populated it ? */
-   if (likely((void *)pmd_page_vaddr(*pmd) == 
kasan_early_shadow_pte)) {
-   pmd_populate_kernel(&init_mm, pmd, new);
-   new = NULL;
-   }
-   spin_unlock(&init_mm.page_table_lock);
+   pmd_populate_kernel(&init_mm, pmd, new);
}
return 0;
 }
-- 
2.25.0



[PATCH v2 01/45] powerpc/kasan: Fix error detection on memory allocation

2020-05-06 Thread Christophe Leroy
In case (k_start & PAGE_MASK) doesn't equal k_start, 'va' will never be
NULL although 'block' is NULL ('va' is 'block + k_cur - k_start' and the
loop starts at 'k_start & PAGE_MASK', so 'k_cur == k_start' never happens).

Check the return of memblock_alloc() directly instead of
the resulting address in the loop.

Fixes: 509cd3f2b473 ("powerpc/32: Simplify KASAN init")
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/kasan/kasan_init_32.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index cbcad369fcb2..8b15fe09b967 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -76,15 +76,14 @@ static int __init kasan_init_region(void *start, size_t 
size)
return ret;
 
block = memblock_alloc(k_end - k_start, PAGE_SIZE);
+   if (!block)
+   return -ENOMEM;
 
for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
pmd_t *pmd = pmd_ptr_k(k_cur);
void *va = block + k_cur - k_start;
pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
 
-   if (!va)
-   return -ENOMEM;
-
	__set_pte_at(&init_mm, k_cur, pte_offset_kernel(pmd, k_cur), 
pte, 0);
}
flush_tlb_kernel_range(k_start, k_end);
-- 
2.25.0



[PATCH v2 00/45] Use hugepages to map kernel mem on 8xx

2020-05-06 Thread Christophe Leroy
The main purpose of this big series is to:
- reorganise huge page handling to avoid using mm_slices.
- use huge pages to map kernel memory on the 8xx.

The 8xx supports 4 page sizes: 4k, 16k, 512k and 8M.
It uses 2 Level page tables, PGD having 1024 entries, each entry
covering 4M address space. Then each page table has 1024 entries.

At the time being, page sizes are managed in PGD entries, implying
the use of mm_slices as it can't mix several pages of the same size
in one page table.

The first purpose of this series is to reorganise things so that
standard page tables can also handle 512k pages. This is done by
adding a new _PAGE_HUGE flag which will be copied into the Level 1
entry in the TLB miss handler. That done, we have 2 types of pages:
- PGD entries to regular page tables handling 4k/16k and 512k pages
- PGD entries to hugepd tables handling 8M pages.

There is no need to mix 8M pages with other sizes, because an 8M page
will use more than what a single PGD entry covers.
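
Put in numbers (from the 1024-entry / 4M figures above):

    1024 PGD entries x 4M each                 = 4G of address space
    4k/16k/512k pages fit inside one 4M entry  -> regular page table (512k tagged _PAGE_HUGE)
    one 8M page spans two 4M PGD entries       -> hugepd table pointed to from the PGD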

Then comes the second purpose of this series. At the time being, the
8xx has implemented special handling in the TLB miss handlers in order
to transparently map kernel linear address space and the IMMR using
huge pages by building the TLB entries in assembly at the time of the
exception.

As mm_slices is only for user space pages, and also because it would
anyway not be convenient to slice kernel address space, it was not
possible to use huge pages for kernel address space. But after step
one of the series, it is now more flexible to use huge pages.

This series drops all assembly 'just in time' handling of huge pages
and uses huge pages in page tables instead.

Once the above is done, then comes the cherry on cake:
- Use huge pages for KASAN shadow mapping
- Allow pinned TLBs with strict kernel rwx
- Allow pinned TLBs with debug pagealloc

Then, last but not least, those modifications for the 8xx allow the
following improvements on book3s/32:
- Mapping KASAN shadow with BATs
- Allowing BATs with debug pagealloc

All this makes it possible to considerably simplify the TLB miss handlers
and associated initialisation. The overhead of reading page tables is negligible
compared to the reduction of the miss handlers.

While we were at touching pte_update(), some cleanup was done
there too.

Tested widely on 8xx and 832x. Boot tested on QEMU MAC99.

Changes in v2:
- Selecting HUGETLBFS instead of HUGETLB_PAGE which leads to link failure.
- Rebase on latest powerpc/merge branch
- Reworked the way TLB 28 to 31 are pinned because it was not working.

Christophe Leroy (45):
  powerpc/kasan: Fix error detection on memory allocation
  powerpc/kasan: Fix issues by lowering KASAN_SHADOW_END
  powerpc/kasan: Fix shadow pages allocation failure
  powerpc/kasan: Remove unnecessary page table locking
  powerpc/kasan: Refactor update of early shadow mappings
  powerpc/kasan: Declare kasan_init_region() weak
  powerpc/ptdump: Limit size of flags text to 1/2 chars on PPC32
  powerpc/ptdump: Reorder flags
  powerpc/ptdump: Add _PAGE_COHERENT flag
  powerpc/ptdump: Display size of BATs
  powerpc/ptdump: Standardise display of BAT flags
  powerpc/ptdump: Properly handle non standard page size
  powerpc/ptdump: Handle hugepd at PGD level
  powerpc/32s: Don't warn when mapping RO data ROX.
  powerpc/mm: Allocate static page tables for fixmap
  powerpc/mm: Fix conditions to perform MMU specific management by
blocks on PPC32.
  powerpc/mm: PTE_ATOMIC_UPDATES is only for 40x
  powerpc/mm: Refactor pte_update() on nohash/32
  powerpc/mm: Refactor pte_update() on book3s/32
  powerpc/mm: Standardise __ptep_test_and_clear_young() params between
PPC32 and PPC64
  powerpc/mm: Standardise pte_update() prototype between PPC32 and PPC64
  powerpc/mm: Create a dedicated pte_update() for 8xx
  powerpc/mm: Reduce hugepd size for 8M hugepages on 8xx
  powerpc/8xx: Drop CONFIG_8xx_COPYBACK option
  powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.
  powerpc/8xx: Manage 512k huge pages as standard pages.
  powerpc/8xx: Only 8M pages are hugepte pages now
  powerpc/8xx: MM_SLICE is not needed anymore
  powerpc/8xx: Move PPC_PIN_TLB options into 8xx Kconfig
  powerpc/8xx: Add function to set pinned TLBs
  powerpc/8xx: Don't set IMMR map anymore at boot
  powerpc/8xx: Always pin TLBs at startup.
  powerpc/8xx: Drop special handling of Linear and IMMR mappings in I/D
TLB handlers
  powerpc/8xx: Remove now unused TLB miss functions
  powerpc/8xx: Move DTLB perf handling closer.
  powerpc/mm: Don't be too strict with _etext alignment on PPC32
  powerpc/8xx: Refactor kernel address boundary comparison
  powerpc/8xx: Add a function to early map kernel via huge pages
  powerpc/8xx: Map IMMR with a huge page
  powerpc/8xx: Map linear memory with huge pages
  powerpc/8xx: Allow STRICT_KERNEL_RwX with pinned TLB
  powerpc/8xx: Allow large TLBs with DEBUG_PAGEALLOC
  powerpc/8xx: Implement dedicated kasan_init_region()
  powerpc/32s: Allow mapping with BATs with DEBUG_PAGEALLOC
  

[PATCH v2 02/45] powerpc/kasan: Fix issues by lowering KASAN_SHADOW_END

2020-05-06 Thread Christophe Leroy
At the time being, KASAN_SHADOW_END is 0x100000000, which
is 0 in 32 bits representation.

This leads to a couple of issues:
- kasan_remap_early_shadow_ro() does nothing because the comparison
k_cur < k_end is always false.
- In ptdump, address comparison for markers display fails and the
marker's name is printed at the start of the KASAN area instead of
being printed at the end.

However, there is no need to shadow the KASAN shadow area itself,
so the KASAN shadow area can stop shadowing memory at the start
of itself.

With a PAGE_OFFSET set to 0xc0000000, KASAN shadow area is then going
from 0xf8000000 to 0xff000000.
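
Checking the new formula against that layout (assuming the usual
KASAN_SHADOW_SCALE_SHIFT of 3):

    -KASAN_SHADOW_START          = -0xf8000000 = 0x08000000  (32-bit wrap)
    0x08000000 >> 3              = 0x01000000
    -(0x01000000)                = 0xff000000  = KASAN_SHADOW_END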

Signed-off-by: Christophe Leroy 
Fixes: cbd18991e24f ("powerpc/mm: Fix an Oops in kasan_mmu_init()")
Cc: sta...@vger.kernel.org
---
 arch/powerpc/include/asm/kasan.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index fbff9ff9032e..fc900937f653 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -23,9 +23,7 @@
 
 #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
 
-#define KASAN_SHADOW_END   0UL
-
-#define KASAN_SHADOW_SIZE  (KASAN_SHADOW_END - KASAN_SHADOW_START)
+#define KASAN_SHADOW_END   (-(-KASAN_SHADOW_START >> 
KASAN_SHADOW_SCALE_SHIFT))
 
 #ifdef CONFIG_KASAN
 void kasan_early_init(void);
-- 
2.25.0



Re: remove set_fs calls from the coredump code v6

2020-05-06 Thread Eric W. Biederman
Christoph Hellwig  writes:

> On Tue, May 05, 2020 at 03:28:50PM -0500, Eric W. Biederman wrote:
>> We probably can.   After introducing a kernel_compat_siginfo that is
>> the size that userspace actually would need.
>> 
>> It isn't something I want to mess with until this code gets merged, as I
>> think the set_fs cleanups are more important.
>> 
>> 
>> Christoph made some good points about how ugly the #ifdefs are in
>> the generic copy_siginfo_to_user32 implementation.
>
> Take a look at the series you are replying to, the magic x86 ifdefs are
> entirely gone from the common code :)

Interesting.

That is a different way of achieving that, and I don't hate it.
  
I still want whatever you are doing to settle before I touch that code
again.  Removing the set_fs is important and I have other fish to fry
at the moment.

Eric





[PATCH 21/91] perf vendor events power9: Add hv_24x7 socket/chip level metric events

2020-05-06 Thread Arnaldo Carvalho de Melo
From: Kajol Jain 

The hv_24×7 feature in IBM® POWER9™ processor-based servers provides the
facility to continuously collect large numbers of hardware performance
metrics efficiently and accurately.

This patch adds an hv_24x7 metric file for different socket/chip
resources.

Result:

power9 platform:

  command:# ./perf stat --metric-only -M Memory_RD_BW_Chip -C 0 -I 1000

 1.96188  0.9   0.3
 2.000285720  0.5   0.1
 3.000424990  0.4   0.1

  command:# ./perf stat --metric-only -M PowerBUS_Frequency -C 0 -I 1000

 1.97981  2.3   2.3
 2.000291713  2.3   2.3
 3.000421719  2.3   2.3
 4.000550912  2.3   2.3

Signed-off-by: Kajol Jain 
Acked-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Anju T Sudhakar 
Cc: Benjamin Herrenschmidt 
Cc: Greg Kroah-Hartman 
Cc: Jin Yao 
Cc: Joe Mario 
Cc: Kan Liang 
Cc: Madhavan Srinivasan 
Cc: Mamatha Inamdar 
Cc: Mark Rutland 
Cc: Michael Ellerman 
Cc: Michael Petlan 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Sukadev Bhattiprolu 
Cc: Thomas Gleixner 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-8-kj...@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 .../arch/powerpc/power9/nest_metrics.json | 19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power9/nest_metrics.json

diff --git a/tools/perf/pmu-events/arch/powerpc/power9/nest_metrics.json 
b/tools/perf/pmu-events/arch/powerpc/power9/nest_metrics.json
new file mode 100644
index ..c121e526442a
--- /dev/null
+++ b/tools/perf/pmu-events/arch/powerpc/power9/nest_metrics.json
@@ -0,0 +1,19 @@
+[
+{
+"MetricExpr": "(hv_24x7@PM_MCS01_128B_RD_DISP_PORT01\\,chip\\=?@ + 
hv_24x7@PM_MCS01_128B_RD_DISP_PORT23\\,chip\\=?@ + 
hv_24x7@PM_MCS23_128B_RD_DISP_PORT01\\,chip\\=?@ + 
hv_24x7@PM_MCS23_128B_RD_DISP_PORT23\\,chip\\=?@)",
+"MetricName": "Memory_RD_BW_Chip",
+"MetricGroup": "Memory_BW",
+"ScaleUnit": "1.6e-2MB"
+},
+{
+   "MetricExpr": "(hv_24x7@PM_MCS01_128B_WR_DISP_PORT01\\,chip\\=?@ + 
hv_24x7@PM_MCS01_128B_WR_DISP_PORT23\\,chip\\=?@ + 
hv_24x7@PM_MCS23_128B_WR_DISP_PORT01\\,chip\\=?@ + 
hv_24x7@PM_MCS23_128B_WR_DISP_PORT23\\,chip\\=?@ )",
+"MetricName": "Memory_WR_BW_Chip",
+"MetricGroup": "Memory_BW",
+"ScaleUnit": "1.6e-2MB"
+},
+{
+   "MetricExpr": "(hv_24x7@PM_PB_CYC\\,chip\\=?@ )",
+"MetricName": "PowerBUS_Frequency",
+"ScaleUnit": "2.5e-7GHz"
+}
+]
-- 
2.21.1



[PATCH 20/91] perf tools: Enable Hz/hz prinitg for --metric-only option

2020-05-06 Thread Arnaldo Carvalho de Melo
From: Kajol Jain 

Commit 54b5091606c18 ("perf stat: Implement --metric-only mode") added
function 'valid_only_metric()' which drops "Hz" or "hz", if it is part
of "ScaleUnit". This patch enable it since hv_24x7 supports couple of
frequency events.

Signed-off-by: Kajol Jain 
Acked-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Anju T Sudhakar 
Cc: Benjamin Herrenschmidt 
Cc: Greg Kroah-Hartman 
Cc: Jin Yao 
Cc: Joe Mario 
Cc: Kan Liang 
Cc: Madhavan Srinivasan 
Cc: Mamatha Inamdar 
Cc: Mark Rutland 
Cc: Michael Ellerman 
Cc: Michael Petlan 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Sukadev Bhattiprolu 
Cc: Thomas Gleixner 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-7-kj...@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/util/stat-display.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 9e757d18d713..679aaa655824 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -237,8 +237,6 @@ static bool valid_only_metric(const char *unit)
if (!unit)
return false;
if (strstr(unit, "/sec") ||
-   strstr(unit, "hz") ||
-   strstr(unit, "Hz") ||
strstr(unit, "CPUs utilized"))
return false;
return true;
-- 
2.21.1



[PATCH 19/91] perf tests expr: Added test for runtime param in metric expression

2020-05-06 Thread Arnaldo Carvalho de Melo
From: Kajol Jain 

Added a test case for parsing "?" in a metric expression.

Signed-off-by: Kajol Jain 
Acked-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Anju T Sudhakar 
Cc: Benjamin Herrenschmidt 
Cc: Greg Kroah-Hartman 
Cc: Jin Yao 
Cc: Joe Mario 
Cc: Kan Liang 
Cc: Madhavan Srinivasan 
Cc: Mamatha Inamdar 
Cc: Mark Rutland 
Cc: Michael Ellerman 
Cc: Michael Petlan 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Sukadev Bhattiprolu 
Cc: Thomas Gleixner 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-6-kj...@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/tests/expr.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
index 516504cf0ea5..f9e8e5628836 100644
--- a/tools/perf/tests/expr.c
+++ b/tools/perf/tests/expr.c
@@ -59,6 +59,14 @@ int test__expr(struct test *t __maybe_unused, int subtest 
__maybe_unused)
TEST_ASSERT_VAL("find other", !strcmp(other[2], "BOZO"));
TEST_ASSERT_VAL("find other", other[3] == NULL);
 
+   TEST_ASSERT_VAL("find other",
+   expr__find_other("EVENT1\\,param\\=?@ + EVENT2\\,param\\=?@", NULL,
+   &other, &num_other, 3) == 0);
+   TEST_ASSERT_VAL("find other", num_other == 2);
+   TEST_ASSERT_VAL("find other", !strcmp(other[0], "EVENT1,param=3/"));
+   TEST_ASSERT_VAL("find other", !strcmp(other[1], "EVENT2,param=3/"));
+   TEST_ASSERT_VAL("find other", other[2] == NULL);
+
for (i = 0; i < num_other; i++)
zfree(&other[i]);
free((void *)other);
-- 
2.21.1



[PATCH 18/91] perf metricgroups: Enhance JSON/metric infrastructure to handle "?"

2020-05-06 Thread Arnaldo Carvalho de Melo
From: Kajol Jain 

This patch enhances the current metric infrastructure to handle "?" in a metric
expression. The "?" can be used for parameters whose value is not known
while creating metric events and which can be replaced later at runtime
with the proper value. It also adds the flexibility to create multiple events
out of a single metric event added in the JSON file.

The patch adds the function 'arch_get_runtimeparam', an arch-specific
function that returns the count of metric events that need to be created. By
default it returns 1.

This infrastructure is needed for hv_24x7 socket/chip level events.
"hv_24x7" chip level events need the specific chip-id for which the data is
requested. 'arch_get_runtimeparam' is implemented in header.c,
where it extracts the number of sockets from the sysfs file "sockets" under
"/sys/devices/hv_24x7/interface/".

With this patch we basically try to create as many metric events
as defined by runtime_param.

For that, a loop is added in the function 'metricgroup__add_metric', which
creates multiple events at run time depending on the return value of
'arch_get_runtimeparam' and merges each event into 'group_list'.

To achieve that, this parameter value is passed to the `expr__find_other`
function, which replaces any "?" present in the metric expression with this
value.

In our JSON file there is a single metric event, out of which we create
multiple events.

To understand which data count belongs to which parameter value,
the param value is also printed in the generic_metric function.

For example,

  command:# ./perf stat  -M PowerBUS_Frequency -C 0 -I 1000
1.000101867  9,356,933  hv_24x7/pm_pb_cyc,chip=0/ #  2.3 GHz  PowerBUS_Frequency_0
1.000101867  9,366,134  hv_24x7/pm_pb_cyc,chip=1/ #  2.3 GHz  PowerBUS_Frequency_1
2.000314878  9,365,868  hv_24x7/pm_pb_cyc,chip=0/ #  2.3 GHz  PowerBUS_Frequency_0
2.000314878  9,366,092  hv_24x7/pm_pb_cyc,chip=1/ #  2.3 GHz  PowerBUS_Frequency_1

So, here the _0 and _1 after PowerBUS_Frequency specify the parameter value.
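
A rough standalone sketch of the idea (this is not the actual perf code; the
helper name and buffer handling are made up for illustration): the "?" in the
metric expression is textually replaced with each runtime value, producing one
event per value returned by arch_get_runtimeparam():

	#include <stdio.h>
	#include <string.h>

	/* Expand a metric expression containing "?" once per runtime value. */
	static void expand_runtime_param(const char *expr, int runtime_count)
	{
		char buf[256];

		for (int i = 0; i < runtime_count; i++) {
			const char *p = expr;
			char *q = buf;

			while (*p && q < buf + sizeof(buf) - 12) {
				if (*p == '?')
					q += sprintf(q, "%d", i);
				else
					*q++ = *p;
				p++;
			}
			*q = '\0';
			printf("event %d: %s\n", i, buf); /* chip=0, chip=1, ... */
		}
	}

Here runtime_count plays the role of arch_get_runtimeparam()'s return value,
e.g. the number of sockets read from sysfs.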

Signed-off-by: Kajol Jain 
Acked-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Anju T Sudhakar 
Cc: Benjamin Herrenschmidt 
Cc: Greg Kroah-Hartman 
Cc: Jin Yao 
Cc: Joe Mario 
Cc: Kan Liang 
Cc: Madhavan Srinivasan 
Cc: Mamatha Inamdar 
Cc: Mark Rutland 
Cc: Michael Ellerman 
Cc: Michael Petlan 
Cc: Namhyung Kim 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Sukadev Bhattiprolu 
Cc: Thomas Gleixner 
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lore.kernel.org/lkml/20200401203340.31402-5-kj...@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/perf/arch/powerpc/util/header.c |  8 
 tools/perf/tests/expr.c   |  8 
 tools/perf/util/expr.c| 11 ++-
 tools/perf/util/expr.h|  5 +++--
 tools/perf/util/expr.l| 27 +++---
 tools/perf/util/metricgroup.c | 28 ---
 tools/perf/util/metricgroup.h |  2 ++
 tools/perf/util/stat-shadow.c | 17 ++--
 8 files changed, 79 insertions(+), 27 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/header.c 
b/tools/perf/arch/powerpc/util/header.c
index 3b4cdfc5efd6..d4870074f14c 100644
--- a/tools/perf/arch/powerpc/util/header.c
+++ b/tools/perf/arch/powerpc/util/header.c
@@ -7,6 +7,8 @@
 #include 
 #include 
 #include "header.h"
+#include "metricgroup.h"
+#include 
 
 #define mfspr(rn)   ({unsigned long rval; \
 asm volatile("mfspr %0," __stringify(rn) \
@@ -44,3 +46,9 @@ get_cpuid_str(struct perf_pmu *pmu __maybe_unused)
 
return bufp;
 }
+
+int arch_get_runtimeparam(void)
+{
+   int count;
+   return sysfs__read_int("/devices/hv_24x7/interface/sockets", &count) < 0 ? 1 : count;
+}
diff --git a/tools/perf/tests/expr.c b/tools/perf/tests/expr.c
index ea10fc4412c4..516504cf0ea5 100644
--- a/tools/perf/tests/expr.c
+++ b/tools/perf/tests/expr.c
@@ -10,7 +10,7 @@ static int test(struct expr_parse_ctx *ctx, const char *e, 
double val2)
 {
double val;
 
-   if (expr__parse(&val, ctx, e))
+   if (expr__parse(&val, ctx, e, 1))
TEST_ASSERT_VAL("parse test failed", 0);
TEST_ASSERT_VAL("unexpected value", val == val2);
return 0;
@@ -44,15 +44,15 @@ int test__expr(struct test *t __maybe_unused, int subtest 
__maybe_unused)
return ret;
 
p = "FOO/0";
-   ret = expr__parse(&val, &ctx, p);
+   ret = expr__parse(&val, &ctx, p, 1);
TEST_ASSERT_VAL("division by zero", ret == -1);
 
p = "BAR/";
-   ret = expr__parse(&val, &ctx, p);
+   ret = expr__parse(&val, &ctx, p, 1);
TEST_ASSERT_VAL("missing operand", ret == -1);
 
TEST_ASSERT_VAL("find other",
-   expr__find_other("FOO + BAR + BAZ + BOZO", "FOO", &other, &num_other) == 0);
+   expr__find_other("FOO + BAR + BAZ + BOZO", "FOO", &other, &num_other, 1) == 0);
TEST_ASSERT_VAL("find other", num_other == 3);

Re: [PATCH v4 1/2] powerpc/eeh: fix pseries_eeh_configure_bridge()

2020-05-06 Thread Nathan Lynch
Sam Bobroff  writes:
> If a device is hot unplugged during EEH recovery, it's possible for the
> RTAS call to ibm,configure-pe in pseries_eeh_configure() to return
> parameter error (-3), however negative return values are not checked
> for and this leads to an infinite loop.
>
> Fix this by correctly bailing out on negative values.

Reviewed-by: Nathan Lynch 

Thanks.


[PATCH v8 0/5] powerpc/hv-24x7: Expose chip/sockets info to add json file metric support for the hv_24x7 socket/chip level events

2020-05-06 Thread Kajol Jain
This patchset fixes the inconsistent results we are getting when
we run multiple 24x7 events.

"hv_24x7" pmu interface events need system-dependent parameters
like socket/chip/core. For example, hv_24x7 chip level events need the
specific chip-id for which the data is requested to be added as part
of the pmu events.

So, to enable JSON file support for the "hv_24x7" interface, the patchset
exposes the total number of sockets, chips per socket and cores per chip in
sysfs files (sockets, chipspersocket, coresperchip) under
"/sys/devices/hv_24x7/interface/".

To get the number of sockets, chips per socket and cores per chip, the patchset
adds an rtas call with the "PROCESSOR_MODULE_INFO" token. The patchset
also handles the partition migration case, re-initializing these
system-dependent parameters by adding proper calls in post_mobility_fixup()
(mobility.c).

v7: http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=167076

Changelog:
v7 -> v8
- Add support for exposing cores per details as well.
  Suggested by: Madhavan Srinivasan.
- Remove config check for 'CONFIG_PPC_RTAS' in previous
  implementation and address other comments by Michael Ellerman.

v6 -> v7
- Split patchset into two patch series, one with kernel changes
  and another with perf tool side changes. This pachset contain
  all kernel side changes.

Kajol Jain (5):
  powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple
hv-24x7 events run
  powerpc/hv-24x7: Add rtas call in hv-24x7 driver to get processor
details
  powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show
processor details
  Documentation/ABI: Add ABI documentation for chips and sockets
  powerpc/hv-24x7: Update post_mobility_fixup() to handle migration

 .../sysfs-bus-event_source-devices-hv_24x7|  21 
 arch/powerpc/include/asm/rtas.h   |   1 +
 arch/powerpc/perf/hv-24x7.c   | 106 --
 arch/powerpc/platforms/pseries/mobility.c |  16 +++
 4 files changed, 134 insertions(+), 10 deletions(-)

-- 
2.18.2



[PATCH v8 5/5] powerpc/hv-24x7: Update post_mobility_fixup() to handle migration

2020-05-06 Thread Kajol Jain
Function 'read_sys_info_pseries()' is added to get system parameter
values like the number of sockets and chips per socket.
It gets these details via rtas_call with the
"PROCESSOR_MODULE_INFO" token.

In case an LPAR migrates from one system to another, system
parameter details like chips per socket or number of sockets might
change. So, they need to be re-initialized, otherwise these values
correspond to the previous system's values.
This patch adds a call to 'read_sys_info_pseries()' from
'post_mobility_fixup()' to re-init the physsockets and physchips values.
Signed-off-by: Kajol Jain 
---
 arch/powerpc/platforms/pseries/mobility.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index b571285f6c14..0fb8f1e6e9d2 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -42,6 +42,12 @@ struct update_props_workarea {
 #define MIGRATION_SCOPE(1)
 #define PRRN_SCOPE -2
 
+#ifdef CONFIG_HV_PERF_CTRS
+void read_sys_info_pseries(void);
+#else
+static inline void read_sys_info_pseries(void) { }
+#endif
+
 static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
int rc;
@@ -371,6 +377,16 @@ void post_mobility_fixup(void)
/* Possibly switch to a new RFI flush type */
pseries_setup_rfi_flush();
 
+   /*
+* In case an Lpar migrates from one system to another, system
+* parameter details like chips per sockets, cores per chip and
+* number of sockets details might change.
+* So, they needs to be re-initialized otherwise the
+* values will correspond to the previous system.
+* Call read_sys_info_pseries() to reinitialise the values.
+*/
+   read_sys_info_pseries();
+
return;
 }
 
-- 
2.18.2



[PATCH v8 4/5] Documentation/ABI: Add ABI documentation for chips and sockets

2020-05-06 Thread Kajol Jain
Add documentation for the following sysfs files:
/sys/devices/hv_24x7/interface/chipspersocket,
/sys/devices/hv_24x7/interface/sockets,
/sys/devices/hv_24x7/interface/coresperchip

Signed-off-by: Kajol Jain 
---
 .../sysfs-bus-event_source-devices-hv_24x7| 21 +++
 1 file changed, 21 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
index ec27c6c9e737..e8698afcd952 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -22,6 +22,27 @@ Description:
Exposes the "version" field of the 24x7 catalog. This is also
extractable from the provided binary "catalog" sysfs entry.
 
+What:  /sys/devices/hv_24x7/interface/sockets
+Date:  May 2020
+Contact:   Linux on PowerPC Developer List 
+Description:   read only
+   This sysfs interface exposes the number of sockets present in 
the
+   system.
+
+What:  /sys/devices/hv_24x7/interface/chipspersocket
+Date:  May 2020
+Contact:   Linux on PowerPC Developer List 
+Description:   read only
+   This sysfs interface exposes the number of chips per socket
+   present in the system.
+
+What:  /sys/devices/hv_24x7/interface/coresperchip
+Date:  May 2020
+Contact:   Linux on PowerPC Developer List 
+Description:   read only
+   This sysfs interface exposes the number of cores per chip
+   present in the system.
+
 What:  /sys/bus/event_source/devices/hv_24x7/event_descs/
 Date:  February 2014
 Contact:   Linux on PowerPC Developer List 
-- 
2.18.2



[PATCH v8 3/5] powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show processor details

2020-05-06 Thread Kajol Jain
To expose system-dependent parameters like the total number of
sockets, chips per socket and cores per chip, the patch adds sysfs files
"sockets", "chipspersocket" and "coresperchip" under
/sys/devices/hv_24x7/interface/ of the "hv_24x7" pmu.

Signed-off-by: Kajol Jain 
---
 arch/powerpc/perf/hv-24x7.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 8cf242aad98f..f24dee2a660a 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -456,6 +456,24 @@ static ssize_t device_show_string(struct device *dev,
return sprintf(buf, "%s\n", (char *)d->var);
 }
 
+static ssize_t sockets_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", phys_sockets);
+}
+
+static ssize_t chipspersocket_show(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", phys_chipspersocket);
+}
+
+static ssize_t coresperchip_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   return sprintf(buf, "%d\n", phys_coresperchip);
+}
+
 static struct attribute *device_str_attr_create_(char *name, char *str)
 {
struct dev_ext_attribute *attr = kzalloc(sizeof(*attr), GFP_KERNEL);
@@ -1102,6 +1120,9 @@ PAGE_0_ATTR(catalog_len, "%lld\n",
(unsigned long long)be32_to_cpu(page_0->length) * 4096);
 static BIN_ATTR_RO(catalog, 0/* real length varies */);
 static DEVICE_ATTR_RO(domains);
+static DEVICE_ATTR_RO(sockets);
+static DEVICE_ATTR_RO(chipspersocket);
+static DEVICE_ATTR_RO(coresperchip);
 
 static struct bin_attribute *if_bin_attrs[] = {
&bin_attr_catalog,
@@ -1112,6 +1133,9 @@ static struct attribute *if_attrs[] = {
&dev_attr_catalog_len.attr,
&dev_attr_catalog_version.attr,
&dev_attr_domains.attr,
+   &dev_attr_sockets.attr,
+   &dev_attr_chipspersocket.attr,
+   &dev_attr_coresperchip.attr,
NULL,
 };
 
-- 
2.18.2



[PATCH v8 2/5] powerpc/hv-24x7: Add rtas call in hv-24x7 driver to get processor details

2020-05-06 Thread Kajol Jain
For hv_24x7 socket/chip level events, the specific chip-id for which
the data is requested should be added as part of the pmu events.
But the number of chips/sockets in the system is not exposed.

The patch implements read_sys_info_pseries() to get system
parameter values like the number of sockets and chips per socket.
An rtas_call with the "PROCESSOR_MODULE_INFO" token
is used to get these values.

A subsequent patch exports these values via sysfs.

The patch also makes these parameters default to 1.
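
For reference, the parameter data parsed by read_sys_info_pseries() below is
laid out roughly as follows (sketch; all fields are big-endian 16-bit words):

	word 0: length of the returned parameter data in bytes
	word 1: number of processor module types (ntypes)
	word 2: number of sockets
	word 3: chips per socket
	word 4: cores per chip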

Signed-off-by: Kajol Jain 
---
 arch/powerpc/include/asm/rtas.h |  1 +
 arch/powerpc/perf/hv-24x7.c | 72 +
 2 files changed, 73 insertions(+)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 3c1887351c71..1c11f814932d 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -482,6 +482,7 @@ static inline void rtas_initialize(void) { };
 #endif
 
 extern int call_rtas(const char *, int, int, unsigned long *, ...);
+extern void read_sys_info_pseries(void);
 
 #endif /* __KERNEL__ */
 #endif /* _POWERPC_RTAS_H */
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 48e8f4b17b91..8cf242aad98f 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 
+#include 
 #include "hv-24x7.h"
 #include "hv-24x7-catalog.h"
 #include "hv-common.h"
@@ -57,6 +58,75 @@ static bool is_physical_domain(unsigned domain)
}
 }
 
+/*
+ * The Processor Module Information system parameter allows transferring
+ * of certain processor module information from the platform to the OS.
+ * Refer PAPR+ document to get parameter token value as '43'.
+ */
+
+#define PROCESSOR_MODULE_INFO   43
+#define PROCESSOR_MAX_LENGTH   (8 * 1024)
+
+DEFINE_SPINLOCK(rtas_local_data_buf_lock);
+EXPORT_SYMBOL(rtas_local_data_buf_lock);
+
+static u32 phys_sockets;   /* Physical sockets */
+static u32 phys_chipspersocket;/* Physical chips per socket*/
+static u32 phys_coresperchip; /* Physical cores per chip */
+
+/*
+ * Function read_sys_info_pseries() make a rtas_call which require
+ * data buffer of size 8K. As standard 'rtas_data_buf' is of size
+ * 4K, we are adding new local buffer 'rtas_local_data_buf'.
+ */
+static __be16 rtas_local_data_buf[PROCESSOR_MAX_LENGTH] __cacheline_aligned;
+
+/*
+ * read_sys_info_pseries()
+ * Retrieve the number of sockets and chips per socket and cores per
+ * chip details through the get-system-parameter rtas call.
+ */
+void read_sys_info_pseries(void)
+{
+   int call_status, len, ntypes;
+
+   /*
+* Making system parameter: chips and sockets and cores per chip
+* default to 1.
+*/
+   phys_sockets = 1;
+   phys_chipspersocket = 1;
+   phys_coresperchip = 1;
+   memset(rtas_local_data_buf, 0, PROCESSOR_MAX_LENGTH * sizeof(__be16));
+   spin_lock(&rtas_local_data_buf_lock);
+
+   call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
+   NULL,
+   PROCESSOR_MODULE_INFO,
+   __pa(rtas_local_data_buf),
+   PROCESSOR_MAX_LENGTH);
+
+   spin_unlock(&rtas_local_data_buf_lock);
+
+   if (call_status != 0) {
+   pr_info("Error calling get-system-parameter (0x%x)\n",
+   call_status);
+   } else {
+   rtas_local_data_buf[PROCESSOR_MAX_LENGTH - 1] = '\0';
+   len = be16_to_cpup((__be16 *)&rtas_local_data_buf[0]);
+   if (len < 4)
+   return;
+
+   ntypes = be16_to_cpup(&rtas_local_data_buf[1]);
+
+   if (!ntypes)
+   return;
+   phys_sockets = be16_to_cpup(&rtas_local_data_buf[2]);
+   phys_chipspersocket = be16_to_cpup(&rtas_local_data_buf[3]);
+   phys_coresperchip = be16_to_cpup(&rtas_local_data_buf[4]);
+   }
+}
+
 /* Domains for which more than one result element are returned for each event. 
*/
 static bool domain_needs_aggregation(unsigned int domain)
 {
@@ -1605,6 +1675,8 @@ static int hv_24x7_init(void)
if (r)
return r;
 
+   read_sys_info_pseries();
+
return 0;
 }
 
-- 
2.18.2



[PATCH v8 1/5] powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple hv-24x7 events run

2020-05-06 Thread Kajol Jain
Commit 2b206ee6b0df ("powerpc/perf/hv-24x7: Display change in counter
values") added code to print the _change_ in the counter value rather than the
raw value for 24x7 counters. In case of transactions, the event count
is set to 0 at the beginning of the transaction. It also sets
the event's prev_count to the raw value at the time of initialization.
Because the event count is set to 0, we see some weird behaviour
whenever we run multiple 24x7 events at a time.

For example:

command#: ./perf stat -e "{hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/,
   hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/}"
   -C 0 -I 1000 sleep 100

 1.000121704 120 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
 1.000121704   5 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
 2.000357733   8 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
 2.000357733  10 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
 3.000495215 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
 3.000495215 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
 4.000641884  56 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
 4.000641884 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
 5.000791887 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/

We get these large values when using -I (interval mode).

Since we set event_count to 0, in the interval case the overall event_count does
not come in incremental order: the new delta may be smaller than the previous
count. Because of this, when we print intervals we get a negative value, which
shows up as these large values.
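
For instance (illustrative numbers only): if the first interval reads a running
total of 120 and the reset to 0 makes the next read return only 8, the printed
interval delta becomes 8 - 120 = -112, which is displayed as a huge unsigned
64-bit number close to 2^64 (18,446,744,073,709,551,616) - exactly the kind of
value seen above.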

This patch removes the part where we set event_count to 0 in the function
'h_24x7_event_read'. There won't be much impact, as we already set
event->hw.prev_count to the raw value at initialization time in order to print
the change value.

With this patch, on a power9 platform:

command#: ./perf stat -e "{hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/,
   hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/}"
   -C 0 -I 1000 sleep 100

 1.000117685 93 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
 1.000117685  1 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
 2.000349331 98 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
 2.000349331  2 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
 3.000495900 131 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
 3.000495900  4 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
 4.000645920 204 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
 4.000645920 61 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
 4.284169997 22 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/

Signed-off-by: Kajol Jain 
Suggested-by: Sukadev Bhattiprolu 
---
 arch/powerpc/perf/hv-24x7.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index 573e0b309c0c..48e8f4b17b91 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1400,16 +1400,6 @@ static void h_24x7_event_read(struct perf_event *event)
h24x7hw = &get_cpu_var(hv_24x7_hw);
h24x7hw->events[i] = event;
put_cpu_var(h24x7hw);
-   /*
-* Clear the event count so we can compute the _change_
-* in the 24x7 raw counter value at the end of the txn.
-*
-* Note that we could alternatively read the 24x7 value
-* now and save its value in event->hw.prev_count. But
-* that would require issuing a hcall, which would then
-* defeat the purpose of using the txn interface.
-*/
-   local64_set(&event->count, 0);
}
 
put_cpu_var(hv_24x7_reqb);
-- 
2.18.2



Re: [PATCH v7 2/5] powerpc/hv-24x7: Add rtas call in hv-24x7 driver to get processor details

2020-05-06 Thread kajoljain



On 4/29/20 5:01 PM, Michael Ellerman wrote:
> Hi Kajol,
> 
> Some comments inline ...
> 
> Kajol Jain  writes:
>> For hv_24x7 socket/chip level events, specific chip-id to which
>> the data requested should be added as part of pmu events.
>> But number of chips/socket in the system details are not exposed.
>>
>> Patch implements read_sys_info_pseries() to get system
>> parameter values like number of sockets and chips per socket.
>> Rtas_call with token "PROCESSOR_MODULE_INFO"
>> is used to get these values.
>>
>> Sub-sequent patch exports these values via sysfs.
>>
>> Patch also make these parameters default to 1.
>>
>> Signed-off-by: Kajol Jain 
>> ---
>>  arch/powerpc/perf/hv-24x7.c  | 72 
>>  arch/powerpc/platforms/pseries/pseries.h |  3 +
>>  2 files changed, 75 insertions(+)
>>
>> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
>> index 48e8f4b17b91..9ae00f29bd21 100644
>> --- a/arch/powerpc/perf/hv-24x7.c
>> +++ b/arch/powerpc/perf/hv-24x7.c
>> @@ -20,6 +20,11 @@
>>  #include 
>>  #include 
>>  
>> +#ifdef CONFIG_PPC_RTAS
> 
> This driver can only be build on pseries, and pseries always selects
> RTAS. So the ifdef is unncessary.

Hi Michael,
Thanks for review, I will remove this check.

> 
>> +#include 
>> +#include <../../platforms/pseries/pseries.h>
>> +#endif
> 
> That's not really what the platform header is intended for.
> 
> You should put the extern in arch/powerpc/include/asm somewhere.
> 
> Maybe rtas.h
> 
>> @@ -57,6 +62,69 @@ static bool is_physical_domain(unsigned domain)
>>  }
>>  }
>>  
>> +#ifdef CONFIG_PPC_RTAS
> 
> Not needed.
> 
>> +#define PROCESSOR_MODULE_INFO   43
> 
> Please document where these come from, presumably LoPAPR somewhere?
> 

Sure, will add the details.

>> +#define PROCESSOR_MAX_LENGTH(8 * 1024)
>> +
>> +static int strbe16toh(const char *buf, int offset)
>> +{
>> +return (buf[offset] << 8) + buf[offset + 1];
>> +}
> 
> I'm confused by this. "str" implies string, a string is an array of
> bytes and has no endian. But then be16 implies it's an array of __be16,
> in which case buf should be a __be16 *.
> 

Yes right, actually I was following the implementation in util-linux. But what
you suggested makes more sense, will update accordingly.
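
Something along these lines, just a sketch of the suggested direction rather
than the final patch:

	static __be16 rtas_local_data_buf[PROCESSOR_MAX_LENGTH] __cacheline_aligned;

	/* ... after the rtas_call() fills the buffer ... */
	len         = be16_to_cpu(rtas_local_data_buf[0]);
	ntypes      = be16_to_cpu(rtas_local_data_buf[1]);
	physsockets = be16_to_cpu(rtas_local_data_buf[2]);
	physchips   = be16_to_cpu(rtas_local_data_buf[3]);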

>> +
>> +static u32  physsockets;/* Physical sockets */
>> +static u32  physchips;  /* Physical chips */
> 
> No tabs there please.

Sure will update.

> 
>> +
>> +/*
>> + * Function read_sys_info_pseries() make a rtas_call which require
>> + * data buffer of size 8K. As standard 'rtas_data_buf' is of size
>> + * 4K, we are adding new local buffer 'rtas_local_data_buf'.
>> + */
>> +char rtas_local_data_buf[PROCESSOR_MAX_LENGTH] __cacheline_aligned;
> 
> static?
> 
>> +/*
>> + * read_sys_info_pseries()
>> + * Retrieve the number of sockets and chips per socket details
>> + * through the get-system-parameter rtas call.
>> + */
>> +void read_sys_info_pseries(void)
>> +{
>> +int call_status, len, ntypes;
>> +
>> +/*
>> + * Making system parameter: chips and sockets default to 1.
>> + */
>> +physsockets = 1;
>> +physchips = 1;
>> +memset(rtas_local_data_buf, 0, PROCESSOR_MAX_LENGTH);
>> +spin_lock(&rtas_data_buf_lock);
> 
> You're not using the rtas_data_buf, so why are you taking the
> rtas_data_buf_lock?

Sure, I will add a new lock specific to rtas_local_data_buf.

> 
>> +call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
>> +NULL,
>> +PROCESSOR_MODULE_INFO,
>> +__pa(rtas_local_data_buf),
>> +PROCESSOR_MAX_LENGTH);
>> +
>> +spin_unlock(&rtas_data_buf_lock);
>> +
>> +if (call_status != 0) {
>> +pr_info("%s %s Error calling get-system-parameter (0x%x)\n",
>> +__FILE__, __func__, call_status);
> 
> pr_err(), don't use __FILE__, this file already uses pr_fmt(). Not sure
> __func__ is really necessary either
> 
>   return;
> 
> Then you can deindent the next block.
> 
>> +} else {
>> +rtas_local_data_buf[PROCESSOR_MAX_LENGTH - 1] = '\0';
>> +len = strbe16toh(rtas_local_data_buf, 0);
> 
> Why isn't the buffer a __be16 array, and then you just use be16_to_cpu() ?
> 
>> +if (len < 6)
>> +return;
>> +
>> +ntypes = strbe16toh(rtas_local_data_buf, 2);
>> +
>> +if (!ntypes)
>> +return;
> 
> What is ntype
ntype specifies the processor module type.

> 
>> +physsockets = strbe16toh(rtas_local_data_buf, 4);
>> +physchips = strbe16toh(rtas_local_data_buf, 6);
>> +}
>> +}
>> +#endif /* CONFIG_PPC_RTAS */
>> +
>>  /* Domains for which more than one result element are returned for each 
>> event. */
>>  static bool domain_needs_aggregation(unsigned int domain)
>>  {
>> @@ -1605,6 +1673,10 @@ static int hv_24x7_init(void)

Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open

2020-05-06 Thread Greg KH
On Mon, Apr 27, 2020 at 08:26:01PM -0700, Raghavendra Rao Ananta wrote:
> Potentially, hvc_open() can be called in parallel when two tasks calls
> open() on /dev/hvcX. In such a scenario, if the hp->ops->notifier_add()
> callback in the function fails, where it sets the tty->driver_data to
> NULL, the parallel hvc_open() can see this NULL and cause a memory abort.
> Hence, serialize hvc_open and check if tty->private_data is NULL before
> proceeding ahead.
> 
> The issue can be easily reproduced by launching two tasks simultaneously
> that does nothing but open() and close() on /dev/hvcX.
> For example:
> $ ./simple_open_close /dev/hvc0 & ./simple_open_close /dev/hvc0 &
> 
> Signed-off-by: Raghavendra Rao Ananta 
> ---
>  drivers/tty/hvc/hvc_console.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
> index 436cc51c92c3..ebe26fe5ac09 100644
> --- a/drivers/tty/hvc/hvc_console.c
> +++ b/drivers/tty/hvc/hvc_console.c
> @@ -75,6 +75,8 @@ static LIST_HEAD(hvc_structs);
>   */
>  static DEFINE_MUTEX(hvc_structs_mutex);
>  
> +/* Mutex to serialize hvc_open */
> +static DEFINE_MUTEX(hvc_open_mutex);
>  /*
>   * This value is used to assign a tty->index value to a hvc_struct based
>   * upon order of exposure via hvc_probe(), when we can not match it to
> @@ -346,16 +348,24 @@ static int hvc_install(struct tty_driver *driver, 
> struct tty_struct *tty)
>   */
>  static int hvc_open(struct tty_struct *tty, struct file * filp)
>  {
> - struct hvc_struct *hp = tty->driver_data;
> + struct hvc_struct *hp;
>   unsigned long flags;
>   int rc = 0;
>  
> + mutex_lock(_open_mutex);
> +
> + hp = tty->driver_data;
> + if (!hp) {
> + rc = -EIO;
> + goto out;
> + }
> +
>   spin_lock_irqsave(&hp->port.lock, flags);
>   /* Check and then increment for fast path open. */
>   if (hp->port.count++ > 0) {
>   spin_unlock_irqrestore(&hp->port.lock, flags);
>   hvc_kick();
> - return 0;
> + goto out;
>   } /* else count == 0 */
>   spin_unlock_irqrestore(&hp->port.lock, flags);

Wait, why isn't this driver just calling tty_port_open() instead of
trying to open-code all of this?

Keeping a single mutex for open will not protect it from close, it will
just slow things down a bit.  There should already be a tty lock held by
the tty core for open() to keep it from racing things, right?

Try just removing all of this logic and replacing it with a call to
tty_port_open() and see if that fixes this issue.

As "proof" of this, I don't see other serial drivers needing a single
mutex for their open calls, do you?
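
An untested sketch of that direction (it assumes the struct hvc_struct/tty_port
wiring hvc_console.c already has; the activate body is elided):

	static int hvc_port_activate(struct tty_port *port, struct tty_struct *tty)
	{
		struct hvc_struct *hp = container_of(port, struct hvc_struct, port);

		/* do the first-open work (notifier_add, kick, ...) on hp here */
		return 0;
	}

	static const struct tty_port_operations hvc_port_ops = {
		.activate = hvc_port_activate,
		/* .shutdown would undo it on final close */
	};

	static int hvc_open(struct tty_struct *tty, struct file *filp)
	{
		struct hvc_struct *hp = tty->driver_data;

		if (!hp)
			return -ENODEV;

		return tty_port_open(&hp->port, tty, filp);
	}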

thanks,

greg k-h


Re: [PATCH 2/2] powerpc/perf: Add support for outputting extended regs in perf intr_regs

2020-05-06 Thread Athira Rajeev


> On 06-May-2020, at 9:56 AM, Madhavan Srinivasan  wrote:
> 
> 
> 
> On 4/29/20 11:34 AM, Anju T Sudhakar wrote:
>> The capability flag PERF_PMU_CAP_EXTENDED_REGS, is used to indicate the
>> PMU which support extended registers. The generic code define the mask
>> of extended registers as 0 for non supported architectures.
>> 
>> Add support for extended registers in POWER9 architecture. For POWER9,
>> the extended registers are mmcr0, mmcr1 and mmcr2.
>> 
>> REG_RESERVED mask is redefined to accommodate the extended registers.
>> 
>> With patch:
>> 
>> 
>> # perf record -I?
>> available registers: r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14
>> r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 nip
>> msr orig_r3 ctr link xer ccr softe trap dar dsisr sier mmcra mmcr0
>> mmcr1 mmcr2
> 

> Would prefer to have some flexibility in deciding what to expose
> in as extended regs. Meaning say if we want to add extended regs
> in power8 and if we dont want to show for ex say mmcr2 (just for example).
> 

One way to approach this is to have the "extended mask" exposed in
sysfs, e.g. "/sys/bus/event_source/devices/cpu/caps/ext_regs_mask", by the
platform pmu driver. This way the perf tool side can look at this, and the
platform driver will also have control over what to expose as part of the
extended regs.

The perf tools side uses the extended mask to display the platform-supported
register names (with the -I? option) to the user and also sends this mask to
the kernel to capture the extended registers in each sample.
Hence we need to expose the appropriate mask to the perf tool side.
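
A rough userspace sketch of the tool side (the sysfs path is the one proposed
above; the helper name is made up):

	#include <inttypes.h>
	#include <stdint.h>
	#include <stdio.h>

	static uint64_t read_ext_regs_mask(void)
	{
		uint64_t mask = 0;
		FILE *f = fopen("/sys/bus/event_source/devices/cpu/caps/ext_regs_mask", "r");

		if (!f)
			return 0;	/* PMU does not advertise extended regs */
		if (fscanf(f, "%" SCNx64, &mask) != 1)
			mask = 0;
		fclose(f);
		return mask;
	}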

Thanks
Athira

> Maddy
> 
>> 
>> # perf record -I ls
>> # perf script -D
>> 
>> PERF_RECORD_SAMPLE(IP, 0x1): 9019/9019: 0 period: 1 addr: 0
>> ... intr regs: mask 0x ABI 64-bit
>>  r00xc011b12c
>>  r10xc03f9a98b930
>>  r20xc1a32100
>>  r30xc03f8fe9a800
>>  r40xc03fd181
>>  r50x3e32557150
>>  r60xc03f9a98b908
>>  r70xffc1cdae06ac
>>  r80x818
>> [.]
>>  r31   0xc03ffd047230
>>  nip   0xc011b2c0
>>  msr   0x90009033
>>  orig_r3 0xc011b21c
>>  ctr   0xc0119380
>>  link  0xc011b12c
>>  xer   0x0
>>  ccr   0x2800
>>  softe 0x1
>>  trap  0xf00
>>  dar   0x0
>>  dsisr 0x800
>>  sier  0x0
>>  mmcra 0x800
>>  mmcr0 0x82008090
>>  mmcr1 0x1e00
>>  mmcr2 0x0
>>  ... thread: perf:9019
>> 
>> Signed-off-by: Anju T Sudhakar 
>> ---
>>  arch/powerpc/include/asm/perf_event_server.h  |  5 +++
>>  arch/powerpc/include/uapi/asm/perf_regs.h | 13 +++-
>>  arch/powerpc/perf/core-book3s.c   |  1 +
>>  arch/powerpc/perf/perf_regs.c | 29 ++--
>>  arch/powerpc/perf/power9-pmu.c|  1 +
>>  .../arch/powerpc/include/uapi/asm/perf_regs.h | 13 +++-
>>  tools/perf/arch/powerpc/include/perf_regs.h   |  6 +++-
>>  tools/perf/arch/powerpc/util/perf_regs.c  | 33 +++
>>  8 files changed, 95 insertions(+), 6 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/perf_event_server.h 
>> b/arch/powerpc/include/asm/perf_event_server.h
>> index 3e9703f44c7c..1d15953bd99e 100644
>> --- a/arch/powerpc/include/asm/perf_event_server.h
>> +++ b/arch/powerpc/include/asm/perf_event_server.h
>> @@ -55,6 +55,11 @@ struct power_pmu {
>>  int *blacklist_ev;
>>  /* BHRB entries in the PMU */
>>  int bhrb_nr;
>> +/*
>> + * set this flag with `PERF_PMU_CAP_EXTENDED_REGS` if
>> + * the pmu supports extended perf regs capability
>> + */
>> +int capabilities;
>>  };
>> 
>>  /*
>> diff --git a/arch/powerpc/include/uapi/asm/perf_regs.h 
>> b/arch/powerpc/include/uapi/asm/perf_regs.h
>> index f599064dd8dc..604b831378fe 100644
>> --- a/arch/powerpc/include/uapi/asm/perf_regs.h
>> +++ b/arch/powerpc/include/uapi/asm/perf_regs.h
>> @@ -48,6 +48,17 @@ enum perf_event_powerpc_regs {
>>  PERF_REG_POWERPC_DSISR,
>>  PERF_REG_POWERPC_SIER,
>>  PERF_REG_POWERPC_MMCRA,
>> -PERF_REG_POWERPC_MAX,
>> +/* Extended registers */
>> +PERF_REG_POWERPC_MMCR0,
>> +PERF_REG_POWERPC_MMCR1,
>> +PERF_REG_POWERPC_MMCR2,
>> +PERF_REG_EXTENDED_MAX,
>> +/* Max regs without the extended regs */
>> +PERF_REG_POWERPC_MAX = PERF_REG_POWERPC_MMCRA + 1,
>>  };
>> +
>> +#define PERF_REG_PMU_MASK   ((1ULL << PERF_REG_POWERPC_MAX) - 1)
>> +#define PERF_REG_EXTENDED_MASK  (((1ULL << (PERF_REG_EXTENDED_MAX)) \
>> +- 1) - PERF_REG_PMU_MASK)
>> +
>>  #endif /* _UAPI_ASM_POWERPC_PERF_REGS_H */
>> diff --git a/arch/powerpc/perf/core-book3s.c 
>> b/arch/powerpc/perf/core-book3s.c
>> index 3dcfecf858f3..f56b77800a7b 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -2276,6 +2276,7 @@ int 

Re: [PATCH] powerpc/xive: Enforce load-after-store ordering when StoreEOI is active

2020-05-06 Thread Cédric Le Goater
On 5/6/20 1:34 AM, Alistair Popple wrote:
> I am still slowly wrapping my head around XIVE and its interaction with KVM 
> but from what I can see this looks good and is needed so we can enable 
> StoreEOI support in future so:
> 
> Reviewed-by: Alistair Popple 
> 
> On Thursday, 20 February 2020 7:15:06 PM AEST Cédric Le Goater wrote:
>> When an interrupt has been handled, the OS notifies the interrupt
>> controller with a EOI sequence. On a POWER9 system using the XIVE
>> interrupt controller, this can be done with a load or a store
>> operation on the ESB interrupt management page of the interrupt. The
>> StoreEOI operation has less latency and improves interrupt handling
>> performance but it was deactivated during the POWER9 DD2.0 timeframe
>> because of ordering issues. We use the LoadEOI today but we plan to
>> reactivate StoreEOI in future architectures.

StoreEOI was considered broken on P9, and never used, but we have to
prepare the ground for the IBM hypervisor which will activate StoreEOI
without negotiation on the new architecture.

Thanks,

C. 


>>
>> There is usually no need to enforce ordering between ESB load and
>> store operations as they should lead to the same result. E.g. a store
>> trigger and a load EOI can be executed in any order. Assuming the
>> interrupt state is PQ=10, a store trigger followed by a load EOI will
>> return a Q bit. In the reverse order, it will create a new interrupt
>> trigger from HW. In both cases, the handler processing interrupts is
>> notified.
>>
>> In some cases, the XIVE_ESB_SET_PQ_10 load operation is used to
>> disable temporarily the interrupt source (mask/unmask). When the
>> source is reenabled, the OS can detect if interrupts were received
>> while the source was disabled and reinject them. This process needs
>> special care when StoreEOI is activated. The ESB load and store
>> operations should be correctly ordered because a XIVE_ESB_STORE_EOI
>> operation could leave the source enabled if it has not completed
>> before the loads.
>>
>> For those cases, we enforce Load-after-Store ordering with a special
>> load operation offset. To avoid performance impact, this ordering is
>> only enforced when really needed, that is when interrupt sources are
>> temporarily disabled with the XIVE_ESB_SET_PQ_10 load. It should not
>> be needed for other loads.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/asm/xive-regs.h| 8 
>>  arch/powerpc/kvm/book3s_xive_native.c   | 6 ++
>>  arch/powerpc/kvm/book3s_xive_template.c | 3 +++
>>  arch/powerpc/sysdev/xive/common.c   | 3 +++
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 5 +
>>  5 files changed, 25 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/xive-regs.h
>> b/arch/powerpc/include/asm/xive-regs.h index f2dfcd50a2d3..b1996fbae59a
>> 100644
>> --- a/arch/powerpc/include/asm/xive-regs.h
>> +++ b/arch/powerpc/include/asm/xive-regs.h
>> @@ -37,6 +37,14 @@
>>  #define XIVE_ESB_SET_PQ_10  0xe00 /* Load */
>>  #define XIVE_ESB_SET_PQ_11  0xf00 /* Load */
>>
>> +/*
>> + * Load-after-store ordering
>> + *
>> + * Adding this offset to the load address will enforce
>> + * load-after-store ordering. This is required to use StoreEOI.
>> + */
>> +#define XIVE_ESB_LD_ST_MO   0x40 /* Load-after-store ordering */
>> +
>>  #define XIVE_ESB_VAL_P  0x2
>>  #define XIVE_ESB_VAL_Q  0x1
>>
>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c
>> b/arch/powerpc/kvm/book3s_xive_native.c index d83adb1e1490..c80b6a447efd
>> 100644
>> --- a/arch/powerpc/kvm/book3s_xive_native.c
>> +++ b/arch/powerpc/kvm/book3s_xive_native.c
>> @@ -31,6 +31,12 @@ static u8 xive_vm_esb_load(struct xive_irq_data *xd, u32
>> offset) {
>>  u64 val;
>>
>> +/*
>> + * The KVM XIVE native device does not use the XIVE_ESB_SET_PQ_10
>> + * load operation, so there is no need to enforce load-after-store
>> + * ordering.
>> + */
>> +
>>  if (xd->flags & XIVE_IRQ_FLAG_SHIFT_BUG)
>>  offset |= offset << 4;
>>
>> diff --git a/arch/powerpc/kvm/book3s_xive_template.c
>> b/arch/powerpc/kvm/book3s_xive_template.c index a8a900ace1e6..4ad3c0279458
>> 100644
>> --- a/arch/powerpc/kvm/book3s_xive_template.c
>> +++ b/arch/powerpc/kvm/book3s_xive_template.c
>> @@ -58,6 +58,9 @@ static u8 GLUE(X_PFX,esb_load)(struct xive_irq_data *xd,
>> u32 offset) {
>>  u64 val;
>>
>> +if (offset == XIVE_ESB_SET_PQ_10 && xd->flags & XIVE_IRQ_FLAG_STORE_EOI)
>> +offset |= XIVE_ESB_LD_ST_MO;
>> +
>>  if (xd->flags & XIVE_IRQ_FLAG_SHIFT_BUG)
>>  offset |= offset << 4;
>>
>> diff --git a/arch/powerpc/sysdev/xive/common.c
>> b/arch/powerpc/sysdev/xive/common.c index f5fadbd2533a..0dc421bb494f 100644
>> --- a/arch/powerpc/sysdev/xive/common.c
>> +++ b/arch/powerpc/sysdev/xive/common.c
>> @@ -202,6 +202,9 @@ static notrace u8 xive_esb_read(struct xive_irq_data
>> *xd, u32 offset) {
>>  u64 val;
>>
>> +if (offset == XIVE_ESB_SET_PQ_10 && 

Re: [RFC PATCH 06/10] powerpc/powernv: opal use new opal call entry point if it exists

2020-05-06 Thread Gautham R Shenoy
Hello Nicholas,

On Sat, May 02, 2020 at 09:19:10PM +1000, Nicholas Piggin wrote:
> OPAL may advertise new endian-specific entry point which has different
> calling conventions including using the caller's stack, but otherwise
> provides the standard OPAL call API without any changes required to
> the OS.
> 
> Signed-off-by: Nicholas Piggin 
> ---

[..snip..]

> index 506b1798081a..32857254d268 100644
> --- a/arch/powerpc/platforms/powernv/opal-call.c
> +++ b/arch/powerpc/platforms/powernv/opal-call.c
> @@ -92,6 +92,18 @@ static s64 __opal_call_trace(s64 a0, s64 a1, s64 a2, s64 
> a3,
>  #define DO_TRACE false
>  #endif /* CONFIG_TRACEPOINTS */
> 
> +struct opal {
> + u64 base;
> + u64 entry;
> + u64 size;
> + u64 v4_le_entry;
> +};
> +extern struct opal opal;
> +
> +typedef int64_t (*opal_v4_le_entry_fn)(uint64_t r3, uint64_t r4, uint64_t r5,
> +   uint64_t r6, uint64_t r7, uint64_t r8,
> +   uint64_t r9, uint64_t r10);
> +
>  static int64_t opal_call(int64_t a0, int64_t a1, int64_t a2, int64_t a3,
>int64_t a4, int64_t a5, int64_t a6, int64_t a7, int64_t opcode)
>  {
> @@ -99,6 +111,30 @@ static int64_t opal_call(int64_t a0, int64_t a1, int64_t 
> a2, int64_t a3,
>   unsigned long msr = mfmsr();
>   bool mmu = (msr & (MSR_IR|MSR_DR));
>   int64_t ret;
> + opal_v4_le_entry_fn fn;

fn should be initialized to NULL here to ensure correctness when
kernel is built with CONFIG_CPU_BIG_ENDIAN.
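
For example, a minimal version of that change:

	opal_v4_le_entry_fn fn = NULL;	/* stays NULL on big-endian builds */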

> +
> + if (IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN))
> + fn = (opal_v4_le_entry_fn)(opal.v4_le_entry);
> +
> + if (fn) {
> + if (!mmu) {
> + BUG_ON(msr & MSR_EE);
> + ret = fn(opcode, a0, a1, a2, a3, a4, a5, a6);
> + return ret;
> + }
> +
> + local_irq_save(flags);
> + hard_irq_disable(); /* XXX r13 */
> + msr &= ~MSR_EE;
> + mtmsr(msr & ~(MSR_IR|MSR_DR));
> +
> + ret = fn(opcode, a0, a1, a2, a3, a4, a5, a6);
> +
> + mtmsr(msr);
> + local_irq_restore(flags);
> +
> + return ret;
> + }
> 
>   msr &= ~MSR_EE;
> 

--
Thanks and Regards
gautham.


[PATCH] powerpc/8xx: Update email address in MAINTAINERS

2020-05-06 Thread Christophe Leroy
Since 01 May 2020, our email addresses have changed to @csgroup.eu.

Update MAINTAINERS accordingly.

Signed-off-by: Christophe Leroy 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2926327e4976..e8714328cc90 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9794,7 +9794,7 @@ F:arch/powerpc/platforms/83xx/
 F: arch/powerpc/platforms/85xx/
 
 LINUX FOR POWERPC EMBEDDED PPC8XX
-M: Christophe Leroy 
+M: Christophe Leroy 
 L: linuxppc-dev@lists.ozlabs.org
 S: Maintained
 F: arch/powerpc/platforms/8xx/
-- 
2.25.0



Re: remove set_fs calls from the coredump code v6

2020-05-06 Thread Christoph Hellwig
On Tue, May 05, 2020 at 03:28:50PM -0500, Eric W. Biederman wrote:
> We probably can.   After introducing a kernel_compat_siginfo that is
> the size that userspace actually would need.
> 
> It isn't something I want to mess with until this code gets merged, as I
> think the set_fs cleanups are more important.
> 
> 
> Christoph made some good points about how ugly the #ifdefs are in
> the generic copy_siginfo_to_user32 implementation.

Take a look at the series you are replying to, the magic x86 ifdefs are
entirely gone from the common code :)


Re: [PATCH] powerpc/5200: update contact email

2020-05-06 Thread Michael Ellerman
Wolfram Sang  writes:
>> > My 'pengutronix' address is defunct for years. Merge the entries and use
>> > the proper contact address.
>> 
>> Is there any point adding the new address? It's just likely to bit-rot
>> one day too.
>
> At least, this one is a group address, not an individual one, so less
> likey.
>
>> I figure the git history is a better source for more up-to-date emails.
>
> But yes, can still be argued. I won't persist if you don't like it.

That's fine, I'll merge this. You've already gone to the trouble to send
it and it's better than what we have now.

cheers


Re: [PATCH V2 10/11] arch/kmap: Define kmap_atomic_prot() for all arch's

2020-05-06 Thread Christoph Hellwig
On Sun, May 03, 2020 at 06:09:11PM -0700, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> To support kmap_atomic_prot(), all architectures need to support
> protections passed to their kmap_atomic_high() function.  Pass
> protections into kmap_atomic_high() and change the name to
> kmap_atomic_high_prot() to match.
> 
> Then define kmap_atomic_prot() as a core function which calls
> kmap_atomic_high_prot() when needed.
> 
> Finally, redefine kmap_atomic() as a wrapper of kmap_atomic_prot() with
> the default kmap_prot exported by the architectures.
> 
> Signed-off-by: Ira Weiny 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH V2 08/11] arch/kmap: Ensure kmap_prot visibility

2020-05-06 Thread Christoph Hellwig
On Sun, May 03, 2020 at 06:09:09PM -0700, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> We want to support kmap_atomic_prot() on all architectures and it makes
> sense to define kmap_atomic() to use the default kmap_prot.
> 
> So we ensure all arch's have a globally available kmap_prot either as a
> define or exported symbol.

FYI, I still think a

#ifndef kmap_prot
#define kmap_prot PAGE_KERNEL
#endif

in linux/highmem.h would be nicer.  Then only xtensa and sparc need
to override it and clearly stand out.


[PATCH V2 3/3] mm/hugetlb: Define a generic fallback for arch_clear_hugepage_flags()

2020-05-06 Thread Anshuman Khandual
There are multiple similar definitions for arch_clear_hugepage_flags() on
various platforms. Let's just add a generic fallback definition for
platforms that do not override it. This helps reduce code duplication.
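
The generic fallback in include/linux/hugetlb.h would then look roughly like
this (sketch of the "#ifndef func" approach adopted in V2):

	#ifndef arch_clear_hugepage_flags
	static inline void arch_clear_hugepage_flags(struct page *page) { }
	#define arch_clear_hugepage_flags arch_clear_hugepage_flags
	#endif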

Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Thomas Bogendoerfer 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Mike Kravetz 
Cc: Andrew Morton 
Cc: x...@kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 arch/arm/include/asm/hugetlb.h | 1 +
 arch/arm64/include/asm/hugetlb.h   | 1 +
 arch/ia64/include/asm/hugetlb.h| 4 
 arch/mips/include/asm/hugetlb.h| 4 
 arch/parisc/include/asm/hugetlb.h  | 4 
 arch/powerpc/include/asm/hugetlb.h | 4 
 arch/riscv/include/asm/hugetlb.h   | 4 
 arch/s390/include/asm/hugetlb.h| 1 +
 arch/sh/include/asm/hugetlb.h  | 1 +
 arch/sparc/include/asm/hugetlb.h   | 4 
 arch/x86/include/asm/hugetlb.h | 4 
 include/linux/hugetlb.h| 5 +
 12 files changed, 9 insertions(+), 28 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
index 9ecd516d1ff7..d02d6ca88e92 100644
--- a/arch/arm/include/asm/hugetlb.h
+++ b/arch/arm/include/asm/hugetlb.h
@@ -18,5 +18,6 @@ static inline void arch_clear_hugepage_flags(struct page 
*page)
 {
clear_bit(PG_dcache_clean, &page->flags);
 }
+#define arch_clear_hugepage_flags arch_clear_hugepage_flags
 
 #endif /* _ASM_ARM_HUGETLB_H */
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 8f58e052697a..94ba0c5bced2 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -21,6 +21,7 @@ static inline void arch_clear_hugepage_flags(struct page 
*page)
 {
clear_bit(PG_dcache_clean, &page->flags);
 }
+#define arch_clear_hugepage_flags arch_clear_hugepage_flags
 
 extern pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
struct page *page, int writable);
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 6ef50b9a4bdf..7e46ebde8c0c 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -28,10 +28,6 @@ static inline void huge_ptep_clear_flush(struct 
vm_area_struct *vma,
 {
 }
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #include 
 
 #endif /* _ASM_IA64_HUGETLB_H */
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 8b201e281f67..10e3be870df7 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -75,10 +75,6 @@ static inline int huge_ptep_set_access_flags(struct 
vm_area_struct *vma,
return changed;
 }
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #include 
 
 #endif /* __ASM_HUGETLB_H */
diff --git a/arch/parisc/include/asm/hugetlb.h 
b/arch/parisc/include/asm/hugetlb.h
index 411d9d867baa..a69cf9efb0c1 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -42,10 +42,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty);
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #include 
 
 #endif /* _ASM_PARISC64_HUGETLB_H */
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index b167c869d72d..e6dfa63da552 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -61,10 +61,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
   unsigned long addr, pte_t *ptep,
   pte_t pte, int dirty);
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #include 
 
 #else /* ! CONFIG_HUGETLB_PAGE */
diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/hugetlb.h
index 866f6ae6467c..a5c2ca1d1cd8 100644
--- a/arch/riscv/include/asm/hugetlb.h
+++ b/arch/riscv/include/asm/hugetlb.h
@@ -5,8 +5,4 @@
 #include 
 #include 
 
-static inline void arch_clear_hugepage_flags(struct page *page)
-{
-}
-
 #endif /* _ASM_RISCV_HUGETLB_H */
diff --git 

[PATCH V2 2/3] mm/hugetlb: Define a generic fallback for is_hugepage_only_range()

2020-05-06 Thread Anshuman Khandual
There are multiple similar definitions for is_hugepage_only_range() on
various platforms. Let's just add a generic fallback definition for
platforms that do not override it. This helps reduce code duplication.
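
Roughly, the generic fallback added to include/linux/hugetlb.h takes this shape
(sketch only, following the "#ifndef func" approach adopted in V2):

	#ifndef is_hugepage_only_range
	static inline int is_hugepage_only_range(struct mm_struct *mm,
						 unsigned long addr, unsigned long len)
	{
		return 0;
	}
	#define is_hugepage_only_range is_hugepage_only_range
	#endif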

Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Thomas Bogendoerfer 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Mike Kravetz 
Cc: Andrew Morton 
Cc: x...@kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Anshuman Khandual 
---
 arch/arm/include/asm/hugetlb.h | 6 --
 arch/arm64/include/asm/hugetlb.h   | 6 --
 arch/ia64/include/asm/hugetlb.h| 1 +
 arch/mips/include/asm/hugetlb.h| 7 ---
 arch/parisc/include/asm/hugetlb.h  | 6 --
 arch/powerpc/include/asm/hugetlb.h | 1 +
 arch/riscv/include/asm/hugetlb.h   | 6 --
 arch/s390/include/asm/hugetlb.h| 7 ---
 arch/sh/include/asm/hugetlb.h  | 6 --
 arch/sparc/include/asm/hugetlb.h   | 6 --
 arch/x86/include/asm/hugetlb.h | 6 --
 include/linux/hugetlb.h| 9 +
 12 files changed, 11 insertions(+), 56 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
index 318dcf5921ab..9ecd516d1ff7 100644
--- a/arch/arm/include/asm/hugetlb.h
+++ b/arch/arm/include/asm/hugetlb.h
@@ -14,12 +14,6 @@
 #include 
 #include 
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-unsigned long addr, unsigned long len)
-{
-   return 0;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
clear_bit(PG_dcache_clean, &page->flags);
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index b88878ddc88b..8f58e052697a 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -17,12 +17,6 @@
 extern bool arch_hugetlb_migration_supported(struct hstate *h);
 #endif
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-unsigned long addr, unsigned long len)
-{
-   return 0;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
clear_bit(PG_dcache_clean, &page->flags);
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 36cc0396b214..6ef50b9a4bdf 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -20,6 +20,7 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
return (REGION_NUMBER(addr) == RGN_HPAGE ||
REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE);
 }
+#define is_hugepage_only_range is_hugepage_only_range
 
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 425bb6fc3bda..8b201e281f67 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -11,13 +11,6 @@
 
 #include 
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-unsigned long addr,
-unsigned long len)
-{
-   return 0;
-}
-
 #define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
 static inline int prepare_hugepage_range(struct file *file,
 unsigned long addr,
diff --git a/arch/parisc/include/asm/hugetlb.h 
b/arch/parisc/include/asm/hugetlb.h
index 7cb595dcb7d7..411d9d867baa 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -12,12 +12,6 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long 
addr,
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr,
  pte_t *ptep);
 
-static inline int is_hugepage_only_range(struct mm_struct *mm,
-unsigned long addr,
-unsigned long len) {
-   return 0;
-}
-
 /*
  * If the arch doesn't supply something else, assume that hugepage
  * size aligned regions are ok without further preparation.
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index bd6504c28c2f..b167c869d72d 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ 

[PATCH V2 0/3] mm/hugetlb: Add some new generic fallbacks

2020-05-06 Thread Anshuman Khandual
This series adds the following new generic fallbacks. Before that it drops
__HAVE_ARCH_HUGE_PTEP_GET from the arm64 platform.

1. is_hugepage_only_range()
2. arch_clear_hugepage_flags()

This has been boot tested on arm64 and x86 platforms but build tested on
some more platforms, including the changed ones here. This series applies
on v5.7-rc4. After this, arm (32 bit) remains the sole platform defining
its own huge_ptep_get() via __HAVE_ARCH_HUGE_PTEP_GET.

Changes in V2:

- Adopted "#ifndef func" method (adding a single symbol to namespace) per Andrew
- Updated the commit messages in [PATCH 2/3] and [PATCH 3/3] as required

Changes in V1: 
(https://patchwork.kernel.org/project/linux-mm/list/?series=270677)

Cc: Russell King 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Thomas Bogendoerfer 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Mike Kravetz 
Cc: Andrew Morton 
Cc: x...@kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-a...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org

Anshuman Khandual (3):
  arm64/mm: Drop __HAVE_ARCH_HUGE_PTEP_GET
  mm/hugetlb: Define a generic fallback for is_hugepage_only_range()
  mm/hugetlb: Define a generic fallback for arch_clear_hugepage_flags()

 arch/arm/include/asm/hugetlb.h |  7 +--
 arch/arm64/include/asm/hugetlb.h   | 13 +
 arch/ia64/include/asm/hugetlb.h|  5 +
 arch/mips/include/asm/hugetlb.h| 11 ---
 arch/parisc/include/asm/hugetlb.h  | 10 --
 arch/powerpc/include/asm/hugetlb.h |  5 +
 arch/riscv/include/asm/hugetlb.h   | 10 --
 arch/s390/include/asm/hugetlb.h|  8 +---
 arch/sh/include/asm/hugetlb.h  |  7 +--
 arch/sparc/include/asm/hugetlb.h   | 10 --
 arch/x86/include/asm/hugetlb.h | 10 --
 include/linux/hugetlb.h| 14 ++
 12 files changed, 20 insertions(+), 90 deletions(-)

-- 
2.20.1



Re: [PATCH V2 07/11] arch/kunmap_atomic: Consolidate duplicate code

2020-05-06 Thread Christoph Hellwig
On Sun, May 03, 2020 at 06:09:08PM -0700, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> Every single architecture (including !CONFIG_HIGHMEM) calls...
> 
>   pagefault_enable();
>   preempt_enable();
> 
> ... before returning from __kunmap_atomic().  Lift this code into the
> kunmap_atomic() macro.
> 
> While we are at it rename __kunmap_atomic() to kunmap_atomic_high() to
> be consistent.
> 
> Signed-off-by: Ira Weiny 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH V2 06/11] arch/kmap_atomic: Consolidate duplicate code

2020-05-06 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH V2 05/11] {x86,powerpc,microblaze}/kmap: Move preempt disable

2020-05-06 Thread Christoph Hellwig
On Sun, May 03, 2020 at 06:09:06PM -0700, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> During this kmap() conversion series we must maintain bisect-ability.
> To do this, kmap_atomic_prot() in x86, powerpc, and microblaze need to
> remain functional.
> 
> Create a temporary inline version of kmap_atomic_prot within these
> architectures so we can rework their kmap_atomic() calls and then lift
> kmap_atomic_prot() to the core.
> 
> Signed-off-by: Ira Weiny 
> 
> ---
> Changes from V1:
>   New patch
> ---
>  arch/microblaze/include/asm/highmem.h | 11 ++-
>  arch/microblaze/mm/highmem.c  | 10 ++
>  arch/powerpc/include/asm/highmem.h| 11 ++-
>  arch/powerpc/mm/highmem.c |  9 ++---
>  arch/x86/include/asm/highmem.h| 11 ++-
>  arch/x86/mm/highmem_32.c  | 10 ++
>  6 files changed, 36 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/microblaze/include/asm/highmem.h 
> b/arch/microblaze/include/asm/highmem.h
> index 0c94046f2d58..ec9954b091e1 100644
> --- a/arch/microblaze/include/asm/highmem.h
> +++ b/arch/microblaze/include/asm/highmem.h
> @@ -51,7 +51,16 @@ extern pte_t *pkmap_page_table;
>  #define PKMAP_NR(virt)  ((virt - PKMAP_BASE) >> PAGE_SHIFT)
>  #define PKMAP_ADDR(nr)  (PKMAP_BASE + ((nr) << PAGE_SHIFT))
>  
> -extern void *kmap_atomic_prot(struct page *page, pgprot_t prot);
> +extern void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
> +void *kmap_atomic_prot(struct page *page, pgprot_t prot)

Shouldn't this be marked inline?

The rest looks fine:

Reviewed-by: Christoph Hellwig 


[PATCH -next] powerpc/kernel/sysfs: Use ARRAY_SIZE instead of calculating the array size

2020-05-06 Thread Samuel Zou
Fix coccinelle warnings by using ARRAY_SIZE:

arch/powerpc/kernel/sysfs.c:853:34-35: WARNING: Use ARRAY_SIZE
arch/powerpc/kernel/sysfs.c:860:33-34: WARNING: Use ARRAY_SIZE
arch/powerpc/kernel/sysfs.c:868:28-29: WARNING: Use ARRAY_SIZE
arch/powerpc/kernel/sysfs.c:947:34-35: WARNING: Use ARRAY_SIZE
arch/powerpc/kernel/sysfs.c:954:33-34: WARNING: Use ARRAY_SIZE
arch/powerpc/kernel/sysfs.c:962:28-29: WARNING: Use ARRAY_SIZE

Reported-by: Hulk Robot 
Signed-off-by: Samuel Zou 
---
 arch/powerpc/kernel/sysfs.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 571b325..13d2423 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -850,14 +850,14 @@ static int register_cpu_online(unsigned int cpu)
 #ifdef HAS_PPC_PMC_IBM
case PPC_PMC_IBM:
attrs = ibm_common_attrs;
-   nattrs = sizeof(ibm_common_attrs) / sizeof(struct 
device_attribute);
+   nattrs = ARRAY_SIZE(ibm_common_attrs);
pmc_attrs = classic_pmc_attrs;
break;
 #endif /* HAS_PPC_PMC_IBM */
 #ifdef HAS_PPC_PMC_G4
case PPC_PMC_G4:
attrs = g4_common_attrs;
-   nattrs = sizeof(g4_common_attrs) / sizeof(struct 
device_attribute);
+   nattrs = ARRAY_SIZE(g4_common_attrs);
pmc_attrs = classic_pmc_attrs;
break;
 #endif /* HAS_PPC_PMC_G4 */
@@ -865,7 +865,7 @@ static int register_cpu_online(unsigned int cpu)
case PPC_PMC_PA6T:
/* PA Semi starts counting at PMC0 */
attrs = pa6t_attrs;
-   nattrs = sizeof(pa6t_attrs) / sizeof(struct device_attribute);
+   nattrs = ARRAY_SIZE(pa6t_attrs);
pmc_attrs = NULL;
break;
 #endif
@@ -944,14 +944,14 @@ static int unregister_cpu_online(unsigned int cpu)
 #ifdef HAS_PPC_PMC_IBM
case PPC_PMC_IBM:
attrs = ibm_common_attrs;
-   nattrs = sizeof(ibm_common_attrs) / sizeof(struct 
device_attribute);
+   nattrs = ARRAY_SIZE(ibm_common_attrs);
pmc_attrs = classic_pmc_attrs;
break;
 #endif /* HAS_PPC_PMC_IBM */
 #ifdef HAS_PPC_PMC_G4
case PPC_PMC_G4:
attrs = g4_common_attrs;
-   nattrs = sizeof(g4_common_attrs) / sizeof(struct 
device_attribute);
+   nattrs = ARRAY_SIZE(g4_common_attrs);
pmc_attrs = classic_pmc_attrs;
break;
 #endif /* HAS_PPC_PMC_G4 */
@@ -959,7 +959,7 @@ static int unregister_cpu_online(unsigned int cpu)
case PPC_PMC_PA6T:
/* PA Semi starts counting at PMC0 */
attrs = pa6t_attrs;
-   nattrs = sizeof(pa6t_attrs) / sizeof(struct device_attribute);
+   nattrs = ARRAY_SIZE(pa6t_attrs);
pmc_attrs = NULL;
break;
 #endif
-- 
2.6.2