[PATCH v1 00/13] powerpc/32s: Use BATs for STRICT_KERNEL_RWX

2018-11-29 Thread Christophe Leroy
The purpose of this series is to use BATs with STRICT_KERNEL_RWX.
See patch 12 for details.

Christophe Leroy (13):
  powerpc/mm: add exec protection on powerpc 603
  powerpc/mm/32: add base address to mmu_mapin_ram()
  powerpc/mm/32s: rework mmu_mapin_ram()
  powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
  powerpc/wii: remove wii_mmu_mapin_mem2()
  powerpc/mm/32s: use _PAGE_EXEC in setbat()
  powerpc/mm/32s: add setibat() clearibat() and update_bats()
  powerpc/32: add helper to write into segment registers
  powerpc/mmu: add is_strict_kernel_rwx() helper
  powerpc/kconfig: define PAGE_SHIFT inside Kconfig
  powerpc/kconfig: define CONFIG_DATA_SHIFT and CONFIG_ETEXT_SHIFT
  powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX
  powerpc/kconfig: make _etext and data areas alignment configurable on
Book3s 32

 arch/powerpc/Kconfig   |  46 +++
 arch/powerpc/include/asm/book3s/32/hash.h  |   1 +
 arch/powerpc/include/asm/book3s/32/mmu-hash.h  |   2 +
 arch/powerpc/include/asm/book3s/32/pgtable.h   |  29 ++--
 arch/powerpc/include/asm/cputable.h|   8 +-
 arch/powerpc/include/asm/mmu.h |  11 ++
 arch/powerpc/include/asm/page.h|  13 +-
 arch/powerpc/include/asm/reg.h |   5 +
 arch/powerpc/kernel/head_32.S  |  37 -
 arch/powerpc/kernel/vmlinux.lds.S  |   9 +-
 arch/powerpc/mm/40x_mmu.c  |   2 +-
 arch/powerpc/mm/44x_mmu.c  |   2 +-
 arch/powerpc/mm/8xx_mmu.c  |   2 +-
 arch/powerpc/mm/dump_linuxpagetables-generic.c |   2 -
 arch/powerpc/mm/fsl_booke_mmu.c|   2 +-
 arch/powerpc/mm/init_32.c  |   6 +-
 arch/powerpc/mm/mmu_decl.h |  10 +-
 arch/powerpc/mm/pgtable.c  |  20 +--
 arch/powerpc/mm/pgtable_32.c   |  35 +++--
 arch/powerpc/mm/ppc_mmu_32.c   | 178 +
 arch/powerpc/platforms/embedded6xx/wii.c   |  24 
 21 files changed, 324 insertions(+), 120 deletions(-)

-- 
2.13.3



[PATCH v1 04/13] powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.

2018-11-29 Thread Christophe Leroy
Now that mmu_mapin_ram() is able to handle other blocks
than the one starting at 0, the WII can use it for all
its blocks.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable_32.c | 25 +++--
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index c030f24d1d05..68a5e2be5343 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -271,26 +271,15 @@ static void __init __mapin_ram_chunk(unsigned long 
offset, unsigned long top)
 
 void __init mapin_ram(void)
 {
-   unsigned long s, top;
-
-#ifndef CONFIG_WII
-   top = total_lowmem;
-   s = mmu_mapin_ram(0, top);
-   __mapin_ram_chunk(s, top);
-#else
-   if (!wii_hole_size) {
-   s = mmu_mapin_ram(0, total_lowmem);
-   __mapin_ram_chunk(s, total_lowmem);
-   } else {
-   top = wii_hole_start;
-   s = mmu_mapin_ram(0, top);
-   __mapin_ram_chunk(s, top);
+   struct memblock_region *reg;
+
+   for_each_memblock(memory, reg) {
+   unsigned long base = reg->base;
+   unsigned long top = base + reg->size;
 
-   top = memblock_end_of_DRAM();
-   s = wii_mmu_mapin_mem2(top);
-   __mapin_ram_chunk(s, top);
+   base = mmu_mapin_ram(base, top);
+   __mapin_ram_chunk(base, top);
}
-#endif
 }
 
 /* Scan the real Linux page tables and return a PTE pointer for
-- 
2.13.3



[PATCH v1 10/13] powerpc/kconfig: define PAGE_SHIFT inside Kconfig

2018-11-29 Thread Christophe Leroy
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig|  7 +++
 arch/powerpc/include/asm/page.h | 13 ++---
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8be31261aec8..4a81a80d0635 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -711,6 +711,13 @@ config PPC_256K_PAGES
 
 endchoice
 
+config PPC_PAGE_SHIFT
+   int
+   default 18 if PPC_256K_PAGES
+   default 16 if PPC_64K_PAGES
+   default 14 if PPC_16K_PAGES
+   default 12
+
 config THREAD_SHIFT
int "Thread shift" if EXPERT
range 13 15
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 9ea903221a9f..d12a55441629 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -20,20 +20,11 @@
 
 /*
  * On regular PPC32 page size is 4K (but we support 4K/16K/64K/256K pages
- * on PPC44x). For PPC64 we support either 4K or 64K software
+ * on PPC44x and 4K/16K on 8xx). For PPC64 we support either 4K or 64K software
  * page size. When using 64K pages however, whether we are really supporting
  * 64K pages in HW or not is irrelevant to those definitions.
  */
-#if defined(CONFIG_PPC_256K_PAGES)
-#define PAGE_SHIFT 18
-#elif defined(CONFIG_PPC_64K_PAGES)
-#define PAGE_SHIFT 16
-#elif defined(CONFIG_PPC_16K_PAGES)
-#define PAGE_SHIFT 14
-#else
-#define PAGE_SHIFT 12
-#endif
-
+#define PAGE_SHIFT CONFIG_PPC_PAGE_SHIFT
 #define PAGE_SIZE  (ASM_CONST(1) << PAGE_SHIFT)
 
 #ifndef __ASSEMBLY__
-- 
2.13.3



Re: [PATCH v8 07/20] powerpc/mm: add helpers to get/set mm.context->pte_frag

2018-11-29 Thread kbuild test robot
Hi Christophe,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.20-rc4]
[cannot apply to next-20181129]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Christophe-Leroy/powerpc-book3s32-Remove-CONFIG_BOOKE-dependent-code/20181129-210058
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/book3s/64/mmu-hash.h:24:0,
from arch/powerpc/include/asm/book3s/64/mmu.h:39,
from arch/powerpc/include/asm/mmu.h:328,
from arch/powerpc/include/asm/lppaca.h:36,
from arch/powerpc/include/asm/paca.h:21,
from arch/powerpc/include/asm/hw_irq.h:64,
from arch/powerpc/include/asm/irqflags.h:12,
from include/linux/irqflags.h:16,
from include/linux/spinlock.h:54,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/slab.h:15,
from include/linux/crypto.h:24,
from include/crypto/algapi.h:15,
from include/crypto/internal/hash.h:16,
from arch/powerpc/crypto/md5-glue.c:15:
>> arch/powerpc/include/asm/book3s/64/pgtable.h:219:21: error: "__pte_frag_nr" 
>> is not defined, evaluates to 0 [-Werror=undef]
#define PTE_FRAG_NR __pte_frag_nr
^
   arch/powerpc/include/asm/pgtable.h:123:5: note: in expansion of macro 
'PTE_FRAG_NR'
#if PTE_FRAG_NR != 1
^~~
   cc1: all warnings being treated as errors
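
For readers unfamiliar with -Werror=undef: the preprocessor replaces any
identifier it does not know in an #if expression by 0, and -Wundef flags
that. A minimal standalone reproducer of this class of error (not from the
kernel tree, for illustration only):

    /* wundef.c -- build with: gcc -Werror=undef -c wundef.c */
    extern unsigned long __pte_frag_nr;      /* a variable, invisible to cpp */
    #define PTE_FRAG_NR __pte_frag_nr

    #if PTE_FRAG_NR != 1   /* cpp sees an unknown identifier, treats it as 0 */
    int dummy;
    #endif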

vim +/__pte_frag_nr +219 arch/powerpc/include/asm/book3s/64/pgtable.h

5ed7ecd0 Aneesh Kumar K.V 2016-04-29  217  
5ed7ecd0 Aneesh Kumar K.V 2016-04-29  218  extern unsigned long __pte_frag_nr;
5ed7ecd0 Aneesh Kumar K.V 2016-04-29 @219  #define PTE_FRAG_NR __pte_frag_nr
5ed7ecd0 Aneesh Kumar K.V 2016-04-29  220  extern unsigned long 
__pte_frag_size_shift;
5ed7ecd0 Aneesh Kumar K.V 2016-04-29  221  #define PTE_FRAG_SIZE_SHIFT 
__pte_frag_size_shift
5ed7ecd0 Aneesh Kumar K.V 2016-04-29  222  #define PTE_FRAG_SIZE (1UL << 
PTE_FRAG_SIZE_SHIFT)
dd1842a2 Aneesh Kumar K.V 2016-04-29  223  

:: The code at line 219 was first introduced by commit
:: 5ed7ecd08a0807d6d616c3d958402f9c723bb048 powerpc/mm: pte_frag abstraction

:: TO: Aneesh Kumar K.V 
:: CC: Michael Ellerman 

---
0-DAY kernel test infrastructure		Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[PATCH v1 01/13] powerpc/mm: add exec protection on powerpc 603

2018-11-29 Thread Christophe Leroy
The 603 doesn't have a hash table; TLB misses are handled by
software. It is therefore possible to generate a page fault when
_PAGE_EXEC is not set, like in nohash/32.

There is one "reserved" PTE bit available; this patch uses
it for _PAGE_EXEC.

In order to support it, set_pte_filter() and
set_access_flags_filter() are made common, and the handling
is made dependent on MMU_FTR_HPTE_TABLE.
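
The pgtable.c hunk being common code, here is a sketch of the idea (not
the literal patch body; set_pte_filter_hash is an assumed name for the
factored-out hash handling):

    static pte_t set_pte_filter(pte_t pte)
    {
        /* hash MMU: exec cannot be tracked per page, so keep the
         * icache flush/track logic as before */
        if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
            return set_pte_filter_hash(pte);

        /* software-loaded TLB (603, nohash/32): _PAGE_EXEC does the job */
        return pte;
    }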

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/hash.h  |  1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h   | 18 +-
 arch/powerpc/include/asm/cputable.h|  8 
 arch/powerpc/kernel/head_32.S  |  2 +-
 arch/powerpc/mm/dump_linuxpagetables-generic.c |  2 --
 arch/powerpc/mm/pgtable.c  | 20 +++-
 6 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/hash.h 
b/arch/powerpc/include/asm/book3s/32/hash.h
index f2892c7ab73e..2a0a467d2985 100644
--- a/arch/powerpc/include/asm/book3s/32/hash.h
+++ b/arch/powerpc/include/asm/book3s/32/hash.h
@@ -26,6 +26,7 @@
 #define _PAGE_WRITETHRU	0x040	/* W: cache write-through */
 #define _PAGE_DIRTY	0x080	/* C: page changed */
 #define _PAGE_ACCESSED 0x100   /* R: page referenced */
+#define _PAGE_EXEC 0x200   /* software: exec allowed */
 #define _PAGE_RW   0x400   /* software: user write access allowed */
 #define _PAGE_SPECIAL  0x800   /* software: Special page */
 
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index c21d33704633..cf844fed4527 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -10,9 +10,9 @@
 /* And here we include common definitions */
 
 #define _PAGE_KERNEL_RO		0
-#define _PAGE_KERNEL_ROX   0
+#define _PAGE_KERNEL_ROX   (_PAGE_EXEC)
 #define _PAGE_KERNEL_RW(_PAGE_DIRTY | _PAGE_RW)
-#define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW)
+#define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC)
 
 #define _PAGE_HPTEFLAGS _PAGE_HASHPTE
 
@@ -66,11 +66,11 @@ static inline bool pte_user(pte_t pte)
  */
 #define PAGE_NONE  __pgprot(_PAGE_BASE)
 #define PAGE_SHARED	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
-#define PAGE_SHARED_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
+#define PAGE_SHARED_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | _PAGE_EXEC)
 #define PAGE_COPY	__pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_USER)
+#define PAGE_COPY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
 #define PAGE_READONLY	__pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_USER)
+#define PAGE_READONLY_X	__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
 
 /* Permission masks used for kernel mappings */
 #define PAGE_KERNEL	__pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
@@ -318,7 +318,7 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
   int psize)
 {
unsigned long set = pte_val(entry) &
-   (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW);
+   (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
 
pte_update(ptep, 0, set);
 
@@ -384,7 +384,7 @@ static inline int pte_dirty(pte_t pte)	{ return !!(pte_val(pte) & _PAGE_DIRTY);
 static inline int pte_young(pte_t pte)	{ return !!(pte_val(pte) & _PAGE_ACCESSED); }
 static inline int pte_special(pte_t pte)	{ return !!(pte_val(pte) & _PAGE_SPECIAL); }
 static inline int pte_none(pte_t pte)	{ return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; }
-static inline bool pte_exec(pte_t pte)	{ return true; }
+static inline bool pte_exec(pte_t pte)	{ return pte_val(pte) & _PAGE_EXEC; }
 
 static inline int pte_present(pte_t pte)
 {
@@ -451,7 +451,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
 
 static inline pte_t pte_exprotect(pte_t pte)
 {
-   return pte;
+   return __pte(pte_val(pte) & ~_PAGE_EXEC);
 }
 
 static inline pte_t pte_mkclean(pte_t pte)
@@ -466,7 +466,7 @@ static inline pte_t pte_mkold(pte_t pte)
 
 static inline pte_t pte_mkexec(pte_t pte)
 {
-   return pte;
+   return __pte(pte_val(pte) | _PAGE_EXEC);
 }
 
 static inline pte_t pte_mkpte(pte_t pte)
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 29f49a35d6ee..a0395ccbbe9e 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -296,7 +296,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTRS_PPC601	(CPU_FTR_COMMON | CPU_FTR_601 | \
	CPU_FTR_COHERENT_ICACHE | CPU_FTR_UNIFIED_ID_CACHE | CPU_FTR_USE_RTC)
 #define CPU_FTRS_603   (CPU_FTR_COMMON | CPU_FTR_MAYBE_CAN_DOZE | \
-   CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE)
+

[PATCH v1 06/13] powerpc/mm/32s: use _PAGE_EXEC in setbat()

2018-11-29 Thread Christophe Leroy
Do not set the IBAT when setbat() is called without _PAGE_EXEC.
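
An illustration of the new behaviour (calls are hypothetical):

    setbat(2, PAGE_OFFSET, 0, SZ_16M, PAGE_KERNEL_X);            /* DBAT + IBAT */
    setbat(3, PAGE_OFFSET + SZ_16M, SZ_16M, SZ_8M, PAGE_KERNEL); /* no _PAGE_EXEC: DBAT only, IBAT invalidated */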

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ppc_mmu_32.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 61c10ee00ba2..1078095d9407 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -130,6 +130,7 @@ unsigned long __init mmu_mapin_ram(unsigned long base, 
unsigned long top)
  * Set up one of the I/D BAT (block address translation) register pairs.
  * The parameters are not checked; in particular size must be a power
  * of 2 between 128k and 256M.
+ * On 603+, only set IBAT when _PAGE_EXEC is set
  */
 void __init setbat(int index, unsigned long virt, phys_addr_t phys,
   unsigned int size, pgprot_t prot)
@@ -156,11 +157,12 @@ void __init setbat(int index, unsigned long virt, 
phys_addr_t phys,
bat[1].batu |= 1;   /* Vp = 1 */
if (flags & _PAGE_GUARDED) {
/* G bit must be zero in IBATs */
-   bat[0].batu = bat[0].batl = 0;
-   } else {
-   /* make IBAT same as DBAT */
-   bat[0] = bat[1];
+   flags &= ~_PAGE_EXEC;
}
+   if (flags & _PAGE_EXEC)
+   bat[0] = bat[1];
+   else
+   bat[0].batu = bat[0].batl = 0;
} else {
/* 601 cpu */
if (bl > BL_8M)
-- 
2.13.3



Re: [PATCH v8 04/14] integrity: Introduce struct evm_xattr

2018-11-29 Thread Mimi Zohar
On Fri, 2018-11-16 at 18:07 -0200, Thiago Jung Bauermann wrote:
> Even though struct evm_ima_xattr_data includes a fixed-size array to hold a
> SHA1 digest, most of the code ignores the array and uses the struct to mean
> "type indicator followed by data of unspecified size" and tracks the real
> size of what the struct represents in a separate length variable.
> 
> The only exception to that is the EVM code, which correctly uses the
> definition of struct evm_ima_xattr_data.
> 
> So make this explicit in the code by removing the length specification from
> the array in struct evm_ima_xattr_data. Also, change the name of the
> element from digest to data since in most places the array doesn't hold a
> digest.
> 
> A separate struct evm_xattr is introduced, with the original definition of
> evm_ima_xattr_data to be used in the places that actually expect that
> definition.

, specifically the EVM HMAC code.

> 
> Signed-off-by: Thiago Jung Bauermann 

Other than adding a comment before the structure definition noting that
evm_xattr usage is limited to HMAC, this looks good.

Reviewed-by: Mimi Zohar 

> ---
>  security/integrity/evm/evm_main.c | 8 
>  security/integrity/ima/ima_appraise.c | 7 ---
>  security/integrity/integrity.h| 5 +
>  3 files changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/security/integrity/evm/evm_main.c 
> b/security/integrity/evm/evm_main.c
> index 7f3f54d89a6e..a1b42d10efc7 100644
> --- a/security/integrity/evm/evm_main.c
> +++ b/security/integrity/evm/evm_main.c
> @@ -169,7 +169,7 @@ static enum integrity_status evm_verify_hmac(struct 
> dentry *dentry,
>   /* check value type */
>   switch (xattr_data->type) {
>   case EVM_XATTR_HMAC:
> - if (xattr_len != sizeof(struct evm_ima_xattr_data)) {
> + if (xattr_len != sizeof(struct evm_xattr)) {
>   evm_status = INTEGRITY_FAIL;
>   goto out;
>   }
> @@ -179,7 +179,7 @@ static enum integrity_status evm_verify_hmac(struct 
> dentry *dentry,
>  xattr_value_len, &digest);
>   if (rc)
>   break;
> - rc = crypto_memneq(xattr_data->digest, digest.digest,
> + rc = crypto_memneq(xattr_data->data, digest.digest,
>  SHA1_DIGEST_SIZE);
>   if (rc)
>   rc = -EINVAL;
> @@ -523,7 +523,7 @@ int evm_inode_init_security(struct inode *inode,
>const struct xattr *lsm_xattr,
>struct xattr *evm_xattr)
>  {
> - struct evm_ima_xattr_data *xattr_data;
> + struct evm_xattr *xattr_data;
>   int rc;
>  
>   if (!evm_key_loaded() || !evm_protected_xattr(lsm_xattr->name))
> @@ -533,7 +533,7 @@ int evm_inode_init_security(struct inode *inode,
>   if (!xattr_data)
>   return -ENOMEM;
>  
> - xattr_data->type = EVM_XATTR_HMAC;
> + xattr_data->data.type = EVM_XATTR_HMAC;
>   rc = evm_init_hmac(inode, lsm_xattr, xattr_data->digest);
>   if (rc < 0)
>   goto out;
> diff --git a/security/integrity/ima/ima_appraise.c 
> b/security/integrity/ima/ima_appraise.c
> index deec1804a00a..8bcef90939f8 100644
> --- a/security/integrity/ima/ima_appraise.c
> +++ b/security/integrity/ima/ima_appraise.c
> @@ -167,7 +167,8 @@ enum hash_algo ima_get_hash_algo(struct 
> evm_ima_xattr_data *xattr_value,
>   return sig->hash_algo;
>   break;
>   case IMA_XATTR_DIGEST_NG:
> - ret = xattr_value->digest[0];
> + /* first byte contains algorithm id */
> + ret = xattr_value->data[0];
>   if (ret < HASH_ALGO__LAST)
>   return ret;
>   break;
> @@ -175,7 +176,7 @@ enum hash_algo ima_get_hash_algo(struct 
> evm_ima_xattr_data *xattr_value,
>   /* this is for backward compatibility */
>   if (xattr_len == 21) {
>   unsigned int zero = 0;
> -			if (!memcmp(&xattr_value->digest[16], &zero, 4))
> +			if (!memcmp(&xattr_value->data[16], &zero, 4))
>   return HASH_ALGO_MD5;
>   else
>   return HASH_ALGO_SHA1;
> @@ -274,7 +275,7 @@ int ima_appraise_measurement(enum ima_hooks func,
>   /* xattr length may be longer. md5 hash in previous
>  version occupied 20 bytes in xattr, instead of 16
>*/
> -			rc = memcmp(&xattr_value->digest[hash_start],
> +			rc = memcmp(&xattr_value->data[hash_start],
>   iint->ima_hash->digest,
>   iint->ima_hash->length);
>   else
> diff --git a/security/integrity/integrity.h b/security/integrity/integrity.h
> index e60473b13a8d..20ac02bf1b84 100644
> --- a/security/integrity/integrity.h
> +++ 

[PATCH v1 02/13] powerpc/mm/32: add base address to mmu_mapin_ram()

2018-11-29 Thread Christophe Leroy
Currently, mmu_mapin_ram() always maps RAM from the beginning.
But some platforms like the WII have to map a second block of RAM.

This patch adds to mmu_mapin_ram() the base address of the block.
At the moment, only base address 0 is supported.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/40x_mmu.c   | 2 +-
 arch/powerpc/mm/44x_mmu.c   | 2 +-
 arch/powerpc/mm/8xx_mmu.c   | 2 +-
 arch/powerpc/mm/fsl_booke_mmu.c | 2 +-
 arch/powerpc/mm/mmu_decl.h  | 2 +-
 arch/powerpc/mm/pgtable_32.c| 6 +++---
 arch/powerpc/mm/ppc_mmu_32.c| 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/40x_mmu.c b/arch/powerpc/mm/40x_mmu.c
index 61ac468c87c6..b9cf6f8764b0 100644
--- a/arch/powerpc/mm/40x_mmu.c
+++ b/arch/powerpc/mm/40x_mmu.c
@@ -93,7 +93,7 @@ void __init MMU_init_hw(void)
 #define LARGE_PAGE_SIZE_16M	(1<<24)
 #define LARGE_PAGE_SIZE_4M (1<<22)
 
-unsigned long __init mmu_mapin_ram(unsigned long top)
+unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
unsigned long v, s, mapped;
phys_addr_t p;
diff --git a/arch/powerpc/mm/44x_mmu.c b/arch/powerpc/mm/44x_mmu.c
index 12d92518e898..f59df82896a0 100644
--- a/arch/powerpc/mm/44x_mmu.c
+++ b/arch/powerpc/mm/44x_mmu.c
@@ -178,7 +178,7 @@ void __init MMU_init_hw(void)
flush_instruction_cache();
 }
 
-unsigned long __init mmu_mapin_ram(unsigned long top)
+unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
unsigned long addr;
unsigned long memstart = memstart_addr & ~(PPC_PIN_SIZE - 1);
diff --git a/arch/powerpc/mm/8xx_mmu.c b/arch/powerpc/mm/8xx_mmu.c
index 01b7f5107c3a..50b640c7a7f9 100644
--- a/arch/powerpc/mm/8xx_mmu.c
+++ b/arch/powerpc/mm/8xx_mmu.c
@@ -107,7 +107,7 @@ static void __init mmu_patch_cmp_limit(s32 *site, unsigned 
long mapped)
patch_instruction_site(site, instr);
 }
 
-unsigned long __init mmu_mapin_ram(unsigned long top)
+unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
unsigned long mapped;
 
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index 080d49b26c3a..210cbc1faf63 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -221,7 +221,7 @@ unsigned long map_mem_in_cams(unsigned long ram, int 
max_cam_idx, bool dryrun)
 #error "LOWMEM_CAM_NUM must be less than NUM_TLBCAMS"
 #endif
 
-unsigned long __init mmu_mapin_ram(unsigned long top)
+unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
return tlbcam_addrs[tlbcam_index - 1].limit - PAGE_OFFSET + 1;
 }
diff --git a/arch/powerpc/mm/mmu_decl.h b/arch/powerpc/mm/mmu_decl.h
index 8574fbbc45e0..c29f061b1678 100644
--- a/arch/powerpc/mm/mmu_decl.h
+++ b/arch/powerpc/mm/mmu_decl.h
@@ -130,7 +130,7 @@ extern void wii_memory_fixups(void);
  */
 #ifdef CONFIG_PPC32
 extern void MMU_init_hw(void);
-extern unsigned long mmu_mapin_ram(unsigned long top);
+unsigned long mmu_mapin_ram(unsigned long base, unsigned long top);
 #endif
 
 #ifdef CONFIG_PPC_FSL_BOOK3E
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index bda3c6f1bd32..c030f24d1d05 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -275,15 +275,15 @@ void __init mapin_ram(void)
 
 #ifndef CONFIG_WII
top = total_lowmem;
-   s = mmu_mapin_ram(top);
+   s = mmu_mapin_ram(0, top);
__mapin_ram_chunk(s, top);
 #else
if (!wii_hole_size) {
-   s = mmu_mapin_ram(total_lowmem);
+   s = mmu_mapin_ram(0, total_lowmem);
__mapin_ram_chunk(s, total_lowmem);
} else {
top = wii_hole_start;
-   s = mmu_mapin_ram(top);
+   s = mmu_mapin_ram(0, top);
__mapin_ram_chunk(s, top);
 
top = memblock_end_of_DRAM();
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index f6f575bae3bc..3a29e88308b0 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -72,7 +72,7 @@ unsigned long p_block_mapped(phys_addr_t pa)
return 0;
 }
 
-unsigned long __init mmu_mapin_ram(unsigned long top)
+unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
unsigned long tot, bl, done;
unsigned long max_size = (256<<20);
-- 
2.13.3



[PATCH v1 07/13] powerpc/mm/32s: add setibat() clearibat() and update_bats()

2018-11-29 Thread Christophe Leroy
setibat() and clearibat() allow manipulating IBATs independently
of DBATs.

update_bats() allows updating BATs after init. This is done
with the MMU off.
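
A sketch of the intended use (the actual caller only arrives later in this
series):

    setibat(0, PAGE_OFFSET, 0, SZ_8M, PAGE_KERNEL_TEXT); /* stage IBAT0 in the BATS[] shadow */
    clearibat(1);                                        /* invalidate a stale IBAT */
    update_bats();                                       /* reload the hardware BATs, MMU off */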

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  2 ++
 arch/powerpc/kernel/head_32.S | 35 +++
 arch/powerpc/mm/ppc_mmu_32.c  | 32 
 3 files changed, 69 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index e38c91388c40..b4ccb832d4fb 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -83,6 +83,8 @@ typedef struct {
unsigned long vdso_base;
 } mm_context_t;
 
+void update_bats(void);
+
 #endif /* !__ASSEMBLY__ */
 
 /* We happily ignore the smaller BATs on 601, we don't actually use
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index d1c39b5ccfd6..0f4c72ebb151 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -1101,6 +1101,41 @@ BEGIN_MMU_FTR_SECTION
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
blr
 
+_ENTRY(update_bats)
+	lis	r4, 1f@h
+	ori	r4, r4, 1f@l		/* r4 = address of label 1 below */
+	tophys(r4, r4)
+	mfmsr	r6			/* save current MSR ... */
+	mflr	r7			/* ... and return address */
+	li	r3, MSR_KERNEL & ~(MSR_IR | MSR_DR)
+	rlwinm	r0, r6, 0, ~MSR_RI
+	rlwinm	r0, r0, 0, ~MSR_EE
+	mtmsr	r0			/* clear RI and EE before the rfi */
+	mtspr	SPRN_SRR0, r4
+	mtspr	SPRN_SRR1, r3
+	SYNC
+	RFI				/* jump to 1: with MMU (IR/DR) off */
+1:	bl	clear_bats
+	lis	r3, BATS@ha
+	addi	r3, r3, BATS@l
+	tophys(r3, r3)			/* r3 = physical address of the BATS[] shadow array */
+	LOAD_BAT(0, r3, r4, r5)
+	LOAD_BAT(1, r3, r4, r5)
+	LOAD_BAT(2, r3, r4, r5)
+	LOAD_BAT(3, r3, r4, r5)
+BEGIN_MMU_FTR_SECTION
+	LOAD_BAT(4, r3, r4, r5)
+	LOAD_BAT(5, r3, r4, r5)
+	LOAD_BAT(6, r3, r4, r5)
+	LOAD_BAT(7, r3, r4, r5)
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
+	li	r3, MSR_KERNEL & ~(MSR_IR | MSR_DR | MSR_RI)
+	mtmsr	r3
+	mtspr	SPRN_SRR0, r7		/* return to the caller, restoring ... */
+	mtspr	SPRN_SRR1, r6		/* ... the MSR saved on entry */
+	SYNC
+	RFI
+
 flush_tlbs:
lis r10, 0x40
 1: addic.  r10, r10, -0x1000
diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 1078095d9407..58dd71686707 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -105,6 +105,38 @@ static unsigned int block_size(unsigned long base, 
unsigned long top)
return min3(max_size, 1U << base_shift, 1U << block_shift);
 }
 
+/*
+ * Set up one of the IBAT (block address translation) register pairs.
+ * The parameters are not checked; in particular size must be a power
+ * of 2 between 128k and 256M.
+ * Only for 603+ ...
+ */
+static void setibat(int index, unsigned long virt, phys_addr_t phys,
+   unsigned int size, pgprot_t prot)
+{
+   unsigned int bl = (size >> 17) - 1;
+   int wimgxpp;
+   struct ppc_bat *bat = BATS[index];
+   unsigned long flags = pgprot_val(prot);
+
+   if (!cpu_has_feature(CPU_FTR_NEED_COHERENT))
+   flags &= ~_PAGE_COHERENT;
+
+   wimgxpp = (flags & _PAGE_COHERENT) | (_PAGE_EXEC ? BPP_RX : BPP_XX);
+   bat[0].batu = virt | (bl << 2) | 2; /* Vs=1, Vp=0 */
+   bat[0].batl = BAT_PHYS_ADDR(phys) | wimgxpp;
+   if (flags & _PAGE_USER)
+   bat[0].batu |= 1;   /* Vp = 1 */
+}
+
+static void clearibat(int index)
+{
+   struct ppc_bat *bat = BATS[index];
+
+   bat[0].batu = 0;
+   bat[0].batl = 0;
+}
+
 unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
int idx;
-- 
2.13.3



Re: [PATCH v8 17/20] powerpc/8xx: Enable 512k hugepage support with HW assistance

2018-11-29 Thread kbuild test robot
Hi Christophe,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.20-rc4]
[cannot apply to next-20181129]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Christophe-Leroy/powerpc-book3s32-Remove-CONFIG_BOOKE-dependent-code/20181129-210058
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/book3s/64/mmu-hash.h:24:0,
from arch/powerpc/include/asm/book3s/64/mmu.h:39,
from arch/powerpc/include/asm/mmu.h:328,
from arch/powerpc/include/asm/lppaca.h:36,
from arch/powerpc/include/asm/paca.h:21,
from arch/powerpc/include/asm/hw_irq.h:64,
from arch/powerpc/include/asm/irqflags.h:12,
from include/linux/irqflags.h:16,
from include/linux/spinlock.h:54,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/mm.h:10,
from arch/powerpc//mm/hugetlbpage.c:11:
   arch/powerpc/include/asm/book3s/64/pgtable.h:219:21: error: "__pte_frag_nr" 
is not defined, evaluates to 0 [-Werror=undef]
#define PTE_FRAG_NR __pte_frag_nr
^
   arch/powerpc/include/asm/pgtable.h:123:5: note: in expansion of macro 
'PTE_FRAG_NR'
#if PTE_FRAG_NR != 1
^~~
   In file included from arch/powerpc/include/asm/book3s/pgalloc.h:10:0,
from arch/powerpc/include/asm/pgalloc.h:24,
from arch/powerpc//mm/hugetlbpage.c:23:
   arch/powerpc//mm/hugetlbpage.c: In function '__hugepte_alloc':
>> arch/powerpc//mm/hugetlbpage.c:69:22: error: 'PTE_SHIFT' undeclared (first 
>> use in this function); did you mean 'PUD_SHIFT'?
  cachep = PGT_CACHE(PTE_SHIFT);
 ^
   arch/powerpc/include/asm/book3s/64/pgalloc.h:40:40: note: in definition of 
macro 'PGT_CACHE'
#define PGT_CACHE(shift) pgtable_cache[shift]
   ^
   arch/powerpc//mm/hugetlbpage.c:69:22: note: each undeclared identifier is 
reported only once for each function it appears in
  cachep = PGT_CACHE(PTE_SHIFT);
 ^
   arch/powerpc/include/asm/book3s/64/pgalloc.h:40:40: note: in definition of 
macro 'PGT_CACHE'
#define PGT_CACHE(shift) pgtable_cache[shift]
   ^
   arch/powerpc//mm/hugetlbpage.c: In function 'free_hugepd_range':
   arch/powerpc//mm/hugetlbpage.c:339:29: error: 'PTE_SHIFT' undeclared (first 
use in this function); did you mean 'PUD_SHIFT'?
 get_hugepd_cache_index(PTE_SHIFT));
^
PUD_SHIFT
   arch/powerpc//mm/hugetlbpage.c: In function 'hugetlbpage_init':
   arch/powerpc//mm/hugetlbpage.c:710:22: error: 'PTE_SHIFT' undeclared (first 
use in this function); did you mean 'PUD_SHIFT'?
   pgtable_cache_add(PTE_SHIFT);
 ^
 PUD_SHIFT
   cc1: all warnings being treated as errors

vim +69 arch/powerpc//mm/hugetlbpage.c

  > 23  #include <asm/pgalloc.h>
24  #include 
25  #include 
26  #include 
27  #include 
28  
29  
30  #ifdef CONFIG_HUGETLB_PAGE
31  
32  #define PAGE_SHIFT_64K  16
33  #define PAGE_SHIFT_512K 19
34  #define PAGE_SHIFT_8M   23
35  #define PAGE_SHIFT_16M  24
36  #define PAGE_SHIFT_16G  34
37  
38  bool hugetlb_disabled = false;
39  
40  unsigned int HPAGE_SHIFT;
41  EXPORT_SYMBOL(HPAGE_SHIFT);
42  
43  #define hugepd_none(hpd)(hpd_val(hpd) == 0)
44  
45  #define PTE_T_ORDER (__builtin_ffs(sizeof(pte_t)) - 
__builtin_ffs(sizeof(void *)))
46  
47  pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, 
unsigned long sz)
48  {
49  /*
50   * Only called for hugetlbfs pages, hence can ignore THP and the
51   * irq disabled walk.
52   */
53  return __find_linux_pte(mm->pgd, addr, NULL, NULL);
54  }
55  
56  static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
57 unsigned long address, unsigned int pdshift,
58 unsigned int pshift, spinlock_t *ptl

Re: [PATCH v8 07/20] powerpc/mm: add helpers to get/set mm.context->pte_frag

2018-11-29 Thread kbuild test robot
Hi Christophe,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on v4.20-rc4]
[cannot apply to next-20181129]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Christophe-Leroy/powerpc-book3s32-Remove-CONFIG_BOOKE-dependent-code/20181129-210058
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All warnings (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/book3s/64/mmu-hash.h:24:0,
from arch/powerpc/include/asm/book3s/64/mmu.h:39,
from arch/powerpc/include/asm/mmu.h:328,
from arch/powerpc/include/asm/lppaca.h:36,
from arch/powerpc/include/asm/paca.h:21,
from arch/powerpc/include/asm/hw_irq.h:64,
from arch/powerpc/include/asm/irqflags.h:12,
from include/linux/irqflags.h:16,
from include/linux/spinlock.h:54,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/slab.h:15,
from drivers/usb/gadget/function/f_acm.c:14:
>> arch/powerpc/include/asm/book3s/64/pgtable.h:219:21: warning: 
>> "__pte_frag_nr" is not defined, evaluates to 0 [-Wundef]
#define PTE_FRAG_NR __pte_frag_nr
^
>> arch/powerpc/include/asm/pgtable.h:123:5: note: in expansion of macro 
>> 'PTE_FRAG_NR'
#if PTE_FRAG_NR != 1
^~~
--
   In file included from arch/powerpc/include/asm/book3s/64/mmu-hash.h:24:0,
from arch/powerpc/include/asm/book3s/64/mmu.h:39,
from arch/powerpc/include/asm/mmu.h:328,
from arch/powerpc/include/asm/lppaca.h:36,
from arch/powerpc/include/asm/paca.h:21,
from arch/powerpc/include/asm/current.h:16,
from include/linux/sched.h:12,
from drivers/usb/gadget/function/u_serial.c:18:
>> arch/powerpc/include/asm/book3s/64/pgtable.h:219:21: warning: 
>> "__pte_frag_nr" is not defined, evaluates to 0 [-Wundef]
#define PTE_FRAG_NR __pte_frag_nr
^
>> arch/powerpc/include/asm/pgtable.h:123:5: note: in expansion of macro 
>> 'PTE_FRAG_NR'
#if PTE_FRAG_NR != 1
^~~
   In file included from arch/powerpc/include/asm/book3s/64/mmu-hash.h:24:0,
from arch/powerpc/include/asm/book3s/64/mmu.h:39,
from arch/powerpc/include/asm/mmu.h:328,
from arch/powerpc/include/asm/lppaca.h:36,
from arch/powerpc/include/asm/paca.h:21,
from arch/powerpc/include/asm/current.h:16,
from include/linux/sched.h:12,
from drivers/usb/gadget/function/u_serial.c:18:
>> arch/powerpc/include/asm/book3s/64/pgtable.h:219:21: warning: 
>> "__pte_frag_nr" is not defined, evaluates to 0 [-Wundef]
#define PTE_FRAG_NR __pte_frag_nr
^
>> arch/powerpc/include/asm/pgtable.h:123:5: note: in expansion of macro 
>> 'PTE_FRAG_NR'
#if PTE_FRAG_NR != 1
^~~
--
   In file included from arch/powerpc/include/asm/book3s/64/mmu-hash.h:24:0,
from arch/powerpc/include/asm/book3s/64/mmu.h:39,
from arch/powerpc/include/asm/mmu.h:328,
from arch/powerpc/include/asm/lppaca.h:36,
from arch/powerpc/include/asm/paca.h:21,
from arch/powerpc/include/asm/current.h:16,
from include/linux/sched.h:12,
from drivers/staging/rtlwifi/rtl8822be/../wifi.h:20,
from drivers/staging/rtlwifi/rtl8822be/trx.c:15:
>> arch/powerpc/include/asm/book3s/64/pgtable.h:219:21: warning: 
>> "__pte_frag_nr" is not defined, evaluates to 0 [-Wundef]
#define PTE_FRAG_NR __pte_frag_nr
^
>> arch/powerpc/include/asm/pgtable.h:123:5: note: in expansion of macro 
>> 'PTE_FRAG_NR'
#if PTE_FRAG_NR != 1
^~~
   In file included from drivers/staging/rtlwifi/rtl8822be/trx.c:26:0:
   include/linux/vermagic.h:29:10: fatal error: 
generated/randomize_layout_hash.h: No such file or directory
#include <generated/randomize_layout_hash.h>
 ^

[PATCH v1 03/13] powerpc/mm/32s: rework mmu_mapin_ram()

2018-11-29 Thread Christophe Leroy
This patch reworks mmu_mapin_ram() to be more generic and to map as
many blocks as possible. It now supports blocks that don't start at
address 0.

It scans the DBAT array to find free ones instead of forcing the use
of BAT2 and BAT3.
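
A worked example for the new block_size() helper (hypothetical values):
with base = 0x02000000 (32M) and top = 0x10000000 (256M), the base
alignment allows 1 << 25 = 32M, the remaining size allows 1 << 27 = 128M,
and the CPU cap is 256M, so min3() selects a 32M block; mmu_mapin_ram()
then advances base and repeats until it runs out of free BATs or reaches
top.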

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/ppc_mmu_32.c | 61 +---
 1 file changed, 40 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/mm/ppc_mmu_32.c b/arch/powerpc/mm/ppc_mmu_32.c
index 3a29e88308b0..61c10ee00ba2 100644
--- a/arch/powerpc/mm/ppc_mmu_32.c
+++ b/arch/powerpc/mm/ppc_mmu_32.c
@@ -72,39 +72,58 @@ unsigned long p_block_mapped(phys_addr_t pa)
return 0;
 }
 
+static int find_free_bat(void)
+{
+   int b;
+
+   if (cpu_has_feature(CPU_FTR_601)) {
+   for (b = 0; b < 4; b++) {
+   struct ppc_bat *bat = BATS[b];
+
+   if (!(bat[0].batl & 0x40))
+   return b;
+   }
+   } else {
+   int n = mmu_has_feature(MMU_FTR_USE_HIGH_BATS) ? 8 : 4;
+
+   for (b = 0; b < n; b++) {
+   struct ppc_bat *bat = BATS[b];
+
+   if (!(bat[1].batu & 3))
+   return b;
+   }
+   }
+   return -1;
+}
+
+static unsigned int block_size(unsigned long base, unsigned long top)
+{
+   unsigned int max_size = (cpu_has_feature(CPU_FTR_601) ? 8 : 256) << 20;
+   unsigned int base_shift = (fls(base) - 1) & 31;
+   unsigned int block_shift = (fls(top - base) - 1) & 31;
+
+   return min3(max_size, 1U << base_shift, 1U << block_shift);
+}
+
 unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
 {
-   unsigned long tot, bl, done;
-   unsigned long max_size = (256<<20);
+   int idx;
 
if (__map_without_bats) {
printk(KERN_DEBUG "RAM mapped without BATs\n");
return 0;
}
 
-   /* Set up BAT2 and if necessary BAT3 to cover RAM. */
+   while ((idx = find_free_bat()) != -1 && base != top) {
+   unsigned int size = block_size(base, top);
 
-   /* Make sure we don't map a block larger than the
-  smallest alignment of the physical address. */
-   tot = top;
-   for (bl = 128<<10; bl < max_size; bl <<= 1) {
-   if (bl * 2 > tot)
+   if (size < 128 << 10)
break;
+   setbat(idx, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
+   base += size;
}
 
-   setbat(2, PAGE_OFFSET, 0, bl, PAGE_KERNEL_X);
-   done = (unsigned long)bat_addrs[2].limit - PAGE_OFFSET + 1;
-   if ((done < tot) && !bat_addrs[3].limit) {
-   /* use BAT3 to cover a bit more */
-   tot -= done;
-   for (bl = 128<<10; bl < max_size; bl <<= 1)
-   if (bl * 2 > tot)
-   break;
-   setbat(3, PAGE_OFFSET+done, done, bl, PAGE_KERNEL_X);
-   done = (unsigned long)bat_addrs[3].limit - PAGE_OFFSET + 1;
-   }
-
-   return done;
+   return base;
 }
 
 /*
-- 
2.13.3



[PATCH v1 11/13] powerpc/kconfig: define CONFIG_DATA_SHIFT and CONFIG_ETEXT_SHIFT

2018-11-29 Thread Christophe Leroy
CONFIG_STRICT_KERNEL_RWX requires a special alignment
for DATA on some subarches. Today it is just defined
as an #ifdef in vmlinux.lds.S.

In order to get more flexibility, this patch moves the
definition of this alignment into Kconfig.

On some subarches, CONFIG_STRICT_KERNEL_RWX will
require a special alignment of _etext.

This patch also adds a configuration item for it in Kconfig.
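
For example, with STRICT_KERNEL_RWX on PPC64, the defaults below give
STRICT_ALIGN_SIZE = 1 << 24 = 16M as before, while ETEXT_ALIGN_SIZE stays
at one page (1 << PPC_PAGE_SHIFT, e.g. 1 << 12 = 4k).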

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  | 9 +
 arch/powerpc/kernel/vmlinux.lds.S | 9 +++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4a81a80d0635..f3e420f3f1d7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -728,6 +728,15 @@ config THREAD_SHIFT
  Used to define the stack size. The default is almost always what you
  want. Only change this if you know what you are doing.
 
+config ETEXT_SHIFT
+   int
+   default PPC_PAGE_SHIFT
+
+config DATA_SHIFT
+   int
+   default 24 if STRICT_KERNEL_RWX && PPC64
+   default PPC_PAGE_SHIFT
+
 config FORCE_MAX_ZONEORDER
int "Maximum zone order"
range 8 9 if PPC64 && PPC_64K_PAGES
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 1148c3c60c3b..d210dcfe915a 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -12,11 +12,8 @@
 #include 
 #include 
 
-#if defined(CONFIG_STRICT_KERNEL_RWX) && !defined(CONFIG_PPC32)
-#define STRICT_ALIGN_SIZE  (1 << 24)
-#else
-#define STRICT_ALIGN_SIZE  PAGE_SIZE
-#endif
+#define STRICT_ALIGN_SIZE  (1 << CONFIG_DATA_SHIFT)
+#define ETEXT_ALIGN_SIZE   (1 << CONFIG_ETEXT_SHIFT)
 
 ENTRY(_stext)
 
@@ -131,7 +128,7 @@ SECTIONS
 
} :kernel
 
-   . = ALIGN(PAGE_SIZE);
+   . = ALIGN(ETEXT_ALIGN_SIZE);
_etext = .;
PROVIDE32 (etext = .);
 
-- 
2.13.3



[PATCH v1 13/13] powerpc/kconfig: make _etext and data areas alignment configurable on Book3s 32

2018-11-29 Thread Christophe Leroy
Depending on the number of BATs available for mapping the different
kernel areas, it might be necessary to increase the alignment of _etext
and/or of the data areas.

This patch allows the user to do it via Kconfig.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ffcf4d7a1186..bab9dab815d9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -728,16 +728,44 @@ config THREAD_SHIFT
  Used to define the stack size. The default is almost always what you
  want. Only change this if you know what you are doing.
 
+config ETEXT_SHIFT_BOOL
+   bool "Set custom etext alignment" if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   depends on ADVANCED_OPTIONS
+   help
+ This option allows you to set the kernel end of text alignment. When
+ RAM is mapped by blocks, the alignment needs to fit the size and
+ number of possible blocks. The default should be OK for most configs.
+
+ Say N here unless you know what you are doing.
+
 config ETEXT_SHIFT
-   int
+   int "_etext shift" if ETEXT_SHIFT_BOOL
+   range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default PPC_PAGE_SHIFT
+   help
+	  On Book3S 32 (603+), IBATs are used to map kernel text.
+	  The smaller the alignment, the greater the number of IBATs required.
+
+config DATA_SHIFT_BOOL
+   bool "Set custom data alignment" if STRICT_KERNEL_RWX && PPC_BOOK3S_32
+   depends on ADVANCED_OPTIONS
+   help
+ This option allows you to set the kernel data alignment. When
+ RAM is mapped by blocks, the alignment needs to fit the size and
+ number of possible blocks. The default should be OK for most configs.
+
+ Say N here unless you know what you are doing.
 
 config DATA_SHIFT
-   int
+   int "Data shift" if DATA_SHIFT_BOOL
default 24 if STRICT_KERNEL_RWX && PPC64
+   range 17 28 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default PPC_PAGE_SHIFT
+   help
+	  On Book3S 32 (603+), DBATs are used to map kernel text and rodata RO.
+	  The smaller the alignment, the greater the number of DBATs required.
 
 config FORCE_MAX_ZONEORDER
int "Maximum zone order"
-- 
2.13.3



[PATCH v1 08/13] powerpc/32: add helper to write into segment registers

2018-11-29 Thread Christophe Leroy
This patch adds a helper that wraps the 'mtsrin' instruction
to write into segment registers.
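
A hypothetical usage example (0x10000000 is the No-Execute bit of the
segment register, as used later in this series):

    u32 ea = 0xd0000000;

    mtsrin(mfsrin(ea) | 0x10000000, ea);    /* set NX on that 256M segment */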

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/reg.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index de52c3166ba4..c9c382e57017 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1423,6 +1423,11 @@ static inline void msr_check_and_clear(unsigned long 
bits)
 #define mfsrin(v)  ({unsigned int rval; \
asm volatile("mfsrin %0,%1" : "=r" (rval) : "r" (v)); \
rval;})
+
+static inline void mtsrin(u32 val, u32 idx)
+{
+   asm volatile("mtsrin %0, %1" : : "r" (val), "r" (idx));
+}
 #endif
 
 #define proc_trap()asm volatile("trap")
-- 
2.13.3



[PATCH v1 05/13] powerpc/wii: remove wii_mmu_mapin_mem2()

2018-11-29 Thread Christophe Leroy
wii_mmu_mapin_mem2() is not used anymore, remove it.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/platforms/embedded6xx/wii.c | 24 
 1 file changed, 24 deletions(-)

diff --git a/arch/powerpc/platforms/embedded6xx/wii.c 
b/arch/powerpc/platforms/embedded6xx/wii.c
index ecf703ee3a76..235fe81aa2b1 100644
--- a/arch/powerpc/platforms/embedded6xx/wii.c
+++ b/arch/powerpc/platforms/embedded6xx/wii.c
@@ -54,10 +54,6 @@
 static void __iomem *hw_ctrl;
 static void __iomem *hw_gpio;
 
-unsigned long wii_hole_start;
-unsigned long wii_hole_size;
-
-
 static int __init page_aligned(unsigned long x)
 {
return !(x & (PAGE_SIZE-1));
@@ -69,26 +65,6 @@ void __init wii_memory_fixups(void)
 
BUG_ON(memblock.memory.cnt != 2);
BUG_ON(!page_aligned(p[0].base) || !page_aligned(p[1].base));
-
-   /* determine hole */
-   wii_hole_start = ALIGN(p[0].base + p[0].size, PAGE_SIZE);
-   wii_hole_size = p[1].base - wii_hole_start;
-}
-
-unsigned long __init wii_mmu_mapin_mem2(unsigned long top)
-{
-   unsigned long delta, size, bl;
-   unsigned long max_size = (256<<20);
-
-	/* MEM2 64MB@0x10000000 */
-   delta = wii_hole_start + wii_hole_size;
-   size = top - delta;
-   for (bl = 128<<10; bl < max_size; bl <<= 1) {
-   if (bl * 2 > size)
-   break;
-   }
-   setbat(4, PAGE_OFFSET+delta, delta, bl, PAGE_KERNEL_X);
-   return delta + bl;
 }
 
 static void __noreturn wii_spin(void)
-- 
2.13.3



[PATCH v1 12/13] powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX

2018-11-29 Thread Christophe Leroy
Today, STRICT_KERNEL_RWX is based on the use of regular pages
to map kernel pages.

On Book3s 32, it has three consequences:
- Using pages instead of BAT for mapping kernel linear memory severely
impacts performance.
- Exec protection is not effective because no-execute cannot be set at
page level (except on 603 which doesn't have hash tables)
- Write protection is not effective because PP bits do not provide RO
mode for kernel-only pages (except on 603 which handles it in software
via PAGE_DIRTY)

On the 603+, we have:
- Independent IBAT and DBAT allowing limitation of exec parts.
- NX bit can be set in segment registers to forbid execution of memory
mapped by pages.
- RO mode on DBATs even for kernel-only blocks.

On the 601, there is nothing much we can do other than warn the user
about it, because:
- BATs are common to instructions and data.
- BATs do not provide RO mode for kernel-only blocks.
- segment registers don't have the NX bit.

In order to use IBAT for exec protection, this patch:
- Aligns _etext to BAT block sizes (128kb)
- Sets the NX bit in the kernel segment registers (except on the vmalloc
area when CONFIG_MODULES is selected; see the sketch after this list)
- Maps kernel text with IBATs.
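
A sketch of the segment register part (assumed shape, using the mtsrin()
helper from patch 8; 0x10000000 is the N (no-execute) bit):

    int i;

    for (i = TASK_SIZE >> 28; i < 16; i++) {
        /* leave the vmalloc segment executable for modules */
        if (IS_ENABLED(CONFIG_MODULES) &&
            (VMALLOC_START & 0xf0000000) == (unsigned long)i << 28)
            continue;
        mtsrin(mfsrin(i << 28) | 0x10000000, i << 28);
    }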

In order to use DBATs for write protection, this patch:
- Aligns RW DATA to BAT block sizes (4M)
- Maps kernel RO area with write prohibited DBATs
- Maps remaining memory with remaining DBATs

Here is what we get with this patch on an 832x when activating
STRICT_KERNEL_RWX:

Symbols:
c0000000 T _stext
c0680000 R __start_rodata
c0680000 R _etext
c0800000 T __init_begin
c0800000 T _sinittext

~# cat /sys/kernel/debug/block_address_translation
---[ Instruction Block Address Translation ]---
0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent
1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent
2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent
3:         -
4:         -
5:         -
6:         -
7:         -

---[ Data Block Address Translation ]---
0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent
2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent
3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent
4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent
5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent
6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
7:         -

~# cat /sys/kernel/debug/segment_registers
---[ User Segments ]---
0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0
0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1
0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2
0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903
0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14
0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25
0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36
0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47
0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58
0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69
0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a
0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b

---[ Kernel Segments ]---
0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc
0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd
0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee
0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff

Aligning _etext to 128kb allows mapping up to 32Mb of text with 8 IBATs:
16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb
(A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block.)

Aligning data to 4M allows mapping up to 512Mb of data with 8 DBATs:
16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb

Because some processors only have 4 BATs and because some targets need
DBATs for mapping other areas, the following patch will allow modifying
the _etext and data alignments.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  2 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 11 
 arch/powerpc/mm/init_32.c|  4 +-
 arch/powerpc/mm/mmu_decl.h   |  8 +++
 arch/powerpc/mm/pgtable_32.c | 10 +++-
 arch/powerpc/mm/ppc_mmu_32.c | 87 ++--
 6 files changed, 112 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f3e420f3f1d7..ffcf4d7a1186 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -730,11 +730,13 @@ config THREAD_SHIFT
 
 config ETEXT_SHIFT
int
+   default 17 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default PPC_PAGE_SHIFT
 
 config DATA_SHIFT
int
default 24 if STRICT_KERNEL_RWX && PPC64
+   default 22 if STRICT_KERNEL_RWX && PPC_BOOK3S_32
default PPC_PAGE_SHIFT
 
 config FORCE_MAX_ZONEORDER
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 

[PATCH] powerpc: Look for "stdout-path" when setting up legacy consoles

2018-11-29 Thread Benjamin Herrenschmidt
Commit 78e5dfea8 "powerpc: dts: replace 'linux,stdout-path' with 'stdout-path'"
broke the default console on a number of embedded PowerPC systems, because it
failed to also update the code in arch/powerpc/kernel/legacy_serial.c to
look for that property in addition to the old one.

This fixes it.
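
(Old device trees carried, e.g., linux,stdout-path = "/soc/serial@4500"
under the /chosen node, while new ones carry the same value in
stdout-path; the fix simply tries both. The path above is hypothetical.)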

Signed-off-by: Benjamin Herrenschmidt 
Fixes: 78e5dfea8 ("powerpc: dts: replace 'linux,stdout-path' with 'stdout-path'")
---
diff --git a/arch/powerpc/kernel/legacy_serial.c 
b/arch/powerpc/kernel/legacy_serial.c
index 33b34a5..5b9dce1 100644
--- a/arch/powerpc/kernel/legacy_serial.c
+++ b/arch/powerpc/kernel/legacy_serial.c
@@ -372,6 +372,8 @@ void __init find_legacy_serial_ports(void)
 
/* Now find out if one of these is out firmware console */
path = of_get_property(of_chosen, "linux,stdout-path", NULL);
+   if (path == NULL)
+   path = of_get_property(of_chosen, "stdout-path", NULL);
if (path != NULL) {
stdout = of_find_node_by_path(path);
if (stdout)
@@ -595,8 +597,10 @@ static int __init check_legacy_serial_console(void)
/* We are getting a weird phandle from OF ... */
/* ... So use the full path instead */
name = of_get_property(of_chosen, "linux,stdout-path", NULL);
+   if (name == NULL)
+   name = of_get_property(of_chosen, "stdout-path", NULL);
if (name == NULL) {
-   DBG(" no linux,stdout-path !\n");
+   DBG(" no stdout-path !\n");
return -ENODEV;
}
prom_stdout = of_find_node_by_path(name);



[PATCH v8 05/20] powerpc/mm: move platform specific mmu-xxx.h in platform directories

2018-11-29 Thread Christophe Leroy
The purpose of this patch is to move the platform specific
mmu-xxx.h files into platform directories, like the pte-xxx.h files.

In the meantime, this patch creates common nohash and
nohash/32 + nohash/64 mmu.h files for future common parts.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/mmu.h | 14 ++
 arch/powerpc/include/asm/{ => nohash/32}/mmu-40x.h |  0
 arch/powerpc/include/asm/{ => nohash/32}/mmu-44x.h |  0
 arch/powerpc/include/asm/{ => nohash/32}/mmu-8xx.h |  0
 arch/powerpc/include/asm/nohash/32/mmu.h   | 19 +++
 arch/powerpc/include/asm/nohash/64/mmu.h   |  8 
 arch/powerpc/include/asm/{ => nohash}/mmu-book3e.h |  0
 arch/powerpc/include/asm/nohash/mmu.h  | 11 +++
 arch/powerpc/kernel/cpu_setup_fsl_booke.S  |  2 +-
 arch/powerpc/kvm/e500.h|  2 +-
 10 files changed, 42 insertions(+), 14 deletions(-)
 rename arch/powerpc/include/asm/{ => nohash/32}/mmu-40x.h (100%)
 rename arch/powerpc/include/asm/{ => nohash/32}/mmu-44x.h (100%)
 rename arch/powerpc/include/asm/{ => nohash/32}/mmu-8xx.h (100%)
 create mode 100644 arch/powerpc/include/asm/nohash/32/mmu.h
 create mode 100644 arch/powerpc/include/asm/nohash/64/mmu.h
 rename arch/powerpc/include/asm/{ => nohash}/mmu-book3e.h (100%)
 create mode 100644 arch/powerpc/include/asm/nohash/mmu.h

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index eb20eb3b8fb0..2184021b0e1c 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -341,18 +341,8 @@ static inline void mmu_early_init_devtree(void) { }
 #if defined(CONFIG_PPC_STD_MMU_32)
 /* 32-bit classic hash table MMU */
 #include <asm/book3s/32/mmu-hash.h>
-#elif defined(CONFIG_40x)
-/* 40x-style software loaded TLB */
-#  include <asm/mmu-40x.h>
-#elif defined(CONFIG_44x)
-/* 44x-style software loaded TLB */
-#  include <asm/mmu-44x.h>
-#elif defined(CONFIG_PPC_BOOK3E_MMU)
-/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
-#  include <asm/mmu-book3e.h>
-#elif defined (CONFIG_PPC_8xx)
-/* Motorola/Freescale 8xx software loaded TLB */
-#  include <asm/mmu-8xx.h>
+#elif defined(CONFIG_PPC_MMU_NOHASH)
+#include <asm/nohash/mmu.h>
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/mmu-40x.h 
b/arch/powerpc/include/asm/nohash/32/mmu-40x.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-40x.h
rename to arch/powerpc/include/asm/nohash/32/mmu-40x.h
diff --git a/arch/powerpc/include/asm/mmu-44x.h 
b/arch/powerpc/include/asm/nohash/32/mmu-44x.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-44x.h
rename to arch/powerpc/include/asm/nohash/32/mmu-44x.h
diff --git a/arch/powerpc/include/asm/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-8xx.h
rename to arch/powerpc/include/asm/nohash/32/mmu-8xx.h
diff --git a/arch/powerpc/include/asm/nohash/32/mmu.h 
b/arch/powerpc/include/asm/nohash/32/mmu.h
new file mode 100644
index ..af0e8b54876a
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/32/mmu.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_32_MMU_H_
+#define _ASM_POWERPC_NOHASH_32_MMU_H_
+
+#if defined(CONFIG_40x)
+/* 40x-style software loaded TLB */
+#include <asm/nohash/32/mmu-40x.h>
+#elif defined(CONFIG_44x)
+/* 44x-style software loaded TLB */
+#include <asm/nohash/32/mmu-44x.h>
+#elif defined(CONFIG_PPC_BOOK3E_MMU)
+/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
+#include <asm/nohash/mmu-book3e.h>
+#elif defined (CONFIG_PPC_8xx)
+/* Motorola/Freescale 8xx software loaded TLB */
+#include <asm/nohash/32/mmu-8xx.h>
+#endif
+
+#endif /* _ASM_POWERPC_NOHASH_32_MMU_H_ */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h 
b/arch/powerpc/include/asm/nohash/64/mmu.h
new file mode 100644
index ..87871d027b75
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_64_MMU_H_
+#define _ASM_POWERPC_NOHASH_64_MMU_H_
+
+/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
+#include <asm/nohash/mmu-book3e.h>
+
+#endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/nohash/mmu-book3e.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-book3e.h
rename to arch/powerpc/include/asm/nohash/mmu-book3e.h
diff --git a/arch/powerpc/include/asm/nohash/mmu.h 
b/arch/powerpc/include/asm/nohash/mmu.h
new file mode 100644
index ..a037cb1efb57
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/mmu.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_MMU_H_
+#define _ASM_POWERPC_NOHASH_MMU_H_
+
+#ifdef CONFIG_PPC64
+#include <asm/nohash/64/mmu.h>
+#else
+#include <asm/nohash/32/mmu.h>
+#endif
+
+#endif /* _ASM_POWERPC_NOHASH_MMU_H_ */
diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S 
b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
index 8d142e5d84cd..5fbc890d1094 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ 

[PATCH v8 06/20] powerpc/mm: Move pgtable_t into platform headers

2018-11-29 Thread Christophe Leroy
This patch moves pgtable_t into the platform headers.

It gets rid of the CONFIG_PPC_64K_PAGES case for PPC64,
as nohash/64 doesn't support CONFIG_PPC_64K_PAGES.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  2 ++
 arch/powerpc/include/asm/book3s/64/mmu.h  |  9 +
 arch/powerpc/include/asm/nohash/32/mmu.h  |  4 
 arch/powerpc/include/asm/nohash/64/mmu.h  |  4 
 arch/powerpc/include/asm/page.h   | 14 --
 5 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index e38c91388c40..5bd26c218b94 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -42,6 +42,8 @@ struct ppc_bat {
u32 batu;
u32 batl;
 };
+
+typedef struct page *pgtable_t;
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 6328857f259f..1ceee000c18d 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_MMU_H_
 #define _ASM_POWERPC_BOOK3S_64_MMU_H_
 
+#include <asm/page.h>
+
 #ifndef __ASSEMBLY__
 /*
  * Page size definition
@@ -24,6 +26,13 @@ struct mmu_psize_def {
 };
 extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 
+/*
+ * For BOOK3s 64 with 4k and 64K linux page size
+ * we want to use pointers, because the page table
+ * actually store pfn
+ */
+typedef pte_t *pgtable_t;
+
 #endif /* __ASSEMBLY__ */
 
 /* 64-bit classic hash table MMU */
diff --git a/arch/powerpc/include/asm/nohash/32/mmu.h 
b/arch/powerpc/include/asm/nohash/32/mmu.h
index af0e8b54876a..f61f933a4cd8 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu.h
@@ -16,4 +16,8 @@
 #include <asm/nohash/32/mmu-8xx.h>
 #endif
 
+#ifndef __ASSEMBLY__
+typedef struct page *pgtable_t;
+#endif
+
 #endif /* _ASM_POWERPC_NOHASH_32_MMU_H_ */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h 
b/arch/powerpc/include/asm/nohash/64/mmu.h
index 87871d027b75..e6585480dfc4 100644
--- a/arch/powerpc/include/asm/nohash/64/mmu.h
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -5,4 +5,8 @@
 /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
 #include 
 
+#ifndef __ASSEMBLY__
+typedef struct page *pgtable_t;
+#endif
+
 #endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 9ea903221a9f..a7624a3b1435 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -335,20 +335,6 @@ void arch_free_page(struct page *page, int order);
 #endif
 
 struct vm_area_struct;
-#ifdef CONFIG_PPC_BOOK3S_64
-/*
- * For BOOK3s 64 with 4k and 64K linux page size
- * we want to use pointers, because the page table
- * actually store pfn
- */
-typedef pte_t *pgtable_t;
-#else
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC64)
-typedef pte_t *pgtable_t;
-#else
-typedef struct page *pgtable_t;
-#endif
-#endif
 
 #include 
 #endif /* __ASSEMBLY__ */
-- 
2.13.3



[PATCH v8 08/20] powerpc/mm: Extend pte_fragment functionality to PPC32

2018-11-29 Thread Christophe Leroy
In order to allow the 8xx to handle pte_fragments, this patch
extends the use of pte_fragments to PPC32 platforms.
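
The subtle part is visible further down in the pmd_page_vaddr() change:
once a PMD entry may point to a pte fragment rather than a whole page,
masking its value with PAGE_MASK would discard the fragment's offset
within its page. A minimal userspace sketch of that arithmetic (all
constants and addresses here are illustrative, not taken from the
kernel headers):

#include <stdio.h>

#define PAGE_MASK      (~0xfffUL)  /* 4k pages */
#define PTE_TABLE_SIZE 0x400UL     /* hypothetical 1k fragment */
#define _PMD_PRESENT   0x001UL

int main(void)
{
        /* a fragment sitting at offset 0x800 inside its page */
        unsigned long frag = 0xc0001800UL;
        unsigned long pmd = frag | _PMD_PRESENT;

        /* PAGE_MASK drops the intra-page fragment offset: wrong */
        printf("with PAGE_MASK:  %lx\n", pmd & PAGE_MASK);
        /* masking by the table size only strips the flag bits: right */
        printf("with table mask: %lx\n", pmd & ~(PTE_TABLE_SIZE - 1));
        return 0;
}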

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  5 -
 arch/powerpc/include/asm/book3s/32/pgalloc.h  | 17 +
 arch/powerpc/include/asm/book3s/32/pgtable.h  |  5 +++--
 arch/powerpc/include/asm/mmu_context.h|  2 +-
 arch/powerpc/include/asm/nohash/32/mmu.h  |  4 +++-
 arch/powerpc/include/asm/nohash/32/pgalloc.h  | 22 +++---
 arch/powerpc/include/asm/nohash/32/pgtable.h  |  8 +---
 arch/powerpc/mm/Makefile  |  1 +
 arch/powerpc/mm/mmu_context.c | 10 ++
 arch/powerpc/mm/mmu_context_nohash.c  |  2 +-
 arch/powerpc/mm/pgtable_32.c  | 25 -
 11 files changed, 52 insertions(+), 49 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 5bd26c218b94..2bb500d25de6 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_
 #define _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_
+
 /*
  * 32-bit hash table MMU support
  */
@@ -9,6 +10,8 @@
  * BATs
  */
 
+#include 
+
 /* Block size masks */
 #define BL_128K0x000
 #define BL_256K 0x001
@@ -43,7 +46,7 @@ struct ppc_bat {
u32 batl;
 };
 
-typedef struct page *pgtable_t;
+typedef pte_t *pgtable_t;
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index eb8882c6dbb0..0f58e5b9dbe7 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -59,30 +59,31 @@ static inline void pmd_populate_kernel(struct mm_struct 
*mm, pmd_t *pmdp,
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pte_page)
 {
-   *pmdp = __pmd((page_to_pfn(pte_page) << PAGE_SHIFT) | _PMD_PRESENT);
+   *pmdp = __pmd(__pa(pte_page) | _PMD_PRESENT);
 }
 
-#define pmd_pgtable(pmd) pmd_page(pmd)
+#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
 
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
 extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
+void pte_frag_destroy(void *pte_frag);
+pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int 
kernel);
+void pte_fragment_free(unsigned long *table, int kernel);
 
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-   free_page((unsigned long)pte);
+   pte_fragment_free((unsigned long *)pte, 1);
 }
 
 static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 {
-   pgtable_page_dtor(ptepage);
-   __free_page(ptepage);
+   pte_fragment_free((unsigned long *)ptepage, 0);
 }
 
 static inline void pgtable_free(void *table, unsigned index_size)
 {
if (!index_size) {
-   pgtable_page_dtor(virt_to_page(table));
-   free_page((unsigned long)table);
+   pte_fragment_free((unsigned long *)table, 0);
} else {
BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
kmem_cache_free(PGT_CACHE(index_size), table);
@@ -120,6 +121,6 @@ static inline void pgtable_free_tlb(struct mmu_gather *tlb,
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
  unsigned long address)
 {
-   pgtable_free_tlb(tlb, page_address(table), 0);
+   pgtable_free_tlb(tlb, table, 0);
 }
 #endif /* _ASM_POWERPC_BOOK3S_32_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 32c33eccc0e2..47156b93f9af 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -329,7 +329,7 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
 #define pte_same(A,B)  (((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
 
 #define pmd_page_vaddr(pmd)\
-   ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
+   ((unsigned long)__va(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1)))
 #define pmd_page(pmd)  \
pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
 
@@ -346,7 +346,8 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
 #define pte_offset_kernel(dir, addr)   \
((pte_t *) pmd_page_vaddr(*(dir)) + pte_index(addr))
 #define pte_offset_map(dir, addr)  \
-   ((pte_t *) kmap_atomic(pmd_page(*(dir))) + pte_index(addr))
+   ((pte_t *)(kmap_atomic(pmd_page(*(dir))) + \
+  (pmd_page_vaddr(*(dir)) & ~PAGE_MASK)) + pte_index(addr))
 #define pte_unmap(pte) kunmap_atomic(pte)
 
 /*
diff --git 

[PATCH v8 19/20] powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers

2018-11-29 Thread Christophe Leroy
This patch reworks the TLB Miss handlers so that they no longer use the
r12 register, hence avoiding the need to save it into SPRN_SPRG_SCRATCH2.

In the DAR Fixup code we can now use SPRN_M_TW, freeing
SPRN_SPRG_SCRATCH2.

Then SPRN_SPRG_SCRATCH2 may be used for something else in the future.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 110 ++---
 1 file changed, 49 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 85fb4b8bf6c7..0a4f8a9c85ff 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -302,90 +302,87 @@ SystemCall:
  */
 
 #ifdef CONFIG_8xx_CPU15
-#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr) \
-   additmp, addr, PAGE_SIZE;   \
-   tlbie   tmp;\
-   additmp, addr, -PAGE_SIZE;  \
-   tlbie   tmp
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)  \
+   addiaddr, addr, PAGE_SIZE;  \
+   tlbie   addr;   \
+   addiaddr, addr, -(PAGE_SIZE << 1);  \
+   tlbie   addr;   \
+   addiaddr, addr, PAGE_SIZE
 #else
-#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr)
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)
 #endif
 
 InstructionTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
mtspr   SPRN_SPRG_SCRATCH1, r11
-#ifdef ITLB_MISS_KERNEL
-   mtspr   SPRN_SPRG_SCRATCH2, r12
 #endif
 
/* If we are faulting a kernel address, we have to use the
 * kernel page tables.
 */
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
-   INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
+   INVALIDATE_ADJACENT_PAGES_CPU15(r10)
mtspr   SPRN_MD_EPN, r10
/* Only modules will cause ITLB Misses as we always
 * pin the first 8MB of kernel memory */
 #ifdef ITLB_MISS_KERNEL
-   mfcrr12
+   mfcrr11
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
-   andis.  r11, r10, 0x8000/* Address >= 0x8000 */
+   cmpicr0, r10, 0 /* Address >= 0x8000 */
 #else
-   rlwinm  r11, r10, 16, 0xfff8
-   cmpli   cr0, r11, PAGE_OFFSET@h
+   rlwinm  r10, r10, 16, 0xfff8
+   cmpli   cr0, r10, PAGE_OFFSET@h
 #ifndef CONFIG_PIN_TLB_TEXT
/* It is assumed that kernel code fits into the first 8M page */
-0: cmpli   cr7, r11, (PAGE_OFFSET + 0x080)@h
+0: cmpli   cr7, r10, (PAGE_OFFSET + 0x080)@h
patch_site  0b, patch__itlbmiss_linmem_top
 #endif
 #endif
 #endif
-   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
+   mfspr   r10, SPRN_M_TWB /* Get level 1 table */
 #ifdef ITLB_MISS_KERNEL
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
-   beq+3f
+   bge+3f
 #else
blt+3f
 #endif
 #ifndef CONFIG_PIN_TLB_TEXT
blt cr7, ITLBMissLinear
 #endif
-   rlwinm  r11, r11, 0, 20, 31
-   orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
+   rlwinm  r10, r10, 0, 20, 31
+   orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
 #endif
-   lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the 
level 1 entry */
+   lwz r10, (swapper_pg_dir-PAGE_OFFSET)@l(r10)/* Get level 1 
entry */
+   mtspr   SPRN_MI_TWC, r10/* Set segment attributes */
 
-   mtspr   SPRN_MD_TWC, r11
+   mtspr   SPRN_MD_TWC, r10
mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
 #ifdef ITLB_MISS_KERNEL
-   mtcrr12
+   mtcrr11
 #endif
-   /* Load the MI_TWC with the attributes for this "segment." */
-   mtspr   SPRN_MI_TWC, r11/* Set segment attributes */
-
 #ifdef CONFIG_SWAP
rlwinm  r11, r10, 32-5, _PAGE_PRESENT
and r11, r11, r10
rlwimi  r10, r11, 0, _PAGE_PRESENT
 #endif
-   li  r11, RPN_PATTERN | 0x200
/* The Linux PTE won't go exactly into the MMU TLB.
 * Software indicator bits 20 and 23 must be clear.
 * Software indicator bits 22, 24, 25, 26, and 27 must be
 * set.  All other Linux PTE bits control the behavior
 * of the MMU.
 */
-   rlwimi  r11, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */
-   rlwimi  r10, r11, 0, 0x0ff0 /* Set 22, 24-27, clear 20,23 */
+   rlwimi  r10, r10, 0, 0x0f00 /* Clear bits 20-23 */
+   rlwimi  r10, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */
+   ori r10, r10, RPN_PATTERN | 0x200 /* Set 22 and 24-27 */
mtspr   SPRN_MI_RPN, r10/* Update TLB entry */
 
/* Restore registers */
 0: mfspr   r10, SPRN_SPRG_SCRATCH0
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
mfspr   r11, SPRN_SPRG_SCRATCH1
-#ifdef ITLB_MISS_KERNEL
-   mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
patch_site  0b, 

[PATCH v8 01/20] powerpc/book3s32: Remove CONFIG_BOOKE dependent code

2018-11-29 Thread Christophe Leroy
BOOK3S/32 cannot be BOOKE, so remove the useless code.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 18 --
 arch/powerpc/include/asm/book3s/32/pgtable.h | 14 --
 2 files changed, 32 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 82e44b1a00ae..eb8882c6dbb0 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -50,8 +50,6 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 #define __pmd_free_tlb(tlb,x,a)do { } while (0)
 /* #define pgd_populate(mm, pmd, pte)  BUG() */
 
-#ifndef CONFIG_BOOKE
-
 static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
   pte_t *pte)
 {
@@ -65,22 +63,6 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmdp,
 }
 
 #define pmd_pgtable(pmd) pmd_page(pmd)
-#else
-
-static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
-  pte_t *pte)
-{
-   *pmdp = __pmd((unsigned long)pte | _PMD_PRESENT);
-}
-
-static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
-   pgtable_t pte_page)
-{
-   *pmdp = __pmd((unsigned long)lowmem_page_address(pte_page) | 
_PMD_PRESENT);
-}
-
-#define pmd_pgtable(pmd) pmd_page(pmd)
-#endif
 
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
 extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index c21d33704633..32c33eccc0e2 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -328,24 +328,10 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
 #define __HAVE_ARCH_PTE_SAME
 #define pte_same(A,B)  (((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
 
-/*
- * Note that on Book E processors, the pmd contains the kernel virtual
- * (lowmem) address of the pte page.  The physical address is less useful
- * because everything runs with translation enabled (even the TLB miss
- * handler).  On everything else the pmd contains the physical address
- * of the pte page.  -- paulus
- */
-#ifndef CONFIG_BOOKE
 #define pmd_page_vaddr(pmd)\
((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
 #define pmd_page(pmd)  \
pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
-#else
-#define pmd_page_vaddr(pmd)\
-   ((unsigned long) (pmd_val(pmd) & PAGE_MASK))
-#define pmd_page(pmd)  \
-   pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
-#endif
 
 /* to find an entry in a kernel page-table-directory */
 #define pgd_offset_k(address) pgd_offset(_mm, address)
-- 
2.13.3



[PATCH v8 03/20] powerpc/mm: Move pte_fragment_alloc() to a common location

2018-11-29 Thread Christophe Leroy
In preparation for the next patch, which generalises the use of
pte_fragment_alloc() to all subarches, this patch moves the related
functions to a location common to all subarches.

The 8xx will need this to support 16k pages, as in that mode the
page tables still have a size of 4k.

Since pte_fragment with only one fragment is no different
from what is done in the general case, we can easily migrate all
subarches to pte fragments.
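
A stripped-down, single-threaded sketch of the fragment carving this is
built on (sizes hypothetical; the real get_pte_from_cache() below takes
mm->page_table_lock and refcounts the backing page):

#include <stdio.h>

#define PAGE_SIZE     4096UL
#define PTE_FRAG_SIZE 1024UL                      /* hypothetical */
#define PTE_FRAG_NR   (PAGE_SIZE / PTE_FRAG_SIZE) /* 4 tables per page */

static char page[PAGE_SIZE];   /* stands in for the allocated page */
static unsigned long frag_off; /* offset of the next free fragment */

static void *get_pte_fragment(void)
{
        void *ret;

        if (frag_off == PAGE_SIZE)
                return NULL; /* page consumed: real code allocates a new one */
        ret = page + frag_off;
        frag_off += PTE_FRAG_SIZE;
        return ret;
}

int main(void)
{
        for (int i = 0; i < PTE_FRAG_NR; i++)
                printf("fragment %d at offset %lu\n", i,
                       (unsigned long)((char *)get_pte_fragment() - page));
        return 0;
}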

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h |   1 +
 arch/powerpc/mm/Makefile |   4 +-
 arch/powerpc/mm/mmu_context_book3s64.c   |  15 
 arch/powerpc/mm/pgtable-book3s64.c   |  85 
 arch/powerpc/mm/pgtable-frag.c   | 116 +++
 5 files changed, 120 insertions(+), 101 deletions(-)
 create mode 100644 arch/powerpc/mm/pgtable-frag.c

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 391ed2c3b697..f949dd90af9b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -50,6 +50,7 @@ extern void pgtable_free_tlb(struct mmu_gather *tlb, void 
*table, int shift);
 #ifdef CONFIG_SMP
 extern void __tlb_remove_table(void *_table);
 #endif
+void pte_frag_destroy(void *pte_frag);
 
 static inline pgd_t *radix__pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index ca96e7be4d0e..3cbb1acf0745 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -15,7 +15,9 @@ obj-$(CONFIG_PPC_MMU_NOHASH)  += mmu_context_nohash.o 
tlb_nohash.o \
 obj-$(CONFIG_PPC_BOOK3E)   += tlb_low_$(BITS)e.o
 hash64-$(CONFIG_PPC_NATIVE):= hash_native_64.o
 obj-$(CONFIG_PPC_BOOK3E_64)   += pgtable-book3e.o
-obj-$(CONFIG_PPC_BOOK3S_64)+= pgtable-hash64.o hash_utils_64.o slb.o 
$(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o
+obj-$(CONFIG_PPC_BOOK3S_64)+= pgtable-hash64.o hash_utils_64.o slb.o \
+  $(hash64-y) mmu_context_book3s64.o \
+  pgtable-book3s64.o pgtable-frag.o
 obj-$(CONFIG_PPC_RADIX_MMU)+= pgtable-radix.o tlb-radix.o
 obj-$(CONFIG_PPC_STD_MMU_32)   += ppc_mmu_32.o hash_low_32.o 
mmu_context_hash32.o
 obj-$(CONFIG_PPC_STD_MMU)  += tlb_hash$(BITS).o
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
b/arch/powerpc/mm/mmu_context_book3s64.c
index 510f103d7813..f720c5cc0b5e 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -164,21 +164,6 @@ static void destroy_contexts(mm_context_t *ctx)
}
 }
 
-static void pte_frag_destroy(void *pte_frag)
-{
-   int count;
-   struct page *page;
-
-   page = virt_to_page(pte_frag);
-   /* drop all the pending references */
-   count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
-   /* We allow PTE_FRAG_NR fragments from a PTE page */
-   if (atomic_sub_and_test(PTE_FRAG_NR - count, >pt_frag_refcount)) {
-   pgtable_page_dtor(page);
-   __free_page(page);
-   }
-}
-
 static void pmd_frag_destroy(void *pmd_frag)
 {
int count;
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 9f93c9f985c5..0c0fd173208a 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -322,91 +322,6 @@ void pmd_fragment_free(unsigned long *pmd)
}
 }
 
-static pte_t *get_pte_from_cache(struct mm_struct *mm)
-{
-   void *pte_frag, *ret;
-
-   spin_lock(>page_table_lock);
-   ret = mm->context.pte_frag;
-   if (ret) {
-   pte_frag = ret + PTE_FRAG_SIZE;
-   /*
-* If we have taken up all the fragments mark PTE page NULL
-*/
-   if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
-   pte_frag = NULL;
-   mm->context.pte_frag = pte_frag;
-   }
-   spin_unlock(>page_table_lock);
-   return (pte_t *)ret;
-}
-
-static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
-{
-   void *ret = NULL;
-   struct page *page;
-
-   if (!kernel) {
-   page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
-   if (!page)
-   return NULL;
-   if (!pgtable_page_ctor(page)) {
-   __free_page(page);
-   return NULL;
-   }
-   } else {
-   page = alloc_page(PGALLOC_GFP);
-   if (!page)
-   return NULL;
-   }
-
-   atomic_set(>pt_frag_refcount, 1);
-
-   ret = page_address(page);
-   /*
-* if we support only one fragment just return the
-* allocated page.
-*/
-   if (PTE_FRAG_NR == 1)
-   return ret;
-   spin_lock(>page_table_lock);
-   /*

[PATCH v8 10/20] powerpc/mm: replace hugetlb_cache by PGT_CACHE(PTE_T_ORDER)

2018-11-29 Thread Christophe Leroy
Instead of open-coding cache handling for the special case
of hugepage tables having a single pte_t element, this
patch makes use of the common pgtable_cache helpers.
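
Why PGT_CACHE(PTE_T_ORDER) is the right cache: pgtable_cache_add(shift)
creates caches whose objects are sizeof(void *) << shift bytes, so
PTE_T_ORDER just computes the shift that yields objects of exactly
sizeof(pte_t). A small sketch of the arithmetic (assuming a 32-bit
kernel where pointers are 4 bytes):

#include <stdio.h>
#include <strings.h> /* ffs() */

int main(void)
{
        unsigned long ptr_size = 4; /* 32-bit kernel assumption */
        int pte_sizes[] = { 4, 8 }; /* pte_t without/with PTE_64BIT */

        for (int i = 0; i < 2; i++) {
                int order = ffs(pte_sizes[i]) - ffs((int)ptr_size);
                /* object size served by the cache at that index */
                printf("sizeof(pte_t)=%d -> order %d -> %lu-byte objects\n",
                       pte_sizes[i], order, ptr_size << order);
        }
        return 0;
}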

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hugetlb.h |  2 --
 arch/powerpc/mm/hugetlbpage.c  | 26 +++---
 2 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 98004262bc87..dfb8bf236586 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -5,8 +5,6 @@
 #ifdef CONFIG_HUGETLB_PAGE
 #include 
 
-extern struct kmem_cache *hugepte_cache;
-
 #ifdef CONFIG_PPC_BOOK3S_64
 
 #include 
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8cf035e68378..c4f1263228b8 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -42,6 +42,8 @@ EXPORT_SYMBOL(HPAGE_SHIFT);
 
 #define hugepd_none(hpd)   (hpd_val(hpd) == 0)
 
+#define PTE_T_ORDER(__builtin_ffs(sizeof(pte_t)) - 
__builtin_ffs(sizeof(void *)))
+
 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long 
sz)
 {
/*
@@ -61,7 +63,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t 
*hpdp,
int num_hugepd;
 
if (pshift >= pdshift) {
-   cachep = hugepte_cache;
+   cachep = PGT_CACHE(PTE_T_ORDER);
num_hugepd = 1 << (pshift - pdshift);
} else {
cachep = PGT_CACHE(pdshift - pshift);
@@ -264,7 +266,7 @@ static void hugepd_free_rcu_callback(struct rcu_head *head)
unsigned int i;
 
for (i = 0; i < batch->index; i++)
-   kmem_cache_free(hugepte_cache, batch->ptes[i]);
+   kmem_cache_free(PGT_CACHE(PTE_T_ORDER), batch->ptes[i]);
 
free_page((unsigned long)batch);
 }
@@ -277,7 +279,7 @@ static void hugepd_free(struct mmu_gather *tlb, void 
*hugepte)
 
if (atomic_read(>mm->mm_users) < 2 ||
mm_is_thread_local(tlb->mm)) {
-   kmem_cache_free(hugepte_cache, hugepte);
+   kmem_cache_free(PGT_CACHE(PTE_T_ORDER), hugepte);
put_cpu_var(hugepd_freelist_cur);
return;
}
@@ -652,7 +654,6 @@ static int __init hugepage_setup_sz(char *str)
 }
 __setup("hugepagesz=", hugepage_setup_sz);
 
-struct kmem_cache *hugepte_cache;
 static int __init hugetlbpage_init(void)
 {
int psize;
@@ -702,21 +703,8 @@ static int __init hugetlbpage_init(void)
if (pdshift > shift)
pgtable_cache_add(pdshift - shift, NULL);
 #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
-   else if (!hugepte_cache) {
-   /*
-* Create a kmem cache for hugeptes.  The bottom bits in
-* the pte have size information encoded in them, so
-* align them to allow this
-*/
-   hugepte_cache = kmem_cache_create("hugepte-cache",
- sizeof(pte_t),
- HUGEPD_SHIFT_MASK + 1,
- 0, NULL);
-   if (hugepte_cache == NULL)
-   panic("%s: Unable to create kmem cache "
- "for hugeptes\n", __func__);
-
-   }
+   else
+   pgtable_cache_add(PTE_T_ORDER, NULL);
 #endif
}
 
-- 
2.13.3



[PATCH v8 14/20] powerpc/8xx: Temporarily disable 16k pages and hugepages

2018-11-29 Thread Christophe Leroy
In preparation for making use of hardware assistance in the TLB handlers,
this patch temporarily disables 16K pages and hugepages. The reason
is that when using HW assistance in 4K pages mode, the Linux model
fits the HW model for 4K pages and 8M pages.

However, for 16K pages and 512k pages some additional work is needed
to make the Linux model fit the HW model.
The 8M pages will naturally come back when we switch to
HW assistance, without any additional handling.
In order to keep the following patch smaller, the current special
handling for 8M pages is removed here as well.

Therefore the 4K pages mode will be implemented first, without
support for 512k hugepages. Then the 512k hugepages will be brought
back, and the 16K pages will be implemented in the following step.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |  2 +-
 arch/powerpc/kernel/head_8xx.S | 74 +++---
 arch/powerpc/mm/tlb_nohash.c   |  6 
 3 files changed, 6 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8be31261aec8..ddfccdf004fe 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -689,7 +689,7 @@ config PPC_4K_PAGES
 
 config PPC_16K_PAGES
bool "16k page size"
-   depends on 44x || PPC_8xx
+   depends on 44x
 
 config PPC_64K_PAGES
bool "64k page size"
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index c203defe49a4..01f58b1d9ae7 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -314,7 +314,7 @@ SystemCall:
 InstructionTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
mtspr   SPRN_SPRG_SCRATCH1, r11
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mtspr   SPRN_SPRG_SCRATCH2, r12
 #endif
 
@@ -325,10 +325,8 @@ InstructionTLBMiss:
INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
/* Only modules will cause ITLB Misses as we always
 * pin the first 8MB of kernel memory */
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
-   mfcrr12
-#endif
 #ifdef ITLB_MISS_KERNEL
+   mfcrr12
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
andis.  r11, r10, 0x8000/* Address >= 0x8000 */
 #else
@@ -360,15 +358,9 @@ InstructionTLBMiss:
 
/* Extract level 2 index */
rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-#ifdef CONFIG_HUGETLB_PAGE
-   mtcrr11
-   bt- 28, 10f /* bit 28 = Large page (8M) */
-   bt- 29, 20f /* bit 29 = Large page (8M or 512k) */
-#endif
rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
lwz r10, 0(r10) /* Get the pte */
-4:
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mtcrr12
 #endif
/* Load the MI_TWC with the attributes for this "segment." */
@@ -393,7 +385,7 @@ InstructionTLBMiss:
/* Restore registers */
 0: mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
@@ -406,35 +398,12 @@ InstructionTLBMiss:
stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
 #endif
 
-#ifdef CONFIG_HUGETLB_PAGE
-10:/* 8M pages */
-#ifdef CONFIG_PPC_16K_PAGES
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT_8M - PAGE_SHIFT), 32 + PAGE_SHIFT_8M 
- (PAGE_SHIFT << 1), 29
-   /* Add level 2 base */
-   rlwimi  r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
-#else
-   /* Level 2 base */
-   rlwinm  r10, r11, 0, ~HUGEPD_SHIFT_MASK
-#endif
-   lwz r10, 0(r10) /* Get the pte */
-   b   4b
-
-20:/* 512k pages */
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT_512K - PAGE_SHIFT), 32 + 
PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
-   /* Add level 2 base */
-   rlwimi  r10, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
-   lwz r10, 0(r10) /* Get the pte */
-   b   4b
-#endif
-
. = 0x1200
 DataStoreTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
@@ -472,11 +441,6 @@ DataStoreTLBMiss:
 */
/* Extract level 2 index */
rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-#ifdef CONFIG_HUGETLB_PAGE
-   mtcrr11
-   bt- 28, 10f /* bit 28 = Large page (8M) */
-   bt- 29, 20f /* bit 29 = Large page (8M or 512k) */
-#endif
rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
   

Re: [PATCH 0/3] System call table generation support

2018-11-29 Thread Firoz Khan
Hi Satheesh,

Thanks for your email.

On Thu, 29 Nov 2018 at 12:05, Satheesh Rajendran
 wrote:
>
> On Fri, Sep 14, 2018 at 02:02:57PM +0530, Firoz Khan wrote:
> > The purpose of this patch series is:
> > 1. We can easily add/modify/delete a system call by changing its entry
> > in the syscall.tbl file. No need to manually edit many files.
> >
> > 2. It is easy to unify the system call implementation across all
> > the architectures.
> >
> > The system call tables are in a different format in each architecture,
> > and it is difficult to manually add or modify the system calls
> > in the respective files. To make this easy, we keep a script
> > which generates the header file and the syscall table file, so this
> > change will unify them across all architectures.
> >
> > syscall.tbl contains the list of available system calls along with
> > the system call number and corresponding entry point. Adding a new
> > system call in this architecture will be possible by adding a new
> > entry in the syscall.tbl file.
> >
> > Adding a new table entry consisting of:
> > - System call number.
> > - ABI.
> > - System call name.
> > - Entry point name.
> > - Compat entry name, if required.
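> >
> > For illustration, an entry could look like this (the number, names
> > and compat entry below are made up, not taken from the real table):
> >
> > 400   common  mynewcall       sys_mynewcall       compat_sys_mynewcall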
> >
> > The ARM, s390 and x86 architectures already have similar support. I
> > leveraged their implementation to come up with a generic solution.
> >
> > I have done the same work for alpha, m68k, microblaze,
> > ia64, mips, parisc, sh, sparc, and xtensa, but I started by sending
> > the patches for one architecture for review. The git repository
> > mentioned below contains more details.
> > Git repo:- https://github.com/frzkhn/system_call_table_generator/
> >
> > Finally, this is the groundwork for solving the Y2038 issue. We
> > need to add/change two dozen system calls to solve the Y2038 issue,
> > so this patch series will make it easy to migrate from the existing
> > system calls to Y2038-compatible system calls.
> >
> > I started working on system call table generation on 4.17-rc1. I used
> > Marcin's script - https://github.com/hrw/syscalls-table - to generate
> > the syscall.tbl file, which is the input to the system call table
> > generation script. A couple of system calls were added in the latest
> > rc release; running Marcin's script on the latest release would
> > generate a new syscall.tbl. I still use the old syscall.tbl, and once
> > the review is over I'll update syscall.tbl alone with respect to the
> > tip of the kernel. Until then, a few of the system calls won't work.
> >
> > Firoz Khan (3):
> >   powerpc: Replace NR_syscalls macro from asm/unistd.h
> >   powerpc: Add system call table generation support
> >   powerpc: uapi header and system call table file generation
> >
> >  arch/powerpc/Makefile   |   3 +
> >  arch/powerpc/include/asm/Kbuild |   3 +
> >  arch/powerpc/include/asm/unistd.h   |   3 +-
> >  arch/powerpc/include/uapi/asm/Kbuild|   2 +
> >  arch/powerpc/include/uapi/asm/unistd.h  | 391 
> > +---
> >  arch/powerpc/kernel/Makefile|   3 +-
> >  arch/powerpc/kernel/syscall_table_32.S  |   9 +
> >  arch/powerpc/kernel/syscall_table_64.S  |  17 ++
> >  arch/powerpc/kernel/syscalls/Makefile   |  51 
> >  arch/powerpc/kernel/syscalls/syscall_32.tbl | 378 
> > +++
> >  arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 
> > ++
> >  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
> >  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  38 +++
> >  arch/powerpc/kernel/systbl.S|  50 
> >  14 files changed, 916 insertions(+), 441 deletions(-)
> >  create mode 100644 arch/powerpc/kernel/syscall_table_32.S
> >  create mode 100644 arch/powerpc/kernel/syscall_table_64.S
> >  create mode 100644 arch/powerpc/kernel/syscalls/Makefile
> >  create mode 100644 arch/powerpc/kernel/syscalls/syscall_32.tbl
> >  create mode 100644 arch/powerpc/kernel/syscalls/syscall_64.tbl
> >  create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
> >  create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
> >  delete mode 100644 arch/powerpc/kernel/systbl.S
>
> Hi,
>
> This patch series failed to boot on an IBM Power8 box with the below base
> commit, built with ppc64le_defconfig:
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=merge=183cbf93be88d1a4fb572e27b1e08aa0ad85

I think you applied an old patch series. Could you please
perform the boot test on the powerpc v3 series I sent a few hours ago?

Thanks
Firoz

>
> Complete boot log attached.
>
>
> [1.577383] SGI XFS with ACLs, security attributes, no debug enabled
> [1.581550] Bad kernel stack pointer 6e69 at c0e2ceec
> [1.581558] Oops: Bad kernel stack pointer, sig: 6 [#1]
> [1.581562] LE SMP NR_CPUS=2048 NUMA PowerNV
> [1.581567] Modules linked in:
> [1.581572] CPU: 3 PID: 1937 

[PATCH v8 12/20] powerpc/mm: remove unnecessary test in pgtable_cache_init()

2018-11-29 Thread Christophe Leroy
pgtable_cache_add() gracefully handles the case when a cache of that
size already exists, by returning early with the following test:

if (PGT_CACHE(shift))
return; /* Already have a cache of this size */

It is therefore not necessary to test for the existence of the cache
beforehand.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/init-common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index b7ca03643d0b..1e6910eb70ed 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -111,13 +111,13 @@ void pgtable_cache_init(void)
 {
pgtable_cache_add(PGD_INDEX_SIZE);
 
-   if (PMD_CACHE_INDEX && !PGT_CACHE(PMD_CACHE_INDEX))
+   if (PMD_CACHE_INDEX)
pgtable_cache_add(PMD_CACHE_INDEX);
/*
 * In all current configs, when the PUD index exists it's the
 * same size as either the pgd or pmd index except with THP enabled
 * on book3s 64
 */
-   if (PUD_CACHE_INDEX && !PGT_CACHE(PUD_CACHE_INDEX))
+   if (PUD_CACHE_INDEX)
pgtable_cache_add(PUD_CACHE_INDEX);
 }
-- 
2.13.3



[PATCH v8 13/20] powerpc/8xx: Move SW perf counters in first 32kb of memory

2018-11-29 Thread Christophe Leroy
In order to simplify the time-critical exception handling that updates
the 8xx specific SW perf counters, this patch moves the counters to
the beginning of memory. This is possible because .text is readable
and the counters are never modified outside of the handlers.

By doing this, we avoid having to set a second register with
the upper part of the address of the counters.
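
The point is that a powerpc D-form access like "lwz rD, d(0)" encodes a
signed 16-bit displacement, with base register 0 meaning the literal
value zero, so any counter located in the first 32kb can be reached in
a single instruction. A quick sketch of the arithmetic (addresses here
are hypothetical):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint32_t page_offset = 0xc0000000u; /* hypothetical PAGE_OFFSET */
        uint32_t counter = 0xc0000120u;     /* counter linked in first 32kb */
        int32_t d = (int32_t)(counter - page_offset);

        /* fits the signed 16-bit displacement of "lwz rD, d(0)"? */
        printf("d = %d, fits = %d\n", d, d >= -32768 && d <= 32767);
        return 0;
}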

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 58 --
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 3b67b9533c82..c203defe49a4 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -106,6 +106,23 @@ turn_on_mmu:
mtspr   SPRN_SRR0,r0
rfi /* enables MMU */
 
+
+#ifdef CONFIG_PERF_EVENTS
+   .align  4
+
+   .globl  itlb_miss_counter
+itlb_miss_counter:
+   .space  4
+
+   .globl  dtlb_miss_counter
+dtlb_miss_counter:
+   .space  4
+
+   .globl  instruction_counter
+instruction_counter:
+   .space  4
+#endif
+
 /*
  * Exception entry code.  This code runs with address translation
  * turned off, i.e. using physical addresses.
@@ -384,17 +401,16 @@ InstructionTLBMiss:
 
 #ifdef CONFIG_PERF_EVENTS
patch_site  0f, patch__itlbmiss_perf
-0: lis r10, (itlb_miss_counter - PAGE_OFFSET)@ha
-   lwz r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
-   addir11, r11, 1
-   stw r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
-#endif
+0: lwz r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
+   addir10, r10, 1
+   stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
 #if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
+#endif
 
 #ifdef CONFIG_HUGETLB_PAGE
 10:/* 8M pages */
@@ -509,15 +525,14 @@ DataStoreTLBMiss:
 
 #ifdef CONFIG_PERF_EVENTS
patch_site  0f, patch__dtlbmiss_perf
-0: lis r10, (dtlb_miss_counter - PAGE_OFFSET)@ha
-   lwz r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10)
-   addir11, r11, 1
-   stw r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10)
-#endif
+0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
+   addir10, r10, 1
+   stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
mfspr   r12, SPRN_SPRG_SCRATCH2
rfi
+#endif
 
 #ifdef CONFIG_HUGETLB_PAGE
 10:/* 8M pages */
@@ -625,16 +640,13 @@ DataBreakpoint:
. = 0x1d00
 InstructionBreakpoint:
mtspr   SPRN_SPRG_SCRATCH0, r10
-   mtspr   SPRN_SPRG_SCRATCH1, r11
-   lis r10, (instruction_counter - PAGE_OFFSET)@ha
-   lwz r11, (instruction_counter - PAGE_OFFSET)@l(r10)
-   addir11, r11, -1
-   stw r11, (instruction_counter - PAGE_OFFSET)@l(r10)
+   lwz r10, (instruction_counter - PAGE_OFFSET)@l(0)
+   addir10, r10, -1
+   stw r10, (instruction_counter - PAGE_OFFSET)@l(0)
lis r10, 0x
ori r10, r10, 0x01
mtspr   SPRN_COUNTA, r10
mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
rfi
 #else
EXCEPTION(0x1d00, Trap_1d, unknown_exception, EXC_XFER_EE)
@@ -1065,17 +1077,3 @@ swapper_pg_dir:
  */
 abatron_pteptrs:
.space  8
-
-#ifdef CONFIG_PERF_EVENTS
-   .globl  itlb_miss_counter
-itlb_miss_counter:
-   .space  4
-
-   .globl  dtlb_miss_counter
-dtlb_miss_counter:
-   .space  4
-
-   .globl  instruction_counter
-instruction_counter:
-   .space  4
-#endif
-- 
2.13.3



[PATCH v8 15/20] powerpc/8xx: Use hardware assistance in TLB handlers

2018-11-29 Thread Christophe Leroy
Today, on the 8xx the TLB handlers do a SW tablewalk by doing all
the calculation in ASM, in order to match the Linux page
table structure.

The 8xx offers hardware assistance which allows a significant size
reduction of the TLB handlers, hence also reducing the time spent
in the handlers.

However, using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbytes area which is managed by a level 2 table (PTE) containing
also 1024 entries each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table
- 512k page PTEs have to be spread every 128 bytes in the L2 table
- 8M page PTEs are at the address pointed to by the L1 entry, and each
8M page requires 2 identical entries in the PGD.

This patch modifies the TLB handlers to use HW assistance for 4K PAGES.

Before this patch, the mean time spent in the TLB miss handlers is:
- ITLB miss: 80 ticks
- DTLB miss: 62 ticks
After this patch, the mean time spent in the TLB miss handlers is:
- ITLB miss: 72 ticks
- DTLB miss: 54 ticks
So the improvement is 10% for ITLB misses and 13% for DTLB misses.
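
Rendered in C, the new lookup sequence is essentially the following
(sketch only: the real code is asm running with translation off, and
SPRN_MD_EPN has already been loaded with the faulting address):

#include <asm/reg.h>

static inline unsigned long hw_assisted_pte(unsigned long l1_entry)
{
        /* feed the level 1 (PGD) entry to the HW tablewalk assist */
        mtspr(SPRN_MD_TWC, l1_entry);
        /* the MMU combines it with MD_EPN and hands back the PTE address */
        return *(unsigned long *)mfspr(SPRN_MD_TWC);
}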

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 58 +-
 arch/powerpc/mm/8xx_mmu.c  |  4 +--
 2 files changed, 26 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 01f58b1d9ae7..85fb4b8bf6c7 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -292,7 +292,7 @@ SystemCall:
. = 0x1100
 /*
  * For the MPC8xx, this is a software tablewalk to load the instruction
- * TLB.  The task switch loads the M_TW register with the pointer to the first
+ * TLB.  The task switch loads the M_TWB register with the pointer to the first
  * level table.
  * If we discover there is no second level table (value is zero) or if there
  * is an invalid pte, we load that into the TLB, which causes another fault
@@ -323,6 +323,7 @@ InstructionTLBMiss:
 */
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
+   mtspr   SPRN_MD_EPN, r10
/* Only modules will cause ITLB Misses as we always
 * pin the first 8MB of kernel memory */
 #ifdef ITLB_MISS_KERNEL
@@ -339,7 +340,7 @@ InstructionTLBMiss:
 #endif
 #endif
 #endif
-   mfspr   r11, SPRN_M_TW  /* Get level 1 table */
+   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
 #ifdef ITLB_MISS_KERNEL
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
beq+3f
@@ -349,16 +350,14 @@ InstructionTLBMiss:
 #ifndef CONFIG_PIN_TLB_TEXT
blt cr7, ITLBMissLinear
 #endif
-   lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+   rlwinm  r11, r11, 0, 20, 31
+   orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
 #endif
-   /* Insert level 1 index */
-   rlwimi  r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 
29
lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the 
level 1 entry */
 
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-   rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
+   mtspr   SPRN_MD_TWC, r11
+   mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
 #ifdef ITLB_MISS_KERNEL
mtcrr12
@@ -417,7 +416,7 @@ DataStoreTLBMiss:
mfspr   r10, SPRN_MD_EPN
rlwinm  r11, r10, 16, 0xfff8
cmpli   cr0, r11, PAGE_OFFSET@h
-   mfspr   r11, SPRN_M_TW  /* Get level 1 table */
+   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
blt+3f
rlwinm  r11, r10, 16, 0xfff8
 #ifndef CONFIG_PIN_TLB_IMMR
@@ -430,20 +429,16 @@ DataStoreTLBMiss:
patch_site  0b, patch__dtlbmiss_immr_jmp
 #endif
blt cr7, DTLBMissLinear
-   lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
+   rlwinm  r11, r11, 0, 20, 31
+   orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
-
-   /* Insert level 1 index */
-   rlwimi  r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 
29
lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the 
level 1 entry */
 
-   /* We have a pte table, so load fetch the pte from the table.
-*/
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-   rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
+   mtspr   SPRN_MD_TWC, r11
+   mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
-4:
+
mtcrr12
 
/* Insert the Guarded flag into the TWC from the Linux PTE.
@@ -668,9 +663,10 @@ FixupDAR:/* Entry point for dcbx workaround. */
mtspr   

[PATCH v8 18/20] powerpc/8xx: reintroduce 16K pages with HW assistance

2018-11-29 Thread Christophe Leroy
Using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbytes area which is managed by a level 2 table (PTE) containing
also 1024 entries each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table
- 512k page PTEs have to be spread every 128 bytes in the L2 table
- 8M page PTEs are at the address pointed to by the L1 entry, and each
8M page requires 2 identical entries in the PGD.

In order to use hardware assistance with 16K pages, this patch makes
the following modifications:
- Make the PGD size independent of the main page size
- In 16k pages mode, redefine pte_t as a struct with 4 elements,
and populate those 4 elements in __set_pte_at() and pte_update()
- Adapt the size of the hugepage tables.
- Define PTE_FRAG_NR so that a 16k page contains 4 page tables.
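
The effect of the 4-element pte_t, as a standalone sketch (types
simplified; the real definitions are in the pgtable-types.h hunk below):

#include <stdio.h>

typedef unsigned long pte_basic_t;
/* a 16k linux page covers 4 consecutive 4k HW entries, kept identical */
typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;

static void set_pte(pte_t *p, pte_basic_t val)
{
        /* one linux PTE write populates all 4 HW-visible slots */
        p->pte = p->pte1 = p->pte2 = p->pte3 = val;
}

int main(void)
{
        pte_t e;

        set_pte(&e, 0x1234);
        printf("%lx %lx %lx %lx\n", e.pte, e.pte1, e.pte2, e.pte3);
        return 0;
}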

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig |  2 +-
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h |  1 +
 arch/powerpc/include/asm/nohash/32/pgtable.h | 19 ++-
 arch/powerpc/include/asm/nohash/pgtable.h|  4 
 arch/powerpc/include/asm/pgtable-types.h |  4 
 5 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ddfccdf004fe..8be31261aec8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -689,7 +689,7 @@ config PPC_4K_PAGES
 
 config PPC_16K_PAGES
bool "16k page size"
-   depends on 44x
+   depends on 44x || PPC_8xx
 
 config PPC_64K_PAGES
bool "64k page size"
diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index fa05aa566ece..25f05131afd5 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -190,6 +190,7 @@ typedef struct {
struct slice_mask mask_8m;
 # endif
 #endif
+   void *pte_frag;
 } mm_context_t;
 
 #define PHYS_IMMR_BASE (mfspr(SPRN_IMMR) & 0xfff8)
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index 31a03e9a42c4..e3e81b078432 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -19,7 +19,14 @@ extern int icache_44x_need_flush;
 
 #endif /* __ASSEMBLY__ */
 
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+#define PTE_INDEX_SIZE  (PTE_SHIFT - 2)
+#define PTE_FRAG_NR4
+#define PTE_FRAG_SIZE_SHIFT12
+#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
+#else
 #define PTE_INDEX_SIZE PTE_SHIFT
+#endif
 
 #define PMD_INDEX_SIZE 0
 #define PUD_INDEX_SIZE 0
@@ -49,7 +56,11 @@ extern int icache_44x_need_flush;
  * -Matt
  */
 /* PGDIR_SHIFT determines what a top-level page table entry can map */
+#ifdef CONFIG_PPC_8xx
+#define PGDIR_SHIFT22
+#else
 #define PGDIR_SHIFT(PAGE_SHIFT + PTE_INDEX_SIZE)
+#endif
 #define PGDIR_SIZE (1UL << PGDIR_SHIFT)
 #define PGDIR_MASK (~(PGDIR_SIZE-1))
 
@@ -233,7 +244,13 @@ static inline unsigned long pte_update(pte_t *p,
: "cc" );
 #else /* PTE_ATOMIC_UPDATES */
unsigned long old = pte_val(*p);
-   *p = __pte((old & ~clr) | set);
+   unsigned long new = (old & ~clr) | set;
+
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+   p->pte = p->pte1 = p->pte2 = p->pte3 = new;
+#else
+   *p = __pte(new);
+#endif
 #endif /* !PTE_ATOMIC_UPDATES */
 
 #ifdef CONFIG_44x
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 70ff23974b59..1ca1c1864b32 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -209,7 +209,11 @@ static inline void __set_pte_at(struct mm_struct *mm, 
unsigned long addr,
/* Anything else just stores the PTE normally. That covers all 64-bit
 * cases, and 32-bit non-hash with 32-bit PTEs.
 */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+   ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
+#else
*ptep = pte;
+#endif
 
/*
 * With hardware tablewalk, a sync is needed to ensure that
diff --git a/arch/powerpc/include/asm/pgtable-types.h 
b/arch/powerpc/include/asm/pgtable-types.h
index eccb30b38b47..3b0edf041b2e 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -3,7 +3,11 @@
 #define _ASM_POWERPC_PGTABLE_TYPES_H
 
 /* PTE level */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
+#else
 typedef struct { pte_basic_t pte; } pte_t;
+#endif
 #define __pte(x)   ((pte_t) { (x) })
 static inline pte_basic_t pte_val(pte_t x)
 {
-- 
2.13.3



[PATCH v8 00/20] Implement use of HW assistance on TLB table walk on 8xx

2018-11-29 Thread Christophe Leroy
The purpose of this series is to implement hardware assistance for the TLB
table walk on the 8xx.

The first part prepares for using HW assistance in the TLB routines:
- Trivial fixes:
- Remove CONFIG_BOOKE stuff from book3S headers.
- Remove the unneeded atomic PTE update requirement for the 8xx.
- Move the book3s64 page fragment code into a common part so it can be
reused by the 8xx, as 16k page size mode still uses 4k page tables.
- Fix a bug in memcache handling when standard pages and hugepages share
caches of the same size (see original discussion in
https://patchwork.ozlabs.org/patch/957565/)
- Optimise access to the 8xx perf counters (hence reducing the number of registers used)

The second part implements HW assistance in the TLB routines in the following steps:
- Disable 16k page size mode and 512k hugepages
- Switch 4k to HW assistance
- Bring back 512k hugepages
- Bring back 16k page size mode.

The last part cleans up:
- Take advantage of the miss handler size reduction to regroup related parts
- Reduce the number of registers used in the miss handlers, freeing them for future use.

Tested successfully on 8xx and 83xx (book3s/32)

Changes in v8:
 - Moved definitions in pgalloc.h to avoid conflicting with memcache patches.
 - Included the memcache bugfix series in this series to avoid conflicts between 
the two series when coming to the 512k pages patch.
 - In the 512k HW assistance patch, reduced the #ifdef mess by using 
IS_ENABLED(CONFIG_PPC_8xx) instead.

Changes in v7:
 - Reordered to get trivial and already reviewed patches in front.
 - Reordered to regroup all HW assistance related patches together.
 - Rebased on today merge branch (28 Nov)
 - Added a helper for access to mm_context_t.frag
 - Reduced the amount of changes in PPC32 to support pte_fragment
 - Applied pte_fragment to both nohash/32 and book3s/32

Changes in v6:
 - Dropped the part related to handling the GUARD attribute at PGD/PMD level.
 - Moved the commonalisation of page_fragment to the beginning (this part has 
been reviewed by Aneesh)
 - Rebased on today merge branch (19 Oct)

Changes in v5:
 - Also avoid useless lock in get_pmd_from_cache()
 - A new patch to relocate mmu headers in platform specific directories
 - A new patch to distribute pgtable_t typedefs in platform specific
   mmu headers instead of the ugly #ifdef
 - Moved early_pte_alloc_kernel() in platform specific pgalloc
 - Restricted definition of PTE_FRAG_SIZE and PTE_FRAG_NR to platforms
   using the pte fragmentation.
 - arch_exit_mmap() and destroy_pagetable_cache() are now platform specific.

Changes in v4:
 - Reordered the serie to put at the end the modifications which makes
   L1 and L2 entries independant.
 - No modifications to ppc64 ioremap (we still have an opportunity to
   merge them, for a future patch series)
 - 8xx code modified to use patch_site instead of patch_instruction
   to get a clearer code and avoid object pollution with global symbols
 - Moved perf counters in first 32kb of memory to optimise access
 - Split the big bang to HW assistance in several steps:
   1. Temporarily removes support of 16k pages and 512k hugepages
   2. Change TLB routines to use HW assistance for 4k pages and 8M hugepages
   3. Add back support for 512k hugepages
   4. Add back support for 16k pages (using pte_fragment as page tables are 
still 4k)

Changes in v3:
 - Fixed an issue in the 09/14 when CONFIG_PIN_TLB_TEXT was not enabled
 - Added performance measurement in the 09/14 commit log
 - Rebased on latest 'powerpc/merge' tree, which conflicted with 13/14

Changes in v2:
 - Removed the 3 first patchs which have been applied already
 - Fixed compilation errors reported by Michael
 - Squashed the commonalisation of ioremap functions into a single patch
 - Fixed the use of pte_fragment
 - Added a patch optimising perf counting of TLB misses and instructions

Christophe Leroy (20):
  powerpc/book3s32: Remove CONFIG_BOOKE dependent code
  powerpc/8xx: Remove PTE_ATOMIC_UPDATES
  powerpc/mm: Move pte_fragment_alloc() to a common location
  powerpc/mm: Avoid useless lock with single page fragments
  powerpc/mm: move platform specific mmu-xxx.h in platform directories
  powerpc/mm: Move pgtable_t into platform headers
  powerpc/mm: add helpers to get/set mm.context->pte_frag
  powerpc/mm: Extend pte_fragment functionality to PPC32
  powerpc/mm: enable the use of page table cache of order 0
  powerpc/mm: replace hugetlb_cache by PGT_CACHE(PTE_T_ORDER)
  powerpc/mm: fix a warning when a cache is common to PGD and hugepages
  powerpc/mm: remove unnecessary test in pgtable_cache_init()
  powerpc/8xx: Move SW perf counters in first 32kb of memory
  powerpc/8xx: Temporarily disable 16k pages and hugepages
  powerpc/8xx: Use hardware assistance in TLB handlers
  powerpc/8xx: Enable 8M hugepage support with HW assistance
  powerpc/8xx: Enable 512k hugepage support with HW assistance
  powerpc/8xx: reintroduce 16K pages with HW assistance
  powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers
  

[PATCH v8 16/20] powerpc/8xx: Enable 8M hugepage support with HW assistance

2018-11-29 Thread Christophe Leroy
HW assistance naturally supports 8M huge pages without
further modifications.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/tlb_nohash.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 4f79639e432f..8ad7aab150b7 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -97,6 +97,9 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
.shift  = 14,
},
 #endif
+   [MMU_PAGE_8M] = {
+   .shift  = 23,
+   },
 };
 #else
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
-- 
2.13.3



Re: [PATCH v3 0/4] powerpc: system call table generation support

2018-11-29 Thread Firoz Khan
++ sathn...@linux.vnet.ibm.com

On Thu, 29 Nov 2018 at 09:57, Firoz Khan  wrote:
>
> The purpose of this patch series is that we can easily
> add/modify/delete system call table support by changing
> an entry in the syscall.tbl file instead of manually
> changing many files. The other goal is to unify the
> system call table generation support implementation
> across all the architectures.
>
> The system call tables are in a different format in
> each architecture. It is difficult to manually add,
> modify or delete the system calls in the respective
> files. To make it easy, we keep a script which
> generates the uapi header file and the syscall
> table file.
>
> syscall.tbl contains the list of available system
> calls along with the system call number and the
> corresponding entry point. Adding a new system call
> in this architecture will be possible by adding a
> new entry in the syscall.tbl file.
>
> Adding a new table entry consisting of:
> - System call number.
> - ABI.
> - System call name.
> - Entry point name.
> - Compat entry name, if required.
> - spu entry name, if required.
>
> The ARM, s390 and x86 architectures already have
> similar support. I leveraged their implementation to
> come up with a generic solution.
>
> I have done the same work for alpha,
> ia64, m68k, microblaze, mips, parisc, sh, sparc,
> and xtensa. The git repository mentioned below
> contains more details about the workflow.
>
> https://github.com/frzkhn/system_call_table_generator/
>
> Finally, this is the groundwork for solving the Y2038
> issue. We need to add two dozen system calls to
> solve the Y2038 issue, so this patch series will help
> to add new system calls easily by adding a new entry
> in the syscall.tbl.
>
> changes since v2:
>  - modified/optimized the syscall.tbl to avoid duplicate
>for the spu entries.
>  - updated the syscalltbl.sh to meet the above point.
>
> changes since v1:
>  - optimized/updated the syscall table generation
>scripts.
>  - fixed all mixed indentation issues in syscall.tbl.
>  - added "comments" in syscall_*.tbl.
>  - changed from generic-y to generated-y in Kbuild.
>
> Firoz Khan (4):
>   powerpc: add __NR_syscalls along with NR_syscalls
>   powerpc: move macro definition from asm/systbl.h
>   powerpc: add system call table generation support
>   powerpc: generate uapi header and system call table files
>
>  arch/powerpc/Makefile   |   3 +
>  arch/powerpc/include/asm/Kbuild |   4 +
>  arch/powerpc/include/asm/systbl.h   | 396 --
>  arch/powerpc/include/asm/unistd.h   |   3 +-
>  arch/powerpc/include/uapi/asm/Kbuild|   2 +
>  arch/powerpc/include/uapi/asm/unistd.h  | 389 +
>  arch/powerpc/kernel/Makefile|  10 -
>  arch/powerpc/kernel/syscalls/Makefile   |  63 
>  arch/powerpc/kernel/syscalls/syscall.tbl| 427 
> 
>  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  36 +++
>  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  36 +++
>  arch/powerpc/kernel/systbl.S|  37 +--
>  arch/powerpc/kernel/systbl_chk.c|  60 
>  arch/powerpc/platforms/cell/spu_callbacks.c |  17 +-
>  14 files changed, 591 insertions(+), 892 deletions(-)
>  delete mode 100644 arch/powerpc/include/asm/systbl.h
>  create mode 100644 arch/powerpc/kernel/syscalls/Makefile
>  create mode 100644 arch/powerpc/kernel/syscalls/syscall.tbl
>  create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
>  create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
>  delete mode 100644 arch/powerpc/kernel/systbl_chk.c
>
> --
> 1.9.1
>


[PATCH v8 02/20] powerpc/8xx: Remove PTE_ATOMIC_UPDATES

2018-11-29 Thread Christophe Leroy
commit 1bc54c03117b9 ("powerpc: rework 4xx PTE access and TLB miss")
introduced non-atomic PTE updates and started the work of removing
PTE updates from the TLB miss handlers, but kept PTE_ATOMIC_UPDATES for the
8xx with the following comment:
/* Until my rework is finished, 8xx still needs atomic PTE updates */

commit fe11dc3f9628e ("powerpc/8xx: Update TLB asm so it behaves as
linux mm expects") removed all PTE updates done in the TLB miss handlers.

Therefore, atomic PTE updates are no longer needed for the 8xx.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pte-8xx.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h 
b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index 6bfe041ef59d..c9e4b2d90f65 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -65,9 +65,6 @@
 
 #define _PTE_NONE_MASK 0
 
-/* Until my rework is finished, 8xx still needs atomic PTE updates */
-#define PTE_ATOMIC_UPDATES 1
-
 #ifdef CONFIG_PPC_16K_PAGES
 #define _PAGE_PSIZE_PAGE_SPS
 #else
-- 
2.13.3



[PATCH v8 04/20] powerpc/mm: Avoid useless lock with single page fragments

2018-11-29 Thread Christophe Leroy
There is no point in taking the page table lock as pte_frag or
pmd_frag are always NULL when we have only one fragment.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable-book3s64.c | 3 +++
 arch/powerpc/mm/pgtable-frag.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 0c0fd173208a..f3c31f5e1026 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -244,6 +244,9 @@ static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
 {
void *pmd_frag, *ret;
 
+   if (PMD_FRAG_NR == 1)
+   return NULL;
+
spin_lock(>page_table_lock);
ret = mm->context.pmd_frag;
if (ret) {
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index d61e7c2a9a79..7544d0d7177d 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -34,6 +34,9 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 {
void *pte_frag, *ret;
 
+   if (PTE_FRAG_NR == 1)
+   return NULL;
+
spin_lock(>page_table_lock);
ret = mm->context.pte_frag;
if (ret) {
-- 
2.13.3



[PATCH v8 07/20] powerpc/mm: add helpers to get/set mm.context->pte_frag

2018-11-29 Thread Christophe Leroy
In order to handle the pte_fragment functions with a single fragment
without adding pte_frag to every mm_context_t, this patch creates
two helpers which do nothing on platforms using a single fragment.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/pgtable.h | 31 +++
 arch/powerpc/mm/pgtable-frag.c |  8 
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 9679b7519a35..734df2210749 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -110,6 +110,37 @@ void mark_initmem_nx(void);
 static inline void mark_initmem_nx(void) { }
 #endif
 
+/*
+ * When used, PTE_FRAG_NR is defined in subarch pgtable.h
+ * so we are sure it is included when arriving here.
+ */
+#ifndef PTE_FRAG_NR
+#define PTE_FRAG_NR1
+#define PTE_FRAG_SIZE_SHIFTPAGE_SHIFT
+#define PTE_FRAG_SIZE  (1UL << PTE_FRAG_SIZE_SHIFT)
+#endif
+
+#if PTE_FRAG_NR != 1
+static inline void *pte_frag_get(mm_context_t *ctx)
+{
+   return ctx->pte_frag;
+}
+
+static inline void pte_frag_set(mm_context_t *ctx, void *p)
+{
+   ctx->pte_frag = p;
+}
+#else
+static inline void *pte_frag_get(mm_context_t *ctx)
+{
+   return NULL;
+}
+
+static inline void pte_frag_set(mm_context_t *ctx, void *p)
+{
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 7544d0d7177d..af23a587f019 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -38,7 +38,7 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
return NULL;
 
spin_lock(&mm->page_table_lock);
-   ret = mm->context.pte_frag;
+   ret = pte_frag_get(&mm->context);
if (ret) {
pte_frag = ret + PTE_FRAG_SIZE;
/*
@@ -46,7 +46,7 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 */
if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
pte_frag = NULL;
-   mm->context.pte_frag = pte_frag;
+   pte_frag_set(&mm->context, pte_frag);
}
spin_unlock(&mm->page_table_lock);
return (pte_t *)ret;
@@ -86,9 +86,9 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 * the allocated page with single fragement
 * count.
 */
-   if (likely(!mm->context.pte_frag)) {
+   if (likely(!pte_frag_get(&mm->context))) {
atomic_set(&page->pt_frag_refcount, PTE_FRAG_NR);
-   mm->context.pte_frag = ret + PTE_FRAG_SIZE;
+   pte_frag_set(&mm->context, ret + PTE_FRAG_SIZE);
}
spin_unlock(&mm->page_table_lock);
 
-- 
2.13.3



[PATCH v8 09/20] powerpc/mm: enable the use of page table cache of order 0

2018-11-29 Thread Christophe Leroy
hugepages use a cache of order 0. Let's allow page tables
of order 0 in the common part in order to avoid open coding
in hugetlb.
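
As a hypothetical illustration of what this enables, using the names
from the diff below (the second argument of pgtable_cache_add() still
exists at this point in the series):

	pgtable_cache_add(0, NULL);	/* previously hit BUG_ON(!shift) */
	obj = kmem_cache_alloc(PGT_CACHE(0), GFP_KERNEL); /* pgtable_cache[0] */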

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 5 +
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 5 +
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 5 +
 arch/powerpc/include/asm/nohash/64/pgalloc.h | 5 +
 arch/powerpc/mm/init-common.c| 6 +++---
 5 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 0f58e5b9dbe7..b5b955eb2fb7 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -25,10 +25,7 @@
 extern void __bad_pte(pmd_t *pmd);
 
 extern struct kmem_cache *pgtable_cache[];
-#define PGT_CACHE(shift) ({\
-   BUG_ON(!(shift));   \
-   pgtable_cache[(shift) - 1]; \
-   })
+#define PGT_CACHE(shift) pgtable_cache[shift]
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index f949dd90af9b..4aba625389c4 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -37,10 +37,7 @@ extern struct vmemmap_backing *vmemmap_list;
 #define MAX_PGTABLE_INDEX_SIZE 0xf
 
 extern struct kmem_cache *pgtable_cache[];
-#define PGT_CACHE(shift) ({\
-   BUG_ON(!(shift));   \
-   pgtable_cache[(shift) - 1]; \
-   })
+#define PGT_CACHE(shift) pgtable_cache[shift]
 
 extern pte_t *pte_fragment_alloc(struct mm_struct *, unsigned long, int);
 extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long);
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 7e234582dce5..17963951bdb0 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -25,10 +25,7 @@
 extern void __bad_pte(pmd_t *pmd);
 
 extern struct kmem_cache *pgtable_cache[];
-#define PGT_CACHE(shift) ({\
-   BUG_ON(!(shift));   \
-   pgtable_cache[(shift) - 1]; \
-   })
+#define PGT_CACHE(shift) pgtable_cache[shift]
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index e2d62d033708..e95eb499a174 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -36,10 +36,7 @@ extern struct vmemmap_backing *vmemmap_list;
 #define MAX_PGTABLE_INDEX_SIZE 0xf
 
 extern struct kmem_cache *pgtable_cache[];
-#define PGT_CACHE(shift) ({\
-   BUG_ON(!(shift));   \
-   pgtable_cache[(shift) - 1]; \
-   })
+#define PGT_CACHE(shift) pgtable_cache[shift]
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 2b656e67f2ea..41190f2b60c2 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -40,7 +40,7 @@ static void pmd_ctor(void *addr)
memset(addr, 0, PMD_TABLE_SIZE);
 }
 
-struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
+struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE + 1];
 EXPORT_SYMBOL_GPL(pgtable_cache);  /* used by kvm_hv module */
 
 /*
@@ -71,7 +71,7 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
 * moment, gcc doesn't seem to recognize is_power_of_2 as a
 * constant expression, so so much for that. */
BUG_ON(!is_power_of_2(minalign));
-   BUG_ON((shift < 1) || (shift > MAX_PGTABLE_INDEX_SIZE));
+   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
 
if (PGT_CACHE(shift))
return; /* Already have a cache of this size */
@@ -83,7 +83,7 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
panic("Could not allocate pgtable cache for order %d", shift);
 
kfree(name);
-   pgtable_cache[shift - 1] = new;
+   pgtable_cache[shift] = new;
 
pr_debug("Allocated pgtable cache for order %d\n", shift);
 }
-- 
2.13.3



[PATCH v8 11/20] powerpc/mm: fix a warning when a cache is common to PGD and hugepages

2018-11-29 Thread Christophe Leroy
While implementing TLB miss HW assistance on the 8xx, the following
warning was encountered:

[  423.732965] WARNING: CPU: 0 PID: 345 at mm/slub.c:2412 
___slab_alloc.constprop.30+0x26c/0x46c
[  423.733033] CPU: 0 PID: 345 Comm: mmap Not tainted 
4.18.0-rc8-00664-g2dfff9121c55 #671
[  423.733075] NIP:  c0108f90 LR: c0109ad0 CTR: 0004
[  423.733121] REGS: c455bba0 TRAP: 0700   Not tainted  
(4.18.0-rc8-00664-g2dfff9121c55)
[  423.733147] MSR:  00021032   CR: 24224848  XER: 2000
[  423.733319]
[  423.733319] GPR00: c0109ad0 c455bc50 c4521910 c60053c0 007080c0 c0011b34 
c7fa41e0 c455be30
[  423.733319] GPR08: 0001 c00103a0 c7fa41e0 c49afcc4 24282842 10018840 
c079b37c 0040
[  423.733319] GPR16: 73f0 00210d00  0001 c455a000 0100 
0200 c455a000
[  423.733319] GPR24: c60053c0 c0011b34 007080c0 c455a000 c455a000 c7fa41e0 
 9032
[  423.734190] NIP [c0108f90] ___slab_alloc.constprop.30+0x26c/0x46c
[  423.734257] LR [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734283] Call Trace:
[  423.734326] [c455bc50] [0100] 0x100 (unreliable)
[  423.734430] [c455bcc0] [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734543] [c455bcf0] [c0011b34] huge_pte_alloc+0xc0/0x1dc
[  423.734633] [c455bd20] [c01044dc] hugetlb_fault+0x408/0x48c
[  423.734720] [c455bdb0] [c0104b20] follow_hugetlb_page+0x14c/0x44c
[  423.734826] [c455be10] [c00e8e54] __get_user_pages+0x1c4/0x3dc
[  423.734919] [c455be80] [c00e9924] __mm_populate+0xac/0x140
[  423.735020] [c455bec0] [c00db14c] vm_mmap_pgoff+0xb4/0xb8
[  423.735127] [c455bf00] [c00f27c0] ksys_mmap_pgoff+0xcc/0x1fc
[  423.735222] [c455bf40] [c000e0f8] ret_from_syscall+0x0/0x38
[  423.735271] Instruction dump:
[  423.735321] 7cbf482e 38fd0008 7fa6eb78 7fc4f378 4bfff5dd 7fe3fb78 4bfffe24 
81370010
[  423.735536] 71280004 41a2ff88 4840c571 4b80 <0fe0> 4bfffeb8 81340010 
712a0004
[  423.735757] ---[ end trace e9b222919a470790 ]---

This warning occurs when calling kmem_cache_zalloc() on a
cache having a constructor.
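
For reference, the slab allocator refuses __GFP_ZERO on a cache that has
a constructor; the check behind the stack trace above is roughly of this
shape (paraphrased from mm/slub.c of that era, not the exact code):

	if (unlikely(gfpflags & __GFP_ZERO) && WARN_ON_ONCE(s->ctor))
		gfpflags &= ~__GFP_ZERO; /* zeroing would wipe the ctor's work */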

In this case it happens because the PGD cache and the 512k hugepte
cache are the same size (4k). While a cache with a constructor is
created for the PGD, hugepages create their cache without a constructor
and use kmem_cache_zalloc(). As both expect a cache of the same size,
hugepages reuse the cache created for the PGD, hence the conflict.

In order to avoid this conflict, this patch:
- modifies pgtable_cache_add() so that a zeroising constructor is
added for any cache size.
- replaces calls to kmem_cache_zalloc() with kmem_cache_alloc().

Signed-off-by: Christophe Leroy 
---
see original discussion in https://patchwork.ozlabs.org/patch/957565/

 arch/powerpc/include/asm/pgtable.h |  2 +-
 arch/powerpc/mm/hugetlbpage.c  |  6 ++---
 arch/powerpc/mm/init-common.c  | 46 ++
 3 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 734df2210749..74810bba45d2 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -101,7 +101,7 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 /* can we use this in kvm */
 unsigned long vmalloc_to_phys(void *vmalloc_addr);
 
-void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
+void pgtable_cache_add(unsigned int shift);
 void pgtable_cache_init(void);
 
 #if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_PPC32)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index c4f1263228b8..bc97874d7c74 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -70,7 +70,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
num_hugepd = 1;
}
 
-   new = kmem_cache_zalloc(cachep, pgtable_gfp_flags(mm, GFP_KERNEL));
+   new = kmem_cache_alloc(cachep, pgtable_gfp_flags(mm, GFP_KERNEL));
 
BUG_ON(pshift > HUGEPD_SHIFT_MASK);
BUG_ON((unsigned long)new & HUGEPD_SHIFT_MASK);
@@ -701,10 +701,10 @@ static int __init hugetlbpage_init(void)
 * use pgt cache for hugepd.
 */
if (pdshift > shift)
-   pgtable_cache_add(pdshift - shift, NULL);
+   pgtable_cache_add(pdshift - shift);
 #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
else
-   pgtable_cache_add(PTE_T_ORDER, NULL);
+   pgtable_cache_add(PTE_T_ORDER);
 #endif
}
 
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 41190f2b60c2..b7ca03643d0b 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -25,19 +25,37 @@
 #include 
 #include 
 
-static void pgd_ctor(void *addr)
-{
-   memset(addr, 0, PGD_TABLE_SIZE);
+#define CTOR(shift) static void ctor_##shift(void *addr) \
+{  \
+   

[PATCH v8 17/20] powerpc/8xx: Enable 512k hugepage support with HW assistance

2018-11-29 Thread Christophe Leroy
For using 512k pages with hardware assistance, the PTEs have to be spread
every 128 bytes in the L2 table.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hugetlb.h |  4 +++-
 arch/powerpc/mm/hugetlbpage.c  | 10 +-
 arch/powerpc/mm/tlb_nohash.c   |  3 +++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index dfb8bf236586..62a0ca02ca7d 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -74,7 +74,9 @@ static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned long addr,
unsigned long idx = 0;
 
pte_t *dir = hugepd_page(hpd);
-#ifndef CONFIG_PPC_FSL_BOOK3E
+#ifdef CONFIG_PPC_8xx
+   idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT;
+#elif !defined(CONFIG_PPC_FSL_BOOK3E)
idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd);
 #endif
 
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index bc97874d7c74..5f22f1d399b3 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -65,6 +65,9 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
if (pshift >= pdshift) {
cachep = PGT_CACHE(PTE_T_ORDER);
num_hugepd = 1 << (pshift - pdshift);
+   } else if (IS_ENABLED(CONFIG_PPC_8xx)) {
+   cachep = PGT_CACHE(PTE_SHIFT);
+   num_hugepd = 1;
} else {
cachep = PGT_CACHE(pdshift - pshift);
num_hugepd = 1;
@@ -331,6 +334,9 @@ static void free_hugepd_range(struct mmu_gather *tlb, hugepd_t *hpdp, int pdshift
 
if (shift >= pdshift)
hugepd_free(tlb, hugepte);
+   else if (IS_ENABLED(CONFIG_PPC_8xx))
+   pgtable_free_tlb(tlb, hugepte,
+get_hugepd_cache_index(PTE_SHIFT));
else
pgtable_free_tlb(tlb, hugepte,
 get_hugepd_cache_index(pdshift - shift));
@@ -700,7 +706,9 @@ static int __init hugetlbpage_init(void)
 * if we have pdshift and shift value same, we don't
 * use pgt cache for hugepd.
 */
-   if (pdshift > shift)
+   if (pdshift > shift && IS_ENABLED(CONFIG_PPC_8xx))
+   pgtable_cache_add(PTE_SHIFT);
+   else if (pdshift > shift)
pgtable_cache_add(pdshift - shift);
 #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
else
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 8ad7aab150b7..ae5d568e267f 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -97,6 +97,9 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
.shift  = 14,
},
 #endif
+   [MMU_PAGE_512K] = {
+   .shift  = 19,
+   },
[MMU_PAGE_8M] = {
.shift  = 23,
},
-- 
2.13.3



[PATCH v8 20/20] powerpc/8xx: regroup TLB handler routines

2018-11-29 Thread Christophe Leroy
As this is running with MMU off, the CPU only does speculative
fetch for code in the same page.

Following the significant size reduction of TLB handler routines,
the side handlers can be brought back close to the main part,
i.e. in the same page.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 112 -
 1 file changed, 54 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 0a4f8a9c85ff..b171b7c0a0e7 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -399,6 +399,23 @@ InstructionTLBMiss:
rfi
 #endif
 
+#ifndef CONFIG_PIN_TLB_TEXT
+ITLBMissLinear:
+   mtcr    r11
+   /* Set 8M byte page and mark it valid */
+   li  r11, MI_PS8MEG | MI_SVALID
+   mtspr   SPRN_MI_TWC, r11
+   rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
+   ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT
+   mtspr   SPRN_MI_RPN, r10/* Update TLB entry */
+
+0: mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r11, SPRN_SPRG_SCRATCH1
+   rfi
+   patch_site  0b, patch__itlbmiss_exit_2
+#endif
+
. = 0x1200
 DataStoreTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
@@ -484,6 +501,43 @@ DataStoreTLBMiss:
rfi
 #endif
 
+DTLBMissIMMR:
+   mtcr    r11
+   /* Set 512k byte guarded page and mark it valid */
+   li  r10, MD_PS512K | MD_GUARDED | MD_SVALID
+   mtspr   SPRN_MD_TWC, r10
+   mfspr   r10, SPRN_IMMR  /* Get current IMMR */
+   rlwinm  r10, r10, 0, 0xfff80000 /* Get 512 kbytes boundary */
+   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT | _PAGE_NO_CACHE
+   mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
+
+   li  r11, RPN_PATTERN
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+
+0: mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r11, SPRN_SPRG_SCRATCH1
+   rfi
+   patch_site  0b, patch__dtlbmiss_exit_2
+
+DTLBMissLinear:
+   mtcr    r11
+   /* Set 8M byte page and mark it valid */
+   li  r11, MD_PS8MEG | MD_SVALID
+   mtspr   SPRN_MD_TWC, r11
+   rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
+   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT
+   mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
+
+   li  r11, RPN_PATTERN
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+
+0: mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r11, SPRN_SPRG_SCRATCH1
+   rfi
+   patch_site  0b, patch__dtlbmiss_exit_3
+
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
@@ -583,64 +637,6 @@ InstructionBreakpoint:
 
. = 0x2000
 
-/*
- * Bottom part of DataStoreTLBMiss handlers for IMMR area and linear RAM.
- * not enough space in the DataStoreTLBMiss area.
- */
-DTLBMissIMMR:
-   mtcr    r11
-   /* Set 512k byte guarded page and mark it valid */
-   li  r10, MD_PS512K | MD_GUARDED | MD_SVALID
-   mtspr   SPRN_MD_TWC, r10
-   mfspr   r10, SPRN_IMMR  /* Get current IMMR */
-   rlwinm  r10, r10, 0, 0xfff80000 /* Get 512 kbytes boundary */
-   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT | _PAGE_NO_CACHE
-   mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
-
-   li  r11, RPN_PATTERN
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
-
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
-   rfi
-   patch_site  0b, patch__dtlbmiss_exit_2
-
-DTLBMissLinear:
-   mtcr    r11
-   /* Set 8M byte page and mark it valid */
-   li  r11, MD_PS8MEG | MD_SVALID
-   mtspr   SPRN_MD_TWC, r11
-   rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
-   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT
-   mtspr   SPRN_MD_RPN, r10/* Update TLB entry */
-
-   li  r11, RPN_PATTERN
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
-
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
-   rfi
-   patch_site  0b, patch__dtlbmiss_exit_3
-
-#ifndef CONFIG_PIN_TLB_TEXT
-ITLBMissLinear:
-   mtcr    r11
-   /* Set 8M byte page and mark it valid */
-   li  r11, MI_PS8MEG | MI_SVALID
-   mtspr   SPRN_MI_TWC, r11
-   rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
-   ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
-   

Re: [PATCH 0/3] System call table generation support

2018-11-29 Thread Satheesh Rajendran
On Thu, Nov 29, 2018 at 01:48:16PM +0530, Firoz Khan wrote:
> Hi Sathish,
> 
> Thanks for your email.
> 
> On Thu, 29 Nov 2018 at 12:05, Satheesh Rajendran
>  wrote:
> >
> > On Fri, Sep 14, 2018 at 02:02:57PM +0530, Firoz Khan wrote:
> > > The purpose of this patch series is:
> > > 1. We can easily add/modify/delete a system call by changing its entry
> > > in the syscall.tbl file. No need to manually edit many files.
> > >
> > > 2. It is easy to unify the system call implementation across all
> > > the architectures.
> > >
> > > The system call tables are in different format in all architecture
> > > and it would be difficult to add or modify the system calls in the
> > > respective files manually. To make this easy, a script is provided
> > > which generates the header file and the syscall table file, so this
> > > change will unify them across all architectures.
> > >
> > > syscall.tbl contains the list of available system calls along with
> > > system call number and corresponding entry point. Adding a new system
> > > call to this architecture will be possible by adding a new entry to
> > > the syscall.tbl file.
> > >
> > > Adding a new table entry consisting of:
> > > - System call number.
> > > - ABI.
> > > - System call name.
> > > - Entry point name.
> > > - Compat entry name, if required.
> > >
> > > ARM, s390 and x86 architectures already have similar support. I
> > > leverage their implementation to come up with a generic solution.
> > >
> > > I have done the same work for alpha, m68k, microblaze,
> > > ia64, mips, parisc, sh, sparc, and xtensa. But I started by sending
> > > the patch for one architecture for review. The git repository
> > > mentioned below contains more details.
> > > Git repo:- https://github.com/frzkhn/system_call_table_generator/
> > >
> > > Finally, this is the ground work for solving the Y2038 issue. We
> > > need to add/change two dozen system calls to solve the Y2038 issue.
> > > So this patch series will make it easy to move from the existing
> > > system calls to Y2038-compatible ones.
> > >
> > > I started working on system call table generation on 4.17-rc1. I used
> > > Marcin's script - https://github.com/hrw/syscalls-table - to generate
> > > the syscall.tbl file. And this will be the input to the system call
> > > table generation script. But a couple of system calls were added
> > > in the latest rc releases. If you run Marcin's script on the latest
> > > release, it will generate a new syscall.tbl. But I still use the old
> > > file - syscall.tbl - and once the review is over I'll update
> > > syscall.tbl alone w.r.t. the tip of the kernel. The impact is that a
> > > few of the system calls won't work until then.
> > >
> > > Firoz Khan (3):
> > >   powerpc: Replace NR_syscalls macro from asm/unistd.h
> > >   powerpc: Add system call table generation support
> > >   powerpc: uapi header and system call table file generation
> > >
> > >  arch/powerpc/Makefile   |   3 +
> > >  arch/powerpc/include/asm/Kbuild |   3 +
> > >  arch/powerpc/include/asm/unistd.h   |   3 +-
> > >  arch/powerpc/include/uapi/asm/Kbuild|   2 +
> > >  arch/powerpc/include/uapi/asm/unistd.h  | 391 
> > > +---
> > >  arch/powerpc/kernel/Makefile|   3 +-
> > >  arch/powerpc/kernel/syscall_table_32.S  |   9 +
> > >  arch/powerpc/kernel/syscall_table_64.S  |  17 ++
> > >  arch/powerpc/kernel/syscalls/Makefile   |  51 
> > >  arch/powerpc/kernel/syscalls/syscall_32.tbl | 378 
> > > +++
> > >  arch/powerpc/kernel/syscalls/syscall_64.tbl | 372 
> > > ++
> > >  arch/powerpc/kernel/syscalls/syscallhdr.sh  |  37 +++
> > >  arch/powerpc/kernel/syscalls/syscalltbl.sh  |  38 +++
> > >  arch/powerpc/kernel/systbl.S|  50 
> > >  14 files changed, 916 insertions(+), 441 deletions(-)
> > >  create mode 100644 arch/powerpc/kernel/syscall_table_32.S
> > >  create mode 100644 arch/powerpc/kernel/syscall_table_64.S
> > >  create mode 100644 arch/powerpc/kernel/syscalls/Makefile
> > >  create mode 100644 arch/powerpc/kernel/syscalls/syscall_32.tbl
> > >  create mode 100644 arch/powerpc/kernel/syscalls/syscall_64.tbl
> > >  create mode 100644 arch/powerpc/kernel/syscalls/syscallhdr.sh
> > >  create mode 100644 arch/powerpc/kernel/syscalls/syscalltbl.sh
> > >  delete mode 100644 arch/powerpc/kernel/systbl.S
> >
> > Hi,
> >
> > This patch series failed to boot in IBM Power8 box with below base commit 
> > and built with ppc64le_defconfig,
> > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=merge=183cbf93be88d1a4fb572e27b1e08aa0ad85
> 
> I think you applied some old patch series. Could you please
> perform the boot test on the powerpc v3 series which I sent a few hours ago?

Hi Firoz,

Looks like I chose the wrong mail to reply to, but I did test with the v3 series itself.


Re: [PATCH 12/24] powerpc/mm: Fix reporting of kernel execute faults

2018-11-29 Thread Christophe LEROY




Le 30/11/2018 à 06:50, Aneesh Kumar K.V a écrit :

Christophe LEROY  writes:


Hi Ben,

I have an issue on the 8xx with this change

Le 19/07/2017 à 06:49, Benjamin Herrenschmidt a écrit :

We currently test for is_exec and DSISR_PROTFAULT but that doesn't
make sense as this is the wrong error bit to test for an execute
permission failure.


On the 8xx, on an exec permission failure, this is the correct BIT, see
below extract from reference manual:

Note that only one of bits 1, 3, and 4 will be set.
bit 1: 1 if the translation of an attempted access is not in the
translation tables; otherwise 0.
bit 3: 1 if the fetch access was to guarded memory when MSR[IR] = 1;
otherwise 0.
bit 4: 1 if the access is not permitted by the protection mechanism;
otherwise 0.

So on the 8xx, bit 3 is not DSISR_NOEXEC_OR_G but only DSISR_G.
When the PPP bits are set to No-Execute, we really get bit 4 that is
DSISR_PROTFAULT.
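
For reference, the corresponding DSISR masks in the kernel's asm/reg.h
(bits are numbered from the most significant bit, IBM style; comments
mine):

	#define DSISR_NOHPTE		0x40000000	/* bit 1: no translation found */
	#define DSISR_NOEXEC_OR_G	0x10000000	/* bit 3: exec of no-exec or guarded */
	#define DSISR_PROTFAULT		0x08000000	/* bit 4: protection fault */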



Do you have an url for the document? I am wondering whether we can get
Documentation/powerpc/cpu_families.txt updated with these urls?


https://www.nxp.com/docs/en/reference-manual/MPC885RM.pdf

Christophe







In fact, we had code that would return early if we had an exec
fault in kernel mode so I think that was just dead code anyway.

Finally the location of that test is awkward and prevents further
simplifications.

So instead move that test into a helper along with the existing
early test for kernel exec faults and out of range accesses,
and put it all in a "bad_kernel_fault()" helper. While at it
test the correct error bits.

Signed-off-by: Benjamin Herrenschmidt 
---
   arch/powerpc/mm/fault.c | 21 +++--
   1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index e8d6acc888c5..aead07cf8a5b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -180,6 +180,20 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
return MM_FAULT_CONTINUE;
   }
   
+/* Is this a bad kernel fault ? */

+static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
+unsigned long address)
+{
+   if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT))) {


Do you mind if we had DSISR_PROTFAULT here as well ?

Christophe


+   printk_ratelimited(KERN_CRIT "kernel tried to execute"
+  " exec-protected page (%lx) -"
+  "exploit attempt? (uid: %d)\n",
+  address, from_kuid(&init_user_ns,
+ current_uid()));
+   }
+   return is_exec || (address >= TASK_SIZE);
+}
+
   /*
* Define the correct "is_write" bit in error_code based
* on the processor family
@@ -252,7 +266,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 * The kernel should never take an execute fault nor should it
 * take a page fault to a kernel address.
 */
-   if (!is_user && (is_exec || (address >= TASK_SIZE)))
+   if (unlikely(!is_user && bad_kernel_fault(is_exec, error_code, address)))
return SIGSEGV;
   
   	/* We restore the interrupt state now */

@@ -491,11 +505,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
return 0;
}
   
-	if (is_exec && (error_code & DSISR_PROTFAULT))

-   printk_ratelimited(KERN_CRIT "kernel tried to execute 
NX-protected"
-  " page (%lx) - exploit attempt? (uid: %d)\n",
-  address, from_kuid(_user_ns, 
current_uid()));
-
return SIGSEGV;
   }
   NOKPROBE_SYMBOL(__do_page_fault);



Re: [PATCH] powerpc/mm: add exec protection on powerpc 603

2018-11-29 Thread Aneesh Kumar K.V

On 11/16/18 10:50 PM, Christophe Leroy wrote:

The 603 doesn't have a HASH table, TLB misses are handled by
software. It is then possible to generate page fault when
_PAGE_EXEC is not set like in nohash/32.

In order to support it, set_pte_filter() and
set_access_flags_filter() are made common, and the handling
is made dependent on MMU_FTR_HPTE_TABLE

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/book3s/32/hash.h|  1 +
  arch/powerpc/include/asm/book3s/32/pgtable.h | 18 +-
  arch/powerpc/include/asm/cputable.h  |  8 
  arch/powerpc/kernel/head_32.S|  2 ++
  arch/powerpc/mm/pgtable.c| 20 +++-
  5 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/hash.h 
b/arch/powerpc/include/asm/book3s/32/hash.h
index f2892c7ab73e..2a0a467d2985 100644
--- a/arch/powerpc/include/asm/book3s/32/hash.h
+++ b/arch/powerpc/include/asm/book3s/32/hash.h
@@ -26,6 +26,7 @@
  #define _PAGE_WRITETHRU   0x040   /* W: cache write-through */
  #define _PAGE_DIRTY   0x080   /* C: page changed */
  #define _PAGE_ACCESSED0x100   /* R: page referenced */
+#define _PAGE_EXEC 0x200   /* software: exec allowed */
  #define _PAGE_RW  0x400   /* software: user write access allowed */
  #define _PAGE_SPECIAL 0x800   /* software: Special page */
  
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h

index c21d33704633..cf844fed4527 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -10,9 +10,9 @@
  /* And here we include common definitions */
  
  #define _PAGE_KERNEL_RO		0

-#define _PAGE_KERNEL_ROX   0
+#define _PAGE_KERNEL_ROX   (_PAGE_EXEC)
  #define _PAGE_KERNEL_RW   (_PAGE_DIRTY | _PAGE_RW)
-#define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW)
+#define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC)
  
  #define _PAGE_HPTEFLAGS _PAGE_HASHPTE
  
@@ -66,11 +66,11 @@ static inline bool pte_user(pte_t pte)

   */
  #define PAGE_NONE __pgprot(_PAGE_BASE)
  #define PAGE_SHARED   __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
-#define PAGE_SHARED_X  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
+#define PAGE_SHARED_X  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | 
_PAGE_EXEC)
  #define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_USER)
+#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
  #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER)
+#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
  
  /* Permission masks used for kernel mappings */

  #define PAGE_KERNEL   __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
@@ -318,7 +318,7 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
   int psize)
  {
unsigned long set = pte_val(entry) &
-   (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW);
+   (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
  
  	pte_update(ptep, 0, set);
  
@@ -384,7 +384,7 @@ static inline int pte_dirty(pte_t pte)		{ return !!(pte_val(pte) & _PAGE_DIRTY);

  static inline int pte_young(pte_t pte){ return !!(pte_val(pte) 
& _PAGE_ACCESSED); }
  static inline int pte_special(pte_t pte)  { return !!(pte_val(pte) & 
_PAGE_SPECIAL); }
  static inline int pte_none(pte_t pte) { return (pte_val(pte) & 
~_PTE_NONE_MASK) == 0; }
-static inline bool pte_exec(pte_t pte) { return true; }
+static inline bool pte_exec(pte_t pte) { return pte_val(pte) & 
_PAGE_EXEC; }
  
  static inline int pte_present(pte_t pte)

  {
@@ -451,7 +451,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
  
  static inline pte_t pte_exprotect(pte_t pte)

  {
-   return pte;
+   return __pte(pte_val(pte) & ~_PAGE_EXEC);
  }
  
  static inline pte_t pte_mkclean(pte_t pte)

@@ -466,7 +466,7 @@ static inline pte_t pte_mkold(pte_t pte)
  
  static inline pte_t pte_mkexec(pte_t pte)

  {
-   return pte;
+   return __pte(pte_val(pte) | _PAGE_EXEC);
  }
  
  static inline pte_t pte_mkpte(pte_t pte)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 29f49a35d6ee..a0395ccbbe9e 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -296,7 +296,7 @@ static inline void cpu_feature_keys_init(void) { }
  #define CPU_FTRS_PPC601   (CPU_FTR_COMMON | CPU_FTR_601 | \
CPU_FTR_COHERENT_ICACHE | CPU_FTR_UNIFIED_ID_CACHE | CPU_FTR_USE_RTC)
  #define CPU_FTRS_603  (CPU_FTR_COMMON | CPU_FTR_MAYBE_CAN_DOZE | \
-   CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE)
+   CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE | CPU_FTR_NOEXECUTE)
  #define CPU_FTRS_604  

Re: [PATCH] powerpc/mm: add exec protection on powerpc 603

2018-11-29 Thread Christophe LEROY




Le 29/11/2018 à 12:25, Aneesh Kumar K.V a écrit :

On 11/16/18 10:50 PM, Christophe Leroy wrote:

The 603 doesn't have a HASH table, TLB misses are handled by
software. It is then possible to generate page fault when
_PAGE_EXEC is not set like in nohash/32.

In order to support it, set_pte_filter() and
set_access_flags_filter() are made common, and the handling
is made dependent on MMU_FTR_HPTE_TABLE

Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/book3s/32/hash.h    |  1 +
  arch/powerpc/include/asm/book3s/32/pgtable.h | 18 +-
  arch/powerpc/include/asm/cputable.h  |  8 
  arch/powerpc/kernel/head_32.S    |  2 ++
  arch/powerpc/mm/pgtable.c    | 20 +++-
  5 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/hash.h 
b/arch/powerpc/include/asm/book3s/32/hash.h

index f2892c7ab73e..2a0a467d2985 100644
--- a/arch/powerpc/include/asm/book3s/32/hash.h
+++ b/arch/powerpc/include/asm/book3s/32/hash.h
@@ -26,6 +26,7 @@
  #define _PAGE_WRITETHRU    0x040    /* W: cache write-through */
  #define _PAGE_DIRTY    0x080    /* C: page changed */
  #define _PAGE_ACCESSED    0x100    /* R: page referenced */
+#define _PAGE_EXEC    0x200    /* software: exec allowed */
  #define _PAGE_RW    0x400    /* software: user write access allowed */
  #define _PAGE_SPECIAL    0x800    /* software: Special page */
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h

index c21d33704633..cf844fed4527 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -10,9 +10,9 @@
  /* And here we include common definitions */
  #define _PAGE_KERNEL_RO    0
-#define _PAGE_KERNEL_ROX    0
+#define _PAGE_KERNEL_ROX    (_PAGE_EXEC)
  #define _PAGE_KERNEL_RW    (_PAGE_DIRTY | _PAGE_RW)
-#define _PAGE_KERNEL_RWX    (_PAGE_DIRTY | _PAGE_RW)
+#define _PAGE_KERNEL_RWX    (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC)
  #define _PAGE_HPTEFLAGS _PAGE_HASHPTE
@@ -66,11 +66,11 @@ static inline bool pte_user(pte_t pte)
   */
  #define PAGE_NONE    __pgprot(_PAGE_BASE)
  #define PAGE_SHARED    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
-#define PAGE_SHARED_X    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
+#define PAGE_SHARED_X    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW 
| _PAGE_EXEC)

  #define PAGE_COPY    __pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_COPY_X    __pgprot(_PAGE_BASE | _PAGE_USER)
+#define PAGE_COPY_X    __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
  #define PAGE_READONLY    __pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_READONLY_X    __pgprot(_PAGE_BASE | _PAGE_USER)
+#define PAGE_READONLY_X    __pgprot(_PAGE_BASE | _PAGE_USER | 
_PAGE_EXEC)

  /* Permission masks used for kernel mappings */
  #define PAGE_KERNEL    __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
@@ -318,7 +318,7 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,

 int psize)
  {
  unsigned long set = pte_val(entry) &
-    (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW);
+    (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
  pte_update(ptep, 0, set);
@@ -384,7 +384,7 @@ static inline int pte_dirty(pte_t pte)    { 
return !!(pte_val(pte) & _PAGE_DIRTY);
  static inline int pte_young(pte_t pte)    { return 
!!(pte_val(pte) & _PAGE_ACCESSED); }
  static inline int pte_special(pte_t pte)    { return !!(pte_val(pte) 
& _PAGE_SPECIAL); }
  static inline int pte_none(pte_t pte)    { return (pte_val(pte) 
& ~_PTE_NONE_MASK) == 0; }

-static inline bool pte_exec(pte_t pte)    { return true; }
+static inline bool pte_exec(pte_t pte)    { return pte_val(pte) & 
_PAGE_EXEC; }

  static inline int pte_present(pte_t pte)
  {
@@ -451,7 +451,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
  static inline pte_t pte_exprotect(pte_t pte)
  {
-    return pte;
+    return __pte(pte_val(pte) & ~_PAGE_EXEC);
  }
  static inline pte_t pte_mkclean(pte_t pte)
@@ -466,7 +466,7 @@ static inline pte_t pte_mkold(pte_t pte)
  static inline pte_t pte_mkexec(pte_t pte)
  {
-    return pte;
+    return __pte(pte_val(pte) | _PAGE_EXEC);
  }
  static inline pte_t pte_mkpte(pte_t pte)
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h

index 29f49a35d6ee..a0395ccbbe9e 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -296,7 +296,7 @@ static inline void cpu_feature_keys_init(void) { }
  #define CPU_FTRS_PPC601    (CPU_FTR_COMMON | CPU_FTR_601 | \
  CPU_FTR_COHERENT_ICACHE | CPU_FTR_UNIFIED_ID_CACHE | 
CPU_FTR_USE_RTC)

  #define CPU_FTRS_603    (CPU_FTR_COMMON | CPU_FTR_MAYBE_CAN_DOZE | \
-    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE)
+    CPU_FTR_MAYBE_CAN_NAP | CPU_FTR_PPC_LE | CPU_FTR_NOEXECUTE)
  #define CPU_FTRS_604    (CPU_FTR_COMMON | CPU_FTR_PPC_LE)
  #define CPU_FTRS_740_NOTAU  

Re: [PATCH] powerpc/boot: Copy serial.c in Makefile

2018-11-29 Thread Michael Ellerman
Hi dja,

Daniel Axtens  writes:
> Right, so as both 0-day and snowpatch tell me, this patch is wrong.
>
> It turns out that this:
>>  $(obj)/serial.c: $(obj)/autoconf.h
>> +$(Q)cp $< $@
> is identical to:
> cp arch/powerpc/boot/autoconf.h arch/powerpc/boot/serial.c
>
> (Clearly my make mastery is inadequate.)
>
> Amusingly this works for my 64e uImage but obviously not for
> anything that actually needs code from serial.c.
>
> Further analysis suggests that making with -j1 triggers the issue, but
> everything works with -j2 and above. That would make sense with the
> timeline of when I discovered the issue because I changed my build
> script to not build in parallel.

I don't get why -j makes a difference, but that does explain why we
haven't seen it, none of my tests use -j 1 :)

I don't think we actually want to copy serial.c, we just want to specify
a dependency, does this work for you?

cheers

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 39354365f54a..ed9883169190 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -197,7 +197,7 @@ $(addprefix $(obj)/,$(libfdt) $(libfdtheader)): $(obj)/%: $(srctree)/scripts/dtc
 $(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds : $(obj)/%: $(srctree)/$(src)/%.S
$(Q)cp $< $@
 
-$(obj)/serial.c: $(obj)/autoconf.h
+$(srctree)/$(src)/serial.c: $(obj)/autoconf.h
 
 $(obj)/autoconf.h: $(obj)/%: $(objtree)/include/generated/%
$(Q)cp $< $@
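
A rule with no recipe, as above, only records a dependency. The original
bug came from `$<` expanding to the first prerequisite; a minimal
hypothetical Makefile showing both forms:

	# BAD: the recipe clobbers out.c with a copy of dep.h
	out.c: dep.h
		cp $< $@

	# OK: recipe-less rule, merely makes out.c depend on dep.h
	out.c: dep.h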


Re: [PATCH] powerpc/boot: Copy serial.c in Makefile

2018-11-29 Thread Daniel Axtens
Hi mpe,

>> Further analysis suggests that making with -j1 triggers the issue, but
>> everything works with -j2 and above. That would make sense with the
>> timeline of when I discovered the issue because I changed my build
>> script to not build in parallel.
>
> I don't get why -j makes a difference, but that does explain why we
> haven't seen it, none of my tests use -j 1 :)

I don't understand either - only that with -j1 V=1 I get

  powerpc64le-linux-gnu-gcc ... -c -o arch/powerpc/boot/serial.o 
arch/powerpc/boot/serial.c

and with -j2 I get:

  powerpc64le-linux-gnu-gcc ... -c -o arch/powerpc/boot/serial.o 
/home/dja/dev/linux/linux/arch/powerpc/boot/serial.c

So for some reason j2 is getting the absolute path and j1 is getting a
relative path. I have absolutely no idea why this would be.

> I don't think we actually want to copy serial.c, we just want to specify
> a dependency, does this work for you?

Yeah I think you're right, I misunderstood the boot wrapper.

That patch works for me - it causes the full path to be used.

Thanks heaps!

Regards,
Daniel

> cheers
>
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index 39354365f54a..ed9883169190 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -197,7 +197,7 @@ $(addprefix $(obj)/,$(libfdt) $(libfdtheader)): $(obj)/%: 
> $(srctree)/scripts/dtc
>  $(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds : $(obj)/%: 
> $(srctree)/$(src)/%.S
>   $(Q)cp $< $@
>  
> -$(obj)/serial.c: $(obj)/autoconf.h
> +$(srctree)/$(src)/serial.c: $(obj)/autoconf.h
>  
>  $(obj)/autoconf.h: $(obj)/%: $(objtree)/include/generated/%
>   $(Q)cp $< $@


Re: pkeys: Reserve PKEY_DISABLE_READ

2018-11-29 Thread Florian Weimer
* Dave Hansen:

> On 11/27/18 3:57 AM, Florian Weimer wrote:
>> I would have expected something that translates PKEY_DISABLE_WRITE |
>> PKEY_DISABLE_READ into PKEY_DISABLE_ACCESS, and also accepts
>> PKEY_DISABLE_ACCESS | PKEY_DISABLE_READ, for consistency with POWER.
>> 
>> (My understanding is that PKEY_DISABLE_ACCESS does not disable all
>> access, but produces execute-only memory.)
>
> Correct, it disables all data access, but not execution.

So I would expect something like this (completely untested, I did not
even compile this):

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 20ebf153c871..bed23f9e8336 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -199,6 +199,11 @@ static inline bool arch_pkeys_enabled(void)
return !static_branch_likely(&pkey_disabled);
 }
 
+static inline bool arch_pkey_access_rights_valid(unsigned long rights)
+{
+   return (rights & ~(unsigned long)PKEY_ACCESS_MASK) == 0;
+}
+
 extern void pkey_mm_init(struct mm_struct *mm);
 extern bool arch_supports_pkeys(int cap);
 extern unsigned int arch_usable_pkeys(void);
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 19b137f1b3be..e3e1d5a316e8 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -14,6 +14,17 @@ static inline bool arch_pkeys_enabled(void)
return boot_cpu_has(X86_FEATURE_OSPKE);
 }
 
+static inline bool arch_pkey_access_rights_valid(unsigned long rights)
+{
+   if (rights & ~(unsigned long)PKEY_ACCESS_MASK)
+   return false;
+   if (rights & PKEY_DISABLE_READ) {
+   /* x86 can only disable read access along with write access. */
+   return rights & (PKEY_DISABLE_WRITE | PKEY_DISABLE_ACCESS);
+   }
+   return true;
+}
+
 /*
  * Try to dedicate one of the protection keys to be used as an
  * execute-only protection key.
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 87a57b7642d3..b9b78145017f 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -928,7 +928,13 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
return -EINVAL;
 
/* Set the bits we need in PKRU:  */
-   if (init_val & PKEY_DISABLE_ACCESS)
+   if (init_val & (PKEY_DISABLE_ACCESS | PKEY_DISABLE_READ))
+   /*
+* arch_pkey_access_rights_valid checked that
+* PKEY_DISABLE_READ is actually representable on x86
+* (that is, it comes with PKEY_DISABLE_ACCESS or
+* PKEY_DISABLE_WRITE).
+*/
new_pkru_bits |= PKRU_AD_BIT;
if (init_val & PKEY_DISABLE_WRITE)
new_pkru_bits |= PKRU_WD_BIT;
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 2955ba976048..2c330fabbe55 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -48,6 +48,11 @@ static inline void copy_init_pkru_to_fpregs(void)
 {
 }
 
+static inline bool arch_pkey_access_rights_valid(unsigned long rights)
+{
+   return false;
+}
+
 #endif /* ! CONFIG_ARCH_HAS_PKEYS */
 
 #endif /* _LINUX_PKEYS_H */
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 6d331620b9e5..f4cefc3540df 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -597,7 +597,7 @@ SYSCALL_DEFINE2(pkey_alloc, unsigned long, flags, unsigned long, init_val)
if (flags)
return -EINVAL;
/* check for unsupported init values */
-   if (init_val & ~PKEY_ACCESS_MASK)
+   if (!arch_pkey_access_rights_valid(init_val))
return -EINVAL;
 
down_write(&current->mm->mmap_sem);
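
Assuming powerpc's PKEY_ACCESS_MASK grows to include PKEY_DISABLE_READ,
as this thread proposes, a hypothetical caller would then see:

	#include <sys/mman.h>	/* pkey_alloc(), PKEY_DISABLE_* */

	int k1 = pkey_alloc(0, PKEY_DISABLE_READ);
	/* POWER: a valid key; x86: -1/EINVAL, read can't be disabled alone */

	int k2 = pkey_alloc(0, PKEY_DISABLE_READ | PKEY_DISABLE_WRITE);
	/* valid on both: equivalent to disabling all data access */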

Thanks,
Florian


Re: use generic DMA mapping code in powerpc V4

2018-11-29 Thread Christian Zigotzky

On 28 November 2018 at 12:05PM, Michael Ellerman wrote:

Christoph Hellwig  writes:


Any comments?  I'd like to at least get the ball moving on the easy
bits.

Nothing specific yet.

I'm a bit worried it might break one of the many old obscure platforms
we have that aren't well tested.

There's not much we can do about that, but I'll just try and test it on
everything I can find.

Is the plan that you take these via the dma-mapping tree or that they go
via powerpc?

cheers


Hi All,

I compiled a test kernel from the following Git today.

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.4

Command: git clone git://git.infradead.org/users/hch/misc.git -b 
powerpc-dma.4 a


Unfortunately I get some DMA error messages and the PASEMI ethernet 
doesn't work anymore.


[  367.627623] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.627631] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.627639] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.627647] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.627655] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.627686] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.628418] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.628505] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.628592] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.629324] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.629417] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.629495] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0
[  367.629589] pci :00:1a.0: dma_direct_map_page: overflow 
0x00026bcb5002+110 of device mask  bus mask 0


[  430.424732]pasemi_mac: rcmdsta error: 0x04ef3001

I tested this kernel with the Nemo board (CPU: PWRficient PA6T-1682M). 
The PASEMI ethernet works with the RC4 of kernel 4.20.
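
For reference, the overflow message fires when the generic direct
mapping decides an address is not reachable by the device. The
capability test in that series looks roughly like this (paraphrased;
the exact code in the powerpc-dma.4 branch may differ):

	static inline bool dma_capable(struct device *dev, dma_addr_t addr,
				       size_t size)
	{
		if (!dev->dma_mask)
			return false;
		return addr + size - 1 <=
			min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
	}

A reported "bus mask 0" means bus_dma_mask is unset, so the device's own
DMA mask alone is what rejected the 0x00026bcb5002 addresses above.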


Cheers,
Christian


Re: [PATCH v3] powerpc/mm: add exec protection on powerpc 603

2018-11-29 Thread Christophe LEROY




Le 28/11/2018 à 18:21, Christophe Leroy a écrit :

The 603 doesn't have a HASH table, TLB misses are handled by
software. It is then possible to generate page fault when
_PAGE_EXEC is not set like in nohash/32.

There is one "reserved" PTE bit available, this patch uses
it for _PAGE_EXEC.

In order to support it, set_pte_filter() and
set_access_flags_filter() are made common, and the handling
is made dependent on MMU_FTR_HPTE_TABLE

Signed-off-by: Christophe Leroy 


Reviewed-by: Aneesh Kumar K.V 


---
  v3: Included the _PAGE_EXEC flag in the existing test in TLB handler, no 
additional insns.

  v2: Amended commit log and removed #ifdef in pagetable dump

  arch/powerpc/include/asm/book3s/32/hash.h  |  1 +
  arch/powerpc/include/asm/book3s/32/pgtable.h   | 18 +-
  arch/powerpc/include/asm/cputable.h|  8 
  arch/powerpc/kernel/head_32.S  |  2 +-
  arch/powerpc/mm/dump_linuxpagetables-generic.c |  2 --
  arch/powerpc/mm/pgtable.c  | 20 +++-
  6 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/hash.h 
b/arch/powerpc/include/asm/book3s/32/hash.h
index f2892c7ab73e..2a0a467d2985 100644
--- a/arch/powerpc/include/asm/book3s/32/hash.h
+++ b/arch/powerpc/include/asm/book3s/32/hash.h
@@ -26,6 +26,7 @@
  #define _PAGE_WRITETHRU   0x040   /* W: cache write-through */
  #define _PAGE_DIRTY   0x080   /* C: page changed */
  #define _PAGE_ACCESSED0x100   /* R: page referenced */
+#define _PAGE_EXEC 0x200   /* software: exec allowed */
  #define _PAGE_RW  0x400   /* software: user write access allowed */
  #define _PAGE_SPECIAL 0x800   /* software: Special page */
  
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h

index b849b45429d5..6accf3e686af 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -10,9 +10,9 @@
  /* And here we include common definitions */
  
  #define _PAGE_KERNEL_RO		0

-#define _PAGE_KERNEL_ROX   0
+#define _PAGE_KERNEL_ROX   (_PAGE_EXEC)
  #define _PAGE_KERNEL_RW   (_PAGE_DIRTY | _PAGE_RW)
-#define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW)
+#define _PAGE_KERNEL_RWX   (_PAGE_DIRTY | _PAGE_RW | _PAGE_EXEC)
  
  #define _PAGE_HPTEFLAGS _PAGE_HASHPTE
  
@@ -66,11 +66,11 @@ static inline bool pte_user(pte_t pte)

   */
  #define PAGE_NONE __pgprot(_PAGE_BASE)
  #define PAGE_SHARED   __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
-#define PAGE_SHARED_X  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW)
+#define PAGE_SHARED_X  __pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_RW | 
_PAGE_EXEC)
  #define PAGE_COPY __pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_USER)
+#define PAGE_COPY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
  #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_USER)
-#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER)
+#define PAGE_READONLY_X__pgprot(_PAGE_BASE | _PAGE_USER | _PAGE_EXEC)
  
  /* Permission masks used for kernel mappings */

  #define PAGE_KERNEL   __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW)
@@ -318,7 +318,7 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
   int psize)
  {
unsigned long set = pte_val(entry) &
-   (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW);
+   (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
  
  	pte_update(ptep, 0, set);
  
@@ -384,7 +384,7 @@ static inline int pte_dirty(pte_t pte)		{ return !!(pte_val(pte) & _PAGE_DIRTY);

  static inline int pte_young(pte_t pte){ return !!(pte_val(pte) 
& _PAGE_ACCESSED); }
  static inline int pte_special(pte_t pte)  { return !!(pte_val(pte) & 
_PAGE_SPECIAL); }
  static inline int pte_none(pte_t pte) { return (pte_val(pte) & 
~_PTE_NONE_MASK) == 0; }
-static inline bool pte_exec(pte_t pte) { return true; }
+static inline bool pte_exec(pte_t pte) { return pte_val(pte) & 
_PAGE_EXEC; }
  
  static inline int pte_present(pte_t pte)

  {
@@ -451,7 +451,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
  
  static inline pte_t pte_exprotect(pte_t pte)

  {
-   return pte;
+   return __pte(pte_val(pte) & ~_PAGE_EXEC);
  }
  
  static inline pte_t pte_mkclean(pte_t pte)

@@ -466,7 +466,7 @@ static inline pte_t pte_mkold(pte_t pte)
  
  static inline pte_t pte_mkexec(pte_t pte)

  {
-   return pte;
+   return __pte(pte_val(pte) | _PAGE_EXEC);
  }
  
  static inline pte_t pte_mkpte(pte_t pte)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 29f49a35d6ee..a0395ccbbe9e 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -296,7 +296,7 @@ static inline void cpu_feature_keys_init(void) { }
  #define CPU_FTRS_PPC601

[PATCH v9 01/20] powerpc/book3s32: Remove CONFIG_BOOKE dependent code

2018-11-29 Thread Christophe Leroy
BOOK3S/32 cannot be BOOKE, so remove useless code

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 18 --
 arch/powerpc/include/asm/book3s/32/pgtable.h | 14 --
 2 files changed, 32 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 82e44b1a00ae..eb8882c6dbb0 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -50,8 +50,6 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
#define __pmd_free_tlb(tlb,x,a)	do { } while (0)
 /* #define pgd_populate(mm, pmd, pte)  BUG() */
 
-#ifndef CONFIG_BOOKE
-
 static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
   pte_t *pte)
 {
@@ -65,22 +63,6 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
 }
 
 #define pmd_pgtable(pmd) pmd_page(pmd)
-#else
-
-static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp,
-  pte_t *pte)
-{
-   *pmdp = __pmd((unsigned long)pte | _PMD_PRESENT);
-}
-
-static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
-   pgtable_t pte_page)
-{
-   *pmdp = __pmd((unsigned long)lowmem_page_address(pte_page) | 
_PMD_PRESENT);
-}
-
-#define pmd_pgtable(pmd) pmd_page(pmd)
-#endif
 
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
 extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index c21d33704633..32c33eccc0e2 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -328,24 +328,10 @@ static inline void __ptep_set_access_flags(struct vm_area_struct *vma,
 #define __HAVE_ARCH_PTE_SAME
 #define pte_same(A,B)  (((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
 
-/*
- * Note that on Book E processors, the pmd contains the kernel virtual
- * (lowmem) address of the pte page.  The physical address is less useful
- * because everything runs with translation enabled (even the TLB miss
- * handler).  On everything else the pmd contains the physical address
- * of the pte page.  -- paulus
- */
-#ifndef CONFIG_BOOKE
 #define pmd_page_vaddr(pmd)\
((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
 #define pmd_page(pmd)  \
pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
-#else
-#define pmd_page_vaddr(pmd)\
-   ((unsigned long) (pmd_val(pmd) & PAGE_MASK))
-#define pmd_page(pmd)  \
-   pfn_to_page((__pa(pmd_val(pmd)) >> PAGE_SHIFT))
-#endif
 
 /* to find an entry in a kernel page-table-directory */
#define pgd_offset_k(address) pgd_offset(&init_mm, address)
-- 
2.13.3



[PATCH v9 02/20] powerpc/8xx: Remove PTE_ATOMIC_UPDATES

2018-11-29 Thread Christophe Leroy
commit 1bc54c03117b9 ("powerpc: rework 4xx PTE access and TLB miss")
introduced non atomic PTE updates and started the work of removing
PTE updates in TLB miss handlers, but kept PTE_ATOMIC_UPDATES for the
8xx with the following comment:
/* Until my rework is finished, 8xx still needs atomic PTE updates */

commit fe11dc3f9628e ("powerpc/8xx: Update TLB asm so it behaves as
linux mm expects") removed all PTE updates done in TLB miss handlers

Therefore, atomic PTE updates are not needed anymore for the 8xx

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/nohash/32/pte-8xx.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pte-8xx.h b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
index 6bfe041ef59d..c9e4b2d90f65 100644
--- a/arch/powerpc/include/asm/nohash/32/pte-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/pte-8xx.h
@@ -65,9 +65,6 @@
 
 #define _PTE_NONE_MASK 0
 
-/* Until my rework is finished, 8xx still needs atomic PTE updates */
-#define PTE_ATOMIC_UPDATES 1
-
 #ifdef CONFIG_PPC_16K_PAGES
#define _PAGE_PSIZE	_PAGE_SPS
 #else
-- 
2.13.3



[PATCH v9 04/20] powerpc/mm: Avoid useless lock with single page fragments

2018-11-29 Thread Christophe Leroy
There is no point in taking the page table lock as pte_frag or
pmd_frag are always NULL when we have only one fragment.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/pgtable-book3s64.c | 3 +++
 arch/powerpc/mm/pgtable-frag.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 0c0fd173208a..f3c31f5e1026 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -244,6 +244,9 @@ static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
 {
void *pmd_frag, *ret;
 
+   if (PMD_FRAG_NR == 1)
+   return NULL;
+
spin_lock(&mm->page_table_lock);
ret = mm->context.pmd_frag;
if (ret) {
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index d61e7c2a9a79..7544d0d7177d 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -34,6 +34,9 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 {
void *pte_frag, *ret;
 
+   if (PTE_FRAG_NR == 1)
+   return NULL;
+
spin_lock(&mm->page_table_lock);
ret = mm->context.pte_frag;
if (ret) {
-- 
2.13.3



[PATCH v9 07/20] powerpc/mm: add helpers to get/set mm.context->pte_frag

2018-11-29 Thread Christophe Leroy
In order to handle the pte_fragment functions on platforms with a
single fragment, without adding a pte_frag field to every mm_context_t,
this patch creates two helpers which do nothing on such platforms.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/pgtable.h | 25 +
 arch/powerpc/mm/pgtable-frag.c |  8 
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 9679b7519a35..314a2890a972 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -110,6 +110,31 @@ void mark_initmem_nx(void);
 static inline void mark_initmem_nx(void) { }
 #endif
 
+/*
+ * When used, PTE_FRAG_NR is defined in subarch pgtable.h
+ * so we are sure it is included when arriving here.
+ */
+#ifdef PTE_FRAG_NR
+static inline void *pte_frag_get(mm_context_t *ctx)
+{
+   return ctx->pte_frag;
+}
+
+static inline void pte_frag_set(mm_context_t *ctx, void *p)
+{
+   ctx->pte_frag = p;
+}
+#else
+static inline void *pte_frag_get(mm_context_t *ctx)
+{
+   return NULL;
+}
+
+static inline void pte_frag_set(mm_context_t *ctx, void *p)
+{
+}
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_H */
diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
index 7544d0d7177d..af23a587f019 100644
--- a/arch/powerpc/mm/pgtable-frag.c
+++ b/arch/powerpc/mm/pgtable-frag.c
@@ -38,7 +38,7 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
return NULL;
 
spin_lock(&mm->page_table_lock);
-   ret = mm->context.pte_frag;
+   ret = pte_frag_get(&mm->context);
if (ret) {
pte_frag = ret + PTE_FRAG_SIZE;
/*
@@ -46,7 +46,7 @@ static pte_t *get_pte_from_cache(struct mm_struct *mm)
 */
if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
pte_frag = NULL;
-   mm->context.pte_frag = pte_frag;
+   pte_frag_set(&mm->context, pte_frag);
}
spin_unlock(&mm->page_table_lock);
return (pte_t *)ret;
@@ -86,9 +86,9 @@ static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
 * the allocated page with single fragement
 * count.
 */
-   if (likely(!mm->context.pte_frag)) {
+   if (likely(!pte_frag_get(&mm->context))) {
atomic_set(&page->pt_frag_refcount, PTE_FRAG_NR);
-   mm->context.pte_frag = ret + PTE_FRAG_SIZE;
+   pte_frag_set(&mm->context, ret + PTE_FRAG_SIZE);
}
spin_unlock(&mm->page_table_lock);
 
-- 
2.13.3



[PATCH v9 08/20] powerpc/mm: Extend pte_fragment functionality to PPC32

2018-11-29 Thread Christophe Leroy
In order to allow the 8xx to handle pte_fragments, this patch
extends the use of pte_fragments to PPC32 platforms.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  5 -
 arch/powerpc/include/asm/book3s/32/pgalloc.h  | 17 +
 arch/powerpc/include/asm/book3s/32/pgtable.h  |  5 +++--
 arch/powerpc/include/asm/mmu_context.h|  2 +-
 arch/powerpc/include/asm/nohash/32/mmu.h  |  4 +++-
 arch/powerpc/include/asm/nohash/32/pgalloc.h  | 22 +++---
 arch/powerpc/include/asm/nohash/32/pgtable.h  |  7 ---
 arch/powerpc/include/asm/pgtable.h|  4 
 arch/powerpc/mm/Makefile  |  1 +
 arch/powerpc/mm/mmu_context.c | 10 ++
 arch/powerpc/mm/mmu_context_nohash.c  |  2 +-
 arch/powerpc/mm/pgtable_32.c  | 25 -
 12 files changed, 55 insertions(+), 49 deletions(-)
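
One subtlety in the diff below: once page tables are pte fragments, a table
is no longer page aligned, so pte_offset_map() has to carry the fragment's
offset within its page over to the kmap_atomic() mapping. A stand-alone
sketch of that address math (all values are illustrative):

	#include <assert.h>

	int main(void)
	{
		unsigned long page_mask = ~0xfffUL;	/* 4k pages, assumption */
		unsigned long table = 0xc0105800UL;	/* fragment: not page aligned */
		unsigned long kmap = 0xffc00000UL;	/* hypothetical kmap_atomic() result */

		/* same idea as the new pte_offset_map() below: keep the
		 * sub-page offset when switching to the kmap address */
		unsigned long mapped = kmap + (table & ~page_mask);
		assert((mapped & ~page_mask) == 0x800);
		return 0;
	}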

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index 5bd26c218b94..2bb500d25de6 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_
 #define _ASM_POWERPC_BOOK3S_32_MMU_HASH_H_
+
 /*
  * 32-bit hash table MMU support
  */
@@ -9,6 +10,8 @@
  * BATs
  */
 
+#include 
+
 /* Block size masks */
 #define BL_128K	0x000
 #define BL_256K 0x001
@@ -43,7 +46,7 @@ struct ppc_bat {
u32 batl;
 };
 
-typedef struct page *pgtable_t;
+typedef pte_t *pgtable_t;
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index eb8882c6dbb0..0f58e5b9dbe7 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -59,30 +59,31 @@ static inline void pmd_populate_kernel(struct mm_struct 
*mm, pmd_t *pmdp,
 static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmdp,
pgtable_t pte_page)
 {
-   *pmdp = __pmd((page_to_pfn(pte_page) << PAGE_SHIFT) | _PMD_PRESENT);
+   *pmdp = __pmd(__pa(pte_page) | _PMD_PRESENT);
 }
 
-#define pmd_pgtable(pmd) pmd_page(pmd)
+#define pmd_pgtable(pmd) ((pgtable_t)pmd_page_vaddr(pmd))
 
 extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr);
 extern pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr);
+void pte_frag_destroy(void *pte_frag);
+pte_t *pte_fragment_alloc(struct mm_struct *mm, unsigned long vmaddr, int 
kernel);
+void pte_fragment_free(unsigned long *table, int kernel);
 
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-   free_page((unsigned long)pte);
+   pte_fragment_free((unsigned long *)pte, 1);
 }
 
 static inline void pte_free(struct mm_struct *mm, pgtable_t ptepage)
 {
-   pgtable_page_dtor(ptepage);
-   __free_page(ptepage);
+   pte_fragment_free((unsigned long *)ptepage, 0);
 }
 
 static inline void pgtable_free(void *table, unsigned index_size)
 {
if (!index_size) {
-   pgtable_page_dtor(virt_to_page(table));
-   free_page((unsigned long)table);
+   pte_fragment_free((unsigned long *)table, 0);
} else {
BUG_ON(index_size > MAX_PGTABLE_INDEX_SIZE);
kmem_cache_free(PGT_CACHE(index_size), table);
@@ -120,6 +121,6 @@ static inline void pgtable_free_tlb(struct mmu_gather *tlb,
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t table,
  unsigned long address)
 {
-   pgtable_free_tlb(tlb, page_address(table), 0);
+   pgtable_free_tlb(tlb, table, 0);
 }
 #endif /* _ASM_POWERPC_BOOK3S_32_PGALLOC_H */
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 32c33eccc0e2..47156b93f9af 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -329,7 +329,7 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
 #define pte_same(A,B)  (((pte_val(A) ^ pte_val(B)) & ~_PAGE_HASHPTE) == 0)
 
 #define pmd_page_vaddr(pmd)\
-   ((unsigned long) __va(pmd_val(pmd) & PAGE_MASK))
+   ((unsigned long)__va(pmd_val(pmd) & ~(PTE_TABLE_SIZE - 1)))
 #define pmd_page(pmd)  \
pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
 
@@ -346,7 +346,8 @@ static inline void __ptep_set_access_flags(struct 
vm_area_struct *vma,
 #define pte_offset_kernel(dir, addr)   \
((pte_t *) pmd_page_vaddr(*(dir)) + pte_index(addr))
 #define pte_offset_map(dir, addr)  \
-   ((pte_t *) kmap_atomic(pmd_page(*(dir))) + pte_index(addr))
+   ((pte_t *)(kmap_atomic(pmd_page(*(dir))) + \
+  (pmd_page_vaddr(*(dir)) & ~PAGE_MASK)) + pte_index(addr))
 #define pte_unmap(pte) kunmap_atomic(pte)
 

[PATCH v9 11/20] powerpc/mm: fix a warning when a cache is common to PGD and hugepages

2018-11-29 Thread Christophe Leroy
While implementing TLB miss HW assistance on the 8xx, the following
warning was encountered:

[  423.732965] WARNING: CPU: 0 PID: 345 at mm/slub.c:2412 
___slab_alloc.constprop.30+0x26c/0x46c
[  423.733033] CPU: 0 PID: 345 Comm: mmap Not tainted 
4.18.0-rc8-00664-g2dfff9121c55 #671
[  423.733075] NIP:  c0108f90 LR: c0109ad0 CTR: 0004
[  423.733121] REGS: c455bba0 TRAP: 0700   Not tainted  
(4.18.0-rc8-00664-g2dfff9121c55)
[  423.733147] MSR:  00021032   CR: 24224848  XER: 2000
[  423.733319]
[  423.733319] GPR00: c0109ad0 c455bc50 c4521910 c60053c0 007080c0 c0011b34 
c7fa41e0 c455be30
[  423.733319] GPR08: 0001 c00103a0 c7fa41e0 c49afcc4 24282842 10018840 
c079b37c 0040
[  423.733319] GPR16: 73f0 00210d00  0001 c455a000 0100 
0200 c455a000
[  423.733319] GPR24: c60053c0 c0011b34 007080c0 c455a000 c455a000 c7fa41e0 
 9032
[  423.734190] NIP [c0108f90] ___slab_alloc.constprop.30+0x26c/0x46c
[  423.734257] LR [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734283] Call Trace:
[  423.734326] [c455bc50] [0100] 0x100 (unreliable)
[  423.734430] [c455bcc0] [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734543] [c455bcf0] [c0011b34] huge_pte_alloc+0xc0/0x1dc
[  423.734633] [c455bd20] [c01044dc] hugetlb_fault+0x408/0x48c
[  423.734720] [c455bdb0] [c0104b20] follow_hugetlb_page+0x14c/0x44c
[  423.734826] [c455be10] [c00e8e54] __get_user_pages+0x1c4/0x3dc
[  423.734919] [c455be80] [c00e9924] __mm_populate+0xac/0x140
[  423.735020] [c455bec0] [c00db14c] vm_mmap_pgoff+0xb4/0xb8
[  423.735127] [c455bf00] [c00f27c0] ksys_mmap_pgoff+0xcc/0x1fc
[  423.735222] [c455bf40] [c000e0f8] ret_from_syscall+0x0/0x38
[  423.735271] Instruction dump:
[  423.735321] 7cbf482e 38fd0008 7fa6eb78 7fc4f378 4bfff5dd 7fe3fb78 4bfffe24 
81370010
[  423.735536] 71280004 41a2ff88 4840c571 4b80 <0fe0> 4bfffeb8 81340010 
712a0004
[  423.735757] ---[ end trace e9b222919a470790 ]---

This warning occurs when calling kmem_cache_zalloc() on a
cache having a constructor.

In this case it happens because PGD cache and 512k hugepte cache are
the same size (4k). While a cache with constructor is created for
the PGD, hugepages create cache without constructor and uses
kmem_cache_zalloc(). As both expect a cache with the same size,
the hugepages reuse the cache created for PGD, hence the conflict.

In order to avoid this conflict, this patch:
- modifies pgtable_cache_add() so that a zeroising constructor is
added for any cache size.
- replaces calls to kmem_cache_zalloc() by kmem_cache_alloc()

Signed-off-by: Christophe Leroy 
---
see original discussion in https://patchwork.ozlabs.org/patch/957565/

 arch/powerpc/include/asm/pgtable.h |  2 +-
 arch/powerpc/mm/hugetlbpage.c  |  6 ++---
 arch/powerpc/mm/init-common.c  | 46 ++
 3 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 56ef5437eb2f..f2bfaf674674 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -101,7 +101,7 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, 
unsigned long addr,
 /* can we use this in kvm */
 unsigned long vmalloc_to_phys(void *vmalloc_addr);
 
-void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
+void pgtable_cache_add(unsigned int shift);
 void pgtable_cache_init(void);
 
 #if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_PPC32)
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index c4f1263228b8..bc97874d7c74 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -70,7 +70,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t 
*hpdp,
num_hugepd = 1;
}
 
-   new = kmem_cache_zalloc(cachep, pgtable_gfp_flags(mm, GFP_KERNEL));
+   new = kmem_cache_alloc(cachep, pgtable_gfp_flags(mm, GFP_KERNEL));
 
BUG_ON(pshift > HUGEPD_SHIFT_MASK);
BUG_ON((unsigned long)new & HUGEPD_SHIFT_MASK);
@@ -701,10 +701,10 @@ static int __init hugetlbpage_init(void)
 * use pgt cache for hugepd.
 */
if (pdshift > shift)
-   pgtable_cache_add(pdshift - shift, NULL);
+   pgtable_cache_add(pdshift - shift);
 #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
else
-   pgtable_cache_add(PTE_T_ORDER, NULL);
+   pgtable_cache_add(PTE_T_ORDER);
 #endif
}
 
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 41190f2b60c2..b7ca03643d0b 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -25,19 +25,37 @@
 #include 
 #include 
 
-static void pgd_ctor(void *addr)
-{
-   memset(addr, 0, PGD_TABLE_SIZE);
+#define CTOR(shift) static void ctor_##shift(void *addr) \
+{  \
+   
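
The patch body is cut off above. As a hedged sketch (illustrative, not the
verbatim patch), the macro-generated zeroising constructors described in
the commit message could look like this:

	#include <string.h>

	/* one zeroising constructor per table size: tables are
	 * sizeof(void *) << shift bytes, as in the pgtable caches */
	#define CTOR(shift) static void ctor_##shift(void *addr)	\
	{								\
		memset(addr, 0, sizeof(void *) << (shift));		\
	}

	CTOR(0)		/* generates ctor_0() for order-0 tables */
	CTOR(4)		/* generates ctor_4() */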

[PATCH v9 03/20] powerpc/mm: Move pte_fragment_alloc() to a common location

2018-11-29 Thread Christophe Leroy
In preparation of the next patch, which generalises the use of
pte_fragment_alloc() for all subarches, this patch moves the related
functions to a place that is common to all of them.

The 8xx will need that for supporting 16k pages, as in that mode
page tables still have a size of 4k.

Since pte_fragment with only one fragment is not different
from what is done in the general case, we can easily migrate all
subarchs to pte fragments.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/pgalloc.h |   1 +
 arch/powerpc/mm/Makefile |   4 +-
 arch/powerpc/mm/mmu_context_book3s64.c   |  15 
 arch/powerpc/mm/pgtable-book3s64.c   |  85 
 arch/powerpc/mm/pgtable-frag.c   | 116 +++
 5 files changed, 120 insertions(+), 101 deletions(-)
 create mode 100644 arch/powerpc/mm/pgtable-frag.c
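
As context for pte_frag_destroy() being moved below, a quick stand-alone
check of its fragment bookkeeping (the page size and PTE_FRAG_SIZE_SHIFT
values are assumptions for illustration): the low bits of the "next free
fragment" pointer encode how many fragments of the page have already been
handed out.

	#include <assert.h>

	#define PAGE_MASK		(~0xffffUL)	/* 64k pages, assumption */
	#define PTE_FRAG_SIZE_SHIFT	12		/* 4k fragments, assumption */

	int main(void)
	{
		unsigned long page = 0xc0000000UL;	/* arbitrary page address */
		unsigned long pte_frag = page + (3UL << PTE_FRAG_SIZE_SHIFT);

		/* same computation as pte_frag_destroy(): fragments used so far */
		unsigned long count = (pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
		assert(count == 3);
		return 0;
	}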

diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index 391ed2c3b697..f949dd90af9b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -50,6 +50,7 @@ extern void pgtable_free_tlb(struct mmu_gather *tlb, void 
*table, int shift);
 #ifdef CONFIG_SMP
 extern void __tlb_remove_table(void *_table);
 #endif
+void pte_frag_destroy(void *pte_frag);
 
 static inline pgd_t *radix__pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index ca96e7be4d0e..3cbb1acf0745 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -15,7 +15,9 @@ obj-$(CONFIG_PPC_MMU_NOHASH)  += mmu_context_nohash.o 
tlb_nohash.o \
 obj-$(CONFIG_PPC_BOOK3E)   += tlb_low_$(BITS)e.o
 hash64-$(CONFIG_PPC_NATIVE):= hash_native_64.o
 obj-$(CONFIG_PPC_BOOK3E_64)   += pgtable-book3e.o
-obj-$(CONFIG_PPC_BOOK3S_64)+= pgtable-hash64.o hash_utils_64.o slb.o 
$(hash64-y) mmu_context_book3s64.o pgtable-book3s64.o
+obj-$(CONFIG_PPC_BOOK3S_64)+= pgtable-hash64.o hash_utils_64.o slb.o \
+  $(hash64-y) mmu_context_book3s64.o \
+  pgtable-book3s64.o pgtable-frag.o
 obj-$(CONFIG_PPC_RADIX_MMU)+= pgtable-radix.o tlb-radix.o
 obj-$(CONFIG_PPC_STD_MMU_32)   += ppc_mmu_32.o hash_low_32.o 
mmu_context_hash32.o
 obj-$(CONFIG_PPC_STD_MMU)  += tlb_hash$(BITS).o
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c 
b/arch/powerpc/mm/mmu_context_book3s64.c
index 510f103d7813..f720c5cc0b5e 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -164,21 +164,6 @@ static void destroy_contexts(mm_context_t *ctx)
}
 }
 
-static void pte_frag_destroy(void *pte_frag)
-{
-   int count;
-   struct page *page;
-
-   page = virt_to_page(pte_frag);
-   /* drop all the pending references */
-   count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
-   /* We allow PTE_FRAG_NR fragments from a PTE page */
-   if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
-   pgtable_page_dtor(page);
-   __free_page(page);
-   }
-}
-
 static void pmd_frag_destroy(void *pmd_frag)
 {
int count;
diff --git a/arch/powerpc/mm/pgtable-book3s64.c 
b/arch/powerpc/mm/pgtable-book3s64.c
index 9f93c9f985c5..0c0fd173208a 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -322,91 +322,6 @@ void pmd_fragment_free(unsigned long *pmd)
}
 }
 
-static pte_t *get_pte_from_cache(struct mm_struct *mm)
-{
-   void *pte_frag, *ret;
-
-   spin_lock(&mm->page_table_lock);
-   ret = mm->context.pte_frag;
-   if (ret) {
-   pte_frag = ret + PTE_FRAG_SIZE;
-   /*
-* If we have taken up all the fragments mark PTE page NULL
-*/
-   if (((unsigned long)pte_frag & ~PAGE_MASK) == 0)
-   pte_frag = NULL;
-   mm->context.pte_frag = pte_frag;
-   }
-   spin_unlock(&mm->page_table_lock);
-   return (pte_t *)ret;
-}
-
-static pte_t *__alloc_for_ptecache(struct mm_struct *mm, int kernel)
-{
-   void *ret = NULL;
-   struct page *page;
-
-   if (!kernel) {
-   page = alloc_page(PGALLOC_GFP | __GFP_ACCOUNT);
-   if (!page)
-   return NULL;
-   if (!pgtable_page_ctor(page)) {
-   __free_page(page);
-   return NULL;
-   }
-   } else {
-   page = alloc_page(PGALLOC_GFP);
-   if (!page)
-   return NULL;
-   }
-
-   atomic_set(&page->pt_frag_refcount, 1);
-
-   ret = page_address(page);
-   /*
-* if we support only one fragment just return the
-* allocated page.
-*/
-   if (PTE_FRAG_NR == 1)
-   return ret;
-   spin_lock(&mm->page_table_lock);
-   /*

[PATCH v9 05/20] powerpc/mm: move platform specific mmu-xxx.h in platform directories

2018-11-29 Thread Christophe Leroy
The purpose of this patch is to move the platform specific
mmu-xxx.h files into platform directories, like the pte-xxx.h files.

In the meantime this patch creates common nohash and
nohash/32 + nohash/64 mmu.h files for future common parts.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/mmu.h | 14 ++
 arch/powerpc/include/asm/{ => nohash/32}/mmu-40x.h |  0
 arch/powerpc/include/asm/{ => nohash/32}/mmu-44x.h |  0
 arch/powerpc/include/asm/{ => nohash/32}/mmu-8xx.h |  0
 arch/powerpc/include/asm/nohash/32/mmu.h   | 19 +++
 arch/powerpc/include/asm/nohash/64/mmu.h   |  8 
 arch/powerpc/include/asm/{ => nohash}/mmu-book3e.h |  0
 arch/powerpc/include/asm/nohash/mmu.h  | 11 +++
 arch/powerpc/kernel/cpu_setup_fsl_booke.S  |  2 +-
 arch/powerpc/kvm/e500.h|  2 +-
 10 files changed, 42 insertions(+), 14 deletions(-)
 rename arch/powerpc/include/asm/{ => nohash/32}/mmu-40x.h (100%)
 rename arch/powerpc/include/asm/{ => nohash/32}/mmu-44x.h (100%)
 rename arch/powerpc/include/asm/{ => nohash/32}/mmu-8xx.h (100%)
 create mode 100644 arch/powerpc/include/asm/nohash/32/mmu.h
 create mode 100644 arch/powerpc/include/asm/nohash/64/mmu.h
 rename arch/powerpc/include/asm/{ => nohash}/mmu-book3e.h (100%)
 create mode 100644 arch/powerpc/include/asm/nohash/mmu.h

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index eb20eb3b8fb0..2184021b0e1c 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -341,18 +341,8 @@ static inline void mmu_early_init_devtree(void) { }
 #if defined(CONFIG_PPC_STD_MMU_32)
 /* 32-bit classic hash table MMU */
 #include 
-#elif defined(CONFIG_40x)
-/* 40x-style software loaded TLB */
-#  include 
-#elif defined(CONFIG_44x)
-/* 44x-style software loaded TLB */
-#  include 
-#elif defined(CONFIG_PPC_BOOK3E_MMU)
-/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
-#  include 
-#elif defined (CONFIG_PPC_8xx)
-/* Motorola/Freescale 8xx software loaded TLB */
-#  include 
+#elif defined(CONFIG_PPC_MMU_NOHASH)
+#include 
 #endif
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/include/asm/mmu-40x.h 
b/arch/powerpc/include/asm/nohash/32/mmu-40x.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-40x.h
rename to arch/powerpc/include/asm/nohash/32/mmu-40x.h
diff --git a/arch/powerpc/include/asm/mmu-44x.h 
b/arch/powerpc/include/asm/nohash/32/mmu-44x.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-44x.h
rename to arch/powerpc/include/asm/nohash/32/mmu-44x.h
diff --git a/arch/powerpc/include/asm/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-8xx.h
rename to arch/powerpc/include/asm/nohash/32/mmu-8xx.h
diff --git a/arch/powerpc/include/asm/nohash/32/mmu.h 
b/arch/powerpc/include/asm/nohash/32/mmu.h
new file mode 100644
index ..af0e8b54876a
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/32/mmu.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_32_MMU_H_
+#define _ASM_POWERPC_NOHASH_32_MMU_H_
+
+#if defined(CONFIG_40x)
+/* 40x-style software loaded TLB */
+#include 
+#elif defined(CONFIG_44x)
+/* 44x-style software loaded TLB */
+#include 
+#elif defined(CONFIG_PPC_BOOK3E_MMU)
+/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
+#include 
+#elif defined (CONFIG_PPC_8xx)
+/* Motorola/Freescale 8xx software loaded TLB */
+#include 
+#endif
+
+#endif /* _ASM_POWERPC_NOHASH_32_MMU_H_ */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h 
b/arch/powerpc/include/asm/nohash/64/mmu.h
new file mode 100644
index ..87871d027b75
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_64_MMU_H_
+#define _ASM_POWERPC_NOHASH_64_MMU_H_
+
+/* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
+#include 
+
+#endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/nohash/mmu-book3e.h
similarity index 100%
rename from arch/powerpc/include/asm/mmu-book3e.h
rename to arch/powerpc/include/asm/nohash/mmu-book3e.h
diff --git a/arch/powerpc/include/asm/nohash/mmu.h 
b/arch/powerpc/include/asm/nohash/mmu.h
new file mode 100644
index ..a037cb1efb57
--- /dev/null
+++ b/arch/powerpc/include/asm/nohash/mmu.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_POWERPC_NOHASH_MMU_H_
+#define _ASM_POWERPC_NOHASH_MMU_H_
+
+#ifdef CONFIG_PPC64
+#include 
+#else
+#include 
+#endif
+
+#endif /* _ASM_POWERPC_NOHASH_MMU_H_ */
diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S 
b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
index 8d142e5d84cd..5fbc890d1094 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ 

[PATCH v9 06/20] powerpc/mm: Move pgtable_t into platform headers

2018-11-29 Thread Christophe Leroy
This patch moves pgtable_t into the platform headers.

It gets rid of the CONFIG_PPC_64K_PAGES case for PPC64
as nohash/64 doesn't support CONFIG_PPC_64K_PAGES.

Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/mmu-hash.h |  2 ++
 arch/powerpc/include/asm/book3s/64/mmu.h  |  9 +
 arch/powerpc/include/asm/nohash/32/mmu.h  |  4 
 arch/powerpc/include/asm/nohash/64/mmu.h  |  4 
 arch/powerpc/include/asm/page.h   | 14 --
 5 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
index e38c91388c40..5bd26c218b94 100644
--- a/arch/powerpc/include/asm/book3s/32/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/32/mmu-hash.h
@@ -42,6 +42,8 @@ struct ppc_bat {
u32 batu;
u32 batl;
 };
+
+typedef struct page *pgtable_t;
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
b/arch/powerpc/include/asm/book3s/64/mmu.h
index 6328857f259f..1ceee000c18d 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_POWERPC_BOOK3S_64_MMU_H_
 #define _ASM_POWERPC_BOOK3S_64_MMU_H_
 
+#include 
+
 #ifndef __ASSEMBLY__
 /*
  * Page size definition
@@ -24,6 +26,13 @@ struct mmu_psize_def {
 };
 extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 
+/*
+ * For BOOK3s 64 with 4k and 64K linux page size
+ * we want to use pointers, because the page table
+ * actually store pfn
+ */
+typedef pte_t *pgtable_t;
+
 #endif /* __ASSEMBLY__ */
 
 /* 64-bit classic hash table MMU */
diff --git a/arch/powerpc/include/asm/nohash/32/mmu.h 
b/arch/powerpc/include/asm/nohash/32/mmu.h
index af0e8b54876a..f61f933a4cd8 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu.h
@@ -16,4 +16,8 @@
 #include 
 #endif
 
+#ifndef __ASSEMBLY__
+typedef struct page *pgtable_t;
+#endif
+
 #endif /* _ASM_POWERPC_NOHASH_32_MMU_H_ */
diff --git a/arch/powerpc/include/asm/nohash/64/mmu.h 
b/arch/powerpc/include/asm/nohash/64/mmu.h
index 87871d027b75..e6585480dfc4 100644
--- a/arch/powerpc/include/asm/nohash/64/mmu.h
+++ b/arch/powerpc/include/asm/nohash/64/mmu.h
@@ -5,4 +5,8 @@
 /* Freescale Book-E software loaded TLB or Book-3e (ISA 2.06+) MMU */
 #include 
 
+#ifndef __ASSEMBLY__
+typedef struct page *pgtable_t;
+#endif
+
 #endif /* _ASM_POWERPC_NOHASH_64_MMU_H_ */
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 9ea903221a9f..a7624a3b1435 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -335,20 +335,6 @@ void arch_free_page(struct page *page, int order);
 #endif
 
 struct vm_area_struct;
-#ifdef CONFIG_PPC_BOOK3S_64
-/*
- * For BOOK3s 64 with 4k and 64K linux page size
- * we want to use pointers, because the page table
- * actually store pfn
- */
-typedef pte_t *pgtable_t;
-#else
-#if defined(CONFIG_PPC_64K_PAGES) && defined(CONFIG_PPC64)
-typedef pte_t *pgtable_t;
-#else
-typedef struct page *pgtable_t;
-#endif
-#endif
 
 #include 
 #endif /* __ASSEMBLY__ */
-- 
2.13.3



[PATCH v9 09/20] powerpc/mm: enable the use of page table cache of order 0

2018-11-29 Thread Christophe Leroy
hugepages use a cache of order 0. Let's allow page tables
of order 0 in the common part in order to avoid open coding
it in hugetlb.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/32/pgalloc.h | 5 +
 arch/powerpc/include/asm/book3s/64/pgalloc.h | 5 +
 arch/powerpc/include/asm/nohash/32/pgalloc.h | 5 +
 arch/powerpc/include/asm/nohash/64/pgalloc.h | 5 +
 arch/powerpc/mm/init-common.c| 6 +++---
 5 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgalloc.h 
b/arch/powerpc/include/asm/book3s/32/pgalloc.h
index 0f58e5b9dbe7..b5b955eb2fb7 100644
--- a/arch/powerpc/include/asm/book3s/32/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/32/pgalloc.h
@@ -25,10 +25,7 @@
 extern void __bad_pte(pmd_t *pmd);
 
 extern struct kmem_cache *pgtable_cache[];
-#define PGT_CACHE(shift) ({\
-   BUG_ON(!(shift));   \
-   pgtable_cache[(shift) - 1]; \
-   })
+#define PGT_CACHE(shift) pgtable_cache[shift]
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/include/asm/book3s/64/pgalloc.h 
b/arch/powerpc/include/asm/book3s/64/pgalloc.h
index f949dd90af9b..4aba625389c4 100644
--- a/arch/powerpc/include/asm/book3s/64/pgalloc.h
+++ b/arch/powerpc/include/asm/book3s/64/pgalloc.h
@@ -37,10 +37,7 @@ extern struct vmemmap_backing *vmemmap_list;
 #define MAX_PGTABLE_INDEX_SIZE 0xf
 
 extern struct kmem_cache *pgtable_cache[];
-#define PGT_CACHE(shift) ({\
-   BUG_ON(!(shift));   \
-   pgtable_cache[(shift) - 1]; \
-   })
+#define PGT_CACHE(shift) pgtable_cache[shift]
 
 extern pte_t *pte_fragment_alloc(struct mm_struct *, unsigned long, int);
 extern pmd_t *pmd_fragment_alloc(struct mm_struct *, unsigned long);
diff --git a/arch/powerpc/include/asm/nohash/32/pgalloc.h 
b/arch/powerpc/include/asm/nohash/32/pgalloc.h
index 7e234582dce5..17963951bdb0 100644
--- a/arch/powerpc/include/asm/nohash/32/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/32/pgalloc.h
@@ -25,10 +25,7 @@
 extern void __bad_pte(pmd_t *pmd);
 
 extern struct kmem_cache *pgtable_cache[];
-#define PGT_CACHE(shift) ({\
-   BUG_ON(!(shift));   \
-   pgtable_cache[(shift) - 1]; \
-   })
+#define PGT_CACHE(shift) pgtable_cache[shift]
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/include/asm/nohash/64/pgalloc.h 
b/arch/powerpc/include/asm/nohash/64/pgalloc.h
index e2d62d033708..e95eb499a174 100644
--- a/arch/powerpc/include/asm/nohash/64/pgalloc.h
+++ b/arch/powerpc/include/asm/nohash/64/pgalloc.h
@@ -36,10 +36,7 @@ extern struct vmemmap_backing *vmemmap_list;
 #define MAX_PGTABLE_INDEX_SIZE 0xf
 
 extern struct kmem_cache *pgtable_cache[];
-#define PGT_CACHE(shift) ({\
-   BUG_ON(!(shift));   \
-   pgtable_cache[(shift) - 1]; \
-   })
+#define PGT_CACHE(shift) pgtable_cache[shift]
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index 2b656e67f2ea..41190f2b60c2 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -40,7 +40,7 @@ static void pmd_ctor(void *addr)
memset(addr, 0, PMD_TABLE_SIZE);
 }
 
-struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE];
+struct kmem_cache *pgtable_cache[MAX_PGTABLE_INDEX_SIZE + 1];
 EXPORT_SYMBOL_GPL(pgtable_cache);  /* used by kvm_hv module */
 
 /*
@@ -71,7 +71,7 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
 * moment, gcc doesn't seem to recognize is_power_of_2 as a
 * constant expression, so so much for that. */
BUG_ON(!is_power_of_2(minalign));
-   BUG_ON((shift < 1) || (shift > MAX_PGTABLE_INDEX_SIZE));
+   BUG_ON(shift > MAX_PGTABLE_INDEX_SIZE);
 
if (PGT_CACHE(shift))
return; /* Already have a cache of this size */
@@ -83,7 +83,7 @@ void pgtable_cache_add(unsigned shift, void (*ctor)(void *))
panic("Could not allocate pgtable cache for order %d", shift);
 
kfree(name);
-   pgtable_cache[shift - 1] = new;
+   pgtable_cache[shift] = new;
 
pr_debug("Allocated pgtable cache for order %d\n", shift);
 }
-- 
2.13.3



[PATCH v9 10/20] powerpc/mm: replace hugetlb_cache by PGT_CACHE(PTE_T_ORDER)

2018-11-29 Thread Christophe Leroy
Instead of open coding cache handling for the special case
of hugepage tables having a single pte_t element, this
patch makes use of the common pgtable_cache helpers.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hugetlb.h |  2 --
 arch/powerpc/mm/hugetlbpage.c  | 26 +++---
 2 files changed, 7 insertions(+), 21 deletions(-)
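
A quick check of the PTE_T_ORDER arithmetic used below: __builtin_ffs()
returns the 1-based index of the lowest set bit, so for power-of-two sizes
the difference is log2(sizeof(pte_t) / sizeof(void *)). The sizes here are
assumptions for illustration:

	#include <assert.h>

	int main(void)
	{
		int pte_size = 8;	/* e.g. a 64-bit PTE, assumption */
		int ptr_size = 4;	/* 32-bit pointers, assumption */
		int order = __builtin_ffs(pte_size) - __builtin_ffs(ptr_size);

		assert(order == 1);	/* ffs(8) = 4, ffs(4) = 3 */
		/* with 4-byte PTEs the order is 0, which is why the previous
		 * patch made order-0 page table caches usable */
		assert(__builtin_ffs(4) - __builtin_ffs(4) == 0);
		return 0;
	}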

diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index 98004262bc87..dfb8bf236586 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -5,8 +5,6 @@
 #ifdef CONFIG_HUGETLB_PAGE
 #include 
 
-extern struct kmem_cache *hugepte_cache;
-
 #ifdef CONFIG_PPC_BOOK3S_64
 
 #include 
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8cf035e68378..c4f1263228b8 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -42,6 +42,8 @@ EXPORT_SYMBOL(HPAGE_SHIFT);
 
 #define hugepd_none(hpd)   (hpd_val(hpd) == 0)
 
+#define PTE_T_ORDER	(__builtin_ffs(sizeof(pte_t)) - __builtin_ffs(sizeof(void *)))
+
 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long 
sz)
 {
/*
@@ -61,7 +63,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t 
*hpdp,
int num_hugepd;
 
if (pshift >= pdshift) {
-   cachep = hugepte_cache;
+   cachep = PGT_CACHE(PTE_T_ORDER);
num_hugepd = 1 << (pshift - pdshift);
} else {
cachep = PGT_CACHE(pdshift - pshift);
@@ -264,7 +266,7 @@ static void hugepd_free_rcu_callback(struct rcu_head *head)
unsigned int i;
 
for (i = 0; i < batch->index; i++)
-   kmem_cache_free(hugepte_cache, batch->ptes[i]);
+   kmem_cache_free(PGT_CACHE(PTE_T_ORDER), batch->ptes[i]);
 
free_page((unsigned long)batch);
 }
@@ -277,7 +279,7 @@ static void hugepd_free(struct mmu_gather *tlb, void 
*hugepte)
 
	if (atomic_read(&tlb->mm->mm_users) < 2 ||
mm_is_thread_local(tlb->mm)) {
-   kmem_cache_free(hugepte_cache, hugepte);
+   kmem_cache_free(PGT_CACHE(PTE_T_ORDER), hugepte);
put_cpu_var(hugepd_freelist_cur);
return;
}
@@ -652,7 +654,6 @@ static int __init hugepage_setup_sz(char *str)
 }
 __setup("hugepagesz=", hugepage_setup_sz);
 
-struct kmem_cache *hugepte_cache;
 static int __init hugetlbpage_init(void)
 {
int psize;
@@ -702,21 +703,8 @@ static int __init hugetlbpage_init(void)
if (pdshift > shift)
pgtable_cache_add(pdshift - shift, NULL);
 #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
-   else if (!hugepte_cache) {
-   /*
-* Create a kmem cache for hugeptes.  The bottom bits in
-* the pte have size information encoded in them, so
-* align them to allow this
-*/
-   hugepte_cache = kmem_cache_create("hugepte-cache",
- sizeof(pte_t),
- HUGEPD_SHIFT_MASK + 1,
- 0, NULL);
-   if (hugepte_cache == NULL)
-   panic("%s: Unable to create kmem cache "
- "for hugeptes\n", __func__);
-
-   }
+   else
+   pgtable_cache_add(PTE_T_ORDER, NULL);
 #endif
}
 
-- 
2.13.3



[PATCH v9 12/20] powerpc/mm: remove unnecessary test in pgtable_cache_init()

2018-11-29 Thread Christophe Leroy
pgtable_cache_add() gracefully handles the case when a cache of that
size already exists, by returning early with the following test:

	if (PGT_CACHE(shift))
		return; /* Already have a cache of this size */

It is therefore not necessary to test for the existence of the cache
beforehand.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/init-common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
index b7ca03643d0b..1e6910eb70ed 100644
--- a/arch/powerpc/mm/init-common.c
+++ b/arch/powerpc/mm/init-common.c
@@ -111,13 +111,13 @@ void pgtable_cache_init(void)
 {
pgtable_cache_add(PGD_INDEX_SIZE);
 
-   if (PMD_CACHE_INDEX && !PGT_CACHE(PMD_CACHE_INDEX))
+   if (PMD_CACHE_INDEX)
pgtable_cache_add(PMD_CACHE_INDEX);
/*
 * In all current configs, when the PUD index exists it's the
 * same size as either the pgd or pmd index except with THP enabled
 * on book3s 64
 */
-   if (PUD_CACHE_INDEX && !PGT_CACHE(PUD_CACHE_INDEX))
+   if (PUD_CACHE_INDEX)
pgtable_cache_add(PUD_CACHE_INDEX);
 }
-- 
2.13.3



[PATCH v9 13/20] powerpc/8xx: Move SW perf counters in first 32kb of memory

2018-11-29 Thread Christophe Leroy
In order to simplify the time-critical exception handling of the 8xx
specific SW perf counters, this patch moves the counters to
the beginning of memory. This is possible because .text is readable
and the counters are never modified outside of the handlers.

By doing this, we avoid having to set a second register with
the upper part of the address of the counters.
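
The trick relies on PowerPC D-form loads treating a base register of 0 as
the literal value 0, so anything within the signed 16-bit displacement
range (i.e. the first 32kB of memory) is addressable without a preliminary
lis. A hedged before/after sketch, with a generic "counter" symbol standing
in for the actual counter names:

	/* before: a scratch register holds the high part of the address */
	lis	r10, (counter - PAGE_OFFSET)@ha
	lwz	r11, (counter - PAGE_OFFSET)@l(r10)

	/* after: the counter sits below 32k, so base 0 plus the signed
	 * 16-bit @l offset reaches it directly */
	lwz	r10, (counter - PAGE_OFFSET)@l(0)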

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 58 --
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 3b67b9533c82..c203defe49a4 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -106,6 +106,23 @@ turn_on_mmu:
mtspr   SPRN_SRR0,r0
rfi /* enables MMU */
 
+
+#ifdef CONFIG_PERF_EVENTS
+   .align  4
+
+   .globl  itlb_miss_counter
+itlb_miss_counter:
+   .space  4
+
+   .globl  dtlb_miss_counter
+dtlb_miss_counter:
+   .space  4
+
+   .globl  instruction_counter
+instruction_counter:
+   .space  4
+#endif
+
 /*
  * Exception entry code.  This code runs with address translation
  * turned off, i.e. using physical addresses.
@@ -384,17 +401,16 @@ InstructionTLBMiss:
 
 #ifdef CONFIG_PERF_EVENTS
patch_site  0f, patch__itlbmiss_perf
-0: lis r10, (itlb_miss_counter - PAGE_OFFSET)@ha
-   lwz r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
-   addir11, r11, 1
-   stw r11, (itlb_miss_counter - PAGE_OFFSET)@l(r10)
-#endif
+0: lwz r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
+   addir10, r10, 1
+   stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
 #if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
+#endif
 
 #ifdef CONFIG_HUGETLB_PAGE
 10:/* 8M pages */
@@ -509,15 +525,14 @@ DataStoreTLBMiss:
 
 #ifdef CONFIG_PERF_EVENTS
patch_site  0f, patch__dtlbmiss_perf
-0: lis r10, (dtlb_miss_counter - PAGE_OFFSET)@ha
-   lwz r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10)
-   addir11, r11, 1
-   stw r11, (dtlb_miss_counter - PAGE_OFFSET)@l(r10)
-#endif
+0: lwz r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
+   addir10, r10, 1
+   stw r10, (dtlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
mfspr   r12, SPRN_SPRG_SCRATCH2
rfi
+#endif
 
 #ifdef CONFIG_HUGETLB_PAGE
 10:/* 8M pages */
@@ -625,16 +640,13 @@ DataBreakpoint:
. = 0x1d00
 InstructionBreakpoint:
mtspr   SPRN_SPRG_SCRATCH0, r10
-   mtspr   SPRN_SPRG_SCRATCH1, r11
-   lis r10, (instruction_counter - PAGE_OFFSET)@ha
-   lwz r11, (instruction_counter - PAGE_OFFSET)@l(r10)
-   addir11, r11, -1
-   stw r11, (instruction_counter - PAGE_OFFSET)@l(r10)
+   lwz r10, (instruction_counter - PAGE_OFFSET)@l(0)
+   addir10, r10, -1
+   stw r10, (instruction_counter - PAGE_OFFSET)@l(0)
	lis	r10, 0xffff
ori r10, r10, 0x01
mtspr   SPRN_COUNTA, r10
mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
rfi
 #else
EXCEPTION(0x1d00, Trap_1d, unknown_exception, EXC_XFER_EE)
@@ -1065,17 +1077,3 @@ swapper_pg_dir:
  */
 abatron_pteptrs:
.space  8
-
-#ifdef CONFIG_PERF_EVENTS
-   .globl  itlb_miss_counter
-itlb_miss_counter:
-   .space  4
-
-   .globl  dtlb_miss_counter
-dtlb_miss_counter:
-   .space  4
-
-   .globl  instruction_counter
-instruction_counter:
-   .space  4
-#endif
-- 
2.13.3



[PATCH v9 00/20] Implement use of HW assistance on TLB table walk on 8xx

2018-11-29 Thread Christophe Leroy
The purpose of this series is to implement hardware assistance for TLB table walk
on the 8xx.

First part prepares for using HW assistance in TLB routines:
- Trivial fixes:
- Remove CONFIG_BOOKE stuff from book3S headers.
- Removal of unneeded atomic PTE update requirement for 8xx.
- Move book3s64 page fragment code into a common part for reuse by the
8xx, as 16k page size mode still uses 4k page tables.
- Fix a bug in memcache handling when standard pages and hugepages share
caches of the same size (see original discussion in 
https://patchwork.ozlabs.org/patch/957565/)
- Optimise access to 8xx perf counters (hence reducing the number of registers used)

Second part implements HW assistance in TLB routines in the following steps:
- Disable 16k page size mode and 512k hugepages
- Switch 4k to HW assistance
- Bring back 512k hugepages
- Bring back 16k page size mode.

Last part cleans up:
- Take benefit of Miss handler size reduction to regroup related parts
- Reduce number of registers used in miss handlers, freeing them for future use.

Tested successfully on 8xx and 83xx (book3s/32)

Changes in v9:
 - Fixed PTE_INDEX_SIZE and PTE_SHIFT so that we always have PTE_SHIFT == 
PTE_INDEX_SIZE
 - Using PTE_INDEX_SIZE instead of PTE_SHIFT to avoid build failure on PPC64
 - Moved PTE_FRAG_NR from pgtable.h to mmu-8xx.h to reduce ifdefs

Changes in v8:
 - Moved definitions in pgalloc.h to avoid conflicting with memcache patches.
 - Included the memcache bugfix serie in this serie to avoid conflicts between 
the two series when coming to the 512k pages patch.
 - In the 512k HW assistance patch, reduced the #ifdef mess by using 
IS_ENABLED(CONFIG_PPC_8xx) instead.

Changes in v7:
 - Reordered to get trivial and already reviewed patches in front.
 - Reordered to regroup all HW assistance related patches together.
 - Rebased on today merge branch (28 Nov)
 - Added a helper for access to mm_context_t.frag
 - Reduced the amount of changes in PPC32 to support pte_fragment
 - Applied pte_fragment to both nohash/32 and book3s/32

Changes in v6:
 - Dropped the part related to handling the GUARD attribute at PGD/PMD level.
 - Moved the commonalisation of page_fragment to the beginning (this part has
been reviewed by Aneesh)
 - Rebased on today merge branch (19 Oct)

Changes in v5:
 - Also avoid useless lock in get_pmd_from_cache()
 - A new patch to relocate mmu headers in platform specific directories
 - A new patch to distribute pgtable_t typedefs in platform specific
   mmu headers instead of the ugly #ifdef
 - Moved early_pte_alloc_kernel() in platform specific pgalloc
 - Restricted definition of PTE_FRAG_SIZE and PTE_FRAG_NR to platforms
   using the pte fragmentation.
 - arch_exit_mmap() and destroy_pagetable_cache() are now platform specific.

Changes in v4:
 - Reordered the series to put at the end the modifications which make
   L1 and L2 entries independent.
 - No modifications to ppc64 ioremap (we still have an opportunity to
   merge them, for a future patch serie)
 - 8xx code modified to use patch_site instead of patch_instruction
   to get a clearer code and avoid object pollution with global symbols
 - Moved perf counters in first 32kb of memory to optimise access
 - Split the big bang to HW assistance in several steps:
   1. Temporarily removes support of 16k pages and 512k hugepages
   2. Change TLB routines to use HW assistance for 4k pages and 8M hugepages
   3. Add back support for 512k hugepages
   4. Add back support for 16k pages (using pte_fragment as page tables are 
still 4k)

Changes in v3:
 - Fixed an issue in the 09/14 when CONFIG_PIN_TLB_TEXT was not enabled
 - Added performance measurement in the 09/14 commit log
 - Rebased on latest 'powerpc/merge' tree, which conflicted with 13/14

Changes in v2:
 - Removed the first 3 patches, which have been applied already
 - Fixed compilation errors reported by Michael
 - Squashed the commonalisation of ioremap functions into a single patch
 - Fixed the use of pte_fragment
 - Added a patch optimising perf counting of TLB misses and instructions

Christophe Leroy (20):
  powerpc/book3s32: Remove CONFIG_BOOKE dependent code
  powerpc/8xx: Remove PTE_ATOMIC_UPDATES
  powerpc/mm: Move pte_fragment_alloc() to a common location
  powerpc/mm: Avoid useless lock with single page fragments
  powerpc/mm: move platform specific mmu-xxx.h in platform directories
  powerpc/mm: Move pgtable_t into platform headers
  powerpc/mm: add helpers to get/set mm.context->pte_frag
  powerpc/mm: Extend pte_fragment functionality to PPC32
  powerpc/mm: enable the use of page table cache of order 0
  powerpc/mm: replace hugetlb_cache by PGT_CACHE(PTE_T_ORDER)
  powerpc/mm: fix a warning when a cache is common to PGD and hugepages
  powerpc/mm: remove unnecessary test in pgtable_cache_init()
  powerpc/8xx: Move SW perf counters in first 32kb of memory
  powerpc/8xx: Temporarily disable 16k pages and hugepages
  powerpc/8xx: Use hardware assistance in TLB handlers
  

[PATCH -next] powerpc/64s: Fix debugfs_simple_attr.cocci warnings

2018-11-29 Thread YueHaibing
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE
for debugfs files.

Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci

Signed-off-by: YueHaibing 
---
 arch/powerpc/kernel/security.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)
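
For reference, a minimal sketch of the resulting pattern with a
hypothetical attribute named "foo": DEFINE_DEBUGFS_ATTRIBUTE generates fops
that take the file reference via debugfs_file_get()/debugfs_file_put()
themselves, so the file can be created with debugfs_create_file_unsafe()
and skip the full proxy wrapping done by debugfs_create_file().

	#include <linux/debugfs.h>
	#include <linux/init.h>

	static u64 foo_value;

	static int foo_get(void *data, u64 *val)
	{
		*val = *(u64 *)data;
		return 0;
	}

	static int foo_set(void *data, u64 val)
	{
		*(u64 *)data = val;
		return 0;
	}

	DEFINE_DEBUGFS_ATTRIBUTE(fops_foo, foo_get, foo_set, "%llu\n");

	static int __init foo_debugfs_init(void)
	{
		debugfs_create_file_unsafe("foo", 0600, NULL, &foo_value,
					   &fops_foo);
		return 0;
	}
	device_initcall(foo_debugfs_init);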

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index 9703dce..14bb806 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -90,13 +90,14 @@ static int barrier_nospec_get(void *data, u64 *val)
return 0;
 }
 
-DEFINE_SIMPLE_ATTRIBUTE(fops_barrier_nospec,
-   barrier_nospec_get, barrier_nospec_set, "%llu\n");
+DEFINE_DEBUGFS_ATTRIBUTE(fops_barrier_nospec, barrier_nospec_get,
+barrier_nospec_set, "%llu\n");
 
 static __init int barrier_nospec_debugfs_init(void)
 {
-   debugfs_create_file("barrier_nospec", 0600, powerpc_debugfs_root, NULL,
-   &fops_barrier_nospec);
+   debugfs_create_file_unsafe("barrier_nospec", 0600,
+  powerpc_debugfs_root, NULL,
+  &fops_barrier_nospec);
return 0;
 }
 device_initcall(barrier_nospec_debugfs_init);
@@ -339,11 +340,13 @@ static int stf_barrier_get(void *data, u64 *val)
return 0;
 }
 
-DEFINE_SIMPLE_ATTRIBUTE(fops_stf_barrier, stf_barrier_get, stf_barrier_set, 
"%llu\n");
+DEFINE_DEBUGFS_ATTRIBUTE(fops_stf_barrier, stf_barrier_get, stf_barrier_set,
+"%llu\n");
 
 static __init int stf_barrier_debugfs_init(void)
 {
-   debugfs_create_file("stf_barrier", 0600, powerpc_debugfs_root, NULL, 
&fops_stf_barrier);
+   debugfs_create_file_unsafe("stf_barrier", 0600, powerpc_debugfs_root,
+  NULL, &fops_stf_barrier);
return 0;
 }
 device_initcall(stf_barrier_debugfs_init);
@@ -404,13 +407,14 @@ static int count_cache_flush_get(void *data, u64 *val)
return 0;
 }
 
-DEFINE_SIMPLE_ATTRIBUTE(fops_count_cache_flush, count_cache_flush_get,
-   count_cache_flush_set, "%llu\n");
+DEFINE_DEBUGFS_ATTRIBUTE(fops_count_cache_flush, count_cache_flush_get,
+count_cache_flush_set, "%llu\n");
 
 static __init int count_cache_flush_debugfs_init(void)
 {
-   debugfs_create_file("count_cache_flush", 0600, powerpc_debugfs_root,
-   NULL, &fops_count_cache_flush);
+   debugfs_create_file_unsafe("count_cache_flush", 0600,
+  powerpc_debugfs_root, NULL,
+  &fops_count_cache_flush);
return 0;
 }
 device_initcall(count_cache_flush_debugfs_init);



[PATCH v9 14/20] powerpc/8xx: Temporarily disable 16k pages and hugepages

2018-11-29 Thread Christophe Leroy
In preparation of making use of hardware assistance in TLB handlers,
this patch temporarily disables 16K pages and hugepages. The reason
is that when using HW assistance in 4K pages mode, the Linux model
fits the HW model for 4K pages and 8M pages.

However, for 16K pages and 512K pages some additional work is needed
to make the Linux model fit the HW model.
The 8M pages will naturally come back when we switch to
HW assistance, without any additional handling.
In order to keep the following patch smaller, the current special
handling for 8M pages is removed here as well.

Therefore the 4K pages mode will be implemented first, without
support for 512k hugepages. Then the 512k hugepages will be brought
back, and the 16K pages will be implemented in the following step.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig   |  2 +-
 arch/powerpc/kernel/head_8xx.S | 74 +++---
 arch/powerpc/mm/tlb_nohash.c   |  6 
 3 files changed, 6 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8be31261aec8..ddfccdf004fe 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -689,7 +689,7 @@ config PPC_4K_PAGES
 
 config PPC_16K_PAGES
bool "16k page size"
-   depends on 44x || PPC_8xx
+   depends on 44x
 
 config PPC_64K_PAGES
bool "64k page size"
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index c203defe49a4..01f58b1d9ae7 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -314,7 +314,7 @@ SystemCall:
 InstructionTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
mtspr   SPRN_SPRG_SCRATCH1, r11
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mtspr   SPRN_SPRG_SCRATCH2, r12
 #endif
 
@@ -325,10 +325,8 @@ InstructionTLBMiss:
INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
/* Only modules will cause ITLB Misses as we always
 * pin the first 8MB of kernel memory */
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
-   mfcrr12
-#endif
 #ifdef ITLB_MISS_KERNEL
+   mfcrr12
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
	andis.  r11, r10, 0x8000	/* Address >= 0x80000000 */
 #else
@@ -360,15 +358,9 @@ InstructionTLBMiss:
 
/* Extract level 2 index */
rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-#ifdef CONFIG_HUGETLB_PAGE
-   mtcrr11
-   bt- 28, 10f /* bit 28 = Large page (8M) */
-   bt- 29, 20f /* bit 29 = Large page (8M or 512k) */
-#endif
rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
lwz r10, 0(r10) /* Get the pte */
-4:
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mtcrr12
 #endif
/* Load the MI_TWC with the attributes for this "segment." */
@@ -393,7 +385,7 @@ InstructionTLBMiss:
/* Restore registers */
 0: mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
@@ -406,35 +398,12 @@ InstructionTLBMiss:
stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr   r10, SPRN_SPRG_SCRATCH0
mfspr   r11, SPRN_SPRG_SCRATCH1
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_HUGETLB_PAGE)
+#ifdef ITLB_MISS_KERNEL
mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
 #endif
 
-#ifdef CONFIG_HUGETLB_PAGE
-10:/* 8M pages */
-#ifdef CONFIG_PPC_16K_PAGES
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT_8M - PAGE_SHIFT), 32 + PAGE_SHIFT_8M 
- (PAGE_SHIFT << 1), 29
-   /* Add level 2 base */
-   rlwimi  r10, r11, 0, 0, 32 + PAGE_SHIFT_8M - (PAGE_SHIFT << 1) - 1
-#else
-   /* Level 2 base */
-   rlwinm  r10, r11, 0, ~HUGEPD_SHIFT_MASK
-#endif
-   lwz r10, 0(r10) /* Get the pte */
-   b   4b
-
-20:/* 512k pages */
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT_512K - PAGE_SHIFT), 32 + 
PAGE_SHIFT_512K - (PAGE_SHIFT << 1), 29
-   /* Add level 2 base */
-   rlwimi  r10, r11, 0, 0, 32 + PAGE_SHIFT_512K - (PAGE_SHIFT << 1) - 1
-   lwz r10, 0(r10) /* Get the pte */
-   b   4b
-#endif
-
. = 0x1200
 DataStoreTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
@@ -472,11 +441,6 @@ DataStoreTLBMiss:
 */
/* Extract level 2 index */
rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-#ifdef CONFIG_HUGETLB_PAGE
-   mtcrr11
-   bt- 28, 10f /* bit 28 = Large page (8M) */
-   bt- 29, 20f /* bit 29 = Large page (8M or 512k) */
-#endif
rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
   

[PATCH v9 15/20] powerpc/8xx: Use hardware assistance in TLB handlers

2018-11-29 Thread Christophe Leroy
Today, on the 8xx the TLB handlers do a SW tablewalk, doing all
the calculation in ASM in order to match the Linux page
table structure.

The 8xx offers hardware assistance which allows a significant size
reduction of the TLB handlers, and hence also reduces the time spent
in the handlers.

However, using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbytes area which is managed by a level 2 table (PTE) containing
also 1024 entries, each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table
- 512k page PTEs have to be spread every 128 bytes in the L2 table
- 8M page PTEs are at the address pointed to by the L1 entry, and each
8M page requires 2 identical entries in the PGD (see the sketch below).
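
As a quick sanity check of that geometry (a stand-alone sketch assuming 4k
base pages; the sizes are taken from the constraints above):

	#include <assert.h>

	int main(void)
	{
		unsigned long l2_span = 1024UL * 4096;		/* one L2 table: 4 MB */
		unsigned long l1_span = 1024UL * l2_span;	/* full L1: 4 GB */

		assert(l2_span == 4UL << 20);
		assert(l1_span == 4UL << 30);
		assert(16384 / 4096 == 4);		/* 16k page: 4 identical L2 entries */
		assert((8UL << 20) / l2_span == 2);	/* 8M page: 2 identical L1 entries */
		return 0;
	}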

This patch modifies the TLB handlers to use HW assistance for 4K PAGES.

Before that patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 80 ticks
- DTLB miss: 62 ticks
After that patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 72 ticks
- DTLB miss: 54 ticks
So the improvement is 10% for ITLB misses and 13% for DTLB misses.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 58 +-
 arch/powerpc/mm/8xx_mmu.c  |  4 +--
 2 files changed, 26 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 01f58b1d9ae7..85fb4b8bf6c7 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -292,7 +292,7 @@ SystemCall:
. = 0x1100
 /*
  * For the MPC8xx, this is a software tablewalk to load the instruction
- * TLB.  The task switch loads the M_TW register with the pointer to the first
+ * TLB.  The task switch loads the M_TWB register with the pointer to the first
  * level table.
  * If we discover there is no second level table (value is zero) or if there
  * is an invalid pte, we load that into the TLB, which causes another fault
@@ -323,6 +323,7 @@ InstructionTLBMiss:
 */
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
+   mtspr   SPRN_MD_EPN, r10
/* Only modules will cause ITLB Misses as we always
 * pin the first 8MB of kernel memory */
 #ifdef ITLB_MISS_KERNEL
@@ -339,7 +340,7 @@ InstructionTLBMiss:
 #endif
 #endif
 #endif
-   mfspr   r11, SPRN_M_TW  /* Get level 1 table */
+   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
 #ifdef ITLB_MISS_KERNEL
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
beq+3f
@@ -349,16 +350,14 @@ InstructionTLBMiss:
 #ifndef CONFIG_PIN_TLB_TEXT
blt cr7, ITLBMissLinear
 #endif
-   lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+   rlwinm  r11, r11, 0, 20, 31
+   orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
 #endif
-   /* Insert level 1 index */
-   rlwimi  r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 
29
lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the 
level 1 entry */
 
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-   rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
+   mtspr   SPRN_MD_TWC, r11
+   mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
 #ifdef ITLB_MISS_KERNEL
mtcrr12
@@ -417,7 +416,7 @@ DataStoreTLBMiss:
mfspr   r10, SPRN_MD_EPN
rlwinm  r11, r10, 16, 0xfff8
cmpli   cr0, r11, PAGE_OFFSET@h
-   mfspr   r11, SPRN_M_TW  /* Get level 1 table */
+   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
blt+3f
rlwinm  r11, r10, 16, 0xfff8
 #ifndef CONFIG_PIN_TLB_IMMR
@@ -430,20 +429,16 @@ DataStoreTLBMiss:
patch_site  0b, patch__dtlbmiss_immr_jmp
 #endif
blt cr7, DTLBMissLinear
-   lis r11, (swapper_pg_dir-PAGE_OFFSET)@ha
+   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
+   rlwinm  r11, r11, 0, 20, 31
+   orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
-
-   /* Insert level 1 index */
-   rlwimi  r11, r10, 32 - ((PAGE_SHIFT - 2) << 1), (PAGE_SHIFT - 2) << 1, 
29
lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)/* Get the 
level 1 entry */
 
-   /* We have a pte table, so load fetch the pte from the table.
-*/
-   /* Extract level 2 index */
-   rlwinm  r10, r10, 32 - (PAGE_SHIFT - 2), 32 - PAGE_SHIFT, 29
-   rlwimi  r10, r11, 0, 0, 32 - PAGE_SHIFT - 1 /* Add level 2 base */
+   mtspr   SPRN_MD_TWC, r11
+   mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
-4:
+
mtcrr12
 
/* Insert the Guarded flag into the TWC from the Linux PTE.
@@ -668,9 +663,10 @@ FixupDAR:/* Entry point for dcbx workaround. */
mtspr   

[PATCH v9 17/20] powerpc/8xx: Enable 512k hugepage support with HW assistance

2018-11-29 Thread Christophe Leroy
For using 512k pages with hardware assistance, the PTEs have to be spread
every 128 bytes in the L2 table.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/hugetlb.h |  4 +++-
 arch/powerpc/mm/hugetlbpage.c  | 10 +-
 arch/powerpc/mm/tlb_nohash.c   |  3 +++
 3 files changed, 15 insertions(+), 2 deletions(-)
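
A worked example of the 8xx specific index computation in hugepte_offset()
below (values are illustrative): indexing by PAGE_SHIFT instead of
hugepd_shift() means every 4k slice of the region resolves to its own slot,
which is why the PTE value has to be replicated across the region instead
of being stored once.

	#include <assert.h>

	int main(void)
	{
		unsigned long pdshift = 22;		/* L1 entry covers 4 MB, assumption */
		unsigned long page_shift = 12;		/* 4k base pages */
		unsigned long addr = 0x10072000UL;	/* arbitrary address */

		unsigned long idx = (addr & ((1UL << pdshift) - 1)) >> page_shift;
		assert(idx == ((addr >> page_shift) & 1023));	/* 1024 slots per L2 table */
		return 0;
	}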

diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index dfb8bf236586..62a0ca02ca7d 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -74,7 +74,9 @@ static inline pte_t *hugepte_offset(hugepd_t hpd, unsigned 
long addr,
unsigned long idx = 0;
 
pte_t *dir = hugepd_page(hpd);
-#ifndef CONFIG_PPC_FSL_BOOK3E
+#ifdef CONFIG_PPC_8xx
+   idx = (addr & ((1UL << pdshift) - 1)) >> PAGE_SHIFT;
+#elif !defined(CONFIG_PPC_FSL_BOOK3E)
idx = (addr & ((1UL << pdshift) - 1)) >> hugepd_shift(hpd);
 #endif
 
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index bc97874d7c74..5b236621d302 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -65,6 +65,9 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t 
*hpdp,
if (pshift >= pdshift) {
cachep = PGT_CACHE(PTE_T_ORDER);
num_hugepd = 1 << (pshift - pdshift);
+   } else if (IS_ENABLED(CONFIG_PPC_8xx)) {
+   cachep = PGT_CACHE(PTE_INDEX_SIZE);
+   num_hugepd = 1;
} else {
cachep = PGT_CACHE(pdshift - pshift);
num_hugepd = 1;
@@ -331,6 +334,9 @@ static void free_hugepd_range(struct mmu_gather *tlb, 
hugepd_t *hpdp, int pdshif
 
if (shift >= pdshift)
hugepd_free(tlb, hugepte);
+   else if (IS_ENABLED(CONFIG_PPC_8xx))
+   pgtable_free_tlb(tlb, hugepte,
+get_hugepd_cache_index(PTE_INDEX_SIZE));
else
pgtable_free_tlb(tlb, hugepte,
 get_hugepd_cache_index(pdshift - shift));
@@ -700,7 +706,9 @@ static int __init hugetlbpage_init(void)
 * if we have pdshift and shift value same, we don't
 * use pgt cache for hugepd.
 */
-   if (pdshift > shift)
+   if (pdshift > shift && IS_ENABLED(CONFIG_PPC_8xx))
+   pgtable_cache_add(PTE_INDEX_SIZE);
+   else if (pdshift > shift)
pgtable_cache_add(pdshift - shift);
 #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
else
diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 8ad7aab150b7..ae5d568e267f 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -97,6 +97,9 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
.shift  = 14,
},
 #endif
+   [MMU_PAGE_512K] = {
+   .shift  = 19,
+   },
[MMU_PAGE_8M] = {
.shift  = 23,
},
-- 
2.13.3



[PATCH v9 18/20] powerpc/8xx: reintroduce 16K pages with HW assistance

2018-11-29 Thread Christophe Leroy
Using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbytes area which is managed by a level 2 table (PTE) containing
also 1024 entries, each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table
- 512k page PTEs have to be spread every 128 bytes in the L2 table
- 8M page PTEs are at the address pointed to by the L1 entry, and each
8M page requires 2 identical entries in the PGD.

In order to use hardware assistance with 16K pages, this patch does
the following modifications:
- Make the PGD size independent of the main page size
- In 16k pages mode, redefine pte_t as a struct with 4 elements,
and populate those 4 elements in __set_pte_at() and pte_update()
- Adapt the size of the hugepage tables.
- Define PTE_FRAG_NR so that a 16k page contains 4 page tables.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 2 +-
 arch/powerpc/include/asm/nohash/32/mmu-8xx.h | 4 
 arch/powerpc/include/asm/nohash/32/pgtable.h | 8 +++-
 arch/powerpc/include/asm/nohash/pgtable.h| 4 
 arch/powerpc/include/asm/page_32.h   | 3 ++-
 arch/powerpc/include/asm/pgtable-types.h | 4 
 6 files changed, 22 insertions(+), 3 deletions(-)
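
A quick check of the 16k mode constants introduced below (a stand-alone
sketch): the page table geometry stays 4k based, so a 16k Linux page is
backed by 4 identical HW PTEs, and one 16k page holds PTE_FRAG_NR = 4 page
tables of PTE_FRAG_SIZE = 4k each.

	#include <assert.h>

	int main(void)
	{
		unsigned long page_size = 16384;	/* 16k Linux pages */
		unsigned long hw_page = 4096;		/* 4k HW entries */
		unsigned long frag_nr = 4, frag_size = 1UL << 12;

		assert(page_size / hw_page == 4);	/* identical PTE words per page */
		assert(frag_nr * frag_size == page_size);
		return 0;
	}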

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ddfccdf004fe..8be31261aec8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -689,7 +689,7 @@ config PPC_4K_PAGES
 
 config PPC_16K_PAGES
bool "16k page size"
-   depends on 44x
+   depends on 44x || PPC_8xx
 
 config PPC_64K_PAGES
bool "64k page size"
diff --git a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h 
b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
index fa05aa566ece..b0f764c827c0 100644
--- a/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
+++ b/arch/powerpc/include/asm/nohash/32/mmu-8xx.h
@@ -190,6 +190,7 @@ typedef struct {
struct slice_mask mask_8m;
 # endif
 #endif
+   void *pte_frag;
 } mm_context_t;
 
 #define PHYS_IMMR_BASE (mfspr(SPRN_IMMR) & 0xfff80000)
@@ -244,6 +245,9 @@ extern s32 patch__itlbmiss_perf, patch__dtlbmiss_perf;
 #define mmu_virtual_psize  MMU_PAGE_4K
 #elif defined(CONFIG_PPC_16K_PAGES)
 #define mmu_virtual_psize  MMU_PAGE_16K
+#define PTE_FRAG_NR	4
+#define PTE_FRAG_SIZE_SHIFT	12
+#define PTE_FRAG_SIZE  (1UL << 12)
 #else
 #error "Unsupported PAGE_SIZE"
 #endif
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index e9c8604cc8c5..bed433358260 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -232,7 +232,13 @@ static inline unsigned long pte_update(pte_t *p,
: "cc" );
 #else /* PTE_ATOMIC_UPDATES */
unsigned long old = pte_val(*p);
-   *p = __pte((old & ~clr) | set);
+   unsigned long new = (old & ~clr) | set;
+
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+   p->pte = p->pte1 = p->pte2 = p->pte3 = new;
+#else
+   *p = __pte(new);
+#endif
 #endif /* !PTE_ATOMIC_UPDATES */
 
 #ifdef CONFIG_44x
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 70ff23974b59..1ca1c1864b32 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -209,7 +209,11 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
/* Anything else just stores the PTE normally. That covers all 64-bit
 * cases, and 32-bit non-hash with 32-bit PTEs.
 */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+   ptep->pte = ptep->pte1 = ptep->pte2 = ptep->pte3 = pte_val(pte);
+#else
*ptep = pte;
+#endif
 
/*
 * With hardware tablewalk, a sync is needed to ensure that
diff --git a/arch/powerpc/include/asm/page_32.h b/arch/powerpc/include/asm/page_32.h
index 5c378e9b78c8..683dfbc67ca8 100644
--- a/arch/powerpc/include/asm/page_32.h
+++ b/arch/powerpc/include/asm/page_32.h
@@ -22,7 +22,8 @@
 #define PTE_FLAGS_OFFSET   0
 #endif
 
-#ifdef CONFIG_PPC_256K_PAGES
+#if defined(CONFIG_PPC_256K_PAGES) || \
+(defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES))
 #define PTE_SHIFT  (PAGE_SHIFT - PTE_T_LOG2 - 2)   /* 1/4 of a page */
 #else
 #define PTE_SHIFT  (PAGE_SHIFT - PTE_T_LOG2)   /* full page */
diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index eccb30b38b47..3b0edf041b2e 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -3,7 +3,11 @@
 #define _ASM_POWERPC_PGTABLE_TYPES_H
 
 /* PTE level */
+#if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
+typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
+#else
 typedef struct { pte_basic_t pte; } pte_t;
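
A quick worked check of the resulting 16k layout (assuming
PAGE_SHIFT = 14 and sizeof(pte_t) = 16, i.e. PTE_T_LOG2 = 4, as implied
by the 4-word struct above):

#include <assert.h>

int main(void)
{
	int page_shift = 14;				/* 16 KiB page */
	int pte_t_log2 = 4;				/* 4 x 32-bit words */
	int pte_shift = page_shift - pte_t_log2 - 2;	/* 1/4 of a page */

	assert(pte_shift == 8);		/* 256 pte_t entries per table */
	/* Each pte_t maps 16 KiB, so one table still covers 4 MiB,
	 * and one 16 KiB page holds PTE_FRAG_NR = 4 such tables. */
	assert((1L << pte_shift) * (1L << page_shift) == 4L << 20);
	return 0;
}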

[PATCH v9 20/20] powerpc/8xx: regroup TLB handler routines

2018-11-29 Thread Christophe Leroy
As this code runs with the MMU off, the CPU only does speculative
fetches for code in the same page.

Following the significant size reduction of the TLB handler routines,
the side handlers can be brought back close to the main part,
i.e. into the same page.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 112 -
 1 file changed, 54 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 0a4f8a9c85ff..b171b7c0a0e7 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -399,6 +399,23 @@ InstructionTLBMiss:
rfi
 #endif
 
+#ifndef CONFIG_PIN_TLB_TEXT
+ITLBMissLinear:
+   mtcr    r11
+   /* Set 8M byte page and mark it valid */
+   li  r11, MI_PS8MEG | MI_SVALID
+   mtspr   SPRN_MI_TWC, r11
+   rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
+   ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT
+   mtspr   SPRN_MI_RPN, r10    /* Update TLB entry */
+
+0: mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r11, SPRN_SPRG_SCRATCH1
+   rfi
+   patch_site  0b, patch__itlbmiss_exit_2
+#endif
+
. = 0x1200
 DataStoreTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
@@ -484,6 +501,43 @@ DataStoreTLBMiss:
rfi
 #endif
 
+DTLBMissIMMR:
+   mtcr    r11
+   /* Set 512k byte guarded page and mark it valid */
+   li  r10, MD_PS512K | MD_GUARDED | MD_SVALID
+   mtspr   SPRN_MD_TWC, r10
+   mfspr   r10, SPRN_IMMR  /* Get current IMMR */
+   rlwinm  r10, r10, 0, 0xfff80000 /* Get 512 kbytes boundary */
+   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT | _PAGE_NO_CACHE
+   mtspr   SPRN_MD_RPN, r10    /* Update TLB entry */
+
+   li  r11, RPN_PATTERN
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+
+0: mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r11, SPRN_SPRG_SCRATCH1
+   rfi
+   patch_site  0b, patch__dtlbmiss_exit_2
+
+DTLBMissLinear:
+   mtcr    r11
+   /* Set 8M byte page and mark it valid */
+   li  r11, MD_PS8MEG | MD_SVALID
+   mtspr   SPRN_MD_TWC, r11
+   rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
+   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
+ _PAGE_PRESENT
+   mtspr   SPRN_MD_RPN, r10    /* Update TLB entry */
+
+   li  r11, RPN_PATTERN
+   mtspr   SPRN_DAR, r11   /* Tag DAR */
+
+0: mfspr   r10, SPRN_SPRG_SCRATCH0
+   mfspr   r11, SPRN_SPRG_SCRATCH1
+   rfi
+   patch_site  0b, patch__dtlbmiss_exit_3
+
+
 /* This is an instruction TLB error on the MPC8xx.  This could be due
  * to many reasons, such as executing guarded memory or illegal instruction
  * addresses.  There is nothing to do but handle a big time error fault.
@@ -583,64 +637,6 @@ InstructionBreakpoint:
 
. = 0x2000
 
-/*
- * Bottom part of DataStoreTLBMiss handlers for IMMR area and linear RAM.
- * not enough space in the DataStoreTLBMiss area.
- */
-DTLBMissIMMR:
-   mtcr    r11
-   /* Set 512k byte guarded page and mark it valid */
-   li  r10, MD_PS512K | MD_GUARDED | MD_SVALID
-   mtspr   SPRN_MD_TWC, r10
-   mfspr   r10, SPRN_IMMR  /* Get current IMMR */
-   rlwinm  r10, r10, 0, 0xfff80000 /* Get 512 kbytes boundary */
-   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT | _PAGE_NO_CACHE
-   mtspr   SPRN_MD_RPN, r10    /* Update TLB entry */
-
-   li  r11, RPN_PATTERN
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
-
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
-   rfi
-   patch_site  0b, patch__dtlbmiss_exit_2
-
-DTLBMissLinear:
-   mtcr    r11
-   /* Set 8M byte page and mark it valid */
-   li  r11, MD_PS8MEG | MD_SVALID
-   mtspr   SPRN_MD_TWC, r11
-   rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
-   ori r10, r10, 0xf0 | MD_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
- _PAGE_PRESENT
-   mtspr   SPRN_MD_RPN, r10    /* Update TLB entry */
-
-   li  r11, RPN_PATTERN
-   mtspr   SPRN_DAR, r11   /* Tag DAR */
-
-0: mfspr   r10, SPRN_SPRG_SCRATCH0
-   mfspr   r11, SPRN_SPRG_SCRATCH1
-   rfi
-   patch_site  0b, patch__dtlbmiss_exit_3
-
-#ifndef CONFIG_PIN_TLB_TEXT
-ITLBMissLinear:
-   mtcr    r11
-   /* Set 8M byte page and mark it valid */
-   li  r11, MI_PS8MEG | MI_SVALID
-   mtspr   SPRN_MI_TWC, r11
-   rlwinm  r10, r10, 20, 0x0f800000    /* 8xx supports max 256Mb RAM */
-   ori r10, r10, 0xf0 | MI_SPS16K | _PAGE_SH | _PAGE_DIRTY | \
-   

[PATCH v9 16/20] powerpc/8xx: Enable 8M hugepage support with HW assistance

2018-11-29 Thread Christophe Leroy
HW assistance naturally supports 8M huge pages without
further modifications.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/tlb_nohash.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/mm/tlb_nohash.c b/arch/powerpc/mm/tlb_nohash.c
index 4f79639e432f..8ad7aab150b7 100644
--- a/arch/powerpc/mm/tlb_nohash.c
+++ b/arch/powerpc/mm/tlb_nohash.c
@@ -97,6 +97,9 @@ struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
.shift  = 14,
},
 #endif
+   [MMU_PAGE_8M] = {
+   .shift  = 23,
+   },
 };
 #else
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT] = {
-- 
2.13.3
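
As a sanity check on the new entry (nothing 8xx-specific here), a
.shift of 23 encodes a 2^23-byte page, i.e. 8 MiB:

#include <stdio.h>

int main(void)
{
	unsigned int shift = 23;	/* .shift of MMU_PAGE_8M above */

	printf("%lu bytes = %lu MiB\n", 1UL << shift, (1UL << shift) >> 20);
	/* prints: 8388608 bytes = 8 MiB */
	return 0;
}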



[PATCH v9 19/20] powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers

2018-11-29 Thread Christophe Leroy
This patch reworks the TLB Miss handlers so that they do not use the
r12 register, hence avoiding the need to save it into
SPRN_SPRG_SCRATCH2.

In the DAR fixup code we can now use SPRN_M_TW, freeing
SPRN_SPRG_SCRATCH2.

SPRN_SPRG_SCRATCH2 may then be used for something else in the future.
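
In C terms, the reworked CPU15 workaround in the diff below does the
following (an illustrative sketch only; tlbie() is a hypothetical
stand-in for the asm instruction, and the real macro adjusts the
address register in place and restores it afterwards, which is what
removes the need for a scratch register):

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Hypothetical stand-in for the 'tlbie' instruction */
static void tlbie(unsigned long addr)
{
	printf("tlbie %#lx\n", addr);
}

static void invalidate_adjacent_pages(unsigned long addr)
{
	tlbie(addr + PAGE_SIZE);	/* the page after addr */
	tlbie(addr - PAGE_SIZE);	/* the page before addr */
}

int main(void)
{
	invalidate_adjacent_pages(0x2000);
	return 0;
}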

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 110 ++---
 1 file changed, 49 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 85fb4b8bf6c7..0a4f8a9c85ff 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -302,90 +302,87 @@ SystemCall:
  */
 
 #ifdef CONFIG_8xx_CPU15
-#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr) \
-   addi    tmp, addr, PAGE_SIZE;   \
-   tlbie   tmp;    \
-   addi    tmp, addr, -PAGE_SIZE;  \
-   tlbie   tmp
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)  \
+   addi    addr, addr, PAGE_SIZE;  \
+   tlbie   addr;   \
+   addi    addr, addr, -(PAGE_SIZE << 1);  \
+   tlbie   addr;   \
+   addi    addr, addr, PAGE_SIZE
 #else
-#define INVALIDATE_ADJACENT_PAGES_CPU15(tmp, addr)
+#define INVALIDATE_ADJACENT_PAGES_CPU15(addr)
 #endif
 
 InstructionTLBMiss:
mtspr   SPRN_SPRG_SCRATCH0, r10
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
mtspr   SPRN_SPRG_SCRATCH1, r11
-#ifdef ITLB_MISS_KERNEL
-   mtspr   SPRN_SPRG_SCRATCH2, r12
 #endif
 
/* If we are faulting a kernel address, we have to use the
 * kernel page tables.
 */
mfspr   r10, SPRN_SRR0  /* Get effective address of fault */
-   INVALIDATE_ADJACENT_PAGES_CPU15(r11, r10)
+   INVALIDATE_ADJACENT_PAGES_CPU15(r10)
mtspr   SPRN_MD_EPN, r10
/* Only modules will cause ITLB Misses as we always
 * pin the first 8MB of kernel memory */
 #ifdef ITLB_MISS_KERNEL
-   mfcr    r12
+   mfcr    r11
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
-   andis.  r11, r10, 0x8000    /* Address >= 0x80000000 */
+   cmpi    cr0, r10, 0 /* Address >= 0x80000000 */
 #else
-   rlwinm  r11, r10, 16, 0xfff8
-   cmpli   cr0, r11, PAGE_OFFSET@h
+   rlwinm  r10, r10, 16, 0xfff8
+   cmpli   cr0, r10, PAGE_OFFSET@h
 #ifndef CONFIG_PIN_TLB_TEXT
/* It is assumed that kernel code fits into the first 8M page */
-0: cmpli   cr7, r11, (PAGE_OFFSET + 0x0800000)@h
+0: cmpli   cr7, r10, (PAGE_OFFSET + 0x0800000)@h
patch_site  0b, patch__itlbmiss_linmem_top
 #endif
 #endif
 #endif
-   mfspr   r11, SPRN_M_TWB /* Get level 1 table */
+   mfspr   r10, SPRN_M_TWB /* Get level 1 table */
 #ifdef ITLB_MISS_KERNEL
 #if defined(SIMPLE_KERNEL_ADDRESS) && defined(CONFIG_PIN_TLB_TEXT)
-   beq+    3f
+   bge+    3f
 #else
    blt+    3f
 #endif
 #ifndef CONFIG_PIN_TLB_TEXT
blt cr7, ITLBMissLinear
 #endif
-   rlwinm  r11, r11, 0, 20, 31
-   orisr11, r11, (swapper_pg_dir - PAGE_OFFSET)@ha
+   rlwinm  r10, r10, 0, 20, 31
+   orisr10, r10, (swapper_pg_dir - PAGE_OFFSET)@ha
 3:
 #endif
-   lwz r11, (swapper_pg_dir-PAGE_OFFSET)@l(r11)    /* Get the level 1 entry */
+   lwz r10, (swapper_pg_dir-PAGE_OFFSET)@l(r10)    /* Get level 1 entry */
+   mtspr   SPRN_MI_TWC, r10    /* Set segment attributes */
 
-   mtspr   SPRN_MD_TWC, r11
+   mtspr   SPRN_MD_TWC, r10
mfspr   r10, SPRN_MD_TWC
lwz r10, 0(r10) /* Get the pte */
 #ifdef ITLB_MISS_KERNEL
-   mtcr    r12
+   mtcr    r11
 #endif
-   /* Load the MI_TWC with the attributes for this "segment." */
-   mtspr   SPRN_MI_TWC, r11/* Set segment attributes */
-
 #ifdef CONFIG_SWAP
rlwinm  r11, r10, 32-5, _PAGE_PRESENT
and r11, r11, r10
rlwimi  r10, r11, 0, _PAGE_PRESENT
 #endif
-   li  r11, RPN_PATTERN | 0x200
/* The Linux PTE won't go exactly into the MMU TLB.
 * Software indicator bits 20 and 23 must be clear.
 * Software indicator bits 22, 24, 25, 26, and 27 must be
 * set.  All other Linux PTE bits control the behavior
 * of the MMU.
 */
-   rlwimi  r11, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */
-   rlwimi  r10, r11, 0, 0x0ff0 /* Set 22, 24-27, clear 20,23 */
+   rlwimi  r10, r10, 0, 0x0f00 /* Clear bits 20-23 */
+   rlwimi  r10, r10, 4, 0x0400 /* Copy _PAGE_EXEC into bit 21 */
+   ori r10, r10, RPN_PATTERN | 0x200 /* Set 22 and 24-27 */
mtspr   SPRN_MI_RPN, r10    /* Update TLB entry */
 
/* Restore registers */
 0: mfspr   r10, SPRN_SPRG_SCRATCH0
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
mfspr   r11, SPRN_SPRG_SCRATCH1
-#ifdef ITLB_MISS_KERNEL
-   mfspr   r12, SPRN_SPRG_SCRATCH2
 #endif
rfi
patch_site  0b, 

Re: use generic DMA mapping code in powerpc V4

2018-11-29 Thread Christian Zigotzky

On 29 November 2018 at 1:05PM, Christian Zigotzky wrote:

On 28 November 2018 at 12:05PM, Michael Ellerman wrote:

Christoph Hellwig  writes:


Any comments?  I'd like to at least get the ball moving on the easy
bits.

Nothing specific yet.

I'm a bit worried it might break one of the many old obscure platforms
we have that aren't well tested.

There's not much we can do about that, but I'll just try and test it on
everything I can find.

Is the plan that you take these via the dma-mapping tree or that they go
via powerpc?

cheers


Hi All,

I compiled a test kernel from the following Git today.

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.4 



Command: git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.4 a


Unfortunately I get some DMA error messages and the PASEMI ethernet 
doesn't work anymore.


[  367.627623] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.627631] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.627639] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.627647] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.627655] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.627686] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.628418] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.628505] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.628592] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.629324] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.629417] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.629495] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0
[  367.629589] pci 0000:00:1a.0: dma_direct_map_page: overflow 0x00026bcb5002+110 of device mask  bus mask 0


[  430.424732] pasemi_mac: rcmdsta error: 0x04ef3001

I tested this kernel with the Nemo board (CPU: PWRficient PA6T-1682M). 
The PASEMI ethernet works with the RC4 of kernel 4.20.
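
For readers unfamiliar with these messages, here is a rough C sketch of
the capability check that emits them (modelled on the generic
dma-direct code of that era, heavily simplified and with made-up names;
the point is that a mapping is rejected when the end of the buffer does
not fit under the effective mask, so a missing or zero device mask
rejects everything):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A bus mask of 0 means "no bus constraint". */
static bool dma_addr_fits(uint64_t addr, size_t size,
			  uint64_t dev_mask, uint64_t bus_mask)
{
	uint64_t end = addr + size - 1;
	uint64_t limit = dev_mask;

	if (bus_mask && bus_mask < limit)
		limit = bus_mask;
	return dev_mask && end <= limit;
}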


Cheers,
Christian


Hi All,

I tested this kernel on my NXP QorIQ P5020 board. U-Boot loads the dtb 
file and the kernel, but after that booting stops. This board works 
with the RC4 of kernel 4.20. Please test this kernel on your NXP and 
PASEMI boards.


Thanks,
Christian



[PATCH v1 09/13] powerpc/mmu: add is_strict_kernel_rwx() helper

2018-11-29 Thread Christophe Leroy
Add a helper, strict_kernel_rwx_enabled(), to know whether
STRICT_KERNEL_RWX is enabled.

This is based on the rodata_enabled flag, which is defined only
when CONFIG_STRICT_KERNEL_RWX is selected.
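
A hypothetical call site, just to show the intent: the stub for the
!CONFIG_STRICT_KERNEL_RWX case returns a constant false, so callers
need no #ifdef and the compiler discards the dead branch (standalone
sketch, not kernel code):

#include <stdbool.h>

#define CONFIG_STRICT_KERNEL_RWX 1	/* pretend the option is set */

#ifdef CONFIG_STRICT_KERNEL_RWX
static bool rodata_enabled = true;
static inline bool strict_kernel_rwx_enabled(void) { return rodata_enabled; }
#else
static inline bool strict_kernel_rwx_enabled(void) { return false; }
#endif

int main(void)
{
	if (strict_kernel_rwx_enabled())
		;	/* e.g. map the kernel text read-only here */
	return 0;
}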

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/mmu.h | 11 +++
 arch/powerpc/mm/init_32.c  |  4 +---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index eb20eb3b8fb0..6343cbf5b651 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -268,6 +268,17 @@ static inline u16 get_mm_addr_key(struct mm_struct *mm, 
unsigned long address)
 }
 #endif /* CONFIG_PPC_MEM_KEYS */
 
+#ifdef CONFIG_STRICT_KERNEL_RWX
+static inline bool strict_kernel_rwx_enabled(void)
+{
+   return rodata_enabled;
+}
+#else
+static inline bool strict_kernel_rwx_enabled(void)
+{
+   return false;
+}
+#endif
 #endif /* !__ASSEMBLY__ */
 
 /* The kernel use the constants below to index in the page sizes array.
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index 3e59e5d64b01..ee5a430b9a18 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -108,12 +108,10 @@ static void __init MMU_setup(void)
__map_without_bats = 1;
__map_without_ltlbs = 1;
}
-#ifdef CONFIG_STRICT_KERNEL_RWX
-   if (rodata_enabled) {
+   if (strict_kernel_rwx_enabled()) {
__map_without_bats = 1;
__map_without_ltlbs = 1;
}
-#endif
 }
 
 /*
-- 
2.13.3



Re: use generic DMA mapping code in powerpc V4

2018-11-29 Thread Christoph Hellwig
> > Please don't apply the new DMA mapping code unless you are sure it 
> > works on all supported PowerPC machines. Is the new DMA mapping code 
> > really necessary? It's not nice to rewrite code when the old code 
> > works perfectly. We must not forget that we work for the end users. Does 
> > the end user have advantages with this new code? Is it faster? The old 
> > code works without any problems. 
> 
> There is another service provided to the users as well: new code that is
> cleaner and simpler, which allows easier bug fixes and new features.
> Without being familiar with the DMA mapping code I cannot really say if
> that's the case here.

Yes, the main point is to move all architectures to common code for the
DMA direct mapping.  This means we have one code base that sees
bugs fixed and features introduced the same way for everyone.


Re: use generic DMA mapping code in powerpc V4

2018-11-29 Thread Christoph Hellwig
On Wed, Nov 28, 2018 at 10:05:19PM +1100, Michael Ellerman wrote:
> Is the plan that you take these via the dma-mapping tree or that they go
> via powerpc?

In principle either way is fine with me.  If it goes through the powerpc
tree we might run into a few minor conflicts with the dma-mapping tree
depending on how some of the current discussions go.


Re: use generic DMA mapping code in powerpc V4

2018-11-29 Thread Christoph Hellwig
On Thu, Nov 29, 2018 at 01:05:23PM +0100, Christian Zigotzky wrote:
> I compiled a test kernel from the following Git today.
>
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.4
>
> Command: git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.4 a
>
> Unfortunately I get some DMA error messages and the PASEMI ethernet doesn't 
> work anymore.

What kind of machine is this (and your other one)?  Can you send me
(or point me to) the .config files?