Re: [PATCH] powerpc/pseries: extract host bridge from pci_bus prior to bus removal

2021-04-16 Thread Daniel Axtens
Hi Tyrel,

> The pci_bus->bridge reference may no longer be valid after
> pci_bus_remove() resulting in passing a bad value to device_unregister()
> for the associated bridge device.
>
> Store the host_bridge reference in a separate variable prior to
> pci_bus_remove().
>
The patch certainly seems to do what you say. I'm not really up on the
innards of PCI, so I'm struggling to figure out by what code path
pci_bus_remove() might invalidate pci_bus->bridge? A quick look at
pci_remove_bus was not very illuminating but I didn't chase down every
call it made.
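
My best guess at the hazard - and this is only a sketch, I haven't
traced the call chain, so please treat the path below as an assumption -
is something like:

pci_remove_bus(b);
	/* -> device_unregister(&b->dev)
	 *    -> put_device(&b->dev)
	 *       -> release_pcibus_dev(), which kfree()s 'b' if that
	 *          was the last reference                            */
device_unregister(b->bridge);	/* if 'b' was freed above, this load of
				 * b->bridge is a use-after-free, which
				 * is what stashing host_bridge first
				 * avoids */

If that is the actual path it would be good to spell it out in the
commit message.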

Kind regards,
Daniel

> Fixes: 7340056567e3 ("powerpc/pci: Reorder pci bus/bridge unregistration 
> during PHB removal")
> Signed-off-by: Tyrel Datwyler 
> ---
>  arch/powerpc/platforms/pseries/pci_dlpar.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c 
> b/arch/powerpc/platforms/pseries/pci_dlpar.c
> index f9ae17e8a0f4..a8f9140a24fa 100644
> --- a/arch/powerpc/platforms/pseries/pci_dlpar.c
> +++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
> @@ -50,6 +50,7 @@ EXPORT_SYMBOL_GPL(init_phb_dynamic);
>  int remove_phb_dynamic(struct pci_controller *phb)
>  {
>   struct pci_bus *b = phb->bus;
> + struct pci_host_bridge *host_bridge = to_pci_host_bridge(b->bridge);
>   struct resource *res;
>   int rc, i;
>  
> @@ -76,7 +77,8 @@ int remove_phb_dynamic(struct pci_controller *phb)
>   /* Remove the PCI bus and unregister the bridge device from sysfs */
>   phb->bus = NULL;
>   pci_remove_bus(b);
> - device_unregister(b->bridge);
> + host_bridge->bus = NULL;
> + device_unregister(_bridge->dev);
>  
>   /* Now release the IO resource */
>   if (res->flags & IORESOURCE_IO)
> -- 
> 2.27.0


Re: [PATCH] soc: fsl: qe: remove unused function

2021-04-16 Thread Daniel Axtens
Hi Jiapeng,

> Fix the following clang warning:
>
> drivers/soc/fsl/qe/qe_ic.c:234:29: warning: unused function
> 'qe_ic_from_irq' [-Wunused-function].
>
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 
> ---
>  drivers/soc/fsl/qe/qe_ic.c | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c
> index 0390af9..b573712 100644
> --- a/drivers/soc/fsl/qe/qe_ic.c
> +++ b/drivers/soc/fsl/qe/qe_ic.c
> @@ -231,11 +231,6 @@ static inline void qe_ic_write(__be32  __iomem *base, 
> unsigned int reg,
>   qe_iowrite32be(value, base + (reg >> 2));
>  }
>  
> -static inline struct qe_ic *qe_ic_from_irq(unsigned int virq)
> -{
> - return irq_get_chip_data(virq);
> -}

This seems good to me.

 * We know that this function can't be called directly from outside the
   file, because it is static.

 * The function address isn't used as a function pointer anywhere, so
   that means it can't be called from outside the file that way (also
   it's inline, which would make using a function pointer unwise!)

 * There are no obvious macros in that file that might construct the name
   of the function in a way that is hidden from grep.

All in all, I am fairly confident that the function is indeed not used.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> -
>  static inline struct qe_ic *qe_ic_from_irq_data(struct irq_data *d)
>  {
>   return irq_data_get_irq_chip_data(d);
> -- 
> 1.8.3.1


Re: [PATCH] symbol : Make the size of the compile-related array fixed

2021-04-16 Thread Daniel Axtens
Hi,

Thanks for your contribution to the kernel!

I notice that your patch was submitted as an attachment. In future,
please could you submit your patch inline, rather than as an attachment?
See https://www.kernel.org/doc/html/v4.15/process/5.Posting.html
I'd recommend you use git send-email if possible: see e.g.
https://www.kernel.org/doc/html/v4.15/process/email-clients.html

> Subject: [PATCH] symbol : Make the size of the compile-related array fixed
>
> For the same code, the machine's user name, hostname, or compilation time
> may cause the kernel symbol address to be inconsistent, which is not
> friendly to some symbol-dependent software, such as Crash.

If I understand correctly, this patch makes it easier to recompile the
kernel from the same source but at a different time or on a different
machine or with a different user, but still get the same symbols.
Is that right?
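
As I read it, the mechanism is roughly this - my own sketch, not your
code, the string contents are made up, and 512 is just the
COMPILE_STR_MAX you add in mkcompile_h:

#define COMPILE_STR_MAX 512

/* With a fixed-size array the symbol's size - and therefore the
 * placement of whatever the linker lays out after it - no longer
 * depends on how long the user/host/date string happens to be. */
static const char build_str[COMPILE_STR_MAX] =
	"5.12.0 (user@host) (gcc 10.2.1) #1 SMP Fri Apr 16 2021";

/* sizeof(build_str) == 512 for every build, whereas the unsized
 * version's size varies with LINUX_COMPILE_BY/HOST and UTS_VERSION. */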

I wonder if there are other reproducible build techniques that might be
simpler to apply? There is a kernel documentation page at
https://www.kernel.org/doc/html/latest/kbuild/reproducible-builds.html
which gives existing techniques to override the date, user and host.
Would they be sufficient to address your use case?

Kind regards,
Daniel

>
> Signed-off-by: Han Dapeng 
> ---
>  arch/powerpc/mm/nohash/kaslr_booke.c | 2 +-
>  arch/s390/boot/version.c | 2 +-
>  arch/x86/boot/compressed/kaslr.c | 2 +-
>  arch/x86/boot/version.c  | 2 +-
>  init/version.c   | 4 ++--
>  scripts/mkcompile_h  | 2 ++
>  6 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/mm/nohash/kaslr_booke.c 
> b/arch/powerpc/mm/nohash/kaslr_booke.c
> index 4c74e8a5482b..494ef408e60c 100644
> --- a/arch/powerpc/mm/nohash/kaslr_booke.c
> +++ b/arch/powerpc/mm/nohash/kaslr_booke.c
> @@ -37,7 +37,7 @@ struct regions {
>  };
>  
>  /* Simplified build-specific string for starting entropy. */
> -static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
> +static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" 
> LINUX_COMPILE_BY "@"
>   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
>  
>  struct regions __initdata regions;
> diff --git a/arch/s390/boot/version.c b/arch/s390/boot/version.c
> index d32e58bdda6a..627416a27d74 100644
> --- a/arch/s390/boot/version.c
> +++ b/arch/s390/boot/version.c
> @@ -3,5 +3,5 @@
>  #include 
>  #include "boot.h"
>  
> -const char kernel_version[] = UTS_RELEASE
> +const char kernel_version[COMPILE_STR_MAX] = UTS_RELEASE
>   " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") " UTS_VERSION;
> diff --git a/arch/x86/boot/compressed/kaslr.c 
> b/arch/x86/boot/compressed/kaslr.c
> index b92fffbe761f..7b72b518a4c8 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -43,7 +43,7 @@
>  extern unsigned long get_cmd_line_ptr(void);
>  
>  /* Simplified build-specific string for starting entropy. */
> -static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
> +static const char build_str[COMPILE_STR_MAX] = UTS_RELEASE " (" 
> LINUX_COMPILE_BY "@"
>   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
>  
>  static unsigned long rotate_xor(unsigned long hash, const void *area,
> diff --git a/arch/x86/boot/version.c b/arch/x86/boot/version.c
> index a1aaaf6c06a6..08feaa2d7a10 100644
> --- a/arch/x86/boot/version.c
> +++ b/arch/x86/boot/version.c
> @@ -14,6 +14,6 @@
>  #include 
>  #include 
>  
> -const char kernel_version[] =
> +const char kernel_version[COMPILE_STR_MAX] =
>   UTS_RELEASE " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ") "
>   UTS_VERSION;
> diff --git a/init/version.c b/init/version.c
> index 92afc782b043..adfc9e91b56b 100644
> --- a/init/version.c
> +++ b/init/version.c
> @@ -35,11 +35,11 @@ struct uts_namespace init_uts_ns = {
>  EXPORT_SYMBOL_GPL(init_uts_ns);
>  
>  /* FIXED STRINGS! Don't touch! */
> -const char linux_banner[] =
> +const char linux_banner[COMPILE_STR_MAX] =
>   "Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
>   LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
>  
> -const char linux_proc_banner[] =
> +const char linux_proc_banner[COMPILE_STR_MAX] =
>   "%s version %s"
>   " (" LINUX_COMPILE_BY "@" LINUX_COMPILE_HOST ")"
>   " (" LINUX_COMPILER ") %s\n";
> diff --git a/scripts/mkcompile_h b/scripts/mkcompile_h
> index 4ae735039daf..02b9d9d54da9 100755
> --- a/scripts/mkcompile_h
> +++ b/scripts/mkcompile_h
> @@ -65,6 +65,8 @@ UTS_VERSION="$(echo $UTS_VERSION $CONFIG_FLAGS $TIMESTAMP | 
> cut -b -$UTS_LEN)"
>LD_VERSION=$($LD -v | head -n1 | sed 's/(compatible with [^)]*)//' \
> | sed 's/[[:space:]]*$//')
>printf '#define LINUX_COMPILER "%s"\n' "$CC_VERSION, $LD_VERSION"
> +
> +  echo \#define COMPILE_STR_MAX 512
>  } > .tmpcompile
>  
>  # Only replace the real compile.h if the new one is different,
> -- 
> 2.27.0


Re: [PATCH v1 4/5] mm: ptdump: Support hugepd table entries

2021-04-15 Thread Daniel Axtens
Hi Christophe,

> Which hugepd, page table entries can be at any level
> and can be of any size.
>
> Add support for them.
>
> Signed-off-by: Christophe Leroy 
> ---
>  mm/ptdump.c | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/mm/ptdump.c b/mm/ptdump.c
> index 61cd16afb1c8..6efdb8c15a7d 100644
> --- a/mm/ptdump.c
> +++ b/mm/ptdump.c
> @@ -112,11 +112,24 @@ static int ptdump_pte_entry(pte_t *pte, unsigned long 
> addr,
>  {
>   struct ptdump_state *st = walk->private;
>   pte_t val = ptep_get(pte);
> + unsigned long page_size = next - addr;
> + int level;
> +
> + if (page_size >= PGDIR_SIZE)
> + level = 0;
> + else if (page_size >= P4D_SIZE)
> + level = 1;
> + else if (page_size >= PUD_SIZE)
> + level = 2;
> + else if (page_size >= PMD_SIZE)
> + level = 3;
> + else
> + level = 4;
>  
>   if (st->effective_prot)
> - st->effective_prot(st, 4, pte_val(val));
> + st->effective_prot(st, level, pte_val(val));
>  
> - st->note_page(st, addr, 4, pte_val(val), PAGE_SIZE);
> + st->note_page(st, addr, level, pte_val(val), page_size);

It seems to me that passing both level and page_size is a bit redundant,
but I guess it does reduce the impact on each arch's code?

Kind regards,
Daniel

>  
>   return 0;
>  }
> -- 
> 2.25.0


Re: [PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

2021-04-15 Thread Daniel Axtens
Hi Christophe,

>  static void note_page(struct ptdump_state *pt_st, unsigned long addr, int 
> level,
> -   u64 val)
> +   u64 val, unsigned long page_size)

Compilers can warn about unused parameters at -Wextra level.  However,
reading scripts/Makefile.extrawarn it looks like the warning is
explicitly _disabled_ in the kernel at W=1 and not reenabled at W=2 or
W=3. So I guess this is fine...

> @@ -126,7 +126,7 @@ static int ptdump_hole(unsigned long addr, unsigned long 
> next,
>  {
>   struct ptdump_state *st = walk->private;
>  
> - st->note_page(st, addr, depth, 0);
> + st->note_page(st, addr, depth, 0, 0);

I know it doesn't matter at this point, but I'm not really thrilled by
the idea of passing 0 as the size here. Doesn't the hole have a known
page size?
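
Something like this is what I had in mind - depth_to_size() is a
made-up helper, purely to illustrate:

	/* Hypothetical: derive a notional size for the hole from its
	 * depth (e.g. 2 -> PUD_SIZE, 3 -> PMD_SIZE, 4 -> PAGE_SIZE)
	 * rather than passing 0. */
	st->note_page(st, addr, depth, 0, depth_to_size(depth));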

>  
>   return 0;
>  }
> @@ -153,5 +153,5 @@ void ptdump_walk_pgd(struct ptdump_state *st, struct 
> mm_struct *mm, pgd_t *pgd)
>   mmap_read_unlock(mm);
>  
>   /* Flush out the last page */
> - st->note_page(st, 0, -1, 0);
> + st->note_page(st, 0, -1, 0, 0);

I'm more OK with the idea of passing 0 as the size when the depth is -1
(don't know): if we don't know the depth we conceptually can't know the
page size.

Regards,
Daniel



Re: [PATCH v1 1/5] mm: pagewalk: Fix walk for hugepage tables

2021-04-15 Thread Daniel Axtens
Hi Christophe,

> Pagewalk ignores hugepd entries and walk down the tables
> as if it was traditionnal entries, leading to crazy result.
>
> Add walk_hugepd_range() and use it to walk hugepage tables.
>
> Signed-off-by: Christophe Leroy 
> ---
>  mm/pagewalk.c | 54 +--
>  1 file changed, 48 insertions(+), 6 deletions(-)
>
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index e81640d9f177..410a9d8f7572 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -58,6 +58,32 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, 
> unsigned long end,
>   return err;
>  }
>  
> +static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
> +  unsigned long end, struct mm_walk *walk, int 
> pdshift)
> +{
> + int err = 0;
> +#ifdef CONFIG_ARCH_HAS_HUGEPD
> + const struct mm_walk_ops *ops = walk->ops;
> + int shift = hugepd_shift(*phpd);
> + int page_size = 1 << shift;
> +
> + if (addr & (page_size - 1))
> + return 0;
> +
> + for (;;) {
> + pte_t *pte = hugepte_offset(*phpd, addr, pdshift);
> +
> + err = ops->pte_entry(pte, addr, addr + page_size, walk);
> + if (err)
> + break;
> + if (addr >= end - page_size)
> + break;
> + addr += page_size;
> + }

Initially I thought this was a somewhat unintuitive way to structure
this loop, but I see it parallels the structure of walk_pte_range_inner,
so I think the consistency is worth it.

I notice the pte walking code potentially takes some locks: does this
code need to do that?

arch/powerpc/mm/hugetlbpage.c says that hugepds are protected by the
mm->page_table_lock, but I don't think we're taking it in this code.

> +#endif
> + return err;
> +}
> +
>  static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
> struct mm_walk *walk)
>  {
> @@ -108,7 +134,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long 
> addr, unsigned long end,
>   goto again;
>   }
>  
> - err = walk_pte_range(pmd, addr, next, walk);
> + if (is_hugepd(__hugepd(pmd_val(*pmd
> + err = walk_hugepd_range((hugepd_t *)pmd, addr, next, 
> walk, PMD_SHIFT);
> + else
> + err = walk_pte_range(pmd, addr, next, walk);
>   if (err)
>   break;
>   } while (pmd++, addr = next, addr != end);
> @@ -157,7 +186,10 @@ static int walk_pud_range(p4d_t *p4d, unsigned long 
> addr, unsigned long end,
>   if (pud_none(*pud))
>   goto again;
>  
> - err = walk_pmd_range(pud, addr, next, walk);
> + if (is_hugepd(__hugepd(pud_val(*pud
> + err = walk_hugepd_range((hugepd_t *)pud, addr, next, 
> walk, PUD_SHIFT);
> + else
> + err = walk_pmd_range(pud, addr, next, walk);

I'm a bit worried you might end up calling into walk_hugepd_range with
ops->pte_entry == NULL, and then jumping to 0.

static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
  struct mm_walk *walk)
{
...
pud = pud_offset(p4d, addr);
do {
...
if ((!walk->vma && (pud_leaf(*pud) || !pud_present(*pud))) ||
walk->action == ACTION_CONTINUE ||
!(ops->pmd_entry || ops->pte_entry)) <<< THIS CHECK
continue;
...
if (is_hugepd(__hugepd(pud_val(*pud
err = walk_hugepd_range((hugepd_t *)pud, addr, next, 
walk, PUD_SHIFT);
else
err = walk_pmd_range(pud, addr, next, walk);
if (err)
break;
} while (pud++, addr = next, addr != end);

walk_pud_range will proceed if there is _either_ an ops->pmd_entry _or_
an ops->pte_entry, but walk_hugepd_range will call ops->pte_entry
unconditionally.

The same issue applies to walk_{p4d,pgd}_range...
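
Something like this at the top of walk_hugepd_range() would close that
hole, I think (sketch only):

	/* Mirror the ops->pmd_entry/ops->pte_entry check quoted above
	 * so we never call a NULL ->pte_entry callback. */
	if (!ops->pte_entry)
		return 0;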

Kind regards,
Daniel


Re: [PATCH] powerpc/iommu/debug: Remove redundant NULL check

2021-03-26 Thread Daniel Axtens
Daniel Axtens  writes:

It looks like the kernel test robot also reported this:
"[PATCH] powerpc/iommu/debug: fix ifnullfree.cocci warnings"
Weirdly I don't see it in patchwork.

I'm not sure which one mpe will want to take but either would do.

>> Fix the following coccicheck warnings:
>>
>> ./fs/io_uring.c:5989:4-9: WARNING: NULL check before some freeing
>> functions is not needed.

(Also, while unimportant, that's technically not the error you fix here
as it's for a different file!)

>
> This looks correct to me, and matches the description of debugfs_remove
> in Documentation/filesystems/debugfs.rst.
>
> If you have a number of similar fixes it might be helpful to do them in
> a single bigger patch, but I'm not sure if coccicheck reports much else
> as I don't have coccinelle installed at the moment.
>
> Reviewed-by: Daniel Axtens 
>
> Kind regards,
> Daniel


Re: [PATCH] crypto: vmx: fix incorrect kernel-doc comment syntax in files

2021-03-26 Thread Daniel Axtens
Hi Aditya,

Thanks for your patch!

> The opening comment mark '/**' is used for highlighting the beginning of
> kernel-doc comments.
> There are certain files in drivers/crypto/vmx, which follow this syntax,
> but the content inside does not comply with kernel-doc.
> Such lines were probably not meant for kernel-doc parsing, but are parsed
> due to the presence of kernel-doc like comment syntax(i.e, '/**'), which
> causes unexpected warnings from kernel-doc.
>
> E.g., presence of kernel-doc like comment in the header line for
> drivers/crypto/vmx/vmx.c causes this warning by kernel-doc:
>
> "warning: expecting prototype for Routines supporting VMX instructions on the 
> Power 8(). Prototype was for p8_init() instead"

checkpatch (scripts/checkpatch.pl --strict -g HEAD) complains about this line:
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per 
line)
but checkpatch should be ignored here, as you did the right thing by not
breaking an error message across multiple lines.

> Similarly for other files too.
>
> Provide a simple fix by replacing such occurrences with general comment
> format, i.e. '/*', to prevent kernel-doc from parsing it.

This makes sense.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

>
> Signed-off-by: Aditya Srivastava 
> ---
> * Applies perfectly on next-20210319
>
>  drivers/crypto/vmx/aes.c | 2 +-
>  drivers/crypto/vmx/aes_cbc.c | 2 +-
>  drivers/crypto/vmx/aes_ctr.c | 2 +-
>  drivers/crypto/vmx/aes_xts.c | 2 +-
>  drivers/crypto/vmx/ghash.c   | 2 +-
>  drivers/crypto/vmx/vmx.c | 2 +-
>  6 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/crypto/vmx/aes.c b/drivers/crypto/vmx/aes.c
> index d05c02baebcf..ec06189fbf99 100644
> --- a/drivers/crypto/vmx/aes.c
> +++ b/drivers/crypto/vmx/aes.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> -/**
> +/*
>   * AES routines supporting VMX instructions on the Power 8
>   *
>   * Copyright (C) 2015 International Business Machines Inc.
> diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
> index d88084447f1c..ed0debc7acb5 100644
> --- a/drivers/crypto/vmx/aes_cbc.c
> +++ b/drivers/crypto/vmx/aes_cbc.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> -/**
> +/*
>   * AES CBC routines supporting VMX instructions on the Power 8
>   *
>   * Copyright (C) 2015 International Business Machines Inc.
> diff --git a/drivers/crypto/vmx/aes_ctr.c b/drivers/crypto/vmx/aes_ctr.c
> index 79ba062ee1c1..9a3da8cd62f3 100644
> --- a/drivers/crypto/vmx/aes_ctr.c
> +++ b/drivers/crypto/vmx/aes_ctr.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> -/**
> +/*
>   * AES CTR routines supporting VMX instructions on the Power 8
>   *
>   * Copyright (C) 2015 International Business Machines Inc.
> diff --git a/drivers/crypto/vmx/aes_xts.c b/drivers/crypto/vmx/aes_xts.c
> index 9fee1b1532a4..dabbccb41550 100644
> --- a/drivers/crypto/vmx/aes_xts.c
> +++ b/drivers/crypto/vmx/aes_xts.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> -/**
> +/*
>   * AES XTS routines supporting VMX In-core instructions on Power 8
>   *
>   * Copyright (C) 2015 International Business Machines Inc.
> diff --git a/drivers/crypto/vmx/ghash.c b/drivers/crypto/vmx/ghash.c
> index 14807ac2e3b9..5bc5710a6de0 100644
> --- a/drivers/crypto/vmx/ghash.c
> +++ b/drivers/crypto/vmx/ghash.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0
> -/**
> +/*
>   * GHASH routines supporting VMX instructions on the Power 8
>   *
>   * Copyright (C) 2015, 2019 International Business Machines Inc.
> diff --git a/drivers/crypto/vmx/vmx.c b/drivers/crypto/vmx/vmx.c
> index a40d08e75fc0..7eb713cc87c8 100644
> --- a/drivers/crypto/vmx/vmx.c
> +++ b/drivers/crypto/vmx/vmx.c
> @@ -1,5 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0-only
> -/**
> +/*
>   * Routines supporting VMX instructions on the Power 8
>   *
>   * Copyright (C) 2015 International Business Machines Inc.
> -- 
> 2.17.1


Re: [PATCH] powerpc/iommu/debug: Remove redundant NULL check

2021-03-25 Thread Daniel Axtens
Hi Jiapeng Chong,

> Fix the following coccicheck warnings:
>
> ./fs/io_uring.c:5989:4-9: WARNING: NULL check before some freeing
> functions is not needed.

This looks correct to me, and matches the description of debugfs_remove
in Documentation/filesystems/debugfs.rst.
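
For anyone following along, the pattern coccicheck complains about is
the classic (variable name here is illustrative):

	if (dentry)			/* redundant */
		debugfs_remove(dentry);

which can simply be:

	debugfs_remove(dentry);

because debugfs_remove() is documented to do nothing when passed a NULL
or error pointer.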

If you have a number of similar fixes it might be helpful to do them in
a single bigger patch, but I'm not sure if coccicheck reports much else
as I don't have coccinelle installed at the moment.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


Re: [PATCH] [v2] tools: testing: Remove duplicate includes

2021-03-25 Thread Daniel Axtens
Wan Jiabing  writes:

> sched.h has been included at line 33, so remove the 
> duplicate one at line 36.

> pthread.h has been included at line 17,so remove the 
> duplicate one at line 20.

I can see that both of these are correct from the diff.

> inttypes.h has been included at line 19, so remove the 
> duplicate one at line 23.

For this one I checked the file. Indeed there is another inttypes.h, so
this is also correct.

Again, all the automated checks pass. (although I don't think any of the
automated builds include selftests.)

So:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

>
> Signed-off-by: Wan Jiabing 
> ---
>  tools/testing/selftests/powerpc/mm/tlbie_test.c | 1 -
>  tools/testing/selftests/powerpc/tm/tm-poison.c  | 1 -
>  tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c | 1 -
>  3 files changed, 3 deletions(-)
>
> diff --git a/tools/testing/selftests/powerpc/mm/tlbie_test.c 
> b/tools/testing/selftests/powerpc/mm/tlbie_test.c
> index f85a0938ab25..48344a74b212 100644
> --- a/tools/testing/selftests/powerpc/mm/tlbie_test.c
> +++ b/tools/testing/selftests/powerpc/mm/tlbie_test.c
> @@ -33,7 +33,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> diff --git a/tools/testing/selftests/powerpc/tm/tm-poison.c 
> b/tools/testing/selftests/powerpc/tm/tm-poison.c
> index 29e5f26af7b9..27c083a03d1f 100644
> --- a/tools/testing/selftests/powerpc/tm/tm-poison.c
> +++ b/tools/testing/selftests/powerpc/tm/tm-poison.c
> @@ -20,7 +20,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "tm.h"
>  
> diff --git a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c 
> b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
> index e2a0c07e8362..9ef37a9836ac 100644
> --- a/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
> +++ b/tools/testing/selftests/powerpc/tm/tm-vmx-unavail.c
> @@ -17,7 +17,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #include "tm.h"
>  #include "utils.h"
> -- 
> 2.25.1


Re: [PATCH] [v2] arch: powerpc: Remove duplicate includes

2021-03-25 Thread Daniel Axtens
Wan Jiabing  writes:

> mmu-hash.h: asm/bug.h has been included at line 12, so remove 
> the duplicate one at line 21.

Looking at the file I had wondered if this was due to a #ifdef being
removed, but no, the second one was just added in commit 891121e6c02c
("powerpc/mm: Differentiate between hugetlb and THP during page
walk"). How odd!

Anyway, all of these look good to me, and the automated checks at
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210323062916.295346-1-wanjiab...@vivo.com/
have all passed.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


Re: [PATCH -next] powerpc/eeh: Remove unused inline function eeh_dev_phb_init_dynamic()

2021-03-25 Thread Daniel Axtens
Hi,

> commit 475028efc708 ("powerpc/eeh: Remove eeh_dev_phb_init_dynamic()")
> left behind this, so can remove it.

I had a look: the inline that you are removing here is for the
!CONFIG_EEH case, which explains why it was missed the first time.
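
For anyone else reading: the leftover is the empty stub on the
!CONFIG_EEH side, something like the below - exact prototype from
memory, so don't quote me on it:

#ifdef CONFIG_EEH
void eeh_dev_phb_init_dynamic(struct pci_controller *phb);
#else
static inline void eeh_dev_phb_init_dynamic(struct pci_controller *phb) { }
#endif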

This looks like a good change. Out of interest, what tool are you using
to find these unused inlines? If there are many more, it might make
sense to combine future patches removing them into a single patch, but
I'm not sure.

checkpatch likes this patch, so that's also good :)

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


Re: [PATCH -next] powerpc/smp: Remove unused inline functions

2021-03-25 Thread Daniel Axtens
Hi,

> commit 441c19c8a290 ("powerpc/kvm/book3s_hv: Rework the secondary inhibit 
> code")
> left behind this, so can remove it.
>

Interesting: that commit removed some instances of
(un)inhibit_secondary_onlining, but it seems to have missed the ones for
the uni-processor case, which your patch removes. This seems like a good
change.

Checkpatch does have one small complaint about your commit message:

| WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a 
maximum 75 chars per line)
| #6: 
| commit 441c19c8a290 ("powerpc/kvm/book3s_hv: Rework the secondary inhibit 
code")

I don't think this warrants another revision, I think leaving the commit
name on one line makes sense.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Signed-off-by: YueHaibing 
> ---
>  arch/powerpc/include/asm/smp.h | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 7a13bc20f0a0..ad7129a19e8f 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -189,8 +189,6 @@ extern void __cpu_die(unsigned int cpu);
>  #define hard_smp_processor_id()  get_hard_smp_processor_id(0)
>  #define smp_setup_cpu_maps()
>  #define thread_group_shares_l2  0
> -static inline void inhibit_secondary_onlining(void) {}
> -static inline void uninhibit_secondary_onlining(void) {}
>  static inline const struct cpumask *cpu_sibling_mask(int cpu)
>  {
>   return cpumask_of(cpu);
> -- 
> 2.17.1


Re: [PATCH] selftests: powerpc: unmark non-kernel-doc comments

2021-03-25 Thread Daniel Axtens
Randy Dunlap  writes:

> Drop the 'beginning of kernel-doc' notation markers (/**)
> in places that are not in kernel-doc format.

This looks good to me. Arguably we don't need the comments at all, but
it doesn't seem to hurt to keep them.

checkpatch is OK with the entire file, so there's nothing else we'd
really want to clean up while you're doing cleanups.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

>
> Signed-off-by: Randy Dunlap 
> Cc: Michael Ellerman 
> Cc: linuxppc-...@lists.ozlabs.org
> ---
>  tools/testing/selftests/powerpc/tm/tm-trap.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- linux-next-20210323.orig/tools/testing/selftests/powerpc/tm/tm-trap.c
> +++ linux-next-20210323/tools/testing/selftests/powerpc/tm/tm-trap.c
> @@ -66,7 +66,7 @@ void trap_signal_handler(int signo, sigi
>   /* Get thread endianness: extract bit LE from MSR */
>   thread_endianness = MSR_LE & ucp->uc_mcontext.gp_regs[PT_MSR];
>  
> - /***
> + /*
>* Little-Endian Machine
>*/
>  
> @@ -126,7 +126,7 @@ void trap_signal_handler(int signo, sigi
>   }
>   }
>  
> - /***
> + /*
>* Big-Endian Machine
>*/
>  


Re: [PATCH v2 07/15] powerpc/uaccess: Call might_fault() inconditionaly

2021-03-25 Thread Daniel Axtens
Daniel Axtens  writes:

> Hi Christophe,
>
>> Commit 6bfd93c32a50 ("powerpc: Fix incorrect might_sleep in
>> __get_user/__put_user on kernel addresses") added a check to not call
>> might_sleep() on kernel addresses. This was to enable the use of
>> __get_user() in the alignment exception handler for any address.
>>
>> Then commit 95156f0051cb ("lockdep, mm: fix might_fault() annotation")
>> added a check of the address space in might_fault(), based on
>> set_fs() logic. But this didn't solve the powerpc alignment exception
>> case as it didn't call set_fs(KERNEL_DS).
>>
>> Nowadays, set_fs() is gone, previous patch fixed the alignment
>> exception handler and __get_user/__put_user are not supposed to be
>> used anymore to read kernel memory.
>>
>> Therefore the is_kernel_addr() check has become useless and can be
>> removed.
>
> While I agree that __get_user/__put_user should not be used on kernel
> memory, I'm not sure that we have covered every case where they might be
> used on kernel memory yet. I did a git grep for __get_user - there are
> several callers in arch/powerpc and it looks like at least lib/sstep.c
> might be using __get_user to read kernel memory while single-stepping.
>
> I am not sure if might_sleep has got logic to cover the powerpc case -
> it uses uaccess_kernel, but we don't supply a definition for that on
> powerpc, so if we do end up calling __get_user on a kernel address, I
> think we might now throw a warning. (Unless we are saved by
> pagefault_disabled()?)

Ah, I just re-read some of my earlier emails and was reminded that yes,
if we are calling __get/put, we must have disabled page faults.

So yes, this is good.
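
(For reference, the check I was worried about lives in __might_fault()
in mm/memory.c, which - roughly, quoting from memory, so the exact form
may differ - bails out before it can warn:

void __might_fault(const char *file, int line)
{
	if (uaccess_kernel())
		return;
	if (pagefault_disabled())
		return;
	__might_sleep(file, line);
	/* lockdep annotation on mmap_lock elided */
}

so as long as page faults are disabled around the kernel-address uses,
no warning fires.)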

>
> (But I haven't tested this yet, so it's possible I misunderstood
> something.)
>
> Do you expect any consequences if we've missed a case where
> __(get|put)_user is called on a kernel address because it hasn't been
> converted to use better helpers yet?
>
> Kind regards,
> Daniel
>
>>
>> Signed-off-by: Christophe Leroy 
>> ---
>>  arch/powerpc/include/asm/uaccess.h | 9 -
>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/uaccess.h 
>> b/arch/powerpc/include/asm/uaccess.h
>> index eaa828a6a419..c4bbc64758a0 100644
>> --- a/arch/powerpc/include/asm/uaccess.h
>> +++ b/arch/powerpc/include/asm/uaccess.h
>> @@ -77,8 +77,7 @@ __pu_failed:   
>> \
>>  __typeof__(*(ptr)) __pu_val = (x);  \
>>  __typeof__(size) __pu_size = (size);\
>>  \
>> -if (!is_kernel_addr((unsigned long)__pu_addr))  \
>> -might_fault();  \
>> +might_fault();  \
>>  __chk_user_ptr(__pu_addr);  \
>>  __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err);  \
>>  \
>> @@ -238,12 +237,12 @@ do {   
>> \
>>  __typeof__(size) __gu_size = (size);\
>>  \
>>  __chk_user_ptr(__gu_addr);  \
>> -if (do_allow && !is_kernel_addr((unsigned long)__gu_addr)) \
>> +if (do_allow) { 
>> \
>>  might_fault();  \
>> -if (do_allow)   
>> \
>>  __get_user_size(__gu_val, __gu_addr, __gu_size, __gu_err);  
>> \
>> -else        
>> \
>> +} else {
>> \
>>  __get_user_size_allowed(__gu_val, __gu_addr, __gu_size, 
>> __gu_err); \
>> +}   
>> \

One microscopic nit: these changes throw the '\'s further out of
alignment.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

>>  (x) = (__typeof__(*(ptr)))__gu_val; \
>>  \
>>  __gu_err;   \
>> -- 
>> 2.25.0


Re: [PATCH v2 07/15] powerpc/uaccess: Call might_fault() inconditionaly

2021-03-25 Thread Daniel Axtens
Hi Christophe,

> Commit 6bfd93c32a50 ("powerpc: Fix incorrect might_sleep in
> __get_user/__put_user on kernel addresses") added a check to not call
> might_sleep() on kernel addresses. This was to enable the use of
> __get_user() in the alignment exception handler for any address.
>
> Then commit 95156f0051cb ("lockdep, mm: fix might_fault() annotation")
> added a check of the address space in might_fault(), based on
> set_fs() logic. But this didn't solve the powerpc alignment exception
> case as it didn't call set_fs(KERNEL_DS).
>
> Nowadays, set_fs() is gone, previous patch fixed the alignment
> exception handler and __get_user/__put_user are not supposed to be
> used anymore to read kernel memory.
>
> Therefore the is_kernel_addr() check has become useless and can be
> removed.

While I agree that __get_user/__put_user should not be used on kernel
memory, I'm not sure that we have covered every case where they might be
used on kernel memory yet. I did a git grep for __get_user - there are
several callers in arch/powerpc and it looks like at least lib/sstep.c
might be using __get_user to read kernel memory while single-stepping.

I am not sure if might_sleep has got logic to cover the powerpc case -
it uses uaccess_kernel, but we don't supply a definition for that on
powerpc, so if we do end up calling __get_user on a kernel address, I
think we might now throw a warning. (Unless we are saved by
pagefault_disabled()?)

(But I haven't tested this yet, so it's possible I misunderstood
something.)

Do you expect any consequences if we've missed a case where
__(get|put)_user is called on a kernel address because it hasn't been
converted to use better helpers yet?

Kind regards,
Daniel

>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/uaccess.h | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/uaccess.h 
> b/arch/powerpc/include/asm/uaccess.h
> index eaa828a6a419..c4bbc64758a0 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -77,8 +77,7 @@ __pu_failed:
> \
>   __typeof__(*(ptr)) __pu_val = (x);  \
>   __typeof__(size) __pu_size = (size);\
>   \
> - if (!is_kernel_addr((unsigned long)__pu_addr))  \
> - might_fault();  \
> + might_fault();  \
>   __chk_user_ptr(__pu_addr);  \
>   __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err);  \
>   \
> @@ -238,12 +237,12 @@ do {
> \
>   __typeof__(size) __gu_size = (size);\
>   \
>   __chk_user_ptr(__gu_addr);  \
> - if (do_allow && !is_kernel_addr((unsigned long)__gu_addr)) \
> + if (do_allow) { 
> \
>   might_fault();  \
> - if (do_allow)   
> \
>   __get_user_size(__gu_val, __gu_addr, __gu_size, __gu_err);  
> \
> - else
> \
> + } else {
> \
>   __get_user_size_allowed(__gu_val, __gu_addr, __gu_size, 
> __gu_err); \
> + }   
> \
>   (x) = (__typeof__(*(ptr)))__gu_val; \
>   \
>   __gu_err;   \
> -- 
> 2.25.0


Re: [PATCH v2 06/15] powerpc/align: Don't use __get_user_instr() on kernel addresses

2021-03-25 Thread Daniel Axtens
Christophe Leroy  writes:

> In the old days, when we didn't have kernel userspace access
> protection and had set_fs(), it was wise to use __get_user()
> and friends to read kernel memory.
>
> Nowadays, get_user() is granting userspace access and is exclusively
> for userspace access.
>
> In alignment exception handler, use probe_kernel_read_inst()
> instead of __get_user_instr() for reading instructions in kernel.
>
> This will allow to remove the is_kernel_addr() check in
> __get/put_user() in a following patch.
>

Looks good to me!

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/align.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
> index c4d7b445b459..8d4c7af262e2 100644
> --- a/arch/powerpc/kernel/align.c
> +++ b/arch/powerpc/kernel/align.c
> @@ -310,7 +310,11 @@ int fix_alignment(struct pt_regs *regs)
>*/
>   CHECK_FULL_REGS(regs);
>  
> - if (unlikely(__get_user_instr(instr, (void __user *)regs->nip)))
> + if (is_kernel_addr(regs->nip))
> + r = probe_kernel_read_inst(, (void *)regs->nip);
> + else
> + r = __get_user_instr(instr, (void __user *)regs->nip);
> + if (unlikely(r))
>   return -EFAULT;
>   if ((regs->msr & MSR_LE) != (MSR_KERNEL & MSR_LE)) {
>   /* We don't handle PPC little-endian any more... */
> -- 
> 2.25.0


Re: [PATCH v2 05/15] powerpc/uaccess: Move get_user_instr helpers in asm/inst.h

2021-03-25 Thread Daniel Axtens
Hi Christophe,

> Those helpers use get_user helpers but they don't participate
> in their implementation, so they do not belong to asm/uaccess.h
>
> Move them in asm/inst.h

Hmm, is asm/inst.h the right place for this?

asm/inst.h seems to be entirely concerned with the ppc_inst type:
converting things to and from ppc_inst, print ppc_inst as a string,
dealing with prefixed instructs, etc., etc. The only things currently
that look at memory are the probe_user_read_inst and
probe_kernel_read_inst prototypes...

Having said that, I'm not sure quite where else to put it, and none of
the other places in arch/powerpc/include that currently reference
ppc_inst seem any better...

If we do use asm/inst.h, I think maybe it makes sense to put the
code towards the end rather than at the start, as it uses structs and
calls macros that are defined later on in the file.

Kind regards,
Daniel

>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/inst.h| 34 ++
>  arch/powerpc/include/asm/uaccess.h | 34 --
>  2 files changed, 34 insertions(+), 34 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
> index cc73c1267572..19e18af2fac9 100644
> --- a/arch/powerpc/include/asm/inst.h
> +++ b/arch/powerpc/include/asm/inst.h
> @@ -4,6 +4,40 @@
>  
>  #include 
>  
> +#ifdef CONFIG_PPC64
> +
> +#define ___get_user_instr(gu_op, dest, ptr)  \
> +({   \
> + long __gui_ret = 0; \
> + unsigned long __gui_ptr = (unsigned long)ptr;   \
> + struct ppc_inst __gui_inst; \
> + unsigned int __prefix, __suffix;\
> + __gui_ret = gu_op(__prefix, (unsigned int __user *)__gui_ptr);  \
> + if (__gui_ret == 0) {   \
> + if ((__prefix >> 26) == OP_PREFIX) {\
> + __gui_ret = gu_op(__suffix, \
> + (unsigned int __user *)__gui_ptr + 1);  \
> + __gui_inst = ppc_inst_prefix(__prefix,  \
> +  __suffix); \
> + } else {\
> + __gui_inst = ppc_inst(__prefix);\
> + }   \
> + if (__gui_ret == 0) \
> + (dest) = __gui_inst;\
> + }   \
> + __gui_ret;  \
> +})
> +#else /* !CONFIG_PPC64 */
> +#define ___get_user_instr(gu_op, dest, ptr)  \
> + gu_op((dest).val, (u32 __user *)(ptr))
> +#endif /* CONFIG_PPC64 */
> +
> +#define get_user_instr(x, ptr) \
> + ___get_user_instr(get_user, x, ptr)
> +
> +#define __get_user_instr(x, ptr) \
> + ___get_user_instr(__get_user, x, ptr)
> +
>  /*
>   * Instruction data type for POWER
>   */
> diff --git a/arch/powerpc/include/asm/uaccess.h 
> b/arch/powerpc/include/asm/uaccess.h
> index 01aea0df4dd0..eaa828a6a419 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -53,40 +53,6 @@ static inline bool __access_ok(unsigned long addr, 
> unsigned long size)
>  #define __put_user(x, ptr) \
>   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
>  
> -#ifdef CONFIG_PPC64
> -
> -#define ___get_user_instr(gu_op, dest, ptr)  \
> -({   \
> - long __gui_ret = 0; \
> - unsigned long __gui_ptr = (unsigned long)ptr;   \
> - struct ppc_inst __gui_inst; \
> - unsigned int __prefix, __suffix;\
> - __gui_ret = gu_op(__prefix, (unsigned int __user *)__gui_ptr);  \
> - if (__gui_ret == 0) {   \
> - if ((__prefix >> 26) == OP_PREFIX) {\
> - __gui_ret = gu_op(__suffix, \
> - (unsigned int __user *)__gui_ptr + 1);  \
> - __gui_inst = ppc_inst_prefix(__prefix,  \
> -  __suffix); \
> - } else {\
> - __gui_inst = ppc_inst(__prefix);\
> - }   \
> - if (__gui_ret == 0)   

Re: [PATCH v11 0/6] KASAN for powerpc64 radix

2021-03-22 Thread Daniel Axtens
Hi Christophe,

> In the discussion we had long time ago, 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190806233827.16454-5-...@axtens.net/#2321067
>  
> , I challenged you on why it was not possible to implement things the same 
> way as other 
> architectures, in extenso with an early mapping.
>
> Your first answer was that too many things were done in real mode at startup. 
> After some discussion 
> you said that finally there was not that much things at startup but the issue 
> was KVM.
>
> Now you say that instrumentation on KVM is fully disabled.
>
> So my question is, if KVM is not a problem anymore, why not go the standard 
> way with an early shadow 
> ? Then you could also support inline instrumentation.

Fair enough, I've had some trouble both understanding the problem myself
and clearly articulating it. Let me try again.

We need translations on to access the shadow area.

We reach setup_64.c::early_setup() with translations off. At this point
we don't know what MMU we're running under, or our CPU features.

To determine our MMU and CPU features, early_setup() calls functions
(dt_cpu_ftrs_init, early_init_devtree) that call out to generic code
like of_scan_flat_dt. We need to do this before we turn on translations
because we can't set up the MMU until we know what MMU we have.

So this puts us in a bind:

 - We can't set up an early shadow until we have translations on, which
   requires that the MMU is set up.

 - We can't set up an MMU until we call out to generic code for FDT
   parsing.

So there will be calls to generic FDT parsing code that happen before the
early shadow is set up.

The setup code also prints a bunch of information about the platform
with printk() while translations are off, so it wouldn't even be enough
to disable instrumentation for bits of the generic DT code on ppc64.

Does that make sense? If you can figure out how to 'square the circle'
here I'm all ears.

Other notes:

 - There's a comment about printk() being 'safe' in early_setup(); that
   refers to having a valid PACA, not to printk() being safe in any
   other sense.

 - KVM does indeed also run stuff with translations off but we can catch
   all of that by disabling instrumentation on the real-mode handlers:
   it doesn't seem to leak out to generic code. So you are right that
   KVM is no longer an issue.

Kind regards,
Daniel


>
> Christophe



Re: [PATCH v11 6/6] powerpc: Book3S 64-bit outline-only KASAN support

2021-03-21 Thread Daniel Axtens
Balbir Singh  writes:

> On Mon, Mar 22, 2021 at 11:55:08AM +1100, Daniel Axtens wrote:
>> Hi Balbir,
>> 
>> > Could you highlight the changes from
>> > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20170729140901.5887-1-bsinghar...@gmail.com/?
>> >
>> > Feel free to use my signed-off-by if you need to and add/update copyright
>> > headers if appropriate.
>> 
>> There's not really anything in common any more:
>> 
>>  - ppc32 KASAN landed, so there was already a kasan.h for powerpc, the
>>explicit memcpy changes, the support for non-instrumented files,
>>prom_check.sh, etc. all already landed.
>> 
>>  - I locate the shadow region differently and don't resize any virtual
>>memory areas.
>> 
>>  - The ARCH_DEFINES_KASAN_ZERO_PTE handling changed upstream and our
>>handling for that is now handled more by patch 3.
>> 
>>  - The outline hook is now an inline function rather than a #define.
>> 
>>  - The init function has been totally rewritten as it's gone from
>>supporting real mode to not supporting real mode and back.
>> 
>>  - The list of non-instrumented files has grown a lot.
>> 
>>  - There's new stuff: stack walking is now safe, KASAN vmalloc support
>>means modules are better supported now, ptdump works, and there's
>>documentation.
>> 
>> It's been a while now, but I don't think when I started this process 2
>> years ago that I directly reused much of your code. So I'm not sure that
>> a signed-off-by makes sense here? Would a different tag (Originally-by?)
>> make more sense?
>>
>
> Sure

Will do.

>  
>> >> + * The shadow ends before the highest accessible address
>> >> + * because we don't need a shadow for the shadow. Instead:
>> >> + * c00e << 3 + a80e    000 = c00fc000
>> >
>> > The comment has one extra 0 in a80e.., I did the math and had to use
>> > the data from the defines :)
>> 
>> 3 extra 0s, even! Fixed.
>> 
>> >> +void __init kasan_init(void)
>> >> +{
>> >> + /*
>> >> +  * We want to do the following things:
>> >> +  *  1) Map real memory into the shadow for all physical memblocks
>> >> +  * This takes us from c000... to c008...
>> >> +  *  2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
>> >> +  * will manage this for us.
>> >> +  * This takes us from c008... to c00a...
>> >> +  *  3) Map the 'early shadow'/zero page over iomap and vmemmap space.
>> >> +  * This takes us up to where we start at c00e...
>> >> +  */
>> >> +
>> >
>> > assuming we have
>> > #define VMEMMAP_END R_VMEMMAP_END
>> > and ditto for hash we probably need
>> >
>> >BUILD_BUG_ON(VMEMMAP_END + KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
>> 
>> Sorry, I'm not sure what this is supposed to be testing? In what
>> situation would this trigger?
>>
>
> I am bit concerned that we have hard coded (IIR) 0xa80e... in the
> config, any changes to VMEMMAP_END, KASAN_SHADOW_OFFSET/END
> should be guarded.
>

Ah that makes sense. I'll come up with some test that should catch any
unsynchronised changes to VMEMMAP_END, KASAN_SHADOW_OFFSET or
KASAN_SHADOW_END.
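
Something along these lines, perhaps - a sketch only, assuming
VMEMMAP_END exists as you suggest and using the generic
KASAN_SHADOW_SCALE_SHIFT of 3:

	/* Catch unsynchronised layout changes at compile time: the
	 * shadow region starts where vmemmap ends, and must end at
	 * the shadow of that address. */
	BUILD_BUG_ON((VMEMMAP_END >> KASAN_SHADOW_SCALE_SHIFT) +
		     KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);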

Kind regards,
Daniel Axtens

> Balbir Singh.


Re: [PATCH v11 6/6] powerpc: Book3S 64-bit outline-only KASAN support

2021-03-21 Thread Daniel Axtens
Hi Balbir,

> Could you highlight the changes from
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20170729140901.5887-1-bsinghar...@gmail.com/?
>
> Feel free to use my signed-off-by if you need to and add/update copyright
> headers if appropriate.

There's not really anything in common any more:

 - ppc32 KASAN landed, so there was already a kasan.h for powerpc, the
   explicit memcpy changes, the support for non-instrumented files,
   prom_check.sh, etc. all already landed.

 - I locate the shadow region differently and don't resize any virtual
   memory areas.

 - The ARCH_DEFINES_KASAN_ZERO_PTE handling changed upstream and our
   handling for that is now handled more by patch 3.

 - The outline hook is now an inline function rather than a #define.

 - The init function has been totally rewritten as it's gone from
   supporting real mode to not supporting real mode and back.

 - The list of non-instrumented files has grown a lot.

 - There's new stuff: stack walking is now safe, KASAN vmalloc support
   means modules are better supported now, ptdump works, and there's
   documentation.

It's been a while now, but I don't think when I started this process 2
years ago that I directly reused much of your code. So I'm not sure that
a signed-off-by makes sense here? Would a different tag (Originally-by?)
make more sense?

>> + * The shadow ends before the highest accessible address
>> + * because we don't need a shadow for the shadow. Instead:
>> + * c00e << 3 + a80e    000 = c00fc000
>
> The comment has one extra 0 in a80e.., I did the math and had to use
> the data from the defines :)

3 extra 0s, even! Fixed.

>> +void __init kasan_init(void)
>> +{
>> +/*
>> + * We want to do the following things:
>> + *  1) Map real memory into the shadow for all physical memblocks
>> + * This takes us from c000... to c008...
>> + *  2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
>> + * will manage this for us.
>> + * This takes us from c008... to c00a...
>> + *  3) Map the 'early shadow'/zero page over iomap and vmemmap space.
>> + * This takes us up to where we start at c00e...
>> + */
>> +
>
> assuming we have
> #define VMEMMAP_END R_VMEMMAP_END
> and ditto for hash we probably need
>
>   BUILD_BUG_ON(VMEMMAP_END + KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);

Sorry, I'm not sure what this is supposed to be testing? In what
situation would this trigger?

Kind regards,
Daniel

>
> Looks good otherwise, I've not been able to test it yet
>
> Balbir Singh.


Re: [PATCH v11 1/6] kasan: allow an architecture to disable inline instrumentation

2021-03-21 Thread Daniel Axtens
Balbir Singh  writes:

> On Sat, Mar 20, 2021 at 01:40:53AM +1100, Daniel Axtens wrote:
>> For annoying architectural reasons, it's very difficult to support inline
>> instrumentation on powerpc64.
>
> I think we can expand here and talk about how in hash mode, the vmalloc
> address space is in a region of memory different than where kernel virtual
> addresses are mapped. Did I recollect the reason correctly?

I think that's _a_ reason, but for radix mode (which is all I support at
the moment), the reason is a bit simpler. We call into generic code like
the DT parser and printk when we have translations off. The shadow
region lives at c00e which is not part of the linear mapping, so if
you try to access the shadow while in real mode you will access unmapped
memory and (at least on PowerNV) take a machine check.

>> 
>> Add a Kconfig flag to allow an arch to disable inline. (It's a bit
>> annoying to be 'backwards', but I'm not aware of any way to have
>> an arch force a symbol to be 'n', rather than 'y'.)
>> 
>> We also disable stack instrumentation in this case as it does things that
>> are functionally equivalent to inline instrumentation, namely adding
>> code that touches the shadow directly without going through a C helper.
>> 
>> Signed-off-by: Daniel Axtens 
>> ---
>>  lib/Kconfig.kasan | 8 
>>  1 file changed, 8 insertions(+)
>> 
>> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
>> index cffc2ebbf185..7e237dbb6df3 100644
>> --- a/lib/Kconfig.kasan
>> +++ b/lib/Kconfig.kasan
>> @@ -12,6 +12,9 @@ config HAVE_ARCH_KASAN_HW_TAGS
>>  config HAVE_ARCH_KASAN_VMALLOC
>>  bool
>>  
>> +config ARCH_DISABLE_KASAN_INLINE
>> +def_bool n
>> +
>
> Some comments on what arch's want to disable kasan inline would
> be helpful and why.

Sure, added.

Kind regards,
Daniel



[PATCH v11 4/6] kasan: Document support on 32-bit powerpc

2021-03-19 Thread Daniel Axtens
KASAN is supported on 32-bit powerpc and the docs should reflect this.

Suggested-by: Christophe Leroy 
Reviewed-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst |  8 ++--
 Documentation/powerpc/kasan.txt   | 12 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index a8c3e0cff88d..2cfd5d9068c0 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -36,7 +36,8 @@ Both software KASAN modes work with SLUB and SLAB memory 
allocators,
 while the hardware tag-based KASAN currently only supports SLUB.
 
 Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
-and riscv architectures, and tag-based KASAN modes are supported only for 
arm64.
+and riscv architectures. It is also supported on 32-bit powerpc kernels.
+Tag-based KASAN modes are supported only for arm64.
 
 Usage
 -
@@ -334,7 +335,10 @@ CONFIG_KASAN_VMALLOC
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
 cost of greater memory usage. Currently, this is supported on x86,
-riscv, s390, and powerpc.
+riscv, s390, and 32-bit powerpc.
+
+It is optional, except on 32-bit powerpc kernels with module support,
+where it is required.
 
 This works by hooking into vmalloc and vmap and dynamically
 allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
new file mode 100644
index ..26bb0e8bb18c
--- /dev/null
+++ b/Documentation/powerpc/kasan.txt
@@ -0,0 +1,12 @@
+KASAN is supported on powerpc on 32-bit only.
+
+32 bit support
+==
+
+KASAN is supported on both hash and nohash MMUs on 32-bit.
+
+The shadow area sits at the top of the kernel virtual memory space above the
+fixmap area and occupies one eighth of the total kernel virtual memory space.
+
+Instrumentation of the vmalloc area is optional, unless built with modules,
+in which case it is required.
-- 
2.27.0



[PATCH v11 5/6] powerpc/mm/kasan: rename kasan_init_32.c to init_32.c

2021-03-19 Thread Daniel Axtens
kasan is already implied by the directory name, we don't need to
repeat it.

Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/mm/kasan/Makefile   | 2 +-
 arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)

diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index bb1a5408b86b..42fb628a44fd 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -2,6 +2,6 @@
 
 KASAN_SANITIZE := n
 
-obj-$(CONFIG_PPC32)   += kasan_init_32.o
+obj-$(CONFIG_PPC32)   += init_32.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
 obj-$(CONFIG_PPC_BOOK3S_32)+= book3s_32.o
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/init_32.c
similarity index 100%
rename from arch/powerpc/mm/kasan/kasan_init_32.c
rename to arch/powerpc/mm/kasan/init_32.c
-- 
2.27.0



[PATCH v11 6/6] powerpc: Book3S 64-bit outline-only KASAN support

2021-03-19 Thread Daniel Axtens
Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.

 - Enable the compiler instrumentation to check addresses and maintain the
   shadow region. (This is the guts of KASAN which we can easily reuse.)

 - Require kasan-vmalloc support to handle modules and anything else in
   vmalloc space.

 - KASAN needs to be able to validate all pointer accesses, but we can't
   instrument all kernel addresses - only linear map and vmalloc. On boot,
   set up a single page of read-only shadow that marks all iomap and
   vmemmap accesses as valid.

 - Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
   generic code, arm64, s390 and x86 all do this for similar sorts of
   reasons: when unwinding a stack, we might touch memory that KASAN has
   marked as being out-of-bounds. In our case we often get this when
   checking for an exception frame because we're checking an arbitrary
   offset into the stack frame.

   See commit 20955746320e ("s390/kasan: avoid false positives during stack
   unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
   frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
   Prevent KASAN false positive warnings") and commit 6e22c8366416
   ("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer")

 - Document KASAN in both generic and powerpc docs.

Background
--

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code with translations off ("real mode") during boot,
   including a lot of generic device-tree parsing code which is used to
   determine MMU features.

[ppc64 mm note: The kernel installs a linear mapping at effective
address c000...-c008 This is a one-to-one mapping with physical
memory from ... onward. Because of how memory accesses work on
powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
same memory both with translations on (accessing as an 'effective
address'), and with translations off (accessing as a 'real
address'). This works in both guests and the hypervisor. For more
details, see s5.7 of Book III of version 3 of the ISA, in particular
the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
KASAN implementation currently only supports Radix.]

 - Some code - most notably a lot of KVM code - also runs with translations
   off after boot.

 - Therefore any offset has to point to memory that is valid with
   translations on or off.

One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set is up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.

Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.

Cc: Balbir Singh  # ppc64 out-of-line radix version
Cc: Aneesh Kumar K.V  # ppc64 hash version
Cc: Christophe Leroy  # ppc32 version
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst| 11 +--
 Documentation/powerpc/kasan.txt  | 48 +-
 arch/powerpc/Kconfig |  4 +-
 arch/powerpc/Kconfig.debug   |  3 +-
 arch/powerpc/include/asm/book3s/64/hash.h|  4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |  4 +
 arch/powerpc/include/asm/book3s/64/radix.h   | 13 ++-
 arch/powerpc/include/asm/kasan.h | 22 +
 arch/powerpc/kernel/Makefile | 11 +++
 arch/powerpc/kernel/process.c| 16 ++--
 arch/powerpc/kvm/Makefile|  5 ++
 arch/powerpc/mm/book3s64/Makefile|  9 ++
 arch/powerpc/mm/kasan/Makefile   |  1 +
 arch/powerpc/mm/kasan/init_book3s_64.c   | 95 
 arch/powerpc/mm/ptdump/ptdump.c  | 20 -
 arch/powerpc/platforms/Kconfig.cputype   |  1 +
 arch/powerpc/platforms/powernv/Makefile  |  6 ++
 arch/powerpc/platforms/pseries/Makefile  |  3 +
 18 files changed, 257 insertions(+), 19 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index 2cfd5d9068c0..8024b55c7aa8 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -36,8 +36,9 @@ Both software KASAN modes work with SLUB and SLAB memory 
allocators,
 while the hardware tag-based KASAN currently only supports SLUB.
 
 Currently, generic KASAN is supported

[PATCH v11 1/6] kasan: allow an architecture to disable inline instrumentation

2021-03-19 Thread Daniel Axtens
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.

Signed-off-by: Daniel Axtens 
---
 lib/Kconfig.kasan | 8 
 1 file changed, 8 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..7e237dbb6df3 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,9 @@ config HAVE_ARCH_KASAN_HW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+config ARCH_DISABLE_KASAN_INLINE
+   def_bool n
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -130,6 +133,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
bool "Inline instrumentation"
+   depends on !ARCH_DISABLE_KASAN_INLINE
help
  Compiler directly inserts code checking shadow memory before
  memory accesses. This is faster than outline (in some workloads
@@ -142,6 +146,7 @@ config KASAN_STACK
bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
!COMPILE_TEST
depends on KASAN_GENERIC || KASAN_SW_TAGS
default y if CC_IS_GCC
+   depends on !ARCH_DISABLE_KASAN_INLINE
help
  The LLVM stack address sanitizer has a know problem that
  causes excessive stack usage in a lot of functions, see
@@ -154,6 +159,9 @@ config KASAN_STACK
  but clang users can still enable it for builds without
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
+ If the architecture disables inline instrumentation, this is
+ also disabled as it adds inline-style instrumentation that
+ is run unconditionally.
 
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
-- 
2.27.0



[PATCH v11 2/6] kasan: allow architectures to provide an outline readiness check

2021-03-19 Thread Daniel Axtens
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
uninvasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.

Cc: Balbir Singh 
Cc: Aneesh Kumar K.V 
Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 

--

I discuss the justification for this later in the series. Also,
both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
---
 include/linux/kasan.h | 4 
 mm/kasan/common.c | 4 
 mm/kasan/generic.c| 3 +++
 mm/kasan/shadow.c | 4 
 4 files changed, 15 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 8b3b99d659b7..6bd8343f0033 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -23,6 +23,10 @@ struct kunit_kasan_expectation {
 
 #endif
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#endif
+
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 
 #include 
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 6bb87f2acd4e..f23a9e2dce9f 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -345,6 +345,10 @@ static inline bool kasan_slab_free(struct kmem_cache 
*cache, void *object,
if (unlikely(cache->flags & SLAB_TYPESAFE_BY_RCU))
return false;
 
+   /* We can't read the shadow byte if the arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return false;
+
if (!kasan_byte_accessible(tagged_object)) {
kasan_report_invalid_free(tagged_object, ip);
return true;
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 53cbf28859b5..c3f5ba7a294a 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned 
long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return true;
+
if (unlikely(size == 0))
return true;
 
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 727ad4629173..1f650c521037 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -80,6 +80,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, 
bool init)
 */
addr = kasan_reset_tag(addr);
 
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
/* Skip KFENCE memory if called explicitly outside of sl*b. */
if (is_kfence_address(addr))
return;
-- 
2.27.0
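
For reference, the arch side that pairs with this hook - paraphrased
from the powerpc patch elsewhere in this series, not quoted verbatim -
looks roughly like:

	/* In the arch's asm/kasan.h; the static key gets flipped once the
	 * MMU is up and the shadow has been populated.
	 */
	DECLARE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);

	static __always_inline bool kasan_arch_is_ready(void)
	{
		if (static_branch_likely(&powerpc_kasan_enabled_key))
			return true;
		return false;
	}
	#define kasan_arch_is_ready kasan_arch_is_ready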



[PATCH v11 3/6] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2021-03-19 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer constants, which therefore
breaks the build.

Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 18 +++---
 mm/kasan/init.c   |  6 +++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 6bd8343f0033..68cd6e55c872 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -44,10 +44,22 @@ static inline bool kasan_arch_is_ready(void){ 
return true; }
 #define PTE_HWTABLE_PTRS 0
 #endif
 
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index c4605ac9837b..b4d822dff1fb 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
__page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
-- 
2.27.0
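
The arch-side override this makes possible shows up later in the series;
as a sketch (paraphrased, not quoted), book3s/64 picks the larger of the
hash and radix geometries at compile time:

	/* H_* and R_* are the hash and radix per-level entry counts. */
	#define MAX_PTRS_PER_PTE ((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? \
				  H_PTRS_PER_PTE : R_PTRS_PER_PTE)
	#define MAX_PTRS_PER_PMD ((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? \
				  H_PTRS_PER_PMD : R_PTRS_PER_PMD)
	#define MAX_PTRS_PER_PUD ((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? \
				  H_PTRS_PER_PUD : R_PTRS_PER_PUD)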



[PATCH v11 0/6] KASAN for powerpc64 radix

2021-03-19 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU.

v11 applies to next-20210317. I had hoped to have it apply to
powerpc/next but once again there are changes in the kasan core that
clash. Also, thanks to mpe for fixing a build break with KASAN off.

I'm not sure how best to progress this towards actually being merged
when it has impacts across subsystems. I'd appreciate any input. Maybe
the first four patches could go in via the kasan tree, that should
make things easier for powerpc in a future cycle?

v10 rebases on top of next-20210125, fixing things up to work on top
of the latest changes, and fixing some review comments from
Christophe. I have tested host and guest with 64k pages for this spin.

There is now only 1 failing KUnit test: kasan_global_oob - gcc puts
the ASAN init code in a section called '.init_array'. Powerpc64 module
loading code goes through and _renames_ any section beginning with
'.init' to begin with '_init' in order to avoid some complexities
around our 24-bit indirect jumps. This means it renames '.init_array'
to '_init_array', and the generic module loading code then fails to
recognise the section as a constructor and thus doesn't run it. This
hack dates back to 2003 and so I'm not going to try to unpick it in
this series. (I suspect this may have previously worked if the code
ended up in .ctors rather than .init_array but I don't keep my old
binaries around so I have no real way of checking.)

(The previously failing stack tests are now skipped due to more
accurate configuration settings.)

Details from v9: This is a significant reworking of the previous
versions. Instead of the previous approach which supported inline
instrumentation, this series provides only outline instrumentation.

To get around the problem of accessing the shadow region inside code we run
with translations off (in 'real mode'), we restrict checking to when
translations are enabled. This is done via a new hook in the kasan core and
by excluding larger quantities of arch code from instrumentation. The upside
is that we no longer require that you be able to specify the amount of
physically contiguous memory on the system at compile time. Hopefully this
is a better trade-off. More details in patch 6.

kexec works. Both 64k and 4k pages work. Running as a KVM host works, but
nothing in arch/powerpc/kvm is instrumented. It's also potentially a bit
fragile - if any real mode code paths call out to instrumented code, things
will go boom.

Kind regards,
Daniel

Daniel Axtens (6):
  kasan: allow an architecture to disable inline instrumentation
  kasan: allow architectures to provide an outline readiness check
  kasan: define and use MAX_PTRS_PER_* for early shadow tables
  kasan: Document support on 32-bit powerpc
  powerpc/mm/kasan: rename kasan_init_32.c to init_32.c
  powerpc: Book3S 64-bit outline-only KASAN support



Re: [PATCH] powerpc: arch/powerpc/kernel/setup_64.c - cleanup warnings

2021-03-17 Thread Daniel Axtens
"heying (H)"  writes:

> Thank you for your reply.
>
>
> 在 2021/3/17 11:04, Daniel Axtens 写道:
>> Hi He Ying,
>>
>> Thank you for this patch.
>>
>> I'm not sure what the precise rules for Fixes are, but I wonder if this
>> should have:
>>
>> Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses")
>> Fixes: f79643787e0a ("powerpc/64s: flush L1D on kernel entry")
>
> Is that necessary for warning cleanups? I thought 'Fixes' tags are 
> needed only for
>
> bugfix patches. Can someone tell me whether I am right?

Yeah, I'm not sure either. Hopefully mpe will let us know.

Kind regards,
Daniel

>
>>
>> Those are the commits that added the entry_flush and uaccess_flush
>> symbols. Perhaps one for rfi_flush too but I'm not sure what commit
>> introduced that.
>>
>> Kind regards,
>> Daniel
>>
>>> warning: symbol 'rfi_flush' was not declared.
>>> warning: symbol 'entry_flush' was not declared.
>>> warning: symbol 'uaccess_flush' was not declared.
>>> We found warnings above in arch/powerpc/kernel/setup_64.c by using
>>> sparse tool.
>>>
>>> Define 'entry_flush' and 'uaccess_flush' as static because they are not
>>> referenced outside the file. Include asm/security_features.h in which
>>> 'rfi_flush' is declared.
>>>
>>> Reported-by: Hulk Robot 
>>> Signed-off-by: He Ying 
>>> ---
>>>   arch/powerpc/kernel/setup_64.c | 5 +++--
>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
>>> index 560ed8b975e7..f92d72a7e7ce 100644
>>> --- a/arch/powerpc/kernel/setup_64.c
>>> +++ b/arch/powerpc/kernel/setup_64.c
>>> @@ -68,6 +68,7 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include 
>>>   
>>>   #include "setup.h"
>>>   
>>> @@ -949,8 +950,8 @@ static bool no_rfi_flush;
>>>   static bool no_entry_flush;
>>>   static bool no_uaccess_flush;
>>>   bool rfi_flush;
>>> -bool entry_flush;
>>> -bool uaccess_flush;
>>> +static bool entry_flush;
>>> +static bool uaccess_flush;
>>>   DEFINE_STATIC_KEY_FALSE(uaccess_flush_key);
>>>   EXPORT_SYMBOL(uaccess_flush_key);
>>>   
>>> -- 
>>> 2.17.1
>> .


Re: [PATCH] powerpc: arch/powerpc/kernel/setup_64.c - cleanup warnings

2021-03-16 Thread Daniel Axtens
Hi He Ying,

Thank you for this patch.

I'm not sure what the precise rules for Fixes are, but I wonder if this
should have:

Fixes: 9a32a7e78bd0 ("powerpc/64s: flush L1D after user accesses")
Fixes: f79643787e0a ("powerpc/64s: flush L1D on kernel entry")

Those are the commits that added the entry_flush and uaccess_flush
symbols. Perhaps one for rfi_flush too but I'm not sure what commit
introduced that.

Kind regards,
Daniel

> warning: symbol 'rfi_flush' was not declared.
> warning: symbol 'entry_flush' was not declared.
> warning: symbol 'uaccess_flush' was not declared.
> We found warnings above in arch/powerpc/kernel/setup_64.c by using
> sparse tool.
>
> Define 'entry_flush' and 'uaccess_flush' as static because they are not
> referenced outside the file. Include asm/security_features.h in which
> 'rfi_flush' is declared.
>
> Reported-by: Hulk Robot 
> Signed-off-by: He Ying 
> ---
>  arch/powerpc/kernel/setup_64.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 560ed8b975e7..f92d72a7e7ce 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -68,6 +68,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "setup.h"
>  
> @@ -949,8 +950,8 @@ static bool no_rfi_flush;
>  static bool no_entry_flush;
>  static bool no_uaccess_flush;
>  bool rfi_flush;
> -bool entry_flush;
> -bool uaccess_flush;
> +static bool entry_flush;
> +static bool uaccess_flush;
>  DEFINE_STATIC_KEY_FALSE(uaccess_flush_key);
>  EXPORT_SYMBOL(uaccess_flush_key);
>  
> -- 
> 2.17.1


Re: [PATCH v2 04/15] powerpc/uaccess: Remove __get/put_user_inatomic()

2021-03-10 Thread Daniel Axtens
Hi Christophe,

> Powerpc is the only architecture having _inatomic variants of
> __get_user() and __put_user() accessors. They were introduced
> by commit e68c825bb016 ("[POWERPC] Add inatomic versions of __get_user
> and __put_user").
>
> Those variants expand to the _nosleep macros instead of expanding
> to the _nocheck macros. The only difference between the _nocheck
> and the _nosleep macros is the call to might_fault().
>
> Since commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with
> pagefault_disable()"), __get/put_user() can be used in atomic parts
> of the code, therefore __get/put_user_inatomic() have become useless.
>
> Remove __get_user_inatomic() and __put_user_inatomic().
>

This makes much more sense, thank you!

Simplifying uaccess.h is always good to me :)

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/uaccess.h| 37 ---
>  .../kernel/hw_breakpoint_constraints.c|  2 +-
>  arch/powerpc/kernel/traps.c   |  2 +-
>  3 files changed, 2 insertions(+), 39 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/uaccess.h 
> b/arch/powerpc/include/asm/uaccess.h
> index a08c482b1315..01aea0df4dd0 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -53,11 +53,6 @@ static inline bool __access_ok(unsigned long addr, 
> unsigned long size)
>  #define __put_user(x, ptr) \
>   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
>  
> -#define __get_user_inatomic(x, ptr) \
> - __get_user_nosleep((x), (ptr), sizeof(*(ptr)))
> -#define __put_user_inatomic(x, ptr) \
> - __put_user_nosleep((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
> -
>  #ifdef CONFIG_PPC64
>  
>  #define ___get_user_instr(gu_op, dest, ptr)  \
> @@ -92,9 +87,6 @@ static inline bool __access_ok(unsigned long addr, unsigned 
> long size)
>  #define __get_user_instr(x, ptr) \
>   ___get_user_instr(__get_user, x, ptr)
>  
> -#define __get_user_instr_inatomic(x, ptr) \
> - ___get_user_instr(__get_user_inatomic, x, ptr)
> -
>  extern long __put_user_bad(void);
>  
>  #define __put_user_size(x, ptr, size, retval)\
> @@ -141,20 +133,6 @@ __pu_failed: 
> \
>   __pu_err;   \
>  })
>  
> -#define __put_user_nosleep(x, ptr, size) \
> -({   \
> - long __pu_err;  \
> - __typeof__(*(ptr)) __user *__pu_addr = (ptr);   \
> - __typeof__(*(ptr)) __pu_val = (x);  \
> - __typeof__(size) __pu_size = (size);\
> - \
> - __chk_user_ptr(__pu_addr);  \
> - __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err); \
> - \
> - __pu_err;   \
> -})
> -
> -
>  /*
>   * We don't tell gcc that we are accessing memory, but this is OK
>   * because we do not write to any memory gcc knows about, so there
> @@ -320,21 +298,6 @@ do { 
> \
>   __gu_err;   \
>  })
>  
> -#define __get_user_nosleep(x, ptr, size) \
> -({   \
> - long __gu_err;  \
> - __long_type(*(ptr)) __gu_val;   \
> - __typeof__(*(ptr)) __user *__gu_addr = (ptr);   \
> - __typeof__(size) __gu_size = (size);\
> - \
> - __chk_user_ptr(__gu_addr);  \
> - __get_user_size(__gu_val, __gu_addr, __gu_size, __gu_err); \
> - (x) = (__force __typeof__(*(ptr)))__gu_val; \
> - \
> - __gu_err;   \
> -})
> -
> -
>  /* more complex routines */
>  
>  extern unsigned long __copy_tofrom_user(void __user *to,
> diff --git a/arch/powerpc/kernel/hw_breakpoint_constraints.c 
> b/arch/powerpc/kernel/hw_breakpoint_constraints.c
> index 867ee4aa026a..675d1f66ab72 100644
> --- a/arch/powerpc/kernel/hw_breakpoint_constraints.c
> +++ b/arch/powerpc/kernel/hw_

Re: [PATCH v2 03/15] powerpc/align: Convert emulate_spe() to user_access_begin

2021-03-10 Thread Daniel Axtens
Hi Christophe,

> This patch converts emulate_spe() to using user_access_being
s/being/begin/ :)
> logic.
>
> Since commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with
> pagefault_disable()"), might_fault() doesn't fire when called from
> sections where pagefaults are disabled, which must be the case
> when using _inatomic variants of __get_user and __put_user. So
> the might_fault() in user_access_begin() is not a problem.
(likewise with the might_fault() in __get_user_nocheck, called from
unsafe_get_user())

> There was a verification of user_mode() together with the access_ok(),
> but the function returns in case !user_mode() immediately after
> the access_ok() verification, so removing that test has no effect.

I agree that removing the test is safe.

> - /* Verify the address of the operand */
> - if (unlikely(user_mode(regs) &&
> -  !access_ok(addr, nb)))
> - return -EFAULT;
> -

I found the reasoning a bit confusing: I think it's safe to remove
because:

 - we have the usermode check immediately following it:

>   /* userland only */
>   if (unlikely(!user_mode(regs)))
>   return 0;

 - and then we have the access_ok() check as part of
   user_read_access_begin later on in the function:

> + if (!user_read_access_begin(addr, nb))
> + return -EFAULT;
> +
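
For reference, powerpc's helper looks roughly like this (a paraphrase,
not a verbatim quote) - the access_ok() check has simply moved inside
the begin() call:

	static inline bool user_read_access_begin(const void __user *ptr, size_t len)
	{
		if (unlikely(!access_ok(ptr, len)))
			return false;
		might_fault();
		allow_read_from_user(ptr, len);
		return true;
	}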


>   switch (nb) {
>   case 8:
> - ret |= __get_user_inatomic(temp.v[0], p++);
> - ret |= __get_user_inatomic(temp.v[1], p++);
> - ret |= __get_user_inatomic(temp.v[2], p++);
> - ret |= __get_user_inatomic(temp.v[3], p++);
> + unsafe_get_user(temp.v[0], p++, Efault_read);
> + unsafe_get_user(temp.v[1], p++, Efault_read);
> + unsafe_get_user(temp.v[2], p++, Efault_read);
> + unsafe_get_user(temp.v[3], p++, Efault_read);

This will bail early rather than trying every possible read. I think
that's OK. I can't think of a situation where we could fail to read the
first byte and then successfully read later bytes, for example. Also I
can't think of a sane way userspace could depend on that behaviour. So I
agree with this change (and the change to the write path).

>   fallthrough;
>   case 4:
> - ret |= __get_user_inatomic(temp.v[4], p++);
> - ret |= __get_user_inatomic(temp.v[5], p++);
> + unsafe_get_user(temp.v[4], p++, Efault_read);
> + unsafe_get_user(temp.v[5], p++, Efault_read);
>   fallthrough;
>   case 2:
> - ret |= __get_user_inatomic(temp.v[6], p++);
> - ret |= __get_user_inatomic(temp.v[7], p++);
> - if (unlikely(ret))
> - return -EFAULT;
> + unsafe_get_user(temp.v[6], p++, Efault_read);
> + unsafe_get_user(temp.v[7], p++, Efault_read);
>   }
> + user_read_access_end();
>  
>   switch (instr) {
>   case EVLDD:
> @@ -255,31 +250,41 @@ static int emulate_spe(struct pt_regs *regs, unsigned 
> int reg,
>  
>   /* Store result to memory or update registers */
>   if (flags & ST) {
> - ret = 0;
>   p = addr;
> +
> + if (!user_read_access_begin(addr, nb))

That should be a user_write_access_begin.

> + return -EFAULT;
> +


>  
>   return 1;
> +
> +Efault_read:

Checkpatch complains that this is CamelCase, which seems like a
checkpatch problem. Efault_{read,write} seem like good labels to me.

(You don't need to change anything, I just like to check the checkpatch
results when reviewing a patch.)

> + user_read_access_end();
> + return -EFAULT;
> +
> +Efault_write:
> + user_write_access_end();
> + return -EFAULT;
>  }
>  #endif /* CONFIG_SPE */
>

With the user_write_access_begin change:
  Reviewed-by: Daniel Axtens 

Kind regards,
Daniel
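
For what it's worth, the store path with the begin/end pairing fixed up
would read roughly like this - a sketch that just mirrors the read path
above, not the final patch:

	p = addr;
	if (!user_write_access_begin(addr, nb))
		return -EFAULT;

	switch (nb) {
	case 8:
		unsafe_put_user(temp.v[0], p++, Efault_write);
		unsafe_put_user(temp.v[1], p++, Efault_write);
		unsafe_put_user(temp.v[2], p++, Efault_write);
		unsafe_put_user(temp.v[3], p++, Efault_write);
		fallthrough;
	case 4:
		unsafe_put_user(temp.v[4], p++, Efault_write);
		unsafe_put_user(temp.v[5], p++, Efault_write);
		fallthrough;
	case 2:
		unsafe_put_user(temp.v[6], p++, Efault_write);
		unsafe_put_user(temp.v[7], p++, Efault_write);
	}
	user_write_access_end();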


Re: [PATCH v2 01/15] powerpc/uaccess: Remove __get_user_allowed() and unsafe_op_wrap()

2021-03-10 Thread Daniel Axtens
Hi Christophe,

Thanks for the answers to my questions on v1.

This all looks good to me.

Reviewed-by: Daniel Axtens 

Kind regards,
Daniel

> Those two macros have only one user which is unsafe_get_user().
>
> Put everything in one place and remove them.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/uaccess.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/uaccess.h 
> b/arch/powerpc/include/asm/uaccess.h
> index 78e2a3990eab..8cbf3e3874f1 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -53,9 +53,6 @@ static inline bool __access_ok(unsigned long addr, unsigned 
> long size)
>  #define __put_user(x, ptr) \
>   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
>  
> -#define __get_user_allowed(x, ptr) \
> - __get_user_nocheck((x), (ptr), sizeof(*(ptr)), false)
> -
>  #define __get_user_inatomic(x, ptr) \
>   __get_user_nosleep((x), (ptr), sizeof(*(ptr)))
>  #define __put_user_inatomic(x, ptr) \
> @@ -482,8 +479,11 @@ user_write_access_begin(const void __user *ptr, size_t 
> len)
>  #define user_write_access_begin  user_write_access_begin
>  #define user_write_access_endprevent_current_write_to_user
>  
> -#define unsafe_op_wrap(op, err) do { if (unlikely(op)) goto err; } while (0)
> -#define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
> +#define unsafe_get_user(x, p, e) do {
> \
> + if (unlikely(__get_user_nocheck((x), (p), sizeof(*(p)), false)))\
> + goto e; \
> +} while (0)
> +
>  #define unsafe_put_user(x, p, e) \
>   __unsafe_put_user_goto((__typeof__(*(p)))(x), (p), sizeof(*(p)), e)
>  
> -- 
> 2.25.0


Re: [PATCH v1 03/15] powerpc/uaccess: Remove __get/put_user_inatomic()

2021-03-01 Thread Daniel Axtens
Christophe Leroy  writes:

> Since commit 662bbcb2747c ("mm, sched: Allow uaccess in atomic with
> pagefault_disable()"), __get/put_user() can be used in atomic parts
> of the code, therefore the __get/put_user_inatomic() introduced
> by commit e68c825bb016 ("[POWERPC] Add inatomic versions of __get_user
> and __put_user") have become useless.

I spent some time chasing these macro definitions.

Let me see if I understand you.

__get_user(x, ptr) becomes __get_user_nocheck(..., true)
__get_user_inatomic() become __get_user_nosleep()

The difference between how __get_user_nosleep() and
__get_user_nocheck(..., true) operate is that __get_user_nocheck calls
might_fault() and __get_user_nosleep() does not.

If I understand the commit you reference and mm/memory.c, you're saying
that we can indeed call might_fault() when page faults are disabled,
because __might_fault() checks if page faults are disabled and does not
fire a warning if it is called with page faults disabled.

Therefore, it is safe to remove our _inatomic version that does not call
might_fault and just to call might_fault unconditionally.

Is that right?

I haven't checked changes you made to the various .c files in fine
detail but they appear to be entirely mechanical.

> powerpc is the only one having such functions. There is a real
> intention not to have to provide such _inatomic() helpers, see the
> comment in might_fault() in mm/memory.c introduced by
> commit 3ee1afa308f2 ("x86: some lock annotations for user
> copy paths, v2"):
>
>   /*
>* it would be nicer only to annotate paths which are not under
>* pagefault_disable, however that requires a larger audit and
>* providing helpers like get_user_atomic.
>*/
>

I'm not fully sure I understand what you're saying in this part of the
commit message.

Kind regards,
Daniel

>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/uaccess.h| 37 ---
>  arch/powerpc/kernel/align.c   | 32 
>  .../kernel/hw_breakpoint_constraints.c|  2 +-
>  arch/powerpc/kernel/traps.c   |  2 +-
>  4 files changed, 18 insertions(+), 55 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/uaccess.h 
> b/arch/powerpc/include/asm/uaccess.h
> index a08c482b1315..01aea0df4dd0 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -53,11 +53,6 @@ static inline bool __access_ok(unsigned long addr, 
> unsigned long size)
>  #define __put_user(x, ptr) \
>   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
>  
> -#define __get_user_inatomic(x, ptr) \
> - __get_user_nosleep((x), (ptr), sizeof(*(ptr)))
> -#define __put_user_inatomic(x, ptr) \
> - __put_user_nosleep((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
> -
>  #ifdef CONFIG_PPC64
>  
>  #define ___get_user_instr(gu_op, dest, ptr)  \
> @@ -92,9 +87,6 @@ static inline bool __access_ok(unsigned long addr, unsigned 
> long size)
>  #define __get_user_instr(x, ptr) \
>   ___get_user_instr(__get_user, x, ptr)
>  
> -#define __get_user_instr_inatomic(x, ptr) \
> - ___get_user_instr(__get_user_inatomic, x, ptr)
> -
>  extern long __put_user_bad(void);
>  
>  #define __put_user_size(x, ptr, size, retval)\
> @@ -141,20 +133,6 @@ __pu_failed: 
> \
>   __pu_err;   \
>  })
>  
> -#define __put_user_nosleep(x, ptr, size) \
> -({   \
> - long __pu_err;  \
> - __typeof__(*(ptr)) __user *__pu_addr = (ptr);   \
> - __typeof__(*(ptr)) __pu_val = (x);  \
> - __typeof__(size) __pu_size = (size);\
> - \
> - __chk_user_ptr(__pu_addr);  \
> - __put_user_size(__pu_val, __pu_addr, __pu_size, __pu_err); \
> - \
> - __pu_err;   \
> -})
> -
> -
>  /*
>   * We don't tell gcc that we are accessing memory, but this is OK
>   * because we do not write to any memory gcc knows about, so there
> @@ -320,21 +298,6 @@ do { 
> \
>   __gu_err;   \
>  })
>  
> -#define __get_user_nosleep(x, ptr, size) \
> -({   \
> - long __gu_err;  \
> - __long_type(*(ptr)) __gu_val;   \
> - __typeof__(*(ptr)) __user *__gu_addr = (ptr);   \
> - __typeof__(size) __gu_size = (size);\
> -   

Re: [PATCH v1 02/15] powerpc/uaccess: Define ___get_user_instr() for ppc32

2021-03-01 Thread Daniel Axtens
Hi Christophe,

> +#else /* !CONFIG_PPC64 */
> +#define ___get_user_instr(gu_op, dest, ptr)  \
> + gu_op((dest).val, (u32 __user *)(ptr))
> +#endif /* CONFIG_PPC64 */
>  
>  #define get_user_instr(x, ptr) \
>   ___get_user_instr(get_user, x, ptr)
> @@ -91,18 +95,6 @@ static inline bool __access_ok(unsigned long addr, 
> unsigned long size)
>  #define __get_user_instr_inatomic(x, ptr) \
>   ___get_user_instr(__get_user_inatomic, x, ptr)
>  
> -#else /* !CONFIG_PPC64 */
> -#define get_user_instr(x, ptr) \
> - get_user((x).val, (u32 __user *)(ptr))
> -
> -#define __get_user_instr(x, ptr) \
> - __get_user_nocheck((x).val, (u32 __user *)(ptr), sizeof(u32), true)
> -
> -#define __get_user_instr_inatomic(x, ptr) \
> - __get_user_nosleep((x).val, (u32 __user *)(ptr), sizeof(u32))
> -
> -#endif /* CONFIG_PPC64 */

The previous version of __get_user_instr called __get_user_nocheck,
this version calls __get_user. Likewise __get_user_instr_inatomic called
__get_user_nosleep and now it calls __get_user_inatomic. I was confused
by this until I chased the macro definitions and realised that both
names refer to the same thing:

#define __get_user(x, ptr) \
__get_user_nocheck((x), (ptr), sizeof(*(ptr)), true)

#define __get_user_inatomic(x, ptr) \
__get_user_nosleep((x), (ptr), sizeof(*(ptr)))

(I don't think you need to do anything here, I'm just documenting what I
considered while reviewing your patch.)

As such:
Reviewed-by: Daniel Axtens 

Kind regards,
Daniel


> -
>  extern long __put_user_bad(void);
>  
>  #define __put_user_size(x, ptr, size, retval)\
> -- 
> 2.25.0


Re: [PATCH v1 01/15] powerpc/uaccess: Remove __get_user_allowed() and unsafe_op_wrap()

2021-03-01 Thread Daniel Axtens



Christophe Leroy  writes:

> Those two macros have only one user which is unsafe_get_user().
>
> Put everything in one place and remove them.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/uaccess.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/uaccess.h 
> b/arch/powerpc/include/asm/uaccess.h
> index 78e2a3990eab..8cbf3e3874f1 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -53,9 +53,6 @@ static inline bool __access_ok(unsigned long addr, unsigned 
> long size)
>  #define __put_user(x, ptr) \
>   __put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
>  
> -#define __get_user_allowed(x, ptr) \
> - __get_user_nocheck((x), (ptr), sizeof(*(ptr)), false)
> -
>  #define __get_user_inatomic(x, ptr) \
>   __get_user_nosleep((x), (ptr), sizeof(*(ptr)))
>  #define __put_user_inatomic(x, ptr) \
> @@ -482,8 +479,11 @@ user_write_access_begin(const void __user *ptr, size_t 
> len)
>  #define user_write_access_begin  user_write_access_begin
>  #define user_write_access_endprevent_current_write_to_user
>  
> -#define unsafe_op_wrap(op, err) do { if (unlikely(op)) goto err; } while (0)
> -#define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
> +#define unsafe_get_user(x, p, e) do {
> \
> + if (unlikely(__get_user_nocheck((x), (p), sizeof(*(p)), false)))\
> + goto e; \
> +} while (0)
> +

This seems correct to me.

Checkpatch does have one check that is relevant:

CHECK: Macro argument reuse 'p' - possible side-effects?
#36: FILE: arch/powerpc/include/asm/uaccess.h:482:
+#define unsafe_get_user(x, p, e) do {  \
+   if (unlikely(__get_user_nocheck((x), (p), sizeof(*(p)), false)))\
+   goto e; \
+} while (0)

Given that we are already creating a new block, should we do something
like this (completely untested):

#define unsafe_get_user(x, p, e) do {					\
	__typeof__(p) __p = (p);					\
	if (unlikely(__get_user_nocheck((x), (__p), sizeof(*(__p)), false)))\
		goto e;							\
} while (0)

Kind regards,
Daniel

>  #define unsafe_put_user(x, p, e) \
>   __unsafe_put_user_goto((__typeof__(*(p)))(x), (p), sizeof(*(p)), e)
>  
> -- 
> 2.25.0


Re: [PATCH] powerpc: use ARRAY_SIZE instead of division operation

2021-02-21 Thread Daniel Axtens
Hi Yang,

> This eliminates the following coccicheck warning:
> ./arch/powerpc/boot/mktree.c:130:31-32: WARNING: Use ARRAY_SIZE
>
> Reported-by: Abaci Robot 
> Signed-off-by: Yang Li 
> ---
>  arch/powerpc/boot/mktree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/boot/mktree.c b/arch/powerpc/boot/mktree.c
> index dc603f3..0b2def5 100644
> --- a/arch/powerpc/boot/mktree.c
> +++ b/arch/powerpc/boot/mktree.c
> @@ -127,7 +127,7 @@ int main(int argc, char *argv[])
>   exit(5);
>   }
>   cp = tmpbuf;
> - for (i = 0; i < sizeof(tmpbuf) / sizeof(unsigned int); i++)
> + for (i = 0; i < ARRAY_SIZE(tmpbuf); i++)

Unfortunately this breaks the build - see
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/1613808853-78381-1-git-send-email-yang@linux.alibaba.com/

This is because 'boot/mktree.c' is a userspace program without access to
the usual kernel headers.

It might be best to get coccicheck to exclude all of
'arch/powerpc/boot/' as I believe everything in that directory is
compiled differently to the rest of the kernel (see the Makefile), but
maybe just blacklisting boot/mktree.c would be sufficient.

Kind regards,
Daniel

>   cksum += *cp++;
>   if (write(out_fd, tmpbuf, sizeof(tmpbuf)) != sizeof(tmpbuf)) {
>   perror("boot-image write");
> -- 
> 1.8.3.1


Re: [PATCH v10 6/6] powerpc: Book3S 64-bit outline-only KASAN support

2021-02-03 Thread Daniel Axtens
Christophe Leroy  writes:

> On 03/02/2021 at 12:59, Daniel Axtens wrote:
>> Implement a limited form of KASAN for Book3S 64-bit machines running under
>> the Radix MMU, supporting only outline mode.
>> 
>
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index a66f435dabbf..9a6fd603f0e7 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -2157,8 +2157,8 @@ void show_stack(struct task_struct *tsk, unsigned long 
>> *stack,
>>  break;
>>   
>>  stack = (unsigned long *) sp;
>> -newsp = stack[0];
>> -ip = stack[STACK_FRAME_LR_SAVE];
>> +newsp = READ_ONCE_NOCHECK(stack[0]);
>> +ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
>>  if (!firstframe || ip != lr) {
>>  printk("%s["REG"] ["REG"] %pS",
>>  loglvl, sp, ip, (void *)ip);
>> @@ -2176,17 +2176,19 @@ void show_stack(struct task_struct *tsk, unsigned 
>> long *stack,
>>   * See if this is an exception frame.
>>   * We look for the "regshere" marker in the current frame.
>>   */
>> -if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
>> -&& stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
>> +if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE) &&
>> +(READ_ONCE_NOCHECK(stack[STACK_FRAME_MARKER]) ==
>> + STACK_FRAME_REGS_MARKER)) {
>>  struct pt_regs *regs = (struct pt_regs *)
>>  (sp + STACK_FRAME_OVERHEAD);
>>   
>> -lr = regs->link;
>> +lr = READ_ONCE_NOCHECK(regs->link);
>>  printk("%s--- interrupt: %lx at %pS\n",
>> -   loglvl, regs->trap, (void *)regs->nip);
>> +   loglvl, READ_ONCE_NOCHECK(regs->trap),
>> +   (void *)READ_ONCE_NOCHECK(regs->nip));
>>  __show_regs(regs);
>>  printk("%s--- interrupt: %lx\n",
>> -   loglvl, regs->trap);
>> +   loglvl, READ_ONCE_NOCHECK(regs->trap));
>>   
>>  firstframe = 1;
>>  }
>
>
> The above changes look like a bug fix not directly related to KASAN. Should 
> be split out in another 
> patch I think.

That code corresponds to the following part of the patch description:

| - Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
|   generic code, arm64, s390 and x86 all do this for similar sorts of
|   reasons: when unwinding a stack, we might touch memory that KASAN has
|   marked as being out-of-bounds. In our case we often get this when
|   checking for an exception frame because we're checking an arbitrary
|   offset into the stack frame.
|
|   See commit 20955746320e ("s390/kasan: avoid false positives during stack
|   unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
|   frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
|   Prevent KASAN false positive warnings") and commit 6e22c8366416
|   ("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer")

include/linux/compiler.h describes it as follows:

/*
 * Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need
 * to hide memory access from KASAN.
 */

So I think it is sufficently connected with KASAN to be in this patch.

Kind regards,
Daniel

>
> Christophe


[PATCH v10 6/6] powerpc: Book3S 64-bit outline-only KASAN support

2021-02-03 Thread Daniel Axtens
Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.

 - Enable the compiler instrumentation to check addresses and maintain the
   shadow region. (This is the guts of KASAN which we can easily reuse.)

 - Require kasan-vmalloc support to handle modules and anything else in
   vmalloc space.

 - KASAN needs to be able to validate all pointer accesses, but we can't
   instrument all kernel addresses - only linear map and vmalloc. On boot,
   set up a single page of read-only shadow that marks all iomap and
   vmemmap accesses as valid.

 - Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
   generic code, arm64, s390 and x86 all do this for similar sorts of
   reasons: when unwinding a stack, we might touch memory that KASAN has
   marked as being out-of-bounds. In our case we often get this when
   checking for an exception frame because we're checking an arbitrary
   offset into the stack frame.

   See commit 20955746320e ("s390/kasan: avoid false positives during stack
   unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
   frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
   Prevent KASAN false positive warnings") and commit 6e22c8366416
   ("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer")

 - Document KASAN in both generic and powerpc docs.

Background
--

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code with translations off ("real mode") during boot,
   including a lot of generic device-tree parsing code which is used to
   determine MMU features.

[ppc64 mm note: The kernel installs a linear mapping at effective
address c000...-c008 This is a one-to-one mapping with physical
memory from ... onward. Because of how memory accesses work on
powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
same memory both with translations on (accessing as an 'effective
address'), and with translations off (accessing as a 'real
address'). This works in both guests and the hypervisor. For more
details, see s5.7 of Book III of version 3 of the ISA, in particular
the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
KASAN implementation currently only supports Radix.]

 - Some code - most notably a lot of KVM code - also runs with translations
   off after boot.

 - Therefore any offset has to point to memory that is valid with
   translations on or off.

One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.

Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.

Cc: Balbir Singh  # ppc64 out-of-line radix version
Cc: Aneesh Kumar K.V  # ppc64 hash version
Cc: Christophe Leroy  # ppc32 version
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst|  9 +-
 Documentation/powerpc/kasan.txt  | 48 +-
 arch/powerpc/Kconfig |  4 +-
 arch/powerpc/Kconfig.debug   |  3 +-
 arch/powerpc/include/asm/book3s/64/hash.h|  4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |  4 +
 arch/powerpc/include/asm/book3s/64/radix.h   | 13 ++-
 arch/powerpc/include/asm/kasan.h | 22 +
 arch/powerpc/kernel/Makefile | 11 +++
 arch/powerpc/kernel/process.c| 16 ++--
 arch/powerpc/kvm/Makefile|  5 ++
 arch/powerpc/mm/book3s64/Makefile|  9 ++
 arch/powerpc/mm/kasan/Makefile   |  1 +
 arch/powerpc/mm/kasan/init_book3s_64.c   | 95 
 arch/powerpc/mm/ptdump/ptdump.c  | 20 -
 arch/powerpc/platforms/Kconfig.cputype   |  1 +
 arch/powerpc/platforms/powernv/Makefile  |  6 ++
 arch/powerpc/platforms/pseries/Makefile  |  3 +
 18 files changed, 256 insertions(+), 18 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index 9cfc116cc6bf..8c5d0c5c35f6 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -22,8 +22,9 @@ out-of-bounds accesses for global variables is only supported 
since Clang 11.
 Tag-based KASAN is only supported in Clang.
 
 Currently generic KASAN is supported for the x86_

[PATCH v10 5/6] powerpc/mm/kasan: rename kasan_init_32.c to init_32.c

2021-02-03 Thread Daniel Axtens
kasan is already implied by the directory name, we don't need to
repeat it.

Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/mm/kasan/Makefile   | 2 +-
 arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)

diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index bb1a5408b86b..42fb628a44fd 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -2,6 +2,6 @@
 
 KASAN_SANITIZE := n
 
-obj-$(CONFIG_PPC32)   += kasan_init_32.o
+obj-$(CONFIG_PPC32)   += init_32.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
 obj-$(CONFIG_PPC_BOOK3S_32)+= book3s_32.o
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/init_32.c
similarity index 100%
rename from arch/powerpc/mm/kasan/kasan_init_32.c
rename to arch/powerpc/mm/kasan/init_32.c
-- 
2.27.0



[PATCH v10 4/6] kasan: Document support on 32-bit powerpc

2021-02-03 Thread Daniel Axtens
KASAN is supported on 32-bit powerpc and the docs should reflect this.

Document s390 support while we're at it.

Suggested-by: Christophe Leroy 
Reviewed-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst |  7 +--
 Documentation/powerpc/kasan.txt   | 12 
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index e022b7506e37..9cfc116cc6bf 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -22,7 +22,8 @@ out-of-bounds accesses for global variables is only supported 
since Clang 11.
 Tag-based KASAN is only supported in Clang.
 
 Currently generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390
-and riscv architectures, and tag-based KASAN modes are supported only for 
arm64.
+and riscv architectures. It is also supported on 32-bit powerpc kernels.
+Tag-based KASAN modes are supported only for arm64.
 
 Usage
 -
@@ -332,7 +333,9 @@ CONFIG_KASAN_VMALLOC
 
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
-cost of greater memory usage. Currently this is only supported on x86.
+cost of greater memory usage. Currently this supported on x86, s390
+and 32-bit powerpc. It is optional, except on 32-bit powerpc kernels
+with module support, where it is required.
 
 This works by hooking into vmalloc and vmap, and dynamically
 allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
new file mode 100644
index ..26bb0e8bb18c
--- /dev/null
+++ b/Documentation/powerpc/kasan.txt
@@ -0,0 +1,12 @@
+KASAN is supported on powerpc on 32-bit only.
+
+32 bit support
+==
+
+KASAN is supported on both hash and nohash MMUs on 32-bit.
+
+The shadow area sits at the top of the kernel virtual memory space above the
+fixmap area and occupies one eighth of the total kernel virtual memory space.
+
+Instrumentation of the vmalloc area is optional, unless built with modules,
+in which case it is required.
-- 
2.27.0



[PATCH v10 3/6] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2021-02-03 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer constants, which therefore
breaks the build.

Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 18 +++---
 mm/kasan/init.c   |  6 +++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index d314c0fa5804..84bea59d01b3 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -43,10 +43,22 @@ static inline bool kasan_arch_is_ready(void){ 
return true; }
 #define PTE_HWTABLE_PTRS 0
 #endif
 
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index c4605ac9837b..b4d822dff1fb 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
__page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
-- 
2.27.0



[PATCH v10 2/6] kasan: allow architectures to provide an outline readiness check

2021-02-03 Thread Daniel Axtens
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
uninvasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.

Cc: Balbir Singh 
Cc: Aneesh Kumar K.V 
Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 

--

I discuss the justification for this later in the series. Also,
both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
---
 include/linux/kasan.h | 4 
 mm/kasan/common.c | 4 
 mm/kasan/generic.c| 3 +++
 mm/kasan/shadow.c | 4 
 4 files changed, 15 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index bb862d1f0e15..d314c0fa5804 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -23,6 +23,10 @@ struct kunit_kasan_expectation {
 
 #endif
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#endif
+
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 
 #include 
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index a390fae9d64b..871ceefd723d 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -348,6 +348,10 @@ static bool kasan_slab_free(struct kmem_cache *cache, 
void *object,
if (unlikely(cache->flags & SLAB_TYPESAFE_BY_RCU))
return false;
 
+   /* We can't read the shadow byte if the arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return false;
+
if (!kasan_byte_accessible(tagged_object)) {
kasan_report_invalid_free(tagged_object, ip);
return true;
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 2e55e0f82f39..718c171584e3 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned 
long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return true;
+
if (unlikely(size == 0))
return true;
 
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index de6b3f074742..0aafc2d5138f 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -85,6 +85,10 @@ void kasan_poison(const void *address, size_t size, u8 value)
address = kasan_reset_tag(address);
size = round_up(size, KASAN_GRANULE_SIZE);
 
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
/* Skip KFENCE memory if called explicitly outside of sl*b. */
if (is_kfence_address(address))
return;
-- 
2.27.0



[PATCH v10 1/6] kasan: allow an architecture to disable inline instrumentation

2021-02-03 Thread Daniel Axtens
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.

Signed-off-by: Daniel Axtens 
---
 lib/Kconfig.kasan | 8 
 1 file changed, 8 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..4f4d3fb8733d 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,9 @@ config HAVE_ARCH_KASAN_HW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+config ARCH_DISABLE_KASAN_INLINE
+   def_bool n
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -130,6 +133,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
bool "Inline instrumentation"
+   depends on !ARCH_DISABLE_KASAN_INLINE
help
  Compiler directly inserts code checking shadow memory before
  memory accesses. This is faster than outline (in some workloads
@@ -141,6 +145,7 @@ endchoice
 config KASAN_STACK
bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
!COMPILE_TEST
depends on KASAN_GENERIC || KASAN_SW_TAGS
+   depends on !ARCH_DISABLE_KASAN_INLINE
default y if CC_IS_GCC
help
  The LLVM stack address sanitizer has a know problem that
@@ -154,6 +159,9 @@ config KASAN_STACK
  but clang users can still enable it for builds without
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
+ If the architecture disables inline instrumentation, this is
+ also disabled as it adds inline-style instrumentation that
+ is run unconditionally.
 
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
-- 
2.27.0



[PATCH v10 0/6] KASAN for powerpc64 radix

2021-02-03 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU.

v10 rebases on top of next-20210125, fixing things up to work on top
of the latest changes, and fixing some review comments from
Christophe. I have tested host and guest with 64k pages for this spin.

It does not apply to powerpc/next, sorry: there are conflicting kasan
changes staged in next.

There is now only 1 failing KUnit test: kasan_global_oob - gcc puts
the ASAN init code in a section called '.init_array'. Powerpc64 module
loading code goes through and _renames_ any section beginning with
'.init' to begin with '_init' in order to avoid some complexities
around our 24-bit indirect jumps. This means it renames '.init_array'
to '_init_array', and the generic module loading code then fails to
recognise the section as a constructor and thus doesn't run it. This
hack dates back to 2003 and so I'm not going to try to unpick it in
this series. (I suspect this may have previously worked if the code
ended up in .ctors rather than .init_array but I don't keep my old
binaries around so I have no real way of checking.)

(The previously failing stack tests are now skipped due to more
accurate configuration settings.)

Details from v9: This is a significant reworking of the previous
versions. Instead of the previous approach which supported inline
instrumentation, this series provides only outline instrumentation.

To get around the problem of accessing the shadow region inside code we run
with translations off (in 'real mode'), we restrict checking to when
translations are enabled. This is done via a new hook in the kasan core and
by excluding larger quantities of arch code from instrumentation. The upside
is that we no longer require that you be able to specify the amount of
physically contiguous memory on the system at compile time. Hopefully this
is a better trade-off. More details in patch 6.

kexec works. Both 64k and 4k pages work. Running as a KVM host works, but
nothing in arch/powerpc/kvm is instrumented. It's also potentially a bit
fragile - if any real mode code paths call out to instrumented code, things
will go boom.

Kind regards,
Daniel


Re: [PATCH v9 6/6] powerpc: Book3S 64-bit outline-only KASAN support

2021-02-03 Thread Daniel Axtens
Hi Christophe,

>>  select HAVE_ARCH_HUGE_VMAP  if PPC_BOOK3S_64 && 
>> PPC_RADIX_MMU
>>  select HAVE_ARCH_JUMP_LABEL
>>  select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
>> -select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
>> +select HAVE_ARCH_KASAN  if PPC_BOOK3S_64 && 
>> PPC_RADIX_MMU
>
> PPC_RADIX_MMU already depends on PPC_BOOK3S_64 so 'if PPC_RADIX_MMU' would be 
> enough

Done.

>> +select HAVE_ARCH_NO_KASAN_INLINEif PPC_BOOK3S_64 && 
>> PPC_RADIX_MMU
>
> This list must respect Alphabetical order.

Fixed.

>
>> +select HAVE_ARCH_KASAN_VMALLOC  if HAVE_ARCH_KASAN
>>  select HAVE_ARCH_KGDB
>>  select HAVE_ARCH_MMAP_RND_BITS
>>  select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if COMPAT
>> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
>> index b88900f4832f..60c1bba72a6f 100644
>> --- a/arch/powerpc/Kconfig.debug
>> +++ b/arch/powerpc/Kconfig.debug
>> @@ -396,5 +396,5 @@ config PPC_FAST_ENDIAN_SWITCH
>>   
>>   config KASAN_SHADOW_OFFSET
>>  hex
>> -depends on KASAN
>> +depends on KASAN && PPC32
>>  default 0xe000
>
> Instead of the above, why not doing:
>
>   default 0xe000 if PPC32
>   default 0xa80e is PPC_BOOK3S_64

Done. I just used PPC64.

>
>> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
>> b/arch/powerpc/include/asm/book3s/64/hash.h
>> index 73ad038ed10b..105b90594a8a 100644
>> --- a/arch/powerpc/include/asm/book3s/64/hash.h
>> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
>> @@ -18,6 +18,10 @@
>>   #include 
>>   #endif
>>   
>> +#define H_PTRS_PER_PTE  (1 << H_PTE_INDEX_SIZE)
>> +#define H_PTRS_PER_PMD  (1 << H_PMD_INDEX_SIZE)
>> +#define H_PTRS_PER_PUD  (1 << H_PUD_INDEX_SIZE)
>> +
>>   /* Bits to set in a PMD/PUD/PGD entry valid bit*/
>>   #define HASH_PMD_VAL_BITS  (0x8000UL)
>>   #define HASH_PUD_VAL_BITS  (0x8000UL)
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
>> b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> index a39886681629..767e239d75e3 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -230,6 +230,13 @@ extern unsigned long __pmd_frag_size_shift;
>>   #define PTRS_PER_PUD   (1 << PUD_INDEX_SIZE)
>>   #define PTRS_PER_PGD   (1 << PGD_INDEX_SIZE)
>>   
>> +#define MAX_PTRS_PER_PTE((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? \
>> +  H_PTRS_PER_PTE : R_PTRS_PER_PTE)
>
> Nowadays we allow 100 chars per line. Could this fit on a single line ?

Yes, so long as we drop the tab between the name and the definition and
replace it with a single space.

>> +#define MAX_PTRS_PER_PMD((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? \
>> +  H_PTRS_PER_PMD : R_PTRS_PER_PMD)
>> +#define MAX_PTRS_PER_PUD((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? \
>> +  H_PTRS_PER_PUD : R_PTRS_PER_PUD)
>> +
>>   /* PMD_SHIFT determines what a second-level page table entry can map */
>>   #define PMD_SHIFT  (PAGE_SHIFT + PTE_INDEX_SIZE)
>>   #define PMD_SIZE   (1UL << PMD_SHIFT)

>> +#ifdef CONFIG_PPC32
>>   #define KASAN_SHADOW_END   (-(-KASAN_SHADOW_START >> 
>> KASAN_SHADOW_SCALE_SHIFT))
>> +#endif
>> +
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +/*
>> + * We define the  offset such that the shadow of the linear map lives
>> + * at the end of vmemmap space, that is, we choose offset such that
>> + * shadow(c000___) = c00e___. This gives:
>> + * c00e - c000 >> 3 = a80e
>> + */
>> +#define KASAN_SHADOW_OFFSET ASM_CONST(0xa80e)
>
> Why can't you use CONFIG_KASAN_SHADOW_OFFSET ?

I didn't do this earlier due to some interesting interactions in the
generic configuration. The generic Kconfig/Makefile will turn on stack
instrumentation if CONFIG_KASAN_SHADOW_OFFSET is set. This, however,
creates a bunch of inline code that crashes on ppc64 when it runs with
translations off.

I've made changes to the patch that introduces ARCH_DISABLE_KASAN_INLINE
to fix this, and converted to using CONFIG_KASAN_SHADOW_OFFSET.
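
With that conversion the arch header just derives the constant from the
Kconfig symbol - roughly the following, as a sketch of what I have locally
(I believe this mirrors what the 32-bit code already does):

#define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)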

>> +
>> +/*
>> + * The shadow ends before the highest accessible address
>> + * because we don't need a shadow for the shadow. Instead:
>> + * c00e << 3 + a80e000 = c00fc000
>> + */
>> +#define KASAN_SHADOW_END 0xc00fc000UL
>
> I think we should be able to have a common formula for PPC32 and PPC64.
>

Perhaps, but I can't figure out what it would be. For PPC64:

  end = start + (size of kernel space + vmalloc space + ioremap space + vmemmap space) >> 3

For PPC32, AIUI you don't have vmemmap space, so I'm not sure how to
calculate it.
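
As a sanity check of the ppc64 numbers (my working, with the constants
written out in full):

  0xc00e000000000000 >> 3                   = 0x1801c00000000000
  0x1801c00000000000 + 0xa80e000000000000   = 0xc00fc00000000000

which is the KASAN_SHADOW_END value in the hunk above.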

>> +
>> +DECLARE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
>> +
>> +static inline bool kasan_arch_is_ready_ppc64(void)
>
> I'd make it 

Re: [RFC][PATCH] afs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y

2020-12-31 Thread Daniel Axtens
Hi David,

> CONFIG_FORTIFIED_SOURCE=y now causes an oops in strnlen() from afs (see
> attached patch for an explanation).  Is replacing the use with memchr() the
> right approach?  Or should I be calling __real_strnlen() or whatever it's
> called?

You certainly shouldn't be calling __real_strnlen().

memchr() is probably the right answer if you want a small fix.

However, as Linus suggested, the 'right' answer is to re-engineer your
data structures so that the string operations you do don't overflow
their arrays.

Kind regards,
Daniel

>
> David
> ---
> From: David Howells 
>
> afs: Work around strnlen() oops with CONFIG_FORTIFIED_SOURCE=y
>
> AFS has a structured layout in its directory contents (AFS dirs are
> downloaded as files and parsed locally by the client for lookup/readdir).
> The slots in the directory are defined by union afs_xdr_dirent.  This,
> however, only directly allows a name of a length that will fit into that
> union.  To support a longer name, the next 1-8 contiguous entries are
> annexed to the first one and the name flows across these.
>
> afs_dir_iterate_block() uses strnlen(), limited to the space to the end of
> the page, to find out how long the name is.  This worked fine until
> 6a39e62abbaf.  With that commit, the compiler determines the size of the
> array and asserts that the string fits inside that array.  This is a
> problem for AFS because we *expect* it to overflow one or more arrays.
>
> A similar problem also occurs in afs_dir_scan_block() when a directory file
> is being locally edited to avoid the need to redownload it.  There strlen()
> was being used safely because each page has the last byte set to 0 when the
> file is downloaded and validated (in afs_dir_check_page()).
>
> Fix this by using memchr() instead and hoping no one changes that to check
> the object size.
>
> The issue can be triggered by something like:
>
> touch /afs/example.com/thisisaveryveryverylongname
>
> and it generates a report that looks like:
>
> detected buffer overflow in strnlen
> [ cut here ]
> kernel BUG at lib/string.c:1149!
> ...
> RIP: 0010:fortify_panic+0xf/0x11
> ...
> Call Trace:
>  afs_dir_iterate_block+0x12b/0x35b
>  afs_dir_iterate+0x14e/0x1ce
>  afs_do_lookup+0x131/0x417
>  afs_lookup+0x24f/0x344
>  lookup_open.isra.0+0x1bb/0x27d
>  open_last_lookups+0x166/0x237
>  path_openat+0xe0/0x159
>  do_filp_open+0x48/0xa4
>  ? kmem_cache_alloc+0xf5/0x16e
>  ? __clear_close_on_exec+0x13/0x22
>  ? _raw_spin_unlock+0xa/0xb
>  do_sys_openat2+0x72/0xde
>  do_sys_open+0x3b/0x58
>  do_syscall_64+0x2d/0x3a
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fixes: 6a39e62abbaf ("lib: string.h: detect intra-object overflow in 
> fortified string functions")
> Reported-by: Marc Dionne 
> Signed-off-by: David Howells 
> cc: Daniel Axtens 
>
> diff --git a/fs/afs/dir.c b/fs/afs/dir.c
> index 9068d5578a26..4fafb4e4d0df 100644
> --- a/fs/afs/dir.c
> +++ b/fs/afs/dir.c
> @@ -350,6 +350,7 @@ static int afs_dir_iterate_block(struct afs_vnode *dvnode,
>unsigned blkoff)
>  {
>   union afs_xdr_dirent *dire;
> + const u8 *p;
>   unsigned offset, next, curr;
>   size_t nlen;
>   int tmp;
> @@ -378,9 +379,15 @@ static int afs_dir_iterate_block(struct afs_vnode 
> *dvnode,
>  
>   /* got a valid entry */
>   dire = >dirents[offset];
> - nlen = strnlen(dire->u.name,
> -sizeof(*block) -
> -offset * sizeof(union afs_xdr_dirent));
> + p = memchr(dire->u.name, 0,
> +sizeof(*block) - offset * sizeof(union 
> afs_xdr_dirent));
> + if (!p) {
> + _debug("ENT[%zu.%u]: %u unterminated dirent name",
> +blkoff / sizeof(union afs_xdr_dir_block),
> +offset, next);
> + return afs_bad(dvnode, afs_file_error_dir_over_end);
> + }
> + nlen = p - dire->u.name;
>  
>   _debug("ENT[%zu.%u]: %s %zu \"%s\"",
>  blkoff / sizeof(union afs_xdr_dir_block), offset,
> diff --git a/fs/afs/dir_edit.c b/fs/afs/dir_edit.c
> index 2ffe09abae7f..5ee4e992ed8f 100644
> --- a/fs/afs/dir_edit.c
> +++ b/fs/afs/dir_edit.c
> @@ -111,6 +111,8 @@ static int afs_dir_scan_block(union afs_xdr_dir_block 
> *block, struct qstr *name,
>

Re: [PATCH 1/2] rcu-tasks: move RCU-tasks initialization out of core_initcall()

2020-12-10 Thread Daniel Axtens
Hi Vlad,

>> Initialize the RCU-tasks earlier, before *_initcall() callbacks are
>> invoked. Do it after the workqueue subsytem is up and running. That
>> gives us a possibility to make use of synchronize_rcu_tasks*() wait
>> API in early_initcall() callbacks.
>> 
>> Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall")
>> Signed-off-by: Uladzislau Rezki (Sony) 

Tested-by: Daniel Axtens 

>> ---
>>  include/linux/rcupdate.h |  6 ++
>>  init/main.c  |  1 +
>>  kernel/rcu/tasks.h   | 26 ++
>>  3 files changed, 29 insertions(+), 4 deletions(-)
>> 
> I still don't have a powerPC hw so far, even though i have sent a request
> to the osuosl.org. It would be appreciated if Michael or Daniel could run
> and verify it.

Sorry it's taken me so long to get to this. Your patch fixes things for
me. Thanks!

BTW, I'm happy to see you taking on the challenge of RCU after your good
work on vmalloc - all the best with it!

Kind regards,
Daniel

>
> Thank you in advance!
>
> --
> Vlad Rezki


[PATCH v9 4/6] kasan: Document support on 32-bit powerpc

2020-12-01 Thread Daniel Axtens
KASAN is supported on 32-bit powerpc and the docs should reflect this.

Document s390 support while we're at it.

Suggested-by: Christophe Leroy 
Reviewed-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst |  7 +--
 Documentation/powerpc/kasan.txt   | 12 
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index 2b68addaadcd..eaf868094a8e 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -19,7 +19,8 @@ out-of-bounds accesses for global variables is only supported 
since Clang 11.
 Tag-based KASAN is only supported in Clang.
 
 Currently generic KASAN is supported for the x86_64, arm64, xtensa, s390 and
-riscv architectures, and tag-based KASAN is supported only for arm64.
+riscv architectures. It is also supported on 32-bit powerpc kernels. Tag-based
+KASAN is supported only on arm64.
 
 Usage
 -
@@ -255,7 +256,9 @@ CONFIG_KASAN_VMALLOC
 
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
-cost of greater memory usage. Currently this is only supported on x86.
+cost of greater memory usage. Currently this is supported on x86, s390
+and 32-bit powerpc. It is optional, except on 32-bit powerpc kernels
+with module support, where it is required.
 
 This works by hooking into vmalloc and vmap, and dynamically
 allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
new file mode 100644
index ..26bb0e8bb18c
--- /dev/null
+++ b/Documentation/powerpc/kasan.txt
@@ -0,0 +1,12 @@
+KASAN is supported on powerpc on 32-bit only.
+
+32 bit support
+==
+
+KASAN is supported on both hash and nohash MMUs on 32-bit.
+
+The shadow area sits at the top of the kernel virtual memory space above the
+fixmap area and occupies one eighth of the total kernel virtual memory space.
+
+Instrumentation of the vmalloc area is optional, unless built with modules,
+in which case it is required.
-- 
2.25.1



[PATCH v9 3/6] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2020-12-01 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer compile-time constants, which
breaks the build.

Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 18 +++---
 mm/kasan/init.c   |  6 +++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 3df66fdf6662..893d054aad6f 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -24,10 +24,22 @@ struct kunit_kasan_expectation {
 static inline bool kasan_arch_is_ready(void)   { return true; }
 #endif
 
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index fe6be0be1f76..42bca3d27db8 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -46,7 +46,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -58,7 +58,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -69,7 +69,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE] __page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
 {
-- 
2.25.1



[PATCH v9 6/6] powerpc: Book3S 64-bit outline-only KASAN support

2020-12-01 Thread Daniel Axtens
Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.

 - Enable the compiler instrumentation to check addresses and maintain the
   shadow region. (This is the guts of KASAN which we can easily reuse.)

 - Require kasan-vmalloc support to handle modules and anything else in
   vmalloc space.

 - KASAN needs to be able to validate all pointer accesses, but we can't
   instrument all kernel addresses - only linear map and vmalloc. On boot,
   set up a single page of read-only shadow that marks all iomap and
   vmemmap accesses as valid.

 - Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
   generic code, arm64, s390 and x86 all do this for similar sorts of
   reasons: when unwinding a stack, we might touch memory that KASAN has
   marked as being out-of-bounds. In our case we often get this when
   checking for an exception frame because we're checking an arbitrary
   offset into the stack frame.

   See commit 20955746320e ("s390/kasan: avoid false positives during stack
   unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
   frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
   Prevent KASAN false positive warnings") and commit 6e22c8366416
   ("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer")

 - Document KASAN in both generic and powerpc docs.

Background
--

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code with translations off ("real mode") during boot,
   including a lot of generic device-tree parsing code which is used to
   determine MMU features.

[ppc64 mm note: The kernel installs a linear mapping at effective
address c000...-c008 This is a one-to-one mapping with physical
memory from ... onward. Because of how memory accesses work on
powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
same memory both with translations on (accessing as an 'effective
address'), and with translations off (accessing as a 'real
address'). This works in both guests and the hypervisor. For more
details, see s5.7 of Book III of version 3 of the ISA, in particular
the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
KASAN implementation currently only supports Radix.]

 - Some code - most notably a lot of KVM code - also runs with translations
   off after boot.

 - Therefore any offset has to point to memory that is valid with
   translations on or off.

One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.

Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.

Cc: Balbir Singh  # ppc64 out-of-line radix version
Cc: Aneesh Kumar K.V  # ppc64 hash version
Cc: Christophe Leroy  # ppc32 version
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst|  9 +-
 Documentation/powerpc/kasan.txt  | 48 +-
 arch/powerpc/Kconfig |  4 +-
 arch/powerpc/Kconfig.debug   |  2 +-
 arch/powerpc/include/asm/book3s/64/hash.h|  4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |  7 ++
 arch/powerpc/include/asm/book3s/64/radix.h   | 13 ++-
 arch/powerpc/include/asm/kasan.h | 34 ++-
 arch/powerpc/kernel/Makefile |  5 +
 arch/powerpc/kernel/process.c| 16 ++--
 arch/powerpc/kvm/Makefile|  5 +
 arch/powerpc/mm/book3s64/Makefile|  8 ++
 arch/powerpc/mm/kasan/Makefile   |  1 +
 arch/powerpc/mm/kasan/init_book3s_64.c   | 98 
 arch/powerpc/mm/ptdump/ptdump.c  | 20 +++-
 arch/powerpc/platforms/Kconfig.cputype   |  1 +
 arch/powerpc/platforms/powernv/Makefile  |  6 ++
 arch/powerpc/platforms/pseries/Makefile  |  3 +
 18 files changed, 265 insertions(+), 19 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index eaf868094a8e..28f08959bd2e 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -19,8 +19,9 @@ out-of-bounds accesses for global variables is only supported 
since Clang 11.
 Tag-based KASAN is only supported in Clang.
 
 Currently generic KASAN is supported for the x86_64, arm

[PATCH v9 5/6] powerpc/mm/kasan: rename kasan_init_32.c to init_32.c

2020-12-01 Thread Daniel Axtens
kasan is already implied by the directory name, we don't need to
repeat it.

Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/mm/kasan/Makefile   | 2 +-
 arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)

diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index bb1a5408b86b..42fb628a44fd 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -2,6 +2,6 @@
 
 KASAN_SANITIZE := n
 
-obj-$(CONFIG_PPC32)   += kasan_init_32.o
+obj-$(CONFIG_PPC32)   += init_32.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
 obj-$(CONFIG_PPC_BOOK3S_32)+= book3s_32.o
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/init_32.c
similarity index 100%
rename from arch/powerpc/mm/kasan/kasan_init_32.c
rename to arch/powerpc/mm/kasan/init_32.c
-- 
2.25.1



[PATCH v9 2/6] kasan: allow architectures to provide an outline readiness check

2020-12-01 Thread Daniel Axtens
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
uninvasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
HAVE_ARCH_NO_KASAN_INLINE if it requires this.

Cc: Balbir Singh 
Cc: Aneesh Kumar K.V 
Signed-off-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 

--

I discuss the justification for this later in the series. Also,
both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
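
For reference, the arch side of this ends up looking roughly like the
sketch below. This is illustrative only - the real wiring is in the ppc64
patch later in this series, and the static key name is the one that patch
uses:

DECLARE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);

static inline bool kasan_arch_is_ready_ppc64(void)
{
	/* Only report "ready" once the MMU is up and the shadow is initialised. */
	return static_branch_likely(&powerpc_kasan_enabled_key);
}

#define kasan_arch_is_ready kasan_arch_is_ready_ppc64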
---
 include/linux/kasan.h |  4 
 mm/kasan/common.c | 10 ++
 mm/kasan/generic.c|  3 +++
 3 files changed, 17 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 30d343b4a40a..3df66fdf6662 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -20,6 +20,10 @@ struct kunit_kasan_expectation {
bool report_found;
 };
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
 extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE];
 extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 950fd372a07e..ba7744d3e319 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -117,6 +117,9 @@ void kasan_poison_shadow(const void *address, size_t size, 
u8 value)
 {
void *shadow_start, *shadow_end;
 
+   if (!kasan_arch_is_ready())
+   return;
+
/*
 * Perform shadow offset calculation based on untagged address, as
 * some of the callers (e.g. kasan_poison_object_data) pass tagged
@@ -134,6 +137,9 @@ void kasan_unpoison_shadow(const void *address, size_t size)
 {
u8 tag = get_tag(address);
 
+   if (!kasan_arch_is_ready())
+   return;
+
/*
 * Perform shadow offset calculation based on untagged address, as
 * some of the callers (e.g. kasan_unpoison_object_data) pass tagged
@@ -406,6 +412,10 @@ static bool __kasan_slab_free(struct kmem_cache *cache, 
void *object,
if (unlikely(cache->flags & SLAB_TYPESAFE_BY_RCU))
return false;
 
+   /* We can't read the shadow byte if the arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return false;
+
shadow_byte = READ_ONCE(*(s8 *)kasan_mem_to_shadow(object));
if (shadow_invalid(tag, shadow_byte)) {
kasan_report_invalid_free(tagged_object, ip);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 248264b9cb76..e87404026b2b 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -169,6 +169,9 @@ static __always_inline bool 
check_memory_region_inline(unsigned long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return true;
+
if (unlikely(size == 0))
return true;
 
-- 
2.25.1



[PATCH v9 1/6] kasan: allow an architecture to disable inline instrumentation

2020-12-01 Thread Daniel Axtens
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

Signed-off-by: Daniel Axtens 
---
 lib/Kconfig.kasan | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index 542a9c18398e..31a0b28f6c2b 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -9,6 +9,9 @@ config HAVE_ARCH_KASAN_SW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+config HAVE_ARCH_NO_KASAN_INLINE
+   def_bool n
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -108,6 +111,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
bool "Inline instrumentation"
+   depends on !HAVE_ARCH_NO_KASAN_INLINE
help
  Compiler directly inserts code checking shadow memory before
  memory accesses. This is faster than outline (in some workloads
-- 
2.25.1



[PATCH v9 0/6] KASAN for powerpc64 radix

2020-12-01 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU.

This is a significant reworking of the previous versions. Instead of
the previous approach which supported inline instrumentation, this
series provides only outline instrumentation.

To get around the problem of accessing the shadow region inside code we run
with translations off (in 'real mode'), we restrict checking to when
translations are enabled. This is done via a new hook in the kasan core and
by excluding larger quantities of arch code from instrumentation. The upside
is that we no longer require that you be able to specify the amount of
physically contiguous memory on the system at compile time. Hopefully this
is a better trade-off. More details in patch 6.

kexec works. Both 64k and 4k pages work. Running as a KVM host works, but
nothing in arch/powerpc/kvm is instrumented. It's also potentially a bit
fragile - if any real mode code paths call out to instrumented code, things
will go boom.

There are 4 failing KUnit tests:

kasan_stack_oob, kasan_alloca_oob_left & kasan_alloca_oob_right - these are
due to not supporting inline instrumentation.

kasan_global_oob - gcc puts the ASAN init code in a section called
'.init_array'. Powerpc64 module loading code goes through and _renames_ any
section beginning with '.init' to begin with '_init' in order to avoid some
complexities around our 24-bit indirect jumps. This means it renames
'.init_array' to '_init_array', and the generic module loading code then
fails to recognise the section as a constructor and thus doesn't run
it. This hack dates back to 2003 and so I'm not going to try to unpick it
in this series. (I suspect this may have previously worked if the code
ended up in .ctors rather than .init_array but I don't keep my old binaries
around so I have no real way of checking.)


Daniel Axtens (6):
  kasan: allow an architecture to disable inline instrumentation
  kasan: allow architectures to provide an outline readiness check
  kasan: define and use MAX_PTRS_PER_* for early shadow tables
  kasan: Document support on 32-bit powerpc
  powerpc/mm/kasan: rename kasan_init_32.c to init_32.c
  powerpc: Book3S 64-bit outline-only KASAN support




Re: linux-next: build failure in Linus' tree

2020-11-23 Thread Daniel Axtens
Thanks sfr and mpe.

> Applied to powerpc/fixes.
>
> [1/1] powerpc/64s: Fix allnoconfig build since uaccess flush
>   
> https://git.kernel.org/powerpc/c/b6b79dd53082db11070b4368d85dd6699ff0b063

We also needed a similar fix for stable, which has also been applied.

I guess I should set up some sort of build process that tests a whole
range of configs. I did test a few but clearly not enough. Is there a
known list that I should be using? Something from kisskb?

Kind regards,
Daniel

Michael Ellerman  writes:

> On Mon, 23 Nov 2020 18:40:16 +1100, Stephen Rothwell wrote:
>> After merging most of the trees, today's linux-next build (powerpc64
>> allnoconfig) failed like this:
>> 
>> In file included from arch/powerpc/include/asm/kup.h:18,
>>  from arch/powerpc/include/asm/uaccess.h:9,
>>  from include/linux/uaccess.h:11,
>>  from include/linux/sched/task.h:11,
>>  from include/linux/sched/signal.h:9,
>>  from include/linux/rcuwait.h:6,
>>  from include/linux/percpu-rwsem.h:7,
>>  from include/linux/fs.h:33,
>>  from include/linux/compat.h:17,
>>  from arch/powerpc/kernel/asm-offsets.c:14:
>> arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data 
>> definition has no type or storage class
>>66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
>>   | ^~~~
>> arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: error: type defaults to 
>> 'int' in declaration of 'DECLARE_STATIC_KEY_FALSE' [-Werror=implicit-int]
>> arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: parameter 
>> names (without types) in function declaration
>> arch/powerpc/include/asm/book3s/64/kup-radix.h: In function 
>> 'prevent_user_access':
>> arch/powerpc/include/asm/book3s/64/kup-radix.h:180:6: error: implicit 
>> declaration of function 'static_branch_unlikely' 
>> [-Werror=implicit-function-declaration]
>>   180 |  if (static_branch_unlikely(_flush_key))
>>   |  ^~
>> arch/powerpc/include/asm/book3s/64/kup-radix.h:180:30: error: 
>> 'uaccess_flush_key' undeclared (first use in this function)
>>   180 |  if (static_branch_unlikely(_flush_key))
>>   |  ^
>> arch/powerpc/include/asm/book3s/64/kup-radix.h:180:30: note: each undeclared 
>> identifier is reported only once for each function it appears in
>> arch/powerpc/include/asm/book3s/64/kup-radix.h: In function 
>> 'prevent_user_access_return':
>> arch/powerpc/include/asm/book3s/64/kup-radix.h:189:30: error: 
>> 'uaccess_flush_key' undeclared (first use in this function)
>>   189 |  if (static_branch_unlikely(_flush_key))
>>   |  ^
>> arch/powerpc/include/asm/book3s/64/kup-radix.h: In function 
>> 'restore_user_access':
>> arch/powerpc/include/asm/book3s/64/kup-radix.h:198:30: error: 
>> 'uaccess_flush_key' undeclared (first use in this function)
>>   198 |  if (static_branch_unlikely(_flush_key) && flags == 
>> AMR_KUAP_BLOCKED)
>>   |  ^
>> 
>> [...]
>


Re: [PATCH 4.4 00/15] 4.4.245-rc1 review

2020-11-22 Thread Daniel Axtens
Hi,
>> Build results:
>>  total: 165 pass: 164 fail: 1
>> Failed builds:
>>  powerpc:ppc64e_defconfig
>> Qemu test results:
>>  total: 328 pass: 323 fail: 5
>> Failed tests:
>>  ppc64:ppce500:corenet64_smp_defconfig:e5500:initrd
>>  ppc64:ppce500:corenet64_smp_defconfig:e5500:nvme:rootfs
>>  ppc64:ppce500:corenet64_smp_defconfig:e5500:sdhci:mmc:rootfs
>>  ppc64:ppce500:corenet64_smp_defconfig:e5500:scsi[53C895A]:rootfs
>>  ppc64:ppce500:corenet64_smp_defconfig:e5500:sata-sii3112:rootfs 
>> 
>> Failure in all cases is:
>> 
>> In file included from arch/powerpc/kernel/ppc_ksyms.c:10:0:
>> arch/powerpc/include/asm/book3s/64/kup-radix.h:11:29: error: redefinition of 
>> ‘allow_user_access’
>>  static __always_inline void allow_user_access(void __user *to, const void 
>> __user *from,
>>  ^
>> In file included from arch/powerpc/include/asm/uaccess.h:12:0,
>>  from arch/powerpc/kernel/ppc_ksyms.c:8:
>> arch/powerpc/include/asm/kup.h:12:20: note: previous definition of 
>> ‘allow_user_access’ was here
>>  static inline void allow_user_access(void __user *to, const void __user 
>> *from,
>> ^
>> In file included from arch/powerpc/kernel/ppc_ksyms.c:10:0:
>> arch/powerpc/include/asm/book3s/64/kup-radix.h:16:20: error: redefinition of 
>> ‘prevent_user_access’
>>  static inline void prevent_user_access(void __user *to, const void __user 
>> *from,
>> ^~~
>> In file included from arch/powerpc/include/asm/uaccess.h:12:0,
>>  from arch/powerpc/kernel/ppc_ksyms.c:8:
>> arch/powerpc/include/asm/kup.h:14:20: note: previous definition of 
>> ‘prevent_user_access’ was here
>>  static inline void prevent_user_access(void __user *to, const void __user 
>> *from,
>> ^~~
>> 
>> Tested-by: Guenter Roeck 
>
> Thanks for testing these.
>
> Daniel, looks like your patches broke some configurations on powerpc as
> shown above.  Care to send a fix-up patch for these?

Will do. I tested ppc64e_defconfig but clearly that wasn't comprehensive
enough.

Kind regards,
Daniel



[PATCH RESEND v2] fs/select.c: batch user writes in do_sys_poll

2020-11-18 Thread Daniel Axtens
When returning results to userspace, do_sys_poll repeatedly calls
put_user() - once per fd that it's watching.

This means that on architectures that support some form of
kernel-to-userspace access protection, we end up enabling and disabling
access once for each file descriptor we're watching. This is inefficient
and we can improve things. We could do careful batching of the opening
and closing of the access window, or we could just copy the entire walk
entries structure. While that copies more data, it potentially does so
more efficiently, and the overhead is much less than the lock/unlock
overhead.

Unscientific benchmarking with the poll2_threads microbenchmark from
will-it-scale, run as `./poll2_threads -t 1 -s 15`:

  - Bare-metal Power9 with KUAP: ~49% speed-up
  - VM on amd64 laptop with SMAP: ~25% speed-up

Signed-off-by: Daniel Axtens 
---
 fs/select.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index ebfebdfe5c69..4a74d1353ccb 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -1012,12 +1012,10 @@ static int do_sys_poll(struct pollfd __user *ufds, 
unsigned int nfds,
poll_freewait(&table);
 
for (walk = head; walk; walk = walk->next) {
-   struct pollfd *fds = walk->entries;
-   int j;
-
-   for (j = 0; j < walk->len; j++, ufds++)
-   if (__put_user(fds[j].revents, &ufds->revents))
-   goto out_fds;
+   if (copy_to_user(ufds, walk->entries,
+sizeof(struct pollfd) * walk->len))
+   goto out_fds;
+   ufds += walk->len;
}
 
err = fdcount;
-- 
2.25.1



[PATCH v2] fs/select.c: batch user writes in do_sys_poll

2020-08-13 Thread Daniel Axtens
When returning results to userspace, do_sys_poll repeatedly calls
put_user() - once per fd that it's watching.

This means that on architectures that support some form of
kernel-to-userspace access protection, we end up enabling and disabling
access once for each file descriptor we're watching. This is inefficient
and we can improve things. We could do careful batching of the opening
and closing of the access window, or we could just copy the entire walk
entries structure. While that copies more data, it potentially does so
more efficiently, and the overhead is much less than the lock/unlock
overhead.

Unscientific benchmarking with the poll2_threads microbenchmark from
will-it-scale, run as `./poll2_threads -t 1 -s 15`:

  - Bare-metal Power9 with KUAP: ~49% speed-up
  - VM on amd64 laptop with SMAP: ~25% speed-up

Signed-off-by: Daniel Axtens 

---

v2: Use copy_to_user instead of put_user, thanks Christoph Hellwig.
---
 fs/select.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 7aef49552d4c..52118cdddf77 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -1012,12 +1012,10 @@ static int do_sys_poll(struct pollfd __user *ufds, 
unsigned int nfds,
poll_freewait(&table);
 
for (walk = head; walk; walk = walk->next) {
-   struct pollfd *fds = walk->entries;
-   int j;
-
-   for (j = 0; j < walk->len; j++, ufds++)
-   if (__put_user(fds[j].revents, &ufds->revents))
-   goto out_fds;
+   if (copy_to_user(ufds, walk->entries,
+sizeof(struct pollfd) * walk->len))
+   goto out_fds;
+   ufds += walk->len;
}
 
err = fdcount;
-- 
2.25.1



Re: [PATCH] fs/select.c: batch user writes in do_sys_poll

2020-08-13 Thread Daniel Axtens
Hi,

>> Seem like this could simply use a copy_to_user to further simplify
>> things?
>
> I'll benchmark it and find out.

I tried this:

for (walk = head; walk; walk = walk->next) {
-   struct pollfd *fds = walk->entries;
-   int j;
-
-   for (j = 0; j < walk->len; j++, ufds++)
-   if (__put_user(fds[j].revents, &ufds->revents))
-   goto out_fds;
+   if (copy_to_user(ufds, walk->entries,
+sizeof(struct pollfd) * walk->len))
+   goto out_fds;
+   ufds += walk->len;
}

With that approach, the poll2 microbenchmark (which polls 128 fds) is
about as fast as v1.

However, the poll1 microbenchmark, which polls just 1 fd, regresses a
touch (<1% - ~2%) compared to the current code, although it's largely
within the noise. Thoughts?

Kind regards,
Daniel

>> Also please don't pointlessly add overly long lines.
>
> Weird, I ran the commit through checkpatch and it didn't pick it
> up. I'll check the next version more carefully.
>
> Regards,
> Daniel


Re: [PATCH] fs/select.c: batch user writes in do_sys_poll

2020-08-13 Thread Daniel Axtens
Christoph Hellwig  writes:

> On Thu, Aug 13, 2020 at 05:11:20PM +1000, Daniel Axtens wrote:
>> When returning results to userspace, do_sys_poll repeatedly calls
>> put_user() - once per fd that it's watching.
>> 
>> This means that on architectures that support some form of
>> kernel-to-userspace access protection, we end up enabling and disabling
>> access once for each file descripter we're watching. This is inefficent
>> and we can improve things by batching the accesses together.
>> 
>> To make sure there's not too much happening in the window when user
>> accesses are permitted, we don't walk the linked list with accesses on.
>> This leads to some slightly messy code in the loop, unfortunately.
>> 
>> Unscientific benchmarking with the poll2_threads microbenchmark from
>> will-it-scale, run as `./poll2_threads -t 1 -s 15`:
>> 
>>  - Bare-metal Power9 with KUAP: ~48.8% speed-up
>>  - VM on amd64 laptop with SMAP: ~25.5% speed-up
>> 
>> Signed-off-by: Daniel Axtens 
>
> Seem like this could simply use a copy_to_user to further simplify
> things?

I'll benchmark it and find out.

> Also please don't pointlessly add overly long lines.

Weird, I ran the commit through checkpatch and it didn't pick it
up. I'll check the next version more carefully.

Regards,
Daniel


[PATCH] fs/select.c: batch user writes in do_sys_poll

2020-08-13 Thread Daniel Axtens
When returning results to userspace, do_sys_poll repeatedly calls
put_user() - once per fd that it's watching.

This means that on architectures that support some form of
kernel-to-userspace access protection, we end up enabling and disabling
access once for each file descriptor we're watching. This is inefficient
and we can improve things by batching the accesses together.

To make sure there's not too much happening in the window when user
accesses are permitted, we don't walk the linked list with accesses on.
This leads to some slightly messy code in the loop, unfortunately.

Unscientific benchmarking with the poll2_threads microbenchmark from
will-it-scale, run as `./poll2_threads -t 1 -s 15`:

 - Bare-metal Power9 with KUAP: ~48.8% speed-up
 - VM on amd64 laptop with SMAP: ~25.5% speed-up

Signed-off-by: Daniel Axtens 
---
 fs/select.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 7aef49552d4c..f58976da9e63 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -1015,9 +1015,19 @@ static int do_sys_poll(struct pollfd __user *ufds, 
unsigned int nfds,
struct pollfd *fds = walk->entries;
int j;
 
+   if (!user_write_access_begin(ufds, (sizeof(struct pollfd) *
+   walk->len)))
+   goto out_fds;
+
for (j = 0; j < walk->len; j++, ufds++)
-   if (__put_user(fds[j].revents, &ufds->revents))
-   goto out_fds;
+   unsafe_put_user(fds[j].revents, &ufds->revents, loop_fault);
+
+   user_write_access_end();
+   continue;
+
+loop_fault:
+   user_write_access_end();
+   goto out_fds;
}
 
err = fdcount;
-- 
2.25.1



Re: [PATCH v2 01/17] KVM: PPC: Book3S HV: simplify kvm_cma_reserve()

2020-08-04 Thread Daniel Axtens
Hi Mike,

>
> The memory size calculation in kvm_cma_reserve() traverses memblock.memory
> rather than simply call memblock_phys_mem_size(). The comment in that
> function suggests that at some point there should have been call to
> memblock_analyze() before memblock_phys_mem_size() could be used.
> As of now, there is no memblock_analyze() at all and
> memblock_phys_mem_size() can be used as soon as cold-plug memory is
> registerd with memblock.
>
> Replace loop over memblock.memory with a call to memblock_phys_mem_size().
>
> Signed-off-by: Mike Rapoport 
> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c | 11 ++-
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
> b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 7cd3cf3d366b..56ab0d28de2a 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -95,22 +95,15 @@ EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);
>  void __init kvm_cma_reserve(void)
>  {
>   unsigned long align_size;
> - struct memblock_region *reg;
> - phys_addr_t selected_size = 0;
> + phys_addr_t selected_size;
>  
>   /*
>* We need CMA reservation only when we are in HV mode
>*/
>   if (!cpu_has_feature(CPU_FTR_HVMODE))
>   return;
> - /*
> -  * We cannot use memblock_phys_mem_size() here, because
> -  * memblock_analyze() has not been called yet.
> -  */
> - for_each_memblock(memory, reg)
> - selected_size += memblock_region_memory_end_pfn(reg) -
> -  memblock_region_memory_base_pfn(reg);
>  
> + selected_size = PHYS_PFN(memblock_phys_mem_size());
>   selected_size = (selected_size * kvm_cma_resv_ratio / 100) << 
> PAGE_SHIFT;

I think this is correct, but PHYS_PFN does x >> PAGE_SHIFT and then the
next line does x << PAGE_SHIFT, so I think we could combine those two
lines as:

selected_size = PAGE_ALIGN(memblock_phys_mem_size() * kvm_cma_resv_ratio / 100);

(I think that might technically change it from aligning down to aligning
up but I don't think 1 page matters here.)

Kind regards,
Daniel


>   if (selected_size) {
>   pr_debug("%s: reserving %ld MiB for global area\n", __func__,
> -- 
> 2.26.2


Re: [PATCH v2 4/5] powerpc/mm: Remove custom stack expansion checking

2020-07-27 Thread Daniel Axtens
Hi Michael,

I tested v1 of this. I ran the test from the bug with a range of stack
sizes, in a loop, for several hours and didn't see any crashes/signal
delivery failures.

I retested v2 for a few minutes just to be sure, and I ran stress-ng's
stack, stackmmap and bad-altstack stressors to make sure no obvious
kernel bugs were exposed. Nothing crashed.

All tests done on a P8 LE guest under KVM.

On that basis:

Tested-by: Daniel Axtens 

The more I look at this the less qualified I feel to Review it, but
certainly it looks better than my ugly hack from late last year.

Kind regards,
Daniel

> We have powerpc specific logic in our page fault handling to decide if
> an access to an unmapped address below the stack pointer should expand
> the stack VMA.
>
> The logic aims to prevent userspace from doing bad accesses below the
> stack pointer. However as long as the stack is < 1MB in size, we allow
> all accesses without further checks. Adding some debug I see that I
> can do a full kernel build and LTP run, and not a single process has
> used more than 1MB of stack. So for the majority of processes the
> logic never even fires.
>
> We also recently found a nasty bug in this code which could cause
> userspace programs to be killed during signal delivery. It went
> unnoticed presumably because most processes use < 1MB of stack.
>
> The generic mm code has also grown support for stack guard pages since
> this code was originally written, so the most heinous case of the
> stack expanding into other mappings is now handled for us.
>
> Finally although some other arches have special logic in this path,
> from what I can tell none of x86, arm64, arm and s390 impose any extra
> checks other than those in expand_stack().
>
> So drop our complicated logic and like other architectures just let
> the stack expand as long as its within the rlimit.
>
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/mm/fault.c | 109 ++--
>  1 file changed, 5 insertions(+), 104 deletions(-)
>
> v2: no change just rebased.
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 3ebb1792e636..925a7231abb3 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -42,39 +42,7 @@
>  #include 
>  #include 
>  
> -/*
> - * Check whether the instruction inst is a store using
> - * an update addressing form which will update r1.
> - */
> -static bool store_updates_sp(struct ppc_inst inst)
> -{
> - /* check for 1 in the rA field */
> - if (((ppc_inst_val(inst) >> 16) & 0x1f) != 1)
> - return false;
> - /* check major opcode */
> - switch (ppc_inst_primary_opcode(inst)) {
> - case OP_STWU:
> - case OP_STBU:
> - case OP_STHU:
> - case OP_STFSU:
> - case OP_STFDU:
> - return true;
> - case OP_STD:/* std or stdu */
> - return (ppc_inst_val(inst) & 3) == 1;
> - case OP_31:
> - /* check minor opcode */
> - switch ((ppc_inst_val(inst) >> 1) & 0x3ff) {
> - case OP_31_XOP_STDUX:
> - case OP_31_XOP_STWUX:
> - case OP_31_XOP_STBUX:
> - case OP_31_XOP_STHUX:
> - case OP_31_XOP_STFSUX:
> - case OP_31_XOP_STFDUX:
> - return true;
> - }
> - }
> - return false;
> -}
> +
>  /*
>   * do_page_fault error handling helpers
>   */
> @@ -267,57 +235,6 @@ static bool bad_kernel_fault(struct pt_regs *regs, 
> unsigned long error_code,
>   return false;
>  }
>  
> -// This comes from 64-bit struct rt_sigframe + __SIGNAL_FRAMESIZE
> -#define SIGFRAME_MAX_SIZE(4096 + 128)
> -
> -static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
> - struct vm_area_struct *vma, unsigned int flags,
> - bool *must_retry)
> -{
> - /*
> -  * N.B. The POWER/Open ABI allows programs to access up to
> -  * 288 bytes below the stack pointer.
> -  * The kernel signal delivery code writes a bit over 4KB
> -  * below the stack pointer (r1) before decrementing it.
> -  * The exec code can write slightly over 640kB to the stack
> -  * before setting the user r1.  Thus we allow the stack to
> -  * expand to 1MB without further checks.
> -  */
> - if (address + 0x10 < vma->vm_end) {
> - struct ppc_inst __user *nip = (struct ppc_inst __user 
> *)regs->nip;
> - /* get user regs even if this fault is in kernel mode */
> - struct pt_regs *uregs = current->thread.regs;
> -

Re: [PATCH v2 2/5] powerpc: Allow 4224 bytes of stack expansion for the signal frame

2020-07-27 Thread Daniel Axtens
Hi Michael,

I have tested this with the test from the bug and it now seems to pass
fine. On that basis:

Tested-by: Daniel Axtens 

Thank you for coming up with a better solution than my gross hack!

Kind regards,
Daniel

> We have powerpc specific logic in our page fault handling to decide if
> an access to an unmapped address below the stack pointer should expand
> the stack VMA.
>
> The code was originally added in 2004 "ported from 2.4". The rough
> logic is that the stack is allowed to grow to 1MB with no extra
> checking. Over 1MB the access must be within 2048 bytes of the stack
> pointer, or be from a user instruction that updates the stack pointer.
>
> The 2048 byte allowance below the stack pointer is there to cover the
> 288 byte "red zone" as well as the "about 1.5kB" needed by the signal
> delivery code.
>
> Unfortunately since then the signal frame has expanded, and is now
> 4224 bytes on 64-bit kernels with transactional memory enabled. This
> means if a process has consumed more than 1MB of stack, and its stack
> pointer lies less than 4224 bytes from the next page boundary, signal
> delivery will fault when trying to expand the stack and the process
> will see a SEGV.
>
> The total size of the signal frame is the size of struct rt_sigframe
> (which includes the red zone) plus __SIGNAL_FRAMESIZE (128 bytes on
> 64-bit).
>
> The 2048 byte allowance was correct until 2008 as the signal frame
> was:
>
> struct rt_sigframe {
> struct ucontextuc;   /* 0  1440 */
> /* --- cacheline 11 boundary (1408 bytes) was 32 bytes ago --- */
> long unsigned int  _unused[2];   /*  144016 */
> unsigned int   tramp[6]; /*  145624 */
> struct siginfo *   pinfo;/*  1480 8 */
> void * puc;  /*  1488 8 */
> struct siginfo info; /*  1496   128 */
> /* --- cacheline 12 boundary (1536 bytes) was 88 bytes ago --- */
> char   abigap[288];  /*  1624   288 */
>
> /* size: 1920, cachelines: 15, members: 7 */
> /* padding: 8 */
> };
>
> 1920 + 128 = 2048
>
> Then in commit ce48b2100785 ("powerpc: Add VSX context save/restore,
> ptrace and signal support") (Jul 2008) the signal frame expanded to
> 2304 bytes:
>
> struct rt_sigframe {
> struct ucontextuc;   /* 0  1696 */
> <--
> /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
> long unsigned int  _unused[2];   /*  169616 */
> unsigned int   tramp[6]; /*  171224 */
> struct siginfo *   pinfo;/*  1736 8 */
> void * puc;  /*  1744 8 */
> struct siginfo info; /*  1752   128 */
> /* --- cacheline 14 boundary (1792 bytes) was 88 bytes ago --- */
> char   abigap[288];  /*  1880   288 */
>
> /* size: 2176, cachelines: 17, members: 7 */
> /* padding: 8 */
> };
>
> 2176 + 128 = 2304
>
> At this point we should have been exposed to the bug, though as far as
> I know it was never reported. I no longer have a system old enough to
> easily test on.
>
> Then in 2010 commit 320b2b8de126 ("mm: keep a guard page below a
> grow-down stack segment") caused our stack expansion code to never
> trigger, as there was always a VMA found for a write up to PAGE_SIZE
> below r1.
>
> That meant the bug was hidden as we continued to expand the signal
> frame in commit 2b0a576d15e0 ("powerpc: Add new transactional memory
> state to the signal context") (Feb 2013):
>
> struct rt_sigframe {
> struct ucontextuc;   /* 0  1696 */
> /* --- cacheline 13 boundary (1664 bytes) was 32 bytes ago --- */
> struct ucontextuc_transact;  /*  1696  1696 */
> <--
> /* --- cacheline 26 boundary (3328 bytes) was 64 bytes ago --- */
> long unsigned int  _unused[2];   /*  339216 */
> unsigned int   tramp[6]; /*  340824 */
> struct siginfo *   pinfo;/*  3432 8 */
> void * puc;  /*  3440 8 */
> struct siginfo info; /*  3448   128 */
> /* --- cacheline 27 boundary (3456 bytes) was 120 bytes ago --- */
> char   

Re: [RFC PATCH 4/5] powerpc/mm: Remove custom stack expansion checking

2020-07-23 Thread Daniel Axtens
Hi Michael,

> We have powerpc specific logic in our page fault handling to decide if
> an access to an unmapped address below the stack pointer should expand
> the stack VMA.
>
> The logic aims to prevent userspace from doing bad accesses below the
> stack pointer. However as long as the stack is < 1MB in size, we allow
> all accesses without further checks. Adding some debug I see that I
> can do a full kernel build and LTP run, and not a single process has
> used more than 1MB of stack. So for the majority of processes the
> logic never even fires.
>
> We also recently found a nasty bug in this code which could cause
> userspace programs to be killed during signal delivery. It went
> unnoticed presumably because most processes use < 1MB of stack.
>
> The generic mm code has also grown support for stack guard pages since
> this code was originally written, so the most heinous case of the
> stack expanding into other mappings is now handled for us.
>
> Finally although some other arches have special logic in this path,
> from what I can tell none of x86, arm64, arm and s390 impose any extra
> checks other than those in expand_stack().
>
> So drop our complicated logic and like other architectures just let
> the stack expand as long as its within the rlimit.
>

I applied and tested this. While I wouldn't call my testing
comprehensive, I have not been able to reproduce the crash with this
patch applied.

Kind regards,
Daniel


> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/mm/fault.c | 106 ++--
>  1 file changed, 5 insertions(+), 101 deletions(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index ed01329dd12b..925a7231abb3 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -42,39 +42,7 @@
>  #include 
>  #include 
>  
> -/*
> - * Check whether the instruction inst is a store using
> - * an update addressing form which will update r1.
> - */
> -static bool store_updates_sp(struct ppc_inst inst)
> -{
> - /* check for 1 in the rA field */
> - if (((ppc_inst_val(inst) >> 16) & 0x1f) != 1)
> - return false;
> - /* check major opcode */
> - switch (ppc_inst_primary_opcode(inst)) {
> - case OP_STWU:
> - case OP_STBU:
> - case OP_STHU:
> - case OP_STFSU:
> - case OP_STFDU:
> - return true;
> - case OP_STD:/* std or stdu */
> - return (ppc_inst_val(inst) & 3) == 1;
> - case OP_31:
> - /* check minor opcode */
> - switch ((ppc_inst_val(inst) >> 1) & 0x3ff) {
> - case OP_31_XOP_STDUX:
> - case OP_31_XOP_STWUX:
> - case OP_31_XOP_STBUX:
> - case OP_31_XOP_STHUX:
> - case OP_31_XOP_STFSUX:
> - case OP_31_XOP_STFDUX:
> - return true;
> - }
> - }
> - return false;
> -}
> +
>  /*
>   * do_page_fault error handling helpers
>   */
> @@ -267,54 +235,6 @@ static bool bad_kernel_fault(struct pt_regs *regs, 
> unsigned long error_code,
>   return false;
>  }
>  
> -static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
> - struct vm_area_struct *vma, unsigned int flags,
> - bool *must_retry)
> -{
> - /*
> -  * N.B. The POWER/Open ABI allows programs to access up to
> -  * 288 bytes below the stack pointer.
> -  * The kernel signal delivery code writes up to 4KB
> -  * below the stack pointer (r1) before decrementing it.
> -  * The exec code can write slightly over 640kB to the stack
> -  * before setting the user r1.  Thus we allow the stack to
> -  * expand to 1MB without further checks.
> -  */
> - if (address + 0x10 < vma->vm_end) {
> - struct ppc_inst __user *nip = (struct ppc_inst __user 
> *)regs->nip;
> - /* get user regs even if this fault is in kernel mode */
> - struct pt_regs *uregs = current->thread.regs;
> - if (uregs == NULL)
> - return true;
> -
> - /*
> -  * A user-mode access to an address a long way below
> -  * the stack pointer is only valid if the instruction
> -  * is one which would update the stack pointer to the
> -  * address accessed if the instruction completed,
> -  * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
> -  * (or the byte, halfword, float or double forms).
> -  *
> -  * If we don't check this then any write to the area
> -  * between the last mapped region and the stack will
> -  * expand the stack rather than segfaulting.
> -  */
> - if (address + 4096 >= uregs->gpr[1])
> - return false;
> -
> - if ((flags & FAULT_FLAG_WRITE) && (flags & FAULT_FLAG_USER) &&
> - access_ok(nip, sizeof(*nip))) 

Re: [PATCH 2/5] powerpc: Allow 4096 bytes of stack expansion for the signal frame

2020-07-23 Thread Daniel Axtens
Hi Michael,

Unfortunately, this patch doesn't completely solve the problem.

Trying the original reproducer, I'm still able to trigger the crash even
with this patch, although not 100% of the time. (If I turn ASLR off
outside of tmux it reliably crashes; if I turn ASLR off _inside_ of tmux
it reliably succeeds; all of this is on a serial console.)

./foo 1241000 & sleep 1; killall -USR1 foo; echo ok

If I add some debugging information, I see that I'm getting
address + 4096 = 7fed0fa0
gpr1 =   7fed1020

So address + 4096 is 0x80 bytes below the 4k window
(0x7fed1020 - 0x7fed0fa0 = 0x80). I haven't been able to figure out why;
gdb gives me a NIP in __kernel_sigtramp_rt64 but I don't know what to make
of that.

Kind regards,
Daniel

P.S. I don't know what your policy on linking to kernel bugzilla is, but
if you want:

Link: https://bugzilla.kernel.org/show_bug.cgi?id=205183


> Reported-by: Tom Lane 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/mm/fault.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 641fc5f3d7dd..ed01329dd12b 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -274,7 +274,7 @@ static bool bad_stack_expansion(struct pt_regs *regs, 
> unsigned long address,
>   /*
>* N.B. The POWER/Open ABI allows programs to access up to
>* 288 bytes below the stack pointer.
> -  * The kernel signal delivery code writes up to about 1.5kB
> +  * The kernel signal delivery code writes up to 4KB
>* below the stack pointer (r1) before decrementing it.
>* The exec code can write slightly over 640kB to the stack
>* before setting the user r1.  Thus we allow the stack to
> @@ -299,7 +299,7 @@ static bool bad_stack_expansion(struct pt_regs *regs, 
> unsigned long address,
>* between the last mapped region and the stack will
>* expand the stack rather than segfaulting.
>*/
> - if (address + 2048 >= uregs->gpr[1])
> + if (address + 4096 >= uregs->gpr[1])
>   return false;
>  
>   if ((flags & FAULT_FLAG_WRITE) && (flags & FAULT_FLAG_USER) &&
> -- 
> 2.25.1


Re: [PATCH v3] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-16 Thread Daniel Axtens
Michal Suchánek  writes:

> On Wed, Jul 15, 2020 at 07:52:01AM -0400, Nayna Jain wrote:
>> The device-tree property to check secure and trusted boot state is
>> different for guests(pseries) compared to baremetal(powernv).
>> 
>> This patch updates the existing is_ppc_secureboot_enabled() and
>> is_ppc_trustedboot_enabled() functions to add support for pseries.
>> 
>> The secureboot and trustedboot state are exposed via device-tree property:
>> /proc/device-tree/ibm,secure-boot and /proc/device-tree/ibm,trusted-boot
>> 
>> The values of ibm,secure-boot under pseries are interpreted as:
>   ^^^
>> 
>> 0 - Disabled
>> 1 - Enabled in Log-only mode. This patch interprets this value as
>> disabled, since audit mode is currently not supported for Linux.
>> 2 - Enabled and enforced.
>> 3-9 - Enabled and enforcing; requirements are at the discretion of the
>> operating system.
>> 
>> The values of ibm,trusted-boot under pseries are interpreted as:
>^^^
> These two should be different I suppose?

I'm not quite sure what you mean? They'll be documented in a future
revision of the PAPR, once I get my act together and submit the
relevant internal paperwork.

Daniel
>
> Thanks
>
> Michal
>> 0 - Disabled
>> 1 - Enabled
>> 
>> Signed-off-by: Nayna Jain 
>> Reviewed-by: Daniel Axtens 
>> ---
>> v3:
>> * fixed double check. Thanks Daniel for noticing it.
>> * updated patch description.
>> 
>> v2:
>> * included Michael Ellerman's feedback.
>> * added Daniel Axtens's Reviewed-by.
>> 
>>  arch/powerpc/kernel/secure_boot.c | 19 +--
>>  1 file changed, 17 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/powerpc/kernel/secure_boot.c 
>> b/arch/powerpc/kernel/secure_boot.c
>> index 4b982324d368..118bcb5f79c4 100644
>> --- a/arch/powerpc/kernel/secure_boot.c
>> +++ b/arch/powerpc/kernel/secure_boot.c
>> @@ -6,6 +6,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  static struct device_node *get_ppc_fw_sb_node(void)
>>  {
>> @@ -23,12 +24,19 @@ bool is_ppc_secureboot_enabled(void)
>>  {
>>  struct device_node *node;
>>  bool enabled = false;
>> +u32 secureboot;
>>  
>>  node = get_ppc_fw_sb_node();
>>  enabled = of_property_read_bool(node, "os-secureboot-enforcing");
>> -
>>  of_node_put(node);
>>  
>> +if (enabled)
>> +goto out;
>> +
>> +if (!of_property_read_u32(of_root, "ibm,secure-boot", ))
>> +enabled = (secureboot > 1);
>> +
>> +out:
>>  pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
>>  
>>  return enabled;
>> @@ -38,12 +46,19 @@ bool is_ppc_trustedboot_enabled(void)
>>  {
>>  struct device_node *node;
>>  bool enabled = false;
>> +u32 trustedboot;
>>  
>>  node = get_ppc_fw_sb_node();
>>  enabled = of_property_read_bool(node, "trusted-enabled");
>> -
>>  of_node_put(node);
>>  
>> +if (enabled)
>> +goto out;
>> +
>> +if (!of_property_read_u32(of_root, "ibm,trusted-boot", &trustedboot))
>> +enabled = (trustedboot > 0);
>> +
>> +out:
>>  pr_info("Trusted boot mode %s\n", enabled ? "enabled" : "disabled");
>>  
>>  return enabled;
>> -- 
>> 2.26.2
>> 


Re: [PATCH v3] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-15 Thread Daniel Axtens
Hi Nayna,

Looks good to me.

Sorry for not noticing this before, but I think
> +#include <asm/machdep.h>
is now superfluous (I think it's leftover from the machine_is
version?). Maybe mpe will take pity on you and remove it when he picks
up your patch.

Kind regards,
Daniel

>  
>  static struct device_node *get_ppc_fw_sb_node(void)
>  {
> @@ -23,12 +24,19 @@ bool is_ppc_secureboot_enabled(void)
>  {
>   struct device_node *node;
>   bool enabled = false;
> + u32 secureboot;
>  
>   node = get_ppc_fw_sb_node();
>   enabled = of_property_read_bool(node, "os-secureboot-enforcing");
> -
>   of_node_put(node);
>  
> + if (enabled)
> + goto out;
> +
> + if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot))
> + enabled = (secureboot > 1);
> +
> +out:
>   pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
>  
>   return enabled;
> @@ -38,12 +46,19 @@ bool is_ppc_trustedboot_enabled(void)
>  {
>   struct device_node *node;
>   bool enabled = false;
> + u32 trustedboot;
>  
>   node = get_ppc_fw_sb_node();
>   enabled = of_property_read_bool(node, "trusted-enabled");
> -
>   of_node_put(node);
>  
> + if (enabled)
> + goto out;
> +
> + if (!of_property_read_u32(of_root, "ibm,trusted-boot", &trustedboot))
> + enabled = (trustedboot > 0);
> +
> +out:
>   pr_info("Trusted boot mode %s\n", enabled ? "enabled" : "disabled");
>  
>   return enabled;
> -- 
> 2.26.2


Re: [PATCH v2] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-14 Thread Daniel Axtens
Hi Nayna,

Thanks! Would you be able to fold in some of the information from my
reply to v1 into the changelog? Until we have public PAPR release with
it, that information is the extent of the public documentation. It would
be good to get it into the git log rather than just floating around in
the mail archives!

A couple of small nits:

> + if (enabled)
> + goto out;
> +
> + if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot)) {
> + if (secureboot)
> + enabled = (secureboot > 1) ? true : false;

Your tests double up here - you don't need both the 'if' statement and
the 'secureboot > 1' ternary operator.

Just

+   if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot)) {
+   enabled = (secureboot > 1) ? true : false;

or even

+   if (!of_property_read_u32(of_root, "ibm,secure-boot", &secureboot)) {
+   enabled = (secureboot > 1);

would work.

> + if (!of_property_read_u32(of_root, "ibm,trusted-boot", &trustedboot)) {
> + if (trustedboot)
> + enabled = (trustedboot > 0) ? true : false;

Likewise for trusted boot.

Regards,
Daniel

P.S. please could you add me to the cc: list for future revisions?

> + }
> +
> +out:
>   pr_info("Trusted boot mode %s\n", enabled ? "enabled" : "disabled");
>  
>   return enabled;
> -- 
> 2.26.2


Re: [PATCH] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-06 Thread Daniel Axtens
As a final extra note,

  https://github.com/daxtens/qemu/tree/pseries-secboot

is a qemu repository that fakes out the relevant properties if anyone
else wants to test this.

Also,

>  3-9 - enabled, OS-defined behaviour. In this patch we map all these
>values to enabled and enforcing. Again I think this is the
>appropriate thing to do.

this should read "enabled and enforcing, requirements are at the
discretion of the operating system". Apologies.

Regards,
Daniel

> ibm,trusted-boot isn't published by a current PowerVM LPAR but will be
> published in future. (Currently, trusted boot state is inferred by the
> presence or absence of a vTPM.) It's simply 1 = enabled, 0 = disabled.
>
> As for this patch specifically, with the very small nits below,
>
> Reviewed-by: Daniel Axtens 
>
>> -node = get_ppc_fw_sb_node();
>> -enabled = of_property_read_bool(node, "os-secureboot-enforcing");
>> +if (machine_is(powernv)) {
>> +node = get_ppc_fw_sb_node();
>> +enabled =
>> +of_property_read_bool(node, "os-secureboot-enforcing");
>> +of_node_put(node);
>> +}
>>  
>> -of_node_put(node);
>> +if (machine_is(pseries)) {
> Maybe this should be an else if?
>
>> +secureboot = of_get_property(of_root, "ibm,secure-boot", NULL);
>> +if (secureboot)
>> +enabled = (*secureboot > 1) ? true : false;
>> +}
>>  
>>  pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
>>  
>> @@ -38,11 +48,20 @@ bool is_ppc_trustedboot_enabled(void)
>>  {
>>  struct device_node *node;
>>  bool enabled = false;
>> +const u32 *trustedboot;
>>  
>> -node = get_ppc_fw_sb_node();
>> -enabled = of_property_read_bool(node, "trusted-enabled");
>> +if (machine_is(powernv)) {
>> +node = get_ppc_fw_sb_node();
>> +enabled = of_property_read_bool(node, "trusted-enabled");
>> +of_node_put(node);
>> +}
>>  
>> -of_node_put(node);
>> +if (machine_is(pseries)) {
> Likewise.
>> +trustedboot =
>> +of_get_property(of_root, "ibm,trusted-boot", NULL);
>> +if (trustedboot)
>> +enabled = (*trustedboot > 0) ? true : false;
>
> Regards,
> Daniel


Re: [PATCH] powerpc/pseries: detect secure and trusted boot state of the system.

2020-07-06 Thread Daniel Axtens
Thanks Nayna!

I'm hoping to get better public documentation for this soon as it's not
documented in a public PAPR yet.

Until then:

The values of ibm,secure-boot under PowerVM are:

 0 - disabled
 
 1 - audit mode only. This patch ignores this value for Linux, which I
 think is the appropriate thing to do.

 2 - enabled and enforcing

 3-9 - enabled, OS-defined behaviour. In this patch we map all these
   values to enabled and enforcing. Again I think this is the
   appropriate thing to do.

ibm,trusted-boot isn't published by a current PowerVM LPAR but will be
published in future. (Currently, trusted boot state is inferred by the
presence or absence of a vTPM.) It's simply 1 = enabled, 0 = disabled.

As for this patch specifically, with the very small nits below,

Reviewed-by: Daniel Axtens 

> - node = get_ppc_fw_sb_node();
> - enabled = of_property_read_bool(node, "os-secureboot-enforcing");
> + if (machine_is(powernv)) {
> + node = get_ppc_fw_sb_node();
> + enabled =
> + of_property_read_bool(node, "os-secureboot-enforcing");
> + of_node_put(node);
> + }
>  
> - of_node_put(node);
> + if (machine_is(pseries)) {
Maybe this should be an else if?

> + secureboot = of_get_property(of_root, "ibm,secure-boot", NULL);
> + if (secureboot)
> + enabled = (*secureboot > 1) ? true : false;
> + }
>  
>   pr_info("Secure boot mode %s\n", enabled ? "enabled" : "disabled");
>  
> @@ -38,11 +48,20 @@ bool is_ppc_trustedboot_enabled(void)
>  {
>   struct device_node *node;
>   bool enabled = false;
> + const u32 *trustedboot;
>  
> - node = get_ppc_fw_sb_node();
> - enabled = of_property_read_bool(node, "trusted-enabled");
> + if (machine_is(powernv)) {
> + node = get_ppc_fw_sb_node();
> + enabled = of_property_read_bool(node, "trusted-enabled");
> + of_node_put(node);
> + }
>  
> - of_node_put(node);
> + if (machine_is(pseries)) {
Likewise.
> + trustedboot =
> + of_get_property(of_root, "ibm,trusted-boot", NULL);
> + if (trustedboot)
> + enabled = (*trustedboot > 0) ? true : false;

Regards,
Daniel



[PATCH v8 1/4] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2020-07-01 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* values are no longer compile-time constants,
which breaks the build.

Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 18 +++---
 mm/kasan/init.c   |  6 +++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 82522e996c76..b6f94952333b 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -14,10 +14,22 @@ struct task_struct;
 #include 
 #include 
 
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index fe6be0be1f76..42bca3d27db8 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -46,7 +46,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -58,7 +58,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -69,7 +69,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE] __page_aligned_bss;
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE] __page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
 {
-- 
2.25.1



[PATCH v8 0/4] KASAN for powerpc64 radix

2020-07-01 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU.

This provides full inline instrumentation on radix, but does require
that you be able to specify the amount of physically contiguous memory
on the system at compile time. More details in patch 4.

v8 is just a rebase of v7 on a more recent powerpc/merge and a fixup
of a whitespace error.

Module globals still don't work, but that's due to some 'clever'
renaming of a section that the powerpc module loading code does to
avoid more complicated relocations/trampolines rather than anything to
do with KASAN.

Daniel Axtens (4):
  kasan: define and use MAX_PTRS_PER_* for early shadow tables
  kasan: Document support on 32-bit powerpc
  powerpc/mm/kasan: rename kasan_init_32.c to init_32.c
  powerpc: Book3S 64-bit "heavyweight" KASAN support

 Documentation/dev-tools/kasan.rst |   8 +-
 Documentation/powerpc/kasan.txt   | 122 ++
 arch/powerpc/Kconfig  |   3 +-
 arch/powerpc/Kconfig.debug|  23 +++-
 arch/powerpc/Makefile |  11 ++
 arch/powerpc/include/asm/book3s/64/hash.h |   4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h  |   7 +
 arch/powerpc/include/asm/book3s/64/radix.h|   5 +
 arch/powerpc/include/asm/kasan.h  |  11 +-
 arch/powerpc/kernel/Makefile  |   2 +
 arch/powerpc/kernel/process.c |  16 ++-
 arch/powerpc/kernel/prom.c|  76 ++-
 arch/powerpc/mm/kasan/Makefile|   3 +-
 .../mm/kasan/{kasan_init_32.c => init_32.c}   |   0
 arch/powerpc/mm/kasan/init_book3s_64.c|  73 +++
 arch/powerpc/mm/ptdump/ptdump.c   |  10 +-
 arch/powerpc/platforms/Kconfig.cputype|   1 +
 include/linux/kasan.h |  18 ++-
 mm/kasan/init.c   |   6 +-
 19 files changed, 377 insertions(+), 22 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

-- 
2.25.1



[PATCH v8 4/4] powerpc: Book3S 64-bit "heavyweight" KASAN support

2020-07-01 Thread Daniel Axtens
Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU:

 - Set aside the last 1/8th of the first contiguous block of physical
   memory to provide writable shadow for the linear map. For annoying
   reasons documented below, the memory size must be specified at compile
   time.

 - Enable the compiler instrumentation to check addresses and maintain the
   shadow region. (This is the guts of KASAN which we can easily reuse.)

 - Require kasan-vmalloc support to handle modules and anything else in
   vmalloc space.

 - KASAN needs to be able to validate all pointer accesses, but we can't
   back all kernel addresses with real memory - only linear map and
   vmalloc. On boot, set up a single page of read-only shadow that marks
   all these accesses as valid.

 - Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
   generic code, arm64, s390 and x86 all do this for similar sorts of
   reasons: when unwinding a stack, we might touch memory that KASAN has
   marked as being out-of-bounds. In our case we often get this when
   checking for an exception frame because we're checking an arbitrary
   offset into the stack frame (see the sketch after this list).

   See commit 20955746320e ("s390/kasan: avoid false positives during stack
   unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
   frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
   Prevent KASAN false positive warnings") and commit 6e22c8366416
   ("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer")

 - Document KASAN in both generic and powerpc docs.
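
   As a rough illustration of the stack-walking point above - a sketch
   only, not the exact kernel code, and the helper name here is made up
   (the marker constants are the ones the powerpc stack walker already
   uses) - the idea is to read the suspect stack slot through
   READ_ONCE_NOCHECK():

	/*
	 * Sketch: decide whether a frame holds a struct pt_regs without
	 * KASAN reporting the read of the (possibly out-of-bounds) marker.
	 */
	static bool frame_has_regs(unsigned long sp)
	{
		unsigned long marker;

		marker = READ_ONCE_NOCHECK(((unsigned long *)sp)[STACK_FRAME_MARKER]);
		return marker == STACK_FRAME_REGS_MARKER;
	}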

Background
--

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code in real mode after booting. Most notably a lot of KVM
   runs in real mode, and it would be good to be able to instrument it.

 - Because code runs in real mode after boot, the offset has to point to
   valid memory both in and out of real mode.

[ppc64 mm note: The kernel installs a linear mapping at effective
address c000... onward. This is a one-to-one mapping with physical
memory from ... onward. Because of how memory accesses work on
powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
same memory both with translations on (accessing as an 'effective
address'), and with translations off (accessing as a 'real
address'). This works in both guests and the hypervisor. For more
details, see s5.7 of Book III of version 3 of the ISA, in particular
the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
KASAN implementation currently only supports Radix.]

One approach is just to give up on inline instrumentation. This way all
checks can be delayed until after everything set is up correctly, and the
address-to-shadow calculations can be overridden. However, the features and
speed boost provided by inline instrumentation are worth trying to do
better.

If _at compile time_ it is known how much contiguous physical memory a
system has, the top 1/8th of the first block of physical memory can be set
aside for the shadow. This is a big hammer and comes with 3 big
consequences:

 - there's no nice way to handle physically discontiguous memory, so only
   the first physical memory block can be used.

 - kernels will simply fail to boot on machines with less memory than
   specified when compiling.

 - kernels running on machines with more memory than specified when
   compiling will simply ignore the extra memory.
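
To make the 1/8th figure concrete: generic KASAN maps every 8 bytes of
memory to one shadow byte, roughly as include/linux/kasan.h does (sketch;
KASAN_SHADOW_SCALE_SHIFT is 3 for generic KASAN):

	static inline void *kasan_mem_to_shadow(const void *addr)
	{
		return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
			+ KASAN_SHADOW_OFFSET;
	}

so writable shadow for N bytes of linear map costs N/8 bytes, which is
where "the top 1/8th of the first block" comes from.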

Despite the limitations, it can still find bugs,
e.g. http://patchwork.ozlabs.org/patch/1103775/

At the moment, this physical memory limit must be set _even for outline
mode_. This may be changed in a later series - a different implementation
could be added for outline mode that dynamically allocates shadow at a
fixed offset. For example, see https://patchwork.ozlabs.org/patch/795211/

Suggested-by: Michael Ellerman 
Cc: Balbir Singh  # ppc64 out-of-line radix version
Cc: Christophe Leroy  # ppc32 version
Reviewed-by:  # focussed mainly on Documentation
 and things impacting PPC32
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst|   9 +-
 Documentation/powerpc/kasan.txt  | 112 ++-
 arch/powerpc/Kconfig |   3 +-
 arch/powerpc/Kconfig.debug   |  23 +++-
 arch/powerpc/Makefile|  11 ++
 arch/powerpc/include/asm/book3s/64/hash.h|   4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |   7 ++
 arch/powerpc/include/asm/book3s/64/radix.h   |   5 +
 arch/powerpc/include/asm/kasan.h |  11 +-
 arch/powerpc/kernel/Makefile |   2 +
 arch/powerpc/kernel/proce

[PATCH v8 3/4] powerpc/mm/kasan: rename kasan_init_32.c to init_32.c

2020-07-01 Thread Daniel Axtens
kasan is already implied by the directory name, we don't need to
repeat it.

Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/mm/kasan/Makefile   | 2 +-
 arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)

diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index bb1a5408b86b..42fb628a44fd 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -2,6 +2,6 @@
 
 KASAN_SANITIZE := n
 
-obj-$(CONFIG_PPC32)   += kasan_init_32.o
+obj-$(CONFIG_PPC32)   += init_32.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
 obj-$(CONFIG_PPC_BOOK3S_32)+= book3s_32.o
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/init_32.c
similarity index 100%
rename from arch/powerpc/mm/kasan/kasan_init_32.c
rename to arch/powerpc/mm/kasan/init_32.c
-- 
2.25.1



[PATCH v8 2/4] kasan: Document support on 32-bit powerpc

2020-07-01 Thread Daniel Axtens
KASAN is supported on 32-bit powerpc and the docs should reflect this.

Document s390 support while we're at it.

Suggested-by: Christophe Leroy 
Reviewed-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst |  7 +--
 Documentation/powerpc/kasan.txt   | 12 
 2 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index c652d740735d..554cbee1d240 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -22,7 +22,8 @@ global variables yet.
 Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later.
 
 Currently generic KASAN is supported for the x86_64, arm64, xtensa, s390 and
-riscv architectures, and tag-based KASAN is supported only for arm64.
+riscv architectures. It is also supported on 32-bit powerpc kernels. Tag-based
+KASAN is supported only on arm64.
 
 Usage
 -
@@ -255,7 +256,9 @@ CONFIG_KASAN_VMALLOC
 
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
-cost of greater memory usage. Currently this is only supported on x86.
+cost of greater memory usage. Currently this is supported on x86, s390
+and 32-bit powerpc. It is optional, except on 32-bit powerpc kernels
+with module support, where it is required.
 
 This works by hooking into vmalloc and vmap, and dynamically
 allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
new file mode 100644
index ..26bb0e8bb18c
--- /dev/null
+++ b/Documentation/powerpc/kasan.txt
@@ -0,0 +1,12 @@
+KASAN is supported on powerpc on 32-bit only.
+
+32 bit support
+==
+
+KASAN is supported on both hash and nohash MMUs on 32-bit.
+
+The shadow area sits at the top of the kernel virtual memory space above the
+fixmap area and occupies one eighth of the total kernel virtual memory space.
+
+Instrumentation of the vmalloc area is optional, unless built with modules,
+in which case it is required.
-- 
2.25.1



Re: [PATCH AUTOSEL 4.14 72/72] string.h: fix incompatibility between FORTIFY_SOURCE and KASAN

2020-06-08 Thread Daniel Axtens
Hi Sasha,

There's nothing inherently wrong with these patches being backported,
but they fix a bug that doesn't cause a crash and only affects debug
kernels compiled with KASAN and FORTIFY_SOURCE. Personally I wouldn't
change a core header file in a stable kernel for that. Perhaps I'm too
risk-averse.

Regards,
Daniel

> From: Daniel Axtens 
>
> [ Upstream commit 47227d27e2fcb01a9e8f5958d8997cf47a820afc ]
>
> The memcmp KASAN self-test fails on a kernel with both KASAN and
> FORTIFY_SOURCE.
>
> When FORTIFY_SOURCE is on, a number of functions are replaced with
> fortified versions, which attempt to check the sizes of the operands.
> However, these functions often directly invoke __builtin_foo() once they
> have performed the fortify check.  Using __builtins may bypass KASAN
> checks if the compiler decides to inline it's own implementation as
> sequence of instructions, rather than emit a function call that goes out
> to a KASAN-instrumented implementation.
>
> Why is only memcmp affected?
> 
>
> Of the string and string-like functions that kasan_test tests, only memcmp
> is replaced by an inline sequence of instructions in my testing on x86
> with gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2).
>
> I believe this is due to compiler heuristics.  For example, if I annotate
> kmalloc calls with the alloc_size annotation (and disable some fortify
> compile-time checking!), the compiler will replace every memset except the
> one in kmalloc_uaf_memset with inline instructions.  (I have some WIP
> patches to add this annotation.)
>
> Does this affect other functions in string.h?
> =
>
> Yes. Anything that uses __builtin_* rather than __real_* could be
> affected. This looks like:
>
>  - strncpy
>  - strcat
>  - strlen
>  - strlcpy maybe, under some circumstances?
>  - strncat under some circumstances
>  - memset
>  - memcpy
>  - memmove
>  - memcmp (as noted)
>  - memchr
>  - strcpy
>
> Whether a function call is emitted always depends on the compiler.  Most
> bugs should get caught by FORTIFY_SOURCE, but the missed memcmp test shows
> that this is not always the case.
>
> Isn't FORTIFY_SOURCE disabled with KASAN?
> -
>
> The string headers on all arches supporting KASAN disable fortify with
> kasan, but only when address sanitisation is _also_ disabled.  For example
> from x86:
>
>  #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
>  /*
>   * For files that are not instrumented (e.g. mm/slub.c) we
>   * should use not instrumented version of mem* functions.
>   */
>  #define memcpy(dst, src, len) __memcpy(dst, src, len)
>  #define memmove(dst, src, len) __memmove(dst, src, len)
>  #define memset(s, c, n) __memset(s, c, n)
>
>  #ifndef __NO_FORTIFY
>  #define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
>  #endif
>
>  #endif
>
> This comes from commit 6974f0c4555e ("include/linux/string.h: add the
> option of fortified string.h functions"), and doesn't work when KASAN is
> enabled and the file is supposed to be sanitised - as with test_kasan.c
>
> I'm pretty sure this is not wrong, but not as expansive as it should be:
>
>  * we shouldn't use __builtin_memcpy etc in files where we don't have
>instrumentation - it could devolve into a function call to memcpy,
>which will be instrumented. Rather, we should use __memcpy which
>by convention is not instrumented.
>
>  * we also shouldn't be using __builtin_memcpy when we have a KASAN
>instrumented file, because it could be replaced with inline asm
>that will not be instrumented.
>
> What is correct behaviour?
> ==
>
> Firstly, there is some overlap between fortification and KASAN: both
> provide some level of _runtime_ checking. Only fortify provides
> compile-time checking.
>
> KASAN and fortify can pick up different things at runtime:
>
>  - Some fortify functions, notably the string functions, could easily be
>modified to consider sub-object sizes (e.g. members within a struct),
>and I have some WIP patches to do this. KASAN cannot detect these
>because it cannot insert poison between members of a struct.
>
>  - KASAN can detect many over-reads/over-writes when the sizes of both
>operands are unknown, which fortify cannot.
>
> So there are a couple of options:
>
>  1) Flip the test: disable fortify in santised files and enable it in
> unsanitised files. This at least stops us missing KASAN checking, but
> we lose the fortify checking.
>
>  2) Make the fortify code always call out to real versions. Do this only

Re: [PATCH v2] relay: handle alloc_percpu returning NULL in relay_open

2020-05-24 Thread Daniel Axtens
>> > Check if alloc_percpu returns NULL.
>> > 
>> > This was found by syzkaller both on x86 and powerpc, and the reproducer
>> > it found on powerpc is capable of hitting the issue as an unprivileged
>> > user.
>> > 
>> > Fixes: 017c59c042d0 ("relay: Use per CPU constructs for the relay channel 
>> > buffer pointers")
>> > Reported-by: syzbot+1e925b4b836afe85a...@syzkaller-ppc64.appspotmail.com
>> > Reported-by: syzbot+587b2421926808309...@syzkaller-ppc64.appspotmail.com
>> > Reported-by: syzbot+58320b7171734bf79...@syzkaller.appspotmail.com
>> > Reported-by: syzbot+d6074fb08bdb2e010...@syzkaller.appspotmail.com
>> > Cc: Akash Goel 
>> > Cc: Andrew Donnellan  # syzkaller-ppc64
>> > Reviewed-by: Michael Ellerman 
>> > Reviewed-by: Andrew Donnellan 
>> > Cc: sta...@vger.kernel.org # v4.10+
>> > Signed-off-by: Daniel Axtens 
>> 
>> Acked-by: David Rientjes 
>
> It looks like this one was never applied (which relates to CVE-2019-19462,
> as pointed out by Guenter in 20191223163610.ga32...@roeck-us.net).
>
> Was this lost, or are there any issues pending?

I'm not aware of any pending issues.

(But, if anyone does have any objections I'm happy to revise the patch.)

Regards,
Daniel


[PATCH v9 5/5] kasan debug: track pages allocated for vmalloc shadow

2019-10-16 Thread Daniel Axtens
Provide the current number of vmalloc shadow pages in
/sys/kernel/debug/kasan/vmalloc_shadow_pages.

Signed-off-by: Daniel Axtens 

---

v8: rename kasan_vmalloc/shadow_pages -> kasan/vmalloc_shadow_pages

On v4 (no dynamic freeing), I saw the following approximate figures
on my test VM:

 - fresh boot: 720
 - after test_vmalloc: ~14000

With v5 (lazy dynamic freeing):

 - boot: ~490-500
 - running modprobe test_vmalloc pushes the figures up to sometimes
as high as ~14000, but they drop down to ~560 after the test ends.
I'm not sure where the extra sixty pages are from, but running the
test repeately doesn't cause the number to keep growing, so I don't
think we're leaking.
 - with vmap_stack, spawning tasks pushes the figure up to ~4200, then
some clearing kicks in and drops it down to previous levels again.
---
 mm/kasan/common.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git mm/kasan/common.c mm/kasan/common.c
index 81521d180bec..ac05038afa5a 100644
--- mm/kasan/common.c
+++ mm/kasan/common.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include <linux/debugfs.h>
 
 #include 
 
@@ -750,6 +751,8 @@ core_initcall(kasan_memhotplug_init);
 #endif
 
 #ifdef CONFIG_KASAN_VMALLOC
+static u64 vmalloc_shadow_pages;
+
 static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
  void *unused)
 {
@@ -782,6 +785,7 @@ static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned 
long addr,
if (likely(pte_none(*ptep))) {
set_pte_at(&init_mm, addr, ptep, pte);
page = 0;
+   vmalloc_shadow_pages++;
}
spin_unlock(&init_mm.page_table_lock);
if (page)
@@ -836,6 +840,7 @@ static int kasan_depopulate_vmalloc_pte(pte_t *ptep, 
unsigned long addr,
pte_clear(&init_mm, addr, ptep);
flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
free_page(page);
+   vmalloc_shadow_pages--;
}
spin_unlock(&init_mm.page_table_lock);
 
@@ -954,4 +959,25 @@ void kasan_release_vmalloc(unsigned long start, unsigned 
long end,
   (unsigned long)shadow_end);
}
 }
+
+static __init int kasan_init_debugfs(void)
+{
+   struct dentry *root, *count;
+
+   root = debugfs_create_dir("kasan", NULL);
+   if (IS_ERR(root)) {
+   if (PTR_ERR(root) == -ENODEV)
+   return 0;
+   return PTR_ERR(root);
+   }
+
+   count = debugfs_create_u64("vmalloc_shadow_pages", 0444, root,
+  &vmalloc_shadow_pages);
+
+   if (IS_ERR(count))
+   return PTR_ERR(root);
+
+   return 0;
+}
+late_initcall(kasan_init_debugfs);
 #endif
-- 
2.20.1



[PATCH v9 4/5] x86/kasan: support KASAN_VMALLOC

2019-10-16 Thread Daniel Axtens
In the case where KASAN directly allocates memory to back vmalloc
space, don't map the early shadow page over it.

We prepopulate pgds/p4ds for the range that would otherwise be empty.
This is required to get it synced to hardware on boot, allowing the
lower levels of the page tables to be filled dynamically.

Acked-by: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 

---
v5: fix some checkpatch CHECK warnings. There are some that remain
around lines ending with '(': I have not changed these because
it's consistent with the rest of the file and it's not easy to
see how to fix it without creating an overlong line or lots of
temporary variables.

v2: move from faulting in shadow pgds to prepopulating
---
 arch/x86/Kconfig|  1 +
 arch/x86/mm/kasan_init_64.c | 60 +
 2 files changed, 61 insertions(+)

diff --git arch/x86/Kconfig arch/x86/Kconfig
index abe822d52167..92f5d5d5c78a 100644
--- arch/x86/Kconfig
+++ arch/x86/Kconfig
@@ -135,6 +135,7 @@ config X86
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN  if X86_64
+   select HAVE_ARCH_KASAN_VMALLOC  if X86_64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS  if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if MMU && COMPAT
diff --git arch/x86/mm/kasan_init_64.c arch/x86/mm/kasan_init_64.c
index 296da58f3013..8f00f462709e 100644
--- arch/x86/mm/kasan_init_64.c
+++ arch/x86/mm/kasan_init_64.c
@@ -245,6 +245,51 @@ static void __init kasan_map_early_shadow(pgd_t *pgd)
} while (pgd++, addr = next, addr != end);
 }
 
+static void __init kasan_shallow_populate_p4ds(pgd_t *pgd,
+  unsigned long addr,
+  unsigned long end,
+  int nid)
+{
+   p4d_t *p4d;
+   unsigned long next;
+   void *p;
+
+   p4d = p4d_offset(pgd, addr);
+   do {
+   next = p4d_addr_end(addr, end);
+
+   if (p4d_none(*p4d)) {
+   p = early_alloc(PAGE_SIZE, nid, true);
+   p4d_populate(&init_mm, p4d, p);
+   }
+   } while (p4d++, addr = next, addr != end);
+}
+
+static void __init kasan_shallow_populate_pgds(void *start, void *end)
+{
+   unsigned long addr, next;
+   pgd_t *pgd;
+   void *p;
+   int nid = early_pfn_to_nid((unsigned long)start);
+
+   addr = (unsigned long)start;
+   pgd = pgd_offset_k(addr);
+   do {
+   next = pgd_addr_end(addr, (unsigned long)end);
+
+   if (pgd_none(*pgd)) {
+   p = early_alloc(PAGE_SIZE, nid, true);
+   pgd_populate(&init_mm, pgd, p);
+   }
+
+   /*
+* we need to populate p4ds to be synced when running in
+* four level mode - see sync_global_pgds_l4()
+*/
+   kasan_shallow_populate_p4ds(pgd, addr, next, nid);
+   } while (pgd++, addr = next, addr != (unsigned long)end);
+}
+
 #ifdef CONFIG_KASAN_INLINE
 static int kasan_die_handler(struct notifier_block *self,
 unsigned long val,
@@ -352,9 +397,24 @@ void __init kasan_init(void)
shadow_cpu_entry_end = (void *)round_up(
(unsigned long)shadow_cpu_entry_end, PAGE_SIZE);
 
+   /*
+* If we're in full vmalloc mode, don't back vmalloc space with early
+* shadow pages. Instead, prepopulate pgds/p4ds so they are synced to
+* the global table and we can populate the lower levels on demand.
+*/
+#ifdef CONFIG_KASAN_VMALLOC
+   kasan_shallow_populate_pgds(
+   kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
+   kasan_mem_to_shadow((void *)VMALLOC_END));
+
+   kasan_populate_early_shadow(
+   kasan_mem_to_shadow((void *)VMALLOC_END + 1),
+   shadow_cpu_entry_begin);
+#else
kasan_populate_early_shadow(
kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
shadow_cpu_entry_begin);
+#endif
 
kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin,
  (unsigned long)shadow_cpu_entry_end, 0);
-- 
2.20.1



[PATCH v9 0/5] kasan: support backing vmalloc space with real shadow memory

2019-10-16 Thread Daniel Axtens
Currently, vmalloc space is backed by the early shadow page. This
means that kasan is incompatible with VMAP_STACK.

This series provides a mechanism to back vmalloc space with real,
dynamically allocated memory. I have only wired up x86, because that's
the only currently supported arch I can work with easily, but it's
very easy to wire up other architectures, and it appears that there is
some work-in-progress code to do this on arm64 and s390.

This has been discussed before in the context of VMAP_STACK:
 - https://bugzilla.kernel.org/show_bug.cgi?id=202009
 - https://lkml.org/lkml/2018/7/22/198
 - https://lkml.org/lkml/2019/7/19/822

In terms of implementation details:

Most mappings in vmalloc space are small, requiring less than a full
page of shadow space. Allocating a full shadow page per mapping would
therefore be wasteful. Furthermore, to ensure that different mappings
use different shadow pages, mappings would have to be aligned to
KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.

Instead, share backing space across multiple mappings. Allocate a
backing page when a mapping in vmalloc space uses a particular page of
the shadow region. This page can be shared by other vmalloc mappings
later on.

We hook in to the vmap infrastructure to lazily clean up unused shadow
memory.
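
As a sketch of the allocation side (alignment details, unpoisoning of the
requested size and error handling are elided here - the real version is in
patch 1, as is the per-PTE helper kasan_populate_vmalloc_pte()): a new
vmalloc area is translated to its shadow range, and any shadow PTE that is
still unbacked gets a freshly allocated page, while already-backed shadow
pages are simply shared:

	/* Sketch only - see patch 1 for the real implementation. */
	int kasan_populate_vmalloc(unsigned long requested_size, struct vm_struct *area)
	{
		unsigned long shadow_start, shadow_end;

		/* requested_size is only needed for the precise unpoisoning done in patch 1 */
		shadow_start = (unsigned long)kasan_mem_to_shadow(area->addr);
		shadow_end = (unsigned long)kasan_mem_to_shadow(area->addr + area->size);

		/* walk the shadow range, allocating backing pages on demand */
		return apply_to_page_range(&init_mm, shadow_start,
					   shadow_end - shadow_start,
					   kasan_populate_vmalloc_pte, NULL);
	}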

v1: https://lore.kernel.org/linux-mm/20190725055503.19507-1-...@axtens.net/
v2: https://lore.kernel.org/linux-mm/20190729142108.23343-1-...@axtens.net/
 Address review comments:
 - Patch 1: use kasan_unpoison_shadow's built-in handling of
ranges that do not align to a full shadow byte
 - Patch 3: prepopulate pgds rather than faulting things in
v3: https://lore.kernel.org/linux-mm/20190731071550.31814-1-...@axtens.net/
 Address comments from Mark Rutland:
 - kasan_populate_vmalloc is a better name
 - handle concurrency correctly
 - various nits and cleanups
 - relax module alignment in KASAN_VMALLOC case
v4: https://lore.kernel.org/linux-mm/20190815001636.12235-1-...@axtens.net/
 Changes to patch 1 only:
 - Integrate Mark's rework, thanks Mark!
 - handle the case where kasan_populate_shadow might fail
 - poison shadow on free, allowing the alloc path to just
 unpoison memory that it uses
v5: https://lore.kernel.org/linux-mm/20190830003821.10737-1-...@axtens.net/
 Address comments from Christophe Leroy:
 - Fix some issues with my descriptions in commit messages and docs
 - Dynamically free unused shadow pages by hooking into the vmap book-keeping
 - Split out the test into a separate patch
 - Optional patch to track the number of pages allocated
 - minor checkpatch cleanups
v6: https://lore.kernel.org/linux-mm/20190902112028.23773-1-...@axtens.net/
 Properly guard freeing pages in patch 1, drop debugging code.
v7: https://lore.kernel.org/linux-mm/20190903145536.3390-1-...@axtens.net/
Add a TLB flush on freeing, thanks Mark Rutland.
Explain more clearly how I think freeing is concurrency-safe.
v8: https://lore.kernel.org/linux-mm/20191001065834.8880-1-...@axtens.net/
rename kasan_vmalloc/shadow_pages to kasan/vmalloc_shadow_pages
v9: address a number of review comments for patch 1.

Daniel Axtens (5):
  kasan: support backing vmalloc space with real shadow memory
  kasan: add test for vmalloc
  fork: support VMAP_STACK with KASAN_VMALLOC
  x86/kasan: support KASAN_VMALLOC
  kasan debug: track pages allocated for vmalloc shadow

 Documentation/dev-tools/kasan.rst |  63 
 arch/Kconfig  |   9 +-
 arch/x86/Kconfig  |   1 +
 arch/x86/mm/kasan_init_64.c   |  60 
 include/linux/kasan.h |  31 
 include/linux/moduleloader.h  |   2 +-
 include/linux/vmalloc.h   |  12 ++
 kernel/fork.c |   4 +
 lib/Kconfig.kasan |  16 ++
 lib/test_kasan.c  |  26 
 mm/kasan/common.c | 237 ++
 mm/kasan/generic_report.c |   3 +
 mm/kasan/kasan.h  |   1 +
 mm/vmalloc.c  |  48 +-
 14 files changed, 503 insertions(+), 10 deletions(-)

-- 
2.20.1



[PATCH v9 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-16 Thread Daniel Axtens
Hook into vmalloc and vmap, and dynamically allocate real shadow
memory to back the mappings.

Most mappings in vmalloc space are small, requiring less than a full
page of shadow space. Allocating a full shadow page per mapping would
therefore be wasteful. Furthermore, to ensure that different mappings
use different shadow pages, mappings would have to be aligned to
KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.

Instead, share backing space across multiple mappings. Allocate a
backing page when a mapping in vmalloc space uses a particular page of
the shadow region. This page can be shared by other vmalloc mappings
later on.

We hook in to the vmap infrastructure to lazily clean up unused shadow
memory.

To avoid the difficulties around swapping mappings around, this code
expects that the part of the shadow region that covers the vmalloc
space will not be covered by the early shadow page, but will be left
unmapped. This will require changes in arch-specific code.

This allows KASAN with VMAP_STACK, and may be helpful for architectures
that do not have a separate module space (e.g. powerpc64, which I am
currently working on). It also allows relaxing the module alignment
back to PAGE_SIZE.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202009
Acked-by: Vasily Gorbik 
Co-developed-by: Mark Rutland 
Signed-off-by: Mark Rutland  [shadow rework]
Signed-off-by: Daniel Axtens 

--

[I haven't tried to resolve the question of spurious faults. My
understanding is that in order to see the issue, you'd need to:

  a) on CPU 0, allocate shadow memory for memory in the vmalloc or
 module region, where the allocation is located at a fixed
 address, and
  
  b) on CPU 1, access that shadow memory via the fixed address (to
 avoid the dependency that would otherwise require the correct
 ordering)

If this is an issue on x86, we can fix it in the x86 enablement
patch. Hopefully someone from x86land can let us know.]

v2: let kasan_unpoison_shadow deal with ranges that do not use a
full shadow byte.

v3: relax module alignment
rename to kasan_populate_vmalloc which is a much better name
deal with concurrency correctly

v4: Mark's rework
Poison pages on vfree
Handle allocation failures

v5: Per Christophe Leroy, split out test and dynamically free pages.

v6: Guard freeing page properly. Drop WARN_ON_ONCE(pte_none(*ptep)),
 on reflection it's unnecessary debugging cruft with too high a
 false positive rate.

v7: tlb flush, thanks Mark.
explain more clearly how freeing works and is concurrency-safe.

v9:  - Pull in Uladzislau Rezki's changes to better line up with the
   design of the new vmalloc implementation. Thanks Vlad.
 - clarify comment explaining smp_wmb() per Mark and Andrey's discussion
 - tighten up the allocation of backing memory so that it only
   happens for vmalloc or module  space allocations. Thanks Andrey
   Ryabinin.
 - A TLB flush in the freeing path, thanks Mark Rutland.
---
 Documentation/dev-tools/kasan.rst |  63 +
 include/linux/kasan.h |  31 +
 include/linux/moduleloader.h  |   2 +-
 include/linux/vmalloc.h   |  12 ++
 lib/Kconfig.kasan |  16 +++
 mm/kasan/common.c | 211 ++
 mm/kasan/generic_report.c |   3 +
 mm/kasan/kasan.h  |   1 +
 mm/vmalloc.c  |  48 ++-
 9 files changed, 381 insertions(+), 6 deletions(-)

diff --git Documentation/dev-tools/kasan.rst Documentation/dev-tools/kasan.rst
index 525296121d89..e4d66e7c50de 100644
--- Documentation/dev-tools/kasan.rst
+++ Documentation/dev-tools/kasan.rst
@@ -218,3 +218,66 @@ brk handler is used to print bug reports.
 A potential expansion of this mode is a hardware tag-based mode, which would
 use hardware memory tagging support instead of compiler instrumentation and
 manual shadow memory manipulation.
+
+What memory accesses are sanitised by KASAN?
+
+
+The kernel maps memory in a number of different parts of the address
+space. This poses something of a problem for KASAN, which requires
+that all addresses accessed by instrumented code have a valid shadow
+region.
+
+The range of kernel virtual addresses is large: there is not enough
+real memory to support a real shadow region for every address that
+could be accessed by the kernel.
+
+By default
+~~
+
+By default, architectures only map real memory over the shadow region
+for the linear mapping (and potentially other small areas). For all
+other areas - such as vmalloc and vmemmap space - a single read-only
+page is mapped over the shadow area. This read-only shadow page
+declares all memory accesses as permitted.
+
+This presents a problem for modules: they do not live in the linear
+mapping, but in a dedicated module space. By hooking in to the module
+allocator, KASAN can temporarily map real shadow memory to cover
+them. This allows

[PATCH v9 3/5] fork: support VMAP_STACK with KASAN_VMALLOC

2019-10-16 Thread Daniel Axtens
Supporting VMAP_STACK with KASAN_VMALLOC is straightforward:

 - clear the shadow region of vmapped stacks when swapping them in
 - tweak Kconfig to allow VMAP_STACK to be turned on with KASAN

Reviewed-by: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 
---
 arch/Kconfig  | 9 +
 kernel/fork.c | 4 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git arch/Kconfig arch/Kconfig
index 5f8a5d84dbbe..2d914990402f 100644
--- arch/Kconfig
+++ arch/Kconfig
@@ -843,16 +843,17 @@ config HAVE_ARCH_VMAP_STACK
 config VMAP_STACK
default y
bool "Use a virtually-mapped stack"
-   depends on HAVE_ARCH_VMAP_STACK && !KASAN
+   depends on HAVE_ARCH_VMAP_STACK
+   depends on !KASAN || KASAN_VMALLOC
---help---
  Enable this if you want the use virtually-mapped kernel stacks
  with guard pages.  This causes kernel stack overflows to be
  caught immediately rather than causing difficult-to-diagnose
  corruption.
 
- This is presently incompatible with KASAN because KASAN expects
- the stack to map directly to the KASAN shadow map using a formula
- that is incorrect if the stack is in vmalloc space.
+ To use this with KASAN, the architecture must support backing
+ virtual mappings with real shadow memory, and KASAN_VMALLOC must
+ be enabled.
 
 config ARCH_OPTIONAL_KERNEL_RWX
def_bool n
diff --git kernel/fork.c kernel/fork.c
index bcdf53125210..484ca6b0ae6c 100644
--- kernel/fork.c
+++ kernel/fork.c
@@ -94,6 +94,7 @@
 #include 
 #include 
 #include 
+#include <linux/kasan.h>
 
 #include 
 #include 
@@ -224,6 +225,9 @@ static unsigned long *alloc_thread_stack_node(struct 
task_struct *tsk, int node)
if (!s)
continue;
 
+   /* Clear the KASAN shadow of the stack. */
+   kasan_unpoison_shadow(s->addr, THREAD_SIZE);
+
/* Clear stale pointers from reused stack. */
memset(s->addr, 0, THREAD_SIZE);
 
-- 
2.20.1



[PATCH v9 2/5] kasan: add test for vmalloc

2019-10-16 Thread Daniel Axtens
Test kasan vmalloc support by adding a new test to the module.

Signed-off-by: Daniel Axtens 

--

v5: split out per Christophe Leroy
---
 lib/test_kasan.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git lib/test_kasan.c lib/test_kasan.c
index 49cc4d570a40..328d33beae36 100644
--- lib/test_kasan.c
+++ lib/test_kasan.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include <linux/vmalloc.h>
 
 #include 
 
@@ -748,6 +749,30 @@ static noinline void __init kmalloc_double_kzfree(void)
kzfree(ptr);
 }
 
+#ifdef CONFIG_KASAN_VMALLOC
+static noinline void __init vmalloc_oob(void)
+{
+   void *area;
+
+   pr_info("vmalloc out-of-bounds\n");
+
+   /*
+* We have to be careful not to hit the guard page.
+* The MMU will catch that and crash us.
+*/
+   area = vmalloc(3000);
+   if (!area) {
+   pr_err("Allocation failed\n");
+   return;
+   }
+
+   ((volatile char *)area)[3100];
+   vfree(area);
+}
+#else
+static void __init vmalloc_oob(void) {}
+#endif
+
 static int __init kmalloc_tests_init(void)
 {
/*
@@ -793,6 +818,7 @@ static int __init kmalloc_tests_init(void)
kasan_strings();
kasan_bitops();
kmalloc_double_kzfree();
+   vmalloc_oob();
 
kasan_restore_multi_shot(multishot);
 
-- 
2.20.1



Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-15 Thread Daniel Axtens


> There is a potential problem here, as Will Deacon wrote up at:
>
>   
> https://lore.kernel.org/linux-arm-kernel/20190827131818.14724-1-w...@kernel.org/
>
> ... in the section starting:
>
> | *** Other architecture maintainers -- start here! ***
>
> ... whereby the CPU can spuriously fault on an access after observing a
> valid PTE.
>
> For arm64 we handle the spurious fault, and it looks like x86 would need
> something like its vmalloc_fault() applying to the shadow region to
> cater for this.

I'm not really up on x86 - my first thought would be that their stronger
memory ordering might be sufficient but I really don't know. Reading the
thread I see arm and powerpc discussions but nothing from anyone else,
so I'm none the wiser there...

Andy, do you have any thoughts?

Regards,
Daniel

>
> Thanks,
> Mark.


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-15 Thread Daniel Axtens
>>> @@ -2497,6 +2533,9 @@ void *__vmalloc_node_range(unsigned long size, 
>>> unsigned long align,
>>> if (!addr)
>>> return NULL;
>>>  
>>> +   if (kasan_populate_vmalloc(real_size, area))
>>> +   return NULL;
>>> +
>>
>> KASAN itself uses __vmalloc_node_range() to allocate and map shadow in 
>> memory online callback.
>> So we should either skip non-vmalloc and non-module addresses here or teach 
>> kasan's memory online/offline
>> callbacks to not use __vmalloc_node_range() (do something similar to 
>> kasan_populate_vmalloc() perhaps?). 
>
> Ah, right you are. I haven't been testing that.
>
> I am a bit nervous about further restricting kasan_populate_vmalloc: I
> seem to remember having problems with code using the vmalloc family of
> functions to map memory that doesn't lie within vmalloc space but which
> still has instrumented accesses.

I was wrong or remembering early implementation bugs.

If the memory we're allocating in __vmalloc_node_range falls outside of
vmalloc and module space, it shouldn't get shadow mapped
by kasan_populate_vmalloc. For v9, I've guarded the call with
is_vmalloc_or_module. It seems to work fine when tested with hotplugged
memory.

Thanks again.

Regards,
Daniel

> On the other hand, I'm not keen on rewriting any of the memory
> on/offline code if I can avoid it!
>
> I'll have a look and get back you as soon as I can.
>
> Thanks for catching this.
>
> Kind regards,
> Daniel
>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "kasan-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to kasan-dev+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/kasan-dev/352cb4fa-2e57-7e3b-23af-898e113bbe22%40virtuozzo.com.


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-15 Thread Daniel Axtens
Mark Rutland  writes:

> On Tue, Oct 01, 2019 at 04:58:30PM +1000, Daniel Axtens wrote:
>> Hook into vmalloc and vmap, and dynamically allocate real shadow
>> memory to back the mappings.
>> 
>> Most mappings in vmalloc space are small, requiring less than a full
>> page of shadow space. Allocating a full shadow page per mapping would
>> therefore be wasteful. Furthermore, to ensure that different mappings
>> use different shadow pages, mappings would have to be aligned to
>> KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.
>> 
>> Instead, share backing space across multiple mappings. Allocate a
>> backing page when a mapping in vmalloc space uses a particular page of
>> the shadow region. This page can be shared by other vmalloc mappings
>> later on.
>> 
>> We hook in to the vmap infrastructure to lazily clean up unused shadow
>> memory.
>> 
>> To avoid the difficulties around swapping mappings around, this code
>> expects that the part of the shadow region that covers the vmalloc
>> space will not be covered by the early shadow page, but will be left
>> unmapped. This will require changes in arch-specific code.
>> 
>> This allows KASAN with VMAP_STACK, and may be helpful for architectures
>> that do not have a separate module space (e.g. powerpc64, which I am
>> currently working on). It also allows relaxing the module alignment
>> back to PAGE_SIZE.
>> 
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202009
>> Acked-by: Vasily Gorbik 
>> Signed-off-by: Daniel Axtens 
>> [Mark: rework shadow allocation]
>> Signed-off-by: Mark Rutland 
>
> Sorry to point this out so late, but your S-o-B should come last in the
> chain per Documentation/process/submitting-patches.rst. Judging by the
> rest of that, I think you want something like:
>
> Co-developed-by: Mark Rutland 
> Signed-off-by: Mark Rutland  [shadow rework]
> Signed-off-by: Daniel Axtens 
>
> ... leaving yourself as the Author in the headers.

no worries, I wasn't really sure how best to arrange them, so thanks for
clarifying!

>
> Sorry to have made that more complicated!
>
> [...]
>
>> +static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
>> +void *unused)
>> +{
>> +unsigned long page;
>> +
>> +page = (unsigned long)__va(pte_pfn(*ptep) << PAGE_SHIFT);
>> +
>> +spin_lock(&init_mm.page_table_lock);
>> +
>> +if (likely(!pte_none(*ptep))) {
>> +pte_clear(&init_mm, addr, ptep);
>> +free_page(page);
>> +}
>
> There should be TLB maintenance between clearing the PTE and freeing the
> page here.

Fixed for v9.

Regards,
Daniel

>
> Thanks,
> Mark.


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-14 Thread Daniel Axtens
Hi Andrey,


>> +/*
>> + * Ensure poisoning is visible before the shadow is made visible
>> + * to other CPUs.
>> + */
>> +smp_wmb();
>
> I don't quite understand what this barrier does and why it's needed.
> And if it's really needed there should be a pairing barrier
> on the other side which I don't see.

Mark might be better able to answer this, but my understanding is that
we want to make sure that we never have a situation where the writes are
reordered so that the PTE is installed before all the poisoning is written
out. I think it follows the logic in __pte_alloc() in mm/memory.c:

/*
 * Ensure all pte setup (eg. pte page lock and page clearing) are
 * visible before the pte is made visible to other CPUs by being
 * put into page tables.
 *
 * The other side of the story is the pointer chasing in the page
 * table walking code (when walking the page table without locking;
 * ie. most of the time). Fortunately, these data accesses consist
 * of a chain of data-dependent loads, meaning most CPUs (alpha
 * being the notable exception) will already guarantee loads are
 * seen in-order. See the alpha page table accessors for the
 * smp_read_barrier_depends() barriers in page table walking code.
 */
smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */

I can clarify the comment.

>> +
>> +spin_lock(&init_mm.page_table_lock);
>> +if (likely(pte_none(*ptep))) {
>> +set_pte_at(&init_mm, addr, ptep, pte);
>> +page = 0;
>> +}
>> +spin_unlock(&init_mm.page_table_lock);
>> +if (page)
>> +free_page(page);
>> +return 0;
>> +}
>> +
>
>
> ...
>
>> @@ -754,6 +769,8 @@ merge_or_add_vmap_area(struct vmap_area *va,
>>  }
>>  
>>  insert:
>> +kasan_release_vmalloc(orig_start, orig_end, va->va_start, va->va_end);
>> +
>>  if (!merged) {
>>  link_va(va, root, parent, link, head);
>>  augment_tree_propagate_from(va);
>> @@ -2068,6 +2085,22 @@ static struct vm_struct *__get_vm_area_node(unsigned 
>> long size,
>>  
>>  setup_vmalloc_vm(area, va, flags, caller);
>>  
>> +/*
>> + * For KASAN, if we are in vmalloc space, we need to cover the shadow
>> + * area with real memory. If we come here through VM_ALLOC, this is
>> + * done by a higher level function that has access to the true size,
>> + * which might not be a full page.
>> + *
>> + * We assume module space comes via VM_ALLOC path.
>> + */
>> +if (is_vmalloc_addr(area->addr) && !(area->flags & VM_ALLOC)) {
>> +if (kasan_populate_vmalloc(area->size, area)) {
>> +unmap_vmap_area(va);
>> +kfree(area);
>> +return NULL;
>> +}
>> +}
>> +
>>  return area;
>>  }
>>  
>> @@ -2245,6 +2278,9 @@ static void __vunmap(const void *addr, int 
>> deallocate_pages)
>>  debug_check_no_locks_freed(area->addr, get_vm_area_size(area));
>>  debug_check_no_obj_freed(area->addr, get_vm_area_size(area));
>>  
>> +if (area->flags & VM_KASAN)
>> +kasan_poison_vmalloc(area->addr, area->size);
>> +
>>  vm_remove_mappings(area, deallocate_pages);
>>  
>>  if (deallocate_pages) {
>> @@ -2497,6 +2533,9 @@ void *__vmalloc_node_range(unsigned long size, 
>> unsigned long align,
>>  if (!addr)
>>  return NULL;
>>  
>> +if (kasan_populate_vmalloc(real_size, area))
>> +return NULL;
>> +
>
> KASAN itself uses __vmalloc_node_range() to allocate and map shadow in memory 
> online callback.
> So we should either skip non-vmalloc and non-module addresses here or teach 
> kasan's memory online/offline
> callbacks to not use __vmalloc_node_range() (do something similar to 
> kasan_populate_vmalloc() perhaps?). 

Ah, right you are. I haven't been testing that.

I am a bit nervous about further restricting kasan_populate_vmalloc: I
seem to remember having problems with code using the vmalloc family of
functions to map memory that doesn't lie within vmalloc space but which
still has instrumented accesses.

On the other hand, I'm not keen on rewriting any of the memory
on/offline code if I can avoid it!

I'll have a look and get back you as soon as I can.

Thanks for catching this.

Kind regards,
Daniel

>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to kasan-dev+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/kasan-dev/352cb4fa-2e57-7e3b-23af-898e113bbe22%40virtuozzo.com.


Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-10 Thread Daniel Axtens
Hi Uladzislau,


> Looking at it once more, I think the above part of the code is a bit wrong
> and should be separated from the merge_or_add_vmap_area() logic. The
> reason is to keep it simple and do only what it is supposed to do:
> merging or adding.
>
> Also, kasan_release_vmalloc() gets called twice there, which looks like
> a duplication. Apart from that, merge_or_add_vmap_area() can be called via
> the recovery path when vmaps are not even set up. See the percpu
> allocator.
>
> I guess your part could be moved directly to the __purge_vmap_area_lazy()
> where all vmaps are lazily freed. To do so, we also need to modify
> merge_or_add_vmap_area() to return the merged area:

Thanks for the review. I've integrated your snippet - it seems to work
fine, and I agree that it is much simpler and clearer. so I've rolled it
in to v9 which I will post soon.

Regards,
Daniel

>
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e92ff5f7dd8b..fecde4312d68 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -683,7 +683,7 @@ insert_vmap_area_augment(struct vmap_area *va,
>   * free area is inserted. If VA has been merged, it is
>   * freed.
>   */
> -static __always_inline void
> +static __always_inline struct vmap_area *
>  merge_or_add_vmap_area(struct vmap_area *va,
> struct rb_root *root, struct list_head *head)
>  {
> @@ -750,7 +750,10 @@ merge_or_add_vmap_area(struct vmap_area *va,
>  
> /* Free vmap_area object. */
> kmem_cache_free(vmap_area_cachep, va);
> -   return;
> +
> +   /* Point to the new merged area. */
> +   va = sibling;
> +   merged = true;
> }
> }
>  
> @@ -759,6 +762,8 @@ merge_or_add_vmap_area(struct vmap_area *va,
> link_va(va, root, parent, link, head);
> augment_tree_propagate_from(va);
> }
> +
> +   return va;
>  }
>  
>  static __always_inline bool
> @@ -1172,7 +1177,7 @@ static void __free_vmap_area(struct vmap_area *va)
> /*
>  * Merge VA with its neighbors, otherwise just add it.
>  */
> -   merge_or_add_vmap_area(va,
> +   (void) merge_or_add_vmap_area(va,
> &free_vmap_area_root, &free_vmap_area_list);
>  }
>  
> @@ -1279,15 +1284,20 @@ static bool __purge_vmap_area_lazy(unsigned long 
> start, unsigned long end)
> spin_lock(&vmap_area_lock);
> llist_for_each_entry_safe(va, n_va, valist, purge_list) {
> unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
> +   unsigned long orig_start = va->va_start;
> +   unsigned long orig_end = va->va_end;
>  
> /*
>  * Finally insert or merge lazily-freed area. It is
>  * detached and there is no need to "unlink" it from
>  * anything.
>  */
> -   merge_or_add_vmap_area(va,
> +   va = merge_or_add_vmap_area(va,
> &free_vmap_area_root, &free_vmap_area_list);
>  
> +   kasan_release_vmalloc(orig_start,
> +   orig_end, va->va_start, va->va_end);
> +
> atomic_long_sub(nr, &vmap_lazy_nr);
>  
> if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
> 
>
> --
> Vlad Rezki


Re: Kernel Concurrency Sanitizer (KCSAN)

2019-10-10 Thread Daniel Axtens
Marco Elver  writes:

> Hi Daniel,
>
> On Tue, 1 Oct 2019 at 16:50, Daniel Axtens  wrote:
>>
>> Hi Marco,
>>
>> > We would like to share a new data-race detector for the Linux kernel:
>> > Kernel Concurrency Sanitizer (KCSAN) --
>> > https://github.com/google/ktsan/wiki/KCSAN  (Details:
>> > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
>>
>> This builds and begins to boot on powerpc, which is fantastic.
>>
>> I'm seeing a lot of reports where locks are changed while being watched by
>> kcsan, so many that it floods the console and stalls the boot.
>>
>> I think, if I've understood correctly, that this is because powerpc
>> doesn't use the queued lock implementation for its spinlock but rather
>> its own assembler locking code. This means the writes aren't
>> instrumented by the compiler, while some reads are. (see
>> __arch_spin_trylock in e.g. arch/powerpc/include/asm/spinlock.h)
>>
>> Would the correct way to deal with this be for the powerpc code to call
>> out to __tsan_readN/__tsan_writeN before invoking the assembler that
>> reads and writes the lock?
>
> This should not be the issue, because with KCSAN, not instrumenting
> something does not lead to false positives. If two accesses are
> involved in a race, and neither of them are instrumented, KCSAN will
> not report a race; if however, 1 of them is instrumented (and the
> uninstrumented access is a write), KCSAN will infer a race due to the
> data value changed ("race at unknown origin").
>
> Rather, if there is spinlock code causing data-races, then there are 2 
> options:
> 1) Actually missing READ_ONCE/WRITE_ONCE somewhere.
> 2) You need to disable instrumentation for an entire function with
> __no_sanitize_thread or __no_kcsan_or_inline (for inline functions).
> This should only be needed for arch-specific code (e.g. see the
> changes we made to arch/x86).

Thanks, that was what I needed. I can now get it to boot Ubuntu on
ppc64le. Still hitting a lot of things, but we'll poke and prod it a bit
internally and let you know how we get on!
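
(For the archives: the kind of annotation option 2 implies on the powerpc
side is sketched below. This is an illustration only, not a tested patch --
the function name and body are simplified stand-ins for the real
lwarx/stwcx. based helpers in arch/powerpc/include/asm/spinlock.h, and
__no_kcsan_or_inline is the attribute Marco mentions above.)

/* Sketch only: mark an arch lock helper so KCSAN skips its plain
 * accesses instead of instrumenting them. */
static __no_kcsan_or_inline int sketch_spin_is_locked(arch_spinlock_t *lock)
{
	/* plain read that would otherwise be flagged against the asm stores */
	return lock->slock != 0;
}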

Regards,
Daniel

>
> Note: you can explicitly add instrumentation to uninstrumented
> accesses with the API in , but this shouldn't be
> the issue here.
>
> It would be good to symbolize the stack-traces, as otherwise it's hard
> to say exactly what needs to be done.
>
> Best,
> -- Marco
>
>> Regards,
>> Daniel
>>
>>
>> [   24.612864] 
>> ==
>> [   24.614188] BUG: KCSAN: racing read in __spin_yield+0xa8/0x180
>> [   24.614669]
>> [   24.614799] race at unknown origin, with read to 0xc0003fff9d00 of 4 
>> bytes by task 449 on cpu 11:
>> [   24.616024]  __spin_yield+0xa8/0x180
>> [   24.616377]  _raw_spin_lock_irqsave+0x1a8/0x1b0
>> [   24.616850]  release_pages+0x3a0/0x880
>> [   24.617203]  free_pages_and_swap_cache+0x13c/0x220
>> [   24.622548]  tlb_flush_mmu+0x210/0x2f0
>> [   24.622979]  tlb_finish_mmu+0x12c/0x240
>> [   24.623286]  exit_mmap+0x138/0x2c0
>> [   24.623779]  mmput+0xe0/0x330
>> [   24.624504]  do_exit+0x65c/0x1050
>> [   24.624835]  do_group_exit+0xb4/0x210
>> [   24.625458]  __wake_up_parent+0x0/0x80
>> [   24.625985]  system_call+0x5c/0x70
>> [   24.626415]
>> [   24.626651] Reported by Kernel Concurrency Sanitizer on:
>> [   24.628329] CPU: 11 PID: 449 Comm: systemd-bless-b Not tainted 
>> 5.3.0-7-gad29ff6c190d-dirty #9
>> [   24.629508] 
>> ==
>>
>> [   24.672860] 
>> ==
>> [   24.675901] BUG: KCSAN: data-race in _raw_spin_lock_irqsave+0x13c/0x1b0 
>> and _raw_spin_unlock_irqrestore+0x94/0x100
>> [   24.680847]
>> [   24.682743] write to 0xc001ffeefe00 of 4 bytes by task 455 on cpu 5:
>> [   24.683402]  _raw_spin_unlock_irqrestore+0x94/0x100
>> [   24.684593]  release_pages+0x250/0x880
>> [   24.685148]  free_pages_and_swap_cache+0x13c/0x220
>> [   24.686068]  tlb_flush_mmu+0x210/0x2f0
>> [   24.690190]  tlb_finish_mmu+0x12c/0x240
>> [   24.691082]  exit_mmap+0x138/0x2c0
>> [   24.693216]  mmput+0xe0/0x330
>> [   24.693597]  do_exit+0x65c/0x1050
>> [   24.694170]  do_group_exit+0xb4/0x210
>> [   24.694658]  __wake_up_parent+0x0/0x80
>> [   24.696230]  system_call+0x5c/0x70
>> [   24.700414]
>> [   24.712991] read to 0xc001ffeefe00 of 4 bytes by task 454 on cpu 20:
>> [   24.714419]  _raw_spin_lock_irqsave+0x13c/0x1b0
>>

Re: [PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-01 Thread Daniel Axtens
Hi,

>>  /*
>>   * Find a place in the tree where VA potentially will be
>>   * inserted, unless it is merged with its sibling/siblings.
>> @@ -741,6 +752,10 @@ merge_or_add_vmap_area(struct vmap_area *va,
>>  if (sibling->va_end == va->va_start) {
>>  sibling->va_end = va->va_end;
>>  
>> +kasan_release_vmalloc(orig_start, orig_end,
>> +  sibling->va_start,
>> +  sibling->va_end);
>> +
> The same.

The call to kasan_release_vmalloc() is a static inline no-op if
CONFIG_KASAN_VMALLOC is not defined, which I thought was the preferred
way to do things rather than sprinkling the code with ifdefs?

The compiler should be smart enough to eliminate all the
orig_start/orig_end stuff at compile time because it can see that it's
not used, so there's no cost in the binary.
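
For reference, the declaration side of that pattern looks roughly like the
following -- paraphrased from the kasan.h hunk rather than quoted, so treat
it as a sketch:

#ifdef CONFIG_KASAN_VMALLOC
void kasan_release_vmalloc(unsigned long start, unsigned long end,
			   unsigned long free_region_start,
			   unsigned long free_region_end);
#else
static inline void kasan_release_vmalloc(unsigned long start,
					 unsigned long end,
					 unsigned long free_region_start,
					 unsigned long free_region_end) {}
#endif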

Regards,
Daniel


Re: Kernel Concurrency Sanitizer (KCSAN)

2019-10-01 Thread Daniel Axtens
Hi Marco,

> We would like to share a new data-race detector for the Linux kernel:
> Kernel Concurrency Sanitizer (KCSAN) --
> https://github.com/google/ktsan/wiki/KCSAN  (Details:
> https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)

This builds and begins to boot on powerpc, which is fantastic.

I'm seeing a lot of reports where locks are changed while being watched by
kcsan, so many that it floods the console and stalls the boot.

I think, if I've understood correctly, that this is because powerpc
doesn't use the queued lock implementation for its spinlock but rather
its own assembler locking code. This means the writes aren't
instrumented by the compiler, while some reads are. (see
__arch_spin_trylock in e.g. arch/powerpc/include/asm/spinlock.h)

Would the correct way to deal with this be for the powerpc code to call
out to __tsan_readN/__tsan_writeN before invoking the assembler that
reads and writes the lock?
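
Very roughly, I have in mind something like the sketch below. This is only
meant to illustrate the question, not a change I've tested, and the hook
name/signature is my assumption about what the KCSAN runtime exports:

void __tsan_write4(void *addr);	/* assumed runtime hook */

static inline void sketch_spin_unlock(arch_spinlock_t *lock)
{
	/* manually report the store that the inline asm would perform */
	__tsan_write4((void *)&lock->slock);
	lock->slock = 0;	/* stand-in for the real release-barrier asm */
}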

Regards,
Daniel


[   24.612864] 
==
[   24.614188] BUG: KCSAN: racing read in __spin_yield+0xa8/0x180
[   24.614669] 
[   24.614799] race at unknown origin, with read to 0xc0003fff9d00 of 4 
bytes by task 449 on cpu 11:
[   24.616024]  __spin_yield+0xa8/0x180
[   24.616377]  _raw_spin_lock_irqsave+0x1a8/0x1b0
[   24.616850]  release_pages+0x3a0/0x880
[   24.617203]  free_pages_and_swap_cache+0x13c/0x220
[   24.622548]  tlb_flush_mmu+0x210/0x2f0
[   24.622979]  tlb_finish_mmu+0x12c/0x240
[   24.623286]  exit_mmap+0x138/0x2c0
[   24.623779]  mmput+0xe0/0x330
[   24.624504]  do_exit+0x65c/0x1050
[   24.624835]  do_group_exit+0xb4/0x210
[   24.625458]  __wake_up_parent+0x0/0x80
[   24.625985]  system_call+0x5c/0x70
[   24.626415] 
[   24.626651] Reported by Kernel Concurrency Sanitizer on:
[   24.628329] CPU: 11 PID: 449 Comm: systemd-bless-b Not tainted 
5.3.0-7-gad29ff6c190d-dirty #9
[   24.629508] 
==

[   24.672860] 
==
[   24.675901] BUG: KCSAN: data-race in _raw_spin_lock_irqsave+0x13c/0x1b0 and 
_raw_spin_unlock_irqrestore+0x94/0x100
[   24.680847] 
[   24.682743] write to 0xc001ffeefe00 of 4 bytes by task 455 on cpu 5:
[   24.683402]  _raw_spin_unlock_irqrestore+0x94/0x100
[   24.684593]  release_pages+0x250/0x880
[   24.685148]  free_pages_and_swap_cache+0x13c/0x220
[   24.686068]  tlb_flush_mmu+0x210/0x2f0
[   24.690190]  tlb_finish_mmu+0x12c/0x240
[   24.691082]  exit_mmap+0x138/0x2c0
[   24.693216]  mmput+0xe0/0x330
[   24.693597]  do_exit+0x65c/0x1050
[   24.694170]  do_group_exit+0xb4/0x210
[   24.694658]  __wake_up_parent+0x0/0x80
[   24.696230]  system_call+0x5c/0x70
[   24.700414] 
[   24.712991] read to 0xc001ffeefe00 of 4 bytes by task 454 on cpu 20:
[   24.714419]  _raw_spin_lock_irqsave+0x13c/0x1b0
[   24.715018]  pagevec_lru_move_fn+0xfc/0x1d0
[   24.715527]  __lru_cache_add+0x124/0x1a0
[   24.716072]  lru_cache_add+0x30/0x50
[   24.716411]  add_to_page_cache_lru+0x134/0x250
[   24.717938]  mpage_readpages+0x220/0x3f0
[   24.719737]  blkdev_readpages+0x50/0x80
[   24.721891]  read_pages+0xb4/0x340
[   24.722834]  __do_page_cache_readahead+0x318/0x350
[   24.723290]  force_page_cache_readahead+0x150/0x280
[   24.724391]  page_cache_sync_readahead+0xe4/0x110
[   24.725087]  generic_file_buffered_read+0xa20/0xdf0
[   24.727003]  generic_file_read_iter+0x220/0x310
[   24.728906] 
[   24.730044] Reported by Kernel Concurrency Sanitizer on:
[   24.732185] CPU: 20 PID: 454 Comm: systemd-gpt-aut Not tainted 
5.3.0-7-gad29ff6c190d-dirty #9
[   24.734317] 
==


>
> Thanks,
> -- Marco


[PATCH v8 5/5] kasan debug: track pages allocated for vmalloc shadow

2019-10-01 Thread Daniel Axtens
Provide the current number of vmalloc shadow pages in
/sys/kernel/debug/kasan/vmalloc_shadow_pages.

Signed-off-by: Daniel Axtens 

---

v8: rename kasan_vmalloc/shadow_pages -> kasan/vmalloc_shadow_pages

On v4 (no dynamic freeing), I saw the following approximate figures
on my test VM:

 - fresh boot: 720
 - after test_vmalloc: ~14000

With v5 (lazy dynamic freeing):

 - boot: ~490-500
 - running modprobe test_vmalloc pushes the figures up to sometimes
as high as ~14000, but they drop down to ~560 after the test ends.
I'm not sure where the extra sixty pages are from, but running the
test repeatedly doesn't cause the number to keep growing, so I don't
think we're leaking.
 - with vmap_stack, spawning tasks pushes the figure up to ~4200, then
some clearing kicks in and drops it down to previous levels again.
---
 mm/kasan/common.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index e33cbab83309..5b924f860a32 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include <linux/debugfs.h>
 
 #include 
 
@@ -750,6 +751,8 @@ core_initcall(kasan_memhotplug_init);
 #endif
 
 #ifdef CONFIG_KASAN_VMALLOC
+static u64 vmalloc_shadow_pages;
+
 static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
  void *unused)
 {
@@ -776,6 +779,7 @@ static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned 
long addr,
if (likely(pte_none(*ptep))) {
set_pte_at(&init_mm, addr, ptep, pte);
page = 0;
+   vmalloc_shadow_pages++;
}
spin_unlock(&init_mm.page_table_lock);
if (page)
@@ -829,6 +833,7 @@ static int kasan_depopulate_vmalloc_pte(pte_t *ptep, 
unsigned long addr,
if (likely(!pte_none(*ptep))) {
pte_clear(&init_mm, addr, ptep);
free_page(page);
+   vmalloc_shadow_pages--;
}
spin_unlock(&init_mm.page_table_lock);
 
@@ -947,4 +952,25 @@ void kasan_release_vmalloc(unsigned long start, unsigned 
long end,
   (unsigned long)shadow_end);
}
 }
+
+static __init int kasan_init_debugfs(void)
+{
+   struct dentry *root, *count;
+
+   root = debugfs_create_dir("kasan", NULL);
+   if (IS_ERR(root)) {
+   if (PTR_ERR(root) == -ENODEV)
+   return 0;
+   return PTR_ERR(root);
+   }
+
+   count = debugfs_create_u64("vmalloc_shadow_pages", 0444, root,
+  &vmalloc_shadow_pages);
+
+   if (IS_ERR(count))
+   return PTR_ERR(root);
+
+   return 0;
+}
+late_initcall(kasan_init_debugfs);
 #endif
-- 
2.20.1



[PATCH v8 4/5] x86/kasan: support KASAN_VMALLOC

2019-10-01 Thread Daniel Axtens
In the case where KASAN directly allocates memory to back vmalloc
space, don't map the early shadow page over it.

We prepopulate pgds/p4ds for the range that would otherwise be empty.
This is required to get it synced to hardware on boot, allowing the
lower levels of the page tables to be filled dynamically.

Acked-by: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 

---
v5: fix some checkpatch CHECK warnings. There are some that remain
around lines ending with '(': I have not changed these because
it's consistent with the rest of the file and it's not easy to
see how to fix it without creating an overlong line or lots of
temporary variables.

v2: move from faulting in shadow pgds to prepopulating
---
 arch/x86/Kconfig|  1 +
 arch/x86/mm/kasan_init_64.c | 60 +
 2 files changed, 61 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 96ea2c7449ef..3590651e95f5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -135,6 +135,7 @@ config X86
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN  if X86_64
+   select HAVE_ARCH_KASAN_VMALLOC  if X86_64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS  if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if MMU && COMPAT
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 296da58f3013..8f00f462709e 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -245,6 +245,51 @@ static void __init kasan_map_early_shadow(pgd_t *pgd)
} while (pgd++, addr = next, addr != end);
 }
 
+static void __init kasan_shallow_populate_p4ds(pgd_t *pgd,
+  unsigned long addr,
+  unsigned long end,
+  int nid)
+{
+   p4d_t *p4d;
+   unsigned long next;
+   void *p;
+
+   p4d = p4d_offset(pgd, addr);
+   do {
+   next = p4d_addr_end(addr, end);
+
+   if (p4d_none(*p4d)) {
+   p = early_alloc(PAGE_SIZE, nid, true);
+   p4d_populate(&init_mm, p4d, p);
+   }
+   } while (p4d++, addr = next, addr != end);
+}
+
+static void __init kasan_shallow_populate_pgds(void *start, void *end)
+{
+   unsigned long addr, next;
+   pgd_t *pgd;
+   void *p;
+   int nid = early_pfn_to_nid((unsigned long)start);
+
+   addr = (unsigned long)start;
+   pgd = pgd_offset_k(addr);
+   do {
+   next = pgd_addr_end(addr, (unsigned long)end);
+
+   if (pgd_none(*pgd)) {
+   p = early_alloc(PAGE_SIZE, nid, true);
+   pgd_populate(&init_mm, pgd, p);
+   }
+
+   /*
+* we need to populate p4ds to be synced when running in
+* four level mode - see sync_global_pgds_l4()
+*/
+   kasan_shallow_populate_p4ds(pgd, addr, next, nid);
+   } while (pgd++, addr = next, addr != (unsigned long)end);
+}
+
 #ifdef CONFIG_KASAN_INLINE
 static int kasan_die_handler(struct notifier_block *self,
 unsigned long val,
@@ -352,9 +397,24 @@ void __init kasan_init(void)
shadow_cpu_entry_end = (void *)round_up(
(unsigned long)shadow_cpu_entry_end, PAGE_SIZE);
 
+   /*
+* If we're in full vmalloc mode, don't back vmalloc space with early
+* shadow pages. Instead, prepopulate pgds/p4ds so they are synced to
+* the global table and we can populate the lower levels on demand.
+*/
+#ifdef CONFIG_KASAN_VMALLOC
+   kasan_shallow_populate_pgds(
+   kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
+   kasan_mem_to_shadow((void *)VMALLOC_END));
+
+   kasan_populate_early_shadow(
+   kasan_mem_to_shadow((void *)VMALLOC_END + 1),
+   shadow_cpu_entry_begin);
+#else
kasan_populate_early_shadow(
kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
shadow_cpu_entry_begin);
+#endif
 
kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin,
  (unsigned long)shadow_cpu_entry_end, 0);
-- 
2.20.1



[PATCH v8 3/5] fork: support VMAP_STACK with KASAN_VMALLOC

2019-10-01 Thread Daniel Axtens
Supporting VMAP_STACK with KASAN_VMALLOC is straightforward:

 - clear the shadow region of vmapped stacks when swapping them in
 - tweak Kconfig to allow VMAP_STACK to be turned on with KASAN

Reviewed-by: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 
---
 arch/Kconfig  | 9 +
 kernel/fork.c | 4 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 5f8a5d84dbbe..2d914990402f 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -843,16 +843,17 @@ config HAVE_ARCH_VMAP_STACK
 config VMAP_STACK
default y
bool "Use a virtually-mapped stack"
-   depends on HAVE_ARCH_VMAP_STACK && !KASAN
+   depends on HAVE_ARCH_VMAP_STACK
+   depends on !KASAN || KASAN_VMALLOC
---help---
  Enable this if you want the use virtually-mapped kernel stacks
  with guard pages.  This causes kernel stack overflows to be
  caught immediately rather than causing difficult-to-diagnose
  corruption.
 
- This is presently incompatible with KASAN because KASAN expects
- the stack to map directly to the KASAN shadow map using a formula
- that is incorrect if the stack is in vmalloc space.
+ To use this with KASAN, the architecture must support backing
+ virtual mappings with real shadow memory, and KASAN_VMALLOC must
+ be enabled.
 
 config ARCH_OPTIONAL_KERNEL_RWX
def_bool n
diff --git a/kernel/fork.c b/kernel/fork.c
index 6adbbcf448c3..0c9e6478ba85 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -94,6 +94,7 @@
 #include 
 #include 
 #include 
+#include <linux/kasan.h>
 
 #include 
 #include 
@@ -229,6 +230,9 @@ static unsigned long *alloc_thread_stack_node(struct 
task_struct *tsk, int node)
if (!s)
continue;
 
+   /* Clear the KASAN shadow of the stack. */
+   kasan_unpoison_shadow(s->addr, THREAD_SIZE);
+
/* Clear stale pointers from reused stack. */
memset(s->addr, 0, THREAD_SIZE);
 
-- 
2.20.1



[PATCH v8 2/5] kasan: add test for vmalloc

2019-10-01 Thread Daniel Axtens
Test kasan vmalloc support by adding a new test to the module.

Signed-off-by: Daniel Axtens 

--

v5: split out per Christophe Leroy
---
 lib/test_kasan.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/lib/test_kasan.c b/lib/test_kasan.c
index 49cc4d570a40..328d33beae36 100644
--- a/lib/test_kasan.c
+++ b/lib/test_kasan.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include <linux/vmalloc.h>
 
 #include 
 
@@ -748,6 +749,30 @@ static noinline void __init kmalloc_double_kzfree(void)
kzfree(ptr);
 }
 
+#ifdef CONFIG_KASAN_VMALLOC
+static noinline void __init vmalloc_oob(void)
+{
+   void *area;
+
+   pr_info("vmalloc out-of-bounds\n");
+
+   /*
+* We have to be careful not to hit the guard page.
+* The MMU will catch that and crash us.
+*/
+   area = vmalloc(3000);
+   if (!area) {
+   pr_err("Allocation failed\n");
+   return;
+   }
+
+   ((volatile char *)area)[3100];
+   vfree(area);
+}
+#else
+static void __init vmalloc_oob(void) {}
+#endif
+
 static int __init kmalloc_tests_init(void)
 {
/*
@@ -793,6 +818,7 @@ static int __init kmalloc_tests_init(void)
kasan_strings();
kasan_bitops();
kmalloc_double_kzfree();
+   vmalloc_oob();
 
kasan_restore_multi_shot(multishot);
 
-- 
2.20.1



[PATCH v8 1/5] kasan: support backing vmalloc space with real shadow memory

2019-10-01 Thread Daniel Axtens
Hook into vmalloc and vmap, and dynamically allocate real shadow
memory to back the mappings.

Most mappings in vmalloc space are small, requiring less than a full
page of shadow space. Allocating a full shadow page per mapping would
therefore be wasteful. Furthermore, to ensure that different mappings
use different shadow pages, mappings would have to be aligned to
KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.
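
(For a sense of scale, assuming the usual generic KASAN parameters of
KASAN_SHADOW_SCALE_SHIFT == 3 and 4K pages -- an assumption about the
configuration, not something this patch sets:

  1 shadow byte covers 1 << 3 = 8 bytes of memory, so
  1 shadow page covers 8 * 4096 = 32K of vmalloc space,

i.e. a dedicated shadow page per mapping would mean 32K alignment.)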

Instead, share backing space across multiple mappings. Allocate a
backing page when a mapping in vmalloc space uses a particular page of
the shadow region. This page can be shared by other vmalloc mappings
later on.

We hook in to the vmap infrastructure to lazily clean up unused shadow
memory.

To avoid the difficulties around swapping mappings around, this code
expects that the part of the shadow region that covers the vmalloc
space will not be covered by the early shadow page, but will be left
unmapped. This will require changes in arch-specific code.

This allows KASAN with VMAP_STACK, and may be helpful for architectures
that do not have a separate module space (e.g. powerpc64, which I am
currently working on). It also allows relaxing the module alignment
back to PAGE_SIZE.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202009
Acked-by: Vasily Gorbik 
Signed-off-by: Daniel Axtens 
[Mark: rework shadow allocation]
Signed-off-by: Mark Rutland 

--

v2: let kasan_unpoison_shadow deal with ranges that do not use a
full shadow byte.

v3: relax module alignment
rename to kasan_populate_vmalloc which is a much better name
deal with concurrency correctly

v4: Mark's rework
Poison pages on vfree
Handle allocation failures

v5: Per Christophe Leroy, split out test and dynamically free pages.

v6: Guard freeing page properly. Drop WARN_ON_ONCE(pte_none(*ptep)),
 on reflection it's unnecessary debugging cruft with too high a
 false positive rate.

v7: tlb flush, thanks Mark.
explain more clearly how freeing works and is concurrency-safe.
---
 Documentation/dev-tools/kasan.rst |  63 +
 include/linux/kasan.h |  31 +
 include/linux/moduleloader.h  |   2 +-
 include/linux/vmalloc.h   |  12 ++
 lib/Kconfig.kasan |  16 +++
 mm/kasan/common.c | 204 ++
 mm/kasan/generic_report.c |   3 +
 mm/kasan/kasan.h  |   1 +
 mm/vmalloc.c  |  45 ++-
 9 files changed, 375 insertions(+), 2 deletions(-)

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index b72d07d70239..bdb92c3de7a5 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -215,3 +215,66 @@ brk handler is used to print bug reports.
 A potential expansion of this mode is a hardware tag-based mode, which would
 use hardware memory tagging support instead of compiler instrumentation and
 manual shadow memory manipulation.
+
+What memory accesses are sanitised by KASAN?
+--------------------------------------------
+
+The kernel maps memory in a number of different parts of the address
+space. This poses something of a problem for KASAN, which requires
+that all addresses accessed by instrumented code have a valid shadow
+region.
+
+The range of kernel virtual addresses is large: there is not enough
+real memory to support a real shadow region for every address that
+could be accessed by the kernel.
+
+By default
+~~~~~~~~~~
+
+By default, architectures only map real memory over the shadow region
+for the linear mapping (and potentially other small areas). For all
+other areas - such as vmalloc and vmemmap space - a single read-only
+page is mapped over the shadow area. This read-only shadow page
+declares all memory accesses as permitted.
+
+This presents a problem for modules: they do not live in the linear
+mapping, but in a dedicated module space. By hooking in to the module
+allocator, KASAN can temporarily map real shadow memory to cover
+them. This allows detection of invalid accesses to module globals, for
+example.
+
+This also creates an incompatibility with ``VMAP_STACK``: if the stack
+lives in vmalloc space, it will be shadowed by the read-only page, and
+the kernel will fault when trying to set up the shadow data for stack
+variables.
+
+CONFIG_KASAN_VMALLOC
+~~~~~~~~~~~~~~~~~~~~
+
+With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
+cost of greater memory usage. Currently this is only supported on x86.
+
+This works by hooking into vmalloc and vmap, and dynamically
+allocating real shadow memory to back the mappings.
+
+Most mappings in vmalloc space are small, requiring less than a full
+page of shadow space. Allocating a full shadow page per mapping would
+therefore be wasteful. Furthermore, to ensure that different mappings
+use different shadow pages, mappings would have to be aligned to
+``KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE``.
+
+Instead, we share backing space across multiple mappings. We

[PATCH v8 0/5] kasan: support backing vmalloc space with real shadow memory

2019-10-01 Thread Daniel Axtens
Currently, vmalloc space is backed by the early shadow page. This
means that kasan is incompatible with VMAP_STACK.

This series provides a mechanism to back vmalloc space with real,
dynamically allocated memory. I have only wired up x86, because that's
the only currently supported arch I can work with easily, but it's
very easy to wire up other architectures, and it appears that there is
some work-in-progress code to do this on arm64 and s390.

This has been discussed before in the context of VMAP_STACK:
 - https://bugzilla.kernel.org/show_bug.cgi?id=202009
 - https://lkml.org/lkml/2018/7/22/198
 - https://lkml.org/lkml/2019/7/19/822

In terms of implementation details:

Most mappings in vmalloc space are small, requiring less than a full
page of shadow space. Allocating a full shadow page per mapping would
therefore be wasteful. Furthermore, to ensure that different mappings
use different shadow pages, mappings would have to be aligned to
KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.

Instead, share backing space across multiple mappings. Allocate a
backing page when a mapping in vmalloc space uses a particular page of
the shadow region. This page can be shared by other vmalloc mappings
later on.

We hook in to the vmap infrastructure to lazily clean up unused shadow
memory.

v1: https://lore.kernel.org/linux-mm/20190725055503.19507-1-...@axtens.net/
v2: https://lore.kernel.org/linux-mm/20190729142108.23343-1-...@axtens.net/
 Address review comments:
 - Patch 1: use kasan_unpoison_shadow's built-in handling of
ranges that do not align to a full shadow byte
 - Patch 3: prepopulate pgds rather than faulting things in
v3: https://lore.kernel.org/linux-mm/20190731071550.31814-1-...@axtens.net/
 Address comments from Mark Rutland:
 - kasan_populate_vmalloc is a better name
 - handle concurrency correctly
 - various nits and cleanups
 - relax module alignment in KASAN_VMALLOC case
v4: https://lore.kernel.org/linux-mm/20190815001636.12235-1-...@axtens.net/
 Changes to patch 1 only:
 - Integrate Mark's rework, thanks Mark!
 - handle the case where kasan_populate_shadow might fail
 - poison shadow on free, allowing the alloc path to just
 unpoison memory that it uses
v5: https://lore.kernel.org/linux-mm/20190830003821.10737-1-...@axtens.net/
 Address comments from Christophe Leroy:
 - Fix some issues with my descriptions in commit messages and docs
 - Dynamically free unused shadow pages by hooking into the vmap book-keeping
 - Split out the test into a separate patch
 - Optional patch to track the number of pages allocated
 - minor checkpatch cleanups
v6: https://lore.kernel.org/linux-mm/20190902112028.23773-1-...@axtens.net/
 Properly guard freeing pages in patch 1, drop debugging code.
v7: https://lore.kernel.org/linux-mm/20190903145536.3390-1-...@axtens.net/
Add a TLB flush on freeing, thanks Mark Rutland.
Explain more clearly how I think freeing is concurrency-safe.
v8: rename kasan_vmalloc/shadow_pages to kasan/vmalloc_shadow_pages

Daniel Axtens (5):
  kasan: support backing vmalloc space with real shadow memory
  kasan: add test for vmalloc
  fork: support VMAP_STACK with KASAN_VMALLOC
  x86/kasan: support KASAN_VMALLOC
  kasan debug: track pages allocated for vmalloc shadow

 Documentation/dev-tools/kasan.rst |  63 
 arch/Kconfig  |   9 +-
 arch/x86/Kconfig  |   1 +
 arch/x86/mm/kasan_init_64.c   |  60 
 include/linux/kasan.h |  31 
 include/linux/moduleloader.h  |   2 +-
 include/linux/vmalloc.h   |  12 ++
 kernel/fork.c |   4 +
 lib/Kconfig.kasan |  16 +++
 lib/test_kasan.c  |  26 
 mm/kasan/common.c | 230 ++
 mm/kasan/generic_report.c |   3 +
 mm/kasan/kasan.h  |   1 +
 mm/vmalloc.c  |  45 +-
 14 files changed, 497 insertions(+), 6 deletions(-)

-- 
2.20.1


