[PATCH] powerpc/8xx: Fix instruction TLB miss exception with perf enabled

2020-10-08 Thread Christophe Leroy
When perf is enabled, r11 must also be restored when CONFIG_HUGETLBFS
is selected.

Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/head_8xx.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 9f359d3fba74..32d85387bdc5 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -268,7 +268,7 @@ InstructionTLBMiss:
addi    r10, r10, 1
stw r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
mfspr   r10, SPRN_SPRG_SCRATCH0
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || defined(CONFIG_HUGETLBFS)
mfspr   r11, SPRN_SPRG_SCRATCH1
 #endif
rfi
-- 
2.25.0



Re: [PATCH] powerpc/powernv/dump: Fix race while processing OPAL dump

2020-10-08 Thread Michael Ellerman
Vasant Hegde  writes:
> diff --git a/arch/powerpc/platforms/powernv/opal-dump.c b/arch/powerpc/platforms/powernv/opal-dump.c
> index 543c816fa99e..7e6eeedec32b 100644
> --- a/arch/powerpc/platforms/powernv/opal-dump.c
> +++ b/arch/powerpc/platforms/powernv/opal-dump.c
> @@ -346,21 +345,39 @@ static struct dump_obj *create_dump_obj(uint32_t id, size_t size,
>   rc = kobject_add(&dump->kobj, NULL, "0x%x-0x%x", type, id);
>   if (rc) {
>   kobject_put(&dump->kobj);
> - return NULL;
> + return;
>   }
>  
> + /*
> +  * As soon as the sysfs file for this dump is created/activated there is
> +  * a chance the opal_errd daemon (or any userspace) might read and
> +  * acknowledge the dump before kobject_uevent() is called. If that
> +  * happens then there is a potential race between
> +  * dump_ack_store->kobject_put() and kobject_uevent() which leads to a
> +  * use-after-free of a kernfs object resulting in a kernel crash.
> +  *
> +  * To avoid that, we need to take a reference on behalf of the bin file,
> +  * so that our reference remains valid while we call kobject_uevent().
> +  * We then drop our reference before exiting the function, leaving the
> +  * bin file to drop the last reference (if it hasn't already).
> +  */
> +
> + /* Take a reference for the bin file */
> + kobject_get(&dump->kobj);
>   rc = sysfs_create_bin_file(&dump->kobj, &dump->dump_attr);
>   if (rc) {
>   kobject_put(&dump->kobj);
> - return NULL;
> + /* Drop reference count taken for bin file */
> + kobject_put(&dump->kobj);
> + return;
>   }
>  
>   pr_info("%s: New platform dump. ID = 0x%x Size %u\n",
>   __func__, dump->id, dump->size);
>  
>   kobject_uevent(&dump->kobj, KOBJ_ADD);
> -
> - return dump;
> + /* Drop reference count taken for bin file */
> + kobject_put(&dump->kobj);
>  }

I think this would be better if it was reworked along the lines of:

aea948bb80b4 ("powerpc/powernv/elog: Fix race while processing OPAL error log event.")

cheers


Re: [PATCH] powerpc/powernv/elog: Reduce elog message severity

2020-10-08 Thread Michael Ellerman
Vasant Hegde  writes:
> OPAL interrupts the kernel whenever it has a new error log. The kernel
> calls the interrupt handler (elog_event()) to retrieve the event, and
> elog_event() makes an OPAL API call (opal_get_elog_size()) to retrieve
> the elog info.
>
> In some cases the kernel gets the interrupt again before it has made the
> opal_get_elog_size() call. So the second time elog_event() calls the
> opal_get_elog_size() API, OPAL returns an error.

Can you give more detail there? Do you have a stack trace?

We use IRQF_ONESHOT for elog_event(), which (I thought) meant it
shouldn't be called again until it has completed.

So I'm unclear how you're seeing the behaviour you describe.

cheers
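
(For reference, IRQF_ONESHOT applies to threaded interrupt handlers
registered along the lines of the sketch below. This is only an
illustration of the mechanism, not the exact opal-elog.c registration
code; the IRQ number, trigger type and name string are placeholders.)

#include <linux/interrupt.h>

/*
 * Minimal sketch: with IRQF_ONESHOT the interrupt line stays masked
 * until the threaded handler (elog_event() here) returns, so a second
 * run of the thread function should not start while the first one is
 * still in progress.
 */
static irqreturn_t elog_event(int irq, void *data);

static int elog_irq_setup(unsigned int irq)
{
	/* No hard-irq handler; elog_event() runs from a kernel thread. */
	return request_threaded_irq(irq, NULL, elog_event,
				    IRQF_TRIGGER_HIGH | IRQF_ONESHOT,
				    "opal-elog", NULL);
}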

> It's safe to ignore this error. Hence reduce the severity
> of the log message.
>
> CC: Mahesh Salgaonkar 
> Signed-off-by: Vasant Hegde 
> ---
>  arch/powerpc/platforms/powernv/opal-elog.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-elog.c b/arch/powerpc/platforms/powernv/opal-elog.c
> index 62ef7ad995da..67f435bb1ec4 100644
> --- a/arch/powerpc/platforms/powernv/opal-elog.c
> +++ b/arch/powerpc/platforms/powernv/opal-elog.c
> @@ -247,7 +247,7 @@ static irqreturn_t elog_event(int irq, void *data)
>  
>   rc = opal_get_elog_size(&id, &size, &type);
>   if (rc != OPAL_SUCCESS) {
> - pr_err("ELOG: OPAL log info read failed\n");
> + pr_debug("ELOG: OPAL log info read failed\n");
>   return IRQ_HANDLED;
>   }
>  
> -- 
> 2.26.2


Re: linux-next: Fixes tag needs some work in the powerpc tree

2020-10-08 Thread Michael Ellerman
Stephen Rothwell  writes:

> Hi all,
>
> In commit
>
>   a2d0230b91f7 ("cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier")
>
> Fixes tag
>
>   Fixes: cf30af76 ("cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec")

Gah.

I've changed my scripts to make this a hard error when I'm applying
patches.

cheers


Re: [PATCH 2/2] dt: Remove booting-without-of.rst

2020-10-08 Thread Michael Ellerman
Rob Herring  writes:
> booting-without-of.rstt is an ancient document that first outlined
                        ^
nit

> Flattened DeviceTree on PowerPC initially. The DT world has evolved a
> lot in the 15 years since and booting-without-of.rst is pretty stale.
> The name of the document itself is confusing if you don't understand the
> evolution from real 'OpenFirmware'. Most of what booting-without-of.rst
> contains is now in the DT specification (which evolved out of the
> ePAPR). The few things that weren't documented in the DT specification
> are now.
>
> All that remains is the boot entry details, so let's move these to arch
> specific documents. The exception is arm which already has the same
> details documented.
>
> Cc: Frank Rowand 
> Cc: Mauro Carvalho Chehab 
> Cc: Geert Uytterhoeven 
> Cc: Michael Ellerman 
> Cc: Thomas Bogendoerfer 
> Cc: Jonathan Corbet 
> Cc: Paul Mackerras 
> Cc: Yoshinori Sato 
> Cc: Rich Felker 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: x...@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Acked-by: Benjamin Herrenschmidt 
> Signed-off-by: Rob Herring 
> ---
>  .../devicetree/booting-without-of.rst | 1585 -
>  Documentation/devicetree/index.rst|1 -
>  Documentation/mips/booting.rst|   28 +
>  Documentation/mips/index.rst  |1 +
>  Documentation/powerpc/booting.rst |  110 ++

LGTM.

Acked-by: Michael Ellerman  (powerpc)

cheers


[powerpc:next] BUILD SUCCESS a2d0230b91f7e23ceb5d8fb6a9799f30517ec33a

2020-10-08 Thread kernel test robot
  ppc44x_defconfig
h8300   h8s-sim_defconfig
m68k   m5249evb_defconfig
xtensasmp_lx200_defconfig
armkeystone_defconfig
arm cm_x300_defconfig
mips db1xxx_defconfig
powerpc  ppc6xx_defconfig
riscv   defconfig
powerpcge_imp3a_defconfig
arm axm55xx_defconfig
powerpc  ep88xc_defconfig
powerpcgamecube_defconfig
powerpc mpc832x_rdb_defconfig
arm   aspeed_g5_defconfig
mips cu1000-neo_defconfig
sh   se7619_defconfig
arm nhk8815_defconfig
i386 allyesconfig
arm bcm2835_defconfig
sh espt_defconfig
mips  loongson3_defconfig
mips   ip28_defconfig
armshmobile_defconfig
powerpc  arches_defconfig
powerpc ksi8560_defconfig
arm davinci_all_defconfig
powerpc  allyesconfig
mips  decstation_64_defconfig
powerpc kmeter1_defconfig
powerpc  obs600_defconfig
mips   capcella_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68kdefconfig
m68k allyesconfig
arc  allyesconfig
nds32 allnoconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390defconfig
sparcallyesconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a004-20201008
x86_64   randconfig-a003-20201008
x86_64   randconfig-a005-20201008
x86_64   randconfig-a001-20201008
x86_64   randconfig-a002-20201008
x86_64   randconfig-a006-20201008
i386 randconfig-a006-20201008
i386 randconfig-a005-20201008
i386 randconfig-a001-20201008
i386 randconfig-a004-20201008
i386 randconfig-a002-20201008
i386 randconfig-a003-20201008
i386 randconfig-a006-20201009
i386 randconfig-a005-20201009
i386 randconfig-a001-20201009
i386 randconfig-a004-20201009
i386 randconfig-a002-20201009
i386 randconfig-a003-20201009
x86_64   randconfig-a012-20201009
x86_64   randconfig-a015-20201009
x86_64   randconfig-a013-20201009
x86_64   randconfig-a014-20201009
x86_64   randconfig-a011-20201009
x86_64   randconfig-a016-20201009
i386 randconfig-a015-20201009
i386 randconfig-a013-20201009
i386 randconfig-a014-20201009
i386 randconfig-a016-20201009
i386 randconfig-a011-20201009
i386 randconfig-a012-20201009
i386 randconfig-a015-20201008
i386 randconfig-a013-20201008
i386 randconfig-a014-20201008
i386 randconfig-a016-20201008
i386 randconfig-a011-20201008
i386 randconfig-a012-20201008
riscvnommu_k210_defconfig
riscvallyesconfig
riscvnommu_virt_defconfig
riscv  rv32_defconfig
riscvallmodconfig
x86_64   rhel
x86_64   allyesconfig
x86_64rhel-7.6-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a004-20201009
x86_64   randconfig-a003-20201009
x86_64   randconfig-a005-20201009
x86_64   randconfig-a001-20201009
x86_64   randconfig-a002-20201009
x86_64

[powerpc:merge] BUILD SUCCESS 118be7377c97e35c33819bcb3bbbae5a42a4ac43

2020-10-08 Thread kernel test robot
 ppa8548_defconfig
powerpc  obs600_defconfig
mips   capcella_defconfig
powerpc kmeter1_defconfig
openriscor1ksim_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
c6x  allyesconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390defconfig
i386 allyesconfig
sparcallyesconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a004-20201008
x86_64   randconfig-a003-20201008
x86_64   randconfig-a005-20201008
x86_64   randconfig-a001-20201008
x86_64   randconfig-a002-20201008
x86_64   randconfig-a006-20201008
i386 randconfig-a006-20201008
i386 randconfig-a005-20201008
i386 randconfig-a001-20201008
i386 randconfig-a004-20201008
i386 randconfig-a002-20201008
i386 randconfig-a003-20201008
x86_64   randconfig-a012-20201009
x86_64   randconfig-a015-20201009
x86_64   randconfig-a013-20201009
x86_64   randconfig-a014-20201009
x86_64   randconfig-a011-20201009
x86_64   randconfig-a016-20201009
i386 randconfig-a015-20201008
i386 randconfig-a013-20201008
i386 randconfig-a014-20201008
i386 randconfig-a016-20201008
i386 randconfig-a011-20201008
i386 randconfig-a012-20201008
riscvnommu_k210_defconfig
riscvallyesconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv  rv32_defconfig
riscvallmodconfig
x86_64   rhel
x86_64   allyesconfig
x86_64rhel-7.6-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  kexec

clang tested configs:
x86_64   randconfig-a012-20201008
x86_64   randconfig-a015-20201008
x86_64   randconfig-a013-20201008
x86_64   randconfig-a014-20201008
x86_64   randconfig-a011-20201008
x86_64   randconfig-a016-20201008

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Linux kernel: powerpc: RTAS calls can be used to compromise kernel integrity

2020-10-08 Thread Andrew Donnellan
The Linux kernel for powerpc has an issue with the Run-Time Abstraction 
Services (RTAS) interface, allowing root (or CAP_SYS_ADMIN users) in a 
VM to overwrite some parts of memory, including kernel memory.


This issue impacts guests running on top of PowerVM or KVM hypervisors 
(pseries platform), and does *not* impact bare-metal machines (powernv 
platform).


Description
===

The RTAS interface, defined in the Power Architecture Platform 
Reference, provides various platform hardware services to operating 
systems running on PAPR platforms (e.g. the "pseries" platform in Linux, 
running in a LPAR/VM on PowerVM or KVM).


Some userspace daemons require access to certain RTAS calls for system 
maintenance and monitoring purposes.


The kernel exposes a syscall, sys_rtas, that allows root (or any user 
with CAP_SYS_ADMIN) to make arbitrary RTAS calls. For the RTAS calls 
which require a work area, it allocates a buffer (the "RMO buffer") and 
exposes the physical address in /proc so that the userspace tool can 
pass addresses within that buffer as an argument to the RTAS call.


The syscall doesn't check that the work area arguments to RTAS calls are 
within the RMO buffer, which makes it trivial to read and write to any 
guest physical address within the LPAR's Real Memory Area, including 
overwriting the guest kernel's text.


At the time the RTAS syscall interface was first developed, it was 
generally assumed that root had unlimited ability to modify system 
state, so this would not have been considered an integrity violation. 
However, with the advent of Secure Boot, Lockdown etc, root should not 
be able to arbitrarily modify the kernel text or read arbitrary kernel data.


Therefore, while this issue impacts all kernels since the RTAS interface 
was first implemented, we are only considering it a vulnerability for 
upstream kernels from 5.3 onwards, which is when the Lockdown LSM was 
merged. Lockdown was widely included in pre-5.3 distribution kernels, so 
distribution vendors should consider whether they need to backport the 
patch to their pre-5.3 distro trees.


(A CVE for this issue is pending; we requested one some time ago but it 
has not yet been assigned.)


Fixes
=

A patch is currently in powerpc-next[0] and is expected to be included 
in mainline kernel 5.10. The patch has not yet been backported to 
upstream stable trees.


The approach taken by the patch is to maintain the existing RTAS 
interface, but restrict requests to the list of RTAS calls actually used 
by the librtas userspace library, and restrict work area pointer 
arguments to the region within the RMO buffer.
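
(For illustration, a minimal sketch of the kind of filtering described
above. The structure, the function name and the single allowlist entry
are assumptions made for this sketch, not the code that was actually
merged.)

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/types.h>

/* One allowlisted RTAS call and where its work-area arguments live. */
struct rtas_call_filter {
	const char *name;
	int buf_idx;	/* argument index of a work-area address, or -1 */
	int size_idx;	/* argument index of its length, or -1 */
};

static const struct rtas_call_filter allowed_calls[] = {
	{ "ibm,get-system-parameter", 1, 2 },
	/* ... only the calls actually used by librtas ... */
};

/* Return true if a sys_rtas request may be passed through to firmware. */
static bool rtas_call_allowed(const char *name, const u64 *args,
			      u64 rmo_start, u64 rmo_size)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(allowed_calls); i++) {
		const struct rtas_call_filter *f = &allowed_calls[i];
		u64 base, size;

		if (strcmp(name, f->name))
			continue;
		if (f->buf_idx < 0)
			return true;	/* no work area to validate */

		base = args[f->buf_idx];
		size = f->size_idx >= 0 ? args[f->size_idx] : 1;
		/* The work area must sit entirely inside the RMO buffer. */
		return base >= rmo_start && size <= rmo_size &&
		       base + size <= rmo_start + rmo_size;
	}
	return false;	/* not on the allowlist */
}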


All RTAS-using applications that we are aware of are system 
management/monitoring tools, maintained by IBM, that use the librtas 
library. We don't anticipate there being any real world legitimate 
applications that require an RTAS call that isn't in the librtas list, 
however if such an application exists, the filtering can be disabled by 
a Kconfig option specified during kernel build.


Credit
==

Thanks to Daniel Axtens (IBM) for initial discovery of this issue.

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=bd59380c5ba4147dcbaad3e582b55ccfd120b764


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited


linux-next: Fixes tag needs some work in the powerpc tree

2020-10-08 Thread Stephen Rothwell
Hi all,

In commit

  a2d0230b91f7 ("cpufreq: powernv: Fix frame-size-overflow in 
powernv_cpufreq_reboot_notifier")

Fixes tag

  Fixes: cf30af76 ("cpufreq: powernv: Set the cpus to nominal frequency during 
reboot/kexec")

has these problem(s):

  - SHA1 should be at least 12 digits long
Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
or later) just making sure it is not set (or set to "auto").

-- 
Cheers,
Stephen Rothwell




Re: [RFC PATCH] mm: Fetch the dirty bit before we reset the pte

2020-10-08 Thread Aneesh Kumar K.V

On 10/8/20 10:32 PM, Linus Torvalds wrote:
> On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V
>  wrote:
>>
>> In copy_present_page, after we mark the pte non-writable, we should
>> check for previous dirty bit updates and make sure we don't lose the dirty
>> bit on reset.
>
> No, we'll just remove that entirely.
>
> Do you have a test-case that shows a problem? I have a patch that I
> was going to delay until 5.10 because I didn't think it mattered in
> practice..

Unfortunately, I don't have a test case. That was observed by code
inspection while I was fixing a syzkaller report.

> The second part of this patch would be to add a sequence count
> protection to fast-GUP pinning, so that GUP and fork() couldn't race,
> but I haven't written that part.
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..
>
>    Linus

-aneesh


Re: [RFC PATCH] mm: Fetch the dirty bit before we reset the pte

2020-10-08 Thread Linus Torvalds
On Thu, Oct 8, 2020 at 10:02 AM Linus Torvalds
 wrote:
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..

Actually, I removed the "__page_mapcount()" part of that patch, to
keep it minimal and _only_ do remove the wrprotect trick.

We can do the __page_mapcount() optimization and the mm sequence count
for 5.10 (although so far nobody has actually written the seqcount
patch - I think it would be a trivial few-liner, but I guess it won't
make 5.10 at this point).
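
(For context, a rough sketch of what such a per-mm sequence count could
look like; since, as noted, nobody has written that patch, every name
below is invented for illustration.)

#include <linux/seqlock.h>

/* Illustrative only: assume the mm gains a field like this. */
struct mm_fork_sync {
	seqcount_t write_protect_seq;
};

/* fork() side, wrapped around copy_page_range(): */
static inline void fork_copy_begin(struct mm_fork_sync *s)
{
	raw_write_seqcount_begin(&s->write_protect_seq);
}

static inline void fork_copy_end(struct mm_fork_sync *s)
{
	raw_write_seqcount_end(&s->write_protect_seq);
}

/* fast-GUP side: sample before the lockless page table walk ... */
static inline unsigned int pin_walk_start(struct mm_fork_sync *s)
{
	return raw_read_seqcount(&s->write_protect_seq);
}

/* ... and fall back to the slow path if fork() ran concurrently. */
static inline bool pin_walk_must_retry(struct mm_fork_sync *s, unsigned int seq)
{
	return read_seqcount_retry(&s->write_protect_seq, seq);
}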

So here's what I ended up with.

  Linus
From f3c64eda3e5097ec3198cb271f5f504d65d67131 Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Mon, 28 Sep 2020 12:50:03 -0700
Subject: [PATCH] mm: avoid early COW write protect games during fork()

In commit 70e806e4e645 ("mm: Do early cow for pinned pages during fork()
for ptes") we write-protected the PTE before doing the page pinning
check, in order to avoid a race with concurrent fast-GUP pinning (which
doesn't take the mm semaphore or the page table lock).

That trick doesn't actually work - it doesn't handle memory ordering
properly, and doing so would be prohibitively expensive.

It also isn't really needed.  While we're moving in the direction of
allowing and supporting page pinning without marking the pinned area
with MADV_DONTFORK, the fact is that we've never really supported this
kind of odd "concurrent fork() and page pinning", and doing the
serialization on a pte level is just wrong.

We can add serialization with a per-mm sequence counter, so we know how
to solve that race properly, but we'll do that at a more appropriate
time.  Right now this just removes the write protect games.

It also turns out that the write protect games actually break on Power,
as reported by Aneesh Kumar:

 "Architecture like ppc64 expects set_pte_at to be not used for updating
  a valid pte. This is further explained in commit 56eecdb912b5 ("mm:
  Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")"

and the code triggered a warning there:

  WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
  Call Trace:
copy_present_page mm/memory.c:857 [inline]
copy_present_pte mm/memory.c:899 [inline]
copy_pte_range mm/memory.c:1014 [inline]
copy_pmd_range mm/memory.c:1092 [inline]
copy_pud_range mm/memory.c:1127 [inline]
copy_p4d_range mm/memory.c:1150 [inline]
copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
dup_mmap kernel/fork.c:592 [inline]
dup_mm+0x77c/0xab0 kernel/fork.c:1355
copy_mm kernel/fork.c:1411 [inline]
copy_process+0x1f00/0x2740 kernel/fork.c:2070
_do_fork+0xc4/0x10b0 kernel/fork.c:2429

Link: https://lore.kernel.org/lkml/CAHk-=wiwr+go0ro4lvnjbms90oiepnyre3e+pjvc9pzdbsh...@mail.gmail.com/
Link: https://lore.kernel.org/linuxppc-dev/20201008092541.398079-1-aneesh.ku...@linux.ibm.com/
Reported-by: Aneesh Kumar K.V 
Tested-by: Leon Romanovsky 
Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: John Hubbard 
Cc: Andrew Morton 
Cc: Jan Kara 
Cc: Michal Hocko 
Cc: Kirill Shutemov 
Cc: Hugh Dickins 
Signed-off-by: Linus Torvalds 
---
 mm/memory.c | 41 -
 1 file changed, 4 insertions(+), 37 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..eeae590e526a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -806,8 +806,6 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		return 1;
 
 	/*
-	 * The trick starts.
-	 *
 	 * What we want to do is to check whether this page may
 	 * have been pinned by the parent process.  If so,
 	 * instead of wrprotect the pte on both sides, we copy
@@ -815,47 +813,16 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * the pinned page won't be randomly replaced in the
 	 * future.
 	 *
-	 * To achieve this, we do the following:
-	 *
-	 * 1. Write-protect the pte if it's writable.  This is
-	 *to protect concurrent write fast-gup with
-	 *FOLL_PIN, so that we'll fail the fast-gup with
-	 *the write bit removed.
-	 *
-	 * 2. Check page_maybe_dma_pinned() to see whether this
-	 *page may have been pinned.
-	 *
-	 * The order of these steps is important to serialize
-	 * against the fast-gup code (gup_pte_range()) on the
-	 * pte check and try_grab_compound_head(), so that
-	 * we'll make sure either we'll capture that fast-gup
-	 * so we'll copy the pinned page here, or we'll fail
-	 * that fast-gup.
-	 *
-	 * NOTE! Even if we don't end up copying the page,
-	 * we won't undo this wrprotect(), because the normal
-	 * reference copy will need it anyway.
-	 */
-	if (pte_write(pte))
-		ptep_set_wrprotect(src_mm, addr, src_pte);
-
-	/*
-	 * These are the "normally we can just copy by reference"
-	 * checks.
+	 * The page pinning checks are just "has this mm ever
+	 * seen pinning", along with the (inexact) check of
+	 * the page count. That might give false positives for
+	 * for 

Re: [PATCH] mm: Avoid using set_pte_at when updating a present pte

2020-10-08 Thread Linus Torvalds
Ahh, and I should learn to read all my emails before replying to some of them..

On Thu, Oct 8, 2020 at 2:26 AM Aneesh Kumar K.V
 wrote:
>
> This avoids the below warning
> [..]
> WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185

.. and I assume this is what triggered the other patch too.

Yes, with the ppc warning, we need to do _something_ about this, and
at that point I think the "something" is to just avoid the pte
wrprotect trick.

   Linus


Re: [RFC PATCH] mm: Fetch the dirty bit before we reset the pte

2020-10-08 Thread Linus Torvalds
[ Just adding Leon to the participants ]

This patch (not attached again, Leon has seen it before) has been
tested for the last couple of weeks for the rdma case, so I have no
problems applying it now, just to keep everybody in the loop.

 Linus

On Thu, Oct 8, 2020 at 10:02 AM Linus Torvalds
 wrote:
>
> On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V
>  wrote:
> >
> > In copy_present_page, after we mark the pte non-writable, we should
> > check for previous dirty bit updates and make sure we don't lose the dirty
> > bit on reset.
>
> No, we'll just remove that entirely.
>
> Do you have a test-case that shows a problem? I have a patch that I
> was going to delay until 5.10 because I didn't think it mattered in
> practice..
>
> The second part of this patch would be to add a sequence count
> protection to fast-GUP pinning, so that GUP and fork() couldn't race,
> but I haven't written that part.
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..
>
>Linus


Re: [RFC PATCH] mm: Fetch the dirty bit before we reset the pte

2020-10-08 Thread Linus Torvalds
On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V
 wrote:
>
> In copy_present_page, after we mark the pte non-writable, we should
> check for previous dirty bit updates and make sure we don't lose the dirty
> bit on reset.

No, we'll just remove that entirely.

Do you have a test-case that shows a problem? I have a patch that I
was going to delay until 5.10 because I didn't think it mattered in
practice..

The second part of this patch would be to add a sequence count
protection to fast-GUP pinning, so that GUP and fork() couldn't race,
but I haven't written that part.

Here's the first patch anyway. If you actually have a test-case where
this matters, I guess I need to apply it now..

   Linus


fork-cleanup
Description: Binary data


Re: [PATCH 2/2] dt: Remove booting-without-of.rst

2020-10-08 Thread Borislav Petkov
On Thu, Oct 08, 2020 at 09:24:20AM -0500, Rob Herring wrote:
> booting-without-of.rstt is an ancient document that first outlined
> Flattened DeviceTree on PowerPC initially. The DT world has evolved a
> lot in the 15 years since and booting-without-of.rst is pretty stale.
> The name of the document itself is confusing if you don't understand the
> evolution from real 'OpenFirmware'. Most of what booting-without-of.rst
> contains is now in the DT specification (which evolved out of the
> ePAPR). The few things that weren't documented in the DT specification
> are now.
> 
> All that remains is the boot entry details, so let's move these to arch
> specific documents. The exception is arm which already has the same
> details documented.
> 
> Cc: Frank Rowand 
> Cc: Mauro Carvalho Chehab 
> Cc: Geert Uytterhoeven 
> Cc: Michael Ellerman 
> Cc: Thomas Bogendoerfer 
> Cc: Jonathan Corbet 
> Cc: Paul Mackerras 
> Cc: Yoshinori Sato 
> Cc: Rich Felker 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: x...@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Acked-by: Benjamin Herrenschmidt 
> Signed-off-by: Rob Herring 
> ---
>  .../devicetree/booting-without-of.rst | 1585 -
>  Documentation/devicetree/index.rst|1 -
>  Documentation/mips/booting.rst|   28 +
>  Documentation/mips/index.rst  |1 +
>  Documentation/powerpc/booting.rst |  110 ++
>  Documentation/powerpc/index.rst   |1 +
>  Documentation/sh/booting.rst  |   12 +
>  Documentation/sh/index.rst|1 +
>  Documentation/x86/booting-dt.rst  |   21 +
>  Documentation/x86/index.rst   |1 +

For x86:

Acked-by: Borislav Petkov 

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


[PATCH 2/2] dt: Remove booting-without-of.rst

2020-10-08 Thread Rob Herring
booting-without-of.rstt is an ancient document that first outlined
Flattened DeviceTree on PowerPC initially. The DT world has evolved a
lot in the 15 years since and booting-without-of.rst is pretty stale.
The name of the document itself is confusing if you don't understand the
evolution from real 'OpenFirmware'. Most of what booting-without-of.rst
contains is now in the DT specification (which evolved out of the
ePAPR). The few things that weren't documented in the DT specification
are now.

All that remains is the boot entry details, so let's move these to arch
specific documents. The exception is arm which already has the same
details documented.

Cc: Frank Rowand 
Cc: Mauro Carvalho Chehab 
Cc: Geert Uytterhoeven 
Cc: Michael Ellerman 
Cc: Thomas Bogendoerfer 
Cc: Jonathan Corbet 
Cc: Paul Mackerras 
Cc: Yoshinori Sato 
Cc: Rich Felker 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-m...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@vger.kernel.org
Acked-by: Benjamin Herrenschmidt 
Signed-off-by: Rob Herring 
---
 .../devicetree/booting-without-of.rst | 1585 -
 Documentation/devicetree/index.rst|1 -
 Documentation/mips/booting.rst|   28 +
 Documentation/mips/index.rst  |1 +
 Documentation/powerpc/booting.rst |  110 ++
 Documentation/powerpc/index.rst   |1 +
 Documentation/sh/booting.rst  |   12 +
 Documentation/sh/index.rst|1 +
 Documentation/x86/booting-dt.rst  |   21 +
 Documentation/x86/index.rst   |1 +
 10 files changed, 175 insertions(+), 1586 deletions(-)
 delete mode 100644 Documentation/devicetree/booting-without-of.rst
 create mode 100644 Documentation/mips/booting.rst
 create mode 100644 Documentation/powerpc/booting.rst
 create mode 100644 Documentation/sh/booting.rst
 create mode 100644 Documentation/x86/booting-dt.rst

diff --git a/Documentation/devicetree/booting-without-of.rst b/Documentation/devicetree/booting-without-of.rst
deleted file mode 100644
index e9433350a20f..000000000000
--- a/Documentation/devicetree/booting-without-of.rst
+++ /dev/null
@@ -1,1585 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-==
-Booting the Linux/ppc kernel without Open Firmware
-==
-
-Copyright (c) 2005 Benjamin Herrenschmidt ,
-IBM Corp.
-
-Copyright (c) 2005 Becky Bruce ,
-Freescale Semiconductor, FSL SOC and 32-bit additions
-
-Copyright (c) 2006 MontaVista Software, Inc.
-Flash chip node definition
-
-.. Table of Contents
-
-  I - Introduction
-1) Entry point for arch/arm
-2) Entry point for arch/powerpc
-3) Entry point for arch/x86
-4) Entry point for arch/mips/bmips
-5) Entry point for arch/sh
-
-  II - The DT block format
-1) Header
-2) Device tree generalities
-3) Device tree "structure" block
-4) Device tree "strings" block
-
-  III - Required content of the device tree
-1) Note about cells and address representation
-2) Note about "compatible" properties
-3) Note about "name" properties
-4) Note about node and property names and character set
-5) Required nodes and properties
-  a) The root node
-  b) The /cpus node
-  c) The /cpus/* nodes
-  d) the /memory node(s)
-  e) The /chosen node
-  f) the /soc node
-
-  IV - "dtc", the device tree compiler
-
-  V - Recommendations for a bootloader
-
-  VI - System-on-a-chip devices and nodes
-1) Defining child nodes of an SOC
-2) Representing devices without a current OF specification
-
-  VII - Specifying interrupt information for devices
-1) interrupts property
-2) interrupt-parent property
-3) OpenPIC Interrupt Controllers
-4) ISA Interrupt Controllers
-
-  VIII - Specifying device power management information (sleep property)
-
-  IX - Specifying dma bus information
-
-  Appendix A - Sample SOC node for MPC8540
-
-
-Revision Information
-
-
-   May 18, 2005: Rev 0.1
-- Initial draft, no chapter III yet.
-
-   May 19, 2005: Rev 0.2
-- Add chapter III and bits & pieces here or
-   clarifies the fact that a lot of things are
-   optional, the kernel only requires a very
-   small device tree, though it is encouraged
-   to provide an as complete one as possible.
-
-   May 24, 2005: Rev 0.3
-- Precise that DT block has to be in RAM
-- Misc fixes
-- Define version 3 and new format version 16
-  for the DT block (version 16 needs kernel
-  patches, will be fwd separately).
-  

[PATCH 1/2] dt-bindings: powerpc: Add a schema for the 'sleep' property

2020-10-08 Thread Rob Herring
Document the PowerPC specific 'sleep' property as a schema. It is
currently only documented in booting-without-of.rst which is getting
removed.

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring 
---
 .../devicetree/bindings/powerpc/sleep.yaml| 47 +++
 1 file changed, 47 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/sleep.yaml

diff --git a/Documentation/devicetree/bindings/powerpc/sleep.yaml b/Documentation/devicetree/bindings/powerpc/sleep.yaml
new file mode 100644
index 000000000000..6494c7d08b93
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/sleep.yaml
@@ -0,0 +1,47 @@
+# SPDX-License-Identifier: GPL-2.0-only
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/powerpc/sleep.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: PowerPC sleep property
+
+maintainers:
+  - Rob Herring 
+
+description: |
+  Devices on SOCs often have mechanisms for placing devices into low-power
+  states that are decoupled from the devices' own register blocks.  Sometimes,
+  this information is more complicated than a cell-index property can
+  reasonably describe.  Thus, each device controlled in such a manner
+  may contain a "sleep" property which describes these connections.
+
+  The sleep property consists of one or more sleep resources, each of
+  which consists of a phandle to a sleep controller, followed by a
+  controller-specific sleep specifier of zero or more cells.
+
+  The semantics of what type of low power modes are possible are defined
+  by the sleep controller.  Some examples of the types of low power modes
+  that may be supported are:
+
+   - Dynamic: The device may be disabled or enabled at any time.
+   - System Suspend: The device may request to be disabled or remain
+ awake during system suspend, but will not be disabled until then.
+   - Permanent: The device is disabled permanently (until the next hard
+ reset).
+
+  Some devices may share a clock domain with each other, such that they should
+  only be suspended when none of the devices are in use.  Where reasonable,
+  such nodes should be placed on a virtual bus, where the bus has the sleep
+  property.  If the clock domain is shared among devices that cannot be
+  reasonably grouped in this manner, then create a virtual sleep controller
+  (similar to an interrupt nexus, except that defining a standardized
+  sleep-map should wait until its necessity is demonstrated).
+
+select: true
+
+properties:
+  sleep:
+$ref: /schemas/types.yaml#definitions/phandle-array
+
+additionalProperties: true
-- 
2.25.1



Re: [PATCH v4 2/4] powerpc/sstep: Support VSX vector paired storage access instructions

2020-10-08 Thread kernel test robot
Hi Ravi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.9-rc8 next-20201007]
[cannot apply to mpe/next scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Ravi-Bangoria/powerpc-sstep-VSX-32-byte-vector-paired-load-store-instructions/20201008-153614
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-g5_defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/55def6779849f9aec057f405abf1cd98a8674b4f
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ravi-Bangoria/powerpc-sstep-VSX-32-byte-vector-paired-load-store-instructions/20201008-153614
git checkout 55def6779849f9aec057f405abf1cd98a8674b4f
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   arch/powerpc/lib/sstep.c: In function 'analyse_instr':
>> arch/powerpc/lib/sstep.c:2901:15: error: implicit declaration of function 'VSX_REGISTER_XTP'; did you mean 'H_REGISTER_SMR'? [-Werror=implicit-function-declaration]
2901 | op->reg = VSX_REGISTER_XTP(rd);
 |   ^~~~
 |   H_REGISTER_SMR
   cc1: all warnings being treated as errors

vim +2901 arch/powerpc/lib/sstep.c

  2815  
  2816  #ifdef __powerpc64__
  2817  case 62:/* std[u] */
  2818  op->ea = dsform_ea(word, regs);
  2819  switch (word & 3) {
  2820  case 0: /* std */
  2821  op->type = MKOP(STORE, 0, 8);
  2822  break;
  2823  case 1: /* stdu */
  2824  op->type = MKOP(STORE, UPDATE, 8);
  2825  break;
  2826  case 2: /* stq */
  2827  if (!(rd & 1))
  2828  op->type = MKOP(STORE, 0, 16);
  2829  break;
  2830  }
  2831  break;
  2832  case 1: /* Prefixed instructions */
  2833  if (!cpu_has_feature(CPU_FTR_ARCH_31))
  2834  return -1;
  2835  
  2836  prefix_r = GET_PREFIX_R(word);
  2837  ra = GET_PREFIX_RA(suffix);
  2838  op->update_reg = ra;
  2839  rd = (suffix >> 21) & 0x1f;
  2840  op->reg = rd;
  2841  op->val = regs->gpr[rd];
  2842  
  2843  suffixopcode = get_op(suffix);
  2844  prefixtype = (word >> 24) & 0x3;
  2845  switch (prefixtype) {
  2846  case 0: /* Type 00  Eight-Byte Load/Store */
  2847  if (prefix_r && ra)
  2848  break;
  2849  op->ea = mlsd_8lsd_ea(word, suffix, regs);
  2850  switch (suffixopcode) {
  2851  case 41:/* plwa */
  2852  op->type = MKOP(LOAD, PREFIXED | SIGNEXT, 4);
  2853  break;
  2854  case 42:/* plxsd */
  2855  op->reg = rd + 32;
  2856  op->type = MKOP(LOAD_VSX, PREFIXED, 8);
  2857  op->element_size = 8;
  2858  op->vsx_flags = VSX_CHECK_VEC;
  2859  break;
  2860  case 43:/* plxssp */
  2861  op->reg = rd + 32;
  2862  op->type = MKOP(LOAD_VSX, PREFIXED, 4);
  2863  op->element_size = 8;
  2864  op->vsx_flags = VSX_FPCONV | VSX_CHECK_VEC;
  2865  break;
  2866  case 46:/* pstxsd */
  2867  op->reg = rd + 32;
  2868  op->type = MKOP(STORE_VSX, PREFIXED, 8);
  2869  op->element_size = 8

Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length

2020-10-08 Thread Catalin Marinas
On Thu, Oct 08, 2020 at 09:34:26PM +1100, Michael Ellerman wrote:
> Jann Horn  writes:
> > So while the mprotect() case
> > checks the flags and refuses unknown values, the mmap() code just lets
> > the architecture figure out which bits are actually valid to set (via
> > arch_calc_vm_prot_bits()) and silently ignores the rest?
> >
> > And powerpc apparently decided that they do want to error out on bogus
> > prot values passed to their version of mmap(), and in exchange, assume
> > in arch_calc_vm_prot_bits() that the protection bits are valid?
> 
> I don't think we really decided that, it just happened by accident and
> no one noticed/complained.
> 
> Seems userspace is pretty well behaved when it comes to passing prot
> values to mmap().

It's not necessarily about being well behaved but about whether it can have
security implications. On arm64, if the underlying memory does not support MTE
(say some DAX mmap) but we still allow PROT_MTE driven by user, it will
lead to an SError which brings the whole machine down.

Not sure whether ADI has similar requirements but at least for arm64 we
addressed the mmap() case as well (see my other email on the details; I
think the approach would work on SPARC as well).

-- 
Catalin


Re: [PATCH] powerpc/64: Make VDSO32 track COMPAT on 64-bit

2020-10-08 Thread Michael Ellerman
Srikar Dronamraju  writes:
> * Michael Ellerman  [2020-09-17 21:28:46]:
>
>> On Tue, 8 Sep 2020 22:58:50 +1000, Michael Ellerman wrote:
>> > When we added the VDSO32 kconfig symbol, which controls building of
>> > the 32-bit VDSO, we made it depend on CPU_BIG_ENDIAN (for 64-bit).
>> > 
>> > That was because back then COMPAT was always enabled for 64-bit, so
>> > depending on it would have left the 32-bit VDSO always enabled, which
>> > we didn't want.
>> > 
>> > [...]
>> 
>> Applied to powerpc/next.
>> 
>> [1/1] powerpc/64: Make VDSO32 track COMPAT on 64-bit
>>   
>> https://git.kernel.org/powerpc/c/231b232df8f67e7d37af01259c21f2a131c3911e
>> 
>> cheers
>
> With this commit which is part of powerpc/next and with
> /opt/at12.0/bin/gcc --version
> gcc (GCC) 8.4.1 20191125 (Advance-Toolchain 12.0-3) [e25f27eea473]
> throws up a compile error on a witherspoon/PowerNV with CONFIG_COMPAT.
> CONFIG_COMPAT got carried from the distro config. (And looks like most
> distros seem to be having this config)

This distro config will have it because previously it couldn't be
disabled. But now that it's selectable all LE distros should disable it.

> cc1: error: -m32 not supported in this configuration
> make[4]: *** [arch/powerpc/kernel/vdso32/sigtramp.o] Error 1
> make[4]: *** Waiting for unfinished jobs
> cc1: error: -m32 not supported in this configuration
> make[4]: *** [arch/powerpc/kernel/vdso32/gettimeofday.o] Error 1
> make[3]: *** [arch/powerpc/kernel/vdso32] Error 2
> make[3]: *** Waiting for unfinished jobs
> make[2]: *** [arch/powerpc/kernel] Error 2
> make[2]: *** Waiting for unfinished jobs
> make[1]: *** [arch/powerpc] Error 2
> make[1]: *** Waiting for unfinished jobs
> make: *** [__sub-make] Error 2
>
> I don't seem to be facing with other compilers like "gcc (Ubuntu
> 7.4.0-1ubuntu1~18.04.1) 7.4.0" and I was able to disable CONFIG_COMPAT and
> proceed with the build.

It seems your compiler doesn't support building 32-bit binaries. I'm
pretty sure the kernel.org ones do, or you can just turn off COMPAT.

cheers


[PATCH 4/4] powerpc/perf: Exclude kernel samples while counting events in user space.

2020-10-08 Thread Athira Rajeev
By setting exclude_kernel for user space profiling, we set the
freeze bits in Monitor Mode Control Register. Due to hardware
limitation, sometimes, Sampled Instruction Address register (SIAR)
captures kernel address even when counter freeze bits are set in
Monitor Mode Control Register (MMCR2). Patch adds a check to drop
these samples at such conditions.

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/perf/core-book3s.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index c018004..10a2d1f 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2143,6 +2143,18 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
perf_event_update_userpage(event);
 
/*
+* Setting exclude_kernel will only freeze the
+* Performance Monitor counters and we may have
+* kernel address captured in SIAR. Hence drop
+* the kernel sample captured during user space
+* profiling. Setting `record` to zero will also
+* make sure event throttling is handled.
+*/
+   if (event->attr.exclude_kernel && record)
+   if (is_kernel_addr(mfspr(SPRN_SIAR)))
+   record = 0;
+
+   /*
 * Finally record data if requested.
 */
if (record) {
-- 
1.8.3.1



[PATCH 2/4] powerpc/perf: Using SIER[CMPL] instead of SIER[SIAR_VALID]

2020-10-08 Thread Athira Rajeev
On power10 DD1, there is an issue that causes the SIAR_VALID
bit of Sampled Instruction Event Register(SIER) not to be
set. But the SIAR_VALID bit is used for fetching the instruction
address from Sampled Instruction Address Register(SIAR), and
marked events are sampled only if the SIAR_VALID bit is set.
So add a condition check for power10 DD1 to use SIER[CMPL] bit
instead.

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/perf/core-book3s.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 08643cb..d766090 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -350,7 +350,14 @@ static inline int siar_valid(struct pt_regs *regs)
int marked = mmcra & MMCRA_SAMPLE_ENABLE;
 
if (marked) {
-   if (ppmu->flags & PPMU_HAS_SIER)
+   /*
+* SIER[SIAR_VALID] is not set for some
+* marked events on power10 DD1, so use
+* SIER[CMPL] instead.
+*/
+   if (ppmu->flags & PPMU_P10_DD1)
+   return regs->dar & 0x1;
+   else if (ppmu->flags & PPMU_HAS_SIER)
return regs->dar & SIER_SIAR_VALID;
 
if (ppmu->flags & PPMU_SIAR_VALID)
-- 
1.8.3.1



[PATCH 3/4] powerpc/perf: Use the address from SIAR register to set cpumode flags

2020-10-08 Thread Athira Rajeev
While setting the processor mode for any sample, `perf_get_misc_flags`
expects the privilege level to differentiate the userspace and kernel
address. On power10 DD1, there is an issue that causes [MSR_HV MSR_PR] bits
of Sampled Instruction Event Register (SIER) not to be set for marked
events. Hence add a check to use the address in Sampled Instruction Address
Register (SIAR) to identify the privilege level.

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/perf/core-book3s.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index d766090..c018004 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -250,11 +250,25 @@ static inline u32 perf_flags_from_msr(struct pt_regs *regs)
 static inline u32 perf_get_misc_flags(struct pt_regs *regs)
 {
bool use_siar = regs_use_siar(regs);
+   unsigned long mmcra = regs->dsisr;
+   int marked = mmcra & MMCRA_SAMPLE_ENABLE;
 
if (!use_siar)
return perf_flags_from_msr(regs);
 
/*
+* Check the address in SIAR to identify the
+* privilege levels since the SIER[MSR_HV, MSR_PR]
+* bits are not set for marked events in power10
+* DD1.
+*/
+   if (marked && (ppmu->flags & PPMU_P10_DD1)) {
+   if (is_kernel_addr(mfspr(SPRN_SIAR)))
+   return PERF_RECORD_MISC_KERNEL;
+   return PERF_RECORD_MISC_USER;
+   }
+
+   /*
 * If we don't have flags in MMCRA, rather than using
 * the MSR, we intuit the flags from the address in
 * SIAR which should give slightly more reliable
-- 
1.8.3.1



[PATCH 1/4] powerpc/perf: Add new power pmu flag "PPMU_P10_DD1" for power10 DD1

2020-10-08 Thread Athira Rajeev
Add a new power PMU flag "PPMU_P10_DD1" which can be
used to conditionally add any code path for power10 DD1 processor
version. Also modify power10 PMU driver code to set this
flag only for DD1, based on the Processor Version Register (PVR)
value.

Signed-off-by: Athira Rajeev 
---
 arch/powerpc/include/asm/perf_event_server.h | 1 +
 arch/powerpc/perf/power10-pmu.c  | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index f6acabb..3b7baba 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -82,6 +82,7 @@ struct power_pmu {
 #define PPMU_ARCH_207S 0x0080 /* PMC is architecture v2.07S */
 #define PPMU_NO_SIAR   0x0100 /* Do not use SIAR */
 #define PPMU_ARCH_31   0x0200 /* Has MMCR3, SIER2 and SIER3 */
+#define PPMU_P10_DD1   0x0400 /* Is power10 DD1 processor version */
 
 /*
  * Values for flags to get_alternatives()
diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
index 8314865..47d930a 100644
--- a/arch/powerpc/perf/power10-pmu.c
+++ b/arch/powerpc/perf/power10-pmu.c
@@ -404,6 +404,7 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter)
 
 int init_power10_pmu(void)
 {
+   unsigned int pvr;
int rc;
 
/* Comes from cpu_specs[] */
@@ -411,6 +412,11 @@ int init_power10_pmu(void)
strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power10"))
return -ENODEV;
 
+   pvr = mfspr(SPRN_PVR);
+   /* Add the ppmu flag for power10 DD1 */
+   if ((PVR_CFG(pvr) == 1))
+   power10_pmu.flags |= PPMU_P10_DD1;
+
/* Set the PERF_REG_EXTENDED_MASK here */
PERF_REG_EXTENDED_MASK = PERF_REG_PMU_MASK_31;
 
-- 
1.8.3.1



[PATCH 0/4] powerpc/perf: Power PMU fixes for power10 DD1

2020-10-08 Thread Athira Rajeev
The patch series addresses PMU fixes for power10 DD1

Patch1 introduces a new power pmu flag to include
conditional code changes for power10 DD1.
Patch2 and Patch3 includes fixes in core-book3s to address
issues with marked events during sampling.
Patch4 includes fix to drop kernel samples while
userspace profiling.

Athira Rajeev (4):
  powerpc/perf: Add new power pmu flag "PPMU_P10_DD1" for power10 DD1
  powerpc/perf: Using SIER[CMPL] instead of SIER[SIAR_VALID]
  powerpc/perf: Use the address from SIAR register to set cpumode flags
  powerpc/perf: Exclude kernel samples while counting events in user
space.

 arch/powerpc/include/asm/perf_event_server.h |  1 +
 arch/powerpc/perf/core-book3s.c  | 35 +++-
 arch/powerpc/perf/power10-pmu.c  |  6 +
 3 files changed, 41 insertions(+), 1 deletion(-)

-- 
1.8.3.1



Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length

2020-10-08 Thread Michael Ellerman
Jann Horn  writes:
> On Wed, Oct 7, 2020 at 2:35 PM Christoph Hellwig  wrote:
>> On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote:
>> > diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
>> > index 078608ec2e92..b1fabb97d138 100644
>> > --- a/arch/powerpc/kernel/syscalls.c
>> > +++ b/arch/powerpc/kernel/syscalls.c
>> > @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
>> >  {
>> >   long ret = -EINVAL;
>> >
>> > - if (!arch_validate_prot(prot, addr))
>> > + if (!arch_validate_prot(prot, addr, len))
>>
>> This call isn't under mmap lock.  I also find it rather weird as the
>> generic code only calls arch_validate_prot from mprotect, only powerpc
>> also calls it from mmap.
>>
>> This seems to go back to commit ef3d3246a0d0
>> ("powerpc/mm: Add Strong Access Ordering support")
>
> I'm _guessing_ the idea in the generic case might be that mmap()
> doesn't check unknown bits in the protection flags, and therefore
> maybe people wanted to avoid adding new error cases that could be
> caused by random high bits being set?

I suspect it's just that when we added it we updated our do_mmap2() and
didn't touch the generic version because we didn't need to. ie. it's not
intentional it's just a buglet.

I think this is the original submission:

  
https://lore.kernel.org/linuxppc-dev/20080610220055.10257.84465.sendpatch...@norville.austin.ibm.com/

Which only calls arch_validate_prot() from mprotect and the powerpc
code, and there was no discussion about adding it elsewhere.

> So while the mprotect() case
> checks the flags and refuses unknown values, the mmap() code just lets
> the architecture figure out which bits are actually valid to set (via
> arch_calc_vm_prot_bits()) and silently ignores the rest?
>
> And powerpc apparently decided that they do want to error out on bogus
> prot values passed to their version of mmap(), and in exchange, assume
> in arch_calc_vm_prot_bits() that the protection bits are valid?

I don't think we really decided that, it just happened by accident and
no one noticed/complained.

Seems userspace is pretty well behaved when it comes to passing prot
values to mmap().

> powerpc's arch_validate_prot() doesn't actually need the mmap lock, so
> I think this is fine-ish for now (as in, while the code is a bit
> unclean, I don't think I'm making it worse, and I don't think it's
> actually buggy). In theory, we could move the arch_validate_prot()
> call over into the mmap guts, where we're holding the lock, and gate
> it on the architecture or on some feature CONFIG that powerpc can
> activate in its Kconfig. But I'm not sure whether that'd be helping or
> making things worse, so when I sent this patch, I deliberately left
> the powerpc stuff as-is.

I think what you've done is fine, and anything more elaborate is not
worth the effort.

cheers


Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length

2020-10-08 Thread Catalin Marinas
On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote:
> arch_validate_prot() is a hook that can validate whether a given set of
> protection flags is valid in an mprotect() operation. It is given the set
> of protection flags and the address being modified.
> 
> However, the address being modified can currently not actually be used in
> a meaningful way because:
> 
> 1. Only the address is given, but not the length, and the operation can
>span multiple VMAs. Therefore, the callee can't actually tell which
>virtual address range, or which VMAs, are being targeted.
> 2. The mmap_lock is not held, meaning that if the callee were to check
>the VMA at @addr, that VMA would be unrelated to the one the
>operation is performed on.
> 
> Currently, custom arch_validate_prot() handlers are defined by
> arm64, powerpc and sparc.
> arm64 and powerpc don't care about the address range, they just check the
> flags against CPU support masks.
> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take
> the mmap_lock.
> 
> Change the function signature to also take a length, and move the
> arch_validate_prot() call in mm/mprotect.c down into the locked region.

For arm64 mte, I noticed the arch_validate_prot() issue with multiple
vmas and addressed this in a different way:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte&id=c462ac288f2c97e0c1d9ff6a65955317e799f958
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte&id=0042090548740921951f31fc0c20dcdb96638cb0

Both patches queued for 5.10.

Basically, arch_calc_vm_prot_bits() returns a VM_MTE if PROT_MTE has
been requested. The newly introduced arch_validate_flags() will check
the VM_MTE flag against what the system supports and this covers both
mmap() and mprotect(). Note that arch_validate_prot() only handles the
latter and I don't think it's sufficient for SPARC ADI. For arm64 MTE we
definitely wanted mmap() flags to be validated.
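
(To make the distinction concrete, here is a sketch of what an
arch_validate_flags() hook along those lines can look like. It follows
the shape described above but is an approximation, not a verbatim copy
of the queued arm64 code.)

/*
 * VM_MTE is set by arch_calc_vm_prot_bits() when PROT_MTE is requested;
 * VM_MTE_ALLOWED is set by arch_calc_vm_flag_bits() when the backing
 * object can actually hold tags.  Because the check runs on the final
 * vm_flags, it covers both mmap() and mprotect().
 */
static inline bool arch_validate_flags(unsigned long vm_flags)
{
	if (!system_supports_mte())
		return true;

	/* Only allow VM_MTE on mappings that are allowed to carry tags. */
	return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED);
}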

In addition, there's a new arch_calc_vm_flag_bits() which allows us to
set a VM_MTE_ALLOWED on a vma if the conditions are right (MAP_ANONYMOUS
or shmem_mmap():

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte&id=b3fbbea4c00220f62e6f7e2514466e6ee81f7f60

-- 
Catalin


Re: [PATCH v2 1/2] powerpc/rtas: Restrict RTAS requests from userspace

2020-10-08 Thread Michael Ellerman
Michael Ellerman  writes:
> Andrew Donnellan  writes:
>> On 26/8/20 11:53 pm, Sasha Levin wrote:
>>> How should we proceed with this patch?
>>
>> mpe: I believe we came to the conclusion that we shouldn't put this in 
>> stable just yet?
>
> Yeah.
>
> Let's give it a little time to get some wider testing before we backport
> it.

So my fault for not dropping the Cc: stable on the commit, sorry.

cheers


Re: [PATCH v2 1/2] powerpc/rtas: Restrict RTAS requests from userspace

2020-10-08 Thread Michael Ellerman
Andrew Donnellan  writes:
> On 26/8/20 11:53 pm, Sasha Levin wrote:
>> How should we proceed with this patch?
>
> mpe: I believe we came to the conclusion that we shouldn't put this in 
> stable just yet?

Yeah.

Let's give it a little time to get some wider testing before we backport
it.

cheers


Re: [PATCH] crypto: talitos - Fix sparse warnings

2020-10-08 Thread Christophe Leroy




On 07/10/2020 at 08:50, Herbert Xu wrote:
> On Sat, Oct 03, 2020 at 07:15:53PM +0200, Christophe Leroy wrote:
>>
>> The following changes fix the sparse warnings with less churn:
>
> Yes that works too.  Can you please submit this patch?

This fixed two independent commits from the past. I sent out two fix patches.

Christophe


mm: Question about the use of 'accessed' flags and pte_young() helper

2020-10-08 Thread Christophe Leroy
In a 10-year-old commit
(https://github.com/linuxppc/linux/commit/d069cb4373fe0d451357c4d3769623a7564dfa9f), powerpc 8xx
made the handling of the PTE accessed bit conditional on CONFIG_SWAP.

Since then, this has been extended to some other powerpc variants.

That commit means that when CONFIG_SWAP is not selected, the accessed bit is not set by the SW TLB
miss handlers, leading to pte_young() returning garbage, or rather possibly returning false
although a page has been accessed since its access flag was reset.

Looking at various places in mm/, pte_young() is used independently of CONFIG_SWAP.

Is it still valid to not manage the accessed flag when CONFIG_SWAP is not
selected? If yes, should pte_young() always return true in that case?
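
(For concreteness, a sketch of what the second option would mean; this
is an illustration of the question only, not existing powerpc code.)

#ifdef CONFIG_SWAP
static inline int pte_young(pte_t pte)
{
	return pte_val(pte) & _PAGE_ACCESSED;
}
#else
/* Accessed bit is never maintained: report every PTE as young. */
static inline int pte_young(pte_t pte)
{
	return 1;
}
#endif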

While we are at it, I'm wondering whether powerpc should redefine
arch_faults_on_old_pte().
On some variants of powerpc, the accessed flag is managed by HW. On others, it is managed by SW
TLB miss handlers via page fault handling.


Thanks
Christophe


[PATCH] crypto: talitos - Fix return type of current_desc_hdr()

2020-10-08 Thread Christophe Leroy
current_desc_hdr() returns a u32 but in fact this is a __be32,
leading to a lot of sparse warnings.

Change the return type to __be32 and ensure it is handled as
such by the caller.

Fixes: 3e721aeb3df3 ("crypto: talitos - handle descriptor not found in error path")
Signed-off-by: Christophe Leroy 
---
 drivers/crypto/talitos.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 7c547352a862..f9f0d34d49f3 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -460,7 +460,7 @@ DEF_TALITOS2_DONE(ch1_3, TALITOS2_ISR_CH_1_3_DONE)
 /*
  * locate current (offending) descriptor
  */
-static u32 current_desc_hdr(struct device *dev, int ch)
+static __be32 current_desc_hdr(struct device *dev, int ch)
 {
struct talitos_private *priv = dev_get_drvdata(dev);
int tail, iter;
@@ -501,13 +501,13 @@ static u32 current_desc_hdr(struct device *dev, int ch)
 /*
  * user diagnostics; report root cause of error based on execution unit status
  */
-static void report_eu_error(struct device *dev, int ch, u32 desc_hdr)
+static void report_eu_error(struct device *dev, int ch, __be32 desc_hdr)
 {
struct talitos_private *priv = dev_get_drvdata(dev);
int i;
 
if (!desc_hdr)
-   desc_hdr = in_be32(priv->chan[ch].reg + TALITOS_DESCBUF);
+   desc_hdr = cpu_to_be32(in_be32(priv->chan[ch].reg + TALITOS_DESCBUF));
 
switch (desc_hdr & DESC_HDR_SEL0_MASK) {
case DESC_HDR_SEL0_AFEU:
-- 
2.25.0



[PATCH] crypto: talitos - Endianess in current_desc_hdr()

2020-10-08 Thread Christophe Leroy
current_desc_hdr() compares the value of the current descriptor
with the next_desc member of the talitos_desc struct.

While the current descriptor is obtained from in_be32(), which
returns CPU-ordered bytes, the next_desc member is in big endian order.

Convert the current descriptor into big endian before comparing it
with next_desc.

This fixes a sparse warning.

Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on 
SEC1")
Signed-off-by: Christophe Leroy 
---
 drivers/crypto/talitos.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index f9f0d34d49f3..992d58a4dbf1 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -478,7 +478,7 @@ static __be32 current_desc_hdr(struct device *dev, int ch)
 
iter = tail;
while (priv->chan[ch].fifo[iter].dma_desc != cur_desc &&
-  priv->chan[ch].fifo[iter].desc->next_desc != cur_desc) {
+  priv->chan[ch].fifo[iter].desc->next_desc != 
cpu_to_be32(cur_desc)) {
iter = (iter + 1) & (priv->fifo_len - 1);
if (iter == tail) {
dev_err(dev, "couldn't locate current descriptor\n");
@@ -486,7 +486,7 @@ static __be32 current_desc_hdr(struct device *dev, int ch)
}
}
 
-   if (priv->chan[ch].fifo[iter].desc->next_desc == cur_desc) {
+   if (priv->chan[ch].fifo[iter].desc->next_desc == cpu_to_be32(cur_desc)) 
{
struct talitos_edesc *edesc;
 
edesc = container_of(priv->chan[ch].fifo[iter].desc,
-- 
2.25.0
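
To see the mismatch being fixed here in isolation, a small userspace
demonstration (assuming a little-endian host; my_cpu_to_be32() stands in for
the kernel's cpu_to_be32(), and the values are arbitrary):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t my_cpu_to_be32(uint32_t v)
{
	return __builtin_bswap32(v);	/* byte-swap on a little-endian host */
}

int main(void)
{
	uint32_t next_desc;			/* field stored big-endian in memory */
	uint32_t cur_desc = 0x12345678;		/* CPU-ordered value, as from in_be32() */
	uint8_t raw[4] = { 0x12, 0x34, 0x56, 0x78 };

	memcpy(&next_desc, raw, 4);

	/* the naive compare is wrong on LE; converting cur_desc once fixes it */
	printf("naive:     %s\n", next_desc == cur_desc ? "match" : "no match");
	printf("converted: %s\n", next_desc == my_cpu_to_be32(cur_desc) ? "match" : "no match");
	return 0;
}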



[RFC PATCH] mm: Fetch the dirty bit before we reset the pte

2020-10-08 Thread Aneesh Kumar K.V
In copy_present_page, after we mark the pte non-writable, we should
check for previous dirty bit updates and make sure we don't lose the dirty
bit on reset.

Also, avoid marking the pte write-protected again if copy_present_page
already marked it write-protected.

Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: John Hubbard 
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Cc: Andrew Morton 
Cc: Jan Kara 
Cc: Michal Hocko 
Cc: Kirill Shutemov 
Cc: Hugh Dickins 
Cc: Linus Torvalds 
Signed-off-by: Aneesh Kumar K.V 
---
 mm/memory.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index bfe202ef6244..f57b1f04d50a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -848,6 +848,9 @@ copy_present_page(struct mm_struct *dst_mm, struct 
mm_struct *src_mm,
if (likely(!page_maybe_dma_pinned(page)))
return 1;
 
+   if (pte_dirty(*src_pte))
+   pte = pte_mkdirty(pte);
+
/*
 * Uhhuh. It looks like the page might be a pinned page,
 * and we actually need to copy it. Now we can set the
@@ -904,6 +907,11 @@ copy_present_pte(struct mm_struct *dst_mm, struct 
mm_struct *src_mm,
if (retval <= 0)
return retval;
 
+   /*
+* Fetch the src pte value again, copy_present_page
+* could modify it.
+*/
+   pte = *src_pte;
get_page(page);
page_dup_rmap(page, false);
rss[mm_counter(page)]++;
-- 
2.26.2



[PATCH] mm: Avoid using set_pte_at when updating a present pte

2020-10-08 Thread Aneesh Kumar K.V
This avoids the below warning

WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 
set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 30613 Comm: syz-executor.0 Not tainted 
5.9.0-rc8-syzkaller-00156-gc85fb28b6f99 #0
Call Trace:
 [c01cd1f0] panic+0x29c/0x75c kernel/panic.c:231
 [c01cce24] __warn+0x104/0x1b8 kernel/panic.c:600
 [c0d829e4] report_bug+0x1d4/0x380 lib/bug.c:198
 [c0036800] program_check_exception+0x4e0/0x750 
arch/powerpc/kernel/traps.c:1508
 [c00098a8] program_check_common_virt+0x308/0x360
--- interrupt: 700 at set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
LR = set_pte_at+0x2a4/0x3a0 arch/powerpc/mm/pgtable.c:185
 [c05d2a7c] copy_present_page mm/memory.c:857 [inline]
 [c05d2a7c] copy_present_pte mm/memory.c:899 [inline]
 [c05d2a7c] copy_pte_range mm/memory.c:1014 [inline]
 [c05d2a7c] copy_pmd_range mm/memory.c:1092 [inline]
 [c05d2a7c] copy_pud_range mm/memory.c:1127 [inline]
 [c05d2a7c] copy_p4d_range mm/memory.c:1150 [inline]
 [c05d2a7c] copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
 [c01c63cc] dup_mmap kernel/fork.c:592 [inline]
 [c01c63cc] dup_mm+0x77c/0xab0 kernel/fork.c:1355
 [c01c8f70] copy_mm kernel/fork.c:1411 [inline]
 [c01c8f70] copy_process+0x1f00/0x2740 kernel/fork.c:2070
 [c01c9b54] _do_fork+0xc4/0x10b0 kernel/fork.c:2429
 [c01caf54] __do_sys_clone3+0x1d4/0x2b0 kernel/fork.c:27

Architectures like ppc64 expect set_pte_at() not to be used for updating a
valid pte. This is further explained in
commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")

Cc: Peter Xu 
Cc: Jason Gunthorpe 
Cc: John Hubbard 
Cc: linux...@kvack.org
Cc: linux-ker...@vger.kernel.org
Cc: Andrew Morton 
Cc: Jan Kara 
Cc: Michal Hocko 
Cc: Kirill Shutemov 
Cc: Hugh Dickins 
Cc: Linus Torvalds 
Signed-off-by: Aneesh Kumar K.V 
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..bfe202ef6244 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -854,7 +854,7 @@ copy_present_page(struct mm_struct *dst_mm, struct 
mm_struct *src_mm,
 * source pte back to being writable.
 */
if (pte_write(pte))
-   set_pte_at(src_mm, addr, src_pte, pte);
+   ptep_set_access_flags(vma, addr, src_pte, pte, 1);
 
new_page = *prealloc;
if (!new_page)
-- 
2.26.2



[PATCH v4 4/4] powerpc/sstep: Add testcases for VSX vector paired load/store instructions

2020-10-08 Thread Ravi Bangoria
From: Balamuruhan S 

Add testcases for VSX vector paired load/store instructions.
Sample o/p:

  emulate_step_test: lxvp   : PASS
  emulate_step_test: stxvp  : PASS
  emulate_step_test: lxvpx  : PASS
  emulate_step_test: stxvpx : PASS
  emulate_step_test: plxvp  : PASS
  emulate_step_test: pstxvp : PASS

Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/lib/test_emulate_step.c | 270 +++
 1 file changed, 270 insertions(+)

diff --git a/arch/powerpc/lib/test_emulate_step.c 
b/arch/powerpc/lib/test_emulate_step.c
index 0a201b771477..783d1b85ecfe 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -612,6 +612,273 @@ static void __init test_lxvd2x_stxvd2x(void)
 }
 #endif /* CONFIG_VSX */
 
+#ifdef CONFIG_VSX
+static void __init test_lxvp_stxvp(void)
+{
+   struct pt_regs regs;
+   union {
+   vector128 a;
+   u32 b[4];
+   } c[2];
+   u32 cached_b[8];
+   int stepped = -1;
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+   show_result("lxvp", "SKIP (!CPU_FTR_ARCH_31)");
+   show_result("stxvp", "SKIP (!CPU_FTR_ARCH_31)");
+   return;
+   }
+
+   init_pt_regs(&regs);
+
+   /*** lxvp ***/
+
+   cached_b[0] = c[0].b[0] = 18233;
+   cached_b[1] = c[0].b[1] = 34863571;
+   cached_b[2] = c[0].b[2] = 834;
+   cached_b[3] = c[0].b[3] = 6138911;
+   cached_b[4] = c[1].b[0] = 1234;
+   cached_b[5] = c[1].b[1] = 5678;
+   cached_b[6] = c[1].b[2] = 91011;
+   cached_b[7] = c[1].b[3] = 121314;
+
+   regs.gpr[4] = (unsigned long)&c[0].a;
+
+   /*
+* lxvp XTp,DQ(RA)
+* XTp = 32xTX + 2xTp
+* let TX=1 Tp=1 RA=4 DQ=0
+*/
+   stepped = emulate_step(&regs, ppc_inst(PPC_RAW_LXVP(34, 4, 0)));
+
+   if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+   show_result("lxvp", "PASS");
+   } else {
+   if (!cpu_has_feature(CPU_FTR_VSX))
+   show_result("lxvp", "PASS (!CPU_FTR_VSX)");
+   else
+   show_result("lxvp", "FAIL");
+   }
+
+   /*** stxvp ***/
+
+   c[0].b[0] = 21379463;
+   c[0].b[1] = 87;
+   c[0].b[2] = 374234;
+   c[0].b[3] = 4;
+   c[1].b[0] = 90;
+   c[1].b[1] = 122;
+   c[1].b[2] = 555;
+   c[1].b[3] = 32144;
+
+   /*
+* stxvp XSp,DQ(RA)
+* XSp = 32xSX + 2xSp
+* let SX=1 Sp=1 RA=4 DQ=0
+*/
+   stepped = emulate_step(&regs, ppc_inst(PPC_RAW_STXVP(34, 4, 0)));
+
+   if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == 
c[0].b[1] &&
+   cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] &&
+   cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] &&
+   cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] &&
+   cpu_has_feature(CPU_FTR_VSX)) {
+   show_result("stxvp", "PASS");
+   } else {
+   if (!cpu_has_feature(CPU_FTR_VSX))
+   show_result("stxvp", "PASS (!CPU_FTR_VSX)");
+   else
+   show_result("stxvp", "FAIL");
+   }
+}
+#else
+static void __init test_lxvp_stxvp(void)
+{
+   show_result("lxvp", "SKIP (CONFIG_VSX is not set)");
+   show_result("stxvp", "SKIP (CONFIG_VSX is not set)");
+}
+#endif /* CONFIG_VSX */
+
+#ifdef CONFIG_VSX
+static void __init test_lxvpx_stxvpx(void)
+{
+   struct pt_regs regs;
+   union {
+   vector128 a;
+   u32 b[4];
+   } c[2];
+   u32 cached_b[8];
+   int stepped = -1;
+
+   if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+   show_result("lxvpx", "SKIP (!CPU_FTR_ARCH_31)");
+   show_result("stxvpx", "SKIP (!CPU_FTR_ARCH_31)");
+   return;
+   }
+
+   init_pt_regs(&regs);
+
+   /*** lxvpx ***/
+
+   cached_b[0] = c[0].b[0] = 18233;
+   cached_b[1] = c[0].b[1] = 34863571;
+   cached_b[2] = c[0].b[2] = 834;
+   cached_b[3] = c[0].b[3] = 6138911;
+   cached_b[4] = c[1].b[0] = 1234;
+   cached_b[5] = c[1].b[1] = 5678;
+   cached_b[6] = c[1].b[2] = 91011;
+   cached_b[7] = c[1].b[3] = 121314;
+
+   regs.gpr[3] = (unsigned long)&c[0].a;
+   regs.gpr[4] = 0;
+
+   /*
+* lxvpx XTp,RA,RB
+* XTp = 32xTX + 2xTp
+* let TX=1 Tp=1 RA=3 RB=4
+*/
+   stepped = emulate_step(&regs, ppc_inst(PPC_RAW_LXVPX(34, 3, 4)));
+
+   if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+   show_result("lxvpx", "PASS");
+   } else {
+   if (!cpu_has_feature(CPU_FTR_VSX))
+   show_result("lxvpx", "PASS (!CPU_FTR_VSX)");
+   else
+   show_result("lxvpx", "FAIL");
+   }
+
+   /*** stxvpx ***/
+
+   c[0].b[0] = 21379463;
+   c[0].b[1] = 87;
+   c[0].b[2] = 374234;
+   

[PATCH v4 3/4] powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions

2020-10-08 Thread Ravi Bangoria
From: Balamuruhan S 

Add instruction encodings, DQ, D0, D1 immediate, XTP, XSP operands as
macros for new VSX vector paired instructions,
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao 
Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/include/asm/ppc-opcode.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index a6e3700c4566..5e7918ca4fb7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -78,6 +78,9 @@
 
 #define IMM_L(i)   ((uintptr_t)(i) & 0xffff)
 #define IMM_DS(i)  ((uintptr_t)(i) & 0xfffc)
+#define IMM_DQ(i)  ((uintptr_t)(i) & 0xfff0)
+#define IMM_D0(i)  (((uintptr_t)(i) >> 16) & 0x3)
+#define IMM_D1(i)  IMM_L(i)
 
 /*
  * 16-bit immediate helper macros: HA() is for use with sign-extending instrs
@@ -295,6 +298,8 @@
 #define __PPC_XB(b)((((b) & 0x1f) << 11) | (((b) & 0x20) >> 4))
 #define __PPC_XS(s)((((s) & 0x1f) << 21) | (((s) & 0x20) >> 5))
 #define __PPC_XT(s)__PPC_XS(s)
+#define __PPC_XSP(s)   ((((s) & 0x1e) | (((s) >> 5) & 0x1)) << 21)
+#define __PPC_XTP(s)   __PPC_XSP(s)
 #define __PPC_T_TLB(t) (((t) & 0x3) << 21)
 #define __PPC_WC(w)(((w) & 0x3) << 21)
 #define __PPC_WS(w)(((w) & 0x1f) << 11)
@@ -395,6 +400,14 @@
 #define PPC_RAW_XVCPSGNDP(t, a, b) ((0xf0000780 | VSX_XX3((t), (a), (b))))
 #define PPC_RAW_VPERMXOR(vrt, vra, vrb, vrc) \
((0x1000002d | ___PPC_RT(vrt) | ___PPC_RA(vra) | ___PPC_RB(vrb) | 
(((vrc) & 0x1f) << 6)))
+#define PPC_RAW_LXVP(xtp, a, i)(0x18000000 | __PPC_XTP(xtp) | 
___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_STXVP(xsp, a, i)   (0x18000001 | __PPC_XSP(xsp) | 
___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_LXVPX(xtp, a, b)   (0x7c00029a | __PPC_XTP(xtp) | 
___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_STXVPX(xsp, a, b)  (0x7c00039a | __PPC_XSP(xsp) | 
___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_PLXVP(xtp, i, a, pr) \
+   ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xe8000000 | 
__PPC_XTP(xtp) | ___PPC_RA(a) | IMM_D1(i)))
+#define PPC_RAW_PSTXVP(xsp, i, a, pr) \
+   ((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xf8000000 | 
__PPC_XSP(xsp) | ___PPC_RA(a) | IMM_D1(i)))
 #define PPC_RAW_NAP(0x4c000364)
 #define PPC_RAW_SLEEP  (0x4c0003a4)
 #define PPC_RAW_WINKLE (0x4c0003e4)
-- 
2.26.2
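
To sanity-check the new operand encoding, here is a tiny host-side program
(the macro body is copied from the hunk above; everything else is just
scaffolding) showing that VSR 34, i.e. SX=1 and Sp=1 as used in the
lxvp/stxvp testcases, packs into the expected field:

#include <stdio.h>

/* copied from the patch above */
#define __PPC_XSP(s)	((((s) & 0x1e) | (((s) >> 5) & 0x1)) << 21)

int main(void)
{
	unsigned int s = 34;	/* XSp = 32*SX + 2*Sp with SX=1, Sp=1 */
	unsigned int field = __PPC_XSP(s);

	/* 34 & 0x1e = 2 (the Sp part), (34 >> 5) & 1 = 1 (the SX part),
	 * so the packed field is 0x3 << 21 = 0x00600000 */
	printf("__PPC_XSP(%u) = 0x%08x\n", s, field);
	return 0;
}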



[PATCH v4 2/4] powerpc/sstep: Support VSX vector paired storage access instructions

2020-10-08 Thread Ravi Bangoria
From: Balamuruhan S 

VSX Vector Paired instructions load/store an octword (32 bytes)
between storage and two sequential VSRs. Add emulation support
for these new instructions:
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao 
Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/lib/sstep.c | 146 +--
 1 file changed, 125 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e6242744c71b..e39ee1651636 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -32,6 +32,10 @@ extern char system_call_vectored_emulate[];
 #define XER_OV32   0x00080000U
 #define XER_CA32   0x00040000U
 
+#ifdef CONFIG_VSX
+#define VSX_REGISTER_XTP(rd)   ((((rd) & 1) << 5) | ((rd) & 0xfe))
+#endif
+
 #ifdef CONFIG_PPC_FPU
 /*
  * Functions in ldstfp.S
@@ -279,6 +283,19 @@ static nokprobe_inline void do_byte_reverse(void *ptr, int 
nb)
up[1] = tmp;
break;
}
+   case 32: {
+   unsigned long *up = (unsigned long *)ptr;
+   unsigned long tmp;
+
+   tmp = byterev_8(up[0]);
+   up[0] = byterev_8(up[3]);
+   up[3] = tmp;
+   tmp = byterev_8(up[2]);
+   up[2] = byterev_8(up[1]);
+   up[1] = tmp;
+   break;
+   }
+
 #endif
default:
WARN_ON_ONCE(1);
@@ -709,6 +726,8 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
reg->d[0] = reg->d[1] = 0;
 
switch (op->element_size) {
+   case 32:
+   /* [p]lxvp[x] */
case 16:
/* whole vector; lxv[x] or lxvl[l] */
if (size == 0)
@@ -717,7 +736,7 @@ void emulate_vsx_load(struct instruction_op *op, union 
vsx_reg *reg,
if (IS_LE && (op->vsx_flags & VSX_LDLEFT))
rev = !rev;
if (rev)
-   do_byte_reverse(reg, 16);
+   do_byte_reverse(reg, size);
break;
case 8:
/* scalar loads, lxvd2x, lxvdsx */
@@ -793,6 +812,20 @@ void emulate_vsx_store(struct instruction_op *op, const 
union vsx_reg *reg,
size = GETSIZE(op->type);
 
switch (op->element_size) {
+   case 32:
+   /* [p]stxvp[x] */
+   if (size == 0)
+   break;
+   if (rev) {
+   /* reverse 32 bytes */
+   buf.d[0] = byterev_8(reg->d[3]);
+   buf.d[1] = byterev_8(reg->d[2]);
+   buf.d[2] = byterev_8(reg->d[1]);
+   buf.d[3] = byterev_8(reg->d[0]);
+   reg = &buf;
+   }
+   memcpy(mem, reg, size);
+   break;
case 16:
/* stxv, stxvx, stxvl, stxvll */
if (size == 0)
@@ -861,28 +894,43 @@ static nokprobe_inline int do_vsx_load(struct 
instruction_op *op,
   bool cross_endian)
 {
int reg = op->reg;
-   u8 mem[16];
-   union vsx_reg buf;
+   int i, j, nr_vsx_regs;
+   u8 mem[32];
+   union vsx_reg buf[2];
int size = GETSIZE(op->type);
 
if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs))
return -EFAULT;
 
-   emulate_vsx_load(op, &buf, mem, cross_endian);
+   nr_vsx_regs = size / sizeof(__vector128);
+   emulate_vsx_load(op, buf, mem, cross_endian);
preempt_disable();
if (reg < 32) {
/* FP regs + extensions */
if (regs->msr & MSR_FP) {
-   load_vsrn(reg, &buf);
+   for (i = 0; i < nr_vsx_regs; i++) {
+   j = IS_LE ? nr_vsx_regs - i - 1 : i;
+   load_vsrn(reg + i, &buf[j].v);
+   }
} else {
-   current->thread.fp_state.fpr[reg][0] = buf.d[0];
-   current->thread.fp_state.fpr[reg][1] = buf.d[1];
+   for (i = 0; i < nr_vsx_regs; i++) {
+   j = IS_LE ? nr_vsx_regs - i - 1 : i;
+   current->thread.fp_state.fpr[reg + i][0] = 
buf[j].d[0];
+   current->thread.fp_state.fpr[reg + i][1] = 
buf[j].d[1];
+   }
}
} else {
-   if (regs->msr & MSR_VEC)
-   load_vsrn(reg, &buf);
-   else
-   current->thread.vr_state.vr[reg - 32] = buf.v;
+   if (regs->msr & MSR_VEC) {
+   for (i = 0; i < 
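
For the cross-endian path, the new 32-byte case in do_byte_reverse() amounts
to reversing the whole octword. A host-side sketch of that swap pattern
(using __builtin_bswap64() in place of the kernel's byterev_8()), checked
against a plain byte reversal:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* byte-swap each doubleword and swap dwords 0<->3 and 1<->2, which is
 * equivalent to reversing all 32 bytes of the octword */
static void byte_reverse_32(uint64_t *up)
{
	uint64_t tmp;

	tmp   = __builtin_bswap64(up[0]);
	up[0] = __builtin_bswap64(up[3]);
	up[3] = tmp;
	tmp   = __builtin_bswap64(up[2]);
	up[2] = __builtin_bswap64(up[1]);
	up[1] = tmp;
}

int main(void)
{
	union { uint8_t b[32]; uint64_t d[4]; } buf;
	uint8_t ref[32];
	int i;

	for (i = 0; i < 32; i++) {
		buf.b[i] = i;
		ref[i] = 31 - i;
	}
	byte_reverse_32(buf.d);
	printf("%s\n", memcmp(buf.b, ref, 32) ? "mismatch" : "full 32-byte reversal");
	return 0;
}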

[PATCH v4 0/4] powerpc/sstep: VSX 32-byte vector paired load/store instructions

2020-10-08 Thread Ravi Bangoria
VSX vector paired instructions operate with an octword (32-byte)
operand for loads and stores between storage and a pair of two
sequential Vector-Scalar Registers (VSRs). There are 4 word
instructions and 2 prefixed instructions that provide these
32-byte storage access operations - lxvp, lxvpx, stxvp, stxvpx,
plxvp, pstxvp.

The emulation infrastructure doesn't have support for these instructions
to operate with 32-byte storage access and with 2 VSX
registers. This patch series enables the instruction emulation
support and adds test cases for them respectively.

v3: 
https://lore.kernel.org/linuxppc-dev/20200731081637.1837559-1-bal...@linux.ibm.com/
 

Changes in v4:
-
* Patch #1 is (kind of) new.
* Patch #2 now enables both analyse_instr() and emulate_step()
  unlike prev series where both were in separate patches.
* Patch #2 also has important fix for emulation on LE.
* Patch #3 and #4. Added XSP/XTP, D0/D1 instruction operands,
  removed *_EX_OP, __PPC_T[P][X] macros which are incorrect,
  and adhered to PPC_RAW_* convention.
* Added `CPU_FTR_ARCH_31` check in testcases to avoid failing
  in p8/p9.
* Some cosmetic changes.
* Rebased to powerpc/next

Changes in v3:
-
Worked on review comments and suggestions from Ravi and Naveen,

* Fix do_vsx_load() to handle vsx instructions when MSR_FP/MSR_VEC is
  cleared in exception conditions and it falls back to reading/writing
  the thread_struct members fp_state/vr_state respectively.
* Fix wrongly used `__vector128 v[2]` in struct vsx_reg as it should
  hold a single vsx register size.
* Remove unnecessary `VSX_CHECK_VEC` flag set and condition to check
  `VSX_LDLEFT` that is not applicable for these vsx instructions.
* Fix comments in emulate_vsx_load() that were misleading.
* Rebased on latest powerpc next branch.

Changes in v2:
-
* Address suggestion from Sandipan: wrap ISA 3.1 instructions with a
  cpu_has_feature(CPU_FTR_ARCH_31) check.
* Rebase on latest powerpc next branch.


Balamuruhan S (4):
  powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31
is set
  powerpc/sstep: Support VSX vector paired storage access instructions
  powerpc/ppc-opcode: Add encoding macros for VSX vector paired
instructions
  powerpc/sstep: Add testcases for VSX vector paired load/store
instructions

 arch/powerpc/include/asm/ppc-opcode.h |  13 ++
 arch/powerpc/lib/sstep.c  | 152 +--
 arch/powerpc/lib/test_emulate_step.c  | 270 ++
 3 files changed, 414 insertions(+), 21 deletions(-)

-- 
2.26.2



[PATCH v4 1/4] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set

2020-10-08 Thread Ravi Bangoria
From: Balamuruhan S 

Unconditional emulation of prefixed instructions would allow them
to be emulated on Power10 predecessors, which might cause
issues. Restrict that.

Signed-off-by: Balamuruhan S 
Signed-off-by: Ravi Bangoria 
---
 arch/powerpc/lib/sstep.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e9dcaba9a4f8..e6242744c71b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1346,6 +1346,9 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
switch (opcode) {
 #ifdef __powerpc64__
case 1:
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   return -1;
+
prefix_r = GET_PREFIX_R(word);
ra = GET_PREFIX_RA(suffix);
rd = (suffix >> 21) & 0x1f;
@@ -2733,6 +2736,9 @@ int analyse_instr(struct instruction_op *op, const struct 
pt_regs *regs,
}
break;
case 1: /* Prefixed instructions */
+   if (!cpu_has_feature(CPU_FTR_ARCH_31))
+   return -1;
+
prefix_r = GET_PREFIX_R(word);
ra = GET_PREFIX_RA(suffix);
op->update_reg = ra;
-- 
2.26.2
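
A trivial host-side sketch of the gate being added (the flag and helper are
stand-ins for illustration, not kernel API): all prefixed instructions live
under primary opcode 1, so they can be rejected before any further decode
when the CPU is not ISA 3.1 capable:

#include <stdint.h>
#include <stdbool.h>

static bool has_arch_31;	/* stands in for cpu_has_feature(CPU_FTR_ARCH_31) */

static int analyse_word(uint32_t word)
{
	uint32_t opcode = word >> 26;	/* primary opcode field */

	if (opcode == 1 && !has_arch_31)
		return -1;	/* prefixed instruction, but no ISA 3.1: don't emulate */

	/* ... regular decode would continue here ... */
	return 0;
}

int main(void)
{
	uint32_t prefix_word = 1u << 26;	/* any word with primary opcode 1 */

	has_arch_31 = false;
	return analyse_word(prefix_word) == -1 ? 0 : 1;
}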



Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length

2020-10-08 Thread Christoph Hellwig
On Wed, Oct 07, 2020 at 04:42:55PM +0200, Jann Horn wrote:
> > > @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t 
> > > len,
> > >  {
> > >   long ret = -EINVAL;
> > >
> > > - if (!arch_validate_prot(prot, addr))
> > > + if (!arch_validate_prot(prot, addr, len))
> >
> > This call isn't under mmap lock.  I also find it rather weird as the
> > generic code only calls arch_validate_prot from mprotect, only powerpc
> > also calls it from mmap.
> >
> > This seems to go back to commit ef3d3246a0d0
> > ("powerpc/mm: Add Strong Access Ordering support")
> 
> I'm _guessing_ the idea in the generic case might be that mmap()
> doesn't check unknown bits in the protection flags, and therefore
> maybe people wanted to avoid adding new error cases that could be
> caused by random high bits being set? So while the mprotect() case
> checks the flags and refuses unknown values, the mmap() code just lets
> the architecture figure out which bits are actually valid to set (via
> arch_calc_vm_prot_bits()) and silently ignores the rest?
> 
> And powerpc apparently decided that they do want to error out on bogus
> prot values passed to their version of mmap(), and in exchange, assume
> in arch_calc_vm_prot_bits() that the protection bits are valid?

The problem really is that now programs behave differently on powerpc
compared to all other architectures.

> powerpc's arch_validate_prot() doesn't actually need the mmap lock, so
> I think this is fine-ish for now (as in, while the code is a bit
> unclean, I don't think I'm making it worse, and I don't think it's
> actually buggy). In theory, we could move the arch_validate_prot()
> call over into the mmap guts, where we're holding the lock, and gate
> it on the architecture or on some feature CONFIG that powerpc can
> activate in its Kconfig. But I'm not sure whether that'd be helping or
> making things worse, so when I sent this patch, I deliberately left
> the powerpc stuff as-is.

For now I'd just duplicate the trivial logic from arch_validate_prot
in the powerpc version of do_mmap2 and add a comment that this check
causes a gratuitous incompatibility with all other architectures.  And then
hope that the powerpc maintainers fix it up :)
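
For reference, a simplified sketch (abbreviated, not the verbatim powerpc
code) of what powerpc's arch_validate_prot() checks, which is why the
architecture wants to reject bogus prot bits at mmap() time as well:

/* powerpc accepts PROT_SAO (Strong Access Ordering) as an extra prot
 * bit, but only on CPUs that actually implement SAO */
static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
{
	if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
		return false;
	if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO))
		return false;
	return true;
}
#define arch_validate_prot arch_validate_prot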