[PATCH] powerpc/8xx: Fix instruction TLB miss exception with perf enabled
When perf is enabled, r11 must also be restored when CONFIG_HUGETLBFS is
selected.

Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy
---
 arch/powerpc/kernel/head_8xx.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 9f359d3fba74..32d85387bdc5 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -268,7 +268,7 @@ InstructionTLBMiss:
 	addi	r10, r10, 1
 	stw	r10, (itlb_miss_counter - PAGE_OFFSET)@l(0)
 	mfspr	r10, SPRN_SPRG_SCRATCH0
-#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP)
+#if defined(ITLB_MISS_KERNEL) || defined(CONFIG_SWAP) || defined(CONFIG_HUGETLBFS)
 	mfspr	r11, SPRN_SPRG_SCRATCH1
 #endif
 	rfi
-- 
2.25.0
Re: [PATCH] powerpc/powernv/dump: Fix race while processing OPAL dump
Vasant Hegde writes:
> diff --git a/arch/powerpc/platforms/powernv/opal-dump.c b/arch/powerpc/platforms/powernv/opal-dump.c
> index 543c816fa99e..7e6eeedec32b 100644
> --- a/arch/powerpc/platforms/powernv/opal-dump.c
> +++ b/arch/powerpc/platforms/powernv/opal-dump.c
> @@ -346,21 +345,39 @@ static struct dump_obj *create_dump_obj(uint32_t id, size_t size,
> 	rc = kobject_add(&dump->kobj, NULL, "0x%x-0x%x", type, id);
> 	if (rc) {
> 		kobject_put(&dump->kobj);
> -		return NULL;
> +		return;
> 	}
>
> +	/*
> +	 * As soon as the sysfs file for this dump is created/activated there is
> +	 * a chance the opal_errd daemon (or any userspace) might read and
> +	 * acknowledge the dump before kobject_uevent() is called. If that
> +	 * happens then there is a potential race between
> +	 * dump_ack_store->kobject_put() and kobject_uevent() which leads to a
> +	 * use-after-free of a kernfs object resulting in a kernel crash.
> +	 *
> +	 * To avoid that, we need to take a reference on behalf of the bin file,
> +	 * so that our reference remains valid while we call kobject_uevent().
> +	 * We then drop our reference before exiting the function, leaving the
> +	 * bin file to drop the last reference (if it hasn't already).
> +	 */
> +
> +	/* Take a reference for the bin file */
> +	kobject_get(&dump->kobj);
> 	rc = sysfs_create_bin_file(&dump->kobj, &dump->dump_attr);
> 	if (rc) {
> 		kobject_put(&dump->kobj);
> -		return NULL;
> +		/* Drop reference count taken for bin file */
> +		kobject_put(&dump->kobj);
> +		return;
> 	}
>
> 	pr_info("%s: New platform dump. ID = 0x%x Size %u\n",
> 		__func__, dump->id, dump->size);
>
> 	kobject_uevent(&dump->kobj, KOBJ_ADD);
> -
> -	return dump;
> +	/* Drop reference count taken for bin file */
> +	kobject_put(&dump->kobj);
> }

I think this would be better if it was reworked along the lines of:

  aea948bb80b4 ("powerpc/powernv/elog: Fix race while processing OPAL error log event.")

cheers
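The "hold an extra reference across kobject_uevent()" pattern from the patch above can be sketched outside the kernel. The following is an illustrative Python model only — `KObject`, `uevent`, and `create_dump_obj` are stand-ins for the kernel's kobject machinery, and the `userspace_acks_early` flag models the opal_errd daemon acknowledging the dump before the uevent is sent:

```python
import threading

class KObject:
    """Toy model of a kernel kobject: freed when the last reference drops.

    Purely illustrative; `uevent()` stands in for kobject_uevent(), which
    dereferences the object and must not run after the final put().
    """
    def __init__(self):
        self.refcount = 1          # reference taken by kobject_add()
        self.freed = False
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            assert not self.freed, "use-after-free: get() on freed object"
            self.refcount += 1

    def put(self):
        with self._lock:
            assert not self.freed, "use-after-free: put() on freed object"
            self.refcount -= 1
            if self.refcount == 0:
                self.freed = True  # models kobject_release()

    def uevent(self):
        # Without the extra reference below, an early ack would have freed
        # the object already and this would be the use-after-free.
        assert not self.freed, "use-after-free in uevent()"

def create_dump_obj(userspace_acks_early: bool) -> KObject:
    dump = KObject()
    dump.get()                     # extra reference held for the bin file
    if userspace_acks_early:
        dump.put()                 # dump_ack_store() drops the add reference
    dump.uevent()                  # still safe: bin-file reference is held
    dump.put()                     # drop the bin-file reference
    return dump
```

With the extra `get()`, the uevent survives an early acknowledgement; removing it makes the `userspace_acks_early=True` path trip the use-after-free assertion.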
Re: [PATCH] powerpc/powernv/elog: Reduce elog message severity
Vasant Hegde writes:
> OPAL interrupts kernel whenever it has new error log. Kernel calls
> interrupt handler (elog_event()) to retrieve event. elog_event makes
> OPAL API call (opal_get_elog_size()) to retrieve elog info.
>
> In some case before kernel makes opal_get_elog_size() call, it gets
> interrupt again. So second time when elog_event() calls
> opal_get_elog_size API OPAL returns error.

Can you give more detail there? Do you have a stack trace?

We use IRQF_ONESHOT for elog_event(), which (I thought) meant it
shouldn't be called again until it has completed. So I'm unclear how
you're seeing the behaviour you describe.

cheers

> Its safe to ignore this error. Hence reduce the severity
> of log message.
>
> CC: Mahesh Salgaonkar
> Signed-off-by: Vasant Hegde
> ---
>  arch/powerpc/platforms/powernv/opal-elog.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-elog.c b/arch/powerpc/platforms/powernv/opal-elog.c
> index 62ef7ad995da..67f435bb1ec4 100644
> --- a/arch/powerpc/platforms/powernv/opal-elog.c
> +++ b/arch/powerpc/platforms/powernv/opal-elog.c
> @@ -247,7 +247,7 @@ static irqreturn_t elog_event(int irq, void *data)
>
> 	rc = opal_get_elog_size(&id, &size, &type);
> 	if (rc != OPAL_SUCCESS) {
> -		pr_err("ELOG: OPAL log info read failed\n");
> +		pr_debug("ELOG: OPAL log info read failed\n");
> 		return IRQ_HANDLED;
> 	}
>
> --
> 2.26.2
Re: linux-next: Fixes tag needs some work in the powerpc tree
Stephen Rothwell writes:
> Hi all,
>
> In commit
>
>   a2d0230b91f7 ("cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier")
>
> Fixes tag
>
>   Fixes: cf30af76 ("cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec")

Gah. I've changed my scripts to make this a hard error when I'm
applying patches.

cheers
Re: [PATCH 2/2] dt: Remove booting-without-of.rst
Rob Herring writes:
> booting-without-of.rstt is an ancient document that first outlined
                      ^ nit
> Flattened DeviceTree on PowerPC initially. The DT world has evolved a
> lot in the 15 years since and booting-without-of.rst is pretty stale.
> The name of the document itself is confusing if you don't understand the
> evolution from real 'OpenFirmware'. Most of what booting-without-of.rst
> contains is now in the DT specification (which evolved out of the
> ePAPR). The few things that weren't documented in the DT specification
> are now.
>
> All that remains is the boot entry details, so let's move these to arch
> specific documents. The exception is arm which already has the same
> details documented.
>
> Cc: Frank Rowand
> Cc: Mauro Carvalho Chehab
> Cc: Geert Uytterhoeven
> Cc: Michael Ellerman
> Cc: Thomas Bogendoerfer
> Cc: Jonathan Corbet
> Cc: Paul Mackerras
> Cc: Yoshinori Sato
> Cc: Rich Felker
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: "H. Peter Anvin"
> Cc: x...@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Acked-by: Benjamin Herrenschmidt
> Signed-off-by: Rob Herring
> ---
>  .../devicetree/booting-without-of.rst | 1585 -
>  Documentation/devicetree/index.rst    |    1 -
>  Documentation/mips/booting.rst        |   28 +
>  Documentation/mips/index.rst          |    1 +
>  Documentation/powerpc/booting.rst     |  110 ++

LGTM.

Acked-by: Michael Ellerman (powerpc)

cheers
[powerpc:next] BUILD SUCCESS a2d0230b91f7e23ceb5d8fb6a9799f30517ec33a
Build results: no failures.

gcc-tested configurations covered powerpc (ppc44x, ppc6xx, ge_imp3a,
ep88xc, gamecube, mpc832x_rdb, arches, ksi8560, kmeter1, obs600 and
other board defconfigs, plus allyes/allmod/allnoconfig), arm, mips,
riscv, sh, i386, ia64, m68k, arc, nds32, nios2, csky, alpha, xtensa,
h8300, parisc, s390, sparc and x86_64 (including daily randconfig
batches for x86_64 and i386, and the rhel/kselftests/kexec x86_64
configs).

clang-tested configurations: a further set of x86_64 randconfigs.
[powerpc:merge] BUILD SUCCESS 118be7377c97e35c33819bcb3bbbae5a42a4ac43
Build results: no failures.

gcc-tested configurations covered powerpc (ppa8548, obs600, kmeter1 and
other board defconfigs, plus allyes/allmod/allnoconfig), mips, openrisc,
ia64, m68k, nios2, arc, nds32, c6x, csky, alpha, xtensa, h8300, sh,
parisc, s390, i386, sparc, riscv and x86_64 (including daily randconfig
batches for x86_64 and i386, and the rhel/kselftests/kexec x86_64
configs).

clang-tested configurations: a further set of x86_64 randconfigs.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Linux kernel: powerpc: RTAS calls can be used to compromise kernel integrity
The Linux kernel for powerpc has an issue with the Run-Time Abstraction
Services (RTAS) interface, allowing root (or CAP_SYS_ADMIN users) in a
VM to overwrite some parts of memory, including kernel memory.

This issue impacts guests running on top of PowerVM or KVM hypervisors
(pseries platform), and does *not* impact bare-metal machines (powernv
platform).

Description
===========

The RTAS interface, defined in the Power Architecture Platform
Reference, provides various platform hardware services to operating
systems running on PAPR platforms (e.g. the "pseries" platform in
Linux, running in an LPAR/VM on PowerVM or KVM). Some userspace daemons
require access to certain RTAS calls for system maintenance and
monitoring purposes.

The kernel exposes a syscall, sys_rtas, that allows root (or any user
with CAP_SYS_ADMIN) to make arbitrary RTAS calls. For the RTAS calls
which require a work area, it allocates a buffer (the "RMO buffer") and
exposes the physical address in /proc so that the userspace tool can
pass addresses within that buffer as an argument to the RTAS call.

The syscall doesn't check that the work area arguments to RTAS calls
are within the RMO buffer, which makes it trivial to read and write any
guest physical address within the LPAR's Real Memory Area, including
overwriting the guest kernel's text.

At the time the RTAS syscall interface was first developed, it was
generally assumed that root had unlimited ability to modify system
state, so this would not have been considered an integrity violation.
However, with the advent of Secure Boot, Lockdown etc, root should not
be able to arbitrarily modify the kernel text or read arbitrary kernel
data. Therefore, while this issue impacts all kernels since the RTAS
interface was first implemented, we are only considering it a
vulnerability for upstream kernels from 5.3 onwards, which is when the
Lockdown LSM was merged.
Lockdown was widely included in pre-5.3 distribution kernels, so
distribution vendors should consider whether they need to backport the
patch to their pre-5.3 distro trees.

(A CVE for this issue is pending; we requested one some time ago but it
has not yet been assigned.)

Fixes
=====

A patch is currently in powerpc-next[0] and is expected to be included
in mainline kernel 5.10. The patch has not yet been backported to
upstream stable trees.

The approach taken by the patch is to maintain the existing RTAS
interface, but restrict requests to the list of RTAS calls actually
used by the librtas userspace library, and restrict work area pointer
arguments to the region within the RMO buffer.

All RTAS-using applications that we are aware of are system
management/monitoring tools, maintained by IBM, that use the librtas
library. We don't anticipate there being any real world legitimate
applications that require an RTAS call that isn't in the librtas list;
however, if such an application exists, the filtering can be disabled
by a Kconfig option specified during kernel build.

Credit
======

Thanks to Daniel Axtens (IBM) for initial discovery of this issue.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=bd59380c5ba4147dcbaad3e582b55ccfd120b764

-- 
Andrew Donnellan              OzLabs, ADL Canberra
a...@linux.ibm.com            IBM Australia Limited
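The shape of the fix described above — an allow-list of RTAS calls plus a bounds check on work-area pointers — can be modeled in a few lines. This is an illustrative Python sketch only: the call names, argument indices, and buffer bounds below are invented for the example, and the real allow-list lives in the kernel patch referenced at [0]:

```python
# Hypothetical RMO buffer bounds, for illustration only.
RMO_BUF_START = 0x1000
RMO_BUF_SIZE = 0x8000

# Maps an allowed RTAS call name to the index of its work-area pointer
# argument, or None if the call takes no work-area pointer. The names
# are real RTAS call names, but which index holds the work-area pointer
# here is an assumption made for the sketch.
ALLOWED_CALLS = {
    "ibm,get-system-parameter": 1,
    "ibm,configure-connector": 0,
    "get-time-of-day": None,
}

def rtas_call_allowed(name: str, args: list) -> bool:
    """Sketch of the two checks the fix adds: the call must be on the
    allow-list, and any work-area pointer argument must fall inside the
    kernel-allocated RMO buffer (never an arbitrary guest address)."""
    if name not in ALLOWED_CALLS:
        return False
    work_area_idx = ALLOWED_CALLS[name]
    if work_area_idx is None:
        return True
    addr = args[work_area_idx]
    return RMO_BUF_START <= addr < RMO_BUF_START + RMO_BUF_SIZE
```

A pointer outside the RMO buffer — the primitive the advisory describes for overwriting kernel text — is rejected, as is any call not on the list.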
linux-next: Fixes tag needs some work in the powerpc tree
Hi all,

In commit

  a2d0230b91f7 ("cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier")

Fixes tag

  Fixes: cf30af76 ("cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec")

has these problem(s):

  - SHA1 should be at least 12 digits long
    Can be fixed by setting core.abbrev to 12 (or more) or (for git v2.11
    or later) just making sure it is not set (or set to "auto").

-- 
Cheers,
Stephen Rothwell
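The check above can be reproduced with a small pattern match. This is a minimal re-implementation in the spirit of the linux-next scripts (the exact rules those scripts apply are an assumption here): the commit hash must be at least 12 hex digits and the subject must be double-quoted in parentheses:

```python
import re

# Accepts:  Fixes: <12-40 hex digits> ("<subject>")
FIXES_RE = re.compile(r'^Fixes: ([0-9a-f]{12,40}) \("(.+)"\)$')

def fixes_tag_ok(tag: str) -> bool:
    """Return True if a Fixes: tag has an abbreviated SHA1 of at least
    12 digits, per the linux-next complaint quoted above."""
    return FIXES_RE.match(tag) is not None
```

Running it on the two tags from the message: the 12-digit a2d0230b91f7 form passes, the 8-digit cf30af76 form is flagged.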
Re: [RFC PATCH] mm: Fetch the dirty bit before we reset the pte
On 10/8/20 10:32 PM, Linus Torvalds wrote:
> On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V wrote:
> >
> > In copy_present_page, after we mark the pte non-writable, we should
> > check for previous dirty bit updates and make sure we don't lose the
> > dirty bit on reset.
>
> No, we'll just remove that entirely.
>
> Do you have a test-case that shows a problem? I have a patch that I
> was going to delay until 5.10 because I didn't think it mattered in
> practice..

Unfortunately, I don't have a test case. That was observed by code
inspection while I was fixing a syzkaller report.

> The second part of this patch would be to add a sequence count
> protection to fast-GUP pinning, so that GUP and fork() couldn't race,
> but I haven't written that part.
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..
>
>            Linus

-aneesh
Re: [RFC PATCH] mm: Fetch the dirty bit before we reset the pte
On Thu, Oct 8, 2020 at 10:02 AM Linus Torvalds wrote:
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..

Actually, I removed the "__page_mapcount()" part of that patch, to keep
it minimal and _only_ do remove the wrprotect trick.

We can do the __page_mapcount() optimization and the mm sequence count
for 5.10 (although so far nobody has actually written the seqcount
patch - I think it would be a trivial few-liner, but I guess it won't
make 5.10 at this point).

So here's what I ended up with.

             Linus

From f3c64eda3e5097ec3198cb271f5f504d65d67131 Mon Sep 17 00:00:00 2001
From: Linus Torvalds
Date: Mon, 28 Sep 2020 12:50:03 -0700
Subject: [PATCH] mm: avoid early COW write protect games during fork()

In commit 70e806e4e645 ("mm: Do early cow for pinned pages during fork()
for ptes") we write-protected the PTE before doing the page pinning
check, in order to avoid a race with concurrent fast-GUP pinning (which
doesn't take the mm semaphore or the page table lock).

That trick doesn't actually work - it doesn't handle memory ordering
properly, and doing so would be prohibitively expensive.

It also isn't really needed. While we're moving in the direction of
allowing and supporting page pinning without marking the pinned area
with MADV_DONTFORK, the fact is that we've never really supported this
kind of odd "concurrent fork() and page pinning", and doing the
serialization on a pte level is just wrong.

We can add serialization with a per-mm sequence counter, so we know how
to solve that race properly, but we'll do that at a more appropriate
time. Right now this just removes the write protect games.

It also turns out that the write protect games actually break on Power,
as reported by Aneesh Kumar:

 "Architecture like ppc64 expects set_pte_at to be not used for updating
  a valid pte. This is further explained in commit 56eecdb912b5 ("mm:
  Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")"

and the code triggered a warning there:

  WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
  Call Trace:
    copy_present_page mm/memory.c:857 [inline]
    copy_present_pte mm/memory.c:899 [inline]
    copy_pte_range mm/memory.c:1014 [inline]
    copy_pmd_range mm/memory.c:1092 [inline]
    copy_pud_range mm/memory.c:1127 [inline]
    copy_p4d_range mm/memory.c:1150 [inline]
    copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
    dup_mmap kernel/fork.c:592 [inline]
    dup_mm+0x77c/0xab0 kernel/fork.c:1355
    copy_mm kernel/fork.c:1411 [inline]
    copy_process+0x1f00/0x2740 kernel/fork.c:2070
    _do_fork+0xc4/0x10b0 kernel/fork.c:2429

Link: https://lore.kernel.org/lkml/CAHk-=wiwr+go0ro4lvnjbms90oiepnyre3e+pjvc9pzdbsh...@mail.gmail.com/
Link: https://lore.kernel.org/linuxppc-dev/20201008092541.398079-1-aneesh.ku...@linux.ibm.com/
Reported-by: Aneesh Kumar K.V
Tested-by: Leon Romanovsky
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: John Hubbard
Cc: Andrew Morton
Cc: Jan Kara
Cc: Michal Hocko
Cc: Kirill Shutemov
Cc: Hugh Dickins
Signed-off-by: Linus Torvalds
---
 mm/memory.c | 41 ++++-------------------------------------
 1 file changed, 4 insertions(+), 37 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..eeae590e526a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -806,8 +806,6 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		return 1;
 
 	/*
-	 * The trick starts.
-	 *
 	 * What we want to do is to check whether this page may
 	 * have been pinned by the parent process. If so,
 	 * instead of wrprotect the pte on both sides, we copy
@@ -815,47 +813,16 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * the pinned page won't be randomly replaced in the
 	 * future.
 	 *
-	 * To achieve this, we do the following:
-	 *
-	 * 1. Write-protect the pte if it's writable. This is
-	 *    to protect concurrent write fast-gup with
-	 *    FOLL_PIN, so that we'll fail the fast-gup with
-	 *    the write bit removed.
-	 *
-	 * 2. Check page_maybe_dma_pinned() to see whether this
-	 *    page may have been pinned.
-	 *
-	 * The order of these steps is important to serialize
-	 * against the fast-gup code (gup_pte_range()) on the
-	 * pte check and try_grab_compound_head(), so that
-	 * we'll make sure either we'll capture that fast-gup
-	 * so we'll copy the pinned page here, or we'll fail
-	 * that fast-gup.
-	 *
-	 * NOTE! Even if we don't end up copying the page,
-	 * we won't undo this wrprotect(), because the normal
-	 * reference copy will need it anyway.
-	 */
-	if (pte_write(pte))
-		ptep_set_wrprotect(src_mm, addr, src_pte);
-
-	/*
-	 * These are the "normally we can just copy by reference"
-	 * checks.
+	 * The page pinning checks are just "has this mm ever
+	 * seen pinning", along with the (inexact) check of
+	 * the page count. That might give false positives for
+	 * for
Re: [PATCH] mm: Avoid using set_pte_at when updating a present pte
Ahh, and I should learn to read all my emails before replying to some
of them..

On Thu, Oct 8, 2020 at 2:26 AM Aneesh Kumar K.V wrote:
>
> This avoids the below warning
[..]
> WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185

.. and I assume this is what triggered the other patch too.

Yes, with the ppc warning, we need to do _something_ about this, and at
that point I think the "something" is to just avoid the pte wrprotect
trick.

          Linus
Re: [RFC PATCH] mm: Fetch the dirty bit before we reset the pte
[ Just adding Leon to the participants ]

This patch (not attached again, Leon has seen it before) has been
tested for the last couple of weeks for the rdma case, so I have no
problems applying it now, just to keep everybody in the loop.

          Linus

On Thu, Oct 8, 2020 at 10:02 AM Linus Torvalds wrote:
>
> On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V wrote:
> >
> > In copy_present_page, after we mark the pte non-writable, we should
> > check for previous dirty bit updates and make sure we don't lose the
> > dirty bit on reset.
>
> No, we'll just remove that entirely.
>
> Do you have a test-case that shows a problem? I have a patch that I
> was going to delay until 5.10 because I didn't think it mattered in
> practice..
>
> The second part of this patch would be to add a sequence count
> protection to fast-GUP pinning, so that GUP and fork() couldn't race,
> but I haven't written that part.
>
> Here's the first patch anyway. If you actually have a test-case where
> this matters, I guess I need to apply it now..
>
>            Linus
Re: [RFC PATCH] mm: Fetch the dirty bit before we reset the pte
On Thu, Oct 8, 2020 at 2:27 AM Aneesh Kumar K.V wrote:
>
> In copy_present_page, after we mark the pte non-writable, we should
> check for previous dirty bit updates and make sure we don't lose the
> dirty bit on reset.

No, we'll just remove that entirely.

Do you have a test-case that shows a problem? I have a patch that I was
going to delay until 5.10 because I didn't think it mattered in
practice..

The second part of this patch would be to add a sequence count
protection to fast-GUP pinning, so that GUP and fork() couldn't race,
but I haven't written that part.

Here's the first patch anyway. If you actually have a test-case where
this matters, I guess I need to apply it now..

          Linus
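The per-mm sequence counter idea mentioned above — fork() bumps a counter, fast-GUP samples it before pinning and bails out if it changed — can be modeled in a few lines. This is a single-threaded illustrative Python sketch of the retry/fallback pattern, not the (then-unwritten) kernel patch; `racing_fork` is a hook that stands in for a concurrent fork():

```python
class MmSeqCount:
    """Toy model of a per-mm sequence counter: the writer (fork) bumps
    it, and a fast-path reader validates it did not move."""
    def __init__(self):
        self.seq = 0

    def write_begin(self):
        # fork()/copy_page_range() side: invalidate concurrent fast paths.
        self.seq += 1

def try_fast_pin(mm: MmSeqCount, racing_fork) -> bool:
    """Model of a fast-GUP pin attempt: sample the counter, do the
    (simulated) lockless work, then revalidate. Returns False when a
    fork() raced with us, signalling 'fall back to the slow path'."""
    start = mm.seq
    racing_fork(mm)      # window in which a fork() may run concurrently
    return mm.seq == start
```

With no concurrent fork the fast path succeeds; if the counter moved during the window, the pin attempt reports failure and the caller would retry under the mm locks.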
Re: [PATCH 2/2] dt: Remove booting-without-of.rst
On Thu, Oct 08, 2020 at 09:24:20AM -0500, Rob Herring wrote:
> booting-without-of.rstt is an ancient document that first outlined
> Flattened DeviceTree on PowerPC initially. The DT world has evolved a
> lot in the 15 years since and booting-without-of.rst is pretty stale.
> The name of the document itself is confusing if you don't understand the
> evolution from real 'OpenFirmware'. Most of what booting-without-of.rst
> contains is now in the DT specification (which evolved out of the
> ePAPR). The few things that weren't documented in the DT specification
> are now.
>
> All that remains is the boot entry details, so let's move these to arch
> specific documents. The exception is arm which already has the same
> details documented.
>
> Cc: Frank Rowand
> Cc: Mauro Carvalho Chehab
> Cc: Geert Uytterhoeven
> Cc: Michael Ellerman
> Cc: Thomas Bogendoerfer
> Cc: Jonathan Corbet
> Cc: Paul Mackerras
> Cc: Yoshinori Sato
> Cc: Rich Felker
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: Borislav Petkov
> Cc: "H. Peter Anvin"
> Cc: x...@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Acked-by: Benjamin Herrenschmidt
> Signed-off-by: Rob Herring
> ---
>  .../devicetree/booting-without-of.rst | 1585 -
>  Documentation/devicetree/index.rst    |    1 -
>  Documentation/mips/booting.rst        |   28 +
>  Documentation/mips/index.rst          |    1 +
>  Documentation/powerpc/booting.rst     |  110 ++
>  Documentation/powerpc/index.rst       |    1 +
>  Documentation/sh/booting.rst          |   12 +
>  Documentation/sh/index.rst            |    1 +
>  Documentation/x86/booting-dt.rst      |   21 +
>  Documentation/x86/index.rst           |    1 +

For x86:

Acked-by: Borislav Petkov

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
[PATCH 2/2] dt: Remove booting-without-of.rst
booting-without-of.rstt is an ancient document that first outlined
Flattened DeviceTree on PowerPC initially. The DT world has evolved a
lot in the 15 years since and booting-without-of.rst is pretty stale.
The name of the document itself is confusing if you don't understand the
evolution from real 'OpenFirmware'. Most of what booting-without-of.rst
contains is now in the DT specification (which evolved out of the
ePAPR). The few things that weren't documented in the DT specification
are now.

All that remains is the boot entry details, so let's move these to arch
specific documents. The exception is arm which already has the same
details documented.

Cc: Frank Rowand
Cc: Mauro Carvalho Chehab
Cc: Geert Uytterhoeven
Cc: Michael Ellerman
Cc: Thomas Bogendoerfer
Cc: Jonathan Corbet
Cc: Paul Mackerras
Cc: Yoshinori Sato
Cc: Rich Felker
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: "H. Peter Anvin"
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-m...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux...@vger.kernel.org
Acked-by: Benjamin Herrenschmidt
Signed-off-by: Rob Herring
---
 .../devicetree/booting-without-of.rst | 1585 -
 Documentation/devicetree/index.rst    |    1 -
 Documentation/mips/booting.rst        |   28 +
 Documentation/mips/index.rst          |    1 +
 Documentation/powerpc/booting.rst     |  110 ++
 Documentation/powerpc/index.rst       |    1 +
 Documentation/sh/booting.rst          |   12 +
 Documentation/sh/index.rst            |    1 +
 Documentation/x86/booting-dt.rst      |   21 +
 Documentation/x86/index.rst           |    1 +
 10 files changed, 175 insertions(+), 1586 deletions(-)
 delete mode 100644 Documentation/devicetree/booting-without-of.rst
 create mode 100644 Documentation/mips/booting.rst
 create mode 100644 Documentation/powerpc/booting.rst
 create mode 100644 Documentation/sh/booting.rst
 create mode 100644 Documentation/x86/booting-dt.rst

diff --git a/Documentation/devicetree/booting-without-of.rst b/Documentation/devicetree/booting-without-of.rst
deleted file mode 100644
index e9433350a20f..
--- a/Documentation/devicetree/booting-without-of.rst
+++ /dev/null
@@ -1,1585 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-==================================================
-Booting the Linux/ppc kernel without Open Firmware
-==================================================
-
-Copyright (c) 2005 Benjamin Herrenschmidt ,
-    IBM Corp.
-
-Copyright (c) 2005 Becky Bruce ,
-    Freescale Semiconductor, FSL SOC and 32-bit additions
-
-Copyright (c) 2006 MontaVista Software, Inc.
-    Flash chip node definition
-
-.. Table of Contents
-
-  I - Introduction
-    1) Entry point for arch/arm
-    2) Entry point for arch/powerpc
-    3) Entry point for arch/x86
-    4) Entry point for arch/mips/bmips
-    5) Entry point for arch/sh
-
-  II - The DT block format
-    1) Header
-    2) Device tree generalities
-    3) Device tree "structure" block
-    4) Device tree "strings" block
-
-  III - Required content of the device tree
-    1) Note about cells and address representation
-    2) Note about "compatible" properties
-    3) Note about "name" properties
-    4) Note about node and property names and character set
-    5) Required nodes and properties
-      a) The root node
-      b) The /cpus node
-      c) The /cpus/* nodes
-      d) the /memory node(s)
-      e) The /chosen node
-      f) the /soc node
-
-  IV - "dtc", the device tree compiler
-
-  V - Recommendations for a bootloader
-
-  VI - System-on-a-chip devices and nodes
-    1) Defining child nodes of an SOC
-    2) Representing devices without a current OF specification
-
-  VII - Specifying interrupt information for devices
-    1) interrupts property
-    2) interrupt-parent property
-    3) OpenPIC Interrupt Controllers
-    4) ISA Interrupt Controllers
-
-  VIII - Specifying device power management information (sleep property)
-
-  IX - Specifying dma bus information
-
-  Appendix A - Sample SOC node for MPC8540
-
-
-Revision Information
-====================
-
-  May 18, 2005: Rev 0.1
-                - Initial draft, no chapter III yet.
-
-  May 19, 2005: Rev 0.2
-                - Add chapter III and bits & pieces here or
-                  clarifies the fact that a lot of things are
-                  optional, the kernel only requires a very
-                  small device tree, though it is encouraged
-                  to provide an as complete one as possible.
-
-  May 24, 2005: Rev 0.3
-                - Precise that DT block has to be in RAM
-                - Misc fixes
-                - Define version 3 and new format version 16
-                  for the DT block (version 16 needs kernel
-                  patches, will be fwd separately).
-
[PATCH 1/2] dt-bindings: powerpc: Add a schema for the 'sleep' property
Document the PowerPC specific 'sleep' property as a schema. It is
currently only documented in booting-without-of.rst which is getting
removed.

Cc: Michael Ellerman
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring
---
 .../devicetree/bindings/powerpc/sleep.yaml | 47 +++
 1 file changed, 47 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/powerpc/sleep.yaml

diff --git a/Documentation/devicetree/bindings/powerpc/sleep.yaml b/Documentation/devicetree/bindings/powerpc/sleep.yaml
new file mode 100644
index ..6494c7d08b93
--- /dev/null
+++ b/Documentation/devicetree/bindings/powerpc/sleep.yaml
@@ -0,0 +1,47 @@
+# SPDX-License-Identifier: GPL-2.0-only
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/powerpc/sleep.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: PowerPC sleep property
+
+maintainers:
+  - Rob Herring
+
+description: |
+  Devices on SOCs often have mechanisms for placing devices into low-power
+  states that are decoupled from the devices' own register blocks. Sometimes,
+  this information is more complicated than a cell-index property can
+  reasonably describe. Thus, each device controlled in such a manner
+  may contain a "sleep" property which describes these connections.
+
+  The sleep property consists of one or more sleep resources, each of
+  which consists of a phandle to a sleep controller, followed by a
+  controller-specific sleep specifier of zero or more cells.
+
+  The semantics of what type of low power modes are possible are defined
+  by the sleep controller. Some examples of the types of low power modes
+  that may be supported are:
+
+  - Dynamic: The device may be disabled or enabled at any time.
+  - System Suspend: The device may request to be disabled or remain
+    awake during system suspend, but will not be disabled until then.
+  - Permanent: The device is disabled permanently (until the next hard
+    reset).
+
+  Some devices may share a clock domain with each other, such that they should
+  only be suspended when none of the devices are in use. Where reasonable,
+  such nodes should be placed on a virtual bus, where the bus has the sleep
+  property. If the clock domain is shared among devices that cannot be
+  reasonably grouped in this manner, then create a virtual sleep controller
+  (similar to an interrupt nexus, except that defining a standardized
+  sleep-map should wait until its necessity is demonstrated).
+
+select: true
+
+properties:
+  sleep:
+    $ref: /schemas/types.yaml#/definitions/phandle-array
+
+additionalProperties: true
-- 
2.25.1
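A device-tree fragment may make the phandle-plus-specifier layout described above concrete. The following sketch is illustrative only: the node names and the one-cell specifier value are invented for the example, and the specifier format is in reality defined by each sleep controller's own binding:

```dts
/* Hypothetical sleep controller; "fsl,mpc8548-pmc" is used as a
 * plausible compatible, but the addresses and cells are made up. */
pmc: power-controller@e0070 {
	compatible = "fsl,mpc8548-pmc";
	reg = <0xe0070 0x20>;
};

ethernet@24000 {
	compatible = "gianfar";
	reg = <0x24000 0x1000>;
	/* One sleep resource: phandle to the controller followed by a
	 * controller-specific specifier (here, a hypothetical one-cell
	 * device-disable mask). */
	sleep = <&pmc 0x00000080>;
};
```

A device with several sleep resources would simply concatenate further `<&controller specifier…>` entries in the same property.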
Re: [PATCH v4 2/4] powerpc/sstep: Support VSX vector paired storage access instructions
Hi Ravi,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.9-rc8 next-20201007]
[cannot apply to mpe/next scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ravi-Bangoria/powerpc-sstep-VSX-32-byte-vector-paired-load-store-instructions/20201008-153614
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-g5_defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/55def6779849f9aec057f405abf1cd98a8674b4f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ravi-Bangoria/powerpc-sstep-VSX-32-byte-vector-paired-load-store-instructions/20201008-153614
        git checkout 55def6779849f9aec057f405abf1cd98a8674b4f
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

   arch/powerpc/lib/sstep.c: In function 'analyse_instr':
>> arch/powerpc/lib/sstep.c:2901:15: error: implicit declaration of function 'VSX_REGISTER_XTP'; did you mean 'H_REGISTER_SMR'? [-Werror=implicit-function-declaration]
    2901 |    op->reg = VSX_REGISTER_XTP(rd);
         |              ^~~~
         |              H_REGISTER_SMR
   cc1: all warnings being treated as errors

vim +2901 arch/powerpc/lib/sstep.c

  2815	
  2816	#ifdef __powerpc64__
  2817		case 62:	/* std[u] */
  2818			op->ea = dsform_ea(word, regs);
  2819			switch (word & 3) {
  2820			case 0:		/* std */
  2821				op->type = MKOP(STORE, 0, 8);
  2822				break;
  2823			case 1:		/* stdu */
  2824				op->type = MKOP(STORE, UPDATE, 8);
  2825				break;
  2826			case 2:		/* stq */
  2827				if (!(rd & 1))
  2828					op->type = MKOP(STORE, 0, 16);
  2829				break;
  2830			}
  2831			break;
  2832		case 1: /* Prefixed instructions */
  2833			if (!cpu_has_feature(CPU_FTR_ARCH_31))
  2834				return -1;
  2835	
  2836			prefix_r = GET_PREFIX_R(word);
  2837			ra = GET_PREFIX_RA(suffix);
  2838			op->update_reg = ra;
  2839			rd = (suffix >> 21) & 0x1f;
  2840			op->reg = rd;
  2841			op->val = regs->gpr[rd];
  2842	
  2843			suffixopcode = get_op(suffix);
  2844			prefixtype = (word >> 24) & 0x3;
  2845			switch (prefixtype) {
  2846			case 0: /* Type 00 Eight-Byte Load/Store */
  2847				if (prefix_r && ra)
  2848					break;
  2849				op->ea = mlsd_8lsd_ea(word, suffix, regs);
  2850				switch (suffixopcode) {
  2851				case 41:	/* plwa */
  2852					op->type = MKOP(LOAD, PREFIXED | SIGNEXT, 4);
  2853					break;
  2854				case 42:	/* plxsd */
  2855					op->reg = rd + 32;
  2856					op->type = MKOP(LOAD_VSX, PREFIXED, 8);
  2857					op->element_size = 8;
  2858					op->vsx_flags = VSX_CHECK_VEC;
  2859					break;
  2860				case 43:	/* plxssp */
  2861					op->reg = rd + 32;
  2862					op->type = MKOP(LOAD_VSX, PREFIXED, 4);
  2863					op->element_size = 8;
  2864					op->vsx_flags = VSX_FPCONV | VSX_CHECK_VEC;
  2865					break;
  2866				case 46:	/* pstxsd */
  2867					op->reg = rd + 32;
  2868					op->type = MKOP(STORE_VSX, PREFIXED, 8);
  2869					op->element_size = 8
Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
On Thu, Oct 08, 2020 at 09:34:26PM +1100, Michael Ellerman wrote: > Jann Horn writes: > > So while the mprotect() case > > checks the flags and refuses unknown values, the mmap() code just lets > > the architecture figure out which bits are actually valid to set (via > > arch_calc_vm_prot_bits()) and silently ignores the rest? > > > > And powerpc apparently decided that they do want to error out on bogus > > prot values passed to their version of mmap(), and in exchange, assume > > in arch_calc_vm_prot_bits() that the protection bits are valid? > > I don't think we really decided that, it just happened by accident and > no one noticed/complained. > > Seems userspace is pretty well behaved when it comes to passing prot > values to mmap(). It's not necessarily about well behaved but whether it can have security implications. On arm64, if the underlying memory does not support MTE (say some DAX mmap) but we still allow PROT_MTE driven by user, it will lead to an SError which brings the whole machine down. Not sure whether ADI has similar requirements but at least for arm64 we addressed the mmap() case as well (see my other email on the details; I think the approach would work on SPARC as well). -- Catalin
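The asymmetry under discussion — mprotect() rejecting unknown prot bits while mmap() silently masks them via arch_calc_vm_prot_bits() — can be sketched in plain user-space C. All constants below are made up for illustration; they are not the real PROT_* values or any arch's support mask:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only -- not the real kernel/arch constants. */
#define SK_PROT_READ  0x1UL
#define SK_PROT_WRITE 0x2UL
#define SK_PROT_EXEC  0x4UL
#define SK_PROT_MTE   0x20UL  /* stands in for an arch-specific bit */

#define SK_SUPPORTED (SK_PROT_READ | SK_PROT_WRITE | SK_PROT_EXEC | SK_PROT_MTE)

/* mprotect()-style validation: refuse any bit the arch does not know about. */
static bool arch_validate_prot_sketch(unsigned long prot)
{
	return (prot & ~SK_SUPPORTED) == 0;
}

/* mmap()-style conversion: silently keep only the understood bits. */
static unsigned long arch_calc_vm_prot_bits_sketch(unsigned long prot)
{
	return prot & SK_SUPPORTED;
}
```

The second helper never fails, which is why a bogus bit passed to mmap() is dropped rather than reported — the behaviour powerpc's do_mmap2() deviates from by calling the validating hook as well.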
Re: [PATCH] powerpc/64: Make VDSO32 track COMPAT on 64-bit
Srikar Dronamraju writes:
> * Michael Ellerman [2020-09-17 21:28:46]:
>
>> On Tue, 8 Sep 2020 22:58:50 +1000, Michael Ellerman wrote:
>> > When we added the VDSO32 kconfig symbol, which controls building of
>> > the 32-bit VDSO, we made it depend on CPU_BIG_ENDIAN (for 64-bit).
>> >
>> > That was because back then COMPAT was always enabled for 64-bit, so
>> > depending on it would have left the 32-bit VDSO always enabled, which
>> > we didn't want.
>> >
>> > [...]
>>
>> Applied to powerpc/next.
>>
>> [1/1] powerpc/64: Make VDSO32 track COMPAT on 64-bit
>> https://git.kernel.org/powerpc/c/231b232df8f67e7d37af01259c21f2a131c3911e
>>
>> cheers
>
> With this commit which is part of powerpc/next and with
> /opt/at12.0/bin/gcc --version
> gcc (GCC) 8.4.1 20191125 (Advance-Toolchain 12.0-3) [e25f27eea473]
> throws up a compile error on a witherspoon/PowerNV with CONFIG_COMPAT.
> CONFIG_COMPAT got carried from the distro config. (And looks like most
> distros seem to be having this config)

This distro config will have it because previously it couldn't be disabled. But now that it's selectable all LE distros should disable it.

> cc1: error: '-m32' not supported in this configuration
> make[4]: *** [arch/powerpc/kernel/vdso32/sigtramp.o] Error 1
> make[4]: *** Waiting for unfinished jobs
> cc1: error: '-m32' not supported in this configuration
> make[4]: *** [arch/powerpc/kernel/vdso32/gettimeofday.o] Error 1
> make[3]: *** [arch/powerpc/kernel/vdso32] Error 2
> make[3]: *** Waiting for unfinished jobs
> make[2]: *** [arch/powerpc/kernel] Error 2
> make[2]: *** Waiting for unfinished jobs
> make[1]: *** [arch/powerpc] Error 2
> make[1]: *** Waiting for unfinished jobs
> make: *** [__sub-make] Error 2
>
> I don't seem to be facing this with other compilers like "gcc (Ubuntu
> 7.4.0-1ubuntu1~18.04.1) 7.4.0" and I was able to disable CONFIG_COMPAT and
> proceed with the build.

It seems your compiler doesn't support building 32-bit binaries.
I'm pretty sure the kernel.org ones do, or you can just turn off COMPAT. cheers
[PATCH 4/4] powerpc/perf: Exclude kernel samples while counting events in user space.
By setting exclude_kernel for user space profiling, we set the freeze bits in the Monitor Mode Control Register. Due to a hardware limitation, the Sampled Instruction Address Register (SIAR) sometimes captures a kernel address even when the counter freeze bits are set in Monitor Mode Control Register 2 (MMCR2). Add a check to drop these samples under such conditions.

Signed-off-by: Athira Rajeev
---
 arch/powerpc/perf/core-book3s.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index c018004..10a2d1f 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2143,6 +2143,18 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 	perf_event_update_userpage(event);
 
 	/*
+	 * Setting exclude_kernel will only freeze the
+	 * Performance Monitor counters and we may have
+	 * a kernel address captured in SIAR. Hence drop
+	 * the kernel sample captured during user space
+	 * profiling. Setting `record` to zero will also
+	 * make sure event throttling is handled.
+	 */
+	if (event->attr.exclude_kernel && record)
+		if (is_kernel_addr(mfspr(SPRN_SIAR)))
+			record = 0;
+
+	/*
 	 * Finally record data if requested.
 	 */
 	if (record) {
-- 
1.8.3.1
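For readers following along, the filter the patch adds can be modelled in user-space C. is_kernel_addr() is mocked here with the simplified ppc64 convention that kernel virtual addresses start at 0xc000000000000000; the real helper is more nuanced:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock of is_kernel_addr(): simplified ppc64 kernel address boundary. */
static bool is_kernel_addr_mock(unsigned long addr)
{
	return addr >= 0xc000000000000000UL;
}

/* Mirrors the patch: if the event excludes the kernel but SIAR holds a
 * kernel address, clear `record` so the sample is dropped while the
 * throttling path (which tests `record`) still runs normally. */
static int filter_sample(bool exclude_kernel, int record, unsigned long siar)
{
	if (exclude_kernel && record && is_kernel_addr_mock(siar))
		record = 0;
	return record;
}
```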
[PATCH 2/4] powerpc/perf: Using SIER[CMPL] instead of SIER[SIAR_VALID]
On power10 DD1, there is an issue that causes the SIAR_VALID bit of Sampled Instruction Event Register(SIER) not to be set. But the SIAR_VALID bit is used for fetching the instruction address from Sampled Instruction Address Register(SIAR), and marked events are sampled only if the SIAR_VALID bit is set. So add a condition check for power10 DD1 to use SIER[CMPL] bit instead. Signed-off-by: Athira Rajeev --- arch/powerpc/perf/core-book3s.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 08643cb..d766090 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -350,7 +350,14 @@ static inline int siar_valid(struct pt_regs *regs) int marked = mmcra & MMCRA_SAMPLE_ENABLE; if (marked) { - if (ppmu->flags & PPMU_HAS_SIER) + /* +* SIER[SIAR_VALID] is not set for some +* marked events on power10 DD1, so use +* SIER[CMPL] instead. +*/ + if (ppmu->flags & PPMU_P10_DD1) + return regs->dar & 0x1; + else if (ppmu->flags & PPMU_HAS_SIER) return regs->dar & SIER_SIAR_VALID; if (ppmu->flags & PPMU_SIAR_VALID) -- 1.8.3.1
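The bit-selection logic siar_valid() ends up with after this change can be sketched as follows. The flag values and the SIAR_VALID bit position here are placeholders, not the real SIER layout; the only detail taken from the patch is that the DD1 path tests the low bit (CMPL):

```c
#include <assert.h>
#include <stdbool.h>

#define SK_PPMU_HAS_SIER   0x0100UL    /* placeholder flag values */
#define SK_PPMU_P10_DD1    0x0400UL
#define SK_SIER_SIAR_VALID 0x0400000UL /* placeholder bit position */
#define SK_SIER_CMPL       0x1UL       /* the patch tests the low bit */

static bool siar_valid_sketch(unsigned long pmu_flags, unsigned long sier)
{
	if (pmu_flags & SK_PPMU_P10_DD1)   /* DD1: SIAR_VALID never set */
		return sier & SK_SIER_CMPL;
	if (pmu_flags & SK_PPMU_HAS_SIER)
		return sier & SK_SIER_SIAR_VALID;
	return true;                       /* other fallbacks omitted */
}
```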
[PATCH 3/4] powerpc/perf: Use the address from SIAR register to set cpumode flags
While setting the processor mode for any sample, `perf_get_misc_flags` expects the privilege level to differentiate the userspace and kernel address. On power10 DD1, there is an issue that causes [MSR_HV MSR_PR] bits of Sampled Instruction Event Register (SIER) not to be set for marked events. Hence add a check to use the address in Sampled Instruction Address Register (SIAR) to identify the privilege level. Signed-off-by: Athira Rajeev --- arch/powerpc/perf/core-book3s.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index d766090..c018004 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -250,11 +250,25 @@ static inline u32 perf_flags_from_msr(struct pt_regs *regs) static inline u32 perf_get_misc_flags(struct pt_regs *regs) { bool use_siar = regs_use_siar(regs); + unsigned long mmcra = regs->dsisr; + int marked = mmcra & MMCRA_SAMPLE_ENABLE; if (!use_siar) return perf_flags_from_msr(regs); /* +* Check the address in SIAR to identify the +* privilege levels since the SIER[MSR_HV, MSR_PR] +* bits are not set for marked events in power10 +* DD1. +*/ + if (marked && (ppmu->flags & PPMU_P10_DD1)) { + if (is_kernel_addr(mfspr(SPRN_SIAR))) + return PERF_RECORD_MISC_KERNEL; + return PERF_RECORD_MISC_USER; + } + + /* * If we don't have flags in MMCRA, rather than using * the MSR, we intuit the flags from the address in * SIAR which should give slightly more reliable -- 1.8.3.1
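The classification this patch adds boils down to: on affected hardware, derive the cpumode from the SIAR address rather than from the unreliable SIER privilege bits. A user-space sketch with a mocked address boundary and invented return values:

```c
#include <assert.h>
#include <stdbool.h>

enum sk_misc { SK_MISC_USER, SK_MISC_KERNEL };

/* Simplified ppc64 kernel address boundary, for illustration only. */
static bool is_kernel_addr_mock(unsigned long addr)
{
	return addr >= 0xc000000000000000UL;
}

/* On a P10 DD1-like PMU, classify a marked sample by its SIAR address
 * because the SIER[MSR_HV, MSR_PR] bits cannot be trusted. */
static enum sk_misc classify_marked_sample(unsigned long siar)
{
	return is_kernel_addr_mock(siar) ? SK_MISC_KERNEL : SK_MISC_USER;
}
```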
[PATCH 1/4] powerpc/perf: Add new power pmu flag "PPMU_P10_DD1" for power10 DD1
Add a new power PMU flag "PPMU_P10_DD1" which can be used to conditionally add any code path for power10 DD1 processor version. Also modify power10 PMU driver code to set this flag only for DD1, based on the Processor Version Register (PVR) value. Signed-off-by: Athira Rajeev --- arch/powerpc/include/asm/perf_event_server.h | 1 + arch/powerpc/perf/power10-pmu.c | 6 ++ 2 files changed, 7 insertions(+) diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h index f6acabb..3b7baba 100644 --- a/arch/powerpc/include/asm/perf_event_server.h +++ b/arch/powerpc/include/asm/perf_event_server.h @@ -82,6 +82,7 @@ struct power_pmu { #define PPMU_ARCH_207S 0x0080 /* PMC is architecture v2.07S */ #define PPMU_NO_SIAR 0x0100 /* Do not use SIAR */ #define PPMU_ARCH_31 0x0200 /* Has MMCR3, SIER2 and SIER3 */ +#define PPMU_P10_DD1 0x0400 /* Is power10 DD1 processor version */ /* * Values for flags to get_alternatives() diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c index 8314865..47d930a 100644 --- a/arch/powerpc/perf/power10-pmu.c +++ b/arch/powerpc/perf/power10-pmu.c @@ -404,6 +404,7 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter) int init_power10_pmu(void) { + unsigned int pvr; int rc; /* Comes from cpu_specs[] */ @@ -411,6 +412,11 @@ int init_power10_pmu(void) strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power10")) return -ENODEV; + pvr = mfspr(SPRN_PVR); + /* Add the ppmu flag for power10 DD1 */ + if ((PVR_CFG(pvr) == 1)) + power10_pmu.flags |= PPMU_P10_DD1; + /* Set the PERF_REG_EXTENDED_MASK here */ PERF_REG_EXTENDED_MASK = PERF_REG_PMU_MASK_31; -- 1.8.3.1
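The DD1 detection keys off the configuration field of the PVR. A sketch of the idea, assuming a hypothetical PVR layout where bits 8-11 carry the configuration/major revision (see the real PVR_CFG() definition in arch/powerpc/include/asm/reg.h for the authoritative layout):

```c
#include <assert.h>

#define SK_PPMU_P10_DD1 0x0400UL

/* Hypothetical PVR_CFG(): bits 8-11 as the configuration field. */
static unsigned int pvr_cfg_sketch(unsigned int pvr)
{
	return (pvr >> 8) & 0xF;
}

/* Set the DD1 flag only when the configuration field reads 1. */
static unsigned long pmu_flags_for_pvr(unsigned long flags, unsigned int pvr)
{
	if (pvr_cfg_sketch(pvr) == 1)  /* DD1 */
		flags |= SK_PPMU_P10_DD1;
	return flags;
}
```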
[PATCH 0/4] powerpc/perf: Power PMU fixes for power10 DD1
The patch series addresses PMU fixes for power10 DD1 Patch1 introduces a new power pmu flag to include conditional code changes for power10 DD1. Patch2 and Patch3 includes fixes in core-book3s to address issues with marked events during sampling. Patch4 includes fix to drop kernel samples while userspace profiling. Athira Rajeev (4): powerpc/perf: Add new power pmu flag "PPMU_P10_DD1" for power10 DD1 powerpc/perf: Using SIER[CMPL] instead of SIER[SIAR_VALID] powerpc/perf: Use the address from SIAR register to set cpumode flags powerpc/perf: Exclude kernel samples while counting events in user space. arch/powerpc/include/asm/perf_event_server.h | 1 + arch/powerpc/perf/core-book3s.c | 35 +++- arch/powerpc/perf/power10-pmu.c | 6 + 3 files changed, 41 insertions(+), 1 deletion(-) -- 1.8.3.1
Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
Jann Horn writes: > On Wed, Oct 7, 2020 at 2:35 PM Christoph Hellwig wrote: >> On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote: >> > diff --git a/arch/powerpc/kernel/syscalls.c >> > b/arch/powerpc/kernel/syscalls.c >> > index 078608ec2e92..b1fabb97d138 100644 >> > --- a/arch/powerpc/kernel/syscalls.c >> > +++ b/arch/powerpc/kernel/syscalls.c >> > @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t >> > len, >> > { >> > long ret = -EINVAL; >> > >> > - if (!arch_validate_prot(prot, addr)) >> > + if (!arch_validate_prot(prot, addr, len)) >> >> This call isn't under mmap lock. I also find it rather weird as the >> generic code only calls arch_validate_prot from mprotect, only powerpc >> also calls it from mmap. >> >> This seems to go back to commit ef3d3246a0d0 >> ("powerpc/mm: Add Strong Access Ordering support") > > I'm _guessing_ the idea in the generic case might be that mmap() > doesn't check unknown bits in the protection flags, and therefore > maybe people wanted to avoid adding new error cases that could be > caused by random high bits being set? I suspect it's just that when we added it we updated our do_mmap2() and didn't touch the generic version because we didn't need to. ie. it's not intentional it's just a buglet. I think this is the original submission: https://lore.kernel.org/linuxppc-dev/20080610220055.10257.84465.sendpatch...@norville.austin.ibm.com/ Which only calls arch_validate_prot() from mprotect and the powerpc code, and there was no discussion about adding it elsewhere. > So while the mprotect() case > checks the flags and refuses unknown values, the mmap() code just lets > the architecture figure out which bits are actually valid to set (via > arch_calc_vm_prot_bits()) and silently ignores the rest? > > And powerpc apparently decided that they do want to error out on bogus > prot values passed to their version of mmap(), and in exchange, assume > in arch_calc_vm_prot_bits() that the protection bits are valid? 
I don't think we really decided that, it just happened by accident and no one noticed/complained. Seems userspace is pretty well behaved when it comes to passing prot values to mmap(). > powerpc's arch_validate_prot() doesn't actually need the mmap lock, so > I think this is fine-ish for now (as in, while the code is a bit > unclean, I don't think I'm making it worse, and I don't think it's > actually buggy). In theory, we could move the arch_validate_prot() > call over into the mmap guts, where we're holding the lock, and gate > it on the architecture or on some feature CONFIG that powerpc can > activate in its Kconfig. But I'm not sure whether that'd be helping or > making things worse, so when I sent this patch, I deliberately left > the powerpc stuff as-is. I think what you've done is fine, and anything more elaborate is not worth the effort. cheers
Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote: > arch_validate_prot() is a hook that can validate whether a given set of > protection flags is valid in an mprotect() operation. It is given the set > of protection flags and the address being modified. > > However, the address being modified can currently not actually be used in > a meaningful way because: > > 1. Only the address is given, but not the length, and the operation can >span multiple VMAs. Therefore, the callee can't actually tell which >virtual address range, or which VMAs, are being targeted. > 2. The mmap_lock is not held, meaning that if the callee were to check >the VMA at @addr, that VMA would be unrelated to the one the >operation is performed on. > > Currently, custom arch_validate_prot() handlers are defined by > arm64, powerpc and sparc. > arm64 and powerpc don't care about the address range, they just check the > flags against CPU support masks. > sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take > the mmap_lock. > > Change the function signature to also take a length, and move the > arch_validate_prot() call in mm/mprotect.c down into the locked region. For arm64 mte, I noticed the arch_validate_prot() issue with multiple vmas and addressed this in a different way: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte=c462ac288f2c97e0c1d9ff6a65955317e799f958 https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte=0042090548740921951f31fc0c20dcdb96638cb0 Both patches queued for 5.10. Basically, arch_calc_vm_prot_bits() returns a VM_MTE if PROT_MTE has been requested. The newly introduced arch_validate_flags() will check the VM_MTE flag against what the system supports and this covers both mmap() and mprotect(). Note that arch_validate_prot() only handles the latter and I don't think it's sufficient for SPARC ADI. For arm64 MTE we definitely wanted mmap() flags to be validated. 
In addition, there's a new arch_calc_vm_flag_bits() which allows us to set a VM_MTE_ALLOWED on a vma if the conditions are right (MAP_ANONYMOUS or shmem_mmap(): https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte=b3fbbea4c00220f62e6f7e2514466e6ee81f7f60 -- Catalin
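The two-hook split described here — arch_calc_vm_flag_bits() marking a vma as capable of tagged memory, and arch_validate_flags() rejecting VM_MTE wherever that mark is absent — can be sketched like this (flag values invented, support condition reduced to a boolean):

```c
#include <assert.h>
#include <stdbool.h>

#define SK_VM_MTE         0x1UL /* invented flag values */
#define SK_VM_MTE_ALLOWED 0x2UL

/* Stand-in for arch_calc_vm_flag_bits(): only mappings that can actually
 * back tagged memory (e.g. MAP_ANONYMOUS or shmem) get the ALLOWED bit. */
static unsigned long calc_vm_flag_bits_sketch(bool anon_or_shmem)
{
	return anon_or_shmem ? SK_VM_MTE_ALLOWED : 0;
}

/* arch_validate_flags()-style check, run for both mmap() and mprotect():
 * VM_MTE is only acceptable on a vma marked as MTE-capable. */
static bool validate_flags_sketch(unsigned long vm_flags)
{
	if (vm_flags & SK_VM_MTE)
		return (vm_flags & SK_VM_MTE_ALLOWED) != 0;
	return true;
}
```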
Re: [PATCH v2 1/2] powerpc/rtas: Restrict RTAS requests from userspace
Michael Ellerman writes: > Andrew Donnellan writes: >> On 26/8/20 11:53 pm, Sasha Levin wrote: >>> How should we proceed with this patch? >> >> mpe: I believe we came to the conclusion that we shouldn't put this in >> stable just yet? > > Yeah. > > Let's give it a little time to get some wider testing before we backport > it. So my fault for not dropping the Cc: stable on the commit, sorry. cheers
Re: [PATCH v2 1/2] powerpc/rtas: Restrict RTAS requests from userspace
Andrew Donnellan writes: > On 26/8/20 11:53 pm, Sasha Levin wrote: >> How should we proceed with this patch? > > mpe: I believe we came to the conclusion that we shouldn't put this in > stable just yet? Yeah. Let's give it a little time to get some wider testing before we backport it. cheers
Re: [PATCH] crypto: talitos - Fix sparse warnings
On 07/10/2020 at 08:50, Herbert Xu wrote:
> On Sat, Oct 03, 2020 at 07:15:53PM +0200, Christophe Leroy wrote:
>> The following changes fix the sparse warnings with less churn:
>
> Yes that works too. Can you please submit this patch?

That change fixes two independent commits from the past, so I sent out two separate fix patches.

Christophe
mm: Question about the use of 'accessed' flags and pte_young() helper
In a 10-year-old commit (https://github.com/linuxppc/linux/commit/d069cb4373fe0d451357c4d3769623a7564dfa9f), powerpc 8xx made the handling of the PTE accessed bit conditional on CONFIG_SWAP. Since then, this has been extended to some other powerpc variants.

That commit means that when CONFIG_SWAP is not selected, the accessed bit is not set by SW TLB miss handlers, leading to pte_young() returning garbage — or should I say possibly returning false although a page has been accessed since its access flag was reset.

Looking at various mm/ places, pte_young() is used independently of CONFIG_SWAP.

Is it still valid to not manage accessed flags when CONFIG_SWAP is not selected? If yes, should pte_young() always return true in that case?

While we are at it, I'm wondering whether powerpc should redefine arch_faults_on_old_pte(). On some variants of powerpc, the accessed flag is managed by HW. On others, it is managed by SW TLB miss handlers via page fault handling.

Thanks
Christophe
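To make the question concrete: with a software-managed accessed bit that the TLB miss handler only maintains in one configuration, pte_young() can report a freshly accessed page as old. A toy user-space model (bit layout invented, the CONFIG_SWAP dependency reduced to a boolean):

```c
#include <assert.h>
#include <stdbool.h>

#define SK_PTE_ACCESSED 0x1UL /* invented software-maintained bit */

struct sk_pte { unsigned long val; };

static bool pte_young_sketch(struct sk_pte p)
{
	return p.val & SK_PTE_ACCESSED;
}

/* Models a SW TLB miss: the accessed bit is only updated when the
 * handler is built to maintain it (the CONFIG_SWAP=y case). */
static struct sk_pte tlb_miss_sketch(struct sk_pte p, bool maintain_accessed)
{
	if (maintain_accessed)
		p.val |= SK_PTE_ACCESSED;
	return p;
}
```

With maintain_accessed false, a page that was just touched through the TLB still looks old to pte_young(), which is the inconsistency being asked about.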
[PATCH] crypto: talitos - Fix return type of current_desc_hdr()
current_desc_hdr() returns a u32, but it is in fact a __be32, leading to a lot of sparse warnings. Change the return type to __be32 and ensure it is handled as such by the caller.

Fixes: 3e721aeb3df3 ("crypto: talitos - handle descriptor not found in error path")
Signed-off-by: Christophe Leroy
---
 drivers/crypto/talitos.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 7c547352a862..f9f0d34d49f3 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -460,7 +460,7 @@ DEF_TALITOS2_DONE(ch1_3, TALITOS2_ISR_CH_1_3_DONE)
 /*
  * locate current (offending) descriptor
  */
-static u32 current_desc_hdr(struct device *dev, int ch)
+static __be32 current_desc_hdr(struct device *dev, int ch)
 {
 	struct talitos_private *priv = dev_get_drvdata(dev);
 	int tail, iter;
@@ -501,13 +501,13 @@ static u32 current_desc_hdr(struct device *dev, int ch)
 /*
  * user diagnostics; report root cause of error based on execution unit status
  */
-static void report_eu_error(struct device *dev, int ch, u32 desc_hdr)
+static void report_eu_error(struct device *dev, int ch, __be32 desc_hdr)
 {
 	struct talitos_private *priv = dev_get_drvdata(dev);
 	int i;
 
 	if (!desc_hdr)
-		desc_hdr = in_be32(priv->chan[ch].reg + TALITOS_DESCBUF);
+		desc_hdr = cpu_to_be32(in_be32(priv->chan[ch].reg + TALITOS_DESCBUF));
 
 	switch (desc_hdr & DESC_HDR_SEL0_MASK) {
 	case DESC_HDR_SEL0_AFEU:
-- 
2.25.0
[PATCH] crypto: talitos - Endianness in current_desc_hdr()
current_desc_hdr() compares the value of the current descriptor with the next_desc member of the talitos_desc struct. While the current descriptor is obtained from in_be32() which return CPU ordered bytes, next_desc member is in big endian order. Convert the current descriptor into big endian before comparing it with next_desc. This fixes a sparse warning. Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on SEC1") Signed-off-by: Christophe Leroy --- drivers/crypto/talitos.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index f9f0d34d49f3..992d58a4dbf1 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -478,7 +478,7 @@ static __be32 current_desc_hdr(struct device *dev, int ch) iter = tail; while (priv->chan[ch].fifo[iter].dma_desc != cur_desc && - priv->chan[ch].fifo[iter].desc->next_desc != cur_desc) { + priv->chan[ch].fifo[iter].desc->next_desc != cpu_to_be32(cur_desc)) { iter = (iter + 1) & (priv->fifo_len - 1); if (iter == tail) { dev_err(dev, "couldn't locate current descriptor\n"); @@ -486,7 +486,7 @@ static __be32 current_desc_hdr(struct device *dev, int ch) } } - if (priv->chan[ch].fifo[iter].desc->next_desc == cur_desc) { + if (priv->chan[ch].fifo[iter].desc->next_desc == cpu_to_be32(cur_desc)) { struct talitos_edesc *edesc; edesc = container_of(priv->chan[ch].fifo[iter].desc, -- 2.25.0
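The underlying pitfall in both talitos fixes is comparing a CPU-order value (read via in_be32()) against a field stored in big-endian order. A user-space illustration of why the conversion matters — this is a hand-rolled swap assuming a little-endian host, where cpu_to_be32 reduces to a byte swap (on big-endian it would be the identity):

```c
#include <assert.h>
#include <stdint.h>

/* What cpu_to_be32() boils down to on a little-endian host: a byte swap. */
static uint32_t cpu_to_be32_le_host(uint32_t x)
{
	return ((x & 0x000000ffU) << 24) | ((x & 0x0000ff00U) << 8) |
	       ((x & 0x00ff0000U) >> 8)  | ((x & 0xff000000U) >> 24);
}
```

Comparing a raw descriptor address against a __be32 next_desc only works if one side is converted first, which is exactly what the cpu_to_be32(cur_desc) in the patch does.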
[RFC PATCH] mm: Fetch the dirty bit before we reset the pte
In copy_present_page, after we mark the pte non-writable, we should check for previous dirty bit updates and make sure we don't lose the dirty bit on reset. Also, avoid marking the pte write-protected again if copy_present_page already marked it write-protected. Cc: Peter Xu Cc: Jason Gunthorpe Cc: John Hubbard Cc: linux...@kvack.org Cc: linux-ker...@vger.kernel.org Cc: Andrew Morton Cc: Jan Kara Cc: Michal Hocko Cc: Kirill Shutemov Cc: Hugh Dickins Cc: Linus Torvalds Signed-off-by: Aneesh Kumar K.V --- mm/memory.c | 8 1 file changed, 8 insertions(+) diff --git a/mm/memory.c b/mm/memory.c index bfe202ef6244..f57b1f04d50a 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -848,6 +848,9 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (likely(!page_maybe_dma_pinned(page))) return 1; + if (pte_dirty(*src_pte)) + pte = pte_mkdirty(pte); + /* * Uhhuh. It looks like the page might be a pinned page, * and we actually need to copy it. Now we can set the @@ -904,6 +907,11 @@ copy_present_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, if (retval <= 0) return retval; + /* +* Fetch the src pte value again, copy_present_page +* could modify it. +*/ + pte = *src_pte; get_page(page); page_dup_rmap(page, false); rss[mm_counter(page)]++; -- 2.26.2
[PATCH] mm: Avoid using set_pte_at when updating a present pte
This avoids the below warning WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 30613 Comm: syz-executor.0 Not tainted 5.9.0-rc8-syzkaller-00156-gc85fb28b6f99 #0 Call Trace: [c01cd1f0] panic+0x29c/0x75c kernel/panic.c:231 [c01cce24] __warn+0x104/0x1b8 kernel/panic.c:600 [c0d829e4] report_bug+0x1d4/0x380 lib/bug.c:198 [c0036800] program_check_exception+0x4e0/0x750 arch/powerpc/kernel/traps.c:1508 [c00098a8] program_check_common_virt+0x308/0x360 --- interrupt: 700 at set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185 LR = set_pte_at+0x2a4/0x3a0 arch/powerpc/mm/pgtable.c:185 [c05d2a7c] copy_present_page mm/memory.c:857 [inline] [c05d2a7c] copy_present_pte mm/memory.c:899 [inline] [c05d2a7c] copy_pte_range mm/memory.c:1014 [inline] [c05d2a7c] copy_pmd_range mm/memory.c:1092 [inline] [c05d2a7c] copy_pud_range mm/memory.c:1127 [inline] [c05d2a7c] copy_p4d_range mm/memory.c:1150 [inline] [c05d2a7c] copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212 [c01c63cc] dup_mmap kernel/fork.c:592 [inline] [c01c63cc] dup_mm+0x77c/0xab0 kernel/fork.c:1355 [c01c8f70] copy_mm kernel/fork.c:1411 [inline] [c01c8f70] copy_process+0x1f00/0x2740 kernel/fork.c:2070 [c01c9b54] _do_fork+0xc4/0x10b0 kernel/fork.c:2429 [c01caf54] __do_sys_clone3+0x1d4/0x2b0 kernel/fork.c:27 Architecture like ppc64 expects set_pte_at to be not used for updating a valid pte. 
This is further explained in commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit") Cc: Peter Xu Cc: Jason Gunthorpe Cc: John Hubbard Cc: linux...@kvack.org Cc: linux-ker...@vger.kernel.org Cc: Andrew Morton Cc: Jan Kara Cc: Michal Hocko Cc: Kirill Shutemov Cc: Hugh Dickins Cc: Linus Torvalds Signed-off-by: Aneesh Kumar K.V --- mm/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index fcfc4ca36eba..bfe202ef6244 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -854,7 +854,7 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm, * source pte back to being writable. */ if (pte_write(pte)) - set_pte_at(src_mm, addr, src_pte, pte); + ptep_set_access_flags(vma, addr, src_pte, pte, 1); new_page = *prealloc; if (!new_page) -- 2.26.2
[PATCH v4 4/4] powerpc/sstep: Add testcases for VSX vector paired load/store instructions
From: Balamuruhan S Add testcases for VSX vector paired load/store instructions. Sample o/p: emulate_step_test: lxvp : PASS emulate_step_test: stxvp : PASS emulate_step_test: lxvpx : PASS emulate_step_test: stxvpx : PASS emulate_step_test: plxvp : PASS emulate_step_test: pstxvp : PASS Signed-off-by: Balamuruhan S Signed-off-by: Ravi Bangoria --- arch/powerpc/lib/test_emulate_step.c | 270 +++ 1 file changed, 270 insertions(+) diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c index 0a201b771477..783d1b85ecfe 100644 --- a/arch/powerpc/lib/test_emulate_step.c +++ b/arch/powerpc/lib/test_emulate_step.c @@ -612,6 +612,273 @@ static void __init test_lxvd2x_stxvd2x(void) } #endif /* CONFIG_VSX */ +#ifdef CONFIG_VSX +static void __init test_lxvp_stxvp(void) +{ + struct pt_regs regs; + union { + vector128 a; + u32 b[4]; + } c[2]; + u32 cached_b[8]; + int stepped = -1; + + if (!cpu_has_feature(CPU_FTR_ARCH_31)) { + show_result("lxvp", "SKIP (!CPU_FTR_ARCH_31)"); + show_result("stxvp", "SKIP (!CPU_FTR_ARCH_31)"); + return; + } + + init_pt_regs(); + + /*** lxvp ***/ + + cached_b[0] = c[0].b[0] = 18233; + cached_b[1] = c[0].b[1] = 34863571; + cached_b[2] = c[0].b[2] = 834; + cached_b[3] = c[0].b[3] = 6138911; + cached_b[4] = c[1].b[0] = 1234; + cached_b[5] = c[1].b[1] = 5678; + cached_b[6] = c[1].b[2] = 91011; + cached_b[7] = c[1].b[3] = 121314; + + regs.gpr[4] = (unsigned long)[0].a; + + /* +* lxvp XTp,DQ(RA) +* XTp = 32xTX + 2xTp +* let TX=1 Tp=1 RA=4 DQ=0 +*/ + stepped = emulate_step(, ppc_inst(PPC_RAW_LXVP(34, 4, 0))); + + if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) { + show_result("lxvp", "PASS"); + } else { + if (!cpu_has_feature(CPU_FTR_VSX)) + show_result("lxvp", "PASS (!CPU_FTR_VSX)"); + else + show_result("lxvp", "FAIL"); + } + + /*** stxvp ***/ + + c[0].b[0] = 21379463; + c[0].b[1] = 87; + c[0].b[2] = 374234; + c[0].b[3] = 4; + c[1].b[0] = 90; + c[1].b[1] = 122; + c[1].b[2] = 555; + c[1].b[3] = 32144; + + /* +* stxvp 
XSp,DQ(RA) +* XSp = 32xSX + 2xSp +* let SX=1 Sp=1 RA=4 DQ=0 +*/ + stepped = emulate_step(, ppc_inst(PPC_RAW_STXVP(34, 4, 0))); + + if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == c[0].b[1] && + cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] && + cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] && + cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] && + cpu_has_feature(CPU_FTR_VSX)) { + show_result("stxvp", "PASS"); + } else { + if (!cpu_has_feature(CPU_FTR_VSX)) + show_result("stxvp", "PASS (!CPU_FTR_VSX)"); + else + show_result("stxvp", "FAIL"); + } +} +#else +static void __init test_lxvp_stxvp(void) +{ + show_result("lxvp", "SKIP (CONFIG_VSX is not set)"); + show_result("stxvp", "SKIP (CONFIG_VSX is not set)"); +} +#endif /* CONFIG_VSX */ + +#ifdef CONFIG_VSX +static void __init test_lxvpx_stxvpx(void) +{ + struct pt_regs regs; + union { + vector128 a; + u32 b[4]; + } c[2]; + u32 cached_b[8]; + int stepped = -1; + + if (!cpu_has_feature(CPU_FTR_ARCH_31)) { + show_result("lxvpx", "SKIP (!CPU_FTR_ARCH_31)"); + show_result("stxvpx", "SKIP (!CPU_FTR_ARCH_31)"); + return; + } + + init_pt_regs(); + + /*** lxvpx ***/ + + cached_b[0] = c[0].b[0] = 18233; + cached_b[1] = c[0].b[1] = 34863571; + cached_b[2] = c[0].b[2] = 834; + cached_b[3] = c[0].b[3] = 6138911; + cached_b[4] = c[1].b[0] = 1234; + cached_b[5] = c[1].b[1] = 5678; + cached_b[6] = c[1].b[2] = 91011; + cached_b[7] = c[1].b[3] = 121314; + + regs.gpr[3] = (unsigned long)[0].a; + regs.gpr[4] = 0; + + /* +* lxvpx XTp,RA,RB +* XTp = 32xTX + 2xTp +* let TX=1 Tp=1 RA=3 RB=4 +*/ + stepped = emulate_step(, ppc_inst(PPC_RAW_LXVPX(34, 3, 4))); + + if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) { + show_result("lxvpx", "PASS"); + } else { + if (!cpu_has_feature(CPU_FTR_VSX)) + show_result("lxvpx", "PASS (!CPU_FTR_VSX)"); + else + show_result("lxvpx", "FAIL"); + } + + /*** stxvpx ***/ + + c[0].b[0] = 21379463; + c[0].b[1] = 87; + c[0].b[2] = 374234; +
[PATCH v4 3/4] powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions
From: Balamuruhan S

Add instruction encodings, the DQ, D0 and D1 immediates, and the XTP and
XSP operand fields as macros for the new VSX vector paired instructions:

	* Load VSX Vector Paired (lxvp)
	* Load VSX Vector Paired Indexed (lxvpx)
	* Prefixed Load VSX Vector Paired (plxvp)
	* Store VSX Vector Paired (stxvp)
	* Store VSX Vector Paired Indexed (stxvpx)
	* Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao
Signed-off-by: Balamuruhan S
Signed-off-by: Ravi Bangoria
---
 arch/powerpc/include/asm/ppc-opcode.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index a6e3700c4566..5e7918ca4fb7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -78,6 +78,9 @@
 #define IMM_L(i)		((uintptr_t)(i) & 0xffff)
 #define IMM_DS(i)		((uintptr_t)(i) & 0xfffc)
+#define IMM_DQ(i)		((uintptr_t)(i) & 0xfff0)
+#define IMM_D0(i)		(((uintptr_t)(i) >> 16) & 0x3ffff)
+#define IMM_D1(i)		IMM_L(i)
 
 /*
  * 16-bit immediate helper macros: HA() is for use with sign-extending instrs
@@ -295,6 +298,8 @@
 #define __PPC_XB(b)	((((b) & 0x1f) << 11) | (((b) & 0x20) >> 4))
 #define __PPC_XS(s)	((((s) & 0x1f) << 21) | (((s) & 0x20) >> 5))
 #define __PPC_XT(s)	__PPC_XS(s)
+#define __PPC_XSP(s)	((((s) & 0x1e) | (((s) >> 5) & 0x1)) << 21)
+#define __PPC_XTP(s)	__PPC_XSP(s)
 #define __PPC_T_TLB(t)	(((t) & 0x3) << 21)
 #define __PPC_WC(w)	(((w) & 0x3) << 21)
 #define __PPC_WS(w)	(((w) & 0x1f) << 11)
@@ -395,6 +400,14 @@
 #define PPC_RAW_XVCPSGNDP(t, a, b)	((0xf0000780 | VSX_XX3((t), (a), (b))))
 #define PPC_RAW_VPERMXOR(vrt, vra, vrb, vrc) \
 	((0x1000002d | ___PPC_RT(vrt) | ___PPC_RA(vra) | ___PPC_RB(vrb) | (((vrc) & 0x1f) << 6)))
+#define PPC_RAW_LXVP(xtp, a, i)		(0x18000000 | __PPC_XTP(xtp) | ___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_STXVP(xsp, a, i)	(0x18000001 | __PPC_XSP(xsp) | ___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_LXVPX(xtp, a, b)	(0x7c00029a | __PPC_XTP(xtp) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_STXVPX(xsp, a, b)	(0x7c00039a | __PPC_XSP(xsp) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_PLXVP(xtp, i, a, pr) \
+	((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xe8000000 | __PPC_XTP(xtp) | ___PPC_RA(a) | IMM_D1(i)))
+#define PPC_RAW_PSTXVP(xsp, i, a, pr) \
+	((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xf8000000 | __PPC_XSP(xsp) | ___PPC_RA(a) | IMM_D1(i)))
 #define PPC_RAW_NAP			(0x4c000364)
 #define PPC_RAW_SLEEP			(0x4c0003a4)
 #define PPC_RAW_WINKLE			(0x4c0003e4)
-- 
2.26.2
[PATCH v4 2/4] powerpc/sstep: Support VSX vector paired storage access instructions
From: Balamuruhan S

VSX Vector Paired instructions load/store an octword (32 bytes) from/to
storage into two sequential VSRs. Add emulation support for these new
instructions:

	* Load VSX Vector Paired (lxvp)
	* Load VSX Vector Paired Indexed (lxvpx)
	* Prefixed Load VSX Vector Paired (plxvp)
	* Store VSX Vector Paired (stxvp)
	* Store VSX Vector Paired Indexed (stxvpx)
	* Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao
Signed-off-by: Balamuruhan S
Signed-off-by: Ravi Bangoria
---
 arch/powerpc/lib/sstep.c | 146 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 125 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e6242744c71b..e39ee1651636 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -32,6 +32,10 @@ extern char system_call_vectored_emulate[];
 #define XER_OV32	0x00080000U
 #define XER_CA32	0x00040000U
 
+#ifdef CONFIG_VSX
+#define VSX_REGISTER_XTP(rd)	((((rd) & 1) << 5) | ((rd) & 0xfe))
+#endif
+
 #ifdef CONFIG_PPC_FPU
 /*
  * Functions in ldstfp.S
@@ -279,6 +283,19 @@ static nokprobe_inline void do_byte_reverse(void *ptr, int nb)
 		up[1] = tmp;
 		break;
 	}
+	case 32: {
+		unsigned long *up = (unsigned long *)ptr;
+		unsigned long tmp;
+
+		tmp = byterev_8(up[0]);
+		up[0] = byterev_8(up[3]);
+		up[3] = tmp;
+		tmp = byterev_8(up[2]);
+		up[2] = byterev_8(up[1]);
+		up[1] = tmp;
+		break;
+	}
+
 #endif
 	default:
 		WARN_ON_ONCE(1);
@@ -709,6 +726,8 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 	reg->d[0] = reg->d[1] = 0;
 
 	switch (op->element_size) {
+	case 32:
+		/* [p]lxvp[x] */
 	case 16:
 		/* whole vector; lxv[x] or lxvl[l] */
 		if (size == 0)
@@ -717,7 +736,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		if (IS_LE && (op->vsx_flags & VSX_LDLEFT))
 			rev = !rev;
 		if (rev)
-			do_byte_reverse(reg, 16);
+			do_byte_reverse(reg, size);
 		break;
 	case 8:
 		/* scalar loads, lxvd2x, lxvdsx */
@@ -793,6 +812,20 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 	size = GETSIZE(op->type);
 
 	switch (op->element_size) {
+	case 32:
+		/* [p]stxvp[x] */
+		if (size == 0)
+			break;
+		if (rev) {
+			/* reverse 32 bytes */
+			buf.d[0] = byterev_8(reg->d[3]);
+			buf.d[1] = byterev_8(reg->d[2]);
+			buf.d[2] = byterev_8(reg->d[1]);
+			buf.d[3] = byterev_8(reg->d[0]);
+			reg = &buf;
+		}
+		memcpy(mem, reg, size);
+		break;
 	case 16:
 		/* stxv, stxvx, stxvl, stxvll */
 		if (size == 0)
@@ -861,28 +894,43 @@ static nokprobe_inline int do_vsx_load(struct instruction_op *op,
 				       struct pt_regs *regs, unsigned long ea,
 				       bool cross_endian)
 {
 	int reg = op->reg;
-	u8 mem[16];
-	union vsx_reg buf;
+	int i, j, nr_vsx_regs;
+	u8 mem[32];
+	union vsx_reg buf[2];
 	int size = GETSIZE(op->type);
 
 	if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs))
 		return -EFAULT;
 
-	emulate_vsx_load(op, &buf, mem, cross_endian);
+	nr_vsx_regs = size / sizeof(__vector128);
+	emulate_vsx_load(op, buf, mem, cross_endian);
 	preempt_disable();
 	if (reg < 32) {
 		/* FP regs + extensions */
 		if (regs->msr & MSR_FP) {
-			load_vsrn(reg, &buf);
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				load_vsrn(reg + i, &buf[j].v);
+			}
 		} else {
-			current->thread.fp_state.fpr[reg][0] = buf.d[0];
-			current->thread.fp_state.fpr[reg][1] = buf.d[1];
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				current->thread.fp_state.fpr[reg + i][0] = buf[j].d[0];
+				current->thread.fp_state.fpr[reg + i][1] = buf[j].d[1];
+			}
 		}
 	} else {
-		if (regs->msr & MSR_VEC)
-			load_vsrn(reg, &buf);
-		else
-			current->thread.vr_state.vr[reg - 32] = buf.v;
+		if (regs->msr & MSR_VEC) {
+			for (i = 0; i <
[PATCH v4 0/4] powerpc/sstep: VSX 32-byte vector paired load/store instructions
VSX vector paired instructions operate on an octword (32-byte) operand for
loads and stores between storage and a pair of two sequential Vector-Scalar
Registers (VSRs). There are 4 word instructions and 2 prefixed instructions
that provide this 32-byte storage access: lxvp, lxvpx, stxvp, stxvpx, plxvp
and pstxvp. The emulation infrastructure has no support for these
instructions — neither for the 32-byte storage access nor for operating on
2 VSX registers. This patch series adds the instruction emulation support
and test cases for them.

v3: https://lore.kernel.org/linuxppc-dev/20200731081637.1837559-1-bal...@linux.ibm.com/

Changes in v4:
-------------
* Patch #1 is (kind of) new.
* Patch #2 now enables both analyse_instr() and emulate_step(), unlike the
  previous series where they were in separate patches.
* Patch #2 also has an important fix for emulation on LE.
* Patches #3 and #4: added the XSP/XTP and D0/D1 instruction operands,
  removed the *_EX_OP and __PPC_T[P][X] macros which were incorrect, and
  adhered to the PPC_RAW_* convention.
* Added a CPU_FTR_ARCH_31 check in the testcases to avoid failures on p8/p9.
* Some cosmetic changes.
* Rebased onto powerpc/next.

Changes in v3:
-------------
Worked on review comments and suggestions from Ravi and Naveen:
* Fix do_vsx_load() to handle vsx instructions when MSR_FP/MSR_VEC is
  cleared in exception conditions, falling back to reading/writing the
  thread_struct members fp_state/vr_state respectively.
* Fix the wrongly used `__vector128 v[2]` in struct vsx_reg, as it should
  hold a single vsx register's size.
* Remove the unnecessary `VSX_CHECK_VEC` flag set and the condition
  checking `VSX_LDLEFT`, which are not applicable to these vsx
  instructions.
* Fix misleading comments in emulate_vsx_load().
* Rebased on the latest powerpc next branch.

Changes in v2:
-------------
* Fix suggestion from Sandipan: wrap ISA 3.1 instructions with a
  cpu_has_feature(CPU_FTR_ARCH_31) check.
* Rebase on the latest powerpc next branch.
Balamuruhan S (4):
  powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
  powerpc/sstep: Support VSX vector paired storage access instructions
  powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions
  powerpc/sstep: Add testcases for VSX vector paired load/store instructions

 arch/powerpc/include/asm/ppc-opcode.h |  13 ++
 arch/powerpc/lib/sstep.c              | 152 ++++++++++++++++++----
 arch/powerpc/lib/test_emulate_step.c  | 270 ++++++++++++++++++++++++++
 3 files changed, 414 insertions(+), 21 deletions(-)

-- 
2.26.2
[PATCH v4 1/4] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
From: Balamuruhan S

Prefixed instructions are currently emulated unconditionally, which would
allow emulating them on Power10 predecessors and might cause issues.
Restrict emulation to CPUs that have CPU_FTR_ARCH_31 set.

Signed-off-by: Balamuruhan S
Signed-off-by: Ravi Bangoria
---
 arch/powerpc/lib/sstep.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e9dcaba9a4f8..e6242744c71b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1346,6 +1346,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 	switch (opcode) {
 #ifdef __powerpc64__
 	case 1:
+		if (!cpu_has_feature(CPU_FTR_ARCH_31))
+			return -1;
+
 		prefix_r = GET_PREFIX_R(word);
 		ra = GET_PREFIX_RA(suffix);
 		rd = (suffix >> 21) & 0x1f;
@@ -2733,6 +2736,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		}
 		break;
 	case 1: /* Prefixed instructions */
+		if (!cpu_has_feature(CPU_FTR_ARCH_31))
+			return -1;
+
 		prefix_r = GET_PREFIX_R(word);
 		ra = GET_PREFIX_RA(suffix);
 		op->update_reg = ra;
-- 
2.26.2
Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
On Wed, Oct 07, 2020 at 04:42:55PM +0200, Jann Horn wrote: > > > @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t > > > len, > > > { > > > long ret = -EINVAL; > > > > > > - if (!arch_validate_prot(prot, addr)) > > > + if (!arch_validate_prot(prot, addr, len)) > > > > This call isn't under mmap lock. I also find it rather weird as the > > generic code only calls arch_validate_prot from mprotect, only powerpc > > also calls it from mmap. > > > > This seems to go back to commit ef3d3246a0d0 > > ("powerpc/mm: Add Strong Access Ordering support") > > I'm _guessing_ the idea in the generic case might be that mmap() > doesn't check unknown bits in the protection flags, and therefore > maybe people wanted to avoid adding new error cases that could be > caused by random high bits being set? So while the mprotect() case > checks the flags and refuses unknown values, the mmap() code just lets > the architecture figure out which bits are actually valid to set (via > arch_calc_vm_prot_bits()) and silently ignores the rest? > > And powerpc apparently decided that they do want to error out on bogus > prot values passed to their version of mmap(), and in exchange, assume > in arch_calc_vm_prot_bits() that the protection bits are valid? The problem really is that now programs behave different on powerpc compared to all other architectures. > powerpc's arch_validate_prot() doesn't actually need the mmap lock, so > I think this is fine-ish for now (as in, while the code is a bit > unclean, I don't think I'm making it worse, and I don't think it's > actually buggy). In theory, we could move the arch_validate_prot() > call over into the mmap guts, where we're holding the lock, and gate > it on the architecture or on some feature CONFIG that powerpc can > activate in its Kconfig. But I'm not sure whether that'd be helping or > making things worse, so when I sent this patch, I deliberately left > the powerpc stuff as-is. 
For now I'd just duplicate the trivial logic from arch_validate_prot in the powerpc version of do_mmap2 and add a comment that this check causes a gratuitous incompatibility with all other architectures. And then hope that the powerpc maintainers fix it up :)