Re: [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation

2020-01-28 Thread Christophe Leroy




On 29/01/2020 at 08:14, Thomas Gleixner wrote:

Andy Lutomirski  writes:


On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner  wrote:


Andy Lutomirski  writes:

On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy

Would mul_u64_u64_shr() be a good alternative?  Could we adjust it to
assume the shift is less than 32?  That function exists to benefit
32-bit arches.


We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:



That's what I meant to type...


Just that it does not work. The math is:

  ns = d->nsecs;   // That's the nsec value shifted left by d->shift

  ns += ((cur - d->last) & d->mask) * mult;

  ns >>= d->shift;

So we cannot use mul_u64_u32_shr() because we need the addition there
before shifting. And no, we can't drop the fractional part of
d->nsecs. Been there, done that, got sporadic time going backwards
problems as a reward. Need to look at that again as stuff has changed
over time.

On x86 we enforce that mask is 64bit, so the & operation is not there,
but due to the nasties of TSC we have that conditional

 if (cur > last)
return (cur - last) * mult;
 return 0;

Christophe, on PPC the decrementer/RTC clocksource masks are 64bit as
well, so you can spare that & operation there too.



Yes, I did it already. It spares reading d->mask, that's the main advantage.

Christophe


Re: [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation

2020-01-28 Thread Thomas Gleixner
Andy Lutomirski  writes:

> On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner  wrote:
>>
>> Andy Lutomirski  writes:
>> > On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>> >
>> > Would mul_u64_u64_shr() be a good alternative?  Could we adjust it to
>> > assume the shift is less than 32?  That function exists to benefit
>> > 32-bit arches.
>>
>> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>>
>
> That's what I meant to type...

Just that it does not work. The math is:

 ns = d->nsecs;   // That's the nsec value shifted left by d->shift

 ns += ((cur - d->last) & d->mask) * mult;

 ns >>= d->shift;

So we cannot use mul_u64_u32_shr() because we need the addition there
before shifting. And no, we can't drop the fractional part of
d->nsecs. Been there, done that, got sporadic time going backwards
problems as a reward. Need to look at that again as stuff has changed
over time.

On x86 we enforce that mask is 64bit, so the & operation is not there,
but due to the nasties of TSC we have that conditional

if (cur > last)
   return (cur - last) * mult;
return 0;

Christophe, on PPC the decrementer/RTC clocksource masks are 64bit as
well, so you can spare that & operation there too.

Thanks,

tglx
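For reference, the computation Thomas describes can be sketched in userspace C. This is a hedged sketch with hypothetical names (not the real vdso_data layout): the point is that `d->nsecs` is stored pre-shifted left by `d->shift`, so the cycle-delta contribution must be added *before* the final right shift, which is why a multiply-then-shift helper like mul_u64_u32_shr() cannot be used without dropping the fractional part.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the vdso nanosecond computation discussed above (hypothetical
 * names, not the actual kernel structures). d->nsecs holds the nanosecond
 * value shifted left by d->shift, so the addition must happen before the
 * final right shift.
 */
struct vd_sketch {
	uint64_t nsecs;	/* nsec value, shifted left by 'shift' */
	uint64_t last;	/* cycle counter at last update */
	uint64_t mask;	/* clocksource mask (all ones for 64-bit counters) */
	uint32_t mult;
	uint32_t shift;
};

static uint64_t vd_to_ns(const struct vd_sketch *d, uint64_t cur)
{
	uint64_t ns = d->nsecs;				/* shifted fraction */
	ns += ((cur - d->last) & d->mask) * d->mult;	/* add before shift */
	return ns >> d->shift;
}
```

With a full 64-bit mask the `& d->mask` is a no-op and can be dropped, which is the saving discussed for the PPC decrementer/RTC clocksources.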





Re: [PATCH] powerpc/xmon: Fix compile error in print_insn* functions

2020-01-28 Thread Michael Ellerman
On Thu, 2020-01-23 at 01:04:55 UTC, Sukadev Bhattiprolu wrote:
> From 72a7497a8673c93a4b80aa4fc38b88a8e90aa650 Mon Sep 17 00:00:00 2001
> From: Sukadev Bhattiprolu 
> Date: Wed, 22 Jan 2020 18:57:18 -0600
> Subject: [PATCH 1/1] powerpc/xmon: Fix compile error in print_insn* functions
> 
> Fix a couple of compile errors I stumbled upon with CONFIG_XMON=y and
> CONFIG_XMON_DISASSEMBLY=n
> 
> Signed-off-by: Sukadev Bhattiprolu 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/493a55f1e7724dee5e71bc726d5b819292094587

cheers


Re: [PATCH v4] powerpc: use probe_user_read() and probe_user_write()

2020-01-28 Thread Michael Ellerman
On Thu, 2020-01-23 at 17:30:47 UTC, Christophe Leroy wrote:
> Instead of opencoding, use probe_user_read() to read a user location
> without faulting, and probe_user_write() for writing to user.
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/def0bfdbd6039e96a9eb2baaa4470b079daab0d4

cheers


Re: [PATCH] powerpc/papr_scm: Fix leaking 'bus_desc.provider_name' in some paths

2020-01-28 Thread Michael Ellerman
On Wed, 2020-01-22 at 15:51:40 UTC, Vaibhav Jain wrote:
> String 'bus_desc.provider_name' allocated inside
> papr_scm_nvdimm_init() will leak in case the call to
> nvdimm_bus_register() fails or when papr_scm_remove() is called.
> 
> This minor patch ensures that 'bus_desc.provider_name' is freed in
> error path for nvdimm_bus_register() as well as in papr_scm_remove().
> 
> Signed-off-by: Vaibhav Jain 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5649607a8d0b0e019a4db14aab3de1e16c3a2b4f

cheers


Re: [PATCH] powerpc/pseries/vio: Fix iommu_table use-after-free refcount warning

2020-01-28 Thread Michael Ellerman
On Mon, 2020-01-20 at 22:10:02 UTC, Tyrel Datwyler wrote:
> From: Tyrel Datwyler 
> 
> Commit e5afdf9dd515 ("powerpc/vfio_spapr_tce: Add reference counting to
> iommu_table") missed an iommu_table allocation in the pseries vio code.
> The iommu_table is allocated with kzalloc and as a result the associated
> kref gets a value of zero. This has the side effect that during a DLPAR
> remove of the associated virtual IOA the iommu_tce_table_put() triggers
> a use-after-free underflow warning.
> 
> Call Trace:
> [c002879e39f0] [c071ecb4] refcount_warn_saturate+0x184/0x190
> (unreliable)
> [c002879e3a50] [c00500ac] iommu_tce_table_put+0x9c/0xb0
> [c002879e3a70] [c00f54e4] vio_dev_release+0x34/0x70
> [c002879e3aa0] [c087cfa4] device_release+0x54/0xf0
> [c002879e3b10] [c0d64c84] kobject_cleanup+0xa4/0x240
> [c002879e3b90] [c087d358] put_device+0x28/0x40
> [c002879e3bb0] [c07a328c] dlpar_remove_slot+0x15c/0x250
> [c002879e3c50] [c07a348c] remove_slot_store+0xac/0xf0
> [c002879e3cd0] [c0d64220] kobj_attr_store+0x30/0x60
> [c002879e3cf0] [c04ff13c] sysfs_kf_write+0x6c/0xa0
> [c002879e3d10] [c04fde4c] kernfs_fop_write+0x18c/0x260
> [c002879e3d60] [c0410f3c] __vfs_write+0x3c/0x70
> [c002879e3d80] [c0415408] vfs_write+0xc8/0x250
> [c002879e3dd0] [c04157dc] ksys_write+0x7c/0x120
> [c002879e3e20] [c000b278] system_call+0x5c/0x68
> 
> Further, since the refcount was always zero the iommu_tce_table_put()
> fails to call the iommu_table release function resulting in a leak.
> 
> Fix this issue by initializing the iommu_table kref immediately after
> allocation.
> 
> Fixes: e5afdf9dd515 ("powerpc/vfio_spapr_tce: Add reference counting to 
> iommu_table")
> Signed-off-by: Tyrel Datwyler 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/aff8c8242bc638ba57247ae1ec5f272ac3ed3b92

cheers
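The failure mode in the commit message above can be illustrated with a userspace analog. This is a hedged sketch with hypothetical names: `calloc()` plays the role of kzalloc(), leaving the embedded refcount at 0 instead of 1, so the first put trips the underflow check and the object is never freed; the fix is to initialize the refcount right after allocation, as kref_init() does in the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userspace analog (hypothetical names) of the underflow described above:
 * a zero-initialized allocation leaves the refcount at 0, so the first
 * put underflows and leaks instead of releasing the object.
 */
struct table { int refcount; };

/* Returns -1 on underflow, 1 when the object was released, 0 otherwise. */
static int table_put(struct table *t)
{
	if (t->refcount <= 0)
		return -1;	/* underflow: warn and leak, as in the trace */
	if (--t->refcount == 0) {
		free(t);	/* normal release path */
		return 1;
	}
	return 0;
}
```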


Re: [PATCH] selftests/eeh: Bump EEH wait time to 60s

2020-01-28 Thread Michael Ellerman
On Wed, 2020-01-22 at 03:11:25 UTC, Oliver O'Halloran wrote:
> Some newer cards supported by aacraid can take up to 40s to recover
> after an EEH event. This causes spurious failures in the basic EEH
> self-test since the current maximum timeout is only 30s.
> 
> Fix the immediate issue by bumping the timeout to a default of 60s,
> and allow the wait time to be specified via an environment variable
> (EEH_MAX_WAIT).
> 
> Reported-by: Steve Best 
> Suggested-by: Douglas Miller 
> Signed-off-by: Oliver O'Halloran 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/414f50434aa2463202a5b35e844f4125dd1a7101

cheers


Re: [PATCH 1/5] powerpc/32: add support of KASAN_VMALLOC

2020-01-28 Thread Michael Ellerman
On Tue, 2020-01-14 at 17:54:00 UTC, Christophe Leroy wrote:
> Add support of KASAN_VMALLOC on PPC32.
> 
> To allow this, the early shadow covering the VMALLOC space
> needs to be removed once the high_memory var is set and before
> freeing memblock.
> 
> And the VMALLOC area needs to be aligned such that boundaries
> are covered by a full shadow page.
> 
> Signed-off-by: Christophe Leroy 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/3d4247fcc938d0ab5cf6fdb752dae07fdeab9736

cheers


Re: [PATCH -next] powerpc/maple: fix comparing pointer to 0

2020-01-28 Thread Michael Ellerman
On Tue, 2020-01-21 at 01:31:53 UTC, Chen Zhou wrote:
> Fixes coccicheck warning:
> ./arch/powerpc/platforms/maple/setup.c:232:15-16:
>   WARNING comparing pointer to 0
> 
> Compare pointer-typed values to NULL rather than 0.
> 
> Signed-off-by: Chen Zhou 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1e3531982ee70adf1880715a968d9c3365f321ed

cheers


Re: [PATCH v2] Fix display of Maximum Memory

2020-01-28 Thread Michael Ellerman
On Wed, 2020-01-15 at 14:53:59 UTC, Michael Bringmann wrote:
> Correct an overflow problem in the calculation+display of the Maximum
> Memory value to syscfg, where 32 bits is insufficient.
> 
> Signed-off-by: Michael Bringmann 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f1dbc1c5c70d0d4c60b5d467ba941fba167c12f6

cheers


Re: [PATCH] MAINTAINERS: Add myself as maintainer of ehv_bytechan tty driver

2020-01-28 Thread Michael Ellerman
On Tue, 2020-01-14 at 11:00:25 UTC, Laurentiu Tudor wrote:
> Michael Ellerman made a call for volunteers from NXP to maintain
> this driver and I offered myself.
> 
> Signed-off-by: Laurentiu Tudor 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/173bf44bdfc768af3c07cd0aeeb6ad8d1331b77d

cheers


Re: [PATCH v2] powerpc/ptdump: don't entirely rebuild kernel when selecting CONFIG_PPC_DEBUG_WX

2020-01-28 Thread Michael Ellerman
On Tue, 2020-01-14 at 07:14:40 UTC, Christophe Leroy wrote:
> Selecting CONFIG_PPC_DEBUG_WX only impacts ptdump and pgtable_32/64
> init calls. Declaring related functions in asm/pgtable.h implies
> rebuilding almost everything.
> 
> Move ptdump_check_wx() declaration in mm/mmu_decl.h
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1e1c8b2cc37afb333c1829e8e0360321813bf220

cheers


Re: [PATCH] powerpc/ptdump: only enable PPC_CHECK_WX with STRICT_KERNEL_RWX

2020-01-28 Thread Michael Ellerman
On Tue, 2020-01-14 at 08:13:10 UTC, Christophe Leroy wrote:
> ptdump_check_wx() is called from mark_rodata_ro() which only exists
> when CONFIG_STRICT_KERNEL_RWX is selected.
> 
> Signed-off-by: Christophe Leroy 
> Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot")

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f509247b08f2dcf7754d9ed85ad69a7972aa132b

cheers


Re: [PATCH] powerpc/ptdump: fix W+X verification

2020-01-28 Thread Michael Ellerman
On Tue, 2020-01-14 at 08:13:09 UTC, Christophe Leroy wrote:
> Verification cannot rely on simple bit checking because on some
> platforms PAGE_RW is 0; checking that a page is not W means
> checking that PAGE_RO is set instead of checking that PAGE_RW
> is not set.
> 
> Use pte helpers instead of checking bits.
> 
> Signed-off-by: Christophe Leroy 
> Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot")

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d80ae83f1f932ab7af47b54d0d3bef4f4dba489f

cheers


Re: [PATCH] powerpc/ptdump: fix W+X verification call in mark_rodata_ro()

2020-01-28 Thread Michael Ellerman
On Tue, 2020-01-14 at 08:13:08 UTC, Christophe Leroy wrote:
> ptdump_check_wx() also has to be called when pages are mapped
> by blocks.
> 
> Signed-off-by: Christophe Leroy 
> Fixes: 453d87f6a8ae ("powerpc/mm: Warn if W+X pages found on boot")

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e26ad936dd89d79f66c2b567f700e0c2a7103070

cheers


Re: [PATCH 1/5] powerpc/pci: Fold pcibios_setup_device() into pcibios_bus_add_device()

2020-01-28 Thread Michael Ellerman
On Fri, 2020-01-10 at 07:02:03 UTC, Oliver O'Halloran wrote:
> pcibios_bus_add_device() is the only caller of pcibios_setup_device().
> Fold them together since there's no real reason to keep them separate.
> 
> Signed-off-by: Oliver O'Halloran 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/3ab3f3c9df348324029e3fbdf381f551b1df8f1e

cheers


Re: [PATCH] powerpc/papr_scm: Don't enable direct map for a region by default

2020-01-28 Thread Michael Ellerman
On Wed, 2020-01-08 at 06:46:47 UTC, "Aneesh Kumar K.V" wrote:
> Setting the ND_REGION_PAGEMAP flag implies namespace mode defaults to fsdax mode.
> This also means the kernel ends up creating struct page backing for these namespace
> ranges. With large namespaces that is not the right thing to do. We
> should let the user select the mode he/she wants the namespace to be created
> with.
> 
> Hence disable ND_REGION_PAGEMAP for papr_scm regions. We still keep the flag 
> for
> of_pmem because it supports only small persistent memory regions.
> 
> This is similar to what is done for x86 with commit
> 004f1afbe199 ("libnvdimm, pmem: direct map legacy pmem by default")
> 
> Signed-off-by: Aneesh Kumar K.V 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7e6f8cbc5e10cf7601c762db267b795273d53078

cheers


Re: [PATCH] powerpc/pseries: in lmb_is_removable(), advance pfn if section is not present

2020-01-28 Thread Michael Ellerman
On Fri, 2020-01-10 at 04:54:02 UTC, Pingfan Liu wrote:
> In lmb_is_removable(), if a section is not present, it should continue to
> test the rest of the sections in the block. But the current code fails to do so.
> 
> Signed-off-by: Pingfan Liu 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> To: linuxppc-dev@lists.ozlabs.org

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fbee6ba2dca30d302efe6bddb3a886f5e964a257

cheers


Re: powerpc/xmon: don't access ASDR in VMs

2020-01-28 Thread Michael Ellerman
On Tue, 2020-01-07 at 02:16:33 UTC, Sukadev Bhattiprolu wrote:
> From 91a77dbea3c909ff15c66cded37f1334304a293d Mon Sep 17 00:00:00 2001
> From: Sukadev Bhattiprolu 
> Date: Mon, 6 Jan 2020 13:50:02 -0600
> Subject: [PATCH 1/1] powerpc/xmon: don't access ASDR in VMs
> 
> ASDR is HV-privileged and must only be accessed in HV-mode.
> Fixes a Program Check (0x700) when xmon in a VM dumps SPRs.
> 
> Signed-off-by: Sukadev Bhattiprolu 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c2a20711fc181e7f22ee5c16c28cb9578af84729

cheers


Re: [PATCH v3 1/2] powerpc32/booke: consistently return phys_addr_t in __pa()

2020-01-28 Thread Michael Ellerman
On Mon, 2020-01-06 at 04:29:53 UTC, yingjie_...@126.com wrote:
> From: Bai Yingjie 
> 
> When CONFIG_RELOCATABLE=y is set, VIRT_PHYS_OFFSET is a 64bit variable,
> thus __pa() returns as 64bit value.
> But when CONFIG_RELOCATABLE=n, __pa() returns 32bit value.
> 
> When CONFIG_PHYS_64BIT is set, __pa() should consistently return as
> 64bit value irrelevant to CONFIG_RELOCATABLE.
> So we'd make __pa() consistently return phys_addr_t, which is 64bit
> when CONFIG_PHYS_64BIT is set.
> 
> Signed-off-by: Bai Yingjie 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6ad4afc97bc6c5cca9786030492ddfab871ce79e

cheers


Re: [PATCH 10/10] powerpc/powernv: use resource_size

2020-01-28 Thread Michael Ellerman
On Wed, 2020-01-01 at 17:49:50 UTC, Julia Lawall wrote:
> Use resource_size rather than a verbose computation on
> the end and start fields.
> 
> The semantic patch that makes these changes is as follows:
> (http://coccinelle.lip6.fr/)
> 
> 
> @@ struct resource ptr; @@
> - (ptr.end - ptr.start + 1)
> + resource_size(&ptr)
> 
> 
> Signed-off-by: Julia Lawall 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/552aa086944a9aeabd599892007c2c7faedb894e

cheers
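For context, the helper encapsulates exactly the open-coded expression the semantic patch replaces: a resource spans `[start, end]` inclusive, so its size is `end - start + 1`. A minimal userspace sketch (hypothetical struct, not the kernel's `struct resource`):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal sketch (hypothetical struct): a resource covers [start, end]
 * inclusive, so the size is end - start + 1 -- the verbose expression
 * that resource_size() replaces. */
struct resource_sketch {
	uint64_t start;
	uint64_t end;
};

static uint64_t resource_size_sketch(const struct resource_sketch *res)
{
	return res->end - res->start + 1;
}
```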


Re: [PATCH 05/10] powerpc/83xx: use resource_size

2020-01-28 Thread Michael Ellerman
On Wed, 2020-01-01 at 17:49:45 UTC, Julia Lawall wrote:
> Use resource_size rather than a verbose computation on
> the end and start fields.
> 
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
> 
> 
> @@ struct resource ptr; @@
> - (ptr.end - ptr.start + 1)
> + resource_size(&ptr)
> 
> 
> Signed-off-by: Julia Lawall 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/bfbe37f0ce994e7a9945653d7624fadc5c500a9f

cheers


Re: [PATCH 1/4] misc: cxl: use mmgrab

2020-01-28 Thread Michael Ellerman
On Sun, 2019-12-29 at 15:42:55 UTC, Julia Lawall wrote:
> Mmgrab was introduced in commit f1f1007644ff ("mm: add new mmgrab()
> helper") and most of the kernel was updated to use it. Update a
> remaining file.
> 
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
> 
> 
> @@ expression e; @@
> - atomic_inc(&e->mm_count);
> + mmgrab(e);
> 
> 
> Signed-off-by: Julia Lawall 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/30e813cf46ccaeea6508607632e49b4a1d743d2a

cheers


Re: [PATCH 09/16] powerpc/mpic: constify copied structure

2020-01-28 Thread Michael Ellerman
On Wed, 2020-01-01 at 07:43:27 UTC, Julia Lawall wrote:
> The mpic_ipi_chip and mpic_irq_ht_chip structures are only copied
> into other structures, so make them const.
> 
> The opportunity for this change was found using Coccinelle.
> 
> Signed-off-by: Julia Lawall 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5084ff33cac0988c1b979814501dcc2e1ecbf9c0

cheers


Re: [PATCH] powerpc/mm: don't log user reads to 0xffffffff

2020-01-28 Thread Michael Ellerman
On Mon, 2019-12-23 at 07:54:22 UTC, Christophe Leroy wrote:
> Running vdsotest leaves many times the following log:
> 
> [   79.629901] vdsotest[396]: User access of kernel address (0xffffffff) -
> exploit attempt? (uid: 0)
> 
> A pointer set to (-1) is likely a programming error similar to
> a NULL pointer and is not worth logging as an exploit attempt.
> 
> Don't log user accesses to 0xffffffff.
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/0f9aee0cb9da7db7d96f63cfa2dc5e4f1bffeb87

cheers


Re: [PATCH 1/2] powerpc/book3s64/hash: Disable 16M linear mapping size if not aligned

2020-01-28 Thread Michael Ellerman
On Tue, 2019-12-24 at 06:41:25 UTC, Russell Currey wrote:
> With STRICT_KERNEL_RWX on in a relocatable kernel under the hash MMU, if
> the position the kernel is loaded at is not 16M aligned, the kernel
> miscalculates its ALIGN*()s and things go horribly wrong.
> 
> We can easily avoid this when selecting the linear mapping size, so do
> so and print a warning.  I tested this for various alignments and as
> long as the position is 64K aligned it's fine (the base requirement for
> powerpc).
> 
> Signed-off-by: Russell Currey 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/970d54f99ceac5bbf27929cb5ebfe18338ba1543

cheers
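The selection logic described in the commit message above can be sketched as follows. This is a hedged sketch with a hypothetical helper name, not the actual hash MMU code: use the 16M linear mapping size only when the kernel load address is 16M aligned, otherwise fall back to the 64K base requirement.

```c
#include <assert.h>

/*
 * Sketch (hypothetical helper, not the actual hash MMU code): only pick a
 * 16M linear mapping size when the kernel base is 16M aligned; otherwise
 * fall back to 64K, the minimum alignment powerpc requires.
 */
static unsigned long pick_linear_map_size(unsigned long kernel_base)
{
	const unsigned long mask_16m = (1UL << 24) - 1;

	if (kernel_base & mask_16m)
		return 1UL << 16;	/* not 16M aligned: use 64K, warn */
	return 1UL << 24;		/* 16M aligned: large mapping is safe */
}
```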


Re: [PATCH v5 01/17] powerpc/32: replace MTMSRD() by mtmsr

2020-01-28 Thread Michael Ellerman
On Sat, 2019-12-21 at 08:32:22 UTC, Christophe Leroy wrote:
> On PPC32, MTMSRD() is simply defined as mtmsr.
> 
> Replace MTMSRD(reg) by mtmsr reg in files dedicated to PPC32,
> this makes the code less obscure.
> 
> Signed-off-by: Christophe Leroy 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/39bccfd164970557c5cfc60b2db029f70542549f

cheers


Re: [PATCH v3] powerpc/mm: Remove kvm radix prefetch workaround for Power9 DD2.2

2020-01-28 Thread Michael Ellerman
On Fri, 2019-12-06 at 03:17:22 UTC, Jordan Niethe wrote:
> Commit a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with
> KVM") introduced a number of workarounds as coming out of a guest with
> the mmu enabled would make the cpu would start running in hypervisor
> state with the PID value from the guest. The cpu will then start
> prefetching for the hypervisor with that PID value.
> 
> In Power9 DD2.2 the cpu behaviour was modified to fix this. When
> accessing Quadrant 0 in hypervisor mode with LPID != 0 prefetching will
> not be performed. This means that we can get rid of the workarounds for
> Power9 DD2.2 and later revisions. Add a new cpu feature
> CPU_FTR_P9_RADIX_PREFETCH_BUG to indicate if the workarounds are needed.
> 
> Signed-off-by: Jordan Niethe 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/736bcdd3a9fc672af33fb83230ecd0570ec38ec6

cheers


Re: [PATCH v4 1/8] powerpc/32: Add VDSO version of getcpu on non SMP

2020-01-28 Thread Michael Ellerman
On Mon, 2019-12-02 at 07:57:27 UTC, Christophe Leroy wrote:
> Commit 18ad51dd342a ("powerpc: Add VDSO version of getcpu") added
> getcpu() for PPC64 only, by making use of a user readable general
> purpose SPR.
> 
> PPC32 doesn't have any such SPR.
> 
> For non SMP, just return CPU id 0 from the VDSO directly.
> PPC32 doesn't support CONFIG_NUMA so NUMA node is always 0.
> 
> Before the patch, vdsotest reported:
> getcpu: syscall: 1572 nsec/call
> getcpu:libc: 1787 nsec/call
> getcpu:vdso: not tested
> 
> Now, vdsotest reports:
> getcpu: syscall: 1582 nsec/call
> getcpu:libc: 502 nsec/call
> getcpu:vdso: 187 nsec/call
> 
> Signed-off-by: Christophe Leroy 

Patches 1, 2 and 4-8, applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/902137ba8e469ed07c7f120a390161937a6288fb

cheers


Re: [PATCH] powerpc/devicetrees: Change 'gpios' to 'cs-gpios' on fsl, spi nodes

2020-01-28 Thread Michael Ellerman
On Thu, 2019-11-28 at 12:16:35 UTC, Christophe Leroy wrote:
> Since commit 0f0581b24bd0 ("spi: fsl: Convert to use CS GPIO
> descriptors"), the prefered way to define chipselect GPIOs is using
> 'cs-gpios' property instead of the legacy 'gpios' property.
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8c452a889821ca0cd2a5f2e3e87fbc01e56408cb

cheers


Re: [PATCH v2] powerpc/8xx: Fix permanently mapped IMMR region.

2020-01-28 Thread Michael Ellerman
On Tue, 2019-11-26 at 13:16:50 UTC, Christophe Leroy wrote:
> When not using large TLBs, the IMMR region is still
> mapped as a whole block in the FIXMAP area.
> 
> Properly report that the IMMR region is block-mapped even
> when not using large TLBs.
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/991d656d722dbc783481f408d6e4cbcce2e8bb78

cheers


Re: [PATCH v2 1/2] powerpc/hw_breakpoints: Rewrite 8xx breakpoints to allow any address range size.

2020-01-28 Thread Michael Ellerman
On Tue, 2019-11-26 at 17:43:29 UTC, Christophe Leroy wrote:
> Unlike standard powerpc, Powerpc 8xx doesn't have SPRN_DABR, but
> it has a breakpoint support based on a set of comparators which
> allow more flexibility.
> 
> Commit 4ad8622dc548 ("powerpc/8xx: Implement hw_breakpoint")
> implemented breakpoints by emulating the DABR behaviour. It did
> this by setting one comparator to match 4 bytes at the breakpoint address
> and the other comparator to match 4 bytes at breakpoint address + 4.
> 
> Rewrite 8xx hw_breakpoint to make breakpoints match all addresses
> defined by the breakpoint address and length by making full use of
> comparators.
> 
> Now, comparator E is set to match any address greater than breakpoint
> address minus one. Comparator F is set to match any address lower than
> breakpoint address plus breakpoint length. Addresses are aligned
> to 32 bits.
> 
> When the breakpoint range starts at address 0, the breakpoint is set
> to match comparator F only. When the breakpoint range ends at address
> 0xffffffff, the breakpoint is set to match comparator E only.
> Otherwise the breakpoint is set to match comparators E and F.
> 
> At the same time, use registers bit names instead of hardcoded values.
> 
> Signed-off-by: Christophe Leroy 
> Cc: Ravi Bangoria 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/39413ae009674c6ba745850515b551bbb9d6374b

cheers
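The comparator selection described in the commit message above can be sketched as follows. This is a hedged sketch with hypothetical names, not the actual 8xx register interface: comparator E matches any address greater than start minus one (i.e. >= start), comparator F matches any address lower than end, both bounds aligned to 32 bits, with the two degenerate cases at the ends of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch (hypothetical names) of the 8xx comparator setup described above.
 * A range starting at 0 needs only comparator F; a range ending at
 * 0xffffffff needs only comparator E; otherwise both are used.
 */
struct bp_match {
	bool use_e, use_f;
	uint32_t e_bound, f_bound;
};

static struct bp_match bp_setup(uint32_t addr, uint32_t len)
{
	struct bp_match m = { true, true, 0, 0 };
	uint32_t start = addr & ~3u;				/* align down */
	uint64_t end = ((uint64_t)addr + len + 3) & ~3ull;	/* align up */

	m.e_bound = start;		/* E: match any address > start - 1 */
	if (start == 0)
		m.use_e = false;	/* range starts at 0: F alone */
	if (end > 0xffffffffull)
		m.use_f = false;	/* range ends at 0xffffffff: E alone */
	else
		m.f_bound = (uint32_t)end;	/* F: match any address < end */
	return m;
}
```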


Re: [PATCH] macintosh: Fix Kconfig indentation

2020-01-28 Thread Michael Ellerman
On Wed, 2019-11-20 at 13:41:15 UTC, Krzysztof Kozlowski wrote:
> Adjust indentation from spaces to tab (+optional two spaces) as in
> coding style with command like:
>   $ sed -e 's/^/\t/' -i */Kconfig
> 
> Signed-off-by: Krzysztof Kozlowski 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/25dd118f4b2773490df12b9d190c1956cf3250c3

cheers


Re: [PATCH v2 01/11] powerpc/powernv/ioda: Fix ref count for devices with their own PE

2020-01-28 Thread Michael Ellerman
On Thu, 2019-11-21 at 13:49:08 UTC, Frederic Barrat wrote:
> The pci_dn structure used to store a pointer to the struct pci_dev, so
> taking a reference on the device was required. However, the pci_dev
> pointer was later removed from the pci_dn structure, but the reference
> was kept for the npu device.
> See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
> pcidev from pci_dn").
> 
> We don't need to take a reference on the device when assigning the PE
> as the struct pnv_ioda_pe is cleaned up at the same time as
> the (physical) device is released. Doing so prevents the device from
> being released, which is a problem for opencapi devices, since we want
> to be able to remove them through PCI hotplug.
> 
> Now the ugly part: nvlink npu devices are not meant to be
> released. Because of the above, we've always leaked a reference and
> simply removing it now is dangerous and would likely require more
> work. There's currently no release device callback for nvlink devices
> for example. So to be safe, this patch leaks a reference on the npu
> device, but only for nvlink and not opencapi.
> 
> CC: a...@ozlabs.ru
> CC: ooh...@gmail.com
> Signed-off-by: Frederic Barrat 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/05dd7da76986937fb288b4213b1fa10dbe0d1b33

cheers


Re: [PATCH] powerpc/xive: Drop extern qualifiers from header function prototypes

2020-01-28 Thread Michael Ellerman
On Fri, 2019-11-15 at 18:10:58 UTC, Greg Kurz wrote:
> As reported by ./scripts/checkpatch.pl --strict:
> 
> CHECK: extern prototypes should be avoided in .h files
> 
> Signed-off-by: Greg Kurz 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b059c63620fbba8a5da60f01d99d003681447e3c

cheers


Re: [PATCH v2 1/2] powerpc/powernv: Rework exports to support subnodes

2020-01-28 Thread Michael Ellerman
On Fri, 2019-11-01 at 06:26:10 UTC, Oliver O'Halloran wrote:
> Originally we only had a handful of exported memory ranges, but we'd like to
> export the per-core trace buffers too. This results in a lot of files in the
> exports directory, which is a bit unfortunate. We can clean things up a bit
> by turning subnodes into subdirectories of the exports directory.
> 
> Signed-off-by: Oliver O'Halloran 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/db93361260e2411f854514e8b54b6c31a5b2d5bd

cheers


Re: [PATCH 1/2] powerpc/xmon: Allow passing an argument to ppc_md.restart()

2020-01-28 Thread Michael Ellerman
On Fri, 2019-11-01 at 08:55:21 UTC, Oliver O'Halloran wrote:
> On PowerNV a few different kinds of reboot are supported. We'd like to be
> able to exercise these from xmon so allow 'zr' to take an argument, and
> pass that to the ppc_md.restart() function.
> 
> Signed-off-by: Oliver O'Halloran 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2d9b332d99b24f27f77e9ba38ce3c8beb11a81c0

cheers


Re: [PATCH 1/3] powernv/pci: Use pnv_phb as the private data for debugfs entries

2020-01-28 Thread Michael Ellerman
On Thu, 2019-09-12 at 05:29:43 UTC, Oliver O'Halloran wrote:
> Use the pnv_phb structure as the private data pointer for the debugfs
> files.  This lets us delete some code and an open-coded use of
> hose->private_data.
> 
> Signed-off-by: Oliver O'Halloran 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/22ba7289079be12c85895fee41602139e9553c93

cheers


Re: [PATCH] powerpc/eeh: Only dump stack once if an MMIO loop is detected

2020-01-28 Thread Michael Ellerman
On Wed, 2019-10-16 at 01:25:36 UTC, Oliver O'Halloran wrote:
> Many drivers don't check for errors when they get a 0xFFs response from an
> MMIO load. As a result after an EEH event occurs a driver can get stuck in
> a polling loop unless it has some kind of internal timeout logic.
> 
> Currently EEH tries to detect and report stuck drivers by dumping a stack
> trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an
> already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump
> would occur every few seconds if the driver was spinning in a loop. This
> results in a lot of spurious stack traces in the kernel log.
> 
> Fix this by limiting it to printing one stack trace for each PE freeze. If
> the driver is truly stuck the kernel's hung task detector is better suited
> to reporting the problem anyway.
> 
> Cc: Sam Bobroff 
> Signed-off-by: Oliver O'Halloran 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/4e0942c0302b5ad76b228b1a7b8c09f658a1d58a

cheers


Re: [PATCH 1/3] powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV

2020-01-28 Thread Michael Ellerman
On Wed, 2019-08-21 at 06:26:53 UTC, Oliver O'Halloran wrote:
> When disabling virtual functions on an SR-IOV adapter we currently do not
> correctly remove the EEH state for the now-dead virtual functions. When
> removing the pci_dn that was created for the VF when SR-IOV was enabled
> we free the corresponding eeh_dev without removing it from the child device
> list of the eeh_pe that contained it. This can result in crashes due to the
> use-after-free.
> 
> Signed-off-by: Oliver O'Halloran 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/1fb4124ca9d456656a324f1ee29b7bf942f59ac8

cheers


Re: [PATCH 1/2] powerpc/64s: reimplement power4_idle code in C

2020-01-28 Thread Michael Ellerman
On Thu, 2019-07-11 at 02:24:03 UTC, Nicholas Piggin wrote:
> This implements the tricky tracing and soft irq handling bits in C,
> leaving the low level bit to asm.
> 
> A functional difference is that this redirects the interrupt exit to
> a return stub to execute blr, rather than the lr address itself. This
> is probably barely measurable on real hardware, but it keeps the link
> stack balanced.
> 
> Tested with QEMU.
> 
> Signed-off-by: Nicholas Piggin 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ed0bc98f8cbe4f8254759d333a47aedc816ff8c5

cheers


Re: [PATCH 1/5] powerpc/eeh_cache: Don't use pci_dn when inserting new ranges

2020-01-28 Thread Michael Ellerman
On Mon, 2019-07-15 at 08:56:08 UTC, Oliver O'Halloran wrote:
> At the point where we start inserting ranges into the EEH address cache the
> binding between pci_dev and eeh_dev has already been set up. Instead of
> consulting the pci_dn tree we can retrieve the eeh_dev directly using
> pci_dev_to_eeh_dev().
> 
> Signed-off-by: Oliver O'Halloran 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b1268f4cdba71c0cd40b533778812340d36de8ae

cheers


Re: [PATCH] KVM: PPC: Book3S HV: XIVE: Fix typo in comment

2020-01-28 Thread Michael Ellerman
On Wed, 2019-07-03 at 22:03:19 UTC, Greg Kurz wrote:
> Cc: triv...@kernel.org
> Signed-off-by: Greg Kurz 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6a3163212f311daaf2ca3b676db2e11cfd81c6b3

cheers


Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Nicholas Piggin
Adhemerval Zanella's on January 29, 2020 3:26 am:
> 
> 
> On 28/01/2020 11:05, Nicholas Piggin wrote:
>> Florian Weimer's on January 28, 2020 11:09 pm:
>>> * Nicholas Piggin:
>>>
 * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
   vectors will return -ENOSYS, and the decision for how to add support for
   a new vector deferred until we see the next user.
>>>
>>> Seems reasonable.  We don't have to decide this today.
>>>
 * Proposal is for scv 0 to provide the standard Linux system call ABI with
   some differences:

 - LR is volatile across scv calls. This is necessary for support because
   the scv instruction clobbers LR.
>>>
>>> I think we can express this in the glibc system call assembler wrapper
>>> generators.  The mcount profiling wrappers already have this property.
>>>
>>> But I don't think we are so lucky for the inline system calls.  GCC
>>> recognizes an "lr" clobber with inline asm (even though it is not
>>> documented), but it generates rather strange assembler output as a
>>> result:
>>>
>>> long
>>> f (long x)
>>> {
>>>   long y;
>>>   asm ("#" : "=r" (y) : "r" (x) : "lr");
>>>   return y;
>>> }
>>>
>>> .abiversion 2
>>> .section".text"
>>> .align 2
>>> .p2align 4,,15
>>> .globl f
>>> .type   f, @function
>>> f:
>>> .LFB0:
>>> .cfi_startproc
>>> mflr 0
>>> .cfi_register 65, 0
>>> #APP
>>>  # 5 "t.c" 1
>>> #
>>>  # 0 "" 2
>>> #NO_APP
>>> std 0,16(1)
>>> .cfi_offset 65, 16
>>> ori 2,2,0
>>> ld 0,16(1)
>>> mtlr 0
>>> .cfi_restore 65
>>> blr
>>> .long 0
>>> .byte 0,0,0,1,0,0,0,0
>>> .cfi_endproc
>>> .LFE0:
>>> .size   f,.-f
>>>
>>>
>>> That's with GCC 8.3 at -O2.  I don't understand what the ori is about.
>> 
>> ori 2,2,0 is the group terminating nop hint for POWER8 type cores
>> which had dispatch grouping rules.
> 
> It's worth noting that it aims to mitigate a load-hit-store CPU stall
> on some powerpc chips.
> 
>> 
>>>
>>> I don't think we can save LR in a regular register around the system
>>> call, explicitly in the inline asm statement, because we still have to
>>> generate proper unwinding information using CFI directives, something
>>> that you cannot do from within the asm statement.
>>>
>>> Supporting this in GCC should not be impossible, but someone who
>>> actually knows this stuff needs to look at it.
>> 
>> The generated assembler actually seems okay to me. If we compile
>> something like a syscall and with -mcpu=power9:
>> 
>> long
>> f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0)
>> {
>>   register long r0 asm ("r0") = _r0;
>>   register long r3 asm ("r3") = _r3;
>>   register long r4 asm ("r4") = _r4;
>>   register long r5 asm ("r5") = _r5;
>>   register long r6 asm ("r6") = _r6;
>>   register long r7 asm ("r7") = _r7;
>>   register long r8 asm ("r8") = _r8;
>> 
>>   asm ("# scv" : "=r"(r3) : "r"(r0), "r"(r4), "r"(r5), "r"(r6), "r"(r7), 
>> "r"(r8) : "lr", "ctr", "cc", "xer");
>> 
>>   return r3;
>> }
>> 
>> 
>> f:
>> .LFB0:
>> .cfi_startproc
>> mflr 0
>> std 0,16(1)
>> .cfi_offset 65, 16
>> mr 0,9
>> #APP
>>  # 12 "a.c" 1
>> # scv
>>  # 0 "" 2
>> #NO_APP
>> ld 0,16(1)
>> mtlr 0
>> .cfi_restore 65
>> blr
>> .long 0
>> .byte 0,0,0,1,0,0,0,0
>> .cfi_endproc
>> 
>> That gets the LR save/restore right when we're also using r0.
>> 
>>>
 - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
   system call exit to avoid restoring the CR register.
>>>
>>> This sounds reasonable, but I don't know what kind of knock-on effects
>>> this has.  The inline system call wrappers can handle this with minor
>>> tweaks.
>> 
>> Okay, good. In the end we would have to check code trace through the
>> kernel and libc of course, but I think there's little to no opportunity
>> to take advantage of current extra non-volatile cr regs.
>> 
>> mtcr has to write 8 independently renamed registers so it's cracked into
>> 2 insns on POWER9 (and likely to always be a bit troublesome). It's not
>> much in the scheme of a system call, but while we can tweak the ABI...
> 
> We don't really need a mfcr/mfocr to implement the Linux syscall ABI on
> powerpc, we can use a 'bns+' plus a neg instead as:
> 
> --
> #define internal_syscall6(name, err, nr, arg1, arg2, arg3, arg4, arg5,  \
>   arg6) \
>   ({\
> register long int r0  __asm__ ("r0") = (long int) (name);   \
> register long int r3  __asm__ ("r3") = (long int) (arg1);   \
> register long int r4  __asm__ ("r4") = (long int) (arg2);   \
> register long int r5  __asm__ ("r5") = (long int) (arg3);   \
> register long int r6  __asm__ ("r6") = (long int) (arg4); 

Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Nicholas Piggin
Florian Weimer's on January 29, 2020 1:58 am:
> * Nicholas Piggin:
> 
>> That gets the LR save/restore right when we're also using r0.
> 
> Yes, I agree it looks good.  Nice.
> 
>>> But the kernel uses the -errno convention internally, so I think it
>>> would make sense to pass this to userspace and not convert back and
>>> forth.  This would align with what most of the architectures do, and
>>> also avoids the GCC oddity.
>>
>> Yes I would be interested in opinions for this option. It seems like
>> matching other architectures is a good idea. Maybe there are some
>> reasons not to.
> 
> If there were a POWER-specific system call that uses all result bits and
> doesn't have room for the 4096 error states (or an error number that's
> outside that range), that would be a blocker.  I can't find such a
> system call wrapped in the glibc sources.

Nothing apparent in the kernel sources either.

> musl's inline syscalls always
> convert the errno state to -errno, so it's not possible to use such a
> system call there.
> 
 - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
   calls if there was interest in developing an ABI for 32-bit programs.
   Marginal benefit in avoiding compat syscall selection.
>>> 
>>> We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
>>> provide an ELFv1 port for this given that it's POWER9-specific.
>>
>> Okay. There's no reason not to enable this for BE, at least for the
>> kernel it's no additional work so it probably remains enabled (unless
>> there is something really good we could do with the ABI if we exclude
>> ELFv1 but I don't see anything).
>>
>> But if glibc only builds for ELFv2 support that's probably reasonable.
> 
> To be clear, we still support ELFv1 for POWER, but given that this
> feature is POWER9 and later, I expect the number of users benefiting
> from 32-bit support (or ELFv1 and thus big-endian support) to be quite
> small.
> 
> Especially if we go the conditional branch route, I would restrict this
> to ELFv2 in glibc, at least for default builds.
> 
>>> From the glibc perspective, the major question is how we handle run-time
>>> selection of the system call instruction sequence.  On i386, we use a
>>> function pointer in the TCB to call an instruction sequence in the vDSO.
>>> That's problematic from a security perspective.  I expect that on
>>> POWER9, using a pointer in read-only memory would be equally
>>> non-attractive due to a similar lack of PC-relative addressing.  We
>>> could use the HWCAP bit in the TCB, but that would add another (easy to
>>> predict) conditional branch to every system call.
>>
>> I would have to defer to glibc devs on this. Conditional branch
>> should be acceptable I think, scv improves speed as much as several
>> mispredicted branches (about 90 cycles).
> 
> But we'd have to pay for that branch (and likely the LR clobber) on
> legacy POWER, and that's something to consider, too.

We would that's true.

> Is there an additional performance hit if a process uses both the old
> and new system call sequence?

No state or logic is required to switch between them or run them
concurrently. Just the extra instruction footprint.

Thanks,
Nick



[PATCH RESEND] powerpc: indent to improve Kconfig readability

2020-01-28 Thread Randy Dunlap
From: Randy Dunlap 

Indent a Kconfig continuation line to improve readability.

Signed-off-by: Randy Dunlap 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/Kconfig |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-next-20200128.orig/arch/powerpc/Kconfig
+++ linux-next-20200128/arch/powerpc/Kconfig
@@ -478,7 +478,7 @@ config MPROFILE_KERNEL
 config HOTPLUG_CPU
bool "Support for enabling/disabling CPUs"
depends on SMP && (PPC_PSERIES || \
-   PPC_PMAC || PPC_POWERNV || FSL_SOC_BOOKE)
+   PPC_PMAC || PPC_POWERNV || FSL_SOC_BOOKE)
help
  Say Y here to be able to disable and re-enable individual
  CPUs at runtime on SMP machines.



Re: [PATCH v2 07/10] powerpc/configs/skiroot: Enable security features

2020-01-28 Thread Daniel Axtens
Joel Stanley  writes:

> On Tue, 21 Jan 2020 at 04:30, Michael Ellerman  wrote:
>>
>> From: Joel Stanley 
>>
>> This turns on HARDENED_USERCOPY with HARDENED_USERCOPY_PAGESPAN, and
>> FORTIFY_SOURCE.
>>
>> It also enables SECURITY_LOCKDOWN_LSM with _EARLY and
>> LOCK_DOWN_KERNEL_FORCE_INTEGRITY options enabled. This still allows
>> xmon to be used in read-only mode.
>>
>> MODULE_SIG is selected by lockdown, so it is still enabled.
>>
>> Signed-off-by: Joel Stanley 
>> [mpe: Switch to lockdown integrity mode per oohal]
>> Signed-off-by: Michael Ellerman 
>
> I did some testing and with this change we break kexec. As it's critical
> for this kernel to be able to kexec we need to set KEXEC_FILE=y if
> we're setting FORCE_INTEGRITY=y.
>
> I've tested your series with that modification made and userspace was
> once again able to kexec (with -s).

Have the changes that enable this landed in kexec-lite and petitboot yet?
I had to manually patch them when I was experimenting with it
recently...

Regards,
Daniel

>
> Cheers,
>
> Joel
>
>> ---
>>  arch/powerpc/configs/skiroot_defconfig | 11 ++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> v2: Switch to lockdown integrity mode rather than confidentiality as noticed 
>> by
>> dja and discussed with jms and oohal.
>>
>> diff --git a/arch/powerpc/configs/skiroot_defconfig 
>> b/arch/powerpc/configs/skiroot_defconfig
>> index 24a210fe0049..93b478436a2b 100644
>> --- a/arch/powerpc/configs/skiroot_defconfig
>> +++ b/arch/powerpc/configs/skiroot_defconfig
>> @@ -49,7 +49,6 @@ CONFIG_JUMP_LABEL=y
>>  CONFIG_STRICT_KERNEL_RWX=y
>>  CONFIG_MODULES=y
>>  CONFIG_MODULE_UNLOAD=y
>> -CONFIG_MODULE_SIG=y
>>  CONFIG_MODULE_SIG_FORCE=y
>>  CONFIG_MODULE_SIG_SHA512=y
>>  CONFIG_PARTITION_ADVANCED=y
>> @@ -272,6 +271,16 @@ CONFIG_NLS_ASCII=y
>>  CONFIG_NLS_ISO8859_1=y
>>  CONFIG_NLS_UTF8=y
>>  CONFIG_ENCRYPTED_KEYS=y
>> +CONFIG_SECURITY=y
>> +CONFIG_HARDENED_USERCOPY=y
>> +# CONFIG_HARDENED_USERCOPY_FALLBACK is not set
>> +CONFIG_HARDENED_USERCOPY_PAGESPAN=y
>> +CONFIG_FORTIFY_SOURCE=y
>> +CONFIG_SECURITY_LOCKDOWN_LSM=y
>> +CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
>> +CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY=y
>> +# CONFIG_INTEGRITY is not set
>> +CONFIG_LSM="yama,loadpin,safesetid,integrity"
>>  # CONFIG_CRYPTO_HW is not set
>>  CONFIG_CRC16=y
>>  CONFIG_CRC_ITU_T=y
>> --
>> 2.21.1
>>


Re: [PATCH] powerpc/drmem: cache LMBs in xarray to accelerate lookup

2020-01-28 Thread Nathan Lynch
Scott Cheloha  writes:
> LMB lookup is currently an O(n) linear search.  This scales poorly when
> there are many LMBs.
>
> If we cache each LMB by both its base address and its DRC index
> in an xarray we can cut lookups to O(log n), greatly accelerating
> drmem initialization and memory hotplug.
>
> This patch introduces two xarrays of LMBs and fills them during
> drmem initialization.  The patch also adds two interfaces for LMB
> lookup.

Good but can you replace the array of LMBs altogether
(drmem_info->lmbs)? xarray allows iteration over the members if needed.


[PATCH] powerpc/drmem: cache LMBs in xarray to accelerate lookup

2020-01-28 Thread Scott Cheloha
LMB lookup is currently an O(n) linear search.  This scales poorly when
there are many LMBs.

If we cache each LMB by both its base address and its DRC index
in an xarray we can cut lookups to O(log n), greatly accelerating
drmem initialization and memory hotplug.

This patch introduces two xarrays of LMBs and fills them during
drmem initialization.  The patch also adds two interfaces for LMB
lookup.

The first interface, drmem_find_lmb_by_base_addr(), is employed in
hot_add_drconf_scn_to_nid() to replace a linear search.  This speeds up
memory_add_physaddr_to_nid(), which is called by lmb_set_nid(), an
interface used during drmem initialization and memory hotplug.

The second interface, drmem_find_lmb_by_drc_index(), is employed in
get_lmb_range() to replace a linear search.  This speeds up
dlpar_memory_add_by_ic() and dlpar_memory_remove_by_ic(), interfaces
used during memory hotplug.

These substitutions yield significant improvements:

1. A POWER9 VM with a maximum memory of 10TB and 256MB LMBs has
   40960 LMBs.  With this patch it completes drmem_init() ~1138ms
   faster.

Before:
[0.542244] drmem: initializing drmem v1
[1.768787] drmem: initialized 40960 LMBs

After:
[0.543611] drmem: initializing drmem v1
[0.631386] drmem: initialized 40960 LMBs

2. A POWER9 VM with a maximum memory of 4TB and 256MB LMBs has
   16384 LMBs.  Via the qemu monitor we can hot-add memory as
   virtual DIMMs.  Each DIMM is 256 LMBs.  With this patch we
   hot-add every possible LMB about 60 seconds faster.

Before:
[   17.422177] pseries-hotplug-mem: Attempting to hot-add 256 LMB(s) at index 
8100
[...]
[  167.285563] pseries-hotplug-mem: Memory at 3fff000 (drc index 80003fff) 
was hot-added

After:
[   14.753480] pseries-hotplug-mem: Attempting to hot-add 256 LMB(s) at index 
8100
[...]
[  103.934092] pseries-hotplug-mem: Memory at 3fff000 (drc index 80003fff) 
was hot-added

Signed-off-by: Scott Cheloha 
---
These linear searches become a serious bottleneck as the machine
approaches 64TB.  There are just too many LMBs to use a linear
search.

On a 60TB machine we recently saw the following soft lockup during
drmem_init():

[   60.602386] watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [swapper/0:1]
[   60.602414] Modules linked in:
[   60.602417] Supported: No, Unreleased kernel
[   60.602423] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 5.3.18-2-default #1 
SLE15-SP2 (unreleased)
[   60.602426] NIP:  c0095c0c LR: c0095bb0 CTR: 
[   60.602430] REGS: c00022c7fc497830 TRAP: 0901   Not tainted  
(5.3.18-2-default)
[   60.602432] MSR:  82009033   CR: 44000244  
XER: 
[   60.602442] CFAR: c0095c18 IRQMASK: 0 
   GPR00: c0095bb0 c00022c7fc497ac0 c162cc00 
c0003bff5e08 
   GPR04:  c0ea539a 002f 
1000 
   GPR08: c7fc5f59ffb8 14bc5000 c7fc5f1f1a30 
c14f2fb8 
   GPR12:  c0001e980600 
[   60.602464] NIP [c0095c0c] hot_add_scn_to_nid+0xbc/0x400
[   60.602467] LR [c0095bb0] hot_add_scn_to_nid+0x60/0x400
[   60.602470] Call Trace:
[   60.602473] [c00022c7fc497ac0] [c0095bb0] 
hot_add_scn_to_nid+0x60/0x400 (unreliable)
[   60.602478] [c00022c7fc497b20] [c007a6a0] 
memory_add_physaddr_to_nid+0x20/0x60
[   60.602483] [c00022c7fc497b40] [c10235a4] drmem_init+0x258/0x2d8
[   60.602485] [c00022c7fc497c10] [c0010694] do_one_initcall+0x64/0x300
[   60.602489] [c00022c7fc497ce0] [c10144f8] 
kernel_init_freeable+0x2e8/0x3fc
[   60.602491] [c00022c7fc497db0] [c0010b0c] kernel_init+0x2c/0x160
[   60.602497] [c00022c7fc497e20] [c000b960] 
ret_from_kernel_thread+0x5c/0x7c
[   60.602498] Instruction dump:
[   60.602501] 7d0a4214 7faa4040 419d0328 e92a0010 71290088 2fa90008 409e001c 
e92a 
[   60.602506] 7fbe4840 419c0010 7d274a14 7fbe4840 <419c00e4> 394a0018 7faa4040 
409dffd0 

This patch should eliminate the drmem_init() bottleneck during boot.

One other important thing to note is that this only addresses part of
the slowdown during memory hotplug when there are many LMBs.  A far larger
part of it is caused by the linear memblock search in find_memory_block().
That problem is addressed with this patch:

https://lore.kernel.org/lkml/20200121231028.13699-1-chel...@linux.ibm.com/

which is in linux-next.  The numbers I quote here in the commit message
for time improvements during hotplug are taken with that patch applied.

Without it, hotplug is even slower.

 arch/powerpc/include/asm/drmem.h  |  3 ++
 arch/powerpc/mm/drmem.c   | 33 +++
 arch/powerpc/mm/numa.c| 30 +++--
 .../platforms/pseries/hotplug-memory.c| 11 ++-
 4 files changed, 50 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h 

Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Joseph Myers
On Tue, 28 Jan 2020, Florian Weimer wrote:

> I don't think we can save LR in a regular register around the system
> call, explicitly in the inline asm statement, because we still have to
> generate proper unwinding information using CFI directives, something
> that you cannot do from within the asm statement.

What other architectures in glibc have done for code sequences for 
syscalls that are problematic for compiler-generated CFI is made the C 
syscall macros call separate functions defined in a .S file (see 
sysdeps/unix/sysv/linux/arm/libc-do-syscall.S, 
sysdeps/unix/sysv/linux/i386/libc-do-syscall.S, 
sysdeps/unix/sysv/linux/mips/mips32/mips-syscall[567].S).  I don't know if 
you can do that in this case and still get the performance benefits of the 
new instruction.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v6 03/10] perf/core: open access to probes for CAP_PERFMON privileged process

2020-01-28 Thread James Morris
On Tue, 28 Jan 2020, Alexey Budankov wrote:

> 
> Open access to monitoring via kprobes and uprobes and eBPF tracing for
> CAP_PERFMON privileged process. Providing the access under CAP_PERFMON
> capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes
> chances to misuse the credentials and makes operation more secure.
> 
> perf kprobes and uprobes are used by ftrace and eBPF. perf probe uses
> ftrace to define new kprobe events, and those events are treated as
> tracepoint events. eBPF defines new probes via perf_event_open interface
> and then the probes are used in eBPF tracing.
> 
> CAP_PERFMON implements the principle of least privilege for performance
> monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
> of least privilege: A security design principle that states that a process or
> program be granted only those privileges (e.g., capabilities) necessary to
> accomplish its legitimate function, and only for the time that such privileges
> are actually required)
> 
> For backward compatibility reasons access to perf_events subsystem remains
> open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for
> secure perf_events monitoring is discouraged with respect to CAP_PERFMON
> capability.
> 
> Signed-off-by: Alexey Budankov 
> ---
>  kernel/events/core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)


Acked-by: James Morris 


-- 
James Morris




Re: [PATCH v6 07/10] powerpc/perf: open access for CAP_PERFMON privileged process

2020-01-28 Thread James Morris
On Tue, 28 Jan 2020, Alexey Budankov wrote:

> Signed-off-by: Alexey Budankov 
> ---
>  arch/powerpc/perf/imc-pmu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index cb50a9e1fd2d..e837717492e4 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -898,7 +898,7 @@ static int thread_imc_event_init(struct perf_event *event)
>   if (event->attr.type != event->pmu->type)
>   return -ENOENT;
>  
> - if (!capable(CAP_SYS_ADMIN))
> + if (!perfmon_capable())
>   return -EACCES;
>  
>   /* Sampling not supported */
> @@ -1307,7 +1307,7 @@ static int trace_imc_event_init(struct perf_event 
> *event)
>   if (event->attr.type != event->pmu->type)
>   return -ENOENT;
>  
> - if (!capable(CAP_SYS_ADMIN))
> + if (!perfmon_capable())
>   return -EACCES;
>  
>   /* Return if this is a couting event */
> 


Acked-by: James Morris 

-- 
James Morris




Re: [PATCH v6 06/10] trace/bpf_trace: open access for CAP_PERFMON privileged process

2020-01-28 Thread James Morris
On Tue, 28 Jan 2020, Alexey Budankov wrote:

> 
> Signed-off-by: Alexey Budankov 
> ---
>  kernel/trace/bpf_trace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index e5ef4ae9edb5..334f1d71ebb1 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1395,7 +1395,7 @@ int perf_event_query_prog_array(struct perf_event 
> *event, void __user *info)
>   u32 *ids, prog_cnt, ids_len;
>   int ret;
>  
> - if (!capable(CAP_SYS_ADMIN))
> + if (!perfmon_capable())
>   return -EPERM;
>   if (event->attr.type != PERF_TYPE_TRACEPOINT)
>   return -EINVAL;
> 


Acked-by: James Morris 


-- 
James Morris




Re: [PATCH v6 01/10] capabilities: introduce CAP_PERFMON to kernel and user space

2020-01-28 Thread James Morris
On Tue, 28 Jan 2020, Alexey Budankov wrote:

> 
> Signed-off-by: Alexey Budankov 
> ---
>  include/linux/capability.h  | 4 
>  include/uapi/linux/capability.h | 8 +++-
>  security/selinux/include/classmap.h | 4 ++--
>  3 files changed, 13 insertions(+), 3 deletions(-)


Acked-by: James Morris 


-- 
James Morris




Re: [PATCH v6 10/10] drivers/oprofile: open access for CAP_PERFMON privileged process

2020-01-28 Thread James Morris
On Tue, 28 Jan 2020, Alexey Budankov wrote:

> 
> Open access to monitoring for CAP_PERFMON privileged process. Providing
> the access under CAP_PERFMON capability singly, without the rest of
> CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and
> makes operation more secure.
> 
> CAP_PERFMON implements the principle of least privilege for performance
> monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
> of least privilege: A security design principle that states that a process
> or program be granted only those privileges (e.g., capabilities) necessary
> to accomplish its legitimate function, and only for the time that such
> privileges are actually required)
> 
> For backward compatibility reasons access to the monitoring remains open
> for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
> monitoring is discouraged with respect to CAP_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov 
> ---
>  drivers/oprofile/event_buffer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)


Acked-by: James Morris 

> 
> diff --git a/drivers/oprofile/event_buffer.c b/drivers/oprofile/event_buffer.c
> index 12ea4a4ad607..6c9edc8bbc95 100644
> --- a/drivers/oprofile/event_buffer.c
> +++ b/drivers/oprofile/event_buffer.c
> @@ -113,7 +113,7 @@ static int event_buffer_open(struct inode *inode, struct 
> file *file)
>  {
>   int err = -EPERM;
>  
> - if (!capable(CAP_SYS_ADMIN))
> + if (!perfmon_capable())
>   return -EPERM;
>  
>   if (test_and_set_bit_lock(0, _opened))
> 

-- 
James Morris




Re: [PATCH v6 09/10] drivers/perf: open access for CAP_PERFMON privileged process

2020-01-28 Thread James Morris
On Tue, 28 Jan 2020, Alexey Budankov wrote:

> 
> Open access to monitoring for CAP_PERFMON privileged process.
> Providing the access under CAP_PERFMON capability singly, without the
> rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
> credentials and makes operation more secure.
> 
> CAP_PERFMON implements the principle of least privilege for performance
> monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
> of least privilege: A security design principle that states that a process
> or program be granted only those privileges (e.g., capabilities) necessary
> to accomplish its legitimate function, and only for the time that such
> privileges are actually required)
> 
> For backward compatibility reasons access to the monitoring remains open
> for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
> monitoring is discouraged with respect to CAP_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov 
> ---
>  drivers/perf/arm_spe_pmu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 


Acked-by: James Morris 


> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 4e4984a55cd1..5dff81bc3324 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -274,7 +274,7 @@ static u64 arm_spe_event_to_pmscr(struct perf_event 
> *event)
>   if (!attr->exclude_kernel)
>   reg |= BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
>  
> - if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && capable(CAP_SYS_ADMIN))
> + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable())
>   reg |= BIT(SYS_PMSCR_EL1_CX_SHIFT);
>  
>   return reg;
> @@ -700,7 +700,7 @@ static int arm_spe_pmu_event_init(struct perf_event 
> *event)
>   return -EOPNOTSUPP;
>  
>   reg = arm_spe_event_to_pmscr(event);
> - if (!capable(CAP_SYS_ADMIN) &&
> + if (!perfmon_capable() &&
>   (reg & (BIT(SYS_PMSCR_EL1_PA_SHIFT) |
>   BIT(SYS_PMSCR_EL1_CX_SHIFT) |
>   BIT(SYS_PMSCR_EL1_PCT_SHIFT
> 

-- 
James Morris




Re: [PATCH v6 08/10] parisc/perf: open access for CAP_PERFMON privileged process

2020-01-28 Thread James Morris
On Tue, 28 Jan 2020, Alexey Budankov wrote:

> 
> Open access to monitoring for CAP_PERFMON privileged process.
> Providing the access under CAP_PERFMON capability singly, without the
> rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
> credentials and makes operation more secure.
> 
> CAP_PERFMON implements the principle of least privilege for performance
> monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle
> of least privilege: A security design principle that states that a process
> or program be granted only those privileges (e.g., capabilities) necessary
> to accomplish its legitimate function, and only for the time that such
> privileges are actually required)
> 
> For backward compatibility reasons access to the monitoring remains open
> for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure
> monitoring is discouraged with respect to CAP_PERFMON capability.
> 
> Signed-off-by: Alexey Budankov 
> ---
>  arch/parisc/kernel/perf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/parisc/kernel/perf.c b/arch/parisc/kernel/perf.c
> index 676683641d00..c4208d027794 100644
> --- a/arch/parisc/kernel/perf.c
> +++ b/arch/parisc/kernel/perf.c
> @@ -300,7 +300,7 @@ static ssize_t perf_write(struct file *file, const char 
> __user *buf,
>   else
>   return -EFAULT;
>  
> - if (!capable(CAP_SYS_ADMIN))
> + if (!perfmon_capable())
>   return -EACCES;
>  
>   if (count != sizeof(uint32_t))
> 


Acked-by: James Morris 

-- 
James Morris




Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Segher Boessenkool
On Tue, Jan 28, 2020 at 05:04:49PM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> 
> >> > I don't think we can save LR in a regular register around the system
> >> > call, explicitly in the inline asm statement, because we still have to
> >> > generate proper unwinding information using CFI directives, something
> >> > that you cannot do from within the asm statement.
> >
> > Why not?
> 
> As far as I knowm there isn't a CFI directive that allows us to restore
> the CFI state at the end of the inline assembly.  If we say that LR is
> stored in a different register than what the rest of the function uses,
> that would lead to incorrect CFI after the exit of the inline assembler
> fragment.
> 
> At least that's what I think.  Compilers aren't really my thing.

.cfi_restore?  Or .cfi_remember_state / .cfi_restore_state, that is
probably easiest in inline assembler.

> >> > GCC does not model the condition registers,
> >
> > Huh?  It does model the condition register, as 8 registers in GCC's
> > internal model (one each for CR0..CR7).
> 
> But GCC doesn't expose them as integers to C code, so you can't do much
> without them.

Sure, it doesn't expose any other registers directly, either.

> >> > We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
> >> > provide an ELFv1 port for this given that it's POWER9-specific.
> >
> > We *do* have a 32-bit LE ABI.  And ELFv1 is not 32-bit either.  Please
> > don't confuse these things :-)
> >
> > The 64-bit LE kernel does not really support 32-bit userland (or BE
> > userland), *that* is what you want to say.
> 
> Sorry for the confusion.  Is POWER9 running kernels which are not 64-bit
> LE really a thing in practice, though?

Linux only really supports 64-bit LE userland on p9.  Anything else is
not supported.

> >> > From the glibc perspective, the major question is how we handle run-time
> >> > selection of the system call instruction sequence.
> >
> > Well, if it is inlined you don't have this problem either!  :-)
> 
> How so?  We would have to put the conditional sequence into all inline
> system calls, of course.

Ah, if you support older systems in your program as well, gotcha.  That
is not the usual case (just like people use -mcpu=power9 frequently,
which means the resulting program will not run on any older CPU).


Segher


Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-28 Thread Qian Cai



> On Jan 28, 2020, at 12:47 PM, Catalin Marinas  wrote:
> 
> The primary goal here is not finding regressions but having clearly
> defined semantics of the page table accessors across architectures. x86
> and arm64 are a good starting point and other architectures will be
> enabled as they are aligned to the same semantics.

This still does not answer the fundamental question. If this test is simply 
ineffective at finding bugs, who wants to spend time using it regularly?  If 
this is just a one-off test that may get run once every few years (when 
introducing a new arch), how does that justify the ongoing cost of maintaining 
it?

I do agree there could be a need to clearly define this, but that belongs in 
documentation rather than a test. It is confusing to mix it in with other 
config options which have a somewhat different purpose; it will then be a 
waste of time for people who mistakenly enable it for regular automated 
testing and never find any bug from it.

Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-28 Thread Catalin Marinas
On Mon, Jan 27, 2020 at 09:11:53PM -0500, Qian Cai wrote:
> On Jan 27, 2020, at 8:28 PM, Anshuman Khandual  
> wrote:
> > This adds tests which will validate architecture page table helpers and
> > other accessors in their compliance with expected generic MM semantics.
> > This will help various architectures in validating changes to existing
> > page table helpers or addition of new ones.
[...]
> What’s the value of this block of new code? It only supports x86 and
> arm64 which are supposed to be good now. Did those tests ever find any
> regression or this is almost only useful for new architectures which
> only happened once in a few years?

The primary goal here is not finding regressions but having clearly
defined semantics of the page table accessors across architectures. x86
and arm64 are a good starting point and other architectures will be
enabled as they are aligned to the same semantics.

See for example this past discussion:

https://lore.kernel.org/linux-mm/20190628102003.ga56...@arrakis.emea.arm.com/

These tests should act as the 'contract' between the generic mm code and
the architecture port. Without clear semantics, some bugs may be a lot
subtler than a boot failure.

FTR, I fully support this patch (and I should get around to review it
properly; thanks for the reminder ;)).

-- 
Catalin


Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Adhemerval Zanella



On 28/01/2020 11:05, Nicholas Piggin wrote:
> Florian Weimer's on January 28, 2020 11:09 pm:
>> * Nicholas Piggin:
>>
>>> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
>>>   vectors will return -ENOSYS, and the decision for how to add support for
>>>   a new vector deferred until we see the next user.
>>
>> Seems reasonable.  We don't have to decide this today.
>>
>>> * Proposal is for scv 0 to provide the standard Linux system call ABI with 
>>> some
>>>   differences:
>>>
>>> - LR is volatile across scv calls. This is necessary for support because the
>>>   scv instruction clobbers LR.
>>
>> I think we can express this in the glibc system call assembler wrapper
>> generators.  The mcount profiling wrappers already have this property.
>>
>> But I don't think we are so lucky for the inline system calls.  GCC
>> recognizes an "lr" clobber with inline asm (even though it is not
>> documented), but it generates rather strange assembler output as a
>> result:
>>
>> long
>> f (long x)
>> {
>>   long y;
>>   asm ("#" : "=r" (y) : "r" (x) : "lr");
>>   return y;
>> }
>>
>>  .abiversion 2
>>  .section".text"
>>  .align 2
>>  .p2align 4,,15
>>  .globl f
>>  .type   f, @function
>> f:
>> .LFB0:
>>  .cfi_startproc
>>  mflr 0
>>  .cfi_register 65, 0
>> #APP
>>  # 5 "t.c" 1
>>  #
>>  # 0 "" 2
>> #NO_APP
>>  std 0,16(1)
>>  .cfi_offset 65, 16
>>  ori 2,2,0
>>  ld 0,16(1)
>>  mtlr 0
>>  .cfi_restore 65
>>  blr
>>  .long 0
>>  .byte 0,0,0,1,0,0,0,0
>>  .cfi_endproc
>> .LFE0:
>>  .size   f,.-f
>>
>>
>> That's with GCC 8.3 at -O2.  I don't understand what the ori is about.
> 
> ori 2,2,0 is the group terminating nop hint for POWER8 type cores
> which had dispatch grouping rules.

It is worth noting that it aims to mitigate a load-hit-store CPU stall
on some powerpc chips.

> 
>>
>> I don't think we can save LR in a regular register around the system
>> call, explicitly in the inline asm statement, because we still have to
>> generate proper unwinding information using CFI directives, something
>> that you cannot do from within the asm statement.
>>
>> Supporting this in GCC should not be impossible, but someone who
>> actually knows this stuff needs to look at it.
> 
> The generated assembler actually seems okay to me. If we compile
> something like a syscall and with -mcpu=power9:
> 
> long
> f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0)
> {
>   register long r0 asm ("r0") = _r0;
>   register long r3 asm ("r3") = _r3;
>   register long r4 asm ("r4") = _r4;
>   register long r5 asm ("r5") = _r5;
>   register long r6 asm ("r6") = _r6;
>   register long r7 asm ("r7") = _r7;
>   register long r8 asm ("r8") = _r8;
> 
>   asm ("# scv" : "=r"(r3) : "r"(r0), "r"(r4), "r"(r5), "r"(r6), "r"(r7), 
> "r"(r8) : "lr", "ctr", "cc", "xer");
> 
>   return r3;
> }
> 
> 
> f:
> .LFB0:
> .cfi_startproc
> mflr 0
> std 0,16(1)
> .cfi_offset 65, 16
> mr 0,9
> #APP
>  # 12 "a.c" 1
> # scv
>  # 0 "" 2
> #NO_APP
> ld 0,16(1)
> mtlr 0
> .cfi_restore 65
> blr
> .long 0
> .byte 0,0,0,1,0,0,0,0
> .cfi_endproc
> 
> That gets the LR save/restore right when we're also using r0.
> 
>>
>>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>>>   system call exit to avoid restoring the CR register.
>>
>> This sounds reasonable, but I don't know what kind of knock-on effects
>> this has.  The inline system call wrappers can handle this with minor
>> tweaks.
> 
> Okay, good. In the end we would have to check code trace through the
> kernel and libc of course, but I think there's little to no opportunity
> to take advantage of current extra non-volatile cr regs.
> 
> mtcr has to write 8 independently renamed registers so it's cracked into
> 2 insns on POWER9 (and likely to always be a bit troublesome). It's not
> much in the scheme of a system call, but while we can tweak the ABI...

We don't really need a mfcr/mfocr to implement the Linux syscall ABI on
powerpc, we can use a 'bns+' plus a neg instead as:

--
#define internal_syscall6(name, err, nr, arg1, arg2, arg3, arg4, arg5,  \
  arg6) \
  ({\
register long int r0  __asm__ ("r0") = (long int) (name);   \
register long int r3  __asm__ ("r3") = (long int) (arg1);   \
register long int r4  __asm__ ("r4") = (long int) (arg2);   \
register long int r5  __asm__ ("r5") = (long int) (arg3);   \
register long int r6  __asm__ ("r6") = (long int) (arg4);   \
register long int r7  __asm__ ("r7") = (long int) (arg5);   \
register long int r8  __asm__ ("r8") = (long int) (arg6);   \
__asm__ __volatile__
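A sketch of the 'bns+' plus 'neg' tail the macro's asm body would use to implement the -errno convention without reading the CR into a GPR (illustrative only, not glibc's actual code):

```asm
sc              # system call; kernel sets CR0[SO] on error,
                # leaving the positive errno value in r3
bns+  1f        # CR0[SO] clear: success, skip the fixup
                # ('+' hints the branch as usually taken)
neg   3,3       # error path: r3 = -errno
1:
```

This avoids the mfcr/mfocr that the current CR0[SO]-testing sequence needs, at the cost of one well-predicted branch.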

Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-28 Thread Vineet Gupta
On 1/27/20 7:33 PM, Qian Cai wrote:
>
>>> What’s the value of this block of new code? It only supports x86 and arm64
>>> which are supposed to be good now.
>> We have been over the usefulness of this code many times before as the patch 
>> is
>> already in its V12. Currently it is enabled on arm64, x86 (except PAE), arc 
>> and
>> ppc32. There are build time or runtime problems with other archs which 
>> prevent
> I am not sure if I care too much about arc and ppc32 which are pretty much 
> legacy
> platforms.

You really need to brush up on your definition and knowledge of what "legacy" 
means.
ARC is actively maintained and used by several customers, some in arch/arc/plat*
and some not in there.
It is present in broadband routers used by major ISPs, massively multicore deep
packet inspection systems from EZChip, and many more.

Sure you may not care about them, but the maintainers for the platforms do.
It would have been better if you had spent the time and energy in improving the
code over 12 revisions rather than bike shedding.

-Vineet


Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-28 Thread Christophe Leroy




On 28/01/2020 at 02:27, Anshuman Khandual wrote:

This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.

This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
levels, along with populating intermediate entries with the next page table page
and validating them.

Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
right after page_alloc_init_late().

This gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected along with
CONFIG_VM_DEBUG. Architectures willing to subscribe this test also need to
select CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE which for now is limited to x86 and
arm64. Going forward, other architectures too can enable this after fixing
build or runtime problems (if any) with their page table helpers.

Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot. Any non-conformity here will
be reported as a warning which would need to be fixed. This test will help
catch any changes to the agreed upon semantics expected from generic MM and
enable platforms to accommodate it thereafter.



[...]



Tested-by: Christophe Leroy  #PPC32


Also tested on PPC64 (under QEMU): book3s/64 64k pages, book3s/64 4k 
pages and book3e/64



Reviewed-by: Ingo Molnar 
Suggested-by: Catalin Marinas 
Signed-off-by: Andrew Morton 
Signed-off-by: Christophe Leroy 
Signed-off-by: Anshuman Khandual 
---


[...]



diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt 
b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
new file mode 100644
index ..f3f8111edbe3
--- /dev/null
+++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
@@ -0,0 +1,35 @@
+#
+# Feature name:  debug-vm-pgtable
+# Kconfig:   ARCH_HAS_DEBUG_VM_PGTABLE
+# description:   arch supports pgtable tests for semantics compliance
+#
+---
+| arch |status|
+---
+|   alpha: | TODO |
+| arc: |  ok  |
+| arm: | TODO |
+|   arm64: |  ok  |
+| c6x: | TODO |
+|csky: | TODO |
+|   h8300: | TODO |
+| hexagon: | TODO |
+|ia64: | TODO |
+|m68k: | TODO |
+|  microblaze: | TODO |
+|mips: | TODO |
+|   nds32: | TODO |
+|   nios2: | TODO |
+|openrisc: | TODO |
+|  parisc: | TODO |
+|  powerpc/32: |  ok  |
+|  powerpc/64: | TODO |


You can replace the two lines above with:

powerpc: ok


+|   riscv: | TODO |
+|s390: | TODO |
+|  sh: | TODO |
+|   sparc: | TODO |
+|  um: | TODO |
+|   unicore32: | TODO |
+| x86: |  ok  |
+|  xtensa: | TODO |
+---



diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1ec34e16ed65..253dcab0bebc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -120,6 +120,7 @@ config PPC
#
select ARCH_32BIT_OFF_T if PPC32
select ARCH_HAS_DEBUG_VIRTUAL
+   select ARCH_HAS_DEBUG_VM_PGTABLE if PPC32


Remove the 'if PPC32' as we now know it also works on PPC64.


select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE



diff --git a/arch/x86/include/asm/pgtable_64.h 
b/arch/x86/include/asm/pgtable_64.h
index 0b6c4042942a..fb0e76d254b3 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -53,6 +53,12 @@ static inline void sync_initial_page_table(void) { }
  
  struct mm_struct;
  
+#define mm_p4d_folded mm_p4d_folded

+static inline bool mm_p4d_folded(struct mm_struct *mm)
+{
+   return !pgtable_l5_enabled();
+}
+


For me this should be part of another patch, it is not directly linked 
to the tests.



  void set_pte_vaddr_p4d(p4d_t *p4d_page, unsigned long vaddr, pte_t new_pte);
  void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
  
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h

index 798ea36a0549..e0b04787e789 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1208,6 +1208,12 @@ static inline bool arch_has_pfn_modify_check(void)
  # define PAGE_KERNEL_EXEC PAGE_KERNEL
  #endif
  
+#ifdef CONFIG_DEBUG_VM_PGTABLE


Not sure it is a good idea to put that in include/asm-generic/pgtable.h

By doing this you are forcing a 

Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Florian Weimer
* Segher Boessenkool:

>> > I don't think we can save LR in a regular register around the system
>> > call, explicitly in the inline asm statement, because we still have to
>> > generate proper unwinding information using CFI directives, something
>> > that you cannot do from within the asm statement.
>
> Why not?

As far as I know, there isn't a CFI directive that allows us to restore
the CFI state at the end of the inline assembly.  If we say that LR is
stored in a different register than what the rest of the function uses,
that would lead to incorrect CFI after the exit of the inline assembler
fragment.

At least that's what I think.  Compilers aren't really my thing.

>
>> >> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
>> >>   instruction on the kernel side, and it is currently not implemented well
>> >>   in glibc, requiring a mfcr (mfocr should be possible and asm goto 
>> >> support
>> >>   would allow a better implementation). Is it worth continuing this style 
>> >> of
>> >>   error handling? Or just move to -ve return means error? Using a 
>> >> different
>> >>   bit would allow the kernel to piggy back the CR return code setting with
>> >>   a test for the error case exit.
>> > 
>> > GCC does not model the condition registers,
>
> Huh?  It does model the condition register, as 8 registers in GCC's
> internal model (one each for CR0..CR7).

But GCC doesn't expose them as integers to C code, so you can't do much
without them.

>> >> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>> >>   calls if there was interest in developing an ABI for 32-bit programs.
>> >>   Marginal benefit in avoiding compat syscall selection.
>> > 
>> > We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
>> > provide an ELFv1 port for this given that it's POWER9-specific.
>
> We *do* have a 32-bit LE ABI.  And ELFv1 is not 32-bit either.  Please
> don't confuse these things :-)
>
> The 64-bit LE kernel does not really support 32-bit userland (or BE
> userland), *that* is what you want to say.

Sorry for the confusion.  Is POWER9 running kernels which are not 64-bit
LE really a thing in practice, though?

>> > From the glibc perspective, the major question is how we handle run-time
>> > selection of the system call instruction sequence.
>
> Well, if it is inlined you don't have this problem either!  :-)

How so?  We would have to put the conditional sequence into all inline
system calls, of course.

Thanks,
Florian



Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Florian Weimer
* Nicholas Piggin:

> That gets the LR save/restore right when we're also using r0.

Yes, I agree it looks good.  Nice.

>> But the kernel uses the -errno convention internally, so I think it
>> would make sense to pass this to userspace and not convert back and
>> forth.  This would align with what most of the architectures do, and
>> also avoids the GCC oddity.
>
> Yes I would be interested in opinions for this option. It seems like
> matching other architectures is a good idea. Maybe there are some
> reasons not to.

If there were a POWER-specific system call that uses all result bits and
doesn't have room for the 4096 error states (or an error number that's
outside that range), that would be a blocker.  I can't find such a
system call wrapped in the glibc sources.  musl's inline syscalls always
convert the errno state to -errno, so it's not possible to use such a
system call there.

>>> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>>>   calls if there was interest in developing an ABI for 32-bit programs.
>>>   Marginal benefit in avoiding compat syscall selection.
>> 
>> We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
>> provide an ELFv1 port for this given that it's POWER9-specific.
>
> Okay. There's no reason not to enable this for BE, at least for the
> kernel it's no additional work so it probably remains enabled (unless
> there is something really good we could do with the ABI if we exclude
> ELFv1 but I don't see anything).
>
> But if glibc only builds for ELFv2 support that's probably reasonable.

To be clear, we still support ELFv1 for POWER, but given that this
feature is POWER9 and later, I expect the number of users benefiting
from 32-bit support (or ELFv1 and thus big-endian support) to be quite
small.

Especially if we go the conditional branch route, I would restrict this
to ELFv2 in glibc, at least for default builds.

>> From the glibc perspective, the major question is how we handle run-time
>> selection of the system call instruction sequence.  On i386, we use a
>> function pointer in the TCB to call an instruction sequence in the vDSO.
>> That's problematic from a security perspective.  I expect that on
>> POWER9, using a pointer in read-only memory would be equally
>> non-attractive due to a similar lack of PC-relative addressing.  We
>> could use the HWCAP bit in the TCB, but that would add another (easy to
>> predict) conditional branch to every system call.
>
> I would have to defer to glibc devs on this. Conditional branch
> should be acceptable I think, scv improves speed as much as several
> mispredicted branches (about 90 cycles).

But we'd have to pay for that branch (and likely the LR clobber) on
legacy POWER, and that's something to consider, too.

Is there an additional performance hit if a process uses both the old
and new system call sequence?

Thanks,
Florian



Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Segher Boessenkool
On Wed, Jan 29, 2020 at 12:05:40AM +1000, Nicholas Piggin wrote:
> Florian Weimer's on January 28, 2020 11:09 pm:
> > But I don't think we are so lucky for the inline system calls.  GCC
> > recognizes an "lr" clobber with inline asm (even though it is not
> > documented), but it generates rather strange assembler output as a
> > result:

> > std 0,16(1)
> > ori 2,2,0
> > ld 0,16(1)

> > That's with GCC 8.3 at -O2.  I don't understand what the ori is about.
> 
> ori 2,2,0 is the group terminating nop hint for POWER8 type cores
> which had dispatch grouping rules.

Yup.  GCC generates that here to force the load into a different
scheduling group than the corresponding store is, because that otherwise
would cause very expensive pipeline flushes.  It does that if it knows it
is the same address (like here).
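Concretely, in the generated code above the store and load hit the same stack slot; the nop keeps them in separate dispatch groups (annotated sketch of the sequence already shown):

```asm
std  0,16(1)    # save the LR copy to the stack slot at 16(r1)
ori  2,2,0      # group-terminating nop: ends the current dispatch group
ld   0,16(1)    # reload from 16(r1) in a later group, avoiding the
                # expensive load-hit-store flush on POWER8-style cores
```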

> > I don't think we can save LR in a regular register around the system
> > call, explicitly in the inline asm statement, because we still have to
> > generate proper unwinding information using CFI directives, something
> > that you cannot do from within the asm statement.

Why not?

> >> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
> >>   instruction on the kernel side, and it is currently not implemented well
> >>   in glibc, requiring a mfcr (mfocr should be possible and asm goto support
> >>   would allow a better implementation). Is it worth continuing this style 
> >> of
> >>   error handling? Or just move to -ve return means error? Using a different
> >>   bit would allow the kernel to piggy back the CR return code setting with
> >>   a test for the error case exit.
> > 
> > GCC does not model the condition registers,

Huh?  It does model the condition register, as 8 registers in GCC's
internal model (one each for CR0..CR7).

There is no way to use CR0 across function calls, with our ABIs: it is
a volatile register.

GCC does not model the SO bits in the CR fields.

If the calling convention would only use registers GCC *does* know
about, we can have a builtin for this, so that you can get better
inlining etc., no need for an assembler wrapper.

> > But the kernel uses the -errno convention internally, so I think it
> > would make sense to pass this to userspace and not convert back and
> > forth.  This would align with what most of the architectures do, and
> > also avoids the GCC oddity.
> 
> Yes I would be interested in opinions for this option. It seems like
> matching other architectures is a good idea. Maybe there are some
> reasons not to.

Agreed with you both here.

> >> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
> >>   calls if there was interest in developing an ABI for 32-bit programs.
> >>   Marginal benefit in avoiding compat syscall selection.
> > 
> > We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
> > provide an ELFv1 port for this given that it's POWER9-specific.

We *do* have a 32-bit LE ABI.  And ELFv1 is not 32-bit either.  Please
don't confuse these things :-)

The 64-bit LE kernel does not really support 32-bit userland (or BE
userland), *that* is what you want to say.

> > From the glibc perspective, the major question is how we handle run-time
> > selection of the system call instruction sequence.

Well, if it is inlined you don't have this problem either!  :-)


Segher


Re: [PASEMI PA6T PPC] Onboard CF card device with new SanDisk High (>8G) CF cards

2020-01-28 Thread Christian Zigotzky

On 28 January 2020 at 3:16 pm, Rob Herring wrote:

On Tue, Jan 28, 2020 at 2:01 AM Christian Zigotzky
 wrote:

Hi All,

Which mailing list is responsible for the pata_pcmcia driver? We are
using new SanDisk High (>8G) CF cards with this driver [1] and we need
the following line in the file "drivers/ata/pata_pcmcia.c".

+PCMCIA_DEVICE_MANF_CARD(0x00f1, 0x0101),/* SanDisk High
(>8G) CFA */

Run get_maintainers.pl and it will answer that for you:

$ scripts/get_maintainer.pl -f drivers/ata/pata_pcmcia.c
Bartlomiej Zolnierkiewicz 
(maintainer:LIBATA PATA DRIVERS)
Jens Axboe  (maintainer:LIBATA PATA DRIVERS)
linux-...@vger.kernel.org (open list:LIBATA PATA DRIVERS)
linux-ker...@vger.kernel.org (open list)

Thank you!


Re: [PATCH] of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc

2020-01-28 Thread Rob Herring
On Sun, 26 Jan 2020 22:52:47 +1100, Michael Ellerman wrote:
> There's an OF helper called of_dma_is_coherent(), which checks if a
> device has a "dma-coherent" property to see if the device is coherent
> for DMA.
> 
> But on some platforms devices are coherent by default, and on some
> platforms it's not possible to update existing device trees to add the
> "dma-coherent" property.
> 
> So add a Kconfig symbol to allow arch code to tell
> of_dma_is_coherent() that devices are coherent by default, regardless
> of the presence of the property.
> 
> Select that symbol on powerpc when NOT_COHERENT_CACHE is not set, ie.
> when the system has a coherent cache.
> 
> Fixes: 92ea637edea3 ("of: introduce of_dma_is_coherent() helper")
> Cc: sta...@vger.kernel.org # v3.16+
> Reported-by: Christian Zigotzky 
> Tested-by: Christian Zigotzky 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/Kconfig | 1 +
>  drivers/of/Kconfig   | 4 
>  drivers/of/address.c | 6 +-
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 

Applied, thanks.

Rob


[PATCH 4.9 160/271] perf/ioctl: Add check for the sample_period value

2020-01-28 Thread Greg Kroah-Hartman
From: Ravi Bangoria 

[ Upstream commit 913a90bc5a3a06b1f04c337320e9aeee2328dd77 ]

perf_event_open() limits the sample_period to 63 bits. See:

  0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")

Make ioctl() consistent with it.

Also on PowerPC, negative sample_period could cause a recursive
PMIs leading to a hang (reported when running perf-fuzzer).

Signed-off-by: Ravi Bangoria 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: a...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: ma...@linux.vnet.ibm.com
Cc: m...@ellerman.id.au
Fixes: 0819b2e30ccb ("perf: Limit perf_event_attr::sample_period to 63 bits")
Link: https://lkml.kernel.org/r/20190604042953.914-1-ravi.bango...@linux.ibm.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Sasha Levin 
---
 kernel/events/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5bbf7537a6121..64ace5e9af2a0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4624,6 +4624,9 @@ static int perf_event_period(struct perf_event *event, 
u64 __user *arg)
if (perf_event_check_period(event, value))
return -EINVAL;
 
+   if (!event->attr.freq && (value & (1ULL << 63)))
+   return -EINVAL;
+
event_function_call(event, __perf_event_period, &value);
 
return 0;
-- 
2.20.1





Re: [PASEMI PA6T PPC] Onboard CF card device with new SanDisk High (>8G) CF cards

2020-01-28 Thread Rob Herring
On Tue, Jan 28, 2020 at 2:01 AM Christian Zigotzky
 wrote:
>
> Hi All,
>
> Which mailing list is responsible for the pata_pcmcia driver? We are
> using new SanDisk High (>8G) CF cards with this driver [1] and we need
> the following line in the file "drivers/ata/pata_pcmcia.c".
>
> +PCMCIA_DEVICE_MANF_CARD(0x00f1, 0x0101),/* SanDisk High
> (>8G) CFA */

Run get_maintainers.pl and it will answer that for you:

$ scripts/get_maintainer.pl -f drivers/ata/pata_pcmcia.c
Bartlomiej Zolnierkiewicz 
(maintainer:LIBATA PATA DRIVERS)
Jens Axboe  (maintainer:LIBATA PATA DRIVERS)
linux-...@vger.kernel.org (open list:LIBATA PATA DRIVERS)
linux-ker...@vger.kernel.org (open list)


Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Nicholas Piggin
Florian Weimer's on January 28, 2020 11:09 pm:
> * Nicholas Piggin:
> 
>> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
>>   vectors will return -ENOSYS, and the decision for how to add support for
>>   a new vector deferred until we see the next user.
> 
> Seems reasonable.  We don't have to decide this today.
> 
>> * Proposal is for scv 0 to provide the standard Linux system call ABI with 
>> some
>>   differences:
>>
>> - LR is volatile across scv calls. This is necessary for support because the
>>   scv instruction clobbers LR.
> 
> I think we can express this in the glibc system call assembler wrapper
> generators.  The mcount profiling wrappers already have this property.
> 
> But I don't think we are so lucky for the inline system calls.  GCC
> recognizes an "lr" clobber with inline asm (even though it is not
> documented), but it generates rather strange assembler output as a
> result:
> 
> long
> f (long x)
> {
>   long y;
>   asm ("#" : "=r" (y) : "r" (x) : "lr");
>   return y;
> }
> 
>   .abiversion 2
>   .section".text"
>   .align 2
>   .p2align 4,,15
>   .globl f
>   .type   f, @function
> f:
> .LFB0:
>   .cfi_startproc
>   mflr 0
>   .cfi_register 65, 0
> #APP
>  # 5 "t.c" 1
>   #
>  # 0 "" 2
> #NO_APP
>   std 0,16(1)
>   .cfi_offset 65, 16
>   ori 2,2,0
>   ld 0,16(1)
>   mtlr 0
>   .cfi_restore 65
>   blr
>   .long 0
>   .byte 0,0,0,1,0,0,0,0
>   .cfi_endproc
> .LFE0:
>   .size   f,.-f
> 
> 
> That's with GCC 8.3 at -O2.  I don't understand what the ori is about.

ori 2,2,0 is the group terminating nop hint for POWER8 type cores
which had dispatch grouping rules.

> 
> I don't think we can save LR in a regular register around the system
> call, explicitly in the inline asm statement, because we still have to
> generate proper unwinding information using CFI directives, something
> that you cannot do from within the asm statement.
> 
> Supporting this in GCC should not be impossible, but someone who
> actually knows this stuff needs to look at it.

The generated assembler actually seems okay to me. If we compile
something like a syscall and with -mcpu=power9:

long
f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0)
{
  register long r0 asm ("r0") = _r0;
  register long r3 asm ("r3") = _r3;
  register long r4 asm ("r4") = _r4;
  register long r5 asm ("r5") = _r5;
  register long r6 asm ("r6") = _r6;
  register long r7 asm ("r7") = _r7;
  register long r8 asm ("r8") = _r8;

  asm ("# scv" : "=r"(r3) : "r"(r0), "r"(r4), "r"(r5), "r"(r6), "r"(r7), 
"r"(r8) : "lr", "ctr", "cc", "xer");

  return r3;
}


f:
.LFB0:
.cfi_startproc
mflr 0
std 0,16(1)
.cfi_offset 65, 16
mr 0,9
#APP
 # 12 "a.c" 1
# scv
 # 0 "" 2
#NO_APP
ld 0,16(1)
mtlr 0
.cfi_restore 65
blr
.long 0
.byte 0,0,0,1,0,0,0,0
.cfi_endproc

That gets the LR save/restore right when we're also using r0.

> 
>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>>   system call exit to avoid restoring the CR register.
> 
> This sounds reasonable, but I don't know what kind of knock-on effects
> this has.  The inline system call wrappers can handle this with minor
> tweaks.

Okay, good. In the end we would have to check code trace through the
kernel and libc of course, but I think there's little to no opportunity
to take advantage of current extra non-volatile cr regs.

mtcr has to write 8 independently renamed registers so it's cracked into
2 insns on POWER9 (and likely to always be a bit troublesome). It's not
much in the scheme of a system call, but while we can tweak the ABI...

> 
>> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
>>   instruction on the kernel side, and it is currently not implemented well
>>   in glibc, requiring a mfcr (mfocr should be possible and asm goto support
>>   would allow a better implementation). Is it worth continuing this style of
>>   error handling? Or just move to -ve return means error? Using a different
>>   bit would allow the kernel to piggy back the CR return code setting with
>>   a test for the error case exit.
> 
> GCC does not model the condition registers, so for inline system calls,
> we have to produce a value anyway that the subsequence C code can check.
> The assembler syscall wrappers do not need to do this, of course, but
> I'm not sure which category of interfaces is more important.

Right. asm goto can improve this kind of pattern if it's inlined
into the C code which tests the result: it can branch using the flags
to the C error-handling label, rather than move flags into a GPR, test
the GPR, and branch. However...

> But the kernel uses the -errno convention internally, so I think it
> would make sense to pass this to userspace and not convert back and
> forth.  This would align with what most of the architectures do, and
> also avoids the GCC oddity.

Re: powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Florian Weimer
* Nicholas Piggin:

> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
>   vectors will return -ENOSYS, and the decision for how to add support for
>   a new vector deferred until we see the next user.

Seems reasonable.  We don't have to decide this today.

> * Proposal is for scv 0 to provide the standard Linux system call ABI with 
> some
>   differences:
>
> - LR is volatile across scv calls. This is necessary for support because the
>   scv instruction clobbers LR.

I think we can express this in the glibc system call assembler wrapper
generators.  The mcount profiling wrappers already have this property.

But I don't think we are so lucky for the inline system calls.  GCC
recognizes an "lr" clobber with inline asm (even though it is not
documented), but it generates rather strange assembler output as a
result:

long
f (long x)
{
  long y;
  asm ("#" : "=r" (y) : "r" (x) : "lr");
  return y;
}

.abiversion 2
.section".text"
.align 2
.p2align 4,,15
.globl f
.type   f, @function
f:
.LFB0:
.cfi_startproc
mflr 0
.cfi_register 65, 0
#APP
 # 5 "t.c" 1
#
 # 0 "" 2
#NO_APP
std 0,16(1)
.cfi_offset 65, 16
ori 2,2,0
ld 0,16(1)
mtlr 0
.cfi_restore 65
blr
.long 0
.byte 0,0,0,1,0,0,0,0
.cfi_endproc
.LFE0:
.size   f,.-f


That's with GCC 8.3 at -O2.  I don't understand what the ori is about.

I don't think we can save LR in a regular register around the system
call, explicitly in the inline asm statement, because we still have to
generate proper unwinding information using CFI directives, something
that you cannot do from within the asm statement.

Supporting this in GCC should not be impossible, but someone who
actually knows this stuff needs to look at it.

> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>   system call exit to avoid restoring the CR register.

This sounds reasonable, but I don't know what kind of knock-on effects
this has.  The inline system call wrappers can handle this with minor
tweaks.

> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
>   instruction on the kernel side, and it is currently not implemented well
>   in glibc, requiring a mfcr (mfocr should be possible and asm goto support
>   would allow a better implementation). Is it worth continuing this style of
>   error handling? Or just move to -ve return means error? Using a different
>   bit would allow the kernel to piggy back the CR return code setting with
>   a test for the error case exit.

GCC does not model the condition registers, so for inline system calls,
we have to produce a value anyway that the subsequence C code can check.
The assembler syscall wrappers do not need to do this, of course, but
I'm not sure which category of interfaces is more important.

But the kernel uses the -errno convention internally, so I think it
would make sense to pass this to userspace and not convert back and
forth.  This would align with what most of the architectures do, and
also avoids the GCC oddity.

> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>   calls if there was interest in developing an ABI for 32-bit programs.
>   Marginal benefit in avoiding compat syscall selection.

We don't have an ELFv2 ABI for 32-bit.  I doubt it makes sense to
provide an ELFv1 port for this given that it's POWER9-specific.

From the glibc perspective, the major question is how we handle run-time
selection of the system call instruction sequence.  On i386, we use a
function pointer in the TCB to call an instruction sequence in the vDSO.
That's problematic from a security perspective.  I expect that on
POWER9, using a pointer in read-only memory would be equally
non-attractive due to a similar lack of PC-relative addressing.  We
could use the HWCAP bit in the TCB, but that would add another (easy to
predict) conditional branch to every system call.

I don't think it matters whether both system call variants use the same
error convention because we could have different error code extraction
code on the two branches.

Thanks,
Florian



Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-28 Thread Qian Cai



> On Jan 28, 2020, at 7:10 AM, Mike Rapoport  wrote:
> 
> Aren't x86 and arm64 decent enough?
> Even if this test could be used to detect regressions only on these two
> platforms, the test is valuable.

The question is: does it detect regressions well enough? Where is the list of 
past bugs that it has found?

It is an usual deal for unproven debugging features remain out of tree first 
and keep gathering unique bugs it found and then justify for a mainline 
inclusion with enough data.

Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-28 Thread Mike Rapoport
Hello Qian,

On Mon, Jan 27, 2020 at 10:33:08PM -0500, Qian Cai wrote:
> 
> > On Jan 27, 2020, at 10:06 PM, Anshuman Khandual  
> > wrote:
> >
> > enablement of this test (for the moment) but then the goal is to integrate 
> > all
> > of them going forward. The test not only validates platform's adherence to 
> > the
> > expected semantics from generic MM but also helps in keeping it that way 
> > during
> > code changes in future as well.
> 
> Another option may be to get some decent arches on board first before
> merging this thing, so it has more chances to catch regressions for
> developers who might run this.

Aren't x86 and arm64 not decent enough?
Even if this test could be used to detect regressions only on these two
platforms, the test is valuable.
 

-- 
Sincerely yours,
Mike.



Re: [PATCH V12] mm/debug: Add tests validating architecture page table helpers

2020-01-28 Thread Mark Brown
On Tue, Jan 28, 2020 at 02:12:56AM -0500, Qian Cai wrote:
> > On Jan 28, 2020, at 1:13 AM, Christophe Leroy  
> > wrote:

> > ppc32 an indecent / legacy platform ? Are you kidding ?

> > Powerquicc II PRO for instance is fully supported by the
> > manufacturer and widely used in many small networking devices.

> Of course I forgot about embedded devices. The problem is that how
> many developers are actually going to run this debug option on
> embedded devices?

Much fewer if the code isn't upstream than if it is.  This isn't
something that every developer is going to enable all the time but that
doesn't mean it's not useful, it's more for people doing work on the
architectures or on memory management (or who suspect they're running
into a relevant problem), and I'm sure some of the automated testing
people will enable it.  The more barriers there are in place to getting
the testsuite up and running the less likely it is that any of these
groups will run it regularly.




Re: [FSL P5020 P5040 PPC] Onboard SD card doesn't work anymore after the 'mmc-v5.4-2' updates

2020-01-28 Thread Michael Ellerman
Christian Zigotzky  writes:
> On 24 January 2020 at 12:42 pm, Michael Ellerman wrote:
>> Ulf Hansson  writes:
>>> On Thu, 16 Jan 2020 at 12:18, Christian Zigotzky  
>>> wrote:
 Hi All,

 We still need the attached patch for our onboard SD card interface
 [1,2]. Could you please add this patch to the tree?
>>> No, because according to previous discussion that isn't the correct
>>> solution and more importantly it will break other archs (if I recall
>>> correctly).
>>>
>>> Looks like someone from the ppc community needs to pick up the ball.
>> That's a pretty small community these days :) :/
>>
>> Christian can you test this please? I think I got the polarity of all
>> the tests right, but it's Friday night so maybe I'm wrong :)
>>
>> cheers
> Michael,
>
> Thanks a lot for the new patch! I compiled the RC7 of kernel 5.5 with 
> your patch again yesterday and the kernel works without any problems 
> with our onboard SD cards. [1]

Thanks for testing.

cheers


powerpc Linux scv support and scv system call ABI proposal

2020-01-28 Thread Nicholas Piggin
I would like to enable support for the scv instruction to provide the Linux
system calls.

This requires two things to be defined, firstly how to advertise support
for scv and how to allocate and advertise support for individual scv
vectors. Secondly, how to define a Linux system call ABI with this new
instruction.

I have put together a rough proposal along with some options and 
questions. Any thoughts or input would be welcome, I have probably
missed some things so please point them out.

(I will be on vacation for two weeks from the end of the week, may not
get to replying immediately)

Thanks,
Nick

System Call Vectored (scv) ABI

The scv instruction is introduced with POWER9 / ISA v3.0, and comes with
an rfscv counterpart. The benefit of these instructions is performance
(trading slower SRR0/1 for faster LR/CTR registers, and entering the
kernel with MSR[EE] and MSR[RI] left enabled, which can avoid one mtmsrd
instruction). Another benefit is that the ABI can be changed if there is
a good reason to.

The scv instruction has 128 interrupt entry points (not enough to cover
the Linux system call space). The proposal is to assign scv numbers
conservatively. 'scv 0' could be used for the regular Linux system call
ABI initially. Examples of other assignments could be 32-bit compat
system calls, and firmware service calls.

Linux has not enabled FSCR[SCV] yet, so the instruction traps as an illegal
instruction in current environments. Linux has defined a HWCAP2 bit
PPC_FEATURE2_SCV for SCV support, but does not set it.

One option is for PPC_FEATURE2_SCV to indicate 'scv 0' support, and a new HWCAP
bit assigned for each new scv vector supported for userspace. This is the most
regular and flexible approach. It requires the most HWCAP space, but vector
usage is not expected to grow quickly.

Another option is for PPC_FEATURE2_SCV to indicate 'scv 0', and other vectors
will each return -ENOSYS, then when they are assigned to a new ABI, it will
define a particular way they can be queried for support (which would return
something other than -ENOSYS if supported). This will not require more HWCAP
bits, but it's less regular and more complicated to determine.
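Under the first option, userspace detection would be a plain HWCAP2 probe, along these lines (a sketch only; the PPC_FEATURE2_SCV value is taken from the kernel's asm/cputable.h, and the fallback branch is for toolchains without AT_HWCAP2):

```c
#include <sys/auxv.h>

/* Bit value from the kernel's asm/cputable.h; defined here in case
 * the installed libc headers predate it. */
#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV 0x00100000
#endif

/* Sketch: test for 'scv 0' support under the HWCAP2 proposal.
 * Returns 1 if the kernel advertises the feature, 0 otherwise. */
static int have_scv0(void)
{
#ifdef AT_HWCAP2
    return (getauxval(AT_HWCAP2) & PPC_FEATURE2_SCV) != 0;
#else
    return 0;   /* no AT_HWCAP2 on this target: assume unsupported */
#endif
}
```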

* Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
  vectors will return -ENOSYS, and the decision for how to add support for
  a new vector deferred until we see the next user.

* Proposal is for scv 0 to provide the standard Linux system call ABI with some
  differences:

- LR is volatile across scv calls. This is necessary for support because the
  scv instruction clobbers LR.

- CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
  system call exit to avoid restoring the CR register.

- Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
  instruction on the kernel side, and it is currently not implemented well
  in glibc, requiring a mfcr (mfocr should be possible and asm goto support
  would allow a better implementation). Is it worth continuing this style of
  error handling? Or just move to -ve return means error? Using a different
  bit would allow the kernel to piggy back the CR return code setting with
  a test for the error case exit.

- R2 could be volatile as though it's an external function call, which
  would avoid one store in the system call entry path. However it would
  require the caller to load R2 after the system call returns, where the
  latency of the load can not be overlapped with the costly system call
  exit sequence. On balance, it may be better to keep R2 as non-volatile.

- Number of volatile registers available seems sufficient. Linux's 'sc'
  handler is badly constrained here, but that is because it is shared
  between both hypercall and syscall handlers, which have different
  call conventions that share no volatile GPR registers! r9-r12 should
  be quite enough.

- Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
  calls if there was interest in developing an ABI for 32-bit programs.
  Marginal benefit in avoiding compat syscall selection.
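To make the proposed conventions concrete, a single-argument wrapper might look roughly like this. This is a hypothetical sketch, not a final ABI: it assumes the "-ve return means error" option, treats LR, CTR, r9-r12, CR1 and CR5-CR7 as volatile per the proposal, and compiles to a stub unless built for POWER9 with USE_SCV defined.

```c
#include <errno.h>

/* Hypothetical 'scv 0' wrapper under the proposed ABI. The clobber
 * list encodes the proposal: LR is clobbered by scv itself, CTR and
 * r9-r12 are volatile, and CR1/CR5-CR7 follow the C ABI. */
static long scv_syscall1(long nr, long arg1)
{
#if defined(__powerpc64__) && defined(USE_SCV)
    register long r0 asm("r0") = nr;
    register long r3 asm("r3") = arg1;
    asm volatile("scv 0"
                 : "+r"(r3)
                 : "r"(r0)
                 : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11",
                   "r12", "lr", "ctr", "cr0", "cr1", "cr5", "cr6",
                   "cr7", "xer", "memory");
    return r3;      /* negative value would signal an error */
#else
    (void)nr; (void)arg1;
    return -ENOSYS; /* stub so the sketch builds on other targets */
#endif
}
```

Note that no mtcr/mfcr appears on either side, which is the point of dropping the CR0[SO] error convention.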



Re: [PATCH] powerpc: drmem: avoid NULL pointer dereference when drmem is unavailable

2020-01-28 Thread Libor Pechacek
Hello Nathan,

On Thu 23-01-20 09:56:10, Nathan Lynch wrote:
> Libor Pechacek  writes:
> > In KVM guests drmem structure is only zero initialized. Trying to
> > manipulate DLPAR parameters results in a crash in this environment.
> 
> I think this statement needs qualification. Unless I'm mistaken, this
> happens only when you boot a guest without any hotpluggable memory
> configured, and then try to add or remove memory.

Thanks for the review. The introductory statement can indeed be clearer.

[...]
> > diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> > b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > index c126b94d1943..4ea6af002e27 100644
> > --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> > +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> > @@ -236,9 +236,9 @@ static int get_lmb_range(u32 drc_index, int n_lmbs,
> > if (!start)
> > return -EINVAL;
> >  
> > -   end = &start[n_lmbs - 1];
> > +   end = &start[n_lmbs];
> >  
> > -   last_lmb = &drmem_info->lmbs[drmem_info->n_lmbs - 1];
> > +   last_lmb = &drmem_info->lmbs[drmem_info->n_lmbs];
> > if (end > last_lmb)
> > return -EINVAL;
> 
> Is this not undefined behavior? I'd rather do this in a way that does
> not involve forming out-of-bounds pointers.

Well, this is a tough question for the case when drmem_info->lmbs was not
allocated. Given that the array does not exist, what bounds are we talking
about?

My patch builds on the fact that NULL[0] is NULL and NULL < NULL is false.
Talking about a pointer to one past the last element of a non-existent array
is too much philosophy for me.

For the case when drmem_info->lmbs is allocated, last_lmb is a pointer to one
past the last element of the array as Michal mentioned.
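For comparison, a range check that never forms an out-of-bounds pointer can be written index-first. This is a hypothetical helper (not from the patch, and using a stand-in struct): indices are validated before any pointer arithmetic, so neither a one-past-the-end pointer nor arithmetic on a NULL base ever occurs.

```c
#include <stddef.h>

struct drmem_lmb_v { unsigned long base_addr; };  /* stand-in struct */

/* Hypothetical index-based range check: validate first, then form a
 * pointer only to an element that is known to be in bounds. */
static struct drmem_lmb_v *range_end(struct drmem_lmb_v *lmbs,
                                     size_t n_lmbs,
                                     size_t start_idx, size_t n_wanted)
{
    if (lmbs == NULL || n_wanted == 0 ||
        start_idx >= n_lmbs || n_wanted > n_lmbs - start_idx)
        return NULL;
    return &lmbs[start_idx + n_wanted - 1];  /* last element, in bounds */
}
```

The subtraction `n_lmbs - start_idx` is safe because the preceding check guarantees `start_idx < n_lmbs`, so no unsigned wraparound can occur.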

> Even if it's safe, naming that pointer "last_lmb" now actively hinders
> understanding of the code; it should be named "limit" or something.

Good catch.

[...]
> 1 file changed, 36 insertions(+), 7 deletions(-)
> arch/powerpc/include/asm/drmem.h | 43 +---
> 
> modified   arch/powerpc/include/asm/drmem.h
> @@ -20,19 +20,48 @@ struct drmem_lmb {
>  
>  struct drmem_lmb_info {
>   struct drmem_lmb*lmbs;
> - int n_lmbs;
> + unsigned intn_lmbs;
>   u32 lmb_size;
>  };
>  
>  extern struct drmem_lmb_info *drmem_info;
>  
> -#define for_each_drmem_lmb_in_range(lmb, start, end) \
> - for ((lmb) = (start); (lmb) <= (end); (lmb)++)
> +static inline bool drmem_present(void)
> +{
> + return drmem_info->lmbs != NULL;
> +}

Yes, use of this test was also my first idea about the fix.

> +static inline struct drmem_lmb *drmem_lmb_index(unsigned int index)
> +{
> + if (!drmem_present())
> + return NULL;
>  
> -#define for_each_drmem_lmb(lmb)  \
> - for_each_drmem_lmb_in_range((lmb),  \
> - &drmem_info->lmbs[0],   \
> - &drmem_info->lmbs[drmem_info->n_lmbs - 1])
> + if (WARN_ON(index >= drmem_info->n_lmbs))
> + return NULL;

Why is this WARN_ON needed?

> +
> + return &drmem_info->lmbs[index];
> +}
> +
> +static inline struct drmem_lmb *drmem_first_lmb(void)
> +{
> + return drmem_lmb_index(0);
> +}
> +
> +static inline struct drmem_lmb *drmem_last_lmb(void)
> +{
> + if (!drmem_present())
> + return NULL;
> +
> + return drmem_lmb_index(drmem_info->n_lmbs - 1);

Is the unsigned integer wraparound intended in drmem_info->n_lmbs == 0 case?

> +}
> +
> +#define for_each_drmem_lmb(lmb) \
> + for ((lmb) = drmem_first_lmb(); \

drmem_first_lmb() is essentially a call to drmem_lmb_index(0). What
happens if drmem_info->n_lmbs is zero and drmem_info->lmbs is not NULL?

> +  (lmb) != NULL && (lmb) <= drmem_last_lmb();\
> +  (lmb)++)
> +
> +#define for_each_drmem_lmb_in_range(lmb, start, end) \
> + for ((lmb) = (start); (lmb) <= (end); (lmb)++)
>  
>  /*
>   * The of_drconf_cell_v1 struct defines the layout of the LMB data
> 

After all, I don't mind how the bug will be fixed. As you can see, my
preference is towards simpler solutions.

In my opinion your solution special-cased drmem_info->lmbs == NULL and opened
the doorway to the combination of drmem_info->lmbs != NULL &&
!drmem_info->n_lmbs. Maybe the condition can never become true but the code
should IMHO be robust enough to handle it.

Thanks!

Libor
-- 
Libor Pechacek
SUSE LabsRemember to have fun...


Re: [PATCH] powerpc/64: system call implement the bulk of the logic in C fix

2020-01-28 Thread Michal Suchánek
On Tue, Jan 28, 2020 at 10:41:02AM +1000, Nicholas Piggin wrote:
> Michal Suchánek's on January 28, 2020 4:08 am:
> > On Tue, Jan 28, 2020 at 12:17:12AM +1000, Nicholas Piggin wrote:
> >> This incremental patch fixes several soft-mask debug and unsafe
> >> smp_processor_id messages due to tracing and false positives in
> >> "unreconciled" code.
> >> 
> >> It also fixes a bug with syscall tracing functions that set registers
> >> (e.g., PTRACE_SETREG) not setting GPRs properly.
> >> 
> >> There was a bug reported with the TM selftests, I haven't been able
> >> to reproduce that one.
> >> 
> >> I can squash this into the main patch and resend the series if it
> >> helps but the incremental helps to see the bug fixes.
> > 
> > There are some whitespace differences between this and the series I have
> > applied locally. What does it apply to?
> > 
> > Is there some revision of the patchset I missed?
> 
> No I may have just missed some of your whitespace cleanups, or maybe I got
> some that Michael made which you don't have in his next-test branch.

Looks like the latter. I will pick patches from next-test.

Thanks

Michal


Re: [PATCH v16 00/23] selftests, powerpc, x86: Memory Protection Keys

2020-01-28 Thread Sandipan Das
Hi Dave,

On 27/01/20 9:12 pm, Dave Hansen wrote:
> 
> How have you tested this patch (and the whole series for that matter)?
> 

I replaced the second patch with this one and did a build test.
Up to v16, I had tested the whole series (build + run) on both a POWER8
system (with 4K and 64K page sizes) and a Skylake SP system, but for
x86_64 only. Following that, I could only do a build test locally on my
laptop for i386 and x86_64, as I did not have access to the Skylake
system anymore.

This is how I tested the build process:

$ cd linux
$ make -C tools/testing/selftests
...
make[1]: Entering directory 
'/home/sandipan/.devel/linux/tools/testing/selftests/vm'
...
gcc -Wall -I ../../../../usr/include  -no-pie -m32  protection_keys.c -lrt -lrt 
-ldl -lm -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
gcc -Wall -I ../../../../usr/include  -no-pie -m64  protection_keys.c -lrt -lrt 
-ldl -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
...

$ make -C tools/testing/selftests clean
$ make -C tools/testing/selftests/vm
make: Entering directory 
'/home/sandipan/.devel/linux/tools/testing/selftests/vm'
make --no-builtin-rules ARCH=x86_64 -C ../../../.. headers_install
make[1]: Entering directory '/home/sandipan/.devel/linux'
  INSTALL ./usr/include
make[1]: Leaving directory '/home/sandipan/.devel/linux'
...
gcc -Wall -I ../../../../usr/include  -no-pie -m32  protection_keys.c -lrt -lrt 
-ldl -lm -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
gcc -Wall -I ../../../../usr/include  -no-pie -m64  protection_keys.c -lrt -lrt 
-ldl -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
...

$ make -C tools/testing/selftests/vm clean
$ make -C tools/testing/selftests/vm protection_keys
make: Entering directory 
'/home/sandipan/.devel/linux/tools/testing/selftests/vm'
gcc -Wall -I ../../../../usr/include  -no-pie -m32  protection_keys.c -lrt -lrt 
-ldl -lm -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
gcc -Wall -I ../../../../usr/include  -no-pie -m64  protection_keys.c -lrt -lrt 
-ldl -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
make: Leaving directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'

$ make -C tools/testing/selftests/vm clean
$ make -C tools/testing/selftests/vm protection_keys_32
make: Entering directory 
'/home/sandipan/.devel/linux/tools/testing/selftests/vm'
gcc -Wall -I ../../../../usr/include  -no-pie -m32  protection_keys.c -lrt -lrt 
-ldl -lm -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
make: Leaving directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'

$ make -C tools/testing/selftests/vm protection_keys_64
make: Entering directory 
'/home/sandipan/.devel/linux/tools/testing/selftests/vm'
gcc -Wall -I ../../../../usr/include  -no-pie -m64  protection_keys.c -lrt -lrt 
-ldl -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
make: Leaving directory '/home/sandipan/.devel/linux/tools/testing/selftests/vm'

$ make -C tools/testing/selftests/vm clean
$ cd tools/testing/selftests/vm
$ make
make --no-builtin-rules ARCH=x86_64 -C ../../../.. headers_install
make[1]: Entering directory '/home/sandipan/.devel/linux'
  INSTALL ./usr/include
make[1]: Leaving directory '/home/sandipan/.devel/linux'
...
gcc -Wall -I ../../../../usr/include  -no-pie -m32  protection_keys.c -lrt -lrt 
-ldl -lm -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
gcc -Wall -I ../../../../usr/include  -no-pie -m64  protection_keys.c -lrt -lrt 
-ldl -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64
...

$ make clean
$ make protection_keys
gcc -Wall -I ../../../../usr/include  -no-pie -m32  protection_keys.c -lrt -lrt 
-ldl -lm -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32
gcc -Wall -I ../../../../usr/include  -no-pie -m64  protection_keys.c -lrt -lrt 
-ldl -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64

$ make clean
$ make protection_keys_32
gcc -Wall -I ../../../../usr/include  -no-pie -m32  protection_keys.c -lrt -lrt 
-ldl -lm -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_32

$ make protection_keys_64
gcc -Wall -I ../../../../usr/include  -no-pie -m64  protection_keys.c -lrt -lrt 
-ldl -o 
/home/sandipan/.devel/linux/tools/testing/selftests/vm/protection_keys_64


- Sandipan



Re: vmlinux ELF header sometimes corrupt

2020-01-28 Thread Rasmus Villemoes
On 22/01/2020 18.52, Rasmus Villemoes wrote:
> I'm building for a ppc32 (mpc8309) target using Yocto, and I'm hitting a
> very hard to debug problem that maybe someone else has encountered. This
> doesn't happen always, perhaps 1 in 8 times or something like that.
> 
> The issue is that when the build gets to do "${CROSS}objcopy -O binary
> ... vmlinux", vmlinux is not (no longer) a proper ELF file, so naturally
> that fails with
> 
>   powerpc-oe-linux-objcopy:vmlinux: file format not recognized
> 
> So I hacked link-vmlinux.sh to stash copies of vmlinux before and after
> sortextable vmlinux. Both of those are proper ELF files, and comparing
> the corrupted vmlinux to vmlinux.after_sort they are identical after the
> first 52 bytes; in vmlinux, those first 52 bytes are all 0.
> 
> I also saved stat(1) info to see if vmlinux is being replaced or
> modified in-place.
> 
> $ cat vmlinux.stat.after_sort
>   File: 'vmlinux'
>   Size: 8608456 Blocks: 16696  IO Block: 4096   regular file
> Device: 811h/2065d  Inode: 21919132Links: 1
> Access: (0755/-rwxr-xr-x)  Uid: ( 1000/user)   Gid: ( 1001/user)
> Access: 2020-01-22 10:52:38.946703081 +
> Modify: 2020-01-22 10:52:38.954703105 +
> Change: 2020-01-22 10:52:38.954703105 +
> 
> $ stat vmlinux
>   File: 'vmlinux'
>   Size: 8608456 Blocks: 16688  IO Block: 4096   regular file
> Device: 811h/2065d  Inode: 21919132Links: 1
> Access: (0755/-rwxr-xr-x)  Uid: ( 1000/user)   Gid: ( 1001/user)
> Access: 2020-01-22 17:20:00.650379057 +
> Modify: 2020-01-22 10:52:38.954703105 +
> Change: 2020-01-22 10:52:38.954703105 +
> 
> So the inode number and mtime/ctime are exactly the same, but for some
> reason Blocks: has changed? This is on an ext4 filesystem, but I don't
> suspect the filesystem to be broken, because it's always just vmlinux
> that ends up corrupt, and always in exactly this way with the first 52
> bytes having been wiped.

So, I think I take that last part back. I just hit a case where I built
the kernel manually, made a copy of vmlinux to vmlinux.copy, and file(1)
said both were fine (and cmp(1) agreed they were identical). Then I went
off and did work elsewhere with a lot of I/O. When I came back to the
linux build dir, vmlinux was broken, exactly as before. So I now suspect
it to be some kind of "while the file is in the pagecache, everything is
fine, but when it's read back from disk it's broken".

My ext4 fs does have inline_data enabled, which could explain why the
corruption happens in the beginning. It's just very odd that it only
ever seems to trigger for vmlinux and not other files, but perhaps the
I/O patterns that ld and/or sortextable does are exactly what are needed
to trigger the bug.
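One way to test the "fine in the page cache, broken on disk" hypothesis without root is to flush and drop a file's cached pages with posix_fadvise(POSIX_FADV_DONTNEED) and re-read the header. A rough diagnostic sketch (hypothetical helper, untested against this particular bug):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read the first n bytes of path into buf; 0 on success. */
static int first_bytes(const char *path, unsigned char *buf, size_t n)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t got = fread(buf, 1, n, f);
    fclose(f);
    return got == n ? 0 : -1;
}

/* Returns 1 if the first n bytes (n <= 64) differ between the cached
 * view and a fresh read after flush + cache drop, 0 if identical,
 * -1 on error. */
static int ondisk_header_differs(const char *path, size_t n)
{
    unsigned char before[64], after[64];
    if (n > sizeof(before))
        n = sizeof(before);
    if (first_bytes(path, before, n))
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    fsync(fd);                                     /* flush dirty pages */
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);  /* drop cached pages */
    close(fd);

    if (first_bytes(path, after, n))
        return -1;
    return memcmp(before, after, n) != 0;
}
```

Running `ondisk_header_differs("vmlinux", 52)` right after the link, and again later, would distinguish a pagecache/write-back bug from a corrupted write, since the 52 wiped bytes are exactly an ELF32 header.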

I've done a long overdue kernel update, and there are quite a few
fs/ext4/ -stable patches in there, so now I'll see if it still happens.
And if anything more comes of this, I'll remove the kbuild and ppc lists
from cc, sorry for the noise.

Rasmus


Re: [PASEMI PA6T PPC] Onboard CF card device with new SanDisk High (>8G) CF cards

2020-01-28 Thread Christoph Hellwig
On Tue, Jan 28, 2020 at 08:58:29AM +0100, Christian Zigotzky wrote:
> Hi All,
> 
> Which mailing list is responsible for the pata_pcmcia driver? We are using
> new SanDisk High (>8G) CF cards with this driver [1] and we need the
> following line in the file "drivers/ata/pata_pcmcia.c".
> 
> +    PCMCIA_DEVICE_MANF_CARD(0x00f1, 0x0101),        /* SanDisk High (>8G)
> CFA */

Please send a formal patch for it to linux-...@vger.kernel.org


[PASEMI PA6T PPC] Onboard CF card device with new SanDisk High (>8G) CF cards

2020-01-28 Thread Christian Zigotzky

Hi All,

Which mailing list is responsible for the pata_pcmcia driver? We are 
using new SanDisk High (>8G) CF cards with this driver [1] and we need 
the following line in the file "drivers/ata/pata_pcmcia.c".


+    PCMCIA_DEVICE_MANF_CARD(0x00f1, 0x0101),        /* SanDisk High 
(>8G) CFA */


Thanks,
Christian

[1] https://forum.hyperion-entertainment.com/viewtopic.php?f=35=4282