Re: [mm] 8e63b8bbd7: WARNING:at_mm/memory.c:#__apply_to_page_range

2020-08-21 Thread Nicholas Piggin
Excerpts from kernel test robot's message of August 22, 2020 9:24 am:
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: 8e63b8bbd7d17f64ced151cebd151a2cd9f63c64 ("[PATCH v5 2/8] mm: 
> apply_to_pte_range warn and fail if a large pte is encountered")
> url: 
> https://github.com/0day-ci/linux/commits/Nicholas-Piggin/huge-vmalloc-mappings/20200821-124543
> base: https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git next
> 
> in testcase: boot
> 
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
> 
> 
> +------------------------------------------------+------------+------------+
> |                                                | 185311995a | 8e63b8bbd7 |
> +------------------------------------------------+------------+------------+
> | boot_successes                                 | 4          | 0          |
> | boot_failures                                  | 0          | 4          |
> | WARNING:at_mm/memory.c:#__apply_to_page_range  | 0          | 4          |
> | RIP:__apply_to_page_range                      | 0          | 4          |
> +------------------------------------------------+------------+------------+
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot 
> 
> 
> [0.786159] WARNING: CPU: 0 PID: 0 at mm/memory.c:2269 
> __apply_to_page_range+0x537/0x9c0

Hmm, I wonder if that's WARN_ON_ONCE(pmd_bad(*pmd)), which would be
odd. I don't know x86 asm well enough to see what the *pmd value would
be there.
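
For reference, the pmd-level walk with the new check looks roughly like
this (a paraphrase of the kind of check the v5 patch adds, not the exact
hunk):

	do {
		next = pmd_addr_end(addr, end);
		/*
		 * A huge or corrupt pmd has no pte level to descend
		 * into; warn and fail rather than silently mis-applying
		 * fn to something that isn't a pte page.
		 */
		if (WARN_ON_ONCE(pmd_bad(*pmd)))
			return -EINVAL;
		err = apply_to_pte_range(mm, pmd, addr, next, fn, data);
		if (err)
			break;
	} while (pmd++, addr = next, addr != end);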

I'll try to reproduce and work out what's going on.

Thanks,
Nick


> [0.786675] Modules linked in:
> [0.786888] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> 5.9.0-rc1-2-g8e63b8bbd7d17f #2
> [0.787402] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.12.0-1 04/01/2014
> [0.787935] RIP: 0010:__apply_to_page_range+0x537/0x9c0
> [0.788280] Code: 8b 5c 24 50 48 39 5c 24 38 0f 84 6b 03 00 00 4c 8b 74 24 
> 38 e9 63 fb ff ff 84 d2 0f 84 ba 01 00 00 48 8b 1c 24 e9 3c fe ff ff <0f> 0b 
> 45 84 ed 0f 84 08 01 00 00 48 89 ef e8 8f 8f 02 00 48 89 e8
> [0.789467] RSP: :83e079d0 EFLAGS: 00010293
> [0.789805] RAX:  RBX: f5201000 RCX: 
> 000fffe0
> [0.790260] RDX:  RSI: 000ff000 RDI: 
> 
> [0.790724] RBP: 888107408000 R08: 0001 R09: 
> 000107408000
> [0.791179] R10: 840dcb5b R11: fbfff081b96b R12: 
> f520001f
> [0.791634] R13: 0001 R14: f520 R15: 
> dc00
> [0.792090] FS:  () GS:8881eae0() 
> knlGS:
> [0.792607] CS:  0010 DS:  ES:  CR0: 80050033
> [0.792977] CR2: 88823000 CR3: 03e14000 CR4: 
> 000406b0
> [0.793433] DR0:  DR1:  DR2: 
> 
> [0.793889] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> [0.794344] Call Trace:
> [0.794517]  ? memset+0x40/0x40
> [0.794745]  alloc_vmap_area+0x7a9/0x2280
> [0.795054]  ? trace_hardirqs_on+0x4f/0x2e0
> [0.795354]  ? _raw_spin_unlock_irqrestore+0x39/0x60
> [0.795682]  ? free_vmap_area+0x1a20/0x1a20
> [0.795959]  ? __kasan_kmalloc+0xbf/0xe0
> [0.796292]  __get_vm_area_node+0xd1/0x300
> [0.796605]  get_vm_area_caller+0x2d/0x40
> [0.796872]  ? acpi_os_map_iomem+0x3c3/0x4e0
> [0.797155]  __ioremap_caller+0x1d8/0x480
> [0.797486]  ? acpi_os_map_iomem+0x3c3/0x4e0
> [0.797770]  ? iounmap+0x160/0x160
> [0.798002]  ? __kasan_kmalloc+0xbf/0xe0
> [0.798335]  acpi_os_map_iomem+0x3c3/0x4e0
> [0.798612]  acpi_tb_acquire_table+0xb3/0x1c5
> [0.798910]  acpi_tb_validate_table+0x68/0xbf
> [0.799199]  acpi_tb_verify_temp_table+0xa1/0x640
> [0.799512]  ? __down_trylock_console_sem+0x7a/0xa0
> [0.799833]  ? acpi_tb_validate_temp_table+0x9d/0x9d
> [0.800159]  ? acpi_ut_init_stack_ptr_trace+0xaa/0xaa
> [0.800490]  ? vprintk_emit+0x10b/0x2a0
> [0.800748]  ? acpi_ut_acquire_mutex+0x1d7/0x32f
> [0.801056]  acpi_reallocate_root_table+0x339/0x385
> [0.801377]  ? acpi_tb_parse_root_table+0x5a5/0x5a5
> [0.801700]  ? dmi_matches+0xc6/0x120
> [0.801968]  acpi_early_init+0x116/0x3ae
> [0.802230]  start_kernel+0x2f7/0x39f
> [0.802477]  secondary_startup_64+0xa4/0xb0
> [0.802770] irq event stamp: 5137
> [0.802992] hardirqs last  enabled at (5145): [] 
> console

[PATCH] powerpc/pseries: Add pcibios_default_alignment implementation

2020-08-21 Thread Shawn Anastasio
Implement pcibios_default_alignment for pseries so that
resources are page-aligned. The main benefit of this is being
able to map any resource from userspace via mechanisms like VFIO.

This is identical to powernv's implementation.
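
For context, the generic powerpc PCI code routes alignment requests through
this hook; a minimal sketch of the dispatch (paraphrasing
arch/powerpc/kernel/pci-common.c, details may differ):

	resource_size_t pcibios_default_alignment(void)
	{
		if (ppc_md.pcibios_default_alignment)
			return ppc_md.pcibios_default_alignment();

		return 0;
	}

With the hunk below installed, pseries PCI resources are then aligned to at
least PAGE_SIZE, which is what lets VFIO hand them to userspace safely.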

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/platforms/pseries/pci.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pci.c 
b/arch/powerpc/platforms/pseries/pci.c
index 911534b89c85..6d922c096354 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -210,6 +210,11 @@ int pseries_pcibios_sriov_disable(struct pci_dev *pdev)
 }
 #endif
 
+static resource_size_t pseries_pcibios_default_alignment(void)
+{
+   return PAGE_SIZE;
+}
+
 static void __init pSeries_request_regions(void)
 {
if (!isa_io_base)
@@ -231,6 +236,8 @@ void __init pSeries_final_fixup(void)
 
eeh_show_enabled();
 
+   ppc_md.pcibios_default_alignment = pseries_pcibios_default_alignment;
+
 #ifdef CONFIG_PCI_IOV
ppc_md.pcibios_sriov_enable = pseries_pcibios_sriov_enable;
ppc_md.pcibios_sriov_disable = pseries_pcibios_sriov_disable;
-- 
2.28.0



[PATCH v2 1/3] Revert "powerpc/64s: Remove PROT_SAO support"

2020-08-21 Thread Shawn Anastasio
This reverts commit 5c9fa16e8abd342ce04dc830c1ebb2a03abf6c05.

Since PROT_SAO can still be useful for certain classes of software,
reintroduce it. Concerns about guest migration for LPARs using SAO
will be addressed next.
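
For illustration, a userspace consumer of the reinstated flag would request
a strong-access-ordered mapping like this (hypothetical snippet, error
handling omitted; PROT_SAO is the powerpc-specific bit from
uapi/asm/mman.h):

	#include <sys/mman.h>

	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_SAO,
		       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);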

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  8 ++--
 arch/powerpc/include/asm/cputable.h   | 10 ++---
 arch/powerpc/include/asm/mman.h   | 26 ++--
 arch/powerpc/include/asm/nohash/64/pgtable.h  |  2 +
 arch/powerpc/include/uapi/asm/mman.h  |  2 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |  2 +-
 arch/powerpc/mm/book3s64/hash_utils.c |  2 +
 include/linux/mm.h|  2 +
 include/trace/events/mmflags.h|  2 +
 mm/ksm.c  |  4 ++
 tools/testing/selftests/powerpc/mm/.gitignore |  1 +
 tools/testing/selftests/powerpc/mm/Makefile   |  4 +-
 tools/testing/selftests/powerpc/mm/prot_sao.c | 42 +++
 13 files changed, 90 insertions(+), 17 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6de56c3b33c4..495fc0ccb453 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -20,13 +20,9 @@
 #define _PAGE_RW   (_PAGE_READ | _PAGE_WRITE)
 #define _PAGE_RWX  (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 #define _PAGE_PRIVILEGED   0x8 /* kernel access only */
-
-#define _PAGE_CACHE_CTL	0x00030 /* Bits for the folowing cache modes */
-   /*  No bits set is normal cacheable memory */
-   /*  0x00010 unused, is SAO bit on radix POWER9 */
+#define _PAGE_SAO  0x00010 /* Strong access order */
 #define _PAGE_NON_IDEMPOTENT   0x00020 /* non idempotent memory */
 #define _PAGE_TOLERANT 0x00030 /* tolerant memory, cache inhibited */
-
 #define _PAGE_DIRTY0x00080 /* C: page changed */
 #define _PAGE_ACCESSED 0x00100 /* R: page referenced */
 /*
@@ -828,6 +824,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
return hash__set_pte_at(mm, addr, ptep, pte, percpu);
 }
 
+#define _PAGE_CACHE_CTL	(_PAGE_SAO | _PAGE_NON_IDEMPOTENT | _PAGE_TOLERANT)
+
 #define pgprot_noncached pgprot_noncached
 static inline pgprot_t pgprot_noncached(pgprot_t prot)
 {
diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index fdddb822d564..f89205eff691 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -191,7 +191,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_SPURR  LONG_ASM_CONST(0x0100)
 #define CPU_FTR_DSCR   LONG_ASM_CONST(0x0200)
 #define CPU_FTR_VSXLONG_ASM_CONST(0x0400)
-// Free				LONG_ASM_CONST(0x0800)
+#define CPU_FTR_SAO			LONG_ASM_CONST(0x0800)
 #define CPU_FTR_CP_USE_DCBTZ   LONG_ASM_CONST(0x1000)
 #define CPU_FTR_UNALIGNED_LD_STD   LONG_ASM_CONST(0x2000)
 #define CPU_FTR_ASYM_SMT   LONG_ASM_CONST(0x4000)
@@ -436,7 +436,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | CPU_FTR_ASYM_SMT | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | \
CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX )
@@ -445,7 +445,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
@@ -456,7 +456,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR | \
+   CPU_FTR_DSCR | CPU_FTR_SAO  | \
CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
@@ -474,7 +474,7 @@ static inline void cpu_feature_keys_init

[PATCH v2 3/3] selftests/powerpc: Update PROT_SAO test to skip ISA 3.1

2020-08-21 Thread Shawn Anastasio
Since SAO support was removed from ISA 3.1, skip the
prot_sao test if PPC_FEATURE2_ARCH_3_1 is set.

Signed-off-by: Shawn Anastasio 
---
 tools/testing/selftests/powerpc/mm/prot_sao.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/mm/prot_sao.c 
b/tools/testing/selftests/powerpc/mm/prot_sao.c
index e2eed65b7735..e0cf8ebbf8cd 100644
--- a/tools/testing/selftests/powerpc/mm/prot_sao.c
+++ b/tools/testing/selftests/powerpc/mm/prot_sao.c
@@ -18,8 +18,9 @@ int test_prot_sao(void)
 {
char *p;
 
-   /* 2.06 or later should support SAO */
-   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+   /* SAO was introduced in 2.06 and removed in 3.1 */
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06) ||
+   have_hwcap2(PPC_FEATURE2_ARCH_3_1));
 
/*
 * Ensure we can ask for PROT_SAO.
-- 
2.28.0



[PATCH v2 2/3] powerpc/64s: Disallow PROT_SAO in LPARs by default

2020-08-21 Thread Shawn Anastasio
Since migration of guests using SAO to ISA 3.1 hosts may cause issues,
disable PROT_SAO in LPARs by default and introduce a new Kconfig option
PPC_PROT_SAO_LPAR to allow users to enable it if desired.
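
For context, arch_validate_prot() is the hook through which mprotect() vets
architecture-specific prot bits, roughly (paraphrasing mm/mprotect.c):

	/* in do_mprotect_pkey(): reject prot bits the arch disallows */
	if (!arch_validate_prot(prot, start))
		return -EINVAL;

so on an LPAR without CONFIG_PPC_PROT_SAO_LPAR a PROT_SAO request now simply
fails with -EINVAL instead of creating a mapping that may not survive
migration.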

Signed-off-by: Shawn Anastasio 
---
 arch/powerpc/Kconfig| 12 
 arch/powerpc/include/asm/mman.h |  9 +++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce9..65bed1fdeaad 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -860,6 +860,18 @@ config PPC_SUBPAGE_PROT
 
  If unsure, say N here.
 
+config PPC_PROT_SAO_LPAR
+   bool "Support PROT_SAO mappings in LPARs"
+   depends on PPC_BOOK3S_64
+   help
+ This option adds support for PROT_SAO mappings from userspace
+ inside LPARs on supported CPUs.
+
+ This may cause issues when performing guest migration from
+ a CPU that supports SAO to one that does not.
+
+ If unsure, say N here.
+
 config PPC_COPRO_BASE
bool
 
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 4ba303ea27f5..7cb6d18f5cd6 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -40,8 +40,13 @@ static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
 {
if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
return false;
-   if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO))
-   return false;
+   if (prot & PROT_SAO) {
+   if (!cpu_has_feature(CPU_FTR_SAO))
+   return false;
+   if (firmware_has_feature(FW_FEATURE_LPAR) &&
+   !IS_ENABLED(CONFIG_PPC_PROT_SAO_LPAR))
+   return false;
+   }
return true;
 }
 #define arch_validate_prot arch_validate_prot
-- 
2.28.0



[PATCH v2 0/3] Reintroduce PROT_SAO

2020-08-21 Thread Shawn Anastasio
Changes in v2:
- Update prot_sao selftest to skip ISA 3.1

This set re-introduces the PROT_SAO prot flag removed in
commit 5c9fa16e8abd ("powerpc/64s: Remove PROT_SAO support").

To address concerns regarding live migration of guests using SAO
to P10 hosts without SAO support, the flag is disabled by default
in LPARs. A new config option, PPC_PROT_SAO_LPAR, was added to
allow users to explicitly enable it if they will not be running
in an environment where this is a concern.

Shawn Anastasio (3):
  Revert "powerpc/64s: Remove PROT_SAO support"
  powerpc/64s: Disallow PROT_SAO in LPARs by default
  selftests/powerpc: Update PROT_SAO test to skip ISA 3.1

 arch/powerpc/Kconfig  | 12 ++
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  8 ++--
 arch/powerpc/include/asm/cputable.h   | 10 ++---
 arch/powerpc/include/asm/mman.h   | 31 +++--
 arch/powerpc/include/asm/nohash/64/pgtable.h  |  2 +
 arch/powerpc/include/uapi/asm/mman.h  |  2 +-
 arch/powerpc/kernel/dt_cpu_ftrs.c |  2 +-
 arch/powerpc/mm/book3s64/hash_utils.c |  2 +
 include/linux/mm.h|  2 +
 include/trace/events/mmflags.h|  2 +
 mm/ksm.c  |  4 ++
 tools/testing/selftests/powerpc/mm/.gitignore |  1 +
 tools/testing/selftests/powerpc/mm/Makefile   |  4 +-
 tools/testing/selftests/powerpc/mm/prot_sao.c | 43 +++
 14 files changed, 108 insertions(+), 17 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/mm/prot_sao.c

-- 
2.28.0



Re: [PATCH 2/2] powerpc/64s: Disallow PROT_SAO in LPARs by default

2020-08-21 Thread Shawn Anastasio
On 8/21/20 5:37 AM, Nicholas Piggin wrote:
> I think this should be okay. Could you also update the selftest to skip
> if we have PPC_FEATURE2_ARCH_3_1 set?
>
> Thanks,
> Nick

Sure. I'll send out a v2 shortly with another patch for this.

Thanks,
Shawn


Re: [PATCH v6 05/12] mm: HUGE_VMAP arch support cleanup

2020-08-21 Thread Nicholas Piggin
Excerpts from Andrew Morton's message of August 22, 2020 6:14 am:
> On Sat, 22 Aug 2020 01:12:09 +1000 Nicholas Piggin  wrote:
> 
>> This changes the awkward approach where architectures provide init
>> functions to determine which levels they can provide large mappings for,
>> to one where the arch is queried for each call.
>> 
>> This removes code and indirection, and allows constant-folding of dead
>> code for unsupported levels.
>> 
>> This also adds a prot argument to the arch query. This is unused
>> currently but could help with some architectures (e.g., some powerpc
>> processors can't map uncacheable memory with large pages).
>> 
>> --- a/arch/arm64/include/asm/vmalloc.h
>> +++ b/arch/arm64/include/asm/vmalloc.h
>> @@ -1,4 +1,12 @@
>>  #ifndef _ASM_ARM64_VMALLOC_H
>>  #define _ASM_ARM64_VMALLOC_H
>>  
>> +#include 
>> +
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
>> +bool arch_vmap_p4d_supported(pgprot_t prot);
>> +bool arch_vmap_pud_supported(pgprot_t prot);
>> +bool arch_vmap_pmd_supported(pgprot_t prot);
>> +#endif
> 
> Moving these out of generic code and into multiple arch headers is
> unfortunate.  Can we leave them in include/linux/somewhere?  And remove
> the ifdefs, if so inclined - they just move the build error from
> link-time to compile-time, and such an error shouldn't occur!

Yeah this was just an intermediate step as you saw. It's a bit 
unfortunate, but I thought it made the arch changes clearer.

Thanks,
Nick



Re: [PATCH v6 01/12] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings

2020-08-21 Thread Nicholas Piggin
Excerpts from Andrew Morton's message of August 22, 2020 6:07 am:
> On Sat, 22 Aug 2020 01:12:05 +1000 Nicholas Piggin  wrote:
> 
>> vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
>> Whether or not a vmap is huge depends on the architecture details,
>> alignments, boot options, etc., which the caller can not be expected
>> to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.
> 
> I assume this doesn't matter in current mainline?
> If wrong, then what are the user-visible effects and why no cc:stable?

I haven't heard any reports, but in theory it could cause a problem. The
commit 029c54b095995 from the changelog was made to paper over it. But
that was fixed properly afterward, I think by 737326aa510b.

Not sure of the user-visible problems currently. I think generally you
wouldn't do vmalloc_to_page() on ioremap() memory, so maybe calling it
a regression is a bit strong. _Technically_ a regression, maybe.

Thanks,
Nick


Re: [PATCH] tty: hvcs: Don't NULL tty->driver_data until hvcs_cleanup()

2020-08-21 Thread Tyrel Datwyler
On 8/20/20 4:46 PM, Tyrel Datwyler wrote:
> The code currently NULLs tty->driver_data in hvcs_close() with the
> intent of informing the next call to hvcs_open() that device needs to be
> reconfigured. However, when hvcs_cleanup() is called we copy hvcsd from
> tty->driver_data which was previously NULLed by hvcs_close() and our
> call to tty_port_put(&hvcsd->port) doesn't actually do anything since
> &hvcsd->port ends up translating to NULL by chance. This has the side
> effect that when hvcs_remove() is called we have one too many port
> references preventing hvcs_destruct_port() from ever being called. This
> also prevents us from reusing the /dev/hvcsX node in a future
> hvcs_probe() and we can eventually run out of /dev/hvcsX devices.
> 
> Fix this by waiting to NULL tty->driver_data in hvcs_cleanup().

I just realized I neglected a Fixes tag.

Fixes: 27bf7c43a19c ("TTY: hvcs, add tty install")

-Tyrel

> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/tty/hvc/hvcs.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
> index 55105ac38f89..509d1042825a 100644
> --- a/drivers/tty/hvc/hvcs.c
> +++ b/drivers/tty/hvc/hvcs.c
> @@ -1216,13 +1216,6 @@ static void hvcs_close(struct tty_struct *tty, struct file *filp)
>  
>   tty_wait_until_sent(tty, HVCS_CLOSE_WAIT);
>  
> - /*
> -  * This line is important because it tells hvcs_open that this
> -  * device needs to be re-configured the next time hvcs_open is
> -  * called.
> -  */
> - tty->driver_data = NULL;
> -
>   free_irq(irq, hvcsd);
>   return;
>   } else if (hvcsd->port.count < 0) {
> @@ -1237,6 +1230,13 @@ static void hvcs_cleanup(struct tty_struct * tty)
>  {
>   struct hvcs_struct *hvcsd = tty->driver_data;
>  
> + /*
> +  * This line is important because it tells hvcs_open that this
> +  * device needs to be re-configured the next time hvcs_open is
> +  * called.
> +  */
> + tty->driver_data = NULL;
> +
>   tty_port_put(&hvcsd->port);
>  }
>  
> 
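
To spell out the "NULL by chance" detail from the changelog: port is the
first member of struct hvcs_struct, so before this fix the cleanup path
degenerated to a no-op (illustrative):

	struct hvcs_struct *hvcsd = tty->driver_data; /* NULL after hvcs_close() */

	/*
	 * &((struct hvcs_struct *)NULL)->port == NULL, and tty_port_put()
	 * returns early on a NULL port, so the reference is never dropped.
	 */
	tty_port_put(&hvcsd->port);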



Re: [PATCH v6 01/12] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings

2020-08-21 Thread Andrew Morton
On Sat, 22 Aug 2020 01:12:05 +1000 Nicholas Piggin  wrote:

> vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
> Whether or not a vmap is huge depends on the architecture details,
> alignments, boot options, etc., which the caller can not be expected
> to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.

I assume this doesn't matter in current mainline?

If wrong, then what are the user-visible effects and why no cc:stable?

> This change teaches vmalloc_to_page about larger pages, and returns
> the struct page that corresponds to the offset within the large page.
> This makes the API agnostic to mapping implementation details.
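
For readers following along, the huge-page-aware lookup amounts to returning
the sub-page within the leaf entry, along these lines (sketch only, not the
exact patch):

	if (pmd_leaf(*pmd))
		return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);

with analogous handling at the p4d and pud levels.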


Re: [PATCH v6 06/12] powerpc: inline huge vmap supported functions

2020-08-21 Thread Andrew Morton
On Sat, 22 Aug 2020 01:12:10 +1000 Nicholas Piggin  wrote:

>  #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> -bool arch_vmap_p4d_supported(pgprot_t prot);
> -bool arch_vmap_pud_supported(pgprot_t prot);
> -bool arch_vmap_pmd_supported(pgprot_t prot);
> +static inline bool arch_vmap_p4d_supported(pgprot_t prot)
> +{
> + return false;
> +}
> +
> +static inline bool arch_vmap_pud_supported(pgprot_t prot)
> +{
> + /* HPT does not cope with large pages in the vmalloc area */
> + return radix_enabled();
> +}
> +
> +static inline bool arch_vmap_pmd_supported(pgprot_t prot)
> +{
> + return radix_enabled();
> +}
>  #endif

Oh.  OK, whatever ;)


Re: [PATCH v6 05/12] mm: HUGE_VMAP arch support cleanup

2020-08-21 Thread Andrew Morton
On Sat, 22 Aug 2020 01:12:09 +1000 Nicholas Piggin  wrote:

> This changes the awkward approach where architectures provide init
> functions to determine which levels they can provide large mappings for,
> to one where the arch is queried for each call.
> 
> This removes code and indirection, and allows constant-folding of dead
> code for unsupported levels.
> 
> This also adds a prot argument to the arch query. This is unused
> currently but could help with some architectures (e.g., some powerpc
> processors can't map uncacheable memory with large pages).
> 
> --- a/arch/arm64/include/asm/vmalloc.h
> +++ b/arch/arm64/include/asm/vmalloc.h
> @@ -1,4 +1,12 @@
>  #ifndef _ASM_ARM64_VMALLOC_H
>  #define _ASM_ARM64_VMALLOC_H
>  
> +#include 
> +
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
> +bool arch_vmap_p4d_supported(pgprot_t prot);
> +bool arch_vmap_pud_supported(pgprot_t prot);
> +bool arch_vmap_pmd_supported(pgprot_t prot);
> +#endif

Moving these out of generic code and into multiple arch headers is
unfortunate.  Can we leave them in include/linux/somewhere?  And remove
the ifdefs, if so inclined - they just move the build error from
link-time to compile-time, and such an error shouldn't occur!
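
Concretely, one reading of that suggestion (sketch only):

	/* include/linux/vmalloc.h, declared unconditionally */
	bool arch_vmap_p4d_supported(pgprot_t prot);
	bool arch_vmap_pud_supported(pgprot_t prot);
	bool arch_vmap_pmd_supported(pgprot_t prot);

i.e. keep a single set of prototypes in generic code and let an arch that
claims HAVE_ARCH_HUGE_VMAP without providing them fail at link time.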



[PATCH AUTOSEL 4.4 12/22] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests

2020-08-21 Thread Sasha Levin
From: "Desnes A. Nunes do Rosario" 

[ Upstream commit 3337bf41e0dd70b4064cdf60acdfcdc2d050066c ]

An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is being per-
formed when count_pmc() is used to reset PMCs on a few selftests. This
extra pmc_count can occasionally invalidate results, such as the ones from
cycles_test shown hereafter. The ebb_check_count() failed with an above
the upper limit error due to the extra value on ebb_state.stats.pmc_count.

Furthermore, this extra count is also indicated by extra PMC1 trace_log on
the output of the cycle test (as well as on pmc56_overflow_test):

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
>> [30]: register SPRN_PMC1  = 0x451e
PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==

Signed-off-by: Desnes A. Nunes do Rosario 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200626164737.21943-1-desn...@linux.ibm.com
Signed-off-by: Sasha Levin 
---
 .../selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c| 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c  | 2 --
 .../selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c| 1 -
 .../testing/selftests/powerpc/pmu/ebb/multi_counter_test.c | 7 ---
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c   | 2 --
 .../testing/selftests/powerpc/pmu/ebb/pmae_handling_test.c | 2 --
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c| 2 --
 11 files changed, 26 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
index 94110b1dcd3d8..031baa43646fb 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
@@ -91,8 +91,6 @@ int back_to_back_ebbs(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
index 7c57a8d79535d..361e0be9df9ae 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
@@ -42,8 +42,6 @@ int cycles(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index ecf5ee3283a3e..fe7d0dc2a1a26 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -99,8 +99,6 @@ int cycles_with_freeze(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
printf("EBBs while frozen %d\n", ebbs_while_frozen);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
index c0faba520b35c..b9b30f974b5ea 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
@@ -71,8 +71,6 @@ int cycles_with_mmcr2(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index 9729d9f902187..4154498bc5dc5 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -398,8 +398,6 @@ int ebb_child(union pipe read_pipe, union pipe write_pipe)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git 
a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
index a991d2ea8d0a1..174e4f4dae6c0 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_

[PATCH AUTOSEL 4.9 14/26] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests

2020-08-21 Thread Sasha Levin
From: "Desnes A. Nunes do Rosario" 

[ Upstream commit 3337bf41e0dd70b4064cdf60acdfcdc2d050066c ]

An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is being per-
formed when count_pmc() is used to reset PMCs on a few selftests. This
extra pmc_count can occasionally invalidate results, such as the ones from
cycles_test shown hereafter. The ebb_check_count() failed with an above
the upper limit error due to the extra value on ebb_state.stats.pmc_count.

Furthermore, this extra count is also indicated by extra PMC1 trace_log on
the output of the cycle test (as well as on pmc56_overflow_test):

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
>> [30]: register SPRN_PMC1  = 0x451e
PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==

Signed-off-by: Desnes A. Nunes do Rosario 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200626164737.21943-1-desn...@linux.ibm.com
Signed-off-by: Sasha Levin 
---
 .../selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c| 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c  | 2 --
 .../selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c| 1 -
 .../testing/selftests/powerpc/pmu/ebb/multi_counter_test.c | 7 ---
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c   | 2 --
 .../testing/selftests/powerpc/pmu/ebb/pmae_handling_test.c | 2 --
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c| 2 --
 11 files changed, 26 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
index 94110b1dcd3d8..031baa43646fb 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
@@ -91,8 +91,6 @@ int back_to_back_ebbs(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
index 7c57a8d79535d..361e0be9df9ae 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
@@ -42,8 +42,6 @@ int cycles(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index ecf5ee3283a3e..fe7d0dc2a1a26 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -99,8 +99,6 @@ int cycles_with_freeze(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
printf("EBBs while frozen %d\n", ebbs_while_frozen);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
index c0faba520b35c..b9b30f974b5ea 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
@@ -71,8 +71,6 @@ int cycles_with_mmcr2(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index 46681fec549b8..2694ae161a84a 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -396,8 +396,6 @@ int ebb_child(union pipe read_pipe, union pipe write_pipe)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git 
a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
index a991d2ea8d0a1..174e4f4dae6c0 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_

[PATCH AUTOSEL 4.14 17/30] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests

2020-08-21 Thread Sasha Levin
From: "Desnes A. Nunes do Rosario" 

[ Upstream commit 3337bf41e0dd70b4064cdf60acdfcdc2d050066c ]

An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is being per-
formed when count_pmc() is used to reset PMCs on a few selftests. This
extra pmc_count can occasionally invalidate results, such as the ones from
cycles_test shown hereafter. The ebb_check_count() failed with an above
the upper limit error due to the extra value on ebb_state.stats.pmc_count.

Furthermore, this extra count is also indicated by extra PMC1 trace_log on
the output of the cycle test (as well as on pmc56_overflow_test):

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
>> [30]: register SPRN_PMC1  = 0x451e
PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==

Signed-off-by: Desnes A. Nunes do Rosario 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200626164737.21943-1-desn...@linux.ibm.com
Signed-off-by: Sasha Levin 
---
 .../selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c| 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c  | 2 --
 .../selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c| 1 -
 .../testing/selftests/powerpc/pmu/ebb/multi_counter_test.c | 7 ---
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c   | 2 --
 .../testing/selftests/powerpc/pmu/ebb/pmae_handling_test.c | 2 --
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c| 2 --
 11 files changed, 26 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
index 94110b1dcd3d8..031baa43646fb 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
@@ -91,8 +91,6 @@ int back_to_back_ebbs(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
index 7c57a8d79535d..361e0be9df9ae 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
@@ -42,8 +42,6 @@ int cycles(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index ecf5ee3283a3e..fe7d0dc2a1a26 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -99,8 +99,6 @@ int cycles_with_freeze(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
printf("EBBs while frozen %d\n", ebbs_while_frozen);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
index c0faba520b35c..b9b30f974b5ea 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
@@ -71,8 +71,6 @@ int cycles_with_mmcr2(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index 46681fec549b8..2694ae161a84a 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -396,8 +396,6 @@ int ebb_child(union pipe read_pipe, union pipe write_pipe)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git 
a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
index a991d2ea8d0a1..174e4f4dae6c0 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_

[PATCH AUTOSEL 4.14 05/30] powerpc/xive: Ignore kmemleak false positives

2020-08-21 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit f0993c839e95dd6c7f054a1015e693c87e33e4fb ]

xive_native_provision_pages() allocates memory and passes the pointer to
OPAL so kmemleak cannot find the pointer usage in the kernel memory and
produces a false positive report (below) (even if the kernel did scan
OPAL memory, it is unable to deal with __pa() addresses anyway).

This silences the warning.

unreferenced object 0xc000200350c4 (size 65536):
  comm "qemu-system-ppc", pid 2725, jiffies 4294946414 (age 70776.530s)
  hex dump (first 32 bytes):
02 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00  P...
01 00 08 07 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[<81ff046c>] xive_native_alloc_vp_block+0x120/0x250
[] kvmppc_xive_compute_vp_id+0x248/0x350 [kvm]
[] kvmppc_xive_connect_vcpu+0xc0/0x520 [kvm]
[<6acbc81c>] kvm_arch_vcpu_ioctl+0x308/0x580 [kvm]
[<89c69580>] kvm_vcpu_ioctl+0x19c/0xae0 [kvm]
[<902ae91e>] ksys_ioctl+0x184/0x1b0
[] sys_ioctl+0x48/0xb0
[<01b2c127>] system_call_exception+0x124/0x1f0
[] system_call_common+0xe8/0x214

Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200612043303.84894-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/sysdev/xive/native.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/native.c 
b/arch/powerpc/sysdev/xive/native.c
index 30cdcbfa1c04e..b0e96f4b728c1 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -630,6 +631,7 @@ static bool xive_native_provision_pages(void)
pr_err("Failed to allocate provisioning page\n");
return false;
}
+   kmemleak_ignore(p);
opal_xive_donate_page(chip, __pa(p));
}
return true;
-- 
2.25.1
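
The shape of the fix above generalizes to any buffer whose only live
reference is handed to firmware; a sketch of the pattern (the allocation
details here are hypothetical, not the driver's actual ones):

	void *p = kmalloc(PAGE_SIZE, GFP_KERNEL);
	if (!p)
		return false;
	kmemleak_ignore(p);	/* firmware holds the only reference */
	opal_xive_donate_page(chip, __pa(p));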



[PATCH AUTOSEL 4.19 20/38] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests

2020-08-21 Thread Sasha Levin
From: "Desnes A. Nunes do Rosario" 

[ Upstream commit 3337bf41e0dd70b4064cdf60acdfcdc2d050066c ]

An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is being per-
formed when count_pmc() is used to reset PMCs on a few selftests. This
extra pmc_count can occasionally invalidate results, such as the ones from
cycles_test shown hereafter. The ebb_check_count() failed with an above
the upper limit error due to the extra value on ebb_state.stats.pmc_count.

Furthermore, this extra count is also indicated by extra PMC1 trace_log on
the output of the cycle test (as well as on pmc56_overflow_test):

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
>> [30]: register SPRN_PMC1  = 0x451e
PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==

Signed-off-by: Desnes A. Nunes do Rosario 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200626164737.21943-1-desn...@linux.ibm.com
Signed-off-by: Sasha Levin 
---
 .../selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c| 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c  | 2 --
 .../selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c| 1 -
 .../testing/selftests/powerpc/pmu/ebb/multi_counter_test.c | 7 ---
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c   | 2 --
 .../testing/selftests/powerpc/pmu/ebb/pmae_handling_test.c | 2 --
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c| 2 --
 11 files changed, 26 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
index 94110b1dcd3d8..031baa43646fb 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
@@ -91,8 +91,6 @@ int back_to_back_ebbs(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
index 7c57a8d79535d..361e0be9df9ae 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
@@ -42,8 +42,6 @@ int cycles(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index ecf5ee3283a3e..fe7d0dc2a1a26 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -99,8 +99,6 @@ int cycles_with_freeze(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
printf("EBBs while frozen %d\n", ebbs_while_frozen);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
index c0faba520b35c..b9b30f974b5ea 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
@@ -71,8 +71,6 @@ int cycles_with_mmcr2(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index 46681fec549b8..2694ae161a84a 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -396,8 +396,6 @@ int ebb_child(union pipe read_pipe, union pipe write_pipe)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git 
a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
index a991d2ea8d0a1..174e4f4dae6c0 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_

[PATCH AUTOSEL 4.19 07/38] powerpc/xive: Ignore kmemleak false positives

2020-08-21 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit f0993c839e95dd6c7f054a1015e693c87e33e4fb ]

xive_native_provision_pages() allocates memory and passes the pointer to
OPAL so kmemleak cannot find the pointer usage in the kernel memory and
produces a false positive report (below) (even if the kernel did scan
OPAL memory, it is unable to deal with __pa() addresses anyway).

This silences the warning.

unreferenced object 0xc000200350c4 (size 65536):
  comm "qemu-system-ppc", pid 2725, jiffies 4294946414 (age 70776.530s)
  hex dump (first 32 bytes):
02 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00  P...
01 00 08 07 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[<81ff046c>] xive_native_alloc_vp_block+0x120/0x250
[] kvmppc_xive_compute_vp_id+0x248/0x350 [kvm]
[] kvmppc_xive_connect_vcpu+0xc0/0x520 [kvm]
[<6acbc81c>] kvm_arch_vcpu_ioctl+0x308/0x580 [kvm]
[<89c69580>] kvm_vcpu_ioctl+0x19c/0xae0 [kvm]
[<902ae91e>] ksys_ioctl+0x184/0x1b0
[] sys_ioctl+0x48/0xb0
[<01b2c127>] system_call_exception+0x124/0x1f0
[] system_call_common+0xe8/0x214

Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200612043303.84894-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/sysdev/xive/native.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/native.c 
b/arch/powerpc/sysdev/xive/native.c
index cb1f51ad48e40..411f785cdfb51 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -627,6 +628,7 @@ static bool xive_native_provision_pages(void)
pr_err("Failed to allocate provisioning page\n");
return false;
}
+   kmemleak_ignore(p);
opal_xive_donate_page(chip, __pa(p));
}
return true;
-- 
2.25.1



[PATCH AUTOSEL 5.4 25/48] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests

2020-08-21 Thread Sasha Levin
From: "Desnes A. Nunes do Rosario" 

[ Upstream commit 3337bf41e0dd70b4064cdf60acdfcdc2d050066c ]

An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is being per-
formed when count_pmc() is used to reset PMCs on a few selftests. This
extra pmc_count can occasionally invalidate results, such as the ones from
cycles_test shown hereafter. The ebb_check_count() failed with an above
the upper limit error due to the extra value on ebb_state.stats.pmc_count.

Furthermore, this extra count is also indicated by extra PMC1 trace_log on
the output of the cycle test (as well as on pmc56_overflow_test):

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
>> [30]: register SPRN_PMC1  = 0x451e
PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==

Signed-off-by: Desnes A. Nunes do Rosario 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200626164737.21943-1-desn...@linux.ibm.com
Signed-off-by: Sasha Levin 
---
 .../selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c| 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c  | 2 --
 .../selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c| 1 -
 .../testing/selftests/powerpc/pmu/ebb/multi_counter_test.c | 7 ---
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c   | 2 --
 .../testing/selftests/powerpc/pmu/ebb/pmae_handling_test.c | 2 --
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c| 2 --
 11 files changed, 26 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
index a2d7b0e3dca97..a26ac122c759f 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
@@ -91,8 +91,6 @@ int back_to_back_ebbs(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
index bc893813483ee..bb9f587fa76e8 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
@@ -42,8 +42,6 @@ int cycles(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index dcd351d203289..9ae795ce314e6 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -99,8 +99,6 @@ int cycles_with_freeze(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
printf("EBBs while frozen %d\n", ebbs_while_frozen);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
index 94c99c12c0f23..4b45a2e70f62b 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
@@ -71,8 +71,6 @@ int cycles_with_mmcr2(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index dfbc5c3ad52d7..21537d6eb6b7d 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -396,8 +396,6 @@ int ebb_child(union pipe read_pipe, union pipe write_pipe)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git 
a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
index ca2f7d729155b..b208bf6ad58d3 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_

[PATCH AUTOSEL 5.4 10/48] powerpc/xive: Ignore kmemleak false positives

2020-08-21 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit f0993c839e95dd6c7f054a1015e693c87e33e4fb ]

xive_native_provision_pages() allocates memory and passes the pointer to
OPAL so kmemleak cannot find the pointer usage in the kernel memory and
produces a false positive report (below) (even if the kernel did scan
OPAL memory, it is unable to deal with __pa() addresses anyway).

This silences the warning.

unreferenced object 0xc000200350c4 (size 65536):
  comm "qemu-system-ppc", pid 2725, jiffies 4294946414 (age 70776.530s)
  hex dump (first 32 bytes):
02 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00  P...
01 00 08 07 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[<81ff046c>] xive_native_alloc_vp_block+0x120/0x250
[] kvmppc_xive_compute_vp_id+0x248/0x350 [kvm]
[] kvmppc_xive_connect_vcpu+0xc0/0x520 [kvm]
[<6acbc81c>] kvm_arch_vcpu_ioctl+0x308/0x580 [kvm]
[<89c69580>] kvm_vcpu_ioctl+0x19c/0xae0 [kvm]
[<902ae91e>] ksys_ioctl+0x184/0x1b0
[] sys_ioctl+0x48/0xb0
[<01b2c127>] system_call_exception+0x124/0x1f0
[] system_call_common+0xe8/0x214

Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200612043303.84894-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/sysdev/xive/native.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/native.c 
b/arch/powerpc/sysdev/xive/native.c
index 50e1a8e02497d..3fd086533dcfc 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -646,6 +647,7 @@ static bool xive_native_provision_pages(void)
pr_err("Failed to allocate provisioning page\n");
return false;
}
+   kmemleak_ignore(p);
opal_xive_donate_page(chip, __pa(p));
}
return true;
-- 
2.25.1



[PATCH AUTOSEL 5.7 33/61] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests

2020-08-21 Thread Sasha Levin
From: "Desnes A. Nunes do Rosario" 

[ Upstream commit 3337bf41e0dd70b4064cdf60acdfcdc2d050066c ]

An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is being per-
formed when count_pmc() is used to reset PMCs on a few selftests. This
extra pmc_count can occasionally invalidate results, such as the ones from
cycles_test shown hereafter. The ebb_check_count() failed with an above
the upper limit error due to the extra value on ebb_state.stats.pmc_count.

Furthermore, this extra count is also indicated by extra PMC1 trace_log on
the output of the cycle test (as well as on pmc56_overflow_test):

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
>> [30]: register SPRN_PMC1  = 0x451e
PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==

Signed-off-by: Desnes A. Nunes do Rosario 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200626164737.21943-1-desn...@linux.ibm.com
Signed-off-by: Sasha Levin 
---
 .../selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c| 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c  | 2 --
 .../selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c| 1 -
 .../testing/selftests/powerpc/pmu/ebb/multi_counter_test.c | 7 ---
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c   | 2 --
 .../testing/selftests/powerpc/pmu/ebb/pmae_handling_test.c | 2 --
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c| 2 --
 11 files changed, 26 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
index a2d7b0e3dca97..a26ac122c759f 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
@@ -91,8 +91,6 @@ int back_to_back_ebbs(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
index bc893813483ee..bb9f587fa76e8 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
@@ -42,8 +42,6 @@ int cycles(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index dcd351d203289..9ae795ce314e6 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -99,8 +99,6 @@ int cycles_with_freeze(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
printf("EBBs while frozen %d\n", ebbs_while_frozen);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
index 94c99c12c0f23..4b45a2e70f62b 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
@@ -71,8 +71,6 @@ int cycles_with_mmcr2(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index dfbc5c3ad52d7..21537d6eb6b7d 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -396,8 +396,6 @@ int ebb_child(union pipe read_pipe, union pipe write_pipe)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git 
a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
index ca2f7d729155b..b208bf6ad58d3 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_

[PATCH AUTOSEL 5.7 12/61] powerpc/xive: Ignore kmemleak false positives

2020-08-21 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit f0993c839e95dd6c7f054a1015e693c87e33e4fb ]

xive_native_provision_pages() allocates memory and passes the pointer to
OPAL so kmemleak cannot find the pointer usage in the kernel memory and
produces a false positive report (below) (even if the kernel did scan
OPAL memory, it is unable to deal with __pa() addresses anyway).

This silences the warning.

unreferenced object 0xc000200350c4 (size 65536):
  comm "qemu-system-ppc", pid 2725, jiffies 4294946414 (age 70776.530s)
  hex dump (first 32 bytes):
02 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00  P...
01 00 08 07 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[<81ff046c>] xive_native_alloc_vp_block+0x120/0x250
[] kvmppc_xive_compute_vp_id+0x248/0x350 [kvm]
[] kvmppc_xive_connect_vcpu+0xc0/0x520 [kvm]
[<6acbc81c>] kvm_arch_vcpu_ioctl+0x308/0x580 [kvm]
[<89c69580>] kvm_vcpu_ioctl+0x19c/0xae0 [kvm]
[<902ae91e>] ksys_ioctl+0x184/0x1b0
[] sys_ioctl+0x48/0xb0
[<01b2c127>] system_call_exception+0x124/0x1f0
[] system_call_common+0xe8/0x214

Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200612043303.84894-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/sysdev/xive/native.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/native.c 
b/arch/powerpc/sysdev/xive/native.c
index 5218fdc4b29a9..82860c7b58353 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include <linux/kmemleak.h>
 
 #include 
 #include 
@@ -647,6 +648,7 @@ static bool xive_native_provision_pages(void)
pr_err("Failed to allocate provisioning page\n");
return false;
}
+   kmemleak_ignore(p);
opal_xive_donate_page(chip, __pa(p));
}
return true;
-- 
2.25.1
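
For reference, the pattern the fix uses, as a minimal sketch
(hypothetical driver code, not from the patch; firmware_take_page() is
an assumed stand-in for opal_xive_donate_page()):

#include <linux/kmemleak.h>
#include <linux/slab.h>

/* Once the only live reference to the allocation is the physical
 * address held by firmware, kmemleak's scan of kernel memory finds no
 * pointer to it and flags a leak. kmemleak_ignore() excludes the
 * object from scanning and reporting.
 */
static void *donate_buffer(size_t size)
{
	void *p = kmalloc(size, GFP_KERNEL);

	if (!p)
		return NULL;
	kmemleak_ignore(p);		/* suppress the false positive */
	firmware_take_page(__pa(p));	/* hypothetical firmware call */
	return p;
}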



[PATCH AUTOSEL 5.8 33/62] selftests/powerpc: Purge extra count_pmc() calls of ebb selftests

2020-08-21 Thread Sasha Levin
From: "Desnes A. Nunes do Rosario" 

[ Upstream commit 3337bf41e0dd70b4064cdf60acdfcdc2d050066c ]

An extra count on ebb_state.stats.pmc_count[PMC_INDEX(pmc)] is performed
when count_pmc() is used to reset PMCs in a few selftests. This extra
pmc_count can occasionally invalidate results, such as the ones from
cycles_test shown below: ebb_check_count() fails with an "above upper
limit" error due to the extra value in ebb_state.stats.pmc_count.

Furthermore, the extra count also shows up as an extra PMC1 trace_log
entry in the output of the cycles test (as well as of pmc56_overflow_test):

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
>> [30]: register SPRN_PMC1  = 0x451e
PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==

Signed-off-by: Desnes A. Nunes do Rosario 
Tested-by: Sachin Sant 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200626164737.21943-1-desn...@linux.ibm.com
Signed-off-by: Sasha Levin 
---
 .../selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c| 2 --
 .../selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c | 2 --
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c  | 2 --
 .../selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c  | 2 --
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c| 1 -
 .../testing/selftests/powerpc/pmu/ebb/multi_counter_test.c | 7 ---
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c   | 2 --
 .../testing/selftests/powerpc/pmu/ebb/pmae_handling_test.c | 2 --
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c| 2 --
 11 files changed, 26 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
index a2d7b0e3dca97..a26ac122c759f 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
@@ -91,8 +91,6 @@ int back_to_back_ebbs(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
index bc893813483ee..bb9f587fa76e8 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
@@ -42,8 +42,6 @@ int cycles(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index dcd351d203289..9ae795ce314e6 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -99,8 +99,6 @@ int cycles_with_freeze(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
printf("EBBs while frozen %d\n", ebbs_while_frozen);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
index 94c99c12c0f23..4b45a2e70f62b 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
@@ -71,8 +71,6 @@ int cycles_with_mmcr2(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index dfbc5c3ad52d7..21537d6eb6b7d 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -396,8 +396,6 @@ int ebb_child(union pipe read_pipe, union pipe write_pipe)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
-
dump_ebb_state();
 
event_close(&event);
diff --git 
a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
index ca2f7d729155b..b208bf6ad58d3 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_
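
The double count the patch removes can be modelled with a standalone
toy (userspace C; count_pmc()'s read-and-accumulate semantics are
assumed from the commit message, and the numbers come from the quoted
cycles_test failure: 0x28546 = 0x283e8 upper limit + 0x15e overshoot):

#include <stdio.h>

static unsigned long pmc_count;		/* models ebb_state.stats.pmc_count */
static unsigned long pmc1 = 0x451e;	/* residual value left in PMC1 */

static unsigned long count_pmc(void)
{
	unsigned long val = pmc1;

	pmc_count += val;	/* the accumulation: this is the extra count */
	pmc1 = 0;		/* the reset the callers actually wanted */
	return val;
}

int main(void)
{
	pmc_count = 0x28546 - 0x451e;	/* what the EBB handler accumulated */
	count_pmc();			/* the extra teardown call */
	printf("count 0x%lx vs upper limit 0x283e8\n", pmc_count);
	return 0;
}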

[PATCH AUTOSEL 5.8 12/62] powerpc/xive: Ignore kmemleak false positives

2020-08-21 Thread Sasha Levin
From: Alexey Kardashevskiy 

[ Upstream commit f0993c839e95dd6c7f054a1015e693c87e33e4fb ]

xive_native_provision_pages() allocates memory and passes the pointer to
OPAL, so kmemleak cannot find the pointer usage in kernel memory and
produces a false positive report (below); even if the kernel did scan
OPAL memory, it could not deal with __pa() addresses anyway.

This silences the warning.

unreferenced object 0xc000200350c4 (size 65536):
  comm "qemu-system-ppc", pid 2725, jiffies 4294946414 (age 70776.530s)
  hex dump (first 32 bytes):
02 00 00 00 50 00 00 00 00 00 00 00 00 00 00 00  P...
01 00 08 07 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[<81ff046c>] xive_native_alloc_vp_block+0x120/0x250
[] kvmppc_xive_compute_vp_id+0x248/0x350 [kvm]
[] kvmppc_xive_connect_vcpu+0xc0/0x520 [kvm]
[<6acbc81c>] kvm_arch_vcpu_ioctl+0x308/0x580 [kvm]
[<89c69580>] kvm_vcpu_ioctl+0x19c/0xae0 [kvm]
[<902ae91e>] ksys_ioctl+0x184/0x1b0
[] sys_ioctl+0x48/0xb0
[<01b2c127>] system_call_exception+0x124/0x1f0
[] system_call_common+0xe8/0x214

Signed-off-by: Alexey Kardashevskiy 
Signed-off-by: Michael Ellerman 
Link: https://lore.kernel.org/r/20200612043303.84894-1-...@ozlabs.ru
Signed-off-by: Sasha Levin 
---
 arch/powerpc/sysdev/xive/native.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/sysdev/xive/native.c 
b/arch/powerpc/sysdev/xive/native.c
index 71b881e554fcb..cb58ec7ce77ac 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include <linux/kmemleak.h>
 
 #include 
 #include 
@@ -647,6 +648,7 @@ static bool xive_native_provision_pages(void)
pr_err("Failed to allocate provisioning page\n");
return false;
}
+   kmemleak_ignore(p);
opal_xive_donate_page(chip, __pa(p));
}
return true;
-- 
2.25.1



Re: [PATCH v6 11/12] mm/vmalloc: Hugepage vmalloc mappings

2020-08-21 Thread Nicholas Piggin
Excerpts from Eric Dumazet's message of August 22, 2020 1:38 am:
> 
> On 8/21/20 8:12 AM, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> support PMD sized vmap mappings.
>> 
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or
>> larger, and fall back to small pages if that was unsuccessful.
>> 
>> Allocations that do not use PAGE_KERNEL prot are not permitted to use huge
>> pages, because not all callers expect this (e.g., module allocations vs
>> strict module rwx).
>> 
>> This reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>> 
>> This can result in more internal fragmentation and memory overhead for a
>> given allocation; a nohugevmalloc boot option is added to disable it.
>> 
>>
> 
> Thanks for working on this stuff, I tried something similar in the past,
> but could not really do more than a hack.
> ( https://lkml.org/lkml/2016/12/21/285 )

Oh nice. It might still be possible to pick up some ideas from your
patch. Higher-order pages smaller than PMD size, or the memory
policy stuff, perhaps.

> Note that __init alloc_large_system_hash() is used at boot time,
> when NUMA policy is spreading allocations over all NUMA nodes.
> 
> This means that on a dual node system, a hash table should be 50/50 spread.
> 
> With your patch, if a hashtable is exactly the size of one huge page,
> the location of this hashtable will not be balanced, which might have
> some unwanted impact.

In that case it shouldn't, because it divides by the number of nodes,
but balancing will in general have somewhat coarser granularity than
with smaller pages, of course.

There's probably a better way to size these important hashes on NUMA. I
suspect that most of the time you have a NUMA machine, you would
actually prefer to use large pages now, even if it means taking up to
2MB more memory per node per hash. It's not a great amount, and the
allocation size is rather arbitrary anyway.

Thanks,
Nick
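
A back-of-the-envelope check of the "up to 2MB more memory per node per
hash" estimate (standalone C; the 9MB hash size is an assumed example,
and 2MB is the PMD size on the configurations discussed):

#include <stdio.h>

int main(void)
{
	unsigned long huge = 2UL << 20;		/* 2MB PMD-sized page */
	unsigned long hash = 9UL << 20;		/* assumed total hash size */
	unsigned long nodes = 2;
	unsigned long per_node = hash / nodes;	/* 4.5MB per node */
	/* round each node's share up to a whole huge page */
	unsigned long rounded = (per_node + huge - 1) & ~(huge - 1);

	printf("per node: %lu KB -> %lu KB (overhead %lu KB, below 2048 KB)\n",
	       per_node >> 10, rounded >> 10, (rounded - per_node) >> 10);
	return 0;
}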


Re: kernel since 5.6 do not boot anymore on Apple PowerBook

2020-08-21 Thread Giuseppe Sacco
Hello,

On Fri, 21/08/2020 at 16.03 +0200, Christophe Leroy wrote:
[...]
> Thanks.
> 
> The Oops in the video shows that the issue is at 0x1bcac, and the msr
> value shows that the Instruction MMU is disabled. So this corresponds
> to address 0xc001bcac. In the vmlinux you sent me, this address is in
> power_save_ppc32_restore().
> 
> This issue is fixed by 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7bce32ccbab3ba3e3e0f27da6961bf6313df97ed.1581663140.git.christophe.le...@c-s.fr/
> 
> 
> You also said in a previous mail that your original issue also happens
> when CONFIG_VMAP_STACK is not selected. The above bug being linked to
> CONFIG_VMAP_STACK, maybe it would be easier to bisect with
> CONFIG_VMAP_STACK unselected.

I rebuilt the same kernel that I would otherwise have skipped because
of the other bug, and indeed it worked.
I will continue my "bisect quest" with VMAP_STACK unselected in order
to speed up the bisecting.

Thank you very much,
Giuseppe



Re: [PATCH v6 11/12] mm/vmalloc: Hugepage vmalloc mappings

2020-08-21 Thread Eric Dumazet


On 8/21/20 8:12 AM, Nicholas Piggin wrote:
> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
> support PMD sized vmap mappings.
> 
> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or
> larger, and fall back to small pages if that was unsuccessful.
> 
> Allocations that do not use PAGE_KERNEL prot are not permitted to use huge
> pages, because not all callers expect this (e.g., module allocations vs
> strict module rwx).
> 
> This reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
> 
> This can result in more internal fragmentation and memory overhead for a
> given allocation; a nohugevmalloc boot option is added to disable it.
> 
>

Thanks for working on this stuff, I tried something similar in the past,
but could not really do more than a hack.
( https://lkml.org/lkml/2016/12/21/285 )

Note that __init alloc_large_system_hash() is used at boot time,
when NUMA policy is spreading allocations over all NUMA nodes.

This means that on a dual node system, a hash table should be 50/50 spread.

With your patch, if a hashtable is exactly the size of one huge page,
the location of this hashtable will not be balanced, which might have
some unwanted impact.

Thanks !



[PATCH v6 12/12] powerpc/64s/radix: Enable huge vmalloc mappings

2020-08-21 Thread Nicholas Piggin
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin 
---
 Documentation/admin-guide/kernel-parameters.txt | 2 ++
 arch/powerpc/Kconfig| 1 +
 2 files changed, 3 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index bdc1f33fd3d1..6f0b41289a90 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3190,6 +3190,8 @@
 
nohugeiomap [KNL,X86,PPC] Disable kernel huge I/O mappings.
 
+   nohugevmalloc   [PPC] Disable kernel huge vmalloc mappings.
+
nosmt   [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce9..9171d25ad7dc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -175,6 +175,7 @@ config PPC
select GENERIC_TIME_VSYSCALL
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMAP  if PPC_BOOK3S_64 && 
PPC_RADIX_MMU
+   select HAVE_ARCH_HUGE_VMALLOC   if HAVE_ARCH_HUGE_VMAP
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
-- 
2.23.0



[PATCH v6 11/12] mm/vmalloc: Hugepage vmalloc mappings

2020-08-21 Thread Nicholas Piggin
Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
support PMD sized vmap mappings.

vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or
larger, and fall back to small pages if that was unsuccessful.

Allocations that do not use PAGE_KERNEL prot are not permitted to use huge
pages, because not all callers expect this (e.g., module allocations vs
strict module rwx).

This reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.

This can result in more internal fragmentation and memory overhead for a
given allocation; a nohugevmalloc boot option is added to disable it.

Signed-off-by: Nicholas Piggin 
---
 arch/Kconfig|   4 +
 include/linux/vmalloc.h |   1 +
 mm/page_alloc.c |   5 +-
 mm/vmalloc.c| 180 ++--
 4 files changed, 145 insertions(+), 45 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493..b2b89d629317 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -616,6 +616,10 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 config HAVE_ARCH_HUGE_VMAP
bool
 
+config HAVE_ARCH_HUGE_VMALLOC
+   depends on HAVE_ARCH_HUGE_VMAP
+   bool
+
 config ARCH_WANT_HUGE_PMD_SHARE
bool
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 15adb9a14fb6..a7449064fe35 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -58,6 +58,7 @@ struct vm_struct {
unsigned long   size;
unsigned long   flags;
struct page **pages;
+   unsigned intpage_order;
unsigned intnr_pages;
phys_addr_t phys_addr;
const void  *caller;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e2bab486fea..b6427cc7b838 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -69,6 +69,7 @@
 #include 
 #include 
 #include 
+#include <linux/vmalloc.h>
 
 #include 
 #include 
@@ -8102,6 +8103,7 @@ void *__init alloc_large_system_hash(const char 
*tablename,
void *table = NULL;
gfp_t gfp_flags;
bool virt;
+   bool huge;
 
/* allow the kernel cmdline to have a say */
if (!numentries) {
@@ -8169,6 +8171,7 @@ void *__init alloc_large_system_hash(const char 
*tablename,
} else if (get_order(size) >= MAX_ORDER || hashdist) {
table = __vmalloc(size, gfp_flags);
virt = true;
+   huge = (find_vm_area(table)->page_order > 0);
} else {
/*
 * If bucketsize is not a power-of-two, we may free
@@ -8185,7 +8188,7 @@ void *__init alloc_large_system_hash(const char 
*tablename,
 
pr_info("%s hash table entries: %ld (order: %d, %lu bytes, %s)\n",
tablename, 1UL << log2qty, ilog2(size) - PAGE_SHIFT, size,
-   virt ? "vmalloc" : "linear");
+   virt ? (huge ? "vmalloc hugepage" : "vmalloc") : "linear");
 
if (_hash_shift)
*_hash_shift = log2qty;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1d6cad16bda3..8db53c2d7f72 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -44,6 +44,19 @@
 #include "internal.h"
 #include "pgalloc-track.h"
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
+static bool __ro_after_init vmap_allow_huge = true;
+
+static int __init set_nohugevmalloc(char *str)
+{
+   vmap_allow_huge = false;
+   return 0;
+}
+early_param("nohugevmalloc", set_nohugevmalloc);
+#else /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
+static const bool vmap_allow_huge = false;
+#endif /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
+
 bool is_vmalloc_addr(const void *x)
 {
unsigned long addr = (unsigned long)x;
@@ -477,31 +490,12 @@ static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long 
addr,
return 0;
 }
 
-/**
- * map_kernel_range_noflush - map kernel VM area with the specified pages
- * @addr: start of the VM area to map
- * @size: size of the VM area to map
- * @prot: page protection flags to use
- * @pages: pages to map
- *
- * Map PFN_UP(@size) pages at @addr.  The VM area @addr and @size specify 
should
- * have been allocated using get_vm_area() and its friends.
- *
- * NOTE:
- * This function does NOT do any cache flushing.  The caller is responsible for
- * calling flush_cache_vmap() on to-be-mapped areas before calling this
- * function.
- *
- * RETURNS:
- * 0 on success, -errno on failure.
- */
-int map_kernel_range_noflush(unsigned long addr, unsigned long size,
-pgprot_t prot, struct page **pages)
+static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long 
end,
+   pgprot_t prot, struct page **pages)
 {
unsigned long start = addr;
-   unsigned long end = addr + size;
-   unsigned long next;
pgd_t *pgd;
+   

[PATCH v6 10/12] mm/vmalloc: add vmap_range_noflush variant

2020-08-21 Thread Nicholas Piggin
As a side-effect, the order of flush_cache_vmap() and
arch_sync_kernel_mappings() calls is switched, but that now matches
the other callers in this file.

Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 256554d598e6..1d6cad16bda3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -237,7 +237,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr, 
unsigned long end,
return 0;
 }
 
-int vmap_range(unsigned long addr, unsigned long end,
+static int vmap_range_noflush(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot,
unsigned int max_page_shift)
 {
@@ -259,14 +259,24 @@ int vmap_range(unsigned long addr, unsigned long end,
break;
} while (pgd++, phys_addr += (next - addr), addr = next, addr != end);
 
-   flush_cache_vmap(start, end);
-
if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
arch_sync_kernel_mappings(start, end);
 
return err;
 }
 
+int vmap_range(unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   unsigned int max_page_shift)
+{
+   int err;
+
+   err = vmap_range_noflush(addr, end, phys_addr, prot, max_page_shift);
+   flush_cache_vmap(addr, end);
+
+   return err;
+}
+
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 pgtbl_mod_mask *mask)
 {
-- 
2.23.0



[PATCH v6 09/12] mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c

2020-08-21 Thread Nicholas Piggin
This is a generic kernel virtual memory mapper, not specific to ioremap.

Signed-off-by: Nicholas Piggin 
---
 include/linux/vmalloc.h |   3 +
 mm/ioremap.c| 197 
 mm/vmalloc.c| 196 +++
 3 files changed, 199 insertions(+), 197 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3f6bba4cc9bc..15adb9a14fb6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -177,6 +177,9 @@ extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 
 #ifdef CONFIG_MMU
+int vmap_range(unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   unsigned int max_page_shift);
 extern int map_kernel_range_noflush(unsigned long start, unsigned long size,
pgprot_t prot, struct page **pages);
 int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot,
diff --git a/mm/ioremap.c b/mm/ioremap.c
index c67f91164401..d1dcc7e744ac 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -28,203 +28,6 @@ early_param("nohugeiomap", set_nohugeiomap);
 static const unsigned int iomap_max_page_shift = PAGE_SHIFT;
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
-static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
-{
-   pte_t *pte;
-   u64 pfn;
-
-   pfn = phys_addr >> PAGE_SHIFT;
-   pte = pte_alloc_kernel_track(pmd, addr, mask);
-   if (!pte)
-   return -ENOMEM;
-   do {
-   BUG_ON(!pte_none(*pte));
-   set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
-   pfn++;
-   } while (pte++, addr += PAGE_SIZE, addr != end);
-   *mask |= PGTBL_PTE_MODIFIED;
-   return 0;
-}
-
-static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift)
-{
-   if (max_page_shift < PMD_SHIFT)
-   return 0;
-
-   if (!arch_vmap_pmd_supported(prot))
-   return 0;
-
-   if ((end - addr) != PMD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(addr, PMD_SIZE))
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PMD_SIZE))
-   return 0;
-
-   if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
-   return 0;
-
-   return pmd_set_huge(pmd, phys_addr, prot);
-}
-
-static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift, pgtbl_mod_mask *mask)
-{
-   pmd_t *pmd;
-   unsigned long next;
-
-   pmd = pmd_alloc_track(&init_mm, pud, addr, mask);
-   if (!pmd)
-   return -ENOMEM;
-   do {
-   next = pmd_addr_end(addr, end);
-
-   if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot, 
max_page_shift)) {
-   *mask |= PGTBL_PMD_MODIFIED;
-   continue;
-   }
-
-   if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask))
-   return -ENOMEM;
-   } while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
-   return 0;
-}
-
-static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift)
-{
-   if (max_page_shift < PUD_SHIFT)
-   return 0;
-
-   if (!arch_vmap_pud_supported(prot))
-   return 0;
-
-   if ((end - addr) != PUD_SIZE)
-   return 0;
-
-   if (!IS_ALIGNED(addr, PUD_SIZE))
-   return 0;
-
-   if (!IS_ALIGNED(phys_addr, PUD_SIZE))
-   return 0;
-
-   if (pud_present(*pud) && !pud_free_pmd_page(pud, addr))
-   return 0;
-
-   return pud_set_huge(pud, phys_addr, prot);
-}
-
-static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
-   phys_addr_t phys_addr, pgprot_t prot,
-   unsigned int max_page_shift, pgtbl_mod_mask *mask)
-{
-   pud_t *pud;
-   unsigned long next;
-
-   pud = pud_alloc_track(&init_mm, p4d, addr, mask);
-   if (!pud)
-   return -ENOMEM;
-   do {
-   next = pud_addr_end(addr, end);
-
-   if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot, 
max_page_shift)) {
-   *mask |= PGTBL_PUD_MODIFIED;
-   continue;
-   }
-
-   if (vmap_pmd_range(pud, addr, next, phys_addr, prot, 
max_page_shift, mask))
-   return -ENOMEM;
- 

[PATCH v6 08/12] x86: inline huge vmap supported functions

2020-08-21 Thread Nicholas Piggin
This allows unsupported levels to be constant-folded away, and so
p4d_free_pud_page() can be removed because it is no longer referenced.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Signed-off-by: Nicholas Piggin 
---
 arch/x86/include/asm/vmalloc.h | 22 +++---
 arch/x86/mm/ioremap.c  | 19 ---
 arch/x86/mm/pgtable.c  | 13 -
 3 files changed, 19 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
index 094ea2b565f3..e714b00fc0ca 100644
--- a/arch/x86/include/asm/vmalloc.h
+++ b/arch/x86/include/asm/vmalloc.h
@@ -1,13 +1,29 @@
 #ifndef _ASM_X86_VMALLOC_H
 #define _ASM_X86_VMALLOC_H
 
+#include <asm/cpufeature.h>
 #include 
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+#ifdef CONFIG_X86_64
+   return boot_cpu_has(X86_FEATURE_GBPAGES);
+#else
+   return false;
+#endif
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   return boot_cpu_has(X86_FEATURE_PSE);
+}
 #endif
 
 #endif /* _ASM_X86_VMALLOC_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 159bfca757b9..1465a22a9bfb 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -481,25 +481,6 @@ void iounmap(volatile void __iomem *addr)
 }
 EXPORT_SYMBOL(iounmap);
 
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-#ifdef CONFIG_X86_64
-   return boot_cpu_has(X86_FEATURE_GBPAGES);
-#else
-   return false;
-#endif
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   return boot_cpu_has(X86_FEATURE_PSE);
-}
-
 /*
  * Convert a physical pointer to a virtual kernel pointer for /dev/mem
  * access
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index dfd82f51ba66..801c418ee97d 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -780,14 +780,6 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
 }
 
-/*
- * Until we support 512GB pages, skip them in the vmap area.
- */
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;
-}
-
 #ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
@@ -859,11 +851,6 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 #else /* !CONFIG_X86_64 */
 
-int pud_free_pmd_page(pud_t *pud, unsigned long addr)
-{
-   return pud_none(*pud);
-}
-
 /*
  * Disable free page handling on x86-PAE. This assures that ioremap()
  * does not update sync'd pmd entries. See vmalloc_sync_one().
-- 
2.23.0



[PATCH v6 07/12] arm64: inline huge vmap supported functions

2020-08-21 Thread Nicholas Piggin
This allows unsupported levels to be constant-folded away, and so
p4d_free_pud_page() can be removed because it is no longer referenced.

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Nicholas Piggin 
---
 arch/arm64/include/asm/vmalloc.h | 23 ---
 arch/arm64/mm/mmu.c  | 26 --
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 597b40405319..fc9a12d6cc1a 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -4,9 +4,26 @@
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+   /*
+* Only 4k granule supports level 1 block mappings.
+* SW table walks can't handle removal of intermediate entries.
+*/
+   return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
+  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   /* See arch_vmap_pud_supported() */
+   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
+}
 #endif
 
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9df7e0058c78..07093e148957 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1304,27 +1304,6 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int 
*size, pgprot_t prot)
return dt_virt;
 }
 
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-   /*
-* Only 4k granule supports level 1 block mappings.
-* SW table walks can't handle removal of intermediate entries.
-*/
-   return IS_ENABLED(CONFIG_ARM64_4K_PAGES) &&
-  !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   /* See arch_vmap_pud_supported() */
-   return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
-}
-
 int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot)
 {
pud_t new_pud = pfn_pud(__phys_to_pfn(phys), mk_pud_sect_prot(prot));
@@ -1416,11 +1395,6 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
return 1;
 }
 
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;   /* Don't attempt a block mapping */
-}
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
 {
-- 
2.23.0



[PATCH v6 06/12] powerpc: inline huge vmap supported functions

2020-08-21 Thread Nicholas Piggin
This allows unsupported levels to be constant-folded away, and so
p4d_free_pud_page() can be removed because it is no longer referenced.

Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/vmalloc.h   | 19 ---
 arch/powerpc/mm/book3s64/radix_pgtable.c | 21 -
 2 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/vmalloc.h 
b/arch/powerpc/include/asm/vmalloc.h
index 105abb73f075..3f0c153befb0 100644
--- a/arch/powerpc/include/asm/vmalloc.h
+++ b/arch/powerpc/include/asm/vmalloc.h
@@ -1,12 +1,25 @@
 #ifndef _ASM_POWERPC_VMALLOC_H
 #define _ASM_POWERPC_VMALLOC_H
 
+#include <asm/mmu.h>
 #include 
 
 #ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
-bool arch_vmap_p4d_supported(pgprot_t prot);
-bool arch_vmap_pud_supported(pgprot_t prot);
-bool arch_vmap_pmd_supported(pgprot_t prot);
+static inline bool arch_vmap_p4d_supported(pgprot_t prot)
+{
+   return false;
+}
+
+static inline bool arch_vmap_pud_supported(pgprot_t prot)
+{
+   /* HPT does not cope with large pages in the vmalloc area */
+   return radix_enabled();
+}
+
+static inline bool arch_vmap_pmd_supported(pgprot_t prot)
+{
+   return radix_enabled();
+}
 #endif
 
 #endif /* _ASM_POWERPC_VMALLOC_H */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index eca83a50bf2e..27f5837cf145 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1134,22 +1134,6 @@ void radix__ptep_modify_prot_commit(struct 
vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
 }
 
-bool arch_vmap_pud_supported(pgprot_t prot)
-{
-   /* HPT does not cope with large pages in the vmalloc area */
-   return radix_enabled();
-}
-
-bool arch_vmap_pmd_supported(pgprot_t prot)
-{
-   return radix_enabled();
-}
-
-int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
-{
-   return 0;
-}
-
 int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 {
pte_t *ptep = (pte_t *)pud;
@@ -1233,8 +1217,3 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
return 1;
 }
-
-bool arch_vmap_p4d_supported(pgprot_t prot)
-{
-   return false;
-}
-- 
2.23.0
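
The constant-folding rationale shared by the three "inline huge vmap
supported functions" patches above can be seen in a standalone toy
(plain userspace C, not kernel code; the names mirror the patches but
the surrounding code is made up):

#include <stdbool.h>
#include <stdio.h>

/* With the predicate visible to the compiler as a static inline that
 * returns a constant, the huge-page branch below is provably dead and
 * is discarded, so no out-of-line stub (like p4d_free_pud_page()) has
 * to exist for the unsupported level. An extern function in a .c file
 * would instead be called at runtime and keep the branch alive.
 */
static inline bool arch_vmap_p4d_supported(void)
{
	return false;	/* this arch has no huge p4d mappings */
}

static int vmap_try_huge_p4d(unsigned long addr, unsigned long end)
{
	if (!arch_vmap_p4d_supported())
		return 0;
	/* huge-mapping setup would follow; folded away as dead code */
	return 1;
}

int main(void)
{
	printf("%d\n", vmap_try_huge_p4d(0, 1UL << 21));
	return 0;
}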



[PATCH v6 05/12] mm: HUGE_VMAP arch support cleanup

2020-08-21 Thread Nicholas Piggin
This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.

This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.

This also adds a prot argument to the arch query. It is currently
unused, but could help some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Signed-off-by: Nicholas Piggin 
---
 arch/arm64/include/asm/vmalloc.h |  8 +++
 arch/arm64/mm/mmu.c  | 10 +--
 arch/powerpc/include/asm/vmalloc.h   |  8 +++
 arch/powerpc/mm/book3s64/radix_pgtable.c |  8 +--
 arch/x86/include/asm/vmalloc.h   |  7 ++
 arch/x86/mm/ioremap.c| 10 +--
 include/linux/io.h   |  9 ---
 include/linux/vmalloc.h  |  6 ++
 init/main.c  |  1 -
 mm/ioremap.c | 88 +---
 10 files changed, 77 insertions(+), 78 deletions(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 2ca708ab9b20..597b40405319 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -1,4 +1,12 @@
 #ifndef _ASM_ARM64_VMALLOC_H
 #define _ASM_ARM64_VMALLOC_H
 
+#include <asm/page.h>
+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62fea1b6..9df7e0058c78 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1304,12 +1304,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int 
*size, pgprot_t prot)
return dt_virt;
 }
 
-int __init arch_ioremap_p4d_supported(void)
+bool arch_vmap_p4d_supported(pgprot_t prot)
 {
-   return 0;
+   return false;
 }
 
-int __init arch_ioremap_pud_supported(void)
+bool arch_vmap_pud_supported(pgprot_t prot)
 {
/*
 * Only 4k granule supports level 1 block mappings.
@@ -1319,9 +1319,9 @@ int __init arch_ioremap_pud_supported(void)
   !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
-int __init arch_ioremap_pmd_supported(void)
+bool arch_vmap_pmd_supported(pgprot_t prot)
 {
-   /* See arch_ioremap_pud_supported() */
+   /* See arch_vmap_pud_supported() */
return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }
 
diff --git a/arch/powerpc/include/asm/vmalloc.h 
b/arch/powerpc/include/asm/vmalloc.h
index b992dfaaa161..105abb73f075 100644
--- a/arch/powerpc/include/asm/vmalloc.h
+++ b/arch/powerpc/include/asm/vmalloc.h
@@ -1,4 +1,12 @@
 #ifndef _ASM_POWERPC_VMALLOC_H
 #define _ASM_POWERPC_VMALLOC_H
 
+#include <asm/page.h>
+
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_POWERPC_VMALLOC_H */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 28c784976bed..eca83a50bf2e 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1134,13 +1134,13 @@ void radix__ptep_modify_prot_commit(struct 
vm_area_struct *vma,
set_pte_at(mm, addr, ptep, pte);
 }
 
-int __init arch_ioremap_pud_supported(void)
+bool arch_vmap_pud_supported(pgprot_t prot)
 {
/* HPT does not cope with large pages in the vmalloc area */
return radix_enabled();
 }
 
-int __init arch_ioremap_pmd_supported(void)
+bool arch_vmap_pmd_supported(pgprot_t prot)
 {
return radix_enabled();
 }
@@ -1234,7 +1234,7 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
return 1;
 }
 
-int __init arch_ioremap_p4d_supported(void)
+bool arch_vmap_p4d_supported(pgprot_t prot)
 {
-   return 0;
+   return false;
 }
diff --git a/arch/x86/include/asm/vmalloc.h b/arch/x86/include/asm/vmalloc.h
index 29837740b520..094ea2b565f3 100644
--- a/arch/x86/include/asm/vmalloc.h
+++ b/arch/x86/include/asm/vmalloc.h
@@ -1,6 +1,13 @@
 #ifndef _ASM_X86_VMALLOC_H
 #define _ASM_X86_VMALLOC_H
 
+#include <asm/page.h>
 #include 
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
+bool arch_vmap_p4d_supported(pgprot_t prot);
+bool arch_vmap_pud_supported(pgprot_t prot);
+bool arch_vmap_pmd_supported(pgprot_t prot);
+#endif
+
 #endif /* _ASM_X86_VMALLOC_H */
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 84d85dbd1dad..159bfca757b9 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -481,21 +481,21 @@ void iounmap(volatile void __iomem *addr)
 }
 EXPORT_SYMBOL(iounmap);
 
-int __init arch_ioremap
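
A hypothetical illustration of the prot argument mentioned in the
commit message above (not part of this series; the policy and the use
of _PAGE_NO_CACHE here are assumptions for the sketch):

/* Sketch only: an arch that cannot map uncacheable memory with large
 * pages could veto huge mappings based on the requested protection.
 */
static inline bool arch_vmap_pmd_supported(pgprot_t prot)
{
	if (pgprot_val(prot) & _PAGE_NO_CACHE)
		return false;	/* fall back to small pages */
	return radix_enabled();
}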

[PATCH v6 04/12] mm/ioremap: rename ioremap_*_range to vmap_*_range

2020-08-21 Thread Nicholas Piggin
This will be used as a generic kernel virtual mapping function, so
re-name it in preparation.

Signed-off-by: Nicholas Piggin 
---
 mm/ioremap.c | 64 +++-
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/mm/ioremap.c b/mm/ioremap.c
index 5fa1ab41d152..3f4d36f9745a 100644
--- a/mm/ioremap.c
+++ b/mm/ioremap.c
@@ -61,9 +61,9 @@ static inline int ioremap_pud_enabled(void) { return 0; }
 static inline int ioremap_pmd_enabled(void) { return 0; }
 #endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 
-static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pte_t *pte;
u64 pfn;
@@ -81,9 +81,8 @@ static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
return 0;
 }
 
-static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_pmd(pmd_t *pmd, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_pmd_enabled())
return 0;
@@ -103,9 +102,9 @@ static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long 
addr,
return pmd_set_huge(pmd, phys_addr, prot);
 }
 
-static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pmd_t *pmd;
unsigned long next;
@@ -116,20 +115,19 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned 
long addr,
do {
next = pmd_addr_end(addr, end);
 
-   if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) {
+   if (vmap_try_huge_pmd(pmd, addr, next, phys_addr, prot)) {
*mask |= PGTBL_PMD_MODIFIED;
continue;
}
 
-   if (ioremap_pte_range(pmd, addr, next, phys_addr, prot, mask))
+   if (vmap_pte_range(pmd, addr, next, phys_addr, prot, mask))
return -ENOMEM;
} while (pmd++, phys_addr += (next - addr), addr = next, addr != end);
return 0;
 }
 
-static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_pud(pud_t *pud, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_pud_enabled())
return 0;
@@ -149,9 +147,9 @@ static int ioremap_try_huge_pud(pud_t *pud, unsigned long 
addr,
return pud_set_huge(pud, phys_addr, prot);
 }
 
-static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr, pgprot_t prot,
-   pgtbl_mod_mask *mask)
+static int vmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot,
+   pgtbl_mod_mask *mask)
 {
pud_t *pud;
unsigned long next;
@@ -162,20 +160,19 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned 
long addr,
do {
next = pud_addr_end(addr, end);
 
-   if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot)) {
+   if (vmap_try_huge_pud(pud, addr, next, phys_addr, prot)) {
*mask |= PGTBL_PUD_MODIFIED;
continue;
}
 
-   if (ioremap_pmd_range(pud, addr, next, phys_addr, prot, mask))
+   if (vmap_pmd_range(pud, addr, next, phys_addr, prot, mask))
return -ENOMEM;
} while (pud++, phys_addr += (next - addr), addr = next, addr != end);
return 0;
 }
 
-static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
-   unsigned long end, phys_addr_t phys_addr,
-   pgprot_t prot)
+static int vmap_try_huge_p4d(p4d_t *p4d, unsigned long addr, unsigned long end,
+   phys_addr_t phys_addr, pgprot_t prot)
 {
if (!ioremap_p4d_enabled())
return 0;
@@ -195,9 +192,9 @@ static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long 
addr,
return p4d_set_huge(p4d, phys_addr, prot);
 }
 
-static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
-   

[PATCH v6 03/12] mm/vmalloc: rename vmap_*_range vmap_pages_*_range

2020-08-21 Thread Nicholas Piggin
The vmalloc mapper operates on a struct page * array rather than a
linear physical address, re-name it to make this distinction clear.

Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4e9b21adc73d..45cd80ec7eeb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -189,7 +189,7 @@ void unmap_kernel_range_noflush(unsigned long start, 
unsigned long size)
arch_sync_kernel_mappings(start, end);
 }
 
-static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
+static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -217,7 +217,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr,
return 0;
 }
 
-static int vmap_pmd_range(pud_t *pud, unsigned long addr,
+static int vmap_pages_pmd_range(pud_t *pud, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -229,13 +229,13 @@ static int vmap_pmd_range(pud_t *pud, unsigned long addr,
return -ENOMEM;
do {
next = pmd_addr_end(addr, end);
-   if (vmap_pte_range(pmd, addr, next, prot, pages, nr, mask))
+   if (vmap_pages_pte_range(pmd, addr, next, prot, pages, nr, 
mask))
return -ENOMEM;
} while (pmd++, addr = next, addr != end);
return 0;
 }
 
-static int vmap_pud_range(p4d_t *p4d, unsigned long addr,
+static int vmap_pages_pud_range(p4d_t *p4d, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -247,13 +247,13 @@ static int vmap_pud_range(p4d_t *p4d, unsigned long addr,
return -ENOMEM;
do {
next = pud_addr_end(addr, end);
-   if (vmap_pmd_range(pud, addr, next, prot, pages, nr, mask))
+   if (vmap_pages_pmd_range(pud, addr, next, prot, pages, nr, 
mask))
return -ENOMEM;
} while (pud++, addr = next, addr != end);
return 0;
 }
 
-static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
+static int vmap_pages_p4d_range(pgd_t *pgd, unsigned long addr,
unsigned long end, pgprot_t prot, struct page **pages, int *nr,
pgtbl_mod_mask *mask)
 {
@@ -265,7 +265,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
return -ENOMEM;
do {
next = p4d_addr_end(addr, end);
-   if (vmap_pud_range(p4d, addr, next, prot, pages, nr, mask))
+   if (vmap_pages_pud_range(p4d, addr, next, prot, pages, nr, 
mask))
return -ENOMEM;
} while (p4d++, addr = next, addr != end);
return 0;
@@ -306,7 +306,7 @@ int map_kernel_range_noflush(unsigned long addr, unsigned 
long size,
next = pgd_addr_end(addr, end);
if (pgd_bad(*pgd))
mask |= PGTBL_PGD_MODIFIED;
-   err = vmap_p4d_range(pgd, addr, next, prot, pages, &nr, &mask);
+   err = vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, 
&mask);
if (err)
return err;
} while (pgd++, addr = next, addr != end);
-- 
2.23.0



[PATCH v6 02/12] mm: apply_to_pte_range warn and fail if a large pte is encountered

2020-08-21 Thread Nicholas Piggin
Currently this might mistake a large pte for bad, or treat it as a
page table, resulting in a crash or corruption.

Signed-off-by: Nicholas Piggin 
---
 mm/memory.c | 60 +++--
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 602f4283122f..3a39a47920e2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2262,13 +2262,20 @@ static int apply_to_pmd_range(struct mm_struct *mm, 
pud_t *pud,
}
do {
next = pmd_addr_end(addr, end);
-   if (create || !pmd_none_or_clear_bad(pmd)) {
-   err = apply_to_pte_range(mm, pmd, addr, next, fn, data,
-create);
-   if (err)
-   break;
+   if (pmd_none(*pmd) && !create)
+   continue;
+   if (WARN_ON_ONCE(pmd_leaf(*pmd)))
+   return -EINVAL;
+   if (WARN_ON_ONCE(pmd_bad(*pmd))) {
+   if (!create)
+   continue;
+   pmd_clear_bad(pmd);
}
+   err = apply_to_pte_range(mm, pmd, addr, next, fn, data, create);
+   if (err)
+   break;
} while (pmd++, addr = next, addr != end);
+
return err;
 }
 
@@ -2289,13 +2296,20 @@ static int apply_to_pud_range(struct mm_struct *mm, 
p4d_t *p4d,
}
do {
next = pud_addr_end(addr, end);
-   if (create || !pud_none_or_clear_bad(pud)) {
-   err = apply_to_pmd_range(mm, pud, addr, next, fn, data,
-create);
-   if (err)
-   break;
+   if (pud_none(*pud) && !create)
+   continue;
+   if (WARN_ON_ONCE(pud_leaf(*pud)))
+   return -EINVAL;
+   if (WARN_ON_ONCE(pud_bad(*pud))) {
+   if (!create)
+   continue;
+   pud_clear_bad(pud);
}
+   err = apply_to_pmd_range(mm, pud, addr, next, fn, data, create);
+   if (err)
+   break;
} while (pud++, addr = next, addr != end);
+
return err;
 }
 
@@ -2316,13 +2330,20 @@ static int apply_to_p4d_range(struct mm_struct *mm, 
pgd_t *pgd,
}
do {
next = p4d_addr_end(addr, end);
-   if (create || !p4d_none_or_clear_bad(p4d)) {
-   err = apply_to_pud_range(mm, p4d, addr, next, fn, data,
-create);
-   if (err)
-   break;
+   if (p4d_none(*p4d) && !create)
+   continue;
+   if (WARN_ON_ONCE(p4d_leaf(*p4d)))
+   return -EINVAL;
+   if (WARN_ON_ONCE(p4d_bad(*p4d))) {
+   if (!create)
+   continue;
+   p4d_clear_bad(p4d);
}
+   err = apply_to_pud_range(mm, p4d, addr, next, fn, data, create);
+   if (err)
+   break;
} while (p4d++, addr = next, addr != end);
+
return err;
 }
 
@@ -2341,8 +2362,15 @@ static int __apply_to_page_range(struct mm_struct *mm, 
unsigned long addr,
pgd = pgd_offset(mm, addr);
do {
next = pgd_addr_end(addr, end);
-   if (!create && pgd_none_or_clear_bad(pgd))
+   if (pgd_none(*pgd) && !create)
continue;
+   if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+   return -EINVAL;
+   if (WARN_ON_ONCE(pgd_bad(*pgd))) {
+   if (!create)
+   continue;
+   pgd_clear_bad(pgd);
+   }
err = apply_to_p4d_range(mm, pgd, addr, next, fn, data, create);
if (err)
break;
-- 
2.23.0
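
The guard sequence the patch repeats at each page-table level can be
modelled with a standalone toy (userspace C; the entry encoding is
made up and -22 stands in for -EINVAL):

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the per-level checks: skip empty entries unless
 * creating, refuse leaf (huge) entries outright, and clear corrupt
 * entries only when the walker is allowed to repopulate them.
 */
enum entry { NONE, TABLE, LEAF, BAD };

static int level_guard(enum entry *e, bool create)
{
	if (*e == NONE && !create)
		return 0;	/* nothing mapped and nothing to create */
	if (*e == LEAF)
		return -22;	/* only page tables are handled */
	if (*e == BAD) {
		if (!create)
			return 0;
		*e = NONE;	/* the pXd_clear_bad() equivalent */
	}
	return 1;		/* descend to the next level */
}

int main(void)
{
	enum entry huge = LEAF;

	printf("leaf entry -> %d\n", level_guard(&huge, true));
	return 0;
}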



[PATCH v6 01/12] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings

2020-08-21 Thread Nicholas Piggin
vmalloc_to_page returns NULL for addresses mapped by larger pages[*].
Whether or not a vmap is huge depends on the architecture details,
alignments, boot options, etc., which the caller cannot be expected
to know. Therefore HUGE_VMAP is a regression for vmalloc_to_page.

This change teaches vmalloc_to_page about larger pages, and returns
the struct page that corresponds to the offset within the large page.
This makes the API agnostic to mapping implementation details.

[*] As explained by commit 029c54b095995 ("mm/vmalloc.c: huge-vmap:
fail gracefully on unexpected huge vmap mappings")

Signed-off-by: Nicholas Piggin 
---
 mm/vmalloc.c | 41 ++---
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b482d240f9a2..4e9b21adc73d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -36,7 +36,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 #include 
 #include 
@@ -343,7 +343,9 @@ int is_vmalloc_or_module_addr(const void *x)
 }
 
 /*
- * Walk a vmap address to the struct page it maps.
+ * Walk a vmap address to the struct page it maps. Huge vmap mappings will
+ * return the tail page that corresponds to the base page address, which
+ * matches small vmap mappings.
  */
 struct page *vmalloc_to_page(const void *vmalloc_addr)
 {
@@ -363,25 +365,33 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
 
if (pgd_none(*pgd))
return NULL;
+   if (WARN_ON_ONCE(pgd_leaf(*pgd)))
+   return NULL; /* XXX: no allowance for huge pgd */
+   if (WARN_ON_ONCE(pgd_bad(*pgd)))
+   return NULL;
+
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d))
return NULL;
-   pud = pud_offset(p4d, addr);
+   if (p4d_leaf(*p4d))
+   return p4d_page(*p4d) + ((addr & ~P4D_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(p4d_bad(*p4d)))
+   return NULL;
 
-   /*
-* Don't dereference bad PUD or PMD (below) entries. This will also
-* identify huge mappings, which we may encounter on architectures
-* that define CONFIG_HAVE_ARCH_HUGE_VMAP=y. Such regions will be
-* identified as vmalloc addresses by is_vmalloc_addr(), but are
-* not [unambiguously] associated with a struct page, so there is
-* no correct value to return for them.
-*/
-   WARN_ON_ONCE(pud_bad(*pud));
-   if (pud_none(*pud) || pud_bad(*pud))
+   pud = pud_offset(p4d, addr);
+   if (pud_none(*pud))
+   return NULL;
+   if (pud_leaf(*pud))
+   return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(pud_bad(*pud)))
return NULL;
+
pmd = pmd_offset(pud, addr);
-   WARN_ON_ONCE(pmd_bad(*pmd));
-   if (pmd_none(*pmd) || pmd_bad(*pmd))
+   if (pmd_none(*pmd))
+   return NULL;
+   if (pmd_leaf(*pmd))
+   return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+   if (WARN_ON_ONCE(pmd_bad(*pmd)))
return NULL;
 
ptep = pte_offset_map(pmd, addr);
@@ -389,6 +399,7 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
if (pte_present(pte))
page = pte_page(pte);
pte_unmap(ptep);
+
return page;
 }
 EXPORT_SYMBOL(vmalloc_to_page);
-- 
2.23.0
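
The tail-page arithmetic added above can be sanity-checked with a
standalone toy (userspace C; 4KB base pages and 2MB PMDs are assumed,
and the address is made up):

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT	12			/* 4KB base pages */
#define PMD_SHIFT	21			/* 2MB PMD mappings */
#define PMD_MASK	(~((1UL << PMD_SHIFT) - 1))

int main(void)
{
	/* an address 0x3000 into a PMD-aligned huge mapping */
	unsigned long addr = 0xc0000000UL + 0x3000;
	unsigned long idx = (addr & ~PMD_MASK) >> PAGE_SHIFT;

	/* pmd_page(*pmd) + idx selects the 4th small page of the mapping */
	assert(idx == 3);
	printf("tail page index: %lu\n", idx);
	return 0;
}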



[PATCH v6 00/12] huge vmalloc mappings

2020-08-21 Thread Nicholas Piggin
Thanks Christophe and Christoph for reviews.

Thanks,
Nick

Since v5:
- Split arch changes out better and make the constant folding work
- Avoid most of the 80 column wrap, fix a reference to lib/ioremap.c
- Fix compile error on some archs

Since v4:
- Fixed an off-by-page-order bug in v4
- Several minor cleanups.
- Added page order to /proc/vmallocinfo
- Added hugepage to alloc_large_system_hash output.
- Made an architecture config option, powerpc only for now.

Since v3:
- Fixed an off-by-one bug in a loop
- Fix !CONFIG_HAVE_ARCH_HUGE_VMAP build fail
- Hopefully this time fix the arm64 vmap stack bug, thanks Jonathan
  Cameron for debugging the cause of this (hopefully).

Since v2:
- Rebased on vmalloc cleanups, split series into simpler pieces.
- Fixed several compile errors and warnings
- Keep the page array and accounting in small page units because
  struct vm_struct is an interface (this should fix x86 vmap stack debug
  assert). [Thanks Zefan]

Nicholas Piggin (12):
  mm/vmalloc: fix vmalloc_to_page for huge vmap mappings
  mm: apply_to_pte_range warn and fail if a large pte is encountered
  mm/vmalloc: rename vmap_*_range vmap_pages_*_range
  lib/ioremap: rename ioremap_*_range to vmap_*_range
  mm: HUGE_VMAP arch support cleanup
  powerpc: inline huge vmap supported functions
  arm64: inline huge vmap supported functions
  x86: inline huge vmap supported functions
  mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c
  mm/vmalloc: add vmap_range_noflush variant
  mm/vmalloc: Hugepage vmalloc mappings
  powerpc/64s/radix: Enable huge vmalloc mappings

 .../admin-guide/kernel-parameters.txt |   2 +
 arch/Kconfig  |   4 +
 arch/arm64/include/asm/vmalloc.h  |  25 +
 arch/arm64/mm/mmu.c   |  26 -
 arch/powerpc/Kconfig  |   1 +
 arch/powerpc/include/asm/vmalloc.h|  21 +
 arch/powerpc/mm/book3s64/radix_pgtable.c  |  21 -
 arch/x86/include/asm/vmalloc.h|  23 +
 arch/x86/mm/ioremap.c |  19 -
 arch/x86/mm/pgtable.c |  13 -
 include/linux/io.h|   9 -
 include/linux/vmalloc.h   |  10 +
 init/main.c   |   1 -
 mm/ioremap.c  | 225 +
 mm/memory.c   |  60 ++-
 mm/page_alloc.c   |   5 +-
 mm/vmalloc.c  | 443 +++---
 17 files changed, 515 insertions(+), 393 deletions(-)

-- 
2.23.0



Re: kernel since 5.6 do not boot anymore on Apple PowerBook

2020-08-21 Thread Christophe Leroy




On 21/08/2020 at 15:50, Giuseppe Sacco wrote:

On Fri, 21/08/2020 at 15.29 +0200, Christophe Leroy wrote:
[...]

Maybe the easiest would be to locate this issue first. Can you send me
the vmlinux and the .config matching the Oops in the video?

And also the output of git bisect log?


Here it is. Not exactly the one in the video, but the last "skipped"
one for the same problem.
https://eppesuigoccas.homedns.org/~giuseppe/christophe.tar.bz2



Thanks.

The Oops in the video shows that the issue is at 0x1bcac, and the msr
value shows that the Instruction MMU is disabled. So this corresponds
to address 0xc001bcac. In the vmlinux you sent me, this address is in
power_save_ppc32_restore().


This issue is fixed by 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7bce32ccbab3ba3e3e0f27da6961bf6313df97ed.1581663140.git.christophe.le...@c-s.fr/


You also said in a previous mail that your original issue also happens 
when CONFIG_VMAP_STACK is not selected. The above bug being linked to 
CONFIG_VMAP_STACK, maybe it would be easier to bisect with 
CONFIG_VMAP_STACK unselected.


Christophe


Re: [PATCH] video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n

2020-08-21 Thread Bartlomiej Zolnierkiewicz


On 8/21/20 12:49 PM, Michael Ellerman wrote:
> The build is currently broken, if COMPILE_TEST=y and PPC_PMAC=n:
> 
>   linux/drivers/video/fbdev/controlfb.c: In function ‘control_set_hardware’:
>   linux/drivers/video/fbdev/controlfb.c:276:2: error: implicit declaration of 
> function ‘btext_update_display’
> 276 |  btext_update_display(p->frame_buffer_phys + CTRLFB_OFF,
> |  ^~~~
> 
> Fix it by including btext.h whenever CONFIG_BOOTX_TEXT is enabled.
> 
> Fixes: a07a63b0e24d ("video: fbdev: controlfb: add COMPILE_TEST support")
> Signed-off-by: Michael Ellerman 

Acked-by: Bartlomiej Zolnierkiewicz 

Thanks for fixing this.

> ---
>  drivers/video/fbdev/controlfb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> Does anyone mind if I apply this via the powerpc tree for v5.9?
> 
> It would be nice to get the build clean.

No objections from my side.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

> cheers
> 
> diff --git a/drivers/video/fbdev/controlfb.c b/drivers/video/fbdev/controlfb.c
> index 9c4f1be856ec..547abeb39f87 100644
> --- a/drivers/video/fbdev/controlfb.c
> +++ b/drivers/video/fbdev/controlfb.c
> @@ -49,6 +49,8 @@
>  #include 
>  #ifdef CONFIG_PPC_PMAC
>  #include 
> +#endif
> +#ifdef CONFIG_BOOTX_TEXT
>  #include 
>  #endif
>  


Re: kernel since 5.6 do not boot anymore on Apple PowerBook

2020-08-21 Thread Christophe Leroy

Hi again,

On 21/08/2020 at 10:22, Giuseppe Sacco wrote:

Hello Christophe,

On Fri, 21/08/2020 at 08.55 +0200, Christophe Leroy wrote:

Hi Giuseppe,

On 08/07/2020 at 20:44, Christophe Leroy wrote:


On 08/07/2020 at 19:36, Giuseppe Sacco wrote:

Hi Christophe,

On Wed, 08/07/2020 at 19.09 +0200, Christophe Leroy wrote:

[...]

What's the result with:

LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux


$ LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux
  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  CHK     include/generated/compile.h
  CC      kernel/module.o
kernel/module.c: In function 'do_init_module':
kernel/module.c:3593:2: error: implicit declaration of function
'module_enable_ro'; did you mean 'module_enable_x'? [-Werror=implicit-
function-declaration]
   3593 |  module_enable_ro(mod, true);
|  ^~~~
|  module_enable_x
cc1: some warnings being treated as errors
make[1]: *** [scripts/Makefile.build:267: kernel/module.o] Error 1
make: *** [Makefile:1735: kernel] Error 2

So, should I 'git bisect skip'?


Ah yes, I had the exact same problem last time I bisected.

So yes, do 'git bisect skip'. You'll probably hit this problem half a
dozen times, but in the end you should get a useful bisect anyway.



Were you able to make progress?


Very slowly. I am still working on it, currently at recompile #276.
git-bisect states that I still have about 700 commits to check, but the
real problem is that more than 60% of the built kernels crash even
before displaying the cpu_freq message (probably another long-standing
bug is hiding the one I am looking for). All these skipped kernels make
bisecting very, very slow.

A short video showing the problem that makes me skip a build is here:
https://eppesuigoccas.homedns.org/~giuseppe/bug%20avvio%20powerbook%20g4.mp4



Maybe the easiest would be to locate this issue first. Can you send me
the vmlinux and the .config matching the Oops in the video?


And also the output of git bisect log?

Thanks
Christophe


Re: [PATCH v5 5/8] mm: HUGE_VMAP arch support cleanup

2020-08-21 Thread Christophe Leroy




On 21/08/2020 at 12:39, Nicholas Piggin wrote:

Excerpts from Christophe Leroy's message of August 21, 2020 3:40 pm:



On 21/08/2020 at 06:44, Nicholas Piggin wrote:

This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.

This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.


I think that in order to allow constant-folding of dead code for
unsupported levels, you must define arch_vmap_xxx_supported() as static
inline in a .h

If you have them in .c files, you'll get calls to tiny functions that
will always return false, but will still be called and dead code won't
be eliminated. And performance-wise, that's probably not optimal either.


Yeah that's true actually, I think I didn't find a good place to add
the prototypes in the arch code but I'll have another look and either
rewrite the changelog or remove it. Although this does get a step closer
at least.



linux/vmalloc.h includes asm/vmalloc.h
Should it go there ?
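Something like this, for instance (just a sketch, assuming the
arch_vmap_*_supported() naming from this series, with radix_enabled()
as the example ppc64 condition):

/* arch/powerpc/include/asm/vmalloc.h */
static inline bool arch_vmap_pud_supported(pgprot_t prot)
{
	/* HPT does not cope with large pages in the vmalloc area */
	return radix_enabled();
}

static inline bool arch_vmap_pmd_supported(pgprot_t prot)
{
	return radix_enabled();
}

With the body visible at every call site, an arch that returns a
compile-time false gets the whole unsupported branch folded away.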

Christophe


Re: [PATCH v5 8/8] mm/vmalloc: Hugepage vmalloc mappings

2020-08-21 Thread kernel test robot
Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on linus/master v5.9-rc1 next-20200821]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/huge-vmalloc-mappings/20200821-124543
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   sh4-linux-ld: mm/page_alloc.o: in function `alloc_large_system_hash':
>> (.init.text+0x1818): undefined reference to `find_vm_area'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH v5 0/8] huge vmalloc mappings

2020-08-21 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of August 21, 2020 3:47 pm:
> 
> 
> On 21/08/2020 at 06:44, Nicholas Piggin wrote:
>> I made this powerpc-only for the time being. It shouldn't be too hard to
>> add support for other archs that define HUGE_VMAP. I have booted x86
>> with it enabled, just may not have audited everything.
> 
> I like this series, but if I understand correctly it enables huge 
> vmalloc mappings only for hugepage sizes matching page directory 
> levels, i.e. on PPC32 it would work only for 4M hugepages.

Yeah it really just uses the HUGE_VMAP mapping which is already there.

> On the 8xx, we only have 8M and 512k hugepages. Any chance that it can 
> support these as well one day ?

The vmap_range interface can now handle that, then adding support is the
main work. Either make it weak and arch can override it, or add some
arch helpers to make the generic version create the huge pages if it's
not too ugly. Then you get those large pages for ioremap for free.

The vmalloc part, which allocates and tries to map a bigger page size 
and use it, is quite trivial to change from PMD to an arch-specific size.
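E.g. something like this (a hypothetical, untested sketch; the helper
name and where it hooks into vmap are made up for illustration):

/* 8xx: pick the largest hugepage size that fits the range */
static inline unsigned int arch_vmap_size_shift(unsigned long addr,
						unsigned long end)
{
	if (!(addr & (SZ_8M - 1)) && end - addr >= SZ_8M)
		return 23;	/* 8M */
	if (!(addr & (SZ_512K - 1)) && end - addr >= SZ_512K)
		return 19;	/* 512k */
	return PAGE_SHIFT;
}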

Thanks,
Nick



[PATCH] video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n

2020-08-21 Thread Michael Ellerman
The build is currently broken, if COMPILE_TEST=y and PPC_PMAC=n:

  linux/drivers/video/fbdev/controlfb.c: In function ‘control_set_hardware’:
  linux/drivers/video/fbdev/controlfb.c:276:2: error: implicit declaration of 
function ‘btext_update_display’
276 |  btext_update_display(p->frame_buffer_phys + CTRLFB_OFF,
|  ^~~~

Fix it by including btext.h whenever CONFIG_BOOTX_TEXT is enabled.

Fixes: a07a63b0e24d ("video: fbdev: controlfb: add COMPILE_TEST support")
Signed-off-by: Michael Ellerman 
---
 drivers/video/fbdev/controlfb.c | 2 ++
 1 file changed, 2 insertions(+)

Does anyone mind if I apply this via the powerpc tree for v5.9?

It would be nice to get the build clean.

cheers

diff --git a/drivers/video/fbdev/controlfb.c b/drivers/video/fbdev/controlfb.c
index 9c4f1be856ec..547abeb39f87 100644
--- a/drivers/video/fbdev/controlfb.c
+++ b/drivers/video/fbdev/controlfb.c
@@ -49,6 +49,8 @@
 #include <linux/nvram.h>
 #ifdef CONFIG_PPC_PMAC
 #include <asm/prom.h>
+#endif
+#ifdef CONFIG_BOOTX_TEXT
 #include <asm/btext.h>
 #endif
 
-- 
2.25.1



Re: [PATCH v5 5/8] mm: HUGE_VMAP arch support cleanup

2020-08-21 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of August 21, 2020 3:40 pm:
> 
> 
> On 21/08/2020 at 06:44, Nicholas Piggin wrote:
>> This changes the awkward approach where architectures provide init
>> functions to determine which levels they can provide large mappings for,
>> to one where the arch is queried for each call.
>> 
>> This removes code and indirection, and allows constant-folding of dead
>> code for unsupported levels.
> 
> I think that in order to allow constant-folding of dead code for 
> unsupported levels, you must define arch_vmap_xxx_supported() as static 
> inline in a .h
> 
> If you have them in .c files, you'll get calls to tiny functions that 
> will always return false, but will still be called and dead code won't 
> be eliminated. And performance-wise, that's probably not optimal either.

Yeah that's true actually, I think I didn't find a good place to add
the prototypes in the arch code but I'll have another look and either
rewrite the changelog or remove it. Although this does get a step closer
at least.

Thanks,
Nick


Re: [PATCH 2/2] powerpc/64s: Disallow PROT_SAO in LPARs by default

2020-08-21 Thread Nicholas Piggin
Excerpts from Shawn Anastasio's message of August 21, 2020 11:08 am:
> Since migration of guests using SAO to ISA 3.1 hosts may cause issues,
> disable PROT_SAO in LPARs by default and introduce a new Kconfig option
> PPC_PROT_SAO_LPAR to allow users to enable it if desired.

I think this should be okay. Could you also update the selftest to skip
if we have PPC_FEATURE2_ARCH_3_1 set?
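Something along these lines, perhaps (a sketch, assuming the usual
powerpc selftest helpers, have_hwcap2() and SKIP_IF() from
tools/testing/selftests/powerpc):

	#include <asm/cputable.h>	/* PPC_FEATURE2_ARCH_3_1 */
	#include "utils.h"

	/* ISA 3.1 removed SAO, so skip there rather than fail */
	SKIP_IF(have_hwcap2(PPC_FEATURE2_ARCH_3_1));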

Thanks,
Nick

Acked-by: Nicholas Piggin 

> 
> Signed-off-by: Shawn Anastasio 
> ---
>  arch/powerpc/Kconfig| 12 
>  arch/powerpc/include/asm/mman.h |  9 +++--
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 1f48bbfb3ce9..65bed1fdeaad 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -860,6 +860,18 @@ config PPC_SUBPAGE_PROT
>  
> If unsure, say N here.
>  
> +config PPC_PROT_SAO_LPAR
> + bool "Support PROT_SAO mappings in LPARs"
> + depends on PPC_BOOK3S_64
> + help
> +   This option adds support for PROT_SAO mappings from userspace
> +   inside LPARs on supported CPUs.
> +
> +   This may cause issues when performing guest migration from
> +   a CPU that supports SAO to one that does not.
> +
> +   If unsure, say N here.
> +
>  config PPC_COPRO_BASE
>   bool
>  
> diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
> index 4ba303ea27f5..7cb6d18f5cd6 100644
> --- a/arch/powerpc/include/asm/mman.h
> +++ b/arch/powerpc/include/asm/mman.h
> @@ -40,8 +40,13 @@ static inline bool arch_validate_prot(unsigned long prot, 
> unsigned long addr)
>  {
>   if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
>   return false;
> - if ((prot & PROT_SAO) && !cpu_has_feature(CPU_FTR_SAO))
> - return false;
> + if (prot & PROT_SAO) {
> + if (!cpu_has_feature(CPU_FTR_SAO))
> + return false;
> + if (firmware_has_feature(FW_FEATURE_LPAR) &&
> + !IS_ENABLED(CONFIG_PPC_PROT_SAO_LPAR))
> + return false;
> + }
>   return true;
>  }
>  #define arch_validate_prot arch_validate_prot
> -- 
> 2.28.0
> 
> 


[PATCH] powerpc/prom_init: Check display props exist before enabling btext

2020-08-21 Thread Michael Ellerman
It's possible to enable CONFIG_PPC_EARLY_DEBUG_BOOTX for a pseries
kernel (maybe it shouldn't be), which is then booted with qemu/slof.

But if you do that the kernel crashes in draw_byte(), with a DAR
pointing somewhere near INT_MAX.

Adding some debug to prom_init we see that we're not able to read the
"address" property from OF, so we're just using whatever junk value
was on the stack.

So check that the properties can be read properly from OF; if not, we bail
out before initialising btext, which avoids the crash.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/prom_init.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index ae7ec9903191..5090a5ab54e5 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2422,10 +2422,19 @@ static void __init prom_check_displays(void)
u32 width, height, pitch, addr;
 
prom_printf("Setting btext !\n");
-   prom_getprop(node, "width", &width, 4);
-   prom_getprop(node, "height", &height, 4);
-   prom_getprop(node, "linebytes", &pitch, 4);
-   prom_getprop(node, "address", &addr, 4);
+
+   if (prom_getprop(node, "width", &width, 4) == 
PROM_ERROR)
+   return;
+
+   if (prom_getprop(node, "height", &height, 4) == 
PROM_ERROR)
+   return;
+
+   if (prom_getprop(node, "linebytes", &pitch, 4) == 
PROM_ERROR)
+   return;
+
+   if (prom_getprop(node, "address", &addr, 4) == 
PROM_ERROR)
+   return;
+
prom_printf("W=%d H=%d LB=%d addr=0x%x\n",
width, height, pitch, addr);
btext_setup_display(width, height, 8, pitch, addr);
-- 
2.25.1



Re: kernel since 5.6 do not boot anymore on Apple PowerBook

2020-08-21 Thread Giuseppe Sacco
Hello Christophe,

On Friday 21/08/2020 at 08.55 +0200, Christophe Leroy
wrote:
> Hi Giuseppe,
> 
> > On 08/07/2020 at 20:44, Christophe Leroy wrote:
> > 
> > > On 08/07/2020 at 19:36, Giuseppe Sacco wrote:
> > > Hi Christophe,
> > > 
> > > On Wednesday 08/07/2020 at 19.09 +0200, Christophe Leroy
> > > wrote:
[...]
> > > > What's the result with:
> > > > 
> > > > LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux
> > > 
> > > $ LANG=C make ARCH=powerpc CROSS_COMPILE=powerpc-linux- vmlinux
> > >    CALL    scripts/checksyscalls.sh
> > >    CALL    scripts/atomic/check-atomics.sh
> > >    CHK     include/generated/compile.h
> > >    CC      kernel/module.o
> > > kernel/module.c: In function 'do_init_module':
> > > kernel/module.c:3593:2: error: implicit declaration of function
> > > 'module_enable_ro'; did you mean 'module_enable_x'? [-Werror=implicit-
> > > function-declaration]
> > >   3593 |  module_enable_ro(mod, true);
> > >|  ^~~~
> > >|  module_enable_x
> > > cc1: some warnings being treated as errors
> > > make[1]: *** [scripts/Makefile.build:267: kernel/module.o] Error 1
> > > make: *** [Makefile:1735: kernel] Error 2
> > > 
> > > So, should I 'git bisect skip'?
> > 
> > Ah yes, I had the exact same problem last time I bisected.
> > 
> > So yes do 'git bisect skip'. You'll probably hit this problem half a 
> > dozen times, but at the end you should get a useful bisect anyway.
> > 
> 
> Were you able to progress ?

Very slowly. I am still working on it, currently at recompile #276.
git-bisect states that I still have about 700 commits to check, but the
real problem is that more than 60% of the built kernels crash even before
displaying the cpu_freq message (probably another long-standing bug
hides the one I am looking for). All these skipped kernels make
bisecting very, very slow.

A short video about the problem I face when I skip the build is here:
https://eppesuigoccas.homedns.org/~giuseppe/bug%20avvio%20powerbook%20g4.mp4

Bye,
Giuseppe



Re: [PATCH] powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000

2020-08-21 Thread Andreas Schwab
On Aug 21 2020, Christophe Leroy wrote:

> In is_module_segment(), when VMALLOC_END is over 0xf0000000,
> ALIGN(VMALLOC_END, SZ_256M) has value 0.
>
> In that case, addr >= ALIGN(VMALLOC_END, SZ_256M) is always
> true, so is_module_segment() always returns false.
>
> Use (ALIGN(VMALLOC_END, SZ_256M) - 1), which will have
> value 0xffffffff and will be suitable for the comparison.
>
> Reported-by: Andreas Schwab 
> Signed-off-by: Christophe Leroy 
> Fixes: c49643319715 ("powerpc/32s: Only leave NX unset on segments used for 
> modules")

Thanks, that fixes the crash.

Tested-by: Andreas Schwab 

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH v2 00/13] mm/debug_vm_pgtable fixes

2020-08-21 Thread Anshuman Khandual



On 08/21/2020 02:20 PM, Aneesh Kumar K.V wrote:
> 
>> Sure. I am hoping the kernel test robot will pick this up. I did an x86 and 
>> about 19 different ppc config builds with the series. The git tree above was 
>> pushed with that. Considering you authored the change, I am wondering if you 
>> could help with checking other architectures (maybe at least an arm variant).
>>
> 
> I updated the tree after a defconfig build on arch/arm64/s390/x86.  I will 
> not be able to boot test them.
> 
> Can you help with boot testing on arm?

Did not see any obvious problem.


Re: [PATCH v2 00/13] mm/debug_vm_pgtable fixes

2020-08-21 Thread Anshuman Khandual


On 08/19/2020 06:30 PM, Aneesh Kumar K.V wrote:
> This patch series includes fixes for debug_vm_pgtable test code so that
> they follow page table updates rules correctly. The first two patches 
> introduce
> changes w.r.t ppc64. The patches are included in this series for 
> completeness. We can
> merge them via ppc64 tree if required.
> 
> Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
> page table update rules.
> 
> Changes from V1:
> * Address review feedback
> * drop test specific pfn_pte and pfn_pmd.
> * Update ppc64 page table helper to add _PAGE_PTE 
> 
> Aneesh Kumar K.V (13):
>   powerpc/mm: Add DEBUG_VM WARN for pmd_clear
>   powerpc/mm: Move setting pte specific flags to pfn_pte
>   mm/debug_vm_pgtable/ppc64: Avoid setting top bits in radom value
>   mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge
> vmap support.
>   mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with
> CONFIG_NUMA_BALANCING
>   mm/debug_vm_pgtable/THP: Mark the pte entry huge before using
> set_pmd/pud_at
>   mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an
> existing pte entry
>   mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP
>   mm/debug_vm_pgtable/locks: Move non page table modifying test together
>   mm/debug_vm_pgtable/locks: Take correct page table lock
>   mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
>   mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
>   mm/debug_vm_pgtable: populate a pte entry before fetching it
> 
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  29 +++-
>  arch/powerpc/include/asm/nohash/pgtable.h|   5 -
>  arch/powerpc/mm/book3s64/pgtable.c   |   2 +-
>  arch/powerpc/mm/pgtable.c|   5 -
>  include/linux/io.h   |  12 ++
>  mm/debug_vm_pgtable.c| 151 +++
>  6 files changed, 127 insertions(+), 77 deletions(-)
> 

Changes proposed here will impact other enabled platforms as well.
Adding the following folks and mailing lists, and hoping to get a
broader review and test coverage. Please do include them in the
next iteration as well.

+ linux-arm-ker...@lists.infradead.org
+ linux-s...@vger.kernel.org
+ linux-snps-...@lists.infradead.org
+ x...@kernel.org
+ linux-a...@vger.kernel.org

+ Gerald Schaefer 
+ Christophe Leroy 
+ Christophe Leroy 
+ Vineet Gupta 
+ Mike Rapoport 
+ Qian Cai 


Re: [PATCH v2 00/13] mm/debug_vm_pgtable fixes

2020-08-21 Thread Aneesh Kumar K.V



Sure. I am hoping the kernel test robot will pick this up. I did an x86 and 
about 19 different ppc config builds with the series. The git tree above 
was pushed with that. Considering you authored the change, I am wondering 
if you could help with checking other architectures (maybe at least an arm 
variant).




I updated the tree after a defconfig build on arch/arm64/s390/x86.  I 
will not be able to boot test them.


Can you help with boot testing on arm?

-aneesh


Re: [PATCH v2 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.

2020-08-21 Thread Anshuman Khandual



On 08/19/2020 06:30 PM, Aneesh Kumar K.V wrote:
> ppc64 supports huge vmap only with radix translation. Hence use arch helper
> to determine the huge vmap support.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  include/linux/io.h| 12 
>  mm/debug_vm_pgtable.c |  4 ++--
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/io.h b/include/linux/io.h
> index 8394c56babc2..0b1ecda0cc86 100644
> --- a/include/linux/io.h
> +++ b/include/linux/io.h
> @@ -38,6 +38,18 @@ int arch_ioremap_pud_supported(void);
>  int arch_ioremap_pmd_supported(void);
>  #else
>  static inline void ioremap_huge_init(void) { }
> +static inline int arch_ioremap_p4d_supported(void)
> +{
> + return false;
> +}
> +static inline int arch_ioremap_pud_supported(void)
> +{
> + return false;
> +}
> +static inline int arch_ioremap_pmd_supported(void)
> +{
> + return false;
> +}
>  #endif
>  
>  /*
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 57259e2dbd17..cf3c4792b4a2 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c

This would need an explicit inclusion of <linux/io.h> in order
to prevent build failure in some cases.

> @@ -206,7 +206,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
> long pfn, pgprot_t prot)
>  {
>   pmd_t pmd;
>  
> - if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
> + if (!arch_ioremap_pmd_supported())
>   return;
>  
>   pr_debug("Validating PMD huge\n");
> @@ -320,7 +320,7 @@ static void __init pud_huge_tests(pud_t *pudp, unsigned 
> long pfn, pgprot_t prot)
>  {
>   pud_t pud;
>  
> - if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
> + if (!arch_ioremap_pud_supported())
>   return;
>  
>   pr_debug("Validating PUD huge\n");
> 


Re: [PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct

2020-08-21 Thread Laurent Dufour

On 11/08/2020 at 03:51, Scott Cheloha wrote:

At memory hot-remove time we can retrieve an LMB's nid from its
corresponding memory_block.  There is no need to store the nid
in multiple locations.

Note that lmb_to_memblock() uses find_memory_block() to get the
corresponding memory_block.  As find_memory_block() runs in sub-linear
time this approach is negligibly slower than what we do at present.
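The lookup at remove time then boils down to something like this (a
sketch, not the literal patch hunk; the NUMA_NO_NODE fallback is
illustrative):

static int lmb_to_nid(struct drmem_lmb *lmb)
{
	struct memory_block *mem = lmb_to_memblock(lmb);
	int nid = NUMA_NO_NODE;

	if (mem) {
		nid = mem->nid;
		/* lmb_to_memblock() takes a device reference */
		put_device(&mem->dev);
	}
	return nid;
}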

In exchange for this lookup at hot-remove time we no longer need to
call memory_add_physaddr_to_nid() during drmem_init() for each LMB.
On powerpc, memory_add_physaddr_to_nid() is a linear search, so this
spares us an O(n^2) initialization during boot.

On systems with many LMBs that initialization overhead is palpable and
disruptive.  For example, on a box with 249854 LMBs we're seeing
drmem_init() take upwards of 30 seconds to complete:

[   53.721639] drmem: initializing drmem v2
[   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! [swapper/0:1]
[   80.604377] Modules linked in:
[   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+ #4
[   80.604397] NIP:  c00a4980 LR: c00a4940 CTR: 
[   80.604407] REGS: c0002dbff8493830 TRAP: 0901   Not tainted  (5.6.0-rc2+)
[   80.604412] MSR:  82009033   CR: 44000248  
XER: 000d
[   80.604431] CFAR: c00a4a38 IRQMASK: 0
[   80.604431] GPR00: c00a4940 c0002dbff8493ac0 c1904400 
c0003cfede30
[   80.604431] GPR04:  c0f4095a 002f 
1000
[   80.604431] GPR08: cbf7ecdb7fb8 cbf7ecc2d3c8 0008 
c00c0002fdfb2001
[   80.604431] GPR12:  c0001e8ec200
[   80.604477] NIP [c00a4980] hot_add_scn_to_nid+0xa0/0x3e0
[   80.604486] LR [c00a4940] hot_add_scn_to_nid+0x60/0x3e0
[   80.604492] Call Trace:
[   80.604498] [c0002dbff8493ac0] [c00a4940] 
hot_add_scn_to_nid+0x60/0x3e0 (unreliable)
[   80.604509] [c0002dbff8493b20] [c0087c10] 
memory_add_physaddr_to_nid+0x20/0x60
[   80.604521] [c0002dbff8493b40] [c10d4880] drmem_init+0x25c/0x2f0
[   80.604530] [c0002dbff8493c10] [c0010154] do_one_initcall+0x64/0x2c0
[   80.604540] [c0002dbff8493ce0] [c10c4aa0] 
kernel_init_freeable+0x2d8/0x3a0
[   80.604550] [c0002dbff8493db0] [c0010824] kernel_init+0x2c/0x148
[   80.604560] [c0002dbff8493e20] [c000b648] 
ret_from_kernel_thread+0x5c/0x74
[   80.604567] Instruction dump:
[   80.604574] 392918e8 e949 e90a000a e92a 80ea000c 1d080018 3908ffe8 
7d094214
[   80.604586] 7fa94040 419d00dc e9490010 714a0088 <2faa0008> 409e00ac e949 
7fbe5040
[   89.047390] drmem: 249854 LMB(s)

With a patched kernel on the same machine we're no longer seeing the
soft lockup.  drmem_init() now completes in negligible time, even when
the LMB count is large.

Signed-off-by: Scott Cheloha 
---
v1:
  - RFC

v2:
  - Adjusted commit message.
  - Miscellaneous cleanup.

v3:
  - Correct issue found by Laurent Dufour :
- Add missing put_device() call in dlpar_remove_lmb() for the
  lmb's associated mem_block.

  arch/powerpc/include/asm/drmem.h  | 21 
  arch/powerpc/mm/drmem.c   |  6 +
  .../platforms/pseries/hotplug-memory.c| 24 ---
  3 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 414d209f45bb..34e4e9b257f5 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -13,9 +13,6 @@ struct drmem_lmb {
u32 drc_index;
u32 aa_index;
u32 flags;
-#ifdef CONFIG_MEMORY_HOTPLUG
-   int nid;
-#endif
  };

  struct drmem_lmb_info {
@@ -104,22 +101,4 @@ static inline void 
invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
lmb->aa_index = 0xffffffff;
  }

-#ifdef CONFIG_MEMORY_HOTPLUG
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-   lmb->nid = memory_add_physaddr_to_nid(lmb->base_addr);
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-   lmb->nid = -1;
-}
-#else
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-}
-#endif
-
  #endif /* _ASM_POWERPC_LMB_H */
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 59327cefbc6a..873fcfc7b875 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -362,10 +362,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
if (!drmem_info->lmbs)
return;

-   for_each_drmem_lmb(lmb) {
+   for_each_drmem_lmb(lmb)
read_drconf_v1_cell(lmb, &prop);
-   lmb_set_nid(lmb);
-   }
  }

  static void __init init_drmem_v2_lmbs(const __be32 *prop)
@@ -410,8 +408,6 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)

lmb->aa_index = dr_cell.aa_index;
 

Re: [PATCH v2 07/13] mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an existing pte entry

2020-08-21 Thread Anshuman Khandual



On 08/21/2020 12:44 PM, Aneesh Kumar K.V wrote:
> On 8/20/20 8:02 PM, Christophe Leroy wrote:
>>
>>
>>> On 19/08/2020 at 15:01, Aneesh Kumar K.V wrote:
>>> set_pte_at() should not be used to set a pte entry at locations that
>>> already holds a valid pte entry. Architectures like ppc64 don't do TLB
>>> invalidate in set_pte_at() and hence expect it to be used to set locations
>>> that are not a valid PTE.
>>>
>>> Signed-off-by: Aneesh Kumar K.V 
>>> ---
>>>   mm/debug_vm_pgtable.c | 35 +++
>>>   1 file changed, 15 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 76f4c713e5a3..9c7e2c9cfc76 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -74,15 +74,18 @@ static void __init pte_advanced_tests(struct mm_struct 
>>> *mm,
>>>   {
>>>   pte_t pte = pfn_pte(pfn, prot);
>>> +    /*
>>> + * Architectures optimize set_pte_at by avoiding TLB flush.
>>> + * This requires set_pte_at to be not used to update an
>>> + * existing pte entry. Clear pte before we do set_pte_at
>>> + */
>>> +
>>>   pr_debug("Validating PTE advanced\n");
>>>   pte = pfn_pte(pfn, prot);
>>>   set_pte_at(mm, vaddr, ptep, pte);
>>>   ptep_set_wrprotect(mm, vaddr, ptep);
>>>   pte = ptep_get(ptep);
>>>   WARN_ON(pte_write(pte));
>>> -
>>> -    pte = pfn_pte(pfn, prot);
>>> -    set_pte_at(mm, vaddr, ptep, pte);
>>>   ptep_get_and_clear(mm, vaddr, ptep);
>>>   pte = ptep_get(ptep);
>>>   WARN_ON(!pte_none(pte));
>>> @@ -96,13 +99,11 @@ static void __init pte_advanced_tests(struct mm_struct 
>>> *mm,
>>>   ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
>>>   pte = ptep_get(ptep);
>>>   WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
>>> -
>>> -    pte = pfn_pte(pfn, prot);
>>> -    set_pte_at(mm, vaddr, ptep, pte);
>>>   ptep_get_and_clear_full(mm, vaddr, ptep, 1);
>>>   pte = ptep_get(ptep);
>>>   WARN_ON(!pte_none(pte));
>>> +    pte = pfn_pte(pfn, prot);
>>>   pte = pte_mkyoung(pte);
>>>   set_pte_at(mm, vaddr, ptep, pte);
>>>   ptep_test_and_clear_young(vma, vaddr, ptep);
>>> @@ -164,9 +165,6 @@ static void __init pmd_advanced_tests(struct mm_struct 
>>> *mm,
>>>   pmdp_set_wrprotect(mm, vaddr, pmdp);
>>>   pmd = READ_ONCE(*pmdp);
>>>   WARN_ON(pmd_write(pmd));
>>> -
>>> -    pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>>> -    set_pmd_at(mm, vaddr, pmdp, pmd);
>>>   pmdp_huge_get_and_clear(mm, vaddr, pmdp);
>>>   pmd = READ_ONCE(*pmdp);
>>>   WARN_ON(!pmd_none(pmd));
>>> @@ -180,13 +178,11 @@ static void __init pmd_advanced_tests(struct 
>>> mm_struct *mm,
>>>   pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
>>>   pmd = READ_ONCE(*pmdp);
>>>   WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
>>> -
>>> -    pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>>> -    set_pmd_at(mm, vaddr, pmdp, pmd);
>>>   pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1);
>>>   pmd = READ_ONCE(*pmdp);
>>>   WARN_ON(!pmd_none(pmd));
>>> +    pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>>>   pmd = pmd_mkyoung(pmd);
>>>   set_pmd_at(mm, vaddr, pmdp, pmd);
>>>   pmdp_test_and_clear_young(vma, vaddr, pmdp);
>>> @@ -283,18 +279,10 @@ static void __init pud_advanced_tests(struct 
>>> mm_struct *mm,
>>>   WARN_ON(pud_write(pud));
>>>   #ifndef __PAGETABLE_PMD_FOLDED
>>
>> Same as below, once set_pud_at() is gone, I don't think this #ifndef 
>> __PAGETABLE_PMD_FOLDED is still needed; it should be possible to replace it 
>> with 'if (mm_pmd_folded())'
> 
> I would skip that change in this series because I still haven't worked out 
> what it means to have FOLDED PMD with 
> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD.
> 
> 
> We should probably push that as a cleanup later, and somebody who can test 
> that config can do that? Currently I can't boot ppc64 with DEBUG_VM_PGTABLE 
> enabled because it is all buggy w.r.t. the rules.

Agreed. I think it's OK not to address these changes/improvements in this
particular series, which is trying to modify the test to make it run on the
ppc64 platform. I will probably look into that later.


Re: [PATCH v2 00/13] mm/debug_vm_pgtable fixes

2020-08-21 Thread Aneesh Kumar K.V

On 8/21/20 1:31 PM, Anshuman Khandual wrote:



On 08/21/2020 12:23 PM, Aneesh Kumar K.V wrote:

On 8/21/20 9:03 AM, Anshuman Khandual wrote:



On 08/19/2020 07:15 PM, Aneesh Kumar K.V wrote:

"Aneesh Kumar K.V"  writes:


This patch series includes fixes for debug_vm_pgtable test code so that
they follow page table updates rules correctly. The first two patches introduce
changes w.r.t ppc64. The patches are included in this series for completeness. 
We can
merge them via ppc64 tree if required.

Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
page table update rules.

Changes from V1:
* Address review feedback
* drop test specific pfn_pte and pfn_pmd.
* Update ppc64 page table helper to add _PAGE_PTE

Aneesh Kumar K.V (13):
    powerpc/mm: Add DEBUG_VM WARN for pmd_clear
    powerpc/mm: Move setting pte specific flags to pfn_pte
    mm/debug_vm_pgtable/ppc64: Avoid setting top bits in radom value
    mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge
  vmap support.
    mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with
  CONFIG_NUMA_BALANCING
    mm/debug_vm_pgtable/THP: Mark the pte entry huge before using
  set_pmd/pud_at
    mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an
  existing pte entry
    mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP
    mm/debug_vm_pgtable/locks: Move non page table modifying test together
    mm/debug_vm_pgtable/locks: Take correct page table lock
    mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
    mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
    mm/debug_vm_pgtable: populate a pte entry before fetching it

   arch/powerpc/include/asm/book3s/64/pgtable.h |  29 +++-
   arch/powerpc/include/asm/nohash/pgtable.h    |   5 -
   arch/powerpc/mm/book3s64/pgtable.c   |   2 +-
   arch/powerpc/mm/pgtable.c    |   5 -
   include/linux/io.h   |  12 ++
   mm/debug_vm_pgtable.c    | 151 +++
   6 files changed, 127 insertions(+), 77 deletions(-)



BTW I picked a wrong branch when sending this. Attaching the diff
against what I want to send.  pfn_pmd() no longer updates _PAGE_PTE
because that is handled by pmd_mkhuge().

diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index 3b4da7c63e28..e18ae50a275c 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -141,7 +141,7 @@ pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
   unsigned long pmdv;
     pmdv = (pfn << PAGE_SHIFT) & PTE_RPN_MASK;
-    return __pmd(pmdv | pgprot_val(pgprot) | _PAGE_PTE);
+    return pmd_set_protbits(__pmd(pmdv), pgprot);
   }
     pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 7d9f8e1d790f..cad61d22f33a 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -229,7 +229,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
long pfn, pgprot_t prot)
     static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
   {
-    pmd_t pmd = pfn_pmd(pfn, prot);
+    pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
     if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
   return;



Cover letter does not mention which branch or tag this series applies on.
Just assumed it to be 5.9-rc1. Should the above changes be captured as a
pre-requisite patch ?

Anyway, the series fails to build on arm64.

A) Without CONFIG_TRANSPARENT_HUGEPAGE

mm/debug_vm_pgtable.c: In function ‘debug_vm_pgtable’:
mm/debug_vm_pgtable.c:1045:2: error: too many arguments to function 
‘pmd_advanced_tests’
    pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
    ^~
mm/debug_vm_pgtable.c:366:20: note: declared here
   static void __init pmd_advanced_tests(struct mm_struct *mm,
  ^~

B) As mentioned previously, this should be solved by including <linux/io.h>

mm/debug_vm_pgtable.c: In function ‘pmd_huge_tests’:
mm/debug_vm_pgtable.c:215:7: error: implicit declaration of function 
‘arch_ioremap_pmd_supported’; did you mean ‘arch_disable_smp_support’? 
[-Werror=implicit-function-declaration]
    if (!arch_ioremap_pmd_supported())
     ^~

Please make sure that the series builds on all enabled platforms, i.e. x86,
arm64, ppc32, ppc64, arc, s390, along with selectively enabling/disabling
all the features behind the various #ifdefs in the test.
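For instance (illustrative commands, pick any arch/feature combination):

	make ARCH=arm64 CROSS_COMPILE=aarch64-linux- defconfig
	./scripts/config --disable CONFIG_TRANSPARENT_HUGEPAGE
	make ARCH=arm64 CROSS_COMPILE=aarch64-linux- olddefconfig
	make ARCH=arm64 CROSS_COMPILE=aarch64-linux- mm/debug_vm_pgtable.o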



I was hoping to get a kernel test robot build report to verify that. But if you 
can help with that, I have pushed a branch to GitHub with the reported build 
failure fixes.

https://github.com/kvaneesh/linux/tree/debug_vm_pgtable

I still haven't looked at the PMD_FOLDED feedback from Christophe because I am 
not sure I follow why we are checking for PMD folded there.


If this series does not build on existing enabled platform

Re: [PATCH v2 10/13] mm/debug_vm_pgtable/locks: Take correct page table lock

2020-08-21 Thread Aneesh Kumar K.V

On 8/21/20 1:33 PM, Anshuman Khandual wrote:



On 08/19/2020 06:31 PM, Aneesh Kumar K.V wrote:

Make sure we call pte accessors with correct lock held.

Signed-off-by: Aneesh Kumar K.V 
---
  mm/debug_vm_pgtable.c | 34 --
  1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 69fe3cd8126c..8f7a8ccb5a54 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1024,33 +1024,39 @@ static int __init debug_vm_pgtable(void)
pmd_thp_tests(pmd_aligned, prot);
pud_thp_tests(pud_aligned, prot);
  
+	hugetlb_basic_tests(pte_aligned, prot);
+
/*
 * Page table modifying tests
 */
-   pte_clear_tests(mm, ptep, vaddr);
-   pmd_clear_tests(mm, pmdp);
-   pud_clear_tests(mm, pudp);
-   p4d_clear_tests(mm, p4dp);
-   pgd_clear_tests(mm, pgdp);
  
  	ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   pte_clear_tests(mm, ptep, vaddr);
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
-   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
-   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-
+   pte_unmap_unlock(ptep, ptl);
  
  	ptl = pmd_lock(mm, pmdp);
+   pmd_clear_tests(mm, pmdp);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
pmd_huge_tests(pmdp, pmd_aligned, prot);
+   pmd_populate_tests(mm, pmdp, saved_ptep);
+   spin_unlock(ptl);
+
+   ptl = pud_lock(mm, pudp);
+   pud_clear_tests(mm, pudp);
+   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
pud_huge_tests(pudp, pud_aligned, prot);
+   pud_populate_tests(mm, pudp, saved_pmdp);
+   spin_unlock(ptl);
  
-	pte_unmap_unlock(ptep, ptl);
+   //hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);


Commenting out an existing test in the middle of another change ?



That is already fixed. That was me creating a git diff against a wrong 
branch.


Thanks.
-aneesh


[PATCH] powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver

2020-08-21 Thread Kajol Jain
Commit 792f73f747b8 ("powerpc/hv-24x7: Add sysfs files inside hv-24x7
device to show cpumask") added the cpumask file as part of the hv-24x7
driver inside the interface folder. The cpumask file is supposed to be
in the top folder of the PMU driver in order for hotplug to work.

This patch fixes that issue by creating a new group 'cpumask_attr_group'
for the cpumask file and making sure it is added in the top folder.

command:# cat /sys/devices/hv_24x7/cpumask
0

Fixes: 792f73f747b8 ("powerpc/hv-24x7: Add sysfs files inside hv-24x7
device to show cpumask")
Signed-off-by: Kajol Jain 
---
 .../testing/sysfs-bus-event_source-devices-hv_24x7|  2 +-
 arch/powerpc/perf/hv-24x7.c   | 11 ++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
index f7e32f218f73..e82fc37be802 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -43,7 +43,7 @@ Description:  read only
This sysfs interface exposes the number of cores per chip
present in the system.
 
-What:  /sys/devices/hv_24x7/interface/cpumask
+What:  /sys/devices/hv_24x7/cpumask
 Date:  July 2020
 Contact:   Linux on PowerPC Developer List 
 Description:   read only
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index cdb7bfbd157e..6e7e820508df 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -1128,6 +1128,15 @@ static struct bin_attribute *if_bin_attrs[] = {
NULL,
 };
 
+static struct attribute *cpumask_attrs[] = {
+   &dev_attr_cpumask.attr,
+   NULL,
+};
+
+static struct attribute_group cpumask_attr_group = {
+   .attrs = cpumask_attrs,
+};
+
 static struct attribute *if_attrs[] = {
&dev_attr_catalog_len.attr,
&dev_attr_catalog_version.attr,
@@ -1135,7 +1144,6 @@ static struct attribute *if_attrs[] = {
&dev_attr_sockets.attr,
&dev_attr_chipspersocket.attr,
&dev_attr_coresperchip.attr,
-   &dev_attr_cpumask.attr,
NULL,
 };
 
@@ -1151,6 +1159,7 @@ static const struct attribute_group *attr_groups[] = {
&event_desc_group,
&event_long_desc_group,
&if_group,
+   &cpumask_attr_group,
NULL,
 };
 
-- 
2.18.2



Re: [PATCH v2 10/13] mm/debug_vm_pgtable/locks: Take correct page table lock

2020-08-21 Thread Anshuman Khandual



On 08/19/2020 06:31 PM, Aneesh Kumar K.V wrote:
> Make sure we call pte accessors with correct lock held.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  mm/debug_vm_pgtable.c | 34 --
>  1 file changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 69fe3cd8126c..8f7a8ccb5a54 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -1024,33 +1024,39 @@ static int __init debug_vm_pgtable(void)
>   pmd_thp_tests(pmd_aligned, prot);
>   pud_thp_tests(pud_aligned, prot);
>  
> + hugetlb_basic_tests(pte_aligned, prot);
> +
>   /*
>* Page table modifying tests
>*/
> - pte_clear_tests(mm, ptep, vaddr);
> - pmd_clear_tests(mm, pmdp);
> - pud_clear_tests(mm, pudp);
> - p4d_clear_tests(mm, p4dp);
> - pgd_clear_tests(mm, pgdp);
>  
>   ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
> + pte_clear_tests(mm, ptep, vaddr);
>   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
> - pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
> - pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
> - hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
> -
> + pte_unmap_unlock(ptep, ptl);
>  
> + ptl = pmd_lock(mm, pmdp);
> + pmd_clear_tests(mm, pmdp);
> + pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
>   pmd_huge_tests(pmdp, pmd_aligned, prot);
> + pmd_populate_tests(mm, pmdp, saved_ptep);
> + spin_unlock(ptl);
> +
> + ptl = pud_lock(mm, pudp);
> + pud_clear_tests(mm, pudp);
> + pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
>   pud_huge_tests(pudp, pud_aligned, prot);
> + pud_populate_tests(mm, pudp, saved_pmdp);
> + spin_unlock(ptl);
>  
> - pte_unmap_unlock(ptep, ptl);
> + //hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);

Commenting out an existing test in the middle of another change ?


Re: [PATCH v2 00/13] mm/debug_vm_pgtable fixes

2020-08-21 Thread Anshuman Khandual



On 08/21/2020 12:23 PM, Aneesh Kumar K.V wrote:
> On 8/21/20 9:03 AM, Anshuman Khandual wrote:
>>
>>
>> On 08/19/2020 07:15 PM, Aneesh Kumar K.V wrote:
>>> "Aneesh Kumar K.V"  writes:
>>>
 This patch series includes fixes for debug_vm_pgtable test code so that
 they follow page table updates rules correctly. The first two patches 
 introduce
 changes w.r.t ppc64. The patches are included in this series for 
 completeness. We can
 merge them via ppc64 tree if required.

 Hugetlb test is disabled on ppc64 because that needs larger change to 
 satisfy
 page table update rules.

 Changes from V1:
 * Address review feedback
 * drop test specific pfn_pte and pfn_pmd.
 * Update ppc64 page table helper to add _PAGE_PTE

 Aneesh Kumar K.V (13):
    powerpc/mm: Add DEBUG_VM WARN for pmd_clear
    powerpc/mm: Move setting pte specific flags to pfn_pte
    mm/debug_vm_pgtable/ppc64: Avoid setting top bits in radom value
    mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge
  vmap support.
    mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with
  CONFIG_NUMA_BALANCING
    mm/debug_vm_pgtable/THP: Mark the pte entry huge before using
  set_pmd/pud_at
    mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an
  existing pte entry
    mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP
    mm/debug_vm_pgtable/locks: Move non page table modifying test together
    mm/debug_vm_pgtable/locks: Take correct page table lock
    mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
    mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
    mm/debug_vm_pgtable: populate a pte entry before fetching it

   arch/powerpc/include/asm/book3s/64/pgtable.h |  29 +++-
   arch/powerpc/include/asm/nohash/pgtable.h    |   5 -
   arch/powerpc/mm/book3s64/pgtable.c   |   2 +-
   arch/powerpc/mm/pgtable.c    |   5 -
   include/linux/io.h   |  12 ++
   mm/debug_vm_pgtable.c    | 151 +++
   6 files changed, 127 insertions(+), 77 deletions(-)

>>>
>>> BTW I picked a wrong branch when sending this. Attaching the diff
>>> against what I want to send.  pfn_pmd() no longer updates _PAGE_PTE
>>> because that is handled by pmd_mkhuge().
>>>
>>> diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
>>> b/arch/powerpc/mm/book3s64/pgtable.c
>>> index 3b4da7c63e28..e18ae50a275c 100644
>>> --- a/arch/powerpc/mm/book3s64/pgtable.c
>>> +++ b/arch/powerpc/mm/book3s64/pgtable.c
>>> @@ -141,7 +141,7 @@ pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
>>>   unsigned long pmdv;
>>>     pmdv = (pfn << PAGE_SHIFT) & PTE_RPN_MASK;
>>> -    return __pmd(pmdv | pgprot_val(pgprot) | _PAGE_PTE);
>>> +    return pmd_set_protbits(__pmd(pmdv), pgprot);
>>>   }
>>>     pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 7d9f8e1d790f..cad61d22f33a 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -229,7 +229,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
>>> long pfn, pgprot_t prot)
>>>     static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t 
>>> prot)
>>>   {
>>> -    pmd_t pmd = pfn_pmd(pfn, prot);
>>> +    pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>>>     if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
>>>   return;
>>>
>>
>> Cover letter does not mention which branch or tag this series applies on.
>> Just assumed it to be 5.9-rc1. Should the above changes be captured as a
>> pre-requisite patch ?
>>
>> Anyway, the series fails to build on arm64.
>>
>> A) Without CONFIG_TRANSPARENT_HUGEPAGE
>>
>> mm/debug_vm_pgtable.c: In function ‘debug_vm_pgtable’:
>> mm/debug_vm_pgtable.c:1045:2: error: too many arguments to function 
>> ‘pmd_advanced_tests’
>>    pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
>>    ^~
>> mm/debug_vm_pgtable.c:366:20: note: declared here
>>   static void __init pmd_advanced_tests(struct mm_struct *mm,
>>  ^~
>>
>> B) As mentioned previously, this should be solved by including <linux/io.h>
>>
>> mm/debug_vm_pgtable.c: In function ‘pmd_huge_tests’:
>> mm/debug_vm_pgtable.c:215:7: error: implicit declaration of function 
>> ‘arch_ioremap_pmd_supported’; did you mean ‘arch_disable_smp_support’? 
>> [-Werror=implicit-function-declaration]
>>    if (!arch_ioremap_pmd_supported())
>>     ^~
>>
>> Please make sure that the series builds on all enabled platforms, i.e. x86,
>> arm64, ppc32, ppc64, arc, s390, along with selectively enabling/disabling
>> all the features behind the various #ifdefs in the test.
>>
> 
> I was hoping to get kernel test robot build report to verify that. But if you

Re: [PATCH v5 8/8] mm/vmalloc: Hugepage vmalloc mappings

2020-08-21 Thread kernel test robot
Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on arm64/for-next/core tip/x86/mm linus/master v5.9-rc1]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/huge-vmalloc-mappings/20200821-124543
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: riscv-allnoconfig (attached as .config)
compiler: riscv64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=riscv 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   riscv64-linux-ld: mm/page_alloc.o: in function `.L1578':
>> page_alloc.c:(.init.text+0x11a4): undefined reference to `find_vm_area'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


[PATCH] powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000

2020-08-21 Thread Christophe Leroy
In is_module_segment(), when VMALLOC_END is over 0xf0000000,
ALIGN(VMALLOC_END, SZ_256M) has value 0.

In that case, addr >= ALIGN(VMALLOC_END, SZ_256M) is always
true, so is_module_segment() always returns false.

Use (ALIGN(VMALLOC_END, SZ_256M) - 1), which will have
value 0xffffffff and will be suitable for the comparison.
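The wraparound is easy to see with plain 32-bit arithmetic (a
userspace demo, using 0xff000000 as an example VMALLOC_END):

#include <stdio.h>
#include <stdint.h>

#define SZ_256M		0x10000000u
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	uint32_t vmalloc_end = 0xff000000u;	/* over 0xf0000000 */

	/* 0xff000000 + 0x0fffffff wraps past 32 bits, so ALIGN() is 0 */
	printf("ALIGN     = 0x%08x\n", ALIGN(vmalloc_end, SZ_256M));
	/* 0 - 1 wraps back to 0xffffffff, which compares correctly */
	printf("ALIGN - 1 = 0x%08x\n", ALIGN(vmalloc_end, SZ_256M) - 1);
	return 0;
}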

Reported-by: Andreas Schwab 
Signed-off-by: Christophe Leroy 
Fixes: c49643319715 ("powerpc/32s: Only leave NX unset on segments used for 
modules")
---
 arch/powerpc/mm/book3s32/mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 82ae9e06a773..d426eaf76bb0 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -194,12 +194,12 @@ static bool is_module_segment(unsigned long addr)
 #ifdef MODULES_VADDR
if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
return false;
-   if (addr >= ALIGN(MODULES_END, SZ_256M))
+   if (addr > ALIGN(MODULES_END, SZ_256M) - 1)
return false;
 #else
if (addr < ALIGN_DOWN(VMALLOC_START, SZ_256M))
return false;
-   if (addr >= ALIGN(VMALLOC_END, SZ_256M))
+   if (addr > ALIGN(VMALLOC_END, SZ_256M) - 1)
return false;
 #endif
return true;
-- 
2.25.0



Re: [PATCH v2 07/13] mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an existing pte entry

2020-08-21 Thread Aneesh Kumar K.V

On 8/20/20 8:02 PM, Christophe Leroy wrote:



On 19/08/2020 at 15:01, Aneesh Kumar K.V wrote:

set_pte_at() should not be used to set a pte entry at locations that
already holds a valid pte entry. Architectures like ppc64 don't do TLB
invalidate in set_pte_at() and hence expect it to be used to set
locations that are not a valid PTE.

Signed-off-by: Aneesh Kumar K.V 
---
  mm/debug_vm_pgtable.c | 35 +++
  1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 76f4c713e5a3..9c7e2c9cfc76 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -74,15 +74,18 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
  {
  pte_t pte = pfn_pte(pfn, prot);
+    /*
+ * Architectures optimize set_pte_at by avoiding TLB flush.
+ * This requires set_pte_at to be not used to update an
+ * existing pte entry. Clear pte before we do set_pte_at
+ */
+
  pr_debug("Validating PTE advanced\n");
  pte = pfn_pte(pfn, prot);
  set_pte_at(mm, vaddr, ptep, pte);
  ptep_set_wrprotect(mm, vaddr, ptep);
  pte = ptep_get(ptep);
  WARN_ON(pte_write(pte));
-
-    pte = pfn_pte(pfn, prot);
-    set_pte_at(mm, vaddr, ptep, pte);
  ptep_get_and_clear(mm, vaddr, ptep);
  pte = ptep_get(ptep);
  WARN_ON(!pte_none(pte));
@@ -96,13 +99,11 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
  ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
  pte = ptep_get(ptep);
  WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
-
-    pte = pfn_pte(pfn, prot);
-    set_pte_at(mm, vaddr, ptep, pte);
  ptep_get_and_clear_full(mm, vaddr, ptep, 1);
  pte = ptep_get(ptep);
  WARN_ON(!pte_none(pte));
+    pte = pfn_pte(pfn, prot);
  pte = pte_mkyoung(pte);
  set_pte_at(mm, vaddr, ptep, pte);
  ptep_test_and_clear_young(vma, vaddr, ptep);
@@ -164,9 +165,6 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
  pmdp_set_wrprotect(mm, vaddr, pmdp);
  pmd = READ_ONCE(*pmdp);
  WARN_ON(pmd_write(pmd));
-
-    pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
-    set_pmd_at(mm, vaddr, pmdp, pmd);
  pmdp_huge_get_and_clear(mm, vaddr, pmdp);
  pmd = READ_ONCE(*pmdp);
  WARN_ON(!pmd_none(pmd));
@@ -180,13 +178,11 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
  pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
  pmd = READ_ONCE(*pmdp);
  WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
-
-    pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
-    set_pmd_at(mm, vaddr, pmdp, pmd);
  pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1);
  pmd = READ_ONCE(*pmdp);
  WARN_ON(!pmd_none(pmd));
+    pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
  pmd = pmd_mkyoung(pmd);
  set_pmd_at(mm, vaddr, pmdp, pmd);
  pmdp_test_and_clear_young(vma, vaddr, pmdp);
@@ -283,18 +279,10 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
  WARN_ON(pud_write(pud));
  #ifndef __PAGETABLE_PMD_FOLDED


Same as below, once set_pud_at() is gone, I don't think this #ifndef 
__PAGETABLE_PMD_FOLDED is still needed; it should be possible to replace 
it with 'if (mm_pmd_folded())'


I would skip that change in this series because I still haven't worked 
out what it means to have FOLDED PMD with 
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD.



We should probably push that as a cleanup later, and somebody who can 
test that config can do that? Currently I can't boot ppc64 with 
DEBUG_VM_PGTABLE enabled because it is all buggy w.r.t. the rules.


-aneesh