[PATCH v2] powerpc/pseries: Do not initiate shutdown when system is running on UPS

2020-08-19 Thread Vasant Hegde
As per PAPR, we have to look at both the EPOW sensor value and the event
modifier to identify the type of event and take appropriate action.

Sensor value = 3 (EPOW_SYSTEM_SHUTDOWN): schedule the system to be shut down
  after an OS-defined delay (default 10 mins).

EPOW Event Modifier for sensor value = 3:
   We have to initiate an immediate shutdown for most event modifier values,
   except value = 2 (system running on UPS).

Checking the firmware documentation, it is clear that we have to wait for a
predefined time before initiating shutdown, and cancel the shutdown process if
power is restored within that time. Commit 79872e35 accidentally enabled
immediate poweroff for the EPOW_SHUTDOWN_ON_UPS event.

We have a user space tool (rtas_errd) on the LPAR that monitors for
EPOW_SHUTDOWN_ON_UPS. Once it gets the event it initiates a shutdown after a
predefined time and starts monitoring for new EPOW events. If it receives a
"Power restored" event before the predefined time it cancels the shutdown;
otherwise it shuts down the system once that time has elapsed.

Fixes: 79872e35 ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown")
Cc: stable@vger.kernel.org # v4.0+
Cc: Tyrel Datwyler 
Cc: Michael Ellerman 
Signed-off-by: Vasant Hegde 
---
Changes in v2:
  - Updated patch description based on mpe's and Tyrel's comments.

-Vasant
 arch/powerpc/platforms/pseries/ras.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c 
b/arch/powerpc/platforms/pseries/ras.c
index f3736fcd98fc..13c86a292c6d 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -184,7 +184,6 @@ static void handle_system_shutdown(char event_modifier)
case EPOW_SHUTDOWN_ON_UPS:
pr_emerg("Loss of system power detected. System is running on"
 " UPS/battery. Check RTAS error log for details\n");
-   orderly_poweroff(true);
break;
 
case EPOW_SHUTDOWN_LOSS_OF_CRITICAL_FUNCTIONS:
-- 
2.26.2



Re: [PATCH v2 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.

2020-08-19 Thread Aneesh Kumar K.V
kernel test robot  writes:

> Hi "Aneesh,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on hnaz-linux-mm/master]
> [also build test ERROR on powerpc/next linus/master v5.9-rc1 next-20200819]
> [cannot apply to mmotm/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:
> https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/mm-debug_vm_pgtable-fixes/20200819-213446
> base:   https://github.com/hnaz/linux-mm master
> config: i386-randconfig-s002-20200818 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
> reproduce:
> # apt-get install sparse
> # sparse version: v0.6.2-183-gaa6ede3b-dirty
> # save the attached .config to linux build tree
> make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
>
> All errors (new ones prefixed by >>):
>
>>> arch/x86/mm/ioremap.c:484:12: error: redefinition of 
>>> 'arch_ioremap_p4d_supported'
>  484 | int __init arch_ioremap_p4d_supported(void)
>  |^~
>In file included from arch/x86/mm/ioremap.c:12:
>include/linux/io.h:41:19: note: previous definition of 
> 'arch_ioremap_p4d_supported' was here
>   41 | static inline int arch_ioremap_p4d_supported(void)
>  |   ^~

I guess trying to work this out without using #ifdef is complex. I ended
up with the following:

 mm/debug_vm_pgtable.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

modified   mm/debug_vm_pgtable.c
@@ -202,11 +202,12 @@ static void __init pmd_leaf_tests(unsigned long pfn, 
pgprot_t prot)
WARN_ON(!pmd_leaf(pmd));
 }
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
prot)
 {
pmd_t pmd;
 
-   if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+   if (!arch_ioremap_pmd_supported())
return;
 
pr_debug("Validating PMD huge\n");
@@ -220,6 +221,10 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
long pfn, pgprot_t prot)
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
 }
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t 
prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+
 
 static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
 {
@@ -316,11 +321,12 @@ static void __init pud_leaf_tests(unsigned long pfn, 
pgprot_t prot)
WARN_ON(!pud_leaf(pud));
 }
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
prot)
 {
pud_t pud;
 
-   if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+   if (!arch_ioremap_pud_supported())
return;
 
pr_debug("Validating PUD huge\n");
@@ -334,6 +340,10 @@ static void __init pud_huge_tests(pud_t *pudp, unsigned 
long pfn, pgprot_t prot)
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 }
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t 
prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+
 #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
 static void __init pud_advanced_tests(struct mm_struct *mm,


[PATCH v2 2/2] selftests/powerpc: Add a rtas_filter selftest

2020-08-19 Thread Andrew Donnellan
Add a selftest to test the basic functionality of CONFIG_RTAS_FILTER.

Signed-off-by: Andrew Donnellan 
---
v1->v2:
- new patch

---
 .../selftests/powerpc/syscalls/Makefile   |   2 +-
 .../selftests/powerpc/syscalls/rtas_filter.c  | 285 ++
 2 files changed, 286 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/syscalls/rtas_filter.c

diff --git a/tools/testing/selftests/powerpc/syscalls/Makefile 
b/tools/testing/selftests/powerpc/syscalls/Makefile
index 01b22775ca87..b63f8459c704 100644
--- a/tools/testing/selftests/powerpc/syscalls/Makefile
+++ b/tools/testing/selftests/powerpc/syscalls/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-TEST_GEN_PROGS := ipc_unmuxed
+TEST_GEN_PROGS := ipc_unmuxed rtas_filter
 
 CFLAGS += -I../../../../../usr/include
 
diff --git a/tools/testing/selftests/powerpc/syscalls/rtas_filter.c 
b/tools/testing/selftests/powerpc/syscalls/rtas_filter.c
new file mode 100644
index ..3b0ac4ff64e5
--- /dev/null
+++ b/tools/testing/selftests/powerpc/syscalls/rtas_filter.c
@@ -0,0 +1,285 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright 2005-2020 IBM Corporation.
+ *
+ * Includes code from librtas (https://github.com/ibm-power-utilities/librtas/)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "utils.h"
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define cpu_to_be32(x) bswap_32(x)
+#define be32_to_cpu(x) bswap_32(x)
+#else
+#define cpu_to_be32(x) (x)
+#define be32_to_cpu(x) (x)
+#endif
+
+#define RTAS_IO_ASSERT -1098   /* Unexpected I/O Error */
+#define RTAS_UNKNOWN_OP -1099  /* No Firmware Implementation of Function */
+#define BLOCK_SIZE 4096
+#define PAGE_SIZE 4096
+#define MAX_PAGES 64
+
+static const char *ofdt_rtas_path = "/proc/device-tree/rtas";
+
+typedef uint32_t __be32;
+struct rtas_args {
+   __be32 token;
+   __be32 nargs;
+   __be32 nret;
+   __be32 args[16];
+   __be32 *rets; /* Pointer to return values in args[]. */
+};
+
+struct region {
+   uint64_t addr;
+   uint32_t size;
+   struct region *next;
+};
+
+int read_entire_file(int fd, char **buf, size_t *len)
+{
+   size_t buf_size = 0;
+   size_t off = 0;
+   int rc;
+
+   *buf = NULL;
+   do {
+   buf_size += BLOCK_SIZE;
+   if (*buf == NULL)
+   *buf = malloc(buf_size);
+   else
+   *buf = realloc(*buf, buf_size);
+
+   if (*buf == NULL)
+   return -ENOMEM;
+
+   rc = read(fd, *buf + off, BLOCK_SIZE);
+   if (rc < 0)
+   return -EIO;
+
+   off += rc;
+   } while (rc == BLOCK_SIZE);
+
+   if (len)
+   *len = off;
+
+   return 0;
+}
+
+static int open_prop_file(const char *prop_path, const char *prop_name, int 
*fd)
+{
+   char *path;
+   int len;
+
+   /* allocate enough for two strings, a slash and a trailing NUL */
+   len = strlen(prop_path) + strlen(prop_name) + 1 + 1;
+   path = malloc(len);
+   if (path == NULL)
+   return -ENOMEM;
+
+   snprintf(path, len, "%s/%s", prop_path, prop_name);
+
+   *fd = open(path, O_RDONLY);
+   free(path);
+   if (*fd < 0)
+   return -errno;
+
+   return 0;
+}
+
+static int get_property(const char *prop_path, const char *prop_name,
+   char **prop_val, size_t *prop_len)
+{
+   int rc, fd;
+
+   rc = open_prop_file(prop_path, prop_name, &fd);
+   if (rc)
+   return rc;
+
+   rc = read_entire_file(fd, prop_val, prop_len);
+   close(fd);
+
+   return rc;
+}
+
+int rtas_token(const char *call_name)
+{
+   char *prop_buf = NULL;
+   size_t len;
+   int rc;
+
+   rc = get_property(ofdt_rtas_path, call_name, &prop_buf, &len);
+   if (rc < 0) {
+   rc = RTAS_UNKNOWN_OP;
+   goto err;
+   }
+
+   rc = be32_to_cpu(*(int *)prop_buf);
+
+err:
+   free(prop_buf);
+   return rc;
+}
+
+static int read_kregion_bounds(struct region *kregion)
+{
+   char *buf;
+   int fd;
+   int rc;
+
+   fd = open("/proc/ppc64/rtas/rmo_buffer", O_RDONLY);
+   if (fd < 0) {
+   printf("Could not open rmo_buffer file\n");
+   return RTAS_IO_ASSERT;
+   }
+
+   rc = read_entire_file(fd, &buf, NULL);
+   close(fd);
+   if (rc) {
+   free(buf);
+   return rc;
+   }
+
+   sscanf(buf, "%" SCNx64 " %x", &kregion->addr, &kregion->size);
+   free(buf);
+
+   if (!(kregion->size && kregion->addr) ||
+   (kregion->size > (PAGE_SIZE * MAX_PAGES))) {
+   printf("Unexpected kregion bounds\n");
+   return RTAS_IO_ASSERT;
+   }
+
+   return 0;
+}
+
+st

[PATCH v2 1/2] powerpc/rtas: Restrict RTAS requests from userspace

2020-08-19 Thread Andrew Donnellan
A number of userspace utilities depend on making calls to RTAS to retrieve
information and update various things.

The existing API through which we expose RTAS to userspace exposes more
RTAS functionality than we actually need, through the sys_rtas syscall,
which allows root (or anyone with CAP_SYS_ADMIN) to make any RTAS call they
want with arbitrary arguments.

Many RTAS calls take the address of a buffer as an argument, and it's up to
the caller to specify the physical address of the buffer as an argument. We
allocate a buffer (the "RMO buffer") in the Real Memory Area that RTAS can
access, and then expose the physical address and size of this buffer in
/proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address,
poke at the buffer using /dev/mem, and pass an address in the RMO buffer to
the RTAS call.

However, there's nothing stopping the caller from specifying whatever
address they want in the RTAS call, and it's easy to construct a series of
RTAS calls that can overwrite arbitrary bytes (even without /dev/mem
access).

Additionally, there are some RTAS calls that do potentially dangerous
things and for which there are no legitimate userspace use cases.

In the past, this would not have been a particularly big deal as it was
assumed that root could modify all system state freely, but with Secure
Boot and lockdown we need to care about this.

We can't fundamentally change the ABI at this point, however we can address
this by implementing a filter that checks RTAS calls against a list
of permitted calls and forces the caller to use addresses within the RMO
buffer.

The list is based off the list of calls that are used by the librtas
userspace library, and has been tested with a number of existing userspace
RTAS utilities. For compatibility with any applications we are not aware of
that require other calls, the filter can be turned off at build time.

Reported-by: Daniel Axtens 
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Donnellan 

---
v1->v2:
- address comments from mpe
- shorten the names of some struct members
- make the filter array static/ro_after_init, use const char *
- genericise the fixed buffer size cases
- simplify/get rid of some of the error printing
- get rid of rtas_token_name()
---
 arch/powerpc/Kconfig   |  13 
 arch/powerpc/kernel/rtas.c | 153 +
 2 files changed, 166 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce9..8dd42b82379b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -989,6 +989,19 @@ config PPC_SECVAR_SYSFS
  read/write operations on these variables. Say Y if you have
  secure boot enabled and want to expose variables to userspace.
 
+config PPC_RTAS_FILTER
+   bool "Enable filtering of RTAS syscalls"
+   default y
+   depends on PPC_RTAS
+   help
+ The RTAS syscall API has security issues that could be used to
+ compromise system integrity. This option enforces restrictions on the
+ RTAS calls and arguments passed by userspace programs to mitigate
+ these issues.
+
+ Say Y unless you know what you are doing and the filter is causing
+ problems for you.
+
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 806d554ce357..954f41676f69 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -992,6 +992,147 @@ struct pseries_errorlog *get_pseries_errorlog(struct 
rtas_error_log *log,
return NULL;
 }
 
+#ifdef CONFIG_PPC_RTAS_FILTER
+
+/*
+ * The sys_rtas syscall, as originally designed, allows root to pass
+ * arbitrary physical addresses to RTAS calls. A number of RTAS calls
+ * can be abused to write to arbitrary memory and do other things that
+ * are potentially harmful to system integrity, and thus should only
+ * be used inside the kernel and not exposed to userspace.
+ *
+ * All known legitimate users of the sys_rtas syscall will only ever
+ * pass addresses that fall within the RMO buffer, and use a known
+ * subset of RTAS calls.
+ *
+ * Accordingly, we filter RTAS requests to check that the call is
+ * permitted, and that provided pointers fall within the RMO buffer.
+ * The rtas_filters list contains an entry for each permitted call,
+ * with the indexes of the parameters which are expected to contain
+ * addresses and sizes of buffers allocated inside the RMO buffer.
+ */
+struct rtas_filter {
+   const char *name;
+   int token;
+   /* Indexes into the args buffer, -1 if not used */
+   int buf_idx1;
+   int size_idx1;
+   int buf_idx2;
+   int size_idx2;
+
+   int fixed_size;
+};
+
+static struct rtas_filter rtas_filters[] __ro_after_init = {
+   { "ibm,activate-firmware", -1, -1, -1, -1, -1 },
+   { "ibm,configure-connector", -1, 0, -1, 1, -1, 4096 },  /* Special cased */
+   { "display-character", -1, -1, -1, -1, -1 },
+   { "ibm,displ

Re: [PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources

2020-08-19 Thread Michael Ellerman
Frederic Barrat  writes:
> Fix typo introduced during recent code cleanup, which could lead to
> silently not freeing resources or oops message (on PCI hotplug or CAPI
> reset).
> Only impacts ioda2, the code path for ioda1 is correct.
>
> Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA 
> setup state")
> Signed-off-by: Frederic Barrat 

I changed the subject to:

powerpc/powernv/pci: Fix possible crash when releasing DMA resources


To hopefully better convey that it's a moderately serious bug, even if
the root cause is just a typo. Otherwise folks scanning the commit log
might think it's just a harmless typo.

cheers

> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c9c25fb0783c..023a4f987bb2 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2705,7 +2705,7 @@ void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe 
> *pe)
>   struct iommu_table *tbl = pe->table_group.tables[0];
>   int64_t rc;
>  
> - if (pe->dma_setup_done)
> + if (!pe->dma_setup_done)
>   return;
>  
>   rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> -- 
> 2.26.2


[RFC PATCH 2/2] KVM: PPC: Book3S HV: Support prefixed instructions

2020-08-19 Thread Jordan Niethe
There are two main places where instructions are loaded from the guest:
* Emulate loadstore - such as when performing MMIO emulation
  triggered by an HDSI
* After an HV emulation assistance interrupt (e40)

If a prefixed instruction triggers one of these cases, its suffix must be
loaded as well. Use the SRR1_PREFIXED bit to decide whether a suffix needs to
be loaded, and make sure that if this bit is set, inject_interrupt() also sets
it when delivering an interrupt to the guest.

ISA v3.10 extends the Hypervisor Emulation Instruction Register (HEIR)
to 64 bits long to accommodate prefixed instructions. For interrupts
caused by a word instruction the instruction is loaded into bits 32:63
and bits 0:31 are zeroed. When caused by a prefixed instruction the
prefix and suffix are loaded into bits 0:63.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/kvm/book3s.c   | 15 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 10 +++---
 arch/powerpc/kvm/book3s_hv_builtin.c|  3 +++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 14 ++
 4 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 70d8967acc9b..18b1928a571b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -456,13 +456,24 @@ int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
 {
ulong pc = kvmppc_get_pc(vcpu);
u32 word;
+   u64 doubleword;
int r;
 
if (type == INST_SC)
pc -= 4;
 
-   r = kvmppc_ld(vcpu, &pc, sizeof(u32), &word, false);
-   *inst = ppc_inst(word);
+   if ((kvmppc_get_msr(vcpu) & SRR1_PREFIXED)) {
+   r = kvmppc_ld(vcpu, &pc, sizeof(u64), &doubleword, false);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+   *inst = ppc_inst_prefix(doubleword & 0xffffffff, doubleword >> 32);
+#else
+   *inst = ppc_inst_prefix(doubleword >> 32, doubleword & 0xffffffff);
+#endif
+   } else {
+   r = kvmppc_ld(vcpu, &pc, sizeof(u32), &word, false);
+   *inst = ppc_inst(word);
+   }
+
if (r == EMULATE_DONE)
return r;
else
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 775ce41738ce..0802471f4856 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -411,9 +411,13 @@ static int instruction_is_store(struct ppc_inst instr)
unsigned int mask;
 
mask = 0x10000000;
-   if ((ppc_inst_val(instr) & 0xfc000000) == 0x7c000000)
-   mask = 0x100;   /* major opcode 31 */
-   return (ppc_inst_val(instr) & mask) != 0;
+   if (ppc_inst_prefixed(instr)) {
+   return (ppc_inst_suffix(instr) & mask) != 0;
+   } else {
+   if ((ppc_inst_val(instr) & 0xfc000000) == 0x7c000000)
+   mask = 0x100;   /* major opcode 31 */
+   return (ppc_inst_val(instr) & mask) != 0;
+   }
 }
 
 int kvmppc_hv_emulate_mmio(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index 073617ce83e0..41e07e63104b 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -807,6 +807,9 @@ static void inject_interrupt(struct kvm_vcpu *vcpu, int 
vec, u64 srr1_flags)
new_pc += 0xC0004000ULL;
}
 
+   if (msr & SRR1_PREFIXED)
+   srr1_flags |= SRR1_PREFIXED;
+
kvmppc_set_srr0(vcpu, pc);
kvmppc_set_srr1(vcpu, (msr & SRR1_MSR_BITS) | srr1_flags);
kvmppc_set_pc(vcpu, new_pc);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 4853b3444c5f..f2a609413621 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1365,6 +1365,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
cmpwi   r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST
bne 11f
mfspr   r3,SPRN_HEIR
+   andis.  r0,r11,SRR1_PREFIXED@h
+   cmpwi   r0,0
+   beq 12f
+   rldicl  r4,r3,0,32  /* Suffix */
+   srdir3,r3,32/* Prefix */
+   b   11f
+12:
+BEGIN_FTR_SECTION
+   rldicl  r3,r3,0,32  /* Word */
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_31)
 11:stw r3,VCPU_HEIR(r9)
stw r4,VCPU_HEIR+4(r9)
 
@@ -2175,6 +2185,10 @@ fast_interrupt_c_return:
ori r4, r3, MSR_DR  /* Enable paging for data */
mtmsrd  r4
lwz r8, 0(r10)
+   andis.  r7, r11, SRR1_PREFIXED@h
+   cmpwi   r7,0
+   beq +4
+   lwz r5, 4(r10)
mtmsrd  r3
 
/* Store the result */
-- 
2.17.1



[RFC PATCH 1/2] KVM: PPC: Use the ppc_inst type

2020-08-19 Thread Jordan Niethe
The ppc_inst type was added to help cope with the addition of prefixed
instructions to the ISA. Convert KVM to use this new type for dealing
with instructions. For now, do not try to add further support for
prefixed instructions.

Signed-off-by: Jordan Niethe 
---
 arch/powerpc/include/asm/disassemble.h| 80 +-
 arch/powerpc/include/asm/kvm_book3s.h |  4 +-
 arch/powerpc/include/asm/kvm_book3s_asm.h |  3 +-
 arch/powerpc/include/asm/kvm_host.h   |  5 +-
 arch/powerpc/include/asm/kvm_ppc.h| 14 ++--
 arch/powerpc/kernel/asm-offsets.c |  2 +
 arch/powerpc/kernel/kvm.c | 99 +++
 arch/powerpc/kvm/book3s.c |  6 +-
 arch/powerpc/kvm/book3s.h |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c   |  8 +-
 arch/powerpc/kvm/book3s_emulate.c | 30 +++
 arch/powerpc/kvm/book3s_hv.c  | 19 ++---
 arch/powerpc/kvm/book3s_hv_nested.c   |  4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  5 ++
 arch/powerpc/kvm/book3s_hv_tm.c   | 17 ++--
 arch/powerpc/kvm/book3s_hv_tm_builtin.c   | 12 +--
 arch/powerpc/kvm/book3s_paired_singles.c  | 15 ++--
 arch/powerpc/kvm/book3s_pr.c  | 20 ++---
 arch/powerpc/kvm/booke.c  | 18 ++---
 arch/powerpc/kvm/booke.h  |  4 +-
 arch/powerpc/kvm/booke_emulate.c  |  4 +-
 arch/powerpc/kvm/e500_emulate.c   |  6 +-
 arch/powerpc/kvm/e500_mmu_host.c  |  6 +-
 arch/powerpc/kvm/emulate.c| 15 ++--
 arch/powerpc/kvm/emulate_loadstore.c  |  8 +-
 arch/powerpc/kvm/powerpc.c|  4 +-
 arch/powerpc/kvm/trace.h  |  9 ++-
 arch/powerpc/kvm/trace_booke.h|  8 +-
 arch/powerpc/kvm/trace_pr.h   |  8 +-
 arch/powerpc/lib/inst.c   |  4 +-
 arch/powerpc/lib/sstep.c  |  4 +-
 arch/powerpc/sysdev/fsl_pci.c |  4 +-
 32 files changed, 237 insertions(+), 210 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h 
b/arch/powerpc/include/asm/disassemble.h
index 8d2ebc36d5e3..91dbe8e5cd13 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -10,75 +10,82 @@
 #define __ASM_PPC_DISASSEMBLE_H__
 
 #include 
+#include 
 
-static inline unsigned int get_op(u32 inst)
+static inline unsigned int get_op(struct ppc_inst inst)
 {
-   return inst >> 26;
+   return ppc_inst_val(inst) >> 26;
 }
 
-static inline unsigned int get_xop(u32 inst)
+static inline unsigned int get_xop(struct ppc_inst inst)
 {
-   return (inst >> 1) & 0x3ff;
+   return (ppc_inst_val(inst) >> 1) & 0x3ff;
 }
 
-static inline unsigned int get_sprn(u32 inst)
+static inline unsigned int get_sprn(struct ppc_inst inst)
 {
-   return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+   u32 word = ppc_inst_val(inst);
+
+   return ((word >> 16) & 0x1f) | ((word >> 6) & 0x3e0);
 }
 
-static inline unsigned int get_dcrn(u32 inst)
+static inline unsigned int get_dcrn(struct ppc_inst inst)
 {
-   return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+   u32 word = ppc_inst_val(inst);
+
+   return ((word >> 16) & 0x1f) | ((word >> 6) & 0x3e0);
 }
 
-static inline unsigned int get_tmrn(u32 inst)
+static inline unsigned int get_tmrn(struct ppc_inst inst)
 {
-   return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+   u32 word = ppc_inst_val(inst);
+
+   return ((word >> 16) & 0x1f) | ((word >> 6) & 0x3e0);
 }
 
-static inline unsigned int get_rt(u32 inst)
+static inline unsigned int get_rt(struct ppc_inst inst)
 {
-   return (inst >> 21) & 0x1f;
+   return (ppc_inst_val(inst) >> 21) & 0x1f;
 }
 
-static inline unsigned int get_rs(u32 inst)
+static inline unsigned int get_rs(struct ppc_inst inst)
 {
-   return (inst >> 21) & 0x1f;
+   return (ppc_inst_val(inst) >> 21) & 0x1f;
 }
 
-static inline unsigned int get_ra(u32 inst)
+static inline unsigned int get_ra(struct ppc_inst inst)
 {
-   return (inst >> 16) & 0x1f;
+   return (ppc_inst_val(inst) >> 16) & 0x1f;
 }
 
-static inline unsigned int get_rb(u32 inst)
+static inline unsigned int get_rb(struct ppc_inst inst)
 {
-   return (inst >> 11) & 0x1f;
+   return (ppc_inst_val(inst) >> 11) & 0x1f;
 }
 
-static inline unsigned int get_rc(u32 inst)
+static inline unsigned int get_rc(struct ppc_inst inst)
 {
-   return inst & 0x1;
+   return ppc_inst_val(inst) & 0x1;
 }
 
-static inline unsigned int get_ws(u32 inst)
+static inline unsigned int get_ws(struct ppc_inst inst)
 {
-   return (inst >> 11) & 0x1f;
+   return (ppc_inst_val(inst) >> 11) & 0x1f;
 }
 
-static inline unsigned int get_d(u32 inst)
+static inline unsigned int get_d(struct ppc_inst inst)
 {
-   return inst & 0xffff;
+   return ppc_inst_val(inst) & 0xffff;
 }
 
-static inline unsigned int get_oc(u32 inst)
+static inline unsigned int get_oc(struct ppc_inst inst)
 {
-   return (inst >> 

Re: [PATCH v3] powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory

2020-08-19 Thread Michael Ellerman
Christoph Hellwig  writes:
> On Tue, Aug 18, 2020 at 07:11:26PM -0300, Thiago Jung Bauermann wrote:
>> POWER secure guests (i.e., guests which use the Protection Execution
>> Facility) need to use SWIOTLB to be able to do I/O with the hypervisor, but
>> they don't need the SWIOTLB memory to be in low addresses since the
>> hypervisor doesn't have any addressing limitation.
>> 
>> This solves a SWIOTLB initialization problem we are seeing in secure guests
>> with 128 GB of RAM: they are configured with 4 GB of crashkernel reserved
>> memory, which leaves no space for SWIOTLB in low addresses.
>> 
>> To do this, we use mostly the same code as swiotlb_init(), but allocate the
>> buffer using memblock_alloc() instead of memblock_alloc_low().
>> 
>> Signed-off-by: Thiago Jung Bauermann 
>
> Looks fine to me (except for the pointlessly long comment lines, but I've
> been told that's the powerpc way).

They're 80 columns AFAICS?

cheers


Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support

2020-08-19 Thread Nicholas Piggin
Excerpts from Shawn Anastasio's message of August 19, 2020 6:59 am:
On 8/18/20 2:11 AM, Nicholas Piggin wrote:
>> Very reasonable point.
>> 
>> The problem we're trying to get a handle on is live partition migration
>> where a running guest might be using SAO then get migrated to a P10. I
>> don't think we have a good way to handle this case. Potentially the
>> hypervisor could revoke the page tables if the guest is running in hash
>> mode and the guest kernel could be taught about that and sigbus the
>> process, but in radix the guest controls those page tables and the SAO
>> state and I don't think there's a way to cause it to take a fault.
>> 
>> I also don't know what the proprietary hypervisor does here.
>> 
>> We could add it back, default to n, or make it bare metal only, or
>> somehow try to block live migration to a later CPU without the facility.
>> I wouldn't be against that.
> 
> 
> Admittedly I'm not too familiar with the specifics of live migration
> or guest memory management, but restoring the functionality and adding
> a way to prevent migration of SAO-using guests seems like a reasonable
> choice to me. Would this be done with help from the guest using some
> sort of infrastructure to signal to the hypervisor that SAO is in use,
> or entirely on the hypervisor by e.g. scanning the through the process
> table for SAO pages?

The first step might be to just re-add the functionality but disable
it by default if firmware_has_feature(FW_FEATURE_LPAR). You could have
a config or boot option to allow guests to use it at the cost of
migration compatibility.

That would probably be good enough for experimenting with the feature.
I think modifying the hypervisor and/or guest to deal with migration
is probably too much work to be justified at the moment.

>> It would be very interesting to know how it performs in such a "real"
>> situation. I don't know how well POWER9 has optimised it -- it's
>> possible that it's not much better than putting lwsync after every load
>> or store.
> 
> 
> This is definitely worth investigating in depth. That said, even if the
> performance on P9 isn't super great, I think the feature could still be
> useful, since it would offer more granularity than the sledgehammer
> approach of emitting lwsync everywhere.

Sure, we'd be interested to hear of results.

> I'd be happy to put in some of the work required to get this to a point
> where it can be reintroduced without breaking guest migration - I'd just
> need some pointers on getting started with whatever approach is decided on.

I think re-adding it as I said above would be okay. The code itself is 
not complex so that was not the reason for removal.

Thanks,
Nick



Re: [PATCH v2 1/4] powerpc/drmem: Make lmb_size 64 bit

2020-08-19 Thread Sasha Levin
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.1, v5.7.15, v5.4.58, v4.19.139, 
v4.14.193, v4.9.232, v4.4.232.

v5.8.1: Build OK!
v5.7.15: Build OK!
v5.4.58: Build OK!
v4.19.139: Build OK!
v4.14.193: Failed to apply! Possible dependencies:
22508f3dc985 ("powerpc/numa: Look up device node in of_get_usable_memory()")
2c77721552e5 ("powerpc: Move of_drconf_cell struct to asm/drmem.h")
35f80debaef0 ("powerpc/numa: Look up device node in of_get_assoc_arrays()")
514a9cb3316a ("powerpc/numa: Update numa code use walk_drmem_lmbs")
6195a5001f1d ("powerpc/pseries: Update memory hotplug code to use drmem LMB 
array")
6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT 
format")
b6eca183e23e ("powerpc/kernel: Enables memory hot-remove after reboot on 
pseries guests")
b88fc309d6ad ("powerpc/numa: Look up associativity array in 
of_drconf_to_nid_single")

v4.9.232: Failed to apply! Possible dependencies:
3a2df3798d4d ("powerpc/mm: Make switch_mm_irqs_off() out of line")
43ed84a891b7 ("powerpc/mm: Move pgdir setting into a helper")
5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering 
CPU features")
5d451a87e5eb ("powerpc/64: Retrieve number of L1 cache sets from 
device-tree")
6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT 
format")
70cd4c10b290 ("KVM: PPC: Book3S HV: Fix software walk of guest process page 
tables")
9b081e10805c ("powerpc: port 64 bits pgtable_cache to 32 bits")
a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with KVM")
bd067f83b084 ("powerpc/64: Fix naming of cache block vs. cache line")
dbcbfee0c81c ("powerpc/64: More definitions for POWER9")
e2827fe5c156 ("powerpc/64: Clean up ppc64_caches using a struct per cache")
f4329f2ecb14 ("powerpc/64s: Reduce exception alignment")

v4.4.232: Failed to apply! Possible dependencies:
11a6f6abd74a ("powerpc/mm: Move radix/hash common data structures to 
book3s64 headers")
26b6a3d9bb48 ("powerpc/mm: move pte headers to book3s directory")
3808a88985b4 ("powerpc: Move FW feature probing out of pseries probe()")
3dfcb315d81e ("powerpc/mm: make a separate copy for book3s")
5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering 
CPU features")
5d31a96e6c01 ("powerpc/livepatch: Add livepatch stack to struct 
thread_info")
6574ba950bbe ("powerpc/kernel: Convert cpu_has_feature() to returning bool")
6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT 
format")
a141cca3892b ("powerpc/mm: Add early_[cpu|mmu]_has_feature()")
a8ed87c92adf ("powerpc/mm/radix: Add MMU_FTR_RADIX")
b92a226e5284 ("powerpc: Move cpu_has_feature() to a separate file")
da6a97bf12d5 ("powerpc: Move epapr_paravirt_early_init() to 
early_init_devtree()")
f63e6d898760 ("powerpc/livepatch: Add livepatch header")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha


Re: [PATCH v2 3/4] powerpc/memhotplug: Make lmb size 64bit

2020-08-19 Thread Sasha Levin
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.1, v5.7.15, v5.4.58, v4.19.139, v4.14.193, v4.9.232, v4.4.232.

v5.8.1: Build OK!
v5.7.15: Build OK!
v5.4.58: Build OK!
v4.19.139: Failed to apply! Possible dependencies:
Unable to calculate

v4.14.193: Failed to apply! Possible dependencies:
Unable to calculate

v4.9.232: Failed to apply! Possible dependencies:
1a367063ca0c ("powerpc/pseries: Check memory device state before onlining/offlining")
25b587fba9a4 ("powerpc/pseries: Correct possible read beyond dlpar sysfs buffer")
333f7b76865b ("powerpc/pseries: Implement indexed-count hotplug memory add")
753843471cbb ("powerpc/pseries: Implement indexed-count hotplug memory remove")
943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
c21f515c7436 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
e70d59700fc3 ("powerpc/pseries: Introduce memory hotplug READD operation")
f84775c2d5d9 ("powerpc/pseries: Fix build break when MEMORY_HOTREMOVE=n")

v4.4.232: Failed to apply! Possible dependencies:
183deeea5871 ("powerpc/pseries: Consolidate CPU hotplug code to hotplug-cpu.c")
1a367063ca0c ("powerpc/pseries: Check memory device state before onlining/offlining")
1dc759566636 ("powerpc/pseries: Use kernel hotplug queue for PowerVM hotplug events")
1f859adb9253 ("powerpc/pseries: Verify CPU doesn't exist before adding")
25b587fba9a4 ("powerpc/pseries: Correct possible read beyond dlpar sysfs buffer")
333f7b76865b ("powerpc/pseries: Implement indexed-count hotplug memory add")
4a4bdfea7cb7 ("powerpc/pseries: Refactor dlpar_add_lmb() code")
753843471cbb ("powerpc/pseries: Implement indexed-count hotplug memory remove")
9054619ef54a ("powerpc/pseries: Add pseries hotplug workqueue")
943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
9dc512819e4b ("powerpc: Fix unused function warning 'lmb_to_memblock'")
bdf5fc633804 ("powerpc/pseries: Update LMB associativity index during DLPAR add/remove")
c21f515c7436 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
e70d59700fc3 ("powerpc/pseries: Introduce memory hotplug READD operation")
e9d764f80396 ("powerpc/pseries: Enable kernel CPU dlpar from sysfs")
ec999072442a ("powerpc/pseries: Auto-online hotplugged memory")
f84775c2d5d9 ("powerpc/pseries: Fix build break when MEMORY_HOTREMOVE=n")
fdb4f6e99ffa ("powerpc/pseries: Remove call to memblock_add()")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha


[PATCH net-next v2 4/4] ibmvnic: merge ibmvnic_reset_init and ibmvnic_init

2020-08-19 Thread Lijun Pan
These two functions share the majority of their code, so merge them
together. Meanwhile, add a reset pass-in parameter to differentiate
them. This makes the code easier to read and makes it easier to tell
the difference between reset_init and regular init.

Signed-off-by: Lijun Pan 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 65 ++
 1 file changed, 13 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 4ca4647db72a..47fbe0553570 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -104,8 +104,7 @@ static int send_login(struct ibmvnic_adapter *adapter);
 static void send_cap_queries(struct ibmvnic_adapter *adapter);
 static int init_sub_crqs(struct ibmvnic_adapter *);
 static int init_sub_crq_irqs(struct ibmvnic_adapter *adapter);
-static int ibmvnic_init(struct ibmvnic_adapter *);
-static int ibmvnic_reset_init(struct ibmvnic_adapter *);
+static int ibmvnic_reset_init(struct ibmvnic_adapter *, bool reset);
 static void release_crq_queue(struct ibmvnic_adapter *);
 static int __ibmvnic_set_mac(struct net_device *, u8 *);
 static int init_crq_queue(struct ibmvnic_adapter *adapter);
@@ -1868,7 +1867,7 @@ static int do_change_param_reset(struct ibmvnic_adapter 
*adapter,
return rc;
}
 
-   rc = ibmvnic_reset_init(adapter);
+   rc = ibmvnic_reset_init(adapter, true);
if (rc)
return IBMVNIC_INIT_FAILED;
 
@@ -1986,7 +1985,7 @@ static int do_reset(struct ibmvnic_adapter *adapter,
goto out;
}
 
-   rc = ibmvnic_reset_init(adapter);
+   rc = ibmvnic_reset_init(adapter, true);
if (rc) {
rc = IBMVNIC_INIT_FAILED;
goto out;
@@ -2093,7 +2092,7 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter,
return rc;
}
 
-   rc = ibmvnic_init(adapter);
+   rc = ibmvnic_reset_init(adapter, false);
if (rc)
return rc;
 
@@ -4970,7 +4969,7 @@ static int init_crq_queue(struct ibmvnic_adapter *adapter)
return retrc;
 }
 
-static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter)
+static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter, bool reset)
 {
struct device *dev = &adapter->vdev->dev;
unsigned long timeout = msecs_to_jiffies(3);
@@ -4979,10 +4978,12 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter 
*adapter)
 
adapter->from_passive_init = false;
 
-   old_num_rx_queues = adapter->req_rx_queues;
-   old_num_tx_queues = adapter->req_tx_queues;
+   if (reset) {
+   old_num_rx_queues = adapter->req_rx_queues;
+   old_num_tx_queues = adapter->req_tx_queues;
+   reinit_completion(&adapter->init_done);
+   }
 
-   reinit_completion(&adapter->init_done);
adapter->init_done_rc = 0;
rc = ibmvnic_send_crq_init(adapter);
if (rc) {
@@ -5000,7 +5001,8 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter 
*adapter)
return adapter->init_done_rc;
}
 
-   if (test_bit(0, &adapter->resetting) && !adapter->wait_for_reset &&
+   if (reset &&
+   test_bit(0, &adapter->resetting) && !adapter->wait_for_reset &&
adapter->reset_reason != VNIC_RESET_MOBILITY) {
if (adapter->req_rx_queues != old_num_rx_queues ||
adapter->req_tx_queues != old_num_tx_queues) {
@@ -5028,47 +5030,6 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter 
*adapter)
return rc;
 }
 
-static int ibmvnic_init(struct ibmvnic_adapter *adapter)
-{
-   struct device *dev = &adapter->vdev->dev;
-   unsigned long timeout = msecs_to_jiffies(3);
-   int rc;
-
-   adapter->from_passive_init = false;
-
-   adapter->init_done_rc = 0;
-   rc = ibmvnic_send_crq_init(adapter);
-   if (rc) {
-   dev_err(dev, "Send crq init failed with error %d\n", rc);
-   return rc;
-   }
-
-   if (!wait_for_completion_timeout(&adapter->init_done, timeout)) {
-   dev_err(dev, "Initialization sequence timed out\n");
-   return -1;
-   }
-
-   if (adapter->init_done_rc) {
-   release_crq_queue(adapter);
-   return adapter->init_done_rc;
-   }
-
-   rc = init_sub_crqs(adapter);
-   if (rc) {
-   dev_err(dev, "Initialization of sub crqs failed\n");
-   release_crq_queue(adapter);
-   return rc;
-   }
-
-   rc = init_sub_crq_irqs(adapter);
-   if (rc) {
-   dev_err(dev, "Failed to initialize sub crq irqs\n");
-   release_crq_queue(adapter);
-   }
-
-   return rc;
-}
-
 static struct device_attribute dev_attr_failover;
 
 static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)
@@ -5131,7 +509

[PATCH net-next v2 3/4] ibmvnic: remove never executed if statement

2020-08-19 Thread Lijun Pan
At the beginning of the function, from_passive_init is set to false by
"adapter->from_passive_init = false;",
hence the if statement that tests it can never run.

Signed-off-by: Lijun Pan 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 644352e5056d..4ca4647db72a 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -5000,12 +5000,6 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter 
*adapter)
return adapter->init_done_rc;
}
 
-   if (adapter->from_passive_init) {
-   adapter->state = VNIC_OPEN;
-   adapter->from_passive_init = false;
-   return -1;
-   }
-
if (test_bit(0, &adapter->resetting) && !adapter->wait_for_reset &&
adapter->reset_reason != VNIC_RESET_MOBILITY) {
if (adapter->req_rx_queues != old_num_rx_queues ||
@@ -5059,12 +5053,6 @@ static int ibmvnic_init(struct ibmvnic_adapter *adapter)
return adapter->init_done_rc;
}
 
-   if (adapter->from_passive_init) {
-   adapter->state = VNIC_OPEN;
-   adapter->from_passive_init = false;
-   return -1;
-   }
-
rc = init_sub_crqs(adapter);
if (rc) {
dev_err(dev, "Initialization of sub crqs failed\n");
-- 
2.23.0



[PATCH net-next v2 2/4] ibmvnic: improve ibmvnic_init and ibmvnic_reset_init

2020-08-19 Thread Lijun Pan
When the H_SEND_CRQ command returns H_CLOSED, it means the
server's CRQ is not ready yet. Instead of resetting immediately,
we wait for the server to launch passive init.
ibmvnic_init() and ibmvnic_reset_init() should also return the
error code from the ibmvnic_send_crq_init() call.

Signed-off-by: Lijun Pan 
---
v2: removes __func__ in error messages.

 drivers/net/ethernet/ibm/ibmvnic.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 65f5d99f97dc..644352e5056d 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3568,8 +3568,7 @@ static int ibmvnic_send_crq(struct ibmvnic_adapter 
*adapter,
if (rc) {
if (rc == H_CLOSED) {
dev_warn(dev, "CRQ Queue closed\n");
-   if (test_bit(0, &adapter->resetting))
-   ibmvnic_reset(adapter, VNIC_RESET_FATAL);
+   /* do not reset, report the fail, wait for passive init 
from server */
}
 
dev_warn(dev, "Send error (rc=%d)\n", rc);
@@ -4985,7 +4984,12 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter 
*adapter)
 
reinit_completion(&adapter->init_done);
adapter->init_done_rc = 0;
-   ibmvnic_send_crq_init(adapter);
+   rc = ibmvnic_send_crq_init(adapter);
+   if (rc) {
+   dev_err(dev, "Send crq init failed with error %d\n", rc);
+   return rc;
+   }
+
if (!wait_for_completion_timeout(&adapter->init_done, timeout)) {
dev_err(dev, "Initialization sequence timed out\n");
return -1;
@@ -5039,7 +5043,12 @@ static int ibmvnic_init(struct ibmvnic_adapter *adapter)
adapter->from_passive_init = false;
 
adapter->init_done_rc = 0;
-   ibmvnic_send_crq_init(adapter);
+   rc = ibmvnic_send_crq_init(adapter);
+   if (rc) {
+   dev_err(dev, "Send crq init failed with error %d\n", rc);
+   return rc;
+   }
+
if (!wait_for_completion_timeout(&adapter->init_done, timeout)) {
dev_err(dev, "Initialization sequence timed out\n");
return -1;
-- 
2.23.0



[PATCH net-next v2 1/4] ibmvnic: compare adapter->init_done_rc with more readable ibmvnic_rc_codes

2020-08-19 Thread Lijun Pan
Instead of comparing against the magic number (adapter->init_done_rc == 1),
use the more readable (adapter->init_done_rc == PARTIALSUCCESS).

Signed-off-by: Lijun Pan 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 5afb3c9c52d2..65f5d99f97dc 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -974,7 +974,7 @@ static int set_link_state(struct ibmvnic_adapter *adapter, 
u8 link_state)
return -1;
}
 
-   if (adapter->init_done_rc == 1) {
+   if (adapter->init_done_rc == PARTIALSUCCESS) {
/* Partuial success, delay and re-send */
mdelay(1000);
resend = true;
-- 
2.23.0



[PATCH net-next v2 0/4] refactoring of ibmvnic code

2020-08-19 Thread Lijun Pan
This patch series refactors the reset_init and init functions,
and makes some other cosmetic changes to make the code
easier to read and debug. v2 removes __func__ and drops v1's patch 1/5.

Lijun Pan (4):
  ibmvnic: compare adapter->init_done_rc with more readable
ibmvnic_rc_codes
  ibmvnic: improve ibmvnic_init and ibmvnic_reset_init
  ibmvnic: remove never executed if statement
  ibmvnic: merge ibmvnic_reset_init and ibmvnic_init

 drivers/net/ethernet/ibm/ibmvnic.c | 84 --
 1 file changed, 21 insertions(+), 63 deletions(-)

-- 
2.23.0



Re: fsl_espi errors on v5.7.15

2020-08-19 Thread Chris Packham

On 19/08/20 6:15 pm, Heiner Kallweit wrote:
> On 19.08.2020 00:44, Chris Packham wrote:
>> Hi Again,
>>
>> On 17/08/20 9:09 am, Chris Packham wrote:
>>
>>> On 14/08/20 6:19 pm, Heiner Kallweit wrote:
 On 14.08.2020 04:48, Chris Packham wrote:
> Hi,
>
> I'm seeing a problem with accessing spi-nor after upgrading a T2081
> based system to linux v5.7.15
>
> For this board u-boot and the u-boot environment live on spi-nor.
>
> When I use fw_setenv from userspace I get the following kernel logs
>
> # fw_setenv foo=1
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
> ...
>
 This error reporting doesn't exist yet in 4.4. So you may have an issue
 under 4.4 too, it's just not reported.
 Did you verify that under 4.4 fw_setenv actually has an effect?
>>> Just double checked and yes under 4.4 the setting does get saved.
> If I run fw_printenv (before getting it into a bad state) it is able to
> display the content of the board's u-boot environment.
>
 This might indicate an issue with spi being locked. I've seen related
 questions, just use the search engine of your choice and check for
 fw_setenv and locked.
>>> I'm running a version of fw_setenv which includes
>>> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it shouldn't
>>> be locking things unnecessarily.
> I've been unsuccessful in producing a setup for bisecting the issue. I do 
> know the issue doesn't occur on the old 4.4.x-based kernel but that's 
> probably not much help.
>
> Any pointers on what the issue (and/or solution) might be.
>> I finally managed to get our board running with a vanilla kernel. With
>> corenet64_smp_defconfig I occasionally see
>>
>>     fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>
>> other than the message things seem to be working.
>>
>> With a custom defconfig I see
>>
>>     fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set!
>>     fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty!
>>     fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>     ...
>>
>> and access to the spi-nor does not work until the board is reset.
>>
>> I'll try and pick apart the differences between the two defconfigs.

I now think my earlier testing is invalid. I have seen the problem with 
either defconfig if I try hard enough. I had convinced myself that the 
problem was CONFIG_PREEMPT but that was before I found boot-to-boot 
differences with the same kernel.

It's possible that I'm chasing multiple issues with the same symptom.

The error I'm most concerned with is in the sequence
1. boot with old image
2. write environment
3. boot with new image
4. write environment
5. write fails and environment is corrupted

After I recover the system things sometimes seem fine. Until I repeat 
the sequence above.

> Also relevant may be:
> - Which dts are you using?
Custom but based heavily on the t2080rdb.
> - What's the spi-nor type, and at which frequency are you operating it?
The board has several alternate parts for the spi-nor, so the dts just 
specifies compatible = "jedec,spi-nor". The actual chip detected on the 
board I have is "n25q032a (4096 Kbytes)". The dts sets spi-max-frequency 
= <1000>; I haven't measured the actual frequency on the bus.
> - Does the issue still happen if you lower the frequency?
I did play around with the frequency initially but I should probably 
give that another go now that I have a better reproduction method.

Re: [PATCH v3 09/17] memblock: make memblock_debug and related functionality private

2020-08-19 Thread Mike Rapoport
On Wed, Aug 19, 2020 at 12:24:05PM -0700, Andrew Morton wrote:
> On Tue, 18 Aug 2020 18:16:26 +0300 Mike Rapoport  wrote:
> 
> > From: Mike Rapoport 
> > 
> > The only user of memblock_dbg() outside memblock was s390 setup code and it
> > is converted to use pr_debug() instead.
> > This allows to stop exposing memblock_debug and memblock_dbg() to the rest
> > of the kernel.
> > 
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -137,7 +137,10 @@ struct memblock_type physmem = {
> >  i < memblock_type->cnt;\
> >  i++, rgn = &memblock_type->regions[i])
> >  
> > -int memblock_debug __initdata_memblock;
> > +#define memblock_dbg(fmt, ...) \
> > +   if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> > +
> 
> checkpatch doesn't like this much.
> 
> ERROR: Macros starting with if should be enclosed by a do - while loop to 
> avoid possible if/else logic defects
> #101: FILE: mm/memblock.c:140:
> +#define memblock_dbg(fmt, ...) \
> + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> 
> WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then 
> dev_info(dev, ... then pr_info(...  to printk(KERN_INFO ...
> #102: FILE: mm/memblock.c:141:
> + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> 
> ERROR: trailing statements should be on next line
> #102: FILE: mm/memblock.c:141:
> + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> 
> 
> The first one is significant:
> 
>   if (foo)
>   memblock_dbg(...);
>   else
>   save_the_world();
> 
> could end up inadvertently destroying the planet.

Well, it didn't till now ;-)

> This?
> 
> --- 
> a/mm/memblock.c~memblock-make-memblock_debug-and-related-functionality-private-fix
> +++ a/mm/memblock.c
> @@ -137,8 +137,11 @@ struct memblock_type physmem = {
>i < memblock_type->cnt;\
>i++, rgn = &memblock_type->regions[i])
>  
> -#define memblock_dbg(fmt, ...) \
> - if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> +#define memblock_dbg(fmt, ...)   
> \
> + do {\
> + if (memblock_debug) \
> + pr_info(fmt, ##__VA_ARGS__);\
> + } while (0)

Sure, thanks!

>  static int memblock_debug __initdata_memblock;
>  static bool system_has_some_mirror __initdata_memblock = false;
> _
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v3] powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory

2020-08-19 Thread Thiago Jung Bauermann


Konrad Rzeszutek Wilk  writes:

> On Tue, Aug 18, 2020 at 07:11:26PM -0300, Thiago Jung Bauermann wrote:
>> POWER secure guests (i.e., guests which use the Protection Execution
>> Facility) need to use SWIOTLB to be able to do I/O with the hypervisor, but
>> they don't need the SWIOTLB memory to be in low addresses since the
>> hypervisor doesn't have any addressing limitation.
>> 
>> This solves a SWIOTLB initialization problem we are seeing in secure guests
>> with 128 GB of RAM: they are configured with 4 GB of crashkernel reserved
>> memory, which leaves no space for SWIOTLB in low addresses.
>> 
>> To do this, we use mostly the same code as swiotlb_init(), but allocate the
>> buffer using memblock_alloc() instead of memblock_alloc_low().
>> 
>> Signed-off-by: Thiago Jung Bauermann 
>
> Reviewed-by: Konrad Rzeszutek Wilk 

Thanks!

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v3 09/17] memblock: make memblock_debug and related functionality private

2020-08-19 Thread Andrew Morton
On Tue, 18 Aug 2020 18:16:26 +0300 Mike Rapoport  wrote:

> From: Mike Rapoport 
> 
> The only user of memblock_dbg() outside memblock was s390 setup code and it
> is converted to use pr_debug() instead.
> This allows to stop exposing memblock_debug and memblock_dbg() to the rest
> of the kernel.
> 
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -137,7 +137,10 @@ struct memblock_type physmem = {
>i < memblock_type->cnt;\
>i++, rgn = &memblock_type->regions[i])
>  
> -int memblock_debug __initdata_memblock;
> +#define memblock_dbg(fmt, ...) \
> + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
> +

checkpatch doesn't like this much.

ERROR: Macros starting with if should be enclosed by a do - while loop to avoid 
possible if/else logic defects
#101: FILE: mm/memblock.c:140:
+#define memblock_dbg(fmt, ...) \
+   if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)

WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then 
dev_info(dev, ... then pr_info(...  to printk(KERN_INFO ...
#102: FILE: mm/memblock.c:141:
+   if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)

ERROR: trailing statements should be on next line
#102: FILE: mm/memblock.c:141:
+   if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)


The first one is significant:

if (foo)
memblock_dbg(...);
else
save_the_world();

could end up inadvertently destroying the planet.

This?

--- 
a/mm/memblock.c~memblock-make-memblock_debug-and-related-functionality-private-fix
+++ a/mm/memblock.c
@@ -137,8 +137,11 @@ struct memblock_type physmem = {
 i < memblock_type->cnt;\
 i++, rgn = &memblock_type->regions[i])
 
-#define memblock_dbg(fmt, ...) \
-   if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
+#define memblock_dbg(fmt, ...) \
+   do {\
+   if (memblock_debug) \
+   pr_info(fmt, ##__VA_ARGS__);\
+   } while (0)
 
 static int memblock_debug __initdata_memblock;
 static bool system_has_some_mirror __initdata_memblock = false;
_



Re: [PATCH net-next 3/5] ibmvnic: improve ibmvnic_init and ibmvnic_reset_init

2020-08-19 Thread David Miller
From: Lijun Pan 
Date: Wed, 19 Aug 2020 00:35:10 -0500

> + if (rc) {
> + dev_err(dev, "%s: Send crq init failed with error %d\n", 
> __func__, rc);
> + return rc;

Consistent with my feedback on patch #1, please get rid of this __func__ stuff.

Thank you.


Re: [PATCH] powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10

2020-08-19 Thread Pratik Sampat




On 19/08/20 3:17 pm, Nicholas Piggin wrote:

This driver does not restore stop > 3 state, so it limits itself
to states which do not lose full state or TB.

The POWER10 SPRs are sufficiently different from P9 that it seems
easier to split out the P10 code. The POWER10 deep sleep code
(e.g., the BHRB restore) has been taken out, but it can be re-added
when stop > 3 support is added.

Cc: Ryan P Grimm 
Cc: Michael Neuling 
Cc: Gautham R. Shenoy 
Cc: Pratik Rajesh Sampat 
Signed-off-by: Nicholas Piggin 


Tested-by: Pratik Rajesh Sampat
Reviewed-by: Pratik Rajesh Sampat

I've tested the patch on P9 and P10, can verify the corresponding code
paths are being taken for their respective architecture and cpuidle is
being exercised with the correct set of advertised stop states in each
case.

The patch looks good to me.

---
Thanks,
Pratik

 



Re: [PATCH net-next 1/5] ibmvnic: print caller in several error messages

2020-08-19 Thread David Miller
From: Lijun Pan 
Date: Wed, 19 Aug 2020 00:35:08 -0500

> The error messages in the changed functions are exactly the same.
> In order to differentiate them and make debugging easier,
> we print the function names in the error messages.
> 
> Signed-off-by: Lijun Pan 

I don't see any value in emitting function names in kernel
error log messages.

Sorry.


Re: [PATCH] powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10

2020-08-19 Thread Pratik Sampat

On 19/08/20 3:17 pm, Nicholas Piggin wrote:

This driver does not restore stop > 3 state, so it limits itself
to states which do not lose full state or TB.

The POWER10 SPRs are sufficiently different from P9 that it seems
easier to split out the P10 code. The POWER10 deep sleep code
(e.g., the BHRB restore) has been taken out, but it can be re-added
when stop > 3 support is added.

Cc: Ryan P Grimm 
Cc: Michael Neuling 
Cc: Gautham R. Shenoy 
Cc: Pratik Rajesh Sampat 
Signed-off-by: Nicholas Piggin 

Fixes: 8747bf36f312 ("powerpc/powernv/idle: Replace CPU feature check with PVR check")


The patch fixes commit 8747bf36f312, which had set up idle only for the P9 PVR.


---
Thanks for fixing this!
Pratik


Re: [PATCH] powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10

2020-08-19 Thread Gautham R Shenoy
On Wed, Aug 19, 2020 at 07:47:00PM +1000, Nicholas Piggin wrote:
> This driver does not restore stop > 3 state, so it limits itself
> to states which do not lose full state or TB.
> 
> The POWER10 SPRs are sufficiently different from P9 that it seems
> easier to split out the P10 code. The POWER10 deep sleep code
> (e.g., the BHRB restore) has been taken out, but it can be re-added
> when stop > 3 support is added.

MMCRA[BHRB] save/restore was in the shallow stop-state path. But we
can add it back later.

> 
> Cc: Ryan P Grimm 
> Cc: Michael Neuling 
> Cc: Gautham R. Shenoy 
> Cc: Pratik Rajesh Sampat 
> Signed-off-by: Nicholas Piggin 

Just a minor comment below. But otherwise the patch looks good to me.

[..snip..]

> @@ -1093,11 +1200,15 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 
> *psscr_mask, u32 flags)
>   * @dt_idle_states: Number of idle state entries
>   * Returns 0 on success
>   */
> -static void __init pnv_power9_idle_init(void)
> +static void __init pnv_arch300_idle_init(void)
>  {
>   u64 max_residency_ns = 0;
>   int i;
> 
> + /* stop is not really architected, we only have p9,p10 drivers */
> + if (!pvr_version_is(PVR_POWER10) && !pvr_version_is(PVR_POWER9))
> + return;
> +
>   /*
>* pnv_deepest_stop_{val,mask} should be set to values corresponding to
>* the deepest stop state.
> @@ -1112,6 +1223,11 @@ static void __init pnv_power9_idle_init(void)
>   struct pnv_idle_states_t *state = &pnv_idle_states[i];
>   u64 psscr_rl = state->psscr_val & PSSCR_RL_MASK;
> 
> + /* No deep loss driver implemented for POWER10 yet */
> + if (pvr_version_is(PVR_POWER10) &&
> + state->flags & 
> (OPAL_PM_TIMEBASE_STOP|OPAL_PM_LOSE_FULL_CONTEXT))
> + continue;
> +

Should we have a pr_info() informing the user that the kernel is skipping
over these stop states?

Reviewed-by: Gautham R. Shenoy 

>   if ((state->flags & OPAL_PM_TIMEBASE_STOP) &&
>(pnv_first_tb_loss_level > psscr_rl))
>   pnv_first_tb_loss_level = psscr_rl;

--
Thanks and Regards
gautham.


Re: [PATCH] powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10

2020-08-19 Thread Vaidyanathan Srinivasan
* Nicholas Piggin  [2020-08-19 19:47:00]:

> This driver does not restore stop > 3 state, so it limits itself
> to states which do not lose full state or TB.
> 
> The POWER10 SPRs are sufficiently different from P9 that it seems
> easier to split out the P10 code. The POWER10 deep sleep code
> (e.g., the BHRB restore) has been taken out, but it can be re-added
> when stop > 3 support is added.
> 
> Cc: Ryan P Grimm 
> Cc: Michael Neuling 
> Cc: Gautham R. Shenoy 
> Cc: Pratik Rajesh Sampat 
> Signed-off-by: Nicholas Piggin 


Tested-by: Vaidyanathan Srinivasan 

This patch series was tested on P9 and P10. The correct set of
stop states was discovered and enabled on the platform.

Further, the SPRs saved and restored by this code have also been
tested, along with cpuidle state tests and CPU offline/online tests.

Thanks for the patch.

--Vaidy



Re: [PATCH v3] powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory

2020-08-19 Thread Konrad Rzeszutek Wilk
On Tue, Aug 18, 2020 at 07:11:26PM -0300, Thiago Jung Bauermann wrote:
> POWER secure guests (i.e., guests which use the Protection Execution
> Facility) need to use SWIOTLB to be able to do I/O with the hypervisor, but
> they don't need the SWIOTLB memory to be in low addresses since the
> hypervisor doesn't have any addressing limitation.
> 
> This solves a SWIOTLB initialization problem we are seeing in secure guests
> with 128 GB of RAM: they are configured with 4 GB of crashkernel reserved
> memory, which leaves no space for SWIOTLB in low addresses.
> 
> To do this, we use mostly the same code as swiotlb_init(), but allocate the
> buffer using memblock_alloc() instead of memblock_alloc_low().
> 
> Signed-off-by: Thiago Jung Bauermann 

Reviewed-by: Konrad Rzeszutek Wilk 


> ---
>  arch/powerpc/include/asm/svm.h   |  4 
>  arch/powerpc/mm/mem.c|  6 +-
>  arch/powerpc/platforms/pseries/svm.c | 26 ++
>  3 files changed, 35 insertions(+), 1 deletion(-)
> 
> Changes from v2:
> - Panic if unable to allocate buffer, as suggested by Christoph.
> 
> Changes from v1:
> - Open-code swiotlb_init() in arch-specific code, as suggested by
>   Christoph.
> 
> diff --git a/arch/powerpc/include/asm/svm.h b/arch/powerpc/include/asm/svm.h
> index 85580b30aba4..7546402d796a 100644
> --- a/arch/powerpc/include/asm/svm.h
> +++ b/arch/powerpc/include/asm/svm.h
> @@ -15,6 +15,8 @@ static inline bool is_secure_guest(void)
>   return mfmsr() & MSR_S;
>  }
>  
> +void __init svm_swiotlb_init(void);
> +
>  void dtl_cache_ctor(void *addr);
>  #define get_dtl_cache_ctor() (is_secure_guest() ? dtl_cache_ctor : NULL)
>  
> @@ -25,6 +27,8 @@ static inline bool is_secure_guest(void)
>   return false;
>  }
>  
> +static inline void svm_swiotlb_init(void) {}
> +
>  #define get_dtl_cache_ctor() NULL
>  
>  #endif /* CONFIG_PPC_SVM */
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index c2c11eb8dcfc..0f21bcb16405 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -50,6 +50,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -290,7 +291,10 @@ void __init mem_init(void)
>* back to to-down.
>*/
>   memblock_set_bottom_up(true);
> - swiotlb_init(0);
> + if (is_secure_guest())
> + svm_swiotlb_init();
> + else
> + swiotlb_init(0);
>  #endif
>  
>   high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
> diff --git a/arch/powerpc/platforms/pseries/svm.c 
> b/arch/powerpc/platforms/pseries/svm.c
> index 40c0637203d5..81085eb8f225 100644
> --- a/arch/powerpc/platforms/pseries/svm.c
> +++ b/arch/powerpc/platforms/pseries/svm.c
> @@ -7,6 +7,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -34,6 +35,31 @@ static int __init init_svm(void)
>  }
>  machine_early_initcall(pseries, init_svm);
>  
> +/*
> + * Initialize SWIOTLB. Essentially the same as swiotlb_init(), except that it
> + * can allocate the buffer anywhere in memory. Since the hypervisor doesn't 
> have
> + * any addressing limitation, we don't need to allocate it in low addresses.
> + */
> +void __init svm_swiotlb_init(void)
> +{
> + unsigned char *vstart;
> + unsigned long bytes, io_tlb_nslabs;
> +
> + io_tlb_nslabs = (swiotlb_size_or_default() >> IO_TLB_SHIFT);
> + io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
> +
> + bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> +
> + vstart = memblock_alloc(PAGE_ALIGN(bytes), PAGE_SIZE);
> + if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, false))
> + return;
> +
> + if (io_tlb_start)
> + memblock_free_early(io_tlb_start,
> + PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
> + panic("SVM: Cannot allocate SWIOTLB buffer");
> +}
> +
>  int set_memory_encrypted(unsigned long addr, int numpages)
>  {
>   if (!PAGE_ALIGNED(addr))


Re: [PATCH v3] powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory

2020-08-19 Thread Thiago Jung Bauermann


Christoph Hellwig  writes:

> On Tue, Aug 18, 2020 at 07:11:26PM -0300, Thiago Jung Bauermann wrote:
>> POWER secure guests (i.e., guests which use the Protection Execution
>> Facility) need to use SWIOTLB to be able to do I/O with the hypervisor, but
>> they don't need the SWIOTLB memory to be in low addresses since the
>> hypervisor doesn't have any addressing limitation.
>> 
>> This solves a SWIOTLB initialization problem we are seeing in secure guests
>> with 128 GB of RAM: they are configured with 4 GB of crashkernel reserved
>> memory, which leaves no space for SWIOTLB in low addresses.
>> 
>> To do this, we use mostly the same code as swiotlb_init(), but allocate the
>> buffer using memblock_alloc() instead of memblock_alloc_low().
>> 
>> Signed-off-by: Thiago Jung Bauermann 
>
> Looks fine to me (except for the pointlessly long comment lines, but I've
> been told that's the powerpc way).

Thanks! Do I have your Reviewed-by?

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v2 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.

2020-08-19 Thread kernel test robot
Hi Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on hnaz-linux-mm/master]
[also build test ERROR on powerpc/next linus/master v5.9-rc1 next-20200819]
[cannot apply to mmotm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/mm-debug_vm_pgtable-fixes/20200819-213446
base:   https://github.com/hnaz/linux-mm master
config: i386-randconfig-s002-20200818 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.2-183-gaa6ede3b-dirty
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> arch/x86/mm/ioremap.c:484:12: error: redefinition of 
>> 'arch_ioremap_p4d_supported'
 484 | int __init arch_ioremap_p4d_supported(void)
 |^~
   In file included from arch/x86/mm/ioremap.c:12:
   include/linux/io.h:41:19: note: previous definition of 
'arch_ioremap_p4d_supported' was here
  41 | static inline int arch_ioremap_p4d_supported(void)
 |   ^~
>> arch/x86/mm/ioremap.c:489:12: error: redefinition of 
>> 'arch_ioremap_pud_supported'
 489 | int __init arch_ioremap_pud_supported(void)
 |^~
   In file included from arch/x86/mm/ioremap.c:12:
   include/linux/io.h:45:19: note: previous definition of 
'arch_ioremap_pud_supported' was here
  45 | static inline int arch_ioremap_pud_supported(void)
 |   ^~
>> arch/x86/mm/ioremap.c:498:12: error: redefinition of 
>> 'arch_ioremap_pmd_supported'
 498 | int __init arch_ioremap_pmd_supported(void)
 |^~
   In file included from arch/x86/mm/ioremap.c:12:
   include/linux/io.h:49:19: note: previous definition of 
'arch_ioremap_pmd_supported' was here
  49 | static inline int arch_ioremap_pmd_supported(void)
 |   ^~
   arch/x86/mm/ioremap.c:737:17: warning: no previous prototype for 
'early_memremap_pgprot_adjust' [-Wmissing-prototypes]
 737 | pgprot_t __init early_memremap_pgprot_adjust(resource_size_t 
phys_addr,
 | ^~~~

# 
https://github.com/0day-ci/linux/commit/260b675444a7d5afa9ccef47ca9e588fb18d01a3
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Aneesh-Kumar-K-V/mm-debug_vm_pgtable-fixes/20200819-213446
git checkout 260b675444a7d5afa9ccef47ca9e588fb18d01a3
vim +/arch_ioremap_p4d_supported +484 arch/x86/mm/ioremap.c

^1da177e4c3f41 arch/i386/mm/ioremap.c Linus Torvalds2005-04-16  483  
0f472d04f59ff8 arch/x86/mm/ioremap.c  Anshuman Khandual 2019-07-16 @484  int 
__init arch_ioremap_p4d_supported(void)
0f472d04f59ff8 arch/x86/mm/ioremap.c  Anshuman Khandual 2019-07-16  485  {
0f472d04f59ff8 arch/x86/mm/ioremap.c  Anshuman Khandual 2019-07-16  486 
return 0;
0f472d04f59ff8 arch/x86/mm/ioremap.c  Anshuman Khandual 2019-07-16  487  }
0f472d04f59ff8 arch/x86/mm/ioremap.c  Anshuman Khandual 2019-07-16  488  
1e6277de3a2337 arch/x86/mm/ioremap.c  Jan Beulich   2015-05-28 @489  int 
__init arch_ioremap_pud_supported(void)
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani2015-04-14  490  {
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani2015-04-14  491  #ifdef 
CONFIG_X86_64
b8291adc191abe arch/x86/mm/ioremap.c  Borislav Petkov   2016-03-29  492 
return boot_cpu_has(X86_FEATURE_GBPAGES);
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani2015-04-14  493  #else
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani2015-04-14  494 
return 0;
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani2015-04-14  495  #endif
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani2015-04-14  496  }
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani2015-04-14  497  
1e6277de3a2337 arch/x86/mm/ioremap.c  Jan Beulich   2015-05-28 @498  int 
__init arch_ioremap_pmd_supported(void)
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani2015-04-14  499  {
16bf92261b1b6c arch/x86/mm/ioremap.c  Borislav Petkov   2016-03-29  500 
return boot_cpu_has(X86_FEATURE_PSE);
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani2015-04-14  501  }
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani2015-04-14  502  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH 1/2] lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state

2020-08-19 Thread peterz
On Wed, Aug 19, 2020 at 05:32:50PM +0200, pet...@infradead.org wrote:
> On Wed, Aug 19, 2020 at 08:39:13PM +1000, Alexey Kardashevskiy wrote:
> 
> > > or current upstream?
> > 
> > The upstream 18445bf405cb (13 hours old) also shows the problem. Yours
> > 1/2 still fixes it.
> 
> Afaict that just reduces the window.
> 
> Isn't the problem that:
> 
> arch/powerpc/kernel/exceptions-64e.S
> 
>   START_EXCEPTION(perfmon);
>   NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR,
>   PROLOG_ADDITION_NONE)
>   EXCEPTION_COMMON(0x260)
>   INTS_DISABLE
> #   RECONCILE_IRQ_STATE
> # TRACE_DISABLE_INTS
> #   TRACE_WITH_FRAME_BUFFER(trace_hardirqs_off)
> #
> # but we haven't done nmi_enter() yet... whoopsy
> 
>   CHECK_NAPPING()
>   addir3,r1,STACK_FRAME_OVERHEAD
>   bl  performance_monitor_exception
> #  perf_irq()
> #  perf_event_interrupt
> #  __perf_event_interrupt
> #   nmi_enter()
> 
> 
> 
> That is, afaict your entry code is buggered.

That is, patch 1/2 doesn't change the case:

local_irq_enable()
  trace_hardirqs_on()
  <NMI>
    trace_hardirqs_off()
    ...
    if (regs_irqs_disabled(regs)) // false
      trace_hardirqs_on();
  </NMI>
  raw_local_irq_enable()

Where local_irq_enable() has done trace_hardirqs_on() and the NMI hits
and undoes it, but doesn't re-do it because the hardware state is still
disabled.

What's supposed to happen is:


  <NMI>
    nmi_enter()
    trace_hardirqs_off() // no-op, because in_nmi() (or previously because lockdep_off())
    ...
  </NMI>




Re: [PATCH 1/2] lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state

2020-08-19 Thread peterz
On Wed, Aug 19, 2020 at 08:39:13PM +1000, Alexey Kardashevskiy wrote:

> > or current upstream?
> 
> The upstream 18445bf405cb (13 hours old) also shows the problem. Yours
> 1/2 still fixes it.

Afaict that just reduces the window.

Isn't the problem that:

arch/powerpc/kernel/exceptions-64e.S

START_EXCEPTION(perfmon);
NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR,
PROLOG_ADDITION_NONE)
EXCEPTION_COMMON(0x260)
INTS_DISABLE
# RECONCILE_IRQ_STATE
#   TRACE_DISABLE_INTS
# TRACE_WITH_FRAME_BUFFER(trace_hardirqs_off)
#
# but we haven't done nmi_enter() yet... whoopsy

CHECK_NAPPING()
addir3,r1,STACK_FRAME_OVERHEAD
bl  performance_monitor_exception
#perf_irq()
#  perf_event_interrupt
#__perf_event_interrupt
# nmi_enter()



That is, afaict your entry code is buggered.



Re: [PATCH v2 00/13] mm/debug_vm_pgtable fixes

2020-08-19 Thread Aneesh Kumar K.V
"Aneesh Kumar K.V"  writes:

> This patch series includes fixes for debug_vm_pgtable test code so that
> they follow page table updates rules correctly. The first two patches 
> introduce
> changes w.r.t ppc64. The patches are included in this series for 
> completeness. We can
> merge them via ppc64 tree if required.
>
> Hugetlb test is disabled on ppc64 because that needs larger change to satisfy
> page table update rules.
>
> Changes from V1:
> * Address review feedback
> * drop test specific pfn_pte and pfn_pmd.
> * Update ppc64 page table helper to add _PAGE_PTE 
>
> Aneesh Kumar K.V (13):
>   powerpc/mm: Add DEBUG_VM WARN for pmd_clear
>   powerpc/mm: Move setting pte specific flags to pfn_pte
>   mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value
>   mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge
> vmap support.
>   mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with
> CONFIG_NUMA_BALANCING
>   mm/debug_vm_pgtable/THP: Mark the pte entry huge before using
> set_pmd/pud_at
>   mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an
> existing pte entry
>   mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP
>   mm/debug_vm_pgtable/locks: Move non page table modifying test together
>   mm/debug_vm_pgtable/locks: Take correct page table lock
>   mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
>   mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
>   mm/debug_vm_pgtable: populate a pte entry before fetching it
>
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  29 +++-
>  arch/powerpc/include/asm/nohash/pgtable.h|   5 -
>  arch/powerpc/mm/book3s64/pgtable.c   |   2 +-
>  arch/powerpc/mm/pgtable.c|   5 -
>  include/linux/io.h   |  12 ++
>  mm/debug_vm_pgtable.c| 151 +++
>  6 files changed, 127 insertions(+), 77 deletions(-)
>

BTW I picked the wrong branch when sending this. Attaching the diff
against what I want to send. pfn_pmd() no longer updates _PAGE_PTE
because that is handled by pmd_mkhuge().

diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 3b4da7c63e28..e18ae50a275c 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -141,7 +141,7 @@ pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
unsigned long pmdv;
 
pmdv = (pfn << PAGE_SHIFT) & PTE_RPN_MASK;
-   return __pmd(pmdv | pgprot_val(pgprot) | _PAGE_PTE);
+   return pmd_set_protbits(__pmd(pmdv), pgprot);
 }
 
 pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 7d9f8e1d790f..cad61d22f33a 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -229,7 +229,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
long pfn, pgprot_t prot)
 
 static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
 {
-   pmd_t pmd = pfn_pmd(pfn, prot);
+   pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
 
if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
return;


Re: [PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources

2020-08-19 Thread Frederic Barrat




On 19/08/2020 at 15:15, Oliver O'Halloran wrote:

On Wed, Aug 19, 2020 at 11:07 PM Frederic Barrat  wrote:


Fix a typo introduced during a recent code cleanup, which could lead to
silently not freeing resources or an oops message (on PCI hotplug or CAPI
reset).


oof

Did you actually hit that oops on CAPI reset? Including the stack
trace is helpful for avoiding this sort of problem in the future.
Anyway,


yeah, I found it with capi reset. It's actually not an oops, we hit the
WARN_ON in iommu_tce_table_put() when we try to release the DMA
resources for a PE where they were never configured to start with. So
that was easy to track.


  Fred



Reviewed-by: Oliver O'Halloran 


Only impacts ioda2, the code path for ioda1 is correct.


yeah but no one uses ioda1


Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA setup 
state")
Signed-off-by: Frederic Barrat 
---
  arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c9c25fb0783c..023a4f987bb2 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2705,7 +2705,7 @@ void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe)
 struct iommu_table *tbl = pe->table_group.tables[0];
 int64_t rc;

-   if (pe->dma_setup_done)
+   if (!pe->dma_setup_done)
 return;

 rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
--
2.26.2



Re: [PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources

2020-08-19 Thread Oliver O'Halloran
On Wed, Aug 19, 2020 at 11:07 PM Frederic Barrat  wrote:
>
> Fix a typo introduced during a recent code cleanup, which could lead to
> silently not freeing resources or an oops message (on PCI hotplug or CAPI
> reset).

oof

Did you actually hit that oops on CAPI reset? Including the stack
trace is helpful for avoiding this sort of problem in the future.
Anyway,

Reviewed-by: Oliver O'Halloran 

> Only impacts ioda2, the code path for ioda1 is correct.

yeah but no one uses ioda1

> Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA 
> setup state")
> Signed-off-by: Frederic Barrat 
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c9c25fb0783c..023a4f987bb2 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2705,7 +2705,7 @@ void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe 
> *pe)
> struct iommu_table *tbl = pe->table_group.tables[0];
> int64_t rc;
>
> -   if (pe->dma_setup_done)
> +   if (!pe->dma_setup_done)
> return;
>
> rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> --
> 2.26.2
>


[PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources

2020-08-19 Thread Frederic Barrat
Fix a typo introduced during a recent code cleanup, which could lead to
silently not freeing resources or an oops message (on PCI hotplug or CAPI
reset).
Only impacts ioda2, the code path for ioda1 is correct.

Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA 
setup state")
Signed-off-by: Frederic Barrat 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c9c25fb0783c..023a4f987bb2 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2705,7 +2705,7 @@ void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe)
struct iommu_table *tbl = pe->table_group.tables[0];
int64_t rc;
 
-   if (pe->dma_setup_done)
+   if (!pe->dma_setup_done)
return;
 
rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
-- 
2.26.2



[PATCH v2 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.

2020-08-19 Thread Aneesh Kumar K.V
ppc64 supports huge vmap only with radix translation. Hence use the arch helper
to determine huge vmap support.

Signed-off-by: Aneesh Kumar K.V 
---
 include/linux/io.h| 12 
 mm/debug_vm_pgtable.c |  4 ++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/io.h b/include/linux/io.h
index 8394c56babc2..0b1ecda0cc86 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -38,6 +38,18 @@ int arch_ioremap_pud_supported(void);
 int arch_ioremap_pmd_supported(void);
 #else
 static inline void ioremap_huge_init(void) { }
+static inline int arch_ioremap_p4d_supported(void)
+{
+   return false;
+}
+static inline int arch_ioremap_pud_supported(void)
+{
+   return false;
+}
+static inline int arch_ioremap_pmd_supported(void)
+{
+   return false;
+}
 #endif
 
 /*
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 57259e2dbd17..cf3c4792b4a2 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -206,7 +206,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned 
long pfn, pgprot_t prot)
 {
pmd_t pmd;
 
-   if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+   if (!arch_ioremap_pmd_supported())
return;
 
pr_debug("Validating PMD huge\n");
@@ -320,7 +320,7 @@ static void __init pud_huge_tests(pud_t *pudp, unsigned 
long pfn, pgprot_t prot)
 {
pud_t pud;
 
-   if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+   if (!arch_ioremap_pud_supported())
return;
 
pr_debug("Validating PUD huge\n");
-- 
2.26.2



[PATCH v2 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it

2020-08-19 Thread Aneesh Kumar K.V
pte_clear_tests operates on an existing pte entry. Make sure it is not a none
pte entry.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 09ce9974c187..7d9f8e1d790f 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -531,7 +531,7 @@ static void __init pgd_populate_tests(struct mm_struct *mm, 
pgd_t *pgdp,
 static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
   unsigned long vaddr)
 {
-   pte_t pte = ptep_get(ptep);
+   pte_t pte =  ptep_get_and_clear(mm, vaddr, ptep);
 
pr_debug("Validating PTE clear\n");
pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
@@ -929,7 +929,7 @@ static int __init debug_vm_pgtable(void)
p4d_t *p4dp, *saved_p4dp;
pud_t *pudp, *saved_pudp;
pmd_t *pmdp, *saved_pmdp, pmd;
-   pte_t *ptep;
+   pte_t *ptep, pte;
pgtable_t saved_ptep;
pgprot_t prot, protnone;
phys_addr_t paddr;
@@ -1034,6 +1034,8 @@ static int __init debug_vm_pgtable(void)
 */
 
ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   pte = pfn_pte(pte_aligned, prot);
+   set_pte_at(mm, vaddr, ptep, pte);
pte_clear_tests(mm, ptep, vaddr);
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
pte_unmap_unlock(ptep, ptl);
-- 
2.26.2



[PATCH v2 05/13] mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with CONFIG_NUMA_BALANCING

2020-08-19 Thread Aneesh Kumar K.V
Saved write support was added to track the write bit of a pte after marking the
pte protnone. This was done so that AUTONUMA can convert a write pte to protnone
and still track the old write bit. When converting it back, we set the pte write
bit correctly, thereby avoiding a write fault again. Hence enable the test only
when CONFIG_NUMA_BALANCING is enabled, and use protnone protflags.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index cf3c4792b4a2..ffa10ede6842 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -114,10 +114,14 @@ static void __init pte_savedwrite_tests(unsigned long 
pfn, pgprot_t prot)
 {
pte_t pte = pfn_pte(pfn, prot);
 
+   if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+   return;
+
pr_debug("Validating PTE saved write\n");
WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte;
WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte;
 }
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
 {
@@ -225,6 +229,9 @@ static void __init pmd_savedwrite_tests(unsigned long pfn, 
pgprot_t prot)
 {
pmd_t pmd = pfn_pmd(pfn, prot);
 
+   if (!IS_ENABLED(CONFIG_NUMA_BALANCING))
+   return;
+
pr_debug("Validating PMD saved write\n");
WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd;
WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd;
@@ -1005,8 +1012,8 @@ static int __init debug_vm_pgtable(void)
pmd_huge_tests(pmdp, pmd_aligned, prot);
pud_huge_tests(pudp, pud_aligned, prot);
 
-   pte_savedwrite_tests(pte_aligned, prot);
-   pmd_savedwrite_tests(pmd_aligned, prot);
+   pte_savedwrite_tests(pte_aligned, protnone);
+   pmd_savedwrite_tests(pmd_aligned, protnone);
 
pte_unmap_unlock(ptep, ptl);
 
-- 
2.26.2



[PATCH v2 09/13] mm/debug_vm_pgtable/locks: Move non page table modifying test together

2020-08-19 Thread Aneesh Kumar K.V
This will help in adding proper locks in a later patch.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 52 ---
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 6dcac2b40fef..69fe3cd8126c 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -977,7 +977,7 @@ static int __init debug_vm_pgtable(void)
p4dp = p4d_alloc(mm, pgdp, vaddr);
pudp = pud_alloc(mm, p4dp, vaddr);
pmdp = pmd_alloc(mm, pudp, vaddr);
-   ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   ptep = pte_alloc_map(mm, pmdp, vaddr);
 
/*
 * Save all the page table page addresses as the page table
@@ -997,33 +997,12 @@ static int __init debug_vm_pgtable(void)
p4d_basic_tests(p4d_aligned, prot);
pgd_basic_tests(pgd_aligned, prot);
 
-   pte_clear_tests(mm, ptep, vaddr);
-   pmd_clear_tests(mm, pmdp);
-   pud_clear_tests(mm, pudp);
-   p4d_clear_tests(mm, p4dp);
-   pgd_clear_tests(mm, pgdp);
-
-   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
-   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
-   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-
pmd_leaf_tests(pmd_aligned, prot);
pud_leaf_tests(pud_aligned, prot);
 
-   pmd_huge_tests(pmdp, pmd_aligned, prot);
-   pud_huge_tests(pudp, pud_aligned, prot);
-
pte_savedwrite_tests(pte_aligned, protnone);
pmd_savedwrite_tests(pmd_aligned, protnone);
 
-   pte_unmap_unlock(ptep, ptl);
-
-   pmd_populate_tests(mm, pmdp, saved_ptep);
-   pud_populate_tests(mm, pudp, saved_pmdp);
-   p4d_populate_tests(mm, p4dp, saved_pudp);
-   pgd_populate_tests(mm, pgdp, saved_p4dp);
-
pte_special_tests(pte_aligned, prot);
pte_protnone_tests(pte_aligned, protnone);
pmd_protnone_tests(pmd_aligned, protnone);
@@ -1041,11 +1020,38 @@ static int __init debug_vm_pgtable(void)
pmd_swap_tests(pmd_aligned, prot);
 
swap_migration_tests();
-   hugetlb_basic_tests(pte_aligned, prot);
 
pmd_thp_tests(pmd_aligned, prot);
pud_thp_tests(pud_aligned, prot);
 
+   /*
+* Page table modifying tests
+*/
+   pte_clear_tests(mm, ptep, vaddr);
+   pmd_clear_tests(mm, pmdp);
+   pud_clear_tests(mm, pudp);
+   p4d_clear_tests(mm, p4dp);
+   pgd_clear_tests(mm, pgdp);
+
+   ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
+   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
+   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+
+
+   pmd_huge_tests(pmdp, pmd_aligned, prot);
+   pud_huge_tests(pudp, pud_aligned, prot);
+
+   pte_unmap_unlock(ptep, ptl);
+
+   pmd_populate_tests(mm, pmdp, saved_ptep);
+   pud_populate_tests(mm, pudp, saved_pmdp);
+   p4d_populate_tests(mm, p4dp, saved_pudp);
+   pgd_populate_tests(mm, pgdp, saved_p4dp);
+
+   hugetlb_basic_tests(pte_aligned, prot);
+
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
pmd_free(mm, saved_pmdp);
-- 
2.26.2



[PATCH v2 06/13] mm/debug_vm_pgtable/THP: Mark the pte entry huge before using set_pmd/pud_at

2020-08-19 Thread Aneesh Kumar K.V
The kernel expects entries to be marked huge before we use
set_pmd_at()/set_pud_at().

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index ffa10ede6842..76f4c713e5a3 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -150,7 +150,7 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
  unsigned long pfn, unsigned long vaddr,
  pgprot_t prot)
 {
-   pmd_t pmd = pfn_pmd(pfn, prot);
+   pmd_t pmd;
 
if (!has_transparent_hugepage())
return;
@@ -159,19 +159,19 @@ static void __init pmd_advanced_tests(struct mm_struct 
*mm,
/* Align the address wrt HPAGE_PMD_SIZE */
vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
 
-   pmd = pfn_pmd(pfn, prot);
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_set_wrprotect(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_write(pmd));
 
-   pmd = pfn_pmd(pfn, prot);
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
 
-   pmd = pfn_pmd(pfn, prot);
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
pmd = pmd_wrprotect(pmd);
pmd = pmd_mkclean(pmd);
set_pmd_at(mm, vaddr, pmdp, pmd);
@@ -267,7 +267,7 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
  unsigned long pfn, unsigned long vaddr,
  pgprot_t prot)
 {
-   pud_t pud = pfn_pud(pfn, prot);
+   pud_t pud;
 
if (!has_transparent_hugepage())
return;
@@ -276,25 +276,28 @@ static void __init pud_advanced_tests(struct mm_struct 
*mm,
/* Align the address wrt HPAGE_PUD_SIZE */
vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE;
 
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_set_wrprotect(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(pud_write(pud));
 
 #ifndef __PAGETABLE_PMD_FOLDED
-   pud = pfn_pud(pfn, prot);
+
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 
-   pud = pfn_pud(pfn, prot);
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 #endif /* __PAGETABLE_PMD_FOLDED */
-   pud = pfn_pud(pfn, prot);
+
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
pud = pud_wrprotect(pud);
pud = pud_mkclean(pud);
set_pud_at(mm, vaddr, pudp, pud);
-- 
2.26.2



[PATCH v2 07/13] mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an existing pte entry

2020-08-19 Thread Aneesh Kumar K.V
set_pte_at() should not be used to set a pte entry at locations that already
hold a valid pte entry. Architectures like ppc64 don't do TLB invalidation in
set_pte_at() and hence expect it to be used only on locations that do not
already hold a valid PTE.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 35 +++
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 76f4c713e5a3..9c7e2c9cfc76 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -74,15 +74,18 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
 {
pte_t pte = pfn_pte(pfn, prot);
 
+   /*
+* Architectures optimize set_pte_at by avoiding TLB flush.
+* This requires set_pte_at to be not used to update an
+* existing pte entry. Clear pte before we do set_pte_at
+*/
+
pr_debug("Validating PTE advanced\n");
pte = pfn_pte(pfn, prot);
set_pte_at(mm, vaddr, ptep, pte);
ptep_set_wrprotect(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(pte_write(pte));
-
-   pte = pfn_pte(pfn, prot);
-   set_pte_at(mm, vaddr, ptep, pte);
ptep_get_and_clear(mm, vaddr, ptep);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
@@ -96,13 +99,11 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
pte = ptep_get(ptep);
WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
-
-   pte = pfn_pte(pfn, prot);
-   set_pte_at(mm, vaddr, ptep, pte);
ptep_get_and_clear_full(mm, vaddr, ptep, 1);
pte = ptep_get(ptep);
WARN_ON(!pte_none(pte));
 
+   pte = pfn_pte(pfn, prot);
pte = pte_mkyoung(pte);
set_pte_at(mm, vaddr, ptep, pte);
ptep_test_and_clear_young(vma, vaddr, ptep);
@@ -164,9 +165,6 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmdp_set_wrprotect(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_write(pmd));
-
-   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
-   set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
@@ -180,13 +178,11 @@ static void __init pmd_advanced_tests(struct mm_struct 
*mm,
pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
pmd = READ_ONCE(*pmdp);
WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
-
-   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
-   set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1);
pmd = READ_ONCE(*pmdp);
WARN_ON(!pmd_none(pmd));
 
+   pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
pmd = pmd_mkyoung(pmd);
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_test_and_clear_young(vma, vaddr, pmdp);
@@ -283,18 +279,10 @@ static void __init pud_advanced_tests(struct mm_struct 
*mm,
WARN_ON(pud_write(pud));
 
 #ifndef __PAGETABLE_PMD_FOLDED
-
-   pud = pud_mkhuge(pfn_pud(pfn, prot));
-   set_pud_at(mm, vaddr, pudp, pud);
pudp_huge_get_and_clear(mm, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(!pud_none(pud));
 
-   pud = pud_mkhuge(pfn_pud(pfn, prot));
-   set_pud_at(mm, vaddr, pudp, pud);
-   pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
-   pud = READ_ONCE(*pudp);
-   WARN_ON(!pud_none(pud));
 #endif /* __PAGETABLE_PMD_FOLDED */
 
pud = pud_mkhuge(pfn_pud(pfn, prot));
@@ -307,6 +295,13 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
pud = READ_ONCE(*pudp);
WARN_ON(!(pud_write(pud) && pud_dirty(pud)));
 
+#ifndef __PAGETABLE_PMD_FOLDED
+   pudp_huge_get_and_clear_full(vma, vaddr, pudp, 1);
+   pud = READ_ONCE(*pudp);
+   WARN_ON(!pud_none(pud));
+#endif /* __PAGETABLE_PMD_FOLDED */
+
+   pud = pud_mkhuge(pfn_pud(pfn, prot));
pud = pud_mkyoung(pud);
set_pud_at(mm, vaddr, pudp, pud);
pudp_test_and_clear_young(vma, vaddr, pudp);
-- 
2.26.2



[PATCH v2 12/13] mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64

2020-08-19 Thread Aneesh Kumar K.V
The test seems to be missing quite a lot of details w.r.t. allocating the
correct pgtable_t page (huge_pte_alloc()), holding the right lock
(huge_pte_lock()), etc. The vma used is also not a hugetlb VMA.

ppc64 does have runtime checks within CONFIG_DEBUG_VM for most of these.
Hence disable the test on ppc64.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 63576fe767a2..09ce9974c187 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -798,6 +798,7 @@ static void __init hugetlb_basic_tests(unsigned long pfn, 
pgprot_t prot)
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 }
 
+#ifndef CONFIG_PPC_BOOK3S_64
 static void __init hugetlb_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma,
  pte_t *ptep, unsigned long pfn,
@@ -840,6 +841,7 @@ static void __init hugetlb_advanced_tests(struct mm_struct 
*mm,
pte = huge_ptep_get(ptep);
WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte)));
 }
+#endif
 #else  /* !CONFIG_HUGETLB_PAGE */
 static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { }
 static void __init hugetlb_advanced_tests(struct mm_struct *mm,
@@ -1050,7 +1052,9 @@ static int __init debug_vm_pgtable(void)
pud_populate_tests(mm, pudp, saved_pmdp);
spin_unlock(ptl);
 
-   //hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+#ifndef CONFIG_PPC_BOOK3S_64
+   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
+#endif
 
spin_lock(&mm->page_table_lock);
p4d_clear_tests(mm, p4dp);
-- 
2.26.2



[PATCH v2 11/13] mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries

2020-08-19 Thread Aneesh Kumar K.V
pmd_clear() should not be used to clear pmd level pte entries.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 8f7a8ccb5a54..63576fe767a2 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -191,6 +191,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_young(pmd));
 
+   /*  Clear the pte entries  */
+   pmdp_huge_get_and_clear(mm, vaddr, pmdp);
pgtable = pgtable_trans_huge_withdraw(mm, pmdp);
 }
 
@@ -311,6 +313,8 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
pudp_test_and_clear_young(vma, vaddr, pudp);
pud = READ_ONCE(*pudp);
WARN_ON(pud_young(pud));
+
+   pudp_huge_get_and_clear(mm, vaddr, pudp);
 }
 
 static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot)
@@ -429,8 +433,6 @@ static void __init pud_populate_tests(struct mm_struct *mm, 
pud_t *pudp,
 * This entry points to next level page table page.
 * Hence this must not qualify as pud_bad().
 */
-   pmd_clear(pmdp);
-   pud_clear(pudp);
pud_populate(mm, pudp, pmdp);
pud = READ_ONCE(*pudp);
WARN_ON(pud_bad(pud));
@@ -562,7 +564,6 @@ static void __init pmd_populate_tests(struct mm_struct *mm, 
pmd_t *pmdp,
 * This entry points to next level page table page.
 * Hence this must not qualify as pmd_bad().
 */
-   pmd_clear(pmdp);
pmd_populate(mm, pmdp, pgtable);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_bad(pmd));
-- 
2.26.2



[PATCH v2 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value

2020-08-19 Thread Aneesh Kumar K.V
ppc64 uses bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit
in the random value.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 086309fb9b6f..57259e2dbd17 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -44,10 +44,13 @@
  * entry type. But these bits might affect the ability to clear entries with
  * pxx_clear() because of how dynamic page table folding works on s390. So
  * while loading up the entries do not change the lower 4 bits. It does not
- * have affect any other platform.
+ * have affect any other platform. Also avoid the 62nd bit on ppc64 that is
+ * used to mark a pte entry.
  */
-#define S390_MASK_BITS 4
-#define RANDOM_ORVALUE GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define S390_SKIP_MASK GENMASK(3, 0)
+#define PPC64_SKIP_MASKGENMASK(62, 62)
+#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK)
+#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK)
 #define RANDOM_NZVALUE GENMASK(7, 0)
 
 static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
-- 
2.26.2



[PATCH v2 10/13] mm/debug_vm_pgtable/locks: Take correct page table lock

2020-08-19 Thread Aneesh Kumar K.V
Make sure we call the pte accessors with the correct page table lock held.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 34 --
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 69fe3cd8126c..8f7a8ccb5a54 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1024,33 +1024,39 @@ static int __init debug_vm_pgtable(void)
pmd_thp_tests(pmd_aligned, prot);
pud_thp_tests(pud_aligned, prot);
 
+   hugetlb_basic_tests(pte_aligned, prot);
+
/*
 * Page table modifying tests
 */
-   pte_clear_tests(mm, ptep, vaddr);
-   pmd_clear_tests(mm, pmdp);
-   pud_clear_tests(mm, pudp);
-   p4d_clear_tests(mm, p4dp);
-   pgd_clear_tests(mm, pgdp);
 
ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+   pte_clear_tests(mm, ptep, vaddr);
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
-   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
-   hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-
+   pte_unmap_unlock(ptep, ptl);
 
+   ptl = pmd_lock(mm, pmdp);
+   pmd_clear_tests(mm, pmdp);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
pmd_huge_tests(pmdp, pmd_aligned, prot);
+   pmd_populate_tests(mm, pmdp, saved_ptep);
+   spin_unlock(ptl);
+
+   ptl = pud_lock(mm, pudp);
+   pud_clear_tests(mm, pudp);
+   pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
pud_huge_tests(pudp, pud_aligned, prot);
+   pud_populate_tests(mm, pudp, saved_pmdp);
+   spin_unlock(ptl);
 
-   pte_unmap_unlock(ptep, ptl);
+   //hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
 
-   pmd_populate_tests(mm, pmdp, saved_ptep);
-   pud_populate_tests(mm, pudp, saved_pmdp);
+   spin_lock(&mm->page_table_lock);
+   p4d_clear_tests(mm, p4dp);
+   pgd_clear_tests(mm, pgdp);
p4d_populate_tests(mm, p4dp, saved_pudp);
pgd_populate_tests(mm, pgdp, saved_p4dp);
-
-   hugetlb_basic_tests(pte_aligned, prot);
+   spin_unlock(&mm->page_table_lock);
 
p4d_free(mm, saved_p4dp);
pud_free(mm, saved_pudp);
-- 
2.26.2



[PATCH v2 02/13] powerpc/mm: Move setting pte specific flags to pfn_pte

2020-08-19 Thread Aneesh Kumar K.V
powerpc used to set the pte-specific flags in set_pte_at(). This is different
from other architectures. To be consistent with the other architectures, update
pfn_pte() and pfn_pmd() to set _PAGE_PTE on ppc64. Also, drop the now unused
pte_mkpte().

We add a VM_WARN_ON() to catch callers of set_pte_at() that do not set the
_PAGE_PTE bit. We will remove it after a few releases.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 15 +--
 arch/powerpc/include/asm/nohash/pgtable.h|  5 -
 arch/powerpc/mm/book3s64/pgtable.c   |  2 +-
 arch/powerpc/mm/pgtable.c|  5 -
 4 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 079211968987..2382fd516f6b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -619,7 +619,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t 
pgprot)
VM_BUG_ON(pfn >> (64 - PAGE_SHIFT));
VM_BUG_ON((pfn << PAGE_SHIFT) & ~PTE_RPN_MASK);
 
-   return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot));
+   return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot) | _PAGE_PTE);
 }
 
 static inline unsigned long pte_pfn(pte_t pte)
@@ -655,11 +655,6 @@ static inline pte_t pte_mkexec(pte_t pte)
return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_EXEC));
 }
 
-static inline pte_t pte_mkpte(pte_t pte)
-{
-   return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
-}
-
 static inline pte_t pte_mkwrite(pte_t pte)
 {
/*
@@ -823,6 +818,14 @@ static inline int pte_none(pte_t pte)
 static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, int percpu)
 {
+
+   VM_WARN_ON(!(pte_raw(pte) & cpu_to_be64(_PAGE_PTE)));
+   /*
+* Keep the _PAGE_PTE added till we are sure we handle _PAGE_PTE
+* in all the callers.
+*/
+   pte = __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
+
if (radix_enabled())
return radix__set_pte_at(mm, addr, ptep, pte, percpu);
return hash__set_pte_at(mm, addr, ptep, pte, percpu);
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h 
b/arch/powerpc/include/asm/nohash/pgtable.h
index 4b7c3472eab1..6277e7596ae5 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -140,11 +140,6 @@ static inline pte_t pte_mkold(pte_t pte)
return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
 }
 
-static inline pte_t pte_mkpte(pte_t pte)
-{
-   return pte;
-}
-
 static inline pte_t pte_mkspecial(pte_t pte)
 {
return __pte(pte_val(pte) | _PAGE_SPECIAL);
diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
b/arch/powerpc/mm/book3s64/pgtable.c
index e18ae50a275c..3b4da7c63e28 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -141,7 +141,7 @@ pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
unsigned long pmdv;
 
pmdv = (pfn << PAGE_SHIFT) & PTE_RPN_MASK;
-   return pmd_set_protbits(__pmd(pmdv), pgprot);
+   return __pmd(pmdv | pgprot_val(pgprot) | _PAGE_PTE);
 }
 
 pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 9c0547d77af3..ab57b07ef39a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -184,9 +184,6 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, 
pte_t *ptep,
 */
VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-   /* Add the pte bit when trying to set a pte */
-   pte = pte_mkpte(pte);
-
/* Note: mm->context.id might not yet have been assigned as
 * this context might not have been activated yet when this
 * is called.
@@ -275,8 +272,6 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long 
addr, pte_t *ptep, pte_
 */
VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-   pte = pte_mkpte(pte);
-
pte = set_pte_filter(pte);
 
val = pte_val(pte);
-- 
2.26.2



[PATCH v2 08/13] mm/debug_vm_pgtable/thp: Use page table deposit/withdraw with THP

2020-08-19 Thread Aneesh Kumar K.V
Architectures like ppc64 use a deposited page table while updating huge pte
entries.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/debug_vm_pgtable.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 9c7e2c9cfc76..6dcac2b40fef 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -149,7 +149,7 @@ static void __init pmd_basic_tests(unsigned long pfn, 
pgprot_t prot)
 static void __init pmd_advanced_tests(struct mm_struct *mm,
  struct vm_area_struct *vma, pmd_t *pmdp,
  unsigned long pfn, unsigned long vaddr,
- pgprot_t prot)
+ pgprot_t prot, pgtable_t pgtable)
 {
pmd_t pmd;
 
@@ -160,6 +160,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
/* Align the address wrt HPAGE_PMD_SIZE */
vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
 
+   pgtable_trans_huge_deposit(mm, pmdp, pgtable);
+
pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
set_pmd_at(mm, vaddr, pmdp, pmd);
pmdp_set_wrprotect(mm, vaddr, pmdp);
@@ -188,6 +190,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
pmdp_test_and_clear_young(vma, vaddr, pmdp);
pmd = READ_ONCE(*pmdp);
WARN_ON(pmd_young(pmd));
+
+   pgtable = pgtable_trans_huge_withdraw(mm, pmdp);
 }
 
 static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
@@ -1000,7 +1004,7 @@ static int __init debug_vm_pgtable(void)
pgd_clear_tests(mm, pgdp);
 
pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
+   pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
 
-- 
2.26.2



[PATCH v2 00/13] mm/debug_vm_pgtable fixes

2020-08-19 Thread Aneesh Kumar K.V
This patch series includes fixes for the debug_vm_pgtable test code so that
it follows the page table update rules correctly. The first two patches
introduce changes w.r.t. ppc64. They are included in this series for
completeness; we can merge them via the ppc64 tree if required.

The hugetlb test is disabled on ppc64 because it needs a larger change to
satisfy the page table update rules.

Changes from v1:
* Address review feedback
* Drop the test-specific pfn_pte and pfn_pmd.
* Update the ppc64 page table helpers to add _PAGE_PTE

Aneesh Kumar K.V (13):
  powerpc/mm: Add DEBUG_VM WARN for pmd_clear
  powerpc/mm: Move setting pte specific flags to pfn_pte
  mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value
  mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge
vmap support.
  mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with
CONFIG_NUMA_BALANCING
  mm/debug_vm_pgtable/THP: Mark the pte entry huge before using
set_pmd/pud_at
  mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an
existing pte entry
  mm/debug_vm_pgtable/thp: Use page table deposit/withdraw with THP
  mm/debug_vm_pgtable/locks: Move non page table modifying test together
  mm/debug_vm_pgtable/locks: Take correct page table lock
  mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
  mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
  mm/debug_vm_pgtable: populate a pte entry before fetching it

 arch/powerpc/include/asm/book3s/64/pgtable.h |  29 +++-
 arch/powerpc/include/asm/nohash/pgtable.h|   5 -
 arch/powerpc/mm/book3s64/pgtable.c   |   2 +-
 arch/powerpc/mm/pgtable.c|   5 -
 include/linux/io.h   |  12 ++
 mm/debug_vm_pgtable.c| 151 +++
 6 files changed, 127 insertions(+), 77 deletions(-)

-- 
2.26.2



[PATCH v2 01/13] powerpc/mm: Add DEBUG_VM WARN for pmd_clear

2020-08-19 Thread Aneesh Kumar K.V
With the hash page table, the kernel should not use pmd_clear() for clearing
huge pte entries. Add a DEBUG_VM WARN to catch such wrong usage.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6de56c3b33c4..079211968987 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -868,6 +868,13 @@ static inline bool pte_ci(pte_t pte)
 
 static inline void pmd_clear(pmd_t *pmdp)
 {
+   if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
+   /*
+* Don't use this if we can possibly have a hash page table
+* entry mapping this.
+*/
WARN_ON((pmd_val(*pmdp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == (H_PAGE_HASHPTE | _PAGE_PTE));
+   }
*pmdp = __pmd(0);
 }
 
@@ -916,6 +923,13 @@ static inline int pmd_bad(pmd_t pmd)
 
 static inline void pud_clear(pud_t *pudp)
 {
+   if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
+   /*
+* Don't use this if we can possibly have a hash page table
+* entry mapping this.
+*/
WARN_ON((pud_val(*pudp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == (H_PAGE_HASHPTE | _PAGE_PTE));
+   }
*pudp = __pud(0);
 }
 
-- 
2.26.2



Re: [PATCH 00/10] sound: convert tasklets to use new tasklet_setup()

2020-08-19 Thread Takashi Iwai
On Wed, 19 Aug 2020 13:16:05 +0200,
Mark Brown wrote:
> 
> On Wed, Aug 19, 2020 at 04:21:58PM +0530, Allen wrote:
> 
> > > These patches which I wasn't CCed on and which need their subject lines
> > > fixing :( .  With the subject lines fixed I guess so so
> 
> > Extremely sorry. I thought I had it covered. How would you like it
> > worded?
> 
> ASoC:

To be more exact, "ASoC:" prefix is for sound/soc/*, and for the rest
sound/*, use "ALSA:" prefix please.


Takashi


Re: [PATCH 00/10] sound: convert tasklets to use new tasklet_setup()

2020-08-19 Thread Mark Brown
On Wed, Aug 19, 2020 at 04:21:58PM +0530, Allen wrote:

> > These patches which I wasn't CCed on and which need their subject lines
> > fixing :( .  With the subject lines fixed I guess so so

> Extremely sorry. I thought I had it covered. How would you like it
> worded?

ASoC:

In general you should try to follow the style of the code you're
modifying; this applies to things like commit logs as well as the code
itself.




Re: [PATCH 00/10] sound: convert tasklets to use new tasklet_setup()

2020-08-19 Thread Allen
>
> > Mark, may I apply those ASoC patches through my tree together with
> > others?  Those seem targeting to 5.9, and I have a patch set to
> > convert to tasklet for 5.10, which would be better manageable when
> > based on top of those changes.
>
> These patches which I wasn't CCed on and which need their subject lines
> fixing :( .  With the subject lines fixed I guess so so

Extremely sorry. I thought I had it covered. How would you like it
worded?

> Acked-by: Mark Brown 
>
> but judging from some of the other threads about similar patches that I
> was randomly CCed on I'm not sure people like from_tasklet() so perhaps
> there might be issues.

Yes, a new macro named cast_out() has been suggested in place of
from_tasklet(). Hopefully it will go in soon. I will spin out a v2 with the
change and also re-word the subject line.

> Allen, as documented in submitting-patches.rst please send patches to
> the maintainers for the code you would like to change.  The normal
> kernel workflow is that people apply patches from their inboxes, if they
> aren't copied they are likely to not see the patch at all and it is much
> more difficult to apply patches.

I understand, I'll take care of it in the future. Thank you.

-- 
   - Allen


Re: [PATCH 1/2] lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state

2020-08-19 Thread Alexey Kardashevskiy



On 19/08/2020 09:54, Nicholas Piggin wrote:
> Excerpts from pet...@infradead.org's message of August 19, 2020 1:41 am:
>> On Tue, Aug 18, 2020 at 05:22:33PM +1000, Nicholas Piggin wrote:
>>> Excerpts from pet...@infradead.org's message of August 12, 2020 8:35 pm:
 On Wed, Aug 12, 2020 at 06:18:28PM +1000, Nicholas Piggin wrote:
> Excerpts from pet...@infradead.org's message of August 7, 2020 9:11 pm:
>>
>> What's wrong with something like this?
>>
>> AFAICT there's no reason to actually try and add IRQ tracing here, it's
>> just a hand full of instructions at the most.
>
> Because we may want to use that in other places as well, so it would
> be nice to have tracing.
>
> Hmm... also, I thought NMI context was free to call local_irq_save/restore
> anyway so the bug would still be there in those cases?

 NMI code has in_nmi() true, in which case the IRQ tracing is disabled
 (except for x86 which has CONFIG_TRACE_IRQFLAGS_NMI).

>>>
>>> That doesn't help. It doesn't fix the lockdep irq state going out of
>>> synch with the actual irq state. The code which triggered this with the
>>> special powerpc irq disable has in_nmi() true as well.
>>
>> Urgh, you're talking about using lockdep_assert_irqs*() from NMI
>> context?
>>
>> If not, I'm afraid I might've lost the plot a little on what exact
>> failure case we're talking about.
>>
> 
> Hm, I may have been a bit confused actually. Since your Fix 
> TRACE_IRQFLAGS vs NMIs patch it might now work.
> 
> I'm worried that powerpc disabling trace irqs (trace_hardirqs_off())
> before nmi_enter() might still be a problem, but I'm not sure,
> actually. Alexey, did you end up re-testing with Peter's patch

The one above in the thread which replaces powerpc_local_irq_pmu_save()
with raw_powerpc_local_irq_pmu_save()? It did not compile, as there is no
raw_powerpc_local_irq_pmu_save(), so I may be missing something here.

I applied the patch on top of the current upstream and replaced
raw_powerpc_local_irq_pmu_save() with raw_local_irq_pmu_save() (which I
think was the intention), but I still see the issue.

> or current upstream?

The upstream 18445bf405cb (13 hours old) also shows the problem. Your
1/2 still fixes it.


> 
> Thanks,
> Nick
> 

-- 
Alexey


Re: [Virtual ppce500] virtio_gpu virtio0: swiotlb buffer is full

2020-08-19 Thread Christian Zigotzky

On 19 August 2020 at 06:35 am, Gerd Hoffmann wrote:

On Tue, Aug 18, 2020 at 04:41:38PM +0200, Christian Zigotzky wrote:

Hello Gerd,

I compiled a new kernel with the latest DRM misc updates today. The patch is
included in these updates.

This kernel works with the VirtIO-GPU in a virtual e5500 QEMU/KVM HV machine
on my X5000.

Unfortunately I can only use the VirtIO-GPU (Monitor: Red Hat, Inc. 8") with
a resolution of 640x480. If I set a higher resolution then the guest
disables the monitor.
I can use higher resolutions with the stable kernel 5.8 and the VirtIO-GPU.

Please check the latest DRM updates.

https://patchwork.freedesktop.org/patch/385980/

(tests & reviews & acks are welcome)

HTH,
   Gerd


Hello Gerd,

I compiled a new RC1 with our patches today. With these patches, the 
VirtIO-GPU works without any problems. I can use higher resolutions again.


Screenshot of the RC1-3 with the VirtIO-GPU in a virtual e5500 QEMU/KVM 
HV machine on my X5000: 
https://i.pinimg.com/originals/4f/b0/14/4fb01476edd7abe6be1e1203a8e7e152.png


Thanks a lot for your help!

Cheers,
Christian


[PATCH] powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10

2020-08-19 Thread Nicholas Piggin
This driver does not restore stop > 3 state, so it limits itself
to states which do not lose full state or TB.

The POWER10 SPRs are sufficiently different from P9 that it seems
easier to split out the P10 code. The POWER10 deep sleep code
(e.g., the BHRB restore) has been taken out, but it can be re-added
when stop > 3 support is added.

Cc: Ryan P Grimm 
Cc: Michael Neuling 
Cc: Gautham R. Shenoy 
Cc: Pratik Rajesh Sampat 
Signed-off-by: Nicholas Piggin 
---
re-sending with linuxppc-dev list actually cc'ed

 arch/powerpc/include/asm/machdep.h|   2 -
 arch/powerpc/include/asm/processor.h  |   2 +-
 arch/powerpc/include/asm/reg.h|   1 +
 arch/powerpc/platforms/powernv/idle.c | 304 ++
 drivers/cpuidle/cpuidle-powernv.c |   2 +-
 5 files changed, 213 insertions(+), 98 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index a90b892f0bfe..5082cd496190 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -222,8 +222,6 @@ struct machdep_calls {
 
 extern void e500_idle(void);
 extern void power4_idle(void);
-extern void power7_idle(void);
-extern void power9_idle(void);
 extern void ppc6xx_idle(void);
 extern void book3e_idle(void);
 
diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa..6865147209de 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -432,7 +432,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, 
IDLE_POWERSAVE_OFF};
 extern int powersave_nap;  /* set if nap mode can be used in idle loop */
 
 extern void power7_idle_type(unsigned long type);
-extern void power9_idle_type(unsigned long stop_psscr_val,
+extern void arch300_idle_type(unsigned long stop_psscr_val,
  unsigned long stop_psscr_mask);
 
 extern void flush_instruction_cache(void);
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 88fb88491fe9..d3a0aed321d0 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -1353,6 +1353,7 @@
 #define PVR_POWER8NVL  0x004C
 #define PVR_POWER8 0x004D
 #define PVR_POWER9 0x004E
+#define PVR_POWER100x0080
 #define PVR_BE 0x0070
 #define PVR_PA6T   0x0090
 
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 77513a80cef9..1ed7c5286487 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -565,7 +565,7 @@ void power7_idle_type(unsigned long type)
irq_set_pending_from_srr1(srr1);
 }
 
-void power7_idle(void)
+static void power7_idle(void)
 {
if (!powersave_nap)
return;
@@ -659,20 +659,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, 
bool mmu_on)
mmcr0   = mfspr(SPRN_MMCR0);
}
 
-   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
-   /*
-* POWER10 uses MMCRA (BHRBRD) as BHRB disable bit.
-* If the user hasn't asked for the BHRB to be
-* written, the value of MMCRA[BHRBRD] is 1.
-* On wakeup from stop, MMCRA[BHRBD] will be 0,
-* since it is previleged resource and will be lost.
-* Thus, if we do not save and restore the MMCRA[BHRBD],
-* hardware will be needlessly writing to the BHRB
-* in problem mode.
-*/
-   mmcra   = mfspr(SPRN_MMCRA);
-   }
-
if ((psscr & PSSCR_RL_MASK) >= deep_spr_loss_state) {
sprs.lpcr   = mfspr(SPRN_LPCR);
sprs.hfscr  = mfspr(SPRN_HFSCR);
@@ -735,10 +721,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, 
bool mmu_on)
mtspr(SPRN_MMCR0, mmcr0);
}
 
-   /* Reload MMCRA to restore BHRB disable bit for POWER10 */
-   if (cpu_has_feature(CPU_FTR_ARCH_31))
-   mtspr(SPRN_MMCRA, mmcra);
-
/*
 * DD2.2 and earlier need to set then clear bit 60 in MMCRA
 * to ensure the PMU starts running.
@@ -823,73 +805,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, 
bool mmu_on)
return srr1;
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-static unsigned long power9_offline_stop(unsigned long psscr)
-{
-   unsigned long srr1;
-
-#ifndef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-   __ppc64_runlatch_off();
-   srr1 = power9_idle_stop(psscr, true);
-   __ppc64_runlatch_on();
-#else
-   /*
-* Tell KVM we're entering idle.
-* This does not have to be done in real mode because the P9 MMU
-* is independent per-thread. Some steppings share radix/hash mode
-* between threads, but in that case KVM has a barrier sync in real
-* mode before and after switching between radix and hash.
-

Re: [PATCH] powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute

2020-08-19 Thread Vaibhav Jain
Thanks Aneesh and Mpe for reviewing this patch.

Michael Ellerman  writes:

> "Aneesh Kumar K.V"  writes:
[snip]
>>>   
>>> +   /* Allow access only to perfmon capable users */
>>> +   if (!perfmon_capable())
>>> +   return -EACCES;
>>> +
>>
>> An access check is usually done in open(). This is the read callback IIUC.
>
> Yes. Otherwise an unprivileged user can open the file, and then trick a
> suid program into reading from it.

Agreed, but since the open() for this sysfs attribute is handled
by kernfs, AFAIK I don't see any direct way to enforce this policy.

The only other way, it seems to me, is to convert the 'perf_stats'
DEVICE_ATTR_RO to DEVICE_ATTR_ADMIN_RO.

>
> cheers

-- 
Cheers
~ Vaibhav


iter and normal ops on /dev/zero & co, was Re: remove the last set_fs() in common code, and remove it for x86 and powerpc

2020-08-19 Thread Christoph Hellwig
On Wed, Aug 19, 2020 at 09:16:59AM +0200, Christophe Leroy wrote:
> I made a test with only the first patch of your series: that's definitely
> the culprit. With only that patch applied, the duration is 6.64 seconds,
> that's a 25% degradation.

For the record: the first patch is:

 mem: remove duplicate ops for /dev/zero and /dev/null

So these micro-optimizations matter at least for some popular
benchmarks.  It would be easy to drop, but that means we either:

 - can't support kernel_read/write on these files, which should not
   matter

or
 
 - have to drop the check for both ops being present

Al, what do you think?


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc

2020-08-19 Thread Christophe Leroy




Le 18/08/2020 à 20:23, Christophe Leroy a écrit :



Le 18/08/2020 à 20:05, Christoph Hellwig a écrit :

On Tue, Aug 18, 2020 at 07:46:22PM +0200, Christophe Leroy wrote:

I gave it a go on my powerpc mpc832x. I tested it on top of my newest
series that reworks the 32 bits signal handlers (see
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278) with 


the microbenchmark test used is that series.

With KUAP activated, on top of the signal32 rework, performance is boosted:
system time for the microbenchmark goes from 1.73s down to 1.56s, that is
10% quicker.

Surprisingly, with the kernel as it is today, without my signal series, your
series degrades performance slightly (from 2.55s to 2.64s, i.e. 3.5% slower).



I also observe, in both cases, a degradation on

dd if=/dev/zero of=/dev/null count=1M

Without your series, it runs in 5.29 seconds.
With your series, it runs in 5.82 seconds, that is 10% more time.


That's pretty strange, I wonder if some kernel text cache line
effects come into play here?

The kernel access side is only used in slow path code, so it should
not make a difference, and the uaccess code is simplified and should be
(marginally) faster.

Btw, was this with the __{get,put}_user_allowed cockup that you noticed
fixed?



Yes it is with the __get_user_size() replaced by __get_user_size_allowed().


I made a test with only the first patch of your series: that's
definitely the culprit. With only that patch applied, the duration is
6.64 seconds, that's a 25% degradation.


A perf record provides the following without the patch:
41.91%  dd   [kernel.kallsyms]  [k] __arch_clear_user
 7.02%  dd   [kernel.kallsyms]  [k] vfs_read
 6.86%  dd   [kernel.kallsyms]  [k] new_sync_read
 6.68%  dd   [kernel.kallsyms]  [k] iov_iter_zero
 6.03%  dd   [kernel.kallsyms]  [k] transfer_to_syscall
 3.39%  dd   [kernel.kallsyms]  [k] memset
 3.07%  dd   [kernel.kallsyms]  [k] __fsnotify_parent
 2.68%  dd   [kernel.kallsyms]  [k] ksys_read
 2.09%  dd   [kernel.kallsyms]  [k] read_iter_zero
 2.01%  dd   [kernel.kallsyms]  [k] __fget_light
 1.84%  dd   [kernel.kallsyms]  [k] __fdget_pos
 1.35%  dd   [kernel.kallsyms]  [k] rw_verify_area
 1.32%  dd   libc-2.23.so   [.] __GI___libc_write
 1.21%  dd   [kernel.kallsyms]  [k] vfs_write
...
 0.03%  dd   [kernel.kallsyms]  [k] write_null

And the following with the patch:

15.54%  dd   [kernel.kallsyms]  [k] __arch_clear_user
 9.17%  dd   [kernel.kallsyms]  [k] vfs_read
 6.54%  dd   [kernel.kallsyms]  [k] new_sync_write
 6.31%  dd   [kernel.kallsyms]  [k] transfer_to_syscall
 6.29%  dd   [kernel.kallsyms]  [k] __fsnotify_parent
 6.20%  dd   [kernel.kallsyms]  [k] new_sync_read
 5.47%  dd   [kernel.kallsyms]  [k] memset
 5.13%  dd   [kernel.kallsyms]  [k] vfs_write
 4.44%  dd   [kernel.kallsyms]  [k] iov_iter_zero
 2.95%  dd   [kernel.kallsyms]  [k] write_iter_null
 2.82%  dd   [kernel.kallsyms]  [k] ksys_read
 2.46%  dd   [kernel.kallsyms]  [k] __fget_light
 2.34%  dd   libc-2.23.so   [.] __GI___libc_read
 1.89%  dd   [kernel.kallsyms]  [k] iov_iter_advance
 1.76%  dd   [kernel.kallsyms]  [k] __fdget_pos
 1.65%  dd   [kernel.kallsyms]  [k] rw_verify_area
 1.63%  dd   [kernel.kallsyms]  [k] read_iter_zero
 1.60%  dd   [kernel.kallsyms]  [k] iov_iter_init
 1.22%  dd   [kernel.kallsyms]  [k] ksys_write
 1.14%  dd   libc-2.23.so   [.] __GI___libc_write

Christophe



Christophe


[PATCH v4] soc: fsl: enable acpi support in RCPM driver

2020-08-19 Thread Ran Wang
From: Peng Ma 

This patch enables ACPI support in the RCPM driver.

Signed-off-by: Peng Ma 
Signed-off-by: Ran Wang 
---
Change in v4:
 - Make commit subject more accurate
 - Remove unrelated new blank line

Change in v3:
 - Add #ifdef CONFIG_ACPI for acpi_device_id
 - Rename rcpm_acpi_imx_ids to rcpm_acpi_ids

Change in v2:
 - Update acpi_device_id to fix conflict with other driver

 drivers/soc/fsl/rcpm.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c
index a093dbe..b5aa6db 100644
--- a/drivers/soc/fsl/rcpm.c
+++ b/drivers/soc/fsl/rcpm.c
@@ -2,7 +2,7 @@
 //
 // rcpm.c - Freescale QorIQ RCPM driver
 //
-// Copyright 2019 NXP
+// Copyright 2019-2020 NXP
 //
 // Author: Ran Wang 
 
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define RCPM_WAKEUP_CELL_MAX_SIZE  7
 
@@ -139,10 +140,19 @@ static const struct of_device_id rcpm_of_match[] = {
 };
 MODULE_DEVICE_TABLE(of, rcpm_of_match);
 
+#ifdef CONFIG_ACPI
+static const struct acpi_device_id rcpm_acpi_ids[] = {
+   {"NXP0015",},
+   { }
+};
+MODULE_DEVICE_TABLE(acpi, rcpm_acpi_ids);
+#endif
+
 static struct platform_driver rcpm_driver = {
.driver = {
.name = "rcpm",
.of_match_table = rcpm_of_match,
+   .acpi_match_table = ACPI_PTR(rcpm_acpi_ids),
.pm = &rcpm_pm_ops,
},
.probe = rcpm_probe,
-- 
2.7.4



Re: [PATCH 14/16] debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64

2020-08-19 Thread Aneesh Kumar K.V
Anshuman Khandual  writes:

> On 08/12/2020 07:22 PM, Aneesh Kumar K.V wrote:
>> On 8/12/20 7:04 PM, Anshuman Khandual wrote:
>>>
>>>
>>> On 08/12/2020 06:46 PM, Aneesh Kumar K.V wrote:
 On 8/12/20 6:33 PM, Anshuman Khandual wrote:
>
>
> On 08/12/2020 12:03 PM, Aneesh Kumar K.V wrote:
>> The test seems to be missing quite a lot of details w.r.t. allocating
>> the correct pgtable_t page (huge_pte_alloc()), holding the right
>> lock (huge_pte_lock()) etc. The vma used is also not a hugetlb VMA.
>>
>> ppc64 do have runtime checks within CONFIG_DEBUG_VM for most of these.
>> Hence disable the test on ppc64.
>
> This test is free from any platform specific #ifdefs which should
> never be broken. If hugetlb_advanced_tests() does not work or is
> not detailed enough for ppc64, then it would be great if you could
> suggest some improvements so that it works for all enabled platforms.
>
>

 As mentioned, the test is broken. For hugetlb, the pgtable_t pages should
 be allocated by huge_pte_alloc(). We need to hold huge_pte_lock() before
 updating a huge tlb pte. That takes the hugepage size, which is mostly derived
 from the vma. Hence the vma needs to be a hugetlb vma. Some of the functions also
 depend on hstate. Also, we should use set_huge_pte_at() when setting up
 hugetlb pte entries. I was tempted to remove that test completely, marking
 it broken. But avoided that by marking it broken only on PPC64.
>>>
>>> The test is not broken, hugetlb helpers on multiple platforms don't
>>> complain about this at all. The tests here emulate 'enough' MM objects
>>> required for the helpers on enabled platforms, to perform the primary
>>> task i.e. the page table transformation it is expected to do. The test
>>> does not claim to emulate a perfect MM environment for a given
>>> subsystem's (like HugeTLB) arch helpers. Now in this case, the MM
>>> objects being emulated for the HugeTLB advanced tests do not seem to be
>>> sufficient for ppc64, but that can be improved. It does not mean the
>>> test is broken in its current form for other platforms.
>>>
>> 
>> There is nothing ppc64 specific here. It is just that we have
>> CONFIG_DEBUG_VM based checks for different possibly wrong usages of these
>> functions. This was done because we have different page sizes and two different
>> translations to support, and we want to avoid any wrong usage. IMHO expecting
>> hugetlb page table helpers to work with a non-hugetlb VMA and without
>> holding the hugetlb pte lock is a clear violation of the hugetlb interface.
>
> Do you have a modified version of the test, with a HugeTLB marked VMA and
> with the pte lock held, which works on ppc64?

Nope. That is one of the reason I commented that out. We can sort that
out slowly.
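For reference, the allocation/locking sequence being described would look roughly like the following kernel-style pseudocode. Exact signatures vary across kernel versions, so treat the calls as illustrative of the required ordering rather than a drop-in fix for the test:

```c
/* Pseudocode: the hugetlb-correct sequence described above.
 * vma must be a hugetlb VMA so the hstate (and page size) can be derived. */
h = hstate_vma(vma);                  /* hugepage size comes from the VMA */
ptep = huge_pte_alloc(mm, addr, huge_page_size(h));
ptl = huge_pte_lock(h, mm, ptep);     /* hold the hugetlb pte lock */
set_huge_pte_at(mm, addr, ptep, pte); /* not the regular set_pte_at() */
spin_unlock(ptl);
```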

-aneesh


[powerpc:fixes-test] BUILD SUCCESS 801980f6497946048709b9b09771a1729551d705

2020-08-19 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: 801980f6497946048709b9b09771a1729551d705  powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death

elapsed time: 1042m

configs tested: 72
configs skipped: 68

The following configs have been built successfully.
More configs may be tested in the coming days.

arm                  defconfig
arm64                allyesconfig
arm64                defconfig
arm                  allyesconfig
arm                  allmodconfig
ia64                 allmodconfig
ia64                 defconfig
ia64                 allyesconfig
m68k                 allmodconfig
m68k                 defconfig
m68k                 allyesconfig
nios2                defconfig
arc                  allyesconfig
nds32                allnoconfig
c6x                  allyesconfig
nds32                defconfig
nios2                allyesconfig
csky                 defconfig
alpha                defconfig
alpha                allyesconfig
xtensa               allyesconfig
h8300                allyesconfig
arc                  defconfig
sh                   allmodconfig
parisc               defconfig
s390                 allyesconfig
parisc               allyesconfig
s390                 defconfig
i386                 allyesconfig
sparc                allyesconfig
sparc                defconfig
i386                 defconfig
mips                 allyesconfig
mips                 allmodconfig
powerpc              allyesconfig
powerpc              allmodconfig
powerpc              allnoconfig
powerpc              defconfig
i386                 randconfig-a005-20200818
i386                 randconfig-a002-20200818
i386                 randconfig-a001-20200818
i386                 randconfig-a006-20200818
i386                 randconfig-a003-20200818
i386                 randconfig-a004-20200818
x86_64               randconfig-a013-20200818
x86_64               randconfig-a016-20200818
x86_64               randconfig-a012-20200818
x86_64               randconfig-a011-20200818
x86_64               randconfig-a014-20200818
x86_64               randconfig-a015-20200818
i386                 randconfig-a016-20200818
i386                 randconfig-a011-20200818
i386                 randconfig-a015-20200818
i386                 randconfig-a013-20200818
i386                 randconfig-a012-20200818
i386                 randconfig-a014-20200818
x86_64               randconfig-a006-20200819
x86_64               randconfig-a001-20200819
x86_64               randconfig-a003-20200819
x86_64               randconfig-a005-20200819
x86_64               randconfig-a004-20200819
x86_64               randconfig-a002-20200819
riscv                allyesconfig
riscv                allnoconfig
riscv                defconfig
riscv                allmodconfig
x86_64               rhel
x86_64               allyesconfig
x86_64               rhel-7.6-kselftests
x86_64               defconfig
x86_64               rhel-8.3
x86_64               kexec

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org