[PATCH v2] powerpc/pseries: Do not initiate shutdown when system is running on UPS
As per PAPR, we have to look at both the EPOW sensor value and the event
modifier to identify the type of event and take the appropriate action.

Sensor value = 3 (EPOW_SYSTEM_SHUTDOWN) schedules the system to be shut
down after an OS-defined delay (default 10 minutes).

EPOW event modifier for sensor value = 3: we have to initiate an immediate
shutdown for most event modifiers, except value = 2 (system running on
UPS). The firmware documentation makes it clear that we have to wait a
predefined time before initiating shutdown, and cancel the shutdown process
if power is restored within that time.

I think commit 79872e35 accidentally enabled immediate poweroff for the
EPOW_SHUTDOWN_ON_UPS event. We have a user space tool (rtas_errd) on the
LPAR to monitor for EPOW_SHUTDOWN_ON_UPS. Once it gets the event, it
initiates shutdown after a predefined time, while also monitoring for new
EPOW events. If it receives a "power restored" event before the predefined
time it cancels the shutdown; otherwise, after the predefined time, it
shuts down the system.

Fixes: 79872e35 ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown")
Cc: sta...@vger.kernel.org # v4.0+
Cc: Tyrel Datwyler
Cc: Michael Ellerman
Signed-off-by: Vasant Hegde
---
Changes in v2:
  - Updated patch description based on mpe's and Tyrel's comments.

-Vasant

 arch/powerpc/platforms/pseries/ras.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index f3736fcd98fc..13c86a292c6d 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -184,7 +184,6 @@ static void handle_system_shutdown(char event_modifier)
 	case EPOW_SHUTDOWN_ON_UPS:
 		pr_emerg("Loss of system power detected. System is running on"
			 " UPS/battery. Check RTAS error log for details\n");
-		orderly_poweroff(true);
 		break;

 	case EPOW_SHUTDOWN_LOSS_OF_CRITICAL_FUNCTIONS:
--
2.26.2
Re: [PATCH v2 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.
kernel test robot writes:
> Hi "Aneesh,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on hnaz-linux-mm/master]
> [also build test ERROR on powerpc/next linus/master v5.9-rc1 next-20200819]
> [cannot apply to mmotm/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/mm-debug_vm_pgtable-fixes/20200819-213446
> base: https://github.com/hnaz/linux-mm master
> config: i386-randconfig-s002-20200818 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
> reproduce:
>         # apt-get install sparse
>         # sparse version: v0.6.2-183-gaa6ede3b-dirty
>         # save the attached .config to linux build tree
>         make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot
>
> All errors (new ones prefixed by >>):
>
> >> arch/x86/mm/ioremap.c:484:12: error: redefinition of 'arch_ioremap_p4d_supported'
>       484 | int __init arch_ioremap_p4d_supported(void)
>           |            ^~
>    In file included from arch/x86/mm/ioremap.c:12:
>    include/linux/io.h:41:19: note: previous definition of 'arch_ioremap_p4d_supported' was here
>       41 | static inline int arch_ioremap_p4d_supported(void)
>           |                   ^~

I guess trying to work this out without using #ifdef is complex.

I ended up with the below:

 mm/debug_vm_pgtable.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

modified   mm/debug_vm_pgtable.c
@@ -202,11 +202,12 @@ static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
 	WARN_ON(!pmd_leaf(pmd));
 }

+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
 {
 	pmd_t pmd;

-	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+	if (!arch_ioremap_pmd_supported())
 		return;

 	pr_debug("Validating PMD huge\n");
@@ -220,6 +221,10 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
 	pmd = READ_ONCE(*pmdp);
 	WARN_ON(!pmd_none(pmd));
 }
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */

 static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
 {
@@ -316,11 +321,12 @@ static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot)
 	WARN_ON(!pud_leaf(pud));
 }

+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
 {
 	pud_t pud;

-	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+	if (!arch_ioremap_pud_supported())
 		return;

 	pr_debug("Validating PUD huge\n");
@@ -334,6 +340,10 @@ static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
 	pud = READ_ONCE(*pudp);
 	WARN_ON(!pud_none(pud));
 }
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */

 #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
 static void __init pud_advanced_tests(struct mm_struct *mm,
[PATCH v2 2/2] selftests/powerpc: Add a rtas_filter selftest
Add a selftest to test the basic functionality of CONFIG_RTAS_FILTER.

Signed-off-by: Andrew Donnellan
---
v1->v2:
 - new patch
---
 .../selftests/powerpc/syscalls/Makefile       |   2 +-
 .../selftests/powerpc/syscalls/rtas_filter.c  | 285 ++++++++++++++++++
 2 files changed, 286 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/syscalls/rtas_filter.c

diff --git a/tools/testing/selftests/powerpc/syscalls/Makefile b/tools/testing/selftests/powerpc/syscalls/Makefile
index 01b22775ca87..b63f8459c704 100644
--- a/tools/testing/selftests/powerpc/syscalls/Makefile
+++ b/tools/testing/selftests/powerpc/syscalls/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-TEST_GEN_PROGS := ipc_unmuxed
+TEST_GEN_PROGS := ipc_unmuxed rtas_filter

 CFLAGS += -I../../../../../usr/include
diff --git a/tools/testing/selftests/powerpc/syscalls/rtas_filter.c b/tools/testing/selftests/powerpc/syscalls/rtas_filter.c
new file mode 100644
index 000000000000..3b0ac4ff64e5
--- /dev/null
+++ b/tools/testing/selftests/powerpc/syscalls/rtas_filter.c
@@ -0,0 +1,285 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright 2005-2020 IBM Corporation.
+ *
+ * Includes code from librtas (https://github.com/ibm-power-utilities/librtas/)
+ */
+
+#include <byteswap.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <linux/limits.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include "utils.h"
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define cpu_to_be32(x)	bswap_32(x)
+#define be32_to_cpu(x)	bswap_32(x)
+#else
+#define cpu_to_be32(x)	(x)
+#define be32_to_cpu(x)	(x)
+#endif
+
+#define RTAS_IO_ASSERT	-1098	/* Unexpected I/O Error */
+#define RTAS_UNKNOWN_OP	-1099	/* No Firmware Implementation of Function */
+#define BLOCK_SIZE 4096
+#define PAGE_SIZE 4096
+#define MAX_PAGES 64
+
+static const char *ofdt_rtas_path = "/proc/device-tree/rtas";
+
+typedef uint32_t __be32;
+
+struct rtas_args {
+	__be32 token;
+	__be32 nargs;
+	__be32 nret;
+	__be32 args[16];
+	__be32 *rets;	/* Pointer to return values in args[]. */
+};
+
+struct region {
+	uint64_t addr;
+	uint32_t size;
+	struct region *next;
+};
+
+int read_entire_file(int fd, char **buf, size_t *len)
+{
+	size_t buf_size = 0;
+	size_t off = 0;
+	int rc;
+
+	*buf = NULL;
+	do {
+		buf_size += BLOCK_SIZE;
+		if (*buf == NULL)
+			*buf = malloc(buf_size);
+		else
+			*buf = realloc(*buf, buf_size);
+
+		if (*buf == NULL)
+			return -ENOMEM;
+
+		rc = read(fd, *buf + off, BLOCK_SIZE);
+		if (rc < 0)
+			return -EIO;
+
+		off += rc;
+	} while (rc == BLOCK_SIZE);
+
+	if (len)
+		*len = off;
+
+	return 0;
+}
+
+static int open_prop_file(const char *prop_path, const char *prop_name, int *fd)
+{
+	char *path;
+	int len;
+
+	/* allocate enough for two strings, a slash and trailing NULL */
+	len = strlen(prop_path) + strlen(prop_name) + 1 + 1;
+	path = malloc(len);
+	if (path == NULL)
+		return -ENOMEM;
+
+	snprintf(path, len, "%s/%s", prop_path, prop_name);
+
+	*fd = open(path, O_RDONLY);
+	free(path);
+	if (*fd < 0)
+		return -errno;
+
+	return 0;
+}
+
+static int get_property(const char *prop_path, const char *prop_name,
+			char **prop_val, size_t *prop_len)
+{
+	int rc, fd;
+
+	rc = open_prop_file(prop_path, prop_name, &fd);
+	if (rc)
+		return rc;
+
+	rc = read_entire_file(fd, prop_val, prop_len);
+	close(fd);
+
+	return rc;
+}
+
+int rtas_token(const char *call_name)
+{
+	char *prop_buf = NULL;
+	size_t len;
+	int rc;
+
+	rc = get_property(ofdt_rtas_path, call_name, &prop_buf, &len);
+	if (rc < 0) {
+		rc = RTAS_UNKNOWN_OP;
+		goto err;
+	}
+
+	rc = be32_to_cpu(*(int *)prop_buf);
+
+err:
+	free(prop_buf);
+	return rc;
+}
+
+static int read_kregion_bounds(struct region *kregion)
+{
+	char *buf;
+	int fd;
+	int rc;
+
+	fd = open("/proc/ppc64/rtas/rmo_buffer", O_RDONLY);
+	if (fd < 0) {
+		printf("Could not open rmo_buffer file\n");
+		return RTAS_IO_ASSERT;
+	}
+
+	rc = read_entire_file(fd, &buf, NULL);
+	close(fd);
+	if (rc) {
+		free(buf);
+		return rc;
+	}
+
+	sscanf(buf, "%" SCNx64 " %x", &kregion->addr, &kregion->size);
+	free(buf);
+
+	if (!(kregion->size && kregion->addr) ||
+	    (kregion->size > (PAGE_SIZE * MAX_PAGES))) {
+		printf("Unexpected kregion bounds\n");
+		return RTAS_IO_ASSERT;
+	}

+	return 0;
+}
+
+st
[PATCH v2 1/2] powerpc/rtas: Restrict RTAS requests from userspace
A number of userspace utilities depend on making calls to RTAS to retrieve
information and update various things. The existing API through which we
expose RTAS to userspace, the sys_rtas syscall, exposes more RTAS
functionality than we actually need: it allows root (or anyone with
CAP_SYS_ADMIN) to make any RTAS call they want with arbitrary arguments.

Many RTAS calls take the address of a buffer as an argument, and it's up to
the caller to specify the physical address of the buffer. We allocate a
buffer (the "RMO buffer") in the Real Memory Area that RTAS can access, and
then expose the physical address and size of this buffer in
/proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address,
poke at the buffer using /dev/mem, and pass an address in the RMO buffer to
the RTAS call.

However, there's nothing stopping the caller from specifying whatever
address they want in the RTAS call, and it's easy to construct a series of
RTAS calls that can overwrite arbitrary bytes (even without /dev/mem
access). Additionally, there are some RTAS calls that do potentially
dangerous things and for which there are no legitimate userspace use cases.

In the past this would not have been a particularly big deal, as it was
assumed that root could modify all system state freely, but with Secure
Boot and lockdown we need to care about this.

We can't fundamentally change the ABI at this point; however, we can
address this by implementing a filter that checks RTAS calls against a
list of permitted calls and forces the caller to use addresses within the
RMO buffer. The list is based on the list of calls that are used by the
librtas userspace library, and has been tested with a number of existing
userspace RTAS utilities. For compatibility with any applications we are
not aware of that require other calls, the filter can be turned off at
build time.
Reported-by: Daniel Axtens
Cc: sta...@vger.kernel.org
Signed-off-by: Andrew Donnellan
---
v1->v2:
 - address comments from mpe
 - shorten the names of some struct members
 - make the filter array static/ro_after_init, use const char *
 - genericise the fixed buffer size cases
 - simplify/get rid of some of the error printing
 - get rid of rtas_token_name()
---
 arch/powerpc/Kconfig       |  13 ++++
 arch/powerpc/kernel/rtas.c | 153 +++++++++++++++++++++++++++++++++++++
 2 files changed, 166 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce9..8dd42b82379b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -989,6 +989,19 @@ config PPC_SECVAR_SYSFS
	  read/write operations on these variables. Say Y if you have secure
	  boot enabled and want to expose variables to userspace.

+config PPC_RTAS_FILTER
+	bool "Enable filtering of RTAS syscalls"
+	default y
+	depends on PPC_RTAS
+	help
+	  The RTAS syscall API has security issues that could be used to
+	  compromise system integrity. This option enforces restrictions on the
+	  RTAS calls and arguments passed by userspace programs to mitigate
+	  these issues.
+
+	  Say Y unless you know what you are doing and the filter is causing
+	  problems for you.
+
 endmenu

 config ISA_DMA_API
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 806d554ce357..954f41676f69 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -992,6 +992,147 @@ struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
	return NULL;
 }

+#ifdef CONFIG_PPC_RTAS_FILTER
+
+/*
+ * The sys_rtas syscall, as originally designed, allows root to pass
+ * arbitrary physical addresses to RTAS calls. A number of RTAS calls
+ * can be abused to write to arbitrary memory and do other things that
+ * are potentially harmful to system integrity, and thus should only
+ * be used inside the kernel and not exposed to userspace.
+ *
+ * All known legitimate users of the sys_rtas syscall will only ever
+ * pass addresses that fall within the RMO buffer, and use a known
+ * subset of RTAS calls.
+ *
+ * Accordingly, we filter RTAS requests to check that the call is
+ * permitted, and that provided pointers fall within the RMO buffer.
+ * The rtas_filters list contains an entry for each permitted call,
+ * with the indexes of the parameters which are expected to contain
+ * addresses and sizes of buffers allocated inside the RMO buffer.
+ */
+struct rtas_filter {
+	const char *name;
+	int token;
+	/* Indexes into the args buffer, -1 if not used */
+	int buf_idx1;
+	int size_idx1;
+	int buf_idx2;
+	int size_idx2;
+
+	int fixed_size;
+};
+
+static struct rtas_filter rtas_filters[] __ro_after_init = {
+	{ "ibm,activate-firmware", -1, -1, -1, -1, -1 },
+	{ "ibm,configure-connector", -1, 0, -1, 1, -1, 4096 },	/* Special cased */
+	{ "display-character", -1, -1, -1, -1, -1 },
+	{ "ibm,displ
Re: [PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources
Frederic Barrat writes:
> Fix typo introduced during recent code cleanup, which could lead to
> silently not freeing resources or oops message (on PCI hotplug or CAPI
> reset).
> Only impacts ioda2, the code path for ioda1 is correct.
>
> Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA setup state")
> Signed-off-by: Frederic Barrat

I changed the subject to:

  powerpc/powernv/pci: Fix possible crash when releasing DMA resources

To hopefully better convey that it's a moderately serious bug, even if the
root cause is just a typo. Otherwise folks scanning the commit log might
think it's just a harmless typo.

cheers

> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c9c25fb0783c..023a4f987bb2 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2705,7 +2705,7 @@ void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe)
> 	struct iommu_table *tbl = pe->table_group.tables[0];
> 	int64_t rc;
>
> -	if (pe->dma_setup_done)
> +	if (!pe->dma_setup_done)
> 		return;
>
> 	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> --
> 2.26.2
[RFC PATCH 2/2] KVM: PPC: Book3S HV: Support prefixed instructions
There are two main places where instructions are loaded from the guest:

 * Emulate loadstore - such as when performing MMIO emulation triggered by
   an HDSI
 * After an HV emulation assistance interrupt (e40)

If a prefixed instruction triggers one of these cases, its suffix must be
loaded as well. Use the SRR1_PREFIXED bit to decide whether a suffix needs
to be loaded, and make sure that if this bit is set, inject_interrupt()
also sets it when giving an interrupt to the guest.

ISA v3.1 extends the Hypervisor Emulation Instruction Register (HEIR) to
64 bits to accommodate prefixed instructions. For interrupts caused by a
word instruction, the instruction is loaded into bits 32:63 and bits 0:31
are zeroed. When caused by a prefixed instruction, the prefix and suffix
are loaded into bits 0:63.

Signed-off-by: Jordan Niethe
---
 arch/powerpc/kvm/book3s.c               | 15 +++++++++++++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c     | 10 +++++++---
 arch/powerpc/kvm/book3s_hv_builtin.c    |  3 +++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 14 ++++++++++++++
 4 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 70d8967acc9b..18b1928a571b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -456,13 +456,24 @@ int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
 {
	ulong pc = kvmppc_get_pc(vcpu);
	u32 word;
+	u64 doubleword;
	int r;

	if (type == INST_SC)
		pc -= 4;

-	r = kvmppc_ld(vcpu, &pc, sizeof(u32), &word, false);
-	*inst = ppc_inst(word);
+	if ((kvmppc_get_msr(vcpu) & SRR1_PREFIXED)) {
+		r = kvmppc_ld(vcpu, &pc, sizeof(u64), &doubleword, false);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+		*inst = ppc_inst_prefix(doubleword & 0xffffffff, doubleword >> 32);
+#else
+		*inst = ppc_inst_prefix(doubleword >> 32, doubleword & 0xffffffff);
+#endif
+	} else {
+		r = kvmppc_ld(vcpu, &pc, sizeof(u32), &word, false);
+		*inst = ppc_inst(word);
+	}
+
	if (r == EMULATE_DONE)
		return r;
	else
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 775ce41738ce..0802471f4856 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -411,9 +411,13 @@ static int instruction_is_store(struct ppc_inst instr)
	unsigned int mask;

	mask = 0x10000000;
-	if ((ppc_inst_val(instr) & 0xfc000000) == 0x7c000000)
-		mask = 0x100;		/* major opcode 31 */
-	return (ppc_inst_val(instr) & mask) != 0;
+	if (ppc_inst_prefixed(instr)) {
+		return (ppc_inst_suffix(instr) & mask) != 0;
+	} else {
+		if ((ppc_inst_val(instr) & 0xfc000000) == 0x7c000000)
+			mask = 0x100;	/* major opcode 31 */
+		return (ppc_inst_val(instr) & mask) != 0;
+	}
 }

 int kvmppc_hv_emulate_mmio(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 073617ce83e0..41e07e63104b 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -807,6 +807,9 @@ static void inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 srr1_flags)
		new_pc += 0xC000000000004000ULL;
	}

+	if (msr & SRR1_PREFIXED)
+		srr1_flags |= SRR1_PREFIXED;
+
	kvmppc_set_srr0(vcpu, pc);
	kvmppc_set_srr1(vcpu, (msr & SRR1_MSR_BITS) | srr1_flags);
	kvmppc_set_pc(vcpu, new_pc);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 4853b3444c5f..f2a609413621 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1365,6 +1365,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
	cmpwi	r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST
	bne	11f
	mfspr	r3,SPRN_HEIR
+	andis.	r0,r11,SRR1_PREFIXED@h
+	cmpwi	r0,0
+	beq	12f
+	rldicl	r4,r3,0,32	/* Suffix */
+	srdi	r3,r3,32	/* Prefix */
+	b	11f
+12:
+BEGIN_FTR_SECTION
+	rldicl	r3,r3,0,32	/* Word */
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_31)
 11:	stw	r3,VCPU_HEIR(r9)
	stw	r4,VCPU_HEIR+4(r9)
@@ -2175,6 +2185,10 @@ fast_interrupt_c_return:
	ori	r4, r3, MSR_DR		/* Enable paging for data */
	mtmsrd	r4
	lwz	r8, 0(r10)
+	andis.	r7, r11, SRR1_PREFIXED@h
+	cmpwi	r7,0
+	beq	+4
+	lwz	r5, 4(r10)
	mtmsrd	r3

	/* Store the result */
--
2.17.1
[RFC PATCH 1/2] KVM: PPC: Use the ppc_inst type
The ppc_inst type was added to help cope with the addition of prefixed
instructions to the ISA. Convert KVM to use this new type for dealing
with instructions. For now, do not try to add further support for
prefixed instructions.

Signed-off-by: Jordan Niethe
---
 arch/powerpc/include/asm/disassemble.h    |  80 +++++++++---------
 arch/powerpc/include/asm/kvm_book3s.h     |   4 +-
 arch/powerpc/include/asm/kvm_book3s_asm.h |   3 +-
 arch/powerpc/include/asm/kvm_host.h       |   5 +-
 arch/powerpc/include/asm/kvm_ppc.h        |  14 ++--
 arch/powerpc/kernel/asm-offsets.c         |   2 +
 arch/powerpc/kernel/kvm.c                 |  99 +++++++++++-----------
 arch/powerpc/kvm/book3s.c                 |   6 +-
 arch/powerpc/kvm/book3s.h                 |   2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c       |   8 +-
 arch/powerpc/kvm/book3s_emulate.c         |  30 +++----
 arch/powerpc/kvm/book3s_hv.c              |  19 ++---
 arch/powerpc/kvm/book3s_hv_nested.c       |   4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |   5 ++
 arch/powerpc/kvm/book3s_hv_tm.c           |  17 ++--
 arch/powerpc/kvm/book3s_hv_tm_builtin.c   |  12 +--
 arch/powerpc/kvm/book3s_paired_singles.c  |  15 ++--
 arch/powerpc/kvm/book3s_pr.c              |  20 ++---
 arch/powerpc/kvm/booke.c                  |  18 ++---
 arch/powerpc/kvm/booke.h                  |   4 +-
 arch/powerpc/kvm/booke_emulate.c          |   4 +-
 arch/powerpc/kvm/e500_emulate.c           |   6 +-
 arch/powerpc/kvm/e500_mmu_host.c          |   6 +-
 arch/powerpc/kvm/emulate.c                |  15 ++--
 arch/powerpc/kvm/emulate_loadstore.c      |   8 +-
 arch/powerpc/kvm/powerpc.c                |   4 +-
 arch/powerpc/kvm/trace.h                  |   9 ++-
 arch/powerpc/kvm/trace_booke.h            |   8 +-
 arch/powerpc/kvm/trace_pr.h               |   8 +-
 arch/powerpc/lib/inst.c                   |   4 +-
 arch/powerpc/lib/sstep.c                  |   4 +-
 arch/powerpc/sysdev/fsl_pci.c             |   4 +-
 32 files changed, 237 insertions(+), 210 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h
index 8d2ebc36d5e3..91dbe8e5cd13 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -10,75 +10,82 @@
 #define __ASM_PPC_DISASSEMBLE_H__

 #include <linux/types.h>
+#include <asm/inst.h>

-static inline unsigned int get_op(u32 inst)
+static inline unsigned int get_op(struct ppc_inst inst)
 {
-	return inst >> 26;
+	return ppc_inst_val(inst) >> 26;
 }

-static inline unsigned int get_xop(u32 inst)
+static inline unsigned int get_xop(struct ppc_inst inst)
 {
-	return (inst >> 1) & 0x3ff;
+	return (ppc_inst_val(inst) >> 1) & 0x3ff;
 }

-static inline unsigned int get_sprn(u32 inst)
+static inline unsigned int get_sprn(struct ppc_inst inst)
 {
-	return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+	u32 word = ppc_inst_val(inst);
+
+	return ((word >> 16) & 0x1f) | ((word >> 6) & 0x3e0);
 }

-static inline unsigned int get_dcrn(u32 inst)
+static inline unsigned int get_dcrn(struct ppc_inst inst)
 {
-	return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+	u32 word = ppc_inst_val(inst);
+
+	return ((word >> 16) & 0x1f) | ((word >> 6) & 0x3e0);
 }

-static inline unsigned int get_tmrn(u32 inst)
+static inline unsigned int get_tmrn(struct ppc_inst inst)
 {
-	return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+	u32 word = ppc_inst_val(inst);
+
+	return ((word >> 16) & 0x1f) | ((word >> 6) & 0x3e0);
 }

-static inline unsigned int get_rt(u32 inst)
+static inline unsigned int get_rt(struct ppc_inst inst)
 {
-	return (inst >> 21) & 0x1f;
+	return (ppc_inst_val(inst) >> 21) & 0x1f;
 }

-static inline unsigned int get_rs(u32 inst)
+static inline unsigned int get_rs(struct ppc_inst inst)
 {
-	return (inst >> 21) & 0x1f;
+	return (ppc_inst_val(inst) >> 21) & 0x1f;
 }

-static inline unsigned int get_ra(u32 inst)
+static inline unsigned int get_ra(struct ppc_inst inst)
 {
-	return (inst >> 16) & 0x1f;
+	return (ppc_inst_val(inst) >> 16) & 0x1f;
 }

-static inline unsigned int get_rb(u32 inst)
+static inline unsigned int get_rb(struct ppc_inst inst)
 {
-	return (inst >> 11) & 0x1f;
+	return (ppc_inst_val(inst) >> 11) & 0x1f;
 }

-static inline unsigned int get_rc(u32 inst)
+static inline unsigned int get_rc(struct ppc_inst inst)
 {
-	return inst & 0x1;
+	return ppc_inst_val(inst) & 0x1;
 }

-static inline unsigned int get_ws(u32 inst)
+static inline unsigned int get_ws(struct ppc_inst inst)
 {
-	return (inst >> 11) & 0x1f;
+	return (ppc_inst_val(inst) >> 11) & 0x1f;
 }

-static inline unsigned int get_d(u32 inst)
+static inline unsigned int get_d(struct ppc_inst inst)
 {
-	return inst & 0xffff;
+	return ppc_inst_val(inst) & 0xffff;
 }

-static inline unsigned int get_oc(u32 inst)
+static inline unsigned int get_oc(struct ppc_inst inst)
 {
-	return (inst >>
Re: [PATCH v3] powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory
Christoph Hellwig writes:
> On Tue, Aug 18, 2020 at 07:11:26PM -0300, Thiago Jung Bauermann wrote:
>> POWER secure guests (i.e., guests which use the Protection Execution
>> Facility) need to use SWIOTLB to be able to do I/O with the hypervisor, but
>> they don't need the SWIOTLB memory to be in low addresses since the
>> hypervisor doesn't have any addressing limitation.
>>
>> This solves a SWIOTLB initialization problem we are seeing in secure guests
>> with 128 GB of RAM: they are configured with 4 GB of crashkernel reserved
>> memory, which leaves no space for SWIOTLB in low addresses.
>>
>> To do this, we use mostly the same code as swiotlb_init(), but allocate the
>> buffer using memblock_alloc() instead of memblock_alloc_low().
>>
>> Signed-off-by: Thiago Jung Bauermann
>
> Looks fine to me (except for the pointlessly long comment lines, but I've
> been told that's the powerpc way).

They're 80 columns AFAICS?

cheers
Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support
Excerpts from Shawn Anastasio's message of August 19, 2020 6:59 am:
> On 8/18/20 2:11 AM, Nicholas Piggin wrote:
>> Very reasonable point.
>>
>> The problem we're trying to get a handle on is live partition migration
>> where a running guest might be using SAO then get migrated to a P10. I
>> don't think we have a good way to handle this case. Potentially the
>> hypervisor could revoke the page tables if the guest is running in hash
>> mode and the guest kernel could be taught about that and sigbus the
>> process, but in radix the guest controls those page tables and the SAO
>> state and I don't think there's a way to cause it to take a fault.
>>
>> I also don't know what the proprietary hypervisor does here.
>>
>> We could add it back, default to n, or make it bare metal only, or
>> somehow try to block live migration to a later CPU without the facility.
>> I wouldn't be against that.
>
> Admittedly I'm not too familiar with the specifics of live migration
> or guest memory management, but restoring the functionality and adding
> a way to prevent migration of SAO-using guests seems like a reasonable
> choice to me. Would this be done with help from the guest using some
> sort of infrastructure to signal to the hypervisor that SAO is in use,
> or entirely on the hypervisor by e.g. scanning through the process
> table for SAO pages?

The first step might be to just re-add the functionality but disable it
by default if firmware_has_feature(FW_FEATURE_LPAR). You could have a
config or boot option to allow guests to use it at the cost of migration
compatibility. That would probably be good enough for experimenting with
the feature.

I think modifying the hypervisor and/or guest to deal with migration is
probably too much work to be justified at the moment.

>> It would be very interesting to know how it performs in such a "real"
>> situation. I don't know how well POWER9 has optimised it -- it's
>> possible that it's not much better than putting lwsync after every load
>> or store.
>
> This is definitely worth investigating in depth. That said, even if the
> performance on P9 isn't super great, I think the feature could still be
> useful, since it would offer more granularity than the sledgehammer
> approach of emitting lwsync everywhere.

Sure, we'd be interested to hear of results.

> I'd be happy to put in some of the work required to get this to a point
> where it can be reintroduced without breaking guest migration - I'd just
> need some pointers on getting started with whatever approach is decided on.

I think re-adding it as I said above would be okay. The code itself is
not complex so that was not the reason for removal.

Thanks,
Nick
Re: [PATCH v2 1/4] powerpc/drmem: Make lmb_size 64 bit
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.1, v5.7.15, v5.4.58, v4.19.139, v4.14.193, v4.9.232, v4.4.232.

v5.8.1: Build OK!
v5.7.15: Build OK!
v5.4.58: Build OK!
v4.19.139: Build OK!

v4.14.193: Failed to apply! Possible dependencies:
    22508f3dc985 ("powerpc/numa: Look up device node in of_get_usable_memory()")
    2c77721552e5 ("powerpc: Move of_drconf_cell struct to asm/drmem.h")
    35f80debaef0 ("powerpc/numa: Look up device node in of_get_assoc_arrays()")
    514a9cb3316a ("powerpc/numa: Update numa code use walk_drmem_lmbs")
    6195a5001f1d ("powerpc/pseries: Update memory hotplug code to use drmem LMB array")
    6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
    b6eca183e23e ("powerpc/kernel: Enables memory hot-remove after reboot on pseries guests")
    b88fc309d6ad ("powerpc/numa: Look up associativity array in of_drconf_to_nid_single")

v4.9.232: Failed to apply! Possible dependencies:
    3a2df3798d4d ("powerpc/mm: Make switch_mm_irqs_off() out of line")
    43ed84a891b7 ("powerpc/mm: Move pgdir setting into a helper")
    5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
    5d451a87e5eb ("powerpc/64: Retrieve number of L1 cache sets from device-tree")
    6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
    70cd4c10b290 ("KVM: PPC: Book3S HV: Fix software walk of guest process page tables")
    9b081e10805c ("powerpc: port 64 bits pgtable_cache to 32 bits")
    a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with KVM")
    bd067f83b084 ("powerpc/64: Fix naming of cache block vs. cache line")
    dbcbfee0c81c ("powerpc/64: More definitions for POWER9")
    e2827fe5c156 ("powerpc/64: Clean up ppc64_caches using a struct per cache")
    f4329f2ecb14 ("powerpc/64s: Reduce exception alignment")

v4.4.232: Failed to apply! Possible dependencies:
    11a6f6abd74a ("powerpc/mm: Move radix/hash common data structures to book3s64 headers")
    26b6a3d9bb48 ("powerpc/mm: move pte headers to book3s directory")
    3808a88985b4 ("powerpc: Move FW feature probing out of pseries probe()")
    3dfcb315d81e ("powerpc/mm: make a separate copy for book3s")
    5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
    5d31a96e6c01 ("powerpc/livepatch: Add livepatch stack to struct thread_info")
    6574ba950bbe ("powerpc/kernel: Convert cpu_has_feature() to returning bool")
    6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
    a141cca3892b ("powerpc/mm: Add early_[cpu|mmu]_has_feature()")
    a8ed87c92adf ("powerpc/mm/radix: Add MMU_FTR_RADIX")
    b92a226e5284 ("powerpc: Move cpu_has_feature() to a separate file")
    da6a97bf12d5 ("powerpc: Move epapr_paravirt_early_init() to early_init_devtree()")
    f63e6d898760 ("powerpc/livepatch: Add livepatch header")

NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks
Sasha
Re: [PATCH v2 3/4] powerpc/memhotplug: Make lmb size 64bit
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.1, v5.7.15, v5.4.58, v4.19.139, v4.14.193, v4.9.232, v4.4.232.

v5.8.1: Build OK!
v5.7.15: Build OK!
v5.4.58: Build OK!

v4.19.139: Failed to apply! Possible dependencies:
    Unable to calculate

v4.14.193: Failed to apply! Possible dependencies:
    Unable to calculate

v4.9.232: Failed to apply! Possible dependencies:
    1a367063ca0c ("powerpc/pseries: Check memory device state before onlining/offlining")
    25b587fba9a4 ("powerpc/pseries: Correct possible read beyond dlpar sysfs buffer")
    333f7b76865b ("powerpc/pseries: Implement indexed-count hotplug memory add")
    753843471cbb ("powerpc/pseries: Implement indexed-count hotplug memory remove")
    943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
    c21f515c7436 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
    e70d59700fc3 ("powerpc/pseries: Introduce memory hotplug READD operation")
    f84775c2d5d9 ("powerpc/pseries: Fix build break when MEMORY_HOTREMOVE=n")

v4.4.232: Failed to apply! Possible dependencies:
    183deeea5871 ("powerpc/pseries: Consolidate CPU hotplug code to hotplug-cpu.c")
    1a367063ca0c ("powerpc/pseries: Check memory device state before onlining/offlining")
    1dc759566636 ("powerpc/pseries: Use kernel hotplug queue for PowerVM hotplug events")
    1f859adb9253 ("powerpc/pseries: Verify CPU doesn't exist before adding")
    25b587fba9a4 ("powerpc/pseries: Correct possible read beyond dlpar sysfs buffer")
    333f7b76865b ("powerpc/pseries: Implement indexed-count hotplug memory add")
    4a4bdfea7cb7 ("powerpc/pseries: Refactor dlpar_add_lmb() code")
    753843471cbb ("powerpc/pseries: Implement indexed-count hotplug memory remove")
    9054619ef54a ("powerpc/pseries: Add pseries hotplug workqueue")
    943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
    9dc512819e4b ("powerpc: Fix unused function warning 'lmb_to_memblock'")
    bdf5fc633804 ("powerpc/pseries: Update LMB associativity index during DLPAR add/remove")
    c21f515c7436 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
    e70d59700fc3 ("powerpc/pseries: Introduce memory hotplug READD operation")
    e9d764f80396 ("powerpc/pseries: Enable kernel CPU dlpar from sysfs")
    ec999072442a ("powerpc/pseries: Auto-online hotplugged memory")
    f84775c2d5d9 ("powerpc/pseries: Fix build break when MEMORY_HOTREMOVE=n")
    fdb4f6e99ffa ("powerpc/pseries: Remove call to memblock_add()")

NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks
Sasha
[PATCH net-next v2 4/4] ibmvnic: merge ibmvnic_reset_init and ibmvnic_init
These two functions share the majority of the code, hence merge them together. In the meanwhile, add a reset pass-in parameter to differentiate them. Thus, the code is easier to read and to tell the difference between reset_init and regular init. Signed-off-by: Lijun Pan --- drivers/net/ethernet/ibm/ibmvnic.c | 65 ++ 1 file changed, 13 insertions(+), 52 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 4ca4647db72a..47fbe0553570 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -104,8 +104,7 @@ static int send_login(struct ibmvnic_adapter *adapter); static void send_cap_queries(struct ibmvnic_adapter *adapter); static int init_sub_crqs(struct ibmvnic_adapter *); static int init_sub_crq_irqs(struct ibmvnic_adapter *adapter); -static int ibmvnic_init(struct ibmvnic_adapter *); -static int ibmvnic_reset_init(struct ibmvnic_adapter *); +static int ibmvnic_reset_init(struct ibmvnic_adapter *, bool reset); static void release_crq_queue(struct ibmvnic_adapter *); static int __ibmvnic_set_mac(struct net_device *, u8 *); static int init_crq_queue(struct ibmvnic_adapter *adapter); @@ -1868,7 +1867,7 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter, return rc; } - rc = ibmvnic_reset_init(adapter); + rc = ibmvnic_reset_init(adapter, true); if (rc) return IBMVNIC_INIT_FAILED; @@ -1986,7 +1985,7 @@ static int do_reset(struct ibmvnic_adapter *adapter, goto out; } - rc = ibmvnic_reset_init(adapter); + rc = ibmvnic_reset_init(adapter, true); if (rc) { rc = IBMVNIC_INIT_FAILED; goto out; @@ -2093,7 +2092,7 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter, return rc; } - rc = ibmvnic_init(adapter); + rc = ibmvnic_reset_init(adapter, false); if (rc) return rc; @@ -4970,7 +4969,7 @@ static int init_crq_queue(struct ibmvnic_adapter *adapter) return retrc; } -static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter) +static int ibmvnic_reset_init(struct 
ibmvnic_adapter *adapter, bool reset) { struct device *dev = &adapter->vdev->dev; unsigned long timeout = msecs_to_jiffies(3); @@ -4979,10 +4978,12 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter) adapter->from_passive_init = false; - old_num_rx_queues = adapter->req_rx_queues; - old_num_tx_queues = adapter->req_tx_queues; + if (reset) { + old_num_rx_queues = adapter->req_rx_queues; + old_num_tx_queues = adapter->req_tx_queues; + reinit_completion(&adapter->init_done); + } - reinit_completion(&adapter->init_done); adapter->init_done_rc = 0; rc = ibmvnic_send_crq_init(adapter); if (rc) { @@ -5000,7 +5001,8 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter) return adapter->init_done_rc; } - if (test_bit(0, &adapter->resetting) && !adapter->wait_for_reset && + if (reset && + test_bit(0, &adapter->resetting) && !adapter->wait_for_reset && adapter->reset_reason != VNIC_RESET_MOBILITY) { if (adapter->req_rx_queues != old_num_rx_queues || adapter->req_tx_queues != old_num_tx_queues) { @@ -5028,47 +5030,6 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter) return rc; } -static int ibmvnic_init(struct ibmvnic_adapter *adapter) -{ - struct device *dev = &adapter->vdev->dev; - unsigned long timeout = msecs_to_jiffies(3); - int rc; - - adapter->from_passive_init = false; - - adapter->init_done_rc = 0; - rc = ibmvnic_send_crq_init(adapter); - if (rc) { - dev_err(dev, "Send crq init failed with error %d\n", rc); - return rc; - } - - if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { - dev_err(dev, "Initialization sequence timed out\n"); - return -1; - } - - if (adapter->init_done_rc) { - release_crq_queue(adapter); - return adapter->init_done_rc; - } - - rc = init_sub_crqs(adapter); - if (rc) { - dev_err(dev, "Initialization of sub crqs failed\n"); - release_crq_queue(adapter); - return rc; - } - - rc = init_sub_crq_irqs(adapter); - if (rc) { - dev_err(dev, "Failed to initialize sub crq irqs\n"); - 
release_crq_queue(adapter); - } - - return rc; -} - static struct device_attribute dev_attr_failover; static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) @@ -5131,7 +509
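The shape of this refactor — two near-duplicate init paths collapsed into one function gated by a reset flag — can be sketched standalone. This is a hedged illustration with hypothetical names (struct adapter, send_init, device_init), not the actual ibmvnic code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical adapter state standing in for struct ibmvnic_adapter. */
struct adapter {
	int req_rx_queues;
	int req_tx_queues;
	int init_done_rc;
};

/* Stub for the CRQ init handshake; always succeeds here. */
static int send_init(struct adapter *a)
{
	a->init_done_rc = 0;
	return 0;
}

/*
 * One entry point replaces two near-identical functions: the steps that
 * only make sense during a reset (saving and re-checking the old queue
 * counts) are guarded by the 'reset' flag, while the shared init
 * sequence runs unconditionally.
 */
static int device_init(struct adapter *a, bool reset)
{
	int old_rx = 0, old_tx = 0;
	int rc;

	if (reset) {
		old_rx = a->req_rx_queues;
		old_tx = a->req_tx_queues;
	}

	rc = send_init(a);
	if (rc)
		return rc;	/* propagate the failure instead of ignoring it */

	if (reset &&
	    (a->req_rx_queues != old_rx || a->req_tx_queues != old_tx))
		return 1;	/* queue layout changed across the reset */

	return a->init_done_rc;
}
```

Call sites would then pass true from the reset paths and false from probe, mirroring the hunks in the patch above.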
[PATCH net-next v2 3/4] ibmvnic: remove never executed if statement
At the beginning of the function, from_passive_init is set false by "adapter->from_passive_init = false;", hence the if statement will never run. Signed-off-by: Lijun Pan --- drivers/net/ethernet/ibm/ibmvnic.c | 12 1 file changed, 12 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 644352e5056d..4ca4647db72a 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -5000,12 +5000,6 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter) return adapter->init_done_rc; } - if (adapter->from_passive_init) { - adapter->state = VNIC_OPEN; - adapter->from_passive_init = false; - return -1; - } - if (test_bit(0, &adapter->resetting) && !adapter->wait_for_reset && adapter->reset_reason != VNIC_RESET_MOBILITY) { if (adapter->req_rx_queues != old_num_rx_queues || @@ -5059,12 +5053,6 @@ static int ibmvnic_init(struct ibmvnic_adapter *adapter) return adapter->init_done_rc; } - if (adapter->from_passive_init) { - adapter->state = VNIC_OPEN; - adapter->from_passive_init = false; - return -1; - } - rc = init_sub_crqs(adapter); if (rc) { dev_err(dev, "Initialization of sub crqs failed\n"); -- 2.23.0
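The "never executed" claim is easy to see in miniature. A standalone sketch (hypothetical names, not the ibmvnic code) of a flag that is unconditionally cleared before the only place it is tested:

```c
#include <assert.h>
#include <stdbool.h>

struct adapter {
	bool from_passive_init;
	int state;
};

/* Shape of the function before the patch: the branch is dead code
 * because the flag is cleared before it can ever be tested true. */
static int init_with_dead_branch(struct adapter *a)
{
	a->from_passive_init = false;

	/* ... CRQ init and completion wait would happen here ... */

	if (a->from_passive_init) {	/* never true: cleared above */
		a->state = -1;
		return -1;
	}
	return 0;
}
```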
[PATCH net-next v2 2/4] ibmvnic: improve ibmvnic_init and ibmvnic_reset_init
When H_SEND_CRQ command returns with H_CLOSED, it means the server's CRQ is not ready yet. Instead of resetting immediately, we wait for the server to launch passive init. ibmvnic_init() and ibmvnic_reset_init() should also return the error code from ibmvnic_send_crq_init() call. Signed-off-by: Lijun Pan --- v2: removes __func__ in error messages. drivers/net/ethernet/ibm/ibmvnic.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 65f5d99f97dc..644352e5056d 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -3568,8 +3568,7 @@ static int ibmvnic_send_crq(struct ibmvnic_adapter *adapter, if (rc) { if (rc == H_CLOSED) { dev_warn(dev, "CRQ Queue closed\n"); - if (test_bit(0, &adapter->resetting)) - ibmvnic_reset(adapter, VNIC_RESET_FATAL); + /* do not reset, report the fail, wait for passive init from server */ } dev_warn(dev, "Send error (rc=%d)\n", rc); @@ -4985,7 +4984,12 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter) reinit_completion(&adapter->init_done); adapter->init_done_rc = 0; - ibmvnic_send_crq_init(adapter); + rc = ibmvnic_send_crq_init(adapter); + if (rc) { + dev_err(dev, "Send crq init failed with error %d\n", rc); + return rc; + } + if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { dev_err(dev, "Initialization sequence timed out\n"); return -1; @@ -5039,7 +5043,12 @@ static int ibmvnic_init(struct ibmvnic_adapter *adapter) adapter->from_passive_init = false; adapter->init_done_rc = 0; - ibmvnic_send_crq_init(adapter); + rc = ibmvnic_send_crq_init(adapter); + if (rc) { + dev_err(dev, "Send crq init failed with error %d\n", rc); + return rc; + } + if (!wait_for_completion_timeout(&adapter->init_done, timeout)) { dev_err(dev, "Initialization sequence timed out\n"); return -1; -- 2.23.0
[PATCH net-next v2 1/4] ibmvnic: compare adapter->init_done_rc with more readable ibmvnic_rc_codes
Instead of comparing (adapter->init_done_rc == 1), let it be
(adapter->init_done_rc == PARTIALSUCCESS).

Signed-off-by: Lijun Pan
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5afb3c9c52d2..65f5d99f97dc 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -974,7 +974,7 @@ static int set_link_state(struct ibmvnic_adapter *adapter, u8 link_state)
 			return -1;
 		}
 
-		if (adapter->init_done_rc == 1) {
+		if (adapter->init_done_rc == PARTIALSUCCESS) {
 			/* Partuial success, delay and re-send */
 			mdelay(1000);
 			resend = true;
-- 
2.23.0
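The readability gain here is the standard magic-number-to-named-constant move. A minimal standalone sketch — the enum values and helper below are illustrative, mirroring the idea of ibmvnic_rc_codes rather than quoting the real header:

```c
#include <assert.h>

/* Illustrative return codes; only the named-constant idea matters. */
enum vnic_rc { SUCCESS = 0, PARTIALSUCCESS = 1, PERMFAILURE = 2 };

/* "partial success => delay and resend" is self-documenting;
 * "init_done_rc == 1" is not. */
static int needs_resend(int init_done_rc)
{
	return init_done_rc == PARTIALSUCCESS;
}
```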
[PATCH net-next v2 0/4] refactoring of ibmvnic code
This patch series refactors the reset_init and init functions, and makes
some other cosmetic changes, so that the code is easier to read and debug.

v2 removes __func__ and v1's 1/5.

Lijun Pan (4):
  ibmvnic: compare adapter->init_done_rc with more readable ibmvnic_rc_codes
  ibmvnic: improve ibmvnic_init and ibmvnic_reset_init
  ibmvnic: remove never executed if statement
  ibmvnic: merge ibmvnic_reset_init and ibmvnic_init

 drivers/net/ethernet/ibm/ibmvnic.c | 84 --
 1 file changed, 21 insertions(+), 63 deletions(-)

-- 
2.23.0
Re: fsl_espi errors on v5.7.15
On 19/08/20 6:15 pm, Heiner Kallweit wrote: > On 19.08.2020 00:44, Chris Packham wrote: >> Hi Again, >> >> On 17/08/20 9:09 am, Chris Packham wrote: >> >>> On 14/08/20 6:19 pm, Heiner Kallweit wrote: On 14.08.2020 04:48, Chris Packham wrote: > Hi, > > I'm seeing a problem with accessing spi-nor after upgrading a T2081 > based system to linux v5.7.15 > > For this board u-boot and the u-boot environment live on spi-nor. > > When I use fw_setenv from userspace I get the following kernel logs > > # fw_setenv foo=1 > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty! > fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 > fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty! > fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 > fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty! > fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 > ... > This error reporting doesn't exist yet in 4.4. So you may have an issue under 4.4 too, it's just not reported. Did you verify that under 4.4 fw_setenv actually has an effect? >>> Just double checked and yes under 4.4 the setting does get saved. 
> If I run fw_printenv (before getting it into a bad state) it is able to > display the content of the boards u-boot environment. > This might indicate an issue with spi being locked. I've seen related questions, just use the search engine of your choice and check for fw_setenv and locked. >>> I'm running a version of fw_setenv which includes >>> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it shouldn't >>> be locking things unnecessarily. > If been unsuccessful in producing a setup for bisecting the issue. I do > know the issue doesn't occur on the old 4.4.x based kernel but that's > probably not much help. > > Any pointers on what the issue (and/or solution) might be. >> I finally managed to get our board running with a vanilla kernel. With >> corenet64_smp_defconfig I occasionally see >> >> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! >> >> other than the message things seem to be working. >> >> With a custom defconfig I see >> >> fsl_espi ffe11.spi: Transfer done but SPIE_DON isn't set! >> fsl_espi ffe11.spi: Transfer done but rx/tx fifo's aren't empty! >> fsl_espi ffe11.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 >> ... >> >> and access to the spi-nor does not work until the board is reset. >> >> I'll try and pick apart the differences between the two defconfigs. I now think my earlier testing is invalid. I have seen the problem with either defconfig if I try hard enough. I had convinced myself that the problem was CONFIG_PREEMPT but that was before I found boot-to-boot differences with the same kernel. It's possible that I'm chasing multiple issues with the same symptom. The error I'm most concerned with is in the sequence 1. boot with old image 2. write environment 3. boot with new image 4. write environment 5. write fails and environment is corrupted After I recover the system things sometimes seem fine. Until I repeat the sequence above. > Also relevant may be: > - Which dts are you using? Custom but based heavily on the t2080rdb. 
> - What's the spi-nor type, and at which frequency are you operating it? The board has several alternate parts for the spi-nor so the dts just specifies compatible = "jedec,spi-nor" the actual chip detected on the board I have is "n25q032a (4096 Kbytes)". The dts sets spi-max-frequency = <1000> I haven't measured the actual frequency on the bus. > - Does the issue still happen if you lower the frequency? I did play around with the frequency initially but I should probably give that another go now that I have a better reproduction method.
Re: [PATCH v3 09/17] memblock: make memblock_debug and related functionality private
On Wed, Aug 19, 2020 at 12:24:05PM -0700, Andrew Morton wrote: > On Tue, 18 Aug 2020 18:16:26 +0300 Mike Rapoport wrote: > > > From: Mike Rapoport > > > > The only user of memblock_dbg() outside memblock was s390 setup code and it > > is converted to use pr_debug() instead. > > This allows to stop exposing memblock_debug and memblock_dbg() to the rest > > of the kernel. > > > > --- a/mm/memblock.c > > +++ b/mm/memblock.c > > @@ -137,7 +137,10 @@ struct memblock_type physmem = { > > i < memblock_type->cnt;\ > > i++, rgn = &memblock_type->regions[i]) > > > > -int memblock_debug __initdata_memblock; > > +#define memblock_dbg(fmt, ...) \ > > + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) > > + > > checkpatch doesn't like this much. > > ERROR: Macros starting with if should be enclosed by a do - while loop to > avoid possible if/else logic defects > #101: FILE: mm/memblock.c:140: > +#define memblock_dbg(fmt, ...) \ > + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) > > WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then > dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... > #102: FILE: mm/memblock.c:141: > + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) > > ERROR: trailing statements should be on next line > #102: FILE: mm/memblock.c:141: > + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) > > > The first one is significant: > > if (foo) > memblock_dbg(...); > else > save_the_world(); > > could end up inadvertently destroying the planet. Well, it didn't till now ;-) > This? > > --- > a/mm/memblock.c~memblock-make-memblock_debug-and-related-functionality-private-fix > +++ a/mm/memblock.c > @@ -137,8 +137,11 @@ struct memblock_type physmem = { >i < memblock_type->cnt;\ >i++, rgn = &memblock_type->regions[i]) > > -#define memblock_dbg(fmt, ...) \ > - if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) > +#define memblock_dbg(fmt, ...) 
> \ > + do {\ > + if (memblock_debug) \ > + pr_info(fmt, ##__VA_ARGS__);\ > + } while (0) Sure, thanks! > static int memblock_debug __initdata_memblock; > static bool system_has_some_mirror __initdata_memblock = false; > _ > -- Sincerely yours, Mike.
Re: [PATCH v3] powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory
Konrad Rzeszutek Wilk writes:

> On Tue, Aug 18, 2020 at 07:11:26PM -0300, Thiago Jung Bauermann wrote:
>> POWER secure guests (i.e., guests which use the Protection Execution
>> Facility) need to use SWIOTLB to be able to do I/O with the hypervisor, but
>> they don't need the SWIOTLB memory to be in low addresses since the
>> hypervisor doesn't have any addressing limitation.
>>
>> This solves a SWIOTLB initialization problem we are seeing in secure guests
>> with 128 GB of RAM: they are configured with 4 GB of crashkernel reserved
>> memory, which leaves no space for SWIOTLB in low addresses.
>>
>> To do this, we use mostly the same code as swiotlb_init(), but allocate the
>> buffer using memblock_alloc() instead of memblock_alloc_low().
>>
>> Signed-off-by: Thiago Jung Bauermann
>
> Reviewed-by: Konrad Rzeszutek Wilk

Thanks!

-- 
Thiago Jung Bauermann
IBM Linux Technology Center
Re: [PATCH v3 09/17] memblock: make memblock_debug and related functionality private
On Tue, 18 Aug 2020 18:16:26 +0300 Mike Rapoport wrote: > From: Mike Rapoport > > The only user of memblock_dbg() outside memblock was s390 setup code and it > is converted to use pr_debug() instead. > This allows to stop exposing memblock_debug and memblock_dbg() to the rest > of the kernel. > > --- a/mm/memblock.c > +++ b/mm/memblock.c > @@ -137,7 +137,10 @@ struct memblock_type physmem = { >i < memblock_type->cnt;\ >i++, rgn = &memblock_type->regions[i]) > > -int memblock_debug __initdata_memblock; > +#define memblock_dbg(fmt, ...) \ > + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) > + checkpatch doesn't like this much. ERROR: Macros starting with if should be enclosed by a do - while loop to avoid possible if/else logic defects #101: FILE: mm/memblock.c:140: +#define memblock_dbg(fmt, ...) \ + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) WARNING: Prefer [subsystem eg: netdev]_info([subsystem]dev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... #102: FILE: mm/memblock.c:141: + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) ERROR: trailing statements should be on next line #102: FILE: mm/memblock.c:141: + if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) The first one is significant: if (foo) memblock_dbg(...); else save_the_world(); could end up inadvertently destroying the planet. This? --- a/mm/memblock.c~memblock-make-memblock_debug-and-related-functionality-private-fix +++ a/mm/memblock.c @@ -137,8 +137,11 @@ struct memblock_type physmem = { i < memblock_type->cnt;\ i++, rgn = &memblock_type->regions[i]) -#define memblock_dbg(fmt, ...) \ - if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) +#define memblock_dbg(fmt, ...) \ + do {\ + if (memblock_debug) \ + pr_info(fmt, ##__VA_ARGS__);\ + } while (0) static int memblock_debug __initdata_memblock; static bool system_has_some_mirror __initdata_memblock = false; _
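The if/else defect checkpatch flags is reproducible outside the kernel. A standalone sketch of both macro forms (the names are illustrative, not memblock's):

```c
#include <assert.h>

static int debug_on;
static int dbg_hits;
static int fallback_hits;

/* Problematic form: a bare if, so a caller's else binds to it. */
#define DBG_BAD()	if (debug_on) dbg_hits++

/* Fixed form: do { } while (0) makes the macro a single statement. */
#define DBG_GOOD()	do { if (debug_on) dbg_hits++; } while (0)

static void demo_bad(int cond)
{
	if (cond)
		DBG_BAD();
	else
		fallback_hits++;	/* silently binds to the macro's if! */
}

static void demo_good(int cond)
{
	if (cond)
		DBG_GOOD();
	else
		fallback_hits++;	/* binds to our if, as intended */
}
```

With debug_on == 0, demo_bad(1) runs the fallback even though its own condition was true — the "inadvertently destroying the planet" failure mode, while demo_good() behaves as written.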
Re: [PATCH net-next 3/5] ibmvnic: improve ibmvnic_init and ibmvnic_reset_init
From: Lijun Pan Date: Wed, 19 Aug 2020 00:35:10 -0500 > + if (rc) { > + dev_err(dev, "%s: Send crq init failed with error %d\n", > __func__, rc); > + return rc; Consistent with my feedback on patch #1, please get rid of this __func__ stuff. Thank you.
Re: [PATCH] powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10
On 19/08/20 3:17 pm, Nicholas Piggin wrote:
> This driver does not restore stop > 3 state, so it limits itself
> to states which do not lose full state or TB.
>
> The POWER10 SPRs are sufficiently different from P9 that it seems
> easier to split out the P10 code. The POWER10 deep sleep code
> (e.g., the BHRB restore) has been taken out, but it can be re-added
> when stop > 3 support is added.
>
> Cc: Ryan P Grimm
> Cc: Michael Neuling
> Cc: Gautham R. Shenoy
> Cc: Pratik Rajesh Sampat
> Signed-off-by: Nicholas Piggin

Tested-by: Pratik Rajesh Sampat
Reviewed-by: Pratik Rajesh Sampat

I've tested the patch on P9 and P10, and can verify that the corresponding
code paths are being taken for their respective architecture and that
cpuidle is being exercised with the correct set of advertised stop states
in each case. The patch looks good to me.

---
Thanks,
Pratik
Re: [PATCH net-next 1/5] ibmvnic: print caller in several error messages
From: Lijun Pan Date: Wed, 19 Aug 2020 00:35:08 -0500 > The error messages in the changed functions are exactly the same. > In order to differentiate them and make debugging easier, > we print the function names in the error messages. > > Signed-off-by: Lijun Pan I don't see any value in emitting function names in kernel error log messages. Sorry.
Re: [PATCH] powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10
On 19/08/20 3:17 pm, Nicholas Piggin wrote:
> This driver does not restore stop > 3 state, so it limits itself
> to states which do not lose full state or TB.
>
> The POWER10 SPRs are sufficiently different from P9 that it seems
> easier to split out the P10 code. The POWER10 deep sleep code
> (e.g., the BHRB restore) has been taken out, but it can be re-added
> when stop > 3 support is added.
>
> Cc: Ryan P Grimm
> Cc: Michael Neuling
> Cc: Gautham R. Shenoy
> Cc: Pratik Rajesh Sampat
> Signed-off-by: Nicholas Piggin

Fixes: 8747bf36f312 ("powerpc/powernv/idle: Replace CPU feature check with PVR check")

The patch fixes commit 8747bf36f312, which had set up idle only for the
P9 PVR.

---
Thanks for fixing this!
Pratik
Re: [PATCH] powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10
On Wed, Aug 19, 2020 at 07:47:00PM +1000, Nicholas Piggin wrote: > This driver does not restore stop > 3 state, so it limits itself > to states which do not lose full state or TB. > > The POWER10 SPRs are sufficiently different from P9 that it seems > easier to split out the P10 code. The POWER10 deep sleep code > (e.g., the BHRB restore) has been taken out, but it can be re-added > when stop > 3 support is added. MMCRA[BHRB] save/restore was in the shallow stop-state path. But we can add it back later. > > Cc: Ryan P Grimm > Cc: Michael Neuling > Cc: Gautham R. Shenoy > Cc: Pratik Rajesh Sampat > Signed-off-by: Nicholas Piggin Just a minor comment below. But otherwise the patch looks good to me. [..snip..] > @@ -1093,11 +1200,15 @@ int validate_psscr_val_mask(u64 *psscr_val, u64 > *psscr_mask, u32 flags) > * @dt_idle_states: Number of idle state entries > * Returns 0 on success > */ > -static void __init pnv_power9_idle_init(void) > +static void __init pnv_arch300_idle_init(void) > { > u64 max_residency_ns = 0; > int i; > > + /* stop is not really architected, we only have p9,p10 drivers */ > + if (!pvr_version_is(PVR_POWER10) && !pvr_version_is(PVR_POWER9)) > + return; > + > /* >* pnv_deepest_stop_{val,mask} should be set to values corresponding to >* the deepest stop state. > @@ -1112,6 +1223,11 @@ static void __init pnv_power9_idle_init(void) > struct pnv_idle_states_t *state = &pnv_idle_states[i]; > u64 psscr_rl = state->psscr_val & PSSCR_RL_MASK; > > + /* No deep loss driver implemented for POWER10 yet */ > + if (pvr_version_is(PVR_POWER10) && > + state->flags & > (OPAL_PM_TIMEBASE_STOP|OPAL_PM_LOSE_FULL_CONTEXT)) > + continue; > + Should we have a pr_info() informing the user the kernel is skipping over these stop states ? Reviewed-by: Gautham R. Shenoy > if ((state->flags & OPAL_PM_TIMEBASE_STOP) && >(pnv_first_tb_loss_level > psscr_rl)) > pnv_first_tb_loss_level = psscr_rl; -- Thanks and Regards gautham.
Re: [PATCH] powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10
* Nicholas Piggin [2020-08-19 19:47:00]:

> This driver does not restore stop > 3 state, so it limits itself
> to states which do not lose full state or TB.
>
> The POWER10 SPRs are sufficiently different from P9 that it seems
> easier to split out the P10 code. The POWER10 deep sleep code
> (e.g., the BHRB restore) has been taken out, but it can be re-added
> when stop > 3 support is added.
>
> Cc: Ryan P Grimm
> Cc: Michael Neuling
> Cc: Gautham R. Shenoy
> Cc: Pratik Rajesh Sampat
> Signed-off-by: Nicholas Piggin

Tested-by: Vaidyanathan Srinivasan

This patch series was tested on P9 and P10. The correct set of stop
states was discovered and enabled in the platform. Further, the SPRs
saved and restored by this code have also been tested, along with
cpuidle state tests and cpu offline/online tests.

Thanks for the patch.

--Vaidy
Re: [PATCH v3] powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory
On Tue, Aug 18, 2020 at 07:11:26PM -0300, Thiago Jung Bauermann wrote: > POWER secure guests (i.e., guests which use the Protection Execution > Facility) need to use SWIOTLB to be able to do I/O with the hypervisor, but > they don't need the SWIOTLB memory to be in low addresses since the > hypervisor doesn't have any addressing limitation. > > This solves a SWIOTLB initialization problem we are seeing in secure guests > with 128 GB of RAM: they are configured with 4 GB of crashkernel reserved > memory, which leaves no space for SWIOTLB in low addresses. > > To do this, we use mostly the same code as swiotlb_init(), but allocate the > buffer using memblock_alloc() instead of memblock_alloc_low(). > > Signed-off-by: Thiago Jung Bauermann Reviewed-by: Konrad Rzeszutek Wilk > --- > arch/powerpc/include/asm/svm.h | 4 > arch/powerpc/mm/mem.c| 6 +- > arch/powerpc/platforms/pseries/svm.c | 26 ++ > 3 files changed, 35 insertions(+), 1 deletion(-) > > Changes from v2: > - Panic if unable to allocate buffer, as suggested by Christoph. > > Changes from v1: > - Open-code swiotlb_init() in arch-specific code, as suggested by > Christoph. > > diff --git a/arch/powerpc/include/asm/svm.h b/arch/powerpc/include/asm/svm.h > index 85580b30aba4..7546402d796a 100644 > --- a/arch/powerpc/include/asm/svm.h > +++ b/arch/powerpc/include/asm/svm.h > @@ -15,6 +15,8 @@ static inline bool is_secure_guest(void) > return mfmsr() & MSR_S; > } > > +void __init svm_swiotlb_init(void); > + > void dtl_cache_ctor(void *addr); > #define get_dtl_cache_ctor() (is_secure_guest() ? 
dtl_cache_ctor : NULL) > > @@ -25,6 +27,8 @@ static inline bool is_secure_guest(void) > return false; > } > > +static inline void svm_swiotlb_init(void) {} > + > #define get_dtl_cache_ctor() NULL > > #endif /* CONFIG_PPC_SVM */ > diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c > index c2c11eb8dcfc..0f21bcb16405 100644 > --- a/arch/powerpc/mm/mem.c > +++ b/arch/powerpc/mm/mem.c > @@ -50,6 +50,7 @@ > #include > #include > #include > +#include > > #include > > @@ -290,7 +291,10 @@ void __init mem_init(void) >* back to to-down. >*/ > memblock_set_bottom_up(true); > - swiotlb_init(0); > + if (is_secure_guest()) > + svm_swiotlb_init(); > + else > + swiotlb_init(0); > #endif > > high_memory = (void *) __va(max_low_pfn * PAGE_SIZE); > diff --git a/arch/powerpc/platforms/pseries/svm.c > b/arch/powerpc/platforms/pseries/svm.c > index 40c0637203d5..81085eb8f225 100644 > --- a/arch/powerpc/platforms/pseries/svm.c > +++ b/arch/powerpc/platforms/pseries/svm.c > @@ -7,6 +7,7 @@ > */ > > #include > +#include > #include > #include > #include > @@ -34,6 +35,31 @@ static int __init init_svm(void) > } > machine_early_initcall(pseries, init_svm); > > +/* > + * Initialize SWIOTLB. Essentially the same as swiotlb_init(), except that it > + * can allocate the buffer anywhere in memory. Since the hypervisor doesn't > have > + * any addressing limitation, we don't need to allocate it in low addresses. 
> + */ > +void __init svm_swiotlb_init(void) > +{ > + unsigned char *vstart; > + unsigned long bytes, io_tlb_nslabs; > + > + io_tlb_nslabs = (swiotlb_size_or_default() >> IO_TLB_SHIFT); > + io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE); > + > + bytes = io_tlb_nslabs << IO_TLB_SHIFT; > + > + vstart = memblock_alloc(PAGE_ALIGN(bytes), PAGE_SIZE); > + if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, false)) > + return; > + > + if (io_tlb_start) > + memblock_free_early(io_tlb_start, > + PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT)); > + panic("SVM: Cannot allocate SWIOTLB buffer"); > +} > + > int set_memory_encrypted(unsigned long addr, int numpages) > { > if (!PAGE_ALIGNED(addr))
Re: [PATCH v3] powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory
Christoph Hellwig writes:

> On Tue, Aug 18, 2020 at 07:11:26PM -0300, Thiago Jung Bauermann wrote:
>> POWER secure guests (i.e., guests which use the Protection Execution
>> Facility) need to use SWIOTLB to be able to do I/O with the hypervisor, but
>> they don't need the SWIOTLB memory to be in low addresses since the
>> hypervisor doesn't have any addressing limitation.
>>
>> This solves a SWIOTLB initialization problem we are seeing in secure guests
>> with 128 GB of RAM: they are configured with 4 GB of crashkernel reserved
>> memory, which leaves no space for SWIOTLB in low addresses.
>>
>> To do this, we use mostly the same code as swiotlb_init(), but allocate the
>> buffer using memblock_alloc() instead of memblock_alloc_low().
>>
>> Signed-off-by: Thiago Jung Bauermann
>
> Looks fine to me (except for the pointlessly long comment lines, but I've
> been told that's the powerpc way).

Thanks! Do I have your Reviewed-by?

-- 
Thiago Jung Bauermann
IBM Linux Technology Center
Re: [PATCH v2 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.
Hi Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on hnaz-linux-mm/master]
[also build test ERROR on powerpc/next linus/master v5.9-rc1 next-20200819]
[cannot apply to mmotm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/mm-debug_vm_pgtable-fixes/20200819-213446
base:   https://github.com/hnaz/linux-mm master
config: i386-randconfig-s002-20200818 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.2-183-gaa6ede3b-dirty
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

>> arch/x86/mm/ioremap.c:484:12: error: redefinition of 'arch_ioremap_p4d_supported'
     484 | int __init arch_ioremap_p4d_supported(void)
         |            ^~
   In file included from arch/x86/mm/ioremap.c:12:
   include/linux/io.h:41:19: note: previous definition of 'arch_ioremap_p4d_supported' was here
      41 | static inline int arch_ioremap_p4d_supported(void)
         |                   ^~
>> arch/x86/mm/ioremap.c:489:12: error: redefinition of 'arch_ioremap_pud_supported'
     489 | int __init arch_ioremap_pud_supported(void)
         |            ^~
   In file included from arch/x86/mm/ioremap.c:12:
   include/linux/io.h:45:19: note: previous definition of 'arch_ioremap_pud_supported' was here
      45 | static inline int arch_ioremap_pud_supported(void)
         |                   ^~
>> arch/x86/mm/ioremap.c:498:12: error: redefinition of 'arch_ioremap_pmd_supported'
     498 | int __init arch_ioremap_pmd_supported(void)
         |            ^~
   In file included from arch/x86/mm/ioremap.c:12:
   include/linux/io.h:49:19: note: previous definition of 'arch_ioremap_pmd_supported' was here
      49 | static inline int arch_ioremap_pmd_supported(void)
         |                   ^~
   arch/x86/mm/ioremap.c:737:17: warning: no previous prototype for 'early_memremap_pgprot_adjust' [-Wmissing-prototypes]
     737 | pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
         |                 ^~~~

# https://github.com/0day-ci/linux/commit/260b675444a7d5afa9ccef47ca9e588fb18d01a3
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Aneesh-Kumar-K-V/mm-debug_vm_pgtable-fixes/20200819-213446
git checkout 260b675444a7d5afa9ccef47ca9e588fb18d01a3

vim +/arch_ioremap_p4d_supported +484 arch/x86/mm/ioremap.c

^1da177e4c3f41 arch/i386/mm/ioremap.c Linus Torvalds    2005-04-16  483
0f472d04f59ff8 arch/x86/mm/ioremap.c  Anshuman Khandual 2019-07-16 @484  int __init arch_ioremap_p4d_supported(void)
0f472d04f59ff8 arch/x86/mm/ioremap.c  Anshuman Khandual 2019-07-16  485  {
0f472d04f59ff8 arch/x86/mm/ioremap.c  Anshuman Khandual 2019-07-16  486  	return 0;
0f472d04f59ff8 arch/x86/mm/ioremap.c  Anshuman Khandual 2019-07-16  487  }
0f472d04f59ff8 arch/x86/mm/ioremap.c  Anshuman Khandual 2019-07-16  488
1e6277de3a2337 arch/x86/mm/ioremap.c  Jan Beulich       2015-05-28 @489  int __init arch_ioremap_pud_supported(void)
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani        2015-04-14  490  {
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani        2015-04-14  491  #ifdef CONFIG_X86_64
b8291adc191abe arch/x86/mm/ioremap.c  Borislav Petkov   2016-03-29  492  	return boot_cpu_has(X86_FEATURE_GBPAGES);
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani        2015-04-14  493  #else
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani        2015-04-14  494  	return 0;
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani        2015-04-14  495  #endif
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani        2015-04-14  496  }
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani        2015-04-14  497
1e6277de3a2337 arch/x86/mm/ioremap.c  Jan Beulich       2015-05-28 @498  int __init arch_ioremap_pmd_supported(void)
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani        2015-04-14  499  {
16bf92261b1b6c arch/x86/mm/ioremap.c  Borislav Petkov   2016-03-29  500  	return boot_cpu_has(X86_FEATURE_PSE);
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani        2015-04-14  501  }
5d72b4fba40ef4 arch/x86/mm/ioremap.c  Toshi Kani        2015-04-14  502

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

.config.gz
Description: application/gzip
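The redefinition reported above can be shown in miniature: patch 04 adds static inline fallbacks to the #else branch of include/linux/io.h, while x86's ioremap.c still emits out-of-line definitions of the same symbols even in configs (like this i386 randconfig) where CONFIG_HAVE_ARCH_HUGE_VMAP is unset, so both definitions land in one translation unit. A minimal userspace sketch of the guard pattern that avoids the clash; the guard macro is a stand-in for the Kconfig symbol, not the kernel's:

```c
/* Model of the fix: make the header fallback and the arch definition
 * mutually exclusive on the same condition, so exactly one definition
 * of arch_ioremap_pmd_supported() survives preprocessing. */
#define HAVE_ARCH_HUGE_VMAP 1	/* what the x86 file effectively assumes */

#ifdef HAVE_ARCH_HUGE_VMAP
/* The arch provides the real, out-of-line definition... */
int arch_ioremap_pmd_supported(void)
{
	return 1;	/* illustrative: pretend the arch supports it */
}
#else
/* ...otherwise the header fallback is the only definition. */
static inline int arch_ioremap_pmd_supported(void)
{
	return 0;	/* generic fallback: no huge ioremap */
}
#endif
```

With the guard in place the header fallback is compiled out whenever the arch supplies its own symbol, which is what the failing randconfig needed.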
Re: [PATCH 1/2] lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state
On Wed, Aug 19, 2020 at 05:32:50PM +0200, pet...@infradead.org wrote: > On Wed, Aug 19, 2020 at 08:39:13PM +1000, Alexey Kardashevskiy wrote: > > > > or current upstream? > > > > The upstream 18445bf405cb (13 hours old) also shows the problem. Yours > > 1/2 still fixes it. > > Afaict that just reduces the window. > > Isn't the problem that: > > arch/powerpc/kernel/exceptions-64e.S > > START_EXCEPTION(perfmon); > NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR, > PROLOG_ADDITION_NONE) > EXCEPTION_COMMON(0x260) > INTS_DISABLE > # RECONCILE_IRQ_STATE > # TRACE_DISABLE_INTS > # TRACE_WITH_FRAME_BUFFER(trace_hardirqs_off) > # > # but we haven't done nmi_enter() yet... whoopsy > > CHECK_NAPPING() > addir3,r1,STACK_FRAME_OVERHEAD > bl performance_monitor_exception > # perf_irq() > # perf_event_interrupt > # __perf_event_interrupt > # nmi_enter() > > > > That is, afaict your entry code is buggered. That is, patch 1/2 doesn't change the case: local_irq_enable() trace_hardirqs_on() trace_hardirqs_off() ... if (regs_irqs_disabled(regs)) // false trace_hardirqs_on(); raw_local_irq_enable() Where local_irq_enable() has done trace_hardirqs_on() and the NMI hits and undoes it, but doesn't re-do it because the hardware state is still disabled. What's supposed to happen is: nmi_enter() trace_hardirqs_off() // no-op, because in_nmi() (or previously because lockdep_off()) ...
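The window described above can be modeled with two booleans: lockdep's software view of the IRQ state and the hardware state. This is a userspace toy, not kernel code, and all names are illustrative:

```c
#include <stdbool.h>

/* Toy model of the race: local_irq_enable() updates the software view
 * *before* enabling hardware IRQs; an NMI landing in that window clears
 * the software view and, because the hardware still reads "disabled"
 * on NMI exit, never restores it. */
static bool lockdep_irqs_on;	/* current->hardirqs_enabled */
static bool hw_irqs_on;		/* actual MSR/EFLAGS interrupt state */

static void local_irq_enable_step1(void)
{
	lockdep_irqs_on = true;		/* trace_hardirqs_on() */
}

static void local_irq_enable_step2(void)
{
	hw_irqs_on = true;		/* raw_local_irq_enable() */
}

/* NMI entry traces IRQs off; exit re-traces them on only if the
 * *hardware* state says they are enabled. */
static void nmi(void)
{
	lockdep_irqs_on = false;	/* trace_hardirqs_off() */
	if (hw_irqs_on)
		lockdep_irqs_on = true;	/* conditional trace_hardirqs_on() */
}

/* Returns true if the two views diverge after an NMI in the window. */
static bool race_leaves_state_inconsistent(void)
{
	lockdep_irqs_on = false;
	hw_irqs_on = false;
	local_irq_enable_step1();
	nmi();				/* NMI hits between the two steps */
	local_irq_enable_step2();
	return lockdep_irqs_on != hw_irqs_on;
}
```

After the sequence, lockdep believes interrupts are off while the hardware has them on — the mismatch that patch 1/2 narrows but, per the discussion, does not fully close.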
Re: [PATCH 1/2] lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state
On Wed, Aug 19, 2020 at 08:39:13PM +1000, Alexey Kardashevskiy wrote: > > or current upstream? > > The upstream 18445bf405cb (13 hours old) also shows the problem. Yours > 1/2 still fixes it. Afaict that just reduces the window. Isn't the problem that: arch/powerpc/kernel/exceptions-64e.S START_EXCEPTION(perfmon); NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR, PROLOG_ADDITION_NONE) EXCEPTION_COMMON(0x260) INTS_DISABLE # RECONCILE_IRQ_STATE # TRACE_DISABLE_INTS # TRACE_WITH_FRAME_BUFFER(trace_hardirqs_off) # # but we haven't done nmi_enter() yet... whoopsy CHECK_NAPPING() addir3,r1,STACK_FRAME_OVERHEAD bl performance_monitor_exception #perf_irq() # perf_event_interrupt #__perf_event_interrupt # nmi_enter() That is, afaict your entry code is buggered.
Re: [PATCH v2 00/13] mm/debug_vm_pgtable fixes
"Aneesh Kumar K.V" writes:

> This patch series includes fixes for debug_vm_pgtable test code so that
> they follow page table update rules correctly. The first two patches
> introduce changes w.r.t ppc64. The patches are included in this series
> for completeness. We can merge them via ppc64 tree if required.
>
> Hugetlb test is disabled on ppc64 because that needs a larger change to
> satisfy page table update rules.
>
> Changes from V1:
> * Address review feedback
> * drop test specific pfn_pte and pfn_pmd.
> * Update ppc64 page table helper to add _PAGE_PTE
>
> Aneesh Kumar K.V (13):
>   powerpc/mm: Add DEBUG_VM WARN for pmd_clear
>   powerpc/mm: Move setting pte specific flags to pfn_pte
>   mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value
>   mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge
>     vmap support.
>   mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with
>     CONFIG_NUMA_BALANCING
>   mm/debug_vm_pgtable/THP: Mark the pte entry huge before using
>     set_pmd/pud_at
>   mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an
>     existing pte entry
>   mm/debug_vm_pgtable/thp: Use page table depost/withdraw with THP
>   mm/debug_vm_pgtable/locks: Move non page table modifying test together
>   mm/debug_vm_pgtable/locks: Take correct page table lock
>   mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
>   mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
>   mm/debug_vm_pgtable: populate a pte entry before fetching it
>
>  arch/powerpc/include/asm/book3s/64/pgtable.h |  29 +++-
>  arch/powerpc/include/asm/nohash/pgtable.h    |   5 -
>  arch/powerpc/mm/book3s64/pgtable.c           |   2 +-
>  arch/powerpc/mm/pgtable.c                    |   5 -
>  include/linux/io.h                           |  12 ++
>  mm/debug_vm_pgtable.c                        | 151 +++
>  6 files changed, 127 insertions(+), 77 deletions(-)
>

BTW I picked a wrong branch when sending this. Attaching the diff against what I want to send. pfn_pmd() no longer updates _PAGE_PTE because that is handled by pmd_mkhuge().
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c index 3b4da7c63e28..e18ae50a275c 100644 --- a/arch/powerpc/mm/book3s64/pgtable.c +++ b/arch/powerpc/mm/book3s64/pgtable.c @@ -141,7 +141,7 @@ pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot) unsigned long pmdv; pmdv = (pfn << PAGE_SHIFT) & PTE_RPN_MASK; - return __pmd(pmdv | pgprot_val(pgprot) | _PAGE_PTE); + return pmd_set_protbits(__pmd(pmdv), pgprot); } pmd_t mk_pmd(struct page *page, pgprot_t pgprot) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 7d9f8e1d790f..cad61d22f33a 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -229,7 +229,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot) static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot) { - pmd_t pmd = pfn_pmd(pfn, prot); + pmd_t pmd = pmd_mkhuge(pfn_pmd(pfn, prot)); if (!IS_ENABLED(CONFIG_NUMA_BALANCING)) return;
Re: [PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources
Le 19/08/2020 à 15:15, Oliver O'Halloran a écrit : On Wed, Aug 19, 2020 at 11:07 PM Frederic Barrat wrote: Fix typo introduced during recent code cleanup, which could lead to silently not freeing resources or oops message (on PCI hotplug or CAPI reset). oof Did you actually hit that oops on CAPI reset? Including the stack trace is helpful for avoiding this sort of problem in the future. Anyway, yeah, I found it with capi reset. It's actually not an oops, we hit the WARN_ON in iommu_tce_table_put(), when we try to release the DMA resource for a PE where it was never configured to start with. So that was easy to track. Fred Reviewed-by: Oliver O'Halloran Only impacts ioda2, the code path for ioda1 is correct. yeah but no one uses ioda1 Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA setup state") Signed-off-by: Frederic Barrat --- arch/powerpc/platforms/powernv/pci-ioda.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index c9c25fb0783c..023a4f987bb2 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -2705,7 +2705,7 @@ void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe) struct iommu_table *tbl = pe->table_group.tables[0]; int64_t rc; - if (pe->dma_setup_done) + if (!pe->dma_setup_done) return; rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0); -- 2.26.2
Re: [PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources
On Wed, Aug 19, 2020 at 11:07 PM Frederic Barrat wrote:
>
> Fix typo introduced during recent code cleanup, which could lead to
> silently not freeing resources or oops message (on PCI hotplug or CAPI
> reset).

oof

Did you actually hit that oops on CAPI reset? Including the stack trace is helpful for avoiding this sort of problem in the future. Anyway,

Reviewed-by: Oliver O'Halloran

> Only impacts ioda2, the code path for ioda1 is correct.

yeah but no one uses ioda1

> Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA
> setup state")
> Signed-off-by: Frederic Barrat
> ---
> arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c9c25fb0783c..023a4f987bb2 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2705,7 +2705,7 @@ void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe
> *pe)
> struct iommu_table *tbl = pe->table_group.tables[0];
> int64_t rc;
>
> - if (pe->dma_setup_done)
> + if (!pe->dma_setup_done)
> return;
>
> rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> --
> 2.26.2
>
[PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources
Fix typo introduced during recent code cleanup, which could lead to
silently not freeing resources or oops message (on PCI hotplug or CAPI
reset).

Only impacts ioda2, the code path for ioda1 is correct.

Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA setup state")
Signed-off-by: Frederic Barrat
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index c9c25fb0783c..023a4f987bb2 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2705,7 +2705,7 @@ void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe)
	struct iommu_table *tbl = pe->table_group.tables[0];
	int64_t rc;

-	if (pe->dma_setup_done)
+	if (!pe->dma_setup_done)
		return;

	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
--
2.26.2
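The one-character bug can be sketched as a guard on the release path: teardown must proceed only when DMA setup actually happened, while the typo bailed out in exactly that case and instead ran teardown for PEs that were never configured (which, per the follow-up discussion, trips the WARN_ON in iommu_tce_table_put()). The struct below is a stand-in, not the kernel's pnv_ioda_pe:

```c
#include <stdbool.h>

/* Minimal model of the fixed guard in pnv_pci_ioda2_release_pe_dma(). */
struct pe {
	bool dma_setup_done;	/* set when DMA windows were configured */
	bool released;		/* stands in for unset_window + table put */
};

static void release_pe_dma(struct pe *pe)
{
	if (!pe->dma_setup_done)	/* the fix: nothing to free, bail out */
		return;
	pe->released = true;
}
```

With the `!` in place, configured PEs get their resources freed and unconfigured PEs are left alone.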
[PATCH v2 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.
ppc64 supports huge vmap only with radix translation. Hence use arch helper to determine the huge vmap support. Signed-off-by: Aneesh Kumar K.V --- include/linux/io.h| 12 mm/debug_vm_pgtable.c | 4 ++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/include/linux/io.h b/include/linux/io.h index 8394c56babc2..0b1ecda0cc86 100644 --- a/include/linux/io.h +++ b/include/linux/io.h @@ -38,6 +38,18 @@ int arch_ioremap_pud_supported(void); int arch_ioremap_pmd_supported(void); #else static inline void ioremap_huge_init(void) { } +static inline int arch_ioremap_p4d_supported(void) +{ + return false; +} +static inline int arch_ioremap_pud_supported(void) +{ + return false; +} +static inline int arch_ioremap_pmd_supported(void) +{ + return false; +} #endif /* diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 57259e2dbd17..cf3c4792b4a2 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -206,7 +206,7 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot) { pmd_t pmd; - if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) + if (!arch_ioremap_pmd_supported()) return; pr_debug("Validating PMD huge\n"); @@ -320,7 +320,7 @@ static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot) { pud_t pud; - if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP)) + if (!arch_ioremap_pud_supported()) return; pr_debug("Validating PUD huge\n"); -- 2.26.2
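The reason a runtime helper is needed rather than IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP): on ppc64 the config symbol can be set at build time while the machine boots with hash translation, where huge vmap is unavailable. A userspace sketch of that divergence; `radix_enabled` is a stand-in for the real MMU query, and the macro stands in for the Kconfig symbol:

```c
#include <stdbool.h>

/* Model: the kernel is *built* with huge-vmap support, but whether it
 * actually works is only known once the MMU mode is known at boot. */
static bool radix_enabled;		/* set at boot in the real kernel */

#define CONFIG_HAVE_ARCH_HUGE_VMAP 1	/* compile-time: "supported" */

static int arch_ioremap_pmd_supported(void)
{
	return radix_enabled;		/* runtime answer */
}

/* True when the compile-time check and the runtime helper disagree,
 * i.e. when testing gated on IS_ENABLED() would run unsupported code. */
static bool compile_time_check_is_wrong(void)
{
	radix_enabled = false;		/* booted with hash translation */
	return CONFIG_HAVE_ARCH_HUGE_VMAP && !arch_ioremap_pmd_supported();
}
```

This is why the patch switches pmd_huge_tests()/pud_huge_tests() from IS_ENABLED() to the arch_ioremap_*_supported() helpers.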
[PATCH v2 13/13] mm/debug_vm_pgtable: populate a pte entry before fetching it
pte_clear_tests() operates on an existing pte entry. Make sure that it is not a none pte entry. Signed-off-by: Aneesh Kumar K.V --- mm/debug_vm_pgtable.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 09ce9974c187..7d9f8e1d790f 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -531,7 +531,7 @@ static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp, static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep, unsigned long vaddr) { - pte_t pte = ptep_get(ptep); + pte_t pte = ptep_get_and_clear(mm, vaddr, ptep); pr_debug("Validating PTE clear\n"); pte = __pte(pte_val(pte) | RANDOM_ORVALUE); @@ -929,7 +929,7 @@ static int __init debug_vm_pgtable(void) p4d_t *p4dp, *saved_p4dp; pud_t *pudp, *saved_pudp; pmd_t *pmdp, *saved_pmdp, pmd; - pte_t *ptep; + pte_t *ptep, pte; pgtable_t saved_ptep; pgprot_t prot, protnone; phys_addr_t paddr; @@ -1034,6 +1034,8 @@ static int __init debug_vm_pgtable(void) */ ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl); + pte = pfn_pte(pte_aligned, prot); + set_pte_at(mm, vaddr, ptep, pte); pte_clear_tests(mm, ptep, vaddr); pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot); pte_unmap_unlock(ptep, ptl); -- 2.26.2
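The shape of the fix can be sketched in miniature: model a pte as a bare integer slot, populate it first, and let the clear test use a get-and-clear primitive that both returns the old entry and leaves the slot none. An illustrative model, not the kernel's pte representation:

```c
#include <stdint.h>

/* A pte modeled as a bare u64 slot; 0 plays the role of pte_none().
 * ptep_get_and_clear() returns the previous entry and empties the slot,
 * which is what pte_clear_tests() now relies on after the caller has
 * populated the entry with set_pte_at(). */
static uint64_t ptep_get_and_clear(uint64_t *ptep)
{
	uint64_t old = *ptep;
	*ptep = 0;
	return old;
}
```

Running the clear test against an already-none slot, as before the patch, would only ever exercise the trivial case.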
[PATCH v2 05/13] mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with CONFIG_NUMA_BALANCING
Saved write support was added to track the write bit of a pte after marking the pte protnone. This was done so that AUTONUMA can convert a write pte to protnone and still track the old write bit. When converting it back we set the pte write bit correctly thereby avoiding a write fault again. Hence enable the test only when CONFIG_NUMA_BALANCING is enabled and use protnone protflags. Signed-off-by: Aneesh Kumar K.V --- mm/debug_vm_pgtable.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index cf3c4792b4a2..ffa10ede6842 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -114,10 +114,14 @@ static void __init pte_savedwrite_tests(unsigned long pfn, pgprot_t prot) { pte_t pte = pfn_pte(pfn, prot); + if (!IS_ENABLED(CONFIG_NUMA_BALANCING)) + return; + pr_debug("Validating PTE saved write\n"); WARN_ON(!pte_savedwrite(pte_mk_savedwrite(pte_clear_savedwrite(pte; WARN_ON(pte_savedwrite(pte_clear_savedwrite(pte_mk_savedwrite(pte; } + #ifdef CONFIG_TRANSPARENT_HUGEPAGE static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { @@ -225,6 +229,9 @@ static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot) { pmd_t pmd = pfn_pmd(pfn, prot); + if (!IS_ENABLED(CONFIG_NUMA_BALANCING)) + return; + pr_debug("Validating PMD saved write\n"); WARN_ON(!pmd_savedwrite(pmd_mk_savedwrite(pmd_clear_savedwrite(pmd; WARN_ON(pmd_savedwrite(pmd_clear_savedwrite(pmd_mk_savedwrite(pmd; @@ -1005,8 +1012,8 @@ static int __init debug_vm_pgtable(void) pmd_huge_tests(pmdp, pmd_aligned, prot); pud_huge_tests(pudp, pud_aligned, prot); - pte_savedwrite_tests(pte_aligned, prot); - pmd_savedwrite_tests(pmd_aligned, prot); + pte_savedwrite_tests(pte_aligned, protnone); + pmd_savedwrite_tests(pmd_aligned, protnone); pte_unmap_unlock(ptep, ptl); -- 2.26.2
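The WARN_ON pairs in the savedwrite tests check a simple round-trip invariant that can be modeled with plain bit operations. The bit position below is arbitrary and illustrative, not ppc64's real savedwrite encoding:

```c
#include <stdint.h>

/* Invariant the tests assert: mk_savedwrite sets a bit that
 * clear_savedwrite removes, in either composition order. */
#define SAVEDWRITE_BIT (1ULL << 7)	/* illustrative position */

static uint64_t pte_mk_savedwrite(uint64_t pte)
{
	return pte | SAVEDWRITE_BIT;
}

static uint64_t pte_clear_savedwrite(uint64_t pte)
{
	return pte & ~SAVEDWRITE_BIT;
}

static int pte_savedwrite(uint64_t pte)
{
	return (pte & SAVEDWRITE_BIT) != 0;
}
```

On configurations without CONFIG_NUMA_BALANCING the helpers have no such semantics to check, which is why the patch gates the test on it.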
[PATCH v2 09/13] mm/debug_vm_pgtable/locks: Move non page table modifying test together
This will help in adding proper locks in a later patch Signed-off-by: Aneesh Kumar K.V --- mm/debug_vm_pgtable.c | 52 --- 1 file changed, 29 insertions(+), 23 deletions(-) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 6dcac2b40fef..69fe3cd8126c 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -977,7 +977,7 @@ static int __init debug_vm_pgtable(void) p4dp = p4d_alloc(mm, pgdp, vaddr); pudp = pud_alloc(mm, p4dp, vaddr); pmdp = pmd_alloc(mm, pudp, vaddr); - ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl); + ptep = pte_alloc_map(mm, pmdp, vaddr); /* * Save all the page table page addresses as the page table @@ -997,33 +997,12 @@ static int __init debug_vm_pgtable(void) p4d_basic_tests(p4d_aligned, prot); pgd_basic_tests(pgd_aligned, prot); - pte_clear_tests(mm, ptep, vaddr); - pmd_clear_tests(mm, pmdp); - pud_clear_tests(mm, pudp); - p4d_clear_tests(mm, p4dp); - pgd_clear_tests(mm, pgdp); - - pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot); - pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep); - pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot); - hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot); - pmd_leaf_tests(pmd_aligned, prot); pud_leaf_tests(pud_aligned, prot); - pmd_huge_tests(pmdp, pmd_aligned, prot); - pud_huge_tests(pudp, pud_aligned, prot); - pte_savedwrite_tests(pte_aligned, protnone); pmd_savedwrite_tests(pmd_aligned, protnone); - pte_unmap_unlock(ptep, ptl); - - pmd_populate_tests(mm, pmdp, saved_ptep); - pud_populate_tests(mm, pudp, saved_pmdp); - p4d_populate_tests(mm, p4dp, saved_pudp); - pgd_populate_tests(mm, pgdp, saved_p4dp); - pte_special_tests(pte_aligned, prot); pte_protnone_tests(pte_aligned, protnone); pmd_protnone_tests(pmd_aligned, protnone); @@ -1041,11 +1020,38 @@ static int __init debug_vm_pgtable(void) pmd_swap_tests(pmd_aligned, prot); swap_migration_tests(); - hugetlb_basic_tests(pte_aligned, prot); pmd_thp_tests(pmd_aligned, prot); 
pud_thp_tests(pud_aligned, prot); + /* +* Page table modifying tests +*/ + pte_clear_tests(mm, ptep, vaddr); + pmd_clear_tests(mm, pmdp); + pud_clear_tests(mm, pudp); + p4d_clear_tests(mm, p4dp); + pgd_clear_tests(mm, pgdp); + + ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl); + pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot); + pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep); + pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot); + hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot); + + + pmd_huge_tests(pmdp, pmd_aligned, prot); + pud_huge_tests(pudp, pud_aligned, prot); + + pte_unmap_unlock(ptep, ptl); + + pmd_populate_tests(mm, pmdp, saved_ptep); + pud_populate_tests(mm, pudp, saved_pmdp); + p4d_populate_tests(mm, p4dp, saved_pudp); + pgd_populate_tests(mm, pgdp, saved_p4dp); + + hugetlb_basic_tests(pte_aligned, prot); + p4d_free(mm, saved_p4dp); pud_free(mm, saved_pudp); pmd_free(mm, saved_pmdp); -- 2.26.2
[PATCH v2 06/13] mm/debug_vm_pgtable/THP: Mark the pte entry huge before using set_pmd/pud_at
kernel expects entries to be marked huge before we use set_pmd_at()/set_pud_at(). Signed-off-by: Aneesh Kumar K.V --- mm/debug_vm_pgtable.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index ffa10ede6842..76f4c713e5a3 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -150,7 +150,7 @@ static void __init pmd_advanced_tests(struct mm_struct *mm, unsigned long pfn, unsigned long vaddr, pgprot_t prot) { - pmd_t pmd = pfn_pmd(pfn, prot); + pmd_t pmd; if (!has_transparent_hugepage()) return; @@ -159,19 +159,19 @@ static void __init pmd_advanced_tests(struct mm_struct *mm, /* Align the address wrt HPAGE_PMD_SIZE */ vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE; - pmd = pfn_pmd(pfn, prot); + pmd = pmd_mkhuge(pfn_pmd(pfn, prot)); set_pmd_at(mm, vaddr, pmdp, pmd); pmdp_set_wrprotect(mm, vaddr, pmdp); pmd = READ_ONCE(*pmdp); WARN_ON(pmd_write(pmd)); - pmd = pfn_pmd(pfn, prot); + pmd = pmd_mkhuge(pfn_pmd(pfn, prot)); set_pmd_at(mm, vaddr, pmdp, pmd); pmdp_huge_get_and_clear(mm, vaddr, pmdp); pmd = READ_ONCE(*pmdp); WARN_ON(!pmd_none(pmd)); - pmd = pfn_pmd(pfn, prot); + pmd = pmd_mkhuge(pfn_pmd(pfn, prot)); pmd = pmd_wrprotect(pmd); pmd = pmd_mkclean(pmd); set_pmd_at(mm, vaddr, pmdp, pmd); @@ -267,7 +267,7 @@ static void __init pud_advanced_tests(struct mm_struct *mm, unsigned long pfn, unsigned long vaddr, pgprot_t prot) { - pud_t pud = pfn_pud(pfn, prot); + pud_t pud; if (!has_transparent_hugepage()) return; @@ -276,25 +276,28 @@ static void __init pud_advanced_tests(struct mm_struct *mm, /* Align the address wrt HPAGE_PUD_SIZE */ vaddr = (vaddr & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE; + pud = pud_mkhuge(pfn_pud(pfn, prot)); set_pud_at(mm, vaddr, pudp, pud); pudp_set_wrprotect(mm, vaddr, pudp); pud = READ_ONCE(*pudp); WARN_ON(pud_write(pud)); #ifndef __PAGETABLE_PMD_FOLDED - pud = pfn_pud(pfn, prot); + + pud = pud_mkhuge(pfn_pud(pfn, prot)); set_pud_at(mm, vaddr, pudp, pud); 
pudp_huge_get_and_clear(mm, vaddr, pudp); pud = READ_ONCE(*pudp); WARN_ON(!pud_none(pud)); - pud = pfn_pud(pfn, prot); + pud = pud_mkhuge(pfn_pud(pfn, prot)); set_pud_at(mm, vaddr, pudp, pud); pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1); pud = READ_ONCE(*pudp); WARN_ON(!pud_none(pud)); #endif /* __PAGETABLE_PMD_FOLDED */ - pud = pfn_pud(pfn, prot); + + pud = pud_mkhuge(pfn_pud(pfn, prot)); pud = pud_wrprotect(pud); pud = pud_mkclean(pud); set_pud_at(mm, vaddr, pudp, pud); -- 2.26.2
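The rule this patch enforces can be sketched as a precondition check: entries handed to set_pmd_at() must already carry the huge marker from pmd_mkhuge(). A userspace model with an illustrative bit position (on ppc64 the marker is _PAGE_PTE):

```c
#include <stdint.h>

/* Model of the precondition: set_pmd_at() expects a huge-marked entry. */
#define HUGE_BIT (1ULL << 62)	/* illustrative; _PAGE_PTE on ppc64 */

static uint64_t pmd_mkhuge(uint64_t pmd)
{
	return pmd | HUGE_BIT;
}

/* Returns 0 on success, -1 when the caller forgot pmd_mkhuge(). */
static int set_pmd_at(uint64_t *pmdp, uint64_t pmd)
{
	if (!(pmd & HUGE_BIT))
		return -1;
	*pmdp = pmd;
	return 0;
}
```

The patch makes every pfn_pmd()/pfn_pud() result pass through pmd_mkhuge()/pud_mkhuge() before being installed, so the precondition always holds.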
[PATCH v2 07/13] mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an existing pte entry
set_pte_at() should not be used to set a pte entry at locations that already hold a valid pte entry. Architectures like ppc64 don't do a TLB invalidate in set_pte_at() and hence expect it to be used to set locations that are not a valid PTE. Signed-off-by: Aneesh Kumar K.V --- mm/debug_vm_pgtable.c | 35 +++ 1 file changed, 15 insertions(+), 20 deletions(-) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 76f4c713e5a3..9c7e2c9cfc76 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -74,15 +74,18 @@ static void __init pte_advanced_tests(struct mm_struct *mm, { pte_t pte = pfn_pte(pfn, prot); + /* +* Architectures optimize set_pte_at by avoiding TLB flush. +* This requires set_pte_at to be not used to update an +* existing pte entry. Clear pte before we do set_pte_at +*/ + pr_debug("Validating PTE advanced\n"); pte = pfn_pte(pfn, prot); set_pte_at(mm, vaddr, ptep, pte); ptep_set_wrprotect(mm, vaddr, ptep); pte = ptep_get(ptep); WARN_ON(pte_write(pte)); - - pte = pfn_pte(pfn, prot); - set_pte_at(mm, vaddr, ptep, pte); ptep_get_and_clear(mm, vaddr, ptep); pte = ptep_get(ptep); WARN_ON(!pte_none(pte)); @@ -96,13 +99,11 @@ static void __init pte_advanced_tests(struct mm_struct *mm, ptep_set_access_flags(vma, vaddr, ptep, pte, 1); pte = ptep_get(ptep); WARN_ON(!(pte_write(pte) && pte_dirty(pte))); - - pte = pfn_pte(pfn, prot); - set_pte_at(mm, vaddr, ptep, pte); ptep_get_and_clear_full(mm, vaddr, ptep, 1); pte = ptep_get(ptep); WARN_ON(!pte_none(pte)); + pte = pfn_pte(pfn, prot); pte = pte_mkyoung(pte); set_pte_at(mm, vaddr, ptep, pte); ptep_test_and_clear_young(vma, vaddr, ptep); @@ -164,9 +165,6 @@ static void __init pmd_advanced_tests(struct mm_struct *mm, pmdp_set_wrprotect(mm, vaddr, pmdp); pmd = READ_ONCE(*pmdp); WARN_ON(pmd_write(pmd)); - - pmd = pmd_mkhuge(pfn_pmd(pfn, prot)); - set_pmd_at(mm, vaddr, pmdp, pmd); pmdp_huge_get_and_clear(mm, vaddr, pmdp); pmd = READ_ONCE(*pmdp); WARN_ON(!pmd_none(pmd)); @@ -180,13 +178,11 @@ static void
__init pmd_advanced_tests(struct mm_struct *mm, pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1); pmd = READ_ONCE(*pmdp); WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd))); - - pmd = pmd_mkhuge(pfn_pmd(pfn, prot)); - set_pmd_at(mm, vaddr, pmdp, pmd); pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1); pmd = READ_ONCE(*pmdp); WARN_ON(!pmd_none(pmd)); + pmd = pmd_mkhuge(pfn_pmd(pfn, prot)); pmd = pmd_mkyoung(pmd); set_pmd_at(mm, vaddr, pmdp, pmd); pmdp_test_and_clear_young(vma, vaddr, pmdp); @@ -283,18 +279,10 @@ static void __init pud_advanced_tests(struct mm_struct *mm, WARN_ON(pud_write(pud)); #ifndef __PAGETABLE_PMD_FOLDED - - pud = pud_mkhuge(pfn_pud(pfn, prot)); - set_pud_at(mm, vaddr, pudp, pud); pudp_huge_get_and_clear(mm, vaddr, pudp); pud = READ_ONCE(*pudp); WARN_ON(!pud_none(pud)); - pud = pud_mkhuge(pfn_pud(pfn, prot)); - set_pud_at(mm, vaddr, pudp, pud); - pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1); - pud = READ_ONCE(*pudp); - WARN_ON(!pud_none(pud)); #endif /* __PAGETABLE_PMD_FOLDED */ pud = pud_mkhuge(pfn_pud(pfn, prot)); @@ -307,6 +295,13 @@ static void __init pud_advanced_tests(struct mm_struct *mm, pud = READ_ONCE(*pudp); WARN_ON(!(pud_write(pud) && pud_dirty(pud))); +#ifndef __PAGETABLE_PMD_FOLDED + pudp_huge_get_and_clear_full(vma, vaddr, pudp, 1); + pud = READ_ONCE(*pudp); + WARN_ON(!pud_none(pud)); +#endif /* __PAGETABLE_PMD_FOLDED */ + + pud = pud_mkhuge(pfn_pud(pfn, prot)); pud = pud_mkyoung(pud); set_pud_at(mm, vaddr, pudp, pud); pudp_test_and_clear_young(vma, vaddr, pudp); -- 2.26.2
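The invariant behind this patch, sketched in miniature: because set_pte_at() may skip the TLB flush, it must only install into a none slot, and updates to a live entry must go through a clearing primitive first. An illustrative model, not the kernel's implementation:

```c
#include <stdint.h>

/* A pte modeled as a bare u64 slot; 0 plays the role of pte_none().
 * set_pte_at() refuses to overwrite a live entry, because doing so
 * would silently skip a needed TLB invalidate on arches like ppc64. */
static int set_pte_at(uint64_t *ptep, uint64_t pte)
{
	if (*ptep != 0)
		return -1;	/* rule broken: slot already holds a pte */
	*ptep = pte;
	return 0;
}

/* The sanctioned way to update: clear first, then install anew. */
static uint64_t ptep_get_and_clear(uint64_t *ptep)
{
	uint64_t old = *ptep;
	*ptep = 0;
	return old;
}
```

This mirrors how the patch reorders the advanced tests: every re-population now follows a ptep_get_and_clear()/pmdp_huge_get_and_clear() rather than a back-to-back set_*_at().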
[PATCH v2 12/13] mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
The test seems to be missing quite a lot of details w.r.t allocating the correct pgtable_t page (huge_pte_alloc()), holding the right lock (huge_pte_lock()) etc. The vma used is also not a hugetlb VMA. ppc64 does have runtime checks within CONFIG_DEBUG_VM for most of these. Hence disable the test on ppc64. Signed-off-by: Aneesh Kumar K.V --- mm/debug_vm_pgtable.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 63576fe767a2..09ce9974c187 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -798,6 +798,7 @@ static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ } +#ifndef CONFIG_PPC_BOOK3S_64 static void __init hugetlb_advanced_tests(struct mm_struct *mm, struct vm_area_struct *vma, pte_t *ptep, unsigned long pfn, @@ -840,6 +841,7 @@ static void __init hugetlb_advanced_tests(struct mm_struct *mm, pte = huge_ptep_get(ptep); WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte))); } +#endif #else /* !CONFIG_HUGETLB_PAGE */ static void __init hugetlb_basic_tests(unsigned long pfn, pgprot_t prot) { } static void __init hugetlb_advanced_tests(struct mm_struct *mm, @@ -1050,7 +1052,9 @@ static int __init debug_vm_pgtable(void) pud_populate_tests(mm, pudp, saved_pmdp); spin_unlock(ptl); - //hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot); +#ifndef CONFIG_PPC_BOOK3S_64 + hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot); +#endif spin_lock(&mm->page_table_lock); p4d_clear_tests(mm, p4dp); -- 2.26.2
[PATCH v2 11/13] mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
pmd_clear() should not be used to clear pmd level pte entries. Signed-off-by: Aneesh Kumar K.V --- mm/debug_vm_pgtable.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 8f7a8ccb5a54..63576fe767a2 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -191,6 +191,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm, pmd = READ_ONCE(*pmdp); WARN_ON(pmd_young(pmd)); + /* Clear the pte entries */ + pmdp_huge_get_and_clear(mm, vaddr, pmdp); pgtable = pgtable_trans_huge_withdraw(mm, pmdp); } @@ -311,6 +313,8 @@ static void __init pud_advanced_tests(struct mm_struct *mm, pudp_test_and_clear_young(vma, vaddr, pudp); pud = READ_ONCE(*pudp); WARN_ON(pud_young(pud)); + + pudp_huge_get_and_clear(mm, vaddr, pudp); } static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot) @@ -429,8 +433,6 @@ static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp, * This entry points to next level page table page. * Hence this must not qualify as pud_bad(). */ - pmd_clear(pmdp); - pud_clear(pudp); pud_populate(mm, pudp, pmdp); pud = READ_ONCE(*pudp); WARN_ON(pud_bad(pud)); @@ -562,7 +564,6 @@ static void __init pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp, * This entry points to next level page table page. * Hence this must not qualify as pmd_bad(). */ - pmd_clear(pmdp); pmd_populate(mm, pmdp, pgtable); pmd = READ_ONCE(*pmdp); WARN_ON(pmd_bad(pmd)); -- 2.26.2
[PATCH v2 03/13] mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value
ppc64 uses bit 62 to indicate a pte entry (_PAGE_PTE). Avoid setting that bit in random value. Signed-off-by: Aneesh Kumar K.V --- mm/debug_vm_pgtable.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 086309fb9b6f..57259e2dbd17 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -44,10 +44,13 @@ * entry type. But these bits might affect the ability to clear entries with * pxx_clear() because of how dynamic page table folding works on s390. So * while loading up the entries do not change the lower 4 bits. It does not - * have affect any other platform. + * have affect any other platform. Also avoid the 62nd bit on ppc64 that is + * used to mark a pte entry. */ -#define S390_MASK_BITS 4 -#define RANDOM_ORVALUE GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS) +#define S390_SKIP_MASK GENMASK(3, 0) +#define PPC64_SKIP_MASKGENMASK(62, 62) +#define ARCH_SKIP_MASK (S390_SKIP_MASK | PPC64_SKIP_MASK) +#define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) & ~ARCH_SKIP_MASK) #define RANDOM_NZVALUE GENMASK(7, 0) static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot) -- 2.26.2
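The mask arithmetic above can be checked in isolation: GENMASK(h, l) builds a contiguous bit mask, and RANDOM_ORVALUE is all 64 bits minus the s390 low nibble and ppc64's bit 62. A self-contained rendering with a local GENMASK64 (the kernel's GENMASK from include/linux/bits.h behaves the same for these arguments):

```c
#include <stdint.h>

/* GENMASK64(h, l): bits l..h inclusive set, everything else clear. */
#define GENMASK64(h, l) \
	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* The patch's definitions, spelled out for a 64-bit long. */
#define S390_SKIP_MASK  GENMASK64(3, 0)		/* low nibble, s390 folding */
#define PPC64_SKIP_MASK GENMASK64(62, 62)	/* _PAGE_PTE on ppc64 */
#define ARCH_SKIP_MASK  (S390_SKIP_MASK | PPC64_SKIP_MASK)
#define RANDOM_ORVALUE  (GENMASK64(63, 0) & ~ARCH_SKIP_MASK)
```

The resulting value keeps bit 63 and bits 4..61 set, so the random fill can never accidentally fabricate a pte marker or disturb s390's folding bits.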
[PATCH v2 10/13] mm/debug_vm_pgtable/locks: Take correct page table lock
Make sure we call pte accessors with correct lock held.

Signed-off-by: Aneesh Kumar K.V
---
 mm/debug_vm_pgtable.c | 34 --
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 69fe3cd8126c..8f7a8ccb5a54 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -1024,33 +1024,39 @@ static int __init debug_vm_pgtable(void)
 	pmd_thp_tests(pmd_aligned, prot);
 	pud_thp_tests(pud_aligned, prot);
 
+	hugetlb_basic_tests(pte_aligned, prot);
+
 	/*
 	 * Page table modifying tests
 	 */
-	pte_clear_tests(mm, ptep, vaddr);
-	pmd_clear_tests(mm, pmdp);
-	pud_clear_tests(mm, pudp);
-	p4d_clear_tests(mm, p4dp);
-	pgd_clear_tests(mm, pgdp);
 	ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+	pte_clear_tests(mm, ptep, vaddr);
 	pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-	pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
-	pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
-	hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-
+	pte_unmap_unlock(ptep, ptl);
+
+	ptl = pmd_lock(mm, pmdp);
+	pmd_clear_tests(mm, pmdp);
+	pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
 	pmd_huge_tests(pmdp, pmd_aligned, prot);
+	pmd_populate_tests(mm, pmdp, saved_ptep);
+	spin_unlock(ptl);
+
+	ptl = pud_lock(mm, pudp);
+	pud_clear_tests(mm, pudp);
+	pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
 	pud_huge_tests(pudp, pud_aligned, prot);
+	pud_populate_tests(mm, pudp, saved_pmdp);
+	spin_unlock(ptl);
 
-	pte_unmap_unlock(ptep, ptl);
+	//hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
 
-	pmd_populate_tests(mm, pmdp, saved_ptep);
-	pud_populate_tests(mm, pudp, saved_pmdp);
+	spin_lock(&mm->page_table_lock);
+	p4d_clear_tests(mm, p4dp);
+	pgd_clear_tests(mm, pgdp);
 	p4d_populate_tests(mm, p4dp, saved_pudp);
 	pgd_populate_tests(mm, pgdp, saved_p4dp);
-
-	hugetlb_basic_tests(pte_aligned, prot);
+	spin_unlock(&mm->page_table_lock);
 
 	p4d_free(mm, saved_p4dp);
 	pud_free(mm, saved_pudp);
-- 
2.26.2
[PATCH v2 02/13] powerpc/mm: Move setting pte specific flags to pfn_pte
powerpc used to set the pte specific flags in set_pte_at(). This is different from other architectures. To be consistent with other architectures, update pfn_pte and pfn_pmd to set _PAGE_PTE on ppc64. Also, drop the now-unused pte_mkpte.

We add a VM_WARN_ON() to catch usage of set_pte_at() without the _PAGE_PTE bit set. We will remove that after a few releases.

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 15 +--
 arch/powerpc/include/asm/nohash/pgtable.h    |  5 -
 arch/powerpc/mm/book3s64/pgtable.c           |  2 +-
 arch/powerpc/mm/pgtable.c                    |  5 -
 4 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 079211968987..2382fd516f6b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -619,7 +619,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t pgprot)
 	VM_BUG_ON(pfn >> (64 - PAGE_SHIFT));
 	VM_BUG_ON((pfn << PAGE_SHIFT) & ~PTE_RPN_MASK);
 
-	return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot));
+	return __pte(((pte_basic_t)pfn << PAGE_SHIFT) | pgprot_val(pgprot) | _PAGE_PTE);
 }
 
 static inline unsigned long pte_pfn(pte_t pte)
@@ -655,11 +655,6 @@ static inline pte_t pte_mkexec(pte_t pte)
 	return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_EXEC));
 }
 
-static inline pte_t pte_mkpte(pte_t pte)
-{
-	return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
-}
-
 static inline pte_t pte_mkwrite(pte_t pte)
 {
 	/*
@@ -823,6 +818,14 @@ static inline int pte_none(pte_t pte)
 static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 				pte_t *ptep, pte_t pte, int percpu)
 {
+
+	VM_WARN_ON(!(pte_raw(pte) & cpu_to_be64(_PAGE_PTE)));
+	/*
+	 * Keep the _PAGE_PTE added till we are sure we handle _PAGE_PTE
+	 * in all the callers.
+	 */
+	pte = __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_PTE));
+
 	if (radix_enabled())
 		return radix__set_pte_at(mm, addr, ptep, pte, percpu);
 	return hash__set_pte_at(mm, addr, ptep, pte, percpu);
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 4b7c3472eab1..6277e7596ae5 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -140,11 +140,6 @@ static inline pte_t pte_mkold(pte_t pte)
 	return __pte(pte_val(pte) & ~_PAGE_ACCESSED);
 }
 
-static inline pte_t pte_mkpte(pte_t pte)
-{
-	return pte;
-}
-
 static inline pte_t pte_mkspecial(pte_t pte)
 {
 	return __pte(pte_val(pte) | _PAGE_SPECIAL);
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index e18ae50a275c..3b4da7c63e28 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -141,7 +141,7 @@ pmd_t pfn_pmd(unsigned long pfn, pgprot_t pgprot)
 	unsigned long pmdv;
 
 	pmdv = (pfn << PAGE_SHIFT) & PTE_RPN_MASK;
-	return pmd_set_protbits(__pmd(pmdv), pgprot);
+	return __pmd(pmdv | pgprot_val(pgprot) | _PAGE_PTE);
 }
 
 pmd_t mk_pmd(struct page *page, pgprot_t pgprot)
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 9c0547d77af3..ab57b07ef39a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -184,9 +184,6 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep,
 	 */
 	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-	/* Add the pte bit when trying to set a pte */
-	pte = pte_mkpte(pte);
-
 	/* Note: mm->context.id might not yet have been assigned as
 	 * this context might not have been activated yet when this
 	 * is called.
@@ -275,8 +272,6 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_
 	 */
 	VM_WARN_ON(pte_hw_valid(*ptep) && !pte_protnone(*ptep));
 
-	pte = pte_mkpte(pte);
-
 	pte = set_pte_filter(pte);
 	val = pte_val(pte);
-- 
2.26.2
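The effect of moving the _PAGE_PTE marker into pfn_pte() can be sketched in plain C. This is illustration only: the bit positions below are made up for the sketch, and the real definitions (pte_basic_t, PTE_RPN_MASK, the actual _PAGE_PTE value) live in the powerpc book3s/64 headers.

```c
#include <stdint.h>

/* Placeholder values for illustration only -- not the real powerpc bits. */
#define SKETCH_PAGE_SHIFT 16            /* 64K pages */
#define SKETCH_PAGE_PTE   (1ULL << 62)  /* "this entry is a PTE" marker */

/* Before the patch: pfn_pte() encoded only the pfn and protection bits,
 * so set_pte_at() had to OR in the marker via pte_mkpte(). */
static uint64_t pfn_pte_old(uint64_t pfn, uint64_t pgprot)
{
	return (pfn << SKETCH_PAGE_SHIFT) | pgprot;
}

/* After the patch: pfn_pte() carries the marker itself, so every
 * constructed pte is already well-formed, matching other architectures. */
static uint64_t pfn_pte_new(uint64_t pfn, uint64_t pgprot)
{
	return (pfn << SKETCH_PAGE_SHIFT) | pgprot | SKETCH_PAGE_PTE;
}
```

The two encodings differ only in the marker bit, which is why dropping pte_mkpte() from set_pte_at() is safe once every caller builds ptes through pfn_pte().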
[PATCH v2 08/13] mm/debug_vm_pgtable/thp: Use page table deposit/withdraw with THP
Architectures like ppc64 use deposited page table while updating the huge pte entries.

Signed-off-by: Aneesh Kumar K.V
---
 mm/debug_vm_pgtable.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 9c7e2c9cfc76..6dcac2b40fef 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -149,7 +149,7 @@ static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
 static void __init pmd_advanced_tests(struct mm_struct *mm,
 				      struct vm_area_struct *vma, pmd_t *pmdp,
 				      unsigned long pfn, unsigned long vaddr,
-				      pgprot_t prot)
+				      pgprot_t prot, pgtable_t pgtable)
 {
 	pmd_t pmd;
 
@@ -160,6 +160,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
 	/* Align the address wrt HPAGE_PMD_SIZE */
 	vaddr = (vaddr & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE;
 
+	pgtable_trans_huge_deposit(mm, pmdp, pgtable);
+
 	pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
 	set_pmd_at(mm, vaddr, pmdp, pmd);
 	pmdp_set_wrprotect(mm, vaddr, pmdp);
@@ -188,6 +190,8 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
 	pmdp_test_and_clear_young(vma, vaddr, pmdp);
 	pmd = READ_ONCE(*pmdp);
 	WARN_ON(pmd_young(pmd));
+
+	pgtable = pgtable_trans_huge_withdraw(mm, pmdp);
 }
 
 static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
@@ -1000,7 +1004,7 @@ static int __init debug_vm_pgtable(void)
 	pgd_clear_tests(mm, pgdp);
 
 	pte_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-	pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot);
+	pmd_advanced_tests(mm, vma, pmdp, pmd_aligned, vaddr, prot, saved_ptep);
 	pud_advanced_tests(mm, vma, pudp, pud_aligned, vaddr, prot);
 	hugetlb_advanced_tests(mm, vma, ptep, pte_aligned, vaddr, prot);
-- 
2.26.2
[PATCH v2 00/13] mm/debug_vm_pgtable fixes
This patch series includes fixes for the debug_vm_pgtable test code so that it follows page table update rules correctly. The first two patches introduce changes w.r.t. ppc64. The patches are included in this series for completeness. We can merge them via the ppc64 tree if required.

The hugetlb test is disabled on ppc64 because it needs a larger change to satisfy the page table update rules.

Changes from V1:
* Address review feedback
* Drop test-specific pfn_pte and pfn_pmd
* Update ppc64 page table helper to add _PAGE_PTE

Aneesh Kumar K.V (13):
  powerpc/mm: Add DEBUG_VM WARN for pmd_clear
  powerpc/mm: Move setting pte specific flags to pfn_pte
  mm/debug_vm_pgtable/ppc64: Avoid setting top bits in random value
  mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.
  mm/debug_vm_pgtable/savedwrite: Enable savedwrite test with CONFIG_NUMA_BALANCING
  mm/debug_vm_pgtable/THP: Mark the pte entry huge before using set_pmd/pud_at
  mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an existing pte entry
  mm/debug_vm_pgtable/thp: Use page table deposit/withdraw with THP
  mm/debug_vm_pgtable/locks: Move non page table modifying test together
  mm/debug_vm_pgtable/locks: Take correct page table lock
  mm/debug_vm_pgtable/pmd_clear: Don't use pmd/pud_clear on pte entries
  mm/debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
  mm/debug_vm_pgtable: populate a pte entry before fetching it

 arch/powerpc/include/asm/book3s/64/pgtable.h |  29 +++-
 arch/powerpc/include/asm/nohash/pgtable.h    |   5 -
 arch/powerpc/mm/book3s64/pgtable.c           |   2 +-
 arch/powerpc/mm/pgtable.c                    |   5 -
 include/linux/io.h                           |  12 ++
 mm/debug_vm_pgtable.c                        | 151 +++
 6 files changed, 127 insertions(+), 77 deletions(-)
-- 
2.26.2
[PATCH v2 01/13] powerpc/mm: Add DEBUG_VM WARN for pmd_clear
With the hash page table, the kernel should not use pmd_clear for clearing huge pte entries. Add a DEBUG_VM WARN to catch the wrong usage.

Signed-off-by: Aneesh Kumar K.V
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6de56c3b33c4..079211968987 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -868,6 +868,13 @@ static inline bool pte_ci(pte_t pte)
 
 static inline void pmd_clear(pmd_t *pmdp)
 {
+	if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
+		/*
+		 * Don't use this if we can possibly have a hash page table
+		 * entry mapping this.
+		 */
+		WARN_ON((pmd_val(*pmdp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == (H_PAGE_HASHPTE | _PAGE_PTE));
+	}
 	*pmdp = __pmd(0);
 }
 
@@ -916,6 +923,13 @@ static inline int pmd_bad(pmd_t pmd)
 
 static inline void pud_clear(pud_t *pudp)
 {
+	if (IS_ENABLED(CONFIG_DEBUG_VM) && !radix_enabled()) {
+		/*
+		 * Don't use this if we can possibly have a hash page table
+		 * entry mapping this.
+		 */
+		WARN_ON((pud_val(*pudp) & (H_PAGE_HASHPTE | _PAGE_PTE)) == (H_PAGE_HASHPTE | _PAGE_PTE));
+	}
 	*pudp = __pud(0);
 }
-- 
2.26.2
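The WARN_ON condition in this patch only fires when both H_PAGE_HASHPTE and _PAGE_PTE are set in the entry, i.e. when pmd_clear()/pud_clear() is applied to a leaf PTE that may still have a hash page table entry behind it. A minimal userspace sketch of that mask check, with placeholder bit values (the real ones are in the powerpc book3s/64 headers and differ):

```c
#include <stdint.h>
#include <stdbool.h>

/* Placeholder bit values for this sketch only. */
#define SKETCH_H_PAGE_HASHPTE (1ULL << 4)
#define SKETCH_PAGE_PTE       (1ULL << 62)

/* Mirrors the WARN_ON condition: warn only when BOTH bits are set,
 * meaning a hash-backed leaf PTE is being cleared the wrong way. */
static bool clear_would_warn(uint64_t entry)
{
	const uint64_t mask = SKETCH_H_PAGE_HASHPTE | SKETCH_PAGE_PTE;

	return (entry & mask) == mask;
}
```

Note the `== mask` comparison, not `!= 0`: an entry with only one of the two bits set (a pointer to a lower-level table, or a PTE never inserted into the hash table) is still a legitimate pmd_clear() target and must not warn.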
Re: [PATCH 00/10] sound: convert tasklets to use new tasklet_setup()
On Wed, 19 Aug 2020 13:16:05 +0200, Mark Brown wrote: > > On Wed, Aug 19, 2020 at 04:21:58PM +0530, Allen wrote: > > > > These patches which I wasn't CCed on and which need their subject lines > > > fixing :( . With the subject lines fixed I guess so so > > > Extremely sorry. I thought I had it covered. How would you like it > > worded? > > ASoC: To be more exact, "ASoC:" prefix is for sound/soc/*, and for the rest sound/*, use "ALSA:" prefix please. Takashi
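The prefix rule Takashi states — "ASoC:" for sound/soc/*, "ALSA:" for the rest of sound/* — is simple enough to sketch as a path check. This is only an illustration of the convention described above, not a tool from the kernel tree:

```c
#include <string.h>

/* Pick the commit-subject prefix from the path being touched, per the
 * convention in this thread: sound/soc/* -> "ASoC:", other sound/* ->
 * "ALSA:", anything else -> no sound prefix. */
static const char *subject_prefix(const char *path)
{
	if (strncmp(path, "sound/soc/", 10) == 0)
		return "ASoC:";
	if (strncmp(path, "sound/", 6) == 0)
		return "ALSA:";
	return "";
}
```

For example, `subject_prefix("sound/soc/codecs/wm8960.c")` picks "ASoC:", while `subject_prefix("sound/core/timer.c")` picks "ALSA:".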
Re: [PATCH 00/10] sound: convert tasklets to use new tasklet_setup()
On Wed, Aug 19, 2020 at 04:21:58PM +0530, Allen wrote: > > These patches which I wasn't CCed on and which need their subject lines > > fixing :( . With the subject lines fixed I guess so so > Extremely sorry. I thought I had it covered. How would you like it > worded? ASoC: In general you should try to follow the style for the code you're modifying, this applies to things like commit logs as well as the code itself.
Re: [PATCH 00/10] sound: convert tasklets to use new tasklet_setup()
> > > Mark, may I apply those ASoC patches through my tree together with > > others? Those seem targeting to 5.9, and I have a patch set to > > convert to tasklet for 5.10, which would be better manageable when > > based on top of those changes. > > These patches which I wasn't CCed on and which need their subject lines > fixing :( . With the subject lines fixed I guess so so Extremely sorry. I thought I had it covered. How would you like it worded? > Acked-by: Mark Brown > > but judging from some of the other threads about similar patches that I > was randomly CCed on I'm not sure people like from_tasklet() so perhaps > there might be issues. Yes, there is a new macro by name cast_out() is suggested in place of from_tasklet(). Hopefully it will go in soon. Will spin out V2 with the change and also re-word subject line. > Allen, as documented in submitting-patches.rst please send patches to > the maintainers for the code you would like to change. The normal > kernel workflow is that people apply patches from their inboxes, if they > aren't copied they are likely to not see the patch at all and it is much > more difficult to apply patches. I understand, I'll take care of it in the future. Thank you. -- - Allen
Re: [PATCH 1/2] lockdep: improve current->(hard|soft)irqs_enabled synchronisation with actual irq state
On 19/08/2020 09:54, Nicholas Piggin wrote: > Excerpts from pet...@infradead.org's message of August 19, 2020 1:41 am: >> On Tue, Aug 18, 2020 at 05:22:33PM +1000, Nicholas Piggin wrote: >>> Excerpts from pet...@infradead.org's message of August 12, 2020 8:35 pm: On Wed, Aug 12, 2020 at 06:18:28PM +1000, Nicholas Piggin wrote: > Excerpts from pet...@infradead.org's message of August 7, 2020 9:11 pm: >> >> What's wrong with something like this? >> >> AFAICT there's no reason to actually try and add IRQ tracing here, it's >> just a hand full of instructions at the most. > > Because we may want to use that in other places as well, so it would > be nice to have tracing. > > Hmm... also, I thought NMI context was free to call local_irq_save/restore > anyway so the bug would still be there in those cases? NMI code has in_nmi() true, in which case the IRQ tracing is disabled (except for x86 which has CONFIG_TRACE_IRQFLAGS_NMI). >>> >>> That doesn't help. It doesn't fix the lockdep irq state going out of >>> synch with the actual irq state. The code which triggered this with the >>> special powerpc irq disable has in_nmi() true as well. >> >> Urgh, you're talking about using lockdep_assert_irqs*() from NMI >> context? >> >> If not, I'm afraid I might've lost the plot a little on what exact >> failure case we're talking about. >> > > Hm, I may have been a bit confused actually. Since your Fix > TRACE_IRQFLAGS vs NMIs patch it might now work. > > I'm worried powerpc disables trace irqs trace_hardirqs_off() > before nmi_enter() might still be a problem, but not sure > actually. Alexey did you end up re-testing with Peter's patch The one above in the thread which replaces powerpc_local_irq_pmu_save() with raw_powerpc_local_irq_pmu_save()? It did not compile as there is no raw_powerpc_local_irq_pmu_save() so I may be missing something here. 
I applied the patch on top of the current upstream and replaced raw_powerpc_local_irq_pmu_save() with raw_local_irq_pmu_save() (which I think was the intention) but I still see the issue. > or current upstream? The upstream 18445bf405cb (13 hours old) also shows the problem. Yours 1/2 still fixes it. > > Thanks, > Nick > -- Alexey
Re: [Virtual ppce500] virtio_gpu virtio0: swiotlb buffer is full
On 19 August 2020 at 06:35 am, Gerd Hoffmann wrote: On Tue, Aug 18, 2020 at 04:41:38PM +0200, Christian Zigotzky wrote: Hello Gerd, I compiled a new kernel with the latest DRM misc updates today. The patch is included in these updates. This kernel works with the VirtIO-GPU in a virtual e5500 QEMU/KVM HV machine on my X5000. Unfortunately I can only use the VirtIO-GPU (Monitor: Red Hat, Inc. 8") with a resolution of 640x480. If I set a higher resolution then the guest disables the monitor. I can use higher resolutions with the stable kernel 5.8 and the VirtIO-GPU. Please check the latest DRM updates. https://patchwork.freedesktop.org/patch/385980/ (tests & reviews & acks are welcome) HTH, Gerd Hello Gerd, I compiled a new RC1 with our patches today. With these patches, the VirtIO-GPU works without any problems. I can use higher resolutions again. Screenshot of the RC1-3 with the VirtIO-GPU in a virtual e5500 QEMU/KVM HV machine on my X5000: https://i.pinimg.com/originals/4f/b0/14/4fb01476edd7abe6be1e1203a8e7e152.png Thanks a lot for your help! Cheers, Christian
[PATCH] powerpc/powernv/idle: add a basic stop 0-3 driver for POWER10
This driver does not restore stop > 3 state, so it limits itself to states which do not lose full state or TB. The POWER10 SPRs are sufficiently different from P9 that it seems easier to split out the P10 code. The POWER10 deep sleep code (e.g., the BHRB restore) has been taken out, but it can be re-added when stop > 3 support is added. Cc: Ryan P Grimm Cc: Michael Neuling Cc: Gautham R. Shenoy Cc: Pratik Rajesh Sampat Signed-off-by: Nicholas Piggin --- re-sending with linuxppc-dev list actually cc'ed arch/powerpc/include/asm/machdep.h| 2 - arch/powerpc/include/asm/processor.h | 2 +- arch/powerpc/include/asm/reg.h| 1 + arch/powerpc/platforms/powernv/idle.c | 304 ++ drivers/cpuidle/cpuidle-powernv.c | 2 +- 5 files changed, 213 insertions(+), 98 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index a90b892f0bfe..5082cd496190 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -222,8 +222,6 @@ struct machdep_calls { extern void e500_idle(void); extern void power4_idle(void); -extern void power7_idle(void); -extern void power9_idle(void); extern void ppc6xx_idle(void); extern void book3e_idle(void); diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index ed0d633ab5aa..6865147209de 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch/powerpc/include/asm/processor.h @@ -432,7 +432,7 @@ enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF}; extern int powersave_nap; /* set if nap mode can be used in idle loop */ extern void power7_idle_type(unsigned long type); -extern void power9_idle_type(unsigned long stop_psscr_val, +extern void arch300_idle_type(unsigned long stop_psscr_val, unsigned long stop_psscr_mask); extern void flush_instruction_cache(void); diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 88fb88491fe9..d3a0aed321d0 100644 --- a/arch/powerpc/include/asm/reg.h +++ 
b/arch/powerpc/include/asm/reg.h @@ -1353,6 +1353,7 @@ #define PVR_POWER8NVL 0x004C #define PVR_POWER8 0x004D #define PVR_POWER9 0x004E +#define PVR_POWER100x0080 #define PVR_BE 0x0070 #define PVR_PA6T 0x0090 diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c index 77513a80cef9..1ed7c5286487 100644 --- a/arch/powerpc/platforms/powernv/idle.c +++ b/arch/powerpc/platforms/powernv/idle.c @@ -565,7 +565,7 @@ void power7_idle_type(unsigned long type) irq_set_pending_from_srr1(srr1); } -void power7_idle(void) +static void power7_idle(void) { if (!powersave_nap) return; @@ -659,20 +659,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on) mmcr0 = mfspr(SPRN_MMCR0); } - if (cpu_has_feature(CPU_FTR_ARCH_31)) { - /* -* POWER10 uses MMCRA (BHRBRD) as BHRB disable bit. -* If the user hasn't asked for the BHRB to be -* written, the value of MMCRA[BHRBRD] is 1. -* On wakeup from stop, MMCRA[BHRBD] will be 0, -* since it is previleged resource and will be lost. -* Thus, if we do not save and restore the MMCRA[BHRBD], -* hardware will be needlessly writing to the BHRB -* in problem mode. -*/ - mmcra = mfspr(SPRN_MMCRA); - } - if ((psscr & PSSCR_RL_MASK) >= deep_spr_loss_state) { sprs.lpcr = mfspr(SPRN_LPCR); sprs.hfscr = mfspr(SPRN_HFSCR); @@ -735,10 +721,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on) mtspr(SPRN_MMCR0, mmcr0); } - /* Reload MMCRA to restore BHRB disable bit for POWER10 */ - if (cpu_has_feature(CPU_FTR_ARCH_31)) - mtspr(SPRN_MMCRA, mmcra); - /* * DD2.2 and earlier need to set then clear bit 60 in MMCRA * to ensure the PMU starts running. 
@@ -823,73 +805,6 @@ static unsigned long power9_idle_stop(unsigned long psscr, bool mmu_on) return srr1; } -#ifdef CONFIG_HOTPLUG_CPU -static unsigned long power9_offline_stop(unsigned long psscr) -{ - unsigned long srr1; - -#ifndef CONFIG_KVM_BOOK3S_HV_POSSIBLE - __ppc64_runlatch_off(); - srr1 = power9_idle_stop(psscr, true); - __ppc64_runlatch_on(); -#else - /* -* Tell KVM we're entering idle. -* This does not have to be done in real mode because the P9 MMU -* is independent per-thread. Some steppings share radix/hash mode -* between threads, but in that case KVM has a barrier sync in real -* mode before and after switching between radix and hash. -
Re: [PATCH] powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute
Thanks Aneesh and Mpe for reviewing this patch.

Michael Ellerman writes:
> "Aneesh Kumar K.V" writes:
[snip]
>>> +	/* Allow access only to perfmon capable users */
>>> +	if (!perfmon_capable())
>>> +		return -EACCES;
>>>
>>
>> An access check is usually done in open(). This is the read callback IIUC.
>
> Yes. Otherwise an unprivileged user can open the file, and then trick a
> suid program into reading from it.

Agree, but since the 'open()' for this sysfs attribute is handled by kernfs, AFAIK I don't see any direct way to enforce this policy. The only other way, it seems to me, is to convert the 'perf_stats' DEVICE_ATTR_RO to DEVICE_ATTR_ADMIN_RO.

> cheers

--
Cheers
~ Vaibhav
iter and normal ops on /dev/zero & co, was Re: remove the last set_fs() in common code, and remove it for x86 and powerpc
On Wed, Aug 19, 2020 at 09:16:59AM +0200, Christophe Leroy wrote: > I made a test with only the first patch of your series: That's definitely > the culprit. With only that patch applies, the duration is 6.64 seconds, > that's a 25% degradation. For the record: the first patch is: mem: remove duplicate ops for /dev/zero and /dev/null So these micro-optimizations matter at least for some popular benchmarks. It would be easy to drop, but that means we either: - can't support kernel_read/write on these files, which should not matter or - have to drop the check for both ops being present Al, what do you think?
Re: remove the last set_fs() in common code, and remove it for x86 and powerpc
Le 18/08/2020 à 20:23, Christophe Leroy a écrit :
> Le 18/08/2020 à 20:05, Christoph Hellwig a écrit :
>> On Tue, Aug 18, 2020 at 07:46:22PM +0200, Christophe Leroy wrote:
>>> I gave it a go on my powerpc mpc832x. I tested it on top of my newest
>>> series that reworks the 32 bits signal handlers (see
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278)
>>> with the microbenchmark test used in that series.
>>>
>>> With KUAP activated, on top of the signal32 rework, performance is
>>> boosted as system time for the microbenchmark goes from 1.73s down to
>>> 1.56s, that is 10% quicker.
>>>
>>> Surprisingly, with the kernel as is today without my signal's series,
>>> your series degrades performance slightly (from 2.55s to 2.64s, ie 3.5%
>>> slower).
>>>
>>> I also observe, in both cases, a degradation on
>>> dd if=/dev/zero of=/dev/null count=1M
>>> Without your series, it runs in 5.29 seconds. With your series, it runs
>>> in 5.82 seconds, that is 10% more time.
>>
>> That's pretty strange, I wonder if some kernel text cache line effects
>> come into play here?
>>
>> The kernel access side is only used in slow path code, so it should not
>> make a difference, and the uaccess code is simplified and should be
>> (marginally) faster.
>>
>> Btw, was this with the __{get,put}_user_allowed cockup that you noticed
>> fixed?
>
> Yes it is with the __get_user_size() replaced by
> __get_user_size_allowed().

I made a test with only the first patch of your series: That's definitely the culprit. With only that patch applied, the duration is 6.64 seconds, that's a 25% degradation.

A perf record provides the following without the patch:

 41.91%  dd  [kernel.kallsyms]  [k] __arch_clear_user
  7.02%  dd  [kernel.kallsyms]  [k] vfs_read
  6.86%  dd  [kernel.kallsyms]  [k] new_sync_read
  6.68%  dd  [kernel.kallsyms]  [k] iov_iter_zero
  6.03%  dd  [kernel.kallsyms]  [k] transfer_to_syscall
  3.39%  dd  [kernel.kallsyms]  [k] memset
  3.07%  dd  [kernel.kallsyms]  [k] __fsnotify_parent
  2.68%  dd  [kernel.kallsyms]  [k] ksys_read
  2.09%  dd  [kernel.kallsyms]  [k] read_iter_zero
  2.01%  dd  [kernel.kallsyms]  [k] __fget_light
  1.84%  dd  [kernel.kallsyms]  [k] __fdget_pos
  1.35%  dd  [kernel.kallsyms]  [k] rw_verify_area
  1.32%  dd  libc-2.23.so       [.] __GI___libc_write
  1.21%  dd  [kernel.kallsyms]  [k] vfs_write
  ...
  0.03%  dd  [kernel.kallsyms]  [k] write_null

And the following with the patch:

 15.54%  dd  [kernel.kallsyms]  [k] __arch_clear_user
  9.17%  dd  [kernel.kallsyms]  [k] vfs_read
  6.54%  dd  [kernel.kallsyms]  [k] new_sync_write
  6.31%  dd  [kernel.kallsyms]  [k] transfer_to_syscall
  6.29%  dd  [kernel.kallsyms]  [k] __fsnotify_parent
  6.20%  dd  [kernel.kallsyms]  [k] new_sync_read
  5.47%  dd  [kernel.kallsyms]  [k] memset
  5.13%  dd  [kernel.kallsyms]  [k] vfs_write
  4.44%  dd  [kernel.kallsyms]  [k] iov_iter_zero
  2.95%  dd  [kernel.kallsyms]  [k] write_iter_null
  2.82%  dd  [kernel.kallsyms]  [k] ksys_read
  2.46%  dd  [kernel.kallsyms]  [k] __fget_light
  2.34%  dd  libc-2.23.so       [.] __GI___libc_read
  1.89%  dd  [kernel.kallsyms]  [k] iov_iter_advance
  1.76%  dd  [kernel.kallsyms]  [k] __fdget_pos
  1.65%  dd  [kernel.kallsyms]  [k] rw_verify_area
  1.63%  dd  [kernel.kallsyms]  [k] read_iter_zero
  1.60%  dd  [kernel.kallsyms]  [k] iov_iter_init
  1.22%  dd  [kernel.kallsyms]  [k] ksys_write
  1.14%  dd  libc-2.23.so       [.] __GI___libc_write

Christophe
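Comparing two profiles like the ones above means pulling the overhead figure for a given symbol out of each `perf report` text dump. A hedged sketch of that extraction (illustration only, written for the exact line shape shown in this thread; real perf output may vary):

```c
#include <string.h>
#include <stdlib.h>

/* Extract the overhead percentage for one symbol from a perf-report
 * style line such as:
 *   "41.91% dd [kernel.kallsyms] [k] __arch_clear_user"
 * Returns a negative value when the line is not for that symbol. */
static double overhead_for(const char *line, const char *sym)
{
	const char *last = strrchr(line, ' ');  /* symbol is the last field */

	if (!last || strcmp(last + 1, sym) != 0)
		return -1.0;
	return strtod(line, NULL);              /* parsing stops at the '%' */
}
```

Running this over both dumps for `__arch_clear_user` gives the 41.91% vs 15.54% pair that drives the discussion: the clear_user() loop itself got cheaper per sample, while the surrounding iov_iter machinery took a larger share.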
[PATCH v4] soc: fsl: enable acpi support in RCPM driver
From: Peng Ma

This patch enables ACPI support in RCPM driver.

Signed-off-by: Peng Ma
Signed-off-by: Ran Wang
---
Change in v4:
 - Make commit subject more accurate
 - Remove unrelated new blank line
Change in v3:
 - Add #ifdef CONFIG_ACPI for acpi_device_id
 - Rename rcpm_acpi_imx_ids to rcpm_acpi_ids
Change in v2:
 - Update acpi_device_id to fix conflict with other driver

 drivers/soc/fsl/rcpm.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c
index a093dbe..b5aa6db 100644
--- a/drivers/soc/fsl/rcpm.c
+++ b/drivers/soc/fsl/rcpm.c
@@ -2,7 +2,7 @@
 //
 // rcpm.c - Freescale QorIQ RCPM driver
 //
-// Copyright 2019 NXP
+// Copyright 2019-2020 NXP
 //
 // Author: Ran Wang
 
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 #define RCPM_WAKEUP_CELL_MAX_SIZE	7
 
@@ -139,10 +140,19 @@ static const struct of_device_id rcpm_of_match[] = {
 };
 MODULE_DEVICE_TABLE(of, rcpm_of_match);
 
+#ifdef CONFIG_ACPI
+static const struct acpi_device_id rcpm_acpi_ids[] = {
+	{"NXP0015",},
+	{ }
+};
+MODULE_DEVICE_TABLE(acpi, rcpm_acpi_ids);
+#endif
+
 static struct platform_driver rcpm_driver = {
 	.driver = {
 		.name = "rcpm",
 		.of_match_table = rcpm_of_match,
+		.acpi_match_table = ACPI_PTR(rcpm_acpi_ids),
 		.pm = &rcpm_pm_ops,
 	},
 	.probe = rcpm_probe,
-- 
2.7.4
Re: [PATCH 14/16] debug_vm_pgtable/hugetlb: Disable hugetlb test on ppc64
Anshuman Khandual writes: > On 08/12/2020 07:22 PM, Aneesh Kumar K.V wrote: >> On 8/12/20 7:04 PM, Anshuman Khandual wrote: >>> >>> >>> On 08/12/2020 06:46 PM, Aneesh Kumar K.V wrote: On 8/12/20 6:33 PM, Anshuman Khandual wrote: > > > On 08/12/2020 12:03 PM, Aneesh Kumar K.V wrote: >> The seems to be missing quite a lot of details w.r.t allocating >> the correct pgtable_t page (huge_pte_alloc()), holding the right >> lock (huge_pte_lock()) etc. The vma used is also not a hugetlb VMA. >> >> ppc64 do have runtime checks within CONFIG_DEBUG_VM for most of these. >> Hence disable the test on ppc64. > > This test is free from any platform specific #ifdefs which should > never be broken. If hugetlb_advanced_tests() does not work or is > not detailed enough for ppc64, then it would be great if you could > suggest some improvements so that it works for all enabled platforms. > > As mentioned the test is broken. For hugetlb, the pgtable_t pages should be allocated by huge_pte_alloc(). We need to hold huget_pte_lock() before updating huge tlb pte. That takes hugepage size, which is mostly derived out of vma. Hence vma need to be a hugetlb vma. Some of the functions also depend on hstate. Also we should use set_huge_pte_at() when setting up hugetlb pte entries. I was tempted to remove that test completely marking it broken. But avoided that by marking it broken on only PPC64. >>> >>> The test is not broken, hugetlb helpers on multiple platforms dont complain >>> about >>> this at all. The tests here emulate 'enough' MM objects required for the >>> helpers >>> on enabled platforms, to perform the primary task i.e page table >>> transformation it >>> is expected to do. The test does not claim to emulate a perfect MM >>> environment for >>> a given subsystem's (like HugeTLB) arch helpers. Now in this case, the MM >>> objects >>> being emulated for the HugeTLB advanced tests does not seem to be >>> sufficient for >>> ppc64 but it can be improved. 
But that does not mean it is broken in it's >>> current >>> form for other platforms. >>> >> >> There is nothing ppc64 specific here. It is just that we have >> CONFIG_DEBUG_VM based checks for different possibly wrong usages of these >> functions. This was done because we have different page sizes, two different >> translations to support and we want to avoid any wrong usage. IMHO expecting >> hugetlb page table helpers to work with a non hugetlb VMA and without >> holding hugeTLB pte lock is a clear violation of hugetlb interface. > > Do you have a modified version of the test with HugeTLB marked VMA and with > pte lock > held, which works on ppc664 ? Nope. That is one of the reason I commented that out. We can sort that out slowly. -aneesh
[powerpc:fixes-test] BUILD SUCCESS 801980f6497946048709b9b09771a1729551d705
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test branch HEAD: 801980f6497946048709b9b09771a1729551d705 powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death elapsed time: 1042m configs tested: 72 configs skipped: 68 The following configs have been built successfully. More configs may be tested in the coming days. arm defconfig arm64allyesconfig arm64 defconfig arm allyesconfig arm allmodconfig ia64 allmodconfig ia64defconfig ia64 allyesconfig m68k allmodconfig m68kdefconfig m68k allyesconfig nios2 defconfig arc allyesconfig nds32 allnoconfig c6x allyesconfig nds32 defconfig nios2allyesconfig cskydefconfig alpha defconfig alphaallyesconfig xtensa allyesconfig h8300allyesconfig arc defconfig sh allmodconfig parisc defconfig s390 allyesconfig parisc allyesconfig s390defconfig i386 allyesconfig sparcallyesconfig sparc defconfig i386defconfig mips allyesconfig mips allmodconfig powerpc allyesconfig powerpc allmodconfig powerpc allnoconfig powerpc defconfig i386 randconfig-a005-20200818 i386 randconfig-a002-20200818 i386 randconfig-a001-20200818 i386 randconfig-a006-20200818 i386 randconfig-a003-20200818 i386 randconfig-a004-20200818 x86_64 randconfig-a013-20200818 x86_64 randconfig-a016-20200818 x86_64 randconfig-a012-20200818 x86_64 randconfig-a011-20200818 x86_64 randconfig-a014-20200818 x86_64 randconfig-a015-20200818 i386 randconfig-a016-20200818 i386 randconfig-a011-20200818 i386 randconfig-a015-20200818 i386 randconfig-a013-20200818 i386 randconfig-a012-20200818 i386 randconfig-a014-20200818 x86_64 randconfig-a006-20200819 x86_64 randconfig-a001-20200819 x86_64 randconfig-a003-20200819 x86_64 randconfig-a005-20200819 x86_64 randconfig-a004-20200819 x86_64 randconfig-a002-20200819 riscvallyesconfig riscv allnoconfig riscv defconfig riscvallmodconfig x86_64 rhel x86_64 allyesconfig x86_64rhel-7.6-kselftests x86_64 defconfig x86_64 rhel-8.3 x86_64 kexec --- 0-DAY CI Kernel Test Service, Intel Corporation 
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org