Re: [PATCH 0/4] perf: Add new macros for mem_hops field
Arnaldo Carvalho de Melo writes:
> Em Mon, Dec 06, 2021 at 02:47:45PM +0530, Kajol Jain escreveu:
>> Patchset adds new macros for mem_hops field which can be
>> used to represent remote-node, socket and board level details.
>>
>> Currently the code had macro for HOPS_0, which corresponds
>> to data coming from another core but same node.
>> Add new macros for HOPS_1 to HOPS_3 to represent
>> remote-node, socket and board level data.
>>
>> For ex: Encodings for mem_hops fields with L2 cache:
>
> I checked and this hasn't hit mainstream, is it already merged on a tree
> where this is slated to be submitted in the next window? If so please
> let me know which one so that I can merge it on perf/core.

I haven't picked it up.

I guess the kernel changes are mainly in powerpc, but I'd at least need
an ack from eg. Peter for the generic perf uapi change.

Equally the whole series could go via tip.

cheers
Re: [PATCH] powerpc/module_64: Fix livepatching for RO modules
On 12/9/21 2:00 AM, Michael Ellerman wrote:
> Russell Currey writes:
>> On Tue, 2021-12-07 at 09:44 -0500, Joe Lawrence wrote:
>>> On 11/23/21 3:15 AM, Russell Currey wrote:
>>>
>>> [[ cc += livepatching list ]]
>>>
>>> Hi Russell,
>>>
>>> Thanks for writing a minimal fix for stable / backporting. As I
>>> mentioned on the github issue [1], this avoided the crashes I reported
>>> here and over on kpatch github [2]. I wasn't sure if this is the final
>>> version for stable, but feel free to add my:
>>>
>>> Tested-by: Joe Lawrence
>>
>> Thanks Joe, as per the discussions on GitHub I think we're fine to use
>> this patch for a fix for stable (unless there's new issues found or
>> additional community feedback etc).
>
> Hmm, I read the GitHub discussion as being that you were going to do
> another version, which is why I haven't picked this up. But I guess you
> and Christophe were talking about the non-minimal fix.
>
> I know we want this to be minimal, but I think it should be checking
> that patch_instruction() isn't failing.
>
> Incremental diff to do that is below. It boots OK, are you able to throw
> a livepatch test at it?

No problem. The incremental patch update tests successful.

Thanks,

-- Joe
[powerpc:next] BUILD SUCCESS 0d76914a4c99ab5658f3fb07cdf3799d28e2eab3
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: 0d76914a4c99ab5658f3fb07cdf3799d28e2eab3  powerpc/inst: Optimise copy_inst_from_kernel_nofault()

elapsed time: 724m

configs tested: 146
configs skipped: 4

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm        defconfig
arm64      allyesconfig
arm64      defconfig
arm        allyesconfig
arm        allmodconfig
i386       randconfig-c001-20211209
powerpc    randconfig-c003-20211209
x86_64     allyesconfig
arc        axs103_smp_defconfig
powerpc    mgcoge_defconfig
arm        mvebu_v7_defconfig
m68k       stmark2_defconfig
arm        imote2_defconfig
powerpc    sam440ep_defconfig
m68k       m5307c3_defconfig
sh         sdk7780_defconfig
arm        sunxi_defconfig
xtensa     cadence_csp_defconfig
arm        pxa3xx_defconfig
m68k       m5475evb_defconfig
ia64       zx1_defconfig
ia64       gensparse_defconfig
powerpc    tqm8xx_defconfig
arm        mxs_defconfig
sh         apsh4ad0a_defconfig
alpha      allyesconfig
arm        pleb_defconfig
ia64       generic_defconfig
mips       malta_defconfig
h8300      h8300h-sim_defconfig
openrisc   simple_smp_defconfig
sh         sh7763rdp_defconfig
powerpc    mpc834x_mds_defconfig
powerpc    kilauea_defconfig
powerpc    makalu_defconfig
sh         lboxre2_defconfig
mips       ip27_defconfig
powerpc    xes_mpc85xx_defconfig
powerpc    socrates_defconfig
nds32      allnoconfig
mips       maltasmvp_eva_defconfig
mips       rs90_defconfig
sh         se7705_defconfig
sh         sdk7786_defconfig
arm        sama7_defconfig
mips       cobalt_defconfig
sh         rsk7201_defconfig
arm        rpc_defconfig
arm        multi_v5_defconfig
m68k       alldefconfig
arm        jornada720_defconfig
m68k       hp300_defconfig
arm        omap2plus_defconfig
ia64       defconfig
powerpc    ep8248e_defconfig
arm        ezx_defconfig
arm        multi_v4t_defconfig
powerpc    motionpro_defconfig
mips       gcw0_defconfig
arm        mvebu_v5_defconfig
powerpc    fsp2_defconfig
sh         se7721_defconfig
arm        randconfig-c002-20211209
ia64       allmodconfig
ia64       allyesconfig
m68k       allmodconfig
m68k       defconfig
m68k       allyesconfig
nios2      defconfig
arc        allyesconfig
nds32      defconfig
nios2      allyesconfig
csky       defconfig
alpha      defconfig
xtensa     allyesconfig
h8300      allyesconfig
arc        defconfig
sh         allmodconfig
parisc     defconfig
s390       allyesconfig
s390       allmodconfig
parisc     allyesconfig
s390       defconfig
i386       allyesconfig
sparc      allyesconfig
sparc      defconfig
i386       defconfig
i386       debian-10.3-kselftests
i386       debian-10.3
mips       allyesconfig
mips       allmodconfig
powerpc    allyesconfig
powerpc    allmodconfig
powerpc    allnoconfig
x86_64     randconfig-a006-20211209
x86_64     randconfig-a005-20211209
x86_64     randconfig-a001-20211209
x86_64     randconfig-a002-20211209
x86_64     randconfig-a004-20211209
x86_64     randconfig-a003-20211209
i386
[powerpc:next-test] BUILD SUCCESS b149d5d45ac9171ed699a256f026c8ebef901112
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: b149d5d45ac9171ed699a256f026c8ebef901112  powerpc/powermac: Add additional missing lockdep_register_key()

elapsed time: 723m

configs tested: 146
configs skipped: 4

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm        defconfig
arm64      allyesconfig
arm64      defconfig
arm        allyesconfig
arm        allmodconfig
i386       randconfig-c001-20211209
powerpc    randconfig-c003-20211209
x86_64     allyesconfig
arc        axs103_smp_defconfig
powerpc    mgcoge_defconfig
arm        mvebu_v7_defconfig
m68k       stmark2_defconfig
arm        imote2_defconfig
powerpc    sam440ep_defconfig
m68k       m5307c3_defconfig
sh         sdk7780_defconfig
arm        sunxi_defconfig
xtensa     cadence_csp_defconfig
arm        pxa3xx_defconfig
m68k       m5475evb_defconfig
ia64       zx1_defconfig
ia64       gensparse_defconfig
powerpc    tqm8xx_defconfig
arm        mxs_defconfig
sh         apsh4ad0a_defconfig
alpha      allyesconfig
arm        pleb_defconfig
ia64       generic_defconfig
mips       malta_defconfig
h8300      h8300h-sim_defconfig
openrisc   simple_smp_defconfig
sh         sh7763rdp_defconfig
powerpc    mpc834x_mds_defconfig
powerpc    kilauea_defconfig
powerpc    makalu_defconfig
sh         lboxre2_defconfig
mips       ip27_defconfig
powerpc    xes_mpc85xx_defconfig
powerpc    socrates_defconfig
nds32      allnoconfig
mips       maltasmvp_eva_defconfig
mips       rs90_defconfig
sh         se7705_defconfig
sh         sdk7786_defconfig
arm        sama7_defconfig
mips       cobalt_defconfig
sh         rsk7201_defconfig
arm        rpc_defconfig
arm        multi_v5_defconfig
m68k       alldefconfig
arm        jornada720_defconfig
m68k       hp300_defconfig
arm        omap2plus_defconfig
ia64       defconfig
powerpc    ep8248e_defconfig
arm        ezx_defconfig
arm        multi_v4t_defconfig
powerpc    motionpro_defconfig
mips       gcw0_defconfig
arm        mvebu_v5_defconfig
powerpc    fsp2_defconfig
sh         se7721_defconfig
arm        randconfig-c002-20211209
ia64       allmodconfig
ia64       allyesconfig
m68k       allmodconfig
m68k       defconfig
m68k       allyesconfig
nios2      defconfig
arc        allyesconfig
nds32      defconfig
nios2      allyesconfig
csky       defconfig
alpha      defconfig
xtensa     allyesconfig
h8300      allyesconfig
arc        defconfig
sh         allmodconfig
parisc     defconfig
s390       allyesconfig
s390       allmodconfig
parisc     allyesconfig
s390       defconfig
i386       allyesconfig
sparc      allyesconfig
sparc      defconfig
i386       defconfig
i386       debian-10.3-kselftests
i386       debian-10.3
mips       allyesconfig
mips       allmodconfig
powerpc    allyesconfig
powerpc    allmodconfig
powerpc    allnoconfig
x86_64     randconfig-a006-20211209
x86_64     randconfig-a005-20211209
x86_64     randconfig-a001-20211209
x86_64     randconfig-a002-20211209
x86_64     randconfig-a004-20211209
x86_64     randconfig-a003-20211209
i386
Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout
Christophe Leroy writes:
> Le 09/12/2021 à 12:22, Michael Ellerman a écrit :
>> Nicholas Piggin writes:
>>
>>> Excerpts from Christophe Leroy's message of December 9, 2021 8:22 pm:
>>>> Le 09/12/2021 à 11:15, Nicholas Piggin a écrit :
>>>>> Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
>>>>>> Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and
>>>>>> remove arch/powerpc/mm/mmap.c
>>>>>>
>>>>>> This change provides standard randomisation of mmaps.
>>>>>>
>>>>>> See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386
>>>>>> and X86_32") for all the benefits of mmap randomisation.
>>>>>
>>>>> The justification seems pretty reasonable.
>>>>>
>>>>>> Comparison between powerpc implementation and the generic one:
>>>>>> - mmap_is_legacy() is identical.
>>>>>> - arch_mmap_rnd() does exactly the same allthough it's written
>>>>>>   slightly differently.
>>>>>> - MIN_GAP and MAX_GAP are identical.
>>>>>> - mmap_base() does the same but uses STACK_RND_MASK which provides
>>>>>>   the same values as stack_maxrandom_size().
>>>>>> - arch_pick_mmap_layout() is almost identical. The only difference
>>>>>>   is that it also adds the random factor to mm->mmap_base in legacy mode.
>>>>>>
>>>>>> That last point is what provides the standard randomisation of mmaps.
>>>>>
>>>>> Thanks for describing it. Could you add random_factor to mmap_base for
>>>>> the legacy path for powerpc as a 2-line change that adds the legacy
>>>>> randomisation. And then this bigger patch would be closer to a no-op.
>>>>
>>>> You mean you would like to see the following patch before doing the
>>>> convert ?
>>>>
>>>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7dabf1cbde67a346a187881d4f0bd17347e0334a.1533732583.git.christophe.le...@c-s.fr/
>>>
>>> Yes.
>>
>> My comment at the time was:
>>
>>    Basically mmap_is_legacy() tells you if any of these is true:
>>
>>     - process has the ADDR_COMPAT_LAYOUT personality
>>     - global legacy_va_layout sysctl is enabled
>>     - stack is unlimited
>>
>>    And we only want to change the behaviour for the stack. Or at least the
>>    change log of your patch only talks about the stack limit, not the
>>    others.
>>
>>    Possibly we should just enable randomisation for all three of those
>>    cases, but if so we must spell it out in the patch.
>>
>>    It'd also be good to see the output of /proc/x/maps for some processes
>>    before and after, to show what actually changes.
>>
>> From: https://github.com/linuxppc/issues/issues/59#issuecomment-502066947
>>
>> So I think at least the change log on that patch still needs updating to
>> be clear that it's changing behaviour for all mmap_is_legacy() cases,
>> not just the stack unlimited case.
>>
>> There's also a risk changing the mmap legacy behaviour breaks something.
>> But we are at least matching the behaviour of other architectures, and
>> there is also an escape hatch in the form of `setarch -R`.
>
> That was the purpose of adding in the change log a reference to commit
> 8b8addf891de ("x86/mm/32: Enable full randomization on i386 and X86_32")
>
> All this applies to powerpc as well.

Yeah, I'm just a pessimist :)

So although the security benefit is nice, I'm more worried that the
layout change will break some mission critical legacy app somewhere.

So I just like to have that spelled out in the change log, or at least
in the discussion like here.

> But I can copy paste the changelog of that commit into mine if you think
> it is more explicit.

Just referring to it is probably fine.

> I agree that old patch was only refering to stack limit, I had no clue
> of everything else at that time.

No worries.

cheers
[RFC PATCH v2 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size.
From: Zi Yan

alloc_contig_range() now only needs to be aligned to pageblock_order,
drop virtio_mem size requirement that it needs to be the max of
pageblock_order and MAX_ORDER.

Signed-off-by: Zi Yan
---
 drivers/virtio/virtio_mem.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 96e5a8782769..dab4bce417fd 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2436,15 +2436,13 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 			VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
 
 	/*
-	 * We want subblocks to span at least MAX_ORDER_NR_PAGES and
-	 * pageblock_nr_pages pages. This:
+	 * We want subblocks to span at least pageblock_nr_pages pages. This:
 	 * - Simplifies our page onlining code (virtio_mem_online_page_cb)
 	 *   and fake page onlining code (virtio_mem_fake_online).
 	 * - Is required for now for alloc_contig_range() to work reliably -
 	 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 	 */
-	sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-			pageblock_nr_pages) * PAGE_SIZE;
+	sb_size = pageblock_nr_pages * PAGE_SIZE;
 	sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
 	if (sb_size < memory_block_size_bytes() && !force_bbm) {
-- 
2.33.0
[RFC PATCH v2 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned.
From: Zi Yan

CMA only requires pageblock alignment now. Change CMA alignment in
fadump too.

Signed-off-by: Zi Yan
---
 arch/powerpc/include/asm/fadump-internal.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 8d61c8f3fec4..9198f20b6b68 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -20,9 +20,7 @@
 #define memblock_num_regions(memblock_type)	(memblock.memblock_type.cnt)
 
 /* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE <<				\
-				 max_t(unsigned long, MAX_ORDER - 1,	\
-				 pageblock_order))
+#define FADUMP_CMA_ALIGNMENT	(PAGE_SIZE << pageblock_order)
 
 /* FAD commands */
 #define FADUMP_REGISTER		1
-- 
2.33.0
[RFC PATCH v2 5/7] mm: cma: use pageblock_order as the single alignment
From: Zi Yan

Now alloc_contig_range() works at pageblock granularity. Change CMA
allocation, which uses alloc_contig_range(), to use pageblock_order
alignment.

Signed-off-by: Zi Yan
---
 include/linux/mmzone.h  | 5 +----
 kernel/dma/contiguous.c | 2 +-
 mm/cma.c                | 6 ++----
 mm/page_alloc.c         | 6 +++---
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b925431b0123..71830af35745 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
 	 *
 	 * The way to use it is to change migratetype of a range of
 	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function. What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger than
-	 * a single pageblock.
+	 * __free_pageblock_cma() function.
 	 */
 	MIGRATE_CMA,
 #endif
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ac35b14b0786 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -399,7 +399,7 @@ static const struct reserved_mem_ops rmem_cma_ops = {
 static int __init rmem_cma_setup(struct reserved_mem *rmem)
 {
-	phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+	phys_addr_t align = PAGE_SIZE << pageblock_order;
 	phys_addr_t mask = align - 1;
 	unsigned long node = rmem->fdt_node;
 	bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
diff --git a/mm/cma.c b/mm/cma.c
index bc9ca8f3c487..d171158bd418 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -180,8 +180,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 		return -EINVAL;
 
 	/* ensure minimal alignment required by mm core */
-	alignment = PAGE_SIZE <<
-			max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
+	alignment = PAGE_SIZE << pageblock_order;
 
 	/* alignment should be aligned with order_per_bit */
 	if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
@@ -268,8 +267,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 	 * migratetype page by page allocator's buddy algorithm. In the case,
 	 * you couldn't get a contiguous memory, which is not what we want.
 	 */
-	alignment = max(alignment, (phys_addr_t)PAGE_SIZE <<
-			  max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+	alignment = max(alignment, (phys_addr_t)PAGE_SIZE << pageblock_order);
 	if (fixed && base & (alignment - 1)) {
 		ret = -EINVAL;
 		pr_err("Region at %pa must be aligned to %pa bytes\n",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5ffbeb1b7512..3317f2064105 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9127,8 +9127,8 @@ static inline void split_free_page_into_pageblocks(struct page *free_page,
 *			be either of the two.
 * @gfp_mask:	GFP mask to use during compaction
 *
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned. The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
 *
 * The first thing this routine does is attempt to MIGRATE_ISOLATE all
 * pageblocks in the range. Once isolated, the pageblocks should not
@@ -9243,7 +9243,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	ret = 0;
 
 	/*
-	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+	 * Pages from [start, end) are within a pageblock_nr_pages
 	 * aligned blocks that are marked as MIGRATE_ISOLATE. What's
 	 * more, all pages in [start, end) are free in page allocator.
 	 * What we are going to do is to allocate all pages from
-- 
2.33.0
[RFC PATCH v2 4/7] mm: make alloc_contig_range work at pageblock granularity
From: Zi Yan

alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
pageblocks with different migratetypes. It might unnecessarily convert
extra pageblocks at the beginning and at the end of the range.

Change alloc_contig_range() to work at pageblock granularity. It is done
by restoring pageblock types and split >pageblock_order free pages after
isolating at MAX_ORDER-1 granularity and migrating pages away at
pageblock granularity.

The reason for this process is that during isolation, some pages, either
free or in-use, might have >pageblock sizes and isolating part of them
can cause free accounting issues. Restoring the migratetypes of the
pageblocks not in the interesting range later is much easier.

Signed-off-by: Zi Yan
---
 mm/page_alloc.c | 169 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 149 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 107a5f186d3b..5ffbeb1b7512 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8981,8 +8981,8 @@ struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
-	return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
-			     pageblock_nr_pages) - 1);
+	return ALIGN_DOWN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
+				     pageblock_nr_pages));
 }
 
 static unsigned long pfn_max_align_up(unsigned long pfn)
@@ -9071,6 +9071,52 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 	return 0;
 }
 
+static inline int save_migratetypes(unsigned char *migratetypes,
+				unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn = start_pfn;
+	int num = 0;
+
+	while (pfn < end_pfn) {
+		migratetypes[num] = get_pageblock_migratetype(pfn_to_page(pfn));
+		num++;
+		pfn += pageblock_nr_pages;
+	}
+	return num;
+}
+
+static inline int restore_migratetypes(unsigned char *migratetypes,
+				unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn = start_pfn;
+	int num = 0;
+
+	while (pfn < end_pfn) {
+		set_pageblock_migratetype(pfn_to_page(pfn), migratetypes[num]);
+		num++;
+		pfn += pageblock_nr_pages;
+	}
+	return num;
+}
+
+static inline void split_free_page_into_pageblocks(struct page *free_page,
+				int order, struct zone *zone)
+{
+	unsigned long pfn;
+
+	spin_lock(&zone->lock);
+	del_page_from_free_list(free_page, zone, order);
+	for (pfn = page_to_pfn(free_page);
+	     pfn < page_to_pfn(free_page) + (1UL << order);
+	     pfn += pageblock_nr_pages) {
+		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
+
+		__free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
+				mt, FPI_NONE);
+	}
+	spin_unlock(&zone->lock);
+}
+
 /**
 * alloc_contig_range() -- tries to allocate given range of pages
 * @start:	start PFN to allocate
@@ -9096,8 +9142,15 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 		       unsigned migratetype, gfp_t gfp_mask)
 {
 	unsigned long outer_start, outer_end;
+	unsigned long isolate_start = pfn_max_align_down(start);
+	unsigned long isolate_end = pfn_max_align_up(end);
+	unsigned long alloc_start = ALIGN_DOWN(start, pageblock_nr_pages);
+	unsigned long alloc_end = ALIGN(end, pageblock_nr_pages);
+	unsigned long num_pageblock_to_save;
 	unsigned int order;
 	int ret = 0;
+	unsigned char *saved_mt;
+	int num;
 
 	struct compact_control cc = {
 		.nr_migratepages = 0,
@@ -9111,11 +9164,30 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 
+	/*
+	 * TODO: make MIGRATE_ISOLATE a standalone bit to avoid overwriting
+	 * the exiting migratetype. Then, we will not need the save and restore
+	 * process here.
+	 */
+
+	/* Save the migratepages of the pageblocks before start and after end */
+	num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
+				+ (isolate_end - alloc_end) / pageblock_nr_pages;
+	saved_mt =
+		kmalloc_array(num_pageblock_to_save,
+			      sizeof(unsigned char), GFP_KERNEL);
+
+	if (!saved_mt)
+		return -ENOMEM;
+
+	num = save_migratetypes(saved_mt, isolate_start, alloc_start);
+
+	num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
+
 	/*
 	 * What we do here is we mark all pageblocks in range as
 	 * MIGRATE_ISOLATE. Because pageblock and max o
[RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
From: Zi Yan

This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
It prepares for the upcoming removal of the MAX_ORDER-1 alignment
requirement for CMA and alloc_contig_range().

MIGRATE_HIGHATOMIC should not merge with other migratetypes like
MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
Also add MIGRATE_HIGHATOMIC to fallbacks array for completeness.

[1] https://lore.kernel.org/linux-mm/20211130100853.gp3...@techsingularity.net/

Signed-off-by: Zi Yan
---
 include/linux/mmzone.h |  6 ++++++
 mm/page_alloc.c        | 28 ++++++++++++++++++----------
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..b925431b0123 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -83,6 +83,12 @@ static inline bool is_migrate_movable(int mt)
 	return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
 }
 
+/* See fallbacks[MIGRATE_TYPES][3] in page_alloc.c */
+static inline bool migratetype_has_fallback(int mt)
+{
+	return mt < MIGRATE_PCPTYPES;
+}
+
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edfd6c81af82..107a5f186d3b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1041,6 +1041,12 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
 	return page_is_buddy(higher_page, higher_buddy, order + 1);
 }
 
+static inline bool has_non_fallback_pageblock(struct zone *zone)
+{
+	return has_isolate_pageblock(zone) || zone_cma_pages(zone) != 0 ||
+		zone->nr_reserved_highatomic != 0;
+}
+
 /*
 * Freeing function for a buddy system allocator.
 *
@@ -1116,14 +1122,15 @@ static inline void __free_one_page(struct page *page,
 	}
 	if (order < MAX_ORDER - 1) {
 		/* If we are here, it means order is >= pageblock_order.
-		 * We want to prevent merge between freepages on isolate
-		 * pageblock and normal pageblock. Without this, pageblock
-		 * isolation could cause incorrect freepage or CMA accounting.
+		 * We want to prevent merge between freepages on pageblock
+		 * without fallbacks and normal pageblock. Without this,
+		 * pageblock isolation could cause incorrect freepage or CMA
+		 * accounting or HIGHATOMIC accounting.
 		 *
 		 * We don't want to hit this code for the more frequent
 		 * low-order merging.
 		 */
-		if (unlikely(has_isolate_pageblock(zone))) {
+		if (unlikely(has_non_fallback_pageblock(zone))) {
 			int buddy_mt;
 
 			buddy_pfn = __find_buddy_pfn(pfn, order);
@@ -1131,8 +1138,8 @@ static inline void __free_one_page(struct page *page,
 			buddy_mt = get_pageblock_migratetype(buddy);
 
 			if (migratetype != buddy_mt
-					&& (is_migrate_isolate(migratetype) ||
-						is_migrate_isolate(buddy_mt)))
+					&& (!migratetype_has_fallback(migratetype) ||
+						!migratetype_has_fallback(buddy_mt)))
 				goto done_merging;
 		}
 		max_order = order + 1;
@@ -2483,6 +2490,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
 	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
 	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
 	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
+	[MIGRATE_HIGHATOMIC]  = { MIGRATE_TYPES }, /* Never used */
 #ifdef CONFIG_CMA
 	[MIGRATE_CMA]         = { MIGRATE_TYPES }, /* Never used */
 #endif
@@ -2794,8 +2802,8 @@ static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
 
 	/* Yoink! */
 	mt = get_pageblock_migratetype(page);
-	if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
-	    && !is_migrate_cma(mt)) {
+	/* Only reserve normal pageblock */
+	if (migratetype_has_fallback(mt)) {
 		zone->nr_reserved_highatomic += pageblock_nr_pages;
 		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
 		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
@@ -3544,8 +3552,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
 		struct page *endpage = page + (1 << order) - 1;
 		for (; page < endpage; page += pageblock_nr_pages) {
 			int mt = get_pageblock_migratetype(page);
-			if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
-
[RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages.
From: Zi Yan

alloc_migration_target() is used by alloc_contig_range() and non-LRU
movable compound pages can be migrated. Current code does not allocate
the right page size for such pages. Check THP precisely using
is_transparent_hugepage() and add allocation support for non-LRU
compound pages.

Signed-off-by: Zi Yan
---
 mm/migrate.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index d487a399253b..2ce3c771b1de 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1563,7 +1563,7 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 		return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
 	}
 
-	if (PageTransHuge(page)) {
+	if (is_transparent_hugepage(page)) {
 		/*
 		 * clear __GFP_RECLAIM to make the migration callback
 		 * consistent with regular THP allocations.
 		 */
@@ -1572,13 +1572,17 @@ struct page *alloc_migration_target(struct page *page, unsigned long private)
 		gfp_mask |= GFP_TRANSHUGE;
 		order = HPAGE_PMD_ORDER;
 	}
+	if (PageCompound(page)) {
+		gfp_mask |= __GFP_COMP;
+		order = compound_order(page);
+	}
 
 	zidx = zone_idx(page_zone(page));
 	if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
 		gfp_mask |= __GFP_HIGHMEM;
 
 	new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);
 
-	if (new_page && PageTransHuge(new_page))
+	if (new_page && is_transparent_hugepage(page))
 		prep_transhuge_page(new_page);
 
 	return new_page;
-- 
2.33.0
[RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.
From: Zi Yan

Hi all,

This patchset tries to remove the MAX_ORDER - 1 alignment requirement for
CMA and alloc_contig_range(). It prepares for my upcoming changes to make
MAX_ORDER adjustable at boot time[1].

The MAX_ORDER - 1 alignment requirement comes from that
alloc_contig_range() isolates pageblocks to remove free memory from buddy
allocator but isolating only a subset of pageblocks within a page spanning
across multiple pageblocks causes free page accounting issues. Isolated
page might not be put into the right free list, since the code assumes the
migratetype of the first pageblock as the whole free page migratetype.
This is based on the discussion at [2].

To remove the requirement, this patchset:
1. still isolates pageblocks at MAX_ORDER - 1 granularity;
2. but saves the pageblock migratetypes outside the specified range of
   alloc_contig_range() and restores them after all pages within the range
   become free after __alloc_contig_migrate_range();
3. splits free pages spanning multiple pageblocks at the beginning and the
   end of the range and puts the split pages to the right migratetype free
   lists based on the pageblock migratetypes;
4. returns pages not in the range as it did before this patch.

Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
1) extra code is needed to detect pages (free, PageHuge, THP, or
PageCompound) to make sure all pageblocks belonging to a single page are
isolated together and later pageblocks outside the range need to have
their migratetypes restored; or 2) extra logic will need to be added
during page free time to split a free page with multi-migratetype
pageblocks.

Two optimizations might come later:
1. only check unmovable pages within the range instead of MAX_ORDER - 1
   aligned range during isolation to increase successful rate of
   alloc_contig_range().
2. make MIGRATE_ISOLATE a separate bit to avoid saving and restoring
   existing migratetypes before and after isolation respectively.

Feel free to give comments and suggestions. Thanks.

[1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi@sent.com/
[2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b94...@redhat.com/

Zi Yan (7):
  mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  mm: compaction: handle non-lru compound pages properly in
    isolate_migratepages_block().
  mm: migrate: allocate the right size of non hugetlb or THP compound
    pages.
  mm: make alloc_contig_range work at pageblock granularity
  mm: cma: use pageblock_order as the single alignment
  drivers: virtio_mem: use pageblock size as the minimum virtio_mem
    size.
  arch: powerpc: adjust fadump alignment to be pageblock aligned.

 arch/powerpc/include/asm/fadump-internal.h |   4 +-
 drivers/virtio/virtio_mem.c                |   6 +-
 include/linux/mmzone.h                     |  11 +-
 kernel/dma/contiguous.c                    |   2 +-
 mm/cma.c                                   |   6 +-
 mm/compaction.c                            |  10 +-
 mm/migrate.c                               |   8 +-
 mm/page_alloc.c                            | 203 +++++++++++++++++----
 8 files changed, 196 insertions(+), 54 deletions(-)

-- 
2.33.0
[RFC PATCH v2 2/7] mm: compaction: handle non-lru compound pages properly in isolate_migratepages_block().
From: Zi Yan

In isolate_migratepages_block(), a !PageLRU tail page can be encountered
when the page is larger than a pageblock. Use compound head page for the
checks inside and skip the entire compound page when isolation succeeds.

Signed-off-by: Zi Yan
---
 mm/compaction.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b4e94cda3019..ad9053fbbe06 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -979,19 +979,23 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		 * Skip any other type of page
 		 */
 		if (!PageLRU(page)) {
+			struct page *head = compound_head(page);
 			/*
 			 * __PageMovable can return false positive so we need
 			 * to verify it under page_lock.
 			 */
-			if (unlikely(__PageMovable(page)) &&
-					!PageIsolated(page)) {
+			if (unlikely(__PageMovable(head)) &&
+					!PageIsolated(head)) {
 				if (locked) {
 					unlock_page_lruvec_irqrestore(locked, flags);
 					locked = NULL;
 				}
 
-				if (!isolate_movable_page(page, isolate_mode))
+				if (!isolate_movable_page(head, isolate_mode)) {
+					low_pfn += (1 << compound_order(head)) - 1 - (page - head);
+					page = head;
 					goto isolate_success;
+				}
 			}
 
 			goto isolate_fail;
-- 
2.33.0
Re: [PATCH v5 08/12] KVM: Reinstate gfn_to_pfn_cache with invalidation support
On Thu, 2021-12-09 at 23:34 +0100, Paolo Bonzini wrote:
> Compared to the review it's missing this hunk:
>
> @@ -265,7 +265,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
>
>  	gpc->valid = false;
>
> -	old_khva = gpc->khva;
> +	old_khva = (void *)((unsigned long)gpc->khva & ~PAGE_MASK);
>  	old_dirty = gpc->dirty;
>  	old_gpa = gpc->gpa;
>  	old_pfn = gpc->pfn;

Ah right, there were two of those. Will fix; thanks.
Re: [PATCH v5 08/12] KVM: Reinstate gfn_to_pfn_cache with invalidation support
On 12/9/21 21:40, David Woodhouse wrote:
> NP, very useful fixes. Thanks. Incremental patch looks like this. It
> passes the xen_shinfo_test self-test; will test it with real Xen guests
> tomorrow and repost based on your kvm/next tree once it shows up.

Compared to the review it's missing this hunk:

@@ -265,7 +265,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)

 	gpc->valid = false;

-	old_khva = gpc->khva;
+	old_khva = (void *)((unsigned long)gpc->khva & ~PAGE_MASK);
 	old_dirty = gpc->dirty;
 	old_gpa = gpc->gpa;
 	old_pfn = gpc->pfn;

Otherwise looks good, thanks!

Paolo
Re: [PATCH v2 2/6] powerpc/kexec_file: Add KEXEC_SIG support.
On 12/9/21 04:21, Michal Suchánek wrote: Hello, Hi, On Wed, Dec 08, 2021 at 08:51:47PM -0500, Nayna wrote: On 11/25/21 13:02, Michal Suchanek wrote: Copy the code from s390x Signed-off-by: Michal Suchanek --- arch/powerpc/Kconfig| 11 +++ arch/powerpc/kexec/elf_64.c | 36 2 files changed, 47 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index ac0c515552fd..ecc1227a77f1 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -561,6 +561,17 @@ config KEXEC_FILE config ARCH_HAS_KEXEC_PURGATORY def_bool KEXEC_FILE +config KEXEC_SIG + bool "Verify kernel signature during kexec_file_load() syscall" + depends on KEXEC_FILE && MODULE_SIG_FORMAT After manually applying the patch, the build is failing with the following error: build failed with error "arch/powerpc/kexec/elf_64.o: In function `elf64_verify_sig': /root/kernel/linus/linux/arch/powerpc/kexec/elf_64.c:160: undefined reference to `verify_appended_signature'" This patch does not add call to verify_appended_signature. Maybe you applied the following patch as well? Yes, I tried build after applying all the patches. Thanks & Regards, - Nayna
[Bug 215285] New: power9 le: amdgpu: *ERROR* hw_init of IP block failed -22
https://bugzilla.kernel.org/show_bug.cgi?id=215285 Bug ID: 215285 Summary: power9 le: amdgpu: *ERROR* hw_init of IP block failed -22 Product: Platform Specific/Hardware Version: 2.5 Kernel Version: 5.15.6 Hardware: PPC-64 OS: Linux Tree: Mainline Status: NEW Severity: high Priority: P1 Component: PPC-64 Assignee: platform_ppc...@kernel-bugs.osdl.org Reporter: s...@aeam.us Regression: No POWER9 blackbird system from Talos. GPU is 6600XT. This seems to be a flip flop regression: https://gitlab.freedesktop.org/drm/amd/-/issues/1519 [ 54.313046] [drm] Found VCN firmware Version ENC: 1.16 DEC: 2 VEP: 0 Revision: 1 [ 54.313054] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware [ 54.314153] amdgpu 0000:03:00.0: enabling bus mastering [ 54.570938] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed! [ 54.571061] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed [ 54.571175] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP block failed -22 [ 54.571277] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed [ 54.571279] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init [ 54.571282] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device. [ 54.572789] amdgpu: probe of 0000:03:00.0 failed with error -22 -- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.
Re: [PATCH v5 08/12] KVM: Reinstate gfn_to_pfn_cache with invalidation support
On Thu, 2021-12-09 at 19:34 +0100, Paolo Bonzini wrote: > Sorry for the late review... NP, very useful fixes. Thanks. Incremental patch looks like this. It passes the xen_shinfo_test self-test; will test it with real Xen guests tomorrow and repost based on your kvm/next tree once it shows up. diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index d8c6e1d4a647..9e3c662f815f 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -124,6 +124,33 @@ static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, } } +static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva) +{ + unsigned long mmu_seq; + kvm_pfn_t new_pfn; + int retry; + + do { + mmu_seq = kvm->mmu_notifier_seq; + smp_rmb(); + + /* We always request a writeable mapping */ + new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL); + if (is_error_noslot_pfn(new_pfn)) + break; + + KVM_MMU_READ_LOCK(kvm); + retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva); + KVM_MMU_READ_UNLOCK(kvm); + if (!retry) + break; + + cond_resched(); + } while (1); + + return new_pfn; +} + int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len, bool dirty) { @@ -147,7 +174,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, old_gpa = gpc->gpa; old_pfn = gpc->pfn; - old_khva = gpc->khva; + old_khva = (void *)((unsigned long)gpc->khva & ~PAGE_MASK); old_uhva = gpc->uhva; old_valid = gpc->valid; old_dirty = gpc->dirty; @@ -178,8 +205,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, if (!old_valid || old_uhva != gpc->uhva) { unsigned long uhva = gpc->uhva; void *new_khva = NULL; - unsigned long mmu_seq; - int retry; /* Placeholders for "hva is valid but not yet mapped" */ gpc->pfn = KVM_PFN_ERR_FAULT; @@ -188,28 +213,15 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, write_unlock_irq(&gpc->lock); - retry_map: - mmu_seq = kvm->mmu_notifier_seq; - 
smp_rmb(); - - /* We always request a writeable mapping */ - new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL); + new_pfn = hva_to_pfn_retry(kvm, uhva); if (is_error_noslot_pfn(new_pfn)) { ret = -EFAULT; goto map_done; } - KVM_MMU_READ_LOCK(kvm); - retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva); - KVM_MMU_READ_UNLOCK(kvm); - if (retry) { - cond_resched(); - goto retry_map; - } - if (gpc->kernel_map) { if (new_pfn == old_pfn) { - new_khva = (void *)((unsigned long)old_khva - page_offset); + new_khva = old_khva; old_pfn = KVM_PFN_ERR_FAULT; old_khva = NULL; } else if (pfn_valid(new_pfn)) { @@ -219,7 +231,9 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB); #endif } - if (!new_khva) + if (new_khva) + new_khva += page_offset; + else ret = -EFAULT; } @@ -232,7 +246,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, } else { /* At this point, gpc->valid may already have been cleared */ gpc->pfn = new_pfn; + gpc->khva = new_khva; } } else { /* If the HVA→PFN mapping was already valid, don't unmap it. */
[PATCH] PCI/AER: potential dereference of null pointer
The return value of kzalloc() needs to be checked to avoid dereferencing a NULL pointer if the allocation fails. Fixes: db89ccbe52c7 ("PCI/AER: Define aer_stats structure for AER capable devices") Signed-off-by: Jiasheng Jiang --- drivers/pci/pcie/aer.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index ec943cee5ecc..d04303edf468 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -376,6 +376,8 @@ void pci_aer_init(struct pci_dev *dev) return; dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL); + if (!dev->aer_stats) + return; /* * We save/restore PCI_ERR_UNCOR_MASK, PCI_ERR_UNCOR_SEVER, -- 2.25.1
Re: [PATCH v2 2/6] powerpc/kexec_file: Add KEXEC_SIG support.
On 11/25/21 13:02, Michal Suchanek wrote: Copy the code from s390x Signed-off-by: Michal Suchanek --- arch/powerpc/Kconfig| 11 +++ arch/powerpc/kexec/elf_64.c | 36 2 files changed, 47 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index ac0c515552fd..ecc1227a77f1 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -561,6 +561,17 @@ config KEXEC_FILE config ARCH_HAS_KEXEC_PURGATORY def_bool KEXEC_FILE +config KEXEC_SIG + bool "Verify kernel signature during kexec_file_load() syscall" + depends on KEXEC_FILE && MODULE_SIG_FORMAT After manually applying the patch, the build is failing with the following error: build failed with error "arch/powerpc/kexec/elf_64.o: In function `elf64_verify_sig': /root/kernel/linus/linux/arch/powerpc/kexec/elf_64.c:160: undefined reference to `verify_appended_signature'" I see it happened because I didn't have MODULE_SIG enabled. Enabling MODULE_SIG fixes it. I wonder why not to add "depends on MODULE_SIG" rather than on MODULE_SIG_FORMAT. Thanks & Regards, - Nayna
Re: [PATCH 0/4] perf: Add new macros for mem_hops field
Em Mon, Dec 06, 2021 at 02:47:45PM +0530, Kajol Jain escreveu: > Patchset adds new macros for mem_hops field which can be > used to represent remote-node, socket and board level details. > > Currently the code had macro for HOPS_0, which corresponds > to data coming from another core but same node. > Add new macros for HOPS_1 to HOPS_3 to represent > remote-node, socket and board level data. > > For ex: Encodings for mem_hops fields with L2 cache: I checked and this hasn't hit mainstream, is it already merged on a tree where this is slated to be submitted in the next window? If so please let me know which one so that I can merge it on perf/core. - Arnaldo > L2 - local L2 > L2 | REMOTE | HOPS_0- remote core, same node L2 > L2 | REMOTE | HOPS_1- remote node, same socket L2 > L2 | REMOTE | HOPS_2- remote socket, same board L2 > L2 | REMOTE | HOPS_3- remote board L2 > > Patch 1 & 2 adds tool and kernel side changes to add new macros for > mem_hops field > > Patch 3 add data source encodings for power10 and older platforms > to represent data based on newer composite PERF_MEM_LVLNUM* fields > > Patch 4 add data source encodings with proper sub_index used to > represent memory/cache level data for power10 platform. > > Kajol Jain (4): > perf: Add new macros for mem_hops field > tools/perf: Add new macros for mem_hops field > powerpc/perf: Add encodings to represent data based on newer composite > PERF_MEM_LVLNUM* fields > powerpc/perf: Add data source encodings for power10 platform > > arch/powerpc/perf/isa207-common.c | 60 --- > include/uapi/linux/perf_event.h | 5 ++- > tools/include/uapi/linux/perf_event.h | 5 ++- > tools/perf/util/mem-events.c | 29 - > 4 files changed, 71 insertions(+), 28 deletions(-) > > -- > 2.27.0 -- - Arnaldo
Re: [PATCH v5 00/12] KVM: x86/xen: Add in-kernel Xen event channel delivery
On 12/9/21 19:47, David Woodhouse wrote: As in the previous two rounds, the last patch (this time patch 12) is included as illustration of how we *might* use this for fixing the UAF bugs in nesting, but isn't intended to be applied as-is. Patches 1-11 are. Queued 1-7, will be on kvm/next tomorrow though. Thanks. I assume you made the changes you wanted to the makefiles then, and will work on the gfn_to_pfn_cache changes you suggested. Yes, thanks. Paolo
Re: [PATCH v5 00/12] KVM: x86/xen: Add in-kernel Xen event channel delivery
On Thu, 2021-12-09 at 19:34 +0100, Paolo Bonzini wrote: > > As in the previous two rounds, the last patch (this time patch 12) is > > included as illustration of how we *might* use this for fixing the UAF > > bugs in nesting, but isn't intended to be applied as-is. Patches 1-11 are. > > Queued 1-7, will be on kvm/next tomorrow though. Thanks. I assume you made the changes you wanted to the makefiles then, and will work on the gfn_to_pfn_cache changes you suggested.
Re: [PATCH v5 00/12] KVM: x86/xen: Add in-kernel Xen event channel delivery
On 11/21/21 13:54, David Woodhouse wrote: Introduce the basic concept of 2 level event channels for kernel delivery, which is just a simple matter of a few test_and_set_bit calls on a mapped shared info page. This can be used for routing MSI of passthrough devices to PIRQ event channels in a Xen guest, and we can build on it for delivering IPIs and timers directly from the kernel too. v1: Use kvm_map_gfn() although I didn't quite see how it works. v2: Avoid kvm_map_gfn() and implement a safe mapping with invalidation support for myself. v3: Reinvent gfn_to_pfn_cache with sane invalidation semantics, for my use case as well as nesting. v4: Rework dirty handling, as it became apparent that we need an active vCPU context to mark pages dirty so it can't be done from the MMU notifier during the invalidation; it has to happen on unmap. v5: Fix sparse warnings reported by kernel test robot. Fix revalidation when memslots change but the resulting HVA stays the same. We can use the same kernel mapping in that case, if the HVA → PFN translation was valid before. So that probably means we shouldn't unmap the "old_hva". Augment the test case to exercise that one too. Include the fix for the dirty ring vs. Xen shinfo oops reported by butt3rflyh4ck. As in the previous two rounds, the last patch (this time patch 12) is included as illustration of how we *might* use this for fixing the UAF bugs in nesting, but isn't intended to be applied as-is. Patches 1-11 are. Queued 1-7, will be on kvm/next tomorrow though. Paolo
Re: [PATCH v5 08/12] KVM: Reinstate gfn_to_pfn_cache with invalidation support
Sorry for the late review... On 11/21/21 13:54, David Woodhouse wrote: +EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_check); + +static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, + gpa_t gpa, bool dirty) +{ + /* Unmap the old page if it was mapped before, and release it */ + if (!is_error_noslot_pfn(pfn)) { + if (khva) { + if (pfn_valid(pfn)) + kunmap(pfn_to_page(pfn)); +#ifdef CONFIG_HAS_IOMEM + else + memunmap(khva); +#endif + } Considering that the khva is passed directly to memunmap, perhaps it's cleaner to ensure it's page-aligned: diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index 13cae72d39e9..267477bd2972 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -147,7 +147,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, old_gpa = gpc->gpa; old_pfn = gpc->pfn; - old_khva = gpc->khva; + old_khva = (void *)((unsigned long)gpc->khva & ~PAGE_MASK); old_uhva = gpc->uhva; old_valid = gpc->valid; old_dirty = gpc->dirty; @@ -209,7 +209,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, if (gpc->kernel_map) { if (new_pfn == old_pfn) { - new_khva = (void *)((unsigned long)old_khva - page_offset); + new_khva = old_khva; old_pfn = KVM_PFN_ERR_FAULT; old_khva = NULL; } else if (pfn_valid(new_pfn)) { @@ -265,7 +265,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc) gpc->valid = false; - old_khva = gpc->khva; + old_khva = (void *)((unsigned long)gpc->khva & ~PAGE_MASK); old_dirty = gpc->dirty; old_gpa = gpc->gpa; old_pfn = gpc->pfn; + retry_map: + mmu_seq = kvm->mmu_notifier_seq; + smp_rmb(); + + /* We always request a writeable mapping */ + new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL); + if (is_error_noslot_pfn(new_pfn)) { + ret = -EFAULT; + goto map_done; + } + + KVM_MMU_READ_LOCK(kvm); + retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva); + KVM_MMU_READ_UNLOCK(kvm); + if (retry) { + cond_resched(); + goto retry_map; + } + This should 
also be a separate function, like static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva) { kvm_pfn_t new_pfn; unsigned long mmu_seq; int retry; retry_map: mmu_seq = kvm->mmu_notifier_seq; smp_rmb(); /* We always request a writeable mapping */ new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL); if (is_error_noslot_pfn(new_pfn)) return new_pfn; KVM_MMU_READ_LOCK(kvm); retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva); KVM_MMU_READ_UNLOCK(kvm); if (retry) { cond_resched(); goto retry_map; } return new_pfn; } + write_lock_irq(&gpc->lock); + if (ret) { + gpc->valid = false; + gpc->pfn = KVM_PFN_ERR_FAULT; + gpc->khva = NULL; + } else { + /* At this point, gpc->valid may already have been cleared */ + gpc->pfn = new_pfn; + gpc->khva = new_khva + page_offset; + } Should set gpc->khva only if new_khva != NULL (i.e. only if gpc->kernel_map is true). Paolo
Re: [PATCH v5 06/12] KVM: powerpc: Use Makefile.kvm for common files
On 11/21/21 13:54, David Woodhouse wrote: From: David Woodhouse It's all fairly baroque but in the end, I don't think there's any reason for $(KVM)/irqchip.o to have been handled differently, as they all end up in $(kvm-y) in the end anyway, regardless of whether they get there via $(common-objs-y) and the CPU-specific object lists. Signed-off-by: David Woodhouse Acked-by: Michael Ellerman (powerpc) --- arch/powerpc/kvm/Makefile | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile index 583c14ef596e..245f59118413 100644 --- a/arch/powerpc/kvm/Makefile +++ b/arch/powerpc/kvm/Makefile @@ -4,11 +4,8 @@ # ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm -KVM := ../../../virt/kvm -common-objs-y = $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o -common-objs-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o -common-objs-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o +include $(srctree)/virt/kvm/Makefile.kvm common-objs-y += powerpc.o emulate_loadstore.o obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o @@ -125,7 +122,6 @@ kvm-book3s_32-objs := \ kvm-objs-$(CONFIG_KVM_BOOK3S_32) := $(kvm-book3s_32-objs) kvm-objs-$(CONFIG_KVM_MPIC) += mpic.o -kvm-objs-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o kvm-objs := $(kvm-objs-m) $(kvm-objs-y) Same here, kvm-y += $(kvm-objs-m) $(kvm-objs-y) would be slightly preferrable IMO. Paolo
Re: [PATCH v5 03/12] KVM: s390: Use Makefile.kvm for common files
On 11/21/21 13:54, David Woodhouse wrote: -kvm-objs := $(common-objs) kvm-s390.o intercept.o interrupt.o priv.o sigp.o +kvm-objs := kvm-s390.o intercept.o interrupt.o priv.o sigp.o kvm-objs += diag.o gaccess.o guestdbg.o vsie.o pv.o "kvm-y +=" here (for both lines) would be nicer, it's consistent with x86 and avoids the question of what happens if you have both kvm-objs and kvm-y. Paolo
Re: [PATCH] powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE
Le 09/12/2021 à 07:08, Michael Ellerman a écrit : > Christophe Leroy writes: >> Le 07/12/2021 à 11:34, Maxime Bizon a écrit : >>> >>> On Tue, 2021-12-07 at 06:10 +, Christophe Leroy wrote: >>> >>> Hello, >>> >>> With the patch applied and >>> >>> CONFIG_DEBUG_PAGEALLOC=y >>> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y >>> CONFIG_DEBUG_VM=y >>> >>> I get tons of this during boot: >>> >>> [0.00] Dentry cache hash table entries: 262144 (order: 8, 1048576 >>> bytes, linear) >>> [0.00] Inode-cache hash table entries: 131072 (order: 7, 524288 >>> bytes, linear) >>> [0.00] mem auto-init: stack:off, heap alloc:off, heap free:off >>> [0.00] [ cut here ] >>> [0.00] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:194 >>> set_pte_at+0x18/0x160 >>> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #442 >>> [0.00] NIP: 80015ebc LR: 80016728 CTR: 800166e4 >>> [0.00] REGS: 80751dd0 TRAP: 0700 Not tainted (5.15.0+) >>> [0.00] MSR: 00021032 CR: 42228882 XER: 2000 >>> [0.00] >>> [0.00] GPR00: 800b8dc8 80751e80 806c6300 807311d8 807a1000 8e84 >>> 80751ea8 >>> [0.00] GPR08: 007a1591 0001 007a1180 42224882 >>> 3ff9c608 3fffd79c >>> [0.00] GPR16: >>> 800166e4 807a2000 >>> [0.00] GPR24: 807a1fff 807311d8 807311d8 807a2000 80768804 >>> 807a1000 007a1180 >>> [0.00] NIP [80015ebc] set_pte_at+0x18/0x160 >>> [0.00] LR [80016728] set_page_attr+0x44/0xc0 >>> [0.00] Call Trace: >>> [0.00] [80751e80] [80058570] console_unlock+0x340/0x428 (unreliable) >>> [0.00] [80751ea0] [] 0x0 >>> [0.00] [80751ec0] [800b8dc8] __apply_to_page_range+0x144/0x2a8 >>> [0.00] [80751f00] [80016918] __kernel_map_pages+0x54/0x64 >>> [0.00] [80751f10] [800cfeb0] __free_pages_ok+0x1b0/0x440 >>> [0.00] [80751f50] [805cfc8c] memblock_free_all+0x1d8/0x274 >>> [0.00] [80751f90] [805c5e0c] mem_init+0x3c/0xd0 >>> [0.00] [80751fb0] [805c0bdc] start_kernel+0x404/0x5c4 >>> [0.00] [80751ff0] [33f0] 0x33f0 >>> [0.00] Instruction dump: >>> [0.00] 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 >>> 9421ffe0 93e1001c >>> 
[0.00] 83e6 8125 71290001 41820014 <0fe0> 7c0802a6 >>> 93c10018 90010024 >>> >>> >> >> That's unrelated to this patch. >> >> The problem is linked to patch c988cfd38e48 ("powerpc/32: use >> set_memory_attr()"), which changed from using __set_pte_at() to using >> set_memory_attr() which uses set_pte_at(). >> >> set_pte_at() has additional checks and shall not be used to updating an >> existing PTE. >> >> Wondering if I should just use __set_pte_at() instead like in the past, >> or do like commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against >> concurrent accesses") and use pte_update() >> >> Michael, Aneesh, any suggestion ? > > The motivation for using pte_update() in that commit is that it does the > update atomically and also handles flushing the HPTE for 64-bit Hash. > > But the books/32 version of pte_update() doesn't do that. In fact > there's some HPTE handling in __set_pte_at(), but then also a comment > saying it's handling in a subsequent flush_tlb_xxx(). > > So that doesn't really help make a decision :) > > On the other hand, could you convert those set_memory_attr() calls to > change_memory_attr() and then eventually drop the former? Sure, that's probably the best. Initially I had to implement that set_memory_attr() variant because change_memory_attr() was doing a pte_clear() that was "sawing off the branch we're sitting on". In extenso mark_rodata_ro() couldn't use change_memory_attr() to change the text section to read-only because mark_rodata_ro() is itself in the text section. But now that change_memory_attr() is using pte_update() instead of going via a pte_clear(), it's possible to use it, so that's what I'll do. Thanks for the idea. Christophe
Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware
On 09/12/2021, 14:53:31, Nathan Lynch wrote: > Laurent Dufour writes: >> On 08/12/2021, 16:21:29, Nathan Lynch wrote: >>> Laurent Dufour writes: On 07/12/2021, 18:07:50, Nathan Lynch wrote: > Laurent Dufour writes: >> On 07/12/2021, 15:32:39, Nathan Lynch wrote: >>> Is there a reasonable fallback for VMs where this parameter doesn't >>> exist? PowerVM partitions should always have it, but what do we want the >>> behavior to be on other hypervisors? >> >> In that case, there is no value displayed in the /proc/powerpc/lparcfg >> and >> the lparstat -i command will fall back to the device tree value. I can't >> see any valid reason to report the value defined in the device tree >> here. > > Here's a valid reason :-) > > lparstat isn't the only possible consumer of the interface, and the > 'ibm,partition-name' property and the dynamic system parameter clearly > serve a common purpose. 'ibm,partition-name' is provided by qemu. If the hypervisor is not providing this value, this is not the goal of this interface to fetch it from the device tree. Any consumer should be able to fall back on the device tree value, and there is no added value to do such a trick in the kernel when it can be done in the user space. >>> >>> There is value in imposing a level of abstraction so that the semantics >>> are: >>> >>> * Report the name assigned to the guest by the hosting environment, if >>> available >>> >>> as opposed to >>> >>> * Return the string returned by a RTAS call to ibm,get-system-parameter >>> with token 55, if implemented >>> >>> The benefit is that consumers of lparcfg do not have to be coded with >>> the knowledge that "if a partition_name= line is absent, the >>> ibm,get-system-parameter RTAS call must have failed, so now I should >>> read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort >>> of esoterica that is appropriate for the kernel to encapsulate. 
>>> And I'd say the effort involved (falling back to a root node property >>> lookup) is proportional to the benefit. >>> >> >> I don't agree. >> From the kernel point of view, I can't see any benefit, this is adding more >> complexity to do in the kernel what can be done easily in user space. > > Applying this logic, I don't see how adding this to lparcfg would be > justified at all, because user space can already get at the parameter > using the privileged rtas syscall. Publish it to unprivileged programs > over D-Bus or something. That would minimize complexity for the kernel. As you know, there is a need for an unprivileged user to read that value. Adding this to lparcfg solves this issue. Also, this makes it easy to read from user space, without the need to bring plumbing like D-Bus into the picture. This is simple, it works fine, and user space can easily choose to read the DT value if needed. Please keep in mind that most of the time the hypervisor is providing that value.
Re: [PATCH] powerpc/rtas: Introduce rtas_get_sensor_nonblocking() for pci hotplug driver.
Mahesh J Salgaonkar writes: > On 2021-11-29 22:53:41 Mon, Nathan Lynch wrote: >> Mahesh Salgaonkar writes: >> > >> > However on certain PHB failures, the rtas call get-sensor-state() returns >> > extended busy error (9902) until PHB is recovered by phyp. Once PHB is >> > recovered, the get-sensor-state() returns success with correct presence >> > status. The rtas call interface rtas_get_sensor() loops over the rtas call >> > on extended delay return code (9902) until the return value is either >> > success (0) or error (-1). This causes the EEH handler to get stuck for ~6 >> > seconds before it could notify that the pci error has been detected and >> > stop any active operations. >> >> I am curious whether you see any difference with "powerpc/rtas: >> rtas_busy_delay() improvements" which was recently applied. It will >> cause the calling task to sleep in response to a 990x status instead >> of immediately retrying: >> >> https://git.kernel.org/powerpc/c/38f7b7067dae0c101be573106018e8af22a90fdf >> >> If that commit helps then maybe this change isn't needed. > > I tried with above commit but it didn't help. OK, not too surprising, but thank you for checking.
Re: [PATCH v3] PCI hotplug: rpaphp: Error out on busy status from get-sensor-state
Mahesh Salgaonkar writes: > To avoid this issue, fix the pci hotplug driver (rpaphp) to return an error > if the slot presence state can not be detected immediately. Current > implementation uses rtas_get_sensor() API which blocks the slot check state > until rtas call returns success. Change rpaphp_get_sensor_state() to invoke > rtas_call(get-sensor-state) directly and take actions based on rtas return > status. This patch now errors out immediately on busy return status from > rtas_call. > > Please note that, only on certain PHB failures, the slot presence check > returns BUSY condition. In normal cases it returns immediately with a > correct presence state value. Hence this change has no impact on normal pci > dlpar operations. I was wondering about this. This seems to be saying -2/990x cannot happen in other cases. I couldn't find this specified in the architecture. It seems a bit risky to me to *always* error out on -2/990x - won't we have intermittent slot enable failures? > +/* > + * RTAS call get-sensor-state(DR_ENTITY_SENSE) return values as per PAPR: > + *-1: Hardware Error > + *-2: RTAS_BUSY > + *-3: Invalid sensor. RTAS Parameter Error. 
> + * -9000: Need DR entity to be powered up and unisolated before RTAS call > + * -9001: Need DR entity to be powered up, but not unisolated, before RTAS > call > + * -9002: DR entity unusable > + * 990x: Extended delay - where x is a number in the range of 0-5 > + */ > +#define RTAS_HARDWARE_ERROR -1 > +#define RTAS_INVALID_SENSOR -3 > +#define SLOT_UNISOLATED -9000 > +#define SLOT_NOT_UNISOLATED -9001 > +#define SLOT_NOT_USABLE -9002 > + > +static int rtas_to_errno(int rtas_rc) > +{ > + int rc; > + > + switch (rtas_rc) { > + case RTAS_HARDWARE_ERROR: > + rc = -EIO; > + break; > + case RTAS_INVALID_SENSOR: > + rc = -EINVAL; > + break; > + case SLOT_UNISOLATED: > + case SLOT_NOT_UNISOLATED: > + rc = -EFAULT; > + break; > + case SLOT_NOT_USABLE: > + rc = -ENODEV; > + break; > + case RTAS_BUSY: > + case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX: > + rc = -EBUSY; > + break; > + default: > + err("%s: unexpected RTAS error %d\n", __func__, rtas_rc); > + rc = -ERANGE; > + break; > + } > + return rc; > +} These conversions look OK to me.
Re: [PATCH v2 0/6] KEXEC_SIG with appended signature
Hello, On Wed, Dec 08, 2021 at 08:50:54PM -0500, Nayna wrote: > > On 11/25/21 13:02, Michal Suchanek wrote: > > Hello, > > Hi Michael, > > > > > This is resend of the KEXEC_SIG patchset. > > > > The first patch is new because it'a a cleanup that does not require any > > change to the module verification code. > > > > The second patch is the only one that is intended to change any > > functionality. > > > > The rest only deduplicates code but I did not receive any review on that > > part so I don't know if it's desirable as implemented. > > > > The first two patches can be applied separately without the rest. > > Patch 2 fails to apply on v5.16-rc4. Can you please also include git > tree/branch while posting the patches ? Sorry, I did not have a clean base and the Kconfig had another change. Here is a tree with the changes applied: https://github.com/hramrach/kernel/tree/kexec_sig > > Secondly, I see that you add the powerpc support in Patch 2 and then modify > it again in Patch 5 after cleanup. Why not add the support for powerpc after > the clean up ? This will reduce some rework and also probably simplify > patches. That's because I don't know if the later patches will be accepted. By queueing this patch first it can be applied standalone to ppc tree without regard for the other patches. It's a copy of the s390 code so it needs the same rework - not really adding complexity. Thanks Michal
Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware
Laurent Dufour writes: > On 08/12/2021, 16:21:29, Nathan Lynch wrote: >> Laurent Dufour writes: >>> On 07/12/2021, 18:07:50, Nathan Lynch wrote: Laurent Dufour writes: > On 07/12/2021, 15:32:39, Nathan Lynch wrote: >> Is there a reasonable fallback for VMs where this parameter doesn't >> exist? PowerVM partitions should always have it, but what do we want the >> behavior to be on other hypervisors? > > In that case, there is no value displayed in the /proc/powerpc/lparcfg and > the lparstat -i command will fall back to the device tree value. I can't > see any valid reason to report the value defined in the device tree > here. Here's a valid reason :-) lparstat isn't the only possible consumer of the interface, and the 'ibm,partition-name' property and the dynamic system parameter clearly serve a common purpose. 'ibm,partition-name' is provided by qemu. >>> >>> If the hypervisor is not providing this value, this is not the goal of this >>> interface to fetch it from the device tree. >>> >>> Any consumer should be able to fall back on the device tree value, and >>> there is no added value to do such a trick in the kernel when it can be >>> done in the user space. >> >> There is value in imposing a level of abstraction so that the semantics >> are: >> >> * Report the name assigned to the guest by the hosting environment, if >> available >> >> as opposed to >> >> * Return the string returned by a RTAS call to ibm,get-system-parameter >> with token 55, if implemented >> >> The benefit is that consumers of lparcfg do not have to be coded with >> the knowledge that "if a partition_name= line is absent, the >> ibm,get-system-parameter RTAS call must have failed, so now I should >> read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort >> of esoterica that is appropriate for the kernel to encapsulate. >> >> And I'd say the effort involved (falling back to a root node property >> lookup) is proportional to the benefit. >> > > I don't agree. 
> From the kernel point of view, I can't see any benefit, this is adding more > complexity to do in the kernel what can be done easily in user space. Applying this logic, I don't see how adding this to lparcfg would be justified at all, because user space can already get at the parameter using the privileged rtas syscall. Publish it to unprivileged programs over D-Bus or something. That would minimize complexity for the kernel.
[PATCH] selftests/powerpc: Add a test of sigreturning to the kernel
We have a general signal fuzzer, sigfuz, which can modify the MSR & NIP before sigreturn. But the chance of it hitting a kernel address and also clearing MSR_PR is fairly slim. So add a specific test of sigreturn to a kernel address, both with and without attempting to clear MSR_PR (which the kernel must block). Signed-off-by: Michael Ellerman --- .../testing/selftests/powerpc/signal/Makefile | 1 + .../powerpc/signal/sigreturn_kernel.c | 132 ++ 2 files changed, 133 insertions(+) create mode 100644 tools/testing/selftests/powerpc/signal/sigreturn_kernel.c diff --git a/tools/testing/selftests/powerpc/signal/Makefile b/tools/testing/selftests/powerpc/signal/Makefile index d6ae54663aed..84e201572466 100644 --- a/tools/testing/selftests/powerpc/signal/Makefile +++ b/tools/testing/selftests/powerpc/signal/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 TEST_GEN_PROGS := signal signal_tm sigfuz sigreturn_vdso sig_sc_double_restart +TEST_GEN_PROGS += sigreturn_kernel CFLAGS += -maltivec $(OUTPUT)/signal_tm: CFLAGS += -mhtm diff --git a/tools/testing/selftests/powerpc/signal/sigreturn_kernel.c b/tools/testing/selftests/powerpc/signal/sigreturn_kernel.c new file mode 100644 index ..0a1b6e591eee --- /dev/null +++ b/tools/testing/selftests/powerpc/signal/sigreturn_kernel.c @@ -0,0 +1,132 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Test that we can't sigreturn to kernel addresses, or to kernel mode. 
+ */ + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include + +#include "utils.h" + +#define MSR_PR (1ul << 14) + +static volatile unsigned long long sigreturn_addr; +static volatile unsigned long long sigreturn_msr_mask; + +static void sigusr1_handler(int signo, siginfo_t *si, void *uc_ptr) +{ + ucontext_t *uc = (ucontext_t *)uc_ptr; + + if (sigreturn_addr) + UCONTEXT_NIA(uc) = sigreturn_addr; + + if (sigreturn_msr_mask) + UCONTEXT_MSR(uc) &= sigreturn_msr_mask; +} + +static pid_t fork_child(void) +{ + pid_t pid; + + pid = fork(); + if (pid == 0) { + raise(SIGUSR1); + exit(0); + } + + return pid; +} + +static int expect_segv(pid_t pid) +{ + int child_ret; + + waitpid(pid, &child_ret, 0); + FAIL_IF(WIFEXITED(child_ret)); + FAIL_IF(!WIFSIGNALED(child_ret)); + FAIL_IF(WTERMSIG(child_ret) != 11); + + return 0; +} + +int test_sigreturn_kernel(void) +{ + struct sigaction act; + int child_ret, i; + pid_t pid; + + act.sa_sigaction = sigusr1_handler; + act.sa_flags = SA_SIGINFO; + sigemptyset(&act.sa_mask); + + FAIL_IF(sigaction(SIGUSR1, &act, NULL)); + + for (i = 0; i < 2; i++) { + // Return to kernel + sigreturn_addr = 0xcull << 60; + pid = fork_child(); + expect_segv(pid); + + // Return to kernel virtual + sigreturn_addr = 0xc008ull << 48; + pid = fork_child(); + expect_segv(pid); + + // Return out of range + sigreturn_addr = 0xc010ull << 48; + pid = fork_child(); + expect_segv(pid); + + // Return to no-man's land, just below PAGE_OFFSET + sigreturn_addr = (0xcull << 60) - (64 * 1024); + pid = fork_child(); + expect_segv(pid); + + // Return to no-man's land, above TASK_SIZE_4PB + sigreturn_addr = 0x1ull << 52; + pid = fork_child(); + expect_segv(pid); + + // Return to 0xd space + sigreturn_addr = 0xdull << 60; + pid = fork_child(); + expect_segv(pid); + + // Return to 0xe space + sigreturn_addr = 0xeull << 60; + pid = fork_child(); + expect_segv(pid); + + // Return to 0xf space + sigreturn_addr = 0xfull << 60; + pid = fork_child(); + 
expect_segv(pid); + + // Attempt to set PR=0 for 2nd loop (should be blocked by kernel) + sigreturn_msr_mask = ~MSR_PR; + } + + printf("All children killed as expected\n"); + + // Don't change address, just MSR, should return to user as normal + sigreturn_addr = 0; + sigreturn_msr_mask = ~MSR_PR; + pid = fork_child(); + waitpid(pid, &child_ret, 0); + FAIL_IF(!WIFEXITED(child_ret)); + FAIL_IF(WIFSIGNALED(child_ret)); + FAIL_IF(WEXITSTATUS(child_ret) != 0); + + return 0; +} + +int main(void) +{ + return test_harness(test_sigreturn_kernel, "sigreturn_kernel"); +} -- 2.31.1
[Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40
https://bugzilla.kernel.org/show_bug.cgi?id=214913

Michael Ellerman (mich...@ellerman.id.au) changed:

           What    |Removed  |Added
           Status  |ASSIGNED |NEEDINFO

--- Comment #5 from Michael Ellerman (mich...@ellerman.id.au) ---
Sorry, I don't have any idea which commit could have fixed this. The process that crashed was "fsstress"; do you know if it uses io_uring?

-- You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.

Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout
Le 09/12/2021 à 12:22, Michael Ellerman a écrit : > Nicholas Piggin writes: > >> Excerpts from Christophe Leroy's message of December 9, 2021 8:22 pm: >>> >>> >>> Le 09/12/2021 à 11:15, Nicholas Piggin a écrit : Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am: > Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and > remove arch/powerpc/mm/mmap.c > > This change provides standard randomisation of mmaps. > > See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386 > and X86_32") for all the benefits of mmap randomisation. The justification seems pretty reasonable. > > Comparison between powerpc implementation and the generic one: > - mmap_is_legacy() is identical. > - arch_mmap_rnd() does exactly the same allthough it's written > slightly differently. > - MIN_GAP and MAX_GAP are identical. > - mmap_base() does the same but uses STACK_RND_MASK which provides > the same values as stack_maxrandom_size(). > - arch_pick_mmap_layout() is almost identical. The only difference > is that it also adds the random factor to mm->mmap_base in legacy mode. > > That last point is what provides the standard randomisation of mmaps. Thanks for describing it. Could you add random_factor to mmap_base for the legacy path for powerpc as a 2-line change that adds the legacy randomisation. And then this bigger patch would be closer to a no-op. >>> >>> You mean you would like to see the following patch before doing the >>> convert ? >>> >>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7dabf1cbde67a346a187881d4f0bd17347e0334a.1533732583.git.christophe.le...@c-s.fr/ >> >> Yes. > > My comment at the time was: > >Basically mmap_is_legacy() tells you if any of these is true: > > - process has the ADDR_COMPAT_LAYOUT personality > - global legacy_va_layout sysctl is enabled > - stack is unlimited > >And we only want to change the behaviour for the stack. Or at least the >change log of your patch only talks about the stack limit, not the >others. 
> >Possibly we should just enable randomisation for all three of those >cases, but if so we must spell it out in the patch. > >It'd also be good to see the output of /proc/x/maps for some processes >before and after, to show what actually changes. > > > From: https://github.com/linuxppc/issues/issues/59#issuecomment-502066947 > > > So I think at least the change log on that patch still needs updating to > be clear that it's changing behaviour for all mmap_is_legacy() cases, > not just the stack unlimited case. > > There's also a risk changing the mmap legacy behaviour breaks something. > But we are at least matching the behaviour of other architectures, and > there is also an escape hatch in the form of `setarch -R`. > That was the purpose of adding a reference to commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386 and X86_32") in the change log. All this applies to powerpc as well. But I can copy-paste the changelog of that commit into mine if you think it is more explicit. I agree that the old patch was only referring to the stack limit; I had no clue about everything else at that time. Christophe
Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout
Nicholas Piggin writes: > Excerpts from Christophe Leroy's message of December 9, 2021 8:22 pm: >> >> >> Le 09/12/2021 à 11:15, Nicholas Piggin a écrit : >>> Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am: Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and remove arch/powerpc/mm/mmap.c This change provides standard randomisation of mmaps. See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386 and X86_32") for all the benefits of mmap randomisation. >>> >>> The justification seems pretty reasonable. >>> Comparison between powerpc implementation and the generic one: - mmap_is_legacy() is identical. - arch_mmap_rnd() does exactly the same allthough it's written slightly differently. - MIN_GAP and MAX_GAP are identical. - mmap_base() does the same but uses STACK_RND_MASK which provides the same values as stack_maxrandom_size(). - arch_pick_mmap_layout() is almost identical. The only difference is that it also adds the random factor to mm->mmap_base in legacy mode. That last point is what provides the standard randomisation of mmaps. >>> >>> Thanks for describing it. Could you add random_factor to mmap_base for >>> the legacy path for powerpc as a 2-line change that adds the legacy >>> randomisation. And then this bigger patch would be closer to a no-op. >>> >> >> You mean you would like to see the following patch before doing the >> convert ? >> >> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7dabf1cbde67a346a187881d4f0bd17347e0334a.1533732583.git.christophe.le...@c-s.fr/ > > Yes. My comment at the time was: Basically mmap_is_legacy() tells you if any of these is true: - process has the ADDR_COMPAT_LAYOUT personality - global legacy_va_layout sysctl is enabled - stack is unlimited And we only want to change the behaviour for the stack. Or at least the change log of your patch only talks about the stack limit, not the others. 
Possibly we should just enable randomisation for all three of those cases, but if so we must spell it out in the patch. It'd also be good to see the output of /proc/x/maps for some processes before and after, to show what actually changes. From: https://github.com/linuxppc/issues/issues/59#issuecomment-502066947 So I think at least the change log on that patch still needs updating to be clear that it's changing behaviour for all mmap_is_legacy() cases, not just the stack unlimited case. There's also a risk changing the mmap legacy behaviour breaks something. But we are at least matching the behaviour of other architectures, and there is also an escape hatch in the form of `setarch -R`. cheers
[Bug 215217] Kernel fails to boot at an early stage when built with GCC_PLUGIN_LATENT_ENTROPY=y (PowerMac G4 3,6)
https://bugzilla.kernel.org/show_bug.cgi?id=215217

--- Comment #8 from Christophe Leroy (christophe.le...@csgroup.eu) ---
early_32.o should likely also have DISABLE_LATENT_ENTROPY_PLUGIN, maybe even more important than for setup_32.o.
[Bug 215217] Kernel fails to boot at an early stage when built with GCC_PLUGIN_LATENT_ENTROPY=y (PowerMac G4 3,6)
https://bugzilla.kernel.org/show_bug.cgi?id=215217

--- Comment #7 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 299967 --> https://bugzilla.kernel.org/attachment.cgi?id=299967&action=edit kernel .config (5.16-rc4 + CFLAGS_setup_32.o +=... , PowerMac G4 DP)
[Bug 215217] Kernel fails to boot at an early stage when built with GCC_PLUGIN_LATENT_ENTROPY=y (PowerMac G4 3,6)
https://bugzilla.kernel.org/show_bug.cgi?id=215217

--- Comment #6 from Erhard F. (erhar...@mailbox.org) ---
Ok, I checked that out. There are already some in the Makefile:

CFLAGS_cputable.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_btext.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_prom.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)

I added CFLAGS_setup_32.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) on top of that and rebuilt the kernel after a make clean. No success so far; the kernel still does not boot.
Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout
Excerpts from Christophe Leroy's message of December 9, 2021 8:22 pm: > > > Le 09/12/2021 à 11:15, Nicholas Piggin a écrit : >> Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am: >>> Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and >>> remove arch/powerpc/mm/mmap.c >>> >>> This change provides standard randomisation of mmaps. >>> >>> See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386 >>> and X86_32") for all the benefits of mmap randomisation. >> >> The justification seems pretty reasonable. >> >>> >>> Comparison between powerpc implementation and the generic one: >>> - mmap_is_legacy() is identical. >>> - arch_mmap_rnd() does exactly the same allthough it's written >>> slightly differently. >>> - MIN_GAP and MAX_GAP are identical. >>> - mmap_base() does the same but uses STACK_RND_MASK which provides >>> the same values as stack_maxrandom_size(). >>> - arch_pick_mmap_layout() is almost identical. The only difference >>> is that it also adds the random factor to mm->mmap_base in legacy mode. >>> >>> That last point is what provides the standard randomisation of mmaps. >> >> Thanks for describing it. Could you add random_factor to mmap_base for >> the legacy path for powerpc as a 2-line change that adds the legacy >> randomisation. And then this bigger patch would be closer to a no-op. >> > > You mean you would like to see the following patch before doing the > convert ? > > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7dabf1cbde67a346a187881d4f0bd17347e0334a.1533732583.git.christophe.le...@c-s.fr/ Yes. Thanks, Nick
Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout
Le 09/12/2021 à 11:15, Nicholas Piggin a écrit : > Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am: >> Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and >> remove arch/powerpc/mm/mmap.c >> >> This change provides standard randomisation of mmaps. >> >> See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386 >> and X86_32") for all the benefits of mmap randomisation. > > The justification seems pretty reasonable. > >> >> Comparison between powerpc implementation and the generic one: >> - mmap_is_legacy() is identical. >> - arch_mmap_rnd() does exactly the same allthough it's written >> slightly differently. >> - MIN_GAP and MAX_GAP are identical. >> - mmap_base() does the same but uses STACK_RND_MASK which provides >> the same values as stack_maxrandom_size(). >> - arch_pick_mmap_layout() is almost identical. The only difference >> is that it also adds the random factor to mm->mmap_base in legacy mode. >> >> That last point is what provides the standard randomisation of mmaps. > > Thanks for describing it. Could you add random_factor to mmap_base for > the legacy path for powerpc as a 2-line change that adds the legacy > randomisation. And then this bigger patch would be closer to a no-op. > You mean you would like to see the following patch before doing the convert ? https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7dabf1cbde67a346a187881d4f0bd17347e0334a.1533732583.git.christophe.le...@c-s.fr/
Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout
Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am: > Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and > remove arch/powerpc/mm/mmap.c > > This change provides standard randomisation of mmaps. > > See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386 > and X86_32") for all the benefits of mmap randomisation. The justification seems pretty reasonable. > > Comparison between powerpc implementation and the generic one: > - mmap_is_legacy() is identical. > - arch_mmap_rnd() does exactly the same although it's written > slightly differently. > - MIN_GAP and MAX_GAP are identical. > - mmap_base() does the same but uses STACK_RND_MASK which provides > the same values as stack_maxrandom_size(). > - arch_pick_mmap_layout() is almost identical. The only difference > is that it also adds the random factor to mm->mmap_base in legacy mode. > > That last point is what provides the standard randomisation of mmaps. Thanks for describing it. Could you add random_factor to mmap_base for the legacy path for powerpc as a 2-line change that adds the legacy randomisation? Then this bigger patch would be closer to a no-op. Thanks, Nick
Re: [PATCH v4 07/10] powerpc/mm: Use generic_hugetlb_get_unmapped_area()
Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am: > Use the generic version of arch_hugetlb_get_unmapped_area() > which is now available at all time. > > Signed-off-by: Christophe Leroy > --- > arch/powerpc/include/asm/book3s/64/hugetlb.h | 4 -- > arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 55 > arch/powerpc/mm/hugetlbpage.c| 4 +- > 3 files changed, 1 insertion(+), 62 deletions(-) > > diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h > b/arch/powerpc/include/asm/book3s/64/hugetlb.h > index 12e150e615b7..b37a28f62cf6 100644 > --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h > +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h > @@ -8,10 +8,6 @@ > */ > void radix__flush_hugetlb_page(struct vm_area_struct *vma, unsigned long > vmaddr); > void radix__local_flush_hugetlb_page(struct vm_area_struct *vma, unsigned > long vmaddr); > -extern unsigned long > -radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr, > - unsigned long len, unsigned long pgoff, > - unsigned long flags); > > extern void radix__huge_ptep_modify_prot_commit(struct vm_area_struct *vma, > unsigned long addr, pte_t *ptep, > diff --git a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c > b/arch/powerpc/mm/book3s64/radix_hugetlbpage.c > index 23d3e08911d3..d2fb776febb4 100644 > --- a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c > +++ b/arch/powerpc/mm/book3s64/radix_hugetlbpage.c > @@ -41,61 +41,6 @@ void radix__flush_hugetlb_tlb_range(struct vm_area_struct > *vma, unsigned long st > radix__flush_tlb_range_psize(vma->vm_mm, start, end, psize); > } > > -/* > - * A vairant of hugetlb_get_unmapped_area doing topdown search > - * FIXME!! should we do as x86 does or non hugetlb area does ? > - * ie, use topdown or not based on mmap_is_legacy check ? 
> - */ > -unsigned long > -radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr, > - unsigned long len, unsigned long pgoff, > - unsigned long flags) > -{ > - struct mm_struct *mm = current->mm; > - struct vm_area_struct *vma; > - struct hstate *h = hstate_file(file); > - int fixed = (flags & MAP_FIXED); > - unsigned long high_limit; > - struct vm_unmapped_area_info info; > - > - high_limit = DEFAULT_MAP_WINDOW; > - if (addr >= high_limit || (fixed && (addr + len > high_limit))) > - high_limit = TASK_SIZE; I wonder if generic hugetlb_get_unmapped_area needs to have the arch_get_mmap_end() added. arm64 has arch_get_mmap_end() and !HAVE_ARCH_HUGETLB_UNMAPPED_AREA so it looks like it has broken large address hint logic for hugetlbfs mappings? x86-64 defines their own and does the same hinting for normal and hugetlbfs mmap. If we had that and defined arch_get_mmap_end(), then this patch should work. Thanks, Nick
RE: [PATCH] sound: fsl: add missing put_device() call in imx_hdmi_probe()
>> From: Wang Qing
>>
>> of_find_device_by_node() takes a reference to the embedded struct device
>> which needs to be dropped on the error path.
>
>...
>
>> data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL);
>> if (!data) {
>> + put_device(&cpu_pdev->dev);
>> ret = -ENOMEM;
>> goto fail;
>
>If it's of_find_device_by_node() you need an of_node_put() since you're
>dropping a reference on the OF node.
>
The fail label drops a reference on the OF node. Also, put_device() is called later on every path except this branch, so we just need to add put_device() here.

Thanks,
Qing
Re: [PATCH v4 06/10] powerpc/mm: Use generic_get_unmapped_area() and call it from arch_get_unmapped_area()
Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am: > Use the generic version of arch_get_unmapped_area() which > is now available at all time instead of its copy > radix__arch_get_unmapped_area() > > Instead of setting mm->get_unmapped_area() to either > arch_get_unmapped_area() or generic_get_unmapped_area(), > always set it to arch_get_unmapped_area() and call > generic_get_unmapped_area() from there when radix is enabled. > > Do the same with radix__arch_get_unmapped_area_topdown() > > Signed-off-by: Christophe Leroy > --- > arch/powerpc/mm/mmap.c | 127 ++--- > 1 file changed, 6 insertions(+), 121 deletions(-) > > diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c > index 9b0d6e395bc0..46781d0103d1 100644 > --- a/arch/powerpc/mm/mmap.c > +++ b/arch/powerpc/mm/mmap.c > @@ -81,115 +81,15 @@ static inline unsigned long mmap_base(unsigned long rnd, > } > > #ifdef HAVE_ARCH_UNMAPPED_AREA > -#ifdef CONFIG_PPC_RADIX_MMU > -/* > - * Same function as generic code used only for radix, because we don't need > to overload > - * the generic one. But we will have to duplicate, because hash select > - * HAVE_ARCH_UNMAPPED_AREA > - */ > -static unsigned long > -radix__arch_get_unmapped_area(struct file *filp, unsigned long addr, > - unsigned long len, unsigned long pgoff, > - unsigned long flags) > -{ > - struct mm_struct *mm = current->mm; > - struct vm_area_struct *vma; > - int fixed = (flags & MAP_FIXED); > - unsigned long high_limit; > - struct vm_unmapped_area_info info; > - > - high_limit = DEFAULT_MAP_WINDOW; > - if (addr >= high_limit || (fixed && (addr + len > high_limit))) > - high_limit = TASK_SIZE; Does 64s radix need to define arch_get_mmap_end() to do the above now? Otherwise great to consolidate this with core code, nice patch. Thanks, Nick
Re: [PATCH v4 02/10] mm, hugetlbfs: Allow an arch to always use generic versions of get_unmapped_area functions
Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am: > Unlike most architectures, powerpc can only define at runtime > if it is going to use the generic arch_get_unmapped_area() or not. > > Today, powerpc has a copy of the generic arch_get_unmapped_area() > because when selecting HAVE_ARCH_UNMAPPED_AREA the generic > arch_get_unmapped_area() is not available. > > Rename it generic_get_unmapped_area() and make it independent of > HAVE_ARCH_UNMAPPED_AREA. > > Do the same for arch_get_unmapped_area_topdown() versus > HAVE_ARCH_UNMAPPED_AREA_TOPDOWN. > > Do the same for hugetlb_get_unmapped_area() versus > HAVE_ARCH_HUGETLB_UNMAPPED_AREA. > Reviewed-by: Nicholas Piggin > Signed-off-by: Christophe Leroy > --- > fs/hugetlbfs/inode.c | 17 + > include/linux/hugetlb.h | 5 + > include/linux/sched/mm.h | 9 + > mm/mmap.c| 31 --- > 4 files changed, 51 insertions(+), 11 deletions(-) >
Re: [PATCH v4 03/10] powerpc/mm: Move vma_mmu_pagesize()
Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am: > vma_mmu_pagesize() is only required for slices, > otherwise there is a generic weak version doing the > exact same thing. > > Move it to slice.c > Reviewed-by: Nicholas Piggin > Signed-off-by: Christophe Leroy > --- > arch/powerpc/mm/hugetlbpage.c | 11 --- > arch/powerpc/mm/slice.c | 9 + > 2 files changed, 9 insertions(+), 11 deletions(-) > > diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c > index ddead41e2194..0eec3b61bd13 100644 > --- a/arch/powerpc/mm/hugetlbpage.c > +++ b/arch/powerpc/mm/hugetlbpage.c > @@ -565,17 +565,6 @@ unsigned long hugetlb_get_unmapped_area(struct file > *file, unsigned long addr, > } > #endif > > -unsigned long vma_mmu_pagesize(struct vm_area_struct *vma) > -{ > - /* With radix we don't use slice, so derive it from vma*/ > - if (IS_ENABLED(CONFIG_PPC_MM_SLICES) && !radix_enabled()) { > - unsigned int psize = get_slice_psize(vma->vm_mm, vma->vm_start); > - > - return 1UL << mmu_psize_to_shift(psize); > - } > - return vma_kernel_pagesize(vma); > -} > - > bool __init arch_hugetlb_valid_size(unsigned long size) > { > int shift = __ffs(size); > diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c > index f42711f865f3..8a3ac062b71e 100644 > --- a/arch/powerpc/mm/slice.c > +++ b/arch/powerpc/mm/slice.c > @@ -759,4 +759,13 @@ int slice_is_hugepage_only_range(struct mm_struct *mm, > unsigned long addr, > > return !slice_check_range_fits(mm, maskp, addr, len); > } > + > +unsigned long vma_mmu_pagesize(struct vm_area_struct *vma) > +{ > + /* With radix we don't use slice, so derive it from vma*/ > + if (radix_enabled()) > + return vma_kernel_pagesize(vma); > + > + return 1UL << mmu_psize_to_shift(get_slice_psize(vma->vm_mm, > vma->vm_start)); > +} > #endif > -- > 2.33.1 > >
Re: [PATCH v6 15/18] powerpc/64s: Always define arch unmapped area calls
Excerpts from Nicholas Piggin's message of December 9, 2021 6:25 pm: > Excerpts from Christophe Leroy's message of December 8, 2021 7:38 pm: >> >> >> Le 01/12/2021 à 15:41, Nicholas Piggin a écrit : >>> To avoid any functional changes to radix paths when building with hash >>> MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the >>> arch get_unmapped_area calls on 64s platforms. >>> >>> Signed-off-by: Nicholas Piggin >>> --- >>> arch/powerpc/include/asm/book3s/64/hash.h | 4 --- >>> arch/powerpc/include/asm/book3s/64/mmu.h | 6 >>> arch/powerpc/mm/hugetlbpage.c | 16 ++--- >>> arch/powerpc/mm/mmap.c| 40 +++ >>> arch/powerpc/mm/slice.c | 20 >>> 5 files changed, 51 insertions(+), 35 deletions(-) >>> >>> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h >>> b/arch/powerpc/include/asm/book3s/64/hash.h >>> index 674fe0e890dc..a7a0572f3846 100644 >>> --- a/arch/powerpc/include/asm/book3s/64/hash.h >>> +++ b/arch/powerpc/include/asm/book3s/64/hash.h >>> @@ -99,10 +99,6 @@ >>>* Defines the address of the vmemap area, in its own region on >>>* hash table CPUs. 
>>>*/ >>> -#ifdef CONFIG_PPC_MM_SLICES >>> -#define HAVE_ARCH_UNMAPPED_AREA >>> -#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN >>> -#endif /* CONFIG_PPC_MM_SLICES */ >>> >>> /* PTEIDX nibble */ >>> #define _PTEIDX_SECONDARY 0x8 >>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h >>> b/arch/powerpc/include/asm/book3s/64/mmu.h >>> index c02f42d1031e..015d7d972d16 100644 >>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h >>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h >>> @@ -4,6 +4,12 @@ >>> >>> #include >>> >>> +#ifdef CONFIG_HUGETLB_PAGE >>> +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA >>> +#endif >>> +#define HAVE_ARCH_UNMAPPED_AREA >>> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN >>> + >>> #ifndef __ASSEMBLY__ >>> /* >>>* Page size definition >>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c >>> index 82d8b368ca6d..ddead41e2194 100644 >>> --- a/arch/powerpc/mm/hugetlbpage.c >>> +++ b/arch/powerpc/mm/hugetlbpage.c >>> @@ -542,20 +542,26 @@ struct page *follow_huge_pd(struct vm_area_struct >>> *vma, >>> return page; >>> } >>> >>> -#ifdef CONFIG_PPC_MM_SLICES >>> +#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA >>> +static inline int file_to_psize(struct file *file) >>> +{ >>> + struct hstate *hstate = hstate_file(file); >>> + return shift_to_mmu_psize(huge_page_shift(hstate)); >>> +} >>> + >>> unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long >>> addr, >>> unsigned long len, unsigned long pgoff, >>> unsigned long flags) >>> { >>> - struct hstate *hstate = hstate_file(file); >>> - int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate)); >>> - >>> #ifdef CONFIG_PPC_RADIX_MMU >>> if (radix_enabled()) >>> return radix__hugetlb_get_unmapped_area(file, addr, len, >>>pgoff, flags); >>> #endif >>> - return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1); >>> +#ifdef CONFIG_PPC_MM_SLICES >>> + return slice_get_unmapped_area(addr, len, flags, file_to_psize(file), >>> 1); >>> +#endif >>> + BUG(); >> >> We shouldn't had new 
instances of BUG(). >> >> BUILD_BUG() should do the trick here. >> >>> } >>> #endif >>> >>> diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c >>> index ae683fdc716c..c475cf810aa8 100644 >>> --- a/arch/powerpc/mm/mmap.c >>> +++ b/arch/powerpc/mm/mmap.c >>> @@ -80,6 +80,7 @@ static inline unsigned long mmap_base(unsigned long rnd, >>> return PAGE_ALIGN(DEFAULT_MAP_WINDOW - gap - rnd); >>> } >>> >>> +#ifdef HAVE_ARCH_UNMAPPED_AREA >>> #ifdef CONFIG_PPC_RADIX_MMU >>> /* >>>* Same function as generic code used only for radix, because we don't >>> need to overload >>> @@ -181,11 +182,42 @@ radix__arch_get_unmapped_area_topdown(struct file >>> *filp, >>> */ >>> return radix__arch_get_unmapped_area(filp, addr0, len, pgoff, flags); >>> } >>> +#endif >>> + >>> +unsigned long arch_get_unmapped_area(struct file *filp, >>> +unsigned long addr, >>> +unsigned long len, >>> +unsigned long pgoff, >>> +unsigned long flags) >>> +{ >>> +#ifdef CONFIG_PPC_MM_SLICES >>> + return slice_get_unmapped_area(addr, len, flags, >>> + >>> mm_ctx_user_psize(¤t->mm->context), 0); >>> +#else >>> + BUG(); >> >> Same. >> >> And the #else isn't needed >> >>> +#endif >>> +} >>> + >>> +unsigned long arch_get_unmapped_area_topdown(struct file *filp, >>> +const unsigned long addr0, >>> +const unsigned long len, >
Re: [PATCH v2 2/6] powerpc/kexec_file: Add KEXEC_SIG support.
Hello, On Wed, Dec 08, 2021 at 08:51:47PM -0500, Nayna wrote: > > On 11/25/21 13:02, Michal Suchanek wrote: > > Copy the code from s390x > > > > Signed-off-by: Michal Suchanek > > --- > > arch/powerpc/Kconfig| 11 +++ > > arch/powerpc/kexec/elf_64.c | 36 > > 2 files changed, 47 insertions(+) > > > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > > index ac0c515552fd..ecc1227a77f1 100644 > > --- a/arch/powerpc/Kconfig > > +++ b/arch/powerpc/Kconfig > > @@ -561,6 +561,17 @@ config KEXEC_FILE > > config ARCH_HAS_KEXEC_PURGATORY > > def_bool KEXEC_FILE > > > > +config KEXEC_SIG > > + bool "Verify kernel signature during kexec_file_load() syscall" > > + depends on KEXEC_FILE && MODULE_SIG_FORMAT > > After manually applying the patch, the build is failing with the following > error: > > build failed with error "arch/powerpc/kexec/elf_64.o: In function > `elf64_verify_sig': > /root/kernel/linus/linux/arch/powerpc/kexec/elf_64.c:160: undefined > reference to `verify_appended_signature'" This patch does not add call to verify_appended_signature. Maybe you applied the following patch as well? Thanks Michal
Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware
On 08/12/2021, 16:21:29, Nathan Lynch wrote: > Laurent Dufour writes: >> On 07/12/2021, 18:07:50, Nathan Lynch wrote: >>> Laurent Dufour writes: On 07/12/2021, 15:32:39, Nathan Lynch wrote: > Is there a reasonable fallback for VMs where this parameter doesn't > exist? PowerVM partitions should always have it, but what do we want the > behavior to be on other hypervisors? In that case, there is no value displayed in the /proc/powerpc/lparcfg and the lparstat -i command will fall back to the device tree value. I can't see any valid reason to report the value defined in the device tree here. >>> >>> Here's a valid reason :-) >>> >>> lparstat isn't the only possible consumer of the interface, and the >>> 'ibm,partition-name' property and the dynamic system parameter clearly >>> serve a common purpose. 'ibm,partition-name' is provided by qemu. >> >> If the hypervisor is not providing this value, this is not the goal of this >> interface to fetch it from the device tree. >> >> Any consumer should be able to fall back on the device tree value, and >> there is no added value to do such a trick in the kernel when it can be >> done in the user space. > > There is value in imposing a level of abstraction so that the semantics > are: > > * Report the name assigned to the guest by the hosting environment, if > available > > as opposed to > > * Return the string returned by a RTAS call to ibm,get-system-parameter > with token 55, if implemented > > The benefit is that consumers of lparcfg do not have to be coded with > the knowledge that "if a partition_name= line is absent, the > ibm,get-system-parameter RTAS call must have failed, so now I should > read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort > of esoterica that is appropriate for the kernel to encapsulate. > > And I'd say the effort involved (falling back to a root node property > lookup) is proportional to the benefit. > I don't agree. 
From the kernel point of view, I can't see any benefit; this adds more complexity to the kernel for something that can be done easily in user space. This is typically what should be implemented in a user-space shared library.
Re: [PATCH v6 17/18] powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU
Excerpts from Michael Ellerman's message of December 7, 2021 11:00 pm: > Nicholas Piggin writes: >> 34 files changed, 173 insertions(+), 52 deletions(-) > > > I was able to clean up some of the ifdefs a little with the changes > below. I'll run these through some test builds and then squash them in. Looks good to me. Thanks, Nick > > cheers > > > diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h > b/arch/powerpc/include/asm/book3s/64/mmu-hash.h > index 3004f3323144..21f780942911 100644 > --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h > +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h > @@ -523,8 +523,14 @@ void slb_save_contents(struct slb_entry *slb_ptr); > void slb_dump_contents(struct slb_entry *slb_ptr); > > extern void slb_vmalloc_update(void); > -extern void slb_set_size(u16 size); > void preload_new_slb_context(unsigned long start, unsigned long sp); > + > +#ifdef CONFIG_PPC_64S_HASH_MMU > +void slb_set_size(u16 size); > +#else > +static inline void slb_set_size(u16 size) { } > +#endif > + > #endif /* __ASSEMBLY__ */ > > /* > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c > index 2197404cdcc4..75678ff04dd7 100644 > --- a/arch/powerpc/kernel/prom.c > +++ b/arch/powerpc/kernel/prom.c > @@ -231,10 +231,9 @@ static void __init check_cpu_pa_features(unsigned long > node) > ibm_pa_features, ARRAY_SIZE(ibm_pa_features)); > } > > -#ifdef CONFIG_PPC_BOOK3S_64 > +#ifdef CONFIG_PPC_64S_HASH_MMU > static void __init init_mmu_slb_size(unsigned long node) > { > -#ifdef CONFIG_PPC_64S_HASH_MMU > const __be32 *slb_size_ptr; > > slb_size_ptr = of_get_flat_dt_prop(node, "slb-size", NULL) ? 
: > @@ -242,7 +241,6 @@ static void __init init_mmu_slb_size(unsigned long node) > > if (slb_size_ptr) > mmu_slb_size = be32_to_cpup(slb_size_ptr); > -#endif > } > #else > #define init_mmu_slb_size(node) do { } while(0) > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c > index 22647bb82198..703a2e6ab08d 100644 > --- a/arch/powerpc/kernel/setup_64.c > +++ b/arch/powerpc/kernel/setup_64.c > @@ -886,9 +886,7 @@ void __init setup_per_cpu_areas(void) > atom_size = SZ_1M; > } else if (radix_enabled()) { > atom_size = PAGE_SIZE; > - } else { > -#ifdef CONFIG_PPC_64S_HASH_MMU > - > + } else if (IS_ENABLED(CONFIG_PPC_64S_HASH_MMU)) { > /* >* Linear mapping is one of 4K, 1M and 16M. For 4K, no need >* to group units. For larger mappings, use 1M atom which > @@ -898,9 +896,6 @@ void __init setup_per_cpu_areas(void) > atom_size = PAGE_SIZE; > else > atom_size = SZ_1M; > -#else > - BUILD_BUG(); // radix_enabled() should be constant true > -#endif > } > > if (pcpu_chosen_fc != PCPU_FC_PAGE) { > diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c > index 92d831621fa0..563e9989a5bf 100644 > --- a/arch/powerpc/kexec/ranges.c > +++ b/arch/powerpc/kexec/ranges.c > @@ -296,7 +296,7 @@ int add_initrd_mem_range(struct crash_mem **mem_ranges) > return ret; > } > > -#ifdef CONFIG_PPC_BOOK3S_64 > +#ifdef CONFIG_PPC_64S_HASH_MMU > /** > * add_htab_mem_range - Adds htab range to the given memory ranges list, > * if it exists > @@ -306,14 +306,10 @@ int add_initrd_mem_range(struct crash_mem **mem_ranges) > */ > int add_htab_mem_range(struct crash_mem **mem_ranges) > { > -#ifdef CONFIG_PPC_64S_HASH_MMU > if (!htab_address) > return 0; > > return add_mem_range(mem_ranges, __pa(htab_address), htab_size_bytes); > -#else > - return 0; > -#endif > } > #endif > > diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c > b/arch/powerpc/mm/book3s64/radix_pgtable.c > index 5f8cbeca8080..3c4f0ebe5df8 100644 > --- a/arch/powerpc/mm/book3s64/radix_pgtable.c > 
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c > @@ -333,10 +333,8 @@ static void __init radix_init_pgtable(void) > phys_addr_t start, end; > u64 i; > > -#ifdef CONFIG_PPC_64S_HASH_MMU > /* We don't support slb for radix */ > - mmu_slb_size = 0; > -#endif > + slb_set_size(0); > > /* >* Create the linear mapping > diff --git a/arch/powerpc/platforms/pseries/mobility.c > b/arch/powerpc/platforms/pseries/mobility.c > index 21b706bcea76..85033f392c78 100644 > --- a/arch/powerpc/platforms/pseries/mobility.c > +++ b/arch/powerpc/platforms/pseries/mobility.c > @@ -484,9 +484,7 @@ static int do_suspend(void) > ret = rtas_ibm_suspend_me(&status); > if (ret != 0) { > pr_err("ibm,suspend-me error: %d\n", status); > -#ifdef CONFIG_PPC_64S_HASH_MMU > slb_set_size(saved_slb_size); > -#endif > } > > return ret; > diff --git a/arch/powerpc/platforms/pseries/pseries.h > b/arch/powerpc/platforms/pseries/pseries.h > index 354
Re: [PATCH v6 15/18] powerpc/64s: Always define arch unmapped area calls
Excerpts from Christophe Leroy's message of December 8, 2021 8:00 pm: > > > Le 01/12/2021 à 15:41, Nicholas Piggin a écrit : >> To avoid any functional changes to radix paths when building with hash >> MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the >> arch get_unmapped_area calls on 64s platforms. >> >> Signed-off-by: Nicholas Piggin >> --- >> arch/powerpc/include/asm/book3s/64/hash.h | 4 --- >> arch/powerpc/include/asm/book3s/64/mmu.h | 6 >> arch/powerpc/mm/hugetlbpage.c | 16 ++--- >> arch/powerpc/mm/mmap.c| 40 +++ >> arch/powerpc/mm/slice.c | 20 >> 5 files changed, 51 insertions(+), 35 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h >> b/arch/powerpc/include/asm/book3s/64/hash.h >> index 674fe0e890dc..a7a0572f3846 100644 >> --- a/arch/powerpc/include/asm/book3s/64/hash.h >> +++ b/arch/powerpc/include/asm/book3s/64/hash.h >> @@ -99,10 +99,6 @@ >>* Defines the address of the vmemap area, in its own region on >>* hash table CPUs. >>*/ >> -#ifdef CONFIG_PPC_MM_SLICES >> -#define HAVE_ARCH_UNMAPPED_AREA >> -#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN >> -#endif /* CONFIG_PPC_MM_SLICES */ >> >> /* PTEIDX nibble */ >> #define _PTEIDX_SECONDARY 0x8 >> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h >> b/arch/powerpc/include/asm/book3s/64/mmu.h >> index c02f42d1031e..015d7d972d16 100644 >> --- a/arch/powerpc/include/asm/book3s/64/mmu.h >> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h >> @@ -4,6 +4,12 @@ >> >> #include >> >> +#ifdef CONFIG_HUGETLB_PAGE >> +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA >> +#endif >> +#define HAVE_ARCH_UNMAPPED_AREA >> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN >> + >> #ifndef __ASSEMBLY__ >> /* >>* Page size definition >> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c >> index 82d8b368ca6d..ddead41e2194 100644 >> --- a/arch/powerpc/mm/hugetlbpage.c >> +++ b/arch/powerpc/mm/hugetlbpage.c >> @@ -542,20 +542,26 @@ struct page *follow_huge_pd(struct vm_area_struct *vma, >> 
return page; >> } >> >> -#ifdef CONFIG_PPC_MM_SLICES >> +#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA > > Could use CONFIG_PPC_BOOK3S_64 instead It was going to make it de-selectable with !HASH. I think your series cleans this stuff up so I don't think it's a big deal. > >> +static inline int file_to_psize(struct file *file) > > 'inline' is not needed. It is, otherwise a !SLICES config causes it to give a defined-but-not-used error. > >> +{ >> +struct hstate *hstate = hstate_file(file); >> +return shift_to_mmu_psize(huge_page_shift(hstate)); >> +} >> + >> unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long >> addr, >> unsigned long len, unsigned long pgoff, >> unsigned long flags) >> { >> -struct hstate *hstate = hstate_file(file); >> -int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate)); >> - >> #ifdef CONFIG_PPC_RADIX_MMU >> if (radix_enabled()) >> return radix__hugetlb_get_unmapped_area(file, addr, len, >> pgoff, flags); >> #endif >> -return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1); >> +#ifdef CONFIG_PPC_MM_SLICES >> +return slice_get_unmapped_area(addr, len, flags, file_to_psize(file), >> 1); >> +#endif >> +BUG(); >> } >> #endif >> >> diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c >> index ae683fdc716c..c475cf810aa8 100644 >> --- a/arch/powerpc/mm/mmap.c >> +++ b/arch/powerpc/mm/mmap.c >> @@ -80,6 +80,7 @@ static inline unsigned long mmap_base(unsigned long rnd, >> return PAGE_ALIGN(DEFAULT_MAP_WINDOW - gap - rnd); >> } >> >> +#ifdef HAVE_ARCH_UNMAPPED_AREA > > Could use CONFIG_PPC_BOOK3S_64 instead. Or better, put all that stuff in > a file in mm/book3s64/ directory Seeing as you cleaned it up with your series, probably doesn't matter much. Thanks, Nick
Re: [PATCH v6 15/18] powerpc/64s: Always define arch unmapped area calls
Le 09/12/2021 à 09:25, Nicholas Piggin a écrit : > Excerpts from Christophe Leroy's message of December 8, 2021 7:38 pm: >> >> >> Le 01/12/2021 à 15:41, Nicholas Piggin a écrit : >>> To avoid any functional changes to radix paths when building with hash >>> MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the >>> arch get_unmapped_area calls on 64s platforms. >>> >>> Signed-off-by: Nicholas Piggin >>> --- >>>arch/powerpc/include/asm/book3s/64/hash.h | 4 --- >>>arch/powerpc/include/asm/book3s/64/mmu.h | 6 >>>arch/powerpc/mm/hugetlbpage.c | 16 ++--- >>>arch/powerpc/mm/mmap.c| 40 +++ >>>arch/powerpc/mm/slice.c | 20 >>>5 files changed, 51 insertions(+), 35 deletions(-) >>> >>> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h >>> b/arch/powerpc/include/asm/book3s/64/hash.h >>> index 674fe0e890dc..a7a0572f3846 100644 >>> --- a/arch/powerpc/include/asm/book3s/64/hash.h >>> +++ b/arch/powerpc/include/asm/book3s/64/hash.h >>> @@ -99,10 +99,6 @@ >>> * Defines the address of the vmemap area, in its own region on >>> * hash table CPUs. 
>>> */ >>> -#ifdef CONFIG_PPC_MM_SLICES >>> -#define HAVE_ARCH_UNMAPPED_AREA >>> -#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN >>> -#endif /* CONFIG_PPC_MM_SLICES */ >>> >>>/* PTEIDX nibble */ >>>#define _PTEIDX_SECONDARY0x8 >>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h >>> b/arch/powerpc/include/asm/book3s/64/mmu.h >>> index c02f42d1031e..015d7d972d16 100644 >>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h >>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h >>> @@ -4,6 +4,12 @@ >>> >>>#include >>> >>> +#ifdef CONFIG_HUGETLB_PAGE >>> +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA >>> +#endif >>> +#define HAVE_ARCH_UNMAPPED_AREA >>> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN >>> + >>>#ifndef __ASSEMBLY__ >>>/* >>> * Page size definition >>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c >>> index 82d8b368ca6d..ddead41e2194 100644 >>> --- a/arch/powerpc/mm/hugetlbpage.c >>> +++ b/arch/powerpc/mm/hugetlbpage.c >>> @@ -542,20 +542,26 @@ struct page *follow_huge_pd(struct vm_area_struct >>> *vma, >>> return page; >>>} >>> >>> -#ifdef CONFIG_PPC_MM_SLICES >>> +#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA >>> +static inline int file_to_psize(struct file *file) >>> +{ >>> + struct hstate *hstate = hstate_file(file); >>> + return shift_to_mmu_psize(huge_page_shift(hstate)); >>> +} >>> + >>>unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long >>> addr, >>> unsigned long len, unsigned long pgoff, >>> unsigned long flags) >>>{ >>> - struct hstate *hstate = hstate_file(file); >>> - int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate)); >>> - >>>#ifdef CONFIG_PPC_RADIX_MMU >>> if (radix_enabled()) >>> return radix__hugetlb_get_unmapped_area(file, addr, len, >>>pgoff, flags); >>>#endif >>> - return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1); >>> +#ifdef CONFIG_PPC_MM_SLICES >>> + return slice_get_unmapped_area(addr, len, flags, file_to_psize(file), >>> 1); >>> +#endif >>> + BUG(); >> >> We shouldn't had new instances of 
BUG(). >> >> BUILD_BUG() should do the trick here. >> >>>} >>>#endif >>> >>> diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c >>> index ae683fdc716c..c475cf810aa8 100644 >>> --- a/arch/powerpc/mm/mmap.c >>> +++ b/arch/powerpc/mm/mmap.c >>> @@ -80,6 +80,7 @@ static inline unsigned long mmap_base(unsigned long rnd, >>> return PAGE_ALIGN(DEFAULT_MAP_WINDOW - gap - rnd); >>>} >>> >>> +#ifdef HAVE_ARCH_UNMAPPED_AREA >>>#ifdef CONFIG_PPC_RADIX_MMU >>>/* >>> * Same function as generic code used only for radix, because we don't >>> need to overload >>> @@ -181,11 +182,42 @@ radix__arch_get_unmapped_area_topdown(struct file >>> *filp, >>> */ >>> return radix__arch_get_unmapped_area(filp, addr0, len, pgoff, flags); >>>} >>> +#endif >>> + >>> +unsigned long arch_get_unmapped_area(struct file *filp, >>> +unsigned long addr, >>> +unsigned long len, >>> +unsigned long pgoff, >>> +unsigned long flags) >>> +{ >>> +#ifdef CONFIG_PPC_MM_SLICES >>> + return slice_get_unmapped_area(addr, len, flags, >>> + >>> mm_ctx_user_psize(&current->mm->context), 0); >>> +#else >>> + BUG(); >> >> Same. >> >> And the #else isn't needed >> >>> +#endif >>> +} >>> + >>> +unsigned long arch_get_unmapped_area_topdown(struct file *filp, >>> +const unsigned long addr0, >>> +const unsigned
Re: [PATCH v6 15/18] powerpc/64s: Always define arch unmapped area calls
Excerpts from Christophe Leroy's message of December 8, 2021 7:38 pm: > > > Le 01/12/2021 à 15:41, Nicholas Piggin a écrit : >> To avoid any functional changes to radix paths when building with hash >> MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the >> arch get_unmapped_area calls on 64s platforms. >> >> Signed-off-by: Nicholas Piggin >> --- >> arch/powerpc/include/asm/book3s/64/hash.h | 4 --- >> arch/powerpc/include/asm/book3s/64/mmu.h | 6 >> arch/powerpc/mm/hugetlbpage.c | 16 ++--- >> arch/powerpc/mm/mmap.c| 40 +++ >> arch/powerpc/mm/slice.c | 20 >> 5 files changed, 51 insertions(+), 35 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h >> b/arch/powerpc/include/asm/book3s/64/hash.h >> index 674fe0e890dc..a7a0572f3846 100644 >> --- a/arch/powerpc/include/asm/book3s/64/hash.h >> +++ b/arch/powerpc/include/asm/book3s/64/hash.h >> @@ -99,10 +99,6 @@ >>* Defines the address of the vmemap area, in its own region on >>* hash table CPUs. >>*/ >> -#ifdef CONFIG_PPC_MM_SLICES >> -#define HAVE_ARCH_UNMAPPED_AREA >> -#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN >> -#endif /* CONFIG_PPC_MM_SLICES */ >> >> /* PTEIDX nibble */ >> #define _PTEIDX_SECONDARY 0x8 >> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h >> b/arch/powerpc/include/asm/book3s/64/mmu.h >> index c02f42d1031e..015d7d972d16 100644 >> --- a/arch/powerpc/include/asm/book3s/64/mmu.h >> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h >> @@ -4,6 +4,12 @@ >> >> #include >> >> +#ifdef CONFIG_HUGETLB_PAGE >> +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA >> +#endif >> +#define HAVE_ARCH_UNMAPPED_AREA >> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN >> + >> #ifndef __ASSEMBLY__ >> /* >>* Page size definition >> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c >> index 82d8b368ca6d..ddead41e2194 100644 >> --- a/arch/powerpc/mm/hugetlbpage.c >> +++ b/arch/powerpc/mm/hugetlbpage.c >> @@ -542,20 +542,26 @@ struct page *follow_huge_pd(struct vm_area_struct *vma, >> 
return page; >> } >> >> -#ifdef CONFIG_PPC_MM_SLICES >> +#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA >> +static inline int file_to_psize(struct file *file) >> +{ >> +struct hstate *hstate = hstate_file(file); >> +return shift_to_mmu_psize(huge_page_shift(hstate)); >> +} >> + >> unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long >> addr, >> unsigned long len, unsigned long pgoff, >> unsigned long flags) >> { >> -struct hstate *hstate = hstate_file(file); >> -int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate)); >> - >> #ifdef CONFIG_PPC_RADIX_MMU >> if (radix_enabled()) >> return radix__hugetlb_get_unmapped_area(file, addr, len, >> pgoff, flags); >> #endif >> -return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1); >> +#ifdef CONFIG_PPC_MM_SLICES >> +return slice_get_unmapped_area(addr, len, flags, file_to_psize(file), >> 1); >> +#endif >> +BUG(); > > We shouldn't add new instances of BUG(). > > BUILD_BUG() should do the trick here. > >> } >> #endif >> >> diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c >> index ae683fdc716c..c475cf810aa8 100644 >> --- a/arch/powerpc/mm/mmap.c >> +++ b/arch/powerpc/mm/mmap.c >> @@ -80,6 +80,7 @@ static inline unsigned long mmap_base(unsigned long rnd, >> return PAGE_ALIGN(DEFAULT_MAP_WINDOW - gap - rnd); >> } >> >> +#ifdef HAVE_ARCH_UNMAPPED_AREA >> #ifdef CONFIG_PPC_RADIX_MMU >> /* >>* Same function as generic code used only for radix, because we don't >> need to overload >> @@ -181,11 +182,42 @@ radix__arch_get_unmapped_area_topdown(struct file >> *filp, >> */ >> return radix__arch_get_unmapped_area(filp, addr0, len, pgoff, flags); >> } >> +#endif >> + >> +unsigned long arch_get_unmapped_area(struct file *filp, >> + unsigned long addr, >> + unsigned long len, >> + unsigned long pgoff, >> + unsigned long flags) >> +{ >> +#ifdef CONFIG_PPC_MM_SLICES >> +return slice_get_unmapped_area(addr, len, flags, >> + >> mm_ctx_user_psize(&current->mm->context), 0); >> +#else >> +BUG(); > > Same.
> > And the #else isn't needed > >> +#endif >> +} >> + >> +unsigned long arch_get_unmapped_area_topdown(struct file *filp, >> + const unsigned long addr0, >> + const unsigned long len, >> + const unsigned long pgoff, >> + const unsigned long flags) >> +{ >> +#ifdef CONFIG_PPC_MM_SLICES >> +