Re: [PATCH 0/4] perf: Add new macros for mem_hops field

2021-12-09 Thread Michael Ellerman
Arnaldo Carvalho de Melo  writes:
> Em Mon, Dec 06, 2021 at 02:47:45PM +0530, Kajol Jain escreveu:
>> Patchset adds new macros for mem_hops field which can be
>> used to represent remote-node, socket and board level details.
>> 
>> Currently the code has a macro for HOPS_0, which corresponds
>> to data coming from another core on the same node.
>> Add new macros for HOPS_1 to HOPS_3 to represent
>> remote-node, socket and board level data.
>> 
>> For ex: Encodings for mem_hops fields with L2 cache:
>
> I checked and this hasn't hit mainstream, is it already merged on a tree
> where this is slated to be submitted in the next window? If so please
> let me know which one so that I can merge it on perf/core.

I haven't picked it up. I guess the kernel changes are mainly in
powerpc, but I'd at least need an ack from eg. Peter for the generic
perf uapi change.

Equally the whole series could go via tip.

cheers


Re: [PATCH] powerpc/module_64: Fix livepatching for RO modules

2021-12-09 Thread Joe Lawrence
On 12/9/21 2:00 AM, Michael Ellerman wrote:
> Russell Currey  writes:
>> On Tue, 2021-12-07 at 09:44 -0500, Joe Lawrence wrote:
>>> On 11/23/21 3:15 AM, Russell Currey wrote:
>>>
>>> [[ cc += livepatching list ]]
>>>
>>> Hi Russell,
>>>
>>> Thanks for writing a minimal fix for stable / backporting.  As I
>>> mentioned on the github issue [1], this avoided the crashes I
>>> reported
>>> here and over on kpatch github [2].  I wasn't sure if this is the
>>> final
>>> version for stable, but feel free to add my:
>>>
>>> Tested-by: Joe Lawrence 
>>
>> Thanks Joe, as per the discussions on GitHub I think we're fine to use
>> this patch for a fix for stable (unless there's new issues found or
>> additional community feedback etc).
> 
> Hmm, I read the GitHub discussion as being that you were going to do
> another version, which is why I haven't picked this up. But I guess you
> and Christophe were talking about the non-minimal fix.
> 
> I know we want this to be minimal, but I think it should be checking
> that patch_instruction() isn't failing.
> 
> Incremental diff to do that is below. It boots OK, are you able to throw
> a livepatch test at it?
> 

No problem.  The updated incremental patch tests successfully.

Thanks,

-- 
Joe



[powerpc:next] BUILD SUCCESS 0d76914a4c99ab5658f3fb07cdf3799d28e2eab3

2021-12-09 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: 0d76914a4c99ab5658f3fb07cdf3799d28e2eab3  powerpc/inst: Optimise copy_inst_from_kernel_nofault()

elapsed time: 724m

configs tested: 146
configs skipped: 4

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm        defconfig
arm64      allyesconfig
arm64      defconfig
arm        allyesconfig
arm        allmodconfig
i386       randconfig-c001-20211209
powerpc    randconfig-c003-20211209
x86_64     allyesconfig
arc        axs103_smp_defconfig
powerpc    mgcoge_defconfig
arm        mvebu_v7_defconfig
m68k       stmark2_defconfig
arm        imote2_defconfig
powerpc    sam440ep_defconfig
m68k       m5307c3_defconfig
sh         sdk7780_defconfig
arm        sunxi_defconfig
xtensa     cadence_csp_defconfig
arm        pxa3xx_defconfig
m68k       m5475evb_defconfig
ia64       zx1_defconfig
ia64       gensparse_defconfig
powerpc    tqm8xx_defconfig
arm        mxs_defconfig
sh         apsh4ad0a_defconfig
alpha      allyesconfig
arm        pleb_defconfig
ia64       generic_defconfig
mips       malta_defconfig
h8300      h8300h-sim_defconfig
openrisc   simple_smp_defconfig
sh         sh7763rdp_defconfig
powerpc    mpc834x_mds_defconfig
powerpc    kilauea_defconfig
powerpc    makalu_defconfig
sh         lboxre2_defconfig
mips       ip27_defconfig
powerpc    xes_mpc85xx_defconfig
powerpc    socrates_defconfig
nds32      allnoconfig
mips       maltasmvp_eva_defconfig
mips       rs90_defconfig
sh         se7705_defconfig
sh         sdk7786_defconfig
arm        sama7_defconfig
mips       cobalt_defconfig
sh         rsk7201_defconfig
arm        rpc_defconfig
arm        multi_v5_defconfig
m68k       alldefconfig
arm        jornada720_defconfig
m68k       hp300_defconfig
arm        omap2plus_defconfig
ia64       defconfig
powerpc    ep8248e_defconfig
arm        ezx_defconfig
arm        multi_v4t_defconfig
powerpc    motionpro_defconfig
mips       gcw0_defconfig
arm        mvebu_v5_defconfig
powerpc    fsp2_defconfig
sh         se7721_defconfig
arm        randconfig-c002-20211209
ia64       allmodconfig
ia64       allyesconfig
m68k       allmodconfig
m68k       defconfig
m68k       allyesconfig
nios2      defconfig
arc        allyesconfig
nds32      defconfig
nios2      allyesconfig
csky       defconfig
alpha      defconfig
xtensa     allyesconfig
h8300      allyesconfig
arc        defconfig
sh         allmodconfig
parisc     defconfig
s390       allyesconfig
s390       allmodconfig
parisc     allyesconfig
s390       defconfig
i386       allyesconfig
sparc      allyesconfig
sparc      defconfig
i386       defconfig
i386       debian-10.3-kselftests
i386       debian-10.3
mips       allyesconfig
mips       allmodconfig
powerpc    allyesconfig
powerpc    allmodconfig
powerpc    allnoconfig
x86_64     randconfig-a006-20211209
x86_64     randconfig-a005-20211209
x86_64     randconfig-a001-20211209
x86_64     randconfig-a002-20211209
x86_64     randconfig-a004-20211209
x86_64     randconfig-a003-20211209
i386

[powerpc:next-test] BUILD SUCCESS b149d5d45ac9171ed699a256f026c8ebef901112

2021-12-09 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: b149d5d45ac9171ed699a256f026c8ebef901112  powerpc/powermac: Add additional missing lockdep_register_key()

elapsed time: 723m

configs tested: 146
configs skipped: 4

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm        defconfig
arm64      allyesconfig
arm64      defconfig
arm        allyesconfig
arm        allmodconfig
i386       randconfig-c001-20211209
powerpc    randconfig-c003-20211209
x86_64     allyesconfig
arc        axs103_smp_defconfig
powerpc    mgcoge_defconfig
arm        mvebu_v7_defconfig
m68k       stmark2_defconfig
arm        imote2_defconfig
powerpc    sam440ep_defconfig
m68k       m5307c3_defconfig
sh         sdk7780_defconfig
arm        sunxi_defconfig
xtensa     cadence_csp_defconfig
arm        pxa3xx_defconfig
m68k       m5475evb_defconfig
ia64       zx1_defconfig
ia64       gensparse_defconfig
powerpc    tqm8xx_defconfig
arm        mxs_defconfig
sh         apsh4ad0a_defconfig
alpha      allyesconfig
arm        pleb_defconfig
ia64       generic_defconfig
mips       malta_defconfig
h8300      h8300h-sim_defconfig
openrisc   simple_smp_defconfig
sh         sh7763rdp_defconfig
powerpc    mpc834x_mds_defconfig
powerpc    kilauea_defconfig
powerpc    makalu_defconfig
sh         lboxre2_defconfig
mips       ip27_defconfig
powerpc    xes_mpc85xx_defconfig
powerpc    socrates_defconfig
nds32      allnoconfig
mips       maltasmvp_eva_defconfig
mips       rs90_defconfig
sh         se7705_defconfig
sh         sdk7786_defconfig
arm        sama7_defconfig
mips       cobalt_defconfig
sh         rsk7201_defconfig
arm        rpc_defconfig
arm        multi_v5_defconfig
m68k       alldefconfig
arm        jornada720_defconfig
m68k       hp300_defconfig
arm        omap2plus_defconfig
ia64       defconfig
powerpc    ep8248e_defconfig
arm        ezx_defconfig
arm        multi_v4t_defconfig
powerpc    motionpro_defconfig
mips       gcw0_defconfig
arm        mvebu_v5_defconfig
powerpc    fsp2_defconfig
sh         se7721_defconfig
arm        randconfig-c002-20211209
ia64       allmodconfig
ia64       allyesconfig
m68k       allmodconfig
m68k       defconfig
m68k       allyesconfig
nios2      defconfig
arc        allyesconfig
nds32      defconfig
nios2      allyesconfig
csky       defconfig
alpha      defconfig
xtensa     allyesconfig
h8300      allyesconfig
arc        defconfig
sh         allmodconfig
parisc     defconfig
s390       allyesconfig
s390       allmodconfig
parisc     allyesconfig
s390       defconfig
i386       allyesconfig
sparc      allyesconfig
sparc      defconfig
i386       defconfig
i386       debian-10.3-kselftests
i386       debian-10.3
mips       allyesconfig
mips       allmodconfig
powerpc    allyesconfig
powerpc    allmodconfig
powerpc    allnoconfig
x86_64     randconfig-a006-20211209
x86_64     randconfig-a005-20211209
x86_64     randconfig-a001-20211209
x86_64     randconfig-a002-20211209
x86_64     randconfig-a004-20211209
x86_64     randconfig-a003-20211209
i386

Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout

2021-12-09 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 09/12/2021 à 12:22, Michael Ellerman a écrit :
>> Nicholas Piggin  writes:
>> 
>>> Excerpts from Christophe Leroy's message of December 9, 2021 8:22 pm:


 Le 09/12/2021 à 11:15, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
>> Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and
>> remove arch/powerpc/mm/mmap.c
>>
>> This change provides standard randomisation of mmaps.
>>
>> See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386
>> and X86_32") for all the benefits of mmap randomisation.
>
> The justification seems pretty reasonable.
>
>>
>> Comparison between powerpc implementation and the generic one:
>> - mmap_is_legacy() is identical.
>> - arch_mmap_rnd() does exactly the same although it's written
>> slightly differently.
>> - MIN_GAP and MAX_GAP are identical.
>> - mmap_base() does the same but uses STACK_RND_MASK which provides
>> the same values as stack_maxrandom_size().
>> - arch_pick_mmap_layout() is almost identical. The only difference
>> is that it also adds the random factor to mm->mmap_base in legacy mode.
>>
>> That last point is what provides the standard randomisation of mmaps.
>
> Thanks for describing it. Could you add random_factor to mmap_base for
> the legacy path for powerpc as a 2-line change that adds the legacy
> randomisation. And then this bigger patch would be closer to a no-op.
>

 You mean you would like to see the following patch before doing the
 convert ?

 https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7dabf1cbde67a346a187881d4f0bd17347e0334a.1533732583.git.christophe.le...@c-s.fr/
>>>
>>> Yes.
>> 
>> My comment at the time was:
>> 
>>Basically mmap_is_legacy() tells you if any of these is true:
>>
>> - process has the ADDR_COMPAT_LAYOUT personality
>> - global legacy_va_layout sysctl is enabled
>> - stack is unlimited
>> 
>>And we only want to change the behaviour for the stack. Or at least the
>>change log of your patch only talks about the stack limit, not the
>>others.
>>
>>Possibly we should just enable randomisation for all three of those
>>cases, but if so we must spell it out in the patch.
>>
>>It'd also be good to see the output of /proc/x/maps for some processes
>>before and after, to show what actually changes.
>> 
>> 
>> From: https://github.com/linuxppc/issues/issues/59#issuecomment-502066947
>> 
>> 
>> So I think at least the change log on that patch still needs updating to
>> be clear that it's changing behaviour for all mmap_is_legacy() cases,
>> not just the stack unlimited case.
>> 
>> There's also a risk changing the mmap legacy behaviour breaks something.
>> But we are at least matching the behaviour of other architectures, and
>> there is also an escape hatch in the form of `setarch -R`.
>
> That was the purpose of adding in the change log a reference to commit 
> 8b8addf891de ("x86/mm/32: Enable full randomization on i386
> and X86_32")
>
> All this applies to powerpc as well.

Yeah, I'm just a pessimist :) So although the security benefit is nice,
I'm more worried that the layout change will break some mission critical
legacy app somewhere. So I just like to have that spelled out in the
change log, or at least in the discussion like here.

> But I can copy paste the changelog of that commit into mine if you think 
> it is more explicit.

Just referring to it is probably fine.

> I agree that the old patch was only referring to the stack limit; I had no
> clue about everything else at that time.

No worries.

cheers


[RFC PATCH v2 6/7] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size.

2021-12-09 Thread Zi Yan
From: Zi Yan 

alloc_contig_range() now only requires pageblock_order alignment, so drop
the virtio_mem requirement that the subblock size be at least the max of
MAX_ORDER_NR_PAGES and pageblock_nr_pages.

Signed-off-by: Zi Yan 
---
 drivers/virtio/virtio_mem.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 96e5a8782769..dab4bce417fd 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2436,15 +2436,13 @@ static int virtio_mem_init_hotplug(struct virtio_mem 
*vm)
  VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
 
/*
-* We want subblocks to span at least MAX_ORDER_NR_PAGES and
-* pageblock_nr_pages pages. This:
+* We want subblocks to span at least pageblock_nr_pages pages. This:
 * - Simplifies our page onlining code (virtio_mem_online_page_cb)
 *   and fake page onlining code (virtio_mem_fake_online).
 * - Is required for now for alloc_contig_range() to work reliably -
 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 */
-   sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-   pageblock_nr_pages) * PAGE_SIZE;
+   sb_size = pageblock_nr_pages * PAGE_SIZE;
sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
if (sb_size < memory_block_size_bytes() && !force_bbm) {
-- 
2.33.0



[RFC PATCH v2 7/7] arch: powerpc: adjust fadump alignment to be pageblock aligned.

2021-12-09 Thread Zi Yan
From: Zi Yan 

CMA only requires pageblock alignment now. Change CMA alignment in
fadump too.

Signed-off-by: Zi Yan 
---
 arch/powerpc/include/asm/fadump-internal.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h 
b/arch/powerpc/include/asm/fadump-internal.h
index 8d61c8f3fec4..9198f20b6b68 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -20,9 +20,7 @@
 #define memblock_num_regions(memblock_type)(memblock.memblock_type.cnt)
 
 /* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT   (PAGE_SIZE <<   \
-max_t(unsigned long, MAX_ORDER - 1,\
-pageblock_order))
+#define FADUMP_CMA_ALIGNMENT   (PAGE_SIZE << pageblock_order)
 
 /* FAD commands */
 #define FADUMP_REGISTER1
-- 
2.33.0



[RFC PATCH v2 5/7] mm: cma: use pageblock_order as the single alignment

2021-12-09 Thread Zi Yan
From: Zi Yan 

Now alloc_contig_range() works at pageblock granularity. Change CMA
allocation, which uses alloc_contig_range(), to use pageblock_order
alignment.

Signed-off-by: Zi Yan 
---
 include/linux/mmzone.h  | 5 +
 kernel/dma/contiguous.c | 2 +-
 mm/cma.c| 6 ++
 mm/page_alloc.c | 6 +++---
 4 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b925431b0123..71830af35745 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
 *
 * The way to use it is to change migratetype of a range of
 * pageblocks to MIGRATE_CMA which can be done by
-* __free_pageblock_cma() function.  What is important though
-* is that a range of pageblocks must be aligned to
-* MAX_ORDER_NR_PAGES should biggest page be bigger than
-* a single pageblock.
+* __free_pageblock_cma() function.
 */
MIGRATE_CMA,
 #endif
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ac35b14b0786 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -399,7 +399,7 @@ static const struct reserved_mem_ops rmem_cma_ops = {
 
 static int __init rmem_cma_setup(struct reserved_mem *rmem)
 {
-   phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+   phys_addr_t align = PAGE_SIZE << pageblock_order;
phys_addr_t mask = align - 1;
unsigned long node = rmem->fdt_node;
bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
diff --git a/mm/cma.c b/mm/cma.c
index bc9ca8f3c487..d171158bd418 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -180,8 +180,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, 
phys_addr_t size,
return -EINVAL;
 
/* ensure minimal alignment required by mm core */
-   alignment = PAGE_SIZE <<
-   max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
+   alignment = PAGE_SIZE << pageblock_order;
 
/* alignment should be aligned with order_per_bit */
if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
@@ -268,8 +267,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 * migratetype page by page allocator's buddy algorithm. In the case,
 * you couldn't get a contiguous memory, which is not what we want.
 */
-   alignment = max(alignment,  (phys_addr_t)PAGE_SIZE <<
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+   alignment = max(alignment,  (phys_addr_t)PAGE_SIZE << pageblock_order);
if (fixed && base & (alignment - 1)) {
ret = -EINVAL;
pr_err("Region at %pa must be aligned to %pa bytes\n",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5ffbeb1b7512..3317f2064105 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9127,8 +9127,8 @@ static inline void split_free_page_into_pageblocks(struct 
page *free_page,
  * be either of the two.
  * @gfp_mask:  GFP mask to use during compaction
  *
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned.  The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
  *
  * The first thing this routine does is attempt to MIGRATE_ISOLATE all
  * pageblocks in the range.  Once isolated, the pageblocks should not
@@ -9243,7 +9243,7 @@ int alloc_contig_range(unsigned long start, unsigned long 
end,
ret = 0;
 
/*
-* Pages from [start, end) are within a MAX_ORDER_NR_PAGES
+* Pages from [start, end) are within a pageblock_nr_pages
 * aligned blocks that are marked as MIGRATE_ISOLATE.  What's
 * more, all pages in [start, end) are free in page allocator.
 * What we are going to do is to allocate all pages from
-- 
2.33.0



[RFC PATCH v2 4/7] mm: make alloc_contig_range work at pageblock granularity

2021-12-09 Thread Zi Yan
From: Zi Yan 

alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
pageblocks with different migratetypes. It might unnecessarily convert
extra pageblocks at the beginning and at the end of the range. Change
alloc_contig_range() to work at pageblock granularity.

This is done by restoring pageblock types and splitting >pageblock_order free
pages after isolating at MAX_ORDER-1 granularity and migrating pages
away at pageblock granularity. The reason for this process is that
during isolation, some pages, either free or in-use, might have >pageblock
sizes and isolating part of them can cause free accounting issues.
Restoring the migratetypes of the pageblocks not in the interesting
range later is much easier.

Signed-off-by: Zi Yan 
---
 mm/page_alloc.c | 169 ++--
 1 file changed, 149 insertions(+), 20 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 107a5f186d3b..5ffbeb1b7512 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8981,8 +8981,8 @@ struct page *has_unmovable_pages(struct zone *zone, 
struct page *page,
 #ifdef CONFIG_CONTIG_ALLOC
 static unsigned long pfn_max_align_down(unsigned long pfn)
 {
-   return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
-pageblock_nr_pages) - 1);
+   return ALIGN_DOWN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
+pageblock_nr_pages));
 }
 
 static unsigned long pfn_max_align_up(unsigned long pfn)
@@ -9071,6 +9071,52 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
return 0;
 }
 
+static inline int save_migratetypes(unsigned char *migratetypes,
+   unsigned long start_pfn, unsigned long end_pfn)
+{
+   unsigned long pfn = start_pfn;
+   int num = 0;
+
+   while (pfn < end_pfn) {
+   migratetypes[num] = get_pageblock_migratetype(pfn_to_page(pfn));
+   num++;
+   pfn += pageblock_nr_pages;
+   }
+   return num;
+}
+
+static inline int restore_migratetypes(unsigned char *migratetypes,
+   unsigned long start_pfn, unsigned long end_pfn)
+{
+   unsigned long pfn = start_pfn;
+   int num = 0;
+
+   while (pfn < end_pfn) {
+   set_pageblock_migratetype(pfn_to_page(pfn), migratetypes[num]);
+   num++;
+   pfn += pageblock_nr_pages;
+   }
+   return num;
+}
+
+static inline void split_free_page_into_pageblocks(struct page *free_page,
+   int order, struct zone *zone)
+{
+   unsigned long pfn;
+
+   spin_lock(&zone->lock);
+   del_page_from_free_list(free_page, zone, order);
+   for (pfn = page_to_pfn(free_page);
+pfn < page_to_pfn(free_page) + (1UL << order);
+pfn += pageblock_nr_pages) {
+   int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
+
+   __free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
+   mt, FPI_NONE);
+   }
+   spin_unlock(&zone->lock);
+}
+
 /**
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start: start PFN to allocate
@@ -9096,8 +9142,15 @@ int alloc_contig_range(unsigned long start, unsigned 
long end,
   unsigned migratetype, gfp_t gfp_mask)
 {
unsigned long outer_start, outer_end;
+   unsigned long isolate_start = pfn_max_align_down(start);
+   unsigned long isolate_end = pfn_max_align_up(end);
+   unsigned long alloc_start = ALIGN_DOWN(start, pageblock_nr_pages);
+   unsigned long alloc_end = ALIGN(end, pageblock_nr_pages);
+   unsigned long num_pageblock_to_save;
unsigned int order;
int ret = 0;
+   unsigned char *saved_mt;
+   int num;
 
struct compact_control cc = {
.nr_migratepages = 0,
@@ -9111,11 +9164,30 @@ int alloc_contig_range(unsigned long start, unsigned 
long end,
};
INIT_LIST_HEAD(&cc.migratepages);
 
+   /*
+* TODO: make MIGRATE_ISOLATE a standalone bit to avoid overwriting
+* the exiting migratetype. Then, we will not need the save and restore
+* process here.
+*/
+
+   /* Save the migratepages of the pageblocks before start and after end */
+   num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
+   + (isolate_end - alloc_end) / pageblock_nr_pages;
+   saved_mt =
+   kmalloc_array(num_pageblock_to_save,
+ sizeof(unsigned char), GFP_KERNEL);
+   if (!saved_mt)
+   return -ENOMEM;
+
+   num = save_migratetypes(saved_mt, isolate_start, alloc_start);
+
+   num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
+
/*
 * What we do here is we mark all pageblocks in range as
 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
 * have 

[RFC PATCH v2 1/7] mm: page_alloc: avoid merging non-fallbackable pageblocks with others.

2021-12-09 Thread Zi Yan
From: Zi Yan 

This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance.
It prepares for the upcoming removal of the MAX_ORDER-1 alignment
requirement for CMA and alloc_contig_range().

MIGRATE_HIGHATOMIC should not merge with other migratetypes like
MIGRATE_ISOLATE and MIGRATE_CMA[1], so this commit prevents that too.
Also add MIGRATE_HIGHATOMIC to the fallbacks array for completeness.

[1] https://lore.kernel.org/linux-mm/20211130100853.gp3...@techsingularity.net/

Signed-off-by: Zi Yan 
---
 include/linux/mmzone.h |  6 ++
 mm/page_alloc.c| 28 ++--
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 58e744b78c2c..b925431b0123 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -83,6 +83,12 @@ static inline bool is_migrate_movable(int mt)
return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE;
 }
 
+/* See fallbacks[MIGRATE_TYPES][3] in page_alloc.c */
+static inline bool migratetype_has_fallback(int mt)
+{
+   return mt < MIGRATE_PCPTYPES;
+}
+
 #define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edfd6c81af82..107a5f186d3b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1041,6 +1041,12 @@ buddy_merge_likely(unsigned long pfn, unsigned long 
buddy_pfn,
return page_is_buddy(higher_page, higher_buddy, order + 1);
 }
 
+static inline bool has_non_fallback_pageblock(struct zone *zone)
+{
+   return has_isolate_pageblock(zone) || zone_cma_pages(zone) != 0 ||
+   zone->nr_reserved_highatomic != 0;
+}
+
 /*
  * Freeing function for a buddy system allocator.
  *
@@ -1116,14 +1122,15 @@ static inline void __free_one_page(struct page *page,
}
if (order < MAX_ORDER - 1) {
/* If we are here, it means order is >= pageblock_order.
-* We want to prevent merge between freepages on isolate
-* pageblock and normal pageblock. Without this, pageblock
-* isolation could cause incorrect freepage or CMA accounting.
+* We want to prevent merge between freepages on pageblock
+* without fallbacks and normal pageblock. Without this,
+* pageblock isolation could cause incorrect freepage or CMA
+* accounting or HIGHATOMIC accounting.
 *
 * We don't want to hit this code for the more frequent
 * low-order merging.
 */
-   if (unlikely(has_isolate_pageblock(zone))) {
+   if (unlikely(has_non_fallback_pageblock(zone))) {
int buddy_mt;
 
buddy_pfn = __find_buddy_pfn(pfn, order);
@@ -1131,8 +1138,8 @@ static inline void __free_one_page(struct page *page,
buddy_mt = get_pageblock_migratetype(buddy);
 
if (migratetype != buddy_mt
-   && (is_migrate_isolate(migratetype) ||
-   is_migrate_isolate(buddy_mt)))
+   && (!migratetype_has_fallback(migratetype) ||
+   !migratetype_has_fallback(buddy_mt)))
goto done_merging;
}
max_order = order + 1;
@@ -2483,6 +2490,7 @@ static int fallbacks[MIGRATE_TYPES][3] = {
[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_TYPES },
[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_TYPES },
[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_TYPES },
+   [MIGRATE_HIGHATOMIC] = { MIGRATE_TYPES }, /* Never used */
 #ifdef CONFIG_CMA
[MIGRATE_CMA] = { MIGRATE_TYPES }, /* Never used */
 #endif
@@ -2794,8 +2802,8 @@ static void reserve_highatomic_pageblock(struct page 
*page, struct zone *zone,
 
/* Yoink! */
mt = get_pageblock_migratetype(page);
-   if (!is_migrate_highatomic(mt) && !is_migrate_isolate(mt)
-   && !is_migrate_cma(mt)) {
+   /* Only reserve normal pageblock */
+   if (migratetype_has_fallback(mt)) {
zone->nr_reserved_highatomic += pageblock_nr_pages;
set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
move_freepages_block(zone, page, MIGRATE_HIGHATOMIC, NULL);
@@ -3544,8 +3552,8 @@ int __isolate_free_page(struct page *page, unsigned int 
order)
struct page *endpage = page + (1 << order) - 1;
for (; page < endpage; page += pageblock_nr_pages) {
int mt = get_pageblock_migratetype(page);
-   if (!is_migrate_isolate(mt) && !is_migrate_cma(mt)
-   

[RFC PATCH v2 3/7] mm: migrate: allocate the right size of non hugetlb or THP compound pages.

2021-12-09 Thread Zi Yan
From: Zi Yan 

alloc_migration_target() is used by alloc_contig_range() and non-LRU
movable compound pages can be migrated. Current code does not allocate the
right page size for such pages. Check THP precisely using
is_transparent_hugepage() and add allocation support for non-LRU compound
pages.

Signed-off-by: Zi Yan 
---
 mm/migrate.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index d487a399253b..2ce3c771b1de 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1563,7 +1563,7 @@ struct page *alloc_migration_target(struct page *page, 
unsigned long private)
return alloc_huge_page_nodemask(h, nid, mtc->nmask, gfp_mask);
}
 
-   if (PageTransHuge(page)) {
+   if (is_transparent_hugepage(page)) {
/*
 * clear __GFP_RECLAIM to make the migration callback
 * consistent with regular THP allocations.
@@ -1572,13 +1572,17 @@ struct page *alloc_migration_target(struct page *page, 
unsigned long private)
gfp_mask |= GFP_TRANSHUGE;
order = HPAGE_PMD_ORDER;
}
+   if (PageCompound(page)) {
+   gfp_mask |= __GFP_COMP;
+   order = compound_order(page);
+   }
zidx = zone_idx(page_zone(page));
if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
gfp_mask |= __GFP_HIGHMEM;
 
new_page = __alloc_pages(gfp_mask, order, nid, mtc->nmask);
 
-   if (new_page && PageTransHuge(new_page))
+   if (new_page && is_transparent_hugepage(page))
prep_transhuge_page(new_page);
 
return new_page;
-- 
2.33.0



[RFC PATCH v2 0/7] Use pageblock_order for cma and alloc_contig_range alignment.

2021-12-09 Thread Zi Yan
From: Zi Yan 

Hi all,

This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
adjustable at boot time[1].

The MAX_ORDER - 1 alignment requirement comes from the fact that alloc_contig_range()
isolates pageblocks to remove free memory from the buddy allocator, but isolating
only a subset of pageblocks within a page spanning across multiple pageblocks
causes free page accounting issues. Isolated page might not be put into the
right free list, since the code assumes the migratetype of the first pageblock
as the whole free page migratetype. This is based on the discussion at [2].

To remove the requirement, this patchset:
1. still isolates pageblocks at MAX_ORDER - 1 granularity;
2. but saves the pageblock migratetypes outside the specified range of
   alloc_contig_range() and restores them after all pages within the range
   become free after __alloc_contig_migrate_range();
3. splits free pages spanning multiple pageblocks at the beginning and the end
   of the range and puts the split pages to the right migratetype free lists
   based on the pageblock migratetypes;
4. returns pages not in the range as it did before this patch.

Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise
1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
to make sure all pageblocks belonging to a single page are isolated together 
and later pageblocks outside the range need to have their migratetypes restored;
or 2) extra logic will need to be added during page free time to split a free
page with multi-migratetype pageblocks.

Two optimizations might come later:
1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
   range during isolation to increase successful rate of alloc_contig_range().
2. make MIGRATE_ISOLATE a separate bit to avoid saving and restoring existing
   migratetypes before and after isolation respectively.

Feel free to give comments and suggestions. Thanks.


[1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi@sent.com/
[2] 
https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b94...@redhat.com/


Zi Yan (7):
  mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  mm: compaction: handle non-lru compound pages properly in
isolate_migratepages_block().
  mm: migrate: allocate the right size of non hugetlb or THP compound
pages.
  mm: make alloc_contig_range work at pageblock granularity
  mm: cma: use pageblock_order as the single alignment
  drivers: virtio_mem: use pageblock size as the minimum virtio_mem
size.
  arch: powerpc: adjust fadump alignment to be pageblock aligned.

 arch/powerpc/include/asm/fadump-internal.h |   4 +-
 drivers/virtio/virtio_mem.c|   6 +-
 include/linux/mmzone.h |  11 +-
 kernel/dma/contiguous.c|   2 +-
 mm/cma.c   |   6 +-
 mm/compaction.c|  10 +-
 mm/migrate.c   |   8 +-
 mm/page_alloc.c| 203 +
 8 files changed, 196 insertions(+), 54 deletions(-)

-- 
2.33.0



[RFC PATCH v2 2/7] mm: compaction: handle non-lru compound pages properly in isolate_migratepages_block().

2021-12-09 Thread Zi Yan
From: Zi Yan 

In isolate_migratepages_block(), a !PageLRU tail page can be encountered
when the page is larger than a pageblock. Use compound head page for the
checks inside and skip the entire compound page when isolation succeeds.

Signed-off-by: Zi Yan 
---
 mm/compaction.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b4e94cda3019..ad9053fbbe06 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -979,19 +979,23 @@ isolate_migratepages_block(struct compact_control *cc, 
unsigned long low_pfn,
 * Skip any other type of page
 */
if (!PageLRU(page)) {
+   struct page *head = compound_head(page);
/*
 * __PageMovable can return false positive so we need
 * to verify it under page_lock.
 */
-   if (unlikely(__PageMovable(page)) &&
-   !PageIsolated(page)) {
+   if (unlikely(__PageMovable(head)) &&
+   !PageIsolated(head)) {
if (locked) {
unlock_page_lruvec_irqrestore(locked, flags);
locked = NULL;
}
 
-   if (!isolate_movable_page(page, isolate_mode))
+   if (!isolate_movable_page(head, isolate_mode)) {
+   low_pfn += (1 << compound_order(head)) - 1 - (page - head);
+   page = head;
goto isolate_success;
+   }
}
 
goto isolate_fail;
-- 
2.33.0



Re: [PATCH v5 08/12] KVM: Reinstate gfn_to_pfn_cache with invalidation support

2021-12-09 Thread David Woodhouse
On Thu, 2021-12-09 at 23:34 +0100, Paolo Bonzini wrote:
> Compared to the review it's missing this hunk:
> 
> @@ -265,7 +265,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct 
> gfn_to_pfn_cache *gpc)
> 
> gpc->valid = false;
> 
> -   old_khva = gpc->khva;
> +   old_khva = (void *)((unsigned long)gpc->khva & ~PAGE_MASK);
> old_dirty = gpc->dirty;
> old_gpa = gpc->gpa;
> old_pfn = gpc->pfn;

Ah right, there were two of those. Will fix; thanks.




Re: [PATCH v5 08/12] KVM: Reinstate gfn_to_pfn_cache with invalidation support

2021-12-09 Thread Paolo Bonzini

On 12/9/21 21:40, David Woodhouse wrote:


NP, very useful fixes. Thanks. Incremental patch looks like this. It
passes the xen_shinfo_test self-test; will test it with real Xen guests
tomorrow and repost based on your kvm/next tree once it shows up.


Compared to the review it's missing this hunk:


@@ -265,7 +265,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct 
gfn_to_pfn_cache *gpc)
  
  	gpc->valid = false;
  
-	old_khva = gpc->khva;

+   old_khva = (void *)((unsigned long)gpc->khva & ~PAGE_MASK);
old_dirty = gpc->dirty;
old_gpa = gpc->gpa;
old_pfn = gpc->pfn;

Otherwise looks good, thanks!

Paolo


Re: [PATCH v2 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2021-12-09 Thread Nayna



On 12/9/21 04:21, Michal Suchánek wrote:

Hello,

Hi,

On Wed, Dec 08, 2021 at 08:51:47PM -0500, Nayna wrote:

On 11/25/21 13:02, Michal Suchanek wrote:

Copy the code from s390x

Signed-off-by: Michal Suchanek
---
   arch/powerpc/Kconfig| 11 +++
   arch/powerpc/kexec/elf_64.c | 36 
   2 files changed, 47 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ac0c515552fd..ecc1227a77f1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -561,6 +561,17 @@ config KEXEC_FILE
   config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE

+config KEXEC_SIG
+   bool "Verify kernel signature during kexec_file_load() syscall"
+   depends on KEXEC_FILE && MODULE_SIG_FORMAT

After manually applying the patch, the build is failing with the following
error:

build failed with error "arch/powerpc/kexec/elf_64.o: In function
`elf64_verify_sig':
/root/kernel/linus/linux/arch/powerpc/kexec/elf_64.c:160: undefined
reference to `verify_appended_signature'"

This patch does not add a call to verify_appended_signature.

Maybe you applied the following patch as well?


Yes, I tried build after applying all the patches.

Thanks & Regards,

    - Nayna



[Bug 215285] New: power9 le: amdgpu: *ERROR* hw_init of IP block failed -22

2021-12-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215285

Bug ID: 215285
   Summary: power9 le: amdgpu: *ERROR* hw_init of IP block 
failed -22
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: 5.15.6
  Hardware: PPC-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: high
  Priority: P1
 Component: PPC-64
  Assignee: platform_ppc...@kernel-bugs.osdl.org
  Reporter: s...@aeam.us
Regression: No

POWER9 blackbird system from Talos. GPU is 6600XT. This seems to be a flip flop
regression: https://gitlab.freedesktop.org/drm/amd/-/issues/1519

[   54.313046] [drm] Found VCN firmware Version ENC: 1.16 DEC: 2 VEP: 0
Revision: 1
[   54.313054] amdgpu :03:00.0: amdgpu: Will use PSP to load VCN firmware
[   54.314153] amdgpu :03:00.0: enabling bus mastering
[   54.570938] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed!
[   54.571061] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[   54.571175] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* hw_init of IP
block  failed -22
[   54.571277] amdgpu :03:00.0: amdgpu: amdgpu_device_ip_init failed
[   54.571279] amdgpu :03:00.0: amdgpu: Fatal error during GPU init
[   54.571282] amdgpu :03:00.0: amdgpu: amdgpu: finishing device.
[   54.572789] amdgpu: probe of :03:00.0 failed with error -22

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v5 08/12] KVM: Reinstate gfn_to_pfn_cache with invalidation support

2021-12-09 Thread David Woodhouse
On Thu, 2021-12-09 at 19:34 +0100, Paolo Bonzini wrote:
> Sorry for the late review...


NP, very useful fixes. Thanks. Incremental patch looks like this. It
passes the xen_shinfo_test self-test; will test it with real Xen guests
tomorrow and repost based on your kvm/next tree once it shows up.

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index d8c6e1d4a647..9e3c662f815f 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -124,6 +124,33 @@ static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, 
void *khva,
}
 }
 
+static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
+{
+   unsigned long mmu_seq;
+   kvm_pfn_t new_pfn;
+   int retry;
+
+   do {
+   mmu_seq = kvm->mmu_notifier_seq;
+   smp_rmb();
+
+   /* We always request a writeable mapping */
+   new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+   if (is_error_noslot_pfn(new_pfn))
+   break;
+
+   KVM_MMU_READ_LOCK(kvm);
+   retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
+   KVM_MMU_READ_UNLOCK(kvm);
+   if (!retry)
+   break;
+
+   cond_resched();
+   } while (1);
+
+   return new_pfn;
+}
+
 int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 gpa_t gpa, unsigned long len, bool dirty)
 {
@@ -147,7 +174,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct 
gfn_to_pfn_cache *gpc,
 
old_gpa = gpc->gpa;
old_pfn = gpc->pfn;
-   old_khva = gpc->khva;
+   old_khva = (void *)((unsigned long)gpc->khva & ~PAGE_MASK);
old_uhva = gpc->uhva;
old_valid = gpc->valid;
old_dirty = gpc->dirty;
@@ -178,8 +205,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct 
gfn_to_pfn_cache *gpc,
if (!old_valid || old_uhva != gpc->uhva) {
unsigned long uhva = gpc->uhva;
void *new_khva = NULL;
-   unsigned long mmu_seq;
-   int retry;
 
/* Placeholders for "hva is valid but not yet mapped" */
gpc->pfn = KVM_PFN_ERR_FAULT;
@@ -188,28 +213,15 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct 
gfn_to_pfn_cache *gpc,
 
write_unlock_irq(&gpc->lock);
 
-   retry_map:
-   mmu_seq = kvm->mmu_notifier_seq;
-   smp_rmb();
-
-   /* We always request a writeable mapping */
-   new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+   new_pfn = hva_to_pfn_retry(kvm, uhva);
if (is_error_noslot_pfn(new_pfn)) {
ret = -EFAULT;
goto map_done;
}
 
-   KVM_MMU_READ_LOCK(kvm);
-   retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
-   KVM_MMU_READ_UNLOCK(kvm);
-   if (retry) {
-   cond_resched();
-   goto retry_map;
-   }
-
if (gpc->kernel_map) {
if (new_pfn == old_pfn) {
-   new_khva = (void *)((unsigned long)old_khva - page_offset);
+   new_khva = old_khva;
old_pfn = KVM_PFN_ERR_FAULT;
old_khva = NULL;
} else if (pfn_valid(new_pfn)) {
@@ -219,7 +231,9 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct 
gfn_to_pfn_cache *gpc,
new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
 #endif
}
-   if (!new_khva)
+   if (new_khva)
+   new_khva += page_offset;
+   else
ret = -EFAULT;
}
 
@@ -232,7 +246,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct 
gfn_to_pfn_cache *gpc,
} else {
/* At this point, gpc->valid may already have been cleared */
gpc->pfn = new_pfn;
-   gpc->khva = new_khva + page_offset;
+   gpc->khva = new_khva;
}
} else {
/* If the HVA→PFN mapping was already valid, don't unmap it. */




[PATCH] PCI/AER: potential dereference of null pointer

2021-12-09 Thread Jiasheng Jiang
The return value of kzalloc() needs to be checked to avoid dereferencing a
NULL pointer if the allocation fails.

Fixes: db89ccbe52c7 ("PCI/AER: Define aer_stats structure for AER capable devices")
Signed-off-by: Jiasheng Jiang 
---
 drivers/pci/pcie/aer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ec943cee5ecc..d04303edf468 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -376,6 +376,8 @@ void pci_aer_init(struct pci_dev *dev)
return;
 
dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
+   if (!dev->aer_stats)
+   return;
 
/*
 * We save/restore PCI_ERR_UNCOR_MASK, PCI_ERR_UNCOR_SEVER,
-- 
2.25.1



Re: [PATCH v2 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2021-12-09 Thread Nayna


On 11/25/21 13:02, Michal Suchanek wrote:

Copy the code from s390x

Signed-off-by: Michal Suchanek
---
  arch/powerpc/Kconfig| 11 +++
  arch/powerpc/kexec/elf_64.c | 36 
  2 files changed, 47 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ac0c515552fd..ecc1227a77f1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -561,6 +561,17 @@ config KEXEC_FILE
  config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE

+config KEXEC_SIG
+   bool "Verify kernel signature during kexec_file_load() syscall"
+   depends on KEXEC_FILE && MODULE_SIG_FORMAT


After manually applying the patch, the build is failing with the 
following error:


build failed with error "arch/powerpc/kexec/elf_64.o: In function 
`elf64_verify_sig':
/root/kernel/linus/linux/arch/powerpc/kexec/elf_64.c:160: undefined 
reference to `verify_appended_signature'"


I see it happened because I didn't have MODULE_SIG enabled. Enabling 
MODULE_SIG fixes it.


I wonder why not add "depends on MODULE_SIG" rather than on
MODULE_SIG_FORMAT.


Thanks & Regards,

 - Nayna


Re: [PATCH 0/4] perf: Add new macros for mem_hops field

2021-12-09 Thread Arnaldo Carvalho de Melo
Em Mon, Dec 06, 2021 at 02:47:45PM +0530, Kajol Jain escreveu:
> Patchset adds new macros for mem_hops field which can be
> used to represent remote-node, socket and board level details.
> 
> Currently the code has a macro for HOPS_0, which corresponds
> to data coming from another core on the same node.
> Add new macros for HOPS_1 to HOPS_3 to represent
> remote-node, socket and board level data.
> 
> For ex: Encodings for mem_hops fields with L2 cache:

I checked and this hasn't hit mainstream, is it already merged on a tree
where this is slated to be submitted in the next window? If so please
let me know which one so that I can merge it on perf/core.

- Arnaldo
 
> L2                     - local L2
> L2 | REMOTE | HOPS_0   - remote core, same node L2
> L2 | REMOTE | HOPS_1   - remote node, same socket L2
> L2 | REMOTE | HOPS_2   - remote socket, same board L2
> L2 | REMOTE | HOPS_3   - remote board L2
> 
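A minimal user-space sketch of how these encodings compose via the
perf_mem_data_src bitfields in include/uapi/linux/perf_event.h.
PERF_MEM_HOPS_0 already exists upstream; the PERF_MEM_HOPS_1..3 values are
what this series proposes, so treat them as assumptions until the patches
land.

#include <linux/perf_event.h>

/*
 * Illustrative encoding of "L2 | REMOTE | HOPS_1" (remote node, same
 * socket).  PERF_MEM_HOPS_1..3 come from this series and are assumed
 * here; the other fields and macros are existing uapi.
 */
static __u64 encode_l2_remote_node(void)
{
	union perf_mem_data_src dsrc = { .val = 0 };

	dsrc.mem_lvl_num = PERF_MEM_LVLNUM_L2;     /* L2 cache              */
	dsrc.mem_remote  = PERF_MEM_REMOTE_REMOTE; /* REMOTE                */
	dsrc.mem_hops    = PERF_MEM_HOPS_1;        /* remote node (HOPS_1)  */

	return dsrc.val;
}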
> Patch 1 & 2 adds tool and kernel side changes to add new macros for
> mem_hops field
> 
> Patch 3 add data source encodings for power10 and older platforms
> to represent data based on newer composite  PERF_MEM_LVLNUM* fields
> 
> Patch 4 add data source encodings with proper sub_index used to
> represent memory/cache level data for power10 platform.
> 
> Kajol Jain (4):
>   perf: Add new macros for mem_hops field
>   tools/perf: Add new macros for mem_hops field
>   powerpc/perf: Add encodings to represent data based on newer composite
> PERF_MEM_LVLNUM* fields
>   powerpc/perf: Add data source encodings for power10 platform
> 
>  arch/powerpc/perf/isa207-common.c | 60 ---
>  include/uapi/linux/perf_event.h   |  5 ++-
>  tools/include/uapi/linux/perf_event.h |  5 ++-
>  tools/perf/util/mem-events.c  | 29 -
>  4 files changed, 71 insertions(+), 28 deletions(-)
> 
> -- 
> 2.27.0

-- 

- Arnaldo


Re: [PATCH v5 00/12] KVM: x86/xen: Add in-kernel Xen event channel delivery

2021-12-09 Thread Paolo Bonzini

On 12/9/21 19:47, David Woodhouse wrote:

As in the previous two rounds, the last patch (this time patch 12) is
included as illustration of how we *might* use this for fixing the UAF
bugs in nesting, but isn't intended to be applied as-is. Patches 1-11 are.

Queued 1-7, will be on kvm/next tomorrow though.


Thanks. I assume you made the changes you wanted to the makefiles then,
and will work on the gfn_to_pfn_cache changes you suggested.


Yes, thanks.

Paolo


Re: [PATCH v5 00/12] KVM: x86/xen: Add in-kernel Xen event channel delivery

2021-12-09 Thread David Woodhouse
On Thu, 2021-12-09 at 19:34 +0100, Paolo Bonzini wrote:
> > As in the previous two rounds, the last patch (this time patch 12) is
> > included as illustration of how we *might* use this for fixing the UAF
> > bugs in nesting, but isn't intended to be applied as-is. Patches 1-11 are.
> 
> Queued 1-7, will be on kvm/next tomorrow though.

Thanks. I assume you made the changes you wanted to the makefiles then,
and will work on the gfn_to_pfn_cache changes you suggested.







Re: [PATCH v5 00/12] KVM: x86/xen: Add in-kernel Xen event channel delivery

2021-12-09 Thread Paolo Bonzini

On 11/21/21 13:54, David Woodhouse wrote:

Introduce the basic concept of 2 level event channels for kernel delivery,
which is just a simple matter of a few test_and_set_bit calls on a mapped
shared info page.

This can be used for routing MSI of passthrough devices to PIRQ event
channels in a Xen guest, and we can build on it for delivering IPIs and
timers directly from the kernel too.
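
As a rough sketch of the 2-level delivery described above, assuming the
shared_info/vcpu_info layout from the Xen public headers; this illustrates
the idea rather than reproducing the code from the series, and the vCPU
kick and locking are omitted.

#include <linux/bitops.h>
#include <xen/interface/xen.h>	/* struct shared_info, struct vcpu_info */

/*
 * Sketch only: mark event channel @port pending in a mapped shared_info
 * page and raise the upcall flag for the target vCPU.
 */
static void deliver_evtchn_2level(struct shared_info *shinfo,
				  struct vcpu_info *vcpu, unsigned int port)
{
	unsigned long *pending = (unsigned long *)shinfo->evtchn_pending;
	unsigned long *mask    = (unsigned long *)shinfo->evtchn_mask;

	/* Level 1: per-domain pending bitmap. */
	if (test_and_set_bit(port, pending))
		return;				/* already pending */

	if (test_bit(port, mask))
		return;				/* masked: leave pending, no upcall */

	/* Level 2: per-vCPU selector word, then the upcall flag. */
	if (!test_and_set_bit(port / (sizeof(unsigned long) * 8),
			      (unsigned long *)&vcpu->evtchn_pending_sel))
		WRITE_ONCE(vcpu->evtchn_upcall_pending, 1);

	/* The real code then kicks the vCPU so it notices the upcall. */
}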

v1: Use kvm_map_gfn() although I didn't quite see how it works.

v2: Avoid kvm_map_gfn() and implement a safe mapping with invalidation
 support for myself.

v3: Reinvent gfn_to_pfn_cache with sane invalidation semantics, for my
 use case as well as nesting.

v4: Rework dirty handling, as it became apparently that we need an active
 vCPU context to mark pages dirty so it can't be done from the MMU
 notifier duing the invalidation; it has to happen on unmap.

v5: Fix sparse warnings reported by kernel test robot.

 Fix revalidation when memslots change but the resulting HVA stays
 the same. We can use the same kernel mapping in that case, if the
 HVA → PFN translation was valid before. So that probably means we
 shouldn't unmap the "old_hva". Augment the test case to exercise
 that one too.

 Include the fix for the dirty ring vs. Xen shinfo oops reported
 by butt3rflyh4ck.


As in the previous two rounds, the last patch (this time patch 12) is
included as illustration of how we *might* use this for fixing the UAF
bugs in nesting, but isn't intended to be applied as-is. Patches 1-11 are.


Queued 1-7, will be on kvm/next tomorrow though.

Paolo


Re: [PATCH v5 08/12] KVM: Reinstate gfn_to_pfn_cache with invalidation support

2021-12-09 Thread Paolo Bonzini

Sorry for the late review...

On 11/21/21 13:54, David Woodhouse wrote:

+EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_check);
+
+static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva,
+ gpa_t gpa, bool dirty)
+{
+   /* Unmap the old page if it was mapped before, and release it */
+   if (!is_error_noslot_pfn(pfn)) {
+   if (khva) {
+   if (pfn_valid(pfn))
+   kunmap(pfn_to_page(pfn));
+#ifdef CONFIG_HAS_IOMEM
+   else
+   memunmap(khva);
+#endif
+   }


Considering that the khva is passed directly to memunmap, perhaps it's
cleaner to ensure it's page-aligned:

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 13cae72d39e9..267477bd2972 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -147,7 +147,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct 
gfn_to_pfn_cache *gpc,
 
 	old_gpa = gpc->gpa;

old_pfn = gpc->pfn;
-   old_khva = gpc->khva;
+   old_khva = (void *)((unsigned long)gpc->khva & ~PAGE_MASK);
old_uhva = gpc->uhva;
old_valid = gpc->valid;
old_dirty = gpc->dirty;
@@ -209,7 +209,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct 
gfn_to_pfn_cache *gpc,
 
 		if (gpc->kernel_map) {

if (new_pfn == old_pfn) {
-   new_khva = (void *)((unsigned long)old_khva - page_offset);
+   new_khva = old_khva;
old_pfn = KVM_PFN_ERR_FAULT;
old_khva = NULL;
} else if (pfn_valid(new_pfn)) {
@@ -265,7 +265,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct 
gfn_to_pfn_cache *gpc)
 
 	gpc->valid = false;
 
-	old_khva = gpc->khva;

+   old_khva = (void *)((unsigned long)gpc->khva & ~PAGE_MASK);
old_dirty = gpc->dirty;
old_gpa = gpc->gpa;
old_pfn = gpc->pfn;




+   retry_map:
+   mmu_seq = kvm->mmu_notifier_seq;
+   smp_rmb();
+
+   /* We always request a writeable mapping */
+   new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+   if (is_error_noslot_pfn(new_pfn)) {
+   ret = -EFAULT;
+   goto map_done;
+   }
+
+   KVM_MMU_READ_LOCK(kvm);
+   retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
+   KVM_MMU_READ_UNLOCK(kvm);
+   if (retry) {
+   cond_resched();
+   goto retry_map;
+   }
+


This should also be a separate function, like

static kvm_pfn_t hva_to_pfn_retry(unsigned long uhva)
{
kvm_pfn_t new_pfn;
unsigned long mmu_seq;
int retry;

retry_map:
mmu_seq = kvm->mmu_notifier_seq;
smp_rmb();

/* We always request a writeable mapping */
new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
if (is_error_noslot_pfn(new_pfn))
return new_pfn;

KVM_MMU_READ_LOCK(kvm);
retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
KVM_MMU_READ_UNLOCK(kvm);
if (retry) {
cond_resched();
goto retry_map;
}
return new_pfn;
}



+   write_lock_irq(&gpc->lock);
+   if (ret) {
+   gpc->valid = false;
+   gpc->pfn = KVM_PFN_ERR_FAULT;
+   gpc->khva = NULL;
+   } else {
+   /* At this point, gpc->valid may already have been cleared */
+   gpc->pfn = new_pfn;
+   gpc->khva = new_khva + page_offset;
+   }


Should set gpc->khva only if new_khva != NULL (i.e. only if gpc->kernel_map
is true).

Paolo


Re: [PATCH v5 06/12] KVM: powerpc: Use Makefile.kvm for common files

2021-12-09 Thread Paolo Bonzini

On 11/21/21 13:54, David Woodhouse wrote:

From: David Woodhouse 

It's all fairly baroque but in the end, I don't think there's any reason
for $(KVM)/irqchip.o to have been handled differently, as they all end
up in $(kvm-y) in the end anyway, regardless of whether they get there
via $(common-objs-y) and the CPU-specific object lists.

Signed-off-by: David Woodhouse 
Acked-by: Michael Ellerman  (powerpc)
---
  arch/powerpc/kvm/Makefile | 6 +-
  1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 583c14ef596e..245f59118413 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -4,11 +4,8 @@
  #
  
  ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm

-KVM := ../../../virt/kvm
  
-common-objs-y = $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o

-common-objs-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
-common-objs-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
+include $(srctree)/virt/kvm/Makefile.kvm
  
  common-objs-y += powerpc.o emulate_loadstore.o

  obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o
@@ -125,7 +122,6 @@ kvm-book3s_32-objs := \
  kvm-objs-$(CONFIG_KVM_BOOK3S_32) := $(kvm-book3s_32-objs)
  
  kvm-objs-$(CONFIG_KVM_MPIC) += mpic.o

-kvm-objs-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
  
  kvm-objs := $(kvm-objs-m) $(kvm-objs-y)


Same here,

kvm-y += $(kvm-objs-m) $(kvm-objs-y)

would be slightly preferable IMO.

Paolo


Re: [PATCH v5 03/12] KVM: s390: Use Makefile.kvm for common files

2021-12-09 Thread Paolo Bonzini

On 11/21/21 13:54, David Woodhouse wrote:
  
-kvm-objs := $(common-objs) kvm-s390.o intercept.o interrupt.o priv.o sigp.o

+kvm-objs := kvm-s390.o intercept.o interrupt.o priv.o sigp.o
  kvm-objs += diag.o gaccess.o guestdbg.o vsie.o pv.o


"kvm-y +=" here (for both lines) would be nicer, it's consistent with 
x86 and avoids the question of what happens if you have both kvm-objs 
and kvm-y.


Paolo


Re: [PATCH] powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE

2021-12-09 Thread Christophe Leroy


Le 09/12/2021 à 07:08, Michael Ellerman a écrit :
> Christophe Leroy  writes:
>> Le 07/12/2021 à 11:34, Maxime Bizon a écrit :
>>>
>>> On Tue, 2021-12-07 at 06:10 +, Christophe Leroy wrote:
>>>
>>> Hello,
>>>
>>> With the patch applied and
>>>
>>> CONFIG_DEBUG_PAGEALLOC=y
>>> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
>>> CONFIG_DEBUG_VM=y
>>>
>>> I get tons of this during boot:
>>>
>>> [0.00] Dentry cache hash table entries: 262144 (order: 8, 1048576 
>>> bytes, linear)
>>> [0.00] Inode-cache hash table entries: 131072 (order: 7, 524288 
>>> bytes, linear)
>>> [0.00] mem auto-init: stack:off, heap alloc:off, heap free:off
>>> [0.00] [ cut here ]
>>> [0.00] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:194 
>>> set_pte_at+0x18/0x160
>>> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #442
>>> [0.00] NIP:  80015ebc LR: 80016728 CTR: 800166e4
>>> [0.00] REGS: 80751dd0 TRAP: 0700   Not tainted  (5.15.0+)
>>> [0.00] MSR:  00021032   CR: 42228882  XER: 2000
>>> [0.00]
>>> [0.00] GPR00: 800b8dc8 80751e80 806c6300 807311d8 807a1000 8e84 
>>> 80751ea8 
>>> [0.00] GPR08: 007a1591 0001 007a1180  42224882  
>>> 3ff9c608 3fffd79c
>>> [0.00] GPR16:       
>>> 800166e4 807a2000
>>> [0.00] GPR24: 807a1fff 807311d8 807311d8 807a2000 80768804  
>>> 807a1000 007a1180
>>> [0.00] NIP [80015ebc] set_pte_at+0x18/0x160
>>> [0.00] LR [80016728] set_page_attr+0x44/0xc0
>>> [0.00] Call Trace:
>>> [0.00] [80751e80] [80058570] console_unlock+0x340/0x428 (unreliable)
>>> [0.00] [80751ea0] [] 0x0
>>> [0.00] [80751ec0] [800b8dc8] __apply_to_page_range+0x144/0x2a8
>>> [0.00] [80751f00] [80016918] __kernel_map_pages+0x54/0x64
>>> [0.00] [80751f10] [800cfeb0] __free_pages_ok+0x1b0/0x440
>>> [0.00] [80751f50] [805cfc8c] memblock_free_all+0x1d8/0x274
>>> [0.00] [80751f90] [805c5e0c] mem_init+0x3c/0xd0
>>> [0.00] [80751fb0] [805c0bdc] start_kernel+0x404/0x5c4
>>> [0.00] [80751ff0] [33f0] 0x33f0
>>> [0.00] Instruction dump:
>>> [0.00] 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 
>>> 9421ffe0 93e1001c
>>> [0.00] 83e6 8125 71290001 41820014 <0fe0> 7c0802a6 
>>> 93c10018 90010024
>>>
>>>
>>
>> That's unrelated to this patch.
>>
>> The problem is linked to patch c988cfd38e48 ("powerpc/32: use
>> set_memory_attr()"), which changed from using __set_pte_at() to using
>> set_memory_attr() which uses set_pte_at().
>>
>> set_pte_at() has additional checks and shall not be used to update an
>> existing PTE.
>>
>> Wondering if I should just use __set_pte_at() instead like in the past,
>> or do like commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against
>> concurrent accesses") and use pte_update()
>>
>> Michael, Aneesh, any suggestion ?
> 
> The motivation for using pte_update() in that commit is that it does the
> update atomically and also handles flushing the HPTE for 64-bit Hash.
> 
> But the book3s/32 version of pte_update() doesn't do that. In fact
> there's some HPTE handling in __set_pte_at(), but then also a comment
> saying it's handling in a subsequent flush_tlb_xxx().
> 
> So that doesn't really help make a decision :)
> 
> On the other hand, could you convert those set_memory_attr() calls to
> change_memory_attr() and then eventually drop the former?

Sure, that's probably the best.

Initially I had to implement that set_memory_attr() variant because 
change_memory_attr() was doing a pte_clear() that was "sawing off the 
branch we're sitting on". In extenso mark_rodata_ro() couldn't use 
change_memory_attr() to change the text section to read-only because 
mark_rodata_ro() is itself in the text section.

But now that change_memory_attr() is using pte_update() instead of going 
via a pte_clear(), it's possible to use it, so that's what I'll do.

Thanks for the idea.
Christophe

Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

2021-12-09 Thread Laurent Dufour
On 09/12/2021, 14:53:31, Nathan Lynch wrote:
> Laurent Dufour  writes:
>> On 08/12/2021, 16:21:29, Nathan Lynch wrote:
>>> Laurent Dufour  writes:
 On 07/12/2021, 18:07:50, Nathan Lynch wrote:
> Laurent Dufour  writes:
>> On 07/12/2021, 15:32:39, Nathan Lynch wrote:
>>> Is there a reasonable fallback for VMs where this parameter doesn't
>>> exist? PowerVM partitions should always have it, but what do we want the
>>> behavior to be on other hypervisors?
>>
>> In that case, there is no value displayed in the /proc/powerpc/lparcfg 
>> and
>> the lparstat -i command will fall back to the device tree value. I can't
>> see any valid reason to report the value defined in the device tree
>> here.
>
> Here's a valid reason :-)
>
> lparstat isn't the only possible consumer of the interface, and the
> 'ibm,partition-name' property and the dynamic system parameter clearly
> serve a common purpose. 'ibm,partition-name' is provided by qemu.

 If the hypervisor is not providing this value, this is not the goal of this
 interface to fetch it from the device tree.

 Any consumer should be able to fall back on the device tree value, and
 there is no added value to do such a trick in the kernel when it can be
 done in the user space.
>>>
>>> There is value in imposing a level of abstraction so that the semantics
>>> are:
>>>
>>> * Report the name assigned to the guest by the hosting environment, if
>>>   available
>>>
>>> as opposed to
>>>
>>> * Return the string returned by a RTAS call to ibm,get-system-parameter
>>>   with token 55, if implemented
>>>
>>> The benefit is that consumers of lparcfg do not have to be coded with
>>> the knowledge that "if a partition_name= line is absent, the
>>> ibm,get-system-parameter RTAS call must have failed, so now I should
>>> read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort
>>> of esoterica that is appropriate for the kernel to encapsulate.
>>>
>>> And I'd say the effort involved (falling back to a root node property
>>> lookup) is proportional to the benefit.
>>>
>>
>> I don't agree.
>> From the kernel point of view, I can't see any benefit, this is adding more
>> complexity to do in the kernel what can be done easily in user space.
> 
> Applying this logic, I don't see how adding this to lparcfg would be
> justified at all, because user space can already get at the parameter
> using the privileged rtas syscall. Publish it to unprivileged programs
> over D-Bus or something. That would minimize complexity for the kernel.

As you know, there is a need for an unprivileged user to read that value.
Adding this to lparcfg solves that issue. Also, this makes it easy to read
from user space, without the need to get plumbing like D-Bus into the
picture. This is simple and works fine, and user space can easily make the
choice to read the DT value if needed. Please keep in mind that most of the
time the hypervisor is providing that value.
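
For completeness, a sketch of the user-space fallback being described
(hypothetical code; the "partition_name=" lparcfg key and the
ibm,partition-name property path are taken from this thread, so treat both
as assumptions rather than a documented interface):

#include <stdio.h>
#include <string.h>

/* Prefer the firmware value exposed through lparcfg, else the device tree. */
static int read_lpar_name(char *buf, int len)
{
	FILE *f = fopen("/proc/powerpc/lparcfg", "r");
	char line[256];

	if (f) {
		while (fgets(line, sizeof(line), f)) {
			if (!strncmp(line, "partition_name=", 15)) {
				snprintf(buf, len, "%s", line + 15);
				fclose(f);
				return 0;
			}
		}
		fclose(f);
	}

	/* Fall back to the device tree, as lparstat -i already does. */
	f = fopen("/sys/firmware/devicetree/base/ibm,partition-name", "r");
	if (!f)
		return -1;
	if (!fgets(buf, len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}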



Re: [PATCH] powerpc/rtas: Introduce rtas_get_sensor_nonblocking() for pci hotplug driver.

2021-12-09 Thread Nathan Lynch
Mahesh J Salgaonkar  writes:
> On 2021-11-29 22:53:41 Mon, Nathan Lynch wrote:
>> Mahesh Salgaonkar  writes:
>> >
>> > However on certain PHB failures, the rtas call get-sesnor-state() returns
>> > extended busy error (9902) until PHB is recovered by phyp. Once PHB is
>> > recovered, the get-sensor-state() returns success with correct presence
>> > status. The rtas call interface rtas_get_sensor() loops over the rtas call
>> > on extended delay return code (9902) until the return value is either
>> > success (0) or error (-1). This causes the EEH handler to get stuck for ~6
>> > seconds before it could notify that the pci error has been detected and
>> > stop any active operations.
>> 
>> I am curious whether you see any difference with "powerpc/rtas:
>> rtas_busy_delay() improvements" which was recently applied. It will
>> cause the the calling task to sleep in response to a 990x status instead
>> of immediately retrying:
>> 
>> https://git.kernel.org/powerpc/c/38f7b7067dae0c101be573106018e8af22a90fdf
>> 
>> If that commit helps then maybe this change isn't needed.
>
> I tried with above commit but it didn't help.

OK, not too surprising, but thank you for checking.
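
For readers following along, this is roughly the shape of the blocking path
being discussed (a paraphrase of rtas_get_sensor() in
arch/powerpc/kernel/rtas.c, not the exact code):

int rtas_get_sensor(int sensor, int index, int *state)
{
	int token = rtas_token("get-sensor-state");
	int rc;

	if (token == RTAS_UNKNOWN_SERVICE)
		return -ENOENT;

	do {
		rc = rtas_call(token, 2, 2, state, sensor, index);
		/*
		 * rtas_busy_delay() returns nonzero for -2 and the 990x
		 * extended-delay statuses (sleeping for the latter), so a
		 * PHB stuck in recovery keeps the caller looping here.
		 */
	} while (rtas_busy_delay(rc));

	return rc;
}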


Re: [PATCH v3] PCI hotplug: rpaphp: Error out on busy status from get-sensor-state

2021-12-09 Thread Nathan Lynch
Mahesh Salgaonkar  writes:
> To avoid this issue, fix the pci hotplug driver (rpaphp) to return an error
> if the slot presence state can not be detected immediately. Current
> implementation uses rtas_get_sensor() API which blocks the slot check state
> until rtas call returns success. Change rpaphp_get_sensor_state() to invoke
> rtas_call(get-sensor-state) directly and take actions based on rtas return
> status. This patch now errors out immediately on busy return status from
> rtas_call.
>
> Please note that, only on certain PHB failures, the slot presence check
> returns BUSY condition. In normal cases it returns immediately with a
> correct presence state value. Hence this change has no impact on normal pci
> dlpar operations.

I was wondering about this. This seems to be saying -2/990x cannot
happen in other cases. I couldn't find this specified in the
architecture. It seems a bit risky to me to *always* error out on
-2/990x - won't we have intermittent slot enable failures?

> +/*
> + * RTAS call get-sensor-state(DR_ENTITY_SENSE) return values as per PAPR:
> + *-1: Hardware Error
> + *-2: RTAS_BUSY
> + *-3: Invalid sensor. RTAS Parameter Error.
> + * -9000: Need DR entity to be powered up and unisolated before RTAS call
> + * -9001: Need DR entity to be powered up, but not unisolated, before RTAS 
> call
> + * -9002: DR entity unusable
> + *  990x: Extended delay - where x is a number in the range of 0-5
> + */
> +#define RTAS_HARDWARE_ERROR  -1
> +#define RTAS_INVALID_SENSOR  -3
> +#define SLOT_UNISOLATED  -9000
> +#define SLOT_NOT_UNISOLATED  -9001
> +#define SLOT_NOT_USABLE  -9002
> +
> +static int rtas_to_errno(int rtas_rc)
> +{
> + int rc;
> +
> + switch (rtas_rc) {
> + case RTAS_HARDWARE_ERROR:
> + rc = -EIO;
> + break;
> + case RTAS_INVALID_SENSOR:
> + rc = -EINVAL;
> + break;
> + case SLOT_UNISOLATED:
> + case SLOT_NOT_UNISOLATED:
> + rc = -EFAULT;
> + break;
> + case SLOT_NOT_USABLE:
> + rc = -ENODEV;
> + break;
> + case RTAS_BUSY:
> + case RTAS_EXTENDED_DELAY_MIN...RTAS_EXTENDED_DELAY_MAX:
> + rc = -EBUSY;
> + break;
> + default:
> + err("%s: unexpected RTAS error %d\n", __func__, rtas_rc);
> + rc = -ERANGE;
> + break;
> + }
> + return rc;
> +}

These conversions look OK to me.
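
As a sanity check of the intended call flow, here is a rough sketch of the
non-blocking slot presence check (my paraphrase of the patch, assuming the
rtas_to_errno() helper above and that the rpaphp struct slot keeps the
sensor index in slot->index; not the patch verbatim):

int rpaphp_get_sensor_state(struct slot *slot, int *state)
{
	int rc;

	/* Single call, no retry loop: busy/extended delay maps to -EBUSY. */
	rc = rtas_call(rtas_token("get-sensor-state"), 2, 2, state,
		       DR_ENTITY_SENSE, slot->index);
	if (rc)
		return rtas_to_errno(rc);

	return 0;
}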


Re: [PATCH v2 0/6] KEXEC_SIG with appended signature

2021-12-09 Thread Michal Suchánek
Hello,

On Wed, Dec 08, 2021 at 08:50:54PM -0500, Nayna wrote:
> 
> On 11/25/21 13:02, Michal Suchanek wrote:
> > Hello,
> 
> Hi Michael,
> 
> > 
> > This is resend of the KEXEC_SIG patchset.
> > 
> > The first patch is new because it'a a cleanup that does not require any
> > change to the module verification code.
> > 
> > The second patch is the only one that is intended to change any
> > functionality.
> > 
> > The rest only deduplicates code but I did not receive any review on that
> > part so I don't know if it's desirable as implemented.
> > 
> > The first two patches can be applied separately without the rest.
> 
> Patch 2 fails to apply on v5.16-rc4. Can you please also include git
> tree/branch while posting the patches ?

Sorry, I did not have a clean base and the Kconfig had another change.

Here is a tree with the changes applied:
https://github.com/hramrach/kernel/tree/kexec_sig

> 
> Secondly, I see that you add the powerpc support in Patch 2 and then modify
> it again in Patch 5 after cleanup. Why not add the support for powerpc after
> the clean up ? This will reduce some rework and also probably simplify
> patches.

That's because I don't know if the later patches will be accepted. By
queueing this patch first, it can be applied standalone to the ppc tree
without regard for the other patches. It's a copy of the s390 code, so it
needs the same rework - not really adding complexity.

Thanks

Michal


Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

2021-12-09 Thread Nathan Lynch
Laurent Dufour  writes:
> On 08/12/2021, 16:21:29, Nathan Lynch wrote:
>> Laurent Dufour  writes:
>>> On 07/12/2021, 18:07:50, Nathan Lynch wrote:
 Laurent Dufour  writes:
> On 07/12/2021, 15:32:39, Nathan Lynch wrote:
>> Is there a reasonable fallback for VMs where this parameter doesn't
>> exist? PowerVM partitions should always have it, but what do we want the
>> behavior to be on other hypervisors?
>
> In that case, there is no value displayed in the /proc/powerpc/lparcfg and
> the lparstat -i command will fall back to the device tree value. I can't
> see any valid reason to report the value defined in the device tree
> here.

 Here's a valid reason :-)

 lparstat isn't the only possible consumer of the interface, and the
 'ibm,partition-name' property and the dynamic system parameter clearly
 serve a common purpose. 'ibm,partition-name' is provided by qemu.
>>>
>>> If the hypervisor is not providing this value, this is not the goal of this
>>> interface to fetch it from the device tree.
>>>
>>> Any consumer should be able to fall back on the device tree value, and
>>> there is no added value to do such a trick in the kernel when it can be
>>> done in the user space.
>> 
>> There is value in imposing a level of abstraction so that the semantics
>> are:
>> 
>> * Report the name assigned to the guest by the hosting environment, if
>>   available
>> 
>> as opposed to
>> 
>> * Return the string returned by a RTAS call to ibm,get-system-parameter
>>   with token 55, if implemented
>> 
>> The benefit is that consumers of lparcfg do not have to be coded with
>> the knowledge that "if a partition_name= line is absent, the
>> ibm,get-system-parameter RTAS call must have failed, so now I should
>> read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort
>> of esoterica that is appropriate for the kernel to encapsulate.
>> 
>> And I'd say the effort involved (falling back to a root node property
>> lookup) is proportional to the benefit.
>> 
>
> I don't agree.
> From the kernel point of view, I can't see any benefit, this is adding more
> complexity to do in the kernel what can be done easily in user space.

Applying this logic, I don't see how adding this to lparcfg would be
justified at all, because user space can already get at the parameter
using the privileged rtas syscall. Publish it to unprivileged programs
over D-Bus or something. That would minimize complexity for the kernel.



[PATCH] selftests/powerpc: Add a test of sigreturning to the kernel

2021-12-09 Thread Michael Ellerman
We have a general signal fuzzer, sigfuz, which can modify the MSR & NIP
before sigreturn. But the chance of it hitting a kernel address and also
clearing MSR_PR is fairly slim.

So add a specific test of sigreturn to a kernel address, both with and
without attempting to clear MSR_PR (which the kernel must block).

Signed-off-by: Michael Ellerman 
---
 .../testing/selftests/powerpc/signal/Makefile |   1 +
 .../powerpc/signal/sigreturn_kernel.c | 132 ++
 2 files changed, 133 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/signal/sigreturn_kernel.c

diff --git a/tools/testing/selftests/powerpc/signal/Makefile 
b/tools/testing/selftests/powerpc/signal/Makefile
index d6ae54663aed..84e201572466 100644
--- a/tools/testing/selftests/powerpc/signal/Makefile
+++ b/tools/testing/selftests/powerpc/signal/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 TEST_GEN_PROGS := signal signal_tm sigfuz sigreturn_vdso sig_sc_double_restart
+TEST_GEN_PROGS += sigreturn_kernel
 
 CFLAGS += -maltivec
 $(OUTPUT)/signal_tm: CFLAGS += -mhtm
diff --git a/tools/testing/selftests/powerpc/signal/sigreturn_kernel.c 
b/tools/testing/selftests/powerpc/signal/sigreturn_kernel.c
new file mode 100644
index ..0a1b6e591eee
--- /dev/null
+++ b/tools/testing/selftests/powerpc/signal/sigreturn_kernel.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test that we can't sigreturn to kernel addresses, or to kernel mode.
+ */
+
+#define _GNU_SOURCE
+
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "utils.h"
+
+#define MSR_PR (1ul << 14)
+
+static volatile unsigned long long sigreturn_addr;
+static volatile unsigned long long sigreturn_msr_mask;
+
+static void sigusr1_handler(int signo, siginfo_t *si, void *uc_ptr)
+{
+   ucontext_t *uc = (ucontext_t *)uc_ptr;
+
+   if (sigreturn_addr)
+   UCONTEXT_NIA(uc) = sigreturn_addr;
+
+   if (sigreturn_msr_mask)
+   UCONTEXT_MSR(uc) &= sigreturn_msr_mask;
+}
+
+static pid_t fork_child(void)
+{
+   pid_t pid;
+
+   pid = fork();
+   if (pid == 0) {
+   raise(SIGUSR1);
+   exit(0);
+   }
+
+   return pid;
+}
+
+static int expect_segv(pid_t pid)
+{
+   int child_ret;
+
+   waitpid(pid, &child_ret, 0);
+   FAIL_IF(WIFEXITED(child_ret));
+   FAIL_IF(!WIFSIGNALED(child_ret));
+   FAIL_IF(WTERMSIG(child_ret) != 11);
+
+   return 0;
+}
+
+int test_sigreturn_kernel(void)
+{
+   struct sigaction act;
+   int child_ret, i;
+   pid_t pid;
+
+   act.sa_sigaction = sigusr1_handler;
+   act.sa_flags = SA_SIGINFO;
+   sigemptyset(&act.sa_mask);
+
+   FAIL_IF(sigaction(SIGUSR1, &act, NULL));
+
+   for (i = 0; i < 2; i++) {
+   // Return to kernel
+   sigreturn_addr = 0xcull << 60;
+   pid = fork_child();
+   expect_segv(pid);
+
+   // Return to kernel virtual
+   sigreturn_addr = 0xc008ull << 48;
+   pid = fork_child();
+   expect_segv(pid);
+
+   // Return out of range
+   sigreturn_addr = 0xc010ull << 48;
+   pid = fork_child();
+   expect_segv(pid);
+
+   // Return to no-man's land, just below PAGE_OFFSET
+   sigreturn_addr = (0xcull << 60) - (64 * 1024);
+   pid = fork_child();
+   expect_segv(pid);
+
+   // Return to no-man's land, above TASK_SIZE_4PB
+   sigreturn_addr = 0x1ull << 52;
+   pid = fork_child();
+   expect_segv(pid);
+
+   // Return to 0xd space
+   sigreturn_addr = 0xdull << 60;
+   pid = fork_child();
+   expect_segv(pid);
+
+   // Return to 0xe space
+   sigreturn_addr = 0xeull << 60;
+   pid = fork_child();
+   expect_segv(pid);
+
+   // Return to 0xf space
+   sigreturn_addr = 0xfull << 60;
+   pid = fork_child();
+   expect_segv(pid);
+
+   // Attempt to set PR=0 for 2nd loop (should be blocked by kernel)
+   sigreturn_msr_mask = ~MSR_PR;
+   }
+
+   printf("All children killed as expected\n");
+
+   // Don't change address, just MSR, should return to user as normal
+   sigreturn_addr = 0;
+   sigreturn_msr_mask = ~MSR_PR;
+   pid = fork_child();
+   waitpid(pid, &child_ret, 0);
+   FAIL_IF(!WIFEXITED(child_ret));
+   FAIL_IF(WIFSIGNALED(child_ret));
+   FAIL_IF(WEXITSTATUS(child_ret) != 0);
+
+   return 0;
+}
+
+int main(void)
+{
+   return test_harness(test_sigreturn_kernel, "sigreturn_kernel");
+}
-- 
2.31.1



[Bug 214913] [xfstests generic/051] BUG: Kernel NULL pointer dereference on read at 0x00000108 NIP [c0000000000372e4] tm_cgpr_active+0x14/0x40

2021-12-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=214913

Michael Ellerman (mich...@ellerman.id.au) changed:

   What|Removed |Added

 Status|ASSIGNED|NEEDINFO

--- Comment #5 from Michael Ellerman (mich...@ellerman.id.au) ---
Sorry I don't have any idea which commit could have fixed this.

The process that crashed was "fsstress", do you know if it uses io_uring?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout

2021-12-09 Thread Christophe Leroy


Le 09/12/2021 à 12:22, Michael Ellerman a écrit :
> Nicholas Piggin  writes:
> 
>> Excerpts from Christophe Leroy's message of December 9, 2021 8:22 pm:
>>>
>>>
>>> Le 09/12/2021 à 11:15, Nicholas Piggin a écrit :
 Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
> Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and
> remove arch/powerpc/mm/mmap.c
>
> This change provides standard randomisation of mmaps.
>
> See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386
> and X86_32") for all the benefits of mmap randomisation.

 The justification seems pretty reasonable.

>
> Comparison between powerpc implementation and the generic one:
> - mmap_is_legacy() is identical.
> - arch_mmap_rnd() does exactly the same although it's written
> slightly differently.
> - MIN_GAP and MAX_GAP are identical.
> - mmap_base() does the same but uses STACK_RND_MASK which provides
> the same values as stack_maxrandom_size().
> - arch_pick_mmap_layout() is almost identical. The only difference
> is that it also adds the random factor to mm->mmap_base in legacy mode.
>
> That last point is what provides the standard randomisation of mmaps.

 Thanks for describing it. Could you add random_factor to mmap_base for
 the legacy path for powerpc as a 2-line change that adds the legacy
 randomisation. And then this bigger patch would be closer to a no-op.

>>>
>>> You mean you would like to see the following patch before doing the
>>> convert ?
>>>
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7dabf1cbde67a346a187881d4f0bd17347e0334a.1533732583.git.christophe.le...@c-s.fr/
>>
>> Yes.
> 
> My comment at the time was:
> 
>Basically mmap_is_legacy() tells you if any of these is true:
>
> - process has the ADDR_COMPAT_LAYOUT personality
> - global legacy_va_layout sysctl is enabled
> - stack is unlimited
> 
>And we only want to change the behaviour for the stack. Or at least the
>change log of your patch only talks about the stack limit, not the
>others.
>
>Possibly we should just enable randomisation for all three of those
>cases, but if so we must spell it out in the patch.
>
>It'd also be good to see the output of /proc/x/maps for some processes
>before and after, to show what actually changes.
> 
> 
> From: https://github.com/linuxppc/issues/issues/59#issuecomment-502066947
> 
> 
> So I think at least the change log on that patch still needs updating to
> be clear that it's changing behaviour for all mmap_is_legacy() cases,
> not just the stack unlimited case.
> 
> There's also a risk changing the mmap legacy behaviour breaks something.
> But we are at least matching the behaviour of other architectures, and
> there is also an escape hatch in the form of `setarch -R`.
> 

That was the purpose of adding in the change log a reference to commit 
8b8addf891de ("x86/mm/32: Enable full randomization on i386
and X86_32")

All this applies to powerpc as well.

But I can copy-paste the changelog of that commit into mine if you think 
it is more explicit.

I agree that the old patch was only referring to the stack limit; I had no 
clue about everything else at that time.

Christophe
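
For reference, the "2-line change" being asked for amounts to doing what the
generic arch_pick_mmap_layout() in mm/util.c already does, i.e. adding the
random factor in the legacy (bottom-up) path too (sketch from memory, not
the actual patch):

void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
{
	unsigned long random_factor = 0UL;

	if (current->flags & PF_RANDOMIZE)
		random_factor = arch_mmap_rnd();

	if (mmap_is_legacy(rlim_stack)) {
		/* legacy layout: bottom-up, now with the random factor added */
		mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
		mm->get_unmapped_area = arch_get_unmapped_area;
	} else {
		mm->mmap_base = mmap_base(random_factor, rlim_stack);
		mm->get_unmapped_area = arch_get_unmapped_area_topdown;
	}
}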

Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout

2021-12-09 Thread Michael Ellerman
Nicholas Piggin  writes:

> Excerpts from Christophe Leroy's message of December 9, 2021 8:22 pm:
>> 
>> 
>> Le 09/12/2021 à 11:15, Nicholas Piggin a écrit :
>>> Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
 Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and
 remove arch/powerpc/mm/mmap.c

 This change provides standard randomisation of mmaps.

 See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386
 and X86_32") for all the benefits of mmap randomisation.
>>> 
>>> The justification seems pretty reasonable.
>>> 

 Comparison between powerpc implementation and the generic one:
 - mmap_is_legacy() is identical.
 - arch_mmap_rnd() does exactly the same although it's written
 slightly differently.
 - MIN_GAP and MAX_GAP are identical.
 - mmap_base() does the same but uses STACK_RND_MASK which provides
 the same values as stack_maxrandom_size().
 - arch_pick_mmap_layout() is almost identical. The only difference
 is that it also adds the random factor to mm->mmap_base in legacy mode.

 That last point is what provides the standard randomisation of mmaps.
>>> 
>>> Thanks for describing it. Could you add random_factor to mmap_base for
>>> the legacy path for powerpc as a 2-line change that adds the legacy
>>> randomisation. And then this bigger patch would be closer to a no-op.
>>> 
>> 
>> You mean you would like to see the following patch before doing the 
>> convert ?
>> 
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7dabf1cbde67a346a187881d4f0bd17347e0334a.1533732583.git.christophe.le...@c-s.fr/
>
> Yes.

My comment at the time was:

  Basically mmap_is_legacy() tells you if any of these is true:
  
   - process has the ADDR_COMPAT_LAYOUT personality
   - global legacy_va_layout sysctl is enabled
   - stack is unlimited

  And we only want to change the behaviour for the stack. Or at least the
  change log of your patch only talks about the stack limit, not the
  others.
  
  Possibly we should just enable randomisation for all three of those
  cases, but if so we must spell it out in the patch.
  
  It'd also be good to see the output of /proc/x/maps for some processes
  before and after, to show what actually changes.


From: https://github.com/linuxppc/issues/issues/59#issuecomment-502066947


So I think at least the change log on that patch still needs updating to
be clear that it's changing behaviour for all mmap_is_legacy() cases,
not just the stack unlimited case.

There's also a risk changing the mmap legacy behaviour breaks something.
But we are at least matching the behaviour of other architectures, and
there is also an escape hatch in the form of `setarch -R`.

cheers



[Bug 215217] Kernel fails to boot at an early stage when built with GCC_PLUGIN_LATENT_ENTROPY=y (PowerMac G4 3,6)

2021-12-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215217

--- Comment #8 from Christophe Leroy (christophe.le...@csgroup.eu) ---
early_32.o should likely also have DISABLE_LATENT_ENTROPY_PLUGIN, maybe even
more important than for setup_32.o.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 215217] Kernel fails to boot at an early stage when built with GCC_PLUGIN_LATENT_ENTROPY=y (PowerMac G4 3,6)

2021-12-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215217

--- Comment #7 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 299967
  --> https://bugzilla.kernel.org/attachment.cgi?id=299967&action=edit
kernel .config (5.16-rc4 + CFLAGS_setup_32.o +=... , PowerMac G4 DP)

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 215217] Kernel fails to boot at an early stage when built with GCC_PLUGIN_LATENT_ENTROPY=y (PowerMac G4 3,6)

2021-12-09 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=215217

--- Comment #6 from Erhard F. (erhar...@mailbox.org) ---
Ok, I checked that out. There are already some in the Makefile:

CFLAGS_cputable.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_btext.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_prom.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)

I added CFLAGS_setup_32.o += $(DISABLE_LATENT_ENTROPY_PLUGIN) on top of that,
rebuilt the kernel after a make clean. No success so far, the kernel still does
not boot.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout

2021-12-09 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of December 9, 2021 8:22 pm:
> 
> 
> Le 09/12/2021 à 11:15, Nicholas Piggin a écrit :
>> Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
>>> Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and
>>> remove arch/powerpc/mm/mmap.c
>>>
>>> This change provides standard randomisation of mmaps.
>>>
>>> See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386
>>> and X86_32") for all the benefits of mmap randomisation.
>> 
>> The justification seems pretty reasonable.
>> 
>>>
>>> Comparison between powerpc implementation and the generic one:
>>> - mmap_is_legacy() is identical.
>>> - arch_mmap_rnd() does exactly the same although it's written
>>> slightly differently.
>>> - MIN_GAP and MAX_GAP are identical.
>>> - mmap_base() does the same but uses STACK_RND_MASK which provides
>>> the same values as stack_maxrandom_size().
>>> - arch_pick_mmap_layout() is almost identical. The only difference
>>> is that it also adds the random factor to mm->mmap_base in legacy mode.
>>>
>>> That last point is what provides the standard randomisation of mmaps.
>> 
>> Thanks for describing it. Could you add random_factor to mmap_base for
>> the legacy path for powerpc as a 2-line change that adds the legacy
>> randomisation. And then this bigger patch would be closer to a no-op.
>> 
> 
> You mean you would like to see the following patch before doing the 
> convert ?
> 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7dabf1cbde67a346a187881d4f0bd17347e0334a.1533732583.git.christophe.le...@c-s.fr/

Yes.

Thanks,
Nick


Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout

2021-12-09 Thread Christophe Leroy


Le 09/12/2021 à 11:15, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
>> Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and
>> remove arch/powerpc/mm/mmap.c
>>
>> This change provides standard randomisation of mmaps.
>>
>> See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386
>> and X86_32") for all the benefits of mmap randomisation.
> 
> The justification seems pretty reasonable.
> 
>>
>> Comparison between powerpc implementation and the generic one:
>> - mmap_is_legacy() is identical.
>> - arch_mmap_rnd() does exactly the same although it's written
>> slightly differently.
>> - MIN_GAP and MAX_GAP are identical.
>> - mmap_base() does the same but uses STACK_RND_MASK which provides
>> the same values as stack_maxrandom_size().
>> - arch_pick_mmap_layout() is almost identical. The only difference
>> is that it also adds the random factor to mm->mmap_base in legacy mode.
>>
>> That last point is what provides the standard randomisation of mmaps.
> 
> Thanks for describing it. Could you add random_factor to mmap_base for
> the legacy path for powerpc as a 2-line change that adds the legacy
> randomisation. And then this bigger patch would be closer to a no-op.
> 

You mean you would like to see the following patch before doing the 
convert ?

https://patchwork.ozlabs.org/project/linuxppc-dev/patch/7dabf1cbde67a346a187881d4f0bd17347e0334a.1533732583.git.christophe.le...@c-s.fr/

Re: [PATCH v4 09/10] powerpc/mm: Convert to default topdown mmap layout

2021-12-09 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
> Select CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT and
> remove arch/powerpc/mm/mmap.c
> 
> This change provides standard randomisation of mmaps.
> 
> See commit 8b8addf891de ("x86/mm/32: Enable full randomization on i386
> and X86_32") for all the benefits of mmap randomisation.

The justification seems pretty reasonable.

> 
> Comparison between powerpc implementation and the generic one:
> - mmap_is_legacy() is identical.
> - arch_mmap_rnd() does exactly the same although it's written
> slightly differently.
> - MIN_GAP and MAX_GAP are identical.
> - mmap_base() does the same but uses STACK_RND_MASK which provides
> the same values as stack_maxrandom_size().
> - arch_pick_mmap_layout() is almost identical. The only difference
> is that it also adds the random factor to mm->mmap_base in legacy mode.
> 
> That last point is what provides the standard randomisation of mmaps.

Thanks for describing it. Could you add random_factor to mmap_base for
the legacy path for powerpc as a 2-line change that adds the legacy
randomisation. And then this bigger patch would be closer to a no-op.

Thanks,
Nick



Re: [PATCH v4 07/10] powerpc/mm: Use generic_hugetlb_get_unmapped_area()

2021-12-09 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
> Use the generic version of arch_hugetlb_get_unmapped_area()
> which is now available at all time.
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/book3s/64/hugetlb.h |  4 --
>  arch/powerpc/mm/book3s64/radix_hugetlbpage.c | 55 
>  arch/powerpc/mm/hugetlbpage.c|  4 +-
>  3 files changed, 1 insertion(+), 62 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h 
> b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> index 12e150e615b7..b37a28f62cf6 100644
> --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
> +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> @@ -8,10 +8,6 @@
>   */
>  void radix__flush_hugetlb_page(struct vm_area_struct *vma, unsigned long 
> vmaddr);
>  void radix__local_flush_hugetlb_page(struct vm_area_struct *vma, unsigned 
> long vmaddr);
> -extern unsigned long
> -radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> - unsigned long len, unsigned long pgoff,
> - unsigned long flags);
>  
>  extern void radix__huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
>   unsigned long addr, pte_t *ptep,
> diff --git a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c 
> b/arch/powerpc/mm/book3s64/radix_hugetlbpage.c
> index 23d3e08911d3..d2fb776febb4 100644
> --- a/arch/powerpc/mm/book3s64/radix_hugetlbpage.c
> +++ b/arch/powerpc/mm/book3s64/radix_hugetlbpage.c
> @@ -41,61 +41,6 @@ void radix__flush_hugetlb_tlb_range(struct vm_area_struct 
> *vma, unsigned long st
>   radix__flush_tlb_range_psize(vma->vm_mm, start, end, psize);
>  }
>  
> -/*
> - * A vairant of hugetlb_get_unmapped_area doing topdown search
> - * FIXME!! should we do as x86 does or non hugetlb area does ?
> - * ie, use topdown or not based on mmap_is_legacy check ?
> - */
> -unsigned long
> -radix__hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> - unsigned long len, unsigned long pgoff,
> - unsigned long flags)
> -{
> - struct mm_struct *mm = current->mm;
> - struct vm_area_struct *vma;
> - struct hstate *h = hstate_file(file);
> - int fixed = (flags & MAP_FIXED);
> - unsigned long high_limit;
> - struct vm_unmapped_area_info info;
> -
> - high_limit = DEFAULT_MAP_WINDOW;
> - if (addr >= high_limit || (fixed && (addr + len > high_limit)))
> - high_limit = TASK_SIZE;

I wonder if generic hugetlb_get_unmapped_area needs to have the
arch_get_mmap_end() added.

arm64 has arch_get_mmap_end() and !HAVE_ARCH_HUGETLB_UNMAPPED_AREA, so
it looks like it has broken large-address hint logic for hugetlbfs
mappings? x86-64 defines its own and does the same hinting for
normal and hugetlbfs mmap.

If we had that and defined arch_get_mmap_end(), then this patch should
work.

Thanks,
Nick
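
If it helps, arm64's existing definition shows the shape such a powerpc
arch_get_mmap_end() could take, mirroring the high_limit logic in the radix
helpers this series removes (a sketch only - powerpc does not define this
today):

#define arch_get_mmap_end(addr) \
	(((addr) > DEFAULT_MAP_WINDOW) ? TASK_SIZE : DEFAULT_MAP_WINDOW)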


RE: [PATCH] sound: fsl: add missing put_device() call in imx_hdmi_probe()

2021-12-09 Thread 王擎

>> From: Wang Qing 
>> 
>> of_find_device_by_node() takes a reference to the embedded struct device 
>> which needs to be dropped when error return.
>
>...
>
>>    data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL);
>>    if (!data) {
>> + put_device(_pdev->dev);
>
>If it's of_find_device_by_node() you need an of_node_put() since you're
>dropping a reference on the OF node.
>

The fail label will drop a reference on the OF node.
Also, put_device() is called later on every path except this branch,
so we just need to add put_device() here.

Thanks,
Qing

>>ret = -ENOMEM;
>> goto fail;


Re: [PATCH v4 06/10] powerpc/mm: Use generic_get_unmapped_area() and call it from arch_get_unmapped_area()

2021-12-09 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
> Use the generic version of arch_get_unmapped_area() which
> is now available at all time instead of its copy
> radix__arch_get_unmapped_area()
> 
> Instead of setting mm->get_unmapped_area() to either
> arch_get_unmapped_area() or generic_get_unmapped_area(),
> always set it to arch_get_unmapped_area() and call
> generic_get_unmapped_area() from there when radix is enabled.
> 
> Do the same with radix__arch_get_unmapped_area_topdown()
> 
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/mm/mmap.c | 127 ++---
>  1 file changed, 6 insertions(+), 121 deletions(-)
> 
> diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
> index 9b0d6e395bc0..46781d0103d1 100644
> --- a/arch/powerpc/mm/mmap.c
> +++ b/arch/powerpc/mm/mmap.c
> @@ -81,115 +81,15 @@ static inline unsigned long mmap_base(unsigned long rnd,
>  }
>  
>  #ifdef HAVE_ARCH_UNMAPPED_AREA
> -#ifdef CONFIG_PPC_RADIX_MMU
> -/*
> - * Same function as generic code used only for radix, because we don't need 
> to overload
> - * the generic one. But we will have to duplicate, because hash select
> - * HAVE_ARCH_UNMAPPED_AREA
> - */
> -static unsigned long
> -radix__arch_get_unmapped_area(struct file *filp, unsigned long addr,
> -  unsigned long len, unsigned long pgoff,
> -  unsigned long flags)
> -{
> - struct mm_struct *mm = current->mm;
> - struct vm_area_struct *vma;
> - int fixed = (flags & MAP_FIXED);
> - unsigned long high_limit;
> - struct vm_unmapped_area_info info;
> -
> - high_limit = DEFAULT_MAP_WINDOW;
> - if (addr >= high_limit || (fixed && (addr + len > high_limit)))
> - high_limit = TASK_SIZE;

Does 64s radix need to define arch_get_mmap_end() to do the above now?

Otherwise great to consolidate this with core code, nice patch.

Thanks,
Nick



Re: [PATCH v4 02/10] mm, hugetlbfs: Allow an arch to always use generic versions of get_unmapped_area functions

2021-12-09 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
> Unlike most architectures, powerpc can only define at runtime
> if it is going to use the generic arch_get_unmapped_area() or not.
> 
> Today, powerpc has a copy of the generic arch_get_unmapped_area()
> because when selection HAVE_ARCH_UNMAPPED_AREA the generic
> arch_get_unmapped_area() is not available.
> 
> Rename it generic_get_unmapped_area() and make it independent of
> HAVE_ARCH_UNMAPPED_AREA.
> 
> Do the same for arch_get_unmapped_area_topdown() versus
> HAVE_ARCH_UNMAPPED_AREA_TOPDOWN.
> 
> Do the same for hugetlb_get_unmapped_area() versus
> HAVE_ARCH_HUGETLB_UNMAPPED_AREA.
> 

Reviewed-by: Nicholas Piggin 

> Signed-off-by: Christophe Leroy 
> ---
>  fs/hugetlbfs/inode.c | 17 +
>  include/linux/hugetlb.h  |  5 +
>  include/linux/sched/mm.h |  9 +
>  mm/mmap.c| 31 ---
>  4 files changed, 51 insertions(+), 11 deletions(-)
> 


Re: [PATCH v4 03/10] powerpc/mm: Move vma_mmu_pagesize()

2021-12-09 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of December 9, 2021 3:18 am:
> vma_mmu_pagesize() is only required for slices,
> otherwise there is a generic weak version doing the
> exact same thing.
> 
> Move it to slice.c
> 

Reviewed-by: Nicholas Piggin 

> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/mm/hugetlbpage.c | 11 ---
>  arch/powerpc/mm/slice.c   |  9 +
>  2 files changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index ddead41e2194..0eec3b61bd13 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -565,17 +565,6 @@ unsigned long hugetlb_get_unmapped_area(struct file 
> *file, unsigned long addr,
>  }
>  #endif
>  
> -unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
> -{
> - /* With radix we don't use slice, so derive it from vma*/
> - if (IS_ENABLED(CONFIG_PPC_MM_SLICES) && !radix_enabled()) {
> - unsigned int psize = get_slice_psize(vma->vm_mm, vma->vm_start);
> -
> - return 1UL << mmu_psize_to_shift(psize);
> - }
> - return vma_kernel_pagesize(vma);
> -}
> -
>  bool __init arch_hugetlb_valid_size(unsigned long size)
>  {
>   int shift = __ffs(size);
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index f42711f865f3..8a3ac062b71e 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -759,4 +759,13 @@ int slice_is_hugepage_only_range(struct mm_struct *mm, 
> unsigned long addr,
>  
>   return !slice_check_range_fits(mm, maskp, addr, len);
>  }
> +
> +unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
> +{
> + /* With radix we don't use slice, so derive it from vma*/
> + if (radix_enabled())
> + return vma_kernel_pagesize(vma);
> +
> + return 1UL << mmu_psize_to_shift(get_slice_psize(vma->vm_mm, 
> vma->vm_start));
> +}
>  #endif
> -- 
> 2.33.1
> 
> 


Re: [PATCH v6 15/18] powerpc/64s: Always define arch unmapped area calls

2021-12-09 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of December 9, 2021 6:25 pm:
> Excerpts from Christophe Leroy's message of December 8, 2021 7:38 pm:
>> 
>> 
>> Le 01/12/2021 à 15:41, Nicholas Piggin a écrit :
>>> To avoid any functional changes to radix paths when building with hash
>>> MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the
>>> arch get_unmapped_area calls on 64s platforms.
>>> 
>>> Signed-off-by: Nicholas Piggin 
>>> ---
>>>   arch/powerpc/include/asm/book3s/64/hash.h |  4 ---
>>>   arch/powerpc/include/asm/book3s/64/mmu.h  |  6 
>>>   arch/powerpc/mm/hugetlbpage.c | 16 ++---
>>>   arch/powerpc/mm/mmap.c| 40 +++
>>>   arch/powerpc/mm/slice.c   | 20 
>>>   5 files changed, 51 insertions(+), 35 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
>>> b/arch/powerpc/include/asm/book3s/64/hash.h
>>> index 674fe0e890dc..a7a0572f3846 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/hash.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
>>> @@ -99,10 +99,6 @@
>>>* Defines the address of the vmemap area, in its own region on
>>>* hash table CPUs.
>>>*/
>>> -#ifdef CONFIG_PPC_MM_SLICES
>>> -#define HAVE_ARCH_UNMAPPED_AREA
>>> -#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
>>> -#endif /* CONFIG_PPC_MM_SLICES */
>>>   
>>>   /* PTEIDX nibble */
>>>   #define _PTEIDX_SECONDARY 0x8
>>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
>>> b/arch/powerpc/include/asm/book3s/64/mmu.h
>>> index c02f42d1031e..015d7d972d16 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>>> @@ -4,6 +4,12 @@
>>>   
>>>   #include 
>>>   
>>> +#ifdef CONFIG_HUGETLB_PAGE
>>> +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>>> +#endif
>>> +#define HAVE_ARCH_UNMAPPED_AREA
>>> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
>>> +
>>>   #ifndef __ASSEMBLY__
>>>   /*
>>>* Page size definition
>>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>>> index 82d8b368ca6d..ddead41e2194 100644
>>> --- a/arch/powerpc/mm/hugetlbpage.c
>>> +++ b/arch/powerpc/mm/hugetlbpage.c
>>> @@ -542,20 +542,26 @@ struct page *follow_huge_pd(struct vm_area_struct 
>>> *vma,
>>> return page;
>>>   }
>>>   
>>> -#ifdef CONFIG_PPC_MM_SLICES
>>> +#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>>> +static inline int file_to_psize(struct file *file)
>>> +{
>>> +   struct hstate *hstate = hstate_file(file);
>>> +   return shift_to_mmu_psize(huge_page_shift(hstate));
>>> +}
>>> +
>>>   unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long 
>>> addr,
>>> unsigned long len, unsigned long pgoff,
>>> unsigned long flags)
>>>   {
>>> -   struct hstate *hstate = hstate_file(file);
>>> -   int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate));
>>> -
>>>   #ifdef CONFIG_PPC_RADIX_MMU
>>> if (radix_enabled())
>>> return radix__hugetlb_get_unmapped_area(file, addr, len,
>>>pgoff, flags);
>>>   #endif
>>> -   return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
>>> +#ifdef CONFIG_PPC_MM_SLICES
>>> +   return slice_get_unmapped_area(addr, len, flags, file_to_psize(file), 
>>> 1);
>>> +#endif
>>> +   BUG();
>> 
>> We shouldn't had new instances of BUG().
>> 
>> BUILD_BUG() should do the trick here.
>> 
>>>   }
>>>   #endif
>>>   
>>> diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
>>> index ae683fdc716c..c475cf810aa8 100644
>>> --- a/arch/powerpc/mm/mmap.c
>>> +++ b/arch/powerpc/mm/mmap.c
>>> @@ -80,6 +80,7 @@ static inline unsigned long mmap_base(unsigned long rnd,
>>> return PAGE_ALIGN(DEFAULT_MAP_WINDOW - gap - rnd);
>>>   }
>>>   
>>> +#ifdef HAVE_ARCH_UNMAPPED_AREA
>>>   #ifdef CONFIG_PPC_RADIX_MMU
>>>   /*
>>>* Same function as generic code used only for radix, because we don't 
>>> need to overload
>>> @@ -181,11 +182,42 @@ radix__arch_get_unmapped_area_topdown(struct file 
>>> *filp,
>>>  */
>>> return radix__arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
>>>   }
>>> +#endif
>>> +
>>> +unsigned long arch_get_unmapped_area(struct file *filp,
>>> +unsigned long addr,
>>> +unsigned long len,
>>> +unsigned long pgoff,
>>> +unsigned long flags)
>>> +{
>>> +#ifdef CONFIG_PPC_MM_SLICES
>>> +   return slice_get_unmapped_area(addr, len, flags,
>>> +  
>>> mm_ctx_user_psize(&current->mm->context), 0);
>>> +#else
>>> +   BUG();
>> 
>> Same.
>> 
>> And the #else isn't needed
>> 
>>> +#endif
>>> +}
>>> +
>>> +unsigned long arch_get_unmapped_area_topdown(struct file *filp,
>>> +const unsigned long addr0,
>>> +const unsigned long len,
>>> 

Re: [PATCH v2 2/6] powerpc/kexec_file: Add KEXEC_SIG support.

2021-12-09 Thread Michal Suchánek
Hello,

On Wed, Dec 08, 2021 at 08:51:47PM -0500, Nayna wrote:
> 
> On 11/25/21 13:02, Michal Suchanek wrote:
> > Copy the code from s390x
> > 
> > Signed-off-by: Michal Suchanek
> > ---
> >   arch/powerpc/Kconfig| 11 +++
> >   arch/powerpc/kexec/elf_64.c | 36 
> >   2 files changed, 47 insertions(+)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index ac0c515552fd..ecc1227a77f1 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -561,6 +561,17 @@ config KEXEC_FILE
> >   config ARCH_HAS_KEXEC_PURGATORY
> > def_bool KEXEC_FILE
> > 
> > +config KEXEC_SIG
> > +   bool "Verify kernel signature during kexec_file_load() syscall"
> > +   depends on KEXEC_FILE && MODULE_SIG_FORMAT
> 
> After manually applying the patch, the build is failing with the following
> error:
> 
> build failed with error "arch/powerpc/kexec/elf_64.o: In function
> `elf64_verify_sig':
> /root/kernel/linus/linux/arch/powerpc/kexec/elf_64.c:160: undefined
> reference to `verify_appended_signature'"

This patch does not add a call to verify_appended_signature.

Maybe you applied the following patch as well?

Thanks

Michal


Re: [PATCH v3] powerpc/pseries: read the lpar name from the firmware

2021-12-09 Thread Laurent Dufour
On 08/12/2021, 16:21:29, Nathan Lynch wrote:
> Laurent Dufour  writes:
>> On 07/12/2021, 18:07:50, Nathan Lynch wrote:
>>> Laurent Dufour  writes:
 On 07/12/2021, 15:32:39, Nathan Lynch wrote:
> Is there a reasonable fallback for VMs where this parameter doesn't
> exist? PowerVM partitions should always have it, but what do we want the
> behavior to be on other hypervisors?

 In that case, there is no value displayed in the /proc/powerpc/lparcfg and
 the lparstat -i command will fall back to the device tree value. I can't
 see any valid reason to report the value defined in the device tree
 here.
>>>
>>> Here's a valid reason :-)
>>>
>>> lparstat isn't the only possible consumer of the interface, and the
>>> 'ibm,partition-name' property and the dynamic system parameter clearly
>>> serve a common purpose. 'ibm,partition-name' is provided by qemu.
>>
>> If the hypervisor is not providing this value, this is not the goal of this
>> interface to fetch it from the device tree.
>>
>> Any consumer should be able to fall back on the device tree value, and
>> there is no added value to do such a trick in the kernel when it can be
>> done in the user space.
> 
> There is value in imposing a level of abstraction so that the semantics
> are:
> 
> * Report the name assigned to the guest by the hosting environment, if
>   available
> 
> as opposed to
> 
> * Return the string returned by a RTAS call to ibm,get-system-parameter
>   with token 55, if implemented
> 
> The benefit is that consumers of lparcfg do not have to be coded with
> the knowledge that "if a partition_name= line is absent, the
> ibm,get-system-parameter RTAS call must have failed, so now I should
> read /sys/firmware/devicetree/base/ibm,partition_name." That's the sort
> of esoterica that is appropriate for the kernel to encapsulate.
> 
> And I'd say the effort involved (falling back to a root node property
> lookup) is proportional to the benefit.
> 

I don't agree.
From the kernel point of view, I can't see any benefit, this is adding more
complexity to do in the kernel what can be done easily in user space.

This is typically what should be implemented in a user space shared library.


Re: [PATCH v6 17/18] powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU

2021-12-09 Thread Nicholas Piggin
Excerpts from Michael Ellerman's message of December 7, 2021 11:00 pm:
> Nicholas Piggin  writes:
>>  34 files changed, 173 insertions(+), 52 deletions(-)
> 
> 
> I was able to clean up some of the ifdefs a little with the changes
> below. I'll run these through some test builds and then squash them in.

Looks good to me.

Thanks,
Nick

> 
> cheers
> 
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index 3004f3323144..21f780942911 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -523,8 +523,14 @@ void slb_save_contents(struct slb_entry *slb_ptr);
>  void slb_dump_contents(struct slb_entry *slb_ptr);
>  
>  extern void slb_vmalloc_update(void);
> -extern void slb_set_size(u16 size);
>  void preload_new_slb_context(unsigned long start, unsigned long sp);
> +
> +#ifdef CONFIG_PPC_64S_HASH_MMU
> +void slb_set_size(u16 size);
> +#else
> +static inline void slb_set_size(u16 size) { }
> +#endif
> +
>  #endif /* __ASSEMBLY__ */
>  
>  /*
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 2197404cdcc4..75678ff04dd7 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -231,10 +231,9 @@ static void __init check_cpu_pa_features(unsigned long 
> node)
> ibm_pa_features, ARRAY_SIZE(ibm_pa_features));
>  }
>  
> -#ifdef CONFIG_PPC_BOOK3S_64
> +#ifdef CONFIG_PPC_64S_HASH_MMU
>  static void __init init_mmu_slb_size(unsigned long node)
>  {
> -#ifdef CONFIG_PPC_64S_HASH_MMU
>   const __be32 *slb_size_ptr;
>  
>   slb_size_ptr = of_get_flat_dt_prop(node, "slb-size", NULL) ? :
> @@ -242,7 +241,6 @@ static void __init init_mmu_slb_size(unsigned long node)
>  
>   if (slb_size_ptr)
>   mmu_slb_size = be32_to_cpup(slb_size_ptr);
> -#endif
>  }
>  #else
>  #define init_mmu_slb_size(node) do { } while(0)
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 22647bb82198..703a2e6ab08d 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -886,9 +886,7 @@ void __init setup_per_cpu_areas(void)
>   atom_size = SZ_1M;
>   } else if (radix_enabled()) {
>   atom_size = PAGE_SIZE;
> - } else {
> -#ifdef CONFIG_PPC_64S_HASH_MMU
> -
> + } else if (IS_ENABLED(CONFIG_PPC_64S_HASH_MMU)) {
>   /*
>* Linear mapping is one of 4K, 1M and 16M.  For 4K, no need
>* to group units.  For larger mappings, use 1M atom which
> @@ -898,9 +896,6 @@ void __init setup_per_cpu_areas(void)
>   atom_size = PAGE_SIZE;
>   else
>   atom_size = SZ_1M;
> -#else
> - BUILD_BUG(); // radix_enabled() should be constant true
> -#endif
>   }
>  
>   if (pcpu_chosen_fc != PCPU_FC_PAGE) {
> diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c
> index 92d831621fa0..563e9989a5bf 100644
> --- a/arch/powerpc/kexec/ranges.c
> +++ b/arch/powerpc/kexec/ranges.c
> @@ -296,7 +296,7 @@ int add_initrd_mem_range(struct crash_mem **mem_ranges)
>   return ret;
>  }
>  
> -#ifdef CONFIG_PPC_BOOK3S_64
> +#ifdef CONFIG_PPC_64S_HASH_MMU
>  /**
>   * add_htab_mem_range - Adds htab range to the given memory ranges list,
>   *  if it exists
> @@ -306,14 +306,10 @@ int add_initrd_mem_range(struct crash_mem **mem_ranges)
>   */
>  int add_htab_mem_range(struct crash_mem **mem_ranges)
>  {
> -#ifdef CONFIG_PPC_64S_HASH_MMU
>   if (!htab_address)
>   return 0;
>  
>   return add_mem_range(mem_ranges, __pa(htab_address), htab_size_bytes);
> -#else
> - return 0;
> -#endif
>  }
>  #endif
>  
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
> b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 5f8cbeca8080..3c4f0ebe5df8 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -333,10 +333,8 @@ static void __init radix_init_pgtable(void)
>   phys_addr_t start, end;
>   u64 i;
>  
> -#ifdef CONFIG_PPC_64S_HASH_MMU
>   /* We don't support slb for radix */
> - mmu_slb_size = 0;
> -#endif
> + slb_set_size(0);
>  
>   /*
>* Create the linear mapping
> diff --git a/arch/powerpc/platforms/pseries/mobility.c 
> b/arch/powerpc/platforms/pseries/mobility.c
> index 21b706bcea76..85033f392c78 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -484,9 +484,7 @@ static int do_suspend(void)
>   ret = rtas_ibm_suspend_me();
>   if (ret != 0) {
>   pr_err("ibm,suspend-me error: %d\n", status);
> -#ifdef CONFIG_PPC_64S_HASH_MMU
>   slb_set_size(saved_slb_size);
> -#endif
>   }
>  
>   return ret;
> diff --git a/arch/powerpc/platforms/pseries/pseries.h 
> b/arch/powerpc/platforms/pseries/pseries.h
> index 

Re: [PATCH v6 15/18] powerpc/64s: Always define arch unmapped area calls

2021-12-09 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of December 8, 2021 8:00 pm:
> 
> 
> Le 01/12/2021 à 15:41, Nicholas Piggin a écrit :
>> To avoid any functional changes to radix paths when building with hash
>> MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the
>> arch get_unmapped_area calls on 64s platforms.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>   arch/powerpc/include/asm/book3s/64/hash.h |  4 ---
>>   arch/powerpc/include/asm/book3s/64/mmu.h  |  6 
>>   arch/powerpc/mm/hugetlbpage.c | 16 ++---
>>   arch/powerpc/mm/mmap.c| 40 +++
>>   arch/powerpc/mm/slice.c   | 20 
>>   5 files changed, 51 insertions(+), 35 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
>> b/arch/powerpc/include/asm/book3s/64/hash.h
>> index 674fe0e890dc..a7a0572f3846 100644
>> --- a/arch/powerpc/include/asm/book3s/64/hash.h
>> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
>> @@ -99,10 +99,6 @@
>>* Defines the address of the vmemap area, in its own region on
>>* hash table CPUs.
>>*/
>> -#ifdef CONFIG_PPC_MM_SLICES
>> -#define HAVE_ARCH_UNMAPPED_AREA
>> -#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
>> -#endif /* CONFIG_PPC_MM_SLICES */
>>   
>>   /* PTEIDX nibble */
>>   #define _PTEIDX_SECONDARY  0x8
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
>> b/arch/powerpc/include/asm/book3s/64/mmu.h
>> index c02f42d1031e..015d7d972d16 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> @@ -4,6 +4,12 @@
>>   
>>   #include 
>>   
>> +#ifdef CONFIG_HUGETLB_PAGE
>> +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>> +#endif
>> +#define HAVE_ARCH_UNMAPPED_AREA
>> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
>> +
>>   #ifndef __ASSEMBLY__
>>   /*
>>* Page size definition
>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>> index 82d8b368ca6d..ddead41e2194 100644
>> --- a/arch/powerpc/mm/hugetlbpage.c
>> +++ b/arch/powerpc/mm/hugetlbpage.c
>> @@ -542,20 +542,26 @@ struct page *follow_huge_pd(struct vm_area_struct *vma,
>>  return page;
>>   }
>>   
>> -#ifdef CONFIG_PPC_MM_SLICES
>> +#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
> 
> Could use CONFIG_PPC_BOOK3S_64 instead

It was going to make it de-selectable with !HASH. I think your series 
cleans this stuff up so I don't think it's a big deal.

> 
>> +static inline int file_to_psize(struct file *file)
> 
> 'inline' is not needed.

It is needed, otherwise a !SLICES config causes a defined-but-not-used 
error.

> 
>> +{
>> +struct hstate *hstate = hstate_file(file);
>> +return shift_to_mmu_psize(huge_page_shift(hstate));
>> +}
>> +
>>   unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long 
>> addr,
>>  unsigned long len, unsigned long pgoff,
>>  unsigned long flags)
>>   {
>> -struct hstate *hstate = hstate_file(file);
>> -int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate));
>> -
>>   #ifdef CONFIG_PPC_RADIX_MMU
>>  if (radix_enabled())
>>  return radix__hugetlb_get_unmapped_area(file, addr, len,
>> pgoff, flags);
>>   #endif
>> -return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
>> +#ifdef CONFIG_PPC_MM_SLICES
>> +return slice_get_unmapped_area(addr, len, flags, file_to_psize(file), 
>> 1);
>> +#endif
>> +BUG();
>>   }
>>   #endif
>>   
>> diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
>> index ae683fdc716c..c475cf810aa8 100644
>> --- a/arch/powerpc/mm/mmap.c
>> +++ b/arch/powerpc/mm/mmap.c
>> @@ -80,6 +80,7 @@ static inline unsigned long mmap_base(unsigned long rnd,
>>  return PAGE_ALIGN(DEFAULT_MAP_WINDOW - gap - rnd);
>>   }
>>   
>> +#ifdef HAVE_ARCH_UNMAPPED_AREA
> 
> Could use CONFIG_PPC_BOOK3S_64 instead. Or better, put all that stuff in 
> a file in mm/book3s64/ directory

Seeing as you cleaned it up with your series, probably doesn't matter
much.

Thanks,
Nick


Re: [PATCH v6 15/18] powerpc/64s: Always define arch unmapped area calls

2021-12-09 Thread Christophe Leroy


Le 09/12/2021 à 09:25, Nicholas Piggin a écrit :
> Excerpts from Christophe Leroy's message of December 8, 2021 7:38 pm:
>>
>>
>> Le 01/12/2021 à 15:41, Nicholas Piggin a écrit :
>>> To avoid any functional changes to radix paths when building with hash
>>> MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the
>>> arch get_unmapped_area calls on 64s platforms.
>>>
>>> Signed-off-by: Nicholas Piggin 
>>> ---
>>>arch/powerpc/include/asm/book3s/64/hash.h |  4 ---
>>>arch/powerpc/include/asm/book3s/64/mmu.h  |  6 
>>>arch/powerpc/mm/hugetlbpage.c | 16 ++---
>>>arch/powerpc/mm/mmap.c| 40 +++
>>>arch/powerpc/mm/slice.c   | 20 
>>>5 files changed, 51 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
>>> b/arch/powerpc/include/asm/book3s/64/hash.h
>>> index 674fe0e890dc..a7a0572f3846 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/hash.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
>>> @@ -99,10 +99,6 @@
>>> * Defines the address of the vmemap area, in its own region on
>>> * hash table CPUs.
>>> */
>>> -#ifdef CONFIG_PPC_MM_SLICES
>>> -#define HAVE_ARCH_UNMAPPED_AREA
>>> -#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
>>> -#endif /* CONFIG_PPC_MM_SLICES */
>>>
>>>/* PTEIDX nibble */
>>>#define _PTEIDX_SECONDARY0x8
>>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
>>> b/arch/powerpc/include/asm/book3s/64/mmu.h
>>> index c02f42d1031e..015d7d972d16 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>>> @@ -4,6 +4,12 @@
>>>
>>>#include 
>>>
>>> +#ifdef CONFIG_HUGETLB_PAGE
>>> +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>>> +#endif
>>> +#define HAVE_ARCH_UNMAPPED_AREA
>>> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
>>> +
>>>#ifndef __ASSEMBLY__
>>>/*
>>> * Page size definition
>>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>>> index 82d8b368ca6d..ddead41e2194 100644
>>> --- a/arch/powerpc/mm/hugetlbpage.c
>>> +++ b/arch/powerpc/mm/hugetlbpage.c
>>> @@ -542,20 +542,26 @@ struct page *follow_huge_pd(struct vm_area_struct 
>>> *vma,
>>> return page;
>>>}
>>>
>>> -#ifdef CONFIG_PPC_MM_SLICES
>>> +#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>>> +static inline int file_to_psize(struct file *file)
>>> +{
>>> +   struct hstate *hstate = hstate_file(file);
>>> +   return shift_to_mmu_psize(huge_page_shift(hstate));
>>> +}
>>> +
>>>unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long 
>>> addr,
>>> unsigned long len, unsigned long pgoff,
>>> unsigned long flags)
>>>{
>>> -   struct hstate *hstate = hstate_file(file);
>>> -   int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate));
>>> -
>>>#ifdef CONFIG_PPC_RADIX_MMU
>>> if (radix_enabled())
>>> return radix__hugetlb_get_unmapped_area(file, addr, len,
>>>pgoff, flags);
>>>#endif
>>> -   return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
>>> +#ifdef CONFIG_PPC_MM_SLICES
>>> +   return slice_get_unmapped_area(addr, len, flags, file_to_psize(file), 
>>> 1);
>>> +#endif
>>> +   BUG();
>>
>> We shouldn't had new instances of BUG().
>>
>> BUILD_BUG() should do the trick here.
>>
>>>}
>>>#endif
>>>
>>> diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
>>> index ae683fdc716c..c475cf810aa8 100644
>>> --- a/arch/powerpc/mm/mmap.c
>>> +++ b/arch/powerpc/mm/mmap.c
>>> @@ -80,6 +80,7 @@ static inline unsigned long mmap_base(unsigned long rnd,
>>> return PAGE_ALIGN(DEFAULT_MAP_WINDOW - gap - rnd);
>>>}
>>>
>>> +#ifdef HAVE_ARCH_UNMAPPED_AREA
>>>#ifdef CONFIG_PPC_RADIX_MMU
>>>/*
>>> * Same function as generic code used only for radix, because we don't 
>>> need to overload
>>> @@ -181,11 +182,42 @@ radix__arch_get_unmapped_area_topdown(struct file 
>>> *filp,
>>>  */
>>> return radix__arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
>>>}
>>> +#endif
>>> +
>>> +unsigned long arch_get_unmapped_area(struct file *filp,
>>> +unsigned long addr,
>>> +unsigned long len,
>>> +unsigned long pgoff,
>>> +unsigned long flags)
>>> +{
>>> +#ifdef CONFIG_PPC_MM_SLICES
>>> +   return slice_get_unmapped_area(addr, len, flags,
>>> +  
>>> mm_ctx_user_psize(&current->mm->context), 0);
>>> +#else
>>> +   BUG();
>>
>> Same.
>>
>> And the #else isn't needed
>>
>>> +#endif
>>> +}
>>> +
>>> +unsigned long arch_get_unmapped_area_topdown(struct file *filp,
>>> +const unsigned long addr0,
>>> +const unsigned 

Re: [PATCH v6 15/18] powerpc/64s: Always define arch unmapped area calls

2021-12-09 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of December 8, 2021 7:38 pm:
> 
> 
> On 01/12/2021 at 15:41, Nicholas Piggin wrote:
>> To avoid any functional changes to radix paths when building with hash
>> MMU support disabled (and CONFIG_PPC_MM_SLICES=n), always define the
>> arch get_unmapped_area calls on 64s platforms.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>   arch/powerpc/include/asm/book3s/64/hash.h |  4 ---
>>   arch/powerpc/include/asm/book3s/64/mmu.h  |  6 
>>   arch/powerpc/mm/hugetlbpage.c | 16 ++---
>>   arch/powerpc/mm/mmap.c| 40 +++
>>   arch/powerpc/mm/slice.c   | 20 
>>   5 files changed, 51 insertions(+), 35 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
>> b/arch/powerpc/include/asm/book3s/64/hash.h
>> index 674fe0e890dc..a7a0572f3846 100644
>> --- a/arch/powerpc/include/asm/book3s/64/hash.h
>> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
>> @@ -99,10 +99,6 @@
>>* Defines the address of the vmemap area, in its own region on
>>* hash table CPUs.
>>*/
>> -#ifdef CONFIG_PPC_MM_SLICES
>> -#define HAVE_ARCH_UNMAPPED_AREA
>> -#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
>> -#endif /* CONFIG_PPC_MM_SLICES */
>>   
>>   /* PTEIDX nibble */
>>   #define _PTEIDX_SECONDARY  0x8
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
>> b/arch/powerpc/include/asm/book3s/64/mmu.h
>> index c02f42d1031e..015d7d972d16 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> @@ -4,6 +4,12 @@
>>   
>>   #include <asm/page.h>
>>   
>> +#ifdef CONFIG_HUGETLB_PAGE
>> +#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>> +#endif
>> +#define HAVE_ARCH_UNMAPPED_AREA
>> +#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
>> +
>>   #ifndef __ASSEMBLY__
>>   /*
>>* Page size definition
>> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
>> index 82d8b368ca6d..ddead41e2194 100644
>> --- a/arch/powerpc/mm/hugetlbpage.c
>> +++ b/arch/powerpc/mm/hugetlbpage.c
>> @@ -542,20 +542,26 @@ struct page *follow_huge_pd(struct vm_area_struct *vma,
>>  return page;
>>   }
>>   
>> -#ifdef CONFIG_PPC_MM_SLICES
>> +#ifdef HAVE_ARCH_HUGETLB_UNMAPPED_AREA
>> +static inline int file_to_psize(struct file *file)
>> +{
>> +struct hstate *hstate = hstate_file(file);
>> +return shift_to_mmu_psize(huge_page_shift(hstate));
>> +}
>> +
>>   unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long 
>> addr,
>>  unsigned long len, unsigned long pgoff,
>>  unsigned long flags)
>>   {
>> -struct hstate *hstate = hstate_file(file);
>> -int mmu_psize = shift_to_mmu_psize(huge_page_shift(hstate));
>> -
>>   #ifdef CONFIG_PPC_RADIX_MMU
>>  if (radix_enabled())
>>  return radix__hugetlb_get_unmapped_area(file, addr, len,
>> pgoff, flags);
>>   #endif
>> -return slice_get_unmapped_area(addr, len, flags, mmu_psize, 1);
>> +#ifdef CONFIG_PPC_MM_SLICES
>> +return slice_get_unmapped_area(addr, len, flags, file_to_psize(file), 
>> 1);
>> +#endif
>> +BUG();
> 
> We shouldn't add new instances of BUG().
> 
> BUILD_BUG() should do the trick here.
> 
>>   }
>>   #endif
>>   
>> diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
>> index ae683fdc716c..c475cf810aa8 100644
>> --- a/arch/powerpc/mm/mmap.c
>> +++ b/arch/powerpc/mm/mmap.c
>> @@ -80,6 +80,7 @@ static inline unsigned long mmap_base(unsigned long rnd,
>>  return PAGE_ALIGN(DEFAULT_MAP_WINDOW - gap - rnd);
>>   }
>>   
>> +#ifdef HAVE_ARCH_UNMAPPED_AREA
>>   #ifdef CONFIG_PPC_RADIX_MMU
>>   /*
>>* Same function as generic code used only for radix, because we don't 
>> need to overload
>> @@ -181,11 +182,42 @@ radix__arch_get_unmapped_area_topdown(struct file 
>> *filp,
>>   */
>>  return radix__arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
>>   }
>> +#endif
>> +
>> +unsigned long arch_get_unmapped_area(struct file *filp,
>> + unsigned long addr,
>> + unsigned long len,
>> + unsigned long pgoff,
>> + unsigned long flags)
>> +{
>> +#ifdef CONFIG_PPC_MM_SLICES
>> +return slice_get_unmapped_area(addr, len, flags,
>> +				       mm_ctx_user_psize(&current->mm->context), 0);
>> +#else
>> +BUG();
> 
> Same.
> 
> And the #else isn't needed
> 
>> +#endif
>> +}
>> +
>> +unsigned long arch_get_unmapped_area_topdown(struct file *filp,
>> + const unsigned long addr0,
>> + const unsigned long len,
>> + const unsigned long pgoff,
>> + const unsigned long flags)
>> +{
>> +#ifdef CONFIG_PPC_MM_SLICES
>> +