Re: [PATCH v6 (proposal)] powerpc/cpu: enable nr_cpus for crash kernel

2024-01-29 Thread Pingfan Liu
Hi Christophe, The latest series is https://lore.kernel.org/linuxppc-dev/20231017022806.4523-1-pi...@redhat.com/ And Michael has his implementation at: https://lore.kernel.org/all/20231229120107.2281153-3-...@ellerman.id.au/T/#m46128446bce1095631162a1927415733a3bf0633 Thanks, Pingfan On Fri, Jan

RE: [PATCH v2 linux-next 1/3] x86, crash: don't nest CONFIG_CRASH_DUMP ifdef inside CONFIG_KEXEC_CODE ifdef scope

2024-01-29 Thread Michael Kelley
From: Baoquan He Sent: Monday, January 29, 2024 7:00 PM > > Michael pointed out that the CONFIG_CRASH_DUMP ifdef is nested inside > CONFIG_KEXEC_CODE ifdef scope in some XEN, Hyper-V codes. > > Although the nesting works well too since CONFIG_CRASH_DUMP has > dependency on CONFIG_KEXEC_CORE, it

Re: [PATCH linux-next 1/3] x86, crash: don't nest CONFIG_CRASH_DUMP ifdef inside CONFIG_KEXEC_CODE ifdef scope

2024-01-29 Thread Baoquan He
On 01/30/24 at 01:39am, Michael Kelley wrote: > From: Baoquan He > > > > On 01/29/24 at 06:27pm, Michael Kelley wrote: > > > From: Baoquan He Sent: Monday, January 29, 2024 > > 5:51 AM > > > > > > > > Michael pointed out that the #ifdef CONFIG_CRASH_DUMP is nested inside > > > >

Re: [PATCH v2 linux-next 1/3] x86, crash: don't nest CONFIG_CRASH_DUMP ifdef inside CONFIG_KEXEC_CODE ifdef scope

2024-01-29 Thread Baoquan He
Michael pointed out that the CONFIG_CRASH_DUMP ifdef is nested inside CONFIG_KEXEC_CODE ifdef scope in some XEN, Hyper-V codes. Although the nesting works well too since CONFIG_CRASH_DUMP has dependency on CONFIG_KEXEC_CORE, it may cause confusion because there are places where it's not nested,

RE: [PATCH linux-next 1/3] x86, crash: don't nest CONFIG_CRASH_DUMP ifdef inside CONFIG_KEXEC_CODE ifdef scope

2024-01-29 Thread Michael Kelley
From: Baoquan He > > On 01/29/24 at 06:27pm, Michael Kelley wrote: > > From: Baoquan He Sent: Monday, January 29, 2024 > 5:51 AM > > > > > > Michael pointed out that the #ifdef CONFIG_CRASH_DUMP is nested inside > > > arch/x86/xen/enlighten_hvm.c. > > > > Did some words get left out in the

Re: [PATCH linux-next 1/3] x86, crash: don't nest CONFIG_CRASH_DUMP ifdef inside CONFIG_KEXEC_CODE ifdef scope

2024-01-29 Thread Baoquan He
On 01/29/24 at 06:27pm, Michael Kelley wrote: > From: Baoquan He Sent: Monday, January 29, 2024 5:51 AM > > > > Michael pointed out that the #ifdef CONFIG_CRASH_DUMP is nested inside > > arch/x86/xen/enlighten_hvm.c. > > Did some words get left out in the above sentence? It mentions the Xen >

Re: [PATCH v10 5/6] arm64: support copy_mc_[user]_highpage()

2024-01-29 Thread Andrey Konovalov
On Mon, Jan 29, 2024 at 2:47 PM Tong Tiangen wrote: > > Currently, many scenarios that can tolerate memory errors when copying page > have been supported in the kernel[1][2][3], all of which are implemented by > copy_mc_[user]_highpage(). arm64 should also support this mechanism. > > Due to mte,

Re: [PATCH 1/3] init: Declare rodata_enabled and mark_rodata_ro() at all time

2024-01-29 Thread Luis Chamberlain
On Thu, Dec 21, 2023 at 10:02:46AM +0100, Christophe Leroy wrote: > Declaring rodata_enabled and mark_rodata_ro() at all time > helps removing related #ifdefery in C files. > > Signed-off-by: Christophe Leroy Very nice cleanup, thanks!, applied and pushed Luis

RE: [PATCH linux-next 1/3] x86, crash: don't nest CONFIG_CRASH_DUMP ifdef inside CONFIG_KEXEC_CODE ifdef scope

2024-01-29 Thread Michael Kelley
From: Baoquan He Sent: Monday, January 29, 2024 5:51 AM > > Michael pointed out that the #ifdef CONFIG_CRASH_DUMP is nested inside > arch/x86/xen/enlighten_hvm.c. Did some words get left out in the above sentence? It mentions the Xen case, but not the Hyper-V case. I'm not sure what you

Re: [PATCH v10 3/6] arm64: add uaccess to machine check safe

2024-01-29 Thread Mark Rutland
On Mon, Jan 29, 2024 at 09:46:49PM +0800, Tong Tiangen wrote: > If user process access memory fails due to hardware memory error, only the > relevant processes are affected, so it is more reasonable to kill the user > process and isolate the corrupt page than to panic the kernel. > >

Re: [PATCH v10 2/6] arm64: add support for machine check error safe

2024-01-29 Thread Mark Rutland
On Mon, Jan 29, 2024 at 09:46:48PM +0800, Tong Tiangen wrote: > For the arm64 kernel, when it processes hardware memory errors for > synchronize notifications(do_sea()), if the errors is consumed within the > kernel, the current processing is panic. However, it is not optimal. > > Take uaccess

[PATCH v1 9/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-29 Thread David Hildenbrand
Similar to how we optimized fork(), let's implement PTE batching when consecutive (present) PTEs map consecutive pages of the same large folio. Most infrastructure we need for batching (mmu gather, rmap) is already there. We only have to add get_and_clear_full_ptes() and clear_full_ptes().
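The batching condition described above (consecutive present PTEs that map consecutive pages) can be modelled in user space roughly as follows. This is only a sketch under stated assumptions, not the kernel's implementation: the toy PTE layout, the `TOY_*` macros and every `toy_*` function are hypothetical names invented for illustration, and the check that all pages belong to the same large folio is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy PTE layout (an assumption): PFN in bits 12 and up, flag bits below. */
#define TOY_PFN_SHIFT 12
#define TOY_FLAG_MASK ((UINT64_C(1) << TOY_PFN_SHIFT) - 1)
#define TOY_PRESENT   UINT64_C(0x1)

/* Build a toy PTE from a page frame number and flag bits. */
static uint64_t toy_mk_pte(uint64_t pfn, uint64_t flags)
{
    return (pfn << TOY_PFN_SHIFT) | (flags & TOY_FLAG_MASK);
}

static uint64_t toy_pte_pfn(uint64_t pte)
{
    return pte >> TOY_PFN_SHIFT;
}

static uint64_t toy_pte_flags(uint64_t pte)
{
    return pte & TOY_FLAG_MASK;
}

/*
 * Return how many entries, starting at ptes[0], can be handled as one
 * batch: the first entry must be present, and every following entry must
 * map the next PFN with identical flag bits. Returns 0 if ptes[0] is not
 * present or max_nr is 0.
 */
static size_t toy_pte_batch(const uint64_t *ptes, size_t max_nr)
{
    size_t nr;

    assert(ptes != NULL);
    if (max_nr == 0 || !(toy_pte_flags(ptes[0]) & TOY_PRESENT))
        return 0;

    for (nr = 1; nr < max_nr; nr++) {
        if (toy_pte_pfn(ptes[nr]) != toy_pte_pfn(ptes[0]) + nr)
            break;  /* not the next consecutive page */
        if (toy_pte_flags(ptes[nr]) != toy_pte_flags(ptes[0]))
            break;  /* other PTE bits differ */
    }
    return nr;
}
```

A caller would then clear or process `nr` entries in one step instead of looping one PTE at a time, which is where the saving comes from.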

[PATCH v1 8/9] mm/mmu_gather: add tlb_remove_tlb_entries()

2024-01-29 Thread David Hildenbrand
Let's add a helper that lets us batch-process multiple consecutive PTEs. Note that the loop will get optimized out on all architectures except on powerpc. We have to add an early define of __tlb_remove_tlb_entry() on ppc to make the compiler happy (and avoid making tlb_remove_tlb_entries() a

[PATCH v1 7/9] mm/mmu_gather: add __tlb_remove_folio_pages()

2024-01-29 Thread David Hildenbrand
Add __tlb_remove_folio_pages(), which will remove multiple consecutive pages that belong to the same large folio, instead of only a single page. We'll be using this function when optimizing unmapping/zapping of large folios that are mapped by PTEs. We're using the remaining spare bit in an

[PATCH v1 6/9] mm/mmu_gather: define ENCODED_PAGE_FLAG_DELAY_RMAP

2024-01-29 Thread David Hildenbrand
Nowadays, encoded pages are only used in mmu_gather handling. Let's update the documentation, and define ENCODED_PAGE_BIT_DELAY_RMAP. While at it, rename ENCODE_PAGE_BITS to ENCODED_PAGE_BITS. If encoded page pointers would ever be used in other context again, we'd likely want to change the

[PATCH v1 5/9] mm/mmu_gather: pass "delay_rmap" instead of encoded page to __tlb_remove_page_size()

2024-01-29 Thread David Hildenbrand
We have two bits available in the encoded page pointer to store additional information. Currently, we use one bit to request delay of the rmap removal until after a TLB flush. We want to make use of the remaining bit internally for batching of multiple pages of the same folio, specifying that the
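The idea of keeping two flag bits in an otherwise aligned pointer can be sketched in plain C as below. This is an illustrative user-space model only: `struct toy_page`, the `TOY_TAG_*` macros and the `toy_*` helpers are hypothetical names, and the sketch assumes the pointed-to object is at least 4-byte aligned so that the two low bits are always zero.

```c
#include <assert.h>
#include <stdint.h>

/* Two low bits of a 4-byte-aligned pointer are free for tags. */
#define TOY_TAG_MASK       ((uintptr_t)3)
#define TOY_TAG_DELAY_RMAP ((uintptr_t)1)  /* models the bit already in use */
#define TOY_TAG_SPARE      ((uintptr_t)2)  /* models the remaining spare bit */

/* Hypothetical stand-in for a page descriptor; forced to 4-byte alignment. */
struct toy_page {
    _Alignas(4) char data[8];
};

/* Pack tag bits into the low bits of the pointer. */
static uintptr_t toy_encode(struct toy_page *page, uintptr_t tags)
{
    assert(((uintptr_t)page & TOY_TAG_MASK) == 0);  /* pointer is aligned */
    assert((tags & ~TOY_TAG_MASK) == 0);            /* only two tag bits */
    return (uintptr_t)page | tags;
}

/* Recover the original pointer by masking the tag bits off. */
static struct toy_page *toy_decode_ptr(uintptr_t encoded)
{
    return (struct toy_page *)(encoded & ~TOY_TAG_MASK);
}

/* Recover the tag bits. */
static uintptr_t toy_decode_tags(uintptr_t encoded)
{
    return encoded & TOY_TAG_MASK;
}
```

Because the tags travel inside the pointer value, no extra storage per entry is needed; the cost is that every user has to mask before dereferencing.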

[PATCH v1 4/9] mm/memory: factor out zapping folio pte into zap_present_folio_pte()

2024-01-29 Thread David Hildenbrand
Let's prepare for further changes by factoring it out into a separate function. Signed-off-by: David Hildenbrand --- mm/memory.c | 53 - 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index

[PATCH v1 3/9] mm/memory: further separate anon and pagecache folio handling in zap_present_pte()

2024-01-29 Thread David Hildenbrand
We don't need up-to-date accessed-dirty information for anon folios and can simply work with the ptent we already have. Also, we know the RSS counter we want to update. We can safely move arch_check_zapped_pte() + tlb_remove_tlb_entry() + zap_install_uffd_wp_if_needed() after updating the folio

[PATCH v1 2/9] mm/memory: handle !page case in zap_present_pte() separately

2024-01-29 Thread David Hildenbrand
We don't need uptodate accessed/dirty bits, so in theory we could replace ptep_get_and_clear_full() by an optimized ptep_clear_full() function. Let's rely on the provided pte. Further, there is no scenario where we would have to insert uffd-wp markers when zapping something that is not a normal

[PATCH v1 1/9] mm/memory: factor out zapping of present pte into zap_present_pte()

2024-01-29 Thread David Hildenbrand
Let's prepare for further changes by factoring out processing of present PTEs. Signed-off-by: David Hildenbrand --- mm/memory.c | 92 ++--- 1 file changed, 52 insertions(+), 40 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index

[PATCH v1 0/9] mm/memory: optimize unmap/zap with PTE-mapped THP

2024-01-29 Thread David Hildenbrand
This series is based on [1] and must be applied on top of it. Similar to what we did with fork(), let's implement PTE batching during unmap/zap when processing PTE-mapped THPs. We collect consecutive PTEs that map consecutive pages of the same large folio, making sure that the other PTE bits are

[PATCH v10 6/6] arm64: introduce copy_mc_to_kernel() implementation

2024-01-29 Thread Tong Tiangen
The copy_mc_to_kernel() helper is a memory copy implementation that handles source exceptions. It can be used in memory copy scenarios that tolerate hardware memory errors (e.g., pmem_read/dax_copy_to_iter). Currently, only x86 and ppc support this helper, after arm64 support machine check safe

[PATCH v10 0/6]arm64: add machine check safe support

2024-01-29 Thread Tong Tiangen
With the increase of memory capacity and density, the probability of memory error also increases. The increasing size and density of server RAM in data centers and clouds have shown increased uncorrectable memory errors. Currently, more and more scenarios that can tolerate memory errors, such as

[PATCH v10 2/6] arm64: add support for machine check error safe

2024-01-29 Thread Tong Tiangen
For the arm64 kernel, when it processes hardware memory errors for synchronous notifications (do_sea()), if the error is consumed within the kernel, the current handling is to panic. However, this is not optimal. Take uaccess for example: if the uaccess operation fails due to a memory error, only the

[PATCH v10 4/6] mm/hwpoison: return -EFAULT when copy fail in copy_mc_[user]_highpage()

2024-01-29 Thread Tong Tiangen
If hardware errors are encountered during page copying, returning the bytes not copied is not meaningful, and the caller cannot do any processing on the remaining data. Returning -EFAULT is more reasonable, which represents a hardware error encountered during the copying. Signed-off-by: Tong
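The change in return convention described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the patch itself: `toy_copy_mc()` and `toy_copy_mc_page()` are hypothetical functions, and the hardware error is simulated by a `poison_off` parameter that marks the first unreadable byte.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/*
 * Old-style convention: copy until the simulated bad byte at poison_off
 * (pass a value >= len for "no error") and return the number of bytes
 * NOT copied.
 */
static size_t toy_copy_mc(void *dst, const void *src, size_t len,
                          size_t poison_off)
{
    size_t ok = poison_off < len ? poison_off : len;

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, ok);
    return len - ok;
}

/*
 * Page-level wrapper: a partially copied page is of no use to the
 * caller, so collapse any shortfall into a single -EFAULT result.
 */
static int toy_copy_mc_page(void *dst, const void *src, size_t poison_off)
{
    return toy_copy_mc(dst, src, TOY_PAGE_SIZE, poison_off) ? -EFAULT : 0;
}
```

With this shape the caller only has to test for a non-zero return and can treat the destination page as invalid, rather than reasoning about how many bytes of it happen to be usable.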

[PATCH v10 3/6] arm64: add uaccess to machine check safe

2024-01-29 Thread Tong Tiangen
If a user process's memory access fails due to a hardware memory error, only the relevant processes are affected, so it is more reasonable to kill the user process and isolate the corrupt page than to panic the kernel. Signed-off-by: Tong Tiangen --- arch/arm64/lib/copy_from_user.S | 10 +-

[PATCH v10 5/6] arm64: support copy_mc_[user]_highpage()

2024-01-29 Thread Tong Tiangen
Currently, many scenarios that can tolerate memory errors when copying a page have been supported in the kernel[1][2][3], all of which are implemented by copy_mc_[user]_highpage(). arm64 should also support this mechanism. Due to MTE, arm64 needs to have its own copy_mc_[user]_highpage()

[PATCH v10 1/6] uaccess: add generic fallback version of copy_mc_to_user()

2024-01-29 Thread Tong Tiangen
x86/powerpc has its own implementation of copy_mc_to_user(); we add a generic fallback in include/linux/uaccess.h to prepare for other architectures to enable CONFIG_ARCH_HAS_COPY_MC. Signed-off-by: Tong Tiangen Acked-by: Michael Ellerman --- arch/powerpc/include/asm/uaccess.h | 1 +

[PATCH linux-next 3/3] arch, crash: move arch_crash_save_vmcoreinfo() out to file vmcore_info.c

2024-01-29 Thread Baoquan He
Nathan reported below building error: = $ curl -LSso .config https://git.alpinelinux.org/aports/plain/community/linux-edge/config-edge.armv7 $ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- olddefconfig all ... arm-linux-gnueabi-ld: arch/arm/kernel/machine_kexec.o: in function

[PATCH linux-next 2/3] crash: fix building error in generic codes

2024-01-29 Thread Baoquan He
Nathan reported some building errors on arm64 as below: == $ curl -LSso .config https://github.com/archlinuxarm/PKGBUILDs/raw/master/core/linux-aarch64/config $ make -skj"$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux- olddefconfig all ... aarch64-linux-ld: kernel/kexec_file.o: in

[PATCH linux-next 1/3] x86, crash: don't nest CONFIG_CRASH_DUMP ifdef inside CONFIG_KEXEC_CODE ifdef scope

2024-01-29 Thread Baoquan He
Michael pointed out that the #ifdef CONFIG_CRASH_DUMP is nested inside arch/x86/xen/enlighten_hvm.c. Although the nesting works well too since CONFIG_CRASH_DUMP has dependency on CONFIG_KEXEC_CORE, it may cause confusion because there are places where it's not nested, and people may think it need

[PATCH] MAINTAINERS: adjust file entries after crypto vmx file movement

2024-01-29 Thread Lukas Bulwahn
Commit 109303336a0c ("crypto: vmx - Move to arch/powerpc/crypto") moves the crypto vmx files to arch/powerpc, but fails to adjust the file entries for IBM Power VMX Cryptographic instructions and LINUX FOR POWERPC. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about broken

[PATCH v3 15/15] mm/memory: ignore writable bit in folio_pte_batch()

2024-01-29 Thread David Hildenbrand
... and conditionally return to the caller if any PTE except the first one is writable. fork() has to make sure to properly write-protect in case any PTE is writable. Other users (e.g., page unmapping) are expected to not care. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand ---

[PATCH v3 14/15] mm/memory: ignore dirty/accessed/soft-dirty bits in folio_pte_batch()

2024-01-29 Thread David Hildenbrand
Let's always ignore the accessed/young bit: we'll always mark the PTE as old in our child process during fork, and upcoming users will similarly not care. Ignore the dirty bit only if we don't want to duplicate the dirty bit into the child process during fork. Maybe, we could just set all PTEs in

[PATCH v3 12/15] mm/memory: pass PTE to copy_present_pte()

2024-01-29 Thread David Hildenbrand
We already read it, let's just forward it. This patch is based on work by Ryan Roberts. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- mm/memory.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index

[PATCH v3 11/15] mm/memory: factor out copying the actual PTE in copy_present_pte()

2024-01-29 Thread David Hildenbrand
Let's prepare for further changes. Reviewed-by: Ryan Roberts Signed-off-by: David Hildenbrand --- mm/memory.c | 63 - 1 file changed, 33 insertions(+), 30 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 8d14ba440929..a3bdb25f4c8d

[PATCH v3 10/15] powerpc/mm: use pte_next_pfn() in set_ptes()

2024-01-29 Thread David Hildenbrand
Let's use our handy new helper. Note that the implementation is slightly different, but shouldn't really make a difference in practice. Reviewed-by: Christophe Leroy Signed-off-by: David Hildenbrand --- arch/powerpc/mm/pgtable.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff

[PATCH v3 09/15] arm/mm: use pte_next_pfn() in set_ptes()

2024-01-29 Thread David Hildenbrand
Let's use our handy helper now that it's available on all archs. Signed-off-by: David Hildenbrand --- arch/arm/mm/mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index 674ed71573a8..c24e29c0b9a4 100644 --- a/arch/arm/mm/mmu.c +++

[PATCH v3 08/15] mm/pgtable: make pte_next_pfn() independent of set_ptes()

2024-01-29 Thread David Hildenbrand
Let's provide pte_next_pfn(), independently of set_ptes(). This allows for using the generic pte_next_pfn() version in some arch-specific set_ptes() implementations, and prepares for reusing pte_next_pfn() in other context. Reviewed-by: Christophe Leroy Signed-off-by: David Hildenbrand ---

[PATCH v3 07/15] sparc/pgtable: define PFN_PTE_SHIFT

2024-01-29 Thread David Hildenbrand
We want to make use of pte_next_pfn() outside of set_ptes(). Let's simply define PFN_PTE_SHIFT, required by pte_next_pfn(). Signed-off-by: David Hildenbrand --- arch/sparc/include/asm/pgtable_64.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/sparc/include/asm/pgtable_64.h

[PATCH v3 06/15] s390/pgtable: define PFN_PTE_SHIFT

2024-01-29 Thread David Hildenbrand
We want to make use of pte_next_pfn() outside of set_ptes(). Let's simply define PFN_PTE_SHIFT, required by pte_next_pfn(). Signed-off-by: David Hildenbrand --- arch/s390/include/asm/pgtable.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/s390/include/asm/pgtable.h

[PATCH v3 05/15] riscv/pgtable: define PFN_PTE_SHIFT

2024-01-29 Thread David Hildenbrand
We want to make use of pte_next_pfn() outside of set_ptes(). Let's simply define PFN_PTE_SHIFT, required by pte_next_pfn(). Reviewed-by: Alexandre Ghiti Signed-off-by: David Hildenbrand --- arch/riscv/include/asm/pgtable.h | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH v3 04/15] powerpc/pgtable: define PFN_PTE_SHIFT

2024-01-29 Thread David Hildenbrand
We want to make use of pte_next_pfn() outside of set_ptes(). Let's simply define PFN_PTE_SHIFT, required by pte_next_pfn(). Reviewed-by: Christophe Leroy Signed-off-by: David Hildenbrand --- arch/powerpc/include/asm/pgtable.h | 2 ++ 1 file changed, 2 insertions(+) diff --git

[PATCH v3 03/15] nios2/pgtable: define PFN_PTE_SHIFT

2024-01-29 Thread David Hildenbrand
We want to make use of pte_next_pfn() outside of set_ptes(). Let's simply define PFN_PTE_SHIFT, required by pte_next_pfn(). Signed-off-by: David Hildenbrand --- arch/nios2/include/asm/pgtable.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/nios2/include/asm/pgtable.h

[PATCH v3 02/15] arm/pgtable: define PFN_PTE_SHIFT

2024-01-29 Thread David Hildenbrand
We want to make use of pte_next_pfn() outside of set_ptes(). Let's simply define PFN_PTE_SHIFT, required by pte_next_pfn(). Signed-off-by: David Hildenbrand --- arch/arm/include/asm/pgtable.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm/include/asm/pgtable.h

[PATCH v3 01/15] arm64/mm: Make set_ptes() robust when OAs cross 48-bit boundary

2024-01-29 Thread David Hildenbrand
From: Ryan Roberts Since the high bits [51:48] of an OA are not stored contiguously in the PTE, there is a theoretical bug in set_ptes(), which just adds PAGE_SIZE to the pte to get the pte with the next pfn. This works until the pfn crosses the 48-bit boundary, at which point we overflow into

[PATCH v3 00/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-29 Thread David Hildenbrand
Now that the rmap overhaul[1] is upstream that provides a clean interface for rmap batching, let's implement PTE batching during fork when processing PTE-mapped THPs. This series is partially based on Ryan's previous work[2] to implement cont-pte support on arm64, but it's a complete rewrite based

[PATCH] perf/pmu-events/powerpc: Update json mapfile with Power11 PVR

2024-01-29 Thread Madhavan Srinivasan
Add the Power11 PVR to the json mapfile to enable json events. Power11 is PowerISA v3.1 compliant and supports Power10 events. Signed-off-by: Madhavan Srinivasan --- tools/perf/pmu-events/arch/powerpc/mapfile.csv | 1 + 1 file changed, 1 insertion(+) diff --git

Re: [PATCH v2 14/15] mm/memory: ignore dirty/accessed/soft-dirty bits in folio_pte_batch()

2024-01-29 Thread Ryan Roberts
On 25/01/2024 19:32, David Hildenbrand wrote: > Let's always ignore the accessed/young bit: we'll always mark the PTE > as old in our child process during fork, and upcoming users will > similarly not care. > > Ignore the dirty bit only if we don't want to duplicate the dirty bit > into the child

Re: Re: [PATCH] KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'

2024-01-29 Thread Amit Machhiwal
Hi Aneesh, Thanks for looking into the patch. My comments are inline below. On 2024/01/24 01:06 PM, Aneesh Kumar K.V wrote: > Amit Machhiwal writes: > > > Currently, rebooting a pseries nested qemu-kvm guest (L2) results in > > below error as L1 qemu sends PVR value 'arch_compat' == 0 via > >

Re: [PATCH v2 13/15] mm/memory: optimize fork() with PTE-mapped THP

2024-01-29 Thread Ryan Roberts
On 25/01/2024 19:32, David Hildenbrand wrote: > Let's implement PTE batching when consecutive (present) PTEs map > consecutive pages of the same large folio, and all other PTE bits besides > the PFNs are equal. > > We will optimize folio_pte_batch() separately, to ignore selected > PTE bits. This

Re: [PATCH 5/5] sched/vtime: do not include header

2024-01-29 Thread Heiko Carstens
On Sun, Jan 28, 2024 at 08:58:54PM +0100, Alexander Gordeev wrote: > There is no architecture-specific code or data left > that generic needs to know about. > Thus, avoid the inclusion of header. > > Signed-off-by: Alexander Gordeev > --- > include/asm-generic/vtime.h | 1 - >

Re: [PATCH 4/5] s390/irq,nmi: do not include header

2024-01-29 Thread Heiko Carstens
On Sun, Jan 28, 2024 at 08:58:53PM +0100, Alexander Gordeev wrote: > update_timer_sys() and update_timer_mcck() are inlines used for > CPU time accounting from the interrupt and machine-check handlers. > These routines are specific to s390 architecture, but declared > via header, which in turn

Re: [PATCH 3/5] s390/vtime: remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover

2024-01-29 Thread Heiko Carstens
On Sun, Jan 28, 2024 at 08:58:52PM +0100, Alexander Gordeev wrote: > __ARCH_HAS_VTIME_TASK_SWITCH macro is not used anymore. > > Signed-off-by: Alexander Gordeev > --- > arch/s390/include/asm/vtime.h | 2 -- > 1 file changed, 2 deletions(-) Acked-by: Heiko Carstens

Re: [PATCH] mm/debug_vm_pgtable: Fix BUG_ON with pud advanced test

2024-01-29 Thread Aneesh Kumar K.V
On 1/29/24 12:23 PM, Anshuman Khandual wrote: > > > On 1/29/24 11:56, Aneesh Kumar K.V wrote: >> On 1/29/24 11:52 AM, Anshuman Khandual wrote: >>> >>> >>> On 1/29/24 11:30, Aneesh Kumar K.V (IBM) wrote: Architectures like powerpc add debug checks to ensure we find only devmap PUD pte