Re: [f2fs-dev] [PATCH v2 00/89] fs: new accessors for inode->i_ctime
Hello:

This series was applied to jaegeuk/f2fs.git (dev) by Christian Brauner:

On Wed, 5 Jul 2023 14:58:09 -0400 you wrote:
> v2:
> - prepend patches to add missing ctime updates
> - add simple_rename_timestamp helper function
> - rename ctime accessor functions as inode_get_ctime/inode_set_ctime_*
> - drop individual inode_ctime_set_{sec,nsec} helpers
>
> I've been working on a patchset to change how the inode->i_ctime is
> accessed in order to give us conditional, high-res timestamps for the
> ctime and mtime. struct timespec64 has unused bits in it that we can use
> to implement this. In order to do that however, we need to wrap all
> accesses of inode->i_ctime to ensure that bits used as flags are
> appropriately handled.
>
> [...]

Here is the summary with links:
  - [f2fs-dev,v2,07/92] fs: add ctime accessors infrastructure
    https://git.kernel.org/jaegeuk/f2fs/c/9b6304c1d537
  - [f2fs-dev,v2,08/92] fs: new helper: simple_rename_timestamp
    https://git.kernel.org/jaegeuk/f2fs/c/0c4767923ed6
  - [f2fs-dev,v2,92/92] fs: rename i_ctime field to __i_ctime
    https://git.kernel.org/jaegeuk/f2fs/c/13bc24457850

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta
Hi,

On 9/4/2023 3:40 PM, Philippe Mathieu-Daudé wrote:
> Hi,
>
> On 1/9/23 04:42, Yuan Tan wrote:
>> MIPS Malta's power off depends on PCI, PCI_QUIRKS, and
>> POWER_RESET_PIIX4_POWEROFF to work. Enable them when POWER_RESET is
>> set for convenience.
>>
>> Suggested-by: Zhangjin Wu
>> Signed-off-by: Yuan Tan
>> ---
>>  arch/mips/Kconfig | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
>> index bc8421859006..13bacbd05125 100644
>> --- a/arch/mips/Kconfig
>> +++ b/arch/mips/Kconfig
>> @@ -547,6 +547,9 @@ config MIPS_MALTA
>>  	select MIPS_L1_CACHE_SHIFT_6
>>  	select MIPS_MSC
>>  	select PCI_GT64XXX_PCI0
>> +	select PCI if POWER_RESET
>> +	select PCI_QUIRKS if POWER_RESET
>> +	select POWER_RESET_PIIX4_POWEROFF if POWER_RESET
>>  	select SMP_UP if SMP
>>  	select SWAP_IO_SPACE
>>  	select SYS_HAS_CPU_MIPS32_R1
>
> Shouldn't we also update the _defconfig files?

Sorry, in my last email, I forgot to reply to all, so I am now resending
this email.

In malta_defconfig, PCI and POWER_RESET_PIIX4_POWEROFF have already been
set, and PCI_QUIRKS is also selected by FSL_PCI [=n]. So shutdown and
reboot with malta_defconfig are working and there is no need to update
malta_defconfig.
Re: [PATCH v7 3/8] KVM: Make __kvm_follow_pfn not imply FOLL_GET
On Wed, Sep 6, 2023 at 9:45 AM Sean Christopherson wrote:
>
> On Tue, Sep 05, 2023, David Stevens wrote:
> > On Wed, Jul 12, 2023 at 7:00 AM Sean Christopherson wrote:
> > >
> > > On Tue, Jul 11, 2023, Zhi Wang wrote:
> > > > On Thu, 6 Jul 2023 15:49:39 +0900
> > > > David Stevens wrote:
> > > >
> > > > > On Wed, Jul 5, 2023 at 10:19 PM Zhi Wang wrote:
> > > > > >
> > > > > > On Tue, 4 Jul 2023 16:50:48 +0900
> > > > > > David Stevens wrote:
> > > > > > If yes, do we have to use FOLL_GET to resolve GFN associated with a
> > > > > > tail page?
> > > > > > It seems gup can tolerate gup_flags without FOLL_GET, but it is more like a
> > > > > > temporary solution. I don't think it is a good idea to play tricks with
> > > > > > a temporary solution, more like we are abusing the toleration.
> > > > >
> > > > > I'm not sure I understand what you're getting at. This series never
> > > > > calls gup without FOLL_GET.
> > > > >
> > > > > This series aims to provide kvm_follow_pfn as a unified API on top of
> > > > > gup+follow_pte. Since one of the major clients of this API uses an mmu
> > > > > notifier, it makes sense to support returning a pfn without taking a
> > > > > reference. And we indeed need to do that for certain types of memory.
> > > > >
> > > >
> > > > I am not having a problem with taking a pfn without taking a ref. I am
> > > > questioning if using !FOLL_GET in struct kvm_follow_pfn to indicate taking
> > > > a pfn without a ref is a good idea or not, while there is another flag
> > > > actually showing it.
> > > >
> > > > I can understand that using FOLL_XXX in kvm_follow_pfn saves some
> > > > translation between struct kvm_follow_pfn.{write, async, } and GUP
> > > > flags. However FOLL_XXX is for GUP. Using FOLL_XXX for reflecting the
> > > > requirements of GUP in the code path that is going to call GUP is
> > > > reasonable.
> > > >
> > > > But using FOLL_XXX with purposes that are not related to the GUP call
> > > > really feels off.
> > >
> > > I agree, assuming you're talking specifically about the logic in
> > > hva_to_pfn_remapped() that handles non-refcounted pages, i.e. this
> > >
> > >	if (get_page_unless_zero(page)) {
> > >		foll->is_refcounted_page = true;
> > >		if (!(foll->flags & FOLL_GET))
> > >			put_page(page);
> > >	} else if (foll->flags & FOLL_GET) {
> > >		r = -EFAULT;
> > >	}
> > >
> > > should be
> > >
> > >	if (get_page_unless_zero(page)) {
> > >		foll->is_refcounted_page = true;
> > >		if (!(foll->flags & FOLL_GET))
> > >			put_page(page);
> > >	} else if (!foll->guarded_by_mmu_notifier) {
> > >		r = -EFAULT;
> > >	}
> > >
> > > because it's not the desire to grab a reference that makes getting
> > > non-refcounted pfns "safe", it's whether or not the caller is plugged
> > > into the MMU notifiers.
> > >
> > > Though that highlights that checking guarded_by_mmu_notifier should be
> > > done for *all* non-refcounted pfns, not just non-refcounted struct page
> > > memory.
> >
> > I think things are getting confused here because there are multiple
> > things which "safe" refers to. There are three different definitions
> > that I think are relevant here:
> >
> > 1) "safe" in the sense that KVM doesn't corrupt page reference counts
> > 2) "safe" in the sense that KVM doesn't access pfns after they have been freed
> > 3) "safe" in the sense that KVM doesn't use stale hva -> pfn translations
> >
> > For property 1, FOLL_GET is important. If the caller passes FOLL_GET,
> > then they expect to be able to pass the returned pfn to
> > kvm_release_pfn. This means that when FOLL_GET is set, if
> > kvm_pfn_to_refcounted_page returns a page, then hva_to_pfn_remapped
> > must take a reference count to avoid eventually corrupting the page
> > ref count. I guess replacing the FOLL_GET check with
> > !guarded_by_mmu_notifier is logically equivalent because
> > __kvm_follow_pfn requires that at least one of guarded_by_mmu_notifier
> > and FOLL_GET is set. But since we're concerned about a property of the
> > refcount, I think that checking FOLL_GET is clearer.
> >
> > For property 2, FOLL_GET is also important. If guarded_by_mmu_notifier
> > is set, then we're all good here. If guarded_by_mmu_notifier is not
> > set, then the check in __kvm_follow_pfn guarantees that FOLL_GET is
> > set. For struct page memory, we're safe because KVM will hold a
> > reference as long as it's still using the page. For non struct page
> > memory, we're not safe - this is where the breaking change of
> > allow_unsafe_mappings would go. Note that for non-refcounted struct
> > page, we can't use the allow_unsafe_mappings escape hatch. Since
> > FOLL_GET was requested, if we returned such a page, then the caller
> > would eventually corrupt the page refcount via kvm_release_pfn.
Re: [PATCH v7 3/8] KVM: Make __kvm_follow_pfn not imply FOLL_GET
On Tue, Sep 05, 2023, David Stevens wrote:
> On Wed, Jul 12, 2023 at 7:00 AM Sean Christopherson wrote:
> >
> > On Tue, Jul 11, 2023, Zhi Wang wrote:
> > > On Thu, 6 Jul 2023 15:49:39 +0900
> > > David Stevens wrote:
> > >
> > > > On Wed, Jul 5, 2023 at 10:19 PM Zhi Wang wrote:
> > > > >
> > > > > On Tue, 4 Jul 2023 16:50:48 +0900
> > > > > David Stevens wrote:
> > > > > If yes, do we have to use FOLL_GET to resolve GFN associated with a
> > > > > tail page?
> > > > > It seems gup can tolerate gup_flags without FOLL_GET, but it is more like a
> > > > > temporary solution. I don't think it is a good idea to play tricks with
> > > > > a temporary solution, more like we are abusing the toleration.
> > > >
> > > > I'm not sure I understand what you're getting at. This series never
> > > > calls gup without FOLL_GET.
> > > >
> > > > This series aims to provide kvm_follow_pfn as a unified API on top of
> > > > gup+follow_pte. Since one of the major clients of this API uses an mmu
> > > > notifier, it makes sense to support returning a pfn without taking a
> > > > reference. And we indeed need to do that for certain types of memory.
> > > >
> > >
> > > I am not having a problem with taking a pfn without taking a ref. I am
> > > questioning if using !FOLL_GET in struct kvm_follow_pfn to indicate taking
> > > a pfn without a ref is a good idea or not, while there is another flag
> > > actually showing it.
> > >
> > > I can understand that using FOLL_XXX in kvm_follow_pfn saves some
> > > translation between struct kvm_follow_pfn.{write, async, } and GUP
> > > flags. However FOLL_XXX is for GUP. Using FOLL_XXX for reflecting the
> > > requirements of GUP in the code path that is going to call GUP is
> > > reasonable.
> > >
> > > But using FOLL_XXX with purposes that are not related to the GUP call
> > > really feels off.
> >
> > I agree, assuming you're talking specifically about the logic in
> > hva_to_pfn_remapped() that handles non-refcounted pages, i.e. this
> >
> >	if (get_page_unless_zero(page)) {
> >		foll->is_refcounted_page = true;
> >		if (!(foll->flags & FOLL_GET))
> >			put_page(page);
> >	} else if (foll->flags & FOLL_GET) {
> >		r = -EFAULT;
> >	}
> >
> > should be
> >
> >	if (get_page_unless_zero(page)) {
> >		foll->is_refcounted_page = true;
> >		if (!(foll->flags & FOLL_GET))
> >			put_page(page);
> >	} else if (!foll->guarded_by_mmu_notifier) {
> >		r = -EFAULT;
> >	}
> >
> > because it's not the desire to grab a reference that makes getting
> > non-refcounted pfns "safe", it's whether or not the caller is plugged
> > into the MMU notifiers.
> >
> > Though that highlights that checking guarded_by_mmu_notifier should be
> > done for *all* non-refcounted pfns, not just non-refcounted struct page
> > memory.
>
> I think things are getting confused here because there are multiple
> things which "safe" refers to. There are three different definitions
> that I think are relevant here:
>
> 1) "safe" in the sense that KVM doesn't corrupt page reference counts
> 2) "safe" in the sense that KVM doesn't access pfns after they have been freed
> 3) "safe" in the sense that KVM doesn't use stale hva -> pfn translations
>
> For property 1, FOLL_GET is important. If the caller passes FOLL_GET,
> then they expect to be able to pass the returned pfn to
> kvm_release_pfn. This means that when FOLL_GET is set, if
> kvm_pfn_to_refcounted_page returns a page, then hva_to_pfn_remapped
> must take a reference count to avoid eventually corrupting the page
> ref count. I guess replacing the FOLL_GET check with
> !guarded_by_mmu_notifier is logically equivalent because
> __kvm_follow_pfn requires that at least one of guarded_by_mmu_notifier
> and FOLL_GET is set. But since we're concerned about a property of the
> refcount, I think that checking FOLL_GET is clearer.
>
> For property 2, FOLL_GET is also important. If guarded_by_mmu_notifier
> is set, then we're all good here. If guarded_by_mmu_notifier is not
> set, then the check in __kvm_follow_pfn guarantees that FOLL_GET is
> set. For struct page memory, we're safe because KVM will hold a
> reference as long as it's still using the page. For non struct page
> memory, we're not safe - this is where the breaking change of
> allow_unsafe_mappings would go. Note that for non-refcounted struct
> page, we can't use the allow_unsafe_mappings escape hatch. Since
> FOLL_GET was requested, if we returned such a page, then the caller
> would eventually corrupt the page refcount via kvm_release_pfn.

Yes we can. The caller simply needs to be made aware of
is_refcounted_page. I didn't include that in the snippet below because
I didn't want to write the entire patch. The whole point of adding
is_refcounted_page is so that callers can identify exactly what type
of page was at the
Re: [PATCH gmem FIXUP] mm, compaction: make testing mapping_unmovable() safe
On Fri, Sep 01, 2023, Vlastimil Babka wrote:
> As Kirill pointed out, mapping can be removed under us due to
> truncation. Test it under folio lock as already done for the async
> compaction / dirty folio case. To prevent locking every folio with
> mapping to do the test, do it only for unevictable folios, as we can
> expect the unmovable mapping folios are also unevictable - it is the
> case for guest memfd folios.

Rather than expect/assume that unmovable mappings are always
unevictable, how about requiring that? E.g. either through a VM_WARN_ON
in mapping_set_unmovable(), or by simply having that helper forcefully
set AS_UNEVICTABLE as well.
[PATCH v2 2/3] vmcore: allow fadump to export vmcore even if is_kdump_kernel() is false
Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
While elfcorehdr_addr is set for the kexec based kernel dump mechanism,
alternate dump capturing methods like fadump [1] also set it to export
the vmcore. Since is_kdump_kernel() is used to restrict resources in the
crash dump capture kernel, and such restrictions are not desirable for
fadump, allow is_kdump_kernel() to be defined differently for the fadump
case. With that change, include an is_fadump_active() check in
is_vmcore_usable() & vmcore_unusable() to be able to export the vmcore
for the fadump case too.

[1] https://docs.kernel.org/powerpc/firmware-assisted-dump.html

Signed-off-by: Hari Bathini
---

Changes in v2:
* is_fadump_active() check added to is_vmcore_usable() as suggested by Baoquan.

 include/linux/crash_dump.h | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 0f3a656293b0..de8a9fabfb6f 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -50,6 +50,7 @@ void vmcore_cleanup(void);
 #define vmcore_elf64_check_arch(x) (elf_check_arch(x) || vmcore_elf_check_arch_cross(x))
 #endif
 
+#ifndef is_kdump_kernel
 /*
  * is_kdump_kernel() checks whether this kernel is booting after a panic of
  * previous kernel or not. This is determined by checking if previous kernel
@@ -64,6 +65,19 @@ static inline bool is_kdump_kernel(void)
 {
 	return elfcorehdr_addr != ELFCORE_ADDR_MAX;
 }
+#endif
+
+#ifndef is_fadump_active
+/*
+ * If f/w assisted dump capturing mechanism (fadump), instead of kexec based
+ * dump capturing mechanism (kdump), is exporting the vmcore, then this function
+ * will be defined in arch specific code to return true, when appropriate.
+ */
+static inline bool is_fadump_active(void)
+{
+	return false;
+}
+#endif
 
 /* is_vmcore_usable() checks if the kernel is booting after a panic and
  * the vmcore region is usable.
@@ -75,7 +89,8 @@ static inline bool is_kdump_kernel(void)
 static inline int is_vmcore_usable(void)
 {
-	return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
+	return (is_kdump_kernel() || is_fadump_active())
+		&& elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
 }
 
 /* vmcore_unusable() marks the vmcore as unusable,
@@ -84,7 +99,7 @@ static inline int is_vmcore_usable(void)
 static inline void vmcore_unusable(void)
 {
-	if (is_kdump_kernel())
+	if (is_kdump_kernel() || is_fadump_active())
 		elfcorehdr_addr = ELFCORE_ADDR_ERR;
 }
-- 
2.41.0
Re: [PATCH 1/2] vmcore: allow alternate dump capturing methods to export vmcore without is_kdump_kernel()
On 05/09/23 8:00 am, Baoquan He wrote:
> On 09/04/23 at 08:04pm, Hari Bathini wrote:
>> Hi Baoquan,
>>
>> Thanks for the review...
>>
>> On 03/09/23 9:06 am, Baoquan He wrote:
>>> Hi Hari,
>>>
>>> On 09/02/23 at 12:34am, Hari Bathini wrote:
>>>> Currently, is_kdump_kernel() returns true when elfcorehdr_addr is
>>>> set. While elfcorehdr_addr is set for the kexec based kernel dump
>>>> mechanism, alternate dump capturing methods like fadump [1] also set
>>>> it to export the vmcore. is_kdump_kernel() is used to restrict
>>>> resources in the crash dump capture kernel, but such restrictions may
>>>> not be desirable for fadump. Allow is_kdump_kernel() to be defined
>>>> differently for such scenarios. With this, is_kdump_kernel() could be
>>>> false while vmcore is usable. So, introduce is_crashdump_kernel() to
>>>> return true when elfcorehdr_addr is set and use it for vmcore related
>>>> checks.
>>>
>>> I got what is done in these two patches, but didn't get why they need
>>> to be done. vmcore_unusable()/is_vmcore_usable() are only utilized in
>>> ia64. Why do you care if it's is_crashdump_kernel() or
>>> is_kdump_kernel()? If you want to override the generic
>>> is_kdump_kernel() with powerpc's own is_kdump_kernel(), your below
>>> change is enough to allow you to do that. I can't see why
>>> is_crashdump_kernel() is needed. Could you explain that specifically?
>>
>> You mean to just remove the is_kdump_kernel() check in
>> is_vmcore_usable() & vmcore_unusable() functions? I replaced it with
>> the generic is_crashdump_kernel() function instead, which returns true
>> for any dump capturing method, irrespective of whether
>> is_kdump_kernel() returns true or false. For the fadump case,
>> is_kdump_kernel() will return false after patch 2/2.
>
> OK, I could understand what you want to achieve. You want to make
> is_kdump_kernel() only return true for kdump, while is_vmcore_usable()
> returns true for both kdump and fadump. IIUC, can we change as below?
> It could make the code clearer and more straightforward.
>
> I don't think adding another is_crashdump_kernel() is a good idea; that
> would be a torture for non-powerpc people reading the code when they
> need to differentiate between kdump and crashdump.

Sure, Baoquan. Posted v2 based on your suggestion.

Thanks
Hari
[PATCH v2 1/3] powerpc/fadump: make is_fadump_active() visible for exporting vmcore
Include asm/fadump.h in asm/kexec.h to make it visible while exporting
vmcore. Also, update is_fadump_active() to return boolean instead of
integer for better readability. The change will be used in the next
patch to ensure vmcore is exported when fadump is active.

Signed-off-by: Hari Bathini
---

Changes in v2:
* New patch based on Baoquan's suggestion to use is_fadump_active()
  instead of introducing new function is_crashdump_kernel().

 arch/powerpc/include/asm/fadump.h | 4 ++--
 arch/powerpc/include/asm/kexec.h  | 8 ++++++--
 arch/powerpc/kernel/fadump.c      | 4 ++--
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 526a6a647312..27b74a7e2162 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -15,13 +15,13 @@ extern int crashing_cpu;
 
 extern int is_fadump_memory_area(u64 addr, ulong size);
 extern int setup_fadump(void);
-extern int is_fadump_active(void);
+extern bool is_fadump_active(void);
 extern int should_fadump_crash(void);
 extern void crash_fadump(struct pt_regs *, const char *);
 extern void fadump_cleanup(void);
 
 #else /* CONFIG_FA_DUMP */
-static inline int is_fadump_active(void) { return 0; }
+static inline bool is_fadump_active(void) { return false; }
 static inline int should_fadump_crash(void) { return 0; }
 static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
 static inline void fadump_cleanup(void) { }
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index a1ddba01e7d1..b760ef459234 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -51,6 +51,7 @@
 #ifndef __ASSEMBLY__
 #include
+#include <asm/fadump.h>
 
 typedef void (*crash_shutdown_t)(void);
 
@@ -99,10 +100,13 @@ void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_co
 
 void kexec_copy_flush(struct kimage *image);
 
-#if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
+#if defined(CONFIG_CRASH_DUMP)
+#define is_fadump_active is_fadump_active
+#if defined(CONFIG_PPC_RTAS)
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
 #define crash_free_reserved_phys_range crash_free_reserved_phys_range
-#endif
+#endif /* CONFIG_PPC_RTAS */
+#endif /* CONFIG_CRASH_DUMP */
 
 #ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_elf64_ops;
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 3ff2da7b120b..5682a65e8326 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -187,9 +187,9 @@ int should_fadump_crash(void)
 	return 1;
 }
 
-int is_fadump_active(void)
+bool is_fadump_active(void)
 {
-	return fw_dump.dump_active;
+	return !!fw_dump.dump_active;
 }
 
 /*
-- 
2.41.0
[PATCH v2 3/3] powerpc/fadump: make is_kdump_kernel() return false when fadump is active
Currently, is_kdump_kernel() returns true in the crash dump capture
kernel for both kdump and fadump crash dump capturing methods, as both
these methods set elfcorehdr_addr. Some restrictions enforced for the
crash dump capture kernel, based on is_kdump_kernel(), are specifically
meant for the kdump case and not desirable for fadump - eg. IO queue
restrictions in device drivers. So, define is_kdump_kernel() to return
false when f/w assisted dump is active.

Signed-off-by: Hari Bathini
---

 arch/powerpc/include/asm/kexec.h |  2 ++
 arch/powerpc/kernel/crash_dump.c | 11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index b760ef459234..f0b9c3fd0618 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -101,6 +101,8 @@ void relocate_new_kernel(unsigned long indirection_page, unsigned long reboot_co
 void kexec_copy_flush(struct kimage *image);
 
 #if defined(CONFIG_CRASH_DUMP)
+bool is_kdump_kernel(void);
+#define is_kdump_kernel is_kdump_kernel
 #define is_fadump_active is_fadump_active
 #if defined(CONFIG_PPC_RTAS)
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index 9a3b85bfc83f..6d8e616ce3ce 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -92,6 +92,17 @@ ssize_t copy_oldmem_page(struct iov_iter *iter, unsigned long pfn,
 	return csize;
 }
 
+/*
+ * Return true only when the kexec based kernel dump capturing method is used.
+ * This ensures all restrictions applied for the kdump case are not
+ * automatically applied for the fadump case.
+ */
+bool is_kdump_kernel(void)
+{
+	return !is_fadump_active() && elfcorehdr_addr != ELFCORE_ADDR_MAX;
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
 #ifdef CONFIG_PPC_RTAS
 /*
  * The crashkernel region will almost always overlap the RTAS region, so
-- 
2.41.0
Re: [PATCH 0/4] ppc, fbdev: Clean up fbdev mmap helper
Hi

On 05.09.23 04:47, Michael Ellerman wrote:
> Thomas Zimmermann writes:
>> Refactor fb_pgprotect() in PowerPC to work without struct file. Then
>> clean up and rename fb_pgprotect(). This change has been discussed at
>> [1] in the context of refactoring fbdev's mmap code.
>>
>> The first three patches adapt PowerPC's internal interfaces to provide
>> a phys_mem_access_prot() that works without struct file. Neither the
>> architecture code nor the fbdev helpers need the parameter.
>>
>> Patch 4 replaces fbdev's fb_pgprotect() with fb_pgprot_device() on all
>> architectures. The new helper with its streamlined interface enables
>> more refactoring within fbdev's mmap implementation.
>
> The content of this series is OK, but the way it's structured makes it
> a real headache to merge, because it's mostly powerpc changes and then
> a dependent cross-architecture patch at the end.
>
> It would be simpler if patch 4 was first and just passed file=NULL to
> the powerpc helper, with an explanation that it's unused and will be
> dropped in a future cleanup.
>
> We could then put the first patch (previously patch 4) in a topic
> branch that is shared between the powerpc tree and the fbdev tree, and
> then the powerpc changes could be staged on top of that through the
> powerpc tree.

Ok, thanks for reviewing. The fbdev patch would go through the drm-misc
tree as base for further refactoring. I'll update the patchset
accordingly.

Best regards
Thomas

> cheers

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
[PATCH v2 3/3] perf vendor events: Update metric events for power10 platform
Update JSON/events for the power10 platform with additional metrics.

Signed-off-by: Kajol Jain
---

 .../arch/powerpc/power10/metrics.json | 388 ++++++++++++++++++++++++++
 1 file changed, 388 insertions(+)

diff --git a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
index 4d66b75c6ad5..a36621858ea3 100644
--- a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
+++ b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
@@ -434,6 +434,13 @@
         "MetricName": "L1_INST_MISS_RATE",
         "ScaleUnit": "1%"
     },
+    {
+        "BriefDescription": "Percentage of completed instructions that were demand fetches that missed the L1 and L2 instruction cache",
+        "MetricExpr": "PM_INST_FROM_L2MISS * 100 / PM_RUN_INST_CMPL",
+        "MetricGroup": "General",
+        "MetricName": "L2_INST_MISS_RATE",
+        "ScaleUnit": "1%"
+    },
     {
         "BriefDescription": "Percentage of completed instructions that were demand fetches that reloaded from beyond the L3 icache",
         "MetricExpr": "PM_INST_FROM_L3MISS / PM_RUN_INST_CMPL * 100",
@@ -466,6 +473,13 @@
         "MetricGroup": "General",
         "MetricName": "LOADS_PER_INST"
     },
+    {
+        "BriefDescription": "Percentage of demand loads that reloaded from the L2 per completed instruction",
+        "MetricExpr": "PM_DATA_FROM_L2 * 100 / PM_RUN_INST_CMPL",
+        "MetricGroup": "dL1_Reloads",
+        "MetricName": "DL1_RELOAD_FROM_L2_RATE",
+        "ScaleUnit": "1%"
+    },
     {
         "BriefDescription": "Percentage of demand loads that reloaded from beyond the L2 per completed instruction",
         "MetricExpr": "PM_DATA_FROM_L2MISS / PM_RUN_INST_CMPL * 100",
@@ -473,6 +487,34 @@
         "MetricName": "DL1_RELOAD_FROM_L2_MISS_RATE",
         "ScaleUnit": "1%"
     },
+    {
+        "BriefDescription": "Percentage of demand loads that reloaded using modified data from another core's L2 or L3 on a remote chip, per completed instruction",
+        "MetricExpr": "PM_DATA_FROM_RL2L3_MOD * 100 / PM_RUN_INST_CMPL",
+        "MetricGroup": "dL1_Reloads",
+        "MetricName": "DL1_RELOAD_FROM_RL2L3_MOD_RATE",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Percentage of demand loads that reloaded using shared data from another core's L2 or L3 on a remote chip, per completed instruction",
+        "MetricExpr": "PM_DATA_FROM_RL2L3_SHR * 100 / PM_RUN_INST_CMPL",
+        "MetricGroup": "dL1_Reloads",
+        "MetricName": "DL1_RELOAD_FROM_RL2L3_SHR_RATE",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Percentage of demand loads that reloaded from the L3 per completed instruction",
+        "MetricExpr": "PM_DATA_FROM_L3 * 100 / PM_RUN_INST_CMPL",
+        "MetricGroup": "dL1_Reloads",
+        "MetricName": "DL1_RELOAD_FROM_L3_RATE",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Percentage of demand loads that reloaded with data brought into the L3 by prefetch per completed instruction",
+        "MetricExpr": "PM_DATA_FROM_L3_MEPF * 100 / PM_RUN_INST_CMPL",
+        "MetricGroup": "dL1_Reloads",
+        "MetricName": "DL1_RELOAD_FROM_L3_MEPF_RATE",
+        "ScaleUnit": "1%"
+    },
     {
         "BriefDescription": "Percentage of demand loads that reloaded from beyond the L3 per completed instruction",
         "MetricExpr": "PM_DATA_FROM_L3MISS / PM_RUN_INST_CMPL * 100",
@@ -480,6 +522,66 @@
         "MetricName": "DL1_RELOAD_FROM_L3_MISS_RATE",
         "ScaleUnit": "1%"
     },
+    {
+        "BriefDescription": "Percentage of demand loads that reloaded using modified data from another core's L2 or L3 on a distant chip, per completed instruction",
+        "MetricExpr": "PM_DATA_FROM_DL2L3_MOD * 100 / PM_RUN_INST_CMPL",
+        "MetricGroup": "dL1_Reloads",
+        "MetricName": "DL1_RELOAD_FROM_DL2L3_MOD_RATE",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Percentage of demand loads that reloaded using shared data from another core's L2 or L3 on a distant chip, per completed instruction",
+        "MetricExpr": "PM_DATA_FROM_DL2L3_SHR * 100 / PM_RUN_INST_CMPL",
+        "MetricGroup": "dL1_Reloads",
+        "MetricName": "DL1_RELOAD_FROM_DL2L3_SHR_RATE",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Percentage of demand loads that reloaded from local memory per completed instruction",
+        "MetricExpr": "PM_DATA_FROM_LMEM * 100 / PM_RUN_INST_CMPL",
+        "MetricGroup": "dL1_Reloads",
+        "MetricName": "DL1_RELOAD_FROM_LMEM_RATE",
+        "ScaleUnit": "1%"
+    },
+    {
+        "BriefDescription": "Percentage of demand loads that reloaded from remote memory per completed instruction",
+        "MetricExpr": "PM_DATA_FROM_RMEM * 100 / PM_RUN_INST_CMPL",
+        "MetricGroup": "dL1_Reloads",
+        "MetricName": "DL1_RELOAD_FROM_RMEM_RATE",
+        "ScaleUnit": "1%"
+    },
+    {
+
[PATCH v2 2/3] perf vendor events: Update JSON/events for power10 platform
Update JSON/Events list with additional data-source events for the
power10 platform.

Signed-off-by: Kajol Jain
---

 .../arch/powerpc/power10/datasource.json | 505 ++++++++++++++++++++++++++
 1 file changed, 505 insertions(+)

diff --git a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
index 12cfb9785433..6b0356f2d301 100644
--- a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
+++ b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
@@ -1278,5 +1278,510 @@
     "EventCode": "0x0A424000C142",
     "EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3_MOD",
     "BriefDescription": "The processor's L1 data cache was reloaded with a line in the M (exclusive) state from another core's L2 or L3 on the same chip in a different regent due to a demand miss for a marked instruction."
+  },
+  {
+    "EventCode": "0x0A424020C142",
+    "EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3_MOD_ALL",
+    "BriefDescription": "The processor's L1 data cache was reloaded with a line in the M (exclusive) state from another core's L2 or L3 on the same chip in a different regent due to a demand miss or prefetch reload for a marked instruction."
+  },
+  {
+    "EventCode": "0x0A03C142",
+    "EventName": "PM_MRK_INST_FROM_NON_REGENT_L2L3",
+    "BriefDescription": "The processor's instruction cache was reloaded from another core's L2 or L3 on the same chip in a different regent due to a demand miss for a marked instruction."
+  },
+  {
+    "EventCode": "0x0A034000C142",
+    "EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3",
+    "BriefDescription": "The processor's L1 data cache was reloaded from another core's L2 or L3 on the same chip in a different regent due to a demand miss for a marked instruction."
+  },
+  {
+    "EventCode": "0x0A030010C142",
+    "EventName": "PM_MRK_INST_FROM_NON_REGENT_L2L3_ALL",
+    "BriefDescription": "The processor's instruction cache was reloaded from another core's L2 or L3 on the same chip in a different regent due to a demand miss or prefetch reload for a marked instruction."
+  },
+  {
+    "EventCode": "0x0A034020C142",
+    "EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3_ALL",
+    "BriefDescription": "The processor's L1 data cache was reloaded from another core's L2 or L3 on the same chip in a different regent due to a demand miss or prefetch reload for a marked instruction."
+  },
+  {
+    "EventCode": "0x0941C142",
+    "EventName": "PM_MRK_INST_FROM_LMEM",
+    "BriefDescription": "The processor's instruction cache was reloaded from the local chip's memory due to a demand miss for a marked instruction."
+  },
+  {
+    "EventCode": "0x09404000C142",
+    "EventName": "PM_MRK_DATA_FROM_LMEM",
+    "BriefDescription": "The processor's L1 data cache was reloaded from the local chip's memory due to a demand miss for a marked instruction."
+  },
+  {
+    "EventCode": "0x09410010C142",
+    "EventName": "PM_MRK_INST_FROM_LMEM_ALL",
+    "BriefDescription": "The processor's instruction cache was reloaded from the local chip's memory due to a demand miss or prefetch reload for a marked instruction."
+  },
+  {
+    "EventCode": "0x09404020C142",
+    "EventName": "PM_MRK_DATA_FROM_LMEM_ALL",
+    "BriefDescription": "The processor's L1 data cache was reloaded from the local chip's memory due to a demand miss or prefetch reload for a marked instruction."
+  },
+  {
+    "EventCode": "0x09804000C142",
+    "EventName": "PM_MRK_DATA_FROM_L_OC_CACHE",
+    "BriefDescription": "The processor's L1 data cache was reloaded from the local chip's OpenCAPI cache due to a demand miss for a marked instruction."
+  },
+  {
+    "EventCode": "0x09804020C142",
+    "EventName": "PM_MRK_DATA_FROM_L_OC_CACHE_ALL",
+    "BriefDescription": "The processor's L1 data cache was reloaded from the local chip's OpenCAPI cache due to a demand miss or prefetch reload for a marked instruction."
+  },
+  {
+    "EventCode": "0x09C04000C142",
+    "EventName": "PM_MRK_DATA_FROM_L_OC_MEM",
+    "BriefDescription": "The processor's L1 data cache was reloaded from the local chip's OpenCAPI memory due to a demand miss for a marked instruction."
+  },
+  {
+    "EventCode": "0x09C04020C142",
+    "EventName": "PM_MRK_DATA_FROM_L_OC_MEM_ALL",
+    "BriefDescription": "The processor's L1 data cache was reloaded from the local chip's OpenCAPI memory due to a demand miss or prefetch reload for a marked instruction."
+  },
+  {
+    "EventCode": "0x0981C142",
+    "EventName": "PM_MRK_INST_FROM_L_OC_ANY",
+    "BriefDescription": "The processor's instruction cache was reloaded from the local chip's OpenCAPI cache or memory due to a demand miss for a marked instruction."
+  },
+  {
+    "EventCode": "0x09814000C142",
+    "EventName": "PM_MRK_DATA_FROM_L_OC_ANY",
+    "BriefDescription": "The processor's L1 data cache was reloaded from the local chip's
[PATCH v2 1/3] perf vendor events: Update JSON/events for power10 platform
Update JSON/Events list with data-source events for power10 platform. Signed-off-by: Kajol Jain --- .../arch/powerpc/power10/datasource.json | 1282 + .../arch/powerpc/power10/others.json | 10 - .../arch/powerpc/power10/translation.json |5 - 3 files changed, 1282 insertions(+), 15 deletions(-) create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/datasource.json --- Changelog: v1->v2 - Split first patch as its bounce from linux-perf-us...@vger.kernel.org mailing list because of 'Message too long (>10 chars)' error. --- diff --git a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json new file mode 100644 index ..12cfb9785433 --- /dev/null +++ b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json @@ -0,0 +1,1282 @@ +[ + { +"EventCode": "0x200FE", +"EventName": "PM_DATA_FROM_L2MISS", +"BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L2 due to a demand miss." + }, + { +"EventCode": "0x300FE", +"EventName": "PM_DATA_FROM_L3MISS", +"BriefDescription": "The processor's L1 data cache was reloaded from beyond the local core's L3 due to a demand miss." + }, + { +"EventCode": "0x400FE", +"EventName": "PM_DATA_FROM_MEMORY", +"BriefDescription": "The processor's data cache was reloaded from local, remote, or distant memory due to a demand miss." + }, + { +"EventCode": "0x0003C040", +"EventName": "PM_INST_FROM_L2", +"BriefDescription": "The processor's instruction cache was reloaded from the local core's L2 due to a demand miss." + }, + { +"EventCode": "0x00034000C040", +"EventName": "PM_DATA_FROM_L2", +"BriefDescription": "The processor's L1 data cache was reloaded from the local core's L2 due to a demand miss." + }, + { +"EventCode": "0x00030010C040", +"EventName": "PM_INST_FROM_L2_ALL", +"BriefDescription": "The processor's instruction cache was reloaded from the local core's L2 due to a demand miss or prefetch reload." 
+ }, + { +"EventCode": "0x00034020C040", +"EventName": "PM_DATA_FROM_L2_ALL", +"BriefDescription": "The processor's L1 data cache was reloaded from the local core's L2 due to a demand miss or prefetch reload." + }, + { +"EventCode": "0x003FC040", +"EventName": "PM_INST_FROM_L1MISS", +"BriefDescription": "The processor's instruction cache was reloaded from a source beyond the local core's L1 due to a demand miss." + }, + { +"EventCode": "0x003F4000C040", +"EventName": "PM_DATA_FROM_L1MISS", +"BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L1 due to a demand miss." + }, + { +"EventCode": "0x003F0010C040", +"EventName": "PM_INST_FROM_L1MISS_ALL", +"BriefDescription": "The processor's instruction cache was reloaded from a source beyond the local core's L1 due to a demand miss or prefetch reload." + }, + { +"EventCode": "0x003F4020C040", +"EventName": "PM_DATA_FROM_L1MISS_ALL", +"BriefDescription": "The processor's L1 data cache was reloaded from a source beyond the local core's L1 due to a demand miss or prefetch reload." + }, + { +"EventCode": "0x4000C040", +"EventName": "PM_DATA_FROM_L2_NO_CONFLICT", +"BriefDescription": "The processor's L1 data cache was reloaded without dispatch conflicts with data NOT in the MEPF state from the local core's L2 due to a demand miss." + }, + { +"EventCode": "0x4020C040", +"EventName": "PM_DATA_FROM_L2_NO_CONFLICT_ALL", +"BriefDescription": "The processor's L1 data cache was reloaded without dispatch conflicts with data NOT in the MEPF state from the local core's L2 due to a demand miss or prefetch reload." + }, + { +"EventCode": "0x00404000C040", +"EventName": "PM_DATA_FROM_L2_MEPF", +"BriefDescription": "The processor's L1 data cache was reloaded with data in the MEPF state without dispatch conflicts from the local core's L2 due to a demand miss." 
+ }, + { +"EventCode": "0x00404020C040", +"EventName": "PM_DATA_FROM_L2_MEPF_ALL", +"BriefDescription": "The processor's L1 data cache was reloaded with data in the MEPF state without dispatch conflicts from the local core's L2 due to a demand miss or prefetch reload." + }, + { +"EventCode": "0x00804000C040", +"EventName": "PM_DATA_FROM_L2_LDHITST_CONFLICT", +"BriefDescription": "The processor's L1 data cache was reloaded with data that had a dispatch conflict on ld-hit-store from the local core's L2 due to a demand miss." + }, + { +"EventCode": "0x00804020C040", +"EventName": "PM_DATA_FROM_L2_LDHITST_CONFLICT_ALL", +"BriefDescription": "The processor's L1 data cache was reloaded with data that had a dispatch conflict on ld-hit-store from the
Re: [PATCH v4] drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close()
On Mon, 2023-09-04 at 17:03 +, Christophe Leroy wrote:
> Le 04/09/2023 à 14:31, Alexandra Diupina a écrit :
> > diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
> > index 47c2ad7a3e42..fd999dabdd39 100644
> > --- a/drivers/net/wan/fsl_ucc_hdlc.c
> > +++ b/drivers/net/wan/fsl_ucc_hdlc.c
> > @@ -34,6 +34,8 @@
> >  #define TDM_PPPOHT_SLIC_MAXIN
> >  #define RX_BD_ERRORS (R_CD_S | R_OV_S | R_CR_S | R_AB_S | R_NO_S | R_LG_S)
> >
> > +static int uhdlc_close(struct net_device *dev);
> > +
> >  static struct ucc_tdm_info utdm_primary_info = {
> >  	.uf_info = {
> >  		.tsa = 0,
> > @@ -731,7 +733,9 @@ static int uhdlc_open(struct net_device *dev)
> >  	napi_enable(&priv->napi);
> >  	netdev_reset_queue(dev);
> >  	netif_start_queue(dev);
> > -	hdlc_open(dev);
> > +
> > +	int rc = hdlc_open(dev);
>
> Do not mix declarations and code. Please put all declarations at the top
> of the block.
>
> > +	return rc == 0 ? 0 : (uhdlc_close(dev), rc);
>
> That's not easy to read.
>
> I know that's more changes, but I'd prefer something like:
>
> static int uhdlc_open(struct net_device *dev)
> {
> 	u32 cecr_subblock;
> 	hdlc_device *hdlc = dev_to_hdlc(dev);
> 	struct ucc_hdlc_private *priv = hdlc->priv;
> 	struct ucc_tdm *utdm = priv->utdm;
> 	int rc;
>
> 	if (priv->hdlc_busy != 1)
> 		return 0;
>
> 	if (request_irq(priv->ut_info->uf_info.irq,
> 			ucc_hdlc_irq_handler, 0, "hdlc", priv))
> 		return -ENODEV;
>
> 	cecr_subblock = ucc_fast_get_qe_cr_subblock(
> 				priv->ut_info->uf_info.ucc_num);
>
> 	qe_issue_cmd(QE_INIT_TX_RX, cecr_subblock,
> 		     QE_CR_PROTOCOL_UNSPECIFIED, 0);
>
> 	ucc_fast_enable(priv->uccf, COMM_DIR_RX | COMM_DIR_TX);
>
> 	/* Enable the TDM port */
> 	if (priv->tsa)
> 		qe_setbits_8(&utdm->si_regs->siglmr1_h, 0x1 << utdm->tdm_port);
>
> 	priv->hdlc_busy = 1;
> 	netif_device_attach(priv->ndev);
> 	napi_enable(&priv->napi);
> 	netdev_reset_queue(dev);
> 	netif_start_queue(dev);
>
> 	rc = hdlc_open(dev);
> 	if (rc)
> 		uhdlc_close(dev);
>
> 	return rc;
> }

I agree the above is more readable, but I don't
think the whole refactor is worthwhile for a -net fix. I think simply rewriting the final statements as:

	rc = hdlc_open(dev);
	if (rc)
		uhdlc_close(dev);
	return rc;

would be good for -net.

> >  	return 0;
> > @@ -824,6 +828,8 @@ static int uhdlc_close(struct net_device *dev)
> >  	netdev_reset_queue(dev);
> >  	priv->hdlc_busy = 0;
> >
> > +	hdlc_close(dev);
> > +
> >  	return 0;
>
> And while you are looking at the correctness of this code, is it sure
> that uhdlc_open() cannot be called twice in parallel?
> If it can be called in parallel I think the "if (priv->hdlc_busy != 1)"
> should be replaced by something using cmpxchg()

That part is safe, ndo_open() is invoked under the rtnl lock.

The other comments are IMHO relevant, @Alexandra: please address them.

Thanks!

Paolo
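[Editor's illustration] The error-handling shape suggested above — run the final init step, and undo the earlier setup if it fails — can be sketched with stand-in functions. This is a minimal model of the control flow, not the driver's real code; the `Dev` class and return values are illustrative:

```python
# Toy model of the suggested tail of uhdlc_open(): call hdlc_open() last,
# and tear down the earlier setup via uhdlc_close() when it fails.
class Dev:
    def __init__(self, hdlc_open_rc=0):
        self.hdlc_open_rc = hdlc_open_rc
        self.closed = False

def hdlc_open(dev):
    # stand-in for the kernel's hdlc_open(): 0 on success, -errno on failure
    return dev.hdlc_open_rc

def uhdlc_close(dev):
    # stand-in for uhdlc_close(): undoes what uhdlc_open() set up
    dev.closed = True

def uhdlc_open_tail(dev):
    rc = hdlc_open(dev)
    if rc:
        uhdlc_close(dev)  # don't leak the setup done before hdlc_open()
    return rc

ok, bad = Dev(), Dev(hdlc_open_rc=-19)   # -19 standing in for -ENODEV
print(uhdlc_open_tail(ok), ok.closed)    # -> 0 False
print(uhdlc_open_tail(bad), bad.closed)  # -> -19 True
```

The point of the shape is that every exit from open() leaves the device either fully opened or fully closed, with the error code propagated unchanged.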
Re: [PATCH v7 3/8] KVM: Make __kvm_follow_pfn not imply FOLL_GET
On Wed, Jul 12, 2023 at 7:00 AM Sean Christopherson wrote:
>
> On Tue, Jul 11, 2023, Zhi Wang wrote:
> > On Thu, 6 Jul 2023 15:49:39 +0900
> > David Stevens wrote:
> >
> > > On Wed, Jul 5, 2023 at 10:19 PM Zhi Wang wrote:
> > > >
> > > > On Tue, 4 Jul 2023 16:50:48 +0900
> > > > David Stevens wrote:
> > > > If yes, do we have to use FOLL_GET to resolve GFN associated with a
> > > > tail page?
> > > > It seems gup can tolerate gup_flags without FOLL_GET, but it is more like a
> > > > temporary solution. I don't think it is a good idea to play tricks with
> > > > a temporary solution, more like we are abusing the toleration.
> > >
> > > I'm not sure I understand what you're getting at. This series never
> > > calls gup without FOLL_GET.
> > >
> > > This series aims to provide kvm_follow_pfn as a unified API on top of
> > > gup+follow_pte. Since one of the major clients of this API uses an mmu
> > > notifier, it makes sense to support returning a pfn without taking a
> > > reference. And we indeed need to do that for certain types of memory.
> >
> > I am not having a problem with taking a pfn without taking a ref. I am
> > questioning if using !FOLL_GET in struct kvm_follow_pfn to indicate taking
> > a pfn without a ref is a good idea or not, while there is another flag
> > actually showing it.
> >
> > I can understand that using FOLL_XXX in kvm_follow_pfn saves some
> > translation between struct kvm_follow_pfn.{write, async, } and GUP
> > flags. However FOLL_XXX is for GUP. Using FOLL_XXX for reflecting the
> > requirements of GUP in the code path that is going to call GUP is reasonable.
> >
> > But using FOLL_XXX with purposes that are not related to the GUP call really
> > feels off.
>
> I agree, assuming you're talking specifically about the logic in
> hva_to_pfn_remapped() that handles non-refcounted pages, i.e.
this
>
> 	if (get_page_unless_zero(page)) {
> 		foll->is_refcounted_page = true;
> 		if (!(foll->flags & FOLL_GET))
> 			put_page(page);
> 	} else if (foll->flags & FOLL_GET) {
> 		r = -EFAULT;
> 	}
>
> should be
>
> 	if (get_page_unless_zero(page)) {
> 		foll->is_refcounted_page = true;
> 		if (!(foll->flags & FOLL_GET))
> 			put_page(page);
> 		else if (!foll->guarded_by_mmu_notifier)
> 			r = -EFAULT;
>
> because it's not the desire to grab a reference that makes getting non-refcounted
> pfns "safe", it's whether or not the caller is plugged into the MMU notifiers.
>
> Though that highlights that checking guarded_by_mmu_notifier should be done for
> *all* non-refcounted pfns, not just non-refcounted struct page memory.

I think things are getting confused here because there are multiple things which "safe" refers to. There are three different definitions that I think are relevant here:

1) "safe" in the sense that KVM doesn't corrupt page reference counts
2) "safe" in the sense that KVM doesn't access pfns after they have been freed
3) "safe" in the sense that KVM doesn't use stale hva -> pfn translations

For property 1, FOLL_GET is important. If the caller passes FOLL_GET, then they expect to be able to pass the returned pfn to kvm_release_pfn. This means that when FOLL_GET is set, if kvm_pfn_to_refcounted_page returns a page, then hva_to_pfn_remapped must take a reference count to avoid eventually corrupting the page ref count. I guess replacing the FOLL_GET check with !guarded_by_mmu_notifier is logically equivalent because __kvm_follow_pfn requires that at least one of guarded_by_mmu_notifier and FOLL_GET is set. But since we're concerned about a property of the refcount, I think that checking FOLL_GET is clearer.

For property 2, FOLL_GET is also important. If guarded_by_mmu_notifier is set, then we're all good here. If guarded_by_mmu_notifier is not set, then the check in __kvm_follow_pfn guarantees that FOLL_GET is set.
For struct page memory, we're safe because KVM will hold a reference as long as it's still using the page. For non struct page memory, we're not safe - this is where the breaking change of allow_unsafe_mappings would go. Note that for non-refcounted struct page, we can't use the allow_unsafe_mappings escape hatch. Since FOLL_GET was requested, if we returned such a page, then the caller would eventually corrupt the page refcount via kvm_release_pfn. Property 3 would be nice, but we've already concluded that guarding all translations with mmu notifiers is infeasible. So maintaining property 2 is the best we can hope for. > As for the other usage of FOLL_GET in this series (using it to conditionally > do > put_page()), IMO that's very much related to the GUP call. Invoking > put_page() > is a hack to workaround the fact that GUP doesn't provide a way to get the pfn > without grabbing a reference to the page. In an ideal world, KVM would NOT > pass > FOLL_GET to the various GUP
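[Editor's illustration] The refcount rule being debated in the quoted snippet can be modeled in a few lines. This is a toy model, not KVM code: `get_page_unless_zero()` takes a reference only while the count is still positive, and a caller that asked for FOLL_GET must either end up owning a reference or get an error back:

```python
# Toy mirror of the hva_to_pfn_remapped() snippet quoted above (simplified).
FOLL_GET = 1
EFAULT = 14

class Page:
    def __init__(self, refcount):
        self.refcount = refcount

def get_page_unless_zero(page):
    # succeeds (and takes a reference) only while the count is non-zero
    if page.refcount > 0:
        page.refcount += 1
        return True
    return False

def resolve_pfn(page, flags):
    if get_page_unless_zero(page):
        if not (flags & FOLL_GET):
            page.refcount -= 1   # caller did not ask to own a reference
        return 0
    if flags & FOLL_GET:
        return -EFAULT           # can't hand a reference to a non-refcounted page
    return 0                     # OK only for callers plugged into MMU notifiers

live = Page(refcount=1)
print(resolve_pfn(live, FOLL_GET), live.refcount)  # -> 0 2 (caller owns a ref)
print(resolve_pfn(Page(0), FOLL_GET))              # -> -14
```

The model makes the thread's point 1 concrete: whenever FOLL_GET is honored, the caller's later `kvm_release_pfn()` balances the count; whenever it is refused, returning the pfn anyway would eventually corrupt the count.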
Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval
On Tue, Sep 05, 2023 at 12:42:11PM +1000, Michael Ellerman wrote: > Michal Suchánek writes: > > On Thu, Aug 31, 2023 at 12:59:25PM -0500, Nathan Lynch wrote: > ... > >> You (Michal) seem to favor a kernel-user ABI where user space is allowed > >> to invoke arbitrary RTAS functions by name. But we already have that in > >> the form of the rtas() syscall. (User space looks up function tokens by > >> name in the DT.) The point of the series is that we need to move away > >> from that. It's too low-level and user space has to use /dev/mem when > >> invoking any of the less-simple RTAS functions. > > > > We don't have that, directly accessing /dev/mem does not really work. > > And that's what needs fixing in my view. > > > > The rtas calls are all mechanically the same, the function implemented > > here should be able to call any of them if there was a way to specify > > the call. > > > > Given that there is desire to have access to multiple calls I don't > > think it makes sense to allocate a separate device with different name > > for each. > > I think it does make sense. > > We explicitly don't want a general "call any RTAS function" API. > > We want tightly scoped APIs that do one thing, or a family of related > things, but not anything & everything. > > Having different devices for each of those APIs means permissions can be > granted separately on those devices. So a user/group can be given access > to the "papr-vpd" device, but not some other unrelated device that also > happens to expose an RTAS service (eg. error injection). Yes, it does work as a kludge for setting permissions for individual calls. Thanks Michal
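[Editor's illustration] The permission argument above — one device node per RTAS-backed API, so access is granted per function family rather than all-or-nothing — can be sketched as a toy model. Only `papr-vpd` comes from the thread; the error-injection device name, groups, and modes below are hypothetical:

```python
# Toy model: per-device nodes mean per-API permission checks.
import stat

devices = {
    "papr-vpd":       {"group": "vpd-users", "mode": 0o640},  # group is hypothetical
    "papr-errinject": {"group": "root",      "mode": 0o600},  # hypothetical device
}

def group_may_read(dev, groups):
    d = devices[dev]
    # readable by group members only if the group-read bit is set
    return bool(d["mode"] & stat.S_IRGRP) and d["group"] in groups

print(group_may_read("papr-vpd", {"vpd-users"}))        # -> True
print(group_may_read("papr-errinject", {"vpd-users"}))  # -> False
```

With a single "call any RTAS function" device, the same user would get both capabilities or neither; separate nodes let an administrator grant VPD retrieval without error injection.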
Re: [PATCH] powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()
Le 05/09/2023 à 06:46, Christophe Leroy a écrit : > > > Le 05/09/2023 à 04:36, Michael Ellerman a écrit : >> Christophe Leroy writes: >>> Commit 45201c879469 ("powerpc/nohash: Remove hash related code from >>> nohash headers.") replaced: >>> >>> if ((pte_val(*ptep) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0) >>> return 0; >>> >>> By: >>> >>> if (pte_young(*ptep)) >>> return 0; >>> >>> But it should be: >>> >>> if (!pte_young(*ptep)) >>> return 0; >> >> >> That seems bad :) >> >> But I don't know off the top of my head where >> __ptep_test_and_clear_young() is used, and so what the symptoms could >> be. Presumably nothing too bad or someone would have noticed? >> > > The two uses in mm/vmscan.c are as follows: > > if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) > VM_WARN_ON_ONCE(true); > > So it seems to be expected to never happen. > > The only useful place it is used seems to be folio_check_references() > which is part of the reclaim process. So probably it messes up swap, but > to what extent ? ppc64e is for embedded systems. Do embedded systems > have swap at all ? > > Also surprising that it is also called from mm/debug_vm_pgtable.c so > shouldn't we have noticed earlier ? I'll check if it works. 
I confirm CONFIG_DEBUG_VM_PGTABLE on QEMU detects the problem:

debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
[ cut here ]
WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:174 debug_vm_pgtable+0x82c/0xa78
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-11584-g0ad6bbfc7dab #457
Hardware name: QEMU ppce500 e6500 0x80400020 QEMU e500
NIP: c10fbe20 LR: c10fbdf8 CTR:
REGS: c30cb860 TRAP: 0700 Not tainted (6.5.0-11584-g0ad6bbfc7dab)
MSR: 80029002 CR: 44000484 XER:
IRQMASK: 0
[register dump trimmed: values lost in transit]
NIP [c10fbe20] debug_vm_pgtable+0x82c/0xa78
LR [c10fbdf8] debug_vm_pgtable+0x804/0xa78
Call Trace:
[c30cbb00] [c10fbde0] debug_vm_pgtable+0x7ec/0xa78 (unreliable)
[c30cbc70] [c0001a38] do_one_initcall+0x68/0x2b8
[c30cbd40] [c10db498] kernel_init_freeable+0x2d0/0x348
[c30cbde0] [c0001e64] kernel_init+0x34/0x170
[c30cbe50] [c594] ret_from_kernel_user_thread+0x14/0x1c
---[ end trace ]---

Christophe