Re: [f2fs-dev] [PATCH v2 00/89] fs: new accessors for inode->i_ctime

2023-09-05 Thread patchwork-bot+f2fs
Hello:

This series was applied to jaegeuk/f2fs.git (dev)
by Christian Brauner :

On Wed,  5 Jul 2023 14:58:09 -0400 you wrote:
> v2:
> - prepend patches to add missing ctime updates
> - add simple_rename_timestamp helper function
> - rename ctime accessor functions as inode_get_ctime/inode_set_ctime_*
> - drop individual inode_ctime_set_{sec,nsec} helpers
> 
> I've been working on a patchset to change how inode->i_ctime is
> accessed in order to give us conditional, high-res timestamps for the
> ctime and mtime. struct timespec64 has unused bits in it that we can use
> to implement this. In order to do that, however, we need to wrap all
> accesses of inode->i_ctime to ensure that bits used as flags are
> appropriately handled.
> 
> [...]
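
For context, the accessor style the series introduces looks roughly like this
at a call site (a sketch only; the helper names are taken from the patch
subjects listed below, and the signatures are illustrative rather than
authoritative):

	#include <linux/fs.h>

	/* Old style wrote the field directly: inode->i_ctime = current_time(inode); */
	static void example_update_ctime(struct inode *inode)
	{
		struct timespec64 old_ctime = inode_get_ctime(inode);	/* read via accessor */

		inode_set_ctime_current(inode);		/* set ctime to "now" via accessor */
		(void)old_ctime;			/* e.g. compare with the new value */
	}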

Here is the summary with links:
  - [f2fs-dev,v2,07/92] fs: add ctime accessors infrastructure
https://git.kernel.org/jaegeuk/f2fs/c/9b6304c1d537
  - [f2fs-dev,v2,08/92] fs: new helper: simple_rename_timestamp
https://git.kernel.org/jaegeuk/f2fs/c/0c4767923ed6
  - [f2fs-dev,v2,92/92] fs: rename i_ctime field to __i_ctime
https://git.kernel.org/jaegeuk/f2fs/c/13bc24457850

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [PATCH v2 1/3] kconfig: add dependencies of POWER_RESET for mips malta

2023-09-05 Thread Yuan Tan

Hi,

On 9/4/2023 3:40 PM, Philippe Mathieu-Daudé wrote:

Hi,

On 1/9/23 04:42, Yuan Tan wrote:

MIPS Malta's power off depends on PCI, PCI_QUIRKS, and
POWER_RESET_PIIX4_POWEROFF to work. Enable them when POWER_RESET is set
for convenience.

Suggested-by: Zhangjin Wu 
Signed-off-by: Yuan Tan 
---
  arch/mips/Kconfig | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index bc8421859006..13bacbd05125 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -547,6 +547,9 @@ config MIPS_MALTA
  select MIPS_L1_CACHE_SHIFT_6
  select MIPS_MSC
  select PCI_GT64XXX_PCI0
+    select PCI if POWER_RESET
+    select PCI_QUIRKS if POWER_RESET
+    select POWER_RESET_PIIX4_POWEROFF if POWER_RESET
  select SMP_UP if SMP
  select SWAP_IO_SPACE
  select SYS_HAS_CPU_MIPS32_R1


Shouldn't we also update the _defconfig files?

Sorry, in my last email, I forgot to reply to all. So I am now resending 
this email.


In malta_defconfig, PCI and POWER_RESET_PIIX4_POWEROFF have already been 
set and PCI_QUIRKS is also selected by FSL_PCI [=n].


So shutdown and reboot with malta_defconfig are working, and there is no
need to update malta_defconfig.




Re: [PATCH v7 3/8] KVM: Make __kvm_follow_pfn not imply FOLL_GET

2023-09-05 Thread David Stevens
On Wed, Sep 6, 2023 at 9:45 AM Sean Christopherson  wrote:
>
> On Tue, Sep 05, 2023, David Stevens wrote:
> > On Wed, Jul 12, 2023 at 7:00 AM Sean Christopherson  
> > wrote:
> > >
> > > On Tue, Jul 11, 2023, Zhi Wang wrote:
> > > > On Thu, 6 Jul 2023 15:49:39 +0900
> > > > David Stevens  wrote:
> > > >
> > > > > > On Wed, Jul 5, 2023 at 10:19 PM Zhi Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Tue,  4 Jul 2023 16:50:48 +0900
> > > > > > David Stevens  wrote:
> > > > > > If yes, do we have to use FOLL_GET to resolve GFN associated with a 
> > > > > > tail page?
> > > > > > It seems gup can tolerate gup_flags without FOLL_GET, but it is 
> > > > > > more like a
> > > > > > temporary solution. I don't think it is a good idea to play tricks 
> > > > > > with
> > > > > > a temporary solution, more like we are abusing the toleration.
> > > > >
> > > > > I'm not sure I understand what you're getting at. This series never
> > > > > calls gup without FOLL_GET.
> > > > >
> > > > > This series aims to provide kvm_follow_pfn as a unified API on top of
> > > > > gup+follow_pte. Since one of the major clients of this API uses an mmu
> > > > > notifier, it makes sense to support returning a pfn without taking a
> > > > > reference. And we indeed need to do that for certain types of memory.
> > > > >
> > > >
> > > > I am not having prob with taking a pfn without taking a ref. I am
> > > > questioning if using !FOLL_GET in struct kvm_follow_pfn to indicate 
> > > > taking
> > > > a pfn without a ref is a good idea or not, while there is another flag
> > > > actually showing it.
> > > >
> > > > I can understand that using FOLL_XXX in kvm_follow_pfn saves some
> > > > translation between struct kvm_follow_pfn.{write, async, } and GUP
> > > > flags. However FOLL_XXX is for GUP. Using FOLL_XXX for reflecting the
> > > > requirements of GUP in the code path that is going to call GUP is 
> > > > reasonable.
> > > >
> > > > But using FOLL_XXX with purposes that are not related to GUP call really
> > > > feels off.
> > >
> > > I agree, assuming you're talking specifically about the logic in 
> > > hva_to_pfn_remapped()
> > > that handles non-refcounted pages, i.e. this
> > >
> > > if (get_page_unless_zero(page)) {
> > > foll->is_refcounted_page = true;
> > > if (!(foll->flags & FOLL_GET))
> > > put_page(page);
> > > } else if (foll->flags & FOLL_GET) {
> > > r = -EFAULT;
> > > }
> > >
> > > should be
> > >
> > > if (get_page_unless_zero(page)) {
> > > foll->is_refcounted_page = true;
> > > if (!(foll->flags & FOLL_GET))
> > > put_page(page);
> > > else if (!foll->guarded_by_mmu_notifier)
> > > r = -EFAULT;
> > >
> > > because it's not the desire to grab a reference that makes getting 
> > > non-refcounted
> > > pfns "safe", it's whether or not the caller is plugged into the MMU 
> > > notifiers.
> > >
> > > Though that highlights that checking guarded_by_mmu_notifier should be 
> > > done for
> > > *all* non-refcounted pfns, not just non-refcounted struct page memory.
> >
> > I think things are getting confused here because there are multiple
> > things which "safe" refers to. There are three different definitions
> > that I think are relevant here:
> >
> > 1) "safe" in the sense that KVM doesn't corrupt page reference counts
> > 2) "safe" in the sense that KVM doesn't access pfns after they have been 
> > freed
> > 3) "safe" in the sense that KVM doesn't use stale hva -> pfn translations
> >
> > For property 1, FOLL_GET is important. If the caller passes FOLL_GET,
> > then they expect to be able to pass the returned pfn to
> > kvm_release_pfn. This means that when FOLL_GET is set, if
> > kvm_pfn_to_refcounted_page returns a page, then hva_to_pfn_remapped
> > must take a reference count to avoid eventually corrupting the page
> > ref count. I guess replacing the FOLL_GET check with
> > !guarded_by_mmu_notifier is logically equivalent because
> > __kvm_follow_pfn requires that at least one of guarded_by_mmu_notifier
> > and FOLL_GET is set. But since we're concerned about a property of the
> > refcount, I think that checking FOLL_GET is clearer.
> >
> > For property 2, FOLL_GET is also important. If guarded_by_mmu_notifier
> > is set, then we're all good here. If guarded_by_mmu_notifier is not
> > set, then the check in __kvm_follow_pfn guarantees that FOLL_GET is
> > set. For struct page memory, we're safe because KVM will hold a
> > reference as long as it's still using the page. For non struct page
> > memory, we're not safe - this is where the breaking change of
> > allow_unsafe_mappings would go. Note that for non-refcounted struct
> > page, we can't use the allow_unsafe_mappings escape hatch. Since
> > FOLL_GET was requested, if we returned such a page, then the caller
> > would eventually corrupt the page refcount via kvm_release_pfn.

Re: [PATCH v7 3/8] KVM: Make __kvm_follow_pfn not imply FOLL_GET

2023-09-05 Thread Sean Christopherson
On Tue, Sep 05, 2023, David Stevens wrote:
> On Wed, Jul 12, 2023 at 7:00 AM Sean Christopherson  wrote:
> >
> > On Tue, Jul 11, 2023, Zhi Wang wrote:
> > > On Thu, 6 Jul 2023 15:49:39 +0900
> > > David Stevens  wrote:
> > >
> > > > On Wed, Jul 5, 2023 at 10:19 PM Zhi Wang  
> > > > wrote:
> > > > >
> > > > > On Tue,  4 Jul 2023 16:50:48 +0900
> > > > > David Stevens  wrote:
> > > > > If yes, do we have to use FOLL_GET to resolve GFN associated with a 
> > > > > tail page?
> > > > > It seems gup can tolerate gup_flags without FOLL_GET, but it is more 
> > > > > like a
> > > > > temporary solution. I don't think it is a good idea to play tricks 
> > > > > with
> > > > > a temporary solution, more like we are abusing the toleration.
> > > >
> > > > I'm not sure I understand what you're getting at. This series never
> > > > calls gup without FOLL_GET.
> > > >
> > > > This series aims to provide kvm_follow_pfn as a unified API on top of
> > > > gup+follow_pte. Since one of the major clients of this API uses an mmu
> > > > notifier, it makes sense to support returning a pfn without taking a
> > > > reference. And we indeed need to do that for certain types of memory.
> > > >
> > >
> > > I am not having prob with taking a pfn without taking a ref. I am
> > > questioning if using !FOLL_GET in struct kvm_follow_pfn to indicate taking
> > > a pfn without a ref is a good idea or not, while there is another flag
> > > actually showing it.
> > >
> > > I can understand that using FOLL_XXX in kvm_follow_pfn saves some
> > > translation between struct kvm_follow_pfn.{write, async, } and GUP
> > > flags. However FOLL_XXX is for GUP. Using FOLL_XXX for reflecting the
> > > requirements of GUP in the code path that is going to call GUP is reasonable.
> > >
> > > But using FOLL_XXX with purposes that are not related to GUP call really
> > > feels off.
> >
> > I agree, assuming you're talking specifically about the logic in 
> > hva_to_pfn_remapped()
> > that handles non-refcounted pages, i.e. this
> >
> > if (get_page_unless_zero(page)) {
> > foll->is_refcounted_page = true;
> > if (!(foll->flags & FOLL_GET))
> > put_page(page);
> > } else if (foll->flags & FOLL_GET) {
> > r = -EFAULT;
> > }
> >
> > should be
> >
> > if (get_page_unless_zero(page)) {
> > foll->is_refcounted_page = true;
> > if (!(foll->flags & FOLL_GET))
> > put_page(page);
> > else if (!foll->guarded_by_mmu_notifier)
> > r = -EFAULT;
> >
> > because it's not the desire to grab a reference that makes getting 
> > non-refcounted
> > pfns "safe", it's whether or not the caller is plugged into the MMU 
> > notifiers.
> >
> > Though that highlights that checking guarded_by_mmu_notifier should be done 
> > for
> > *all* non-refcounted pfns, not just non-refcounted struct page memory.
> 
> I think things are getting confused here because there are multiple
> things which "safe" refers to. There are three different definitions
> that I think are relevant here:
> 
> 1) "safe" in the sense that KVM doesn't corrupt page reference counts
> 2) "safe" in the sense that KVM doesn't access pfns after they have been freed
> 3) "safe" in the sense that KVM doesn't use stale hva -> pfn translations
>
> For property 1, FOLL_GET is important. If the caller passes FOLL_GET,
> then they expect to be able to pass the returned pfn to
> kvm_release_pfn. This means that when FOLL_GET is set, if
> kvm_pfn_to_refcounted_page returns a page, then hva_to_pfn_remapped
> must take a reference count to avoid eventually corrupting the page
> ref count. I guess replacing the FOLL_GET check with
> !guarded_by_mmu_notifier is logically equivalent because
> __kvm_follow_pfn requires that at least one of guarded_by_mmu_notifier
> and FOLL_GET is set. But since we're concerned about a property of the
> refcount, I think that checking FOLL_GET is clearer.
> 
> For property 2, FOLL_GET is also important. If guarded_by_mmu_notifier
> is set, then we're all good here. If guarded_by_mmu_notifier is not
> set, then the check in __kvm_follow_pfn guarantees that FOLL_GET is
> set. For struct page memory, we're safe because KVM will hold a
> reference as long as it's still using the page. For non struct page
> memory, we're not safe - this is where the breaking change of
> allow_unsafe_mappings would go. Note that for non-refcounted struct
> page, we can't use the allow_unsafe_mappings escape hatch. Since
> FOLL_GET was requested, if we returned such a page, then the caller
> would eventually corrupt the page refcount via kvm_release_pfn.

Yes we can.  The caller simply needs to be made aware of is_refcounted_page.
I didn't include that in the snippet below because I didn't want to write the
entire patch.  The whole point of adding is_refcounted_page is so that callers
can identify exactly what type of page was at the 

Re: [PATCH gmem FIXUP] mm, compaction: make testing mapping_unmovable() safe

2023-09-05 Thread Sean Christopherson
On Fri, Sep 01, 2023, Vlastimil Babka wrote:
> As Kirill pointed out, mapping can be removed under us due to
> truncation. Test it under folio lock as already done for the async
> compaction / dirty folio case. To prevent locking every folio with
> mapping to do the test, do it only for unevictable folios, as we can
> expect the unmovable mapping folios are also unevictable - it is the
> case for guest memfd folios.

Rather than expect/assume that unmovable mappings are always unevictable, how
about requiring that?  E.g. either through a VM_WARN_ON in
mapping_set_unmovable(), or by simply having that helper forcefully set
AS_UNEVICTABLE as well.
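
A minimal sketch of the second option (illustrative only, not the actual gmem
patch; AS_UNMOVABLE is the flag proposed by the guest_memfd series, and
mapping_set_unevictable() is the existing pagemap helper):

	/* Sketch: make the helper enforce the invariant instead of assuming it. */
	static inline void mapping_set_unmovable(struct address_space *mapping)
	{
		/* Unmovable mappings are expected to be unevictable as well. */
		mapping_set_unevictable(mapping);
		set_bit(AS_UNMOVABLE, &mapping->flags);
	}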


[PATCH v2 2/3] vmcore: allow fadump to export vmcore even if is_kdump_kernel() is false

2023-09-05 Thread Hari Bathini
Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
While elfcorehdr_addr is set for kexec based kernel dump mechanism,
alternate dump capturing methods like fadump [1] also set it to export
the vmcore. Since is_kdump_kernel() is used to restrict resources in the
crash dump capture kernel, and such restrictions are not desirable for
fadump, allow is_kdump_kernel() to be defined differently for the fadump
case. With that change, include an is_fadump_active() check in
is_vmcore_usable() & vmcore_unusable() so that the vmcore can be exported
for the fadump case too.

[1] https://docs.kernel.org/powerpc/firmware-assisted-dump.html

Signed-off-by: Hari Bathini 
---

Changes in v2:
* is_fadump_active() check added to is_vmcore_usable() as suggested
  by Baoquan.


 include/linux/crash_dump.h | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index 0f3a656293b0..de8a9fabfb6f 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -50,6 +50,7 @@ void vmcore_cleanup(void);
 #define vmcore_elf64_check_arch(x) (elf_check_arch(x) || 
vmcore_elf_check_arch_cross(x))
 #endif
 
+#ifndef is_kdump_kernel
 /*
  * is_kdump_kernel() checks whether this kernel is booting after a panic of
  * previous kernel or not. This is determined by checking if previous kernel
@@ -64,6 +65,19 @@ static inline bool is_kdump_kernel(void)
 {
return elfcorehdr_addr != ELFCORE_ADDR_MAX;
 }
+#endif
+
+#ifndef is_fadump_active
+/*
+ * If f/w assisted dump capturing mechanism (fadump), instead of kexec based
+ * dump capturing mechanism (kdump) is exporting the vmcore, then this function
+ * will be defined in arch specific code to return true, when appropriate.
+ */
+static inline bool is_fadump_active(void)
+{
+   return false;
+}
+#endif
 
 /* is_vmcore_usable() checks if the kernel is booting after a panic and
  * the vmcore region is usable.
@@ -75,7 +89,8 @@ static inline bool is_kdump_kernel(void)
 
 static inline int is_vmcore_usable(void)
 {
-   return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
+   return (is_kdump_kernel() || is_fadump_active())
+   && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0;
 }
 
 /* vmcore_unusable() marks the vmcore as unusable,
@@ -84,7 +99,7 @@ static inline int is_vmcore_usable(void)
 
 static inline void vmcore_unusable(void)
 {
-   if (is_kdump_kernel())
+   if (is_kdump_kernel() || is_fadump_active())
elfcorehdr_addr = ELFCORE_ADDR_ERR;
 }
 
-- 
2.41.0



Re: [PATCH 1/2] vmcore: allow alternate dump capturing methods to export vmcore without is_kdump_kernel()

2023-09-05 Thread Hari Bathini




On 05/09/23 8:00 am, Baoquan He wrote:

On 09/04/23 at 08:04pm, Hari Bathini wrote:

Hi Baoquan,

Thanks for the review...

On 03/09/23 9:06 am, Baoquan He wrote:

Hi Hari,

On 09/02/23 at 12:34am, Hari Bathini wrote:

Currently, is_kdump_kernel() returns true when elfcorehdr_addr is set.
While elfcorehdr_addr is set for kexec based kernel dump mechanism,
alternate dump capturing methods like fadump [1] also set it to export
the vmcore. is_kdump_kernel() is used to restrict resources in crash
dump capture kernel but such restrictions may not be desirable for
fadump. Allow is_kdump_kernel() to be defined differently for such
scenarios. With this, is_kdump_kernel() could be false while vmcore
is usable. So, introduce is_crashdump_kernel() to return true when
elfcorehdr_addr is set and use it for vmcore related checks.


I got what is done in these two patches, but didn't get why they need to be
done. vmcore_unusable()/is_vmcore_usable() are only utilized in ia64.
Why do you care if it's is_crashdump_kernel() or is_kdump_kernel()?
If you want to override the generic is_kdump_kernel() with powerpc's own
is_kdump_kernel(), your below change is enough to allow you to do that.
I can't see why is_crashdump_kernel() is needed. Could you explain that
specifically?


You mean to just remove the is_kdump_kernel() check in the is_vmcore_usable() &
vmcore_unusable() functions? I replaced it with the generic is_crashdump_kernel()
function instead, which returns true for any dump capturing method,
irrespective of whether is_kdump_kernel() returns true or false.
For the fadump case, is_kdump_kernel() will return false after patch 2/2.


OK, I could understand what you want to achieve. You want to make
is_kdump_kernel() only return true for kdump, while is_vmcore_usable()
returns true for both kdump and fadump.

IIUC, can we change it as below? It could make the code clearer and more
straightforward. I don't think adding another is_crashdump_kernel()
is a good idea; that would be torture for non-powerpc people reading the
code when they need to differentiate between kdump and crashdump.



Sure, Baoquan.
Posted v2 based on your suggestion.

Thanks
Hari


[PATCH v2 1/3] powerpc/fadump: make is_fadump_active() visible for exporting vmcore

2023-09-05 Thread Hari Bathini
Include asm/fadump.h in asm/kexec.h to make it visible while exporting the
vmcore. Also, update is_fadump_active() to return a boolean instead of an
integer for better readability. The change will be used in the next
patch to ensure the vmcore is exported when fadump is active.

Signed-off-by: Hari Bathini 
---

Changes in v2:
* New patch based on Baoquan's suggestion to use is_fadump_active()
  instead of introducing new function is_crashdump_kernel().


 arch/powerpc/include/asm/fadump.h | 4 ++--
 arch/powerpc/include/asm/kexec.h  | 8 ++--
 arch/powerpc/kernel/fadump.c  | 4 ++--
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index 526a6a647312..27b74a7e2162 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -15,13 +15,13 @@ extern int crashing_cpu;
 
 extern int is_fadump_memory_area(u64 addr, ulong size);
 extern int setup_fadump(void);
-extern int is_fadump_active(void);
+extern bool is_fadump_active(void);
 extern int should_fadump_crash(void);
 extern void crash_fadump(struct pt_regs *, const char *);
 extern void fadump_cleanup(void);
 
 #else  /* CONFIG_FA_DUMP */
-static inline int is_fadump_active(void) { return 0; }
+static inline bool is_fadump_active(void) { return false; }
 static inline int should_fadump_crash(void) { return 0; }
 static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
 static inline void fadump_cleanup(void) { }
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index a1ddba01e7d1..b760ef459234 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -51,6 +51,7 @@
 
 #ifndef __ASSEMBLY__
 #include 
+#include 
 
 typedef void (*crash_shutdown_t)(void);
 
@@ -99,10 +100,13 @@ void relocate_new_kernel(unsigned long indirection_page, 
unsigned long reboot_co
 
 void kexec_copy_flush(struct kimage *image);
 
-#if defined(CONFIG_CRASH_DUMP) && defined(CONFIG_PPC_RTAS)
+#if defined(CONFIG_CRASH_DUMP)
+#define is_fadump_active   is_fadump_active
+#if defined(CONFIG_PPC_RTAS)
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
 #define crash_free_reserved_phys_range crash_free_reserved_phys_range
-#endif
+#endif /* CONFIG_PPC_RTAS */
+#endif /* CONFIG_CRASH_DUMP */
 
 #ifdef CONFIG_KEXEC_FILE
 extern const struct kexec_file_ops kexec_elf64_ops;
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 3ff2da7b120b..5682a65e8326 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -187,9 +187,9 @@ int should_fadump_crash(void)
return 1;
 }
 
-int is_fadump_active(void)
+bool is_fadump_active(void)
 {
-   return fw_dump.dump_active;
+   return !!fw_dump.dump_active;
 }
 
 /*
-- 
2.41.0



[PATCH v2 3/3] powerpc/fadump: make is_kdump_kernel() return false when fadump is active

2023-09-05 Thread Hari Bathini
Currently, is_kdump_kernel() returns true in the crash dump capture kernel
for both kdump and fadump crash dump capturing methods, as both these
methods set elfcorehdr_addr. Some restrictions enforced in the crash dump
capture kernel, based on is_kdump_kernel(), are specifically meant for the
kdump case and are not desirable for fadump - e.g. the IO queue restriction
in device drivers. So, define is_kdump_kernel() to return false when f/w
assisted dump is active.

Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/kexec.h |  2 ++
 arch/powerpc/kernel/crash_dump.c | 11 +++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index b760ef459234..f0b9c3fd0618 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -101,6 +101,8 @@ void relocate_new_kernel(unsigned long indirection_page, 
unsigned long reboot_co
 void kexec_copy_flush(struct kimage *image);
 
 #if defined(CONFIG_CRASH_DUMP)
+bool is_kdump_kernel(void);
+#define is_kdump_kernel   is_kdump_kernel
 #define is_fadump_active   is_fadump_active
 #if defined(CONFIG_PPC_RTAS)
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index 9a3b85bfc83f..6d8e616ce3ce 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -92,6 +92,17 @@ ssize_t copy_oldmem_page(struct iov_iter *iter, unsigned 
long pfn,
return csize;
 }
 
+/*
+ * Return true only when kexec based kernel dump capturing method is used.
+ * This ensures all restrictions applied for the kdump case are not automatically
+ * applied for the fadump case.
+ */
+bool is_kdump_kernel(void)
+{
+   return !is_fadump_active() && elfcorehdr_addr != ELFCORE_ADDR_MAX;
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
 #ifdef CONFIG_PPC_RTAS
 /*
  * The crashkernel region will almost always overlap the RTAS region, so
-- 
2.41.0



Re: [PATCH 0/4] ppc, fbdev: Clean up fbdev mmap helper

2023-09-05 Thread Thomas Zimmermann

Hi

Am 05.09.23 um 04:47 schrieb Michael Ellerman:

Thomas Zimmermann  writes:

Refactor fb_pgprotect() in PowerPC to work without struct file. Then
clean up and rename fb_pgprotect(). This change has been discussed at
[1] in the context of refactoring fbdev's mmap code.

The first three patches adapt PowerPC's internal interfaces to
provide a phys_mem_access_prot() that works without struct file. Neither
the architecture code nor the fbdev helpers needs the parameter.

Patch 4 replaces fbdev's fb_pgprotect() with fb_pgprot_device() on
all architectures. The new helper with its streamlined interface
enables more refactoring within fbdev's mmap implementation.


The content of this series is OK, but the way it's structured makes it a
real headache to merge, because it's mostly powerpc changes and then a
dependent cross-architecture patch at the end.

It would be simpler if patch 4 was first and just passed file=NULL to
the powerpc helper, with an explanation that it's unused and will be
dropped in a future cleanup.

We could then put the first patch (previously patch 4) in a topic branch
that is shared between the powerpc tree and the fbdev tree, and then the
powerpc changes could be staged on top of that through the powerpc tree.


Ok, thanks for reviewing. The fbdev patch would go through the drm-misc
tree as a base for further refactoring.


I'll update the patchset accordingly.

Best regards
Thomas



cheers


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)




[PATCH v2 3/3] perf vendor events: Update metric events for power10 platform

2023-09-05 Thread Kajol Jain
Update JSON/events for power10 platform with additional metrics.

Signed-off-by: Kajol Jain 
---
 .../arch/powerpc/power10/metrics.json | 388 ++
 1 file changed, 388 insertions(+)

diff --git a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json 
b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
index 4d66b75c6ad5..a36621858ea3 100644
--- a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
+++ b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
@@ -434,6 +434,13 @@
 "MetricName": "L1_INST_MISS_RATE",
 "ScaleUnit": "1%"
 },
+{
+"BriefDescription": "Percentage of completed instructions that were 
demand fetches that missed the L1 and L2 instruction cache",
+"MetricExpr": "PM_INST_FROM_L2MISS * 100 / PM_RUN_INST_CMPL",
+"MetricGroup": "General",
+"MetricName": "L2_INST_MISS_RATE",
+"ScaleUnit": "1%"
+},
 {
 "BriefDescription": "Percentage of completed instructions that were 
demand fetches that reloaded from beyond the L3 icache",
 "MetricExpr": "PM_INST_FROM_L3MISS / PM_RUN_INST_CMPL * 100",
@@ -466,6 +473,13 @@
 "MetricGroup": "General",
 "MetricName": "LOADS_PER_INST"
 },
+{
+"BriefDescription": "Percentage of demand loads that reloaded from the 
L2 per completed instruction",
+"MetricExpr": "PM_DATA_FROM_L2 * 100 / PM_RUN_INST_CMPL",
+"MetricGroup": "dL1_Reloads",
+"MetricName": "DL1_RELOAD_FROM_L2_RATE",
+"ScaleUnit": "1%"
+},
 {
 "BriefDescription": "Percentage of demand loads that reloaded from 
beyond the L2 per completed instruction",
 "MetricExpr": "PM_DATA_FROM_L2MISS / PM_RUN_INST_CMPL * 100",
@@ -473,6 +487,34 @@
 "MetricName": "DL1_RELOAD_FROM_L2_MISS_RATE",
 "ScaleUnit": "1%"
 },
+{
+"BriefDescription": "Percentage of demand loads that reloaded using 
modified data from another core's L2 or L3 on a remote chip, per completed 
instruction",
+"MetricExpr": "PM_DATA_FROM_RL2L3_MOD * 100 / PM_RUN_INST_CMPL",
+"MetricGroup": "dL1_Reloads",
+"MetricName": "DL1_RELOAD_FROM_RL2L3_MOD_RATE",
+"ScaleUnit": "1%"
+},
+{
+"BriefDescription": "Percentage of demand loads that reloaded using 
shared data from another core's L2 or L3 on a remote chip, per completed 
instruction",
+"MetricExpr": "PM_DATA_FROM_RL2L3_SHR * 100 / PM_RUN_INST_CMPL",
+"MetricGroup": "dL1_Reloads",
+"MetricName": "DL1_RELOAD_FROM_RL2L3_SHR_RATE",
+"ScaleUnit": "1%"
+},
+{
+"BriefDescription": "Percentage of demand loads that reloaded from the 
L3 per completed instruction",
+"MetricExpr": "PM_DATA_FROM_L3 * 100 / PM_RUN_INST_CMPL",
+"MetricGroup": "dL1_Reloads",
+"MetricName": "DL1_RELOAD_FROM_L3_RATE",
+"ScaleUnit": "1%"
+},
+{
+"BriefDescription": "Percentage of demand loads that reloaded with 
data brought into the L3 by prefetch per completed instruction",
+"MetricExpr": "PM_DATA_FROM_L3_MEPF * 100 / PM_RUN_INST_CMPL",
+"MetricGroup": "dL1_Reloads",
+"MetricName": "DL1_RELOAD_FROM_L3_MEPF_RATE",
+"ScaleUnit": "1%"
+},
 {
 "BriefDescription": "Percentage of demand loads that reloaded from 
beyond the L3 per completed instruction",
 "MetricExpr": "PM_DATA_FROM_L3MISS / PM_RUN_INST_CMPL * 100",
@@ -480,6 +522,66 @@
 "MetricName": "DL1_RELOAD_FROM_L3_MISS_RATE",
 "ScaleUnit": "1%"
 },
+{
+"BriefDescription": "Percentage of demand loads that reloaded using 
modified data from another core's L2 or L3 on a distant chip, per completed 
instruction",
+"MetricExpr": "PM_DATA_FROM_DL2L3_MOD * 100 / PM_RUN_INST_CMPL",
+"MetricGroup": "dL1_Reloads",
+"MetricName": "DL1_RELOAD_FROM_DL2L3_MOD_RATE",
+"ScaleUnit": "1%"
+},
+{
+"BriefDescription": "Percentage of demand loads that reloaded using 
shared data from another core's L2 or L3 on a distant chip, per completed 
instruction",
+"MetricExpr": "PM_DATA_FROM_DL2L3_SHR * 100 / PM_RUN_INST_CMPL",
+"MetricGroup": "dL1_Reloads",
+"MetricName": "DL1_RELOAD_FROM_DL2L3_SHR_RATE",
+"ScaleUnit": "1%"
+},
+{
+"BriefDescription": "Percentage of demand loads that reloaded from 
local memory per completed instruction",
+"MetricExpr": "PM_DATA_FROM_LMEM * 100 / PM_RUN_INST_CMPL",
+"MetricGroup": "dL1_Reloads",
+"MetricName": "DL1_RELOAD_FROM_LMEM_RATE",
+"ScaleUnit": "1%"
+},
+{
+"BriefDescription": "Percentage of demand loads that reloaded from 
remote memory per completed instruction",
+"MetricExpr": "PM_DATA_FROM_RMEM * 100 / PM_RUN_INST_CMPL",
+"MetricGroup": "dL1_Reloads",
+"MetricName": "DL1_RELOAD_FROM_RMEM_RATE",
+"ScaleUnit": "1%"
+},
+{
+ 

[PATCH v2 2/3] perf vendor events: Update JSON/events for power10 platform

2023-09-05 Thread Kajol Jain
Update JSON/Events list with additional data-source events
for power10 platform.

Signed-off-by: Kajol Jain 
---
 .../arch/powerpc/power10/datasource.json  | 505 ++
 1 file changed, 505 insertions(+)

diff --git a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json 
b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
index 12cfb9785433..6b0356f2d301 100644
--- a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
+++ b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
@@ -1278,5 +1278,510 @@
 "EventCode": "0x0A424000C142",
 "EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3_MOD",
 "BriefDescription": "The processor's L1 data cache was reloaded with a 
line in the M (exclusive) state from another core's L2 or L3 on the same chip 
in a different regent due to a demand miss for a marked instruction."
+  },
+  {
+"EventCode": "0x0A424020C142",
+"EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3_MOD_ALL",
+"BriefDescription": "The processor's L1 data cache was reloaded with a 
line in the M (exclusive) state from another core's L2 or L3 on the same chip 
in a different regent due to a demand miss or prefetch reload for a marked 
instruction."
+  },
+  {
+"EventCode": "0x0A03C142",
+"EventName": "PM_MRK_INST_FROM_NON_REGENT_L2L3",
+"BriefDescription": "The processor's instruction cache was reloaded from 
another core's L2 or L3 on the same chip in a different regent due to a demand 
miss for a marked instruction."
+  },
+  {
+"EventCode": "0x0A034000C142",
+"EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3",
+"BriefDescription": "The processor's L1 data cache was reloaded from 
another core's L2 or L3 on the same chip in a different regent due to a demand 
miss for a marked instruction."
+  },
+  {
+"EventCode": "0x0A030010C142",
+"EventName": "PM_MRK_INST_FROM_NON_REGENT_L2L3_ALL",
+"BriefDescription": "The processor's instruction cache was reloaded from 
another core's L2 or L3 on the same chip in a different regent due to a demand 
miss or prefetch reload for a marked instruction."
+  },
+  {
+"EventCode": "0x0A034020C142",
+"EventName": "PM_MRK_DATA_FROM_NON_REGENT_L2L3_ALL",
+"BriefDescription": "The processor's L1 data cache was reloaded from 
another core's L2 or L3 on the same chip in a different regent due to a demand 
miss or prefetch reload for a marked instruction."
+  },
+  {
+"EventCode": "0x0941C142",
+"EventName": "PM_MRK_INST_FROM_LMEM",
+"BriefDescription": "The processor's instruction cache was reloaded from 
the local chip's memory due to a demand miss for a marked instruction."
+  },
+  {
+"EventCode": "0x09404000C142",
+"EventName": "PM_MRK_DATA_FROM_LMEM",
+"BriefDescription": "The processor's L1 data cache was reloaded from the 
local chip's memory due to a demand miss for a marked instruction."
+  },
+  {
+"EventCode": "0x09410010C142",
+"EventName": "PM_MRK_INST_FROM_LMEM_ALL",
+"BriefDescription": "The processor's instruction cache was reloaded from 
the local chip's memory due to a demand miss or prefetch reload for a marked 
instruction."
+  },
+  {
+"EventCode": "0x09404020C142",
+"EventName": "PM_MRK_DATA_FROM_LMEM_ALL",
+"BriefDescription": "The processor's L1 data cache was reloaded from the 
local chip's memory due to a demand miss or prefetch reload for a marked 
instruction."
+  },
+  {
+"EventCode": "0x09804000C142",
+"EventName": "PM_MRK_DATA_FROM_L_OC_CACHE",
+"BriefDescription": "The processor's L1 data cache was reloaded from the 
local chip's OpenCAPI cache due to a demand miss for a marked instruction."
+  },
+  {
+"EventCode": "0x09804020C142",
+"EventName": "PM_MRK_DATA_FROM_L_OC_CACHE_ALL",
+"BriefDescription": "The processor's L1 data cache was reloaded from the 
local chip's OpenCAPI cache due to a demand miss or prefetch reload for a 
marked instruction."
+  },
+  {
+"EventCode": "0x09C04000C142",
+"EventName": "PM_MRK_DATA_FROM_L_OC_MEM",
+"BriefDescription": "The processor's L1 data cache was reloaded from the 
local chip's OpenCAPI memory due to a demand miss for a marked instruction."
+  },
+  {
+"EventCode": "0x09C04020C142",
+"EventName": "PM_MRK_DATA_FROM_L_OC_MEM_ALL",
+"BriefDescription": "The processor's L1 data cache was reloaded from the 
local chip's OpenCAPI memory due to a demand miss or prefetch reload for a 
marked instruction."
+  },
+  {
+"EventCode": "0x0981C142",
+"EventName": "PM_MRK_INST_FROM_L_OC_ANY",
+"BriefDescription": "The processor's instruction cache was reloaded from 
the local chip's OpenCAPI cache or memory due to a demand miss for a marked 
instruction."
+  },
+  {
+"EventCode": "0x09814000C142",
+"EventName": "PM_MRK_DATA_FROM_L_OC_ANY",
+"BriefDescription": "The processor's L1 data cache was reloaded from the 
local chip's 

[PATCH v2 1/3] perf vendor events: Update JSON/events for power10 platform

2023-09-05 Thread Kajol Jain
Update JSON/Events list with data-source events for power10 platform.

Signed-off-by: Kajol Jain 
---
 .../arch/powerpc/power10/datasource.json  | 1282 +
 .../arch/powerpc/power10/others.json  |   10 -
 .../arch/powerpc/power10/translation.json |5 -
 3 files changed, 1282 insertions(+), 15 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/datasource.json

---
Changelog:
v1->v2
- Split the first patch as it bounced from the
  linux-perf-us...@vger.kernel.org mailing list because of a
  'Message too long (>10 chars)' error.
---
diff --git a/tools/perf/pmu-events/arch/powerpc/power10/datasource.json 
b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
new file mode 100644
index ..12cfb9785433
--- /dev/null
+++ b/tools/perf/pmu-events/arch/powerpc/power10/datasource.json
@@ -0,0 +1,1282 @@
+[
+  {
+"EventCode": "0x200FE",
+"EventName": "PM_DATA_FROM_L2MISS",
+"BriefDescription": "The processor's L1 data cache was reloaded from a 
source beyond the local core's L2 due to a demand miss."
+  },
+  {
+"EventCode": "0x300FE",
+"EventName": "PM_DATA_FROM_L3MISS",
+"BriefDescription": "The processor's L1 data cache was reloaded from 
beyond the local core's L3 due to a demand miss."
+  },
+  {
+"EventCode": "0x400FE",
+"EventName": "PM_DATA_FROM_MEMORY",
+"BriefDescription": "The processor's data cache was reloaded from local, 
remote, or distant memory due to a demand miss."
+  },
+  {
+"EventCode": "0x0003C040",
+"EventName": "PM_INST_FROM_L2",
+"BriefDescription": "The processor's instruction cache was reloaded from 
the local core's L2 due to a demand miss."
+  },
+  {
+"EventCode": "0x00034000C040",
+"EventName": "PM_DATA_FROM_L2",
+"BriefDescription": "The processor's L1 data cache was reloaded from the 
local core's L2 due to a demand miss."
+  },
+  {
+"EventCode": "0x00030010C040",
+"EventName": "PM_INST_FROM_L2_ALL",
+"BriefDescription": "The processor's instruction cache was reloaded from 
the local core's L2 due to a demand miss or prefetch reload."
+  },
+  {
+"EventCode": "0x00034020C040",
+"EventName": "PM_DATA_FROM_L2_ALL",
+"BriefDescription": "The processor's L1 data cache was reloaded from the 
local core's L2 due to a demand miss or prefetch reload."
+  },
+  {
+"EventCode": "0x003FC040",
+"EventName": "PM_INST_FROM_L1MISS",
+"BriefDescription": "The processor's instruction cache was reloaded from a 
source beyond the local core's L1 due to a demand miss."
+  },
+  {
+"EventCode": "0x003F4000C040",
+"EventName": "PM_DATA_FROM_L1MISS",
+"BriefDescription": "The processor's L1 data cache was reloaded from a 
source beyond the local core's L1 due to a demand miss."
+  },
+  {
+"EventCode": "0x003F0010C040",
+"EventName": "PM_INST_FROM_L1MISS_ALL",
+"BriefDescription": "The processor's instruction cache was reloaded from a 
source beyond the local core's L1 due to a demand miss or prefetch reload."
+  },
+  {
+"EventCode": "0x003F4020C040",
+"EventName": "PM_DATA_FROM_L1MISS_ALL",
+"BriefDescription": "The processor's L1 data cache was reloaded from a 
source beyond the local core's L1 due to a demand miss or prefetch reload."
+  },
+  {
+"EventCode": "0x4000C040",
+"EventName": "PM_DATA_FROM_L2_NO_CONFLICT",
+"BriefDescription": "The processor's L1 data cache was reloaded without 
dispatch conflicts with data NOT in the MEPF state from the local core's L2 due 
to a demand miss."
+  },
+  {
+"EventCode": "0x4020C040",
+"EventName": "PM_DATA_FROM_L2_NO_CONFLICT_ALL",
+"BriefDescription": "The processor's L1 data cache was reloaded without 
dispatch conflicts with data NOT in the MEPF state from the local core's L2 due 
to a demand miss or prefetch reload."
+  },
+  {
+"EventCode": "0x00404000C040",
+"EventName": "PM_DATA_FROM_L2_MEPF",
+"BriefDescription": "The processor's L1 data cache was reloaded with data 
in the MEPF state without dispatch conflicts from the local core's L2 due to a 
demand miss."
+  },
+  {
+"EventCode": "0x00404020C040",
+"EventName": "PM_DATA_FROM_L2_MEPF_ALL",
+"BriefDescription": "The processor's L1 data cache was reloaded with data 
in the MEPF state without dispatch conflicts from the local core's L2 due to a 
demand miss or prefetch reload."
+  },
+  {
+"EventCode": "0x00804000C040",
+"EventName": "PM_DATA_FROM_L2_LDHITST_CONFLICT",
+"BriefDescription": "The processor's L1 data cache was reloaded with data 
that had a dispatch conflict on ld-hit-store from the local core's L2 due to a 
demand miss."
+  },
+  {
+"EventCode": "0x00804020C040",
+"EventName": "PM_DATA_FROM_L2_LDHITST_CONFLICT_ALL",
+"BriefDescription": "The processor's L1 data cache was reloaded with data 
that had a dispatch conflict on ld-hit-store from the 

Re: [PATCH v4] drivers/net: process the result of hdlc_open() and add call of hdlc_close() in uhdlc_close()

2023-09-05 Thread Paolo Abeni
On Mon, 2023-09-04 at 17:03 +, Christophe Leroy wrote:
> 
> Le 04/09/2023 à 14:31, Alexandra Diupina a écrit :
> > diff --git a/drivers/net/wan/fsl_ucc_hdlc.c b/drivers/net/wan/fsl_ucc_hdlc.c
> > index 47c2ad7a3e42..fd999dabdd39 100644
> > --- a/drivers/net/wan/fsl_ucc_hdlc.c
> > +++ b/drivers/net/wan/fsl_ucc_hdlc.c
> > @@ -34,6 +34,8 @@
> >   #define TDM_PPPOHT_SLIC_MAXIN
> >   #define RX_BD_ERRORS (R_CD_S | R_OV_S | R_CR_S | R_AB_S | R_NO_S | R_LG_S)
> >   
> > +static int uhdlc_close(struct net_device *dev);
> > +
> >   static struct ucc_tdm_info utdm_primary_info = {
> > .uf_info = {
> > .tsa = 0,
> > @@ -731,7 +733,9 @@ static int uhdlc_open(struct net_device *dev)
> > napi_enable(&priv->napi);
> > netdev_reset_queue(dev);
> > netif_start_queue(dev);
> > -   hdlc_open(dev);
> > +
> > +   int rc = hdlc_open(dev);
> 
> Do not mix declarations and code. Please put all declaration at the top 
> of the block.
> 
> > +   return rc == 0 ? 0 : (uhdlc_close(dev), rc);
> > }
> 
> That's not easy to read.
> 
> I know that's more changes, but I'd prefer something like:
> 
> static int uhdlc_open(struct net_device *dev)
> {
>   u32 cecr_subblock;
>   hdlc_device *hdlc = dev_to_hdlc(dev);
>   struct ucc_hdlc_private *priv = hdlc->priv;
>   struct ucc_tdm *utdm = priv->utdm;
>   int rc;
> 
>   if (priv->hdlc_busy != 1)
>   return 0;
> 
>   if (request_irq(priv->ut_info->uf_info.irq,
>   ucc_hdlc_irq_handler, 0, "hdlc", priv))
>   return -ENODEV;
> 
>   cecr_subblock = ucc_fast_get_qe_cr_subblock(
>   priv->ut_info->uf_info.ucc_num);
> 
>   qe_issue_cmd(QE_INIT_TX_RX, cecr_subblock,
>QE_CR_PROTOCOL_UNSPECIFIED, 0);
> 
>   ucc_fast_enable(priv->uccf, COMM_DIR_RX | COMM_DIR_TX);
> 
>   /* Enable the TDM port */
>   if (priv->tsa)
>   qe_setbits_8(&utdm->si_regs->siglmr1_h, 0x1 << utdm->tdm_port);
> 
>   priv->hdlc_busy = 1;
>   netif_device_attach(priv->ndev);
>   napi_enable(&priv->napi);
>   netdev_reset_queue(dev);
>   netif_start_queue(dev);
> 
>   rc = hdlc_open(dev);
>   if (rc)
>   uhdlc_close(dev);
> 
>   return rc;
> }

I agree the above is more readable, but I don't think the whole
refactor is worthwhile for a -net fix. I think simply rewriting the
final statements as:

rc = hdlc_open(dev);
if (rc)
uhdlc_close(dev);

return rc;  

would be good for -net.
 
> > return 0;
> > @@ -824,6 +828,8 @@ static int uhdlc_close(struct net_device *dev)
> > netdev_reset_queue(dev);
> > priv->hdlc_busy = 0;
> >   
> > +   hdlc_close(dev);
> > +
> >   return 0;
> > 
> 
> And while you are looking at the correctness of this code, is it certain 
> that uhdlc_open() cannot be called twice in parallel?
> If it can be called in parallel, I think the "if (priv->hdlc_busy != 1)" 
> should be replaced by something using cmpxchg()

That part is safe, ndo_open() is invoked under the rtnl lock.

The other comments are IMHO relevant, @Alexandra: please address them.

Thanks!

Paolo
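
For illustration, the cmpxchg() idea mentioned above would look roughly like
the sketch below. It is not part of the patch and, as noted, is not needed
here because ndo_open() is serialized by the rtnl lock; the sketch also
assumes hdlc_busy is a plain int:

	/* Atomically claim the device: transition hdlc_busy 0 -> 1. */
	static int uhdlc_try_claim(struct ucc_hdlc_private *priv)
	{
		if (cmpxchg(&priv->hdlc_busy, 0, 1) != 0)
			return -EBUSY;	/* already open */
		return 0;
	}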



Re: [PATCH v7 3/8] KVM: Make __kvm_follow_pfn not imply FOLL_GET

2023-09-05 Thread David Stevens
On Wed, Jul 12, 2023 at 7:00 AM Sean Christopherson  wrote:
>
> On Tue, Jul 11, 2023, Zhi Wang wrote:
> > On Thu, 6 Jul 2023 15:49:39 +0900
> > David Stevens  wrote:
> >
> > > On Wed, Jul 5, 2023 at 10:19 PM Zhi Wang  
> > > wrote:
> > > >
> > > > On Tue,  4 Jul 2023 16:50:48 +0900
> > > > David Stevens  wrote:
> > > > If yes, do we have to use FOLL_GET to resolve GFN associated with a 
> > > > tail page?
> > > > It seems gup can tolerate gup_flags without FOLL_GET, but it is more 
> > > > like a
> > > > temporary solution. I don't think it is a good idea to play tricks with
> > > > a temporary solution, more like we are abusing the toleration.
> > >
> > > I'm not sure I understand what you're getting at. This series never
> > > calls gup without FOLL_GET.
> > >
> > > This series aims to provide kvm_follow_pfn as a unified API on top of
> > > gup+follow_pte. Since one of the major clients of this API uses an mmu
> > > notifier, it makes sense to support returning a pfn without taking a
> > > reference. And we indeed need to do that for certain types of memory.
> > >
> >
> > I am not having prob with taking a pfn without taking a ref. I am
> > questioning if using !FOLL_GET in struct kvm_follow_pfn to indicate taking
> > a pfn without a ref is a good idea or not, while there is another flag
> > actually showing it.
> >
> > I can understand that using FOLL_XXX in kvm_follow_pfn saves some
> > translation between struct kvm_follow_pfn.{write, async, } and GUP
> > flags. However FOLL_XXX is for GUP. Using FOLL_XXX for reflecting the
> > requirements of GUP in the code path that is going to call GUP is reasonable.
> >
> > But using FOLL_XXX with purposes that are not related to GUP call really
> > feels off.
>
> I agree, assuming you're talking specifically about the logic in 
> hva_to_pfn_remapped()
> that handles non-refcounted pages, i.e. this
>
> if (get_page_unless_zero(page)) {
> foll->is_refcounted_page = true;
> if (!(foll->flags & FOLL_GET))
> put_page(page);
> } else if (foll->flags & FOLL_GET) {
> r = -EFAULT;
> }
>
> should be
>
> if (get_page_unless_zero(page)) {
> foll->is_refcounted_page = true;
> if (!(foll->flags & FOLL_GET))
> put_page(page);
> else if (!foll->guarded_by_mmu_notifier)
> r = -EFAULT;
>
> because it's not the desire to grab a reference that makes getting 
> non-refcounted
> pfns "safe", it's whether or not the caller is plugged into the MMU notifiers.
>
> Though that highlights that checking guarded_by_mmu_notifier should be done 
> for
> *all* non-refcounted pfns, not just non-refcounted struct page memory.

I think things are getting confused here because there are multiple
things which "safe" refers to. There are three different definitions
that I think are relevant here:

1) "safe" in the sense that KVM doesn't corrupt page reference counts
2) "safe" in the sense that KVM doesn't access pfns after they have been freed
3) "safe" in the sense that KVM doesn't use stale hva -> pfn translations

For property 1, FOLL_GET is important. If the caller passes FOLL_GET,
then they expect to be able to pass the returned pfn to
kvm_release_pfn. This means that when FOLL_GET is set, if
kvm_pfn_to_refcounted_page returns a page, then hva_to_pfn_remapped
must take a reference count to avoid eventually corrupting the page
ref count. I guess replacing the FOLL_GET check with
!guarded_by_mmu_notifier is logically equivalent because
__kvm_follow_pfn requires that at least one of guarded_by_mmu_notifier
and FOLL_GET is set. But since we're concerned about a property of the
refcount, I think that checking FOLL_GET is clearer.

For property 2, FOLL_GET is also important. If guarded_by_mmu_notifier
is set, then we're all good here. If guarded_by_mmu_notifier is not
set, then the check in __kvm_follow_pfn guarantees that FOLL_GET is
set. For struct page memory, we're safe because KVM will hold a
reference as long as it's still using the page. For non struct page
memory, we're not safe - this is where the breaking change of
allow_unsafe_mappings would go. Note that for non-refcounted struct
page, we can't use the allow_unsafe_mappings escape hatch. Since
FOLL_GET was requested, if we returned such a page, then the caller
would eventually corrupt the page refcount via kvm_release_pfn.

Property 3 would be nice, but we've already concluded that guarding
all translations with mmu notifiers is infeasible. So maintaining
property 2 is the best we can hope for.
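
(A rough sketch, not code from this series, of how the two properties above
could be encoded together. Field and flag names follow the snippets quoted in
this thread; allow_unsafe_mappings is the hypothetical opt-out mentioned
above.)

	static int sketch_finish_remapped_pfn(struct kvm_follow_pfn *foll,
					      struct page *page,
					      bool allow_unsafe_mappings)
	{
		if (page && get_page_unless_zero(page)) {
			/* Property 1: refcounted page, honour the caller's FOLL_GET. */
			foll->is_refcounted_page = true;
			if (!(foll->flags & FOLL_GET))
				put_page(page);
			return 0;
		}

		/*
		 * Non-refcounted pfn: never hand it to a FOLL_GET caller (it
		 * would later drop a reference it never took), and otherwise
		 * only allow it when the caller is plugged into the MMU
		 * notifiers or has explicitly opted in (property 2).
		 */
		if ((foll->flags & FOLL_GET) ||
		    !(foll->guarded_by_mmu_notifier || allow_unsafe_mappings))
			return -EFAULT;

		return 0;
	}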

> As for the other usage of FOLL_GET in this series (using it to conditionally 
> do
> put_page()), IMO that's very much related to the GUP call.  Invoking 
> put_page()
> is a hack to workaround the fact that GUP doesn't provide a way to get the pfn
> without grabbing a reference to the page.  In an ideal world, KVM would NOT 
> pass
> FOLL_GET to the various GUP 

Re: [PATCH RFC 1/2] powerpc/pseries: papr-vpd char driver for VPD retrieval

2023-09-05 Thread Michal Suchánek
On Tue, Sep 05, 2023 at 12:42:11PM +1000, Michael Ellerman wrote:
> Michal Suchánek  writes:
> > On Thu, Aug 31, 2023 at 12:59:25PM -0500, Nathan Lynch wrote:
> ...
> >> You (Michal) seem to favor a kernel-user ABI where user space is allowed
> >> to invoke arbitrary RTAS functions by name. But we already have that in
> >> the form of the rtas() syscall. (User space looks up function tokens by
> >> name in the DT.) The point of the series is that we need to move away
> >> from that. It's too low-level and user space has to use /dev/mem when
> >> invoking any of the less-simple RTAS functions.
> >
> > We don't have that, directly accessing /dev/mem does not really work.
> > And that's what needs fixing in my view.
> >
> > The rtas calls are all mechanically the same, the function implemented
> > here should be able to call any of them if there was a way to specify
> > the call.
> >
> > Given that there is desire to have access to multiple calls I don't
> > think it makes sense to allocate a separate device with different name
> > for each.
> 
> I think it does make sense.
> 
> We explicitly don't want a general "call any RTAS function" API.
> 
> We want tightly scoped APIs that do one thing, or a family of related
> things, but not anything & everything.
> 
> Having different devices for each of those APIs means permissions can be
> granted separately on those devices. So a user/group can be given access
> to the "papr-vpd" device, but not some other unrelated device that also
> happens to expose an RTAS service (eg. error injection).

Yes, it does work as a kludge for setting permissions for individual
calls.

Thanks

Michal


Re: [PATCH] powerpc/64e: Fix wrong test in __ptep_test_and_clear_young()

2023-09-05 Thread Christophe Leroy


Le 05/09/2023 à 06:46, Christophe Leroy a écrit :
> 
> 
> Le 05/09/2023 à 04:36, Michael Ellerman a écrit :
>> Christophe Leroy  writes:
>>> Commit 45201c879469 ("powerpc/nohash: Remove hash related code from
>>> nohash headers.") replaced:
>>>
>>>    if ((pte_val(*ptep) & (_PAGE_ACCESSED | _PAGE_HASHPTE)) == 0)
>>> return 0;
>>>
>>> By:
>>>
>>>    if (pte_young(*ptep))
>>> return 0;
>>>
>>> But it should be:
>>>
>>>    if (!pte_young(*ptep))
>>> return 0;
>>
>>
>> That seems bad :)
>>
>> But I don't know off the top of my head where
>> __ptep_test_and_clear_young() is used, and so what the symptoms could
>> be. Presumably nothing too bad or someone would have noticed?
>>
> 
> The two uses in mm/vmscan.c are as follows:
> 
>  if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
>      VM_WARN_ON_ONCE(true);
> 
> So it seems to be expected to never happen.
> 
> The only useful place it is used seems to be folio_check_references() 
> which is part of the reclaim process. So probably it messes up swap, but 
> to what extent ? ppc64e is for embedded systems. Do embedded systems 
> have swap at all ?
> 
> Also surprising that it is also called from mm/debug_vm_pgtable.c so 
> shouldn't we have noticed earlier ? I'll check if it works.


I confirm CONFIG_DEBUG_VM_PGTABLE on QEMU detects the problem:

debug_vm_pgtable: [debug_vm_pgtable ]: Validating architecture 
page table helpers
[ cut here ]
WARNING: CPU: 0 PID: 1 at mm/debug_vm_pgtable.c:174 
debug_vm_pgtable+0x82c/0xa78
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-11584-g0ad6bbfc7dab #457
Hardware name: QEMU ppce500 e6500 0x80400020 QEMU e500
NIP:  c10fbe20 LR: c10fbdf8 CTR: 
REGS: c30cb860 TRAP: 0700   Not tainted  (6.5.0-11584-g0ad6bbfc7dab)
MSR:  80029002   CR: 44000484  XER: 
IRQMASK: 0
GPR00: c10fbde0 c30cbb00 c10e1200 c0003f2a9eb8
GPR04: 33cbee9df000 c39b0ef8 0039b124028d 0001
GPR08:  0001 c39b0ef8 0001
GPR12: 24000248 c132e000 c0001e38 
GPR16:    
GPR20:    
GPR24: c10da0c4 c112d068 c3015920 0020
GPR28: 1000 0001 c0003f2a9eb8 0039b124028d
NIP [c10fbe20] debug_vm_pgtable+0x82c/0xa78
LR [c10fbdf8] debug_vm_pgtable+0x804/0xa78
Call Trace:
[c30cbb00] [c10fbde0] debug_vm_pgtable+0x7ec/0xa78 
(unreliable)
[c30cbc70] [c0001a38] do_one_initcall+0x68/0x2b8
[c30cbd40] [c10db498] kernel_init_freeable+0x2d0/0x348
[c30cbde0] [c0001e64] kernel_init+0x34/0x170
[c30cbe50] [c594] ret_from_kernel_user_thread+0x14/0x1c
--- interrupt: 0 at 0x0
NIP:   LR:  CTR: 
REGS: c30cbe80 TRAP:    Not tainted  (6.5.0-11584-g0ad6bbfc7dab)
MSR:   <>  CR:   XER: 
IRQMASK: 0
GPR00:    
GPR04:    
GPR08:    
GPR12:    
GPR16:    
GPR20:    
GPR24:    
GPR28:    
NIP [] 0x0
LR [] 0x0
--- interrupt: 0
Code: 7c7e189e 4b16a3e9 e9410090 e92a 792b77e3 40c20010 79296842 
79299800 f92a e9410090 e92a 792977e2 <0b09> 3920 
f92a e9210090
---[ end trace  ]---

Christophe