Re: [RFC PATCH 3/3] objtool/mcount: Add powerpc specific functions

2022-03-29 Thread Josh Poimboeuf
On Tue, Mar 29, 2022 at 05:32:18PM +, Christophe Leroy wrote:
> 
> 
> Le 29/03/2022 à 14:01, Michael Ellerman a écrit :
> > Josh Poimboeuf  writes:
> >> On Sun, Mar 27, 2022 at 09:09:20AM +, Christophe Leroy wrote:
> >>> Second point is the endianness and 32/64 selection, especially when
> >>> cross-building. There is already some stuff regarding endianness based on
> >>> bswap_if_needed(), but that's based on constant selection at build time
> >>> and I couldn't find an easy way to set it conditionally based on the
> >>> target being built.
> >>>
> >>> Regarding 32/64 selection, there is almost nothing; it's based on using
> >>> type 'long', which means that at the moment the target and the build
> >>> platform must both be 32 bits or 64 bits.
> >>>
> >>> For both cases (endianness and 32/64) I think the solution should
> >>> probably be to start with the file format of the object file being
> >>> reworked by objtool.
> >>
> >> Do we really need to detect the endianness/bitness at runtime?  Objtool
> >> is built with the kernel, why not just build-in the same target
> >> assumptions as the kernel itself?
> > 
> > I don't think we need runtime detection. But it will need to support
> > basically most combinations of objtool running as 32-bit/64-bit LE/BE
> > while the kernel it's analysing is 32-bit/64-bit LE/BE.
> 
Exactly, the way it is done today with a constant in
objtool/endianness.h is too simple; we need to be able to select it
based on the kernel's config. Is there a way to get the CONFIG_ macros
from the kernel? If yes, then we could use CONFIG_64BIT and
CONFIG_CPU_LITTLE_ENDIAN to select the correct options in objtool.

As of now, there's no good way to get CONFIG options from the kernel.
That's pretty much by design, since objtool is meant to be a standalone
tool.  In fact there are people who've used objtool for other projects.

The objtool Makefile does at least have access to HOSTARCH/SRCARCH, but
I guess that doesn't help here.  We could maybe export the endian/bit
details in env variables to the objtool build somehow.

But, I managed to forget that objtool can already be cross-compiled for
an x86-64 target, from a 32-bit x86 LE host or a 64-bit powerpc BE host.
There are some people out there doing x86 kernel builds on such systems
who reported bugs, which were since fixed.  And the fixes were pretty
trivial, IIRC.

Libelf actually does a decent job of abstracting those details from
objtool.  So, forget what I said; it might be ok to just detect
endian/bit (and possibly even arch) at runtime like you originally
suggested.

For example, bswap_if_needed() could be reworked into a runtime check.

-- 
Josh



Re: [PATCH] ibmvscsis: increase INITIAL_SRP_LIMIT to 1024

2022-03-29 Thread Martin K. Petersen


Tyrel,

> The adapter request_limit is hardcoded to be INITIAL_SRP_LIMIT which
> is currently an arbitrary value of 800. Increase this value to 1024
> which better matches the characteristics of the typical IBMi Initiator
> that supports 32 LUNs and a queue depth of 32.

Applied to 5.18/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] ftrace: Make ftrace_graph_is_dead() a static branch

2022-03-29 Thread Steven Rostedt
On Fri, 25 Mar 2022 09:03:08 +0100
Christophe Leroy  wrote:

> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  

Small nit. Please order the includes in "upside-down x-mas tree" fashion:

#include 
#include 
#include 
#include 

Thanks,

-- Steve


Re: [PATCH v2] MAINTAINERS: Enlarge coverage of TRACING inside architectures

2022-03-29 Thread Steven Rostedt
On Fri, 25 Mar 2022 07:32:21 +0100
Christophe Leroy  wrote:

> Most architectures have ftrace-related stuff in arch/*/kernel/ftrace.c,
> but powerpc has it spread across multiple files located in
> arch/powerpc/kernel/trace/.
> In several architectures, there are also additional files containing
> 'ftrace' as part of the name, but with some prefix or suffix.

Acked-by: Steven Rostedt (Google) 

-- Steve


[GIT PULL] LIBNVDIMM update for v5.18

2022-03-29 Thread Dan Williams
Hi Linus, please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
tags/libnvdimm-for-5.18

...to receive the libnvdimm update for this cycle which includes the
deprecation of block-aperture mode and a new perf events interface for
the papr_scm nvdimm driver. The perf events approach was acked by
PeterZ. You will notice the top commit is less than a week old as
linux-next exposure identified some build failure scenarios. Kajol
turned around a fix and it has appeared in linux-next with no
additional reports. Some other fixups for the removal of
block-aperture mode also generated some follow-on fixes from -next
exposure.

I am not aware of anything else outstanding, please pull.

---

The following changes since commit 754e0b0e35608ed5206d6a67a791563c631cec07:

  Linux 5.17-rc4 (2022-02-13 12:13:30 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
tags/libnvdimm-for-5.18

for you to fetch changes up to ada8d8d337ee970860c9844126e634df8076aa11:

  nvdimm/blk: Fix title level (2022-03-23 17:52:33 -0700)


libnvdimm for 5.18

- Add perf support for nvdimm events, initially only for 'papr_scm'
  devices.

- Deprecate the 'block aperture' support in libnvdimm; it only ever
  existed in the specification, not in shipping product.


Dan Williams (6):
  nvdimm/region: Fix default alignment for small regions
  nvdimm/blk: Delete the block-aperture window driver
  nvdimm/namespace: Delete blk namespace consideration in shared paths
  nvdimm/namespace: Delete nd_namespace_blk
  ACPI: NFIT: Remove block aperture support
  nvdimm/region: Delete nd_blk_region infrastructure

Kajol Jain (6):
  drivers/nvdimm: Add nvdimm pmu structure
  drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  powerpc/papr_scm: Add perf interface support
  docs: ABI: sysfs-bus-nvdimm: Document sysfs event format entries
for nvdimm pmu
  drivers/nvdimm: Fix build failure when CONFIG_PERF_EVENTS is not set
  powerpc/papr_scm: Fix build failure when

Lukas Bulwahn (1):
  MAINTAINERS: remove section LIBNVDIMM BLK: MMIO-APERTURE DRIVER

Tom Rix (1):
  nvdimm/blk: Fix title level

 Documentation/ABI/testing/sysfs-bus-nvdimm |  35 ++
 Documentation/driver-api/nvdimm/nvdimm.rst | 406 +--
 MAINTAINERS|  11 -
 arch/powerpc/include/asm/device.h  |   5 +
 arch/powerpc/platforms/pseries/papr_scm.c  | 230 +
 drivers/acpi/nfit/core.c   | 387 +-
 drivers/acpi/nfit/nfit.h   |   6 -
 drivers/nvdimm/Kconfig |  25 +-
 drivers/nvdimm/Makefile|   4 +-
 drivers/nvdimm/blk.c   | 335 ---
 drivers/nvdimm/bus.c   |   2 -
 drivers/nvdimm/dimm_devs.c | 204 +---
 drivers/nvdimm/label.c | 346 +---
 drivers/nvdimm/label.h |   5 +-
 drivers/nvdimm/namespace_devs.c| 506 ++---
 drivers/nvdimm/nd-core.h   |  27 +-
 drivers/nvdimm/nd.h|  13 -
 drivers/nvdimm/nd_perf.c   | 329 +++
 drivers/nvdimm/region.c|  31 +-
 drivers/nvdimm/region_devs.c   | 157 ++---
 include/linux/libnvdimm.h  |  24 --
 include/linux/nd.h |  78 +++--
 include/uapi/linux/ndctl.h |   2 -
 tools/testing/nvdimm/Kbuild|   4 -
 tools/testing/nvdimm/config_check.c|   1 -
 tools/testing/nvdimm/test/ndtest.c |  67 +---
 tools/testing/nvdimm/test/nfit.c   |  23 --
 27 files changed, 833 insertions(+), 2430 deletions(-)
 delete mode 100644 drivers/nvdimm/blk.c
 create mode 100644 drivers/nvdimm/nd_perf.c


[PATCH] i2c: pasemi: Wait for write xfers to finish

2022-03-29 Thread Martin Povišer
Wait for completion of write transfers before returning from the driver.
At first sight it may seem advantageous to leave write transfers queued
for the controller to carry out on its own time, but there are a couple
of issues with that:

 * Driver doesn't check for FIFO space.

 * The queued writes can complete while the driver is in its I2C read
   transfer path which means it will get confused by the raising of
   XEN (the 'transaction ended' signal). This can cause a spurious
   ENODATA error due to premature reading of the MRXFIFO register.

Adding the wait fixes some unreliability issues with the driver. There's
some efficiency cost to it (especially with pasemi_smb_waitready doing
its polling), but that will be alleviated once the driver receives
interrupt support.

Fixes: beb58aa39e6e ("i2c: PA Semi SMBus driver")
Signed-off-by: Martin Povišer 
---

Tested on Apple's t8103 chip. To my knowledge the PA Semi controller
in its pre-Apple occurrences behaves the same as far as this patch is
concerned.

 drivers/i2c/busses/i2c-pasemi-core.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/i2c/busses/i2c-pasemi-core.c 
b/drivers/i2c/busses/i2c-pasemi-core.c
index 7728c8460dc0..9028ffb58cc0 100644
--- a/drivers/i2c/busses/i2c-pasemi-core.c
+++ b/drivers/i2c/busses/i2c-pasemi-core.c
@@ -137,6 +137,12 @@ static int pasemi_i2c_xfer_msg(struct i2c_adapter *adapter,
 
TXFIFO_WR(smbus, msg->buf[msg->len-1] |
  (stop ? MTXFIFO_STOP : 0));
+
+   if (stop) {
+   err = pasemi_smb_waitready(smbus);
+   if (err)
+   goto reset_out;
+   }
}
 
return 0;
-- 
2.33.0



Re: [RFC PATCH 3/3] objtool/mcount: Add powerpc specific functions

2022-03-29 Thread Christophe Leroy


Le 29/03/2022 à 14:01, Michael Ellerman a écrit :
> Josh Poimboeuf  writes:
>> On Sun, Mar 27, 2022 at 09:09:20AM +, Christophe Leroy wrote:
>>> Second point is the endianness and 32/64 selection, especially when
>>> cross-building. There is already some stuff regarding endianness based on
>>> bswap_if_needed(), but that's based on constant selection at build time
>>> and I couldn't find an easy way to set it conditionally based on the
>>> target being built.
>>>
>>> Regarding 32/64 selection, there is almost nothing; it's based on using
>>> type 'long', which means that at the moment the target and the build
>>> platform must both be 32 bits or 64 bits.
>>>
>>> For both cases (endianness and 32/64) I think the solution should
>>> probably be to start with the file format of the object file being
>>> reworked by objtool.
>>
>> Do we really need to detect the endianness/bitness at runtime?  Objtool
>> is built with the kernel, why not just build-in the same target
>> assumptions as the kernel itself?
> 
> I don't think we need runtime detection. But it will need to support
> basically most combinations of objtool running as 32-bit/64-bit LE/BE
> while the kernel it's analysing is 32-bit/64-bit LE/BE.

Exactly, the way it is done today with a constant in
objtool/endianness.h is too simple; we need to be able to select it
based on the kernel's config. Is there a way to get the CONFIG_ macros
from the kernel? If yes, then we could use CONFIG_64BIT and
CONFIG_CPU_LITTLE_ENDIAN to select the correct options in objtool.


> 
> >>> What work is currently in progress on objtool? Should I wait for
> >>> Josh's changes before starting to look at all this? Should I wait for
> >>> anything else?
>>
>> I'm not making any major changes to the code, just shuffling things
>> around to make the interface more modular.  I hope to have something
>> soon (this week).  Peter recently added a big feature (Intel IBT) which
>> is already in -next.
>>
>> Contributions are welcome, with the understanding that you'll help
>> maintain it ;-)
>>
>> Some years ago Kamalesh Babulal had a prototype of objtool for ppc64le
>> which did the full stack validation.  I'm not sure what ever became of
>> that.
> 
>  From memory he was starting to clean the patches up in late 2019, but I
> guess that probably got derailed by COVID. AFAIK he never posted
> anything. Maybe someone at IBM has a copy internally (Naveen?).
> 
>> FWIW, there have been some objtool patches for arm64 stack validation,
>> but the arm64 maintainers have been hesitant to get on board with
>> objtool, as it brings a certain maintenance burden.  Especially for the
>> full stack validation and ORC unwinder.  But if you only want inline
>> static calls and/or mcount then it'd probably be much easier to
>> maintain.
> 
> I would like to have the stack validation, but I am also worried about
> the maintenance burden.
> 
> I guess we start with mcount, which looks pretty minimal judging by this
> series, and see how we go from there.
> 

I'm not sure mcount is really needed as we have recordmcount, but at 
least it is an easy one to start with, and as we have recordmcount we 
can easily compare the results and check that it works as expected.

Then it should be straightforward to provide static calls.

Then I'd like to go with uaccess block checks as suggested by Christoph 
at 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/a94be61f008ab29c231b805e1a97e9dab35cb0cc.1629732940.git.christophe.le...@csgroup.eu/,
though it might be less easy.


Christophe

[PATCH v2 8/8] powerpc/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE for book3s

2022-03-29 Thread David Hildenbrand
Right now, the last 5 bits (0x1f) of the swap entry are used for the
type and the bit before that (0x20) is used for _PAGE_SWP_SOFT_DIRTY. We
cannot use 0x40, as that collides with _RPAGE_RSV1 -- contained in
_PAGE_HPTEFLAGS. The next candidate would be _RPAGE_SW3 (0x200) -- which is
used for _PAGE_SOFT_DIRTY for !swp ptes.

So let's just use _PAGE_SOFT_DIRTY for _PAGE_SWP_SOFT_DIRTY (to make it
easier to grasp) and use 0x20 now for _PAGE_SWP_EXCLUSIVE.

Signed-off-by: David Hildenbrand 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 21 +++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8e98375d5c4a..eecff2036869 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -752,6 +752,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 */ \
BUILD_BUG_ON(_PAGE_HPTEFLAGS & SWP_TYPE_MASK); \
BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY);   \
+   BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_EXCLUSIVE);\
} while (0)
 
 #define SWP_TYPE_BITS 5
@@ -772,11 +773,13 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t 
newprot)
 #define __swp_entry_to_pmd(x)  (pte_pmd(__swp_entry_to_pte(x)))
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
-#define _PAGE_SWP_SOFT_DIRTY   _PAGE_NON_IDEMPOTENT
+#define _PAGE_SWP_SOFT_DIRTY   _PAGE_SOFT_DIRTY
 #else
 #define _PAGE_SWP_SOFT_DIRTY   0UL
 #endif /* CONFIG_MEM_SOFT_DIRTY */
 
+#define _PAGE_SWP_EXCLUSIVE_PAGE_NON_IDEMPOTENT
+
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
 {
@@ -794,6 +797,22 @@ static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
 }
 #endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
 
+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+   return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_SWP_EXCLUSIVE));
+}
+
+static inline int pte_swp_exclusive(pte_t pte)
+{
+   return !!(pte_raw(pte) & cpu_to_be64(_PAGE_SWP_EXCLUSIVE));
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+   return __pte_raw(pte_raw(pte) & cpu_to_be64(~_PAGE_SWP_EXCLUSIVE));
+}
+
 static inline bool check_pte_access(unsigned long access, unsigned long ptev)
 {
/*
-- 
2.35.1



[PATCH v2 7/8] powerpc/pgtable: remove _PAGE_BIT_SWAP_TYPE for book3s

2022-03-29 Thread David Hildenbrand
The swap type is simply stored in bits 0x1f of the swap pte. Let's
simplify by just getting rid of _PAGE_BIT_SWAP_TYPE. It's not as though
we could simply change it: _PAGE_SWP_SOFT_DIRTY would suddenly fall into
_RPAGE_RSV1, which isn't possible and would make the
BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY) angry.

While at it, make it clearer which bit we're actually using for
_PAGE_SWP_SOFT_DIRTY by just using the proper define and introduce and
use SWP_TYPE_MASK.

Signed-off-by: David Hildenbrand 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 875730d5af40..8e98375d5c4a 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -13,7 +13,6 @@
 /*
  * Common bits between hash and Radix page table
  */
-#define _PAGE_BIT_SWAP_TYPE0
 
 #define _PAGE_EXEC 0x1 /* execute permission */
 #define _PAGE_WRITE0x2 /* write access allowed */
@@ -751,17 +750,16 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t 
newprot)
 * Don't have overlapping bits with _PAGE_HPTEFLAGS \
 * We filter HPTEFLAGS on set_pte.  \
 */ \
-   BUILD_BUG_ON(_PAGE_HPTEFLAGS & (0x1f << _PAGE_BIT_SWAP_TYPE)); \
+   BUILD_BUG_ON(_PAGE_HPTEFLAGS & SWP_TYPE_MASK); \
BUILD_BUG_ON(_PAGE_HPTEFLAGS & _PAGE_SWP_SOFT_DIRTY);   \
} while (0)
 
 #define SWP_TYPE_BITS 5
-#define __swp_type(x)  (((x).val >> _PAGE_BIT_SWAP_TYPE) \
-   & ((1UL << SWP_TYPE_BITS) - 1))
+#define SWP_TYPE_MASK  ((1UL << SWP_TYPE_BITS) - 1)
+#define __swp_type(x)  ((x).val & SWP_TYPE_MASK)
 #define __swp_offset(x)(((x).val & PTE_RPN_MASK) >> PAGE_SHIFT)
 #define __swp_entry(type, offset)  ((swp_entry_t) { \
-   ((type) << _PAGE_BIT_SWAP_TYPE) \
-   | (((offset) << PAGE_SHIFT) & PTE_RPN_MASK)})
+   (type) | (((offset) << PAGE_SHIFT) & 
PTE_RPN_MASK)})
 /*
  * swp_entry_t must be independent of pte bits. We build a swp_entry_t from
  * swap type and offset we get from swap and convert that to pte to find a
@@ -774,7 +772,7 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 #define __swp_entry_to_pmd(x)  (pte_pmd(__swp_entry_to_pte(x)))
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
-#define _PAGE_SWP_SOFT_DIRTY   (1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE))
+#define _PAGE_SWP_SOFT_DIRTY   _PAGE_NON_IDEMPOTENT
 #else
 #define _PAGE_SWP_SOFT_DIRTY   0UL
 #endif /* CONFIG_MEM_SOFT_DIRTY */
-- 
2.35.1



[PATCH v2 6/8] s390/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

2022-03-29 Thread David Hildenbrand
Let's use bit 52, which is unused.

Signed-off-by: David Hildenbrand 
---
 arch/s390/include/asm/pgtable.h | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 3982575bb586..a397b072a580 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -181,6 +181,8 @@ static inline int is_module_addr(void *addr)
 #define _PAGE_SOFT_DIRTY 0x000
 #endif
 
+#define _PAGE_SWP_EXCLUSIVE _PAGE_LARGE/* SW pte exclusive swap bit */
+
 /* Set of bits not changed in pte_modify */
 #define _PAGE_CHG_MASK (PAGE_MASK | _PAGE_SPECIAL | _PAGE_DIRTY | \
 _PAGE_YOUNG | _PAGE_SOFT_DIRTY)
@@ -826,6 +828,22 @@ static inline int pmd_protnone(pmd_t pmd)
 }
 #endif
 
+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline int pte_swp_exclusive(pte_t pte)
+{
+   return pte_val(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+   return set_pte_bit(pte, __pgprot(_PAGE_SWP_EXCLUSIVE));
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+   return clear_pte_bit(pte, __pgprot(_PAGE_SWP_EXCLUSIVE));
+}
+
 static inline int pte_soft_dirty(pte_t pte)
 {
return pte_val(pte) & _PAGE_SOFT_DIRTY;
@@ -1715,14 +1733,15 @@ static inline int has_transparent_hugepage(void)
  * Bits 54 and 63 are used to indicate the page type. Bit 53 marks the pte
  * as invalid.
  * A swap pte is indicated by bit pattern (pte & 0x201) == 0x200
- * | offset|X11XX|type |S0|
+ * | offset|E11XX|type |S0|
  * |001122334455|5|55566|66|
  * |0123456789012345678901234567890123456789012345678901|23456|78901|23|
  *
  * Bits 0-51 store the offset.
+ * Bit 52 (E) is used to remember PG_anon_exclusive.
  * Bits 57-61 store the type.
  * Bit 62 (S) is used for softdirty tracking.
- * Bits 52, 55 and 56 (X) are unused.
+ * Bits 55 and 56 (X) are unused.
  */
 
 #define __SWP_OFFSET_MASK  ((1UL << 52) - 1)
-- 
2.35.1



[PATCH v2 5/8] s390/pgtable: cleanup description of swp pte layout

2022-03-29 Thread David Hildenbrand
Bits 52 and 55 don't have to be zero: they only trigger a
translation-specification exception if the PTE is marked as valid, which
is not the case for swap ptes.

Document which bits are used for what, and which ones are unused.

Signed-off-by: David Hildenbrand 
---
 arch/s390/include/asm/pgtable.h | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 9df679152620..3982575bb586 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1712,18 +1712,17 @@ static inline int has_transparent_hugepage(void)
 /*
  * 64 bit swap entry format:
  * A page-table entry has some bits we have to treat in a special way.
- * Bits 52 and bit 55 have to be zero, otherwise a specification
- * exception will occur instead of a page translation exception. The
- * specification exception has the bad habit not to store necessary
- * information in the lowcore.
- * Bits 54 and 63 are used to indicate the page type.
+ * Bits 54 and 63 are used to indicate the page type. Bit 53 marks the pte
+ * as invalid.
  * A swap pte is indicated by bit pattern (pte & 0x201) == 0x200
- * This leaves the bits 0-51 and bits 56-62 to store type and offset.
- * We use the 5 bits from 57-61 for the type and the 52 bits from 0-51
- * for the offset.
- * | offset|01100|type |00|
+ * | offset|X11XX|type |S0|
  * |001122334455|5|55566|66|
  * |0123456789012345678901234567890123456789012345678901|23456|78901|23|
+ *
+ * Bits 0-51 store the offset.
+ * Bits 57-61 store the type.
+ * Bit 62 (S) is used for softdirty tracking.
+ * Bits 52, 55 and 56 (X) are unused.
  */
 
 #define __SWP_OFFSET_MASK  ((1UL << 52) - 1)
-- 
2.35.1



[PATCH v2 4/8] arm64/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

2022-03-29 Thread David Hildenbrand
Let's use one of the type bits: core-mm only supports 5, so there is no
need to consume 6.

Note that we might be able to reuse bit 1, but reusing bit 1 turned out
problematic in the past for PROT_NONE handling; so let's play it safe and
use another bit.

Reviewed-by: Catalin Marinas 
Signed-off-by: David Hildenbrand 
---
 arch/arm64/include/asm/pgtable-prot.h |  1 +
 arch/arm64/include/asm/pgtable.h  | 23 ---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h 
b/arch/arm64/include/asm/pgtable-prot.h
index b1e1b74d993c..62e0ebeed720 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -14,6 +14,7 @@
  * Software defined PTE bits definition.
  */
 #define PTE_WRITE  (PTE_DBM)/* same as DBM (51) */
+#define PTE_SWP_EXCLUSIVE  (_AT(pteval_t, 1) << 2)  /* only for swp ptes */
 #define PTE_DIRTY  (_AT(pteval_t, 1) << 55)
 #define PTE_SPECIAL(_AT(pteval_t, 1) << 56)
 #define PTE_DEVMAP (_AT(pteval_t, 1) << 57)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 94e147e5456c..ad9b221963d4 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -402,6 +402,22 @@ static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot)
return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
 }
 
+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+   return set_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
+}
+
+static inline int pte_swp_exclusive(pte_t pte)
+{
+   return pte_val(pte) & PTE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+   return clear_pte_bit(pte, __pgprot(PTE_SWP_EXCLUSIVE));
+}
+
 #ifdef CONFIG_NUMA_BALANCING
 /*
  * See the comment in include/linux/pgtable.h
@@ -909,12 +925,13 @@ static inline pmd_t pmdp_establish(struct vm_area_struct 
*vma,
 /*
  * Encode and decode a swap entry:
  * bits 0-1:   present (must be zero)
- * bits 2-7:   swap type
+ * bits 2: remember PG_anon_exclusive
+ * bits 3-7:   swap type
  * bits 8-57:  swap offset
  * bit  58:PTE_PROT_NONE (must be zero)
  */
-#define __SWP_TYPE_SHIFT   2
-#define __SWP_TYPE_BITS6
+#define __SWP_TYPE_SHIFT   3
+#define __SWP_TYPE_BITS5
 #define __SWP_OFFSET_BITS  50
 #define __SWP_TYPE_MASK((1 << __SWP_TYPE_BITS) - 1)
 #define __SWP_OFFSET_SHIFT (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
-- 
2.35.1



[PATCH v2 3/8] x86/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE

2022-03-29 Thread David Hildenbrand
Let's use bit 3 to remember PG_anon_exclusive in swap ptes.

Signed-off-by: David Hildenbrand 
---
 arch/x86/include/asm/pgtable.h   | 16 
 arch/x86/include/asm/pgtable_64.h|  4 +++-
 arch/x86/include/asm/pgtable_types.h |  5 +
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 62ab07e24aef..e42e668153e9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1292,6 +1292,22 @@ static inline void update_mmu_cache_pud(struct 
vm_area_struct *vma,
 {
 }
 
+#define __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+   return pte_set_flags(pte, _PAGE_SWP_EXCLUSIVE);
+}
+
+static inline int pte_swp_exclusive(pte_t pte)
+{
+   return pte_flags(pte) & _PAGE_SWP_EXCLUSIVE;
+}
+
+static inline pte_t pte_swp_clear_exclusive(pte_t pte)
+{
+   return pte_clear_flags(pte, _PAGE_SWP_EXCLUSIVE);
+}
+
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
 {
diff --git a/arch/x86/include/asm/pgtable_64.h 
b/arch/x86/include/asm/pgtable_64.h
index 56d0399a0cd1..e479491da8d5 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -186,7 +186,7 @@ static inline void native_pgd_clear(pgd_t *pgd)
  *
  * | ...| 11| 10|  9|8|7|6|5| 4| 3|2| 1|0| <- bit number
  * | ...|SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names
- * | TYPE (59-63) | ~OFFSET (9-58)  |0|0|X|X| X| X|F|SD|0| <- swp entry
+ * | TYPE (59-63) | ~OFFSET (9-58)  |0|0|X|X| X| E|F|SD|0| <- swp entry
  *
  * G (8) is aliased and used as a PROT_NONE indicator for
  * !present ptes.  We need to start storing swap entries above
@@ -203,6 +203,8 @@ static inline void native_pgd_clear(pgd_t *pgd)
  * F (2) in swp entry is used to record when a pagetable is
  * writeprotected by userfaultfd WP support.
  *
+ * E (3) in swp entry is used to remember PG_anon_exclusive.
+ *
  * Bit 7 in swp entry should be 0 because pmd_present checks not only P,
  * but also L and G.
  *
diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index 40497a9020c6..54a8f370046d 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -83,6 +83,11 @@
 #define _PAGE_SOFT_DIRTY   (_AT(pteval_t, 0))
 #endif
 
+/*
+ * We borrow bit 3 to remember PG_anon_exclusive.
+ */
+#define _PAGE_SWP_EXCLUSIVE_PAGE_PWT
+
 /*
  * Tracking soft dirty bit when a page goes to a swap is tricky.
  * We need a bit which can be stored in pte _and_ not conflict
-- 
2.35.1



[PATCH v2 1/8] mm/swap: remember PG_anon_exclusive via a swp pte bit

2022-03-29 Thread David Hildenbrand
Currently, we clear PG_anon_exclusive in try_to_unmap() and forget about
it. We do this to keep fork() logic on swap entries easy and efficient:
for example, if we didn't clear it when unmapping, we'd have to look up
the page in the swapcache for each and every swap entry during fork() and
clear PG_anon_exclusive if set.

Instead, we want to store that information directly in the swap pte,
protected by the page table lock, similarly to how we handle
SWP_MIGRATION_READ_EXCLUSIVE for migration entries. However, for actual
swap entries, we don't want to mess with the swap type (e.g., still one
bit) because it overcomplicates swap code.

In try_to_unmap(), we already refuse to unmap if the page might be
pinned, because we must not lose PG_anon_exclusive on pinned pages ever.
Checking reliably *before* completely unmapping a page whether there are
other unexpected references is unfortunately not really possible: THPs
heavily overcomplicate the situation. Once fully unmapped it's easier --
we, for example, make sure that there are no unexpected references
*after* unmapping a page before starting writeback on that page.

So, we currently might end up unmapping a page and clearing
PG_anon_exclusive if that page has additional references, for example,
due to a FOLL_GET.

do_swap_page() has to re-determine if a page is exclusive, which will
easily fail if there are other references on a page, most prominently
GUP references via FOLL_GET. This can currently result in memory
corruptions when taking a FOLL_GET | FOLL_WRITE reference on a page even
when fork() is never involved: try_to_unmap() will succeed, and when
refaulting the page, it cannot be marked exclusive and will get replaced
by a copy in the page tables on the next write access, resulting in writes
via the GUP reference to the page being lost.

In an ideal world, everybody that uses GUP and wants to modify page
content, such as O_DIRECT, would properly use FOLL_PIN. However, that
conversion will take a while. It's easier to fix what used to work in the
past (FOLL_GET | FOLL_WRITE) remembering PG_anon_exclusive. In addition,
by remembering PG_anon_exclusive we can further reduce unnecessary COW
in some cases, so it's the natural thing to do.

So let's transfer the PG_anon_exclusive information to the swap pte and
store it via an architecture-dependent pte bit; use that information when
restoring the swap pte in do_swap_page() and unuse_pte(). During fork(), we
simply have to clear the pte bit and are done.

Of course, there is one corner case to handle: swap backends that don't
support concurrent page modifications while the page is under writeback.
Special case these, and drop the exclusive marker. Add a comment why that
is just fine (also, reuse_swap_page() would have done the same in the
past).

In the future, we'll hopefully have all architectures support
__HAVE_ARCH_PTE_SWP_EXCLUSIVE, such that we can get rid of the empty
stubs and the define completely. Then, we can also convert
SWP_MIGRATION_READ_EXCLUSIVE. For architectures it's fairly easy to
support: either simply use a yet unused pte bit that can be used for swap
entries, steal one from the arch type bits if they exceed 5, or steal one
from the offset bits.

Note: R/O FOLL_GET references were never really reliable, especially
when taking one on a shared page and then writing to the page (e.g., GUP
after fork()). FOLL_GET, including R/W references, were never really
reliable once fork was involved (e.g., GUP before fork(),
GUP during fork()). KSM steps back in case it stumbles over unexpected
references and is, therefore, fine.

Signed-off-by: David Hildenbrand 
---
 include/linux/pgtable.h | 29 ++
 include/linux/swapops.h |  2 ++
 mm/memory.c | 55 ++---
 mm/rmap.c   | 19 --
 mm/swapfile.c   | 13 +-
 5 files changed, 105 insertions(+), 13 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index f4f4077b97aa..53750224e176 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1003,6 +1003,35 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, 
pgprot_t newprot)
 #define arch_start_context_switch(prev)do {} while (0)
 #endif
 
+/*
+ * When replacing an anonymous page by a real (!non) swap entry, we clear
+ * PG_anon_exclusive from the page and instead remember whether the flag was
+ * set in the swp pte. During fork(), we have to mark the entry as !exclusive
+ * (possibly shared). On swapin, we use that information to restore
+ * PG_anon_exclusive, which is very helpful in cases where we might have
+ * additional (e.g., FOLL_GET) references on a page and wouldn't be able to
+ * detect exclusivity.
+ *
+ * These functions don't apply to non-swap entries (e.g., migration, hwpoison,
+ * ...).
+ */
+#ifndef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+static inline pte_t pte_swp_mkexclusive(pte_t pte)
+{
+   return pte;
+}
+
+static inline int 

[PATCH v2 2/8] mm/debug_vm_pgtable: add tests for __HAVE_ARCH_PTE_SWP_EXCLUSIVE

2022-03-29 Thread David Hildenbrand
Let's test that __HAVE_ARCH_PTE_SWP_EXCLUSIVE works as expected.

Signed-off-by: David Hildenbrand 
---
 mm/debug_vm_pgtable.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index db2abd9e415b..55f1a8dc716f 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -837,6 +837,19 @@ static void __init pmd_soft_dirty_tests(struct 
pgtable_debug_args *args) { }
 static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args) 
{ }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+static void __init pte_swap_exclusive_tests(struct pgtable_debug_args *args)
+{
+#ifdef __HAVE_ARCH_PTE_SWP_EXCLUSIVE
+   pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
+
+   pr_debug("Validating PTE swap exclusive\n");
+   pte = pte_swp_mkexclusive(pte);
+   WARN_ON(!pte_swp_exclusive(pte));
+   pte = pte_swp_clear_exclusive(pte);
+   WARN_ON(pte_swp_exclusive(pte));
+#endif /* __HAVE_ARCH_PTE_SWP_EXCLUSIVE */
+}
+
 static void __init pte_swap_tests(struct pgtable_debug_args *args)
 {
swp_entry_t swp;
@@ -1288,6 +1301,8 @@ static int __init debug_vm_pgtable(void)
	pte_swap_soft_dirty_tests(&args);
	pmd_swap_soft_dirty_tests(&args);
 
+	pte_swap_exclusive_tests(&args);
+
	pte_swap_tests(&args);
	pmd_swap_tests(&args);
 
-- 
2.35.1



[PATCH v2 0/8] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages

2022-03-29 Thread David Hildenbrand
More information on the general COW issues can be found at [2]. This series
is based on latest linus/master and [1]:
[PATCH v3 00/16] mm: COW fixes part 2: reliable GUP pins of
anonymous pages

v2 is located at:
https://github.com/davidhildenbrand/linux/tree/cow_fixes_part_3_v2


This series fixes memory corruptions when a GUP R/W reference
(FOLL_WRITE | FOLL_GET) was taken on an anonymous page and COW logic fails
to detect exclusivity of the page and then replaces the anonymous page with
a copy in the page table: the GUP reference loses synchronicity with the
pages mapped into the page tables. This series focuses on x86, arm64,
s390x and ppc64/book3s -- other architectures are fairly easy to support
by implementing __HAVE_ARCH_PTE_SWP_EXCLUSIVE.

This primarily fixes the O_DIRECT memory corruptions that can happen on
concurrent swapout, whereby we lose DMA reads to a page (modifying the user
page by writing to it).

O_DIRECT currently uses FOLL_GET for short-term (!FOLL_LONGTERM)
DMA from/to a user page. In the long run, we want to convert it to properly
use FOLL_PIN, and John is working on it, but that might take a while and
might not be easy to backport. In the meantime, let's restore what used to
work before we started modifying our COW logic: make R/W FOLL_GET
references reliable as long as there is no fork() after GUP involved.

This is just the natural follow-up of part 2, which will also further
reduce "wrong COW" on the swapin path, for example, when we cannot remove
a page from the swapcache due to concurrent writeback, or if we have two
threads faulting on the same swapped-out page. Fixing O_DIRECT is just a
nice side product.

This issue, including other related COW issues, has been summarized in [3]
under 2):
"
  2. Intra Process Memory Corruptions due to Wrong COW (FOLL_GET)

  It was discovered that we can create a memory corruption by reading a
  file via O_DIRECT to a part (e.g., first 512 bytes) of a page,
  concurrently writing to an unrelated part (e.g., last byte) of the same
  page, and concurrently write-protecting the page via clear_refs
  SOFTDIRTY tracking [6].

  For the reproducer, the issue is that O_DIRECT grabs a reference of the
  target page (via FOLL_GET) and clear_refs write-protects the relevant
  page table entry. On successive write access to the page from the
  process itself, we wrongly COW the page when resolving the write fault,
  resulting in a loss of synchronicity and consequently a memory corruption.

  While some people might think that using clear_refs in this combination
  is a corner case, it unfortunately turns out to be a more generic problem.

  For example, it was just recently discovered that we can similarly
  create a memory corruption without clear_refs, simply by concurrently
  swapping out the buffer pages [7]. Note that we nowadays even use the
  swap infrastructure in Linux without an actual swap disk/partition: the
  prime example is zram which is enabled as default under Fedora [10].

  The root issue is that a write-fault on a page that has additional
  references results in a COW and thereby a loss of synchronicity
  and consequently a memory corruption if two parties believe they are
  referencing the same page.
"

We don't particularly care about R/O FOLL_GET references: they were never
reliable and O_DIRECT doesn't expect to observe modifications from a page
after DMA was started.

Note that:
* this only fixes the issue on x86, arm64, s390x and ppc64/book3s
  ("enterprise architectures"). Other architectures have to implement
  __HAVE_ARCH_PTE_SWP_EXCLUSIVE to achieve the same.
* this does *not* consider any kind of fork() after taking the reference:
  fork() after GUP never worked reliably with FOLL_GET.
* Not losing PG_anon_exclusive during swapout was the last remaining
  piece. KSM already makes sure that there are no other references on
  a page before considering it for sharing. Page migration maintains
  PG_anon_exclusive and simply fails when there are additional references
  (freezing the refcount fails). Only swapout code dropped the
  PG_anon_exclusive flag because it requires more work to remember +
  restore it.

With this series in place, most COW issues of [3] are fixed on said
architectures. Other architectures can implement
__HAVE_ARCH_PTE_SWP_EXCLUSIVE fairly easily.

[1] https://lkml.kernel.org/r/20220329160440.193848-1-da...@redhat.com
[2] https://lkml.kernel.org/r/20211217113049.23850-1-da...@redhat.com
[3] https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4...@redhat.com

v2 -> v3:
* Rebased and retested
* "arm64/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE"
  -> Add RB and a comment to the patch description
* "s390/pgtable: cleanup description of swp pte layout"
  -> Added
* "s390/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE"
  -> Use new set_pte_bit()/clear_pte_bit()
  -> Fixups comments/patch description

David Hildenbrand (8):
  mm/swap: remember PG_anon_exclusive via a swp pte 

[PATCH] powerpc/pseries/vas: use default_groups in kobj_type

2022-03-29 Thread Greg Kroah-Hartman
There are currently two ways to create a set of sysfs files for a
kobj_type: through the default_attrs field and the default_groups
field.  Move the pseries vas sysfs code to use the default_groups field,
which has been the preferred way since aa30f47cf666 ("kobject: Add
support for default attribute groups to kobj_type"), so that we can soon
get rid of the obsolete default_attrs field.
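The diff below relies on the kernel's ATTRIBUTE_GROUPS() helper. For readers unfamiliar with it, here is a simplified userspace model of what it expands to (the real macro in include/linux/sysfs.h also handles binary attributes and related details, and the demo attribute names here are hypothetical):

```c
#include <stddef.h>

/* Cut-down versions of the kernel structs, just enough for the model. */
struct attribute { const char *name; };
struct attribute_group { struct attribute **attrs; };

/* Simplified ATTRIBUTE_GROUPS(): from foo_attrs[] it builds a foo_group
 * wrapping the array, plus a NULL-terminated foo_groups[] pointer list,
 * which is exactly what .default_groups consumes. */
#define ATTRIBUTE_GROUPS(_name)						\
static const struct attribute_group _name##_group = {			\
	.attrs = _name##_attrs,						\
};									\
static const struct attribute_group *_name##_groups[] = {		\
	&_name##_group,							\
	NULL,								\
}

/* Hypothetical demo attribute standing in for the vas credit attributes. */
static struct attribute demo_attribute = { .name = "demo" };
static struct attribute *vas_demo_attrs[] = {
	&demo_attribute,
	NULL,
};
ATTRIBUTE_GROUPS(vas_demo);
```

So the conversion is mechanical: add ATTRIBUTE_GROUPS(foo) after foo_attrs[] and switch .default_attrs = foo_attrs to .default_groups = foo_groups.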

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Haren Myneni 
Cc: Nicholas Piggin 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman 
---

Note, I would like to take this through my driver-core tree for 5.18-rc2
as this is the last hold-out of the default_attrs field.  It "snuck" in
as new code for 5.18-rc1, any objection to me taking it?

thanks,

greg k-h

 arch/powerpc/platforms/pseries/vas-sysfs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/vas-sysfs.c 
b/arch/powerpc/platforms/pseries/vas-sysfs.c
index 4a7fcde5afc0..909535ca513a 100644
--- a/arch/powerpc/platforms/pseries/vas-sysfs.c
+++ b/arch/powerpc/platforms/pseries/vas-sysfs.c
@@ -99,6 +99,7 @@ static struct attribute *vas_def_capab_attrs[] = {
_used_credits_attribute.attr,
NULL,
 };
+ATTRIBUTE_GROUPS(vas_def_capab);
 
 static struct attribute *vas_qos_capab_attrs[] = {
_total_credits_attribute.attr,
@@ -106,6 +107,7 @@ static struct attribute *vas_qos_capab_attrs[] = {
_total_credits_attribute.attr,
NULL,
 };
+ATTRIBUTE_GROUPS(vas_qos_capab);
 
 static ssize_t vas_type_show(struct kobject *kobj, struct attribute *attr,
 char *buf)
@@ -154,13 +156,13 @@ static const struct sysfs_ops vas_sysfs_ops = {
 static struct kobj_type vas_def_attr_type = {
.release=   vas_type_release,
	.sysfs_ops	=	&vas_sysfs_ops,
-   .default_attrs  =   vas_def_capab_attrs,
+   .default_groups =   vas_def_capab_groups,
 };
 
 static struct kobj_type vas_qos_attr_type = {
.release=   vas_type_release,
	.sysfs_ops	=	&vas_sysfs_ops,
-   .default_attrs  =   vas_qos_capab_attrs,
+   .default_groups =   vas_qos_capab_groups,
 };
 
 static char *vas_caps_kobj_name(struct vas_caps_entry *centry,
-- 
2.35.1



Re: [PATCH 09/22] gpio-winbond: Use C99 initializers

2022-03-29 Thread Bartosz Golaszewski
On Sat, Mar 26, 2022 at 6:00 PM Benjamin Stürz  wrote:
>
> This replaces comments with C99's designated
> initializers because the kernel supports them now.
>
> Signed-off-by: Benjamin Stürz 
> ---
>  drivers/gpio/gpio-winbond.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpio/gpio-winbond.c b/drivers/gpio/gpio-winbond.c
> index 7f8f5b02e31d..0b637fdb407c 100644
> --- a/drivers/gpio/gpio-winbond.c
> +++ b/drivers/gpio/gpio-winbond.c
> @@ -249,7 +249,7 @@ struct winbond_gpio_info {
>  };
>
>  static const struct winbond_gpio_info winbond_gpio_infos[6] = {
> -   { /* 0 */
> +   [0] = {
> .dev = WB_SIO_DEV_GPIO12,
> .enablereg = WB_SIO_GPIO12_REG_ENABLE,
> .enablebit = WB_SIO_GPIO12_ENABLE_1,
> @@ -266,7 +266,7 @@ static const struct winbond_gpio_info 
> winbond_gpio_infos[6] = {
> .warnonly = true
> }
> },
> -   { /* 1 */
> +   [1] = {
> .dev = WB_SIO_DEV_GPIO12,
> .enablereg = WB_SIO_GPIO12_REG_ENABLE,
> .enablebit = WB_SIO_GPIO12_ENABLE_2,
> @@ -277,7 +277,7 @@ static const struct winbond_gpio_info 
> winbond_gpio_infos[6] = {
> .datareg = WB_SIO_GPIO12_REG_DATA2
> /* special conflict handling so doesn't use conflict data */
> },
> -   { /* 2 */
> +   [2] = {
> .dev = WB_SIO_DEV_GPIO34,
> .enablereg = WB_SIO_GPIO34_REG_ENABLE,
> .enablebit = WB_SIO_GPIO34_ENABLE_3,
> @@ -294,7 +294,7 @@ static const struct winbond_gpio_info 
> winbond_gpio_infos[6] = {
> .warnonly = true
> }
> },
> -   { /* 3 */
> +   [3] = {
> .dev = WB_SIO_DEV_GPIO34,
> .enablereg = WB_SIO_GPIO34_REG_ENABLE,
> .enablebit = WB_SIO_GPIO34_ENABLE_4,
> @@ -311,7 +311,7 @@ static const struct winbond_gpio_info 
> winbond_gpio_infos[6] = {
> .warnonly = true
> }
> },
> -   { /* 4 */
> +   [4] = {
> .dev = WB_SIO_DEV_WDGPIO56,
> .enablereg = WB_SIO_WDGPIO56_REG_ENABLE,
> .enablebit = WB_SIO_WDGPIO56_ENABLE_5,
> @@ -328,7 +328,7 @@ static const struct winbond_gpio_info 
> winbond_gpio_infos[6] = {
> .warnonly = true
> }
> },
> -   { /* 5 */
> +   [5] = {
> .dev = WB_SIO_DEV_WDGPIO56,
> .enablereg = WB_SIO_WDGPIO56_REG_ENABLE,
> .enablebit = WB_SIO_WDGPIO56_ENABLE_6,
> --
> 2.35.1
>

Acked-by: Bartosz Golaszewski 


Re: [RFC PATCH 3/3] objtool/mcount: Add powerpc specific functions

2022-03-29 Thread Michael Ellerman
Josh Poimboeuf  writes:
> On Sun, Mar 27, 2022 at 09:09:20AM +, Christophe Leroy wrote:
>> Second point is the endianess and 32/64 selection, especially when 
>> crossbuilding. There is already some stuff regarding endianess based on 
>> bswap_if_needed() but that's based on constant selection at build time 
>> and I couldn't find an easy way to set it conditionaly based on the 
>> target being built.
>>
>> Regarding 32/64 selection, there is almost nothing, it's based on using 
>> type 'long' which means that at the time being the target and the build 
>> platform must both be 32 bits or 64 bits.
>> 
>> For both cases (endianess and 32/64) I think the solution should 
>> probably be to start with the fileformat of the object file being 
>> reworked by objtool.
>
> Do we really need to detect the endianness/bitness at runtime?  Objtool
> is built with the kernel, why not just build-in the same target
> assumptions as the kernel itself?

I don't think we need runtime detection. But it will need to support
basically most combinations of objtool running as 32-bit/64-bit LE/BE
while the kernel it's analysing is 32-bit/64-bit LE/BE.
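One hedged sketch of how a tool could handle all those combinations without build-time constants (names here are illustrative, not objtool's actual interface): the target's bitness and endianness are both recorded in the ELF identification bytes of the object file, so byte-swapping can be keyed per-file rather than off a compile-time bswap_if_needed() selection.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

struct target_fmt {
	int is_64bit;	/* ELFCLASS64? */
	int is_le;	/* ELFDATA2LSB? */
};

/* Read class and data encoding from the e_ident prefix of the object. */
static int detect_target_fmt(const unsigned char *e_ident,
			     struct target_fmt *fmt)
{
	if (memcmp(e_ident, ELFMAG, SELFMAG) != 0)
		return -1;	/* not an ELF file */
	fmt->is_64bit = (e_ident[EI_CLASS] == ELFCLASS64);
	fmt->is_le = (e_ident[EI_DATA] == ELFDATA2LSB);
	return 0;
}

/* Swap a value read from the target file only when the target's
 * endianness differs from the host's. */
static uint64_t bswap_if_needed(const struct target_fmt *fmt, int host_le,
				uint64_t v)
{
	return (fmt->is_le == host_le) ? v : __builtin_bswap64(v);
}
```

With something like this, the same tool binary can analyse a 32-bit BE vmlinux.o from a 64-bit LE build host, which is the cross-build case discussed above.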

>> What are current works in progress on objtool ? Should I wait Josh's 
>> changes before starting looking at all this ? Should I wait for anything 
>> else ?
>
> I'm not making any major changes to the code, just shuffling things
> around to make the interface more modular.  I hope to have something
> soon (this week).  Peter recently added a big feature (Intel IBT) which
> is already in -next.
>
> Contributions are welcome, with the understanding that you'll help
> maintain it ;-)
>
> Some years ago Kamalesh Babulal had a prototype of objtool for ppc64le
> which did the full stack validation.  I'm not sure what ever became of
> that.

From memory he was starting to clean the patches up in late 2019, but I
guess that probably got derailed by COVID. AFAIK he never posted
anything. Maybe someone at IBM has a copy internally (Naveen?).

> FWIW, there have been some objtool patches for arm64 stack validation,
> but the arm64 maintainers have been hesitant to get on board with
> objtool, as it brings a certain maintenance burden.  Especially for the
> full stack validation and ORC unwinder.  But if you only want inline
> static calls and/or mcount then it'd probably be much easier to
> maintain.

I would like to have the stack validation, but I am also worried about
the maintenance burden.

I guess we start with mcount, which looks pretty minimal judging by this
series, and see how we go from there.

cheers


Re: [PATCH] powerpc/rtas: Keep MSR RI set when calling RTAS

2022-03-29 Thread Michael Ellerman
Laurent Dufour  writes:
> On 29/03/2022, 10:31:33, Nicholas Piggin wrote:
>> Excerpts from Laurent Dufour's message of March 17, 2022 9:06 pm:
>>> RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32bits
>>> mode (MSR[SF] unset).
>>>
>>> The change in MSR is done in enter_rtas() in a relatively complex way,
>>> since the MSR value could be hardcoded.
>>>
>>> Furthermore, a panic has been reported when hitting the watchdog interrupt
>>> while running in RTAS, this leads to the following stack trace:
>>>
>>> [69244.027433][   C24] watchdog: CPU 24 Hard LOCKUP
>>> [69244.027442][   C24] watchdog: CPU 24 TB:997512652051031, last heartbeat 
>>> TB:997504470175378 (15980ms ago)
>>> [69244.027451][   C24] Modules linked in: chacha_generic(E) libchacha(E) 
>>> xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) 
>>> libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) 
>>> algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) 
>>> fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) 
>>> cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) 
>>> algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) 
>>> rpcsec_gss_krb5(E) auth_rpcgss(E)
>>> nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) tcp_diag(E) 
>>> udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) 
>>> netlink_diag(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) 
>>> fscache(E) netfs(E) af_packet(E) rfkill(E) bonding(E) tls(E) ibmveth(EX) 
>>> crct10dif_vpmsum(E) rtc_generic(E) drm(E) drm_panel_orientation_quirks(E) 
>>> fuse(E) configfs(E) backlight(E) ip_tables(E) x_tables(E) 
>>> dm_service_time(E) sd_mod(E) t10_pi(E)
>>> [69244.027555][   C24]  ibmvfc(EX) scsi_transport_fc(E) vmx_crypto(E) 
>>> gf128mul(E) btrfs(E) blake2b_generic(E) libcrc32c(E) crc32c_vpmsum(E) 
>>> xor(E) raid6_pq(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) 
>>> dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) 
>>> scsi_mod(E)
>>> [69244.027587][   C24] Supported: No, Unreleased kernel
>>> [69244.027600][   C24] CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded 
>>> Tainted: GE  X5.14.21-150400.71.1.bz196362_2-default #1 
>>> SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
>>> [69244.027609][   C24] NIP:  1fb41050 LR: 1fb4104c CTR: 
>>> 
>>> [69244.027612][   C24] REGS: cfc33d60 TRAP: 0100   Tainted: G   
>>>  E  X (5.14.21-150400.71.1.bz196362_2-default)
>>> [69244.027615][   C24] MSR:  82981000   CR: 4882 
>>>  XER: 20040020
>>> [69244.027625][   C24] CFAR: 011c IRQMASK: 1
>>> [69244.027625][   C24] GPR00: 0003  
>>> 0001 50dc
>>> [69244.027625][   C24] GPR04: 1ffb6100 0020 
>>> 0001 1fb09010
>>> [69244.027625][   C24] GPR08: 2000  
>>>  
>>> [69244.027625][   C24] GPR12: 8004072a40a8 cff8b680 
>>> 0007 0034
>>> [69244.027625][   C24] GPR16: 1fbf6e94 1fbf6d84 
>>> 1fbd1db0 1fb3f008
>>> [69244.027625][   C24] GPR20: 1fb41018  
>>> 017f f68f
>>> [69244.027625][   C24] GPR24: 1fb18fe8 1fb3e000 
>>> 1fb1adc0 1fb1cf40
>>> [69244.027625][   C24] GPR28: 1fb26000 1fb460f0 
>>> 1fb17f18 1fb17000
>>> [69244.027663][   C24] NIP [1fb41050] 0x1fb41050
>>> [69244.027696][   C24] LR [1fb4104c] 0x1fb4104c
>>> [69244.027699][   C24] Call Trace:
>>> [69244.027701][   C24] Instruction dump:
>>> [69244.027723][   C24]      
>>>   
>>> [69244.027728][   C24]      
>>>   
>>> [69244.027762][T87504] Oops: Unrecoverable System Reset, sig: 6 [#1]
>>> [69244.028044][T87504] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA 
>>> pSeries
>>> [69244.028089][T87504] Modules linked in: chacha_generic(E) libchacha(E) 
>>> xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) 
>>> libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) 
>>> algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) 
>>> fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) 
>>> cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) 
>>> algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) 
>>> rpcsec_gss_krb5(E) auth_rpcgss(E)
>>> nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) tcp_diag(E) 
>>> udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) 
>>> netlink_diag(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) 
>>> fscache(E) netfs(E) af_packet(E) 

Re: [PATCH] livepatch: Remove klp_arch_set_pc() and asm/livepatch.h

2022-03-29 Thread Miroslav Benes
On Mon, 28 Mar 2022, Christophe Leroy wrote:

> All three versions of klp_arch_set_pc() do exactly the same: they
> call ftrace_instruction_pointer_set().
> 
> Call ftrace_instruction_pointer_set() directly and remove
> klp_arch_set_pc().
> 
> As klp_arch_set_pc() was the only thing remaining in asm/livepatch.h
> on x86 and s390, remove asm/livepatch.h
> 
> livepatch.h remains on powerpc but its content is exclusively used
> by powerpc specific code.
> 
> Signed-off-by: Christophe Leroy 

Acked-by: Miroslav Benes 

M


Re: [PATCH 1/3] sched: topology: add input parameter for sched_domain_flags_f()

2022-03-29 Thread Peter Zijlstra
On Tue, Mar 29, 2022 at 02:15:19AM -0700, Qing Wang wrote:
> From: Wang Qing 
> 
> sched_domain_flags_f() is statically set now, but actually we can get a lot
> of necessary information based on the cpu_map, e.g. we can know whether its
> cache is shared.
> 
> This allows custom extensions without affecting current behavior.

Still NAK


[PATCH 3/3] arm64: add arm64 default topology

2022-03-29 Thread Qing Wang
From: Wang Qing 

default_topology does not fit arm64, especially the CPU and cache topology.
Add arm64_topology so we can do more based on CONFIG_GENERIC_ARCH_TOPOLOGY.

arm64_xxx_flags() prefers to get the cache attributes from DT.

Signed-off-by: Wang Qing 
---
 arch/arm64/kernel/smp.c | 56 +
 1 file changed, 56 insertions(+)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 27df5c1..d245012
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -715,6 +715,60 @@ void __init smp_init_cpus(void)
}
 }
 
+#ifdef CONFIG_SCHED_CLUSTER
+static int arm64_cluster_flags(const struct cpumask *cpu_map)
+{
+   int flag = cpu_cluster_flags();
+   int ret = cpu_share_private_cache(cpu_map);
+   if (ret == 1)
+   flag |= SD_SHARE_PKG_RESOURCES;
+   else if (ret == 0)
+   flag &= ~SD_SHARE_PKG_RESOURCES;
+
+   return flag;
+}
+#endif
+
+#ifdef CONFIG_SCHED_MC
+static int arm64_core_flags(const struct cpumask *cpu_map)
+{
+   int flag = cpu_core_flags();
+   int ret = cpu_share_private_cache(cpu_map);
+   if (ret == 1)
+   flag |= SD_SHARE_PKG_RESOURCES;
+   else if (ret == 0)
+   flag &= ~SD_SHARE_PKG_RESOURCES;
+
+   return flag;
+}
+#endif
+
+static int arm64_die_flags(const struct cpumask *cpu_map)
+{
+   int flag = 0;
+   int ret = cpu_share_private_cache(cpu_map);
+   if (ret == 1)
+   flag |= SD_SHARE_PKG_RESOURCES;
+   else if (ret == 0)
+   flag &= ~SD_SHARE_PKG_RESOURCES;
+
+   return flag;
+}
+
+static struct sched_domain_topology_level arm64_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+   { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+#ifdef CONFIG_SCHED_CLUSTER
+   { cpu_clustergroup_mask, arm64_cluster_flags, SD_INIT_NAME(CLS) },
+#endif
+#ifdef CONFIG_SCHED_MC
+   { cpu_coregroup_mask, arm64_core_flags, SD_INIT_NAME(MC) },
+#endif
+   { cpu_cpu_mask, arm64_die_flags, SD_INIT_NAME(DIE) },
+   { NULL, },
+};
+
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
const struct cpu_operations *ops;
@@ -723,6 +777,8 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
unsigned int this_cpu;
 
init_cpu_topology();
+   init_cpu_cache_topology();
+   set_sched_topology(arm64_topology);
 
this_cpu = smp_processor_id();
store_cpu_topology(this_cpu);
-- 
2.7.4



[PATCH 2/3] arch_topology: support for describing cache topology from DT

2022-03-29 Thread Qing Wang
From: Wang Qing 

When ACPI is not enabled, we can get the cache topology from DT like:
*   cpu0: cpu@000 {
*   next-level-cache = <&L2_1>;
*   L2_1: l2-cache {
*   compatible = "cache";
*   next-level-cache = <&L3_1>;
*   };
*   L3_1: l3-cache {
*   compatible = "cache";
*   };
*   };
*
*   cpu1: cpu@001 {
*   next-level-cache = <&L2_1>;
*   cpu-idle-states = <_l 
*   _mem _pll 
_bus
*   >;
*   };
*   cpu2: cpu@002 {
*   L2_2: l2-cache {
*   compatible = "cache";
*   next-level-cache = <&L3_1>;
*   };
*   };
*
*   cpu3: cpu@003 {
*   next-level-cache = <&L2_2>;
*   };
cache_topology holds the pointers described by "next-level-cache",
so it can describe the cache topology at every level.

Signed-off-by: Wang Qing 
---
 drivers/base/arch_topology.c  | 89 ++-
 include/linux/arch_topology.h |  4 ++
 2 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 1d6636e..41e0301
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -647,6 +647,92 @@ static int __init parse_dt_topology(void)
 }
 #endif
 
+
+/*
+ * cpu cache topology table
+ */
+#define MAX_CACHE_LEVEL 7
+struct device_node *cache_topology[NR_CPUS][MAX_CACHE_LEVEL];
+
+void init_cpu_cache_topology(void)
+{
+   struct device_node *node_cpu, *node_cache;
+   int cpu;
+   int level = 0;
+
+   for_each_possible_cpu(cpu) {
+   node_cpu = of_get_cpu_node(cpu, NULL);
+   if (!node_cpu)
+   continue;
+
+   level = 0;
+   node_cache = node_cpu;
+   while (level < MAX_CACHE_LEVEL) {
+   node_cache = of_parse_phandle(node_cache, 
"next-level-cache", 0);
+   if (!node_cache)
+   break;
+
+   cache_topology[cpu][level++] = node_cache;
+   }
+   of_node_put(node_cpu);
+   }
+}
+
+/*
+ * private means only shared within cpu_mask.
+ * Returns -1 if not described in DT.
+ */
+int cpu_share_private_cache(const struct cpumask *cpu_mask)
+{
+   int cache_level, cpu_id;
+   struct cpumask cache_mask;
+   int cpu = cpumask_first(cpu_mask);
+
+   for (cache_level = 0; cache_level < MAX_CACHE_LEVEL; cache_level++) {
+   if (!cache_topology[cpu][cache_level])
+   return -1;
+
+   cpumask_clear(&cache_mask);
+   for (cpu_id = 0; cpu_id < NR_CPUS; cpu_id++) {
+   if (cache_topology[cpu][cache_level] == 
cache_topology[cpu_id][cache_level])
+   cpumask_set_cpu(cpu_id, _mask);
+   }
+
+   if (cpumask_equal(cpu_mask, &cache_mask))
+   return 1;
+   }
+
+   return 0;
+}
+
+bool cpu_share_llc(int cpu1, int cpu2)
+{
+   int cache_level;
+
+   for (cache_level = MAX_CACHE_LEVEL - 1; cache_level > 0; cache_level--) 
{
+   if (!cache_topology[cpu1][cache_level])
+   continue;
+
+   if (cache_topology[cpu1][cache_level] == 
cache_topology[cpu2][cache_level])
+   return true;
+
+   return false;
+   }
+
+   return false;
+}
+
+bool cpu_share_l2c(int cpu1, int cpu2)
+{
+   if (!cache_topology[cpu1][0])
+   return false;
+
+   if (cache_topology[cpu1][0] == cache_topology[cpu2][0])
+   return true;
+
+   return false;
+}
+
 /*
  * cpu topology table
  */
@@ -684,7 +770,8 @@ void update_siblings_masks(unsigned int cpuid)
for_each_online_cpu(cpu) {
cpu_topo = &cpu_topology[cpu];
 
-   if (cpuid_topo->llc_id == cpu_topo->llc_id) {
+   if ((cpuid_topo->llc_id != -1 && cpuid_topo->llc_id == 
cpu_topo->llc_id)
+   || (cpuid_topo->llc_id == -1 && cpu_share_llc(cpu, 
cpuid))) {
cpumask_set_cpu(cpu, &cpuid_topo->llc_sibling);
cpumask_set_cpu(cpuid, &cpu_topo->llc_sibling);
}
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 58cbe18..a402ff6
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -86,6 +86,10 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
#define topology_cluster_cpumask(cpu)  (&cpu_topology[cpu].cluster_sibling)
#define topology_llc_cpumask(cpu)  (&cpu_topology[cpu].llc_sibling)
 void 

[PATCH 1/3] sched: topology: add input parameter for sched_domain_flags_f()

2022-03-29 Thread Qing Wang
From: Wang Qing 

sched_domain_flags_f() is statically set now, but actually we can get a lot
of necessary information based on the cpu_map, e.g. we can know whether its
cache is shared.

This allows custom extensions without affecting current behavior.

Signed-off-by: Wang Qing 
---
 arch/powerpc/kernel/smp.c  |  4 ++--
 arch/x86/kernel/smpboot.c  |  8 
 include/linux/sched/topology.h | 10 +-
 kernel/sched/topology.c|  2 +-
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index de0f6f0..e503d23
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1000,7 +1000,7 @@ static bool shared_caches;
 
 #ifdef CONFIG_SCHED_SMT
 /* cpumask of CPUs with asymmetric SMT dependency */
-static int powerpc_smt_flags(void)
+static int powerpc_smt_flags(const struct cpumask *cpu_map)
 {
int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
 
@@ -1018,7 +1018,7 @@ static int powerpc_smt_flags(void)
  * since the migrated task remains cache hot. We want to take advantage of this
  * at the scheduler level so an extra topology level is required.
  */
-static int powerpc_shared_cache_flags(void)
+static int powerpc_shared_cache_flags(const struct cpumask *cpu_map)
 {
return SD_SHARE_PKG_RESOURCES;
 }
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 2ef1477..c005a8e
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -535,25 +535,25 @@ static bool match_llc(struct cpuinfo_x86 *c, struct 
cpuinfo_x86 *o)
 
 
 #if defined(CONFIG_SCHED_SMT) || defined(CONFIG_SCHED_CLUSTER) || 
defined(CONFIG_SCHED_MC)
-static inline int x86_sched_itmt_flags(void)
+static inline int x86_sched_itmt_flags(const struct cpumask *cpu_map)
 {
return sysctl_sched_itmt_enabled ? SD_ASYM_PACKING : 0;
 }
 
 #ifdef CONFIG_SCHED_MC
-static int x86_core_flags(void)
+static int x86_core_flags(const struct cpumask *cpu_map)
 {
return cpu_core_flags() | x86_sched_itmt_flags();
 }
 #endif
 #ifdef CONFIG_SCHED_SMT
-static int x86_smt_flags(void)
+static int x86_smt_flags(const struct cpumask *cpu_map)
 {
return cpu_smt_flags() | x86_sched_itmt_flags();
 }
 #endif
 #ifdef CONFIG_SCHED_CLUSTER
-static int x86_cluster_flags(void)
+static int x86_cluster_flags(const struct cpumask *cpu_map)
 {
return cpu_cluster_flags() | x86_sched_itmt_flags();
 }
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 56cffe4..6aa985a
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -36,28 +36,28 @@ extern const struct sd_flag_debug sd_flag_debug[];
 #endif
 
 #ifdef CONFIG_SCHED_SMT
-static inline int cpu_smt_flags(void)
+static inline int cpu_smt_flags(const struct cpumask *cpu_map)
 {
return SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
 }
 #endif
 
 #ifdef CONFIG_SCHED_CLUSTER
-static inline int cpu_cluster_flags(void)
+static inline int cpu_cluster_flags(const struct cpumask *cpu_map)
 {
return SD_SHARE_PKG_RESOURCES;
 }
 #endif
 
 #ifdef CONFIG_SCHED_MC
-static inline int cpu_core_flags(void)
+static inline int cpu_core_flags(const struct cpumask *cpu_map)
 {
return SD_SHARE_PKG_RESOURCES;
 }
 #endif
 
 #ifdef CONFIG_NUMA
-static inline int cpu_numa_flags(void)
+static inline int cpu_numa_flags(const struct cpumask *cpu_map)
 {
return SD_NUMA;
 }
@@ -180,7 +180,7 @@ void free_sched_domains(cpumask_var_t doms[], unsigned int 
ndoms);
 bool cpus_share_cache(int this_cpu, int that_cpu);
 
 typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
-typedef int (*sched_domain_flags_f)(void);
+typedef int (*sched_domain_flags_f)(const struct cpumask *cpu_map);
 
 #define SDTL_OVERLAP   0x01
 
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 05b6c2a..34dfec4
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1556,7 +1556,7 @@ sd_init(struct sched_domain_topology_level *tl,
sd_weight = cpumask_weight(tl->mask(cpu));
 
if (tl->sd_flags)
-   sd_flags = (*tl->sd_flags)();
+   sd_flags = (*tl->sd_flags)(tl->mask(cpu));
if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
"wrong sd_flags in topology description\n"))
sd_flags &= TOPOLOGY_SD_FLAGS;
-- 
2.7.4



[PATCH 0/3] support for describing cache topology from DT

2022-03-29 Thread Qing Wang
From: Wang Qing 

We don't know anything about the cache topology info without ACPI,
but in fact we can get it from DT like:
*   cpu0: cpu@000 {
*   next-level-cache = <&L2_1>;
*   L2_1: l2-cache {
*   compatible = "cache";
*   next-level-cache = <&L3_1>;
*   };
*   L3_1: l3-cache {
*   compatible = "cache";
*   };
*   };
*
*   cpu1: cpu@001 {
*   next-level-cache = <&L2_1>;
*   cpu-idle-states = <_l 
*   _mem _pll 
_bus
*   >;
*   };
*   cpu2: cpu@002 {
*   L2_2: l2-cache {
*   compatible = "cache";
*   next-level-cache = <&L3_1>;
*   };
*   };
*
*   cpu3: cpu@003 {
*   next-level-cache = <&L2_2>;
*   };
Building the cache topology has many benefits; here is part of the usage.

Wang Qing (3):
  sched: topology: add input parameter for sched_domain_flags_f()
  arch_topology: support for describing cache topology from DT
  arm64: add arm64 default topology

 arch/arm64/kernel/smp.c| 56 ++
 arch/powerpc/kernel/smp.c  |  4 +-
 arch/x86/kernel/smpboot.c  |  8 ++--
 drivers/base/arch_topology.c   | 89 +-
 include/linux/arch_topology.h  |  4 ++
 include/linux/sched/topology.h | 10 ++---
 kernel/sched/topology.c|  2 +-
 7 files changed, 160 insertions(+), 13 deletions(-)

-- 
2.7.4



[PATCH] powerpc: Export mmu_feature_keys[] as non-GPL

2022-03-29 Thread Kevin Hao
When mmu_feature_keys[] was introduced in commit c12e6f24d413
("powerpc: Add option to use jump label for mmu_has_feature()"),
it was unlikely that it would be used either directly or indirectly in
out-of-tree modules, so it was exported as GPL-only. But with the
evolution of the code, especially the PPC_KUAP support, it may now be
indirectly referenced by some primitive macros or inline functions such
as get_user() or __copy_from_user_inatomic(). This makes it
impossible to build many non-GPL modules (such as ZFS) on the ppc
architecture. Fix this by exposing mmu_feature_keys[] to
non-GPL modules too.

Reported-by: Nathaniel Filardo 
Signed-off-by: Kevin Hao 
---
 arch/powerpc/kernel/cputable.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index ae0fdef0ac11..3a8cd40b6368 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2119,7 +2119,7 @@ void __init cpu_feature_keys_init(void)
 struct static_key_true mmu_feature_keys[NUM_MMU_FTR_KEYS] = {
[0 ... NUM_MMU_FTR_KEYS - 1] = STATIC_KEY_TRUE_INIT
 };
-EXPORT_SYMBOL_GPL(mmu_feature_keys);
+EXPORT_SYMBOL(mmu_feature_keys);
 
 void __init mmu_feature_keys_init(void)
 {
-- 
2.34.1



Re: [PATCH] powerpc/rtas: Keep MSR RI set when calling RTAS

2022-03-29 Thread Laurent Dufour
On 29/03/2022, 10:31:33, Nicholas Piggin wrote:
> Excerpts from Laurent Dufour's message of March 17, 2022 9:06 pm:
>> RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32bits
>> mode (MSR[SF] unset).
>>
>> The change in MSR is done in enter_rtas() in a relatively complex way,
>> since the MSR value could be hardcoded.
>>
>> Furthermore, a panic has been reported when hitting the watchdog interrupt
>> while running in RTAS, this leads to the following stack trace:
>>
>> [69244.027433][   C24] watchdog: CPU 24 Hard LOCKUP
>> [69244.027442][   C24] watchdog: CPU 24 TB:997512652051031, last heartbeat 
>> TB:997504470175378 (15980ms ago)
>> [69244.027451][   C24] Modules linked in: chacha_generic(E) libchacha(E) 
>> xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) 
>> libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) 
>> algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) 
>> fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) 
>> cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) 
>> algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) 
>> rpcsec_gss_krb5(E) auth_rpcgss(E)
>> nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) tcp_diag(E) 
>> udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) 
>> netlink_diag(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) 
>> fscache(E) netfs(E) af_packet(E) rfkill(E) bonding(E) tls(E) ibmveth(EX) 
>> crct10dif_vpmsum(E) rtc_generic(E) drm(E) drm_panel_orientation_quirks(E) 
>> fuse(E) configfs(E) backlight(E) ip_tables(E) x_tables(E) dm_service_time(E) 
>> sd_mod(E) t10_pi(E)
>> [69244.027555][   C24]  ibmvfc(EX) scsi_transport_fc(E) vmx_crypto(E) 
>> gf128mul(E) btrfs(E) blake2b_generic(E) libcrc32c(E) crc32c_vpmsum(E) xor(E) 
>> raid6_pq(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) 
>> dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E)
>> [69244.027587][   C24] Supported: No, Unreleased kernel
>> [69244.027600][   C24] CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: 
>> GE  X5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 
>> (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
>> [69244.027609][   C24] NIP:  1fb41050 LR: 1fb4104c CTR: 
>> 
>> [69244.027612][   C24] REGS: cfc33d60 TRAP: 0100   Tainted: G
>> E  X (5.14.21-150400.71.1.bz196362_2-default)
>> [69244.027615][   C24] MSR:  82981000   CR: 4882  
>> XER: 20040020
>> [69244.027625][   C24] CFAR: 011c IRQMASK: 1
>> [69244.027625][   C24] GPR00: 0003  
>> 0001 50dc
>> [69244.027625][   C24] GPR04: 1ffb6100 0020 
>> 0001 1fb09010
>> [69244.027625][   C24] GPR08: 2000  
>>  
>> [69244.027625][   C24] GPR12: 8004072a40a8 cff8b680 
>> 0007 0034
>> [69244.027625][   C24] GPR16: 1fbf6e94 1fbf6d84 
>> 1fbd1db0 1fb3f008
>> [69244.027625][   C24] GPR20: 1fb41018  
>> 017f f68f
>> [69244.027625][   C24] GPR24: 1fb18fe8 1fb3e000 
>> 1fb1adc0 1fb1cf40
>> [69244.027625][   C24] GPR28: 1fb26000 1fb460f0 
>> 1fb17f18 1fb17000
>> [69244.027663][   C24] NIP [1fb41050] 0x1fb41050
>> [69244.027696][   C24] LR [1fb4104c] 0x1fb4104c
>> [69244.027699][   C24] Call Trace:
>> [69244.027701][   C24] Instruction dump:
>> [69244.027723][   C24]       
>>  
>> [69244.027728][   C24]       
>>  
>> [69244.027762][T87504] Oops: Unrecoverable System Reset, sig: 6 [#1]
>> [69244.028044][T87504] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA 
>> pSeries
>> [69244.028089][T87504] Modules linked in: chacha_generic(E) libchacha(E) 
>> xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) 
>> libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) 
>> algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) 
>> fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) 
>> cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) 
>> algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) 
>> rpcsec_gss_krb5(E) auth_rpcgss(E)
>> nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) tcp_diag(E) 
>> udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) 
>> netlink_diag(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) 
>> fscache(E) netfs(E) af_packet(E) rfkill(E) bonding(E) tls(E) ibmveth(EX) 
>> crct10dif_vpmsum(E) rtc_generic(E) drm(E) 

[PATCH v3 2/2] PCI/DPC: Disable DPC service when link is in L2/L3 ready, L2 and L3 state

2022-03-29 Thread Kai-Heng Feng
On some Intel AlderLake platforms, Thunderbolt entering D3cold can cause
some errors reported by AER:
[   30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error 
received: :00:1d.0
[   30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected 
(Non-Fatal), type=Transaction Layer, (Requester ID)
[   30.100256] pcieport :00:1d.0:   device [8086:7ab0] error 
status/mask=0010/4000
[   30.100262] pcieport :00:1d.0:[20] UnsupReq   (First)
[   30.100267] pcieport :00:1d.0: AER:   TLP Header: 3400 0852 
 
[   30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected 
callback)
[   30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected 
callback)
[   30.100427] pcieport :00:1d.0: AER: device recovery failed

Since the previous patch disabled AER for a Link in L2/L3 Ready, L2,
and L3, also disable DPC here, as DPC depends on AER to work.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
Reviewed-by: Mika Westerberg 
Signed-off-by: Kai-Heng Feng 
---
v3:
 - Wording change to make the patch more clear.

v2:
 - Wording change.
 - Empty line dropped.

 drivers/pci/pcie/dpc.c | 60 +++---
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 3e9afee02e8d1..414258967f08e 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -343,13 +343,33 @@ void pci_dpc_init(struct pci_dev *pdev)
}
 }
 
+static void dpc_enable(struct pcie_device *dev)
+{
+   struct pci_dev *pdev = dev->port;
+   u16 ctl;
+
+   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
+   ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN;
+   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+}
+
+static void dpc_disable(struct pcie_device *dev)
+{
+   struct pci_dev *pdev = dev->port;
+   u16 ctl;
+
+   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
+   ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
+   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+}
+
 #define FLAG(x, y) (((x) & (y)) ? '+' : '-')
 static int dpc_probe(struct pcie_device *dev)
 {
struct pci_dev *pdev = dev->port;
	struct device *device = &dev->device;
int status;
-   u16 ctl, cap;
+   u16 cap;
 
if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native)
return -ENOTSUPP;
@@ -364,10 +384,7 @@ static int dpc_probe(struct pcie_device *dev)
}
 
pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, );
-   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
-
-   ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN;
-   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+   dpc_enable(dev);
pci_info(pdev, "enabled with IRQ %d\n", dev->irq);
 
	pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
@@ -380,22 +397,33 @@ static int dpc_probe(struct pcie_device *dev)
return status;
 }
 
-static void dpc_remove(struct pcie_device *dev)
+static int dpc_suspend(struct pcie_device *dev)
 {
-   struct pci_dev *pdev = dev->port;
-   u16 ctl;
+   dpc_disable(dev);
+   return 0;
+}
 
-   pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl);
-   ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
-   pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl);
+static int dpc_resume(struct pcie_device *dev)
+{
+   dpc_enable(dev);
+   return 0;
+}
+
+static void dpc_remove(struct pcie_device *dev)
+{
+   dpc_disable(dev);
 }
 
 static struct pcie_port_service_driver dpcdriver = {
-   .name   = "dpc",
-   .port_type  = PCIE_ANY_PORT,
-   .service= PCIE_PORT_SERVICE_DPC,
-   .probe  = dpc_probe,
-   .remove = dpc_remove,
+   .name   = "dpc",
+   .port_type  = PCIE_ANY_PORT,
+   .service= PCIE_PORT_SERVICE_DPC,
+   .probe  = dpc_probe,
+   .suspend= dpc_suspend,
+   .resume = dpc_resume,
+   .runtime_suspend= dpc_suspend,
+   .runtime_resume = dpc_resume,
+   .remove = dpc_remove,
 };
 
 int __init pcie_dpc_init(void)
-- 
2.34.1



[PATCH v3 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state

2022-03-29 Thread Kai-Heng Feng
On some Intel AlderLake platforms, Thunderbolt entering D3cold can cause
some errors reported by AER:
[   30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error 
received: :00:1d.0
[   30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected 
(Non-Fatal), type=Transaction Layer, (Requester ID)
[   30.100256] pcieport :00:1d.0:   device [8086:7ab0] error 
status/mask=0010/4000
[   30.100262] pcieport :00:1d.0:[20] UnsupReq   (First)
[   30.100267] pcieport :00:1d.0: AER:   TLP Header: 3400 0852 
 
[   30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected 
callback)
[   30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected 
callback)
[   30.100427] pcieport :00:1d.0: AER: device recovery failed

So disable the AER service to avoid the noise from turning power rails
on/off when the device is in low power states (D3hot and D3cold), as
PCIe spec "5.2 Link State Power Management" states that TLP and DLLP
transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold
with aux power) and L3 (D3cold).

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453
Reviewed-by: Mika Westerberg 
Signed-off-by: Kai-Heng Feng 
---
v3:
 - Remove reference to ACS.
 - Wording change.

v2:
 - Wording change.

 drivers/pci/pcie/aer.c | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 9fa1f97e5b270..e4e9d4a3098d7 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev)
return 0;
 }
 
+static int aer_suspend(struct pcie_device *dev)
+{
+   struct aer_rpc *rpc = get_service_data(dev);
+
+   aer_disable_rootport(rpc);
+   return 0;
+}
+
+static int aer_resume(struct pcie_device *dev)
+{
+   struct aer_rpc *rpc = get_service_data(dev);
+
+   aer_enable_rootport(rpc);
+   return 0;
+}
+
 /**
  * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
  * @dev: pointer to Root Port, RCEC, or RCiEP
@@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct pci_dev 
*dev)
 }
 
 static struct pcie_port_service_driver aerdriver = {
-   .name   = "aer",
-   .port_type  = PCIE_ANY_PORT,
-   .service= PCIE_PORT_SERVICE_AER,
-
-   .probe  = aer_probe,
-   .remove = aer_remove,
+   .name   = "aer",
+   .port_type  = PCIE_ANY_PORT,
+   .service= PCIE_PORT_SERVICE_AER,
+   .probe  = aer_probe,
+   .suspend= aer_suspend,
+   .resume = aer_resume,
+   .runtime_suspend= aer_suspend,
+   .runtime_resume = aer_resume,
+   .remove = aer_remove,
 };
 
 /**
-- 
2.34.1



Re: [PATCH] powerpc/rtas: Keep MSR RI set when calling RTAS

2022-03-29 Thread Nicholas Piggin
Excerpts from Laurent Dufour's message of March 17, 2022 9:06 pm:
> RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32bits
> mode (MSR[SF] unset).
> 
> The change in MSR is done in enter_rtas() in a relatively complex way,
> since the MSR value could be hardcoded.
> 
> Furthermore, a panic has been reported when hitting the watchdog interrupt
> while running in RTAS, this leads to the following stack trace:
> 
> [69244.027433][   C24] watchdog: CPU 24 Hard LOCKUP
> [69244.027442][   C24] watchdog: CPU 24 TB:997512652051031, last heartbeat 
> TB:997504470175378 (15980ms ago)
> [69244.027451][   C24] Modules linked in: chacha_generic(E) libchacha(E) 
> xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) 
> libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) 
> algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) 
> fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) 
> cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) 
> algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) 
> rpcsec_gss_krb5(E) auth_rpcgss(E)
> nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) tcp_diag(E) 
> udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) 
> netlink_diag(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) 
> fscache(E) netfs(E) af_packet(E) rfkill(E) bonding(E) tls(E) ibmveth(EX) 
> crct10dif_vpmsum(E) rtc_generic(E) drm(E) drm_panel_orientation_quirks(E) 
> fuse(E) configfs(E) backlight(E) ip_tables(E) x_tables(E) dm_service_time(E) 
> sd_mod(E) t10_pi(E)
> [69244.027555][   C24]  ibmvfc(EX) scsi_transport_fc(E) vmx_crypto(E) 
> gf128mul(E) btrfs(E) blake2b_generic(E) libcrc32c(E) crc32c_vpmsum(E) xor(E) 
> raid6_pq(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) 
> dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E)
> [69244.027587][   C24] Supported: No, Unreleased kernel
> [69244.027600][   C24] CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: 
> GE  X5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 
> (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
> [69244.027609][   C24] NIP:  1fb41050 LR: 1fb4104c CTR: 
> 
> [69244.027612][   C24] REGS: cfc33d60 TRAP: 0100   Tainted: G 
>E  X (5.14.21-150400.71.1.bz196362_2-default)
> [69244.027615][   C24] MSR:  82981000   CR: 4882  
> XER: 20040020
> [69244.027625][   C24] CFAR: 011c IRQMASK: 1
> [69244.027625][   C24] GPR00: 0003  
> 0001 50dc
> [69244.027625][   C24] GPR04: 1ffb6100 0020 
> 0001 1fb09010
> [69244.027625][   C24] GPR08: 2000  
>  
> [69244.027625][   C24] GPR12: 8004072a40a8 cff8b680 
> 0007 0034
> [69244.027625][   C24] GPR16: 1fbf6e94 1fbf6d84 
> 1fbd1db0 1fb3f008
> [69244.027625][   C24] GPR20: 1fb41018  
> 017f f68f
> [69244.027625][   C24] GPR24: 1fb18fe8 1fb3e000 
> 1fb1adc0 1fb1cf40
> [69244.027625][   C24] GPR28: 1fb26000 1fb460f0 
> 1fb17f18 1fb17000
> [69244.027663][   C24] NIP [1fb41050] 0x1fb41050
> [69244.027696][   C24] LR [1fb4104c] 0x1fb4104c
> [69244.027699][   C24] Call Trace:
> [69244.027701][   C24] Instruction dump:
> [69244.027723][   C24]       
>  
> [69244.027728][   C24]       
>  
> [69244.027762][T87504] Oops: Unrecoverable System Reset, sig: 6 [#1]
> [69244.028044][T87504] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [69244.028089][T87504] Modules linked in: chacha_generic(E) libchacha(E) 
> xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) 
> libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) 
> algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) 
> fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) 
> cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) 
> algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) 
> rpcsec_gss_krb5(E) auth_rpcgss(E)
> nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) tcp_diag(E) 
> udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) 
> netlink_diag(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) 
> fscache(E) netfs(E) af_packet(E) rfkill(E) bonding(E) tls(E) ibmveth(EX) 
> crct10dif_vpmsum(E) rtc_generic(E) drm(E) drm_panel_orientation_quirks(E) 
> fuse(E) configfs(E) backlight(E) ip_tables(E) x_tables(E) dm_service_time(E) 
> sd_mod(E) t10_pi(E)
> [69244.028171][T87504]  

Re: [PATCH] powerpc/64: Fix build failure with allyesconfig in book3s_64_entry.S

2022-03-29 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of March 27, 2022 5:32 pm:
> Using conditional branches between two files is hazardous,
> as they may get linked too far from each other.
> 
>   arch/powerpc/kvm/book3s_64_entry.o:(.text+0x3ec): relocation truncated
>   to fit: R_PPC64_REL14 (stub) against symbol `system_reset_common'
>   defined in .text section in arch/powerpc/kernel/head_64.o
> 
> Reorganise the code to use unconditional branches.

Thanks for the fix, I agree this is better.

Reviewed-by: Nicholas Piggin 

> 
> Cc: Nicholas Piggin 
> Fixes: 89d35b239101 ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 
> path in C")
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kvm/book3s_64_entry.S | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_entry.S 
> b/arch/powerpc/kvm/book3s_64_entry.S
> index 05e003eb5d90..99fa36df36fa 100644
> --- a/arch/powerpc/kvm/book3s_64_entry.S
> +++ b/arch/powerpc/kvm/book3s_64_entry.S
> @@ -414,10 +414,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_DAWR1)
>*/
>   ld  r10,HSTATE_SCRATCH0(r13)
>   cmpwi   r10,BOOK3S_INTERRUPT_MACHINE_CHECK
> - beq machine_check_common
> + beq 1f
>  
>   cmpwi   r10,BOOK3S_INTERRUPT_SYSTEM_RESET
> - beq system_reset_common
> + bne .
>  
> - b   .
> + b   system_reset_common
> +1:   b   machine_check_common
>  #endif
> -- 
> 2.35.1
> 
> 


Re: [PATCH] KVM: PPC: Book3S HV: Fix vcore_blocked tracepoint

2022-03-29 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of March 29, 2022 7:58 am:
> We removed most of the vcore logic from the P9 path but there's still
> a tracepoint that tried to dereference vc->runner.

Thanks for the fix.

Reviewed-by: Nicholas Piggin 

> 
> Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
> Signed-off-by: Fabiano Rosas 
> ---
>  arch/powerpc/kvm/book3s_hv.c | 8 
>  arch/powerpc/kvm/trace_hv.h  | 8 
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index c886557638a1..5f5b2d0dee8c 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4218,13 +4218,13 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore 
> *vc)
>   start_wait = ktime_get();
>  
>   vc->vcore_state = VCORE_SLEEPING;
> - trace_kvmppc_vcore_blocked(vc, 0);
> + trace_kvmppc_vcore_blocked(vc->runner, 0);
>   spin_unlock(&vc->lock);
>   schedule();
>   finish_rcuwait(&vc->wait);
>   spin_lock(&vc->lock);
>   vc->vcore_state = VCORE_INACTIVE;
> - trace_kvmppc_vcore_blocked(vc, 1);
> + trace_kvmppc_vcore_blocked(vc->runner, 1);
>   ++vc->runner->stat.halt_successful_wait;
>  
>   cur = ktime_get();
> @@ -4596,9 +4596,9 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 
> time_limit,
>   if (kvmppc_vcpu_check_block(vcpu))
>   break;
>  
> - trace_kvmppc_vcore_blocked(vc, 0);
> + trace_kvmppc_vcore_blocked(vcpu, 0);
>   schedule();
> - trace_kvmppc_vcore_blocked(vc, 1);
> + trace_kvmppc_vcore_blocked(vcpu, 1);
>   }
>   finish_rcuwait(wait);
>   }
> diff --git a/arch/powerpc/kvm/trace_hv.h b/arch/powerpc/kvm/trace_hv.h
> index 38cd0ed0a617..32e2cb5811cc 100644
> --- a/arch/powerpc/kvm/trace_hv.h
> +++ b/arch/powerpc/kvm/trace_hv.h
> @@ -409,9 +409,9 @@ TRACE_EVENT(kvmppc_run_core,
>  );
>  
>  TRACE_EVENT(kvmppc_vcore_blocked,
> - TP_PROTO(struct kvmppc_vcore *vc, int where),
> + TP_PROTO(struct kvm_vcpu *vcpu, int where),
>  
> - TP_ARGS(vc, where),
> + TP_ARGS(vcpu, where),
>  
>   TP_STRUCT__entry(
>   __field(int,n_runnable)
> @@ -421,8 +421,8 @@ TRACE_EVENT(kvmppc_vcore_blocked,
>   ),
>  
>   TP_fast_assign(
> - __entry->runner_vcpu = vc->runner->vcpu_id;
> - __entry->n_runnable  = vc->n_runnable;
> + __entry->runner_vcpu = vcpu->vcpu_id;
> + __entry->n_runnable  = vcpu->arch.vcore->n_runnable;
>   __entry->where   = where;
>   __entry->tgid= current->tgid;
>   ),
> -- 
> 2.35.1
> 
>