Re: [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb()

2023-06-14 Thread Hugh Dickins
On Wed, 14 Jun 2023, Hugh Dickins wrote:
> On Wed, 14 Jun 2023, Nathan Chancellor wrote:
> > 
> > I just bisected a crash while powering down a MIPS machine in QEMU to
> > this change as commit 8044511d3893 ("mips: update_mmu_cache() can
> > replace __update_tlb()") in linux-next.
> 
> Thank you, Nathan, that's very helpful indeed.  This patch certainly knew
> that it wanted testing, and I'm glad to hear that it is now seeing some.
> 
> While powering down?  The messages below look like it was just coming up,
> but no doubt that's because you were bisecting (or because I'm unfamiliar
> with what messages to expect there).  It's probably irrelevant information,
> but I wonder whether the (V)machine worked well enough for a while before
> you first powered down and spotted the problem, or whether it's never got
> much further than trying to run init (busybox)?  I'm trying to get a feel
> for whether the problem occurs under common or uncommon conditions.
> 
> > Unfortunately, I can still
> > reproduce it with the existing fix you have for this change on the
> > mailing list, which is present in next-20230614.
> 
> Right, that later fix was only for a build warning, nothing functional
> (or at least I hoped that it wasn't making any functional difference).
> 
> Thanks a lot for the detailed instructions below: unfortunately, those
> would draw me into a realm of testing I've never needed to enter before,
> so a lot of time spent on setup and learning.  Usually, I just stare at
> the source.
> 
> What this probably says is that I should revert most of my cleanup there,
> and keep as close to the existing code as possible.  But some change is
> needed, and I may need to understand (or have a good guess at) what was
> going wrong, to decide what kind of retreat will be successful.
> 
> Back to the source for a while: I hope I'll find examples in nearby MIPS
> kernel source (and git history), which will hint at the right way forward.
> Then send you a patch against next-20230614 to try, when I'm reasonably
> confident that it's enough to satisfy my purpose, but likely not to waste
> your time.

I'm going to take advantage of your good nature by attaching
two alternative patches, either to go on top of next-20230614.

mips1.patch,
 arch/mips/mm/tlb-r4k.c |   12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

is by far my favourite.  I couldn't see anything wrong with what's
already there for mips, but it seems possible that (though I didn't
find it) somewhere calls update_mmu_cache_pmd() on a page table.  So
mips1.patch restores the pmd_huge() check, and cleans up further by
removing the silly pgdp, p4dp, pudp, pmdp stuff: the pointer has now
been passed in by the caller, why walk the tree again?  I should have
done it this way before.

But if that doesn't work, then I'm afraid it will have to be
mips2.patch,
 arch/mips/include/asm/pgtable.h |   15 ---
 arch/mips/mm/tlb-r3k.c  |5 ++---
 arch/mips/mm/tlb-r4k.c  |   27 ++-
 3 files changed, 32 insertions(+), 15 deletions(-)

which reverts all of the original patch and its build warning fix,
and does a pte_unmap() to balance the silly pte_offset_map() there;
with an apologetic comment for this being about the only place in
the tree where I have no idea what to do if ptep were NULL.
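(In outline, the pairing rule mips2.patch restores is the following; an
illustrative sketch only, not the patch itself, and the "do nothing"
fallback on a failed map is merely an assumption here, which is exactly
the awkward question just mentioned.)

	#include <linux/mm.h>
	#include <linux/pgtable.h>

	/* Every successful pte_offset_map() must be balanced by a pte_unmap(). */
	static void sketch_update(pmd_t *pmdp, unsigned long address)
	{
		pte_t *ptep = pte_offset_map(pmdp, address);

		if (!ptep)
			return;		/* assumed fallback: page table already gone */

		/* ... read *ptep and program the TLB entry here ... */

		pte_unmap(ptep);
	}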

I do hope that you find the first fixes the breakage; but if not, then
I even more fervently hope that the second will, despite my hating it.
Touch wood for the first, fingers crossed for the second, thanks,

Hugh

--- a/arch/mips/mm/tlb-r4k.c
+++ b/arch/mips/mm/tlb-r4k.c
@@ -293,12 +293,6 @@ void local_flush_tlb_one(unsigned long page)
 void update_mmu_cache(struct vm_area_struct *vma,
 		  unsigned long address, pte_t *ptep)
 {
-#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
-	pgd_t *pgdp;
-	p4d_t *p4dp;
-	pud_t *pudp;
-	pmd_t *pmdp;
-#endif
 	unsigned long flags;
 	int idx, pid;
 
@@ -323,12 +317,8 @@ void update_mmu_cache(struct vm_area_struct *vma,
 	tlb_probe_hazard();
 	idx = read_c0_index();
 #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
-	pgdp = pgd_offset(vma->vm_mm, address);
-	p4dp = p4d_offset(pgdp, address);
-	pudp = pud_offset(p4dp, address);
-	pmdp = pmd_offset(pudp, address);
 	/* this could be a huge page  */
-	if (ptep == (pte_t *)pmdp) {
+	if (pmd_huge(*(pmd_t *)ptep)) {
 		unsigned long lo;
 		write_c0_pagemask(PM_HUGE_MASK);
 		lo = pte_to_entrylo(pte_val(*ptep));
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -565,8 +565,15 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
 }
 #endif
 
-extern void update_mmu_cache(struct vm_area_struct *vma,
-	unsigned long address, pte_t *ptep);
+extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
+	pte_t pte);
+
+static inline void update_mmu_cache(struct vm_area

Re: [PATCH v1 10/21] powerpc/kexec: refactor for kernel/Kconfig.kexec

2023-06-14 Thread Michael Ellerman
Eric DeVolder  writes:

> The kexec and crash kernel options are provided in the common
> kernel/Kconfig.kexec. Utilize the common options and provide
> the ARCH_HAS_ and ARCH_SUPPORTS_ entries to recreate the
> equivalent set of KEXEC and CRASH options.
>
> Signed-off-by: Eric DeVolder 
> Reviewed-by: Sourabh Jain 
> ---
>  arch/powerpc/Kconfig | 55 ++--
>  1 file changed, 17 insertions(+), 38 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index bff5820b7cda..36f2fe0cc8a5 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -588,41 +588,21 @@ config PPC64_SUPPORTS_MEMORY_FAILURE
>   default "y" if PPC_POWERNV
>   select ARCH_SUPPORTS_MEMORY_FAILURE
>  
> -config KEXEC
> - bool "kexec system call"
> - depends on PPC_BOOK3S || PPC_E500 || (44x && !SMP)
> - select KEXEC_CORE
> - help
> -   kexec is a system call that implements the ability to shutdown your
> -   current kernel, and to start another kernel.  It is like a reboot
> -   but it is independent of the system firmware.   And like a reboot
> -   you can start any kernel with it, not just Linux.
> -
> -   The name comes from the similarity to the exec system call.
> -
> -   It is an ongoing process to be certain the hardware in a machine
> -   is properly shutdown, so do not be surprised if this code does not
> -   initially work for you.  As of this writing the exact hardware
> -   interface is strongly in flux, so no good recommendation can be
> -   made.
> -
> -config KEXEC_FILE
> - bool "kexec file based system call"
> - select KEXEC_CORE
> - select HAVE_IMA_KEXEC if IMA
> - select KEXEC_ELF
> - depends on PPC64
> - depends on CRYPTO=y
> - depends on CRYPTO_SHA256=y
...
> +
> +config ARCH_HAS_KEXEC_FILE
> + def_bool PPC64 && CRYPTO && CRYPTO_SHA256

The =y's got lost here.

I think they were both meaningful, because both options are tristate. So
this previously required them to be built-in (=y), whereas after your
patch it will allow them to be modules.

I don't know for sure that those options need to be built-in, but that's
what the code does now, so this patch shouldn't change it, at least
without an explanation.
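A minimal way to keep the built-in requirement (an untested sketch, not a
patch from this thread) would be to spell the =y tests out in the def_bool
expression:

  config ARCH_HAS_KEXEC_FILE
  	def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y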

cheers


Re: [PATCH v1 00/21] refactor Kconfig to consolidate KEXEC and CRASH options

2023-06-14 Thread Michael Ellerman
Eric DeVolder  writes:
> On 6/13/23 15:21, Kees Cook wrote:
>> On Mon, Jun 12, 2023 at 01:27:52PM -0400, Eric DeVolder wrote:
>>> The Kconfig is refactored to consolidate KEXEC and CRASH options from
>>> various arch/<arch>/Kconfig files into new file kernel/Kconfig.kexec.
>> 
>> This looks very nice!
>> 
> Thank you Kees!
>
>>> [...]
>>> - The boolean ARCH_HAS_<option> in effect allows the arch to determine
>>>when the feature is allowed.  Archs which don't have the feature
>>>simply do not provide the corresponding ARCH_HAS_<option>.
>>>For each arch, where there previously were KEXEC and/or CRASH
>>>options, these have been replaced with the corresponding boolean
>>>ARCH_HAS_<option>, and an appropriate def_bool statement.
>>>
>>>For example, if the arch supports KEXEC_FILE, then the
>>>ARCH_HAS_KEXEC_FILE simply has a 'def_bool y'. This permits the
>>>KEXEC_FILE option to be available.
>>>
>>>If the arch has a 'depends on' statement in its original coding
>>>of the option, then that expression becomes part of the def_bool
>>>expression. For example, arm64 had:
>>>
>>>config KEXEC
>>>  depends on PM_SLEEP_SMP
>>>
>>>and in this solution, this converts to:
>>>
>>>config ARCH_HAS_KEXEC
>>>  def_bool PM_SLEEP_SMP
>>>
>>>
>>> - In order to account for the differences in the config coding for
>>>the three common options, the ARCH_SUPPORTS_<option> is used.
>>>This option has a 'depends on <option>' statement to couple it
>>>to the main option, and from there can insert the differences
>>>from the common option and the arch original coding of that option.
>>>
>>>For example, a few archs enable CRYPTO and CRYPTO_SHA256 for
>>>KEXEC_FILE. These require an ARCH_SUPPORTS_KEXEC_FILE and
>>>'select CRYPTO' and 'select CRYPTO_SHA256' statements.
>> 
>> Naming nit: "HAS" and "SUPPORTS" feel very similar, and looking at
>> existing configs, "ARCH_SUPPORTS_..." is already used for doing this
>> kind of bare "bool" management. e.g. see ARCH_SUPPORTS_INT128
>> 
>> It looks like you need to split "depends" and "select" so the options
>> can be chosen separately from the "selectable" configs.
>> 
>> How about naming this ARCH_SELECTS_<option>, since that's what it's
>> there for?
>> 
> I'm OK with this. Let's see if others agree?

Yeah please rename one or both of them. At a glance the difference
between HAS and SUPPORTS is very non-obvious.

I like Kees' suggestion to use ARCH_SUPPORTS and ARCH_SELECTS.
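For illustration only (a sketch of the proposed naming based on the arm64
conversion elsewhere in the series, not code from the series itself), the
split might end up looking like:

  config ARCH_SUPPORTS_KEXEC
  	def_bool PM_SLEEP_SMP

  config ARCH_SELECTS_KEXEC_FILE
  	def_bool y
  	depends on KEXEC_FILE
  	select HAVE_IMA_KEXEC if IMA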

cheers


Re: [6.4-rc6] Crash during a kexec operation (tpm_amd_is_rng_defective)

2023-06-14 Thread Sachin Sant


>> [ 34.381788] Code: 5463063e 408201c8 38210080 4e800020 6000 6000 
>> 6000 7c0802a6 fbe10078 7c7f1b78 f8010090 e9230728  2c2c 
>> 41820020 7d8903a6 
> 
>  2c:   28 07 23 e9 ld  r9,1832(r3)
>  30:   50 00 89 e9 ld  r12,80(r9)
> 
> Where r3 is *chip.
> r9 is NULL, and 80 = 0x50.
> 
> Looks like a NULL chip->ops, which oopses in:
> 
> static int tpm_request_locality(struct tpm_chip *chip)
> {
> int rc;
> 
> if (!chip->ops->request_locality)
> 
> 
> Can you test the patch below?
> 

It proceeds further but then run into following crash

[  103.269574] Kernel attempted to read user page (18) - exploit attempt? (uid: 
0)
[  103.269589] BUG: Kernel NULL pointer dereference on read at 0x0018
[  103.269595] Faulting instruction address: 0xc09dcf34
[  103.269599] Oops: Kernel access of bad area, sig: 11 [#1]
[  103.269602] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[  103.269606] Modules linked in: dm_mod(E) nft_fib_inet(E) nft_fib_ipv4(E) 
nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) 
nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) 
nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) bonding(E) tls(E) rfkill(E) 
ip_set(E) sunrpc(E) nf_tables(E) nfnetlink(E) pseries_rng(E) 
aes_gcm_p10_crypto(E) drm(E) drm_panel_orientation_quirks(E) xfs(E) 
libcrc32c(E) sd_mod(E) sr_mod(E) t10_pi(E) crc64_rocksoft_generic(E) cdrom(E) 
crc64_rocksoft(E) crc64(E) sg(E) ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) 
vmx_crypto(E) fuse(E)
[  103.269644] CPU: 18 PID: 6872 Comm: kexec Kdump: loaded Tainted: G   
 E  6.4.0-rc6-dirty #8
[  103.269649] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
[  103.269653] NIP:  c09dcf34 LR: c09dd2bc CTR: c09eaa60
[  103.269656] REGS: c000a113f510 TRAP: 0300   Tainted: GE  
 (6.4.0-rc6-dirty)
[  103.269660] MSR:  8280b033   CR: 
88484886  XER: 0001
[  103.269669] CFAR: c09dd2b8 DAR: 0018 DSISR: 4000 
IRQMASK: 0  [  103.269669] GPR00: c09dd2bc c000a113f7b0 
c14a1500 c0009031  [  103.269669] GPR04: c0009f77 
0016 06007a01 0016  [  103.269669] GPR08: 
c0009f77   8000  [  
103.269669] GPR12: c09eaa60 c0135fab7f00  
  [  103.269669] GPR16:   
   [  103.269669] GPR20:  
    [  103.269669] GPR24: 
 0016 c0009031 1000  [  
103.269669] GPR28: c0009f77 7a01 c0009f77 
c0009031  [  103.269707] NIP [c09dcf34] 
tpm_try_transmit+0x74/0x300
[  103.269713] LR [c09dd2bc] tpm_transmit+0xfc/0x190
[  103.269717] Call Trace:
[  103.269718] [c000a113f7b0] [c000a113f880] 0xc000a113f880 
(unreliable)
[  103.269724] [c000a113f840] [c09dd2bc] tpm_transmit+0xfc/0x190
[  103.269727] [c000a113f900] [c09dd398] tpm_transmit_cmd+0x48/0x110
[  103.269731] [c000a113f980] [c09df1b0] tpm2_get_tpm_pt+0x140/0x230
[  103.269736] [c000a113fa20] [c09db208] 
tpm_amd_is_rng_defective+0xb8/0x250
[  103.269739] [c000a113faa0] [c09db828] 
tpm_chip_unregister+0x138/0x160
[  103.269743] [c000a113fae0] [c09eaa94] 
tpm_ibmvtpm_remove+0x34/0x130
[  103.269748] [c000a113fb50] [c0115738] vio_bus_remove+0x58/0xd0
[  103.269754] [c000a113fb90] [c0a01dcc] device_shutdown+0x21c/0x39c
[  103.269758] [c000a113fc20] [c01a2684] 
kernel_restart_prepare+0x54/0x70
[  103.269762] [c000a113fc40] [c0292c48] kernel_kexec+0xa8/0x100
[  103.269766] [c000a113fcb0] [c01a2cd4] __do_sys_reboot+0x214/0x2c0
[  103.269770] [c000a113fe10] [c0034adc] 
system_call_exception+0x13c/0x340
[  103.269776] [c000a113fe50] [c000d05c] 
system_call_vectored_common+0x15c/0x2ec
[  103.269781] --- interrupt: 3000 at 0x7fff805459f0
[  103.269784] NIP:  7fff805459f0 LR:  CTR: 
[  103.269786] REGS: c000a113fe80 TRAP: 3000   Tainted: GE  
 (6.4.0-rc6-dirty)
[  103.269790] MSR:  8280f033   CR: 
42422884  XER: 
[  103.269799] IRQMASK: 0  [  103.269799] GPR00: 0058 
7fffc07a68c0 000110437f00 fee1dead  [  103.269799] GPR04: 
28121969 45584543  0003  [  
103.269799] GPR08: 0010   
  [  103.269799] GPR12:  7fff8089b2c0 
00011042f598   [  103.269799] GPR16:  
 00011040fcc0   [  103.269799] GPR20: 
8913 

Re: [PATCH net-next] eth: fs_enet: fix print format for resource size

2023-06-14 Thread Jakub Kicinski
On Wed, 14 Jun 2023 21:02:33 -0700 Randy Dunlap wrote:
> On 6/14/23 20:52, Jakub Kicinski wrote:
> > Randy forwarded report from Stephen that on PowerPC:  
> 
> Stephen forwarded report from Randy?
> 
> netdev & pantelis were cc-ed...

Ah, I misread, you were reporting to Stephen the status for the latest
linux-next!

https://lore.kernel.org/all/8f9f8d38-d9c7-9f1b-feb0-103d76902...@infradead.org/

Seems obvious in hindsight, sorry. I'll reword when applying.

> > drivers/net/ethernet/freescale/fs_enet/mii-fec.c: In function 
> > 'fs_enet_mdio_probe':
> > drivers/net/ethernet/freescale/fs_enet/mii-fec.c:130:50: warning: format 
> > '%x' expects argument of type 'unsigned int', but argument 4 has type 
> > 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=]
> >   130 | snprintf(new_bus->id, MII_BUS_ID_SIZE, "%x", res.start);
> >   | ~^   ~
> >   |  |  |
> >   |  |  
> > resource_size_t {aka long long unsigned int}
> >   |  unsigned int
> >   | %llx
> > 
> > Use the right print format.
> > 
> > Untested, I can't repro this warning myself. With or without
> > the patch mpc512x_defconfig builds just fine.
> > 
> > Link: 
> > https://lore.kernel.org/all/8f9f8d38-d9c7-9f1b-feb0-103d76902...@infradead.org/
> > Signed-off-by: Jakub Kicinski 
> > ---
> > CC: Randy Dunlap 
> > CC: pantelis.anton...@gmail.com
> > CC: linuxppc-dev@lists.ozlabs.org  
> 
> I'm using gcc-12.2.0.
> 
> Reported-by: Randy Dunlap 
> Acked-by: Randy Dunlap 
> Tested-by: Randy Dunlap  # build-tested

Thank you! GCC 11.1 here, FWIW.


Re: [PATCH net-next] eth: fs_enet: fix print format for resource size

2023-06-14 Thread Randy Dunlap



On 6/14/23 20:52, Jakub Kicinski wrote:
> Randy forwarded report from Stephen that on PowerPC:

Stephen forwarded report from Randy?

netdev & pantelis were cc-ed...

> drivers/net/ethernet/freescale/fs_enet/mii-fec.c: In function 
> 'fs_enet_mdio_probe':
> drivers/net/ethernet/freescale/fs_enet/mii-fec.c:130:50: warning: format '%x' 
> expects argument of type 'unsigned int', but argument 4 has type 
> 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=]
>   130 | snprintf(new_bus->id, MII_BUS_ID_SIZE, "%x", res.start);
>   | ~^   ~
>   |  |  |
>   |  |  
> resource_size_t {aka long long unsigned int}
>   |  unsigned int
>   | %llx
> 
> Use the right print format.
> 
> Untested, I can't repro this warning myself. With or without
> the patch mpc512x_defconfig builds just fine.
> 
> Link: 
> https://lore.kernel.org/all/8f9f8d38-d9c7-9f1b-feb0-103d76902...@infradead.org/
> Signed-off-by: Jakub Kicinski 
> ---
> CC: Randy Dunlap 
> CC: pantelis.anton...@gmail.com
> CC: linuxppc-dev@lists.ozlabs.org

I'm using gcc-12.2.0.

Reported-by: Randy Dunlap 
Acked-by: Randy Dunlap 
Tested-by: Randy Dunlap  # build-tested

Thanks.

> 
> Targeting net-next as I can't repro this, and I don't
> see recent changes which could cause this problem.
> So maybe it's something in linux-next... ?
> In any case res is a struct resource so patch shouldn't hurt.
> ---
>  drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fs_enet/mii-fec.c 
> b/drivers/net/ethernet/freescale/fs_enet/mii-fec.c
> index d37d7a19a759..59a8f0bd0f5c 100644
> --- a/drivers/net/ethernet/freescale/fs_enet/mii-fec.c
> +++ b/drivers/net/ethernet/freescale/fs_enet/mii-fec.c
> @@ -127,7 +127,7 @@ static int fs_enet_mdio_probe(struct platform_device 
> *ofdev)
>   if (ret)
>   goto out_res;
>  
> - snprintf(new_bus->id, MII_BUS_ID_SIZE, "%x", res.start);
> + snprintf(new_bus->id, MII_BUS_ID_SIZE, "%pap", &res.start);
>  
>   fec->fecp = ioremap(res.start, resource_size(&res));
>   if (!fec->fecp) {

-- 
~Randy


[PATCH net-next] eth: fs_enet: fix print format for resource size

2023-06-14 Thread Jakub Kicinski
Randy forwarded report from Stephen that on PowerPC:

drivers/net/ethernet/freescale/fs_enet/mii-fec.c: In function 
'fs_enet_mdio_probe':
drivers/net/ethernet/freescale/fs_enet/mii-fec.c:130:50: warning: format '%x' 
expects argument of type 'unsigned int', but argument 4 has type 
'resource_size_t' {aka 'long long unsigned int'} [-Wformat=]
  130 | snprintf(new_bus->id, MII_BUS_ID_SIZE, "%x", res.start);
  | ~^   ~
  |  |  |
  |  |  resource_size_t 
{aka long long unsigned int}
  |  unsigned int
  | %llx

Use the right print format.

Untested, I can't repro this warning myself. With or without
the patch mpc512x_defconfig builds just fine.

Link: 
https://lore.kernel.org/all/8f9f8d38-d9c7-9f1b-feb0-103d76902...@infradead.org/
Signed-off-by: Jakub Kicinski 
---
CC: Randy Dunlap 
CC: pantelis.anton...@gmail.com
CC: linuxppc-dev@lists.ozlabs.org

Targeting net-next as I can't repro this, and I don't
see recent changes which could cause this problem.
So maybe it's something in linux-next... ?
In any case res is a struct resource so patch shouldn't hurt.
---
 drivers/net/ethernet/freescale/fs_enet/mii-fec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fs_enet/mii-fec.c 
b/drivers/net/ethernet/freescale/fs_enet/mii-fec.c
index d37d7a19a759..59a8f0bd0f5c 100644
--- a/drivers/net/ethernet/freescale/fs_enet/mii-fec.c
+++ b/drivers/net/ethernet/freescale/fs_enet/mii-fec.c
@@ -127,7 +127,7 @@ static int fs_enet_mdio_probe(struct platform_device *ofdev)
if (ret)
goto out_res;
 
-   snprintf(new_bus->id, MII_BUS_ID_SIZE, "%x", res.start);
+   snprintf(new_bus->id, MII_BUS_ID_SIZE, "%pap", &res.start);
 
fec->fecp = ioremap(res.start, resource_size(&res));
if (!fec->fecp) {
-- 
2.40.1



Re: [PATCH] selftests/powerpc: Remove unneeded variable

2023-06-14 Thread Michael Ellerman
wuyonggang...@208suo.com writes:
> Fix the following coccicheck warning:
>
> tools/testing/selftests/powerpc/alignment/alignment_handler.c:558:5-7: 
> Unneeded variable: "rc". Return "0"

The check is wrong.

> diff --git 
> a/tools/testing/selftests/powerpc/alignment/alignment_handler.c 
> b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
> index 33ee34fc0828..4980656c3f70 100644
> --- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
> +++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
> @@ -332,7 +332,7 @@ int test_alignment_handler_vsx_206(void)
>   STORE_VSX_XFORM_TEST(stxvd2x);
>   STORE_VSX_XFORM_TEST(stxvw4x);
>   STORE_VSX_XFORM_TEST(stxsdx);
> -return rc;
> +return 0;

rc is used in the macros.
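A simplified illustration of why the variable is needed (the real macros in
alignment_handler.c are more involved; test_one() here is a made-up
stand-in for the per-instruction test helper):

	static int test_one(const char *name);	/* hypothetical stand-in */

	#define STORE_VSX_XFORM_TEST(op)	do { rc |= test_one(#op); } while (0)

	int test_alignment_handler_vsx_206(void)
	{
		int rc = 0;

		STORE_VSX_XFORM_TEST(stxvd2x);
		STORE_VSX_XFORM_TEST(stxsdx);
		return rc;	/* non-zero if any of the macros recorded a failure */
	}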

cheers


Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-06-14 Thread Michael Ellerman
"Nicholas Piggin"  writes:
> On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
>> Michael Ellerman  writes:
>> > Nicholas Piggin  writes:
>> >> The most expensive ordering for hwsync to provide is the store-load
>> >> barrier, because all prior stores have to be drained to the caches
>> >> before subsequent instructions can complete.
>> >>
> >> stsync just orders stores which means it can just be a barrier that
>> >> goes down the store queue and orders draining, and does not prevent
>> >> completion of subsequent instructions. So it should be faster than
>> >> hwsync.
>> >>
>> >> Use stsync for wmb(). Older processors that don't recognise the SC
>> >> field should treat this as hwsync.
>> >
>> > qemu (7.1) emulating ppc64e does not :/
>> >
>> >   mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
>> >   mpic: ISU size: 256, shift: 8, mask: ff
>> >   mpic: Initializing for 256 sources
>> >   Oops: Exception in kernel mode, sig: 4 [#1]
>> ..
>> >
>> > I guess just put it behind an #ifdef 64S.
>>
>> That doesn't work because qemu emulating a G5 also doesn't accept it.
>>
>> So either we need to get qemu updated and wait a while for that to
>> percolate, or do some runtime patching of wmbs in the kernel >_<
>
> Gah, sorry. QEMU really should be ignoring reserved fields in
> instructions :(

Yeah, it's an annoying discrepancy vs real hardware and the ISA.

> I guess leave it out for now. Should fix QEMU but we probably also need
> to do patching so as not to break older QEMUs.

I'll plan to take the first 3 patches, they seem OK as-is.

cheers


Re: [6.4-rc6] Crash during a kexec operation (tpm_amd_is_rng_defective)

2023-06-14 Thread Michael Ellerman
Sachin Sant  writes:
> Following crash is observed during a kexec operation on 
> IBM Power10 server:
>
> [ 34.381548] Kernel attempted to read user page (50) - exploit attempt? (uid: 
> 0)
> [ 34.381562] BUG: Kernel NULL pointer dereference on read at 0x0050
> [ 34.381565] Faulting instruction address: 0xc09db1e4
> [ 34.381569] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 34.381572] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [ 34.381576] Modules linked in: dm_mod(E) nft_fib_inet(E) nft_fib_ipv4(E) 
> nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) 
> nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) 
> nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) bonding(E) tls(E) 
> rfkill(E) ip_set(E) sunrpc(E) nf_tables(E) nfnetlink(E) pseries_rng(E) 
> aes_gcm_p10_crypto(E) drm(E) drm_panel_orientation_quirks(E) xfs(E) 
> libcrc32c(E) sd_mod(E) sr_mod(E) t10_pi(E) crc64_rocksoft_generic(E) cdrom(E) 
> crc64_rocksoft(E) crc64(E) sg(E) ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) 
> vmx_crypto(E) fuse(E)
> [ 34.381613] CPU: 18 PID: 5918 Comm: kexec Kdump: loaded Tainted: G E 
> 6.4.0-rc6-00037-gb6dad5178cea #3
> [ 34.381618] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
> of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
> [ 34.381621] NIP: c09db1e4 LR: c09db928 CTR: c09eab60
> [ 34.381625] REGS: c0009742f780 TRAP: 0300 Tainted: G E 
> (6.4.0-rc6-00037-gb6dad5178cea)
> [ 34.381628] MSR: 8280b033  CR: 
> 4444 XER: 0001
> [ 34.381638] CFAR: c09db19c DAR: 0050 DSISR: 4000 
> IRQMASK: 0 
> [ 34.381638] GPR00: c09db928 c0009742fa20 c14a1500 
> c81d 
> [ 34.381638] GPR04: cd842c50 cd842c50 0025 
> fffe 
> [ 34.381638] GPR08:   0009 
> c00800785280 
> [ 34.381638] GPR12: c09eab60 c0135fab7f00  
>  
> [ 34.381638] GPR16:    
>  
> [ 34.381638] GPR20:    
>  
> [ 34.381638] GPR24:    
> c2e21e08 
> [ 34.381638] GPR28: cd842c48 c2a02208 c321c0c0 
> c81d 
> [ 34.381674] NIP [c09db1e4] tpm_amd_is_rng_defective+0x74/0x240
> [ 34.381681] LR [c09db928] tpm_chip_unregister+0x138/0x160
> [ 34.381685] Call Trace:
> [ 34.381686] [c0009742faa0] [c09db928] 
> tpm_chip_unregister+0x138/0x160
> [ 34.381690] [c0009742fae0] [c09eab94] 
> tpm_ibmvtpm_remove+0x34/0x130
...
> [ 34.381788] Code: 5463063e 408201c8 38210080 4e800020 6000 6000 
> 6000 7c0802a6 fbe10078 7c7f1b78 f8010090 e9230728  2c2c 
> 41820020 7d8903a6 

  2c:   28 07 23 e9 ld  r9,1832(r3)
  30:   50 00 89 e9 ld  r12,80(r9)

Where r3 is *chip.
r9 is NULL, and 80 = 0x50.

Looks like a NULL chip->ops, which oopses in:

static int tpm_request_locality(struct tpm_chip *chip)
{
int rc;

if (!chip->ops->request_locality)


Can you test the patch below?

cheers


diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index cd48033b804a..82eb36e2e16d 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -36,7 +36,7 @@ static int tpm_request_locality(struct tpm_chip *chip)
 {
int rc;
 
-   if (!chip->ops->request_locality)
+   if (!chip->ops || !chip->ops->request_locality)
return 0;
 
rc = chip->ops->request_locality(chip, 0);


Re: [kvm-unit-tests v4 00/12] powerpc: updates, P10, PNV support

2023-06-14 Thread Nicholas Piggin
On Wed Jun 14, 2023 at 11:09 AM AEST, Joel Stanley wrote:
> On Thu, 8 Jun 2023 at 07:58, Nicholas Piggin  wrote:
> >
> > Posting again, a couple of patches were merged and accounted for review
> > comments from last time.
>
> I saw some failures in the spr tests running on a power9 powernv system:
>
> $ TESTNAME=sprs TIMEOUT=90s ACCEL= ./powerpc/run powerpc/sprs.elf -smp
> 1 |grep FAIL
> FAIL: WORT  ( 895):0xc0deba80 <==> 0x

This is just TCG machine? I'm not sure why WORT fails, AFAIKS it's the
same on POWER8 and doesn't do anything just a simple register. I think
on real hardware WORT may not have any bits implemented on POWER9
though.

> $ MIGRATION=yes TESTNAME=sprs-migration TIMEOUT=90s ACCEL=
> ./powerpc/run powerpc/sprs.elf -smp 1 -append '-w' | grep FAIL
> FAIL: SRR0  (  26):0xcafefacec0debabc <==> 0x00402244
> FAIL: SRR1  (  27):0xc006409ebab6 <==> 0x80001001
> FAIL: CTRL  ( 136):0x <==> 0x8001
> FAIL: WORT  ( 895):0xc0deba80 <==> 0x
> FAIL: PIR   (1023):0x0010 <==> 0x0049
>
> Linux 6.2.0-20-generic
> QEMU emulator version 7.2.0 (Debian 1:7.2+dfsg-5ubuntu2)
>
> On a power8 powernv:
>
> MIGRATION=yes TESTNAME=sprs-migration TIMEOUT=90s ACCEL= ./powerpc/run
> powerpc/sprs.elf -smp 1 -append '-w' |grep FAIL
> FAIL: SRR0  (  26):0xcafefacec0debabc <==> 0x00402234
> FAIL: SRR1  (  27):0xc006409ebab6 <==> 0x80001000
> FAIL: CTRL  ( 136):0x <==> 0x8001
> FAIL: PIR   (1023):0x0060 <==> 0x0030

Hmm, seems we take some interrupt over migration test that is not
accounted for (could check the address in SRR0 to see where it is).
Either need to prevent that interrupt or avoid failing on SRR0/1 on
this test.

Interesting about CTRL, I wonder if that is not migrating correctly.
PIR looks like a migration issue as well, it can't be changed so
destination CPU has got a different PIR. I would be inclined to
leave those as failing to remind us to look into them.

I'll take a look at the others though.

Thanks,
Nick


Re: [PATCH 14/16] powerpc/book3s64/vmemmap: Switch radix to use a different vmemmap handling function

2023-06-14 Thread Aneesh Kumar K.V
Sachin Sant  writes:

>> 1. First try to map things using PMD (2M)
>> 2. With altmap if altmap cross-boundary check returns true, fall back to 
>> PAGE_SIZE
>> 3. IF we can't allocate PMD_SIZE backing memory for vmemmap, fallback to 
>> PAGE_SIZE
>> 
>> On removing vmemmap mapping, check if every subsection that is using the 
>> vmemmap
>> area is invalid. If found to be invalid, that implies we can safely free the
>> vmemmap area. We don't use the PAGE_UNUSED pattern used by x86 because with 
>> 64K
>> page size, we need to do the above check even at the PAGE_SIZE granularity.
>> 
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>
> With this patch series applied I see the following warning
>
> [  OK  ] Started Monitoring of LVM2 mirrors,…sing dmeventd or progress 
> polling.
> [3.283884] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: nvdimm 
> pmu didn't register rc=-2
> [3.284212] papr_scm ibm,persistent-memory:ibm,pmemory@44104002: nvdimm 
> pmu didn't register rc=-2
> [3.563890] radix-mmu: Mapped 0x04001000-0x040c9000 with 
> 64.0 KiB pages
> [3.703227] [ cut here ]
> [3.703236] failed to free all reserved pages
> [3.703244] WARNING: CPU: 41 PID: 923 at mm/memremap.c:152 
> memunmap_pages+0x37c/0x3a0
> [3.703252] Modules linked in: device_dax(+) nd_pmem nd_btt dax_pmem 
> papr_scm libnvdimm pseries_rng vmx_crypto aes_gcm_p10_crypto ext4 mbcache 
> jbd2 sd_mod t10_pi crc64_rocksoft crc64 sg ibmvscsi scsi_transport_srp 
> ibmveth fuse
> [3.703272] CPU: 41 PID: 923 Comm: systemd-udevd Not tainted 
> 6.4.0-rc6-00037-gb6dad5178cea-dirty #1
> [3.703276] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
> of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
> [3.703280] NIP:  c057a18c LR: c057a188 CTR: 
> 005ca81c
> [3.703283] REGS: c00032a170d0 TRAP: 0700   Not tainted  
> (6.4.0-rc6-00037-gb6dad5178cea-dirty)
> [3.703286] MSR:  8282b033   CR: 
> 48248824  XER: 0002
> [3.703296] CFAR: c015f0c0 IRQMASK: 0  [3.703296] GPR00: 
> c057a188 c00032a17370 c1421500 0021  [
> 3.703296] GPR04: 7fff c00032a17140 c00032a17138 
> 0027  [3.703296] GPR08: c015c91a7c10 0001 
> 0027 c2a18a20  [3.703296] GPR12: 48248824 
> c015cb9f4300 c00032a17d68 c1262b20  [3.703296] GPR16: 
> c0080131 ff20 fff2 c008012d7418  [
> 3.703296] GPR20: c00032a17c30 0004 c005 
> 01000200  [3.703296] GPR24: c2f11570 ce376870 
> 0001 0001  [3.703296] GPR28: ce376840 
> ce3768c8  ce376840  [3.70] NIP 
> [c057a18c] memunmap_pages+0x37c/0x3a0
> [3.703338] LR [c057a188] memunmap_pages+0x378/0x3a0
> [3.703342] Call Trace:
> [3.703344] [c00032a17370] [c057a188] 
> memunmap_pages+0x378/0x3a0 (unreliable)
> [3.703349] [c00032a17420] [c057a928] 
> memremap_pages+0x4a8/0x890
> [3.703355] [c00032a17500] [c057ad4c] 
> devm_memremap_pages+0x3c/0xd0
> [3.703359] [c00032a17540] [c008011c084c] 
> dev_dax_probe+0x134/0x3a0 [device_dax]
> [3.703366] [c00032a175e0] [c09f7e8c] dax_bus_probe+0xac/0x140
> [3.703371] [c00032a17610] [c09b5828] really_probe+0x108/0x530
> [3.703375] [c00032a176a0] [c09b5d04] 
> __driver_probe_device+0xb4/0x200
> [3.703379] [c00032a17720] [c09b5ea8] 
> driver_probe_device+0x58/0x120
> [3.703383] [c00032a17760] [c09b6298] 
> __driver_attach+0x148/0x250
> [3.703387] [c00032a177e0] [c09b1a58] 
> bus_for_each_dev+0xa8/0x130
> [3.703392] [c00032a17840] [c09b4b34] driver_attach+0x34/0x50
> [3.703396] [c00032a17860] [c09b3b98] 
> bus_add_driver+0x258/0x300
> [3.703400] [c00032a178f0] [c09b78d4] 
> driver_register+0xa4/0x1b0
> [3.703404] [c00032a17960] [c09f9530] 
> __dax_driver_register+0x50/0x70
> [3.703409] [c00032a17980] [c008011c1374] dax_init+0x3c/0x58 
> [device_dax]
> [3.703414] [c00032a179a0] [c0013260] 
> do_one_initcall+0x60/0x2f0
> [3.703418] [c00032a17a70] [c0248af8] do_init_module+0x78/0x310
> [3.703424] [c00032a17af0] [c024bcac] load_module+0x2a7c/0x2f30
> [3.703429] [c00032a17d00] [c024c4f0] 
> __do_sys_finit_module+0xe0/0x180
> [3.703434] [c00032a17e10] [c00374c0] 
> system_call_exception+0x140/0x350
> [3.703439] [c00032a17e50] [c000d6a0] 
> system_call_common+0x160/0x2e4
> [3.703444] --- interrupt: c00 at 0x7fff9af2fb34
> [3.703447] NIP:  7fff9af2fb34 LR: 7fff9b6dea9c CTR: 
> 
> [3.703450] REGS: 

Re: [PATCH 4/4] powerpc/64s: Use POWER10 stsync barrier for wmb()

2023-06-14 Thread Nicholas Piggin
On Wed Jun 14, 2023 at 3:56 PM AEST, Michael Ellerman wrote:
> Michael Ellerman  writes:
> > Nicholas Piggin  writes:
> >> The most expensive ordering for hwsync to provide is the store-load
> >> barrier, because all prior stores have to be drained to the caches
> >> before subsequent instructions can complete.
> >>
> >> stsync just orders stores which means it can just be a barrier that
> >> goes down the store queue and orders draining, and does not prevent
> >> completion of subsequent instructions. So it should be faster than
> >> hwsync.
> >>
> >> Use stsync for wmb(). Older processors that don't recognise the SC
> >> field should treat this as hwsync.
> >
> > qemu (7.1) emulating ppc64e does not :/
> >
> >   mpic: Setting up MPIC " OpenPIC  " version 1.2 at fe004, max 1 CPUs
> >   mpic: ISU size: 256, shift: 8, mask: ff
> >   mpic: Initializing for 256 sources
> >   Oops: Exception in kernel mode, sig: 4 [#1]
> ..
> >
> > I guess just put it behind an #ifdef 64S.
>
> That doesn't work because qemu emulating a G5 also doesn't accept it.
>
> So either we need to get qemu updated and wait a while for that to
> percolate, or do some runtime patching of wmbs in the kernel >_<

Gah, sorry. QEMU really should be ignoring reserved fields in
instructions :(

I guess leave it out for now. Should fix QEMU but we probably also need
to do patching so as not to break older QEMUs.

Thanks,
Nick


[powerpc:next] BUILD SUCCESS 48f2444eb4dc0f3de9146f7278e859fa6b5e568b

2023-06-14 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next
branch HEAD: 48f2444eb4dc0f3de9146f7278e859fa6b5e568b  powerpc: Switch i2c 
drivers back to use .probe()

elapsed time: 723m

configs tested: 150
configs skipped: 5

The following configs have been built successfully.
More configs may be tested in the coming days.

tested configs:
alphaallyesconfig   gcc  
alphabuildonly-randconfig-r005-20230614   gcc  
alpha   defconfig   gcc  
alpharandconfig-r002-20230612   gcc  
alpharandconfig-r005-20230612   gcc  
arc  allyesconfig   gcc  
arc defconfig   gcc  
arc  randconfig-r025-20230612   gcc  
arc  randconfig-r033-20230612   gcc  
arc  randconfig-r043-20230612   gcc  
arc  randconfig-r043-20230614   gcc  
arm  allmodconfig   gcc  
arm  allyesconfig   gcc  
arm defconfig   gcc  
arm nhk8815_defconfig   gcc  
arm  randconfig-r005-20230612   gcc  
arm  randconfig-r046-20230612   clang
arm   sama5_defconfig   gcc  
arm64allyesconfig   gcc  
arm64buildonly-randconfig-r003-20230614   clang
arm64   defconfig   gcc  
arm64randconfig-r012-20230614   gcc  
arm64randconfig-r031-20230612   clang
csky buildonly-randconfig-r001-20230614   gcc  
csky buildonly-randconfig-r003-20230614   gcc  
cskydefconfig   gcc  
csky randconfig-r015-20230614   gcc  
csky randconfig-r032-20230612   gcc  
hexagon  randconfig-r035-20230612   clang
hexagon  randconfig-r036-20230612   clang
hexagon  randconfig-r041-20230612   clang
hexagon  randconfig-r045-20230612   clang
i386 allyesconfig   gcc  
i386  debian-10.3   gcc  
i386defconfig   gcc  
i386 randconfig-i001-20230614   clang
i386 randconfig-i002-20230614   clang
i386 randconfig-i003-20230614   clang
i386 randconfig-i004-20230614   clang
i386 randconfig-i005-20230614   clang
i386 randconfig-i006-20230614   clang
i386 randconfig-i011-20230612   gcc  
i386 randconfig-i011-20230614   gcc  
i386 randconfig-i012-20230612   gcc  
i386 randconfig-i012-20230614   gcc  
i386 randconfig-i013-20230612   gcc  
i386 randconfig-i013-20230614   gcc  
i386 randconfig-i014-20230612   gcc  
i386 randconfig-i014-20230614   gcc  
i386 randconfig-i015-20230612   gcc  
i386 randconfig-i015-20230614   gcc  
i386 randconfig-i016-20230612   gcc  
i386 randconfig-i016-20230614   gcc  
i386 randconfig-r014-20230614   gcc  
i386 randconfig-r021-20230612   gcc  
loongarchallmodconfig   gcc  
loongarch allnoconfig   gcc  
loongarchbuildonly-randconfig-r005-20230614   gcc  
loongarch   defconfig   gcc  
loongarchrandconfig-r001-20230612   gcc  
loongarchrandconfig-r006-20230612   gcc  
loongarchrandconfig-r014-20230614   gcc  
loongarchrandconfig-r024-20230612   gcc  
m68k allmodconfig   gcc  
m68k allyesconfig   gcc  
m68kdefconfig   gcc  
m68kmac_defconfig   gcc  
m68kmvme16x_defconfig   gcc  
m68k randconfig-r003-20230612   gcc  
m68k randconfig-r012-20230614   gcc  
m68k randconfig-r013-20230614   gcc  
m68k randconfig-r023-20230612   gcc  
m68kstmark2_defconfig   gcc  
microblaze   buildonly-randconfig-r002-20230614   gcc  
microblaze   randconfig-r004-20230612   gcc  
mips allmodconfig   gcc  
mips allyesconfig   gcc  
mips buildonly-randconfig-r004-20230614   gcc  
mips loongson1b_defconfig   gcc  
mips randconfig-r003-20230612   gcc  
mips randconfig-r031-20230612   gcc  
mips randconfig-r034-20230612   gcc  
nios2   defconfig   gcc  
nios2randconfig-r016-20230614   gcc  
nios2randconfig-r022-20230612   gcc  
openrisc buildonly

Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures

2023-06-14 Thread Maciej W. Rozycki
On Wed, 14 Jun 2023, Bjorn Helgaas wrote:

> >  This is v9 of the change to work around a PCIe link training phenomenon 
> > where a pair of devices both capable of operating at a link speed above 
> > 2.5GT/s seems unable to negotiate the link speed and continues training 
> > indefinitely with the Link Training bit switching on and off repeatedly 
> > and the data link layer never reaching the active state.
> > 
> >  With several requests addressed and a few extra issues spotted this
> > version has now grown to 14 patches.  It has been verified for device 
> > enumeration with and without PCI_QUIRKS enabled, using the same piece of 
> > RISC-V hardware as previously.  Hot plug or reset events have not been 
> > verified, as this is difficult if at all feasible with hardware in 
> > question.
> > 
> >  Last iteration: 
> > ,
> >  
> > and my input to it:
> > .
> 
> Thanks, I applied these to pci/enumeration for v6.5.

 Great, thanks!

> I tweaked a few things, so double-check to be sure I didn't break
> something:
> 
>   - Moved dev->link_active_reporting init to set_pcie_port_type()
> because it does other PCIe-related stuff.
> 
>   - Reordered to keep all the link_active_reporting things together.
> 
>   - Reordered to clean up & factor pcie_retrain_link() before exposing
> it to the rest of the PCI core.
> 
>   - Moved pcie_retrain_link() a little earlier to keep it next to
> pcie_wait_for_link_status().
> 
>   - Squashed the stubs into the actual quirk so we don't have the
> intermediate state where we call the stubs but they never do
> anything (let me know if there's a reason we need your order).
> 
>   - Inline pcie_parent_link_retrain(), which seemed like it didn't add
> enough to be worthwhile.

 Ack, I'll double-check and report back.  A minor nit I've spotted below:

>  static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
>  {
> - bool retrain = true;
>   int delay = 1;
> + bool retrain = false;
> + struct pci_dev *bridge;
> +
> + if (pci_is_pcie(dev)) {
> + retrain = true;
> + bridge = pci_upstream_bridge(dev);
> + }

 If doing it this way, which I actually like, I think it would be a little 
bit better performance- and style-wise if this was written as:

if (pci_is_pcie(dev)) {
bridge = pci_upstream_bridge(dev);
retrain = !!bridge;
}

(or "retrain = bridge != NULL" if you prefer this style), and then we 
don't have to repeatedly check two variables iff (pcie && !bridge) in the 
loop below:

> @@ -1201,9 +1190,9 @@ static int pci_dev_wait(struct pci_dev *dev, char 
> *reset_type, int timeout)
>   }
>  
>   if (delay > PCI_RESET_WAIT) {
> - if (retrain) {
> + if (retrain && bridge) {

-- i.e. code can stay then as:

if (retrain) {

here.  I hope you find this observation rather obvious, so will you amend 
your tree, or shall I send an incremental update?

 Otherwise I don't find anything suspicious with the interdiff itself 
(thanks for posting it, that's really useful indeed!), but as I say I'll 
yet double-check how things look and work with your tree.  Hopefully 
tomorrow (Thu), as I have other stuff yet to complete tonight.

  Maciej


Re: [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb()

2023-06-14 Thread Hugh Dickins
> > lo = pte_to_entrylo(pte_val(*ptep));
> > write_c0_entrylo0(lo);
> > write_c0_entrylo1(lo + (HPAGE_SIZE >> 7));
> > @@ -344,8 +343,6 @@ void __update_tlb(struct vm_area_struct * vma, unsigned 
> > long address, pte_t pte)
> > } else
> >  #endif
> > {
> > -   ptep = pte_offset_map(pmdp, address);
> > -
> >  #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
> >  #ifdef CONFIG_XPA
> > write_c0_entrylo0(pte_to_entrylo(ptep->pte_high));
> > -- 
> > 2.35.3
> > 
> 
> I just bisected a crash while powering down a MIPS machine in QEMU to
> this change as commit 8044511d3893 ("mips: update_mmu_cache() can
> replace __update_tlb()") in linux-next.

Thank you, Nathan, that's very helpful indeed.  This patch certainly knew
that it wanted testing, and I'm glad to hear that it is now seeing some.

While powering down?  The messages below look like it was just coming up,
but no doubt that's because you were bisecting (or because I'm unfamiliar
with what messages to expect there).  It's probably irrelevant information,
but I wonder whether the (V)machine worked well enough for a while before
you first powered down and spotted the problem, or whether it's never got
much further than trying to run init (busybox)?  I'm trying to get a feel
for whether the problem occurs under common or uncommon conditions.

> Unfortunately, I can still
> reproduce it with the existing fix you have for this change on the
> mailing list, which is present in next-20230614.

Right, that later fix was only for a build warning, nothing functional
(or at least I hoped that it wasn't making any functional difference).

Thanks a lot for the detailed instructions below: unfortunately, those
would draw me into a realm of testing I've never needed to enter before,
so a lot of time spent on setup and learning.  Usually, I just stare at
the source.

What this probably says is that I should revert most of my cleanup there,
and keep as close to the existing code as possible.  But some change is
needed, and I may need to understand (or have a good guess at) what was
going wrong, to decide what kind of retreat will be successful.

Back to the source for a while: I hope I'll find examples in nearby MIPS
kernel source (and git history), which will hint at the right way forward.
Then send you a patch against next-20230614 to try, when I'm reasonably
confident that it's enough to satisfy my purpose, but likely not to waste
your time.

Thanks, until later,
Hugh

> 
> I can reproduce it with the GCC 13.1.0 on kernel.org [1].
> 
>   $ make -skj"$(nproc)" ARCH=mips CROSS_COMPILE=mips-linux- mrproper 
> malta_defconfig vmlinux
> 
>   $ qemu-system-mipsel \
>   -display none \
>   -nodefaults \
>   -cpu 24Kf \
>   -machine malta \
>   -kernel vmlinux \
>   -initrd rootfs.cpio \
>   -m 512m \
>   -serial mon:stdio
>   ...
>   Linux version 6.4.0-rc6-next-20230614 (nathan@dev-arch.thelio-3990X) 
> (mips-linux-gcc (GCC) 13.1.0, GNU ld (GNU Binutils) 2.40) #1 SMP Wed Jun 14 
> 16:13:02 MST 2023
>   ...
>   Run /init as init process
>   process '/bin/busybox' started with executable stack
>   do_page_fault(): sending SIGSEGV to init for invalid read access from 
> 003c
>   epc = 77b893dc in ld-uClibc-1.0.39.so[77b84000+8000]
>   ra  = 77b8930c in ld-uClibc-1.0.39.so[77b84000+8000]
>   Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
>   ---[ end Kernel panic - not syncing: Attempted to kill init! 
> exitcode=0x000b ]---
> 
> The rootfs is available at [2] if it is needed. I am more than happy to
> provide additional information or test patches if necessary.
> 
> [1]: https://mirrors.edge.kernel.org/pub/tools/crosstool/
> [2]: 
> https://github.com/ClangBuiltLinux/boot-utils/releases/download/20230609-194440/mipsel-rootfs.cpio.zst
> 
> Cheers,
> Nathan


Re: [PATCH v1 01/21] kexec: consolidate kexec and crash options into kernel/Kconfig.kexec

2023-06-14 Thread Eric DeVolder




On 6/14/23 10:24, Alexander Gordeev wrote:

On Mon, Jun 12, 2023 at 01:27:53PM -0400, Eric DeVolder wrote:
...

+config KEXEC_FILE
+   bool "Enable kexec file based system call"
+   depends on ARCH_HAS_KEXEC_FILE
+   select KEXEC_CORE
+   help
+ This is new version of kexec system call. This system call is
+ file based and takes file descriptors as system call argument
+ for kernel and initramfs as opposed to list of segments as
+ accepted by previous system call.


Which "previous"? I guess, "by kexec system call" would sound clear.

Thanks!


OK, will make that change!
eric


Re: [PATCH v1 01/21] kexec: consolidate kexec and crash options into kernel/Kconfig.kexec

2023-06-14 Thread Alexander Gordeev
On Mon, Jun 12, 2023 at 01:27:53PM -0400, Eric DeVolder wrote:
...
> +config KEXEC_FILE
> + bool "Enable kexec file based system call"
> + depends on ARCH_HAS_KEXEC_FILE
> + select KEXEC_CORE
> + help
> +   This is new version of kexec system call. This system call is
> +   file based and takes file descriptors as system call argument
> +   for kernel and initramfs as opposed to list of segments as
> +   accepted by previous system call.

Which "previous"? I guess, "by kexec system call" would sound clear.

Thanks!


Re: [PATCH v1 00/21] refactor Kconfig to consolidate KEXEC and CRASH options

2023-06-14 Thread Eric DeVolder




On 6/13/23 15:21, Kees Cook wrote:

On Mon, Jun 12, 2023 at 01:27:52PM -0400, Eric DeVolder wrote:

The Kconfig is refactored to consolidate KEXEC and CRASH options from
various arch/<arch>/Kconfig files into new file kernel/Kconfig.kexec.


This looks very nice!


Thank you Kees!


[...]
- The boolean ARCH_HAS_<option> in effect allows the arch to determine
   when the feature is allowed.  Archs which don't have the feature
   simply do not provide the corresponding ARCH_HAS_<option>.
   For each arch, where there previously were KEXEC and/or CRASH
   options, these have been replaced with the corresponding boolean
   ARCH_HAS_<option>, and an appropriate def_bool statement.

   For example, if the arch supports KEXEC_FILE, then the
   ARCH_HAS_KEXEC_FILE simply has a 'def_bool y'. This permits the
   KEXEC_FILE option to be available.

   If the arch has a 'depends on' statement in its original coding
   of the option, then that expression becomes part of the def_bool
   expression. For example, arm64 had:

   config KEXEC
 depends on PM_SLEEP_SMP

   and in this solution, this converts to:

   config ARCH_HAS_KEXEC
 def_bool PM_SLEEP_SMP


- In order to account for the differences in the config coding for
   the three common options, the ARCH_SUPPORTS_<option> is used.
   This option has a 'depends on <option>' statement to couple it
   to the main option, and from there can insert the differences
   from the common option and the arch original coding of that option.

   For example, a few archs enable CRYPTO and CRYPTO_SHA256 for
   KEXEC_FILE. These require an ARCH_SUPPORTS_KEXEC_FILE and
   'select CRYPTO' and 'select CRYPTO_SHA256' statements.


Naming nit: "HAS" and "SUPPORTS" feel very similar, and looking at
existing configs, "ARCH_SUPPORTS_..." is already used for doing this
kind of bare "bool" management. e.g. see ARCH_SUPPORTS_INT128

It looks like you need to split "depends" and "select" so the options
can be chosen separately from the "selectable" configs.

How about naming this ARCH_SELECTS_<option>, since that's what it's
there for?


I'm OK with this. Let's see if others agree?

Thank you!
eric


-Kees



Re: [PATCH v1 07/21] m68k/kexec: refactor for kernel/Kconfig.kexec

2023-06-14 Thread Eric DeVolder




On 6/12/23 14:38, Geert Uytterhoeven wrote:

On Mon, Jun 12, 2023 at 7:29 PM Eric DeVolder  wrote:

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_HAS_ and ARCH_SUPPORTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 


Reviewed-by: Geert Uytterhoeven 
Acked-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

 Geert



Thank you Geert!
eric


Re: [PATCH v1 05/21] arm64/kexec: refactor for kernel/Kconfig.kexec

2023-06-14 Thread Eric DeVolder




On 6/13/23 20:22, Leizhen (ThunderTown) wrote:



On 2023/6/13 1:27, Eric DeVolder wrote:

The kexec and crash kernel options are provided in the common
kernel/Kconfig.kexec. Utilize the common options and provide
the ARCH_HAS_ and ARCH_SUPPORTS_ entries to recreate the
equivalent set of KEXEC and CRASH options.

Signed-off-by: Eric DeVolder 
---
  arch/arm64/Kconfig | 61 --
  1 file changed, 10 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 343e1e1cae10..33552476a877 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1433,60 +1433,19 @@ config PARAVIRT_TIME_ACCOUNTING
  
  	  If in doubt, say N here.
  
-config KEXEC

-   depends on PM_SLEEP_SMP
-   select KEXEC_CORE
-   bool "kexec system call"
-   help
- kexec is a system call that implements the ability to shutdown your
- current kernel, and to start another kernel.  It is like a reboot
- but it is independent of the system firmware.   And like a reboot
- you can start any kernel with it, not just Linux.
-
-config KEXEC_FILE
-   bool "kexec file based system call"
-   select KEXEC_CORE
-   select HAVE_IMA_KEXEC if IMA
-   help
- This is new version of kexec system call. This system call is
- file based and takes file descriptors as system call argument
- for kernel and initramfs as opposed to list of segments as
- accepted by previous system call.
-
-config KEXEC_SIG
-   bool "Verify kernel signature during kexec_file_load() syscall"
-   depends on KEXEC_FILE
-   help
- Select this option to verify a signature with loaded kernel
- image. If configured, any attempt of loading a image without
- valid signature will fail.
-
- In addition to that option, you need to enable signature
- verification for the corresponding kernel image type being
- loaded in order for this to work.
+config ARCH_HAS_KEXEC
+   def_bool PM_SLEEP_SMP
  
-config KEXEC_IMAGE_VERIFY_SIG

-   bool "Enable Image signature verification support"
-   default y
-   depends on KEXEC_SIG
-   depends on EFI && SIGNED_PE_FILE_VERIFICATION
-   help
- Enable Image signature verification support.


I don't see an alternative to this option. It's used in
arch/arm64/kernel/kexec_image.c:135


Good catch! I will move this into the common options.
Thank you Zhen!
eric


-
-comment "Support for PE file signature verification disabled"
-   depends on KEXEC_SIG
-   depends on !EFI || !SIGNED_PE_FILE_VERIFICATION
+config ARCH_HAS_KEXEC_FILE
+   def_bool y
  
-config CRASH_DUMP

-   bool "Build kdump crash kernel"
-   help
- Generate crash dump after being started by kexec. This should
- be normally only set in special crash dump kernels which are
- loaded in the main kernel with kexec-tools into a specially
- reserved region and then later executed after a crash by
- kdump/kexec.
+config ARCH_SUPPORTS_KEXEC_FILE
+   def_bool y
+   depends on KEXEC_FILE
+   select HAVE_IMA_KEXEC if IMA
  
-	  For more details see Documentation/admin-guide/kdump/kdump.rst

+config ARCH_HAS_CRASH_DUMP
+   def_bool y
  
  config TRANS_TABLE

def_bool y





Re: [PATCH v1 01/21] kexec: consolidate kexec and crash options into kernel/Kconfig.kexec

2023-06-14 Thread Eric DeVolder




On 6/13/23 20:19, Leizhen (ThunderTown) wrote:



On 2023/6/13 1:27, Eric DeVolder wrote:

The config options for kexec and crash features are consolidated
into new file kernel/Kconfig.kexec. Under the "General Setup" submenu
is a new submenu "Kexec and crash handling" where all the kexec and
crash options that were once in the arch-dependent submenu "Processor
type and features" are now consolidated.

The following options are impacted:

  - KEXEC
  - KEXEC_FILE
  - KEXEC_SIG
  - KEXEC_SIG_FORCE
  - KEXEC_BZIMAGE_VERIFY_SIG
  - KEXEC_JUMP
  - CRASH_DUMP

The three main options are KEXEC, KEXEC_FILE and CRASH_DUMP.

Architectures specify support of certain KEXEC and CRASH features with
similarly named new ARCH_HAS_<option> config options.

Architectures can utilize the new ARCH_SUPPORTS_<option> config
options to specify additional components when <option> is enabled.

To summarize, the ARCH_HAS_<option> permits the <option> to be
enabled, and the ARCH_SUPPORTS_<option> handles side effects (ie.
select statements).

Signed-off-by: Eric DeVolder 
---
  arch/Kconfig |  13 --
  init/Kconfig |   2 +
  kernel/Kconfig.kexec | 103 +++
  3 files changed, 105 insertions(+), 13 deletions(-)
  create mode 100644 kernel/Kconfig.kexec

diff --git a/arch/Kconfig b/arch/Kconfig
index 205fd23e0cad..a37730679730 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -11,19 +11,6 @@ source "arch/$(SRCARCH)/Kconfig"
  
  menu "General architecture-dependent options"
  
-config CRASH_CORE

-   bool
-
-config KEXEC_CORE
-   select CRASH_CORE
-   bool
-
-config KEXEC_ELF
-   bool
-
-config HAVE_IMA_KEXEC
-   bool
-
  config ARCH_HAS_SUBPAGE_FAULTS
bool
help
diff --git a/init/Kconfig b/init/Kconfig
index 32c24950c4ce..4424447e23a5 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1917,6 +1917,8 @@ config BINDGEN_VERSION_TEXT
  config TRACEPOINTS
bool
  
+source "kernel/Kconfig.kexec"

+
  endmenu   # General setup
  
  source "arch/Kconfig"

diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
new file mode 100644
index ..660048099865
--- /dev/null
+++ b/kernel/Kconfig.kexec
@@ -0,0 +1,103 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+menu "Kexec and crash features"
+
+config CRASH_CORE
+   bool
+
+config KEXEC_CORE
+   select CRASH_CORE
+   bool
+
+config KEXEC_ELF
+   bool
+
+config HAVE_IMA_KEXEC
+   bool
+
+config KEXEC
+   bool "Enable kexec system call"
+   default ARCH_DEFAULT_KEXEC
+   depends on ARCH_HAS_KEXEC
+   select KEXEC_CORE
+   help
+ kexec is a system call that implements the ability to shutdown your
+ current kernel, and to start another kernel.  It is like a reboot
+ but it is independent of the system firmware.   And like a reboot
+ you can start any kernel with it, not just Linux.


"kernel.  It is like", "firmware.   And like"

A few more spaces, I don't know the original author's intention, perhaps can be 
removed.


I'll remove the extra spaces.


+
+ The name comes from the similarity to the exec system call.
+
+ It is an ongoing process to be certain the hardware in a machine
+ is properly shutdown, so do not be surprised if this code does not
+ initially work for you.  As of this writing the exact hardware
+ interface is strongly in flux, so no good recommendation can be
+ made.
+
+config KEXEC_FILE
+   bool "Enable kexec file based system call"
+   depends on ARCH_HAS_KEXEC_FILE
+   select KEXEC_CORE
+   help
+ This is new version of kexec system call. This system call is
+ file based and takes file descriptors as system call argument
+ for kernel and initramfs as opposed to list of segments as
+ accepted by previous system call.
+
+config KEXEC_SIG
+   bool "Verify kernel signature during kexec_file_load() syscall"
+   depends on KEXEC_FILE && MODULE_SIG_FORMAT


I see that there is no "depends on MODULE_SIG_FORMAT" on x86 and arm64.

Good catch, I'll remove MODULE_SIG_FORMAT and place it on just s390 (which is the only arch that had 
it this way).



+   help
+


This blank line can be deleted.


I will remove it.

Thank you, Zhen!
eric


+ This option makes the kexec_file_load() syscall check for a valid
+ signature of the kernel image.  The image can still be loaded without
+ a valid signature unless you also enable KEXEC_SIG_FORCE, though if
+ there's a signature that we can check, then it must be valid.
+
+ In addition to this option, you need to enable signature
+ verification for the corresponding kernel image type being
+ loaded in order for this to work.
+
+config KEXEC_SIG_FORCE
+   bool "Require a valid signature in kexec_file_load() syscall"
+   depends on KEXEC_SIG
+   help
+ This option makes kernel signature verification mandatory for
+ the 

Re: [PATCH v2 07/23] mips: update_mmu_cache() can replace __update_tlb()

2023-06-14 Thread Nathan Chancellor
Hi Hugh,

On Thu, Jun 08, 2023 at 12:17:24PM -0700, Hugh Dickins wrote:
> Don't make update_mmu_cache() a wrapper around __update_tlb(): call it
> directly, and use the ptep (or pmdp) provided by the caller, instead of
> re-calling pte_offset_map() - which would raise a question of whether a
> pte_unmap() is needed to balance it.
> 
> Check whether the "ptep" provided by the caller is actually the pmdp,
> instead of testing pmd_huge(): or test pmd_huge() too and warn if it
> disagrees?  This is "hazardous" territory: needs review and testing.
> 
> Signed-off-by: Hugh Dickins 
> ---
>  arch/mips/include/asm/pgtable.h | 15 +++
>  arch/mips/mm/tlb-r3k.c  |  5 +++--
>  arch/mips/mm/tlb-r4k.c  |  9 +++--
>  3 files changed, 9 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> index 574fa14ac8b2..9175dfab08d5 100644
> --- a/arch/mips/include/asm/pgtable.h
> +++ b/arch/mips/include/asm/pgtable.h
> @@ -565,15 +565,8 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
>  }
>  #endif
>  
> -extern void __update_tlb(struct vm_area_struct *vma, unsigned long address,
> - pte_t pte);
> -
> -static inline void update_mmu_cache(struct vm_area_struct *vma,
> - unsigned long address, pte_t *ptep)
> -{
> - pte_t pte = *ptep;
> - __update_tlb(vma, address, pte);
> -}
> +extern void update_mmu_cache(struct vm_area_struct *vma,
> + unsigned long address, pte_t *ptep);
>  
>  #define  __HAVE_ARCH_UPDATE_MMU_TLB
>  #define update_mmu_tlb   update_mmu_cache
> @@ -581,9 +574,7 @@ static inline void update_mmu_cache(struct vm_area_struct 
> *vma,
>  static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
>   unsigned long address, pmd_t *pmdp)
>  {
> - pte_t pte = *(pte_t *)pmdp;
> -
> - __update_tlb(vma, address, pte);
> + update_mmu_cache(vma, address, (pte_t *)pmdp);
>  }
>  
>  /*
> diff --git a/arch/mips/mm/tlb-r3k.c b/arch/mips/mm/tlb-r3k.c
> index 53dfa2b9316b..e5722cd8dd6d 100644
> --- a/arch/mips/mm/tlb-r3k.c
> +++ b/arch/mips/mm/tlb-r3k.c
> @@ -176,7 +176,8 @@ void local_flush_tlb_page(struct vm_area_struct *vma, 
> unsigned long page)
>   }
>  }
>  
> -void __update_tlb(struct vm_area_struct *vma, unsigned long address, pte_t 
> pte)
> +void update_mmu_cache(struct vm_area_struct *vma,
> +   unsigned long address, pte_t *ptep)
>  {
>   unsigned long asid_mask = cpu_asid_mask(&current_cpu_data);
>   unsigned long flags;
> @@ -203,7 +204,7 @@ void __update_tlb(struct vm_area_struct *vma, unsigned 
> long address, pte_t pte)
>   BARRIER;
>   tlb_probe();
>   idx = read_c0_index();
> - write_c0_entrylo0(pte_val(pte));
> + write_c0_entrylo0(pte_val(*ptep));
>   write_c0_entryhi(address | pid);
>   if (idx < 0) {  /* BARRIER */
>   tlb_write_random();
> diff --git a/arch/mips/mm/tlb-r4k.c b/arch/mips/mm/tlb-r4k.c
> index 1b939abbe4ca..c96725d17cab 100644
> --- a/arch/mips/mm/tlb-r4k.c
> +++ b/arch/mips/mm/tlb-r4k.c
> @@ -290,14 +290,14 @@ void local_flush_tlb_one(unsigned long page)
>   * updates the TLB with the new pte(s), and another which also checks
>   * for the R4k "end of page" hardware bug and does the needy.
>   */
> -void __update_tlb(struct vm_area_struct * vma, unsigned long address, pte_t 
> pte)
> +void update_mmu_cache(struct vm_area_struct *vma,
> +   unsigned long address, pte_t *ptep)
>  {
>   unsigned long flags;
>   pgd_t *pgdp;
>   p4d_t *p4dp;
>   pud_t *pudp;
>   pmd_t *pmdp;
> - pte_t *ptep;
>   int idx, pid;
>  
>   /*
> @@ -326,10 +326,9 @@ void __update_tlb(struct vm_area_struct * vma, unsigned 
> long address, pte_t pte)
>   idx = read_c0_index();
>  #ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
>   /* this could be a huge page  */
> - if (pmd_huge(*pmdp)) {
> + if (ptep == (pte_t *)pmdp) {
>   unsigned long lo;
>   write_c0_pagemask(PM_HUGE_MASK);
> - ptep = (pte_t *)pmdp;
>   lo = pte_to_entrylo(pte_val(*ptep));
>   write_c0_entrylo0(lo);
>   write_c0_entrylo1(lo + (HPAGE_SIZE >> 7));
> @@ -344,8 +343,6 @@ void __update_tlb(struct vm_area_struct * vma, unsigned 
> long address, pte_t pte)
>   } else
>  #endif
>   {
> - ptep = pte_offset_map(pmdp, address);
> -
>  #if defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
>  #ifdef CONFIG_XPA
>   write_c0_entrylo0(pte_to_entrylo(pt
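
For illustration, the "test pmd_huge() too and warn if it disagrees"
alternative mentioned in the commit message could look roughly like the
sketch below (not part of the posted patch; it only shows the combined
check, with the rest of each branch unchanged):

#ifdef CONFIG_MIPS_HUGE_TLB_SUPPORT
	/* this could be a huge page  */
	if (ptep == (pte_t *)pmdp) {
		/*
		 * The caller is expected to pass the pmdp itself only for a
		 * huge mapping; warn once if pmd_huge() disagrees with that.
		 */
		WARN_ON_ONCE(!pmd_huge(*pmdp));
		write_c0_pagemask(PM_HUGE_MASK);
		/* ... rest of the huge-page path as in the hunk above ... */
	} else
#endif
	{
		/* ... normal pte path, unchanged ... */
	}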

Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures

2023-06-14 Thread Bjorn Helgaas
On Sun, Jun 11, 2023 at 06:19:08PM +0100, Maciej W. Rozycki wrote:
> Hi,
> 
>  This is v9 of the change to work around a PCIe link training phenomenon 
> where a pair of devices both capable of operating at a link speed above 
> 2.5GT/s seems unable to negotiate the link speed and continues training 
> indefinitely with the Link Training bit switching on and off repeatedly 
> and the data link layer never reaching the active state.
> 
>  With several requests addressed and a few extra issues spotted this
> version has now grown to 14 patches.  It has been verified for device 
> enumeration with and without PCI_QUIRKS enabled, using the same piece of 
> RISC-V hardware as previously.  Hot plug or reset events have not been 
> verified, as this is difficult if at all feasible with hardware in 
> question.
> 
>  Last iteration: 
> ,
>  
> and my input to it:
> .

Thanks, I applied these to pci/enumeration for v6.5.

I tweaked a few things, so double-check to be sure I didn't break
something:

  - Moved dev->link_active_reporting init to set_pcie_port_type()
because it does other PCIe-related stuff.

  - Reordered to keep all the link_active_reporting things together.

  - Reordered to clean up & factor pcie_retrain_link() before exposing
it to the rest of the PCI core.

  - Moved pcie_retrain_link() a little earlier to keep it next to
pcie_wait_for_link_status().

  - Squashed the stubs into the actual quirk so we don't have the
intermediate state where we call the stubs but they never do
anything (let me know if there's a reason we need your order).

  - Inline pcie_parent_link_retrain(), which seemed like it didn't add
enough to be worthwhile.

Interdiff below:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 80694e2574b8..f11268924c8f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1153,27 +1153,16 @@ void pci_resume_bus(struct pci_bus *bus)
pci_walk_bus(bus, pci_resume_one, NULL);
 }
 
-/**
- * pcie_parent_link_retrain - Check and retrain link we are downstream from
- * @dev: PCI device to handle.
- *
- * Return TRUE if the link was retrained, FALSE otherwise.
- */
-static bool pcie_parent_link_retrain(struct pci_dev *dev)
-{
-   struct pci_dev *bridge;
-
-   bridge = pci_upstream_bridge(dev);
-   if (bridge)
-   return pcie_failed_link_retrain(bridge);
-   else
-   return false;
-}
-
 static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 {
-   bool retrain = true;
int delay = 1;
+   bool retrain = false;
+   struct pci_dev *bridge;
+
+   if (pci_is_pcie(dev)) {
+   retrain = true;
+   bridge = pci_upstream_bridge(dev);
+   }
 
/*
 * After reset, the device should not silently discard config
@@ -1201,9 +1190,9 @@ static int pci_dev_wait(struct pci_dev *dev, char 
*reset_type, int timeout)
}
 
if (delay > PCI_RESET_WAIT) {
-   if (retrain) {
+   if (retrain && bridge) {
retrain = false;
-   if (pcie_parent_link_retrain(dev)) {
+   if (pcie_failed_link_retrain(bridge)) {
delay = 1;
continue;
}
@@ -4914,6 +4903,38 @@ static bool pcie_wait_for_link_status(struct pci_dev 
*pdev,
return (lnksta & lnksta_mask) == lnksta_match;
 }
 
+/**
+ * pcie_retrain_link - Request a link retrain and wait for it to complete
+ * @pdev: Device whose link to retrain.
+ * @use_lt: Use the LT bit if TRUE, or the DLLLA bit if FALSE, for status.
+ *
+ * Retrain completion status is retrieved from the Link Status Register
+ * according to @use_lt.  It is not verified whether the use of the DLLLA
+ * bit is valid.
+ *
+ * Return TRUE if successful, or FALSE if training has not completed
+ * within PCIE_LINK_RETRAIN_TIMEOUT_MS milliseconds.
+ */
+bool pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
+{
+   u16 lnkctl;
+
+   pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnkctl);
+   lnkctl |= PCI_EXP_LNKCTL_RL;
+   pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+   if (pdev->clear_retrain_link) {
+   /*
+* Due to an erratum in some devices the Retrain Link bit
+* needs to be cleared again manually to allow the link
+* training to succeed.
+*/
+   lnkctl &= ~PCI_EXP_LNKCTL_RL;
+   pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+   }
+
+   return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+}
+
 /**
  * pcie_wait_for_link_delay - Wait until link is active or 

Re: [PATCH v4 27/34] nios2: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Dinh Nguyen




On 6/14/23 04:30, Geert Uytterhoeven wrote:

Hi Dinh,

On Wed, Jun 14, 2023 at 12:17 AM Dinh Nguyen  wrote:

On 6/12/23 16:04, Vishal Moola (Oracle) wrote:

Part of the conversions to replace pgtable constructor/destructors with
ptdesc equivalents.

Signed-off-by: Vishal Moola (Oracle) 
---
   arch/nios2/include/asm/pgalloc.h | 8 
   1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/nios2/include/asm/pgalloc.h b/arch/nios2/include/asm/pgalloc.h
index ecd1657bb2ce..ce6bb8e74271 100644
--- a/arch/nios2/include/asm/pgalloc.h
+++ b/arch/nios2/include/asm/pgalloc.h
@@ -28,10 +28,10 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t 
*pmd,

   extern pgd_t *pgd_alloc(struct mm_struct *mm);

-#define __pte_free_tlb(tlb, pte, addr)   \
- do {\
- pgtable_pte_page_dtor(pte); \
- tlb_remove_page((tlb), (pte));  \
+#define __pte_free_tlb(tlb, pte, addr)   \
+ do {\
+ pagetable_pte_dtor(page_ptdesc(pte));   \
+ tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
   } while (0)

   #endif /* _ASM_NIOS2_PGALLOC_H */


Applied!


I don't think you can just apply this patch, as the new functions
were only introduced in [PATCH v4 05/34] of this series.



Ah, thanks for the pointer!

Dinh


Re: [PATCH v4 1/1] PCI: layerscape: Add the endpoint linkup notifier support

2023-06-14 Thread Frank Li
On Mon, Jun 12, 2023 at 12:12:53PM -0400, Frank Li wrote:
> On Mon, May 15, 2023 at 11:10:49AM -0400, Frank Li wrote:
> > Layerscape has PME interrupt, which can be used as linkup notifier.
> > Set CFG_READY bit of PEX_PF0_CONFIG to enable accesses from root complex
> > when linkup detected.
> > 
> > Acked-by: Manivannan Sadhasivam 
> > Signed-off-by: Xiaowei Bao 
> > Signed-off-by: Frank Li 
> > ---
> 
> Ping, not comments almost over 1 months.

@lorenzo and @Bjorn

Could you please pick up this patch? It just adds a link-up notification for the
Layerscape platform, and there have been no further comments in over a month.

Frank

> 
> > Change from v3 to v4
> >  - swap irq and big_endian
> > Change from v2 to v3
> >  - align 80 column
> >  - clear irq firstly
> >  - dev_info to dev_dbg
> >  - remove double space
> >  - update commit message
> > 
> > Change from v1 to v2
> > - pme -> PME
> > - irq -> IRQ
> > - update dev_info message according to Bjorn's suggestion
> > 
> >  .../pci/controller/dwc/pci-layerscape-ep.c| 102 +-
> >  1 file changed, 101 insertions(+), 1 deletion(-)
> > 
> > 


[PATCH] powerpc: 52xx: Make immr_id DT match tables static

2023-06-14 Thread Rob Herring
In some builds, the mpc52xx_pm_prepare()/lite5200_pm_prepare() functions
generate stack size warnings. The addition of 'struct resource' in commit
2500763dd3db ("powerpc: Use of_address_to_resource()") grew the stack size
and is blamed for the warnings. However, the real issue is there's no
reason the 'struct of_device_id immr_ids' DT match tables need to be on
the stack as they are constant. Declare them as static to move them off
the stack.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202306130405.utv5yozd-...@intel.com/
Signed-off-by: Rob Herring 
---
 arch/powerpc/platforms/52xx/lite5200_pm.c | 2 +-
 arch/powerpc/platforms/52xx/mpc52xx_pm.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/lite5200_pm.c 
b/arch/powerpc/platforms/52xx/lite5200_pm.c
index ee29b63fca16..4900f5f48cce 100644
--- a/arch/powerpc/platforms/52xx/lite5200_pm.c
+++ b/arch/powerpc/platforms/52xx/lite5200_pm.c
@@ -47,7 +47,7 @@ static int lite5200_pm_begin(suspend_state_t state)
 static int lite5200_pm_prepare(void)
 {
struct device_node *np;
-   const struct of_device_id immr_ids[] = {
+   static const struct of_device_id immr_ids[] = {
{ .compatible = "fsl,mpc5200-immr", },
{ .compatible = "fsl,mpc5200b-immr", },
{ .type = "soc", .compatible = "mpc5200", }, /* lite5200 */
diff --git a/arch/powerpc/platforms/52xx/mpc52xx_pm.c 
b/arch/powerpc/platforms/52xx/mpc52xx_pm.c
index 549b3629e39a..f0c31ae15da5 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_pm.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_pm.c
@@ -60,7 +60,7 @@ int mpc52xx_set_wakeup_gpio(u8 pin, u8 level)
 int mpc52xx_pm_prepare(void)
 {
struct device_node *np;
-   const struct of_device_id immr_ids[] = {
+   static const struct of_device_id immr_ids[] = {
{ .compatible = "fsl,mpc5200-immr", },
{ .compatible = "fsl,mpc5200b-immr", },
{ .type = "soc", .compatible = "mpc5200", }, /* lite5200 */
-- 
2.39.2



Re: [PATCH v4 21/34] arm64: Convert various functions to use ptdescs

2023-06-14 Thread Catalin Marinas
On Mon, Jun 12, 2023 at 02:04:10PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Catalin Marinas 


Re: [PATCH v4 34/34] mm: Remove pgtable_{pmd, pte}_page_{ctor, dtor}() wrappers

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:23PM -0700, Vishal Moola (Oracle) wrote:
> These functions are no longer necessary. Remove them and cleanup
> Documentation referencing them.
> 
> Signed-off-by: Vishal Moola (Oracle) 

I've found one stale reference in riscv:

$ git grep -n pgtable_pmd_page_ctor
arch/riscv/mm/init.c:440:   BUG_ON(!vaddr || 
!pgtable_pmd_page_ctor(virt_to_page(vaddr)));

Otherwise

Acked-by: Mike Rapoport (IBM) 


> ---
>  Documentation/mm/split_page_table_lock.rst| 12 +--
>  .../zh_CN/mm/split_page_table_lock.rst| 14 ++---
>  include/linux/mm.h| 20 ---
>  3 files changed, 13 insertions(+), 33 deletions(-)
> 
> diff --git a/Documentation/mm/split_page_table_lock.rst 
> b/Documentation/mm/split_page_table_lock.rst
> index 50ee0dfc95be..4bffec728340 100644
> --- a/Documentation/mm/split_page_table_lock.rst
> +++ b/Documentation/mm/split_page_table_lock.rst
> @@ -53,7 +53,7 @@ Support of split page table lock by an architecture
>  ===
>  
>  There's no need in special enabling of PTE split page table lock: everything
> -required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), 
> which
> +required is done by pagetable_pte_ctor() and pagetable_pte_dtor(), which
>  must be called on PTE table allocation / freeing.
>  
>  Make sure the architecture doesn't use slab allocator for page table
> @@ -63,8 +63,8 @@ This field shares storage with page->ptl.
>  PMD split lock only makes sense if you have more than two page table
>  levels.
>  
> -PMD split lock enabling requires pgtable_pmd_page_ctor() call on PMD table
> -allocation and pgtable_pmd_page_dtor() on freeing.
> +PMD split lock enabling requires pagetable_pmd_ctor() call on PMD table
> +allocation and pagetable_pmd_dtor() on freeing.
>  
>  Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
>  pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
> @@ -72,7 +72,7 @@ paths: i.e X86_PAE preallocate few PMDs on pgd_alloc().
>  
>  With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.
>  
> -NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must
> +NOTE: pagetable_pte_ctor() and pagetable_pmd_ctor() can fail -- it must
>  be handled properly.
>  
>  page->ptl
> @@ -92,7 +92,7 @@ trick:
> split lock with enabled DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, but costs
> one more cache line for indirect access;
>  
> -The spinlock_t allocated in pgtable_pte_page_ctor() for PTE table and in
> -pgtable_pmd_page_ctor() for PMD table.
> +The spinlock_t allocated in pagetable_pte_ctor() for PTE table and in
> +pagetable_pmd_ctor() for PMD table.
>  
>  Please, never access page->ptl directly -- use appropriate helper.
> diff --git a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst 
> b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
> index 4fb7aa666037..a2c288670a24 100644
> --- a/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
> +++ b/Documentation/translations/zh_CN/mm/split_page_table_lock.rst
> @@ -56,16 +56,16 @@ Hugetlb特定的辅助函数:
>  架构对分页表锁的支持
>  
>  
> -没有必要特别启用PTE分页表锁:所有需要的东西都由pgtable_pte_page_ctor()
> -和pgtable_pte_page_dtor()完成,它们必须在PTE表分配/释放时被调用。
> +没有必要特别启用PTE分页表锁:所有需要的东西都由pagetable_pte_ctor()
> +和pagetable_pte_dtor()完成,它们必须在PTE表分配/释放时被调用。
>  
>  确保架构不使用slab分配器来分配页表:slab使用page->slab_cache来分配其页
>  面。这个区域与page->ptl共享存储。
>  
>  PMD分页锁只有在你有两个以上的页表级别时才有意义。
>  
> -启用PMD分页锁需要在PMD表分配时调用pgtable_pmd_page_ctor(),在释放时调
> -用pgtable_pmd_page_dtor()。
> +启用PMD分页锁需要在PMD表分配时调用pagetable_pmd_ctor(),在释放时调
> +用pagetable_pmd_dtor()。
>  
>  分配通常发生在pmd_alloc_one()中,释放发生在pmd_free()和pmd_free_tlb()
>  中,但要确保覆盖所有的PMD表分配/释放路径:即X86_PAE在pgd_alloc()中预先
> @@ -73,7 +73,7 @@ PMD分页锁只有在你有两个以上的页表级别时才有意义。
>  
>  一切就绪后,你可以设置CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK。
>  
> -注意:pgtable_pte_page_ctor()和pgtable_pmd_page_ctor()可能失败--必
> +注意:pagetable_pte_ctor()和pagetable_pmd_ctor()可能失败--必
>  须正确处理。
>  
>  page->ptl
> @@ -90,7 +90,7 @@ page->ptl用于访问分割页表锁,其中'page'是包含该表的页面struc
> 的指针并动态分配它。这允许在启用DEBUG_SPINLOCK或DEBUG_LOCK_ALLOC的
> 情况下使用分页锁,但由于间接访问而多花了一个缓存行。
>  
> -PTE表的spinlock_t分配在pgtable_pte_page_ctor()中,PMD表的spinlock_t
> -分配在pgtable_pmd_page_ctor()中。
> +PTE表的spinlock_t分配在pagetable_pte_ctor()中,PMD表的spinlock_t
> +分配在pagetable_pmd_ctor()中。
>  
>  请不要直接访问page->ptl - -使用适当的辅助函数。
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index dc211c43610b..6d83483cf186 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2897,11 +2897,6 @@ static inline bool pagetable_pte_ctor(struct ptdesc 
> *ptdesc)
>   return true;
>  }
>  
> -static inline bool pgtable_pte_page_ctor(struct page *page)
> -{
> - return pagetable_pte_ctor(page_ptdesc(page));
> -}
> -
>  static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
>  {
>   struct folio *folio = ptdesc_folio(ptdesc);
> @@ -2911,11 +2906,6 @@ 

Re: [PATCH v4 33/34] um: Convert {pmd, pte}_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:22PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents. Also cleans up some spacing issues.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/um/include/asm/pgalloc.h | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/um/include/asm/pgalloc.h b/arch/um/include/asm/pgalloc.h
> index 8ec7cd46dd96..de5e31c64793 100644
> --- a/arch/um/include/asm/pgalloc.h
> +++ b/arch/um/include/asm/pgalloc.h
> @@ -25,19 +25,19 @@
>   */
>  extern pgd_t *pgd_alloc(struct mm_struct *);
>  
> -#define __pte_free_tlb(tlb,pte, address) \
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb),(pte));   \
> +#define __pte_free_tlb(tlb, pte, address)\
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
>  } while (0)
>  
>  #ifdef CONFIG_3_LEVEL_PGTABLES
>  
> -#define __pmd_free_tlb(tlb, pmd, address)\
> -do { \
> - pgtable_pmd_page_dtor(virt_to_page(pmd));   \
> - tlb_remove_page((tlb),virt_to_page(pmd));   \
> -} while (0)  \
> +#define __pmd_free_tlb(tlb, pmd, address)\
> +do { \
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));\
> + tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \
> +} while (0)
>  
>  #endif
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 32/34] sparc: Convert pgtable_pte_page_{ctor, dtor}() to ptdesc equivalents

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:21PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable pte constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/sparc/mm/srmmu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/sparc/mm/srmmu.c b/arch/sparc/mm/srmmu.c
> index 13f027afc875..8393faa3e596 100644
> --- a/arch/sparc/mm/srmmu.c
> +++ b/arch/sparc/mm/srmmu.c
> @@ -355,7 +355,8 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
>   return NULL;
>   page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
>   spin_lock(&mm->page_table_lock);
> - if (page_ref_inc_return(page) == 2 && !pgtable_pte_page_ctor(page)) {
> + if (page_ref_inc_return(page) == 2 &&
> + !pagetable_pte_ctor(page_ptdesc(page))) {
>   page_ref_dec(page);
>   ptep = NULL;
>   }
> @@ -371,7 +372,7 @@ void pte_free(struct mm_struct *mm, pgtable_t ptep)
>   page = pfn_to_page(__nocache_pa((unsigned long)ptep) >> PAGE_SHIFT);
>   spin_lock(&mm->page_table_lock);
>   if (page_ref_dec_return(page) == 1)
> - pgtable_pte_page_dtor(page);
> + pagetable_pte_dtor(page_ptdesc(page));
>   spin_unlock(&mm->page_table_lock);
>  
>   srmmu_free_nocache(ptep, SRMMU_PTE_TABLE_SIZE);
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 31/34] sparc64: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:20PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/sparc/mm/init_64.c | 17 +
>  1 file changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
> index 04f9db0c3111..105915cd2eee 100644
> --- a/arch/sparc/mm/init_64.c
> +++ b/arch/sparc/mm/init_64.c
> @@ -2893,14 +2893,15 @@ pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>  
>  pgtable_t pte_alloc_one(struct mm_struct *mm)
>  {
> - struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> - if (!page)
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL | __GFP_ZERO, 0);
> +
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pte_page_ctor(page)) {
> - __free_page(page);
> + if (!pagetable_pte_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
> - return (pte_t *) page_address(page);
> + return ptdesc_address(ptdesc);
>  }
>  
>  void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
> @@ -2910,10 +2911,10 @@ void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  
>  static void __pte_free(pgtable_t pte)
>  {
> - struct page *page = virt_to_page(pte);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pte);
>  
> - pgtable_pte_page_dtor(page);
> - __free_page(page);
> + pagetable_pte_dtor(ptdesc);
> + pagetable_free(ptdesc);
>  }
>  
>  void pte_free(struct mm_struct *mm, pgtable_t pte)
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 30/34] sh: Convert pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:19PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents. Also cleans up some spacing issues.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> Reviewed-by: Geert Uytterhoeven 
> Acked-by: John Paul Adrian Glaubitz 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/sh/include/asm/pgalloc.h | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/sh/include/asm/pgalloc.h b/arch/sh/include/asm/pgalloc.h
> index a9e98233c4d4..5d8577ab1591 100644
> --- a/arch/sh/include/asm/pgalloc.h
> +++ b/arch/sh/include/asm/pgalloc.h
> @@ -2,6 +2,7 @@
>  #ifndef __ASM_SH_PGALLOC_H
>  #define __ASM_SH_PGALLOC_H
>  
> +#include 
>  #include 
>  
>  #define __HAVE_ARCH_PMD_ALLOC_ONE
> @@ -31,10 +32,10 @@ static inline void pmd_populate(struct mm_struct *mm, 
> pmd_t *pmd,
>   set_pmd(pmd, __pmd((unsigned long)page_address(pte)));
>  }
>  
> -#define __pte_free_tlb(tlb,pte,addr) \
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), (pte));  \
> +#define __pte_free_tlb(tlb, pte, addr)   \
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
>  } while (0)
>  
>  #endif /* __ASM_SH_PGALLOC_H */
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 29/34] riscv: Convert alloc_{pmd, pte}_late() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:18PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> Acked-by: Palmer Dabbelt 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/riscv/include/asm/pgalloc.h |  8 
>  arch/riscv/mm/init.c | 16 ++--
>  2 files changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/pgalloc.h 
> b/arch/riscv/include/asm/pgalloc.h
> index 59dc12b5b7e8..d169a4f41a2e 100644
> --- a/arch/riscv/include/asm/pgalloc.h
> +++ b/arch/riscv/include/asm/pgalloc.h
> @@ -153,10 +153,10 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  
>  #endif /* __PAGETABLE_PMD_FOLDED */
>  
> -#define __pte_free_tlb(tlb, pte, buf)   \
> -do {\
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), pte);\
> +#define __pte_free_tlb(tlb, pte, buf)\
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
>  } while (0)
>  #endif /* CONFIG_MMU */
>  
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 3d689ffb2072..6bfeec80bf4e 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -354,12 +354,10 @@ static inline phys_addr_t __init 
> alloc_pte_fixmap(uintptr_t va)
>  
>  static phys_addr_t __init alloc_pte_late(uintptr_t va)
>  {
> - unsigned long vaddr;
> -
> - vaddr = __get_free_page(GFP_KERNEL);
> - BUG_ON(!vaddr || !pgtable_pte_page_ctor(virt_to_page((void *)vaddr)));
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
>  
> - return __pa(vaddr);
> + BUG_ON(!ptdesc || !pagetable_pte_ctor(ptdesc));
> + return __pa((pte_t *)ptdesc_address(ptdesc));
>  }
>  
>  static void __init create_pte_mapping(pte_t *ptep,
> @@ -437,12 +435,10 @@ static phys_addr_t __init alloc_pmd_fixmap(uintptr_t va)
>  
>  static phys_addr_t __init alloc_pmd_late(uintptr_t va)
>  {
> - unsigned long vaddr;
> -
> - vaddr = __get_free_page(GFP_KERNEL);
> - BUG_ON(!vaddr || !pgtable_pmd_page_ctor(virt_to_page((void *)vaddr)));
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
>  
> - return __pa(vaddr);
> + BUG_ON(!ptdesc || !pagetable_pmd_ctor(ptdesc));
> + return __pa((pmd_t *)ptdesc_address(ptdesc));
>  }
>  
>  static void __init create_pmd_mapping(pmd_t *pmdp,
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 28/34] openrisc: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:17PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/openrisc/include/asm/pgalloc.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/openrisc/include/asm/pgalloc.h 
> b/arch/openrisc/include/asm/pgalloc.h
> index b7b2b8d16fad..c6a73772a546 100644
> --- a/arch/openrisc/include/asm/pgalloc.h
> +++ b/arch/openrisc/include/asm/pgalloc.h
> @@ -66,10 +66,10 @@ extern inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  
>  extern pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
>  
> -#define __pte_free_tlb(tlb, pte, addr)   \
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), (pte));  \
> +#define __pte_free_tlb(tlb, pte, addr)   \
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
>  } while (0)
>  
>  #endif
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 27/34] nios2: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:16PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/nios2/include/asm/pgalloc.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/nios2/include/asm/pgalloc.h 
> b/arch/nios2/include/asm/pgalloc.h
> index ecd1657bb2ce..ce6bb8e74271 100644
> --- a/arch/nios2/include/asm/pgalloc.h
> +++ b/arch/nios2/include/asm/pgalloc.h
> @@ -28,10 +28,10 @@ static inline void pmd_populate(struct mm_struct *mm, 
> pmd_t *pmd,
>  
>  extern pgd_t *pgd_alloc(struct mm_struct *mm);
>  
> -#define __pte_free_tlb(tlb, pte, addr)   \
> - do {\
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), (pte));  \
> +#define __pte_free_tlb(tlb, pte, addr)   
> \
> + do {\
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
>   } while (0)
>  
>  #endif /* _ASM_NIOS2_PGALLOC_H */
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 26/34] mips: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:15PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/mips/include/asm/pgalloc.h | 31 +--
>  arch/mips/mm/pgtable.c  |  7 ---
>  2 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
> index f72e737dda21..6940e5536664 100644
> --- a/arch/mips/include/asm/pgalloc.h
> +++ b/arch/mips/include/asm/pgalloc.h
> @@ -51,13 +51,13 @@ extern pgd_t *pgd_alloc(struct mm_struct *mm);
>  
>  static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
>  {
> - free_pages((unsigned long)pgd, PGD_TABLE_ORDER);
> + pagetable_free(virt_to_ptdesc(pgd));
>  }
>  
> -#define __pte_free_tlb(tlb,pte,address)  \
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), pte);\
> +#define __pte_free_tlb(tlb, pte, address)\
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
>  } while (0)
>  
>  #ifndef __PAGETABLE_PMD_FOLDED
> @@ -65,18 +65,18 @@ do {  
> \
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long 
> address)
>  {
>   pmd_t *pmd;
> - struct page *pg;
> + struct ptdesc *ptdesc;
>  
> - pg = alloc_pages(GFP_KERNEL_ACCOUNT, PMD_TABLE_ORDER);
> - if (!pg)
> + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, PMD_TABLE_ORDER);
> + if (!ptdesc)
>   return NULL;
>  
> - if (!pgtable_pmd_page_ctor(pg)) {
> - __free_pages(pg, PMD_TABLE_ORDER);
> + if (!pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - pmd = (pmd_t *)page_address(pg);
> + pmd = ptdesc_address(ptdesc);
>   pmd_init(pmd);
>   return pmd;
>  }
> @@ -90,10 +90,13 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
> unsigned long address)
>  static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long 
> address)
>  {
>   pud_t *pud;
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, PUD_TABLE_ORDER);
>  
> - pud = (pud_t *) __get_free_pages(GFP_KERNEL, PUD_TABLE_ORDER);
> - if (pud)
> - pud_init(pud);
> + if (!ptdesc)
> + return NULL;
> + pud = ptdesc_address(ptdesc);
> +
> + pud_init(pud);
>   return pud;
>  }
>  
> diff --git a/arch/mips/mm/pgtable.c b/arch/mips/mm/pgtable.c
> index b13314be5d0e..729258ff4e3b 100644
> --- a/arch/mips/mm/pgtable.c
> +++ b/arch/mips/mm/pgtable.c
> @@ -10,10 +10,11 @@
>  
>  pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
> - pgd_t *ret, *init;
> + pgd_t *init, *ret = NULL;
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, PGD_TABLE_ORDER);
>  
> - ret = (pgd_t *) __get_free_pages(GFP_KERNEL, PGD_TABLE_ORDER);
> - if (ret) {
> + if (ptdesc) {
> + ret = ptdesc_address(ptdesc);
>   init = pgd_offset(&init_mm, 0UL);
>   pgd_init(ret);
>   memcpy(ret + USER_PTRS_PER_PGD, init + USER_PTRS_PER_PGD,
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 25/34] m68k: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:14PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

One comment below
> ---
>  arch/m68k/include/asm/mcf_pgalloc.h  | 41 ++--
>  arch/m68k/include/asm/sun3_pgalloc.h |  8 +++---
>  arch/m68k/mm/motorola.c  |  4 +--
>  3 files changed, 27 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/m68k/include/asm/mcf_pgalloc.h 
> b/arch/m68k/include/asm/mcf_pgalloc.h
> index 5c2c0a864524..857949ac9431 100644
> --- a/arch/m68k/include/asm/mcf_pgalloc.h
> +++ b/arch/m68k/include/asm/mcf_pgalloc.h
> @@ -7,20 +7,19 @@
>  
>  extern inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  {
> - free_page((unsigned long) pte);
> + pagetable_free(virt_to_ptdesc(pte));
>  }
>  
>  extern const char bad_pmd_string[];
>  
>  extern inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
>  {
> - unsigned long page = __get_free_page(GFP_DMA);
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | __GFP_ZERO, 0);
>  
> - if (!page)
> + if (!ptdesc)
>   return NULL;
>  
> - memset((void *)page, 0, PAGE_SIZE);
> - return (pte_t *) (page);
> + return ptdesc_address(ptdesc);
>  }
>  
>  extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, unsigned long address)
> @@ -35,36 +34,36 @@ extern inline pmd_t *pmd_alloc_kernel(pgd_t *pgd, 
> unsigned long address)
>  static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pgtable,
> unsigned long address)
>  {
> - struct page *page = virt_to_page(pgtable);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
>  
> - pgtable_pte_page_dtor(page);
> - __free_page(page);
> + pagetable_pte_dtor(ptdesc);
> + pagetable_free(ptdesc);
>  }
>  
>  static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
>  {
> - struct page *page = alloc_pages(GFP_DMA, 0);
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA, 0);

You can add __GFP_ZERO here and drop pagetable_clear() below
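
i.e. roughly (an untested sketch of that suggestion, keeping the rest of the
function as posted):

static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
{
	struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | __GFP_ZERO, 0);
	pte_t *pte;

	if (!ptdesc)
		return NULL;
	if (!pagetable_pte_ctor(ptdesc)) {
		pagetable_free(ptdesc);
		return NULL;
	}

	pte = ptdesc_address(ptdesc);
	/* the allocation is already zeroed, so no pagetable_clear() is needed */

	return pte;
}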

>   pte_t *pte;
>  
> - if (!page)
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pte_page_ctor(page)) {
> - __free_page(page);
> + if (!pagetable_pte_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - pte = page_address(page);
> - clear_page(pte);
> + pte = ptdesc_address(ptdesc);
> + pagetable_clear(pte);
>  
>   return pte;
>  }
>  
>  static inline void pte_free(struct mm_struct *mm, pgtable_t pgtable)
>  {
> - struct page *page = virt_to_page(pgtable);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
>  
> - pgtable_pte_page_dtor(page);
> - __free_page(page);
> + pagetable_pte_dtor(ptdesc);
> + pagetable_free(ptdesc);
>  }
>  
>  /*
> @@ -75,16 +74,18 @@ static inline void pte_free(struct mm_struct *mm, 
> pgtable_t pgtable)
>  
>  static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
>  {
> - free_page((unsigned long) pgd);
> + pagetable_free(virt_to_ptdesc(pgd));
>  }
>  
>  static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
>   pgd_t *new_pgd;
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_DMA | GFP_NOWARN, 0);
>  
> - new_pgd = (pgd_t *)__get_free_page(GFP_DMA | __GFP_NOWARN);
> - if (!new_pgd)
> + if (!ptdesc)
>   return NULL;
> + new_pgd = ptdesc_address(ptdesc);
> +
>   memcpy(new_pgd, swapper_pg_dir, PTRS_PER_PGD * sizeof(pgd_t));
>   memset(new_pgd, 0, PAGE_OFFSET >> PGDIR_SHIFT);
>   return new_pgd;
> diff --git a/arch/m68k/include/asm/sun3_pgalloc.h 
> b/arch/m68k/include/asm/sun3_pgalloc.h
> index 198036aff519..ff48573db2c0 100644
> --- a/arch/m68k/include/asm/sun3_pgalloc.h
> +++ b/arch/m68k/include/asm/sun3_pgalloc.h
> @@ -17,10 +17,10 @@
>  
>  extern const char bad_pmd_string[];
>  
> -#define __pte_free_tlb(tlb,pte,addr) \
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), pte);\
> +#define __pte_free_tlb(tlb, pte, addr)   \
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
>  } while (0)
>  
>  static inline void pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmd, 
> pte_t *pte)
> diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
> index c75984e2d86b..594575a0780c 100644
> --- 

[6.4-rc6] Crash during a kexec operation (tpm_amd_is_rng_defective)

2023-06-14 Thread Sachin Sant
The following crash is observed during a kexec operation on an
IBM Power10 server:

[ 34.381548] Kernel attempted to read user page (50) - exploit attempt? (uid: 0)
[ 34.381562] BUG: Kernel NULL pointer dereference on read at 0x0050
[ 34.381565] Faulting instruction address: 0xc09db1e4
[ 34.381569] Oops: Kernel access of bad area, sig: 11 [#1]
[ 34.381572] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 34.381576] Modules linked in: dm_mod(E) nft_fib_inet(E) nft_fib_ipv4(E) 
nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) 
nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) 
nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) bonding(E) tls(E) rfkill(E) 
ip_set(E) sunrpc(E) nf_tables(E) nfnetlink(E) pseries_rng(E) 
aes_gcm_p10_crypto(E) drm(E) drm_panel_orientation_quirks(E) xfs(E) 
libcrc32c(E) sd_mod(E) sr_mod(E) t10_pi(E) crc64_rocksoft_generic(E) cdrom(E) 
crc64_rocksoft(E) crc64(E) sg(E) ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) 
vmx_crypto(E) fuse(E)
[ 34.381613] CPU: 18 PID: 5918 Comm: kexec Kdump: loaded Tainted: G E 
6.4.0-rc6-00037-gb6dad5178cea #3
[ 34.381618] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
[ 34.381621] NIP: c09db1e4 LR: c09db928 CTR: c09eab60
[ 34.381625] REGS: c0009742f780 TRAP: 0300 Tainted: G E 
(6.4.0-rc6-00037-gb6dad5178cea)
[ 34.381628] MSR: 8280b033  CR: 
4444 XER: 0001
[ 34.381638] CFAR: c09db19c DAR: 0050 DSISR: 4000 
IRQMASK: 0 
[ 34.381638] GPR00: c09db928 c0009742fa20 c14a1500 
c81d 
[ 34.381638] GPR04: cd842c50 cd842c50 0025 
fffe 
[ 34.381638] GPR08:   0009 
c00800785280 
[ 34.381638] GPR12: c09eab60 c0135fab7f00  
 
[ 34.381638] GPR16:    
 
[ 34.381638] GPR20:    
 
[ 34.381638] GPR24:    
c2e21e08 
[ 34.381638] GPR28: cd842c48 c2a02208 c321c0c0 
c81d 
[ 34.381674] NIP [c09db1e4] tpm_amd_is_rng_defective+0x74/0x240
[ 34.381681] LR [c09db928] tpm_chip_unregister+0x138/0x160
[ 34.381685] Call Trace:
[ 34.381686] [c0009742faa0] [c09db928] 
tpm_chip_unregister+0x138/0x160
[ 34.381690] [c0009742fae0] [c09eab94] tpm_ibmvtpm_remove+0x34/0x130
[ 34.381695] [c0009742fb50] [c0115738] vio_bus_remove+0x58/0xd0
[ 34.381701] [c0009742fb90] [c0a01ecc] device_shutdown+0x21c/0x39c
[ 34.381705] [c0009742fc20] [c01a2684] 
kernel_restart_prepare+0x54/0x70
[ 34.381710] [c0009742fc40] [c0292c48] kernel_kexec+0xa8/0x100
[ 34.381714] [c0009742fcb0] [c01a2cd4] __do_sys_reboot+0x214/0x2c0
[ 34.381718] [c0009742fe10] [c0034adc] 
system_call_exception+0x13c/0x340
[ 34.381723] [c0009742fe50] [c000d05c] 
system_call_vectored_common+0x15c/0x2ec
[ 34.381729] --- interrupt: 3000 at 0x7fff9c5459f0
[ 34.381732] NIP: 7fff9c5459f0 LR:  CTR: 
[ 34.381735] REGS: c0009742fe80 TRAP: 3000 Tainted: G E 
(6.4.0-rc6-00037-gb6dad5178cea)
[ 34.381738] MSR: 8280f033  CR: 
42422884 XER: 
[ 34.381747] IRQMASK: 0 
[ 34.381747] GPR00: 0058 7ad83d70 00012fc47f00 
fee1dead 
[ 34.381747] GPR04: 28121969 45584543  
0003 
[ 34.381747] GPR08: 0010   
 
[ 34.381747] GPR12:  7fff9c7bb2c0 00012fc3f598 
 
[ 34.381747] GPR16:   00012fc1fcc0 
 
[ 34.381747] GPR20: 8913 8914 00014b891020 
0003 
[ 34.381747] GPR24:  0001 0003 
7ad83ef0 
[ 34.381747] GPR28: 00012fc19f10 7fff9c6419c0 00014b891080 
00014b891040 
[ 34.381781] NIP [7fff9c5459f0] 0x7fff9c5459f0
[ 34.381784] LR [] 0x0
[ 34.381786] --- interrupt: 3000
[ 34.381788] Code: 5463063e 408201c8 38210080 4e800020 6000 6000 
6000 7c0802a6 fbe10078 7c7f1b78 f8010090 e9230728  2c2c 
41820020 7d8903a6 
[ 34.381800] ---[ end trace  ]---
[ 34.384090] pstore: backend (nvram) writing error (-1)

Git bisect points to the following patch:

commit bd8621ca1510e6e802df9855bdc35a04a3cfa932
tpm: Add !tpm_amd_is_rng_defective() to the hwrng_unregister() call site

Reverting the commit allows a successful kexec operation.

- Sachin



Re: [PATCH v4 24/34] loongarch: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:13PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/loongarch/include/asm/pgalloc.h | 27 +++
>  arch/loongarch/mm/pgtable.c  |  7 ---
>  2 files changed, 19 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/loongarch/include/asm/pgalloc.h 
> b/arch/loongarch/include/asm/pgalloc.h
> index af1d1e4a6965..70bb3bdd201e 100644
> --- a/arch/loongarch/include/asm/pgalloc.h
> +++ b/arch/loongarch/include/asm/pgalloc.h
> @@ -45,9 +45,9 @@ extern void pagetable_init(void);
>  extern pgd_t *pgd_alloc(struct mm_struct *mm);
>  
>  #define __pte_free_tlb(tlb, pte, address)\
> -do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page((tlb), pte);\
> +do { \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\
>  } while (0)
>  
>  #ifndef __PAGETABLE_PMD_FOLDED
> @@ -55,18 +55,18 @@ do {  
> \
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long 
> address)
>  {
>   pmd_t *pmd;
> - struct page *pg;
> + struct ptdesc *ptdesc;
>  
> - pg = alloc_page(GFP_KERNEL_ACCOUNT);
> - if (!pg)
> + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, 0);
> + if (!ptdesc)
>   return NULL;
>  
> - if (!pgtable_pmd_page_ctor(pg)) {
> - __free_page(pg);
> + if (!pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - pmd = (pmd_t *)page_address(pg);
> + pmd = ptdesc_address(ptdesc);
>   pmd_init(pmd);
>   return pmd;
>  }
> @@ -80,10 +80,13 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
> unsigned long address)
>  static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long 
> address)
>  {
>   pud_t *pud;
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
>  
> - pud = (pud_t *) __get_free_page(GFP_KERNEL);
> - if (pud)
> - pud_init(pud);
> + if (!ptdesc)
> + return NULL;
> + pud = ptdesc_address(ptdesc);
> +
> + pud_init(pud);
>   return pud;
>  }
>  
> diff --git a/arch/loongarch/mm/pgtable.c b/arch/loongarch/mm/pgtable.c
> index 36a6dc0148ae..cdba10ffc0df 100644
> --- a/arch/loongarch/mm/pgtable.c
> +++ b/arch/loongarch/mm/pgtable.c
> @@ -11,10 +11,11 @@
>  
>  pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
> - pgd_t *ret, *init;
> + pgd_t *init, *ret = NULL;
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, 0);
>  
> - ret = (pgd_t *) __get_free_page(GFP_KERNEL);
> - if (ret) {
> + if (ptdesc) {
> + ret = (pgd_t *)ptdesc_address(ptdesc);
>   init = pgd_offset(&init_mm, 0UL);
>   pgd_init(ret);
>   memcpy(ret + USER_PTRS_PER_PGD, init + USER_PTRS_PER_PGD,
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 23/34] hexagon: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:12PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/hexagon/include/asm/pgalloc.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/hexagon/include/asm/pgalloc.h 
> b/arch/hexagon/include/asm/pgalloc.h
> index f0c47e6a7427..55988625e6fb 100644
> --- a/arch/hexagon/include/asm/pgalloc.h
> +++ b/arch/hexagon/include/asm/pgalloc.h
> @@ -87,10 +87,10 @@ static inline void pmd_populate_kernel(struct mm_struct 
> *mm, pmd_t *pmd,
>   max_kernel_seg = pmdindex;
>  }
>  
> -#define __pte_free_tlb(tlb, pte, addr)   \
> -do { \
> - pgtable_pte_page_dtor((pte));   \
> - tlb_remove_page((tlb), (pte));  \
> +#define __pte_free_tlb(tlb, pte, addr)   \
> +do { \
> + pagetable_pte_dtor((page_ptdesc(pte))); \
> + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
>  } while (0)
>  
>  #endif
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 22/34] csky: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:11PM -0700, Vishal Moola (Oracle) wrote:
> Part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> Acked-by: Guo Ren 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/csky/include/asm/pgalloc.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/csky/include/asm/pgalloc.h b/arch/csky/include/asm/pgalloc.h
> index 7d57e5da0914..9c84c9012e53 100644
> --- a/arch/csky/include/asm/pgalloc.h
> +++ b/arch/csky/include/asm/pgalloc.h
> @@ -63,8 +63,8 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
>  
>  #define __pte_free_tlb(tlb, pte, address)\
>  do { \
> - pgtable_pte_page_dtor(pte); \
> - tlb_remove_page(tlb, pte);  \
> + pagetable_pte_dtor(page_ptdesc(pte));   \
> + tlb_remove_page_ptdesc(tlb, page_ptdesc(pte));  \
>  } while (0)
>  
>  extern void pagetable_init(void);
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 21/34] arm64: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:10PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/arm64/include/asm/tlb.h | 14 --
>  arch/arm64/mm/mmu.c  |  7 ---
>  2 files changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index c995d1f4594f..2c29239d05c3 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -75,18 +75,20 @@ static inline void tlb_flush(struct mmu_gather *tlb)
>  static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> unsigned long addr)
>  {
> - pgtable_pte_page_dtor(pte);
> - tlb_remove_table(tlb, pte);
> + struct ptdesc *ptdesc = page_ptdesc(pte);
> +
> + pagetable_pte_dtor(ptdesc);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 2
>  static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
> unsigned long addr)
>  {
> - struct page *page = virt_to_page(pmdp);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
>  
> - pgtable_pmd_page_dtor(page);
> - tlb_remove_table(tlb, page);
> + pagetable_pmd_dtor(ptdesc);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  }
>  #endif
>  
> @@ -94,7 +96,7 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, 
> pmd_t *pmdp,
>  static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
> unsigned long addr)
>  {
> - tlb_remove_table(tlb, virt_to_page(pudp));
> + tlb_remove_ptdesc(tlb, virt_to_ptdesc(pudp));
>  }
>  #endif
>  
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index af6bc8403ee4..5867a0e917b9 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -426,6 +426,7 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
>  static phys_addr_t pgd_pgtable_alloc(int shift)
>  {
>   phys_addr_t pa = __pgd_pgtable_alloc(shift);
> + struct ptdesc *ptdesc = page_ptdesc(phys_to_page(pa));
>  
>   /*
>* Call proper page table ctor in case later we need to
> @@ -433,12 +434,12 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
>* this pre-allocated page table.
>*
>* We don't select ARCH_ENABLE_SPLIT_PMD_PTLOCK if pmd is
> -  * folded, and if so pgtable_pmd_page_ctor() becomes nop.
> +  * folded, and if so pagetable_pte_ctor() becomes nop.
>*/
>   if (shift == PAGE_SHIFT)
> - BUG_ON(!pgtable_pte_page_ctor(phys_to_page(pa)));
> + BUG_ON(!pagetable_pte_ctor(ptdesc));
>   else if (shift == PMD_SHIFT)
> - BUG_ON(!pgtable_pmd_page_ctor(phys_to_page(pa)));
> + BUG_ON(!pagetable_pmd_ctor(ptdesc));
>  
>   return pa;
>  }
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 20/34] arm: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:09PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> late_alloc() also uses the __get_free_pages() helper function. Convert
> this to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

One comment below.

> ---
>  arch/arm/include/asm/tlb.h | 12 +++-
>  arch/arm/mm/mmu.c  |  6 +++---
>  2 files changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
> index b8cbe03ad260..f40d06ad5d2a 100644
> --- a/arch/arm/include/asm/tlb.h
> +++ b/arch/arm/include/asm/tlb.h
> @@ -39,7 +39,9 @@ static inline void __tlb_remove_table(void *_table)
>  static inline void
>  __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
>  {
> - pgtable_pte_page_dtor(pte);
> + struct ptdesc *ptdesc = page_ptdesc(pte);
> +
> + pagetable_pte_dtor(ptdesc);
>  
>  #ifndef CONFIG_ARM_LPAE
>   /*
> @@ -50,17 +52,17 @@ __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, 
> unsigned long addr)
>   __tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE);
>  #endif
>  
> - tlb_remove_table(tlb, pte);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  }
>  
>  static inline void
>  __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
>  {
>  #ifdef CONFIG_ARM_LPAE
> - struct page *page = virt_to_page(pmdp);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmdp);
>  
> - pgtable_pmd_page_dtor(page);
> - tlb_remove_table(tlb, page);
> + pagetable_pmd_dtor(ptdesc);
> + tlb_remove_ptdesc(tlb, ptdesc);
>  #endif
>  }
>  
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 22292cf3381c..294518fd0240 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -737,11 +737,11 @@ static void __init *early_alloc(unsigned long sz)
>  
>  static void *__init late_alloc(unsigned long sz)
>  {
> - void *ptr = (void *)__get_free_pages(GFP_PGTABLE_KERNEL, get_order(sz));
> + void *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL, get_order(sz));
>  
> - if (!ptr || !pgtable_pte_page_ctor(virt_to_page(ptr)))
> + if (!ptdesc || !pagetable_pte_ctor(ptdesc))
>   BUG();
> - return ptr;
> + return ptdesc;

should be

return  ptdesc_to_virt(ptdesc);

>  }
>  
>  static pte_t * __init arm_pte_alloc(pmd_t *pmd, unsigned long addr,
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 19/34] pgalloc: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:08PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  include/asm-generic/pgalloc.h | 62 +--
>  1 file changed, 37 insertions(+), 25 deletions(-)
> 
> diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
> index a7cf825befae..3fd6ce79e654 100644
> --- a/include/asm-generic/pgalloc.h
> +++ b/include/asm-generic/pgalloc.h
> @@ -18,7 +18,11 @@
>   */
>  static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
>  {
> - return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL, 0);
> +
> + if (!ptdesc)
> + return NULL;
> + return ptdesc_address(ptdesc);
>  }
>  
>  #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
> @@ -41,7 +45,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct 
> *mm)
>   */
>  static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
>  {
> - free_page((unsigned long)pte);
> + pagetable_free(virt_to_ptdesc(pte));
>  }
>  
>  /**
> @@ -49,7 +53,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, 
> pte_t *pte)
>   * @mm: the mm_struct of the current context
>   * @gfp: GFP flags to use for the allocation
>   *
> - * Allocates a page and runs the pgtable_pte_page_ctor().
> + * Allocates a ptdesc and runs the pagetable_pte_ctor().

Allocates memory for page table and ptdesc

>   *
>   * This function is intended for architectures that need
>   * anything beyond simple page allocation or must have custom GFP flags.

The Return: description here should be fixed up

> @@ -58,17 +62,17 @@ static inline void pte_free_kernel(struct mm_struct *mm, 
> pte_t *pte)
>   */
>  static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp)
>  {
> - struct page *pte;
> + struct ptdesc *ptdesc;
>  
> - pte = alloc_page(gfp);
> - if (!pte)
> + ptdesc = pagetable_alloc(gfp, 0);
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pte_page_ctor(pte)) {
> - __free_page(pte);
> + if (!pagetable_pte_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - return pte;
> + return ptdesc_page(ptdesc);
>  }
>  
>  #ifndef __HAVE_ARCH_PTE_ALLOC_ONE
> @@ -76,7 +80,7 @@ static inline pgtable_t __pte_alloc_one(struct mm_struct 
> *mm, gfp_t gfp)
>   * pte_alloc_one - allocate a page for PTE-level user page table
>   * @mm: the mm_struct of the current context
>   *
> - * Allocates a page and runs the pgtable_pte_page_ctor().
> + * Allocates a ptdesc and runs the pagetable_pte_ctor().

Allocates memory for page table and ptdesc

>   *
>   * Return: `struct page` initialized as page table or %NULL on error

Return: ptdesc ...

>   */
> @@ -98,8 +102,10 @@ static inline pgtable_t pte_alloc_one(struct mm_struct 
> *mm)
>   */
>  static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
>  {
> - pgtable_pte_page_dtor(pte_page);
> - __free_page(pte_page);
> + struct ptdesc *ptdesc = page_ptdesc(pte_page);
> +
> + pagetable_pte_dtor(ptdesc);
> + pagetable_free(ptdesc);
>  }
>  
>  
> @@ -110,7 +116,7 @@ static inline void pte_free(struct mm_struct *mm, struct 
> page *pte_page)
>   * pmd_alloc_one - allocate a page for PMD-level page table
>   * @mm: the mm_struct of the current context
>   *
> - * Allocates a page and runs the pgtable_pmd_page_ctor().
> + * Allocates a ptdesc and runs the pagetable_pmd_ctor().

Allocate memory for page table and ptdesc

>   * Allocations use %GFP_PGTABLE_USER in user context and
>   * %GFP_PGTABLE_KERNEL in kernel context.
>   *
> @@ -118,28 +124,30 @@ static inline void pte_free(struct mm_struct *mm, 
> struct page *pte_page)
>   */
>  static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   gfp_t gfp = GFP_PGTABLE_USER;
>  
>   if (mm == &init_mm)
>   gfp = GFP_PGTABLE_KERNEL;
> - page = alloc_page(gfp);
> - if (!page)
> + ptdesc = pagetable_alloc(gfp, 0);
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pmd_page_ctor(page)) {
> - __free_page(page);
> + if (!pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
> - return (pmd_t *)page_address(page);
> + return ptdesc_address(ptdesc);
>  }
>  #endif
>  
>  #ifndef __HAVE_ARCH_PMD_FREE
>  static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
>  {
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
> +
> 

Re: [PATCH v4 18/34] mm: Remove page table members from struct page

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:07PM -0700, Vishal Moola (Oracle) wrote:
> The page table members are now split out into their own ptdesc struct.
> Remove them from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm_types.h | 14 --
>  include/linux/pgtable.h  |  3 ---
>  2 files changed, 17 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6161fe1ae5b8..31ffa1be21d0 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -141,20 +141,6 @@ struct page {
>   struct {/* Tail pages of compound page */
>   unsigned long compound_head;/* Bit zero is set */
>   };
> - struct {/* Page table pages */
> - unsigned long _pt_pad_1;/* compound_head */
> - pgtable_t pmd_huge_pte; /* protected by page->ptl */
> - unsigned long _pt_s390_gaddr;   /* mapping */
> - union {
> - struct mm_struct *pt_mm; /* x86 pgds only */
> - atomic_t pt_frag_refcount; /* powerpc */
> - };
> -#if ALLOC_SPLIT_PTLOCKS
> - spinlock_t *ptl;
> -#else
> - spinlock_t ptl;
> -#endif
> - };
>   struct {/* ZONE_DEVICE pages */
>   /** @pgmap: Points to the hosting device page map. */
>   struct dev_pagemap *pgmap;
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c405f74d3875..33cc19d752b3 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1019,10 +1019,7 @@ struct ptdesc {
>  TABLE_MATCH(flags, __page_flags);
>  TABLE_MATCH(compound_head, pt_list);
>  TABLE_MATCH(compound_head, _pt_pad_1);
> -TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
>  TABLE_MATCH(mapping, _pt_s390_gaddr);
> -TABLE_MATCH(pt_mm, pt_mm);
> -TABLE_MATCH(ptl, ptl);
>  #undef TABLE_MATCH
>  static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 17/34] s390: Convert various pgalloc functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:06PM -0700, Vishal Moola (Oracle) wrote:
> As part of the conversions to replace pgtable constructor/destructors with
> ptdesc equivalents, convert various page table functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/s390/include/asm/pgalloc.h |   4 +-
>  arch/s390/include/asm/tlb.h |   4 +-
>  arch/s390/mm/pgalloc.c  | 108 
>  3 files changed, 59 insertions(+), 57 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
> index 17eb618f1348..00ad9b88fda9 100644
> --- a/arch/s390/include/asm/pgalloc.h
> +++ b/arch/s390/include/asm/pgalloc.h
> @@ -86,7 +86,7 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
> unsigned long vmaddr)
>   if (!table)
>   return NULL;
>   crst_table_init(table, _SEGMENT_ENTRY_EMPTY);
> - if (!pgtable_pmd_page_ctor(virt_to_page(table))) {
> + if (!pagetable_pmd_ctor(virt_to_ptdesc(table))) {
>   crst_table_free(mm, table);
>   return NULL;
>   }
> @@ -97,7 +97,7 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t 
> *pmd)
>  {
>   if (mm_pmd_folded(mm))
>   return;
> - pgtable_pmd_page_dtor(virt_to_page(pmd));
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));
>   crst_table_free(mm, (unsigned long *) pmd);
>  }
>  
> diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
> index b91f4a9b044c..383b1f91442c 100644
> --- a/arch/s390/include/asm/tlb.h
> +++ b/arch/s390/include/asm/tlb.h
> @@ -89,12 +89,12 @@ static inline void pmd_free_tlb(struct mmu_gather *tlb, 
> pmd_t *pmd,
>  {
>   if (mm_pmd_folded(tlb->mm))
>   return;
> - pgtable_pmd_page_dtor(virt_to_page(pmd));
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));
>   __tlb_adjust_range(tlb, address, PAGE_SIZE);
>   tlb->mm->context.flush_mm = 1;
>   tlb->freed_tables = 1;
>   tlb->cleared_puds = 1;
> - tlb_remove_table(tlb, pmd);
> + tlb_remove_ptdesc(tlb, pmd);
>  }
>  
>  /*
> diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
> index 6b99932abc66..eeb7c95b98cf 100644
> --- a/arch/s390/mm/pgalloc.c
> +++ b/arch/s390/mm/pgalloc.c
> @@ -43,17 +43,17 @@ __initcall(page_table_register_sysctl);
>  
>  unsigned long *crst_table_alloc(struct mm_struct *mm)
>  {
> - struct page *page = alloc_pages(GFP_KERNEL, CRST_ALLOC_ORDER);
> + struct ptdesc *ptdesc = pagetable_alloc(GFP_KERNEL, CRST_ALLOC_ORDER);
>  
> - if (!page)
> + if (!ptdesc)
>   return NULL;
> - arch_set_page_dat(page, CRST_ALLOC_ORDER);
> - return (unsigned long *) page_to_virt(page);
> + arch_set_page_dat(ptdesc_page(ptdesc), CRST_ALLOC_ORDER);
> + return (unsigned long *) ptdesc_to_virt(ptdesc);
>  }
>  
>  void crst_table_free(struct mm_struct *mm, unsigned long *table)
>  {
> - free_pages((unsigned long)table, CRST_ALLOC_ORDER);
> + pagetable_free(virt_to_ptdesc(table));
>  }
>  
>  static void __crst_table_upgrade(void *arg)
> @@ -140,21 +140,21 @@ static inline unsigned int atomic_xor_bits(atomic_t *v, 
> unsigned int bits)
>  
>  struct page *page_table_alloc_pgste(struct mm_struct *mm)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   u64 *table;
>  
> - page = alloc_page(GFP_KERNEL);
> - if (page) {
> - table = (u64 *)page_to_virt(page);
> + ptdesc = pagetable_alloc(GFP_KERNEL, 0);
> + if (ptdesc) {
> + table = (u64 *)ptdesc_to_virt(ptdesc);
>   memset64(table, _PAGE_INVALID, PTRS_PER_PTE);
>   memset64(table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
>   }
> - return page;
> + return ptdesc_page(ptdesc);
>  }
>  
>  void page_table_free_pgste(struct page *page)
>  {
> - __free_page(page);
> + pagetable_free(page_ptdesc(page));
>  }
>  
>  #endif /* CONFIG_PGSTE */
> @@ -230,7 +230,7 @@ void page_table_free_pgste(struct page *page)
>  unsigned long *page_table_alloc(struct mm_struct *mm)
>  {
>   unsigned long *table;
> - struct page *page;
> + struct ptdesc *ptdesc;
>   unsigned int mask, bit;
>  
>   /* Try to get a fragment of a 4K page as a 2K page table */
> @@ -238,9 +238,9 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   table = NULL;
>   spin_lock_bh(&mm->context.lock);
>   if (!list_empty(&mm->context.pgtable_list)) {
> - page = list_first_entry(&mm->context.pgtable_list,
> - struct page, lru);
> - mask = atomic_read(&page->pt_frag_refcount);
> + ptdesc = list_first_entry(&mm->context.pgtable_list,
> +

Re: [PATCH v4 16/34] s390: Convert various gmap functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:05PM -0700, Vishal Moola (Oracle) wrote:
> In order to split struct ptdesc from struct page, convert various
> functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert
> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.
> 
> Signed-off-by: Vishal Moola (Oracle) 

With folding

ptdesc->_pt_s390_gaddr = 0;

into pagetable_free()

Acked-by: Mike Rapoport (IBM) 


> ---
>  arch/s390/mm/gmap.c | 230 
>  1 file changed, 128 insertions(+), 102 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 81c683426b49..010e87df7299 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -34,7 +34,7 @@
>  static struct gmap *gmap_alloc(unsigned long limit)
>  {
>   struct gmap *gmap;
> - struct page *page;
> + struct ptdesc *ptdesc;
>   unsigned long *table;
>   unsigned long etype, atype;
>  
> @@ -67,12 +67,12 @@ static struct gmap *gmap_alloc(unsigned long limit)
>   spin_lock_init(&gmap->guest_table_lock);
>   spin_lock_init(&gmap->shadow_lock);
>   refcount_set(&gmap->ref_count, 1);
> - page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> - if (!page)
> + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> + if (!ptdesc)
>   goto out_free;
> - page->_pt_s390_gaddr = 0;
> - list_add(&page->lru, &gmap->crst_list);
> - table = page_to_virt(page);
> + ptdesc->_pt_s390_gaddr = 0;
> + list_add(&ptdesc->pt_list, &gmap->crst_list);
> + table = ptdesc_to_virt(ptdesc);
>   crst_table_init(table, etype);
>   gmap->table = table;
>   gmap->asce = atype | _ASCE_TABLE_LENGTH |
> @@ -181,25 +181,25 @@ static void gmap_rmap_radix_tree_free(struct 
> radix_tree_root *root)
>   */
>  static void gmap_free(struct gmap *gmap)
>  {
> - struct page *page, *next;
> + struct ptdesc *ptdesc, *next;
>  
>   /* Flush tlb of all gmaps (if not already done for shadows) */
>   if (!(gmap_is_shadow(gmap) && gmap->removed))
>   gmap_flush_tlb(gmap);
>   /* Free all segment & region tables. */
> - list_for_each_entry_safe(page, next, &gmap->crst_list, lru) {
> - page->_pt_s390_gaddr = 0;
> - __free_pages(page, CRST_ALLOC_ORDER);
> + list_for_each_entry_safe(ptdesc, next, &gmap->crst_list, pt_list) {
> + ptdesc->_pt_s390_gaddr = 0;
> + pagetable_free(ptdesc);
>   }
>   gmap_radix_tree_free(&gmap->guest_to_host);
>   gmap_radix_tree_free(&gmap->host_to_guest);
>  
>   /* Free additional data for a shadow gmap */
>   if (gmap_is_shadow(gmap)) {
> - /* Free all page tables. */
> - list_for_each_entry_safe(page, next, &gmap->pt_list, lru) {
> - page->_pt_s390_gaddr = 0;
> - page_table_free_pgste(page);
> + /* Free all ptdesc tables. */
> + list_for_each_entry_safe(ptdesc, next, &gmap->pt_list, pt_list) 
> {
> + ptdesc->_pt_s390_gaddr = 0;
> + page_table_free_pgste(ptdesc_page(ptdesc));
>   }
>   gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
>   /* Release reference to the parent */
> @@ -308,27 +308,27 @@ EXPORT_SYMBOL_GPL(gmap_get_enabled);
>  static int gmap_alloc_table(struct gmap *gmap, unsigned long *table,
>   unsigned long init, unsigned long gaddr)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   unsigned long *new;
>  
>   /* since we dont free the gmap table until gmap_free we can unlock */
> - page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> - if (!page)
> + ptdesc = pagetable_alloc(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
> + if (!ptdesc)
>   return -ENOMEM;
> - new = page_to_virt(page);
> + new = ptdesc_to_virt(ptdesc);
>   crst_table_init(new, init);
>   spin_lock(&gmap->guest_table_lock);
>   if (*table & _REGION_ENTRY_INVALID) {
> - list_add(&page->lru, &gmap->crst_list);
> + list_add(&ptdesc->pt_list, &gmap->crst_list);
>   *table = __pa(new) | _REGION_ENTRY_LENGTH |
>   (*table & _REGION_ENTRY_TYPE_MASK);
> - page->_pt_s390_gaddr = gaddr;
> - page = NULL;
> + ptdesc->_pt_s390_gaddr = gaddr;
> + ptdesc = NULL;
>   }
>   spin_unlock(&gmap->guest_table_lock);
> - if (page) {
> - page->_pt_s390_gaddr = 0;
> - __free_pages(page, CRST_ALLOC_ORDER);
> + if (ptdesc) {
> + ptdesc->_pt_s390_gaddr = 0;
> + pagetable_free(ptdesc);
>   }
>   return 0;
>  }
> @@ -341,15 +341,15 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned 
> long *table,
>   */
>  static unsigned long __gmap_segment_gaddr(unsigned long *entry)
>  {
> - struct page *page;
> + struct ptdesc *ptdesc;
>   

Re: [PATCH v4 15/34] x86: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:04PM -0700, Vishal Moola (Oracle) wrote:
> In order to split struct ptdesc from struct page, convert various
> functions to use ptdescs.
> 
> Some of the functions use the *get*page*() helper functions. Convert

Nit:   *get_free_page*()

> these to use pagetable_alloc() and ptdesc_address() instead to help
> standardize page tables further.

More importantly, get_free_pages() ensures a page won't be allocated from
HIGHMEM, and for 32-bit this is a must.
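
To make that concrete (an illustration only, not taken from the patch):
__get_free_page() hands back a kernel virtual address, so the backing page can
never come from HIGHMEM, whereas pagetable_alloc() returns a ptdesc whose page
stays in lowmem only because the gfp mask used here never includes
__GFP_HIGHMEM:

        /* always lowmem: a kernel virtual address is returned */
        pmd_t *pmd = (pmd_t *)__get_free_page(GFP_PGTABLE_USER);

        /* lowmem as long as the gfp mask lacks __GFP_HIGHMEM */
        struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_USER, 0);
        pmd_t *pmd2 = ptdesc ? ptdesc_address(ptdesc) : NULL;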
 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  arch/x86/mm/pgtable.c | 46 +--
>  1 file changed, 27 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 15a8009a4480..6da7fd5d4782 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -52,7 +52,7 @@ early_param("userpte", setup_userpte);
>  
>  void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
>  {
> - pgtable_pte_page_dtor(pte);
> + pagetable_pte_dtor(page_ptdesc(pte));
>   paravirt_release_pte(page_to_pfn(pte));
>   paravirt_tlb_remove_table(tlb, pte);
>  }
> @@ -60,7 +60,7 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page 
> *pte)
>  #if CONFIG_PGTABLE_LEVELS > 2
>  void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
>  {
> - struct page *page = virt_to_page(pmd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
>   paravirt_release_pmd(__pa(pmd) >> PAGE_SHIFT);
>   /*
>* NOTE! For PAE, any changes to the top page-directory-pointer-table
> @@ -69,8 +69,8 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
>  #ifdef CONFIG_X86_PAE
>   tlb->need_flush_all = 1;
>  #endif
> - pgtable_pmd_page_dtor(page);
> - paravirt_tlb_remove_table(tlb, page);
> + pagetable_pmd_dtor(ptdesc);
> + paravirt_tlb_remove_table(tlb, ptdesc_page(ptdesc));
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 3
> @@ -92,16 +92,16 @@ void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
>  
>  static inline void pgd_list_add(pgd_t *pgd)
>  {
> - struct page *page = virt_to_page(pgd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
>  
> - list_add(&page->lru, &pgd_list);
> + list_add(&ptdesc->pt_list, &pgd_list);
>  }
>  
>  static inline void pgd_list_del(pgd_t *pgd)
>  {
> - struct page *page = virt_to_page(pgd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pgd);
>  
> - list_del(&page->lru);
> + list_del(&ptdesc->pt_list);
>  }
>  
>  #define UNSHARED_PTRS_PER_PGD\
> @@ -112,12 +112,12 @@ static inline void pgd_list_del(pgd_t *pgd)
>  
>  static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
>  {
> - virt_to_page(pgd)->pt_mm = mm;
> + virt_to_ptdesc(pgd)->pt_mm = mm;
>  }
>  
>  struct mm_struct *pgd_page_get_mm(struct page *page)
>  {
> - return page->pt_mm;
> + return page_ptdesc(page)->pt_mm;
>  }
>  
>  static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
> @@ -213,11 +213,14 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, 
> pmd_t *pmd)
>  static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
>  {
>   int i;
> + struct ptdesc *ptdesc;
>  
>   for (i = 0; i < count; i++)
>   if (pmds[i]) {
> - pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
> - free_page((unsigned long)pmds[i]);
> + ptdesc = virt_to_ptdesc(pmds[i]);
> +
> + pagetable_pmd_dtor(ptdesc);
> + pagetable_free(ptdesc);
>   mm_dec_nr_pmds(mm);
>   }
>  }
> @@ -232,16 +235,21 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t 
> *pmds[], int count)
>   gfp &= ~__GFP_ACCOUNT;
>  
>   for (i = 0; i < count; i++) {
> - pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
> - if (!pmd)
> + pmd_t *pmd = NULL;
> + struct ptdesc *ptdesc = pagetable_alloc(gfp, 0);
> +
> + if (!ptdesc)
>   failed = true;
> - if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
> - free_page((unsigned long)pmd);
> - pmd = NULL;
> + if (ptdesc && !pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
> + ptdesc = NULL;
>   failed = true;
>   }
> - if (pmd)
> + if (ptdesc) {
>   mm_inc_nr_pmds(mm);
> + pmd = ptdesc_address(ptdesc);
> + }
> +
>   pmds[i] = pmd;
>   }
>  
> @@ -830,7 +838,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
>  
>   free_page((unsigned long)pmd_sv);
>  
> - pgtable_pmd_page_dtor(virt_to_page(pmd));
> + pagetable_pmd_dtor(virt_to_ptdesc(pmd));
>   free_page((unsigned long)pmd);
>  
>   return 1;
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 14/34] powerpc: Convert various functions to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:03PM -0700, Vishal Moola (Oracle) wrote:
> In order to split struct ptdesc from struct page, convert various
> functions to use ptdescs.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/powerpc/mm/book3s64/mmu_context.c | 10 +++---
>  arch/powerpc/mm/book3s64/pgtable.c | 32 +-
>  arch/powerpc/mm/pgtable-frag.c | 46 +-
>  3 files changed, 44 insertions(+), 44 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/mmu_context.c 
> b/arch/powerpc/mm/book3s64/mmu_context.c
> index c766e4c26e42..1715b07c630c 100644
> --- a/arch/powerpc/mm/book3s64/mmu_context.c
> +++ b/arch/powerpc/mm/book3s64/mmu_context.c
> @@ -246,15 +246,15 @@ static void destroy_contexts(mm_context_t *ctx)
>  static void pmd_frag_destroy(void *pmd_frag)
>  {
>   int count;
> - struct page *page;
> + struct ptdesc *ptdesc;
>  
> - page = virt_to_page(pmd_frag);
> + ptdesc = virt_to_ptdesc(pmd_frag);
>   /* drop all the pending references */
>   count = ((unsigned long)pmd_frag & ~PAGE_MASK) >> PMD_FRAG_SIZE_SHIFT;
>   /* We allow PTE_FRAG_NR fragments from a PTE page */
> - if (atomic_sub_and_test(PMD_FRAG_NR - count, &page->pt_frag_refcount)) {
> - pgtable_pmd_page_dtor(page);
> - __free_page(page);
> + if (atomic_sub_and_test(PMD_FRAG_NR - count, 
> &ptdesc->pt_frag_refcount)) {
> + pagetable_pmd_dtor(ptdesc);
> + pagetable_free(ptdesc);
>   }
>  }
>  
> diff --git a/arch/powerpc/mm/book3s64/pgtable.c 
> b/arch/powerpc/mm/book3s64/pgtable.c
> index 85c84e89e3ea..1212deeabe15 100644
> --- a/arch/powerpc/mm/book3s64/pgtable.c
> +++ b/arch/powerpc/mm/book3s64/pgtable.c
> @@ -306,22 +306,22 @@ static pmd_t *get_pmd_from_cache(struct mm_struct *mm)
>  static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
>  {
>   void *ret = NULL;
> - struct page *page;
> + struct ptdesc *ptdesc;
>   gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO;
>  
>   if (mm == &init_mm)
>   gfp &= ~__GFP_ACCOUNT;
> - page = alloc_page(gfp);
> - if (!page)
> + ptdesc = pagetable_alloc(gfp, 0);
> + if (!ptdesc)
>   return NULL;
> - if (!pgtable_pmd_page_ctor(page)) {
> - __free_pages(page, 0);
> + if (!pagetable_pmd_ctor(ptdesc)) {
> + pagetable_free(ptdesc);
>   return NULL;
>   }
>  
> - atomic_set(&page->pt_frag_refcount, 1);
> + atomic_set(&ptdesc->pt_frag_refcount, 1);
>  
> - ret = page_address(page);
> + ret = ptdesc_address(ptdesc);
>   /*
>* if we support only one fragment just return the
>* allocated page.
> @@ -331,12 +331,12 @@ static pmd_t *__alloc_for_pmdcache(struct mm_struct *mm)
>  
>   spin_lock(&mm->page_table_lock);
>   /*
> -  * If we find pgtable_page set, we return
> +  * If we find ptdesc_page set, we return
>* the allocated page with single fragment
>* count.
>*/
>   if (likely(!mm->context.pmd_frag)) {
> - atomic_set(&page->pt_frag_refcount, PMD_FRAG_NR);
> + atomic_set(&ptdesc->pt_frag_refcount, PMD_FRAG_NR);
>   mm->context.pmd_frag = ret + PMD_FRAG_SIZE;
>   }
>   spin_unlock(&mm->page_table_lock);
> @@ -357,15 +357,15 @@ pmd_t *pmd_fragment_alloc(struct mm_struct *mm, 
> unsigned long vmaddr)
>  
>  void pmd_fragment_free(unsigned long *pmd)
>  {
> - struct page *page = virt_to_page(pmd);
> + struct ptdesc *ptdesc = virt_to_ptdesc(pmd);
>  
> - if (PageReserved(page))
> - return free_reserved_page(page);
> + if (pagetable_is_reserved(ptdesc))
> + return free_reserved_ptdesc(ptdesc);
>  
> - BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
> - if (atomic_dec_and_test(&page->pt_frag_refcount)) {
> - pgtable_pmd_page_dtor(page);
> - __free_page(page);
> + BUG_ON(atomic_read(&ptdesc->pt_frag_refcount) <= 0);
> + if (atomic_dec_and_test(&ptdesc->pt_frag_refcount)) {
> + pagetable_pmd_dtor(ptdesc);
> + pagetable_free(ptdesc);
>   }
>  }
>  
> diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> index 20652daa1d7e..8961f1540209 100644
> --- a/arch/powerpc/mm/pgtable-frag.c
> +++ b/arch/powerpc/mm/pgtable-frag.c
> @@ -18,15 +18,15 @@
>  void pte_frag_destroy(void *pte_frag)
>  {
>   int count;
> - struct page *page;
> + struct ptdesc *ptdesc;
>  
> - page = virt_to_page(pte_frag);
> + ptdesc = virt_to_ptdesc(pte_frag);
>   /* drop all the pending references */
>   count = ((unsigned long)pte_frag & ~PAGE_MASK) >> PTE_FRAG_SIZE_SHIFT;
>   /* We allow PTE_FRAG_NR fragments from a PTE page */
> - if (atomic_sub_and_test(PTE_FRAG_NR - count, &page->pt_frag_refcount)) {
> - pgtable_pte_page_dtor(page);
> - __free_page(page);
> + if (atomic_sub_and_test(PTE_FRAG_NR - count, 
> &ptdesc->pt_frag_refcount)) {
> +

Re: [PATCH v4 13/34] mm: Create ptdesc equivalents for pgtable_{pte,pmd}_page_{ctor,dtor}

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:02PM -0700, Vishal Moola (Oracle) wrote:
> Creates pagetable_pte_ctor(), pagetable_pmd_ctor(), pagetable_pte_dtor(),
> and pagetable_pmd_dtor() and make the original pgtable
> constructor/destructors wrappers.

Nit: either "creates ... makes" or "create ... make"
I like the second form more.
 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 56 ++
>  1 file changed, 42 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a1af7983e1bd..dc211c43610b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2886,20 +2886,34 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) 
> { return true; }
>  static inline void ptlock_free(struct ptdesc *ptdesc) {}
>  #endif /* USE_SPLIT_PTE_PTLOCKS */
>  
> -static inline bool pgtable_pte_page_ctor(struct page *page)
> +static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
>  {
> - if (!ptlock_init(page_ptdesc(page)))
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + if (!ptlock_init(ptdesc))
>   return false;
> - __SetPageTable(page);
> - inc_lruvec_page_state(page, NR_PAGETABLE);
> + __folio_set_table(folio);

This comment is more to patch 1 ("mm: Add PAGE_TYPE_OP folio functions")

It would be better to have _pgtable here, as "table" does not necessarily
mean page table.
With PageType SetPageTable was fine, but with folio I think it should be
more explicit.

I'd add a third parameter to PAGE_TYPE_OPS for that.
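
Something along these lines, as a rough sketch of that suggestion (not an
existing patch) -- the extra name is only used for the folio helpers:

#define PAGE_TYPE_OPS(uname, lname, fname)				\
static __always_inline int Page##uname(const struct page *page)	\
{									\
	return PageType(page, PG_##lname);				\
}									\
static __always_inline int folio_test_##fname(const struct folio *folio)\
{									\
	return folio_test_type(folio, PG_##lname);			\
}									\
static __always_inline void __folio_set_##fname(struct folio *folio)	\
{									\
	VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio);		\
	folio->page.page_type &= ~PG_##lname;				\
}
/* ... and likewise for __SetPage##uname, __ClearPage##uname and
 * __folio_clear_##fname ... */

The page-table type would then be instantiated as something like
PAGE_TYPE_OPS(Table, table, pgtable), giving folio_test_pgtable(),
__folio_set_pgtable() and __folio_clear_pgtable().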

> + lruvec_stat_add_folio(folio, NR_PAGETABLE);
>   return true;
>  }
>  
> +static inline bool pgtable_pte_page_ctor(struct page *page)
> +{
> + return pagetable_pte_ctor(page_ptdesc(page));
> +}
> +
> +static inline void pagetable_pte_dtor(struct ptdesc *ptdesc)
> +{
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + ptlock_free(ptdesc);
> + __folio_clear_table(folio);
> + lruvec_stat_sub_folio(folio, NR_PAGETABLE);
> +}
> +
>  static inline void pgtable_pte_page_dtor(struct page *page)
>  {
> - ptlock_free(page_ptdesc(page));
> - __ClearPageTable(page);
> - dec_lruvec_page_state(page, NR_PAGETABLE);
> + pagetable_pte_dtor(page_ptdesc(page));
>  }
>  
>  #define pte_offset_map_lock(mm, pmd, address, ptlp)  \
> @@ -2981,20 +2995,34 @@ static inline spinlock_t *pmd_lock(struct mm_struct 
> *mm, pmd_t *pmd)
>   return ptl;
>  }
>  
> -static inline bool pgtable_pmd_page_ctor(struct page *page)
> +static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
>  {
> - if (!pmd_ptlock_init(page_ptdesc(page)))
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + if (!pmd_ptlock_init(ptdesc))
>   return false;
> - __SetPageTable(page);
> - inc_lruvec_page_state(page, NR_PAGETABLE);
> + __folio_set_table(folio);
> + lruvec_stat_add_folio(folio, NR_PAGETABLE);
>   return true;
>  }
>  
> +static inline bool pgtable_pmd_page_ctor(struct page *page)
> +{
> + return pagetable_pmd_ctor(page_ptdesc(page));
> +}
> +
> +static inline void pagetable_pmd_dtor(struct ptdesc *ptdesc)
> +{
> + struct folio *folio = ptdesc_folio(ptdesc);
> +
> + pmd_ptlock_free(ptdesc);
> + __folio_clear_table(folio);
> + lruvec_stat_sub_folio(folio, NR_PAGETABLE);
> +}
> +
>  static inline void pgtable_pmd_page_dtor(struct page *page)
>  {
> - pmd_ptlock_free(page_ptdesc(page));
> - __ClearPageTable(page);
> - dec_lruvec_page_state(page, NR_PAGETABLE);
> + pagetable_pmd_dtor(page_ptdesc(page));
>  }
>  
>  /*
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 12/34] mm: Convert ptlock_free() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:01PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 10 +-
>  mm/memory.c|  4 ++--
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 3b54bb4c9753..a1af7983e1bd 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2826,7 +2826,7 @@ static inline void pagetable_clear(void *x)
>  #if ALLOC_SPLIT_PTLOCKS
>  void __init ptlock_cache_init(void);
>  bool ptlock_alloc(struct ptdesc *ptdesc);
> -extern void ptlock_free(struct page *page);
> +void ptlock_free(struct ptdesc *ptdesc);
>  
>  static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  {
> @@ -2842,7 +2842,7 @@ static inline bool ptlock_alloc(struct ptdesc *ptdesc)
>   return true;
>  }
>  
> -static inline void ptlock_free(struct page *page)
> +static inline void ptlock_free(struct ptdesc *ptdesc)
>  {
>  }
>  
> @@ -2883,7 +2883,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>  }
>  static inline void ptlock_cache_init(void) {}
>  static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
> -static inline void ptlock_free(struct page *page) {}
> +static inline void ptlock_free(struct ptdesc *ptdesc) {}
>  #endif /* USE_SPLIT_PTE_PTLOCKS */
>  
>  static inline bool pgtable_pte_page_ctor(struct page *page)
> @@ -2897,7 +2897,7 @@ static inline bool pgtable_pte_page_ctor(struct page 
> *page)
>  
>  static inline void pgtable_pte_page_dtor(struct page *page)
>  {
> - ptlock_free(page);
> + ptlock_free(page_ptdesc(page));
>   __ClearPageTable(page);
>   dec_lruvec_page_state(page, NR_PAGETABLE);
>  }
> @@ -2955,7 +2955,7 @@ static inline void pmd_ptlock_free(struct ptdesc 
> *ptdesc)
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
>  #endif
> - ptlock_free(ptdesc_page(ptdesc));
> + ptlock_free(ptdesc);
>  }
>  
>  #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
> diff --git a/mm/memory.c b/mm/memory.c
> index ba9579117686..d4d2ea5cf0fd 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5945,8 +5945,8 @@ bool ptlock_alloc(struct ptdesc *ptdesc)
>   return true;
>  }
>  
> -void ptlock_free(struct page *page)
> +void ptlock_free(struct ptdesc *ptdesc)
>  {
> - kmem_cache_free(page_ptl_cachep, page->ptl);
> + kmem_cache_free(page_ptl_cachep, ptdesc->ptl);
>  }
>  #endif
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 11/34] mm: Convert pmd_ptlock_free() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:04:00PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f48e626d9c98..3b54bb4c9753 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2950,12 +2950,12 @@ static inline bool pmd_ptlock_init(struct ptdesc 
> *ptdesc)
>   return ptlock_init(ptdesc);
>  }
>  
> -static inline void pmd_ptlock_free(struct page *page)
> +static inline void pmd_ptlock_free(struct ptdesc *ptdesc)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - VM_BUG_ON_PAGE(page->pmd_huge_pte, page);
> + VM_BUG_ON_PAGE(ptdesc->pmd_huge_pte, ptdesc_page(ptdesc));
>  #endif
> - ptlock_free(page);
> + ptlock_free(ptdesc_page(ptdesc));
>  }
>  
>  #define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
> @@ -2968,7 +2968,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>  }
>  
>  static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
> -static inline void pmd_ptlock_free(struct page *page) {}
> +static inline void pmd_ptlock_free(struct ptdesc *ptdesc) {}
>  
>  #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
>  
> @@ -2992,7 +2992,7 @@ static inline bool pgtable_pmd_page_ctor(struct page 
> *page)
>  
>  static inline void pgtable_pmd_page_dtor(struct page *page)
>  {
> - pmd_ptlock_free(page);
> + pmd_ptlock_free(page_ptdesc(page));
>   __ClearPageTable(page);
>   dec_lruvec_page_state(page, NR_PAGETABLE);
>  }
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 10/34] mm: Convert ptlock_init() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:59PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index daecf1db6cf1..f48e626d9c98 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2857,7 +2857,7 @@ static inline spinlock_t *pte_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>   return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
>  }
>  
> -static inline bool ptlock_init(struct page *page)
> +static inline bool ptlock_init(struct ptdesc *ptdesc)
>  {
>   /*
>* prep_new_page() initialize page->private (and therefore page->ptl)
> @@ -2866,10 +2866,10 @@ static inline bool ptlock_init(struct page *page)
>* It can happen if arch try to use slab for page table allocation:
>* slab code uses page->slab_cache, which share storage with page->ptl.
>*/
> - VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
> - if (!ptlock_alloc(page_ptdesc(page)))
> + VM_BUG_ON_PAGE(*(unsigned long *)&ptdesc->ptl, ptdesc_page(ptdesc));
> + if (!ptlock_alloc(ptdesc))
>   return false;
> - spin_lock_init(ptlock_ptr(page_ptdesc(page)));
> + spin_lock_init(ptlock_ptr(ptdesc));
>   return true;
>  }
>  
> @@ -2882,13 +2882,13 @@ static inline spinlock_t *pte_lockptr(struct 
> mm_struct *mm, pmd_t *pmd)
>   return &mm->page_table_lock;
>  }
>  static inline void ptlock_cache_init(void) {}
> -static inline bool ptlock_init(struct page *page) { return true; }
> +static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
>  static inline void ptlock_free(struct page *page) {}
>  #endif /* USE_SPLIT_PTE_PTLOCKS */
>  
>  static inline bool pgtable_pte_page_ctor(struct page *page)
>  {
> - if (!ptlock_init(page))
> + if (!ptlock_init(page_ptdesc(page)))
>   return false;
>   __SetPageTable(page);
>   inc_lruvec_page_state(page, NR_PAGETABLE);
> @@ -2947,7 +2947,7 @@ static inline bool pmd_ptlock_init(struct ptdesc 
> *ptdesc)
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   ptdesc->pmd_huge_pte = NULL;
>  #endif
> - return ptlock_init(ptdesc_page(ptdesc));
> + return ptlock_init(ptdesc);
>  }
>  
>  static inline void pmd_ptlock_free(struct page *page)
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 09/34] mm: Convert pmd_ptlock_init() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:58PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bb934d51390f..daecf1db6cf1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2942,12 +2942,12 @@ static inline spinlock_t *pmd_lockptr(struct 
> mm_struct *mm, pmd_t *pmd)
>   return ptlock_ptr(pmd_ptdesc(pmd));
>  }
>  
> -static inline bool pmd_ptlock_init(struct page *page)
> +static inline bool pmd_ptlock_init(struct ptdesc *ptdesc)
>  {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> - page->pmd_huge_pte = NULL;
> + ptdesc->pmd_huge_pte = NULL;
>  #endif
> - return ptlock_init(page);
> + return ptlock_init(ptdesc_page(ptdesc));
>  }
>  
>  static inline void pmd_ptlock_free(struct page *page)
> @@ -2967,7 +2967,7 @@ static inline spinlock_t *pmd_lockptr(struct mm_struct 
> *mm, pmd_t *pmd)
>   return &mm->page_table_lock;
>  }
>  
> -static inline bool pmd_ptlock_init(struct page *page) { return true; }
> +static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
>  static inline void pmd_ptlock_free(struct page *page) {}
>  
>  #define pmd_huge_pte(mm, pmd) ((mm)->pmd_huge_pte)
> @@ -2983,7 +2983,7 @@ static inline spinlock_t *pmd_lock(struct mm_struct 
> *mm, pmd_t *pmd)
>  
>  static inline bool pgtable_pmd_page_ctor(struct page *page)
>  {
> - if (!pmd_ptlock_init(page))
> + if (!pmd_ptlock_init(page_ptdesc(page)))
>   return false;
>   __SetPageTable(page);
>   inc_lruvec_page_state(page, NR_PAGETABLE);
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 08/34] mm: Convert ptlock_ptr() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:57PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/x86/xen/mmu_pv.c |  2 +-
>  include/linux/mm.h| 14 +++---
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index b3b8d289b9ab..f469862e3ef4 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -651,7 +651,7 @@ static spinlock_t *xen_pte_lock(struct page *page, struct 
> mm_struct *mm)
>   spinlock_t *ptl = NULL;
>  
>  #if USE_SPLIT_PTE_PTLOCKS
> - ptl = ptlock_ptr(page);
> + ptl = ptlock_ptr(page_ptdesc(page));
>   spin_lock_nest_lock(ptl, &mm->page_table_lock);
>  #endif
>  
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e6f1be2a405e..bb934d51390f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2828,9 +2828,9 @@ void __init ptlock_cache_init(void);
>  bool ptlock_alloc(struct ptdesc *ptdesc);
>  extern void ptlock_free(struct page *page);
>  
> -static inline spinlock_t *ptlock_ptr(struct page *page)
> +static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  {
> - return page->ptl;
> + return ptdesc->ptl;
>  }
>  #else /* ALLOC_SPLIT_PTLOCKS */
>  static inline void ptlock_cache_init(void)
> @@ -2846,15 +2846,15 @@ static inline void ptlock_free(struct page *page)
>  {
>  }
>  
> -static inline spinlock_t *ptlock_ptr(struct page *page)
> +static inline spinlock_t *ptlock_ptr(struct ptdesc *ptdesc)
>  {
> - return &page->ptl;
> + return &ptdesc->ptl;
>  }
>  #endif /* ALLOC_SPLIT_PTLOCKS */
>  
>  static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
> - return ptlock_ptr(pmd_page(*pmd));
> + return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
>  }
>  
>  static inline bool ptlock_init(struct page *page)
> @@ -2869,7 +2869,7 @@ static inline bool ptlock_init(struct page *page)
>   VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
>   if (!ptlock_alloc(page_ptdesc(page)))
>   return false;
> - spin_lock_init(ptlock_ptr(page));
> + spin_lock_init(ptlock_ptr(page_ptdesc(page)));
>   return true;
>  }
>  
> @@ -2939,7 +2939,7 @@ static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
>  
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
> - return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
> + return ptlock_ptr(pmd_ptdesc(pmd));
>  }
>  
>  static inline bool pmd_ptlock_init(struct page *page)
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 07/34] mm: Convert ptlock_alloc() to use ptdescs

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:56PM -0700, Vishal Moola (Oracle) wrote:
> This removes some direct accesses to struct page, working towards
> splitting out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 6 +++---
>  mm/memory.c| 4 ++--
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 088b7664f897..e6f1be2a405e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2825,7 +2825,7 @@ static inline void pagetable_clear(void *x)
>  #if USE_SPLIT_PTE_PTLOCKS
>  #if ALLOC_SPLIT_PTLOCKS
>  void __init ptlock_cache_init(void);
> -extern bool ptlock_alloc(struct page *page);
> +bool ptlock_alloc(struct ptdesc *ptdesc);
>  extern void ptlock_free(struct page *page);
>  
>  static inline spinlock_t *ptlock_ptr(struct page *page)
> @@ -2837,7 +2837,7 @@ static inline void ptlock_cache_init(void)
>  {
>  }
>  
> -static inline bool ptlock_alloc(struct page *page)
> +static inline bool ptlock_alloc(struct ptdesc *ptdesc)
>  {
>   return true;
>  }
> @@ -2867,7 +2867,7 @@ static inline bool ptlock_init(struct page *page)
>* slab code uses page->slab_cache, which share storage with page->ptl.
>*/
>   VM_BUG_ON_PAGE(*(unsigned long *)&page->ptl, page);
> - if (!ptlock_alloc(page))
> + if (!ptlock_alloc(page_ptdesc(page)))
>   return false;
>   spin_lock_init(ptlock_ptr(page));
>   return true;
> diff --git a/mm/memory.c b/mm/memory.c
> index 80ce9dda2779..ba9579117686 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5934,14 +5934,14 @@ void __init ptlock_cache_init(void)
>   SLAB_PANIC, NULL);
>  }
>  
> -bool ptlock_alloc(struct page *page)
> +bool ptlock_alloc(struct ptdesc *ptdesc)
>  {
>   spinlock_t *ptl;
>  
>   ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
>   if (!ptl)
>   return false;
> - page->ptl = ptl;
> + ptdesc->ptl = ptl;
>   return true;
>  }
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 06/34] mm: Convert pmd_pgtable_page() to pmd_ptdesc()

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:55PM -0700, Vishal Moola (Oracle) wrote:
> Converts pmd_pgtable_page() to pmd_ptdesc() and all its callers. This
> removes some direct accesses to struct page, working towards splitting
> out struct ptdesc from struct page.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/mm.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f184f1eba85d..088b7664f897 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2931,15 +2931,15 @@ static inline void pgtable_pte_page_dtor(struct page 
> *page)
>  
>  #if USE_SPLIT_PMD_PTLOCKS
>  
> -static inline struct page *pmd_pgtable_page(pmd_t *pmd)
> +static inline struct ptdesc *pmd_ptdesc(pmd_t *pmd)
>  {
>   unsigned long mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
> - return virt_to_page((void *)((unsigned long) pmd & mask));
> + return virt_to_ptdesc((void *)((unsigned long) pmd & mask));
>  }
>  
>  static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
>  {
> - return ptlock_ptr(pmd_pgtable_page(pmd));
> + return ptlock_ptr(ptdesc_page(pmd_ptdesc(pmd)));
>  }
>  
>  static inline bool pmd_ptlock_init(struct page *page)
> @@ -2958,7 +2958,7 @@ static inline void pmd_ptlock_free(struct page *page)
>   ptlock_free(page);
>  }
>  
> -#define pmd_huge_pte(mm, pmd) (pmd_pgtable_page(pmd)->pmd_huge_pte)
> +#define pmd_huge_pte(mm, pmd) (pmd_ptdesc(pmd)->pmd_huge_pte)
>  
>  #else
>  
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 05/34] mm: add utility functions for ptdesc

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:54PM -0700, Vishal Moola (Oracle) wrote:
> Introduce utility functions setting the foundation for ptdescs. These
> will also assist in the splitting out of ptdesc from struct page.
> 
> Functions that focus on the descriptor are prefixed with ptdesc_* while
> functions that focus on the pagetable are prefixed with pagetable_*.
> 
> pagetable_alloc() is defined to allocate new ptdesc pages as compound
> pages. This is to standardize ptdescs by allowing for one allocation
> and one free function, in contrast to 2 allocation and 2 free functions.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  include/asm-generic/tlb.h | 11 +++
>  include/linux/mm.h| 61 +++
>  include/linux/pgtable.h   | 12 
>  3 files changed, 84 insertions(+)
> 
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index b46617207c93..6bade9e0e799 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -481,6 +481,17 @@ static inline void tlb_remove_page(struct mmu_gather 
> *tlb, struct page *page)
>   return tlb_remove_page_size(tlb, page, PAGE_SIZE);
>  }
>  
> +static inline void tlb_remove_ptdesc(struct mmu_gather *tlb, void *pt)
> +{
> + tlb_remove_table(tlb, pt);
> +}
> +
> +/* Like tlb_remove_ptdesc, but for page-like page directories. */
> +static inline void tlb_remove_page_ptdesc(struct mmu_gather *tlb, struct 
> ptdesc *pt)
> +{
> + tlb_remove_page(tlb, ptdesc_page(pt));
> +}
> +
>  static inline void tlb_change_page_size(struct mmu_gather *tlb,
>unsigned int page_size)
>  {
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0db09639dd2d..f184f1eba85d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2766,6 +2766,62 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, 
> pud_t *pud, unsigned long a
>  }
>  #endif /* CONFIG_MMU */
>  
> +static inline struct ptdesc *virt_to_ptdesc(const void *x)
> +{
> + return page_ptdesc(virt_to_page(x));
> +}
> +
> +static inline void *ptdesc_to_virt(const struct ptdesc *pt)
> +{
> + return page_to_virt(ptdesc_page(pt));
> +}
> +
> +static inline void *ptdesc_address(const struct ptdesc *pt)
> +{
> + return folio_address(ptdesc_folio(pt));
> +}
> +
> +static inline bool pagetable_is_reserved(struct ptdesc *pt)
> +{
> + return folio_test_reserved(ptdesc_folio(pt));
> +}
> +
> +/**
> + * pagetable_alloc - Allocate pagetables
> + * @gfp:GFP flags
> + * @order:  desired pagetable order
> + *
> + * pagetable_alloc allocates a page table descriptor as well as all pages
> + * described by it.

I think the order should be switched here to emphasize that primarily this
method allocates memory for page tables. How about

 pagetable_alloc allocates memory for the page tables as well as a page
 table descriptor that describes the allocated memory
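
Dropped into the quoted kernel-doc, that would read roughly:

/**
 * pagetable_alloc - Allocate pagetables
 * @gfp:    GFP flags
 * @order:  desired pagetable order
 *
 * pagetable_alloc allocates memory for the page tables as well as a page
 * table descriptor that describes the allocated memory.
 *
 * Return: The ptdesc describing the allocated page tables.
 */

with the pagetable_free() comment below reworded the same way.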

> + *
> + * Return: The ptdesc describing the allocated page tables.
> + */
> +static inline struct ptdesc *pagetable_alloc(gfp_t gfp, unsigned int order)
> +{
> + struct page *page = alloc_pages(gfp | __GFP_COMP, order);
> +
> + return page_ptdesc(page);
> +}
> +
> +/**
> + * pagetable_free - Free pagetables
> + * @pt:  The page table descriptor
> + *
> + * pagetable_free frees a page table descriptor as well as all page
> + * tables described by said ptdesc.

Similarly here.

> + */
> +static inline void pagetable_free(struct ptdesc *pt)
> +{
> + struct page *page = ptdesc_page(pt);
> +
> + __free_pages(page, compound_order(page));
> +}
> +
> +static inline void pagetable_clear(void *x)
> +{
> + clear_page(x);
> +}
> +
>  #if USE_SPLIT_PTE_PTLOCKS
>  #if ALLOC_SPLIT_PTLOCKS
>  void __init ptlock_cache_init(void);
> @@ -2992,6 +3048,11 @@ static inline void mark_page_reserved(struct page 
> *page)
>   adjust_managed_page_count(page, -1);
>  }
>  
> +static inline void free_reserved_ptdesc(struct ptdesc *pt)
> +{
> + free_reserved_page(ptdesc_page(pt));
> +}
> +
>  /*
>   * Default method to free all the __init memory into the buddy system.
>   * The freed pages will be poisoned with pattern "poison" if it's within
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 330de96ebfd6..c405f74d3875 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1026,6 +1026,18 @@ TABLE_MATCH(ptl, ptl);
>  #undef TABLE_MATCH
>  static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
>  
> +#define ptdesc_page(pt)  (_Generic((pt), 
> \
> + const struct ptdesc *:  (const struct page *)(pt),  \
> + struct ptdesc *:(struct page *)(pt)))
> +
> +#define ptdesc_folio(pt) (_Generic((pt), \
> + const struct ptdesc *:  (const struct folio *)(pt), \
> + struct ptdesc *:(struct folio *)(pt)))
> +
> +#define page_ptdesc(p)  

Re: [PATCH 2/7] watchdog/hardlockup: Make the config checks more straightforward

2023-06-14 Thread Doug Anderson
Hi,

On Wed, Jun 14, 2023 at 3:29 AM Petr Mladek  wrote:
>
> It seems that we have entered into a bike shedding mode.
> The following questions come to my mind:
>
>1. Does this patchset improve the current state?
>
>2. Maybe, it is not black & white. Is it possible to summarize
>   what exactly got better and what got worse?
>
> Maybe, there is no need to do bike-shedding about every step
> if the final result is reasonable and the steps are not
> completely wrong.
>
> I just followed my intuition and tried to do some changes step
> by step. I got lost many times so maybe the steps are not
> ideal. Anyway, the steps helped me to understand the logic
> and stay reasonably confident that they did not change
> the behavior.
>
> I could rework the patchset. But I first need to know what
> exactly is bad in the result. And eventually if there is more
> logical way how to end there.

Sure. I still feel like the end result of the CONFIG options after
your whole patchset is easier to understand / cleaner by adjusting the
dependencies as I have suggested. That being said, I agree that it is
the type of thing that can be more a matter of personal preference. I
do agree that, even if you don't take my suggestion of adjusting the
dependencies, the end result of your patchset still makes things
better than they were.

...so if you really feel strongly that things are more understandable
with the dependencies specified as you have, I won't stand in the way.
I still think you need a v2, though, just to address other nits.

-Doug


Re: [PATCH v4 04/34] pgtable: Create struct ptdesc

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:53PM -0700, Vishal Moola (Oracle) wrote:
> Currently, page table information is stored within struct page. As part
> of simplifying struct page, create struct ptdesc for page table
> information.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/pgtable.h | 51 +
>  1 file changed, 51 insertions(+)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index c5a51481bbb9..330de96ebfd6 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -975,6 +975,57 @@ static inline void ptep_modify_prot_commit(struct 
> vm_area_struct *vma,
>  #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
>  #endif /* CONFIG_MMU */
>  
> +
> +/**
> + * struct ptdesc - Memory descriptor for page tables.
> + * @__page_flags: Same as page flags. Unused for page tables.
> + * @pt_list: List of used page tables. Used for s390 and x86.
> + * @_pt_pad_1: Padding that aliases with page's compound head.
> + * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs.
> + * @_pt_s390_gaddr: Aliases with page's mapping. Used for s390 gmap only.
> + * @pt_mm: Used for x86 pgds.
> + * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 
> only.
> + * @ptl: Lock for the page table.

Do you mind aligning the descriptions by @pt_frag_refcount? I think it'll
be more readable.
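
For illustration, aligned on the longest field name the block would look like:

 * @__page_flags:     Same as page flags. Unused for page tables.
 * @pt_list:          List of used page tables. Used for s390 and x86.
 * @_pt_pad_1:        Padding that aliases with page's compound head.
 * @pmd_huge_pte:     Protected by ptdesc->ptl, used for THPs.
 * @_pt_s390_gaddr:   Aliases with page's mapping. Used for s390 gmap only.
 * @pt_mm:            Used for x86 pgds.
 * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 only.
 * @ptl:              Lock for the page table.
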

> + *
> + * This struct overlays struct page for now. Do not modify without a good
> + * understanding of the issues.
> + */
> +struct ptdesc {
> + unsigned long __page_flags;
> +
> + union {
> + struct list_head pt_list;
> + struct {
> + unsigned long _pt_pad_1;
> + pgtable_t pmd_huge_pte;
> + };
> + };
> + unsigned long _pt_s390_gaddr;
> +
> + union {
> + struct mm_struct *pt_mm;
> + atomic_t pt_frag_refcount;
> + };
> +
> +#if ALLOC_SPLIT_PTLOCKS
> + spinlock_t *ptl;
> +#else
> + spinlock_t ptl;
> +#endif
> +};
> +
> +#define TABLE_MATCH(pg, pt)  \
> + static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))
> +TABLE_MATCH(flags, __page_flags);
> +TABLE_MATCH(compound_head, pt_list);
> +TABLE_MATCH(compound_head, _pt_pad_1);
> +TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
> +TABLE_MATCH(mapping, _pt_s390_gaddr);
> +TABLE_MATCH(pt_mm, pt_mm);
> +TABLE_MATCH(ptl, ptl);
> +#undef TABLE_MATCH
> +static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
> +
>  /*
>   * No-op macros that just return the current protection value. Defined here
>   * because these macros can be used even if CONFIG_MMU is not defined.
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.


Re: [PATCH v4 03/34] s390: Use pt_frag_refcount for pagetables

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:52PM -0700, Vishal Moola (Oracle) wrote:
> s390 currently uses _refcount to identify fragmented page tables.
> The page table struct already has a member pt_frag_refcount used by
> powerpc, so have s390 use that instead of the _refcount field as well.
> This improves the safety for _refcount and the page table tracking.
> 
> This also allows us to simplify the tracking since we can once again use
> the lower byte of pt_frag_refcount instead of the upper byte of _refcount.
> 
> Signed-off-by: Vishal Moola (Oracle) 

One nit below, otherwise

Acked-by: Mike Rapoport (IBM) 

> ---
>  arch/s390/mm/pgalloc.c | 38 +++---
>  1 file changed, 15 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
> index 66ab68db9842..6b99932abc66 100644
> --- a/arch/s390/mm/pgalloc.c
> +++ b/arch/s390/mm/pgalloc.c
> @@ -182,20 +182,17 @@ void page_table_free_pgste(struct page *page)
>   * As follows from the above, no unallocated or fully allocated parent
>   * pages are contained in mm_context_t::pgtable_list.
>   *
> - * The upper byte (bits 24-31) of the parent page _refcount is used
> + * The lower byte (bits 0-7) of the parent page pt_frag_refcount is used
>   * for tracking contained 2KB-pgtables and has the following format:
>   *
>   *   PP  AA
> - * 01234567upper byte (bits 24-31) of struct page::_refcount
> + * 01234567upper byte (bits 0-7) of struct page::pt_frag_refcount

Nit:  lower

>   *   ||  ||
>   *   ||  |+--- upper 2KB-pgtable is allocated
>   *   ||  + lower 2KB-pgtable is allocated
>   *   |+--- upper 2KB-pgtable is pending for removal
>   *   + lower 2KB-pgtable is pending for removal
>   *
> - * (See commit 620b4e903179 ("s390: use _refcount for pgtables") on why
> - * using _refcount is possible).
> - *
>   * When 2KB-pgtable is allocated the corresponding AA bit is set to 1.
>   * The parent page is either:
>   *   - added to mm_context_t::pgtable_list in case the second half of the
> @@ -243,11 +240,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   if (!list_empty(>context.pgtable_list)) {
>   page = list_first_entry(>context.pgtable_list,
>   struct page, lru);
> - mask = atomic_read(&page->_refcount) >> 24;
> + mask = atomic_read(&page->pt_frag_refcount);
>   /*
>* The pending removal bits must also be checked.
>* Failure to do so might lead to an impossible
> -  * value of (i.e 0x13 or 0x23) written to _refcount.
> +  * value of (i.e 0x13 or 0x23) written to
> +  * pt_frag_refcount.
>* Such values violate the assumption that pending and
>* allocation bits are mutually exclusive, and the rest
>* of the code unrails as result. That could lead to
> @@ -259,8 +257,8 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   bit = mask & 1; /* =1 -> second 2K */
>   if (bit)
>   table += PTRS_PER_PTE;
> - atomic_xor_bits(&page->_refcount,
> - 0x01U << (bit + 24));
> + atomic_xor_bits(&page->pt_frag_refcount,
> + 0x01U << bit);
>   list_del(&page->lru);
>   }
>   }
> @@ -281,12 +279,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
>   table = (unsigned long *) page_to_virt(page);
>   if (mm_alloc_pgste(mm)) {
>   /* Return 4K page table with PGSTEs */
> - atomic_xor_bits(&page->_refcount, 0x03U << 24);
> + atomic_xor_bits(&page->pt_frag_refcount, 0x03U);
>   memset64((u64 *)table, _PAGE_INVALID, PTRS_PER_PTE);
>   memset64((u64 *)table + PTRS_PER_PTE, 0, PTRS_PER_PTE);
>   } else {
>   /* Return the first 2K fragment of the page */
> - atomic_xor_bits(&page->_refcount, 0x01U << 24);
> + atomic_xor_bits(&page->pt_frag_refcount, 0x01U);
>   memset64((u64 *)table, _PAGE_INVALID, 2 * PTRS_PER_PTE);
>   spin_lock_bh(&mm->context.lock);
>   list_add(&page->lru, &mm->context.pgtable_list);
> @@ -323,22 +321,19 @@ void page_table_free(struct mm_struct *mm, unsigned 
> long *table)
>* will happen outside of the critical section from this
>* function or from __tlb_remove_table()
>*/
> - mask = atomic_xor_bits(&page->_refcount, 0x11U << (bit + 24));
> - mask >>= 24;
> + mask = atomic_xor_bits(&page->pt_frag_refcount, 0x11U << bit);
>   if (mask & 0x03U)
>

Re: [PATCH v4 02/34] s390: Use _pt_s390_gaddr for gmap address tracking

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:51PM -0700, Vishal Moola (Oracle) wrote:
> s390 uses page->index to keep track of page tables for the guest address
> space. In an attempt to consolidate the usage of page fields in s390,
> replace _pt_pad_2 with _pt_s390_gaddr to replace page->index in gmap.
> 
> This will help with the splitting of struct ptdesc from struct page, as
> well as allow s390 to use _pt_frag_refcount for fragmented page table
> tracking.
> 
> Since page->_pt_s390_gaddr aliases with mapping, ensure its set to NULL
> before freeing the pages as well.

I'm looking at the final result and unless I've missed something, setting
of _pt_s390_gaddr to 0 is always followed by pagetable_free().
Can't we have pagetable_free() take care of zeroing _pt_s390_gaddr?
I think patch 16 ("s390: Convert various gmap functions to use ptdescs")
would be the right place for that.
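
A minimal sketch of that folding (assuming the field can safely be cleared for
every pagetable_free() caller, since it aliases page->mapping and has to be
NULL by the time the page is freed anyway):

static inline void pagetable_free(struct ptdesc *pt)
{
	struct page *page = ptdesc_page(pt);

	pt->_pt_s390_gaddr = 0;
	__free_pages(page, compound_order(page));
}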

Otherwise:

Acked-by: Mike Rapoport (IBM) 
 
> This also reverts commit 7e25de77bc5ea ("s390/mm: use pmd_pgtable_page()
> helper in __gmap_segment_gaddr()") which had s390 use
> pmd_pgtable_page() to get a gmap page table, as pmd_pgtable_page()
> should be used for more generic process page tables.
> 
> Signed-off-by: Vishal Moola (Oracle) 
> ---
>  arch/s390/mm/gmap.c  | 56 +++-
>  include/linux/mm_types.h |  2 +-
>  2 files changed, 39 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index dc90d1eb0d55..81c683426b49 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -70,7 +70,7 @@ static struct gmap *gmap_alloc(unsigned long limit)
>   page = alloc_pages(GFP_KERNEL_ACCOUNT, CRST_ALLOC_ORDER);
>   if (!page)
>   goto out_free;
> - page->index = 0;
> + page->_pt_s390_gaddr = 0;
>   list_add(&page->lru, &gmap->crst_list);
>   table = page_to_virt(page);
>   crst_table_init(table, etype);
> @@ -187,16 +187,20 @@ static void gmap_free(struct gmap *gmap)
>   if (!(gmap_is_shadow(gmap) && gmap->removed))
>   gmap_flush_tlb(gmap);
>   /* Free all segment & region tables. */
> - list_for_each_entry_safe(page, next, &gmap->crst_list, lru)
> + list_for_each_entry_safe(page, next, &gmap->crst_list, lru) {
> + page->_pt_s390_gaddr = 0;
>   __free_pages(page, CRST_ALLOC_ORDER);
> + }
>   gmap_radix_tree_free(&gmap->guest_to_host);
>   gmap_radix_tree_free(&gmap->host_to_guest);
>  
>   /* Free additional data for a shadow gmap */
>   if (gmap_is_shadow(gmap)) {
>   /* Free all page tables. */
> - list_for_each_entry_safe(page, next, &gmap->pt_list, lru)
> + list_for_each_entry_safe(page, next, &gmap->pt_list, lru) {
> + page->_pt_s390_gaddr = 0;
>   page_table_free_pgste(page);
> + }
>   gmap_rmap_radix_tree_free(&gmap->host_to_rmap);
>   /* Release reference to the parent */
>   gmap_put(gmap->parent);
> @@ -318,12 +322,14 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned 
> long *table,
>   list_add(&page->lru, &gmap->crst_list);
>   *table = __pa(new) | _REGION_ENTRY_LENGTH |
>   (*table & _REGION_ENTRY_TYPE_MASK);
> - page->index = gaddr;
> + page->_pt_s390_gaddr = gaddr;
>   page = NULL;
>   }
>   spin_unlock(&gmap->guest_table_lock);
> - if (page)
> + if (page) {
> + page->_pt_s390_gaddr = 0;
>   __free_pages(page, CRST_ALLOC_ORDER);
> + }
>   return 0;
>  }
>  
> @@ -336,12 +342,14 @@ static int gmap_alloc_table(struct gmap *gmap, unsigned 
> long *table,
>  static unsigned long __gmap_segment_gaddr(unsigned long *entry)
>  {
>   struct page *page;
> - unsigned long offset;
> + unsigned long offset, mask;
>  
>   offset = (unsigned long) entry / sizeof(unsigned long);
>   offset = (offset & (PTRS_PER_PMD - 1)) * PMD_SIZE;
> - page = pmd_pgtable_page((pmd_t *) entry);
> - return page->index + offset;
> + mask = ~(PTRS_PER_PMD * sizeof(pmd_t) - 1);
> + page = virt_to_page((void *)((unsigned long) entry & mask));
> +
> + return page->_pt_s390_gaddr + offset;
>  }
>  
>  /**
> @@ -1351,6 +1359,7 @@ static void gmap_unshadow_pgt(struct gmap *sg, unsigned 
> long raddr)
>   /* Free page table */
>   page = phys_to_page(pgt);
>   list_del(&page->lru);
> + page->_pt_s390_gaddr = 0;
>   page_table_free_pgste(page);
>  }
>  
> @@ -1379,6 +1388,7 @@ static void __gmap_unshadow_sgt(struct gmap *sg, 
> unsigned long raddr,
>   /* Free page table */
>   page = phys_to_page(pgt);
>   list_del(&page->lru);
> + page->_pt_s390_gaddr = 0;
>   page_table_free_pgste(page);
>   }
>  }
> @@ -1409,6 +1419,7 @@ static void gmap_unshadow_sgt(struct gmap *sg, unsigned 
> long raddr)
>   /* Free segment table */
>   page = phys_to_page(sgt);
>   

Re: [PATCH v4 01/34] mm: Add PAGE_TYPE_OP folio functions

2023-06-14 Thread Mike Rapoport
On Mon, Jun 12, 2023 at 02:03:50PM -0700, Vishal Moola (Oracle) wrote:
> No folio equivalents for page type operations have been defined, so
> define them for later folio conversions.
> 
> Also changes the Page##uname macros to take in const struct page* since
> we only read the memory here.
> 
> Signed-off-by: Vishal Moola (Oracle) 

Acked-by: Mike Rapoport (IBM) 

> ---
>  include/linux/page-flags.h | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 92a2063a0a23..e99a616b9bcd 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -908,6 +908,8 @@ static inline bool is_page_hwpoison(struct page *page)
>  
>  #define PageType(page, flag) \
>   ((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
> +#define folio_test_type(folio, flag) \
> + ((folio->page.page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
>  
>  static inline int page_type_has_type(unsigned int page_type)
>  {
> @@ -920,20 +922,34 @@ static inline int page_has_type(struct page *page)
>  }
>  
>  #define PAGE_TYPE_OPS(uname, lname)  \
> -static __always_inline int Page##uname(struct page *page)\
> +static __always_inline int Page##uname(const struct page *page)  
> \
>  {\
>   return PageType(page, PG_##lname);  \
>  }\
> +static __always_inline int folio_test_##lname(const struct folio *folio)\
> +{\
> + return folio_test_type(folio, PG_##lname);  \
> +}\
>  static __always_inline void __SetPage##uname(struct page *page)  
> \
>  {\
>   VM_BUG_ON_PAGE(!PageType(page, 0), page);   \
>   page->page_type &= ~PG_##lname; \
>  }\
> +static __always_inline void __folio_set_##lname(struct folio *folio) \
> +{\
> + VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio); \
> + folio->page.page_type &= ~PG_##lname;   \
> +}\
>  static __always_inline void __ClearPage##uname(struct page *page)\
>  {\
>   VM_BUG_ON_PAGE(!Page##uname(page), page);   \
>   page->page_type |= PG_##lname;  \
> -}
> +}\
> +static __always_inline void __folio_clear_##lname(struct folio *folio)   
> \
> +{\
> + VM_BUG_ON_FOLIO(!folio_test_##lname(folio), folio); \
> + folio->page.page_type |= PG_##lname;\
> +}\
>  
>  /*
>   * PageBuddy() indicates that the page is free and in the buddy system
> -- 
> 2.40.1
> 
> 

-- 
Sincerely yours,
Mike.
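
As a concrete illustration, hand-expanding the PAGE_TYPE_OPS() hunk quoted
above for the existing PAGE_TYPE_OPS(Buddy, buddy) user gives roughly the
following new folio helpers (a hand expansion for illustration, not an
excerpt from the patch):

static __always_inline int folio_test_buddy(const struct folio *folio)
{
	/* wraps folio_test_type(): page types use inverted bits in page_type */
	return folio_test_type(folio, PG_buddy);
}

static __always_inline void __folio_set_buddy(struct folio *folio)
{
	/* clearing the PG_buddy bit is what marks the folio as buddy */
	VM_BUG_ON_FOLIO(!folio_test_type(folio, 0), folio);
	folio->page.page_type &= ~PG_buddy;
}

static __always_inline void __folio_clear_buddy(struct folio *folio)
{
	VM_BUG_ON_FOLIO(!folio_test_buddy(folio), folio);
	folio->page.page_type |= PG_buddy;
}

The Page##uname/__SetPage##uname/__ClearPage##uname variants are still
generated as before, so page-based and folio-based callers can coexist
during the conversion.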


Re: [PATCH 3/9] cpu/SMT: Store the current/max number of threads

2023-06-14 Thread Laurent Dufour
On 13/06/2023 20:53:56, Thomas Gleixner wrote:
> On Tue, Jun 13 2023 at 19:16, Laurent Dufour wrote:
>> On 10/06/2023 23:26:18, Thomas Gleixner wrote:
>>> On Thu, May 25 2023 at 01:56, Michael Ellerman wrote:
  #ifdef CONFIG_HOTPLUG_SMT
  enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 +static unsigned int cpu_smt_max_threads __ro_after_init;
 +unsigned int cpu_smt_num_threads;
>>>
>>> Why needs this to be global? cpu_smt_control is pointlessly global already.
>>
>> I agree that cpu_smt_*_threads should be static.

I spoke too quickly, cpu_smt_num_threads is used in the powerpc code.

When a new CPU is added, it is used to decide whether a thread has to be
onlined or not, and there is no way to pass it as an argument at this time.
In detail, it is used in topology_smt_thread_allowed(), called by
dlpar_online_cpu() (see patch "powerpc/pseries: Honour current SMT state
when DLPAR onlining CPUs" at the end of this series).

I think the best option is to keep it global.
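
To make that concrete, here is a rough sketch of the kind of check the DLPAR
path needs (cpu_thread_in_core() is an existing powerpc helper; the actual
topology_smt_thread_allowed() in the series may be implemented differently):

/* Sketch only, not the real patch. */
static bool topology_smt_thread_allowed(unsigned int cpu)
{
	return cpu_thread_in_core(cpu) < cpu_smt_num_threads;
}

dlpar_online_cpu() would then online only those threads of the hot-added
core for which this returns true, honouring the current SMT level.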

>>
>> However, regarding cpu_smt_control, it is used in 2 places in the x86 code:
>>  - arch/x86/power/hibernate.c in arch_resume_nosmt()
>>  - arch/x86/kernel/cpu/bugs.c in spectre_v2_user_select_mitigation()
> 
> Bah. I must have fatfingered the grep then.
> 
>> An accessor function may be introduced to read that value in these 2
>> functions, but I'm wondering if that's really the best option.
>>
>> Unless there is a real need to change this through this series, I think
>> cpu_smt_control can remain global.
> 
> That's fine.
> 
> Thanks,
> 
> tglx



Re: [PATCH RESEND v2] KVM: move KVM_CAP_DEVICE_CTRL to the generic check

2023-06-14 Thread Oliver Upton
On Tue, Jun 13, 2023 at 02:16:16PM -0700, Sean Christopherson wrote:
> +
> 
> Please use scripts/get_maintainer.pl to generate the To/Cc lists.  This may be
> trivial, but it still needs eyeballs from the relevant maintainers.

+1000. I'd buy someone a beer if they made a bot that just ran
get_maintainer on patches that hit the list :)

> On Wed, Mar 15, 2023, Wei Wang wrote:
> > KVM_CAP_DEVICE_CTRL allows userspace to check if the kvm_device
> > framework (e.g. KVM_CREATE_DEVICE) is supported by KVM. Move
> > KVM_CAP_DEVICE_CTRL to the generic check for two reasons:
> > 1) it already supports arch agnostic usages (i.e. KVM_DEV_TYPE_VFIO).
> > For example, a userspace VFIO implementation may need to create
> > KVM_DEV_TYPE_VFIO on x86, riscv, or arm etc. It is simpler to have it
> > checked at the generic code than at each arch's code.
> > 2) KVM_CREATE_DEVICE has been added to the generic code.
> > 
> > Link: 
> > https://lore.kernel.org/all/20221215115207.14784-1-wei.w.w...@intel.com
> > Signed-off-by: Wei Wang 
> > Reviewed-by: Sean Christopherson 
> > ---
> >  arch/arm64/kvm/arm.c   | 1 -
> >  arch/powerpc/kvm/powerpc.c | 1 -
> >  arch/riscv/kvm/vm.c| 1 -
> >  arch/s390/kvm/kvm-s390.c   | 1 -
> >  virt/kvm/kvm_main.c| 1 +
> >  5 files changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 3bd732eaf087..96329e675771 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -202,7 +202,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
> > ext)
> > r = vgic_present;
> > break;
> > case KVM_CAP_IOEVENTFD:
> > -   case KVM_CAP_DEVICE_CTRL:
> > case KVM_CAP_USER_MEMORY:
> > case KVM_CAP_SYNC_MMU:
> > case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:

for arm64:

Reviewed-by: Oliver Upton 

-- 
Thanks,
Oliver
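
For context, the practical effect of the move is that userspace can probe
kvm_device support the same way on every architecture. A hypothetical
userspace check (not part of the patch; vm_fd is an existing KVM VM fd)
could look like:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns the fd of a new VFIO pseudo device, or -1 on failure. */
static int create_vfio_kvm_device(int vm_fd)
{
	struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DEVICE_CTRL) <= 0)
		return -1;	/* kvm_device framework not advertised */

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
		return -1;

	return cd.fd;
}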


[PATCH] ASoC: imx-audmix: check return value of devm_kasprintf()

2023-06-14 Thread Claudiu Beznea
devm_kasprintf() returns a pointer to dynamically allocated memory.
The pointer could be NULL if the allocation fails, so check its validity.
Identified with coccinelle (kmerr.cocci script).

Fixes: b86ef5367761 ("ASoC: fsl: Add Audio Mixer machine driver")
Signed-off-by: Claudiu Beznea 
---

Hi,

This has been addressed using the kmerr.cocci script proposed for update
at [1].

Thank you,
Claudiu Beznea

[1] 
https://lore.kernel.org/all/20230530074044.1603426-1-claudiu.bez...@microchip.com/

 sound/soc/fsl/imx-audmix.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/sound/soc/fsl/imx-audmix.c b/sound/soc/fsl/imx-audmix.c
index 2c57fe9d2d08..af06268ee57b 100644
--- a/sound/soc/fsl/imx-audmix.c
+++ b/sound/soc/fsl/imx-audmix.c
@@ -228,6 +228,8 @@ static int imx_audmix_probe(struct platform_device *pdev)
 
	dai_name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%s%s",
  fe_name_pref, args.np->full_name + 1);
+   if (!dai_name)
+   return -ENOMEM;
 
dev_info(pdev->dev.parent, "DAI FE name:%s\n", dai_name);
 
@@ -236,6 +238,8 @@ static int imx_audmix_probe(struct platform_device *pdev)
capture_dai_name =
			devm_kasprintf(&pdev->dev, GFP_KERNEL, "%s %s",
   dai_name, "CPU-Capture");
+   if (!capture_dai_name)
+   return -ENOMEM;
}
 
priv->dai[i].cpus = [0];
@@ -263,6 +267,8 @@ static int imx_audmix_probe(struct platform_device *pdev)
   "AUDMIX-Playback-%d", i);
		be_cp = devm_kasprintf(&pdev->dev, GFP_KERNEL,
   "AUDMIX-Capture-%d", i);
+   if (!be_name || !be_pb || !be_cp)
+   return -ENOMEM;
 
priv->dai[num_dai + i].cpus = [2];
priv->dai[num_dai + i].codecs = [3];
@@ -287,6 +293,9 @@ static int imx_audmix_probe(struct platform_device *pdev)
priv->dapm_routes[i].source =
			devm_kasprintf(&pdev->dev, GFP_KERNEL, "%s %s",
   dai_name, "CPU-Playback");
+   if (!priv->dapm_routes[i].source)
+   return -ENOMEM;
+
priv->dapm_routes[i].sink = be_pb;
priv->dapm_routes[num_dai + i].source   = be_pb;
priv->dapm_routes[num_dai + i].sink = be_cp;
-- 
2.34.1



Re: [PATCH 14/16] powerpc/book3s64/vmemmap: Switch radix to use a different vmemmap handling function

2023-06-14 Thread Sachin Sant


> 1. First try to map things using PMD (2M)
> 2. With altmap if altmap cross-boundary check returns true, fall back to 
> PAGE_SIZE
> 3. If we can't allocate PMD_SIZE backing memory for vmemmap, fall back to 
> PAGE_SIZE
> 
> On removing vmemmap mapping, check if every subsection that is using the 
> vmemmap
> area is invalid. If found to be invalid, that implies we can safely free the
> vmemmap area. We don't use the PAGE_UNUSED pattern used by x86 because with 
> 64K
> page size, we need to do the above check even at the PAGE_SIZE granularity.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
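
Restating the quoted policy as pseudo-code (helper names are hypothetical,
not from the series; the real radix implementation differs in detail):

static int vmemmap_populate_sketch(unsigned long start, unsigned long end,
				   struct vmem_altmap *altmap)
{
	unsigned long addr;

	for (addr = start; addr < end; addr += PMD_SIZE) {
		void *p;

		/* 2. altmap cross-boundary check: fall back to PAGE_SIZE */
		if (altmap && altmap_crosses_boundary(altmap, addr, PMD_SIZE)) {
			map_with_base_pages(addr, PMD_SIZE, altmap);
			continue;
		}

		/* 1. first try to map with a PMD (2M) entry */
		p = alloc_pmd_block(addr, altmap);
		if (!p) {
			/* 3. no PMD_SIZE backing memory: fall back to PAGE_SIZE */
			map_with_base_pages(addr, PMD_SIZE, altmap);
			continue;
		}
		map_with_pmd(addr, p);
	}
	return 0;
}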

With this patch series applied I see the following warning

[  OK  ] Started Monitoring of LVM2 mirrors,…sing dmeventd or progress polling.
[3.283884] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: nvdimm pmu 
didn't register rc=-2
[3.284212] papr_scm ibm,persistent-memory:ibm,pmemory@44104002: nvdimm pmu 
didn't register rc=-2
[3.563890] radix-mmu: Mapped 0x04001000-0x040c9000 with 
64.0 KiB pages
[3.703227] [ cut here ]
[3.703236] failed to free all reserved pages
[3.703244] WARNING: CPU: 41 PID: 923 at mm/memremap.c:152 
memunmap_pages+0x37c/0x3a0
[3.703252] Modules linked in: device_dax(+) nd_pmem nd_btt dax_pmem 
papr_scm libnvdimm pseries_rng vmx_crypto aes_gcm_p10_crypto ext4 mbcache jbd2 
sd_mod t10_pi crc64_rocksoft crc64 sg ibmvscsi scsi_transport_srp ibmveth fuse
[3.703272] CPU: 41 PID: 923 Comm: systemd-udevd Not tainted 
6.4.0-rc6-00037-gb6dad5178cea-dirty #1
[3.703276] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf06 
of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
[3.703280] NIP:  c057a18c LR: c057a188 CTR: 005ca81c
[3.703283] REGS: c00032a170d0 TRAP: 0700   Not tainted  
(6.4.0-rc6-00037-gb6dad5178cea-dirty)
[3.703286] MSR:  8282b033   CR: 
48248824  XER: 0002
[3.703296] CFAR: c015f0c0 IRQMASK: 0  [3.703296] GPR00: 
c057a188 c00032a17370 c1421500 0021  [
3.703296] GPR04: 7fff c00032a17140 c00032a17138 
0027  [3.703296] GPR08: c015c91a7c10 0001 
0027 c2a18a20  [3.703296] GPR12: 48248824 
c015cb9f4300 c00032a17d68 c1262b20  [3.703296] GPR16: 
c0080131 ff20 fff2 c008012d7418  [
3.703296] GPR20: c00032a17c30 0004 c005 
01000200  [3.703296] GPR24: c2f11570 ce376870 
0001 0001  [3.703296] GPR28: ce376840 
ce3768c8  ce376840  [3.70] NIP 
[c057a18c] memunmap_pages+0x37c/0x3a0
[3.703338] LR [c057a188] memunmap_pages+0x378/0x3a0
[3.703342] Call Trace:
[3.703344] [c00032a17370] [c057a188] memunmap_pages+0x378/0x3a0 
(unreliable)
[3.703349] [c00032a17420] [c057a928] memremap_pages+0x4a8/0x890
[3.703355] [c00032a17500] [c057ad4c] 
devm_memremap_pages+0x3c/0xd0
[3.703359] [c00032a17540] [c008011c084c] dev_dax_probe+0x134/0x3a0 
[device_dax]
[3.703366] [c00032a175e0] [c09f7e8c] dax_bus_probe+0xac/0x140
[3.703371] [c00032a17610] [c09b5828] really_probe+0x108/0x530
[3.703375] [c00032a176a0] [c09b5d04] 
__driver_probe_device+0xb4/0x200
[3.703379] [c00032a17720] [c09b5ea8] 
driver_probe_device+0x58/0x120
[3.703383] [c00032a17760] [c09b6298] __driver_attach+0x148/0x250
[3.703387] [c00032a177e0] [c09b1a58] bus_for_each_dev+0xa8/0x130
[3.703392] [c00032a17840] [c09b4b34] driver_attach+0x34/0x50
[3.703396] [c00032a17860] [c09b3b98] bus_add_driver+0x258/0x300
[3.703400] [c00032a178f0] [c09b78d4] driver_register+0xa4/0x1b0
[3.703404] [c00032a17960] [c09f9530] 
__dax_driver_register+0x50/0x70
[3.703409] [c00032a17980] [c008011c1374] dax_init+0x3c/0x58 
[device_dax]
[3.703414] [c00032a179a0] [c0013260] do_one_initcall+0x60/0x2f0
[3.703418] [c00032a17a70] [c0248af8] do_init_module+0x78/0x310
[3.703424] [c00032a17af0] [c024bcac] load_module+0x2a7c/0x2f30
[3.703429] [c00032a17d00] [c024c4f0] 
__do_sys_finit_module+0xe0/0x180
[3.703434] [c00032a17e10] [c00374c0] 
system_call_exception+0x140/0x350
[3.703439] [c00032a17e50] [c000d6a0] 
system_call_common+0x160/0x2e4
[3.703444] --- interrupt: c00 at 0x7fff9af2fb34
[3.703447] NIP:  7fff9af2fb34 LR: 7fff9b6dea9c CTR: 
[3.703450] REGS: c00032a17e80 TRAP: 0c00   Not tainted  
(6.4.0-rc6-00037-gb6dad5178cea-dirty)
[3.703453] MSR:  8280f033   CR: 
2804  XER: 
[3.703462] IRQMASK: 0  [3.703462] GPR00: 

Re: [PATCH 2/7] watchdog/hardlockup: Make the config checks more straightforward

2023-06-14 Thread Petr Mladek
On Thu 2023-06-08 06:55:23, Doug Anderson wrote:
> Hi,
> 
> On Thu, Jun 8, 2023 at 4:02 AM Petr Mladek  wrote:
> >
> > > >  config HARDLOCKUP_DETECTOR
> > > > bool "Detect Hard Lockups"
> > > > depends on DEBUG_KERNEL && !S390
> > > > -   depends on HAVE_HARDLOCKUP_DETECTOR_NON_ARCH || 
> > > > HAVE_HARDLOCKUP_DETECTOR_ARCH
> > > > +   depends on ((HAVE_HARDLOCKUP_DETECTOR_PERF || 
> > > > HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || 
> > > > HAVE_HARDLOCKUP_DETECTOR_ARCH
> > >
> > > Adding the dependency to buddy (see ablove) would simplify the above
> > > to just this:
> > >
> > > depends on HAVE_HARDLOCKUP_DETECTOR_PERF ||
> > > HAVE_HARDLOCKUP_DETECTOR_BUDDY || HAVE_HARDLOCKUP_DETECTOR_ARCH
> >
> > This is exactly what I do not want. It would just move the check
> > somewhere else. But it would make the logic harder to understand.
> 
> Hmmm. To me, it felt easier to understand by moving this into the
> "HAVE_HARDLOCKUP_DETECTOR_BUDDY". To me it was pretty easy to say "if
> an architecture defined its own arch-specific watchdog then buddy
> can't be enabled" and that felt like it fit cleanly within the
> "HAVE_HARDLOCKUP_DETECTOR_BUDDY" definition. It got rid of _a lot_ of
> other special cases / checks elsewhere and felt quite a bit cleaner to
> me. I only had to think about the conflict between the "buddy" and
> "nmi" watchdogs once when I understood
> "HAVE_HARDLOCKUP_DETECTOR_BUDDY".

I see. My problem with this approach was that the dependencies between
the 4 alternative implementations were too distributed. It was
necessary to read many definitions to understand what was possible and
what was not possible. And it is even more complicated when
cscope does not support Kconfig.

Also, the above solves this only for the buddy detector, which is global.

The same conflict exists for PERF, which has arch-specific dependencies.
Maybe it can be disabled by a conflict in arch/Kconfig. But then the
PERF dependencies would be split across three config files: arch/Kconfig,
lib/Kconfig.debug, and the arch-specific Kconfig.

Anyway, HAVE_*_BUDDY and HAVE_*_PERF should behave the same.
Either both should explicitly conflict with HAVE_*_ARCH and
HAVE_NMI_WATCHDOG, or both should be enabled whenever the
architecture supports them and simply be ignored when choosing
the final implementation.

My wish was to have consistent naming:

   + HAVE_HARDLOCKUP_DETECTOR_<type> set when the architecture
     supports the particular implementation.

   + HARDLOCKUP_DETECTOR_<type> set when the implementation will
     be used (built).


Step aside:

It seems that we have entered into a bike shedding mode.
The following questions come to my mind:

   1. Does this patchset improve the current state?

   2. Maybe it is not black and white. Is it possible to summarize
      what exactly got better and what got worse?

Maybe, there is no need to do bike-shedding about every step
if the final result is reasonable and the steps are not
completely wrong.

I just followed my intuition and tried to do some changes step
by step. I got lost many times so maybe the steps are not
ideal. Anyway, the steps helped me to understand the logic
and stay reasonably confident that they did not change
the behavior.

I could rework the patchset. But I first need to know what
exactly is bad in the result, and eventually whether there is a
more logical way to end up there.

Best Regards,
Petr


Re: [PATCH 00/17] tool/perf/test: Fix shellcheck coding/formatting issues of test shell scripts

2023-06-14 Thread Athira Rajeev



> On 14-Jun-2023, at 2:04 AM, Arnaldo Carvalho de Melo  wrote:
> 
> On Tue, Jun 13, 2023 at 10:11:28PM +0530, Athira Rajeev wrote:
>> Patchset covers a set of fixes for coding/formatting issues observed while
>> running shellcheck tool on the perf test shell scripts. Shellcheck is a 
>> static
>> analysis tool that can find semantic/syntax bugs in the shell scripts.
> 
> Thanks, applied the series.

Hi,

Thanks Arnaldo for picking up the patchset.
We will check and resubmit patch 6.

Thanks
Athira
> 
> - Arnaldo
> 
>> Patches 1-14 fixes the issues found with shellcheck. Patch 15, 16
>> and patch 17 address a fix in task_analyzer test.
>> 
>> This cleanup is a pre-requisite to include a build option for shellcheck
>> discussed here: https://www.spinics.net/lists/linux-perf-users/msg25553.html
>> Also, this is the first set of patches. There will be one more set, which
>> will include the build option for shellcheck as discussed in the mail thread.
>> 
>> Abhirup Deb (2):
>>  tools/perf/tests: fix test_arm_spe.sh signal case issues
>>  perf/tests/shell: fix shellscript errors for lock_contention.sh
>> 
>> Aboorva Devarajan (1):
>>  tools/perf/tests: Fix shellcheck issues in test_task_analyzer.sh file
>> 
>> Aditya Gupta (3):
>>  perf tests task_analyzer: fix bad substitution ${$1}
>>  perf tests task_analyzer: print command on failure
>>  perf tests task_analyzer: skip tests if no libtraceevent support
>> 
>> Akanksha J N (1):
>>  tools/perf/tests: Fix shellcheck warnings for
>>trace+probe_vfs_getname.sh
>> 
>> Anushree Mathur (1):
>>  perf/tests/shell : Shellcheck fixes for perf test
>>"test_arm_coresight.sh"
>> 
>> Barnali Guha Thakurata (1):
>>  tools/perf/tests/shell/stat_all_metrics: Fix shellcheck warning SC2076
>>in stat_all_metrics.sh
>> 
>> Disha Goel (1):
>>  tools/perf/tests: fix shellcheck warning for stat+json_output
>> 
>> Geetika (1):
>>  tools/perf/tests: Fix all POSIX sh warnings in perf shell test
>>test_brstack.sh
>> 
>> Korrapati Likhitha (1):
>>  tools/perf/tests: Fix shellcheck warnings for stat+csv_output
>> 
>> Samir Mulani (1):
>>  tools/perf/tests: fixed shellcheck warnings for perf shell scripts
>> 
>> Shirisha G (1):
>>  tools/perf/tests: fix shellcheck warnings for daemon.sh
>> 
>> Sourabh Jain (1):
>>  perf: get rid of unused import
>> 
>> Spoorthy S (2):
>>  shellcheck : fixing signal names and adding double quotes for
>>expression in test_arm_callgraph_fp
>>  tools/perf/tests: Fix all POSIX sh warnings in stat+shadow_stat.sh
>> 
>> .../scripts/python/arm-cs-trace-disasm.py |   1 -
>> tools/perf/tests/shell/buildid.sh |  12 +-
>> tools/perf/tests/shell/daemon.sh  | 113 --
>> tools/perf/tests/shell/lock_contention.sh |  70 +--
>> .../shell/record+probe_libc_inet_pton.sh  |   6 +-
>> .../shell/record+script_probe_vfs_getname.sh  |   4 +-
>> tools/perf/tests/shell/stat+csv_output.sh |   4 +-
>> tools/perf/tests/shell/stat+json_output.sh|   2 +-
>> tools/perf/tests/shell/stat+shadow_stat.sh|   4 +-
>> tools/perf/tests/shell/stat_all_metrics.sh|   6 +-
>> .../perf/tests/shell/test_arm_callgraph_fp.sh |   6 +-
>> tools/perf/tests/shell/test_arm_coresight.sh  |   6 +-
>> tools/perf/tests/shell/test_arm_spe.sh|   2 +-
>> tools/perf/tests/shell/test_brstack.sh|  12 +-
>> tools/perf/tests/shell/test_task_analyzer.sh  |  98 ---
>> .../tests/shell/trace+probe_vfs_getname.sh|   6 +-
>> 16 files changed, 203 insertions(+), 149 deletions(-)
>> 
>> -- 
>> 2.39.1
>> 
> 
> -- 
> 
> - Arnaldo




Re: [PATCH v4 27/34] nios2: Convert __pte_free_tlb() to use ptdescs

2023-06-14 Thread Geert Uytterhoeven
Hi Dinh,

On Wed, Jun 14, 2023 at 12:17 AM Dinh Nguyen  wrote:
> On 6/12/23 16:04, Vishal Moola (Oracle) wrote:
> > Part of the conversions to replace pgtable constructor/destructors with
> > ptdesc equivalents.
> >
> > Signed-off-by: Vishal Moola (Oracle) 
> > ---
> >   arch/nios2/include/asm/pgalloc.h | 8 
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/nios2/include/asm/pgalloc.h 
> > b/arch/nios2/include/asm/pgalloc.h
> > index ecd1657bb2ce..ce6bb8e74271 100644
> > --- a/arch/nios2/include/asm/pgalloc.h
> > +++ b/arch/nios2/include/asm/pgalloc.h
> > @@ -28,10 +28,10 @@ static inline void pmd_populate(struct mm_struct *mm, 
> > pmd_t *pmd,
> >
> >   extern pgd_t *pgd_alloc(struct mm_struct *mm);
> >
> > -#define __pte_free_tlb(tlb, pte, addr)   \
> > - do {\
> > - pgtable_pte_page_dtor(pte); \
> > - tlb_remove_page((tlb), (pte));  \
> > +#define __pte_free_tlb(tlb, pte, addr) 
> >   \
> > + do {\
> > + pagetable_pte_dtor(page_ptdesc(pte));   \
> > + tlb_remove_page_ptdesc((tlb), (page_ptdesc(pte)));  \
> >   } while (0)
> >
> >   #endif /* _ASM_NIOS2_PGALLOC_H */
>
> Applied!

I don't think you can just apply this patch, as the new functions
were only introduced in [PATCH v4 05/34] of this series.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH] powerpc/trace: Add support for HAVE_FUNCTION_ARG_ACCESS_API

2023-06-14 Thread Naveen N Rao
When creating a kprobe on function entry through tracefs, allow the
arguments to record to be specified using the $argN syntax.

Signed-off-by: Naveen N Rao 
---
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/ptrace.h | 17 +
 2 files changed, 18 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index bff5820b7cda14..bd76ae95146b42 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -233,6 +233,7 @@ config PPC
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FAST_GUP
select HAVE_FTRACE_MCOUNT_RECORD
+   select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_DESCRIPTORSif PPC64_ELF_ABI_V1
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_FUNCTION_GRAPH_TRACER
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 0eb90a01334666..68ce2381b18ae1 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -397,6 +397,23 @@ static inline unsigned long regs_get_kernel_stack_nth(struct pt_regs *regs,
return 0;
 }
 
+/**
+ * regs_get_kernel_argument() - get Nth function argument in kernel
+ * @regs:  pt_regs of that context
+ * @n: function argument number (start from 0)
+ *
+ * We support up to 8 arguments and assume they are sent in through the GPRs.
+ * This will fail for fp/vector arguments, but those aren't usually found in
+ * kernel code. This is expected to be called from kprobes or ftrace with regs.
+ */
+static inline unsigned long regs_get_kernel_argument(struct pt_regs *regs, unsigned int n)
+{
+#define NR_REG_ARGUMENTS 8
+   if (n < NR_REG_ARGUMENTS)
+   return regs_get_register(regs, offsetof(struct pt_regs, gpr[3 + n]));
+   return 0;
+}
+
 #endif /* __ASSEMBLY__ */
 
 #ifndef __powerpc64__

base-commit: bd517a8442b6c6646a136421cd4c1b95bf4ce32b
-- 
2.40.1
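
As a usage sketch, a kprobes pre-handler could now pull the first arguments
generically (the probe target and handler below are made up for illustration):

#include <linux/kprobes.h>
#include <linux/printk.h>

static int sample_pre_handler(struct kprobe *p, struct pt_regs *regs)
{
	/* GPR-passed arguments, per the regs_get_kernel_argument() comment above */
	unsigned long a0 = regs_get_kernel_argument(regs, 0);
	unsigned long a1 = regs_get_kernel_argument(regs, 1);

	pr_info("%s: arg0=%lx arg1=%lx\n", p->symbol_name, a0, a1);
	return 0;
}

static struct kprobe sample_kp = {
	.symbol_name	= "kernel_clone",	/* example target only */
	.pre_handler	= sample_pre_handler,
};
/* register_kprobe(&sample_kp) would arm it. */

The same capability is what lets tracefs kprobe events use the generic $argN
fetch syntax (e.g. 'p:myprobe kernel_clone args=$arg1' in kprobe_events),
which is the use case named in the changelog.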



Re: [PATCH] powerpc/xmon: Fix comparing pointer

2023-06-14 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 14/06/2023 à 07:48, wuyonggang...@208suo.com a écrit :
>> [Vous ne recevez pas souvent de courriers de wuyonggang...@208suo.com. 
>> D?couvrez pourquoi ceci est important ? 
>> https://aka.ms/LearnAboutSenderIdentification ]
>> 
>> Fix the following coccicheck warning:
>> 
>> arch/powerpc/xmon/spu-dis.c:51:34-35: WARNING comparing pointer to 0
>
> Once again, why do you change the formatting of the document?

And regardless, this file is taken from binutils, so we don't want to
take pointless cleanup patches to it, because then it needlessly
diverges from the binutils source.

cheers


[PATCH] selftests/powerpc: Remove unneeded variable

2023-06-14 Thread wuyonggang001

Fix the following coccicheck warning:

tools/testing/selftests/powerpc/alignment/alignment_handler.c:558:5-7: 
Unneeded variable: "rc". Return "0"


Signed-off-by: Yonggang Wu 
---
 .../powerpc/alignment/alignment_handler.c | 24 +--
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 33ee34fc0828..4980656c3f70 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -332,7 +332,7 @@ int test_alignment_handler_vsx_206(void)
 STORE_VSX_XFORM_TEST(stxvd2x);
 STORE_VSX_XFORM_TEST(stxvw4x);
 STORE_VSX_XFORM_TEST(stxsdx);
-return rc;
+return 0;
 }

 int test_alignment_handler_vsx_207(void)
@@ -348,7 +348,7 @@ int test_alignment_handler_vsx_207(void)
 LOAD_VSX_XFORM_TEST(lxsiwzx);
 STORE_VSX_XFORM_TEST(stxsspx);
 STORE_VSX_XFORM_TEST(stxsiwx);
-return rc;
+return 0;
 }

 int test_alignment_handler_vsx_300(void)
@@ -380,7 +380,7 @@ int test_alignment_handler_vsx_300(void)
 STORE_VSX_XFORM_TEST(stxvx);
 STORE_VSX_XFORM_TEST(stxvl);
 STORE_VSX_XFORM_TEST(stxvll);
-return rc;
+return 0;
 }

 int test_alignment_handler_vsx_prefix(void)
@@ -399,7 +399,7 @@ int test_alignment_handler_vsx_prefix(void)
 STORE_VSX_8LS_PREFIX_TEST(PSTXSSP, 0);
 STORE_VSX_8LS_PREFIX_TEST(PSTXV0, 0);
 STORE_VSX_8LS_PREFIX_TEST(PSTXV1, 1);
-return rc;
+return 0;
 }

 int test_alignment_handler_integer(void)
@@ -458,7 +458,7 @@ int test_alignment_handler_integer(void)
 STORE_DFORM_TEST(stmw);
 #endif

-return rc;
+return 0;
 }

 int test_alignment_handler_integer_206(void)
@@ -473,7 +473,7 @@ int test_alignment_handler_integer_206(void)
 LOAD_XFORM_TEST(ldbrx);
 STORE_XFORM_TEST(stdbrx);

-return rc;
+return 0;
 }

 int test_alignment_handler_integer_prefix(void)
@@ -494,7 +494,7 @@ int test_alignment_handler_integer_prefix(void)
 STORE_MLS_PREFIX_TEST(PSTH);
 STORE_MLS_PREFIX_TEST(PSTW);
 STORE_8LS_PREFIX_TEST(PSTD);
-return rc;
+return 0;
 }

 int test_alignment_handler_vmx(void)
@@ -522,7 +522,7 @@ int test_alignment_handler_vmx(void)
 STORE_VMX_XFORM_TEST(stvehx);
 STORE_VMX_XFORM_TEST(stvewx);
 STORE_VMX_XFORM_TEST(stvxl);
-return rc;
+return 0;
 }

 int test_alignment_handler_fp(void)
@@ -550,7 +550,7 @@ int test_alignment_handler_fp(void)
 STORE_FLOAT_XFORM_TEST(stfsux);
 STORE_FLOAT_XFORM_TEST(stfiwx);

-return rc;
+return 0;
 }

 int test_alignment_handler_fp_205(void)
@@ -568,7 +568,7 @@ int test_alignment_handler_fp_205(void)
 STORE_FLOAT_DFORM_TEST(stfdp);
 STORE_FLOAT_XFORM_TEST(stfdpx);

-return rc;
+return 0;
 }

 int test_alignment_handler_fp_206(void)
@@ -582,7 +582,7 @@ int test_alignment_handler_fp_206(void)

 LOAD_FLOAT_XFORM_TEST(lfiwzx);

-return rc;
+return 0;
 }

@@ -599,7 +599,7 @@ int test_alignment_handler_fp_prefix(void)
 LOAD_FLOAT_MLS_PREFIX_TEST(PLFD);
 STORE_FLOAT_MLS_PREFIX_TEST(PSTFS);
 STORE_FLOAT_MLS_PREFIX_TEST(PSTFD);
-return rc;
+return 0;
 }

 void usage(char *prog)


[PATCH] selftests/powerpc: remove unneeded variable

2023-06-14 Thread baomingtong001

fix the following coccicheck warning:

tools/testing/selftests/powerpc/alignment/alignment_handler.c:530:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:558:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:576:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:591:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:407:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:466:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:481:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:502:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:322:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:340:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:356:5-7: 
Unneeded variable: "rc". Return "0".
tools/testing/selftests/powerpc/alignment/alignment_handler.c:388:5-7: 
Unneeded variable: "rc". Return "0".


Signed-off-by: Mingtong Bao 
---
 .../powerpc/alignment/alignment_handler.c | 36 +++
 1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 33ee34fc0828..56fc26c2b75a 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -319,7 +319,6 @@ static bool can_open_cifile(void)

 int test_alignment_handler_vsx_206(void)
 {
-int rc = 0;

 SKIP_IF(!can_open_cifile());
 SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
@@ -332,12 +331,11 @@ int test_alignment_handler_vsx_206(void)
 STORE_VSX_XFORM_TEST(stxvd2x);
 STORE_VSX_XFORM_TEST(stxvw4x);
 STORE_VSX_XFORM_TEST(stxsdx);
-return rc;
+return 0;
 }

 int test_alignment_handler_vsx_207(void)
 {
-int rc = 0;

 SKIP_IF(!can_open_cifile());
 SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_2_07));
@@ -348,12 +346,11 @@ int test_alignment_handler_vsx_207(void)
 LOAD_VSX_XFORM_TEST(lxsiwzx);
 STORE_VSX_XFORM_TEST(stxsspx);
 STORE_VSX_XFORM_TEST(stxsiwx);
-return rc;
+return 0;
 }

 int test_alignment_handler_vsx_300(void)
 {
-int rc = 0;

 SKIP_IF(!can_open_cifile());

@@ -380,12 +377,11 @@ int test_alignment_handler_vsx_300(void)
 STORE_VSX_XFORM_TEST(stxvx);
 STORE_VSX_XFORM_TEST(stxvl);
 STORE_VSX_XFORM_TEST(stxvll);
-return rc;
+return 0;
 }

 int test_alignment_handler_vsx_prefix(void)
 {
-int rc = 0;

 SKIP_IF(!can_open_cifile());
 SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_3_1));
@@ -399,12 +395,11 @@ int test_alignment_handler_vsx_prefix(void)
 STORE_VSX_8LS_PREFIX_TEST(PSTXSSP, 0);
 STORE_VSX_8LS_PREFIX_TEST(PSTXV0, 0);
 STORE_VSX_8LS_PREFIX_TEST(PSTXV1, 1);
-return rc;
+return 0;
 }

 int test_alignment_handler_integer(void)
 {
-int rc = 0;

 SKIP_IF(!can_open_cifile());

@@ -458,12 +453,11 @@ int test_alignment_handler_integer(void)
 STORE_DFORM_TEST(stmw);
 #endif

-return rc;
+return 0;
 }

 int test_alignment_handler_integer_206(void)
 {
-int rc = 0;

 SKIP_IF(!can_open_cifile());
 SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
@@ -473,12 +467,11 @@ int test_alignment_handler_integer_206(void)
 LOAD_XFORM_TEST(ldbrx);
 STORE_XFORM_TEST(stdbrx);

-return rc;
+return 0;
 }

 int test_alignment_handler_integer_prefix(void)
 {
-int rc = 0;

 SKIP_IF(!can_open_cifile());
 SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_3_1));
@@ -494,12 +487,11 @@ int test_alignment_handler_integer_prefix(void)
 STORE_MLS_PREFIX_TEST(PSTH);
 STORE_MLS_PREFIX_TEST(PSTW);
 STORE_8LS_PREFIX_TEST(PSTD);
-return rc;
+return 0;
 }

 int test_alignment_handler_vmx(void)
 {
-int rc = 0;

 SKIP_IF(!can_open_cifile());
 SKIP_IF(!have_hwcap(PPC_FEATURE_HAS_ALTIVEC));
@@ -522,12 +514,11 @@ int test_alignment_handler_vmx(void)
 STORE_VMX_XFORM_TEST(stvehx);
 STORE_VMX_XFORM_TEST(stvewx);
 STORE_VMX_XFORM_TEST(stvxl);
-return rc;
+return 0;
 }

 int test_alignment_handler_fp(void)
 {
-int rc = 0;

 SKIP_IF(!can_open_cifile());

@@ -550,12 +541,11 @@ int test_alignment_handler_fp(void)
 STORE_FLOAT_XFORM_TEST(stfsux);
 STORE_FLOAT_XFORM_TEST(stfiwx);

-return rc;
+return 0;
 }

 int test_alignment_handler_fp_205(void)
 {
-int rc = 0;

 SKIP_IF(!can_open_cifile());
 SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_05));
@@ -568,12 +558,11 @@ int test_alignment_handler_fp_205(void)
 

[PATCH] KVM: PPC: remove unneeded variable

2023-06-14 Thread baomingtong001

fix the following coccicheck warning:

arch/powerpc/kvm/book3s_pr.c:424:5-6: Unneeded variable: "r".

Signed-off-by: Mingtong Bao 
---
 arch/powerpc/kvm/book3s_pr.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 9118242063fb..1b68de369b88 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -421,14 +421,13 @@ void kvmppc_restore_tm_pr(struct kvm_vcpu *vcpu)

 static int kvmppc_core_check_requests_pr(struct kvm_vcpu *vcpu)
 {
-int r = 1; /* Indicate we want to get back into the guest */

 /* We misuse TLB_FLUSH to indicate that we want to clear
all shadow cache entries */
 if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
 kvmppc_mmu_pte_flush(vcpu, 0, 0);
-
-return r;
+/* Indicate we want to get back into the guest */
+return 1;
 }

 /* MMU Notifiers */