Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Oliver O'Halloran
On Sun, May 10, 2020 at 1:51 AM Christophe Leroy
 wrote:
>
>
>
> Le 08/05/2020 à 19:41, Qian Cai a écrit :
> >
> >
> >> On May 8, 2020, at 10:39 AM, Qian Cai  wrote:
> >>
> >> Booting POWER9 PowerNV has this message,
> >>
> >> "ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
> >> early_ioremap() instead”
> >>
> >> but use the patch below will result in leaks because it will never call 
> >> early_iounmap() anywhere. However, it looks me it was by design that 
> >> phb->regs mapping would be there forever where it would be used in 
> >> pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() 
> >> initcall too strong?
> >>
> >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >> @@ -36,6 +36,7 @@
> >> #include 
> >> #include 
> >> #include 
> >> +#include 
> >>
> >> #include 
> >>
> >> @@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
> >> device_node *np,
> >> /* Get registers */
> >> if (!of_address_to_resource(np, 0, )) {
> >> phb->regs_phys = r.start;
> >> -   phb->regs = ioremap(r.start, resource_size());
> >> +   phb->regs = early_ioremap(r.start, resource_size());
> >> if (phb->regs == NULL)
> >> pr_err("  Failed to map registers !\n”);
> >
> > This will also trigger a panic with debugfs reads, so isn’t that this 
> > commit bogus at least for powerpc64?
> >
> > d538aadc2718 (“powerpc/ioremap: warn on early use of ioremap()")
>
> No d538aadc2718 is not bogus. That's the point, we want to remove all
> early usages of ioremap() in order to remove the hack with the
> ioremap_bot stuff and all, and stick to the generic ioremap logic.
>
> In order to do so, all early use of ioremap() has to be converted to
> early_ioremap() or to fixmap or anything else that allows to do ioremaps
> before the slab is ready.
>
> early_ioremap() is for temporary mappings necessary at boottime. For
> long lasting mappings, another method is to be used.
>
> Now, the point is that other architectures like for instance x86 don't
> seem to have to use early_ioremap() much. Powerpc is for instance doing
> early mappings for PCI. Seems like x86 initialises PCI once slab is
> ready. Can't powerpc do the same ?

Patches welcome.


Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Christophe Leroy




Le 08/05/2020 à 19:41, Qian Cai a écrit :




On May 8, 2020, at 10:39 AM, Qian Cai  wrote:

Booting POWER9 PowerNV has this message,

"ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
early_ioremap() instead”

but use the patch below will result in leaks because it will never call 
early_iounmap() anywhere. However, it looks me it was by design that phb->regs 
mapping would be there forever where it would be used in pnv_ioda_get_inval_reg(), 
so is just that check_early_ioremap_leak() initcall too strong?

--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -36,6 +36,7 @@
#include 
#include 
#include 
+#include 

#include 

@@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
/* Get registers */
if (!of_address_to_resource(np, 0, )) {
phb->regs_phys = r.start;
-   phb->regs = ioremap(r.start, resource_size());
+   phb->regs = early_ioremap(r.start, resource_size());
if (phb->regs == NULL)
pr_err("  Failed to map registers !\n”);


This will also trigger a panic with debugfs reads, so isn’t that this commit 
bogus at least for powerpc64?

d538aadc2718 (“powerpc/ioremap: warn on early use of ioremap()")


No d538aadc2718 is not bogus. That's the point, we want to remove all 
early usages of ioremap() in order to remove the hack with the 
ioremap_bot stuff and all, and stick to the generic ioremap logic.


In order to do so, all early use of ioremap() has to be converted to 
early_ioremap() or to fixmap or anything else that allows to do ioremaps 
before the slab is ready.


early_ioremap() is for temporary mappings necessary at boottime. For 
long lasting mappings, another method is to be used.


Now, the point is that other architectures like for instance x86 don't 
seem to have to use early_ioremap() much. Powerpc is for instance doing 
early mappings for PCI. Seems like x86 initialises PCI once slab is 
ready. Can't powerpc do the same ?


Christophe


Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Qian Cai



> On May 9, 2020, at 4:38 AM, Nicholas Piggin  wrote:
> 
> Your patch to use early_ioremap is faulting? I wonder why?

Yes, I don’t know the reasons either. I suppose not many places in other parts 
of the kernel which keep using those addresses from early_ioremap() after 
system booted. Otherwise, we would see those leak warnings elsewhere.

Thus, we probably have to audit the code, and if still necessary, call 
early_ioremap() and then early_iounmap() followed by a ioremap() once slab 
allocator is available?

Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Nicholas Piggin
Excerpts from Oliver O'Halloran's message of May 9, 2020 6:11 pm:
> On Sat, May 9, 2020 at 12:41 AM Qian Cai  wrote:
>>
>>  Booting POWER9 PowerNV has this message,
>>
>> "ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
>> early_ioremap() instead”
>>
>> but use the patch below will result in leaks because it will never call 
>> early_iounmap() anywhere. However, it looks me it was by design that 
>> phb->regs mapping would be there forever where it would be used in 
>> pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() 
>> initcall too strong?
> 
> The warning there is junk. The PHBs are setup at boot and never torn
> down so we're not "leaking" the mapping. It's supposed to be there for
> the lifetime of the kernel.
> 
> That said, we could probably move the PCI setup to a point later in
> boot where the normal ioremap can be be used. We would have to check
> for initcalls which depend on the PHBs being setup and delay those too
> though.

I think it helps to unify code a bit more and take special cases out of 
ioremap to have all these early calls use early_ioremap.

We actually do want to move these later if possible too, on radix they 
use memblock for page tables, and on hash they don't even set up proper 
kernel page tables but just bolt PTEs into the hash table.

Thanks,
Nick


Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Nicholas Piggin
Excerpts from Qian Cai's message of May 9, 2020 3:41 am:
> 
> 
>> On May 8, 2020, at 10:39 AM, Qian Cai  wrote:
>> 
>> Booting POWER9 PowerNV has this message,
>> 
>> "ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
>> early_ioremap() instead”
>> 
>> but use the patch below will result in leaks because it will never call 
>> early_iounmap() anywhere. However, it looks me it was by design that 
>> phb->regs mapping would be there forever where it would be used in 
>> pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() 
>> initcall too strong?
>> 
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -36,6 +36,7 @@
>> #include 
>> #include 
>> #include 
>> +#include 
>> 
>> #include 
>> 
>> @@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
>> device_node *np,
>>/* Get registers */
>>if (!of_address_to_resource(np, 0, )) {
>>phb->regs_phys = r.start;
>> -   phb->regs = ioremap(r.start, resource_size());
>> +   phb->regs = early_ioremap(r.start, resource_size());
>>if (phb->regs == NULL)
>>pr_err("  Failed to map registers !\n”);
> 
> This will also trigger a panic with debugfs reads, so isn’t that this commit 
> bogus at least for powerpc64?

Your patch to use early_ioremap is faulting? I wonder why?

Thanks,
Nick

> 
> d538aadc2718 (“powerpc/ioremap: warn on early use of ioremap()")
> 
> 11017.617022][T122068] Faulting instruction address: 0xc00db564
> [11017.617257][T122066] Faulting instruction address: 0xc00db564
> [11017.617950][T122073] Faulting instruction address: 0xc00db564
> [11017.61][T122064] BUG: Unable to handle kernel data access on read at 
> 0xffe20e10
> [11017.618935][T122064] Faulting instruction address: 0xc00db564
> [11017.737996][T122072] 
> [11017.738010][T122073] Oops: Kernel access of bad area, sig: 11 [#2]
> [11017.738024][T122073] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 
> DEBUG_PAGEALLOC NUMA PowerNV
> [11017.738051][T122073] Modules linked in: brd ext4 crc16 mbcache jbd2 loop 
> kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci libahci tg3 mdio libata 
> libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
> [11017.738110][T122073] CPU: 108 PID: 122073 Comm: read_all Tainted: G  D 
> W 5.7.0-rc4-next-20200508+ #4
> [11017.738138][T122073] NIP:  c00db564 LR: c056f660 CTR: 
> c00db550
> [11017.738173][T122073] REGS: c00374f6f980 TRAP: 0380   Tainted: G  D 
> W  (5.7.0-rc4-next-20200508+)
> [11017.738234][T122073] MSR:  90009033   CR: 
> 22002282  XER: 2004
> [11017.738278][T122073] CFAR: c056f65c IRQMASK: 0 
> [11017.738278][T122073] GPR00: c056f660 c00374f6fc10 
> c1689400 c000201ffc41aa00 
> [11017.738278][T122073] GPR04: c00374f6fc70  
>  0001 
> [11017.738278][T122073] GPR08:  ffe2 
>  c008ee380080 
> [11017.738278][T122073] GPR12: c00db550 c000201fff671280 
>   
> [11017.738278][T122073] GPR16: 0002 10040800 
> 1001ccd8 1001cc80 
> [11017.738278][T122073] GPR20: 1001cc98 1001ccc8 
> 1001cca8 1001cb48 
> [11017.738278][T122073] GPR24:   
> 03ff 7fffebb67390 
> [11017.738278][T122073] GPR28: c00374f6fd90 c000200c0c6a7550 
>  c000200c0c6a7500 
> [11017.738542][T122073] NIP [c00db564] 
> pnv_eeh_dbgfs_get_inbB+0x14/0x30
> [11017.738579][T122073] LR [c056f660] simple_attr_read+0xa0/0x180
> [11017.738613][T122073] Call Trace:
> [11017.738645][T122073] [c00374f6fc10] [c056f630] 
> simple_attr_read+0x70/0x180 (unreliable)
> [11017.738672][T122073] [c00374f6fcb0] [c064a2e0] 
> full_proxy_read+0x90/0xe0
> [11017.738686][T122073] [c00374f6fd00] [c051fe0c] 
> __vfs_read+0x3c/0x70
> [11017.738722][T122073] [c00374f6fd20] [c051feec] 
> vfs_read+0xac/0x170
> [11017.738757][T122073] [c00374f6fd70] [c052034c] 
> ksys_read+0x7c/0x140
> [11017.738818][T122073] [c00374f6fdc0] [c0038af4] 
> system_call_exception+0x114/0x1e0
> [11017.738867][T122073] [c00374f6fe20] [c000c8f0] 
> system_call_common+0xf0/0x278
> [11017.738916][T122073] Instruction dump:
> [11017.738948][T122073] 7c0004ac f9490d10 a14d0c78 3860 b14d0c7a 4e800020 
> 6000 7c0802a6 
> [11017.739001][T122073] 6000 e9230278 e9290028 7c0004ac  
> 0c09 4c00012c 3860 


Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-09 Thread Oliver O'Halloran
On Sat, May 9, 2020 at 12:41 AM Qian Cai  wrote:
>
>  Booting POWER9 PowerNV has this message,
>
> "ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
> early_ioremap() instead”
>
> but use the patch below will result in leaks because it will never call 
> early_iounmap() anywhere. However, it looks me it was by design that 
> phb->regs mapping would be there forever where it would be used in 
> pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() initcall 
> too strong?

The warning there is junk. The PHBs are setup at boot and never torn
down so we're not "leaking" the mapping. It's supposed to be there for
the lifetime of the kernel.

That said, we could probably move the PCI setup to a point later in
boot where the normal ioremap can be be used. We would have to check
for initcalls which depend on the PHBs being setup and delay those too
though.

Oliver


Re: ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-08 Thread Qian Cai



> On May 8, 2020, at 10:39 AM, Qian Cai  wrote:
> 
> Booting POWER9 PowerNV has this message,
> 
> "ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
> early_ioremap() instead”
> 
> but use the patch below will result in leaks because it will never call 
> early_iounmap() anywhere. However, it looks me it was by design that 
> phb->regs mapping would be there forever where it would be used in 
> pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() initcall 
> too strong?
> 
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -36,6 +36,7 @@
> #include 
> #include 
> #include 
> +#include 
> 
> #include 
> 
> @@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
> device_node *np,
>/* Get registers */
>if (!of_address_to_resource(np, 0, )) {
>phb->regs_phys = r.start;
> -   phb->regs = ioremap(r.start, resource_size());
> +   phb->regs = early_ioremap(r.start, resource_size());
>if (phb->regs == NULL)
>pr_err("  Failed to map registers !\n”);

This will also trigger a panic with debugfs reads, so isn’t that this commit 
bogus at least for powerpc64?

d538aadc2718 (“powerpc/ioremap: warn on early use of ioremap()")

11017.617022][T122068] Faulting instruction address: 0xc00db564
[11017.617257][T122066] Faulting instruction address: 0xc00db564
[11017.617950][T122073] Faulting instruction address: 0xc00db564
[11017.61][T122064] BUG: Unable to handle kernel data access on read at 
0xffe20e10
[11017.618935][T122064] Faulting instruction address: 0xc00db564
[11017.737996][T122072] 
[11017.738010][T122073] Oops: Kernel access of bad area, sig: 11 [#2]
[11017.738024][T122073] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 
DEBUG_PAGEALLOC NUMA PowerNV
[11017.738051][T122073] Modules linked in: brd ext4 crc16 mbcache jbd2 loop 
kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci libahci tg3 mdio libata 
libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
[11017.738110][T122073] CPU: 108 PID: 122073 Comm: read_all Tainted: G  D W 
5.7.0-rc4-next-20200508+ #4
[11017.738138][T122073] NIP:  c00db564 LR: c056f660 CTR: 
c00db550
[11017.738173][T122073] REGS: c00374f6f980 TRAP: 0380   Tainted: G  D W 
 (5.7.0-rc4-next-20200508+)
[11017.738234][T122073] MSR:  90009033   CR: 
22002282  XER: 2004
[11017.738278][T122073] CFAR: c056f65c IRQMASK: 0 
[11017.738278][T122073] GPR00: c056f660 c00374f6fc10 
c1689400 c000201ffc41aa00 
[11017.738278][T122073] GPR04: c00374f6fc70  
 0001 
[11017.738278][T122073] GPR08:  ffe2 
 c008ee380080 
[11017.738278][T122073] GPR12: c00db550 c000201fff671280 
  
[11017.738278][T122073] GPR16: 0002 10040800 
1001ccd8 1001cc80 
[11017.738278][T122073] GPR20: 1001cc98 1001ccc8 
1001cca8 1001cb48 
[11017.738278][T122073] GPR24:   
03ff 7fffebb67390 
[11017.738278][T122073] GPR28: c00374f6fd90 c000200c0c6a7550 
 c000200c0c6a7500 
[11017.738542][T122073] NIP [c00db564] pnv_eeh_dbgfs_get_inbB+0x14/0x30
[11017.738579][T122073] LR [c056f660] simple_attr_read+0xa0/0x180
[11017.738613][T122073] Call Trace:
[11017.738645][T122073] [c00374f6fc10] [c056f630] 
simple_attr_read+0x70/0x180 (unreliable)
[11017.738672][T122073] [c00374f6fcb0] [c064a2e0] 
full_proxy_read+0x90/0xe0
[11017.738686][T122073] [c00374f6fd00] [c051fe0c] 
__vfs_read+0x3c/0x70
[11017.738722][T122073] [c00374f6fd20] [c051feec] 
vfs_read+0xac/0x170
[11017.738757][T122073] [c00374f6fd70] [c052034c] 
ksys_read+0x7c/0x140
[11017.738818][T122073] [c00374f6fdc0] [c0038af4] 
system_call_exception+0x114/0x1e0
[11017.738867][T122073] [c00374f6fe20] [c000c8f0] 
system_call_common+0xf0/0x278
[11017.738916][T122073] Instruction dump:
[11017.738948][T122073] 7c0004ac f9490d10 a14d0c78 3860 b14d0c7a 4e800020 
6000 7c0802a6 
[11017.739001][T122073] 6000 e9230278 e9290028 7c0004ac  0c09 
4c00012c 3860 
[11017.739052][T122073] ---[ end trace f68728a0d3053b5e ]---
[11017.828156][T122073] 
[11017.828170][T122068] Oops: Kernel access of bad area, sig: 11 [#3]
[11017.828184][T122068] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 
DEBUG_PAGEALLOC NUMA PowerNV
[11017.828209][T122068] Modules linked in: brd ext4 crc16 mbcache jbd2 loop 
kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci libahci tg3 mdio libata 
libphy firmwa

ioremap() called early from pnv_pci_init_ioda_phb()

2020-05-08 Thread Qian Cai
 Booting POWER9 PowerNV has this message,

"ioremap() called early from pnv_pci_init_ioda_phb+0x420/0xdfc. Use 
early_ioremap() instead”

but use the patch below will result in leaks because it will never call 
early_iounmap() anywhere. However, it looks me it was by design that phb->regs 
mapping would be there forever where it would be used in 
pnv_ioda_get_inval_reg(), so is just that check_early_ioremap_leak() initcall 
too strong?

--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -3827,7 +3828,7 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
/* Get registers */
if (!of_address_to_resource(np, 0, )) {
phb->regs_phys = r.start;
-   phb->regs = ioremap(r.start, resource_size());
+   phb->regs = early_ioremap(r.start, resource_size());
if (phb->regs == NULL)
pr_err("  Failed to map registers !\n”);

[   23.080069][T1] [ cut here ]
[   23.080089][T1] Debug warning: early ioremap leak of 10 areas detected.
[   23.080089][T1] please boot with early_ioremap_debug and report the 
dmesg.
[   23.080157][T1] WARNING: CPU: 4 PID: 1 at mm/early_ioremap.c:99 
check_early_ioremap_leak+0xd4/0x108
[   23.080171][T1] Modules linked in:
[   23.080192][T1] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 
5.7.0-rc4-next-20200508+ #4
[   23.080214][T1] NIP:  c103f2d8 LR: c103f2d4 CTR: 

[   23.080226][T1] REGS: c0003df0f860 TRAP: 0700   Not tainted  
(5.7.0-rc4-next-20200508+)
[   23.080259][T1] MSR:  90029033   CR: 
48000222  XER: 2004
[   23.080296][T1] CFAR: c010d5a8 IRQMASK: 0 
[   23.080296][T1] GPR00: c103f2d4 c0003df0faf0 
c1689400 0072 
[   23.080296][T1] GPR04: 0006  
c0003df0f7e4 0004 
[   23.080296][T1] GPR08: 001ffbb6  
c0003dee6680 0002 
[   23.080296][T1] GPR12:  c01fae00 
c1057860 c10578b0 
[   23.080296][T1] GPR16: c1002d38 c14f0660 
c14f0680 c14f06a0 
[   23.080296][T1] GPR20: c14f06c0 c14f06e0 
c14f0700 c14f0720 
[   23.080296][T1] GPR24: c0c4bc30 c00486b82000 
c15a0fe0 c15a0fc0 
[   23.080296][T1] GPR28: 0010 0010 
c1061e30 000a 
[   23.080507][T1] NIP [c103f2d8] 
check_early_ioremap_leak+0xd4/0x108
[   23.080530][T1] LR [c103f2d4] check_early_ioremap_leak+0xd0/0x108
[   23.080552][T1] Call Trace:
[   23.080571][T1] [c0003df0faf0] [c103f2d4] 
check_early_ioremap_leak+0xd0/0x108 (unreliable)
[   23.080607][T1] [c0003df0fb80] [c001130c] 
do_one_initcall+0xcc/0x660
[   23.080648][T1] [c0003df0fc80] [c1004c18] 
kernel_init_freeable+0x480/0x568
[   23.080681][T1] [c0003df0fdb0] [c0012180] 
kernel_init+0x24/0x194
[   23.080713][T1] [c0003df0fe20] [c000cb28] 
ret_from_kernel_thread+0x5c/0x74

This is from the early_ioremap_debug dmesg.

[0.00][T0] [ cut here ]
[0.00][T0] __early_ioremap(0x000600c3c001, 0001) [0] => 
 + ffbe
[0.00][T0] WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:162 
__early_ioremap+0x2d8/0x408
[0.00][T0] Modules linked in:
[0.00][T0] CPU: 0 PID: 0 Comm: swapper Not tainted 
5.7.0-rc4-next-20200508+ #4
[0.00][T0] NIP:  c103f5e4 LR: c103f5e0 CTR: 
c01e77f0
[0.00][T0] REGS: c168f980 TRAP: 0700   Not tainted  
(5.7.0-rc4-next-20200508+)
[0.00][T0] MSR:  90021033   CR: 
28000248  XER: 2004
[0.00][T0] CFAR: c010d5a8 IRQMASK: 1 
[0.00][T0] GPR00: c103f5e0 c168fc10 
c1689400 0050 
[0.00][T0] GPR04: c152f6f8  
c168f904  
[0.00][T0] GPR08:   
c162f600 0002 
[0.00][T0] GPR12: c01e77f0 c5b3 
  
[0.00][T0] GPR16:   
 1000 
[0.00][T0] GPR20:  81ae 
  
[0.00][T0] GPR24: 0001 c1061da8 
0008 0008 
[0.00][T0] GPR28:  c1061db0 
 c1061eb8 
[0.00][T0] NIP [c103f5e4] __early_ioremap+0x2d8/0x4