Re: ehea crash on boot

2016-10-11 Thread Jan Stancek




- Original Message -
> From: "Michael Ellerman" <m...@ellerman.id.au>
> To: "Jan Stancek" <jstan...@redhat.com>, "Denis Kirjanov" 
> <k...@linux-powerpc.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Sent: Tuesday, 11 October, 2016 7:46:31 AM
> Subject: Re: ehea crash on boot
> 
> Jan Stancek <jstan...@redhat.com> writes:
> 
> > Hi Denis / all,
> >
> > Do you know if there is a patch or lead for this problem? I seem
> > to be hitting same Oops with P730 lpar when running 4.8 (see below),
> > but 4.7.7 looks OK.
> 
> Does this fix it?

Yes, it does. dmesg looks clean and network is up.

Regards,
Jan

> 
> cheers
> 
> 
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 4cebc31e53de..4e83d872872d 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -526,7 +526,7 @@ static bool might_have_hea(void)
>  	 */
>  #ifdef CONFIG_IBMEBUS
>  	return !cpu_has_feature(CPU_FTR_ARCH_207S) &&
> -		!firmware_has_feature(FW_FEATURE_SPLPAR);
> +		firmware_has_feature(FW_FEATURE_SPLPAR);
>  #else
>  	return false;
>  #endif
> 


Re: ehea crash on boot

2016-10-10 Thread Michael Ellerman
Jan Stancek writes:

> Hi Denis / all,
>
> Do you know if there is a patch or lead for this problem? I seem
> to be hitting same Oops with P730 lpar when running 4.8 (see below),
> but 4.7.7 looks OK.

Does this fix it?

cheers


diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 4cebc31e53de..4e83d872872d 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -526,7 +526,7 @@ static bool might_have_hea(void)
 	 */
 #ifdef CONFIG_IBMEBUS
 	return !cpu_has_feature(CPU_FTR_ARCH_207S) &&
-		!firmware_has_feature(FW_FEATURE_SPLPAR);
+		firmware_has_feature(FW_FEATURE_SPLPAR);
 #else
 	return false;
 #endif


Re: ehea crash on boot

2016-10-10 Thread Jan Stancek
Hi Denis / all,

Do you know if there is a patch or lead for this problem? I seem
to be hitting same Oops with P730 lpar when running 4.8 (see below),
but 4.7.7 looks OK.

Regards,
Jan

[8.698424] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[8.713373] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[8.713940] mm: Hashing failure ! EA=0xd80080004040 access=0x800e current=NetworkManager
[8.713949] trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003cc033e701ae
[8.713958] mm: Hashing failure ! EA=0xd80080004040 access=0x800e current=NetworkManager
[8.713966] trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003cc033e701ae
[8.713979] Unable to handle kernel paging request for data at address 0xd80080004040
[8.713985] Faulting instruction address: 0xd11cc250
[8.713992] Oops: Kernel access of bad area, sig: 7 [#1]
[8.713996] SMP NR_CPUS=2048 NUMA pSeries
[8.714008] Modules linked in: sg uio_pdrv_genirq uio nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi scsi_transport_srp ibmveth ehea dm_mirror dm_region_hash dm_log dm_mod
[8.714063] CPU: 2 PID: 1148 Comm: NetworkManager Not tainted 4.8.0-1.el7.test.ppc64.debug #1
[8.714072] task: c65e2080 task.stack: c6668000
[8.714078] NIP: d11cc250 LR: d11cc118 CTR: 0042c120
[8.714086] REGS: c666ab00 TRAP: 0300   Not tainted  (4.8.0-1.el7.test.ppc64.debug)
[8.714092] MSR: 80009032   CR: 24288442  XER: 0020
[8.714120] CFAR: c00087d0 DAR: d80080004040 DSISR: 4200 SOFTE: 1
GPR00: d11cc118 c666ad80 d11dbdd8 c6327f80
GPR04:  c000b080 00029000 00028000
GPR08: c000b080  d80080004000 d953
GPR12: 8001 cea61200
GPR16: 07fe  0001
GPR20: c000b53ecbd0 c000b53ecb00 c000b53ec1e8 c000b53ec1d0
GPR24: c000b53ec1b8 c000b53ec200  0015
GPR28: 09fd c000bbb59418 0028 c6327f80
[8.714254] NIP [d11cc250] .ehea_create_cq+0x280/0x340 [ehea]
[8.714263] LR [d11cc118] .ehea_create_cq+0x148/0x340 [ehea]
[8.714270] Call Trace:
[8.714278] [c666ad80] [d11cc118] .ehea_create_cq+0x148/0x340 [ehea] (unreliable)
[8.714292] [c666ae30] [d11c5e28] .ehea_up+0x258/0x1200 [ehea]
[8.714304] [c666afa0] [d11c6e14] .ehea_open+0x44/0x1a0 [ehea]
[8.714316] [c666b030] [c09bc4c4] .__dev_open+0x164/0x310
[8.714328] [c666b0d0] [c09c6998] .__dev_change_flags+0x158/0x4f0
[8.714339] [c666b180] [c09c6d5c] .dev_change_flags+0x2c/0x220
[8.714349] [c666b220] [c09e2d3c] .do_setlink+0x38c/0xef0
[8.714359] [c666b3a0] [c09e65cc] .rtnl_newlink+0x97c/0xb10
[8.714369] [c666b6b0] [c09e4ae4] .rtnetlink_rcv_msg+0xc4/0x380
[8.714379] [c666b7a0] [c0a1c05c] .netlink_rcv_skb+0x12c/0x150
[8.714388] [c666b830] [c09e1b68] .rtnetlink_rcv+0x38/0x60
[8.714396] [c666b8b0] [c0a1bb74] .netlink_unicast+0x554/0x6b0
[8.714405] [c666b990] [c0a1cbcc] .netlink_sendmsg+0x41c/0x490
[8.714415] [c666ba70] [c0986e18] .___sys_sendmsg+0x278/0x370
[8.714425] [c666bc50] [c09892d4] .SyS_sendmsg+0xc4/0x130
[8.714436] [c666bd50] [c098a180] .SyS_socketcall+0x3d0/0x4e0
[8.714448] [c666be30] [c0009590] system_call+0x38/0xec
[8.714455] Instruction dump:
[8.714462] 38a1 4bffe7fd 6000 7fe3fb78 48003081 e8410028 3860 4830
[8.714484] e95f0038 3920 7fe3fb78 f93f0010  3920 79290004 e95f0038
[8.714506] ---[ end trace fe4fbc224578dd0c ]---


Re: ehea crash on boot

2016-09-26 Thread Denis Kirjanov
On Monday, September 26, 2016, Mathieu Malaterre <
mathieu.malate...@gmail.com> wrote:

> On Fri, Sep 23, 2016 at 2:50 PM, Denis Kirjanov wrote:
> > Heh, another thing to debug :)
> >
> > mm: Hashing failure ! EA=0xd80080124040 access=0x800e
> > current=NetworkManager
> > trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003bc0300301ae
> > mm: Hashing failure ! EA=0xd80080124040 access=0x800e
> > current=NetworkManager
> > trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003bc0300301ae
> > Unable to handle kernel paging request for data at address 0xd80080124040
> > Faulting instruction address: 0xc06f21a0
> > cpu 0x8: Vector: 300 (Data Access) at [c005a8b92b50]
> > pc: c06f21a0: .ehea_create_cq+0x160/0x230
> > lr: c06f2164: .ehea_create_cq+0x124/0x230
> > sp: c005a8b92dd0
> > msr: 80009032
> > dar: d80080124040
> > dsisr: 4200
> > current = 0xc005a8b68200
> > paca = 0xcea94000 softe: 0 irq_happened: 0x01
> > pid = 6787, comm = NetworkManager
> > Linux version 4.8.0-rc6-00214-g4cea877 (kda@ps700) (gcc version 4.8.5
> > 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Fri Sep 23 15:01:08 MSK 2016
> > enter ? for help
> > [c005a8b92dd0] c06f2140 .ehea_create_cq+0x100/0x230 (unreliable)
> > [c005a8b92e70] c06ed448 .ehea_up+0x288/0xed0
> > [c005a8b92fe0] c06ee314 .ehea_open+0x44/0x130
> > [c005a8b93070] c0812324 .__dev_open+0x154/0x220
> > [c005a8b93110] c0812734 .__dev_change_flags+0xd4/0x1e0
> > [c005a8b931b0] c081286c .dev_change_flags+0x2c/0x80
> > [c005a8b93240] c0829f0c .do_setlink+0x37c/0xe50
> > [c005a8b933c0] c082c884 .rtnl_newlink+0x5e4/0x9b0
> > [c005a8b936d0] c082cd08 .rtnetlink_rcv_msg+0xb8/0x2f0
> > [c005a8b937a0] c084e25c .netlink_rcv_skb+0x12c/0x150
> > [c005a8b93830] c0829458 .rtnetlink_rcv+0x38/0x60
> > [c005a8b938b0] c084d814 .netlink_unicast+0x1e4/0x350
> > [c005a8b93960] c084def8 .netlink_sendmsg+0x418/0x480
> > [c005a8b93a40] c07defac .sock_sendmsg+0x2c/0x60
> > [c005a8b93ab0] c07e0cbc .___sys_sendmsg+0x30c/0x320
> > [c005a8b93c90] c07e21bc .__sys_sendmsg+0x4c/0xb0
> > [c005a8b93d80] c07e2dec .SyS_socketcall+0x34c/0x3d0
> > [c005a8b93e30] c000946c system_call+0x38/0x108
>
> Can you turn UBSAN on for this ?


I'll get back to the problem and send a fix when I finish my trip.


> --
> Mathieu
>


Re: ehea crash on boot

2016-09-26 Thread Mathieu Malaterre
On Fri, Sep 23, 2016 at 2:50 PM, Denis Kirjanov wrote:
> Heh, another thing to debug :)
>
> mm: Hashing failure ! EA=0xd80080124040 access=0x800e
> current=NetworkManager
> trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003bc0300301ae
> mm: Hashing failure ! EA=0xd80080124040 access=0x800e
> current=NetworkManager
> trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003bc0300301ae
> Unable to handle kernel paging request for data at address 0xd80080124040
> Faulting instruction address: 0xc06f21a0
> cpu 0x8: Vector: 300 (Data Access) at [c005a8b92b50]
> pc: c06f21a0: .ehea_create_cq+0x160/0x230
> lr: c06f2164: .ehea_create_cq+0x124/0x230
> sp: c005a8b92dd0
> msr: 80009032
> dar: d80080124040
> dsisr: 4200
> current = 0xc005a8b68200
> paca = 0xcea94000 softe: 0 irq_happened: 0x01
> pid = 6787, comm = NetworkManager
> Linux version 4.8.0-rc6-00214-g4cea877 (kda@ps700) (gcc version 4.8.5
> 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Fri Sep 23 15:01:08 MSK 2016
> enter ? for help
> [c005a8b92dd0] c06f2140 .ehea_create_cq+0x100/0x230 (unreliable)
> [c005a8b92e70] c06ed448 .ehea_up+0x288/0xed0
> [c005a8b92fe0] c06ee314 .ehea_open+0x44/0x130
> [c005a8b93070] c0812324 .__dev_open+0x154/0x220
> [c005a8b93110] c0812734 .__dev_change_flags+0xd4/0x1e0
> [c005a8b931b0] c081286c .dev_change_flags+0x2c/0x80
> [c005a8b93240] c0829f0c .do_setlink+0x37c/0xe50
> [c005a8b933c0] c082c884 .rtnl_newlink+0x5e4/0x9b0
> [c005a8b936d0] c082cd08 .rtnetlink_rcv_msg+0xb8/0x2f0
> [c005a8b937a0] c084e25c .netlink_rcv_skb+0x12c/0x150
> [c005a8b93830] c0829458 .rtnetlink_rcv+0x38/0x60
> [c005a8b938b0] c084d814 .netlink_unicast+0x1e4/0x350
> [c005a8b93960] c084def8 .netlink_sendmsg+0x418/0x480
> [c005a8b93a40] c07defac .sock_sendmsg+0x2c/0x60
> [c005a8b93ab0] c07e0cbc .___sys_sendmsg+0x30c/0x320
> [c005a8b93c90] c07e21bc .__sys_sendmsg+0x4c/0xb0
> [c005a8b93d80] c07e2dec .SyS_socketcall+0x34c/0x3d0
> [c005a8b93e30] c000946c system_call+0x38/0x108

Can you turn UBSAN on for this ?

-- 
Mathieu


ehea crash on boot

2016-09-23 Thread Denis Kirjanov
Heh, another thing to debug :)

mm: Hashing failure ! EA=0xd80080124040 access=0x800e
current=NetworkManager
trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003bc0300301ae
mm: Hashing failure ! EA=0xd80080124040 access=0x800e
current=NetworkManager
trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003bc0300301ae
Unable to handle kernel paging request for data at address 0xd80080124040
Faulting instruction address: 0xc06f21a0
cpu 0x8: Vector: 300 (Data Access) at [c005a8b92b50]
pc: c06f21a0: .ehea_create_cq+0x160/0x230
lr: c06f2164: .ehea_create_cq+0x124/0x230
sp: c005a8b92dd0
msr: 80009032
dar: d80080124040
dsisr: 4200
current = 0xc005a8b68200
paca = 0xcea94000 softe: 0 irq_happened: 0x01
pid = 6787, comm = NetworkManager
Linux version 4.8.0-rc6-00214-g4cea877 (kda@ps700) (gcc version 4.8.5
20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Fri Sep 23 15:01:08 MSK 2016
enter ? for help
[c005a8b92dd0] c06f2140 .ehea_create_cq+0x100/0x230 (unreliable)
[c005a8b92e70] c06ed448 .ehea_up+0x288/0xed0
[c005a8b92fe0] c06ee314 .ehea_open+0x44/0x130
[c005a8b93070] c0812324 .__dev_open+0x154/0x220
[c005a8b93110] c0812734 .__dev_change_flags+0xd4/0x1e0
[c005a8b931b0] c081286c .dev_change_flags+0x2c/0x80
[c005a8b93240] c0829f0c .do_setlink+0x37c/0xe50
[c005a8b933c0] c082c884 .rtnl_newlink+0x5e4/0x9b0
[c005a8b936d0] c082cd08 .rtnetlink_rcv_msg+0xb8/0x2f0
[c005a8b937a0] c084e25c .netlink_rcv_skb+0x12c/0x150
[c005a8b93830] c0829458 .rtnetlink_rcv+0x38/0x60
[c005a8b938b0] c084d814 .netlink_unicast+0x1e4/0x350
[c005a8b93960] c084def8 .netlink_sendmsg+0x418/0x480
[c005a8b93a40] c07defac .sock_sendmsg+0x2c/0x60
[c005a8b93ab0] c07e0cbc .___sys_sendmsg+0x30c/0x320
[c005a8b93c90] c07e21bc .__sys_sendmsg+0x4c/0xb0
[c005a8b93d80] c07e2dec .SyS_socketcall+0x34c/0x3d0
[c005a8b93e30] c000946c system_call+0x38/0x108