Re: Race with ip=dhcp bootparameter in ip_rcv_finish on am335x

2015-09-23 Thread Nikolay Aleksandrov
On 09/23/2015 12:29 PM, Alexander Aring wrote:
> Hi,
> 
> I am using am335x (beaglebone white) with "ip=dhcp" inside my
> bootparameters. It seems there exists some race, rarely it boots fine
> and often it crashed with following messages:
> 
> First (Without successful dhcp ip assign and is very rarely):
> 
> [6.249083] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [6.398943] .
> [7.400822] Unable to handle kernel paging request at virtual address 
> 7402a8c8
> [7.408419] pgd = c0004000
> [7.411351] [7402a8c8] *pgd=
> [7.415121] Internal error: Oops: 5 [#1] SMP ARM
> [7.419968] Modules linked in:
> [7.423197] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW   
> 4.2.0-11195-ge81be90 #124
> [7.432118] Hardware name: Generic AM33XX (Flattened Device Tree)
> [7.438509] task: c0a996c8 ti: c0a92000 task.ti: c0a92000
> [7.444189] PC is at ip_route_input_noref+0x620/0xeb4
> [7.449491] LR is at trace_hardirqs_on_caller+0xb0/0x1e8
<<>>
> [7.759169] [] (ip_route_input_noref) from [] 
> (ip_rcv_finish+0xd4/0x67c)
<<>>
> 
> 
> 
> I tried to bisect this issue without success, sometimes it's booting fine
> and sometimes I got the above messages. Could anybody help me here to
> fixing this issue?
> 
> - Alex

Hi Alex,
This should be fixed by commit bde6f9ded1bd 
("net: Initialize table in fib result") in net-next.

Cheers,
 Nik


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Race with ip=dhcp bootparameter in ip_rcv_finish on am335x

2015-09-23 Thread Alexander Aring
Hi,

I am using am335x (beaglebone white) with "ip=dhcp" inside my
bootparameters. It seems there exists some race, rarely it boots fine
and often it crashed with following messages:

First (Without successful dhcp ip assign and is very rarely):

[6.249083] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[6.398943] .
[7.400822] Unable to handle kernel paging request at virtual address 
7402a8c8
[7.408419] pgd = c0004000
[7.411351] [7402a8c8] *pgd=
[7.415121] Internal error: Oops: 5 [#1] SMP ARM
[7.419968] Modules linked in:
[7.423197] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW   
4.2.0-11195-ge81be90 #124
[7.432118] Hardware name: Generic AM33XX (Flattened Device Tree)
[7.438509] task: c0a996c8 ti: c0a92000 task.ti: c0a92000
[7.444189] PC is at ip_route_input_noref+0x620/0xeb4
[7.449491] LR is at trace_hardirqs_on_caller+0xb0/0x1e8
[7.455055] pc : []lr : []psr: 2113
[7.455055] sp : c0a93c70  ip : 000f  fp : ff8f
[7.467072] r10: c0b35c80  r9 :   r8 : ce5f9400
[7.472543] r7 : ce5ef000  r6 :   r5 : ce615640  r4 : ce5ffd40
[7.479378] r3 : 7402a8c0  r2 : 0001  r1 : c0549f74  r0 : ce5ffd40
[7.486216] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[7.493686] Control: 10c5387d  Table: 80004019  DAC: 0051
[7.499699] Process swapper/0 (pid: 0, stack limit = 0xc0a92218)
[7.505989] Stack: (0xc0a93c70 to 0xc0a94000)
[7.510561] 3c60:   
c054afa4 
[7.519131] 3c80:  c0b56080    00070044 
0004 
[7.527705] 3ca0: 7402a8c0 012d  0004   
c0586298 
[7.536278] 3cc0:   0102a8c0 7402a8c0 cfba6c34  
c0a996c8 ce615640
[7.544852] 3ce0: c0551310 ce61b850 c07c6b10 c0b560b8 c0a95744  
 c0550668
[7.553422] 3d00: ce5ef000 c0a5ed00 6113 c0090d44 0001 ce615640 
c04f85d0 0050
[7.561996] 3d20: ce61b800 ce5ef000 c0a95744   c0551310 
0001 
[7.570563] 3d40: c04f85d0 ce000140 ce638100 0050 ce61b800 cfba6b10 
ce61b99a ce61b99a
[7.579135] 3d60: c0a82730 c0a92000 ce61b850 c0a5ed94 0100 0102a8c0 
c0a996c8 c0a9791c
[7.587709] 3d80: ce5ef000 ce5ef048 0008 ce615640 ce5ef05c ce6156ec 
ce615640 c04f85d0
[7.596289] 3da0: c0ad3f0c  c0a92000 ce5ef054  c04fb4e8 
6113 ce615640
[7.604869] 3dc0: ce5ef05c   c04fb4e8   
c0b5697c ce6381c0
[7.613445] 3de0: 0020 ce615640 c0b55f7e ce615640 ce5ef5c0 0158 
ce5f4550 8db6
[7.622027] 3e00: 0100 c04fb534   c04fb4e8 ce5ef000 
 4113
[7.630603] 3e20: ce64f900 c04ed184 ce64f942 ce5ef000 ce5ef000 ce5ef000 
ce6381c0 c041fec4
[7.639172] 3e40: 0001 d0978000 ce5f2c50 0001 d0978000 ce5f2c70 
0158 c041d35c
[7.647750] 3e60: 0001  ce5f2c50 0040 cfb9f0c0 c0a93eb0 
0040 c041d3d8
[7.656322] 3e80: ce5ef5e8 ce5ef5e8 0040 c0421f44 ce5ef5e8 0001 
012c c04fcd3c
[7.664892] 3ea0: 6193 0f10e000 cfb9f0c0 c0a910c0 c0a93eb0 c0a93eb0 
c0a93eb8 c0a93eb8
[7.673466] 3ec0: c0b5697c c0a9408c  0008 0003 c0b565c4 
0004 000c
[7.682039] 3ee0: 0100 c0042614 c0a996c8 ce199900 000a 8db5 
0020 0001
[7.690602] 3f00: 00aa c0a8e828  c0a95744  0001 
ce017000 
[7.699170] 3f20: c077471c c0042b3c c0a8e828 c009bb5c c0a93f50 c0010218 
2113 
[7.707733] 3f40: c0a93f84 c0b56e14  c076cef8 0001 0001 
 c0a996c8
[7.716309] 3f60: c0a92000 c0a94b2c  c0a94acc c0b56e14  
 c077471c
[7.724881] 3f80:  c0a93fa0 c008e21c c0010218 2113  
0051 c0010214
[7.733453] 3fa0:  c0083cb4 c0a8c4e4 c0a8f8c8 c0b64000 c0a94a00 
 c0a19c1c
[7.742025] 3fc0:    c0a1967c  c0a76a48 
c0b64214 c0a94ab0
[7.750588] 3fe0: c0a76a44 c0a9b258 80004059 413fc082  8000807c 
 
[7.759169] [] (ip_route_input_noref) from [] 
(ip_rcv_finish+0xd4/0x67c)
[7.768010] [] (ip_rcv_finish) from [] 
(ip_rcv+0x490/0x72c)
[7.775672] [] (ip_rcv) from [] 
(__netif_receive_skb_core+0x240/0x97c)
[7.784339] [] (__netif_receive_skb_core) from [] 
(netif_receive_skb_internal+0x5c/0x1d8)
[7.794742] [] (netif_receive_skb_internal) from [] 
(cpsw_rx_handler+0xac/0x1e8)
[7.804318] [] (cpsw_rx_handler) from [] 
(__cpdma_chan_process+0xe8/0x128)
[7.813346] [] (__cpdma_chan_process) from [] 
(cpdma_chan_process+0x3c/0x5c)
[7.822544] [] (cpdma_chan_process) from [] 
(cpsw_rx_poll+0x14/0x94)
[7.831024] [] (cpsw_rx_poll) from [] 
(net_rx_action+0x1dc/0x33c)
[7.839232] [] (net_rx_action) from [] 
(__do_softirq+0x138/0x340)
[

Re: Race with ip=dhcp bootparameter in ip_rcv_finish on am335x

2015-09-23 Thread Steven Rostedt
On Wed, 23 Sep 2015 14:27:15 +0200
Alexander Aring  wrote:

\
> Finally I figured out that commit
> 4876cc779ff525b9c2376d8076edf47815e71f2c ("SUNRPC: Ensure we release the
> TCP socket once it has been closed") occur this races. After reverting
> this commit everything works fine.

Yay! Everybody sing! "Fix one bug and replace it by another..."

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Race with ip=dhcp bootparameter in ip_rcv_finish on am335x

2015-09-23 Thread Trond Myklebust
+linux-nfs mailing list

On Wed, Sep 23, 2015 at 8:27 AM, Alexander Aring  wrote:
> Hi,
>
> On Wed, Sep 23, 2015 at 01:57:57PM +0200, Alexander Aring wrote:
> ...
>> >
>>
>> Ok, I think I have two issues with two different races the first one was
>> fixed by bde6f9ded1bd ("net: Initialize table in fib result"), but the
>> second one is still there:
>>
>> [8.615806] [ cut here ]
>> [8.620678] Kernel BUG at c016c3d0 [verbose debug info unavailable]
>> [8.627229] Internal error: Oops - BUG: 0 [#1] SMP ARM
>> [8.632611] Modules linked in:
>> [8.635836] CPU: 0 PID: 766 Comm: kworker/0:1H Tainted: GW   
>> 4.2.0-11248-gfbd0351 #140
>> [8.645208] Hardware name: Generic AM33XX (Flattened Device Tree)
>> [8.651616] Workqueue: rpciod xprt_autoclose
>> [8.656091] task: ce3c52c0 ti: ce642000 task.ti: ce642000
>> [8.661744] PC is at iput+0x1a8/0x1f0
>> [8.665579] LR is at xprt_autoclose+0x2c/0x54
>> [8.670136] pc : []lr : []psr: 2113
>> [8.670136] sp : ce643e80  ip :   fp : c0b56688
>> [8.682133] r10: 0001  r9 : ce643ec8  r8 : 
>> [8.687599] r7 : feff3000  r6 : ce615800  r5 : ce615bc0  r4 : ce615b54
>> [8.694421] r3 : 0060  r2 : 000f  r1 : 0f10e000  r0 : cdbed720
>> [8.701254] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
>> none
>> [8.708718] Control: 10c5387d  Table: 80004019  DAC: 0051
>> [8.714732] Process kworker/0:1H (pid: 766, stack limit = 0xce642218)
>> [8.721464] Stack: (0xce643e80 to 0xce644000)
>> [8.726033] 3e80: c066f828 ce615b54 ce615bc0 ce615800 feff3000  
>> ce643ec8 c066c884
>> [8.734596] 3ea0: ce615b54 ce5ff440 cfb9e340 c0057928 0001  
>> c00578b4 cfb9e340
>> [8.743152] 3ec0: c0057cc8  c137972c c0cc1960  c09979f4 
>> cfb9e340 cfb9e340
>> [8.751714] 3ee0: ce5ff458 cfb9e370 ce642000 0008 c0b55ba0 ce5ff440 
>> cfb9e340 c0057c54
>> [8.760274] 3f00: ce659940 ce5ff440 c0057c18  ce659940 ce5ff440 
>> c0057c18 
>> [8.768834] 3f20:    c005d918 c0b5697c  
>>  ce5ff440
>> [8.777390] 3f40:   dead4ead   c0b65d60 
>>  
>> [8.785951] 3f60: c0922088 ce643f64 ce643f64   dead4ead 
>>  
>> [8.794513] 3f80: c0b65d60   c0922088 ce643f90 ce643f90 
>> ce643fac ce659940
>> [8.803069] 3fa0: c005d844   c000f770   
>>  
>> [8.811628] 3fc0:       
>>  
>> [8.820185] 3fe0:     0013  
>> 8fdf6861 8fdf6c61
>> [8.828741] [] (iput) from [] 
>> (xprt_autoclose+0x2c/0x54)
>> [8.836133] [] (xprt_autoclose) from [] 
>> (process_one_work+0x19c/0x48c)
>> [8.844784] [] (process_one_work) from [] 
>> (worker_thread+0x3c/0x4a0)
>> [8.853256] [] (worker_thread) from [] 
>> (kthread+0xd4/0xf0)
>> [8.860827] [] (kthread) from [] 
>> (ret_from_fork+0x14/0x24)
>> [8.868387] Code: e59f0044 e59f1044 ebfb467a eac1 (e7f001f2)
>
> Additional missing information is that I am booting via nfsroot and
> xprt_autoclose is something from sunrpc.
>
> Finally I figured out that commit
> 4876cc779ff525b9c2376d8076edf47815e71f2c ("SUNRPC: Ensure we release the
> TCP socket once it has been closed") occur this races. After reverting
> this commit everything works fine.
>
> I added now:
>
> Steven Rostedt 
> Trond Myklebust 
>
> to cc to report about this issue.
>

Is that happening when the transport is being torn down? If so, is it
fixed by 
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=79234c3db6842a3de03817211d891e0c2878f756
?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Race with ip=dhcp bootparameter in ip_rcv_finish on am335x

2015-09-23 Thread Alexander Aring
Hi,

On Wed, Sep 23, 2015 at 12:36:13PM +0200, Nikolay Aleksandrov wrote:
> On 09/23/2015 12:29 PM, Alexander Aring wrote:
> > Hi,
> > 
> > I am using am335x (beaglebone white) with "ip=dhcp" inside my
> > bootparameters. It seems there exists some race, rarely it boots fine
> > and often it crashed with following messages:
> > 
> > First (Without successful dhcp ip assign and is very rarely):
> > 
> > [6.249083] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> > [6.398943] .
> > [7.400822] Unable to handle kernel paging request at virtual address 
> > 7402a8c8
> > [7.408419] pgd = c0004000
> > [7.411351] [7402a8c8] *pgd=
> > [7.415121] Internal error: Oops: 5 [#1] SMP ARM
> > [7.419968] Modules linked in:
> > [7.423197] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW   
> > 4.2.0-11195-ge81be90 #124
> > [7.432118] Hardware name: Generic AM33XX (Flattened Device Tree)
> > [7.438509] task: c0a996c8 ti: c0a92000 task.ti: c0a92000
> > [7.444189] PC is at ip_route_input_noref+0x620/0xeb4
> > [7.449491] LR is at trace_hardirqs_on_caller+0xb0/0x1e8
> <<>>
> > [7.759169] [] (ip_route_input_noref) from [] 
> > (ip_rcv_finish+0xd4/0x67c)
> <<>>
> > 
> > 
> > 
> > I tried to bisect this issue without success, sometimes it's booting fine
> > and sometimes I got the above messages. Could anybody help me here to
> > fixing this issue?
> > 
> > - Alex
> 
> Hi Alex,
> This should be fixed by commit bde6f9ded1bd 
> ("net: Initialize table in fib result") in net-next.
> 

I tried to cherry-picked this commit and saw this is already in net-next.
I also tried to boot current net, the race is still there.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Race with ip=dhcp bootparameter in ip_rcv_finish on am335x

2015-09-23 Thread Alexander Aring
On Wed, Sep 23, 2015 at 12:36:13PM +0200, Nikolay Aleksandrov wrote:
> On 09/23/2015 12:29 PM, Alexander Aring wrote:
> > Hi,
> > 
> > I am using am335x (beaglebone white) with "ip=dhcp" inside my
> > bootparameters. It seems there exists some race, rarely it boots fine
> > and often it crashed with following messages:
> > 
> > First (Without successful dhcp ip assign and is very rarely):
> > 
> > [6.249083] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> > [6.398943] .
> > [7.400822] Unable to handle kernel paging request at virtual address 
> > 7402a8c8
> > [7.408419] pgd = c0004000
> > [7.411351] [7402a8c8] *pgd=
> > [7.415121] Internal error: Oops: 5 [#1] SMP ARM
> > [7.419968] Modules linked in:
> > [7.423197] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW   
> > 4.2.0-11195-ge81be90 #124
> > [7.432118] Hardware name: Generic AM33XX (Flattened Device Tree)
> > [7.438509] task: c0a996c8 ti: c0a92000 task.ti: c0a92000
> > [7.444189] PC is at ip_route_input_noref+0x620/0xeb4
> > [7.449491] LR is at trace_hardirqs_on_caller+0xb0/0x1e8
> <<>>
> > [7.759169] [] (ip_route_input_noref) from [] 
> > (ip_rcv_finish+0xd4/0x67c)
> <<>>
> > 
> > 
> > 
> > I tried to bisect this issue without success, sometimes it's booting fine
> > and sometimes I got the above messages. Could anybody help me here to
> > fixing this issue?
> > 
> > - Alex
> 
> Hi Alex,
> This should be fixed by commit bde6f9ded1bd 
> ("net: Initialize table in fib result") in net-next.
> 

Ok, I think I have two issues with two different races the first one was
fixed by bde6f9ded1bd ("net: Initialize table in fib result"), but the
second one is still there:

[8.615806] [ cut here ]
[8.620678] Kernel BUG at c016c3d0 [verbose debug info unavailable]
[8.627229] Internal error: Oops - BUG: 0 [#1] SMP ARM
[8.632611] Modules linked in:
[8.635836] CPU: 0 PID: 766 Comm: kworker/0:1H Tainted: GW   
4.2.0-11248-gfbd0351 #140
[8.645208] Hardware name: Generic AM33XX (Flattened Device Tree)
[8.651616] Workqueue: rpciod xprt_autoclose
[8.656091] task: ce3c52c0 ti: ce642000 task.ti: ce642000
[8.661744] PC is at iput+0x1a8/0x1f0
[8.665579] LR is at xprt_autoclose+0x2c/0x54
[8.670136] pc : []lr : []psr: 2113
[8.670136] sp : ce643e80  ip :   fp : c0b56688
[8.682133] r10: 0001  r9 : ce643ec8  r8 : 
[8.687599] r7 : feff3000  r6 : ce615800  r5 : ce615bc0  r4 : ce615b54
[8.694421] r3 : 0060  r2 : 000f  r1 : 0f10e000  r0 : cdbed720
[8.701254] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[8.708718] Control: 10c5387d  Table: 80004019  DAC: 0051
[8.714732] Process kworker/0:1H (pid: 766, stack limit = 0xce642218)
[8.721464] Stack: (0xce643e80 to 0xce644000)
[8.726033] 3e80: c066f828 ce615b54 ce615bc0 ce615800 feff3000  
ce643ec8 c066c884
[8.734596] 3ea0: ce615b54 ce5ff440 cfb9e340 c0057928 0001  
c00578b4 cfb9e340
[8.743152] 3ec0: c0057cc8  c137972c c0cc1960  c09979f4 
cfb9e340 cfb9e340
[8.751714] 3ee0: ce5ff458 cfb9e370 ce642000 0008 c0b55ba0 ce5ff440 
cfb9e340 c0057c54
[8.760274] 3f00: ce659940 ce5ff440 c0057c18  ce659940 ce5ff440 
c0057c18 
[8.768834] 3f20:    c005d918 c0b5697c  
 ce5ff440
[8.777390] 3f40:   dead4ead   c0b65d60 
 
[8.785951] 3f60: c0922088 ce643f64 ce643f64   dead4ead 
 
[8.794513] 3f80: c0b65d60   c0922088 ce643f90 ce643f90 
ce643fac ce659940
[8.803069] 3fa0: c005d844   c000f770   
 
[8.811628] 3fc0:       
 
[8.820185] 3fe0:     0013  
8fdf6861 8fdf6c61
[8.828741] [] (iput) from [] (xprt_autoclose+0x2c/0x54)
[8.836133] [] (xprt_autoclose) from [] 
(process_one_work+0x19c/0x48c)
[8.844784] [] (process_one_work) from [] 
(worker_thread+0x3c/0x4a0)
[8.853256] [] (worker_thread) from [] 
(kthread+0xd4/0xf0)
[8.860827] [] (kthread) from [] 
(ret_from_fork+0x14/0x24)
[8.868387] Code: e59f0044 e59f1044 ebfb467a eac1 (e7f001f2) 
[8.874771] ---[ end trace 6c5e5808fb5c8423 ]---
[8.880045] Unable to handle kernel paging request at virtual address 
ffd0
[8.887603] pgd = c0004000
[8.890438] [ffd0] *pgd=8fdf6861, *pte=, *ppte=
[8.897034] Internal error: Oops: 37 [#2] SMP ARM
[8.901950] Modules linked in:
[8.905165] CPU: 0 PID: 766 Comm: kworker/0:1H Tainted: G  D W   
4.2.0-11248-gfbd0351 #140
[8.914530] Hardware name: Generic AM33XX (Flattened Device Tree)
[8.920923] task: ce3c52c0 ti: ce642000 task.ti: ce642000
[

Re: Race with ip=dhcp bootparameter in ip_rcv_finish on am335x

2015-09-23 Thread Alexander Aring
Hi,

On Wed, Sep 23, 2015 at 09:16:00AM -0400, Trond Myklebust wrote:
...
> 
> Is that happening when the transport is being torn down? If so, is it
> fixed by 
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=79234c3db6842a3de03817211d891e0c2878f756
> ?

thanks. This patch fixed my issue.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html