Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-21 Thread Butler, Peter
, and the process remained forever frozen in the 'D' state and the card had to be rebooted. -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-21-17 3:36 PM To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists.sourceforge.net Subject: RE

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-22 Thread Butler, Peter
If you have any suggestions as to procedures/tricks you think might trigger this bug I can certainly attempt to do so in the lab. Obviously we can't attempt to reproduce it on the customer's (live) system. -Original Message- From: Butler, Peter Sent: February-21-17 3:39 PM To: Jon

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
rguments to function 'udp_tunnel6_xmit_skb' include/net/udp_tunnel.h:87:5: note: declared here make[1]: *** [net/tipc/udp_media.o] Error 1 make: *** [net/tipc/] Error 2 -Original Message- From: Butler, Peter Sent: February-23-17 2:14 PM To: Jon Maloy <jon.ma...@ericsson.com>; t

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
Correct - we only use 'eth' as a bearer. -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-23-17 3:03 PM To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@eri

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
he resulting TIPC functionality to simply be erroneous at run-time? Peter -Original Message- From: Butler, Peter Sent: February-23-17 2:48 PM To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@er

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-22 Thread Butler, Peter
ll exists in the current (4.9.11 and 4.10) code, but the semantics of the encapsulating while loop are different, and maybe as such that eliminates the issue. Thoughts? Peter -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-22-17 3:01 PM To: Butler, Pe

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-24 Thread Butler, Peter
doesn't mean that run-time issues won't occur. /Peter From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] Sent: February-24-17 5:21 AM To: Butler, Peter <pbut...@sonusnet.com> Cc: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net Subje

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-22 Thread Butler, Peter
The kernel is actually built on a separate compiler than the test lab machine.) Or could I get that message for another reason? -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-22-17 2:11 PM To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
eamlessly integrate with our 4.4.0 kernel, and also be free of the aforementioned bug. Let me know what you think. Thanks, Peter -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-23-17 8:22 AM To: Butler, Peter <pbut...@sonusnet.com>;

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-23-17 10:45 AM To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com> Subject: RE: TIPC Oops in tipc_sk_recv > -O

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
]: *** [net/tipc] Error 2 make: *** [net] Error 2 -Original Message- From: Butler, Peter Sent: February-23-17 10:56 AM To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com> Cc: Butler,

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
and lib/. -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-23-17 1:19 PM To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com> Subject: R

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
I definitely don't want to be moving into dangerous waters, so I'll take your suggestion right now and start over -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: February-23-17 1:43 PM To: Butler, Peter <pbut...@sonusnet.com>; tipc-disc

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-23 Thread Butler, Peter
--Original Message----- From: Butler, Peter Sent: February-23-17 1:45 PM To: Jon Maloy <jon.ma...@ericsson.com>; tipc-discussion@lists.sourceforge.net; Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com> Cc: Butler, Peter <pbut...@sonusnet.com> Subject: RE: TIPC Oops i

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-27 Thread Butler, Peter
? It is my understanding that kernel code is meant to be backward-compatible in principle... Peter -Original Message- From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] Sent: February-27-17 7:37 AM To: Butler, Peter <pbut...@sonusnet.com> Cc: Jon Maloy &

Re: [tipc-discussion] TIPC Oops in tipc_sk_recv

2017-02-27 Thread Butler, Peter
d over several connections", do you mean 1000+ connections? Or 1000+ messages per second? Our mesh only has ~30 nodes. Peter -Original Message- From: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] Sent: February-27-17 7:37 AM To: Butler, Peter <pbut..

Re: [tipc-discussion] reproducible link failure scenario

2016-12-12 Thread Butler, Peter
e link failure scenario On 12/09/2016 09:25 PM, Butler, Peter wrote: > We can certainly do that for future upgrades of our customers. However we > may need to just patch in the interim. > > > Is the patch small enough (self-contained enough) that it would be easy > enough fo

[tipc-discussion] reproducible link failure scenario

2016-12-09 Thread Butler, Peter
I have a reproducible failure scenario that results in the following kernel messages being printed in succession (along with the associated link failing): Dec 8 12:10:33 [SEQ 617259] lab236slot6 kernel: [44856.752261] Retransmission failure on link <1.1.6:p19p1-1.1.8:p19p1> Dec 8 12:10:33

Re: [tipc-discussion] reproducible link failure scenario

2016-12-09 Thread Butler, Peter
changed between 4.4 and 4.8? From: Jon Maloy <jon.ma...@ericsson.com> Sent: Friday, December 9, 2016 1:57:46 PM To: Butler, Peter; tipc-discussion@lists.sourceforge.net Subject: RE: reproducible link failure scenario Hi Peter, This is a known bug, fixed in

Re: [tipc-discussion] soft lockup in spin lock

2017-03-09 Thread Butler, Peter
I see "[PATCH v2 net-next 0/6] solve two deadlock issues" that Ying just committed a few minutes before my post - not sure if it is the same thing or not... -Original Message----- From: Butler, Peter Sent: March-09-17 9:53 AM To: tipc-discussion@lists.sourceforge.net Sub

[tipc-discussion] soft lockup in spin lock

2017-03-09 Thread Butler, Peter
This is on a node running 4.9.11 TIPC. 9 nodes in cluster, 7 of which are running the same 4.9.11 TIPC (on x86-64), 2 running an old 1.7 TIPC (on PPC). It keeps cycling through these same logs every few seconds. [118768.064830] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]

Re: [tipc-discussion] Constant Illegal FSM event / Resetting Link errors

2017-03-08 Thread Butler, Peter
00 00 -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: March-08-17 11:32 AM To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists.sourceforge.net Subject: RE: Constant Illegal FSM event / Resetting Link errors > -Original Message- > F

Re: [tipc-discussion] Constant Illegal FSM event / Resetting Link errors

2017-03-08 Thread Butler, Peter
To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists.sourceforge.net Subject: RE: Constant Illegal FSM event / Resetting Link errors This looks very much like the deadlock that Partha tried to fix in commit d094c4d5f5c7e1b2 ("tipc: add subscription refcount..") in 4.10. It

Re: [tipc-discussion] Constant Illegal FSM event / Resetting Link errors

2017-03-08 Thread Butler, Peter
Important data point: when the two TIPC 1.7 nodes are taken out of the cluster, the error logs (which were being generated on the 4.9.11 TIPC nodes) cease. -Original Message- From: Jon Maloy [mailto:jon.ma...@ericsson.com] Sent: March-08-17 11:57 AM To: Butler, Peter <p

Re: [tipc-discussion] FW: TIPC issue: connection stalls when switch for bearer 0 recovers

2017-07-14 Thread Butler, Peter
com>; Tung Quang Nguyen <tung.q.ngu...@dektech.com.au> Cc: Andrew Booth (abo...@pt.com) <abo...@pt.com>; Butler, Peter <pbut...@sonusnet.com>; Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>; Ying Xue <ying@windriver.com>; tipc-discussion@l

Re: [tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-25 Thread Butler, Peter
upgrading to the latest kernel appears to just make things worse as per this crash... -Original Message- From: Ying Xue [mailto:ying@windriver.com] Sent: July-25-17 8:48 AM To: Butler, Peter <pbut...@sonusnet.com>; Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@eri

Re: [tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-24 Thread Butler, Peter
to look into upgrading the entire kernel... Peter -Original Message- From: Butler, Peter Sent: July-24-17 11:21 AM To: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>; tipc-discussion@lists.sourceforge.net Cc: Jon Maloy <jon.ma...@ericsson.com>; Yi

Re: [tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-24 Thread Butler, Peter
: Parthasarathy Bhuvaragan [mailto:parthasarathy.bhuvara...@ericsson.com] Sent: July-24-17 8:58 AM To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists.sourceforge.net Cc: Jon Maloy <jon.ma...@ericsson.com>; Ying Xue <ying@windriver.com>; LUU Duc Canh <canh.d@de

Re: [tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-24 Thread Butler, Peter
4] RIP: kfree_skb_list+0x18/0x30 RSP: c90005383b18 [ 2385.388611] ---[ end trace 125f5b3fcb6ee71d ]--- -Original Message- From: Butler, Peter Sent: July-24-17 11:21 AM To: Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>; tipc-discussion@lists.sourceforge.net Cc: Jon M

Re: [tipc-discussion] TIPC connection stalling due to invalid congestion status when bearer 0 recovers

2017-07-24 Thread Butler, Peter
] Sent: July-24-17 9:00 AM To: Butler, Peter <pbut...@sonusnet.com>; tipc-discussion@lists.sourceforge.net Cc: Jon Maloy <jon.ma...@ericsson.com>; Parthasarathy Bhuvaragan <parthasarathy.bhuvara...@ericsson.com>; LUU Duc Canh <canh.d@dektech.com.au> Subject: Re: TIPC