Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-26 Thread Guennadi Liakhovetski

On Sat, 24 Mar 2007, Samuel Ortiz wrote:


On Fri, Mar 23, 2007 at 01:14:43PM +0100, Guennadi Liakhovetski wrote:


Below is a patch that fixes ONE
sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if
there are more there.

Are you still seeing the skb cache growing with your fix ?


No, running stable so far.

Thanks
Guennadi
-
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-25 Thread Samuel Ortiz
On Sat, Mar 24, 2007 at 10:10:34PM -0700, David Miller wrote:
 From: Guennadi Liakhovetski [EMAIL PROTECTED]
 Date: Fri, 23 Mar 2007 13:14:43 +0100 (CET)
 
  On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:
  
   On Wed, 21 Mar 2007, Samuel Ortiz wrote:
  
   I'm quite sure the leak is in the IrDA code rather than in the ppp or
   ipv4 one, hence the need for full irda debug...
  
  Well, looks like you were wrong, Samuel. Below is a patch that fixes ONE 
  sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if 
  there are more there.
 
 This is strange.
 
 The PPP generic layer seems to be very careful about it's handling of
 the -xmit_pending packet.
 
 When a packet is added to -xmit_pending, immediately ppp_push() is
 called, and ppp_push() gives the packet to the channels xmit function,
 and unless the xmit function returns zero the -xmit_pending is reset
 to NULL because non-zero return from the channel xmit functions means
 that the driver took the packet.
 
 Now I checked irnet_ppp.c, which is the driver under scrutiny here,
 and it never ever returns zero, under any circumstance, it always
 return one.
 
 So the -xmit_pending should always be NULL'd out by ppp_push().
 
 There is some funny BLOCK_WHEN_CONNECT code, which will return
 0 in certain cases, but that define it never set during the
 build.
 
 Nevermind... that code does get enabled. :(
It is enabled, yes, from net/irda/irnet/irnet.h
It seems Jean added this option to make the connectdelay 0 pppd option
working.

 
 This looks like it might be a bug, perhaps you can only return zero
 from the transmit function when your queue really is full and you plan
 to wake things up properly when space appears (via
 ppp_output_wakeup()).  You can't return 0 because of an event which
 might never occur, that's what makes -xmit_pending get stuck and
 leak.
I still think Guennadi's fix is correct even if you return 0 from the TX
function only when you're running out of space. If we unregister the ppp
interface before we get a chance to call ppp_output_wake(), then we'll leak
an xmit_pending skb.

I guess Paul will decide here. If he rejects Guennadi's patch, I'll come
up with some IrDA patch so that we free our pending skb from the irnet code,
or try somehow not to block the PPP tx queue while connecting.

Cheers,
Samuel.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-25 Thread Paul Mackerras
David Miller writes:

 The PPP generic layer seems to be very careful about it's handling of
 the -xmit_pending packet.

Mostly, but I think that this is a genuine leak.

 I'm really surprised this leak doesn't trigger already via the
 ppp_synctty.c and ppp_async.c drivers, perhaps they do something to
 make sure the transmitter gets purged properly when unregistering and
 therefore -xmit_pending does not get left non-NULL at unregister
 time.

At least for ppp_async, I think what saves us is that pppd does a
TCFLSH ioctl when shutting down the link, which calls
ppp_async_flush_output, which calls ppp_output_wakeup.

Paul.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-25 Thread Paul Mackerras
Guennadi Liakhovetski writes:

 Don't leak an sk_buff on interface destruction.
 
 Signed-off-by: G. Liakhovetski [EMAIL PROTECTED]

Acked-by: Paul Mackerras [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-25 Thread David Miller
From: Samuel Ortiz [EMAIL PROTECTED]
Date: Mon, 26 Mar 2007 02:50:59 +0300

 I still think Guennadi's fix is correct even if you return 0 from the TX
 function only when you're running out of space. If we unregister the ppp
 interface before we get a chance to call ppp_output_wake(), then we'll leak
 an xmit_pending skb.

I agree, especially after thinking about this some more.

It's just really weird that we've never hit this before,
since this logic has been there forever, that's all :-)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-25 Thread David Miller
From: Paul Mackerras [EMAIL PROTECTED]
Date: Mon, 26 Mar 2007 11:02:22 +1000

 David Miller writes:
 
  The PPP generic layer seems to be very careful about it's handling of
  the -xmit_pending packet.
 
 Mostly, but I think that this is a genuine leak.
 
  I'm really surprised this leak doesn't trigger already via the
  ppp_synctty.c and ppp_async.c drivers, perhaps they do something to
  make sure the transmitter gets purged properly when unregistering and
  therefore -xmit_pending does not get left non-NULL at unregister
  time.
 
 At least for ppp_async, I think what saves us is that pppd does a
 TCFLSH ioctl when shutting down the link, which calls
 ppp_async_flush_output, which calls ppp_output_wakeup.

Ok, that makes sense, I'll apply the leak fix for sure
especially since I even have your ACK now :-)

Thanks for reviewing Paul.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-24 Thread David Miller
From: Guennadi Liakhovetski [EMAIL PROTECTED]
Date: Fri, 23 Mar 2007 13:14:43 +0100 (CET)

 On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:
 
  On Wed, 21 Mar 2007, Samuel Ortiz wrote:
 
  I'm quite sure the leak is in the IrDA code rather than in the ppp or
  ipv4 one, hence the need for full irda debug...
 
 Well, looks like you were wrong, Samuel. Below is a patch that fixes ONE 
 sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if 
 there are more there.

This is strange.

The PPP generic layer seems to be very careful about it's handling of
the -xmit_pending packet.

When a packet is added to -xmit_pending, immediately ppp_push() is
called, and ppp_push() gives the packet to the channels xmit function,
and unless the xmit function returns zero the -xmit_pending is reset
to NULL because non-zero return from the channel xmit functions means
that the driver took the packet.

Now I checked irnet_ppp.c, which is the driver under scrutiny here,
and it never ever returns zero, under any circumstance, it always
return one.

So the -xmit_pending should always be NULL'd out by ppp_push().

There is some funny BLOCK_WHEN_CONNECT code, which will return
0 in certain cases, but that define it never set during the
build.

Nevermind... that code does get enabled. :(

This looks like it might be a bug, perhaps you can only return zero
from the transmit function when your queue really is full and you plan
to wake things up properly when space appears (via
ppp_output_wakeup()).  You can't return 0 because of an event which
might never occur, that's what makes -xmit_pending get stuck and
leak.

I'm really surprised this leak doesn't trigger already via the
ppp_synctty.c and ppp_async.c drivers, perhaps they do something to
make sure the transmitter gets purged properly when unregistering and
therefore -xmit_pending does not get left non-NULL at unregister
time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-23 Thread Samuel Ortiz
On Fri, Mar 23, 2007 at 01:14:43PM +0100, Guennadi Liakhovetski wrote:
 On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:
 
  On Wed, 21 Mar 2007, Samuel Ortiz wrote:
 
  I'm quite sure the leak is in the IrDA code rather than in the ppp or
  ipv4 one, hence the need for full irda debug...
 
 Well, looks like you were wrong, Samuel. 
Heh, it's good to be wrong sometimes :-)

 Below is a patch that fixes ONE 
 sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if 
 there are more there.
Are you still seeing the skb cache growing with your fix ?

Cheers,
Samuel.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-23 Thread Guennadi Liakhovetski

On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:


On Wed, 21 Mar 2007, Samuel Ortiz wrote:


I'm quite sure the leak is in the IrDA code rather than in the ppp or
ipv4 one, hence the need for full irda debug...


Well, looks like you were wrong, Samuel. Below is a patch that fixes ONE 
sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if 
there are more there.


Thanks
Guennadi
-
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

Don't leak an sk_buff on interface destruction.

Signed-off-by: G. Liakhovetski [EMAIL PROTECTED]

--- a/drivers/net/ppp_generic.c 2007-03-23 13:04:04.0 +0100
+++ b/drivers/net/ppp_generic.c 2007-03-23 13:05:29.0 +0100
@@ -2544,6 +2544,9 @@
ppp-active_filter = NULL;
 #endif /* CONFIG_PPP_FILTER */

+   if (ppp-xmit_pending)
+   kfree_skb(ppp-xmit_pending);
+
kfree(ppp);
 }

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-21 Thread Guennadi Liakhovetski
(Short recap for newly added to cc: netdev: I'm seeing an skb leak in 
2.6.20 during an IrDA IrNET+ppp UDP test with periodic connection 
disruptions)


On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:


On Tue, 20 Mar 2007, Guennadi Liakhovetski wrote:

Ok, looks like all leaked skbuffs come from ip_append_data(), like this:

(sock_alloc_send_skb+0x2c8/0x2e4)
(ip_append_data+0x7fc/0xa80)
(udp_sendmsg+0x248/0x68c)
(inet_sendmsg+0x60/0x64)
(sock_sendmsg+0xb4/0xe4)
 r4 = C3CB4960
(sys_sendto+0xc8/0xf0)
 r4 = 
(sys_socketcall+0x168/0x1f0)
(ret_fast_syscall+0x0/0x2c)


This call to sock_alloc_send_skb() in ip_append_data() is not from the 
inlined ip_ufo_append_data(), it is here:


/* The last fragment gets additional space at tail.
 * Note, with MSG_MORE we overallocate on fragments,
 * because we have no idea what fragment will be
 * the last.
 */
if (datalen == length + fraggap)
alloclen += rt-u.dst.trailer_len;

if (transhdrlen) {
skb = sock_alloc_send_skb(sk,
alloclen + hh_len + 15,
(flags  MSG_DONTWAIT), err);
} else {

Then, I traced a couple of paths how such a skbuff, coming down from 
ip_append_data() and allocated above get freed (when they do):


[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
 r5 = C332BC00  r4 = C332BC00 
[c01824f0] (kfree_skb+0x0/0x50) from [bf0fac58] (irlap_update_nr_received+0x94/0xc8 [irda])

[bf0fabc4] (irlap_update_nr_received+0x0/0xc8 [irda]) from [bf0fda98] 
(irlap_state_nrm_p+0x530/0x7c0 [irda])
 r7 = 0001  r6 = C0367EC0  r5 = C332BC00  r4 = 
[bf0fd568] (irlap_state_nrm_p+0x0/0x7c0 [irda]) from [bf0fbd90] 
(irlap_do_event+0x68/0x18c [irda])
[bf0fbd28] (irlap_do_event+0x0/0x18c [irda]) from [bf1008cc] 
(irlap_driver_rcv+0x1f0/0xd38 [irda])
[bf1006dc] (irlap_driver_rcv+0x0/0xd38 [irda]) from [c01892c0] 
(netif_receive_skb+0x244/0x338)
[c018907c] (netif_receive_skb+0x0/0x338) from [c0189468] 
(process_backlog+0xb4/0x194)
[c01893b4] (process_backlog+0x0/0x194) from [c01895f8] 
(net_rx_action+0xb0/0x210)
[c0189548] (net_rx_action+0x0/0x210) from [c0042f7c] (ksoftirqd+0x108/0x1cc)
[c0042e74] (ksoftirqd+0x0/0x1cc) from [c0053614] (kthread+0x10c/0x138)
[c0053508] (kthread+0x0/0x138) from [c003f918] (do_exit+0x0/0x8b0)
 r8 =   r7 =   r6 =   r5 = 
 r4 = 

and

[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
 r5 = C03909E0  r4 = C1A97400 
[c01824f0] (kfree_skb+0x0/0x50) from [c0199bf8] (pfifo_fast_enqueue+0xb4/0xd0)

[c0199b44] (pfifo_fast_enqueue+0x0/0xd0) from [c0188c30] 
(dev_queue_xmit+0x17c/0x25c)
 r8 = C1A2DCE0  r7 = FFF4  r6 = C3393114  r5 = C03909E0
 r4 = C3393000 
[c0188ab4] (dev_queue_xmit+0x0/0x25c) from [c01a7c18] (ip_output+0x150/0x254)

 r7 = C3717120  r6 = C03909E0  r5 =   r4 = C1A2DCE0
[c01a7ac8] (ip_output+0x0/0x254) from [c01a93d0] 
(ip_push_pending_frames+0x368/0x4d4)
[c01a9068] (ip_push_pending_frames+0x0/0x4d4) from [c01c6954] 
(udp_push_pending_frames+0x14c/0x310)
[c01c6808] (udp_push_pending_frames+0x0/0x310) from [c01c70d8] 
(udp_sendmsg+0x5c0/0x690)
[c01c6b18] (udp_sendmsg+0x0/0x690) from [c01ceafc] (inet_sendmsg+0x60/0x64)
[c01cea9c] (inet_sendmsg+0x0/0x64) from [c017c970] (sock_sendmsg+0xb4/0xe4)
 r7 = C2CEFDF4  r6 = 0064  r5 = C2CEFEA8  r4 = C3C94080
[c017c8bc] (sock_sendmsg+0x0/0xe4) from [c017dd9c] (sys_sendto+0xc8/0xf0)
 r7 = 0064  r6 = C3571580  r5 = C2CEFEC4  r4 = 
[c017dcd4] (sys_sendto+0x0/0xf0) from [c017e654] 
(sys_socketcall+0x168/0x1f0)
[c017e4ec] (sys_socketcall+0x0/0x1f0) from [c001ff40] 
(ret_fast_syscall+0x0/0x2c)
 r5 = 00415344  r4 = 

I would be greatful for any hints how I can identify which skbuff's get 
lost and why, and where and who should free them.


I am not subscribed to netdev, please keep in cc.

Thanks
Guennadi
-
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-21 Thread Samuel Ortiz

On 3/21/2007, Guennadi Liakhovetski [EMAIL PROTECTED] wrote:

(Short recap for newly added to cc: netdev: I'm seeing an skb leak in
2.6.20 during an IrDA IrNET+ppp UDP test with periodic connection
disruptions)

On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:

 On Tue, 20 Mar 2007, Guennadi Liakhovetski wrote:

 Ok, looks like all leaked skbuffs come from ip_append_data(), like this:

 (sock_alloc_send_skb+0x2c8/0x2e4)
 (ip_append_data+0x7fc/0xa80)
 (udp_sendmsg+0x248/0x68c)
 (inet_sendmsg+0x60/0x64)
 (sock_sendmsg+0xb4/0xe4)
  r4 = C3CB4960
 (sys_sendto+0xc8/0xf0)
  r4 = 
 (sys_socketcall+0x168/0x1f0)
 (ret_fast_syscall+0x0/0x2c)

This call to sock_alloc_send_skb() in ip_append_data() is not from the
inlined ip_ufo_append_data(), it is here:

   /* The last fragment gets additional space at tail.
* Note, with MSG_MORE we overallocate on fragments,
* because we have no idea what fragment will be
* the last.
*/
   if (datalen == length + fraggap)
   alloclen += rt-u.dst.trailer_len;

   if (transhdrlen) {
   skb = sock_alloc_send_skb(sk,
   alloclen + hh_len + 15,
   (flags  MSG_DONTWAIT), err);
   } else {

Then, I traced a couple of paths how such a skbuff, coming down from
ip_append_data() and allocated above get freed (when they do):

[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
  r5 = C332BC00  r4 = C332BC00
[c01824f0] (kfree_skb+0x0/0x50) from [bf0fac58] 
(irlap_update_nr_received+0x94/0xc8 [irda])
[bf0fabc4] (irlap_update_nr_received+0x0/0xc8 [irda]) from [bf0fda98] 
(irlap_state_nrm_p+0x530/0x7c0 [irda])
  r7 = 0001  r6 = C0367EC0  r5 = C332BC00  r4 = 
[bf0fd568] (irlap_state_nrm_p+0x0/0x7c0 [irda]) from [bf0fbd90] 
(irlap_do_event+0x68/0x18c [irda])
[bf0fbd28] (irlap_do_event+0x0/0x18c [irda]) from [bf1008cc] 
(irlap_driver_rcv+0x1f0/0xd38 [irda])
[bf1006dc] (irlap_driver_rcv+0x0/0xd38 [irda]) from [c01892c0] 
(netif_receive_skb+0x244/0x338)
[c018907c] (netif_receive_skb+0x0/0x338) from [c0189468] 
(process_backlog+0xb4/0x194)
[c01893b4] (process_backlog+0x0/0x194) from [c01895f8] 
(net_rx_action+0xb0/0x210)
[c0189548] (net_rx_action+0x0/0x210) from [c0042f7c] 
(ksoftirqd+0x108/0x1cc)
[c0042e74] (ksoftirqd+0x0/0x1cc) from [c0053614] (kthread+0x10c/0x138)
[c0053508] (kthread+0x0/0x138) from [c003f918] (do_exit+0x0/0x8b0)
  r8 =   r7 =   r6 =   r5 = 
  r4 = 
This is the IrDA RX path, so I doubt the corresponding skb ever got
through
ip_append_data(). The skb was allocated by your HW driver upon packet
reception, then queued to the net input queue, and finally passed to the
IrDA stack. Are you sure your tracing is correct ?


and

[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
  r5 = C03909E0  r4 = C1A97400
[c01824f0] (kfree_skb+0x0/0x50) from [c0199bf8] 
(pfifo_fast_enqueue+0xb4/0xd0)
[c0199b44] (pfifo_fast_enqueue+0x0/0xd0) from [c0188c30] 
(dev_queue_xmit+0x17c/0x25c)
  r8 = C1A2DCE0  r7 = FFF4  r6 = C3393114  r5 = C03909E0
  r4 = C3393000
[c0188ab4] (dev_queue_xmit+0x0/0x25c) from [c01a7c18] 
(ip_output+0x150/0x254)
  r7 = C3717120  r6 = C03909E0  r5 =   r4 = C1A2DCE0
[c01a7ac8] (ip_output+0x0/0x254) from [c01a93d0] 
(ip_push_pending_frames+0x368/0x4d4)
[c01a9068] (ip_push_pending_frames+0x0/0x4d4) from [c01c6954] 
(udp_push_pending_frames+0x14c/0x310)
[c01c6808] (udp_push_pending_frames+0x0/0x310) from [c01c70d8] 
(udp_sendmsg+0x5c0/0x690)
[c01c6b18] (udp_sendmsg+0x0/0x690) from [c01ceafc] (inet_sendmsg+0x60/0x64)
[c01cea9c] (inet_sendmsg+0x0/0x64) from [c017c970] (sock_sendmsg+0xb4/0xe4)
  r7 = C2CEFDF4  r6 = 0064  r5 = C2CEFEA8  r4 = C3C94080
[c017c8bc] (sock_sendmsg+0x0/0xe4) from [c017dd9c] (sys_sendto+0xc8/0xf0)
  r7 = 0064  r6 = C3571580  r5 = C2CEFEC4  r4 = 
[c017dcd4] (sys_sendto+0x0/0xf0) from [c017e654] 
(sys_socketcall+0x168/0x1f0)
[c017e4ec] (sys_socketcall+0x0/0x1f0) from [c001ff40] 
(ret_fast_syscall+0x0/0x2c)
  r5 = 00415344  r4 = 
This one is on the TX path, yes. However it got dropped and freed because
your TX queue was full. Any idea in which situation does that happen ?


I would be greatful for any hints how I can identify which skbuff's get
lost and why, and where and who should free them.
You're seeing skb leaks when cutting the ppp connection periodically,
right ?
Do you such leaks when not cutting the ppp connection ?
If not, could you send me a kernel trace (with irda debug set to 5) when
the ppp connection is shut down ? It would narrow down the problem a bit.
I'm quite sure the leak is in the IrDA code rather than in the ppp or
ipv4
one, hence the need for full irda debug...

Cheers,
Samuel.

I am not 

Re: [irda-users] [2.6.20-rt8] Neighbour table overflow.

2007-03-21 Thread Guennadi Liakhovetski

On Wed, 21 Mar 2007, Samuel Ortiz wrote:


On 3/21/2007, Guennadi Liakhovetski [EMAIL PROTECTED] wrote:


[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
 r5 = C332BC00  r4 = C332BC00
[c01824f0] (kfree_skb+0x0/0x50) from [bf0fac58] 
(irlap_update_nr_received+0x94/0xc8 [irda])
[bf0fabc4] (irlap_update_nr_received+0x0/0xc8 [irda]) from [bf0fda98] 
(irlap_state_nrm_p+0x530/0x7c0 [irda])
 r7 = 0001  r6 = C0367EC0  r5 = C332BC00  r4 = 
[bf0fd568] (irlap_state_nrm_p+0x0/0x7c0 [irda]) from [bf0fbd90] 
(irlap_do_event+0x68/0x18c [irda])
[bf0fbd28] (irlap_do_event+0x0/0x18c [irda]) from [bf1008cc] 
(irlap_driver_rcv+0x1f0/0xd38 [irda])
[bf1006dc] (irlap_driver_rcv+0x0/0xd38 [irda]) from [c01892c0] 
(netif_receive_skb+0x244/0x338)
[c018907c] (netif_receive_skb+0x0/0x338) from [c0189468] 
(process_backlog+0xb4/0x194)
[c01893b4] (process_backlog+0x0/0x194) from [c01895f8] 
(net_rx_action+0xb0/0x210)
[c0189548] (net_rx_action+0x0/0x210) from [c0042f7c] (ksoftirqd+0x108/0x1cc)
[c0042e74] (ksoftirqd+0x0/0x1cc) from [c0053614] (kthread+0x10c/0x138)
[c0053508] (kthread+0x0/0x138) from [c003f918] (do_exit+0x0/0x8b0)
 r8 =   r7 =   r6 =   r5 = 
 r4 = 

This is the IrDA RX path, so I doubt the corresponding skb ever got
through
ip_append_data(). The skb was allocated by your HW driver upon packet
reception, then queued to the net input queue, and finally passed to the
IrDA stack. Are you sure your tracing is correct ?


I've added a bitfield to struct sk_buff:

__u8pkt_type:3,
fclone:2,
-   ipvs_property:1;
+   ipvs_property:1,
+   trace_dbg:1;

and I set itin ip_append_data() before sock_alloc_send_skb() is called. 
Then I check this bit in __kfree_skb(). The bit is set to 0 in __alloc_skb 
per


memset(skb, 0, offsetof(struct sk_buff, truesize));

So, if it was a freshly allocated skb, the tracing should be correct.


[c0182380] (__kfree_skb+0x0/0x170) from [c0182514] (kfree_skb+0x24/0x50)
 r5 = C03909E0  r4 = C1A97400
[c01824f0] (kfree_skb+0x0/0x50) from [c0199bf8] 
(pfifo_fast_enqueue+0xb4/0xd0)
[c0199b44] (pfifo_fast_enqueue+0x0/0xd0) from [c0188c30] 
(dev_queue_xmit+0x17c/0x25c)
 r8 = C1A2DCE0  r7 = FFF4  r6 = C3393114  r5 = C03909E0
 r4 = C3393000
[c0188ab4] (dev_queue_xmit+0x0/0x25c) from [c01a7c18] 
(ip_output+0x150/0x254)
 r7 = C3717120  r6 = C03909E0  r5 =   r4 = C1A2DCE0
[c01a7ac8] (ip_output+0x0/0x254) from [c01a93d0] 
(ip_push_pending_frames+0x368/0x4d4)
[c01a9068] (ip_push_pending_frames+0x0/0x4d4) from [c01c6954] 
(udp_push_pending_frames+0x14c/0x310)
[c01c6808] (udp_push_pending_frames+0x0/0x310) from [c01c70d8] 
(udp_sendmsg+0x5c0/0x690)
[c01c6b18] (udp_sendmsg+0x0/0x690) from [c01ceafc] (inet_sendmsg+0x60/0x64)
[c01cea9c] (inet_sendmsg+0x0/0x64) from [c017c970] (sock_sendmsg+0xb4/0xe4)
 r7 = C2CEFDF4  r6 = 0064  r5 = C2CEFEA8  r4 = C3C94080
[c017c8bc] (sock_sendmsg+0x0/0xe4) from [c017dd9c] (sys_sendto+0xc8/0xf0)
 r7 = 0064  r6 = C3571580  r5 = C2CEFEC4  r4 = 
[c017dcd4] (sys_sendto+0x0/0xf0) from [c017e654] 
(sys_socketcall+0x168/0x1f0)
[c017e4ec] (sys_socketcall+0x0/0x1f0) from [c001ff40] 
(ret_fast_syscall+0x0/0x2c)
 r5 = 00415344  r4 = 

This one is on the TX path, yes. However it got dropped and freed because
your TX queue was full. Any idea in which situation does that happen ?


No. I can only describe what communication is running while ppp is 
disrupted - it's just some sort of udp mirror test - udp packets are sent 
one after another and mirrored back.



I would be greatful for any hints how I can identify which skbuff's get
lost and why, and where and who should free them.

You're seeing skb leaks when cutting the ppp connection periodically,
right ?


Right


Do you such leaks when not cutting the ppp connection ?


Looks like I don't.


If not, could you send me a kernel trace (with irda debug set to 5) when
the ppp connection is shut down ? It would narrow down the problem a bit.


Attached bzipped... It's a complete log starting from irda up, running udp 
packets over the link, closing the link and bringing irda completely down.



I'm quite sure the leak is in the IrDA code rather than in the ppp or
ipv4 one, hence the need for full irda debug...


Likely, yes. Why I am asking netdev guys for help is just because I have 
very little idea about the data flow in the network stack(s). And the more 
experienced eyes we have on the problem the sooner we might solve it, I 
hope...


Thanks
Guennadi
-
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

mpppdown.bz2
Description: Binary data