Re: oops in 2.6.13-rc6-git12 in tcp/netfilter routines

2005-08-26 Thread Alessandro Suardi
On 8/26/05, Patrick McHardy <[EMAIL PROTECTED]> wrote:
> Alessandro Suardi wrote:
> > Stack is hand-copied from the dead box's console.
> >
> > [<c0103714>] die+0xe4/0x170
> > [<c010381f>] do_trap+0x7f/0xc0
> > [<c0103b33>] do_invalid_op+0xa3/0xb0
> > [<c0102faf>] error_code+0x4f/0x54
> > [<c02eb05b>] kfree_skbmem+0xb/0x20
> > [<c02eb0cf>] __kfree_skb+0x5f/0xf0
> > [<c031304a>] tcp_clean_rtx_queue+0x16a/0x470
> > [<c0313746>] tcp_ack+0xf6/0x360
> > [<c0315d57>] tcp_rcv_established+0x277/0x7a0
> > [<c031eba0>] tcp_v4_do_rcv+0xf0/0x110
> > [<c031f2a0>] tcp_v4_rcv+0x6e0/0x820
> > [<c0305594>] ip_local_deliver_finish+0x84/0x160
> > [<c02fbe4a>] nf_reinject+0x13a/0x1c0
> > [<c033f0d8>] ipq_issue_verdict+0x28/0x40
> > [<c033f968>] ipq_set_verdict+0x48/0x70
> > [<c033fa79>] ipq_receive_peer+0x39/0x50
> > [<c033fc72>] ipq_receive_sk+0x172/0x190
> > [<c02fffa5>] netlink_data_ready+0x35/0x60
> > [<c02ff4a4>] netlink_sendskb+0x24/0x60
> > [<c02ff657>] netlink_unicast+0x127/0x160
> > [<c02ffcc4>] netlink_sendmsg+0x204/0x2b0
> > [<c02e6dc0>] sock_sendmsg+0xb0/0xe0
> > [<c02e83f4>] sys_sendmsg+0x134/0x240
> > [<c02e88e4>] sys_socketcall+0x224/0x230
> > [<c0102d3b>] sysenter_past_esp+0x54/0x75
> > Code: 8b 41 0c 85 c0 75 1b 8b 86 94 00 00 00 e8 9e 37 e5 ff 5b 5e c9
> > c3 89 d0 e8 43 46 e5 ff 8d 76 00 eb d2 89 f0 e8 f7 fe ff ff eb dc <0f>
> > 0b 54 01 16 d2 36 c0 eb b4 8d 74 26 00 8d bc 27 00 00 00 00
> > <0>Kernel panic - not syncing: Fatal exception in interrupt
> >
> > If there's need for further info I'd be happy to provide it. For now
> >  the box is rebooted into the same kernel and running the same
> >  PG/eD2k programs, if the issue reproduces I'll follow up on my
> >  own message.
> 
> Any chance you can get the entire Oops including registers etc
> using netconsole or serial console?

Not right now, as I noticed netconsole requires netpoll and this
 latter can't be modular; but I'll do so before leaving tomorrow
 morning, obviously rebuilding with 2.6.13-rc7-git1 or -git2 if
 the new snapshot comes out.

At the moment, the box has been running for 32 hours with
 no sign of wanting to oops...

[EMAIL PROTECTED] ~]# ps ax | egrep 'peer|edon'
 2416 pts/2    Sl    25:37 peerguardnf -d -l /var/log/pg.log -c /etc/PG.conf
25186 pts/0    R+    76:37 ./edonkey2000
25189 pts/0    S+     0:06 ./edonkey2000
25191 pts/0    S+     9:49 ./edonkey2000
 7007 pts/0    S+     0:00 ./edonkey2000
 7011 pts/3    R+     0:00 egrep peer|edon
[EMAIL PROTECTED] ~]# w
 22:37:53 up 1 day,  7:49,  4 users,  load average: 0.15, 0.18, 0.25
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    donkey:2.0        Thu14   20:15m  1:26m  0.00s bash
root     pts/1    donkey:2.0        Thu14   13:40m  0.41s  1:57  gnome-terminal --sm-config-prefix /gnome-terminal-wBjEOn/ -
root     pts/2    donkey:2.0        Thu14    4:07  25:37   0.49s bash
root     pts/3    192.168.1.6       22:37    0.00s  0.06s  0.01s w

Thanks,

--alessandro

 "Not every smile means I'm laughing inside"

(Wallflowers - "From The Bottom Of My Heart")
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: oops in 2.6.13-rc6-git12 in tcp/netfilter routines

2005-08-26 Thread Patrick McHardy
Alessandro Suardi wrote:
> Stack is hand-copied from the dead box's console. 
> 
> [<c0103714>] die+0xe4/0x170
> [<c010381f>] do_trap+0x7f/0xc0
> [<c0103b33>] do_invalid_op+0xa3/0xb0
> [<c0102faf>] error_code+0x4f/0x54
> [<c02eb05b>] kfree_skbmem+0xb/0x20
> [<c02eb0cf>] __kfree_skb+0x5f/0xf0
> [<c031304a>] tcp_clean_rtx_queue+0x16a/0x470
> [<c0313746>] tcp_ack+0xf6/0x360
> [<c0315d57>] tcp_rcv_established+0x277/0x7a0
> [<c031eba0>] tcp_v4_do_rcv+0xf0/0x110
> [<c031f2a0>] tcp_v4_rcv+0x6e0/0x820
> [<c0305594>] ip_local_deliver_finish+0x84/0x160
> [<c02fbe4a>] nf_reinject+0x13a/0x1c0
> [<c033f0d8>] ipq_issue_verdict+0x28/0x40
> [<c033f968>] ipq_set_verdict+0x48/0x70
> [<c033fa79>] ipq_receive_peer+0x39/0x50
> [<c033fc72>] ipq_receive_sk+0x172/0x190
> [<c02fffa5>] netlink_data_ready+0x35/0x60
> [<c02ff4a4>] netlink_sendskb+0x24/0x60
> [<c02ff657>] netlink_unicast+0x127/0x160
> [<c02ffcc4>] netlink_sendmsg+0x204/0x2b0
> [<c02e6dc0>] sock_sendmsg+0xb0/0xe0
> [<c02e83f4>] sys_sendmsg+0x134/0x240
> [<c02e88e4>] sys_socketcall+0x224/0x230
> [<c0102d3b>] sysenter_past_esp+0x54/0x75
> Code: 8b 41 0c 85 c0 75 1b 8b 86 94 00 00 00 e8 9e 37 e5 ff 5b 5e c9
> c3 89 d0 e8 43 46 e5 ff 8d 76 00 eb d2 89 f0 e8 f7 fe ff ff eb dc <0f>
> 0b 54 01 16 d2 36 c0 eb b4 8d 74 26 00 8d bc 27 00 00 00 00
> <0>Kernel panic - not syncing: Fatal exception in interrupt
> 
> If there's need for further info I'd be happy to provide it. For now
>  the box is rebooted into the same kernel and running the same
>  PG/eD2k programs, if the issue reproduces I'll follow up on my
>  own message.

Any chance you can get the entire Oops including registers etc
using netconsole or serial console?
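
Until a full oops is captured, the `Code:` line in the partial report still carries some information. The byte in angle brackets marks the faulting instruction, and `0f 0b` is the x86 `ud2` opcode, which fits with `do_invalid_op` showing up in the trace. Below is a minimal Python sketch for pulling that out. The function name `faulting_bytes` is made up for illustration. Reading the bytes after `ud2` as a line number and a file-name pointer is only an assumption (it presumes the i386 verbose BUG() layout); nothing in the report confirms how this kernel was configured.

```python
from typing import List, Tuple

# The Code: dump exactly as hand-copied in the report.
CODE_LINES = """\
Code: 8b 41 0c 85 c0 75 1b 8b 86 94 00 00 00 e8 9e 37 e5 ff 5b 5e c9
c3 89 d0 e8 43 46 e5 ff 8d 76 00 eb d2 89 f0 e8 f7 fe ff ff eb dc <0f>
0b 54 01 16 d2 36 c0 eb b4 8d 74 26 00 8d bc 27 00 00 00 00
"""


def faulting_bytes(code_text: str) -> Tuple[int, List[int]]:
    """Return (index of the <xx>-marked byte, all bytes as integers)."""
    tokens = code_text.replace("Code:", "").split()
    data = [int(tok.strip("<>"), 16) for tok in tokens]
    marked = [i for i, tok in enumerate(tokens) if tok.startswith("<")]
    if len(marked) != 1:
        raise ValueError("expected exactly one <xx> marker in the dump")
    return marked[0], data


idx, data = faulting_bytes(CODE_LINES)

# 0f 0b is the ud2 (undefined instruction) opcode on x86.
is_ud2 = data[idx:idx + 2] == [0x0F, 0x0B]

# ASSUMPTION: verbose BUG() layout, i.e. ud2 followed by a 16-bit line
# number and a 32-bit pointer to the file name, both little-endian.
line_no = int.from_bytes(bytes(data[idx + 2:idx + 4]), "little")
file_ptr = int.from_bytes(bytes(data[idx + 4:idx + 8]), "little")

print(f"marker at byte {idx}, ud2={is_ud2}, "
      f"line={line_no}, file pointer={file_ptr:#x}")
```

If the assumption holds, the trap would be a deliberate BUG() check rather than random corruption of the instruction stream, but which source file the pointer refers to cannot be told from these bytes alone; the full oops is still needed.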


Re: oops in 2.6.13-rc6-git12 in tcp/netfilter routines

2005-08-26 Thread Harald Welte
On Thu, Aug 25, 2005 at 11:02:01PM +0200, Sven Schuster wrote:
> 
> Hi Harald,
> 
> On Thu, Aug 25, 2005 at 06:55:50PM +0200, Harald Welte told us:
> > Is it true that PeerGuardian is a proprietary application?  I'm not
> > going to debug this problem using a proprietary ip_queue program, sorry.
> 
> sorry to jump in here, but I took a quick look at PeerGuardian,
> according to
> http://methlabs.org/wiki/license_information
> it's open source.  The source code is available at
> http://methlabs.org/projects/peerguardian-linuxosx/

ok, thanks. Sorry for the confusion, but the 'official' website is just
a blog that didn't really reveal all that much information.

-- 
- Harald Welte <[EMAIL PROTECTED]> http://netfilter.org/

  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."-- Paul Vixie


Re: oops in 2.6.13-rc6-git12 in tcp/netfilter routines

2005-08-25 Thread Sven Schuster

Hi Harald,

On Thu, Aug 25, 2005 at 06:55:50PM +0200, Harald Welte told us:
> Is it true that PeerGuardian is a proprietary application?  I'm not
> going to debug this problem using a proprietary ip_queue program, sorry.

sorry to jump in here, but I took a quick look at PeerGuardian,
according to
http://methlabs.org/wiki/license_information
it's open source.  The source code is available at
http://methlabs.org/projects/peerguardian-linuxosx/

HTH

Sven

-- 
Linux zion.homelinux.com 2.6.13-rc6-mm2 #3 Thu Aug 25 14:53:55 CEST 2005 i686 
athlon i386 GNU/Linux
 22:56:18 up  7:40,  1 user,  load average: 0.46, 0.14, 0.04


Re: oops in 2.6.13-rc6-git12 in tcp/netfilter routines

2005-08-25 Thread Alessandro Suardi
On 8/25/05, Harald Welte <[EMAIL PROTECTED]> wrote:
> On Thu, Aug 25, 2005 at 03:39:02PM +0200, Alessandro Suardi wrote:
> > Howdy, and excuse me for crossposting - feel free to zap CC to
> >  unrelated, if any, mailing lists.
> >
> >   just gave PeerGuardian a spin on my eDonkey home box and
> >   said box didn't last half a day before oopsing in netlink/nf/tcp
> >   related routines (or so it seems to my untrained eye).
> 
> Yes, it indeed could be that there is some fishy interaction between the
> tcp stack and ip_queue causing the oops.
> 
> > K7800, 256MB RAM, uptodate FC3 running 2.6.13-rc6-git12,
> >  doing nothing but running MetaMachine's eDonkey 1.4.3 QT gui.
> > PeerGuardian is the 1.5 beta version available from methlabs.org.
> 
> Is it true that PeerGuardian is a proprietary application?  I'm not
> going to debug this problem using a proprietary ip_queue program, sorry.

I'm not sure I understand the issue; I built PG from these sources:

http://prdownloads.sourceforge.net/peerguardian/pglinux-1.5beta.tar.gz?download

 and I had to install the iptables-devel FC3 rpm to build. The PG
 sources seem to be licensed under GPLv2. But maybe you're
 referring to the fact that whatever PG does, it doesn't show up
 as output from 'iptables -L' ?

> If you can produce a testcase with open source userspace ip_queue code,
> I could look into reproducing the problem locally and debugging the
> problem more thoroughly.

So far the box has been running for over four hours, I'll configure
 my laptop as a netdump server hoping it might capture something
 if the ed2k box crashes again later. I'm afraid I won't be able to set
 up a real testcase (and btw, edonkey v1.4.3 from MetaMachine is
 actually a proprietary program, though entirely in userspace).

> While it definitely is a kernel bug (whatever userspace sends should not
> crash the kernel), it might be something that specifically [only]
> PeerGuardian does to the packet.  Something that ip_queue doesn't check
> (but should check) on packet reinjection and therefore upsets the TCP stack.
> 
> Also helpful would be the output of an "strace -f -x -s65535 -e
> trace=sendmsg" on the PeerGuardian (daemon?) process.
> 
> 
> > [<c0103714>] die+0xe4/0x170
> > [<c010381f>] do_trap+0x7f/0xc0
> > [<c0103b33>] do_invalid_op+0xa3/0xb0
> > [<c0102faf>] error_code+0x4f/0x54
> > [<c02eb05b>] kfree_skbmem+0xb/0x20
> > [<c02eb0cf>] __kfree_skb+0x5f/0xf0
> 
> ok, so something down the chain from kfree_skb() results in an invalid
> operation? looks more like some compiler problem, bad memory or memory
> corruption to me.  Try to reproduce the problem without PG.

compiler is fc3's latest - gcc-3.4.4-2.fc3. I might have a go at
 memtest86 in the next weeks if more symptoms point at
 possible bad RAM.

> > [<c031304a>] tcp_clean_rtx_queue+0x16a/0x470
> > [<c0313746>] tcp_ack+0xf6/0x360
> > [<c0315d57>] tcp_rcv_established+0x277/0x7a0
> > [<c031eba0>] tcp_v4_do_rcv+0xf0/0x110
> > [<c031f2a0>] tcp_v4_rcv+0x6e0/0x820
> > [<c0305594>] ip_local_deliver_finish+0x84/0x160
> 
> so something in the tcp stack ends up doing tcp_clean_rtx_queue()
> 
> > [<c02fbe4a>] nf_reinject+0x13a/0x1c0
> > [<c033f0d8>] ipq_issue_verdict+0x28/0x40
> > [<c033f968>] ipq_set_verdict+0x48/0x70
> 
> ip_queue reinjects a packet via nf_reinject()
> 
> > [<c033fa79>] ipq_receive_peer+0x39/0x50
> > [<c033fc72>] ipq_receive_sk+0x172/0x190
> 
> ip_queue receives an ipq verdict msg packet from netlink
> 
> > [<c02fffa5>] netlink_data_ready+0x35/0x60
> > [<c02ff4a4>] netlink_sendskb+0x24/0x60
> > [<c02ff657>] netlink_unicast+0x127/0x160
> > [<c02ffcc4>] netlink_sendmsg+0x204/0x2b0
> > [<c02e6dc0>] sock_sendmsg+0xb0/0xe0
> > [<c02e83f4>] sys_sendmsg+0x134/0x240
> > [<c02e88e4>] sys_socketcall+0x224/0x230
> > [<c0102d3b>] sysenter_past_esp+0x54/0x75
> 
> process sendmsg()s on the netlink socket.

Thanks,

--alessandro

 "Not every smile means I'm laughing inside"

(Wallflowers - "From The Bottom Of My Heart")


Re: oops in 2.6.13-rc6-git12 in tcp/netfilter routines

2005-08-25 Thread Harald Welte
On Thu, Aug 25, 2005 at 03:39:02PM +0200, Alessandro Suardi wrote:
> Howdy, and excuse me for crossposting - feel free to zap CC to
>  unrelated, if any, mailing lists.
> 
>   just gave PeerGuardian a spin on my eDonkey home box and
>   said box didn't last half a day before oopsing in netlink/nf/tcp
>   related routines (or so it seems to my untrained eye).

Yes, it indeed could be that there is some fishy interaction between the
tcp stack and ip_queue causing the oops. 

> K7800, 256MB RAM, uptodate FC3 running 2.6.13-rc6-git12,
>  doing nothing but running MetaMachine's eDonkey 1.4.3 QT gui.
> PeerGuardian is the 1.5 beta version available from methlabs.org.

Is it true that PeerGuardian is a proprietary application?  I'm not
going to debug this problem using a proprietary ip_queue program, sorry.

If you can produce a testcase with open source userspace ip_queue code,
I could look into reproducing the problem locally and debugging the
problem more thoroughly.

While it definitely is a kernel bug (whatever userspace sends should not
crash the kernel), it might be something that specifically [only]
PeerGuardian does to the packet.  Something that ip_queue doesn't check
(but should check) on packet reinjection and therefore upsets the TCP stack.

Also helpful would be the output of an "strace -f -x -s65535 -e
trace=sendmsg" on the PeerGuardian (daemon?) process.


> [<c0103714>] die+0xe4/0x170
> [<c010381f>] do_trap+0x7f/0xc0
> [<c0103b33>] do_invalid_op+0xa3/0xb0
> [<c0102faf>] error_code+0x4f/0x54
> [<c02eb05b>] kfree_skbmem+0xb/0x20
> [<c02eb0cf>] __kfree_skb+0x5f/0xf0

ok, so something down the chain from kfree_skb() results in an invalid
operation? looks more like some compiler problem, bad memory or memory
corruption to me.  Try to reproduce the problem without PG.

> [<c031304a>] tcp_clean_rtx_queue+0x16a/0x470
> [<c0313746>] tcp_ack+0xf6/0x360
> [<c0315d57>] tcp_rcv_established+0x277/0x7a0
> [<c031eba0>] tcp_v4_do_rcv+0xf0/0x110
> [<c031f2a0>] tcp_v4_rcv+0x6e0/0x820
> [<c0305594>] ip_local_deliver_finish+0x84/0x160

so something in the tcp stack ends up doing tcp_clean_rtx_queue()

> [<c02fbe4a>] nf_reinject+0x13a/0x1c0
> [<c033f0d8>] ipq_issue_verdict+0x28/0x40
> [<c033f968>] ipq_set_verdict+0x48/0x70

ip_queue reinjects a packet via nf_reinject()

> [<c033fa79>] ipq_receive_peer+0x39/0x50
> [<c033fc72>] ipq_receive_sk+0x172/0x190

ip_queue receives an ipq verdict msg packet from netlink

> [<c02fffa5>] netlink_data_ready+0x35/0x60
> [<c02ff4a4>] netlink_sendskb+0x24/0x60
> [<c02ff657>] netlink_unicast+0x127/0x160
> [<c02ffcc4>] netlink_sendmsg+0x204/0x2b0
> [<c02e6dc0>] sock_sendmsg+0xb0/0xe0
> [<c02e83f4>] sys_sendmsg+0x134/0x240
> [<c02e88e4>] sys_socketcall+0x224/0x230
> [<c0102d3b>] sysenter_past_esp+0x54/0x75

process sendmsg()s on the netlink socket.
-- 
- Harald Welte <[EMAIL PROTECTED]> http://netfilter.org/

  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."-- Paul Vixie
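
The walkthrough in the message above reads the trace from the bottom up: a sendmsg() on the netlink socket, ip_queue taking the verdict, nf_reinject(), the TCP receive path, and finally the skb free that traps. As a rough illustration, the same grouping can be done mechanically. This is only a sketch: the prefix table and the layer names are assumptions chosen for this one trace, not a kernel convention, and `layer_of` and `call_path` are hypothetical helper names.

```python
from itertools import groupby
from typing import List, Tuple

# Symbols from the reported trace, innermost frame first (oops order).
TRACE = [
    "die", "do_trap", "do_invalid_op", "error_code",
    "kfree_skbmem", "__kfree_skb",
    "tcp_clean_rtx_queue", "tcp_ack", "tcp_rcv_established",
    "tcp_v4_do_rcv", "tcp_v4_rcv",
    "ip_local_deliver_finish",
    "nf_reinject",
    "ipq_issue_verdict", "ipq_set_verdict",
    "ipq_receive_peer", "ipq_receive_sk",
    "netlink_data_ready", "netlink_sendskb",
    "netlink_unicast", "netlink_sendmsg",
    "sock_sendmsg", "sys_sendmsg", "sys_socketcall", "sysenter_past_esp",
]

# ASSUMPTION: illustrative prefix table; the first matching prefix wins.
PREFIX_TO_LAYER = [
    ("sys", "syscall entry"),
    ("sock_", "socket layer"),
    ("netlink_", "netlink"),
    ("ipq_", "ip_queue"),
    ("nf_", "netfilter core"),
    ("ip_", "IPv4 input"),
    ("tcp_", "TCP"),
    ("__kfree_skb", "skb free"),
    ("kfree_skb", "skb free"),
    ("die", "trap handling"),
    ("do_", "trap handling"),
    ("error_code", "trap handling"),
]


def layer_of(symbol: str) -> str:
    """Map a symbol name to a coarse layer label."""
    for prefix, layer in PREFIX_TO_LAYER:
        if symbol.startswith(prefix):
            return layer
    return "other"


def call_path(trace: List[str]) -> List[Tuple[str, int]]:
    """Collapse the trace into (layer, frame count) pairs in call order."""
    layers = (layer_of(sym) for sym in reversed(trace))
    return [(layer, len(list(group))) for layer, group in groupby(layers)]


for layer, frames in call_path(TRACE):
    print(f"{layer}: {frames} frame(s)")
```

Read top to bottom, the output follows the path a verdict message takes from userspace into the TCP stack, which is the point of the walkthrough: the packet being freed was reinjected by ip_queue on behalf of the userspace program.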


oops in 2.6.13-rc6-git12 in tcp/netfilter routines

2005-08-25 Thread Alessandro Suardi
Howdy, and excuse me for crossposting - feel free to zap CC to
 unrelated, if any, mailing lists.

  just gave PeerGuardian a spin on my eDonkey home box and
  said box didn't last half a day before oopsing in netlink/nf/tcp
  related routines (or so it seems to my untrained eye).

K7800, 256MB RAM, uptodate FC3 running 2.6.13-rc6-git12,
 doing nothing but running MetaMachine's eDonkey 1.4.3 QT gui.
PeerGuardian is the 1.5 beta version available from methlabs.org.

Stack is hand-copied from the dead box's console. 

[<c0103714>] die+0xe4/0x170
[<c010381f>] do_trap+0x7f/0xc0
[<c0103b33>] do_invalid_op+0xa3/0xb0
[<c0102faf>] error_code+0x4f/0x54
[<c02eb05b>] kfree_skbmem+0xb/0x20
[<c02eb0cf>] __kfree_skb+0x5f/0xf0
[<c031304a>] tcp_clean_rtx_queue+0x16a/0x470
[<c0313746>] tcp_ack+0xf6/0x360
[<c0315d57>] tcp_rcv_established+0x277/0x7a0
[<c031eba0>] tcp_v4_do_rcv+0xf0/0x110
[<c031f2a0>] tcp_v4_rcv+0x6e0/0x820
[<c0305594>] ip_local_deliver_finish+0x84/0x160
[<c02fbe4a>] nf_reinject+0x13a/0x1c0
[<c033f0d8>] ipq_issue_verdict+0x28/0x40
[<c033f968>] ipq_set_verdict+0x48/0x70
[<c033fa79>] ipq_receive_peer+0x39/0x50
[<c033fc72>] ipq_receive_sk+0x172/0x190
[<c02fffa5>] netlink_data_ready+0x35/0x60
[<c02ff4a4>] netlink_sendskb+0x24/0x60
[<c02ff657>] netlink_unicast+0x127/0x160
[<c02ffcc4>] netlink_sendmsg+0x204/0x2b0
[<c02e6dc0>] sock_sendmsg+0xb0/0xe0
[<c02e83f4>] sys_sendmsg+0x134/0x240
[<c02e88e4>] sys_socketcall+0x224/0x230
[<c0102d3b>] sysenter_past_esp+0x54/0x75
Code: 8b 41 0c 85 c0 75 1b 8b 86 94 00 00 00 e8 9e 37 e5 ff 5b 5e c9
c3 89 d0 e8 43 46 e5 ff 8d 76 00 eb d2 89 f0 e8 f7 fe ff ff eb dc <0f>
0b 54 01 16 d2 36 c0 eb b4 8d 74 26 00 8d bc 27 00 00 00 00
<0>Kernel panic - not syncing: Fatal exception in interrupt

If there's need for further info I'd be happy to provide it. For now
 the box is rebooted into the same kernel and running the same
 PG/eD2k programs, if the issue reproduces I'll follow up on my
 own message.

Thanks in advance, ciao,

--alessandro

 "Not every smile means I'm laughing inside"

(Wallflowers - "From The Bottom Of My Heart")
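
For anyone who wants to post-process a hand-copied trace like the one in this report, here is a minimal Python sketch that splits each frame into return address, symbol, offset and function size. The regular expression and the names `Frame` and `parse_frame` are illustrative assumptions, not part of any kernel tooling. The bracketed address is treated as optional because some copies of the trace in this thread carry empty brackets.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    address: Optional[int]  # return address, if the brackets contain one
    symbol: str             # function name
    offset: int             # offset of the return address inside the function
    size: int               # total size of the function in bytes


# Matches "[<c02eb05b>] sym+0xb/0x20", "[c02eb05b] sym+..." and "[] sym+...".
FRAME_RE = re.compile(
    r"\[(?:<?([0-9a-f]+)>?)?\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one trace line; return None for lines that are not frames."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    addr, symbol, offset, size = match.groups()
    return Frame(
        address=int(addr, 16) if addr else None,
        symbol=symbol,
        offset=int(offset, 16),
        size=int(size, 16),
    )


frame = parse_frame("[<c02eb05b>] kfree_skbmem+0xb/0x20")
if frame is not None and frame.address is not None:
    # Subtracting the offset gives the start address of the function.
    print(f"{frame.symbol} starts at {frame.address - frame.offset:#x}")
```

The start address computed this way can be checked against System.map for the running kernel, which is one way to catch transcription mistakes in a trace copied by hand.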

