Re: Channel bonding kernel crash, workaround

2001-03-23 Thread Jeff Golds

I heard about this issue, and just joined the mailing list.  I am
working on a driver similar to the bonding driver and am getting the
same results.

If I do the following:
ifconfig fte0 10.0.0.2 up
ifenslave fte0 eth1
ifenslave fte0 eth2
ifconfig fte0 down
ifconfig eth1

I get a kernel panic.  I ran the Oops log through ksymoops and got the
following:

Oops:
EIP: 0010:[<c020c7c9>]
Using defaults from ksymoops -t elf32-i386 -a i386
Call Trace: [<c0232d52>] [<c0232d80>] [<c02330d6>] [<c02310f7>]
[<c011f1d7>] [<c020a943>] [<c020b9cf>] [<c0230a04>] [<c0206388>]
[<c0142977>] [<c0109127>]
Code: 0f b6 4b 0b 8d 73 04 8b 7c 24 24 fc 39 c9 89 0c 24 f3 a6 0f

>>EIP; c020c7c9 <dev_mc_delete+a9/120>   <=====
Trace; c0232d52 <ip_mc_filter_del+32/40>
Trace; c0232d80 <igmp_group_dropped+20/b0>
Trace; c02330d6 <ip_mc_down+56/70>
Trace; c02310f7 <inetdev_event+f7/150>
Trace; c011f1d7 <notifier_call_chain+27/50>
Trace; c020a943 <dev_close+53/80>
Trace; c020b9cf <dev_change_flags+4f/f0>
Trace; c0230a04 <devinet_ioctl+2c4/630>
Trace; c0206388 <sock_ioctl+58/80>
Trace; c0142977 <sys_ioctl+1c7/210>
Trace; c0109127 <system_call+33/38>
Code;  c020c7c9 <dev_mc_delete+a9/120>
<_EIP>:
Code;  c020c7c9 <dev_mc_delete+a9/120>   <=====
   0:   0f b6 4b 0b   movzbl 0xb(%ebx),%ecx   <=====
Code;  c020c7cd <dev_mc_delete+ad/120>
   4:   8d 73 04      lea    0x4(%ebx),%esi
Code;  c020c7d0 <dev_mc_delete+b0/120>
   7:   8b 7c 24 24   mov    0x24(%esp,1),%edi
Code;  c020c7d4 <dev_mc_delete+b4/120>
   b:   fc            cld
Code;  c020c7d5 <dev_mc_delete+b5/120>
   c:   39 c9         cmp    %ecx,%ecx
Code;  c020c7d7 <dev_mc_delete+b7/120>
   e:   89 0c 24      mov    %ecx,(%esp,1)
Code;  c020c7da <dev_mc_delete+ba/120>
  11:   f3 a6         repz cmpsb %es:(%edi),%ds:(%esi)
Code;  c020c7dc <dev_mc_delete+bc/120>
  13:   0f 00 00      sldt   (%eax)

This is AFTER I removed the code in my driver that sets the slave devices'
multicast lists equal to the master's.  When I put that code back
in (see bond_set_multicast_list in bonding.c), I get a crash at the same
location, but the call trace is slightly different.

It's looking like the multicast list is either corrupted or the data was
freed elsewhere.
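
One way to read the disassembly above, offered as a sketch and not as the
kernel's actual source: the faulting instruction loads a byte at offset 0xb
from %ebx, the next one takes the address at offset 0x4, and a byte-wise
compare (repz cmpsb) follows.  That pattern fits a list walk that reads an
entry's address length and then compares the address bytes, so a dangling or
corrupted entry pointer would fault on the very first read.  The user-space
model below assumes an entry laid out as a next pointer, a 7-byte address
array and a one-byte length; the names mc_entry, mc_add, mc_delete and
SKETCH_MAX_ADDR_LEN are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 7-byte hardware address array, as in 2.4-era kernels. */
#define SKETCH_MAX_ADDR_LEN 7

/* Hypothetical user-space stand-in for a multicast list entry. */
struct mc_entry {
    struct mc_entry *next;                    /* offset 0 */
    unsigned char addr[SKETCH_MAX_ADDR_LEN];  /* right after the pointer */
    unsigned char addrlen;                    /* 7 bytes after addr */
    int users;                                /* reference count */
};

/* Add an address, or bump the count if it is already listed.
 * Returns 0 on success, -1 on a bad length or allocation failure. */
int mc_add(struct mc_entry **head, const unsigned char *addr, int alen)
{
    struct mc_entry *e;

    assert(head != NULL && addr != NULL);
    if (alen <= 0 || alen > SKETCH_MAX_ADDR_LEN)
        return -1;
    for (e = *head; e != NULL; e = e->next) {
        if (e->addrlen == alen && memcmp(e->addr, addr, alen) == 0) {
            e->users++;
            return 0;
        }
    }
    e = calloc(1, sizeof(*e));
    if (e == NULL)
        return -1;
    memcpy(e->addr, addr, alen);
    e->addrlen = (unsigned char)alen;
    e->users = 1;
    e->next = *head;
    *head = e;
    return 0;
}

/* Drop one reference; unlink and free the entry when the count hits zero.
 * Returns 0 if the address was found, -1 otherwise. */
int mc_delete(struct mc_entry **head, const unsigned char *addr, int alen)
{
    struct mc_entry **link, *e;

    assert(head != NULL && addr != NULL);
    for (link = head; (e = *link) != NULL; link = &e->next) {
        /* First read through e: a dangling or corrupted pointer would
         * fault here, before any address bytes are compared. */
        if (e->addrlen == alen && memcmp(e->addr, addr, alen) == 0) {
            if (--e->users > 0)
                return 0;
            *link = e->next;
            free(e);
            return 0;
        }
    }
    return -1;
}
```

With 4-byte pointers, as on 32-bit x86, this layout puts addr at offset 0x4
and addrlen at offset 0xb, which matches the two offsets in the disassembly.
If some other path frees an entry or rewrites a next pointer without
unlinking it properly, the walk in mc_delete would follow a stale pointer
exactly as described.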

Anyone have any ideas what might be wrong?  The driver I am working on
is a close match to the bonding driver, so that can be used as a
reference.

Thanks for any feedback.

-Jeff

P.S.  I saw the workaround posted earlier, but I am trying to fix this
crash.

-- 
Jeff Golds
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Channel bonding kernel crash, workaround

2001-03-22 Thread Pfenniger Daniel

Hi, 

Up to the latest kernels (-> 2.4.2), channel bonding crashes the kernel
(ouch...) when it is turned off (e.g. at reboot).

Here is a way to avoid this, which might help gurus track down the bug.
Suppose ifconfig says: 
 
bond0 Link encap:Ethernet  HWaddr 00:40:05:A1:C4:13  
  inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:823297 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0 

eth0  Link encap:Ethernet  HWaddr 00:40:05:A1:C4:13  
  inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
  RX packets:487424 errors:0 dropped:0 overruns:0 frame:0
  TX packets:411649 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:100 
  Interrupt:19 Base address:0xd000 

eth1  Link encap:Ethernet  HWaddr 00:40:05:A1:C4:13  
  inet addr:192.168.2.64  Bcast:192.168.2.255  Mask:255.255.255.0
  UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
  RX packets:487526 errors:0 dropped:0 overruns:0 frame:0
  TX packets:411648 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:100 
  Interrupt:18 Base address:0xb800 

then the exact sequence:

ifconfig eth0 down 
ifconfig eth1 down 
ifconfig bond0 down 
ifconfig eth0 down 
ifconfig eth1 down 

turns off bond0 without a crash.  This was tested on several computers, all
running kernel 2.4.2 on SMP Pentium II with tulip 21140 and/or 21142/3 NICs.

Dan