Re: reproducible cxgb kernel panic in FC8 kernel 2.6.23.1-49

2007-11-19 Thread Ben Greear

Divy Le Ray wrote:
> Ben Greear wrote:
>> This panic happens (almost?) immediately after starting TCP traffic between
>> the cxgb nic on this system and another.  We also got at least one crash
>> on a custom/tainted 2.6.20.12 kernel, but it would run for at least
>> a few minutes at ~1Gbps first.
>>
>> I think my serial console chomped some of this..but it's very reproducible,
>> so if you need more info I can make the terminal wider and do it again.
>
> Hi Ben,
>
> I just posted a patch fixing this T2 crash. It appeared in 2.6.22, when
> eth_type_trans() was modified to set skb->dev. cxgb3 got fixed at the time,
> but I obviously forgot the chelsio driver. I'm a bit behind on T2 updates.
> I will get to it in a few days.


Thanks, that seems to have fixed the crash.

A few other bugs to report:

1)  tx/rx pkt counters remain at zero, even though I know it is passing packets.

2)  There are lots of errors about inadequate headroom in Tx.  I had TCP working
at one point, but then it stopped answering ARP for whatever reason.   Never
got UDP to work at all, even when TCP was working.

3)  After resetting the interface (ifdown, ifup), one machine suddenly had a BUG
(null pointer exception) and rebooted.  The listing in /var/log/messages is not
complete (has no stack-trace or module), so I do not include it here.

This 2.6.23 kernel is patched with some of my own hackings, and it's possible
that my changes are causing the problem (but it works fine with e1000 NICs).

If you have any patches you would like us to try, we'll be happy to do so.

Thanks,
Ben

--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: reproducible cxgb kernel panic in FC8 kernel 2.6.23.1-49

2007-11-15 Thread Divy Le Ray

Ben Greear wrote:
> This panic happens (almost?) immediately after starting TCP traffic between
> the cxgb nic on this system and another.  We also got at least one crash
> on a custom/tainted 2.6.20.12 kernel, but it would run for at least
> a few minutes at ~1Gbps first.
>
> I think my serial console chomped some of this..but it's very reproducible,
> so if you need more info I can make the terminal wider and do it again.

Hi Ben,

I just posted a patch fixing this T2 crash. It appeared in 2.6.22, when
eth_type_trans() was modified to set skb->dev. cxgb3 got fixed at the time,
but I obviously forgot the chelsio driver. I'm a bit behind on T2 updates.
I will get to it in a few days.


Cheers,
Divy
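
For readers following the explanation above, here is a minimal userspace
sketch of the kind of ordering problem it describes. It is an illustration
under assumptions, not the posted patch: the thread only says that
eth_type_trans() began setting skb->dev in 2.6.22 and that the chelsio driver
was not updated. The sketch assumes the receive path read skb->dev before
eth_type_trans() had run. Every name ending in _model, and the functions
rx_buggy_order() and rx_fixed_order(), are hypothetical stand-ins and do not
come from the kernel or the driver.

```c
/* Userspace model only. These types and functions are simplified,
 * hypothetical stand-ins, not kernel or chelsio driver definitions. */
#include <assert.h>
#include <stddef.h>

struct net_device_model {
    unsigned long last_rx;          /* timestamp of the last received packet */
};

struct sk_buff_model {
    struct net_device_model *dev;   /* NULL until someone assigns it */
};

/* Models the behavior described in the thread: from 2.6.22 on, the
 * helper itself assigns skb->dev, so callers no longer set it first. */
void eth_type_trans_model(struct sk_buff_model *skb,
                          struct net_device_model *dev)
{
    assert(dev != NULL);
    skb->dev = dev;
}

/* Assumed buggy ordering: skb->dev is used before the helper has set it.
 * The model returns -1 where a kernel would dereference a NULL pointer. */
int rx_buggy_order(struct sk_buff_model *skb,
                   struct net_device_model *dev, unsigned long now)
{
    if (skb->dev == NULL)
        return -1;                  /* stands in for the oops */
    skb->dev->last_rx = now;
    eth_type_trans_model(skb, dev);
    return 0;
}

/* Assumed fixed ordering: let the helper set skb->dev, then use it. */
int rx_fixed_order(struct sk_buff_model *skb,
                   struct net_device_model *dev, unsigned long now)
{
    eth_type_trans_model(skb, dev);
    skb->dev->last_rx = now;
    return 0;
}
```

With a freshly allocated buffer whose dev field is NULL, rx_buggy_order()
reports the failure on the first packet, while rx_fixed_order() records the
timestamp normally. Whether the real driver touched last_rx or some other
field is not stated in the thread.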



reproducible cxgb kernel panic in FC8 kernel 2.6.23.1-49

2007-11-13 Thread Ben Greear

This panic happens (almost?) immediately after starting TCP traffic between
the cxgb nic on this system and another.  We also got at least one crash
on a custom/tainted 2.6.20.12 kernel, but it would run for at least
a few minutes at ~1Gbps first.

I think my serial console chomped some of this..but it's very reproducible,
so if you need more info I can make the terminal wider and do it again.

I'm not sure it matters..but the peer NIC (directly connected w/fibre) is
a similar cxgb NIC but with TOE support (the longer, more expensive one).


[EMAIL PROTECTED] ~]# BUG: unable to handle kernel NULL pointer dereference at virtual address 0194
printing eip: f8a80b67 *pde = 7d0ac067
Oops: 0002 [#1] SMP
Modules linked in: arc4 michael_mic 8021q cxgb e1000 macvlan pktgen autofs4 sunrpc ipv6 loop dm_multipath i50d
CPU:1
EIP:0060:[f8a80b67]Not tainted VLI
EFLAGS: 00010206   (2.6.23.1-49.fc8 #1)
EIP is at t1_poll+0x2e0/0x64a [cxgb]
eax: fffd7d78   ebx: f6e56e02   ecx: f6e20500   edx: 
esi: f6ed8846   edi: f6f63428   ebp: f6820500   esp: c0789f7c
ds: 007b   es: 007b   fs: 00d8  gs:   ss: 0068
Process swapper (pid: 0, ti=c0789000 task=f7c42c20 task.ti=c211d000)
Stack:    c0789fd4 f6e2 f6e20500  
   f69f2060 f6f63448 0040 f6f63428 f6e20500 f6f63400  
    f6e2 c2017714 c2017700 c05bdc74 fffd7d78 012c 0001
Call Trace:
 [c05bdc74] net_rx_action+0x9a/0x196
 [c0431e06] __do_softirq+0x66/0xd3
 [c04073d5] do_softirq+0x6c/0xce
 [c0444675] tick_do_update_jiffies64+0x15/0xa8
 [c044018b] ktime_get+0xf/0x2b
 [c045bac7] handle_edge_irq+0x0/0xfc
 [c0431cc9] irq_exit+0x38/0x6b
 [c04074d6] do_IRQ+0x9f/0xb9
 [c043ff60] hrtimer_start+0xe6/0xf0
 [c0405b6f] common_interrupt+0x23/0x28
 [c04032a1] mwait_idle_with_hints+0x3b/0x3f
 [c04032a5] mwait_idle+0x0/0x13
 [c040340b] cpu_idle+0xab/0xcc
 ===
Code: 68 b3 c7 e9 ef 01 00 00 8b 45 50 83 e8 08 3b 45 54 89 45 50 73 04 0f 0b eb fe 8d 43 08 8b 55 14 89 85 a
EIP: [f8a80b67] t1_poll+0x2e0/0x64a [cxgb] SS:ESP 0068:c0789f7c
Kernel panic - not syncing: Fatal exception in interrupt


lspci is below:

[EMAIL PROTECTED] ~]# lspci
00:00.0 Host bridge: Intel Corporation 5000V Chipset Memory Controller Hub (rev b1)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 2-3 (rev b1)
00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev b1)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09)
00:1f.2 IDE interface: Intel Corporation 631xESB/632xESB/3100 Chipset SATA IDE Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
05:01.0 Ethernet controller: Chelsio Communications Inc Unknown device 000a 
07:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
[EMAIL PROTECTED] ~]#



--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com
