[pfSense] Frequent bge0: watchdog timeout -- resetting problems

2013-05-13 Thread Paul Mather
I'm running pfSense 2.0.3-RELEASE (i386) on a Dell 2650 rack-mount server.  I'm 
using the built-in Broadcom gigabit ethernet NICs for WAN and LAN:

bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x000105> mem 0xfca1-0xfca1 irq 28 at device 6.0 on pci4
miibus0: <MII bus> on bge0
brgphy0: <BCM5701 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: [ITHREAD]
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x000105> mem 0xfca0-0xfca0 irq 29 at device 8.0 on pci4
miibus1: <MII bus> on bge1
brgphy1: <BCM5701 10/100/1000baseTX PHY> PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1: [ITHREAD]

bge0@pci0:4:6:0: class=0x02 card=0x01211028 chip=0x164514e4 rev=0x15 hdr=0x00
    class      = network
    subclass   = ethernet
    cap 07[40] = PCI-X 64-bit supports 133MHz, 512 burst read, 1 split transaction
    cap 01[48] = powerspec 2  supports D0 D3  current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit
bge1@pci0:4:8:0: class=0x02 card=0x01211028 chip=0x164514e4 rev=0x15 hdr=0x00
    class      = network
    subclass   = ethernet
    cap 07[40] = PCI-X 64-bit supports 133MHz, 512 burst read, 1 split transaction
    cap 01[48] = powerspec 2  supports D0 D3  current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit


I am having severe problems with these NICs, particularly on the WAN side
(bge0).  Under traffic (not necessarily high load), I lose connectivity for
some time until a watchdog appears to reset the NIC.  The following typically
repeats in dmesg:

bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
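
To gauge how often the resets occur, the repeated messages can be tallied per
interface.  The following is a minimal sketch, not part of pfSense: the
function name count_watchdog_resets is made up for illustration, and it
assumes the kernel message format shown above.

```shell
#!/bin/sh
# Count "watchdog timeout" resets per interface in kernel messages on stdin.
# Prints one "<interface> <count>" line per interface seen, sorted by name.
count_watchdog_resets() {
    awk '/: watchdog timeout -- resetting/ {
             sub(/:.*/, "", $1)   # strip the trailing colon from the name
             n[$1]++
         }
         END { for (i in n) print i, n[i] }' | sort
}
```

Typical use would be `dmesg | count_watchdog_resets`, run before and after a
configuration change to see whether the reset rate moves.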


Under System > Advanced > Networking, I have disabled hardware checksum
offload, hardware TCP segmentation offload, and hardware large receive offload,
but this has not helped.  Searching Google, I have found references to problems
with Broadcom 57XX-based NICs under FreeBSD, and there are indications that
some work has been done in FreeBSD 9-STABLE to improve matters.  That does not
help pfSense 2.0.3, which is based on FreeBSD 8.1-RELEASE-p13.
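
The GUI settings can be cross-checked from a shell, since ifconfig lists the
capabilities currently enabled on an interface in its options=...<...> line.
A minimal sketch, assuming the usual FreeBSD ifconfig output format; the helper
name enabled_offloads is hypothetical.  It prints any checksum, TSO, or LRO
flags that are still enabled, and nothing once all of them are off.

```shell
#!/bin/sh
# Read `ifconfig <if>` output on stdin and print, one per line, the
# offload-related flags that are still enabled on the interface.
enabled_offloads() {
    sed -n 's/.*options=[0-9a-f]*<\(.*\)>.*/\1/p' |
        tr ',' '\n' |
        grep -E '^(RXCSUM|TXCSUM|TSO4|TSO6|LRO)$'
}
```

Typical use would be `ifconfig bge0 | enabled_offloads`.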

I have checked the state table usage when this problem occurs and it is low 
(with ample free state entries available).

I have heard that disabling MSI can sometimes help, but the bge driver does
not appear to expose an MSI setting:

sysctl -a | grep msi
hw.bce.msi_enable: 1
hw.cxgb.msi_allowed: 2
hw.em.enable_msix: 1
hw.igb.enable_msix: 1
hw.malo.pci.msi_disable: 0
hw.pci.honor_msi_blacklist: 1
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1


Has anyone run into this problem?  Can anyone offer a possible solution or 
workaround?

I also have a dual-port expansion card in the same machine whose NICs use the
fxp driver, and right now I am tempted to switch to those, on the view that
stable 100BaseT is better than flaky 1000BaseT.  Still, I am hoping something
can be done to make the bge ports stable.  Any thoughts?

Cheers,

Paul.
___
List mailing list
List@lists.pfsense.org
http://lists.pfsense.org/mailman/listinfo/list


Re: [pfSense] Frequent bge0: watchdog timeout -- resetting problems

2013-05-13 Thread Paul Mather
On May 13, 2013, at 10:40 AM, Giles Coochey gi...@coochey.net wrote:

> On 13/05/2013 15:07, Paul Mather wrote:
>>
>> bge0: watchdog timeout -- resetting
>> bge0: link state changed to DOWN
>> bge0: link state changed to UP
>> bge0: watchdog timeout -- resetting
>> bge0: link state changed to DOWN
>> bge0: link state changed to UP
>> bge0: watchdog timeout -- resetting
>> bge0: link state changed to DOWN
>> bge0: link state changed to UP
>> bge0: watchdog timeout -- resetting
>> bge0: link state changed to DOWN
>> bge0: link state changed to UP
>>
>
> I had something similar with a VM implementation; it seemed to go away when
> I increased the memory on the system.


How much memory did that system have after the increase?  The hardware I am
using has 2 GB of RAM, which should be plenty for pfSense.  According to the
RRD graphs, active+wired+cached memory usage on this system normally stays
below 5% of total RAM.
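
The percentage can also be checked by hand from page counts.  This is a minimal
sketch under stated assumptions: the helper name mem_used_percent is made up,
and it simply adds active, wired, and cached pages and divides by the total,
which may not be exactly how the pfSense RRD graph derives its figure.

```shell
#!/bin/sh
# Percentage of RAM that is active + wired + cached, computed from page counts.
# Arguments: active wired cached total (all in pages).
mem_used_percent() {
    awk -v a="$1" -v w="$2" -v c="$3" -v t="$4" \
        'BEGIN { printf "%.1f\n", 100 * (a + w + c) / t }'
}

# On FreeBSD the page counts would come from sysctl, for example:
#   mem_used_percent "$(sysctl -n vm.stats.vm.v_active_count)" \
#                    "$(sysctl -n vm.stats.vm.v_wire_count)" \
#                    "$(sysctl -n vm.stats.vm.v_cache_count)" \
#                    "$(sysctl -n vm.stats.vm.v_page_count)"
```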

Cheers,

Paul.
