Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-03-02 Thread Nick Rogers
Seconded. I get daily panics on a Tyan board with a BCM5704. Unfortunately I
am unable to provide a crash dump, and I was forced to switch to a different
NIC. For what it's worth, here is the relevant pciconf -lv output.

b...@pci0:2:9:0: class=0x02 card=0x164814e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class      = network
    subclass   = ethernet
b...@pci0:2:9:1: class=0x02 card=0x164814e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class      = network
    subclass   = ethernet


On Sat, Feb 27, 2010 at 2:50 PM, Erik Klavon er...@berkeley.edu wrote:

 I have BCM5704 hardware (Tyan S2882 system board). I am seeing kernel
 panics very similar to those described in this thread on this
 hardware. pciconf -lcv output below. If you'd like access to this
 hardware I can arrange it; please contact me off list.

 Erik

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-03-01 Thread Dmitry Rybin
Broadcom 5714 and 5715: no problems.

2010/2/22 Denis Lamanov ukrzi...@gmail.com:
 Yes, PCIX BCM5704
 FreeBSD vpn2 8.0-STABLE FreeBSD 8.0-STABLE #1 r204028: Thu Feb 18 08:29:42
 EET 2010     ad...@vpn2:/usr/obj/usr/src/sys/GENERIC  i386

 2010/2/22 Pyun YongHyeon pyu...@gmail.com

 On Mon, Feb 22, 2010 at 03:17:17PM +0200, Denis Lamanov wrote:
  I see same trouble (lost packets after 4 day uptime and reboot) :(
 
  dev.bge.0.stats.rx.FCSErrors: 18
 

 You also have PCIX BCM5704 controller? What FreeBSD version do you
 use?

  2010/2/19 Slawa Olhovchenkov s...@zxy.spb.ru
 
   On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
  
   
 dev.bge.1.stats.rx.Fragments: 1
   
You received a frame that is less than 64 bytes with a bad FCS.
   
 dev.bge.1.stats.rx.UcastPkts: 2956515
 dev.bge.1.stats.rx.MulticastPkts: 0
 dev.bge.1.stats.rx.FCSErrors: 18
   
You have a lot of FCS errors here.
Please double check cabling. If the statistics counter is right,
sender is guilty or you have bad cabling issues here.
  
   1. lost packets much more 18. I think hundreds, or thousands.
   2. packets lost on both (bge0  bge1) interfaces
   3. packets don't lost on sources at Aug'09


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-27 Thread Erik Klavon
Hi Pyun,

On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:
 Since I don't have BCM5704 hardware it's hard to find which
 revision may affect to this issue. Could you narrow down which
 revision number started showing the issue?

I have BCM5704 hardware (Tyan S2882 system board). I am seeing kernel
panics very similar to those described in this thread on this
hardware. pciconf -lcv output below. If you'd like access to this
hardware I can arrange it; please contact me off list.

Erik

b...@pci0:2:9:0: class=0x02 card=0x164414e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class      = network
    subclass   = ethernet
    cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
    cap 01[48] = powerspec 2  supports D0 D3  current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit
b...@pci0:2:9:1: class=0x02 card=0x164414e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class      = network
    subclass   = ethernet
    cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
    cap 01[48] = powerspec 2  supports D0 D3  current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit
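
For readers comparing the pciconf output across reports: the chip= and card= fields each pack two 16-bit IDs into one 32-bit value, with the device ID in the upper half and the vendor ID in the lower half. The following is only a minimal sketch of that decoding; the helper name split_pci_id is made up for illustration and is not part of pciconf.

```python
from typing import Tuple


def split_pci_id(value: str) -> Tuple[int, int]:
    """Split a pciconf field such as 'chip=0x164814e4' into (device, vendor).

    Hypothetical helper: accepts either 'name=0x...' or a bare '0x...' string.
    """
    raw = int(value.split("=", 1)[-1], 16)
    # Upper 16 bits: device (or subsystem) ID; lower 16 bits: vendor ID.
    return (raw >> 16) & 0xFFFF, raw & 0xFFFF


device, vendor = split_pci_id("chip=0x164814e4")
print(f"vendor=0x{vendor:04x} device=0x{device:04x}")
```

Applied to the values above, both reports decode to vendor 0x14e4 and device 0x1648; only the subsystem ID in card= differs between the two boards.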


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-22 Thread Denis Lamanov
I see the same trouble (lost packets after 4 days of uptime and a reboot) :(

dev.bge.0.stats.rx.FCSErrors: 18

2010/2/19 Slawa Olhovchenkov s...@zxy.spb.ru

 On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:

 
   dev.bge.1.stats.rx.Fragments: 1
 
  You received a frame that is less than 64 bytes with a bad FCS.
 
   dev.bge.1.stats.rx.UcastPkts: 2956515
   dev.bge.1.stats.rx.MulticastPkts: 0
   dev.bge.1.stats.rx.FCSErrors: 18
 
  You have a lot of FCS errors here.
  Please double check cabling. If the statistics counter is right,
  sender is guilty or you have bad cabling issues here.

 1. lost packets much more 18. I think hundreds, or thousands.
 2. packets lost on both (bge0  bge1) interfaces
 3. packets don't lost on sources at Aug'09


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-22 Thread Denis Lamanov
vpn2# sysctl dev.bge.0.stats
dev.bge.0.stats.FramesDroppedDueToFilters: 0
dev.bge.0.stats.DmaWriteQueueFull: 0
dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
dev.bge.0.stats.NoMoreRxBDs: 0
dev.bge.0.stats.InputDiscards: 0
dev.bge.0.stats.InputErrors: 0
dev.bge.0.stats.RecvThresholdHit: 36622
dev.bge.0.stats.DmaReadQueueFull: 17
dev.bge.0.stats.DmaReadHighPriQueueFull: 0
dev.bge.0.stats.SendDataCompQueueFull: 0
dev.bge.0.stats.RingSetSendProdIndex: 116130
dev.bge.0.stats.RingStatusUpdate: 79240
dev.bge.0.stats.Interrupts: 79240
dev.bge.0.stats.AvoidedInterrupts: 0
dev.bge.0.stats.SendThresholdHit: 0
dev.bge.0.stats.rx.Octets: 132390898
dev.bge.0.stats.rx.Fragments: 0
dev.bge.0.stats.rx.UcastPkts: 117696
dev.bge.0.stats.rx.MulticastPkts: 1
dev.bge.0.stats.rx.FCSErrors: 41
dev.bge.0.stats.rx.AlignmentErrors: 0
dev.bge.0.stats.rx.xonPauseFramesReceived: 0
dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
dev.bge.0.stats.rx.ControlFramesReceived: 0
dev.bge.0.stats.rx.xoffStateEntered: 0
dev.bge.0.stats.rx.FramesTooLong: 0
dev.bge.0.stats.rx.Jabbers: 0
dev.bge.0.stats.rx.UndersizePkts: 0
dev.bge.0.stats.rx.inRangeLengthError: 0
dev.bge.0.stats.rx.outRangeLengthError: 0
dev.bge.0.stats.tx.Octets: 125971311
dev.bge.0.stats.tx.Collisions: 0
dev.bge.0.stats.tx.XonSent: 0
dev.bge.0.stats.tx.XoffSent: 0
dev.bge.0.stats.tx.flowControlDone: 0
dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
dev.bge.0.stats.tx.SingleCollisionFrames: 0
dev.bge.0.stats.tx.MultipleCollisionFrames: 0
dev.bge.0.stats.tx.DeferredTransmissions: 0
dev.bge.0.stats.tx.ExcessiveCollisions: 0
dev.bge.0.stats.tx.LateCollisions: 0
dev.bge.0.stats.tx.UcastPkts: 115417
dev.bge.0.stats.tx.MulticastPkts: 0
dev.bge.0.stats.tx.BroadcastPkts: 0
dev.bge.0.stats.tx.CarrierSenseErrors: 0
dev.bge.0.stats.tx.Discards: 0
dev.bge.0.stats.tx.Errors: 0


2010/2/19 Pyun YongHyeon pyu...@gmail.com

 On Fri, Feb 19, 2010 at 11:13:59PM +0300, Slawa Olhovchenkov wrote:
  On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
 
  
dev.bge.1.stats.rx.Fragments: 1
  
   You received a frame that is less than 64 bytes with a bad FCS.
  
dev.bge.1.stats.rx.UcastPkts: 2956515
dev.bge.1.stats.rx.MulticastPkts: 0
dev.bge.1.stats.rx.FCSErrors: 18
  
   You have a lot of FCS errors here.
   Please double check cabling. If the statistics counter is right,
   sender is guilty or you have bad cabling issues here.
 
  1. lost packets much more 18. I think hundreds, or thousands.
  2. packets lost on both (bge0  bge1) interfaces

 If you look at the MAC statistics counters, you have the following
 number of status updates and interrupts. Both numbers are the same,
 which means the controller did not lose any interrupts for status
 updates.
 dev.bge.0.stats.RingStatusUpdate: 950302
 dev.bge.0.stats.Interrupts: 950302
 and
 dev.bge.1.stats.RingStatusUpdate: 5518912
 dev.bge.1.stats.Interrupts: 5518912

 You received 582767 unicast packets and lost 0 packet for bge0.
 dev.bge.0.stats.rx.UcastPkts: 582767
 And you also received 2956515 unicast packets and lost 19 packets
 for bge1.
 dev.bge.1.stats.rx.Fragments: 1
 dev.bge.1.stats.rx.UcastPkts: 2956515
 dev.bge.1.stats.rx.FCSErrors: 18
 I don't see such a large number of packet drops in these MAC
 statistics, unless the upper stack is dropping received packets.
 I fixed some counter updates which were ignored in previous
 releases, so you may now see loss counters in recent versions.

 Normally you should not have any FCS errors. They could be related
 to signal quality, and these errors might not be counted
 correctly.

  3. packets don't lost on sources at Aug'09

 Since I don't have BCM5704 hardware it's hard to find which
 revision may affect to this issue. Could you narrow down which
 revision number started showing the issue?


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-22 Thread Slawa Olhovchenkov
On Mon, Feb 22, 2010 at 10:18:47AM +0300, Slawa Olhovchenkov wrote:

 On Sun, Feb 21, 2010 at 03:41:53PM -0800, Pyun YongHyeon wrote:
 
  On Sun, Feb 21, 2010 at 12:44:50AM +0300, Slawa Olhovchenkov wrote:
   On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:
   
Normally you should not have any FCS errors, it could be related
with signal quality and these errors might not be correctly
counted.
   
   I can't check cable and switch counters on bge1 before Feb 24.
   
 3. packets don't lost on sources at Aug'09

Since I don't have BCM5704 hardware it's hard to find which
revision may affect to this issue. Could you narrow down which
revision number started showing the issue?
   
   I am don't update source between Aug'09 and Feb 16.
   
  
  There were many bge(4) changes in that time frame. So it's hard to
  find which commit is guilty for the packet drop issue. If you can
  narrow down possible changes that might affect the issue that could
  help me a lot. You can do binary searching technique for the SVN
  revisions to know possible candidates.
  http://svn.freebsd.org/viewvc/base/head/sys/dev/bge/if_bge.c
 
 How I can do this?
 I don't work w/ svn before and don't know optimal way for one file.

mail# rm sys/dev/bge/*
mail# svn checkout -r 201697 svn://svn.freebsd.org/base/stable/8/sys/dev/bge/ sys/dev/bge
A    sys/dev/bge/if_bgereg.h
A    sys/dev/bge/if_bge.c
Checked out revision 201697.
mail# make -DNO_CLEAN -DKERNFAST buildkernel
===> bge (all)
cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc   
-DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/usr/src/sys/MAIL/opt_global.h 
-I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 
--param large-function-growth=1000 -fno-common -g -fno-omit-frame-pointer 
-I/usr/obj/usr/src/sys/MAIL -mcmodel=kernel -mno-red-zone  -mfpmath=387 
-mno-sse -mno-sse2 -mno-sse3 -mno-mmx -mno-3dnow  -msoft-float 
-fno-asynchronous-unwind-tables -ffreestanding -fstack-protector 
-std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs 
-Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  
-Wundef -Wno-pointer-sign -fformat-extensions -c 
/usr/src/sys/modules/bge/../../dev/bge/if_bge.c
ld  -d -warn-common -r -d -o if_bge.ko.debug if_bge.o
:> export_syms
awk -f /usr/src/sys/conf/kmod_syms.awk if_bge.ko.debug  export_syms | xargs -J% 
objcopy % if_bge.ko.debug
objcopy --only-keep-debug if_bge.ko.debug if_bge.ko.symbols
objcopy --strip-debug --add-gnu-debuglink=if_bge.ko.symbols if_bge.ko.debug 
if_bge.ko
===> mii (all)
cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc   
-DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/usr/src/sys/MAIL/opt_global.h 
-I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 
--param large-function-growth=1000 -fno-common -g -fno-omit-frame-pointer 
-I/usr/obj/usr/src/sys/MAIL -mcmodel=kernel -mno-red-zone  -mfpmath=387 
-mno-sse -mno-sse2 -mno-sse3 -mno-mmx -mno-3dnow  -msoft-float 
-fno-asynchronous-unwind-tables -ffreestanding -fstack-protector 
-std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs 
-Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  
-Wundef -Wno-pointer-sign -fformat-extensions -c 
/usr/src/sys/modules/mii/../../dev/mii/brgphy.c
ld  -d -warn-common -r -d -o miibus.ko.debug acphy.o amphy.o atphy.o axphy.o 
bmtphy.o brgphy.o ciphy.o e1000phy.o exphy.o gentbi.o icsphy.o inphy.o 
ip1000phy.o jmphy.o lxtphy.o miibus_if.o mii.o mii_physubr.o mlphy.o nsgphy.o 
nsphy.o nsphyter.o pnaphy.o qsphy.o rgephy.o rlphy.o ruephy.o tdkphy.o tlphy.o 
truephy.o ukphy.o ukphy_subr.o xmphy.o
echo mii_mediachg mii_phy_probe mii_phy_reset mii_pollstat mii_tick > export_syms
awk -f /usr/src/sys/conf/kmod_syms.awk miibus.ko.debug  export_syms | xargs -J% 
objcopy % miibus.ko.debug
objcopy --only-keep-debug miibus.ko.debug miibus.ko.symbols
objcopy --strip-debug --add-gnu-debuglink=miibus.ko.symbols miibus.ko.debug 
miibus.ko



Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-22 Thread Pyun YongHyeon
On Mon, Feb 22, 2010 at 03:17:17PM +0200, Denis Lamanov wrote:
 I see same trouble (lost packets after 4 day uptime and reboot) :(
 
 dev.bge.0.stats.rx.FCSErrors: 18
 

Do you also have a PCI-X BCM5704 controller? Which FreeBSD version are
you running?

 2010/2/19 Slawa Olhovchenkov s...@zxy.spb.ru
 
  On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
 
  
dev.bge.1.stats.rx.Fragments: 1
  
   You received a frame that is less than 64 bytes with a bad FCS.
  
dev.bge.1.stats.rx.UcastPkts: 2956515
dev.bge.1.stats.rx.MulticastPkts: 0
dev.bge.1.stats.rx.FCSErrors: 18
  
   You have a lot of FCS errors here.
   Please double check cabling. If the statistics counter is right,
   sender is guilty or you have bad cabling issues here.
 
  1. lost packets much more 18. I think hundreds, or thousands.
  2. packets lost on both (bge0  bge1) interfaces
  3. packets don't lost on sources at Aug'09


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-22 Thread Denis Lamanov
Yes, PCIX BCM5704
FreeBSD vpn2 8.0-STABLE FreeBSD 8.0-STABLE #1 r204028: Thu Feb 18 08:29:42
EET 2010 ad...@vpn2:/usr/obj/usr/src/sys/GENERIC  i386

2010/2/22 Pyun YongHyeon pyu...@gmail.com

 On Mon, Feb 22, 2010 at 03:17:17PM +0200, Denis Lamanov wrote:
  I see same trouble (lost packets after 4 day uptime and reboot) :(
 
  dev.bge.0.stats.rx.FCSErrors: 18
 

 You also have PCIX BCM5704 controller? What FreeBSD version do you
 use?

  2010/2/19 Slawa Olhovchenkov s...@zxy.spb.ru
 
   On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
  
   
 dev.bge.1.stats.rx.Fragments: 1
   
You received a frame that is less than 64 bytes with a bad FCS.
   
 dev.bge.1.stats.rx.UcastPkts: 2956515
 dev.bge.1.stats.rx.MulticastPkts: 0
 dev.bge.1.stats.rx.FCSErrors: 18
   
You have a lot of FCS errors here.
Please double check cabling. If the statistics counter is right,
sender is guilty or you have bad cabling issues here.
  
   1. lost packets much more 18. I think hundreds, or thousands.
   2. packets lost on both (bge0  bge1) interfaces
   3. packets don't lost on sources at Aug'09


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-21 Thread Pyun YongHyeon
On Sun, Feb 21, 2010 at 12:44:50AM +0300, Slawa Olhovchenkov wrote:
 On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:
 
  Normally you should not have any FCS errors, it could be related
  with signal quality and these errors might not be correctly
  counted.
 
 I can't check cable and switch counters on bge1 before Feb 24.
 
   3. packets don't lost on sources at Aug'09
  
  Since I don't have BCM5704 hardware it's hard to find which
  revision may affect to this issue. Could you narrow down which
  revision number started showing the issue?
 
 I am don't update source between Aug'09 and Feb 16.
 

There were many bge(4) changes in that time frame, so it's hard to
tell which commit is responsible for the packet drop issue. If you can
narrow down the changes that might be involved, that would help me
a lot. You can binary search the SVN revisions to find the likely
candidates.
http://svn.freebsd.org/viewvc/base/head/sys/dev/bge/if_bge.c
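
The binary search over revisions can be sketched as below. This is only an illustration under stated assumptions: the revision list and the is_bad test are placeholders, the oldest revision is assumed good, the newest is assumed bad, and the problem is assumed to persist in every revision after the one that introduced it. In practice is_bad would mean checking out that revision of sys/dev/bge, rebuilding the kernel, and watching for packet loss.

```python
from typing import Callable, List


def first_bad_revision(revisions: List[int], is_bad: Callable[[int], bool]) -> int:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions is sorted ascending, revisions[0] is known good and
    revisions[-1] is known bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # culprit is at mid or earlier
        else:
            lo = mid  # culprit is after mid
    return revisions[hi]


# Placeholder revision numbers and a made-up culprit, for illustration only.
candidates = [196000, 197500, 199000, 200500, 201697, 203000, 204028]
culprit = first_bad_revision(candidates, lambda rev: rev >= 200500)
print(culprit)
```

Each test halves the remaining range, so even a few dozen candidate commits need only five or six kernel builds.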

 4. Packets don't lost immediately after reboot.
 
 PS: I got kernel panic.
 

I think this is the same crash (a NULL pointer dereference in
m_copym(9)) as the one you reported, which means the patch I
posted did not fix the panic.
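
The NULL pointer reading follows from simple address arithmetic on the trap output: the faulting instruction reads from 0x18(%r12) and the fault virtual address is 0x18, so the base register must have held zero. A minimal sketch of that check, with the two values copied from the trap message and a helper name chosen only for illustration:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def effective_address(base: int, displacement: int) -> int:
    """Address touched by an operand of the form disp(%reg), wrapped to 64 bits."""
    return (base + displacement) & MASK64


fault_address = 0x18  # "fault virtual address" in the trap message
displacement = 0x18   # from "movl 0x18(%r12),%eax"

# Solve base + displacement == fault_address for the base register.
r12 = (fault_address - displacement) & MASK64
print(hex(r12))
```

A base of zero means the code was reading a field at offset 0x18 of a structure through a NULL pointer.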

 ===
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x18
 fault code              = supervisor read data, page not present
 instruction pointer     = 0x20:0x802eb3b7
 stack pointer           = 0x28:0xff80001c66e0
 frame pointer           = 0x28:0xff80001c6740
 code segment            = base 0x0, limit 0xf, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags        = interrupt enabled, resume, IOPL = 0
 current process         = 724 (named)
 [thread pid 724 tid 100051 ]
 Stopped at      m_copym+0x37:   movl    0x18(%r12),%eax
 db> panic
 panic: from debugger
 cpuid = 0
 Uptime: 1d5h55m33s
 Physical memory: 2039 MB
 Dumping 1448 MB: 1433 1417 1401


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-21 Thread Slawa Olhovchenkov
On Sun, Feb 21, 2010 at 03:41:53PM -0800, Pyun YongHyeon wrote:

 On Sun, Feb 21, 2010 at 12:44:50AM +0300, Slawa Olhovchenkov wrote:
  On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:
  
   Normally you should not have any FCS errors, it could be related
   with signal quality and these errors might not be correctly
   counted.
  
  I can't check cable and switch counters on bge1 before Feb 24.
  
3. packets don't lost on sources at Aug'09
   
   Since I don't have BCM5704 hardware it's hard to find which
   revision may affect to this issue. Could you narrow down which
   revision number started showing the issue?
  
  I am don't update source between Aug'09 and Feb 16.
  
 
 There were many bge(4) changes in that time frame. So it's hard to
 find which commit is guilty for the packet drop issue. If you can
 narrow down possible changes that might affect the issue that could
 help me a lot. You can do binary searching technique for the SVN
 revisions to know possible candidates.
 http://svn.freebsd.org/viewvc/base/head/sys/dev/bge/if_bge.c

How can I do this?
I haven't worked with svn before and don't know the best way to do it for a single file.


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-20 Thread Slawa Olhovchenkov
On Fri, Feb 19, 2010 at 01:12:01PM -0800, Pyun YongHyeon wrote:

 Normally you should not have any FCS errors, it could be related
 with signal quality and these errors might not be correctly
 counted.

I can't check cable and switch counters on bge1 before Feb 24.

  3. packets don't lost on sources at Aug'09
 
 Since I don't have BCM5704 hardware it's hard to find which
 revision may affect to this issue. Could you narrow down which
 revision number started showing the issue?

I did not update the sources between Aug'09 and Feb 16.

4. Packets are not lost immediately after a reboot.

PS: I got a kernel panic.

===
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x802eb3b7
stack pointer           = 0x28:0xff80001c66e0
frame pointer           = 0x28:0xff80001c6740
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 724 (named)
[thread pid 724 tid 100051 ]
Stopped at      m_copym+0x37:   movl    0x18(%r12),%eax
db> panic
panic: from debugger
cpuid = 0
Uptime: 1d5h55m33s
Physical memory: 2039 MB
Dumping 1448 MB: 1433 1417 1401


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Slawa Olhovchenkov
On Fri, Feb 19, 2010 at 08:51:29AM +0300, Slawa Olhovchenkov wrote:

 On Thu, Feb 18, 2010 at 04:19:13PM -0800, Pyun YongHyeon wrote:
 
  
  I'm still not sure whether the panic is related with bge(4) but
  there are a couple of missing workaround for PCIX BCM5704 silicon
  bug in bge(4). Did you also see the panic before updating to
  stable/8?
 
 Before updating to stable/8 2010-Feb-16 I see network freez on stable/8
 2009-Sep -- bge stop receiving packets (by tcpdump), after aprox. 40-50
 days uptime.
 
 
  Anyway, try attached patch and let me know how it works.
 
 Thanks, I try.
 

I haven't hit the trap after 2 hours, but I already see another problem:

===
PING 10.200.0.1 (10.200.0.1): 56 data bytes

--- 10.200.0.1 ping statistics ---
100 packets transmitted, 97 packets received, 3.0% packet loss
round-trip min/avg/max/stddev = 0.188/0.268/0.356/0.044 ms
===

Without the patch, but with fresh sources, I see the same trouble: 7% loss after 12 hours.
netstat -i doesn't show any errors.
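
For tracking the loss figure over time, the percentage can be recomputed from the ping summary line instead of being read off by eye. A minimal sketch, assuming the summary wording shown in the output above; the function name is made up for illustration:

```python
import re


def ping_loss_percent(summary: str) -> float:
    """Compute packet loss from a ping summary line like the one above."""
    match = re.search(r"(\d+) packets transmitted, (\d+) packets received", summary)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


line = "100 packets transmitted, 97 packets received, 3.0% packet loss"
print(ping_loss_percent(line))
```

Run against periodic ping output, this shows whether the loss grows with uptime, as the 3% and 7% figures suggest.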


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Pyun YongHyeon
On Fri, Feb 19, 2010 at 03:24:15PM +0300, Slawa Olhovchenkov wrote:
 On Fri, Feb 19, 2010 at 08:51:29AM +0300, Slawa Olhovchenkov wrote:
 
  On Thu, Feb 18, 2010 at 04:19:13PM -0800, Pyun YongHyeon wrote:
  
   
   I'm still not sure whether the panic is related with bge(4) but
   there are a couple of missing workaround for PCIX BCM5704 silicon
   bug in bge(4). Did you also see the panic before updating to
   stable/8?
  
  Before updating to stable/8 2010-Feb-16 I see network freez on stable/8
  2009-Sep -- bge stop receiving packets (by tcpdump), after aprox. 40-50
  days uptime.
  
  
   Anyway, try attached patch and let me know how it works.
  
  Thanks, I try.
  
 
 I don't get trap after 2 hour, but already see next trouble:
 
 ===
 PING 10.200.0.1 (10.200.0.1): 56 data bytes
 
 --- 10.200.0.1 ping statistics ---
 100 packets transmitted, 97 packets received, 3.0% packet loss
 round-trip min/avg/max/stddev = 0.188/0.268/0.356/0.044 ms
 ===
 
 w/o patch, but witch fresh source I see same trouble: after 12 hour 7% lost.
 netstat -i don't show any errors.

I think the BCM5704 supports hardware MAC statistics counters. Try
extracting them with sysctl dev.bge.0.stats; that will give you much
more information.


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Slawa Olhovchenkov
On Fri, Feb 19, 2010 at 11:03:59AM -0800, Pyun YongHyeon wrote:

 On Fri, Feb 19, 2010 at 03:24:15PM +0300, Slawa Olhovchenkov wrote:
  On Fri, Feb 19, 2010 at 08:51:29AM +0300, Slawa Olhovchenkov wrote:
  
   On Thu, Feb 18, 2010 at 04:19:13PM -0800, Pyun YongHyeon wrote:
   

I'm still not sure whether the panic is related with bge(4) but
there are a couple of missing workaround for PCIX BCM5704 silicon
bug in bge(4). Did you also see the panic before updating to
stable/8?
   
   Before updating to stable/8 2010-Feb-16 I see network freez on stable/8
   2009-Sep -- bge stop receiving packets (by tcpdump), after aprox. 40-50
   days uptime.
   
   
Anyway, try attached patch and let me know how it works.
   
   Thanks, I try.
   
  
  I don't get trap after 2 hour, but already see next trouble:
  
  ===
  PING 10.200.0.1 (10.200.0.1): 56 data bytes
  
  --- 10.200.0.1 ping statistics ---
  100 packets transmitted, 97 packets received, 3.0% packet loss
  round-trip min/avg/max/stddev = 0.188/0.268/0.356/0.044 ms
  ===
  
  w/o patch, but witch fresh source I see same trouble: after 12 hour 7% lost.
  netstat -i don't show any errors.
 
 I think BCM5704 supports HW MAC statistics counter. Try extract it
 with sysctl dev.bge.0.stats. It will give you much more
 information.

dev.bge.0.stats.FramesDroppedDueToFilters: 0
dev.bge.0.stats.DmaWriteQueueFull: 0
dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
dev.bge.0.stats.NoMoreRxBDs: 0
dev.bge.0.stats.InputDiscards: 0
dev.bge.0.stats.InputErrors: 0
dev.bge.0.stats.RecvThresholdHit: 561594
dev.bge.0.stats.DmaReadQueueFull: 41972
dev.bge.0.stats.DmaReadHighPriQueueFull: 0
dev.bge.0.stats.SendDataCompQueueFull: 0
dev.bge.0.stats.RingSetSendProdIndex: 705180
dev.bge.0.stats.RingStatusUpdate: 950302
dev.bge.0.stats.Interrupts: 950302
dev.bge.0.stats.AvoidedInterrupts: 0
dev.bge.0.stats.SendThresholdHit: 0
dev.bge.0.stats.rx.Octets: 196013834
dev.bge.0.stats.rx.Fragments: 0
dev.bge.0.stats.rx.UcastPkts: 582767
dev.bge.0.stats.rx.MulticastPkts: 0
dev.bge.0.stats.rx.FCSErrors: 0
dev.bge.0.stats.rx.AlignmentErrors: 0
dev.bge.0.stats.rx.xonPauseFramesReceived: 0
dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
dev.bge.0.stats.rx.ControlFramesReceived: 0
dev.bge.0.stats.rx.xoffStateEntered: 0
dev.bge.0.stats.rx.FramesTooLong: 0
dev.bge.0.stats.rx.Jabbers: 0
dev.bge.0.stats.rx.UndersizePkts: 0
dev.bge.0.stats.rx.inRangeLengthError: 0
dev.bge.0.stats.rx.outRangeLengthError: 0
dev.bge.0.stats.tx.Octets: 654902713
dev.bge.0.stats.tx.Collisions: 0
dev.bge.0.stats.tx.XonSent: 0
dev.bge.0.stats.tx.XoffSent: 0
dev.bge.0.stats.tx.flowControlDone: 0
dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
dev.bge.0.stats.tx.SingleCollisionFrames: 0
dev.bge.0.stats.tx.MultipleCollisionFrames: 0
dev.bge.0.stats.tx.DeferredTransmissions: 0
dev.bge.0.stats.tx.ExcessiveCollisions: 0
dev.bge.0.stats.tx.LateCollisions: 0
dev.bge.0.stats.tx.UcastPkts: 699931
dev.bge.0.stats.tx.MulticastPkts: 0
dev.bge.0.stats.tx.BroadcastPkts: 492
dev.bge.0.stats.tx.CarrierSenseErrors: 0
dev.bge.0.stats.tx.Discards: 0
dev.bge.0.stats.tx.Errors: 0

dev.bge.1.stats.FramesDroppedDueToFilters: 0
dev.bge.1.stats.DmaWriteQueueFull: 0
dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
dev.bge.1.stats.NoMoreRxBDs: 0
dev.bge.1.stats.InputDiscards: 0
dev.bge.1.stats.InputErrors: 0
dev.bge.1.stats.RecvThresholdHit: 2889283
dev.bge.1.stats.DmaReadQueueFull: 79
dev.bge.1.stats.DmaReadHighPriQueueFull: 0
dev.bge.1.stats.SendDataCompQueueFull: 0
dev.bge.1.stats.RingSetSendProdIndex: 2861918
dev.bge.1.stats.RingStatusUpdate: 5518912
dev.bge.1.stats.Interrupts: 5518912
dev.bge.1.stats.AvoidedInterrupts: 0
dev.bge.1.stats.SendThresholdHit: 0
dev.bge.1.stats.rx.Octets: 930931282
dev.bge.1.stats.rx.Fragments: 1
dev.bge.1.stats.rx.UcastPkts: 2956515
dev.bge.1.stats.rx.MulticastPkts: 0
dev.bge.1.stats.rx.FCSErrors: 18
dev.bge.1.stats.rx.AlignmentErrors: 0
dev.bge.1.stats.rx.xonPauseFramesReceived: 0
dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
dev.bge.1.stats.rx.ControlFramesReceived: 0
dev.bge.1.stats.rx.xoffStateEntered: 0
dev.bge.1.stats.rx.FramesTooLong: 0
dev.bge.1.stats.rx.Jabbers: 0
dev.bge.1.stats.rx.UndersizePkts: 0
dev.bge.1.stats.rx.inRangeLengthError: 0
dev.bge.1.stats.rx.outRangeLengthError: 0
dev.bge.1.stats.tx.Octets: 305055886
dev.bge.1.stats.tx.Collisions: 0
dev.bge.1.stats.tx.XonSent: 0
dev.bge.1.stats.tx.XoffSent: 0
dev.bge.1.stats.tx.flowControlDone: 0
dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
dev.bge.1.stats.tx.SingleCollisionFrames: 0
dev.bge.1.stats.tx.MultipleCollisionFrames: 0
dev.bge.1.stats.tx.DeferredTransmissions: 0
dev.bge.1.stats.tx.ExcessiveCollisions: 0
dev.bge.1.stats.tx.LateCollisions: 0
dev.bge.1.stats.tx.UcastPkts: 2860335
dev.bge.1.stats.tx.MulticastPkts: 0
dev.bge.1.stats.tx.BroadcastPkts: 447
dev.bge.1.stats.tx.CarrierSenseErrors: 0
dev.bge.1.stats.tx.Discards: 0
dev.bge.1.stats.tx.Errors: 0
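
With this many counters it is easy to miss the few that are non-zero. Below is a rough sketch that parses sysctl output of this shape and lists the non-zero counters whose names look error-related. The keyword list is an assumption chosen for illustration, not something defined by bge(4), and the sample is a short excerpt of the bge1 output above.

```python
from typing import Dict

# Assumed substrings that mark a counter as error-related (illustrative only).
KEYWORDS = ("Error", "Fragments", "Discard", "Dropped")


def parse_bge_stats(text: str) -> Dict[str, int]:
    """Parse 'name: value' lines as printed by sysctl into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


sample = """\
dev.bge.1.stats.RingStatusUpdate: 5518912
dev.bge.1.stats.Interrupts: 5518912
dev.bge.1.stats.rx.Fragments: 1
dev.bge.1.stats.rx.UcastPkts: 2956515
dev.bge.1.stats.rx.FCSErrors: 18
dev.bge.1.stats.tx.Errors: 0
"""

stats = parse_bge_stats(sample)
suspicious = {
    name: count
    for name, count in stats.items()
    if count and any(word in name for word in KEYWORDS)
}
for name, count in suspicious.items():
    print(name, count)
```

The same dictionary also makes it easy to compare pairs of counters, for example Interrupts against RingStatusUpdate.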

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Pyun YongHyeon
On Fri, Feb 19, 2010 at 10:11:03PM +0300, Slawa Olhovchenkov wrote:
 On Fri, Feb 19, 2010 at 11:03:59AM -0800, Pyun YongHyeon wrote:
 
  On Fri, Feb 19, 2010 at 03:24:15PM +0300, Slawa Olhovchenkov wrote:
   On Fri, Feb 19, 2010 at 08:51:29AM +0300, Slawa Olhovchenkov wrote:
   
On Thu, Feb 18, 2010 at 04:19:13PM -0800, Pyun YongHyeon wrote:

 
 I'm still not sure whether the panic is related with bge(4) but
 there are a couple of missing workaround for PCIX BCM5704 silicon
 bug in bge(4). Did you also see the panic before updating to
 stable/8?

Before updating to stable/8 2010-Feb-16 I see network freez on stable/8
2009-Sep -- bge stop receiving packets (by tcpdump), after aprox. 40-50
days uptime.


 Anyway, try attached patch and let me know how it works.

Thanks, I try.

   
   I don't get trap after 2 hour, but already see next trouble:
   
   ===
   PING 10.200.0.1 (10.200.0.1): 56 data bytes
   
   --- 10.200.0.1 ping statistics ---
   100 packets transmitted, 97 packets received, 3.0% packet loss
   round-trip min/avg/max/stddev = 0.188/0.268/0.356/0.044 ms
   ===
   
   w/o patch, but witch fresh source I see same trouble: after 12 hour 7% 
   lost.
   netstat -i don't show any errors.
  
  I think BCM5704 supports HW MAC statistics counter. Try extract it
  with sysctl dev.bge.0.stats. It will give you much more
  information.
 

[...]

 dev.bge.1.stats.rx.Fragments: 1

You received a frame that is less than 64 bytes with a bad FCS.

 dev.bge.1.stats.rx.UcastPkts: 2956515
 dev.bge.1.stats.rx.MulticastPkts: 0
 dev.bge.1.stats.rx.FCSErrors: 18

You have a lot of FCS errors here.
Please double-check the cabling. If the statistics counter is right,
either the sender is at fault or you have a bad cable.
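
A minimal sketch of how the receive-side error counters quoted above could be
totalled, assuming the sysctl output has been saved as plain "name: value"
text lines. The helper names parse_counter and rx_error_total are
hypothetical; they are not part of bge(4) or sysctl(8).

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one "name: value" line in the format shown in this thread.
 * Returns 1 and stores the counter in *value when the name part ends
 * with the given suffix, 0 otherwise.
 */
static int
parse_counter(const char *line, const char *suffix, unsigned long long *value)
{
	const char *colon = strrchr(line, ':');
	size_t name_len, suffix_len = strlen(suffix);

	if (colon == NULL)
		return (0);
	name_len = (size_t)(colon - line);
	if (name_len < suffix_len ||
	    strncmp(colon - suffix_len, suffix, suffix_len) != 0)
		return (0);
	return (sscanf(colon + 1, "%llu", value) == 1);
}

/* Sum the two receive error counters discussed above. */
static unsigned long long
rx_error_total(const char *const *lines, size_t nlines)
{
	static const char *const error_names[] = {
		"rx.Fragments", "rx.FCSErrors"
	};
	unsigned long long total = 0, v;
	size_t i, j;

	for (i = 0; i < nlines; i++)
		for (j = 0; j < sizeof(error_names) / sizeof(error_names[0]); j++)
			if (parse_counter(lines[i], error_names[j], &v))
				total += v;
	return (total);
}
```

Fed the four bge1 receive lines quoted above, rx_error_total adds the single
fragment to the 18 FCS errors.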
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Slawa Olhovchenkov
On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:

 
  dev.bge.1.stats.rx.Fragments: 1
 
 You received a frame that is less than 64 bytes with a bad FCS.
 
  dev.bge.1.stats.rx.UcastPkts: 2956515
  dev.bge.1.stats.rx.MulticastPkts: 0
  dev.bge.1.stats.rx.FCSErrors: 18
 
 You have a lot of FCS errors here.
 Please double check cabling. If the statistics counter is right,
 sender is guilty or you have bad cabling issues here.

1. Far more than 18 packets were lost; I think hundreds or thousands.
2. Packets are lost on both interfaces (bge0 & bge1).
3. Packets were not lost with the sources from Aug'09.


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-19 Thread Pyun YongHyeon
On Fri, Feb 19, 2010 at 11:13:59PM +0300, Slawa Olhovchenkov wrote:
 On Fri, Feb 19, 2010 at 12:06:47PM -0800, Pyun YongHyeon wrote:
 
  
   dev.bge.1.stats.rx.Fragments: 1
  
  You received a frame that is less than 64 bytes with a bad FCS.
  
   dev.bge.1.stats.rx.UcastPkts: 2956515
   dev.bge.1.stats.rx.MulticastPkts: 0
   dev.bge.1.stats.rx.FCSErrors: 18
  
  You have a lot of FCS errors here.
  Please double check cabling. If the statistics counter is right,
  sender is guilty or you have bad cabling issues here.
 
 1. Far more than 18 packets were lost; I think hundreds or thousands.
 2. Packets are lost on both interfaces (bge0 & bge1).

If you look at the MAC statistics counters, you have the following
numbers of status updates and interrupts. The two numbers are the
same, which means the controller didn't lose any interrupts for state
updates.
dev.bge.0.stats.RingStatusUpdate: 950302
dev.bge.0.stats.Interrupts: 950302
and
dev.bge.1.stats.RingStatusUpdate: 5518912
dev.bge.1.stats.Interrupts: 5518912

You received 582767 unicast packets and lost 0 packets on bge0.
dev.bge.0.stats.rx.UcastPkts: 582767
And you received 2956515 unicast packets and lost 19 packets
on bge1.
dev.bge.1.stats.rx.Fragments: 1
dev.bge.1.stats.rx.UcastPkts: 2956515
dev.bge.1.stats.rx.FCSErrors: 18
I don't see such a large number of packet drops in these MAC
statistics, unless the upper stack is dropping received packets.
I fixed some counter updates that were ignored in previous
releases, so recent versions may report losses that earlier ones did
not count.
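
The comparison above can be sketched in a few lines of C. This is only an
illustration: struct bge_counters and both helpers are hypothetical names, and
the sketch assumes that frames counted as Fragments or FCSErrors are not also
counted in UcastPkts.

```c
#include <stdint.h>

/* Hypothetical container for the counters quoted in this message. */
struct bge_counters {
	uint64_t ring_status_updates;	/* stats.RingStatusUpdate */
	uint64_t interrupts;		/* stats.Interrupts */
	uint64_t rx_ucast_pkts;		/* stats.rx.UcastPkts */
	uint64_t rx_fragments;		/* stats.rx.Fragments */
	uint64_t rx_fcs_errors;		/* stats.rx.FCSErrors */
};

/* Equal counts suggest no interrupt for a status update was lost. */
static int
interrupts_match(const struct bge_counters *c)
{
	return (c->ring_status_updates == c->interrupts);
}

/* Bad frames per million frames seen by the MAC (assumption noted above). */
static double
rx_error_ppm(const struct bge_counters *c)
{
	uint64_t bad = c->rx_fragments + c->rx_fcs_errors;
	uint64_t seen = c->rx_ucast_pkts + bad;

	if (seen == 0)
		return (0.0);
	return ((double)bad * 1e6 / (double)seen);
}
```

With the bge1 numbers (2956515 good frames, 19 bad ones) this comes to only a
few frames per million, far below the 3% to 7% loss reported from ping, which
is why the MAC counters alone do not account for the drops.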

Normally you should not have any FCS errors. They could be related
to signal quality, and these errors might not be counted
correctly.

 3. Packets were not lost with the sources from Aug'09.

Since I don't have BCM5704 hardware, it's hard for me to find which
revision introduced this issue. Could you narrow down the revision
number at which the issue started to show?


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Slawa Olhovchenkov
On Tue, Feb 16, 2010 at 09:57:19AM -0800, Pyun YongHyeon wrote:

 On Sun, Feb 14, 2010 at 10:04:58AM -0800, Nick Rogers wrote:
  I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can anyone
  shed light on the below error? I unfortunately cannot provide a proper crash
  dump. The pointer addresses are always the same. The only other thing I've
  noticed that may be related is a watchdog timeout on bge0 error before the
  panic. Thanks.
  
 
 Any chance to get backtrace from the crash?

I have the same trouble on the same platform (8.0-STABLE/amd64).
hw.bge.allow_asf=0 is already set.

I got 2 proper crash dumps (the first with net.inet.ip.forwarding=1
and the second with net.inet.ip.forwarding=0).

backtrace from the first crash:

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x802ea751
stack pointer   = 0x28:0xff8ef930
frame pointer   = 0x28:0xff8ef970
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 12 (irq26: bge1)
panic: from debugger
cpuid = 0
Uptime: 5h23m50s
Physical memory: 2039 MB
Dumping 1316 MB: 1301 1285 1269 1253 1237 1221 1205 1189 1173 1157 1141 1125 
1109 1093 1077 1061 1045 1029 1013 997 981 965 949 933 917 901 885 869 853 837 
821 805 789 773 757 741 725 709 693 677 661 645 629 613 597 581 565 549 533 517 
501 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 229 213 197 
181 165 149 133 117 101 85 69 53 37 21 5

Reading symbols from /boot/kernel/if_bge.ko...Reading symbols from 
/boot/kernel/if_bge.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/if_bge.ko
Reading symbols from /boot/kernel/miibus.ko...Reading symbols from 
/boot/kernel/miibus.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/miibus.ko
Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from 
/boot/kernel/ipfw.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipfw.ko
Reading symbols from /boot/kernel/nfsserver.ko...Reading symbols from 
/boot/kernel/nfsserver.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nfsserver.ko
Reading symbols from /boot/kernel/krpc.ko...Reading symbols from 
/boot/kernel/krpc.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/krpc.ko
Reading symbols from /boot/kernel/nfssvc.ko...Reading symbols from 
/boot/kernel/nfssvc.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nfssvc.ko
#0  doadump () at pcpu.h:223
223 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:223
#1  0x802909b9 in boot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:416
#2  0x80290e0c in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:579
#3  0x801a5bc7 in db_panic (addr=Variable addr is not available.
) at /usr/src/sys/ddb/db_command.c:478
#4  0x801a5fd1 in db_command (last_cmdp=0x806b1fa0, 
cmd_table=Variable cmd_table is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5  0x801a6220 in db_command_loop () at 
/usr/src/sys/ddb/db_command.c:498
#6  0x801a81e9 in db_trap (type=Variable type is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0x802c0995 in kdb_trap (type=12, code=0, tf=0xff8ef880) at 
/usr/src/sys/kern/subr_kdb.c:535
#8  0x8049ee0d in trap_fatal (frame=0xff8ef880, eva=Variable 
eva is not available.
) at /usr/src/sys/amd64/amd64/trap.c:852
#9  0x8049f1e4 in trap_pfault (frame=0xff8ef880, usermode=0) at 
/usr/src/sys/amd64/amd64/trap.c:773
#10 0x8049fa6a in trap (frame=0xff8ef880) at 
/usr/src/sys/amd64/amd64/trap.c:499
#11 0x80484ff3 in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:224
#12 0x802ea751 in m_copydata (m=0x0, off=0, len=108, 
cp=0xff0027865194 б\026zHqJВ\220ЦПЫСPo~@22Feb 17 15:10:2)
at /usr/src/sys/kern/uipc_mbuf.c:816
#13 0x8035e72d in ip_forward (m=0xff0001530900, srcrt=Variable 
srcrt is not available.
) at /usr/src/sys/netinet/ip_input.c:1444
#14 0x8035fef7 in ip_input (m=0xff0001530900) at 
/usr/src/sys/netinet/ip_input.c:717
#15 0x80342e9e in netisr_dispatch_src (proto=1, source=Variable 
source is not available.
) at /usr/src/sys/net/netisr.c:917
#16 0x8033fd5d in ether_demux 

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Pyun YongHyeon
On Thu, Feb 18, 2010 at 05:38:22PM +0300, Slawa Olhovchenkov wrote:
 On Tue, Feb 16, 2010 at 09:57:19AM -0800, Pyun YongHyeon wrote:
 
  On Sun, Feb 14, 2010 at 10:04:58AM -0800, Nick Rogers wrote:
   I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can anyone
   shed light on the below error? I unfortunately cannot provide a proper 
   crash
   dump. The pointer addresses are always the same. The only other thing I've
   noticed that may be related is a watchdog timeout on bge0 error before the
   panic. Thanks.
   
  
  Any chance to get backtrace from the crash?
 
 I got same trouble on the same platform (8.0-STABLE/amd64).
 hw.bge.allow_asf=0 already
 
 I got 2 proper crash dump (first w/ net.inet.ip.forwarding=1
 and second w/ net.inet.ip.forwarding=0).
 

It looks like the mbuf pointer was changed to NULL in the middle of
the IP forwarding and IP fragment stages. This can happen if bge(4)
frees mbufs it has already passed up, but I'm not sure this comes
from bge(4).
By chance, are you using polling(4) on bge(4)? Also, please show me
the dmesg output (only the bge(4)-related part).
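
For illustration, the fault address 0x18 in the trap message matches what one
would expect from reading a length field through a NULL mbuf pointer on
amd64. The struct below is a hypothetical stand-in that assumes the mbuf
header starts with three pointers followed by an int length; it is a sketch,
not the kernel's definition.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the leading mbuf header fields
 * (assumed layout: next, nextpkt, data pointer, then length).
 */
struct mbuf_hdr_sketch {
	struct mbuf_hdr_sketch *mh_next;
	struct mbuf_hdr_sketch *mh_nextpkt;
	char *mh_data;
	int mh_len;
	int mh_flags;
};

/*
 * Reading mh_len through a NULL pointer touches address
 * 0 + offsetof(mh_len); with 8-byte pointers that is 0x18.
 */
static size_t
null_deref_fault_address(void)
{
	return (offsetof(struct mbuf_hdr_sketch, mh_len));
}
```

Under that assumption, m_copydata (m=0x0, ...) in frame #12 faulting at 0x18
is consistent with the function reading the length of a NULL mbuf.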

 backtrace from the first crash:
 
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type show copying to see the conditions.
 There is absolutely no warranty for GDB.  Type show warranty for details.
 This GDB was configured as amd64-marcel-freebsd...
 
 Unread portion of the kernel message buffer:
 
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x18
 fault code  = supervisor read data, page not present
 instruction pointer = 0x20:0x802ea751
 stack pointer   = 0x28:0xff8ef930
 frame pointer   = 0x28:0xff8ef970
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 12 (irq26: bge1)
 panic: from debugger
 cpuid = 0
 Uptime: 5h23m50s
 Physical memory: 2039 MB
 Dumping 1316 MB: 1301 1285 1269 1253 1237 1221 1205 1189 1173 1157 1141 1125 
 1109 1093 1077 1061 1045 1029 1013 997 981 965 949 933 917 901 885 869 853 
 837 821 805 789 773 757 741 725 709 693 677 661 645 629 613 597 581 565 549 
 533 517 501 485 469 453 437 421 405 389 373 357 341 325 309 293 277 261 245 
 229 213 197 181 165 149 133 117 101 85 69 53 37 21 5
 
 Reading symbols from /boot/kernel/if_bge.ko...Reading symbols from 
 /boot/kernel/if_bge.ko.symbols...done.
 done.
 Loaded symbols for /boot/kernel/if_bge.ko
 Reading symbols from /boot/kernel/miibus.ko...Reading symbols from 
 /boot/kernel/miibus.ko.symbols...done.
 done.
 Loaded symbols for /boot/kernel/miibus.ko
 Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from 
 /boot/kernel/ipfw.ko.symbols...done.
 done.
 Loaded symbols for /boot/kernel/ipfw.ko
 Reading symbols from /boot/kernel/nfsserver.ko...Reading symbols from 
 /boot/kernel/nfsserver.ko.symbols...done.
 done.
 Loaded symbols for /boot/kernel/nfsserver.ko
 Reading symbols from /boot/kernel/krpc.ko...Reading symbols from 
 /boot/kernel/krpc.ko.symbols...done.
 done.
 Loaded symbols for /boot/kernel/krpc.ko
 Reading symbols from /boot/kernel/nfssvc.ko...Reading symbols from 
 /boot/kernel/nfssvc.ko.symbols...done.
 done.
 Loaded symbols for /boot/kernel/nfssvc.ko
 #0  doadump () at pcpu.h:223
 223 pcpu.h: No such file or directory.
 in pcpu.h
 (kgdb) bt
 #0  doadump () at pcpu.h:223
 #1  0x802909b9 in boot (howto=260) at 
 /usr/src/sys/kern/kern_shutdown.c:416
 #2  0x80290e0c in panic (fmt=Variable fmt is not available.
 ) at /usr/src/sys/kern/kern_shutdown.c:579
 #3  0x801a5bc7 in db_panic (addr=Variable addr is not available.
 ) at /usr/src/sys/ddb/db_command.c:478
 #4  0x801a5fd1 in db_command (last_cmdp=0x806b1fa0, 
 cmd_table=Variable cmd_table is not available.
 ) at /usr/src/sys/ddb/db_command.c:445
 #5  0x801a6220 in db_command_loop () at 
 /usr/src/sys/ddb/db_command.c:498
 #6  0x801a81e9 in db_trap (type=Variable type is not available.
 ) at /usr/src/sys/ddb/db_main.c:229
 #7  0x802c0995 in kdb_trap (type=12, code=0, tf=0xff8ef880) 
 at /usr/src/sys/kern/subr_kdb.c:535
 #8  0x8049ee0d in trap_fatal (frame=0xff8ef880, eva=Variable 
 eva is not available.
 ) at /usr/src/sys/amd64/amd64/trap.c:852
 #9  0x8049f1e4 in trap_pfault (frame=0xff8ef880, usermode=0) 
 at /usr/src/sys/amd64/amd64/trap.c:773
 #10 0x8049fa6a in trap (frame=0xff8ef880) at 
 /usr/src/sys/amd64/amd64/trap.c:499
 #11 0x80484ff3 in calltrap () at 
 /usr/src/sys/amd64/amd64/exception.S:224
 #12 0x802ea751 in m_copydata (m=0x0, off=0, len=108, 
 cp=0xff0027865194 б\026zHqJВ\220ЦПЫСPo~@22Feb 17 

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Slawa Olhovchenkov
On Thu, Feb 18, 2010 at 11:36:12AM -0800, Pyun YongHyeon wrote:

 On Thu, Feb 18, 2010 at 05:38:22PM +0300, Slawa Olhovchenkov wrote:
  On Tue, Feb 16, 2010 at 09:57:19AM -0800, Pyun YongHyeon wrote:
  
   On Sun, Feb 14, 2010 at 10:04:58AM -0800, Nick Rogers wrote:
I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can anyone
shed light on the below error? I unfortunately cannot provide a proper 
crash
dump. The pointer addresses are always the same. The only other thing 
I've
noticed that may be related is a watchdog timeout on bge0 error before 
the
panic. Thanks.

   
   Any chance to get backtrace from the crash?
  
  I got same trouble on the same platform (8.0-STABLE/amd64).
  hw.bge.allow_asf=0 already
  
  I got 2 proper crash dump (first w/ net.inet.ip.forwarding=1
  and second w/ net.inet.ip.forwarding=0).
  
 
 It looks like mbuf pointer was changed to NULL in the middle of IP
 forwarding and IP fragment stage. If bge(4) frees passed mbufs this
 may happen but I'm not sure this comes from bge(4).
 By chance, are you using polling(4) on bge(4)? Also show me the

I am not using polling.

 dmesg output(only bge(4) related one).

dmesg from boot:

bge0: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
miibus0: MII bus on bge0
brgphy0: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge0: Ethernet address: 00:14:c2:3d:e5:52
bge0: [ITHREAD]
bge1: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
miibus1: MII bus on bge1
brgphy1: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge1: Ethernet address: 00:14:c2:3d:e5:51
bge1: [ITHREAD]
bge1: link state changed to UP
bge0: link state changed to UP

Nothing in dmesg before the trap.

-- 
Slawa Olhovchenkov


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Pyun YongHyeon
On Fri, Feb 19, 2010 at 12:24:28AM +0300, Slawa Olhovchenkov wrote:
 On Thu, Feb 18, 2010 at 11:36:12AM -0800, Pyun YongHyeon wrote:
 
  On Thu, Feb 18, 2010 at 05:38:22PM +0300, Slawa Olhovchenkov wrote:
   On Tue, Feb 16, 2010 at 09:57:19AM -0800, Pyun YongHyeon wrote:
   
On Sun, Feb 14, 2010 at 10:04:58AM -0800, Nick Rogers wrote:
 I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can 
 anyone
 shed light on the below error? I unfortunately cannot provide a 
 proper crash
 dump. The pointer addresses are always the same. The only other thing 
 I've
 noticed that may be related is a watchdog timeout on bge0 error 
 before the
 panic. Thanks.
 

Any chance to get backtrace from the crash?
   
   I got same trouble on the same platform (8.0-STABLE/amd64).
   hw.bge.allow_asf=0 already
   
   I got 2 proper crash dump (first w/ net.inet.ip.forwarding=1
   and second w/ net.inet.ip.forwarding=0).
   
  
  It looks like mbuf pointer was changed to NULL in the middle of IP
  forwarding and IP fragment stage. If bge(4) frees passed mbufs this
  may happen but I'm not sure this comes from bge(4).
  By chance, are you using polling(4) on bge(4)? Also show me the
 
 I am not using polling.
 

Ok.

  dmesg output(only bge(4) related one).
 
 dmesg from boot:
 
 bge0: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
 miibus0: MII bus on bge0
 brgphy0: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus0
 brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 bge0: Ethernet address: 00:14:c2:3d:e5:52
 bge0: [ITHREAD]
 bge1: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
 miibus1: MII bus on bge1
 brgphy1: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus1
 brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 bge1: Ethernet address: 00:14:c2:3d:e5:51
 bge1: [ITHREAD]
 bge1: link state changed to UP
 bge0: link state changed to UP
 
 Nothing in dmesg before trap.
 

Is this a PCI-X controller? It would be even better if you could post
the bge(4)-related dmesg output from a verbose boot and the output of
pciconf -lcv.


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Slawa Olhovchenkov
On Thu, Feb 18, 2010 at 01:32:13PM -0800, Pyun YongHyeon wrote:

   dmesg output(only bge(4) related one).
  
  dmesg from boot:
  
  bge0: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
  0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
  miibus0: MII bus on bge0
  brgphy0: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus0
  brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
  1000baseT-FDX, auto
  bge0: Ethernet address: 00:14:c2:3d:e5:52
  bge0: [ITHREAD]
  bge1: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
  0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
  miibus1: MII bus on bge1
  brgphy1: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus1
  brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
  1000baseT-FDX, auto
  bge1: Ethernet address: 00:14:c2:3d:e5:51
  bge1: [ITHREAD]
  bge1: link state changed to UP
  bge0: link state changed to UP
  
  Nothing in dmesg before trap.
  
 
 Is this PCI-X controller? It would be even better if you can post

This is an integrated controller (HP DL360-G4).

 bge(4) related dmesg output of verbosed boot and the output of

Preloaded elf kernel /boot/kernel/kernel at 0x8088e000.
Preloaded elf obj module /boot/kernel/if_bge.ko at 0x8088e1d0.
Preloaded elf obj module /boot/kernel/miibus.ko at 0x8088e7f8.
pci0:2:2:0: bad VPD cksum, remain 19
bge0: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
bge0: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf7
bge0: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
miibus0: MII bus on bge0
brgphy0: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus0
brgphy0: OUI 0x000818, model 0x0019, rev. 0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge0: bpf attached
bge0: Ethernet address: 00:14:c2:3d:e5:52
ioapic1: routing intpin 1 (PCI IRQ 25) to lapic 0 vector 50
bge0: [MPSAFE]
bge0: [ITHREAD]
pci0:2:2:1: bad VPD cksum, remain 19
bge1: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
bge1: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf6
bge1: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
miibus1: MII bus on bge1
brgphy1: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus1
brgphy1: OUI 0x000818, model 0x0019, rev. 0
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge1: bpf attached
bge1: Ethernet address: 00:14:c2:3d:e5:51
ioapic1: routing intpin 2 (PCI IRQ 26) to lapic 0 vector 51
bge1: [MPSAFE]
bge1: [ITHREAD]
bge1: link UP
bge1: link state changed to UP


 pciconf -lcv.

hos...@pci0:0:0:0:  class=0x06 card=0x32000e11 chip=0x35908086 rev=0x0c 
hdr=0x00
vendor = 'Intel Corporation'
device = 'E7520 Server Memory Controller Hub'
class  = bridge
subclass   = HOST-PCI
cap 09[40] = vendor (length 5) Intel cap 4 version 1
pc...@pci0:0:2:0:   class=0x060400 card=0x chip=0x35958086 rev=0x0c 
hdr=0x01
vendor = 'Intel Corporation'
device = 'E752x Memory Controller Hub PCIe Port A0'
class  = bridge
subclass   = PCI-PCI
cap 01[50] = powerspec 2  supports D0 D3  current D0
cap 05[58] = MSI supports 2 messages 
cap 10[64] = PCI-Express 1 root port max data 256(256) link x0(x8)
pc...@pci0:0:4:0:   class=0x060400 card=0x chip=0x35978086 rev=0x0c 
hdr=0x01
vendor = 'Intel Corporation'
device = 'E752x Memory Controller Hub PCIe Port B0'
class  = bridge
subclass   = PCI-PCI
cap 01[50] = powerspec 2  supports D0 D3  current D0
cap 05[58] = MSI supports 2 messages 
cap 10[64] = PCI-Express 1 root port max data 256(256) link x8(x8)
pc...@pci0:0:6:0:   class=0x060400 card=0x chip=0x35998086 rev=0x0c 
hdr=0x01
vendor = 'Intel Corporation'
device = 'E752x Memory Controller Hub PCIe Port C0'
class  = bridge
subclass   = PCI-PCI
cap 01[50] = powerspec 2  supports D0 D3  current D0
cap 05[58] = MSI supports 2 messages 
cap 10[64] = PCI-Express 1 root port max data 256(256) link x0(x8)
pc...@pci0:0:28:0:  class=0x060400 card=0x chip=0x25ae8086 rev=0x02 
hdr=0x01
vendor = 'Intel Corporation'
device = 'Hub Interface to PCI-X Bridge (6300ESB)'
class  = bridge
subclass   = PCI-PCI
cap 07[50] = PCI-X 64-bit bridge 
no...@pci0:0:29:0:  class=0x0c0300 card=0x32010e11 chip=0x25a98086 rev=0x02 
hdr=0x00
vendor = 'Intel Corporation'
device = 'USB 1.1 UHCI Controller *1 (6300ESB)'
class  = serial bus
subclass   = USB
no...@pci0:0:29:1:  class=0x0c0300 card=0x32010e11 chip=0x25aa8086 rev=0x02 
hdr=0x00
vendor = 'Intel Corporation'
device = 'USB 1.1 UHCI Controller *2 (6300ESB)'
class  = serial bus
subclass   = USB
no...@pci0:0:29:4:  class=0x088000 card=0x32010e11 

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Jeremy Chadwick
On Fri, Feb 19, 2010 at 12:50:39AM +0300, Slawa Olhovchenkov wrote:
 On Thu, Feb 18, 2010 at 01:32:13PM -0800, Pyun YongHyeon wrote:
 
dmesg output(only bge(4) related one).
   
   dmesg from boot:
   
   bge0: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
   0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
   miibus0: MII bus on bge0
   brgphy0: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus0
   brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
   1000baseT-FDX, auto
   bge0: Ethernet address: 00:14:c2:3d:e5:52
   bge0: [ITHREAD]
   bge1: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
   0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
   miibus1: MII bus on bge1
   brgphy1: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus1
   brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
   1000baseT-FDX, auto
   bge1: Ethernet address: 00:14:c2:3d:e5:51
   bge1: [ITHREAD]
   bge1: link state changed to UP
   bge0: link state changed to UP
   
   Nothing in dmesg before trap.
   
  
  Is this PCI-X controller? It would be even better if you can post
 
 This integrated controller (HP DL360-G4)
 
  bge(4) related dmesg output of verbosed boot and the output of

 ...
 pci0:2:2:0: bad VPD cksum, remain 19
 bge0: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
 bge0: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf7
 bge0: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
 ...
 pci0:2:2:1: bad VPD cksum, remain 19
 bge1: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
 bge1: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf6
 bge1: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X

Are the bad VPD checksum messages somehow responsible for this?
They're both related to the bge(4) interfaces:

 b...@pci0:2:2:0:class=0x02 card=0x00d00e11 chip=0x164814e4 
 rev=0x10 hdr=0x00
 b...@pci0:2:2:1:class=0x02 card=0x00d00e11 chip=0x164814e4 
 rev=0x10 hdr=0x00

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |



Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Pyun YongHyeon
On Fri, Feb 19, 2010 at 12:50:39AM +0300, Slawa Olhovchenkov wrote:
 On Thu, Feb 18, 2010 at 01:32:13PM -0800, Pyun YongHyeon wrote:
 
dmesg output(only bge(4) related one).
   
   dmesg from boot:
   
   bge0: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
   0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
   miibus0: MII bus on bge0
   brgphy0: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus0
   brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
   1000baseT-FDX, auto
   bge0: Ethernet address: 00:14:c2:3d:e5:52
   bge0: [ITHREAD]
   bge1: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
   0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
   miibus1: MII bus on bge1
   brgphy1: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus1
   brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
   1000baseT-FDX, auto
   bge1: Ethernet address: 00:14:c2:3d:e5:51
   bge1: [ITHREAD]
   bge1: link state changed to UP
   bge0: link state changed to UP
   
   Nothing in dmesg before trap.
   
  
  Is this PCI-X controller? It would be even better if you can post
 
 This integrated controller (HP DL360-G4)
 
  bge(4) related dmesg output of verbosed boot and the output of
 
 Preloaded elf kernel /boot/kernel/kernel at 0x8088e000.
 Preloaded elf obj module /boot/kernel/if_bge.ko at 0x8088e1d0.
 Preloaded elf obj module /boot/kernel/miibus.ko at 0x8088e7f8.
 pci0:2:2:0: bad VPD cksum, remain 19
 bge0: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
 0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
 bge0: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf7
 bge0: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
 miibus0: MII bus on bge0
 brgphy0: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus0
 brgphy0: OUI 0x000818, model 0x0019, rev. 0
 brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 bge0: bpf attached
 bge0: Ethernet address: 00:14:c2:3d:e5:52
 ioapic1: routing intpin 1 (PCI IRQ 25) to lapic 0 vector 50
 bge0: [MPSAFE]
 bge0: [ITHREAD]
 pci0:2:2:1: bad VPD cksum, remain 19
 bge1: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
 0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
 bge1: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf6
 bge1: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
 miibus1: MII bus on bge1
 brgphy1: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus1
 brgphy1: OUI 0x000818, model 0x0019, rev. 0
 brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 bge1: bpf attached
 bge1: Ethernet address: 00:14:c2:3d:e5:51
 ioapic1: routing intpin 2 (PCI IRQ 26) to lapic 0 vector 51
 bge1: [MPSAFE]
 bge1: [ITHREAD]
 bge1: link UP
 bge1: link state changed to UP
 
 
  pciconf -lcv.
 

[...]

 b...@pci0:2:2:0:class=0x02 card=0x00d00e11 chip=0x164814e4 
 rev=0x10 hdr=0x00
 vendor = 'Broadcom Corporation'
 device = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
 class  = network
 subclass   = ethernet
 cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split 
 transaction
 cap 01[48] = powerspec 2  supports D0 D3  current D0
 cap 03[50] = VPD
 cap 05[58] = MSI supports 8 messages, 64 bit 
 b...@pci0:2:2:1:class=0x02 card=0x00d00e11 chip=0x164814e4 
 rev=0x10 hdr=0x00
 vendor = 'Broadcom Corporation'
 device = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
 class  = network
 subclass   = ethernet
 cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split 
 transaction
 cap 01[48] = powerspec 2  supports D0 D3  current D0
 cap 03[50] = VPD
 cap 05[58] = MSI supports 8 messages, 64 bit 

I'm still not sure whether the panic is related to bge(4), but
a couple of workarounds for PCI-X BCM5704 silicon bugs are missing
from bge(4). Did you also see the panic before updating to
stable/8?
Anyway, try the attached patch and let me know how it works.
Index: sys/dev/bge/if_bge.c
===
--- sys/dev/bge/if_bge.c	(revision 204011)
+++ sys/dev/bge/if_bge.c	(working copy)
@@ -1342,6 +1342,7 @@
 bge_chipinit(struct bge_softc *sc)
 {
 	uint32_t dma_rw_ctl;
+	uint16_t val;
 	int i;
 
 	/* Set endianness before we access any non-PCI registers. */
@@ -1362,6 +1363,17 @@
 	i < BGE_STATUS_BLOCK_END + 1; i += sizeof(uint32_t))
 		BGE_MEMWIN_WRITE(sc, i, 0);
 
+	if (sc->bge_chiprev == BGE_CHIPREV_5704_BX) {
+		/*
+		 *  Fix data corruption caused by non-qword write with WB.
+		 *  Fix master abort in PCI mode.
+		 *  Fix PCI latency timer.
+		 */
+		val = pci_read_config(sc->bge_dev, BGE_PCI_MSI_DATA + 2, 2);
+		val |= (1 << 10) | (1 << 12) | (1 << 13);
+		pci_write_config(sc->bge_dev, BGE_PCI_MSI_DATA + 2, val, 2);
+	}
+
 	/*
 	 * Set up the PCI DMA control register.
 	 */
@@ -3157,6 +3169,26 @@
 	pci_write_config(dev, BGE_PCI_CMD, command, 4);
 	write_op(sc, 
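
The workaround in the first hunk of the patch sets three bits (10, 12 and 13)
in a 16-bit configuration word. A small sketch of that read-modify-write step
on a plain value follows; SKETCH_BIT and apply_5704_workaround_bits are
hypothetical names used only here, and no PCI access is performed.

```c
#include <stdint.h>

/* Hypothetical helper: a 16-bit mask with only bit n set. */
#define	SKETCH_BIT(n)	((uint16_t)(1u << (n)))

/*
 * OR bits 10, 12 and 13 into the value, leaving every other bit as it
 * was. The combined mask is 0x0400 | 0x1000 | 0x2000 = 0x3400.
 */
static uint16_t
apply_5704_workaround_bits(uint16_t val)
{
	return ((uint16_t)(val | SKETCH_BIT(10) | SKETCH_BIT(12) |
	    SKETCH_BIT(13)));
}
```

Because the bits are ORed in, applying the step to a value that already has
them set changes nothing.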

Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Pyun YongHyeon
On Thu, Feb 18, 2010 at 03:32:54PM -0800, Jeremy Chadwick wrote:
 On Fri, Feb 19, 2010 at 12:50:39AM +0300, Slawa Olhovchenkov wrote:
  On Thu, Feb 18, 2010 at 01:32:13PM -0800, Pyun YongHyeon wrote:
  
 dmesg output(only bge(4) related one).

dmesg from boot:

bge0: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
miibus0: MII bus on bge0
brgphy0: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge0: Ethernet address: 00:14:c2:3d:e5:52
bge0: [ITHREAD]
bge1: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
miibus1: MII bus on bge1
brgphy1: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
1000baseT-FDX, auto
bge1: Ethernet address: 00:14:c2:3d:e5:51
bge1: [ITHREAD]
bge1: link state changed to UP
bge0: link state changed to UP

Nothing in dmesg before trap.

   
   Is this PCI-X controller? It would be even better if you can post
  
  This integrated controller (HP DL360-G4)
  
   bge(4) related dmesg output of verbosed boot and the output of
 
  ...
  pci0:2:2:0: bad VPD cksum, remain 19
  bge0: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
  0xfdf7-0xfdf7 irq 25 at device 2.0 on pci2
  bge0: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf7
  bge0: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
  ...
  pci0:2:2:1: bad VPD cksum, remain 19
  bge1: HP NC7782 Gigabit Server Adapter, ASIC rev. 0x002100 mem 
  0xfdf6-0xfdf6 irq 26 at device 2.1 on pci2
  bge1: Reserved 0x1 bytes for rid 0x10 type 3 at 0xfdf6
  bge1: CHIP ID 0x2100; ASIC REV 0x02; CHIP REV 0x21; PCI-X
 
 Are the bad VPD checksum messages somehow responsible for this?
 They're both related to the bge(4) interfaces:
 
  b...@pci0:2:2:0:class=0x02 card=0x00d00e11 chip=0x164814e4 
  rev=0x10 hdr=0x00
  b...@pci0:2:2:1:class=0x02 card=0x00d00e11 chip=0x164814e4 
  rev=0x10 hdr=0x00
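
For readers unfamiliar with the pciconf fields quoted above, here is a minimal sketch of how the card= and chip= values split into 16-bit IDs. It assumes the usual pciconf layout (device ID in the upper half, vendor ID in the lower half; for card= these are the subsystem device and subsystem vendor). The function name decode_pciconf_ids is made up for illustration and is not part of any FreeBSD tool.

```python
import re


def decode_pciconf_ids(line: str) -> dict:
    """Split pciconf card=/chip= values into (device, vendor) halves.

    Hypothetical helper: assumes each value is a 32-bit hex number with
    the device ID in the upper 16 bits and the vendor ID in the lower 16.
    """
    ids = {}
    for key in ("card", "chip"):
        match = re.search(rf"{key}=0x([0-9a-fA-F]{{8}})", line)
        if match:
            value = int(match.group(1), 16)
            ids[key] = {"device": value >> 16, "vendor": value & 0xFFFF}
    return ids


# The values reported for the HP DL360-G4 in this thread.
sample = "card=0x00d00e11 chip=0x164814e4 rev=0x10 hdr=0x00"
for key, halves in decode_pciconf_ids(sample).items():
    print(f"{key}: vendor=0x{halves['vendor']:04x} device=0x{halves['device']:04x}")
```

With the values above, chip= splits into vendor 0x14e4 and device 0x1648, the same chip ID that the Tyan BCM5704 reports elsewhere in this thread list as 'Broadcom Corporation'. Only card= (the subsystem IDs) differs between the two boards.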
  

The driver tries to read VPD from the controller, but it seems it
failed to fully parse the data. It did manage to get the PN part, so
it successfully extracted the device name string from the controller.
I don't think this is related to the driver instability, though.
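
As background for the "bad VPD cksum" message, here is a rough sketch of the kind of check involved. This is not the FreeBSD code. It assumes the standard PCI VPD layout as I understand it: tagged resources, a read-only section (tag 0x90) made of keyword/length/data entries, and an "RV" keyword whose first data byte makes all bytes from offset 0 through that byte sum to zero modulo 256. The names build_vpd and vpd_checksum_ok and the part number string are invented for the example.

```python
def build_vpd(name: bytes, part_number: bytes) -> bytes:
    """Build a tiny VPD image with a valid RV checksum (illustration only)."""
    ident = bytes([0x82, len(name) & 0xFF, len(name) >> 8]) + name
    keywords = b"PN" + bytes([len(part_number)]) + part_number
    ro_len = len(keywords) + 4  # plus "RV", its length byte, and the checksum
    header = bytes([0x90, ro_len & 0xFF, ro_len >> 8])
    prefix = ident + header + keywords + b"RV\x01"
    checksum = (-sum(prefix)) & 0xFF  # makes the running sum zero mod 256
    return prefix + bytes([checksum]) + b"\x78"  # 0x78 is the end tag


def vpd_checksum_ok(vpd: bytes) -> bool:
    """Return True if the RV checksum in a raw VPD image is valid."""
    pos = 0
    while pos < len(vpd):
        tag = vpd[pos]
        if tag & 0x80:  # large resource: tag byte + 2-byte little-endian length
            if pos + 3 > len(vpd):
                return False
            length = vpd[pos + 1] | (vpd[pos + 2] << 8)
            body = pos + 3
        else:  # small resource: length is in the low 3 bits of the tag
            if (tag >> 3) & 0x0F == 0x0F:
                break  # end tag reached without finding RV
            length = tag & 0x07
            body = pos + 1
        if tag == 0x90:  # read-only VPD section: walk its keywords
            kw, end = body, body + length
            while kw + 3 <= end:
                klen = vpd[kw + 2]
                if vpd[kw:kw + 2] == b"RV":
                    if klen < 1 or kw + 3 >= len(vpd):
                        return False
                    # Sum everything up to and including the checksum byte.
                    return sum(vpd[:kw + 4]) & 0xFF == 0
                kw += 3 + klen
        pos = body + length
    return False


good = build_vpd(b"HP NC7782 Gigabit Server Adapter", b"hypothetical-pn")
damaged = bytearray(good)
damaged[5] ^= 0x01  # flip one bit inside the identifier string
print(vpd_checksum_ok(good), vpd_checksum_ok(bytes(damaged)))
```

In this sketch a single flipped bit fails the checksum while the identifier string and PN keyword remain readable, which fits the description above: the name can still be extracted even though the checksum is reported as bad.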


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-18 Thread Slawa Olhovchenkov
On Thu, Feb 18, 2010 at 04:19:13PM -0800, Pyun YongHyeon wrote:

 
 I'm still not sure whether the panic is related to bge(4), but
 there are a couple of missing workarounds for PCIX BCM5704 silicon
 bugs in bge(4). Did you also see the panic before updating to
 stable/8?

Before updating to stable/8 as of 2010-Feb-16, I saw network freezes on
stable/8 as of 2009-Sep: bge stopped receiving packets (as seen with
tcpdump) after approx. 40-50 days of uptime.


 Anyway, try attached patch and let me know how it works.

Thanks, I will try it.



Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-16 Thread Pyun YongHyeon
On Sun, Feb 14, 2010 at 10:04:58AM -0800, Nick Rogers wrote:
 I'm having repeated kernel panic issues on 8.0-RELEASE/amd64. Can anyone
 shed light on the below error? I unfortunately cannot provide a proper crash
 dump. The pointer addresses are always the same. The only other thing I've
 noticed that may be related is a "watchdog timeout" error on bge0 before the
 panic. Thanks.
 

Any chance of getting a backtrace from the crash?

 Jan 27 15:25:01 wifi kernel:
 Jan 27 15:25:01 wifi kernel:
 Jan 27 15:25:01 wifi kernel: Fatal trap 12: page fault while in kernel mode
 Jan 27 15:25:01 wifi kernel: cpuid = 4; apic id = 04
 Jan 27 15:25:02 wifi kernel:
 Jan 27 15:25:02 wifi kernel: fault virtual address  = 0x28
 Jan 27 15:25:02 wifi kernel: fault code = supervisor write data,
 page not present
 Jan 27 15:25:02 wifi kernel: instruction pointer=
 0x20:0x803263b7
 Jan 27 15:25:02 wifi kernel: stack pointer  =
 0x28:0xff8073acdb40
 Jan 27 15:25:02 wifi kernel: frame pointer  =
 0x28:0xff8073acdba0
 Jan 27 15:25:02 wifi kernel: code segment   = base 0x0, limit
 0xf, type 0x1b
 Jan 27 15:25:02 wifi kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
 Jan 27 15:25:02 wifi kernel: processor eflags   =
 Jan 27 15:25:02 wifi kernel: interrupt enabled,
 Jan 27 15:25:02 wifi kernel: resume,
 Jan 27 15:25:02 wifi kernel: IOPL = 0
 Jan 27 15:25:02 wifi kernel:
 Jan 27 15:25:02 wifi kernel: current process


Re: trap 12: page fault while in kernel mode on 8.0-RELEASE (possibly bge(4) related)

2010-02-15 Thread Nick Rogers
hw.bge.allow_asf: 0

On Mon, Feb 15, 2010 at 2:23 AM, Giacomo Olgeni g.olg...@colby.it wrote:


 Hello,

 Are you running with hw.bge.allow_asf enabled?
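
For anyone checking the same thing on their own machine, here is a small sketch that reads the value and interprets it. It assumes the plain "name: value" output format shown in the reply above (hw.bge.allow_asf: 0). The helper names are made up for illustration, and read_sysctl simply shells out to sysctl(8), so it only works on a system where that tunable exists.

```python
import subprocess


def parse_sysctl_line(line: str) -> tuple:
    """Split one 'name: value' line of sysctl output into its two parts."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"not a 'name: value' line: {line!r}")
    return name.strip(), value.strip()


def asf_enabled(line: str) -> bool:
    """Interpret a hw.bge.allow_asf line; any non-zero value counts as enabled."""
    name, value = parse_sysctl_line(line)
    if name != "hw.bge.allow_asf":
        raise ValueError(f"unexpected sysctl name: {name!r}")
    return int(value) != 0


def read_sysctl(name: str) -> str:
    """Run sysctl(8) for one name and return its output line (not run here)."""
    result = subprocess.run(
        ["sysctl", name], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


# The value reported in this thread.
print(asf_enabled("hw.bge.allow_asf: 0"))
```

On a live system the call would be asf_enabled(read_sysctl("hw.bge.allow_asf")).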


