Re: crash with -current

2015-11-02 Thread Martin Pieuchot
On 01/11/15(Sun) 22:43, Sonic wrote:
> Upgraded -current earlier today, and system has crashed:

Do you have an idea how to reproduce this crash?  Which program are you
running that uses bpf?

> =
> Nov  1 22:23:55 stargate /bsd: uvm_fault(0x818f9920, 0xfff7818adf60, 0, 1) -> e
> Nov  1 22:23:55 stargate /bsd: fatal page fault in supervisor mode
> Nov  1 22:23:55 stargate /bsd: trap type 6 code 0 rip 81329e69 cs 8 rflags 10286 cr2  fff7818adf60 cpl 7 rsp 8000221df760
> Nov  1 22:23:55 stargate /bsd: panic: trap type 6, code=0, pc=81329e69
> Nov  1 22:23:55 stargate /bsd: Starting stack trace...
> Nov  1 22:23:55 stargate /bsd: panic() at panic+0x10b
> Nov  1 22:23:55 stargate /bsd: trap() at trap+0x7b8
> Nov  1 22:23:55 stargate /bsd: --- trap (number 6) ---
> Nov  1 22:23:55 stargate /bsd: trap() at trap+0x709
> Nov  1 22:23:55 stargate /bsd: --- trap (number 4) ---
> Nov  1 22:23:55 stargate /bsd: trap() at trap+0x709
> Nov  1 22:23:55 stargate /bsd: --- trap (number 4) ---
> Nov  1 22:23:55 stargate /bsd: bpf_filter() at bpf_filter+0x19b
> Nov  1 22:23:55 stargate /bsd: _bpf_mtap() at _bpf_mtap+0xf4
> Nov  1 22:23:55 stargate /bsd: bpf_mtap_ether() at bpf_mtap_ether+0x39
> Nov  1 22:23:55 stargate /bsd: em_start() at em_start+0xd6
> Nov  1 22:23:55 stargate /bsd: nettxintr() at nettxintr+0x52
> Nov  1 22:23:55 stargate /bsd: softintr_dispatch() at softintr_dispatch+0x8b
> Nov  1 22:23:55 stargate /bsd: Xsoftnet() at Xsoftnet+0x1f
> Nov  1 22:23:55 stargate /bsd: --- interrupt ---
> Nov  1 22:23:55 stargate /bsd: end of kernel
> Nov  1 22:23:55 stargate /bsd: end trace frame: 0xff8132b494, count: 246
> Nov  1 22:23:55 stargate /bsd: 0x282:
> Nov  1 22:23:55 stargate /bsd: End of stack trace.
> =
> # dmesg
> OpenBSD 5.8-current (GENERIC.MP) #8: Sun Nov  1 13:10:37 EST 2015
> r...@stargate.grizzly.bear:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4277665792 (4079MB)
> avail mem = 4143894528 (3951MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.6 @ 0x9f000 (19 entries)
> bios0: vendor American Megatrends Inc. version "1.2b" date 07/19/13
> bios0: Supermicro X7SPA-HF
> acpi0 at bios0: rev 2
> acpi0: sleep states S0 S1 S4 S5
> acpi0: tables DSDT FACP APIC MCFG OEMB HPET EINJ BERT ERST HEST
> acpi0: wakeup devices P0P1(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4)
> USB2(S4) USB5(S4) EUSB(S4) USB3(S4) USB4(S4) USB6(S4) USBE(S4)
> P0P4(S4) P0P5(S4) P0P6(S4) P0P7(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.23 MHz
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
> cpu0: 512KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 200MHz
> cpu0: mwait min=64, max=64, C-substates=0.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.01 MHz
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
> cpu1: 512KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> ioapic0 at mainbus0: apid 3 pa 0xfec0, version 20, 24 pins
> ioapic0: misconfigured as apic 1, remapped to apid 3
> acpimcfg0 at acpi0 addr 0xe000, bus 0-255
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 4 (P0P1)
> acpiprt2 at acpi0: bus 1 (P0P4)
> acpiprt3 at acpi0: bus 2 (P0P8)
> acpiprt4 at acpi0: bus 3 (P0P9)
> acpicpu0 at acpi0: C1(@1 halt!)
> acpicpu1 at acpi0: C1(@1 halt!)
> acpibtn0 at acpi0: SLPB
> acpibtn1 at acpi0: PWRB
> ipmi at mainbus0 not configured
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel Pineview DMI" rev 0x02
> uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x02: apic 3 int 16
> uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x02: apic 3 int 21
> uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x02: apic 3 int 19
> ehci0 at pci0 dev 26 function 7 "Intel 82801I USB" rev 0x02: apic 3 int 18
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
> ppb0 at pci0 dev 28 function 0 "Intel 82801I PCIE" rev 0x02: msi
> pci1 at ppb0 bus 1
> ppb1 at pci0 dev 28 function 4 "Intel 82801I PCIE" rev 0x02: msi
> pci2 at ppb1 bus 2
> em0 at pci2 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address
> 00:25:90:92:d4:f8
> ppb2 at pci0 dev 28 function 5 "Intel 82801I PCIE" rev 0x02: msi
> pci3 at ppb2 

Re: crash with -current

2015-11-02 Thread Michael McConville
Sonic wrote:
> On Mon, Nov 2, 2015 at 12:19 PM, Martin Pieuchot  wrote:
> > Do you have an idea how to reproduce this crash?  Which program are you
> > running that uses bpf?
> 
> Not using bpf at all (that I know of), just a straightforward firewall
> - pf, dhcpd, unbound.
> 
> The nasty little "em0: watchdog timeout -- resetting" bug, reported on
> 10-4-15, is also back - triggered by using a bittorrent client on
> one of my desktops. The crash reported above happened while streaming
> some Netflix. I'm inclined to suspect the em driver is hosed again.

Yeah, my gut reaction was that this was something to do with the recent
em changes rather than BPF. I don't think the BPF code has changed
recently. BPF is hooked into network interfaces pretty directly, so if
some packet data was being freed too early by a race condition, this
seems like a conceivable way we'd find out.

My speculative 2¢.



Re: crash with -current

2015-11-02 Thread Sonic
On Mon, Nov 2, 2015 at 12:19 PM, Martin Pieuchot  wrote:
> Do you have an idea how to reproduce this crash?  Which program are you
> running that uses bpf?

Not using bpf at all (that I know of), just a straightforward firewall
- pf, dhcpd, unbound.

The nasty little "em0: watchdog timeout -- resetting" bug, reported on
10-4-15, is also back - triggered by using a bittorrent client on
one of my desktops. The crash reported above happened while streaming
some Netflix. I'm inclined to suspect the em driver is hosed again.

Chris



Re: crash with -current

2015-11-02 Thread Stuart Henderson
On 2015-11-02, Michael McConville  wrote:
> My speculative 2¢.

See, even the USA needs more than ASCII :-)



Re: crash with -current

2015-11-02 Thread Daniel Ouellet
On 11/2/15 1:31 PM, Stuart Henderson wrote:
> On 2015-11-02, Michael McConville  wrote:
>> My speculative 2¢.
> 
> See, even the USA needs more than ASCII :-)

Maybe not. $0.02 :-))



Re: crash with -current

2015-11-02 Thread Otto Moerbeek
On Mon, Nov 02, 2015 at 12:34:11PM -0500, Sonic wrote:

> On Mon, Nov 2, 2015 at 12:19 PM, Martin Pieuchot  wrote:
> > Do you have an idea how to reproduce this crash?  Which program are you
> > running that uses bpf?
> 
> Not using bpf at all (that I know of), just a straightforward firewall
> - pf, dhcpd, unbound.

dhcpd uses bpf

> 
> The nasty little "em0: watchdog timeout -- resetting" bug, reported on
> 10-4-15, is also back - triggered by using a bittorrent client on
> one of my desktops. The crash reported above happened while streaming
> some Netflix. I'm inclined to suspect the em driver is hosed again.
> 
> Chris



crash with -current

2015-11-01 Thread Sonic
Upgraded -current earlier today, and system has crashed:
=
Nov  1 22:23:55 stargate /bsd: uvm_fault(0x818f9920, 0xfff7818adf60, 0, 1) -> e
Nov  1 22:23:55 stargate /bsd: fatal page fault in supervisor mode
Nov  1 22:23:55 stargate /bsd: trap type 6 code 0 rip 81329e69 cs 8 rflags 10286 cr2  fff7818adf60 cpl 7 rsp 8000221df760
Nov  1 22:23:55 stargate /bsd: panic: trap type 6, code=0, pc=81329e69
Nov  1 22:23:55 stargate /bsd: Starting stack trace...
Nov  1 22:23:55 stargate /bsd: panic() at panic+0x10b
Nov  1 22:23:55 stargate /bsd: trap() at trap+0x7b8
Nov  1 22:23:55 stargate /bsd: --- trap (number 6) ---
Nov  1 22:23:55 stargate /bsd: trap() at trap+0x709
Nov  1 22:23:55 stargate /bsd: --- trap (number 4) ---
Nov  1 22:23:55 stargate /bsd: trap() at trap+0x709
Nov  1 22:23:55 stargate /bsd: --- trap (number 4) ---
Nov  1 22:23:55 stargate /bsd: bpf_filter() at bpf_filter+0x19b
Nov  1 22:23:55 stargate /bsd: _bpf_mtap() at _bpf_mtap+0xf4
Nov  1 22:23:55 stargate /bsd: bpf_mtap_ether() at bpf_mtap_ether+0x39
Nov  1 22:23:55 stargate /bsd: em_start() at em_start+0xd6
Nov  1 22:23:55 stargate /bsd: nettxintr() at nettxintr+0x52
Nov  1 22:23:55 stargate /bsd: softintr_dispatch() at softintr_dispatch+0x8b
Nov  1 22:23:55 stargate /bsd: Xsoftnet() at Xsoftnet+0x1f
Nov  1 22:23:55 stargate /bsd: --- interrupt ---
Nov  1 22:23:55 stargate /bsd: end of kernel
Nov  1 22:23:55 stargate /bsd: end trace frame: 0xff8132b494, count: 246
Nov  1 22:23:55 stargate /bsd: 0x282:
Nov  1 22:23:55 stargate /bsd: End of stack trace.
=
# dmesg
OpenBSD 5.8-current (GENERIC.MP) #8: Sun Nov  1 13:10:37 EST 2015
r...@stargate.grizzly.bear:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4277665792 (4079MB)
avail mem = 4143894528 (3951MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0x9f000 (19 entries)
bios0: vendor American Megatrends Inc. version "1.2b" date 07/19/13
bios0: Supermicro X7SPA-HF
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET EINJ BERT ERST HEST
acpi0: wakeup devices P0P1(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4)
USB2(S4) USB5(S4) EUSB(S4) USB3(S4) USB4(S4) USB6(S4) USBE(S4)
P0P4(S4) P0P5(S4) P0P6(S4) P0P7(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.23 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu0: 512KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 200MHz
cpu0: mwait min=64, max=64, C-substates=0.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Atom(TM) CPU D525 @ 1.80GHz, 1800.01 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE,NXE,LONG,LAHF,PERF,SENSOR
cpu1: 512KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 3 pa 0xfec0, version 20, 24 pins
ioapic0: misconfigured as apic 1, remapped to apid 3
acpimcfg0 at acpi0 addr 0xe000, bus 0-255
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 4 (P0P1)
acpiprt2 at acpi0: bus 1 (P0P4)
acpiprt3 at acpi0: bus 2 (P0P8)
acpiprt4 at acpi0: bus 3 (P0P9)
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
acpibtn0 at acpi0: SLPB
acpibtn1 at acpi0: PWRB
ipmi at mainbus0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Pineview DMI" rev 0x02
uhci0 at pci0 dev 26 function 0 "Intel 82801I USB" rev 0x02: apic 3 int 16
uhci1 at pci0 dev 26 function 1 "Intel 82801I USB" rev 0x02: apic 3 int 21
uhci2 at pci0 dev 26 function 2 "Intel 82801I USB" rev 0x02: apic 3 int 19
ehci0 at pci0 dev 26 function 7 "Intel 82801I USB" rev 0x02: apic 3 int 18
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb0 at pci0 dev 28 function 0 "Intel 82801I PCIE" rev 0x02: msi
pci1 at ppb0 bus 1
ppb1 at pci0 dev 28 function 4 "Intel 82801I PCIE" rev 0x02: msi
pci2 at ppb1 bus 2
em0 at pci2 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address
00:25:90:92:d4:f8
ppb2 at pci0 dev 28 function 5 "Intel 82801I PCIE" rev 0x02: msi
pci3 at ppb2 bus 3
em1 at pci3 dev 0 function 0 "Intel 82574L" rev 0x00: msi, address
00:25:90:92:d4:f9
uhci3 at pci0 dev 29 function 0 "Intel 82801I USB" rev 0x02: apic 3 int 23
uhci4 at pci0 dev 29 function 1 "Intel 82801I USB" rev 0x02: apic 3 int 19
uhci5 at pci0 dev 29 function 2 "Intel 82801I USB" rev 0x02: apic 3 int 18
ehci1 

Re: Sun Netra X1 still get dc0: watchdog timeout and crash on current

2011-03-05 Thread Mark Kettenis
> Date: Fri, 04 Mar 2011 07:40:59 -0500
> From: Daniel Ouellet dan...@presscom.net
>
> I get pretty much symmetric traffic of ~43.5Mbps in each direction using
> tcpbench, as before.
>
> Then after maybe one minute, both sides cut off, as opposed to before
> when only one side did. Traffic then stays off for a few seconds and
> comes back up on both sides at the same time, just like it went down at
> the same time, and EACH time this happens I get a watchdog timeout.
>
> I get it on either network port, like this:
>
> # dc0: watchdog timeout
> dc0: watchdog timeout
> dc1: watchdog timeout
> dc0: watchdog timeout
> dc0: watchdog timeout
> dc0: watchdog timeout
> dc0: watchdog timeout
> dc0: watchdog timeout
> dc0: watchdog timeout
> dc1: watchdog timeout
> dc0: watchdog timeout

Yes.  The chip still craps out.  My diff doesn't address that.  But at
least the machine won't panic anymore, and the network interface will
recover after the timeout.  It's committed now.

I'll see if I can figure out why the chip shits itself.  May send you
more stuff to test.

Cheers,

Mark
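
For readers unfamiliar with where the "watchdog timeout" message comes from: as far as I understand the BSD network stack, a driver arms a per-interface countdown when it hands a packet to the chip, clears it when the chip reports the transmit as done, and a once-per-second tick calls the driver's watchdog routine if the countdown reaches zero. The sketch below only simulates that idea in userland. The struct and the fake_* names are hypothetical and are not the kernel's struct ifnet or the dc(4) code.

```c
#include <assert.h>

/* Hypothetical, simplified interface state; not the kernel's struct ifnet. */
struct fake_if {
	int timer;	/* seconds until the watchdog fires; 0 = disarmed */
	int timeouts;	/* how many times the watchdog has fired */
};

/* Called when a packet is handed to the chip: arm the watchdog. */
void
fake_start(struct fake_if *ifp, int seconds)
{
	assert(seconds > 0);
	ifp->timer = seconds;
}

/* Called when the chip signals that the transmit completed: disarm. */
void
fake_txeof(struct fake_if *ifp)
{
	ifp->timer = 0;
}

/* Called once per second: count down and fire if the chip went silent. */
void
fake_slowtimo(struct fake_if *ifp)
{
	if (ifp->timer == 0)
		return;
	if (--ifp->timer == 0)
		ifp->timeouts++;	/* a driver would log and reinitialize here */
}
```

In this model a chip that keeps completing transmits never trips the counter, while a chip that stops answering trips it after the armed number of ticks, which matches the pattern of one console message per stall reported above.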



Re: Sun Netra X1 still get dc0: watchdog timeout and crash on current

2011-03-05 Thread Daniel Ouellet

On 3/5/11 1:49 PM, Mark Kettenis wrote:

>> Date: Fri, 04 Mar 2011 07:40:59 -0500
>> From: Daniel Ouellet dan...@presscom.net
>>
>> I get pretty much symmetric traffic of ~43.5Mbps in each direction using
>> tcpbench, as before.
>>
>> Then after maybe one minute, both sides cut off, as opposed to before
>> when only one side did. Traffic then stays off for a few seconds and
>> comes back up on both sides at the same time, just like it went down at
>> the same time, and EACH time this happens I get a watchdog timeout.
>>
>> I get it on either network port, like this:
>>
>> # dc0: watchdog timeout
>> dc0: watchdog timeout
>> dc1: watchdog timeout
>> dc0: watchdog timeout
>> dc0: watchdog timeout
>> dc0: watchdog timeout
>> dc0: watchdog timeout
>> dc0: watchdog timeout
>> dc0: watchdog timeout
>> dc1: watchdog timeout
>> dc0: watchdog timeout
>
> Yes.  The chip still craps out.  My diff doesn't address that.  But at
> least the machine won't panic anymore, and the network interface will
> recover after the timeout.  It's committed now.
>
> I'll see if I can figure out why the chip shits itself.  May send you
> more stuff to test.

Thanks, I saw that commit.

Anytime you need me to test more, I am all over it! (:

Best,

Daniel



Re: Sun Netra X1 still get dc0: watchdog timeout and crash on current

2011-03-04 Thread Daniel Ouellet

On 3/3/11 3:28 PM, Mark Kettenis wrote:
> Daniel,
>
> Can you try this diff?  It won't get rid of the watchdog timeout, but
> hopefully it will prevent the DMA error.


Hi Mark,

Here are the results for this.

Sorry it took me so long to test it.

It does make a difference and I will test more today, but here is what
I have so far after 30+ minutes of testing different things.

No crash yet. (But I will let it run for a few hours to be sure.)

The connections behave differently now.

I get pretty much symmetric traffic of ~43.5Mbps in each direction using
tcpbench, as before.

Then after maybe one minute, both sides cut off, as opposed to before
when only one side did. Traffic then stays off for a few seconds and
comes back up on both sides at the same time, just like it went down at
the same time, and EACH time this happens I get a watchdog timeout.

I get it on either network port, like this:

# dc0: watchdog timeout
dc0: watchdog timeout
dc1: watchdog timeout
dc0: watchdog timeout
dc0: watchdog timeout
dc0: watchdog timeout
dc0: watchdog timeout
dc0: watchdog timeout
dc0: watchdog timeout
dc1: watchdog timeout
dc0: watchdog timeout

More frequently on dc0 than dc1. Why, I do not know. Maybe it is just
which one I start first, but anyway, those are the results so far.

Each time it comes back up it goes back to ~42.5Mbps, stays stable for a
while, then goes down on both sides at once, and so on. Each time, the
watchdog message shows up when traffic through the router is lost.

But still no crash yet.

There is one thing I would love to understand in tcpbench to make this
testing better.

I thought that one could actually preset the size of the packets sent?

I see the -B and -S options, but I keep reading the man page and I just
don't get it.

I thought maybe -S was the size of the packet, but then if I set it
above 1500 it shouldn't work, as the MTU is 1500, yet it does work.

I would love more details on that, and to know how to set the size of
the packets if possible, as I do not get what the man page means for the
-B and -S options or their real impact on the stream itself, if any.

I will send you more feedback at some point today, after I get some
much-needed sleep.

Based on what I see, it looks to me like if the watchdog timeout wasn't
there, most likely there wouldn't be any cut in the stream going across
the router, as the cuts and the console output line up way too
consistently.

Best, and thanks again for your work on this. I very much appreciate it!


Daniel
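
On the -B / -S question above: as far as I read tcpbench(1), -B sets the size of the buffer tcpbench uses for its own read and write calls, and -S sets the size of the socket buffer. Neither one is an on-wire packet size. With TCP the kernel cuts the byte stream into segments that fit the path, which would explain why -S above 1500 still works with a 1500 MTU. The sketch below is not tcpbench code; it only shows what requesting a socket send buffer looks like. The function name effective_sndbuf is made up for illustration, and the size the kernel reports back can differ from the size requested.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Ask the kernel for a send buffer of `bytes` on a fresh TCP socket and
 * return the size the kernel reports back, or -1 on error.
 */
int
effective_sndbuf(int bytes)
{
	int s, got = 0;
	socklen_t len = sizeof(got);

	assert(bytes > 0);
	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return -1;
	/* The socket buffer bounds queued data, not the size of a packet. */
	if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) == -1 ||
	    getsockopt(s, SOL_SOCKET, SO_SNDBUF, &got, &len) == -1)
		got = -1;
	close(s);
	return got;
}
```

Calling effective_sndbuf(65536) succeeds even though 65536 is far above the MTU, because the value only limits how much unsent data the socket may hold.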



Re: Sun Netra X1 still get dc0: watchdog timeout and crash on current

2011-03-04 Thread Daniel Ouellet

On 3/4/11 7:40 AM, Daniel Ouellet wrote:

> On 3/3/11 3:28 PM, Mark Kettenis wrote:
>> Daniel,
>>
>> Can you try this diff? It won't get rid of the watchdog timeout, but
>> hopefully it will prevent the DMA error.

Following up on my previous email: I have let it run for 12 hours so far
with as much traffic as I could send both ways, using two other servers
running tcpbench, and no crash whatsoever. The interrupt load still runs
at about 90% pretty much constantly, except when the watchdog timeout
shows up; then there are no interrupts, since there is obviously no
traffic to process. So far I still see this watchdog only when I push
duplex traffic. I do not see it when traffic is one way, or very slow in
one direction and full speed in the other.

Your diff sure helps. Not sure if that's the best solution, but it sure
removes the crash.

Hope this helps some.

Daniel



Re: Sun Netra X1 still get dc0: watchdog timeout and crash on current

2011-03-03 Thread Mark Kettenis
Daniel,

Can you try this diff?  It won't get rid of the watchdog timeout, but
hopefully it will prevent the DMA error.


Index: dc.c
===
RCS file: /cvs/src/sys/dev/ic/dc.c,v
retrieving revision 1.116
diff -u -p -r1.116 dc.c
--- dc.c5 Aug 2010 07:57:04 -   1.116
+++ dc.c3 Mar 2011 20:22:12 -
@@ -3059,6 +3059,7 @@ void
 dc_stop(struct dc_softc *sc, int softonly)
 {
 	struct ifnet *ifp;
+	u_int32_t isr;
 	int i;
 
 	ifp = &sc->sc_arpcom.ac_if;
@@ -3070,6 +3071,28 @@ dc_stop(struct dc_softc *sc, int softonl
 
 	if (!softonly) {
 		DC_CLRBIT(sc, DC_NETCFG, (DC_NETCFG_RX_ON|DC_NETCFG_TX_ON));
+
+		for (i = 0; i < DC_TIMEOUT; i++) {
+			isr = CSR_READ_4(sc, DC_ISR);
+			if ((isr & DC_ISR_TX_IDLE ||
+			    (isr & DC_ISR_TX_STATE) == DC_TXSTATE_RESET) &&
+			    (isr & DC_ISR_RX_STATE) == DC_RXSTATE_STOPPED)
+				break;
+			DELAY(10);
+		}
+
+		if (i == DC_TIMEOUT) {
+			if (!((isr & DC_ISR_TX_IDLE) ||
+			    (isr & DC_ISR_TX_STATE) == DC_TXSTATE_RESET) &&
+			    !DC_IS_ASIX(sc) && !DC_IS_DAVICOM(sc))
+				printf("%s: failed to force tx to idle state\n",
+				    sc->sc_dev.dv_xname);
+			if (!((isr & DC_ISR_RX_STATE) == DC_RXSTATE_STOPPED) &&
+			    !DC_HAS_BROKEN_RXSTATE(sc))
+				printf("%s: failed to force rx to idle state\n",
+				    sc->sc_dev.dv_xname);
+		}
+
 		CSR_WRITE_4(sc, DC_IMR, 0x);
 		CSR_WRITE_4(sc, DC_TXADDR, 0x);
 		CSR_WRITE_4(sc, DC_RXADDR, 0x);
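
The core of the diff above is a bounded poll: after switching the receiver and transmitter off, keep reading the status register until both engines report idle, and give up after a fixed number of tries. Below is a minimal userland sketch of that pattern, assuming a simulated device. The FAKE_* constants, struct fake_dev and both function names are invented for illustration and are not the real dc(4) register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only; not the real DC_ISR masks. */
#define FAKE_TX_IDLE	0x1u
#define FAKE_RX_STOPPED	0x2u
#define FAKE_TIMEOUT	1000

/* Simulated device: both engines report idle from the `settle`-th read on. */
struct fake_dev {
	int reads;
	int settle;
};

static uint32_t
fake_read_isr(struct fake_dev *d)
{
	d->reads++;
	return (d->reads >= d->settle) ? (FAKE_TX_IDLE | FAKE_RX_STOPPED) : 0;
}

/*
 * Poll until both engines are idle or the retry budget runs out.
 * Returns the number of polls that saw a busy device, or -1 on timeout.
 */
int
wait_for_idle(struct fake_dev *d, int max_tries)
{
	uint32_t isr;
	int i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries; i++) {
		isr = fake_read_isr(d);
		if ((isr & FAKE_TX_IDLE) && (isr & FAKE_RX_STOPPED))
			return i;
		/* a real driver would DELAY(10) here between reads */
	}
	return -1;
}
```

The retry budget is what keeps a wedged chip from hanging the caller forever. The diff uses the same idea and only prints a message when the budget is exhausted.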



Sun Netra X1 still get dc0: watchdog timeout and crash on current.

2011-03-02 Thread Daniel Ouellet

Hi,

I was testing routing with this box to see what it could do. When it is
used in full duplex mode, routing traffic between tcpbench servers on
either side with this box as the router in between, I get this crash
after a watchdog timeout.

It only happens when I process traffic in both directions. If I do one
side, all good; the other, all good; but with both directions at the
same time, I get 94Mb on one side, 54Mb on the other, and a crash with a
timeout that drops me into ddb.


All related information below, trace, ps and dmesg.

Thanks

Daniel


# dc0: watchdog timeout
panic: psycho0: uncorrectable DMA error AFAR 6ed72050 (pa=0 tte=0/6ce94012) AFSR 21ff4000
kdb breakpoint at 1462280
Stopped at  Debugger+0x4:   nop
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb>

ddb> trace
psycho_ue(4f99e00, 6, e0017428, 40009db9c40, 14284e0, ) at psycho_ue+0x7c
sparc_interrupt(40009da8f50, 0, 40010b14050, 80006ce94012, 20, deafbeef) at sparc_interrupt+0x2a0
m_extfree(40009da8f50, 4000a07c0f0, 60014000, ff00, 1918, deafbeef) at m_extfree+0x7c
m_free_unlocked(40009da8f50, 60014000, 2000, 4000a07c0f0, 11f8ce0, 18d0) at m_free_unlocked+0x4c
m_freem(40009da8f50, 2000, 400010a9d00, 0, 800, 2) at m_freem+0x20
dc_stop(4000109a000, 3, 2000, 4000a07c150, 20, 40) at dc_stop+0xd0
dc_init(4000109a000, 6, 40001081900, 1e00, 20, a) at dc_init+0x20
dc_tx_underrun(4000109a000, 73ff, ff, 8000, ff00, 18d0) at dc_tx_underrun+0x1ec
dc_intr(4000109a000, 1, e0017b50, 40009db9c40, 1081200, ) at dc_intr+0x198
sparc_interrupt(400014f4050, 40001081e00, 40001081900, 7e0, 20, a) at sparc_interrupt+0x2a0
nettxintr(4000109a000, 3fff, ff, ff00, 1918, 1910) at nettxintr+0x6c
netintr(e0017ec8, 0, e0017ec8, 8000, 11f8ce0, 18d0) at netintr+0xd0
sparc_interrupt(e0017ec8, 0, e0017ec8, 0, 11f8ce0, 180f3d8) at sparc_interrupt+0x2a0
sparc_interrupt(188a030, e0018000, 153a150, 1886148, 0, 0) at sparc_interrupt+0x2a0
sched_idle(e0018000, 4000a070400, 153a758, 4000efb8000, 40010bba470, 1c09d30) at sched_idle+0x140
proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x4
ddb>


ddb> ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT       COMMAND
 17579  24574  17579      0  3     0x44180  poll       systat
 24574  26610  24574      0  3      0x4080  pause      ksh
 26610  29211  26610      0  3      0x4180  select     sshd
  2817      1   2817      0  3     0x40180  select     sendmail
 22433      1  22433      0  3      0x4080  ttyin      ksh
 16463      1  16463      0  3        0x80  select     cron
  1739      1   1739      0  3       0x180  select     inetd
 29211      1  29211      0  3        0x80  select     sshd
 13002  31220  26791     83  3       0x180  poll       ntpd
 31220  26791  26791     83  3       0x180  poll       ntpd
 26791      1  26791      0  3        0x80  poll       ntpd
  9509  15123  15123     74  3       0x180  bpf        pflogd
 15123      1  15123      0  3        0x80  netio      pflogd
 30009  12451  12451     73  3       0x180  poll       syslogd
 12451      1  12451      0  3        0x88  netio      syslogd
    13      0      0      0  3    0x100200  aiodoned   aiodoned
    12      0      0      0  3    0x100200  syncer     update
    11      0      0      0  3    0x100200  cleaner    cleaner
    10      0      0      0  3    0x100200  reaper     reaper
     9      0      0      0  3    0x100200  pgdaemon   pagedaemon
     8      0      0      0  3    0x100200  bored      crypto
     7      0      0      0  3    0x100200  pftm       pfpurge
     6      0      0      0  3    0x100200  usbtsk     usbtask
     5      0      0      0  3    0x100200  usbatsk    usbatsk
     4      0      0      0  3    0x100200  bored      syswq
*    3      0      0      0  7  0x40100200             idle0
     2      0      0      0  3    0x100200  kmalloc    kmthread
     1      0      1      0  3      0x4080  wait       init
     0     -1      0      0  3     0x80200  scheduler  swapper
ddb>


OpenBSD 4.9 (GENERIC) #253: Tue Mar  1 17:27:32 MST 2011
dera...@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/GENERIC
real mem = 1073741824 (1024MB)
avail mem = 1043521536 (995MB)
mainbus0 at root: Sun Netra X1 (UltraSPARC-IIe 500MHz)
cpu0 at mainbus0: SUNW,UltraSPARC-IIe (rev 1.4) @ 500 MHz
cpu0: physical 16K instruction (32 b/l), 16K data (32 b/l), 256K external (64 b/l)
psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
psycho0: bus range 0-0, PCI bus 0
psycho0: dvma map 6000-7fff
pci0 at psycho0
ebus0 at pci0 dev 7 function 0 Acer Labs M1533 ISA rev 0x00
dma at ebus0 addr 0- ivec 0x2a not configured
rtc0 at ebus0 addr 70-71: m5819
power0 at ebus0 addr 2000-2007 ivec 0x23
lom0 at ebus0 addr 8010-8011 ivec 0x2a: LOMlite2 rev 3.10
com0 at ebus0 addr 3f8-3ff 

Kernel crash in -current of yesterday

2006-06-11 Thread Federico Giannici
As I have been having some lockups on this PC, someone suggested that I
upgrade to -current.

I downloaded the current snapshot from a couple of hours ago and did an
upgrade.

On the following reboot the system crashed!

It seems that there were two problems.
First there was the following blue text:

spec_open_clone(): cloning device (23, 0) for pid 28994
spec_open_clone(): new minor for cloned device is 1

And a couple of lines later in the rc output there was the following
blue text (warning: I copied it by hand):

uvm_fault(0xfe8013033c18, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
stopped at ffs_sync_vnode +0x25: testb $0xf,0x20(%rax)

And here is the output of trace (again, copied by hand):

ffs_sync_vnode() AT ffs_sync_vnode + 0x25
ufs_mount_foreach_vnode() AT ufs_mount_foreach_vnode + 0x32
ffs_sync() AT ffs_sync + 0x74
sys_sync() AT sys_sync + 0x97
syscall() AT syscall + 0x225
--- syscall (number 36) ---
end of kernel
end of trace frame: 0x503941a8, count: -5


Bye.

--
Federico Giannici
http://www.neomedia.it