Re: serious watchdog timeout issues with em driver
On Mon, Dec 21, 2015 at 10:41:22AM +0200, Kapetanakis Giannis wrote:
> Hi,
>
> Problem is still here with Dec 16 snapshot.
>
> Dec 17 13:08:20 server /bsd: OpenBSD 5.8-current (GENERIC.MP) #1494: Wed Dec 16 12:13:03 MST 2015
> Dec 17 13:08:20 server /bsd: dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
> Dec 17 13:08:20 server /bsd: cpu0: Intel(R) Pentium(R) 4 CPU 3.00GHz ("GenuineIntel" 686-class) 3 GHz
> Dec 17 13:08:20 server /bsd: em0 at pci1 dev 10 function 0 "Intel 82541EI" rev 0x00: apic 2 int 22, address 00:30:48:72:28:58
> Dec 17 13:08:20 server /bsd: em1 at pci1 dev 11 function 0 "Intel 82541EI" rev 0x00: apic 2 int 23, address 00:30:48:72:28:59
> Dec 20 16:53:18 server /bsd: em0: watchdog timeout -- resetting
> Dec 21 01:54:12 server /bsd: em0: watchdog timeout -- resetting
>
> G

I'm also seeing this with a Dec 19 snapshot on i386. This is with

em0 at pci1 dev 0 function 0 "Intel 82583V" rev 0x00: msi, address 00:03:2d:20:cf:84
em1 at pci2 dev 0 function 0 "Intel 82583V" rev 0x00: msi, address 00:03:2d:20:cf:85
em2 at pci3 dev 0 function 0 "Intel 82583V" rev 0x00: msi, address 00:03:2d:20:cf:86
em3 at pci4 dev 0 function 0 "Intel 82583V" rev 0x00: msi, address 00:03:2d:20:cf:87

The timeouts seem to be much less frequent, though, and it looks like running iperf doesn't trigger them anymore. When running iperf, top shows "system" nicely distributed over cores #1 to #3, interrupts on core #0, and throughput at around 500Mbit/sec.

A dmesg is attached after my signature.
-- Gregor OpenBSD 5.8-current (GENERIC.MP) #1499: Sat Dec 19 08:24:55 MST 2015 dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP cpu0: Intel(R) Atom(TM) CPU D2550 @ 1.86GHz ("GenuineIntel" 686-class) 1.87 GHz cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,xTPR,PDCM,MOVBE,LAHF,PERF,ITSC,SENSOR,ARAT real mem = 2135064576 (2036MB) avail mem = 2081611776 (1985MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: date 10/11/11, SMBIOS rev. 2.7 @ 0xe9380 (50 entries) bios0: vendor American Megatrends Inc. version "4.6.5" date 06/21/2012 bios0: INTEL Corporation Tiger Hill acpi0 at bios0: rev 2 acpi0: sleep states S0 S1 S3 S4 S5 acpi0: tables DSDT FACP APIC MCFG HPET SSDT SSDT SSDT IFEU acpi0: wakeup devices P0P8(S4) PS2K(S3) PS2M(S3) USB0(S3) USB1(S3) USB2(S3) USB3(S3) USB7(S3) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) [...] 
acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) mtrr: Pentium Pro MTRR support, 7 var ranges, 88 fixed ranges cpu0: apic clock running at 133MHz cpu0: mwait min=64, max=64, C-substates=0.1, IBE cpu1 at mainbus0: apid 1 (application processor) cpu1: Intel(R) Atom(TM) CPU D2550 @ 1.86GHz ("GenuineIntel" 686-class) 1.87 GHz cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,xTPR,PDCM,MOVBE,LAHF,PERF,ITSC,SENSOR,ARAT cpu2 at mainbus0: apid 2 (application processor) cpu2: Intel(R) Atom(TM) CPU D2550 @ 1.86GHz ("GenuineIntel" 686-class) 1.87 GHz cpu2: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,xTPR,PDCM,MOVBE,LAHF,PERF,ITSC,SENSOR,ARAT cpu3 at mainbus0: apid 3 (application processor) cpu3: Intel(R) Atom(TM) CPU D2550 @ 1.86GHz ("GenuineIntel" 686-class) 1.87 GHz cpu3: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,DTES64,MWAIT,DS-CPL,TM2,SSSE3,xTPR,PDCM,MOVBE,LAHF,PERF,ITSC,SENSOR,ARAT ioapic0 at mainbus0: apid 4 pa 0xfec0, version 20, 24 pins acpimcfg0 at acpi0 addr 0xe000, bus 0-255 acpihpet0 at acpi0: 14318179 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus 5 (P0P8) acpiprt2 at acpi0: bus 1 (RP01) acpiprt3 at acpi0: bus 2 (RP02) acpiprt4 at acpi0: bus 3 (RP03) acpiprt5 at acpi0: bus 4 (RP04) acpiec0 at acpi0: not present acpicpu0 at acpi0: C1(@1 halt!) acpicpu1 at acpi0: C1(@1 halt!) acpicpu2 at acpi0: C1(@1 halt!) acpicpu3 at acpi0: C1(@1 halt!) 
acpitz0 at acpi0: critical temperature is 140 degC acpipwrres0 at acpi0: FN00, resource for FAN0 acpitz1 at acpi0: critical temperature is 100 degC acpibat0 at acpi0: BAT0 not present acpibat1 at acpi0: BAT1 not present acpibtn0 at acpi0: PWRB acpiac0 at acpi0: AC unit offline acpibtn1 at acpi0: SLPB acpibtn2 at acpi0: LID0 acpivideo0 at acpi0: GFX0 acpivout0 at acpivideo0: DD02 bios0: ROM list: 0xc/0xf400! 0xcf800/0x1000 0xd0800/0x1000 0xd1800/0x1000 0xd2800/0x1000 pci0 at mainbus0 bus 0: configuration mode 1 (bios) pchb0 at pci0 dev 0 function 0 vendor
Re: serious watchdog timeout issues with em driver
Hi,

Problem is still here with Dec 16 snapshot.

Dec 17 13:08:20 server /bsd: OpenBSD 5.8-current (GENERIC.MP) #1494: Wed Dec 16 12:13:03 MST 2015
Dec 17 13:08:20 server /bsd: dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP
Dec 17 13:08:20 server /bsd: cpu0: Intel(R) Pentium(R) 4 CPU 3.00GHz ("GenuineIntel" 686-class) 3 GHz
Dec 17 13:08:20 server /bsd: em0 at pci1 dev 10 function 0 "Intel 82541EI" rev 0x00: apic 2 int 22, address 00:30:48:72:28:58
Dec 17 13:08:20 server /bsd: em1 at pci1 dev 11 function 0 "Intel 82541EI" rev 0x00: apic 2 int 23, address 00:30:48:72:28:59
Dec 20 16:53:18 server /bsd: em0: watchdog timeout -- resetting
Dec 21 01:54:12 server /bsd: em0: watchdog timeout -- resetting

G
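For anyone comparing snapshots, it may help to tally the resets per interface instead of reading them out of the log by eye. The following is only a rough sketch: the function name and the sample list are made up for illustration, the pattern is modelled on the "/bsd: emN: watchdog timeout -- resetting" lines quoted in this thread, and it does not account for syslogd's "last message repeated N times" lines.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   Dec 20 16:53:18 server /bsd: em0: watchdog timeout -- resetting
WATCHDOG_RE = re.compile(r"/bsd: (em\d+): watchdog timeout -- resetting")


def count_watchdog_resets(lines):
    """Return a Counter mapping interface name to number of watchdog resets."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input copied from the message above; in practice the lines
# would come from a log file such as /var/log/messages.
SAMPLE = [
    'Dec 17 13:08:20 server /bsd: em1 at pci1 dev 11 function 0 "Intel 82541EI" '
    "rev 0x00: apic 2 int 23, address 00:30:48:72:28:59",
    "Dec 20 16:53:18 server /bsd: em0: watchdog timeout -- resetting",
    "Dec 21 01:54:12 server /bsd: em0: watchdog timeout -- resetting",
]

if __name__ == "__main__":
    for ifname, resets in sorted(count_watchdog_resets(SAMPLE).items()):
        print(f"{ifname}: {resets}")
```

Running the same count against logs from two different snapshots gives a quick, if crude, way to say whether the timeouts became more or less frequent.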
Re: serious watchdog timeout issues with em driver
On 09/12/15 10:42, Kapetanakis Giannis wrote:
> On 08/12/15 21:47, Kapetanakis Giannis wrote:
> > The event happened only once and its network recovered after a few
> > seconds. No reboot.
> >
> > G
>
> Well that didn't last long.
>
> Today I found the server hung at ddb after a new watchdog timeout on em0.
> Keyboard was not working so I could not get all the info.
>
> I wrote on paper:
> uvm_fault(0xd0ba3660, 0xefffe000, 0, 1) -> d
> kernel: page fault trap, code=0
> Stopped at bpf_m_xhalt+0x6f: movzwl 0(%esi),%eax
>
> G

Hi,

Has something changed between the Dec 6 snapshot and Dec 9 -current that fixed this? As far as I can see, the driver has not been updated.

I compiled a new -current kernel on Dec 9 and the system has NOT had any problem since then. No watchdog timeout and no crash.

Problem solved? Any link to the commit that solved it?

thanks

Giannis
Re: serious watchdog timeout issues with em driver
On 08/12/15 21:47, Kapetanakis Giannis wrote:
> The event happened only once and its network recovered after a few
> seconds. No reboot.
>
> G

Well that didn't last long.

Today I found the server hung at ddb after a new watchdog timeout on em0. Keyboard was not working so I could not get all the info.

I wrote on paper:
uvm_fault(0xd0ba3660, 0xefffe000, 0, 1) -> d
kernel: page fault trap, code=0
Stopped at bpf_m_xhalt+0x6f: movzwl 0(%esi),%eax

G
Re: serious watchdog timeout issues with em driver
On 20/11/15 15:12, Martin Pieuchot wrote: I just committed a revert to 1.305 keeping the API changes needed for the driver to build. This should bring your stability back, please let us know if that's not the case. I'm sorry for your troubles. Hi, I've upgraded yesterday to Dec 6 snapshot and today I got my first em0: watchdog timeout -- resetting regards, G OpenBSD 5.8-current (GENERIC.MP) #1468: Sun Dec 6 11:27:59 MST 2015 dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP cpu0: Intel(R) Pentium(R) 4 CPU 3.00GHz ("GenuineIntel" 686-class) 3 GHz cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,CNXT-ID,xTPR,PERF real mem = 2146910208 (2047MB) avail mem = 2093232128 (1996MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: date 04/13/04, BIOS32 rev. 0 @ 0xfb7f0, SMBIOS rev. 2.3 @ 0xf0800 (42 entries) bios0: vendor Phoenix Technologies, LTD version "6.00 PG" date 04/13/2004 bios0: Supermicro P4SCE acpi0 at bios0: rev 0 acpi0: sleep states S0 S1 S4 S5 acpi0: tables DSDT FACP APIC acpi0: wakeup devices HUB0(S5) UAR1(S5) UAR2(S5) USB0(S1) USB1(S1) USB2(S1) USB3(S1) USBE(S1) MODM(S5) PCI0(S5) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 199MHz cpu1 at mainbus0: apid 1 (application processor) cpu1: Intel(R) Pentium(R) 4 CPU 3.00GHz ("GenuineIntel" 686-class) 3 GHz cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,CNXT-ID,xTPR,PERF ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins ioapic0: misconfigured as apic 0, remapped to apid 2 acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus 1 (HUB0) acpicpu0 at acpi0: C1(@1 halt!) acpicpu1 at acpi0: C1(@1 halt!) 
acpitz0 at acpi0: critical temperature is 100 degC acpibtn0 at acpi0: PWRB bios0: ROM list: 0xc/0x8000 0xc8000/0x8000! pci0 at mainbus0 bus 0: configuration mode 1 (bios) pchb0 at pci0 dev 0 function 0 "Intel 82875P Host" rev 0x02 uhci0 at pci0 dev 29 function 0 "Intel 82801EB/ER USB" rev 0x02: apic 2 int 16 uhci1 at pci0 dev 29 function 1 "Intel 82801EB/ER USB" rev 0x02: apic 2 int 19 uhci2 at pci0 dev 29 function 2 "Intel 82801EB/ER USB" rev 0x02: apic 2 int 18 uhci3 at pci0 dev 29 function 3 "Intel 82801EB/ER USB" rev 0x02: apic 2 int 16 ehci0 at pci0 dev 29 function 7 "Intel 82801EB/ER USB2" rev 0x02: apic 2 int 23 usb0 at ehci0: USB revision 2.0 uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1 ppb0 at pci0 dev 30 function 0 "Intel 82801BA Hub-to-PCI" rev 0xc2 pci1 at ppb0 bus 1 vga1 at pci1 dev 9 function 0 "ATI Rage XL" rev 0x27 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation) wsdisplay0: screen 1-5 added (80x25, vt100 emulation) em0 at pci1 dev 10 function 0 "Intel 82541EI" rev 0x00: apic 2 int 22, address 00:30:48:72:28:58 em1 at pci1 dev 11 function 0 "Intel 82541EI" rev 0x00: apic 2 int 23, address 00:30:48:72:28:59 ichpcib0 at pci0 dev 31 function 0 "Intel 82801EB/ER LPC" rev 0x02 pciide0 at pci0 dev 31 function 1 "Intel 82801EB/ER IDE" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility pciide0: channel 0 disabled (no drives) pciide0: channel 1 disabled (no drives) pciide1 at pci0 dev 31 function 2 "Intel 82801EB SATA" rev 0x02: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI pciide1: using apic 2 int 18 for native-PCI interrupt wd0 at pciide1 channel 0 drive 0: wd0: 16-sector PIO, LBA48, 78533MB, 160836480 sectors wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 6 ichiic0 at pci0 dev 31 function 3 "Intel 82801EB/ER SMBus" rev 0x02: apic 2 int 17 iic0 at ichiic0 spdmem0 at iic0 addr 0x50: 512MB DDR SDRAM non-parity PC3200CL3.0 spdmem1 at iic0 addr 0x51: 512MB DDR 
SDRAM non-parity PC3200CL3.0 spdmem2 at iic0 addr 0x52: 512MB DDR SDRAM non-parity PC3200CL3.0 spdmem3 at iic0 addr 0x53: 512MB DDR SDRAM non-parity PC3200CL3.0 usb1 at uhci0: USB revision 1.0 uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1 usb2 at uhci1: USB revision 1.0 uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1 usb3 at uhci2: USB revision 1.0 uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1 usb4 at uhci3: USB revision 1.0 uhub4 at usb4 "Intel UHCI root hub" rev 1.00/1.00 addr 1 isa0 at ichpcib0 isadma0 at isa0 fdc0 at isa0 port 0x3f0/6 irq 6 drq 2 com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo pckbc0 at isa0 port 0x60/5 irq 1 irq 12 pckbd0 at pckbc0 (kbd slot) wskbd0 at pckbd0: console keyboard, using wsdisplay0 pcppi0 at isa0 port 0x61 spkr0 at pcppi0 wbsio0 at isa0 port 0x2e/2: W83627HF rev 0x17 lm1 at wbsio0 port 0x290/8: W83627HF npx0 at isa0 port 0xf0/16: reported by CPUID; using
Re: serious watchdog timeout issues with em driver
Kapetanakis Giannis [bil...@edu.physics.uoc.gr] wrote:
> On 20/11/15 15:12, Martin Pieuchot wrote:
> > I just committed a revert to 1.305 keeping the API changes needed for
> > the driver to build.
> >
> > This should bring your stability back, please let us know if that's not
> > the case.
> >
> > I'm sorry for your troubles.
>
> Hi,
>
> I've upgraded yesterday to Dec 6 snapshot and today I got my first
> em0: watchdog timeout -- resetting
>
> em0 at pci1 dev 10 function 0 "Intel 82541EI" rev 0x00: apic 2 int 22, address 00:30:48:72:28:58
> em1 at pci1 dev 11 function 0 "Intel 82541EI" rev 0x00: apic 2 int 23, address 00:30:48:72:28:59

Can you try to pinpoint when it started?
Re: serious watchdog timeout issues with em driver
On 08/12/15 19:39, Chris Cappuccio wrote:
> Kapetanakis Giannis [bil...@edu.physics.uoc.gr] wrote:
> > On 20/11/15 15:12, Martin Pieuchot wrote:
> > > I just committed a revert to 1.305 keeping the API changes needed for
> > > the driver to build.
> > >
> > > This should bring your stability back, please let us know if that's not
> > > the case.
> > >
> > > I'm sorry for your troubles.
> >
> > Hi,
> >
> > I've upgraded yesterday to Dec 6 snapshot and today I got my first
> > em0: watchdog timeout -- resetting
> >
> > em0 at pci1 dev 10 function 0 "Intel 82541EI" rev 0x00: apic 2 int 22, address 00:30:48:72:28:58
> > em1 at pci1 dev 11 function 0 "Intel 82541EI" rev 0x00: apic 2 int 23, address 00:30:48:72:28:59
>
> Can you try to pinpoint when it started?

You mean what type of traffic caused it? Don't know. The server is a fairly busy internal-only recursive DNS server (bind). Other than that I was playing in its shell when the event occurred, nothing special.

If you mean time since boot, it was after ~22 hours:

Dec 7 15:53:20 /bsd: OpenBSD 5.8-current (GENERIC.MP) #1468: Sun Dec 6 11:27:59 MST 2015
Dec 8 16:06:59 /bsd: em0: watchdog timeout -- resetting
Dec 8 16:07:00 named[10537]: client: warning: client xx.xx.xx.xx#30399 (mail.expressionclones.com): error sending response: host unreachable
Dec 8 16:07:00 named[10537]: client: warning: client yy.yy.yy.yy#52263 (85.151.91.139.sa-accredit.habeas.com): error sending response: host unreachable

The event happened only once and its network recovered after a few seconds. No reboot.

G
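As a cross-check on the "time since boot" figure, the two /bsd timestamps quoted in this message can simply be subtracted. This is a small sketch with some assumptions: the helper name is made up, syslog timestamps carry no year so 2015 is supplied by hand, both stamps are assumed to fall in the same year, and the kernel banner line is taken as a stand-in for boot time. Taken at face value the gap comes out at a little over 24 hours, so the ~22 hours above should be read as the poster's own rough estimate.

```python
from datetime import datetime, timedelta


def elapsed(start, end, year=2015):
    """Return the timedelta between two syslog-style stamps ("Dec 7 15:53:20").

    Syslog omits the year, so it is passed in explicitly; both stamps
    are assumed to belong to the same year.
    """
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"{year} {start}", fmt)
    t1 = datetime.strptime(f"{year} {end}", fmt)
    return t1 - t0


if __name__ == "__main__":
    # Kernel banner line (used as an approximation of boot) and first timeout.
    delta = elapsed("Dec 7 15:53:20", "Dec 8 16:06:59")
    hours = delta / timedelta(hours=1)
    print(f"{delta} ({hours:.1f} hours)")
```

The same helper works for any pair of lines in the logs posted to this thread, for example to measure the spacing between successive watchdog resets.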
Re: serious watchdog timeout issues with em driver
On 30.11.2015 14:08, Atanas Vladimirov wrote:
> Hi,
> I'm not sure if this is related to recent em(4) changes, but after
> upgrade from:

Hi,

Just ignore my previous assumptions. I think I found the real cause of this upload speed problem. I'm using ifstated to inform me when something goes wrong with my egress interface.

snip from ifstated.conf
...
state extif_online {
	init {
		run "echo External interface ON-line @ `date +%H:%M:%S` | mail -s 'External Interface ON-line' t...@example.com"
		run "/usr/sbin/arp -Ff /etc/ether.mac"
	}
	if $em2_up && ! $peer_up {
		set-state extif_up
	}
	if $em2_down {
		set-state extif_down
	}
}
...
/snip

[ns]~$ cat /etc/ether.mac
95.YY.XXX.225 64:87:88:58:b2:41 permanent
^^^ this is the ip of my default gateway

If I have a permanent arp entry for my gateway, then I observe 1-2mbps upload speed. After I clear the arp I get 30-40mbps as it should be. Meanwhile I updated to a more recent snapshot, #1696: Wed Dec 2 10:13:03 MST 2015, and the problem persists.

If you need more info just ask.

Best regards,
Atanas

dmesg:
OpenBSD 5.8-current (GENERIC.MP) #1696: Wed Dec 2 10:13:03 MST 2015
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4269342720 (4071MB)
avail mem = 4135833600 (3944MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0x9f000 (70 entries)
bios0: vendor American Megatrends Inc. version "1.2a" date 06/27/2012
bios0: Supermicro X8SIL
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET GSCI SSDT EINJ BERT ERST HEST
acpi0: wakeup devices P0P1(S4) P0P3(S4) P0P4(S4) P0P5(S4) P0P6(S4) BR1E(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4) USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) GBE_(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz, 2400.38 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 133MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz, 2400.00 MHz cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 4 (application processor) cpu2: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz, 2400.00 MHz cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 0, core 2, package 0 cpu3 at mainbus0: apid 6 (application processor) cpu3: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz, 2400.00 MHz cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR cpu3: 256KB 64b/line 8-way L2 cache cpu3: smt 0, core 3, package 0 ioapic0 at mainbus0: apid 7 pa 0xfec0, version 20, 24 pins ioapic0: misconfigured as apic 1, remapped to apid 7 acpimcfg0 at 
acpi0 addr 0xe000, bus 0-255 acpihpet0 at acpi0: 14318179 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus -1 (P0P1) acpiprt2 at acpi0: bus 1 (P0P3) acpiprt3 at acpi0: bus 2 (P0P5) acpiprt4 at acpi0: bus -1 (P0P6) acpiprt5 at acpi0: bus 6 (BR1E) acpiprt6 at acpi0: bus 3 (BR20) acpiprt7 at acpi0: bus 4 (BR24) acpiprt8 at acpi0: bus 5 (BR25) acpicpu0 at acpi0: !C3(350@17 mwait.1@0x20), !C2(500@17 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpicpu1 at acpi0: !C3(350@17 mwait.1@0x20), !C2(500@17 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpicpu2 at acpi0: !C3(350@17 mwait.1@0x20), !C2(500@17 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpicpu3 at acpi0: !C3(350@17 mwait.1@0x20), !C2(500@17 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpibtn0 at acpi0: SLPB acpibtn1 at acpi0: PWRB ipmi at mainbus0 not configured cpu0: Enhanced SpeedStep 2400 MHz: speeds: 2401, 2400, 2267, 2133, 2000, 1867, 1733, 1600, 1467, 1333, 1200 MHz pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel Core DMI" rev 0x11 ppb0 at pci0 dev 3 function 0 "Intel Core PCIE" rev 0x11: msi pci1 at ppb0 bus 1 ppb1 at pci0 dev 5 function 0 "Intel Core
Re: serious watchdog timeout issues with em driver
On 02.12.2015 22:25, Atanas Vladimirov wrote:
> On 30.11.2015 14:08, Atanas Vladimirov wrote:
> > Hi,
> > I'm not sure if this is related to recent em(4) changes, but after
> > upgrade from:
>
> Hi,
>
> Just ignore my previous assumptions.

Hi,

Sorry for the noise! Please ignore all of my previous emails. It seems that my ISP changed a NIC port on the router that serves as my default gateway, so I was using the wrong MAC address. I'm really sorry.

Best wishes,
Atanas
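The failure mode described here, a permanent ARP entry that no longer matches the gateway's real MAC address, is easy to check for before blaming the driver. The sketch below is only illustrative: both function names are made up, the file format is assumed to be the "IP MAC permanent" layout shown earlier in the thread, and the observed addresses are passed in as a plain dictionary (how they are collected, for example from a dynamic ARP entry taken with the static one removed, is left to the reader). The IP and the observed MAC in the sample are placeholder documentation values, not real ones.

```python
def parse_static_arp(text):
    """Parse lines of the form '<ip> <mac> [flags...]' into {ip: mac}."""
    entries = {}
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) >= 2:
            entries[fields[0]] = fields[1].lower()
    return entries


def find_mismatches(pinned, observed):
    """Return {ip: (pinned_mac, observed_mac)} for entries that disagree."""
    mismatches = {}
    for ip, pinned_mac in pinned.items():
        seen = observed.get(ip)
        if seen is not None and seen.lower() != pinned_mac:
            mismatches[ip] = (pinned_mac, seen.lower())
    return mismatches


# Hypothetical data: 192.0.2.1 stands in for the redacted gateway address,
# and the observed MAC is a placeholder from the documentation range.
PINNED_FILE = "192.0.2.1 64:87:88:58:b2:41 permanent\n"
OBSERVED = {"192.0.2.1": "00:00:5E:00:53:01"}

if __name__ == "__main__":
    pinned = parse_static_arp(PINNED_FILE)
    for ip, (want, got) in find_mismatches(pinned, OBSERVED).items():
        print(f"{ip}: pinned {want} but gateway answers as {got}")
```

A check like this could run before the static entries are loaded, so that a stale MAC is reported instead of silently degrading traffic to the gateway.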
Re: serious watchdog timeout issues with em driver
On 20.11.2015 21:10, Sonic wrote:
> On Fri, Nov 20, 2015 at 12:37 PM, Mark Kettenis wrote:
> > Thanks Martin.
>
> All is fine now. System booted with no errors and no watchdog timeouts.
>
> Thanks to all.
>
> Chris

Hi,
I'm not sure if this is related to recent em(4) changes, but after upgrade from:

-OpenBSD 5.8-current (GENERIC.MP) #1597: Thu Nov 12 07:33:59 MST 2015
+OpenBSD 5.8-current (GENERIC.MP) #1671: Thu Nov 26 20:36:24 MST 2015

my upload throughput can't reach more than 2mbps. I also tried the Nov 27 22:50:35 snapshot with the same result.

-OpenBSD 5.8-current (GENERIC.MP) #1671: Thu Nov 26 20:36:24 MST 2015
+OpenBSD 5.8-current (GENERIC.MP) #1675: Fri Nov 27 22:50:35 MST 2015

I tried to build older em(4) revisions, but all except if_em.c -r 1.312 failed to build. A kernel with if_em.c -r 1.312, if_em.h -r 1.60 and if_em_hw.c -r 1.88 boots normally but has the same upload throughput. Does anyone else observe this behavior? If you need more info just ask. Thanks for your time.

P.S.: When I plug the cable from my ISP into a Windows 7 laptop I get 35-40 mbps.

full dmesg:
OpenBSD 5.8-current (GENERIC.MP) #1675: Fri Nov 27 22:50:35 MST 2015
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4269342720 (4071MB)
avail mem = 4135833600 (3944MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.6 @ 0x9f000 (70 entries)
bios0: vendor American Megatrends Inc. version "1.2a" date 06/27/2012
bios0: Supermicro X8SIL
acpi0 at bios0: rev 2
acpi0: sleep states S0 S1 S4 S5
acpi0: tables DSDT FACP APIC MCFG OEMB HPET GSCI SSDT EINJ BERT ERST HEST
acpi0: wakeup devices P0P1(S4) P0P3(S4) P0P4(S4) P0P5(S4) P0P6(S4) BR1E(S4) PS2K(S4) PS2M(S4) USB0(S4) USB1(S4) USB2(S4) USB3(S4) USB4(S4) USB5(S4) USB6(S4) GBE_(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz, 2400.32 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 133MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE cpu1 at mainbus0: apid 2 (application processor) cpu1: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz, 2400.00 MHz cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 4 (application processor) cpu2: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz, 2400.00 MHz cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 0, core 2, package 0 cpu3 at mainbus0: apid 6 (application processor) cpu3: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz, 2400.01 MHz cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,NXE,LONG,LAHF,PERF,ITSC,SENSOR cpu3: 256KB 64b/line 8-way L2 cache cpu3: smt 0, core 3, package 0 ioapic0 at mainbus0: apid 7 pa 0xfec0, version 20, 24 pins ioapic0: misconfigured as apic 1, remapped to apid 7 acpimcfg0 at 
acpi0 addr 0xe000, bus 0-255 acpihpet0 at acpi0: 14318179 Hz acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus -1 (P0P1) acpiprt2 at acpi0: bus 1 (P0P3) acpiprt3 at acpi0: bus 2 (P0P5) acpiprt4 at acpi0: bus -1 (P0P6) acpiprt5 at acpi0: bus 6 (BR1E) acpiprt6 at acpi0: bus 3 (BR20) acpiprt7 at acpi0: bus 4 (BR24) acpiprt8 at acpi0: bus 5 (BR25) acpicpu0 at acpi0: !C3(350@17 mwait.1@0x20), !C2(500@17 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpicpu1 at acpi0: !C3(350@17 mwait.1@0x20), !C2(500@17 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpicpu2 at acpi0: !C3(350@17 mwait.1@0x20), !C2(500@17 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpicpu3 at acpi0: !C3(350@17 mwait.1@0x20), !C2(500@17 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpibtn0 at acpi0: SLPB acpibtn1 at acpi0: PWRB ipmi at mainbus0 not configured cpu0: Enhanced SpeedStep 2400 MHz: speeds: 2401, 2400, 2267, 2133, 2000, 1867, 1733, 1600, 1467, 1333, 1200 MHz pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel Core DMI" rev 0x11 ppb0 at pci0 dev 3 function 0 "Intel Core PCIE" rev 0x11: msi pci1 at ppb0 bus 1 ppb1 at pci0 dev 5 function 0 "Intel Core PCIE" rev 0x11: msi
Re: serious watchdog timeout issues with em driver
On 19/11/15(Thu) 17:54, Sonic wrote:
> Have serious problems for over 7 weeks now with em driver,
> specifically any rev of if_em.c > 1.305. Starting with rev 1.306,
> released on 2015/09/30 and continuing to -current, watchdog timeouts
> rue the day. Unfortunately rev 1.305 no longer builds with -current as
> it appears the patch in rev 1.309 would be necessary.

I just committed a revert to 1.305 keeping the API changes needed for the driver to build.

This should bring your stability back, please let us know if that's not the case.

I'm sorry for your troubles.
Re: serious watchdog timeout issues with em driver
On Fri, Nov 20, 2015 at 8:12 AM, Martin Pieuchot wrote:
> I just committed a revert to 1.305 keeping the API changes needed for
> the driver to build.
>
> This should bring your stability back, please let us know if that's not
> the case.

The kernel/driver builds with those changes but crashes on startup (I hadn't rebuilt userland yet). I couldn't see much when it happened; I believe it was just after starting the network. There was some Xsoft... error and then what I would describe as a core dump to the console. No footprints seem to be left after a power reset and booting into obsd.

Thanks,

Chris
Re: serious watchdog timeout issues with em driver
> Date: Fri, 20 Nov 2015 14:12:52 +0100
> From: Martin Pieuchot
>
> On 19/11/15(Thu) 17:54, Sonic wrote:
> > Have serious problems for over 7 weeks now with em driver,
> > specifically any rev of if_em.c > 1.305. Starting with rev 1.306,
> > released on 2015/09/30 and continuing to -current, watchdog timeouts
> > rue the day. Unfortunately rev 1.305 no longer builds with -current as
> > it appears the patch in rev 1.309 would be necessary.
>
> I just committed a revert to 1.305 keeping the API changes needed for
> the driver to build.

Thanks Martin. I didn't have the time in the last few weeks to do the backout.
Re: serious watchdog timeout issues with em driver
On Fri, Nov 20, 2015 at 12:37 PM, Mark Kettenis wrote:
> Thanks Martin.

All is fine now. System booted with no errors and no watchdog timeouts.

Thanks to all.

Chris
serious watchdog timeout issues with em driver
Have serious problems for over 7 weeks now with em driver, specifically any rev of if_em.c > 1.305. Starting with rev 1.306, released on 2015/09/30 and continuing to -current, watchdog timeouts rue the day. Unfortunately rev 1.305 no longer builds with -current as it appears the patch in rev 1.309 would be necessary.

System in question is a NAT firewall, also running Unbound and DHCPD. Timeouts occur randomly and can affect both internal and external interfaces. But use of a bittorrent app on an internal client system will always trigger many such timeouts:

Nov 18 12:21:17 stargate /bsd: em0: watchdog timeout -- resetting
Nov 18 12:21:17 stargate /bsd: em1: watchdog timeout -- resetting
Nov 18 12:22:34 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:22:34 stargate unbound: [12687:1] notice: remote address is 172.27.12.11 port 55181
Nov 18 12:22:36 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:22:36 stargate unbound: [12687:1] notice: remote address is 172.27.12.253 port 54266
Nov 18 12:22:36 stargate unbound: [22477:0] notice: sendto failed: No buffer space available
Nov 18 12:22:36 stargate unbound: [22477:0] notice: remote address is 172.27.12.253 port 53257
Nov 18 12:22:37 stargate /bsd: em0: watchdog timeout -- resetting
Nov 18 12:23:42 stargate /bsd: em0: watchdog timeout -- resetting
Nov 18 12:28:11 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:11 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 56045
Nov 18 12:28:12 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:12 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 41975
Nov 18 12:28:12 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:12 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 48603
Nov 18 12:28:12 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:12 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 17834
Nov 18 12:28:13 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:13 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 1177
Nov 18 12:28:14 stargate unbound: [12687:1] notice: sendto failed: No buffer space available
Nov 18 12:28:14 stargate unbound: [12687:1] notice: remote address is 172.27.12.66 port 39013
Nov 18 12:28:15 stargate /bsd: em0: watchdog timeout -- resetting
Nov 18 12:29:42 stargate /bsd: em0: watchdog timeout -- resetting
Nov 18 14:00:01 stargate syslogd: restart
Nov 18 16:00:01 stargate syslogd: restart
Nov 19 12:00:01 stargate syslogd: restart
Nov 19 16:00:01 stargate syslogd: restart
Nov 19 16:08:36 stargate /bsd: em0: watchdog timeout -- resetting
Nov 19 16:10:34 stargate /bsd: em0: watchdog timeout -- resetting
Nov 19 16:15:04 stargate /bsd: em0: watchdog timeout -- resetting
Nov 19 16:19:55 stargate last message repeated 3 times

(one of the above is on the external interface em1)

The timeouts don't just shut down net access during the reset time; other problems occur. Many times the SSH server no longer accepts connections, so shelling into the system is not an option:

$ ssh stargate
write: Connection reset by peer

I've also had a system crash that I suspect (no proof at all and thankfully it hasn't re-occurred, but timing is everything) was caused by the faulty em driver:

Nov 1 22:23:55 stargate /bsd: uvm_fault(0x818f9920, 0xfff7818adf60, 0, 1) -> e
Nov 1 22:23:55 stargate /bsd: fatal page fault in supervisor mode
Nov 1 22:23:55 stargate /bsd: trap type 6 code 0 rip 81329e69 cs 8 rflags 10286 cr2 fff7818adf60 cpl 7 rsp 8000221df76 0
Nov 1 22:23:55 stargate /bsd: panic: trap type 6, code=0, pc=81329e69
Nov 1 22:23:55 stargate /bsd: Starting stack trace...
Nov 1 22:23:55 stargate /bsd: panic() at panic+0x10b
Nov 1 22:23:55 stargate /bsd: trap() at trap+0x7b8
Nov 1 22:23:55 stargate /bsd: --- trap (number 6) ---
Nov 1 22:23:55 stargate /bsd: trap() at trap+0x709
Nov 1 22:23:55 stargate /bsd: --- trap (number 4) ---
Nov 1 22:23:55 stargate /bsd: trap() at trap+0x709
Nov 1 22:23:55 stargate /bsd: --- trap (number 4) ---
Nov 1 22:23:55 stargate /bsd: bpf_filter() at bpf_filter+0x19b
Nov 1 22:23:55 stargate /bsd: _bpf_mtap() at _bpf_mtap+0xf4
Nov 1 22:23:55 stargate /bsd: bpf_mtap_ether() at bpf_mtap_ether+0x39
Nov 1 22:23:55 stargate /bsd: em_start() at em_start+0xd6
Nov 1 22:23:55 stargate /bsd: nettxintr() at nettxintr+0x52
Nov 1 22:23:55 stargate /bsd: softintr_dispatch() at softintr_dispatch+0x8b
Nov 1 22:23:55 stargate /bsd: Xsoftnet() at