Re: 5.5-stable network interface rl0 stops working
Hi Roland,

On (060705), Roland Smith wrote:

>> couple of weeks - the network interface rl0 (which is the main interface on the machine, rl1 is for backups/internal use only) stops

> Are they physically on the motherboard? Or on PCI cards? In the latter case try reseating the card in the slot.

Fortunately they are PCI cards, so I'll check the seating.

> Try switching rl0 and rl1, and see if the problem persists. Also, swapping out the ethernet cable is worth trying.

Switching the cards is an option we haven't tried yet, although it came to my mind earlier. The strangest problems are often hardware related, so I'll give this a try and report back. Swapping out the ethernet cable was one of the first things I tried, but to no avail. I'm not really sure that the switch isn't part of the problem (although all other ports function correctly), so I'll change the switch port too.

> Another thing to check is if rl0 is sharing an interrupt with another device. That can cause problems.

No, there is no interrupt sharing for this device, but thanks for the hint; I hadn't checked it yet.

>> When rl0 stops working ipfw logs lots of denied packets so that it seems that the dynamic (keep-state) rules don't work any longer. We

> Does the problem persist without ipfw? I've got an rl0 card on my workstation (6.1-STABLE, amd64, using PF without problems)

Unfortunately I can't check this because we use ipfw to generate traffic statistics for the jails. But once the interface has stopped working, disabling the firewall has no effect, except that no more log messages are generated.

>> After the stop on the interface occurs there is no other way to get the interface up and running again than rebooting the whole machine. Restarting /etc/rc.d/netif, the jails or ipfw doesn't help anything.

> What does ifconfig say after the interface stops working?

When the interface stops working, ifconfig seems to think everything is still OK. There is no hint in the output of ifconfig that the interface is not working, and ifconfig down/up doesn't help at all.

> Anything in the logs, except the denied packets?

No, strangely enough there is no other hint in the logs that the system is not working. At first I thought it was some kind of ipfw problem, because packets seem to arrive on the host but the responses get blocked by ipfw. The next time it happens I'll check with tcpdump whether packets really still arrive on the system. On the other hand, if ipfw (especially the dynamic rules) were part of the problem, then flushing ipfw should help, I think, but it doesn't. So maybe it's a hardware issue; I'll definitely check this and report back.

Thanks for the hints and tips!

Best regards,
Hank
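Since the failure cannot be triggered on demand, it may help to capture the state of the machine at the moment rl0 stops responding, before rebooting. What follows is only a minimal Python sketch (3.7 or later), not something taken from this thread: the names `run_command` and `collect_snapshot` and the choice of commands are assumptions. `ifconfig rl0` and ipfw come from the discussion above; `netstat -i`, `vmstat -i` and `ipfw -d show` are the usual FreeBSD commands for interface counters, interrupt counts and the rule list including dynamic rules.

```python
import subprocess
from datetime import datetime, timezone

# Assumed set of diagnostics; adjust the interface name and commands as needed.
DIAGNOSTICS = [
    ["ifconfig", "rl0"],     # interface flags and media status
    ["netstat", "-i"],       # per-interface packet and error counters
    ["vmstat", "-i"],        # interrupt counts per device
    ["ipfw", "-d", "show"],  # static rules plus dynamic (keep-state) rules
]


def run_command(argv):
    """Run one command and return its output; report failures as text."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
        return result.stdout + result.stderr
    except (OSError, subprocess.SubprocessError) as exc:
        return f"failed to run: {exc}\n"


def collect_snapshot(commands=DIAGNOSTICS, runner=run_command):
    """Return one text report containing the output of every command."""
    stamp = datetime.now(timezone.utc).isoformat()
    parts = [f"=== snapshot taken {stamp} ==="]
    for argv in commands:
        parts.append(f"--- {' '.join(argv)} ---")
        parts.append(runner(argv).rstrip())
    return "\n".join(parts) + "\n"
```

Calling `collect_snapshot()` as root from the console and appending the result to a file gives something to compare against a snapshot taken while the interface is healthy. The `runner` parameter exists only so the function can be exercised without touching the real system.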
5.5-stable network interface rl0 stops working
Hello everybody,

I have a very disturbing problem with one of our FreeBSD 5.5-stable machines. It is a box on which ~10 jail systems run, each with small to moderate network traffic. Now from time to time - sometimes after a few days, sometimes after a couple of weeks - the network interface rl0 (which is the main interface on the machine, rl1 is for backups/internal use only) stops working.

Each jailed system has its own firewall ruleset, permitting only traffic for the services in that specific jail. The packet filter used is ipfw. Some of the rules are stateful (keep-state). When rl0 stops working, ipfw logs lots of denied packets, so it seems that the dynamic (keep-state) rules don't work any longer. We checked and increased the buffers for the dynamic rules to no avail - I doubt they are part of the problem. I'm not even sure ipfw is part of the problem.

After the stop on the interface occurs there is no other way to get the interface up and running again than rebooting the whole machine. Restarting /etc/rc.d/netif, the jails or ipfw doesn't help at all.

The bad thing is that I haven't found any way to trigger this problem, so I can only check and change things and wait to see whether the situation improves. For example, I've already set debug.mpsafenet=0, but this doesn't help; on the contrary, it seems to worsen the problem a little bit.

Find attached the dmesg output of the machine. If any other information is needed to hunt down the cause of this problem please let me know. I checked various list archives but haven't found a clue yet.

-[ dmesg ]-
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 5.5-STABLE #5: Tue May 30 13:51:55 CEST 2006
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SHAWSHANK
WARNING: MPSAFE network stack disabled, expect reduced performance.
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2411.60-MHz 686-class CPU)
  Origin = GenuineIntel Id = 0xf34 Stepping = 4
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
real memory = 2147418112 (2047 MB)
avail memory = 2096037888 (1998 MB)
ACPI APIC Table: GBTAWRDACPI
ioapic0 Version 2.0 irqs 0-23 on motherboard
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: GBT AWRDACPI on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
cpu0: ACPI CPU on acpi0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0x1000-0x10bf,0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
agp0: Intel 82865 host to AGP bridge mem 0xe800-0xefff at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pcib2: ACPI PCI-PCI bridge at device 30.0 on pci0
pci2: ACPI PCI bus on pcib2
pci2: display, VGA at device 0.0 (no driver attached)
rl0: RealTek 8139 10/100BaseTX port 0x9000-0x90ff mem 0xf500-0xf5ff irq 21 at device 1.0 on pci2
miibus0: MII bus on rl0
rlphy0: RealTek internal media interface on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:02:2a:d5:39:74
rl1: RealTek 8139 10/100BaseTX port 0x9400-0x94ff mem 0xf5001000-0xf50010ff irq 22 at device 2.0 on pci2
miibus1: MII bus on rl1
rlphy1: RealTek internal media interface on miibus1
rlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl1: Ethernet address: 00:02:2a:d5:39:53
isab0: PCI-ISA bridge at device 31.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel ICH5 UDMA100 controller port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0: serial bus, SMBus at device 31.3 (no driver attached)
acpi_tz0: Thermal Zone on acpi0
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
pmtimer0 on isa0
orm0: ISA Option ROM at iomem 0xc-0xc7fff on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x100
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
atkbdc0: Keyboard controller (i8042) at port 0x64,0x60 on isa0
atkbd0: AT Keyboard irq 1 on atkbdc0
kbd0 at atkbd0
ppc0: parallel port not found.
Timecounter TSC frequency 2411601876 Hz quality 800
Timecounters tick every 10.000 msec
ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to deny, logging disabled
ad0: 114497MB STARDOM SohoRaid Mirror Rev:B2.7/Rev 2.7 [232629/16/63] at ata0-master UDMA100
acd0: DVDROM TOSHIBA DVD-ROM SD-M1912/TM01 at ata1-master PIO4
Mounting root from ufs:/dev/ad0s1a
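The question of interrupt sharing raised in this thread can be checked roughly against the dmesg output above. The following Python sketch groups the devices that report an `irq` in their probe line; the names `DMESG_EXCERPT`, `devices_by_irq` and `shared_irqs` are made up for illustration, and the excerpt contains only the five probe lines from the dmesg that mention an IRQ. Note the limitation: devices without an attached driver (the VGA and SMBus devices here) do not report an IRQ in dmesg, so `vmstat -i` on the running system remains the more reliable check.

```python
import re
from collections import defaultdict

# The lines of the dmesg output above that mention an IRQ, copied as-is.
DMESG_EXCERPT = """\
rl0: RealTek 8139 10/100BaseTX port 0x9000-0x90ff mem 0xf500-0xf5ff irq 21 at device 1.0 on pci2
rl1: RealTek 8139 10/100BaseTX port 0x9400-0x94ff mem 0xf5001000-0xf50010ff irq 22 at device 2.0 on pci2
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3 on acpi0
atkbd0: AT Keyboard irq 1 on atkbdc0
"""

# "<device><unit>: ... irq <number> ..." as printed at probe time.
IRQ_LINE = re.compile(r"^(?P<dev>[a-z_]+\d+): .*\birq (?P<irq>\d+)\b")


def devices_by_irq(dmesg_text):
    """Map each IRQ number to the list of devices that report it."""
    table = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = IRQ_LINE.match(line)
        if match:
            table[int(match.group("irq"))].append(match.group("dev"))
    return dict(table)


def shared_irqs(dmesg_text):
    """Return only the IRQs that more than one device reports."""
    return {
        irq: devs
        for irq, devs in devices_by_irq(dmesg_text).items()
        if len(devs) > 1
    }


print(shared_irqs(DMESG_EXCERPT))
```

For this excerpt the result is empty: rl0 is alone on irq 21 and rl1 alone on irq 22, which agrees with the report in the thread that rl0 does not share an interrupt.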
Re: 5.5-stable network interface rl0 stops working
On Wed, Jul 05, 2006 at 06:40:58PM +0200, Hank Hampel wrote:

> Hello everybody,
>
> I have a very disturbing problem with one of our FreeBSD 5.5-stable machines. It is a box on which ~10 jail systems run, each with small to moderate network traffic. Now from time to time - sometimes after a few days, sometimes after a couple of weeks - the network interface rl0 (which is the main interface on the machine, rl1 is for backups/internal use only) stops working.

Are they physically on the motherboard? Or on PCI cards? In the latter case try reseating the card in the slot.

Try switching rl0 and rl1, and see if the problem persists. Also, swapping out the ethernet cable is worth trying.

Another thing to check is if rl0 is sharing an interrupt with another device. That can cause problems.

> Each jailed system has its own firewall ruleset, permitting only traffic for the services in that specific jail. The packet filter used is ipfw. Some of the rules are stateful (keep-state). When rl0 stops working ipfw logs lots of denied packets so that it seems that the dynamic (keep-state) rules don't work any longer. We checked and increased the buffers for the dynamic rules to no avail - I doubt they are part of the problem. I'm not even sure ipfw is part of the problem.

Does the problem persist without ipfw? I've got an rl0 card on my workstation (6.1-STABLE, amd64, using PF without problems).

> After the stop on the interface occurs there is no other way to get the interface up and running again than rebooting the whole machine. Restarting /etc/rc.d/netif, the jails or ipfw doesn't help anything.

What does ifconfig say after the interface stops working?

> The bad thing is I haven't found any way to trigger this problem so that I can only check and change things and wait if the situation improves or not. For example I've already set debug.mpsafenet=0 but this doesn't help, in contrast it seems to worsen the problem a little bit.
>
> Find attached the dmesg output of the machine. If any other information is needed to hunt down the cause of this problem please let me know. I checked various list archives but haven't found a clue yet.

Anything in the logs, except the denied packets?

Roland
--
R.F.Smith http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)