Re: Strange em(4) issues
Chris Cappuccio wrote:
> i've got a pair of h8ssl-i boards that work fine at 133mhz. i have
> another set that i run at 66mhz, but only because that's the max the
> raid controller supports (some kind of LSI card. i like the areca
> better though)
>
> bge shows up as:
>
> bge0 at pci2 dev 3 function 0 "Broadcom BCM5704C" rev 0x10, BCM5704 B0 (0x2100): irq 5, address 00:30:48:56:68:d4
> brgphy0 at bge0 phy 1: BCM5704 10/100/1000baseT PHY, rev. 0
> bge1 at pci2 dev 3 function 1 "Broadcom BCM5704C" rev 0x10, BCM5704 B0 (0x2100): irq 9, address 00:30:48:56:68:d5
> brgphy1 at bge1 phy 1: BCM5704 10/100/1000baseT PHY, rev. 0

In fact, the H8-SSL-I2 docs say the jumper is for the PCI-X slot, not
for the PCI-X bus, so I guess the onboard BCM5704C is unaffected by its
setting. Either way, it surely IS working fine, except for the input
errors Stuart pointed out he had, which I could confirm. I've not seen
any problems with traffic flowing through them, though Stuart has.

Also, nobody claims PCI-X is unworkable on a 133 MHz bus; what it looks
like is a compatibility issue between recent Intel em(4)s and the
ServerWorks HT-1000 (or this Supermicro board). In my opinion, it's too
bad that hardware from exactly these two brands, both big names in the
server market, is unable to play together nicely at 133 MHz. It's a
shame!

Regards,
Doichin

> Stuart Henderson [EMAIL PROTECTED] wrote:
> > On 2007/11/30 09:57, Girish Venkatachalam wrote:
> > > On 20:47:57 Nov 29, Stuart Henderson wrote:
> > >
> > > > Been there, done that. If you use plaintext protocols (ftp or so)
> > > > over the interface, you'll see random corruption visible in the
> > > > data (e.g. directory listings).
> > > >
> > > > At 133MHz there's some corruption between motherboard and card.
> > > > Disappears at 66MHz.
> > > >
> > > > Normally this would be masked by TCP checksums (you'd get packet
> > > > loss, but it would mostly be corrected rather than pass corrupt
> > > > packets up the stack), but the em(4) does offload TCP checksum
> > > > processing to the card, so the checksum no longer covers the
> > > > transfer over the PCI bus, hence the weird protocol errors.
> > >
> > > TCP checksums or for that matter any checksum cannot catch *all* errors.
> >
> > Agreed, hence the "mostly".
> >
> > > Since there is a MAC computation for every packet, this will easily help
> > > you identify the problem.
> >
> > With this happening, you're lucky to get an ftp banner through without
> > corruption, I don't think I ever had an SSH session setup.
> >
> > I already have two workarounds, one is to use the old quad em(4) with
> > the IBM(Tundra) bridge (which works OK at 64x133 but the RJ45 sockets
> > are the wrong way up to latch correctly in some of Supermicro's 1U cases),
> > the other is to use the newer cards (Pericom bridge) at 66MHz.
> >
> > I haven't heard of this happening on other systems (and other 64x133 cards
> > work), I suspect it's a hardware problem between H8SSL and the Pericom
> > bridge chip.
Re: Strange em(4) issues
i've got a pair of h8ssl-i boards that work fine at 133mhz. i have
another set that i run at 66mhz, but only because that's the max the
raid controller supports (some kind of LSI card. i like the areca
better though)

bge shows up as:

bge0 at pci2 dev 3 function 0 "Broadcom BCM5704C" rev 0x10, BCM5704 B0 (0x2100): irq 5, address 00:30:48:56:68:d4
brgphy0 at bge0 phy 1: BCM5704 10/100/1000baseT PHY, rev. 0
bge1 at pci2 dev 3 function 1 "Broadcom BCM5704C" rev 0x10, BCM5704 B0 (0x2100): irq 9, address 00:30:48:56:68:d5
brgphy1 at bge1 phy 1: BCM5704 10/100/1000baseT PHY, rev. 0

Stuart Henderson [EMAIL PROTECTED] wrote:
> On 2007/11/30 09:57, Girish Venkatachalam wrote:
> > On 20:47:57 Nov 29, Stuart Henderson wrote:
> >
> > > Been there, done that. If you use plaintext protocols (ftp or so)
> > > over the interface, you'll see random corruption visible in the
> > > data (e.g. directory listings).
> > >
> > > At 133MHz there's some corruption between motherboard and card.
> > > Disappears at 66MHz.
> > >
> > > Normally this would be masked by TCP checksums (you'd get packet
> > > loss, but it would mostly be corrected rather than pass corrupt
> > > packets up the stack), but the em(4) does offload TCP checksum
> > > processing to the card, so the checksum no longer covers the
> > > transfer over the PCI bus, hence the weird protocol errors.
> >
> > TCP checksums or for that matter any checksum cannot catch *all* errors.
>
> Agreed, hence the "mostly".
>
> > Since there is a MAC computation for every packet, this will easily help
> > you identify the problem.
>
> With this happening, you're lucky to get an ftp banner through without
> corruption, I don't think I ever had an SSH session setup.
>
> I already have two workarounds, one is to use the old quad em(4) with
> the IBM(Tundra) bridge (which works OK at 64x133 but the RJ45 sockets
> are the wrong way up to latch correctly in some of Supermicro's 1U cases),
> the other is to use the newer cards (Pericom bridge) at 66MHz.
>
> I haven't heard of this happening on other systems (and other 64x133 cards
> work), I suspect it's a hardware problem between H8SSL and the Pericom
> bridge chip.

--
Those who can, do. Those who can't, sue.
Re: Strange em(4) issues
On 2007/11/30 09:57, Girish Venkatachalam wrote:
> On 20:47:57 Nov 29, Stuart Henderson wrote:
>
> > Been there, done that. If you use plaintext protocols (ftp or so)
> > over the interface, you'll see random corruption visible in the
> > data (e.g. directory listings).
> >
> > At 133MHz there's some corruption between motherboard and card.
> > Disappears at 66MHz.
> >
> > Normally this would be masked by TCP checksums (you'd get packet
> > loss, but it would mostly be corrected rather than pass corrupt
> > packets up the stack), but the em(4) does offload TCP checksum
> > processing to the card, so the checksum no longer covers the
> > transfer over the PCI bus, hence the weird protocol errors.
>
> TCP checksums or for that matter any checksum cannot catch *all* errors.

Agreed, hence the "mostly".

> Since there is a MAC computation for every packet, this will easily help
> you identify the problem.

With this happening, you're lucky to get an ftp banner through without
corruption, I don't think I ever had an SSH session setup.

I already have two workarounds, one is to use the old quad em(4) with
the IBM(Tundra) bridge (which works OK at 64x133 but the RJ45 sockets
are the wrong way up to latch correctly in some of Supermicro's 1U cases),
the other is to use the newer cards (Pericom bridge) at 66MHz.

I haven't heard of this happening on other systems (and other 64x133 cards
work), I suspect it's a hardware problem between H8SSL and the Pericom
bridge chip.
Re: Strange em(4) issues
On 20:47:57 Nov 29, Stuart Henderson wrote:
> Been there, done that. If you use plaintext protocols (ftp or so)
> over the interface, you'll see random corruption visible in the
> data (e.g. directory listings).
>
> At 133MHz there's some corruption between motherboard and card.
> Disappears at 66MHz.
>
> Normally this would be masked by TCP checksums (you'd get packet
> loss, but it would mostly be corrected rather than pass corrupt
> packets up the stack), but the em(4) does offload TCP checksum
> processing to the card, so the checksum no longer covers the
> transfer over the PCI bus, hence the weird protocol errors.

TCP checksums or for that matter any checksum cannot catch *all* errors.

The best way to consistently reproduce that is by using our own scp(1).

Since there is a MAC computation for every packet, this will easily help
you identify the problem.

If you do a recursive transfer and play with large files, it gives you
enough headroom to track down the bug(s).

Best of luck.

-Girish
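To illustrate Girish's two points (a checksum cannot catch every error, while a per-packet MAC will flag corruption), here is a minimal Python sketch. It assumes the RFC 1071 ones'-complement sum that TCP uses and stands in HMAC-SHA256 for the MAC that SSH computes over each packet; the key and the payload are made up for the example and are not taken from the thread.

```python
import hashlib
import hmac


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones'-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def packet_mac(key: bytes, data: bytes) -> bytes:
    """Keyed MAC over the packet (HMAC-SHA256 chosen for illustration)."""
    return hmac.new(key, data, hashlib.sha256).digest()


# Hypothetical payload and a copy whose first two 16-bit words are swapped.
original = b"AABBCCDD"
reordered = b"BBAACCDD"
key = b"example-session-key"  # assumption: any shared secret would do

if __name__ == "__main__":
    # Addition is commutative, so reordering whole words keeps the checksum.
    print(internet_checksum(original) == internet_checksum(reordered))
    # The MAC depends on byte order, so the corruption is detected.
    print(packet_mac(key, original) == packet_mac(key, reordered))
```

This is why an scp(1) transfer is a convenient reproducer: any damage that slips past the TCP checksum still trips the MAC check and aborts the session.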
Re: Strange em(4) issues
Stuart Henderson wrote:
> On 2007/11/29 23:25, NetOne - Doichin Dokov wrote:
>> First, thanks for the prompt reply!
>
> No problem, if I can save someone else the night I had in a cold
> datacentre working it out, some good came out of it :-)
>
>> Nopes, I'm not:
>> # netstat -in
>> Name  Mtu   Network      Address              Ipkts Ierrs    Opkts Oerrs Colls
>> {snip}
>> bge0  1500               00:30:48:57:c3:80 44867924 39723 42574046     1     0
>> bge0  1500  213.137.48.  213.137.48.1      44867924 39723 42574046     1     0
>> bge0  1500  fe80::%bge0  fe80::230:48ff:fe 44867924 39723 42574046     1     0
>> bge1  1500               00:30:48:57:c3:81 45170081 33204 42551236     1     0
>> bge1  1500  fe80::%bge1  fe80::230:48ff:fe 45170081 33204 42551236     1     0
>>
>> Despite seeing Ierrs, I do not see any performance and connectivity
>> issues. What exactly does lead to having input errors on the bge(4)s?
>> I mean, would they be usable for what I will need the two more ports
>> for.
>
> I don't know what leads to them, but it's not cable/switch, I have
> tried numerous alternatives. I was running OSPF with fairly short
> timers over those interfaces, and had a lot of instability until I
> swapped over to em/sk cards. Most protocols are able to handle
> delays/loss a lot better than OSPF though.
>
>> This machine is gonna soon have a twin to be backed up with CARP,
>> and i need the two additional interfaces on each of them for:
>> 1) One interface for cross-connecting the machines to do pfsync
>
> Beware split routing; if you only have one active set of BGP sessions
> (i.e. active/passive with 'depend on carpXX') there's no problem of
> that kind, but if you have live sessions on both boxes, you'll find
> that pfsync isn't designed to handle the case where inbound traffic
> goes one way, and outbound traffic the other, so you run into problems
> with stateful filtering (sequence number mismatch and maybe there were
> wscale problems too).
>
>>> mickey posted some diffs on tech@ relating to watchdog
>>> problems with bge and em, they might be worth a look.
>>
>> Are these what you're talking about, or there were any subsequent
>> patches I could not find:
>> http://article.gmane.org/gmane.os.openbsd.tech/14133
>> http://article.gmane.org/gmane.os.openbsd.tech/14134
>
> Yes, those ones. Alternatively it may be a problem with interrupt
> routing (the fix for that on many machines is to enable acpi to set up
> interrupts according to the AML from the BIOS - this is more likely to
> have correct information than other methods of interrupt setup on
> newer machines, this is a large part of the reason for the ACPI work
> that has been happening in -current).
>
> While you build, don't forget this patch if you will use pfsync:
> ftp://ftp.openbsd.org/pub/OpenBSD/patches/4.2/common/004_pf.patch
>
>> Again, thank you very much for the help. I highly appreciate it. $30
>> will be donated to the OpenBSD foundation, plus another copy of the 4.2
>> CD set bought (we'll need one for the new machine, no? :D).
>
> That's nice, thank you :-)

I've now switched the PCI-X slot to 64-bit / 66 MHz, and also applied
the watchdog fix patches for em(4) and bge(4) to the kernel. The pf
patch was already applied when it came out several days ago; the system
just had not been rebooted yet, as I do not use pfsync for now. Thanks
for the hint, anyway.

I'm still running with ACPI disabled; I will see how far that goes and
enable it if needed. Is there any performance penalty or boost from
using ACPI?

Thanks again.
Doichin
Re: Strange em(4) issues
Stuart Henderson wrote:
> gmane mangled them; mv the .orig files back and try these -
>
> http://marc.info/?m=119616849501476
> http://marc.info/?m=119616948702986
>
> the diffs are made against -current but probably work with stable too.

Yup, you're right! Everything compiled fine. Will load the new kernel in
several hours.

Thanks again!
Doichin

> On 2007/11/29 23:53, NetOne - Doichin Dokov wrote:
>> NetOne - Doichin Dokov wrote:
>>>>> dmesg bge(4) timeouts which happen from time to time:
>>>>> =
>>>>> bge0: watchdog timeout -- resetting
>>>>> bge1: watchdog timeout -- resetting
>>>>
>>>> mickey posted some diffs on tech@ relating to watchdog
>>>> problems with bge and em, they might be worth a look.
>>> Are these what you're talking about, or there were any subsequent
>>> patches I could not find:
>>> http://article.gmane.org/gmane.os.openbsd.tech/14133
>>> http://article.gmane.org/gmane.os.openbsd.tech/14134
>>>
>> Those patches apply cleanly on 4.2 stable, but i get compile errors when
>> trying to build the kernel:
>> cc -Werror -Wall -Wstrict-prototypes -Wmissing-prototypes -Wno-uninitialized
>> -Wno-format -Wno-main -Wno-sign-compare -Wstack-larger-than-2047
>> -mcmodel=kernel -mno-red-zone -fno-strict-aliasing -mno-sse2 -mno-sse
>> -mno-3dnow -mno-mmx -msoft-float -fno-builtin-printf -fno-builtin-log
>> -fno-omit-frame-pointer -O2 -pipe -nostdinc -I.
>> -I/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../..
>> -I/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../../arch -DDDB
>> -DDIAGNOSTIC -DKTRACE -DACCOUNTING -DKMEMSTATS -DPTRACE -DCRYPTO -DSYSVMSG
>> -DSYSVSEM -DSYSVSHM -DUVM_SWAP_ENCRYPT -DCOMPAT_35 -DCOMPAT_43 -DLKM -DFFS
>> -DFFS2 -DFFS_SOFTUPDATES -DUFS_DIRHASH -DQUOTA -DEXT2FS -DMFS -DXFS
>> -DTCP_SACK -DTCP_ECN -DTCP_SIGNATURE -DNFSCLIENT -DNFSSERVER -DCD9660 -DUDF
>> -DMSDOSFS -DFIFO -DPORTAL -DINET -DALTQ -DINET6 -DIPSEC -DPPP_BSDCOMP
>> -DPPP_DEFLATE -DMROUTING -DBOOT_CONFIG -DUSER_PCICONF -DAPERTURE
>> -DPCIVERBOSE -DUSBVERBOSE -DWSDISPLAY_COMPAT_USL -DWSDISPLAY_COMPAT_RAWKBD
>> -DWSDISPLAY_DEFAULTSCREENS="6" -DWSDISPLAY_COMPAT_PCVT -DONEWIREVERBOSE
>> -DMULTIPROCESSOR -DMPBIOS -D_KERNEL -Damd64 -Dx86_64 -c
>> /usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../../dev/pci/if_bge.c
>> /usr/src/sys/dev/pci/if_bge.c: In function `bge_txeof':
>> /usr/src/sys/dev/pci/if_bge.c:2472: error: stray '\231' in program
>> /usr/src/sys/dev/pci/if_bge.c:2472: error: `bge_txcnt' undeclared (first use in this function)
>> /usr/src/sys/dev/pci/if_bge.c:2472: error: (Each undeclared identifier is reported only once
>> /usr/src/sys/dev/pci/if_bge.c:2472: error: for each function it appears in.)
>> *** Error code 1
>>
>> Stop in /usr/src/sys/arch/amd64/compile/GENERIC.MP (line 2517 of Makefile).
>>
>> I guess they're meant to be used on -current?
>>
>> Regards,
>> Doichin
Re: Strange em(4) issues
gmane mangled them; mv the .orig files back and try these -

http://marc.info/?m=119616849501476
http://marc.info/?m=119616948702986

the diffs are made against -current but probably work with stable too.

On 2007/11/29 23:53, NetOne - Doichin Dokov wrote:
> NetOne - Doichin Dokov wrote:
>>>> dmesg bge(4) timeouts which happen from time to time:
>>>> =
>>>> bge0: watchdog timeout -- resetting
>>>> bge1: watchdog timeout -- resetting
>>>
>>> mickey posted some diffs on tech@ relating to watchdog
>>> problems with bge and em, they might be worth a look.
>> Are these what you're talking about, or there were any subsequent
>> patches I could not find:
>> http://article.gmane.org/gmane.os.openbsd.tech/14133
>> http://article.gmane.org/gmane.os.openbsd.tech/14134
>>
> Those patches apply cleanly on 4.2 stable, but i get compile errors when
> trying to build the kernel:
> cc -Werror -Wall -Wstrict-prototypes -Wmissing-prototypes -Wno-uninitialized
> -Wno-format -Wno-main -Wno-sign-compare -Wstack-larger-than-2047
> -mcmodel=kernel -mno-red-zone -fno-strict-aliasing -mno-sse2 -mno-sse
> -mno-3dnow -mno-mmx -msoft-float -fno-builtin-printf -fno-builtin-log
> -fno-omit-frame-pointer -O2 -pipe -nostdinc -I.
> -I/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../..
> -I/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../../arch -DDDB
> -DDIAGNOSTIC -DKTRACE -DACCOUNTING -DKMEMSTATS -DPTRACE -DCRYPTO -DSYSVMSG
> -DSYSVSEM -DSYSVSHM -DUVM_SWAP_ENCRYPT -DCOMPAT_35 -DCOMPAT_43 -DLKM -DFFS
> -DFFS2 -DFFS_SOFTUPDATES -DUFS_DIRHASH -DQUOTA -DEXT2FS -DMFS -DXFS
> -DTCP_SACK -DTCP_ECN -DTCP_SIGNATURE -DNFSCLIENT -DNFSSERVER -DCD9660 -DUDF
> -DMSDOSFS -DFIFO -DPORTAL -DINET -DALTQ -DINET6 -DIPSEC -DPPP_BSDCOMP
> -DPPP_DEFLATE -DMROUTING -DBOOT_CONFIG -DUSER_PCICONF -DAPERTURE
> -DPCIVERBOSE -DUSBVERBOSE -DWSDISPLAY_COMPAT_USL -DWSDISPLAY_COMPAT_RAWKBD
> -DWSDISPLAY_DEFAULTSCREENS="6" -DWSDISPLAY_COMPAT_PCVT -DONEWIREVERBOSE
> -DMULTIPROCESSOR -DMPBIOS -D_KERNEL -Damd64 -Dx86_64 -c
> /usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../../dev/pci/if_bge.c
> /usr/src/sys/dev/pci/if_bge.c: In function `bge_txeof':
> /usr/src/sys/dev/pci/if_bge.c:2472: error: stray '\231' in program
> /usr/src/sys/dev/pci/if_bge.c:2472: error: `bge_txcnt' undeclared (first use in this function)
> /usr/src/sys/dev/pci/if_bge.c:2472: error: (Each undeclared identifier is reported only once
> /usr/src/sys/dev/pci/if_bge.c:2472: error: for each function it appears in.)
> *** Error code 1
>
> Stop in /usr/src/sys/arch/amd64/compile/GENERIC.MP (line 2517 of Makefile).
>
> I guess they're meant to be used on -current?
>
> Regards,
> Doichin
Re: Strange em(4) issues
On 2007/11/29 23:25, NetOne - Doichin Dokov wrote:
> First, thanks for the prompt reply!

No problem, if I can save someone else the night I had in a cold
datacentre working it out, some good came out of it :-)

> Nopes, I'm not:
> # netstat -in
> Name  Mtu   Network      Address              Ipkts Ierrs    Opkts Oerrs Colls
> {snip}
> bge0  1500               00:30:48:57:c3:80 44867924 39723 42574046     1     0
> bge0  1500  213.137.48.  213.137.48.1      44867924 39723 42574046     1     0
> bge0  1500  fe80::%bge0  fe80::230:48ff:fe 44867924 39723 42574046     1     0
> bge1  1500               00:30:48:57:c3:81 45170081 33204 42551236     1     0
> bge1  1500  fe80::%bge1  fe80::230:48ff:fe 45170081 33204 42551236     1     0
>
> Despite seeing Ierrs, I do not see any performance and connectivity
> issues. What exactly does lead to having input errors on the bge(4)s?
> I mean, would they be usable for what I will need the two more ports
> for.

I don't know what leads to them, but it's not cable/switch, I have
tried numerous alternatives. I was running OSPF with fairly short
timers over those interfaces, and had a lot of instability until I
swapped over to em/sk cards. Most protocols are able to handle
delays/loss a lot better than OSPF though.

> This machine is gonna soon have a twin to be backed up with CARP,
> and i need the two additional interfaces on each of them for:
> 1) One interface for cross-connecting the machines to do pfsync

Beware split routing; if you only have one active set of BGP sessions
(i.e. active/passive with 'depend on carpXX') there's no problem of
that kind, but if you have live sessions on both boxes, you'll find
that pfsync isn't designed to handle the case where inbound traffic
goes one way, and outbound traffic the other, so you run into problems
with stateful filtering (sequence number mismatch and maybe there were
wscale problems too).

>> mickey posted some diffs on tech@ relating to watchdog
>> problems with bge and em, they might be worth a look.
>>
> Are these what you're talking about, or there were any subsequent
> patches I could not find:
> http://article.gmane.org/gmane.os.openbsd.tech/14133
> http://article.gmane.org/gmane.os.openbsd.tech/14134

Yes, those ones. Alternatively it may be a problem with interrupt
routing (the fix for that on many machines is to enable acpi to set up
interrupts according to the AML from the BIOS - this is more likely to
have correct information than other methods of interrupt setup on
newer machines, this is a large part of the reason for the ACPI work
that has been happening in -current).

While you build, don't forget this patch if you will use pfsync:
ftp://ftp.openbsd.org/pub/OpenBSD/patches/4.2/common/004_pf.patch

> Again, thank you very much for the help. I highly appreciate it. $30
> will be donated to the OpenBSD foundation, plus another copy of the 4.2
> CD set bought (we'll need one for the new machine, no? :D).

That's nice, thank you :-)
Re: Strange em(4) issues
NetOne - Doichin Dokov wrote:
>>> dmesg bge(4) timeouts which happen from time to time:
>>> =
>>> bge0: watchdog timeout -- resetting
>>> bge1: watchdog timeout -- resetting
>>
>> mickey posted some diffs on tech@ relating to watchdog
>> problems with bge and em, they might be worth a look.
> Are these what you're talking about, or there were any subsequent
> patches I could not find:
> http://article.gmane.org/gmane.os.openbsd.tech/14133
> http://article.gmane.org/gmane.os.openbsd.tech/14134

Those patches apply cleanly on 4.2 stable, but i get compile errors when
trying to build the kernel:

cc -Werror -Wall -Wstrict-prototypes -Wmissing-prototypes -Wno-uninitialized
-Wno-format -Wno-main -Wno-sign-compare -Wstack-larger-than-2047
-mcmodel=kernel -mno-red-zone -fno-strict-aliasing -mno-sse2 -mno-sse
-mno-3dnow -mno-mmx -msoft-float -fno-builtin-printf -fno-builtin-log
-fno-omit-frame-pointer -O2 -pipe -nostdinc -I.
-I/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../..
-I/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../../arch -DDDB
-DDIAGNOSTIC -DKTRACE -DACCOUNTING -DKMEMSTATS -DPTRACE -DCRYPTO -DSYSVMSG
-DSYSVSEM -DSYSVSHM -DUVM_SWAP_ENCRYPT -DCOMPAT_35 -DCOMPAT_43 -DLKM -DFFS
-DFFS2 -DFFS_SOFTUPDATES -DUFS_DIRHASH -DQUOTA -DEXT2FS -DMFS -DXFS
-DTCP_SACK -DTCP_ECN -DTCP_SIGNATURE -DNFSCLIENT -DNFSSERVER -DCD9660 -DUDF
-DMSDOSFS -DFIFO -DPORTAL -DINET -DALTQ -DINET6 -DIPSEC -DPPP_BSDCOMP
-DPPP_DEFLATE -DMROUTING -DBOOT_CONFIG -DUSER_PCICONF -DAPERTURE
-DPCIVERBOSE -DUSBVERBOSE -DWSDISPLAY_COMPAT_USL -DWSDISPLAY_COMPAT_RAWKBD
-DWSDISPLAY_DEFAULTSCREENS="6" -DWSDISPLAY_COMPAT_PCVT -DONEWIREVERBOSE
-DMULTIPROCESSOR -DMPBIOS -D_KERNEL -Damd64 -Dx86_64 -c
/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../../dev/pci/if_bge.c
/usr/src/sys/dev/pci/if_bge.c: In function `bge_txeof':
/usr/src/sys/dev/pci/if_bge.c:2472: error: stray '\231' in program
/usr/src/sys/dev/pci/if_bge.c:2472: error: `bge_txcnt' undeclared (first use in this function)
/usr/src/sys/dev/pci/if_bge.c:2472: error: (Each undeclared identifier is reported only once
/usr/src/sys/dev/pci/if_bge.c:2472: error: for each function it appears in.)
*** Error code 1

Stop in /usr/src/sys/arch/amd64/compile/GENERIC.MP (line 2517 of Makefile).

I guess they're meant to be used on -current?

Regards,
Doichin
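The "stray '\231' in program" error means the compiler met a raw byte with octal value 231 (0x99) in the patched source, which fits a patch that was mangled in transit. A minimal Python sketch for locating such bytes before rebuilding follows; the function name and the sample source are hypothetical, and nothing here reproduces the real contents of if_bge.c.

```python
def find_stray_bytes(source: bytes) -> list[tuple[int, int, str]]:
    """Return (line, column, octal escape) for every non-ASCII byte.

    Lines and columns are 1-based, and the escape is written the way the
    compiler prints it, e.g. '\\231' for byte 0x99.
    """
    hits = []
    for lineno, line in enumerate(source.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte >= 0x80:  # outside 7-bit ASCII
                hits.append((lineno, col, f"\\{byte:o}"))
    return hits


# Made-up two-line source with one stray 0x99 byte after an identifier.
SAMPLE = b"int a;\nbge_txcnt\x99 = 0;\n"

if __name__ == "__main__":
    for lineno, col, escape in find_stray_bytes(SAMPLE):
        print(f"line {lineno}, column {col}: stray '{escape}'")
```

To check a real file, read it in binary mode, for example `find_stray_bytes(open(path, "rb").read())`, so that no decoding step hides or rewrites the offending byte.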
Re: Strange em(4) issues
First, thanks for the prompt reply!

Stuart Henderson wrote:
> On 2007/11/29 22:23, NetOne - Doichin Dokov wrote:
>> Two weeks ago i bought an Intel Pro/1000MT dual Gbit NIC because i was
>> gonna soon be in need for more ports in one of our 1U systems,
>
> Change the PCI jumper, which is currently probably on auto, to 64 bit
> 66MHz. You probably need to remove the PCIX card to reach it (unless
> they changed much of the design between the H8SSL and -I2, which I
> doubt).

Yes, it's there. Right after the first PCI slot. Will do that in several
hours, when most of the users go to sleep :)

>> which has 2 onboard bge(4)s which are working quite nice.
>
> the 5704C bge(4) on my H8SSL are all disabled because of Ierrs in
> netstat -ni, maybe you are luckier :-)

Nopes, I'm not:

# netstat -in
Name  Mtu   Network      Address              Ipkts Ierrs    Opkts Oerrs Colls
{snip}
bge0  1500               00:30:48:57:c3:80 44867924 39723 42574046     1     0
bge0  1500  213.137.48.  213.137.48.1      44867924 39723 42574046     1     0
bge0  1500  fe80::%bge0  fe80::230:48ff:fe 44867924 39723 42574046     1     0
bge1  1500               00:30:48:57:c3:81 45170081 33204 42551236     1     0
bge1  1500  fe80::%bge1  fe80::230:48ff:fe 45170081 33204 42551236     1     0

Despite seeing Ierrs, I do not see any performance or connectivity
issues. What exactly leads to input errors on the bge(4)s? I mean, would
they be usable for what I will need the two extra ports for?

This machine is soon going to have a twin, to be backed up with CARP,
and I need the two additional interfaces on each of them for:

1) One interface for cross-connecting the machines to do pfsync
2) One interface to connect to a private network and run bacula backups
   through (I want to use this pair of routers to do some backups at
   4-5 a.m., when they are not busy at all)

Using em(4)s for the real traffic, would the bge(4)s be suitable for
pfsync and bacula backups with the errors they are experiencing? Or
should I go get a quad-port Intel (I wish I didn't have to spend that
much money, though)?

>> everything from it quite nice, fetch remote sites, etc. Suddenly the
>> SSH connection was dropped with a message I've never seen before -
>> Corrupted MAC header.
>
> Been there, done that. If you use plaintext protocols (ftp or so)
> over the interface, you'll see random corruption visible in the
> data (e.g. directory listings).
>
> At 133MHz there's some corruption between motherboard and card.
> Disappears at 66MHz.
>
> Normally this would be masked by TCP checksums (you'd get packet
> loss, but it would mostly be corrected rather than pass corrupt
> packets up the stack), but the em(4) does offload TCP checksum
> processing to the card, so the checksum no longer covers the
> transfer over the PCI bus, hence the weird protocol errors.

Affirmative. Exactly what I'm experiencing.

>> dmesg errors during the problems with em(4)s devices:
>> ===
>> em1: watchdog timeout -- resetting
>> em1: watchdog timeout -- resetting
>> pckbcintr: no dev for slot 1
>> pckbcintr: no dev for slot 1
>>
>> dmesg bge(4) timeouts which happen from time to time:
>> =
>> bge0: watchdog timeout -- resetting
>> bge1: watchdog timeout -- resetting
>
> mickey posted some diffs on tech@ relating to watchdog
> problems with bge and em, they might be worth a look.

Are these what you're talking about, or were there any subsequent
patches I could not find:
http://article.gmane.org/gmane.os.openbsd.tech/14133
http://article.gmane.org/gmane.os.openbsd.tech/14134

If so, I will apply them and recompile.

Again, thank you very much for the help. I highly appreciate it. $30
will be donated to the OpenBSD foundation, plus another copy of the 4.2
CD set bought (we'll need one for the new machine, no? :D).

Regards,
Doichin
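To put the Ierrs figures from the netstat output above in proportion, the following Python sketch computes input errors as a fraction of input packets per interface. The parser is an assumption of this example: it reads the last five columns as Ipkts, Ierrs, Opkts, Oerrs and Colls and keeps the first row seen for each interface, which suits the output pasted in this thread but not necessarily every netstat variant.

```python
def input_error_rates(netstat_output: str) -> dict[str, float]:
    """Map interface name to Ierrs / Ipkts, using the first row per name."""
    rates: dict[str, float] = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header, '{snip}' and anything without numeric counters.
        if len(fields) < 7 or not fields[-5].isdigit():
            continue
        name = fields[0]
        ipkts, ierrs = int(fields[-5]), int(fields[-4])
        if ipkts and name not in rates:
            rates[name] = ierrs / ipkts
    return rates


# Counters copied from the netstat -in output quoted in the thread.
SAMPLE = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
{snip}
bge0 1500 00:30:48:57:c3:80 44867924 39723 42574046 1 0
bge0 1500 213.137.48. 213.137.48.1 44867924 39723 42574046 1 0
bge1 1500 00:30:48:57:c3:81 45170081 33204 42551236 1 0
"""

if __name__ == "__main__":
    for name, rate in input_error_rates(SAMPLE).items():
        print(f"{name}: {rate:.4%} of input packets had errors")
```

Both interfaces come out below 0.1% of input packets, which is consistent with the observation that ordinary traffic shows no visible problems even though the counters are clearly non-zero.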
Re: Strange em(4) issues
On 2007/11/29 22:23, NetOne - Doichin Dokov wrote:
> Two weeks ago i bought an Intel Pro/1000MT dual Gbit NIC because i was gonna
> soon be in need for more ports in one of our 1U systems,

Change the PCI jumper, which is currently probably on auto, to 64 bit
66MHz. You probably need to remove the PCIX card to reach it (unless
they changed much of the design between the H8SSL and -I2, which I
doubt).

> which has 2 onboard bge(4)s which are working quite nice.

the 5704C bge(4) on my H8SSL are all disabled because of Ierrs in
netstat -ni, maybe you are luckier :-)

> everything from it quite nice, fetch remote sites, etc. Suddenly the SSH
> connection was dropped with a message I've never seen before - Corrupted MAC
> header.

Been there, done that. If you use plaintext protocols (ftp or so)
over the interface, you'll see random corruption visible in the
data (e.g. directory listings).

At 133MHz there's some corruption between motherboard and card.
Disappears at 66MHz.

Normally this would be masked by TCP checksums (you'd get packet
loss, but it would mostly be corrected rather than pass corrupt
packets up the stack), but the em(4) does offload TCP checksum
processing to the card, so the checksum no longer covers the
transfer over the PCI bus, hence the weird protocol errors.

> dmesg errors during the problems with em(4)s devices:
> ===
> em1: watchdog timeout -- resetting
> em1: watchdog timeout -- resetting
> pckbcintr: no dev for slot 1
> pckbcintr: no dev for slot 1
>
> dmesg bge(4) timeouts which happen from time to time:
> =
> bge0: watchdog timeout -- resetting
> bge1: watchdog timeout -- resetting

mickey posted some diffs on tech@ relating to watchdog
problems with bge and em, they might be worth a look.
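Stuart's point about checksum offload can be sketched in a few lines of Python. This is a simplified model, not the em(4) driver: it assumes a single bit flips while the payload crosses the PCI bus, uses the RFC 1071 checksum without the TCP pseudo-header, and the `receive` function, its parameters and the sample banner are hypothetical names chosen for the example.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones'-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Model bus corruption by inverting one bit of the payload."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def receive(payload: bytes, checksum: int, offload: bool, bus_bit: int):
    """Return (accepted, data seen by the host) for one received segment."""
    if offload:
        # The card verifies the checksum first; the bus damages the data
        # afterwards, so the host trusts a verdict about different bytes.
        accepted = internet_checksum(payload) == checksum
        return accepted, flip_bit(payload, bus_bit)
    # Without offload the host verifies what actually arrived over the bus.
    damaged = flip_bit(payload, bus_bit)
    return internet_checksum(damaged) == checksum, damaged


if __name__ == "__main__":
    banner = b"220 FTP server ready\r\n"  # made-up plaintext payload
    csum = internet_checksum(banner)
    print(receive(banner, csum, offload=True, bus_bit=13))
    print(receive(banner, csum, offload=False, bus_bit=13))
```

With offload enabled the damaged banner is accepted and handed up the stack; with the check done in software the same damage is rejected and TCP retransmits, which shows up as packet loss rather than corrupt data.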