Re: Strange em(4) issues

2007-12-01 Thread Chris Cappuccio
i've got a pair of h8ssl-i boards that work fine at 133mhz.  i have
another set that i run at 66mhz, but only because that's the max the raid
controller supports (some kind of LSI card.  i like the areca better though)

bge shows up as:

bge0 at pci2 dev 3 function 0 Broadcom BCM5704C rev 0x10, BCM5704 B0 
(0x2100): irq 5, address 00:30:48:56:68:d4
brgphy0 at bge0 phy 1: BCM5704 10/100/1000baseT PHY, rev. 0
bge1 at pci2 dev 3 function 1 Broadcom BCM5704C rev 0x10, BCM5704 B0 
(0x2100): irq 9, address 00:30:48:56:68:d5
brgphy1 at bge1 phy 1: BCM5704 10/100/1000baseT PHY, rev. 0

Stuart Henderson wrote:
 On 2007/11/30 09:57, Girish Venkatachalam wrote:
  On 20:47:57 Nov 29, Stuart Henderson wrote:
   
   Been there, done that. If you use plaintext protocols (ftp or so)
   over the interface, you'll see random corruption visible in the
   data (e.g. directory listings).
   
   At 133MHz there's some corruption between motherboard and card.
   Disappears at 66MHz.
   
   Normally this would be masked by TCP checksums (you'd get packet
   loss, but it would mostly be corrected rather than pass corrupt
   packets up the stack), but the em(4) does offload TCP checksum
   processing to the card, so the checksum no longer covers the
   transfer over the PCI bus, hence the weird protocol errors.
  
  TCP checksums or for that matter any checksum cannot catch *all* errors.
 
 Agreed, hence the mostly.
 
  Since there is a MAC computation for every packet, this will easily help
  you identify the problem.
 
 With this happening, you're lucky to get an ftp banner through without
 corruption; I don't think I ever got an SSH session set up.
 
 I already have two workarounds, one is to use the old quad em(4) with
 the IBM(Tundra) bridge (which works OK at 64x133, but the RJ45 sockets
 are the wrong way up to latch correctly in some of Supermicro's 1U cases),
 the other is to use the newer cards (Pericom bridge) at 66MHz.
 
 I haven't heard of this happening on other systems (and other 64x133 cards
 work), I suspect it's a hardware problem between H8SSL and the Pericom
 bridge chip.

-- 
Those who can, do.
Those who can't, sue.



Re: Strange em(4) issues

2007-12-01 Thread NetOne - Doichin Dokov

Chris Cappuccio wrote:

i've got a pair of h8ssl-i boards that work fine at 133mhz.  i have
another set that i run at 66mhz, but only because that's the max the raid
controller supports (some kind of LSI card.  i like the areca better though)

bge shows up as:

bge0 at pci2 dev 3 function 0 Broadcom BCM5704C rev 0x10, BCM5704 B0 
(0x2100): irq 5, address 00:30:48:56:68:d4
brgphy0 at bge0 phy 1: BCM5704 10/100/1000baseT PHY, rev. 0
bge1 at pci2 dev 3 function 1 Broadcom BCM5704C rev 0x10, BCM5704 B0 
(0x2100): irq 9, address 00:30:48:56:68:d5
brgphy1 at bge1 phy 1: BCM5704 10/100/1000baseT PHY, rev. 0
  
In fact, the H8-SSL-I2 docs say the jumper is for the PCI-X slot, not
for the PCI-X bus, so I guess the onboard BCM5704C is unaffected by its
setting. Either way, it surely IS working fine, except for the input
errors Stuart pointed out, which I can confirm. I've not seen any
problems with traffic flowing through them, though Stuart has.
Also, nobody claims that PCI-X is unworkable on a 133 MHz bus; it seems
there's a compatibility issue between recent Intel em(4)s and the
ServerWorks HT-1000 (or this Supermicro board). In my opinion, it's too
bad that hardware from exactly these two brands, both big names in the
server market, can't play together nicely at 133 MHz. It's a shame!


Regards,
Doichin



Re: Strange em(4) issues

2007-11-30 Thread Stuart Henderson
On 2007/11/30 09:57, Girish Venkatachalam wrote:
 On 20:47:57 Nov 29, Stuart Henderson wrote:
  
  Been there, done that. If you use plaintext protocols (ftp or so)
  over the interface, you'll see random corruption visible in the
  data (e.g. directory listings).
  
  At 133MHz there's some corruption between motherboard and card.
  Disappears at 66MHz.
  
  Normally this would be masked by TCP checksums (you'd get packet
  loss, but it would mostly be corrected rather than pass corrupt
  packets up the stack), but the em(4) does offload TCP checksum
  processing to the card, so the checksum no longer covers the
  transfer over the PCI bus, hence the weird protocol errors.
 
 TCP checksums or for that matter any checksum cannot catch *all* errors.

Agreed, hence the mostly.

 Since there is a MAC computation for every packet, this will easily help
 you identify the problem.

With this happening, you're lucky to get an ftp banner through without
corruption; I don't think I ever got an SSH session set up.

I already have two workarounds, one is to use the old quad em(4) with
the IBM(Tundra) bridge (which works OK at 64x133, but the RJ45 sockets
are the wrong way up to latch correctly in some of Supermicro's 1U cases),
the other is to use the newer cards (Pericom bridge) at 66MHz.

I haven't heard of this happening on other systems (and other 64x133 cards
work), I suspect it's a hardware problem between H8SSL and the Pericom
bridge chip.



Re: Strange em(4) issues

2007-11-29 Thread Stuart Henderson
On 2007/11/29 22:23, NetOne - Doichin Dokov wrote:
 Two weeks ago i bought an Intel Pro/1000MT dual Gbit NIC because i was gonna 
 soon be in need for more ports in one of our 1U systems,

Change the PCI jumper, which is currently probably on auto,
to 64 bit 66MHz. You probably need to remove the PCIX card to
reach it (unless they changed much of the design between the
H8SSL and -I2, which I doubt).
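
A rough check of what the 66MHz setting costs in bus headroom, as a minimal sketch. The clock values are the nominal PCI-X figures and the "every port saturated in both directions" load is a worst-case assumption, not something measured on this board; real throughput is below the theoretical peak.

```python
# Peak theoretical bus bandwidth = bus width in bytes x clock rate.
# 66 and 133 MHz are nominal PCI-X clocks, used here as assumptions.

def bus_peak_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in MB/s (1 MB = 10**6 bytes)."""
    return width_bits / 8 * clock_mhz


def nic_demand_mb_per_s(ports: int, gbit_per_port: float = 1.0) -> float:
    """Worst-case load: every port saturated in both directions at once."""
    return ports * gbit_per_port * 2 * 1000 / 8


if __name__ == "__main__":
    for clock in (66, 133):
        peak = bus_peak_mb_per_s(64, clock)
        print(f"64-bit @ {clock} MHz: {peak:.0f} MB/s peak")
    print(f"dual gigabit, full duplex: {nic_demand_mb_per_s(2):.0f} MB/s")
```

On these nominal numbers the 66MHz setting halves the peak but still sits slightly above the worst case for a dual-port gigabit card, so the jumper change should mostly cost headroom rather than usable throughput.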

 which has 2 onboard bge(4)s which are working quite nice.

the 5704C bge(4) on my H8SSL are all disabled because of Ierrs
in netstat -ni, maybe you are luckier :-)

 everything from it quite nice, fetch remote sites, etc. Suddenly the SSH 
 connection was dropped with a message I've never seen before - Corrupted MAC 
 header.

Been there, done that. If you use plaintext protocols (ftp or so)
over the interface, you'll see random corruption visible in the
data (e.g. directory listings).

At 133MHz there's some corruption between motherboard and card.
Disappears at 66MHz.

Normally this would be masked by TCP checksums (you'd get packet
loss, but it would mostly be corrected rather than pass corrupt
packets up the stack), but the em(4) does offload TCP checksum
processing to the card, so the checksum no longer covers the
transfer over the PCI bus, hence the weird protocol errors.
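
To illustrate the point about checksums, here is a minimal sketch of the 16-bit one's-complement sum that TCP uses (RFC 1071). It leaves out the TCP pseudo-header, and the payload and helper names are made up for the example. When the host verifies the sum, the check runs after the bus transfer and sees the damage; when the card verifies it, damage introduced on the bus afterwards is never checked.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def swap_words(data: bytes, i: int, j: int) -> bytes:
    """Return a copy of data with the 16-bit words at positions i and j swapped."""
    out = bytearray(data)
    out[2 * i:2 * i + 2], out[2 * j:2 * j + 2] = (
        data[2 * j:2 * j + 2], data[2 * i:2 * i + 2])
    return bytes(out)


if __name__ == "__main__":
    payload = b"220 FTP server ready\r\n"   # hypothetical banner
    good = internet_checksum(payload)

    damaged = flip_bit(payload, 5)
    print("bit flip detected: ", internet_checksum(damaged) != good)

    # Addition is commutative, so reordered words give the same sum.
    reordered = swap_words(payload, 0, 1)
    print("word swap detected:", internet_checksum(reordered) != good)
```

The second case is one reason the checksum only catches most errors: any corruption that leaves the one's-complement sum unchanged passes.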

 dmesg errors during the problems with em(4)s devices:
 ===
 em1: watchdog timeout -- resetting
 em1: watchdog timeout -- resetting
 pckbcintr: no dev for slot 1
 pckbcintr: no dev for slot 1

 dmesg bge(4) timeouts which happen from time to time:
 =
 bge0: watchdog timeout -- resetting
 bge1: watchdog timeout -- resetting

mickey posted some diffs on tech@ relating to watchdog
problems with bge and em, they might be worth a look.



Re: Strange em(4) issues

2007-11-29 Thread NetOne - Doichin Dokov

First, thanks for the prompt reply!

Stuart Henderson wrote:

On 2007/11/29 22:23, NetOne - Doichin Dokov wrote:
  
Two weeks ago i bought an Intel Pro/1000MT dual Gbit NIC because i was gonna 
soon be in need for more ports in one of our 1U systems,



Change the PCI jumper, which is currently probably on auto,
to 64 bit 66MHz. You probably need to remove the PCIX card to
reach it (unless they changed much of the design between the
H8SSL and -I2, which I doubt).

  

Yes, it's there. Right after the first PCI slot. Will do that in several
hours, when most of the users go to sleep :)

which has 2 onboard bge(4)s which are working quite nice.



the 5704C bge(4) on my H8SSL are all disabled because of Ierrs
in netstat -ni, maybe you are luckier :-)
  

Nopes, I'm not:
# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
{snip}
bge0 1500 Link 00:30:48:57:c3:80 44867924 39723 42574046 1 0
bge0 1500 213.137.48. 213.137.48.1 44867924 39723 42574046 1 0
bge0 1500 fe80::%bge0 fe80::230:48ff:fe 44867924 39723 42574046 1 0
bge1 1500 Link 00:30:48:57:c3:81 45170081 33204 42551236 1 0
bge1 1500 fe80::%bge1 fe80::230:48ff:fe 45170081 33204 42551236 1 0
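
To put the Ierrs counters in proportion, a small sketch that turns the Link rows above into an error ratio. It assumes whitespace-separated rows with exactly the nine columns shown in the header; the function name is made up for the example.

```python
NETSTAT_SAMPLE = """\
bge0 1500 Link 00:30:48:57:c3:80 44867924 39723 42574046 1 0
bge1 1500 Link 00:30:48:57:c3:81 45170081 33204 42551236 1 0
"""


def input_error_rates(netstat_text: str) -> dict:
    """Map interface name -> Ierrs / Ipkts, using only the Link rows."""
    rates = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Columns: Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
        if len(fields) != 9 or fields[2] != "Link":
            continue
        name, ipkts, ierrs = fields[0], int(fields[4]), int(fields[5])
        if ipkts:
            rates[name] = ierrs / ipkts
    return rates


if __name__ == "__main__":
    for name, rate in input_error_rates(NETSTAT_SAMPLE).items():
        print(f"{name}: {rate:.4%} of received packets had input errors")
```

That works out to a bit under 0.1% of received packets on each interface. The counters are cumulative, so this is an average and says nothing about bursts.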

Despite seeing Ierrs, I do not see any performance or connectivity
issues. What exactly leads to input errors on the bge(4)s? I mean,
would they be usable for what I need the two extra ports for? This
machine is gonna soon have a twin to be backed up with CARP, and i
need the two additional interfaces on each of them for:
1) One interface for cross-connecting the machines to do pfsync
2) One interface to connect to a private network and run bacula backups
through (i want to use this couple of routers to do some backups at 4-5
a.m. when they are not busy at all)
Using em(4)s for the real traffic, would the bge(4)s be suitable for
pfsync and bacula backups with the errors they are experiencing? Or
should I go get a quad port Intel (i wish i didn't have to spend that
much money, though)?

everything from it quite nice, fetch remote sites, etc. Suddenly the SSH 
connection was dropped with a message I've never seen before - Corrupted MAC 
header.



Been there, done that. If you use plaintext protocols (ftp or so)
over the interface, you'll see random corruption visible in the
data (e.g. directory listings).

At 133MHz there's some corruption between motherboard and card.
Disappears at 66MHz.

Normally this would be masked by TCP checksums (you'd get packet
loss, but it would mostly be corrected rather than pass corrupt
packets up the stack), but the em(4) does offload TCP checksum
processing to the card, so the checksum no longer covers the
transfer over the PCI bus, hence the weird protocol errors.

  

Affirmative. Exactly what I'm experiencing.

dmesg errors during the problems with em(4)s devices:
===
em1: watchdog timeout -- resetting
em1: watchdog timeout -- resetting
pckbcintr: no dev for slot 1
pckbcintr: no dev for slot 1

dmesg bge(4) timeouts which happen from time to time:
=
bge0: watchdog timeout -- resetting
bge1: watchdog timeout -- resetting



mickey posted some diffs on tech@ relating to watchdog
problems with bge and em, they might be worth a look.
  

Are these the ones you're talking about, or were there any subsequent
patches I could not find?
http://article.gmane.org/gmane.os.openbsd.tech/14133
http://article.gmane.org/gmane.os.openbsd.tech/14134

If so, I will apply them and recompile.

Again, thank you very much for the help. I highly appreciate it. $30
will be donated to the OpenBSD foundation, plus another copy of the 4.2
CD set bought (we'll need one for the new machine, no? :D).

Regards,
Doichin



Re: Strange em(4) issues

2007-11-29 Thread NetOne - Doichin Dokov

NetOne - Doichin Dokov wrote:

dmesg bge(4) timeouts which happen from time to time:
=
bge0: watchdog timeout -- resetting
bge1: watchdog timeout -- resetting


mickey posted some diffs on tech@ relating to watchdog
problems with bge and em, they might be worth a look.

Are these what you're talking about, or there were any subsequent
patches I could not find:
http://article.gmane.org/gmane.os.openbsd.tech/14133
http://article.gmane.org/gmane.os.openbsd.tech/14134

Those patches apply cleanly on 4.2 stable, but i get compile errors when 
trying to build the kernel:
cc -Werror -Wall -Wstrict-prototypes -Wmissing-prototypes 
-Wno-uninitialized -Wno-format -Wno-main -Wno-sign-compare 
-Wstack-larger-than-2047 -mcmodel=kernel -mno-red-zone 
-fno-strict-aliasing -mno-sse2 -mno-sse -mno-3dnow -mno-mmx -msoft-float 
-fno-builtin-printf -fno-builtin-log -fno-omit-frame-pointer -O2 -pipe 
-nostdinc -I. -I/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../.. 
-I/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../../arch -DDDB 
-DDIAGNOSTIC -DKTRACE -DACCOUNTING -DKMEMSTATS -DPTRACE -DCRYPTO 
-DSYSVMSG -DSYSVSEM -DSYSVSHM -DUVM_SWAP_ENCRYPT -DCOMPAT_35 -DCOMPAT_43 
-DLKM -DFFS -DFFS2 -DFFS_SOFTUPDATES -DUFS_DIRHASH -DQUOTA -DEXT2FS 
-DMFS -DXFS -DTCP_SACK -DTCP_ECN -DTCP_SIGNATURE -DNFSCLIENT -DNFSSERVER 
-DCD9660 -DUDF -DMSDOSFS -DFIFO -DPORTAL -DINET -DALTQ -DINET6 -DIPSEC 
-DPPP_BSDCOMP -DPPP_DEFLATE -DMROUTING -DBOOT_CONFIG -DUSER_PCICONF 
-DAPERTURE -DPCIVERBOSE -DUSBVERBOSE -DWSDISPLAY_COMPAT_USL 
-DWSDISPLAY_COMPAT_RAWKBD -DWSDISPLAY_DEFAULTSCREENS=6 
-DWSDISPLAY_COMPAT_PCVT -DONEWIREVERBOSE -DMULTIPROCESSOR -DMPBIOS 
-D_KERNEL -Damd64 -Dx86_64 -c 
/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../../dev/pci/if_bge.c

/usr/src/sys/dev/pci/if_bge.c: In function `bge_txeof':
/usr/src/sys/dev/pci/if_bge.c:2472: error: stray '\231' in program
/usr/src/sys/dev/pci/if_bge.c:2472: error: `bge_txcnt' undeclared (first 
use in this function)
/usr/src/sys/dev/pci/if_bge.c:2472: error: (Each undeclared identifier 
is reported only once

/usr/src/sys/dev/pci/if_bge.c:2472: error: for each function it appears in.)
*** Error code 1

Stop in /usr/src/sys/arch/amd64/compile/GENERIC.MP (line 2517 of Makefile).

I guess they're meant to be used on -current?

Regards,
Doichin



Re: Strange em(4) issues

2007-11-29 Thread Stuart Henderson
On 2007/11/29 23:25, NetOne - Doichin Dokov wrote:
 First, thanks for the prompt reply!

No problem, if I can save someone else the night I had in a
cold datacentre working it out, some good came out of it :-)

 Nopes, I'm not:
 # netstat -in
 Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
 {snip}
 bge0 1500 Link 00:30:48:57:c3:80 44867924 39723 42574046 1 0
 bge0 1500 213.137.48. 213.137.48.1 44867924 39723 42574046 1 0
 bge0 1500 fe80::%bge0 fe80::230:48ff:fe 44867924 39723 42574046 1 0
 bge1 1500 Link 00:30:48:57:c3:81 45170081 33204 42551236 1 0
 bge1 1500 fe80::%bge1 fe80::230:48ff:fe 45170081 33204 42551236 1 0

 Despite seeing Ierrs, I do not see any performance and connectivity
 issues. What exactly does lead to having input errors on the bge(4)s?
 I mean, would they be usable for what I will need the two more ports
 for.

I don't know what leads to them, but it's not cable/switch, I have
tried numerous alternatives. I was running OSPF with fairly short
timers over those interfaces, and had a lot of instability until
I swapped over to em/sk cards. Most protocols are able to handle
delays/loss a lot better than OSPF though.

 This machine is gonna soon have a twin to be backed up with CARP,
 and i need the two additional interfaces on each of them for:
 1) One interface for cross-connecting the machines to do pfsync

Beware split routing; if you only have one active set of BGP
sessions (i.e. active/passive with 'depend on carpXX') there's
no problem of that kind, but if you have live sessions on
both boxes, you'll find that pfsync isn't designed to handle
the case where inbound traffic goes one way, and outbound
traffic the other, so you run into problems with stateful
filtering (sequence number mismatch and maybe there were
wscale problems too).

 mickey posted some diffs on tech@ relating to watchdog
 problems with bge and em, they might be worth a look.
   
 Are these what you're talking about, or there were any subsequent
 patches I could not find:
 http://article.gmane.org/gmane.os.openbsd.tech/14133
 http://article.gmane.org/gmane.os.openbsd.tech/14134

Yes, those ones. Alternatively it may be a problem with
interrupt routing (the fix for that on many machines is to
enable acpi to set up interrupts according to the AML from
the BIOS - this is more likely to have correct information
than other methods of interrupt setup on newer machines,
this is a large part of the reason for the ACPI work that
has been happening in -current).

While you build, don't forget this patch if you will use pfsync:
ftp://ftp.openbsd.org/pub/OpenBSD/patches/4.2/common/004_pf.patch

 Again, thank you very much for the help. I highly appreciate it. $30
 will be donated to the OpenBSD foundation, plus another copy of the 4.2
 CD set bought (we'll need one for the new machine, no? :D).

That's nice, thank you :-)



Re: Strange em(4) issues

2007-11-29 Thread Stuart Henderson
gmane mangled them; mv the .orig files back and try these -

http://marc.info/?m=119616849501476
http://marc.info/?m=119616948702986

the diffs are made against -current but probably work with stable too.
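
The stray '\231' in the compile output is the compiler's octal notation for a single byte, 0x99, which is outside ASCII. A quick way to check a saved diff or a patched source file for such bytes before building is sketched below; the function names and the sample input are made up for the example.

```python
def find_non_ascii(data: bytes):
    """Yield (line_number, column, byte_value) for every byte >= 0x80."""
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte >= 0x80:
                yield lineno, col, byte


def scan_file(path: str) -> list:
    """Read a file in binary mode and list its non-ASCII bytes."""
    with open(path, "rb") as fh:
        return list(find_non_ascii(fh.read()))


if __name__ == "__main__":
    # Made-up sample with one 0x99 byte on its second line.
    sample = b"int a;\nint \x99b;\n"
    for lineno, col, byte in find_non_ascii(sample):
        print(f"line {lineno}, column {col}: "
              f"byte 0x{byte:02x} (octal {byte:03o})")
```

A clean C source file in this tree would normally produce no hits, so any output is a hint that the diff was re-encoded somewhere on the way.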

On 2007/11/29 23:53, NetOne - Doichin Dokov wrote:
 NetOne - Doichin Dokov wrote:
 dmesg bge(4) timeouts which happen from time to time:
 =
 bge0: watchdog timeout -- resetting
 bge1: watchdog timeout -- resetting

 mickey posted some diffs on tech@ relating to watchdog
 problems with bge and em, they might be worth a look.
 Are these what you're talking about, or there were any subsequent
 patches I could not find:
 http://article.gmane.org/gmane.os.openbsd.tech/14133
 http://article.gmane.org/gmane.os.openbsd.tech/14134

 Those patches apply cleanly on 4.2 stable, but i get compile errors when 
 trying to build the kernel:
 cc -Werror -Wall -Wstrict-prototypes -Wmissing-prototypes -Wno-uninitialized 
 -Wno-format -Wno-main -Wno-sign-compare -Wstack-larger-than-2047 
 -mcmodel=kernel -mno-red-zone -fno-strict-aliasing -mno-sse2 -mno-sse 
 -mno-3dnow -mno-mmx -msoft-float -fno-builtin-printf -fno-builtin-log 
 -fno-omit-frame-pointer -O2 -pipe -nostdinc -I. 
 -I/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../.. 
 -I/usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../../arch -DDDB 
 -DDIAGNOSTIC -DKTRACE -DACCOUNTING -DKMEMSTATS -DPTRACE -DCRYPTO -DSYSVMSG 
 -DSYSVSEM -DSYSVSHM -DUVM_SWAP_ENCRYPT -DCOMPAT_35 -DCOMPAT_43 -DLKM -DFFS 
 -DFFS2 -DFFS_SOFTUPDATES -DUFS_DIRHASH -DQUOTA -DEXT2FS -DMFS -DXFS 
 -DTCP_SACK -DTCP_ECN -DTCP_SIGNATURE -DNFSCLIENT -DNFSSERVER -DCD9660 -DUDF 
 -DMSDOSFS -DFIFO -DPORTAL -DINET -DALTQ -DINET6 -DIPSEC -DPPP_BSDCOMP 
 -DPPP_DEFLATE -DMROUTING -DBOOT_CONFIG -DUSER_PCICONF -DAPERTURE 
 -DPCIVERBOSE -DUSBVERBOSE -DWSDISPLAY_COMPAT_USL -DWSDISPLAY_COMPAT_RAWKBD 
 -DWSDISPLAY_DEFAULTSCREENS=6 -DWSDISPLAY_COMPAT_PCVT -DONEWIREVERBOSE 
 -DMULTIPROCESSOR -DMPBIOS -D_KERNEL -Damd64 -Dx86_64 -c 
 /usr/src/sys/arch/amd64/compile/GENERIC.MP/../../../../dev/pci/if_bge.c
 /usr/src/sys/dev/pci/if_bge.c: In function `bge_txeof':
 /usr/src/sys/dev/pci/if_bge.c:2472: error: stray '\231' in program
 /usr/src/sys/dev/pci/if_bge.c:2472: error: `bge_txcnt' undeclared (first use 
 in this function)
 /usr/src/sys/dev/pci/if_bge.c:2472: error: (Each undeclared identifier is 
 reported only once
 /usr/src/sys/dev/pci/if_bge.c:2472: error: for each function it appears in.)
 *** Error code 1

 Stop in /usr/src/sys/arch/amd64/compile/GENERIC.MP (line 2517 of Makefile).

 I guess they're meant to be used on -current?

 Regards,
 Doichin



Re: Strange em(4) issues

2007-11-29 Thread NetOne - Doichin Dokov

Stuart Henderson wrote:

gmane mangled them; mv the .orig files back and try these -

http://marc.info/?m=119616849501476
http://marc.info/?m=119616948702986

the diffs are made against -current but probably work with stable too.
  

Yup, you're right! Everything compiled fine. Will load the new kernel in
several hours.
Thanks again!

Doichin



Re: Strange em(4) issues

2007-11-29 Thread NetOne - Doichin Dokov

Stuart Henderson wrote:

On 2007/11/29 23:25, NetOne - Doichin Dokov wrote:
  

First, thanks for the prompt reply!



No problem, if I can save someone else the night I had in a
cold datacentre working it out, some good came out of it :-)

  

Nopes, I'm not:
# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
{snip}
bge0 1500 Link 00:30:48:57:c3:80 44867924 39723 42574046 1 0
bge0 1500 213.137.48. 213.137.48.1 44867924 39723 42574046 1 0
bge0 1500 fe80::%bge0 fe80::230:48ff:fe 44867924 39723 42574046 1 0
bge1 1500 Link 00:30:48:57:c3:81 45170081 33204 42551236 1 0
bge1 1500 fe80::%bge1 fe80::230:48ff:fe 45170081 33204 42551236 1 0

Despite seeing Ierrs, I do not see any performance and connectivity
issues. What exactly does lead to having input errors on the bge(4)s?
I mean, would they be usable for what I will need the two more ports
for.



I don't know what leads to them, but it's not cable/switch, I have
tried numerous alternatives. I was running OSPF with fairly short
timers over those interfaces, and had a lot of instability until
I swapped over to em/sk cards. Most protocols are able to handle
delays/loss a lot better than OSPF though.

  

This machine is gonna soon have a twin to be backed up with CARP,
and i need the two additional interfaces on each of them for:
1) One interface for cross-connecting the machines to do pfsync



Beware split routing; if you only have one active set of BGP
sessions (i.e. active/passive with 'depend on carpXX') there's
no problem of that kind, but if you have live sessions on
both boxes, you'll find that pfsync isn't designed to handle
the case where inbound traffic goes one way, and outbound
traffic the other, so you run into problems with stateful
filtering (sequence number mismatch and maybe there were
wscale problems too).

  

mickey posted some diffs on tech@ relating to watchdog
problems with bge and em, they might be worth a look.
  
  

Are these what you're talking about, or there were any subsequent
patches I could not find:
http://article.gmane.org/gmane.os.openbsd.tech/14133
http://article.gmane.org/gmane.os.openbsd.tech/14134



Yes, those ones. Alternatively it may be a problem with
interrupt routing (the fix for that on many machines is to
enable acpi to set up interrupts according to the AML from
the BIOS - this is more likely to have correct information
than other methods of interrupt setup on newer machines,
this is a large part of the reason for the ACPI work that
has been happening in -current).

While you build, don't forget this patch if you will use pfsync:
ftp://ftp.openbsd.org/pub/OpenBSD/patches/4.2/common/004_pf.patch

  

Again, thank you very much for the help. I highly appreciate it. $30
will be donated to the OpenBSD foundation, plus another copy of the 4.2
CD set bought (we'll need one for the new machine, no? :D).



That's nice, thank you :-)

  
I've now switched the PCI-X slot to 64-bit / 66 MHz, and also applied
the watchdog fix patches for em(4) and bge(4) to the kernel.
The pf patch was already applied when it came out several days ago; the
system just had not been rebooted yet, as i do not use pfsync for now.
Thanks for the hint, anyway.
I'm still running with ACPI disabled; I will see how it goes and
enable it if needed. Is there any performance penalty or boost from
using ACPI?


Thanks again.

Doichin



Re: Strange em(4) issues

2007-11-29 Thread Girish Venkatachalam
On 20:47:57 Nov 29, Stuart Henderson wrote:
 
 Been there, done that. If you use plaintext protocols (ftp or so)
 over the interface, you'll see random corruption visible in the
 data (e.g. directory listings).
 
 At 133MHz there's some corruption between motherboard and card.
 Disappears at 66MHz.
 
 Normally this would be masked by TCP checksums (you'd get packet
 loss, but it would mostly be corrected rather than pass corrupt
 packets up the stack), but the em(4) does offload TCP checksum
 processing to the card, so the checksum no longer covers the
 transfer over the PCI bus, hence the weird protocol errors.

TCP checksums or for that matter any checksum cannot catch *all* errors.

The best way to consistently reproduce that is by using our own scp(1).

Since there is a MAC computation for every packet, this will easily help
you identify the problem.

If you do a recursive transfer and play with large files, it gives you
enough headroom to track down the bug(s).
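
Why the per-packet MAC makes this easy to spot can be sketched as follows. The SSH transport computes the MAC over the sequence number followed by the unencrypted packet (RFC 4253); the HMAC-SHA256 choice, the key, and the packet contents below are assumptions for illustration, not what any particular ssh(1) session negotiated.

```python
import hashlib
import hmac


def tag(key: bytes, seq: int, packet: bytes) -> bytes:
    """MAC over a 32-bit sequence number followed by the packet body."""
    return hmac.new(key, seq.to_bytes(4, "big") + packet,
                    hashlib.sha256).digest()


def verify(key: bytes, seq: int, packet: bytes, received_tag: bytes) -> bool:
    """Constant-time comparison of the expected and received MAC."""
    return hmac.compare_digest(tag(key, seq, packet), received_tag)


if __name__ == "__main__":
    key = b"k" * 32                      # hypothetical session key
    packet = b"chunk of a large file"    # hypothetical packet body
    sent_tag = tag(key, 1, packet)

    # Flip the lowest bit of the first byte, as a bus error might.
    corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]

    print("intact packet accepted:   ", verify(key, 1, packet, sent_tag))
    print("corrupted packet accepted:", verify(key, 1, corrupted, sent_tag))
```

Unlike the 16-bit TCP checksum, a cryptographic MAC makes it practically impossible for random corruption to go unnoticed, which fits the observation that scp(1) aborts as soon as a damaged packet arrives.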

Best of luck.

-Girish