Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-25 Thread Matthew Dillon

:...
:> I'm pretty sure that the box was getting receive interrupts because
:> every time I sent a packet to it from the outside systat -vm showed
:> a PCI interrupt for the network device.  However 'netstat -in 1' did
:> not show the statistics for the received packets until 64 had 
:> accumulated.  It could be that the statistics are not being accumulated
:> on a per-reception basis and that the receive packets are actually
:> getting through, and that it's the transmit side which is broken.  I don't
:> know the code well enough yet to make the determination.
:
:If things are done in these drivers as they are in the if_de driver then
:what you are seeing is the fact that if_opackets and are only
:updated when the tx ring is reclaimed by an interrupt, not

Next time this bug rears its ugly head I'll get a tcpdump going to try
to figure out what is actually going on.  Ooh, and I just had a 
thought -- a profiled kernel might help track down the problem as well
by enabling it to see which routines get hit (and which don't).

I don't see anything specific in the code so far, other than there being
a lot of memory mapped (apparently shared with the device) objects that 
haven't been volatilized.  So far I can't tie that into anything though. 

-Matt



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-24 Thread Rodney W. Grimes

...
> I'm pretty sure that the box was getting receive interrupts because
> every time I sent a packet to it from the outside systat -vm showed
> a PCI interrupt for the network device.  However 'netstat -in 1' did
> not show the statistics for the received packets until 64 had 
> accumulated.  It could be that the statistics are not being accumulated
> on a per-reception basis and that the receive packets are actually
> getting through, and that it's the transmit side which is broken.  I don't
> know the code well enough yet to make the determination.

If things are done in these drivers as they are in the if_de driver then
what you are seeing is the fact that if_opackets and are only
updated when the tx ring is reclaimed by an interrupt, not
when we actually queue the packet to the card.  This has been a source
of confusion for a long time, and IMNSO we should move the if_ipackets+=
in the code.  Here is an idle box with a dc21143 in it, showing probably
what you are seeing (the only network traffic to this box is the output
of this running netstat -I de0 1 command):
            input          (de0)           output
   packets  errs      bytes    packets  errs      bytes colls
         1     0         60          0     0        138     0
         2     0        182          0     0        250     0
         2     0        158          0     0        138     0
... 100 + lines of output deleted...
         3     0        256          0     0        138     0
         1     0         60        122     0        138     0
         3     0        256          0     0        138     0
         1     0         60          0     0        138     0

Search for lines like this:
sc->tulip_if.if_opackets += xmits;

in the driver to see when we update the counter, then look at how
interrupt per packet drivers do it and propose a nice clean solution :-)
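
A minimal userland sketch of the two counting policies described above, assuming a toy model rather than the real if_de code. The names tx_model, tx_queue and tx_reclaim and the 64-entry ring size are made up for illustration. With count_on_queue set to 0 the packet counter only moves when the ring is reclaimed, which is the bursty netstat output shown above; with it set to 1 the counter moves as each frame is queued.

```c
#include <assert.h>
#include <stddef.h>

#define TX_RING_SIZE 64   /* hypothetical ring size for this model */

struct tx_model {
    unsigned queued;        /* frames handed to the card, not yet reclaimed */
    unsigned long opackets; /* what netstat would report as output packets */
    int count_on_queue;     /* 1: count at queue time, 0: count at reclaim */
};

/* Queue one frame. Returns 0 on success, -1 if the ring is full. */
static int tx_queue(struct tx_model *m)
{
    assert(m != NULL);
    if (m->queued >= TX_RING_SIZE)
        return -1;
    m->queued++;
    if (m->count_on_queue)
        m->opackets++;
    return 0;
}

/* Reclaim every queued descriptor, as a TX interrupt handler would. */
static unsigned tx_reclaim(struct tx_model *m)
{
    unsigned xmits;

    assert(m != NULL);
    xmits = m->queued;
    m->queued = 0;
    if (!m->count_on_queue)
        m->opackets += xmits;   /* same shape as "if_opackets += xmits" */
    return xmits;
}
```

Queueing ten frames into each variant and reading opackets before and after tx_reclaim() shows the difference: the reclaim-time variant reports 0 and then 10, the queue-time variant reports 10 throughout.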



-- 
Rod Grimes - KD7CAX @ CN85sl - (RWG25)   [EMAIL PROTECTED]





Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-24 Thread Harold Gutch

On Wed, Dec 22, 1999 at 10:18:56PM -0800, Matthew Dillon wrote:
> I'm adding Bill Paul to the list specifically.
> 
> Hmm.  Now this is odd!  I think I may have found something!
> 
> All of my 'rl' driver cards fail this test:
> 
>   apollo# linktest -m 0.1:0.2 -s 16 -f16 lander
>   lander# linktest -m 0.1:0.2 -s 16 -f16 apollo
> 
>   They get about 1% packet loss with the test.  Always.  
>   100BaseTX full or half duplex, or 10BaseT -- I still get
>   failures.
> 

I can't repeat this with a RealTek 8039 (that's an 'ed'-NIC) and
a RealTek 8139 (that's the 'rl'-one) running 10BaseT.
Note that I am _NOT_ running -CURRENT on any of these machines,
they both run 2.2-STABLE (rev. 1.17 of rl.c).

The packet loss when using small packets is exactly 0 - that is, no
packet loss occurred during the minute or so in which I was running
linktest.
I just started it again and will leave it running for a couple of
hours, but I doubt that this will make a change.

Whoops, I in fact experienced packet loss now:

overdose(194.94.249.94)->foobar.franken.de  lost 1/1606
overdose(194.94.249.94)->foobar.franken.de  lost 2/2702
foobar.franken.de->overdose(194.94.249.94)  lost 1/3412
overdose(194.94.249.94)->foobar.franken.de  lost 3/3829


Note that I was playing PCM files via NFS at this time, so there
was additional network traffic of ~180 KByte/s.

These occurred even though there was no additional network
traffic:

overdose(194.94.249.94)->foobar.franken.de  lost 4/5491
overdose(194.94.249.94)->foobar.franken.de  lost 5/5692
overdose(194.94.249.94)->foobar.franken.de  lost 6/7277
overdose(194.94.249.94)->foobar.franken.de  lost 7/8661
overdose(194.94.249.94)->foobar.franken.de  lost 8/9412
overdose(194.94.249.94)->foobar.franken.de  lost 9/11393
overdose(194.94.249.94)->foobar.franken.de  lost 10/13699
foobar.franken.de->overdose(194.94.249.94)  lost 2/13728
overdose(194.94.249.94)->foobar.franken.de  lost 11/16426


It seems as if this was roughly the same amount of packetloss as
you experienced.


>   rl0:  irq 11 at device 3.0 on pci0
>   rl0: Ethernet address: 00:50:ba:d1:89:05
>   miibus0:  on rl0
> 
> All of my 'fxp' driver cards succeed with the above test perfectly.
> If I test an fxp machine versus an 'rl' machine, linktest shows that
> the 'rl' cards can transmit small packets just fine but they lose
> out trying to receive them!

Nope, it's the other way round for me.  overdose has the
'rl'-NIC, foobar has the 'ed'-NIC.

I hope to be able to do a few additional tests soon.


> Methinks there is something going on with the 'rl' driver and/or
> the RealTek cards!

My experience with those cards isn't the best, so I'd place my
bets on the cards.

bye,
  Harold

-- 
 Sleep is an abstinence syndrome wich occurs due to lack of caffein.
Wed Mar  4 04:53:33 CET 1998   #unix, ircnet





Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-23 Thread Bill Paul

Of all the gin joints in all the towns in all the world, Matthew Dillon 
had to walk into mine and say:
 
> I'm trying to narrow down the area enough that I can mess with the 
> driver myself and hopefully locate the problem, since it can't be
> reproduced easily.   I was hoping the magic number 64 could be
> related to something - and you have apparently been able to do that,
> which gives me a place to start anyway.   netstat shows the trigger
> to be the reception of 64 packets rather than the transmission, though.
> Is there anything at all about the number 64 that could be related to
> the receiver?

64 is also the number of descriptors/buffers in the RX ring. When you
fill up the RX ring, the chip is supposed to generate a 'no RX buffer
available' interrupt. The driver will check the RX ring for packets
when either an 'RX OK' or 'no RX buffers available' interrupt is
delivered, but you should be getting an 'RX OK' interrupt on every
received packet.
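
A toy model of the symptom, for illustration only: it shows what the counters would look like if the ring were drained only on the 'no RX buffer available' condition, and it is not a claim about the actual cause. The names rx_model, rx_drain and rx_frame_arrives are hypothetical and do not come from if_dc.c.

```c
#include <assert.h>

#define RX_RING_SIZE 64  /* ring size mentioned above */

struct rx_model {
    unsigned pending;        /* frames sitting in the ring, unprocessed */
    unsigned long ipackets;  /* counter bumped when the ring is drained */
    int rx_ok_delivered;     /* 0 models 'RX OK' interrupts never arriving */
};

/* Drain the ring, as an rx-eof style routine would. */
static void rx_drain(struct rx_model *m)
{
    m->ipackets += m->pending;
    m->pending = 0;
}

/* One frame arrives from the wire and lands in the ring. */
static void rx_frame_arrives(struct rx_model *m)
{
    assert(m->pending < RX_RING_SIZE);
    m->pending++;
    /* Either interrupt leads to the same drain routine. */
    if (m->rx_ok_delivered || m->pending == RX_RING_SIZE)
        rx_drain(m);
}
```

With rx_ok_delivered set, ipackets advances on every frame. With it cleared, ipackets stays flat for 63 frames and jumps by 64 on the next one, which matches the netstat pattern reported in this thread.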

The datasheet for the PNIC II is at:

http://www.freebsd.org/~wpaul/Macronix/PNIC_II.PDF

This is the datasheet LinkSys gave me when they first came out with
the LNE100TX v2.0 board. It's very similar to the Macronix 98715A
datasheet.
 
> I'm pretty sure that the box was getting receive interrupts because
> every time I sent a packet to it from the outside systat -vm showed
> a PCI interrupt for the network device.  However 'netstat -in 1' did
> not show the statistics for the received packets until 64 had 
> accumulated.  It could be that the statistics are not being accumulated
> on a per-reception basis and that the receive packets are actually
> getting through, and that it's the transmit side which is broken.  I don't
> know the code well enough yet to make the determination.

The dc_rxeof() routine is what increments ifp->if_ipackets, so if
netstat -in doesn't show any change until after 64 packets have arrived,
then it isn't getting the 'RX OK' interrupts. But I promise you that I
have never seen a condition where 'RX OK' interrupts failed to arrive
even though 'no RX buffer available' interrupts did. The interrupt handler
re-enables interrupts just before it exits, so there should never be a
case where interrupts are turned off and never turned back on again.

-Bill

> I'll try that next time the problem occurs but I doubt it will have 
> any effect.  Changing the duplex mode does not appear to reset the port 
> whereas forcing the media to 'auto' does appear to reset the port.  This 
> is actually another problem (switches don't appear to pick up the duplex
> change if the port isn't reset), but not one I'm concerned with.

In general what you want to do is a) switch modes and b) reset the link
so that the guy on the other side re-senses the media. However both sides
can only agree on the duplex setting as the result of an NWAY autoneg
session: if you manually select 100baseTX full duplex, the link partner
can only sense the link speed (100mbs as opposed to 10) but not the
duplex mode. The rule is that if you don't have NWAY but can sense the
link speed, you default to half duplex and let the operator manually
fix things if necessary (that's what operators are for). Of course this
only works if the switch has a management interface that allows you
to configure things like that. Some don't, which can make your life tough.
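
The fallback rule above can be sketched as a small decision function. This is an illustrative model, assuming made-up names (link_partner, resolve_duplex, duplex_mismatch); it is not code from any driver or switch.

```c
#include <assert.h>
#include <stddef.h>

enum duplex { HALF_DUPLEX, FULL_DUPLEX };

struct link_partner {
    int speed_mbps;           /* sensed from the link: 10 or 100 */
    int nway;                 /* nonzero if NWAY autonegotiation completed */
    enum duplex nway_duplex;  /* negotiated duplex, valid only if nway is set */
};

/* Duplex the partner ends up using, following the rule described above. */
static enum duplex resolve_duplex(const struct link_partner *lp)
{
    assert(lp != NULL);
    if (lp->nway)
        return lp->nway_duplex;
    /* Speed can be sensed without NWAY, duplex cannot: fall back to half. */
    return HALF_DUPLEX;
}

/* Nonzero if a locally forced duplex setting disagrees with the partner. */
static int duplex_mismatch(enum duplex local, const struct link_partner *lp)
{
    return local != resolve_duplex(lp);
}
```

Forcing 100baseTX full duplex locally against a partner that saw no NWAY session gives duplex_mismatch() == 1, which is the case where the operator has to fix the other side by hand.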

I'm pretty sure the speed and duplex setting don't really have anything
to do with this particular problem though. I was just wondering why
renegotiating the media would have any effect. It's possible that
dc_init() may be called in there somewhere, which could be resetting
all of the driver state.

-Bill

-- 
=
-Bill Paul(212) 854-6020 | System Manager, Master of Unix-Fu
Work: [EMAIL PROTECTED] | Center for Telecommunications Research
Home:  [EMAIL PROTECTED] | Columbia University, New York City
=
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=





Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-23 Thread Matthew Dillon

:> It appears that the 'dc' driver continues to take receive interrupts
:> (see the systat -vm snapshot at the end), but winds up not processing 
:> any of the packets.  Except when 64 packets accumulate then suddenly all
:> 64 get processed all at once!  Then nothing again until the next 64
:> accumulate.
:
:Uh. That's... strange. First of all, you haven't said if this is the
:same machine that experienced the problems with the xl driver. Second,
:the number 64 sticks out in this case. If you look at if_dc.c (uh...
:you did actually look at the code, right?), you'll see that dc_encap()
:will only ask for a "TX done" interrupt every 64 packets. Why? Well,
:reclaiming transmit buffers is a fairly unimportant task and I wanted to 

I'm trying to narrow down the area enough that I can mess with the 
driver myself and hopefully locate the problem, since it can't be
reproduced easily.   I was hoping the magic number 64 could be
related to something - and you have apparently been able to do that,
which gives me a place to start anyway.   netstat shows the trigger
to be the reception of 64 packets rather than the transmission, though.
Is there anything at all about the number 64 that could be related to
the receiver?

I'm pretty sure that the box was getting receive interrupts because
every time I sent a packet to it from the outside systat -vm showed
a PCI interrupt for the network device.  However 'netstat -in 1' did
not show the statistics for the received packets until 64 had 
accumulated.  It could be that the statistics are not being accumulated
on a per-reception basis and that the receive packets are actually
getting through, and that it's the transmit side which is broken.  I don't
know the code well enough yet to make the determination.

Previously it was not possible to add debugging code due to the amount
of network traffic involved.  With the new card, though, it should be
possible to add conditional debugging code that could then be turned on
with the sysctl because the network does not lock up completely (so I can
still run 'sysctl' even if it takes it 5 minutes to load over NFS).
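
A rough userland sketch of what such conditional debugging might look like, assuming a flag that can be flipped at run time. In a kernel the flag would be exported through sysctl; here it is just a struct field, and the names rx_debug and trace_rxeof are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct rx_debug {
    int enabled;           /* stand-in for a sysctl-controlled knob */
    unsigned long calls;   /* how many times the handler was entered */
};

/*
 * Call at the top of the receive handler under investigation.
 * Always counts the call; prints a trace line only when enabled.
 * Returns 1 if a line was printed, 0 otherwise.
 */
static int trace_rxeof(struct rx_debug *d, unsigned pending)
{
    assert(d != NULL);
    d->calls++;
    if (!d->enabled)
        return 0;
    printf("rxeof: call %lu, %u frame(s) pending\n", d->calls, pending);
    return 1;
}
```

Keeping the call counter live even while printing is off means the count can be compared with the interrupt count from systat once the flag is turned on.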

:Yes, but the one vital fact you keep leaving out is: does this always
:happen with the same machine. If so, then describe this machine. What
:PCI chipset does it have? And more to the point, what cards have you
:used in this machine that *didn't* exhibit this problem.
:
:No wait, let me guess: Intel fxp. Right? G.

I only have one machine with this configuration (diskless workstation,
everything running over NFS, plus X Display), so yes.  The problem only
occurs on one machine.  It started occurring mid-year, after I threw the
card in that used the xl driver.  The previous ethernet card used a 'de'
driver I believe and didn't have the problem.  The only 'fxp' ethernets
I have are in two of my test boxes - built into the motherboard.  I
don't think I have any PCI cards that use that driver.  The LinkSys
card in my server has never locked up, and the card using the 'xl' driver
in my other diskless test machine (which doesn't have an X display)
has never locked up either.

:> And watch what happens after I managed to 'ifconfig dc0 media auto',
:> it goes back to normal... suddenly everything is working properly
:> again.
:
:And what happens if instead of auto, you use "ifconfig dc0 media 100baseTX
:mediaopt full-duplex" to lock the media setting down? Or what happens if
:you shut down and restart the X server?
:
:-Bill

I'll try that next time the problem occurs but I doubt it will have 
any effect.  Changing the duplex mode does not appear to reset the port 
whereas forcing the media to 'auto' does appear to reset the port.  This 
is actually another problem (switches don't appear to pick up the duplex
change if the port isn't reset), but not one I'm concerned with.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-23 Thread Bill Paul

Of all the gin joints in all the towns in all the world, Matthew Dillon 
had to walk into mine and say:

> Heh heh.  Sorry about this, I believe I have further information on
> another older problem.  Bill, remember those ethernet lockups I was 
> having with the 'xl' driver all those months ago that we could never
> track down?

And remember how I kept telling you that I could never duplicate the
problem here?

> Well, they happen with the 'dc' driver too.  But this time I'm not getting
> a complete lockup.  The network actually continues to work well enough,
> well, just barely well enough, that I can still use it.  slowly.
> 
> It appears that the 'dc' driver continues to take receive interrupts
> (see the systat -vm snapshot at the end), but winds up not processing 
> any of the packets.  Except when 64 packets accumulate then suddenly all
> 64 get processed all at once!  Then nothing again until the next 64
> accumulate.

Uh. That's... strange. First of all, you haven't said if this is the
same machine that experienced the problems with the xl driver. Second,
the number 64 sticks out in this case. If you look at if_dc.c (uh...
you did actually look at the code, right?), you'll see that dc_encap()
will only ask for a "TX done" interrupt every 64 packets. Why? Well,
reclaiming transmit buffers is a fairly unimportant task and I wanted to 
cut down on the number of interrupts that were generated, and when the
tulip reaches the last descriptor in a transmit chain, it's supposed
to generate a "no more buffers in TX ring" interrupt, which will also
trigger a TX buffer reclamation (i.e. dc_txeof() will be called for
either interrupt).
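
A minimal sketch of the "ask for a TX done interrupt every 64 packets" idea, written as a userland model. The flag name DESC_CTL_INTR_ON_DONE and the helper tx_next_ctl are assumptions made for illustration; they are not the real if_dc.c descriptor bits or functions.

```c
#include <assert.h>
#include <stddef.h>

#define TX_INTR_INTERVAL 64          /* every 64th frame, as described above */
#define DESC_CTL_INTR_ON_DONE 0x1u   /* hypothetical descriptor control bit */

struct tx_state {
    unsigned since_intr;   /* frames queued since the last interrupt request */
};

/*
 * Return the control bits for the next transmit descriptor. Only every
 * TX_INTR_INTERVAL-th descriptor asks the chip to interrupt on completion.
 */
static unsigned tx_next_ctl(struct tx_state *s)
{
    assert(s != NULL);
    if (++s->since_intr >= TX_INTR_INTERVAL) {
        s->since_intr = 0;
        return DESC_CTL_INTR_ON_DONE;
    }
    return 0;
}
```

Over 128 queued frames this requests exactly two completion interrupts, on frames 64 and 128, so buffer reclamation (and any statistics tied to it) happens in batches.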

This behavior is controlled by the DC_TX_USE_TX_INTR flag, which
is set for the PNIC II chip. I also use the DC_TX_POLL flag, which
means that the chip is programmed to poll the TX ring and start
transmission itself rather than having the driver write to the
TX DMA start register. This means no register accesses on transmit,
which is always nice. You can ask for a "TX done" interrupt to be
scheduled for each transmitted packet by using the DC_TX_INTR_ALWAYS
flag, which is currently only used for the PNIC I (82c168/82c169)
because it blows goats.

Anyway. I *never* see this behavior on any of my test machines. I
have a LinkSys LNE100TX V2.0 card with the 82c115 chip, as well
as a couple of Macronix cards, a Davicom card, several Intel/DEC
21143 cards, ASIX cards and ADMtek cards, and PNIC I-based LinkSys
cards. None of them exhibit this behavior when I test them.

> This netstat is on the machine with the 'dc' driver that locked up, when
> I ping it from another machine.  The 'dc' driver still works--- barely.
> It doesn't process any packets until 64 have been received, then it
> processes them all at once.  The transmit side appears to work fine and
> the receive side appears to get interrupts but does not appear to process
> incoming packets.  Yet, obviously, the packets are being accumulated 
> somewhere because I don't have any packet loss, just incredibly long and
> odd ping times.

No no no. You can't say "the receive side appears to get interrupts."
That's speculation. You can stare at the machine and theorize about
what appears to be happening all you want: it won't do a damn bit of good 
until you actually test your theory. You know that an "RX done" interrupt
has been delivered if dc_rxeof() is called. So do something to verify
that it's being called: stick a printf() in dc_rxeof() that tells you
when it trips. Then duplicate the behavior and watch what happens.

> This occurs when I am running netscape on the same box over a remote X
> connection (read:  Lots of packets going over the network plus lots of
> local PCI activity talking to the graphics card).  Same problem occurs 
> with different graphics adapters but I believe this same problem also
> occured with the 'xl' driver on the card I had in before I put this
> card in.

Yes, but the one vital fact you keep leaving out is: does this always
happen with the same machine. If so, then describe this machine. What
PCI chipset does it have? And more to the point, what cards have you
used in this machine that *didn't* exhibit this problem.

No wait, let me guess: Intel fxp. Right? G.

I'm very puzzled by the fact that nobody else has *ever* reported
any problem even remotely like this. Of course, with the level of
feedback I get, it's possible that 50 people are having the same
problem and simply never bothered to tell me.

> And watch what happens after I managed to 'ifconfig dc0 media auto',
> it goes back to normal... suddenly everything is working properly
> again.

And what happens if instead of auto, you use "ifconfig dc0 media 100baseTX
mediaopt full-duplex" to lock the media setting down? Or what happens if
you shut down and restart the X server?

-Bill

-- 
==

Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-23 Thread Matthew Dillon

Heh heh.  Sorry about this, I believe I have further information on
another older problem.  Bill, remember those ethernet lockups I was 
having with the 'xl' driver all those months ago that we could never
track down?

Well, they happen with the 'dc' driver too.  But this time I'm not getting
a complete lockup.  The network actually continues to work well enough,
well, just barely well enough, that I can still use it.  slowly.

It appears that the 'dc' driver continues to take receive interrupts
(see the systat -vm snapshot at the end), but winds up not processing 
any of the packets.  Except when 64 packets accumulate then suddenly all
64 get processed all at once!  Then nothing again until the next 64
accumulate.

This netstat is on the machine with the 'dc' driver that locked up, when
I ping it from another machine.  The 'dc' driver still works--- barely.
It doesn't process any packets until 64 have been received, then it
processes them all at once.  The transmit side appears to work fine and
the receive side appears to get interrupts but does not appear to process
incoming packets.  Yet, obviously, the packets are being accumulated 
somewhere because I don't have any packet loss, just incredibly long and
odd ping times.

This occurs when I am running netscape on the same box over a remote X
connection (read:  Lots of packets going over the network plus lots of
local PCI activity talking to the graphics card).  Same problem occurs 
with different graphics adapters but I believe this same problem also
occurred with the 'xl' driver on the card I had in before I put this
card in.

dc0:  irq 5 at device 9.0 on pci0
dc0: Ethernet address: 00:a0:cc:69:4e:2d

dc0@pci0:9:0:   class=0x02 card=0xc00111ad chip=0xc11511ad rev=0x25 hdr=0x00


            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
        64     0       7188         48     0       4792     0
         0     0          0          2     0        332     0
        64     0       6962         46     0       4628     0
         0     0          0          2     0        348     0
        64     0       8268         46     0       4592     0
         0     0          0          2     0        348     0
        64     0       7704         46     0       4656     0
         0     0          0          2     0        332     0
        64     0       7228         47     0       4614     0
         0     0          0          2     0        332     0
        65     0       6972         47     0       4686     0
         0     0          0          3     0        522     0
        64     0      14472         42     0       4188     0
         0     0          0          3     0        422     0
        64     0       7724         44     0       4196     0
         0     0          0          1     0        134     0
        64     0       6768         49     0       4830     0
         0     0          0          2     0        332     0
        64     0       7440         45     0       4386     0
         0     0          0          0     0          0     0
         0     0          0          0     0          0     0
input(Total)   output

When I ping the machine faster from another box:

            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
64 0   6712 50 0   5108 0
64 0   6724 50 0   5132 0
64 0   7948 50 0   5116 0
64 0   6816 48 0   4978 0
64 0   7072 50 0   5208 0
64 0  46144 28 0   3058 0
64 0  37416 31 0   3290 0
64 0   6712 50 0   5108 0
64 0   7004 49 0   4898 0
64 0   6712 46 0    0
64 0   6724 50 0   4724 0
64 0   6432 50 0   4768 0
 0 0  0  0 0  0 0
64 0   6432 50 0   4768 0
64 0   6684 50 0   4724 0
64 0   6792 55 0   5554 0
64 0   6876 53 0   5402 0
64 0   6752 52 0   5212 0
64 0   6712  5 1   4622 0
 0 0  0  0 0 74 0
 0 0  0  0 0850 0

And watch what happens after I managed to 'ifconfig dc0 media auto',
it goes back to normal... suddenly everything is working properly
again.

input

Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-23 Thread Matthew Dillon

:Okay, I patched if_rl.c in -current to fix the problem demonstrated by 
:Matt's linktest program. The bug was actually on the receive side of the 
:rl driver, not the transmit side. A packet can wrap from the end of the 
:RX buffer back to the beginning, and in some cases these packets would 
:get lost due to botched use of m_pullup(). I can run the linktest 
:program now without losing any frames.
:
:There's another way around this which is to allocate a whole mbuf
:cluster when you know the packet is wrapped and bcopy the data manually
:instead of using m_devget(), but I'm not sure I want to waste a whole
:cluster just for that case.
:
:-Bill
:
:-- 
:=
:-Bill Paul(212) 854-6020 | System Manager, Master of Unix-Fu

Great!  Thanks for your help, Bill!

-Matt






Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-23 Thread Bill Paul

Okay, I patched if_rl.c in -current to fix the problem demonstrated by 
Matt's linktest program. The bug was actually on the receive side of the 
rl driver, not the transmit side. A packet can wrap from the end of the 
RX buffer back to the beginning, and in some cases these packets would 
get lost due to botched use of m_pullup(). I can run the linktest 
program now without losing any frames.

There's another way around this which is to allocate a whole mbuf
cluster when you know the packet is wrapped and bcopy the data manually
instead of using m_devget(), but I'm not sure I want to waste a whole
cluster just for that case.
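
For illustration, the manual-copy approach for a wrapped packet can be sketched in userland as a two-piece copy out of a circular buffer. This is not the actual if_rl.c change: it uses memcpy() on plain byte arrays instead of bcopy() into an mbuf cluster, and ring_copy_out is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes that start at offset off in a circular buffer of
 * ring_size bytes into a flat destination, handling wraparound.
 */
static void ring_copy_out(const unsigned char *ring, size_t ring_size,
    size_t off, unsigned char *dst, size_t len)
{
    size_t first;

    assert(off < ring_size && len <= ring_size);
    first = ring_size - off;      /* bytes available before the end */
    if (first > len)
        first = len;              /* packet does not wrap */
    memcpy(dst, ring + off, first);
    /* Wrapped tail continues at the start of the ring (may be 0 bytes). */
    memcpy(dst + first, ring, len - first);
}
```

A packet that starts two bytes before the end of an 8-byte ring and is six bytes long comes out as the last two ring bytes followed by the first four, so the consumer sees one contiguous frame.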

-Bill

-- 
=
-Bill Paul(212) 854-6020 | System Manager, Master of Unix-Fu
Work: [EMAIL PROTECTED] | Center for Telecommunications Research
Home:  [EMAIL PROTECTED] | Columbia University, New York City
=
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=





Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-23 Thread Mikko T


Just a quick note, not entirely on-topic:

Bill Paul wrote:

[...]

>Yes, I know there's a minimum frame length of 60 bytes. And the rl_encap()
>routine has the following code:

>    /* Pad frames to at least 60 bytes. */
>    if (m_head->m_pkthdr.len < RL_MIN_FRAMELEN) {
>        m_head->m_pkthdr.len +=
>            (RL_MIN_FRAMELEN - m_head->m_pkthdr.len);
>        m_head->m_len = m_head->m_pkthdr.len;
>    }

[...]

>60 bytes, I just bump up m_pkthdr.len and m_len. This adjusted
>length gets used later in rl_start() when transmission is triggered.

I haven't read through the code yet, so I don't know where the extra
memory in that buffer originated from, or rather if it has been zeroed
before reaching this point.  Otherwise you are leaking data from the
kernel out to the network.

Other OSes have done this before.  It can be used for "data fishing"
by just pinging the machine.  Eventually it turns up all sorts of
interesting information ([partial] passwords, for example).

How many other NICs are unable to auto-pad, and how many of the
drivers just add "random" data that happened to be lying around
inside the kernel...?
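
One way to avoid the leak, sketched in userland under the assumption that the buffer has room for the padding: zero the pad bytes explicitly instead of only bumping the length. The helper pad_frame_zeroed and the MIN_FRAMELEN constant are illustrative names, not the driver's own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN_FRAMELEN 60   /* minimum frame length quoted in this thread */

/*
 * Pad a frame of len bytes up to MIN_FRAMELEN, zero-filling the pad so
 * that stale buffer contents never reach the wire. cap is the buffer
 * capacity. Returns the (possibly increased) frame length.
 */
static size_t pad_frame_zeroed(unsigned char *buf, size_t len, size_t cap)
{
    assert(buf != NULL && cap >= MIN_FRAMELEN && len <= cap);
    if (len < MIN_FRAMELEN) {
        memset(buf + len, 0, MIN_FRAMELEN - len);
        len = MIN_FRAMELEN;
    }
    return len;
}
```

The extra memset() touches at most 60 bytes per short frame, so the cost is small next to the copy that already coalesces the packet into a single buffer.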

   Just curious,
   /Mikko

   (Off to make sure that if_ed in my home firewall isn't doing
anything like this...)






Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-23 Thread Matthew Dillon

Ok, here's the current status:  The RealTek boards ('rl' driver, D-Link
brand, RealTek chip vendor) appear to have serious packet loss problems 
with small packets.  The cause is currently unknown.  I had two different 
machines (an older PPro 200 and a somewhat newer K6-2/233) with the 
boards in and both exhibited the problem.

The problem is fairly trivial to reproduce using linktest:

http://www.backplane.com/FreeSrc/linktest-1.1.c

host1# linktest -s 16 -f8 host2
host2# linktest -s 16 -f8 host1

These boards were the cause of my TCP problems.

The D-Link boards came with the D-Link switch I had purchased.  I removed
the boards and replaced them with the two LinkSys boards that came with
the LinkSys switch I had purchased.

The LinkSys boards ('dc' driver, LNE100TX+ fame, LC82C115 PNIC II vendor)
do not appear to have the packet loss problem.  I have not had a 
recurrence of my TCP glitches and my linktest tests have all come out
roses.

I'm hoping Bill will be able to find the problem with the D-Link boards,
just so everyone else using them doesn't hit the same hangup, but my
problem at least appears to be solved after replacing the boards.  I've
stuck my D-Link board into another diskless test machine and it's 
available for testing potential fixes, debugging, etc.

In regards to the switches themselves:  Both the LinkSys and the D-Link
5-Port switches appear to work well.  I've interchanged them with each
other and tested them pretty significantly with four machines attached.
The LinkSys seems to be limited to around 25 MBytes/sec in aggregate
throughput.  The D-Link maxed out my machines (35 MBytes/sec) so I do
not know what its ultimate limitation is.  The small-packet test maxed
out my machines at 35,000 packets per second.  So while I couldn't find
the limitations of the switches, they're plenty good enough for me!

The only problem I've come up against is that when I change the duplex
with ifconfig the ethernet port is not reset and the switches do not
recognize that the duplex has changed.  If I 'ifconfig XXX media auto',
however, the ports are reset and the switches negotiate full-duplex
properly.  If I ifconfig between 10 and 100BaseT the ports are reset and
the switches appear to figure out the mode properly as well.

So that's where I am.  There was never anything wrong with the switches
or the cabling - the entire problem was due to the D-Link ethernet cards.

-Matt






Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-22 Thread Bill Paul

Of all the gin joints in all the towns in all the world, Matthew Dillon 
had to walk into mine and say:
 
> (taking this off -current)
> 
> apollo# linktest -s 51 -f1 lander 1-51 byte payload -> errors
> lander# linktest -s 51 -f1 apollo
> 
> apollo# linktest -s 52 -f1 lander 52+ byte payload -> no errors
> lander# linktest -s 52 -f1 apollo
> 
> 
> You know, this kinda sounds like a jabber lockup.
> 
> Bill, are you following the *MINIMUM* ethernet frame size specification 
> for ethernet?

*sigh* No, I've been living on Mars since 1975 and we don't get IEEE spec
documents up here.

Yes, I know there's a minimum frame length of 60 bytes. And the rl_encap()
routine has the following code:

    /* Pad frames to at least 60 bytes. */
    if (m_head->m_pkthdr.len < RL_MIN_FRAMELEN) {
        m_head->m_pkthdr.len +=
            (RL_MIN_FRAMELEN - m_head->m_pkthdr.len);
        m_head->m_len = m_head->m_pkthdr.len;
    }

The RealTek doesn't autopad, so you have to handle it manually. You're
only allowed one DMA buffer per transmission, so outbound packets are
coalesced into a single mbuf cluster buffer in rl_encap(). A cluster
buffer is always 2K, and frames can never be larger than 1514 bytes, so
we know there'll always be plenty of room. In the case of frames less
than 60 bytes, I just bump up m_pkthdr.len and m_len. This adjusted
length gets used later in rl_start() when transmission is triggered.

Incidentally, you should be using tcpdump -n -e -i rl0 to measure the
actual frame length of failing and succeeding transmissions: that's
usually a much better indicator of what might be going wrong. You could
calculate it from the data buffer length, but I suck at math; I find it's
easier just to monitor the offending frames.

-Bill

=
-Bill Paul(212) 854-6020 | System Manager, Master of Unix-Fu
Work: [EMAIL PROTECTED] | Center for Telecommunications Research
Home:  [EMAIL PROTECTED] | Columbia University, New York City
=
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=





Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-22 Thread Jonathan Lemon

On Dec 12, 1999 at 01:41:04AM -0500, Bill Paul wrote:
> Of all the gin joints in all the towns in all the world, Matthew Dillon 
> had to walk into mine and say:
>  
> > I'm adding Bill Paul to the list specifically.
> > 
> > Hmm.  Now this is odd!  I think I may have found something!
> > 
> > All of my 'rl' driver cards fail this test:
> 
> Oh sure. Bet the farm on the absolute worst NIC on the whole damn planet,
> why don't you.

Sorry, but I can't resist quoting this:

/*
 * The RealTek 8139 PCI NIC redefines the meaning of 'low end.' This is
 * probably the worst PCI ethernet controller ever made, with the possible
 * exception of the FEAST chip made by SMC.
 */

--
Jonathan





Re: Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-22 Thread Bill Paul

Of all the gin joints in all the towns in all the world, Matthew Dillon 
had to walk into mine and say:
 
> I'm adding Bill Paul to the list specifically.
> 
> Hmm.  Now this is odd!  I think I may have found something!
> 
> All of my 'rl' driver cards fail this test:

Oh sure. Bet the farm on the absolute worst NIC on the whole damn planet,
why don't you. Why spend a few bucks on some nice 3c905B or 3c905C cards
and beat up on them when you can buy ten RealTek cards for a dollar. About
as reliable as a pair of tin cans and a piece of string, but gosh they
sure are cheap.

You'll have to wait until at least tomorrow before I can look into this,
since I won't be able to do any debugging until I throw my one and only
RealTek 8139 sample adapter into a machine and run some tests with it.

>   rl0:  irq 11 at device 3.0 on pci0
>   rl0: Ethernet address: 00:50:ba:d1:89:05
>   miibus0:  on rl0

pciconf -l would be nice here too (to see the PCI revision code).
 
> Methinks there is something going on with the 'rl' driver and/or
> the RealTek cards!

Gee, y'think? I don't suppose you ran any similar tests with, say,
one of those LinkSys cards you had the other day. Or maybe a 3Com card.
I mean, it's just a little anti-climactic, you know? I put all that
blood, sweat and tears into if_xl and if_dc, but do people do stress
tests with them to help me identify weaknesses? No, they pound on
the house of cards that is if_rl.

*sigh*

-Bill

-- 
=
-Bill Paul(212) 854-6020 | System Manager, Master of Unix-Fu
Work: [EMAIL PROTECTED] | Center for Telecommunications Research
Home:  [EMAIL PROTECTED] | Columbia University, New York City
=
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=





Woa! May have found something - 'rl' driver and small packets (was Re: Odd TCP glitches in new currents)

1999-12-22 Thread Matthew Dillon

I'm adding Bill Paul to the list specifically.

Hmm.  Now this is odd!  I think I may have found something!

All of my 'rl' driver cards fail this test:

apollo# linktest -m 0.1:0.2 -s 16 -f16 lander
lander# linktest -m 0.1:0.2 -s 16 -f16 apollo

They get about 1% packet loss with the test.  Always.  
100BaseTX full or half duplex, or 10BaseT -- I still get
failures.

rl0:  irq 11 at device 3.0 on pci0
rl0: Ethernet address: 00:50:ba:d1:89:05
miibus0:  on rl0

All of my 'fxp' driver cards succeed with the above test perfectly.
If I test an fxp machine versus an 'rl' machine, linktest shows that
the 'rl' cards can transmit small packets just fine but they lose
out trying to receive them!

(test3 has an 'fxp' driver, apollo has an 'rl' driver.  Both are
on the same switch!)

test3(216.240.41.13)->apollo.backplane.com  lost 79/89027
test3(216.240.41.13)->apollo.backplane.com  lost 80/89990
test3(216.240.41.13)->apollo.backplane.com  lost 81/90953
test3(216.240.41.13)->apollo.backplane.com  lost 82/92879
test3(216.240.41.13)->apollo.backplane.com  lost 83/93842
test3(216.240.41.13)->apollo.backplane.com  lost 84/94805
test3(216.240.41.13)->apollo.backplane.com  lost 85/96730

Methinks there is something going on with the 'rl' driver and/or
the RealTek cards!

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>



