Re: No buffer space available

2002-01-28 Thread Terry Lambert

Shizuka Kudo wrote:
> Thanks for your response. I wonder if I misunderstand
> your advice. When looking at the if_rl.c (dated Dec
> 14), there's already a timer attached to
> ifp->if_watchdog. Is this the timer you referred to?
> If so, it looks like this timer is never called by the
> driver in my case, as I never saw a "watchdog timeout"
> error.
> 
> Any advice?

It's not clear to me that the watchdog timer has been
initialized at the time of the problem.

The rl_txeof() function zeros it.

Are you sure you are not getting *one* watchdog reset?

One thing you might try is to put:

rl_reset(sc);

between the rl_rxeof() call and the rl_init() call in the
rl_watchdog() code.  I'm not positive this is the right
place for it: perhaps it -- and the rl_init()? -- should
be before the rl_txeof() call.  It is noticeable that
rl_reset() is called before rl_init() everywhere else in
the error recovery paths, but not here; it is also what
happens when you down and re-up the interface.
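
To make the ordering concrete, here is a rough userspace sketch.
It is not the real if_rl.c: every model_* and STEP_* name is made
up for illustration, the call order is the one implied by the text
above, and the idea that a reset un-wedges the chip is only an
assumption.  All the sketch does is record the order in which the
recovery steps run.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; none of this is the real if_rl.c code. */
enum model_step { STEP_TXEOF, STEP_RXEOF, STEP_RESET, STEP_INIT };

#define MODEL_LOG_MAX 8

struct model_softc {
	enum model_step log[MODEL_LOG_MAX];	/* recovery steps, in call order */
	size_t nlog;				/* number of steps recorded */
	int wedged;				/* 1 while the modelled chip is stuck */
};

/* Record one recovery step so the ordering can be inspected afterwards. */
void
model_record(struct model_softc *sc, enum model_step step)
{
	assert(sc != NULL);
	if (sc->nlog < MODEL_LOG_MAX)
		sc->log[sc->nlog++] = step;
}

void model_txeof(struct model_softc *sc) { model_record(sc, STEP_TXEOF); }
void model_rxeof(struct model_softc *sc) { model_record(sc, STEP_RXEOF); }
void model_init(struct model_softc *sc)  { model_record(sc, STEP_INIT); }

/* ASSUMPTION: a reset is what un-wedges the modelled chip. */
void
model_reset(struct model_softc *sc)
{
	model_record(sc, STEP_RESET);
	sc->wedged = 0;
}

/* Watchdog path with the extra reset between the rxeof and init steps. */
void
model_watchdog(struct model_softc *sc)
{
	model_txeof(sc);
	model_rxeof(sc);
	model_reset(sc);	/* the suggested addition */
	model_init(sc);
}
```

Moving the reset (and possibly the init) ahead of the txeof step,
the other placement mentioned above, would just be a reordering of
the four calls in model_watchdog().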



It looks like the watchdog doesn't cover the case where
the receive interrupt is lost; it's specific to the
transmit interrupt.

This won't help with incoming connections initiated by
a remote side (the initial SYN of the three way handshake)
if the thing is wedged at the time, but...

One possible workaround that would cause the transmit to
fix the receive in case the receive interrupt was lost
would be to call rl_rxeof(sc) as the first thing in the
rl_txeof() routine.  That way, a lost interrupt would be
recovered when your ping packet went out, by reaping the
receivable data without an interrupt at all (basically,
it makes it into a "poll on transmit" model, which is a
really bad model, since it fails in the case I noted,
but what the hack.  8-)).
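
A minimal userspace sketch of that "poll on transmit" idea, under
loud assumptions: struct model_nic and the poll_* functions are
invented for illustration and only count frames; they are not the
driver's rl_rxeof()/rl_txeof().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; the fields are hypothetical counters. */
struct model_nic {
	int rx_pending;		/* frames sitting in the receive ring */
	int rx_delivered;	/* frames handed up to the stack */
	int tx_inflight;	/* transmits still awaiting completion */
};

/* Reap everything waiting on the receive side. */
void
poll_rxeof(struct model_nic *nic)
{
	assert(nic != NULL);
	nic->rx_delivered += nic->rx_pending;
	nic->rx_pending = 0;
}

/* Transmit completion that polls the receive side as its first step. */
void
poll_txeof(struct model_nic *nic)
{
	poll_rxeof(nic);	/* recover from a lost receive interrupt */
	nic->tx_inflight = 0;	/* then do the normal transmit cleanup */
}
```

Note that nothing here runs unless a transmit completes, which is
exactly why it cannot help with a connection initiated from the
remote side while the interface is wedged.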

If the problem is a race window in the receive interrupt
for the flag getting set (bad hardware, bad flag checks
in the driver, etc.), one possible workaround would be to
call the rl_rxeof() unconditionally in the interrupt, even
if you *think* the interrupt is not for the rl device (i.e.
perhaps the interrupt is sent before the RL_INTRS flag is
set in the status word, or perhaps the reading of the
status word is prone to failure).

The way to handle this is to change the for(;;)
loop in the rl_intr() function;  specifically, move the

	if ((status & RL_INTRS) == 0)
		break;

to the *end* of the loop, after the status checks.  You may
also want to *unconditionally* call rl_rxeof(), instead of
doing the call conditionally, just to be sure (do this
only if nothing else fixes the problem for you).
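
As a hedged sketch of the loop shape this produces (a userspace
model, not the real handler): the MODEL_* bit values, struct
model_chip and the loop_* functions are all assumptions made for
illustration, and only the receive side is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status bits; not the driver's real register layout. */
#define MODEL_ISR_RX_OK	0x0001u
#define MODEL_ISR_TX_OK	0x0004u
#define MODEL_INTRS	(MODEL_ISR_RX_OK | MODEL_ISR_TX_OK)

struct model_chip {
	const unsigned *status_reads;	/* values successive status reads return */
	size_t nreads;			/* number of scripted reads */
	size_t next;			/* index of the next scripted read */
	int rx_pending;			/* frames waiting in the receive ring */
	int rx_delivered;		/* frames handed up to the stack */
};

/* Return the next scripted status value, or 0 once the script runs out. */
unsigned
loop_read_status(struct model_chip *c)
{
	if (c->next < c->nreads)
		return c->status_reads[c->next++];
	return 0;
}

/* Reap everything waiting on the receive side. */
void
loop_rxeof(struct model_chip *c)
{
	c->rx_delivered += c->rx_pending;
	c->rx_pending = 0;
}

/* Handler with the exit test moved to the end of the loop body. */
void
loop_intr(struct model_chip *c)
{
	unsigned status;

	assert(c != NULL);
	for (;;) {
		status = loop_read_status(c);
		loop_rxeof(c);			/* unconditional receive reap */
		if ((status & MODEL_INTRS) == 0)
			break;			/* the test now comes last */
	}
}
```

With the test at the end, the body runs at least once per call, so
pending receive data is reaped even when the status read shows no
interrupt bits set.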

What's the net effect of this?

The overall effect of doing this would be to slow down any
device that shared a PCI interrupt with the if_rl card(s)
in your system.  This is why it's not done by default.


Another possible approach is a *long* watchdog -- a second
watchdog timer.  Basically, this timer would fire and call
the rl_intr() function on the interface, as if there had
been a hardware interrupt.  You would not want to do this
more than once a second.  The tricky part here is that you
will need a wrapper function to raise and lower the SPL
over the call (I'm actually curious why the current
watchdog timer can get away with not raising the SPL to
splbio from splnet, but I suppose it's so incredibly rare
it's not a practical problem).
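
The wrapper might look roughly like the following userspace sketch.
Everything in it is an assumption for illustration: the priority
level is modelled as a plain integer, the lw_* functions only
imitate the raise/restore pattern, and scheduling the timer (no more
than once a second) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priority levels; higher blocks more. */
#define MODEL_SPL_NONE	0
#define MODEL_SPL_HIGH	1

struct model_cpu {
	int spl;			/* current modelled priority level */
};

struct model_dev {
	struct model_cpu *cpu;
	int intr_calls;			/* times the handler has run */
	int spl_seen_in_intr;		/* level in force while it ran */
};

/* Raise the level (never lower it) and return the previous one. */
int
lw_splraise(struct model_cpu *cpu, int level)
{
	int old = cpu->spl;

	if (level > old)
		cpu->spl = level;
	return old;
}

/* Restore a level previously returned by lw_splraise(). */
void
lw_splx(struct model_cpu *cpu, int old)
{
	cpu->spl = old;
}

/* Stand-in for the interrupt handler; records the level it ran at. */
void
lw_intr(struct model_dev *dev)
{
	dev->intr_calls++;
	dev->spl_seen_in_intr = dev->cpu->spl;
}

/* Timer callback: run the handler as if a hardware interrupt fired. */
void
lw_long_watchdog(void *arg)
{
	struct model_dev *dev = (struct model_dev *)arg;
	int s;

	assert(dev != NULL && dev->cpu != NULL);
	s = lw_splraise(dev->cpu, MODEL_SPL_HIGH);
	lw_intr(dev);
	lw_splx(dev->cpu, s);		/* always restore the saved level */
}
```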

If none of this works, you might consider labelling the
hardware as "broken", and swapping out the rl interface
(probably means swapping out the motherboard for you,
but rl 10/100 cards are US$9 at Fry's these days, so it
would be a cheap experiment, if you wanted to try it
that way instead).

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: No buffer space available

2002-01-27 Thread Shizuka Kudo

Terry,

Thanks for your response. I wonder if I misunderstand
your advice. When looking at the if_rl.c (dated Dec
14), there's already a timer attached to
ifp->if_watchdog. Is this the timer you referred to?
If so, it looks like this timer is never called by the
driver in my case, as I never saw a "watchdog timeout"
error.

Any advice?

--- Terry Lambert <[EMAIL PROTECTED]> wrote:
> Shizuka Kudo wrote:
> > Is anyone still seeing the "No buffer space available"
> > message in 5.0-CURRENT? I have checked the mail
> > archive and saw several replies, but none worked in
> > my case.
> > 
> > I have a Thinkpad 600X with a Melco cardbus 10/100
> > ethernet card (a Realtek 8139B) running 5.0 NEWCARD
> > kernel with NMBCLUSTERS=16384.
> > "No buffer space available" occurred when I tried to
> > ftp a file in my Thinkpad from another client. "ifconfig
> > rl0 down" and then "ifconfig rl0 up" resumed the
> > operation for a while until the error happened again.
> > Setting media to 10baseT/UTP did not suffer from this
> > error and got about 900Kbytes/s throughput.
> > 
> > Would that be a bug in the driver that ftp server is
> > delivering too much traffic to the NIC? Any suggestion
> > that I can try?
> 
> That downing and reupping it worked indicates that you
> are losing a transmit interrupt draining the write
> queue.
> 
> I'm pretty sure that this has to be related to the use
> of "interrupt threads" in -current, as I do not have the
> problem with the 8139B's I have, with an older, stable
> version of FreeBSD.
> 
> The downing and reupping it resets the card (this is an
> intentional side effect, designed to make Tigon II cards
> suck^W^W^W^W^Wallow recovery from a hosed driver that
> people would rather hack around than fix^W^W^W^W^W^W^W^W,
> which should never happen in practice).
> 
> The "normal" workaround for this is to have a software
> "watchdog" timer that resets the card when it loses its
> mind like this (transmit interrupt pending, no transmit
> interrupt seen for timeout period).  You could add similar
> code to the RealTek driver pretty easily, using the Tigon
> II or another driver as a template, if you wanted to work
> around the problem instead of actually fixing it^W^W^W^W^W.
> 
> -- Terry





Re: No buffer space available

2002-01-25 Thread Terry Lambert

Shizuka Kudo wrote:
> Is anyone still seeing the "No buffer space available"
> message in 5.0-CURRENT? I have checked the mail
> archive and saw several replies, but none worked in
> my case.
> 
> I have a Thinkpad 600X with a Melco cardbus 10/100
> ethernet card (a Realtek 8139B) running 5.0 NEWCARD
> kernel with NMBCLUSTERS=16384.
> "No buffer space available" occurred when I tried to
> ftp a file in my Thinkpad from another client. "ifconfig
> rl0 down" and then "ifconfig rl0 up" resumed the
> operation for a while until the error happened again.
> Setting media to 10baseT/UTP did not suffer from this
> error and got about 900Kbytes/s throughput.
> 
> Would that be a bug in the driver that ftp server is
> delivering too much traffic to the NIC? Any suggestion
> that I can try?

That downing and reupping it worked indicates that you
are losing a transmit interrupt draining the write
queue.

I'm pretty sure that this has to be related to the use
of "interrupt threads" in -current, as I do not have the
problem with the 8139B's I have, with an older, stable
version of FreeBSD.

The downing and reupping it resets the card (this is an
intentional side effect, designed to make Tigon II cards
suck^W^W^W^W^Wallow recovery from a hosed driver that
people would rather hack around than fix^W^W^W^W^W^W^W^W,
which should never happen in practice).

The "normal" workaround for this is to have a software
"watchdog" timer that resets the card when it loses its
mind like this (transmit interrupt pending, no transmit
interrupt seen for timeout period).  You could add similar
code to the RealTek driver pretty easily, using the Tigon
II or another driver as a template, if you wanted to work
around the problem instead of actually fixing it^W^W^W^W^W.

-- Terry




Re: "No buffer space available" errors

2000-09-18 Thread Matthew N. Dodd

On Mon, 18 Sep 2000, Ben Smithurst wrote:
> Does anyone have any clue what could cause errors like this?  I've
> been seeing this sort of stuff since the SMPng commit, IIRC.  I'm sure
> there's more information I should be giving, so just let me know what
> to find. dmesg is at the end.
...
> ep0: <3Com 3C509-Combo EtherLink III> at port 0x300-0x30f irq 10 on isa0
> ep0: Ethernet address 00:a0:24:eb:f4:a2

Run 'netstat -m' when you encounter the problem and email me the results.

-- 
| Matthew N. Dodd  | '78 Datsun 280Z | '75 Volvo 164E | FreeBSD/NetBSD  |
| [EMAIL PROTECTED] |   2 x '84 Volvo 245DL| ix86,sparc,pmax |
| http://www.jurai.net/~winter | This Space For Rent  | ISO8802.5 4ever |






Re: "No buffer space available" errors

2000-09-18 Thread Bosko Milekic


On Mon, 18 Sep 2000, Ben Smithurst wrote:

> Does anyone have any clue what could cause errors like this?  I've
> been seeing this sort of stuff since the SMPng commit, IIRC.  I'm sure
> there's more information I should be giving, so just let me know what to
> find. dmesg is at the end.

This looks an awful lot like something I was seeing during early
  testing while adding locking to the mbuf system. Try `netstat -m' to see
  how many mbuf clusters are allocated. I would guess that the system is
  unable to allocate clusters reliably. In my case, at the time, I had
  forgotten to change a pointer dereference to meet the new structure, and
  thus it just worked out that after allocating the initial amount of
  clusters, nothing more was possible to allocate. I haven't seen this
  problem after fixing my mistake, nor before introducing it. None of the
  work I mentioned has been committed at any point in time (yet), so the
  problem can only be similar, at best (in any case, a `netstat -m' should
  offer a clue).

  Regards,
  Bosko Milekic
  [EMAIL PROTECTED]







Re: No buffer space available errors

1999-11-10 Thread Bill Marquette

On Wed, 10 Nov 1999, Doug Ambrisko wrote:
> On one out of 8 machines, I ran into this problem.  My network is running
> at 100BaseTX.  I noticed that ifconfig showed OACTIVE flag set and I
> was running in autosense mode.  So I set up the media to 100BaseTX and now
> it works okay.
> 
> My guess is the autosense gets confused sometimes.

Doug, thanks for the tip, I'll give that a try as I haven't been forcing
either card to 10Mbit or TP connections.  I have added to the mystery
however and this is aggravatingly puzzling.

In my previous message I had mentioned that walking away from the console
for any length of time would cause the machine to generate the "No buffer
space" errors.  Since then I've turned off my screensaver (Matrix from
KDE) and the system has stabilized completely; well almost completely.  I
can still duplicate the buffer situation and I've still seen it occur when
doing fairly high bandwidth (for cable, not for ethernet) file transfers.
If I can manage to bring the xfer rate up to around 100K/sec total between
inbound and outbound I experience the "Out of buffer" messages within a
couple minutes.  netstat -m doesn't show anything unusual.  I am going to
try removing both NMBCLUSTERS and NBUF from my config and just use a
maxusers of 128 and see how that goes.

--Bill







Re: No buffer space available errors

1999-11-10 Thread Doug Ambrisko

Bill Marquette writes:
| For the last week and a half or so I've been trying to track this down
| assuming it was a configuration error on my part or a problem with my
| ISP's DHCP configuration.  After switching from a DEC 20141 chipset card
| to a 3com 3c905, I found I was still having problems although the error
| message had changed.  I'm now back to the DEC card cause I found more
| results in the mailing lists on the errors I was seeing.
| 
| At first it appeared to be dhclient having issues, but I've finally found
| occurrences of other programs showing symptoms prior to dhclient log
| entries.  Also, until today I couldn't replicate the problem while sitting
| at the console, it would only show itself after I'd been away for a few
| minutes (I swear it has a mind of its own) usually, when I was going to
| be away for more than 10-15 minutes.  Today I managed to force the machine
| to have problems by doing multiple simultaneous downloads and uploads, I
| was running about 100K/sec through the NIC.

On one out of 8 machines, I ran into this problem.  My network is running
at 100BaseTX.  I noticed that ifconfig showed OACTIVE flag set and I
was running in autosense mode.  So I set up the media to 100BaseTX and now
it works okay.

My guess is the autosense gets confused sometimes.

Doug A.

