Re: No buffer space available
Shizuka Kudo wrote:
> Thanks for your response. I wonder if I misunderstand
> your advice. When looking at the if_rl.c (dated Dec
> 14), there's already a timer attached to
> ifp->if_watchdog. Is this the timer you referred to?
> If so, it looks like this timer is never called by the
> driver in my case, as I never saw a "watchdog timeout"
> error.
>
> Any advice?

It's not clear to me that the watchdog timer has been initialized at the time of the problem. The rl_txeof() function zeros it. Are you sure you are not getting *one* watchdog reset?

One thing you might try is to put:

    rl_reset(sc);

between the rl_rxeof() call and the rl_init() call in the rl_watchdog() code. I'm not positive this is the right place for it: perhaps it -- and the rl_init()? -- should be before the rl_txeof() call.

It is noticeable that the rl_reset() function is used everywhere else before the rl_init() in the error recovery case, but not here. When you down and re-up the interface, that's what happens as well.

It looks like if the receive interrupt is lost, the watchdog doesn't cover that case; it's specific to the transmit interrupt. This won't help with incoming connections initiated by a remote side (the initial SYN of the three way handshake) if the thing is wedged at the time, but...

One possible workaround that would cause the transmit to fix the receive in case the receive interrupt was lost would be to call rl_rxeof(sc) as the first thing in the rl_txeof() routine. That way, a lost interrupt would be recovered when your ping packet went out, by reaping the receivable data without an interrupt at all (basically, it makes it into a "poll on transmit" model, which is a really bad model, since it fails in the case I noted, but what the heck. 8-)).

If the problem is a race window in the receive interrupt for the flag getting set (bad hardware, bad flag checks in the driver, etc.), one possible workaround would be to call rl_rxeof() unconditionally in the interrupt, even if you *think* the interrupt is not for the rl device (i.e. perhaps the interrupt is sent before the RL_INTRS flag is set in the status word, or perhaps the reading of the status word is prone to failure).

The way to handle this is to change the for(;;) loop in the rl_intr() function; specifically, move the

    if ((status & RL_INTRS) == 0)
        break;

to the *end* of the loop, after the check. You may also want to *unconditionally* call rl_rxeof(), instead of doing the call conditionally, just to be sure (do this only if nothing else fixes the problem for you).

What's the net effect of this? The overall effect of doing this would be to slow down any device that shared a PCI interrupt with the if_rl card(s) in your system. This is why it's not done by default.

Another possible approach is a *long* watchdog -- a second watchdog timer. Basically, this timer would fire and call the rl_intr() function on the interface, as if there had been a hardware interrupt. You would not want to do this more than once a second. The tricky part here is that you will need a wrapper function to raise and lower the SPL over the call (I'm actually curious why the current watchdog timer can get away with not raising the SPL to splbio from splnet, but I suppose it's so incredibly rare it's not a practical problem).

If none of this works, you might consider labelling the hardware as "broken", and swapping out the rl interface (probably means swapping out the motherboard for you, but rl 10/100 cards are US$9 at Frys these days, so it would be a cheap experiment, if you wanted to try it that way instead).

-- Terry

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
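For readers following the thread, the loop change described above (reap receive unconditionally and test the status bits last) can be sketched in a small user-space model. This is only an illustration under stated assumptions: every fake_* name and FAKE_* constant is hypothetical and is not taken from if_rl.c, and the model assumes that reading the status register also acknowledges it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real if_rl.c definitions. */
#define FAKE_INTR_RX 0x0001u
#define FAKE_INTR_TX 0x0004u
#define FAKE_INTRS   (FAKE_INTR_RX | FAKE_INTR_TX)

struct fake_softc {
    uint16_t isr;        /* simulated interrupt status register */
    int      rx_pending; /* frames sitting in the receive ring */
    int      tx_done;    /* completed transmits not yet reaped */
    int      rx_reaped;  /* frames handed up the stack so far */
    int      tx_reaped;  /* transmit completions processed so far */
};

/* Assumption of this model: reading the status also clears it. */
static uint16_t fake_read_status(struct fake_softc *sc)
{
    uint16_t status = sc->isr;
    sc->isr = 0;
    return status;
}

/* Drain whatever is in the receive ring, interrupt or not. */
static void fake_rxeof(struct fake_softc *sc)
{
    assert(sc->rx_pending >= 0);
    sc->rx_reaped += sc->rx_pending;
    sc->rx_pending = 0;
}

/* Process completed transmits. */
static void fake_txeof(struct fake_softc *sc)
{
    sc->tx_reaped += sc->tx_done;
    sc->tx_done = 0;
}

/* Usual shape: leave the loop as soon as no interrupt bit is set. */
static void intr_check_first(struct fake_softc *sc)
{
    for (;;) {
        uint16_t status = fake_read_status(sc);
        if ((status & FAKE_INTRS) == 0)
            break;
        if (status & FAKE_INTR_RX)
            fake_rxeof(sc);
        if (status & FAKE_INTR_TX)
            fake_txeof(sc);
    }
}

/* Workaround shape: reap receive unconditionally, test status last. */
static void intr_check_last(struct fake_softc *sc)
{
    for (;;) {
        uint16_t status = fake_read_status(sc);
        fake_rxeof(sc);
        if (status & FAKE_INTR_TX)
            fake_txeof(sc);
        if ((status & FAKE_INTRS) == 0)
            break;
    }
}
```

In this model, a handler entered with frames in the ring but no status bit set (the "lost receive interrupt" case) leaves the frames untouched with intr_check_first() and drains them with intr_check_last(). The cost is the one noted above: the receive path runs on every interrupt that reaches the handler, including those raised by other devices on a shared line.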
Re: No buffer space available
Terry,

Thanks for your response. I wonder if I misunderstand your advice. When looking at the if_rl.c (dated Dec 14), there's already a timer attached to ifp->if_watchdog. Is this the timer you referred to? If so, it looks like this timer is never called by the driver in my case, as I never saw a "watchdog timeout" error.

Any advice?

--- Terry Lambert <[EMAIL PROTECTED]> wrote:
> Shizuka Kudo wrote:
> > Is anyone still seeing the "No buffer space available"
> > message in 5.0-CURRENT? I have checked the mail
> > archive and saw several replies, but none worked in
> > my case.
> >
> > I have a Thinkpad 600X with a Melco cardbus 10/100
> > ethernet card (a Realtek 8139B) running 5.0 NEWCARD
> > kernel with NMBCLUSTERS=16384.
> > "No buffer space available" occurred when I tried to
> > ftp a file in my Thinkpad from other client. "ifconfig
> > rl0 down" and then "ifconfig rl0 up" resumed the
> > operation for a while until the error happened again.
> > Setting media to 10baseT/UTP did not suffer from this
> > error and got about 900Kbytes/s throughput.
> >
> > Would that be a bug in the driver that ftp server is
> > delivering too much traffic to the NIC? Any suggestion
> > that I can try?
>
> That downing and reupping it worked indicates that you
> are losing a transmit interrupt draining the write queue.
>
> I'm pretty sure that this has to be related to the use
> of "interrupt threads" in -current, as I do not have the
> problem with the 8139B's I have, with an older, stable
> version of FreeBSD.
>
> The downing and reupping it resets the card (this is an
> intentional side effect, designed to make Tigon II cards
> suck^W^W^W^W^Wallow recovery from a hosed driver that
> people would rather hack around than fix^W^W^W^W^W^W^W^W,
> which should never happen in practice).
>
> The "normal" workaround for this is to have a software
> "watchdog" timer that resets the card when it loses its
> mind like this (transmit interrupt pending, no transmit
> interrupt seen for timeout period). You could add similar
> code to the RealTek driver pretty easily, using the Tigon
> II or another driver as a template, if you wanted to work
> around the problem instead of actually fixing
> it^W^W^W^W^W.
>
> -- Terry
Re: No buffer space available
Shizuka Kudo wrote:
> Is anyone still seeing the "No buffer space available"
> message in 5.0-CURRENT? I have checked the mail
> archive and saw several replies, but none worked in
> my case.
>
> I have a Thinkpad 600X with a Melco cardbus 10/100
> ethernet card (a Realtek 8139B) running 5.0 NEWCARD
> kernel with NMBCLUSTERS=16384.
> "No buffer space available" occurred when I tried to
> ftp a file in my Thinkpad from other client. "ifconfig
> rl0 down" and then "ifconfig rl0 up" resumed the
> operation for a while until the error happened again.
> Setting media to 10baseT/UTP did not suffer from this
> error and got about 900Kbytes/s throughput.
>
> Would that be a bug in the driver that ftp server is
> delivering too much traffic to the NIC? Any suggestion
> that I can try?

That downing and reupping it worked indicates that you are losing a transmit interrupt draining the write queue.

I'm pretty sure that this has to be related to the use of "interrupt threads" in -current, as I do not have the problem with the 8139B's I have, with an older, stable version of FreeBSD.

The downing and reupping it resets the card (this is an intentional side effect, designed to make Tigon II cards suck^W^W^W^W^Wallow recovery from a hosed driver that people would rather hack around than fix^W^W^W^W^W^W^W^W, which should never happen in practice).

The "normal" workaround for this is to have a software "watchdog" timer that resets the card when it loses its mind like this (transmit interrupt pending, no transmit interrupt seen for timeout period). You could add similar code to the RealTek driver pretty easily, using the Tigon II or another driver as a template, if you wanted to work around the problem instead of actually fixing it^W^W^W^W^W.

-- Terry
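The software watchdog described above (arm a countdown when a transmit is handed to the card, disarm it when the transmit interrupt arrives, reset the card if the countdown expires) can be sketched as a small user-space model. It is an illustration only: the fake_* names and the 5 second timeout are assumptions, not code from any real driver, and the reset is reduced to a counter.

```c
#include <assert.h>

#define FAKE_TX_TIMEOUT 5 /* seconds; an assumed value for illustration */

/* Hypothetical, cut-down interface state. */
struct fake_if {
    int if_timer;   /* seconds until the watchdog fires; 0 = disarmed */
    int tx_pending; /* frames handed to the card, not yet completed */
    int resets;     /* how many times the watchdog reset the card */
};

/* Hand one frame to the card and arm the watchdog. */
static void fake_start(struct fake_if *ifp)
{
    ifp->tx_pending++;
    ifp->if_timer = FAKE_TX_TIMEOUT;
}

/* Transmit-complete interrupt: reap the frames, disarm the watchdog. */
static void fake_txeof(struct fake_if *ifp)
{
    assert(ifp->tx_pending >= 0);
    ifp->tx_pending = 0;
    ifp->if_timer = 0;
}

/* Recovery path: stands in for "reset the chip, then reinitialize". */
static void fake_watchdog(struct fake_if *ifp)
{
    ifp->resets++;
    ifp->tx_pending = 0;
    ifp->if_timer = 0;
}

/* Called once per second by a periodic timer. */
static void fake_slowtimo(struct fake_if *ifp)
{
    if (ifp->if_timer == 0)
        return; /* nothing outstanding, watchdog disarmed */
    if (--ifp->if_timer == 0)
        fake_watchdog(ifp);
}
```

With this shape, a transmit whose completion interrupt arrives in time never triggers a reset, while a lost transmit interrupt is recovered after FAKE_TX_TIMEOUT ticks. Note that it only covers the transmit side; a lost receive interrupt with no transmit outstanding leaves the timer disarmed.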
Re: "No buffer space available" errors
On Mon, 18 Sep 2000, Ben Smithurst wrote:
> Does anyone have any clue what could cause errors like this? I've
> been seeing this sort of stuff since the SMPng commit, IIRC. I'm sure
> there's more information I should be giving, so just let me know what
> to find. dmesg is at the end.
...
> ep0: <3Com 3C509-Combo EtherLink III> at port 0x300-0x30f irq 10 on isa0
> ep0: Ethernet address 00:a0:24:eb:f4:a2

Run 'netstat -m' when you encounter the problem and email me the results.

--
| Matthew N. Dodd | '78 Datsun 280Z | '75 Volvo 164E | FreeBSD/NetBSD |
| [EMAIL PROTECTED] | 2 x '84 Volvo 245DL | ix86,sparc,pmax |
| http://www.jurai.net/~winter | This Space For Rent | ISO8802.5 4ever |
Re: "No buffer space available" errors
On Mon, 18 Sep 2000, Ben Smithurst wrote:
> Does anyone have any clue what could cause errors like this? I've
> been seeing this sort of stuff since the SMPng commit, IIRC. I'm sure
> there's more information I should be giving, so just let me know what to
> find. dmesg is at the end.

This looks an awful lot like something I was seeing during early testing while adding locking to the mbuf system. Try `netstat -m' to see how many mbuf clusters are allocated. I would guess that the system is unable to allocate clusters reliably.

In my case, at the time, I had forgotten to change a pointer dereference to match the new structure, and so, after the initial number of clusters had been allocated, nothing more could be allocated. I haven't seen this problem after fixing my mistake, nor before introducing it. None of the work I mentioned has been committed at any point in time (yet), so the problem can only be similar, at best (in any case, a `netstat -m' should offer a clue).

Regards,
Bosko Milekic
[EMAIL PROTECTED]
Re: No buffer space available errors
On Wed, 10 Nov 1999, Doug Ambrisko wrote:
> On one out of 8 machines, I ran into this problem. My network is running
> at 100BaseTX. I noticed that ifconfig showed OACTIVE flag set and I
> was running in autosense mode. So I set up the media to 100BaseTX and now
> it works okay.
>
> My guess is the autosense gets confused sometimes.

Doug, thanks for the tip. I'll give that a try, as I haven't been forcing either card to 10Mbit or TP connections.

I have added to the mystery, however, and this is aggravatingly puzzling. In my previous message I had mentioned that walking away from the console for any length of time would cause the machine to generate the "No buffer space" errors. Since then I've turned off my screensaver (Matrix from KDE) and the system has stabilized completely; well, almost completely. I can still duplicate the buffer situation, and I've still seen it occur when doing fairly high bandwidth (for cable, not for ethernet) file transfers. If I can manage to bring the xfer rate up to around 100K/sec total between inbound and outbound, I experience the "Out of buffer" messages within a couple of minutes.

netstat -m doesn't show anything unusual. I am going to try removing both NMBCLUSTERS and NBUF from my config and just use a maxusers of 128 and see how that goes.

--Bill
Re: No buffer space available errors
Bill Marquette writes:
| For the last week and a half or so I've been trying to track this down
| assuming it was a configuration error on my part or a problem with my
| ISP's DHCP configuration. After switching from a DEC 20141 chipset card
| to a 3com 3c905, I found I was still having problems although the error
| message had changed. I'm now back to the DEC card 'cause I found more
| results in the mailing lists on the errors I was seeing.
|
| At first it appeared to be dhclient having issues, but I've finally found
| occurrences of other programs showing symptoms prior to dhclient log
| entries. Also, until today I couldn't replicate the problem while sitting
| at the console, it would only show itself after I'd been away for a few
| minutes (I swear it has a mind of its own) usually, when I was going to
| be away for more than 10-15 minutes. Today I managed to force the machine
| to have problems by doing multiple simultaneous downloads and uploads, I
| was running about 100K/sec through the NIC.

On one out of 8 machines, I ran into this problem. My network is running at 100BaseTX. I noticed that ifconfig showed the OACTIVE flag set and I was running in autosense mode. So I set up the media to 100BaseTX and now it works okay.

My guess is the autosense gets confused sometimes.

Doug A.
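As background to the OACTIVE observation, here is a rough user-space model of how a transmitter stuck in the "busy" state can surface as "No buffer space available" (the message text for ENOBUFS) even when mbufs are plentiful: the interface send queue is bounded, and it only drains when a transmit-complete event clears the busy flag. All fake_* names and the tiny queue limit are assumptions made for illustration; this is not code from any driver or from the network stack.

```c
#include <assert.h>
#include <errno.h>

#define FAKE_QMAXLEN 4 /* assumed tiny limit; real send queues are larger */

/* Hypothetical model of an interface send queue. */
struct fake_ifq {
    int len;     /* frames waiting in the send queue */
    int oactive; /* models OACTIVE: transmitter busy */
    int drops;   /* frames refused because the queue was full */
};

/* Hand one queued frame to the hardware unless it is marked busy. */
static void fake_start(struct fake_ifq *q)
{
    if (q->oactive)
        return;
    if (q->len > 0) {
        q->len--;
        q->oactive = 1; /* stays set until a completion is seen */
    }
}

/* Transmit-complete event: clear the busy flag and restart output. */
static void fake_txeof(struct fake_ifq *q)
{
    q->oactive = 0;
    fake_start(q);
}

/* Enqueue one frame; returns 0, or ENOBUFS when the queue is full. */
static int fake_output(struct fake_ifq *q)
{
    assert(q->len <= FAKE_QMAXLEN);
    if (q->len == FAKE_QMAXLEN) {
        q->drops++;
        return ENOBUFS;
    }
    q->len++;
    fake_start(q);
    return 0;
}
```

In this model, if fake_txeof() is never called (a lost transmit interrupt, or a link that never completes a send), the first frame goes to the hardware, the next FAKE_QMAXLEN frames queue up, and every send after that fails with ENOBUFS until something clears the busy flag. That would be consistent with the reports in this thread that resetting the interface, or forcing the media type, makes the error go away for a while.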