The "buffer space" problem makes me think of a system that has run out of available TCP sockets. What I've seen on some systems is that the OS keeps sockets open longer than the app does (the app closes the socket, but from the OS point of view it is still open). After a while (some kind of timeout, I think) the OS really closes/releases them and they are again available to anyone who wants to use them. I've also seen this on systems running mail servers, where a disconnect from the socket is not seen as the end of the connection and the OS (or the mail server service/app) still keeps the sockets "active" or open.
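One way to check whether a box is in the state described above is to count its TCP connections per state. Sockets that the app has closed but the OS has not yet released usually show up as TIME_WAIT. The following is only a minimal sketch in Python: it assumes the text comes from `netstat -an` and that the state is the last column of each TCP line. The function name `count_tcp_states` and the sample addresses are made up for illustration and are not part of Servers Alive.

```python
from collections import Counter

# TCP state names as commonly printed by netstat (Windows and Linux spellings).
TCP_STATES = {
    "LISTENING", "LISTEN", "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT",
    "FIN_WAIT_1", "FIN_WAIT_2", "FIN_WAIT1", "FIN_WAIT2",
    "SYN_SENT", "SYN_RECEIVED", "SYN_RECV", "LAST_ACK", "CLOSING",
}


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state in text shaped like `netstat -an` output.

    Assumption: every TCP line starts with "TCP"/"tcp" and ends with the state.
    Lines that do not match (headers, UDP entries) are skipped.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        state = fields[-1].upper()
        if state in TCP_STATES:
            counts[state] += 1
    return counts


# Hypothetical sample input; the addresses and ports are invented.
SAMPLE = """\
  TCP    10.0.0.5:1041    10.0.0.9:25     TIME_WAIT
  TCP    10.0.0.5:1042    10.0.0.9:25     TIME_WAIT
  TCP    10.0.0.5:1043    10.0.0.20:80    ESTABLISHED
  TCP    0.0.0.0:135      0.0.0.0:0       LISTENING
  UDP    0.0.0.0:445      *:*
"""

if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(state, n)
```

If the TIME_WAIT count keeps growing toward the thousands while the checks are running, that would be consistent with the socket exhaustion theory. A low count would point elsewhere.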
Dirk Bulinckx.

-----Original Message-----
From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf Of Brett Hanson
Sent: Friday, July 07, 2006 11:06 PM
To: Servers Alive Discussion List
Subject: RE: [SA-list] False alarms

I was going to stay quiet about the strange night I had on call, but since somebody else is seeing a very similar problem...

On June 23 at 10:45 PM, all my URL checks, DB checks, and email alerting failed. Diskspace and service checks still worked. At 11:32 PM (4th check later) all checks started working normally and I received an 'alert storm' as the checks returned to UP status.

During the time of unusual behavior, Servers Alive was able to restart services and restart machines. I did learn from the experience that I haven't configured Servers Alive perfectly and some machines still need manual intervention to return to working status after a restart.

The next day, I restarted the Servers Alive machine and haven't seen an issue since.

I have full logs available from the incident if you are interested. Here is a brief excerpt:

Friday, June 23, 2006 10:45:53 PM URL check (http://borgweb1/poll/poll_asp.asp) failed due to Unrecognized Error.(line 2120)
Friday, June 23, 2006 10:45:53 PM URL check took 17 ms
Friday, June 23, 2006 10:45:53 PM INFO: alerting SMTPP
Friday, June 23, 2006 10:45:53 PM TO convert : (PID= 0) to [EMAIL PROTECTED]
Friday, June 23, 2006 10:45:53 PM Sending email message ([SA] Borgweb1 ASP is DOWN)
Friday, June 23, 2006 10:45:53 PM SMTP Error : Error : 10055No buffer space is availableT: 0Pfalse
Friday, June 23, 2006 10:45:53 PM SMTP Error : stopped sending mail
Friday, June 23, 2006 10:45:53 PM SMTP Error : Error : 10055No buffer space is available

Regards,
Brett Hanson
Systems Analyst, Agrium

>>> [EMAIL PROTECTED] 7/7/2006 10:15 AM >>>
At 12:50 AM 7/7/2006, Dirk Bulinckx wrote:
>What kind of checks are those?

PING, NT Service, URL, etc. (basically any and all).
>And does SA recover from itself or do you need a restart of the system to
>get it recover?

Oh, SA recovers on the next round just fine. Just really annoying to get about 50-75 pages all saying "RUNNING" when a) there was no interruption that can be detected by any other means (i.e. outside monitors continue to run, I can be term served into the boxes in question when PING and other alerts fail, etc.) And b) annoying that we don't get the DOWN alerts first. (though like I say, I suspect that's because it can't find the email server by name, so we've changed it to IP address to see what happens.)

>Dirk Bulinckx.
>-----Original Message-----
>From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf
>Of Greg D. Moore
>Sent: Friday, July 07, 2006 5:46 AM
>To: Servers Alive Discussion List
>Subject: [SA-list] False alarms
>
>We've started to see a really weird problem that is annoying and causing me
>to lose sleep.
>
>False alarms.
>
>Namely out of the blue a dozen or more of our alerts will throw alerts.
>What's even stranger is they only are emailing UP alerts. There's no
>preceding DOWN alert email.
>
>Basically we're seeing some sort of internal network issue (that I'm trying
>to track down).
>
>It appears the DOWN messages never get sent out. (could it be that salive
>tries the mail server, can't reach it and gives up?)
>
>Also, what's strange is it appears that only Salive is having this
>problem. (i.e. nothing else internally seems to be seeing these
>blips). Any ideas on that? errors mostly appear to be: "The Current
>connection has been aborted by the network or intermediate
>services." It's a mixture of internal IPs and a few over a public
>network (so it doesn't look like it's a router issue.)
>
>The box in question seems to be ok, and I've been term served into it
>w/o issues while one of these little "alert storms" occurs.
>
>Greg D. Moore [EMAIL PROTECTED]
>TownNews.Com 1-518-687-6242 http://www.townnews.com
>Operations Manager - East Greenbush Office, Troy NY 12180
>
>To unsubscribe send a message with UNSUBSCRIBE as subject to
>[email protected]
>If you use auto-responders (like out-of-the-office messages), then make sure
>that they are not send to the list nor to the individual members of the list
>that send a message. Doing this will get you removed from the list.

Greg D. Moore [EMAIL PROTECTED]
TownNews.Com 1-518-687-6242 http://www.townnews.com
Operations Manager - East Greenbush Office, Troy NY 12180
