I was going to stay quiet about the strange night I had on call, but since 
somebody else is seeing a very similar problem...

On June 23 at 10:45 PM, all my URL checks, DB checks, and email alerting 
failed. Diskspace and service checks still worked.  At 11:32 PM (the fourth 
check afterward) all checks started working normally and I received an 'alert 
storm' as the checks returned to UP status.  During the period of unusual 
behavior, Servers Alive was still able to restart services and restart 
machines.  I did learn from the experience that I haven't configured Servers 
Alive perfectly: some machines still need manual intervention to return to 
working status after a restart.

The next day, I restarted the Servers Alive machine and haven't seen an issue 
since.

I have full logs available from the incident if you are interested.  Here is 
a brief excerpt:

Friday, June 23, 2006 10:45:53 PM URL check (http://borgweb1/poll/poll_asp.asp) failed due to Unrecognized Error.(line  2120)
Friday, June 23, 2006 10:45:53 PM URL check took 17 ms
Friday, June 23, 2006 10:45:53 PM INFO: alerting SMTPP
Friday, June 23, 2006 10:45:53 PM TO convert : (PID= 0) to [EMAIL PROTECTED] 
Friday, June 23, 2006 10:45:53 PM Sending email message ([SA] Borgweb1 ASP is DOWN)
Friday, June 23, 2006 10:45:53 PM SMTP Error : Error :  10055No buffer space is availableT: 0Pfalse
Friday, June 23, 2006 10:45:53 PM SMTP Error : stopped sending mail
Friday, June 23, 2006 10:45:53 PM SMTP Error : Error :  10055No buffer space is available
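
For anyone who wants to go through a full log the same way, here is a rough 
sketch of how the SMTP errors could be tallied per minute. It assumes each log 
entry sits on a single line (the excerpt above was wrapped by the mail client) 
and that the timestamp and "SMTP Error : Error :" layout match the excerpt. 
The function name and the sample list are made up for illustration. Error 
10055 is the Winsock code WSAENOBUFS, which matches the "No buffer space is 
available" text in the log.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp layout assumed from the excerpt, e.g.
# "Friday, June 23, 2006 10:45:53 PM <message>".
TIMESTAMP_RE = re.compile(
    r"^(\w+, \w+ \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (.*)$"
)
# Numeric error code that follows "SMTP Error : Error :" in the excerpt.
SMTP_ERROR_RE = re.compile(r"SMTP Error : Error :\s*(\d+)")


def count_smtp_errors(lines):
    """Count SMTP error codes per minute; keys are (minute, code) pairs."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP_RE.match(line.rstrip())
        if not match:
            continue  # not a log entry (blank line, wrapped fragment, ...)
        stamp = datetime.strptime(match.group(1), "%A, %B %d, %Y %I:%M:%S %p")
        error = SMTP_ERROR_RE.search(match.group(2))
        if error:
            minute = stamp.strftime("%Y-%m-%d %H:%M")
            counts[(minute, int(error.group(1)))] += 1
    return counts


# Sample entries copied from the excerpt, joined onto single lines.
SAMPLE = [
    "Friday, June 23, 2006 10:45:53 PM URL check took 17 ms",
    "Friday, June 23, 2006 10:45:53 PM SMTP Error : Error :  10055No buffer space is availableT: 0Pfalse",
    "Friday, June 23, 2006 10:45:53 PM SMTP Error : stopped sending mail",
    "Friday, June 23, 2006 10:45:53 PM SMTP Error : Error :  10055No buffer space is available",
]

if __name__ == "__main__":
    for (minute, code), n in sorted(count_smtp_errors(SAMPLE).items()):
        print(minute, code, n)
```

If the counts show 10055 on every alert during the bad window and nothing 
outside it, that would at least point at socket exhaustion on the monitoring 
box rather than at the mail server.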


Regards,

Brett Hanson
Systems Analyst, Agrium

>>> [EMAIL PROTECTED] 7/7/2006 10:15 AM >>>
At 12:50 AM 7/7/2006, Dirk Bulinckx wrote:


>What kind of checks are those?

PING, NT Service, URL, etc. (basically any and all).


>And does SA recover from itself or do you need a restart of the system to
>get it recover?


Oh, SA recovers on the next round just fine.  It's just really annoying 
a) to get about 50-75 pages all saying "RUNNING" when there was no 
interruption that can be detected by any other means (i.e. outside 
monitors continue to run, I can be term served into the boxes in 
question when PING and other alerts fail, etc.)

And b) that we don't get the DOWN alerts first. (Though like 
I say, I suspect that's because it can't find the email server by 
name, so we've changed it to IP address to see what happens.)
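
One way to test that theory from the monitoring box is to check name 
resolution and the TCP connection separately, so you can see which step 
fails. This is only a sketch: the function name and the host name in the 
usage note are placeholders, and port 25 is assumed for SMTP. It says 
nothing about what SA does internally.

```python
import socket


def probe_mail_server(host, port=25, timeout=5.0):
    """Report separately whether `host` resolves and whether `port` accepts a TCP connection."""
    result = {
        "host": host,
        "addresses": [],
        "resolve_error": None,
        "connect_error": None,
    }
    try:
        # Step 1: name lookup only (a literal IP address passes straight through).
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["addresses"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result["resolve_error"] = str(exc)
        return result
    try:
        # Step 2: open and immediately close a TCP connection to the mail port.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        result["connect_error"] = str(exc)
    return result
```

Running something like probe_mail_server("mailhost.example") and then the 
same call with the server's IP address, on a schedule, would show whether the 
name lookup is the part that fails during one of these blips.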




>Dirk Bulinckx.
>-----Original Message-----
>From: Servers Alive Discussion List [mailto:[EMAIL PROTECTED] On Behalf
>Of Greg D. Moore
>Sent: Friday, July 07, 2006 5:46 AM
>To: Servers Alive Discussion List
>Subject: [SA-list] False alarms
>
>
>
>We've started to see a really weird problem that is annoying and causing me
>to lose sleep.
>
>False alarms.
>
>Namely, out of the blue a dozen or more of our checks will throw alerts.
>What's even stranger is that they only email UP alerts.  There's no
>preceding DOWN alert email.
>
>Basically we're seeing some sort of internal network issue (that I'm trying
>to track down).
>
>It appears the DOWN messages never get sent out. (could it be that salive
>tries the mail server, can't reach it and gives up?)
>
>
>Also, what's strange is it appears that only Salive is having this
>problem. (i.e. nothing else internally seems to be seeing these
>blips).  Any ideas on that?  errors mostly appear to be: "The Current
>connection has been aborted by the network or intermediate
>services."  It's a mixture of internal IPs and a few over a public
>network (so it doesn't look like it's a router issue.)
>
>The box in question seems to be ok, and I've been term served into it
>w/o issues while one of these little "alert storms" occurs.
>
>
>
>
>
>Greg D. Moore                                   [EMAIL PROTECTED] 
>TownNews.Com    1-518-687-6242          http://www.townnews.com 
>Operations Manager - East Greenbush Office, Troy NY 12180
>
>To unsubscribe send a message with UNSUBSCRIBE as subject to
>[email protected] 
>If you use auto-responders (like out-of-the-office messages), then make sure
>that they are not send to the list nor to the individual members of the list
>that send a message.  Doing this will get you removed from the list.
>

Greg D. Moore                                   [EMAIL PROTECTED] 
TownNews.Com    1-518-687-6242          http://www.townnews.com 
Operations Manager - East Greenbush Office, Troy NY 12180



