I have a bald spot on my head from pulling my hair out trying to figure out why my Witango app servers go down a couple of times a day and have to be restarted manually. I know a couple of other people have had this same problem.

This problem has usually occurred during very heavy traffic periods, and the crash happens after a random period of time. The event log would sometimes show a fatal exception, and other times a strange odbc.connection error. I have sent test applications and logs to With Enterprise to reproduce the problem, and I believe there are a few engineers with bald spots there also.

I almost never see this problem on my test system, only on my deployed servers, which can sustain 50 simultaneous connections and serve hundreds of thousands of pages and blobs a day.

This pointed me toward some kind of memory leak in the Witango ODBC interface, complicated by my heavy blob serving.

Last night around midnight I woke up to my alarms going off and found that the app servers had all crashed at the same time, which was unusual, and when I restarted them, they just kept crashing. I would get them up and running, and they would serve tml's, but when I hit a taf with a query, they would hang, crash, or give a tcp error that the odbc source was not published on this protocol.

After at least 2 hours of trying to resolve the issue, I realized that the app servers could not see the database server (which is two feet away on the same subnet). So I went to the basics and did a ping from my app server to the database server:

ping db1.mydomain.com

I was astonished to see the result: the app server started pinging an IP address that I had never heard of. The DNS server I was using must have gone wacko. I am using a DNS server in the data center, maintained by the data center, which is a BIND system running on OS X. I woke the data center manager up and told him he had a serious problem. He rebooted the DNS servers, and VOILA, the servers started working again and the problem was resolved.
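For anyone who wants to automate the ping check above, here is a minimal sketch in Python. It is not part of my setup; the function name check_resolution and the address 192.0.2.10 are made up for illustration, and you would substitute the IP you actually expect your database server to have.

```python
import socket

def check_resolution(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return (ok, resolved); ok is True only if hostname resolves to expected_ip.

    `resolver` defaults to the operating system's resolver and can be swapped
    out for testing. On a failed lookup, `resolved` holds the error text.
    """
    try:
        resolved = resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"lookup failed: {exc}"
    return resolved == expected_ip, resolved

# Example call (hypothetical expected address, replace with your own):
#   ok, resolved = check_resolution("db1.mydomain.com", "192.0.2.10")
#   print("OK" if ok else "MISMATCH", resolved)
```

Run from each app server on a schedule, something like this would flag a DNS server handing out a wrong address before the app servers start failing on it.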

This incident gave me much concern, and I decided that I should not rely on someone else's DNS, so I set up a quick forwarding DNS server on my database server (Windows 2003 Server, took 2 minutes) and pointed all of my machines to it. This DNS server has no zones, but it caches all DNS queries and serves only this group of machines.

To my amazement my servers chugged through the heaviest traffic periods all day, without a single incident. Not a timeout, not a crash, nothing. I have been floored all day staring at my live performance counters. I have watched the servers go through sustained periods up to 4 hours long where they were averaging 50 simultaneous connections.

This has never been possible for me before.

In conclusion, I can only assume that if the app server has to connect to a datasource and needs to resolve a domain name, and the DNS server doesn't respond quickly enough, and other queries cause several threads to hit the same problem, it crashes.
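If you want to see how your own DNS holds up when many threads resolve a name at once, a rough sketch in Python follows. It does not reproduce the Witango crash and says nothing about how the app server resolves names internally; it only times lookups through the operating system's resolver. The function names and the default of 50 workers (chosen to match my 50 simultaneous connections) are assumptions for illustration.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor

def time_lookup(hostname, resolver=socket.gethostbyname):
    """Return (seconds, result) for one lookup; result is an IP or the error."""
    start = time.perf_counter()
    try:
        result = resolver(hostname)
    except OSError as exc:  # covers socket.gaierror
        result = exc
    return time.perf_counter() - start, result

def concurrent_lookups(hostname, workers=50, resolver=socket.gethostbyname):
    """Run `workers` lookups at the same time; return (slowest, average) seconds."""
    def one_lookup(_index):
        seconds, _result = time_lookup(hostname, resolver)
        return seconds

    with ThreadPoolExecutor(max_workers=workers) as pool:
        timings = list(pool.map(one_lookup, range(workers)))
    return max(timings), sum(timings) / len(timings)

# Example call:
#   slowest, average = concurrent_lookups("db1.mydomain.com")
#   print(f"slowest {slowest:.3f}s, average {average:.3f}s")
```

Keep in mind that the operating system may cache answers, so repeated runs can look faster than a cold lookup against the upstream DNS server.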

Bringing the DNS to a high level of performance seems to have solved the issue. I am now going to add stub zones to my DNS server so that it never has to query another DNS server, and the response will be instant. I may even install a backup DNS server on each app server, pointing to the main one on the database server, and then point the primary DNS for each app server to itself, to give it the highest DNS performance possible.

This was the last thing I would have ever thought to look at. If you are deploying heavy traffic servers, check this issue.

Some may ask why I don't use IP addresses instead of names. Well, periodically an IP address may change, but I can always keep the name consistent, and therefore never have to rewrite code or change a bunch of dsn's just because I changed an IP.

I debated whether to post this to the list so soon, before having more time to watch the issue, but if anyone else is having issues, this may help. It has given me back my life, not having to watch the servers so closely.

--

Robert Garcia
President - BigHead Technology
CTO - eventpix.com
2781 N Carlmont Pl
Simi Valley, Ca 93065
ph: 805.522.8577 - cell: 805.501.1390 - fax: 805.830.0321
[EMAIL PROTECTED] - [EMAIL PROTECTED]
http://bighead.net/ - http://eventpix.com/ - http://theradmac.com/
