On Nov 7, 2007, at 1:47 AM, [EMAIL PROTECTED] wrote:
Hi,
We have had a problem for some months now that we can’t find a
solution for.
The situation: We have an application that is running on four
different application servers (with quite some instances on each
server, servers running on linux) controlled by monitors running on
two of those servers (each monitor is responsible for 2 servers).
The wotaskd is running on each server as well. Finally, we have two
web servers (Apache 2.0.49). We use Java 1.4.2, WebObjects 5.2.3.
The problem: Several times a day on each of the instances we get
session timeouts (SessionRestorationErrors). But the sessions don’t
actually time out; the requests are routed to the wrong instances.
Of course, the session ids are not known on those wrong instances,
so the SessionRestorationErrors occur.
What we have done so far: we tried setting send timeout, receive
timeout and connect timeout in “Load Balancing and Adaptor
Settings” to values of one minute and above without any success.
That is the classic solution for this type of problem. I can think
of two explanations why it might not be working. The first is that
your instances are stalling for longer than one minute. The other is
that the problem is at a level below WebObjects.
For the first situation, we can use the apps to diagnose it. Add
this to your Application class:
    // Needs NSLog and NSTimestamp from com.webobjects.foundation.
    public WOResponse dispatchRequest(WORequest request)
    {
        // Time the whole request-response cycle on this instance.
        NSTimestamp startTime = new NSTimestamp();
        WOResponse response = super.dispatchRequest(request);
        NSTimestamp stopTime = new NSTimestamp();
        long milliseconds = stopTime.getTime() - startTime.getTime();

        // Comma-delimited so the line is easy to grep, split, and sort.
        NSLog.debug.appendln("," + request.uri() + ", - elapsed time: ,"
            + (milliseconds / 1000.0));
        return response;
    }
You can easily grep this out of the log, separate it by commas, and
sort by the time to see what the longest lag in returning a response
is. If it is over a minute, I would look at:
1. Slow queries / DB contention
2. Excessive garbage collection due to memory starvation
3. Other processes on the machine (a cron job?) taking too many
resources
If it is not over a minute, see below.
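
If grep and sort are awkward on your logs, the same analysis can be
sketched in Java. This is only a rough illustration: the class name
ElapsedTimeReport is made up, it assumes each log line ends with the
", - elapsed time: ,<seconds>" text produced by the method above and
that whatever prefix the logger adds contains no commas, and it is
meant to be run offline on a workstation with a newer JDK than the
1.4.2 the apps use.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: pulls the timing lines out of an application log
// and lists the slowest requests first.
final class ElapsedTimeReport {
    // Must match the text written by dispatchRequest in Application.
    static final String MARKER = ", - elapsed time: ,";

    static final class Entry {
        final String uri;
        final double seconds;

        Entry(String uri, double seconds) {
            this.uri = uri;
            this.seconds = seconds;
        }
    }

    // Returns null for lines that are not timing lines.
    static Entry parse(String line) {
        int marker = line.lastIndexOf(MARKER);
        int firstComma = line.indexOf(',');
        if (marker < 0 || firstComma >= marker) {
            return null;
        }
        try {
            String uri = line.substring(firstComma + 1, marker);
            String seconds = line.substring(marker + MARKER.length()).trim();
            return new Entry(uri, Double.parseDouble(seconds));
        } catch (NumberFormatException e) {
            return null; // truncated or garbled line
        }
    }

    // Keeps only the timing lines and sorts them, longest time first.
    static List<Entry> slowestFirst(List<String> lines) {
        List<Entry> entries = new ArrayList<Entry>();
        for (String line : lines) {
            Entry entry = parse(line);
            if (entry != null) {
                entries.add(entry);
            }
        }
        Collections.sort(entries, new Comparator<Entry>() {
            public int compare(Entry a, Entry b) {
                return Double.compare(b.seconds, a.seconds);
            }
        });
        return entries;
    }

    // Usage: java ElapsedTimeReport /path/to/instance.log
    public static void main(String[] args) throws IOException {
        List<Entry> entries = slowestFirst(Files.readAllLines(Paths.get(args[0])));
        for (int i = 0; i < Math.min(20, entries.size()); i++) {
            Entry e = entries.get(i);
            System.out.println(e.seconds + "s  " + e.uri);
        }
    }
}
```

Anything near or above your adaptor timeouts at the top of that list
points at the first explanation.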
We are logging the woadaptor now. It seems we have got some kind of
connection trouble:
Error: couldn't connect to 10.0.0.40 (1085): Operation now in progress
Error: Error connecting to server 10.0.0.40
Warn: Unable to find instance 55. Attempting to select another.
Warn: Unable to find instance 55. Attempting to select another.
Warn: Unable to find instance 60. Attempting to select another.
But 10.0.0.40:1085 is up and running. This error message is only
thrown about every 10 or 20 minutes, not all the time.
We found some similar problems in mailing lists but none was
helpful so far. Any suggestions how we can get rid of this problem?
Thanks in advance.
The only other thing I can think of is that you have problems in your
network or the app servers are running out of ports / file handles or
some similar problem below the level of WebObjects. I have no idea
how to debug that.
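
One low-tech starting point, offered as a sketch rather than a known
fix: run a small probe on the web server machine that opens a plain
TCP connection to the instance's host and port once a second and
logs how long the connect takes. The class name ConnectProbe and the
one-second interval are made up for illustration, and it only
measures what a JVM on that machine sees, not what the adaptor
itself does. If the probe also fails or stalls at the times the
adaptor logs its connect errors, that would point below WebObjects.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Date;

// Hypothetical probe: repeatedly times a bare TCP connect to one instance.
final class ConnectProbe {
    // Returns the connect time in milliseconds, or -1 if the connection
    // was refused, failed, or did not complete within timeoutMillis.
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.currentTimeMillis();
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return System.currentTimeMillis() - start;
        } catch (IOException e) {
            return -1;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
        }
    }

    // Usage: java ConnectProbe 10.0.0.40 1085
    public static void main(String[] args) throws InterruptedException {
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        while (true) {
            long millis = connectMillis(host, port, 5000);
            // Log only failures and slow connects to keep the output short.
            if (millis < 0 || millis > 1000) {
                System.out.println(new Date() + " " + host + ":" + port
                    + " connect " + (millis < 0 ? "FAILED" : millis + " ms"));
            }
            Thread.sleep(1000);
        }
    }
}
```

Comparing the timestamps of any failures with the adaptor log, and
with the number of open sockets and file handles on the app server
at those moments, should at least narrow down where to look.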
Chuck
--
Practical WebObjects - for developers who want to increase their
overall knowledge of WebObjects or who are trying to solve specific
problems.
http://www.global-village.net/products/practical_webobjects