David Rees wrote:
On Sun, Mar 30, 2008 at 2:14 AM, David Rees <[EMAIL PROTECTED]> wrote:
From my understanding of the clustering software, it appears that
Tomcat is trying to send messages to the other Tomcat but it isn't
receiving them? Shouldn't it drop membership and give up? I suspect
that some reconfiguration of the cluster could avoid this...
I decided to try some different settings based on the samples provided
in the documentation. So I went from my essentially default/simple
configuration to adding this to the cluster config:
<Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
replicationMode="fastasyncqueue"
recoverTimeout="5000" recoverCounter="1"
doTransmitterProcessingStats="true" doProcessingStats="true"
queueDoStats="true" queueTimeWait="true"
queueChecklock="true" maxQueueLength="10000"
waitForAck="true" autoConnect="true"
keepAliveTimeout="320000" keepAliveRequestCount="-1"/>
So far this appears to have resolved the issue as a recent dump shows
no ClusterData or LinkObject classes linked to the
FastAsyncQueueSender class.
I have no idea which one of the specific settings may have resolved
the issue, and until I get the time to duplicate the issue in the lab
(perhaps later this week) I won't be able to verify.
Any ideas?
First to make sure: counting objects in general only makes sense after a
full GC. Otherwise the heap dump will contain garbage too.
Just some basic info: the LinkObject objects can be either in a
FastQueue, or they are used in a FastAsyncSocketSender directly after
removing from the FastQueue and before actually sending.
The LinkObjects themselves form a simple linked list, linked via their
member "next". Each FastQueue holds a reference to one such linked
list: the beginning is the FastQueue member "first", the end of the
list is the member "last". Both can be null if the queue is empty.
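
The structure described above can be sketched roughly as follows. This
is only an illustration, not Tomcat's code: the class names
LinkedQueueSketch and Link, the String payload and the accessor methods
are made up for the example; only the members "first", "last" and
"next" follow the description.

```java
// Illustrative sketch only; this is not Tomcat's FastQueue or LinkObject.
final class LinkedQueueSketch {

    /** One queue entry; entries are chained through "next". */
    static final class Link {
        final String payload; // hypothetical stand-in for the message data
        Link next;            // null for the final entry

        Link(String payload) {
            this.payload = payload;
        }
    }

    private Link first; // start of the list, null while the queue is empty
    private Link last;  // end of the list, null while the queue is empty
    private int size;

    /** Appends a new entry at the end of the list. */
    synchronized void add(String payload) {
        Link link = new Link(payload);
        if (first == null) {
            first = link;
        } else {
            last.next = link;
        }
        last = link;
        size++;
    }

    synchronized Link getFirst() { return first; }

    synchronized Link getLast() { return last; }

    synchronized int getSize() { return size; }

    public static void main(String[] args) {
        LinkedQueueSketch queue = new LinkedQueueSketch();
        queue.add("session-delta-1");
        queue.add("session-delta-2");
        // Walk the list from "first" by following "next".
        for (Link link = queue.getFirst(); link != null; link = link.next) {
            System.out.println(link.payload);
        }
    }
}
```
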
The FastAsyncSocketSender runs in a small loop, waiting for something to
appear in the queue. If the queue is non-empty, it picks up the whole
list of linked elements and replaces the member "first" with null. Then
it tries to send all elements in the linked list to the replication
member. This happens in the methods run (the loop), getQueuedMessage
(retrieve the linked list from the queue) and pushQueuedMessages (send
data to the member).
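
A minimal sketch of that take-everything-then-send pattern, under
stated assumptions: SenderLoopSketch, takeAll and pushAll are names
invented for this example (they only loosely correspond to
getQueuedMessage and pushQueuedMessages), and "sending" is simulated by
appending to a list instead of writing to a socket.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; this is not Tomcat's FastAsyncSocketSender.
final class SenderLoopSketch {

    static final class Link {
        final String payload;
        Link next;

        Link(String payload) {
            this.payload = payload;
        }
    }

    private Link first;
    private Link last;

    synchronized void add(String payload) {
        Link link = new Link(payload);
        if (first == null) {
            first = link;
        } else {
            last.next = link;
        }
        last = link;
        notifyAll(); // wake a sender waiting in takeAll
    }

    /**
     * Waits up to timeoutMillis (must be > 0) for entries, then detaches
     * the whole list by resetting "first" and "last" to null.
     */
    synchronized Link takeAll(long timeoutMillis) {
        if (first == null) {
            try {
                wait(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        Link head = first;
        first = null;
        last = null;
        return head; // null if nothing arrived in time
    }

    /** Simulated send: walks the detached list and records each payload. */
    static void pushAll(Link head, List<String> sent) {
        for (Link link = head; link != null; link = link.next) {
            sent.add(link.payload);
        }
    }

    public static void main(String[] args) {
        SenderLoopSketch queue = new SenderLoopSketch();
        queue.add("m1");
        queue.add("m2");
        List<String> sent = new ArrayList<>();
        // A real sender would loop until shutdown; two rounds suffice here.
        for (int round = 0; round < 2; round++) {
            pushAll(queue.takeAll(50), sent);
        }
        System.out.println(sent);
    }
}
```

Because the sender detaches the complete list in one step, elements
that have been taken but not yet sent are no longer reachable from the
queue, only from the sender.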
For each replication target (other cluster member) there is a FastQueue
and an associated FastAsyncSocketSender.
The queue has a maximum length called maxQueueLength (which gets set
from the property with the same name associated with the Sender, just
as you set it). If you check the queue size via JConsole or the
jmxproxy of the manager webapp, e.g. once a minute, it should nearly
always be 0. If replication gets slow, the queue might get longer,
because the Sender needs more time to send the messages before it
comes back to empty the queue again.
Why you had that many LinkObjects is not clear. You could first try to
check whether the LinkObjects actually belong to a queue or not (if
not, they are already in the Sender). Have a look at your log files to
see whether there are errors or unexpected cluster membership messages.
For example, a relatively low mcastDropTime sometimes causes problems,
because during a long-running GC one of the members might not send a
heartbeat message for several seconds. You should know how long your
GC pauses get and tune the mcastDropTime to be long enough.
I'm not saying this is the cause of the problems, but maybe a good
starting point.
In general I would suggest not using the waitForAck feature. That's not
a strict rule, but if you do async replication and use session
stickiness for load balancing, then you usually put a strong focus on
replication not influencing your webapp negatively. Activating
waitForAck lets you detect replication problems more reliably, but it
also increases the overhead. Your mileage may vary.
Regards,
Rainer