David Rees wrote:
On Sun, Mar 30, 2008 at 2:14 AM, David Rees <[EMAIL PROTECTED]> wrote:
 From my understanding of the clustering software, it appears that
 Tomcat is trying to send messages to the other Tomcat but it isn't
 receiving them? Shouldn't it drop membership and give up? I suspect
 that some reconfiguration of the cluster could avoid this...

I decided to try some different settings based on the samples provided
in the documentation. So I went from my essentially default/simple
configuration to adding this to the cluster config:

<Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
replicationMode="fastasyncqueue"
recoverTimeout="5000" recoverCounter="1"
doTransmitterProcessingStats="true" doProcessingStats="true"
queueDoStats="true" queueTimeWait="true"
queueChecklock="true" maxQueueLength="10000"
waitForAck="true" autoConnect="true"
keepAliveTimeout="320000" keepAliveRequestCount="-1"/>

So far this appears to have resolved the issue as a recent dump shows
no ClusterData or LinkObject classes linked to the
FastAsyncQueueSender class.

I have no idea which one of the specific settings may have resolved
the issue, and until I get the time to duplicate the issue in the lab
(perhaps later this week) I won't be able to verify.

Any ideas?

First to make sure: counting objects in general only makes sense after a full GC. Otherwise the heap dump will contain garbage too.

Just some basic info: the LinkObject objects can either sit in a FastQueue, or be held by a FastAsyncSocketSender directly, after they have been removed from the FastQueue and before they are actually sent.

The LinkObjects themselves represent a simple linked list, linked via their member "next". Each FastQueue has a reference to one such linked list: the beginning is the member "first" of the FastQueue, the end of the list is the member "last". Both can be null if the queue is empty.

The FastAsyncSocketSender runs in a small loop, waiting for something to appear in the queue. If the queue is non-empty, it picks up the whole list of linked elements and replaces the member "first" with null. Then it tries to send all elements in the linked list to the replication member. This happens in the methods run (the loop), getQueuedMessage (retrieve the linked list from the queue) and pushQueuedMessages (send data to the member).
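
To illustrate the hand-off described above, here is a minimal sketch in Java. It is not the Tomcat source: the class names SketchQueue, Node and QueueSketchDemo are made up for this example, and rejecting a message when the queue is full is only an assumed policy. It shows the linked list with "first", "last" and "next", and how the sender side can detach the whole list in one step.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical stand-in for the LinkObject/FastQueue pair; not the Tomcat source. */
class SketchQueue {
    /** One list element; "next" points to the following element, or is null at the end. */
    static final class Node {
        final String payload;
        Node next;
        Node(String payload) { this.payload = payload; }
    }

    private final int maxQueueLength;
    private Node first; // head of the linked list, null when the queue is empty
    private Node last;  // tail of the linked list, null when the queue is empty
    private int size;

    SketchQueue(int maxQueueLength) { this.maxQueueLength = maxQueueLength; }

    /** Appends a message; returns false when the queue is full (assumed policy). */
    synchronized boolean add(String payload) {
        if (size >= maxQueueLength) {
            return false;
        }
        Node n = new Node(payload);
        if (first == null) {
            first = n;
        } else {
            last.next = n;
        }
        last = n;
        size++;
        return true;
    }

    /** Detaches and returns the whole list, leaving the queue empty. */
    synchronized Node removeAll() {
        Node head = first;
        first = null;
        last = null;
        size = 0;
        return head;
    }

    synchronized int size() { return size; }
}

class QueueSketchDemo {
    /** Walks a detached list, as the sender would while pushing the messages out. */
    static List<String> drain(SketchQueue.Node head) {
        List<String> sent = new ArrayList<>();
        for (SketchQueue.Node n = head; n != null; n = n.next) {
            sent.add(n.payload);
        }
        return sent;
    }

    public static void main(String[] args) {
        SketchQueue q = new SketchQueue(2);
        q.add("session-1");
        q.add("session-2");
        System.out.println("third accepted: " + q.add("session-3"));
        System.out.println("sent: " + drain(q.removeAll()));
        System.out.println("size after drain: " + q.size());
    }
}
```

Once the list has been detached, its elements are reachable only from the sender side, which is why LinkObjects in a heap dump can belong either to a queue or to a sender.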

For each replication target (other cluster member) there is a FastQueue and an associated FastAsyncSocketSender.

The queue has a maximum length called maxQueueLength (which gets set from the property of the same name on the Sender, just as you set it). If you check the queue size via JConsole or the jmxproxy of the manager webapp, e.g. once a minute, it should nearly always be 0. If replication gets slow, the queue might get longer, because the Sender needs more time to send the messages before it comes back to empty the queue again.

Why you had that many LinkObjects is not clear. You could first check whether the LinkObjects actually belong to a queue or not (in the latter case they are already in the Sender). Also look in your log files for errors or unexpected cluster membership messages.

E.g. a relatively low mcastDropTime sometimes causes problems, because during a long-running GC one of the members might not send a heartbeat message for several seconds. You should know how long your GC pauses get and tune mcastDropTime to be long enough.
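
As a rough illustration of that check, here is a small Java sketch. The class DropTimeCheck and its method are hypothetical, and the rule it applies (drop time should exceed the longest GC pause plus one heartbeat interval) is only my assumption of a reasonable margin, not something taken from the Tomcat code. All numbers are made-up examples. The GC beans from java.lang.management report only counts and accumulated time, so the longest pause still has to come from your GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper; the margin rule below is an assumption, not Tomcat behaviour. */
class DropTimeCheck {
    /**
     * Returns true if the drop time is longer than the longest observed GC pause
     * plus one heartbeat interval (all values in milliseconds).
     */
    static boolean dropTimeCoversPause(long dropTimeMs, long heartbeatMs, long longestGcPauseMs) {
        return dropTimeMs > longestGcPauseMs + heartbeatMs;
    }

    public static void main(String[] args) {
        // Accumulated GC statistics of this JVM; gives totals, not the longest single pause.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
        // Example values only: a 4 s pause with a 0.5 s heartbeat interval.
        System.out.println(dropTimeCoversPause(3000, 500, 4000));  // false
        System.out.println(dropTimeCoversPause(10000, 500, 4000)); // true
    }
}
```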

I'm not saying this is the cause of the problems, but maybe a good starting point.

In general I would suggest not using the waitForAck feature. That's not a strict rule, but if you do async replication and use session stickiness for load balancing, then you usually put a strong focus on replication not influencing your webapp negatively. Activating waitForAck lets you detect replication problems more reliably, but it also increases the overhead. Your mileage may vary.

Regards,

Rainer
