David Rees wrote:
On Mon, Mar 31, 2008 at 12:49 PM, Rainer Jung <[EMAIL PROTECTED]> wrote:
 First, to make sure: counting objects in general only makes sense after a
 full GC. Otherwise the heap dump will contain garbage too.

Yes, I made sure the objects I was looking at had a valid GC
reference. They really were getting stuck in the queue.

 Just some basic info: the LinkObject objects can either be in a
 FastQueue, or they are used in a FastAsyncSocketSender directly after
 being removed from the FastQueue and before actually being sent.
<snip>

Thank you for the detailed description on how the Queues work with the cluster.

 Why you had that many LinkObjects is not clear. You could first try to
 check whether the LinkObjects actually belong to a Queue or not (if
 not, they are already in the Sender). Have a look at your log files
 for errors or unexpected cluster membership messages.

One problem I've intermittently had with clustering is that after a
Tomcat restart (we shut down one node and it immediately restarts,
generally within 30 seconds), the two nodes don't consistently sync
up. (The restarted node would not have the sessions from the other
node, but new sessions would get replicated over.) I have to think
that this may be related to this issue.
I believe you have to wait at least 30 seconds before you bring up the other node. Especially if you are using mcastDropTime="30000" (could that be the default?), your nodes won't even realize this one is gone, and when you bring it back up within 30 seconds, to the other nodes it's as if nothing ever changed.
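
For reference, the drop time is set on the Membership element of the <Cluster> config. A rough TC 5.5-style sketch (the multicast address/port here are just the usual defaults, not necessarily yours):

  <Membership className="org.apache.catalina.cluster.mcast.McastService"
              mcastAddr="228.0.0.4"
              mcastPort="45564"
              mcastFrequency="500"
              mcastDropTime="30000"/>  <!-- 30s of missed heartbeats before a silent member is dropped -->

With mcastDropTime="30000" the surviving node only declares a member dead after 30 seconds of silence, so a node that restarts faster than that looks to the others as if it never left.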

As Rainer mentioned, if you are just starting to use clustering, switch to TC6 to avoid the migration you will have to make later. TC6 also handles this scenario regardless of what you set your drop time to.

Filip
I checked the Tomcat logs and didn't see any issues with members
dropping from the cluster until the JVM got close to running out of
memory and was performing a lot of full GCs. When examining the dump,
the vast majority of space in the heap (600+ MB out of 1 GB) was taken
up by byte arrays referenced by LinkObjects.

 In general I would suggest not using the waitForAck feature. That's not
 a strict rule, but if you do async replication and use session
 stickiness for load balancing, then you usually put a strong focus on
 the replication not influencing your webapp negatively. Activating
 waitForAck lets you detect more reliably whether there is a replication
 problem, but it also increases the overhead. Your mileage may vary.
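
(For anyone else reading along: waitForAck is an attribute on the cluster Sender element in server.xml. With the fast async sender it would look something like this - TC 5.5 syntax, trimmed down, and the values are just illustrative:)

  <!-- fastasyncqueue is the mode that goes through FastQueue/FastAsyncSocketSender;
       waitForAck="false" disables the ack, per the suggestion above -->
  <Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
          replicationMode="fastasyncqueue"
          waitForAck="false"/>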

So what would cause the FastQueue to accumulate ClusterData even when
the cluster is apparently running properly? Is there any failsafe
(besides setting a maximum queue size) to allow old data to be purged?
I mean, 600k ClusterData objects is a lot!

-Dave

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]