Re: Cluster Not Syncing After Restart

2008-01-21 Thread Peter Rossbach

Yes, you are right!

Feedback welcome :-)
Peter



Am 18.01.2008 um 22:24 schrieb David Rees:


On Jan 18, 2008 1:13 PM, David Rees [EMAIL PROTECTED] wrote:

So your testing is with the to-be-released 5.5.26? Where can I get a
copy of this and the changes going into 5.5.26?


I found the changelog, looks like I need to build 5.5.26 from source
myself if i want to test it?

http://svn.apache.org/viewvc/tomcat/container/tc5.5.x/webapps/docs/ 
changelog.xml


It does look like there are some bugfixes in it which may affect
cluster synchronization.

-Dave

-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






Re: Cluster Not Syncing After Restart

2008-01-18 Thread Peter Rossbach

Hi David,

I have also made a test with two cluster nodes. Is node2 under load  
as node1 is started the message are normal.
Cluster is starts first and receive message, but the application  
starts later. As application is startet at node1 the manager
request node2 for current sessions. All message that received before  
the sync is complete are dropped. All
message has a timestamp and only message dropped before the  
GET_ALL_SESSION message are created.


At next release 5.5.26 the sessionCounter is also set correct after  
restart. At my test all sessions are correct load

after restart.

Peter

PS: My Cluster Config:

		Cluster  
className=org.apache.catalina.cluster.tcp.SimpleTcpCluster

 doClusterLog=true
 clusterLogName=clusterlog
  
manager.className=org.apache.catalina.cluster.session.DeltaManager

 manager.expireSessionsOnShutdown=false
 manager.notifyListenersOnReplication=false
  
manager.notifySessionListenersOnReplication=false

 manager.sendAllSessions=false
 manager.sendAllSessionsSize=500
 manager.sendAllSessionsWaitTime=20
   Membership  
className=org.apache.catalina.cluster.mcast.McastService

   mcastAddr=224.0.0.4
   mcastPort=${mcast.port.prefix}7
   mcastFrequency=1000
   mcastDropTime=3/
   Receiver  
className=org.apache.catalina.cluster.tcp.ReplicationListener

   tcpListenAddress=auto
   tcpListenPort=${port.prefix}7
   tcpSelectorTimeout=100
   tcpThreadCount=6 /
   Sender  
className=org.apache.catalina.cluster.tcp.ReplicationTransmitter

   replicationMode=fastasyncqueue
   recoverTimeout=5000
   recoverCounter=6
   doTransmitterProcessingStats=true
   doProcessingStats=true
   doWaitAckStats=true
   queueTimeWait=true
   queueDoStats=true
   queueCheckLock=true
   ackTimeout=15000
   waitForAck=true
   keepAliveTimeout=8
   keepAliveMaxRequestCount=-1/
   Valve  
className=org.apache.catalina.cluster.tcp.ReplicationValve
  filter=.*\.gif;.* 
\.js;.*\.css;.*\.png;.*\.jpeg;.*\.jpg;.*\.htm;.*\.html;.*\.txt;

primaryIndicator=true /
   Valve  
className=org.apache.catalina.cluster.session.JvmRouteBinderValve

 enabled=true / 
   ClusterListener  
className=org.apache.catalina.cluster.session.ClusterSessionListener / 

   ClusterListener  
className=org.apache.catalina.cluster.session.JvmRouteSessionIDBinderLi 
stener /

  /Cluster


Am 17.01.2008 um 09:24 schrieb David Rees:


On Jan 17, 2008 12:16 AM, Peter Rossbach [EMAIL PROTECTED] wrote:

you can't restart your nodes so quickly. As you shutdown one node you
must wait 30-45 sec before next restart. Don't restart both nodes  
at same time.


I am waiting at 4 minutes before restarting node 2 before restarting
node 1, so that is not the problem in this case.

The sequence goes like this:

04:30: Shut down node 1, it gets restarted automatically immediately
after it has completely shut down (typically within 10 seconds).
04:34: Shut down node 2, it gets restarted automatically immediately
after it has completely shut down (typically within 10 seconds).

-Dave

-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]






Re: Cluster Not Syncing After Restart

2008-01-18 Thread David Rees
On Jan 18, 2008 1:13 PM, David Rees [EMAIL PROTECTED] wrote:
 So your testing is with the to-be-released 5.5.26? Where can I get a
 copy of this and the changes going into 5.5.26?

I found the changelog, looks like I need to build 5.5.26 from source
myself if i want to test it?

http://svn.apache.org/viewvc/tomcat/container/tc5.5.x/webapps/docs/changelog.xml

It does look like there are some bugfixes in it which may affect
cluster synchronization.

-Dave

-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Cluster Not Syncing After Restart

2008-01-18 Thread David Rees
On Jan 18, 2008 8:17 AM, Peter Rossbach [EMAIL PROTECTED] wrote:
 I have also made a test with two cluster nodes. Is node2 under load
 as node1 is started the message are normal.
 Cluster is starts first and receive message, but the application
 starts later. As application is startet at node1 the manager
 request node2 for current sessions. All message that received before
 the sync is complete are dropped. All
 message has a timestamp and only message dropped before the
 GET_ALL_SESSION message are created.

Just to be clear, you are able to stop and immediately restart a node
multiple times without issue?

 At next release 5.5.26 the sessionCounter is also set correct after
 restart. At my test all sessions are correct load
 after restart.

So your testing is with the to-be-released 5.5.26? Where can I get a
copy of this and the changes going into 5.5.26?

Thanks,
Dave

-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Cluster Not Syncing After Restart

2008-01-17 Thread Peter Rossbach

Hi David,

you can't restart your nodes so quickly. As you shutdown one node you  
must wait 30-45 sec


before next restart. Don't restart both nodes at same time.

Peter


Am 17.01.2008 um 02:25 schrieb David Rees:


I'm running a Tomcat 5.5.25 cluster with 2 nodes and the following
cluster configuration in the Host/ element:

Cluster className=org.apache.catalina.cluster.tcp.SimpleTcpCluster
Valve className=org.apache.catalina.cluster.tcp.ReplicationValve
		   filter=.*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.* 
\.css;.*\.txt;/
	Valve  
className=org.apache.catalina.cluster.session.JvmRouteBinderValve

enabled=true/
ClusterListener
className=org.apache.catalina.cluster.session.ClusterSessionListener 
/

ClusterListener
className=org.apache.catalina.cluster.session.JvmRouteSessionIDBinder 
Listener/

/Cluster

Every night, I restart the two nodes of the cluster, one at a time,
using a standard shutdown with a processing monitoring Tomcat to start
it back up automatically after it has exited (daemontools for those
interested).

But after restarting the first node, it does not rejoin the cluster.
In the logs of node 1 are this (node was shut down at 04:30):

2008-01-17 04:30:07,212 INFO : SimpleTcpCluster: Cluster is about  
to start

2008-01-17 04:30:07,317 INFO : SimpleTcpCluster: Add Default
ClusterReceiver at cluster localhost
2008-01-17 04:30:07,326 INFO : SimpleTcpCluster: Add Default
ClusterSender at cluster localhost
2008-01-17 04:30:07,341 INFO : SocketReplicationListener: Open Socket
at [10.1.1.5:8015]
2008-01-17 04:30:07,363 ERROR: ClusterListener: Context manager  
doesn't exist:

^^^ That line repeats ~25,000 times for a couple seconds
2008-01-17 04:30:10,110 ERROR: ClusterListener: Context manager  
doesn't exist:

2008-01-17 04:30:11,443 INFO : McastService: membership mbean
registered (Catalina:type=ClusterMembership,host=localhost)
2008-01-17 04:30:11,790 INFO : DeltaManager: Starting clustering  
manager...:

2008-01-17 04:30:11,790 INFO : DeltaManager: Register manager  to
cluster element Host with name localhost
2008-01-17 04:30:11,790 INFO : DeltaManager: Starting clustering  
manager at

2008-01-17 04:30:11,794 WARN : DeltaManager: Manager [], requesting
session state from
org.apache.catalina.cluster.mcast.McastMember[tcp:// 
10.1.1.6:8015,catalina,10.1.1.6,8015,

alive=86161577]. This operation will timeout if no session state has
been received within 60 seconds.
2008-01-17 04:30:21,894 ERROR: DeltaManager: Manager []: No session
state send at 17/01/08 04:30 received, timing out after 10,102 ms.

While node 1 is shutting down, ~4000 messages like this are  
generated on node 2:


2008-01-17 04:30:06,233 WARN : FastAsyncSocketSender: Unable to
asynchronously send session with id=[x.c-web2-1200438737718] -
message will be ignored.
java.lang.NullPointerException
at org.apache.catalina.cluster.tcp.DataSender.pushMessage 
(DataSender.java:1057)
at org.apache.catalina.cluster.tcp.FastAsyncSocketSender 
$FastQueueThread.pushQueuedMessages(FastAsyncSocketSender.java:524)
at org.apache.catalina.cluster.tcp.FastAsyncSocketSender 
$FastQueueThread.run(FastAsyncSocketSender.java:487)

2008-01-17 04:30:06,235 WARN : FastAsyncSocketSender: Unable to
asynchronously send session with id=[x.c-web2-1200438739053] -
message will be ignored.
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java: 
333)
at java.net.PlainSocketImpl.connectToAddress 
(PlainSocketImpl.java:195)

at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at org.apache.catalina.cluster.tcp.DataSender.createSocket 
(DataSender.java:897)
at org.apache.catalina.cluster.tcp.DataSender.openSocket 
(DataSender.java:866)
at org.apache.catalina.cluster.tcp.DataSender.pushMessage 
(DataSender.java:1009)
at org.apache.catalina.cluster.tcp.FastAsyncSocketSender 
$FastQueueThread.pushQueuedMessages(FastAsyncSocketSender.java:524)
at org.apache.catalina.cluster.tcp.FastAsyncSocketSender 
$FastQueueThread.run(FastAsyncSocketSender.java:487)


These stop after about 1 second (last one at 04:30:07.340), and then a
decent number of Send stats lines are then printed on node 2:

2008-01-17 04:30:07,376 INFO : DataSender: Send stats from
[10.1.1.5:8,015], Nr of bytes sent=20,441,242 over 26,000 = 786
bytes/request, processing time 58,669,413 msec, avg processing time
2,257 msec
!-- More lines similar to the above and below --
2008-01-17 04:30:10,053 INFO : DataSender: Send stats from
[10.1.1.5:8,015], Nr of bytes sent=39,242,737 over 50,000 = 784
bytes/request, processing time 58,672,059 msec, avg processing time
1,173 msec

When node 2 restarts a couple minutes later, it is 

Re: Cluster Not Syncing After Restart

2008-01-17 Thread David Rees
On Jan 17, 2008 12:16 AM, Peter Rossbach [EMAIL PROTECTED] wrote:
 you can't restart your nodes so quickly. As you shutdown one node you
 must wait 30-45 sec before next restart. Don't restart both nodes at same 
 time.

I am waiting at 4 minutes before restarting node 2 before restarting
node 1, so that is not the problem in this case.

The sequence goes like this:

04:30: Shut down node 1, it gets restarted automatically immediately
after it has completely shut down (typically within 10 seconds).
04:34: Shut down node 2, it gets restarted automatically immediately
after it has completely shut down (typically within 10 seconds).

-Dave

-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Cluster Not Syncing After Restart

2008-01-16 Thread David Rees
I'm running a Tomcat 5.5.25 cluster with 2 nodes and the following
cluster configuration in the Host/ element:

Cluster className=org.apache.catalina.cluster.tcp.SimpleTcpCluster
Valve className=org.apache.catalina.cluster.tcp.ReplicationValve
   
filter=.*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;/
Valve 
className=org.apache.catalina.cluster.session.JvmRouteBinderValve
enabled=true/
ClusterListener
className=org.apache.catalina.cluster.session.ClusterSessionListener/
ClusterListener
className=org.apache.catalina.cluster.session.JvmRouteSessionIDBinderListener/
/Cluster

Every night, I restart the two nodes of the cluster, one at a time,
using a standard shutdown with a processing monitoring Tomcat to start
it back up automatically after it has exited (daemontools for those
interested).

But after restarting the first node, it does not rejoin the cluster.
In the logs of node 1 are this (node was shut down at 04:30):

2008-01-17 04:30:07,212 INFO : SimpleTcpCluster: Cluster is about to start
2008-01-17 04:30:07,317 INFO : SimpleTcpCluster: Add Default
ClusterReceiver at cluster localhost
2008-01-17 04:30:07,326 INFO : SimpleTcpCluster: Add Default
ClusterSender at cluster localhost
2008-01-17 04:30:07,341 INFO : SocketReplicationListener: Open Socket
at [10.1.1.5:8015]
2008-01-17 04:30:07,363 ERROR: ClusterListener: Context manager doesn't exist:
^^^ That line repeats ~25,000 times for a couple seconds
2008-01-17 04:30:10,110 ERROR: ClusterListener: Context manager doesn't exist:
2008-01-17 04:30:11,443 INFO : McastService: membership mbean
registered (Catalina:type=ClusterMembership,host=localhost)
2008-01-17 04:30:11,790 INFO : DeltaManager: Starting clustering manager...:
2008-01-17 04:30:11,790 INFO : DeltaManager: Register manager  to
cluster element Host with name localhost
2008-01-17 04:30:11,790 INFO : DeltaManager: Starting clustering manager at
2008-01-17 04:30:11,794 WARN : DeltaManager: Manager [], requesting
session state from
org.apache.catalina.cluster.mcast.McastMember[tcp://10.1.1.6:8015,catalina,10.1.1.6,8015,
alive=86161577]. This operation will timeout if no session state has
been received within 60 seconds.
2008-01-17 04:30:21,894 ERROR: DeltaManager: Manager []: No session
state send at 17/01/08 04:30 received, timing out after 10,102 ms.

While node 1 is shutting down, ~4000 messages like this are generated on node 2:

2008-01-17 04:30:06,233 WARN : FastAsyncSocketSender: Unable to
asynchronously send session with id=[x.c-web2-1200438737718] -
message will be ignored.
java.lang.NullPointerException
at 
org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:1057)
at 
org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.pushQueuedMessages(FastAsyncSocketSender.java:524)
at 
org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.run(FastAsyncSocketSender.java:487)
2008-01-17 04:30:06,235 WARN : FastAsyncSocketSender: Unable to
asynchronously send session with id=[x.c-web2-1200438739053] -
message will be ignored.
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at 
org.apache.catalina.cluster.tcp.DataSender.createSocket(DataSender.java:897)
at 
org.apache.catalina.cluster.tcp.DataSender.openSocket(DataSender.java:866)
at 
org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:1009)
at 
org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.pushQueuedMessages(FastAsyncSocketSender.java:524)
at 
org.apache.catalina.cluster.tcp.FastAsyncSocketSender$FastQueueThread.run(FastAsyncSocketSender.java:487)

These stop after about 1 second (last one at 04:30:07.340), and then a
decent number of Send stats lines are then printed on node 2:

2008-01-17 04:30:07,376 INFO : DataSender: Send stats from
[10.1.1.5:8,015], Nr of bytes sent=20,441,242 over 26,000 = 786
bytes/request, processing time 58,669,413 msec, avg processing time
2,257 msec
!-- More lines similar to the above and below --
2008-01-17 04:30:10,053 INFO : DataSender: Send stats from
[10.1.1.5:8,015], Nr of bytes sent=39,242,737 over 50,000 = 784
bytes/request, processing time 58,672,059 msec, avg processing time
1,173 msec

When node 2 restarts a couple minutes later, it is able to
successfully rejoin the cluster:

2008-01-17 04:34:05,433 INFO : SimpleTcpCluster: Cluster is about to start
2008-01-17 04:34:05,515 INFO : SimpleTcpCluster: Add Default
ClusterReceiver at cluster localhost
2008-01-17 04:34:05,521 INFO :