Further information and questions.

I created my own interceptor based on ThroughputInterceptor so that I could log the timing of specific sessions to correlate them with the failures in my health check program. I was surprised to find that in those instances where the health check reported a failure, the interceptor reported that the session send was accomplished in < 5 ms, while the health check app is waiting a full 1000 ms between calls to the different tomcat instances. So now I'm more confused than ever.

Anyone have any ideas?

In a ChannelInterceptor, when getNext().sendMessage(destination, msg, payload) returns, does that mean that the message has been sent AND received by the recipient member, or does it only indicate that the message has been sent?


Mitch

On 09/06/2018 01:53 PM, Mitch Claborn wrote:
I'm using a cluster with the DeltaManager between two servers on Tomcat 9.0.11. I've set channelSendOptions="8" (asynchronous session replication).
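
For reference, channelSendOptions is a bit mask, so "8" can be decoded into named flags. The sketch below is a standalone helper, not part of Tomcat: the class name SendOptionFlags is hypothetical, and the numeric values are copied by hand from the SEND_OPTIONS_* constants in org.apache.catalina.tribes.Channel as I understand them, so they should be checked against the Tomcat version in use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decodes a channelSendOptions value into flag names. */
final class SendOptionFlags {
    // Assumption: these mirror Channel.SEND_OPTIONS_* in Tomcat Tribes.
    private static final int[] BITS = {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40};
    private static final String[] NAMES = {
        "BYTE_MESSAGE", "USE_ACK", "SYNCHRONIZED_ACK",
        "ASYNCHRONOUS", "SECURE", "UDP", "MULTICAST"
    };

    private SendOptionFlags() {
    }

    /** Returns the names of all flags set in options, lowest bit first. */
    static List<String> decode(int options) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < BITS.length; i++) {
            if ((options & BITS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }
}
```

With these values, SendOptionFlags.decode(8) yields only ASYNCHRONOUS, while decode(6) yields USE_ACK and SYNCHRONIZED_ACK, which is the combination usually described as synchronous replication with acknowledgement.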

I have a "health check" app that I run periodically, one of the functions being to check that sessions are being replicated properly. That app:
1) Does a GET to tomcat A, calling a Struts action that creates a session and stores a known value in it
2) Waits 2 seconds
3) Uses the session ID cookie from step 1 and makes a call to tomcat B, to an action that retrieves that value from the session
4) Compares the two values from the session to make sure that they are the same.

Most of the time this check works fine, but occasionally the call to the second server will find that the session does not exist on that server, presumably because it has not yet replicated there. 2 seconds seems a long time for a session to replicate, especially one as small as this one is. If I decrease the amount of wait time at step 2, the failure rate increases.
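
One way to turn the fixed wait into a measurement is to poll the second server until the value appears and record how long that took. The following is only a sketch under stated assumptions: ReplicationProbe and measureLag are hypothetical names, and the actual HTTP call to tomcat B is abstracted behind a Supplier so that the timing logic stands alone.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.OptionalLong;
import java.util.function.Supplier;

/** Hypothetical probe: measures how long a value takes to appear on server B. */
final class ReplicationProbe {
    private ReplicationProbe() {
    }

    /**
     * Calls fetchFromB repeatedly until it returns the expected value or the
     * timeout elapses. Returns the elapsed milliseconds on success, or an
     * empty result if the value never appeared (or the thread was interrupted).
     */
    static OptionalLong measureLag(Supplier<Optional<String>> fetchFromB,
                                   String expected,
                                   Duration timeout,
                                   Duration pollInterval) {
        long start = System.nanoTime();
        long timeoutNanos = timeout.toNanos();
        while (true) {
            Optional<String> seen = fetchFromB.get();
            long elapsed = System.nanoTime() - start;
            if (seen.isPresent() && seen.get().equals(expected)) {
                return OptionalLong.of(elapsed / 1_000_000L);
            }
            if (elapsed >= timeoutNanos) {
                return OptionalLong.empty();
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up on this probe.
                Thread.currentThread().interrupt();
                return OptionalLong.empty();
            }
        }
    }
}
```

In the health check, the supplier would wrap the request to tomcat B using the session ID cookie from step 1, returning an empty Optional when the session is not found. Logging the measured lag per run gives a distribution to compare against the interceptor timings, rather than a pass/fail result at a single wait time.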

I turned on the ThroughputInterceptor and have the following observations.
- Server A has a transmit throughput around 10 MB/sec while B has only around 3 MB/sec. This might be accounted for by the fact that B was the last server to start, so A would have (I think) transmitted all of the sessions at once when B started up, so it might get good throughput from the big send??

Questions:
1. Is 2 seconds a long time to replicate a session?
2. Other than actual network slowness, are there internal issues that could cause the replication to be slow?
3. If so, is there any way to diagnose those?
4. I'm thinking about writing my own version of ThroughputInterceptor that will give more information on specific messages and timings. Has anyone tried that? In that interceptor can I access the session ID? That would help me correlate timings between my failure reports and the interceptor.


