Hey Filip,

your help is very welcome.

Filip Hanik - Dev lists wrote:

I ran some load tests with the pooled mode and the clustering stats are looking good.
Next week I am expecting to dig a little bit deeper into the code, but so far it is looking pretty good.


Well, that is very good news.

I am getting an increased number of incomplete responses, such as 302 redirects from tomcat, but that can also be the load balancer or the client scrambling the headers making an incomplete request.


I have tested with mod_jk 1.2.10 load balancing and Apache 2.0.52/53 (Windows XP, SuSE 9.1), and next week I will start some tests with a Cisco LB in combination with a lot of Apaches/Tomcats (8 Apaches, and on every host a domain of 3 clustered Tomcats).

I don't see those 302s.


I am glad you removed the compress flag, I am not sure what that was to begin with as if I remember it correctly, messages were already being compressed, and during profiling, this had little impact on performance


In my profiling, compress mode is only useful when you have large replication messages (> 8k bytes),
but it uses more CPU (> 20-30% more). I did not remove the compress flag; I have disabled it by default. It is a sender/receiver attribute. The attributes waitForAck and compress were transferred to the Receiver:


<Receiver
className="org.apache.catalina.cluster.tcp.SocketReplicationListener"
tcpListenAddress="@node.clustertcp.address@"
tcpListenPort="@node.clustertcp.port@"
doReceivedProcessingStats="true"
/>
<Sender
className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
replicationMode="fastasyncqueue"
compress="true"
doTransmitterProcessingStats="true"
doProcessingStats="true"
doWaitAckStats="true"
queueTimeWait="true"
queueDoStats="true"
queueCheckLock="true"
ackTimeout="15000"
waitForAck="true"
autoConnect="false"
keepAliveTimeout="80000"
keepAliveMaxRequestCount="-1"/>


One of my ideas is:

Change the cluster protocol so that developers can add their own data serialization/deserialization format (high risk)

Currently:
header 7 bytes (FLT2002)
data.length 4 bytes
data
end header 7 bytes (TLF2003)
Optimized to:
header 2 bytes (TC)
type 1 byte
compress flag 1 byte
data.length 4 bytes
data | <real uncompressed data.length (4 bytes)> data
"type" means a user-defined type; the receiver extracts the bytes and the type and sends them to a callback
(see ObjectReader or SocketObjectReader).
If the compress flag is 1, the first 4 data bytes are the real uncompressed data length (this is for better memory management on the receiver side, see XByteBuffer).
Override the ClusterSender and ClusterReceiver serialize/deserialize methods.


- Then we can set up a flag on ClusterMessage or make an on-the-fly decision to compress data.
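
To make the optimized frame layout above concrete, here is a minimal sketch of how it could be written and read. It is only an illustration under assumptions: the class and method names (ClusterFrameSketch, Frame, encode, decode) are hypothetical and do not exist in the cluster module, and java.util.zip Deflater/Inflater is just one possible compression choice.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical codec for the proposed frame: "TC" | type | compress flag | data.length | data. */
final class ClusterFrameSketch {
    private static final byte[] HEADER = {'T', 'C'};

    /** Decoded frame: the user-defined type plus the uncompressed payload. */
    static final class Frame {
        final byte type;
        final byte[] data;
        Frame(byte type, byte[] data) { this.type = type; this.data = data; }
    }

    static byte[] encode(byte type, byte[] payload, boolean compress) {
        byte[] body = payload;
        if (compress) {
            byte[] deflated = deflate(payload);
            // Compressed body: 4-byte uncompressed length, then the deflated bytes.
            body = ByteBuffer.allocate(4 + deflated.length)
                    .putInt(payload.length).put(deflated).array();
        }
        return ByteBuffer.allocate(HEADER.length + 2 + 4 + body.length)
                .put(HEADER).put(type).put((byte) (compress ? 1 : 0))
                .putInt(body.length).put(body).array();
    }

    static Frame decode(byte[] frame) {
        ByteBuffer in = ByteBuffer.wrap(frame);
        if (in.remaining() < 8 || in.get() != 'T' || in.get() != 'C') {
            throw new IllegalArgumentException("bad header");
        }
        byte type = in.get();
        boolean compressed = in.get() == 1;
        int length = in.getInt();
        if (length != in.remaining()) {
            throw new IllegalArgumentException("bad data.length");
        }
        if (!compressed) {
            byte[] data = new byte[length];
            in.get(data);
            return new Frame(type, data);
        }
        if (length < 4) {
            throw new IllegalArgumentException("missing uncompressed length");
        }
        int rawLength = in.getInt();
        if (rawLength < 0) {
            throw new IllegalArgumentException("bad uncompressed length");
        }
        // The receiver can allocate the exact buffer size up front.
        byte[] data = new byte[rawLength];
        Inflater inflater = new Inflater();
        inflater.setInput(frame, in.position(), length - 4);
        try {
            int n = 0;
            while (n < rawLength && !inflater.finished()) {
                int r = inflater.inflate(data, n, rawLength - n);
                if (r == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
                n += r;
            }
            if (n != rawLength) throw new IllegalArgumentException("length mismatch");
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return new Frame(type, data);
    }

    private static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }
}
```

Because the compress flag travels with each frame, the sender could decide per message whether compression is worth the CPU cost, for example only for payloads above some size threshold.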


when changing the code, I was wondering if we can stick to method names that make sense and are logical


public int getTimeoutAllSession()
If this means return the count of all sessions that have timed out, I would suggest public int getSessionTimeoutCount()



No, it is the timeout value in seconds that DeltaManager waits after sending the "all sessions" event to another cluster member.



protected ClusterMessage createRecevierObject(byte[] data)
do you mean deserialize? as in protected ClusterMessage deserialize(byte[] data)


Yes, I have changed the names in ClusterReceiverBase and ReplicationTransmitter.
Those are also my favorite names, but time is limited when you refactor code.


I must admit that I am having a little bit of a hard time reading the code because of the funky naming conventions, do you mind me cleaning up some when I go in and add changes?


Yes, feel free to find better names. Please also change the names inside the MBean descriptors and the test code.
I think we must coordinate the work: you announce the renaming step, then I can pause my redesign and refactorings.



I will be pushing for stabilization as opposed to new features and so called "refactoring".
As an example, to customers stability and speed is more important than features, take MySQL for example.


Yes, you are right. But my code changes are important for better understanding and give clearer semantics to a
lot of classes. The other thing is: I want to make the cluster faster and easier to extend. I hope we can also port Remy's/Mladen's APR
sockets to the clustering module.


The following cases/classes need help:

- SimpleTcpCluster
pause/resume senders
Do you also mean that pausing the Receiver would help?
Then you must also stop the Membership, and that is dangerous.
=> pause: We can send a message to all other nodes saying that we are still a member, but asking them to queue all messages for us.
We also queue all messages from the local node.
=> resume: Tell all nodes that they can now send their queues, and we also start sending again.


Hmm: the async senders can handle it.
Currently the async socket sender stops its thread when you call disconnect;
when you call connect, a new thread starts and all queued messages are sent.


- PooledSocketSender
  - extend JMX stats
  - pause/resume sockets

- DeltaManager
  - expire sessions
        processExpire sends one message for every session.
        Every 60 seconds the cluster sends a lot of those messages.
      => Better: collect all expired sessions and send one big expire message package.
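
The batching idea above could look roughly like this. It is a sketch under assumptions, not the DeltaManager code: ExpireBatchSketch, ExpireBatchMessage and collectExpired are hypothetical names, and the session store is reduced to a map from session id to last access time in milliseconds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: one expire message per check run instead of one per session. */
final class ExpireBatchSketch {

    /** Hypothetical message type carrying many expired session ids at once. */
    static final class ExpireBatchMessage {
        final List<String> sessionIds;
        ExpireBatchMessage(List<String> sessionIds) {
            this.sessionIds = Collections.unmodifiableList(new ArrayList<>(sessionIds));
        }
    }

    /**
     * Collects every session whose idle time has reached maxInactiveMillis.
     * lastAccess maps session id to last access time in milliseconds (assumed model).
     */
    static ExpireBatchMessage collectExpired(Map<String, Long> lastAccess,
                                             long now, long maxInactiveMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastAccess.entrySet()) {
            if (now - entry.getValue() >= maxInactiveMillis) {
                expired.add(entry.getKey());
            }
        }
        return new ExpireBatchMessage(expired);
    }
}
```

The sender would then transmit a single ExpireBatchMessage per expiry run, and the receiving nodes would expire each listed id locally.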

- The restarting-node scenario is flaky....
You wait for the GETALLMessage while the other node sends session events. (BAD)
You can get a session delta before the session exists....
=> I think that until the state has been transferred we must queue those messages from the other cluster nodes.
- Split the "send all sessions" transfer into more than one message
1000 sessions per message
After the complete transfer of the active sessions, send a special state-transfer message.
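
The two proposals above (queue deltas until the state has arrived, and split the transfer into blocks of 1000 sessions) can be sketched as follows. All names here (StateTransferSketch, chunk, RestartingNode, onDelta, onStateTransferComplete) are hypothetical, and messages are reduced to plain strings for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of chunked state transfer with delta queueing on the restarting node. */
final class StateTransferSketch {

    /** Sender side: split all sessions into messages of at most perMessage entries. */
    static <T> List<List<T>> chunk(List<T> sessions, int perMessage) {
        if (perMessage <= 0) {
            throw new IllegalArgumentException("perMessage must be positive");
        }
        List<List<T>> messages = new ArrayList<>();
        for (int i = 0; i < sessions.size(); i += perMessage) {
            int end = Math.min(i + perMessage, sessions.size());
            messages.add(new ArrayList<>(sessions.subList(i, end)));
        }
        return messages;
    }

    /** Receiver side: hold back deltas until the special state-transfer message arrives. */
    static final class RestartingNode {
        private final List<String> applied = new ArrayList<>();
        private final Deque<String> pending = new ArrayDeque<>();
        private boolean stateTransferred;

        void onDelta(String delta) {
            if (stateTransferred) {
                applied.add(delta);
            } else {
                pending.add(delta); // session may not exist yet, so queue it
            }
        }

        /** Called when the special "state transfer complete" message is received. */
        void onStateTransferComplete() {
            stateTransferred = true;
            while (!pending.isEmpty()) {
                applied.add(pending.poll()); // replay in arrival order
            }
        }

        List<String> applied() {
            return applied;
        }
    }
}
```

With 2500 active sessions and a block size of 1000, chunk would produce three messages, followed by the one special message that triggers onStateTransferComplete on the restarting node.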


- documentation
Write a new how-to and add a sample config
=> I have implemented a very fine cluster template and will check it in this weekend.


Your ClusterSessionListener server.xml change is not needed. When the cluster starts, a ClusterSessionListener
is created if no other listener is configured.


Peter


Filip


Peter Rossbach wrote:

Yes, I have changed a lot and it is time to test and stabilize the code.
   See to-do.txt for more....

The current cluster code with the 5.5.9 fix pack works very well; I tested the fix under very high
load last week.


Peter

- Great that you are also starting to look inside the code.

Filip Hanik - Dev Lists wrote:

I am going through the cluster code right now and will be adding fixes along the way.
I think the development of this code has focused more on features than stability, so I would like to ask that for the next period, let's focus on the stability and get this beast back in shape again.


Filip


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]












