DO NOT REPLY [Bug 28161] - Replication messages get lost with AsyncSocketSender
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://issues.apache.org/bugzilla/show_bug.cgi?id=28161. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=28161

Replication messages get lost with AsyncSocketSender

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

--- Additional Comments From [EMAIL PROTECTED] 2004-04-07 20:41 ---
The DeltaManager now assigns a different ID to its messages, which makes them unique. This way all the messages will get through. However, if the async sender could be pooled in the same way the synchronous sender is in pooled mode, that would increase performance a lot :)

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
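The fix described above can be pictured with a small sketch. It is not the Tomcat code: UniqueIdQueue, nextMessageId and the "session ID plus running number" scheme are all hypothetical names chosen for illustration. It only assumes a queue that overwrites a pending entry when a new entry arrives under the same key, and shows that giving every message its own ID keeps both messages of a session in the queue, in order.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: a keyed queue in which an entry with an existing key is overwritten. */
final class UniqueIdQueue {
    private final Map<String, String> pending = new LinkedHashMap<>();
    private final AtomicLong sequence = new AtomicLong();

    /** Hypothetical unique-ID scheme (an assumption): session ID plus a running number. */
    String nextMessageId(String sessionId) {
        return sessionId + "-" + sequence.incrementAndGet();
    }

    /** Queues a payload under the given key; an equal key would replace the pending entry. */
    synchronized void add(String messageId, String payload) {
        pending.put(messageId, payload);
    }

    /** Removes and returns the oldest pending payload, or null if the queue is empty. */
    synchronized String remove() {
        Iterator<String> it = pending.values().iterator();
        if (!it.hasNext()) {
            return null;
        }
        String payload = it.next();
        it.remove();
        return payload;
    }

    synchronized int size() {
        return pending.size();
    }
}
```

With unique keys, two messages queued for the same session are both still pending when the sender thread gets around to them, and they leave the queue in the order they were added.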
DO NOT REPLY [Bug 28161] - Replication messages get lost with AsyncSocketSender
--- Additional Comments From [EMAIL PROTECTED] 2004-04-05 16:39 ---
Maybe interesting: a simple Hello World JSP does NOT trigger the problem. But once I add a scriptlet

<% session.setAttribute(COUNT, new Integer(10)); %>

the problem arises.

I can now reproduce it without any Apache or mod_jk involvement. I just start a 2-node cluster and call the JSP via direct HTTP connections to node #1 200 times, with a delay of 100 ms between calls. After the 200 calls, I find 200 sessions on node #1, but only between 170 and 195 sessions on node #2. I check the session count via /manager/html, but I also added debug output to confirm that some sessions are indeed missing.

I will try to go deeper into the cluster messages and the queue handling.
DO NOT REPLY [Bug 28161] - Replication messages get lost with AsyncSocketSender
--- Additional Comments From [EMAIL PROTECTED] 2004-04-05 17:26 ---
Next info: If I use this JSP, then synchronous and pooled are both EXTREMELY slow, with response times between 1000 ms and 5000 ms. As soon as I reduce tcpSelectorTimeout from 1000 to 10, I get more reasonable response times (10-50 ms). Any idea why tcpSelectorTimeout shows such a tremendous effect?

Then, when I use multiple parallel clients, synchronous again gets too slow, so only pooled is an alternative. Synchronous once showed a freeze (no more answers) for 15 seconds.

Neither synchronous nor pooled shows the problem of missing sessions.

Nevertheless, I much prefer the idea of having one or a few dedicated replication connections fed by a work queue and not directly coupled to the completion of the original response (asynchronous).
DO NOT REPLY [Bug 28161] - Replication messages get lost with AsyncSocketSender
--- Additional Comments From [EMAIL PROTECTED] 2004-04-05 19:11 ---
First: there are approx. 6 System.out.print/ln calls in the cluster code. One of these (line 71 in SmartQueue.java) prevented me from finding the solution earlier.

Here is the SOLUTION: What happens is that the smart feature of the smart queue gets us into trouble. For my JSP, two session messages are sent. One is of type 1 (EVT_SESSION_CREATED), and the second one is of type 13 (EVT_SESSION_DELTA). Both are sent very close to each other during the only request in a session. Most of the time the system is fast enough to handle each message individually before the next message is put into the queue. Every now and then, the type 1 message is not read from the queue before the type 13 message is generated. Then the queue replaces the type 1 message with the type 13 message, and only the type 13 message is sent out. The receiving side then seems not to create the session, since the type 1 message is missing. I didn't check this last point, because I think this is much clearer to you.

Isn't there a general problem in using the DeltaManager together with the smart queue? Since you only send out delta messages, it doesn't look like a good idea to replace pending messages with newer ones. In fact, isn't it necessary to send all deltas and, furthermore, to make sure that they are sent in the original order?

At least this makes clear why the problem only shows up in asynchronous mode. In synchronous mode you will always send all messages (and in the right order).

Maybe it suffices to strip the smart feature from the smart queue?
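The race described in this comment can be reproduced in miniature. The sketch below is an assumption-laden stand-in, not SmartQueue.java: SessionMsg and ReplacingQueue are hypothetical classes, and the only behaviour modelled is "a newer message for the same session replaces the pending one". The type numbers 1 and 13 are the ones quoted in the comment.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a replication message; not the Tomcat class. */
final class SessionMsg {
    static final int EVT_SESSION_CREATED = 1;
    static final int EVT_SESSION_DELTA = 13;

    final String sessionId;
    final int type;

    SessionMsg(String sessionId, int type) {
        this.sessionId = sessionId;
        this.type = type;
    }
}

/** Simplified "smart" queue: a pending message with the same session ID is overwritten. */
final class ReplacingQueue {
    private final Map<String, SessionMsg> pending = new LinkedHashMap<>();

    /** Adds a message; if one is already pending for this session, it is replaced. */
    synchronized void add(SessionMsg msg) {
        pending.put(msg.sessionId, msg);
    }

    /** Removes and returns the oldest pending message, or null if none is pending. */
    synchronized SessionMsg remove() {
        Iterator<SessionMsg> it = pending.values().iterator();
        if (!it.hasNext()) {
            return null;
        }
        SessionMsg msg = it.next();
        it.remove();
        return msg;
    }

    synchronized int size() {
        return pending.size();
    }
}
```

If the sender thread has not called remove() between the two add() calls of one request, only the type 13 message is left in the queue and the type 1 message is never sent. Replacing pending entries is only safe when each message carries the full state; with deltas, every message matters.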
DO NOT REPLY [Bug 28161] - Replication messages get lost with AsyncSocketSender
[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

--- Additional Comments From [EMAIL PROTECTED] 2004-04-02 15:57 ---
Async data send means that there is no time guarantee for when the session is delivered. The session should not get lost without any error trace in the logs. I am still debating whether to remove this feature altogether, but I left it in for people to play with. I have not found a case where async is useful, but I am sure there is one, which is why it is still there. Most of the time people want to be sure that the session gets replicated; that is where pooled mode comes in.

Also, from experience, using mod_jk under high load can result in lost sessions, because it sometimes messes up the request and loses the session ID. In my experience, pen (siag.nu/pen) works better as a load balancer.

I strongly suggest retrying the same test with replicationMode=pooled to see if you get better results. Pooled means that replication is synchronous, but on concurrent channels.

Filip
DO NOT REPLY [Bug 28161] - Replication messages get lost with AsyncSocketSender
--- Additional Comments From [EMAIL PROTECTED] 2004-04-02 16:45 ---
I respect your suggestion not to use asynchronous, although it looked to me like the right way to do it.

Just for your information: the messages really get lost; even after we stop the load, the missing messages don't get replicated. So it's not just a problem of messages getting replicated too late. There are only debug log statements the whole time, except for a few info messages giving mean values for replication data size. There are no other non-debug log statements on any cluster node. Also, from what I see, I'm pretty sure that the replication data is written to the socket.

Concerning mod_jk: for this test case we used each session only once, so the correctness of the response through mod_jk somehow didn't matter. We could easily reproduce the same situation using the built-in Tomcat HTTP connector (although we haven't done so yet).

We will retry using pooled, although I don't like the idea of having up to 25 connections (a code constant) and threads for each pair of nodes in the cluster. Also, I had the impression that in pooled mode TCP connections are only used for a very short time (I think I remember only 100 messages?). This application will be under heavy load in production.

Why do I think asynchronous fits better? In any synchronous situation, if the replication is not fast enough I immediately get negative consequences for the application from the user's point of view, because the request blocks resources needed for accepting new requests as long as the replication hasn't finished. So if replication is slow for a few seconds, I'm in danger of losing all free Apache slots or Tomcat worker threads for incoming requests. When I do asynchronous replication, I only lose timely replication of the session changes. If I route my request to the primary container, then I still profit from the cluster with respect to availability and serviceability (I can shut down one of the containers without users losing sessions). For these features it doesn't really matter whether all requests are replicated within milliseconds all the time.

I'm sorry to bother you, but I think it's an important discussion.
DO NOT REPLY [Bug 28161] - Replication messages get lost with AsyncSocketSender
[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WONTFIX                     |

--- Additional Comments From [EMAIL PROTECTED] 2004-04-02 17:01 ---
> We will retry using pooled, although I don't like the idea of having up to 25

This is not really a big resource issue, since no threads hold on to these connections; they just grab one from the queue when it is available, then return it.

Let's reopen this bug regarding async. Once I am all moved in and have my computers set up, I can start testing this again.

Filip
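The "grab one from the queue, then return it" pattern mentioned in this reply can be sketched as follows. This is a minimal illustration under assumptions, not the pooled sender from the cluster code: ChannelPool is a hypothetical class, the channel type is left generic, and the pool size is whatever collection the caller passes in (the thread mentions a constant of up to 25).

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical pool of idle channels shared by all request threads. */
final class ChannelPool<C> {
    private final BlockingQueue<C> idle;

    /** Creates a pool holding the given channels; the collection must not be empty. */
    ChannelPool(Collection<C> channels) {
        this.idle = new ArrayBlockingQueue<>(channels.size(), false, channels);
    }

    /** Borrows an idle channel (blocking until one is free), uses it, then returns it. */
    void send(Consumer<C> action) {
        C channel;
        try {
            channel = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a channel", e);
        }
        try {
            action.accept(channel);
        } finally {
            // Always hand the channel back, even if the send failed.
            idle.add(channel);
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

No thread owns a channel between sends, so the cost of the pool is the open connections themselves rather than one thread per connection. The calling request thread still blocks for the duration of its own send, which is what makes this mode synchronous.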
DO NOT REPLY [Bug 28161] - Replication messages get lost with AsyncSocketSender
--- Additional Comments From [EMAIL PROTECTED] 2004-04-02 17:04 ---
Also, the problem with async is that it uses only one channel; hence, during heavy load, you will not get millisecond throughput, because it queues all the messages.

The solution would be to make an async pooled mode.
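One possible shape for such an async pooled mode is sketched below. Everything in it is an assumption made for illustration: AsyncPooledSender, the "lane" terminology and the transport callback are hypothetical, and routing by session ID hash is a design choice of this sketch (so that messages of one session stay in order), not something proposed in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical async sender with several queues, each drained by its own thread. */
final class AsyncPooledSender {
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final BiConsumer<String, String> transport; // (sessionId, payload)

    AsyncPooledSender(int laneCount, BiConsumer<String, String> transport) {
        if (laneCount < 1) {
            throw new IllegalArgumentException("laneCount must be at least 1");
        }
        for (int i = 0; i < laneCount; i++) {
            // One thread per lane, so messages within a lane keep their order.
            lanes.add(Executors.newSingleThreadExecutor());
        }
        this.transport = transport;
    }

    /** Queues the message and returns immediately; the caller never waits for the send. */
    void send(String sessionId, String payload) {
        // The same session always maps to the same lane (assumption of this sketch).
        int lane = Math.floorMod(sessionId.hashCode(), lanes.size());
        lanes.get(lane).execute(() -> transport.accept(sessionId, payload));
    }

    /** Stops accepting messages and waits for the queued ones to be sent. */
    boolean shutdownAndWait(long timeoutMs) {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            boolean drained = true;
            for (ExecutorService lane : lanes) {
                drained &= lane.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
            }
            return drained;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The request thread only pays for an enqueue, while several lanes send in parallel. A real implementation would also need bounded queues and error handling for failed sends, which this sketch leaves out.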