I think part of the problem might be that subscriptions are not domain specific, even when you specify a domain. What I mean is that a user connected to B subscribes to messages for a domain that is mastered on A. However, when the subscription is forwarded to A, it matches messages from all domains, even those generated on B and sent to A. Does this make sense? Could this be part of the problem?
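To make the hypothesis above concrete, here is a minimal sketch of how an XPath query with no domain predicate matches keys from every domain, while a predicate on the `domain` attribute restricts the match. It uses Python's standard `xml.etree.ElementTree`, not xmlBlaster's own matching engine, so it only illustrates XPath semantics. The `<keys>` wrapper, the `ExampleDomainA` domain, its oid, and the `matching_oids` helper are assumptions made up for illustration; the actual subscription query used on the nodes is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Two message keys wrapped in a hypothetical <keys> container.
# The first mirrors the key from the dump in this thread; the second
# (ExampleDomainA) is invented for the comparison.
KEYS = """
<keys>
  <key oid='DomainHeartbeat-Albemarle911' contentMime='text/xml' domain='Albemarle911'/>
  <key oid='DomainHeartbeat-ExampleDomainA' contentMime='text/xml' domain='ExampleDomainA'/>
</keys>
"""


def matching_oids(xpath):
    """Return the oid of every <key> element selected by the XPath expression."""
    root = ET.fromstring(KEYS)
    return [key.get("oid") for key in root.findall(xpath)]


# A query without a domain predicate selects keys from all domains.
print(matching_oids(".//key"))

# A query with a domain predicate selects only that domain's keys.
print(matching_oids(".//key[@domain='ExampleDomainA']"))
```

If the forwarded subscription behaves like the first query, heartbeats published on B would also match on A, which is consistent with the loop described in this thread. Whether the cluster forwarding really drops the domain restriction is not confirmed here.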
David

_____

From: David R Robison [mailto:[EMAIL PROTECTED]
To: xmlblaster@server.xmlBlaster.org
Sent: Wed, 21 Nov 2007 10:41:10 -0500
Subject: Re: [xmlblaster] Callback message queue fills up

Here is a dump of one of the messages:

<MsgUnit index='0'>
  <key oid='DomainHeartbeat-Albemarle911' contentMime='text/xml' contentMimeExtended='1.0' domain='Albemarle911'/>
  <content size='46'>Domain Albemarle911 ALIVE at 11/21/07 09:48:43</content>
  <qos>
    <subscribable/>
    <sender>/node/Albemarle911/client/A-NATIVE-CLIENT-PLUGIN/-3</sender>
    <priority>MAX</priority>
    <subscribe id='__subId:StauntonSTC-XPATH1195628463329000000'/>
    <expiration lifeTime='30000' remainingLife='22703' forceDestroy='true'/>
    <rcvTimestamp nanos='1195659482613000002'/>
    <queue index='0' size='1'/>
    <persistent>false</persistent>
    <isUpdate/>
  </qos>
</MsgUnit>

The message was created on node B and sent to node A because of a subscription on node A. But it is now in the callback queue on A to go back to B. Also, I have never seen the route data in the messages. Is there a way to turn this on?

David

Marcel Ruff wrote:
> David R Robison wrote:
>> One other thought. Heartbeat messages are published on node B and
>> subscribed to by clients on node A. Also, there are clients on node B
>> that subscribe to messages on node A. However, it appears that the
>> subscriptions the clients on node B are using are also matching the
>> heartbeat messages from node B that have been sent to node A. Could I
>> have some kind of circular queue? A message is posted on B then sent
>> to A because of a subscription by a client on A. Then sent back to B
>> because of a subscription by a client on B for messages on A. Then
>> the message gets sent back to A and the whole cycle repeats?
> Could be, usually the cluster should prevent this ...
> The messages contain in their QoS the nodes traversed:
>
>   <qos>
>     <sender>joe</sender>
>     <route>
>       <node id='bilbo' stratum='2' timestamp='34460239640'/>
>       <node id='frodo' stratum='1' timestamp='34460239661'/>
>       <node id='heron' stratum='0' timestamp='34460239590'/>
>     </route>
>   </qos>
>
> it would be nice to see the dump of such messages,
> Use the jconsole or logging output from your receiving client or use the
> message sniffer, e.g.:
>   java javaclients.simplereader.SimpleReaderGui -xpath "//key"
>     -session.name simpleReader -passwd secret -protocol SOCKET
>     -dispatch/connection/plugin/socket/hostname 192.168.1.25 -dumpToFile true
> or peek the callback queue with administrative messages as described
> in one of your last posts,
>
> thanks
> Marcel
>
>>
>> Could this be possible? David
>>
>> David R Robison wrote:
>>> Thanks, See in line...
>>>
>>> Marcel Ruff wrote:
>>>> Hi David,
>>>>
>>>> do you have a jconsole to observe the two nodes?
>>> I don't have a jconsole, but can I get the same using the admin
>>> messages?
>>>>
>>>> If yes, please check the number of subscriptions the node A has
>>>> forwarded to node B
>>>> (look into node B and check the number of subscriptions of client
>>>> A) during such a case.
>>>> In case the subscribeQos has set
>>> I will check.
>>>>
>>>> <multiSubscribe>true</multiSubscribe>
>>> I believe that we set all to false.
>>>>
>>>> (which is the default) it could be that the subscriptions multiplied
>>>> during small connection errors and reconnects.
>>>> This is just a guess.
>>>> If it is the case please set multiSubscribe to false.
>>>>
>>>> Is there a high CPU load during the 1001 message case?
>>> No
>>>> Are the heartbeat messages persistent messages?
>>> Yes, but they only live 30 seconds. At any given time there should
>>> only be at most 2 in the history queue
>>>> Was the client connected or offline during this message overflow?
>>> No, the client was online
>>>> Does your heartbeat have a unique id so that you can tell for sure
>>>> if the same
>>> No, but the content of the message has a timestamp so I knew they
>>> were duplicates
>>>> published message is cloned many times (try a peek on the callback
>>>> queue with jconsole)?
>>> Can this be done with the admin messages
>>>>
>>>> A final option is to use the current svn xmlBlaster and switch on
>>>> the checkpoint logging
>>>> to get a better idea what is going on.
>>> We will try this in house, unfortunately, the problem nodes are in a
>>> production environment.
>>>>
>>>> And finally it could be a problem with your client not taking the
>>>> callback messages.
>>> Could be, but what I don't see is the queue gradually growing.
>>> Instead, it "all-of-a-sudden" appears to be full.
>>>>
>>>> Another idea: The callback queue contains only a reference on the
>>>> message.
>>>> If it expires the message-'meat' is destroyed but the reference
>>>> remains in the queue
>>>> until it is looked at during delivery (and then thrown to garbage),
>>>> Michele, could this be?
>>>>
>>>> thanks
>>>> Marcel
>>>>
>>>>
>>>> David R Robison wrote:
>>>>> We are experiencing something strange in xmlBlaster 1.6.1. We have
>>>>> two nodes, node A subscribes to messages from node B. These are
>>>>> heartbeat messages and are generated every 15 seconds with a
>>>>> lifetime of 30 seconds. A client connects to node A and subscribes
>>>>> to the messages, node A then passes the subscription onto node B.
>>>>> Watching the callback message queue, everything seems to run well,
>>>>> at most 1 message in the queue waiting to be sent. It can run like
>>>>> this for days. Then, unexpectedly, the callback queue will show as
>>>>> being full (in this case 1001 messages). The queue contains many
>>>>> duplicated messages with different timestamps. From there, the
>>>>> server struggles to deliver the messages and keep the queue empty.
>>>>> The reader never seems to read enough messages to get the queue
>>>>> back down to zero. If I stop the client and reconnect, it will
>>>>> recreate its queue and be back to normal. I know this is a bit
>>>>> sketchy, but it is becoming a real problem for us.
>>>>>
>>>>> Any thoughts on what might be the problem? Any idea of where to
>>>>> start looking?
>>>>>
>>>>> One more note, when the client is subscribing to heartbeats that
>>>>> are generated on Node A, the client never fails in this manner,
>>>>> only when it is subscribing to node A for a message generated on
>>>>> node B.
>>>>>
>>>>> Thanks, in advance,
>>>>> David Robison
>>>>>
>>>>
>>>
>>
>
>

-- 
David R Robison
Open Roads Consulting, Inc.
708 S. Battlefield Blvd., Chesapeake, VA 23322
phone: (757) 546-3401
e-mail: [EMAIL PROTECTED]
web: http://openroadsconsulting.com
blog: http://therobe.blogspot.com
book: http://www.xulonpress.com/book_detail.php?id=2579
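
For anyone inspecting dumped messages as suggested in this thread, the following is a rough sketch of how the `<route>` element in a message QoS could be read to see which nodes a message has already traversed. It parses the example QoS quoted by Marcel with Python's standard `xml.etree.ElementTree`. The helper names `route_nodes` and `already_visited` are hypothetical, and this is not how xmlBlaster performs its own loop prevention internally; it is only a way to check a dump by hand. A QoS without `<route>`, like the dump posted here, yields an empty list.

```python
import xml.etree.ElementTree as ET

# The example QoS quoted in this thread, with its three route nodes.
QOS = """<qos>
  <sender>joe</sender>
  <route>
    <node id='bilbo' stratum='2' timestamp='34460239640'/>
    <node id='frodo' stratum='1' timestamp='34460239661'/>
    <node id='heron' stratum='0' timestamp='34460239590'/>
  </route>
</qos>"""


def route_nodes(qos_xml):
    """Return the node ids listed under <route>, in document order.

    Returns an empty list when the QoS carries no <route> element.
    """
    qos = ET.fromstring(qos_xml)
    return [node.get("id") for node in qos.findall("./route/node")]


def already_visited(qos_xml, node_id):
    """Return True if node_id appears in the message's route."""
    return node_id in route_nodes(qos_xml)


print(route_nodes(QOS))
print(already_visited(QOS, "frodo"))
```

A message whose route already lists the node it is about to be forwarded to would be a candidate for the A to B to A cycle discussed above.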