I think part of the problem might be that the subscriptions, even when you 
specify a domain, are not domain specific. What I mean is that a user connected 
to B subscribes to messages for a domain that is mastered on A. However, when 
the subscription is forwarded to A, it matches messages from all domains, even 
those generated on B and sent to A. Does this make sense? Could this be part of 
the problem?
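
To make the distinction concrete, here is a minimal sketch of how a bare //key query differs from one that carries a domain predicate. It uses Python's standard ElementTree instead of xmlBlaster, so it only illustrates the XPath semantics; it says nothing about how the cluster actually forwards or evaluates the subscription. The first key is copied from this thread, while the second key (DomainHeartbeat-StauntonSTC) and the matching_oids helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two message keys wrapped in a root element so they can be queried together.
# The first key comes from the message dump in this thread; the second one
# is a hypothetical key from another domain.
KEYS_XML = """
<keys>
  <key oid='DomainHeartbeat-Albemarle911' contentMime='text/xml' domain='Albemarle911'/>
  <key oid='DomainHeartbeat-StauntonSTC' contentMime='text/xml' domain='StauntonSTC'/>
</keys>
"""


def matching_oids(xpath):
    """Return the oid of every key selected by the given XPath expression."""
    root = ET.fromstring(KEYS_XML)
    return [key.get("oid") for key in root.findall(xpath)]


# A bare key query selects the heartbeats of every domain.
all_keys = matching_oids(".//key")

# Adding a predicate on the domain attribute restricts the match to one domain.
one_domain = matching_oids(".//key[@domain='Albemarle911']")

print(all_keys)
print(one_domain)
```

If the query that reaches the other node behaves like the first expression, messages from both domains would match, which is the behaviour described above.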


From: David R Robison [mailto:[EMAIL PROTECTED]]
To: xmlblaster@server.xmlBlaster.org
Sent: Wed, 21 Nov 2007 10:41:10 -0500
Subject: Re: [xmlblaster] Callback message queue fills up

Here is a dump of one of the messages:
  <MsgUnit index='0'>
   <key oid='DomainHeartbeat-Albemarle911' contentMime='text/xml' 
  contentMimeExtended='1.0' domain='Albemarle911'/>
    <content size='46'>Domain Albemarle911 ALIVE at 11/21/07 
    <subscribe id='__subId:StauntonSTC-XPATH1195628463329000000'/>
    <expiration lifeTime='30000' remainingLife='22703' forceDestroy='true'/>
    <rcvTimestamp nanos='1195659482613000002'/>
    <queue index='0' size='1'/>
  The message was created on node B and sent to node A because of a 
  subscription on node A. But it is now in the callback queue on A to go 
  back to B. Also, I have never seen the route data in the messages. Is 
  there a way to turn this on?
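
If the route data were present, a loop like the one suspected here could be recognised by checking whether the next node already appears in the route list. The sketch below is only an illustration of that idea and is not xmlBlaster's cluster code: it parses the sample QoS quoted further down in this thread with Python's standard ElementTree, and the helper names visited_nodes and would_loop, as well as the node id 'gandalf', are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample QoS copied from the reply quoted later in this thread.
QOS_XML = """
<qos>
  <sender>joe</sender>
  <route>
    <node id='bilbo' stratum='2' timestamp='34460239640'/>
    <node id='frodo' stratum='1' timestamp='34460239661'/>
    <node id='heron' stratum='0' timestamp='34460239590'/>
  </route>
</qos>
"""


def visited_nodes(qos_xml):
    """Return the ids of the nodes listed in the route element, in document order."""
    root = ET.fromstring(qos_xml)
    return [node.get("id") for node in root.findall("./route/node")]


def would_loop(qos_xml, next_node_id):
    """True if forwarding to next_node_id would revisit a node already in the route."""
    return next_node_id in visited_nodes(qos_xml)


print(visited_nodes(QOS_XML))
print(would_loop(QOS_XML, "frodo"))    # 'frodo' is already in the route
print(would_loop(QOS_XML, "gandalf"))  # hypothetical node that is not in the route
```

A message with no route element yields an empty list, so this check cannot flag a loop for messages like the one dumped above.
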
  Marcel Ruff wrote:
  > David R Robison wrote:
  >> One other thought. Heartbeat messages are published on node B and 
  >> subscribed to by clients on node A. Also, there are clients on node B 
  >> that subscribe to messages on node A. However, it appears that the 
  >> subscriptions the clients on node B are using are also matching the 
  >> heartbeat messages from node B that have been sent to node A. Could I 
  >> have some kind of circular queue? A message is posted on B then sent 
  >> to A because of a subscription by a client on A. Then sent back to B 
  >> because of a subscription by a client on B for messages on A. Then 
  >> the message gets sent back to A and the whole cycle repeats?
  > Could be, usually the cluster should prevent this  ...
  > The messages contain in their QoS the nodes traversed:
  > <qos>
  >   <sender>joe</sender>
  >   <route>
  >      <node id='bilbo' stratum='2' timestamp='34460239640'/>
  >      <node id='frodo' stratum='1' timestamp='34460239661'/>
  >      <node id='heron' stratum='0' timestamp='34460239590'/>
  >   </route>
  > </qos>
  > it would be nice to see the dump of such messages,
  > Use the jconsole or logging output from your receiving client or use the
  > message sniffer, e.g.:
  > java javaclients.simplereader.SimpleReaderGui -xpath "//key" 
  > -session.name simpleReader -passwd secret -protocol SOCKET 
  > -dispatch/connection/plugin/socket/hostname -dumpToFile true
  > or peek the callback queue with administrative messages as described 
  > in one of your last posts,
  > thanks
  > Marcel
  >> Could this be possible? David
  >> David R Robison wrote:
  >>> Thanks, see inline...
  >>> Marcel Ruff wrote:
  >>>> Hi David,
  >>>> do you have a jconsole to observe the two nodes?
  >>> I don't have a jconsole, but can I get the same using the admin 
  >>> messages?
  >>>> If yes, please check the number of subscriptions the node A has 
  >>>> forwarded to node B
  >>>> (look into node B and check the number of subscriptions of client 
  >>>> A) during such a case.
  >>>> In case the subscribeQos has set
  >>> I will check.
  >>>> <multiSubscribe>true</multiSubscribe>
  >>> I believe that we set all to false.
  >>>> (which is the default) it could be that the subscriptions multiplied
  >>>> during small connection errors and reconnects.
  >>>> This is just a guess.
  >>>> If it is the case please set multiSubscribe to false.
  >>>> Is there a high CPU load during the 1001 message case?
  >>> No
  >>>> Are the heartbeat messages persistent messages?
  >>> Yes, but they only live 30 seconds. At any given time there should 
  >>> only be at most 2 in the history queue
  >>>> Was the client connected or offline during this message overflow?
  >>> No, the client was online
  >>>> Does your heartbeat have a unique id so that you can tell for sure 
  >>>> if the same
  >>> No, but the content of the message has a timestamp so I knew they 
  >>> were duplicates
  >>>> published message is cloned many times (try a peek on the callback 
  >>>> queue with jconsole)?
  >>> Can this be done with the admin messages?
  >>>> A final option is to use the current svn xmlBlaster and switch on 
  >>>> the checkpoint logging
  >>>> to get a better idea what is going on.
  >>> We will try this in house, unfortunately, the problem nodes are in a 
  >>> production environment.
  >>>> And finally it could be a problem with your client not taking the 
  >>>> callback messages.
  >>> Could be, but what I don't see is the queue gradually growing. 
  >>> Instead, it "all-of-a-sudden" appears to be full.
  >>>> Another idea: The callback queue contains only a reference on the 
  >>>> message.
  >>>> If it expires the message-'meat' is destroyed but the reference 
  >>>> remains in the queue
  >>>> until it is looked at during delivery (and then thrown to garbage), 
  >>>> Michele, could this be?
  >>>> thanks
  >>>> Marcel
  >>>> David R Robison wrote:
  >>>>> We are experiencing something strange in xmlBlaster 1.6.1. We have 
  >>>>> two nodes, node A subscribes to messages from node B. These are 
  >>>>> heartbeat messages and are generated every 15 seconds with a 
  >>>>> lifetime of 30 seconds. A client connects to node A and subscribes 
  >>>>> to the messages, node A then passes the subscription onto node B. 
  >>>>> Watching the callback message queue, everything seems to run well, 
  >>>>> at most 1 message in the queue waiting to be sent. It can run like 
  >>>>> this for days. Then, unexpectedly, the callback queue will show as 
  >>>>> being full (in this case 1001 messages). The queue contains many 
  >>>>> duplicated messages with different timestamps. From there, the 
  >>>>> server struggles to deliver the messages and keep the queue empty. 
  >>>>> The reader never seems to read enough messages to get the queue 
  >>>>> back down to zero. If I stop the client and reconnect, it will 
  >>>>> recreate its queue and be back to normal. I know this is a bit 
  >>>>> sketchy, but it is becoming a real problem for us.
  >>>>> Any thoughts on what might be the problem? Any idea of where to 
  >>>>> start looking?
  >>>>> One more note, when the client is subscribing to heartbeats that 
  >>>>> are generated on Node A, the client never fails in this manner, 
  >>>>> only when it is subscribing to node A for a message generated on 
  >>>>> node B.
  >>>>> Thanks, in advance,
  >>>>> David Robison
  David R Robison
  Open Roads Consulting, Inc.
  708 S. Battlefield Blvd., Chesapeake, VA 23322
  phone: (757) 546-3401
  web: http://openroadsconsulting.com
  blog: http://therobe.blogspot.com
  book: http://www.xulonpress.com/book_detail.php?id=2579
