C++ Broker Availability Problem

Richard Peter Wed, 08 Jun 2011 05:09:52 -0700

Hi,

The issue I'm having occurs when a client producer sends a message based on user interaction. The message causes a screen to pop up on another workstation. Usually the pop-up is instantaneous, but sometimes it takes up to 2 minutes for the message to get to the other workstation. The message is a JMS text message containing 9 characters, so it is a fairly small message. We have tried tuning worker-threads, thinking it was an availability issue. This single message is more important than all the other traffic our qpid is handling. Is there a way to give priority to one queue over another? There is a large amount of traffic being handled by the broker, but I'm not sure how the design handles the case where there are many more sessions/queues than worker-threads. Does a thread send all messages to a consumer before moving on to the next queue? Or is the only way to ensure availability to further increase worker-threads? I've had the threads as high as 100, but the load on the system made the problem worse. Our setup is below.

We are using version 0.8 of the C++ broker and Java client. The broker has roughly 100 queues. Each queue has at least two consumers, one each from separate servers in a cluster. We then also have 20 clients listening to 4 topics and 5 clients listening to 1 queue (the important one mentioned above). So in general our broker has roughly 300 sessions open at any given time. Almost all of the queues are durable. The topics are not durable, nor are the subscribers. All but one of the clients in this scenario are Java clients, with 1 C++ client. The servers also use the Java client. The following is the connection URL used by most of the clients (it's embedded in Spring XML, thus the escaped &):

amqp://guest:guest@/program?brokerlist='tcp://${broker.addr}?retries='0'&tcp_nodelay='true'&connecttimeout='5000''&maxprefetch='0'&sync_publish='all'&failover='nofailover'
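To make the nesting of the options in that URL easier to see, here is a minimal sketch in plain Java that assembles the same string and escapes it for embedding in XML. The class and method names (ConnectionUrlSketch, joinOptions, buildUrl, escapeForXml) are made up for illustration and are not part of the Qpid client; only the option names and values come from the URL above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Qpid client API. */
final class ConnectionUrlSketch {

    /** Joins options as name='value' pairs separated by '&', in insertion order. */
    static String joinOptions(Map<String, String> options) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey()).append("='").append(e.getValue()).append('\'');
        }
        return sb.toString();
    }

    /** Builds the URL; per-broker options sit inside the quoted brokerlist value. */
    static String buildUrl(String brokerAddr) {
        Map<String, String> brokerOptions = new LinkedHashMap<>();
        brokerOptions.put("retries", "0");
        brokerOptions.put("tcp_nodelay", "true");
        brokerOptions.put("connecttimeout", "5000");

        Map<String, String> connectionOptions = new LinkedHashMap<>();
        connectionOptions.put("maxprefetch", "0");
        connectionOptions.put("sync_publish", "all");
        connectionOptions.put("failover", "nofailover");

        return "amqp://guest:guest@/program?brokerlist='tcp://" + brokerAddr
                + "?" + joinOptions(brokerOptions) + "'"
                + "&" + joinOptions(connectionOptions);
    }

    /** Escapes '&' so the URL can be placed in an XML attribute or element. */
    static String escapeForXml(String url) {
        return url.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String url = buildUrl("${broker.addr}");
        System.out.println(url);
        System.out.println(escapeForXml(url));
    }
}
```

The doubled quote after connecttimeout='5000' is the closing quote of the brokerlist value; everything after it applies to the connection as a whole.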

I only recently turned on tcp_nodelay and sync_publish, thinking that perhaps the message was occasionally getting stuck. These are the settings from our conf file for the broker:


auth=no
worker-threads=50
data-dir=/somepath/qpid/data
store-dir=/somepath/qpid/messageStore
pid-dir=/somepath/qpid/var/lock
num-jfiles=16
jfile-size-pgs=24
tcp-nodelay=true

Many of the queues are sized larger than the default through a queue creator script. The sizes range up to a max file count of 32 and a file size of 48. The server running qpid is an 8 CPU system with 2 GB of memory; some of the offices have a 16 CPU system with 8 GB of memory. The server size does not make a difference in the errors.

Part of the theory for availability being the issue was that the clients kept timing out on heartbeat, so we disabled the heartbeat. We also occasionally see:

INFO 2011-06-06 17:47:42,501 [IoReceiver - somemachine/someip:5672] JmsPooledSession: EDEX: DEFAULT - Failed to close session
org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 30115, point = 30116
    at org.apache.qpid.transport.Session.sync(Session.java:744)
    at org.apache.qpid.transport.Session.sync(Session.java:713)
    at org.apache.qpid.client.AMQSession_0_10.sendClose(AMQSession_0_10.java:427)
    at org.apache.qpid.client.AMQSession.close(AMQSession.java:700)
    at org.apache.qpid.client.AMQSession.close(AMQSession.java:666)
    at org.apache.qpid.client.AMQSession.close(AMQSession.java:525)
    at somepackage.jms.JmsPooledSession.closeInternal(JmsPooledSession.java:164)
    at somepackage.jms.JmsPooledConnection.disconnect(JmsPooledConnection.java:152)
    at somepackage.jms.JmsPooledConnection.onException(JmsPooledConnection.java:127)
    at org.apache.qpid.client.AMQConnectionDelegate_0_10.closed(AMQConnectionDelegate_0_10.java:270)
    at org.apache.qpid.transport.Connection.closed(Connection.java:529)
    at org.apache.qpid.transport.network.Assembler.closed(Assembler.java:113)
    at org.apache.qpid.transport.network.InputHandler.closed(InputHandler.java:202)
    at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:150)
    at java.lang.Thread.run(Thread.java:619)
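The sync timeout in the exception above reports two counters, complete and point. To track how far apart they are across many occurrences, a small sketch like the following could pull the two numbers out of the message text. The class name SyncGapSketch and the method gap are hypothetical, and the regular expression assumes the message wording shown in the exception.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log helper for illustration only; not part of the Qpid client API. */
final class SyncGapSketch {

    // Assumes the wording "complete = N, point = M" seen in the SessionException text.
    private static final Pattern SYNC =
            Pattern.compile("complete\\s*=\\s*(\\d+),\\s*point\\s*=\\s*(\\d+)");

    /** Returns point minus complete, or -1 if the message does not match. */
    static long gap(String message) {
        Matcher m = SYNC.matcher(message);
        if (!m.find()) {
            return -1;
        }
        long complete = Long.parseLong(m.group(1));
        long point = Long.parseLong(m.group(2));
        return point - complete;
    }

    public static void main(String[] args) {
        String msg = "timed out waiting for sync: complete = 30115, point = 30116";
        System.out.println("gap = " + gap(msg));
    }
}
```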

The gap between complete and point used to be much larger before adding the sync_publish setting. There are no errors in the qpid broker log. The only thing in the log is along the lines of the following 2 messages:

qpidd[19149]: 2011-06-08 11:50:03 warning ManagementAgent::periodicProcessing task overran 1 times by 6ms (taking 5098421ns) on average.
qpidd[19149]: 2011-06-08 11:50:16 warning task overran 3 times by 2ms (taking 27955ns) on average.
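To see whether these overrun warnings line up with the slow deliveries, the numbers can be extracted from the broker log and compared against the times of the delayed pop-ups. Below is a minimal sketch; the class name OverrunLogSketch and the method parse are hypothetical, and the pattern assumes the warning wording shown in the two lines above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log helper for illustration only; not part of qpidd or its tools. */
final class OverrunLogSketch {

    // Matches "task overran N times by Xms (taking Yns)"; whitespace is optional
    // around the parenthesis because archived copies of the log may drop it.
    private static final Pattern OVERRUN = Pattern.compile(
            "task overran (\\d+) times by (\\d+)ms\\s*\\(taking\\s*(\\d+)ns\\)");

    /** Returns {overrunCount, averageOverrunMs, averageTakenNs}, or null if no match. */
    static long[] parse(String line) {
        Matcher m = OVERRUN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    public static void main(String[] args) {
        String line = "qpidd[19149]: 2011-06-08 11:50:16 warning task overran 3 times"
                + " by 2ms (taking 27955ns) on average.";
        long[] r = parse(line);
        System.out.println(r[0] + " overruns, " + r[1] + " ms late, " + r[2] + " ns taken");
    }
}
```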


Thanks,
Richard Peter
