On 03/15/2014 10:16 PM, Mark Barker wrote:
(don't know how this got into the other thread - sorry, trying again...)

So I've been looking at running Qpid's C++ broker and a client on an
embedded target.
I need to transit some traffic from JMS clients and brokers on workstations
and route it elsewhere via an embedded platform.

One thing I've found is that some fairly innocuous looking Qpid receiver
code running under the C++ API can take orders of magnitude longer to run
than the equivalent JMS consumer.

Some experiments run locally on an Ubuntu Linux box (12.04 LTS) reveal:
If I pre-fill a queue with 5000 * 1024-byte messages from a JMS source
client,
a local JMS consumer will take approximately 3s to drain it.

However, if, instead, I have some C++ Qpid Messaging API client just doing:

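while (true)
{
   // Blocks until a message arrives; with the default capacity of 0
   // nothing is prefetched.
   receiver.fetch();
   // Acknowledges after every single message.
   session.acknowledge();
}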

on the same queue, this takes nearly 200s to consume the same data!!

Playing around with two parameters yields a significant improvement: if I
set up the receiver with
receiver.setCapacity(100),
and then call session.acknowledge() ONLY once every 100 messages,
the C++ client's time to consume the messages drops massively, to just 250ms.

These two settings are mentioned in the performance tips in the programming guide: http://qpid.apache.org/releases/qpid-0.24/programming/book/ch02s13.html

The reason the receiver capacity is set to 0 by default is that arguably that gives the most 'intuitive' behaviour. Messages are only delivered to a client in response to a fetch() call, so the state of the queue is easier to reason about.
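The two settings discussed above (a non-zero receiver capacity and one acknowledgement per batch) could be combined roughly as in the sketch below. drainBatched is a hypothetical helper name, not something the Qpid API provides. It is written as a template so the loop can be compiled and dry-run without a broker; with Qpid the Message, Receiver, Session and Timeout types would be qpid::messaging::Message, Receiver, Session and Duration. Unlike the original loop, it uses the timed form of fetch(), so it returns once the queue has been idle for the given timeout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the Qpid API.
// Receiver needs setCapacity(n) and bool fetch(Message&, Timeout);
// Session needs acknowledge(). qpid::messaging::Receiver and
// qpid::messaging::Session offer calls of that shape.
template <typename Message, typename Receiver, typename Session, typename Timeout>
std::size_t drainBatched(Receiver& receiver, Session& session,
                         unsigned capacity, std::size_t ackBatch,
                         Timeout timeout)
{
    assert(ackBatch > 0);

    // Allow up to `capacity` messages to be prefetched by the client.
    receiver.setCapacity(capacity);

    std::size_t received = 0;
    Message message;

    // fetch() returns false when no message arrives within `timeout`.
    while (receiver.fetch(message, timeout)) {
        if (++received % ackBatch == 0) {
            session.acknowledge();   // one acknowledgement per full batch
        }
    }

    // Acknowledge whatever is left in the final, partial batch.
    if (received % ackBatch != 0) {
        session.acknowledge();
    }
    return received;
}
```

With a real connection, a call might look like drainBatched<qpid::messaging::Message>(receiver, session, 100, 100, qpid::messaging::Duration::SECOND). The values 100 and one second are only the figures used in this thread plus an arbitrary timeout, not recommended settings.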

This leads me to ask the question:
What should the C++ code look like in order to most closely imitate what
the
JMS equivalent is doing?

It would be pretty close to what you are already doing...

i.e.
The JMS client has an AUTO_ACKNOWLEDGE parameter on the session creation.
How can the C++ code
mimic this behaviour/performance?

In the JMS client, acknowledgements are sent in batches whose size is - I believe - related to the 'prefetch' (I can't recall whether it can be controlled independently through a configuration option, nor can I see such an option documented).

In addition to the batch size, I believe there is a maximum time delay, such that if you don't get any messages for N seconds, you acknowledge what you have regardless of the number of outstanding messages. You could do this in the C++ API by specifying a timeout for the fetch(), handling the case where you didn't 'fill the batch' in a configured amount of time, and issuing an acknowledgement anyway.
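One way to structure the "batch size or maximum delay, whichever comes first" rule is to keep the bookkeeping in a small object and consult it after every fetch() attempt. The sketch below is an illustration only: AckBatcher and its member names are hypothetical, and it is not how the JMS client implements this internally. The caller passes in the current time, so the rule can be checked without a broker.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper, not part of the Qpid API: tracks unacknowledged
// messages and reports when an acknowledgement is due.
class AckBatcher {
public:
    typedef std::chrono::steady_clock Clock;

    AckBatcher(std::size_t maxBatch, Clock::duration maxDelay)
        : maxBatch_(maxBatch), maxDelay_(maxDelay), pending_(0), oldest_()
    {
        assert(maxBatch > 0);
    }

    // Record one successfully fetched message.
    void onMessage(Clock::time_point now)
    {
        if (pending_ == 0) oldest_ = now;  // timing starts at the first unacked message
        ++pending_;
    }

    // True when the batch is full, or when the oldest unacknowledged
    // message has waited for at least maxDelay.
    bool ackDue(Clock::time_point now) const
    {
        if (pending_ == 0) return false;
        return pending_ >= maxBatch_ || now - oldest_ >= maxDelay_;
    }

    // Call right after session.acknowledge().
    void onAcknowledged() { pending_ = 0; }

    std::size_t pending() const { return pending_; }

private:
    std::size_t maxBatch_;
    Clock::duration maxDelay_;
    std::size_t pending_;
    Clock::time_point oldest_;
};

// Intended use with the Qpid Messaging API (sketch only, needs a broker):
//
//   AckBatcher batcher(100, std::chrono::seconds(2));
//   qpid::messaging::Message message;
//   while (true) {
//       if (receiver.fetch(message, qpid::messaging::Duration::SECOND))
//           batcher.onMessage(AckBatcher::Clock::now());
//       if (batcher.ackDue(AckBatcher::Clock::now())) {
//           session.acknowledge();
//           batcher.onAcknowledged();
//       }
//   }
```

The batch size of 100 and the two-second delay in the usage comment are arbitrary example values. Because the fetch() timeout is shorter than the maximum delay, the check runs at least once a second even when no messages arrive.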

And how does the C++ receiver capacity fit into the same equation??

The capacity of the receiver is essentially the number of prefetched messages. The JMS client prefetches 500 messages by default, but this can be configured if desired.

Finally, when running the C++ qpidd broker on the embedded target, CPU
usage
is really high when firing
a similar quantity of message data at it.

Similar to what? To the JMS client sending the same data?

An embedded broker daemon and
consuming client will quite happily max
out the CPU just for the task of messaging. I certainly don't see
tens-of-thousands of messages-per-second throughput. More like tens or
hundreds at best.

Is that with the changes you mention above (increased receiver capacity and batched acknowledgements)?

Are there options, strategies or guidelines for running such an arrangement
in a more efficient and less computationally intensive
manner (either in build-time options or command-line execution of the
broker, or structuring the client code in a specific way)??

Using cmake you can specify a release rather than a debug build (e.g. -DCMAKE_BUILD_TYPE=Release), but whether that makes much difference probably depends on the machine and compiler in question (last time I compared, I didn't see a huge difference).



