On 04/30/2015 04:58 PM, Matt Broadstone wrote:
On Thu, Apr 30, 2015 at 11:46 AM, Gordon Sim <[email protected]> wrote:

On 04/29/2015 08:33 PM, Gordon Sim wrote:

On 04/29/2015 07:59 PM, Gordon Sim wrote:

On 04/29/2015 07:46 PM, Matt Broadstone wrote:

The process that's taking up memory is the receiver (mqget.cc in the posted gist). Proton is version 0.9.


I see some sort of build up in receivers using AMQP 1.0 with
0.32+proton-0.9 also. Seems to be only when receiving from topics. I'll
investigate further.


It's a bug in the qpid::messaging client for 1.0, I'm afraid. Specifically, it is not locally settling deliveries that are sent pre-settled. These then build up within proton's delivery map.

I'll get a fix in shortly. However, as a workaround you can turn on acknowledgements for the receiver, e.g. using 'my-topic; {link:{reliability:at-least-once, timeout:1}}'.


This is now fixed and tracked by
https://issues.apache.org/jira/browse/QPID-6521


Great! Thanks for the quick turnaround Gordon.

Actually it is largely Alan Conway we have to thank.

As for my original bug,
I've been having considerable trouble replicating it locally. I left my
test programs running for an hour or so and got up to around 8M messages
with no similar error.

I know this is fairly vague, but could you help describe what conditions
might lead to that kind of growth in queue depth?

Usually it is simply the consumer stopping processing messages in some way, or slowing down significantly. As you pointed out in an earlier mail, the relatively slow rates you were seeing make it hard to understand how such a large depth could build up when the steady state did not seem to involve any depth at all.

We have in the (distant) past seen scenarios where the depth as assumed by the queue and the depth as reported by management were out of sync, but those were all around fairly 'exotic' combinations of features (transactions and LVQs etc.), and the known issues were fixed long before 0.28.

I recently hit an issue in proton whereby it lost the ability to handle acknowledgements of messages when there were many links on the same session and deliveries were settled significantly out of order. It's possible there is something like that at play here, but I believe it would show up as growing queue depth.

In my particular scenario
I am doing nothing more than the two provided sample apps do:
non-persistent, default constructed Messages being published to a topic
exchange and consumed by a single consumer.

Obviously the vanilla case here can be cleaned up a bit (maybe adding message TTLs, or perhaps the acknowledgement workaround you provided), but I'm still unclear as to why it worked for so many days and then failed on that assertion (effectively rendering the client useless, as well as anything posted to that topic). It's true this only happened with 0.28, but I just don't have enough data points yet to rule out the possibility of it happening again with 0.32/0.33.

I'm locally testing a number of scenarios, and we have the original environment up, so hopefully the behavior will be triggered again in a state where I can collect data. In the meantime, any speculation on your side as to what might cause this would be appreciated. Please let me know if I can provide any more information.

Running qpid-stat -q periodically against the original environment (if that is possible) might be useful, say every 30 minutes or so. That way, if it happens again, we have a reasonable amount of historical data leading up to the problem.

