On Fri, Nov 22, 2013 at 8:40 PM, Gordon Sim <[email protected]> wrote:

> On 11/22/2013 04:29 AM, Paul Colby wrote:
>
>> Hi,
>>
>> I'm trying to understand some of the finer details of the queue size
>> limits / policies, specifically in the context of a 0.16 cluster (I know
>> 0.16 is a bit old now, but the relevant config options don't appear to
>> have changed).
>>
>> I have, for example, a number of queues configured like:
>>
>> ... --durable --file-size=512 --file-count=8 --limit-policy=flow-to-disk
>>
>> But I've now been bitten by the "expected behaviour" described in
>> QPID-3286 <https://issues.apache.org/jira/browse/QPID-3286> a few times.
>>
>> As I understand it now, the above applies no limits to the queue in RAM,
>> but forces all messages to be written to the journal (aka persistent
>> store), and limits that store to ~256MB. So on some occasions, the
>> journal limit is exceeded on one broker, but not the others, forcing
>> that broker out of the cluster.
>>
>> Since I want all messages for the queue in question to be durable, and
>> I've limited the journal to ~256MB, it seems pretty clear (correct me if
>> I'm wrong) that I should also set the --max-queue-size config to ~256MB
>> too.
>
> I would set the --max-queue-size to be quite a bit less than that (say
> 60%). The journal's total size includes padding, dequeue records and
> other things - I've found it very hard to estimate reliably. You want to
> hit the configured queue depth before running out of journal, as that
> should be deterministic across nodes.
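For reference, the arithmetic behind the numbers above can be sketched as follows. This is only a rough sizing aid: it assumes --file-size is counted in 64 KiB pages (which matches the ~256MB figure quoted for --file-size=512 --file-count=8), and the 0.6 factor is just Gordon's rule of thumb, not a guarantee.

```python
# Rough sizing sketch for the queue configuration discussed above.
# Assumption: --file-size is expressed in 64 KiB pages.

PAGE_BYTES = 64 * 1024

def journal_capacity(file_size_pages, file_count):
    """Total journal capacity in bytes."""
    return file_size_pages * PAGE_BYTES * file_count

def suggested_max_queue_size(file_size_pages, file_count, factor=0.6):
    """A --max-queue-size that should trip before the journal fills.

    The 0.6 factor is the rule of thumb from the reply above: journal
    space is also consumed by padding, dequeue records, etc., so the
    usable fraction is well below 100% and hard to estimate exactly.
    """
    return int(journal_capacity(file_size_pages, file_count) * factor)

print(journal_capacity(512, 8))          # 268435456 bytes, i.e. 256 MiB
print(suggested_max_queue_size(512, 8))  # 161061273 bytes, ~154 MiB
```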
Makes sense. I wouldn't have thought as low as 60%, but it's not
surprising.

>> So, here are my questions:
>>
>> 1. Is this risky in terms of RAM? I ask this because the
>> --max-queue-size help text says: "Maximum in-memory queue size as
>> bytes". Is it possible, therefore, that qpidd might allow 256MB of RAM
>> for each queue, possibly exhausting RAM, thrashing swap space etc.? Or
>> does qpidd apply some other total-size-of-all-queues / RAM limits?
>
> No, there is no other restriction. There is a default for the max queue
> size (qpidd --help will give the value on your system).
>
> Since you can already hit the journal limit, it would seem that you are
> going to be restricting the queue depth more than it was, so I don't
> think you are adding any risk at all.

That's a very good point! :)

> One other possibility is that the queue depth is never reaching that
> level, but there is an old message that gets stuck on the queue for a
> while. The journal is a circular buffer, so one old message on the queue
> can cause the capacity to be reached.

This is a very interesting possibility. Although the most recent
occurrence was a genuinely full journal for the queue in question (a
result of an unexpected external issue), on the other occasions I was
skeptical that the queue would have been full. Unfortunately I no longer
have the queue metrics from those earlier occurrences, but I might have
enough application logs to see whether any messages were not ack'd
properly, so I'll go back and have a look.

I wonder: would it be possible / practical to monitor the journal for the
oldest message? If we can detect the problem occurring, we can respond to
it before the cluster is broken. Something to think about anyway.

> The ways you can get an old message stuck are e.g. not acknowledging a
> message you have received (but acknowledging all messages after it),
> using a selector (JMS only for 0.16), using LVQ or priority queue
> options...
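On the monitoring idea: the broker doesn't directly expose the age of the oldest journal entry, but one crude heuristic is to poll queue statistics (e.g. msgDepth and msgTotalDequeues from the broker's QMF queue objects) and flag queues that keep dequeuing yet never drain to empty - a sign that something old may be pinned at the head, and pinning the circular journal with it. A minimal sketch, assuming (depth, total_dequeues) sample pairs collected by some external poller:

```python
# Heuristic sketch: flag a queue whose consumers are clearly active
# (dequeue counter advancing) but whose depth never reaches zero over
# the sampling window. A long-stuck head-of-queue message would show
# this pattern. Samples are (msg_depth, msg_total_dequeues) pairs;
# how they are collected (QMF, qpid-stat scraping, ...) is up to you.

def possibly_stuck(samples, min_dequeues=1):
    """Return True if the queue was busy but never drained."""
    if len(samples) < 2:
        return False
    depths = [depth for depth, _ in samples]
    dequeues = samples[-1][1] - samples[0][1]
    return dequeues >= min_dequeues and min(depths) > 0

# Queue keeps moving but never empties -> suspicious:
print(possibly_stuck([(5, 100), (4, 150), (6, 200)]))  # True
# Queue drained at some point in the window -> fine:
print(possibly_stuck([(5, 100), (0, 150), (3, 200)]))  # False
```

This obviously can't distinguish a stuck message from a queue that is simply always busy, so it's a prompt for investigation rather than a hard alarm.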
>> 2. What should I set the --limit-policy to? Since I'm making all
>> messages durable, they are all already on disk, so flow-to-disk is kind
>> of redundant for me (assuming this is the same store that flow-to-disk
>> refers to?). So I think the policy here should be reject instead (the
>> ring* policies are certainly not what I want).
>
> The flow-to-disk policy was a poor attempt at reducing RAM used by
> relying on disk. Since in your case it is the journal (i.e. the
> allocated disk space) that is running out, reject sounds like a better
> option.

Thanks for your help. You've given me exactly the info I was looking for.

Cheers,

Paul Colby.
----
http://colby.id.au
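Putting the advice together, the revised queue definition might look something like this (a sketch only - the queue name is illustrative, and the --max-queue-size figure is just ~60% of the 256 MiB journal per the rule of thumb above):

```shell
# ~256 MiB journal (512 x 64 KiB pages x 8 files); reject new messages
# once the queue reaches ~60% of that, well before the journal can fill
# and force a broker out of the cluster.
qpid-config add queue my.durable.queue \
    --durable --file-size=512 --file-count=8 \
    --limit-policy=reject --max-queue-size=161061273
```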
