Thanks so much, Rob. I will test the patch against trunk and will update you
with the outcome.
- Ramayan

On Tue, Oct 18, 2016 at 2:37 AM, Rob Godfrey <rob.j.godf...@gmail.com> wrote:

> On 17 October 2016 at 21:50, Rob Godfrey <rob.j.godf...@gmail.com> wrote:
>
> > On 17 October 2016 at 21:24, Ramayan Tiwari <ramayan.tiw...@gmail.com>
> > wrote:
> >
> > > Hi Rob,
> > >
> > > We are certainly interested in testing the "multi queue consumers"
> > > behavior with your patch in the new broker. We would like to know:
> > >
> > > 1. What will be the scope of the changes: client, broker, or both? We
> > > are currently running the 0.16 client, so we would like to make sure
> > > that we will be able to use these changes with the 0.16 client.
> >
> > There's no change to the client. I can't remember what was in the 0.16
> > client... the only issue would be if there are any bugs in the parsing
> > of address arguments. I can try to test that out tomorrow.
>
> OK - with a little bit of care to get around the address-parsing issues
> in the 0.16 client, I think we can get this to work. I've created the
> following JIRA:
>
> https://issues.apache.org/jira/browse/QPID-7462
>
> and attached to it are a patch which applies against trunk, and a
> separate patch which applies against the 6.0.x branch
> (https://svn.apache.org/repos/asf/qpid/java/branches/6.0.x - this is
> 6.0.4 plus a few other fixes which we will soon be releasing as 6.0.5).
>
> To create a consumer which uses this feature (and multi queue
> consumption) with the 0.16 client, you need to use something like the
> following as the address:
>
>     queue_01 ; {node : { type : queue }, link : { x-subscribes : {
>     arguments : { x-multiqueue : [ queue_01, queue_02, queue_03 ],
>     x-pull-only : true }}}}
>
> Note that the initial queue_01 has to be the name of an actual queue on
> the virtual host, but otherwise it is not actually used (if you were
> using a 0.32 or later client you could just use '' here). The actual
> queues consumed from are those in the list value associated with
> x-multiqueue. For my testing I created a list of 3000 queues here, and
> this worked fine.
>
> Let me know if you have any questions / issues.
>
> Hope this helps,
> Rob
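As an illustration of the address above (this sketch is not from the original
mail): a 0.16-client consumer might be wired up roughly as follows, assuming
the 0.16 client's AMQConnection and AMQAnyDestination classes; the broker URL,
credentials, and virtual host name are placeholders.

    import javax.jms.Connection;
    import javax.jms.Destination;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;

    import org.apache.qpid.client.AMQAnyDestination;
    import org.apache.qpid.client.AMQConnection;

    public class MultiQueuePullConsumerSketch
    {
        public static void main(String[] args) throws Exception
        {
            // Placeholder broker URL, credentials and virtual host
            Connection connection = new AMQConnection(
                    "amqp://guest:guest@clientid/default?brokerlist='tcp://localhost:5672'");
            connection.start();

            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);

            // The address from the mail above: queue_01 must exist, but the
            // queues actually consumed from are those listed under x-multiqueue.
            Destination destination = new AMQAnyDestination(
                    "queue_01 ; {node : { type : queue }, link : { x-subscribes : {"
                    + " arguments : { x-multiqueue : [ queue_01, queue_02, queue_03 ],"
                    + " x-pull-only : true }}}}");

            MessageConsumer consumer = session.createConsumer(destination);
            consumer.setMessageListener(message -> {
                // process the message, then commit the transacted session
            });
        }
    }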
> > > 2. My understanding is that the "pull vs. push" change is only with
> > > respect to the broker, and it does not change our architecture, where
> > > we use a MessageListener to receive messages asynchronously.
> >
> > Exactly - this is only a change within the internal broker threading
> > model. The external behaviour of the broker remains essentially
> > unchanged.
> >
> > > 3. Once the I/O refactoring is complete, we would be able to go back
> > > to using standard JMS consumers (Destination). What is the timeline
> > > and broker release version for the completion of this work?
> >
> > You might wish to continue to use the "multi queue" model, depending on
> > your actual use case, but yeah, once the I/O work is complete I would
> > hope that you could use the thousands-of-consumers model should you
> > wish. We don't have a schedule for the next phase of the I/O rework
> > right now - about all I can say is that it is unlikely to be complete
> > this year. I'd need to talk with Keith (who is currently on vacation)
> > as to when we think we may be able to schedule it.
> >
> > > Let me know once you have integrated the patch and I will re-run our
> > > performance tests to validate it.
> >
> > I'll make a patch for 6.0.x presently (I've been working on a change
> > against trunk - the patch will probably have to change a bit to apply
> > to 6.0.x).
> >
> > Cheers,
> > Rob
> >
> > > Thanks
> > > Ramayan
> > >
> > > On Sun, Oct 16, 2016 at 3:30 PM, Rob Godfrey
> > > <rob.j.godf...@gmail.com> wrote:
> > >
> > > > OK - so having pondered / hacked around a bit this weekend, I think
> > > > to get decent performance from the I/O model in 6.0 for your use
> > > > case we're going to have to change things around a bit.
> > > >
> > > > Basically 6.0 is an intermediate step on our I/O / threading-model
> > > > journey. In earlier versions we used two threads per connection for
> > > > I/O (one read, one write) and then extra threads from a pool to
> > > > "push" messages from queues to connections.
> > > >
> > > > In 6.0 we moved to using a pool for the I/O threads, and also
> > > > stopped queues from "pushing" to connections while the I/O threads
> > > > were acting on the connection. It's this latter fact which is
> > > > screwing up performance for your use case here: on each network
> > > > read we tell each consumer to stop accepting pushes from the queue
> > > > until the I/O interaction has completed. This causes lots of loops
> > > > over your 3000 consumers on each session, which eats up a lot of
> > > > CPU on every network interaction.
> > > >
> > > > In the final version of our I/O refactoring we want to remove the
> > > > "pushing" from the queue, and instead have the consumers "pull" -
> > > > so that the only threads that operate on the queues (outside of
> > > > housekeeping tasks like expiry) will be the I/O threads.
> > > >
> > > > So, what we could do (and I have a patch sitting on my laptop for
> > > > this) is to look at using the "multi queue consumers" work I did
> > > > for you guys before, but augmenting it so that the consumers work
> > > > using a "pull" model rather than the push model. This will
> > > > guarantee strict fairness between the queues associated with the
> > > > consumer (which was the issue you had with this functionality
> > > > before, I believe). Using this model you'd only need a small number
> > > > (one?) of consumers per session. The patch I have adds this "pull"
> > > > mode for these consumers (essentially a preview of how all
> > > > consumers will work in the future).
> > > >
> > > > Does this seem like something you would be interested in pursuing?
> > > >
> > > > Cheers,
> > > > Rob
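To make the proposed "pull" model concrete, here is a schematic sketch -
illustrative only, not the broker's actual code, and every name in it is
invented - of how a multi-queue consumer pulling round-robin from its queue
list yields strict fairness between queues:

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    // Schematic only: shows why a "pull" consumer gives strict fairness
    // across its queues while keeping all queue access on the I/O thread.
    final class PullModeConsumerSketch
    {
        private final Deque<String> queues; // queue names, rotated round-robin

        PullModeConsumerSketch(List<String> queueNames)
        {
            this.queues = new ArrayDeque<>(queueNames);
        }

        /** Called by the I/O thread when the session has credit to deliver. */
        String pullNextMessage()
        {
            for (int i = 0; i < queues.size(); i++)
            {
                String queue = queues.pollFirst();
                queues.addLast(queue); // rotate: the next pull starts at the next queue
                String message = tryTakeFrom(queue);
                if (message != null)
                {
                    return message; // each queue consulted at most once per pull
                }
            }
            return null; // nothing available on any queue
        }

        private String tryTakeFrom(String queue)
        {
            // placeholder for "attempt to acquire the next entry from the queue"
            return null;
        }
    }

Because the consumer, rather than the queue, drives delivery, only the I/O
thread ever touches the queues - which is the point of the refactoring Rob
describes.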
> > > > On 15 October 2016 at 17:30, Ramayan Tiwari
> > > > <ramayan.tiw...@gmail.com> wrote:
> > > >
> > > > > Thanks Rob. Apologies for sending this over the weekend :(
> > > > >
> > > > > Are there any docs on the new threading model? I found this on
> > > > > Confluence:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/qpid/IO+Transport+Refactoring
> > > > >
> > > > > We are also interested in understanding the threading model a
> > > > > little better, to help us figure out its impact for our usage
> > > > > patterns. It would be very helpful if there are more
> > > > > docs/JIRAs/email threads with some details.
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Sat, Oct 15, 2016 at 9:21 AM, Rob Godfrey
> > > > > <rob.j.godf...@gmail.com> wrote:
> > > > >
> > > > > > So I *think* this is an issue because of the extremely large
> > > > > > number of consumers. The threading model in v6 means that
> > > > > > whenever a network read occurs for a connection, it iterates
> > > > > > over the consumers on that connection - obviously where there
> > > > > > are a large number of consumers this is burdensome. I fear
> > > > > > addressing this may not be a trivial change... I shall spend
> > > > > > the rest of my afternoon pondering this...
> > > > > >
> > > > > > - Rob
> > > > > >
> > > > > > On 15 October 2016 at 17:14, Ramayan Tiwari
> > > > > > <ramayan.tiw...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Rob,
> > > > > > >
> > > > > > > Thanks so much for your response. We use transacted sessions
> > > > > > > with non-persistent delivery. The prefetch size is 1 and
> > > > > > > every message is the same size (200 bytes).
> > > > > > >
> > > > > > > Thanks
> > > > > > > Ramayan
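For reference, the consumption pattern described here - transacted sessions,
non-persistent delivery, prefetch of 1 - might be set up with the 0.16 client
roughly as below. This is a sketch: the connection URL, credentials, virtual
host, and queue name are placeholders, and the maxprefetch connection option
is the usual way the Qpid Java client caps prefetch.

    import javax.jms.Connection;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageListener;
    import javax.jms.Queue;
    import javax.jms.Session;

    import org.apache.qpid.client.AMQConnection;

    public class TransactedListenerSketch
    {
        public static void main(String[] args) throws Exception
        {
            // maxprefetch='1' limits the client to one unacknowledged message at a time
            Connection connection = new AMQConnection(
                    "amqp://guest:guest@clientid/default"
                    + "?brokerlist='tcp://localhost:5672'&maxprefetch='1'");
            connection.start();

            final Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            Queue queue = session.createQueue("queue_01");

            MessageConsumer consumer = session.createConsumer(queue);
            consumer.setMessageListener(new MessageListener()
            {
                @Override
                public void onMessage(Message message)
                {
                    try
                    {
                        // ~200 ms of work per message in the tests described below,
                        // then commit the transaction
                        session.commit();
                    }
                    catch (Exception e)
                    {
                        // roll back and log in a real application
                    }
                }
            });
        }
    }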
> > > > > > > On Sat, Oct 15, 2016 at 2:59 AM, Rob Godfrey
> > > > > > > <rob.j.godf...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Ramayan,
> > > > > > > >
> > > > > > > > this is interesting... in our testing (which admittedly
> > > > > > > > didn't cover the case of this many queues / listeners) we
> > > > > > > > saw the 6.0.x broker using less CPU on average than the
> > > > > > > > 0.32 broker. I'll have a look this weekend at why creating
> > > > > > > > the listeners is slower. On the dequeuing, can you give a
> > > > > > > > little more information on the usage pattern - are you
> > > > > > > > using transactions, auto-ack or client ack? What prefetch
> > > > > > > > size are you using? How large are your messages?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Rob
> > > > > > > >
> > > > > > > > On 14 October 2016 at 23:46, Ramayan Tiwari
> > > > > > > > <ramayan.tiw...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > We have been validating the new Qpid broker (version
> > > > > > > > > 6.0.4), have compared it against broker version 0.32,
> > > > > > > > > and are seeing major regressions. The following is a
> > > > > > > > > summary of our test setup and results:
> > > > > > > > >
> > > > > > > > > *1. Test setup*
> > > > > > > > > *a).* The Qpid broker runs on a dedicated host (12
> > > > > > > > > cores, 32 GB RAM).
> > > > > > > > > *b).* For 0.32 we allocated a 16 GB heap. For the 6.0.4
> > > > > > > > > broker we use an 8 GB heap and 8 GB direct memory.
> > > > > > > > > *c).* For 6.0.4, flow to disk has been configured at
> > > > > > > > > 60%.
> > > > > > > > > *d).* Both brokers use the BDB virtual host type.
> > > > > > > > > *e).* The brokers have around 6000 queues, and we create
> > > > > > > > > 16 listener sessions/threads spread over 3 connections,
> > > > > > > > > where each session is listening to 3000 queues. However,
> > > > > > > > > messages are only enqueued to, and processed from, 10
> > > > > > > > > queues.
> > > > > > > > > *f).* We enqueue 1 million messages across 10 different
> > > > > > > > > queues (evenly divided) at the start of the test.
> > > > > > > > > Dequeuing only starts once all the messages have been
> > > > > > > > > enqueued. We run the test for 2 hours and process as
> > > > > > > > > many messages as we can. Each message takes around 200
> > > > > > > > > milliseconds to process.
> > > > > > > > > *g).* We have used both the 0.16 and 6.0.4 clients for
> > > > > > > > > these tests (the 6.0.4 client only with the 6.0.4
> > > > > > > > > broker).
> > > > > > > > >
> > > > > > > > > *2. Test results*
> > > > > > > > > *a).* The system load average (read the notes below on
> > > > > > > > > how we compute it) for the 6.0.4 broker is 5x that of
> > > > > > > > > the 0.32 broker. During the start of the test (when we
> > > > > > > > > are not doing any dequeuing), the load average is normal
> > > > > > > > > (0.05 for the 0.32 broker and 0.1 for the new broker);
> > > > > > > > > however, while we are dequeuing messages, the load
> > > > > > > > > average is very high (around 0.5 consistently).
> > > > > > > > > *b).* The time to create listeners in the new broker has
> > > > > > > > > gone up by 220% compared to the 0.32 broker (when using
> > > > > > > > > the 0.16 client). For the old broker, creating 16
> > > > > > > > > sessions each listening to 3000 queues takes 142
> > > > > > > > > seconds; in the new broker it took 456 seconds. If we
> > > > > > > > > use the 6.0.4 client, it takes even longer: a 524%
> > > > > > > > > increase (887 seconds).
> > > > > > > > > *I).* The time to create consumers increases as we
> > > > > > > > > create more listeners on the same connections. We have
> > > > > > > > > 20 sessions (but end up using around 5 of them) on each
> > > > > > > > > connection, and we create about 3000 consumers and
> > > > > > > > > attach a MessageListener to each. Each successive
> > > > > > > > > session takes longer (approximately linearly) to set up
> > > > > > > > > the same number of consumers and listeners.
> > > > > > > > >
> > > > > > > > > *3). How we compute the system load average*
> > > > > > > > > We query the MBean SystemLoadAverage and divide it by
> > > > > > > > > the value of the MBean AvailableProcessors. Both of
> > > > > > > > > these MBeans are available under
> > > > > > > > > java.lang.OperatingSystem.
> > > > > > > > >
> > > > > > > > > I am not sure what is causing these regressions and
> > > > > > > > > would like your help in understanding them. We are aware
> > > > > > > > > of the changes to the threading model in the new broker
> > > > > > > > > - are there any design docs that we can refer to in
> > > > > > > > > order to understand these changes at a high level? Can
> > > > > > > > > we tune some parameters to address these issues?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Ramayan
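The load-average computation described in (3) can be reproduced against the
broker's JVM with standard JMX. A minimal sketch follows, querying the
platform java.lang:type=OperatingSystem MBean; the JMX service URL (host and
port) is a placeholder.

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    import javax.management.JMX;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class LoadAverageProbe
    {
        public static void main(String[] args) throws Exception
        {
            // JMX URL for the broker JVM; host and port are placeholders
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://broker-host:9010/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url))
            {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();
                OperatingSystemMXBean os = JMX.newMXBeanProxy(
                        mbsc,
                        new ObjectName(ManagementFactory.OPERATING_SYSTEM_MXBEAN_NAME),
                        OperatingSystemMXBean.class);

                // SystemLoadAverage divided by AvailableProcessors, as described above
                double normalisedLoad = os.getSystemLoadAverage() / os.getAvailableProcessors();
                System.out.println("Normalised load average: " + normalisedLoad);
            }
        }
    }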