OK - so having pondered / hacked around a bit this weekend, I think to get
decent performance from the IO model in 6.0 for your use case we're going
to have to change things around a bit.

Basically 6.0 is an intermediate step on our IO / threading model journey.
In earlier versions we used 2 threads per connection for IO (one read, one
write) and then extra threads from a pool to "push" messages from queues to
connections.
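
To make that concrete, the old arrangement looked very roughly like the
sketch below - class and method names here are purely illustrative, not the
actual broker code:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Illustrative sketch only - not the real broker classes.
    interface AmqpConnection { String getId(); void readLoop(); void writeLoop(); }
    interface BrokerQueue { void deliverTo(QueueConsumer consumer); }
    interface QueueConsumer { }

    class PreSixIoModel
    {
        private final ExecutorService _deliveryPool = Executors.newFixedThreadPool(8);

        void connectionOpened(final AmqpConnection connection)
        {
            // one dedicated thread per connection reads frames from the socket...
            new Thread(connection::readLoop, "read-" + connection.getId()).start();
            // ...and a second dedicated thread writes frames back out
            new Thread(connection::writeLoop, "write-" + connection.getId()).start();
        }

        void onMessageEnqueued(final BrokerQueue queue, final QueueConsumer consumer)
        {
            // extra pooled threads "push" messages from the queue to the consumer
            _deliveryPool.submit(() -> queue.deliverTo(consumer));
        }
    }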

In 6.0 we moved to using a pool for the IO threads, and also stopped queues
from "pushing" to connections while the IO threads were acting on the
connection.  It's this latter fact which is screwing up performance for
your use case here because what happens is that on each network read we
tell each consumer to stop accepting pushes from the queue until the IO
interaction has completed.  This is causing lots of loops over your 3000
consumers on each session, which is eating up a lot of CPU on every network
interaction.
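
In rough pseudo-Java (again, just to show the shape of the problem, not the
real code), each network read currently does something like:

    import java.util.List;

    // Illustrative sketch of the 6.0 read path - not the actual broker code.
    class SixZeroReadPath
    {
        interface SessionConsumer { void suspendPushes(); void resumePushes(); }

        void onNetworkRead(final List<SessionConsumer> consumersOnConnection, final byte[] frame)
        {
            // stop the queues "pushing" while the IO thread works on the connection
            for (SessionConsumer consumer : consumersOnConnection)
            {
                consumer.suspendPushes();   // O(number of consumers) on every read
            }

            processFrame(frame);

            // ...and allow pushes again once the IO interaction has completed
            for (SessionConsumer consumer : consumersOnConnection)
            {
                consumer.resumePushes();    // O(number of consumers) again
            }
        }

        private void processFrame(final byte[] frame)
        {
            // decode and act on the incoming frame
        }
    }

With ~3000 consumers per session those loops dominate the cost of every
read, which fits the CPU numbers you reported.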

In the final version of our IO refactoring we want to remove the "pushing"
from the queue, and instead have the consumers "pull" - so that the only
threads that operate on the queues (outside of housekeeping tasks like
expiry) will be the IO threads.
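
The end state would look more like the sketch below (again, illustrative
names only): the IO thread asks the queue for the next message only when it
is ready to write, so no other thread needs to touch the queue at all:

    // Illustrative sketch of the intended "pull" model - not the actual broker code.
    class PullingIoThread
    {
        interface MessageSource { Object pullNextMessage(); }          // e.g. a queue
        interface NetworkSink   { boolean hasCredit(); void write(Object message); }

        // runs only on the IO thread that owns the connection
        void doWork(final MessageSource queue, final NetworkSink connection)
        {
            while (connection.hasCredit())
            {
                Object message = queue.pullNextMessage();
                if (message == null)
                {
                    break;                  // queue empty - the IO thread moves on
                }
                connection.write(message);
            }
        }
    }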

So, what we could do (and I have a patch sitting on my laptop for this) is
to look at using the "multi queue consumers" work I did for you guys
before, but augment it so that the consumers use a "pull" model rather
than the push model.  This will guarantee strict fairness between
the queues associated with the consumer (which was the issue you had with
this functionality before, I believe).  Using this model you'd only need a
small number (one?) of consumers per session.  The patch I have is to add
this "pull" mode for these consumers (essentially this is a preview of how
all consumers will work in the future).
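
To illustrate the fairness point: with a pull model the multi-queue consumer
can simply take the next message from each of its queues in turn, roughly as
in the sketch below (purely illustrative, not the patch itself):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    // Illustrative sketch of strict round-robin pulling across a consumer's queues.
    class MultiQueuePullConsumer
    {
        interface SourceQueue { Object pullNextMessage(); }

        private final Deque<SourceQueue> _queues;

        MultiQueuePullConsumer(final List<SourceQueue> queues)
        {
            _queues = new ArrayDeque<>(queues);
        }

        // called from the IO thread whenever the connection can accept another message
        Object pullNext()
        {
            for (int i = 0; i < _queues.size(); i++)
            {
                SourceQueue queue = _queues.pollFirst();
                _queues.addLast(queue);     // rotate so every queue gets a fair turn
                Object message = queue.pullNextMessage();
                if (message != null)
                {
                    return message;
                }
            }
            return null;                    // all of the consumer's queues are empty
        }
    }

So fairness is decided at the moment the consumer asks for its next message,
rather than by whichever queue happens to push first.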

Does this seem like something you would be interested in pursuing?

Cheers,
Rob

On 15 October 2016 at 17:30, Ramayan Tiwari <ramayan.tiw...@gmail.com>
wrote:

> Thanks Rob. Apologies for sending this over weekend :(
>
> Are there any docs on the new threading model? I found this on confluence:
>
> https://cwiki.apache.org/confluence/display/qpid/IO+Transport+Refactoring
>
> We are also interested in understanding the threading model a little better
> to help us figure out its impact on our usage patterns. Would be very
> helpful if there are more docs/JIRA/email-threads with some details.
>
> Thanks
>
> On Sat, Oct 15, 2016 at 9:21 AM, Rob Godfrey <rob.j.godf...@gmail.com>
> wrote:
>
> > So I *think* this is an issue because of the extremely large number of
> > consumers.  The threading model in v6 means that whenever a network read
> > occurs for a connection, it iterates over the consumers on that
> > connection - obviously where there are a large number of consumers this
> > is burdensome.  I fear addressing this may not be a trivial change...  I
> > shall spend the rest of my afternoon pondering this...
> >
> > - Rob
> >
> > On 15 October 2016 at 17:14, Ramayan Tiwari <ramayan.tiw...@gmail.com>
> > wrote:
> >
> > > Hi Rob,
> > >
> > > Thanks so much for your response. We use transacted sessions with
> > > non-persistent delivery. Prefetch size is 1 and every message is the
> > > same size (200 bytes).
> > >
> > > Thanks
> > > Ramayan
> > >
> > > On Sat, Oct 15, 2016 at 2:59 AM, Rob Godfrey <rob.j.godf...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ramayan,
> > > >
> > > > this is interesting... in our testing (which admittedly didn't cover
> > > > the case of this many queues / listeners) we saw the 6.0.x broker
> > > > using less CPU on average than the 0.32 broker.  I'll have a look
> > > > this weekend as to why creating the listeners is slower.  On the
> > > > dequeuing, can you give a little more information on the usage
> > > > pattern - are you using transactions, auto-ack or client ack?  What
> > > > prefetch size are you using?  How large are your messages?
> > > >
> > > > Thanks,
> > > > Rob
> > > >
> > > > On 14 October 2016 at 23:46, Ramayan Tiwari <ramayan.tiw...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > We have been validating the new Qpid broker (version 6.0.4),
> > > > > comparing it against broker version 0.32, and are seeing major
> > > > > regressions. Following is a summary of our test setup and results:
> > > > >
> > > > > *1. Test Setup *
> > > > >   *a).* Qpid broker runs on a dedicated host (12 cores, 32 GB RAM).
> > > > >   *b).* For 0.32, we allocated 16 GB heap. For the 6.0.4 broker, we
> > > > > use 8 GB heap and 8 GB direct memory.
> > > > >   *c).* For 6.0.4, flow to disk has been configured at 60%.
> > > > >   *d).* Both brokers use the BDB host type.
> > > > >   *e).* The brokers have around 6000 queues and we create 16
> > > > > listener sessions/threads spread over 3 connections, where each
> > > > > session is listening to 3000 queues. However, messages are only
> > > > > enqueued and processed from 10 queues.
> > > > >   *f).* We enqueue 1 million messages across 10 different queues
> > > > > (evenly divided) at the start of the test. Dequeue only starts once
> > > > > all the messages have been enqueued. We run the test for 2 hours
> > > > > and process as many messages as we can. Each message takes around
> > > > > 200 milliseconds to process.
> > > > >   *g).* We have used both the 0.16 and 6.0.4 clients for these
> > > > > tests (the 6.0.4 client only with the 6.0.4 broker).
> > > > >
> > > > > *2. Test Results *
> > > > >   *a).* System Load Average (see notes below on how we compute it)
> > > > > for the 6.0.4 broker is 5x that of the 0.32 broker. During the
> > > > > start of the test (when we are not doing any dequeue), the load
> > > > > average is normal (0.05 for the 0.32 broker and 0.1 for the new
> > > > > broker); however, while we are dequeuing messages, the load
> > > > > average is very high (around 0.5 consistently).
> > > > >
> > > > >   *b).* Time to create listeners in the new broker has gone up by
> > > > > 220% compared to the 0.32 broker (when using the 0.16 client). For
> > > > > the old broker, creating 16 sessions each listening to 3000 queues
> > > > > takes 142 seconds, and in the new broker it took 456 seconds. If
> > > > > we use the 6.0.4 client, it takes even longer: a 524% increase
> > > > > (887 seconds).
> > > > >      *I).* The time to create consumers increases as we create
> > > > > more listeners on the same connections. We have 20 sessions (but
> > > > > end up using around 5 of them) on each connection, and we create
> > > > > about 3000 consumers and attach a MessageListener to each. Each
> > > > > successive session takes longer (approximately linearly) to set up
> > > > > the same number of consumers and listeners.
> > > > >
> > > > > *3). How we compute System Load Average *
> > > > > We query the MBean SystemLoadAverage and divide it by the value of
> > > > > the MBean AvailableProcessors. Both of these MBeans are available
> > > > > under java.lang.OperatingSystem.
> > > > >
> > > > > I am not sure what is causing these regressions and would like
> > > > > your help in understanding them. We are aware of the changes to
> > > > > the threading model in the new broker; are there any design docs
> > > > > we can refer to in order to understand these changes at a high
> > > > > level? Can we tune some parameters to address these issues?
> > > > >
> > > > > Thanks
> > > > > Ramayan
> > > > >
> > > >
> > >
> >
>
