Hi,

Regarding the 0.32 behaviour: it checked whether to flow a message to
disk when putting the message on a Queue, the same way Qpid 6 does.
In that sense 6 is neither more nor less aggressive.  However, the
algorithm behind the decision whether or not to flow to disk has
changed.  This change was made as part of a larger effort to isolate
VirtualHosts from each other.  I would have to go back and check how
the algorithm worked previously, but I assume it just considered the
total (estimated) amount of memory used and did no per-VH or
per-Queue allocation.  This means 6 effectively lowers the threshold
on individual Queues, especially where some VirtualHosts and/or
Queues are used less than others.  On the upside, the broker is
fairer in its resource management and a single VirtualHost can no
longer use up all available memory.  Exactly how the trade-off
between fairness and efficient use of available memory should be
struck is debatable, but I don't think we want to go back to the
pre-6 model of just lumping everything together.

Given your numbers (1 VH, 6000 Qs), each Queue would initially be
allocated 1/6000th of 60% of 8 GB ≈ 1 MB.  In the end state the full
Queues should end up with approximately 780 MB, but as you noticed
the threshold is only recalculated periodically during housekeeping
(by default every 30 s) or when a VH or Queue is added or deleted.
If you have DEBUG logging enabled you should see periodic messages
like "Allocating target size to queues [...]".  If not, then I am
afraid you won't be able to tell the current thresholds, because they
are only reported once, when flowToDisk becomes active/inactive.
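
As a back-of-the-envelope illustration of that initial allocation
(plain arithmetic only, not the actual broker code):

    // Rough sketch: initial per-queue flow-to-disk target for one
    // VirtualHost with 6000 empty Queues, 8 GB of direct memory and
    // broker.flowToDiskThreshold at 60%.
    public class InitialTargetSketch {
        public static void main(String[] args) {
            long directMemory = 8L * 1024 * 1024 * 1024;   // 8 GB
            double flowToDiskThreshold = 0.60;
            long vhTarget = (long) (directMemory * flowToDiskThreshold);
            long perQueueTarget = vhTarget / 6000;         // single VH, 6000 queues
            System.out.println(perQueueTarget);            // ~859 KB, i.e. roughly 1 MB
        }
    }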

So I think your analysis is probably correct: the revision of the
threshold is always "behind" the publishing, raising the threshold on
every revision but never far enough to prevent flowToDisk.  This is
not ideal and we will have to address it.  However, I am afraid that
in the current release there is no way to influence the algorithm
other than setting the available memory and broker.flowToDiskThreshold.

Regarding the MemoryStore: the algorithm triggering flowToDisk is the
same for all stores; only the implementation of actually writing
messages to disk differs.  For the MemoryStore it is a no-op, i.e.,
the message is not flowed to disk and remains in memory.
Performance-wise we do not do a lot of testing with the MemoryStore,
because it is not a typical use case and is mainly used for unit and
system testing.  I would assume that the better distribution you are
seeing is coincidental, since that part of the code should be
relatively independent of the store type.  Unfortunately, I cannot
see any of your graphs.  I believe the mailing list strips all
attachments.
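
To picture the no-op mentioned above (a hypothetical illustration
only, not the real Qpid store API; the names are made up):

    // Hypothetical sketch: the flow-to-disk *decision* is
    // store-independent; only the write step differs per store.
    interface StoreSketch {
        void flowToDisk(byte[] messageData);
    }

    final class DiskBackedStoreSketch implements StoreSketch {
        @Override
        public void flowToDisk(byte[] messageData) {
            // a persistent store (e.g. BDB) would write the data here
            // and could then release the in-memory copy
        }
    }

    final class MemoryStoreSketch implements StoreSketch {
        @Override
        public void flowToDisk(byte[] messageData) {
            // deliberate no-op: the message simply stays in memory
        }
    }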

Regarding a recommendation for how to configure your DM vs Heap, I
would like to refer you to our documentation [1], especially section
"9.11.6. Memory Tuning the Broker".  There we provide formulas to
estimate the memory consumption of the broker for both DM and Heap.
Note that these are estimates and you should test your chosen
settings under a typical peak workload.  Given that your messages are
small you will probably want to favour Heap over DM, but I am
reluctant to make an explicit recommendation.
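
Purely as an illustration using the numbers from this thread (220
byte average payload and roughly 1 KB of heap overhead per message -
your figures, not official constants), the rough arithmetic would be:

    // Rough sizing sketch based on figures quoted in this thread.
    public class SizingSketch {
        public static void main(String[] args) {
            long messages = 2_000_000;      // the larger of your two tests
            long avgPayloadBytes = 220;     // payload lives in direct memory
            long heapBytesPerMsg = 1024;    // rough per-message heap overhead

            System.out.println("DM   ~ " + messages * avgPayloadBytes / (1024 * 1024) + " MB");
            System.out.println("Heap ~ " + messages * heapBytesPerMsg / (1024 * 1024) + " MB");
            // prints roughly: DM ~ 419 MB, Heap ~ 1953 MB.  With small
            // messages the heap side dominates, which is why a
            // heap-heavy split is the natural starting point.
        }
    }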

Kind regards,
Lorenz

P.S.: I am going on a 2-day vacation later today, but feel free to
continue this conversation with others on this list.

[1] https://qpid.apache.org/releases/qpid-java-6.1.0/java-broker/book/Java-Broker-Runtime-Memory.html

On 20/12/16 20:37, Ramayan Tiwari wrote:
Hi Lorenz,

Thanks a lot for your response and for explaining the flow to disk algorithm in detail. I described the test setup in the first email of this thread; to summarize the points again:
a) There is only one virtual host.
b) There are 6000 queues in this virtual host, but messages are only enqueued to 10 queues.
c) Every queue gets an equal number of messages (100k) at the start of the test (we do not start dequeuing till all 1 million messages are enqueued).
d) Heap and DM memory are equal (8GB each) and the DM flow to disk threshold is 60%.

I looked at the QUE-1014/15 log lines and the following is what I notice:
a) These log lines are not present in the 0.32 broker's log, which means that it's not doing any flow to disk. Is the flow to disk behavior different in the two brokers? It looks like 6.0.x is a lot more aggressive in this regard.

b) Since all 1 million messages are enqueued at the start of the test (it takes about 7 mins to enqueue), the flow to disk threshold revisions performed by the housekeeping task are not able to catch up; the rate at which thresholds are revised cannot keep up with the rate of enqueue. In my test, revisions happened only twice (4 seconds and 5 mins after test start) and from then on the threshold was not revised for the queues.

To make sure that we are not getting penalized by writing to disk, I also ran a test using the Memory store type and compared the result with the BDB store type. Apparently, the BDB store is slightly more efficient (2.7%) in terms of the number of messages delivered. The Memory store also takes more broker CPU (3% more on average), but it is better at distributing messages in a round-robin manner across all the queues. See the attached graphs for details.

I do notice that the flow to disk behavior is almost exactly the same (QUE-1014/15 log lines are present) when running with the Memory store. I am wondering what flow to disk does when we use the Memory store?

Since our average messages size is less than 1KB, I am really looking forward to some recommendation around the % allocation for DM vs Heap.

Thanks
Ramayan


On Tue, Dec 20, 2016 at 4:02 AM, Lorenz Quack <quack.lor...@gmail.com <mailto:quack.lor...@gmail.com>> wrote:

    Hello Ramayan,

    glad to hear that the patch is (mostly) working for you.
    To address your points:

        1. If indeed in one case flow to disk is kicking in while in
           the other one it is not, then I am not surprised that
           there is a 5% difference.  The question is whether the
           flow to disk is expected or not which leads to

        2. The direct memory utilization not exceeding a certain
           value is a strong indication that flow to disk is active.
           Could you verify that by checking the logs (QUE-1014/15)?
           If the flow to disk limit is exceeded then it is expected
           that 2 million messages consume the same amount of direct
           memory as 1 million messages.  Could you share a little
           more about the test setup?  How many VirtualHosts are
           running on the broker?  How many Queues are on each
           VirtualHost?  What is the Queue depth of those Queues?
           All of those factors influence the actual flow to disk
           threshold.  This is to ensure some fairness between
           VirtualHosts as far as memory consumption is concerned.
           Below I explain how threshold allocation is currently
           performed.  We are considering changing the algorithm in
           the future or making it tunable.  Your ideas, requirements,
           and input on this would certainly be of interest to us.

    Looking forward to hearing from you.

    Kind regards,
    Lorenz


    Algorithm for flow to disk threshold:

     1. Take the total amount of the broker.flowToDiskThreshold and
        divide it amongst all active VirtualHosts as follows

       a. Half of broker.flowToDiskThreshold is evenly divided
          amongst the VHs to ensure a minimum amount is available to
          each VH.

       b. The remaining half is allocated proportional to the current
          usage pattern.  For example, if VH1 is currently using 3
          MB, VH2 is using 1 MB and VH3 is using 0 MB, then of the
          remaining half 3/4 will be allocated to VH1, 1/4 to VH2,
          and nothing to VH3.  If all VHs are empty distribute this
          half evenly like in 1.a.

     2. The VirtualHosts allocate their available memory to their
        Queues in a proportional fashion as explained above (1.b).


    Example:

     * The broker.flowToDiskThreshold is set to 10 GB.

     * Two Virtual Hosts with 10 Queues each.

       * VH1 all 10 Queues are empty.

       * VH2 all Queues contain 10 MB except for one Queue that
         contains 100 MB.

     * According to 1.a each VirtualHost is allocated half of the
       evenly divided 5 GB, i.e., 2.5 GB.

     * According to 1.b VH1, using 0 MB, does not get any additional
       memory, while VH2 gets the whole remaining 5 GB, bringing its
       total to 7.5 GB.

     * The Queues on VH1 don't have messages on them so the
       VirtualHost falls back to allocating them equal shares: 250 MB
       each.

     * On VH2 the total current memory usage is 9*10 MB + 100 MB =
       190 MB so the smaller Queues receive 10/190 * 7.5 GB = 395 MB
       while the large Queue receives 100/190 * 7.5 GB = 3950 MB.

     * In total we allocated 10 * 250 MB + 9 * 395 MB + 1 * 3950 MB
       totaling 10 GB (within bounds of rounding errors).
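
     As a rough sketch of this arithmetic (illustrative only - the
     names are invented and this is not the actual broker code):

         import java.util.LinkedHashMap;
         import java.util.Map;

         final class FlowToDiskAllocationSketch {

             // 1.b / step 2: split 'amount' proportionally to current
             // usage, falling back to equal shares when all are empty.
             static Map<String, Long> proportional(long amount, Map<String, Long> usage) {
                 long used = usage.values().stream().mapToLong(Long::longValue).sum();
                 Map<String, Long> out = new LinkedHashMap<>();
                 usage.forEach((name, u) -> out.put(name,
                         used == 0 ? amount / usage.size() : amount * u / used));
                 return out;
             }

             // Step 1: half of the threshold split evenly across the VHs,
             // the other half proportionally to each VH's current usage.
             static Map<String, Long> perVirtualHost(long threshold, Map<String, Long> vhUsage) {
                 long evenShare = (threshold / 2) / vhUsage.size();
                 Map<String, Long> out = proportional(threshold - threshold / 2, vhUsage);
                 out.replaceAll((name, v) -> v + evenShare);
                 return out;
             }

             public static void main(String[] args) {
                 // The example above, in MB: 10 GB threshold, VH1 empty,
                 // VH2 currently using 190 MB.
                 Map<String, Long> vhs = new LinkedHashMap<>();
                 vhs.put("VH1", 0L);
                 vhs.put("VH2", 190L);
                 // prints {VH1=2560, VH2=7680}, i.e. 2.5 GB and 7.5 GB
                 System.out.println(perVirtualHost(10 * 1024L, vhs));
             }
         }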



    On 19/12/16 20:48, Ramayan Tiwari wrote:

        Hi Rob,

        I did another exhaustive performance test using the
        MultiQueueConsumer feature with 6.0.5 (and the patch). The
        broker CPU issue has been resolved and we no longer have the
        message prefetch problem (caused by long-running messages).

        Fairness among queues is also great (not as perfect as the
        0.32 broker though; see attached graphs). Everything looks
        great, except for:

        1. 6.0.5 delivered around 4.6% fewer messages. Flow to disk
        triggered aggressively in 6.0.5 but I don't see any flow to
        disk happening in 0.32 (looking for QUE-1014). This might be
        the reason for the lower message delivery.

        2. Direct memory utilization in the new broker does not make
        sense to us. We did 2 tests: 1 million and 2 million messages
        (220 byte average message size); however, the direct memory
        utilization never exceeded 500MB (see attached graph), even
        though we are allocating 8GB for direct memory. Because there
        is a 1KB heap overhead with each message, heap utilization
        looks the same for both 0.32 and 6.0.5. For our setup, this
        essentially means that we are cutting our memory capacity in
        half, because we now allocate half of the available RAM to
        direct memory but will be limited by heap anyway.

        These tests were performed using 16GB RAM, where 8GB was
        allocated to heap and 8GB to direct memory. I also changed
        flowToDiskThreshold to 60%. This is one of our biggest
        concerns with the new broker, since our average message size
        in production is less than 1KB. Currently we allocate all the
        available RAM to heap, which would be cut in half with the
        new broker.

        What is the recommendation for memory allocation (heap vs dm)
        in our use case?

        Thanks
        Ramayan

        On Fri, Oct 28, 2016 at 5:37 AM, Keith W
        <keith.w...@gmail.com> wrote:

            Hi Ramayan

            QPID-7462 is a new (experimental) feature, so we don't
            consider it appropriate for inclusion in the 6.0.5 defect
            release.  We follow a Semantic Versioning[1] strategy.

            The underlying issue your testing has uncovered is poor
            performance with large numbers of consumers.  QPID-7462
            effectively side-steps the problem (by introducing
            alternative consumer behaviour) but does not address the
            root cause.  We continue to consider how best to resolve
            the problem completely, but don't yet have timelines for
            this change.  It is something that will be getting
            attention in what remains of this year.  We will keep you
            posted.

            In the meantime, I understand this causes you a problem.
            If you cannot adopt 6.1 (there should be another RC out
            soon), you could consider applying the patch (attached to
            the JIRA) to the 6.0.x branch and building it yourself.

            Kind regards, Keith.


            [1] http://semver.org


            On 27 October 2016 at 23:19, Ramayan Tiwari
            <ramayan.tiw...@gmail.com> wrote:
            > Hi Rob,
            >
            > I have the trunk code which I am testing with; I haven't
            > finished the test runs yet. I was hoping that once I
            > validate the change, I can simply release 6.0.5.
            >
            > Thanks
            > Ramayan
            >
            > On Thu, Oct 27, 2016 at 12:41 PM, Rob Godfrey
            > <rob.j.godf...@gmail.com> wrote:
            >
            >> Hi Ramayan,
            >>
            >> did you verify that the change works for you? You said
        you were
            going to
            >> test with the trunk code...
            >>
            >> I'll discuss with the other developers tomorrow about
        whether
            we can put
            >> this change into 6.0.5.
            >>
            >> Cheers,
            >> Rob
            >>
            >> On 27 October 2016 at 20:30, Ramayan Tiwari
            >> <ramayan.tiw...@gmail.com> wrote:
            >>
            >> > Hi Rob,
            >> >
            >> > I looked at the release notes for 6.0.5 and it doesn't
            >> > include the fix for the large-consumers issue [1]. The
            >> > fix is marked for 6.1, which will not have JMX, and for
            >> > us to use that version would require major changes in
            >> > our monitoring framework. Could you please include the
            >> > fix in the 6.0.5 release?
            >> >
            >> > Thanks
            >> > Ramayan
            >> >
            >> > [1]. https://issues.apache.org/jira/browse/QPID-7462
            >> >
            >> > > On Wed, Oct 19, 2016 at 4:49 PM, Helen Kwong
            >> > > <helenkw...@gmail.com> wrote:
            >> >
            >> > > Hi Rob,
            >> > >
            >> > > Again, thank you so much for answering our questions
            >> > > and providing a patch so quickly :) One more question
            >> > > I have: would it be possible to include test cases
            >> > > involving many queues and listeners (on the order of
            >> > > thousands of queues) in future Qpid releases, as part
            >> > > of standard perf testing of the broker?
            >> > >
            >> > > Thanks,
            >> > > Helen
            >> > >
            >> > > On Tue, Oct 18, 2016 at 10:40 AM, Ramayan Tiwari
            >> > > <ramayan.tiw...@gmail.com> wrote:
            >> > >
            >> > >> Thanks so much Rob, I will test the patch against
            >> > >> trunk and will update you with the outcome.
            >> > >>
            >> > >> - Ramayan
            >> > >>
            >> > >> On Tue, Oct 18, 2016 at 2:37 AM, Rob Godfrey
            >> > >> <rob.j.godf...@gmail.com> wrote:
            >> > >>
            >> > >>> On 17 October 2016 at 21:50, Rob Godfrey
            >> > >>> <rob.j.godf...@gmail.com> wrote:
            >> > >>>
            >> > >>> >
            >> > >>> >
            >> > >>> > On 17 October 2016 at 21:24, Ramayan Tiwari
            >> > >>> > <ramayan.tiw...@gmail.com> wrote:
            >> > >>> >
            >> > >>> >> Hi Rob,
            >> > >>> >>
            >> > >>> >> We are certainly interested in testing the "multi
            >> > >>> >> queue consumers" behavior with your patch in the
            >> > >>> >> new broker. We would like to know:
            >> > >>> >>
            >> > >>> >> 1. What will be the scope of the changes, client
            >> > >>> >> or broker or both? We are currently running the
            >> > >>> >> 0.16 client, so we would like to make sure that
            >> > >>> >> we will be able to use these changes with the
            >> > >>> >> 0.16 client.
            >> > >>> >>
            >> > >>> >>
            >> > >>> > There's no change to the client.  I can't remember
            >> > >>> > what was in the 0.16 client... the only issue would
            >> > >>> > be if there are any bugs in the parsing of address
            >> > >>> > arguments.  I can try to test that out tomorrow.
            >> > >>> >
            >> > >>>
            >> > >>>
            >> > >>> OK - with a little bit of care to get round the
            >> > >>> address parsing issues in the 0.16 client... I think
            >> > >>> we can get this to work.  I've created the following
            >> > >>> JIRA:
            >> > >>>
            >> > >>> https://issues.apache.org/jira/browse/QPID-7462
            >> > >>>
            >> > >>> and attached to it are a patch which applies against
            >> > >>> trunk, and a separate patch which applies against the
            >> > >>> 6.0.x branch
            >> > >>> (https://svn.apache.org/repos/asf/qpid/java/branches/6.0.x
            >> > >>> - this is 6.0.4 plus a few other fixes which we will
            >> > >>> soon be releasing as 6.0.5)
            >> > >>>
            >> > >>> To create a consumer which uses this feature (and
            >> > >>> multi queue consumption) for the 0.16 client you need
            >> > >>> to use something like the following as the address:
            >> > >>>
            >> > >>> queue_01 ; {node : { type : queue }, link : {
            >> > >>> x-subscribes : { arguments : { x-multiqueue : [
            >> > >>> queue_01, queue_02, queue_03 ], x-pull-only : true }}}}
            >> > >>>
            >> > >>>
            >> > >>> Note that the initial queue_01 has to be the name of
            >> > >>> an actual queue on the virtual host, but otherwise it
            >> > >>> is not actually used (if you were using a 0.32 or
            >> > >>> later client you could just use '' here).  The actual
            >> > >>> queues that are consumed from are in the list value
            >> > >>> associated with x-multiqueue.  For my testing I
            >> > >>> created a list with 3000 queues here and this worked
            >> > >>> fine.
            >> > >>>
            >> > >>> Let me know if you have any questions / issues,
            >> > >>>
            >> > >>> Hope this helps,
            >> > >>> Rob
            >> > >>>
            >> > >>>
            >> > >>> >
            >> > >>> >
            >> > >>> >> 2. My understanding is that the "pull vs push"
            >> > >>> >> change is only with respect to the broker and it
            >> > >>> >> does not change our architecture, where we use a
            >> > >>> >> MessageListener to receive messages asynchronously.
            >> > >>> >>
            >> > >>> >
            >> > >>> > Exactly - this is only a change within the internal
            >> > >>> > broker threading model.  The external behaviour of
            >> > >>> > the broker remains essentially unchanged.
            >> > >>> >
            >> > >>> >
            >> > >>> >>
            >> > >>> >> 3. Once the I/O refactoring is complete, we would
            >> > >>> >> be able to go back to using the standard JMS
            >> > >>> >> consumer (Destination). What is the timeline and
            >> > >>> >> broker release version for the completion of this
            >> > >>> >> work?
            >> > >>> >>
            >> > >>> >
            >> > >>> > You might wish to continue to use the "multi queue"
            >> > >>> > model, depending on your actual use case, but yeah,
            >> > >>> > once the I/O work is complete I would hope that you
            >> > >>> > could use the thousands-of-consumers model should
            >> > >>> > you wish.  We don't have a schedule for the next
            >> > >>> > phase of the I/O rework right now - about all I can
            >> > >>> > say is that it is unlikely to be complete this
            >> > >>> > year.  I'd need to talk with Keith (who is currently
            >> > >>> > on vacation) as to when we think we may be able to
            >> > >>> > schedule it.
            >> > >>> >
            >> > >>> >
            >> > >>> >>
            >> > >>> >> Let me know once you have integrated the patch and
            >> > >>> >> I will re-run our performance tests to validate it.
            >> > >>> >>
            >> > >>> >>
            >> > >>> > I'll make a patch for 6.0.x presently (I've been
            >> > >>> > working on a change against trunk - the patch will
            >> > >>> > probably have to change a bit to apply to 6.0.x).
            >> > >>> >
            >> > >>> > Cheers,
            >> > >>> > Rob
            >> > >>> >
            >> > >>> > Thanks
            >> > >>> >> Ramayan
            >> > >>> >>
            >> > >>> >> On Sun, Oct 16, 2016 at 3:30 PM, Rob Godfrey
            >> > >>> >> <rob.j.godf...@gmail.com> wrote:
            >> > >>> >>
            >> > >>> >> > OK - so having pondered / hacked around a bit
            >> > >>> >> > this weekend, I think to get decent performance
            >> > >>> >> > from the IO model in 6.0 for your use case we're
            >> > >>> >> > going to have to change things around a bit.
            >> > >>> >> >
            >> > >>> >> > Basically 6.0 is an intermediate step on our IO /
            >> > >>> >> > threading model journey.  In earlier versions we
            >> > >>> >> > used 2 threads per connection for IO (one read,
            >> > >>> >> > one write) and then extra threads from a pool to
            >> > >>> >> > "push" messages from queues to connections.
            >> > >>> >> >
            >> > >>> >> > In 6.0 we moved to using a pool for the IO
            >> > >>> >> > threads, and also stopped queues from "pushing"
            >> > >>> >> > to connections while the IO threads were acting
            >> > >>> >> > on the connection. It's this latter fact which is
            >> > >>> >> > screwing up performance for your use case here,
            >> > >>> >> > because what happens is that on each network read
            >> > >>> >> > we tell each consumer to stop accepting pushes
            >> > >>> >> > from the queue until the IO interaction has
            >> > >>> >> > completed.  This is causing lots of loops over
            >> > >>> >> > your 3000 consumers on each session, which is
            >> > >>> >> > eating up a lot of CPU on every network
            >> > >>> >> > interaction.
            >> > >>> >> >
            >> > >>> >> > In the final version of our IO refactoring we
            >> > >>> >> > want to remove the "pushing" from the queue, and
            >> > >>> >> > instead have the consumers "pull" - so that the
            >> > >>> >> > only threads that operate on the queues (outside
            >> > >>> >> > of housekeeping tasks like expiry) will be the IO
            >> > >>> >> > threads.
            >> > >>> >> >
            >> > >>> >> > So, what we could do (and I have a patch sitting
            >> > >>> >> > on my laptop for this) is to look at using the
            >> > >>> >> > "multi queue consumers" work I did for you guys
            >> > >>> >> > before, but augmenting this so that the consumers
            >> > >>> >> > work using a "pull" model rather than the push
            >> > >>> >> > model.  This will guarantee strict fairness
            >> > >>> >> > between the queues associated with the consumer
            >> > >>> >> > (which was the issue you had with this
            >> > >>> >> > functionality before, I believe). Using this
            >> > >>> >> > model you'd only need a small number (one?) of
            >> > >>> >> > consumers per session.  The patch I have is to
            >> > >>> >> > add this "pull" mode for these consumers
            >> > >>> >> > (essentially this is a preview of how all
            >> > >>> >> > consumers will work in the future).
            >> > >>> >> >
            >> > >>> >> > Does this seem like something you would be
            >> > >>> >> > interested in pursuing?
            >> > >>> >> >
            >> > >>> >> > Cheers,
            >> > >>> >> > Rob
            >> > >>> >> >
            >> > >>> >> > On 15 October 2016 at 17:30, Ramayan Tiwari
            >> > >>> >> > <ramayan.tiw...@gmail.com> wrote:
            >> > >>> >> >
            >> > >>> >> > > Thanks Rob. Apologies for sending this over
            >> > >>> >> > > the weekend :(
            >> > >>> >> > >
            >> > >>> >> > > Are there docs on the new threading model? I
            >> > >>> >> > > found this on confluence:
            >> > >>> >> > >
            >> > >>> >> > > https://cwiki.apache.org/confluence/display/qpid/IO+Transport+Refactoring
            >> > >>> >> > >
            >> > >>> >> > > We are also interested in understanding the
            >> > >>> >> > > threading model a little better, to help us
            >> > >>> >> > > figure out its impact for our usage patterns.
            >> > >>> >> > > It would be very helpful if there are more
            >> > >>> >> > > docs/JIRAs/email threads with some details.
            >> > >>> >> > >
            >> > >>> >> > > Thanks
            >> > >>> >> > >
            >> > >>> >> > > On Sat, Oct 15, 2016 at 9:21 AM, Rob Godfrey
            >> > >>> >> > > <rob.j.godf...@gmail.com> wrote:
            >> > >>> >> > >
            >> > >>> >> > > > So I *think* this is an issue because of the
            >> > >>> >> > > > extremely large number of consumers.  The
            >> > >>> >> > > > threading model in v6 means that whenever a
            >> > >>> >> > > > network read occurs for a connection, it
            >> > >>> >> > > > iterates over the consumers on that
            >> > >>> >> > > > connection - obviously where there are a
            >> > >>> >> > > > large number of consumers this is burdensome.
            >> > >>> >> > > > I fear addressing this may not be a trivial
            >> > >>> >> > > > change... I shall spend the rest of my
            >> > >>> >> > > > afternoon pondering this...
            >> > >>> >> > > >
            >> > >>> >> > > > - Rob
            >> > >>> >> > > >
            >> > >>> >> > > > On 15 October 2016 at 17:14, Ramayan Tiwari
            >> > >>> >> > > > <ramayan.tiw...@gmail.com> wrote:
            >> > >>> >> > > >
            >> > >>> >> > > > > Hi Rob,
            >> > >>> >> > > > >
            >> > >>> >> > > > > Thanks so much for your response. We use
            >> > >>> >> > > > > transacted sessions with non-persistent
            >> > >>> >> > > > > delivery. Prefetch size is 1 and every
            >> > >>> >> > > > > message is the same size (200 bytes).
            >> > >>> >> > > > >
            >> > >>> >> > > > > Thanks
            >> > >>> >> > > > > Ramayan
            >> > >>> >> > > > >
            >> > >>> >> > > > > On Sat, Oct 15, 2016 at 2:59 AM, Rob Godfrey
            >> > >>> >> > > > > <rob.j.godf...@gmail.com> wrote:
            >> > >>> >> > > > >
            >> > >>> >> > > > > > Hi Ramyan,
            >> > >>> >> > > > > >
            >> > >>> >> > > > > > this is interesting... in our testing
            >> > >>> >> > > > > > (which admittedly didn't cover the case
            >> > >>> >> > > > > > of this many queues / listeners) we saw
            >> > >>> >> > > > > > the 6.0.x broker using less CPU on
            >> > >>> >> > > > > > average than the 0.32 broker.  I'll have
            >> > >>> >> > > > > > a look this weekend as to why creating
            >> > >>> >> > > > > > the listeners is slower.  On the
            >> > >>> >> > > > > > dequeuing, can you give a little more
            >> > >>> >> > > > > > information on the usage pattern - are
            >> > >>> >> > > > > > you using transactions, auto-ack or
            >> > >>> >> > > > > > client ack?  What prefetch size are you
            >> > >>> >> > > > > > using?  How large are your messages?
            >> > >>> >> > > > > >
            >> > >>> >> > > > > > Thanks,
            >> > >>> >> > > > > > Rob
            >> > >>> >> > > > > >
            >> > >>> >> > > > > > On 14 October 2016 at 23:46, Ramayan Tiwari
            >> > >>> >> > > > > > <ramayan.tiw...@gmail.com> wrote:
            >> > >>> >> > > > > >
            >> > >>> >> > > > > > > Hi All,
            >> > >>> >> > > > > > >
            >> > >>> >> > > > > > > We have been validating the new Qpid
            >> > >>> >> > > > > > > broker (version 6.0.4) and have
            >> > >>> >> > > > > > > compared it against broker version
            >> > >>> >> > > > > > > 0.32, and are seeing major regressions.
            >> > >>> >> > > > > > > Following is a summary of our test
            >> > >>> >> > > > > > > setup and results:
            >> > >>> >> > > > > > >
            >> > >>> >> > > > > > > *1. Test Setup *
            >> > >>> >> > > > > > >   *a).* Qpid broker runs on a
            >> > >>> >> > > > > > >   dedicated host (12 cores, 32 GB RAM).
            >> > >>> >> > > > > > >   *b).* For 0.32, we allocated 16 GB
            >> > >>> >> > > > > > >   heap. For the 6.0.4 broker, we use 8GB
            >> > >>> >> > > > > > >   heap and 8GB direct memory.
            >> > >>> >> > > > > > >   *c).* For 6.0.4, flow to disk has been
            >> > >>> >> > > > > > >   configured at 60%.
            >> > >>> >> > > > > > >   *d).* Both brokers use the BDB store
            >> > >>> >> > > > > > >   type.
            >> > >>> >> > > > > > >   *e).* The brokers have around 6000
            >> > >>> >> > > > > > >   queues and we create 16 listener
            >> > >>> >> > > > > > >   sessions/threads spread over 3
            >> > >>> >> > > > > > >   connections, where each session is
            >> > >>> >> > > > > > >   listening to 3000 queues. However,
            >> > >>> >> > > > > > >   messages are only enqueued to and
            >> > >>> >> > > > > > >   processed from 10 queues.
            >> > >>> >> > > > > > >   *f).* We enqueue 1 million messages
            >> > >>> >> > > > > > >   across 10 different queues (evenly
            >> > >>> >> > > > > > >   divided) at the start of the test.
            >> > >>> >> > > > > > >   Dequeue only starts once all the
            >> > >>> >> > > > > > >   messages have been enqueued. We run
            >> > >>> >> > > > > > >   the test for 2 hours and process as
            >> > >>> >> > > > > > >   many messages as we can. Each message
            >> > >>> >> > > > > > >   runs for around 200 milliseconds.
            >> > >>> >> > > > > > >   *g).* We have used both 0.16 and 6.0.4
            >> > >>> >> > > > > > >   clients for these tests (the 6.0.4
            >> > >>> >> > > > > > >   client only with the 6.0.4 broker).
            >> > >>> >> > > > > > >
            >> > >>> >> > > > > > > *2. Test Results *
            >> > >>> >> > > > > > >   *a).* The System Load Average (read
            >> > >>> >> > > > > > >   the notes below on how we compute it)
            >> > >>> >> > > > > > >   for the 6.0.4 broker is 5x that of the
            >> > >>> >> > > > > > >   0.32 broker. During the start of the
            >> > >>> >> > > > > > >   test (when we are not doing any
            >> > >>> >> > > > > > >   dequeue), the load average is normal
            >> > >>> >> > > > > > >   (0.05 for the 0.32 broker and 0.1 for
            >> > >>> >> > > > > > >   the new broker); however, while we are
            >> > >>> >> > > > > > >   dequeuing messages, the load average
            >> > >>> >> > > > > > >   is very high (around 0.5 consistently).
            >> > >>> >> > > > > > >
            >> > >>> >> > > > > > >   *b).* Time to create listeners in the
            >> > >>> >> > > > > > >   new broker has gone up by 220%
            >> > >>> >> > > > > > >   compared to the 0.32 broker (when
            >> > >>> >> > > > > > >   using the 0.16 client). For the old
            >> > >>> >> > > > > > >   broker, creating 16 sessions each
            >> > >>> >> > > > > > >   listening to 3000 queues takes 142
            >> > >>> >> > > > > > >   seconds; in the new broker it took
            >> > >>> >> > > > > > >   456 seconds. If we use the 6.0.4
            >> > >>> >> > > > > > >   client, it took even longer, at a
            >> > >>> >> > > > > > >   524% increase (887 seconds).
            >> > >>> >> > > > > > >      *I).* The time to create consumers
            >> > >>> >> > > > > > >      increases as we create more
            >> > >>> >> > > > > > >      listeners on the same connections.
            >> > >>> >> > > > > > >      We have 20 sessions (but end up
            >> > >>> >> > > > > > >      using around 5 of them) on each
            >> > >>> >> > > > > > >      connection, and we create about
            >> > >>> >> > > > > > >      3000 consumers and attach a
            >> > >>> >> > > > > > >      MessageListener to each. Each
            >> > >>> >> > > > > > >      successive session takes longer
            >> > >>> >> > > > > > >      (an approximately linear increase)
            >> > >>> >> > > > > > >      to set up the same number of
            >> > >>> >> > > > > > >      consumers and listeners.
            >> > >>> >> > > > > > >
            >> > >>> >> > > > > > > *3). How we compute System Load Average*
            >> > >>> >> > > > > > > We query the MBean attribute
            >> > >>> >> > > > > > > SystemLoadAverage and divide it by the
            >> > >>> >> > > > > > > value of the AvailableProcessors
            >> > >>> >> > > > > > > attribute. Both of these are available
            >> > >>> >> > > > > > > under java.lang.OperatingSystem.
            >> > >>> >> > > > > > >
            >> > >>> >> > > > > > > I am not sure what is causing these
            >> > >>> >> > > > > > > regressions and would like your help in
            >> > >>> >> > > > > > > understanding them. We are aware of the
            >> > >>> >> > > > > > > changes to the threading model in the
            >> > >>> >> > > > > > > new broker; are there any design docs
            >> > >>> >> > > > > > > that we can refer to, to understand
            >> > >>> >> > > > > > > these changes at a high level? Can we
            >> > >>> >> > > > > > > tune some parameters to address these
            >> > >>> >> > > > > > > issues?
            >> > >>> >> > > > > > >
            >> > >>> >> > > > > > > Thanks
            >> > >>> >> > > > > > > Ramayan
            >> > >>> >> > > > > > >
            >> > >>> >> > > > > >
            >> > >>> >> > > > >
            >> > >>> >> > > >
            >> > >>> >> > >
            >> > >>> >> >
            >> > >>> >>
            >> > >>> >
            >> > >>> >
            >> > >>>
            >> > >>
            >> > >>
            >> > >
            >> >
            >>
