FYI, in 2.x all of this is different, but to answer your questions for 1.x: it is a little complicated, because we try to keep the memory and CPU overhead low, especially when few tuples are flowing.

Conceptually, tuples are first placed into a separate per-producer batch when they are inserted:
https://github.com/apache/storm/blob/d2d6f40344e6cc92ab07f3a462d577ef6b61f8b1/storm-core/src/jvm/org/apache/storm/utils/DisruptorQueue.java#L225

If that batch fills up, the producer attempts to publish it into the disruptor queue. Inserting multiple messages into the queue at once is more efficient than inserting one message at a time. If there is not enough capacity in the queue, the batch goes into an overflow queue instead.

Every millisecond a thread pool works at flushing all of the tuples buffered in the entire JVM. First it forces any outstanding partial batches into the overflow queue:
https://github.com/apache/storm/blob/d2d6f40344e6cc92ab07f3a462d577ef6b61f8b1/storm-core/src/jvm/org/apache/storm/utils/DisruptorQueue.java#L268-L273

After that it goes through the overflow queue and makes sure all of the tuples in it are flushed into the disruptor queue. So if no tuples are flowing, a single thread in the pool wakes up once a millisecond, does a few checks per queue, and ends up doing nothing. If messages are flowing, once a millisecond any partial batch is pushed into the disruptor queue, and depending on how long it takes to insert those messages there may be a few threads doing this.
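To make the shape of that flow concrete, here is a heavily simplified, self-contained sketch of the idea. This is not the real DisruptorQueue code: the class name SimplifiedBatchingQueue, the ArrayBlockingQueue standing in for the disruptor ring buffer, its 1024-slot capacity, and the single-threaded flusher are all assumptions for illustration only.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Toy model of the 1.x batching behaviour described above; NOT the real DisruptorQueue. */
public class SimplifiedBatchingQueue {
    private static final int BATCH_SIZE = 100;        // default batch size
    private static final long FLUSH_INTERVAL_MS = 1;  // default flush interval

    // Stand-in for the disruptor ring buffer: bounded, so offers can fail when it is full.
    private final BlockingQueue<List<Object>> ring = new ArrayBlockingQueue<>(1024);
    // Batches that did not fit into the ring wait here until the flusher retries them.
    private final ConcurrentLinkedQueue<List<Object>> overflow = new ConcurrentLinkedQueue<>();
    // Partial batch currently being built by the producer.
    private final Object batchLock = new Object();
    private List<Object> currentBatch = new ArrayList<>(BATCH_SIZE);

    /** Producer path: buffer the tuple; publish the whole batch once it is full. */
    public void publish(Object tuple) {
        List<Object> full = null;
        synchronized (batchLock) {
            currentBatch.add(tuple);
            if (currentBatch.size() >= BATCH_SIZE) {
                full = currentBatch;
                currentBatch = new ArrayList<>(BATCH_SIZE);
            }
        }
        if (full != null && !ring.offer(full)) {
            overflow.add(full);   // no capacity in the ring: park the batch in overflow
        }
    }

    /** Flusher path: runs once per millisecond whether or not tuples are flowing. */
    private void flush() {
        // 1. Force any outstanding partial batch into the overflow queue.
        synchronized (batchLock) {
            if (!currentBatch.isEmpty()) {
                overflow.add(currentBatch);
                currentBatch = new ArrayList<>(BATCH_SIZE);
            }
        }
        // 2. Drain overflow into the ring for as long as there is capacity.
        List<Object> batch;
        while ((batch = overflow.peek()) != null) {
            if (!ring.offer(batch)) {
                break;            // ring is full again; try on the next tick
            }
            overflow.poll();
        }
    }

    /** Schedule the flush on a shared pool, analogous to the per-JVM flusher threads. */
    public void startFlusher(ScheduledExecutorService pool) {
        pool.scheduleAtFixedRate(this::flush, FLUSH_INTERVAL_MS, FLUSH_INTERVAL_MS,
                                 TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
        SimplifiedBatchingQueue q = new SimplifiedBatchingQueue();
        q.startFlusher(pool);
        for (int i = 0; i < 250; i++) {
            q.publish(i);         // 2 full batches go straight to the ring
        }
        Thread.sleep(5);          // give the flusher a couple of ticks to move the leftover 50
        pool.shutdown();
    }
}

The main method just shows the two paths: full batches are published directly by the producer, while the trailing partial batch sits until the next 1 ms flush tick. A single flusher thread keeps the overflow drain simple; the real code shares a pool across all queues in the JVM.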
I hope this helps,

Bobby

On Mon, Nov 26, 2018 at 7:03 AM Thomas Cooper (PGR) <[email protected]> wrote:

> Hi,
>
> I have a question about the behaviour of the LMAX disruptor queues that
> the executor send/receive and the worker transfer queues use.
>
> These queues batch tuples for processing (100 by default) and will wait
> until a full batch has arrived before passing them to the executor.
> However, they will also flush any tuples in the queue periodically (1 ms by
> default) to prevent the queue blocking for a long time while it waits for
> 100 tuples to turn up.
>
> My question is about the implementation of the flush interval behaviour:
>
> 1. Does the flush interval thread run continuously, issuing a flush
> command every 1 ms, which the queue just ignores if it is already
> flushing? If 100 tuples turn up between the periodic flush commands,
> the queue passes them on straight away.
> 2. Or does the flush interval timer only start when
> consumeBatchWhenAvailable is called on the disruptor queue and a full
> batch is not available? In which case the queue will wait for 1 ms and
> return whatever is in the queue at the end of that interval or, if 100
> tuples turn up within that 1 ms, return the full batch.
>
> From the code in storm-core/src/jvm/org/apache/storm/utils/DisruptorQueue.java
> it seems option 1 might be the case. However, the code in that class is
> quite complex and the interplay with the underlying LMAX library makes it
> hard to reason about.
>
> Any help with the above would be greatly appreciated. I am attempting to
> model the effect of these queues on topology performance and hopefully
> investigate a way to optimise the choice of batch size and flush interval.
>
> Thanks,
>
> Thomas Cooper
> PhD Student
> Newcastle University, School of Computing
> W: http://www.tomcooper.org.uk | Twitter: @tomncooper
> <https://twitter.com/tomncooper>
