Hello,

I have to warn you that this is a long question, but I feel that I need to
provide the essential details of my experiment, so that you can get the
bigger picture. If you have any additional questions, feel free to ask.
Here is my problem:

I have installed Storm 0.9.4 and ZooKeeper 3.4.6 on a single server (2
sockets with Intel Xeon 8-core chips, 96 GB RAM, running CentOS) and I have
set up a pseudo-distributed, single-node Storm runtime. My configuration
consists of 1 ZooKeeper server, 1 nimbus process, 1 supervisor process, and
1 worker process (when topologies are submitted), all running on the same
machine. The purpose of my experiment is to see Storm's behavior on a
single-node setting, when the input load is dynamically distributed among
executor threads.
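
For reference, a minimal single-node configuration of this kind looks
roughly as follows (this is a sketch, not my exact file; the attached
storm.yaml has the full details):

    # Sketch of a pseudo-distributed, single-node storm.yaml
    # (one supervisor slot => at most one worker process).
    storm.zookeeper.servers:
      - "localhost"
    nimbus.host: "localhost"
    supervisor.slots.ports:
      - 6700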

For the purpose of my experiment, I have input tuples that consist of 1
long and 1 integer value. The input data come from two spouts that read
tuples from disk files, and I control the input rates to follow the
pattern below (a sketch of the spout logic follows the list):

   1. 200 tuples/sec for the first 24 seconds (time 0 - 24 seconds)
   2. 800 tuples/sec for the next 12 seconds (time 24 - 36 seconds)
   3. 200 tuples/sec for the last 6 seconds (time 36 - 42 seconds)
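
To make the rate control concrete (and to show where I take the
timestamps for the end-to-end latency measurement I describe below), here
is a simplified sketch of my spout logic. Class and field names are mine,
and the real spouts read the tuples from disk files:

    import java.util.HashMap;
    import java.util.Map;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class RateControlledSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private long startMs;
        private long currentSec = -1;
        private int emittedThisSec = 0;
        private long msgId = 0;
        // Emit timestamps per message id, for end-to-end latency in ack().
        // nextTuple()/ack() run on the same executor thread, so a plain
        // HashMap is safe here.
        private final Map<Long, Long> emitTimeMs = new HashMap<Long, Long>();

        @Override
        public void open(Map conf, TopologyContext ctx,
                         SpoutOutputCollector collector) {
            this.collector = collector;
            this.startMs = System.currentTimeMillis();
        }

        // Target input rate (tuples/sec) following the schedule above.
        private int targetRate(long elapsedSec) {
            if (elapsedSec < 24) return 200;   // phase 1
            if (elapsedSec < 36) return 800;   // phase 2 (the spike)
            return 200;                        // phase 3
        }

        @Override
        public void nextTuple() {
            long elapsedSec = (System.currentTimeMillis() - startMs) / 1000;
            if (elapsedSec >= 42) return;      // experiment finished
            if (elapsedSec != currentSec) {    // new second: reset counter
                currentSec = elapsedSec;
                emittedThisSec = 0;
            }
            if (emittedThisSec < targetRate(elapsedSec)) {
                emittedThisSec++;
                emitTimeMs.put(msgId, System.currentTimeMillis());
                // Each tuple carries 1 long and 1 int; msgId enables acking.
                collector.emit(new Values(msgId, 42), msgId);
                msgId++;
            }
        }

        @Override
        public void ack(Object id) {
            // Called once the tuple tree is fully acknowledged.
            Long t0 = emitTimeMs.remove(id);
            if (t0 != null) {
                long latencyMs = System.currentTimeMillis() - t0;
                // record latencyMs (e.g., log it or aggregate percentiles)
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("longVal", "intVal"));
        }
    }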

Turning to my topology, I have two types of bolts: (a) a Dispatcher bolt
that receives input from the two spouts, and (b) a Consumer bolt that
performs an operation on the tuples and maintains some tuples as state.
The parallelism hint for the Dispatcher is one (1 executor/thread), since
I have verified that it never reaches even 10% of its capacity. For the
Consumer bolt I have a parallelism hint of two (2 executors/threads for
that bolt). The input rates I previously mentioned are picked so that,
with the appropriate number of Consumer executors, I can keep the
end-to-end latency below 10 msec. In detail, I have run the same topology
with one Consumer executor and it can handle an input rate of 200
tuples/sec with end-to-end latency < 10 msec. Similarly, if I add one more
Consumer executor (2 executors in total), the topology can consume 800
tuples/sec with < 10 msec end-to-end latency. At this point, I have to say
that if I use 1 Consumer executor for 800 tuples/sec, the end-to-end
latency reaches up to 2 seconds. By the way, I should mention that I
measure end-to-end latency through the ack() mechanism: I record how much
time passes between emitting a tuple into the topology and the moment its
tuple tree is fully acknowledged (the timestamps are taken as in the spout
sketch above).
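
For concreteness, the wiring of the topology looks roughly like this
(component and class names are mine, and I show shuffle grouping from the
spouts into the Dispatcher as an example; the important part is the direct
grouping on the Dispatcher -> Consumer edge, which I explain next):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class ExperimentTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("spout-1", new RateControlledSpout());
            builder.setSpout("spout-2", new RateControlledSpout());
            builder.setBolt("dispatcher", new DispatcherBolt(), 1) // hint = 1
                   .shuffleGrouping("spout-1")
                   .shuffleGrouping("spout-2");
            builder.setBolt("consumer", new ConsumerBolt(), 2)     // hint = 2
                   .directGrouping("dispatcher"); // Dispatcher picks the task
            Config conf = new Config();
            conf.setNumWorkers(1);                // the single worker process
            StormSubmitter.submitTopology("experiment", conf,
                    builder.createTopology());
        }
    }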

As you realize by now, the goal is to see whether I can maintain
end-to-end latency < 10 msec during the input spike, by simulating the
addition of another Consumer executor. In order to simulate the addition
of processing resources for the input spike, I use direct grouping, and
before the spike I send tuples to only one of the two Consumer executors.
When the spike is detected by the Dispatcher, it starts sending tuples to
the other Consumer as well, so that the input load is balanced between the
two threads. Hence, I expect that when I start sending tuples to the
additional Consumer thread, the end-to-end latency will drop back to its
acceptable value. However, this does not happen.
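
In case it helps, the Dispatcher's routing logic is conceptually the
following (a simplified sketch; my actual spike detection is based on the
measured input rate, and the threshold below is only for illustration):

    import java.util.List;
    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class DispatcherBolt extends BaseRichBolt {
        private OutputCollector collector;
        private List<Integer> consumerTasks;  // task ids of the Consumers
        private boolean spike = false;
        private long curSec = -1;
        private int countThisSec = 0;
        private long rr = 0;                  // round-robin counter

        @Override
        public void prepare(Map conf, TopologyContext context,
                            OutputCollector collector) {
            this.collector = collector;
            // With direct grouping, the emitter chooses the target task.
            this.consumerTasks = context.getComponentTasks("consumer");
        }

        @Override
        public void execute(Tuple input) {
            long nowSec = System.currentTimeMillis() / 1000;
            if (nowSec != curSec) {
                // Crude spike detection: last second's rate vs. threshold.
                spike = countThisSec > 400;
                curSec = nowSec;
                countThisSec = 0;
            }
            countThisSec++;

            // Before the spike: all tuples go to the first Consumer task.
            // During the spike: balance the load over both Consumer tasks.
            int target = spike
                    ? consumerTasks.get((int) (rr++ % consumerTasks.size()))
                    : consumerTasks.get(0);
            collector.emitDirect(target, input,
                    new Values(input.getLong(0), input.getInteger(1)));
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(true, new Fields("longVal", "intVal"));
        }
    }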

In order to verify my hypothesis that two Consumer executors are able to
maintain < 10 msec latency during a spike, I execute the same experiment,
but this time I send tuples to both executors (threads) for the whole
lifetime of the experiment. In this case, the end-to-end latency remains
stable at acceptable levels. So, I do not know what really happens in my
simulation. I cannot figure out what causes the deterioration of the
end-to-end latency in the case where the input load is redirected to the
additional Consumer executor.

In order to find out more about the mechanics of Storm, I did the same
setup on a smaller machine and did some profiling. I saw that most of the
time is spent in the BlockingWaitStrategy of the LMAX Disruptor, and that
it dominates the CPU. My actual processing function (in the Consumer bolt)
takes only a fraction of the time spent in the BlockingWaitStrategy.
Hence, I think that it is an I/O issue between queues, and not something
that has to do with the processing of tuples in the Consumer.
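
For what it is worth, I am thinking of experimenting with the Disruptor
settings next. If I read the 0.9.4 defaults correctly, the wait strategy
and the executor queue sizes can be overridden in storm.yaml along these
lines (treat the exact keys as my assumption; please correct me if they
are wrong):

    # Assumption: key names as I read them in Storm 0.9.x defaults.yaml
    topology.disruptor.wait.strategy: "com.lmax.disruptor.SleepingWaitStrategy"
    topology.executor.receive.buffer.size: 1024  # must be a power of 2
    topology.executor.send.buffer.size: 1024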

Does anybody have an idea what goes wrong and why I get this
radical/baffling behavior?

Thank you.

P.S.: I have attached my YAML file for further information, and again I
apologize for the long post.


-- 
Nick R. Katsipoulakis,
Department of Computer Science
University of Pittsburgh
