Thank you, Roshan, for your great suggestions! I will definitely try them
this week and see what performance improvement I get.

On Jan 31, 2018 11:36 PM, "Roshan Naik" <[email protected]> wrote:

It’s hit or miss whether you get better latency with batchSz=1… it depends
on multiple things, like the topology, scale, execution times, etc.

For your current test workload it doesn’t look like you have much impact
from Storm. At a higher rate of inter-worker image transfers there is a
possibility that Kryo de/serialization can hold things back (throughput
and/or latency). Not sure if the Netty stuff has much impact.



You could try a few things if you are trying to squeeze every last drop of
latency:

- Try batchSz=1, with and without ACKing. Disabling ACKing can reduce
latency (if you can tolerate data loss when workers have problems).

- Measure the latency for transferring images within a single worker and
see how much it differs.

  - If you see a big difference, design your topology deployment to reduce
inter-worker communication.

- Try enabling/disabling load-aware messaging and see if it impacts your
latency.
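For reference, the ACKing and load-aware suggestions above can be expressed
as topology config overrides. A sketch assuming Storm 1.x config names
(check them against your version's defaults):

```yaml
# storm.yaml / topology overrides -- names assumed from Storm 1.x.
topology.acker.executors: 0                 # disable ACKing (at-most-once delivery!)
topology.disable.loadaware.messaging: true  # turn load-aware routing off
```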





There are latency and throughput improvements anticipated in Storm 2.0 (via
STORM-2306), along with predictably lower latency for batchSz=1 (the new
default once it is committed).

If you are adventurous/curious you can:

- Try STORM-2306 and see if you get any improvement. With small data it is
definitely possible to achieve microsecond latency between 2 Storm workers.



-roshan



*From: *Wuyang Zhang <[email protected]>
*Reply-To: *"[email protected]" <[email protected]>
*Date: *Wednesday, January 31, 2018 at 7:55 PM

*To: *"[email protected]" <[email protected]>
*Subject: *Re: Apache Storm High Messaging Delay When Passing 5MB Images



Also, I would like to ask: when the target application demands ultra-low
latency and needs to remove any possible messaging delay, do you think
setting the batch size = 1 can help? According to the source code, the
queue is flushed when it reaches a predefined batch size or after a flush
interval elapses. So, to reduce the queuing delay, I set the batch size to
1 and the flush interval to 1 ms. I also changed the waiting strategy to
yield.
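For reference, the batch-size and flush-interval settings map to the
disruptor queue configs in Storm 1.x (names assumed; verify against your
version). The wait-strategy change mentioned is a source modification, not
a config key:

```yaml
# Assumed Storm 1.x disruptor queue settings -- verify for your version.
topology.disruptor.batch.size: 1            # flush after every tuple
topology.disruptor.batch.timeout.millis: 1  # or at most every 1 ms
```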



Any other suggestions to reduce messaging delay introduced inside Storm?




On Wed, Jan 31, 2018 at 10:20 PM, Wuyang Zhang <[email protected]>
wrote:

Hi Jungtaek and Roshan,



Thank you for your replies and the suggestions.



I just did a more detailed evaluation for this messaging delay issue.



1. Using ping and iperf, I measure an RTT of 0.193 ms and a bandwidth of
934 Mbps.

2. Using scp, transmitting the image m0.png (1.7 MB) takes 0.072 s.

3. I combine two copies of m0.png into a single byte array of length
9,331,200 bytes (8.89 MB).

4. A hand-written UDP socket program with a 1024-byte buffer transmits the
8.89 MB buffer in 80 ms.

5. Sending the same 8.89 MB buffer from the spout to a bolt takes 93 ms.
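As a quick sanity check on these numbers (a sketch using only the figures
reported above), the 80 ms UDP result is essentially the raw wire time for
the 9,331,200-byte buffer at the iperf-measured 934 Mbps:

```python
# Wire-time estimate for the 8.89 MB buffer at the measured bandwidth.
PAYLOAD_BYTES = 9_331_200   # two m0.png images as one byte array
LINK_BPS = 934e6            # iperf-measured throughput, bits per second

wire_time_ms = PAYLOAD_BYTES * 8 / LINK_BPS * 1000
print(f"theoretical wire time: {wire_time_ms:.1f} ms")  # ~79.9 ms
```

So the 93 ms Storm number is only slightly above the physical minimum for
this link.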



So Storm messaging introduces roughly 13 ms of additional delay over the
raw UDP transfer (93 ms vs. 80 ms).



I probably got the earlier (much higher) numbers because of the heavy
computing overhead, which caused tuples to be buffered in the queue.



Thanks a lot for motivating me to find this. I will consider
encoding/decoding the images to reduce the transmission latency.



Best Regards,

Wuyang




On Wed, Jan 31, 2018 at 7:18 PM, Roshan Naik <[email protected]> wrote:

Continuing with Jungtaek’s line of thinking… I would also like to know:

- What latencies are you able to achieve when directly transmitting a 5 MB
image between two nodes (not using Storm)? And similarly, between two
processes on the same node?

- And how are you measuring it?

-roshan



*From: *Jungtaek Lim <[email protected]>
*Reply-To: *"[email protected]" <[email protected]>
*Date: *Wednesday, January 31, 2018 at 3:38 PM
*To: *"[email protected]" <[email protected]>
*Subject: *Re: Apache Storm High Messaging Delay When Passing 5MB Images



To clarify: by "not easily seen" I meant that these overheads are not
exposed, so they are easy to overlook.



On Thu, Feb 1, 2018 at 8:36 AM, Jungtaek Lim <[email protected]> wrote:

I'm not clear on whether you're saying that the message transfer for each
bolt took 200 ms, or that the sum of 4 or 5 network transfer latencies was
200 ms.



The reason I ask is that if it's the latter, and there are 5 network
transfers, that is the ideal latency in theory: 1 Gbps is 125 MB/s (1000
Mbps, not 1024 Mbps), so 5 MB / 125 MB/s = 40 ms per transfer. (In that
case I'd rather wonder how it was even possible.)
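The back-of-the-envelope calculation above, written out as plain arithmetic
(no Storm involved):

```python
# 1 Gbps = 125 MB/s in decimal units (1000 Mbps, not 1024 Mbps).
link_mbytes_per_s = 1000 / 8                      # 125 MB/s
per_transfer_ms = 5 / link_mbytes_per_s * 1000    # one 5 MB image
total_ms_5_hops = 5 * per_transfer_ms             # five successive transfers

print(per_transfer_ms, total_ms_5_hops)  # 40.0 200.0
```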



Even if there are 4 network transfers, we may need to take this into
account: the latency calculation above is theoretical, and there are many
overheads other than messaging that are not easily seen, so the latency may
not come from messaging overhead.



If it took 200 ms for each transfer, that would be something worth
discussing. Please let me know which case applies to you.



Thanks,

Jungtaek Lim (HeartSaVioR)



On Thu, Feb 1, 2018 at 8:11 AM, Wuyang Zhang <[email protected]> wrote:

I am experimenting with Apache Storm for a real-time image processing
application that requires ultra-low latency. In the topology definition, a
single spout emits a raw image (5 MB) every 1 s, and a few bolts process
them. The processing latency of each bolt is acceptable, and the overall
computing delay is around 150 ms.

*However, I find that the message passing delay between workers on
different nodes is really high: the total such delay across the 5
successive bolts is around 200 ms.* To calculate this delay, I subtract all
the task latencies from the end-to-end latency. I also implemented a timer
bolt, in which the other processing bolts register a timestamp just before
they start their real processing. Comparing the bolts' timestamps confirms
that the delay between each pair of bolts is as high as I previously
noticed.
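The timer-bolt measurement boils down to simple arithmetic on the recorded
timestamps. A sketch with invented numbers (the real timestamps are not in
this thread):

```python
# Hypothetical start-of-processing timestamps (ms) recorded via the timer
# bolt: spout emit followed by 5 bolts. Values are invented for
# illustration only.
start_ts = [0, 70, 140, 210, 280, 350]
task_latency_ms = [30, 30, 30, 30, 30]   # measured execute time per bolt

# Gap between successive start timestamps, minus the processing time,
# leaves the per-hop messaging delay.
hop_ms = [b - a for a, b in zip(start_ts, start_ts[1:])]
messaging_ms = [hop - task for hop, task in zip(hop_ms, task_latency_ms)]
print(sum(messaging_ms))  # 200
```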

To analyze the source of this high additional delay, I first lowered the
sending rate to one image per second, so there should be no queuing delay
caused by high computing overheads. Also, the Storm UI shows that no bolt
has high CPU utilization.

Then I checked the network delay. I am using a 1 Gbps network testbed and
tested it for RTT and bandwidth. The network latency alone should not be
that high for sending a 5 MB image.

Finally, I am considering the buffering delay. I found that each thread
maintains its own send buffer and transfers the data to the worker's send
buffer, and I am not sure how long it takes before the receiving bolt can
get the message. As suggested by the community, I increased the
sender/receiver buffer sizes to 16384 and changed
STORM_NETTY_MESSAGE_BATCH_SIZE to 32768. However, it did not help.
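For reference, those settings can be expressed roughly as below (config
names assumed from Storm 1.x; STORM_NETTY_MESSAGE_BATCH_SIZE is the Java
constant whose YAML key I believe is
storm.messaging.netty.transfer.batch.size -- verify against your version):

```yaml
# Assumed Storm 1.x config keys for the buffer/batch changes tried above.
topology.executor.receive.buffer.size: 16384
topology.executor.send.buffer.size: 16384
storm.messaging.netty.transfer.batch.size: 32768
```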

*My question is: how can I remove/reduce the messaging overhead between
bolts (across workers)?* Is it possible to synchronize the communication
between bolts so that the receiver gets the messages immediately, without
any delay?



