Re: Realtime computations using storm - questions on performance

Andrew Xor Thu, 16 Jul 2015 13:16:54 -0700

Hi,

 Latency depends mainly on two things: 1) the network transfers involved and
2) the actual processing performed on each tuple. One acker per worker
certainly helps, but you also have to *reduce* network transfers as much
as possible. There is also a huge difference between machines and
configurations; for instance, virtualized setups (at least on my end)
usually perform considerably better (network-wise) than separate physical
nodes. For example, in my soft cluster (virtualized) I could process batches
of around 1 MB in under 200 ms across two nodes (from Spout -> Bolt A).
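
To get a feel for how many transfers are in play, here is a rough sketch that
counts tuple deliveries per spout tuple for the A -> B -> C -> D -> E topology
quoted below. The fan-out values (1, 10, 10, 1) are read off that description;
the class and method names are purely illustrative, and ack traffic is not
counted.

```java
// Back-of-the-envelope count of tuple deliveries for the quoted topology.
// FanOutEstimate and its methods are hypothetical names, not Storm API.
final class FanOutEstimate {

    // fanOut[i] = tuples a component emits per tuple it handles.
    // Returns the number of tuples delivered over each hop, per spout tuple.
    static long[] tuplesPerHop(int[] fanOut) {
        long[] perHop = new long[fanOut.length];
        long inFlight = 1; // start from a single spout tuple
        for (int i = 0; i < fanOut.length; i++) {
            inFlight *= fanOut[i];
            perHop[i] = inFlight;
        }
        return perHop;
    }

    // Sum of deliveries over all hops.
    static long total(long[] perHop) {
        long sum = 0;
        for (long n : perHop) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A emits 1, B emits 10 per input, C emits 10 per input, D emits 1 per input.
        int[] fanOut = {1, 10, 10, 1};
        String[] hops = {"A->B", "B->C", "C->D", "D->E"};
        long[] perHop = tuplesPerHop(fanOut);
        for (int i = 0; i < hops.length; i++) {
            System.out.println(hops[i] + ": " + perHop[i]);
        }
        System.out.println("deliveries per spout tuple: " + total(perHop));
        System.out.println("deliveries for 25 spout tuples: " + 25 * total(perHop));
    }
}
```

Under these assumptions each spout tuple turns into 211 deliveries (1 + 10 +
100 + 100), so the 25-message test moves a few thousand tuples before acks are
counted, and every one that crosses a worker boundary pays the network cost.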


You will also have to take into account latency, protocol overhead, and any
other network transfers that might hinder performance. Truth be told,
I've noticed some deviation in processing times, but nothing much (it might
be due to GC phases as well).
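
On the clock drift question quoted below: one way to sidestep drift is to take
both timestamps on the same JVM. Storm calls a spout's ack() on the same spout
task that emitted the tuple, so the time from emit to ack can be measured with
a single monotonic clock. The sketch below is a minimal, Storm-independent
helper under that assumption; LatencyTracker and its methods are hypothetical
names, and it measures the complete (emit-to-ack) time of a tuple tree, not
the latency of an individual hop.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: times a tuple tree using one JVM's monotonic clock,
// so clock drift between machines does not enter the measurement.
final class LatencyTracker {
    private final Map<Object, Long> startNanos = new ConcurrentHashMap<>();

    // Call at the point where the spout emits a tuple with this message id.
    void onEmit(Object msgId) {
        onEmit(msgId, System.nanoTime());
    }

    // Same as above with an explicit timestamp (handy for testing).
    void onEmit(Object msgId, long nowNanos) {
        startNanos.put(msgId, nowNanos);
    }

    // Call from the spout's ack callback.
    // Returns elapsed milliseconds, or -1 if the id was never recorded.
    double onAck(Object msgId) {
        return onAck(msgId, System.nanoTime());
    }

    // Same as above with an explicit timestamp (handy for testing).
    double onAck(Object msgId, long nowNanos) {
        Long start = startNanos.remove(msgId);
        if (start == null) {
            return -1.0;
        }
        return (nowNanos - start) / 1_000_000.0;
    }
}
```

In a spout this would mean calling onEmit(msgId) next to the emit and
onAck(msgId) inside ack(Object msgId), then logging or aggregating the result.
Per-hop numbers across machines would still need synchronized clocks.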

Hope this helps.

Kindly yours,

Andrew Grammenos

-- PGP PKey --
 <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt

On Thu, Jul 16, 2015 at 10:50 PM, Nick R. Katsipoulakis <
[email protected]> wrote:

> Hello all,
>
> I have two questions:
>
> 1) How exactly do you measure latency? I am doing the same thing and I
> have a problem getting the exact milliseconds of latency (mainly because of
> clock drift).
> 2) (to Nathan) Is there a difference in speeds among different groupings?
> For instance, is shuffle faster than direct grouping?
>
> Thanks,
> Nick
>
> 2015-07-15 17:37 GMT-04:00 Nathan Leung <[email protected]>:
>
>> Two things. Your math may be off depending on parallelism. One emit from
>> A becomes 100 emitted from C, and you are joining all of them.
>>
>> Second, try the default number of ackers (one per worker). All your ack
>> traffic is going to a single task.
>>
>> Also you can try local or shuffle grouping if possible to reduce network
>> transfers.
>> On Jul 15, 2015 12:45 PM, "Kashyap Mhaisekar" <[email protected]>
>> wrote:
>>
>>> Hi,
>>> We are attempting real-time distributed computing using Storm, and the
>>> solution has only one problem: inter-bolt latency, on the same machine or
>>> across machines, ranges between 2 and 250 ms. I am not able to figure out
>>> why. Network latency is under 0.5 ms. By latency, I mean the time between
>>> an emit of one bolt/spout and the message arriving in execute() of the
>>> next bolt.
>>>
>>> I have a topology like the below -
>>> A (Spout) -> (emits a number, say 1000) -> B (bolt) [receives this number
>>> and divides it into 10 emits of 100 each] -> C (bolt) [receives these
>>> emits and divides each into 10 emits of 10 numbers] -> D (bolt) [does some
>>> computation on the number and emits one message] -> E (bolt) [aggregates
>>> all the data and confirms that all 1000 messages are processed]
>>>
>>> Every bolt takes under 3 msec to complete and as a result, I estimated
>>> that the end to end processing for 1000 takes not more than 50 msec
>>> including any latencies.
>>>
>>> *Observations*
>>> 1. The end to end time from Spout A to Bolt E takes 200 msec to 3
>>> seconds. My estimate was under 50 msec given that each bolt and spout take
>>> under 3 msec to execute including any latencies.
>>> 2. I noticed that most of the time is spent between the emit from a
>>> Spout/Bolt and execute() of the consuming bolt.
>>> 3. Network latency is under 0.5 msec.
>>>
>>> I am not able to figure out why it takes so much time between a
>>> spout/bolt and the next bolt. I understand that the spout/bolt buffers the
>>> data into a queue and then the subsequent bolt consumes from there.
>>>
>>> *Infrastructure*
>>> 1. 5 VMs with 4 CPUs and 8 GB RAM. Workers run with 1024 MB and there are
>>> 20 workers overall.
>>>
>>> *Test*
>>> 1. The test was done with 25 messages to the spout => 25 messages are
>>> sent to spout in a span of 5 seconds.
>>>
>>> *Config values*
>>> Config config = new Config();
>>> config.put(Config.TOPOLOGY_WORKERS, 20);
>>> config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
>>> config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
>>> config.put(Config.TOPOLOGY_ACKER_EXECUTORS, 1);
>>> config.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);
>>> config.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 64);
>>>
>>> Please let me know if you have encountered similar issues and any steps
>>> you have taken to mitigate the time taken between spout/bolt and another
>>> bolt.
>>>
>>> Thanks
>>> Kashyap
>>>
>>
>
>
> --
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate
>
