Storm task ids don't change: https://groups.google.com/forum/#!topic/storm-user/7P23beQIL4c
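For concreteness, here is a minimal sketch of what that stability buys you with direct streams (the point debated below): the producer can resolve the consumer's task ids once in prepare() and then emit to a specific task. This assumes the 0.9.x-era backtype.storm API, and the names "consumer-bolt" and "direct-stream" are invented for illustration:

import java.util.List;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class DirectProducerBolt extends BaseRichBolt {
    private OutputCollector collector;
    private List<Integer> consumerTasks;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // Task ids are assigned when the topology is submitted and do not
        // change, even if a task is re-spawned on another worker after a
        // failure, so looking them up once here is enough.
        this.consumerTasks = context.getComponentTasks("consumer-bolt");
    }

    @Override
    public void execute(Tuple input) {
        int value = input.getInteger(0);
        // The producer picks the target task itself; this is where
        // hotspots can come from if the routing logic is skewed.
        int target = consumerTasks.get(value % consumerTasks.size());
        collector.emitDirect(target, "direct-stream", input, new Values(value));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // A stream must be declared direct for emitDirect to be legal.
        declarer.declareStream("direct-stream", true, new Fields("value"));
    }
}

The consuming bolt would then subscribe on the TopologyBuilder with .directGrouping(<producer's component id>, "direct-stream").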
On Thu, Jul 16, 2015 at 4:28 PM, Andrew Xor <[email protected]> wrote:

Direct grouping, as shown in the Storm docs, means that you have to target a specific task id and use "direct streams". This is error prone, probably increases latency, and might introduce redundancy problems, because the producer of a tuple needs to know the id of the task the tuple has to go to. Imagine a scenario where the receiving task fails for some reason: the producer cannot relay tuples until it receives the re-spawned task's id.

Hope this helps.

Kindly yours,
Andrew Grammenos

On Thu, Jul 16, 2015 at 11:24 PM, Nick R. Katsipoulakis <[email protected]> wrote:

Hello again,

Nathan, I am using direct grouping because the application I am working on has to be able to send tuples directly to specific tasks; in general, it has to control the data flow. Can you please explain why you would not recommend direct grouping? Is there a particular reason in the architecture of Storm?

Thanks,
Nick

2015-07-16 16:20 GMT-04:00 Nathan Leung <[email protected]>:

I would not recommend direct grouping unless you have a good reason for it. Shuffle grouping is essentially random with an even distribution, which makes it easier to characterize its performance. Local-or-shuffle grouping stays in process, so it will generally be faster; however, you have to be careful in certain cases to avoid task starvation (e.g. a Kafka spout with 1 partition on the topic and 1 spout task, feeding 10 bolt "A" tasks in 10 worker processes). Direct grouping depends on your code (i.e. you can create hotspots), fields grouping depends on your key distribution, and so on.

On Thu, Jul 16, 2015 at 3:50 PM, Nick R. Katsipoulakis <[email protected]> wrote:

Hello all,

I have two questions:

1) How exactly do you measure latency? I am doing the same thing and I have a problem getting the exact milliseconds of latency (mainly because of clock drift).
2) (to Nathan) Is there a difference in speed among the different groupings? For instance, is shuffle faster than direct grouping?

Thanks,
Nick

2015-07-15 17:37 GMT-04:00 Nathan Leung <[email protected]>:

Two things. First, your math may be off depending on parallelism: one emit from A becomes 100 tuples emitted from C, and you are joining all of them.

Second, try the default number of ackers (one per worker). All your ack traffic is currently going to a single task.

Also, you can try local-or-shuffle grouping where possible to reduce network transfers.
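On Nick's first question: one common way to get the emit-to-execute number is to stamp each tuple with the wall clock at emit time and subtract in the consumer's execute(). A minimal sketch against the 0.9.x-era backtype.storm API; the field names are invented, and, as Nick notes, a cross-host delta is only as trustworthy as the clock synchronization (e.g. NTP) between the machines:

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class LatencyMeasuringBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Assumes the upstream component emitted
        // new Values(payload, System.currentTimeMillis())
        // with declared fields ("payload", "emittedAt").
        long queueAndWireMs = System.currentTimeMillis() - input.getLongByField("emittedAt");
        System.err.println("emit-to-execute latency: " + queueAndWireMs + " ms");
        // Re-stamp before forwarding so the next hop measures only its own leg.
        collector.emit(input, new Values(input.getValue(0), System.currentTimeMillis()));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload", "emittedAt"));
    }
}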
On Jul 15, 2015 12:45 PM, "Kashyap Mhaisekar" <[email protected]> wrote:

Hi,

We are attempting real-time distributed computing using Storm, and the solution has only one problem: inter-bolt latency, on the same machine or across machines, ranges between 2 and 250 ms, and I am not able to figure out why. Network latency is under 0.5 ms. By latency, I mean the time between an emit from one bolt/spout and the message arriving in execute() of the next bolt.

I have a topology like the below:

A (spout) [emits a number, say 1000] -> B (bolt) [receives this number and divides it into 10 emits of 100 each] -> C (bolt) [receives these emits and divides each into 10 emits of 10 numbers] -> D (bolt) [does some computation on each number and emits one message] -> E (bolt) [aggregates all the data and confirms that all 1000 messages were processed]

Every bolt takes under 3 ms to complete, and as a result I estimated that the end-to-end processing of 1000 should take no more than 50 ms, including any latencies.

*Observations*
1. The end-to-end time from spout A to bolt E is 200 ms to 3 seconds. My estimate was under 50 ms, given that each bolt and spout takes under 3 ms to execute, including any latencies.
2. Most of the time is spent between the emit from a spout/bolt and the execute() of the consuming bolt.
3. Network latency is under 0.5 ms.

I am not able to figure out why it takes so much time to get from a spout/bolt to the next bolt. I understand that the spout/bolt buffers the data into a queue and the subsequent bolt consumes from there.

*Infrastructure*
1. 5 VMs with 4 CPUs and 8 GB RAM each. Workers get 1024 MB, and there are 20 workers overall.

*Test*
1. The test was done with 25 messages to the spout => 25 messages sent to the spout over a span of 5 seconds.

*Config values*
Config config = new Config();
config.put(Config.TOPOLOGY_WORKERS, 20);
config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
config.put(Config.TOPOLOGY_ACKER_EXECUTORS, 1);
config.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);
config.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 64);

Please let me know if you have encountered similar issues and any steps you have taken to mitigate the time taken between a spout/bolt and the next bolt.

Thanks,
Kashyap
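Applying Nathan's acker suggestion to the config above would look roughly like this. A sketch only, not a tested tuning; it uses the stock Config helpers rather than raw put() calls:

import backtype.storm.Config;

Config config = new Config();
config.setNumWorkers(20);
// One acker per worker (also the default when TOPOLOGY_ACKER_EXECUTORS is
// left unset) instead of a single acker task, so ack traffic is not
// funneled through one executor.
config.setNumAckers(20);
config.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
config.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
config.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);
config.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 64);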
