Storm supports custom schedulers: http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
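The linked write-up describes implementing `backtype.storm.scheduler.IScheduler` and registering it on Nimbus via the `storm.scheduler` key in storm.yaml. A minimal skeleton might look like the following sketch; it assumes storm-core (0.9.x-era `backtype.storm` package names) on the classpath, the class name is illustrative, and the actual pinning logic is left as a comment because it depends on your topology:

```java
import java.util.Map;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.EvenScheduler;
import backtype.storm.scheduler.IScheduler;
import backtype.storm.scheduler.Topologies;
import backtype.storm.scheduler.TopologyDetails;

// Illustrative custom scheduler: reserve some worker slots for a
// high-latency bolt, then let Storm's default scheduler place the rest.
public class HighLatencyBoltScheduler implements IScheduler {

    @Override
    public void prepare(Map conf) {
        // No setup needed for this sketch.
    }

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
        for (TopologyDetails topology : cluster.needsSchedulingTopologies(topologies)) {
            // Here you would look up the executors of the high-latency bolt
            // (e.g., via topology.getExecutors()) and pin them onto a
            // dedicated set of slots with
            //   cluster.assign(slot, topology.getId(), executors);
            // so that one worker runs only that bolt.
        }
        // Everything not explicitly assigned falls back to even scheduling.
        new EvenScheduler().schedule(topologies, cluster);
    }
}
```

To activate it, the scheduler jar goes on Nimbus's classpath and storm.yaml gets `storm.scheduler: "HighLatencyBoltScheduler"` (class name hypothetical).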
On Tue, Mar 10, 2015 at 12:37 PM, Martin Illecker <[email protected]> wrote:

> Curtis, I have made exactly the same observations. I have decreased the
> max spout pending to eliminate tuple timeouts.
> But this actually means throttling the whole topology because of one bolt
> with a high latency! (e.g., 5 bolts with 0.1 ms latency and 1 bolt with 1 ms)
>
> At some point, increasing the parallelism of the high-latency bolt will
> impact the overall performance of a worker. There has to be a better way.
>
> A possible solution might be to assign a bolt to a specific worker.
> Currently, if I assume correctly, each bolt is evenly distributed among
> multiple workers (e.g., a bolt with parallelism 10 can be executed by
> 5 threads on 2 workers or 2 threads on 5 workers).
>
> If a bolt could be assigned to a specific worker type, then it would be
> possible to add more workers/nodes which exclusively execute multiple
> threads of a high-latency bolt.
> For example, we could have one worker which executes a high-latency bolt
> and another worker which executes the rest of the topology.
> So the default behavior would be to distribute the bolts evenly, but it
> should be possible to define different worker types and assign a bolt to
> these worker types.
>
> Does this make any sense?
> And could this be an additional feature of Storm?
>
> 2015-03-10 16:59 GMT+01:00 Curtis Allen <[email protected]>:
>
>> Idan, use the Config class:
>> https://github.com/apache/storm/blob/master/storm-core/src/jvm/backtype/storm/Config.java#L1295
>>
>> On Tue, Mar 10, 2015 at 9:49 AM Idan Fridman <[email protected]> wrote:
>>
>>> Curtis, how do you set topology.message.timeout.secs?
>>>
>>> 2015-03-10 17:07 GMT+02:00 Curtis Allen <[email protected]>:
>>>
>>>> Tuning a topology that contains bolts with an unpredictable execute
>>>> latency is extremely difficult. I've had to slow down the entire
>>>> topology by increasing topology.max.spout.pending and
>>>> topology.message.timeout.secs; otherwise tuples queue up and time out.
>>>
>>> On Tue, Mar 10, 2015 at 8:53 AM Martin Illecker <[email protected]> wrote:
>>>
>>>> I would be interested in a solution for high-latency bolts as well.
>>>>
>>>> Maybe a custom scheduler which prioritizes high-latency bolts might
>>>> help? (e.g., allowing a worker to exclusively run high-latency bolts)
>>>>
>>>> Does anyone have a working solution for a high-throughput topology
>>>> (x0000 tuples/sec) including an HTTPClient bolt (latency around 100 ms)?
>>>>
>>>> 2015-03-08 20:35 GMT+01:00 Frank Jania <[email protected]>:
>>>>
>>>>> I've been running Storm successfully now for a while with a fairly
>>>>> simple topology of this form:
>>>>>
>>>>> spout with a stream of tweets --> bolt to check tweet user against
>>>>> cache --> bolts to do some persistence based on tweet content.
>>>>>
>>>>> So far that's been humming along quite well, with execute latencies
>>>>> in the low-single-digit or sub-millisecond range. Other than setting
>>>>> the parallelism for various bolts, I've been able to run it with the
>>>>> default topology config pretty well.
>>>>>
>>>>> Now I'm trying a topology of the form:
>>>>>
>>>>> spout with a stream of tweets --> bolt to extract the urls in the
>>>>> tweet --> bolt to fetch the url and get the page's title.
>>>>>
>>>>> For this topology the "fetch" portion can have a much longer latency;
>>>>> I'm seeing execute latencies in the 300-500 ms range to accommodate
>>>>> the fetch of any of these arbitrary urls. I've implemented caching to
>>>>> avoid fetching urls I already have titles for, and I use
>>>>> socket/connection timeouts to keep fetches from hanging for too long,
>>>>> but even still, this is going to be a bottleneck.
>>>>>
>>>>> I've set the parallelism for the fetch bolt fairly high already, but
>>>>> are there any best practices for configuring a topology like this
>>>>> where at least one bolt is going to take much more time to process
>>>>> than the rest?
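One rough way to reason about sizing the fetch bolt: by Little's law, a bolt that blocks for L seconds per tuple at T tuples/sec needs about T x L executions in flight. The sketch below is self-contained (no Storm dependency); the throughput number and the async-multiplexing factor are illustrative, and the commented lines show where the real Storm knobs (`Config.setMessageTimeoutSecs`, `Config.setMaxSpoutPending`, the parallelism hint on `TopologyBuilder.setBolt`) would apply, with a hypothetical `FetchUrlBolt`:

```java
// Back-of-envelope sizing for a blocking, high-latency bolt (e.g., an
// HTTP fetch bolt) using Little's law:
//   in-flight tuples ≈ throughput (tuples/sec) × latency (sec)
public class FetchBoltSizing {

    // Minimum executor count needed to sustain `tuplesPerSec` when each
    // execute() call blocks for roughly `latencyMs` milliseconds.
    static int minExecutors(int tuplesPerSec, int latencyMs) {
        return (int) Math.ceil(tuplesPerSec * latencyMs / 1000.0);
    }

    public static void main(String[] args) {
        // Thread's numbers: ~10,000 tuples/sec, up to ~500 ms per fetch.
        int blocking = minExecutors(10_000, 500);
        System.out.println(blocking); // 5000 single-threaded executors

        // If each executor instead multiplexes, say, 50 concurrent
        // non-blocking HTTP requests (factor is illustrative):
        System.out.println((int) Math.ceil(blocking / 50.0)); // 100

        // The corresponding topology settings (backtype.storm.Config /
        // TopologyBuilder, with FetchUrlBolt as a placeholder):
        //   conf.setMessageTimeoutSecs(60);    // > worst-case queueing + fetch
        //   conf.setMaxSpoutPending(10_000);   // bound in-flight tuples per spout task
        //   builder.setBolt("fetch", new FetchUrlBolt(), 100);
    }
}
```

The takeaway matches the thread: raw thread-per-tuple parallelism for a 500 ms bolt gets impractically large, which is why capping `topology.max.spout.pending`, raising `topology.message.timeout.secs`, or isolating the bolt via a custom scheduler comes up as the alternatives.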
