Idan, use the Config class:
https://github.com/apache/storm/blob/master/storm-core/src/jvm/backtype/storm/Config.java#L1295
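For what it's worth, a minimal sketch of how those two settings get applied at submit time. The key strings match the constants in Config.java; the numeric values are placeholders, not recommendations, and the plain HashMap below stands in for backtype.storm.Config (which extends HashMap and exposes setMessageTimeoutSecs/setMaxSpoutPending setters):

```java
import java.util.HashMap;
import java.util.Map;

public class TopologyTuning {
    // Key strings as defined in backtype.storm.Config
    static final String MESSAGE_TIMEOUT_SECS = "topology.message.timeout.secs";
    static final String MAX_SPOUT_PENDING = "topology.max.spout.pending";

    public static void main(String[] args) {
        // With storm-core on the classpath you would build a
        // backtype.storm.Config and call its setters instead:
        //   conf.setMessageTimeoutSecs(120);
        //   conf.setMaxSpoutPending(500);
        Map<String, Object> conf = new HashMap<>();
        conf.put(MESSAGE_TIMEOUT_SECS, 120); // placeholder value
        conf.put(MAX_SPOUT_PENDING, 500);    // placeholder value
        System.out.println(conf);
    }
}
```

The Config instance (or this map) is what you pass to StormSubmitter.submitTopology along with the topology itself.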

On Tue, Mar 10, 2015 at 9:49 AM Idan Fridman <[email protected]> wrote:

> Curtis, how do you set topology.message.timeout.secs?
>
> 2015-03-10 17:07 GMT+02:00 Curtis Allen <[email protected]>:
>
>> Tuning a topology that contains bolts with unpredictable execute
>> latency is extremely difficult. I've had to slow down the entire topology
>> by increasing topology.max.spout.pending and topology.message.timeout.secs;
>> otherwise you'll have tuples queue up and time out.
>>
>
> On Tue, Mar 10, 2015 at 8:53 AM Martin Illecker <[email protected]>
> wrote:
>
>> I would be interested in a solution for high-latency bolts as well.
>>
>> Maybe a custom scheduler that prioritizes high-latency bolts might help?
>> (e.g., one that allows a worker to exclusively run high-latency bolts)
>>
>> Does anyone have a working solution for a high-throughput topology (x0000
>> tuples/sec) that includes an HTTPClient bolt (latency around 100 ms)?
>>
>>
>> 2015-03-08 20:35 GMT+01:00 Frank Jania <[email protected]>:
>>
>>> I've been running Storm successfully now for a while with a fairly
>>> simple topology of this form:
>>>
>>> spout with a stream of tweets --> bolt to check tweet user against cache
>>> --> bolts to do some persistence based on tweet content.
>>>
>>> So far that's been humming along quite well, with execute latencies in
>>> the low single-digit or sub-millisecond range. Other than setting the
>>> parallelism for various bolts, I've been able to run it with the default
>>> topology config pretty well.
>>>
>>> Now I'm trying a topology of the form:
>>>
>>> spout with a stream of tweets --> bolt to extract the urls in the tweet
>>> --> bolt to fetch the url and get the page's title.
>>>
>>> For this topology the "fetch" portion can have much longer latency;
>>> I'm seeing execute latencies in the 300-500ms range to accommodate
>>> fetching these arbitrary urls. I've implemented caching to avoid
>>> fetching urls I already have titles for, and I'm using socket/connection
>>> timeouts to keep fetches from hanging too long, but even so, this is
>>> going to be a bottleneck.
>>>
>>> I've set the parallelism for the fetch bolt fairly high already, but are
>>> there any best practices for configuring a topology like this where at
>>> least one bolt is going to take much more time to process than the rest?
>>>
>>
>>
