Thanks Nathan, that's exactly what I meant. :-)

2015-03-10 17:45 GMT+01:00 Nathan Leung <[email protected]>:

> Storm supports custom schedulers:
> http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
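A pluggable scheduler like the one linked above implements Storm's `IScheduler` (Java) and decides which worker slots get which executors. The core decision it would make for this thread's use case can be sketched as a plain assignment function; the function name and data shapes below are illustrative, not Storm's actual API.

```python
# Sketch of the core decision a pluggable scheduler makes: pin all executors
# of a designated high-latency bolt onto dedicated worker slots, and spread
# the remaining executors over the other slots. Names and data shapes are
# illustrative; in Storm this logic would live in IScheduler.schedule().

def assign_executors(executors, slots, pinned_bolt, pinned_slots):
    """executors: list of (component_id, executor_id) pairs.
    slots: list of worker-slot ids; pinned_slots must be a subset of slots.
    Returns a dict mapping each slot to its list of executors."""
    assignment = {slot: [] for slot in slots}
    other_slots = [s for s in slots if s not in pinned_slots]

    pinned = [e for e in executors if e[0] == pinned_bolt]
    rest = [e for e in executors if e[0] != pinned_bolt]

    # Round-robin the high-latency bolt's executors over its dedicated slots.
    for i, ex in enumerate(pinned):
        assignment[pinned_slots[i % len(pinned_slots)]].append(ex)
    # Round-robin everything else over the remaining slots.
    for i, ex in enumerate(rest):
        assignment[other_slots[i % len(other_slots)]].append(ex)
    return assignment
```

In a real scheduler, the equivalent assignment would be applied via `Cluster.assign(slot, topologyId, executors)` inside `schedule(Topologies, Cluster)`.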
>
> On Tue, Mar 10, 2015 at 12:37 PM, Martin Illecker <[email protected]>
> wrote:
>
>> Curtis, I have made exactly the same observations. I have decreased the
>> max spout pending to eliminate tuple timeouts.
>> But this actually means throttling the whole topology because of one
>> high-latency bolt! (e.g., five bolts with 0.1 ms latency and one bolt
>> with 1 ms)
>>
>> At some point, increasing the parallelism of the high latency bolt will
>> impact the overall performance of a worker. There has to be a better way.
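The throttling effect described here can be made concrete with Little's law: each bolt task with latency L ms can process at most 1000/L tuples per second, so end-to-end throughput is capped by the slowest stage no matter how fast the others are. A rough model (ignoring queueing and transfer overhead; numbers taken from the example above):

```python
# Back-of-the-envelope capacity model (Little's law). Each bolt task with
# service time latency_ms can process at most 1000 / latency_ms tuples/sec,
# so the topology is capped by its slowest stage.

def stage_capacity(latency_ms, tasks):
    """Max tuples/sec a bolt can process with the given task count."""
    return tasks * 1000.0 / latency_ms

def topology_capacity(stages):
    """Throughput cap for a chain of stages, each (latency_ms, tasks)."""
    return min(stage_capacity(latency, tasks) for latency, tasks in stages)

# The example above: five 0.1 ms bolts and one 1 ms bolt, one task each.
fast_stages = [(0.1, 1)] * 5
slow_stage = [(1.0, 1)]
```

Under this model the 1 ms bolt caps the topology at 1000 tuples/sec even though every other bolt could do 10000, which is why the whole pipeline has to be throttled for its sake.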
>>
>> A possible solution might be to assign a bolt to a specific worker.
>> Currently, if I understand correctly, each bolt is evenly distributed among
>> multiple workers.
>> (e.g., a bolt with parallelism 10 can be executed by 5 threads on 2
>> workers or 2 threads on 5 workers)
>>
>> If a bolt could be assigned to a specific worker type, then it would be
>> possible to add more workers / nodes that exclusively execute multiple
>> threads of a high-latency bolt.
>> For example, we could have one worker that executes a high-latency bolt
>> and another worker that executes the rest of the topology.
>> So the default behavior would be to distribute the bolts evenly, but it
>> should be possible to define different worker types and assign a bolt to
>> one of these worker types.
>>
>> Does this make any sense?
>> And could this be an additional feature of Storm?
>>
>> 2015-03-10 16:59 GMT+01:00 Curtis Allen <[email protected]>:
>>
>>> Idan, Use the Config class
>>> https://github.com/apache/storm/blob/master/storm-core/src/jvm/backtype/storm/Config.java#L1295
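In that Config class the relevant constants are `TOPOLOGY_MESSAGE_TIMEOUT_SECS` and `TOPOLOGY_MAX_SPOUT_PENDING`, so the actual config keys carry a `topology.` prefix rather than `storm.`. A minimal sketch of the settings passed at submit time (values are illustrative, not recommendations):

```python
# Topology-level settings passed when submitting the topology. The real key
# names come from Storm's Config class (TOPOLOGY_MESSAGE_TIMEOUT_SECS,
# TOPOLOGY_MAX_SPOUT_PENDING); the values here are placeholders.
conf = {
    "topology.message.timeout.secs": 60,  # seconds before a tuple is failed
    "topology.max.spout.pending": 500,    # max un-acked tuples per spout task
}
```

In Java the same settings are available as helper methods, e.g. `conf.setMessageTimeoutSecs(60)` and `conf.setMaxSpoutPending(500)` on a `Config` instance.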
>>>
>>> On Tue, Mar 10, 2015 at 9:49 AM Idan Fridman <[email protected]>
>>> wrote:
>>>
>>>> curtis, how do you set the storm.message.timeout.secs?
>>>>
>>>> 2015-03-10 17:07 GMT+02:00 Curtis Allen <[email protected]>:
>>>>
>>>>> Tuning a topology that contains bolts with unpredictable execute
>>>>> latency is extremely difficult. I've had to slow down the entire
>>>>> topology by lowering storm.max.spout.pending and increasing
>>>>> storm.message.timeout.secs; otherwise tuples queue up and time out.
>>>>>
>>>>
>>>> On Tue, Mar 10, 2015 at 8:53 AM Martin Illecker <[email protected]>
>>>> wrote:
>>>>
>>>>> I would be interested in a solution for high latency bolts as well.
>>>>>
>>>>> Maybe a custom scheduler that prioritizes high-latency bolts might
>>>>> help?
>>>>> (e.g., allowing a worker to exclusively run high-latency bolts)
>>>>>
>>>>> Does anyone have a working solution for a high-throughput topology
>>>>> (x0000 tuples / sec) including an HTTPClient bolt (latency around 100ms)?
>>>>>
>>>>>
>>>>> 2015-03-08 20:35 GMT+01:00 Frank Jania <[email protected]>:
>>>>>
>>>>>> I've been running storm successfully now for a while with a fairly
>>>>>> simple topology of this form:
>>>>>>
>>>>>> spout with a stream of tweets --> bolt to check tweet user against
>>>>>> cache --> bolts to do some persistence based on tweet content.
>>>>>>
>>>>>> So far that's been humming along quite well, with execute latencies in
>>>>>> the low single-digit or sub-millisecond range. Other than setting the
>>>>>> parallelism for various bolts, I've been able to run it with the default
>>>>>> topology config pretty well.
>>>>>>
>>>>>> Now I'm trying a topology of the form:
>>>>>>
>>>>>> spout with a stream of tweets --> bolt to extract the urls in the
>>>>>> tweet --> bolt to fetch the url and get the page's title.
>>>>>>
>>>>>> For this topology the "fetch" portion can have a much longer latency;
>>>>>> I'm seeing execute latencies in the 300-500ms range to accommodate
>>>>>> fetching these arbitrary urls. I've implemented caching to avoid
>>>>>> fetching urls I already have titles for, and I'm using socket/connection
>>>>>> timeouts to keep fetches from hanging for too long, but even still, this
>>>>>> is going to be a bottleneck.
>>>>>>
>>>>>> I've set the parallelism for the fetch bolt fairly high already, but
>>>>>> are there any best practices for configuring a topology like this where 
>>>>>> at
>>>>>> least one bolt is going to take much more time to process than the rest?
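One common pattern for an I/O-bound bolt like this fetcher is to keep many requests in flight per executor with a thread pool, rather than relying solely on bolt parallelism. A standalone sketch of that idea follows; `fetch_title` is a stub standing in for the real HTTP GET, and the pool size is a placeholder, not a tuned value.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: overlap many slow fetches inside one bolt worker using a thread
# pool, instead of paying the fetch latency serially per tuple.

def fetch_title(url):
    """Placeholder for the real HTTP fetch + <title> extraction."""
    return "title of " + url

def process_batch(urls, max_in_flight=32):
    """Fetch titles concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(fetch_title, urls))
```

In a Storm bolt the pool would be created in `prepare()`, and `execute()` would submit each tuple's fetch and ack it from the completion callback, so a single executor sustains many in-flight fetches instead of blocking for 300-500ms per tuple.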
>>>>>>
>>>>>
>>>>>
>>
>
