Storm supports custom schedulers: http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
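The linked write-up describes implementing `backtype.storm.scheduler.IScheduler` and registering it on Nimbus via the `storm.scheduler` key in storm.yaml. A minimal skeleton might look like the following sketch; it assumes storm-core (0.9.x-era `backtype.storm` package names) on the classpath, the class name is illustrative, and the actual pinning logic is left as a comment because it depends on your topology:

```java
import java.util.Map;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.EvenScheduler;
import backtype.storm.scheduler.IScheduler;
import backtype.storm.scheduler.Topologies;
import backtype.storm.scheduler.TopologyDetails;

// Illustrative custom scheduler: reserve some worker slots for a
// high-latency bolt, then let Storm's default scheduler place the rest.
public class HighLatencyBoltScheduler implements IScheduler {

    @Override
    public void prepare(Map conf) {
        // No setup needed for this sketch.
    }

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
        for (TopologyDetails topology : cluster.needsSchedulingTopologies(topologies)) {
            // Here you would look up the executors of the high-latency bolt
            // (e.g., via topology.getExecutors()) and pin them onto a
            // dedicated set of slots with
            //   cluster.assign(slot, topology.getId(), executors);
            // so that one worker runs only that bolt.
        }
        // Everything not explicitly assigned falls back to even scheduling.
        new EvenScheduler().schedule(topologies, cluster);
    }
}
```

To activate it, the scheduler jar goes on Nimbus's classpath and storm.yaml gets `storm.scheduler: "HighLatencyBoltScheduler"` (class name hypothetical).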
On Tue, Mar 10, 2015 at 12:37 PM, Martin Illecker <[email protected]> wrote:

> Curtis, I have made exactly the same observations. I have decreased the
> max spout pending to eliminate tuple timeouts.
> But this actually means throttling the whole topology because of one bolt
> with a high latency! (e.g., 5 bolts with 0.1 ms latency and 1 bolt with 1 ms)
>
> At some point, increasing the parallelism of the high-latency bolt will
> impact the overall performance of a worker. There has to be a better way.
>
> A possible solution might be to assign a bolt to a specific worker.
> Currently, if I assume correctly, each bolt is evenly distributed among
> multiple workers (e.g., a bolt with parallelism 10 can be executed by
> 5 threads on 2 workers or 2 threads on 5 workers).
>
> If a bolt could be assigned to a specific worker type, then it would be
> possible to add more workers/nodes which exclusively execute multiple
> threads of a high-latency bolt.
> For example, we could have one worker which executes a high-latency bolt
> and another worker which executes the rest of the topology.
> So the default behavior would be to distribute the bolts evenly, but it
> should be possible to define different worker types and assign a bolt to
> these worker types.
>
> Does this make any sense?
> And could this be an additional feature of Storm?
>
> 2015-03-10 16:59 GMT+01:00 Curtis Allen <[email protected]>:
>
>> Idan, use the Config class:
>> https://github.com/apache/storm/blob/master/storm-core/src/jvm/backtype/storm/Config.java#L1295
>>
>> On Tue, Mar 10, 2015 at 9:49 AM Idan Fridman <[email protected]> wrote:
>>
>>> Curtis, how do you set topology.message.timeout.secs?
>>>
>>> 2015-03-10 17:07 GMT+02:00 Curtis Allen <[email protected]>:
>>>
>>>> Tuning a topology that contains bolts with an unpredictable execute
>>>> latency is extremely difficult. I've had to slow down the entire
>>>> topology by increasing topology.max.spout.pending and
>>>> topology.message.timeout.secs; otherwise tuples queue up and time out.
>>>
>>> On Tue, Mar 10, 2015 at 8:53 AM Martin Illecker <[email protected]> wrote:
>>>
>>>> I would be interested in a solution for high-latency bolts as well.
>>>>
>>>> Maybe a custom scheduler which prioritizes high-latency bolts might
>>>> help? (e.g., allowing a worker to exclusively run high-latency bolts)
>>>>
>>>> Does anyone have a working solution for a high-throughput topology
>>>> (x0000 tuples/sec) including an HTTPClient bolt (latency around 100 ms)?
>>>>
>>>> 2015-03-08 20:35 GMT+01:00 Frank Jania <[email protected]>:
>>>>
>>>>> I've been running Storm successfully now for a while with a fairly
>>>>> simple topology of this form:
>>>>>
>>>>> spout with a stream of tweets --> bolt to check tweet user against
>>>>> cache --> bolts to do some persistence based on tweet content.
>>>>>
>>>>> So far that's been humming along quite well, with execute latencies
>>>>> in the low-single-digit or sub-millisecond range. Other than setting
>>>>> the parallelism for various bolts, I've been able to run it with the
>>>>> default topology config pretty well.
>>>>>
>>>>> Now I'm trying a topology of the form:
>>>>>
>>>>> spout with a stream of tweets --> bolt to extract the urls in the
>>>>> tweet --> bolt to fetch the url and get the page's title.
>>>>>
>>>>> For this topology the "fetch" portion can have a much longer latency;
>>>>> I'm seeing execute latencies in the 300-500 ms range to accommodate
>>>>> the fetch of any of these arbitrary urls. I've implemented caching to
>>>>> avoid fetching urls I already have titles for, and I use
>>>>> socket/connection timeouts to keep fetches from hanging for too long,
>>>>> but even still, this is going to be a bottleneck.
>>>>>
>>>>> I've set the parallelism for the fetch bolt fairly high already, but
>>>>> are there any best practices for configuring a topology like this
>>>>> where at least one bolt is going to take much more time to process
>>>>> than the rest?
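One rough way to reason about sizing the fetch bolt: by Little's law, a bolt that blocks for L seconds per tuple at T tuples/sec needs about T x L executions in flight. The sketch below is self-contained (no Storm dependency); the throughput number and the async-multiplexing factor are illustrative, and the commented lines show where the real Storm knobs (`Config.setMessageTimeoutSecs`, `Config.setMaxSpoutPending`, the parallelism hint on `TopologyBuilder.setBolt`) would apply, with a hypothetical `FetchUrlBolt`:

```java
// Back-of-envelope sizing for a blocking, high-latency bolt (e.g., an
// HTTP fetch bolt) using Little's law:
//   in-flight tuples ≈ throughput (tuples/sec) × latency (sec)
public class FetchBoltSizing {

    // Minimum executor count needed to sustain `tuplesPerSec` when each
    // execute() call blocks for roughly `latencyMs` milliseconds.
    static int minExecutors(int tuplesPerSec, int latencyMs) {
        return (int) Math.ceil(tuplesPerSec * latencyMs / 1000.0);
    }

    public static void main(String[] args) {
        // Thread's numbers: ~10,000 tuples/sec, up to ~500 ms per fetch.
        int blocking = minExecutors(10_000, 500);
        System.out.println(blocking); // 5000 single-threaded executors

        // If each executor instead multiplexes, say, 50 concurrent
        // non-blocking HTTP requests (factor is illustrative):
        System.out.println((int) Math.ceil(blocking / 50.0)); // 100

        // The corresponding topology settings (backtype.storm.Config /
        // TopologyBuilder, with FetchUrlBolt as a placeholder):
        //   conf.setMessageTimeoutSecs(60);    // > worst-case queueing + fetch
        //   conf.setMaxSpoutPending(10_000);   // bound in-flight tuples per spout task
        //   builder.setBolt("fetch", new FetchUrlBolt(), 100);
    }
}
```

The takeaway matches the thread: raw thread-per-tuple parallelism for a 500 ms bolt gets impractically large, which is why capping `topology.max.spout.pending`, raising `topology.message.timeout.secs`, or isolating the bolt via a custom scheduler comes up as the alternatives.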
