I would be interested in a solution for high-latency bolts as well.

Maybe a custom scheduler that prioritizes high-latency bolts would help?
(e.g., one allowing a worker to run high-latency bolts exclusively)
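
Short of a custom scheduler, the stock knobs I'd try first are higher
parallelism on the slow bolt plus a cap on in-flight tuples via
topology.max.spout.pending, so the slow bolt's queue can't grow without bound.
An untested sketch against the pre-1.0 (backtype.storm) API; the spout/bolt
classes and all the numbers here are placeholders to tune, not recommendations:

```java
import backtype.storm.Config;
import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("tweets", new TweetSpout(), 2);          // placeholder spout
builder.setBolt("fetch", new HttpFetchBolt(), 64)         // high parallelism for the ~100ms bolt
       .shuffleGrouping("tweets");

Config conf = new Config();
// Cap un-acked tuples per spout task so the high-latency bolt can't be
// flooded; the spout throttles once this many tuples are pending.
conf.setMaxSpoutPending(500);
conf.setNumWorkers(4);
```

With acking on, max.spout.pending effectively becomes the flow control for the
whole topology, which is usually what you want when one bolt is 100x slower
than the rest.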

Does anyone have a working solution for a high-throughput topology (x0000
tuples/sec) that includes an HTTPClient bolt (latency around 100 ms)?
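
Back-of-the-envelope: at ~100 ms per blocking call, one thread tops out around
10 tuples/sec, so x0000 tuples/sec needs on the order of thousands of
concurrent fetches (or an async HTTP client). A plain-Java sketch of that
thread-pool math, with no Storm dependency and the "fetch" simulated by a
sleep (class name and numbers are mine, purely for illustration):

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LatencyMath {
    // Simulated high-latency fetch: ~50 ms of blocking "I/O" per call.
    static String fetch(String url) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "title-for-" + url;
    }

    public static void main(String[] args) throws Exception {
        int tasks = 200;    // stand-in for tuples
        int threads = 100;  // analogous to fetch-bolt parallelism
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CompletionService<String> done = new ExecutorCompletionService<>(pool);

        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            final String url = "http://example.com/" + i;
            done.submit(() -> fetch(url));
        }
        int ok = 0;
        for (int i = 0; i < tasks; i++) {
            if (done.take().get().startsWith("title-for-")) ok++;
        }
        pool.shutdown();
        long ms = (System.nanoTime() - start) / 1_000_000;
        // 200 tasks * 50 ms spread over 100 threads finishes in roughly
        // 2 "waves" of latency, instead of 10 s serially.
        System.out.println(ok + " fetched in ~" + ms + " ms");
    }
}
```

The same arithmetic applies whether the concurrency comes from bolt
parallelism, a thread pool inside the bolt, or an async client; the point is
that throughput = concurrency / latency, so the slow bolt's concurrency has to
be sized against the target tuple rate.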


2015-03-08 20:35 GMT+01:00 Frank Jania <[email protected]>:

> I've been running Storm successfully now for a while with a fairly simple
> topology of this form:
>
> spout with a stream of tweets --> bolt to check tweet user against cache
> --> bolts to do some persistence based on tweet content.
>
> So far that's been humming along quite well, with execute latencies in the
> low single-digit or sub-millisecond range. Other than setting the
> parallelism for various bolts, I've been able to run it with the default
> topology config pretty well.
>
> Now I'm trying a topology of the form:
>
> spout with a stream of tweets --> bolt to extract the urls in the tweet
> --> bolt to fetch the url and get the page's title.
>
> For this topology the "fetch" portion can have a much longer latency; I'm
> seeing execute latencies in the 300-500ms range to accommodate the fetch of
> any of these arbitrary urls. I've implemented caching to avoid fetching
> urls I already have titles for, and I'm using socket/connection timeouts to
> keep fetches from hanging for too long, but even so, this is going to be a
> bottleneck.
>
> I've set the parallelism for the fetch bolt fairly high already, but are
> there any best practices for configuring a topology like this where at
> least one bolt is going to take much more time to process than the rest?
>