Have you set 'topology.max.spout.pending'?
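Some background on why this setting is worth checking. `topology.max.spout.pending` caps how many tuples a single spout task may have emitted but not yet acked or failed. It is unset (no cap) by default, and it only affects tuples emitted with a message ID. Without a cap, a Kafka spout can emit faster than the bolts ack. Tuples then queue up until they exceed `topology.message.timeout.secs` (30 seconds by default) and fail. In the Java API you can set it with `Config.setMaxSpoutPending(int)`.

The toy Python simulation below is only a sketch of that mechanism, not Storm code. The emit and process rates, the 30-tick timeout, and the cap of 500 are hypothetical values chosen for illustration.

```python
from collections import deque
from typing import Optional


def simulate(max_pending: Optional[int], ticks: int = 120,
             emit_rate: int = 100, process_rate: int = 60,
             timeout: int = 30) -> dict:
    """Toy model (not Storm code) of one spout task feeding a slower bolt.

    Hypothetical parameters: each tick the spout emits up to emit_rate
    tuples, but never lets in-flight (un-acked) tuples exceed max_pending
    (None = no cap, like leaving topology.max.spout.pending unset). The bolt
    acks up to process_rate tuples per tick, oldest first. A tuple still
    pending `timeout` ticks after emission is counted as failed, like an
    ack timeout.
    """
    in_flight = deque()  # emission tick of each pending tuple, oldest first
    acked = failed = 0
    for now in range(ticks):
        # Spout: emit, respecting the pending cap if one is set.
        budget = emit_rate
        if max_pending is not None:
            budget = min(budget, max_pending - len(in_flight))
        in_flight.extend([now] * max(budget, 0))
        # Ack timeout: fail tuples that have waited too long.
        while in_flight and now - in_flight[0] >= timeout:
            in_flight.popleft()
            failed += 1
        # Bolt: process (ack) the oldest pending tuples.
        for _ in range(min(process_rate, len(in_flight))):
            in_flight.popleft()
            acked += 1
    return {"acked": acked, "failed": failed, "pending": len(in_flight)}


if __name__ == "__main__":
    print("capped:  ", simulate(max_pending=500))
    print("uncapped:", simulate(max_pending=None))
```

In both runs the bolt acks the same number of tuples. With the cap, the backlog in front of the bolt stays bounded (500 / 60 is about 8 ticks of waiting), so nothing times out. Without the cap, the backlog grows by 40 tuples per tick until the oldest tuples start failing. A cap only helps if it is small enough that cap / processing rate stays well below the message timeout. In a real topology, failed tuples are typically replayed by the Kafka spout, which adds still more load.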

On Sun, Nov 1, 2015 at 2:26 PM, Renjie Liu <[email protected]> wrote:

> Hi Storm community,
>
> We have a Storm cluster deployed with 15 workers, and recently we have
> frequently experienced failures caused by ack timeouts. Our input source is
> Kafka, and we use Ganglia to monitor the cluster. The failures now occur
> every 12 hours. Below are my observations from several monitoring tools
> when the problem happens:
>
>    1. The topology page shows that no worker went down, since the uptime of
>    each task is nearly equal to the topology uptime.
>    2. In Ganglia, the CPU and memory reports give no clue about the problem,
>    but the network report shows something unusual: on some workers, inbound
>    throughput drops slightly while outbound throughput drops to nearly zero.
>    3. I logged in to one of those machines and found that one of the JVM
>    survivor spaces always stays 100% full.
>    4. dstat shows that context switches (csw) jump to 4k+ every few seconds,
>    whereas they stay around 400 under normal conditions.
>
> Can anyone give us a hint about this problem?
>