Have you set 'topology.max.spout.pending'?

On Sun, Nov 1, 2015 at 2:26 PM, Renjie Liu <[email protected]> wrote:
> Hi, Storm community:
>
> We have a Storm cluster deployed with 15 workers, and recently we have often
> experienced failures caused by ack timeouts. Our input source is Kafka, and we
> use Ganglia to monitor the cluster. The failures now occur every 12 hours.
> These are my observations from the monitoring tools when the problem happens:
>
> 1. The topology page shows that no worker went down: the uptime of each task
>    is nearly equal to the topology uptime.
> 2. In Ganglia, the CPU and memory reports give no clue about the problem, but
>    the network report shows something unusual: on some workers, inbound
>    traffic drops slightly while outbound traffic drops to nearly zero.
> 3. I logged in to one of those machines and found that one of the survivor
>    spaces always stays 100% full.
> 4. dstat shows context switches (csw) jumping to 4k+ every few seconds, versus
>    around 400 under normal conditions.
>
> Can anyone give us a hint about this problem?
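As a hedged illustration of what 'topology.max.spout.pending' does: it caps how many tuples a spout task may have emitted but not yet seen acked or failed, and it only applies to tuples emitted with a message ID. Without a cap, a Kafka spout can keep emitting faster than the topology acks, so in-flight tuples can pile up in worker queues and may exceed the message timeout before they are acked. The Java sketch below builds the relevant config entries as a plain map (Storm's Config class extends HashMap<String, Object>, and in a real topology conf.setMaxSpoutPending(...) sets the same key), then uses a toy loop, not Storm code, to contrast the pending count with and without a cap. The class and helper names (MaxSpoutPendingSketch, topologyConf, pendingAfter) and all numbers are hypothetical placeholders, not tuning advice.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxSpoutPendingSketch {

    // Builds topology config entries as a plain map. The string keys match
    // Config.TOPOLOGY_MAX_SPOUT_PENDING and Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS.
    // The values passed in are placeholders, not recommended settings.
    static Map<String, Object> topologyConf(int maxSpoutPending, int messageTimeoutSecs) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.max.spout.pending", maxSpoutPending);
        conf.put("topology.message.timeout.secs", messageTimeoutSecs);
        return conf;
    }

    // Toy model (hypothetical, not Storm's implementation): each tick the spout
    // wants to emit 'emitPerTick' tuples, but with a cap it only emits while
    // pending < cap; downstream then acks at most 'ackPerTick' tuples.
    // Returns the number of un-acked (pending) tuples after 'ticks' ticks.
    static int pendingAfter(int ticks, int emitPerTick, int ackPerTick, Integer cap) {
        int pending = 0;
        for (int t = 0; t < ticks; t++) {
            int room = (cap == null) ? emitPerTick : Math.min(emitPerTick, cap - pending);
            pending += Math.max(room, 0);              // emit as much as the cap allows
            pending -= Math.min(ackPerTick, pending);  // acks remove tuples from pending
        }
        return pending;
    }

    public static void main(String[] args) {
        // 30 mirrors Storm's default message timeout; 500 is an arbitrary cap.
        Map<String, Object> conf = topologyConf(500, 30);
        System.out.println(conf.get("topology.max.spout.pending")); // prints 500

        // Emitting 50/tick against 40 acks/tick: without a cap the backlog grows
        // by 10 every tick; with a cap it levels off below the cap.
        System.out.println(pendingAfter(1000, 50, 40, null)); // prints 10000
        System.out.println(pendingAfter(1000, 50, 40, 500));  // prints 460
    }
}
```

The point of the toy model is only the shape of the curves: when acks cannot keep up, an uncapped spout's backlog grows without bound, while a cap turns the spout into a throttle that waits for acks before emitting more.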
