Figured it out myself: I was throwing FailedException() after calling _collector.fail(input). Apparently the HolmesNL spout doesn't handle FailedException, so the whole worker process halted. As soon as I removed that exception, retry works well.
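For anyone hitting the same thing, here is a minimal sketch of the pattern described above: fail the tuple, then return normally instead of throwing. It is not the actual topology code. TupleCollector, RecordingCollector and RetryFriendlyBolt are hypothetical stand-ins (tuples are plain strings) so the example compiles without Storm on the classpath; in a real bolt the collector would be Storm's OutputCollector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ack/fail part of Storm's OutputCollector.
interface TupleCollector {
    void ack(String tuple);
    void fail(String tuple);
}

// Test double that records which tuples were acked or failed.
class RecordingCollector implements TupleCollector {
    final List<String> acked = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    @Override
    public void ack(String tuple) { acked.add(tuple); }

    @Override
    public void fail(String tuple) { failed.add(tuple); }
}

// Hypothetical bolt-like class: reports failures through the collector only.
class RetryFriendlyBolt {
    private final TupleCollector collector;

    RetryFriendlyBolt(TupleCollector collector) {
        this.collector = collector;
    }

    void execute(String input) {
        try {
            process(input);
            collector.ack(input);
        } catch (RuntimeException e) {
            // Mark the tuple as failed so the spout can replay it, then
            // return normally. Nothing is rethrown here, so the exception
            // never escapes execute() and the caller keeps running.
            collector.fail(input);
        }
    }

    // Placeholder processing step: an empty tuple counts as a failure.
    private void process(String input) {
        if (input.isEmpty()) {
            throw new IllegalArgumentException("empty tuple");
        }
    }
}
```

The point of the sketch is only the control flow: after fail(input) the method returns, so later tuples are still processed and the replay decision is left to the spout.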
> On Nov 26, 2014, at 3:00 PM, Hefeng Yuan <[email protected]> wrote:
>
> Thanks for the reply. That explains the auto-restart part, but it's still
> not clear why it retries 4 times and then stops.
>
> I did start with the official Kafka spout, but it doesn't work for me at all:
> it loses messages and constantly restarts the worker with timeouts.
>
> Is anyone else using the HolmesNL spout? Wondering how you deal with
> failed tuple retry.
>
> On Nov 26, 2014, at 13:59, Harsha <[email protected]> wrote:
>
>> If your bolt hung, it will cause workers not to send heartbeats, and
>> supervisor.worker.timeout.secs will trigger, causing workers to be killed
>> and restarted. Did you try using
>> https://github.com/apache/storm/tree/master/external/storm-kafka
>> -Harsha
>>
>> On Wed, Nov 26, 2014, at 01:40 PM, Hefeng Yuan wrote:
>>> Hello,
>>>
>>> I'm trying to use HolmesNL/kafka-spout. It works pretty well for the happy
>>> path. However, when a tuple fails (e.g. _collector.fail(input) gets called
>>> in the bolt), it seems to retry only 3 or 4 times and then hang until
>>> supervisor.worker.timeout.secs is reached and the topology gets restarted.
>>> Just wondering where this number of retries is controlled, and also, since
>>> the tuple has already failed, why would it still trigger
>>> supervisor.worker.timeout.secs?
>>>
>>> Thanks,
>>> Hefeng
