Simon - Take a look at BasicBoltExecutor#execute, which is an adapter from IBasicBolt to IRichBolt. There, every collector.fail() is accompanied by a collector.reportError() if you rethrow the exception as ReportedFailedException.
Could you please check that this is the case in your bolts too? In IRichBolt you would need to take care of that yourself.

________________________________
From: Simon Cooper <[email protected]>
Sent: Tuesday, October 14, 2014 12:48 PM
To: [email protected]
Subject: RE: Finding out why a tuple failed

We're seeing random failures. No exceptions in the logs, just failed tuples at the spout with no other information. We think it's timeouts, but there's no information anywhere as to which bolts in the tuple tree didn't ack or fail the event in time.

From: Itai Frenkel [mailto:[email protected]]
Sent: 14 October 2014 08:32
To: [email protected]
Subject: Re: Finding out why a tuple failed

Let's say you have 10000 tuples processed, and only one of them reported an error, and that is the same tuple that failed. Then you look in Sigmund, see the error, and you know for sure it relates to the failed tuple. Now consider that out of 10000, half of them failed for different reasons. Looking in Sigmund will still give you errors, but you would not be able to pin any of them to a specific tuple id.

________________________________
From: Vladi Feigin <[email protected]>
Sent: Monday, October 13, 2014 8:50 PM
To: [email protected]
Subject: Re: Finding out why a tuple failed

@Itai
What do you mean by "other errors"? Are these internal Storm errors, which are not reported in nimbus? If yes, are they reported in the logs?

Vladi

On Mon, Oct 13, 2014 at 5:05 PM, Itai Frenkel <[email protected]> wrote:

Assuming each failure in the code is accompanied by collector.reportError(ex) (as in BasicBolt), you would see an exception in nimbus. If there are many other errors, then it may not be the exception you are looking for. To get more fidelity you would need to send all errors to an ELK stack (that's what we do) and filter by id.
Itai

________________________________
From: Simon Cooper <[email protected]>
Sent: Monday, October 13, 2014 2:58 PM
To: [email protected]
Subject: Finding out why a tuple failed

Is there any possible way, either through logging or programmatically, to find out why a tuple failed? If it timed out, which bolts it was waiting for acks from in the tuple tree, and if it was explicitly failed, which bolt failed it?

I'm having a hell of a time trying to debug a complex topology that is not acking any of its tuples back at the spout :(

Thanks,
SimonC
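
For the IRichBolt case raised at the top of the thread, here is a minimal sketch of pairing every fail with a reportError, with the tuple id put into the error message so the errors can later be filtered by id. It is not Storm code: `Collector`, `RecordingCollector`, `ReportingBolt` and `process` are hypothetical stand-ins so that the example runs without a cluster, and tuples are reduced to a string id. In a real bolt the corresponding calls would be `OutputCollector.ack(Tuple)`, `OutputCollector.fail(Tuple)` and `OutputCollector.reportError(Throwable)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the three OutputCollector calls used here;
// this is NOT the real Storm type.
interface Collector {
    void ack(String tupleId);
    void fail(String tupleId);
    void reportError(Throwable error);
}

// Records every call so the behaviour can be inspected without a cluster.
class RecordingCollector implements Collector {
    final List<String> events = new ArrayList<>();

    public void ack(String tupleId) { events.add("ack:" + tupleId); }
    public void fail(String tupleId) { events.add("fail:" + tupleId); }
    public void reportError(Throwable error) { events.add("error:" + error.getMessage()); }
}

// Sketch of what a rich bolt's execute() could look like (assumed shape).
class ReportingBolt {
    private final Collector collector;

    ReportingBolt(Collector collector) { this.collector = collector; }

    void execute(String tupleId, String payload) {
        try {
            process(payload);
            collector.ack(tupleId);
        } catch (RuntimeException e) {
            // Report first, with the tuple id in the message, then fail the tuple.
            collector.reportError(
                new RuntimeException("tuple " + tupleId + " failed: " + e.getMessage(), e));
            collector.fail(tupleId);
        }
    }

    // Hypothetical business logic: rejects empty payloads.
    private void process(String payload) {
        if (payload.isEmpty()) {
            throw new IllegalArgumentException("empty payload");
        }
    }
}

class ReportingBoltDemo {
    public static void main(String[] args) {
        RecordingCollector collector = new RecordingCollector();
        ReportingBolt bolt = new ReportingBolt(collector);
        bolt.execute("t1", "hello"); // processed and acked
        bolt.execute("t2", "");      // reported, then failed
        System.out.println(collector.events);
        // prints [ack:t1, error:tuple t2 failed: empty payload, fail:t2]
    }
}
```

This only covers tuples that are explicitly failed. A tuple that times out because some bolt never acked or failed it would not pass through the catch block, so it would still show up at the spout without an accompanying error.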
