On Dec 7, 2016 at 17:46, "map reduced" <k3t.gi...@gmail.com> wrote:
> Hi,
>
> I am trying to solve this problem: in my streaming flow, every day a few
> jobs fail for a few batches due to some (say, Kafka cluster maintenance,
> mostly unavoidable) reasons and then resume successfully.
> I want to reprocess those failed jobs programmatically (assume I have a
> way of getting the start/end offsets of the Kafka topics for the failed
> jobs). I was thinking of these options:
> 1) Somehow pause the streaming job when it detects failing jobs - this
> does not seem to be possible.
> 2) From the driver, run additional processing that checks every few
> minutes, via the driver REST API (/api/v1/applications...), which jobs
> have failed, and submit batch jobs for those failed jobs.
>
> 1 - doesn't seem to be possible, and I don't want to kill the streaming
> context just to stop the job for a few failing batches and resume it
> after a few minutes.
> 2 - seems like a viable option, but it is a little complicated, since
> even the batch job can fail for whatever reason, and then I am back to
> tracking that separately, etc.
>
> Has anyone faced this issue, or does anyone have any suggestions?
>
> Thanks,
> KP
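For anyone weighing option 2: a minimal polling sketch against the driver's monitoring REST API (`/api/v1/applications/<app-id>/jobs`, which reports a `status` of SUCCEEDED/FAILED/RUNNING per job) might look like the following. This is only a sketch under assumptions: the UI base URL and port are placeholders, the resubmission step is left as a hook, and nothing here handles the "batch job itself fails" case KP raises.

```python
import json
from urllib.request import urlopen

# Assumption: the driver UI is reachable on the default port.
SPARK_UI = "http://localhost:4040"


def failed_job_ids(jobs):
    """Given the JSON list returned by /api/v1/applications/<app-id>/jobs,
    return the IDs of jobs whose status is FAILED."""
    return [j["jobId"] for j in jobs if j.get("status") == "FAILED"]


def poll_failed_jobs(app_id):
    """One polling pass; schedule this every few minutes from the driver.

    Returns the failed job IDs, for which the caller would look up the
    corresponding Kafka start/end offsets and submit a batch reprocessing
    job (e.g. via KafkaUtils.createRDD with explicit OffsetRanges).
    """
    url = "{}/api/v1/applications/{}/jobs".format(SPARK_UI, app_id)
    with urlopen(url) as resp:
        jobs = json.load(resp)
    return failed_job_ids(jobs)
```

The pure filtering step is separated from the HTTP call so it can be tested without a live cluster; the remaining hard part, as noted above, is tracking whether the resubmitted batch jobs themselves succeed.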