Seems like a fair enhancement. Could you please file a JIRA stating these details?

Thanks,
Mona

On 2/11/13 6:03 PM, "Shangzhong zhu" <[email protected]> wrote:

>Thanks, Mona.
>
>I could rerun the workflow job, but the corresponding coordinator action
>status won't get changed, right? which means the coordinator action will
>still show failed, even if the associated workflow get rerun successfully.
>
>Can we enhance the coordinator rerun to be consistent with the workflow
>rerun?
>* Keep the same workflow ID.
>* Support rerun from beginning or rerun from the failed WF action.
>* Rerun count reflects the number of tries.
>
>Thanks,
>Shanzhong
>
>On Mon, Feb 11, 2013 at 1:11 PM, Mona Chitnis <[email protected]>
>wrote:
>
>> Hi folks,
>>
>> Looking into the Coordinator Rerun logic, it looks like rerunning a
>> coordinator action resets its external id (which maps to workflow job) and
>> external status. This means it will run a fresh workflow job which
>> explains why the client method getRerun() was returning '0'.
>>
>> For using the max-rerun limit, you can use OozieClient.rerun() method
>> itself and supply to it the workflow job-id obtained from coordinator
>> action's externalId.
>>
>> Thanks,
>>
>> Mona
>>
>> On 2/10/13 11:12 PM, "Shangzhong zhu" <[email protected]> wrote:
>>
>> >Thanks, Alejandro.
>> >
>> >For the WF rerun count you mentioned, is it
>> >org.apache.oozie.client.WorkflowJob.getReRun()?
>> >
>> >However, it seems always return 0 no matter how many rerun I made by using
>> >Coordinator Rerun.
>> >
>> >Basically, I am using coordinator rerun: OozieClient.coordReRun() to rerun
>> >failed coordinator actions/workflows. But I want to control the number of
>> >reruns, say maximum 3 reruns.
>> >
>> >Thanks,
>> >Shanzhong
>> >
>> >
>> >On Mon, Feb 4, 2013 at 4:17 PM, Alejandro Abdelnur
>> ><[email protected]>wrote:
>> >
>> >> On Sat, Feb 2, 2013 at 11:05 PM, Shangzhong zhu <[email protected]>
>> >> wrote:
>> >>
>> >> > Hi All,
>> >> >
>> >> > We are developing a wrapper on top of oozie to automate failed/killed
>> >> > coordinator action rerun.
>> >> >
>> >> > To rerun a coordinator action, seems I have two options.
>> >> >
>> >> > 1. Using coordinator action rerun:
>> >> > oozie job -rerun <coord_Job_id> <-date XXXX>
>> >> >
>> >> > 2. Since the failed action is a workflow job, I can also rerun that
>> >> > workflow job by setting oozie.wf.rerun.failnodes to rerun from the failed
>> >> > action.
>> >> >
>> >> > Questions:
>> >> > 0. which option is preferred?
>> >> >
>> >> > 1. For option 1, can I choose to rerun from the failed action like the
>> >> > oozie.wf.rerun.failnodes option in workflow rerun?
>> >> >
>> >> > If I recall correctly you cannot do this.
>> >>
>> >>
>> >> > 2. For option 1, seems I cannot change the job configurations. But for
>> >> > option 2, I have more flexibility in changing the configurations, say I can
>> >> > change the job name so that I know how many rerun has been made for that
>> >> > workflow.
>> >> >
>> >> > no need for this, there is a WF rerun count.
>> >>
>> >>
>> >> > 3. If I chose option 2, does it mean that the rerun workflow job is not
>> >> > part of the coordinator actions any more? In another word, if I killed that
>> >> > coordinator job, that rerun workflow job will be still running?
>> >>
>> >>
>> >> It should get killed as well as the WF job ID is still the same as.
>> >>
>> >> With Option #2 though I'm not sure what will happen with the status of the
>> >> corresponding COORD action.
>> >>
>> >>
>> >>
>> >> >
>> >> >
>> >> Thanks
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Alejandro
>> >>
>> >>
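
For anyone building a similar wrapper, the max-rerun approach suggested in the quoted thread (look up the workflow job ID from the coordinator action's externalId, then rerun that workflow job only while a limit has not been reached) could be sketched roughly as below. This is only an illustration under stated assumptions: `WorkflowGateway`, `InMemoryGateway`, and `RerunLimiter` are hypothetical names invented here, not part of Oozie, and the in-memory gateway stands in for whatever Oozie client calls a real wrapper would make. As noted in the thread, rerunning the workflow this way may leave the coordinator action status unchanged.

```java
import java.util.HashMap;
import java.util.Map;

public class RerunLimiter {

    /** Hypothetical abstraction over the Oozie calls discussed in the thread (not an Oozie API). */
    public interface WorkflowGateway {
        /** Workflow job ID behind a coordinator action (its externalId), or null if unknown. */
        String externalIdOf(String coordActionId);

        /** Number of reruns already made for the workflow job. */
        int rerunCountOf(String workflowJobId);

        /** Rerun the workflow job from its failed nodes (oozie.wf.rerun.failnodes=true). */
        void rerunFromFailedNodes(String workflowJobId);
    }

    private final WorkflowGateway gateway;
    private final int maxReruns;

    public RerunLimiter(WorkflowGateway gateway, int maxReruns) {
        this.gateway = gateway;
        this.maxReruns = maxReruns;
    }

    /** Submits a workflow rerun only while the limit is not reached; returns true if submitted. */
    public boolean rerunIfAllowed(String coordActionId) {
        String workflowJobId = gateway.externalIdOf(coordActionId);
        if (workflowJobId == null) {
            return false; // no workflow job is linked to this coordinator action
        }
        if (gateway.rerunCountOf(workflowJobId) >= maxReruns) {
            return false; // limit reached, give up
        }
        gateway.rerunFromFailedNodes(workflowJobId);
        return true;
    }

    /** In-memory stand-in so the sketch runs without an Oozie server. */
    public static class InMemoryGateway implements WorkflowGateway {
        private final Map<String, String> externalIds = new HashMap<>();
        private final Map<String, Integer> rerunCounts = new HashMap<>();

        public void link(String coordActionId, String workflowJobId) {
            externalIds.put(coordActionId, workflowJobId);
        }

        @Override
        public String externalIdOf(String coordActionId) {
            return externalIds.get(coordActionId);
        }

        @Override
        public int rerunCountOf(String workflowJobId) {
            return rerunCounts.getOrDefault(workflowJobId, 0);
        }

        @Override
        public void rerunFromFailedNodes(String workflowJobId) {
            // The same workflow job ID is kept, so the count accumulates across reruns.
            rerunCounts.merge(workflowJobId, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        InMemoryGateway gateway = new InMemoryGateway();
        gateway.link("coord-action-1", "wf-1"); // made-up IDs for the demo
        RerunLimiter limiter = new RerunLimiter(gateway, 3);
        for (int attempt = 1; attempt <= 5; attempt++) {
            boolean submitted = limiter.rerunIfAllowed("coord-action-1");
            System.out.println("attempt " + attempt + " submitted: " + submitted);
        }
    }
}
```

The limit only works because the workflow rerun keeps the same workflow job ID, so the count accumulates; a coordinator rerun, which per the thread starts a fresh workflow job, would reset it.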
