Thanks, Mona. I could rerun the workflow job, but the corresponding coordinator action's status won't change, right? That means the coordinator action will still show as failed even if the associated workflow is rerun successfully.
Can we enhance the coordinator rerun to be consistent with the workflow rerun?

* Keep the same workflow ID.
* Support rerun from the beginning or from the failed WF action.
* Have the rerun count reflect the number of tries.

Thanks,
Shanzhong

On Mon, Feb 11, 2013 at 1:11 PM, Mona Chitnis <[email protected]> wrote:

> Hi folks,
>
> Looking into the Coordinator Rerun logic, it looks like rerunning a
> coordinator action resets its external id (which maps to the workflow job)
> and external status. This means it will run a fresh workflow job, which
> explains why the client method getRerun() was returning '0'.
>
> To use the max-rerun limit, you can use the OozieClient.rerun() method
> itself and supply it with the workflow job-id obtained from the coordinator
> action's externalId.
>
> Thanks,
>
> Mona
>
> On 2/10/13 11:12 PM, "Shangzhong zhu" <[email protected]> wrote:
>
> >Thanks, Alejandro.
> >
> >For the WF rerun count you mentioned, is it
> >org.apache.oozie.client.WorkflowJob.getReRun()?
> >
> >However, it always seems to return 0 no matter how many reruns I make
> >using Coordinator Rerun.
> >
> >Basically, I am using coordinator rerun, OozieClient.coordReRun(), to rerun
> >failed coordinator actions/workflows. But I want to control the number of
> >reruns, say a maximum of 3 reruns.
> >
> >Thanks,
> >Shanzhong
> >
> >On Mon, Feb 4, 2013 at 4:17 PM, Alejandro Abdelnur
> ><[email protected]> wrote:
> >
> >> On Sat, Feb 2, 2013 at 11:05 PM, Shangzhong zhu <[email protected]>
> >> wrote:
> >>
> >> > Hi All,
> >> >
> >> > We are developing a wrapper on top of oozie to automate failed/killed
> >> > coordinator action rerun.
> >> >
> >> > To rerun a coordinator action, it seems I have two options.
> >> >
> >> > 1. Using coordinator action rerun:
> >> > oozie job -rerun <coord_Job_id> <-date XXXX>
> >> >
> >> > 2. Since the failed action is a workflow job, I can also rerun that
> >> > workflow job by setting oozie.wf.rerun.failnodes to rerun from the
> >> > failed action.
> >> >
> >> > Questions:
> >> > 0. Which option is preferred?
> >> >
> >> > 1. For option 1, can I choose to rerun from the failed action like the
> >> > oozie.wf.rerun.failnodes option in workflow rerun?
> >> >
> >> > If I recall correctly you cannot do this.
> >>
> >> > 2. For option 1, it seems I cannot change the job configuration. But for
> >> > option 2, I have more flexibility in changing the configuration; say, I
> >> > can change the job name so that I know how many reruns have been made
> >> > for that workflow.
> >> >
> >> > no need for this, there is a WF rerun count.
> >>
> >> > 3. If I choose option 2, does it mean that the rerun workflow job is not
> >> > part of the coordinator actions any more? In other words, if I kill
> >> > that coordinator job, will that rerun workflow job still be running?
> >>
> >> It should get killed as well, as the WF job ID is still the same.
> >>
> >> With Option #2, though, I'm not sure what will happen with the status of
> >> the corresponding COORD action.
> >>
> >> Thanks
> >>
> >> --
> >> Alejandro
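
A note on Mona's suggestion above (look up the workflow job-id from the coordinator action's externalId, then rerun that workflow directly): the "maximum 3 reruns" check could be sketched roughly as follows. This is only a minimal sketch. `WorkflowBackend`, its three methods, and `InMemoryBackend` are hypothetical names invented for illustration and are not part of Oozie. A real implementation would presumably back them with the Oozie client calls discussed in this thread (the coordinator action's externalId, the workflow rerun count, and a workflow rerun with oozie.wf.rerun.failnodes set), and the exact client method names should be checked against the Javadoc for the Oozie version in use.

```java
import java.util.HashMap;
import java.util.Map;

final class CappedRerun {

    /**
     * Hypothetical seam over the Oozie client; none of these names exist in Oozie.
     * A real implementation is assumed to wrap the client calls named in the thread.
     */
    interface WorkflowBackend {
        /** Workflow job-id stored as the coordinator action's externalId, or null if none. */
        String externalIdOf(String coordActionId);

        /** How many times the workflow job has been rerun so far. */
        int rerunCountOf(String workflowJobId);

        /** Rerun the same workflow job from its failed nodes. */
        void rerunFailedNodes(String workflowJobId);
    }

    /**
     * Reruns the workflow behind a coordinator action unless it has already
     * been rerun maxReruns times. Returns true if a rerun was submitted.
     */
    static boolean rerunIfUnderLimit(WorkflowBackend backend, String coordActionId, int maxReruns) {
        String workflowJobId = backend.externalIdOf(coordActionId);
        if (workflowJobId == null || workflowJobId.isEmpty()) {
            return false; // no workflow job is attached to this coordinator action
        }
        if (backend.rerunCountOf(workflowJobId) >= maxReruns) {
            return false; // limit reached, leave the action alone
        }
        backend.rerunFailedNodes(workflowJobId);
        return true;
    }

    /** In-memory stand-in so the sketch can be exercised without an Oozie server. */
    static final class InMemoryBackend implements WorkflowBackend {
        private final Map<String, String> externalIds = new HashMap<>();
        private final Map<String, Integer> rerunCounts = new HashMap<>();

        /** Associates a coordinator action with a workflow job that has not been rerun yet. */
        void link(String coordActionId, String workflowJobId) {
            externalIds.put(coordActionId, workflowJobId);
            rerunCounts.putIfAbsent(workflowJobId, 0);
        }

        @Override
        public String externalIdOf(String coordActionId) {
            return externalIds.get(coordActionId);
        }

        @Override
        public int rerunCountOf(String workflowJobId) {
            return rerunCounts.getOrDefault(workflowJobId, 0);
        }

        @Override
        public void rerunFailedNodes(String workflowJobId) {
            // Each rerun of the same workflow job bumps its count by one.
            rerunCounts.merge(workflowJobId, 1, Integer::sum);
        }
    }
}
```

The count only stays meaningful if every retry goes through the workflow rerun path. As Mona notes, a coordinator-level rerun resets the externalId and starts a fresh workflow job, so its count starts again from zero. The open question from this thread also still applies: the coordinator action's status may not reflect a successful workflow-level rerun.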
