Praveen,

Good to know it is resolved. I think this is another common issue that we should document somewhere.

Regards,
Mohammad

________________________________
From: Praveen M <[email protected]>
To: [email protected]; Mohammad Islam <[email protected]>
Sent: Tuesday, February 5, 2013 10:41 AM
Subject: Re: Oozie noob question.

Hi Mohammad,

Thanks for writing. I found the issue in my case. My input-event criterion was looking for the _SUCCESS file in the directory, and I don't write such a file, so I decided to use an empty <done-flag></done-flag>. That fixed the issue.

Thank you.
Praveen

On Wed, Jan 30, 2013 at 6:36 PM, Mohammad Islam <[email protected]> wrote:

> Hi Praveen,
>
> Current should be based on nominal time (not action creation time). If it
> is based on creation time, that would be a big issue. Please verify this.
>
> For your example, I think your first job with NominalTime (NT) = 1/26/13
> should look for the data of 1/22/13 (current(-4)), NOT 1/26. But, according to
> your dataset definition with an 'initial-instance' of 1/26/13, Oozie will not
> look beyond this date (1/26). In this case, Oozie will return empty ("") as
> the data directory. In other words, the first action will pass the data
> dependency checks (w/o actually checking the data) whether the data is there or
> not, and will start right away. Part of the solution is to change
> the 'initial-instance' to 1/22/13. This is well documented in the
> specs.
>
> If possible, please cut and paste the output of
> "oozie job -info <COORD-ID> -verbose".
>
> Regards,
> Mohammad
>
> ________________________________
> From: Praveen M <[email protected]>
> To: [email protected]
> Sent: Wednesday, January 30, 2013 5:50 PM
> Subject: Oozie noob question.
>
> I have a coord job definition, included at the end of this mail.
>
> The job is a daily job, which basically processes some logs. That said, the
> logs are sometimes delayed, and I want to start the process only after
> the logs are present.
> Hence, I'm trying to add an input-event which should be satisfied before
> the process starts.
>
> Typically, the logs come in with a delay of 4 days (yeah, I know it's
> crazy, but let's say so), so I have my input event instance set to
> ${coord:current(-4)}. This works well.
>
> That said, the initial date of the dataset processed is a little behind
> (1/26/2013), and let's say I fire this coord job on 1/30/2013.
>
> To catch up on all the processing not done, the coordinator spins up jobs
> for
> 1/26/2013
> 1/27/2013
> ....
> 1/30/2013
>
> but all with a creation date of 1/30/2013.
>
> Now, when these jobs run, the first one runs successfully: the input
> validation ${coord:current(-4)} succeeds (the log for 1/26/2013 is available).
> The next one also passes the input validation ${coord:current(-4)}, but the
> job fails because the input logs for 1/27/2013 are not available yet.
>
> I see that this is because the coord:current() value is computed using
> the job creation time (and not the nominal time).
>
> In a day-to-day job spin-off situation this is OK and would work well. But
> when trying to catch up on work that should already have been done, the
> dates are off.
>
> Is there a way to solve this? Or do I just have to run the jobs
> separately to catch up, and then schedule the future runs?
>
> <coordinator-app name="sauron-coord" frequency="${coord:days(1)}"
>                  start="2013-01-26T00:00Z" end="2015-01-02T08:00Z"
>                  timezone="America/Los_Angeles"
>                  xmlns="uri:oozie:coordinator:0.1">
>     <datasets>
>         <dataset name="inputDir" frequency="${coord:days(1)}"
>                  initial-instance="2013-01-26T00:00Z" timezone="America/Los_Angeles">
>             <uri-template>${nameNode}/shared/app_logs/${YEAR}/${MONTH}/${DAY}</uri-template>
>         </dataset>
>     </datasets>
>     <input-events>
>         <data-in name="coordInput" dataset="inputDir">
>             <instance>${coord:current(-4)}</instance>
>         </data-in>
>     </input-events>
>     <action>
>         <workflow>
>             <app-path>${nameNode}/user/${coord:user()}/${sauronRoot}</app-path>
>             <configuration>
>                 <property>
>                     <name>wfInput</name>
>                     <value>${nameNode}/shared/app_logs/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd')}/*/{na*,eu*,ap*}</value>
>                 </property>
>                 <property>
>                     <name>wfOutput</name>
>                     <value>/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd')}</value>
>                 </property>
>             </configuration>
>         </workflow>
>     </action>
> </coordinator-app>
>
> Thank you,
> --
> -Praveen
>
--
-Praveen
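
For reference, the two changes discussed in this thread can be sketched as a revised dataset definition. This is an untested sketch that assumes the rest of the coordinator stays as posted. The 'initial-instance' is moved back to 2013-01-22, as Mohammad suggested, so that ${coord:current(-4)} for the first nominal time (2013-01-26) resolves to a real dataset instance. An empty <done-flag> is added, as Praveen did, so that Oozie checks for the directory itself rather than for a _SUCCESS file inside it.

```xml
<datasets>
    <!-- initial-instance moved back 4 days (value taken from Mohammad's suggestion)
         so that current(-4) of the first action, nominal time 2013-01-26, is a valid instance -->
    <dataset name="inputDir" frequency="${coord:days(1)}"
             initial-instance="2013-01-22T00:00Z" timezone="America/Los_Angeles">
        <uri-template>${nameNode}/shared/app_logs/${YEAR}/${MONTH}/${DAY}</uri-template>
        <!-- empty done-flag: the existence of the directory is the readiness signal,
             so no _SUCCESS file is required -->
        <done-flag></done-flag>
    </dataset>
</datasets>
```

With both changes, each catch-up action should wait on the log directory dated four days before its own nominal time instead of passing the dependency check with an empty path.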
