Praveen,
Good to know it is resolved.
This looks like another common issue that we should document somewhere.


Regards,
Mohammad

________________________________
 From: Praveen M <[email protected]>
To: [email protected]; Mohammad Islam <[email protected]> 
Sent: Tuesday, February 5, 2013 10:41 AM
Subject: Re: Oozie noob question.
 
Hi Mohammad,

Thanks for writing. I found the issue in my case. My input-event criterion
was looking for the _SUCCESS file in the directory, and I don't write
such a file at all, so I decided to use an empty <done-flag></done-flag>.
That fixed the issue.
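
For reference, a minimal sketch of what such a dataset definition might look
like. It reuses the dataset from the original mail quoted below; the exact
definition that fixed the problem is not shown in this thread, so the
placement here is an assumption:

```xml
<dataset name="inputDir" frequency="${coord:days(1)}"
         initial-instance="2013-01-26T00:00Z" timezone="America/Los_Angeles">
  <uri-template>${nameNode}/shared/app_logs/${YEAR}/${MONTH}/${DAY}</uri-template>
  <!-- Empty done-flag: the instance counts as available as soon as the
       directory itself exists. If done-flag is omitted, Oozie waits for a
       _SUCCESS file in the directory instead. -->
  <done-flag></done-flag>
</dataset>
```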

Thank you.
Praveen


On Wed, Jan 30, 2013 at 6:36 PM, Mohammad Islam <[email protected]> wrote:

> Hi Praveen,
>
> coord:current() should be based on the nominal time (not the action
> creation time). If it is using the creation time, that would be a big
> issue. Please verify this.
>
> For your example, I think your first job, with nominal time (NT) = 1/26/13,
> should look for the data of 1/22/13 (current(-4)), NOT 1/26. But according to
> your dataset definition, with an 'initial-instance' of 1/26/13, Oozie will not
> look earlier than this date (1/26). In this case, Oozie will return empty ("")
> as the data directory. In other words, the first action will pass the data
> dependency check (without actually checking the data), whether the data is
> there or not, and will start right away. Part of the solution is to change
> the 'initial-instance' to 1/22/13. This is well documented in the
> specs.
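>
> As a rough sketch of that change, applied to the dataset from the original
> mail (the timestamp 2013-01-22T00:00Z is an assumption derived from the
> date 1/22/13 above; everything else is copied unchanged):
>
> ```xml
> <dataset name="inputDir" frequency="${coord:days(1)}"
>          initial-instance="2013-01-22T00:00Z" timezone="America/Los_Angeles">
>   <!-- initial-instance moved back 4 days so that current(-4) of the
>        first action (nominal time 1/26) resolves to a real instance -->
>   <uri-template>${nameNode}/shared/app_logs/${YEAR}/${MONTH}/${DAY}</uri-template>
> </dataset>
> ```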
>
> If possible, please cut-paste the output of "oozie job -info <COORD-ID>
> -verbose"
>
> Regards,
> Mohammad
>
>
>
>
> ________________________________
>  From: Praveen M <[email protected]>
> To: [email protected]
> Sent: Wednesday, January 30, 2013 5:50 PM
> Subject: Oozie noob question.
>
> I have a coord job definition, included at the end of this mail.
>
> The job is a daily job, which basically processes some logs. That said, the
> logs are sometimes delayed, and I want to start the process only after
> the logs are present.
> Hence, I'm trying to add an input-event which should be satisfied before
> the process starts.
>
> Typically, the logs come in with a delay of 4 days (yeah, I know it's
> crazy, but let's say so), so I have my input event instance set to
> ${coord:current(-4)}. This works well.
>
> That said, the initial date of the dataset to be processed is a little
> behind (1/26/2013), and let's say I fire this coord job on 1/30/2013.
>
> To catch up for all the processing not done, the coordinator spins up jobs
> for
> 1/26/2013
> 1/27/2013
> ....
> 1/30/2013
>
> but all with creation date of 1/30/2013
>
> Now, when these jobs run, the 1st one runs successfully: the input
> validation ${coord:current(-4)} succeeds (the log for 1/26/2013 is
> available). For the next one, the input validation ${coord:current(-4)}
> again succeeds, but the job fails because the input logs for 1/27/2013
> are not available yet.
>
> I think this is because the coord:current() value is computed using
> the job creation time (and not the nominal time).
>
> In a day-to-day scheduling situation this is OK and would work well. But
> when trying to catch up on work that should already have been done, the
> dates are off.
>
> Is there a way to solve this? Or do I just have to run the catch-up jobs
> separately, and then schedule the future runs?
>
> <coordinator-app name="sauron-coord" frequency="${coord:days(1)}"
>                  start="2013-01-26T00:00Z" end="2015-01-02T08:00Z"
>                  timezone="America/Los_Angeles"
>                  xmlns="uri:oozie:coordinator:0.1">
>   <datasets>
>     <dataset name="inputDir" frequency="${coord:days(1)}"
>              initial-instance="2013-01-26T00:00Z" timezone="America/Los_Angeles">
>       <uri-template>${nameNode}/shared/app_logs/${YEAR}/${MONTH}/${DAY}</uri-template>
>     </dataset>
>   </datasets>
>   <input-events>
>     <data-in name="coordInput" dataset="inputDir">
>       <instance>${coord:current(-4)}</instance>
>     </data-in>
>   </input-events>
>   <action>
>     <workflow>
>       <app-path>${nameNode}/user/${coord:user()}/${sauronRoot}</app-path>
>       <configuration>
>         <property>
>           <name>wfInput</name>
>           <value>${nameNode}/shared/app_logs/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd')}/*/{na*,eu*,ap*}</value>
>         </property>
>         <property>
>           <name>wfOutput</name>
>           <value>/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd')}</value>
>         </property>
>       </configuration>
>     </workflow>
>   </action>
> </coordinator-app>
>
>
> Thank you,
> --
> -Praveen
>



-- 
-Praveen
