Hi Praveen,

Current should be based on nominal time (not action creation time). If it really
is computed from creation time, that would be a big issue. Please double-check this.

For your example, I think your first job, with NominalTime (NT) = 1/26/13, should
look for the data of 1/22/13 (current(-4)), NOT 1/26. But according to your
dataset definition, with an 'initial-instance' of 1/26/13, Oozie will not look
earlier than this date (1/26). In this case, Oozie will return empty ("") as the
data directory. In other words, the first action will pass the data dependency
check without actually checking whether the data is there, and will
start right away. Part of the solution is to change the 'initial-instance' to
1/22/13. This behavior is documented in the specs.
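
A minimal sketch of that change, assuming the rest of your dataset definition
stays exactly as you posted it (only 'initial-instance' moves back four days,
so that current(-4) of the 1/26 action resolves to a real instance):

```xml
<!-- Sketch only: same dataset as in the original coordinator, with
     initial-instance moved from 2013-01-26 to 2013-01-22 (assumed fix). -->
<datasets>
  <dataset name="inputDir" frequency="${coord:days(1)}"
           initial-instance="2013-01-22T00:00Z" timezone="America/Los_Angeles">
    <uri-template>${nameNode}/shared/app_logs/${YEAR}/${MONTH}/${DAY}</uri-template>
  </dataset>
</datasets>
```

With this, the 1/26 action should wait on the 1/22 directory instead of
getting an empty dependency.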

If possible, please cut-paste the output of "oozie job -info <COORD-ID> 
-verbose"

Regards,
Mohammad
 
 


________________________________
 From: Praveen M <[email protected]>
To: [email protected] 
Sent: Wednesday, January 30, 2013 5:50 PM
Subject: Oozie noob question.
 
I have a coord job definition, included at the end of this mail.

The job is a daily job, which basically processes some logs. That said, the
logs are sometimes delayed, and I'd want to start the process only after
the logs are present.
Hence, I'm trying to add an input-event which should be satisfied before
the process starts.

Typically, the logs come in with a delay of 4 days (yeah, I know it's
crazy, but let's say so), so I have my input event instance set to
${coord:current(-4)}. This works well.

That said, the initial date of the dataset to be processed is a little behind
(1/26/2013).
And let's say I fire this coord job on 1/30/2013.

To catch up for all the processing not done, the coordinator spins up jobs
for
1/26/2013
1/27/2013
....
1/30/2013

but all with creation date of 1/30/2013

Now, when these jobs run, the first one runs successfully: the input
validation ${coord:current(-4)} succeeds (the log for 1/26/2013 is available).
The next one, however, again passes the input validation
${coord:current(-4)}, but the job fails because the input logs for 1/27/2013
are not available yet.

I see that this is because the coord:current() value is computed using
the job creation time (and not the nominal time).

In a day-to-day job spin-off situation this is OK and would work well. But
when trying to schedule work that should already have been done, to catch up,
the dates are off.

Is there a way to solve this? Or do I just have to run the catch-up jobs
separately, and then schedule the future runs?

<coordinator-app name="sauron-coord" frequency="${coord:days(1)}"
                 start="2013-01-26T00:00Z" end="2015-01-02T08:00Z"
                 timezone="America/Los_Angeles"
                 xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="inputDir" frequency="${coord:days(1)}"
             initial-instance="2013-01-26T00:00Z" timezone="America/Los_Angeles">
      <uri-template>${nameNode}/shared/app_logs/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="coordInput" dataset="inputDir">
      <instance>${coord:current(-4)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${nameNode}/user/${coord:user()}/${sauronRoot}</app-path>
      <configuration>
        <property>
          <name>wfInput</name>
          <value>${nameNode}/shared/app_logs/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd')}/*/{na*,eu*,ap*}</value>
        </property>
        <property>
          <name>wfOutput</name>
          <value>/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>


Thank you,
-- 
-Praveen
