Data Pipeline - Does oozie support the newly created partitions from step 1 as the input events and parameters for step 2?

Huiting Li Thu, 12 Dec 2013 02:13:27 -0800

Hi,

In an Oozie coordinator, we can use ${coord:current(int n)} to create a 
data pipeline with a coordinator application. The documentation says that 
"${coord:current(int n)} represents the nth dataset instance for a synchronous 
dataset, relative to the coordinator action creation (materialization) time. 
The coordinator action creation (materialization) time is computed based on the 
coordinator job start time and its frequency. The nth dataset instance is 
computed based on the dataset's initial-instance datetime, its frequency and 
the (current) coordinator action creation (materialization) time."
However, our case is different. The coordinator starts at, for example, 
2013-12-12-02, and step 1 outputs multiple data partitions, such as 
/data/dth=2013-12-11-22, /data/dth=2013-12-11-23 and /data/dth=2013-12-12-02. 
We want to process all of these newly generated partitions in step 2. That is, 
step 2 takes the output of step 1 as its input and processes the data in the 
new partitions one by one. If we define a dataset like the one below in step 2, 
how should we define the input events (in <data-in>) and pass parameters (in a 
configuration property) to step 2?
          <uri-template>
                 hdfs://xxx:8020/data/dth=${YEAR}-${MONTH}-${DAY}-${HOUR}
          </uri-template>
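
To make the question concrete, a step 2 coordinator built around this URI 
template might look roughly like the sketch below. It is only an illustration 
under assumptions: the names step2-coord, step1-out, input and inputPartitions, 
the application path, the start/end times, the initial-instance and the fixed 
window of five hourly instances are all placeholders, not our actual setup.

```xml
<coordinator-app name="step2-coord" frequency="${coord:hours(1)}"
                 start="2013-12-12T02:00Z" end="2013-12-13T02:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
  <datasets>
    <!-- Hourly dataset written by step 1 (name and initial-instance assumed) -->
    <dataset name="step1-out" frequency="${coord:hours(1)}"
             initial-instance="2013-12-11T00:00Z" timezone="UTC">
      <uri-template>
        hdfs://xxx:8020/data/dth=${YEAR}-${MONTH}-${DAY}-${HOUR}
      </uri-template>
      <!-- Empty done-flag: the partition directory itself marks availability -->
      <done-flag></done-flag>
    </dataset>
  </datasets>
  <input-events>
    <!-- Fixed window: the current hourly instance and the four before it -->
    <data-in name="input" dataset="step1-out">
      <start-instance>${coord:current(-4)}</start-instance>
      <end-instance>${coord:current(0)}</end-instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <!-- Hypothetical workflow location for step 2 -->
      <app-path>hdfs://xxx:8020/apps/step2</app-path>
      <configuration>
        <property>
          <!-- Resolves to a comma-separated list of the partition URIs -->
          <name>inputPartitions</name>
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

As far as we understand, a fixed range like this makes the coordinator action 
wait until every instance in the window exists, so hours for which step 1 
produced no partition (2013-12-12-00 and 2013-12-12-01 in the example above) 
would keep step 2 from starting. That gap is what this question is about.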


Does Oozie support this kind of pipeline?

Thanks,
Huiting
