You can specify more than one instance in data-out. But if the instances
produced are random, then the only thing I can think of is passing the
partitions created by one action to the next action in the workflow through
action output. You can write any data in a java action and pass it on to
the next action, or you can write it to a file in HDFS and have the next
action pick it up.
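
If the set of partitions is known up front (say, a fixed window of hourly
instances), you don't need the action-output trick at all: step 2's
coordinator can declare a range of instances in its input-events and
resolve them all with coord:dataIn(). A rough sketch using your hourly
dataset, with hypothetical names (hourlyData, step2Input, inputPartitions):

  <datasets>
    <dataset name="hourlyData" frequency="${coord:hours(1)}"
             initial-instance="2013-12-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://xxx:8020/data/dth=${YEAR}-${MONTH}-${DAY}-${HOUR}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <!-- the current instance plus the previous 23 hourly instances -->
    <data-in name="step2Input" dataset="hourlyData">
      <start-instance>${coord:current(-23)}</start-instance>
      <end-instance>${coord:current(0)}</end-instance>
    </data-in>
  </input-events>
  ...
  <property>
    <name>inputPartitions</name>
    <value>${coord:dataIn('step2Input')}</value>
  </property>

coord:dataIn('step2Input') resolves to a comma-separated list of all the
URIs in the window, which the workflow can hand to the action as a single
parameter. This only works when the window is deterministic, though, which
is why random partitions need the capture-output approach below.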

https://cwiki.apache.org/confluence/display/OOZIE/Java+Cookbook - Check out
capture-output
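
As a minimal sketch of the capture-output route (node and property names
here are made up): the java action declares <capture-output/>, its main
class writes a java.util.Properties file to the path given by the
oozie.action.output.properties system property (e.g.
partitions=/data/dth=2013-12-11-22,/data/dth=2013-12-11-23), and the next
action reads the value back with the wf:actionData() EL function:

  <action name="compute-partitions">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- the main class figures out which partitions step 1 produced and
           writes them as Properties to the file named by the
           oozie.action.output.properties system property -->
      <main-class>com.example.PartitionLister</main-class>
      <capture-output/>
    </java>
    <ok to="process-partitions"/>
    <error to="fail"/>
  </action>

  <action name="process-partitions">
    ...
    <property>
      <name>inputPartitions</name>
      <value>${wf:actionData('compute-partitions')['partitions']}</value>
    </property>
    ...
  </action>

One caveat: the captured output is limited by oozie.action.max.output.data
(2KB by default), so a very long partition list is better written to an
HDFS file as mentioned above.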

Regards,
Rohini



On Tue, Dec 17, 2013 at 6:12 PM, Huiting Li <[email protected]> wrote:

> It's said that coord:dataOut() resolves to all the URIs for the dataset
> instances specified in an output-events dataset section. From my
> understanding, the output event is a kind of pre-determined value, as
> usually coord:current(0) is used in the output event. Taking the oozie
> doc example below, for the first run coord:dataOut('outputLogs') will
> resolve to "hdfs://bar:8020/app/daily-logs/2009/01/02", instead of the
> actual output of the last step, which may be a few random partitions,
> right?
>
> So how do I specify the output event in my case? Thanks a lot!
>
>
> ====oozie example=====
> <coordinator-app name="app-coord" frequency="${coord:days(1)}"
>                  start="2009-01-01T24:00Z" end="2009-12-31T24:00Z"
>                  timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
>   <datasets>
>     <dataset name="dailyLogs" frequency="${coord:days(1)}"
>              initial-instance="2009-01-01T24:00Z" timezone="UTC">
>       <uri-template>hdfs://bar:8020/app/daily-logs/${YEAR}/${MONTH}/${DAY}</uri-template>
>     </dataset>
>   </datasets>
>   <input-events>...</input-events>
>   <output-events>
>     <data-out name="outputLogs" dataset="dailyLogs">
>       <instance>${coord:current(0)}</instance>
>     </data-out>
>   </output-events>
>   <action>
>     ...
>     <property>
>       <name>wfOutput</name>
>       <value>${coord:dataOut('outputLogs')}</value>
>     </property>
>     ...
>   </action>
> </coordinator-app>
>
> Thanks,
> Huiting
>
> -----Original Message-----
> From: Rohini Palaniswamy [mailto:[email protected]]
> Sent: December 18, 2013 9:09
> To: [email protected]
> Subject: Re: Data Pipeline - Does oozie support the newly created
> partitions from step 1 as the input events and parameters for step 2?
>
> The newly generated partitions should be part of data-out. You can pass
> the partitions using the coord:dataOut() EL function.
>
> Regards,
> Rohini
>
>
>
> On Thu, Dec 12, 2013 at 2:12 AM, Huiting Li <[email protected]>
> wrote:
>
> > Hi,
> >
> > In an oozie coordinator, we can use ${coord:current(int n)} to create
> > a data pipeline. It's said that "${coord:current(int n)} represents
> > the nth dataset instance for a synchronous dataset, relative to the
> > coordinator action creation (materialization) time. The coordinator
> > action creation (materialization) time is computed based on the
> > coordinator job start time and its frequency. The nth dataset instance
> > is computed based on the dataset's initial-instance datetime, its
> > frequency and the (current) coordinator action creation
> > (materialization) time."
> >
> > However, our case is: the coordinator starts at, for example,
> > 2013-12-12-02, and step 1 outputs multiple partitions of data, like
> > /data/dth=2013-12-11-22, /data/dth=2013-12-11-23 and
> > /data/dth=2013-12-12-02. We want to process all these newly generated
> > partitions in step 2. That means step 2 takes the output of step 1 as
> > its input and will process the data in the new partitions one by one.
> > So if we define a dataset like the one below for step 2, how could we
> > define the input events (in <data-in>) and pass parameters (in
> > configuration properties) to step 2?
> >           <uri-template>
> >                  hdfs://xxx:8020/data/dth=${YEAR}-${MONTH}-${DAY}-${HOUR}
> >           </uri-template>
> >
> > Does oozie support this kind of pipeline?
> >
> > Thanks,
> > Huiting
> >
>
