You can specify more than one instance in data-out. But if the instances produced are random, then the only thing I can think of is passing the partitions created by one action to the next action in the workflow through action output. You can write any data in a java action and pass it on to the next action via capture-output. Alternatively, you can write the partition list to a file in HDFS and let the other action pick it up.
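For the capture-output route, here is a minimal sketch of the step-1 side. When `<capture-output/>` is present, Oozie sets the system property `oozie.action.output.properties` to the path of a file it reads back after the action finishes. The class name, the property key `newPartitions`, and the hard-coded partition paths below are all made up for illustration; a real action would discover the partitions it actually wrote (e.g. by listing the HDFS directories it created):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Properties;

public class PartitionCollector {

    // Writes the partition list as a java.util.Properties file so that
    // Oozie's <capture-output/> can pick it up.
    static void writePartitions(String fileName, String partitionCsv) throws Exception {
        Properties props = new Properties();
        props.setProperty("newPartitions", partitionCsv);
        try (OutputStream os = new FileOutputStream(fileName)) {
            props.store(os, "partitions produced by step 1");
        }
    }

    public static void main(String[] args) throws Exception {
        // Inside an Oozie java action with <capture-output/>, Oozie sets this
        // system property to the file it reads back when the action ends.
        String fileName = System.getProperty("oozie.action.output.properties");
        if (fileName == null) {
            // Fallback so the sketch also runs outside Oozie.
            fileName = File.createTempFile("oozie-out", ".properties").getAbsolutePath();
        }
        // Hypothetical partition list; a real action would compute this.
        writePartitions(fileName,
                "/data/dth=2013-12-11-22,/data/dth=2013-12-11-23,/data/dth=2013-12-12-02");
        System.out.println("Wrote " + fileName);
    }
}
```

The downstream action can then read the value with the `wf:actionData()` EL function, e.g. `${wf:actionData('step1-node')['newPartitions']}`.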
https://cwiki.apache.org/confluence/display/OOZIE/Java+Cookbook - Check out capture-output.

Regards,
Rohini

On Tue, Dec 17, 2013 at 6:12 PM, Huiting Li <[email protected]> wrote:

> It's said that coord:dataOut() resolves to all the URIs for the dataset
> instance specified in an output event dataset section. From my
> understanding, the output event is a kind of pre-determined value, as
> usually coord:current(0) is used in the output event. Taking the Oozie doc
> example below, for the first run, coord:dataOut('outputLogs') will
> resolve to "hdfs://bar:8020/app/logs/2009/01/02" instead of the actual
> output of the last step, which may be a few random partitions, right?
>
> So how do I specify the output event in my case? Thanks a lot!
>
> ====oozie example=====
> <coordinator-app name="app-coord" frequency="${coord:days(1)}"
>                  start="2009-01-01T24:00Z" end="2009-12-31T24:00Z"
>                  timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
>   <datasets>
>     <dataset name="dailyLogs" frequency="${coord:days(1)}"
>              initial-instance="2009-01-01T24:00Z" timezone="UTC">
>       <uri-template>hdfs://bar:8020/app/daily-logs/${YEAR}/${MONTH}/${DAY}</uri-template>
>     </dataset>
>   </datasets>
>   <input-events>... </input-events>
>   <output-events>
>     <data-out name="outputLogs" dataset="dailyLogs">
>       <instance>${coord:current(0)}</instance>
>     </data-out>
>   </output-events>
>   <action>.....
>     <property>
>       <name>wfOutput</name>
>       <value>${coord:dataOut('outputLogs')}</value>
>     </property>
>   </action>
> </coordinator-app>
>
> Thanks,
> Huiting
>
> -----Original Message-----
> From: Rohini Palaniswamy [mailto:[email protected]]
> Sent: December 18, 2013 9:09
> To: [email protected]
> Subject: Re: Data Pipeline - Does oozie support the newly created
> partitions from step 1 as the input events and parameters for step 2?
>
> The newly generated partitions should be part of data-out.
> You can pass the partitions using the coord:dataOut() EL function.
>
> Regards,
> Rohini
>
> On Thu, Dec 12, 2013 at 2:12 AM, Huiting Li <[email protected]> wrote:
>
> > Hi,
> >
> > In an Oozie coordinator, we can use ${coord:current(int n)} to create a
> > data pipeline in a coordinator application. It's said that
> > "${coord:current(int n)} represents the nth dataset instance for a
> > synchronous dataset, relative to the coordinator action creation
> > (materialization) time. The coordinator action creation (materialization)
> > time is computed based on the coordinator job start time and its
> > frequency. The nth dataset instance is computed based on the dataset's
> > initial-instance datetime, its frequency and the (current) coordinator
> > action creation (materialization) time."
> >
> > However, our case is: the coordinator starts at, for example,
> > 2013-12-12-02, and step 1 outputs multiple partitions, like
> > /data/dth=2013-12-11-22, /data/dth=2013-12-11-23 and
> > /data/dth=2013-12-12-02. We want to process all of these newly generated
> > partitions in step 2. That means step 2 takes the output of step 1 as its
> > input and will process the data in the new partitions one by one. So if
> > we define a dataset like the one below in step 2, how could we define the
> > input events (in <data-in>) and pass parameters (in a configuration
> > property) to step 2?
> >
> > <uri-template>
> >   hdfs://xxx:8020/data/dth=${YEAR}-${MONTH}-${DAY}-${HOUR}
> > </uri-template>
> >
> > Does Oozie support this kind of pipeline?
> >
> > Thanks,
> > Huiting
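To tie the two replies together, the capture-output handoff between step 1 and step 2 might look roughly like the workflow sketch below. This is an illustration only: the node names (collect-partitions, process-partitions), the main classes, and the property key newPartitions are all invented here, not taken from the thread.

```xml
<!-- Sketch of a workflow that passes step-1 partitions to step 2 via
     <capture-output/>; all names below are hypothetical. -->
<workflow-app name="partition-pipeline" xmlns="uri:oozie:workflow:0.2">
  <start to="collect-partitions"/>

  <!-- Step 1: a java action that writes a properties file listing the
       partitions it created; <capture-output/> makes Oozie read it back. -->
  <action name="collect-partitions">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <main-class>com.example.PartitionCollector</main-class>
      <capture-output/>
    </java>
    <ok to="process-partitions"/>
    <error to="fail"/>
  </action>

  <!-- Step 2: receives the captured partition list through wf:actionData()
       and can iterate over the partitions itself. -->
  <action name="process-partitions">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <main-class>com.example.PartitionProcessor</main-class>
      <arg>${wf:actionData('collect-partitions')['newPartitions']}</arg>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>partition pipeline failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The alternative mentioned above, writing the partition list to an agreed-upon HDFS file, avoids capture-output's size limit on the properties payload but requires both actions to agree on the file path.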
