You can create the flag that triggers Oozie with `hadoop fs -touchz /path/to/dataset/_SUCCESS`, or do the equivalent through the Java API.
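Here is a minimal sketch, assuming the `input` dataset from the coordinator quoted below. The file you create has to match the dataset's `<done-flag>` name. When the element is omitted, Oozie expects `_SUCCESS`, so the explicit element here is optional and only for illustration:

=======================================================================
<dataset name="input" frequency="${coord:minutes(1)}"
         initial-instance="2010-01-01T00:00Z" timezone="UTC">
  <uri-template>${nameNode}/user/${coord:user()}/coord_input</uri-template>
  <!-- The instance is considered available once this file exists inside
       the resolved directory. _SUCCESS is also the default when omitted. -->
  <done-flag>_SUCCESS</done-flag>
</dataset>
=======================================================================

With this definition, something like `hadoop fs -touchz /user/<your-user>/coord_input/_SUCCESS` should mark the directory as ready. `<your-user>` is a placeholder for whatever `${coord:user()}` resolves to.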
On Fri, Nov 30, 2012 at 1:38 AM, 黄 山 <[email protected]> wrote:
> If I want the dataset to be triggered only once after it changes, how should I
> set the done-flag? Or is there another trick?
>
>
> good luck,
> huangs
> [email protected]
>
>
>
> On 2012-11-30, at 1:45 AM, Joe Crobak wrote:
>
> > Oozie is looking for a _SUCCESS flag in the directory specified as your
> > coordinator dataset. To disable it, add <done-flag></done-flag> to your
> > <dataset> definition. Here are the relevant docs from your link:
> >
> > *done-flag:* The done file for the data set. If done-flag is not specified,
> > then Oozie configures Hadoop to create a _SUCCESS file in the output
> > directory. If the done flag is set to empty, then Coordinator looks for the
> > existence of the directory itself.
> >
> >
> > On Thu, Nov 29, 2012 at 12:13 PM, 黄 山 <[email protected]> wrote:
> >
> >> I read some documents about the coordinator:
> >>
> >> http://oozie.apache.org/docs/3.2.0-incubating/CoordinatorFunctionalSpec.html
> >>
> >> Oozie checks the specified path periodically.
> >>
> >> But this is what I want:
> >>
> >> the dataset should be triggered when an HDFS path changes.
> >>
> >> I wrote the coordinator XML below:
> >>
> >> =======================================================================
> >> <coordinator-app name="aggregator-coord" frequency="1" start="${start}"
> >>                  end="${end}" timezone="UTC"
> >>                  xmlns="uri:oozie:coordinator:0.2">
> >>   <controls>
> >>     <concurrency>1</concurrency>
> >>   </controls>
> >>
> >>   <datasets>
> >>     <dataset name="input" frequency="${coord:minutes(1)}"
> >>              initial-instance="2010-01-01T00:00Z" timezone="UTC">
> >>       <uri-template>${nameNode}/user/${coord:user()}/coord_input</uri-template>
> >>     </dataset>
> >>     <dataset name="output" frequency="${coord:minutes(1)}"
> >>              initial-instance="2010-01-01T01:00Z" timezone="UTC">
> >>       <uri-template>${nameNode}/user/${coord:user()}/coord_output/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
> >>     </dataset>
> >>   </datasets>
> >>
> >>   <input-events>
> >>     <data-in name="input" dataset="input">
> >>       <start-instance>${coord:current(-2)}</start-instance>
> >>       <end-instance>${coord:current(0)}</end-instance>
> >>     </data-in>
> >>   </input-events>
> >>
> >>   <output-events>
> >>     <data-out name="output" dataset="output">
> >>       <instance>${coord:current(0)}</instance>
> >>     </data-out>
> >>   </output-events>
> >>
> >>   <action>
> >>     <workflow>
> >>       <app-path>${nameNode}/user/${coord:user()}/examples/apps/streaming</app-path>
> >>       <configuration>
> >>         <property>
> >>           <name>jobTracker</name>
> >>           <value>${jobTracker}</value>
> >>         </property>
> >>         <property>
> >>           <name>nameNode</name>
> >>           <value>${nameNode}</value>
> >>         </property>
> >>         <property>
> >>           <name>queueName</name>
> >>           <value>${queueName}</value>
> >>         </property>
> >>         <property>
> >>           <name>inputData</name>
> >>           <value>${coord:dataIn('input')}</value>
> >>         </property>
> >>         <property>
> >>           <name>outputData</name>
> >>           <value>${coord:dataOut('output')}</value>
> >>         </property>
> >>       </configuration>
> >>     </workflow>
> >>   </action>
> >> </coordinator-app>
> >> =======================================================================
> >>
> >> I ran this job and put some data into coord_input, but nothing happened.
> >>
> >>
> >> good luck,
> >> huangs
> >>
> >> [email protected]
> >>
> >
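Applied to the `input` dataset in the coordinator above, Joe's suggestion of an empty done-flag would look roughly like this. It is a sketch, and only the `<done-flag>` element is new. With it, the Coordinator should treat an instance as available as soon as the directory itself exists, instead of waiting for a `_SUCCESS` file:

=======================================================================
<dataset name="input" frequency="${coord:minutes(1)}"
         initial-instance="2010-01-01T00:00Z" timezone="UTC">
  <uri-template>${nameNode}/user/${coord:user()}/coord_input</uri-template>
  <!-- Empty done-flag: the instance counts as available once the
       directory exists; no _SUCCESS file is required. -->
  <done-flag></done-flag>
</dataset>
=======================================================================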
