If I want the dataset to be triggered only once after it has changed, how should I set the done-flag? Or are there other tricks I could use?
good luck,
huangs

[email protected]

On 2012-11-30, at 1:45 AM, Joe Crobak wrote:

> Oozie is looking for a _SUCCESS flag in the directory specified as your
> coordinator dataset. To disable it, add <done-flag></done-flag> to your
> <dataset> definition. Here's the relevant docs from your link:
>
> *done-flag:* The done file for the data set. If done-flag is not specified,
> then Oozie configures Hadoop to create a _SUCCESS file in the output
> directory. If the done flag is set to empty, then Coordinator looks for the
> existence of the directory itself.
>
>
> On Thu, Nov 29, 2012 at 12:13 PM, 黄 山 <[email protected]> wrote:
>
>> I read some documents about the coordinator:
>>
>> http://oozie.apache.org/docs/3.2.0-incubating/CoordinatorFunctionalSpec.html
>>
>> Oozie will check the specified path periodically.
>>
>> But I want an application like this:
>>
>> Dataset triggered when an HDFS path changes.
>>
>> I wrote the coordinator XML below:
>>
>> =======================================================================
>> <coordinator-app name="aggregator-coord" frequency="1" start="${start}"
>>                  end="${end}" timezone="UTC"
>>                  xmlns="uri:oozie:coordinator:0.2">
>>   <controls>
>>     <concurrency>1</concurrency>
>>   </controls>
>>
>>   <datasets>
>>     <dataset name="input" frequency="${coord:minutes(1)}"
>>              initial-instance="2010-01-01T00:00Z" timezone="UTC">
>>       <uri-template>${nameNode}/user/${coord:user()}/coord_input</uri-template>
>>     </dataset>
>>     <dataset name="output" frequency="${coord:minutes(1)}"
>>              initial-instance="2010-01-01T01:00Z" timezone="UTC">
>>       <uri-template>${nameNode}/user/${coord:user()}/coord_output/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
>>     </dataset>
>>   </datasets>
>>
>>   <input-events>
>>     <data-in name="input" dataset="input">
>>       <start-instance>${coord:current(-2)}</start-instance>
>>       <end-instance>${coord:current(0)}</end-instance>
>>     </data-in>
>>   </input-events>
>>
>>   <output-events>
>>     <data-out name="output" dataset="output">
>>       <instance>${coord:current(0)}</instance>
>>     </data-out>
>>   </output-events>
>>
>>   <action>
>>     <workflow>
>>       <app-path>${nameNode}/user/${coord:user()}/examples/apps/streaming</app-path>
>>       <configuration>
>>         <property>
>>           <name>jobTracker</name>
>>           <value>${jobTracker}</value>
>>         </property>
>>         <property>
>>           <name>nameNode</name>
>>           <value>${nameNode}</value>
>>         </property>
>>         <property>
>>           <name>queueName</name>
>>           <value>${queueName}</value>
>>         </property>
>>         <property>
>>           <name>inputData</name>
>>           <value>${coord:dataIn('input')}</value>
>>         </property>
>>         <property>
>>           <name>outputData</name>
>>           <value>${coord:dataOut('output')}</value>
>>         </property>
>>       </configuration>
>>     </workflow>
>>   </action>
>> </coordinator-app>
>> =======================================================================
>>
>> I ran this job and put some data into coord_input, but nothing happened.
>>
>>
>> good luck,
>> huangs
>>
>> [email protected]
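
For reference, Joe's suggestion applied to the "input" dataset from the quoted coordinator might look like the sketch below. It assumes the rest of the coordinator definition stays unchanged. Per the documentation quoted above, an empty done-flag makes the coordinator check only for the existence of the directory itself.

```xml
<!-- Sketch only: the "input" dataset from the quoted coordinator, with an
     empty done-flag added so that Oozie does not wait for a _SUCCESS file. -->
<dataset name="input" frequency="${coord:minutes(1)}"
         initial-instance="2010-01-01T00:00Z" timezone="UTC">
  <uri-template>${nameNode}/user/${coord:user()}/coord_input</uri-template>
  <!-- Empty done-flag: the directory's existence alone satisfies the dependency. -->
  <done-flag></done-flag>
</dataset>
```

As far as the quoted documentation goes, this only changes what Oozie looks for in the dataset directory. It does not by itself make the coordinator fire once per change to that directory.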
