If I want the dataset to trigger the coordinator only once after it changes, how
should I set the done-flag? Or is there some other trick I should use?
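
For reference, here is my input dataset with your suggestion applied. It is a
sketch of how I read the docs, not something I have tested; the only change
from the coordinator quoted below is the empty done-flag element:

=======================================================================
<dataset name="input" frequency="${coord:minutes(1)}"
         initial-instance="2010-01-01T00:00Z" timezone="UTC">
    <uri-template>${nameNode}/user/${coord:user()}/coord_input</uri-template>
    <!-- Empty done-flag: wait for the directory itself, not for _SUCCESS -->
    <done-flag></done-flag>
</dataset>
=======================================================================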

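Another trick I am considering, also untested: in my coordinator (quoted
below) the input uri-template has no time variables, so every coordinator
action resolves to the same coord_input directory and a single flag file
cannot mark one particular change. If each batch were written into its own
time-stamped subdirectory, each action would wait for its own flag. The
minute-level directory layout and the explicit _SUCCESS flag below are my own
assumptions:

=======================================================================
<dataset name="input" frequency="${coord:minutes(1)}"
         initial-instance="2010-01-01T00:00Z" timezone="UTC">
    <!-- Assumed layout: one directory per minute, e.g. coord_input/2012/11/30/01/45 -->
    <uri-template>${nameNode}/user/${coord:user()}/coord_input/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
    <!-- Assumption: the writer creates _SUCCESS once the batch is complete -->
    <done-flag>_SUCCESS</done-flag>
</dataset>

<data-in name="input" dataset="input">
    <!-- Depend only on the instance for this action's nominal minute -->
    <instance>${coord:current(0)}</instance>
</data-in>
=======================================================================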

good luck,
huangs
[email protected]



On 2012-11-30, at 1:45 AM, Joe Crobak wrote:

> Oozie is looking for a _SUCCESS flag in the directory specified as your
> coordinator dataset. To disable it, add <done-flag></done-flag> to your
> <dataset> definition. Here is the relevant part of the docs from your link:
> 
> *done-flag:* The done file for the data set. If done-flag is not specified,
> then Oozie configures Hadoop to create a _SUCCESS file in the output
> directory. If the done flag is set to empty, then Coordinator looks for the
> existence of the directory itself.
> 
> 
> On Thu, Nov 29, 2012 at 12:13 PM, 黄 山 <[email protected]> wrote:
> 
>> I read some of the coordinator documentation:
>> 
>> http://oozie.apache.org/docs/3.2.0-incubating/CoordinatorFunctionalSpec.html
>> 
>> Oozie checks the specified path periodically.
>> 
>> But what I want is a dataset that triggers the application whenever an
>> HDFS path changes.
>> 
>> I wrote the coordinator XML below:
>> 
>> =======================================================================
>> <coordinator-app name="aggregator-coord" frequency="1" start="${start}"
>>                  end="${end}" timezone="UTC"
>>                  xmlns="uri:oozie:coordinator:0.2">
>>    <controls>
>>        <concurrency>1</concurrency>
>>    </controls>
>> 
>>    <datasets>
>>        <dataset name="input" frequency="${coord:minutes(1)}"
>>                 initial-instance="2010-01-01T00:00Z" timezone="UTC">
>>            <uri-template>${nameNode}/user/${coord:user()}/coord_input</uri-template>
>>        </dataset>
>>        <dataset name="output" frequency="${coord:minutes(1)}"
>>                 initial-instance="2010-01-01T01:00Z" timezone="UTC">
>>            <uri-template>${nameNode}/user/${coord:user()}/coord_output/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
>>        </dataset>
>>    </datasets>
>> 
>>    <input-events>
>>        <data-in name="input" dataset="input">
>>            <start-instance>${coord:current(-2)}</start-instance>
>>            <end-instance>${coord:current(0)}</end-instance>
>>        </data-in>
>>    </input-events>
>> 
>>    <output-events>
>>        <data-out name="output" dataset="output">
>>            <instance>${coord:current(0)}</instance>
>>        </data-out>
>>    </output-events>
>> 
>>    <action>
>>        <workflow>
>>            <app-path>${nameNode}/user/${coord:user()}/examples/apps/streaming</app-path>
>>            <configuration>
>>                <property>
>>                    <name>jobTracker</name>
>>                    <value>${jobTracker}</value>
>>                </property>
>>                <property>
>>                    <name>nameNode</name>
>>                    <value>${nameNode}</value>
>>                </property>
>>                <property>
>>                    <name>queueName</name>
>>                    <value>${queueName}</value>
>>                </property>
>>                <property>
>>                    <name>inputData</name>
>>                    <value>${coord:dataIn('input')}</value>
>>                </property>
>>                <property>
>>                    <name>outputData</name>
>>                    <value>${coord:dataOut('output')}</value>
>>                </property>
>>            </configuration>
>>        </workflow>
>>    </action>
>> </coordinator-app>
>> =======================================================================
>> 
>> I ran this job and put some data into coord_input, but nothing happened.
>> 
>> 
>> good luck,
>> huangs
>> 
>> [email protected]
>> 
>> 
>> 
>> 
