You can create the flag that triggers Oozie yourself, with `hadoop fs -touchz /path/to/dataset/_SUCCESS`
or the equivalent call in the Java API.
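
The docs quoted below say Oozie has Hadoop write a `_SUCCESS` file into job output directories. If you would rather not depend on that file, you can also set the flag name explicitly in the dataset. Here is a minimal sketch based on the `input` dataset quoted below. `_READY` is a hypothetical flag name chosen for illustration, and it assumes that whatever produces the data creates the flag only after the data is complete:

```xml
<dataset name="input" frequency="${coord:minutes(1)}"
         initial-instance="2010-01-01T00:00Z" timezone="UTC">
    <uri-template>${nameNode}/user/${coord:user()}/coord_input</uri-template>
    <!-- _READY is a hypothetical flag name. The instance counts as available
         only once this file exists inside the resolved directory, e.g. after:
         hadoop fs -touchz /user/USERNAME/coord_input/_READY -->
    <done-flag>_READY</done-flag>
</dataset>
```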


On Fri, Nov 30, 2012 at 1:38 AM, 黄 山 <[email protected]> wrote:

> If I want the dataset to be triggered only once after it changes, how should I
> set the done-flag? Or is there some other trick?
>
>
> good luck,
> huangs
> [email protected]
>
>
>
> On 2012-11-30, at 1:45 AM, Joe Crobak wrote:
>
> > Oozie is looking for a _SUCCESS flag in the directory specified as your
> > coordinator dataset. To disable it, add <done-flag></done-flag> to your
> > <dataset> definition. Here are the relevant docs from your link:
> >
> > *done-flag:* The done file for the data set. If done-flag is not specified,
> > then Oozie configures Hadoop to create a _SUCCESS file in the output
> > directory. If the done flag is set to empty, then Coordinator looks for the
> > existence of the directory itself.
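
Applied to the `input` dataset from the coordinator further down, that suggestion would look roughly like this. It is a sketch, and only the `<done-flag>` element is new:

```xml
<dataset name="input" frequency="${coord:minutes(1)}"
         initial-instance="2010-01-01T00:00Z" timezone="UTC">
    <uri-template>${nameNode}/user/${coord:user()}/coord_input</uri-template>
    <!-- Empty done-flag: the coordinator only checks that the directory exists -->
    <done-flag></done-flag>
</dataset>
```
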
> >
> >
> > On Thu, Nov 29, 2012 at 12:13 PM, 黄 山 <[email protected]> wrote:
> >
> >> I read some documents about the coordinator:
> >>
> >> http://oozie.apache.org/docs/3.2.0-incubating/CoordinatorFunctionalSpec.html
> >>
> >> Oozie checks the specified path periodically.
> >>
> >> But what I want is a dataset that is triggered when an HDFS path changes.
> >>
> >> I wrote the coordinator XML below:
> >>
> >> =======================================================================
> >> <coordinator-app name="aggregator-coord" frequency="1" start="${start}"
> >> end="${end}" timezone="UTC"
> >>                 xmlns="uri:oozie:coordinator:0.2">
> >>    <controls>
> >>        <concurrency>1</concurrency>
> >>    </controls>
> >>
> >>    <datasets>
> >>        <dataset name="input" frequency="${coord:minutes(1)}"
> >>                 initial-instance="2010-01-01T00:00Z" timezone="UTC">
> >>            <uri-template>${nameNode}/user/${coord:user()}/coord_input</uri-template>
> >>        </dataset>
> >>        <dataset name="output" frequency="${coord:minutes(1)}"
> >>                 initial-instance="2010-01-01T01:00Z" timezone="UTC">
> >>            <uri-template>${nameNode}/user/${coord:user()}/coord_output/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
> >>        </dataset>
> >>    </datasets>
> >>
> >>    <input-events>
> >>        <data-in name="input" dataset="input">
> >>            <start-instance>${coord:current(-2)}</start-instance>
> >>            <end-instance>${coord:current(0)}</end-instance>
> >>        </data-in>
> >>    </input-events>
> >>
> >>    <output-events>
> >>        <data-out name="output" dataset="output">
> >>            <instance>${coord:current(0)}</instance>
> >>        </data-out>
> >>    </output-events>
> >>
> >>    <action>
> >>        <workflow>
> >>            <app-path>${nameNode}/user/${coord:user()}/examples/apps/streaming</app-path>
> >>            <configuration>
> >>                <property>
> >>                    <name>jobTracker</name>
> >>                    <value>${jobTracker}</value>
> >>                </property>
> >>                <property>
> >>                    <name>nameNode</name>
> >>                    <value>${nameNode}</value>
> >>                </property>
> >>                <property>
> >>                    <name>queueName</name>
> >>                    <value>${queueName}</value>
> >>                </property>
> >>                <property>
> >>                    <name>inputData</name>
> >>                    <value>${coord:dataIn('input')}</value>
> >>                </property>
> >>                <property>
> >>                    <name>outputData</name>
> >>                    <value>${coord:dataOut('output')}</value>
> >>                </property>
> >>            </configuration>
> >>        </workflow>
> >>    </action>
> >> </coordinator-app>
> >> =======================================================================
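
This `uri-template` contains no time variables. As a result, every instance from `current(-2)` to `current(0)`, in every materialized action, resolves to the same `coord_input` directory. Once that directory's flag exists, every later action will also find its input ready. The sketch below shows a per-minute layout. It assumes the producer writes each batch into its own directory (for example `coord_input/2012/11/29/12/13`) and creates the done flag there when it finishes:

```xml
<datasets>
    <dataset name="input" frequency="${coord:minutes(1)}"
             initial-instance="2010-01-01T00:00Z" timezone="UTC">
        <!-- One directory per minute, so each instance has its own done flag -->
        <uri-template>${nameNode}/user/${coord:user()}/coord_input/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
    </dataset>
</datasets>
<input-events>
    <data-in name="input" dataset="input">
        <!-- A single instance per action instead of a current(-2)..current(0) range -->
        <instance>${coord:current(0)}</instance>
    </data-in>
</input-events>
```

Using a single `<instance>` means each minute's directory is picked up by one action rather than by three consecutive ones.
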
> >>
> >> I ran this job and put some data into coord_input, but nothing happened.
> >>
> >>
> >> good luck,
> >> huangs
> >>
> >> [email protected]
> >>
> >>
> >>
> >>
>
>
