1 - From your source, create a directory in HDFS that follows a yyyy-mm-dd-hh
pattern.
2 - Put your files there through the hdfs CLI or a WebHDFS client.
3 - In the same directory, or in a separate one with the same pattern, touchz a
file that will be used as a "ready" flag.
4 - Make an Oozie coordinator that triggers your action based on this
flag file, used as an input dataset.
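Steps 1-3 can be sketched from the source side roughly as follows. The /data target path and the local log location are assumptions for illustration; the hdfs commands are guarded so the sketch is a dry run where no HDFS client is installed:

```shell
# Hourly partition directory following the yyyy-mm-dd-hh pattern (step 1).
HOUR_DIR="logs/$(date +%Y-%m-%d-%H)"

if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "/data/$HOUR_DIR"                  # step 1: create the dir
  hdfs dfs -put /var/log/batch/*.log "/data/$HOUR_DIR/" # step 2: upload files
  hdfs dfs -touchz "/data/$HOUR_DIR/_SUCCESS"           # step 3: "ready" flag
else
  echo "hdfs client not found; would have populated /data/$HOUR_DIR"
fi
```

Using an empty `_SUCCESS` file as the flag matches the convention many Hadoop tools already use to mark a completed write.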

Use the examples in the yahoo/oozie GitHub repo; you should find everything you
need there.
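For step 4, a minimal coordinator sketch might look like the following. The app name, namenode host, paths, and start/end dates are hypothetical; the `<done-flag>` element is what tells Oozie to wait for the "ready" file before materializing an action:

```xml
<coordinator-app name="log-ingest-coord" frequency="${coord:hours(2)}"
                 start="2015-08-14T00:00Z" end="2016-08-14T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- One dataset instance every 2 hours, matching the batch cadence. -->
    <dataset name="logs" frequency="${coord:hours(2)}"
             initial-instance="2015-08-14T00:00Z" timezone="UTC">
      <uri-template>hdfs://namenode/data/logs/${YEAR}-${MONTH}-${DAY}-${HOUR}</uri-template>
      <!-- The coordinator waits until this flag file exists in the dir. -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://namenode/apps/log-analytics-wf</app-path>
      <configuration>
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

The downstream workflow then receives the resolved hourly directory via the `inputDir` property and runs only once the flag file has been created.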
On Aug 14, 2015 at 8:13 AM, "Gyani Sinha" <[email protected]> wrote:

> Hi team,
>
> I am working on a use case where I have to pull logs (which are actually
> flat files created by a batch process every 2 hours) from a non-Hadoop
> source server and then publish them to HDFS (the target Hadoop cluster).
> A subsequent downstream process would then perform analytics on the data,
> mostly in real time (as and when HDFS is updated).
>
> I wanted to understand whether this can be done using Oozie only or
> whether I would also need Flume. I am not very familiar with Oozie's
> features.
>
> How will it handle the trigger/event when source files are published and
> updates are made to HDFS, so that only then does the subsequent process
> run? How can I configure an event that marks the end of an update on
> HDFS for the transaction?
>
>
> Any help will be much appreciated!
>
>
> Thanks in advance.
>
> Sent from my iPhone
