1 - From your source, create a directory in HDFS that follows a yyyy-MM-dd-HH pattern.
2 - Put your files there through HDFS or a WebHDFS client.
3 - In the same directory, or in a separate one with the same pattern, touchz a file that will be used as a "ready" flag.
4 - Make an Oozie coordinator that triggers your action based on this OK file, used as the done-flag of an input dataset.
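Steps 1-3 can be sketched as below. The base path /data/logs, the two-hour window, and the _SUCCESS flag name are assumptions for illustration; the hdfs commands are shown as comments since they need a live cluster:

```shell
# Compute the yyyy-MM-dd-HH partition directory for the current
# two-hour window (logs arrive every 2 hours per the use case)
HOUR=$(date -u +%H)
EVEN_HOUR=$(printf '%02d' $((10#$HOUR / 2 * 2)))
PART="/data/logs/$(date -u +%Y-%m-%d)-$EVEN_HOUR"
echo "$PART"

# Against the cluster (steps 1-3):
# hdfs dfs -mkdir -p "$PART"                 # 1: create the partition dir
# hdfs dfs -put local-logs/*.log "$PART"/    # 2: publish the flat files
# hdfs dfs -touchz "$PART"/_SUCCESS          # 3: zero-byte "ready" flag
```

Writing the flag last is what makes the hand-off atomic: the downstream coordinator only fires once the flag exists, so it can never see a half-written directory.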
Use the examples in the yahoo/oozie GitHub repo; you should find all you need there.

On Aug 14, 2015 at 8:13 AM, "Gyani Sinha" <[email protected]> wrote:

> Hi team,
>
> I am working on a use case where I have to pull logs (flat files created
> by a batch process every 2 hours) from a non-Hadoop source server and
> publish them to HDFS on the target Hadoop cluster. A downstream process
> then performs analytics on the data in near real time (as and when HDFS
> is updated).
>
> I wanted to understand whether this can be done using Oozie alone, or
> whether I would need Flume. I am not very familiar with Oozie's features.
>
> How will it handle the trigger/event when the source files are published
> and the updates are made to HDFS, so that only then does the subsequent
> process run? How can I configure an event that marks the end of an update
> on HDFS for the transaction?
>
> Any help will be much appreciated!
>
> Thanks in advance.
>
> Sent from my iPhone
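For step 4, a coordinator with a done-flag dataset might look roughly like this. It is a sketch, not a drop-in app: the app name, HDFS paths, start/end dates, and workflow are all assumptions; only the done-flag/dataset mechanism itself is standard Oozie:

```xml
<!-- Coordinator that waits for the "ready" flag before running the workflow -->
<coordinator-app name="log-ingest-coord" frequency="${coord:hours(2)}"
                 start="2015-08-14T00:00Z" end="2016-08-14T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="logs" frequency="${coord:hours(2)}"
             initial-instance="2015-08-14T00:00Z" timezone="UTC">
      <!-- The yyyy-MM-dd-HH directory pattern from steps 1-3 -->
      <uri-template>hdfs://namenode/data/logs/${YEAR}-${MONTH}-${DAY}-${HOUR}</uri-template>
      <!-- Oozie materializes an action only once this file exists -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://namenode/apps/analytics-wf</app-path>
      <configuration>
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

Note this is polling-based, not push-based: Oozie checks for the done-flag on its own schedule, so "real time" here really means "within the coordinator's materialization/poll interval". If you need lower latency than that, a streaming collector like Flume is the better fit.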
