Hi team, I am working on a use case where I have to pull logs (flat files created by a batch process every 2 hours) from a non-Hadoop source server and publish them to HDFS on the target Hadoop cluster. A downstream process would then perform analytics on that data, mostly in real time (as and when HDFS is updated).
I wanted to understand whether this can be done using Oozie alone, or whether I would need Flume. I am not very familiar with Oozie's features. How would it handle the trigger/event, i.e. the source files are published, the updates land in HDFS, and only then should the subsequent process run? And how can I configure an event that marks the end of the HDFS update for a given batch? Any help will be much appreciated! Thanks in advance.
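For context, this is roughly what I had in mind after skimming the Oozie docs: a coordinator that waits for a done-flag file in the batch directory before kicking off the downstream workflow. All names, paths and dates below are just placeholders, so please correct me if this is the wrong approach:

<!-- Coordinator sketch (placeholder names/paths): runs every 2 hours and only
     triggers the analytics workflow once a _SUCCESS marker appears in the
     directory for that 2-hour batch. -->
<coordinator-app name="log-analytics-coord" frequency="${coord:hours(2)}"
                 start="2015-01-01T00:00Z" end="2016-01-01T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="logs" frequency="${coord:hours(2)}"
             initial-instance="2015-01-01T00:00Z" timezone="UTC">
      <!-- One directory per 2-hour batch pushed from the source server -->
      <uri-template>hdfs:///data/incoming/logs/${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
      <!-- Coordinator waits until this marker file exists in the directory -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <!-- Downstream analytics workflow (placeholder path) -->
      <app-path>hdfs:///apps/log-analytics-wf</app-path>
    </workflow>
  </action>
</coordinator-app>

My thinking was that whatever job pulls the files from the source server would write the _SUCCESS file as its last step, so the coordinator only fires once the 2-hourly batch has fully landed in HDFS. Does that sound right, or is Flume the better fit for the pull itself?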
