Hi team,

I am working on a use case where I have to pull logs (which are actually flat
files created by a batch process every 2 hours) from a non-Hadoop source server
and publish them to HDFS on the target Hadoop cluster. A subsequent downstream
process would then perform analytics on the data in near real time (as and when
HDFS is updated).
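
To make the use case concrete, the publish step I run today is roughly the
small Java client below, using the HDFS FileSystem API. The staging directory,
target directory and batch id are placeholders, not the real values.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough sketch of the 2-hourly publish step; the staging directory,
// target directory and batch id below are placeholders.
public class PublishLogBatch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        // Flat files produced by the batch process, already pulled onto an edge node
        Path localBatch = new Path("file:///staging/logs/batch_08");
        // Per-batch directory on the target cluster
        Path hdfsTarget = new Path("/data/logs/batch_08");

        // Copy the whole batch into HDFS in one shot
        fs.copyFromLocalFile(false /* delSrc */, true /* overwrite */, localBatch, hdfsTarget);
        fs.close();
    }
}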

I wanted to understand whether this can be done using Oozie alone, or whether I
would also need Flume. I am not very familiar with Oozie's features.

How would it handle the trigger/event when the source files are published and
the updates are made to HDFS, so that the subsequent process runs only after
that? And how can I configure an event that marks the end of an update on HDFS
for the transaction (i.e. each 2-hourly batch)?
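
For the "end of update" part, the only idea I have so far is to write an empty
flag file after the copy finishes and have the downstream process wait for it,
roughly as sketched below (the directory and flag name are placeholders). I am
not sure whether this is the pattern Oozie expects or whether there is a better
built-in mechanism.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Possible "end of update" marker: an empty flag file written only after the
// batch copy has finished, which the downstream process could wait for.
public class MarkBatchComplete {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        Path batchDir = new Path("/data/logs/batch_08");
        Path doneFlag = new Path(batchDir, "_SUCCESS");  // zero-byte marker file

        fs.createNewFile(doneFlag);  // batch is now complete and safe to read
        fs.close();
    }
}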


Any help will be much appreciated!!


Thanks in advance.
