You might want to take a look at ZooKeeper as a coordination mechanism for deciding when to process which file.
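Roughly, the idea would be: each 30-minute window maps to one file, and a coordinator (e.g. a ZooKeeper znode per window) records the last window that has already been run through Pig, so exactly one worker picks up each closed file. A minimal sketch of the windowing bookkeeping in Python, with the file-naming scheme entirely assumed (the ZooKeeper part is only indicated in comments):

```python
# Rough sketch, not tested against a real cluster. The path layout
# "/data/events-YYYYMMDD-HHMM" is an assumption for illustration.
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)

def window_start(ts):
    """Floor a timestamp to the start of its 30-minute window."""
    return ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)

def file_for(ts):
    """Assumed naming convention: one HDFS file per 30-minute window."""
    return "/data/events-%s" % window_start(ts).strftime("%Y%m%d-%H%M")

def closed_files(now, last_processed):
    """All windows that closed after last_processed and before now.
    In practice last_processed would live in a ZooKeeper znode so
    concurrent workers agree on which file to analyze next."""
    start = window_start(last_processed) + WINDOW
    files = []
    while start + WINDOW <= now:       # only fully closed windows
        files.append(file_for(start))
        start += WINDOW
    return files
```

Each returned path would then be fed to a Pig job, and the znode updated only after the results land in the RDBMS, so a crashed worker just causes a re-run rather than a skipped window.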
On Tue, Jan 10, 2012 at 12:42 PM, rakesh sharma <[email protected]> wrote:
>
> Hi All,
> I am quite new to the Hadoop world and am trying to work on a project using
> Hadoop and Pig. The data is continuously being written into Hadoop by many
> producers. All producers concurrently write data to the same file for a
> 30-minute duration. After 30 minutes, a new file is created and they start
> writing to it. I need to run Pig jobs to analyze the data from Hadoop
> incrementally and push the resulting data into an RDBMS. I am wondering
> what the right way to implement this would be.
> Thanks,
> RS
