Re: doubt on Hadoop job submission process

2012-08-13 Thread Harsh J
Hi Manoj, Reply inline. On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu manoj...@gmail.com wrote: Hi All, Normal Hadoop job submission process involves: Checking the input and output specifications of the job. Computing the InputSplits for the job. Setup the requisite accounting information

Locks in M/R framework

2012-08-13 Thread David Ginzburg
Hi, I have an HDFS folder and M/R job that periodically updates it by replacing the data with newly generated data. I have a different M/R job that periodically or ad hoc processes the data in the folder. The second job, naturally, fails sometimes, when the data is replaced by newly generated

Re: Locks in M/R framework

2012-08-13 Thread Tim Robertson
How about introducing a distributed coordination and locking mechanism? ZooKeeper would be a good candidate for that kind of thing. On Mon, Aug 13, 2012 at 12:52 PM, David Ginzburg ginz...@hotmail.com wrote: Hi, I have an HDFS folder and M/R job that periodically updates it by replacing the

Re: doubt on Hadoop job submission process

2012-08-13 Thread Manoj Babu
Hi Harsh, Thanks for your reply. Suppose that in my main program I am doing many other activities (reading/writing/updating, all non-Hadoop) before invoking JobClient.runJob(conf). Is there any way to separate the process flow programmatically instead of using a workflow engine? Cheers! Manoj.

Re: doubt on Hadoop job submission process

2012-08-13 Thread Harsh J
Sure, you may separate the logic as you want it to be, but just ensure the configuration object has a proper setJar or setJarByClass done on it before you submit the job. On Mon, Aug 13, 2012 at 4:43 PM, Manoj Babu manoj...@gmail.com wrote: Hi Harsh, Thanks for your reply. Consider from my

Re: Locks in M/R framework

2012-08-13 Thread Harsh J
David, While ZK can solve this, locking may only make you slower. Let's try to keep it simple? Have you considered keeping two directories? One where the older data is moved to (by the first job, instead of replacing files), for consumption by the second job, which triggers by watching this
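
The message above is cut off, so the following is only one possible reading of the lock-free, directory-based idea, not Harsh's exact proposal. It is a minimal sketch that uses the local filesystem as a stand-in for HDFS and assumes a single writer. The names `publish`, `latest_batch`, `ready`, and `batch-NNNNNN` are hypothetical. The writer never overwrites files in place: it stages a complete batch in a private directory and exposes it with one rename, so a reader only ever sees complete batches.

```python
import os
import tempfile


def publish(base_dir, new_files):
    """Writer side: stage a full batch privately, then expose it with one rename."""
    # Stage inside base_dir so the rename stays on the same filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=base_dir)
    for name, text in new_files.items():
        with open(os.path.join(staging, name), "w") as fh:
            fh.write(text)

    ready_root = os.path.join(base_dir, "ready")
    os.makedirs(ready_root, exist_ok=True)

    # Assumption: only one writer runs at a time, so counting entries is safe.
    batch_id = len(os.listdir(ready_root))
    target = os.path.join(ready_root, "batch-%06d" % batch_id)

    # The batch becomes visible to readers only at this point.
    os.rename(staging, target)
    return target


def latest_batch(base_dir):
    """Reader side: return the newest complete batch, or None if there is none."""
    ready_root = os.path.join(base_dir, "ready")
    batches = sorted(os.listdir(ready_root)) if os.path.isdir(ready_root) else []
    return os.path.join(ready_root, batches[-1]) if batches else None
```

A reader that resolves `latest_batch(...)` once at start-up keeps working on that batch even if the writer publishes a newer one in the meantime. Older batches would have to be cleaned up separately once no reader uses them.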