You are looking at a two-step workflow here. The first unit of your workflow downloads the file from the external server, writes it to DFS, and returns the file path. The second unit of your workflow reads that input path and processes the data according to your business logic in MR.
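For concreteness, here is a minimal sketch of both units in one driver, assuming the external file is reachable over plain HTTP and using the newer org.apache.hadoop.mapreduce API (Job.getInstance is Hadoop 2.x; on 1.x you would use new Job(conf) instead). The URL, the staging path, and the output path are all placeholders, and the stock identity Mapper stands in for the business logic:

import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExternalInputJob {

    // Unit 1: download from the external server and stage the file on DFS.
    static Path fetchToDfs(Configuration conf) throws Exception {
        Path target = new Path("/staging/external-input.txt");  // hypothetical path
        FileSystem fs = FileSystem.get(conf);
        InputStream in = new URL("http://external-server/data.txt").openStream();  // hypothetical URL
        IOUtils.copyBytes(in, fs.create(target, true), 4096, true);  // final arg closes both streams
        return target;  // handed to unit 2 as the MR input path
    }

    // Unit 2: run MR over the staged path.
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = fetchToDfs(conf);

        Job job = Job.getInstance(conf, "process-external-data");
        job.setJarByClass(ExternalInputJob.class);
        job.setMapperClass(Mapper.class);  // identity mapper; replace with your business logic
        job.setNumReduceTasks(0);          // map-only for this sketch
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, new Path("/output/external"));  // hypothetical path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The second unit only ever sees an HDFS path, so the download step can later be swapped for DistCp, a shell action, or anything else that stages the data, without touching the MR code.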
You can look at Cascading for this; it is easy to build a simple workflow application using it. Other options are Oozie, or you may try Crunch (it is very new, but easy to use as well). A rough Cascading sketch follows the quoted message below.

On Tue, Mar 26, 2013 at 2:49 PM, Agarwal, Nikhil <[email protected]> wrote:

> Hi,
>
> I have a Hadoop cluster up and running. I want to submit an MR job to it,
> but the input data is kept on an external server (outside the Hadoop
> cluster). Can anyone please suggest how I can tell my Hadoop cluster to
> load the input data from the external servers and then run MR on it?
>
> Thanks & Regards,
> Nikhil

--
Nitin Pawar
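As a rough illustration of the Cascading suggestion, here is a sketch against the Cascading 2.x API that reads the staged file from HDFS, applies a placeholder RegexFilter in place of real business logic, and writes the result back to HDFS. Both paths and the "ERROR" pattern are assumptions:

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.regex.RegexFilter;
import cascading.pipe.Each;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class ExternalDataFlow {
    public static void main(String[] args) {
        // Source is the file the download step staged on DFS (hypothetical paths).
        Tap source = new Hfs(new TextLine(), "/staging/external-input.txt");
        Tap sink   = new Hfs(new TextLine(), "/output/filtered");

        // Placeholder business logic: keep only lines containing "ERROR".
        Pipe pipe = new Pipe("process");
        pipe = new Each(pipe, new Fields("line"), new RegexFilter("ERROR"));

        FlowDef flowDef = FlowDef.flowDef()
            .setName("external-data-flow")
            .addSource(pipe, source)
            .addTailSink(pipe, sink);

        Flow flow = new HadoopFlowConnector().connect(flowDef);
        flow.complete();  // plans and runs the flow as MR jobs on the cluster
    }
}

Cascading plans the flow down to one or more MR jobs, so the same two-step shape (stage to DFS, then process) still applies; Oozie would instead express the two units as separate actions in a workflow definition.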
