Hi Hitarth,

If one of the files (say file1) is small enough to hold in memory, you can ship it to every node with the Distributed Cache and do a map-side join, as described in [1]. Alternatively, you can use MultipleInputs to join the two files on the reduce side, as described in [2]. Minimal sketches of both approaches are below.

[1] http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html
[2] http://unmeshasreeveni.blogspot.in/2014/12/joining-two-files-using-multipleinput.html
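A minimal sketch of the Distributed Cache approach from [1] (the HDFS paths and class names here are hypothetical placeholders, not from your setup):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JoinWithCache {

  public static class JoinMapper
      extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context)
        throws IOException, InterruptedException {
      // Cached files are localized on each node; on Hadoop 2 they are
      // symlinked into the task working directory under their base name,
      // so plain Java IO can open them. Load file1 into memory here and
      // join against it record by record in map().
      for (URI cacheFile : context.getCacheFiles()) {
        String localName = new Path(cacheFile.getPath()).getName();
        BufferedReader reader = new BufferedReader(new FileReader(localName));
        // ... read file1 lines into a HashMap keyed on the join field ...
        reader.close();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "map-side join");
    job.setJarByClass(JoinWithCache.class);
    job.setMapperClass(JoinMapper.class);
    job.setNumReduceTasks(0); // map-side join, no reduce phase needed
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    // file1 (the small file) is shipped to every node; file2 is the
    // regular map input.
    job.addCacheFile(new URI("/user/hitarth/file1"));
    FileInputFormat.addInputPath(job, new Path("/user/hitarth/file2"));
    FileOutputFormat.setOutputPath(job, new Path("/user/hitarth/out"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}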
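And a minimal driver sketch of the MultipleInputs approach from [2] (again with hypothetical mapper/reducer names and paths): each file gets its own mapper, both mappers emit (joinKey, taggedValue), and the reducer sees the records from both files for a key together and performs the join.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Job job = Job.getInstance(new Configuration(), "reduce-side join");
job.setJarByClass(JoinDriver.class);
// One mapper per input file; both emit the join key plus a tag that
// tells the reducer which file each record came from.
MultipleInputs.addInputPath(job, new Path("/user/hitarth/file1"),
    TextInputFormat.class, File1Mapper.class);
MultipleInputs.addInputPath(job, new Path("/user/hitarth/file2"),
    TextInputFormat.class, File2Mapper.class);
job.setReducerClass(JoinReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(job, new Path("/user/hitarth/out"));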
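Corey's suggestion quoted below would look roughly like this (paths again hypothetical); if the mapper needs to know which file a record came from, it can check the split's path:

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Both files feed one mapper class; their blocks are unioned into a
// single set of input splits.
FileInputFormat.addInputPaths(job, "/user/hitarth/file1,/user/hitarth/file2");

// Inside the mapper, recover the source file if the join logic needs it:
String source = ((FileSplit) context.getInputSplit()).getPath().getName();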
On Tue, Jan 6, 2015 at 8:53 AM, Ted Yu <[email protected]> wrote:

> Hitarth:
> You can also consider MultiFileInputFormat (and its concrete
> implementations).
>
> Cheers
>
> On Mon, Jan 5, 2015 at 6:14 PM, Corey Nolet <[email protected]> wrote:
>
>> Hitarth,
>>
>> I don't know how much direction you are looking for with regards to the
>> formats of the files, but you can certainly read both files into the
>> third MapReduce job using FileInputFormat by comma-separating the paths
>> to the files. The blocks of both files will essentially be unioned
>> together and the mappers scheduled across your cluster.
>>
>> On Mon, Jan 5, 2015 at 3:55 PM, hitarth trivedi <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a 6-node cluster, and the scenario is as follows:
>>>
>>> I have one MapReduce job which writes file1 to HDFS.
>>> I have another MapReduce job which writes file2 to HDFS.
>>> In a third MapReduce job I need to use file1 and file2 to do some
>>> computation and output the result.
>>>
>>> What is the best way to store file1 and file2 in HDFS so that they
>>> could be used in the third MapReduce job?
>>>
>>> Thanks,
>>> Hitarth

--
Thanks & Regards
Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Centre for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/
