Hi,

You can just use the put command to load files into HDFS: https://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html#put
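If you ever need to do the same thing programmatically rather than from the shell, a minimal sketch using the FileSystem API could look like the following (fs.copyFromLocalFile is roughly the counterpart of hadoop fs -put; the paths below are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPut {
    public static void main(String[] args) throws Exception {
        // Reads the cluster settings (core-site.xml etc.) from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder paths -- substitute your own local file and HDFS target directory
        Path localFile = new Path("/data/local/file-001.dat");
        Path hdfsDir = new Path("/user/shashidhar/input/");

        // Equivalent of: hadoop fs -put /data/local/file-001.dat /user/shashidhar/input/
        fs.copyFromLocalFile(localFile, hdfsDir);
        fs.close();
    }
}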
Copying files into HDFS does not involve mappers or a MapReduce job. Whether you really need a single merged file depends on your processing logic (your MapReduce code). Also, you can set the job's input directory (mapred.input.dir) in the job configuration; a short driver sketch follows the quoted message below.

Regards,
Alok

On Mon, Apr 14, 2014 at 9:58 AM, Shashidhar Rao <[email protected]> wrote:
> Hi,
>
> Please can somebody clarify my doubts. Say I have a cluster of 30 nodes
> and I want to put some files into HDFS. All the files combined are 10 TB
> in size, but each file is roughly 1 GB and there are 10 files in total.
>
> 1. In a real production environment, do we copy these 10 files into HDFS
> under a folder one by one? If so, how many mappers do we specify? 10
> mappers? And do we use Hadoop's put command to transfer the files?
>
> 2. If that is not the case, do we pre-process the 10 files to merge them
> into one 10 TB file and copy that into HDFS?
>
> Regards
> Shashidhar
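As mentioned above, here is a rough driver sketch (old org.apache.hadoop.mapred API) showing where the input directory is set on the job; the paths and job name are placeholders, and the mapper/reducer setup is omitted to keep the focus on the input path:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class InputDirDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(InputDirDriver.class);
        conf.setJobName("process-input-dir");

        // Point the job at the HDFS directory holding all the files; in the old API
        // this is stored under the mapred.input.dir property. Each file contributes
        // one or more input splits, which determine the number of map tasks.
        FileInputFormat.setInputPaths(conf, new Path("/user/shashidhar/input/"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/shashidhar/output/"));

        // conf.setMapperClass(...) / conf.setReducerClass(...) would go here
        JobClient.runJob(conf);
    }
}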
