Please read this once: http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf
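
For the 1 GB example in the question below, here is a rough sketch of the arithmetic. It assumes the typical 64 MB block size and replication factor of 3 mentioned in the design document; your cluster may be configured differently. The function name `block_layout` is made up for illustration and is not part of Hadoop. The NameNode keeps only metadata, so the blocks themselves live on the DataNodes.

```python
# Assumed values: 64 MB blocks and replication factor 3 (typical values
# from the HDFS design document; a real cluster may override both).
BLOCK_SIZE = 64 * 1024 * 1024
REPLICATION = 3


def block_layout(file_size, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (number of blocks, total block replicas) for a file.

    Hypothetical helper for illustration only, not a Hadoop API.
    """
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks, blocks * replication


one_gb = 1024 * 1024 * 1024
blocks, replicas = block_layout(one_gb)
print(blocks, replicas)   # 16 48
print(replicas / 10)      # 4.8 replicas per DataNode if spread perfectly evenly
```

The 4.8 figure is an idealized average for 10 DataNodes. Where each replica actually lands is decided by the replica placement policy described in the document, so the spread is not guaranteed to be even.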

Warm Regards,
Shashwat Shriparv

On Fri, Dec 19, 2014 at 9:18 AM, [email protected] <[email protected]> wrote:
> Hi Hadoopers,
>
> I have a question about the behavior of HDFS.
>
> Say there are 1 namenode and 10 data nodes.
>
> On the namenode machine, I upload a 1 GB file to HDFS. Will this 1 GB file be
> distributed evenly across the data nodes, with no data stored on the
> namenode?
> If I upload the data from a data node, will the file still be distributed
> evenly across all the data nodes? I think that if most of the data resides on
> the node I upload from, it will save network traffic, but this leads to
> another problem: when running MR on this file, most of the time will be spent
> on this node, because it has to process most of the data.
>
> ------------------------------
> [email protected]
