OK, thanks for that information. As I said, I am running 2 datanodes on the same machine, so my Hadoop home has 2 conf folders, conf and conf2, and in turn an hdfs-site.xml in each conf folder. I guess the dfs.replication value in hdfs-site.xml of the conf folder should be 3. What should I have in conf2? Should it be 1 there?
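(For reference: dfs.replication is a client-side default applied when a file is written, not a per-datanode setting, so both conf folders would normally carry the same value. A minimal hdfs-site.xml sketch, assuming a 2-datanode setup; the value 2 here is only an illustration:

    <?xml version="1.0"?>
    <configuration>
      <property>
        <!-- Number of replicas HDFS keeps of each block.
             With only two datanodes, a value above 2 leaves
             blocks permanently under-replicated. -->
        <name>dfs.replication</name>
        <value>2</value>
      </property>
    </configuration>
)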
Sorry if the question sounds stupid, but I am unfamiliar with this kind of setup (2 datanodes on the same machine, hence 2 conf folders).

If data is split across multiple datanodes, then processing capacity should improve (that is my guess). Since my file is only 240 KB, it occupies only one block; it cannot use a second block and sit on the other datanode. So does it make sense to reduce the block size so that the blocks are split between the 2 datanodes, if I want to take full advantage of having multiple datanodes?

Best Regards,
Sindhu

On 25 May 2014, at 21:47, Peyman Mohajerian <mohaj...@gmail.com> wrote:

> Block sizes are typically 64 MB or 128 MB, so in your case only a single block is
> involved, which means if you have a single replica then only a single data
> node will be used. The default replication is three, and since you only have
> two data nodes, you will most likely have two copies of the data on two
> separate data nodes.
>
>
> On Sun, May 25, 2014 at 12:40 PM, Sindhu Hosamane <sindh...@gmail.com> wrote:
>
>>> Hello Friends,
>>>
>>> I am running multiple datanodes on a single machine.
>>>
>>> The output of the jps command shows:
>>> Namenode, Datanode, Datanode, Jobtracker, Tasktracker,
>>> Secondary Namenode
>>>
>>> This assures me that 2 datanodes are up and running. I execute Cascalog
>>> queries on this 2-datanode Hadoop cluster, and I get the results of the
>>> queries too. I am not sure if it is really using both datanodes (because I
>>> would get results with one datanode anyway).
>>>
>>> (I read somewhere that HDFS stores data on datanodes like below:)
>>> 1) An HDFS scheme might automatically move data from one DataNode to
>>> another if the free space on a DataNode falls below a certain threshold.
>>> 2) Internally, a file is split into one or more blocks and these blocks
>>> are stored in a set of DataNodes.
>>>
>>> My doubts are:
>>> * Do I have to make any configuration changes in Hadoop to tell it to share
>>> data blocks between 2 datanodes, or does it do so automatically?
>>> * Also, my test data is not too big; it is only 240 KB. According to point
>>> 1), I don't know if such small test data can trigger automatic movement of
>>> data from one datanode to another.
>>> * Also, what should the dfs.replication value be when I am running 2
>>> datanodes? (I guess it's 2.)
>>>
>>>
>>> Any advice or help would be very much appreciated.
>>>
>>> Best Regards,
>>> Sindhu
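(A note on verifying this: the standard fsck and dfsadmin tools show where the blocks and replicas of a file actually live; the path below is only a placeholder for your own file:

    # List the blocks of a file and the datanodes holding each replica
    # (hypothetical path, replace with your own).
    hadoop fsck /user/sindhu/data.txt -files -blocks -locations

    # Per-datanode capacity and usage; confirms both datanodes registered.
    hadoop dfsadmin -report

If you do want to experiment with a smaller block size, the Hadoop 1.x property dfs.block.size (in bytes) in hdfs-site.xml sets the default for newly written files, e.g.:

    <!-- Illustration only: 131072 bytes = 128 KB, so a 240 KB file
         would be written as two blocks. Existing files keep the
         block size they were written with. -->
    <property>
      <name>dfs.block.size</name>
      <value>131072</value>
    </property>

Note that with replication 2 on a 2-datanode cluster, each node holds a copy of every block anyway; a smaller block size would mainly yield more map tasks rather than a different distribution of the data.)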