Ok, thanks for that information.
As I said, I am running 2 datanodes on the same machine, so my Hadoop home has 2
conf folders, conf and conf2, and in turn 2 hdfs-site.xml files, one in each conf
folder.
I guess the dfs.replication value in hdfs-site.xml of the conf folder should be 3.
What should I have in conf2? Should it be 1 there?
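
For reference, this is the snippet I mean in conf/hdfs-site.xml (just a sketch of
the relevant property; conf2/hdfs-site.xml has the same property, and it is its
value I am unsure about):

  <property>
    <name>dfs.replication</name>
    <value>3</value>  <!-- my guess for the conf folder -->
  </property>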

Sorry if the question sounds stupid, but I am unfamiliar with this kind of setup
(2 datanodes on the same machine, hence the 2 conf folders).


If data is split across multiple datanodes, then processing capacity would be
improved (that is my guess). Since my file is only 240 KB, it occupies only one
block; it cannot use a second block on another datanode. So does it make sense to
reduce the block size so that the blocks are split between the 2 datanodes, if I
want to take full advantage of having multiple datanodes?
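
If reducing the block size is the way to go, I suppose I would add something like
this to hdfs-site.xml (a sketch only; dfs.block.size is given in bytes and must be
a multiple of 512, and 131072 = 128 KB is just a value I picked so that a 240 KB
file would span two blocks):

  <property>
    <name>dfs.block.size</name>
    <value>131072</value>  <!-- 128 KB; a 240 KB file then needs 2 blocks -->
  </property>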


Best Regards,
Sindhu


On 25 May 2014, at 21:47, Peyman Mohajerian <mohaj...@gmail.com> wrote:

> Block sizes are typically 64 MB or 128 MB, so in your case only a single block
> is involved, which means that if you have a single replica then only a single
> data node will be used. The default replication is three, and since you only
> have two data nodes, you will most likely end up with two copies of the data on
> two separate data nodes.
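> 
> You can confirm where the block replicas actually live by running fsck against
> the file. For example (a sketch; substitute the HDFS path of your own file):
> 
>     hadoop fsck /path/to/your/file -files -blocks -locations
> 
> The -locations flag lists the datanode(s) holding each replica of each block.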
> 
> 
> On Sun, May 25, 2014 at 12:40 PM, Sindhu Hosamane <sindh...@gmail.com> wrote:
> 
>>> Hello Friends, 
>>> 
>>> I am running multiple datanodes on a single machine.
>>> 
>>> The output of the jps command shows:
>>> NameNode     DataNode     DataNode     JobTracker     TaskTracker
>>> SecondaryNameNode
>>> 
>>> This assures me that both datanodes are up and running. I execute Cascalog
>>> queries on this 2-datanode Hadoop cluster, and I get the results of the
>>> queries too.
>>> I am not sure if it is really using both datanodes (because I would get
>>> results with one datanode anyway).
>>> 
>>> (I read somewhere about HDFS storing data in datanodes, as below:)
>>> 1) An HDFS scheme might automatically move data from one DataNode to
>>> another if the free space on a DataNode falls below a certain threshold.
>>> 2) Internally, a file is split into one or more blocks, and these blocks
>>> are stored in a set of DataNodes.
>>> 
>>> My doubts are:
>>> * Do I have to make any configuration changes in Hadoop to tell it to share
>>> data blocks between the 2 datanodes, or does it do that automatically?
>>> * Also, my test data is not too big; it is only 240 KB. According to point
>>> 1), I don't know if such small test data can trigger automatic movement of
>>> data from one datanode to another.
>>> * Also, what should the dfs.replication value be when I am running 2
>>> datanodes? (I guess it is 2.)
>>> 
>>> 
>>> Any advice or help would be very much appreciated.
>>> 
>>> Best Regards,
>>> Sindhu