Here is the block info for the file I distributed. As can be seen, only 10.12.11.210 has all the data, and this is the node which is serving all the requests. Replicas are available on 209 as well as 211.
1073741857: 10.12.11.210:50010  10.12.11.209:50010
1073741858: 10.12.11.210:50010  10.12.11.211:50010
1073741859: 10.12.11.210:50010  10.12.11.209:50010
1073741860: 10.12.11.210:50010  10.12.11.211:50010
1073741861: 10.12.11.210:50010  10.12.11.209:50010
1073741862: 10.12.11.210:50010  10.12.11.209:50010
1073741863: 10.12.11.210:50010  10.12.11.209:50010
1073741864: 10.12.11.210:50010  10.12.11.209:50010

--Ashish

On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <[email protected]> wrote:

> Hello Chris,
>
> I now have a cluster with 3 nodes and a replication factor of 2. When I
> distribute a file I can see that replicas of the data are available on
> the other nodes. However, when I run a map reduce job, again only one
> node is serving all the requests :(. Can you or anyone please provide
> some more input?
>
> Thanks
> Ashish
>
>
> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <[email protected]> wrote:
>
>> 2 nodes and a replication factor of 2 result in a replica of each block
>> being present on each node. This allows the possibility that a single
>> node does all the work and is still data-local. It will probably happen
>> if that single node has the needed capacity. More nodes than the
>> replication factor are needed to force distribution of the processing.
>> Chris
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <[email protected]> wrote:
>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just now ran the job
>>> again and could see that CPU usage goes high on only one server while
>>> the other server's CPU usage remains flat, which means the other node
>>> is not being used. Can someone help me debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <[email protected]> wrote:
>>>
>>>> Hello All,
>>>>
>>>> I have a 2 node hadoop cluster running with a replication factor of 2.
>>>> I have a file of around 1 GB which, when copied to HDFS, is replicated
>>>> to both nodes. Looking at the block info I can see the file has been
>>>> split into 8 blocks of 128 MB each. I use this file as input to run the
>>>> word count program. Somehow I feel only one node is doing all the work
>>>> and the code is not distributed to the other node. How can I make sure
>>>> the code is distributed to both nodes? Also, is there a log or GUI which
>>>> can be used to verify this? Please note I am using the latest stable
>>>> release, 2.2.0.
>>>>
>>>> ++Ashish
>>>>
>>>
>>>
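For reference, the block placement shown above can also be checked programmatically through Hadoop's standard FileSystem API rather than the NameNode web UI. Below is a minimal sketch (the class name and the input-path argument are placeholders, not taken from the thread) that prints the datanodes holding each block of a given HDFS file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.util.Arrays;

    // Minimal sketch: print which datanodes hold each block of an HDFS file.
    // The file path is passed as the first command-line argument (placeholder).
    public class ListBlockLocations {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path(args[0]);

            FileStatus status = fs.getFileStatus(file);
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(status, 0, status.getLen());

            for (BlockLocation block : blocks) {
                // getHosts() returns the datanodes that store a replica of this block
                System.out.println("offset=" + block.getOffset()
                        + " length=" + block.getLength()
                        + " hosts=" + Arrays.toString(block.getHosts()));
            }
            fs.close();
        }
    }

Run against the 1 GB input described in the thread, this should list the same 8 blocks, each with two hosts, matching the listing above.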
