Hello Chris,

I now have a cluster with 3 nodes and a replication factor of 2. When I distribute a file I can see that replicas of the data are available on the other nodes. However, when I run a MapReduce job, again only one node is serving all the requests :(. Can you or anyone else please provide some more input?

Thanks
Ashish

On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <[email protected]> wrote:

> 2 nodes and replication factor of 2 results in a replica of each block
> present on each node. This would allow the possibility that a single node
> would do the work and yet be data local. It will probably happen if that
> single node has the needed capacity. More nodes than the replication
> factor are needed to force distribution of the processing.
> Chris
> On Jan 8, 2014 7:35 AM, "Ashish Jain" <[email protected]> wrote:
>
>> Guys,
>>
>> I am sure that only one node is being used. I just now ran the job again
>> and could see the CPU usage going high on only one server; the other server's
>> CPU usage remains constant, which means the other node is not being used.
>> Can someone help me to debug this issue?
>>
>> ++Ashish
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <[email protected]> wrote:
>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>>> have a file of size around 1 GB which when copied to HDFS is replicated to
>>> both the nodes. Seeing the block info I can see the file has been
>>> subdivided into 8 parts which means it has been subdivided into 8 blocks
>>> each of size 128 MB. I use this file as input to run the word count
>>> program. Somehow I feel only one node is doing all the work and the code
>>> is not distributed to the other node. How can I make sure code is distributed
>>> to both the nodes? Also is there a log or GUI which can be used for this?
>>> Please note I am using the latest stable release that is 2.2.0.
>>>
>>> ++Ashish
>>>
>>
>>
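
For anyone following the thread, here is a rough sketch of the arithmetic behind Chris's point, not a description of how HDFS actually places blocks. The function names and the node names (node1, node2, node3) are made up for illustration, and the round-robin placement is an assumption chosen only to keep the example deterministic. The optional "writer" argument models one further assumption: that the first replica of every block lands on the node the file was copied from, which may be what happens when the copy is run on a datanode. It would be worth checking the block locations on the real cluster rather than trusting this toy model.

```python
import math

def num_blocks(file_size_mb, block_size_mb=128):
    """Number of blocks a file of the given size is split into."""
    return math.ceil(file_size_mb / block_size_mb)

def place_blocks(n_blocks, nodes, replication, writer=None):
    """Toy replica placement (an assumption, NOT the real HDFS policy).

    Without `writer`: block i goes to `replication` consecutive nodes,
    starting at position i % len(nodes).
    With `writer`: the first replica always goes to `writer`, and the
    remaining replicas rotate over the other nodes.
    """
    placement = []
    for i in range(n_blocks):
        if writer is None:
            replicas = {nodes[(i + r) % len(nodes)] for r in range(replication)}
        else:
            others = [n for n in nodes if n != writer]
            replicas = {writer}
            replicas |= {others[(i + r) % len(others)]
                         for r in range(replication - 1)}
        placement.append(replicas)
    return placement

def nodes_holding_every_block(placement, nodes):
    """Nodes that could run every map task data-locally on their own."""
    return [n for n in nodes if all(n in replicas for replicas in placement)]

two_nodes = ["node1", "node2"]
three_nodes = ["node1", "node2", "node3"]

blocks = num_blocks(1024)  # 1 GB file, 128 MB blocks

# 2 nodes, replication 2: every node has every block.
full_2 = nodes_holding_every_block(place_blocks(blocks, two_nodes, 2), two_nodes)

# 3 nodes, replication 2, evenly spread: no single node has every block.
full_3 = nodes_holding_every_block(place_blocks(blocks, three_nodes, 2), three_nodes)

# 3 nodes, replication 2, first replica pinned to the writing node.
full_3_writer = nodes_holding_every_block(
    place_blocks(blocks, three_nodes, 2, writer="node1"), three_nodes)

print(blocks, full_2, full_3, full_3_writer)
```

In this toy model, the 2-node case leaves every node able to serve the whole job locally, the evenly spread 3-node case leaves none, and the pinned-writer case again leaves one node holding everything, which would be consistent with a single node doing all the work even on 3 nodes.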
