German, this does not seem to be helping. I tried to use the FairScheduler as my scheduler but the behavior remains the same. I can see the fairscheduler log receiving continuous heartbeats from both of the other nodes, but it is still not distributing any work to them. What I did next was start 3 jobs simultaneously, hoping that at least some part of one of the jobs would be distributed to the other nodes. However, still only one node is being used :(((. What is going wrong? Can someone help?
Sample of the fairscheduler log:

2014-01-13 15:13:54,293 HEARTBEAT l1dev-211
2014-01-13 15:13:54,953 HEARTBEAT l1-dev06
2014-01-13 15:13:54,988 HEARTBEAT l1-DEV05
2014-01-13 15:13:55,295 HEARTBEAT l1dev-211
2014-01-13 15:13:55,956 HEARTBEAT l1-dev06
2014-01-13 15:13:55,993 HEARTBEAT l1-DEV05
2014-01-13 15:13:56,297 HEARTBEAT l1dev-211
2014-01-13 15:13:56,960 HEARTBEAT l1-dev06
2014-01-13 15:13:56,997 HEARTBEAT l1-DEV05
2014-01-13 15:13:57,299 HEARTBEAT l1dev-211
2014-01-13 15:13:57,964 HEARTBEAT l1-dev06
2014-01-13 15:13:58,001 HEARTBEAT l1-DEV05

My data is distributed as blocks across the nodes. The host with IP 10.12.11.210 has all the data, and this is the one which is serving all the requests.

Total number of blocks: 8
1073741866: 10.12.11.211:50010  10.12.11.210:50010
1073741867: 10.12.11.211:50010  10.12.11.210:50010
1073741868: 10.12.11.210:50010  10.12.11.209:50010
1073741869: 10.12.11.210:50010  10.12.11.209:50010
1073741870: 10.12.11.211:50010  10.12.11.210:50010
1073741871: 10.12.11.210:50010  10.12.11.209:50010
1073741872: 10.12.11.211:50010  10.12.11.210:50010
1073741873: 10.12.11.210:50010  10.12.11.209:50010

Someone please advise on how to go about this.

--Ashish

On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <[email protected]> wrote:

> Thanks for all these suggestions. Somehow I do not have access to the
> servers today, so I will try the suggestions made on Monday and will let you
> know how it goes.
>
> --Ashish
>
>
> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
> [email protected]> wrote:
>
>> Ashish
>>
>> Could this be related to the scheduler you are using and its settings?
>>
>> In lab environments, when running a single type of job, I often use
>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>> a good job distributing the load.
>>
>> You could give that a try (
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>> )
>>
>> I think just changing yarn-site.xml as follows could demonstrate this
>> theory (note that how the jobs are scheduled depends on resources such as
>> memory on the nodes, and you would need to set up yarn-site.xml accordingly):
>>
>> <property>
>>   <name>yarn.resourcemanager.scheduler.class</name>
>>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>> </property>
>>
>> Regards
>> ./g
>>
>> *From:* Ashish Jain [mailto:[email protected]]
>> *Sent:* Thursday, January 09, 2014 6:46 AM
>> *To:* [email protected]
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>> Another point to add here: 10.12.11.210 is the host which has everything
>> running, including a slave datanode. Data was also distributed to this host,
>> as was the jar file. The following are running on 10.12.11.210:
>>
>> 7966 DataNode
>> 8480 NodeManager
>> 8353 ResourceManager
>> 8141 SecondaryNameNode
>> 7834 NameNode
>>
>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <[email protected]> wrote:
>>
>> The logs were updated only when I copied the data. After copying the data
>> there have been no updates to the log files.
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <[email protected]>
>> wrote:
>>
>> Do the logs on the three nodes contain anything interesting?
>> Chris
>>
>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <[email protected]> wrote:
>>
>> Here is the block info for the file I distributed. As can be seen, only
>> 10.12.11.210 has all the data, and this is the node which is serving all the
>> requests.
>> Replicas are available on 209 as well as 210.
>>
>> 1073741857: 10.12.11.210:50010  10.12.11.209:50010
>> 1073741858: 10.12.11.210:50010  10.12.11.211:50010
>> 1073741859: 10.12.11.210:50010  10.12.11.209:50010
>> 1073741860: 10.12.11.210:50010  10.12.11.211:50010
>> 1073741861: 10.12.11.210:50010  10.12.11.209:50010
>> 1073741862: 10.12.11.210:50010  10.12.11.209:50010
>> 1073741863: 10.12.11.210:50010  10.12.11.209:50010
>> 1073741864: 10.12.11.210:50010  10.12.11.209:50010
>>
>> --Ashish
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <[email protected]> wrote:
>>
>> Hello Chris,
>>
>> I now have a cluster with 3 nodes, the replication factor being 2. When I
>> distribute a file I can see that replicas of the data are available on the
>> other nodes. However, when I run a map reduce job, again only one node is
>> serving all the requests :(. Can you or anyone else please provide some more
>> input?
>>
>> Thanks
>> Ashish
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <[email protected]>
>> wrote:
>>
>> 2 nodes and a replication factor of 2 result in a replica of each block
>> being present on each node. This allows the possibility that a single node
>> does all the work and yet stays data-local. It will probably happen if that
>> single node has the needed capacity. More nodes than the replication
>> factor are needed to force distribution of the processing.
>> Chris
>>
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <[email protected]> wrote:
>>
>> Guys,
>>
>> I am sure that only one node is being used. I just now ran the job again
>> and could see the CPU usage go high on only one server, while the other
>> server's CPU usage remains constant, which means the other node is not being used.
>> Can someone help me debug this issue?
>>
>> ++Ashish
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <[email protected]> wrote:
>>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of around 1 GB which, when copied to HDFS, is replicated to
>> both nodes. Looking at the block info I can see the file has been
>> subdivided into 8 blocks, each of size 128 MB. I use this file as input to
>> run the word count program. Somehow I feel only one node is doing all the
>> work and the code is not distributed to the other node. How can I make sure
>> the code is distributed to both nodes? Also, is there a log or GUI which can
>> be used to check this?
>>
>> Please note I am using the latest stable release, that is 2.2.0.
>>
>> ++Ashish
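[Editor's note from a later reader, summarizing the thread:] The block listings in this thread already suggest why one node does everything: 10.12.11.210 holds a replica of every block, so a scheduler that prefers data-local containers can satisfy every map task on that one host as long as it has free capacity (which is exactly Chris's point about replication factor vs. node count). A minimal sketch in plain Python, with the replica locations copied from the newest listing above, makes the pattern explicit:

```python
from collections import Counter

# Block ID -> replica locations, copied from the block listing in the thread.
blocks = {
    1073741866: ["10.12.11.211", "10.12.11.210"],
    1073741867: ["10.12.11.211", "10.12.11.210"],
    1073741868: ["10.12.11.210", "10.12.11.209"],
    1073741869: ["10.12.11.210", "10.12.11.209"],
    1073741870: ["10.12.11.211", "10.12.11.210"],
    1073741871: ["10.12.11.210", "10.12.11.209"],
    1073741872: ["10.12.11.211", "10.12.11.210"],
    1073741873: ["10.12.11.210", "10.12.11.209"],
}

# Count how many blocks each datanode holds a replica of.
replicas = Counter(host for hosts in blocks.values() for host in hosts)
print(replicas["10.12.11.210"])  # → 8: a replica of every block

# If one host appears in every block's replica set, every map task can be
# scheduled data-local on that single host.
all_local_to_210 = all("10.12.11.210" in hosts for hosts in blocks.values())
print(all_local_to_210)  # → True
```

Running this shows 10.12.11.210 with 8 replicas versus 4 each on .209 and .211, so no split ever *needs* another node: work only spreads once 10.12.11.210 runs out of container capacity.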

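[Editor's note, following up on German's remark that scheduling depends on node resources such as memory:] if a single NodeManager advertises enough memory to hold all of a small job's containers, one node can legitimately absorb the whole job. The yarn-site.xml memory knobs involved look roughly like the sketch below; the property names are real YARN settings, but the values here are illustrative assumptions, not recommendations:

```xml
<!-- yarn-site.xml (illustrative values; tune per node) -->
<property>
  <!-- Total memory each NodeManager offers to containers. -->
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <!-- Smallest container allocation the scheduler will grant. -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```

With these assumed values, one NodeManager can host at most four 1 GB containers, so an 8-map job plus its ApplicationMaster could not fit on a single node and would have to spread; with a much larger per-node allocation, the one data-rich node can run everything itself.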