Voila!! It finally worked :). Thanks a lot for all the support from all the folks on this forum. So here is the summary, for others, of what I finally did to solve this:
1) Change the framework to yarn using mapreduce.framework.name in mapred-site.xml.

2) In yarn-site.xml, add the following properties (see the sketch below for example values):
   <name>yarn.nodemanager.resource.memory-mb</name>
   <name>yarn.scheduler.minimum-allocation-mb</name>

3) In mapred-site.xml, add the following properties:
   <name>mapreduce.map.memory.mb</name>
   <name>mapreduce.reduce.memory.mb</name>
   <name>mapreduce.map.java.opts</name>
   <name>mapreduce.reduce.java.opts</name>

4) Use the capacity scheduler. I think the fair scheduler may also work, but I used the capacity scheduler.

Start the system and run the jobs; they will be distributed across all the nodes. I could see 8 map tasks running because I had 8 data blocks, and all the nodes were serving requests. However, I still see only 1 reduce task; I will address that in a separate post.
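For completeness, here is a rough sketch of what those yarn-site.xml and mapred-site.xml entries can look like. The memory values below are only placeholders, not the exact numbers from my cluster; size them to the RAM actually available on your nodes. The one hard rule, as the "does not have sufficient resource" warnings further down show, is that yarn.nodemanager.resource.memory-mb must be at least as large as the biggest container a job will request, and the -Xmx heap sizes should stay a little below the corresponding container sizes. All of these entries go inside the <configuration> element of the respective file.

yarn-site.xml:

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value> <!-- total memory YARN may use on this node; size to the node's RAM -->
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value> <!-- smallest container the scheduler hands out -->
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>

mapred-site.xml:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value> <!-- container size for each map task -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value> <!-- container size for each reduce task -->
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx768m</value> <!-- JVM heap, kept below the map container size -->
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx768m</value> <!-- JVM heap, kept below the reduce container size -->
  </property>

With numbers like these, a 1 GB container request fits on a node that offers 2 GB to YARN, which is exactly the mismatch the scheduler was complaining about earlier in this thread (2048 MB requested against a 1024 MB node capability).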
--Ashish

On Wed, Jan 15, 2014 at 7:23 PM, sudhakara st <[email protected]> wrote:

> Hello Ashish
>
> WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
>
> The resource manager is trying to allocate 2 GB, but only 1 GB is available.
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <[email protected]> wrote:
>
>> I tried that, but somehow my map reduce jobs do not execute at all once I set it to yarn.
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <[email protected]> wrote:
>>
>>> Surely you don't have to set mapreduce.jobtracker.address in mapred-site.xml.
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>   <name>mapreduce.framework.name</name>
>>>   <value>yarn</value>
>>> </property>
>>>
>>> -Nirmal
>>>
>>> From: Ashish Jain [mailto:[email protected]]
>>> Sent: Wednesday, January 15, 2014 6:44 PM
>>> To: [email protected]
>>> Subject: Re: Distributing the code to multiple nodes
>>>
>>> I think this is the problem. I have not set "mapreduce.jobtracker.address" in my mapred-site.xml, and by default it is set to local. Now the question is how to set it to remote. The documentation says I need to specify the host:port of the job tracker for this. As we know, hadoop 2.2.0 is completely overhauled and there is no concept of task tracker and job tracker; instead there is now a resource manager and a node manager. So in this case, what do I set as "mapreduce.jobtracker.address"? Do I set it to resourceManagerHost:resourceManagerPort?
>>>
>>> --Ashish
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <[email protected]> wrote:
>>>
>>> Hi Sudhakar,
>>>
>>> Indeed there was a typo; the complete command is as follows, except for the main class, since my manifest has an entry for the main class:
>>> ./hadoop jar wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode on 10.12.11.210 and I see the following messages in the log files. It looks like the namenode is still trying to assign the complete task to one single node, and since it does not find the complete data set on one node, it is complaining.
>>>
>>> 2014-01-15 16:38:26,894 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
>>> 2014-01-15 16:38:27,348 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1dev-211:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
>>> 2014-01-15 16:38:27,871 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-dev06:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
>>> 2014-01-15 16:38:27,897 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
>>> 2014-01-15 16:38:28,349 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1dev-211:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
>>> 2014-01-15 16:38:28,874 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-dev06:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
>>> 2014-01-15 16:38:28,900 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
>>>
>>> --Ashish
>>>
>>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <[email protected]> wrote:
>>>
>>> Hello Ashish
>>>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>>>
>>> Unless it is a typo, the command should be:
>>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>>>
>>> One more thing to try: just stop the datanode process on 10.12.11.210 and run the job.
>>>
>>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <[email protected]> wrote:
>>>
>>> Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However, once I change the mapreduce framework to yarn, my map reduce jobs do not get executed at all. It seems to be waiting on some thread indefinitely.
>>> Here is what I have done:
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml:
>>> <property>
>>>   <name>mapreduce.framework.name</name>
>>>   <value>yarn</value>
>>> </property>
>>>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
>>>
>>> I also tried the following, and it complains of a FileNotFound exception and some security exception:
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from the hadoop application console. The progress bar does not move at all.
>>>
>>> ID:               application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>> User:             root
>>> Name:             wordcount
>>> Application Type: MAPREDUCE
>>> Queue:            default
>>> StartTime:        Wed, 15 Jan 2014 07:52:04 GMT
>>> FinishTime:       N/A
>>> State:            ACCEPTED
>>> FinalStatus:      UNDEFINED
>>> Progress:         (not moving)
>>> Tracking UI:      UNASSIGNED <http://10.12.11.210:8088/cluster/apps>
>>>
>>> Please advise on what I should do.
>>>
>>> --Ashish
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <[email protected]> wrote:
>>>
>>> Hello Ashish
>>>
>>> It seems the job is running in the local job runner (LocalJobRunner), reading the local file system. Can you try giving the full URI paths of the input and output?
>>>
>>> like
>>> $hadoop jar program.jar ProgramName -Dmapreduce.framework.name=yarn file:///home/input/ file:///home/output/
>>>
>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <[email protected]> wrote:
>>>
>>> German,
>>>
>>> This does not seem to be helping. I tried to use the FairScheduler as my resource manager's scheduler, but the behavior remains the same. I could see the fairscheduler log getting continuous heartbeats from both the other nodes, but it is still not distributing the work to the other nodes. What I did next was start 3 jobs simultaneously, so that maybe some part of one of the jobs would be distributed to the other nodes. However, still only one node is being used :(((. What is it that is going wrong? Can someone help?
>>>
>>> Sample of the fairscheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT l1-DEV05
>>>
>>> My data is distributed as blocks to the other nodes. The host with IP 10.12.11.210 has all the data, and this is the one which is serving all the requests.
>>> Total number of blocks: 8 (block ID: replica locations)
>>> 1073741866: 10.12.11.211:50010, 10.12.11.210:50010
>>> 1073741867: 10.12.11.211:50010, 10.12.11.210:50010
>>> 1073741868: 10.12.11.210:50010, 10.12.11.209:50010
>>> 1073741869: 10.12.11.210:50010, 10.12.11.209:50010
>>> 1073741870: 10.12.11.211:50010, 10.12.11.210:50010
>>> 1073741871: 10.12.11.210:50010, 10.12.11.209:50010
>>> 1073741872: 10.12.11.211:50010, 10.12.11.210:50010
>>> 1073741873: 10.12.11.210:50010, 10.12.11.209:50010
>>>
>>> Someone please advise on how to go about this.
>>>
>>> --Ashish
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <[email protected]> wrote:
>>>
>>> Thanks for all these suggestions. Somehow I do not have access to the servers today; I will try the suggestions on Monday and will let you know how it goes.
>>>
>>> --Ashish
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <[email protected]> wrote:
>>>
>>> Ashish
>>>
>>> Could this be related to the scheduler you are using and its settings?
>>>
>>> In lab environments, when running a single type of job, I often use the FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler) and it does a good job distributing the load.
>>>
>>> You could give that a try (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html).
>>>
>>> I think just changing yarn-site.xml as follows could demonstrate this theory (note that how the jobs are scheduled depends on resources such as memory on the nodes, and you would need to set up yarn-site.xml accordingly):
>>>
>>> <property>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>> </property>
>>>
>>> Regards
>>> ./g
>>>
>>> From: Ashish Jain [mailto:[email protected]]
>>> Sent: Thursday, January 09, 2014 6:46 AM
>>> To: [email protected]
>>> Subject: Re: Distributing the code to multiple nodes
>>>
>>> Another point to add here: 10.12.11.210 is the host which has everything running, including a slave datanode. Data was also distributed to this host, as well as the jar file. The following are running on 10.12.11.210:
>>>
>>> 7966 DataNode
>>> 8480 NodeManager
>>> 8353 ResourceManager
>>> 8141 SecondaryNameNode
>>> 7834 NameNode
>>>
>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <[email protected]> wrote:
>>>
>>> Logs were updated only when I copied the data. After copying the data there have been no updates to the log files.
>>>
>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <[email protected]> wrote:
>>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>
>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <[email protected]> wrote:
>>>
>>> Here is the block info for the record I distributed. As can be seen, only 10.12.11.210 has all the data, and this is the node which is serving all the requests.
>>> Replicas are available on 209 as well as 210.
>>>
>>> 1073741857: 10.12.11.210:50010, 10.12.11.209:50010
>>> 1073741858: 10.12.11.210:50010, 10.12.11.211:50010
>>> 1073741859: 10.12.11.210:50010, 10.12.11.209:50010
>>> 1073741860: 10.12.11.210:50010, 10.12.11.211:50010
>>> 1073741861: 10.12.11.210:50010, 10.12.11.209:50010
>>> 1073741862: 10.12.11.210:50010, 10.12.11.209:50010
>>> 1073741863: 10.12.11.210:50010, 10.12.11.209:50010
>>> 1073741864: 10.12.11.210:50010, 10.12.11.209:50010
>>>
>>> --Ashish
>>>
>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <[email protected]> wrote:
>>>
>>> Hello Chris,
>>>
>>> I now have a cluster with 3 nodes and a replication factor of 2. When I distribute a file I can see that there are replicas of the data available on other nodes. However, when I run a map reduce job, again only one node is serving all the requests :(. Can you or anyone please provide some more input?
>>>
>>> Thanks
>>> Ashish
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <[email protected]> wrote:
>>>
>>> 2 nodes and a replication factor of 2 results in a replica of each block present on each node. This would allow the possibility that a single node would do the work and yet be data local. It will probably happen if that single node has the needed capacity. More nodes than the replication factor are needed to force distribution of the processing.
>>> Chris
>>>
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <[email protected]> wrote:
>>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just now ran the job again and could see that CPU usage goes high on only one server; the other server's CPU usage remains constant, which means the other node is not being used. Can someone help me debug this issue?
>>>
>>> ++Ashish
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <[email protected]> wrote:
>>>
>>> Hello All,
>>>
>>> I have a 2 node hadoop cluster running with a replication factor of 2. I have a file of size around 1 GB which, when copied to HDFS, is replicated to both the nodes. Looking at the block info I can see the file has been subdivided into 8 blocks, each of size 128 MB. I use this file as input to run the word count program. Somehow I feel only one node is doing all the work and the code is not distributed to the other node. How can I make sure the code is distributed to both the nodes? Also, is there a log or GUI which can be used for this?
>>>
>>> Please note I am using the latest stable release, that is 2.2.0.
>>>
>>> ++Ashish
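A side note on checking this: the block placement listed above, and the "is there a log or GUI which can be used for this?" question, can also be answered from the command line. A minimal sketch, with the input path and host taken from this thread purely as examples:

  # where HDFS has placed each block of the input file and its replicas
  hdfs fsck /opt/ApacheHadoop/temp/worker.log -files -blocks -locations

  # which nodes have registered with YARN, and the list of submitted applications
  yarn node -list
  yarn application -list

The ResourceManager web UI (http://10.12.11.210:8088 in this cluster) shows the same application list, and each NodeManager logs the containers it launches, which is an easy way to confirm whether more than one node is actually doing work.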
> --
> Regards,
> ...Sudhakara.st
