I tried that, but somehow my map reduce jobs do not execute at all once I set it to yarn.
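One possible explanation for jobs that sit in the ACCEPTED state after switching to yarn is visible in the ResourceManager warnings quoted further down in this thread: containers of 2048 MB are being requested while each node advertises a total capability of only 1024 MB. A minimal yarn-site.xml sketch along these lines would let such requests be satisfied; the values are illustrative assumptions and not taken from the thread, and each node must actually have that much memory to give:

    <!-- yarn-site.xml (illustrative values only) -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>2048</value>
    </property>

Alternatively, the per-container requests could be lowered below the advertised 1024 MB node capability via mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and yarn.app.mapreduce.am.resource.mb in mapred-site.xml.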
On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <[email protected]> wrote:

> Surely you don't have to set **mapreduce.jobtracker.address** in mapred-site.xml.
>
> In mapred-site.xml you just have to mention:
>
> <property>
>   <name>mapreduce.framework.name</name>
>   <value>yarn</value>
> </property>
>
> -Nirmal
>
> *From:* Ashish Jain [mailto:[email protected]]
> *Sent:* Wednesday, January 15, 2014 6:44 PM
> *To:* [email protected]
> *Subject:* Re: Distributing the code to multiple nodes
>
> I think this is the problem. I have not set "mapreduce.jobtracker.address" in my mapred-site.xml, and by default it is set to local. Now the question is how to set it to remote. The documentation says I need to specify the host:port of the job tracker for this. As we know, Hadoop 2.2.0 is completely overhauled and there is no concept of a task tracker or job tracker; instead there is now a resource manager and node manager. So in this case what do I set as "mapreduce.jobtracker.address"? Do I set it to resourceManagerHost:resourceManagerPort?
>
> --Ashish
>
> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <[email protected]> wrote:
>
> Hi Sudhakar,
>
> Indeed there was a typo. The complete command is as follows, except for the main class, since my manifest has the entry for the main class:
>
>   ./hadoop jar wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>
> Next I killed the datanode on 10.12.11.210 and I see the following messages in the log files. It looks like the namenode is still trying to assign the complete task to one single node, and since it does not find the complete data set on one node it is complaining.
>
> 2014-01-15 16:38:26,894 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:27,348 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1dev-211:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:27,871 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-dev06:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:27,897 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:28,349 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1dev-211:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:28,874 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-dev06:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:28,900 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
>
> --Ashish
>
> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <[email protected]> wrote:
>
> Hello Ashish,
>
> > 2) Run the example again using the command
> >   ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>
> Unless it is a typo, the command should be:
>
>   ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>
> One more thing to try: just stop the datanode process on 10.12.11.210 and run the job.
>
> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <[email protected]> wrote:
>
> Hello Sudhakara,
>
> Thanks for your suggestion. However, once I change the mapreduce framework to yarn, my map reduce jobs do not get executed at all. It seems to be waiting on some thread indefinitely. Here is what I have done:
>
> 1) Set the mapreduce framework to yarn in mapred-site.xml:
>
>    <property>
>      <name>mapreduce.framework.name</name>
>      <value>yarn</value>
>    </property>
>
> 2) Run the example again using the command
>
>    ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>
> The jobs are just stuck and do not move further.
>
> I also tried the following, and it complains of a FileNotFoundException and some security exception:
>
>    ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log file:///opt/ApacheHadoop/out/
>
> Below is the status of the job from the hadoop application console. The progress bar does not move at all.
>
>   ID: application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>   User: root
>   Name: wordcount
>   Application Type: MAPREDUCE
>   Queue: default
>   StartTime: Wed, 15 Jan 2014 07:52:04 GMT
>   FinishTime: N/A
>   State: ACCEPTED
>   FinalStatus: UNDEFINED
>   Tracking UI: UNASSIGNED <http://10.12.11.210:8088/cluster/apps>
>
> Please advise what I should do.
>
> --Ashish
>
> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <[email protected]> wrote:
>
> Hello Ashish,
>
> It seems the job is running in the local job runner (LocalJobRunner) by reading the local file system. Can you try giving the full URI paths of the input and output, like:
>
>   $hadoop jar program.jar ProgramName -Dmapreduce.framework.name=yarn file:///home/input/ file:///home/output/
>
> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <[email protected]> wrote:
>
> German,
>
> This does not seem to be helping. I tried to use the FairScheduler as my ResourceManager scheduler, but the behavior remains the same. I can see the fairscheduler log getting continuous heartbeats from both of the other nodes, but it is still not distributing the work to the other nodes. What I did next was to start 3 jobs simultaneously, so that maybe some part of one of the jobs would be distributed to the other nodes. However, still only one node is being used :(((. What is going wrong? Can someone help?
> Sample of the fairscheduler log:
>
> 2014-01-13 15:13:54,293 HEARTBEAT l1dev-211
> 2014-01-13 15:13:54,953 HEARTBEAT l1-dev06
> 2014-01-13 15:13:54,988 HEARTBEAT l1-DEV05
> 2014-01-13 15:13:55,295 HEARTBEAT l1dev-211
> 2014-01-13 15:13:55,956 HEARTBEAT l1-dev06
> 2014-01-13 15:13:55,993 HEARTBEAT l1-DEV05
> 2014-01-13 15:13:56,297 HEARTBEAT l1dev-211
> 2014-01-13 15:13:56,960 HEARTBEAT l1-dev06
> 2014-01-13 15:13:56,997 HEARTBEAT l1-DEV05
> 2014-01-13 15:13:57,299 HEARTBEAT l1dev-211
> 2014-01-13 15:13:57,964 HEARTBEAT l1-dev06
> 2014-01-13 15:13:58,001 HEARTBEAT l1-DEV05
>
> My data is distributed as blocks to the other nodes. The host with IP 10.12.11.210 has all the data, and this is the one which is serving all the requests.
>
> Total number of blocks: 8
> 1073741866: 10.12.11.211:50010  10.12.11.210:50010
> 1073741867: 10.12.11.211:50010  10.12.11.210:50010
> 1073741868: 10.12.11.210:50010  10.12.11.209:50010
> 1073741869: 10.12.11.210:50010  10.12.11.209:50010
> 1073741870: 10.12.11.211:50010  10.12.11.210:50010
> 1073741871: 10.12.11.210:50010  10.12.11.209:50010
> 1073741872: 10.12.11.211:50010  10.12.11.210:50010
> 1073741873: 10.12.11.210:50010  10.12.11.209:50010
>
> Someone please advise on how to go about this.
>
> --Ashish
>
> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <[email protected]> wrote:
>
> Thanks for all these suggestions. Somehow I do not have access to the servers today; I will try the suggestions on Monday and will let you know how it goes.
>
> --Ashish
>
> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <[email protected]> wrote:
>
> Ashish,
>
> Could this be related to the scheduler you are using and its settings?
>
> In lab environments, when running a single type of job, I often use the FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler) and it does a good job distributing the load.
>
> You could give that a try (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html).
>
> I think just changing yarn-site.xml as follows could demonstrate this theory (note that how the jobs are scheduled depends on resources such as memory on the nodes, and you would need to set up yarn-site.xml accordingly):
>
> <property>
>   <name>yarn.resourcemanager.scheduler.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
> </property>
>
> Regards
> ./g
>
> *From:* Ashish Jain [mailto:[email protected]]
> *Sent:* Thursday, January 09, 2014 6:46 AM
> *To:* [email protected]
> *Subject:* Re: Distributing the code to multiple nodes
>
> Another point to add here: 10.12.11.210 is the host which has everything running, including a slave datanode. Data was also distributed to this host, as well as the jar file. The following are running on 10.12.11.210:
>
> 7966 DataNode
> 8480 NodeManager
> 8353 ResourceManager
> 8141 SecondaryNameNode
> 7834 NameNode
>
> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <[email protected]> wrote:
>
> Logs were updated only when I copied the data. After copying the data there have been no updates to the log files.
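As a side note on checking block placement from the command line rather than from the web UI pages quoted above, the stock fsck tool lists every block of a file together with the datanodes holding each replica. The path below is simply the input path used earlier in the thread and is assumed to be the file's HDFS path:

    # List the file's blocks and the datanodes holding each replica
    # (input path reused from earlier in the thread; adjust as needed)
    ./hdfs fsck /opt/ApacheHadoop/temp/worker.log -files -blocks -locations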
> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <[email protected]> wrote:
>
> Do the logs on the three nodes contain anything interesting?
> Chris
>
> On Jan 9, 2014 3:47 AM, "Ashish Jain" <[email protected]> wrote:
>
> Here is the block info for the file I distributed. As can be seen, only 10.12.11.210 has all the data, and this is the node which is serving all the requests. Replicas are available with 209 as well as 210.
>
> 1073741857: 10.12.11.210:50010  10.12.11.209:50010
> 1073741858: 10.12.11.210:50010  10.12.11.211:50010
> 1073741859: 10.12.11.210:50010  10.12.11.209:50010
> 1073741860: 10.12.11.210:50010  10.12.11.211:50010
> 1073741861: 10.12.11.210:50010  10.12.11.209:50010
> 1073741862: 10.12.11.210:50010  10.12.11.209:50010
> 1073741863: 10.12.11.210:50010  10.12.11.209:50010
> 1073741864: 10.12.11.210:50010  10.12.11.209:50010
>
> --Ashish
>
> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <[email protected]> wrote:
>
> Hello Chris,
>
> I now have a cluster with 3 nodes and a replication factor of 2. When I distribute a file I can see that replicas of the data are available on other nodes. However, when I run a map reduce job, again only one node is serving all the requests :(. Can you or anyone please provide some more inputs?
>
> Thanks
> Ashish
>
> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <[email protected]> wrote:
>
> 2 nodes and a replication factor of 2 results in a replica of each block being present on each node. This would allow the possibility that a single node does all the work and yet is data-local. It will probably happen if that single node has the needed capacity. More nodes than the replication factor are needed to force distribution of the processing.
> Chris
>
> On Jan 8, 2014 7:35 AM, "Ashish Jain" <[email protected]> wrote:
>
> Guys,
>
> I am sure that only one node is being used. I just now ran the job again and could see the CPU usage go high on only one server; the other servers' CPU usage remains constant, which means the other nodes are not being used. Can someone help me to debug this issue?
>
> ++Ashish
>
> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <[email protected]> wrote:
>
> Hello All,
>
> I have a 2 node hadoop cluster running with a replication factor of 2. I have a file of size around 1 GB which, when copied to HDFS, is replicated to both the nodes. Looking at the block info I can see the file has been subdivided into 8 blocks, each of size 128 MB. I use this file as input to run the word count program. Somehow I feel only one node is doing all the work and the code is not distributed to the other node. How can I make sure the code is distributed to both the nodes? Also, is there a log or GUI which can be used for this?
>
> Please note I am using the latest stable release, that is 2.2.0.
>
> ++Ashish
>
> --
> Regards,
> ...Sudhakara.st
>
> --
> Regards,
> ...Sudhakara.st
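For reference, a sketch of the corrected invocation discussed earlier in the thread (hadoop jar rather than hadoop dfs), assuming the WordCount driver goes through ToolRunner so that the -D generic option is picked up; if it does not, mapreduce.framework.name has to stay in mapred-site.xml as already shown above. The paths are the ones used in the thread:

    # Submit through YARN; the main class is taken from the jar manifest as noted above
    ./hadoop jar wordCount.jar -Dmapreduce.framework.name=yarn \
        /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/

    # Check the application's state from the command line instead of the web UI
    ./yarn application -list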
