I tried that, but somehow my map reduce jobs do not execute at all once I set it to yarn.
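One possible explanation for jobs that sit in the ACCEPTED state after switching to yarn is visible in the ResourceManager warnings quoted further down in this thread: containers of 2048 MB are being requested while each node advertises a total capability of only 1024 MB. A minimal yarn-site.xml sketch along these lines would let such requests be satisfied; the values are illustrative assumptions and not taken from the thread, and each node must actually have that much memory to give:

    <!-- yarn-site.xml (illustrative values only) -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>2048</value>
    </property>

Alternatively, the per-container requests could be lowered below the advertised 1024 MB node capability via mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and yarn.app.mapreduce.am.resource.mb in mapred-site.xml.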
On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <[email protected]> wrote:

> Surely you don't have to set **mapreduce.jobtracker.address** in mapred-site.xml.
>
> In mapred-site.xml you just have to mention:
>
> <property>
>   <name>mapreduce.framework.name</name>
>   <value>yarn</value>
> </property>
>
> -Nirmal
>
> *From:* Ashish Jain [mailto:[email protected]]
> *Sent:* Wednesday, January 15, 2014 6:44 PM
> *To:* [email protected]
> *Subject:* Re: Distributing the code to multiple nodes
>
> I think this is the problem. I have not set "mapreduce.jobtracker.address" in my mapred-site.xml, and by default it is set to local. Now the question is how to set it to remote. The documentation says I need to specify the host:port of the job tracker for this. As we know, Hadoop 2.2.0 is completely overhauled and there is no concept of a task tracker or job tracker; instead there is now a resource manager and node manager. So in this case what do I set as "mapreduce.jobtracker.address"? Do I set it to resourceManagerHost:resourceManagerPort?
>
> --Ashish
>
> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <[email protected]> wrote:
>
> Hi Sudhakar,
>
> Indeed there was a typo. The complete command is as follows, except for the main class, since my manifest has the entry for the main class:
>
>   ./hadoop jar wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>
> Next I killed the datanode on 10.12.11.210 and I see the following messages in the log files. It looks like the namenode is still trying to assign the complete task to one single node, and since it does not find the complete data set on one node it is complaining.
>
> 2014-01-15 16:38:26,894 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:27,348 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1dev-211:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:27,871 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-dev06:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:27,897 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:28,349 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1dev-211:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:28,874 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-dev06:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
> 2014-01-15 16:38:28,900 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
>
> --Ashish
>
> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <[email protected]> wrote:
>
> Hello Ashish,
>
> > 2) Run the example again using the command
> >   ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>
> Unless it is a typo, the command should be:
>
>   ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>
> One more thing to try: just stop the datanode process on 10.12.11.210 and run the job.
>
> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <[email protected]> wrote:
>
> Hello Sudhakara,
>
> Thanks for your suggestion. However, once I change the mapreduce framework to yarn, my map reduce jobs do not get executed at all. It seems to be waiting on some thread indefinitely. Here is what I have done:
>
> 1) Set the mapreduce framework to yarn in mapred-site.xml:
>
>    <property>
>      <name>mapreduce.framework.name</name>
>      <value>yarn</value>
>    </property>
>
> 2) Run the example again using the command
>
>    ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/
>
> The jobs are just stuck and do not move further.
>
> I also tried the following, and it complains of a FileNotFoundException and some security exception:
>
>    ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log file:///opt/ApacheHadoop/out/
>
> Below is the status of the job from the hadoop application console. The progress bar does not move at all.
>
>   ID: application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>   User: root
>   Name: wordcount
>   Application Type: MAPREDUCE
>   Queue: default
>   StartTime: Wed, 15 Jan 2014 07:52:04 GMT
>   FinishTime: N/A
>   State: ACCEPTED
>   FinalStatus: UNDEFINED
>   Tracking UI: UNASSIGNED <http://10.12.11.210:8088/cluster/apps>
>
> Please advise what I should do.
>
> --Ashish
>
> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <[email protected]> wrote:
>
> Hello Ashish,
>
> It seems the job is running in the local job runner (LocalJobRunner) by reading the local file system. Can you try giving the full URI paths of the input and output, like:
>
>   $hadoop jar program.jar ProgramName -Dmapreduce.framework.name=yarn file:///home/input/ file:///home/output/
>
> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <[email protected]> wrote:
>
> German,
>
> This does not seem to be helping. I tried to use the FairScheduler as my ResourceManager scheduler, but the behavior remains the same. I can see the fairscheduler log getting continuous heartbeats from both of the other nodes, but it is still not distributing the work to the other nodes. What I did next was to start 3 jobs simultaneously, so that maybe some part of one of the jobs would be distributed to the other nodes. However, still only one node is being used :(((. What is going wrong? Can someone help?
> Sample of the fairscheduler log:
>
> 2014-01-13 15:13:54,293 HEARTBEAT l1dev-211
> 2014-01-13 15:13:54,953 HEARTBEAT l1-dev06
> 2014-01-13 15:13:54,988 HEARTBEAT l1-DEV05
> 2014-01-13 15:13:55,295 HEARTBEAT l1dev-211
> 2014-01-13 15:13:55,956 HEARTBEAT l1-dev06
> 2014-01-13 15:13:55,993 HEARTBEAT l1-DEV05
> 2014-01-13 15:13:56,297 HEARTBEAT l1dev-211
> 2014-01-13 15:13:56,960 HEARTBEAT l1-dev06
> 2014-01-13 15:13:56,997 HEARTBEAT l1-DEV05
> 2014-01-13 15:13:57,299 HEARTBEAT l1dev-211
> 2014-01-13 15:13:57,964 HEARTBEAT l1-dev06
> 2014-01-13 15:13:58,001 HEARTBEAT l1-DEV05
>
> My data is distributed as blocks to the other nodes. The host with IP 10.12.11.210 has all the data, and this is the one which is serving all the requests.
>
> Total number of blocks: 8
> 1073741866: 10.12.11.211:50010  10.12.11.210:50010
> 1073741867: 10.12.11.211:50010  10.12.11.210:50010
> 1073741868: 10.12.11.210:50010  10.12.11.209:50010
> 1073741869: 10.12.11.210:50010  10.12.11.209:50010
> 1073741870: 10.12.11.211:50010  10.12.11.210:50010
> 1073741871: 10.12.11.210:50010  10.12.11.209:50010
> 1073741872: 10.12.11.211:50010  10.12.11.210:50010
> 1073741873: 10.12.11.210:50010  10.12.11.209:50010
>
> Someone please advise on how to go about this.
>
> --Ashish
>
> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <[email protected]> wrote:
>
> Thanks for all these suggestions. Somehow I do not have access to the servers today; I will try the suggestions on Monday and will let you know how it goes.
>
> --Ashish
>
> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <[email protected]> wrote:
>
> Ashish,
>
> Could this be related to the scheduler you are using and its settings?
>
> In lab environments, when running a single type of job, I often use the FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler) and it does a good job distributing the load.
>
> You could give that a try (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html).
>
> I think just changing yarn-site.xml as follows could demonstrate this theory (note that how the jobs are scheduled depends on resources such as memory on the nodes, and you would need to set up yarn-site.xml accordingly):
>
> <property>
>   <name>yarn.resourcemanager.scheduler.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
> </property>
>
> Regards
> ./g
>
> *From:* Ashish Jain [mailto:[email protected]]
> *Sent:* Thursday, January 09, 2014 6:46 AM
> *To:* [email protected]
> *Subject:* Re: Distributing the code to multiple nodes
>
> Another point to add here: 10.12.11.210 is the host which has everything running, including a slave datanode. Data was also distributed to this host, as well as the jar file. The following are running on 10.12.11.210:
>
> 7966 DataNode
> 8480 NodeManager
> 8353 ResourceManager
> 8141 SecondaryNameNode
> 7834 NameNode
>
> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <[email protected]> wrote:
>
> Logs were updated only when I copied the data. After copying the data there have been no updates to the log files.
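As a side note on checking block placement from the command line rather than from the web UI pages quoted above, the stock fsck tool lists every block of a file together with the datanodes holding each replica. The path below is simply the input path used earlier in the thread and is assumed to be the file's HDFS path:

    # List the file's blocks and the datanodes holding each replica
    # (input path reused from earlier in the thread; adjust as needed)
    ./hdfs fsck /opt/ApacheHadoop/temp/worker.log -files -blocks -locations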
> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <[email protected]> wrote:
>
> Do the logs on the three nodes contain anything interesting?
> Chris
>
> On Jan 9, 2014 3:47 AM, "Ashish Jain" <[email protected]> wrote:
>
> Here is the block info for the file I distributed. As can be seen, only 10.12.11.210 has all the data, and this is the node which is serving all the requests. Replicas are available with 209 as well as 210.
>
> 1073741857: 10.12.11.210:50010  10.12.11.209:50010
> 1073741858: 10.12.11.210:50010  10.12.11.211:50010
> 1073741859: 10.12.11.210:50010  10.12.11.209:50010
> 1073741860: 10.12.11.210:50010  10.12.11.211:50010
> 1073741861: 10.12.11.210:50010  10.12.11.209:50010
> 1073741862: 10.12.11.210:50010  10.12.11.209:50010
> 1073741863: 10.12.11.210:50010  10.12.11.209:50010
> 1073741864: 10.12.11.210:50010  10.12.11.209:50010
>
> --Ashish
>
> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <[email protected]> wrote:
>
> Hello Chris,
>
> I now have a cluster with 3 nodes and a replication factor of 2. When I distribute a file I can see that replicas of the data are available on other nodes. However, when I run a map reduce job, again only one node is serving all the requests :(. Can you or anyone please provide some more inputs?
>
> Thanks
> Ashish
>
> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <[email protected]> wrote:
>
> 2 nodes and a replication factor of 2 results in a replica of each block being present on each node. This would allow the possibility that a single node does all the work and yet is data-local. It will probably happen if that single node has the needed capacity. More nodes than the replication factor are needed to force distribution of the processing.
> Chris
>
> On Jan 8, 2014 7:35 AM, "Ashish Jain" <[email protected]> wrote:
>
> Guys,
>
> I am sure that only one node is being used. I just now ran the job again and could see the CPU usage go high on only one server; the other servers' CPU usage remains constant, which means the other nodes are not being used. Can someone help me to debug this issue?
>
> ++Ashish
>
> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <[email protected]> wrote:
>
> Hello All,
>
> I have a 2 node hadoop cluster running with a replication factor of 2. I have a file of size around 1 GB which, when copied to HDFS, is replicated to both the nodes. Looking at the block info I can see the file has been subdivided into 8 blocks, each of size 128 MB. I use this file as input to run the word count program. Somehow I feel only one node is doing all the work and the code is not distributed to the other node. How can I make sure the code is distributed to both the nodes? Also, is there a log or GUI which can be used for this?
>
> Please note I am using the latest stable release, that is 2.2.0.
>
> ++Ashish
>
> --
> Regards,
> ...Sudhakara.st
>
> --
> Regards,
> ...Sudhakara.st
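For reference, a sketch of the corrected invocation discussed earlier in the thread (hadoop jar rather than hadoop dfs), assuming the WordCount driver goes through ToolRunner so that the -D generic option is picked up; if it does not, mapreduce.framework.name has to stay in mapred-site.xml as already shown above. The paths are the ones used in the thread:

    # Submit through YARN; the main class is taken from the jar manifest as noted above
    ./hadoop jar wordCount.jar -Dmapreduce.framework.name=yarn \
        /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/

    # Check the application's state from the command line instead of the web UI
    ./yarn application -list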
