Surely you don't have to set *mapreduce.jobtracker.address* in mapred-site.xml
In mapred-site.xml you just have to mention:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

-Nirmal

From: Ashish Jain [mailto:[email protected]]
Sent: Wednesday, January 15, 2014 6:44 PM
To: [email protected]
Subject: Re: Distributing the code to multiple nodes

I think this is the problem. I have not set "mapreduce.jobtracker.address" in my mapred-site.xml, and by default it is set to local. Now the question is how to point it at a remote cluster. The documentation says I need to specify the host:port of the job tracker, but as we know Hadoop 2.2.0 is completely overhauled and there is no longer a job tracker or task tracker; instead there are a resource manager and node managers. So in this case, what do I set "mapreduce.jobtracker.address" to? Do I set it to resourceManagerHost:resourceManagerPort?

--Ashish

On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <[email protected]> wrote:

Hi Sudhakar,

Indeed there was a typo. The complete command is as follows (minus the main class, since my manifest has an entry for it):

  ./hadoop jar wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/

Next I killed the datanode on 10.12.11.210, and I see the following messages in the log files. It looks like the resource manager is still trying to assign the task to a single node, and it keeps complaining because no node has sufficient resources for the request:

2014-01-15 16:38:26,894 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:27,348 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1dev-211:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:27,871 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-dev06:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:27,897 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:28,349 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1dev-211:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:28,874 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-dev06:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>
2014-01-15 16:38:28,900 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Node : l1-DEV05:1004 does not have sufficient resource for request : {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1, Location: *, Relax Locality: true} node total capability : <memory:1024, vCores:8>

--Ashish
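The warnings above show the root of the scheduling failure: every container request asks for 2048 MB, while each node advertises a total capability of only 1024 MB, so no container can ever be placed. (The 2048 MB figure is consistent with the MapReduce ApplicationMaster's default request, yarn.app.mapreduce.am.resource.mb = 1536 MB, rounded up to the scheduler's 1024 MB allocation increment.) A minimal sketch of the properties involved, assuming stock Hadoop 2.2.0 property names; the values here are illustrative and should be chosen to match the RAM actually available on the nodes:

  <!-- yarn-site.xml: memory the NodeManager offers to containers (default 8192) -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>

  <!-- mapred-site.xml: shrink the AM and per-task container requests instead, or as well -->
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>512</value>
  </property>

Either raising the NodeManager's offered memory above the requested size, or lowering the requests below the node capability, should let the scheduler place containers again.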
On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <[email protected]> wrote:

Hello Ashish,

> 2) Run the example again using the command
>    ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/

Unless that is a typo, the command should be:

  ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/

One more thing to try: just stop the datanode process on 10.12.11.210 and run the job.

On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <[email protected]> wrote:

Hello Sudhakara,

Thanks for your suggestion. However, once I change the mapreduce framework to yarn, my map reduce jobs do not get executed at all. They seem to be waiting on something indefinitely. Here is what I have done:

1) Set the mapreduce framework to yarn in mapred-site.xml:

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

2) Run the example again using the command:

  ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/

The jobs are just stuck and do not move further. I also tried the following, and it complains of a file-not-found exception and some security exception:

  ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log file:///opt/ApacheHadoop/out/

Below is the status of the job from the hadoop application console; the progress bar does not move at all:

  ID:          application_1389771586883_0002 (http://10.12.11.210:8088/cluster/app/application_1389771586883_0002)
  User:        root
  Name:        wordcount
  Type:        MAPREDUCE
  Queue:       default
  StartTime:   Wed, 15 Jan 2014 07:52:04 GMT
  FinishTime:  N/A
  State:       ACCEPTED
  FinalStatus: UNDEFINED
  Tracking UI: UNASSIGNED

Please advise on what I should do.

--Ashish
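An application that sits in the ACCEPTED state with an UNASSIGNED tracking UI usually means the ResourceManager has queued the job but cannot allocate a container for its ApplicationMaster, often for exactly the kind of resource shortfall shown in the scheduler warnings earlier in the thread. A quick way to inspect this from the command line, using the YARN CLI that ships with 2.2.0 (the application id below is the one from the console output above):

  # List applications and the state the ResourceManager sees them in
  ./yarn application -list

  # Show status, diagnostics, and the tracking URL for a single application
  ./yarn application -status application_1389771586883_0002

The scheduler page of the ResourceManager web UI (http://10.12.11.210:8088/cluster/scheduler) also shows queue usage and pending container requests.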
On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <[email protected]> wrote:

Hello Ashish,

It seems the job is running in the local job runner (LocalJobRunner) and reading the local file system. Can you try giving the full URI paths for the input and output, like:

  $ hadoop jar program.jar ProgramName -Dmapreduce.framework.name=yarn file:///home/input/ file:///home/output/

On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <[email protected]> wrote:

German,

This does not seem to be helping. I tried the FairScheduler as my scheduler but the behavior remains the same. I can see the fairscheduler log getting continuous heartbeats from both of the other nodes, but the work is still not being distributed to them. What I did next was start 3 jobs simultaneously, so that at least some part of one of the jobs might be distributed to the other nodes. However, still only one node is being used. What is going wrong? Can someone help?

Sample of the fairscheduler log:

  2014-01-13 15:13:54,293 HEARTBEAT l1dev-211
  2014-01-13 15:13:54,953 HEARTBEAT l1-dev06
  2014-01-13 15:13:54,988 HEARTBEAT l1-DEV05
  2014-01-13 15:13:55,295 HEARTBEAT l1dev-211
  2014-01-13 15:13:55,956 HEARTBEAT l1-dev06
  2014-01-13 15:13:55,993 HEARTBEAT l1-DEV05
  2014-01-13 15:13:56,297 HEARTBEAT l1dev-211
  2014-01-13 15:13:56,960 HEARTBEAT l1-dev06
  2014-01-13 15:13:56,997 HEARTBEAT l1-DEV05
  2014-01-13 15:13:57,299 HEARTBEAT l1dev-211
  2014-01-13 15:13:57,964 HEARTBEAT l1-dev06
  2014-01-13 15:13:58,001 HEARTBEAT l1-DEV05

My data is distributed as blocks across the nodes, but the host with IP 10.12.11.210 holds a replica of every block, and it is the one serving all the requests.

Total number of blocks: 8

  1073741866: 10.12.11.211:50010, 10.12.11.210:50010
  1073741867: 10.12.11.211:50010, 10.12.11.210:50010
  1073741868: 10.12.11.210:50010, 10.12.11.209:50010
  1073741869: 10.12.11.210:50010, 10.12.11.209:50010
  1073741870: 10.12.11.211:50010, 10.12.11.210:50010
  1073741871: 10.12.11.210:50010, 10.12.11.209:50010
  1073741872: 10.12.11.211:50010, 10.12.11.210:50010
  1073741873: 10.12.11.210:50010, 10.12.11.209:50010

Someone please advise on how to go about this.

--Ashish
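Block placement reports like the list above can also be pulled straight from HDFS rather than from the web UI. A sketch using the stock fsck tool, assuming the input file sits at the HDFS path used in the earlier commands:

  # Print every block of the file and the datanodes holding each replica
  ./hdfs fsck /opt/ApacheHadoop/temp/worker.log -files -blocks -locations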
On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <[email protected]> wrote:

Thanks for all these suggestions. Somehow I do not have access to the servers today; I will try the suggestions on Monday and let you know how it goes.

--Ashish

On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <[email protected]> wrote:

Ashish,

Could this be related to the scheduler you are using and its settings? On lab environments, when running a single type of job, I often use the FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler) and it does a good job of distributing the load. You could give that a try (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html). I think just changing yarn-site.xml as follows could demonstrate this theory (note that how jobs are scheduled depends on resources such as memory on the nodes, so you would need to set up yarn-site.xml accordingly):

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>

Regards
./g

From: Ashish Jain [mailto:[email protected]]
Sent: Thursday, January 09, 2014 6:46 AM
To: [email protected]
Subject: Re: Distributing the code to multiple nodes

Another point to add here: 10.12.11.210 is the host which has everything running, including a slave datanode, and both the data and the jar file were distributed to this host. The following are running on 10.12.11.210:

  7966 DataNode
  8480 NodeManager
  8353 ResourceManager
  8141 SecondaryNameNode
  7834 NameNode

On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <[email protected]> wrote:

The logs were updated only when I copied the data. After copying the data there have been no updates to the log files.

On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <[email protected]> wrote:

Do the logs on the three nodes contain anything interesting?
Chris

On Jan 9, 2014 3:47 AM, "Ashish Jain" <[email protected]> wrote:

Here is the block info for the file I distributed. As can be seen, 10.12.11.210 holds a replica of every block and is the node serving all the requests; the second replicas are spread over 209 and 211:

  1073741857: 10.12.11.210:50010, 10.12.11.209:50010
  1073741858: 10.12.11.210:50010, 10.12.11.211:50010
  1073741859: 10.12.11.210:50010, 10.12.11.209:50010
  1073741860: 10.12.11.210:50010, 10.12.11.211:50010
  1073741861: 10.12.11.210:50010, 10.12.11.209:50010
  1073741862: 10.12.11.210:50010, 10.12.11.209:50010
  1073741863: 10.12.11.210:50010, 10.12.11.209:50010
  1073741864: 10.12.11.210:50010, 10.12.11.209:50010

--Ashish

On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <[email protected]> wrote:

Hello Chris,

I now have a cluster with 3 nodes and a replication factor of 2. When I distribute a file I can see that replicas of the data are available on the other nodes. However, when I run a map reduce job, again only one node serves all the requests. Can you or anyone please provide some more input?

Thanks
Ashish

On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <[email protected]> wrote:

2 nodes and a replication factor of 2 result in a replica of each block being present on each node. This allows the possibility that a single node does all the work and yet stays data-local, and it will probably happen if that single node has the needed capacity. More nodes than the replication factor are needed to force distribution of the processing.
Chris
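Chris's point is worth working through with the numbers from this thread: 8 blocks at a replication factor of 2 on a 2-node cluster means 8 x 2 = 16 replicas spread over 2 datanodes, i.e. every node holds a copy of all 8 blocks. Each of the 8 map tasks is therefore data-local no matter where it runs, so a scheduler that favors locality is free to pack all of them onto one node with spare capacity. Only when the node count exceeds the replication factor does a single node stop holding the full data set.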
On Jan 8, 2014 7:35 AM, "Ashish Jain" <[email protected]> wrote:

Guys,

I am sure that only one node is being used. I just ran the job again and could see the CPU usage going high on one server only, while the other servers' CPU usage remained constant; hence the other nodes are not being used. Can someone help me debug this issue?

++Ashish

On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <[email protected]> wrote:

Hello All,

I have a 2 node hadoop cluster running with a replication factor of 2. I have a file of around 1 GB which, when copied to HDFS, is replicated to both nodes. From the block info I can see that the file has been subdivided into 8 blocks, each of size 128 MB. I use this file as input to run the word count program. Somehow I feel only one node is doing all the work and the code is not distributed to the other node. How can I make sure the code is distributed to both nodes? Also, is there a log or GUI which can be used to check this? Please note I am using the latest stable release, 2.2.0.

++Ashish

--
Regards,
...Sudhakara.st
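On the log/GUI question from the original post: the ResourceManager web UI (port 8088 by default, e.g. http://10.12.11.210:8088/cluster/nodes) lists every NodeManager together with the number of containers currently running on it, which is the most direct way to see whether work is landing on more than one node. Assuming a stock log layout, the per-node NodeManager logs record each container launch as well; the exact message text may vary between releases:

  # Run on each slave while a job is active; hits on more than one host
  # mean the work really is distributed
  grep -i "start request for container" $HADOOP_HOME/logs/yarn-*-nodemanager-*.log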
