Hello, of course I will be most happy to share it here. Here is the configuration file I am using on the client:
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoopnamenode:9000/</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoopresourcemanager:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoopresourcemanager:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoopresourcemanager:8031</value>
    </property>
</configuration>

I then supply the file in the command used to run the job on the cluster:

~/devtools/hadoop-2.7.0/bin/yarn \
    jar avg_imgsize.jar net.xekmypic.hadoop.avgfilesize.JobDriver \
    -conf ../clusterconfig/hadoop-cluster.xml \
    hdfs://hadoopnamenode:9000/input/avgfilesize hdfs://hadoopnamenode:9000/output_avgfilesize

In the above command, the file supplied after the -conf parameter is the one containing the xml.

Cheers

> On 6 Aug 2019, at 20:14, Jon Mack <jmack...@gmail.com> wrote:
>
> Can you share with the group what the xml configuration file was. Maybe it
> could help someone in the future.
>
> Thanks for letting us know the outcome.
>
> On Mon, Aug 5, 2019 at 6:00 PM Daniel Santos <daniel.d...@gmail.com> wrote:
> Hello
>
> I found out the cause of the error. When I submit a job to the cluster, I
> supply an xml configuration file with properties of the cluster I am
> connecting to.
> I had to replicate some properties related to the addresses of yarn in that
> configuration file.
>
> I thought that the cluster configuration would be sufficient, but no.
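To see why leaving those properties out of the client config produces retries against 0.0.0.0:8030, note that Hadoop's Configuration falls back to the built-in yarn-default.xml value when a key is absent from the submitted configuration. A minimal Python sketch of that fallback behavior (illustration only — `resolve` and `YARN_DEFAULTS` are hypothetical names, not Hadoop API; `0.0.0.0:8030` is the real yarn-default.xml scheduler default):

```python
# Illustration: mimics how a missing client-side property falls back to
# the yarn-default.xml built-in, which is a bind-all address the client
# then tries (and fails) to connect to.

YARN_DEFAULTS = {
    # Built-in default from yarn-default.xml
    "yarn.resourcemanager.scheduler.address": "0.0.0.0:8030",
}

def resolve(client_conf, key):
    """Return the client-supplied value if present, else the built-in default."""
    return client_conf.get(key, YARN_DEFAULTS[key])

# Without the property, the client ends up retrying 0.0.0.0:8030:
print(resolve({}, "yarn.resourcemanager.scheduler.address"))
# With the hadoop-cluster.xml above, it gets the real scheduler address:
print(resolve(
    {"yarn.resourcemanager.scheduler.address": "hadoopresourcemanager:8030"},
    "yarn.resourcemanager.scheduler.address"))
```

This is why the cluster-side yarn-site.xml alone was not enough: the addresses also have to be present in whatever configuration the submitting client builds.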
>
> Thanks for your interest
> Regards
>
>
>> On 5 Aug 2019, at 19:21, Jon Mack <jmack...@gmail.com> wrote:
>>
>> Doesn't look like the client is resolving the IP address correctly (i.e.
>> 0.0.0.0/0.0.0.0:8030); try an nslookup on one
>> of the clients (i.e. nslookup hadoopresourcemanager) to see what the client
>> is resolving it to. Change the configuration to use the IP address instead
>> of the hostname if possible.
>>
>> Also do a netstat -an | grep 8030 on hadoopresourcemanager to verify the
>> resource manager service is running.
>>
>>
>> On Mon, Aug 5, 2019 at 12:38 PM Daniel Santos <daniel.d...@gmail.com> wrote:
>> Hello,
>> I am using hosts files on all machines that are centrally managed through
>> puppet. When I run the yarn startup script on the hadoopresourcemanager
>> machine it creates the node managers, one on each slave.
>>
>> Regards
>>
>> Sent from my iPhone
>>
>> On 5 Aug 2019, at 16:01, Jeff Hubbs <jhubbsl...@att.net> wrote:
>>
>>> Does "hadoopresourcemanager" resolve to a machine that's a Hadoop resource
>>> manager? In Hadoop, it's absolutely vital that all names resolve correctly
>>> in both directions.
>>>
>>> On 8/5/19 10:55 AM, Daniel Santos wrote:
>>>> Hello Jon,
>>>>
>>>> I have the following yarn-site.xml:
>>>>
>>>> <configuration>
>>>>     <!-- Site specific YARN configuration properties -->
>>>>     <property>
>>>>         <name>yarn.acl.enable</name>
>>>>         <value>0</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.resourcemanager.hostname</name>
>>>>         <value>hadoopresourcemanager</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.nodemanager.aux-services</name>
>>>>         <value>mapreduce_shuffle</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.nodemanager.resource.memory-mb</name>
>>>>         <value>1536</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.scheduler.maximum-allocation-mb</name>
>>>>         <value>1536</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.scheduler.minimum-allocation-mb</name>
>>>>         <value>128</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.nodemanager.vmem-check-enabled</name>
>>>>         <value>false</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.resourcemanager.address</name>
>>>>         <value>hadoopresourcemanager:8032</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.resourcemanager.scheduler.address</name>
>>>>         <value>hadoopresourcemanager:8030</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>         <value>hadoopresourcemanager:8031</value>
>>>>     </property>
>>>> </configuration>
>>>>
>>>> So I can say I already tried your suggestion.
>>>>
>>>> Cheers
>>>>
>>>>> On 5 Aug 2019, at 15:22, Jon Mack <jmack...@gmail.com> wrote:
>>>>>
>>>>> Looks to me like it's missing the resource manager configuration, based on the
>>>>> port it's trying to connect to.
>>>>>
>>>>> On Mon, Aug 5, 2019 at 9:15 AM Daniel Santos <daniel.d...@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> I have a cluster with one machine holding the name nodes (primary and
>>>>> secondary), a yarn node (resource manager), and four data nodes.
>>>>> I am running hadoop 2.7.0.
>>>>>
>>>>> When I submit a job to the cluster I can see it in the scheduler webpage.
>>>>> If I go to the container page and check the logs, at the end of the syslog
>>>>> file I have the following:
>>>>>
>>>>> 2019-08-05 14:58:05,962 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:06,962 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:07,963 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:08,965 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:09,966 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:10,967 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:11,968 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:12,969 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>>
>>>>> I have checked the configuration of the resource manager and of the data
>>>>> node where the application is running, and the property
>>>>> yarn.resourcemanager.hostname that I set in yarn-site.xml is shown.
>>>>> I have disabled ipv6 on the yarn machine, as some posts on the internet
>>>>> suggested. All the configuration files are the same on every node of the
>>>>> cluster.
>>>>>
>>>>> Still I am getting these errors, and the application ends with a timeout.
>>>>>
>>>>> What am I doing wrong?
>>>>>
>>>>> Thanks
>>>>> Regards
>>>>
>>>
>
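For anyone hitting the same symptom, the checks Jon suggested above (nslookup the resource manager hostname, then verify the ports are actually listening with netstat) can be approximated from any client machine with a short script. A sketch in Python — `check_endpoint` is just an illustrative helper, and the hostname and ports are the ones from this thread, so adjust them for your cluster:

```python
import socket

def check_endpoint(host, port, timeout=3.0):
    """Resolve the hostname, then attempt a TCP connect to host:port.
    Returns (resolved_ip_or_None, connected_bool)."""
    try:
        ip = socket.gethostbyname(host)  # roughly what nslookup does
    except socket.gaierror:
        return None, False               # name does not resolve at all
    try:
        # roughly what "is anything listening?" via netstat tells you
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False                 # resolves, but port unreachable

if __name__ == "__main__":
    # Ports from this thread: 8032 (RM), 8030 (scheduler), 8031 (tracker)
    for port in (8032, 8030, 8031):
        ip, ok = check_endpoint("hadoopresourcemanager", port)
        print(f"hadoopresourcemanager:{port} -> resolved={ip} reachable={ok}")
```

If the hostname fails to resolve here, fix DNS or the hosts files first; if it resolves but the port is unreachable, check that the resource manager daemon is running and bound to the expected interface.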