Hello, of course I will be most happy to share it here. Here is the configuration file I am using on the client:
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoopnamenode:9000/</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoopresourcemanager:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoopresourcemanager:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoopresourcemanager:8031</value>
    </property>
</configuration>

I then supply the file in the command used to run the job on the cluster:

~/devtools/hadoop-2.7.0/bin/yarn \
    jar avg_imgsize.jar net.xekmypic.hadoop.avgfilesize.JobDriver \
    -conf ../clusterconfig/hadoop-cluster.xml \
    hdfs://hadoopnamenode:9000/input/avgfilesize hdfs://hadoopnamenode:9000/output_avgfilesize

In the above command, the file supplied after the -conf parameter is the one containing the xml.

Cheers

> On 6 Aug 2019, at 20:14, Jon Mack <jmack...@gmail.com> wrote:
>
> Can you share with the group what the xml configuration file was. Maybe it
> could help someone in the future.
>
> Thanks for letting us know the outcome.
>
> On Mon, Aug 5, 2019 at 6:00 PM Daniel Santos <daniel.d...@gmail.com> wrote:
> Hello
>
> I found out the cause of the error. When I submit a job to the cluster, I
> supply an xml configuration file with properties of the cluster I am
> connecting to.
> I had to replicate some properties related to the addresses of yarn in that
> configuration file.
>
> I thought that the cluster configuration would be sufficient, but no.
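To see why leaving those properties out of the client config produces retries against 0.0.0.0:8030, note that Hadoop's Configuration falls back to the built-in yarn-default.xml value when a key is absent from the submitted configuration. A minimal Python sketch of that fallback behavior (illustration only — `resolve` and `YARN_DEFAULTS` are hypothetical names, not Hadoop API; `0.0.0.0:8030` is the real yarn-default.xml scheduler default):

```python
# Illustration: mimics how a missing client-side property falls back to
# the yarn-default.xml built-in, which is a bind-all address the client
# then tries (and fails) to connect to.

YARN_DEFAULTS = {
    # Built-in default from yarn-default.xml
    "yarn.resourcemanager.scheduler.address": "0.0.0.0:8030",
}

def resolve(client_conf, key):
    """Return the client-supplied value if present, else the built-in default."""
    return client_conf.get(key, YARN_DEFAULTS[key])

# Without the property, the client ends up retrying 0.0.0.0:8030:
print(resolve({}, "yarn.resourcemanager.scheduler.address"))
# With the hadoop-cluster.xml above, it gets the real scheduler address:
print(resolve(
    {"yarn.resourcemanager.scheduler.address": "hadoopresourcemanager:8030"},
    "yarn.resourcemanager.scheduler.address"))
```

This is why the cluster-side yarn-site.xml alone was not enough: the addresses also have to be present in whatever configuration the submitting client builds.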
>
> Thanks for your interest
> Regards
>
>
>> On 5 Aug 2019, at 19:21, Jon Mack <jmack...@gmail.com> wrote:
>>
>> Doesn't look like the client is resolving the IP address correctly (i.e.
>> 0.0.0.0/0.0.0.0:8030); try an nslookup on one
>> of the clients (i.e. nslookup hadoopresourcemanager) to see what the client
>> is resolving it to. Change the configuration to use the IP address instead
>> of the hostname if possible.
>>
>> Also do a netstat -an | grep 8030 on hadoopresourcemanager to verify the
>> resource manager service is running.
>>
>>
>> On Mon, Aug 5, 2019 at 12:38 PM Daniel Santos <daniel.d...@gmail.com> wrote:
>> Hello,
>> I am using hosts files on all machines that are centrally managed through
>> puppet. When I run the yarn startup script on the hadoopresourcemanager
>> machine it creates the node managers, one on each slave.
>>
>> Regards
>>
>> Sent from my iPhone
>>
>> On 5 Aug 2019, at 16:01, Jeff Hubbs <jhubbsl...@att.net> wrote:
>>
>>> Does "hadoopresourcemanager" resolve to a machine that's a Hadoop resource
>>> manager? In Hadoop, it's absolutely vital that all names resolve correctly
>>> in both directions.
>>>
>>> On 8/5/19 10:55 AM, Daniel Santos wrote:
>>>> Hello Jon,
>>>>
>>>> I have the following yarn-site.xml:
>>>>
>>>> <configuration>
>>>>     <!-- Site specific YARN configuration properties -->
>>>>     <property>
>>>>         <name>yarn.acl.enable</name>
>>>>         <value>0</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.resourcemanager.hostname</name>
>>>>         <value>hadoopresourcemanager</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.nodemanager.aux-services</name>
>>>>         <value>mapreduce_shuffle</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.nodemanager.resource.memory-mb</name>
>>>>         <value>1536</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.scheduler.maximum-allocation-mb</name>
>>>>         <value>1536</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.scheduler.minimum-allocation-mb</name>
>>>>         <value>128</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.nodemanager.vmem-check-enabled</name>
>>>>         <value>false</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.resourcemanager.address</name>
>>>>         <value>hadoopresourcemanager:8032</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.resourcemanager.scheduler.address</name>
>>>>         <value>hadoopresourcemanager:8030</value>
>>>>     </property>
>>>>     <property>
>>>>         <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>         <value>hadoopresourcemanager:8031</value>
>>>>     </property>
>>>> </configuration>
>>>>
>>>> So I can say I already tried your suggestion.
>>>>
>>>> Cheers
>>>>
>>>>> On 5 Aug 2019, at 15:22, Jon Mack <jmack...@gmail.com> wrote:
>>>>>
>>>>> Looks to me like it's missing the resource manager configuration, based on the
>>>>> port it's trying to connect to.
>>>>>
>>>>> On Mon, Aug 5, 2019 at 9:15 AM Daniel Santos <daniel.d...@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> I have a cluster with one machine holding the name nodes (primary and
>>>>> secondary), a yarn node (resource manager), and four data nodes.
>>>>> I am running hadoop 2.7.0.
>>>>>
>>>>> When I submit a job to the cluster I can see it in the scheduler webpage.
>>>>> If I go to the container page and check the logs, at the end of the syslog
>>>>> file I have the following:
>>>>>
>>>>> 2019-08-05 14:58:05,962 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:06,962 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:07,963 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:08,965 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:09,966 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:10,967 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:11,968 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>> 2019-08-05 14:58:12,969 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
>>>>>
>>>>> I have checked the configuration of the resource manager and of the data
>>>>> node where the application is running, and the property
>>>>> yarn.resourcemanager.hostname that I set in yarn-site.xml is shown.
>>>>> I have disabled ipv6 on the yarn machine, as some posts on the internet
>>>>> suggested. All the configuration files are the same on every node of the
>>>>> cluster.
>>>>>
>>>>> Still I am getting these errors, and the application ends with a timeout.
>>>>>
>>>>> What am I doing wrong?
>>>>>
>>>>> Thanks
>>>>> Regards
>>>>
>>>
>
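For anyone hitting the same symptom, the checks Jon suggested above (nslookup the resource manager hostname, then verify the ports are actually listening with netstat) can be approximated from any client machine with a short script. A sketch in Python — `check_endpoint` is just an illustrative helper, and the hostname and ports are the ones from this thread, so adjust them for your cluster:

```python
import socket

def check_endpoint(host, port, timeout=3.0):
    """Resolve the hostname, then attempt a TCP connect to host:port.
    Returns (resolved_ip_or_None, connected_bool)."""
    try:
        ip = socket.gethostbyname(host)  # roughly what nslookup does
    except socket.gaierror:
        return None, False               # name does not resolve at all
    try:
        # roughly what "is anything listening?" via netstat tells you
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False                 # resolves, but port unreachable

if __name__ == "__main__":
    # Ports from this thread: 8032 (RM), 8030 (scheduler), 8031 (tracker)
    for port in (8032, 8030, 8031):
        ip, ok = check_endpoint("hadoopresourcemanager", port)
        print(f"hadoopresourcemanager:{port} -> resolved={ip} reachable={ok}")
```

If the hostname fails to resolve here, fix DNS or the hosts files first; if it resolves but the port is unreachable, check that the resource manager daemon is running and bound to the expected interface.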