I think it may occur in the reduce phase. Perhaps too many shuffle fetches are running at the same time?
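To narrow down where the failing call is headed, the exception text quoted below can be split into its source and destination. This is only a rough sketch: the names `parse_call`, `CallEndpoints` and `targets_loopback` are made up for illustration and are not part of Hadoop, and the pattern is an assumption based on the single log line in this thread.

```python
import re
from typing import NamedTuple, Optional

# Matches the "Call From <host>/<ip> to <host>:<port> failed" part of the
# message. The pattern is inferred from the one log line in this thread.
CALL_RE = re.compile(
    r"Call From\s+(?P<src_host>[^/\s]+)/(?P<src_ip>\S+)"
    r"\s+to\s+(?P<dst_host>[^:\s]+):(?P<dst_port>\d+)\s+failed"
)


class CallEndpoints(NamedTuple):
    """Hypothetical container for the two ends of the failed call."""
    src_host: str
    src_ip: str
    dst_host: str
    dst_port: int


def parse_call(message: str) -> Optional[CallEndpoints]:
    """Return the endpoints named in the message, or None if it does not match."""
    m = CALL_RE.search(message)
    if m is None:
        return None
    return CallEndpoints(
        m["src_host"], m["src_ip"], m["dst_host"], int(m["dst_port"])
    )


def targets_loopback(ep: CallEndpoints) -> bool:
    """True if the destination is written as a loopback name or 127.x address."""
    return (
        ep.dst_host in ("localhost", "localhost.localdomain")
        or ep.dst_host.startswith("127.")
    )


if __name__ == "__main__":
    msg = (
        "Exception running child : java.net.ConnectException: Call From "
        "hadoop3.localdomain/10.3.1.63 to localhost:37497 failed on connection "
        "exception: java.net.ConnectException: Connection refused"
    )
    ep = parse_call(msg)
    print(ep, targets_loopback(ep))
```

For the message in this thread, the destination comes out as localhost rather than a node hostname, so it may be worth checking which service was listening on that port and how it advertised its address.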
2013/6/20 Arun C Murthy <[email protected]>

> Where are you seeing the error? In the tasks? Or during job submission?
>
> On Jun 19, 2013, at 8:43 PM, Li Shengmei <[email protected]> wrote:
>
> > Hi, all
> >
> > I am doing some experiments with yarn-2.0.x. I found some problems
> > as follows; can anyone give a suggestion?
> >
> > 1. I set up a Hadoop environment with 2 nodes (1 as namenode, 1 as
> > datanode). The application (wordcount) runs successfully.
> >
> > 2. I set up a Hadoop environment with 3 or more nodes (1 as namenode,
> > the others as datanodes).
> >
> > The application sometimes runs successfully and sometimes fails. The
> > failure log is
> >
> > "Exception running child : java.net.ConnectException: Call From
> > hadoop3.localdomain/10.3.1.63 to localhost:37497 failed on connection
> > exception: java.net.ConnectException: Connection refused; For more details
> > see: http://wiki.apache.org/hadoop/ConnectionRefused"
> >
> > I checked the configuration and the firewall; there are no problems. I use
> > the wordcount application in the Hadoop examples, so the application is
> > also OK.
> >
> > Has anyone come across the problem? Any suggestions are welcome. Thanks a
> > lot.
> >
> > Shengmei
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/

--
Sincerely,
Zhaojie
