Also, do you see any exceptions in the RM / NM logs?

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>
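(A minimal sketch, not from the original mails: the thread below questions whether an RM is really listening on the scheduler port. A plain socket probe, run from the AM's host with an assumed host/port, can answer that independently of the YARN client stack.)

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {

    // Returns true if something accepts a TCP connection on host:port
    // within timeoutMs milliseconds.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 8030 is the default yarn.resourcemanager.scheduler.address port;
        // "localhost" is an assumption for a pseudo-distributed setup.
        System.out.println("RM scheduler port open: "
                + isListening("localhost", 8030, 500));
    }
}
```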
On Mon, Jul 1, 2013 at 11:19 AM, Omkar Joshi <[email protected]> wrote:
> Hi,
>
> As I don't know your complete AM code and how your containers are
> communicating with each other, here are a few things which might help
> you in debugging. Where are you starting your RM (is it really running
> on 8030? Are you sure there is no previously started RM still running
> there?)? Also, in yarn-site.xml can you try changing the RM address to
> something like "localhost:<free-port-but-not-default>" and configuring
> the maximum client thread count for handling AM requests? Only your AM
> is expected to communicate with the RM over the AM-RM protocol. By any
> chance, are containers in your code directly communicating with the RM
> over the AM-RM protocol?
>
>   <property>
>     <description>The address of the scheduler interface.</description>
>     <name>yarn.resourcemanager.scheduler.address</name>
>     <value>${yarn.resourcemanager.hostname}:8030</value>
>   </property>
>
>   <property>
>     <description>Number of threads to handle scheduler interface.</description>
>     <name>yarn.resourcemanager.scheduler.client.thread-count</name>
>     <value>50</value>
>   </property>
>
> Thanks,
> Omkar Joshi
> *Hortonworks Inc.* <http://www.hortonworks.com>
>
>
> On Fri, Jun 28, 2013 at 5:35 AM, blah blah <[email protected]> wrote:
>
>> Hi
>>
>> Sorry to reply so late. I don't have the data you requested (sorry, I
>> have no time; my deadline is within 3 days). However, I have observed
>> that this issue occurs not only for the "larger" dataset (6.8 MB) but
>> for all datasets and all jobs in general. For smaller datasets (1 MB)
>> the AM does not throw the exception; only the containers throw
>> exceptions (the same as in my previous e-mail). When these exceptions
>> are thrown, my code (AM and containers) does not perform any
>> operations on HDFS; it only performs in-memory computation and
>> communication. I have also observed that these exceptions occur at
>> "random"; I couldn't observe any pattern.
>> I can execute a job successfully, then resubmit the job, repeating the
>> experiment, and these exceptions occur (no change was made to the
>> source code, input dataset, or execution/input parameters).
>>
>> As for the high network usage, as I said, I don't have the data. But
>> YARN is running on nodes which are exclusive to my experiments; no
>> other software runs on these nodes (only the OS and YARN). Besides, I
>> don't think that 20 containers working on a 1 MB dataset (total) can
>> be called high network usage.
>>
>> regards
>> tmp
>>
>>
>> 2013/6/26 Devaraj k <[email protected]>
>>
>>> Hi,
>>>
>>> Could you check the network usage in the cluster when this problem
>>> occurs? It is probably caused by high network usage.
>>>
>>> Thanks
>>> Devaraj k
>>>
>>> *From:* blah blah [mailto:[email protected]]
>>> *Sent:* 26 June 2013 05:39
>>> *To:* [email protected]
>>> *Subject:* Yarn HDFS and Yarn Exceptions when processing "larger"
>>> datasets.
>>>
>>> Hi All
>>>
>>> First let me apologize for the poor thread title, but I have no idea
>>> how to express the problem in one sentence.
>>>
>>> I have implemented a new Application Master with the use of Yarn. I
>>> am using an old Yarn development version: revision 1437315, from
>>> 2013-01-23 (SNAPSHOT 3.0.0). I cannot update to the current trunk
>>> version, as the prototype deadline is soon and I don't have time to
>>> include the Yarn API changes.
>>>
>>> Currently I execute experiments in pseudo-distributed mode, and I use
>>> guava version 14.0-rc1. I have a problem with Yarn and HDFS
>>> exceptions for "larger" datasets. My AM works fine and I can execute
>>> it without a problem for a debug dataset (1 MB size).
>>> But when I increase the size of the input to 6.8 MB, I am getting the
>>> following exceptions:
>>>
>>> AM_Exceptions_Stack
>>>
>>> Exception in thread "Thread-3"
>>> java.lang.reflect.UndeclaredThrowableException
>>>     at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>>>     at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:77)
>>>     at org.apache.hadoop.yarn.client.AMRMClientImpl.allocate(AMRMClientImpl.java:194)
>>>     at org.tudelft.ludograph.app.AppMasterContainerRequester.sendContainerAskToRM(AppMasterContainerRequester.java:219)
>>>     at org.tudelft.ludograph.app.AppMasterContainerRequester.run(AppMasterContainerRequester.java:315)
>>>     at java.lang.Thread.run(Thread.java:662)
>>> Caused by: com.google.protobuf.ServiceException: java.io.IOException:
>>> Failed on local exception: java.io.IOException: Response is null.;
>>> Host Details : local host is: "linux-ljc5.site/127.0.0.1";
>>> destination host is: "0.0.0.0":8030;
>>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
>>>     at $Proxy10.allocate(Unknown Source)
>>>     at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:75)
>>>     ... 4 more
>>> Caused by: java.io.IOException: Failed on local exception:
>>> java.io.IOException: Response is null.; Host Details : local host is:
>>> "linux-ljc5.site/127.0.0.1"; destination host is: "0.0.0.0":8030;
>>>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:760)
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1240)
>>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>     ... 6 more
>>> Caused by: java.io.IOException: Response is null.
>>>     at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:950)
>>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
>>>
>>> Container_Exception
>>>
>>> Exception in thread "org.apache.hadoop.hdfs.SocketCache@6da0d866"
>>> java.lang.NoSuchMethodError:
>>> com.google.common.collect.LinkedListMultimap.values()Ljava/util/List;
>>>     at org.apache.hadoop.hdfs.SocketCache.clear(SocketCache.java:257)
>>>     at org.apache.hadoop.hdfs.SocketCache.access$100(SocketCache.java:45)
>>>     at org.apache.hadoop.hdfs.SocketCache$1.run(SocketCache.java:126)
>>>     at java.lang.Thread.run(Thread.java:662)
>>>
>>> As I said, this problem does not occur for the 1 MB input. For the
>>> 6.8 MB input nothing is changed except the input dataset. Now a
>>> little bit about what I am doing, to give you the context of the
>>> problem. My AM starts N (debug: 4) containers and each container
>>> reads its part of the input data. When this process is finished, I
>>> exchange parts of the input between containers (exchanging the IDs of
>>> input structures, to provide a means of communication between data
>>> structures). These exceptions occur during the process of exchanging
>>> IDs. I start a Netty server/client on each container and I use ports
>>> 12000-12099 as the means of communicating these IDs.
>>>
>>> Any help will be greatly appreciated. Sorry for any typos, and if the
>>> explanation is not clear just ask for any details you are interested
>>> in. It is currently after 2 AM; I hope that is a valid excuse.
>>>
>>> regards
>>>
>>> tmp
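(A note not from the original thread: a NoSuchMethodError like the LinkedListMultimap.values() one above usually means the Guava version on the container classpath is not the one Hadoop was compiled against, so Hadoop's class and the user-supplied guava 14.0-rc1 jar disagree about the method's return type. A minimal, hypothetical probe for this is to print which jar a suspect class is actually loaded from, run with the same classpath YARN builds for the container.)

```java
import java.security.CodeSource;

public class ClasspathProbe {

    // Returns the jar or directory the named class was loaded from,
    // or a marker string if it is absent or came from the bootstrap loader.
    static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap/unknown)"
                               : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // The class named in the Container_Exception stack trace above.
        System.out.println("LinkedListMultimap loaded from: "
                + locationOf("com.google.common.collect.LinkedListMultimap"));
    }
}
```

If the printed path is a different guava jar than the one you intend (e.g. one bundled under the Hadoop install), that classpath entry, not your code, is the likely source of the error.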
