Hitesh, this is the information I see in the RM logs. There are enough resources available on that NM.
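As a rough cross-check of the resource claim, the queue figures in the LeafQueue entry below can be recomputed from the raw memory numbers. This is a minimal sketch in Python, assuming the queue summary is pasted in as a string. The helper name queue_usage is hypothetical, and the ratio it computes is simply what the logged numbers are consistent with, not a statement of how the scheduler defines the field.

```python
import re

# Queue summary copied from the LeafQueue line in the RM log for this application.
LOG_LINE = (
    "queue=default: capacity=0.2, absoluteCapacity=0.2, "
    "usedResources=<memory:10240, vCores:2>, usedCapacity=0.61731374, "
    "absoluteUsedCapacity=0.12345679, numApps=3, numContainers=2 "
    "clusterResource=<memory:82944, vCores:21>"
)


def queue_usage(line):
    """Hypothetical helper: pull memory figures out of a LeafQueue summary.

    Returns (used_mb, cluster_mb, absolute_capacity, used_fraction_of_cluster).
    """
    used_mb = int(re.search(r"usedResources=<memory:(\d+)", line).group(1))
    cluster_mb = int(re.search(r"clusterResource=<memory:(\d+)", line).group(1))
    abs_capacity = float(re.search(r"absoluteCapacity=([\d.]+)", line).group(1))
    return used_mb, cluster_mb, abs_capacity, used_mb / cluster_mb


used_mb, cluster_mb, abs_capacity, abs_used = queue_usage(LOG_LINE)
# Memory the queue is configured for, as a share of the whole cluster.
queue_share_mb = abs_capacity * cluster_mb

print(f"queue uses {used_mb} MB of {cluster_mb} MB ({abs_used:.4f} of the cluster)")
print(f"configured queue share is about {queue_share_mb:.0f} MB")
```

With the figures from this log, the default queue holds 10240 MB of the 82944 MB cluster, which matches the logged absoluteUsedCapacity of 0.12345679.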
2016-06-17 19:04:50,406 INFO scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(154)) - Assigned container container_e54_1466115469995_0142_01_000001 of capacity <memory:5120, vCores:1> on host usw2stdpwo12.glassdoor.local:45454, which has 1 containers, <memory:5120, vCores:1> used and <memory:22528, vCores:6> available after allocation
2016-06-17 19:04:50,406 INFO capacity.LeafQueue (LeafQueue.java:assignContainer(1633)) - assignedContainer application attempt=appattempt_1466115469995_0142_000001 container=Container: [ContainerId: container_e54_1466115469995_0142_01_000001, NodeId: usw2stdpwo12.glassdoor.local:45454, NodeHttpAddress: usw2stdpwo12.glassdoor.local:8042, Resource: <memory:5120, vCores:1>, Priority: 0, Token: null, ] queue=default: capacity=0.2, absoluteCapacity=0.2, usedResources=<memory:10240, vCores:2>, usedCapacity=0.61731374, absoluteUsedCapacity=0.12345679, numApps=3, numContainers=2 clusterResource=<memory:82944, vCores:21> type=OFF_SWITCH
2016-06-17 19:04:50,407 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken for nodeId : usw2stdpwo12.glassdoor.local:45454 for container

On Fri, Jun 17, 2016 at 6:38 PM, Hitesh Shah <[email protected]> wrote:

> -dev@tez for now.
>
> Hello Anandha,
>
> The usual issue with this is a lack of resources. e.g. no cluster capacity
> to launch the AM, queue configs not allowing another AM to launch, the
> memory size configured for the AM is too large such that it cannot be
> scheduled on any existing node, etc.
>
> Can you search for this string “1466115469995_0142” within the
> ResourceManager logs? That should shed some more light on what is going on.
>
> thanks
> -- Hitesh
>
> > On Jun 17, 2016, at 6:30 PM, Anandha L Ranganathan <[email protected]> wrote:
> >
> > Yes. sufficient resources are available for that job. No other jobs are running and only this job is running.
> >
> > On Fri, Jun 17, 2016 at 5:16 PM, Jeff Zhang <[email protected]> wrote:
> > Please check RM UI whether you have sufficient resources for your app
> >
> > On Sat, Jun 18, 2016 at 7:35 AM, Anandha L Ranganathan <[email protected]> wrote:
> > I am upgrading one of our clusters from HDP 2.2 to HDP 2.4.0.
> >
> > The status I see in the Application monitoring URL is
> >
> > YARN Application Status: ACCEPTED: waiting for AM container to be allocated, launched and register with RM. But when we submit the MR job, it runs fine.
> >
> > It waits in that state for some time (300 seconds), then dies, and the service check fails. All nodes are live and in Active status.
> >
> > We tried to run the job manually, and the job stops at this point.
> >
> > hadoop --config /usr/hdp/2.4.0.0-169/hadoop/conf jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput1/
> > WARNING: Use "yarn jar" to launch YARN applications.
> > 16/06/17 19:04:47 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.7.0.2.4.0.0-169, revision=3c1431f45faaca982ecc8dad13a107787b834696, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20160210-0711 ]
> > 16/06/17 19:04:47 INFO impl.TimelineClientImpl: Timeline service address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> > 16/06/17 19:04:48 INFO client.RMProxy: Connecting to ResourceManager at usw2stdpma03.glassdoor.local/172.17.212.107:8050
> > 16/06/17 19:04:48 INFO client.TezClient: Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
> > 16/06/17 19:04:48 INFO impl.TimelineClientImpl: Timeline service address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> > 16/06/17 19:04:49 INFO examples.OrderedWordCount: Running OrderedWordCount
> > 16/06/17 19:04:49 INFO client.TezClient: Submitting DAG application with id: application_1466115469995_0142
> > 16/06/17 19:04:49 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: /hdp/apps/2.4.0.0-169/tez/tez.tar.gz
> > 16/06/17 19:04:49 INFO client.TezClient: Stage directory /tmp/root/staging doesn't exist and is created
> > 16/06/17 19:04:49 INFO client.TezClient: Tez system stage directory hdfs://dfs-nameservices/tmp/root/staging/.tez/application_1466115469995_0142 doesn't exist and is created
> > 16/06/17 19:04:49 INFO acls.ATSHistoryACLPolicyManager: Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1466115469995_0142
> > 16/06/17 19:04:50 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1466115469995_0142, dagName=OrderedWordCount, callerContext={ context=TezExamples, callerType=null, callerId=null }
> > 16/06/17 19:04:50 INFO impl.YarnClientImpl: Submitted application application_1466115469995_0142
> > 16/06/17 19:04:50 INFO client.TezClient: The url to track the Tez AM: http://usw2stdpma03.glassdoor.local:8088/proxy/application_1466115469995_0142/
> > 16/06/17 19:04:50 INFO impl.TimelineClientImpl: Timeline service address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> > 16/06/17 19:04:50 INFO client.RMProxy: Connecting to ResourceManager at usw2stdpma03.glassdoor.local/172.17.212.107:8050
> > 16/06/17 19:04:51 INFO client.DAGClientImpl: Waiting for DAG to start running
> >
> > How do I fix this problem?
> >
> > Thanks
> > Anand
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
