Hitesh,

This is the information I see in the RM logs. There are enough resources
available on that NM.


2016-06-17 19:04:50,406 INFO  scheduler.SchedulerNode
(SchedulerNode.java:allocateContainer(154)) - Assigned container
container_e54_1466115469995_0142_01_000001 of capacity <memory:5120,
vCores:1> on host usw2stdpwo12.glassdoor.local:45454, which has 1
containers, <memory:5120, vCores:1> used and <memory:22528, vCores:6>
available after allocation
2016-06-17 19:04:50,406 INFO  capacity.LeafQueue
(LeafQueue.java:assignContainer(1633)) - assignedContainer application
attempt=appattempt_1466115469995_0142_000001 container=Container:
[ContainerId: container_e54_1466115469995_0142_01_000001, NodeId:
usw2stdpwo12.glassdoor.local:45454, NodeHttpAddress:
usw2stdpwo12.glassdoor.local:8042, Resource: <memory:5120, vCores:1>,
Priority: 0, Token: null, ] queue=default: capacity=0.2,
absoluteCapacity=0.2, usedResources=<memory:10240, vCores:2>,
usedCapacity=0.61731374, absoluteUsedCapacity=0.12345679, numApps=3,
numContainers=2 clusterResource=<memory:82944, vCores:21> type=OFF_SWITCH
2016-06-17 19:04:50,407 INFO  security.NMTokenSecretManagerInRM
(NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken
for nodeId : usw2stdpwo12.glassdoor.local:45454 for container
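
As a cross-check on the LeafQueue line above, here is a minimal Python sketch that pulls the memory figures out of that line and computes how much of the queue's guaranteed share is still free. The helper names (parse_queue_stats, queue_headroom_mb) are hypothetical and made up for illustration; the log text is copied from the line above and joined onto one line. The sketch only looks at the queue's guaranteed share. It does not check the separate limit on AM resources (yarn.scheduler.capacity.maximum-am-resource-percent) or any user limits, so it is a rough sanity check rather than a full picture of what the scheduler decides.

```python
import re

# LeafQueue fields copied from the RM log line above, joined onto one line.
LOG_LINE = (
    "queue=default: capacity=0.2, absoluteCapacity=0.2, "
    "usedResources=<memory:10240, vCores:2>, usedCapacity=0.61731374, "
    "absoluteUsedCapacity=0.12345679, numApps=3, numContainers=2 "
    "clusterResource=<memory:82944, vCores:21> type=OFF_SWITCH"
)


def parse_queue_stats(line):
    """Extract memory figures (MB) and the queue's absolute capacity fraction."""
    used = int(re.search(r"usedResources=<memory:(\d+)", line).group(1))
    cluster = int(re.search(r"clusterResource=<memory:(\d+)", line).group(1))
    abs_cap = float(re.search(r"absoluteCapacity=([\d.]+)", line).group(1))
    return {"used_mb": used, "cluster_mb": cluster, "absolute_capacity": abs_cap}


def queue_headroom_mb(stats):
    """Memory left in the queue's guaranteed share (ignores AM and user limits)."""
    guaranteed_mb = stats["absolute_capacity"] * stats["cluster_mb"]
    return guaranteed_mb - stats["used_mb"]


stats = parse_queue_stats(LOG_LINE)
print(stats)
print("headroom (MB):", round(queue_headroom_mb(stats), 1))
```

With the figures from this log, the guaranteed share is 0.2 * 82944 = 16588.8 MB and 10240 MB is in use, so roughly 6348.8 MB remains, which is more than the 5120 MB requested for the AM container.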

On Fri, Jun 17, 2016 at 6:38 PM, Hitesh Shah <[email protected]> wrote:

> -dev@tez for now.
>
> Hello Anandha,
>
> The usual issue with this is a lack of resources. e.g. no cluster capacity
> to launch the AM, queue configs not allowing another AM to launch, the
> memory size configured for the AM is too large such that it cannot be
> scheduled on any existing node, etc.
>
> Can you search for this string “1466115469995_0142” within the
> ResourceManager logs? That should shed some more light on what is going on.
>
> thanks
> — Hitesh
>
>
> > On Jun 17, 2016, at 6:30 PM, Anandha L Ranganathan <[email protected]> wrote:
> >
> > Yes, sufficient resources are available for that job. No other jobs
> > are running; this is the only one.
> >
> >
> >
> > On Fri, Jun 17, 2016 at 5:16 PM, Jeff Zhang <[email protected]> wrote:
> > Please check in the RM UI whether you have sufficient resources for your app.
> >
> >
> > On Sat, Jun 18, 2016 at 7:35 AM, Anandha L Ranganathan <[email protected]> wrote:
> > I am upgrading one of our clusters from HDP 2.2 to HDP 2.4.0.
> >
> >
> >
> > The status I see in the Application monitoring URL is
> >
> > YARN Application Status: ACCEPTED: waiting for AM container to be
> > allocated, launched and register with RM.  But when we submit an MR job,
> > it runs fine.
> >
> > It waits in that state for some time (300 seconds), then dies, and the
> > service check fails.  All nodes are live and in Active status.
> >
> >
> >
> > We tried to run the job manually, and the job stops at this point.
> >
> > hadoop --config /usr/hdp/2.4.0.0-169/hadoop/conf jar
> > /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount
> > /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput1/
> > WARNING: Use "yarn jar" to launch YARN applications.
> > 16/06/17 19:04:47 INFO client.TezClient: Tez Client Version: [
> > component=tez-api, version=0.7.0.2.4.0.0-169,
> > revision=3c1431f45faaca982ecc8dad13a107787b834696,
> > SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git,
> > buildTime=20160210-0711 ]
> > 16/06/17 19:04:47 INFO impl.TimelineClientImpl: Timeline service
> > address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> > 16/06/17 19:04:48 INFO client.RMProxy: Connecting to ResourceManager at
> > usw2stdpma03.glassdoor.local/172.17.212.107:8050
> > 16/06/17 19:04:48 INFO client.TezClient: Using
> > org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to
> > manage Timeline ACLs
> > 16/06/17 19:04:48 INFO impl.TimelineClientImpl: Timeline service
> > address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> > 16/06/17 19:04:49 INFO examples.OrderedWordCount: Running OrderedWordCount
> > 16/06/17 19:04:49 INFO client.TezClient: Submitting DAG application
> > with id: application_1466115469995_0142
> > 16/06/17 19:04:49 INFO client.TezClientUtils: Using tez.lib.uris value
> > from configuration: /hdp/apps/2.4.0.0-169/tez/tez.tar.gz
> > 16/06/17 19:04:49 INFO client.TezClient: Stage directory
> > /tmp/root/staging doesn't exist and is created
> > 16/06/17 19:04:49 INFO client.TezClient: Tez system stage directory
> > hdfs://dfs-nameservices/tmp/root/staging/.tez/application_1466115469995_0142
> > doesn't exist and is created
> > 16/06/17 19:04:49 INFO acls.ATSHistoryACLPolicyManager: Created
> > Timeline Domain for History ACLs,
> > domainId=Tez_ATS_application_1466115469995_0142
> > 16/06/17 19:04:50 INFO client.TezClient: Submitting DAG to YARN,
> > applicationId=application_1466115469995_0142,
> > dagName=OrderedWordCount, callerContext={ context=TezExamples,
> > callerType=null, callerId=null }
> > 16/06/17 19:04:50 INFO impl.YarnClientImpl: Submitted application
> > application_1466115469995_0142
> > 16/06/17 19:04:50 INFO client.TezClient: The url to track the Tez AM:
> > http://usw2stdpma03.glassdoor.local:8088/proxy/application_1466115469995_0142/
> > 16/06/17 19:04:50 INFO impl.TimelineClientImpl: Timeline service address:
> > http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> > 16/06/17 19:04:50 INFO client.RMProxy: Connecting to ResourceManager at
> > usw2stdpma03.glassdoor.local/172.17.212.107:8050
> > 16/06/17 19:04:51 INFO client.DAGClientImpl: Waiting for DAG to start running
> >
> >
> >
> > How do I fix this problem?
> >
> > Thanks
> > Anand
> >
> >
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>
>
