[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container

2015-03-01 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342593#comment-14342593
 ] 

Naganarasimha G R commented on YARN-1621:
-

Hi Bartosz Lugowski,
Sorry for the delayed response; I will share the review comments today...

 Add CLI to list rows of task attempt ID, container ID, host of container, 
 state of container
 --

 Key: YARN-1621
 URL: https://issues.apache.org/jira/browse/YARN-1621
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
Assignee: Bartosz Ɓugowski
 Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, 
 YARN-1621.4.patch


 As more applications are moved to YARN, we need a generic CLI to list rows of 
 task attempt ID, container ID, host of container, and state of container. Today, 
 if a YARN application running in a container hangs, there is no way to find 
 out more information because the user does not know where each attempt is running.
 For each running application, it is useful to differentiate between 
 running/succeeded/failed/killed containers.
  
 {code:title=proposed yarn cli}
 $ yarn application -list-containers -applicationId appId [-containerState <state of container>]
 where containerState is an optional filter to list only containers in the given state.
 Container state can be running/succeeded/killed/failed/all.
 A user can specify more than one container state at once, e.g. KILLED,FAILED.
 Output columns: task attempt ID, container ID, host of container, state of container
 {code}
 The CLI should work with both running and completed applications. If a 
 container runs many task attempts, all attempts should be shown. That will 
 likely be the case for Tez container-reuse applications.
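 To make the proposal concrete, a hypothetical invocation might look as follows (the application id and the output row are illustrative placeholders, not output from any attached patch):
 {code}
 # Hypothetical use of the proposed command; the application id is a placeholder.
 yarn application -list-containers -applicationId application_1423000000000_0001 -containerState RUNNING,FAILED
 # Expected output shape, one row per attempt, with the columns proposed above:
 #   <task attempt ID>   <container ID>   <host of container>   <state of container>
 {code}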



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2764) counters.LimitExceededException shouldn't abort AsyncDispatcher

2015-03-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2764:
-
Description: 
I saw the following in container log:
{code}
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
attemptattempt_1414221548789_0023_r_03_0
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24
2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1414221548789_0023Job 
Transitioned from RUNNING to COMMITTING
2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the 
event EventType: JOB_COMMIT
2014-10-25 10:28:55,177 FATAL [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 
121 max=120
  at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101)
  at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108)
  at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
  at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
  at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106)
  at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203)
  at 
org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1754)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1737)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1718)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1089)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2049)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2045)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
  at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
  at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
  at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
  at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
  at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
  at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
  at java.lang.Thread.run(Thread.java:745)
2014-10-25 10:28:55,185 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
{code}
Counter limit was exceeded when JobFinishedEvent was created.

Better handling of LimitExceededException should be provided so that 
AsyncDispatcher can continue functioning.
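For context on the limit that was hit (this is not the dispatcher fix the issue asks for): the "max=120" in the log corresponds to the mapreduce.job.counters.max property, so a job that legitimately needs more counters can raise it. A hedged sketch, assuming the job's driver uses ToolRunner/GenericOptionsParser so -D properties are picked up; the jar, class, and paths are placeholders:
{code}
# Workaround sketch only: raise the counter ceiling for a job that needs more
# than the default 120 counters. The jar, main class and paths are placeholders.
hadoop jar my-job.jar com.example.MyJob \
  -Dmapreduce.job.counters.max=200 \
  /input /output
{code}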

  was:
I saw the following in container log:
{code}
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
attemptattempt_1414221548789_0023_r_03_0
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED
2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24
2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1414221548789_0023Job 
Transitioned from RUNNING to COMMITTING
2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the 
event EventType: JOB_COMMIT
2014-10-25 10:28:55,177 FATAL 

[jira] [Commented] (YARN-2828) Enable auto refresh of web pages (using http parameter)

2015-03-01 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342074#comment-14342074
 ] 

Allen Wittenauer commented on YARN-2828:


If jsoup is really the only choice here, rather than something we already have 
in use, then its version needs to be defined in hadoop-project rather than 
buried inside yarn.

 Enable auto refresh of web pages (using http parameter)
 ---

 Key: YARN-2828
 URL: https://issues.apache.org/jira/browse/YARN-2828
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Tim Robertson
Assignee: Vijay Bhat
Priority: Minor
 Attachments: YARN-2828.001.patch


 The MR1 JobTracker had a useful HTTP parameter, e.g. refresh=3, that 
 could be appended to URLs to enable a periodic page reload.  This was very useful 
 when developing MapReduce jobs, especially to watch counters changing.  This 
 is lost in the YARN interface.
 It could be implemented as a page element (e.g. a drop-down), but I'd 
 recommend that the page not be made more cluttered, and simply bring back the 
 optional refresh HTTP param.  It worked really nicely.
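 As a concrete illustration of the request (the host, port, and path are placeholders; the parameter name follows the MR1 convention described above):
 {code}
 # Hypothetical URL form once the refresh parameter is restored; rm-host:8088 is a
 # placeholder ResourceManager address. The page would re-render every 3 seconds.
 xdg-open "http://rm-host:8088/cluster/apps/RUNNING?refresh=3"
 {code}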



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3199) Fair Scheduler documentation improvements

2015-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342150#comment-14342150
 ] 

Hudson commented on YARN-3199:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #119 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/119/])
YARN-3199. Fair Scheduler documentation improvements (Rohit Agarwal via aw) 
(aw: rev 8472d729974ea3ccf9fff5ce4f5309aa8e43a49e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md


 Fair Scheduler documentation improvements
 -

 Key: YARN-3199
 URL: https://issues.apache.org/jira/browse/YARN-3199
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: Rohit Agarwal
Priority: Minor
  Labels: documentation
 Fix For: 3.0.0

 Attachments: YARN-3199-1.patch, YARN-3199.patch


 {{yarn.scheduler.increment-allocation-mb}} and 
 {{yarn.scheduler.increment-allocation-vcores}} are not documented.
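 A hedged sketch of the properties being documented (the values are illustrative assumptions, not taken from the patch); with the FairScheduler, container resource requests are rounded up to multiples of these increments:
 {code}
 # Prints the XML one would place inside <configuration> in yarn-site.xml;
 # the values below are assumptions for illustration only.
 cat <<'EOF'
 <property>
   <name>yarn.scheduler.increment-allocation-mb</name>
   <value>512</value>
 </property>
 <property>
   <name>yarn.scheduler.increment-allocation-vcores</name>
   <value>1</value>
 </property>
 EOF
 {code}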



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3199) Fair Scheduler documentation improvements

2015-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342173#comment-14342173
 ] 

Hudson commented on YARN-3199:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #853 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/853/])
YARN-3199. Fair Scheduler documentation improvements (Rohit Agarwal via aw) 
(aw: rev 8472d729974ea3ccf9fff5ce4f5309aa8e43a49e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md
* hadoop-yarn-project/CHANGES.txt


 Fair Scheduler documentation improvements
 -

 Key: YARN-3199
 URL: https://issues.apache.org/jira/browse/YARN-3199
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: Rohit Agarwal
Priority: Minor
  Labels: documentation
 Fix For: 3.0.0

 Attachments: YARN-3199-1.patch, YARN-3199.patch


 {{yarn.scheduler.increment-allocation-mb}} and 
 {{yarn.scheduler.increment-allocation-vcores}} are not documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3199) Fair Scheduler documentation improvements

2015-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342247#comment-14342247
 ] 

Hudson commented on YARN-3199:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2051 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2051/])
YARN-3199. Fair Scheduler documentation improvements (Rohit Agarwal via aw) 
(aw: rev 8472d729974ea3ccf9fff5ce4f5309aa8e43a49e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md
* hadoop-yarn-project/CHANGES.txt


 Fair Scheduler documentation improvements
 -

 Key: YARN-3199
 URL: https://issues.apache.org/jira/browse/YARN-3199
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: Rohit Agarwal
Priority: Minor
  Labels: documentation
 Fix For: 3.0.0

 Attachments: YARN-3199-1.patch, YARN-3199.patch


 {{yarn.scheduler.increment-allocation-mb}} and 
 {{yarn.scheduler.increment-allocation-vcores}} are not documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3199) Fair Scheduler documentation improvements

2015-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342255#comment-14342255
 ] 

Hudson commented on YARN-3199:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #110 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/110/])
YARN-3199. Fair Scheduler documentation improvements (Rohit Agarwal via aw) 
(aw: rev 8472d729974ea3ccf9fff5ce4f5309aa8e43a49e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md


 Fair Scheduler documentation improvements
 -

 Key: YARN-3199
 URL: https://issues.apache.org/jira/browse/YARN-3199
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: Rohit Agarwal
Priority: Minor
  Labels: documentation
 Fix For: 3.0.0

 Attachments: YARN-3199-1.patch, YARN-3199.patch


 {{yarn.scheduler.increment-allocation-mb}} and 
 {{yarn.scheduler.increment-allocation-vcores}} are not documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-03-01 Thread Beckham007 (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342613#comment-14342613
 ] 

Beckham007 commented on YARN-3080:
--

Getting the actual PID from docker inspect is good, but it is too 
complex. I think the NM should use the same approach as DefaultContainerExecutor; 
we solved this that way (the same as [~chenchun]), and it works well.

 The DockerContainerExecutor could not write the right pid to container pidFile
 --

 Key: YARN-3080
 URL: https://issues.apache.org/jira/browse/YARN-3080
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Beckham007
Assignee: Abin Shahab
 Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, 
 YARN-3080.patch


 The docker_container_executor_session.sh is like this:
 {quote}
 #!/usr/bin/env bash
 echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_02` > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
 /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
 /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
 GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
 GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
 GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
 GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
 --cpu-shares=1024 -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
  -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh
 {quote}
 The DockerContainerExecutor runs docker inspect before docker run, so 
 docker inspect cannot get the right pid for the docker container; signalContainer() 
 and NM restart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-03-01 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342744#comment-14342744
 ] 

Abin Shahab commented on YARN-3080:
---

[~beckham007] Thanks for your comment.
So you're saying that if I send a signal to the pid of the session script (as 
DefaultContainerExecutor does), it will act on the process that docker is 
running, and potentially kill it?

Please help me clarify my understanding:
I am running the following steps:
First I create a file similar to the session script. It writes the pid of the 
session to a pidfile
{code}
$cat > bash_session_pid.sh << EOF
 #!/bin/bash
 echo $$ > /tmp/pidfile
 exec setsid bash -c 'docker run -itd ubuntu sleep infinity'
 EOF
{code}

I chmod and run this script which starts a docker container
{code}
$chmod a+x bash_session_pid.sh
$./bash_session_pid.sh
$docker ps 
1b8ee377e3d2   ubuntu:14.04   sleep infinity   3 minutes ago   Up 3 minutes   cranky_stallman
{code}

Now I cat the pid of the session, and it says the pid is 9281
{code}
$cat /tmp/pidfile
9281
{code}


As you've suggested, I send a kill signal to the pid, hoping that'd kill the 
container
{code}
$kill -9 9281
{code}
I check if the docker container is killed:
{code}
$docker ps
1b8ee377e3d2   ubuntu:14.04   sleep infinity   6 minutes ago   Up 6 minutes   cranky_stallman
{code}

Since your method did not kill the container, I get the pid of the process 
running under the container:
{code}
$docker inspect 1b8ee377e3d2
9289
{code}
I check the tree of this process:
{code}
$pstree -ps 9289
init(1)---docker(6512)---sleep(9289)
{code}

As I had expected, this process is a child of the docker daemon, and therefore, 
if it is killed, the container will be killed. So I send a kill signal 
to this pid:
{code}
$kill -9 9289
{code}

Now I verify if the container is alive:
{code}
$docker ps
{code} 
Container is dead. 
From what I understand, the session pid has no relation to the actual pid of 
the container, and therefore sending it a signal is meaningless.
Therefore, if that meaningless pid is in the pidfile, the 
NodeManager/ResourceManager will not be able to send signals to containers as 
needed.
Please let me know where my understanding is mistaken, and I will gladly switch 
to the simpler implementation.

 The DockerContainerExecutor could not write the right pid to container pidFile
 --

 Key: YARN-3080
 URL: https://issues.apache.org/jira/browse/YARN-3080
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Beckham007
Assignee: Abin Shahab
 Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, 
 YARN-3080.patch


 The docker_container_executor_session.sh is like this:
 {quote}
 #!/usr/bin/env bash
 echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_02` > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
 /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
 /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
 GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
 GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
 GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
 GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
 --cpu-shares=1024 -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
  -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh
 {quote}
 The DockerContainerExecutor runs docker inspect before docker run, so 
 docker inspect cannot get the right pid for the docker container; signalContainer() 
 

[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-03-01 Thread Beckham007 (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342760#comment-14342760
 ] 

Beckham007 commented on YARN-3080:
--

Sorry for my mistake.
The pidfile has two functions: 1. When the NM restarts, it uses the pid to check 
whether the container has finished. 2. As [~chenchun] said, for 
signalContainer we can use docker kill --signal=SIGNAL containerId instead.
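A minimal sketch (not from any attached patch) of how both uses could go through the docker CLI instead of a host pid; the container name below is a placeholder standing in for the YARN container id used as the docker name:
{code}
# Hypothetical replacements for the two pidfile uses above.
NAME=container_placeholder_0001_01_000001               # placeholder docker container name
docker inspect --format '{{.State.Running}}' "$NAME"    # liveness check on NM restart
docker kill --signal=SIGTERM "$NAME"                     # signalContainer() equivalent
{code}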

 The DockerContainerExecutor could not write the right pid to container pidFile
 --

 Key: YARN-3080
 URL: https://issues.apache.org/jira/browse/YARN-3080
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Beckham007
Assignee: Abin Shahab
 Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, 
 YARN-3080.patch


 The docker_container_executor_session.sh is like this:
 {quote}
 #!/usr/bin/env bash
 echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_02` > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
 /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
 /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
 GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
 GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
 GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
 GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
 --cpu-shares=1024 -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
  -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh
 {quote}
 The DockerContainerExecutor runs docker inspect before docker run, so 
 docker inspect cannot get the right pid for the docker container; signalContainer() 
 and NM restart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-03-01 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342828#comment-14342828
 ] 

Abin Shahab commented on YARN-3080:
---

[~chengzhendong888], you are implying that the pidfile will never be 
used for anything else in the future? Container executors are required to provide a correct 
pidfile, and that's the API contract between them and the NodeManager. I don't 
see why DockerContainerExecutor should violate that contract.

Also, how would you derive a containerId from a pid in the NM? The pid that 
would be sent to signalContainer won't even be the correct pid (it will be the 
session script pid).



 The DockerContainerExecutor could not write the right pid to container pidFile
 --

 Key: YARN-3080
 URL: https://issues.apache.org/jira/browse/YARN-3080
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Beckham007
Assignee: Abin Shahab
 Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, 
 YARN-3080.patch


 The docker_container_executor_session.sh is like this:
 {quote}
 #!/usr/bin/env bash
 echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_02` > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
 /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
 /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
 GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
 GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
 GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
 GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
 --cpu-shares=1024 -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
  -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh
 {quote}
 The DockerContainerExecutor runs docker inspect before docker run, so 
 docker inspect cannot get the right pid for the docker container; signalContainer() 
 and NM restart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-03-01 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342861#comment-14342861
 ] 

Varun Vasudev commented on YARN-3080:
-

[~ashahab] sorry for the late response. The kill functionality uses the kill 
-9 -$PID form which sends the kill signal to the process as well as all the 
children, grandchildren, etc. There's a more detailed explanation here - 
http://stackoverflow.com/a/15139734. The code that generates the kill command 
is in Shell.java  - look for getSignalKillCommand.
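For illustration, a standalone sketch (not Hadoop's Shell.java code) of why the negative-pid form reaches children and grandchildren: it signals the entire process group.
{code}
# Standalone demo of the group-kill form described above; run it from an
# interactive shell, where a background job gets its own process group.
bash -c 'sleep 600 & sleep 600 & wait' &
PGID=$(ps -o pgid= -p $! | tr -d ' ')   # process-group id of the background job
kill -9 -- -"$PGID"                     # the leading minus targets every process in the group
{code}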

 The DockerContainerExecutor could not write the right pid to container pidFile
 --

 Key: YARN-3080
 URL: https://issues.apache.org/jira/browse/YARN-3080
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Beckham007
Assignee: Abin Shahab
 Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, 
 YARN-3080.patch


 The docker_container_executor_session.sh is like this:
 {quote}
 #!/usr/bin/env bash
 echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_02` > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
 /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
 /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
 GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
 GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
 GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
 GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
 --cpu-shares=1024 -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
  -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh
 {quote}
 The DockerContainerExecutor runs docker inspect before docker run, so 
 docker inspect cannot get the right pid for the docker container; signalContainer() 
 and NM restart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3265) CapacityScheduler deadlock when computing absolute max avail capacity (fix for trunk/branch-2)

2015-03-01 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3265:
-
Attachment: YARN-3265.6.patch

Attached a new patch that addresses the findbugs warnings; the test failure is not related.

 CapacityScheduler deadlock when computing absolute max avail capacity (fix 
 for trunk/branch-2)
 --

 Key: YARN-3265
 URL: https://issues.apache.org/jira/browse/YARN-3265
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Attachments: YARN-3265.1.patch, YARN-3265.2.patch, YARN-3265.3.patch, 
 YARN-3265.5.patch, YARN-3265.6.patch


 This patch is trying to solve the same problem described in YARN-3251, but 
 this is a longer term fix for trunk and branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-03-01 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342876#comment-14342876
 ] 

Abin Shahab commented on YARN-3080:
---

I agree with that, Varun. However, I'm not sure the process launched under
docker is a child of the session script (see my example above). I could be
wrong though.




 The DockerContainerExecutor could not write the right pid to container pidFile
 --

 Key: YARN-3080
 URL: https://issues.apache.org/jira/browse/YARN-3080
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Beckham007
Assignee: Abin Shahab
 Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, 
 YARN-3080.patch


 The docker_container_executor_session.sh is like this:
 {quote}
 #!/usr/bin/env bash
 echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_02` > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
 /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
 /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
 GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
 GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
 GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
 GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
 --cpu-shares=1024 -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
  -v 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
 /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh
 {quote}
 The DockerContainerExecutor runs docker inspect before docker run, so 
 docker inspect cannot get the right pid for the docker container; signalContainer() 
 and NM restart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-03-01 Thread Ryu Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated YARN-3249:

Attachment: YARN-3249.4.patch

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
 YARN-3249.4.patch, YARN-3249.patch, killapp-failed.log, killapp-failed2.log, 
 screenshot.png, screenshot2.png


 We want to be able to kill applications from the web UI, similar to the JobTracker.
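 For context, the same action is already available from the command line; this issue adds it to the ResourceManager web UI. A minimal sketch of the existing CLI path (the application id is a placeholder):
 {code}
 # Existing CLI equivalent of the kill action this issue adds to the web UI;
 # the application id below is a placeholder.
 yarn application -kill application_1425100000000_0001
 {code}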



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-03-01 Thread Ryu Kobayashi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342880#comment-14342880
 ] 

Ryu Kobayashi commented on YARN-3249:
-

New patch has the above update.

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
 YARN-3249.4.patch, YARN-3249.patch, killapp-failed.log, killapp-failed2.log, 
 screenshot.png, screenshot2.png


 We want to be able to kill applications from the web UI, similar to the JobTracker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-03-01 Thread Ryu Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated YARN-3249:

Attachment: screenshot2.png

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
 YARN-3249.4.patch, YARN-3249.patch, killapp-failed.log, killapp-failed2.log, 
 screenshot.png, screenshot2.png


 We want to be able to kill applications from the web UI, similar to the JobTracker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3199) Fair Scheduler documentation improvements

2015-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342287#comment-14342287
 ] 

Hudson commented on YARN-3199:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2069 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2069/])
YARN-3199. Fair Scheduler documentation improvements (Rohit Agarwal via aw) 
(aw: rev 8472d729974ea3ccf9fff5ce4f5309aa8e43a49e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md
* hadoop-yarn-project/CHANGES.txt


 Fair Scheduler documentation improvements
 -

 Key: YARN-3199
 URL: https://issues.apache.org/jira/browse/YARN-3199
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: Rohit Agarwal
Priority: Minor
  Labels: documentation
 Fix For: 3.0.0

 Attachments: YARN-3199-1.patch, YARN-3199.patch


 {{yarn.scheduler.increment-allocation-mb}} and 
 {{yarn.scheduler.increment-allocation-vcores}} are not documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3199) Fair Scheduler documentation improvements

2015-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342274#comment-14342274
 ] 

Hudson commented on YARN-3199:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #119 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/119/])
YARN-3199. Fair Scheduler documentation improvements (Rohit Agarwal via aw) 
(aw: rev 8472d729974ea3ccf9fff5ce4f5309aa8e43a49e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md


 Fair Scheduler documentation improvements
 -

 Key: YARN-3199
 URL: https://issues.apache.org/jira/browse/YARN-3199
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Assignee: Rohit Agarwal
Priority: Minor
  Labels: documentation
 Fix For: 3.0.0

 Attachments: YARN-3199-1.patch, YARN-3199.patch


 {{yarn.scheduler.increment-allocation-mb}} and 
 {{yarn.scheduler.increment-allocation-vcores}} are not documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)