[jira] [Commented] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-07 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572673#comment-16572673
 ] 

Chandni Singh commented on YARN-8160:
-

[~eyang] Thanks for pointing it out in the code. 

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: container_e02_1533231998644_0009_01_03.nm.log
>
>
> Ability to upgrade dockerized yarn native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} 
> API. {{reInitializeContainer}} does *NOT* change the ContainerId of the 
> upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following do *NOT* change:
> - container ID. The NM does not create it; it is provided to the NM, and 
> the RM does not create another container allocation here.
> - {{localizedResources}}: these stay the same if the upgrade does *NOT* 
> require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container changes. It is *NOT* a 
> relaunch. 
> *Changes required in the case of docker containers*
> - {{reInitializeContainer}} does not seem to work with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull the images, and 
> modify {{reInitializeContainer}} to trigger the docker container launch 
> without pulling the image first, which could be based on a flag.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the pull-image flag set to false, since the 
> NM will have already pulled the images.
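For reference, a minimal client-side sketch of driving an upgrade through the public {{NMClient}} API described above; the configuration object and the launch-context contents ({{localResources}}, {{env}}, {{commands}}) are placeholders, not part of this JIRA:

{code:java}
// Illustrative sketch only: re-initialize a running container with a new
// ContainerLaunchContext (e.g. one pointing at the upgraded docker image).
// The container ID stays the same; only the launch context changes.
NMClient nmClient = NMClient.createNMClient();
nmClient.init(conf);   // 'conf' is an existing YarnConfiguration (placeholder)
nmClient.start();

ContainerLaunchContext upgradedCtx = ContainerLaunchContext.newInstance(
    localResources,    // unchanged if the upgrade needs no new resources
    env,               // possibly updated environment
    commands,          // possibly updated launch command
    null, null, null); // serviceData, tokens, ACLs (placeholders)

// autoCommit = true: commit the upgrade once the new process is up
nmClient.reInitializeContainer(containerId, upgradedCtx, true);
{code}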



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4778) Support specifying resources for task containers in SLS

2018-08-07 Thread Ananyo Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572670#comment-16572670
 ] 

Ananyo Rao commented on YARN-4778:
--

[~leftnoteasy] is there a way to acquire and set specific container resources 
for MR jobs? Currently the json file doesn't hold this information. Could you 
suggest a way to acquire it as well?

> Support specifying resources for task containers in SLS
> ---
>
> Key: YARN-4778
> URL: https://issues.apache.org/jira/browse/YARN-4778
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-4778.1.patch
>
>
> Currently, SLS doesn't support specifying resources for task containers; it 
> uses a global default value for all containers.
> Instead, we should be able to specify different resources for task containers 
> in sls-job.conf.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-07 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572668#comment-16572668
 ] 

Eric Yang commented on YARN-8160:
-

[~csingh] The segment of code basically creates a child process and calls into 
[Line 
1531|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1531]
 in container-executor.c.  When docker run terminates, container-executor 
continues to run and performs docker inspect at [Line 
1767|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1767],
 and reports exit_code = -1 from container-executor, which is returned to the 
code that you pointed out.  We can try option 1 that you are suggesting if 
there is no easy way to prevent the race condition between reaping containers 
and container-executor's wait logic.

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: container_e02_1533231998644_0009_01_03.nm.log
>
>
> Ability to upgrade dockerized yarn native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} 
> API. {{reInitializeContainer}} does *NOT* change the ContainerId of the 
> upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following do *NOT* change:
> - container ID. The NM does not create it; it is provided to the NM, and 
> the RM does not create another container allocation here.
> - {{localizedResources}}: these stay the same if the upgrade does *NOT* 
> require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container changes. It is *NOT* a 
> relaunch. 
> *Changes required in the case of docker containers*
> - {{reInitializeContainer}} does not seem to work with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull the images, and 
> modify {{reInitializeContainer}} to trigger the docker container launch 
> without pulling the image first, which could be based on a flag.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the pull-image flag set to false, since the 
> NM will have already pulled the images.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-07 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572593#comment-16572593
 ] 

Chandni Singh commented on YARN-8160:
-

{quote}
 Exit code 255 is coming from docker inspect 
container_e02_1533231998644_0009_01_03. There looks like a race condition 
where the ContainerLaunch thread has issued the termination on the docker 
container pid. LinuxContainerExecutor still has an independent child process 
that is checking the liveness of the docker container.
{quote}
[~eyang], the container exit code comes from the statement below in 
{{ContainerLaunch.call()}}
{code}
ret = launchContainer(new ContainerStartContext.Builder()
  .setContainer(container)
  .setLocalizedResources(localResources)
  .setNmPrivateContainerScriptPath(nmPrivateContainerScriptPath)
  .setNmPrivateTokensPath(nmPrivateTokensPath)
  .setUser(user)
  .setAppId(appIdStr)
  .setContainerWorkDir(containerWorkDir)
  .setLocalDirs(localDirs)
  .setLogDirs(logDirs)
  .setFilecacheDirs(filecacheDirs)
  .setUserLocalDirs(userLocalDirs)
  .setContainerLocalDirs(containerLocalDirs)
  .setContainerLogDirs(containerLogDirs)
  .setUserFilecacheDirs(userFilecacheDirs)
  .setApplicationLocalDirs(applicationLocalDirs).build());
{code}

The docker inspect of a container that has been stopped and cleaned up would 
just report that the container is not alive. How does that affect the 
container's exit code? I cannot find this in the code. Could you please point 
me to it?

I still think the below are the only 2 solutions for this:
1. In the node manager, if a container is in REINITIALIZING_AWAITING_KILL and 
gets a CONTAINER_EXITED_WITH_FAILURE event, then it should handle it in a 
similar way as it currently handles CONTAINER_KILLED_ON_REQUEST (a rough 
sketch of what this could look like follows below).

2. Cleanup of container files is not performed until the container exits.
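A purely illustrative sketch of option 1, assuming the existing {{StateMachineFactory}} wiring in the NM's {{ContainerImpl}}; the target state and the {{KilledForReInitializationTransition}} class name are hypothetical placeholders, not the actual patch:

{code:java}
// Hypothetical fragment added to ContainerImpl's transition table:
// treat a failure exit that arrives while the container is being killed for
// re-initialization the same way as a kill-on-request, so the re-init
// proceeds instead of the container being marked as failed.
.addTransition(ContainerState.REINITIALIZING_AWAITING_KILL,
    ContainerState.REINITIALIZING_AWAITING_KILL,
    ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
    new KilledForReInitializationTransition()) // placeholder transition class
{code}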


> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Attachments: container_e02_1533231998644_0009_01_03.nm.log
>
>
> Ability to upgrade dockerized yarn native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} 
> API. {{reInitializeContainer}} does *NOT* change the ContainerId of the 
> upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following do *NOT* change:
> - container ID. The NM does not create it; it is provided to the NM, and 
> the RM does not create another container allocation here.
> - {{localizedResources}}: these stay the same if the upgrade does *NOT* 
> require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container changes. It is *NOT* a 
> relaunch. 
> *Changes required in the case of docker containers*
> - {{reInitializeContainer}} does not seem to work with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull the images, and 
> modify {{reInitializeContainer}} to trigger the docker container launch 
> without pulling the image first, which could be based on a flag.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the pull-image flag set to false, since the 
> NM will have already pulled the images.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8609) NM oom because of large container statuses

2018-08-07 Thread Xianghao Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghao Lu resolved YARN-8609.
---
Resolution: Duplicate

> NM oom because of large container statuses
> --
>
> Key: YARN-8609
> URL: https://issues.apache.org/jira/browse/YARN-8609
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Xianghao Lu
>Priority: Major
> Attachments: YARN-8609.001.patch, contain_status.jpg, oom.jpeg
>
>
> Sometimes, the NodeManager will send large container statuses to the 
> ResourceManager when the NodeManager starts with recovery; as a result, the 
> NodeManager fails to start because of OOM.
>  In my case, the container statuses size is 135M, which contains 11 
> container statuses, and I find the diagnostics of 5 containers are very 
> large (27M), so I truncate the container diagnostics in the patch.
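A minimal illustrative sketch of the truncation idea; the cap {{MAX_DIAGNOSTICS_LENGTH}} and the helper name are assumptions for illustration, not the attached patch:

{code:java}
// Illustrative only: cap the diagnostics kept per recovered container so a
// single container cannot blow up the NM -> RM container-status report.
private static final int MAX_DIAGNOSTICS_LENGTH = 10 * 1024; // assumed cap

static String truncateDiagnostics(String diagnostics) {
  if (diagnostics == null || diagnostics.length() <= MAX_DIAGNOSTICS_LENGTH) {
    return diagnostics;
  }
  // keep the tail, which usually holds the most recent (and useful) errors
  return diagnostics.substring(diagnostics.length() - MAX_DIAGNOSTICS_LENGTH);
}
{code}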



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8609) NM oom because of large container statuses

2018-08-07 Thread Xianghao Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572565#comment-16572565
 ] 

Xianghao Lu commented on YARN-8609:
---

{quote}

Those looking for a JIRA and finding the summary matching their symptoms should 
be directed to YARN-3998 since that alone is sufficient to address that problem.

{quote}


-YARN-3998- did solve the problem. However, I am worried that -YARN-3998- does 
not mention OOM, too much diagnostic info, large container status, etc. in its 
summary or description.

 Closed as a duplicate.

> NM oom because of large container statuses
> --
>
> Key: YARN-8609
> URL: https://issues.apache.org/jira/browse/YARN-8609
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Xianghao Lu
>Priority: Major
> Attachments: YARN-8609.001.patch, contain_status.jpg, oom.jpeg
>
>
> Sometimes, the NodeManager will send large container statuses to the 
> ResourceManager when the NodeManager starts with recovery; as a result, the 
> NodeManager fails to start because of OOM.
>  In my case, the container statuses size is 135M, which contains 11 
> container statuses, and I find the diagnostics of 5 containers are very 
> large (27M), so I truncate the container diagnostics in the patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8626) Create HomePolicyManager that sends all the requests to the home subcluster

2018-08-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572460#comment-16572460
 ] 

Hudson commented on YARN-8626:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14716 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14716/])
YARN-8626. Create HomePolicyManager that sends all the requests to the (gifuma: 
rev d838179d8dc257e582e8c7bb1cf312d4c0d3f733)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/amrmproxy/AbstractAMRMProxyPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/amrmproxy/RejectAMRMProxyPolicy.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/federation/policies/manager/TestHomePolicyManager.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/manager/HomePolicyManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/amrmproxy/BroadcastAMRMProxyPolicy.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/federation/policies/amrmproxy/TestHomeAMRMProxyPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/federation/utils/FederationPoliciesTestUtil.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/policies/amrmproxy/HomeAMRMProxyPolicy.java


> Create HomePolicyManager that sends all the requests to the home subcluster
> ---
>
> Key: YARN-8626
> URL: https://issues.apache.org/jira/browse/YARN-8626
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Giovanni Matteo Fumarola
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8626.000.patch, YARN-8626.001.patch, 
> YARN-8626.002.patch, YARN-8626.003.patch, YARN-8626.004.patch, 
> YARN-8626.005.patch, YARN-8626.006.patch, YARN-8626.007.patch, 
> YARN-8626.008.patch, YARN-8626.009.patch
>
>
> To have the same behavior as a regular non-federated deployment, one should 
> be able to submit jobs to the local RM and get the job constrained to that 
> subcluster.
> This JIRA creates an AMRMProxyPolicy that sends resources to the home 
> subcluster and mimics the behavior of a non-federated cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8626) Create HomePolicyManager that sends all the requests to the home subcluster

2018-08-07 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572443#comment-16572443
 ] 

Giovanni Matteo Fumarola commented on YARN-8626:


Thanks [~elgoiri] for the patch and [~subru] for the review.
Committed to Trunk.

> Create HomePolicyManager that sends all the requests to the home subcluster
> ---
>
> Key: YARN-8626
> URL: https://issues.apache.org/jira/browse/YARN-8626
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Giovanni Matteo Fumarola
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8626.000.patch, YARN-8626.001.patch, 
> YARN-8626.002.patch, YARN-8626.003.patch, YARN-8626.004.patch, 
> YARN-8626.005.patch, YARN-8626.006.patch, YARN-8626.007.patch, 
> YARN-8626.008.patch, YARN-8626.009.patch
>
>
> To have the same behavior as a regular non-federated deployment, one should 
> be able to submit jobs to the local RM and get the job constrained to that 
> subcluster.
> This JIRA creates an AMRMProxyPolicy that sends resources to the home 
> subcluster and mimics the behavior of a non-federated cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8633) Update JQuery version references in yarn-common

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572377#comment-16572377
 ] 

genericqa commented on YARN-8633:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 20m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
29m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 18m 
36s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 307 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: . {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}171m 19s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}354m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
\\
\\
|| Subsystem || Report/Notes 

[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572375#comment-16572375
 ] 

genericqa commented on YARN-8561:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 11 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
7s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 9 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 5 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
6s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 28m 55s{color} 
| {color:red} hadoop-yarn-applications in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
30s{color} | {color:green} hadoop-yarn-submarine in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 83m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShell |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA 

[jira] [Commented] (YARN-8407) Container launch exception in AM log should be printed in ERROR level

2018-08-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572324#comment-16572324
 ] 

Hudson commented on YARN-8407:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14715 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14715/])
YARN-8407. Container launch exception in AM log should be printed in (wangda: 
rev 861095f761b40171e0dc25f769f486d910cc3e88)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/component/instance/ComponentInstance.java


> Container launch exception in AM log should be printed in ERROR level
> -
>
> Key: YARN-8407
> URL: https://issues.apache.org/jira/browse/YARN-8407
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8407.001.patch, YARN-8407.002.patch, 
> YARN-8407.003.patch
>
>
> When a container launch fails because the docker image is not available, it 
> is logged at INFO level in the AM log. 
> Container launch failure should be logged as ERROR.
> Steps:
> Launch an httpd yarn-service application with an invalid docker image
>  
> {code:java}
> 2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
> container_e05_1528335963594_0001_01_02]: 
> container_e05_1528335963594_0001_01_02 completed. Reinsert back to 
> pending list and requested a new container.
> exitStatus=-1, diagnostics=[2018-06-07 01:51:02.363]Exception from 
> container-launch.
> Container id: container_e05_1528335963594_0001_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: Unable to find image 'xxx/httpd:0.1' locally
> Trying to pull repository xxx/httpd ...
> /usr/bin/docker-current: Get https://xxx/v1/_ping: dial tcp: lookup xxx on 
> yyy: no such host.
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Wrote the exit code 7 to 
> /grid/0/hadoop/yarn/local/nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02/container_e05_1528335963594_0001_01_02.pid.exitcode
> [2018-06-07 01:51:02.393]Diagnostic message from attempt :
> [2018-06-07 01:51:02.394]Container exited with a non-zero exit code 7. Last 
> 4096 bytes of stderr.txt :
> [2018-06-07 01:51:32.428]Could not find 
> nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02//container_e05_1528335963594_0001_01_02.pid
>  in any of the directories
> 2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
> container_e05_1528335963594_0001_01_02] Transitioned from STARTED to INIT 
> on STOP event{code}
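An illustrative sketch of the kind of change being requested, assuming the slf4j-style {{LOG}} already used in {{ComponentInstance}}; {{compInstanceId}} and {{status}} are placeholders:

{code:java}
// Illustrative only: surface the launch failure at ERROR level so it stands
// out in the AM log instead of being buried among INFO messages.
LOG.error("{}: container {} failed to launch. Diagnostics: {}",
    compInstanceId, status.getContainerId(), status.getDiagnostics());
{code}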



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6636) Fair Scheduler: respect node labels at resource request level

2018-08-07 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572303#comment-16572303
 ] 

Yufei Gu commented on YARN-6636:


There are multiple ways to approach node labeling in Fair Scheduler, and the 
community doesn't have consensus. The approach YARN-2497 took heavily involves 
queue management and fair share calculations. Whether node labeling should 
affect queue management depends on whether we want fairness on the node 
labeling. Node labeling partitions the cluster resources. My take is that we 
generally still need fairness on each partition, which is materialized by 
queue fair share. However, some particular cases only require node labeling to 
act like data locality, which doesn't need fairness.

> Fair Scheduler: respect node labels at resource request level
> -
>
> Key: YARN-6636
> URL: https://issues.apache.org/jira/browse/YARN-6636
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>Priority: Major
>
> This ticket is to track changes to fair scheduler to respect node labels at 
> resource request level. When the client sets labels at resource request 
> level, the scheduler must schedule those containers only on those nodes with 
> that label. 
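An illustrative client-side snippet of attaching a node label at the resource-request level via the public {{ResourceRequest}} API; the label name "gpu" and the container sizes are assumptions:

{code:java}
// Illustrative only: ask for containers constrained to nodes labeled "gpu".
ResourceRequest req = ResourceRequest.newInstance(
    Priority.newInstance(1),         // request priority
    ResourceRequest.ANY,             // resource name: any node/rack
    Resource.newInstance(2048, 2),   // 2 GB, 2 vcores per container
    4);                              // number of containers
req.setNodeLabelExpression("gpu");   // assumed label name
{code}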



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7089) Mark the log-aggregation-controller APIs as public

2018-08-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572277#comment-16572277
 ] 

Hudson commented on YARN-7089:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14714 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14714/])
YARN-7089. Mark the log-aggregation-controller APIs as public. (Zian (wangda: 
rev c0599151bb438d3dc0c6a54af93b2670770daefd)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java


> Mark the log-aggregation-controller APIs as public
> --
>
> Key: YARN-7089
> URL: https://issues.apache.org/jira/browse/YARN-7089
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Zian Chen
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7089.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8629) Container cleanup fails while trying to delete Cgroups

2018-08-07 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572276#comment-16572276
 ] 

Hudson commented on YARN-8629:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14714 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14714/])
YARN-8629. Container cleanup fails while trying to delete Cgroups. (Suma 
(wangda: rev d4258fcad71eabe2de3cf829cde36840200ab9b6)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsHandlerImpl.java


> Container cleanup fails while trying to delete Cgroups
> --
>
> Key: YARN-8629
> URL: https://issues.apache.org/jira/browse/YARN-8629
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8629.1.patch
>
>
> When an application fails to launch a container successfully, the cleanup of 
> the container also fails with the message below.
> {code}
> 2018-08-06 03:28:20,351 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.
> java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn-tmp-cxx/container_e02_156898541_0010_20_02/tasks
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.(FileInputStream.java:138)
> at java.io.FileInputStream.(FileInputStream.java:93)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.checkAndDeleteCgroup(CGroupsHandlerImpl.java:507)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.deleteCGroup(CGroupsHandlerImpl.java:542)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.postComplete(CGroupsCpuResourceHandlerImpl.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.postComplete(ResourceHandlerChain.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.postComplete(LinuxContainerExecutor.java:964)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reapContainer(LinuxContainerExecutor.java:787)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:821)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:161)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:57)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> 2018-08-06 03:28:20,372 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.{code}
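A minimal illustrative sketch of one way to make the cgroup cleanup tolerant of an already-missing tasks file; the surrounding variables ({{cgroupPath}}, {{LOG}}) are placeholders, and this is only an assumption about the shape of the fix, not the attached patch:

{code:java}
// Illustrative only: if the cgroup tasks file is already gone (e.g. the
// container never fully started or the cgroup was already removed), treat
// the cgroup as empty instead of failing the whole container cleanup.
File tasksFile = new File(cgroupPath, "tasks");
if (!tasksFile.exists()) {
  return true; // nothing to wait for; proceed with deleting the cgroup
}
try (BufferedReader reader = new BufferedReader(new FileReader(tasksFile))) {
  return reader.readLine() == null; // empty tasks file => safe to delete
} catch (IOException e) {
  LOG.warn("Failed to read cgroup tasks file " + tasksFile, e);
  return false;
}
{code}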



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8407) Container launch exception in AM log should be printed in ERROR level

2018-08-07 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572265#comment-16572265
 ] 

Wangda Tan commented on YARN-8407:
--

+1 to the patch. Thanks [~yeshavora]

> Container launch exception in AM log should be printed in ERROR level
> -
>
> Key: YARN-8407
> URL: https://issues.apache.org/jira/browse/YARN-8407
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-8407.001.patch, YARN-8407.002.patch, 
> YARN-8407.003.patch
>
>
> When a container launch fails because the docker image is not available, it 
> is logged at INFO level in the AM log. 
> Container launch failure should be logged as ERROR.
> Steps:
> Launch an httpd yarn-service application with an invalid docker image
>  
> {code:java}
> 2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
> container_e05_1528335963594_0001_01_02]: 
> container_e05_1528335963594_0001_01_02 completed. Reinsert back to 
> pending list and requested a new container.
> exitStatus=-1, diagnostics=[2018-06-07 01:51:02.363]Exception from 
> container-launch.
> Container id: container_e05_1528335963594_0001_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: Unable to find image 'xxx/httpd:0.1' locally
> Trying to pull repository xxx/httpd ...
> /usr/bin/docker-current: Get https://xxx/v1/_ping: dial tcp: lookup xxx on 
> yyy: no such host.
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Wrote the exit code 7 to 
> /grid/0/hadoop/yarn/local/nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02/container_e05_1528335963594_0001_01_02.pid.exitcode
> [2018-06-07 01:51:02.393]Diagnostic message from attempt :
> [2018-06-07 01:51:02.394]Container exited with a non-zero exit code 7. Last 
> 4096 bytes of stderr.txt :
> [2018-06-07 01:51:32.428]Could not find 
> nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02//container_e05_1528335963594_0001_01_02.pid
>  in any of the directories
> 2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
> container_e05_1528335963594_0001_01_02] Transitioned from STARTED to INIT 
> on STOP event{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572253#comment-16572253
 ] 

Wangda Tan commented on YARN-8561:
--

Attached ver.4 patch, fixed jenkins warnings.

> [Submarine] Add initial implementation: training job submission and job 
> history retrieve.
> -
>
> Key: YARN-8561
> URL: https://issues.apache.org/jira/browse/YARN-8561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8561.001.patch, YARN-8561.002.patch, 
> YARN-8561.003.patch, YARN-8561.004.patch
>
>
> Added the following parts:
> 1) New subcomponent of YARN, under the applications/ project. 
> 2) Tensorflow training job submission, including training (single node and 
> distributed). 
> - Supports Docker containers. 
> - Supports GPU isolation. 
> - Supports YARN registry DNS.
> 3) Retrieve job history.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8561:
-
Attachment: YARN-8561.004.patch

> [Submarine] Add initial implementation: training job submission and job 
> history retrieve.
> -
>
> Key: YARN-8561
> URL: https://issues.apache.org/jira/browse/YARN-8561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8561.001.patch, YARN-8561.002.patch, 
> YARN-8561.003.patch, YARN-8561.004.patch
>
>
> Added the following parts:
> 1) New subcomponent of YARN, under the applications/ project. 
> 2) Tensorflow training job submission, including training (single node and 
> distributed). 
> - Supports Docker containers. 
> - Supports GPU isolation. 
> - Supports YARN registry DNS.
> 3) Retrieve job history.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572247#comment-16572247
 ] 

genericqa commented on YARN-8561:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
1m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
12s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 9 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch 5 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 29m 43s{color} 
| {color:red} hadoop-yarn-applications in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 33s{color} 
| {color:red} hadoop-yarn-submarine in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
28s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | 

[jira] [Commented] (YARN-8629) Container cleanup fails while trying to delete Cgroups

2018-08-07 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572242#comment-16572242
 ] 

Wangda Tan commented on YARN-8629:
--

Ah forgot to mention, patch got committed to trunk/branch-3.1

> Container cleanup fails while trying to delete Cgroups
> --
>
> Key: YARN-8629
> URL: https://issues.apache.org/jira/browse/YARN-8629
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8629.1.patch
>
>
> When an application fails to launch a container successfully, the cleanup of 
> the container also fails with the message below.
> {code}
> 2018-08-06 03:28:20,351 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.
> java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn-tmp-cxx/container_e02_156898541_0010_20_02/tasks
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.(FileInputStream.java:138)
> at java.io.FileInputStream.(FileInputStream.java:93)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.checkAndDeleteCgroup(CGroupsHandlerImpl.java:507)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.deleteCGroup(CGroupsHandlerImpl.java:542)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.postComplete(CGroupsCpuResourceHandlerImpl.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.postComplete(ResourceHandlerChain.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.postComplete(LinuxContainerExecutor.java:964)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reapContainer(LinuxContainerExecutor.java:787)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:821)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:161)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:57)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> 2018-08-06 03:28:20,372 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8407) Container launch exception in AM log should be printed in ERROR level

2018-08-07 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8407:
-
Target Version/s: 3.2.0, 3.1.2

> Container launch exception in AM log should be printed in ERROR level
> -
>
> Key: YARN-8407
> URL: https://issues.apache.org/jira/browse/YARN-8407
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-8407.001.patch, YARN-8407.002.patch, 
> YARN-8407.003.patch
>
>
> When a container launch fails because the docker image is not available, it 
> is logged at INFO level in the AM log. 
> Container launch failure should be logged as ERROR.
> Steps:
> Launch an httpd yarn-service application with an invalid docker image
>  
> {code:java}
> 2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
> container_e05_1528335963594_0001_01_02]: 
> container_e05_1528335963594_0001_01_02 completed. Reinsert back to 
> pending list and requested a new container.
> exitStatus=-1, diagnostics=[2018-06-07 01:51:02.363]Exception from 
> container-launch.
> Container id: container_e05_1528335963594_0001_01_02
> Exit code: 7
> Exception message: Launch container failed
> Shell error output: Unable to find image 'xxx/httpd:0.1' locally
> Trying to pull repository xxx/httpd ...
> /usr/bin/docker-current: Get https://xxx/v1/_ping: dial tcp: lookup xxx on 
> yyy: no such host.
> See '/usr/bin/docker-current run --help'.
> Shell output: main : command provided 4
> main : run as user is hbase
> main : requested yarn user is hbase
> Creating script paths...
> Creating local dirs...
> Getting exit code file...
> Changing effective user to root...
> Wrote the exit code 7 to 
> /grid/0/hadoop/yarn/local/nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02/container_e05_1528335963594_0001_01_02.pid.exitcode
> [2018-06-07 01:51:02.393]Diagnostic message from attempt :
> [2018-06-07 01:51:02.394]Container exited with a non-zero exit code 7. Last 
> 4096 bytes of stderr.txt :
> [2018-06-07 01:51:32.428]Could not find 
> nmPrivate/application_1528335963594_0001/container_e05_1528335963594_0001_01_02//container_e05_1528335963594_0001_01_02.pid
>  in any of the directories
> 2018-06-07 01:51:32,966 [Component  dispatcher] INFO  
> instance.ComponentInstance - [COMPINSTANCE httpd-0 : 
> container_e05_1528335963594_0001_01_02] Transitioned from STARTED to INIT 
> on STOP event{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572217#comment-16572217
 ] 

genericqa commented on YARN-8561:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
13s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 9 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 5 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 29m 24s{color} 
| {color:red} hadoop-yarn-applications in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 30s{color} 
| {color:red} hadoop-yarn-submarine in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
22s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | 

[jira] [Commented] (YARN-8629) Container cleanup fails while trying to delete Cgroups

2018-08-07 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572206#comment-16572206
 ] 

Wangda Tan commented on YARN-8629:
--

+1, patch LGTM, thanks [~suma.shivaprasad].

> Container cleanup fails while trying to delete Cgroups
> --
>
> Key: YARN-8629
> URL: https://issues.apache.org/jira/browse/YARN-8629
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8629.1.patch
>
>
> When an application failed to launch container successfully, the cleanup of 
> container also failed with below message.
> {code}
> 2018-08-06 03:28:20,351 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.
> java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn-tmp-cxx/container_e02_156898541_0010_20_02/tasks
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at java.io.FileInputStream.<init>(FileInputStream.java:93)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.checkAndDeleteCgroup(CGroupsHandlerImpl.java:507)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.deleteCGroup(CGroupsHandlerImpl.java:542)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.postComplete(CGroupsCpuResourceHandlerImpl.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.postComplete(ResourceHandlerChain.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.postComplete(LinuxContainerExecutor.java:964)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reapContainer(LinuxContainerExecutor.java:787)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:821)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:161)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:57)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> 2018-08-06 03:28:20,372 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8629) Container cleanup fails while trying to delete Cgroups

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572199#comment-16572199
 ] 

genericqa commented on YARN-8629:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
45s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 80m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8629 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934686/YARN-8629.1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3607daaae148 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b1a59b1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21534/testReport/ |
| Max. process+thread count | 333 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21534/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |

[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572152#comment-16572152
 ] 

Wangda Tan commented on YARN-8561:
--

Attached the ver.3 patch, which includes help messages and cleans up unused code. 

> [Submarine] Add initial implementation: training job submission and job 
> history retrieve.
> -
>
> Key: YARN-8561
> URL: https://issues.apache.org/jira/browse/YARN-8561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8561.001.patch, YARN-8561.002.patch, 
> YARN-8561.003.patch
>
>
> Added following parts:
> 1) New subcomponent of YARN, under applications/ project. 
> 2) Tensorflow training job submission, including training (single node and 
> distributed). 
> - Supported Docker container. 
> - Support GPU isolation. 
> - Support YARN registry DNS.
> 3) Retrieve job history.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8561:
-
Attachment: YARN-8561.003.patch

> [Submarine] Add initial implementation: training job submission and job 
> history retrieve.
> -
>
> Key: YARN-8561
> URL: https://issues.apache.org/jira/browse/YARN-8561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8561.001.patch, YARN-8561.002.patch, 
> YARN-8561.003.patch
>
>
> Added following parts:
> 1) New subcomponent of YARN, under applications/ project. 
> 2) Tensorflow training job submission, including training (single node and 
> distributed). 
> - Supported Docker container. 
> - Support GPU isolation. 
> - Support YARN registry DNS.
> 3) Retrieve job history.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6636) Fair Scheduler: respect node labels at resource request level

2018-08-07 Thread Brandon Scheller (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572100#comment-16572100
 ] 

Brandon Scheller commented on YARN-6636:


For clarification, this is simply supporting resource-request-based node labels 
for the Fair Scheduler, correct?

Essentially giving all queues access to all node labels and allowing the 
Application/ResourceRequest labels to dictate individual node scheduling.

Am I understanding correctly that this would not affect queue management or 
fair-share calculations at all and would leave it to the user to set up 
queues/labels properly?
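
For context, a resource request that carries a node-label expression on the
client side looks roughly like the sketch below (standard YARN client API; the
label value and sizes are made up, and whether Fair Scheduler honors the label
is exactly what this JIRA is about):

{code:java}
// Illustrative only -- the label and resource values are made up.
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class LabeledRequestExample {
  public static ResourceRequest buildRequest() {
    ResourceRequest req = ResourceRequest.newInstance(
        Priority.newInstance(1),
        ResourceRequest.ANY,            // no host/rack constraint
        Resource.newInstance(2048, 1),  // 2 GB, 1 vcore
        1);                             // one container
    // The label the scheduler is asked to respect at the request level.
    req.setNodeLabelExpression("gpu");
    return req;
  }
}
{code}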

> Fair Scheduler: respect node labels at resource request level
> -
>
> Key: YARN-6636
> URL: https://issues.apache.org/jira/browse/YARN-6636
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>Priority: Major
>
> This ticket is to track changes to fair scheduler to respect node labels at 
> resource request level. When the client sets labels at resource request 
> level, the scheduler must schedule those containers only on those nodes with 
> that label. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572087#comment-16572087
 ] 

Wangda Tan commented on YARN-8561:
--

Thanks [~sunilg],
1. Addressed.

2. I think we can rely on yarn app -kill for now, we can add more cleanups, 
etc. in the future.

3. The reason I wrote a different one is that UnitsConversionUtil is not 
straightforward for users: why does G mean 1000 while Gi means 1024? It is going 
to be very hard to update UnitsConversionUtil because of compatibility issues. 
Also, we don't need so many units; m/M/g/G will be enough. (A rough sketch of 
this kind of parser is at the end of this comment.)

4. IIRC, the capacity scheduler matcher is there to check whether an absolute 
resource is being used or not, not for parsing. I think the two configs are 
slightly different in syntax. (Actually, I don't remember exactly what the 
differences are here, but to stay flexible, I suggest keeping this as is.)

5. The reason I keep it JobState is: 
- It's under submarine package.
- It's not likely that we will use mapreduce.JobState and submarine.JobState 
(and other classes like JobStatus, etc.) in the same class.

6. I think we can push this to a future patch; one possible solution is to 
include a yaml file to describe job configs, which the user can reuse instead of 
passing 10+ params to the CLI.

7. Done.

8. I'm not quite sure about this suggestion; it seems to me that we should add 
the getServiceResourceFromYarnResource method to service.Resource instead. I 
don't want to touch any service classes in this patch. Should we do it in a 
separate JIRA?

9. To me it is fine since we will print generated scripts and user can use 
\{{hadoop fs -cat}} to view files easily. Thoughts?

10. Done, now we throw exception when issue happens.

11. This depends on YARN-8488; once YARN-8488 is committed, we need to update 
this (in a separate JIRA).

12. Done.

13. You mean increasing it while the job is running? For TF, this is not allowed.

The previous Jenkins report is gone; I will address the Jenkins-reported issues 
in the next patch. 
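
Regarding point 3, a minimal sketch of the kind of parser described there (the
binary interpretation of g/G as 1024 MB is an assumption and may not match the
actual patch):

{code:java}
// Sketch only -- not the code in the patch. Assumes m/M == MB and g/G == 1024 MB.
public final class MemorySizeParser {
  private MemorySizeParser() {
  }

  /** Parses strings such as "2048M" or "4G" into megabytes. */
  public static long toMB(String value) {
    String v = value.trim();
    char unit = v.charAt(v.length() - 1);
    long number = Long.parseLong(v.substring(0, v.length() - 1).trim());
    switch (unit) {
      case 'm':
      case 'M':
        return number;
      case 'g':
      case 'G':
        return number * 1024L;
      default:
        throw new IllegalArgumentException(
            "Unsupported unit '" + unit + "' in " + value);
    }
  }
}
{code}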

> [Submarine] Add initial implementation: training job submission and job 
> history retrieve.
> -
>
> Key: YARN-8561
> URL: https://issues.apache.org/jira/browse/YARN-8561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8561.001.patch, YARN-8561.002.patch
>
>
> Added following parts:
> 1) New subcomponent of YARN, under applications/ project. 
> 2) Tensorflow training job submission, including training (single node and 
> distributed). 
> - Supported Docker container. 
> - Support GPU isolation. 
> - Support YARN registry DNS.
> 3) Retrieve job history.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8561:
-
Attachment: YARN-8561.002.patch

> [Submarine] Add initial implementation: training job submission and job 
> history retrieve.
> -
>
> Key: YARN-8561
> URL: https://issues.apache.org/jira/browse/YARN-8561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8561.001.patch, YARN-8561.002.patch
>
>
> Added following parts:
> 1) New subcomponent of YARN, under applications/ project. 
> 2) Tensorflow training job submission, including training (single node and 
> distributed). 
> - Supported Docker container. 
> - Support GPU isolation. 
> - Support YARN registry DNS.
> 3) Retrieve job history.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8629) Container cleanup fails while trying to delete Cgroups

2018-08-07 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572079#comment-16572079
 ] 

Suma Shivaprasad commented on YARN-8629:


CGroupsHandler may get called multiple times, since LCE.reapContainer and 
LCE.handleLaunchForLaunchType both call postComplete, which in turn calls 
checkAndDeleteCgroup. The cgroups folder no longer exists in the second run, 
which results in this error.
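
A minimal sketch of that idea (checking for the cgroup tasks file before trying
to read it); this is an assumption about the shape of the fix, not the actual
YARN-8629 patch:

{code:java}
// Sketch only -- not the actual patch. Skip the tasks-file read when the
// cgroup directory was already removed by an earlier postComplete call.
import java.io.File;

final class CgroupCleanupSketch {
  private CgroupCleanupSketch() {
  }

  static boolean checkAndDeleteCgroup(File cgroupDir) {
    File tasksFile = new File(cgroupDir, "tasks");
    if (!tasksFile.exists()) {
      // Nothing to do: the cgroup is already gone, so avoid the
      // FileNotFoundException and the noisy WARN seen in the log below.
      return true;
    }
    // ... read the tasks file, verify it is empty, then delete cgroupDir ...
    return cgroupDir.delete();
  }
}
{code}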

> Container cleanup fails while trying to delete Cgroups
> --
>
> Key: YARN-8629
> URL: https://issues.apache.org/jira/browse/YARN-8629
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8629.1.patch
>
>
> When an application failed to launch container successfully, the cleanup of 
> container also failed with below message.
> {code}
> 2018-08-06 03:28:20,351 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.
> java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn-tmp-cxx/container_e02_156898541_0010_20_02/tasks
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at java.io.FileInputStream.<init>(FileInputStream.java:93)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.checkAndDeleteCgroup(CGroupsHandlerImpl.java:507)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.deleteCGroup(CGroupsHandlerImpl.java:542)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.postComplete(CGroupsCpuResourceHandlerImpl.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.postComplete(ResourceHandlerChain.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.postComplete(LinuxContainerExecutor.java:964)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reapContainer(LinuxContainerExecutor.java:787)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:821)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:161)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:57)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> 2018-08-06 03:28:20,372 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8629) Container cleanup fails while trying to delete Cgroups

2018-08-07 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8629:
---
Attachment: YARN-8629.1.patch

> Container cleanup fails while trying to delete Cgroups
> --
>
> Key: YARN-8629
> URL: https://issues.apache.org/jira/browse/YARN-8629
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8629.1.patch
>
>
> When an application failed to launch container successfully, the cleanup of 
> container also failed with below message.
> {code}
> 2018-08-06 03:28:20,351 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.
> java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn-tmp-cxx/container_e02_156898541_0010_20_02/tasks
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at java.io.FileInputStream.<init>(FileInputStream.java:93)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.checkAndDeleteCgroup(CGroupsHandlerImpl.java:507)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.deleteCGroup(CGroupsHandlerImpl.java:542)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.postComplete(CGroupsCpuResourceHandlerImpl.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.postComplete(ResourceHandlerChain.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.postComplete(LinuxContainerExecutor.java:964)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reapContainer(LinuxContainerExecutor.java:787)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:821)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:161)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:57)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> 2018-08-06 03:28:20,372 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8331) Race condition in NM container launched after done

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572032#comment-16572032
 ] 

genericqa commented on YARN-8331:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
30s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 80m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8331 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934671/YARN-8331.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux acf05b0a3ecc 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6ed8593 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21532/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21532/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Race condition in NM container launched after done
> 

[jira] [Commented] (YARN-8626) Create HomePolicyManager that sends all the requests to the home subcluster

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572009#comment-16572009
 ] 

genericqa commented on YARN-8626:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
14s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 13s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8626 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934676/YARN-8626.009.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bed697d83334 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6ed8593 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21533/testReport/ |
| Max. process+thread count | 409 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21533/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Create HomePolicyManager that sends all the requests to the home subcluster
> 

[jira] [Commented] (YARN-6972) Adding RM ClusterId in AppInfo

2018-08-07 Thread Tanuj Nayak (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571999#comment-16571999
 ] 

Tanuj Nayak commented on YARN-6972:
---

Hey, it seems like this test is flaky: I uploaded the exact same patch twice and 
this failure only appears here. [~giovanni.fumarola] 

> Adding RM ClusterId in AppInfo
> --
>
> Key: YARN-6972
> URL: https://issues.apache.org/jira/browse/YARN-6972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Tanuj Nayak
>Priority: Major
> Attachments: YARN-6972.001.patch, YARN-6972.002.patch, 
> YARN-6972.003.patch, YARN-6972.004.patch, YARN-6972.005.patch, 
> YARN-6972.006.patch, YARN-6972.007.patch, YARN-6972.008.patch, 
> YARN-6972.009.patch, YARN-6972.010.patch, YARN-6972.011.patch, 
> YARN-6972.012.patch, YARN-6972.013.patch, YARN-6972.014.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7018) Interface for adding extra behavior to node heartbeats

2018-08-07 Thread Manikandan R (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-7018:
---
Attachment: YARN-7018.POC.001.patch

> Interface for adding extra behavior to node heartbeats
> --
>
> Key: YARN-7018
> URL: https://issues.apache.org/jira/browse/YARN-7018
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7018.POC.001.patch
>
>
> This JIRA tracks an interface for plugging in new behavior to node heartbeat 
> processing.  Adding a formal interface for additional node heartbeat 
> processing would allow admins to configure new functionality that is 
> scheduler-independent without needing to replace the entire scheduler.  For 
> example, both YARN-5202 and YARN-5215 had approaches where node heartbeat 
> processing was extended to implement new functionality that was essentially 
> scheduler-independent and could be implemented as a plugin with this 
> interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7018) Interface for adding extra behavior to node heartbeats

2018-08-07 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571985#comment-16571985
 ] 

Manikandan R commented on YARN-7018:


[~jlowe] Attaching a POC patch based on the earlier discussion (and YARN-5215 as 
a use case). Please review the structure and share your comments. Based on your 
feedback, we can iterate further to make it a concrete patch. For example, we 
could probably move methods like 
AbstractYarnScheduler#updateNodeResourceUtilization as well?

> Interface for adding extra behavior to node heartbeats
> --
>
> Key: YARN-7018
> URL: https://issues.apache.org/jira/browse/YARN-7018
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
>
> This JIRA tracks an interface for plugging in new behavior to node heartbeat 
> processing.  Adding a formal interface for additional node heartbeat 
> processing would allow admins to configure new functionality that is 
> scheduler-independent without needing to replace the entire scheduler.  For 
> example, both YARN-5202 and YARN-5215 had approaches where node heartbeat 
> processing was extended to implement new functionality that was essentially 
> scheduler-independent and could be implemented as a plugin with this 
> interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7018) Interface for adding extra behavior to node heartbeats

2018-08-07 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571962#comment-16571962
 ] 

Manikandan R commented on YARN-7018:


[~jlowe] and I had a very preliminary offline discussion on approaches. Thank 
you [~jlowe] for explaining the requirements in detail.

Summary of the discussion: (Copying [~jlowe]'s suggestion as is)

The point of YARN-7018 is to create a hook that runs before the scheduler and, 
as much as possible, is *not* scheduler-dependent. If we have to do something 
scheduler-dependent then we might as well just modify the schedulers directly.

I haven't had much time to think about it, but one potential way to implement 
this is a very simple plugin API. Something like this:
 - Admin configures the plugin via a conf key
 - Plugin only has one API call, onNodeHeartbeat or something like that. It's 
called when a NODE_UPDATE event is processed and before the main scheduler loop 
is called
 - Call arguments would include the SchedulerNode and maybe the RMContext.

From the RMContext the plugin can get the scheduler and do scheduler-specific 
things if it wants to, or can be scheduler-specific if it wants to.

I'd rather not force plugin writers to implement every scheduler supported. 
It's a burden to writing a plugin and an obstacle to providing new schedulers 
in YARN.
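
A rough sketch of what such a hook could look like, based on the description
above (the interface and method names are illustrative, not a committed YARN
API):

{code:java}
// Illustrative sketch of the proposed plugin hook -- not a real YARN API.
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;

public interface NodeHeartbeatPlugin {
  /**
   * Called while a NODE_UPDATE event is processed, before the main
   * scheduler loop runs. The plugin can reach the scheduler through the
   * RMContext if it wants to do something scheduler-specific.
   */
  void onNodeHeartbeat(SchedulerNode node, RMContext rmContext);
}
{code}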

> Interface for adding extra behavior to node heartbeats
> --
>
> Key: YARN-7018
> URL: https://issues.apache.org/jira/browse/YARN-7018
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
>
> This JIRA tracks an interface for plugging in new behavior to node heartbeat 
> processing.  Adding a formal interface for additional node heartbeat 
> processing would allow admins to configure new functionality that is 
> scheduler-independent without needing to replace the entire scheduler.  For 
> example, both YARN-5202 and YARN-5215 had approaches where node heartbeat 
> processing was extended to implement new functionality that was essentially 
> scheduler-independent and could be implemented as a plugin with this 
> interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8626) Create HomePolicyManager that sends all the requests to the home subcluster

2018-08-07 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571931#comment-16571931
 ] 

Íñigo Goiri commented on YARN-8626:
---

There was an unused import in [^YARN-8626.008.patch]; added  
[^YARN-8626.009.patch].

> Create HomePolicyManager that sends all the requests to the home subcluster
> ---
>
> Key: YARN-8626
> URL: https://issues.apache.org/jira/browse/YARN-8626
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Giovanni Matteo Fumarola
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8626.000.patch, YARN-8626.001.patch, 
> YARN-8626.002.patch, YARN-8626.003.patch, YARN-8626.004.patch, 
> YARN-8626.005.patch, YARN-8626.006.patch, YARN-8626.007.patch, 
> YARN-8626.008.patch, YARN-8626.009.patch
>
>
> To have the same behavior as a regular non-federated deployment, one should 
> be able to submit jobs to the local RM and get the job constrained to that 
> subcluster.
> This JIRA creates an AMRMProxyPolicy that sends resources to the home 
> subcluster and mimics the behavior of a non-federated cluster.
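
As a rough illustration of the idea described above (simplified; not the actual
YARN-8626 policy class, and the subcluster identifier is represented by a plain
String here), routing every resource request to the home subcluster boils down
to:

{code:java}
// Simplified sketch of the routing idea -- not the actual YARN-8626 code.
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ResourceRequest;

class HomeRoutingSketch {
  /** Every request is mapped to the single home subcluster. */
  static Map<String, List<ResourceRequest>> split(
      String homeSubClusterId, List<ResourceRequest> requests) {
    return Collections.singletonMap(homeSubClusterId, requests);
  }
}
{code}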



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8626) Create HomePolicyManager that sends all the requests to the home subcluster

2018-08-07 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-8626:
--
Attachment: YARN-8626.009.patch

> Create HomePolicyManager that sends all the requests to the home subcluster
> ---
>
> Key: YARN-8626
> URL: https://issues.apache.org/jira/browse/YARN-8626
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Giovanni Matteo Fumarola
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8626.000.patch, YARN-8626.001.patch, 
> YARN-8626.002.patch, YARN-8626.003.patch, YARN-8626.004.patch, 
> YARN-8626.005.patch, YARN-8626.006.patch, YARN-8626.007.patch, 
> YARN-8626.008.patch, YARN-8626.009.patch
>
>
> To have the same behavior as a regular non-federated deployment, one should 
> be able to submit jobs to the local RM and get the job constrained to that 
> subcluster.
> This JIRA creates an AMRMProxyPolicy that sends resources to the home 
> subcluster and mimics the behavior of a non-federated cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8626) Create HomePolicyManager that sends all the requests to the home subcluster

2018-08-07 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571916#comment-16571916
 ] 

Íñigo Goiri commented on YARN-8626:
---

It looks like  [^YARN-8626.008.patch] is fine and the unit tests run fine:
* 
[TestHomePolicyManager|https://builds.apache.org/job/PreCommit-YARN-Build/21526/testReport/org.apache.hadoop.yarn.server.federation.policies.manager/TestHomePolicyManager/]
* 
[TestHomeAMRMProxyPolicy|https://builds.apache.org/job/PreCommit-YARN-Build/21526/testReport/org.apache.hadoop.yarn.server.federation.policies.amrmproxy/TestHomeAMRMProxyPolicy/]

> Create HomePolicyManager that sends all the requests to the home subcluster
> ---
>
> Key: YARN-8626
> URL: https://issues.apache.org/jira/browse/YARN-8626
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Giovanni Matteo Fumarola
>Assignee: Íñigo Goiri
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8626.000.patch, YARN-8626.001.patch, 
> YARN-8626.002.patch, YARN-8626.003.patch, YARN-8626.004.patch, 
> YARN-8626.005.patch, YARN-8626.006.patch, YARN-8626.007.patch, 
> YARN-8626.008.patch
>
>
> To have the same behavior as a regular non-federated deployment, one should 
> be able to submit jobs to the local RM and get the job constrained to that 
> subcluster.
> This JIRA creates an AMRMProxyPolicy that sends resources to the home 
> subcluster and mimics the behavior of a non-federated cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8331) Race condition in NM container launched after done

2018-08-07 Thread Pradeep Ambati (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Ambati reassigned YARN-8331:


Assignee: Pradeep Ambati

> Race condition in NM container launched after done
> --
>
> Key: YARN-8331
> URL: https://issues.apache.org/jira/browse/YARN-8331
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Pradeep Ambati
>Priority: Major
> Attachments: YARN-8331.001.patch
>
>
> When a container is launching, in ContainerLaunch#launchContainer, state is 
> SCHEDULED,
> kill event was sent to this container, state : SCHEDULED->KILLING->DONE
>  Then ContainerLaunch send CONTAINER_LAUNCHED event and start the container 
> processes. These absent container processes will not be cleaned up anymore.
>  
> {code:java}
> 2018-05-21 13:11:56,114 INFO  [Thread-11] nodemanager.NMAuditLogger 
> (NMAuditLogger.java:logSuccess(94)) - USER=nobody OPERATION=Start Container 
> Request   TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_0_CONTAINERID=container_0__01_00
> 2018-05-21 13:11:56,114 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application 
> application_0_ transitioned from NEW to INITING
> 2018-05-21 13:11:56,114 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:transition(446)) - Adding 
> container_0__01_00 to application application_0_
> 2018-05-21 13:11:56,118 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application 
> application_0_ transitioned from INITING to RUNNING
> 2018-05-21 13:11:56,119 INFO  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2111)) - Container 
> container_0__01_00 transitioned from NEW to SCHEDULED
> 2018-05-21 13:11:56,119 INFO  [NM ContainerManager dispatcher] 
> containermanager.AuxServices (AuxServices.java:handle(220)) - Got event 
> CONTAINER_INIT for appId application_0_
> 2018-05-21 13:11:56,119 INFO  [NM ContainerManager dispatcher] 
> scheduler.ContainerScheduler (ContainerScheduler.java:startContainer(504)) - 
> Starting container [container_0__01_00]
> 2018-05-21 13:11:56,226 INFO  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2111)) - Container 
> container_0__01_00 transitioned from SCHEDULED to KILLING
> 2018-05-21 13:11:56,227 INFO  [NM ContainerManager dispatcher] 
> containermanager.TestContainerManager 
> (BaseContainerManagerTest.java:delete(287)) - Psuedo delete: user - nobody, 
> type - FILE
> 2018-05-21 13:11:56,227 INFO  [NM ContainerManager dispatcher] 
> nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(94)) - USER=nobody   
>  OPERATION=Container Finished - Killed   TARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_0_
> CONTAINERID=container_0__01_00
> 2018-05-21 13:11:56,238 INFO  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2111)) - Container 
> container_0__01_00 transitioned from KILLING to DONE
> 2018-05-21 13:11:56,238 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:transition(489)) - Removing 
> container_0__01_00 from application application_0_
> 2018-05-21 13:11:56,239 INFO  [NM ContainerManager dispatcher] 
> monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:onStopMonitoringContainer(932)) - Stopping 
> resource-monitoring for container_0__01_00
> 2018-05-21 13:11:56,239 INFO  [NM ContainerManager dispatcher] 
> containermanager.AuxServices (AuxServices.java:handle(220)) - Got event 
> CONTAINER_STOP for appId application_0_
> 2018-05-21 13:11:56,274 WARN  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2106)) - Can't handle this 
> event at current state: Current: [DONE], eventType: [CONTAINER_LAUNCHED], 
> container: [container_0__01_00]
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_LAUNCHED at DONE
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2104)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:104)
>   

[jira] [Updated] (YARN-8629) Container cleanup fails while trying to delete Cgroups

2018-08-07 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8629:
-
Target Version/s: 3.2.0, 3.1.2
Priority: Critical  (was: Major)

> Container cleanup fails while trying to delete Cgroups
> --
>
> Key: YARN-8629
> URL: https://issues.apache.org/jira/browse/YARN-8629
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Suma Shivaprasad
>Priority: Critical
>
> When an application failed to launch container successfully, the cleanup of 
> container also failed with below message.
> {code}
> 2018-08-06 03:28:20,351 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.
> java.io.FileNotFoundException: 
> /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn-tmp-cxx/container_e02_156898541_0010_20_02/tasks
>  (No such file or directory)
> at java.io.FileInputStream.open0(Native Method)
> at java.io.FileInputStream.open(FileInputStream.java:195)
> at java.io.FileInputStream.<init>(FileInputStream.java:138)
> at java.io.FileInputStream.<init>(FileInputStream.java:93)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.checkAndDeleteCgroup(CGroupsHandlerImpl.java:507)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl.deleteCGroup(CGroupsHandlerImpl.java:542)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsCpuResourceHandlerImpl.postComplete(CGroupsCpuResourceHandlerImpl.java:238)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.postComplete(ResourceHandlerChain.java:111)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.postComplete(LinuxContainerExecutor.java:964)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reapContainer(LinuxContainerExecutor.java:787)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:821)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:161)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:57)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:748)
> 2018-08-06 03:28:20,372 WARN  resources.CGroupsHandlerImpl 
> (CGroupsHandlerImpl.java:checkAndDeleteCgroup(523)) - Failed to read cgroup 
> tasks file.{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8331) Race condition in NM container launched after done

2018-08-07 Thread Pradeep Ambati (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571886#comment-16571886
 ] 

Pradeep Ambati commented on YARN-8331:
--

Uploaded a patch to address this JIRA. The basic idea is to treat the "scheduled" 
state as the "running" state when a container is killed (CONTAINER_KILL event). If 
a container is in the "scheduled" state, it is either not yet launched, or it is 
launched but has not yet received the "CONTAINER_LAUNCHED" event from the 
launcher. When a CONTAINER_KILL event is sent to a container in the "scheduled" 
state, we handle the following two scenarios (a simplified sketch follows the 
list):

1. If the container is not launched yet, we can catch this in 
ContainersLauncher.java and send a CONTAINER_KILLED_ON_REQUEST event to the 
container, which triggers container cleanup (and the container eventually 
transitions to DONE).

2. On the other hand, if the container is launched and a kill event/signal is 
sent to it before it receives the CONTAINER_LAUNCHED event, the fix does the 
right thing by treating the container as running.
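
A minimal sketch of that idea, using simplified hypothetical types rather than the actual NM classes, showing both scenarios: a kill that arrives before the launch suppresses the launch, while a kill that arrives after the launch signals the already-started process.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Illustrative only -- not the YARN-8331 patch. Both methods are synchronized so
// the "killed before launch" decision and the launch itself cannot interleave.
class SimplifiedContainersLauncher {
  private final Set<String> killedBeforeLaunch = new HashSet<>();
  private final Set<String> launched = new HashSet<>();

  synchronized void onKill(String containerId) {
    if (launched.contains(containerId)) {
      signalProcess(containerId);          // scenario 2: process exists, kill it
    } else {
      killedBeforeLaunch.add(containerId); // scenario 1: remember it, never start it
      // here the real code would emit CONTAINER_KILLED_ON_REQUEST so the
      // container state machine still reaches DONE
    }
  }

  synchronized void onLaunchRequest(String containerId) {
    if (killedBeforeLaunch.contains(containerId)) {
      return;                              // skip launching a container that was already killed
    }
    launched.add(containerId);
    startProcess(containerId);
  }

  private void startProcess(String containerId) { /* exec the container process */ }
  private void signalProcess(String containerId) { /* send SIGTERM/SIGKILL to the process */ }
}
{code}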

> Race condition in NM container launched after done
> --
>
> Key: YARN-8331
> URL: https://issues.apache.org/jira/browse/YARN-8331
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Priority: Major
> Attachments: YARN-8331.001.patch
>
>
> When a container is launching (in ContainerLaunch#launchContainer) its state is 
> SCHEDULED. A kill event sent to the container at this point moves the state 
> SCHEDULED->KILLING->DONE.
>  ContainerLaunch then sends the CONTAINER_LAUNCHED event and starts the container 
> processes. These orphaned container processes are never cleaned up.
>  
> {code:java}
> 2018-05-21 13:11:56,114 INFO  [Thread-11] nodemanager.NMAuditLogger 
> (NMAuditLogger.java:logSuccess(94)) - USER=nobody OPERATION=Start Container 
> Request   TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_0_CONTAINERID=container_0__01_00
> 2018-05-21 13:11:56,114 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application 
> application_0_ transitioned from NEW to INITING
> 2018-05-21 13:11:56,114 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:transition(446)) - Adding 
> container_0__01_00 to application application_0_
> 2018-05-21 13:11:56,118 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application 
> application_0_ transitioned from INITING to RUNNING
> 2018-05-21 13:11:56,119 INFO  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2111)) - Container 
> container_0__01_00 transitioned from NEW to SCHEDULED
> 2018-05-21 13:11:56,119 INFO  [NM ContainerManager dispatcher] 
> containermanager.AuxServices (AuxServices.java:handle(220)) - Got event 
> CONTAINER_INIT for appId application_0_
> 2018-05-21 13:11:56,119 INFO  [NM ContainerManager dispatcher] 
> scheduler.ContainerScheduler (ContainerScheduler.java:startContainer(504)) - 
> Starting container [container_0__01_00]
> 2018-05-21 13:11:56,226 INFO  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2111)) - Container 
> container_0__01_00 transitioned from SCHEDULED to KILLING
> 2018-05-21 13:11:56,227 INFO  [NM ContainerManager dispatcher] 
> containermanager.TestContainerManager 
> (BaseContainerManagerTest.java:delete(287)) - Psuedo delete: user - nobody, 
> type - FILE
> 2018-05-21 13:11:56,227 INFO  [NM ContainerManager dispatcher] 
> nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(94)) - USER=nobody   
>  OPERATION=Container Finished - Killed   TARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_0_
> CONTAINERID=container_0__01_00
> 2018-05-21 13:11:56,238 INFO  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2111)) - Container 
> container_0__01_00 transitioned from KILLING to DONE
> 2018-05-21 13:11:56,238 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:transition(489)) - Removing 
> container_0__01_00 from application application_0_
> 2018-05-21 13:11:56,239 INFO  [NM ContainerManager dispatcher] 
> monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:onStopMonitoringContainer(932)) - Stopping 
> resource-monitoring for container_0__01_00
> 2018-05-21 13:11:56,239 INFO  [NM ContainerManager dispatcher] 
> containermanager.AuxServices (AuxServices.java:handle(220)) - Got event 
> CONTAINER_STOP for appId application_0_
> 2018-05-21 13:11:56,274 WARN  [NM ContainerManager dispatcher] 
> container.ContainerImpl 

[jira] [Updated] (YARN-8331) Race condition in NM container launched after done

2018-08-07 Thread Pradeep Ambati (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Ambati updated YARN-8331:
-
Attachment: YARN-8331.001.patch

> Race condition in NM container launched after done
> --
>
> Key: YARN-8331
> URL: https://issues.apache.org/jira/browse/YARN-8331
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Priority: Major
> Attachments: YARN-8331.001.patch
>
>
> When a container is launching (in ContainerLaunch#launchContainer) its state is 
> SCHEDULED. A kill event sent to the container at this point moves the state 
> SCHEDULED->KILLING->DONE.
>  ContainerLaunch then sends the CONTAINER_LAUNCHED event and starts the container 
> processes. These orphaned container processes are never cleaned up.
>  
> {code:java}
> 2018-05-21 13:11:56,114 INFO  [Thread-11] nodemanager.NMAuditLogger 
> (NMAuditLogger.java:logSuccess(94)) - USER=nobody OPERATION=Start Container 
> Request   TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_0_CONTAINERID=container_0__01_00
> 2018-05-21 13:11:56,114 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application 
> application_0_ transitioned from NEW to INITING
> 2018-05-21 13:11:56,114 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:transition(446)) - Adding 
> container_0__01_00 to application application_0_
> 2018-05-21 13:11:56,118 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application 
> application_0_ transitioned from INITING to RUNNING
> 2018-05-21 13:11:56,119 INFO  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2111)) - Container 
> container_0__01_00 transitioned from NEW to SCHEDULED
> 2018-05-21 13:11:56,119 INFO  [NM ContainerManager dispatcher] 
> containermanager.AuxServices (AuxServices.java:handle(220)) - Got event 
> CONTAINER_INIT for appId application_0_
> 2018-05-21 13:11:56,119 INFO  [NM ContainerManager dispatcher] 
> scheduler.ContainerScheduler (ContainerScheduler.java:startContainer(504)) - 
> Starting container [container_0__01_00]
> 2018-05-21 13:11:56,226 INFO  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2111)) - Container 
> container_0__01_00 transitioned from SCHEDULED to KILLING
> 2018-05-21 13:11:56,227 INFO  [NM ContainerManager dispatcher] 
> containermanager.TestContainerManager 
> (BaseContainerManagerTest.java:delete(287)) - Psuedo delete: user - nobody, 
> type - FILE
> 2018-05-21 13:11:56,227 INFO  [NM ContainerManager dispatcher] 
> nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(94)) - USER=nobody   
>  OPERATION=Container Finished - Killed   TARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_0_
> CONTAINERID=container_0__01_00
> 2018-05-21 13:11:56,238 INFO  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2111)) - Container 
> container_0__01_00 transitioned from KILLING to DONE
> 2018-05-21 13:11:56,238 INFO  [NM ContainerManager dispatcher] 
> application.ApplicationImpl (ApplicationImpl.java:transition(489)) - Removing 
> container_0__01_00 from application application_0_
> 2018-05-21 13:11:56,239 INFO  [NM ContainerManager dispatcher] 
> monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:onStopMonitoringContainer(932)) - Stopping 
> resource-monitoring for container_0__01_00
> 2018-05-21 13:11:56,239 INFO  [NM ContainerManager dispatcher] 
> containermanager.AuxServices (AuxServices.java:handle(220)) - Got event 
> CONTAINER_STOP for appId application_0_
> 2018-05-21 13:11:56,274 WARN  [NM ContainerManager dispatcher] 
> container.ContainerImpl (ContainerImpl.java:handle(2106)) - Can't handle this 
> event at current state: Current: [DONE], eventType: [CONTAINER_LAUNCHED], 
> container: [container_0__01_00]
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_LAUNCHED at DONE
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2104)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:104)
>   at 
> 

[jira] [Comment Edited] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571719#comment-16571719
 ] 

Sunil Govindan edited comment on YARN-8561 at 8/7/18 3:00 PM:
--

Thanks [~leftnoteasy] for the effort. I have looked through the approach and the 
code. A few comments, a mix of major and minor :)

1. I think we can use the same CLI model as the client, where the CLI extends 
Configured and implements Tool. This helps with tests, and it also avoids the 
abstract run method since it is a Tool.
2. We could also stop a job from the CLI, correct? In that case, do we need 
anything more than a simple yarn app -kill appId?
3. I think we can use UnitsConversionUtil for unit conversion in 
CliUtils#parseResourcesString.
4. In CapSchedConfig we used a pattern match for absolute resources:
{code}
public static final String PATTERN_FOR_ABSOLUTE_RESOURCE = "^\\[[\\w\\.,\\-_=\\ /]+\\]$";
private static final Pattern RESOURCE_PATTERN = 
Pattern.compile(PATTERN_FOR_ABSOLUTE_RESOURCE);
{code}
Could we use the same in the CLI as well? (A small validation sketch follows 
after this list.)
5. Maybe rename JobState to SubmarineJobState.
6. The command-line options look very clean and thorough. As we go forward, more 
CLI options will be added and it will become more complex. Could we load a 
profile into Submarine and use the profile to fill, say, 80% of such config 
items? Given a profile, the user might only need to fill in one or two variable 
arguments.
7. DevelopperGuide.md ==> DeveloperGuide.md
8. In getServiceResourceFromYarnResource, I think we should get the resource 
list from ResourceUtils. It might also be better to use a common client/server 
util method to create the resource, something like 
Resource.newInstance(yarnResource) or Resources.createResource(yarnResource).
9. In verbose or debug mode, YarnServiceJobSubmitter could dump all contents of 
{{FileWriter fw}}.
10. It might be better to add a shutdown or interrupt signal to break out of 
JobMonitor#waitTrainingFinal if the job is faulty.
11. In fromServiceState, the service state STOPPED is considered 
JobState.SUCCEEDED.
12. There is some commented-out code in JobStatusBuilder.
13. How could we increase the number of workers of a running job?
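
Re item 4 above: a minimal, self-contained sketch (class and method names here are illustrative, not from the patch) of reusing the quoted pattern to validate a bracketed resource string on the CLI side:

{code:java}
import java.util.regex.Pattern;

public class ResourceStringValidator {
  // Same expression as the capacity-scheduler pattern quoted in item 4.
  private static final Pattern RESOURCE_PATTERN =
      Pattern.compile("^\\[[\\w\\.,\\-_=\\ /]+\\]$");

  /** Returns true for strings such as "[memory-mb=4096,vcores=4]". */
  static boolean isBracketedResourceString(String arg) {
    return arg != null && RESOURCE_PATTERN.matcher(arg.trim()).matches();
  }

  public static void main(String[] args) {
    System.out.println(isBracketedResourceString("[memory-mb=4096,vcores=4]")); // true
    System.out.println(isBracketedResourceString("memory-mb=4096,vcores=4"));   // false
  }
}
{code}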


was (Author: sunilg):
Thanks [~leftnoteasy] for the effort. I have tried to look through the approach 
and code. 
Few comments which is mixed or major and minor :)

1. I think we can used same CLI model of client where CLI extends Configured 
and implements Tool. This helps for tests. Also this helps to avoid abstract 
run method as its Tool.
2. We could also stop a job from CLI, correct? In that case, do we need to do 
some thing more extra than a simple yarn app -kill appId ?
3. I think we can use UnitsConversionUtil for unit convertion. 
CliUtils#parseResourcesString
4. In CapSchedConfig for absolute resource, we used a pattern match code.
{code}
public static final String PATTERN_FOR_ABSOLUTE_RESOURCE = "^\\[[\\w\\.,\\-_=\\ 
/]+\\]$";
private static final Pattern RESOURCE_PATTERN = 
Pattern.compile(PATTERN_FOR_ABSOLUTE_RESOURCE);
{code}
Could we use same in CLI as well?
5. May be rename JobState to SubmarineJobState
6. Commandline options looks very clean and thorough. I think as we go forward, 
more CLI options will be added. and it will become more complex. Could we load 
a profile to submarine and then use the profile get 80% of such config items. 
Given a profile, may be user might need to fill 1 or 2 variable arguments.
7. DevelopperGuide.md ==> DeveloperGuide.md

> [Submarine] Add initial implementation: training job submission and job 
> history retrieve.
> -
>
> Key: YARN-8561
> URL: https://issues.apache.org/jira/browse/YARN-8561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8561.001.patch
>
>
> Added following parts:
> 1) New subcomponent of YARN, under applications/ project. 
> 2) Tensorflow training job submission, including training (single node and 
> distributed). 
> - Supported Docker container. 
> - Support GPU isolation. 
> - Support YARN registry DNS.
> 3) Retrieve job history.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8609) NM oom because of large container statuses

2018-08-07 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571732#comment-16571732
 ] 

Jason Lowe commented on YARN-8609:
--

bq. Indeed, it would not take up too much memory if running with YARN-3998.

Then I propose this be closed as a duplicate.  Those looking for a JIRA and 
finding the summary matching their symptoms should be directed to YARN-3998 
since that alone is sufficient to address that problem.

bq. if we do truncation in for loop, all kinds of diagnostic info will retain. 
This is what I want to say and it is a small improvement.

We can add the ability to truncate individual diagnostic messages in a separate 
improvement JIRA.  However as I mentioned above, 5000 may be too small of a 
default since it could end up truncating a critical "Caused by" towards the end 
of a large stacktrace.
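
A minimal sketch of that concern (an assumed helper, not the YARN-3998 or YARN-8609 patch): truncating the middle of an oversized diagnostics string instead of the tail, so a trailing "Caused by" of a long stack trace is preserved.

{code:java}
// Keeps roughly the first and last halves of the allowed length and drops the middle.
static String truncateMiddle(String diagnostics, int maxLength) {
  final String marker = "\n...[truncated]...\n";
  if (diagnostics == null || diagnostics.length() <= maxLength) {
    return diagnostics;
  }
  int keep = Math.max(0, maxLength - marker.length());
  int head = keep / 2;
  int tail = keep - head;
  return diagnostics.substring(0, head)
      + marker
      + diagnostics.substring(diagnostics.length() - tail);
}
{code}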

> NM oom because of large container statuses
> --
>
> Key: YARN-8609
> URL: https://issues.apache.org/jira/browse/YARN-8609
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Xianghao Lu
>Priority: Major
> Attachments: YARN-8609.001.patch, contain_status.jpg, oom.jpeg
>
>
> Sometimes the NodeManager will send large container statuses to the 
> ResourceManager when it starts up with recovery; as a result, the NodeManager 
> fails to start because of an OOM.
>  In my case, the container statuses payload is 135M and contains 11 container 
> statuses. The diagnostics of 5 of those containers are very large (27M), so the 
> patch truncates the container diagnostics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8561) [Submarine] Add initial implementation: training job submission and job history retrieve.

2018-08-07 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571719#comment-16571719
 ] 

Sunil Govindan commented on YARN-8561:
--

Thanks [~leftnoteasy] for the effort. I have looked through the approach and the 
code. A few comments, a mix of major and minor :)

1. I think we can use the same CLI model as the client, where the CLI extends 
Configured and implements Tool. This helps with tests, and it also avoids the 
abstract run method since it is a Tool.
2. We could also stop a job from the CLI, correct? In that case, do we need 
anything more than a simple yarn app -kill appId?
3. I think we can use UnitsConversionUtil for unit conversion in 
CliUtils#parseResourcesString.
4. In CapSchedConfig we used a pattern match for absolute resources:
{code}
public static final String PATTERN_FOR_ABSOLUTE_RESOURCE = "^\\[[\\w\\.,\\-_=\\ /]+\\]$";
private static final Pattern RESOURCE_PATTERN = 
Pattern.compile(PATTERN_FOR_ABSOLUTE_RESOURCE);
{code}
Could we use the same in the CLI as well?
5. Maybe rename JobState to SubmarineJobState.
6. The command-line options look very clean and thorough. As we go forward, more 
CLI options will be added and it will become more complex. Could we load a 
profile into Submarine and use the profile to fill, say, 80% of such config 
items? Given a profile, the user might only need to fill in one or two variable 
arguments.
7. DevelopperGuide.md ==> DeveloperGuide.md

> [Submarine] Add initial implementation: training job submission and job 
> history retrieve.
> -
>
> Key: YARN-8561
> URL: https://issues.apache.org/jira/browse/YARN-8561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8561.001.patch
>
>
> Added following parts:
> 1) New subcomponent of YARN, under applications/ project. 
> 2) Tensorflow training job submission, including training (single node and 
> distributed). 
> - Supported Docker container. 
> - Support GPU isolation. 
> - Support YARN registry DNS.
> 3) Retrieve job history.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8633) Update JQuery version references in yarn-common

2018-08-07 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8633:
-
Summary: Update JQuery version references in yarn-common  (was: [BlackDuck] 
[Hadoop Yarn Common] Update JQuery version references)

> Update JQuery version references in yarn-common
> ---
>
> Key: YARN-8633
> URL: https://issues.apache.org/jira/browse/YARN-8633
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-8633.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8633) [BlackDuck] [Hadoop Yarn Common] Update JQuery version references

2018-08-07 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571576#comment-16571576
 ] 

Akhil PB edited comment on YARN-8633 at 8/7/18 12:39 PM:
-

Upgraded {{jquery.dataTables.min.js}} version from v1.9.4 to v1.10.7.

[~sunilg]  [~leftnoteasy] Please review the patch.


was (Author: akhilpb):
[~sunilg]  [~leftnoteasy] Please review the patch.

> [BlackDuck] [Hadoop Yarn Common] Update JQuery version references
> -
>
> Key: YARN-8633
> URL: https://issues.apache.org/jira/browse/YARN-8633
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-8633.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8633) [BlackDuck] [Hadoop Yarn Common] Update JQuery version references

2018-08-07 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571576#comment-16571576
 ] 

Akhil PB edited comment on YARN-8633 at 8/7/18 12:38 PM:
-

[~sunilg] [~wangda] Please review the patch.


was (Author: akhilpb):
[~sunilg] Please review the patch.

> [BlackDuck] [Hadoop Yarn Common] Update JQuery version references
> -
>
> Key: YARN-8633
> URL: https://issues.apache.org/jira/browse/YARN-8633
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-8633.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8633) [BlackDuck] [Hadoop Yarn Common] Update JQuery version references

2018-08-07 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571576#comment-16571576
 ] 

Akhil PB edited comment on YARN-8633 at 8/7/18 12:38 PM:
-

[~sunilg]  [~leftnoteasy] Please review the patch.


was (Author: akhilpb):
[~sunilg] [~wangda] Please review the patch.

> [BlackDuck] [Hadoop Yarn Common] Update JQuery version references
> -
>
> Key: YARN-8633
> URL: https://issues.apache.org/jira/browse/YARN-8633
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-8633.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8633) [BlackDuck] [Hadoop Yarn Common] Update JQuery version references

2018-08-07 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571576#comment-16571576
 ] 

Akhil PB commented on YARN-8633:


[~sunilg] Please review the patch.

> [BlackDuck] [Hadoop Yarn Common] Update JQuery version references
> -
>
> Key: YARN-8633
> URL: https://issues.apache.org/jira/browse/YARN-8633
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-8633.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8633) [BlackDuck] [Hadoop Yarn Common] Update JQuery version references

2018-08-07 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-8633:
---
Attachment: YARN-8633.001.patch

> [BlackDuck] [Hadoop Yarn Common] Update JQuery version references
> -
>
> Key: YARN-8633
> URL: https://issues.apache.org/jira/browse/YARN-8633
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-8633.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8633) [BlackDuck] [Hadoop Yarn Common] Update JQuery version references

2018-08-07 Thread Akhil PB (JIRA)
Akhil PB created YARN-8633:
--

 Summary: [BlackDuck] [Hadoop Yarn Common] Update JQuery version 
references
 Key: YARN-8633
 URL: https://issues.apache.org/jira/browse/YARN-8633
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Akhil PB
Assignee: Akhil PB






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-07 Thread Xianghao Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghao Lu updated YARN-8632:
--
Attachment: YARN-8632.001.patch

> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632.001.patch
>
>
> Recently I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler, and I ran into some 
> problems.
>  First, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Second, everything seemed to be OK, but I only got "[]" in the file 
> realtimetrack.json. Eventually I found that the MetricsLogRunnable thread exits 
> because of an NPE: "wrapper.getQueueSet()" is still null when "String 
> metrics = web.generateRealTimeTrackingMetrics();" is executed.
>  So we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> inside the try section, so that the MetricsLogRunnable thread does not exit 
> with an unexpected exception. 
>  My Hadoop version is 2.7.2; the Hadoop trunk branch also appears to have the 
> second problem, and I have made a patch to solve it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-07 Thread Xianghao Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghao Lu updated YARN-8632:
--
Description: 
Recently, I have beenning using 
[SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
 to validate the impact of changes on my FairScheduler. I encountered some 
problems.
 Firstly, I fix a npe bug with the patch in 
https://issues.apache.org/jira/browse/YARN-4302
 Secondly, Everything seems to be ok, but I just get "[]" in file 
realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit 
because of npe,
 the reason is "wrapper.getQueueSet()" is still null when executing "String 
metrics = web.generateRealTimeTrackingMetrics();"
 So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" in 
try section to avoid MetricsLogRunnable thread exit with unexpected exception. 
 My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the 
second problem and I have made a patch to solve it.

  was:
Recently, I have beenning using 
[SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
 to validate the impact of changes on my FairScheduler. I encountered some 
problems.
 Firstly, I fix a npe bug with the patch in 
https://issues.apache.org/jira/browse/YARN-4302
 Secondly, Everything seems to be ok, but I just get "[]" in file 
realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit 
because of npe,
 the reason is "wrapper.getQueueSet()" is still null when executing "String 
metrics = web.generateRealTimeTrackingMetrics();"
 So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" in 
try section to avoid MetricsLogRunnable thread exit with unexpected exception. 
 My hadoop version is 2.7.2, it seems that hadoop trunk branch also has this 
the second problem and I have made a patch to solve it.


> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Priority: Major
>
> Recently I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler, and I ran into some 
> problems.
>  First, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Second, everything seemed to be OK, but I only got "[]" in the file 
> realtimetrack.json. Eventually I found that the MetricsLogRunnable thread exits 
> because of an NPE: "wrapper.getQueueSet()" is still null when "String 
> metrics = web.generateRealTimeTrackingMetrics();" is executed.
>  So we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> inside the try section, so that the MetricsLogRunnable thread does not exit 
> with an unexpected exception. 
>  My Hadoop version is 2.7.2; the Hadoop trunk branch also appears to have the 
> second problem, and I have made a patch to solve it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-07 Thread Xianghao Lu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghao Lu updated YARN-8632:
--
Description: 
Recently, I have beenning using 
[SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
 to validate the impact of changes on my FairScheduler. I encountered some 
problems.
 Firstly, I fix a npe bug with the patch in 
https://issues.apache.org/jira/browse/YARN-4302
 Secondly, Everything seems to be ok, but I just get "[]" in file 
realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit 
because of npe,
 the reason is "wrapper.getQueueSet()" is still null when executing "String 
metrics = web.generateRealTimeTrackingMetrics();"
 So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" in 
try section to avoid MetricsLogRunnable thread exit with unexpected exception. 
 My hadoop version is 2.7.2, it seems that hadoop trunk branch also has this 
the second problem and I have made a patch to solve it.

  was:
Recently, I have beenning using 
[SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
 to validate the impact of changes on my FairScheduler. I encountered some 
problems.
Firstly, I fix a npe bug with the patch in 
https://issues.apache.org/jira/browse/YARN-4302
Secondly, Everything seems to be ok, but I just get "[]" in file 
realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit 
because of npe,
the reason is "wrapper.getQueueSet()" is still null when executing "String 
metrics = web.generateRealTimeTrackingMetrics();"
So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" in 
try section to avoid MetricsLogRunnable thread exit with unexpected exception. 
My hadoop version is 2.7.2, it seems that hadoop trunk branch also has this 
problem and I have made a patch to solve it.


> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Priority: Major
>
> Recently I have been using 
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
>  to validate the impact of changes on my FairScheduler, and I ran into some 
> problems.
>  First, I fixed an NPE bug with the patch in 
> https://issues.apache.org/jira/browse/YARN-4302
>  Second, everything seemed to be OK, but I only got "[]" in the file 
> realtimetrack.json. Eventually I found that the MetricsLogRunnable thread exits 
> because of an NPE: "wrapper.getQueueSet()" is still null when "String 
> metrics = web.generateRealTimeTrackingMetrics();" is executed.
>  So we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
> inside the try section, so that the MetricsLogRunnable thread does not exit 
> with an unexpected exception. 
>  My Hadoop version is 2.7.2; the Hadoop trunk branch also appears to have the 
> second problem, and I have made a patch to solve it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator

2018-08-07 Thread Xianghao Lu (JIRA)
Xianghao Lu created YARN-8632:
-

 Summary: No data in file realtimetrack.json after running 
SchedulerLoadSimulator
 Key: YARN-8632
 URL: https://issues.apache.org/jira/browse/YARN-8632
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler-load-simulator
Reporter: Xianghao Lu


Recently I have been using 
[SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
 to validate the impact of changes on my FairScheduler, and I ran into some 
problems.
First, I fixed an NPE bug with the patch in 
https://issues.apache.org/jira/browse/YARN-4302
Second, everything seemed to be OK, but I only got "[]" in the file 
realtimetrack.json. Eventually I found that the MetricsLogRunnable thread exits 
because of an NPE: "wrapper.getQueueSet()" is still null when "String 
metrics = web.generateRealTimeTrackingMetrics();" is executed.
So we should put "String metrics = web.generateRealTimeTrackingMetrics();" 
inside the try section, so that the MetricsLogRunnable thread does not exit with 
an unexpected exception (a minimal sketch of the change follows below). 
My Hadoop version is 2.7.2; the Hadoop trunk branch also appears to have this 
problem, and I have made a patch to solve it.
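
A minimal sketch of the proposed change (apart from generateRealTimeTrackingMetrics(), which is quoted above, all names here are illustrative): the whole body of each logging round sits inside the try block, so an early NPE only skips that round instead of terminating the MetricsLogRunnable thread.

{code:java}
import java.io.Writer;

class MetricsLogRunnable implements Runnable {
  /** Stand-in for the SLS web app wrapper that produces the JSON metrics. */
  interface MetricsSource {
    String generateRealTimeTrackingMetrics();
  }

  private final MetricsSource web;
  private final Writer out;               // writes the entries of realtimetrack.json
  private volatile boolean running = true;

  MetricsLogRunnable(MetricsSource web, Writer out) {
    this.web = web;
    this.out = out;
  }

  @Override
  public void run() {
    while (running) {
      try {
        // Moved inside the try block: may throw an NPE while queues are not yet registered.
        String metrics = web.generateRealTimeTrackingMetrics();
        out.write(metrics + ",\n");
        out.flush();
      } catch (Exception e) {
        // Skip this round; do not let the exception kill the logging thread.
      }
      try {
        Thread.sleep(1000);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }
}
{code}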



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8613) Old RM UI shows wrong vcores total value

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571479#comment-16571479
 ] 

genericqa commented on YARN-8613:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 35m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 34s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8613 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934617/YARN-8613.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2747e59d2973 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2e4e02b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21530/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21530/testReport/ |
| Max. process+thread count | 871 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-8630) ATSv2 REST APIs should honor filter-entity-list-by-user in non-secure cluster when ACls are enabled

2018-08-07 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571432#comment-16571432
 ] 

Sunil Govindan commented on YARN-8630:
--

+1 on this patch. It's a straightforward fix.

Thanks [~rohithsharma]

> ATSv2 REST APIs should honor filter-entity-list-by-user in non-secure cluster 
> when ACls are enabled
> ---
>
> Key: YARN-8630
> URL: https://issues.apache.org/jira/browse/YARN-8630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8630.01.patch
>
>
> It is observed that the ATSv2 REST endpoints do not honor 
> *yarn.webapp.filter-entity-list-by-user* in a non-secure cluster when ACLs are 
> enabled. 
> The issue can be seen if the static web app filter is not configured in a 
> non-secure cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8613) Old RM UI shows wrong vcores total value

2018-08-07 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571429#comment-16571429
 ] 

Sunil Govindan commented on YARN-8613:
--

Thanks [~Sen Zhao] for the patch. The fix itself looks fine to me. However, my 
worry is why QueueMetrics is failing in the first place. I think we need to dig 
further into how the queue metrics got corrupted, otherwise the same issue will 
show up via the metrics as well.

[~bibinchundatt] [~rohithsharma] please share your thoughts.

> Old RM UI shows wrong vcores total value
> 
>
> Key: YARN-8613
> URL: https://issues.apache.org/jira/browse/YARN-8613
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akhil PB
>Assignee: Sen Zhao
>Priority: Major
> Attachments: Screen Shot 2018-08-02 at 12.12.41 PM.png, Screen Shot 
> 2018-08-02 at 12.16.53 PM.png, YARN-8613.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8535) DistributedShell unit tests are failing

2018-08-07 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-8535:
---

Assignee: Abhishek Modi

> DistributedShell unit tests are failing
> ---
>
> Key: YARN-8535
> URL: https://issues.apache.org/jira/browse/YARN-8535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: distributed-shell, timelineservice
>Reporter: Eric Yang
>Assignee: Abhishek Modi
>Priority: Major
>
> These tests have been failing for a while in trunk:
> |[testDSShellWithoutDomainV2|https://builds.apache.org/job/PreCommit-YARN-Build/21243/testReport/org.apache.hadoop.yarn.applications.distributedshell/TestDistributedShell/testDSShellWithoutDomainV2]|1
>  min 20 sec|Failed|
> |[testDSShellWithoutDomainV2CustomizedFlow|https://builds.apache.org/job/PreCommit-YARN-Build/21243/testReport/org.apache.hadoop.yarn.applications.distributedshell/TestDistributedShell/testDSShellWithoutDomainV2CustomizedFlow]|1
>  min 20 sec|Failed|
> |[testDSShellWithoutDomainV2DefaultFlow|https://builds.apache.org/job/PreCommit-YARN-Build/21243/testReport/org.apache.hadoop.yarn.applications.distributedshell/TestDistributedShell/testDSShellWithoutDomainV2DefaultFlow]|1
>  min 20 sec|Failed|
> The root causes are the same:
> {code:java}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.verifyEntityTypeFileExists(TestDistributedShell.java:628)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV2(TestDistributedShell.java:546)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:451)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:310)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithoutDomainV2(TestDistributedShell.java:306)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6972) Adding RM ClusterId in AppInfo

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571418#comment-16571418
 ] 

genericqa commented on YARN-6972:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 36s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m  8s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-6972 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934613/YARN-6972.014.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 389fd3dc6613 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2e4e02b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21529/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21529/testReport/ |
| Max. process+thread count | 928 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Created] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery

2018-08-07 Thread Sanjay Divgi (JIRA)
Sanjay Divgi created YARN-8631:
--

 Summary: YARN RM fails to add the application to the delegation 
token renewer on recovery
 Key: YARN-8631
 URL: https://issues.apache.org/jira/browse/YARN-8631
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.1.0
Reporter: Sanjay Divgi
 Attachments: 
hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log

On an HA cluster we have observed that the YARN ResourceManager fails to add the 
application to the delegation token renewer on recovery.

Below is the error:
{code:java}
2018-08-07 08:41:23,850 INFO security.DelegationTokenRenewer 
(DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token= 
[Kind: TIMELINE_DELEGATION_TOKEN, Service: 172.27.84.192:8188, Ident: 
(TIMELINE_DELEGATION_TOKEN owner=hrt_qa_hive_spark, renewer=yarn, realUser=, 
issueDate=1533624642302, maxDate=1534229442302, sequenceNumber=18, 
masterKeyId=4);exp=1533717683478; apps=[application_1533623972681_0001]]
2018-08-07 08:41:23,855 WARN security.DelegationTokenRenewer 
(DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to 
add the application to the delegation token renewer on recovery.
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8613) Old RM UI shows wrong vcores total value

2018-08-07 Thread Sen Zhao (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571316#comment-16571316
 ] 

Sen Zhao commented on YARN-8613:


Uploaded a patch to fix this problem.

> Old RM UI shows wrong vcores total value
> 
>
> Key: YARN-8613
> URL: https://issues.apache.org/jira/browse/YARN-8613
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akhil PB
>Assignee: Sen Zhao
>Priority: Major
> Attachments: Screen Shot 2018-08-02 at 12.12.41 PM.png, Screen Shot 
> 2018-08-02 at 12.16.53 PM.png, YARN-8613.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8613) Old RM UI shows wrong vcores total value

2018-08-07 Thread Sen Zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sen Zhao updated YARN-8613:
---
Attachment: YARN-8613.001.patch

> Old RM UI shows wrong vcores total value
> 
>
> Key: YARN-8613
> URL: https://issues.apache.org/jira/browse/YARN-8613
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akhil PB
>Priority: Major
> Attachments: Screen Shot 2018-08-02 at 12.12.41 PM.png, Screen Shot 
> 2018-08-02 at 12.16.53 PM.png, YARN-8613.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8613) Old RM UI shows wrong vcores total value

2018-08-07 Thread Sen Zhao (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sen Zhao reassigned YARN-8613:
--

Assignee: Sen Zhao

> Old RM UI shows wrong vcores total value
> 
>
> Key: YARN-8613
> URL: https://issues.apache.org/jira/browse/YARN-8613
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akhil PB
>Assignee: Sen Zhao
>Priority: Major
> Attachments: Screen Shot 2018-08-02 at 12.12.41 PM.png, Screen Shot 
> 2018-08-02 at 12.16.53 PM.png, YARN-8613.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8630) ATSv2 REST APIs should honor filter-entity-list-by-user in non-secure cluster when ACls are enabled

2018-08-07 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571248#comment-16571248
 ] 

genericqa commented on YARN-8630:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
7s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 51s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8630 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12934607/YARN-8630.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 91fb62062b51 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2e4e02b |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21528/testReport/ |
| Max. process+thread count | 440 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21528/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> ATSv2 REST APIs should honor 

[jira] [Updated] (YARN-6972) Adding RM ClusterId in AppInfo

2018-08-07 Thread Tanuj Nayak (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanuj Nayak updated YARN-6972:
--
Attachment: YARN-6972.014.patch

> Adding RM ClusterId in AppInfo
> --
>
> Key: YARN-6972
> URL: https://issues.apache.org/jira/browse/YARN-6972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Tanuj Nayak
>Priority: Major
> Attachments: YARN-6972.001.patch, YARN-6972.002.patch, 
> YARN-6972.003.patch, YARN-6972.004.patch, YARN-6972.005.patch, 
> YARN-6972.006.patch, YARN-6972.007.patch, YARN-6972.008.patch, 
> YARN-6972.009.patch, YARN-6972.010.patch, YARN-6972.011.patch, 
> YARN-6972.012.patch, YARN-6972.013.patch, YARN-6972.014.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8630) ATSv2 REST APIs should honor filter-entity-list-by-user in non-secure cluster when ACLs are enabled

2018-08-07 Thread Rohith Sharma K S (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571206#comment-16571206
 ] 

Rohith Sharma K S commented on YARN-8630:
-

[~sunilg] Could you please review? 

> ATSv2 REST APIs should honor filter-entity-list-by-user in non-secure cluster 
> when ACLs are enabled
> ---
>
> Key: YARN-8630
> URL: https://issues.apache.org/jira/browse/YARN-8630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8630.01.patch
>
>
> It is observed that ATSv2 REST endpoints are not honoring 
> *yarn.webapp.filter-entity-list-by-user* in a non-secure cluster when ACLs 
> are enabled. 
> The issue can be seen if the static web app filter is not configured in a 
> non-secure cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8630) ATSv2 REST APIs should honor filter-entity-list-by-user in non-secure cluster when ACLs are enabled

2018-08-07 Thread Rohith Sharma K S (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-8630:

Attachment: YARN-8630.01.patch

> ATSv2 REST APIs should honor filter-entity-list-by-user in non-secure cluster 
> when ACLs are enabled
> ---
>
> Key: YARN-8630
> URL: https://issues.apache.org/jira/browse/YARN-8630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8630.01.patch
>
>
> It is observed that ATSv2 REST endpoints are not honoring 
> *yarn.webapp.filter-entity-list-by-user* in a non-secure cluster when ACLs 
> are enabled. 
> The issue can be seen if the static web app filter is not configured in a 
> non-secure cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8630) ATSv2 REST APIs should honor filter-entity-list-by-user in non-secure cluster when ACLs are enabled

2018-08-07 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-8630:
---

 Summary: ATSv2 REST APIs should honor filter-entity-list-by-user 
in non-secure cluster when ACLs are enabled
 Key: YARN-8630
 URL: https://issues.apache.org/jira/browse/YARN-8630
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


It is observed that ATSv2 REST endpoints are not honoring 
*yarn.webapp.filter-entity-list-by-user* in a non-secure cluster when ACLs are 
enabled. 
The issue can be seen if the static web app filter is not configured in a 
non-secure cluster.
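
For context, *yarn.webapp.filter-entity-list-by-user* is a boolean switch read 
from the YARN configuration. Below is a minimal sketch of toggling it 
programmatically, assuming the standard Hadoop Configuration API; the wrapper 
class is purely illustrative and not part of the ATSv2 code (in a real 
deployment the flag would normally be set in yarn-site.xml).

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch only: shows the flag the ATSv2 reader is expected to honor.
public class FilterEntityListByUserExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Force per-user filtering of entity lists even on a non-secure cluster.
    conf.setBoolean("yarn.webapp.filter-entity-list-by-user", true);

    System.out.println("filter-entity-list-by-user = "
        + conf.getBoolean("yarn.webapp.filter-entity-list-by-user", false));
  }
}
{code}

The expectation from this JIRA is that the ATSv2 readers consult this flag 
whenever ACLs are enabled, regardless of whether the cluster is secure.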



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6695) Race condition in RM for publishing container events vs appFinished events causes NPE

2018-08-07 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571174#comment-16571174
 ] 

Akhil PB commented on YARN-6695:


An NPE was thrown when trying to stop a service, which resulted in an RM shutdown.

[~rohithsharma] [~sunilg] [~vrushalic]
{code:java}
2018-08-07 11:36:02,774 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher:
 Error when publishing entity TimelineEntity[type='YARN_APPLICATION', 
id='application_1533536393859_0003']
2018-08-07 11:36:02,833 INFO 
org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager:
 The collector service for application_1533536393859_0003 was removed
2018-08-07 11:36:02,858 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:459)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:73)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:494)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:483)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:748){code}
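
The failure mode matches the original trace: the application's collector is 
removed on the appFinished path while events for the same application are still 
queued in the AsyncDispatcher, so putEntity dereferences a collector that no 
longer exists. A minimal sketch of the kind of defensive check involved follows; 
the class and method names are illustrative only, not the actual 
TimelineServiceV2Publisher code.

{code:java}
// Illustrative sketch only: shows why a null guard is needed when the
// app-level collector can be removed (on appFinished) before queued events
// for that application are drained.
class SafePublisherSketch {

  interface CollectorManager {
    AppCollector get(String appId); // may return null once the app finished
  }

  interface AppCollector {
    void putEntity(Object entity);
  }

  private final CollectorManager collectors;

  SafePublisherSketch(CollectorManager collectors) {
    this.collectors = collectors;
  }

  void publish(String appId, Object entity) {
    AppCollector collector = collectors.get(appId);
    if (collector == null) {
      // appFinished already removed the collector; drop (or log) the event
      // instead of letting a NullPointerException kill the dispatcher thread.
      System.err.println("Collector for " + appId + " already removed; dropping event");
      return;
    }
    collector.putEntity(entity);
  }
}
{code}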

> Race condition in RM for publishing container events vs appFinished events 
> causes NPE 
> --
>
> Key: YARN-6695
> URL: https://issues.apache.org/jira/browse/YARN-6695
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Priority: Major
>
> When the RM publishes container events, i.e. by enabling 
> *yarn.rm.system-metrics-publisher.emit-container-events*, there is a race 
> condition between processing those events and the appFinished event that 
> removes the appId from the collector list, which causes an NPE. 
> Look at the trace below, where the appId is removed from the collectors first 
> and the corresponding events are processed afterwards. 
> {noformat}
> 2017-06-06 19:28:48,896 INFO  capacity.ParentQueue 
> (ParentQueue.java:removeApplication(472)) - Application removed - appId: 
> application_1496758895643_0005 user: root leaf-queue of parent: root 
> #applications: 0
> 2017-06-06 19:28:48,921 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(190)) - The collector service for 
> application_1496758895643_0005 was removed
> 2017-06-06 19:28:48,922 ERROR metrics.TimelineServiceV2Publisher 
> (TimelineServiceV2Publisher.java:putEntity(451)) - Error when publishing 
> entity TimelineEntity[type='YARN_CONTAINER', 
> id='container_e01_1496758895643_0005_01_02']
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:72)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:480)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:469)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org